Hi Tarun, Sachin is correct: you use the layout file to identify which positions in the string of characters correspond to which variables. Even though I'm an R user, I think this extraction is more easily done in STATA. I've attached my STATA code for the 68th round extraction.
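For anyone who has not worked with STATA, the same idea can be sketched in Python. This is only a rough illustration: the field names and positions are copied from the illustrative layout in Sachin's message quoted further down, the sample record is made up, and the helper names (`parse_layout`, `read_record`) are hypothetical. The real positions must come from the layout file for your round.

```python
def parse_layout(spec):
    """Turn a spec such as 'v11 1-3 v12 4-8' into [('v11', 1, 3), ('v12', 4, 8)]."""
    tokens = spec.split()
    fields = []
    # Tokens alternate between a variable name and its 'start-end' span.
    for name, span in zip(tokens[0::2], tokens[1::2]):
        start, end = span.split("-")
        fields.append((name, int(start), int(end)))
    return fields


def read_record(line, fields):
    """Slice one fixed-width record into a dict of variable name -> text."""
    # Layout positions are 1-based and inclusive; Python slices are
    # 0-based and end-exclusive, hence the 'start - 1'.
    return {name: line[start - 1:end] for name, start, end in fields}


# Illustrative layout taken from the quoted message (first six variables only).
LAYOUT = "v11 1-3 v12 4-8 v13 9-10 v14 11-13 v15 14-14 v16 15-15"
fields = parse_layout(LAYOUT)

# Made-up 15-character record, not real NSSO data.
record = read_record("680123450112371", fields)
print(record)
```

To process a whole level file, the same `read_record` call would be applied to each line of the .txt file in turn.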
Since the NSSO data are samples, the multiplier acts as a survey weight, so you can get population-level estimates from the sampled survey responses. Look at the readme (attached) for more information on how these multipliers are used to calculate survey weights, especially this part:

For generating subsample-wise estimates based on data of all subrounds taken together, either Subsample-1 households or Subsample-2 households are to be considered at one time. Subsample code is available in the data file. (Please see layout of data). Apply final weight (or all-subround multipliers) as follows:

final weight = MLT/100, if NSS=NSC
             = MLT/200 otherwise.

Also, I found this blog <https://zakku78.wordpress.com/category/nsso-data/> very helpful for explaining NSSO data; the comments in particular may ask and answer common questions that you have. You can even write to the author, who seems generally quick to respond.

Good luck!

On Sunday, August 12, 2018 at 5:28:57 AM UTC+5:30, Tarun Kateja wrote:
>
> Hi Sachin,
>
> I also want to extract 68th round Household and Consumer expenditure data.
> I am a little confused and have never worked with Stata. Can you explain what
> the multiplier is and how to use it? And can you share your code to extract
> data from the .txt file?
>
> This will be a great help!
>
> Thanks
>
> On Monday, September 5, 2016 at 1:03:02 PM UTC+5:30, sachin wrote:
>>
>> Hi,
>> I have used 68th round data for agri consumption and poverty estimation
>> using STATA.
>> I am assuming that the raw data you are referring to is also available in
>> .txt format. As far as I know, the NSSO data has a highly structured format -
>> Schedule.Level>Block>Item No. The variables are not declared in the raw
>> data. These variables are to be understood from the "layout" file for that
>> specific round, which is released along with the raw data for that round.
>>
>> The data is a long string of characters. These are read in a specific
>> manner.
>> The layout file will specify how many characters must be read
>> together to form each variable. So it could look like -
>> v11 1-3 v12 4-8 v13 9-10 v14 11-13 v15 14-14 v16 15-15 v17 16-18 v18
>> 19-20 v19 21-22 v110 23-24 v111 25-25 v112 26-26 and so on.
>>
>> Now, this is the specification that your software then uses to read
>> the raw data file (.txt), and a table of the required variables is
>> obtained for analysis. In a sense, the raw data is always excerpted for
>> analysis. And for this one begins with the layout file to check the
>> variables of interest and how they are encoded in the data.
>>
>> I am not sure this helps. With STATA it is a bit easier. With R, I do
>> not know how to assemble the same dataframe, although the analysis using
>> the variables will be a breeze.
>>
>> Best
>> Sachin
>>
>>
>>
>> On Sunday, 4 September 2016 15:27:55 UTC+5:30, Devdatta Tengshe wrote:
>>>
>>> Can you share the link where this data is available? That way we can
>>> have a look at it.
>>>
>>> Regards,
>>> Devdatta Tengshe
>>> Ph: 735-358-0782
>>>
>>> On 04-Sep-2016 3:01 pm, "Jagriti Arora" <reach....@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Can anyone tell me how I can make sense of the raw data NSSO provides
>>>> on its website?
>>>> I tried converting the XML to a dataframe in R, to no avail. I now have
>>>> an Excel sheet with references and variables that have not been previously
>>>> declared.
>>>> Can anyone help? I'm looking for data from the 38th and 66th rounds.
>>>>
>>>> Thanks and regards!
>>>>
>>>> --
>>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>>> about us by visiting http://datameet.org
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "datameet" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to datameet+u...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
NSS_68th_Type2.do
Government of India.
Data Processing Division.
National Sample Survey Office.
164, Gopal Lal Thakur Road, Kolkata-108.
Phone No. 2577-1128.
---------------------------------------
NSS 68th Round.

Final Multiplier-posted unit-level data for Schedule 1.0 Type-1 of NSS 68th round.

A) Data for Consumer Expenditure Survey (Sch. 1.0 Type 1). Bihar

There are 11 files belonging to 11 different levels as per layout (lay68_sch010_typ1.xls).

Data Files
------------------------------------------------------------------------------
No. of     Data              Remarks
records    Files name
------------------------------------------------------------------------------
   4582    R6801T1L01.txt    Level-01 records for the state of Bihar
   4582    R6801T1L02.txt    Level-02 records for the state of Bihar
   4582    R6801T1L03.txt    Level-03 records for the state of Bihar
  24014    R6801T1L04.txt    Level-04 records for the state of Bihar
 248509    R6801T1L05.txt    Level-05 records for the state of Bihar
  76604    R6801T1L06.txt    Level-06 records for the state of Bihar
  16097    R6801T1L07.txt    Level-07 records for the state of Bihar
  98096    R6801T1L08.txt    Level-08 records for the state of Bihar
 158094    R6801T1L09.txt    Level-09 records for the state of Bihar
   4582    R6801T1L10.txt    Level-10 records for the state of Bihar
 162284    R6801T1L11.txt    Level-11 records for the state of Bihar
------------------------------------------------------------------------------
Record length for data is 142.

B) Multiplier files for Schedule 1.0
--------------------------------------------------------------
No. of     No. of      Multiplier
records    bytes #     File name
--------------------------------------------------------------
  12784    3950256     mlt6801
--------------------------------------------------------------
Record length is 308.

# : All the levelwise data files and multiplier file have been converted to DOS for user's convenience. The "No. of Bytes" shown here is according to UNIX.
Note for users :
----------------
(1) These are text data with fixed record-length of 143 characters (including new-line character). First 126 bytes are data, next 6 bytes comprise of number of first stage units surveyed within a substratum for the sub-sample (NSS) and sub-sample combined (NSC) and next 10 bytes are weight or multiplier for the sub-sample (MLT). Last byte is for Newline character.

(2) The Layout of data is given in the MS Excel-file lay68_sch010_typ1.xls.

(3) For generating any estimate, one has to extract relevant portion of the data, and aggregate after applying the weights.

(4) Weights (or multipliers) are given at the end of each record from 133rd byte onwards. The weights (multipliers) are Sub-sample-wise, details of which are as given below :
(For description of subsample, please see Instructions, NSS 68th Round, Manual for field staff, Vol-I)

NSS, NSC and subsample-wise weights (all-subround multipliers)
-------------------------------------------------------------
NSS = Bytes 127-129 (3 bytes)
NSC = Bytes 130-132 (3 bytes)
MLT = Bytes 133-142 (10 bytes, assumed two places of decimal)
-------------------------------------------------------------
All records of a household will have same weight figure.
In case of those Blocks/Levels, where Item/Person Sl.No. is not applicable, the field is filled up with "00000".

(5) In the value fields (in Rs. or quantity or area etc.) only the numeric figure is given in datafile. The decimal point is to be assumed after looking at the type of that field in the printed schedule.

----------------------------------------------------------------
Use of subsample-wise weights (all-subround multipliers)
----------------------------------------------------------------
For generating subsample-wise estimates based on data of all subrounds taken together, either Subsample-1 households or Subsample-2 households are to be considered at one time. Subsample code is available in the data file.
(Please see layout of data).

Apply final weight (or all-subround multipliers) as follows :

final weight = MLT/100, if NSS=NSC
             = MLT/200 otherwise.
-----------------------------------------------------------------------------

(6) Common Primary Key for identification of a record for any schedule is :

FSU Serial Number           = 4(5)   (i.e., offset = 4th byte, length = 5 bytes)
Segment Number              = 32(1)
Second Stage Stratum Number = 33(1)
Household Number            = 34(2)
Level Number                = 36(2)
Item Code                   = 38(5)
-------------------------------------------------------------------------------

(7) List of Documents
---------------------
General Information ----- README68_S010T1.txt
Text Data Layout ----- lay68_sch010_typ1.xls
Subsample-wise multiplier (all-subround) file layout for schedule 1.0 ----- multlay68_010.xls
Blank schedule 0.0 ----- sch00-final_180411.doc
Blank schedule 1.0 ----- sch 1.0 type 1-final.doc
Estimation procedure note and related tables for stratum composition for 68th round in the Common folder ----- EST68_final.doc
State codes ----- State code.doc
Amendment to NIC 2008 ----- nic amendment_2008.pdf

Please note that Blank schedules are in the folder 68v-2 within the Common folder.
Please note that instructions for different schedules are in the folder 68v-1 within the Common folder.
------------------------------------------------------------------------------
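
The byte positions and the weighting rule in notes (1), (4), (5) and (6) above can be put together in a short Python sketch. It is an illustration under stated assumptions, not the NSSO's own procedure: the function names (`parse_key_and_weights`, `final_weight`) are hypothetical, and the sample record is synthetic, built only to have the right length of 142 characters.

```python
def parse_key_and_weights(line):
    """Slice the common primary key and weight fields from one 142-character record.

    README positions are 1-based (offset, length); Python slices are 0-based.
    """
    return {
        "fsu": line[3:8],            # FSU Serial Number: 4(5)
        "segment": line[31:32],      # Segment Number: 32(1)
        "stratum": line[32:33],      # Second Stage Stratum Number: 33(1)
        "household": line[33:35],    # Household Number: 34(2)
        "level": line[35:37],        # Level Number: 36(2)
        "item": line[37:42],         # Item Code: 38(5)
        "nss": int(line[126:129]),   # NSS: bytes 127-129
        "nsc": int(line[129:132]),   # NSC: bytes 130-132
        "mlt": int(line[132:142]),   # MLT: bytes 133-142, two implied decimals
    }


def final_weight(nss, nsc, mlt):
    """Apply the README rule: MLT/100 if NSS == NSC, otherwise MLT/200."""
    return mlt / 100 if nss == nsc else mlt / 200


# Synthetic record (not real data): 126 data bytes, then NSS, NSC and MLT.
sample = (
    "000" + "41234" + "0" * 23      # bytes 1-31, FSU serial number in bytes 4-8
    + "1" + "2" + "05" + "01"       # segment, stratum, household, level
    + "00000" + "0" * 84            # item code, rest of the data bytes
    + "004" + "008" + "0000123456"  # NSS, NSC, MLT
)

rec = parse_key_and_weights(sample.rstrip("\r\n"))
weight = final_weight(rec["nss"], rec["nsc"], rec["mlt"])
print(rec["fsu"], rec["household"], weight)
```

Because every record of a household carries the same weight, a population-level total is then the sum of each household's value multiplied by its final weight, computed for one subsample at a time as the note above describes.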