Re: [datameet] Extracting NSSO data

GALEN PATRICK MURRAY Mon, 13 Aug 2018 16:06:18 -0700

Hi Tarun,

Sachin is correct: you use the layout file to identify which positions in the 
string of characters correspond to which variables. Even though I'm an R 
user, I think this extraction is more easily done in STATA. I've attached my 
STATA code for the 68th round extraction.
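
For anyone who would rather stay outside STATA, here is a minimal Python 
sketch of the same idea. It is an illustration, not a translation of the 
attached do-file. It assumes the layout has been typed out by hand as 
(name, first column, last column) triples with 1-based, inclusive positions. 
The names and positions are the illustrative ones from Sachin's message 
further down, not the real 68th round layout, and parse_record and 
read_fixed_width are hypothetical helpers.

```python
# Illustrative layout only: (variable name, first column, last column),
# 1-based and inclusive, copied from the example in Sachin's message.
LAYOUT = [
    ("v11", 1, 3),
    ("v12", 4, 8),
    ("v13", 9, 10),
    ("v14", 11, 13),
    ("v15", 14, 14),
]


def parse_record(line, layout=LAYOUT):
    """Cut one fixed-width record into a dict of raw string fields."""
    # Python slices are 0-based and end-exclusive, so the column range
    # a-b becomes line[a - 1:b].
    return {name: line[first - 1:last] for name, first, last in layout}


def read_fixed_width(path, layout=LAYOUT):
    """Parse every line of a fixed-width text file (path is up to you)."""
    with open(path, encoding="ascii") as handle:
        return [parse_record(line.rstrip("\r\n"), layout) for line in handle]


# Made-up 14-character record, just to show the slicing.
sample = "68112345010421"
print(parse_record(sample))
```

The fields come back as strings on purpose, since codes with leading zeros 
are easy to damage by converting to numbers too early.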


Since the NSSO data are samples, the multiplier acts as a survey weight, so 
you can get population-level estimates based on the sampled survey 
responses. Look at the readme (attached) for more information on how these 
multipliers are used to calculate survey weights (especially this part):

 For generating subsample-wise estimates based on data of all 
    subrounds taken together, either Subsample-1 households or 
    Subsample-2 households are to be considered at one time.  
    Subsample code is available in the data file. 
    (Please see layout of data).   
      
     Apply final weight (or all-subround multipliers) as follows :
     
     final weight = MLT/100,   if NSS=NSC
                  = MLT/200    otherwise.
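
To make the population-level estimate point concrete, here is a small Python 
sketch with made-up numbers. Once each household has a final weight, 
multiplying its value by that weight and summing gives an estimated 
population total, and dividing by the sum of the weights gives a weighted 
mean. The function names and the figures are hypothetical.

```python
def weighted_total(values, weights):
    """Estimated population total: sum of value * weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimated population mean: weighted total over the sum of weights."""
    return weighted_total(values, weights) / sum(weights)


# Three made-up sample households: monthly expenditure and final weight.
expenditure = [1000.0, 2000.0, 4000.0]
final_weight = [100.0, 50.0, 50.0]

print(weighted_total(expenditure, final_weight))  # 400000.0
print(weighted_mean(expenditure, final_weight))   # 2000.0
```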

Also, I found this blog <https://zakku78.wordpress.com/category/nsso-data/> 
very helpful for explaining NSSO data. The comments in particular may ask and 
answer common questions that you have. You can even write to the author, who 
seems generally quick to respond. Good luck!




On Sunday, August 12, 2018 at 5:28:57 AM UTC+5:30, Tarun Kateja wrote:
>
> Hi Sachin,
>
> I also want to extract 68th round Household and Consumer expenditure data. 
> I am a little confused and have never worked with Stata. Can you explain 
> what the multiplier is and how to use it? And can you share your code to 
> extract data from the .txt file? 
>
> This will be a great help!
>
> Thanks
>
> On Monday, September 5, 2016 at 1:03:02 PM UTC+5:30, sachin wrote:
>>
>> Hi, 
>> I have used 68th round data for agri consumption and poverty estimation 
>> using STATA. 
>> I am assuming that the raw data you are referring to is also available in 
>> .txt format. As far as I know, the NSSO data has a highly structured format - 
>> Schedule.Level>Block>Item No. The variables are not declared in the raw 
>> data. They have to be read off the "layout" file for that specific round, 
>> which is released along with the raw data. 
>>
>> The data is a long string of characters. These are read in a specific 
>> manner. The layout file will specify how many characters must be read 
>> together to form each variable. So it could look like - 
>> v11 1-3 v12 4-8 v13 9-10 v14 11-13 v15 14-14 v16 15-15 v17 16-18 v18 
>> 19-20 v19 21-22 v110 23-24 v111 25-25 v112 26-26 and so on. 
>>
>> Now, this is the data that is then called from your software, to be read 
>> from a raw data file (.txt) and then a table of required variables is 
>> obtained for analysis. In a sense, the raw data is always excerpted for 
>> analysis. And for this one begins with the layout file to check the 
>> variables of interest and how they are encoded in the data.
>>
>> I am not sure this helps. With STATA it is fairly easy. With R, I do 
>> not know how to assemble the same dataframe, although the analysis using 
>> the variables will be a breeze.
>>
>> Best
>> Sachin 
>>  
>>
>>
>> On Sunday, 4 September 2016 15:27:55 UTC+5:30, Devdatta Tengshe wrote:
>>>
>>> Can you share the link where this data is available? That way we can 
>>> have a look at it.
>>>
>>> Regards,
>>> Devdatta Tengshe
>>> Ph: 735-358-0782
>>>
>>> On 04-Sep-2016 3:01 pm, "Jagriti Arora" <reach....@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Can anyone tell me how I can make sense of the raw data NSSO provides 
>>>> on its website?
>>>> I tried converting the XML to dataframe in R, to no avail. I, now, have 
>>>> an excel sheet with references and variables that have not been previously 
>>>> declared.
>>>> Can anyone help? I'm looking for data from 38th and 66th round.
>>>>
>>>> Thanks and regards!
>>>>
>>>> -- 
>>>> Datameet is a community of Data Science enthusiasts in India. Know more 
>>>> about us by visiting http://datameet.org
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "datameet" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to datameet+u...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>


NSS_68th_Type2.do
Description: Binary data

Government of India.
Data Processing Division.
National Sample Survey Office.
164, Gopal Lal Thakur Road, Kolkata-108.
Phone No. 2577-1128.
---------------------------------------
NSS 68th Round.
Final Multiplier-posted unit-level data 
for Schedule 1.0 Type-1 of NSS 68th round.


A) Data for Consumer Expenditure Survey (Sch. 1.0 Type 1). Bihar
   There are 11 files belonging to 11 different levels as per layout 
   (lay68_sch010_typ1.xls).

Data Files
------------------------------------------------------------------------------
    No. of      Data file name   Remarks
   records
------------------------------------------------------------------------------
      4582      R6801T1L01.txt   Level-01 records   for the state of Bihar   
      4582      R6801T1L02.txt   Level-02 records   for the state of Bihar   
      4582      R6801T1L03.txt   Level-03 records   for the state of Bihar    
     24014      R6801T1L04.txt   Level-04 records   for the state of Bihar    
    248509      R6801T1L05.txt   Level-05 records   for the state of Bihar    
     76604      R6801T1L06.txt   Level-06 records   for the state of Bihar    
     16097      R6801T1L07.txt   Level-07 records   for the state of Bihar    
     98096      R6801T1L08.txt   Level-08 records   for the state of Bihar    
    158094      R6801T1L09.txt   Level-09 records   for the state of Bihar   
      4582      R6801T1L10.txt   Level-10 records   for the state of Bihar    
    162284      R6801T1L11.txt   Level-11 records   for the state of Bihar    
------------------------------------------------------------------------------
            
------------------------------------------------------------------------------
Record length for data is 142.


B) Multiplier files for Schedule 1.0 
--------------------------------------------------------------
      No. of                     No. of            Multiplier
     records                      bytes #           File name
--------------------------------------------------------------
       12784                    3950256               mlt6801      
--------------------------------------------------------------
Record length is 308.

# : All the level-wise data files and the multiplier file have been converted
    to DOS format for the user's convenience.
    The "No. of bytes" shown here is according to UNIX.
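
As a quick consistency check on the multiplier table above (an observation, 
not something the README states): the listed byte count matches the record 
count if each record is 308 characters plus one newline byte, which fits the 
remark that sizes are reported as on UNIX. A short Python check, under that 
one-byte-newline assumption:

```python
records = 12784        # "No. of records" for mlt6801
record_length = 308    # "Record length is 308"
newline_bytes = 1      # assumption: a single LF per record, as on UNIX

unix_size = records * (record_length + newline_bytes)
print(unix_size)  # 3950256, the "No. of bytes" shown in the table

# With DOS line endings (CR+LF) each record would take one more byte,
# so the delivered file would be expected to be somewhat larger.
dos_size = records * (record_length + 2)
print(dos_size)
```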

Note for users :
----------------
(1)  These are text data with a fixed record length of 143 characters
     (including the new-line character).
     The first 126 bytes are data, the next 6 bytes comprise the number of
     first stage units surveyed within a substratum for the sub-sample (NSS)
     and for the sub-samples combined (NSC), and the next 10 bytes are the
     weight or multiplier for the sub-sample (MLT). The last byte is the
     new-line character.
     
(2)  The Layout of data is given in the MS Excel-file lay68_sch010_typ1.xls.
   
(3)  For generating any estimate, one has to extract relevant portion
     of the data, and aggregate after applying the weights.
     
(4)  Weights (or multipliers) are given at the end of each record
     from 133rd byte onwards. The weights (multipliers) are
     Sub-sample-wise, details of which are as given below :
     (For description of subsample, please see Instructions, 
     NSS 68th Round, Manual for field staff, Vol-I)

     
     NSS,NSC and subsample-wise weights (all-subround multipliers)   
     -------------------------------------------------------------
     NSS = Bytes 127-129 (3 bytes)
     NSC = Bytes 130-132 (3 bytes)
     MLT = Bytes 133-142 (10 bytes, assumed two places of decimal)
     -------------------------------------------------------------
    
     All records of a household will have the same weight figure.
 
    In case of those Blocks/Levels, where Item/Person Sl.No. is not
    applicable, the field is filled up with  "00000".
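
    The byte positions in note (4) can be sketched in Python as below. This
    is an illustration only: split_record is a hypothetical helper, the
    sample record is made up, and it is assumed that the NSS, NSC and MLT
    fields contain only digits. MLT is kept as the raw integer here; no
    scaling is applied at this step.

```python
def split_record(line):
    """Split one 142-character record into data, NSS, NSC and MLT."""
    line = line.rstrip("\r\n")
    if len(line) != 142:
        raise ValueError(f"expected 142 characters, got {len(line)}")
    return {
        "data": line[0:126],        # bytes 1-126
        "nss": int(line[126:129]),  # bytes 127-129
        "nsc": int(line[129:132]),  # bytes 130-132
        "mlt": int(line[132:142]),  # bytes 133-142, raw digits
    }


# Made-up record: 126 filler characters, NSS=004, NSC=008, MLT=0000123456.
sample = "0" * 126 + "004" + "008" + "0000123456" + "\n"
print(split_record(sample)["mlt"])  # 123456
```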
    
(5) In the value fields (in Rs. or quantity or area etc.) only the numeric
    figure is given in the datafile. The decimal point is to be assumed after
    looking at the type of that field in the printed schedule.
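
    A minimal Python sketch of applying an assumed decimal point follows.
    The number of decimal places has to be looked up per field in the
    printed schedule; the value 2 used here is only an example, and
    with_implied_decimal is a hypothetical helper.

```python
from decimal import Decimal


def with_implied_decimal(raw, places):
    """Interpret a digit string as a number with `places` decimal digits."""
    # scaleb(-places) shifts the decimal point left without float rounding.
    return Decimal(raw.strip()).scaleb(-places)


print(with_implied_decimal("0012550", 2))  # 125.50
print(with_implied_decimal("0012550", 0))  # 12550
```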

----------------------------------------------------------------
   Use of subsample-wise weights (all-subround multipliers) 
----------------------------------------------------------------

    For generating subsample-wise estimates based on data of all 
    subrounds taken together, either Subsample-1 households or 
    Subsample-2 households are to be considered at one time.  
    Subsample code is available in the data file. 
    (Please see layout of data).   
      
     Apply final weight (or all-subround multipliers) as follows :
     
     final weight = MLT/100,   if NSS=NSC
                  = MLT/200    otherwise.
    
-----------------------------------------------------------------------------
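
The final-weight rule above can be written as a short Python function. This
is a sketch: final_weight is a hypothetical name, and MLT, NSS and NSC are
assumed to have been read as integers from bytes 133-142, 127-129 and
130-132 of the record.

```python
def final_weight(mlt, nss, nsc):
    """README rule: MLT/100 if NSS equals NSC, otherwise MLT/200."""
    divisor = 100 if nss == nsc else 200
    return mlt / divisor


print(final_weight(123456, 4, 4))  # 1234.56
print(final_weight(123456, 4, 8))  # 617.28
```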

(6)    Common Primary Key for identification of a record for any schedule is :

       FSU Serial Number                  =  4(5)   (i.e., offset = 4th byte,
                                                           length = 5 bytes)
       Segment Number                     = 32(1)
       Second Stage Stratum Number        = 33(1)
       Household Number                   = 34(2)
       Level Number                       = 36(2)
       Item Code                          = 38(5)
                    
-------------------------------------------------------------------------------
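
The offset(length) notation in note (6) translates to Python slices as
sketched below. Offsets are 1-based, so a field at offset o with length n is
line[o - 1:o - 1 + n]. The field names, the record_key helper and the sample
record are hypothetical; only the offsets and lengths come from the note.

```python
# (field name, 1-based offset, length) as listed in note (6).
KEY_FIELDS = [
    ("fsu_serial", 4, 5),
    ("segment", 32, 1),
    ("second_stage_stratum", 33, 1),
    ("household", 34, 2),
    ("level", 36, 2),
    ("item_code", 38, 5),
]


def record_key(line):
    """Return the primary-key fields of one record as raw strings."""
    return {
        name: line[offset - 1:offset - 1 + length]
        for name, offset, length in KEY_FIELDS
    }


# Made-up first 42 characters of a record, built field by field.
sample = "000" + "41234" + "0" * 23 + "1" + "2" + "07" + "05" + "00101"
print(record_key(sample))
```

Keeping the key fields as strings preserves leading zeros, which matters
when the same key is used to match records across the level-wise files.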


(7) List of Documents
---------------------

General Information                 -----  README68_S010T1.txt

Text Data Layout                    -----  lay68_sch010_typ1.xls

Subsample-wise multiplier (all-subround) file layout
for schedule 1.0                    -----  multlay68_010.xls

Blank schedule 0.0                  -----  sch00-final_180411.doc
Blank schedule 1.0                  -----  sch 1.0 type 1-final.doc

Estimation procedure note and related tables for stratum composition
for 68th round in the Common folder -----  EST68_final.doc

State codes                         -----  State code.doc

Amendment to NIC 2008               -----  nic amendment_2008.pdf

Please note that Blank schedules are in the 
folder 68v-2 within the Common folder.

Please note that instructions for different schedules are in the 
folder 68v-1 within the Common folder.

------------------------------------------------------------------------------
