Re: [datameet] Extracting NSSO data

2018-08-14 Thread Tarun Kateja
Hi,

Hearty thanks to all for your responses.

Extracting the fixed-length fields is relatively easy, since there is a
separate data file for each level. My confusion is about the multiplier.
Why is there a separate multiplier file for the complete data when the last
10 bytes of each row also hold a multiplier?

How is the multiplier applied to the data? Do we simply multiply every
attribute in a row by the weight (calculated from the multiplier as given in
the Readme file)? If so, which weight: the multiplier from the separate
multiplier file, or the one in the last 10 bytes of each level's data file?

I can extract the data with Python if no manipulation or calculation is
needed and the byte positions alone give each attribute's value. The blogs
I found do not explain the multiplier properly, and everyone focuses on
software like Stata, but I need a conceptual understanding to use the
information in the most accurate way.

Thanks
Tarun Kateja
IIT Madras
Contact: (+91) 9092392724



Re: [datameet] Extracting NSSO data

2018-08-13 Thread Chandrasekhar S.
Greetings!

If you purchased the data from NSSO, it comes with a program (Nesstar) that
extracts the data for you. Use this program and it will export to
whichever format you would like, including STATA.

Hope this helps.

Chandrasekhar

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to datameet+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [datameet] Extracting NSSO data

2018-08-13 Thread GALEN PATRICK MURRAY
Hi Tarun,

Sachin is correct: you use the layout file to identify which positions in the
string of characters correspond to which variables. Even though I'm an R
user, I think this extraction is more easily done in STATA. I've attached my
STATA code for the 68th round extraction.

Since the NSSO data are samples, the multiplier acts as a survey weight, so
you can get population-level estimates based on the sampled survey
responses. Look at the readme (attached) for more information on how these
multipliers are used to calculate survey weights (especially this part):

 For generating subsample-wise estimates based on data of all
 subrounds taken together, either Subsample-1 households or
 Subsample-2 households are to be considered at one time.
 Subsample code is available in the data file.
 (Please see layout of data).

 Apply final weight (or all-subround multipliers) as follows:

 final weight = MLT/100, if NSS=NSC
              = MLT/200, otherwise.
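
As a rough illustration of the readme rule quoted above, here is a small Python sketch. It assumes MLT, NSS and NSC have already been extracted as numbers using the layout file; the `value` field and the two rows are made up for the example. It also shows the usual way a survey weight enters an estimate: the weight is not multiplied into every attribute of a row, it is applied when rows are aggregated into a total or a mean.

```python
def final_weight(mlt, nss, nsc):
    """Readme rule: MLT/100 if NSS equals NSC, otherwise MLT/200."""
    return mlt / 100 if nss == nsc else mlt / 200


def weighted_total(rows, value_key):
    """Estimated population total: sum of value * final weight over rows."""
    return sum(
        row[value_key] * final_weight(row["MLT"], row["NSS"], row["NSC"])
        for row in rows
    )


def weighted_mean(rows, value_key):
    """Estimated population mean: weighted total divided by the sum of weights."""
    weight_sum = sum(
        final_weight(row["MLT"], row["NSS"], row["NSC"]) for row in rows
    )
    return weighted_total(rows, value_key) / weight_sum


# Made-up rows, only to show the arithmetic; "value" is a hypothetical field.
rows = [
    {"MLT": 20000, "NSS": 4, "NSC": 4, "value": 10.0},  # weight 20000/100
    {"MLT": 20000, "NSS": 2, "NSC": 4, "value": 40.0},  # weight 20000/200
]

weights = [final_weight(r["MLT"], r["NSS"], r["NSC"]) for r in rows]
total = weighted_total(rows, "value")
mean = weighted_mean(rows, "value")
```

Which multiplier to feed in (the separate multiplier file or the trailing bytes of each record) is not settled by this sketch; the readme for the round is the place to check.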

Also, I found this blog very helpful for explaining NSSO data; the comments
in particular may ask and answer common questions that you have. You can
even write to the author, who seems generally quick to respond. Good luck!







NSS_68th_Type2.do
Description: Binary data

Government of India.
Data Processing Division.
National Sample Survey Office.
164, Gopal Lal Thakur Road, Kolkata-108.
Phone No. 2577-1128.
---
NSS 68th Round.
Final 

Re: [datameet] Extracting NSSO data

2018-08-11 Thread Tarun Kateja
Hi Sachin,

I also want to extract the 68th round Household and Consumer Expenditure
data. I am a little confused and have never worked with Stata. Can you
explain what the multiplier is and how to use it? And can you share your
code to extract data from the .txt file?

This would be a great help!

Thanks




Re: [datameet] Extracting NSSO data

2016-09-05 Thread sachin
Hi, 
I have used the 68th round data for agri consumption and poverty estimation
using STATA.
I am assuming that the raw data you are referring to is also available in
.txt format. As far as I know, the NSSO data has a highly structured format:
Schedule.Level > Block > Item No. The variables are not declared in the raw
data; they have to be read off the "layout" file for that specific round,
which is released along with the raw data.

The data is a long string of characters, read in a specific manner.
The layout file specifies how many characters must be read together to
form each variable. So it could look like:
v11 1-3 v12 4-8 v13 9-10 v14 11-13 v15 14-14 v16 15-15 v17 16-18 v18 19-20
v19 21-22 v110 23-24 v111 25-25 v112 26-26 and so on.

Your software then uses this specification to read the raw data file (.txt)
and produce a table of the required variables for analysis. In a sense, the
raw data is always excerpted for analysis, and for this one begins with the
layout file to check the variables of interest and how they are encoded in
the data.
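
A minimal Python sketch of the reading step described above, for anyone not using STATA. The variable names and positions are copied from the illustrative layout in this message, not from a real layout file, and `parse_file` is a hypothetical helper; the real names and positions must come from the layout file of the round in use.

```python
# Hypothetical layout: (name, start, end) with 1-based inclusive character
# positions, taken from the example above. Replace with the real layout file.
LAYOUT = [
    ("v11", 1, 3),
    ("v12", 4, 8),
    ("v13", 9, 10),
    ("v14", 11, 13),
    ("v15", 14, 14),
    ("v16", 15, 15),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of stripped strings."""
    # The layout is 1-based and inclusive; Python slices are 0-based and
    # end-exclusive, so positions start..end become line[start - 1:end].
    return {name: line[start - 1:end].strip() for name, start, end in layout}


def parse_file(path, layout=LAYOUT):
    """Yield one dict per non-empty line of a fixed-width text file."""
    with open(path, encoding="ascii", errors="replace") as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line:
                yield parse_record(line, layout)


# A made-up 15-character record, only to show the slicing.
sample = "123456789012345"
record = parse_record(sample)
```

Values are kept as strings here so that codes with leading zeros survive; converting selected fields to numbers is left to the analysis step.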

I am not sure this helps. With STATA this is fairly easy. With R, I do not
know how to assemble the same dataframe, although the analysis using the
variables will be a breeze.

Best
Sachin 
 


On Sunday, 4 September 2016 15:27:55 UTC+5:30, Devdatta Tengshe wrote:
>
> Can you share the link where this data is available? That way we can have 
> a look at it.
>
> Regards,
> Devdatta Tengshe
> Ph: 735-358-0782
>
> On 04-Sep-2016 3:01 pm, "Jagriti Arora"  
> wrote:
>
>> Hi,
>> Can anyone tell me how I can make sense of the raw data NSSO provides on
>> its website?
>> I tried converting the XML to a dataframe in R, to no avail. I now have
>> an Excel sheet with references and variables that have not been previously
>> declared.
>> Can anyone help? I'm looking for data from the 38th and 66th rounds.
>>
>> Thanks and regards!