I thought the following might be of interest to people who work in 
microeconometrics.

Allin Cottrell

---------- Forwarded message ----------
List-Post: gretl-users@gretlml.univpm.it
Date: Thu, 25 Oct 2007 11:44:59 -0400 (EDT)
From: Allin Cottrell 
To: Mohammad (Mitu) Ashraf
Subject: Re: Gretl and PUMS Data

On Wed, 24 Oct 2007, Mohammad (Mitu) Ashraf wrote:

> I am an associate professor of economics at UNC-Pembroke. I just 
> started using Gretl for my research. I am switching from SAS. 
> First of all, I want to thank you and your colleagues for 
> developing such a wonderful tool.

Thanks!

> I have been trying to figure out how to use Gretl for Public Use 
> Micro Data Sample (PUMS). I am wondering if you can point me in 
> the right direction. Your response is greatly appreciated.

I haven't made much use of PUMS data myself, but here's what I 
found on quick experimentation.  I went to 

http://factfinder.census.gov/home/en/acs_pums_2006.html

and downloaded the 2006 Population Records for North Carolina in 
CSV format.  Gretl was close to being able to read this straight 
off, but there was one problem.  

When gretl encounters non-numeric data for a particular variable 
in a CSV import it treats the values of that variable as strings, 
constructs a numeric coding, and creates a "string table" that 
presents the coding to the user.  BUT this is done only if 
non-numeric data are encountered in the first data row for the 
variable in question.  That is, if we read (apparently) numeric 
data on rows 1 to k-1, then encounter non-numeric data on row k, 
we flag an error and stop reading.

The trouble is that some of the PUMS variables are codings, some 
but not all values of which contain non-numeric characters.  For 
example, NAICSP, the "NAICS Industry Code", has values (among 
others) of 1133 and 113M.  
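To make the first-row rule concrete, here is a minimal Python sketch of the behaviour described above. It is a hypothetical illustration, not gretl's implementation. The function names, the made-up ID column and the error wording are assumptions, and the string-coding step is omitted. Each column's type is fixed by its first data row, so a value like 113M appearing later in an apparently numeric column stops the read.

```python
# Hypothetical sketch of the "first data row decides" rule; NOT gretl's source code.
import csv
import io

def looks_numeric(s):
    """True if the field parses as a number."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def read_csv_first_row_rule(text):
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    # Column type is decided once, from the first data row only.
    numeric = [looks_numeric(v) for v in rows[0]]
    data = {name: [] for name in header}
    for obs, row in enumerate(rows, start=1):
        for j, (name, val) in enumerate(zip(header, row)):
            if numeric[j]:
                if not looks_numeric(val):
                    # Mirrors the text: flag an error and stop reading.
                    raise ValueError(
                        f"Variable {j + 1} ({name}), observation {obs}, "
                        f"'{val}': non-numeric data")
                data[name].append(float(val))
            else:
                # String column; building the numeric coding is left out here.
                data[name].append(val)
    return data
```

With this sketch, `read_csv_first_row_rule("ID,NAICSP\n1,1133\n2,113M\n")` raises a ValueError naming NAICSP at observation 2, because 1133 on the first row made the column look numeric.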

Here's a solution, perhaps not permanent if we can think of 
something better: I've added a new parameter to the "set" command, 
namely "codevars".  You can do, for example,

 set codevars NAICSP SOCP

prior to importing a CSV file.  This tells gretl that the 
variables NAICSP and SOCP should be interpreted as string-coded, 
even if the first values look to be numeric.

(In general you say: "set codevars <varnames>", where <varnames> 
is a space-separated list of names.  You can say "set codevars 
null" to clean out the list.)
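For illustration, the effect of the codevars list can be modelled in a few lines of Python. This is a hypothetical sketch, not gretl's implementation. A column is string-coded if it is named in the list or if its first value is non-numeric, and an empty list corresponds to "set codevars null". The `encode` helper shows one plausible way to build a string table. Numbering codes from 1 in order of first appearance is an assumption, and gretl's actual coding may differ. The AGEP column in the usage note is only an illustrative third variable.

```python
# Hypothetical model of "set codevars <varnames>"; NOT gretl's source code.

def looks_numeric(s):
    """True if the field parses as a number."""
    try:
        float(s)
        return True
    except ValueError:
        return False

def string_coded(header, first_row, codevars=()):
    """Names of columns to treat as string-coded: those listed in codevars,
    plus any whose first data value is non-numeric."""
    forced = set(codevars)  # empty tuple plays the role of "set codevars null"
    return {name for name, val in zip(header, first_row)
            if name in forced or not looks_numeric(val)}

def encode(values):
    """Map each distinct string to an integer code (1, 2, ... in order of
    first appearance); return the codes and the resulting string table."""
    table = {}
    codes = [table.setdefault(v, len(table) + 1) for v in values]
    return codes, table
```

For a first row of `["45", "1133", "434010"]` under the header `["AGEP", "NAICSP", "SOCP"]`, no column is string-coded by default, while `codevars=["NAICSP", "SOCP"]` forces both code variables to be treated as strings even though their first values look numeric.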

For the North Carolina PUMS data, this now works to open the file 
in gretl:

 set codevars NAICSP SOCP
 open ss06pnc.csv

This feature is in CVS gretl, and also in the current Windows 
snapshot at

http://ricardo.ecn.wfu.edu/pub/gretl/gretl_install.exe

You may have to engage in some trial and error.  I've beefed up 
the error reporting a little.  So, in relation to the example 
above, if you do

 set codevars NAICSP 
 open ss06pnc.csv

you then see:

 Variable 106 (SOCP), observation 12, '434XXX':
 Extraneous character 'X' in data

which in effect tells you that you need to add SOCP to the 
"codevars" list -- if it seems to you that 434XXX is a legitimate 
value for that variable.
