I thought the following might be of interest to people who work in microeconometrics.
Allin Cottrell

---------- Forwarded message ----------
List-Post: gretl-users@gretlml.univpm.it
Date: Thu, 25 Oct 2007 11:44:59 -0400 (EDT)
From: Allin Cottrell
To: Mohammad (Mitu) Ashraf
Subject: Re: Gretl and PUMS Data

On Wed, 24 Oct 2007, Mohammad (Mitu) Ashraf wrote:

> I am an associate professor of economics at UNC-Pembroke. I just
> started using Gretl for my research. I am switching from SAS.
> First of all, I want to thank you and your colleagues for
> developing such a wonderful tool.

Thanks!

> I have been trying to figure out how to use Gretl for Public Use
> Micro Data Sample (PUMS). I am wondering if you can point me in
> the right direction. Your response is greatly appreciated.

I haven't made much use of PUMS data myself, but here's what I
found on quick experimentation.

I went to

  http://factfinder.census.gov/home/en/acs_pums_2006.html

and downloaded the 2006 Population Records for North Carolina in
CSV format. Gretl was close to being able to read this straight
off, but there was one problem.

When gretl encounters non-numeric data for a particular variable
in a CSV import it treats the values of that variable as strings,
constructs a numeric coding, and creates a "string table" that
presents the coding to the user. BUT this is done only if
non-numeric data are encountered in the first data row for the
variable in question. That is, if we read (apparently) numeric
data on rows 1 to k-1, then encounter non-numeric data on row k,
we flag an error and stop reading.

The trouble is that some of the PUMS variables are codings in
which some, but not all, values contain non-numeric characters.
One example is NAICSP, the "NAICS Industry Code", whose values
include both 1133 and 113M.

Here's a solution, perhaps not permanent if we can think of
something better: I've added a new parameter to the "set" command,
namely "codevars". You can do, for example,

  set codevars NAICSP SOCP

prior to importing a CSV file.
This tells gretl that the variables NAICSP and SOCP should be
interpreted as string-coded, even if the first values look to be
numeric. (In general you say: "set codevars <varnames>", where
<varnames> is a space-separated list of names. You can say "set
codevars null" to clean out the list.)

For the North Carolina PUMS data, this now works to open the file
in gretl:

  set codevars NAICSP SOCP
  open ss06pnc.csv

This feature is in CVS gretl, and also in the current Windows
snapshot at

  http://ricardo.ecn.wfu.edu/pub/gretl/gretl_install.exe

You may have to engage in some trial and error. I've beefed up the
error reporting a little. So, in relation to the example above, if
you do

  set codevars NAICSP
  open ss06pnc.csv

you then see:

  Variable 106 (SOCP), observation 12, '434XXX': Extraneous character 'X' in data

which in effect tells you that you need to add SOCP to the
"codevars" list -- if it seems to you that 434XXX is a legitimate
value for that variable.
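
To cut down on the trial and error, one option is to scan the CSV
file before opening it in gretl and list the columns that would
trip the import. The following is only a rough sketch in Python,
not part of gretl: the function names (looks_numeric,
find_codevars) and the sample rows are made up for illustration,
the numeric test via float() only approximates whatever check gretl
applies, and treating blank cells as missing values is an
assumption. It flags a column when its first data value looks
numeric but some later value does not.

```python
import csv
import io


def looks_numeric(value):
    """True if the cell is blank or parses as a float (an approximation of gretl's check)."""
    value = value.strip()
    if not value:
        return True  # assumption: blank cells are missing values, not strings
    try:
        float(value)
    except ValueError:
        return False
    return True


def find_codevars(csv_file):
    """Return column names whose first data value looks numeric but a later one does not."""
    reader = csv.reader(csv_file)
    header = next(reader)
    first_row = next(reader, None)
    if first_row is None:
        return []
    # Columns that are already non-numeric in the first data row are
    # string-coded by gretl automatically, so they are not candidates.
    candidates = {
        i for i, v in enumerate(first_row)
        if i < len(header) and looks_numeric(v)
    }
    flagged = set()
    for row in reader:
        for i in candidates - flagged:
            if i < len(row) and not looks_numeric(row[i]):
                flagged.add(i)
    return [header[i] for i in sorted(flagged)]


# Tiny made-up sample; ID and NOTE are hypothetical columns, and only the
# values 1133, 113M and 434XXX come from the message above.
sample = io.StringIO(
    "ID,NAICSP,SOCP,NOTE\n"
    "1,1133,434171,abc\n"
    "2,113M,434XXX,def\n"
)
names = find_codevars(sample)
print("set codevars " + " ".join(names))
```

For a real file you would pass an open file handle instead of the
sample, e.g. open("ss06pnc.csv", newline=""), and paste the printed
line into your gretl script ahead of the "open" command. Whether a
flagged column really is a coding, rather than a data error, is
still for you to judge.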