Re: [Tutor] reading variables in a data set?
On Sat, Jul 4, 2009 at 12:09 PM, Steven Buck wrote:
> I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to get
> a Stata ".dta" file into Python. In Stata the data set is an NXK matrix
> where N is the number of observations (households) and K is the number of
> variables.
> I gather it's now a list where each element of the list is an observation (a
> vector) for one household. The name of my list is "data"; I gather Python
> recognizes the first observation by: data[1] .
> Example,
> data = [X_1, X_2, X_3, . . . . , X_N] where each X_i for all i, is vector
> of household characteristics, eg X_1 = (age_1, wage_1, . . . , residence_1).
>
> I also have a list for variable names called "varname"; although I'm not
> sure the module I used to extract the ".dta" into Python also created a
> correspondence between the varname list and the data list--the python
> interpreter won't print anything when I type one of the variable names, I
> was hoping it would print out a vector of ages or the like.

varname is probably just a list of strings without any direct connection to the data.

> In anycase, I'd like to make a scatter plot in pylab, but don't know how to
> identify a variable in "data" (i.e. I'd like a vector listing the ages and
> another vector listing the wages of households). Perhaps, I need to run
> subroutine to collect each relevant data point to create a new list which I
> define as my variable of interest? From the above example, I'd like to
> create a list such as: age = [age_1, age_2, . . . , age_N] and likewise for
> wages.

You can use a list comprehension to collect columns from the data. If age is the first element of each observation (index 0), and wages the second (index 1), then

    ages = [ observation[0] for observation in data ]
    wages = [ observation[1] for observation in data ]

> Any help you could offer would be very much appreciated. Also, this is my
> first time using the python tutor, so let me know if I've used it
> appropriately or if I should change/narrow the structure of my question.

It's very helpful if you show us the code you have so far.

Kent
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
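The list-comprehension approach can be tied to the variable names with a small lookup. What follows is only a sketch: it assumes that varname lists the names in the same order as the fields of each observation, which the thread does not confirm. The sample values and the helper name column are made up for illustration.

```python
# Hypothetical sample standing in for the list read from the .dta file.
# Each observation is (age, wage, residence); the values are invented.
data = [
    (34, 21.5, "urban"),
    (51, 30.0, "rural"),
    (27, 18.25, "urban"),
]

# Assumption: names appear in the same order as the fields of an observation.
varname = ["age", "wage", "residence"]


def column(name):
    """Return the values of one variable across all observations."""
    idx = varname.index(name)  # position of the variable within an observation
    return [observation[idx] for observation in data]


ages = column("age")
wages = column("wage")
print(ages)   # [34, 51, 27]
print(wages)  # [21.5, 30.0, 18.25]
```

The two lists line up household by household, so they could be passed together to a plotting call such as pylab's scatter(ages, wages).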
Re: [Tutor] reading variables in a data set?
On 7/4/2009 9:09 AM Steven Buck said...
> Dear Python Tutor,
> I'm doing econometric work and am a new user of Python. I have read
> several of the tutorials, but haven't found them useful for a newbie
> problem I've encountered.
> I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to
> get a Stata ".dta" file into Python. In Stata the data set is an NXK
> matrix where N is the number of observations (households) and K is the
> number of variables.
> I gather it's now a list where each element of the list is an observation
> (a vector) for one household. The name of my list is "data"; I gather
> Python recognizes the first observation by: data[1] .
> Example,
> data = [X_1, X_2, X_3, . . . . , X_N] where each X_i for all i, is vector
> of household characteristics, eg X_1 = (age_1, wage_1, . . . , residence_1).
>
> I also have a list for variable names called "varname"; although I'm not
> sure the module I used to extract the ".dta" into Python also created a
> correspondence between the varname list and the data list--the python
> interpreter won't print anything when I type one of the variable names, I
> was hoping it would print out a vector of ages or the like.

Assuming you're working in the python console, roughly following the example on the source website for PyDTA:

    from PyDTA import Reader
    dta = Reader(file('input.dta'))
    fields = ','.join(['%s']*len(dta.variables()))
    ...

you might try starting with dir(dta.variables) or help(dta.variables). I didn't look, but the sources are available as well.

> In anycase, I'd like to make a scatter plot in pylab,

I think I'd use dictionaries along these lines:

    wages = { age_1: [ X_1, X_15, X_3...], age_2: [ X_2, X_5... ], }

> but don't know how to identify a variable in "data" (i.e. I'd like a
> vector listing the ages and another vector listing the wages of
> households).

I think poking into dta.variables will answer this one.

HTH,

Emile

> Perhaps, I need to run subroutine to collect each relevant data point to
> create a new list which I define as my variable of interest? From the
> above example, I'd like to create a list such as:
> age = [age_1, age_2, . . . , age_N] and likewise for wages.
> Any help you could offer would be very much appreciated. Also, this is my
> first time using the python tutor, so let me know if I've used it
> appropriately or if I should change/narrow the structure of my question.
> Thanks
> Steve
>
> --
> Steven Buck
> Ph.D. Student
> Department of Agricultural and Resource Economics
> University of California, Berkeley
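The dictionary idea can be sketched with plain Python, without relying on anything in PyDTA. This is one possible reading of it: the version below maps each age to the wages seen at that age rather than to whole observations, and the sample values and the name wages_by_age are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (age, wage) observations; the values are invented.
data = [(30, 20.0), (45, 31.5), (30, 22.0), (45, 29.0), (52, 40.0)]

# Map each age to the list of wages observed at that age.
# defaultdict(list) creates an empty list the first time an age is seen.
wages_by_age = defaultdict(list)
for age, wage in data:
    wages_by_age[age].append(wage)

# Print the groups in order of age.
for age in sorted(wages_by_age):
    print(age, wages_by_age[age])
```

Grouping like this suits summaries per age. For a scatter plot, two parallel lists of ages and wages are usually the simpler input.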
Re: [Tutor] reading variables in a data set?
Pardon me, I don't have time to address all of your questions; however,

Steven Buck wrote:
> I gather it's now a list where each element of the list is an observation
> (a vector) for one household. The name of my list is "data"; I gather
> Python recognizes the first observation by: data[1] .

No, the first item in a list is going to be data[0], not data[1]. Python counts from 0, not 1. Unless by the "first observation" you mean the "one after the zeroth observation", but that is not the common usage of that term.

> Example,
> data = [X_1, X_2, X_3, . . . . , X_N] where each X_i for all i, is vector
> of household characteristics, eg X_1 = (age_1, wage_1, . . . , residence_1).
>
> I also have a list for variable names called "varname"; although I'm not
> sure the module I used to extract the ".dta" into Python also created a
> correspondence between the varname list and the data list--the python
> interpreter won't print anything when I type one of the variable names, I
> was hoping it would print out a vector of ages or the like.

It should output whatever is contained in the variable, if you're at the interpreter. Sounds like you're not getting your data in.

    >>> x = ["hello", "world!", 42]
    >>> x
    ['hello', 'world!', 42]

Hope that helps a little bit, good luck!
-Luke
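The point about counting from 0 can be checked with a short, self-contained example. The observations below are invented stand-ins for the real data list.

```python
# Hypothetical observations, each (age, wage); the values are invented.
data = [(34, 21.5), (51, 30.0), (27, 18.25)]

first = data[0]    # the first observation
second = data[1]   # the second observation, not the first
last = data[-1]    # negative indices count back from the end

print(first)       # (34, 21.5)
print(len(data))   # 3; valid indices run from 0 to len(data) - 1

# Indexing one past the end raises IndexError.
try:
    data[len(data)]
except IndexError:
    print("data[3] is out of range")
```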
[Tutor] reading variables in a data set?
Dear Python Tutor,

I'm doing econometric work and am a new user of Python. I have read several of the tutorials, but haven't found them useful for a newbie problem I've encountered.

I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to get a Stata ".dta" file into Python. In Stata the data set is an NXK matrix where N is the number of observations (households) and K is the number of variables.

I gather it's now a list where each element of the list is an observation (a vector) for one household. The name of my list is "data"; I gather Python recognizes the first observation by: data[1] .

Example,
data = [X_1, X_2, X_3, . . . . , X_N] where each X_i, for all i, is a vector of household characteristics, e.g. X_1 = (age_1, wage_1, . . . , residence_1).

I also have a list for variable names called "varname"; although I'm not sure the module I used to extract the ".dta" into Python also created a correspondence between the varname list and the data list--the python interpreter won't print anything when I type one of the variable names; I was hoping it would print out a vector of ages or the like.

In any case, I'd like to make a scatter plot in pylab, but don't know how to identify a variable in "data" (i.e. I'd like a vector listing the ages and another vector listing the wages of households). Perhaps I need to run a subroutine to collect each relevant data point to create a new list which I define as my variable of interest? From the above example, I'd like to create a list such as: age = [age_1, age_2, . . . , age_N] and likewise for wages.

Any help you could offer would be very much appreciated. Also, this is my first time using the python tutor, so let me know if I've used it appropriately or if I should change/narrow the structure of my question.

Thanks
Steve

--
Steven Buck
Ph.D. Student
Department of Agricultural and Resource Economics
University of California, Berkeley