Re: [Tutor] reading variables in a data set?

2009-07-05 Thread Kent Johnson

On Sat, Jul 4, 2009 at 12:09 PM, Steven Buck wrote:

> I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to get
> a Stata ".dta" file into Python. In Stata the data set is an N x K matrix
> where N is the number of observations (households) and K is the number of
> variables.
> I gather it's now a list where each element of the list is an observation (a
> vector) for one household. The name of my list is "data"; I gather Python
> recognizes the first observation by: data[1].
> For example,
> data = [X_1, X_2, X_3, ..., X_N], where each X_i is a vector
> of household characteristics, e.g. X_1 = (age_1, wage_1, ..., residence_1).
>
> I also have a list of variable names called "varname", but I'm not
> sure the module I used to extract the ".dta" into Python also created a
> correspondence between the varname list and the data list: the Python
> interpreter won't print anything when I type one of the variable names. I
> was hoping it would print out a vector of ages or the like.

varname is probably just a list of strings without any direct
connection to the data.
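If varname does list the names in the same order as the values inside each observation (an assumption about what PyDTA returns, not something the thread confirms), you can build that connection yourself. A minimal sketch with made-up sample data: zip(*data) transposes the rows into columns, and pairing each column with its name gives a dictionary from variable name to vector.

```python
# Hypothetical sample data standing in for what PyDTA returned:
# one tuple per household, values in the same order as varname.
varname = ["age", "wage", "residence"]
data = [
    (34, 12.5, "urban"),
    (51, 20.0, "rural"),
    (27, 9.75, "urban"),
]

# zip(*data) turns rows into columns; zipping with varname labels them.
columns = {name: list(col) for name, col in zip(varname, zip(*data))}

print(columns["age"])   # [34, 51, 27]
print(columns["wage"])  # [12.5, 20.0, 9.75]
```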

> In any case, I'd like to make a scatter plot in pylab, but don't know how to
> identify a variable in "data" (i.e. I'd like a vector listing the ages and
> another vector listing the wages of households). Perhaps I need to run a
> subroutine to collect each relevant data point to create a new list which I
> define as my variable of interest? From the above example, I'd like to
> create a list such as: age = [age_1, age_2, ..., age_N] and likewise for
> wages.

You can use a list comprehension to collect columns from the data. If
age is the first element of each observation (index 0), and wages the
second (index 1), then
ages = [ observation[0] for observation in data ]
wages = [ observation[1] for observation in data ]
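Here is the same list comprehension on a few made-up observations, as a sketch. It assumes (hypothetically) that varname gives the column order, so the positions can be looked up by name rather than hard-coded as 0 and 1.

```python
# Hypothetical sample observations: (age, wage, residence) per household.
varname = ["age", "wage", "residence"]
data = [
    (34, 12.5, "urban"),
    (51, 20.0, "rural"),
    (27, 9.75, "urban"),
]

# Find each variable's position by name instead of hard-coding 0 and 1.
age_idx = varname.index("age")
wage_idx = varname.index("wage")

# Collect one column per variable with a list comprehension.
ages = [observation[age_idx] for observation in data]
wages = [observation[wage_idx] for observation in data]
```

With matplotlib installed, calling `pylab.scatter(ages, wages)` and then `pylab.show()` should draw the scatter plot from these two lists.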

> Any help you could offer would be very much appreciated. Also, this is my
> first time using the Python Tutor list, so let me know if I've used it
> appropriately or if I should change/narrow the structure of my question.

It's very helpful if you show us the code you have so far.

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] reading variables in a data set?

2009-07-04 Thread Emile van Sebille


On 7/4/2009 9:09 AM Steven Buck said...

> Dear Python Tutor,
> I'm doing econometric work and am a new user of Python. I have read
> several of the tutorials, but haven't found them useful for a newbie
> problem I've encountered.
> I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to
> get a Stata ".dta" file into Python. In Stata the data set is an N x K
> matrix where N is the number of observations (households) and K is the
> number of variables.
> I gather it's now a list where each element of the list is an
> observation (a vector) for one household. The name of my list is
> "data"; I gather Python recognizes the first observation by: data[1].
> For example,
> data = [X_1, X_2, X_3, ..., X_N], where each X_i is a
> vector of household characteristics, e.g. X_1 = (age_1, wage_1, ...,
> residence_1).
>
> I also have a list of variable names called "varname", but I'm not
> sure the module I used to extract the ".dta" into Python also created a
> correspondence between the varname list and the data list: the Python
> interpreter won't print anything when I type one of the variable names.
> I was hoping it would print out a vector of ages or the like.


Assuming you're working in the Python console, starting from something
like the example on the PyDTA website:


from PyDTA import Reader
dta = Reader(open('input.dta', 'rb'))  # binary mode; file() exists only in Python 2
fields = ','.join(['%s']*len(dta.variables()))  # one '%s' per variable, comma-separated

... you might try starting with dir(dta.variables) or help(dta.variables).

I didn't look, but the sources are available as well.


 
> In any case, I'd like to make a scatter plot in pylab,


I think I'd use dictionaries along these lines:

  wages = { age_1: [X_1, X_15, X_3, ...],
            age_2: [X_2, X_5, ...],
          }
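One way to build a dictionary along those lines is collections.defaultdict, sketched below with made-up observations and under the assumption that age sits at index 0 of each observation. The name by_age is hypothetical; the values are whole observations, as in the sketch above.

```python
from collections import defaultdict

# Hypothetical observations: (age, wage, residence) per household.
data = [
    (34, 12.5, "urban"),
    (51, 20.0, "rural"),
    (34, 15.0, "rural"),
    (27, 9.75, "urban"),
]

# Group whole observations by their age value (index 0 assumed to be age).
by_age = defaultdict(list)
for observation in data:
    by_age[observation[0]].append(observation)

print(by_age[34])  # [(34, 12.5, 'urban'), (34, 15.0, 'rural')]
```

defaultdict(list) creates an empty list the first time a new age appears, so there is no need to check whether the key already exists.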


> but don't know how
> to identify a variable in "data" (i.e. I'd like a vector listing the
> ages and another vector listing the wages of households).


I think poking into dta.variables will answer this one.

HTH,

Emile

> Perhaps I
> need to run a subroutine to collect each relevant data point to create a
> new list which I define as my variable of interest? From the above
> example, I'd like to create a list such as: age = [age_1, age_2, ...,
> age_N] and likewise for wages.
>
> Any help you could offer would be very much appreciated. Also, this is
> my first time using the Python Tutor list, so let me know if I've used it
> appropriately or if I should change/narrow the structure of my question.
>
> Thanks
>
> Steve
>
> --
> Steven Buck
> Ph.D. Student
> Department of Agricultural and Resource Economics
> University of California, Berkeley


Re: [Tutor] reading variables in a data set?

2009-07-04 Thread Luke Paireepinart

Pardon me, I don't have time to address all of your questions; however,
Steven Buck wrote:
> I gather it's now a list where each element of the list is an
> observation (a vector) for one household. The name of my list is
> "data"; I gather Python recognizes the first observation by: data[1].
No, the first item in a list is going to be data[0], not data[1].
Python counts from 0, not 1, so data[1] is the first observation only if
you mean "the one after the zeroth observation", which is not the common
usage of that term.
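A quick illustration of zero-based indexing, using a short stand-in list in place of the real household data:

```python
# A short stand-in for the household list: three observations.
data = ["X_1", "X_2", "X_3"]

print(data[0])    # X_1  (the first observation)
print(data[1])    # X_2  (the second observation)
print(data[-1])   # X_3  (negative indices count from the end)
print(len(data))  # 3, so valid indices run from 0 to len(data) - 1
```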

> For example,
> data = [X_1, X_2, X_3, ..., X_N], where each X_i is a
> vector of household characteristics, e.g. X_1 = (age_1, wage_1, ...,
> residence_1).
>
> I also have a list of variable names called "varname", but I'm
> not sure the module I used to extract the ".dta" into Python also
> created a correspondence between the varname list and the data
> list: the Python interpreter won't print anything when I type one of
> the variable names. I was hoping it would print out a vector of ages
> or the like.
It should output whatever is contained in the variable, if you're at the 
interpreter.  Sounds like you're not getting your data in.

>>> x = ["hello", "world!", 42]
>>> x
['hello', 'world!', 42]
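One possible reason nothing useful comes back, assuming varname holds plain strings (as Kent suggested in his reply), is that the names exist only as text inside that list, not as Python variables. A small sketch with a hypothetical varname:

```python
# Hypothetical list of names, as the reader module might return them.
varname = ["age", "wage", "residence"]

# The names are just strings stored inside the list...
print("age" in varname)  # True

# ...so typing the bare name refers to a Python variable that was never
# created, and the interpreter raises NameError instead of showing data.
try:
    age
except NameError as exc:
    print("NameError:", exc)
```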

Hope that helps a little bit, good luck!
-Luke

[Tutor] reading variables in a data set?

2009-07-04 Thread Steven Buck

 Dear Python Tutor,
I'm doing econometric work and am a new user of Python. I have read several
of the tutorials, but haven't found them useful for a newbie problem I've
encountered.
I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA ) to get
a Stata ".dta" file into Python. In Stata the data set is an N x K matrix
where N is the number of observations (households) and K is the number of
variables.
I gather it's now a list where each element of the list is an observation (a
vector) for one household. The name of my list is "data"; I gather Python
recognizes the first observation by: data[1].
For example,
data = [X_1, X_2, X_3, ..., X_N], where each X_i is a vector
of household characteristics, e.g. X_1 = (age_1, wage_1, ..., residence_1).

I also have a list of variable names called "varname", but I'm not
sure the module I used to extract the ".dta" into Python also created a
correspondence between the varname list and the data list: the Python
interpreter won't print anything when I type one of the variable names. I
was hoping it would print out a vector of ages or the like.

In any case, I'd like to make a scatter plot in pylab, but don't know how to
identify a variable in "data" (i.e. I'd like a vector listing the ages and
another vector listing the wages of households). Perhaps I need to run a
subroutine to collect each relevant data point to create a new list which I
define as my variable of interest? From the above example, I'd like to
create a list such as: age = [age_1, age_2, ..., age_N] and likewise for
wages.

Any help you could offer would be very much appreciated. Also, this is my
first time using the Python Tutor list, so let me know if I've used it
appropriately or if I should change/narrow the structure of my question.

Thanks
Steve

-- 
Steven Buck
Ph.D. Student
Department of Agricultural and Resource Economics
University of California, Berkeley