Hi all,
I have a python list of lists (each sublist is a row of data), plus a list
of column names. Something like this ...
>>> d = [['S80', 'C', 137.5, 0],
...      ['S82', 'C', 155.1, 1],
...      ['S83', 'T', 11.96, 0],
...      ['S84', 'T', 47, 1],
...      ['S85', 'T', numpy.nan, 1]]
>>> colnames = ['code','pop','score','flag']
I'm looking for the /fastest/ way to create an R dataframe (via rpy2)
from these two variables. It could go via dictionaries, numpy object
arrays, whatever; it just needs to be fast. Note that the data has mixed
types (some columns are strings, some floats, some ints), and there are
missing values which I'd like R to interpret as NA. I can pre-transform
the elements of d as required to facilitate this.
I need to do this step several hundred thousand times (yes, with different
data each time) on datasets of up to ~10,000 elements, so any speedup
suggestions are welcome.
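For what it's worth, here is a minimal sketch of the column-wise
pre-transform I have in mind: transpose the row-major list into typed
columns, with NaN standing in for missing floats. The rpy2 calls at the
bottom (StrVector / FloatVector / IntVector / DataFrame) are my assumption
about the intended usage and are left commented out, since they need a
running R:

```python
import math


def rows_to_columns(rows, colnames):
    """Transpose a list of rows into a dict of columns keyed by name."""
    return {name: [row[i] for row in rows]
            for i, name in enumerate(colnames)}


d = [['S80', 'C', 137.5, 0],
     ['S82', 'C', 155.1, 1],
     ['S83', 'T', 11.96, 0],
     ['S84', 'T', 47.0, 1],
     ['S85', 'T', float('nan'), 1]]  # nan marks the missing score
colnames = ['code', 'pop', 'score', 'flag']

cols = rows_to_columns(d, colnames)

# Hypothetical rpy2 step (untested here; assumes rpy2 is installed):
# from rpy2 import robjects
# from rpy2.robjects.vectors import StrVector, FloatVector, IntVector
# df = robjects.DataFrame({
#     'code':  StrVector(cols['code']),
#     'pop':   StrVector(cols['pop']),
#     'score': FloatVector(cols['score']),  # NaN should surface as NA in R
#     'flag':  IntVector(cols['flag']),
# })
```

Building each column once and handing it to a vector constructor avoids
per-element conversion, which I'd expect to matter at these call counts.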
-best
Gary
_______________________________________________
rpy-list mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rpy-list