On Wed, Feb 6, 2013 at 5:36 PM, Yarden Katz <yarden.k...@gmail.com> wrote:
> [cross-posted to the pandas list since it's at the intersection of
> pandas/rpy2; apologies for the redundancy]
>
> Hi all,
>
> I'm trying to plot numpy arrays and pandas DataFrames with Rpy2 and am
> running into several problems. I import rpy2, pandas and scipy/numpy
> as follows:
>
> import rpy2
> from rpy2.robjects import r
> import rpy2.robjects.numpy2ri
> import pandas.rpy.common as com
> rpy2.robjects.numpy2ri.activate()
> from numpy import *
> import scipy
> import pandas
>
> Then I read a CSV file as a pandas DataFrame as usual:
>
> # Read a pandas DataFrame from file
> data = pandas.read_table("myfile.txt")
> r.hist(data.col1, xlab="", ylab="")
>
> The "col1" column of the "data" DataFrame contains only floats. When I
> plot it, the histogram is littered with seemingly random numbers from
> the array. They appear in bold at the top of the plot and in regular
> font below the x-axis, where they completely hide the xtick labels
> and the x-axis label (if there is one). If I don't pass xlab="", the
> entire histogram is covered with numbers.
>
> My question is: how can I get rpy2 to actually read the information
> from the pandas DataFrame and use it in the plot? This is what
> happens natively in R with DataFrames, and I'm trying to get the same
> behavior here. For example, since it knows the name/label of each
> column (in this case "col1"), it could place that on the x-axis. Is
> this possible? Does it require converting to an R DataFrame
> first?
>
> Related issue: When I try to plot the DataFrame with:
>
> r.plot(data)
>
> It fails as well, with the error:
>
> ValueError: Nothing can be done for the type <class
> 'pandas.core.frame.DataFrame'> at the moment.
>
> Is it possible to get rpy2 to plot the DataFrame as best as it can
> (just like in native R, where R does whatever guessing is most
> reasonable to plot the DF in the requested way)?  Since pandas
> DataFrames can do everything R DataFrames can, it should be possible.
>
> I also tried to explicitly convert to R DataFrames first:
>
> r.plot(com.convert_to_r_dataframe(data))
>
> which generates this output (the first part just prints a column from
> my dataframe for some reason)
>
> ==
> 0     1.791385
> 1     0.152134
> 2     0.000000
> 3     0.649393
> 4     0.000000
> 5     0.605132
> 6     0.000000
> 7     0.000000
> 8     0.000000
> 9     0.000000
> 10    2.084081
> 11    0.488127
> 12    0.006791
> 13    0.000000
> 14    0.244846
> ...
> 21500      1.578385
> 21501      0.080556
> 21502    166.923864
> 21503     15.274696
> 21504      0.000000
> 21505      1.333847
> 21506      0.000000
> 21507      0.000000
> 21508      0.000000
> 21509      0.075611
> 21510      0.000000
> 21511      2.025098
> 21512      0.562991
> 21513      0.000000
> 21514      0.000000
> Name: rpkm, Length: 21515
> Error in plot.window(...) : need finite 'xlim' values
> In addition: Warning messages:
> 1: In data.matrix(x) : NAs introduced by coercion
> 2: In data.matrix(x) : NAs introduced by coercion
> 3: In min(x) : no non-missing arguments to min; returning Inf
> 4: In max(x) : no non-missing arguments to max; returning -Inf
> 5: In min(x) : no non-missing arguments to min; returning Inf
> 6: In max(x) : no non-missing arguments to max; returning -Inf
> ---------------------------------------------------------------------------
> RRuntimeError                             Traceback (most recent call last)
> /home/yarden/.local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc
> in execfile(fname, *where)
>     176             else:
>     177                 filename = fname
> --> 178             __builtin__.execfile(filename, *where)
>
> /home/yarden/test_rpy2.py in <module>()
>      19 print data.rpkm
>      20 #r.hist(data.rpkm.values, xlab="", ylab="")
> ---> 21 r.plot(com.convert_to_r_dataframe(data))
>      22
>      23
>
> /home/yarden/.local/lib/python2.7/site-packages/rpy2-2.3.2-py2.7-linux-x86_64.egg/rpy2/robjects/functions.pyc
> in __call__(self, *args, **kwargs)
>      84                 v = kwargs.pop(k)
>      85                 kwargs[r_k] = v
> ---> 86         return super(SignatureTranslatedFunction,
> self).__call__(*args, **kwargs)
>
> /home/yarden/.local/lib/python2.7/site-packages/rpy2-2.3.2-py2.7-linux-x86_64.egg/rpy2/robjects/functions.pyc
> in __call__(self, *args, **kwargs)
>      33         for k, v in kwargs.iteritems():
>      34             new_kwargs[k] = conversion.py2ri(v)
> ---> 35         res = super(Function, self).__call__(*new_args, **new_kwargs)
>      36         res = conversion.ri2py(res)
>      37         return res
>
> RRuntimeError: Error in plot.window(...) : need finite 'xlim' values
> ==
>
> Advice on this will be very much appreciated.  Thank you.
>
> ------------------------------------------------------------------------------
> Free Next-Gen Firewall Hardware Offer
> Buy your Sophos next-gen firewall before the end March 2013
> and get the hardware for free! Learn more.
> http://p.sf.net/sfu/sophos-d2d-feb
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list

Could you move this to GitHub? I don't have time to look at it right
now but dalejung or other active pandas users may be able to help.

- Wes

