On Wed, Feb 6, 2013 at 5:36 PM, Yarden Katz <yarden.k...@gmail.com> wrote:
> [cross listed to pandas list since it's at intersection of pandas/rpy2
> - apologies for redundancy]
>
> Hi all,
>
> I'm trying to plot numpy arrays and pandas DataFrames with Rpy2 and am
> running into several problems. I import rpy2, pandas and scipy/numpy
> as follows:
>
> import rpy2
> from rpy2.robjects import r
> import rpy2.robjects.numpy2ri
> import pandas.rpy.common as com
> rpy2.robjects.numpy2ri.activate()
> from numpy import *
> import scipy
> import pandas
>
> Then I read a CSV file as a pandas DataFrame as usual:
>
> # Read a pandas DataFrame from file
> data = pandas.read_table("myfile.txt")
> r.hist(data.col1, xlab="", ylab="")
>
> "col1" of the "data" DataFrame contains only floats. When I plot it,
> my plot is littered with random numbers from the array on the
> histogram plot. They appear in bold in the top of the plot, and below
> the x-axis is regular font. They completely
> hide the xtick labels and the x-axis label (if there was one.) If I
> don't pass xlab="", the entire histogram is covered with numbers.
>
> My question is: how can I get rpy2 to actually read the information
> from the pandas DataFrame and use it in the plot? This is what
> happens natively in R with DataFrames, and I'm trying to get the same
> behavior here. For example, since it knows the names/labels of each
> column (in this case "col1"), it can place that on the X-axis. Is
> this possible? Does it require a conversion to an Rpy DataFrame
> before?
>
> Related issue: When I try to plot the DataFrame with:
>
> r.plot(data)
>
> It fails as well, with the error:
>
> ValueError: Nothing can be done for the type <class
> 'pandas.core.frame.DataFrame'> at the moment.
>
> Is it possible to get rpy2 to plot the DataFrame as best as it can
> (just like in native R, where R does whatever guessing is most
> reasonable to plot the DF in the requested way)? Since pandas
> DataFrames can do everything R DataFrames can, it should be possible.
>
> I also tried to explicitly convert to R DataFrames first:
>
> r.plot(com.convert_to_r_dataframe(data))
>
> which generates this output (the first part just prints a column from
> my dataframe for some reason)
>
> ==
> 0 1.791385
> 1 0.152134
> 2 0.000000
> 3 0.649393
> 4 0.000000
> 5 0.605132
> 6 0.000000
> 7 0.000000
> 8 0.000000
> 9 0.000000
> 10 2.084081
> 11 0.488127
> 12 0.006791
> 13 0.000000
> 14 0.244846
> ...
> 21500 1.578385
> 21501 0.080556
> 21502 166.923864
> 21503 15.274696
> 21504 0.000000
> 21505 1.333847
> 21506 0.000000
> 21507 0.000000
> 21508 0.000000
> 21509 0.075611
> 21510 0.000000
> 21511 2.025098
> 21512 0.562991
> 21513 0.000000
> 21514 0.000000
> Name: rpkm, Length: 21515
> Error in plot.window(...) : need finite 'xlim' values
> In addition: Warning messages:
> 1: In data.matrix(x) : NAs introduced by coercion
> 2: In data.matrix(x) : NAs introduced by coercion
> 3: In min(x) : no non-missing arguments to min; returning Inf
> 4: In max(x) : no non-missing arguments to max; returning -Inf
> 5: In min(x) : no non-missing arguments to min; returning Inf
> 6: In max(x) : no non-missing arguments to max; returning -Inf
> ---------------------------------------------------------------------------
> RRuntimeError Traceback (most recent call last)
> /home/yarden/.local/lib/python2.7/site-packages/IPython/utils/py3compat.pyc
> in execfile(fname, *where)
> 176 else:
> 177 filename = fname
> --> 178 __builtin__.execfile(filename, *where)
>
> /home/yarden/test_rpy2.py in <module>()
> 19 print data.rpkm
> 20 #r.hist(data.rpkm.values, xlab="", ylab="")
> ---> 21 r.plot(com.convert_to_r_dataframe(data))
> 22
> 23
>
> /home/yarden/.local/lib/python2.7/site-packages/rpy2-2.3.2-py2.7-linux-x86_64.egg/rpy2/robjects/functions.pyc
> in __call__(self, *args, **kwargs)
> 84 v = kwargs.pop(k)
> 85 kwargs[r_k] = v
> ---> 86 return super(SignatureTranslatedFunction,
> self).__call__(*args, **kwargs)
>
> /home/yarden/.local/lib/python2.7/site-packages/rpy2-2.3.2-py2.7-linux-x86_64.egg/rpy2/robjects/functions.pyc
> in __call__(self, *args, **kwargs)
> 33 for k, v in kwargs.iteritems():
> 34 new_kwargs[k] = conversion.py2ri(v)
> ---> 35 res = super(Function, self).__call__(*new_args, **new_kwargs)
> 36 res = conversion.ri2py(res)
> 37 return res
>
> RRuntimeError: Error in plot.window(...) : need finite 'xlim' values
> ==
>
> Advice on this will be very much appreciated. Thank you.
>
> ------------------------------------------------------------------------------
> Free Next-Gen Firewall Hardware Offer
> Buy your Sophos next-gen firewall before the end March 2013
> and get the hardware for free! Learn more.
> http://p.sf.net/sfu/sophos-d2d-feb
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list
Could you move this to GitHub? I don't have time to look at it right now but dalejung or other active pandas users may be able to help.

- Wes
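A minimal sketch of one way around both problems, assuming a recent pandas (for select_dtypes) and rpy2's robjects interface; it has not been checked against rpy2 2.3.2. R's hist() derives its default title and x-axis label from deparse(substitute(x)), so when a vector arrives from Python rather than as a named R variable, the deparsed values are likely what gets printed over the plot. Passing main, xlab and ylab explicitly sidesteps that. For plot(), the sketch builds an R data.frame from the numeric columns only, on the assumption that the "NAs introduced by coercion" warnings come from non-numeric columns. The helper names hist_args and numeric_columns and the sample data are hypothetical, and the R calls are skipped if rpy2 cannot be imported.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

try:
    from rpy2 import robjects
    from rpy2.robjects import FloatVector
except (ImportError, RuntimeError):  # rpy2/R unavailable: only the pandas part runs
    robjects = None


def hist_args(df, column):
    """Hypothetical helper: values plus explicit labels for R's hist().

    Passing main/xlab/ylab explicitly keeps hist() from deparsing the
    whole numeric vector into its default title and x-axis label.
    """
    values = df[column].astype(float).tolist()
    labels = {"main": "Histogram of " + column, "xlab": column, "ylab": "Frequency"}
    return values, labels


def numeric_columns(df):
    """Hypothetical helper: column name -> list of floats, numeric columns only."""
    numeric = df.select_dtypes(include=[np.number])
    return OrderedDict(
        (name, numeric[name].astype(float).tolist()) for name in numeric.columns
    )


# Stand-in for pandas.read_table("myfile.txt"); these values are made up.
data = pd.DataFrame({
    "col1": [1.79, 0.15, 0.0, 0.65],
    "rpkm": [0.61, 0.0, 2.08, 0.49],
    "gene": ["a", "b", "c", "d"],
})

values, labels = hist_args(data, "col1")
columns = numeric_columns(data)

if robjects is not None:
    r = robjects.r
    # Histogram with labels taken from the pandas column name.
    r.hist(FloatVector(values), **labels)
    # Build an R data.frame from plain numeric vectors, then let R's
    # plot() method for data frames choose how to display it.
    r_df = robjects.DataFrame(
        OrderedDict((name, FloatVector(v)) for name, v in columns.items())
    )
    r.plot(r_df)
```

The string column "gene" is dropped before conversion, so R only ever sees numeric vectors whose names come from the pandas columns.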