Re: [Rpy] Example code for changing a column in a dataframe?

Laurent Sun, 16 May 2010 14:07:27 -0700

On 13/05/10 16:45, John Owens wrote:
> John Owens <john_owens<at>yahoo.com> writes:
>
>>
>> I'm having some trouble figuring out how to change a column in a DataFrame.
>
> Laurent, thanks for the careful and thoughtful replies. May I offer a
> more realistic example and ask three questions? What I would like to do
> is read in a csv with a Date column into R, then manipulate that column
> in python, then put the manipulated result as a date back into R. I
> am aware that R has some date-manipulation commands, but they're not
> nearly as complete as python's.
>
> #!/usr/bin/env python2.6
> import rpy2.robjects as ro
>
> df = ro.DataFrame({'a': ro.StrVector(("Mar-15-2010",
>                                        "Mar-16-2010",
>                                        "Mar-17-2010")),
>                     'b': ro.IntVector((4,5,6))})
> print df
> print df.colnames
> print df.rx2('a')
> print [x for x in df.rx2('a')]    # question 1
>
> df[0] = ro.StrVector(("2010-03-15",  # question 2
>                        "2010-03-16",
>                        "2010-03-17"));
> print df.rx2('a')
> # question 3
>
> Question 1: What I want is to see three text strings (the dates
> starting with "Mar"). Instead, what I see is [1, 2, 3]. I can't
> manipulate [1, 2, 3]. How do I get the text strings back out
> instead of [1, 2, 3]? (And what are [1, 2, 3] - a factorvector
> kind of representation?)


Yes: [1, 2, 3] are the integer codes of a factor. By default, R converts
vectors of strings into factors when constructing a data.frame. The way
to avoid this is to wrap the vector in a call to "base.I()" (not my
choice; that's the way it is in R).
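A minimal pure-Python sketch of why [1, 2, 3] comes back: a factor is stored as 1-based integer codes plus a vector of levels (sorted alphabetically by default), and the labels are recovered by looking each code up in the levels. The data only mirrors your example, and decode_factor() is a hypothetical helper, not part of rpy2. With rpy2 itself, the simpler routes are to avoid the factor with base.I() as above, or to ask R for the labels with as.character() (presumably base.as_character(df.rx2('a')), assuming base = importr('base')). Note that R 4.0.0 and later no longer convert strings to factors by default in data.frame().

```python
# Pure-Python illustration of R's factor encoding (no rpy2 needed).
# A factor stores 1-based integer codes that index into a vector of levels.
levels = ["Mar-15-2010", "Mar-16-2010", "Mar-17-2010"]  # R sorts levels by default
codes = [1, 2, 3]  # what iterating over df.rx2('a') returned


def decode_factor(codes, levels):
    """Hypothetical helper: map each 1-based factor code back to its label."""
    return [levels[code - 1] for code in codes]


labels = decode_factor(codes, levels)
print(labels)  # ['Mar-15-2010', 'Mar-16-2010', 'Mar-17-2010']
```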

> Question 2: Is there any way to do df['a'] instead of df[0]?

There is, but you'll have to implement a class that overrides the default
behaviour with something you like better.
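A minimal sketch of such a class, in plain Python so that it runs without R: NamedColumns is a hypothetical stand-in for a data frame whose __getitem__ and __setitem__ accept either a column name or a 0-based Python index. With rpy2, the same idea would presumably be a subclass of robjects.DataFrame that forwards string keys to .rx2(); that part is an untested assumption and is not shown here.

```python
class NamedColumns(object):
    """Hypothetical stand-in for a data frame: ordered column names and columns.

    Keys may be a column name (str) or a Python-style, 0-based position (int).
    """

    def __init__(self, colnames, columns):
        if len(colnames) != len(columns):
            raise ValueError("need exactly one name per column")
        self.colnames = list(colnames)
        self.columns = [list(column) for column in columns]

    def _position(self, key):
        # Names resolve to the first matching column; ints pass through as-is.
        if isinstance(key, str):
            return self.colnames.index(key)  # raises ValueError if absent
        return key

    def __getitem__(self, key):
        return self.columns[self._position(key)]

    def __setitem__(self, key, values):
        self.columns[self._position(key)] = list(values)


df = NamedColumns(["a", "b"],
                  [["Mar-15-2010", "Mar-16-2010", "Mar-17-2010"], [4, 5, 6]])
df["a"] = ["2010-03-15", "2010-03-16", "2010-03-17"]  # instead of df[0] = ...
print(df[0])  # ['2010-03-15', '2010-03-16', '2010-03-17']
```

Mixing names and positions in one __getitem__ is the kind of ambiguity rpy2 deliberately avoids, so a class like this is best kept for code where you control the keys.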

.rx() and .rx2() are for R-style extraction; __getitem__ (that is, "[")
is for Python-style extraction.
The two are kept apart to minimize the risk of mistakes that would
occur when confusing how each behaves (e.g., indexing starts at 1 in R,
names do not have to be unique and R returns only the first match,
etc.). Minimizing the risk of hard-to-trace mistakes, even at the cost
of slightly more code to write, has been a deliberate choice (the
relative absence of a '.' -> '_' conversion whenever ambiguous
resolutions can happen is another manifestation of this choice).

That said, if this is heresy, I hope I can be reasoned out of it.
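The two pitfalls mentioned above, shown with plain Python lists (the column names are made up, with a deliberate duplicate). The helpers are hypothetical and only mimic R's rules: positions are 1-based, and a lookup by name returns the first match.

```python
# Hypothetical column names with a deliberate duplicate "x".
colnames = ["x", "y", "x"]
columns = ["first x", "y values", "second x"]


def r_position(py_index):
    """Convert a 0-based Python index to the 1-based position R expects."""
    return py_index + 1


def r_first_match(names, name):
    """Mimic R's lookup by name: 1-based position of the first match only."""
    return names.index(name) + 1


# Python's columns[1] is the second column, which R calls position 2.
print(r_position(1))  # 2
# Asking for "x" by name silently ignores the second "x" column.
pos = r_first_match(colnames, "x")
print(columns[pos - 1])  # first x
```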

> How do I find out what the index is for a particular column name?

Python's way:
>>> tuple(df.colnames).index('b')
1
The index returned is a Python index (starts at zero).

(Note: rpy2 vectors could implement the method index() directly;
someone on this list suggested that improvement. It will be in 2.2, and
will probably be backported to a bugfix release in the 2.1.x series.)
R's way would use "which()" (in the package "base").
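For comparison, a pure-Python stand-in for what R's which() computes; in R itself the call would be which(colnames(df) == "b"). which_r() is a hypothetical helper. Unlike tuple.index(), it returns every matching position, and the positions are 1-based, so 1 must be subtracted before using one as a Python index.

```python
def which_r(flags):
    """Hypothetical stand-in for R's base::which(): 1-based positions of True."""
    return [i + 1 for i, flag in enumerate(flags) if flag]


colnames = ("a", "b")
positions = which_r([name == "b" for name in colnames])
print(positions)  # [2]  (R-style, 1-based)

py_index = positions[0] - 1  # back to a Python (0-based) index
print(py_index == colnames.index("b"))  # True
```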

> Question 3: How do I apply the R 'as.date' function to a
> column in a DataFrame, since that's what I need to do to treat
> them as dates in R?

Just call it (the column is named 'a' in your example):
base.as_Date(df.rx2('a'))

I'll refer you to the R documentation for details about the parameters
of R's function "as.Date()".
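On the Python side, one way to prepare the strings is to reformat them with datetime into ISO 8601 ("YYYY-MM-DD"), which as.Date() parses without a format argument. This is a sketch: to_iso() is a hypothetical helper, and %b matches English month abbreviations only under an English or C locale. Alternatively, R can parse the original strings itself through as.Date()'s format argument (e.g. format="%b-%d-%Y").

```python
from datetime import datetime

raw = ["Mar-15-2010", "Mar-16-2010", "Mar-17-2010"]


def to_iso(date_string, fmt="%b-%d-%Y"):
    """Hypothetical helper: parse e.g. 'Mar-15-2010' and return '2010-03-15'."""
    return datetime.strptime(date_string, fmt).strftime("%Y-%m-%d")


iso = [to_iso(s) for s in raw]
print(iso)  # ['2010-03-15', '2010-03-16', '2010-03-17']
```

The resulting list could then be wrapped in ro.StrVector(iso) and assigned back to the column, as in your question 2, before calling as.Date() on it.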
(Note: someone made the interesting suggestion of adding a DateVector
class to rpy2; this might be the easiest thing to offer users.)

> Once I get this working I'm happy to write it up as an example for
> the docs (tell me where you want me to put it in the source distro
> and I'll send a patch file).

Besides obvious corrections of grammatical horrors or improvements to
the existing documentation, I would gladly accept "case studies" showing
how to go from requirements to implementation, or "cookbook" recipes;
these would go into a dedicated section. The client-server example
already in the documentation would move there.

> JDO
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list