On 13/05/10 16:45, John Owens wrote:
> John Owens<john_owens<at> yahoo.com> writes:
>
>>
>> I'm having some trouble figuring out how to change a column in a DataFrame.
>
> Laurent, thanks for the careful and thoughtful replies. May I offer a
> more realistic example and ask three questions? What I would like to do
> is read in a csv with a Date column into R, then manipulate that column
> in python, then put the manipulated result as a date back into R. I
> am aware that R has some date-manipulation commands, but they're not
> nearly as complete as python's.
>
> #!/usr/bin/env python2.6
> import rpy2.robjects as ro
>
> df = ro.DataFrame({'a': ro.StrVector(("Mar-15-2010",
> "Mar-16-2010",
> "Mar-17-2010")),
> 'b': ro.IntVector((4,5,6))})
> print df
> print df.colnames
> print df.rx2('a')
> print [x for x in df.rx2('a')] # question 1
>
> df[0] = ro.StrVector(("2010-03-15", # question 2
> "2010-03-16",
> "2010-03-17"));
> print df.rx2('a')
> # question 3
>
> Question 1: What I want is to see three text strings (the dates
> starting with "Mar"). Instead, what I see is [1, 2, 3]. I can't
> manipulate [1, 2, 3]. How do I get the text strings back out
> instead of [1, 2, 3]? (And what are [1, 2, 3] - a factorvector
> kind of representation?)
Yes. By default, R converts vectors of strings into factors when
constructing a data.frame. The way to avoid this is to wrap the vector
in a call to "base.I()" (not my choice, that's the way it is in R).
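
To make the [1, 2, 3] less mysterious, here is a minimal pure-Python sketch
of what a factor stores. It does not use rpy2 or R at all; the variable
names are mine, and the sorting of the levels mirrors R's default
behaviour as I understand it.

```python
# Pure-Python illustration of an R factor: distinct "levels" plus
# 1-based integer codes pointing into those levels.
dates = ["Mar-15-2010", "Mar-16-2010", "Mar-17-2010"]

# The distinct values, sorted, become the levels.
levels = sorted(set(dates))

# Each element is stored as the 1-based position of its level.
codes = [levels.index(d) + 1 for d in dates]

# Getting the strings back means looking the codes up in the levels.
decoded = [levels[c - 1] for c in codes]

print(codes)
print(decoded)
```

Iterating over the factor hands you the codes, which is why the strings
seem to have disappeared; they are still there, in the levels.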
> Question 2: Is there any way to do df['a'] instead of df[0]?
There is, but you'll have to implement a class that overrides the
default behaviour with something you like better.
.rx() and .rx2() are for R-style extraction; __getitem__ / [ ] is for
Python-style indexing.
The two are kept apart to minimize the risk of mistakes that come from
confusing how each behaves (e.g., indexing starts at 1 in R, names do
not have to be unique and R returns only the first match, etc.).
Minimizing the risk of hard-to-trace mistakes, even at the cost of
slightly more code to write, has been a deliberate choice (the
relative absence of a '.' -> '_' conversion wherever ambiguous
resolutions can happen is another manifestation of this choice).
That said, hopefully I can be reasoned out of heresy if this is
the case.
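
As a rough idea of what such an overriding class could look like, here
is a sketch on a plain-Python stand-in rather than on rpy2's own
DataFrame. The class name NamedColumns and its attributes are
hypothetical; with rpy2 you would presumably subclass the data frame
class and resolve names through its colnames instead.

```python
class NamedColumns(object):
    """Hypothetical stand-in for a data frame: parallel lists of names and columns."""

    def __init__(self, names, columns):
        self.names = list(names)
        self.columns = list(columns)

    def _position(self, key):
        # A string key is resolved by name; as in R, the first match wins.
        # list.index() raises ValueError if the name is absent.
        if isinstance(key, str):
            return self.names.index(key)
        # Anything else is treated as a zero-based Python index.
        return key

    def __getitem__(self, key):
        return self.columns[self._position(key)]

    def __setitem__(self, key, value):
        self.columns[self._position(key)] = value


frame = NamedColumns(["a", "b"], [["Mar-15-2010", "Mar-16-2010"], [4, 5]])
frame["a"] = ["2010-03-15", "2010-03-16"]
print(frame["a"])
print(frame[1])
```

Note that the sketch has to decide what a name means when names are not
unique, which is exactly the kind of ambiguity the default behaviour
avoids.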
> How do I find out what the index is for a particular column name?
Python's way:
>>> tuple(df.colnames).index('b')
1
The index returned is a Python index (starts at zero).
(note: rpy2 vectors could implement the method index() directly, an
improvement suggested by someone on this list. It will be in 2.2, and
will probably be backported to a bugfix release in the 2.1.x series.)
R's way will use "which()" (in the package "base").
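
To illustrate the difference between the two conventions, here is a
pure-Python sketch. The duplicated column name is made up for the
example, and the list comprehension only imitates what R's
which(names == "b") would return; it does not call R.

```python
# Hypothetical column names, with a duplicate to show the difference.
colnames = ("a", "b", "b")

# Python's way: zero-based position of the first match only.
python_index = colnames.index("b")

# Imitation of R's which(): one-based positions of every match.
r_style_positions = [i + 1 for i, name in enumerate(colnames) if name == "b"]

print(python_index)
print(r_style_positions)
```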
> Question 3: How do I apply the R 'as.date' function to a
> column in a DataFrame, since that's what I need to do to treat
> them as dates in R?
Just call it:
base.as_Date(df.rx2("a"))
I'll point you to the R documentation for details about the parameters
for R's function "as.Date()".
(note: someone made the interesting suggestion to have a DateVector
class in rpy2... this might be the easiest thing to offer to users)
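
For the Python side of the round trip, here is a minimal sketch that
turns the "Mar-15-2010" strings into ISO "YYYY-MM-DD" strings, a format
R's as.Date() accepts without an explicit format argument. It uses only
the standard library; the helper name to_iso is mine, and it assumes an
English locale so that "%b" matches "Mar".

```python
from datetime import datetime

raw = ["Mar-15-2010", "Mar-16-2010", "Mar-17-2010"]


def to_iso(text):
    # %b matches an abbreviated month name in the current locale
    # (English is assumed here); %d is the day, %Y the 4-digit year.
    return datetime.strptime(text, "%b-%d-%Y").date().isoformat()


iso = [to_iso(d) for d in raw]
print(iso)
```

The resulting strings can then be put back into the data frame as a
string vector and passed to as.Date() on the R side.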
> Once I get this working I'm happy to write it up as an example for
> the docs (tell me where you want me to put it in the source distro
> and I'll send a patch file).
Besides obvious corrections of grammatical horrors, or improvements to
the existing documentation, I'd gladly accept "case studies" showing
how to go from requirements to implementation, or "cookbook" recipes;
these would go into a dedicated section. The client-server example
already in the doc would move there.
> JDO
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> rpy-list mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rpy-list