On 13/05/10 16:45, John Owens wrote:
> John Owens <john_owens <at> yahoo.com> writes:
>
>>
>> I'm having some trouble figuring out how to change a column in a DataFrame.
>
> Laurent, thanks for the careful and thoughtful replies. May I offer a
> more realistic example and ask three questions? What I would like to do
> is read in a csv with a Date column into R, then manipulate that column
> in python, then put the manipulated result as a date back into R. I
> am aware that R has some date-manipulation commands, but they're not
> nearly as complete as python's.
>
> #!/usr/bin/env python2.6
> import rpy2.robjects as ro
>
> df = ro.DataFrame({'a': ro.StrVector(("Mar-15-2010",
>                                      "Mar-16-2010",
>                                      "Mar-17-2010")),
>                    'b': ro.IntVector((4,5,6))})
> print df
> print df.colnames
> print df.rx2('a')
> print [x for x in df.rx2('a')] # question 1
>
> df[0] = ro.StrVector(("2010-03-15", # question 2
>                       "2010-03-16",
>                       "2010-03-17"));
> print df.rx2('a')
> # question 3
>
> Question 1: What I want is to see three text strings (the dates
> starting with "Mar"). Instead, what I see is [1, 2, 3]. I can't
> manipulate [1, 2, 3]. How do I get the text strings back out
> instead of [1, 2, 3]? (And what are [1, 2, 3] - a factorvector
> kind of representation?)
Yes. By default, R converts vectors of strings into factors when constructing a data.frame, and iterating over a factor gives you its integer codes, hence [1, 2, 3]. The way to avoid the conversion is to wrap the vector in a call to "base.I()" (not my choice; that's the way it is in R).

> Question 2: Is there any way to do df['a'] instead of df[0]?

There is, but you'll have to implement a class that overrides the default behaviour with something you like better. .rx() and .rx2() are for R-style extraction; __getitem__ (that is, [ ]) is for Python-style extraction. The two are kept apart to minimize the risk of mistakes that occur when confusing how each behaves (e.g., indexing starts at 1 in R, names do not have to be unique and R returns only the first match, etc.).

Minimizing the risk of hard-to-trace mistakes, even at the cost of slightly more code to write, has been a deliberate choice (the relative absence of a '.' -> '_' conversion whenever the resolution could be ambiguous is another manifestation of this choice). That said, if this turns out to be heresy, I am happy to be reasoned out of it.

> How do I find out what the index is for a particular column name?

Python's way:

    >>> tuple(df.colnames).index('b')
    1

The index returned is a Python index (it starts at zero).
(note: rpy2 vectors could implement the method index() directly, as someone on this list suggested. It will be in 2.2, and will probably be backported to a bugfix release in the 2.1.x series.)

R's way would use "which()" (from the package "base").

> Question 3: How do I apply the R 'as.date' function to a
> column in a DataFrame, since that's what I need to do to treat
> them as dates in R?

Just call it:

    base.as_Date(df.rx2("dates"))

I'll point you to the R documentation for details about the parameters of R's function "as.Date()" (in particular its "format" argument).
(note: someone made the interesting suggestion of having a DateVector class in rpy2; this might be the easiest thing to offer to users.)

> Once I get this working I'm happy to write it up as an example for
> the docs (tell me where you want me to put it in the source distro
> and I'll send a patch file).

Besides obvious corrections of grammatical horrors, or improvements to the existing documentation, I would gladly accept "case studies" showing how to go from requirements to implementation, or "cookbook" entries; they would go into a dedicated section. The client-server example already in the doc would move there.

> JDO
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list
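For the Python side of the original goal (manipulating the dates in Python, then handing them back to R), here is a minimal sketch using only the standard datetime module. It assumes the strings always follow the "Mon-DD-YYYY" pattern of the example ("Mar-15-2010") and that month abbreviations are English (the default C locale). It converts them to the ISO "YYYY-MM-DD" form, which is the first format R's as.Date() tries by default. The helper name to_iso is hypothetical, not part of rpy2.

```python
from datetime import datetime

# Strings in the "Mon-DD-YYYY" layout used in the question above.
raw_dates = ("Mar-15-2010", "Mar-16-2010", "Mar-17-2010")

def to_iso(date_string):
    """Hypothetical helper: parse 'Mon-DD-YYYY' and return 'YYYY-MM-DD'.

    %b matches an abbreviated month name such as 'Mar'; it is
    locale-dependent, so this assumes English (C locale) month names.
    """
    parsed = datetime.strptime(date_string, "%b-%d-%Y")
    return parsed.strftime("%Y-%m-%d")

# Any further date arithmetic can be done on `parsed` datetime objects;
# here we only reformat the strings so R can parse them without a format.
iso_dates = [to_iso(s) for s in raw_dates]
print(iso_dates)  # ['2010-03-15', '2010-03-16', '2010-03-17']
```

On the rpy2 side, the resulting list could then be wrapped in ro.StrVector and passed to base.as_Date(), assuming base was obtained with importr('base'). Alternatively, the original "Mar-15-2010" strings could be parsed directly in R by giving as.Date() a format argument of "%b-%d-%Y", which is subject to the same locale caveat.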