On 21/05/10 00:08, John Owens wrote:
> I'd like to do two (actually three) things:
>
> 1) Using a grep-like operator, delete rows in a dataframe that match a
> particular pattern in a particular column (in my case, every row that
> has a '#' as the first character in column 'a')
> 2) Set elements in a dataframe based on the characteristics of other
> elements, across all rows (in my case, if an element in column 'c'
> is NA, set it to 2*that row's value in column 'b')
> 2a) Only do this if column 'd''s value is a particular value (in my
> case, the character 'J')
>
> I'm trying to do this with calling the R code directly using ro.r,
> but that's (a) not satisfying because I'd rather do it in python (how?),
> (b) rpy/R doesn't seem to like doing code like "df = ro.r("function(df)"),
> and (c) it doesn't work anyway.
>
> I'm having some coredump problems when instantiating the dataframe below
> with NAs in it, so forgive any errors in the code since I can't run it.
> Thanks for any help!
>
> JDO
>
> ==========================================================
>
> #!/usr/bin/env python2.6
> import rpy2.robjects as ro
>
> df = ro.DataFrame({'a': ro.StrVector(('# x','y','z')),
>                    'b': ro.IntVector((4,5,6)),
>                    'c': ro.IntVector((8,ro.NA_integer,10)),
>                    'd': ro.StrVector(('I','J','K')),
>                    })
>
> # would like to delete all rows whose name in column 'a' begins with a '#'
> df = ro.r("df[grep('^#', sdpf[,%d], invert=TRUE),]" % \
>           tuple(df.colnames).index('a'))
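Leaving R aside for a moment, the three operations listed above boil down to a row filter followed by a conditional fill. Here is a plain-Python sketch of just that logic, assuming the columns have already been copied into Python lists (a hypothetical dict-of-lists layout, not an rpy2 object) and using None as a stand-in for NA; the helper names are made up for illustration.

```python
# Hypothetical column-oriented stand-in for the data frame:
# each key is a column name, each value a Python list; None plays the role of NA.
cols = {
    'a': ['# x', 'y', 'z'],
    'b': [4, 5, 6],
    'c': [8, None, 10],
    'd': ['I', 'J', 'K'],
}

def drop_comment_rows(cols, column='a', prefix='#'):
    """Return a new dict without the rows whose `column` value starts with `prefix`."""
    keep = [not v.startswith(prefix) for v in cols[column]]
    return {name: [v for v, k in zip(values, keep) if k]
            for name, values in cols.items()}

def fill_missing_c(cols, only_if_d='J'):
    """Where 'c' is missing (None) and 'd' equals `only_if_d`, set 'c' to 2 * 'b'.

    Modifies `cols` in place and also returns it.
    """
    cols['c'] = [2 * b if c is None and d == only_if_d else c
                 for b, c, d in zip(cols['b'], cols['c'], cols['d'])]
    return cols

result = fill_missing_c(drop_comment_rows(cols))
print(result)  # {'a': ['y', 'z'], 'b': [5, 6], 'c': [10, 10], 'd': ['J', 'K']}
```

The filter computes one keep/drop flag per row from column 'a' and applies it to every column, so the columns stay aligned; the rpy2 versions below do the same thing on the R side.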

from rpy2.robjects.packages import importr
base = importr('base')

# If column 'a' ends up as a factor, you have to work on the levels. That's
# one more level of indirection, so I avoid it for the sake of clarity:
# base.I() keeps the strings as plain character vectors.
df = ro.DataFrame({'a': base.I(ro.StrVector(('# x','y','z'))),
                   'b': ro.IntVector((4,5,6)),
                   'c': ro.IntVector((8,ro.NA_integer[0],10)),
                   'd': base.I(ro.StrVector(('I','J','K'))),
                   })

# leave out rows with elements of 'a' starting with '#'
base.subset(df, base.parse(text='! grepl("^#", a)'))

# same with more logic done in Python
df.rx(ro.BoolVector([not x.startswith("#") for x in df.rx2('a')]), True)

> # would like to set all NAs in 'c' to 2*value in 'b'
> df = ro.r("ifelse(is.na(df$c), 2*df$b, df$c)")
>
> # would really like to do this only if column 'd' is 'J' - not sure how
>

Something like this, checking each row from Python:

na_c = base.is_na(df.rx2('c'))   # TRUE where column 'c' is NA

def myfunc(i, df):
    # only rows where 'c' is NA and 'd' is 'J'
    if na_c[i] and df.rx2('d')[i] == 'J':
        # set 'c' to twice the value of 'b' on that row
        df.rx2('c')[i] = df.rx2('b')[i] * 2

for i in range(df.nrow):
    myfunc(i, df)

>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list