Thanks for your help! Your original solution would have worked just fine too if I hadn't had a number of other large data frames loaded. This is my first time working with such large data sets, so I'd never run out of available memory before. I finished up the project, though I'm sure I'll be back on this mailing list in the future!
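
For anyone who finds this thread later, here is a minimal sketch of the two ideas discussed in it: dropping rows whose name contains "@", and freeing memory once a large object is no longer needed. It is only an illustration, not the code from the actual project. The toy data and the object names `has_at` and `cleaned` are made up, and the example.com addresses are placeholders.

```r
# Hypothetical toy data standing in for the real multi-million-row data frame
df <- data.frame(
  names  = c("bob", "joe", "user1@example.com", "emily", "user2@example.com"),
  emails = c("a@example.com", "b@example.com", "c@example.com",
             "d@example.com", "e@example.com"),
  stringsAsFactors = FALSE
)

# grepl() returns a logical vector, so negating it is safe even when
# nothing matches (df[-grep(...), ] would return zero rows in that case).
# fixed = TRUE treats "@" as a literal string rather than a regex.
has_at  <- grepl("@", df$names, fixed = TRUE)
cleaned <- df[!has_at, ]

# Drop objects that are no longer needed, then let R reclaim the memory
rm(df, has_at)
invisible(gc())
```

With several data frames of 1-5 million rows in one session, removing the ones that are finished with before the next step is probably the simplest way to stay under the memory limit.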

On Sun, Jan 27, 2013 at 11:54 AM, arun kirshna [via R] <ml-node+s789695n4656784...@n4.nabble.com> wrote:

> Hi,
> I tried with bigger dataset.
>
> set.seed(25)
> names <- sample(c("bob", "joe", "[hidden email]", "emily", "[hidden email]"),5e6,replace=TRUE)
> set.seed(1651)
> emails <- sample(c("[hidden email]", "[hidden email]", "[hidden email]", "[hidden email]", "[hidden email]"),5e6,replace=TRUE)
>
> df <- data.frame(names, emails)
> dim(df)
> #[1] 5000000       2
> df[]<-lapply(df,as.character)
> system.time(df[,1][grep("@",df$names)]<- "" )
> #   user  system elapsed
> #  1.732   0.108   1.844
> system.time(dfNew1<-df[grep("\\w+",df$names),])
> #   user  system elapsed
> #  0.896   0.024   0.923
> system.time(dfNew2<- df[df$names!="",])
> #   user  system elapsed
> #  0.460   0.028   0.490
> A.K.
>
> ________________________________
> From: Yasha Podeswa <[hidden email]>
> To: arun <[hidden email]>
> Cc: R help <[hidden email]>; Uwe Ligges <[hidden email]>
> Sent: Sunday, January 27, 2013 2:05 PM
> Subject: Re: [R] Removing values containing a specific character
>
> You two were 100% right, it was just a memory issue. This was part of a
> bigger project where I had a number of data frames loaded, all with 1-5
> million rows. Cleaned up my code to have less data frames loaded at once,
> and everything is working great. Thanks for the help!
>
> On Jan 27, 2013 9:46 AM, "arun" <[hidden email]> wrote:
>
> > Hi Yasha,
> >
> > I guess you got Uwe's response.
> >
> > I created `df2` with the intention of getting the two results from the
> > original dataset.
> > For example, after you get the first result
> > df[,1][grep("@",df$names)]<- ""
> > #you can get the second result by:
> > df[df$names!="",]
> > #  names emails
> > #1 bob   [hidden email]
> > #2 joe   [hidden email]
> > #4 emily [hidden email]
> >
> > #or
> > df[grep("\\w+",df$names),]
> > #  names emails
> > #1 bob   [hidden email]
> > #2 joe   [hidden email]
> > #4 emily [hidden email]
> >
> > But, I am not sure how this will work over a 5.5 million rows.
> > A.K.
> >
> > ----- Original Message -----
> > From: ypodeswa <[hidden email]>
> > To: [hidden email]
> > Cc:
> > Sent: Sunday, January 27, 2013 1:11 AM
> > Subject: Re: [R] Removing values containing a specific character
> >
> > Actually, it worked perfectly for my sample data, but my actual data has
> > 5.5 million rows, and grep doesn't seem to work with over a million rows.
> > Any idea on a workaround?
> >
> > On Sat, Jan 26, 2013 at 9:37 PM, Yasha Podeswa <[hidden email]> wrote:
> >
> >> Awesome, thanks Arun, that's exactly what I was looking for!
> >>
> >> On Sat, Jan 26, 2013 at 9:21 PM, arun kirshna [via R] <[hidden email]> wrote:
> >>
> >>> Hi,
> >>> Try this:
> >>> df[]<-lapply(df,as.character)
> >>> df2<-df
> >>> df[,1][grep("@",df$names)]<- ""
> >>> df
> >>> #  names emails
> >>> #1 bob   [hidden email]
> >>> #2 joe   [hidden email]
> >>> #3       [hidden email]
> >>> #4 emily [hidden email]
> >>> #5       [hidden email]
> >>>
> >>> #2nd part:
> >>>
> >>> df2[-grep("@",df2$names),]
> >>> #  names emails
> >>> #1 bob   [hidden email]
> >>> #2 joe   [hidden email]
> >>> #4 emily [hidden email]
> >>> A.K.

--
View this message in context: http://r.789695.n4.nabble.com/Removing-values-containing-a-specific-character-tp4656744p4656790.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.