Thanks for your help!  Your original solution would have worked just fine
too if I didn't have a number of other large data frames loaded.  This is
my first time working with such large data sets, so I'd never previously
run out of available memory.  I finished up the project, though I'm sure
I'll be back to this mailing list in the future!
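
For anyone who hits the same wall, here is a minimal sketch of checking what is taking up memory in the workspace and freeing it before running something large. The object names (`big1`, `big2`) are made up for illustration and are not from my project; `object.size()`, `rm()` and `gc()` are base R.

```r
# Hypothetical stand-ins for other large data frames left in the session.
big1 <- data.frame(x = runif(1e5))
big2 <- data.frame(y = runif(1e5))

# Size in bytes of every object in the current environment, largest first.
sizes <- sapply(ls(), function(nm) object.size(get(nm)))
print(sort(sizes, decreasing = TRUE))

# Drop what is no longer needed, then run the garbage collector
# so the freed memory can be reused.
rm(big1, big2)
invisible(gc())
```

How much this helps depends on the session; it only frees objects you explicitly remove.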


On Sun, Jan 27, 2013 at 11:54 AM, arun kirshna [via R] <
ml-node+s789695n4656784...@n4.nabble.com> wrote:

> Hi,
> I tried with bigger dataset.
>
> set.seed(25)
> names <- sample(c("bob", "joe", "[hidden email]", "emily", "[hidden email]"),
>                 5e6, replace=TRUE)
>
> set.seed(1651)
> emails <- sample(c("[hidden email]", "[hidden email]", "[hidden email]",
>                    "[hidden email]", "[hidden email]"), 5e6, replace=TRUE)
>
>
>  df <- data.frame(names, emails)
>  dim(df)
> #[1] 5000000       2
>  df[]<-lapply(df,as.character)
>  system.time(df[,1][grep("@",df$names)]<- "" )
> #   user  system elapsed
> #  1.732   0.108   1.844
>  system.time(dfNew1<-df[grep("\\w+",df$names),])
> #   user  system elapsed
> #  0.896   0.024   0.923
>  system.time(dfNew2<- df[df$names!="",])
> #   user  system elapsed
>  # 0.460   0.028   0.490
> A.K.
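
A possible variation on the timings above, offered as a sketch and not as a measured improvement: since "@" is a literal character and not a pattern, `grepl()` with `fixed = TRUE` can skip regular-expression matching, and the logical vector it returns can drive both steps. The addresses below are invented placeholders (the real ones are hidden in the archive), and the vector is kept smaller than 5e6 so it runs quickly.

```r
set.seed(25)
# Placeholder values; the example.com addresses are invented.
names  <- sample(c("bob", "joe", "x@example.com", "emily", "y@example.com"),
                 1e5, replace = TRUE)
emails <- sample(c("a@example.com", "b@example.com"), 1e5, replace = TRUE)
df <- data.frame(names, emails, stringsAsFactors = FALSE)

# Logical index of rows whose name contains a literal "@".
has_at <- grepl("@", df$names, fixed = TRUE)

# First result: blank out those names.
df$names[has_at] <- ""

# Second result: keep only the rows that had a real name.
dfNew <- df[!has_at, ]
```

Whether this is actually faster on 5 million rows would need to be checked with `system.time()` on the real data.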
>
>
>
>
>
>
>
> ________________________________
> From: Yasha Podeswa <[hidden email]>
> To: arun <[hidden email]>
> Cc: R help <[hidden email]>; Uwe Ligges <[hidden email]>
> Sent: Sunday, January 27, 2013 2:05 PM
> Subject: Re: [R] Removing values containing a specific character
>
>
> You two were 100% right, it was just a memory issue.  This was part of a
> bigger project where I had a number of data frames loaded, all with 1-5
> million rows. I cleaned up my code to have fewer data frames loaded at once,
> and everything is working great.  Thanks for the help!
> On Jan 27, 2013 9:46 AM, "arun" <[hidden email]> wrote:
>
> Hi Yasha,
>
> >
> > I guess you got Uwe's response.
> >
> > I created `df2` with the intention of getting the two results from the
> > original dataset.
> >For example, after you get the first result
> >df[,1][grep("@",df$names)]<- ""
> >#you can get the second result by:
> >df[df$names!="",]
> >#  names         emails
> >#1   bob [hidden email]
> >#2   joe [hidden email]
> >#4 emily [hidden email]
> >
> >#or
> >df[grep("\\w+",df$names),]
> >#  names         emails
> >#1   bob [hidden email]
> >#2   joe [hidden email]
> >#4 emily [hidden email]
> >
> >But I am not sure how this will work over 5.5 million rows.
> >A.K.
> >
> >
> >
> >
> >----- Original Message -----
> >From: ypodeswa <[hidden email]>
> >To: [hidden email]
> >Cc:
> >Sent: Sunday, January 27, 2013 1:11 AM
> >Subject: Re: [R] Removing values containing a specific character
> >
> >Actually, it worked perfectly for my sample data, but my actual data has
> >5.5 million rows, and grep doesn't seem to work with over a million rows.
> >Any idea on a workaround?
> >
> >
> >On Sat, Jan 26, 2013 at 9:37 PM, Yasha Podeswa <[hidden email]> wrote:
> >
> >> Awesome, thanks Arun, that's exactly what I was looking for!
> >>
> >>
> >> On Sat, Jan 26, 2013 at 9:21 PM, arun kirshna [via R] <[hidden email]> wrote:
> >>
> >>> Hi,
> >>> Try this:
> >>> df[]<-lapply(df,as.character)
> >>> df2<-df
> >>> df[,1][grep("@",df$names)]<- ""
> >>> df
> >>> #  names         emails
> >>> #1   bob [hidden email]
> >>> #2   joe [hidden email]
> >>> #3       [hidden email]
> >>> #4 emily [hidden email]
> >>> #5       [hidden email]
> >>>
> >>> #2nd part:
> >>>
> >>>  df2[-grep("@",df2$names),]
> >>> #  names         emails
> >>> #1   bob [hidden email]
> >>> #2   joe [hidden email]
> >>> #4 emily [hidden email]
> >>> A.K.
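
One caveat with the negative-index form above, shown as a small sketch with invented data: if no name contains "@", `grep()` returns `integer(0)`, and `df2[-integer(0), ]` selects zero rows instead of all of them. A logical index from `grepl()` avoids that. The names and example.com addresses here are made up for the illustration.

```r
# Invented data in which no name contains "@".
df2 <- data.frame(names  = c("bob", "joe", "emily"),
                  emails = c("b@example.com", "j@example.com", "e@example.com"),
                  stringsAsFactors = FALSE)

idx     <- grep("@", df2$names)            # integer(0): nothing matched
dropped <- df2[-idx, ]                     # zero rows: every row is lost
kept    <- df2[!grepl("@", df2$names), ]   # all three rows are kept
```

On data where at least one name matches, both forms give the same rows.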
> >>>
> >>
> >>
> >
> >
> >
> >
> >--
> >View this message in context:
> http://r.789695.n4.nabble.com/Removing-values-containing-a-specific-character-tp4656744p4656751.html
> >Sent from the R help mailing list archive at Nabble.com.
> >    [[alternative HTML version deleted]]
> >
> >______________________________________________
> >[hidden email] 
> ><http://user/SendEmail.jtp?type=node&node=4656784&i=30>mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>




--
View this message in context: 
http://r.789695.n4.nabble.com/Removing-values-containing-a-specific-character-tp4656744p4656790.html
Sent from the R help mailing list archive at Nabble.com.
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
