> From: Omar André Gonzáles Díaz
> Subject: [R] regex not working for some entries in for loop
>
> I'm using some regex in a for loop to check for some values in column
> "source", and put a result in column "fuente".

Your regexes are written across multiple lines, so they include whitespace and linefeeds. For example, you are not testing for ".*forum.*|.*buy.*"; you are testing for ".*forum.*|" followed by a linefeed, the indentation spaces, and then ".*buy.*". The second alternative therefore only matches text that contains that literal \n and those spaces.

Don't do that. Keep the pattern on one line with no white space. If you must have line breaks in the code, form the pattern using paste, as in

    pat1 <- paste(c("site.*", ".*event.*", ".*free.*", ".*theguardlan.*",
                    ".*guardlink.*", ".*torture.*", ".*forum.*", ".*buy.*",
                    ".*share.*", ".*buttons.*", ".*pyme\\.lavoztx\\.com\\.*",
                    ".*amezon.*", "computrabajo.com.pe", ".*porn.*", "quality"),
                  collapse="|")

    spam <- grepl(pat1, sf$source, ignore.case = TRUE)

Also, it's not immediately clear why you're looping. grepl returns a vector of logicals, and you already have a vector of character strings. Consider replacing the 'if' constructs with 'ifelse' - albeit a complicated ifelse() - and doing the whole thing without a loop.

S Ellison
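
A minimal sketch of both points, under stated assumptions: the data frame name sf and the columns source and fuente come from the original question, but the sample rows, the shortened pattern list, and the labels "spam" and "other" are made up for illustration. It first shows how a line break inside a pattern string changes what is matched, then does the classification with grepl() and ifelse() and no loop.

```r
# A pattern typed across two lines: the second alternative now begins with
# a newline plus the indentation, so it cannot match plain "buy now".
bad <- ".*forum.*|
        .*buy.*"
good <- ".*forum.*|.*buy.*"

bad_hit  <- grepl(bad,  "buy now")   # FALSE
good_hit <- grepl(good, "buy now")   # TRUE

# Hypothetical sample data; only the column names are from the original post.
sf <- data.frame(
  source = c("site1.example", "google", "Free-Share-Buttons.example", "bing"),
  stringsAsFactors = FALSE
)

# Build the pattern from pieces so no whitespace or newline leaks into it.
# (Shortened list, for illustration only.)
pat1 <- paste(c("site.*", ".*event.*", ".*free.*", ".*forum.*",
                ".*buy.*", ".*share.*", ".*buttons.*"),
              collapse = "|")

# grepl() is vectorised: it returns one logical per element of sf$source.
spam <- grepl(pat1, sf$source, ignore.case = TRUE)

# ifelse() is vectorised too, so the whole column is filled without a loop.
# "spam" and "other" are placeholder labels, not from the original post.
sf$fuente <- ifelse(spam, "spam", "other")
```

With more than two categories, the ifelse() calls can be nested, each one testing a different grepl() result; the loop is still unnecessary.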