Hello, I'm glad it helped. See answer inline.
On 03-07-2012 17:09, Claudia Penaloza wrote:
Thank you Rui and Jim, both 'i1' and 'i1new' worked perfectly because there are no instances of 'Dd' or 'dD' in the data set (that I would/not want to include/exclude)... but I understand that 'i1new' targets precisely what I want. Why isn't a leading run of zeros required for either 'i1' or 'i1new', like so?

i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", dd$ch)
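Because both 'i1' and 'i1new' test from the beginning ('^') to the end ('$') of the string, allowing only '0' together with 'd' or 'D' in between ('i1new' allows either 'd' or 'D', but not both). Since '0' is already in the character class, any leading zeros are matched by it.
So, there's no need to explicitly test for a string that begins with '0'.

Rui Barradas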
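As a small check of the point above, here is a sketch on a few made-up strings (the vector 'ch' below is hypothetical, not the OP's data). It compares 'i1', 'i1new' and the proposed 'i1newer'. One side effect of the explicit zero prefix is worth noting: 'i1newer' only accepts zeros before the run of 'D' or 'd', so a string with zeros after the letter no longer matches.

```r
# Hypothetical test strings, made up for illustration (not the OP's data)
ch <- c("000D00", "00d000", "0Dd000", "00T0D0", "DDDD", "0000")

# 'i1': any mix of '0', 'D' and 'd' over the whole string
i1 <- grepl("^([0D]|[0d])*$", ch)

# 'i1new': only '0' and 'D', or only '0' and 'd', over the whole string
i1new <- grepl("^([0D]*$|[0d]*$)", ch)

# 'i1newer': zeros first, then only 'D's (or only 'd's) up to the end
i1newer <- grepl("^0{0,}[D]*$|^0{0,}[d]*$", ch)

# Side-by-side comparison, one row per test string
res <- data.frame(ch, i1, i1new, i1newer)
print(res)
```

In this sketch '0Dd000' is TRUE for 'i1' but FALSE for 'i1new', and '000D00' is TRUE for 'i1new' but FALSE for 'i1newer', because of the zeros that follow the 'D'.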
Thank you again,
Claudia

On Tue, Jul 3, 2012 at 2:06 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

Hello,

Inline.

On 03-07-2012 01:15, jim holtman wrote:

You will have to change the 'i1' expression as follows:

i1 <- grepl("^([0D]|[0d])*$", dd$ch)
i1  # matches strings with d & D in them
[1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# second string had 'd' & 'D' in it so it was TRUE above and FALSE below
i1new <- grepl("^([0D]*$|[0d]*$)", dd$ch)
i1new
[1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Right, apparently, I forgot that grep is greedy, and the test cases were not complete.

I put a 'd' and 'D' in the second string and the original regular expression is equivalent to

grepl("^[0dD]*$", dd$ch)

This is only for the first request, and does not solve cases where there are characters other than '0', 'd' or 'D', but 'd' or 'D' is the first non-zero character. This is the case of my 4th row, changed from the OP's data example. My regular expression for 'i2' is equivalent to this one, which I believe is more readable:

i2b <- grepl("^0{0,}[Dd]", dd$ch)

First a zero, which may occur zero or more times, then a 'd' or 'D'; whatever follows, up to the end of the string, is irrelevant.

which will match strings containing d, D and 0. If you only want 'd' or 'D' (and not both), then you will have to use the one in 'i1new'.

To the OP: bottom line, use Jim's 'i1new' and my 'i2' or 'i2b'.

Rui Barradas

On Mon, Jul 2, 2012 at 7:24 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

Hello,

Try regular expressions instead. In this data.frame, I've changed row nr. 4 so that it has 'D' as its first non-zero character.
dd <- read.table(text="
ch count
1 0000000000D0000000000000000000000000000000000000 0.007368
2 0000000000d0000000000000000000000000000000000000 0.002456
3 000000000T00000000000000000000000000000000000000 0.007368
4 000000000DT0000000000000000000000000000000000000 0.007368
5 000000000T00000000000000000000000000000000000000 0.002456
6 000000000Td0000000000000000000000000000000000000 0.002456
7 00000000T000000000000000000000000000000000000000 0.007368
8 00000000T0D0000000000000000000000000000000000000 0.007368
9 00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456
", header=TRUE)
dd

i1 <- grepl("^([0D]|[0d])*$", dd$ch)
i2 <- grepl("^0*[Dd]", dd$ch)

dd[!i1, ]
dd[!i2, ]
dd[!(i1 | i2), ]

Hope this helps,

Rui Barradas

On 02-07-2012 23:48, Claudia Penaloza wrote:

I would like to remove rows from the following data frame (df) if only two specific elements are found in the df$ch character string (I want to remove rows with only "0" & "D" or "0" & "d"). Alternatively, I would like to remove rows if the first non-zero element is "D" or "d".
   ch                                               count
1  0000000000D0000000000000000000000000000000000000 0.007368
2  0000000000d0000000000000000000000000000000000000 0.002456
3  000000000T00000000000000000000000000000000000000 0.007368
4  000000000TD0000000000000000000000000000000000000 0.007368
5  000000000T00000000000000000000000000000000000000 0.002456
6  000000000Td0000000000000000000000000000000000000 0.002456
7  00000000T000000000000000000000000000000000000000 0.007368
8  00000000T0D0000000000000000000000000000000000000 0.007368
9  00000000T000000000000000000000000000000000000000 0.002456
10 00000000T0d0000000000000000000000000000000000000 0.002456

I tried the following but it doesn't work if there is more than one character per string:

df <- df[!df$ch %in% c("0","D"),]
df <- df[!df$ch %in% c("0","d"),]

Any help greatly appreciated,
Claudia
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.