Sarah,

Thank you very much. For the other variables I was trying to do the same job in a different way, because it is easier to list the values.
Example:

test <- which(dat$var1 != "BAA" | dat$var1 != "FAG")
dat <- dat[-test, ]

and I did not get the right result. What am I missing here?

On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> On Wed, Nov 11, 2015 at 8:44 PM, Ashta <sewa...@gmail.com> wrote:
> > Hi Sarah,
> >
> > I used the following to clean my data, the program crashed several times.
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
> >
> > What is the difference between these two
> >
> > test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN", ]
>
> Besides that you're using %in% wrong? I told you how to proceed.
>
> myvalues <- c("YYZ", "MSN")
>
> test <- subset(dat, Var1 %in% myvalues)
>
> > subset(dat, Var1 %in% myvalues)
>   X Var1 Freq
> 3 3  MSN 1040
> 4 4  YYZ  300
>
> >
> > On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> >>
> >> Please keep replies on the list so others may participate in the
> >> conversation.
> >>
> >> If you have a character vector containing the potential values, you
> >> might look at %in% for one approach to subsetting your data.
> >>
> >> Var1 %in% myvalues
> >>
> >> Sarah
> >>
> >> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewa...@gmail.com> wrote:
> >> > Thank you Sarah for your prompt response!
> >> >
> >> > I have the list of values of the variable Var1 it is around 20.
> >> > How can I modify this one to include all the 20 valid values?
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
> >> >
> >> > Is there a way (efficient) of doing it?
> >> >
> >> > Thank you again
> >> >
> >> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewa...@gmail.com> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > I have a data frame with huge rows and columns.
> >> >> >
> >> >> > When I looked at the data, it has several garbage values need to be
> >> >> > cleaned. For a sample I am showing you the frequency distribution
> >> >> > of one variables
> >> >> >
> >> >> >     Var1 Freq
> >> >> > 1      :    3
> >> >> > 2      ]    6
> >> >> > 3    MSN 1040
> >> >> > 4    YYZ  300
> >> >> > 5     \\    4
> >> >> > 6      +    3
> >> >> > 7.    ?>   15
> >> >>
> >> >> Please use dput() to provide your data. I made a guess at what you had
> >> >> in R, but could be wrong.
> >> >>
> >> >> > and continues.
> >> >> >
> >> >> > I want to keep those rows that contain only a valid variable value
> >> >> >
> >> >> > In this case MSN and YYZ. I tried the following
> >> >> >
> >> >> > *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
> >> >> >
> >> >> > but I am not getting the desired result.
> >> >>
> >> >> What are you getting? How does it differ from the desired result?
> >> >>
> >> >> > I have
> >> >> >
> >> >> > Any help or idea?
> >> >>
> >> >> I get:
> >> >>
> >> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
> >> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
> >> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >> >
> >> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >> > test
> >> >>   X Var1 Freq
> >> >> 3 3  MSN 1040
> >> >> 4 4  YYZ  300
> >> >>
> >> >> Which seems reasonable to me.
> >> >>
> >> >> >
> >> >> > [[alternative HTML version deleted]]
> >> >>
> >> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> >> your message.
> >> >>
> >> >> Sarah
> >> >>

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
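
A note on the which() attempt at the top of this message, offered as a sketch rather than a definitive answer. A likely cause is the combination of != with |: any single value differs from at least one of "BAA" and "FAG", so the condition is TRUE for every non-missing row, which() returns all row numbers, and dat[-test, ] drops everything. The toy data below is made up for illustration; only the column name var1 and the codes BAA and FAG come from the question, and the assumption is that BAA and FAG are the values to keep.

```r
# Toy data (hypothetical): two valid codes plus some garbage values.
dat <- data.frame(var1 = c("BAA", "FAG", ":", "]", "BAA"),
                  Freq = c(10L, 20L, 3L, 6L, 5L),
                  stringsAsFactors = FALSE)

# Each value differs from at least one of the two codes, so OR-ing the
# two != tests gives TRUE for every row.
always_true <- dat$var1 != "BAA" | dat$var1 != "FAG"

# which() then returns every row index, and negative indexing removes all rows.
test <- which(always_true)
emptied <- dat[-test, ]

# Listing the valid codes once and using %in% keeps only the wanted rows.
valid <- c("BAA", "FAG")
kept <- dat[dat$var1 %in% valid, ]

# Equivalent with != : a row is bad only if it differs from BOTH codes (&).
bad <- dat$var1 != "BAA" & dat$var1 != "FAG"
kept2 <- dat[!bad, ]
```

Indexing with the logical vector directly (dat[!bad, ]) also sidesteps a separate pitfall: if which() finds no matching rows it returns integer(0), and dat[-integer(0), ] returns zero rows instead of leaving the data unchanged.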