Hi,

Thanks for the response. Unfortunately, this did not solve my problem; maybe the way I represented my data is the problem. I am not sure whether I can give a link to the data, which would give a clearer representation. If that is not a proper way, I will have to follow my original method.

Regards,
Sri

On Thu, Jul 28, 2016 at 12:56 AM, jeremiah rounds <roundsjerem...@gmail.com> wrote:

> Correction to my code. I created a "doc" variable because I was thinking
> of doing something faster, but I never made the change. grep needed to work
> on the original source "dat" to be used for counting.
>
> Fixed:
>
> combs = structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L, 34L,
> 34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)), .Names = c("V1",
> "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
>
> dat = list(
>   c(77,65,34,23,55, 65,23,77, 44),
>   c(65,23,77,65,55,34, 77, 34,65, 10),
>   c(77,34,65),
>   c(55,78,56),
>   c(98,23,77,65,34, 65, 23, 77, 34))
>
>
> words = unlist(apply(combs, 1 , function(d) paste(as.character(d),
>   collapse=" ")))
> dat = lapply(dat, function(d) paste( as.character(d), collapse= " "))
> #doc = paste(dat, collapse = " ## ") # just some arbitrary separator character that isn't in your words
> counts = sapply(words, function(w) length(grep(w, dat)))
> names(counts) = words
> counts
> cbind(combs, data.frame(N = counts))
>
>
> On Wed, Jul 27, 2016 at 11:27 AM, sri vathsan <srivib...@gmail.com> wrote:
>
>> Hi,
>>
>> It is not just 79 triplets. As I said, there are 79 codes. I am making
>> triplets out of those 79 codes and matching the triplets in the list.
>>
>> Please find the dput of the data below.
>>
>> > dput(head(newd,10))
>> structure(list(uniq_id = c("1", "2", "3", "4", "5", "6", "7",
>> "8", "9", "10"), hi = c("11, 22, 84, 85, 108, 111", "18, 84, 85, 87, 122, 134",
>> "2, 18, 22", "18, 108, 122, 134, 176", "19, 85, 87, 100, 107",
>> "79, 85, 111", "11, 88, 108", "19, 88, 96", "19, 85, 96",
>> "19, 100, 103")), .Names = c("uniq_id", "hi"), row.names = c(NA,
>> -10L), class = c("tbl_df", "tbl", "data.frame"))
>> >
>>
>> I am trying to count the frequency of the triplets in the above data using
>> the below code.
>>
>> # split column into a list (split on a comma plus any spaces, so " 22" and "22" are the same code)
>> myList <- strsplit(newd$hi, split=", *")
>> # get all 3-element combinations (triplets)
>> myCombos <- t(combn(unique(unlist(myList)), 3))
>> # count the records in which all three codes of the triplet are present
>> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
>>   sum(sapply(myList, function(j) {
>>     sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
>> # final matrix
>> final <- cbind(matrix(as.integer(myCombos), nrow(myCombos)), myCounts)
>>
>> I hope I made my point clear. Please let me know if I missed anything.
>>
>> Regards,
>> Sri
>>
>>
>> On Wed, Jul 27, 2016 at 11:19 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>>
>> > You said you had 79 triplets and 8000 records.
>> >
>> > When I compared 100 triplets to 10000 records it took 86 seconds.
>> >
>> > So obviously there is something you're not telling us about the format
>> > of your data.
>> >
>> > If you use dput() to provide actual examples, you will get better
>> > results than if we on Rhelp have to guess. Because we tend to guess in
>> > ways that make the most sense after extensive R experience, and that's
>> > probably not what you have.
>> >
>> > Sarah
>> >
>> > On Wed, Jul 27, 2016 at 1:29 PM, sri vathsan <srivib...@gmail.com> wrote:
>> > > Hi,
>> > >
>> > > Thanks for the solution. But I am afraid that even with this code
>> > > it still takes a long time. It has been an hour and it is still executing. I
>> > > understand the delay, because each triplet has to be compared against almost
>> > > 9000 elements.
>> > >
>> > > Regards,
>> > > Sri
>> > >
>> > > On Wed, Jul 27, 2016 at 9:02 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>> > >>
>> > >> Hi,
>> > >>
>> > >> It's really a good idea to use dput() or some other reproducible way
>> > >> to provide data. I had to guess as to what your data looked like.
>> > >>
>> > >> It appears that order doesn't matter?
>> > >>
>> > >> Given that, here's one approach:
>> > >>
>> > >> combs <- structure(list(V1 = c(65L, 77L, 55L, 23L, 34L), V2 = c(23L, 34L,
>> > >> 34L, 77L, 65L), V3 = c(77L, 65L, 23L, 34L, 55L)), .Names = c("V1",
>> > >> "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
>> > >>
>> > >> dat <- list(
>> > >>   c(77,65,34,23,55),
>> > >>   c(65,23,77,65,55,34),
>> > >>   c(77,34,65),
>> > >>   c(55,78,56),
>> > >>   c(98,23,77,65,34))
>> > >>
>> > >>
>> > >> sapply(seq_len(nrow(combs)), function(i)sum(sapply(dat,
>> > >>   function(j)all(combs[i,] %in% j))))
>> > >>
>> > >> On a dataset of comparable size to yours, it takes me under a minute and a
>> > >> half.
>> > >>
>> > >> > combs <- combs[rep(1:nrow(combs), length=100), ]
>> > >> > dat <- dat[rep(1:length(dat), length=10000)]
>> > >> >
>> > >> > dim(combs)
>> > >> [1] 100   3
>> > >> > length(dat)
>> > >> [1] 10000
>> > >> >
>> > >> > system.time(test <- sapply(seq_len(nrow(combs)),
>> > >> >   function(i)sum(sapply(dat, function(j)all(combs[i,] %in% j)))))
>> > >>    user  system elapsed
>> > >>  86.380   0.006  86.391
>> > >>
>> > >>
>> > >> On Wed, Jul 27, 2016 at 10:47 AM, sri vathsan <srivib...@gmail.com> wrote:
>> > >> > Hi,
>> > >> >
>> > >> > Apologies for the lack of information.
>> > >> >
>> > >> > Basically, myCombos is a matrix with 3 variables; each row is a triplet
>> > >> > that is a combination of the 79 codes. There are around 3 lakh such
>> > >> > combinations, and they look like below.
>> > >> >
>> > >> > V1 V2 V3
>> > >> > 65 23 77
>> > >> > 77 34 65
>> > >> > 55 34 23
>> > >> > 23 77 34
>> > >> > 34 65 55
>> > >> >
>> > >> > Each triplet will be compared against a list (mylist) having 8177 elements,
>> > >> > which looks like below.
>> > >> >
>> > >> > 77,65,34,23,55
>> > >> > 65,23,77,65,55,34
>> > >> > 77,34,65
>> > >> > 55,78,56
>> > >> > 98,23,77,65,34
>> > >> >
>> > >> > Now I want to count the number of occurrences of each triplet in the above
>> > >> > list. I.e., the triplet 65 23 77 is seen 3 times in the list. So my output
>> > >> > looks like below
>> > >> >
>> > >> > V1 V2 V3 Freq
>> > >> > 65 23 77 3
>> > >> > 77 34 65 4
>> > >> > 55 34 23 2
>> > >> >
>> > >> > I hope I made it clear this time.
>> > >> >
>> > >> >
>> > >> > On Wed, Jul 27, 2016 at 7:00 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>> > >> >
>> > >> >> Not entirely sure I understand, but match() is already vectorized, so
>> > >> >> you should be able to lose the sapply(). This would speed things up a lot.
>> > >> >> Please re-read ?match *carefully* .
>> > >> >>
>> > >> >> Bert
>> > >> >>
>> > >> >> On Jul 27, 2016 6:15 AM, "sri vathsan" <srivib...@gmail.com> wrote:
>> > >> >>
>> > >> >> Hi,
>> > >> >>
>> > >> >> I created a list of 3-number combinations (mycombos, around 3 lakh
>> > >> >> combinations) and am counting the occurrences of those combinations in
>> > >> >> another list. This comparison list (mylist) has around 8000 records. I am
>> > >> >> using the following code.
>> > >> >>
>> > >> >> myCounts <- sapply(1:nrow(myCombos), FUN=function(i) {
>> > >> >>   sum(sapply(myList, function(j) {
>> > >> >>     sum(!is.na(match(c(myCombos[i,]), j)))})==3)})
>> > >> >>
>> > >> >> The above code takes a very long time to execute. Is there any other,
>> > >> >> more efficient method which will reduce the time?
>> > >> >> --
>> > >> >>
>> > >> >> Regards,
>> > >> >> Srivathsan.K
>> > >> >>
>>
>> --
>>
>> Regards,
>> Srivathsan.K
>> Phone : 9600165206
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
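
A possible way around the slow loop, offered as a sketch under stated assumptions rather than an answer tested on the full data: instead of checking every possible triplet against every record, list the triplets that each record actually contains and tabulate them. The toy data below reuses the five example records from the thread, written in the same comma-separated form as the "hi" column. The helper names tripletKeys() and tripletCount() are made up for this illustration. It assumes that order does not matter and that a code repeated within one record counts once.

```r
# Toy data: the five example records from the thread, stored as
# comma-separated strings like the "hi" column in the dput() output.
newd <- data.frame(
  uniq_id = as.character(1:5),
  hi = c("77, 65, 34, 23, 55",
         "65, 23, 77, 65, 55, 34",
         "77, 34, 65",
         "55, 78, 56",
         "98, 23, 77, 65, 34"),
  stringsAsFactors = FALSE
)

# Split on commas, strip whitespace, and keep each code once per record,
# sorted so that the same set of codes always yields the same key.
myList <- lapply(strsplit(newd$hi, ","),
                 function(x) sort(unique(as.integer(trimws(x)))))

# Hypothetical helper: all triplets contained in one record, as "a b c" keys.
tripletKeys <- function(codes) {
  if (length(codes) < 3) return(character(0))
  apply(combn(codes, 3), 2, paste, collapse = " ")
}

# One pass over the records; table() counts how many records hold each triplet.
counts <- table(unlist(lapply(myList, tripletKeys)))

# Hypothetical helper: look up one triplet (any order); absent triplets give 0.
tripletCount <- function(triplet) {
  key <- paste(sort(as.integer(triplet)), collapse = " ")
  if (key %in% names(counts)) as.integer(counts[[key]]) else 0L
}

# The three triplets from the desired output in the original question.
combs <- data.frame(V1 = c(65L, 77L, 55L),
                    V2 = c(23L, 34L, 34L),
                    V3 = c(77L, 65L, 23L))
combs$Freq <- apply(combs, 1, tripletCount)
combs
```

On the example this gives Freq values of 3, 4 and 2, the frequencies asked for in the question. The work per record grows as choose(k, 3) for a record with k distinct codes, so records holding very many codes would still be expensive. Separately, as far as I can tell, grep() on the pasted strings only finds a triplet where its three codes sit next to each other in that exact order, which may be why that version did not give the expected counts.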