Re: [R] Identyfing rows with specific conditions
You may run into memory issues if the table is that large, in which case you may need to break your customers into subsets and process each subset separately, then combine the results of the subsets into a single file.

There are a couple of relatively easy ways to speed things up, but I don't know if it will be enough. The simplest is to change from data.frames to matrices:

Meals <- structure(list(mealAcode = c(34L, 89L, 25L, 34L, 25L),
    mealBcode = c(66L, 39L, 77L, 39L, 34L)),
    .Names = c("mealAcode", "mealBcode"),
    row.names = c(NA, -5L), class = "data.frame")
Customers <- structure(list(id = c(15L, 11L, 85L),
    M1 = c(77L, 25L, 89L), M2 = c(34L, 34L, 25L), M3 = c(25L, 39L, 77L)),
    .Names = c("id", "M1", "M2", "M3"),
    class = "data.frame", row.names = c(NA, -3L))

# Convert to plain integer matrices without dimnames
Meals <- as.matrix(Meals)
Meals <- unname(Meals)
Customers <- as.matrix(Customers)
Customers <- unname(Customers)

Results <- matrix(nrow=0, ncol=3)
k <- 0
for (i in seq_len(nrow(Meals))) {
  for (j in seq_len(nrow(Customers))) {
    # customer j matches if both meal codes appear among that customer's meals
    if (any(Customers[j, -1] %in% Meals[i, 1]) && any(Customers[j, -1] %in% Meals[i, 2])) {
      k <- k+1
      Results <- rbind(Results, c(Meals[i, ], Customers[j, 1]))
    }
  }
}
colnames(Results) <- c("mealAcode", "mealBcode", "id")
Results

This should process quite a bit faster, but it is still slowed down by the fact that we are allocating memory for each match. If we pre-allocate the memory, it is faster, but you have to know how much to allocate:

# Customers and Meals are the matrices created above
Customers <- unname(Customers)
Meals <- unname(Meals)
Results <- matrix(nrow=100, ncol=3)
k <- 0
for (i in seq_len(nrow(Meals))) {
  for (j in seq_len(nrow(Customers))) {
    if (any(Customers[j, -1] %in% Meals[i, 1]) && any(Customers[j, -1] %in% Meals[i, 2])) {
      k <- k+1
      Results[k, ] <- c(Meals[i, ], Customers[j, 1])
    }
  }
}
colnames(Results) <- c("mealAcode", "mealBcode", "id")
# drop the pre-allocated rows that were never filled
Results <- Results[seq_len(k), , drop=FALSE]
Results

This pre-allocates space for 100 rows, which is plenty for the example; for your data you would need room for millions of rows. It should be even faster, but it will fail with a "subscript out of bounds" error if there are more matches than rows allocated, so guess high.
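The "break your customers into subsets" idea at the start of this message can be sketched as follows. This is only an illustration, not tested on data of the real size: the helper name match_block and the chunk_size value are made up for the example, and the inner loop over customers is replaced by a vectorized comparison with rowSums(), which is a different technique from the loops above.

```r
# Toy data from this thread, stored directly as integer matrices
Meals <- cbind(mealAcode = c(34L, 89L, 25L, 34L, 25L),
               mealBcode = c(66L, 39L, 77L, 39L, 34L))
Customers <- cbind(id = c(15L, 11L, 85L),
                   M1 = c(77L, 25L, 89L),
                   M2 = c(34L, 34L, 25L),
                   M3 = c(25L, 39L, 77L))

# Hypothetical helper: match every meal pair against one block of customers
match_block <- function(meals, cust) {
  ids   <- cust[, 1]
  codes <- cust[, -1, drop = FALSE]
  out <- lapply(seq_len(nrow(meals)), function(i) {
    # TRUE for each customer in the block who had meal A / meal B
    hasA <- rowSums(codes == meals[i, 1], na.rm = TRUE) > 0
    hasB <- rowSums(codes == meals[i, 2], na.rm = TRUE) > 0
    hit  <- ids[hasA & hasB]
    cbind(mealAcode = rep(meals[i, 1], length(hit)),
          mealBcode = rep(meals[i, 2], length(hit)),
          id = hit)
  })
  do.call(rbind, out)
}

# Assumed block size; a real run would use something much larger
chunk_size <- 2L
starts <- seq(1L, nrow(Customers), by = chunk_size)
pieces <- lapply(starts, function(s) {
  rows <- s:min(s + chunk_size - 1L, nrow(Customers))
  match_block(Meals, Customers[rows, , drop = FALSE])
})

# Combine the per-block results into one matrix
Results <- do.call(rbind, pieces)
Results
```

Each element of pieces could just as well be written to its own file and combined later, which keeps the memory needed at any one time down to a single block.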
There are some specialized packages in R, such as data.table and dplyr, that might be even faster. Also, package parallel could use parallel processing to speed things up.

David C

From: Allaisone 1 [mailto:allaiso...@hotmail.com]
Sent: Wednesday, May 24, 2017 7:54 AM
To: David L Carlson; Bert Gunter
Cc: r-help@r-project.org
Subject: Re: [R] Identyfing rows with specific conditions

Dear David ..,

Many thanks for spending the time and effort solving this problem. I liked your suggestion to generate the results in the first format, as this will be very helpful for further analysis:

#   mealAcode mealBcode id
# 1        25        77 15
# 2        25        77 85
# 3        34        39 11
# 4        25        34 15
# 5        25        34 11

I wonder if there is any method to process the analysis faster. The end output will be a very large table. I already know from my previous analysis that, for example, around 20,000 customers take one of these combinations, about 17,000 customers take another combination of meals, and so on. Considering that I have 2000 rows (2000 combinations) and that the number of customers per combination is very high (20,000 or lower), this will generate a huge table with millions of rows, which is very complex. I have tested the code just to see and, as expected, it ran for a long time (hours) before I stopped the analysis.

From: David L Carlson
Sent: 23 May 2017 18:20:02
To: Allaisone 1; Bert Gunter
Cc: r-help@r-project.org
Subject: RE: [R] Identyfing rows with specific conditions

You will generally get a better response if you create a data set that we can test our ideas on. You sketched out the data but your final results include meal combinations that are not in your Meals data frame. Also, using dput() to provide the data makes it easier to recreate the data, since just printing it loses important information such as whether the numbers are integer or decimal, or whether the character variables are characters or factors. Also send your message as plain text, not html, which messes up columns and other details.

Here is a modification of your data with the beginnings of a solution:

Meals <- structure(list(mealAcode = c(34L, 89L, 25L, 34L, 25L),
    mealBcode = c(66L, 39L, 77L, 39L, 34L)),
    .Names = c("mealAcode", "mealBcode"),
    row.names = c(NA, -5L), class = "data.frame")
Meals
#   mealAcode mealBcode
# 1        34        66
# 2        89        39
# 3        25        77
# 4        34        39
# 5        25        34

Customers <- structure(l
Re: [R] Identyfing rows with specific conditions
You will generally get a better response if you create a data set that we can test our ideas on. You sketched out the data but your final results include meal combinations that are not in your Meals data frame. Also, using dput() to provide the data makes it easier to recreate the data, since just printing it loses important information such as whether the numbers are integer or decimal, or whether the character variables are characters or factors. Also send your message as plain text, not html, which messes up columns and other details.

Here is a modification of your data with the beginnings of a solution:

Meals <- structure(list(mealAcode = c(34L, 89L, 25L, 34L, 25L),
    mealBcode = c(66L, 39L, 77L, 39L, 34L)),
    .Names = c("mealAcode", "mealBcode"),
    row.names = c(NA, -5L), class = "data.frame")
Meals
#   mealAcode mealBcode
# 1        34        66
# 2        89        39
# 3        25        77
# 4        34        39
# 5        25        34

Customers <- structure(list(id = c(15L, 11L, 85L),
    M1 = c(77L, 25L, 89L), M2 = c(34L, 34L, 25L), M3 = c(25L, 39L, 77L)),
    .Names = c("id", "M1", "M2", "M3"),
    class = "data.frame", row.names = c(NA, -3L))
Customers
#   id M1 M2 M3
# 1 15 77 34 25
# 2 11 25 34 39
# 3 85 89 25 77

Results <- data.frame(mealAcode=NA, mealBcode=NA, id=NA)
k <- 0
for (i in seq_len(nrow(Meals))) {
  for (j in seq_len(nrow(Customers))) {
    # customer j matches if both meal codes appear among that customer's meals
    if (any(Customers[j, -1] %in% Meals[i, 1]) && any(Customers[j, -1] %in% Meals[i, 2])) {
      k <- k+1
      Results[k, ] <- cbind(Meals[i, ], id=Customers[j, 1])
    }
  }
}
rownames(Results) <- NULL
Results
#   mealAcode mealBcode id
# 1        25        77 15
# 2        25        77 85
# 3        34        39 11
# 4        25        34 15
# 5        25        34 11

Results is a data frame that can be modified to get various forms of output. You mention a data frame with ids in separate columns. That might be unwieldy for many types of analysis. The following code shows a list of the ids for each meal combination, the number of customers who consumed each combination, and a matrix similar to the one you provided.
Results.lst <- split(Results$id, paste(Results[ , 1], Results[ , 2], sep=" - "))
Results.lst
# $`25 - 34`
# [1] 15 11
#
# $`25 - 77`
# [1] 15 85
#
# $`34 - 39`
# [1] 11

table(Results[, 1:2])
#          mealBcode
# mealAcode 34 39 77
#        25  2  0  2
#        34  0  1  0

# maxid is the number of id columns; each row is padded with NA on the right
maxid <- 5
t(sapply(Results.lst, function(x) c(sort(as.vector(x)), rep(NA, maxid-length(as.vector(x))))))
#         [,1] [,2] [,3] [,4] [,5]
# 25 - 34   11   15   NA   NA   NA
# 25 - 77   15   85   NA   NA   NA
# 34 - 39   11   NA   NA   NA   NA

David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Allaisone 1
Sent: Monday, May 22, 2017 4:41 PM
To: Bert Gunter
Cc: r-help@r-project.org
Subject: Re: [R] Identyfing rows with specific conditions

Dear Bert

I have answered your questions in my last 2 messages. If you did not see them, I will answer again. The 2 tables are both data.frames and the order of the meals does not matter. A meal cannot be repeated for a person; they are all different. The missing values are at the right end of each row, as each row starts with the meal codes. This task is just a small part of a long 2-year project.

Regards

____________
From: Bert Gunter
Sent: 22 May 2017 14:35:49
To: Allaisone 1
Cc: r-help@r-project.org
Subject: Re: [R] Identyfing rows with specific conditions

You haven't said whether your "table" is a matrix or data frame. Presumably the latter. Nor have you answered my question about whether order of your meal code pairs matters.

Another question: can meals be replicated for an ID or are they all different?

Finally, is this a homework assignment or class project of some sort? Or is it a real task -- i.e., what is the context?

Again, be sure to cc the list.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, May 22, 2017 at 1:56 AM, Allaisone 1 wrote:
> Hi Bert ..,
>
> The number of meals differs from one customer to another. You may find
> one customer with only one meal and another one with 2, 3 or even, rarely, 30
> meals. You may also find no meal at all for some customers, so the entire
> row takes the missing value "\N". Any row starts with the meal codes
> first, then all missing values are at the right end of the table.
>
> From: Bert Gunter
> Sent: 22 M
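For comparison, the nested loop in David's reply above can also be written as two joins against a long-format copy of Customers, using base R's merge(). This is a sketch under assumptions rather than anything proposed in the thread: the names Long, hasA, both and Joined are invented here, and it relies on the statement that a meal is never repeated for the same customer (otherwise the joins would produce duplicate rows). On the full data the intermediate tables may be large.

```r
Meals <- data.frame(mealAcode = c(34L, 89L, 25L, 34L, 25L),
                    mealBcode = c(66L, 39L, 77L, 39L, 34L))
Customers <- data.frame(id = c(15L, 11L, 85L),
                        M1 = c(77L, 25L, 89L),
                        M2 = c(34L, 34L, 25L),
                        M3 = c(25L, 39L, 77L))

# One row per (id, meal); unlist() goes column by column, so the ids repeat
# once per meal column. Missing meals (NA) are dropped.
Long <- data.frame(id   = rep(Customers$id, times = ncol(Customers) - 1L),
                   meal = unlist(Customers[-1], use.names = FALSE))
Long <- Long[!is.na(Long$meal), ]

# Step 1: every (meal pair, customer) where the customer had meal A
hasA <- merge(Meals, Long, by.x = "mealAcode", by.y = "meal")

# Step 2: keep only those where the same customer also had meal B
both <- merge(hasA, Long,
              by.x = c("mealBcode", "id"), by.y = c("meal", "id"))

# Restore the column order used in the thread and sort for readability
Joined <- both[order(both$mealAcode, both$mealBcode, both$id),
               c("mealAcode", "mealBcode", "id")]
rownames(Joined) <- NULL
Joined
```

The rows are the same five matches as in Results above, only in a different order.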
Re: [R] Identyfing rows with specific conditions
You haven't said whether your "table" is a matrix or data frame. Presumably the latter. Nor have you answered my question about whether order of your meal code pairs matters.

Another question: can meals be replicated for an ID or are they all different?

Finally, is this a homework assignment or class project of some sort? Or is it a real task -- i.e., what is the context?

Again, be sure to cc the list.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, May 22, 2017 at 1:56 AM, Allaisone 1 wrote:
> Hi Bert ..,
>
> The number of meals differs from one customer to another. You may find
> one customer with only one meal and another one with 2, 3 or even, rarely, 30
> meals. You may also find no meal at all for some customers, so the entire
> row takes the missing value "\N". Any row starts with the meal codes
> first, then all missing values are at the right end of the table.
>
> From: Bert Gunter
> Sent: 22 May 2017 03:11:11
> To: Allaisone 1
> Cc: r-help@r-project.org
> Subject: Re: [R] Identyfing rows with specific conditions
>
> Clarification:
>
> Does each customer have the same number of meals or do they differ
> from customer to customer? If the latter, how are missing meals
> notated? Do they always occur at the (right) end or can they occur
> anywhere in the row?
>
> Presumably each customer ID can have many different meal code
> combinations, right? (since they can have 30 different meals with
> potentially 30 choose 2 = 435 combinations apiece)
>
> Please make sure you reply to the list, not just to me, as I may not
> pursue this further but am just trying to clarify for anyone else who
> may wish to help.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sun, May 21, 2017 at 5:10 PM, Allaisone 1 wrote:
>>
>> Hi All..,
>>
>> I have 2 tables. The first one, called "Meals", contains 2 columns with
>> the headers, say, "meal A code" & "meal B code". It has 2000 rows, each
>> with a different combination of meals (unique combination per row).
>>
>> > Meals
>>   meal A code  meal B code
>> 1          34           66
>> 2          89           39
>> 3          25           77
>>
>> The second table (Customers) shows customer ids in the first column with
>> meal codes (M) next to each customer. There are about 300,000 customers
>> (300,000 rows).
>>
>> > Customers
>>    1  2  3  4 ..30
>>   id M1 M2 M3
>> 1 15 77 34 25
>> 2 11 25 34 39
>> 3 85 89 25 77
>> .
>> .
>> 300,000
>>
>> I would like to identify all customer ids who have had each meal
>> combination in the first table, so the final output would be the first
>> table with ids attached next to each meal combination in each row, like this:
>>
>> > IdsMeals
>>   MAcode MBcode ids
>> 1     34     39 11
>> 2     25     34 15 11
>> 3     25     77 15 85
>>
>> Would you please suggest any solutions to this problem?
>>
>> Regards
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
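Since the missing meals are written as "\N" in the source table, one way to get them into R as NA is the na.strings argument of read.table(). The lines below are a made-up excerpt for illustration only (the real file layout, separator and column names are not shown anywhere in this thread); the point is just that once the markers are NA, the meal columns stay integer and the number of meals per customer is easy to count.

```r
# Hypothetical excerpt of the customers file; "\N" marks a missing meal
raw <- c("id M1 M2 M3",
         "15 77 34 25",
         "11 25 34 \\N",
         "85 \\N \\N \\N")

# Treat the literal string \N as a missing value
Customers <- read.table(text = raw, header = TRUE, na.strings = "\\N")
Customers

# Number of meals recorded for each customer
n_meals <- rowSums(!is.na(Customers[-1]))
n_meals
```

Because a meal code in Meals is never NA, a test such as `Meals[i, 1] %in% Customers[j, -1]` ignores the NA entries, so customers with few or no meals need no special handling in the matching step.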
Re: [R] Identyfing rows with specific conditions
Hi Again..,

Both of my tables are data.frames and the order of meals does not matter. Meal A = 2 and Meal B = 15 is the same as Meal A = 15 and Meal B = 2.

From: Bert Gunter
Sent: 22 May 2017 03:19:57
To: Allaisone 1
Cc: r-help@r-project.org
Subject: Re: [R] Identyfing rows with specific conditions

More clarification:

Are your "tables" matrices or data frames? (If you don't know what this means, you need to spend a little time with, e.g., a web tutorial to learn).

Also, does Meal A Meal B order count? -- i.e. is Meal A = 2, Meal B = 15 the same as Meal A = 15 and Meal B = 2? This is important.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
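Because the order within a pair does not matter, it can help to put each pair into a canonical order (smaller code first) before comparing or de-duplicating pairs. A small sketch with invented codes follows; the names lo, hi, key and UniqueMeals are only for this example. A test of the form "customer has meal A and customer has meal B" is already symmetric, so this step matters mainly for spotting pairs that are listed twice in opposite order.

```r
# Made-up pairs: rows 1 and 2 are the same combination in opposite order
Meals <- data.frame(mealAcode = c(2L, 15L, 34L),
                    mealBcode = c(15L, 2L, 39L))

# Canonical form: smaller code first, larger code second
lo  <- pmin(Meals$mealAcode, Meals$mealBcode)
hi  <- pmax(Meals$mealAcode, Meals$mealBcode)
key <- paste(lo, hi, sep = " - ")
key

# Keep the first occurrence of each unordered pair
UniqueMeals <- Meals[!duplicated(key), ]
UniqueMeals
```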
Re: [R] Identyfing rows with specific conditions
More clarification:

Are your "tables" matrices or data frames? (If you don't know what this means, you need to spend a little time with, e.g., a web tutorial to learn).

Also, does Meal A Meal B order count? -- i.e. is Meal A = 2, Meal B = 15 the same as Meal A = 15 and Meal B = 2? This is important.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
Re: [R] Identyfing rows with specific conditions
Clarification:

Does each customer have the same number of meals or do they differ from customer to customer? If the latter, how are missing meals notated? Do they always occur at the (right) end or can they occur anywhere in the row?

Presumably each customer ID can have many different meal code combinations, right? (since they can have 30 different meals with potentially 30 choose 2 = 435 combinations apiece)

Please make sure you reply to the list, not just to me, as I may not pursue this further but am just trying to clarify for anyone else who may wish to help.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
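The "30 choose 2 = 435" figure can be checked directly in R, and combn() lists the unordered pairs for one customer. The three meal codes below are those of customer 15 in the example data posted elsewhere in this thread; the variable names are just for illustration.

```r
# Number of unordered pairs among 30 distinct meals
n_pairs <- choose(30, 2)
n_pairs

# All unordered pairs for one customer with meals 77, 34 and 25;
# sorting first puts the smaller code in the first column
meals <- c(77L, 34L, 25L)
pairs <- t(combn(sort(meals), 2))
colnames(pairs) <- c("mealAcode", "mealBcode")
pairs
```

With up to 30 meals per customer, each customer contributes at most 435 pairs, which bounds the work needed per customer if pairs are enumerated this way.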