Re: [R] Identyfing rows with specific conditions

2017-05-24 Thread David L Carlson
You may run into memory issues if the table is that large, in which case you 
may need to break your customers into subsets and process each subset 
separately. Then combine the results of the subsets into a single file. 
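
A minimal sketch of the subset-and-combine idea, using the toy data from this thread. The helper match_chunk() and the chunk_size value are hypothetical names introduced only for illustration; with the real data each chunk's result could be written to disk rather than kept in a list.

```r
# Toy data from the thread
Meals <- data.frame(mealAcode = c(34L, 89L, 25L, 34L, 25L),
                    mealBcode = c(66L, 39L, 77L, 39L, 34L))
Customers <- data.frame(id = c(15L, 11L, 85L),
                        M1 = c(77L, 25L, 89L),
                        M2 = c(34L, 34L, 25L),
                        M3 = c(25L, 39L, 77L))

# Hypothetical helper: all (mealA, mealB, id) matches for one block of customers
match_chunk <- function(customers, meals) {
  eaten <- as.matrix(customers[, -1, drop = FALSE])
  out <- lapply(seq_len(nrow(meals)), function(i) {
    pair <- c(meals$mealAcode[i], meals$mealBcode[i])
    # TRUE for customers whose row contains both meal codes
    hit <- apply(eaten, 1, function(m) all(pair %in% m))
    ids <- customers$id[hit]
    data.frame(mealAcode = rep(pair[1], length(ids)),
               mealBcode = rep(pair[2], length(ids)),
               id = ids)
  })
  do.call(rbind, out)
}

chunk_size <- 2L   # illustrative; far larger for 300,000 customers
chunk_id <- ceiling(seq_len(nrow(Customers)) / chunk_size)
pieces <- lapply(split(Customers, chunk_id), match_chunk, meals = Meals)

# Combine the per-chunk results into a single table
Results <- do.call(rbind, pieces)
rownames(Results) <- NULL
Results
```

Rows come out grouped by chunk rather than by meal pair, so sort afterwards if the order matters.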

There are a couple of relatively easy ways to speed things up, but I don't know 
if it will be enough. The simplest is to change from data.frames to matrices:

Meals <- structure(list(mealAcode = c(34L, 89L, 25L, 34L, 25L), mealBcode = 
c(66L, 
39L, 77L, 39L, 34L)), .Names = c("mealAcode", "mealBcode"), row.names = c(NA, 
-5L), class = "data.frame")

Customers <- structure(list(id = c(15L, 11L, 85L), M1 = c(77L, 25L, 89L), 
M2 = c(34L, 34L, 25L), M3 = c(25L, 39L, 77L)), .Names = c("id", 
"M1", "M2", "M3"), class = "data.frame", row.names = c(NA, -3L
))

Meals <- as.matrix(Meals)
Meals <- unname(Meals)
Customers <- as.matrix(Customers)
Customers <- unname(Customers)

Results <- matrix(nrow=0, ncol=3)
for (i in seq_len(nrow(Meals))) {
  for (j in seq_len(nrow(Customers))) {
    # keep the pair if this customer has had both meals
    if (any(Customers[j, -1] %in% Meals[i, 1]) &&
        any(Customers[j, -1] %in% Meals[i, 2])) {
      Results <- rbind(Results, c(Meals[i, ], Customers[j, 1]))
    }
  }
}
colnames(Results) <- c("mealAcode", "mealBcode", "id")
Results

This should process quite a bit faster, but is still slowed down by the fact 
that we are allocating memory for each match. If we pre-allocate the memory, it 
is faster, but you have to know how much to allocate:

Customers <- unname(Customers)
Meals <- unname(Meals)
Results <- matrix(nrow=100, ncol=3)
k <- 0
for (i in seq_len(nrow(Meals))) {
  for (j in seq_len(nrow(Customers))) {
    if (any(Customers[j, -1] %in% Meals[i, 1]) &&
        any(Customers[j, -1] %in% Meals[i, 2])) {
      k <- k+1
      Results[k, ] <- c(Meals[i, ], Customers[j, 1])
    }
  }
}
colnames(Results) <- c("mealAcode", "mealBcode", "id")
# drop the pre-allocated rows that were never filled
Results <- Results[seq_len(k), , drop=FALSE]
Results

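This pre-allocates space for 100 rows, which is plenty for the example; for the 
full data you would need room for millions of rows. It should be even faster, 
but it will fail if there are more matches than allocated rows, so guess high. 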
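
One way to avoid guessing the size, offered as a sketch and not as part of the original answer, is to collect each meal pair's matches in a list and bind them once at the end. The inner loop over customers is replaced by a vectorized test; the names eaten and pieces are illustrative.

```r
# Toy data from the thread as unnamed integer matrices
Meals <- cbind(c(34L, 89L, 25L, 34L, 25L), c(66L, 39L, 77L, 39L, 34L))
Customers <- cbind(c(15L, 11L, 85L), c(77L, 25L, 89L),
                   c(34L, 34L, 25L), c(25L, 39L, 77L))

eaten <- Customers[, -1, drop = FALSE]      # meal codes only, ids removed
pieces <- vector("list", nrow(Meals))       # one slot per meal pair

for (i in seq_len(nrow(Meals))) {
  # which customers have meal A, and which have meal B (NAs count as no)
  hasA <- rowSums(eaten == Meals[i, 1], na.rm = TRUE) > 0
  hasB <- rowSums(eaten == Meals[i, 2], na.rm = TRUE) > 0
  ids <- Customers[hasA & hasB, 1]
  pieces[[i]] <- cbind(mealAcode = rep(Meals[i, 1], length(ids)),
                       mealBcode = rep(Meals[i, 2], length(ids)),
                       id = ids)
}

# a single rbind at the end instead of one per match
Results <- do.call(rbind, pieces)
Results
```

The list has one element per row of Meals, so its length is known in advance even though the number of matches is not.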

There are some specialized R packages, such as data.table and dplyr, that 
might be even faster. Also, the parallel package could use parallel processing 
to speed things up.
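
A sketch of the join idea behind those packages, written with base R's merge() so that it runs without extra installs. This is an illustration under stated assumptions, not code from the thread: the long-format table long and the intermediate hasA are names made up here, and hasA can itself become large on the full data.

```r
# Toy data from the thread
Meals <- data.frame(mealAcode = c(34L, 89L, 25L, 34L, 25L),
                    mealBcode = c(66L, 39L, 77L, 39L, 34L))
Customers <- data.frame(id = c(15L, 11L, 85L),
                        M1 = c(77L, 25L, 89L),
                        M2 = c(34L, 34L, 25L),
                        M3 = c(25L, 39L, 77L))

# Reshape to one row per (id, meal); unlist() walks the meal columns in order
long <- data.frame(id = rep(Customers$id, times = ncol(Customers) - 1),
                   meal = unlist(Customers[-1], use.names = FALSE))
long <- long[!is.na(long$meal), ]   # customers with fewer meals have NAs

# First join: every customer who has had meal A of each pair
hasA <- merge(Meals, long, by.x = "mealAcode", by.y = "meal")

# Second join: keep only those who have also had meal B
both <- merge(hasA, long,
              by.x = c("mealBcode", "id"), by.y = c("meal", "id"))

Results <- both[order(both$mealAcode, both$mealBcode, both$id),
                c("mealAcode", "mealBcode", "id")]
rownames(Results) <- NULL
Results
```

The same two joins could be expressed with data.table or dplyr if base merge() turns out to be too slow.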


David C

From: Allaisone 1 [mailto:allaiso...@hotmail.com] 
Sent: Wednesday, May 24, 2017 7:54 AM
To: David L Carlson ; Bert Gunter 
Cc: r-help@r-project.org
Subject: Re: [R] Identyfing rows with specific conditions

Dear David ..,

Many thanks for spending the time and effort solving this problem.

I liked your suggestion to generate the results with the first format
as this will be very helpful for further analysis :
#   mealAcode mealBcode id
# 1        25        77 15
# 2        25        77 85
# 3        34        39 11
# 4        25        34 15
# 5        25        34 11

I wonder if there is any method to process the analysis faster. The end output 
will be a very large table: I already know from my previous analysis that, for 
example, around 20,000 customers take one of these combinations, about 17,000 
customers take another combination of meals, and so on. Considering that I have 
2000 rows (2000 combinations) and that the number of customers per combination 
is very high (20,000 or lower), this will generate a huge table with millions 
of rows, which is very complex. I tested the code just to see and, as expected, 
it ran for a long time (hours) before I stopped the analysis. 


Re: [R] Identyfing rows with specific conditions

2017-05-23 Thread David L Carlson
You will generally get a better response if you create a data set that we can 
test our ideas on. You sketched out the data but your final results include 
meal combinations that are not in your Meals data frame. Also using dput() to 
provide the data makes it easier to recreate the data since just printing it 
loses important information such as if the numbers are integer or decimal, or 
if the character variables are characters or factors. Also send your message as 
plain text, not html which messes up columns and other details. Here is a 
modification of your data with the beginnings of a solution:

Meals <- structure(list(mealAcode = c(34L, 89L, 25L, 34L, 25L), mealBcode = 
c(66L, 
39L, 77L, 39L, 34L)), .Names = c("mealAcode", "mealBcode"), row.names = c(NA, 
-5L), class = "data.frame")
Meals
#   mealAcode mealBcode
# 1        34        66
# 2        89        39
# 3        25        77
# 4        34        39
# 5        25        34

Customers <- structure(list(id = c(15L, 11L, 85L), M1 = c(77L, 25L, 89L), 
M2 = c(34L, 34L, 25L), M3 = c(25L, 39L, 77L)), .Names = c("id", 
"M1", "M2", "M3"), class = "data.frame", row.names = c(NA, -3L
))
Customers
#   id M1 M2 M3
# 1 15 77 34 25
# 2 11 25 34 39
# 3 85 89 25 77

Results <- data.frame(mealAcode=NA, mealBcode=NA, id=NA)
k <- 0
for (i in seq_len(nrow(Meals))) {
  for (j in seq_len(nrow(Customers))) {
    # keep the pair if this customer has had both meals
    if (any(Customers[j, -1] %in% Meals[i, 1]) &&
        any(Customers[j, -1] %in% Meals[i, 2])) {
      k <- k+1
      Results[k, ] <- cbind(Meals[i, ], id=Customers[j, 1])
    }
  }
}
rownames(Results) <- NULL
Results
#   mealAcode mealBcode id
# 1        25        77 15
# 2        25        77 85
# 3        34        39 11
# 4        25        34 15
# 5        25        34 11

Results is a data frame that can be modified to get various forms of output. 
You mention a data frame with ids in separate columns. That might be unwieldy 
for many types of analysis. The following list shows the ids for each meal 
combination,  the number of meals of each combination consumed, and a matrix 
similar to the one you provided.

Results.lst <- split(Results$id,
                     paste(Results[ , 1], Results[ , 2], sep=" - "))
Results.lst
# $`25 - 34`
# [1] 15 11
# 
# $`25 - 77`
# [1] 15 85
# 
# $`34 - 39`
# [1] 11
table(Results[, 1:2])
#          mealBcode
# mealAcode 34 39 77
#        25  2  0  2
#        34  0  1  0
maxid <- 5
t(sapply(Results.lst, function(x) c(sort(as.vector(x)), 
 rep(NA, maxid-length(as.vector(x))))))
#         [,1] [,2] [,3] [,4] [,5]
# 25 - 34   11   15   NA   NA   NA
# 25 - 77   15   85   NA   NA   NA
# 34 - 39   11   NA   NA   NA   NA

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Allaisone 1
Sent: Monday, May 22, 2017 4:41 PM
To: Bert Gunter 
Cc: r-help@r-project.org
Subject: Re: [R] Identyfing rows with specific conditions

Dear Bert

I have answered your questions in my last 2 messages. If you did not see them, 
I will answer again. The 2 tables are both data.frames and the order of the 
meals does not matter. A meal cannot be repeated for a person; they are all 
different. The missing values are at the right end of each row, as each row 
starts with the meal codes first. This task is just a small part of a long 
two-year project.

Regards

Re: [R] Identyfing rows with specific conditions

2017-05-22 Thread Bert Gunter
You haven't said whether your "table" is a matrix or data frame.
Presumably the latter.

Nor have you answered my question about whether order of your meal
code pairs matters.

Another question: can meals be replicated for an ID or are they all different?

Finally, is this a homework assignment or class project of some sort?
Or is it a real task -- i.e., what is the context?

Again, be sure to cc the list.

-- Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, May 22, 2017 at 1:56 AM, Allaisone 1  wrote:
> Hi Bert ..,
>
>
> The number of meals differs from one customer to another. You may find
> one customer with only one meal and another with 2, 3, or even (rarely) 30
> meals. You may also find no meal at all for some customers, so the entire
> row takes the missing value "\N". Every row starts with the meal codes
> first, then all missing values are at the right end of the table.
>
> On Sun, May 21, 2017 at 5:10 PM, Allaisone 1  wrote:
>>
>> Hi All..,
>>
>> I have 2 tables. The first one, called "Meals", contains 2 columns with the
>> headers, say, "meal A code" & "meal B code". It has 2000 rows, each with a
>> different combination of meals (one unique combination per row).
>>
>>> Meals
>>
>>   meal A code  meal B code
>> 1          34           66
>> 2          89           39
>> 3          25           77
>>
>> The second table (Customers) shows customer ids in the first column with
>> meal codes (M) next to each customer. There are about 300,000 customers
>> (300,000 rows).
>>
>>> Customers
>>      1   2   3   4 .. 30
>>     id  M1  M2  M3
>> 1   15  77  34  25
>> 2   11  25  34  39
>> 3   85  89  25  77
>> .
>> .
>> 300,000
>>
>> I would like to identify all customer ids who have had each meal
>> combination in the first table, so the final output would be the first table
>> with ids attached next to each meal combination in each row, like this:
>>
>>> IdsMeals
>>
>>   MAcode  MBcode  ids
>> 1     34      39  11
>> 2     25      34  15  11
>> 3     25      77  15  85
>>
>> Would you please suggest any solutions to this problem?
>>
>> Regards
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Identyfing rows with specific conditions

2017-05-22 Thread Allaisone 1
Hi Again..,


Both of my tables are data.frames and the order of meals does not matter. Meal 
A = 2 and Meal B = 15 is the same as Meal A = 15 and Meal B = 2.
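
Since the order within a pair does not matter, the pairs can be put into a canonical order before matching so that (2, 15) and (15, 2) are treated as one combination. A small sketch with made-up meal codes; the names lo, hi, and canon are illustrative.

```r
# Made-up pairs: rows 1 and 2 are the same combination in different order
Meals <- data.frame(mealAcode = c(2L, 15L, 25L),
                    mealBcode = c(15L, 2L, 34L))

# Put the smaller code first in every row, then drop repeated rows
lo <- pmin(Meals$mealAcode, Meals$mealBcode)
hi <- pmax(Meals$mealAcode, Meals$mealBcode)
canon <- unique(data.frame(mealAcode = lo, mealBcode = hi))
canon
```

A test of the form "customer has both codes" is already order-independent, so this step only matters for removing duplicate combinations.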




Re: [R] Identyfing rows with specific conditions

2017-05-21 Thread Bert Gunter
More clarification:

Are your "tables" matrices or data frames? (If you don't know what
this means, you need to spend a little time with, e.g., a web tutorial
to learn.)

Also, does Meal A Meal B order count? -- i.e. is Meal A = 2, Meal B =
15 the same as Meal A = 15 and Meal B = 2?  This is important.

Cheers,

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )




Re: [R] Identyfing rows with specific conditions

2017-05-21 Thread Bert Gunter
Clarification:

Does each customer have the same number of meals or do they differ
from customer to customer? If the latter, how are missing meals
notated? Do they always occur at the (right) end or can they occur
anywhere in the row?

Presumably each customer ID can have many different meal code
combinations, right? (They can have 30 different meals, with
potentially 30 choose 2 = 435 combinations apiece.)
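
The count can be checked directly in R, and combn() lists the actual pairs for a given set of meals. The three meal codes below are only an example.

```r
# Number of unordered pairs among 30 different meals
n_pairs <- choose(30, 2)
n_pairs

# All unordered pairs for one customer with three meals (example codes);
# each column of the result is one pair
pairs <- combn(c(77L, 34L, 25L), 2)
pairs
```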

Please make sure you reply to the list, not just to me, as I may not
pursue this further but am just trying to clarify for anyone else who
may wish to help.


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

