Re: [R] Time intervals are converted into seconds after converting a list of dfs into a single df.

2019-12-24 Thread Allaisone 1
Many thanks, Bert, for being so cooperative.
Dividing the data into smaller bites would be
a good suggestion, but I will wait first to see
if someone else has another idea.

Many thanks

From: Bert Gunter 
Sent: 24 December 2019 21:03:56
To: Allaisone 1 
Cc: Patrick (Malone Quantitative) ; 
r-help@r-project.org 
Subject: Re: [R] Time intervals are converted into seconds after converting a list
of dfs into a single df.

1. "Similar" or "same" column names. The former is probably not going to work.

2. Manipulations with data frames can consume a lot of memory. rbinding 8000
data frames is likely to be very slow, with lots of time spent swapping memory
around (?). Perhaps try taking smaller bites (say 1000 at a time) and then
combining them. Or have you already tried this? (A sketch of this chunked
approach follows point 3 below.) If you do wish to do this, wait to give experts
a chance to tell you that my suggestion is completely useless before you attempt it.

3. I'll let someone else resolve your dates problem, as I have never used 
lubridate.
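A rough sketch of the chunked combining from point 2, assuming the full list is
called MyListOfDfs and that lubridate is loaded so its interval class is available
(the chunk size of 1000 is arbitrary):

    library(lubridate)   # the interval column's class comes from lubridate

    # Combine in chunks of 1000, then combine the chunk results, so no single
    # rbind() call has to handle all 8000 data frames at once.
    chunk_size <- 1000
    chunks  <- split(MyListOfDfs, ceiling(seq_along(MyListOfDfs) / chunk_size))
    partial <- lapply(chunks, function(dfs) do.call(rbind, dfs))
    MySingleDf <- do.call(rbind, partial)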

Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Dec 24, 2019 at 12:38 PM Allaisone 1 <allaiso...@hotmail.com> wrote:
Hi dear Patrick ,

Thanks for your reply. Below is a reproducible example. First, I generated
two similar dfs, each with one column containing an interval. Then I put the 2 dfs
in a list. Converting this list into a df gives different results depending on the
code used. See below for more details.


 # dataframe 1

library(lubridate)   # interval() and the Interval class come from lubridate

id <- c(1,1)
dates1 <- c("2010/2/4","2011/2/4")
dates2 <- c("2010/9/4","2011/1/1")
df1 <- data.frame(id,dates1,dates2)
df1[,2] <- as.Date(df1[,2])
df1[,3] <- as.Date(df1[,3])
df1$interaction <- intersect(interval(df1[1,2],df1[2,2]), interval(df1[1,3],df1[2,3]))



  # Dataframe 2

id <- c(2,2)
dates1 <- c("2010/1/4","2011/2/4")
dates2 <- c("2010/10/4","2011/1/16")
df2 <- data.frame(id,dates1,dates2)
df2[,2] <- as.Date(df2[,2])
df2[,3] <- as.Date(df2[,3])
df2$interaction <- intersect(interval(df2[1,2],df2[2,2]), interval(df2[1,3],df2[2,3]))



 # 2 dataframes in a list :

 ListOfDFs <- list(df1,df2)

 # Convert list of dfs into a single df :-

 library(plyr)
 SingDF <- ldply(ListOfDFs, data.frame)

   # The interval has been converted into numbers, which is not what I want.

   # but trying this code :
 SingDF <- do.call(rbind, ListOfDFs)

   # It works perfectly, but only with this example, as we have only 2
dataframes. However, in my actual data I have around 8000 dataframes. Applying
this code to them makes R freeze; I waited for many hours but it still hangs
with no results generated.

 Could anyone please suggest any alternative syntax or modifications to the 
codes above?
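One possible workaround, sketched here assuming every data frame looks like
df1/df2 above: keep the interval's start and end as plain date columns, combine,
and rebuild the interval column afterwards, so nothing class-sensitive has to
survive ldply().

    library(lubridate)
    library(plyr)

    # Replace the Interval column by its start and end before combining
    strip_interval <- function(d) {
      d$int_start   <- int_start(d$interaction)
      d$int_end     <- int_end(d$interaction)
      d$interaction <- NULL
      d
    }

    SingDF <- ldply(lapply(ListOfDFs, strip_interval), data.frame)

    # Rebuild the interval column on the combined data frame
    SingDF$interaction <- interval(SingDF$int_start, SingDF$int_end)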

Kind Regards




Sent from Outlook

From: Patrick (Malone Quantitative) <mal...@malonequantitative.com>
Sent: 24 December 2019 17:01:59
To: Allaisone 1 <allaiso...@hotmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Time intervals are converted into seconds after converting a list
of dfs into a single df.

You didn't provide a reproducible example for testing (or post in
plain text), but lubridate has an as.interval() function. You'll need
to be able to extract the start time, though, for use in the function.
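For illustration, a small sketch of what that looks like (the numbers here are
made up): given the bare seconds and the matching start time, as.interval()
rebuilds the interval.

    library(lubridate)

    secs  <- 18316800                        # 212 days, i.e. 2010-02-04 to 2010-09-04
    start <- ymd("2010-02-04", tz = "UTC")   # the start time, kept separately
    as.interval(secs, start)
    # 2010-02-04 UTC--2010-09-04 UTC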

On Tue, Dec 24, 2019 at 11:54 AM Allaisone 1 <allaiso...@hotmail.com> wrote:
>
>
> Hi dear group ,
>
> I have a list of dataframes with similar column names. I want to rbind all
> dataframes so I have a single dataframe. One of the columns in each df is of
> the 'interval' time class, which was generated by the 'lubridate' package.
>
> The problem is that when I convert the list of dfs into a single df using any 
> of the below codes :
>
> library(plyr)
> MySingleDf <- ldply(MyListOfDfs, data.frame)
> Or
> MySingleDf <- ldply(MyListOfDfs, rbind)
> Or
> MySingleDf <- rbind.fill(MyListOfDfs)
>
> What happens is that time intervals which look like 2010-4-5 UTC--2011-7-9 UTC
> are converted into a single numeric value, which seems to be the difference
> between the 2 dates in seconds.
>
> When I use :
> MySingleDf <- do.call ("rbind",MyListOfDfs)
>
> The code freezes; it looks as if the data are being analysed, but no result
> appears. I have used this code previously for the same purpose with another
> dataset and it worked perfectly.
>
> What I want is for the time intervals to be shown as they are, not converted
> into seconds.
>
> Could you please suggest any alternative syntax or modifications to my codes?


[R] Time intervals are converted into seconds after converting a list of dfs into a single df.

2019-12-24 Thread Allaisone 1


Hi dear group,

I have a list of dataframes with similar column names. I want to rbind all
dataframes so I have a single dataframe. One of the columns in each df is of
the 'interval' time class, which was generated by the 'lubridate' package.

The problem is that when I convert the list of dfs into a single df using any 
of the below codes :

library(plyr)
MySingleDf <- ldply(MyListOfDfs, data.frame)
Or
MySingleDf <- ldply(MyListOfDfs, rbind)
Or
MySingleDf <- rbind.fill(MyListOfDfs)

What happens is that time intervals which look like 2010-4-5 UTC--2011-7-9 UTC
are converted into a single numeric value, which seems to be the difference
between the 2 dates in seconds.

When I use :
MySingleDf <- do.call ("rbind",MyListOfDfs)

The code freezes; it looks as if the data are being analysed, but no result
appears. I have used this code previously for the same purpose but with another
dataset and it worked perfectly.

What I want is for the time intervals to be shown as they are, not converted
into seconds.

Could you please suggest any alternative syntax or modifications to my codes ?

Thank you so much in advance

Regards





Re: [R] Complicated analysis for huge databases

2017-11-17 Thread Allaisone 1

Thanks Boris, this was very helpful, but I'm struggling with the last part.

1) I combined the first 2 columns :-


library(tidyr)
SingleMealsCode <- unite(MyData, MealsCombinations, c(MealA, MealB), remove = FALSE)
SingleMealsCode <- SingleMealsCode[,-2]

  2) I separated this dataframe into different dataframes based on the
   "MealsCombinations" column, so R will recognise each meal combination separately:

SeparatedGroupsofmealsCombs <- split(SingleMealsCode, SingleMealsCode$MealsCombinations)

After investigating the structure of "SeparatedGroupsofmealsCombs", I can see
a list of different data frames, each of which represents a different meal
combination, which is great.

Now I'm struggling with the last part: how can I run the maf code for all
dataframes?

when I run this code as before :-

maf <- apply(SeparatedGroupsofmealsCombs, 2, function(x)maf(tabulate(x+1)))

an error message says: "dim(X) must have a positive length". I'm not sure which
length I need to specify. Any suggestions to correct this syntax?
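One possible fix, sketched under the assumption that in each piece the first
three columns are identifiers and the value columns start at column 4 (adjust
the index to your actual layout): apply() needs something with dimensions, and
the result of split() is a plain list, so loop over it with lapply() and do the
column-wise work inside each data frame.

    maf_by_combination <- lapply(
      SeparatedGroupsofmealsCombs,
      function(d) {
        vals <- d[, 4:ncol(d), drop = FALSE]   # assumed: columns 1-3 are IDs, the rest are 0/1/2 values
        apply(vals, 2, function(x) maf(tabulate(x + 1, nbins = 3)))
      }
    )
    # maf_by_combination is a named list: one vector of maf values per meal combination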

Regards
Allaisone


From: Boris Steipe <boris.ste...@utoronto.ca>
Sent: 17 November 2017 21:12:06
To: Allaisone 1
Cc: R-help
Subject: Re: [R] Complicated analysis for huge databases

Combine columns 1 and 2 into a column with a single ID like "33.55", "44.66" 
and use split() on these IDs to break up your dataset. Iterate over the list of 
data frames split() returns.


B.

> On Nov 17, 2017, at 12:59 PM, Allaisone 1 <allaiso...@hotmail.com> wrote:
>
>
> Hi all ..,
>
>
> I have a large dataset of around 600,000 rows and 600 columns. The first column
> contains codes for Meal A, the second contains codes for Meal B. The third
> column contains customer IDs, where each customer had a combination of meals.
> Each of the remaining columns contains the values 0, 1, or 2. The dataset is
> organised so that the first group of customers had similar meal combinations;
> this is followed by another group of customers with similar meal combinations
> but different from the first group, and so on. The dataset looks like this:
>
>
>> MyData
>
>      Meal A  Meal B  Cust.ID   I  II  III  IV  ..  600
>
> 1        33      55        1   0   1    2   0
> 2        33      55        3   1   0    2   2
> 3        33      55        5   2   1    1   2
> 4        44      66        7   0   2    2   2
> 5        44      66        4   1   1    0   1
> 6        44      66        9   2   0    1   2
> .
> .
> 600,000
>
>
>
> I want to find maf() for each column (from 4 to 600) after calculating the
> frequency of the 3 values (0, 1, 2), but this should be done group by group
> (i.e. group 33-55: rows 1:3, then group 44-66: rows 4:6, and so on).
>
>
> I can do the analysis for an entire column, but not group by group, like this:
>
>
> MAF <- apply(MyData[,4:600], 2, function(x)maf(tabulate(x+1)))
>
> How can I modify this code to tell R to do the analysis group by group for
> each column, so I get the maf value for the 33-55 group of column I, then the
> maf value for the 44-66 group in the same column I, then the rest of the groups
> in this column, and do the same for the remaining columns?
>
> In fact, I'm interested in doing this analysis for only 300 columns, not all
> of the 600 columns.
> I have another sheet that contains the names of the columns of interest, like this:
>
>> ColOfinterest
>
> Col
> I
> IV
> V
> .
> .
> 300
>
> Could anyone help with the best combination of syntax to perform this
> complex analysis?
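A sketch of one way to combine the group-wise idea with the column subset,
assuming ColOfinterest$Col holds the names of the wanted columns and that
Meal A / Meal B are the first two columns of MyData:

    wanted   <- intersect(ColOfinterest$Col, names(MyData))    # columns of interest that exist
    group_id <- paste(MyData[[1]], MyData[[2]], sep = "-")     # e.g. "33-55", "44-66"

    maf_by_group <- lapply(
      split(MyData[wanted], group_id),
      function(d) apply(d, 2, function(x) maf(tabulate(x + 1, nbins = 3)))
    )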
>
> Regards
> Allaisone
>
>
>
>
>
>
>




Re: [R] Calculating frequencies of multiple values in 200 columns

2017-11-10 Thread Allaisone 1


Thank you for your effort Bert..,


I know what the problem is now: the values (1, 2, 3) were only an example. The
values I have are 0, 1, 2. The tabulate() function seems to skip counting the 0
values, and this is my exact problem, as the frequency of the 0 values should
also be counted for maf to be calculated correctly.
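A small sketch of two ways to keep the zero counts (0 is a legitimate value
here, so either shift everything up by one or count against explicit levels):

    x <- c(0, 1, 2, 2, 0, 0, 1)        # made-up column with the three possible values

    # Option 1: shift so that 0 lands in bin 1; nbins = 3 forces a count for every value
    tabulate(x + 1, nbins = 3)
    # [1] 3 2 2

    # Option 2: count against explicit factor levels, which also keeps zeros
    table(factor(x, levels = c(0, 1, 2)))
    # 0 1 2
    # 3 2 2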


From: Bert Gunter <bgunter.4...@gmail.com>
Sent: 09 November 2017 23:51:35
To: Allaisone 1; R-help
Subject: Re: [R] Calculating frequencies of multiple values in 200 columns

[[elided Hotmail spam]]

"For example, if I have the values : 1 , 2 , 3 in each column, applying 
Tabulate () would calculate the frequency of 1 and 2 without 3"

Huh??

> x <- sample(1:3,10,TRUE)
> x
 [1] 1 3 1 1 1 3 2 3 2 1
> tabulate(x)
[1] 5 2 3

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Nov 9, 2017 at 3:44 PM, Allaisone 1 <allaiso...@hotmail.com> wrote:

Thank you so much for your reply


Actually, I tried the apply() function but struggled with writing the appropriate
function inside it to calculate the frequency of the 3 values. The tabulate()
function is a good start, but the problem is that it calculates the frequency of
only two values per column, which means that when I apply the maf() function, the
maf value will be calculated using the frequency of these 2 values only, without
considering the frequency of the 3rd value. For example, if I have the values
1, 2, 3 in each column, applying tabulate() would calculate the frequency of 1 and
2 without 3. I need a way to calculate the frequencies of all 3 values so the
calculation of maf will be correct, as it will consider all 3 frequencies and not
only 2.


Regards

Allahisone


From: Bert Gunter <bgunter.4...@gmail.com>
Sent: 09 November 2017 20:56:39
To: Allaisone 1
Cc: r-help@R-project.org
Subject: Re: [R] Calculating frequencies of multiple values in 200 columns

This is not a good way to do things! R has many powerful built-in functions to
do this sort of thing for you. Searching -- e.g. at rseek.org or even a plain old
google search -- can help you find them. Also, it looks like you need to go
through a tutorial or two to learn more about R's basic functionality.

In this case, something like (no reproducible example given, so can't confirm):

apply(Values, 2, function(x)maf(tabulate(x)))

should be close to what you want.


Cheers,
Bert







Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Thu, Nov 9, 2017 at 11:44 AM, Allaisone 1 <allaiso...@hotmail.com> wrote:

Hi All


I have a dataset of 200 columns and 1000 rows; there are 3 repeated values under
each column (7, 8, 10). I want to calculate the frequency of each value under each
column and then apply the function maf(), given that the frequency of each value
is known. I can do the analysis step by step like this:


> Values

     A   B   C  ..  200
1    7  10   7
2    7   8   7
3   10   8   7
4    8   7  10
.
.
.




For column A, I calculate the frequency for the 3 values as follows:

count7  <- length(which(Values$A == 7))
count8  <- length(which(Values$A == 8))
count10 <- length(which(Values$A == 10))

count7 = 2, count8 = 1, count10 = 1.

Then, I create a vector and type the frequencies in manually:

Freq <- c(count7 = 2, count8 = 1, count10 = 1)

Then I apply the function maf():

maf(Freq)


This gives me the result I need for column A. Could you please help me
perform the analysis for all 200 columns at once?


Regards

Allahisone







Re: [R] Identifying rows with specific conditions

2017-05-22 Thread Allaisone 1
Hi Again..,


Both of my 2 tables are data.frames, and the order of meals does not matter: Meal
A = 2 and Meal B = 15 is the same as Meal A = 15 and Meal B = 2.
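A sketch of one way to do that matching, assuming the Customers table has the id
in column 1 and meal codes in the remaining columns, and that Meals holds the two
meal codes in its first two columns (with 2,000 combinations and 300,000 customers
this simple loop over pairs may be slow, but it shows the logic):

    # ids of customers whose meal columns contain both codes a and b
    # (pair order does not matter with this test)
    meal_cols <- Customers[, -1]
    ids_for_pair <- function(a, b) {
      has_a <- rowSums(meal_cols == a, na.rm = TRUE) > 0
      has_b <- rowSums(meal_cols == b, na.rm = TRUE) > 0
      Customers$id[has_a & has_b]
    }

    IdsMeals <- Meals
    IdsMeals$ids <- mapply(function(a, b) paste(ids_for_pair(a, b), collapse = " "),
                           Meals[[1]], Meals[[2]])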


From: Bert Gunter <bgunter.4...@gmail.com>
Sent: 22 May 2017 03:19:57
To: Allaisone 1
Cc: r-help@r-project.org
Subject: Re: [R] Identifying rows with specific conditions

More clarification:

Are your "tables" matrices or data frames? (If you don't know what
this means, you need to spend a little time with a e.g. web tutorial
to learn).

Also, does Meal A Meal B order count? -- i.e. is Meal A = 2, Meal B =
15 the same as Meal A = 15 and Meal B = 2?  This is important.

Cheers,

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, May 21, 2017 at 5:10 PM, Allaisone 1 <allaiso...@hotmail.com> wrote:
>
> Hi All..,
>
> I have 2 tables. The first one contains 2 columns, with headers, say, "meal
> A code" and "meal B code", in a table called "Meals" with 2000 rows, each of
> which has a different combination of meals (one unique combination per row).
>
>
>> Meals
>
>    meal A code  meal B code
> 1           34           66
> 2           89           39
> 3           25           77
>
> The second table (Customers) shows customer ids in the first column, with meal
> codes (M) next to each customer. There are about 300,000 customers
> (300,000 rows).
>
>> Customers
>         1    2    3    4   ..   30
>        id   M1   M2   M3
> 1      15   77   34   25
> 2      11   25   34   39
> 3      85   89   25   77
> .
> .
> 300,000
>
> I would like to identify all customer ids who have had each meal combination
> in the first table, so the final output would be the first table with ids
> attached next to each meal combination in each row, like this:
>
>> IdsMeals
>
>    MAcode  MBcode  ids
> 1      34      39  11
> 2      25      34  15  11
> 3      25      77  15  85
>
> Would you please suggest any solutions to this problem?
>
> Regards
>
