Re: [R] Complicated analysis for huge databases

Boris Steipe Fri, 17 Nov 2017 16:36:07 -0800

Something like the following?

AllMAFs <- list()


for (i in length(SeparatedGroupsofmealsCombs) {
  AllMAFs[[i]] <- apply(SeparatedGroupsofmealsCombs[[i]], 2, 
function(x)maf(tabulate(x+1)))
}


(untested, of course)
Also the solution is a bit generic since I don't know what the output of maf() 
looks like in your case, and I don't understand why you use tabulate because I 
would have assumed that's what maf() does - but that's not for me to worry 
about :-) 



B.



> On Nov 17, 2017, at 7:15 PM, Allaisone 1 <[email protected]> wrote:
> 
> 
> Thanks Boris , this was very helpful but I'm struggling with the last part.
> 
> 1) I combined the first 2 columns :-
> 
>  
> library(tidyr)
> SingleMealsCode <-unite(MyData, MealsCombinations, c(MealA, MealB), 
> remove=FALSE)
> SingleMealsCode <- SingleMealsCode[,-2]
> 
>   2) I separated this dataframe into different dataframes based on 
> "MealsCombination"
>    column so R will recognize each meal combination separately :
> 
> SeparatedGroupsofmealsCombs <- 
> split(SingleMealCode,SingleMealCode$MealsCombinations)
> 
> after investigating the structure of "SeparatedGroupsofmealsCombs" , I can 
> see 
> a list of different databases, each of which represents a different Meal 
> combinations which is great.
> 
> No, I'm struggling with the last part, how can I run the maf code for all 
> dataframes?
> 
> when I run this code as before :-
> 
> maf <- apply(SeparatedGroupsofmealsCombs, 2, function(x)maf(tabulate(x+1))) 
> 
> an error message says : dim(X) must have a positive length . I'm not sure 
> which length 
> I need to specify.. any suggestions to correct this syntax ?  
> 
> Regards
> Allaisone
> From: Boris Steipe <[email protected]>
> Sent: 17 November 2017 21:12:06
> To: Allaisone 1
> Cc: R-help
> Subject: Re: [R] Complicated analysis for huge databases
>  
> Combine columns 1 and 2 into a column with a single ID like "33.55", "44.66" 
> and use split() on these IDs to break up your dataset. Iterate over the list 
> of data frames split() returns. 
> 
> 
> B.
> 
> > On Nov 17, 2017, at 12:59 PM, Allaisone 1 <[email protected]> wrote:
> > 
> > 
> > Hi all ..,
> > 
> > 
> > I have a large dataset of around 600,000 rows and 600 columns. The first 
> > col is codes for Meal A, the second columns is codes for Meal B. The third 
> > column is customers IDs where each customer had a combination of meals. 
> > Each column of the rest columns contains values 0,1,or 2. The dataset is 
> > organised in a way so that the first group of customers had similar meals 
> > combinations, this is followed by another group of customers with similar 
> > meals combinations but different from the first group and so on. The 
> > dataset looks like this :-
> > 
> > 
> >> MyData
> > 
> >       Meal A     Meal B     Cust.ID      I            II        III     IV  
> >  ...... 600
> > 
> > 1    33                 55             1             0           1        2 
> >       0
> > 
> > 2    33                 55              3             1          0        2 
> >        2
> > 
> > 3    33                 55              5             2          1        1 
> >         2
> > 
> > 4    44                 66               7            0          2         
> > 2        2
> > 
> > 5   44                  66               4            1          1          
> > 0       1
> > 
> > 6   44                  66                9            2          0         
> >  1       2
> > 
> > .
> > 
> > .
> > 
> > 600,000
> > 
> > 
> > 
> > I wanted to find maf() for each column(from 4 to 600) after calculating the 
> > frequency of the 3 values (0,1,2) but this should be done group by group 
> > (i.e. group(33-55) : rows 1:3 then group(44-66) :rows 4:6 and so on).
> > 
> > 
> > I can do the analysis  for the entire column but not group by group like 
> > this :
> > 
> > 
> > MAF <- apply(MyData[,4:600], 2, function(x)maf(tabulate(x+1)))
> > 
> > How can I modify this code to tell R to do the analysis group by group for 
> > each column so I get maf value for 33-55 group of clolumn I, then maf value 
> > for group 44-66 in the same column I,then the rest of groups in this column 
> > and do the same for the remaining columns.
> > 
> > In fact, I'm interested in doing this analysis for only 300 columns but all 
> > of the 600 columns.
> > I have another sheet contains names of columns of interest like this :
> > 
> >> ColOfinterest
> > 
> > Col
> > I
> > IV
> > V
> > .
> > .
> > 300
> > 
> > Any one would help with the best combination of syntax to perform this 
> > complex analysis?
> > 
> > Regards
> > Allaisone
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >        [[alternative HTML version deleted]]
> > 
> > ______________________________________________
> > [email protected] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Complicated analysis for huge databases

Reply via email to