Re: [R] Complicated analysis for huge databases

Boris Steipe Sat, 18 Nov 2017 13:45:07 -0800

The correct code is: 

   for (i in 1:length(SeparatedGroupsofmealsCombs)) { ...



I had mentioned that this is untested, but the error is so obvious ...
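
Comparing the corrected line with the loop quoted below, the fix has two parts: the closing parenthesis after length(...), and iterating over 1:length(...) rather than over the single value length(...). A minimal sketch of the corrected loop on toy data follows. Several things in it are assumptions, not taken from the thread: the maf() shown is only a placeholder (the real one is never posted), the data values are made up, valueCols is a hypothetical helper vector, and restricting apply() to the numeric value columns is my guess at what is needed, since apply() on a data frame that still contains a character column such as MealsCombinations converts everything to character and x + 1 then fails.

```r
# Toy data in the layout described in the original post (values made up).
MyData <- data.frame(
  MealsCombinations = c("33_55", "33_55", "33_55", "44_66", "44_66", "44_66"),
  Cust.ID = c(1, 3, 5, 7, 4, 9),
  I   = c(0, 1, 2, 0, 1, 2),
  II  = c(1, 0, 1, 2, 1, 0),
  III = c(2, 2, 1, 2, 0, 1),
  IV  = c(0, 2, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Placeholder for the real maf(), which is not shown in the thread:
# takes a vector of counts and returns the smallest count as a
# fraction of the total.
maf <- function(counts) min(counts) / sum(counts)

# One data frame per meal combination.
SeparatedGroupsofmealsCombs <- split(MyData, MyData$MealsCombinations)

# Hypothetical vector naming the value columns to analyse.
valueCols <- c("I", "II", "III", "IV")

AllMAFs <- list()
# seq_along(x) gives 1, 2, ..., length(x), like 1:length(x) for a
# non-empty list, but it is also safe when the list is empty.
for (i in seq_along(SeparatedGroupsofmealsCombs)) {
  AllMAFs[[i]] <- apply(SeparatedGroupsofmealsCombs[[i]][, valueCols], 2,
                        function(x) maf(tabulate(x + 1, nbins = 3)))
}
names(AllMAFs) <- names(SeparatedGroupsofmealsCombs)

# One row per meal combination, one column per value column.
MAFtable <- do.call(rbind, AllMAFs)
print(MAFtable)
```

If only the columns of interest are wanted, valueCols could presumably be taken from the Col column of ColOfinterest, assuming those names match the column names of the data.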




B.



> On Nov 18, 2017, at 4:40 PM, Allaisone 1 <[email protected]> wrote:
> 
> 
> The loop : 
> 
> AllMAFs <- list()
>  
>  for (i in length(SeparatedGroupsofmealsCombs) {
>   AllMAFs[[i]] <- apply( SeparatedGroupsofmealsCombs[[i]], 2, function(x)maf( 
> tabulate( x+1) ))
> }
> 
> gives these errors (I tried this many times and I'm sure I copied it 
> entirely) :-
> Error in apply(SeparatedGroupsofmealsCombs[[i]], 2, function(x) 
> maf(tabulate(x +  : 
>   object 'i' not found
> >  }
> Error: unexpected '}' in " }"
> 
> 
> The lapply function :
>   results<-lapply(SeparatedGroupsofmealsCombs , function(x)maf(tabulate(x+1)))
> gives this error :-
> Error in FUN(left, right) : non-numeric argument to binary operator
> 
> I have been trying since yesterday, but until now I have not been able to 
> identify the correct syntax.
> 
> 
> 
> 
> From: David Winsemius <[email protected]>
> Sent: 18 November 2017 20:06:56
> To: Allaisone 1
> Cc: Boris Steipe; R-help
> Subject: Re: [R] Complicated analysis for huge databases
>  
> 
> > On Nov 18, 2017, at 1:52 AM, Allaisone 1 <[email protected]> wrote:
> > 
> > Although the loop seems to be formulated correctly I wonder why
> > it gives me these errors :
> > 
> > -object 'i' not found
> > - unexpected '}' in "}"
> 
> You probably did not copy the entire code offered. But we cannot know since 
> you did not "show your code", nor did you post complete error messages. 
> Both of these practices are strongly recommended by the Posting Guide. Please 
> read it (again?).
> 
> -- 
> David.
> > 
> > 
> > The desired output is expected to be very large, since for each dataframe 
> > in the list of dataframes I expect to see a maf value for each of the 600 
> > columns, and this is only for one dataframe in the list. I have around 
> > 150-200 dataframes and am not sure how R will store these results, but 
> > first I need the analysis to be done correctly. The final output has to be 
> > something like this :-
> > 
> > 
> >> mafsforeachcolumns(I,II,...600)foreachcombination
> > 
> >      MealsCombinations   Cust.ID   I       II     III    IV      ...... 600
> > 1    33-55               1         0.124   0.10   0.65   0.467
> >                          3
> >                          5
> > 
> > 2    44-66               7         0.134   0.43   0.64   0.479
> >                          4
> >                          9
> > 
> > .
> > 
> > .
> > 
> > ~180 dataframes
> > 
> > 
> > ________________________________
> > From: Boris Steipe <[email protected]>
> > Sent: 18 November 2017 00:35:16
> > To: Allaisone 1; R-help
> > Subject: Re: [R] Complicated analysis for huge databases
> > 
> > Something like the following?
> > 
> > AllMAFs <- list()
> > 
> > for (i in length(SeparatedGroupsofmealsCombs) {
> >  AllMAFs[[i]] <- apply( SeparatedGroupsofmealsCombs[[i]], 2, 
> > function(x)maf( tabulate( x+1) ))
> > }
> > 
> > 
> > (untested, of course)
> > Also the solution is a bit generic since I don't know what the output of 
> > maf() looks like in your case, and I don't understand why you use tabulate 
> > because I would have assumed that's what maf() does - but that's not for me 
> > to worry about :-)
> > 
> > 
> > 
> > B.
> > 
> > 
> > 
> >> On Nov 17, 2017, at 7:15 PM, Allaisone 1 <[email protected]> wrote:
> >> 
> >> 
> >> Thanks Boris , this was very helpful but I'm struggling with the last part.
> >> 
> >> 1) I combined the first 2 columns :-
> >> 
> >> 
> >> library(tidyr)
> >> SingleMealsCode <-unite(MyData, MealsCombinations, c(MealA, MealB), 
> >> remove=FALSE)
> >> SingleMealsCode <- SingleMealsCode[,-2]
> >> 
> >>  2) I separated this dataframe into different dataframes based on 
> >> "MealsCombination"
> >>   column so R will recognize each meal combination separately :
> >> 
> >> SeparatedGroupsofmealsCombs <- 
> >> split(SingleMealsCode, SingleMealsCode$MealsCombinations)
> >> 
> >> After investigating the structure of "SeparatedGroupsofmealsCombs", I can 
> >> see a list of different dataframes, each of which represents a different 
> >> meal combination, which is great.
> >> 
> >> Now I'm struggling with the last part: how can I run the maf code for all 
> >> dataframes?
> >> 
> >> when I run this code as before :-
> >> 
> >> maf <- apply(SeparatedGroupsofmealsCombs, 2, function(x)maf(tabulate(x+1)))
> >> 
> >> an error message says : dim(X) must have a positive length . I'm not sure 
> >> which length
> >> I need to specify.. any suggestions to correct this syntax ?
> >> 
> >> Regards
> >> Allaisone
> >> From: Boris Steipe <[email protected]>
> >> Sent: 17 November 2017 21:12:06
> >> To: Allaisone 1
> >> Cc: R-help
> >> Subject: Re: [R] Complicated analysis for huge databases
> >> 
> >> Combine columns 1 and 2 into a column with a single ID like "33.55", 
> >> "44.66" and use split() on these IDs to break up your dataset. Iterate 
> >> over the list of data frames split() returns.
> >> 
> >> 
> >> B.
> >> 
> >>> On Nov 17, 2017, at 12:59 PM, Allaisone 1 <[email protected]> wrote:
> >>> 
> >>> 
> >>> Hi all ..,
> >>> 
> >>> 
> >>> I have a large dataset of around 600,000 rows and 600 columns. The first 
> >>> column holds codes for Meal A, the second column holds codes for Meal B. 
> >>> The third column is customer IDs, where each customer had a combination 
> >>> of meals. Each of the remaining columns contains the values 0, 1, or 2. 
> >>> The dataset is organised so that the first group of customers had 
> >>> similar meal combinations, followed by another group of customers with 
> >>> similar meal combinations that differ from the first group, and so on. 
> >>> The dataset looks like this :-
> >>> 
> >>> 
> >>>> MyData
> >>> 
> >>>      Meal A   Meal B   Cust.ID   I    II   III   IV   ...... 600
> >>> 1    33       55       1         0    1    2     0
> >>> 2    33       55       3         1    0    2     2
> >>> 3    33       55       5         2    1    1     2
> >>> 4    44       66       7         0    2    2     2
> >>> 5    44       66       4         1    1    0     1
> >>> 6    44       66       9         2    0    1     2
> >>> 
> >>> .
> >>> 
> >>> .
> >>> 
> >>> 600,000
> >>> 
> >>> 
> >>> 
> >>> I wanted to find maf() for each column(from 4 to 600) after calculating 
> >>> the frequency of the 3 values (0,1,2) but this should be done group by 
> >>> group (i.e. group(33-55) : rows 1:3 then group(44-66) :rows 4:6 and so 
> >>> on).
> >>> 
> >>> 
> >>> I can do the analysis  for the entire column but not group by group like 
> >>> this :
> >>> 
> >>> 
> >>> MAF <- apply(MyData[,4:600], 2, function(x)maf(tabulate(x+1)))
> >>> 
> >>> How can I modify this code to tell R to do the analysis group by group 
> >>> for each column, so that I get the maf value for group 33-55 in column I, 
> >>> then the maf value for group 44-66 in the same column I, then the rest of 
> >>> the groups in this column, and the same for the remaining columns?
> >>> 
> >>> In fact, I'm interested in doing this analysis for only 300 columns, not 
> >>> all of the 600 columns.
> >>> I have another sheet that contains the names of the columns of interest, 
> >>> like this :
> >>> 
> >>>> ColOfinterest
> >>> 
> >>> Col
> >>> I
> >>> IV
> >>> V
> >>> .
> >>> .
> >>> 300
> >>> 
> >>> Could anyone help with the best combination of syntax to perform this 
> >>> complex analysis?
> >>> 
> >>> Regards
> >>> Allaisone
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>>       [[alternative HTML version deleted]]
> >>> 
> >>> ______________________________________________
> >>> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> > 
> > 
> 
> David Winsemius
> Alameda, CA, USA
> 
> 'Any technology distinguishable from magic is insufficiently advanced.'   
> -Gehm's Corollary to Clarke's Third Law

