If your data can be split into groups, you can call into your favorite R 
package on each group of data (in parallel) using gapply or gapplyCollect:

https://spark.apache.org/docs/latest/sparkr.html#run-a-given-function-on-a-large-dataset-grouping-by-input-columns-and-using-gapply-or-gapplycollect
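
A minimal sketch of what that could look like with lme4, assuming lme4 is installed on every worker node and that fitting a separate model per group is acceptable for your problem (it is not the same as fitting one mixed model over all the data). The `site` column, the `fit_group` helper, and the output column names are made up for illustration; the data is lme4's bundled `sleepstudy` set.

```r
library(SparkR)
library(lme4)   # assumption: also installed on every worker node

sparkR.session()

# Hypothetical input: lme4's sleepstudy data plus an invented `site` column,
# so there is something to group by.
local_df <- lme4::sleepstudy
local_df$Subject <- as.character(local_df$Subject)
local_df$site <- ifelse(as.integer(factor(local_df$Subject)) %% 2 == 0, "A", "B")
df <- createDataFrame(local_df)

# Output schema: one row of fixed-effect estimates per group.
schema <- structType(
  structField("site", "string"),
  structField("intercept", "double"),
  structField("days_slope", "double")
)

# Called once per group on a worker. `key` holds the group's key value(s);
# `x` is a plain R data.frame containing only that group's rows.
fit_group <- function(key, x) {
  fit <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = x)
  fe <- lme4::fixef(fit)
  data.frame(
    site = key[[1]],
    intercept = fe[["(Intercept)"]],
    days_slope = fe[["Days"]],
    stringsAsFactors = FALSE
  )
}

# Groups are fitted in parallel across the cluster; each individual
# lmer() call still runs single-threaded inside its worker.
result <- gapply(df, "site", fit_group, schema)
head(collect(result))

sparkR.session.stop()
```

Each group has to fit in a single worker's memory, since gapply hands the whole group to the function as an ordinary R data.frame.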


________________________________
From: Nisha Muktewar <ni...@cloudera.com>
Sent: Monday, March 26, 2018 2:27:52 PM
To: Josh Goldsborough
Cc: user
Subject: Re: [Spark R]: Linear Mixed-Effects Models in Spark R

Look at LinkedIn's Photon ML package: https://github.com/linkedin/photon-ml

One of the caveats is/was that the input data has to be in Avro in a specific 
format.

On Mon, Mar 26, 2018 at 1:46 PM, Josh Goldsborough 
<joshgoldsboroughs...@gmail.com> wrote:
The company I work for is trying to do some mixed-effects regression modeling 
on our new big data platform, which includes SparkR.

We can run lme4 through SparkR's support for native R, but it runs 
single-threaded.  So we're looking for tricks/techniques to process large data sets.


This was asked a couple years ago:
https://stackoverflow.com/questions/39790820/mixed-effects-models-in-spark-or-other-technology

But I wanted to ask again, in case anyone had an answer now.

Thanks,
Josh Goldsborough
