If your data can be split into groups and you can call into your favorite R package on each group of data (in parallel):
https://spark.apache.org/docs/latest/sparkr.html#run-a-given-function-on-a-large-dataset-grouping-by-input-columns-and-using-gapply-or-gapplycollect

________________________________
From: Nisha Muktewar <ni...@cloudera.com>
Sent: Monday, March 26, 2018 2:27:52 PM
To: Josh Goldsborough
Cc: user
Subject: Re: [Spark R]: Linear Mixed-Effects Models in Spark R

Look at LinkedIn's Photon ML package: https://github.com/linkedin/photon-ml

One of the caveats is/was that the input data has to be in Avro in a specific format.

On Mon, Mar 26, 2018 at 1:46 PM, Josh Goldsborough <joshgoldsboroughs...@gmail.com> wrote:

The company I work for is trying to do some mixed-effects regression modeling in our new big data platform including SparkR.

We can run via SparkR's support of native R & use lme4. But it runs single threaded. So we're looking for tricks/techniques to process large data sets.

This was asked a couple years ago:
https://stackoverflow.com/questions/39790820/mixed-effects-models-in-spark-or-other-technology

But I wanted to ask again, in case anyone had an answer now.

Thanks,
Josh Goldsborough
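
For reference, the gapply suggestion at the top of this thread might look roughly like the sketch below. It assumes the data can be split on a column so that each group gets its own independent lme4 fit, which is not the same as one mixed-effects model over the full data set. The column names (`region`, `subject`, `x`, `y`), the model formula, and the helper names `fit_group` and `fit_all_groups` are all hypothetical, chosen only for illustration. lme4 would also need to be installed on every worker node.

```r
# Sketch only: `region`, `subject`, `x`, and `y` are hypothetical column names.

# Per-group worker. gapply passes the grouping key and that group's rows as a
# plain local R data.frame, so any single-threaded R package can be used here.
fit_group <- function(key, pdf) {
  # Illustrative formula: random intercept per subject, fixed slope for x.
  fit <- lme4::lmer(y ~ x + (1 | subject), data = pdf)
  coefs <- lme4::fixef(fit)
  data.frame(
    region    = as.character(key[[1]]),
    intercept = unname(coefs[["(Intercept)"]]),
    slope     = unname(coefs[["x"]]),
    stringsAsFactors = FALSE
  )
}

# Applies fit_group to every distinct `region` of a SparkDataFrame.
fit_all_groups <- function(sdf) {
  # The output schema must match the data.frame returned by fit_group.
  schema <- SparkR::structType(
    SparkR::structField("region", "string"),
    SparkR::structField("intercept", "double"),
    SparkR::structField("slope", "double")
  )
  # Spark runs fit_group once per group, in parallel across executors.
  SparkR::gapply(sdf, "region", fit_group, schema)
}

# Usage, assuming a running Spark cluster and a local data.frame `local_df`:
#   SparkR::sparkR.session()
#   sdf <- SparkR::createDataFrame(local_df)
#   SparkR::collect(fit_all_groups(sdf))
```

Each group is still fitted single threaded; the speedup comes only from fitting many groups at once, so this helps when there are many moderately sized groups rather than one very large one.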