GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/447
Fix batchpredict for custom PersistentModel

Fixes [PIO-138](https://issues.apache.org/jira/browse/PIO-138)

Switches batch query processing from Spark RDD to a Scala parallel collection. As a result, the `pio batchpredict` command changes in the following ways:

* `--query-partitions` option is no longer available; parallelism is now managed by Scala's [parallel collections](http://docs.scala-lang.org/overviews/parallel-collections/overview.html)
* `--input` option is now read as a plain, local file
* `--output` option is now written as a plain, local file
* because the input & output files are no longer parallelized through Spark, memory limits may require that large batch jobs be split into multiple command runs.

This solves the root problem that certain custom PersistentModels, such as [ALS Recommendation template](https://github.com/apache/incubator-predictionio-template-recommender), may themselves [contain RDDs](https://github.com/apache/incubator-predictionio-template-recommender/blob/develop/src/main/scala/ALSModel.scala#L27), which cannot be nested inside the batch queries RDD. (See [SPARK-5063](https://issues.apache.org/jira/browse/SPARK-5063))

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mars/incubator-predictionio batch-predict-persistent-model

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #447

----
commit aad6b22ff9382bb1780efaa3e97af04c92a672f3
Author: Mars Hall <m...@heroku.com>
Date:   2017-11-17T23:25:46Z

    Parallelize batchpredict with Scala parallel collections instead of Spark RDD

----
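
For readers unfamiliar with the approach described in this pull request, the following is a minimal sketch of processing a local query file with a Scala parallel collection rather than a Spark RDD. It is not the code from the commit: `BatchPredictSketch`, `predict`, and `run` are hypothetical names, and `predict` is a placeholder for whatever the deployed engine would compute. It assumes Scala 2.12 or earlier, where `.par` is part of the standard library; on Scala 2.13 and later the separate `scala-parallel-collections` module and its `CollectionConverters` import would be needed.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object BatchPredictSketch {
  // Hypothetical stand-in for an engine's prediction call; a real engine
  // would deserialize the query and consult its trained model here.
  def predict(queryJson: String): String =
    s"""{"query":$queryJson,"prediction":null}"""

  def run(inputPath: String, outputPath: String): Unit = {
    // Read the whole input as a plain local file, one JSON query per line.
    // Everything is held in memory, which is why very large batches may
    // need to be split across several runs.
    val source = Source.fromFile(inputPath, "UTF-8")
    val queries =
      try source.getLines().filter(_.trim.nonEmpty).toVector
      finally source.close()

    // .par turns the Vector into a parallel collection; map runs on a
    // thread pool in the driver JVM, so no RDD wraps the queries.
    // Element order is preserved by map on a parallel sequence.
    val results = queries.par.map(predict).seq

    // Write the results back out as a plain local file.
    val writer = new PrintWriter(new File(outputPath), "UTF-8")
    try results.foreach(writer.println)
    finally writer.close()
  }
}
```

Because the queries are never placed inside an RDD, a model that itself holds RDDs can be called from `predict` without hitting the nesting restriction tracked in SPARK-5063. The trade-off is that input and output are bounded by the memory of a single JVM.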