[GitHub] incubator-predictionio pull request #447: Fix batchpredict for custom Persis...

mars Fri, 17 Nov 2017 15:50:29 -0800

GitHub user mars opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/447


    Fix batchpredict for custom PersistentModel

    Fixes [PIO-138](https://issues.apache.org/jira/browse/PIO-138)
    
    Switches batch query processing from Spark RDD to a Scala parallel collection. As a result, the `pio batchpredict` command changes in the following ways:
    
    * `--query-partitions` option is no longer available; parallelism is now managed by Scala's [parallel collections](http://docs.scala-lang.org/overviews/parallel-collections/overview.html)
    * `--input` option is now read as a plain, local file
    * `--output` option is now written as a plain, local file
    * because the input & output files are no longer parallelized through Spark, memory limits may require that large batch jobs be split into multiple command runs.
    
    This solves the root problem that certain custom PersistentModels, such as the one in the [ALS Recommendation template](https://github.com/apache/incubator-predictionio-template-recommender), may themselves [contain RDDs](https://github.com/apache/incubator-predictionio-template-recommender/blob/develop/src/main/scala/ALSModel.scala#L27), which cannot be nested inside the batch queries RDD. (See [SPARK-5063](https://issues.apache.org/jira/browse/SPARK-5063).)
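
The change described above can be illustrated with a minimal sketch. This is not the code from the pull request: `BatchPredictSketch`, `predict`, and `run` are hypothetical names, and `predict` is a stand-in for whatever the deployed engine does with a query. The sketch assumes Scala 2.11 or 2.12, where `.par` is part of the standard library (Scala 2.13 needs the separate `scala-parallel-collections` module and an extra import). It reads one query per line from a plain local file, maps over the queries with a parallel collection, and writes the results to a plain local file.

```scala
import java.io.PrintWriter
import scala.io.Source

object BatchPredictSketch {
  // Hypothetical stand-in for an engine's prediction call; a real engine
  // would deserialize the query and consult its trained model here.
  def predict(queryJson: String): String =
    s"""{"query":$queryJson,"prediction":${queryJson.length}}"""

  def run(inputPath: String, outputPath: String): Unit = {
    // Read the whole input as a plain local file, one query per line,
    // skipping blank lines.
    val source = Source.fromFile(inputPath, "UTF-8")
    val queries =
      try source.getLines().filter(_.trim.nonEmpty).toVector
      finally source.close()

    // .par turns the Vector into a parallel collection; map runs the
    // predictions on a thread pool and keeps the input order.
    val results = queries.par.map(predict).seq

    // Write the results as a plain local file, one result per line.
    val out = new PrintWriter(outputPath, "UTF-8")
    try results.foreach(r => out.println(r))
    finally out.close()
  }
}
```

Because this sketch holds every query and every result in memory on a single machine, it also shows why a very large batch may have to be split across several runs.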

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mars/incubator-predictionio batch-predict-persistent-model

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #447
    
----
commit aad6b22ff9382bb1780efaa3e97af04c92a672f3
Author: Mars Hall <m...@heroku.com>
Date:   2017-11-17T23:25:46Z

    Parallelize batchpredict with Scala parallel collections instead of Spark RDD

----

