[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370747#comment-14370747 ]
Joseph K. Bradley commented on SPARK-5981:
------------------------------------------
I agree that it should not work for Decision Trees. I was citing
DecisionTreeModel as a good example of why PySpark models which use
JavaModelWrapper cannot be used within maps.
The issue is that JavaModelWrapper.call ends up inside the closure (lambda)
that is passed to each worker. When a worker executes that closure, it calls
JavaModelWrapper.call, which fails because the call relies on the SparkContext,
and the SparkContext is only available on the driver.
Does that make sense?
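
To illustrate the failure mode, here is a toy simulation in plain Python. It
is only a sketch and does not use PySpark: JvmGateway, WrappedModel,
active_gateway and run_as_worker are hypothetical stand-ins, and the error
message is invented. The point it tries to show is that a model which forwards
every call to a driver-only resource works on the driver but breaks once the
same lambda runs in a process without that resource.

```python
class JvmGateway:
    """Hypothetical stand-in for the driver's connection to the JVM."""

    def invoke(self, method, *args):
        # Pretend the JVM computed a prediction.
        return 1.0


# Exists on the driver only; a worker process never gets one.
active_gateway = JvmGateway()


class WrappedModel:
    """Toy analogue of a model that forwards every call to the JVM."""

    def call(self, method, *args):
        if active_gateway is None:
            raise RuntimeError("no JVM gateway in this process")
        return active_gateway.invoke(method, *args)

    def predict(self, features):
        return self.call("predict", features)


def run_as_worker(func, rows):
    """Simulate a worker: apply func to rows while no gateway is available."""
    global active_gateway
    saved, active_gateway = active_gateway, None
    try:
        return [func(row) for row in rows]
    finally:
        active_gateway = saved


model = WrappedModel()
driver_result = model.predict([0.5, 1.5])  # fine on the driver

try:
    run_as_worker(lambda features: model.predict(features), [[0.5, 1.5]])
    worker_error = None
except RuntimeError as err:
    worker_error = str(err)

print(driver_result)
print(worker_error)
```

In this toy setup the lambda itself is fine; what breaks is the call it makes
once it runs somewhere the gateway does not exist.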
> pyspark ML models should support predict/transform on vector within map
> -----------------------------------------------------------------------
>
> Key: SPARK-5981
> URL: https://issues.apache.org/jira/browse/SPARK-5981
> Project: Spark
> Issue Type: Improvement
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> Currently, most Python models only have limited support for single-vector
> prediction.
> E.g., one can call {code}model.predict(myFeatureVector){code} for a single
> instance, but that fails within a map for Python ML models and transformers
> which use JavaModelWrapper:
> {code}
> data.map(lambda features: model.predict(features))
> {code}
> This fails because JavaModelWrapper.call uses the SparkContext (within the
> transformation). (It works for linear models, which do prediction within
> Python.)
> Supporting prediction within a map would require storing the model and doing
> prediction/transformation within Python.
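
As a rough sketch of what "storing the model and doing prediction within
Python" could look like, the example below keeps all parameters as plain
Python data. PythonLinearModel is a hypothetical class, not a PySpark one, and
the built-in map stands in for an RDD map. The pickle round trip on the
parameters is only meant to mimic shipping the model state to a worker.

```python
import pickle


class PythonLinearModel:
    """Hypothetical model whose parameters live entirely in Python."""

    def __init__(self, weights, intercept):
        self.weights = list(weights)
        self.intercept = float(intercept)

    def predict(self, features):
        # Plain dot product plus intercept; no driver-side resource is used.
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


model = PythonLinearModel([2.0, -1.0], 0.5)

# The whole model state is plain data, so it survives serialization.
state = pickle.loads(pickle.dumps((model.weights, model.intercept)))
worker_model = PythonLinearModel(*state)

data = [[1.0, 1.0], [0.0, 2.0]]
predictions = list(map(lambda features: worker_model.predict(features), data))
print(predictions)
```

Because predict touches nothing but the stored weights, the same lambda can run
wherever the data lives.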