Re: pass unique ID to mllib algorithms pyspark

2014-11-05 Thread Tamas Jambor
Hi Xiangrui,

Thanks for the reply. Is this still due to be released in 1.2
(SPARK-3530 is still open)?
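
In the meantime, would zipping the IDs back onto the predictions be a
reasonable stopgap? Something like the rough sketch below (variable names
made up, not tested), which assumes that model.predict(rdd) behaves as a
one-to-one, order-preserving map over its input. I realise that ordering
guarantee is exactly what is in question here:

    from pyspark.mllib.tree import DecisionTree

    # data: RDD of (id, LabeledPoint) pairs, ids being whatever unique keys we have
    ids = data.map(lambda kv: kv[0])
    points = data.map(lambda kv: kv[1])

    model = DecisionTree.trainClassifier(points, numClasses=2,
                                         categoricalFeaturesInfo={})

    predictions = model.predict(points.map(lambda lp: lp.features))

    # join the IDs back on by position; only valid if predict preserved
    # partitioning and per-partition ordering
    id_predictions = ids.zip(predictions)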

Thanks,

On Wed, Nov 5, 2014 at 3:21 AM, Xiangrui Meng men...@gmail.com wrote:
 The proposed new set of APIs (SPARK-3573, SPARK-3530) will address
 this issue. We carry extra columns through training and prediction
 and then leverage Spark SQL's execution plan optimization to decide
 which columns are really needed. For the current set of APIs, we can
 add `predictOnValues` to models, which carries over the input keys.
 StreamingKMeans and StreamingLinearRegression implement this method.
 -Xiangrui
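
To make sure I understand the `predictOnValues` idea: the model takes
(key, feature vector) pairs and returns (key, prediction) pairs, so the
key travels with each record. A rough illustrative sketch (I believe the
streaming models that implement this are Scala-only at the moment, so
the Python below only shows the shape of the call, with made-up names):

    keyed = examples.map(lambda x: (x[0], x[1]))    # (id, feature vector) pairs
    predictions = model.predictOnValues(keyed)      # (id, predicted value) pairs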

 On Tue, Nov 4, 2014 at 2:30 AM, jamborta jambo...@gmail.com wrote:
 Hi all,

 There are a few algorithms in pyspark where the prediction part is
 implemented in Scala (e.g. ALS, decision trees), which makes it hard to
 customise the prediction methods.

 I think it is a very common scenario that the user would like to generate
 predictions for a dataset such that each predicted value is identifiable
 (e.g. has a unique ID attached to it). This is not possible in the current
 implementation: the predict functions take feature vectors and return the
 predicted values in an order that is, I believe, not guaranteed, so there is
 no way to join them back to the original data the predictions were generated
 from.
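
 For example (a rough sketch), with a decision tree:

     predictions = model.predict(data.map(lambda lp: lp.features))
     # an RDD of bare predicted values; nothing links each value back to
     # the row (or unique ID) it was computed from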

 Is there a way around this at the moment?

 thanks,



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/pass-unique-ID-to-mllib-algorithms-pyspark-tp18051.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


