[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2016-07-27 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395434#comment-15395434
 ] 

Alexander Ulanov commented on SPARK-9120:
-

Thanks for the comment. Indeed, RegressionModel does not extend that trait. 
However, it is designed to handle one output variable, as mentioned in the 
description. This prevents it from being used for multivariate regression.

> Add multivariate regression (or prediction) interface
> -
>
> Key: SPARK-9120
> URL: https://issues.apache.org/jira/browse/SPARK-9120
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.0
>Reporter: Alexander Ulanov
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> org.apache.spark.ml.regression.RegressionModel supports prediction only for a 
> single variable with a method "predict:Double" by extending the Predictor. 
> There is a need for multivariate prediction, at least for regression. I 
> propose to modify "RegressionModel" interface similarly to how it is done in 
> "ClassificationModel", which supports multiclass classification. It has 
> "predict:Double" and "predictRaw:Vector". Analogously, "RegressionModel" 
> should have something like "predictMultivariate:Vector".
> Update: After reading the design docs, adding "predictMultivariate" to 
> RegressionModel does not seem reasonable to me anymore. The issue is as 
> follows. RegressionModel has "predict:Double". Its "train" method uses 
> "predict:Double" for prediction, i.e. PredictionModel (and RegressionModel) 
> is hard-coded to have only one output. A similar problem exists in MLlib 
> (https://issues.apache.org/jira/browse/SPARK-5362). 
> A possible solution might require redesigning the class 
> hierarchy or adding a separate interface that extends Model, though the 
> latter means code duplication.
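To make the original proposal concrete, here is a minimal Scala sketch that does not depend on Spark. `PredictionModel`, `ClassificationModel` and `MultivariateRegressionModel` below are simplified stand-ins for the hierarchy described in this issue, not Spark's real classes; `predictMultivariate` is the hypothetical method named in the description, and vectors are modelled as `Array[Double]`. It shows the `predictRaw` analogy and why `predict: Double` can only expose one output.

```scala
// Simplified stand-ins for the Spark ML classes discussed in this issue.
// These are NOT Spark's real classes; vectors are modelled as Array[Double].

abstract class PredictionModel {
  // Single-output prediction, hard-coded to Double.
  def predict(features: Array[Double]): Double
}

// Classification keeps predict: Double for the label and adds a vector of raw scores.
abstract class ClassificationModel extends PredictionModel {
  def predictRaw(features: Array[Double]): Array[Double]

  // The predicted label is the index of the largest raw score.
  override def predict(features: Array[Double]): Double = {
    val raw = predictRaw(features)
    raw.indices.maxBy(i => raw(i)).toDouble
  }
}

// Hypothetical regression analogue proposed in the description: a vector-valued prediction.
abstract class MultivariateRegressionModel extends PredictionModel {
  def predictMultivariate(features: Array[Double]): Array[Double]

  // predict: Double can expose only one output (here the first one),
  // which is the single-output restriction this issue is about.
  override def predict(features: Array[Double]): Double =
    predictMultivariate(features).head
}

// Toy linear model y = W x, one row of W per output variable (illustration only).
class LinearMultiModel(weights: Array[Array[Double]]) extends MultivariateRegressionModel {
  override def predictMultivariate(features: Array[Double]): Array[Double] =
    weights.map(row => row.zip(features).map { case (w, x) => w * x }.sum)
}
```

For example, `new LinearMultiModel(Array(Array(1.0, 2.0), Array(0.5, 0.0)))` maps the features `(3.0, 4.0)` to the two outputs `11.0` and `1.5` via `predictMultivariate`, while `predict` can only return `11.0`.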



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2016-07-18 Thread Ruben Janssen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383081#comment-15383081
 ] 

Ruben Janssen commented on SPARK-9120:
--

Bumping this JIRA because of the recent PR for 
https://issues.apache.org/jira/browse/SPARK-10409, which triggered the same 
discussion. Given that SPARK-10409 is on the roadmap for 2.1 
(https://issues.apache.org/jira/browse/SPARK-5575), we should keep the discussion 
in one place, or at least link this JIRA to SPARK-10409.

Regarding the update to the description, which states 'The issue is as follows. 
RegressionModel extends PredictionModel which has "predict:Double".': this 
seems to be out of date, unless I am missing something. ClassificationModel in 
ML seems to extend PredictionModel in the same way RegressionModel does. 
The initial solution stated therefore seems sufficient if we want 
multivariate regression for all regression algorithms that implement the 
interface. I am not sure that is the case, however. If not, I think it 
would be best to create a separate interface which can then be implemented by 
algorithms unique (and to keep things consistent, we let ClassificationModel 
als us to have it: it would not require us to change any code if the naming 
were consistent).
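The "separate interface" option can be sketched in plain Scala as a capability trait that only multi-output algorithms mix in, leaving the single-output base class untouched. Everything here is hypothetical: `SingleOutputModel` stands in for PredictionModel, and `MultivariatePredictor`, `TwoOutputRegression`, `ScalarRegression` and `CapabilityDemo` are illustrative names, not Spark APIs.

```scala
// Hypothetical design sketch (plain Scala, not Spark): a separate capability trait
// that only multi-output algorithms implement, instead of changing PredictionModel.

// Simplified stand-in for Spark's single-output PredictionModel.
abstract class SingleOutputModel {
  def predict(features: Array[Double]): Double
}

// Hypothetical separate interface for multi-output prediction.
trait MultivariatePredictor {
  // Number of output variables produced for each input.
  def numOutputs: Int
  def predictMultivariate(features: Array[Double]): Array[Double]
}

// A multi-output regression model mixes the trait in...
class TwoOutputRegression(a: Double, b: Double)
    extends SingleOutputModel with MultivariatePredictor {
  override val numOutputs: Int = 2
  override def predictMultivariate(features: Array[Double]): Array[Double] = {
    val s = features.sum
    Array(a * s, b * s)
  }
  // The single-output method still exists and returns the first output.
  override def predict(features: Array[Double]): Double = predictMultivariate(features)(0)
}

// ...while an ordinary single-output model is left unchanged.
class ScalarRegression(a: Double) extends SingleOutputModel {
  override def predict(features: Array[Double]): Double = a * features.sum
}

object CapabilityDemo {
  // Callers check for the capability rather than relying on the class hierarchy.
  def outputs(model: SingleOutputModel, features: Array[Double]): Array[Double] =
    model match {
      case m: MultivariatePredictor => m.predictMultivariate(features)
      case m                        => Array(m.predict(features))
    }
}
```

Because the trait is independent of the regression/classification split, a classification model could in principle mix in the same trait; the cost, as noted in the description, is some duplication between `predict` and `predictMultivariate`.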


> Add multivariate regression (or prediction) interface
> -
>
> Key: SPARK-9120
> URL: https://issues.apache.org/jira/browse/SPARK-9120
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.0
>Reporter: Alexander Ulanov
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> org.apache.spark.ml.regression.RegressionModel supports prediction only for a 
> single variable with a method "predict:Double" by extending the Predictor. 
> There is a need for multivariate prediction, at least for regression. I 
> propose to modify "RegressionModel" interface similarly to how it is done in 
> "ClassificationModel", which supports multiclass classification. It has 
> "predict:Double" and "predictRaw:Vector". Analogously, "RegressionModel" 
> should have something like "predictMultivariate:Vector".
> Update: After reading the design docs, adding "predictMultivariate" to 
> RegressionModel does not seem reasonable to me anymore. The issue is as 
> follows. RegressionModel extends PredictionModel which has "predict:Double". 
> Its "train" method uses "predict:Double" for prediction, i.e. PredictionModel 
> (and RegressionModel) is hard-coded to have only one output. A similar 
> problem exists in MLlib (https://issues.apache.org/jira/browse/SPARK-5362). 
> A possible solution might require redesigning the class 
> hierarchy or adding a separate interface that extends Model, though the 
> latter means code duplication.






[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2015-08-12 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694516#comment-14694516
 ] 

Joseph K. Bradley commented on SPARK-9120:
--

We can set a new target version as needed.

 Add multivariate regression (or prediction) interface
 -

 Key: SPARK-9120
 URL: https://issues.apache.org/jira/browse/SPARK-9120
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 1.4.0
Reporter: Alexander Ulanov
 Fix For: 1.4.0

   Original Estimate: 1h
  Remaining Estimate: 1h

 org.apache.spark.ml.regression.RegressionModel supports prediction only for a 
 single variable with a method predict:Double by extending the Predictor. 
 There is a need for multivariate prediction, at least for regression. I 
 propose to modify RegressionModel interface similarly to how it is done in 
 ClassificationModel, which supports multiclass classification. It has 
 predict:Double and predictRaw:Vector. Analogously, RegressionModel 
 should have something like predictMultivariate:Vector.
 Update: After reading the design docs, adding predictMultivariate to 
 RegressionModel does not seem reasonable to me anymore. The issue is as 
 follows. RegressionModel extends PredictionModel which has predict:Double. 
 Its train method uses predict:Double for prediction, i.e. PredictionModel 
 (and RegressionModel) is hard-coded to have only one output. A similar 
 problem exists in MLlib (https://issues.apache.org/jira/browse/SPARK-5362). 
 A possible solution might require redesigning the class 
 hierarchy or adding a separate interface that extends Model, though the 
 latter means code duplication.






[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2015-07-16 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630654#comment-14630654
 ] 

Alexander Ulanov commented on SPARK-9120:
-

Thank you, it sounds doable.

 Add multivariate regression (or prediction) interface
 -

 Key: SPARK-9120
 URL: https://issues.apache.org/jira/browse/SPARK-9120
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 1.4.0
Reporter: Alexander Ulanov
 Fix For: 1.4.0

   Original Estimate: 1h
  Remaining Estimate: 1h

 org.apache.spark.ml.regression.RegressionModel supports prediction only for a 
 single variable with a method predict:Double by extending the Predictor. 
 There is a need for multivariate prediction, at least for regression. I 
 propose to modify RegressionModel interface similarly to how it is done in 
 ClassificationModel, which supports multiclass classification. It has 
 predict:Double and predictRaw:Vector. Analogously, RegressionModel 
 should have something like predictMultivariate:Vector.






[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2015-07-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630585#comment-14630585
 ] 

Joseph K. Bradley commented on SPARK-9120:
--

I meant that it should be implemented under ML using the more generic 
abstractions such as Transformer and Estimator.  Would that work?

More specialized abstractions analogous to Predictor and Classifier could be 
added later on.
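The suggestion to build on the generic Estimator/Transformer abstractions can be illustrated with a small Spark-free sketch. Here a "DataFrame" is just a sequence of rows mapping column names to vectors, and `Transformer`/`Estimator` are simplified stand-ins for the Spark ML traits of the same name. `MeanRegressor` is a hypothetical multivariate baseline (it predicts the column-wise mean of the label vectors); the point is only that `fit` returns a model whose output column is a vector, with no `predict: Double` involved.

```scala
// Minimal stand-ins for Spark ML's Transformer/Estimator (not the real API).
object PipelineSketch {
  // A "DataFrame" here is just rows mapping column names to vectors.
  type Row = Map[String, Seq[Double]]
  type Frame = Seq[Row]

  trait Transformer { def transform(data: Frame): Frame }
  trait Estimator[M <: Transformer] { def fit(data: Frame): M }

  // Hypothetical multivariate model: appends the same vector prediction to every row.
  class MeanModel(val means: Seq[Double], outputCol: String) extends Transformer {
    override def transform(data: Frame): Frame =
      data.map(row => row + (outputCol -> means))
  }

  // Hypothetical estimator: fit() learns the per-output mean of the label vectors.
  class MeanRegressor(labelCol: String, outputCol: String) extends Estimator[MeanModel] {
    override def fit(data: Frame): MeanModel = {
      val labels = data.map(_(labelCol))
      val n = labels.size.toDouble
      // Column-wise mean over all label vectors (assumes non-empty data, equal lengths).
      val means = labels.transpose.map(_.sum / n)
      new MeanModel(means, outputCol)
    }
  }
}
```

The design choice mirrors the comment above: the estimator and model are free to define their own column types, and a more specialized `Predictor`-like abstraction could be layered on top later.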




[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2015-07-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630546#comment-14630546
 ] 

Joseph K. Bradley commented on SPARK-9120:
--

This sounds reasonable.

One caveat, though: since adding those abstractions, I have wondered a bit about 
their generality. I feel they are mainly useful for helping developers 
write new algorithms and avoid some boilerplate code. For providing public 
abstractions, I think we should probably design some traits, but I have not 
had time to think about this deeply.

So I think we should do this lazily: if you have an algorithm to add, it should 
be added with the interface. As we add more algorithms, we can start 
thinking about creating an abstraction.

What do you think?




[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2015-07-16 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630596#comment-14630596
 ] 

Alexander Ulanov commented on SPARK-9120:
-

I think it should work for train (aka fit), which has to return the model, but 
I am not sure about the model itself. The common ancestor Model does not contain 
anything that can be called for prediction, and its direct subclass 
PredictionModel has predict:Double. Is there another way that you were 
referring to?




[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2015-07-16 Thread Alexander Ulanov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630560#comment-14630560
 ] 

Alexander Ulanov commented on SPARK-9120:
-

Thank you for sharing your thoughts. Do you mean that an algorithm that does 
multivariate regression should not be implemented within ML, since ML does not 
support multivariate output, and that it should live within MLlib for a while 
until you figure out a generic interface? By support I mean handling the .fit 
and .transform stuff, etc.




[jira] [Commented] (SPARK-9120) Add multivariate regression (or prediction) interface

2015-07-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630644#comment-14630644
 ] 

Joseph K. Bradley commented on SPARK-9120:
--

The prediction method is transform().  Model inherits from Transformer, which 
declares transform().  The protected predict: Double method is really a 
convenience for developers so they don't have to implement transform() 
directly.  But if you implement transform() yourself, you have complete control 
over the schema of the input and output DataFrame.  (Prediction will mean 
adding one or more columns to the DataFrame.)
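The point that implementing transform() directly gives full control over the output schema can be sketched in plain Scala. Rows are modelled as `Map[String, Double]` rather than a Spark DataFrame, and `LinearTransformModel`, its `prediction_0`, `prediction_1`, ... column naming, and the `transformSchema` helper are hypothetical illustrations (loosely analogous to Spark's `transformSchema`), not Spark code. The model computes y = W x + b and appends one column per output variable.

```scala
// Hypothetical sketch: a model that implements transform() itself and therefore
// decides the output schema, adding one Double column per output variable.
// Plain Scala stand-ins; this is not Spark's DataFrame API.
object TransformSketch {
  type Row = Map[String, Double]

  // Linear multivariate model y = W x + b, with one row of W (and one b) per output.
  class LinearTransformModel(
      featureCols: Seq[String],
      weights: Seq[Seq[Double]],
      bias: Seq[Double],
      outputPrefix: String = "prediction_") {

    require(weights.size == bias.size, "one bias term per output")
    require(weights.forall(_.size == featureCols.size), "one weight per feature")

    // Names of the columns this model appends, e.g. prediction_0, prediction_1.
    def outputCols: Seq[String] = bias.indices.map(i => s"$outputPrefix$i")

    // Schema check: validates the input columns and returns the output column list.
    def transformSchema(inputCols: Seq[String]): Seq[String] = {
      val missing = featureCols.filterNot(inputCols.contains)
      require(missing.isEmpty, s"missing input columns: ${missing.mkString(", ")}")
      inputCols ++ outputCols
    }

    // "Prediction" = adding one or more columns to each row.
    def transform(rows: Seq[Row]): Seq[Row] = rows.map { row =>
      val x = featureCols.map(row)
      val ys = weights.zip(bias).map { case (w, b) =>
        w.zip(x).map { case (wi, xi) => wi * xi }.sum + b
      }
      row ++ outputCols.zip(ys)
    }
  }
}
```

With weights `((1, 1), (2, 0))` and bias `(0, 0.5)`, a row with `x1 = 2`, `x2 = 3` gains `prediction_0 = 5.0` and `prediction_1 = 4.5`; a single-output model is just the special case with one row of weights.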
