[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2017-05-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995862#comment-15995862
 ] 

Apache Spark commented on SPARK-10931:
--

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/17849

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2017-04-09 Thread Maciej Szymkiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962141#comment-15962141
 ] 

Maciej Szymkiewicz commented on SPARK-10931:


[~vlad.feinberg] It is worth noting that without {{parent}} some features (like 
{{CrossValidator}} or  {{TrainValidationSplit}}) are crippled. 

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2016-08-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421753#comment-15421753
 ] 

Apache Spark commented on SPARK-10931:
--

User 'evanyc15' has created a pull request for this issue:
https://github.com/apache/spark/pull/14653

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2016-07-11 Thread Vladimir Feinberg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371420#comment-15371420
 ] 

Vladimir Feinberg commented on SPARK-10931:
---

[~josephkb] Te intention of this JIRA is a bit confusing. To my understanding, 
there are three kinds of params:

1. Estimator-related params that only have to do with fitting (e.g., 
regularization)
2. Independent model and estimator-related params to do with prediction (e.g., 
number of maximum iterations)
3. Shared model and estimator params that are set once per fitted pipeline 
(e.g., number of components in PCA).

I'd venture that we'd want a model to have:

1. Access to an immutable version of (1) and (3).
  * In Scala, this is done by having a {{parent}} reference to the generating 
{{Estimator}}, but this is a reference, so if the estimator changes then the 
params will, too, inconsistent with the model. It should be copy-on-write (this 
may be SPARK-7494, I'm not sure). Also, {{parent}} is a mutable reference.
  * In Python, there is no {{parent}}

2. Access to a mutable version of (2), where mutation should change model 
behavior
  * Both languages have this.

3. Separation of concerns. If a parameter falls into categories (1) or (3), it 
shouldn't be a parameter for the model, since changing its value has no effect 
except confusion
  * Both Python and Scala will, as of this JIRA, copy everything - groups (1), 
(2), (3) - to the model, each with its own version.

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2016-05-03 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270105#comment-15270105
 ] 

holdenk commented on SPARK-10931:
-

So, just to be certain, for https://issues.apache.org/jira/browse/SPARK-14813 
we won't try and resolve the params not being present in the models?

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2016-04-29 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264815#comment-15264815
 ] 

Joseph K. Bradley commented on SPARK-10931:
---

Changing target to 2.1 since the code freeze is upon us.

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2016-03-10 Thread Timothy Hunter (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190093#comment-15190093
 ] 

Timothy Hunter commented on SPARK-10931:


Using python decorators, it is fairly easy to autogenerate at runtime all the 
param wrappers, getters and setters and extract the documentation from the 
scala side so that the documentation of the parameter is included in the 
docstring of the getters and setters.

There are two issues with that:
 - do we need to specialize the documentation or some of the conversions 
between java and python? In both cases, it is possible to "subclass" and make 
sure that the methods do not get overwritten by some autogenerated stubs
 - the documentation of a class (which relies on the bytecode, not on runtime 
instances) would miss all the params, because they are only generated in 
runtime objects. I believe there are some ways around it, such as inserting 
such methods at import time, but that would require more investigation.

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2016-01-12 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094671#comment-15094671
 ] 

Evan Chen commented on SPARK-10931:
---

Hey Joseph,

Please advise on which method is preferred.

Thank you

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061051#comment-15061051
 ] 

Joseph K. Bradley commented on SPARK-10931:
---

I'd very strongly prefer not to modify every model.  I believe we can save a 
lot of code by using a generic, shared implementation.  Check out {{getattr}} 
here: [https://docs.python.org/2/library/functions.html]

In the wrapper.py file in spark.ml, there are some abstractions defined.  I'm 
hoping one of those can be modified to provide access to Params.

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-16 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061090#comment-15061090
 ] 

Evan Chen commented on SPARK-10931:
---

Hey Joseph,

If using the getattr method, are you suggesting fetching the parameter straight 
from the Model java object or from the Estimator and copying it into the Model 
itself?

Thanks

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061324#comment-15061324
 ] 

Joseph K. Bradley commented on SPARK-10931:
---

I was thinking of fetching it from the Java object and returning it.  That 
would probably only be reasonable for getters (getMyParam), not the Param or 
the setter (which won't apply to most models anyways).

On the other hand, the problem with my suggestion is that it won't show up in 
the API documentation.  It might be worth modifying each model individually.  
I'm wondering if we should think about ways to keep the Params synched with the 
JVM model; I'd like to think about ways to do this more.

Just a warning: I'll be traveling for a bit and won't be able to check the 
status for 1.5 weeks, but it's possible others can take a look in the meantime.

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-11 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053808#comment-15053808
 ] 

Evan Chen commented on SPARK-10931:
---

Hey Joseph,

Thanks for the suggestion. 
I was wondering what model abstraction and getattr method are you referring to?
I modified every model on the Python side to reflect how it is being done on 
the Scala side. 
Let me know what you think.

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053721#comment-15053721
 ] 

Apache Spark commented on SPARK-10931:
--

User 'evanyc15' has created a pull request for this issue:
https://github.com/apache/spark/pull/10270

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-08 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047324#comment-15047324
 ] 

Joseph K. Bradley commented on SPARK-10931:
---

OK great!  Btw, I just thought that this might be doable in a generic way which 
does not require modifying every Model:
* In the Model abstraction, override getattr to check for whether the attribute 
name is the name of a Param in the parent Estimator.  If so, return that Param. 
 If not, call the default getattr, if any.


> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-12-04 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042482#comment-15042482
 ] 

Evan Chen commented on SPARK-10931:
---

Hey Joseph,

Just wanted to give a status update. I should be submitting a PR soon.

Thanks,

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10931) PySpark ML Models should contain Param values

2015-10-06 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945505#comment-14945505
 ] 

Evan Chen commented on SPARK-10931:
---

Hey Joseph,

I can work on this.

Thanks

> PySpark ML Models should contain Param values
> -
>
> Key: SPARK-10931
> URL: https://issues.apache.org/jira/browse/SPARK-10931
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>
> PySpark spark.ml Models are generally wrappers around Java objects and do not 
> even contain Param values.  This JIRA is for copying the Param values from 
> the Estimator to the model.
> This can likely be solved by modifying Estimator.fit to copy Param values, 
> but should also include proper unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org