[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Target Version/s:   (was: 2.2.0)

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
>
> Proposal: Make most of the Param traits in sharedParams.scala public.  Mark them as DeveloperApi.
>
> Pros:
> * Sharing the Param traits helps to encourage standardized Param names and documentation.
>
> Cons:
> * Users have to be careful since parameters can have different meanings for different algorithms.
> * If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.
>
> Currently, the shared params are private.
>
> h3. UPDATED proposal
> * Some Params are clearly safe to make public.  We will do so.
> * Some Params could be made public but may require caveats in the trait doc.
> * Some Params have turned out not to be shared in practice.  We can move those Params to the classes which use them.
>
> *Public shared params*:
> * I/O column params
> ** HasFeaturesCol
> ** HasInputCol
> ** HasInputCols
> ** HasLabelCol
> ** HasOutputCol
> ** HasPredictionCol
> ** HasProbabilityCol
> ** HasRawPredictionCol
> ** HasVarianceCol
> ** HasWeightCol
> * Algorithm settings
> ** HasCheckpointInterval
> ** HasElasticNetParam
> ** HasFitIntercept
> ** HasMaxIter
> ** HasRegParam
> ** HasSeed
> ** HasStandardization (less common)
> ** HasStepSize
> ** HasTol
>
> *Questionable params*:
> * HasHandleInvalid (only used in StringIndexer, but might be more widely used later on)
> * HasSolver (used in LinearRegression and GeneralizedLinearRegression, but same meaning as Optimizer in LDA)
>
> *Params to be removed from sharedParams*:
> * HasThreshold (only used in LogisticRegression)
> * HasThresholds (only used in ProbabilisticClassifier)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
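
The second concern under Cons (code testing for the shared traits) can be illustrated with a small sketch. This is a simplified, pure-Python stand-in and not Spark's implementation: `Param`, `ToyLinearModel`, `ToyIndexer`, and `shared_param_names` are hypothetical names, and the shared traits are modeled as plain mixin classes that only borrow the names `HasMaxIter` and `HasRegParam` from the list above.

```python
class Param:
    """A named parameter with documentation (simplified stand-in, not Spark's class)."""

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc


class HasMaxIter:
    """Shared mixin: every class that mixes it in gets the same name and doc."""

    maxIter = Param("maxIter", "maximum number of iterations (>= 0)")


class HasRegParam:
    """Shared mixin for a regularization parameter."""

    regParam = Param("regParam", "regularization parameter (>= 0)")


class ToyLinearModel(HasMaxIter, HasRegParam):
    """Hypothetical algorithm that reuses both shared params."""


class ToyIndexer:
    """Hypothetical stage that uses none of the shared params."""


def shared_param_names(stage):
    """List the shared params a stage exposes by testing for the mixins.

    This is the kind of check that becomes possible once the mixins are
    public, and that callers might then come to rely on.
    """
    mixins = [(HasMaxIter, "maxIter"), (HasRegParam, "regParam")]
    return [name for mixin, name in mixins if isinstance(stage, mixin)]
```

Here `shared_param_names(ToyLinearModel())` finds both params, while `ToyIndexer()` yields none. The standardized name and doc string come for free, which is the benefit listed under Pros; the `isinstance` test is the coupling the issue is unsure about exposing.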
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Target Version/s: 2.2.0  (was: 2.1.0)

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Target Version/s: 2.1.0  (was: 2.0.0)

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Description:
Proposal: Make most of the Param traits in sharedParams.scala public.  Mark them as DeveloperApi.

Pros:
* Sharing the Param traits helps to encourage standardized Param names and documentation.

Cons:
* Users have to be careful since parameters can have different meanings for different algorithms.
* If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.

Currently, the shared params are private.

h3. UPDATED proposal
* Some Params are clearly safe to make public.  We will do so.
* Some Params could be made public but may require caveats in the trait doc.
* Some Params have turned out not to be shared in practice.  We can move those Params to the classes which use them.

*Public shared params*:
* I/O column params
** HasFeaturesCol
** HasInputCol
** HasInputCols
** HasLabelCol
** HasOutputCol
** HasPredictionCol
** HasProbabilityCol
** HasRawPredictionCol
** HasVarianceCol
** HasWeightCol
* Algorithm settings
** HasCheckpointInterval
** HasElasticNetParam
** HasFitIntercept
** HasMaxIter
** HasRegParam
** HasSeed
** HasStandardization (less common)
** HasStepSize
** HasTol

*Questionable params*:
* HasHandleInvalid (only used in StringIndexer, but might be more widely used later on)
* HasSolver (used in LinearRegression and GeneralizedLinearRegression, but same meaning as Optimizer in LDA)

*Params to be removed from sharedParams*:
* HasThreshold (only used in LogisticRegression)
* HasThresholds (only used in ProbabilisticClassifier)

  was:
Discussion: Should the Param traits in sharedParams.scala be public?

Pros:
* Sharing the Param traits helps to encourage standardized Param names and documentation.

Cons:
* Users have to be careful since parameters can have different meanings for different algorithms.
* If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.

Currently, the shared params are private.

h3. UPDATED proposal
* Some Params are clearly safe to make public.  We will do so.
* Some Params could be made public but may require caveats in the trait doc.
* Some Params have turned out not to be shared in practice.  We can move those Params to the classes which use them.

*Public shared params*:
* I/O column params
** HasFeaturesCol
** HasInputCol
** HasInputCols
** HasLabelCol
** HasOutputCol
** HasPredictionCol
** HasProbabilityCol
** HasRawPredictionCol
** HasVarianceCol
** HasWeightCol
* Algorithm settings
** HasCheckpointInterval
** HasElasticNetParam
** HasFitIntercept
** HasMaxIter
** HasRegParam
** HasSeed
** HasStandardization (less common)
** HasStepSize
** HasTol

*Questionable params*:
* HasHandleInvalid (only used in StringIndexer, but might be more widely used later on)
* HasSolver (used in LinearRegression and GeneralizedLinearRegression, but same meaning as Optimizer in LDA)

*Params to be removed from sharedParams*:
* HasThreshold (only used in LogisticRegression)
* HasThresholds (only used in ProbabilisticClassifier)

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Description:
Discussion: Should the Param traits in sharedParams.scala be public?

Pros:
* Sharing the Param traits helps to encourage standardized Param names and documentation.

Cons:
* Users have to be careful since parameters can have different meanings for different algorithms.
* If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.

Currently, the shared params are private.

h3. UPDATED proposal
* Some Params are clearly safe to make public.  We will do so.
* Some Params could be made public but may require caveats in the trait doc.
* Some Params have turned out not to be shared in practice.  We can move those Params to the classes which use them.

*Public shared params*:
* I/O column params
** HasFeaturesCol
** HasInputCol
** HasInputCols
** HasLabelCol
** HasOutputCol
** HasPredictionCol
** HasProbabilityCol
** HasRawPredictionCol
** HasVarianceCol
** HasWeightCol
* Algorithm settings
** HasCheckpointInterval
** HasElasticNetParam
** HasFitIntercept
** HasMaxIter
** HasRegParam
** HasSeed
** HasStandardization (less common)
** HasStepSize
** HasTol

*Questionable params*:
* HasHandleInvalid (only used in StringIndexer, but might be more widely used later on)
* HasSolver (used in LinearRegression and GeneralizedLinearRegression, but same meaning as Optimizer in LDA)

*Params to be removed from sharedParams*:
* HasThreshold (only used in LogisticRegression)
* HasThresholds (only used in ProbabilisticClassifier)

  was:
Discussion: Should the Param traits in sharedParams.scala be public?

Pros:
* Sharing the Param traits helps to encourage standardized Param names and documentation.

Cons:
* Users have to be careful since parameters can have different meanings for different algorithms.
* If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.

Currently, the shared params are private.

Proposal: Either
(a) make the shared params private to encourage users to write specialized documentation and value checks for parameters, or
(b) design a better way to encourage overriding documentation and parameter value checks

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Target Version/s: 2.0.0  (was: )

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-7146:
--------------------------------
    Target Version/s: 1.7.0  (was: 1.6.0)

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Description:
Discussion: Should the Param traits in sharedParams.scala be public?

Pros:
* Sharing the Param traits helps to encourage standardized Param names and documentation.

Cons:
* Users have to be careful since parameters can have different meanings for different algorithms.
* If the shared Params are public, then implementations could test for the traits.  It is unclear if we want users to rely on these traits, which are somewhat experimental.

Currently, the shared params are private.

Proposal: Either
(a) make the shared params private to encourage users to write specialized documentation and value checks for parameters, or
(b) design a better way to encourage overriding documentation and parameter value checks

  was:
Discussion: Should the Param traits in sharedParams.scala be private?

Pros:
* Users have to be careful since parameters can have different meanings for different algorithms.

Cons:
* Sharing the Param traits helps to encourage standardized Param names and documentation.
* If the shared Params are public, then implementations could test for the traits.  We probably do not want users to do that.

Currently, the shared params are public but marked as DeveloperApi.

Proposal: Either
(a) make the shared params private to encourage users to write specialized documentation and value checks for parameters, or
(b) design a better way to encourage overriding documentation and parameter value checks

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley
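
One possible reading of option (b) in the proposal above is a shared parameter that supplies a default doc string and value check, which an algorithm can override when the parameter means something narrower there. The sketch below is only an assumption about what such a design could look like, written in plain Python rather than against Spark: `Param`, `set_step_size`, `ToyGradientDescent`, and `ToyBoostedTrees` are hypothetical, and only the name `HasStepSize` is taken from the issue.

```python
class Param:
    """A named parameter with a doc string and a value check (stand-in, not Spark's class)."""

    def __init__(self, name, doc, is_valid=lambda value: True):
        self.name = name
        self.doc = doc
        self.is_valid = is_valid


class HasStepSize:
    """Shared mixin providing a default doc and a default value check."""

    stepSize = Param("stepSize", "step size for each iteration (> 0)",
                     lambda v: v > 0)

    def set_step_size(self, value):
        # Look the param up on the concrete class so an override wins.
        param = type(self).stepSize
        if not param.is_valid(value):
            raise ValueError(
                f"{type(self).__name__}: invalid {param.name}={value!r} ({param.doc})")
        self._step_size = value
        return self


class ToyGradientDescent(HasStepSize):
    """Hypothetical algorithm that accepts the shared default as is."""


class ToyBoostedTrees(HasStepSize):
    """Hypothetical algorithm where the same name has a narrower meaning."""

    # Override both the documentation and the value check.
    stepSize = Param("stepSize", "learning rate in (0, 1]",
                     lambda v: 0 < v <= 1)
```

With this layout `ToyGradientDescent().set_step_size(5.0)` is accepted, while `ToyBoostedTrees().set_step_size(5.0)` raises `ValueError` and reports the specialized doc string. The name stays standardized, but each algorithm remains responsible for stating what the value means.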
[jira] [Updated] (SPARK-7146) Should ML sharedParams be a public API?
[ https://issues.apache.org/jira/browse/SPARK-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-7146:
------------------------------------
    Target Version/s: 1.6.0  (was: 1.5.0)

> Should ML sharedParams be a public API?
> ---------------------------------------
>
> Key: SPARK-7146
> URL: https://issues.apache.org/jira/browse/SPARK-7146
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: Joseph K. Bradley