[ https://issues.apache.org/jira/browse/SPARK-30144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Glen-Erik Cortes updated SPARK-30144:
-------------------------------------
    Description: 
Param maps for fitted classifiers are available for all classifiers except the
MultilayerPerceptronClassifier.

There is no way to track or know which parameters were best during
cross-validation, or which parameters were used for the submodels.
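For context, a minimal PySpark sketch of the kind of call that produces the truncated map shown in the next block. The DataFrame name (train_df) and the layer sizes are hypothetical, not taken from the notebook.

{code:python}
# Hypothetical setup: train_df has an assembled 'features' vector column
# (4 inputs assumed) and a binary 'fake_banknote' label column.
from pyspark.ml.classification import MultilayerPerceptronClassifier

mlp = MultilayerPerceptronClassifier(
    labelCol='fake_banknote',
    featuresCol='features',
    layers=[4, 8, 2],   # 4 inputs, one hidden layer of 8, 2 output classes
    maxIter=100,
    seed=1234)
model = mlp.fit(train_df)

# On 2.4.4 this only returns the shared column params for the fitted MLP model;
# layers, maxIter, stepSize, etc. are missing (see the map below).
print(model.extractParamMap())
{code}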
  
{code:java}
{
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='featuresCol', 
doc='features column name'): 'features', 
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='labelCol', 
doc='label column name'): 'fake_banknote', 
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', 
name='predictionCol', doc='prediction column name'): 'prediction', 
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', 
name='probabilityCol', doc='Column name for predicted class conditional 
probabilities. Note: Not all models output well-calibrated probability 
estimates! These probabilities should be treated as confidences, not precise 
probabilities'): 'probability', 
Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', 
name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name'): 
'rawPrediction'}{code}
 
GBTClassifier, for example, shows all parameters:
  
{code:java}
  {
Param(parent='GBTClassifier_a0e77b3430aa', name='cacheNodeIds', doc='If false, 
the algorithm will pass trees to executors to match instances with nodes. If 
true, the algorithm will cache node IDs for each instance. Caching can speed up 
training of deeper trees.'): False, 
Param(parent='GBTClassifier_a0e77b3430aa', name='checkpointInterval', doc='set 
checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the 
cache will get checkpointed every 10 iterations. Note: this setting will be 
ignored if the checkpoint directory is not set in the SparkContext'): 10, 
Param(parent='GBTClassifier_a0e77b3430aa', name='featureSubsetStrategy', 
doc='The number of features to consider for splits at each tree node. Supported 
options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].'): 'all', 
Param(parent='GBTClassifier_a0e77b3430aa', name='featuresCol', doc='features 
column name'): 'features', 
Param(parent='GBTClassifier_a0e77b3430aa', name='labelCol', doc='label column 
name'): 'fake_banknote', Param(parent='GBTClassifier_a0e77b3430aa', 
name='lossType', doc='Loss function which GBT tries to minimize 
(case-insensitive). Supported options: logistic'): 'logistic', 
Param(parent='GBTClassifier_a0e77b3430aa', name='maxBins', doc='Max number of 
bins for discretizing continuous features. Must be >=2 and >= number of 
categories for any categorical feature.'): 8, 
Param(parent='GBTClassifier_a0e77b3430aa', name='maxDepth', doc='Maximum depth 
of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal 
node + 2 leaf nodes.'): 5, Param(parent='GBTClassifier_a0e77b3430aa', 
name='maxIter', doc='maximum number of iterations (>= 0)'): 20, 
Param(parent='GBTClassifier_a0e77b3430aa', name='maxMemoryInMB', doc='Maximum 
memory in MB allocated to histogram aggregation.'): 256, 
Param(parent='GBTClassifier_a0e77b3430aa', name='minInfoGain', doc='Minimum 
information gain for a split to be considered at a tree node.'): 0.0, 
Param(parent='GBTClassifier_a0e77b3430aa', name='minInstancesPerNode', 
doc='Minimum number of instances each child must have after split. If a split 
causes the left or right child to have fewer than minInstancesPerNode, the 
split will be discarded as invalid. Should be >= 1.'): 1, 
Param(parent='GBTClassifier_a0e77b3430aa', name='predictionCol', 
doc='prediction column name'): 'prediction', 
Param(parent='GBTClassifier_a0e77b3430aa', name='seed', doc='random seed'): 
1234, 
Param(parent='GBTClassifier_a0e77b3430aa', name='stepSize', doc='Step size 
(a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of 
each estimator.'): 0.1, 
Param(parent='GBTClassifier_a0e77b3430aa', name='subsamplingRate', 
doc='Fraction of the training data used for learning each decision tree, in 
range (0, 1].'): 1.0}{code}
 
 Full example notebook here:

[https://colab.research.google.com/drive/1lwSHioZKlLh96FhGkdYFe6FUuRfTcSxH]
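For the cross-validation use case specifically, here is a hedged sketch of how one would normally read the winning and submodel configurations back out (again, train_df and the grid values are hypothetical; this assumes CrossValidator's collectSubModels option available in recent 2.x releases):

{code:python}
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

mlp = MultilayerPerceptronClassifier(labelCol='fake_banknote', layers=[4, 8, 2])

# Hypothetical grid over two MLP params.
grid = (ParamGridBuilder()
        .addGrid(mlp.maxIter, [50, 100])
        .addGrid(mlp.stepSize, [0.03, 0.1])
        .build())

cv = CrossValidator(estimator=mlp,
                    estimatorParamMaps=grid,
                    evaluator=MulticlassClassificationEvaluator(labelCol='fake_banknote'),
                    numFolds=3,
                    collectSubModels=True)
cv_model = cv.fit(train_df)

# Expected: recover the tuned values from the fitted models.
# Actual on 2.4.4: the MLP param maps omit layers, maxIter, stepSize, etc.,
# so neither the best model's nor the submodels' configurations are visible.
print(cv_model.bestModel.extractParamMap())
for fold_models in cv_model.subModels:
    for sub_model in fold_models:
        print(sub_model.extractParamMap())
{code}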



> MLP param map missing
> ---------------------
>
>                 Key: SPARK-30144
>                 URL: https://issues.apache.org/jira/browse/SPARK-30144
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.4.4
>            Reporter: Glen-Erik Cortes
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
