[jira] [Updated] (SPARK-30144) MLP param map missing

Huaxin Gao (Jira) Thu, 26 Dec 2019 17:21:46 -0800


     [ https://issues.apache.org/jira/browse/SPARK-30144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Huaxin Gao updated SPARK-30144:
-------------------------------
    Docs Text: 
From 3.0, MultilayerPerceptronClassificationModel extends 
MultilayerPerceptronParams to expose the training params. As a result, the 
layers member of MultilayerPerceptronClassificationModel has been changed from 
Array[Int] to IntArrayParam. Users should call 
MultilayerPerceptronClassificationModel.getLayers instead of 
MultilayerPerceptronClassificationModel.layers to retrieve the layer sizes. 

> MLP param map missing
> ---------------------
>
>                 Key: SPARK-30144
>                 URL: https://issues.apache.org/jira/browse/SPARK-30144
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.4
>            Reporter: Glen-Erik Cortes
>            Priority: Minor
>              Labels: release_notes
>         Attachments: MLP_params_missing.ipynb, data_banknote_authentication.csv
>
>
> Param maps for fitted models are available for all classifiers except 
> MultilayerPerceptronClassifier.
>   
>  There is no way to track which parameters were best during a 
> cross-validation or which parameters were used for submodels.
>   
> {code:java}
> {
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='featuresCol', doc='features column name'): 'features',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='labelCol', doc='label column name'): 'fake_banknote',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='predictionCol', doc='prediction column name'): 'prediction',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities'): 'probability',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name'): 'rawPrediction'}{code}
>  
>  GBTClassifier, for example, shows all parameters:
>   
> {code:java}
> {
> Param(parent='GBTClassifier_a0e77b3430aa', name='cacheNodeIds', doc='If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees.'): False,
> Param(parent='GBTClassifier_a0e77b3430aa', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext'): 10,
> Param(parent='GBTClassifier_a0e77b3430aa', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].'): 'all',
> Param(parent='GBTClassifier_a0e77b3430aa', name='featuresCol', doc='features column name'): 'features',
> Param(parent='GBTClassifier_a0e77b3430aa', name='labelCol', doc='label column name'): 'fake_banknote',
> Param(parent='GBTClassifier_a0e77b3430aa', name='lossType', doc='Loss function which GBT tries to minimize (case-insensitive). Supported options: logistic'): 'logistic',
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxBins', doc='Max number of bins for discretizing continuous features. Must be >=2 and >= number of categories for any categorical feature.'): 8,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxDepth', doc='Maximum depth of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.'): 5,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxIter', doc='maximum number of iterations (>= 0)'): 20,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxMemoryInMB', doc='Maximum memory in MB allocated to histogram aggregation.'): 256,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInfoGain', doc='Minimum information gain for a split to be considered at a tree node.'): 0.0,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInstancesPerNode', doc='Minimum number of instances each child must have after split. If a split causes the left or right child to have fewer than minInstancesPerNode, the split will be discarded as invalid. Should be >= 1.'): 1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='predictionCol', doc='prediction column name'): 'prediction',
> Param(parent='GBTClassifier_a0e77b3430aa', name='seed', doc='random seed'): 1234,
> Param(parent='GBTClassifier_a0e77b3430aa', name='stepSize', doc='Step size (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each estimator.'): 0.1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='subsamplingRate', doc='Fraction of the training data used for learning each decision tree, in range (0, 1].'): 1.0}{code}
>  
> See the attached .ipynb notebook or the example notebook here:
> [https://colab.research.google.com/drive/1lwSHioZKlLh96FhGkdYFe6FUuRfTcSxH]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
