[ https://issues.apache.org/jira/browse/SPARK-30144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-30144.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26838
[https://github.com/apache/spark/pull/26838]

> MLP param map missing
> ---------------------
>
>                 Key: SPARK-30144
>                 URL: https://issues.apache.org/jira/browse/SPARK-30144
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.4
>            Reporter: Glen-Erik Cortes
>            Assignee: Huaxin Gao
>            Priority: Minor
>              Labels: release_notes
>             Fix For: 3.0.0
>
>         Attachments: MLP_params_missing.ipynb, data_banknote_authentication.csv
>
>
> Param maps for fitted classifiers are available with all classifiers except
> for the MultilayerPerceptronClassifier.
>
> There is no way to track or know what parameters were best during a
> crossvalidation or which parameters were used for submodels.
>
> {code:java}
> {
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='featuresCol', doc='features column name'): 'features',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='labelCol', doc='label column name'): 'fake_banknote',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='predictionCol', doc='prediction column name'): 'prediction',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities'): 'probability',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name'): 'rawPrediction'}{code}
>
> GBTClassifier for example shows all parameters:
>
> {code:java}
> {
> Param(parent='GBTClassifier_a0e77b3430aa', name='cacheNodeIds', doc='If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees.'): False,
> Param(parent='GBTClassifier_a0e77b3430aa', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext'): 10,
> Param(parent='GBTClassifier_a0e77b3430aa', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].'): 'all',
> Param(parent='GBTClassifier_a0e77b3430aa', name='featuresCol', doc='features column name'): 'features',
> Param(parent='GBTClassifier_a0e77b3430aa', name='labelCol', doc='label column name'): 'fake_banknote',
> Param(parent='GBTClassifier_a0e77b3430aa', name='lossType', doc='Loss function which GBT tries to minimize (case-insensitive). Supported options: logistic'): 'logistic',
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxBins', doc='Max number of bins for discretizing continuous features. Must be >=2 and >= number of categories for any categorical feature.'): 8,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxDepth', doc='Maximum depth of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.'): 5,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxIter', doc='maximum number of iterations (>= 0)'): 20,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxMemoryInMB', doc='Maximum memory in MB allocated to histogram aggregation.'): 256,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInfoGain', doc='Minimum information gain for a split to be considered at a tree node.'): 0.0,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInstancesPerNode', doc='Minimum number of instances each child must have after split. If a split causes the left or right child to have fewer than minInstancesPerNode, the split will be discarded as invalid. Should be >= 1.'): 1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='predictionCol', doc='prediction column name'): 'prediction',
> Param(parent='GBTClassifier_a0e77b3430aa', name='seed', doc='random seed'): 1234,
> Param(parent='GBTClassifier_a0e77b3430aa', name='stepSize', doc='Step size (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each estimator.'): 0.1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='subsamplingRate', doc='Fraction of the training data used for learning each decision tree, in range (0, 1].'): 1.0}{code}
>
> See attached ipynb or example notebook here:
> [https://colab.research.google.com/drive/1lwSHioZKlLh96FhGkdYFe6FUuRfTcSxH]
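
One way to make the gap described in the issue explicit is to compare the param names on the estimator with those on the fitted model. The following is a minimal sketch, not part of the original report or of the fix in pull request 26838. The helper name `missing_param_names` is hypothetical, the `Param` namedtuple is only a stand-in for `pyspark.ml.param.Param` so the snippet runs without Spark, and the sample maps are illustrative data rather than output from a real run.

```python
from collections import namedtuple

# Stand-in for pyspark.ml.param.Param (assumption: only the .name attribute
# is needed here), so the sketch runs without a Spark installation.
Param = namedtuple("Param", ["parent", "name", "doc"])


def missing_param_names(estimator_map, model_map):
    """Return the sorted names of params present in estimator_map
    but absent from model_map (hypothetical helper)."""
    estimator_names = {p.name for p in estimator_map}
    model_names = {p.name for p in model_map}
    return sorted(estimator_names - model_names)


# Illustrative data only, not taken from a real run.
estimator_map = {
    Param("est", "featuresCol", "features column name"): "features",
    Param("est", "labelCol", "label column name"): "fake_banknote",
    Param("est", "maxIter", "maximum number of iterations (>= 0)"): 100,
}
model_map = {
    Param("model", "featuresCol", "features column name"): "features",
    Param("model", "labelCol", "label column name"): "fake_banknote",
}

print(missing_param_names(estimator_map, model_map))
```

With PySpark, the same helper could presumably be called on the dictionaries returned by `extractParamMap()` on an estimator and on its fitted model, since the keys of those dictionaries are `Param` objects with a `name` attribute.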