GitHub user marktab reopened a pull request:

    https://github.com/apache/spark/pull/19152

    [SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps Missing

    @dongjoon-hyun @HyukjinKwon
    
    Error in PySpark example code:
    /examples/src/main/python/ml/estimator_transformer_param_example.py
    
    The original Scala code reads:
    println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)
    
    Here the parent is the lr estimator. PySpark provides no equivalent way to access the parent estimator's param map from the fitted model, so the example's print statements do not work as written.
    
    The corrected code has been tested in Python and returns values consistent with Scala.
    
    ## What changes were proposed in this pull request?
    
    Print the param map via the lr estimator variable instead of via model1 or model2.
    
    ## How was this patch tested?
    
    This patch was tested with Spark 2.1.0 by comparing the Scala and PySpark results. At present, PySpark prints nothing for those two print lines.
    
    The output for model2 in PySpark should be
    
    {Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', 
doc='the convergence tolerance for iterative algorithms (>= 0).'): 1e-06,
    Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 1]. 
For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 
penalty.'): 0.0,
    Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='predictionCol', doc='prediction column name.'): 'prediction',
    Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', 
doc='features column name.'): 'features',
    Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', 
doc='label column name.'): 'label',
    Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='probabilityCol', doc='Column name for predicted class conditional 
probabilities. Note: Not all models output well-calibrated probability 
estimates! These probabilities should be treated as confidences, not precise 
probabilities.'): 'myProbability',
    Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column 
name.'): 'rawPrediction',
    Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', 
doc='The name of family which is a description of the label distribution to be 
used in the model. Supported options: auto, binomial, multinomial'): 'auto',
    Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='fitIntercept', doc='whether to fit an intercept term.'): True,
    Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', 
doc='Threshold in binary classification prediction, in range [0, 1]. If 
threshold and thresholds are both set, they must match.e.g. if threshold is p, 
then thresholds must be equal to [1-p, p].'): 0.55,
    Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2,
    Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', 
doc='max number of iterations (>= 0).'): 30,
    Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', 
doc='regularization parameter (>= 0).'): 0.1,
    Param(parent='LogisticRegression_4187be538f744d5a9090', 
name='standardization', doc='whether to standardize the training features 
before fitting the model.'): True}
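    The proposed approach can be sketched as a minimal, self-contained version of the example. The training data and param values below mirror estimator_transformer_param_example.py; printing via the lr variable (rather than model1.parent or model2.parent) is the change being proposed:
    
    ```python
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors
    
    spark = SparkSession.builder.master("local[1]").appName("ParamMapExample").getOrCreate()
    
    # Training data taken from the official example file.
    training = spark.createDataFrame([
        (1.0, Vectors.dense([0.0, 1.1, 0.1])),
        (0.0, Vectors.dense([2.0, 1.0, -1.0])),
        (0.0, Vectors.dense([2.0, 1.3, 1.0])),
        (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])
    
    lr = LogisticRegression(maxIter=10, regParam=0.01)
    model1 = lr.fit(training)
    
    # Proposed change: print the param map from the lr estimator directly,
    # since model1.parent.extractParamMap is not available in PySpark.
    print("Model 1 was fit using parameters: ")
    print(lr.extractParamMap())
    
    # Overridden params, as in the example (maxIter=30, regParam=0.1, ...).
    paramMap = {lr.maxIter: 20}
    paramMap[lr.maxIter] = 30  # overwrites the previous value
    paramMap.update({lr.regParam: 0.1, lr.threshold: 0.55})
    paramMapCombined = paramMap.copy()
    paramMapCombined.update({lr.probabilityCol: "myProbability"})
    
    model2 = lr.fit(training, paramMapCombined)
    print("Model 2 was fit using parameters: ")
    print(lr.extractParamMap(paramMapCombined))
    
    spark.stop()
    ```
    
    extractParamMap(extra) merges the supplied param map over the estimator's defaults, which yields the combined values shown in the output above (maxIter 30, regParam 0.1, threshold 0.55, probabilityCol 'myProbability').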
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marktab/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19152
    
----
commit a2ccb8a83d13d39c95f0accccc1cac1c74dca064
Author: MarkTab marktab.net <mark...@users.noreply.github.com>
Date:   2017-09-07T02:20:59Z

    Model 1 and Model 2 ParamMaps Missing
    
    @dongjoon-hyun @HyukjinKwon
    
    Error in PySpark example code:
    
[https://github.com/apache/spark/blob/master/examples/src/main/python/ml/estimator_transformer_param_example.py]
    
    The original Scala code reads:
    println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)
    
    Here the parent is the lr estimator. PySpark provides no equivalent way to access the parent estimator's param map from the fitted model, so the example's print statements do not work as written.
    
    The corrected code has been tested in Python and returns values consistent with Scala.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
