[jira] [Created] (SPARK-24712) TrainValidationSplit ignores label column name and forces to be "label"

Pablo J. Villacorta (JIRA) Sun, 01 Jul 2018 11:49:47 -0700

Pablo J. Villacorta created SPARK-24712:
-------------------------------------------


             Summary: TrainValidationSplit ignores label column name and forces to be "label"
                 Key: SPARK-24712
                 URL: https://issues.apache.org/jira/browse/SPARK-24712
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.2.0
            Reporter: Pablo J. Villacorta


When a TrainValidationSplit is fit on a Pipeline containing an ML model, the 
labelCol property of the model is ignored, and the call to fit() fails 
unless labelCol equals "label". As an example, the following pyspark code 
only works when the variable labelColumn is set to "label":
{code:python}
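# Assumes `spark` is an existing SparkSession (e.g. the one in the pyspark shell)
from pyspark.sql.functions import rand, randn
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit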

labelColumn = "target"  # CHANGE THIS TO "label" AND THE CODE WORKS

df = spark.range(0, 10).select(rand(seed=10).alias("uniform"),
                               randn(seed=27).alias(labelColumn))
vectorAssembler = VectorAssembler().setInputCols(["uniform"]).setOutputCol("features")
lr = LinearRegression().setFeaturesCol("features").setLabelCol(labelColumn)
mypipeline = Pipeline(stages = [vectorAssembler, lr])

paramGrid = ParamGridBuilder()\
.addGrid(lr.regParam, [0.01, 0.1])\
.build()

trainValidationSplit = TrainValidationSplit()\
.setEstimator(mypipeline)\
.setEvaluator(RegressionEvaluator())\
.setEstimatorParamMaps(paramGrid)\
.setTrainRatio(0.8)

trainValidationSplit.fit(df)  # FAIL UNLESS labelColumn IS SET TO "label"
{code}



