[ 
https://issues.apache.org/jira/browse/SPARK-24712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Gaido resolved SPARK-24712.
---------------------------------
    Resolution: Not A Problem

> TrainValidationSplit ignores label column name and forces to be "label"
> -----------------------------------------------------------------------
>
>                 Key: SPARK-24712
>                 URL: https://issues.apache.org/jira/browse/SPARK-24712
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Pablo J. Villacorta
>            Priority: Major
>
> When a TrainValidationSplit is fit on a Pipeline containing a ML model, the 
> labelCol property of the model is ignored, and the call to fit() will fail 
> unless the labelCol equals "label". As an example, the following pyspark code 
> only works when the variable labelColumnĀ is set to "label"
> {code:java}
> from pyspark.sql.functions import rand, randn
> from pyspark.ml.regression import LinearRegression
> labelColumn = "target"  # CHANGE THIS TO "label" AND THE CODE WORKS
> df = spark.range(0, 10).select(rand(seed=10).alias("uniform"), 
> randn(seed=27).alias(labelColumn))
> vectorAssembler = 
> VectorAssembler().setInputCols(["uniform"]).setOutputCol("features")
> lr = LinearRegression().setFeaturesCol("features").setLabelCol(labelColumn)
> mypipeline = Pipeline(stages = [vectorAssembler, lr])
> paramGrid = ParamGridBuilder()\
> .addGrid(lr.regParam, [0.01, 0.1])\
> .build()
> trainValidationSplit = TrainValidationSplit()\
> .setEstimator(mypipeline)\
> .setEvaluator(RegressionEvaluator())\
> .setEstimatorParamMaps(paramGrid)\
> .setTrainRatio(0.8)
> trainValidationSplit.fit(df)  # FAIL UNLESS labelColumn IS SET TO "label"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to