[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987807#comment-14987807 ]
Apache Spark commented on SPARK-11478:
--------------------------------------

User 'rekhajoshm' has created a pull request for this issue:
https://github.com/apache/spark/pull/9440

> ML StringIndexer return inconsistent schema
> -------------------------------------------
>
>                 Key: SPARK-11478
>                 URL: https://issues.apache.org/jira/browse/SPARK-11478
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Yanbo Liang
>
> ML StringIndexer transform and transformSchema return inconsistent schemas.
> {code}
> // spark-shell session: sc and sqlContext are predefined
> import org.apache.spark.ml.feature.StringIndexer
>
> val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
> val df = sqlContext.createDataFrame(data).toDF("id", "label")
> val indexer = new StringIndexer()
>   .setInputCol("label")
>   .setOutputCol("labelIndex")
>   .fit(df)
> val transformed = indexer.transform(df)
> println(transformed.schema.toString())
> println(indexer.transformSchema(df.schema))
> {code}
> The nullable flag of "labelIndex" is inconsistent: the schema of the transformed DataFrame (first line) reports nullable = true, while transformSchema (second line) reports nullable = false:
> {code}
> StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
> StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
> {code}
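
For anyone trying to reproduce the report, the mismatch between the two schemas can be located field by field instead of by comparing the printed StructType strings by eye. The following is a minimal sketch, not part of the issue or of the pull request: the helper name schemaDiff is hypothetical, and it relies only on the public StructType/StructField accessors (fields, name, dataType, nullable).

```scala
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not a Spark API): lists every field of `actual` whose
// data type or nullable flag differs from the same-named field in `declared`,
// or which is missing from `declared` altogether.
def schemaDiff(actual: StructType, declared: StructType): Seq[String] = {
  val declaredByName = declared.fields.map(f => f.name -> f).toMap
  actual.fields.toSeq.flatMap { a =>
    declaredByName.get(a.name) match {
      case None =>
        Some(s"${a.name}: missing from declared schema")
      case Some(d) if d.dataType != a.dataType || d.nullable != a.nullable =>
        Some(s"${a.name}: actual (${a.dataType}, nullable=${a.nullable}) " +
          s"vs declared (${d.dataType}, nullable=${d.nullable})")
      case _ =>
        None // field agrees in both schemas
    }
  }
}

// Usage in the same spark-shell session as the reproduction in the issue:
//   schemaDiff(transformed.schema, indexer.transformSchema(df.schema)).foreach(println)
```

With the two schemas quoted in the issue, only "labelIndex" would be expected to show up in the result, since "id" and "label" agree in both type and nullability.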