[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994007#comment-14994007 ]
Joseph K. Bradley commented on SPARK-11478:
-------------------------------------------

{quote}it is difficult to get the "nullable" value of specific column before it generated{quote}
--> Is this true? What is an example? I could imagine this happening in the future but cannot think of an example at this time.

For now, does it work to change toStructField to set nullable to true? All of the UDFs which create Double fields apparently set nullable = true by default (because of how ScalaReflection works).

In the long term, it'd be nice to have everything be an Option (allowing an unknown state).

> ML StringIndexer return inconsistent schema
> -------------------------------------------
>
>                 Key: SPARK-11478
>                 URL: https://issues.apache.org/jira/browse/SPARK-11478
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Yanbo Liang
>
> ML StringIndexer transform and transformSchema return inconsistent schemas.
> {code}
> import org.apache.spark.ml.feature.StringIndexer
>
> val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
> val df = sqlContext.createDataFrame(data).toDF("id", "label")
> val indexer = new StringIndexer()
>   .setInputCol("label")
>   .setOutputCol("labelIndex")
>   .fit(df)
> val transformed = indexer.transform(df)
> println(transformed.schema.toString())
> println(indexer.transformSchema(df.schema))
> // The nullable flag of "labelIndex" differs between the two printed schemas:
> // StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
> // StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
> {code}
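
To illustrate the suggestion in the comment (have the predicted schema report nullable = true so that it agrees with what transform produces), here is a minimal sketch. It does not touch toStructField itself; the helper name withNullable and the hand-built schema are hypothetical and only stand in for the output of transformSchema. It uses the public StructType / StructField case classes from org.apache.spark.sql.types and assumes Spark is on the classpath.

```scala
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Hypothetical helper (not part of Spark): return a copy of `schema`
// in which the column `colName` is marked nullable = true.
def withNullable(schema: StructType, colName: String): StructType =
  StructType(schema.fields.map {
    case f if f.name == colName => f.copy(nullable = true)
    case f                      => f
  })

// Hand-built stand-in for the schema that transformSchema reports in the issue.
val predicted = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("label", StringType, nullable = true),
  StructField("labelIndex", DoubleType, nullable = false)))

val relaxed = withNullable(predicted, "labelIndex")

// Compare only the nullable flag of the output column before and after.
println(predicted("labelIndex").nullable)
println(relaxed("labelIndex").nullable)
```

Only the nullable flag is compared here because the metadata attached to the output column may differ between the two schemas, so a full equality check on the StructType could fail for unrelated reasons.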