[ https://issues.apache.org/jira/browse/SPARK-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charlie Feng updated SPARK-26136:
---------------------------------
    Component/s: Spark Core

> Row.getAs returns null value in some conditions
> -----------------------------------------------
>
>                 Key: SPARK-26136
>                 URL: https://issues.apache.org/jira/browse/SPARK-26136
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0, 2.3.2, 2.4.0
>         Environment: Windows 10
> JDK 1.8.0_181
> Scala 2.11.12
> Spark 2.4.0 / 2.3.2 / 2.3.0
>            Reporter: Charlie Feng
>            Priority: Major
>
> Row.getAs("fieldName") returns a null value when all of the conditions below are met:
> * it is used in DataFrame.flatMap()
> * there is another map() call inside the flatMap
> * row.getAs("fieldName") is called inside a tuple.
>
> *Source code to reproduce the bug:*
>
> import org.apache.spark.sql.SparkSession
>
> object FlatMapGetAsBug {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder.appName("SparkUtil").master("local").getOrCreate
>     import spark.implicits._
>     val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")
>     df.show()
>     val df2 = df.flatMap(row => row.getAs[String]("XYZ").split(",")
>       .map(xyz => {
>         // Approaches 2 and 4: read the field into a local first, then put the local in the tuple.
>         val colA: String = row.getAs("A")
>         val col0: String = row.getString(0)
>         // Approach 1, row.getAs("A") called directly inside the tuple, is the one that yields null.
>         (row.getAs("A"), colA, row.getString(0), col0, row.getString(1), xyz)
>       })).toDF("ColumnA_API1", "ColumnA_API2", "ColumnA_API3", "ColumnA_API4",
>         "ColumnB", "ColumnXYZ")
>     df2.show()
>     spark.close()
>   }
> }
>
> *Console Output:*
>
> +---+---+-----+
> |  A|  B|  XYZ|
> +---+---+-----+
> | a1| b1|x,y,z|
> +---+---+-----+
>
> +------------+------------+------------+------------+-------+---------+
> |ColumnA_API1|ColumnA_API2|ColumnA_API3|ColumnA_API4|ColumnB|ColumnXYZ|
> +------------+------------+------------+------------+-------+---------+
> |        null|          a1|          a1|          a1|     b1|        x|
> |        null|          a1|          a1|          a1|     b1|        y|
> |        null|          a1|          a1|          a1|     b1|        z|
> +------------+------------+------------+------------+-------+---------+
>
> We try to read column "A" with four approaches:
> 1) call row.getAs("A") directly inside the tuple
> 2) call row.getAs("A"), save the result into a variable "colA", and put the variable in the tuple
> 3) call row.getString(0) directly inside the tuple
> 4) call row.getString(0), save the result into a variable "col0", and put the variable in the tuple
> Approaches 2-4 return the value "a1" as expected, but approach 1 returns null.
> This issue exists in Spark 2.4.0, 2.3.2 and 2.3.0.
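The report itself points at a workaround: approaches 2-4 all succeed, so reading the field into a local value before building the tuple (or before the inner map() at all) avoids the null. Below is a minimal sketch of that pattern, reusing the DataFrame shape from the reproduction above; the object name FlatMapGetAsWorkaround and the reduced three-column output are illustrative additions, not part of the original report.

import org.apache.spark.sql.SparkSession

object FlatMapGetAsWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SparkUtil").master("local").getOrCreate
    import spark.implicits._

    val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")

    val df2 = df.flatMap { row =>
      // Read the fields into locals once, before the nested map(),
      // instead of calling row.getAs inside the tuple (approach 2 in the report).
      val colA = row.getAs[String]("A")
      val colB = row.getAs[String]("B")
      row.getAs[String]("XYZ").split(",").map(xyz => (colA, colB, xyz))
    }.toDF("ColumnA", "ColumnB", "ColumnXYZ")

    df2.show()  // ColumnA comes back as "a1" rather than null
    spark.close()
  }
}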