[ https://issues.apache.org/jira/browse/SPARK-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26136.
----------------------------------
    Resolution: Invalid

For questions, please ask on the mailing list next time. When filing an issue, please make it as readable as possible.

> Row.getAs return null value in some condition
> ---------------------------------------------
>
>                 Key: SPARK-26136
>                 URL: https://issues.apache.org/jira/browse/SPARK-26136
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0, 2.3.2, 2.4.0
>         Environment: Windows 10
> JDK 1.8.0_181
> scala 2.11.12
> spark 2.4.0 / 2.3.2 / 2.3.0
>
>            Reporter: Charlie Feng
>            Priority: Major
>
> {{Row.getAs("fieldName")}} returns a null value when all of the conditions below are met:
> * It is used in {{DataFrame.flatMap()}}
> * There is another {{map()}} call inside {{flatMap}}
> * {{row.getAs("fieldName")}} is called inside a {{Tuple}}.
>
> Source code to reproduce the bug:
> {code}
> import org.apache.spark.sql.SparkSession
>
> object FlatMapGetAsBug {
>   def main(args: Array[String]) {
>     val spark = SparkSession.builder.appName("SparkUtil").master("local").getOrCreate
>     import spark.implicits._;
>     val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")
>     df.show();
>     val df2 = df.flatMap { row =>
>       row.getAs[String]("XYZ").split(",").map { xyz =>
>         var colA: String = row.getAs("A");
>         var col0: String = row.getString(0);
>         (row.getAs("A"), colA, row.getString(0), col0, row.getString(1), xyz)
>       }
>     }.toDF("ColumnA_API1", "ColumnA_API2", "ColumnA_API3", "ColumnA_API4", "ColumnB", "ColumnXYZ")
>     df2.show();
>     spark.close()
>   }
> }
> {code}
>
> Console Output:
> {code}
> +---+---+-----+
> |  A|  B|  XYZ|
> +---+---+-----+
> | a1| b1|x,y,z|
> +---+---+-----+
> +------------+------------+------------+------------+-------+---------+
> |ColumnA_API1|ColumnA_API2|ColumnA_API3|ColumnA_API4|ColumnB|ColumnXYZ|
> +------------+------------+------------+------------+-------+---------+
> |        null|          a1|          a1|          a1|     b1|        x|
> |        null|          a1|          a1|          a1|     b1|        y|
> |        null|          a1|          a1|          a1|     b1|        z|
> +------------+------------+------------+------------+-------+---------+
> {code}
>
> We try to get the "A" column using 4 approaches:
> 1. call {{row.getAs("A")}} inside a tuple
> 2. call {{row.getAs("A")}}, save the result into a variable "colA", and add the variable to the tuple
> 3. call {{row.getString(0)}} inside a tuple
> 4. call {{row.getString(0)}}, save the result into a variable "col0", and add the variable to the tuple
>
> We found that approaches 2~4 get the value "a1" successfully, but approach 1 gets "null".
>
> This issue exists in Spark 2.4.0/2.3.2/2.3.0.
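
A possible explanation, not confirmed anywhere in this thread: approach 1 is the only call where neither a type argument nor a declared variable type tells the compiler what {{getAs}} should return, so the type of that tuple slot is left to inference and may not end up as String. The sketch below is a trimmed version of the reporter's reproduction, under the assumption that passing the type argument explicitly ({{row.getAs[String]("A")}}) is enough to get the value. The object name FlatMapGetAsTyped and the output column names are hypothetical, and this has not been checked against the affected versions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical variant of the reporter's FlatMapGetAsBug object.
object FlatMapGetAsTyped {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("FlatMapGetAsTyped")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a1", "b1", "x,y,z")).toDF("A", "B", "XYZ")

    val df2 = df.flatMap { row =>
      row.getAs[String]("XYZ").split(",").map { xyz =>
        // Assumption: the explicit [String] fixes the tuple element type,
        // so nothing about this slot is left to type inference.
        (row.getAs[String]("A"), row.getString(1), xyz)
      }
    }.toDF("ColumnA", "ColumnB", "ColumnXYZ")

    df2.show()
    spark.stop()
  }
}
```

Approaches 2 to 4 in the report already pin the type in a similar way, either through the {{String}} annotation on the variable or through {{getString}}, which would be consistent with them returning "a1".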