[ https://issues.apache.org/jira/browse/SPARK-32796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
L. C. Hsieh resolved SPARK-32796.
---------------------------------
    Resolution: Won't Fix

> Make withField API support nested struct in array
> -------------------------------------------------
>
>                 Key: SPARK-32796
>                 URL: https://issues.apache.org/jira/browse/SPARK-32796
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Currently {{Column.withField}} only supports {{StructType}}. It does not
> support structs nested inside an {{ArrayType}}. Supporting {{ArrayType}}
> would make the API more general and useful.
> It could significantly simplify how we add or replace deeply nested fields.
> For example,
> {code}
> private lazy val arrayType = ArrayType(
>   StructType(Seq(
>     StructField("a", IntegerType, nullable = false),
>     StructField("b", IntegerType, nullable = true),
>     StructField("c", IntegerType, nullable = false))),
>   containsNull = true)
>
> private lazy val arrayStructArrayLevel1: DataFrame = spark.createDataFrame(
>   sparkContext.parallelize(Row(Array(Row(Array(Row(1, null, 3)), null, 3))) :: Nil),
>   StructType(
>     Seq(StructField("a", ArrayType(
>       StructType(Seq(
>         StructField("a", arrayType, nullable = false),
>         StructField("b", IntegerType, nullable = true),
>         StructField("c", IntegerType, nullable = false))),
>       containsNull = false)))))
> {code}
> The data looks like:
> {code}
> +---------------------------+
> |a                          |
> +---------------------------+
> |[{[{1, null, 3}], null, 3}]|
> +---------------------------+
> {code}
> Suppose we want to replace the deeply nested {{b}} field, so that the data becomes:
> {code}
> +------------------------+
> |a                       |
> +------------------------+
> |[{[{1, 2, 3}], null, 3}]|
> +------------------------+
> {code}
> With the current API, combining {{transform}} and {{withField}}, we would probably need:
> {code}
> arrayStructArrayLevel1.withColumn("a",
>   transform($"a", _.withField("a",
>     flatten(transform($"a.a", transform(_, _.withField("b", lit(2))))))))
> {code}
> With the modified {{withField}}, this becomes:
> {code}
> arrayStructArrayLevel1.withColumn("a", $"a".withField("a.b", lit(2)))
> {code}
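
Since this issue was resolved as Won't Fix, combining transform with withField remains the available route. As a minimal sketch of that pattern for the simpler single-level case (an array of structs, not the two-level data above), and assuming Spark 3.1 or later with a local session: the object name, the helper names replaceB and sample, and the column name arr are hypothetical and not taken from the issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, transform}

// Hypothetical illustration; none of these names come from the issue.
object WithFieldInArray {

  // Sample data (assumption): one row whose column `arr` is an array
  // holding a single struct {a: 1, b: null, c: 3}.
  def sample(spark: SparkSession): DataFrame =
    spark.sql(
      "SELECT array(named_struct('a', 1, 'b', CAST(NULL AS INT), 'c', 3)) AS arr")

  // transform visits every element of the array; withField then replaces
  // field `b` inside each struct element.
  def replaceB(df: DataFrame): DataFrame =
    df.withColumn("arr", transform(col("arr"), elem => elem.withField("b", lit(2))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("withField-in-array")
      .getOrCreate()
    try {
      replaceB(sample(spark)).show(false)
    } finally {
      spark.stop()
    }
  }
}
```

Each additional level of array nesting needs another transform wrapped around the withField call, which is the verbosity the proposed dotted-path form ("a.b") was meant to remove.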