[ https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464427#comment-15464427 ]
Ewan Leith commented on SPARK-13721:
------------------------------------

Assuming Don's use case is the same as ours, we have to write odd-looking queries like the pseudo-code below to get the full set of entries when using explode on records where the nested array is not always populated (the .filter calls are only there to make explicit what's happening):

    import org.apache.spark.sql.functions.{col, explode, lit}

    // Rows that have a nested array: one output row per array element.
    val df1 = df
      .filter("column.nested_array is not null")
      .withColumn("element", explode(col("column.nested_array")))
      .select("other_column", "element")

    // Rows without a nested array: keep them, with a placeholder element.
    val df2 = df
      .filter("column.nested_array is null")
      .select(col("other_column"), lit("") as "element")

    df1.unionAll(df2)

> Add support for LATERAL VIEW OUTER explode()
> --------------------------------------------
>
>                 Key: SPARK-13721
>                 URL: https://issues.apache.org/jira/browse/SPARK-13721
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Ian Hellstrom
>
> Hive supports the [LATERAL VIEW OUTER|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#LanguageManualLateralView-OuterLateralViews] syntax to make sure that when an array is empty, the content from the outer table is still returned.
> Within Spark, this is currently only possible within the HiveContext and executing HiveQL statements. It would be nice if the standard explode() DataFrame method allows the same. A possible signature would be:
> {code:scala}
> explode[A, B](inputColumn: String, outputColumn: String, outer: Boolean = false)
> {code}
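
To make the difference between a plain explode and the requested outer variant concrete, here is a minimal sketch using ordinary Scala collections rather than Spark. Record, OuterExplodeSketch, explode and explodeOuter are hypothetical names introduced only for this illustration and are not Spark APIs. The sketch assumes that a missing array and an empty array should both yield a single row with no element, which is one reading of the Hive behaviour described in the issue.

```scala
// Hypothetical stand-in for a DataFrame row; not part of Spark.
final case class Record(otherColumn: String, nestedArray: Option[Seq[String]])

object OuterExplodeSketch {
  // Plain explode: a record with a missing or empty array contributes no rows.
  def explode(records: Seq[Record]): Seq[(String, String)] =
    for {
      r       <- records
      element <- r.nestedArray.getOrElse(Seq.empty)
    } yield (r.otherColumn, element)

  // Outer explode: a record with a missing or empty array still contributes
  // exactly one row, with None in place of the element.
  def explodeOuter(records: Seq[Record]): Seq[(String, Option[String])] =
    records.flatMap { r =>
      r.nestedArray.filter(_.nonEmpty) match {
        case Some(elements) => elements.map(e => (r.otherColumn, Option(e)))
        case None           => Seq((r.otherColumn, Option.empty[String]))
      }
    }
}
```

Given records "a" (two elements), "b" (no array) and "c" (empty array), explode returns rows for "a" only, while explodeOuter also returns one row each for "b" and "c". The filter-and-union workaround above reproduces this for the missing-array case by substituting an empty string for the absent element.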