[ 
https://issues.apache.org/jira/browse/SPARK-13721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464427#comment-15464427
 ] 

Ewan Leith commented on SPARK-13721:
------------------------------------

Assuming Don's use case is the same as ours, we have to write odd-looking 
queries like the pseudo-code below to get the full set of entries when using 
explode on records where the nested array is not always populated (the 
.filter calls are there to make explicit what is happening):

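import org.apache.spark.sql.functions.{col, explode, lit}

// Rows whose nested array is present: one output row per array element.
val df1 = df
  .filter("column.nested_array is not null")
  .withColumn("element", explode(col("column.nested_array")))
  .select("other_column", "element")

// Rows whose nested array is null: keep them with a placeholder element.
// select() cannot mix String and Column arguments, so both are Columns here.
val df2 = df
  .filter("column.nested_array is null")
  .select(col("other_column"), lit("").as("element"))

df1.unionAll(df2)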
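
To make the intended result concrete, here is a minimal sketch of the "outer 
explode" semantics on plain Scala collections. It is only a model, not Spark 
API: Record, OuterExplodeSketch, explodeOuter and placeholder are hypothetical 
names made up for illustration. Unlike the two-branch query above, which only 
covers null arrays, this sketch also keeps records whose array is empty, which 
is the case the issue description mentions for Hive's LATERAL VIEW OUTER.

```scala
// Hypothetical record type standing in for a DataFrame row; not part of Spark.
// None models a null nested array.
final case class Record(otherColumn: String, nestedArray: Option[Seq[String]])

object OuterExplodeSketch {
  // Emit one (otherColumn, element) pair per array element. A record whose
  // array is null (None) or empty still yields exactly one pair, carrying
  // the placeholder, so no outer record is dropped.
  def explodeOuter(records: Seq[Record],
                   placeholder: String = ""): Seq[(String, String)] =
    records.flatMap { r =>
      val elements = r.nestedArray.getOrElse(Seq.empty)
      if (elements.isEmpty) Seq(r.otherColumn -> placeholder)
      else elements.map(e => r.otherColumn -> e)
    }
}
```

The empty-string placeholder mirrors the lit("") used in df2 above; Hive 
itself would return NULL for the exploded column in that position.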



> Add support for LATERAL VIEW OUTER explode()
> --------------------------------------------
>
>                 Key: SPARK-13721
>                 URL: https://issues.apache.org/jira/browse/SPARK-13721
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Ian Hellstrom
>
> Hive supports the [LATERAL VIEW 
> OUTER|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#LanguageManualLateralView-OuterLateralViews]
>  syntax to make sure that when an array is empty, the content from the outer 
> table is still returned. 
> Within Spark, this is currently only possible by executing HiveQL 
> statements in a HiveContext. It would be nice if the standard explode() 
> DataFrame method allowed the same. A possible signature would be: 
> {code:scala}
> explode[A, B](inputColumn: String, outputColumn: String, outer: Boolean = false)
> {code}


