[ 
https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555329#comment-17555329
 ] 

Nick Crews edited comment on ARROW-12099 at 6/17/22 12:13 AM:
--------------------------------------------------------------

Small tweak to Guido's implementation (thank you for this!): if the ListArray 
or MapArray column being exploded is the table's only column, the original 
implementation crashes.

This version handles that case:
{code:python}
import pyarrow as pa
import pyarrow.compute as pc


def explode_table(table, column):
    # Replace null lists with [None] so each null row still yields one output row.
    null_filled = pc.fill_null(table[column], [None])
    flattened = pc.list_flatten(null_filled)
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    if len(other_columns) == 0:
        # No other columns to repeat: the flattened values are the whole result.
        return pa.table({column: flattened})
    else:
        # Repeat each remaining row once per element of its list.
        indices = pc.list_parent_indices(null_filled)
        result = table.select(other_columns).take(indices)
        result = result.append_column(
            pa.field(column, table.schema.field(column).type.value_type),
            flattened,
        )
        return result
{code}
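
A minimal usage sketch, in case it helps anyone trying this out. The function body repeats the code above so the snippet runs on its own; the column names "id" and "tags" and the sample values are made up for illustration, and this assumes a pyarrow version where fill_null accepts list-typed columns.

```python
import pyarrow as pa
import pyarrow.compute as pc


def explode_table(table, column):
    # Same logic as the function above, repeated so this snippet is self-contained.
    null_filled = pc.fill_null(table[column], [None])
    flattened = pc.list_flatten(null_filled)
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    if len(other_columns) == 0:
        return pa.table({column: flattened})
    indices = pc.list_parent_indices(null_filled)
    result = table.select(other_columns).take(indices)
    return result.append_column(
        pa.field(column, table.schema.field(column).type.value_type),
        flattened,
    )


# Hypothetical sample data: one plain column and one list column with a null row.
table = pa.table({
    "id": pa.array([1, 2, 3], type=pa.int64()),
    "tags": pa.array([["a", "b"], None, ["c"]], type=pa.list_(pa.string())),
})

# General case: "id" is repeated once per list element, the null row becomes None.
exploded = explode_table(table, "tags")
print(exploded.to_pydict())
# {'id': [1, 1, 2, 3], 'tags': ['a', 'b', None, 'c']}

# Single-column case that this tweak is meant to handle.
single = explode_table(table.select(["tags"]), "tags")
print(single.to_pydict())
# {'tags': ['a', 'b', None, 'c']}
```

The null row is kept as a single None value because of the fill_null step; without it, list_flatten would drop that row entirely.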


was (Author: JIRAUSER291113):
Small tweak to Guido's implementation (thank you for this!): If the table only 
has the one ListArray or MapArray column, then it crashes.

This handles that case:
{code:python}
import pyarrow as pa
import pyarrow.compute as pc
def explode_table(table, column):
    null_filled = pc.fill_null(table[column], [None])
    flattened = pc.list_flatten(null_filled)
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    if len(other_columns) == 0:
        return pa.table({column: flattened})
    else:
        indices = pc.list_parent_indices(null_filled)
        result = table.select(other_columns).take(indices)
        result = result.append_column(
            pa.field(column, table.schema.field(column).type.value_type),
            flattened,
        )
        return result
{code}

> [Python] Explode array column
> -----------------------------
>
>                 Key: ARROW-12099
>                 URL: https://issues.apache.org/jira/browse/ARROW-12099
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Malthe Borch
>            Priority: Major
>
> In Apache Spark, 
> [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] 
> separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top-level only (not recursively).
> This would also work with the existing 
> [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten]
>  method to allow fully unnesting a 
> [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
