Wes McKinney created ARROW-1681:
-----------------------------------

             Summary: [Python] Error writing with nulls in lists
                 Key: ARROW-1681
                 URL: https://issues.apache.org/jira/browse/ARROW-1681
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.7.1
            Reporter: Wes McKinney
             Fix For: 0.8.0


Created from https://github.com/apache/arrow/issues/1208

Hi,
Not sure if this is related to or the same as ARROW-1584, but I can't seem to 
find a way to handle arrays of lists which occasionally consist of empty lists 
only.

To reproduce:

{code}
import pyarrow as pa

na = [] # None, [""]

arrays = {
    'c1': pa.array([["test"], na, na], type=pa.list_(pa.string())),
    'c2': pa.array([na, na, na], type=pa.list_(pa.string())),
}

rb = pa.RecordBatch.from_arrays(list(arrays.values()), list(arrays.keys()))
df = rb.to_pandas()

pa.serialize_pandas(df)
# > ArrowNotImplementedError: Unable to convert type: null

tbl = pa.Table.from_pandas(df)
sink = pa.BufferOutputStream()
writer = pa.RecordBatchFileWriter(sink, tbl.schema)
writer.write_table(tbl)
# > ArrowNotImplementedError: Unable to convert type: null
{code}

In my use case I'm processing data in batches where individual fields contain 
lists of strings. Some of the batches may, however, contain empty lists only, 
and there doesn't seem to be any representation in Arrow at the moment to deal 
with this situation.

Also, since I'm serializing the batches into a single file/stream, their 
schemas need to be consistent, which is why I tried explicitly specifying the 
type of the array as list_(string). The only workaround I've found is to 
replace empty lists with [""], but that implies lots of unnecessary glue code 
on the client side. Is there a better workaround until this is fixed in an 
official conda release?
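
For reference, the [""] workaround mentioned above could look roughly like the 
sketch below. The helper names (pad_empty_lists, strip_placeholders) are 
hypothetical and not part of pyarrow; they only operate on plain Python lists 
before the data is handed to pa.array, and they assume that [""] never occurs 
as a real value in the data.

```python
# Hypothetical client-side glue code; none of these names exist in pyarrow.
PLACEHOLDER = [""]


def pad_empty_lists(column):
    """Replace each empty list in a column with a copy of the [""] placeholder."""
    return [list(PLACEHOLDER) if value == [] else value for value in column]


def strip_placeholders(column):
    """Reverse pad_empty_lists after the data has been read back."""
    return [[] if value == PLACEHOLDER else value for value in column]


# Same shape as the 'c1' and 'c2' columns in the reproduction above.
c1 = pad_empty_lists([["test"], [], []])
c2 = pad_empty_lists([[], [], []])
```

The padded columns would then be passed to pa.array with 
type=pa.list_(pa.string()) as in the reproduction. The obvious drawback is that 
a genuine [""] entry can no longer be told apart from an empty list after the 
round trip.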
