Wes McKinney created ARROW-1681:
-----------------------------------

             Summary: [Python] Error writing with nulls in lists
                 Key: ARROW-1681
                 URL: https://issues.apache.org/jira/browse/ARROW-1681
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.7.1
            Reporter: Wes McKinney
             Fix For: 0.8.0

Created from https://github.com/apache/arrow/issues/1208

Hi,

Not sure if this is related to or the same as ARROW-1584, but I can't seem to find a way to handle arrays of lists which occasionally consist of empty lists only.

To reproduce:

{code}
import pyarrow as pa

na = []  # None, [""]
arrays = {
    'c1': pa.array([["test"], na, na], type=pa.list_(pa.string())),
    'c2': pa.array([na, na, na], type=pa.list_(pa.string())),
}
rb = pa.RecordBatch.from_arrays(list(arrays.values()), list(arrays.keys()))
df = rb.to_pandas()

pa.serialize_pandas(df)
# > ArrowNotImplementedError: Unable to convert type: null

tbl = pa.Table.from_pandas(df)
sink = pa.BufferOutputStream()
writer = pa.RecordBatchFileWriter(sink, tbl.schema)
writer.write_table(tbl)
# > ArrowNotImplementedError: Unable to convert type: null
{code}

In my use case I'm processing data in batches where individual fields contain lists of strings. Some of the batches may, however, contain empty lists only, and there doesn't seem to be any representation in Arrow at the moment that deals with this situation. Also, since I'm serializing the batches into a single file/stream, their schemas need to be consistent, which is why I tried explicitly specifying the type of the array as list_(string).

The only workaround I've found is to replace empty lists with [""], but that implies lots of unnecessary glue code on the client side. Is there a better workaround until this is fixed in an official conda release?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
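
For reference, the [""] workaround mentioned in the report might look roughly like the sketch below. It is plain Python and does not touch pyarrow; the names PLACEHOLDER, pad_empty_lists and unpad_lists are hypothetical helpers made up for illustration, and the sketch assumes that a list containing only "" never occurs as real data.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: [""] never appears as a genuine value, so it can stand in for "empty".
PLACEHOLDER = ""


def pad_empty_lists(column: Sequence[Optional[List[str]]]) -> List[List[str]]:
    """Replace empty (or None) entries with [PLACEHOLDER] before conversion."""
    return [list(value) if value else [PLACEHOLDER] for value in column]


def unpad_lists(column: Sequence[List[str]]) -> List[List[str]]:
    """Undo pad_empty_lists after reading the data back.

    Note that entries that were originally None come back as [].
    """
    return [[] if list(value) == [PLACEHOLDER] else list(value) for value in column]


# Same shape as the reproduction: one column with a value, one with empty lists only.
batch: Dict[str, List[List[str]]] = {
    "c1": [["test"], [], []],
    "c2": [[], [], []],
}

padded = {name: pad_empty_lists(values) for name, values in batch.items()}
restored = {name: unpad_lists(values) for name, values in padded.items()}
```

Each padded column could then be passed to pa.array(values, type=pa.list_(pa.string())) as in the reproduction. Whether that avoids the error on a given pyarrow version has not been checked here.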