AlenkaF commented on issue #34739: URL: https://github.com/apache/arrow/issues/34739#issuecomment-1485064943
The problem in the example that doesn't work is that a `NullArray` is created (in [table.add_column](https://github.com/apache/arrow/blob/main/python/pyarrow/table.pxi#L4562-L4565)) because the only element in the column being appended is `None`. A `NullArray` has type `pa.null()`, not `pa.string()`, so we get an `ArrowInvalid` error:

```python
>>> pa.chunked_array([["x"]])
<pyarrow.lib.ChunkedArray object at 0x11672b600>
[
  [
    "x"
  ]
]
>>> pa.chunked_array([["x"]]).chunk(0)
<pyarrow.lib.StringArray object at 0x11671ae60>
[
  "x"
]
>>> pa.chunked_array([[None]])
<pyarrow.lib.ChunkedArray object at 0x11672b740>
[
1 nulls
]
>>> pa.chunked_array([[None]]).chunk(0)
<pyarrow.lib.NullArray object at 0x11671ae60>
1 nulls
```

That will not happen if the column has more than one row and not all of its elements are missing:

```python
>>> pa.chunked_array([[None, "x"]]).chunk(0)
<pyarrow.lib.StringArray object at 0x11671af80>
[
  null,
  "x"
]
```

```python
import pyarrow as pa

table = pa.Table.from_pylist([{"a": None}, {"a": "first"}], pa.schema([pa.field("a", pa.string(), nullable=True)]))
table = table.append_column(pa.field("b", pa.string(), nullable=True), [["x", "y"]])
table = table.append_column(pa.field("n", pa.string(), nullable=True), [[None, "second"]])

table
# pyarrow.Table
# a: string
# b: string
# n: string
# ----
# a: [[null,"first"]]
# b: [["x","y"]]
# n: [[null,"second"]]

table.schema.field("n")
# pyarrow.Field<n: string>

table.schema.field("n").nullable
# True
```