kevinjqliu commented on issue #1255:
URL:
https://github.com/apache/iceberg-python/issues/1255#issuecomment-2442979795
> So I expect that the value of the second record other than
> "string_field_1" to be null when I insert these records into the iceberg table
> using pyiceberg.
```
import pyarrow as pa

schema = pa.schema(
    [
        pa.field("string_field_1", pa.string(), True),
        pa.field("int_field_1", pa.int32(), True),
        pa.field("float_field_1", pa.float32(), True),
        pa.field(
            "struct_field_1",
            pa.struct(
                [
                    pa.field("string_nested_1", pa.string()),
                    pa.field("int_item_2", pa.int32()),
                    pa.field("float_item_2", pa.float32()),
                ]
            ),
        ),
        pa.field("list_field_1", pa.list_(pa.string())),
        pa.field("list_field_2", pa.list_(pa.int32())),
        pa.field("list_field_3", pa.list_(pa.float32())),
        pa.field("map_field_1", pa.map_(pa.string(), pa.string())),
        pa.field("map_field_2", pa.map_(pa.string(), pa.int32())),
        pa.field("map_field_3", pa.map_(pa.string(), pa.float32())),
    ]
)

records = [
    {
        "string_field_1": "field_1",
        "int_field_1": 123,
        "float_field_1": 1.23,
        "struct_field_1": {
            "string_nested_1": "nest_1",
            "int_item_2": 1234,
            "float_item_2": 1.234,
        },
        "list_field_1": ["a", "b", "c"],
        "list_field_2": [1, 2, 3],
        "list_field_3": [0.1, 0.2, 0.3],
        "map_field_1": {"a": "b", "b": "c"},
        "map_field_2": {"a": 1, "b": 2},
        "map_field_3": {"a": 0.1, "b": 0.2},
    },
    {
        "string_field_1": "field_1_b",
    },
]

pyarrow_table: pa.Table = pa.Table.from_pylist(records, schema=schema)
print(pyarrow_table["struct_field_1"].to_pandas())
```
returns
```
0 {'string_nested_1': 'nest_1', 'int_item_2': 12...
1 None
Name: struct_field_1, dtype: object
```
which confirms the value is None in pyarrow.
After appending to the table, is the record None when you read it back?
`table.scan().to_pandas()`
> I then checked the table using AWS Athena, but the "struct_field_1" of the
> second record is not null.
It is not clear to me that Athena returns a non-null value in the example
you provided.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]