SGA-taichi-kato commented on issue #1255:
URL: https://github.com/apache/iceberg-python/issues/1255#issuecomment-2442934911
Hi @kevinjqliu
Here is the PyArrow schema I define. My question concerns the struct field
"struct_field_1".
```python
from pyiceberg.catalog import load_catalog
import pyarrow as pa

schema = pa.schema(
    [
        pa.field("string_field_1", pa.string(), True),
        pa.field("int_field_1", pa.int32(), True),
        pa.field("float_field_1", pa.float32(), True),
        pa.field(
            "struct_field_1",
            pa.struct(
                [
                    pa.field("string_nested_1", pa.string()),
                    pa.field("int_item_2", pa.int32()),
                    pa.field("float_item_2", pa.float32()),
                ]
            ),
        ),
        pa.field("list_field_1", pa.list_(pa.string())),
        pa.field("list_field_2", pa.list_(pa.int32())),
        pa.field("list_field_3", pa.list_(pa.float32())),
        pa.field("map_field_1", pa.map_(pa.string(), pa.string())),
        pa.field("map_field_2", pa.map_(pa.string(), pa.int32())),
        pa.field("map_field_3", pa.map_(pa.string(), pa.float32())),
    ]
)
```
Next, I create two records. The second record has no value for any field
other than "string_field_1", so I expect all of its other fields to be null
when I insert these records into the Iceberg table using PyIceberg.
```python
records = [
    {
        "string_field_1": "field_1",
        "int_field_1": 123,
        "float_field_1": 1.23,
        "struct_field_1": {
            "string_nested_1": "nest_1",
            "int_item_2": 1234,
            "float_item_2": 1.234,
        },
        "list_field_1": ["a", "b", "c"],
        "list_field_2": [1, 2, 3],
        "list_field_3": [0.1, 0.2, 0.3],
        "map_field_1": {"a": "b", "b": "c"},
        "map_field_2": {"a": 1, "b": 2},
        "map_field_3": {"a": 0.1, "b": 0.2},
    },
    {
        "string_field_1": "field_1_b",
    },
]
```
Then I inserted the records above into an Iceberg table in the Glue catalog:
```python
from pyiceberg.exceptions import NoSuchTableError

catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "glue.region": "us-west-2",
        "s3.region": "us-west-2",
    },
)
table_name = "iceberg_test"
location = f"s3://tmp_bucket/test/iceberg/{table_name}"

# Drop the table left over from a previous run; drop_table raises if it is absent.
try:
    catalog.drop_table(f"test.{table_name}")
except NoSuchTableError:
    pass

table = catalog.create_table(
    f"test.{table_name}",
    schema,
    location=location,
)
pyarrow_table: pa.Table = pa.Table.from_pylist(records, schema=schema)
table.append(pyarrow_table)
```
I then queried the table with AWS Athena. The other columns of the second
record come back empty as expected, but "struct_field_1" is not null: it is a
struct whose fields hold an empty string, 0 and 0.0. Why does this happen, and
how can I avoid it?
```
"string_field_1","int_field_1","float_field_1","struct_field_1","list_field_1","list_field_2","list_field_3","map_field_1","map_field_2","map_field_3"
"field_1","123","1.23","{string_nested_1=nest_1, int_item_2=1234, float_item_2=1.234}","[a, b, c]","[1, 2, 3]","[0.1, 0.2, 0.3]","{a=b, b=c}","{a=1, b=2}","{a=0.1, b=0.2}"
"field_1_b",,,"{string_nested_1=, int_item_2=0, float_item_2=0.0}",,,,,,
```