JonasJ-ap commented on code in PR #6997:
URL: https://github.com/apache/iceberg/pull/6997#discussion_r1131163621
##########
python/tests/conftest.py:
##########
@@ -116,6 +117,17 @@ def table_schema_simple() -> Schema:
)
+@pytest.fixture(scope="session")
+def pyarrow_schema_simple() -> pa.Schema:
Review Comment:
I applied the change discussed in
https://github.com/apache/iceberg/pull/6997#discussion_r1125317095, so I can now
do a round-trip unit test here. I also added several tests for
individual types.

To test against a real Parquet file, I downloaded one from my test table on AWS and
ran the following test:
```python
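import pyarrow.parquet as pq
from pyarrow.fs import LocalFileSystem

# Import paths assume the PyIceberg module layout used in this PR.
from pyiceberg.io.pyarrow import pyarrow_to_schema
from pyiceberg.schema import Schema

parquet_test_path = "/Users/jonasjiang/.CMVolumes/gluetestjonas/warehouse/iceberg_ref.db/nested_frame_unpartitioned_parquet/data/00002-17-e809caf4-2060-4319-b299-a2caf0dd133d-00001.parquet"
fs = LocalFileSystem()


def test_pyarrow_to_iceberg(path):
    with fs.open_input_file(path) as f:
        parquet_schema = pq.read_schema(f)
    # Convert the Arrow schema read from the file into an Iceberg schema.
    converted_iceberg_schema = pyarrow_to_schema(parquet_schema)
    # The Iceberg schema is stored as JSON in the file's key-value metadata.
    stored_iceberg_schema = Schema.parse_raw(parquet_schema.metadata[b"iceberg.schema"].decode())
    print(stored_iceberg_schema)
    assert converted_iceberg_schema == stored_iceberg_schema
    print("Verified")


test_pyarrow_to_iceberg(parquet_test_path)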
```
The output:
```python
table {
  1: id: required long
  2: longCol: required long
  3: decimalCol: optional decimal(10, 2)
  4: magic_number: required double
  5: dateCol: required date
  6: dateString: required string
  7: random1: optional long
  8: random2: optional long
  9: random3: optional long
  10: random4: optional long
  11: random5: optional long
  12: innerStruct1: required struct<23: random1: optional long, 24: random2: optional long>
  13: innerStruct2: required struct<25: random3: optional long, 26: random4: optional long>
  14: structCol1: required struct<27: innerStruct1: required struct<29: random1: optional long, 30: random2: optional long>, 28: innerStruct2: required struct<31: random3: optional long, 32: random4: optional long>>
  15: innerStruct3: required struct<33: col1: optional string, 34: col2: optional string>
  16: structCol2: required struct<35: innerStruct3: required struct<37: col1: optional string, 38: col2: optional string>, 36: col2: required struct<39: col1: optional string, 40: col2: optional string>>
  17: arrayCol: required list<long>
  18: arrayStructCol: required list<struct<43: random1: optional long, 44: random2: optional long>>
  19: mapCol1: required map<struct<47: innerStruct1: required struct<49: random1: optional long, 50: random2: optional long>, 48: innerStruct2: required struct<51: random3: optional long, 52: random4: optional long>>, struct<53: innerStruct3: required struct<55: col1: optional string, 56: col2: optional string>, 54: col2: required struct<57: col1: optional string, 58: col2: optional string>>>
  20: mapCol2: required map<long, string>
  21: mapCol3: required map<date, list<long>>
  22: structCol3: required struct<64: structCol2: required struct<67: innerStruct3: required struct<69: col1: optional string, 70: col2: optional string>, 68: col2: required struct<71: col1: optional string, 72: col2: optional string>>, 65: mapCol3: required map<date, list<long>>, 66: arrayCol: required list<long>>
}
Verified
```
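The stored side of the comparison above comes from the `iceberg.schema` entry in the Parquet key-value metadata, which holds the Iceberg schema as JSON. Below is a minimal, standard-library-only sketch of that lookup, plus a way to report the first differing top-level field if the equality assert ever fails. The helper names `stored_schema_fields` and `first_mismatch` and the two-field sample metadata are hypothetical. They only mimic the shape of `parquet_schema.metadata` and are not part of PyIceberg or of this PR.

```python
import json


def stored_schema_fields(metadata: dict) -> list:
    """Decode the JSON stored under b'iceberg.schema' and return its top-level fields."""
    raw = metadata.get(b"iceberg.schema")
    if raw is None:
        raise KeyError("file metadata has no b'iceberg.schema' entry")
    return json.loads(raw.decode("utf-8"))["fields"]


def first_mismatch(converted: list, stored: list):
    """Return (index, converted_field, stored_field) for the first difference, else None."""
    for i, (c, s) in enumerate(zip(converted, stored)):
        if c != s:
            return i, c, s
    if len(converted) != len(stored):
        # One side has extra trailing fields; report the first unmatched one.
        n = min(len(converted), len(stored))
        extra_c = converted[n] if n < len(converted) else None
        extra_s = stored[n] if n < len(stored) else None
        return n, extra_c, extra_s
    return None


# Hypothetical stand-in for parquet_schema.metadata (bytes keys, bytes values).
sample_metadata = {
    b"iceberg.schema": json.dumps(
        {
            "type": "struct",
            "schema-id": 0,
            "fields": [
                {"id": 1, "name": "id", "required": True, "type": "long"},
                {"id": 3, "name": "decimalCol", "required": False, "type": "decimal(10, 2)"},
            ],
        }
    ).encode("utf-8")
}
```

With the real file, the `metadata` argument would be `parquet_schema.metadata`, and the converted side would be the JSON form of `converted_iceberg_schema`. How that form is obtained depends on the PyIceberg version, so it is left out here.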
Please let me know if you want to see more related tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]