JonasJ-ap commented on code in PR #6997:
URL: https://github.com/apache/iceberg/pull/6997#discussion_r1131163621


##########
python/tests/conftest.py:
##########
@@ -116,6 +117,17 @@ def table_schema_simple() -> Schema:
     )
 
 
+@pytest.fixture(scope="session")
+def pyarrow_schema_simple() -> pa.Schema:

Review Comment:
   I applied the change discussed in 
https://github.com/apache/iceberg/pull/6997#discussion_r1125317095, so I can now 
do a round-trip unit test here. I also added several tests for individual 
types.
   
   To test against a real Parquet file, I downloaded one from my test table on 
AWS and ran the following test:
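   ```python
   import pyarrow.parquet as pq
   from pyarrow.fs import LocalFileSystem

   # pyarrow_to_schema is the function added in this PR
   # (assumed here to live in pyiceberg.io.pyarrow).
   from pyiceberg.io.pyarrow import pyarrow_to_schema
   from pyiceberg.schema import Schema

   parquet_test_path = "/Users/jonasjiang/.CMVolumes/gluetestjonas/warehouse/iceberg_ref.db/nested_frame_unpartitioned_parquet/data/00002-17-e809caf4-2060-4319-b299-a2caf0dd133d-00001.parquet"
   fs = LocalFileSystem()


   def test_pyarrow_to_iceberg(path):
       with fs.open_input_file(path) as f:
           # Schema derived from the Arrow schema of the Parquet file
           parquet_schema = pq.read_schema(f)
           converted_iceberg_schema = pyarrow_to_schema(parquet_schema)
           # Schema that the writer stored in the file's key-value metadata
           stored_iceberg_schema = Schema.parse_raw(parquet_schema.metadata[b"iceberg.schema"].decode())
           print(stored_iceberg_schema)
           assert converted_iceberg_schema == stored_iceberg_schema
           print("Verified")


   test_pyarrow_to_iceberg(parquet_test_path)
   ```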
   The output:
   ```python
   table {
     1: id: required long
     2: longCol: required long
     3: decimalCol: optional decimal(10, 2)
     4: magic_number: required double
     5: dateCol: required date
     6: dateString: required string
     7: random1: optional long
     8: random2: optional long
     9: random3: optional long
     10: random4: optional long
     11: random5: optional long
     12: innerStruct1: required struct<23: random1: optional long, 24: random2: optional long>
     13: innerStruct2: required struct<25: random3: optional long, 26: random4: optional long>
     14: structCol1: required struct<27: innerStruct1: required struct<29: random1: optional long, 30: random2: optional long>, 28: innerStruct2: required struct<31: random3: optional long, 32: random4: optional long>>
     15: innerStruct3: required struct<33: col1: optional string, 34: col2: optional string>
     16: structCol2: required struct<35: innerStruct3: required struct<37: col1: optional string, 38: col2: optional string>, 36: col2: required struct<39: col1: optional string, 40: col2: optional string>>
     17: arrayCol: required list<long>
     18: arrayStructCol: required list<struct<43: random1: optional long, 44: random2: optional long>>
     19: mapCol1: required map<struct<47: innerStruct1: required struct<49: random1: optional long, 50: random2: optional long>, 48: innerStruct2: required struct<51: random3: optional long, 52: random4: optional long>>, struct<53: innerStruct3: required struct<55: col1: optional string, 56: col2: optional string>, 54: col2: required struct<57: col1: optional string, 58: col2: optional string>>>
     20: mapCol2: required map<long, string>
     21: mapCol3: required map<date, list<long>>
     22: structCol3: required struct<64: structCol2: required struct<67: innerStruct3: required struct<69: col1: optional string, 70: col2: optional string>, 68: col2: required struct<71: col1: optional string, 72: col2: optional string>>, 65: mapCol3: required map<date, list<long>>, 66: arrayCol: required list<long>>
   }
   Verified
   ```
   Please let me know if you would like to see more related tests.
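
If the equality assertion in the test above ever fails, a bare `==` says little about where two deeply nested schemas diverge. A minimal sketch of a diagnostic helper follows. It is not part of the PR: `schema_diff` is a hypothetical name, and it only assumes that both objects print one field per line, as in the output above. Plain strings stand in for `Schema` objects in the example.

```python
import difflib


def schema_diff(expected: object, actual: object) -> str:
    """Return a unified diff of the printed forms of two schemas.

    Hypothetical helper: it relies only on str(), so it works for any
    object whose printed form lists one field per line. An empty string
    means the printed forms are identical.
    """
    expected_lines = str(expected).splitlines()
    actual_lines = str(actual).splitlines()
    return "\n".join(
        difflib.unified_diff(
            expected_lines, actual_lines, "stored", "converted", lineterm=""
        )
    )


# Plain strings stand in for Schema objects here; only str() is needed.
stored = "table {\n  1: id: required long\n  2: longCol: required long\n}"
converted = "table {\n  1: id: required long\n  2: longCol: optional long\n}"
print(schema_diff(stored, converted))
```

This is best used alongside the `==` assertion, for example as the assertion message, rather than instead of it: the printed form may omit details such as list element IDs, so identical printouts do not guarantee equal schemas.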



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]