wgtmac commented on issue #2: URL: https://github.com/apache/iceberg-cpp/issues/2#issuecomment-2496086676
Thanks @zeroshade for the detail! The table below is the type mapping between Iceberg and Arrow.

I think we can provide a wrapper around Arrow data types so that only a subset of them is used. On the read path, the mapping is pretty clear except for String/LargeString/Binary/LargeBinary. We can use String/Binary by default unless explicitly configured otherwise. On the write path, we can simply error out for unsupported Arrow types.

Just want to add that the ongoing Iceberg `variant` and `geometry` types will not be an issue: parquet-cpp will implement them anyway because they are part of the Parquet spec. Therefore I don't think there is a compelling reason not to use `arrow::DataType` directly.

| iceberg | arrow |
|---------|-------|
| unknown | Null |
| boolean | Boolean |
| int | Int32 |
| long | Int64 |
| float | Float32 |
| double | Float64 |
| decimal(P,S) | Decimal(P,S) |
| date | Date32 |
| time | Time64 |
| timestamp | Timestamp(MICRO) |
| timestamptz | Timestamp(MICRO,UTC) |
| timestamp_ns | Timestamp(NANO) |
| timestamptz_ns | Timestamp(NANO,UTC) |
| string | String/LargeString |
| uuid | UUID canonical extension type |
| fixed(L) | FixedSizeBinary(L) |
| binary | Binary/LargeBinary |
| struct | Struct |
| list | List/LargeList |
| map | Map |
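
To make the read-path default and the write-path rejection described above concrete, here is a minimal sketch. It is not the iceberg-cpp or Arrow API: `ReadOptions`, `use_large_types`, `ToArrowTypeName`, and `ToIcebergTypeName` are hypothetical names, and types are modeled as plain strings taken from the table so the example needs only the standard library. Parameterized and nested types (decimal, fixed, struct, list, map) are left out.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical read option: choose the "Large" (64-bit offset) Arrow variants.
struct ReadOptions {
  bool use_large_types = false;  // default is String/Binary, as proposed above
};

// Read path: Iceberg primitive type name -> Arrow type name (illustrative strings).
std::string ToArrowTypeName(const std::string& iceberg_type,
                            ReadOptions options = ReadOptions{}) {
  // The only ambiguous primitives: resolved by configuration.
  if (iceberg_type == "string") {
    return options.use_large_types ? "LargeString" : "String";
  }
  if (iceberg_type == "binary") {
    return options.use_large_types ? "LargeBinary" : "Binary";
  }
  // Everything else in the table has exactly one Arrow counterpart.
  static const std::unordered_map<std::string, std::string> kOneToOne = {
      {"unknown", "Null"},
      {"boolean", "Boolean"},
      {"int", "Int32"},
      {"long", "Int64"},
      {"float", "Float32"},
      {"double", "Float64"},
      {"date", "Date32"},
      {"time", "Time64"},
      {"timestamp", "Timestamp(MICRO)"},
      {"timestamptz", "Timestamp(MICRO,UTC)"},
      {"timestamp_ns", "Timestamp(NANO)"},
      {"timestamptz_ns", "Timestamp(NANO,UTC)"},
  };
  auto it = kOneToOne.find(iceberg_type);
  if (it == kOneToOne.end()) {
    throw std::invalid_argument("not covered by this sketch: " + iceberg_type);
  }
  return it->second;
}

// Write path: Arrow type name -> Iceberg type name; anything outside the
// supported subset is rejected with an error.
std::string ToIcebergTypeName(const std::string& arrow_type) {
  static const std::unordered_map<std::string, std::string> kSupported = {
      {"Null", "unknown"},
      {"Boolean", "boolean"},
      {"Int32", "int"},
      {"Int64", "long"},
      {"Float32", "float"},
      {"Float64", "double"},
      {"Date32", "date"},
      {"Time64", "time"},
      {"String", "string"},
      {"LargeString", "string"},  // both string variants collapse to one type
      {"Binary", "binary"},
      {"LargeBinary", "binary"},  // likewise for binary
  };
  auto it = kSupported.find(arrow_type);
  if (it == kSupported.end()) {
    throw std::invalid_argument("unsupported Arrow type: " + arrow_type);
  }
  return it->second;
}
```

With this shape, `ToArrowTypeName("string")` yields `String` unless the caller opts in to large types, while an Arrow type outside the table, such as `UInt8`, makes `ToIcebergTypeName` throw instead of being silently coerced.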
