rouault commented on PR #367: URL: https://github.com/apache/arrow-nanoarrow/pull/367#issuecomment-1902685209
Sorry for hijacking this thread, but I became aware of the introduction of those new Arrow data types because GDAL compilation broke against Arrow 15.0 (due to a switch() not handling the new cases). This will be addressed in https://github.com/OSGeo/gdal/pull/9116 in a minimalistic way, by erroring out on those types.

Do you know if those types will actually be found in serialized formats, namely Parquet and Feather? And I hope that I won't have to add support for them in OGRLayer::WriteArrowBatch()...

Their value proposition compared to regular string or binary is unclear to me. The only thing I see is that they might be a way to reduce memory usage by pointing to the same offset in case of duplicated strings, but wasn't that the purpose of dictionaries? (Actually, the same question would hold for the RUN_END_ENCODED stuff added in libarrow 12.)

As a relative newcomer to the Arrow ecosystem, I should point out that the proliferation of basic data types is going to be a serious obstacle to adoption by new implementations. Perhaps there should be some "data type negotiation" mechanism where a consumer could tell the producer something like "I only understand Int32, Float64 and String. Do your best to present me only those data types, possibly by morphing content to that".
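
To make the "data type negotiation" idea above a bit more concrete, here is a minimal sketch in plain Python. It is not an existing Arrow, nanoarrow or GDAL API: the `FALLBACKS` table, the `negotiate` function and the type names written as strings are all hypothetical and only illustrate the shape such a mechanism could take. The consumer passes the set of types it understands, and the producer picks, for each column, either the original type or the first fallback the consumer accepts.

```python
# Hypothetical fallback table (an assumption for illustration only): for each
# type a consumer may not understand, the simpler types a producer could
# convert it to, in order of preference.
FALLBACKS = {
    "string_view": ["string"],
    "binary_view": ["binary"],
    "run_end_encoded<string>": ["string"],
    "dictionary<string>": ["string"],
    "int16": ["int32", "int64", "float64"],
    "float32": ["float64"],
}


def negotiate(schema, supported):
    """Return {column name: type to present to the consumer}.

    schema    -- producer-side mapping of column name to type name
    supported -- set of type names the consumer says it understands
    """
    plan = {}
    for name, typ in schema.items():
        if typ in supported:
            # The consumer already understands this type: pass it through.
            plan[name] = typ
            continue
        for candidate in FALLBACKS.get(typ, []):
            if candidate in supported:
                # First acceptable fallback wins.
                plan[name] = candidate
                break
        else:
            # No fallback is acceptable: fail early instead of guessing.
            raise TypeError(
                f"column {name!r}: no supported representation for {typ}"
            )
    return plan


schema = {"id": "int16", "name": "string_view", "value": "float64"}
supported = {"int32", "float64", "string"}
plan = negotiate(schema, supported)
print(plan)  # {'id': 'int32', 'name': 'string', 'value': 'float64'}
```

In this sketch the producer only computes a conversion plan; actually morphing the column data (for example, materializing a string view array as a regular string array) would be a separate step on the producer side.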
