Hi everyone,

I wanted to gauge interest in, and the feasibility of, adding support for natively reading an arrow::DictionaryArray from a Parquet file. Currently, an arrow::DictionaryArray written to Parquet is read back as the native index type [0]. I came across a prior discussion of this problem in the context of pandas [1], but I think this would be useful for other Arrow clients (C++ or otherwise).

The solution I had in mind would be to add arrow type information as column metadata. This metadata would then be used when reading back the parquet file to determine which arrow type to create for the column data.

I’m willing to contribute this feature but first wanted to get some feedback on whether this would be generally useful and if the high-level proposed solution would make sense.

Thanks!
Hatem

[0] This test demonstrates this behavior: https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/arrow-reader-writer-test.cc#L1848
[1] https://github.com/apache/arrow/issues/1688
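
P.S. To make the proposed solution a bit more concrete, here is a rough sketch of the idea in plain Python. It is a toy model, not Arrow or Parquet code: the metadata key `arrow_type`, the dict-based column layout, and the functions `write_column` and `read_column` are all hypothetical names invented for illustration. The only point is that the writer records a type hint next to the column, and the reader uses that hint to decide what to build.

```python
# Toy model only: none of these names exist in Arrow or Parquet.
ARROW_TYPE_KEY = "arrow_type"  # hypothetical column metadata key


def write_column(values):
    """Dictionary-encode values and record the logical type as column metadata."""
    dictionary = sorted(set(values))
    position = {value: i for i, value in enumerate(dictionary)}
    return {
        "metadata": {ARROW_TYPE_KEY: "dictionary"},
        "dictionary": dictionary,
        "indices": [position[value] for value in values],
    }


def read_column(column):
    """Return (kind, payload), choosing the in-memory form from the stored hint."""
    if column["metadata"].get(ARROW_TYPE_KEY) == "dictionary":
        # Hint present: keep indices and dictionary separate.
        return "dictionary", (column["indices"], column["dictionary"])
    # No hint: fall back to a plain, fully materialized column.
    dictionary = column["dictionary"]
    return "plain", [dictionary[i] for i in column["indices"]]


column = write_column(["b", "a", "b"])
kind, payload = read_column(column)
```

In this sketch a file written without the hint still reads fine, it just comes back in the plain form, which is the backward-compatibility property I would want from the real feature.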