alamb opened a new issue, #4813: URL: https://github.com/apache/arrow-rs/issues/4813
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

We struggle with the memory used by the `RowConverter` when interning values from `DictionaryArray`s. We are even proposing a special CardinalityAware wrapper on top of the `RowConverter` in DataFusion (see https://github.com/apache/arrow-datafusion/pull/7401).

At the moment, round tripping data from Array to Rows and then back to Array works like this:

```
DictionaryArray -- (preserve_dictionaries = false) --> Rows --> Primitive/StringArray
```

In DataFusion we must maintain the same input / output types, so in our proposed improvement we needed to add a call to `cast`, which @tustvold notes is likely very expensive: https://github.com/apache/arrow-datafusion/pull/7401/files#r1324281222

**Describe the solution you'd like**

I would like the `RowConverter` to produce the same output type as the input type on `SortField`, even if [preserve_dictionaries](https://docs.rs/arrow-row/46.0.0/arrow_row/struct.SortField.html#method.preserve_dictionaries) is set to false.

This would avoid a copy of the String data and likely perform much better.

**Describe alternatives you've considered**

We could potentially simply remove stateful row encoding: https://github.com/apache/arrow-rs/issues/4811

**Additional context**
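
To make the round trip described above concrete, here is a toy model of it using only the Rust standard library. It is a sketch under stated assumptions, not arrow-rs code: `ToyDictionary`, `to_rows`, `rows_to_strings`, and `cast_to_dictionary` are hypothetical names invented for illustration, and the real row format is more involved. It only shows why restoring the dictionary type afterwards means hashing and copying every string value again.

```rust
use std::collections::HashMap;

// Toy model only: these types and functions are hypothetical stand-ins,
// not arrow-rs APIs.

/// Dictionary-encoded string column: each key indexes into `values`.
#[derive(Debug, Clone, PartialEq)]
struct ToyDictionary {
    keys: Vec<usize>,
    values: Vec<String>,
}

/// Row encoding without dictionary preservation: every row stores the
/// full bytes of its value, so the dictionary structure is lost.
fn to_rows(dict: &ToyDictionary) -> Vec<Vec<u8>> {
    dict.keys
        .iter()
        .map(|&k| dict.values[k].as_bytes().to_vec())
        .collect()
}

/// Decoding the rows yields a plain string column, not a dictionary.
fn rows_to_strings(rows: &[Vec<u8>]) -> Vec<String> {
    rows.iter()
        .map(|r| String::from_utf8(r.clone()).expect("rows hold valid UTF-8"))
        .collect()
}

/// The extra "cast" step: rebuild a dictionary by hashing every value
/// again and copying each distinct string.
fn cast_to_dictionary(strings: &[String]) -> ToyDictionary {
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut values = Vec::new();
    let mut keys = Vec::with_capacity(strings.len());
    for s in strings {
        let key = *index.entry(s.as_str()).or_insert_with(|| {
            values.push(s.clone());
            values.len() - 1
        });
        keys.push(key);
    }
    ToyDictionary { keys, values }
}
```

In this model, getting back to the input type requires calling `cast_to_dictionary(&rows_to_strings(&to_rows(&dict)))`; the request in this issue amounts to having the converter hand back the dictionary type directly so that the final step is not needed.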
