alamb opened a new issue, #4813:
URL: https://github.com/apache/arrow-rs/issues/4813

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   We struggle with the memory used by the RowConverter when interning values 
from `DictionaryArrays`. We are even proposing a special CardinalityAware 
wrapper on top of the RowConverter in DataFusion (see 
https://github.com/apache/arrow-datafusion/pull/7401).
   
   At the moment, round tripping data from Array to Rows and then back to Array 
works like this:
   ```
   DictionaryArray -- (preserve_dictionaries = false) --> Rows --> Primitive/StringArray
   ```
   
   In DataFusion we must maintain the same input / output types, so in our 
proposed improvement we needed to add a call to `cast`, which @tustvold  notes 
is likely very expensive:  
https://github.com/apache/arrow-datafusion/pull/7401/files#r1324281222
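
   To make the shape of the problem concrete, here is a toy model of the round trip in plain Rust. It is only a sketch and does not use the arrow-rs API: `ToyDictionary`, `to_rows`, `rows_to_strings` and `strings_to_dictionary` are hypothetical names invented for illustration, and the real row format is more involved. The point is that decoding hydrated rows yields a plain string column, so getting back to the input type needs an extra pass that copies and re-interns every value.

```rust
use std::collections::HashMap;

/// Toy stand-in for a dictionary-encoded string column:
/// each key is an index into `values`. (Hypothetical type, not arrow-rs.)
#[derive(Debug, Clone, PartialEq)]
struct ToyDictionary {
    keys: Vec<usize>,
    values: Vec<String>,
}

/// Toy stand-in for row encoding with `preserve_dictionaries = false`:
/// every row stores the hydrated value bytes rather than a dictionary key.
fn to_rows(dict: &ToyDictionary) -> Vec<Vec<u8>> {
    dict.keys
        .iter()
        .map(|&k| dict.values[k].as_bytes().to_vec())
        .collect()
}

/// Decoding the rows gives a plain string column; the dictionary is gone.
fn rows_to_strings(rows: &[Vec<u8>]) -> Vec<String> {
    rows.iter()
        .map(|r| String::from_utf8(r.clone()).expect("rows hold valid UTF-8"))
        .collect()
}

/// Stand-in for the extra `cast` step: rebuild a dictionary by hashing and
/// re-interning every decoded value, copying each distinct string again.
fn strings_to_dictionary(strings: &[String]) -> ToyDictionary {
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut values = Vec::new();
    let mut keys = Vec::with_capacity(strings.len());
    for s in strings {
        let next = values.len();
        let key = *index.entry(s.as_str()).or_insert_with(|| {
            values.push(s.clone());
            next
        });
        keys.push(key);
    }
    ToyDictionary { keys, values }
}

fn main() {
    let input = ToyDictionary {
        keys: vec![0, 1, 0, 0],
        values: vec!["a".to_string(), "b".to_string()],
    };

    // Array -> Rows -> plain strings: the output type differs from the input.
    let decoded = rows_to_strings(&to_rows(&input));

    // Restoring the input type requires a second pass over all values.
    let output = strings_to_dictionary(&decoded);
    assert_eq!(output, input);
}
```

   In this model the string bytes are copied once into the rows, once when decoding, and once more when rebuilding the dictionary, which is the kind of overhead this request hopes to avoid.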
   
   **Describe the solution you'd like**
   I would like the `RowConverter` to produce the same output type as the input 
type specified on `SortField`, even if 
[preserve_dictionaries](https://docs.rs/arrow-row/46.0.0/arrow_row/struct.SortField.html#method.preserve_dictionaries)
 is set to false.
   
   This would avoid a copy of the String data and likely perform much better. 
   
   **Describe alternatives you've considered**
   We could potentially simply remove stateful row encoding: 
https://github.com/apache/arrow-rs/issues/4811
   
   **Additional context**
   

