tustvold commented on code in PR #9077:
URL: https://github.com/apache/arrow-rs/pull/9077#discussion_r2662937623
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -819,7 +821,15 @@ impl ArrowColumnWriter {
pub fn write(&mut self, col: &ArrowLeafColumn) -> Result<()> {
match &mut self.writer {
ArrowColumnWriterImpl::Column(c) => {
- write_leaf(c, &col.0)?;
+ let leaf = col.0.array();
+ match leaf.as_any_dictionary_opt() {
+ Some(dictionary) => {
+ let materialized =
+ arrow_select::take::take(dictionary.values(), dictionary.keys(), None)?;
Review Comment:
That is correct; there is likely a fair amount of low-hanging fruit in
optimising the parquet writer.
Edit: although it does appear there is existing code to handle dictionary
arrays of byte arrays, so let me check I didn't break that.
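For context on the hunk above: `take(dictionary.values(), dictionary.keys(), None)` materializes a dictionary array by looking up each row's key in the values array. The std-only Rust sketch below models that gather step under simplifying assumptions: the dictionary is a `Vec<String>` of distinct values plus a `Vec<Option<usize>>` of per-row keys, and both `SimpleDictionary` and `materialize` are hypothetical names, not arrow-rs APIs. It also hints at why this path leaves room for optimisation, since every repeated key copies the value bytes again.

```rust
/// Hypothetical, simplified model of a dictionary-encoded string column.
/// NOT the arrow-rs `DictionaryArray` API: `values` holds the distinct
/// entries, `keys` holds one index into `values` per row (None = null row).
struct SimpleDictionary {
    values: Vec<String>,
    keys: Vec<Option<usize>>,
}

/// Gather `values[key]` for every row, producing one owned value per row.
/// This mirrors conceptually what `take(values, keys, None)` does.
/// In this sketch an out-of-range key panics via slice indexing.
fn materialize(dict: &SimpleDictionary) -> Vec<Option<String>> {
    dict.keys
        .iter()
        .map(|key| key.map(|i| dict.values[i].clone()))
        .collect()
}

fn main() {
    let dict = SimpleDictionary {
        values: vec!["apple".to_string(), "banana".to_string()],
        keys: vec![Some(0), Some(1), Some(0), None, Some(0)],
    };
    let rows = materialize(&dict);

    // Bytes stored once in the dictionary vs. bytes after materialization:
    // each repeated key duplicates its value.
    let dict_bytes: usize = dict.values.iter().map(|v| v.len()).sum();
    let materialized_bytes: usize = rows.iter().flatten().map(|v| v.len()).sum();

    println!("{:?}", rows);
    println!("dictionary bytes: {dict_bytes}, materialized bytes: {materialized_bytes}");
}
```

With the sample data the dictionary holds 11 value bytes while the materialized rows hold 21, which is the kind of redundant copying a writer could avoid by consuming dictionary arrays directly.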
--
This is an automated message from the Apache Git Service.