tustvold commented on code in PR #9077:
URL: https://github.com/apache/arrow-rs/pull/9077#discussion_r2662937623


##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -819,7 +821,15 @@ impl ArrowColumnWriter {
     pub fn write(&mut self, col: &ArrowLeafColumn) -> Result<()> {
         match &mut self.writer {
             ArrowColumnWriterImpl::Column(c) => {
-                write_leaf(c, &col.0)?;
+                let leaf = col.0.array();
+                match leaf.as_any_dictionary_opt() {
+                    Some(dictionary) => {
+                        let materialized =
+                            arrow_select::take::take(dictionary.values(), dictionary.keys(), None)?;

Review Comment:
   That is correct; there is likely a fair amount of low-hanging fruit in 
optimising the parquet writer.
   
   Edit: Although it does appear there is something to handle dictionary arrays 
of ByteArray; let me check I didn't break that.
   
   Edit edit: Looks like there is a specialized path for ByteArray, including 
dictionary-encoded ByteArray, written by... me... - 
https://github.com/apache/arrow-rs/pull/2221 🤦 
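
   For context on what the `take` call in the diff does: materializing a dictionary array replaces each key with the value it indexes. Below is a minimal sketch in plain Rust, using `Vec` and `Option` as stand-ins for Arrow arrays. The function name `materialize` is hypothetical and not part of arrow-rs, and the sketch ignores null values inside the dictionary itself.

```rust
/// Hypothetical stand-in for dictionary materialization (not arrow-rs API).
/// Each key is an index into `values`; `None` models a null key.
/// Panics if a key is out of bounds for `values`.
fn materialize<T: Clone>(values: &[T], keys: &[Option<usize>]) -> Vec<Option<T>> {
    keys.iter()
        // Look up each non-null key and copy the value it points to.
        .map(|&key| key.map(|k| values[k].clone()))
        .collect()
}

fn main() {
    let values = vec!["a", "b", "c"];
    let keys = vec![Some(0), Some(2), None, Some(0)];
    let out = materialize(&values, &keys);
    println!("{:?}", out);
}
```

   The output has one entry per key, so repeated values are copied out in full; that expansion is the cost under discussion here.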


