Re: [PR] Experimental parquet decoder with first-class selection pushdown support [arrow-rs]

via GitHub Sat, 01 Mar 2025 05:22:12 -0800


bharath-techie commented on PR #6921:
URL: https://github.com/apache/arrow-rs/pull/6921#issuecomment-2692201208


   Hi @XiangpengHao,
   I'm sure this is still a work in progress.
   
   But we're seeing extremely slow queries with IO pushdown while doing a POC 
of Parquet reads with DataFusion / arrow-rs. In some cases (for example, 
querying `status = 200` or `status = 400` over 100 million rows of application 
logs), pushdown is 8x slower than `FilterExec`.
   
   So I took your changes, applied them to arrow-rs 52.1.0, and ran a round of 
tests with DataFusion 45.0.
   
   ```rust
   if self.decoders.contains_key(&encoding) {
       return Err(general_err!("Column cannot have more than one dictionary"));
   }
   ```
   All batch reads seem to hit this check in the column decoder 
(src/column/reader/decoder.rs) and fail with this error.
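   To make the failure mode concrete, here is a minimal standalone model of 
that guard. It is a sketch under stated assumptions: `DecoderSketch`, its 
`Encoding` enum, and `set_dict` are hypothetical stand-ins, not the real 
arrow-rs types. The idea is that the decoder keeps one dictionary per 
(normalized) encoding, so handing it a second dictionary page for the same 
column errors out. Re-feeding a column chunk's dictionary to a decoder that was 
not reset is one plausible trigger, not a confirmed cause.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical, simplified stand-in for parquet's `Encoding` enum.
   #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
   enum Encoding {
       Plain,
       PlainDictionary,
       RleDictionary,
   }

   /// Hypothetical model of a column value decoder holding one decoder per encoding.
   struct DecoderSketch {
       decoders: HashMap<Encoding, Vec<u8>>,
   }

   impl DecoderSketch {
       fn new() -> Self {
           Self { decoders: HashMap::new() }
       }

       /// Registers a dictionary page; a second dictionary for the same column is rejected.
       fn set_dict(&mut self, dict: Vec<u8>, mut encoding: Encoding) -> Result<(), String> {
           // Dictionary pages encoded as PLAIN / PLAIN_DICTIONARY are treated as
           // RLE_DICTIONARY, so every dictionary lands under the same map key.
           if encoding == Encoding::Plain || encoding == Encoding::PlainDictionary {
               encoding = Encoding::RleDictionary;
           }
           if self.decoders.contains_key(&encoding) {
               return Err("Column cannot have more than one dictionary".to_string());
           }
           self.decoders.insert(encoding, dict);
           Ok(())
       }
   }

   fn main() {
       let mut decoder = DecoderSketch::new();
       // First dictionary page for the column chunk: accepted.
       assert!(decoder.set_dict(vec![1, 2, 3], Encoding::PlainDictionary).is_ok());
       // Feeding a dictionary again to the same, un-reset decoder is rejected.
       let err = decoder.set_dict(vec![1, 2, 3], Encoding::PlainDictionary).unwrap_err();
       println!("{err}");
   }
   ```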
   
   So I just want to check: are there other planned PRs or changes pending for 
this PR before it gets merged?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

