bharath-techie commented on PR #6921: URL: https://github.com/apache/arrow-rs/pull/6921#issuecomment-2692201208
Hi @XiangpengHao, I realize this is still a work in progress, but we are seeing extremely slow IO pushdown queries while running a POC with DataFusion / arrow-rs for Parquet reads. [In some cases, for example a query for status = 200 or status = 400 over 100 million rows of application logs, it is 8x slower than FilterExec.]

So I took your changes, applied them on arrow-rs 52.1.0, and did a round of testing with DataFusion 45.0.

```
if self.decoders.contains_key(&encoding) {
    return Err(general_err!("Column cannot have more than one dictionary"));
}
```

All batch reads seem to end up at this line in the column decoder (src/column/reader/decoder.rs) and therefore fail with this error.

So I just wanted to check: are there other PRs or changes planned for this PR before it gets merged?
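For readers unfamiliar with this guard, here is a minimal sketch of one way it could fire. It assumes the dictionary for the same column chunk is handed to the same decoder twice without a reset in between. That is only a guess at the cause, not a confirmed diagnosis. `ColumnDecoderModel`, the trimmed `Encoding` enum, and `read_dictionary_twice` are hypothetical stand-ins written for illustration and are not the arrow-rs types.

```rust
use std::collections::HashMap;

// Trimmed, hypothetical stand-in for the parquet `Encoding` enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Encoding {
    Plain,
    PlainDictionary,
    RleDictionary,
}

// Hypothetical model of a column value decoder; not the real arrow-rs type.
#[derive(Default)]
struct ColumnDecoderModel {
    // One entry per encoding; the value here is just the dictionary contents.
    decoders: HashMap<Encoding, Vec<i32>>,
}

impl ColumnDecoderModel {
    fn set_dict(&mut self, dict: Vec<i32>, mut encoding: Encoding) -> Result<(), String> {
        // Assumption: dictionary page encodings are normalized to a single key.
        if matches!(encoding, Encoding::Plain | Encoding::PlainDictionary) {
            encoding = Encoding::RleDictionary;
        }
        // Same shape as the guard quoted in the comment above.
        if self.decoders.contains_key(&encoding) {
            return Err("Column cannot have more than one dictionary".to_string());
        }
        self.decoders.insert(encoding, dict);
        Ok(())
    }
}

// Feeds the same dictionary to one decoder twice and returns both results.
fn read_dictionary_twice() -> (Result<(), String>, Result<(), String>) {
    let mut decoder = ColumnDecoderModel::default();
    let first = decoder.set_dict(vec![200, 400, 500], Encoding::PlainDictionary);
    let second = decoder.set_dict(vec![200, 400, 500], Encoding::PlainDictionary);
    (first, second)
}
```

In this model the first call succeeds and the second returns the error, so anything that re-reads a dictionary page into a decoder that already holds one would produce the same message.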