sunchao commented on code in PR #1:
URL: https://github.com/apache/arrow-datafusion-comet/pull/1#discussion_r1467039101


##########
common/src/main/java/org/apache/comet/parquet/AbstractColumnReader.java:
##########
@@ -0,0 +1,116 @@
+/*

Review Comment:
   Yes, when we started there were several things that were not yet ready in the Rust implementation, so we chose this hybrid approach. The Rust implementation has definitely become much more mature now, and we do want to switch to it at some point.
   
   I think we need to check what is still missing on the Rust side. Perhaps:
   - Parquet encryption support
   - Check [all the predicates](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/predicate/FilterApi.java) and see if they are supported (e.g., `in`/`notIn`?)
   - Dictionary pushdown? Maybe it is already supported.
   
   We also needed to do a bunch of Spark-specific things in our native Parquet reader. For instance, Spark has a timestamp/date rebase feature for conversions from the old Julian calendar to the Gregorian calendar, and it also reads small-precision decimals into `i32` or `i64` on the Java side, which requires special handling.
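
   To illustrate the decimal point, here is a minimal Rust sketch of the precision-based mapping a native reader would have to mirror: Spark stores decimals with precision <= 9 as 32-bit integers and precision <= 18 as 64-bit integers, and anything wider needs a larger representation. The enum and function names below are illustrative, not actual Comet APIs.

```rust
// Minimal sketch (not Comet's actual code) of mapping a decimal's precision
// to the physical representation Spark expects on the Java side.
// The thresholds follow Spark's Decimal.MAX_INT_DIGITS (9) and
// Decimal.MAX_LONG_DIGITS (18).
#[derive(Debug, PartialEq)]
enum DecimalRepr {
    Int32,
    Int64,
    Wide, // e.g. i128 / fixed-length byte array for larger precisions
}

fn decimal_repr(precision: u8) -> DecimalRepr {
    match precision {
        1..=9 => DecimalRepr::Int32,   // fits in an i32 / Java int
        10..=18 => DecimalRepr::Int64, // fits in an i64 / Java long
        _ => DecimalRepr::Wide,
    }
}

fn main() {
    assert_eq!(decimal_repr(7), DecimalRepr::Int32);  // e.g. DECIMAL(7, 2)
    assert_eq!(decimal_repr(15), DecimalRepr::Int64); // e.g. DECIMAL(15, 4)
    assert_eq!(decimal_repr(38), DecimalRepr::Wide);  // e.g. DECIMAL(38, 18)
}
```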


