sunchao commented on PR #1: URL: https://github.com/apache/arrow-datafusion-comet/pull/1#issuecomment-1911097541
Thanks @alamb, really appreciated!

> I wonder if you have a public roadmap about where you hope to take this project?

We don't have one yet. Internally we do have a roadmap under `doc`, but it was removed in this PR. We can add it back after the initial PR.

> As I understand it the next step is to perform the IP clearance process ...

That's great! I'll check how it was done for other projects and let you know if I need any help with it.

> There appears to be another implementation of parquet in java as well as in rust.

Yes, the Comet Parquet reader is a hybrid implementation: the IO part is done in Java, while decoding (to Arrow) and decompression are done in native code. This is based on the assumption that we won't gain much performance by moving the IO part to native. By keeping it in Java, we can leverage storage connectors such as S3 and HDFS, which are already quite mature, as well as Parquet features that are still missing on the native side, like [encryption support](https://github.com/apache/arrow-rs/issues/3511).

That said, at some point we do want to switch to a fully native Parquet reader like the one in DF. This could help simplify a lot of the logic we currently have.

> There is a set of kernels (e.g. `core/src/execution/kernels/strings.rs`) that seems somewhat similar to what is in arrow-rs and datafusion

Yes, I think we should be able to switch to the ones in DF now. These were added a long time ago, when some of the string kernels in DF didn't yet support dictionary arrays, which is no longer the case.

> The [docs](https://github.com/apache/arrow-datafusion-comet/blob/comet-upstream/README.md) imply there is codegen for filters, but I didn't find any reference to that in the code

This is something we want to do in Comet, but we haven't started on it yet :)
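To make the hybrid Parquet reader split concrete (IO in Java, decoding in native), here is a minimal sketch under simplifying assumptions; it is not Comet's actual code. A hypothetical `PageSource` trait stands in for the Java IO layer, which in Comet would sit on the other side of JNI, and the native side decodes PLAIN-encoded INT32 pages (4-byte little-endian values, as in the Parquet format spec) into a `Vec<i32>` standing in for an Arrow buffer. Decompression, dictionary pages, definition levels, and JNI itself are all omitted.

```rust
/// Hypothetical stand-in for the JVM side: in the hybrid design, IO (S3, HDFS, ...)
/// stays in Java, and only raw page bytes are handed to native code.
trait PageSource {
    fn next_page(&mut self) -> Option<Vec<u8>>;
}

/// Hypothetical in-memory source used here in place of a Java-side reader.
struct InMemoryPages {
    pages: Vec<Vec<u8>>,
}

impl PageSource for InMemoryPages {
    fn next_page(&mut self) -> Option<Vec<u8>> {
        if self.pages.is_empty() {
            None
        } else {
            Some(self.pages.remove(0))
        }
    }
}

/// Native side: decode a PLAIN-encoded INT32 page (4-byte little-endian values).
/// Assumes the bytes are already decompressed.
fn decode_plain_i32(page: &[u8]) -> Result<Vec<i32>, String> {
    if page.len() % 4 != 0 {
        return Err(format!("page length {} is not a multiple of 4", page.len()));
    }
    Ok(page
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// Pull pages through the IO abstraction and decode each one natively,
/// appending the values to a single column buffer.
fn read_column(source: &mut dyn PageSource) -> Result<Vec<i32>, String> {
    let mut values = Vec::new();
    while let Some(page) = source.next_page() {
        values.extend(decode_plain_i32(&page)?);
    }
    Ok(values)
}

fn main() {
    let mut src = InMemoryPages {
        pages: vec![
            [1i32, 2].iter().flat_map(|v| v.to_le_bytes()).collect::<Vec<u8>>(),
            (-3i32).to_le_bytes().to_vec(),
        ],
    };
    let col = read_column(&mut src).unwrap();
    println!("{:?}", col);
}
```

Running it prints `[1, 2, -3]`. The point of the split is that the trait boundary is the only thing the decoder depends on, so the IO side can be swapped (for example, for a fully native reader later) without touching the decoding logic.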
