[GitHub] [arrow-datafusion] rdettai edited a comment on pull request #1010: Reorganize table providers by table format

GitBox Wed, 22 Sep 2021 08:48:59 -0700


rdettai edited a comment on pull request #1010:
URL: https://github.com/apache/arrow-datafusion/pull/1010#issuecomment-924131108



   @houqp @alamb @yjshen this is now getting closer to what I am targeting. 
Some notes for anyone who would want to go through the code:
   - I have duplicated the `TableProvider` and the `ExecutionPlan` 
implementations for the file formats (parquet, csv, json). The objective is to 
remove the old ones once this is functional.
   - In general, I am trying to minimize the number of constructor overloads 
because I find them very hard to read and test.
   - The `ListingTable` does not support partitioning yet, but the stubs are 
there to show how and where it will be implemented. I don't plan to include 
implementations in this PR.
   - The `ListingTable` and `FileFormat` implementations still use file system 
calls directly. The next step (another PR) will be to use the `ObjectStore`.
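
   To make the split described in these notes concrete, here is a minimal sketch of a table format that lists files and delegates per-file work to a file format. It is not the code in this PR: the trait and struct below only borrow the names `FileFormat` and `ListingTable`, and every method, field, and signature (`extension`, `count_rows`, `list_files`, `has_header`, `root`) is a hypothetical simplification. It reads from the local file system with `std::fs`, mirroring the current state described above, and it does no partitioning.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for a file format (parquet, csv, json).
/// It only knows how to interpret the contents of a single file.
trait FileFormat {
    /// File extension handled by this format, e.g. "csv".
    fn extension(&self) -> &str;
    /// Toy replacement for a real scan: count the records in one file.
    fn count_rows(&self, contents: &str) -> usize;
}

/// Hypothetical CSV format with a single option.
struct CsvFormat {
    has_header: bool,
}

impl FileFormat for CsvFormat {
    fn extension(&self) -> &str {
        "csv"
    }

    fn count_rows(&self, contents: &str) -> usize {
        // Count non-empty lines, then drop the header line if there is one.
        let lines = contents.lines().filter(|l| !l.trim().is_empty()).count();
        if self.has_header {
            lines.saturating_sub(1)
        } else {
            lines
        }
    }
}

/// Hypothetical table format: owns the file listing logic and is generic
/// over the file format, so listing and decoding stay separate concerns.
struct ListingTable<F: FileFormat> {
    root: PathBuf,
    format: F,
}

impl<F: FileFormat> ListingTable<F> {
    fn new(root: impl Into<PathBuf>, format: F) -> Self {
        Self { root: root.into(), format }
    }

    /// List the files directly under `root` that match the format's extension.
    /// Uses plain file system calls; an object store would replace this part.
    fn list_files(&self) -> io::Result<Vec<PathBuf>> {
        let mut files = Vec::new();
        for entry in fs::read_dir(&self.root)? {
            let path = entry?.path();
            let matches = path.extension().and_then(|e| e.to_str())
                == Some(self.format.extension());
            if path.is_file() && matches {
                files.push(path);
            }
        }
        files.sort();
        Ok(files)
    }

    /// Sum the per-file row counts reported by the file format.
    fn count_rows(&self) -> io::Result<usize> {
        let mut total = 0;
        for file in self.list_files()? {
            total += self.format.count_rows(&fs::read_to_string(&file)?);
        }
        Ok(total)
    }
}
```

   In this sketch, supporting another file type would mean adding one more `FileFormat` implementation, while changes to how files are discovered stay inside `ListingTable::list_files`.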
   
   Note that I am really trying not to add any features, just reorganize the 
code. But I think that once this is done, it will be fairly easy to add the 
Hive partitioning implementation and use the `ObjectStore`.
   
   I know it is a lot to ask (as this is again a fairly large change) but it 
would be great if you could take a "quick" look and at least validate the 
overall approach. Thanks 😄 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
