alamb commented on pull request #1223:
URL: https://github.com/apache/arrow-datafusion/pull/1223#issuecomment-961315661


   Thank you very much @yahoNanJing .
   
   I personally would like to see such integration / connector code live in 
other repos (rather than a feature flag on datafusion). My primary rationale is 
to keep DataFusion and Ballista development easier. 
   
   DataFusion and Ballista already have fairly substantial overhead to working 
on this codebase, so adding another integration will make testing / coding 
changes in this repo take longer.
   
   For example, I don't know how to run / test hdfs locally. I tried on this 
branch:
   
   ```
   -*- mode: compilation; default-directory: "~/Software/arrow-datafusion/" -*-
   cargo test --features=hdfs  -p datafusion --test sql
      Compiling fs-hdfs v0.1.3
      Compiling parquet v6.0.0
   error: failed to run custom build command for `fs-hdfs v0.1.3`
   
   Caused by:
     process didn't exit successfully: 
`/Volumes/RAMDisk/df-target/debug/build/fs-hdfs-e451fc9b245fa518/build-script-build`
 (exit status: 101)
     --- stderr
     thread 'main' panicked at 'JAVA_HOME shell environment must be set: 
environment variable not found', 
/Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/fs-hdfs-0.1.3/build.rs:132:13
     note: run with `RUST_BACKTRACE=1` environment variable to display a 
backtrace
   warning: build failed, waiting for other jobs to finish...
   ```
   
   I am sure I could go figure out how to setup the HDFS dependency, but I 
don't have the time right now.
   
   Putting the HDFS code in this repo will help ensure we keep DataFusion's API 
resonably stable as you mentioned on 
https://github.com/apache/arrow-datafusion/pull/1062#discussion_r741575976 and 
effectivly make the maintenance of the hdfs connector easier.
   
   However, in total, I selfishly would like to optimize for the maintenance of 
DataFusion as it currently exists rather than the hdfs connector.
   
   Thoughts @houqp  @Dandandan @Jimexist @rdettai ?
   
   BTW if there is consensus for putting this code into the arrow-datafusion 
repo, prior I would like to see:
   1. CI coverage (aka testing that this works as part of the github checks -- 
I may have missed it but I don't think this is covered now)
   2. A developer's guide explaining how to configure / run the tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to