I agree we can do the phase 1 merge. It should not break anything.

Phase 2: Paul suggested we "just throw everything into $DRILL_CONFIG_DIR"
(plugin jars, schema jars, everything), since apparently that directory
gets automatically copied to every node and put on the class path.

I left off right at that point for lack of knowledge.
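If it really works the way Paul described, then at runtime the plugin side
should be trivial: the schema just gets loaded as a class-path resource.
A rough sketch of what I mean (the class and file names are made up, just to
show the lookup):

    import java.net.URL;

    public class SchemaLookupSketch {
      // Hypothetical: load a DFDL schema that was dropped into $DRILL_CONFIG_DIR,
      // assuming that directory really does end up on the class path.
      // "mySchema.dfdl.xsd" is a made-up file name.
      public static URL findSchema() {
        URL schemaUrl = Thread.currentThread().getContextClassLoader()
            .getResource("mySchema.dfdl.xsd");
        if (schemaUrl == null) {
          throw new IllegalStateException("DFDL schema not found on class path");
        }
        return schemaUrl;
      }
    }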

How would a test work that way? I.e., how would a Maven test under
src/test/java arrange for DRILL_CONFIG_DIR to be defined, and put things
into that directory, before Drill executes and reads the env for
DRILL_CONFIG_DIR's value? I normally think of env vars as frozen at the
time the JVM starts, so tests can't change them unless they fork a
process, and in a complex system like Drill I have no idea what the
implications of that would be.
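For what it's worth, the only way I can picture a test controlling that
variable is to fork a child JVM with it set, roughly like this (the launcher
class name is a placeholder, not a real Drill class):

    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ForkWithConfigDir {
      public static void main(String[] args) throws Exception {
        // Build a throwaway config dir and populate it before the child starts.
        Path confDir = Files.createTempDirectory("drill-conf");
        // ... copy drill-override.conf, DFDL schema jars, etc. into confDir ...

        // Fork a child JVM; the env var is set before that JVM ever starts,
        // which sidesteps the "env is frozen at JVM start" problem.
        ProcessBuilder pb = new ProcessBuilder(
            "java", "-cp", System.getProperty("java.class.path"),
            "org.example.DrillTestMain");   // placeholder main class
        pb.environment().put("DRILL_CONFIG_DIR", confDir.toString());
        pb.inheritIO();
        int exit = pb.start().waitFor();
        System.out.println("child exited with " + exit);
      }
    }

Whether Drill's existing test fixtures already offer a cleaner way to do
this, I don't know.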

The only logic change needed, I think, is to deal with "there is exactly
one file to parse and query" vs. "there are numerous files to parse and
query." These files could, I suppose, be distributed somehow, but they
also could just be a bunch of files. My guess is Drill already has all of
this, and we just have to reuse the pattern from some other extension.
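Concretely, I'd expect the "one vs. many" handling to be nothing more than a
loop like this (the "dfdl" subdirectory and the ".dfdl.xsd" suffix are
invented for illustration):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class SchemaListSketch {
      // Sketch: treat "exactly one file" as the degenerate case of "many files".
      public static List<Path> listSchemas(Path configDir) throws Exception {
        Path schemaDir = configDir.resolve("dfdl");
        try (Stream<Path> s = Files.list(schemaDir)) {
          return s.filter(p -> p.toString().endsWith(".dfdl.xsd"))
                  .collect(Collectors.toList());
        }
      }
    }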


On Wed, Oct 2, 2024 at 9:17 AM Charles Givre <cgi...@apache.org> wrote:
>
> Hi Mike,
> I hope all is well.  I need to apologize as I grossly overestimated my 
> available free time to assist with the DFDL / Drill integration.  I had a 
> thought which I wanted to propose.
>
> My thinking is that we should complete the integration in two phases:
>
> Phase 1:
> For phase 1, I propose that we merge the work that you’ve already done.  We’d 
> have to make sure that the DFDL files are accessible from the class path.  
> This isn’t really a great solution, but it is just to get the pieces in so we 
> can work on phase 2.  I don’t like seeing good work languishing in the PR 
> queue and getting stale.  To complete phase 1, all we’d really have to do is 
> get the unit tests working.
>
> Phase 2:
> The remaining issue revolves around making the DFDL files accessible to Drill 
> and making it easy for a user to add or remove them.  For this we have a 
> solution: DRILL-4726 [1], which provides dynamic UDF support.  Basically, what 
> I’m proposing is that we duplicate the components of that PR for Daffodil 
> schemas.  The end result would be that a user could copy the schema jars, 
> just as with UDF jars, to a staging directory.  Then the user would run a 
> command like:
>
> CREATE DAFFODIL SCHEMA xxxx USING JAR yyyyy
>
> When the user does that, the file would be propagated to all the Drill nodes. 
> Implementing this feature would really involve a lot of duplication, with 
> slight modifications, of that pull request.  What do you think?
> Best,
> — C
>
>
>
> [1]: https://github.com/apache/drill/pull/574
>
>
>
