I agree we can do the phase 1 merge; it should not break anything. For phase 2, Paul suggested we "just throw everything into $DRILL_CONFIG_DIR" (plugin jars, schema jars, everything), as apparently that directory gets automatically copied everywhere and put on the class path.
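For what it's worth, if files dropped into that directory really do end up on the class path, a reader could locate a staged schema the usual way. A minimal Java sketch (the helper and resource name are hypothetical, not Drill's actual API):

```java
import java.io.InputStream;

public class ClasspathSchemaSketch {
    // Hypothetical helper: open a DFDL schema file that was staged into a
    // directory already on the class path (e.g. $DRILL_CONFIG_DIR).
    static InputStream openSchema(String resourceName) {
        InputStream in = ClasspathSchemaSketch.class.getClassLoader()
                .getResourceAsStream(resourceName);
        if (in == null) {
            throw new IllegalStateException("schema not on class path: " + resourceName);
        }
        return in;
    }

    public static void main(String[] args) {
        // "mySchema.dfdl.xsd" is an illustrative name; this throws unless the
        // file was actually staged onto the class path before the JVM started.
        try (InputStream in = openSchema("mySchema.dfdl.xsd")) {
            System.out.println("found schema");
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
```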
I left off right at that point for lack of knowledge. How would a test work that way? I.e., a Maven test under src/test/java: how is it going to arrange for DRILL_CONFIG_DIR to be defined, and to put things into that directory before Drill executes (and reads DRILL_CONFIG_DIR's value from the environment)? I normally think of env vars as frozen at the time the JVM starts, so tests can't change them unless they fork a process, and in a complex system like Drill I have no idea what the implications of that are.

The only logic change needed, I think, is to deal with "there is exactly 1 file to parse and query" vs. "there are numerous files to parse and query". These files could, I suppose, be distributed somehow, but they also could just be a bunch of files. My guess is Drill already has all of this, and we just have to reuse the pattern from some other extension.

On Wed, Oct 2, 2024 at 9:17 AM Charles Givre <cgi...@apache.org> wrote:
>
> Hi Mike,
> I hope all is well. I need to apologize as I grossly overestimated my
> available free time to assist with the DFDL / Drill integration. I had a
> thought which I wanted to propose.
>
> My thinking is that we should complete the integration in two phases:
>
> Phase 1:
> For phase 1, I propose that we merge the work that you’ve already done. We’d
> have to make sure that the DFDL files are accessible from the class path.
> This isn’t really a great solution, but it is just to get the pieces in so we
> can work on phase 2. I don’t like seeing good work languishing in the PR
> queue and getting stale. To complete phase 1, all we’d really have to do is
> get the unit tests working.
>
> Phase 2:
> The remaining issue revolves around making the DFDL files accessible to Drill
> and also so that a user can easily add or remove files. For this we have a
> solution: DRILL-4726 [1], which provides dynamic UDF support. Basically what
> I’m proposing is that we duplicate the components of this PR for Drill.
> The end result would be that a user could copy the UDF files to a staging
> directory. Then the user would run a command like:
>
> CREATE DAFFODIL SCHEMA xxxx USING JAR yyyyy
>
> When the user does that, the file would be propagated to all the Drill nodes.
> Implementing this feature would really involve a lot of duplicating with
> slight mods from that pull request. What do you think?
> Best,
> — C
>
> [1]: https://github.com/apache/drill/pull/574
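P.S. Partly answering my own env-var question above: since a test can't mutate the environment of its own already-running JVM, the usual workaround is to fork a child process, whose environment map is mutable before launch. A sketch under that assumption (the drillbit.sh command line is illustrative only, not Drill's actual test harness):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class ForkedEnvSketch {
    // Build a ProcessBuilder whose child process will see DRILL_CONFIG_DIR.
    // The parent JVM's own environment is untouched; only the child's copy
    // of the environment map is modified.
    static ProcessBuilder drillWithConfigDir(Path configDir, String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();        // mutable, child-only
        env.put("DRILL_CONFIG_DIR", configDir.toString()); // child reads it at startup
        return pb;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("drill-conf");
        // A test would stage schema/plugin files into dir here, before forking.
        // The command below is a placeholder; a real test would launch Drill.
        ProcessBuilder pb = drillWithConfigDir(dir, "bin/drillbit.sh", "run");
        System.out.println(pb.environment().get("DRILL_CONFIG_DIR"));
    }
}
```

The cost of this approach is that the test must manage a separate process (startup, shutdown, output capture), which is presumably why I'd rather find an existing Drill test pattern to reuse.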