OK, to get what I've done so far merged, I have to rebase it on the latest Drill commit, and the JUnit tests that exercise it must pass.
Also update to Daffodil 3.9.0, which was just released. This *should* be very easy, the unit tests were all working last I tried them.

I will try to get this done this week.

On Mon, Oct 7, 2024 at 9:58 AM Charles Givre <cgi...@apache.org> wrote:

> Hi Mike,
> Let me answer this as best I can. Firstly, just to be clear on this
> point, the phase 1 implementation isn’t the desired state. It’s not really
> all that workable, but it gets what you’ve already done merged. Since, as
> you mentioned, DFDL needs multiple files, what if you were to put these
> files in the classpath in a folder? IE:
>
> Classpath/schema1/
> Classpath/schema2/
>
> For tests, I’d imagine all you have to do is copy the valid files into the
> test/resources/ folder then run your queries. In real life situations a
> user would have to copy all the files into the classpath of all drill
> nodes. This will be dealt with in phase 2. In phase 2, the user will
> simply have to copy the files into a staging directory and Drill will
> handle copying them to all nodes. (I think)
>
> Best,
> -- C
>
> On Oct 3, 2024, at 10:15, Mike Beckerle <mbecke...@apache.org> wrote:
> >
> > I agree we can do the phase1 merge. It should not break anything.
> >
> > Phase 2 ... Paul suggested "just throw everything into
> > $DRILL_CONFIG_DIR", plugin jars, schema jars, everything, as
> > apparently that gets automatically copied everywhere and put on the
> > class path.
> >
> > I left off right at that point for lack of knowledge.
> >
> > How would a test work that way? I.e, a maven test under
> > src/test/java... how is it going to arrange for DRILL_CONFIG_DIR to be
> > defined, and put things into that directory before drill executes (and
> > reads the env for DRILL_CONFIG_DIR's value). I normally think of
> > env-vars as frozen at the time the JVM starts, so tests can't change
> > them unless they are forking a process, and in a complex system like
> > drill I have no idea the implications of this.
> >
> > The only logic change needed I think is to deal with "there is exactly
> > 1 file to parse and query", vs. "there are numerous files to parse and
> > query" These files could, I suppose, be distributed somehow, but they
> > also could just be a bunch of files. My guess is drill already has all
> > of this, and we just have to reuse the pattern from some other
> > extension.
> >
> > On Wed, Oct 2, 2024 at 9:17 AM Charles Givre <cgi...@apache.org> wrote:
> >>
> >> Hi Mike,
> >> I hope all is well. I need to apologize as I grossly overestimated my
> >> available free time to assist with the DFDL / Drill integration. I had a
> >> thought which I wanted to propose.
> >>
> >> My thinking is that we should complete the integration in two phases:
> >>
> >> Phase 1:
> >> For phase 1, I propose that we merge the work that you’ve already
> >> done. We’d have to make sure that the DFDL files are accessible from the
> >> class path. This isn’t really a great solution, but it is just to get the
> >> pieces in so we can work on phase 2. I don’t like seeing good work
> >> languishing in the PR queue and getting stale. To complete phase 1, all
> >> we’d really have to do is get the unit tests working.
> >>
> >> Phase 2:
> >> The remaining issue revolves around making the DFDL files accessible to
> >> Drill and also so that a user can easily add or remove files. For this we
> >> have a solution: DRILL-4726[1] which provides dynamic UDF support.
> >> Basically what I’m proposing is that we duplicate the components of this PR
> >> for Drill. The end result would be that a user could copy the UDF files to
> >> a staging directory. Then the user would run a command like:
> >>
> >> CREATE DAFFODIL SCHEMA xxxx USING JAR yyyyy
> >>
> >> When the user does that, the file would be propagated to all the Drill
> >> nodes. Implementing this feature would really involve a lot of duplicating
> >> with slight mods from that pull request. What do you think?
> >> Best,
> >> -- C
> >>
> >> [1]: https://github.com/apache/drill/pull/574
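
P.S. On the env-var question quoted above: a minimal sketch of why a test cannot set DRILL_CONFIG_DIR for its own JVM but can set it for a forked child. This is plain JDK behavior, not Drill code. The class name EnvForkSketch and both helper methods are hypothetical, and the child command assumes a POSIX `sh` is on the PATH. It says nothing about how Drill itself reads DRILL_CONFIG_DIR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Drill or Daffodil.
class EnvForkSketch {

    // System.getenv() returns an unmodifiable view, so a running test
    // cannot change its own environment through it.
    static boolean currentEnvIsReadOnly() {
        try {
            System.getenv().put("DRILL_CONFIG_DIR", "/tmp/ignored");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Forks a child process with DRILL_CONFIG_DIR set and returns the
    // value the child sees. Assumes a POSIX shell named "sh".
    static String readEnvInChild(Path configDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb =
            new ProcessBuilder("sh", "-c", "printf %s \"$DRILL_CONFIG_DIR\"");
        // The child's environment starts as a copy of ours and is mutable.
        pb.environment().put("DRILL_CONFIG_DIR", configDir.toString());
        pb.redirectErrorStream(true);
        Process child = pb.start();
        String seen;
        try (InputStream in = child.getInputStream()) {
            seen = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = child.waitFor();
        if (exit != 0) {
            throw new IOException("child exited with status " + exit);
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("drill-conf");
        System.out.println("own env read-only: " + currentEnvIsReadOnly());
        System.out.println("child sees: " + readEnvInChild(dir));
    }
}
```

For a Maven test run, the Surefire plugin's `environmentVariables` setting is another way to get the variable into the forked test JVM before it starts. Whether that is enough for an embedded Drill in the tests is something I have not checked.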