Thanks! Looking forward.
> On Oct 7, 2024, at 10:31, Mike Beckerle <mbecke...@apache.org> wrote:
>
> Ok, to get what I've done so far merged, i have to rebase it on the latest
> drill commit, and the junit tests that exercise it must work.
>
> Also update to Daffodil 3.9.0 which was just released.
>
> This *should* be very easy, the unit tests were all working last I tried
> them.
>
> I will try to get this done this week.
>
>
> On Mon, Oct 7, 2024 at 9:58 AM Charles Givre <cgi...@apache.org> wrote:
>
>> Hi Mike,
>> Let me answer this as best I can. Firstly, just to be clear on this
>> point, the phase 1 implementation isn’t the desired state. It’s not really
>> all that workable, but it gets what you’ve already done merged. Since, as
>> you mentioned, DFDL needs multiple files, what if you were to put these
>> files in the classpath in a folder? IE:
>>
>> Classpath/schema1/
>> Classpath/schema2/
>>
>> For tests, I’d imagine all you have to do is copy the valid files into the
>> test/resources/ folder then run your queries. In real life situations a
>> user would have to copy all the files into the classpath of all drill
>> nodes. This will be dealt with in phase 2. In phase 2, the user will
>> simply have to copy the files into a staging directory and Drill will
>> handle copying them to all nodes. (I think)
>>
>> Best,
>> — C
>>
>>
>>> On Oct 3, 2024, at 10:15, Mike Beckerle <mbecke...@apache.org> wrote:
>>>
>>> I agree we can do the phase1 merge. It should not break anything.
>>>
>>> Phase 2 ... Paul suggested "just throw everything into
>>> $DRILL_CONFIG_DIR", plugin jars, schema jars, everything, as
>>> apparently that gets automatically copied everywhere and put on the
>>> class path.
>>>
>>> I left off right at that point for lack of knowledge.
>>>
>>> How would a test work that way? I.e, a maven test under
>>> src/test/java... how is it going to arrange for DRILL_CONFIG_DIR to be
>>> defined, and put things into that directory before drill executes (and
>>> reads the env for DRILL_CONFIG_DIR's value). I normally think of
>>> env-vars as frozen at the time the JVM starts, so tests can't change
>>> them unless they are forking a process, and in a complex system like
>>> drill I have no idea the implications of this.
>>>
>>> The only logic change needed I think is to deal with "there is exactly
>>> 1 file to parse and query", vs. "there are numerous files to parse and
>>> query" These files could, I suppose, be distributed somehow, but they
>>> also could just be a bunch of files. My guess is drill already has all
>>> of this, and we just have to reuse the pattern from some other
>>> extension.
>>>
>>>
>>> On Wed, Oct 2, 2024 at 9:17 AM Charles Givre <cgi...@apache.org> wrote:
>>>>
>>>> Hi Mike,
>>>> I hope all is well. I need to apologize as I grossly overestimated my
>>>> available free time to assist with the DFDL / Drill integration. I had
>>>> a thought which I wanted to propose.
>>>>
>>>> My thinking is that we should complete the integration in two phases:
>>>>
>>>> Phase 1:
>>>> For phase 1, I propose that we merge the work that you’ve already
>>>> done. We’d have to make sure that the DFDL files are accessible from
>>>> the class path. This isn’t really a great solution, but it is just to
>>>> get the pieces in so we can work on phase 2. I don’t like seeing good
>>>> work languishing in the PR queue and getting stale. To complete phase
>>>> 1, all we’d really have to do is get the unit tests working.
>>>>
>>>> Phase 2:
>>>> The remaining issue revolves around making the DFDL files accessible
>>>> to Drill and also so that a user can easily add or remove files. For
>>>> this we have a solution: DRILL-4726[1] which provides dynamic UDF
>>>> support. Basically what I’m proposing is that we duplicate the
>>>> components of this PR for Drill. The end result would be that a user
>>>> could copy the UDF files to a staging directory. Then the user would
>>>> run a command like:
>>>>
>>>> CREATE DAFFODIL SCHEMA xxxx USING JAR yyyyy
>>>>
>>>> When the user does that, the file would be propagated to all the Drill
>>>> nodes. Implementing this feature would really involve a lot of
>>>> duplicating with slight mods from that pull request. What do you think?
>>>> Best,
>>>> — C
>>>>
>>>>
>>>> [1]: https://github.com/apache/drill/pull/574
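
On the environment-variable question raised in the Oct 3 message: in Java the map returned by System.getenv() is read-only, so a test cannot set DRILL_CONFIG_DIR for its own JVM, but it can hand a different environment to a forked child through ProcessBuilder. The following is only a minimal sketch of that behavior, not Drill's test harness. The class name EnvForkSketch, its two methods, and the directory value are hypothetical, and the child command assumes a POSIX sh on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class EnvForkSketch {

    // Forks a child process with `name` set to `value` in the child's
    // environment only, and returns what the child reports for it.
    static String childSees(String name, String value)
            throws IOException, InterruptedException {
        // Assumption: a POSIX shell is available as "sh".
        ProcessBuilder pb =
            new ProcessBuilder("sh", "-c", "printf %s \"$" + name + "\"");
        // ProcessBuilder.environment() is a mutable copy of the parent's
        // environment; changing it does not affect the current JVM.
        pb.environment().put(name, value);
        pb.redirectErrorStream(true);
        Process p = pb.start();
        String out = new String(
            p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    // Returns true only if the current JVM's environment map accepts a
    // write. System.getenv() is documented as unmodifiable, so this is
    // expected to return false.
    static boolean parentEnvIsMutable() {
        try {
            System.getenv().put("ENV_FORK_SKETCH_PROBE", "x");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical directory value, used purely for illustration.
        String dir = "/tmp/drill-conf-sketch";
        System.out.println("child sees: " + childSees("DRILL_CONFIG_DIR", dir));
        System.out.println("parent env mutable: " + parentEnvIsMutable());
    }
}
```

Whether forking like this is practical for a Drill cluster test is a separate question that the thread leaves open. The classpath approach proposed for phase 1 avoids the issue because files under test/resources are already visible to the test JVM.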