OK, to get what I've done so far merged, I have to rebase it on the latest
Drill commit, and the JUnit tests that exercise it must pass.

I'll also update to Daffodil 3.9.0, which was just released.

This *should* be very easy; the unit tests were all working the last time I
tried them.

I will try to get this done this week.
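
For the phase 1 tests, here is a minimal sketch in plain Java of how a test
might locate a DFDL schema that was copied into a per-schema folder on the
classpath, along the lines Charles suggests (e.g. test/resources/schema1/).
The class name DfdlSchemaLocator and the names schema1 and main.dfdl.xsd are
hypothetical; this is not an existing Drill or Daffodil API, just standard
ClassLoader resource lookup.

```java
import java.net.URL;
import java.util.Optional;

/**
 * Hypothetical helper (not part of Drill or Daffodil): finds a DFDL schema
 * file that was copied into a per-schema folder on the classpath,
 * e.g. test/resources/schema1/main.dfdl.xsd.
 */
public class DfdlSchemaLocator {

  // Build the classpath-relative resource name "<folder>/<file>".
  // ClassLoader resource names use '/' separators and no leading '/'.
  static String resourcePath(String schemaFolder, String fileName) {
    return schemaFolder + "/" + fileName;
  }

  // Returns the schema's URL if it is on the classpath, otherwise empty.
  static Optional<URL> locate(String schemaFolder, String fileName) {
    // Prefer the thread's context class loader, falling back to the
    // loader that loaded this class.
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = DfdlSchemaLocator.class.getClassLoader();
    }
    return Optional.ofNullable(loader.getResource(resourcePath(schemaFolder, fileName)));
  }

  public static void main(String[] args) {
    // "schema1" and "main.dfdl.xsd" are illustrative names only.
    Optional<URL> url = locate("schema1", "main.dfdl.xsd");
    System.out.println(url.map(URL::toString).orElse("not found on classpath"));
  }
}
```

Trying the context class loader first is a common choice, because test
runners and plugin frameworks often install their own class loaders.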


On Mon, Oct 7, 2024 at 9:58 AM Charles Givre <cgi...@apache.org> wrote:

> Hi Mike,
> Let me answer this as best I can. First, just to be clear on this
> point, the phase 1 implementation isn’t the desired state. It’s not really
> all that workable, but it gets what you’ve already done merged. Since, as
> you mentioned, DFDL needs multiple files, what if you were to put these
> files in a folder on the classpath? For example:
>
> Classpath/schema1/
> Classpath/schema2/
>
> For tests, I’d imagine all you have to do is copy the valid files into the
> test/resources/ folder and then run your queries. In real-life situations, a
> user would have to copy all the files onto the classpath of every Drill
> node. This will be dealt with in phase 2. In phase 2, the user will
> simply have to copy the files into a staging directory, and Drill will
> handle copying them to all nodes. (I think)
>
> Best,
> — C
>
>
> > On Oct 3, 2024, at 10:15, Mike Beckerle <mbecke...@apache.org> wrote:
> >
> > I agree we can do the phase 1 merge. It should not break anything.
> >
> > For phase 2, Paul suggested "just throw everything into
> > $DRILL_CONFIG_DIR": plugin jars, schema jars, everything, since
> > apparently that directory gets automatically copied everywhere and put
> > on the classpath.
> >
> > I left off right at that point for lack of knowledge.
> >
> > How would a test work that way? I.e., a Maven test under
> > src/test/java: how is it going to arrange for DRILL_CONFIG_DIR to be
> > defined, and put things into that directory before Drill executes (and
> > reads the env for DRILL_CONFIG_DIR's value)? I normally think of
> > env vars as frozen at the time the JVM starts, so tests can't change
> > them unless they fork a process, and in a complex system like
> > Drill I have no idea what the implications of that are.
> >
> > I think the only logic change needed is to deal with "there is exactly
> > one file to parse and query" vs. "there are numerous files to parse and
> > query". These files could, I suppose, be distributed somehow, but they
> > could also just be a bunch of files. My guess is that Drill already has
> > all of this, and we just have to reuse the pattern from some other
> > extension.
> >
> >
> > On Wed, Oct 2, 2024 at 9:17 AM Charles Givre <cgi...@apache.org> wrote:
> >>
> >> Hi Mike,
> >> I hope all is well. I need to apologize, as I grossly overestimated my
> >> available free time to assist with the DFDL / Drill integration. I had a
> >> thought which I wanted to propose.
> >>
> >> My thinking is that we should complete the integration in two phases:
> >>
> >> Phase 1:
> >> For phase 1, I propose that we merge the work that you’ve already
> >> done. We’d have to make sure that the DFDL files are accessible from the
> >> classpath. This isn’t really a great solution, but it is just to get the
> >> pieces in so we can work on phase 2. I don’t like seeing good work
> >> languishing in the PR queue and getting stale. To complete phase 1, all
> >> we’d really have to do is get the unit tests working.
> >>
> >> Phase 2:
> >> The remaining issue is making the DFDL files accessible to Drill, and
> >> letting a user easily add or remove them. For this we already have a
> >> solution: DRILL-4726[1], which provides dynamic UDF support. Basically,
> >> what I’m proposing is that we duplicate the components of that PR for
> >> DFDL schemas. The end result would be that a user could copy the schema
> >> files to a staging directory, just as with UDF jars. Then the user would
> >> run a command like:
> >>
> >> CREATE DAFFODIL SCHEMA xxxx USING JAR yyyyy
> >>
> >> When the user does that, the files would be propagated to all the Drill
> >> nodes. Implementing this feature would mostly involve duplicating that
> >> pull request with slight modifications. What do you think?
> >> Best,
> >> — C
> >>
> >>
> >>
> >> [1]: https://github.com/apache/drill/pull/574
> >>
> >>
> >>
>
>
