Re: xml_reader branch - and getting Drill to use the XSD-derived metadata

2023-07-31 Thread Charles Givre
Hi Mike,
That's awesome work!  I wasn't expecting you to work on this but I really 
appreciate it.   My plan was to do this in two phases.  The first was simply to 
get Drill to be able to read an XSD schema and translate that into a Drill 
schema.  Part two would be to integrate that into the XML reader and the HTTP 
plugin which also uses the XML reader.  
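
As a rough illustration of phase one, here is a minimal sketch of walking an XSD and producing a nested field-to-type schema, including attributes. It is written in Python for brevity, whereas the xsd_reader branch itself is Java. The type table, the function names (`map_element`, `xsd_to_schema`) and the sample XSD are all assumptions made for illustration, not taken from that branch.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed mapping from XSD built-in types to Drill type names.
# This table is illustrative only; it is not copied from the xsd_reader branch.
XSD_TO_DRILL = {
    "string": "VARCHAR",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT4",
    "double": "FLOAT8",
    "boolean": "BIT",
    "date": "DATE",
    "dateTime": "TIMESTAMP",
}


def local_name(qname):
    """Strip a namespace prefix such as 'xs:' from a type reference."""
    return qname.split(":")[-1]


def map_element(elem):
    """Return (name, schema); schema is a type name or a nested dict (a map)."""
    name = elem.get("name")
    type_ref = elem.get("type")
    if type_ref is not None:
        # Simple typed element; unknown types fall back to VARCHAR.
        return name, XSD_TO_DRILL.get(local_name(type_ref), "VARCHAR")
    complex_type = elem.find(XS + "complexType")
    if complex_type is None:
        return name, "VARCHAR"
    fields = {}
    # Child elements of an xs:sequence become nested fields.
    for child in complex_type.findall(XS + "sequence/" + XS + "element"):
        child_name, child_schema = map_element(child)
        fields[child_name] = child_schema
    # Attributes are treated as additional scalar fields of the same map.
    for attr in complex_type.findall(XS + "attribute"):
        attr_type = local_name(attr.get("type", "xs:string"))
        fields[attr.get("name")] = XSD_TO_DRILL.get(attr_type, "VARCHAR")
    return name, fields


def xsd_to_schema(xsd_text):
    """Map every top-level xs:element in the XSD text to a schema entry."""
    root = ET.fromstring(xsd_text)
    return dict(map_element(e) for e in root.findall(XS + "element"))


# Hypothetical sample schema, used only to exercise the sketch.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="pages" type="xs:int"/>
        <xs:element name="price" type="xs:double"/>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = xsd_to_schema(SAMPLE_XSD)
print(schema)
```

The sketch covers only named elements with a built-in type or an inline complexType containing a sequence. Named type references, choice groups, and occurrence constraints are left out.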

I'd be happy to discuss this with you.  Could we shoot for tomorrow (Tues) 
afternoon sometime?  I'm basically free except at 4PM.
Thx!
-- C




> On Jul 31, 2023, at 5:55 PM, Mike Beckerle  wrote:
> 
> I added a first cut at attribute support -
> https://github.com/cgivre/drill/pull/6
> 
> Ok, so given that the xsd_reader can now map a small usable subset of XSD
> to Drill metadata, it seems next we want Drill to start using this
> metadata.
> 
> I am not sure where to start here. Where would the metadata from the XSD
> plug into the query planning/building?
> 
> Can we schedule a time to discuss?
> 
> I am broadly available the rest of this week. Tues at 1pm or 3pm, any time
> the rest of the week, but sooner is better as I also have time to work on
> this currently.
> 
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com



xml_reader branch - and getting Drill to use the XSD-derived metadata

2023-07-31 Thread Mike Beckerle
I added a first cut at attribute support -
https://github.com/cgivre/drill/pull/6

Ok, so given that the xsd_reader can now map a small usable subset of XSD
to Drill metadata, it seems next we want Drill to start using this
metadata.

I am not sure where to start here. Where would the metadata from the XSD
plug into the query planning/building?

Can we schedule a time to discuss?

I am broadly available the rest of this week. Tues at 1pm or 3pm, any time
the rest of the week, but sooner is better as I also have time to work on
this currently.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com


Re: drill tests not passing

2023-07-31 Thread Mike Beckerle
Charles,

I fixed your testComplexXSD test in your xsd_reader branch.

https://github.com/cgivre/drill/pull/5

Need to add attribute support, but this is quite close.




On Fri, Jul 14, 2023 at 5:59 PM Charles Givre  wrote:

> Hi Mike,
> One more thing... I've been working on an XSD Reader for Drill for some
> time.  (This is still very buggy)
> https://github.com/cgivre/drill/tree/xsd_reader
>
> What this does is attempt to convert an XML XSD file into a Drill schema.
> Best,
> -- C
>
>
>
> On Jul 14, 2023, at 2:20 PM, Charles Givre  wrote:
>
> Mike,
> Are you able to build Drill w/o the tests?  If so, my suggestion is really
> just to start working on the DFDL extensions.  I've been doing Drill stuff
> for far too long and really haven't needed to run the full battery of unit
> tests locally.  As long as you can build it and can execute individual unit
> tests, you should be ok.  Others may disagree, but for what you're doing,
> I'd think it would be fine.
> Best,
> -- C
>
>
>
> On Jul 14, 2023, at 2:04 PM, Mike Beckerle  wrote:
>
> Update: I did a clean and install -DskipTests=true.
>
> Then I tried the mvn test using the non-UTC timezone stuff, as suggested.
>
> But alas, it still fails. This time the failure is unique and occurs only in
> "Java Execution Engine":
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-dependency-plugin:3.4.0:unpack
> (unpack-vector-types) on project drill-java-exec: Artifact has not been
> packaged yet. When used on reactor artifact, unpack should be executed
> after packaging: see MDEP-98. -> [Help 1]
>
> The command and complete trace output are below.
>
> I need assistance on how to proceed.
>
> Complete trace from the mvn test is attached.
>
>
> On Thu, Jul 13, 2023 at 1:13 PM Mike Beckerle wrote:
>
>> To answer questions:
>>
>> 1. Paul: This is a 100% stock build. All I have done is clone the repo
>> (master branch), make a new git branch (in case I make future changes), and
>> try to build (success) and test (failed so far).
>>
>> 2. James: The /opt/drill directory I created is owned by my userid and
>> has full read/write access for all the development activities. I just put
>> it there so it would have a shorter path to fix the first Hive-related
>> glitch I encountered with the Linux 255 limit on file pathname length.
>>
>> I will try the suggested maven command line for non-UTC and see if things
>> improve.
>>
>> The challenge for me as a newbie is how do I know if I have everything
>> properly configured?
>>
>> Can I just turn off building and testing of the Hive-related stuff in
>> some supported/well-known way?
>>
>> If so, I'd like to turn off not just Hive, but *as much
>> as possible*. I really just need the embedded Drill to work.
>>
>> I would agree with @Charles Givre that a contrib
>> package addition is the ideal approach and that's what I'll be attempting.
>>
>> -mikeb
>>
>> On Thu, Jul 13, 2023 at 10:59 AM Charles Givre  wrote:
>>
>>> I'll add some heresy here... IMHO, for the purposes of developing a DFDL
>>> extension, you probably don't need all the Drill tests to run.  For your
>>> project, my suggestion would be to add a module to the contrib package and
>>> that way your changes are relatively self contained.
>>> Best,
>>> -- C
>>>
>>>
>>>
>>> > On Jul 13, 2023, at 10:27 AM, James Turton  wrote:
>>> >
>>> > Hi Mike
>>> >
>>> > Here's the command line I use to run tests on a machine that's not in
>>> > the UTC time zone (plus some unrelated memory size arguments).
>>> >
>>> > mvn test -Djunit.args="-Duser.timezone=UTC -Duser.language=en -Duser.region=US" -DmemoryMb=2560 -DdirectMemoryMb=2560
>>> >
>>> > I have one other question to add to Paul's comments - does the OS user
>>> > that you're running Maven under have write access to all of the source tree
>>> > that you put at /opt/drill?
>>> >
>>> > On 2023/07/11 22:12, Paul Rogers wrote:
>>> >> Hi Mike,
>>> >>
>>> >> A quick glance at the log suggests a failure in the tests for the JSON
>>> >> reader, in the Mongo extended types. Drill's date/time support has
>>> >> historically been fragile. Some tests only work if your machine is set to
>>> >> use the UTC time zone (or Java is told to pretend that the time is UTC.)
>>> >> The Mongo types test failure seems to be around a date/time test so maybe
>>> >> this is the issue?
>>> >>
>>> >> There are also failures indicating that the Drillbit (Drill server) died.
>>> >> Not sure how this can happen, as tests run Drill embedded (or used to.)
>>> >> Looking earlier in the logs, it seems that the Drillbit didn't start due to
>>> >> UDF (user-defined function) failures:
>>> >>
>>> >> Found duplicated function in drill-custom-lower.jar:
>>> >> custom_lower(VARCHAR-REQUIRED)
>>> >> Found duplicated function in built-in: lower(VARCHAR-REQUIRED)
>>> >>
>>> >> Not sure how this could occur: it should have failed in all builds.
>>> >>
>>> >> Also:
>>> >>
>>> >> File
>>> >>
>>> /opt/drill/exec/java-exec/target/org.apache