Re: status of daffodil + drill work

2023-11-14 Thread Mike Beckerle
Daffodil at git hash d26a582b62c5e26b4bcac895b9cb8960c3ce8522
(2023-11-14) or newer supports the metadata and data bridges used by my
current Drill integration work.

I no longer have a special fork of the daffodil repo.



status of daffodil + drill work

2023-11-10 Thread Mike Beckerle
I have saved my work at this checkpoint while debugging.

I have JUnit tests working that show I can create Drill metadata and parse
data via Drill SQL from DFDL schemas whose data maps to flat Drill
row-sets, with all columns being simple types (only INT at the moment).
These work fine.

The next thing to add is a column that is a map. This is the first baby
step toward nested substructure.

The test for this is testComplexQuery3.

This test introduces a column that is not simple/INT; it is a map.

So the row now looks like {a1, a2, b: {b1, b2}} where 'b' is the map,
and a1, a2, b1, b2 are int type.
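
For readers who want the row shape spelled out, here is a minimal sketch
in plain Java. It deliberately uses only java.util maps, not Drill or
Daffodil classes; the class and method names (NestedRowSketch,
exampleRow) are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: models the row {a1, a2, b: {b1, b2}}
// with ordinary Java maps. This is not how Drill represents rows.
final class NestedRowSketch {

    // Builds one row with two top-level INT columns and one map column 'b'
    // that itself holds two INT members.
    static Map<String, Object> exampleRow(int a1, int a2, int b1, int b2) {
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("b1", b1);
        b.put("b2", b2);

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a1", a1);
        row.put("a2", a2);
        row.put("b", b); // the only non-simple column
        return row;
    }

    public static void main(String[] args) {
        // prints {a1=1, a2=2, b={b1=3, b2=4}}
        System.out.println(exampleRow(1, 2, 3, 4));
    }
}
```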

This fails.

The new 'b' map column is causing a failure when the
DaffodilBatchReader invokes rowSetLoader.save() to close out the row.

It seems to populate the row with a1, a2, b1, and b2, and endWrite is
called on the map; all of that works.

It fails at an 'assert state == State.IN_ROW', at line 308 of
AbstractTupleWriter.java.

So something about having added this map column to the row is causing
the state to be incorrect.
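
To make the kind of failure concrete, here is a toy state machine in
plain Java. It is not Drill's AbstractTupleWriter and it is not a
diagnosis: every class and method in it (ToyTupleWriter, ToyStateDemo,
addChild, and so on) is hypothetical, and the state names beyond IN_ROW
are assumptions. It only shows one way a "state == IN_ROW" check can
trip when a nested writer is closed out before the enclosing row is
saved. Whether anything like this happens in the actual reader is not
established in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. Names echo identifiers mentioned in this thread but the
// code is hypothetical and does not reproduce Drill's implementation.
final class ToyTupleWriter {
    enum State { IDLE, IN_WRITE, IN_ROW }

    private final String name;
    private final List<ToyTupleWriter> children = new ArrayList<>();
    private State state = State.IDLE;

    ToyTupleWriter(String name) { this.name = name; }

    // Adds a nested tuple (for example a map column) under this writer.
    ToyTupleWriter addChild(String childName) {
        ToyTupleWriter child = new ToyTupleWriter(childName);
        children.add(child);
        return child;
    }

    void startWrite() {
        require(State.IDLE, "startWrite");
        state = State.IN_WRITE;
        for (ToyTupleWriter c : children) c.startWrite();
    }

    void startRow() {
        require(State.IN_WRITE, "startRow");
        state = State.IN_ROW;
        for (ToyTupleWriter c : children) c.startRow();
    }

    // Saving a row cascades to children, so each child must still be IN_ROW.
    void saveRow() {
        require(State.IN_ROW, "saveRow");
        for (ToyTupleWriter c : children) c.saveRow();
        state = State.IN_WRITE;
    }

    void endWrite() {
        if (state == State.IDLE) {
            throw new IllegalStateException(name + ".endWrite: already IDLE");
        }
        for (ToyTupleWriter c : children) c.endWrite();
        state = State.IDLE;
    }

    private void require(State expected, String op) {
        if (state != expected) {
            throw new IllegalStateException(
                name + "." + op + ": expected " + expected + " but was " + state);
        }
    }
}

final class ToyStateDemo {
    // Closes the nested writer by hand mid-row, then saves the row.
    // Returns the failure message, or null if the save succeeded.
    static String saveAfterEarlyEndWrite() {
        ToyTupleWriter row = new ToyTupleWriter("row");
        ToyTupleWriter b = row.addChild("b");
        row.startWrite();
        row.startRow();
        b.endWrite(); // nested writer drops to IDLE while the row is open
        try {
            row.saveRow();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Same sequence without the early endWrite; returns null on success.
    static String saveNormally() {
        ToyTupleWriter row = new ToyTupleWriter("row");
        row.addChild("b");
        row.startWrite();
        row.startRow();
        try {
            row.saveRow();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("early endWrite: " + saveAfterEarlyEndWrite());
        System.out.println("normal path:    " + saveNormally());
    }
}
```

In this toy model the early-endWrite path reports that the nested
writer was IDLE when IN_ROW was expected, while the normal path saves
cleanly. The model uses exceptions rather than Java assert statements so
the failure shows up regardless of whether assertions are enabled.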

If you look at my Drill PR (https://github.com/apache/drill/pull/2836)
you can search for FIXME.

My fork repo: https://github.com/mbeckerle/drill, branch daffodil-2835.

My next step is to go back to Daffodil and get all the changes I have
needed there integrated and pushed to the main branch.

That way at least others will have an easier time running this Drill
branch of mine to see what is going wrong.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com