status of daffodil + drill work

Mike Beckerle Fri, 10 Nov 2023 16:01:09 -0800

I have saved my work at this checkpoint while debugging.

I have JUnit tests working that show I can create Drill metadata and parse
data via Drill SQL from DFDL schemas describing data that becomes flat
Drill row sets, with all columns being simple types (only INT at the
moment). These work fine.


The next thing to add is a column that is a map: a first baby step toward
nested substructure.

The test for this is testComplexQuery3.

This test introduces a column that is not a simple INT; it is a map.

So the row now looks like {a1, a2, b: {b1, b2}}, where 'b' is the map
and a1, a2, b1, and b2 are of type INT.
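
To make that shape concrete, here is a minimal plain-Java sketch of the
row. It does not use Drill's API; the class name RowShape and the sample
values are made up purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: models the row {a1, a2, b: {b1, b2}} with ordinary
// Java maps. RowShape is a hypothetical name, not part of Drill or Daffodil.
class RowShape {
    static Map<String, Object> row(int a1, int a2, int b1, int b2) {
        // The nested map stands in for the 'b' map column.
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("b1", b1);
        b.put("b2", b2);

        // LinkedHashMap keeps the columns in insertion order.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a1", a1);
        row.put("a2", a2);
        row.put("b", b);
        return row;
    }

    public static void main(String[] args) {
        System.out.println(row(1, 2, 3, 4)); // prints {a1=1, a2=2, b={b1=3, b2=4}}
    }
}
```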

This fails.

The new 'b' map column is causing a failure when the
DaffodilBatchReader invokes rowSetLoader.save() to close out the row.

The reader seems to populate the row with a1, a2, b1, and b2, and the
endWrite call on the map happens and completes without error.

The failure is the assertion 'assert state == State.IN_ROW' at line 308 of
AbstractTupleWriter.java.

So something about adding this map column to the row is causing the
state to be incorrect.
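
For readers unfamiliar with this kind of check, here is a toy model of a
writer lifecycle guarded by state checks. It is not Drill's
AbstractTupleWriter and it is not a diagnosis of the bug: the class
ToyTupleWriter, its methods, and its transitions are assumptions made up
for illustration. It only shows how a lifecycle call arriving out of
order can make a later "must be IN_ROW" check fail.

```java
// Toy model only. Hypothetical names and transitions, not Drill's code.
class ToyTupleWriter {
    enum State { IDLE, IN_WRITE, IN_ROW }

    private State state = State.IDLE;

    State state() { return state; }

    void startWrite() { require(State.IDLE);     state = State.IN_WRITE; }
    void startRow()   { require(State.IN_WRITE); state = State.IN_ROW; }

    // Closing out a row is only legal while a row is open.
    void saveRow()    { require(State.IN_ROW);   state = State.IN_WRITE; }

    // Ending the write resets the writer, whatever it was doing.
    void endWrite() {
        if (state == State.IDLE) {
            throw new IllegalStateException("endWrite called while IDLE");
        }
        state = State.IDLE;
    }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "expected " + expected + " but was " + state);
        }
    }

    public static void main(String[] args) {
        ToyTupleWriter w = new ToyTupleWriter();
        w.startWrite();
        w.startRow();
        w.endWrite();      // out-of-order call: the row is still open
        try {
            w.saveRow();   // now fails the IN_ROW check
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy, the failure surfaces at saveRow() even though the call that
actually disturbed the state happened earlier, which is why the assertion
site alone may not point at the root cause.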

If you look at my Drill PR (https://github.com/apache/drill/pull/2836)
you can search for FIXME.

My fork repo: https://github.com/mbeckerle/drill, branch daffodil-2835.

My next step is to go back to Daffodil and get all the changes I have
needed there integrated and pushed to the main branch.

That way at least others will have an easier time running this Drill
branch of mine to see what is going wrong.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
