So, when a data format is described by a DFDL schema, I can generate an equivalent Drill schema (TupleMetadata). This schema is always complete. I have unit tests working with this.
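For concreteness, the TupleMetadata I generate is the same kind of thing Drill's SchemaBuilder produces in the unit tests; the element names and types below are just made-up stand-ins for what a particular DFDL schema would yield:

    import org.apache.drill.common.types.TypeProtos.MinorType;
    import org.apache.drill.exec.record.metadata.SchemaBuilder;
    import org.apache.drill.exec.record.metadata.TupleMetadata;

    public class DfdlSchemaExample {
      // Hypothetical shape: a DFDL schema with one complex "header"
      // element and a repeating simple "item" element.
      public static TupleMetadata exampleSchema() {
        return new SchemaBuilder()
            .add("messageId", MinorType.VARCHAR)         // xs:string, required
            .addNullable("timestamp", MinorType.BIGINT)  // xs:long, nillable
            .addMap("header")                            // complex element -> Drill MAP
              .add("version", MinorType.INT)
              .resumeSchema()
            .addArray("item", MinorType.VARCHAR)         // repeating simple element
            .buildSchema();
      }
    }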
To do this for a real SQL query, I need the DFDL schema to be identified in
the SQL query by a file path or URI.

Q: How do I get that DFDL schema file/URI parameter from the SQL query?
(A rough sketch of what I am imagining is below, after the quoted thread.)

Next, assuming I have the DFDL schema identified, I generate an equivalent
Drill TupleMetadata from it (or, hopefully, retrieve it from a cache).

Q: What objects do I call, or what classes do I have to create, to make this
Drill TupleMetadata available to Drill so that it is used in all the ways a
static Drill schema can be useful?

I just need pointers to the code that illustrate how to do this. (I have also
sketched, below the quoted thread, how I currently picture mapping Daffodil's
events onto EVF, following Paul's outline.)

Thanks

-Mike Beckerle

On Thu, Oct 12, 2023 at 12:13 AM Paul Rogers <par0...@gmail.com> wrote:

> Mike,
>
> This is a complex question and has two answers.
>
> First, the standard enhanced vector framework (EVF) used by most readers
> assumes a "pull" model: read each record. This is where next() comes in:
> readers just implement it to read the next record. But the code under EVF
> works with a push model: the readers write to vectors and signal the next
> record. EVF translates the lower-level push model into the higher-level,
> easier-to-use pull model. The best example of this is the JSON reader,
> which uses Jackson to parse JSON and responds to the corresponding events.
>
> You can thus take over the task of filling a batch of records. I'd have to
> poke around the code to refresh my memory. Or, you can take a look at the
> (quite complex) JSON parser, or at EVF itself, to see what it does. There
> are many unit tests that show this at various levels of abstraction.
>
> Basically, you have to:
>
> * Start a batch.
> * Ask if you can start the next record (which might be declined if the
>   batch is full).
> * Write each field. For complex fields, such as records, recursively do
>   the start/end record work.
> * Mark the record as complete.
>
> You should be able to map event handlers to EVF actions as a result. Even
> though DFDL wants to "drive", it still has to give up control once the
> batch is full. EVF will then handle the (surprisingly complex) task of
> finishing up the batch and returning it as the output of the Scan operator.
>
> - Paul
>
> On Wed, Oct 11, 2023 at 6:30 PM Mike Beckerle <mbecke...@apache.org>
> wrote:
>
> > Daffodil parsing generates event callbacks to an InfosetOutputter, which
> > is analogous to a SAX event handler.
> >
> > Drill expects an iterator style of calling next() to advance through the
> > input, i.e., Drill has the control thread and expects to do pull
> > parsing -- at least from the code I studied in the format-xml contrib.
> >
> > Is there any alternative, before I dig into creating another one of these
> > coroutine-style control inversions (which have proven to be problematic
> > for performance)?
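Re: my first question above, what I am imagining is something along the lines
of Drill's table-function syntax for per-query format options. Here, the
'daffodil' format type and the 'schemaURI' property are just placeholders for
whatever we end up defining, not existing options:

    SELECT *
    FROM table(dfs.`data/messages.dat`(
             type => 'daffodil',   -- hypothetical format plugin name
             schemaURI => 'file:///schemas/messages.dfdl.xsd'));  -- hypothetical property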
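And, following Paul's outline in the quoted thread, here is roughly how I
picture bridging Daffodil's InfosetOutputter callbacks onto EVF. The
onStartRecord / onSimpleValue / onEndRecord methods are placeholders for the
actual InfosetOutputter callbacks, and I am going from memory on the
RowSetLoader calls, so treat this as a sketch rather than working code:

    import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
    import org.apache.drill.exec.physical.resultSet.RowSetLoader;

    /**
     * Sketch: adapter that receives Daffodil-style infoset events and
     * writes them into Drill's EVF row-set loader. The on* method names
     * stand in for the real InfosetOutputter callbacks.
     */
    public class DaffodilToEvfAdapter {
      private final RowSetLoader rowWriter;

      public DaffodilToEvfAdapter(ResultSetLoader loader) {
        this.rowWriter = loader.writer();
      }

      // Daffodil starts a new top-level infoset element: start one row.
      public void onStartRecord() {
        rowWriter.start();
      }

      // A simple element was parsed: look up the column writer by name.
      public void onSimpleValue(String name, String value) {
        rowWriter.scalar(name).setString(value);
      }

      // The top-level element ended: commit the row, then report whether
      // more rows fit, so the parse can pause and yield the full batch.
      public boolean onEndRecord() {
        rowWriter.save();
        return !rowWriter.isFull();
      }
    }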