[
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058462#comment-18058462
]
ASF GitHub Bot commented on DRILL-8474:
---------------------------------------
cgivre commented on PR #2989:
URL: https://github.com/apache/drill/pull/2989#issuecomment-3897574658
Thanks Steve!
We’d really welcome feedback once this is merged. Could I send something to
the Daffodil list with demo and/or instructions so people could try it out?
Regarding the review, unfortunately, you’re not a committer to Drill, so we
need Mike or someone else to approve. But my goal is to merge this ASAP.
Best,
— C
> On Feb 13, 2026, at 09:34, Steve Lawrence ***@***.***> wrote:
>
>
> @stevedlawrence approved this pull request.
>
> Looks great to me! Just a couple tiny comments.
>
> I agree with Mike it would be useful to test some more complex schemas
that are built with multiple component and plugins jars, like the PCAP one
mentioned. I think having users modify the classpath would work if Drill
doesn't have a built-in way to do that. Though maybe the jar registry does
that? It's not clear to me if that automatically puts things on the classpath
or just some files available somehow else.
>
> But even without that, it seems like the core functionality is here. And
although many of the more interesting schemas require multiple jars or plugins,
there are plenty of schemas that don't, so I think there is still lots of value
even without it.
>
> I would recommend merging this, and later versions of Drill can work to
add and or test Daffodil component/plugin support if needed.
>
> In
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java
<https://github.com/apache/drill/pull/2989#discussion_r2804134130>:
>
> > + dmp.setupDP(validationMode, null);
> + } else {
> + List<Diagnostic> pfDiags;
> + try {
> + pfDiags = dmp.compileSchema(schemaFileURI, rootName, rootNS);
> + } catch (URISyntaxException | IOException e) {
> + throw new CompileFailure(e);
> + }
> + dmp.setupDP(validationMode, pfDiags);
> + }
> + return dmp.dp;
> + }
> +
> + private void loadSchema(URI schemaFileURI) throws IOException,
InvalidParserException {
> + Compiler c = Daffodil.compiler();
> + dp =
c.reload(Channels.newChannel(schemaFileURI.toURL().openStream()));
> Just to be clear, since I don't think Daffodil documents this very well,
the way save/reload works in Daffodil is save uses Java serialization to
serialize a DataProcessor, and reload just deserializes it. In all uses that
I'm aware of, these .bin files are trusted and so are safe to
reload/deserialize. But I know there are a number of security concerns with
deserializing untrusted data. If the Drill security model doesn't necessarily
trust users with files provided to CREATE DAFFODIL SCHEMA or SELECT * from ...
type=daffodil, then it might be worth considering if reload is something you
want to disallow. It's a potential loss since saved parsers are easier to
distribute and can avoid long schema compilation times, but it does come with
potential risk.
>
> In
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java
<https://github.com/apache/drill/pull/2989#discussion_r2804303273>:
>
> > + *
> + * @return true if there are rows retrieved, false if no rows were
retrieved, which means no more
> + * will ever be retrieved (end of data).
> + * @throws RuntimeException
> + * on parse errors.
> + */
> + @Override
> + public boolean next() {
> + // Check assumed invariants
> + // We don't know if there is data or not. This could be called on an
empty data file.
> + // We DO know that this won't be called if there is no space in the
batch for even 1
> + // row.
> + if (dafParser.isEOF()) {
> + return false; // return without even checking for more rows or
trying to parse.
> + }
> + while (rowSetLoader.start() && !dafParser.isEOF()) { // we never
zero-trip this loop.
> One thing to consider is that it is possible for Daffodil to consume zero
bits of data and it be considered a success. Schemas that allow that are kindof
a pathological case and don't really exist in the real world, but might be
something to consider since it is technically possible.
>
> The way to detect this is to access ParseResult.location().bitPos1b() at
the end of a parse. If that bit position is the same as before the parse then
it means the parse successfully consumed zero bits, and it will continue to do
so in an infinite loop if you keep attempting to parse. For example, this
usually looks something like this:
>
> int lastBitPosition = 1;
> while (...)
> ParseResult pr = dp.parse();
> ... // check isError
> int currBitPosition = pr.location().bitPos1b();
> if (currBitPosition == lastBitPosition)) {
> // consumed no data, exit parsing to avoid infinite loop
> break;
> }
> lastBitPosition = currBitPosition;
> }
> Note that even if Daffodil consumes zero bits, it is still possible to
create infoset events, so you really do need to check the bit position instead
checking if infoset events were created.
>
> —
> Reply to this email directly, view it on GitHub
<https://github.com/apache/drill/pull/2989#pullrequestreview-3797200685>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKB7PTGDQYOJRHOBQWSSZ34LXOILAVCNFSM6AAAAAB4XWHMGWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTOOJXGIYDANRYGU>.
> You are receiving this because you were assigned.
>
> Add Daffodil Format Plugin
> --------------------------
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)