[
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035666#comment-18035666
]
ASF GitHub Bot commented on DRILL-8474:
---------------------------------------
mbeckerle commented on PR #2989:
URL: https://github.com/apache/drill/pull/2989#issuecomment-3493623186
OK, if I specify an actual jar file containing some compiled Java code, will
it be put onto the Java classpath in the Drillbits?
The issue I'm seeing is that schemas are normally pre-compiled into a ".bin"
file, which is fast to load. In addition to this file, however, the schema may
depend on Daffodil plugin code, which is compiled Java in jar files. This
dependency can span multiple jar files, and all of them need to be on the
classpath.
The Daffodil plugins are of three kinds: UDFs, "layers" (which compute checksums
or decompress zip files, etc.), and charset definitions. All are dynamically
loaded into the JVM when the DFDL schema requests them. They are found using
the
All these different jar files need to be on the Java classpath so that their
metadata allows dynamic loading.
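A common JVM mechanism for this kind of metadata-driven discovery is `java.util.ServiceLoader`, which reads `META-INF/services/` entries from every jar visible to a class loader. The sketch below assumes that mechanism. The `ChecksumLayer` interface and the `PluginDiscovery` class are hypothetical names for illustration, not Daffodil's actual plugin API.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class PluginDiscovery {
    /** Hypothetical plugin interface; Daffodil's real plugin types are not shown here. */
    public interface ChecksumLayer {
        String name();
    }

    /** Lists the names of every ChecksumLayer provider visible to the given class loader. */
    public static List<String> discover(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        // ServiceLoader looks for META-INF/services/<interface binary name>
        // in each jar the loader can see and instantiates the classes listed there.
        for (ChecksumLayer layer : ServiceLoader.load(ChecksumLayer.class, loader)) {
            names.add(layer.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // An empty loader has no provider jars, so nothing is discovered.
        System.out.println(discover(new URLClassLoader(new URL[0], null)));
    }
}
```

If the jar that carries the provider metadata is missing from the class loader's search path, discovery simply returns nothing, which is why every dependency jar has to be visible to the Drillbit JVM.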
So while a simple DFDL schema might be contained in one jar file, in general
there can be a dependency on multiple jar files which must be placed onto the
Java classpath in a specific order. The schema may also be needed in source form
for validation of data.
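To illustrate the ordering requirement, here is a minimal sketch that builds a `URLClassLoader` from a list of jars while preserving the caller's order. `URLClassLoader` searches its URLs in array order, so an earlier jar shadows a later one for the same class or resource name. The class name `OrderedJarLoader` and the jar paths are assumptions for illustration, and this is not how Drill or Daffodil necessarily set up their classpaths.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

public class OrderedJarLoader {
    /** Converts jar paths to URLs, preserving the caller's order. */
    public static URL[] toUrls(List<Path> jars) throws MalformedURLException {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            urls[i] = jars.get(i).toUri().toURL();
        }
        return urls;
    }

    /** Builds a loader that searches the jars in list order, after delegating to the parent. */
    public static URLClassLoader loaderFor(List<Path> jars, ClassLoader parent)
            throws MalformedURLException {
        return new URLClassLoader(toUrls(jars), parent);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths, for illustration only; the files need not exist.
        List<Path> jars = List.of(
                Path.of("/opt/schemas/ethernetIP.jar"),
                Path.of("/opt/schemas/envelope-payload.jar"));
        try (URLClassLoader loader = loaderFor(jars, OrderedJarLoader.class.getClassLoader())) {
            for (URL u : loader.getURLs()) {
                System.out.println(u);
            }
        }
    }
}
```

Note that `URLClassLoader` delegates to its parent first, so anything already on the parent's classpath takes precedence over the jars listed here.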
As a case in point, on GitHub there are DFDL schema projects named:
- envelope-payload
- tcpMessage
- mil-std-2045
- PCAP
- ethernetIP
These are separate component DFDL schemas that are combined into one
assembly schema by way of schema composition.
The only jar file that needs to be on the classpath is the one from
ethernetIP, since that defines a layer algorithm for computing IPv4 checksums.
The DFDL schema that combines all these components can be pre-compiled into
an envelope-payload.bin file.
So in this case I need the ".bin" file to be distributed across the cluster
and loaded by Daffodil in each Drillbit. The ethernetIP.jar file also needs to
be distributed across the Drill cluster and placed on the classpath of each
Drillbit Java process.
> Add Daffodil Format Plugin
> --------------------------
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)