What about the scala/java/resources directories? Do those still exist or are they simplified somehow?

We currently have an xsd/ directory to allow schematron, xslt, etc to be included in the same repo. Do we still have that directory?

How do pluggable UDFs and layers fit into this? Do we suggest those live in separate repos, or can they fit into this layout?

Note that I believe sbt supports organizations in a single directory name, e.g.

  src/
  └── main/
      └── resources/
          └── org.foo.myschema/
              └── xsd/
                  └── common.xsd

So that could be one approach to reduce the deep directory structures.

Generally, I'm definitely in favor of simplifying the layout, but this to me feels like it might just add more confusion since it's sort of close to the existing layout, but not quite the same.

If we are potentially going to go against the standards, and potentially make IDE support more difficult, I almost wonder if we should be more ambitious and come up with something completely different. I'm not sure what that would be, but it could be flatter. For example, maybe something like this:

  dfdl-fmt/
  ├── build.sbt
  ├── dfdl/
  │   ├── format.dfdl.xsd
  │   └── main.dfdl.xsd
  ├── layer/
  │   └── MyLayer.scala
  ├── sch/
  ├── tdml/
  │   └── main.tdml
  ├── udf/
  │   └── MyUDF.scala
  └── xslt/

A plugin could implicitly add organization structure so that things are namespaced when building a jar. Or maybe we even do something like NiFi does with .nar files and have a custom package format, e.g. .dar.
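As a rough illustration of the "implicitly add organization structure" idea, here is a minimal build.sbt sketch that prefixes every packaged file with a namespace directory at jar-build time. The prefix `org/foo/myschema` is a made-up example, and this assumes sbt 1.1 or later for the slash syntax. It is only a sketch: it would also relocate compiled class files, so UDF and layer classes would need separate handling.

```scala
// build.sbt (sketch only, not a tested Daffodil setup)
// Hypothetical namespace prefix; a real plugin might derive it from `organization` and `name`.
val schemaNamespace = "org/foo/myschema"

Compile / packageBin / mappings := {
  (Compile / packageBin / mappings).value.map { case (file, path) =>
    // Leave META-INF entries where they are; prefix everything else.
    if (path.startsWith("META-INF")) (file, path)
    else (file, s"$schemaNamespace/$path")
  }
}
```

With something like this, a flat `dfdl/main.dfdl.xsd` on disk would end up at `org/foo/myschema/dfdl/main.dfdl.xsd` inside the jar.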

It's probably a lot more work, with things to work out (e.g. how do dependencies work for UDFs and layers), and it almost certainly needs a plugin rather than just tweaking sbt properties, but something like that feels more ideal to me.

Note that maybe we don't even use sbt for this. Maybe there's a better tool for something like this.

Another related thing to consider: with NiFi we found it difficult to add jars to the NiFi classpath for a specific processor, which means loading schemas from a jar on the classpath couldn't be done. Having a custom package format could make this easier, since all the .dar processing/lookup would be done by Daffodil rather than by standard classpath lookups.
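To make the lookup idea concrete, here is a minimal sketch that reads a schema directly out of an archive file without touching the classpath. It assumes a .dar is nothing more than a zip archive, which is purely an assumption since no such format exists yet. The object and method names are hypothetical, and it needs Scala 2.13 or later for `scala.util.Using`.

```scala
import java.util.zip.ZipFile
import scala.io.Source
import scala.util.Using

// Hypothetical helper: treats a ".dar" as a plain zip archive (an assumption).
object DarLookup {
  // Returns the text of the named entry, or None if the archive has no such entry.
  def readSchemaFromDar(darPath: String, entryName: String): Option[String] =
    Using.resource(new ZipFile(darPath)) { zip =>
      Option(zip.getEntry(entryName)).map { entry =>
        Using.resource(zip.getInputStream(entry)) { in =>
          Source.fromInputStream(in, "UTF-8").mkString
        }
      }
    }
}
```

A caller would do something like `DarLookup.readSchemaFromDar("my-format.dar", "dfdl/main.dfdl.xsd")`, where both the file name and the entry path are illustrative.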


On 12/3/21 5:25 PM, Mike Beckerle wrote:
Experience in giving DFDL training via daffodil is that our standard schema
project layout <https://daffodil.apache.org/dfdl-layout/> is much too deep
(directory wise) for many users to conveniently navigate and use. It gets
in the way of learning.

Our layout was designed to follow sbt conventions that enable automated
dependency management, packaging, etc. It is easy to use if you are
accustomed to using an IDE like Eclipse or IntelliJ.  It is also
extraordinarily valuable (and underappreciated) that 'sbt test' does a
built-in-self-test on a schema, and that 'sbt publishLocal' creates a Jar
of a DFDL schema for managed dependencies use between schemas.

But new users are mostly coming to DFDL/Daffodil from a command-line prompt
and a text editor (e.g., VIM).

I am wondering if we can have our cake and eat it too, without too much
added sbt complexity, and without losing 'sbt test' and 'sbt publishLocal'
working their magic for us.

E.g., what if a simplified layout was:

mySchema/schema - takes the place of src/main/*. Also no package-style
directory folder structure.
mySchema/test - takes the place of src/test/*. No package-style directory
folder structure.

It would be optional if users want to use mySchema/test/data and
mySchema/test/infosets to separate infosets and data, or just put all those
files in the same place and use file extensions (.dat vs. .dat.xml vs.
.tdml, etc.) to distinguish the kinds of content.

Such a flattened tree structure requires that the schema file names are
well chosen to be unlikely to conflict with other users' chosen names, so a
name like common.dfdl.xsd or main.dfdl.xsd would be no good as there is no
package directory structure to make them unique.

But names like common-mySchema.dfdl.xsd and main-mySchema.dfdl.xsd would
still be quite convenient to use, particularly if the mySchema name is well
chosen. (Note how I've put the unique part of the name first, so that
name-completion will work most easily on command line.)

I think this would still work with sbt if we simply override the default
paths (and perhaps file patterns) used for specifying source and resources.
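
A minimal build.sbt sketch of such an override, assuming the `schema` and `test` directory names from the proposal above and sbt 1.1 or later for the slash syntax. It has not been tried against a real schema project.

```scala
// build.sbt (sketch only)
// Schema files come from mySchema/schema instead of src/main/resources.
Compile / resourceDirectory := baseDirectory.value / "schema"

// TDML files, test data, and infosets come from mySchema/test instead of src/test/resources.
Test / resourceDirectory := baseDirectory.value / "test"
```

Test source directories could presumably be redirected the same way through the `scalaSource` and `javaSource` keys, if Scala or Java test drivers are kept.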

Thoughts?

