Re: simplified schema project layout

Mike Beckerle Mon, 20 Dec 2021 12:07:26 -0800

I converted the DFDLSchemas CSV example to use the simplified layout.

I actually like this a lot better for simple examples than the
original "standard schema file system layout".


Take a look and see what you think:

https://github.com/DFDLSchemas/CSV/pull/7

On Wed, Dec 8, 2021 at 12:05 PM Mike Beckerle <mbecke...@apache.org> wrote:
>
> I will give this a try.
>
> On Wed, Dec 8, 2021 at 10:39 AM Steve Lawrence <slawre...@apache.org> wrote:
> >
> > That's fair, I agree there definitely is some redundancy. In general I'm
> > not a huge fan of mixing sources and resources, but maybe it's not too
> > big of a deal in this case, since sources for UDFs/Layers will be
> > rare, and when they do exist there's probably only a very small number
> > of them.
> >
> > I haven't tested this much, but based on some examples and playing
> > around a bit, I think this gets you what you're after:
> >
> >    organization := "org.example"
> >
> >    name := "dfdl-fmt"
> >
> >    version := "0.1.0-SNAPSHOT"
> >
> >    lazy val root = (project in file("."))
> >      .settings(
> >        Project.inConfig(Compile)(flattenSettings("src")),
> >        Project.inConfig(Test)(flattenSettings("test")),
> >      )
> >
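> >    // Use one flat directory for both sources and resources.
> >    def flattenSettings(name: String) = Seq(
> >      unmanagedSourceDirectories := Seq(baseDirectory.value / name),
> >      unmanagedResourceDirectories := unmanagedSourceDirectories.value,
> >      // Sources are files ending in .java or .scala ...
> >      unmanagedSources / includeFilter := "*.java" | "*.scala",
> >      // ... and everything else in the directory is a resource.
> >      unmanagedResources / excludeFilter := (unmanagedSources / includeFilter).value,
> >    )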
> >
> > (note that we probably also want many of the existing settings in our
> > current build.sbt files)
> >
> > All the non-test stuff goes in a "src" directory. Sources are anything
> > that ends with .java or .scala. Resources are anything that isn't a source.
> >
> > And the "test" directory has the exact same layout, but for tests.
> >
> > The .class files that end up in the jar are namespaced by the package line.
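
To illustrate the point about the package line, here is a minimal sketch,
assuming a hypothetical file src/MyUDF.scala sitting directly in the flat
src/ directory. The package name org.example.udf and the evaluate method
are made up for illustration; a real UDF would implement Daffodil's UDF
interfaces, which are not shown here. Scala does not require the directory
to match the package, so the compiled class should still end up under
org/example/udf/ in the jar.

```scala
// Hypothetical file: src/MyUDF.scala (no org/example/udf/ directories on disk).
package org.example.udf

// Placeholder only: a real UDF would implement Daffodil's UDF interfaces.
class MyUDF {
  // Illustrative method; its name and behavior are assumptions.
  def evaluate(input: String): String = input.toUpperCase
}
```
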
> >
> > The resources that end up in the jar are namespaced by the directory
> > structure and/or file naming convention as they are in the src/ or test/
> > directory. So schema authors can namespace schemas however they want,
> > whether it be directories or file names, or not at all.
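
The source/resource rule described above can be sketched in plain Scala.
This is only an illustration of the rule, not how sbt implements its
filters; FlattenedLayout, isSource and classify are hypothetical names
that are not part of sbt or Daffodil.

```scala
// Illustrative only: mirrors the include/exclude filters in the settings above.
// FlattenedLayout and its members are hypothetical, not sbt or Daffodil API.
object FlattenedLayout {
  // File extensions treated as compilable sources.
  private val sourceExtensions = Seq(".java", ".scala")

  // A file is a source if its name ends with one of the source extensions.
  def isSource(path: String): Boolean =
    sourceExtensions.exists(ext => path.endsWith(ext))

  // Splits the files found under src/ or test/ into (sources, resources).
  def classify(paths: Seq[String]): (Seq[String], Seq[String]) =
    paths.partition(isSource)
}
```

For example, FlattenedLayout.classify(Seq("MyUDF.scala", "main.dfdl.xsd",
"main.tdml")) puts MyUDF.scala in the sources and the other two files in
the resources.
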
> >
> >
> > On 12/8/21 9:56 AM, Mike Beckerle wrote:
> > > I guess my concern is that all the depth associated with the sbt-based
> > > standard layout feels completely redundant to me.
> > >
> > > I am suggesting that of src/main/scala, we need only main/. Of
> > > src/main/resources/kind, we need only main/.
> > >
> > > E.g., why are all the typed subdirs needed (xsd/, dfdl/, etc.) when
> > > file extensions can be used to distinguish resource types and
> > > to pick which programming language compiler to use?
> > >
> > > To me the only "real" distinction in the standard project layout is
> > > main vs. test which is needed to exclude test stuff when packaging.
> > >
> > > The rest is
> > > (a) using directories as "package names" - which can be done with
> > > well-chosen longer file names
> > > (b) using directories as redundant file typing - which can be done
> > > with file name extensions.
> > >
> > > To me a UDF is a META-INF/services file and some scala/java code in
> > > the "main" area.
> > > Ditto for a layer definition.
> > >
> > > I guess concretely I am wondering if there is a way to override basic
> > > sbt settings like this:
> > >
> > > * Instead of src/main/scala, just look for main/*.scala
> > > * Instead of src/main/java, just look for main/*.java
> > > * Instead of src/main/resources/* just look for main/* where the file
> > > name does not end in ".scala" nor ".java"
> > >
> > > And similarly for test things, where src/test/whatever just becomes
> > > test/whatever and distinctions are made using file name extensions.
> > >
> > > On Wed, Dec 8, 2021 at 9:21 AM Steve Lawrence <slawre...@apache.org> wrote:
> > >>
> > >> What about the scala/java/resources directories? Do those still exist or
> > >> are they simplified somehow?
> > >>
> > >> We currently have an xsd/ directory to allow schematron, xslt, etc to be
> > >> included in the same repo. Do we still have that directory?
> > >>
> > >> How do pluggable UDFs and Layers fit into this? Do we suggest those are
> > >> in separate repos, or can they fit into this?
> > >>
> > >> Note that I believe sbt supports organizations in a single directory
> > >> name, e.g.
> > >>
> > >>     src/
> > >>     └── main/
> > >>         └── resources/
> > >>             └── org.foo.myschema/
> > >>                 └── xsd/
> > >>                     └── common.xsd
> > >>
> > >> So that could be one approach to reduce the deep directory structures.
> > >>
> > >> Generally, I'm definitely in favor of simplifying the layout, but this
> > >> to me feels like it might just add more confusion since it's sort of
> > >> close to the existing layout, but not quite the same.
> > >>
> > >> If we are potentially going to go against the standards, and potentially
> > >> make IDE support more difficult, I almost wonder if we should be more
> > >> ambitious and come up with something that is completely different? I'm
> > >> not sure what that would be, but could be more flat. For example, maybe
> > >> something like this:
> > >>
> > >>     dfdl-fmt/
> > >>     ├── build.sbt
> > >>     ├── dfdl/
> > >>     │   ├── format.dfdl.xsd
> > >>     │   └── main.dfdl.xsd
> > >>     ├── layer/
> > >>     │   └── MyLayer.scala
> > >>     ├── sch/
> > >>     ├── tdml/
> > >>     │   └── main.tdml
> > >>     ├── udf/
> > >>     │   └── MyUDF.scala
> > >>     └── xslt/
> > >>
> > >> A plugin could implicitly add organization structure so things are
> > >> namespaced when building a jar. Or maybe we even do something like NiFi
> > >> has with .nar files and have a custom package format, e.g. .dar
> > >>
> > >> It's probably a lot more work, with things to work out (e.g. how do
> > >> dependencies work for UDFs and layers), and almost certainly needs a
> > >> plugin to work instead of just tweaking sbt properties, but something
> > >> like that feels more ideal to me.
> > >>
> > >> Note that maybe we don't even use sbt for this. Maybe there's a better
> > >> tool for something like this.
> > >>
> > >> Another thing to consider that is related, with NiFi we found it
> > >> difficult to add jars to the NiFi classpath for a specific processor,
> > >> which means loading schemas from a jar on the classpath couldn't be
> > >> done. Having a custom package format could make this easier, since all
> > >> the .dar processing/lookup would be done by Daffodil rather than
> > >> standard classpath lookups.
> > >>
> > >>
> > >> On 12/3/21 5:25 PM, Mike Beckerle wrote:
> > >>> Experience in giving DFDL training via daffodil is that our standard
> > >>> schema project layout <https://daffodil.apache.org/dfdl-layout/> is much
> > >>> too deep (directory wise) for many users to conveniently navigate and use.
> > >>> It gets in the way of learning.
> > >>>
> > >>> Our layout was designed to follow sbt conventions that enable automated
> > >>> dependency management, packaging, etc. It is easy to use if you are
> > >>> accustomed to using an IDE like Eclipse or IntelliJ. It is also
> > >>> extraordinarily valuable (and underappreciated) that 'sbt test' does a
> > >>> built-in self-test on a schema, and that 'sbt publishLocal' creates a
> > >>> jar of a DFDL schema for managed-dependency use between schemas.
> > >>>
> > >>> But new users are mostly coming to DFDL/Daffodil from a command-line
> > >>> prompt and a text editor (e.g., VIM).
> > >>>
> > >>> I am wondering if we can have our cake and eat it too, without too much
> > >>> added sbt complexity, and without losing 'sbt test' and 'sbt publishLocal'
> > >>> working their magic for us.
> > >>>
> > >>> E.g., what if a simplified layout was:
> > >>>
> > >>> mySchema/schema - takes the place of src/main/*. Also no package-style
> > >>> directory folder structure.
> > >>> mySchema/test - takes the place of src/test/*. No package-style
> > >>> directory folder structure.
> > >>>
> > >>> It would be optional whether users use mySchema/test/data and
> > >>> mySchema/test/infosets to separate infosets and data, or just put all
> > >>> those files in the same place and use file extensions (.dat vs. .dat.xml
> > >>> vs. .tdml, etc.) to distinguish the kinds of content.
> > >>>
> > >>> Such a flattened tree structure requires that the schema file names are
> > >>> well chosen to be unlikely to conflict with other users' chosen names, so
> > >>> a name like common.dfdl.xsd or main.dfdl.xsd would be no good as there is
> > >>> no package directory structure to make them unique.
> > >>>
> > >>> But names like common-mySchema.dfdl.xsd and main-mySchema.dfdl.xsd would
> > >>> still be quite convenient to use, particularly if the mySchema name is
> > >>> well chosen. (Note how I've put the unique part of the name first, so that
> > >>> name-completion will work most easily on command line.)
> > >>>
> > >>> I think this would still work with sbt if we simply override the default
> > >>> paths (and perhaps file patterns) used for specifying source and
> > >>> resources.
> > >>>
> > >>> Thoughts?
> > >>>
> > >>
> >
