I converted the DFDLSchemas CSV example to use the simplified layout. I actually like this a lot better for simple examples than the original "standard schema file system layout".
Take a look and see what you think: https://github.com/DFDLSchemas/CSV/pull/7

On Wed, Dec 8, 2021 at 12:05 PM Mike Beckerle <mbecke...@apache.org> wrote:
>
> I will give this a try.
>
> On Wed, Dec 8, 2021 at 10:39 AM Steve Lawrence <slawre...@apache.org> wrote:
> >
> > That's fair, I agree there definitely is some redundancy. In general I'm
> > not a huge fan of mixing sources and resources, but maybe it's not too
> > big of a deal in this case, since sources for UDF/Layers will be
> > rare, and when they do exist there's probably only a very small number
> > of them.
> >
> > I haven't tested this much, but based on some examples and playing
> > around a bit, I think this gets you what you're after:
> >
> >   organization := "org.example"
> >
> >   name := "dfdl-fmt"
> >
> >   version := "0.1.0-SNAPSHOT"
> >
> >   lazy val root = (project in file("."))
> >     .settings(
> >       Project.inConfig(Compile)(flattenSettings("src")),
> >       Project.inConfig(Test)(flattenSettings("test")),
> >     )
> >
> >   def flattenSettings(name: String) = Seq(
> >     unmanagedSourceDirectories := Seq(baseDirectory.value / name),
> >     unmanagedResourceDirectories := unmanagedSourceDirectories.value,
> >     unmanagedSources / includeFilter := "*.java" | "*.scala",
> >     unmanagedResources / excludeFilter := (unmanagedSources / includeFilter).value,
> >   )
> >
> > (note that we probably also want many of the existing settings in our
> > current build.sbt files)
> >
> > All the non-test stuff goes in a "src" directory. Sources are anything
> > that ends with .java or .scala. Resources are anything that isn't a source.
> >
> > And the "test" directory has the exact same layout, but for tests.
> >
> > The .class files that end up in the jar are namespaced by the package line.
> >
> > The resources that end up in the jar are namespaced by the directory
> > structure and/or file naming convention as they are in the src/ or test/
> > directory.
> > So schema authors can namespace schemas however they want,
> > whether it be directories or file names, or not at all.
> >
> >
> > On 12/8/21 9:56 AM, Mike Beckerle wrote:
> > > I guess my concern is that all the depth associated with the sbt-based
> > > standard layout feels completely redundant to me.
> > >
> > > I am suggesting that, of src/main/scala, we need only main/. Of
> > > src/main/resources/kind we need only main/.
> > >
> > > E.g., why are all the typed subdirs needed (xsd/, dfdl/, etc.) when
> > > file extensions can be used to distinguish resource types and
> > > programming language compilers to be used?
> > >
> > > To me the only "real" distinction in the standard project layout is
> > > main vs. test which is needed to exclude test stuff when packaging.
> > >
> > > The rest is
> > > (a) using directories as "package names" - which can be done with
> > > well-chosen longer file names
> > > (b) using directories as redundant file typing - which can be done
> > > with file name extensions.
> > >
> > > To me a UDF is a META-INF/services file and some scala/java code in
> > > the "main" area.
> > > Ditto for a layer definition.
> > >
> > > I guess concretely I am wondering if there is a way to override basic
> > > sbt settings like this:
> > >
> > > * Instead of src/main/scala, just look for main/*.scala
> > > * Instead of src/main/java, just look for main/*.java
> > > * Instead of src/main/resources/* just look for main/* where the file
> > >   name does not end in ".scala" nor ".java"
> > >
> > > And similarly for test things, where src/test/whatever just becomes
> > > test/whatever and distinctions are made using file name extensions.
> > >
> > > On Wed, Dec 8, 2021 at 9:21 AM Steve Lawrence <slawre...@apache.org>
> > > wrote:
> > >>
> > >> What about the scala/java/resources directories? Do those still exist or
> > >> are they simplified somehow?
> > >>
> > >> We currently have an xsd/ directory to allow schematron, xslt, etc. to be
> > >> included in the same repo. Do we still have that directory?
> > >>
> > >> How do pluggable UDFs and Layers fit into this? Do we suggest those are
> > >> in separate repos, or can they fit into this?
> > >>
> > >> Note that I believe sbt supports organizations in a single directory
> > >> name, e.g.
> > >>
> > >> src/
> > >> └── main/
> > >>     └── resources/
> > >>         └── org.foo.myschema/
> > >>             └── xsd/
> > >>                 └── common.xsd
> > >>
> > >> So that could be one approach to reduce the deep directory structures.
> > >>
> > >> Generally, I'm definitely in favor of simplifying the layout, but this
> > >> to me feels like it might just add more confusion since it's sort of
> > >> close to the existing layout, but not quite the same.
> > >>
> > >> If we are potentially going to go against the standards, and potentially
> > >> make IDE support more difficult, I almost wonder if we should be more
> > >> ambitious and come up with something that is completely different? I'm
> > >> not sure what that would be, but could be more flat. For example, maybe
> > >> something like this:
> > >>
> > >> dfdl-fmt/
> > >> ├── build.sbt
> > >> ├── dfdl/
> > >> │   ├── format.dfdl.xsd
> > >> │   └── main.dfdl.xsd
> > >> ├── layer/
> > >> │   └── MyLayer.scala
> > >> ├── sch/
> > >> ├── tdml/
> > >> │   └── main.tdml
> > >> ├── udf/
> > >> │   └── MyUDF.scala
> > >> └── xslt/
> > >>
> > >> A plugin could implicitly add organization structure so things are
> > >> namespaced when building a jar. Or maybe we even do something like NiFi
> > >> has with .nar files and have a custom package format, e.g. .dar
> > >>
> > >> It's probably a lot more work, and things to work out (e.g. how do
> > >> dependencies work for udf and layers), and almost certainly needs a
> > >> plugin to work instead of just tweaking sbt properties, but something
> > >> like that feels more ideal to me.
> > >>
> > >> Note that maybe we don't even use sbt for this. Maybe there's a better
> > >> tool for something like this.
> > >>
> > >> Another thing to consider that is related, with NiFi we found it
> > >> difficult to add jars to the NiFi classpath for a specific processor,
> > >> which means loading schemas from a jar on the classpath couldn't be
> > >> done. Having a custom package format could make this easier, since all
> > >> the .dar processing/lookup would be done by Daffodil rather than
> > >> standard classpath lookups.
> > >>
> > >>
> > >> On 12/3/21 5:25 PM, Mike Beckerle wrote:
> > >>> Experience in giving DFDL training via daffodil is that our standard
> > >>> schema project layout <https://daffodil.apache.org/dfdl-layout/> is
> > >>> much too deep (directory wise) for many users to conveniently navigate
> > >>> and use. It gets in the way of learning.
> > >>>
> > >>> Our layout was designed to follow sbt conventions that enable automated
> > >>> dependency management, packaging, etc. It is easy to use if you are
> > >>> accustomed to using an IDE like Eclipse or IntelliJ. It is also
> > >>> extraordinarily valuable (and underappreciated) that 'sbt test' does a
> > >>> built-in-self-test on a schema, and that 'sbt publishLocal' creates a
> > >>> Jar of a DFDL schema for managed dependencies use between schemas.
> > >>>
> > >>> But new users are mostly coming to DFDL/Daffodil from a command-line
> > >>> prompt and a text editor (e.g., VIM).
> > >>>
> > >>> I am wondering if we can have our cake and eat it too, without too much
> > >>> added sbt complexity, and without losing 'sbt test' and 'sbt
> > >>> publishLocal' working their magic for us.
> > >>>
> > >>> E.g., what if a simplified layout was:
> > >>>
> > >>> mySchema/schema - takes the place of src/main/*. Also no package-style
> > >>> directory folder structure.
> > >>> mySchema/test - takes the place of src/test/*.
> > >>> No package-style directory folder structure.
> > >>>
> > >>> It would be optional if users want to use mySchema/test/data and
> > >>> mySchema/test/infosets to separate infosets and data, or just put all
> > >>> those files in the same place and use file extensions (.dat vs.
> > >>> .dat.xml vs. .tdml, etc.) to distinguish the kinds of content.
> > >>>
> > >>> Such a flattened tree structure requires that the schema file names are
> > >>> well chosen to be unlikely to conflict with other users' chosen names,
> > >>> so a name like common.dfdl.xsd or main.dfdl.xsd would be no good as
> > >>> there is no package directory structure to make them unique.
> > >>>
> > >>> But names like common-mySchema.dfdl.xsd and main-mySchema.dfdl.xsd would
> > >>> still be quite convenient to use, particularly if the mySchema name is
> > >>> well chosen. (Note how I've put the unique part of the name first, so
> > >>> that name-completion will work most easily on command line.)
> > >>>
> > >>> I think this would still work with sbt if we simply override the default
> > >>> paths (and perhaps file patterns) used for specifying source and
> > >>> resources.
> > >>>
> > >>> Thoughts?
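
The split described in the Dec 8 reply in this thread (anything ending in .java or .scala is a source, everything else in the same directory is a resource) can be illustrated outside sbt. What follows is only a sketch in plain Scala: FlatLayout, isSource, and partition are hypothetical names chosen for illustration and are not part of sbt or Daffodil, and sbt's real file filters may differ in details this does not model.

```scala
// Hypothetical helper, not part of sbt or Daffodil. It mirrors the rule
// from the thread: sources end in .java or .scala, everything else that
// sits in the same flat directory is treated as a resource.
object FlatLayout {
  // Extensions treated as compilable sources (assumption taken from the thread).
  private val sourceExtensions = Seq(".java", ".scala")

  // True when the file name ends with one of the source extensions.
  def isSource(fileName: String): Boolean =
    sourceExtensions.exists(ext => fileName.endsWith(ext))

  // Splits file names into (sources, resources), preserving input order.
  def partition(fileNames: Seq[String]): (Seq[String], Seq[String]) =
    fileNames.partition(isSource)

  def main(args: Array[String]): Unit = {
    // File names borrowed from the examples in the thread.
    val files = Seq(
      "main-mySchema.dfdl.xsd",
      "MyUDF.scala",
      "MyLayer.scala",
      "main.tdml"
    )
    val (sources, resources) = partition(files)
    println(s"sources:   ${sources.mkString(", ")}")
    println(s"resources: ${resources.mkString(", ")}")
  }
}
```

Because the resource side is defined as the complement of the source side, a file can never land in both groups, which is the same property the excludeFilter in the quoted build.sbt relies on.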