This is an automated email from the ASF dual-hosted git repository.
mbeckerle pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil-site.git
The following commit(s) were added to refs/heads/main by this push:
new da4927b Add more training and best-practice materials
da4927b is described below
commit da4927bfb170c7cd3d44c21671dd80bc4e88e53a
Author: Michael Beckerle <[email protected]>
AuthorDate: Tue Dec 16 13:59:10 2025 -0500
Add more training and best-practice materials
Add best practice note about enums.
Enum symbols should not contain whitespace.
Best practice materials are slide decks mostly.
Add complex type around otherwise anon choice in
examples on best practice page.
Update (partly) Standard profile design note
Overlaps with best practices doc quite a bit.
Not all of this is fixed, but it's only a design note.
DAFFODIL-2998
---
site/best-practices/P-Avoid-Check-Constraints.pdf | Bin 0 -> 187529 bytes
site/best-practices/P-Avoid-Check-Constraints.pptx | Bin 0 -> 261131 bytes
.../P-DFDL-BLOBs-v-HexBinary-array.pdf | Bin 0 -> 138852 bytes
.../P-DFDL-BLOBs-v-HexBinary-array.pptx | Bin 0 -> 255363 bytes
site/best-practices/P-DFDL-Reject-Elements.pdf | Bin 0 -> 96789 bytes
site/best-practices/P-DFDL-Reject-Elements.pptx | Bin 0 -> 253205 bytes
site/best-practices/P-DFDL-Round-Trip-Testing.pdf | Bin 0 -> 141572 bytes
site/best-practices/P-DFDL-Round-Trip-Testing.pptx | Bin 0 -> 323912 bytes
site/best-practices/P-DFDL-Structured-Text.pdf | Bin 0 -> 146772 bytes
site/best-practices/P-DFDL-Structured-Text.pptx | Bin 0 -> 258146 bytes
.../design-notes/Proposed-DFDL-Standard-Profile.md | 43 ++++++----
site/dfdl-best-practices.md | 91 ++++++++++++---------
site/dfdl-extensions.md | 9 ++
site/dfdl-training.md | 8 ++
.../P-DFDL-Properties-lengthKind-bitOrder.pdf | Bin 0 -> 133664 bytes
.../P-DFDL-Properties-lengthKind-bitOrder.pptx | Bin 0 -> 259843 bytes
site/tutorials/P-Filling-vs-Padding-Trimming.pdf | Bin 0 -> 143349 bytes
site/tutorials/P-Filling-vs-Padding-Trimming.pptx | Bin 0 -> 255931 bytes
18 files changed, 98 insertions(+), 53 deletions(-)
diff --git a/site/best-practices/P-Avoid-Check-Constraints.pdf
b/site/best-practices/P-Avoid-Check-Constraints.pdf
new file mode 100755
index 0000000..c30522b
Binary files /dev/null and b/site/best-practices/P-Avoid-Check-Constraints.pdf
differ
diff --git a/site/best-practices/P-Avoid-Check-Constraints.pptx
b/site/best-practices/P-Avoid-Check-Constraints.pptx
new file mode 100755
index 0000000..887b90f
Binary files /dev/null and b/site/best-practices/P-Avoid-Check-Constraints.pptx
differ
diff --git a/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf
b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf
new file mode 100755
index 0000000..d962f89
Binary files /dev/null and
b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf differ
diff --git a/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pptx
b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pptx
new file mode 100755
index 0000000..f47ae01
Binary files /dev/null and
b/site/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pptx differ
diff --git a/site/best-practices/P-DFDL-Reject-Elements.pdf
b/site/best-practices/P-DFDL-Reject-Elements.pdf
new file mode 100755
index 0000000..a5e0bed
Binary files /dev/null and b/site/best-practices/P-DFDL-Reject-Elements.pdf
differ
diff --git a/site/best-practices/P-DFDL-Reject-Elements.pptx
b/site/best-practices/P-DFDL-Reject-Elements.pptx
new file mode 100755
index 0000000..d91cc27
Binary files /dev/null and b/site/best-practices/P-DFDL-Reject-Elements.pptx
differ
diff --git a/site/best-practices/P-DFDL-Round-Trip-Testing.pdf
b/site/best-practices/P-DFDL-Round-Trip-Testing.pdf
new file mode 100755
index 0000000..6521f9e
Binary files /dev/null and b/site/best-practices/P-DFDL-Round-Trip-Testing.pdf
differ
diff --git a/site/best-practices/P-DFDL-Round-Trip-Testing.pptx
b/site/best-practices/P-DFDL-Round-Trip-Testing.pptx
new file mode 100755
index 0000000..af54f24
Binary files /dev/null and b/site/best-practices/P-DFDL-Round-Trip-Testing.pptx
differ
diff --git a/site/best-practices/P-DFDL-Structured-Text.pdf
b/site/best-practices/P-DFDL-Structured-Text.pdf
new file mode 100755
index 0000000..b70ccc3
Binary files /dev/null and b/site/best-practices/P-DFDL-Structured-Text.pdf
differ
diff --git a/site/best-practices/P-DFDL-Structured-Text.pptx
b/site/best-practices/P-DFDL-Structured-Text.pptx
new file mode 100755
index 0000000..a8dfea4
Binary files /dev/null and b/site/best-practices/P-DFDL-Structured-Text.pptx
differ
diff --git a/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md
b/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md
index 6743b35..f4c6cb1 100644
--- a/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md
+++ b/site/dev/design-notes/Proposed-DFDL-Standard-Profile.md
@@ -22,10 +22,26 @@ limitations under the License.
{% endcomment %}
-->
-*Version 0.3 2023-12-08*
+*Version 0.4 2025-12-22*
+
+
+## Table of Contents
+{:.no_toc}
+<!-- The {: .no_toc } excludes the above heading from the ToC -->
+
+1. yes, this is the standard Jekyll way to do a ToC (this line gets removed)
+{:toc}
+<!-- note the above line {:toc} cannot have whitespace at the start -->
+
# Introduction
+> **Note:** This proposed standard profile overlaps a great deal with the
+> [DFDL Schema Best Practices](/dfdl-best-practices) and can be viewed as a
+> mechanism to enforce many of those practices.
+>
+> This page needs to be revised in light of the best practices page.
+
In attempting to integrate Apache Daffodil with other data processing
software, the need to make
DFDL schemas interoperate properly in conjunction with other data models has
arisen.
@@ -40,10 +56,12 @@ structured data.
The following things are allowed in DFDL v1.0, but are difficult to map into
most data models:
-- anonymous choices
-- duplicate element child names
+- [anonymous choices](/dfdl-best-practices#avoidAnonymousChoices)
+- [duplicate element child names](/dfdl-best-practices#AvoidChildElementsWithSameName)
- namespaces that are different, but where the prefixes are not unique
-- global names for element children
+ - There are numerous guidelines about namespaces and avoiding prefixes in the
+ [DFDL Schema Best Practices](/dfdl-best-practices)
+- [global names for element children](/dfdl-best-practices#avoidElementNamespaces)
A more restrictive subset of DFDL, a _standard profile_, is needed which can
be enforced (on
request) to ensure that DFDL schemas will be usable with a variety of data
processing systems.
@@ -55,15 +73,9 @@ standard profile (which is a subset of DFDL).
# Standard Profile Restrictions
-## No Anonymous Choices
-
-Choices must be the model groups of complex type definitions and are not allowed in any other
-context.
+## Group References Cannot Carry DFDL Properties {#groupReferencesCannotCarryDFDLProperties}
-Each choice branch must begin with a different element. (This is already a XML Schema requirement -
-Unique Particle Attribution.)
-
-## Group References Cannot Carry DFDL Properties
+> **Note:** This is not mentioned in the best practices, but should be.
Group references are allowed, but DFDL format properties cannot be expressed
on group references; hence,
combining those properties with those of the group definition is not required.
@@ -82,7 +94,7 @@ Allowing groups and group references reduces the difficulty
of converting many l
schemas to conform to the standard profile, and makes this possible without
introducing many
otherwise unneeded element and type definitions.
-## No Element References
+## No Element References {#noElementReferences}
There is no corresponding form of sharing in most data structure systems.
@@ -97,7 +109,7 @@ All namespace prefixes must be unique in the entire schema.
This enables one to create unique identifiers by concatenating prefix_local to
create global names.
-## All Element Children Have Unique Names
+## All Element Children Have Unique Names {#allElementChildrenHaveUniqueNames}
All children element declarations must have unique names within their
enclosing parent element.
@@ -228,7 +240,7 @@ it, requiring instead that an inner sequence carrying the
assertion or
discriminator with NO child content, be inserted in the sequence at the
point where the evaluation is required to occur.
- Requesting/Enabling the Standard Profile
+# Requesting/Enabling the Standard Profile
If the standard profile is requested, then use of constructs outside of the
standard profile is a
Schema Definition Error.
@@ -281,4 +293,3 @@ Including such an explicitly non-standard-profile schema
into a schema that requ
profile should cause a Schema Definition Error.
The inverse however, is not true.
A schema that explicitly obeys the standard profile can be included/imported
into any schema.
-
diff --git a/site/dfdl-best-practices.md b/site/dfdl-best-practices.md
index a322e7f..567de9b 100644
--- a/site/dfdl-best-practices.md
+++ b/site/dfdl-best-practices.md
@@ -42,9 +42,22 @@ This page is a collection of notes on how to create DFDL
schemas to obtain some
using multiple different _XML Schema Validation libraries_ such as [Xerces
C](
{{ site.data.links.reference.xercesc}}) and [libxml2]({{
site.data.links.reference.libxml2}}).
-The [DFDL Training page lists several example schemas](/dfdl-training#exampleSchemas) which follow
+The [DFDL Training page lists several example schemas](/dfdl-training#exampleSchemas) which follow
this style guide fully which you can use as good starting points.
+There are also best-practice materials on:
+- [Slides on Well-Formed vs. Valid (Avoiding `dfdl:checkConstraints(.)`)](
+/best-practices/P-Avoid-Check-Constraints.pdf)
+- [Slides on Handling large opaque BLOBs of binary data](
+/best-practices/P-DFDL-BLOBs-v-HexBinary-array.pdf)
+- [Slides on Using _Reject Elements_ to capture bad data](
+/best-practices/P-DFDL-Reject-Elements.pdf)
+- [Slides on Round-trip (parse + unparse) testing (with TDML)](
+/best-practices/P-DFDL-Round-Trip-Testing.pdf)
+- [Slides on DFDL Schemas for ad-hoc structured text formats](
+/best-practices/P-DFDL-Structured-Text.pdf)
+
+
This set of notes represents best practices after learning _the hard way_ from
many debugging
exercises and creating a wide variety of DFDL schemas from small teaching
examples to large
production schemas for major data formats with more than 100K lines of DFDL.
@@ -62,7 +75,7 @@ that one might call _Strict Venetian-Blind Type Library_.
Below are the details.
-# Avoid Element Namespaces
+# Avoid Element Namespaces {#avoidElementNamespaces}
Much of the complexity of XML and XML Schema comes from their namespace
features.
This can be avoided entirely by following simple conventions.
@@ -276,7 +289,7 @@ This is not quite as clean, but minimizes redundancy within
what is allowed.
Note that the DFDL Workgroup is considering adding the ability to [put DFDL
properties on complex
types]({{site.data.links.dfdlSpec.issue71}}) in a future version of the DFDL
standard.
-# Avoid Child Elements with the Same Name
+# Avoid Child Elements with the Same Name {#AvoidChildElementsWithSameName}
XML Schema has a data model with some flexibility needed only for markup
languages.
@@ -316,7 +329,7 @@ typical in structured data systems.
JSON also has no notion of child elements with the same name, so avoiding this
enables a
DFDL schema to be JSON compatible.
-# Avoid Anonymous Choices
+# Avoid Anonymous Choices {#avoidAnonymousChoices}
XML Schema allows a choice to be anonymous within the data model of an
element. For example:
```xml
<element name="myElement">
@@ -387,25 +400,27 @@ structures of the other data systems which do not allow
anonymous choices.
Given two different versions of a schema, consider:
```xml
-<choice>
- <element name="v1">
- <complexType>
- <sequence>
- <element name="a" .../>
- <element name="c" type="xs:int" dfdl:length="7"/>
- </sequence>
- </complexType>
- </element>
- <element name="v2">
- <complexType>
- <sequence>
- <element name="b" .../>
- <element name="c" type="xs:int" dfdl:length="6"/>
- <element name="spare" type="xs:unsignedInt" dfdl:length="1"/>
- </sequence>
- </complexType>
- </element>
-</choice>
+<complexType name="v1OrV2">
+ <choice>
+ <element name="v1">
+ <complexType>
+ <sequence>
+ <element name="a" .../>
+ <element name="c" type="xs:int" dfdl:length="7"/>
+ </sequence>
+ </complexType>
+ </element>
+ <element name="v2">
+ <complexType>
+ <sequence>
+ <element name="b" .../>
+ <element name="c" type="xs:int" dfdl:length="6"/>
+ <element name="spare" type="xs:unsignedInt" dfdl:length="1"/>
+ </sequence>
+ </complexType>
+ </element>
+ </choice>
+</complexType>
```
Note both versions 1 and 2 have a child named `c` which is an `xs:int`.
@@ -415,19 +430,21 @@ The two differ only by a DFDL property (`dfdl:length`).
Consider instead using this technique:
```xml
-<choice>
- <sequence>
- <element name="v1" type="pre:empty"/>
- <element name="a" .../>
- <element name="c" type="xs:int" dfdl:length="7"/>
- </sequence>
- <sequence>
- <element name="v2" type="pre:empty"/>
- <element name="b" .../>
- <element name="c" type="xs:int" dfdl:length="6"/>
- <element name="spare" type="xs:unsignedInt" dfdl:length="1"/>
- </sequence>
-</choice>
+<complexType name="v1OrV2">
+ <choice>
+ <sequence>
+ <element name="v1" type="pre:empty"/>
+ <element name="a" .../>
+ <element name="c" type="xs:int" dfdl:length="7"/>
+ </sequence>
+ <sequence>
+ <element name="v2" type="pre:empty"/>
+ <element name="b" .../>
+ <element name="c" type="xs:int" dfdl:length="6"/>
+ <element name="spare" type="xs:unsignedInt" dfdl:length="1"/>
+ </sequence>
+ </choice>
+</complexType>
```
This uses a marker element which will be `<v1/>` or `<v2/>` before the other
elements.
A path to the `c` element will not have a `v1` nor `v2` element parent.
@@ -528,7 +545,7 @@ are small.
>
> ### About Spec Deltas
>
-> A deltas between two versions of a format specification document can be classified as one of
+> A delta between two versions of a format specification document can be classified as one of
> these kinds:
> 1. Prose Correction: A clarification or correction to the text of the
> document that improves it,
> but does not represent any actual change to the data format.
diff --git a/site/dfdl-extensions.md b/site/dfdl-extensions.md
index c4d97fe..b152513 100644
--- a/site/dfdl-extensions.md
+++ b/site/dfdl-extensions.md
@@ -396,6 +396,15 @@ different reserved values since when unparsed, the
constant string `Reserved` wi
_canonicalized_ to integer 0.
Putting data into canonical form when unparsing generally improves data
security.
+> **Best Practices Note:** Avoid whitespace of any kind in enumerated constant values.
+> It is best to replace spaces by underscores ("_").
+> This avoids problems when the infoset, represented in XML, is pretty printed or otherwise
+> formatted.
+> Whitespace is generally fungible in XML, and a space could be turned into a line
+> break by a variety of XML processing resulting in data that will
+> not validate (as an XML document) nor unparse successfully.
+
+
# Extended Behaviors for DFDL Types
## Type ``xs:hexBinary``
diff --git a/site/dfdl-training.md b/site/dfdl-training.md
index 855d57d..d8c02df 100644
--- a/site/dfdl-training.md
+++ b/site/dfdl-training.md
@@ -202,6 +202,14 @@ showcasing:
- Multi-version support - this schema handles both revisions C and D1 of the
format
simultaneously.
+# Specific DFDL Properties
+Short training slide decks or pages about specific properties.
+- [DFDL `lengthKind`, `lengthUnits`, `bitOrder`, and `byteOrder` properties](
+/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf)
+- [DFDL Pad and Fill (`dfdl:fillByte`)](
+/tutorials/P-Filling-vs-Padding-Trimming.pdf)
+
+
# Other Learning Resources
There are a variety of other materials on the Internet that provide some DFDL
training:
diff --git a/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf
b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf
new file mode 100755
index 0000000..45ce110
Binary files /dev/null and
b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pdf differ
diff --git a/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pptx
b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pptx
new file mode 100755
index 0000000..716d536
Binary files /dev/null and
b/site/tutorials/P-DFDL-Properties-lengthKind-bitOrder.pptx differ
diff --git a/site/tutorials/P-Filling-vs-Padding-Trimming.pdf
b/site/tutorials/P-Filling-vs-Padding-Trimming.pdf
new file mode 100755
index 0000000..481d34e
Binary files /dev/null and b/site/tutorials/P-Filling-vs-Padding-Trimming.pdf
differ
diff --git a/site/tutorials/P-Filling-vs-Padding-Trimming.pptx
b/site/tutorials/P-Filling-vs-Padding-Trimming.pptx
new file mode 100755
index 0000000..3402b2d
Binary files /dev/null and b/site/tutorials/P-Filling-vs-Padding-Trimming.pptx
differ