stevedlawrence commented on code in PR #193:
URL: https://github.com/apache/daffodil-site/pull/193#discussion_r2465690243
##########
site/_pandoc/Makefile:
##########
@@ -0,0 +1,63 @@
+# ==========================================================
+# Pandoc PDF generator for Jekyll site
+# Scans Markdown files with "pdf: true" in YAML front matter
+# and produces PDFs in the site's ./pdf/ directory
+# ==========================================================
+
+# --- Configuration ---
+SITE_ROOT := ..
+AWK_UNWRAP := $(SITE_ROOT)/_pandoc/unwrap-pandoc.awk
+AWK_LIST := $(SITE_ROOT)/_pandoc/list-pdf-sources.awk
+PANDOC := pandoc
+
+# Output directory for generated PDFs (at site root)
+PDF_OUTDIR := $(SITE_ROOT)/pdf
+
+DEFAULTS := $(SITE_ROOT)/_pandoc/basic.yaml
+
+# --- Candidate Markdown files (exclude build/tool/output dirs) ---
+# Use find + awk pipeline - awk -f avoids executable bit.
+MD_CANDIDATES := $(shell find $(SITE_ROOT) \
+ -type f -name '*.md' \
+ -not -path '*/_*/*' \
+ -not -path '*/node_modules/*' \
+ -not -path '*/vendor/*' \
+ -not -path '*/pdf/*' \
+ -print0 | xargs -0 -r awk -f $(AWK_LIST))
Review Comment:
We don't have a vendor directory, and I think the pdf directory shouldn't
contain any .md files? And I think node_modules is in the repo root, not the
site root, so it will be ignored anyway. Also, `_*` directories
should never have pdf: true in them. All this to say, it feels like we could
simplify this quite a bit and just use grep, e.g.:
```make
MD_CANDIDATES := $(shell grep -Rl --include='*.md' '^pdf: true$$' $(SITE_ROOT))
```
It's not as relaxed about whitespace as the awk/sed script, and it doesn't
even require pdf: true to be in the header, but I think that's probably fine; I
doubt we'll ever have a single line with just "pdf: true" on it anywhere else.
It is also much easier to maintain than the awk script.
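To make the trade-off concrete, here is a minimal sketch of how the grep-based discovery behaves on a throwaway tree. The directory and file names are made up for illustration and do not exist in the site; only the grep options (`-R`, `-l`, `--include`, as in GNU grep) are real.

```shell
#!/bin/bash
# Hypothetical demo tree; none of these paths exist in the real site.
demo=$(mktemp -d)
mkdir -p "$demo/docs"
printf -- '---\ntitle: A\npdf: true\n---\nbody\n'    > "$demo/a.md"
printf -- '---\ntitle: B\npdf:   true \n---\nbody\n' > "$demo/docs/b.md"  # extra whitespace
printf -- '---\ntitle: C\npdf: false\n---\nbody\n'   > "$demo/docs/c.md"
printf -- 'pdf: true\n'                              > "$demo/notes.txt"  # not Markdown

# -R recurses, -l prints only file names, --include limits the scan to *.md
found=$(cd "$demo" && grep -Rl --include='*.md' '^pdf: true$' . | sort)
echo "$found"
rm -rf "$demo"
```

Only `./a.md` is listed: the strict pattern skips `b.md` because of its extra whitespace, which is the relaxed-matching behavior the awk script has and plain grep gives up. When the same pattern goes inside a Makefile `$(shell ...)` call, the `$` has to be doubled so make does not expand it.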
##########
site/dfdl-extensions.md:
##########
@@ -21,38 +22,60 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
+<!--
+The :target="_blank" syntax below makes this open in a new tab
+and work in the PDF and jekyll web page.
+But displays as literal text in the IDE markdown previewer.
+-->
+<div class="only-jekyll" markdown="1">
+_This page is available as a [downloadable
PDF](../pdf/dfdl-extensions.pdf){:target="_blank"}._
Review Comment:
Wondering if we can move this logic somewhere else so we don't have to
duplicate it every time we mark a page as pdf: true. Maybe this is rare enough
that it's not necessary? But for example, you could put something like this
in _navigation.html.
```
{% if page.pdf == true %}
<div><i>This page is available as a <a href="../pdf/{{ page.title }}.pdf"
target="_blank">downloadable PDF</a></i></div>
{% endif %}
```
##########
site/_pandoc/list-pdf-sources.awk:
##########
@@ -0,0 +1,59 @@
+#!/usr/bin/awk -f
+# Prints FILENAME iff the file has YAML front matter with "pdf: true".
+# - Must be called with filenames (works via find/xargs).
+# - Ignores matches outside the front matter.
+# - Front matter is the lines between the first '---' and the next '---'.
+
+# We process each file independently.
+# Use a per-file BEGINFILE block if available (GNU awk). Otherwise reset on
first record.
+BEGIN {
+ have_beginfile = 0
+}
+BEGINFILE {
+ have_beginfile = 1
+ in_front = 0
+ seen_start = 0
+ want_pdf = 0
+}
+
+# For non-GNU awk compatibility:
+# Reset on the first line of each file if BEGINFILE isn't supported.
+FNR == 1 && !have_beginfile {
+ in_front = 0
+ seen_start = 0
+ want_pdf = 0
+}
+
+{
+ # Detect start of front matter
+ if (!seen_start) {
+ if ($0 ~ /^[[:space:]]*---[[:space:]]*$/) {
+ seen_start = 1
+ in_front = 1
+ next
+ } else {
+ # No front matter: skip file
+ nextfile
+ }
+ }
+
+ # If in front matter, look for end and for pdf:true
+ if (in_front) {
+ if ($0 ~ /^[[:space:]]*---[[:space:]]*$/) {
+ # End of front matter
+ in_front = 0
+ if (want_pdf) {
+ print FILENAME
+ }
+ nextfile
+ }
+ # Match "pdf: true" allowing spaces; ensure it's a key at line start
+ if ($0 ~ /^[[:space:]]*pdf:[[:space:]]*true([[:space:]]|$)/) {
+ want_pdf = 1
+ }
+ next
+ }
+
+ # If we got here, we've passed front matter without finding pdf:true
+ nextfile
+}
Review Comment:
Thoughts on a simple sed/bash script?
```bash
#!/bin/bash
for FILE in "$@"; do
sed -n '/^---$/,/^---$/p' "$FILE" |
grep -Eq '^[[:space:]]*pdf:[[:space:]]*true[[:space:]]*$' && echo "$FILE"
done
```
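One thing to keep in mind with the sed range above: `/^---$/,/^---$/` starts again at any later `---` line, so a pair of horizontal rules in the page body would also be scanned. Below is a hedged sketch of a variant that stops at the end of the front matter. The function names `front_matter` and `has_pdf_true` are hypothetical, and the sed one-liner assumes GNU sed.

```shell
#!/bin/bash
# Print only the front matter: the lines between a '---' on line 1
# and the next '---'. Prints nothing when line 1 is not '---'.
front_matter() {
  sed -n '1{/^---$/!q;d}; /^---$/q; p' "$1"
}

# Succeed when the front matter contains a "pdf: true" key.
has_pdf_true() {
  front_matter "$1" |
    grep -Eq '^[[:space:]]*pdf:[[:space:]]*true[[:space:]]*$'
}

# Same calling convention as the script above: file names as arguments.
for FILE in "$@"; do
  if has_pdf_true "$FILE"; then
    echo "$FILE"
  fi
done
```

This keeps the script about as short as the original suggestion while ignoring a `pdf: true` line that appears outside the front matter.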
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# ๐งญ Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## ๐๏ธ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## ๐งฉ How It Works
Review Comment:
Can we remove these emojis? I don't think they add anything; if anything,
they make things more confusing.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# ๐งญ Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## ๐๏ธ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
Review Comment:
These don't exist, can we remove these?
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# ๐งญ Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## ๐๏ธ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
Review Comment:
We don't have config.yml or _post or _pages. Suggest we just remove this as
it doesn't provide anything useful.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# ๐งญ Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## ๐๏ธ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## ๐งฉ How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## ๐งฎ Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## ๐งฐ How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees
it.
+
+---
Review Comment:
I think the above section kind of already mentions pandoc:start/end.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# ๐งญ Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## ๐๏ธ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## ๐งฉ How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## ๐งฎ Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## ๐งฐ How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees
it.
+
+---
+
+## โ๏ธ Customizing Pandoc
+
+Pandoc is run with `--defaults=basic.yaml` which specifies the
`template_basic.tex` is used.
+The template can be modified to change the PDF output.
+
+---
+
+## ๐งฑ Recommended Workflow
+
+1. Write Markdown pages normally for your Jekyll site.
+2. When you also want a PDF version, add `pdf: true` to front matter.
+3. If needed, wrap LaTeX-specific content in `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` blocks.
+4. From `_pandoc/`, run:
+ ```bash
+ make
+ ```
+5. Find the generated PDFs in `pdf/`.
+
+---
+
+## ๐ช Why This Setup Works
+
+| Concern | Solution |
+|----------|-----------|
+| Jekyll shouldn't see LaTeX | Hidden in HTML comments |
+| Pandoc must see LaTeX | AWK removes wrappers |
+| Need automatic PDF generation | Makefile scans for `pdf: true` |
+| Keep tools separate | Everything lives in `_pandoc/` |
+
+---
Review Comment:
Suggest we remove this section.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# ๐งญ Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## ๐๏ธ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## ๐งฉ How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## ๐งฎ Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## ๐งฐ How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees
it.
+
+---
+
+## โ๏ธ Customizing Pandoc
+
+Pandoc is run with `--defaults=basic.yaml` which specifies the
`template_basic.tex` is used.
+The template can be modified to change the PDF output.
+
+---
+
+## ๐งฑ Recommended Workflow
+
+1. Write Markdown pages normally for your Jekyll site.
+2. When you also want a PDF version, add `pdf: true` to front matter.
+3. If needed, wrap LaTeX-specific content in `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` blocks.
+4. From `_pandoc/`, run:
+ ```bash
+ make
+ ```
+5. Find the generated PDFs in `pdf/`.
+
+---
+
+## ๐ช Why This Setup Works
+
+| Concern | Solution |
+|----------|-----------|
+| Jekyll shouldn't see LaTeX | Hidden in HTML comments |
+| Pandoc must see LaTeX | AWK removes wrappers |
+| Need automatic PDF generation | Makefile scans for `pdf: true` |
+| Keep tools separate | Everything lives in `_pandoc/` |
+
+---
+
+## ๐งพ Example Output
+
+```
+pdf/
+├── about.pdf
+└── _posts/
+    └── 2025-01-01-example.pdf
+```
+
+---
+
+**Maintainer Notes**
+
+- `_pandoc/Makefile` assumes it's run from `_pandoc/`, with site root as `..`
+- Pandoc and AWK must be available on your `PATH`
+
+---
+
+## Pandoc Tools Installation
+
+These tools run on Linux.
+
+On Ubuntu you have to install these things:
+
+ sudo apt install pandoc texlive-latex-base texlive-latex-recommended \
+ texlive-fonts-recommended texlive-xetex texlive-latex-extra
+
+I have found one must update pandoc to a more up to date version.
+This is currently dependent on pandoc 3.7.0.2 which can be downloaded from
+https://github.com/jgm/pandoc/releases/tag/3.7.0.2 .
Review Comment:
Does this really depend on 3.7? I don't think we should rely on devs needing
to install newer versions from source. Fedora only ships with 3.1; I suggest we
avoid modern features to make this easier to test.
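If a minimum version does turn out to be needed, a small check along these lines could report it up front rather than failing partway through a build. This is only a sketch: `version_ge` and `MIN_PANDOC` are hypothetical names, the 3.1 floor is just the Fedora version mentioned above, and `sort -V` is a GNU coreutils extension.

```shell
#!/bin/bash
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

MIN_PANDOC="3.1"  # assumption: the version Fedora is said to ship

if command -v pandoc >/dev/null 2>&1; then
  # The first line of `pandoc --version` looks like "pandoc 3.1.3".
  have=$(pandoc --version | head -n1 | awk '{print $2}')
  if version_ge "$have" "$MIN_PANDOC"; then
    echo "pandoc $have is new enough (>= $MIN_PANDOC)"
  else
    echo "pandoc $have is older than $MIN_PANDOC" >&2
  fi
else
  echo "pandoc not found on PATH" >&2
fi
```

The comparison itself does not need pandoc installed, so `version_ge` can be exercised on its own with plain version strings.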
##########
site/dfdl-extensions.md:
##########
@@ -21,38 +22,60 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
+<!--
+The :target="_blank" syntax below makes this open in a new tab
+and work in the PDF and jekyll web page.
+But displays as literal text in the IDE markdown previewer.
+-->
+<div class="only-jekyll" markdown="1">
+_This page is available as a [downloadable
PDF](../pdf/dfdl-extensions.pdf){:target="_blank"}._
+
+### Table of Contents
+{:.no_toc}
+
+1. use ordered table of contents
+{:toc}
+</div>
+
+<div class="only-pandoc" markdown="1">
+# Introduction
+</div>
Review Comment:
Should we just always have the Introduction header? The fewer differences
between PDF and website the better.
##########
site/dfdl-extensions.md:
##########
@@ -21,38 +22,60 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
+<!--
+The :target="_blank" syntax below makes this open in a new tab
+and work in the PDF and jekyll web page.
+But displays as literal text in the IDE markdown previewer.
+-->
+<div class="only-jekyll" markdown="1">
+_This page is available as a [downloadable
PDF](../pdf/dfdl-extensions.pdf){:target="_blank"}._
+
+### Table of Contents
+{:.no_toc}
+
+1. use ordered table of contents
+{:toc}
+</div>
+
+<div class="only-pandoc" markdown="1">
+# Introduction
+</div>
Daffodil provides extensions to the DFDL specification.
-These properties are in the namespace defined by the URI
+These functions and properties are in the namespace defined by the URI
``http://www.ogf.org/dfdl/dfdl-1.0/extensions`` which is normally bound to the
``dfdlx`` prefix
like so:
``` xml
-<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
- xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
- xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+<schema xmlns="http://www.w3.org/2001/XMLSchema"
+ xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+ xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
>
```
-The following symbols defined in this namespace are described below.
+The DFDL language extensions described below have Long Term Support (LTS) in
Daffodil
+going forward, and are proposed for inclusion in a future revision of the DFDL
+standard.
+DFDL schema authors can depend on the features and behaviors defined here
without fear
+that these extensions will be withdrawn in the future.
-### Expression Functions
+# Expression Functions
Review Comment:
I think there was a reason we used the `###`. That said, we should probably
figure out why and change things. It is pretty annoying that our .md pages can't
use normal headings. I'll look into this and see if I can figure out the
reasoning and if we can change it.
##########
site/dfdl-extensions.md:
##########
@@ -87,46 +110,112 @@ found after fields `a` and `b`:
<xs:element name="tag" type="xs:int" dfdl:length="8" />
```
-Bitwise Functions
+## Bitwise Functions: `bitAnd`, `bitOr`, `bitXor`, `bitNot`, `leftShift`,
`rightShift`
+
+These functions are defined on types `long`, `int`, `short`, `byte`,
`unsignedLong`,
+`unsignedInt`, `unsignedShort`, and `unsignedByte`
+
+### `dfdlx:bitAnd(arg1, arg2)`
+
+This computes the bitwise AND of two integers.
+
+- Both arguments must be signed, or both must be unsigned.
+- If the two arguments are not the same type the smaller one is converted into
the type of the
+larger one.
+- If the smaller argument is signed, this conversion does sign-extension.
+- The result type is the that of the largest argument.
+
+### `dfdlx:bitOr(arg1, arg2)`
+
+This computes the bitwise OR of two integers.
+
+- Both arguments must be signed, or both must be unsigned.
+- If the two arguments are not the same type the smaller one is converted into
the type of the
+larger one.
+- If the smaller argument is signed, this conversion does sign-extension.
+- The result type is the that of the largest argument.
+
+### `dfdlx:bitXor(arg1, arg2)`
+
+This computes the bitwise Exclusive OR of two integers.
+
+- Both arguments must be signed, or both must be unsigned.
+- If the two arguments are not the same type the smaller one is converted into
the type of the
+larger one.
+- If the smaller argument is signed, this conversion does sign-extension.
+- The result type is the that of the largest argument.
+
+### `dfdlx:bitNot(arg)`
+
+This computes the bitwise NOT of an integer. Every bit is inverted. The result
type is the same
+as the argument type.
+
+### `dfdlx:leftShift(value, shiftCount)`
+
+This is the _logical_ shift left, meaning that bits are shifted from
less-significant positions
+to more-significant positions.
+
+- The left-most bits shifted out are discarded.
+- Zeros are shifted in for the right-most bits.
+- The result type is the same as the `value` argument type.
+- It is a processing error if the `shiftCount` argument is < 0.
+- It is a processing error if the `shiftCount` argument is greater than the
number of
+ bits in the type of the value argument.
+
+### `dfdlx:rightShift(value, shiftCount)`
+
+This is the _arithmetic_ shift right, meaning bits move from most-significant
to
+less-significant positions.
+If _logical_ (zero-filling) shift right is needed, you must use unsigned types.
+
+- The `value` argument is shifted by the `shiftCount`.
+- The right-most bits shifted out are discarded.
+- If the `value` is signed, then the sign bit is shifted in for the left-most
bits.
+- If the `value` is unsigned, then zeros are shifted in for the left-most
bits.
+- The result type is the same as the `value` argument type.
+- It is a processing error if the `shiftCount` argument is < 0.
+- It is a processing error if the `shiftCount` argument is greater than the
number of
+ bits in the type of the value argument.
+
+## `dfdlx:doubleFromRawLong(longArg): double` and
`dfdlx:doubleToRawLong(doubleArg): long`
- : TBD, but the complete list (all ``dfdlx``) is `BitAnd`, `BitNot`,
`BitOr`, `BitXor`, `LeftShift`,
- `RightShift`
+IEEE binary float and double values that are not NaN will parse to base 10
text and unparse back
+to the same exact IEEE binary bits.
+However, the same cannot be said for NaN (not a number) values, of which there
are many bit
+patterns.
+To preserve float and double NaN values bit for bit you can use these
functions to compute
+`xs:long` values that enable the DFDL Infoset to preserve the bits of a float
or double value
+even if it is a NaN.
-``dfdlx:doubleFromRawLong`` and ``dfdlx:doubleToRawLong``
- : Converting binary floating point numbers to/from base 10 text can result
in lost information.
-The base 10 representation, converted back to binary representation, may not
be bit-for-bit
- identical. These functions can be used to carry 8-byte double precision
IEEE floating point
- numbers as type `xs:long` so that no information is lost. The DFDL schema
can still obtain
- and operate on the floating point value by converting these `xs:long`
values into type
- `xs:double`, and back if necessary for unparsing a new value.
-### Properties
+# Properties
-``dfdlx:parseUnparsePolicy``
+## `dfdlx:parseUnparsePolicy`
- : A property applied to simple and complex elements, which specifies
whether the element supports only parsing, only unparsing, or both parsing and
unparse. Valid values for this property are ``parse``, ``unparse``, or
``both``. This allows one to leave off properties that are required for only
parse or only unparse, such as ``dfdl:outputValueCalc`` or
``dfdl:outputNewLine``, so that one may have a valid schema if only a subset of
functionality is needed.
+A property applied to simple and complex elements, which specifies whether the
element supports only parsing, only unparsing, or both parsing and unparse.
Valid values for this property are ``parse``, ``unparse``, or ``both``. This
allows one to leave off properties that are required for only parse or only
unparse, such as ``dfdl:outputValueCalc`` or ``dfdl:outputNewLine``, so that
one may have a valid schema if only a subset of functionality is needed.
- All elements must have a compatible parseUnparsePolicy with the
compilation parseUnparsePolicy (which is defined by the root element
daf:parseUnparsePolicy and/or the Daffodil parseUnparsePolicy tunable) or it is
a Schema Definition Error. An element is defined to have a compatible
parseUnparsePolicy if it has the same value as the compilation
parseUnparsePolicy or if it has the value ``both``.
+All elements must have a compatible parseUnparsePolicy with the compilation
parseUnparsePolicy (which is defined by the root element daf:parseUnparsePolicy
and/or the Daffodil parseUnparsePolicy tunable) or it is a Schema Definition
Error. An element is defined to have a compatible parseUnparsePolicy if it has
the same value as the compilation parseUnparsePolicy or if it has the value
``both``.
- For compatibility, if this property is not defined, it is assumed to be
``both``.
+For compatibility, if this property is not defined, it is assumed to be
``both``.
-``dfdlx:layer``
+## `dfdlx:layer`
- : [Layers](/layers) provide algorithmic capabilities for decoding/encoding
data or computing
+_Layers_ provide algorithmic capabilities for decoding/encoding data or
computing
checksums. Some are built-in to Daffodil. New layers can be created in
Java/Scala and
plugged-in to Daffodil dynamically.
+There is [separate Layer documentation](/layers).
-``dfdlx:direction``
+## `dfdlx:direction`
- : TBD
+This property has
Review Comment:
Should probably stay TBD until documented.
##########
site/_pandoc/only.lua:
##########
@@ -0,0 +1,60 @@
+-- only.lua: drop .only-jekyll, keep contents of .only-pandoc
+-- Handles both native Div/Span nodes and raw HTML <div> wrappers.
+
+local List = require 'pandoc.List'
+
+local function has_class(classes, cls)
+ return classes and List.includes(classes, cls)
+end
+
+-- Native block divs (Pandoc recognized <div class="..."> as Div)
+function Div(el)
+ if has_class(el.classes, 'only-jekyll') then
+ return {} -- drop entirely
+ elseif has_class(el.classes, 'only-pandoc') then
+ return el.content -- unwrap: keep inner blocks
+ end
+end
+
+-- Native inline spans
+function Span(el)
+ if has_class(el.classes, 'only-jekyll') then
+ return {}
+ elseif has_class(el.classes, 'only-pandoc') then
+ return el.content
+ end
+end
+
+-- Fallback for raw HTML wrappers when Pandoc didn't turn them into Divs.
+function Pandoc(doc)
+ local out = List()
+ local mode = nil -- nil | 'drop' | 'keep'
+
+ local function is_open_of(txt, klass)
+ -- match <div ... class="... klass ...">
+ return txt:match('<div[^>]-class=[\'"][^\'"]-' .. klass .. '[^\'"]-[\'"]')
+ end
+
+ for _, blk in ipairs(doc.blocks) do
+ if blk.t == 'RawBlock' and blk.format:match('html') then
+ local t = blk.text
+ if is_open_of(t, 'only%-jekyll') then
+ mode = 'drop' -- drop wrapper and its inner content
+ elseif is_open_of(t, 'only%-pandoc') then
+ mode = 'keep' -- drop wrapper, keep inner content
+ elseif t:match('</div>') and mode ~= nil then
+ mode = nil
+ else
+ if not mode or mode == 'keep' then out:insert(blk) end
+ end
+ else
+ if not mode then
+ out:insert(blk)
+ elseif mode == 'keep' then
+ out:insert(blk)
+ end
+ end
+ end
+
+ return pandoc.Pandoc(out, doc.meta)
+end
Review Comment:
What is the difference between these .only-pandoc and the PANDOC:START/END
things? Seems like they are two different ways to do the same thing?
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# ๐งญ Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## ๐๏ธ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
Review Comment:
Is this right? Shouldn't this be `make ../pdf/about.pdf`?
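For reference, a minimal shell sketch of how I read the `patsubst` rule in the
Makefile (`$(SITE_ROOT)/%.md` to `$(PDF_OUTDIR)/%.pdf`). The `pdf_target`
helper and the `about.md` page are hypothetical, used only for illustration:

```bash
# Emulates the Makefile's patsubst mapping in plain shell.
# SITE_ROOT and PDF_OUTDIR mirror the Makefile variables.
SITE_ROOT=..
PDF_OUTDIR="$SITE_ROOT/pdf"

pdf_target() {
  # Strip the site-root prefix and the .md suffix, then re-root under PDF_OUTDIR.
  rel="${1#"$SITE_ROOT"/}"
  printf '%s/%s.pdf\n' "$PDF_OUTDIR" "${rel%.md}"
}

pdf_target "$SITE_ROOT/about.md"   # prints ../pdf/about.pdf
```

If that reading is right, the target to pass to `make` lives under `../pdf/`.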
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
Review Comment:
I thought they go in the `../pdf` directory?
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
Review Comment:
Do we need this? I don't think we actually use LaTeX anywhere. I would also
like to avoid anything that makes our PDF files differ from our Markdown. It
seems like pandoc should be able to convert our normal Markdown files to PDF
without a problem. It would be nice to remove all the preprocessor stuff if we
don't really use it.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
Review Comment:
We don't have _posts or _pages; I suggest we just say `.md` files.
##########
site/pdf/dfdl-extensions.pdf:
##########
Review Comment:
We should not commit changes to these PDF files. Instead, we should modify
the build/publish CI tool to rebuild the PDF files and commit them along with
site changes.
##########
site/_pandoc/Makefile:
##########
@@ -0,0 +1,63 @@
+# ==========================================================
+# Pandoc PDF generator for Jekyll site
+# Scans Markdown files with "pdf: true" in YAML front matter
+# and produces PDFs in the site's ./pdf/ directory
+# ==========================================================
+
+# --- Configuration ---
+SITE_ROOT := ..
+AWK_UNWRAP := $(SITE_ROOT)/_pandoc/unwrap-pandoc.awk
+AWK_LIST := $(SITE_ROOT)/_pandoc/list-pdf-sources.awk
+PANDOC := pandoc
+
+# Output directory for generated PDFs (at site root)
+PDF_OUTDIR := $(SITE_ROOT)/pdf
+
+DEFAULTS := $(SITE_ROOT)/_pandoc/basic.yaml
+
+# --- Candidate Markdown files (exclude build/tool/output dirs) ---
+# Use find + awk pipeline; awk -f avoids executable bit.
+MD_CANDIDATES := $(shell find $(SITE_ROOT) \
+ -type f -name '*.md' \
+ -not -path '*/_*/*' \
+ -not -path '*/node_modules/*' \
+ -not -path '*/vendor/*' \
+ -not -path '*/pdf/*' \
+ -print0 | xargs -0 -r awk -f $(AWK_LIST))
+
+# --- Files to build ---
+PDF_SRCS := $(MD_CANDIDATES)
+PDFS := $(patsubst $(SITE_ROOT)/%.md,$(PDF_OUTDIR)/%.pdf,$(PDF_SRCS))
+
Review Comment:
I wonder how useful it is to have a single PDF per page. For example, if
someone wants to create a PDF of a page, they can just print it to a PDF.
It feels like more useful documentation would combine chosen pages into
a single large PDF. That way users could download it and have a single
offline source for all Daffodil-related things. That might simplify much of
the logic too. For example, we would have just one navigation link to download
offline documentation, rather than a bunch of pages with individual links.
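A rough sketch of what I mean, assuming we keep the `pdf: true` marker to
choose pages. The function names and the `daffodil-docs.pdf` output name are
made up for illustration; the only pandoc behavior relied on is that it
concatenates multiple input files into one document:

```bash
# Hypothetical sketch: collect every page marked "pdf: true" and build one PDF.
SITE_ROOT=${SITE_ROOT:-..}

list_pdf_sources() {
  # -R recurses, -l prints matching file names only; sort gives a stable order.
  grep -Rl --include='*.md' '^pdf: true$' "$1" | sort
}

build_combined_pdf() {
  # pandoc treats multiple inputs as one concatenated document.
  # Caveat: the unquoted substitution breaks on paths containing whitespace.
  pandoc $(list_pdf_sources "$1") -o "$2"
}

# Usage: build_combined_pdf "$SITE_ROOT" "$SITE_ROOT/pdf/daffodil-docs.pdf"
```

Alphabetical order is probably not the reading order we want, so a real
version would likely need an explicit page list instead of `sort`.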
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.