mbeckerle commented on code in PR #193:
URL: https://github.com/apache/daffodil-site/pull/193#discussion_r2469791351
##########
site/_pandoc/list-pdf-sources.awk:
##########
@@ -0,0 +1,59 @@
+#!/usr/bin/awk -f
+# Prints FILENAME iff the file has YAML front matter with "pdf: true".
+# - Must be called with filenames (works via find/xargs).
+# - Ignores matches outside the front matter.
+# - Front matter is the lines between the first '---' and the next '---'.
+
+# We process each file independently.
+# Use a per-file BEGINFILE block if available (GNU awk). Otherwise reset on first record.
+BEGIN {
+ have_beginfile = 0
+}
+BEGINFILE {
+ have_beginfile = 1
+ in_front = 0
+ seen_start = 0
+ want_pdf = 0
+}
+
+# For non-GNU awk compatibility:
+# Reset on the first line of each file if BEGINFILE isn't supported.
+FNR == 1 && !have_beginfile {
+ in_front = 0
+ seen_start = 0
+ want_pdf = 0
+}
+
+{
+ # Detect start of front matter
+ if (!seen_start) {
+ if ($0 ~ /^[[:space:]]*---[[:space:]]*$/) {
+ seen_start = 1
+ in_front = 1
+ next
+ } else {
+ # No front matter: skip file
+ nextfile
+ }
+ }
+
+ # If in front matter, look for end and for pdf:true
+ if (in_front) {
+ if ($0 ~ /^[[:space:]]*---[[:space:]]*$/) {
+ # End of front matter
+ in_front = 0
+ if (want_pdf) {
+ print FILENAME
+ }
+ nextfile
+ }
+ # Match "pdf: true" allowing spaces; ensure it's a key at line start
+ if ($0 ~ /^[[:space:]]*pdf:[[:space:]]*true([[:space:]]|$)/) {
+ want_pdf = 1
+ }
+ next
+ }
+
+ # If we got here, we've passed front matter without finding pdf:true
+ nextfile
+}
Review Comment:
The one-liner grep in the prior comment supersedes any need for this script,
whether as awk or bash.
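The one-liner itself is not quoted in this message. As a rough illustration only, a grep-based check might look like the sketch below. The function name `list_pdf_sources` and the 20-line window are assumptions made for this example, not part of the PR. Unlike the awk script, it does not verify that the match lies between the `---` delimiters; it only looks near the top of each file.

```shell
# Sketch: print each file whose first lines contain a "pdf: true" key.
# Assumption: front matter starts at line 1 and fits in 20 lines
# (hypothetical limit chosen for this example).
list_pdf_sources() {
  for f in "$@"; do
    # grep -q only reports success or failure; -E enables extended regex.
    if head -n 20 "$f" | grep -Eq '^pdf:[[:space:]]*true[[:space:]]*$'; then
      printf '%s\n' "$f"
    fi
  done
}
```

From `_pandoc/` it could be called as `list_pdf_sources ../*.md`.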
##########
site/_pandoc/Makefile:
##########
@@ -0,0 +1,63 @@
+# ==========================================================
+# Pandoc PDF generator for Jekyll site
+# Scans Markdown files with "pdf: true" in YAML front matter
+# and produces PDFs in the site's ./pdf/ directory
+# ==========================================================
+
+# --- Configuration ---
+SITE_ROOT := ..
+AWK_UNWRAP := $(SITE_ROOT)/_pandoc/unwrap-pandoc.awk
+AWK_LIST := $(SITE_ROOT)/_pandoc/list-pdf-sources.awk
+PANDOC := pandoc
+
+# Output directory for generated PDFs (at site root)
+PDF_OUTDIR := $(SITE_ROOT)/pdf
+
+DEFAULTS := $(SITE_ROOT)/_pandoc/basic.yaml
+
+# --- Candidate Markdown files (exclude build/tool/output dirs) ---
+# Use find + awk pipeline; awk -f avoids executable bit.
+MD_CANDIDATES := $(shell find $(SITE_ROOT) \
+ -type f -name '*.md' \
+ -not -path '*/_*/*' \
+ -not -path '*/node_modules/*' \
+ -not -path '*/vendor/*' \
+ -not -path '*/pdf/*' \
+ -print0 | xargs -0 -r awk -f $(AWK_LIST))
Review Comment:
Fixed
##########
site/dfdl-extensions.md:
##########
@@ -21,38 +22,60 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
+<!--
+The :target="_blank" syntax below makes this open in a new tab
+and work in the PDF and jekyll web page.
+But displays as literal text in the IDE markdown previewer.
+-->
+<div class="only-jekyll" markdown="1">
+_This page is available as a [downloadable
PDF](../pdf/dfdl-extensions.pdf){:target="_blank"}._
+
+### Table of Contents
+{:.no_toc}
+
+1. use ordered table of contents
+{:toc}
+</div>
+
+<div class="only-pandoc" markdown="1">
+# Introduction
+</div>
Review Comment:
Done.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
Review Comment:
Updated
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
Review Comment:
fixed
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## 🧰 How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees
it.
+
+---
Review Comment:
removed section
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
Review Comment:
fixed
##########
site/dfdl-extensions.md:
##########
@@ -21,38 +22,60 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
+<!--
+The :target="_blank" syntax below makes this open in a new tab
+and work in the PDF and jekyll web page.
+But displays as literal text in the IDE markdown previewer.
+-->
+<div class="only-jekyll" markdown="1">
+_This page is available as a [downloadable
PDF](../pdf/dfdl-extensions.pdf){:target="_blank"}._
Review Comment:
Done. I had to put it into default.html and use page.path etc., but it works.
However, pages still need conditional markup for the Jekyll table of
contents. That is, if you want the Jekyll page to have a TOC, you have to
include the only-jekyll conditional TOC boilerplate at the top, because pandoc
generates its own TOC without needing this and otherwise includes the Jekyll
code as plain text. I think pages with PDFs will usually want a TOC both on the
web and in the PDF, but I haven't figured out how to centralize this yet.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
Review Comment:
removed irrelevant parts.
##########
site/_pandoc/only.lua:
##########
@@ -0,0 +1,60 @@
+-- only.lua: drop .only-jekyll, keep contents of .only-pandoc
+-- Handles both native Div/Span nodes and raw HTML <div> wrappers.
+
+local List = require 'pandoc.List'
+
+local function has_class(classes, cls)
+ return classes and List.includes(classes, cls)
+end
+
+-- Native block divs (Pandoc recognized <div class="..."> as Div)
+function Div(el)
+ if has_class(el.classes, 'only-jekyll') then
+ return {} -- drop entirely
+ elseif has_class(el.classes, 'only-pandoc') then
+ return el.content -- unwrap: keep inner blocks
+ end
+end
+
+-- Native inline spans
+function Span(el)
+ if has_class(el.classes, 'only-jekyll') then
+ return {}
+ elseif has_class(el.classes, 'only-pandoc') then
+ return el.content
+ end
+end
+
+-- Fallback for raw HTML wrappers when Pandoc didn't turn them into Divs.
+function Pandoc(doc)
+ local out = List()
+ local mode = nil -- nil | 'drop' | 'keep'
+
+ local function is_open_of(txt, klass)
+ -- match <div ... class="... klass ...">
+ return txt:match('<div[^>]-class=[\'"][^\'"]-' .. klass .. '[^\'"]-[\'"]')
+ end
+
+ for _, blk in ipairs(doc.blocks) do
+ if blk.t == 'RawBlock' and blk.format:match('html') then
+ local t = blk.text
+ if is_open_of(t, 'only%-jekyll') then
+ mode = 'drop' -- drop wrapper and its inner content
+ elseif is_open_of(t, 'only%-pandoc') then
+ mode = 'keep' -- drop wrapper, keep inner content
+ elseif t:match('</div>') and mode ~= nil then
+ mode = nil
+ else
+ if not mode or mode == 'keep' then out:insert(blk) end
+ end
+ else
+ if not mode then
+ out:insert(blk)
+ elseif mode == 'keep' then
+ out:insert(blk)
+ end
+ end
+ end
+
+ return pandoc.Pandoc(out, doc.meta)
+end
Review Comment:
I dropped the PANDOC:START/END mechanism. The only-jekyll / only-pandoc div classes are used instead.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
Review Comment:
fixed
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
Review Comment:
Section removed anyway.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
Review Comment:
fixed
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## 🧰 How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees
it.
+
+---
+
+## ⚙️ Customizing Pandoc
+
+Pandoc is run with `--defaults=basic.yaml` which specifies the
`template_basic.tex` is used.
+The template can be modified to change the PDF output.
+
+---
+
+## 🧱 Recommended Workflow
+
+1. Write Markdown pages normally for your Jekyll site.
+2. When you also want a PDF version, add `pdf: true` to front matter.
+3. If needed, wrap LaTeX-specific content in `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` blocks.
+4. From `_pandoc/`, run:
+ ```bash
+ make
+ ```
+5. Find the generated PDFs in `pdf/`.
+
+---
+
+## 🪄 Why This Setup Works
+
+| Concern | Solution |
+|----------|-----------|
+| Jekyll shouldn't see LaTeX | Hidden in HTML comments |
+| Pandoc must see LaTeX | AWK removes wrappers |
+| Need automatic PDF generation | Makefile scans for `pdf: true` |
+| Keep tools separate | Everything lives in `_pandoc/` |
+
+---
+
+## 🧾 Example Output
+
+```
+pdf/
+├── about.pdf
+└── _posts/
+    └── 2025-01-01-example.pdf
+```
+
+---
+
+**Maintainer Notes**
+
+- `_pandoc/Makefile` assumes it's run from `_pandoc/`, with site root as `..`
+- Pandoc and AWK must be available on your `PATH`
+
+---
+
+## Pandoc Tools Installation
+
+These tools run on Linux.
+
+On Ubuntu you have to install these things:
+
+ sudo apt install pandoc texlive-latex-base texlive-latex-recommended \
+ texlive-fonts-recommended texlive-xetex texlive-latex-extra
+
+I have found one must update pandoc to a more up to date version.
+This is currently dependent on pandoc 3.7.0.2 which can be downloaded from
+https://github.com/jgm/pandoc/releases/tag/3.7.0.2 .
Review Comment:
I am retesting with pandoc 3.1.3.
##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected
Jekyll pages while keeping the same Markdown files usable by Jekyll for the
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🗂️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md ← this file
+├── Makefile ← builds all PDFs
+├── unwrap-pandoc.awk ← preprocessor that removes comment wrappers
+├── template.latex ← (optional) custom LaTeX template
+├── header.tex ← (optional) extra LaTeX header content
+└── ../pdf/ ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page
layout.
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## 🧰 How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees
it.
+
+---
+
+## ⚙️ Customizing Pandoc
+
+Pandoc is run with `--defaults=basic.yaml` which specifies the
`template_basic.tex` is used.
+The template can be modified to change the PDF output.
+
+---
+
+## 🧱 Recommended Workflow
+
+1. Write Markdown pages normally for your Jekyll site.
+2. When you also want a PDF version, add `pdf: true` to front matter.
+3. If needed, wrap LaTeX-specific content in `<!-- PANDOC:START -->` / `<!--
PANDOC:END -->` blocks.
+4. From `_pandoc/`, run:
+ ```bash
+ make
+ ```
+5. Find the generated PDFs in `pdf/`.
+
+---
+
+## 🪄 Why This Setup Works
+
+| Concern | Solution |
+|----------|-----------|
+| Jekyll shouldn't see LaTeX | Hidden in HTML comments |
+| Pandoc must see LaTeX | AWK removes wrappers |
+| Need automatic PDF generation | Makefile scans for `pdf: true` |
+| Keep tools separate | Everything lives in `_pandoc/` |
+
+---
Review Comment:
removed
##########
site/_pandoc/Makefile:
##########
@@ -0,0 +1,63 @@
+# ==========================================================
+# Pandoc PDF generator for Jekyll site
+# Scans Markdown files with "pdf: true" in YAML front matter
+# and produces PDFs in the site's ./pdf/ directory
+# ==========================================================
+
+# --- Configuration ---
+SITE_ROOT := ..
+AWK_UNWRAP := $(SITE_ROOT)/_pandoc/unwrap-pandoc.awk
+AWK_LIST := $(SITE_ROOT)/_pandoc/list-pdf-sources.awk
+PANDOC := pandoc
+
+# Output directory for generated PDFs (at site root)
+PDF_OUTDIR := $(SITE_ROOT)/pdf
+
+DEFAULTS := $(SITE_ROOT)/_pandoc/basic.yaml
+
+# --- Candidate Markdown files (exclude build/tool/output dirs) ---
+# Use find + awk pipeline; awk -f avoids executable bit.
+MD_CANDIDATES := $(shell find $(SITE_ROOT) \
+ -type f -name '*.md' \
+ -not -path '*/_*/*' \
+ -not -path '*/node_modules/*' \
+ -not -path '*/vendor/*' \
+ -not -path '*/pdf/*' \
+ -print0 | xargs -0 -r awk -f $(AWK_LIST))
+
+# --- Files to build ---
+PDF_SRCS := $(MD_CANDIDATES)
+PDFS := $(patsubst $(SITE_ROOT)/%.md,$(PDF_OUTDIR)/%.pdf,$(PDF_SRCS))
+
Review Comment:
Printing web pages to PDF generally gives poor results.
Making one PDF is feasible by feeding all the "pdf: true" Markdown files to
pandoc at once in the right sequence. I will experiment with this more later if
I decide to keep pursuing this.
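Pandoc concatenates multiple input files given on one command line, so the single-PDF idea can be sketched as a dry run that only prints the command. The function name `combined_pdf_command` and the output name `all.pdf` are hypothetical; `--defaults=basic.yaml` is taken from the README in this PR. This has not been tried against the site, and file names containing spaces would need extra quoting.

```shell
# Sketch: print (do not run) a pandoc command that combines several
# Markdown sources into one PDF. The order of the arguments is the
# order in which pandoc would read the files.
combined_pdf_command() {
  out=$1   # first argument: output PDF name (hypothetical, e.g. all.pdf)
  shift    # remaining arguments: Markdown sources, already in sequence
  printf 'pandoc --defaults=basic.yaml -o %s' "$out"
  for f in "$@"; do
    printf ' %s' "$f"
  done
  printf '\n'
}
```

For example, `combined_pdf_command all.pdf ../a.md ../b.md` prints the command so the file order can be checked before anything is built.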
##########
site/pdf/dfdl-extensions.pdf:
##########
Review Comment:
I have not fixed this yet, nor resolved the question of one big PDF vs.
separate PDFs per page.
This is an interesting experiment in Jekyll + pandoc, but I think I will
focus on content creation rather than this sort of tooling for a while.