mbeckerle commented on code in PR #193:
URL: https://github.com/apache/daffodil-site/pull/193#discussion_r2469791351


##########
site/_pandoc/list-pdf-sources.awk:
##########
@@ -0,0 +1,59 @@
+#!/usr/bin/awk -f
+# Prints FILENAME iff the file has YAML front matter with "pdf: true".
+# - Must be called with filenames (works via find/xargs).
+# - Ignores matches outside the front matter.
+# - Front matter is the lines between the first '---' and the next '---'.
+
+# We process each file independently.
+# Use a per-file BEGINFILE block if available (GNU awk). Otherwise reset on first record.
+BEGIN {
+  have_beginfile = 0
+}
+BEGINFILE {
+  have_beginfile = 1
+  in_front = 0
+  seen_start = 0
+  want_pdf = 0
+}
+
+# For non-GNU awk compatibility:
+# Reset on the first line of each file if BEGINFILE isn't supported.
+FNR == 1 && !have_beginfile {
+  in_front = 0
+  seen_start = 0
+  want_pdf = 0
+}
+
+{
+  # Detect start of front matter
+  if (!seen_start) {
+    if ($0 ~ /^[[:space:]]*---[[:space:]]*$/) {
+      seen_start = 1
+      in_front = 1
+      next
+    } else {
+      # No front matter: skip file
+      nextfile
+    }
+  }
+
+  # If in front matter, look for end and for pdf:true
+  if (in_front) {
+    if ($0 ~ /^[[:space:]]*---[[:space:]]*$/) {
+      # End of front matter
+      in_front = 0
+      if (want_pdf) {
+        print FILENAME
+      }
+      nextfile
+    }
+    # Match "pdf: true" allowing spaces; ensure it's a key at line start
+    if ($0 ~ /^[[:space:]]*pdf:[[:space:]]*true([[:space:]]|$)/) {
+      want_pdf = 1
+    }
+    next
+  }
+
+  # If we got here, we’ve passed front matter without finding pdf:true
+  nextfile
+}

Review Comment:
   The one-liner grep in the prior comment supersedes any need for this script,
whether in awk or bash.
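
As a rough illustration of that approach, here is a minimal shell sketch. It is not the one-liner from the prior comment (which is not reproduced here); the function name `list_pdf_sources` and the exact pattern are assumptions. Unlike the awk script, it does not check that the match sits inside the front matter.

```shell
# Hypothetical helper: print every *.md file under directory $1 that has a
# line consisting of "pdf: true" (optionally indented, flexible spacing).
# Caveat: matches anywhere in the file, not only inside the YAML front matter.
list_pdf_sources() {
  find "$1" -type f -name '*.md' \
    -exec grep -l -E '^[[:space:]]*pdf:[[:space:]]*true[[:space:]]*$' {} +
}
```

From `_pandoc/`, something like `list_pdf_sources ..` would list the candidates; the directory exclusions used in the Makefile would still need to be added.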



##########
site/_pandoc/Makefile:
##########
@@ -0,0 +1,63 @@
+# ==========================================================
+# Pandoc PDF generator for Jekyll site
+# Scans Markdown files with "pdf: true" in YAML front matter
+# and produces PDFs in the site's ./pdf/ directory
+# ==========================================================
+
+# --- Configuration ---
+SITE_ROOT := ..
+AWK_UNWRAP := $(SITE_ROOT)/_pandoc/unwrap-pandoc.awk
+AWK_LIST   := $(SITE_ROOT)/_pandoc/list-pdf-sources.awk
+PANDOC := pandoc
+
+# Output directory for generated PDFs (at site root)
+PDF_OUTDIR := $(SITE_ROOT)/pdf
+
+DEFAULTS := $(SITE_ROOT)/_pandoc/basic.yaml
+
+# --- Candidate Markdown files (exclude build/tool/output dirs) ---
+# Use find + awk pipeline - awk -f avoids executable bit.
+MD_CANDIDATES := $(shell find $(SITE_ROOT) \
+  -type f -name '*.md' \
+  -not -path '*/_*/*' \
+  -not -path '*/node_modules/*' \
+  -not -path '*/vendor/*' \
+  -not -path '*/pdf/*' \
+  -print0 | xargs -0 -r awk -f $(AWK_LIST))

Review Comment:
   Fixed



##########
site/dfdl-extensions.md:
##########
@@ -21,38 +22,60 @@ See the License for the specific language governing 
permissions and
 limitations under the License.
 {% endcomment %}
 -->
+<!-- 
+The :target="_blank" syntax below makes this open in a new tab 
+and work in the PDF and jekyll web page.
+But displays as literal text in the IDE markdown previewer. 
+--> 
+<div class="only-jekyll" markdown="1">
+_This page is available as a [downloadable 
PDF](../pdf/dfdl-extensions.pdf){:target="_blank"}._
+
+### Table of Contents
+{:.no_toc}
+
+1. use ordered table of contents 
+{:toc}
+</div>
+
+<div class="only-pandoc" markdown="1">
+# Introduction
+</div>

Review Comment:
   Done. 



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content

Review Comment:
   Updated



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works

Review Comment:
   fixed



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect 
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page 
layout.  
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores 
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!-- 
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## 🧰 How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from 
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees 
it.
+
+---

Review Comment:
   removed section



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect 
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page 
layout.  
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores 
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!-- 
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/

Review Comment:
   fixed



##########
site/dfdl-extensions.md:
##########
@@ -21,38 +22,60 @@ See the License for the specific language governing 
permissions and
 limitations under the License.
 {% endcomment %}
 -->
+<!-- 
+The :target="_blank" syntax below makes this open in a new tab 
+and work in the PDF and jekyll web page.
+But displays as literal text in the IDE markdown previewer. 
+--> 
+<div class="only-jekyll" markdown="1">
+_This page is available as a [downloadable 
PDF](../pdf/dfdl-extensions.pdf){:target="_blank"}._

Review Comment:
   Done. I had to put it into default.html and use page.path etc., but it works. 
However, pages still need conditional markup for the Jekyll table of 
contents. That is, if you want the Jekyll page to have a TOC, you have to 
include the only-jekyll conditional TOC boilerplate at the top, because pandoc 
generates its own TOC without it and would otherwise include the Jekyll code 
as plain text. I think pages with PDFs will usually want a TOC both on the web 
and in the PDF, but I haven't figured out how to centralize this yet. 



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```

Review Comment:
   removed irrelevant parts. 



##########
site/_pandoc/only.lua:
##########
@@ -0,0 +1,60 @@
+-- only.lua: drop .only-jekyll, keep contents of .only-pandoc
+-- Handles both native Div/Span nodes and raw HTML <div> wrappers.
+
+local List = require 'pandoc.List'
+
+local function has_class(classes, cls)
+  return classes and List.includes(classes, cls)
+end
+
+-- Native block divs (Pandoc recognized <div class="..."> as Div)
+function Div(el)
+  if has_class(el.classes, 'only-jekyll') then
+    return {}                  -- drop entirely
+  elseif has_class(el.classes, 'only-pandoc') then
+    return el.content          -- unwrap: keep inner blocks
+  end
+end
+
+-- Native inline spans
+function Span(el)
+  if has_class(el.classes, 'only-jekyll') then
+    return {}
+  elseif has_class(el.classes, 'only-pandoc') then
+    return el.content
+  end
+end
+
+-- Fallback for raw HTML wrappers when Pandoc didn’t turn them into Divs.
+function Pandoc(doc)
+  local out = List()
+  local mode = nil  -- nil | 'drop' | 'keep'
+
+  local function is_open_of(txt, klass)
+    -- match <div ... class="... klass ...">
+    return txt:match('<div[^>]-class=[\'"][^\'"]-' .. klass .. '[^\'"]-[\'"]')
+  end
+
+  for _, blk in ipairs(doc.blocks) do
+    if blk.t == 'RawBlock' and blk.format:match('html') then
+      local t = blk.text
+      if is_open_of(t, 'only%-jekyll') then
+        mode = 'drop'   -- drop wrapper and its inner content
+      elseif is_open_of(t, 'only%-pandoc') then
+        mode = 'keep'   -- drop wrapper, keep inner content
+      elseif t:match('</div>') and mode ~= nil then
+        mode = nil
+      else
+        if not mode or mode == 'keep' then out:insert(blk) end
+      end
+    else
+      if not mode then
+        out:insert(blk)
+      elseif mode == 'keep' then
+        out:insert(blk)
+      end
+    end
+  end
+
+  return pandoc.Pandoc(out, doc.meta)
+end

Review Comment:
   I dropped the PANDOC:START/END stuff. This class/div stuff is used instead. 



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:

Review Comment:
   fixed



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect 
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page 
layout.  
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores 
but our AWK preprocessor removes before running Pandoc.

Review Comment:
   Section removed anyway. 



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect 
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page 
layout.  
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores 
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!-- 
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf

Review Comment:
   fixed



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect 
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page 
layout.  
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores 
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!-- 
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## 🧰 How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from 
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees 
it.
+
+---
+
+## ⚙️ Customizing Pandoc
+
+Pandoc is run with `--defaults=basic.yaml` which specifies the 
`template_basic.tex` is used.
+The template can be modified to change the PDF output. 
+
+---
+
+## 🧱 Recommended Workflow
+
+1. Write Markdown pages normally for your Jekyll site.
+2. When you also want a PDF version, add `pdf: true` to front matter.
+3. If needed, wrap LaTeX-specific content in `<!-- PANDOC:START -->` / `<!-- 
PANDOC:END -->` blocks.
+4. From `_pandoc/`, run:
+   ```bash
+   make
+   ```
+5. Find the generated PDFs in `pdf/`.
+
+---
+
+## 🪄 Why This Setup Works
+
+| Concern | Solution |
+|----------|-----------|
+| Jekyll shouldn’t see LaTeX | Hidden in HTML comments |
+| Pandoc must see LaTeX | AWK removes wrappers |
+| Need automatic PDF generation | Makefile scans for `pdf: true` |
+| Keep tools separate | Everything lives in `_pandoc/` |
+
+---
+
+## 🧾 Example Output
+
+```
+pdf/
+├── about.pdf
+└── _posts/
+    └── 2025-01-01-example.pdf
+```
+
+---
+
+**Maintainer Notes**
+
+- `_pandoc/Makefile` assumes it’s run from `_pandoc/`, with site root as `..`
+- Pandoc and AWK must be available on your `PATH`
+
+---
+
+## Pandoc Tools Installation
+
+These tools run on Linux.
+
+On Ubuntu you have to install these things:
+
+    sudo apt install pandoc texlive-latex-base texlive-latex-recommended \
+      texlive-fonts-recommended texlive-xetex texlive-latex-extra
+
+I have found one must update pandoc to a more up to date version.
+This is currently dependent on pandoc 3.7.0.2 which can be downloaded from
+https://github.com/jgm/pandoc/releases/tag/3.7.0.2 . 

Review Comment:
   I am retesting with pandoc 3.1.3. 
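
To make that kind of retest repeatable, a small shell check can compare the installed pandoc against a required minimum. This is only a sketch: `version_ge` and `REQUIRED_PANDOC` are hypothetical names, `sort -V` is assumed to be available (as in GNU coreutils), and the default of 3.7.0.2 is simply the version the README currently names.

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
# Assumes a sort that supports -V (version sort), as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Minimum version to accept; 3.7.0.2 is what the README names today.
REQUIRED_PANDOC=${REQUIRED_PANDOC:-3.7.0.2}

if command -v pandoc >/dev/null 2>&1; then
  # The first line of `pandoc --version` looks like "pandoc 3.1.3".
  installed=$(pandoc --version | head -n 1 | awk '{print $2}')
  if version_ge "$installed" "$REQUIRED_PANDOC"; then
    echo "pandoc $installed is new enough"
  else
    echo "pandoc $installed is older than $REQUIRED_PANDOC"
  fi
fi
```

If the retest with 3.1.3 succeeds, lowering `REQUIRED_PANDOC` would be the only change needed.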



##########
site/_pandoc/README.md:
##########
@@ -0,0 +1,223 @@
+---
+layout: page
+title: Pandoc + Jekyll Integration
+pdf: false
+---
+# 🧭 Pandoc + Jekyll Integration
+
+This directory contains tools for generating **PDF versions** of selected 
Jekyll pages while keeping the same Markdown files usable by Jekyll for the 
website.
+
+The goal is to have **one Markdown source** that:
+- renders cleanly in the Jekyll site (for HTML),
+- and can also be converted into a polished PDF using **Pandoc + LaTeX**.
+
+---
+
+## 🏗️ Directory Layout
+
+```
+_pandoc/
+│
+├── README.md              ← this file
+├── Makefile               ← builds all PDFs
+├── unwrap-pandoc.awk      ← preprocessor that removes comment wrappers
+├── template.latex         ← (optional) custom LaTeX template
+├── header.tex             ← (optional) extra LaTeX header content
+└── ../pdf/                ← generated PDFs appear here
+```
+
+At the root of the Jekyll site:
+
+```
+_config.yml
+_posts/
+pages/
+assets/
+_pandoc/
+pdf/
+```
+
+---
+
+## 🧩 How It Works
+
+### 1. Mark pages that should have PDFs
+
+Any Markdown file (in `_posts`, `pages/`, or elsewhere) can be tagged with:
+
+```yaml
+---
+title: Example Page
+layout: page
+pdf: true
+---
+```
+
+The Makefile will scan the entire Jekyll project and automatically detect 
these files.
+
+---
+
+### 2. Use HTML comment wrappers for Pandoc-only content
+
+Pandoc sometimes needs LaTeX code for things like custom tables, math, or page 
layout.  
+We hide that LaTeX from Jekyll using **HTML comments**, which Jekyll ignores 
but our AWK preprocessor removes before running Pandoc.
+
+Example:
+
+````markdown
+Regular Markdown content here.
+
+<!-- PANDOC:START -->
+<!--
+```{=latex}
+\begin{tabular}{ll}
+A & B \\
+C & D \\
+\end{tabular}
+```
+-->
+<!-- PANDOC:END -->
+
+More Markdown content.
+````
+
+When viewed on the Jekyll site:
+- This section is hidden (HTML comments are ignored).
+
+When built via Pandoc:
+- The `unwrap-pandoc.awk` script strips the comment wrappers,
+- The inner LaTeX becomes active, producing a correct PDF table.
+
+---
+
+### 3. The Makefile
+
+The `_pandoc/Makefile` automates the whole process.
+
+It:
+
+1. Recursively scans the site for Markdown files with `pdf: true`.
+2. Runs `unwrap-pandoc.awk` to clean up `<!-- PANDOC:START -->` / `<!-- 
PANDOC:END -->` wrappers.
+3. Invokes Pandoc with the configured LaTeX template to produce a PDF.
+
+The resulting PDFs go into:
+
+```
+_pandoc/output/
+```
+
+keeping them separate from the Jekyll site itself.
+
+---
+
+## 🧮 Example Commands
+
+From inside the `_pandoc/` directory:
+
+### Build all PDFs
+```bash
+make
+```
+
+### Clean all generated PDFs
+```bash
+make clean
+```
+
+### List all Markdown files with `pdf: true`
+```bash
+make list
+```
+
+### Force rebuild of one PDF
+```bash
+make ../about.pdf
+```
+
+---
+
+## 🧰 How the AWK Script Works
+
+`unwrap-pandoc.awk` removes the HTML comment wrappers used to hide LaTeX from 
Jekyll.
+
+Input example:
+
+````markdown
+<!-- PANDOC:START -->
+<!--
+\LaTeX code
+-->
+<!-- PANDOC:END -->
+````
+
+Output to Pandoc:
+
+```markdown
+\LaTeX code
+```
+
+That means Pandoc receives clean, valid LaTeX syntax while Jekyll never sees 
it.
+
+---
+
+## ⚙️ Customizing Pandoc
+
+Pandoc is run with `--defaults=basic.yaml` which specifies the 
`template_basic.tex` is used.
+The template can be modified to change the PDF output. 
+
+---
+
+## 🧱 Recommended Workflow
+
+1. Write Markdown pages normally for your Jekyll site.
+2. When you also want a PDF version, add `pdf: true` to front matter.
+3. If needed, wrap LaTeX-specific content in `<!-- PANDOC:START -->` / `<!-- 
PANDOC:END -->` blocks.
+4. From `_pandoc/`, run:
+   ```bash
+   make
+   ```
+5. Find the generated PDFs in `pdf/`.
+
+---
+
+## 🪄 Why This Setup Works
+
+| Concern | Solution |
+|----------|-----------|
+| Jekyll shouldn’t see LaTeX | Hidden in HTML comments |
+| Pandoc must see LaTeX | AWK removes wrappers |
+| Need automatic PDF generation | Makefile scans for `pdf: true` |
+| Keep tools separate | Everything lives in `_pandoc/` |
+
+---

Review Comment:
   removed



##########
site/_pandoc/Makefile:
##########
@@ -0,0 +1,63 @@
+# ==========================================================
+# Pandoc PDF generator for Jekyll site
+# Scans Markdown files with "pdf: true" in YAML front matter
+# and produces PDFs in the site's ./pdf/ directory
+# ==========================================================
+
+# --- Configuration ---
+SITE_ROOT := ..
+AWK_UNWRAP := $(SITE_ROOT)/_pandoc/unwrap-pandoc.awk
+AWK_LIST   := $(SITE_ROOT)/_pandoc/list-pdf-sources.awk
+PANDOC := pandoc
+
+# Output directory for generated PDFs (at site root)
+PDF_OUTDIR := $(SITE_ROOT)/pdf
+
+DEFAULTS := $(SITE_ROOT)/_pandoc/basic.yaml
+
+# --- Candidate Markdown files (exclude build/tool/output dirs) ---
+# Use find + awk pipeline - awk -f avoids executable bit.
+MD_CANDIDATES := $(shell find $(SITE_ROOT) \
+  -type f -name '*.md' \
+  -not -path '*/_*/*' \
+  -not -path '*/node_modules/*' \
+  -not -path '*/vendor/*' \
+  -not -path '*/pdf/*' \
+  -print0 | xargs -0 -r awk -f $(AWK_LIST))
+
+# --- Files to build ---
+PDF_SRCS := $(MD_CANDIDATES)
+PDFS     := $(patsubst $(SITE_ROOT)/%.md,$(PDF_OUTDIR)/%.pdf,$(PDF_SRCS))
+

Review Comment:
   Printing web pages to PDF generally gives poor results. 
   
   Making one PDF is feasible by feeding all the "pdf: true" markdown files to 
pandoc at once in the right sequence. I will experiment with this more later if 
I decide to keep pursuing this. 
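
A minimal sketch of the single-PDF idea, in shell. pandoc concatenates multiple input files in the order given, so the sequencing is just the argument order. The function name `build_combined_pdf`, the `PANDOC` override, and the file names in the usage note are hypothetical; `basic.yaml` is the defaults file the Makefile already uses, and the sketch assumes it is run from `_pandoc/`.

```shell
# Hypothetical helper: build_combined_pdf OUT.pdf FIRST.md [NEXT.md ...]
# pandoc reads the inputs in argument order and treats them as one document.
build_combined_pdf() {
  target=$1
  shift
  # PANDOC can be overridden (e.g. PANDOC=echo) for a dry run.
  "${PANDOC:-pandoc}" --defaults=basic.yaml --output="$target" "$@"
}
```

For example, `build_combined_pdf ../pdf/all.pdf ../first.md ../second.md` would put `first.md` ahead of `second.md` in the result. Setting `PANDOC=echo` prints the command arguments instead of running pandoc.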



##########
site/pdf/dfdl-extensions.pdf:
##########


Review Comment:
   I've not fixed this yet, nor the issue of one big PDF vs. separate PDFs per 
page. 
   
   This is an interesting experiment in Jekyll + pandoc, but I think I will 
focus on content creation, not this sort of tooling, for a while. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
