Hi Mark,

Mark Wielaard <[email protected]> writes:

> I think it is a good thing to make it easier for users to regenerate
> the documentation locally for offline usage. And it would be helpful
> to check that documentation generation works by using it as a snapshot
> builder and/or an Action that could be run on a merge request in the
> forge.
>
> We don't have to change the workflow to generate the online docs, it
> could still be done through a cron job. But if we can use the same
> script to generate them also locally, through a snapshots builder and
> maybe a merge request Action on the forge that would be great. Then
> when that works, we can decide whether to change the actual mechanism.

To my understanding, no automation currently exists for release
documentation; that is what I was referring to.

> I used the script to create a gcc docs snapshot builder:
> https://snapshots.sourceware.org/gcc/docs/
> https://builder.sourceware.org/buildbot/#/builders/gcc-snapshots
>
> I had to add the following packages to the fedora-latest container:
>
>   mandoc docbook5-style-xsl doxygen graphviz dblatex libxml2 libxslt
>   texlive-latex texlive-makeindex texinfo texinfo-tex python3-sphinx
>   groff-base groff-perl texlive-hanging texlive-adjustbox
>   texlive-stackengine texlive-tocloft texlive-newunicodechar
>
> Might be good to document that somewhere. Also not everything is
> checked for so when you are missing some packages things might just
> break half-way through.

Yes, the checks are mostly what the existing scripts were already
checking, extended slightly, and aren't comprehensive.

> I am not sure what to do about the CSS. It would be way nicer if that
> was also embedded in the source instead of relying on an external URL
> or repository.
>
> Also it would be nice if there was a little top-level index.html.
> Maybe a snippet like at the end of
> https://gcc.gnu.org/onlinedocs/index.html (Current development)?

That could be added.  Currently, there isn't really an equivalent (see
https://gcc.gnu.org/onlinedocs/gcc-15.2.0/ for instance).
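
A first cut could be to have the script emit a trivial listing at the
end, something along these lines (a rough sketch; the manual names
double as link text for now, and the title wording is just a
placeholder):

  {
    echo '<html><head><title>GCC Documentation</title></head><body>'
    echo '<h1>GCC Documentation</h1><ul>'
    for manual in "${MANUALS[@]}" jit libstdc++ gcobol; do
      [[ -d ${outdir}/${manual} ]] || continue
      echo "<li><a href=\"${manual}/\">${manual}</a></li>"
    done
    echo '</ul></body></html>'
  } > "${outdir}"/index.html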

> Some comments on the actual script below.
>
>>  maintainer-scripts/gen_gcc_docs.sh | 391 +++++++++++++++++++++++++++++
>>  1 file changed, 391 insertions(+)
>>  create mode 100755 maintainer-scripts/gen_gcc_docs.sh
>> 
>> diff --git a/maintainer-scripts/gen_gcc_docs.sh b/maintainer-scripts/gen_gcc_docs.sh
>> new file mode 100755
>> index 000000000000..c10733d21da2
>> --- /dev/null
>> +++ b/maintainer-scripts/gen_gcc_docs.sh
>> @@ -0,0 +1,391 @@
>> +#!/usr/bin/bash
>> +#
>> +# Copyright (C) 2025 Free Software Foundation, Inc.
>> +#
>> +# This script is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3, or (at your option)
>> +# any later version.
>> +
>> +# Usage: gen_gcc_docs.sh [srcdir] [outdir]
>> +#
>> +# Generates and outputs GCC documentation to [outdir].
>> +#
>> +# Impacted by a few environment variables:
>> +# - BUGURL :: The bug URL to insert into the manuals.
>> +# - CSS :: URL to pass as the CSS reference in HTML manuals.
>> +# - BRANCH :: Documentation branch to build.  Defaults to git default.
>> +# - TEXI2DVI, TEXI2PDF, MAKEINFO, SPHINXBUILD :: Names of the respective tools.
>> +
>> +# Based on update_web_docs_git and generate_libstdcxx_web_docs.
>> +
>> +MANUALS=(
>> +  cpp
>> +  cppinternals
>> +  fastjar
>
> fastjar brings back memories, but I believe we haven't shipped it in
> 15 years.
>
>> +  gcc
>> +  gccgo
>> +  gccint
>> +  gcj
>
> Likewise for gcj
>
>> +  gdc
>> +  gfortran
>> +  gfc-internals
>> +  gm2
>> +  gnat_ugn
>> +  gnat-style
>> +  gnat_rm
>> +  libgomp
>> +  libitm
>> +  libquadmath
>> +  libiberty
>> +  porting
>
> Isn't porting part of libstdc++ now?
>
>> +)
>
> So jit, libstdc++ and gcobol are their own thing?

Yes, they aren't Texinfo.

> Why is libffi not included?
>
>> +die() {
>> +  echo "fatal error ($?)${*+: }$*" >&2
>> +  exit 1
>> +}
>> +
>> +v() {
>> +  echo "+ $*" >&2
>> +  "$@"
>> +}
>> +export -f v die
>> +
>> +# Check arguments.
>> +[[ $1 ]] \
>> +  || die "Please specify the source directory as the first argument"
>> +srcdir="$1"
>> +if ! [[ $srcdir = /* ]]; then
>> +  srcdir="$(pwd)/${srcdir}"
>> +fi
>> +
>> +[[ $2 ]] \
>> +  || die "Please specify the output directory as the directory argument"
>> +outdir="$2"
>> +if ! [[ $outdir = /* ]]; then
>> +  outdir="$(pwd)/${outdir}"
>> +fi
>
> OK, makes them required and absolute paths.
>
>> +## Find build tools.
>> +# The gccadmin home directory contains a special build of Texinfo that has
>> +# support for copyable anchors.  Find it.
>> +makeinfo_git=/home/gccadmin/texinfo/install-git/bin/
>> +if [ -x "${makeinfo_git}"/makeinfo ]; then
>> +  : "${MAKEINFO:=${makeinfo_git}/makeinfo}"
>> +  : "${TEXI2DVI:=${makeinfo_git}/texi2dvi}"
>> +  : "${TEXI2PDF:=${makeinfo_git}/texi2pdf}"
>> +else
>> +  : "${MAKEINFO:=makeinfo}"
>> +  : "${TEXI2DVI:=texi2dvi}"
>> +  : "${TEXI2PDF:=texi2pdf}"
>> +fi
>> +
>> +py_venv_bin=/home/gccadmin/venv/bin
>> +# Similarly, it also has a virtualenv that contains a more up-to-date Sphinx.
>> +if [ -x "${py_venv_bin}"/sphinx-build ]; then
>> +  : "${SPHINXBUILD:=${py_venv_bin}/sphinx-build}"
>> +else
>> +  : "${SPHINXBUILD:=sphinx-build}"
>> +fi
>> +export MAKEINFO TEXI2DVI TEXI2PDF SPHINXBUILD
>
> Do we really need that special case hardcoded /home/gccadmin/...?
> Can't we just require that those bin dirs are prepended to PATH
> before invoking the script, or that the special-case TOOL env
> variables are set?

We can; the intention was to make it simpler to run on the GCC admin
machine (to replace the current scripts) without making the script rely
on that behaviour.
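
If we drop the hardcoded paths, a thin wrapper on gccadmin could carry
that knowledge instead, so the script itself stays generic.  Roughly
(a hypothetical invocation, mirroring the current layout):

  PATH=/home/gccadmin/texinfo/install-git/bin:/home/gccadmin/venv/bin:$PATH \
    ./maintainer-scripts/gen_gcc_docs.sh "$srcdir" "$outdir"

or the same thing with MAKEINFO, TEXI2DVI, TEXI2PDF and SPHINXBUILD
exported explicitly.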

>> +# Check for the programs.
>> +for i in \
>> +  doxygen dot dblatex pdflatex makeindex "${MAKEINFO}" "${TEXI2DVI}" \
>> +          "${TEXI2PDF}" "${SPHINXBUILD}"; do
>> +  echo >&2 -n "Checking for ${i##*/}... "
>> +  type >&2 -P "$i" && continue
>> +  echo >&2 "not found"
>> +  exit 1
>> +done
>
> Maybe at least add mandoc? xsltproc? groff? check that groff can
> generate PDF? That all required latex packages are installed?

I'll go over the list of packages you listed above and add checks.
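
Roughly what I have in mind (an untested sketch, to be adjusted against
your package list):

  for i in mandoc xsltproc groff gzip dvips; do
    echo >&2 -n "Checking for ${i}... "
    type >&2 -P "$i" && continue
    echo >&2 "not found"
    exit 1
  done
  # groff needs a working PDF device for the gcobol man pages.
  echo | groff -mdoc -T pdf >/dev/null 2>&1 \
    || die "groff cannot produce PDF output"

The LaTeX packages are harder to check for directly; running kpsewhich
on a few known style files is probably the best we can do there.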

>> +# Set sane defaults.
>> +: "${BUGURL:=https://gcc.gnu.org/bugs/}";
>> +: "${CSS:=/texinfo-manuals.css}" # https://gcc.gnu.org/texinfo-manuals.css
>> +export CSS BUGURL
>
> Maybe include that css in the sources so it is standalone by default?

Maybe; that could work if there's no CSS set.
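
For instance (a sketch, assuming we ship texinfo-manuals.css next to
the script; CSS_FILE is a hypothetical new variable):

  script_dir="$(dirname "$(readlink -f "$0")")"
  if [[ -z ${CSS} && -f ${script_dir}/texinfo-manuals.css ]]; then
    CSS_FILE="${script_dir}/texinfo-manuals.css"
  fi
  export CSS CSS_FILE

docs_build_single would then pass --css-include "${CSS_FILE}" when it
is set and fall back to --css-ref "${CSS}" otherwise, so the generated
HTML is standalone by default.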

>> +v mkdir -p "${outdir}" || die "Failed to create the output directory"
>> +
>> +workdir="$(mktemp -d)" \
>> +  || die "Failed to get new work directory"
>> +readonly workdir
>> +trap 'cd /; rm -rf "$workdir"' EXIT
>> +cd "$workdir" || die "Failed to enter $workdir"
>> +
>> +if [[ -z ${BRANCH} ]]; then
>> +  git clone -q "$srcdir" gccsrc
>> +else
>> +  git clone -b "${BRANCH}" -q "$srcdir" gccsrc
>> +fi || die "Clone failed"
>
> Not a fan of the cd /; rm -rf ... but let's pretend that works out ok.

Yes, that's fair, but due to the quotes it's impossible for 'rm' not to
get a positional argument, so in the worst case this invocation is
equivalent to `rm -rf ''`.  Unfortunately, we can't really avoid cd-ing
somewhere, since the work directory needs to be vacated before it can
be deleted.
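
If we want to be extra defensive there, something like

  trap '[[ -n ${workdir-} ]] && cd / && rm -rf -- "$workdir"' EXIT

would also skip the degenerate `rm -rf ''` case entirely, at the cost
of a slightly noisier trap.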

> So the current script depends on the srcdir being a full gcc git repo
> from which it can checkout a BRANCH and then build the docs for that
> branch. I think it might make sense to have the script on each branch
> for that branch, so you would just build the docs for the source/branch
> you have. Since different branches might have different sets of
> manuals.

I initially wanted this to function across versions (since the old
scripts did that also, and we'd need to run them on gccadmin, which uses
copies from trunk AFAIU), but in general I do agree that an approach
that makes each version track its own tweaks and works more strictly is
better.  If we can do that on gccadmin, I'd prefer that too.

>> +######## BUILD libstdc++ DOCS
>> +# Before we wipe out everything but JIT and Texinfo documentation, we need to
>> +# generate the libstdc++ manual.
>> +mkdir gccbld \
>> +  || die "Couldn't make build directory"
>> +(
>> +  set -e
>> +  cd gccbld
>> +
>> +  disabled_libs=()
>> +  for dir in ../gccsrc/lib*; do
>> +    dir="${dir##*/}"
>> +    [[ -d $dir ]] || continue
>> +    [[ $dir == libstdc++-v3 ]] && continue
>> +    disabled_libs+=( --disable-"${dir}" )
>> +  done
>> +
>> +  v ../gccsrc/configure \
>> +    --enable-languages=c,c++ \
>> +    --disable-gcc \
>> +    --disable-multilib \
>> +    "${disabled_libs[@]}" \
>> +    --docdir=/docs \
>> +    || die "Failed to configure GCC for libstdc++"
>> +  v make configure-target-libstdc++-v3 || die "Failed to configure libstdc++"
>> +
>> +  # Pick out the target directory.
>> +  target=  # Suppress warnings from shellcheck.
>> +  eval "$(grep '^target=' config.log)"
>> +  v make -C "${target}"/libstdc++-v3 \
>> +    doc-install-{html,xml,pdf} \
>> +    DESTDIR="$(pwd)"/_dest \
>> +    || die "Failed to compile libstdc++ docs"
>> +  set +x
>
> Doesn't that make things very verbose?

Hm, yes, this slipped by; I didn't intend to leave it.

>> +  cd _dest/docs
>> +  v mkdir libstdc++
>> +  for which in api manual; do
>> +    echo "Prepping libstdc++-${which}..."
>> +    if [[ -f libstdc++-"${which}"-single.xml ]]; then
>> +      # Only needed for GCC 4.7.x
>> +      v mv libstdc++-"${which}"{-single.xml,} || die
>> +    fi
>
> Do we really want to support 4.7.x in this (modern) script?
> See also the BRANCH comment above.

Same answer as above.

>> +    v gzip --best libstdc++-"${which}".xml || die
>> +    v gzip --best libstdc++-"${which}".pdf || die
>> +
>> +    v mv libstdc++-"${which}"{.html,-html} || die
>> +    v tar czf libstdc++-"${which}"-html.tar.gz libstdc++-"${which}"-html \
>> +      || die
>> +    mv libstdc++-"${which}"-html libstdc++/"${which}"
>> +
>> +    # Install the results.
>> +    v cp libstdc++-"${which}".xml.gz "${outdir}" || die
>> +    v cp libstdc++-"${which}".pdf.gz "${outdir}" || die
>> +    v cp libstdc++-"${which}"-html.tar.gz "${outdir}"
>> +  done
>> +
>> +  v cp -Ta libstdc++ "${outdir}"/libstdc++ || die
>> +) || die "Failed to generate libstdc++ docs"
>> +
>> +v rm -rf gccbld || die
>> +
>> +######## PREPARE SOURCES
>> +
>> +# Remove all unwanted files.  This is needed to avoid packaging all the
>> +# sources instead of only documentation sources.
>> +# Note that we have to preserve gcc/jit/docs since the jit docs are
>> +# not .texi files (Makefile, .rst and .png), and the jit docs use
>> +# include directives to pull in content from jit/jit-common.h and
>> +# jit/notes.txt, and parts of the jit.db testsuite, so we have to preserve
>> +# those also.
>> +find gccsrc -type f \( -name '*.texi' \
>> +     -o -path gccsrc/gcc/doc/install.texi2html \
>> +     -o -path gccsrc/gcc/doc/include/texinfo.tex \
>> +     -o -path gccsrc/gcc/BASE-VER \
>> +     -o -path gccsrc/gcc/DEV-PHASE \
>> +     -o -path "gccsrc/gcc/cobol/gcobol.[13]" \
>> +     -o -path "gccsrc/gcc/ada/doc/gnat_ugn/*.png" \
>> +     -o -path "gccsrc/gcc/jit/docs/*" \
>> +     -o -path "gccsrc/gcc/jit/jit-common.h" \
>> +     -o -path "gccsrc/gcc/jit/notes.txt" \
>> +     -o -path "gccsrc/gcc/doc/libgdiagnostics/*" \
>> +     -o -path "gccsrc/gcc/testsuite/jit.dg/*" \
>> +     -o -print0 \) | xargs -0 rm -f \
>> +  || die "Failed to clean up source tree"
>> +
>> +# The directory to pass to -I; this is the one with texinfo.tex
>> +# and fdl.texi.
>> +export includedir=gccsrc/gcc/doc/include
>
> Does this need to be an exported variable?

Yes, docs_build_single is invoked through a new bash process when run
via parallel.
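
(For context: when going through parallel, each job runs in a fresh
shell, roughly

  bash -c 'docs_build_single "$@"' _ cpp

so both the function, via export -f, and includedir have to be
exported to be visible there.)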

>> +# Generate gcc-vers.texi.
>> +(
>> +  set -e
>> +  echo "@set version-GCC $(cat gccsrc/gcc/BASE-VER)"
>> +  if [ "$(cat gccsrc/gcc/DEV-PHASE)" = "experimental" ]; then
>> +    echo "@set DEVELOPMENT"
>> +  else
>> +    echo "@clear DEVELOPMENT"
>> +  fi
>> +  echo "@set srcdir $workdir/gccsrc/gcc"
>> +  echo "@set VERSION_PACKAGE (GCC)"
>> +  echo "@set BUGURL @uref{$BUGURL}"
>> +) > "$includedir"/gcc-vers.texi \
>> +  || die "Failed to generate gcc-vers.texi"
>> +
>> +# Generate libquadmath-vers.texi.
>> +echo "@set BUGURL @uref{$BUGURL}" \
>> +     > "$includedir"/libquadmath-vers.texi \
>> +  || die "Failed to generate libquadmath-vers.texi"
>> +
>> +# Build a tarball of the sources.
>> +tar cf docs-sources.tar --xform 's/^gccsrc/gcc/' gccsrc \
>> +  || die "Failed to build sources"
>
> Why not create a tar.gz? See also below.
>
>> +######## BUILD DOCS
>> +docs_build_single() {
>> +  [[ $1 ]] || die "bad docs_build_single invoc"
>> +  local manual="$1" filename miargs
>> +  filename="$(find . -name "${manual}.texi")" \
>> +    || die "Failed to find ${manual}.texi"
>> +
>> +  # Silently ignore if no such manual exists.
>> +  [[ $filename ]] || return 0
>
> Maybe don't be silent about it?
> If a manual suddenly disappears shouldn't this script just be adapted?

This is one of the places where supporting many versions manifests.  If
we decide not to do that, this should be loud indeed.
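
If we do make it loud, the check would simply become something like

  [[ $filename ]] || die "No ${manual}.texi found in the tree"

instead of the silent return.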

>> +  miargs=(
>> +    -I "${includedir}"
>> +    -I "$(dirname "${filename}")"
>> +  )
>> +
>> +  # Manual specific arguments.
>> +  case "$manual" in
>> +    gm2)
>> +      miargs+=(
>> +        -I gccsrc/gcc/m2/target-independent
>> +        -I gccsrc/gcc/m2/target-independent/m2
>> +      )
>> +      ;;
>> +    gnat_ugn)
>> +      miargs+=(
>> +        -I gccsrc/gcc/ada
>> +        -I gccsrc/gcc/ada/doc/gnat_ugn
>> +      )
>> +      ;;
>> +    *) ;;
>> +  esac
>> +
>> +  v "${MAKEINFO}" --html \
>> +    "${miargs[@]}" \
>> +    -c CONTENTS_OUTPUT_LOCATION=inline \
>> +    --css-ref "${CSS}" \
>> +    -o "${manual}" \
>> +    "${filename}" \
>> +    || die "Failed to generate HTML for ${manual}"
>> +  tar cf "${manual}-html.tar" "${manual}"/*.html \
>> +    || die "Failed to pack up ${manual}-html.tar"
>
> Maybe generate a tar.gz directly?

Will try; I think that would probably be fine.
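
E.g. (a sketch; the plain .tar would then also disappear from the copy
list at the end):

  tar czf "${manual}-html.tar.gz" "${manual}"/*.html \
    || die "Failed to pack up ${manual}-html.tar.gz"

I'd need to double-check that the embedded gzip timestamp stays
reproducible that way, since the current gzip step relies on
SOURCE_DATE_EPOCH for that.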

>> +  v "${TEXI2DVI}" "${miargs[@]}" \
>> +    -o "${manual}.dvi" \
>> +    "${filename}" \
>> +    </dev/null >/dev/null \
>> +    || die "Failed to generate ${manual}.dvi"
>> +  v dvips -q -o "${manual}".{ps,dvi} \
>> +    </dev/null >/dev/null \
>> +    || die "Failed to generate ${manual}.ps"
>
> Do we really still want to produce a dvi and ps file if we already
> produce a pdf below?

We currently do.  I'm not sure what the benefit is, but since the
existing process produces them, I kept it.

>> +  v "${TEXI2PDF}" "${miargs[@]}" \
>> +    -o "${manual}.pdf" \
>> +    "${filename}" \
>> +    </dev/null >/dev/null \
>> +    || die "Failed to generate ${manual}.pdf"
>> +
>> +  while read -d $'\0' -r f; do
>> +    # Do this for the contents of each file.
>> +    sed -i -e 's/_002d/-/g' "$f" \
>> +      || die "Failed to hack $f"
>> +    # And rename files if necessary.
>> +    ff="${f//_002d/-}"
>> +    if [ "$f" != "$ff" ]; then
>> +      printf "Renaming %s to %s\n" "$f" "$ff"
>
> Maybe make this silent, the log already is fairly big?

This logs a hack, so I'd prefer that this specific thing stay loud and
that other things be made quieter.
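
For the rest, the v() helper could grow a knob, roughly (QUIET being a
hypothetical new variable):

  v() {
    [[ ${QUIET-} ]] || echo "+ $*" >&2
    "$@"
  }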

>> +      mv "$f" "$ff" || die "Failed to rename $f"
>> +    fi
>> +  done < <(find "${manual}" -name '*.html' -print0)
>> +}
>> +export -f docs_build_single
>> +
>> +# Now convert the relevant files from texi to HTML, PDF and PostScript.
>> +if type -P parallel >&/dev/null; then
>> +  parallel docs_build_single '{}' ::: "${MANUALS[@]}"
>> +else
>> +  for man in "${MANUALS[@]}"; do
>> +    docs_build_single "${man}"
>> +  done
>> +fi
>
> Interesting use of parallel (note, not currently installed on server
> or in the container). Does it work with the nagware thing? Otherwise
> it might be useful to explicitly do
>   mkdir -p ~/.parallel; touch ~/.parallel/will-cite

Hm, I didn't check on a machine that doesn't already have will-cite.
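
If it does nag, doing exactly what you suggest right before the
parallel invocation should be enough:

  mkdir -p ~/.parallel && touch ~/.parallel/will-cite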

>> +v make -C gccsrc/gcc/jit/docs html SPHINXBUILD="${SPHINXBUILD}" \
>> +  || die "Failed to generate libgccjit docs"
>> +
>> +v cp -a gccsrc/gcc/jit/docs/_build/html jit || die "failed to cp jit"
>> +
>> +
>> +if [[ -d gccsrc/gcc/doc/libgdiagnostics/ ]]; then
>> +  v make -C gccsrc/gcc/doc/libgdiagnostics/ html SPHINXBUILD="${SPHINXBUILD}" \
>> +    || die "Failed to generate libgdiagnostics docs"
>> +
>> +  v cp -a gccsrc/gcc/doc/libgdiagnostics/_build/html libgdiagnostics \
>> +    || die "failed to cp libgdiagnostics"
>> +fi
>
> This is why I think it might make sense to have this script be
> specific to each branch.
>
>> +######## BUILD gcobol DOCS
>> +# The COBOL FE maintains man pages.  Convert them to HTML and PDF.
>> +cobol_mdoc2pdf_html() {
>> +  mkdir -p gcobol
>> +  input="$1"
>> +  d="${input%/*}"
>> +  pdf="$2"
>> +  html="gcobol/$3"
>> +  groff -mdoc -T pdf "$input" > "${pdf}" || die
>> +  mandoc -T html "$filename" > "${html}" || die
>> +}
>> +find . -name gcobol.[13] |
>> +  while read filename
>> +  do
>> +    case ${filename##*.} in
>> +      1)
>> +        cobol_mdoc2pdf_html "$filename" gcobol.pdf gcobol.html
>> +        ;;
>> +      3)
>> +        cobol_mdoc2pdf_html "$filename" gcobol_io.pdf gcobol_io.html
>> +        ;;
>> +    esac
>> +  done
>> +
>> +# Then build a gzipped copy of each of the resulting .html, .ps and .tar files
>> +(
>> +  shopt -s nullglob
>> +  for file in */*.html *.ps *.pdf *.tar; do
>> +    # Tell gzip to produce reproducible zips.
>> +    SOURCE_DATE_EPOCH=1 gzip --best > "$file".gz <"$file"
>> +  done
>> +)
>
> Here you also create tar.gz files. Leaving the .tar archives as is.
> Since the .tar archives are really big already I would remove them
> here, or simply directly create tar.gz files above.
>
>> +# And copy the resulting files to the web server.
>> +while read -d $'\0' -r file; do
>> +  outfile="${outdir}/${file}"
>> +  mkdir -p "$(dirname "${outfile}")" \
>> +    || die "Failed to generate output directory"
>> +  cp "${file}" "${outfile}" \
>> +    || die "Failed to copy ${file}"
>> +done < <(find . \
>> +              -not -path "./gccsrc/*" \
>> +              \( -name "*.html" \
>> +              -o -name "*.png" \
>> +              -o -name "*.css" \
>> +              -o -name "*.js" \
>> +              -o -name "*.txt" \
>> +              -o -name '*.html.gz' \
>> +              -o -name '*.ps' \
>> +              -o -name '*.ps.gz' \
>> +              -o -name '*.pdf' \
>> +              -o -name '*.pdf.gz' \
>> +              -o -name '*.tar' \
>> +              -o -name '*.tar.gz' \
>> +              \) -print0)
>
> So I might suggest to skip *.ps, *.ps.gz and *.tar here.

This would mean diverging from the current practice, which is fine by
me, but I'd like second opinions also (see the contents of
https://gcc.gnu.org/onlinedocs/gcc-15.2.0/ as an example).

Thanks, have a lovely day.
-- 
Arsen Arsenović
