Hi,

On 2026-02-16 10:39:06 +0300, Nazir Bilal Yavuz wrote:
>  # Run validation only once, common to all subsequent targets.  While
>  # we're at it, also resolve all entities (that is, copy all included
>  # files into one big file).  This helps tools that don't understand
> @@ -108,12 +128,15 @@ postgres_full_xml = custom_target('postgres-full.xml',
>    depfile: 'postgres-full.xml.d',
>    command: [xmllint, '--nonet', '--noent', '--valid',
>              '--path', '@OUTDIR@', '-o', '@OUTPUT@', '@INPUT@'],
> -  depends: doc_generated,
> +  depends: [doc_generated, sgml_check_stamp],
>    build_by_default: false,
>  )
>  docs += postgres_full_xml
>  alldocs += postgres_full_xml

Not sure I love that now the entire docs build fails if there is a tab
anywhere. It's probably ok, but ...

Given you add it as a test for meson and a check for make, why not just make
sgml_check_stamp a dependency of the test (meson) and check targets (make)?


> +postgres_full_xml_target = 'postgres_full_xml'
> +alias_target(postgres_full_xml_target, postgres_full_xml)

Why would we want a top-level postgres_full_xml target?


>  if xsltproc_bin.found()
>    xsltproc_flags = [
> @@ -308,3 +331,13 @@ endif
>  if alldocs.length() != 0
>    alias_target('alldocs', alldocs)
>  endif
> +
> +# Test creation of 'postgres_full_xml'.
> +test(
> +  'doc' / 'sgml_syntax_check',
> +  meson_bin,
> +  args: ['compile', postgres_full_xml_target],
> +  suite: 'doc'
> +)

I'm confused about this 'compile' bit here. What is that intending to do? I
don't have such a 'compile' command in my PATH, so the test fails here...


> diff --git a/doc/src/sgml/sgml_syntax_check.pl 
> b/doc/src/sgml/sgml_syntax_check.pl
> new file mode 100755
> index 00000000000..61b4cd203c0
> --- /dev/null
> +++ b/doc/src/sgml/sgml_syntax_check.pl
> @@ -0,0 +1,115 @@
> +#!/usr/bin/perl
> +#
> +# Perl script that checks SGML/XSL syntax and formatting issues.
> +#
> +# doc/src/sgml/sgml_syntax_check.pl
> +#
> +# Copyright (c) 2026, PostgreSQL Global Development Group
> +
> +use strict;
> +use warnings FATAL => 'all';
> +use Getopt::Long;
> +
> +use File::Find;
> +
> +my $srcdir;
> +my $builddir;
> +my $dep_file;
> +my $stamp_file;
> +
> +GetOptions(
> +     'srcdir:s' => \$srcdir,
> +     'builddir:s' => \$builddir,
> +     'dep_file:s' => \$dep_file,
> +     'stamp_file:s' => \$stamp_file) or die "$0: wrong arguments";

I don't think there's a point in having a dep file here. Dependency files are
only useful if you have dependency on *additional* files to the ones
explicitly passed in (e.g. a .c file may depend on headers, but the headers
aren't listed as arguments to the compiler).


> +die "$0: --srcdir must be specified\n" unless defined $srcdir;
> +
> +# Find files to process - all the sgml and xsl files (including in 
> subdirectories)
> +my @files_to_process;
> +my @dirs_to_search = ($srcdir);
> +push @dirs_to_search, $builddir if defined $builddir;
> +find(
> +     sub {
> +             return unless -f $_;
> +             return if $_ !~ /\.(sgml|xsl)$/;
> +             push @files_to_process, $File::Find::name;
> +     },
> +     @dirs_to_search,);

Why is this any more complicated than just opening the file exactly in the
passed in source dir?


> +# tabs and non-breaking spaces are harmless, but it is best to avoid them in 
> SGML files
> +sub check_tabs_and_nbsp
> +{
> +     my $errors = 0;
> +     for my $f (@files_to_process)
> +     {
> +             open my $fh, "<:encoding(UTF-8)", $f or die "Can't open $f: $!";
> +             my $line_no = 0;
> +             while (<$fh>)
> +             {
> +                     $line_no++;
> +                     if (/\t/)
> +                     {
> +                             print STDERR "Tab found in $f:$line_no\n";
> +                             $errors++;
> +                     }
> +                     if (/\xC2\xA0/)
> +                     {
> +                             print STDERR "$f:$line_no: contains 
> non-breaking space\n";
> +                             $errors++;
> +                     }
> +             }
> +             close($fh);
> +     }
> +
> +     if ($errors)
> +     {
> +             remove_stamp_file();

I don't think that's needed, the stamp file needs to be newer than the inputs
to have an effect, and here you're just removing an old one, no?


> +sub create_stamp_file
> +{
> +     # Avoid touching existing stamp file to prevent unnecessary rebuilds
> +     if (!(-f $stamp_file))
> +     {
> +             open my $fh, '>', $stamp_file
> +               or die "can't open $stamp_file: $!";
> +             close $fh;
> +     }
> +}

Won't that *cause* rebuilds? With a stamp file you normally want the stamp
file to be *newer* than the inputs. Which it won't be, if you don't touch it
here.

The only reason it doesn't cause quick rebuilds with meson is that ninja
remembers the timestamps of files an avoids rebuilds if they haven't changed.

Greetings,

Andres Freund


Reply via email to