Hello all,

Now that we are using the more modern tool onsgmls instead of nsgmls in our "validate" script:
    https://anonscm.debian.org/cgit/debwww/cron.git/tree/scripts/validate

I've returned to this bug. The output of the validate script for the files containing "emojis" didn't change much:

    **** Errors validating /srv/www.debian.org/www/international/l10n/po/en_GB.it.html: ***
    Line 122, character 357: cannot convert character reference to number 128513 because character not in internal character set

I was a bit surprised that we are still getting these errors, because I don't get any error if I run the file through the online W3C validator at https://validator.w3.org/, nor if I run onsgmls by hand on the machine that builds the website:

    onsgmls -E0 -s /path/to/dtd /path/to/file

So I've looked at the "validate" script and played a bit with the options set there, and I'd like to bring lines 363-376 to your attention:

    # Determine whether we're dealing with HTML or XHTML and set the SP
    # environment accordingly.
    if ($xhtml{$htmlLevel}) {
        $ENV{'SGML_CATALOG_FILES'} = $xhtmlCatalog;
        $ENV{'SP_ENCODING'} = 'xml';
    } else {
        $ENV{'SGML_CATALOG_FILES'} = $htmlCatalog;
        if (defined $charset) {
            $ENV{'SP_ENCODING'} = $charset;
        } else {
            $ENV{'SP_ENCODING'} = "ISO-8859-1";
        }
    }
    $ENV{'SP_CHARSET_FIXED'} = 1;

If I comment out this last line (thus letting onsgmls run in non-fixed mode), I get no errors validating the file.

I've read the documentation about these options:

    http://openjade.sourceforge.net/doc/charset.htm

but frankly I don't understand it very well.

I've run:

    larjona@wolkenstein:~$ sudo -u debwww env | grep SP_

and it returns nothing, so I guess only the environment set in the "validate" script is taken into account; if we don't set the variables there, the defaults apply.

I've also modified and run a copy of the validate script, making it print some values when checking a file. The document type is correctly detected (HTML 4.01 Strict), as well as the charset (utf-8).
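
To make the comparison above repeatable, the two runs could be scripted roughly as follows. This is only a sketch and not part of the validate script: the function name run_validation and the ONSGMLS variable are hypothetical, SP_ENCODING is hard-coded to the utf-8 value detected for the failing page, and SGML_CATALOG_FILES is left as inherited from the calling environment (as in the manual run).

```shell
#!/bin/sh
# Sketch only: run onsgmls on one file in fixed and in non-fixed character
# set mode, so the two outputs can be compared side by side.
# ONSGMLS and run_validation are hypothetical names, not part of "validate".

ONSGMLS="${ONSGMLS:-onsgmls}"

# run_validation MODE DTD FILE
#   MODE "fixed":  SP_CHARSET_FIXED=1, as the validate script sets it.
#   anything else: SP_CHARSET_FIXED unset, so the onsgmls default applies.
run_validation() {
    mode="$1"
    shift
    (
        # The subshell keeps these settings from leaking into the caller.
        export SP_ENCODING="utf-8"   # charset detected for the failing page
        if [ "$mode" = "fixed" ]; then
            export SP_CHARSET_FIXED=1
        else
            unset SP_CHARSET_FIXED
        fi
        # Same options as the manual command: -E0 -s
        "$ONSGMLS" -E0 -s "$@"
    )
}
```

It would be called like this, with the same placeholder paths as the manual command:

    run_validation fixed   /path/to/dtd /path/to/file
    run_validation default /path/to/dtd /path/to/file

If only the first call reports the error about character 128513, that would point at SP_CHARSET_FIXED rather than at the encoding or the catalog.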

I'm not sure whether I can safely comment out line 376:

    $ENV{'SP_CHARSET_FIXED'} = 1;

to avoid the errors, or even comment out the whole paragraph and trust onsgmls to do the right thing. Can anybody with more experience in this help?

Thanks

-- 
Laura Arjona Reina
https://wiki.debian.org/LauraArjona