Package: www.debian.org
User: www.debian....@packages.debian.org
Usertag: scripts
Severity: important

Hi all,

I'm starting to work on bug #980921 (Pages in HTML5) and, as mentioned there, we need to adapt our "validate" script so that it correctly processes the pages declared as HTML5 (currently, only the homepage in the different languages).

The current status is as follows.

Related scripts:

https://salsa.debian.org/webmaster-team/cron/-/blob/master/lessoften
is executed once a day and calls (via run-parts) the following script:

https://salsa.debian.org/webmaster-team/cron/-/blob/master/scripts/999Xvalidate
which gets the list of languages and folders to process and then calls:

https://salsa.debian.org/webmaster-team/cron/-/blob/master/scripts/validate
which is the actual script doing the HTML validation, using the onsgmls command (part of the opensp package). This command validates an SGML file against a DTD.

The issue (as far as I know) is that there is no "official" SGML DTD template to use when parsing HTML5 files.

I have tried adapting the "validate" script so that it recognizes the DOCTYPE header used for HTML5 files, and then tried to pass it a DTD (I tried downloading the ones at http://sgmljs.net/docs/w3c-html5-dtd.html, http://sgmljs.net/docs/w3c-html52-dtd.html and https://jkorpela.fi/html5-dtd.html), but I couldn't make it work, and I was also not convinced that it is the best approach.

I've looked at what the W3C validator uses, and it is the Nu checker:
https://validator.w3.org/nu/about.html
https://github.com/validator/validator/releases/latest

But I'm not sure whether it is packaged in Debian in any of its flavours. I have searched https://packages.debian.org/search?keywords=html5 but none of the results looks like a command-line tool that we could call instead of onsgmls.

So I don't know what to do at this point. On my local machine, I have downloaded the vnu.jar file from the latest Nu checker release and tried to validate files with it, and it works.
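
For illustration, a minimal sketch of how the DOCTYPE check and the vnu.jar call could fit together. It has not been tested against the real "validate" script. The function names, the VNU_JAR location and the choice to look only at the first lines of the file are assumptions; --errors-only and --format are options of the vnu.jar command-line client.

```shell
#!/bin/sh
# Assumption: vnu.jar sits next to the script unless VNU_JAR says otherwise.
VNU_JAR="${VNU_JAR:-./vnu.jar}"

# is_html5 FILE
# Succeeds if one of the first lines of FILE is the HTML5 doctype
# (<!DOCTYPE html>, case-insensitive). Older doctypes such as
# <!DOCTYPE HTML PUBLIC "..."> do not match, because something
# other than ">" follows the word "html".
is_html5() {
    head -n 5 "$1" | grep -qiE '<!DOCTYPE[[:space:]]+html[[:space:]]*>'
}

# validator_for FILE
# Prints the name of the tool that would be used for FILE.
validator_for() {
    if is_html5 "$1"; then
        echo vnu
    else
        echo onsgmls
    fi
}

# validate_html5 FILE
# Runs the Nu checker on FILE, reporting errors only, one per line.
validate_html5() {
    java -jar "$VNU_JAR" --errors-only --format gnu "$1"
}
```

With this, the existing onsgmls call would stay untouched for every page where validator_for prints "onsgmls", and only the HTML5 pages would go through validate_html5.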
But I don't know if asking DSA to install openjdk on www-master and including a copy of vnu.jar in our cron scripts is good and/or elegant.

Opinions, advice and patches are very welcome.

Meanwhile, I guess we can modify 999Xvalidate to add file exclusions and exclude, for now, /index.*.html, and later the few other files we have with HTML5 tags. I don't know how to exclude the index.*.html files in the top folder only and not in subfolders, but I guess playing with find -wholename and -prune will do the trick (if you know, please go ahead).

Kind regards,
--
Laura Arjona
https://wiki.debian.org/LauraArjona
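
One possible sketch of the top-level-only exclusion asked about above, not checked against the real 999Xvalidate. The function name and the way the directory is passed in are hypothetical. It uses -path (the POSIX spelling of GNU find's -wholename) and does not need -prune, because only files are being filtered, not whole directories.

```shell
#!/bin/sh
# list_html_to_validate DIR
# Prints every *.html file under DIR except the index.*.html files
# that sit directly in DIR. index.*.html files in subfolders are kept.
# Assumption: DIR contains no glob characters such as * ? [ ].
list_html_to_validate() {
    dir=${1%/}
    # A file is skipped only if it matches DIR/index.*.html AND is not
    # inside a subdirectory of DIR (i.e. does not match DIR/*/*).
    find "$dir" -type f -name '*.html' \
        ! \( -path "$dir/index.*.html" ! -path "$dir/*/*" \)
}
```

For a language folder this would be called as, for example, list_html_to_validate /path/to/english, where the path is only a placeholder.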