Control: tags -1 moreinfo

On 2015-02-22 05:26, Kevin Ryde wrote:
> Package: lintian
> Version: 2.5.30+deb8u3
> Severity: wishlist
> Tags: patch
>
> If a .html file is in a package then usually its <img> files should be
> in the package too so it displays nicely.  I suggest the few lines below
> to check this.
>
> Without picking on any particular maintainers, missing images can be
> found in for example
>
>   * whizzytex where /usr/share/doc/whizzytex/whizzytex.html is missing
>     whizzytex001.png (and two others)
>   * texlive-pictures-doc (very big) where
>     /usr/share/doc/texlive-doc/latex/mathspic/sourcecode113.html is
>     missing a fig1.jpg deep in its detailed description
>
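For readers following along, the idea proposed above can be sketched
roughly as below.  This is only an illustration in Python, not the
submitted patch and not lintian's Perl code; the names ImgCollector and
missing_images, and the representation of a package as a set of relative
paths, are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote
import posixpath


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def missing_images(html_path, html_text, package_files):
    """Return relative image targets of html_path absent from package_files.

    package_files is assumed to be a set of package-relative paths.
    """
    parser = ImgCollector()
    parser.feed(html_text)
    parser.close()
    base = posixpath.dirname(html_path)
    missing = []
    for src in parser.sources:
        parts = urlsplit(src)
        # Remote, data: and absolute references are out of scope here.
        if parts.scheme or parts.netloc or src.startswith("/"):
            continue
        if not parts.path:
            continue
        target = posixpath.normpath(posixpath.join(base, unquote(parts.path)))
        if target not in package_files:
            missing.append(target)
    return missing
```

Note that this only looks inside a single package; images shipped by a
dependency would be reported as missing unless the caller merges the
file lists first.
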
Hi Kevin,

Thanks for writing a lintian check for this.  It is indeed an
interesting proposal.

I do have some concerns on the performance front.  On some packages,
this will be the "second slowest" check, taking 10s or more.  E.g.
lazarus-doc and php5-doc contain quite a few HTML files[1].  It is
possible that /some/ of this will be solved by merging it with the code
from checks/files.pm that does some checking of HTML files (that would
at least save reading each file twice).

> I'm unsure if my code notices images supplied by dependent packages.
> I put a group bit like the manpages and symlinks checks, but I don't
> really understand when packages are a group.  Eg. per html.pm comments,
> texlive-lang-french uses images from texlive-base and has a correct
> declared dependency, but I couldn't make the right incantation to have
> it recognised :-(.
>

I suspect it is correct.  However, it requires that the binaries are
built from the same source.  Accordingly, it would never work with
texlive-lang-french and texlive-base, as they are from different source
packages.

> Incidentally HTML::Parser would be a more reliable html parse of course.
> But are lintian dependencies supposed to be kept down?  I see another
> rough html parse in files.pm for privacy breaches.  A good parse might
> help accuracy there against obscure quoting or escaping.
>

Depends on what we are pulling in.  The libhtml-parser-perl (and
libhtml-tagset-perl) packages seem (at first glance) to increase the
footprint by 0.3MB.  With it already being in stable, it is likely to be
an acceptable extra dependency.

To be honest, I am also interested in the performance characteristics of
using HTML::Parser over the current approach, especially if it can be
used to enhance the performance of our similar checks (e.g. the privacy
breaker one in c/files.pm).

> I thought separate html.pm script to leave room for other checks related
> to html parse (whatever method).
> Maybe similar treatment of css or
> javascript (though I don't rate those), even some href checking.  No
> full link checker, but detect document parts apparently missing from a
> package.
>
> [...]
>

That could make sense - I am thinking it would make sense to move the
privacy breaker checks into the html check file as well.  Currently, it
scans all files matching:

  $fname =~ m,\.(?:x?html?|js|xht|xml|css)$,i

Which seems fairly compatible with an html check.

Thanks,
~Niels

[1] A slightly longer list of packages to choose from:

  23437 freefoam-dev-doc
  19348 lazarus-doc-1.2.4
  18266 libreoffice-dev-doc
  17346 libgcj-doc
  13280 libboost1.55-doc
  13159 libboost1.54-doc
  12532 php-doc
  12455 vtk6-doc
  12285 fp-docs-2.6.4
  11929 openjdk-8-doc
  10873 liblapack-doc
  10845 openjdk-7-doc
  10288 openjdk-6-doc
  10163 vtk-doc
   9473 pike7.8-reference

Computed by:

  apt-file search '.htm' | grep -E '\.html?$' | cut -f1 -d':' | \
    sort | uniq -c | sort --numeric --reverse
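
As an aside on the file-name pattern quoted above: a rough Python
equivalent is sketched below, to make it easy to see which names the
existing scan would pick up.  It is not lintian code; the names
SCANNED_NAME, is_scanned and select_scanned are made up for the example.

```python
import re

# Python rendering of the Perl pattern m,\.(?:x?html?|js|xht|xml|css)$,i
SCANNED_NAME = re.compile(r"\.(?:x?html?|js|xht|xml|css)$", re.IGNORECASE)


def is_scanned(fname):
    """True if fname ends in one of the extensions covered by the pattern."""
    return SCANNED_NAME.search(fname) is not None


def select_scanned(file_names):
    """Keep only the names the pattern matches, preserving input order."""
    return [name for name in file_names if is_scanned(name)]
```

The pattern covers .htm, .html, .xhtm, .xhtml, .xht, .js, .xml and .css
in any letter case; compressed files such as page.html.gz do not match
because the extension must come at the very end of the name.
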