[jira] [Updated] (SOLR-10934) create a link+anchor checker for the ref-guide PDF using PDFBox

Hoss Man (JIRA) Wed, 01 Nov 2017 16:13:27 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hoss Man updated SOLR-10934:
----------------------------
    Attachment: SOLR-10934.patch

bq. What we might want to consider, is refactoring our build.xml, so that the 
same <asciidoctor:convert/> task options used to generate the PDF, could also be 
used to generate a bare bones version of the html-site – ie: not using jekyll, 
just using raw asciidoctor with the "html5" output option. Then we could (in 
theory) run the same HTML link checking code we currently have against that 
output dir – just for the purpose of checking the links, not with any plan to 
ever publish it.

I'm attaching a patch that takes this approach -- I think it works pretty well.

Unfortunately refactoring just the build.xml file proved to be insufficient to 
be able to re-use the existing {{<asciidoctor:convert>}} in a macro because of 
how the underlying Task class works -- it has some hard assumptions about XML 
element attributes like "sourceDocumentName" not being used even if they are the 
empty string because of ant property expansion -- but i was able to deal with 
that by adding our own little AntTask subclass into the tools jar.

i also did a little more refactoring of the build.xml file so building both 
the PDF & jekyll site via {{ant}} wouldn't waste time redundantly also 
building & validating the bare-bones HTML version. (unfortunately if you 
explicitly run {{ant build-pdf build-site}} this still happens, but hey: baby 
steps)

like the previous patch, this includes some "nocommit" annotated intentional 
anchor/link errors in the {{*.adoc}} files.  If you apply the patch as is, and 
run {{ant}} or {{ant build-pdf}} or {{ant build-site}} you'll get all the same 
validation errors that we want to see happen with this kind of bad content.  If 
you revert the {{solr/solr-ref-guide/src}} changes then everything will start 
building happily.
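
For anyone unfamiliar with what this kind of validation looks for, here is a 
minimal, hypothetical sketch. It is not the actual CheckLinksAndAnchors.java 
(which uses JSoup on the generated HTML); the class name AnchorCheckSketch, the 
regexes, and the sample HTML are all assumptions made for illustration, and it 
only handles same-page {{#anchor}} links with double-quoted attributes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch only; the real checker parses HTML with JSoup. */
public class AnchorCheckSketch {
  // Matches id="..." attributes, i.e. anchor definitions (assumes double quotes).
  private static final Pattern ID = Pattern.compile("\\sid=\"([^\"]+)\"");
  // Matches href="#..." attributes, i.e. same-page links.
  private static final Pattern LOCAL_HREF = Pattern.compile("\\shref=\"#([^\"]+)\"");

  /** Returns same-page link targets that have no matching id in the page. */
  public static List<String> findBrokenLinks(String html) {
    // First pass: collect every anchor id defined in the page.
    Set<String> ids = new HashSet<>();
    Matcher idMatcher = ID.matcher(html);
    while (idMatcher.find()) {
      ids.add(idMatcher.group(1));
    }
    // Second pass: report every local link whose target id was never defined.
    List<String> broken = new ArrayList<>();
    Matcher linkMatcher = LOCAL_HREF.matcher(html);
    while (linkMatcher.find()) {
      if (!ids.contains(linkMatcher.group(1))) {
        broken.add(linkMatcher.group(1));
      }
    }
    return broken;
  }

  public static void main(String[] args) {
    String html = "<h2 id=\"intro\">Intro</h2>"
        + "<a href=\"#intro\">ok</a><a href=\"#missing\">bad</a>";
    System.out.println("broken: " + findBrokenLinks(html));
  }
}
```

A regex pass like this is far cruder than a real HTML parser; it is only meant 
to show the two-pass idea of collecting anchors first and then checking links.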

what do folks think of this approach?



> create a link+anchor checker for the ref-guide PDF using PDFBox
> ---------------------------------------------------------------
>
>                 Key: SOLR-10934
>                 URL: https://issues.apache.org/jira/browse/SOLR-10934
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: documentation
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-10934.patch, SOLR-10934.patch
>
>
> We currently have CheckLinksAndAnchors.java which is automatically run 
> against the ref-guide HTML as part of the build to use JSoup to find bad 
> links/anchors that asciidoctor doesn't complain about -- but not everyone 
> does/can build the HTML version of the ref-guide since it requires 
> manually installing jekyll.
> The PDF build only requires things installed by ivy (via JRuby) and we 
> already have some PDFBox based code in ReducePDFSize.java that operates on 
> this PDF every time it's run -- so if we can find a way to do similar checks 
> using the PDFBox API we could catch these broken links faster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
