[ 
https://issues.apache.org/jira/browse/SOLR-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058339#comment-16058339
 ] 

Hoss Man commented on SOLR-10934:
---------------------------------

based on the code in this SO post, it looks like we should be able to..

* loop over all PDAnnotations in each PDPage 
* if the annotation is a PDAnnotationLink then we can access its PDAction and 
PDDestination
* PDActionURI is an external link, PDActionGoTo is an inter-document link
* PDActionGoTo can point at either a PDPageDestination (page num?) or a 
PDNamedDestination (named anchor?)
* we look up PDNamedDestination instances in the document catalog.

that _should_ enable us to vet that all inter-document links point to a valid 
anchor.
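
once the link targets and the catalog's destination names have been pulled 
out of the PDF, the vetting step itself is just a set-membership check. a 
minimal sketch of that step in plain Java, assuming the extraction has already 
happened (the PDFBox calls are deliberately not shown, and AnchorCheckSketch, 
GoToLink and findBrokenLinks are hypothetical names, not existing Solr or 
PDFBox code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the Solr build or from PDFBox.
class AnchorCheckSketch {

    /** One GoTo link: the page it was found on and the named destination it targets. */
    static final class GoToLink {
        final int pageNumber;
        final String targetName;

        GoToLink(int pageNumber, String targetName) {
            this.pageNumber = pageNumber;
            this.targetName = targetName;
        }

        @Override
        public String toString() {
            return "page " + pageNumber + " -> #" + targetName;
        }
    }

    /** Returns, in input order, every link whose target is not a known named destination. */
    static List<GoToLink> findBrokenLinks(Collection<GoToLink> links,
                                          Set<String> knownDestinations) {
        List<GoToLink> broken = new ArrayList<>();
        for (GoToLink link : links) {
            if (!knownDestinations.contains(link.targetName)) {
                broken.add(link);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real checker would fill these in from the
        // PDAnnotationLink / PDNamedDestination objects described above.
        Set<String> known = new HashSet<>(Arrays.asList("about-this-guide", "solr-glossary"));
        List<GoToLink> links = Arrays.asList(
            new GoToLink(3, "solr-glossary"),
            new GoToLink(7, "no-such-anchor"));

        for (GoToLink link : findBrokenLinks(links, known)) {
            System.out.println("BROKEN: " + link);
        }
    }
}
```

with the sample data above this prints a single line, 
"BROKEN: page 7 -> #no-such-anchor". keeping the page number on each link is 
only there so a failure message can say where the bad link lives.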

one thing i'm not sure about is if it would be possible to check for the 
"anchor used more than once in diff adoc files" type problem -- i suspect that 
the catalog's list of PDNamedDestination doesn't allow dups, so that info may 
already be lost as part of the PDF creation??

> create a link+anchor checker for the ref-guide PDF using PDFBox
> ---------------------------------------------------------------
>
>                 Key: SOLR-10934
>                 URL: https://issues.apache.org/jira/browse/SOLR-10934
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: documentation
>            Reporter: Hoss Man
>
> We currently have CheckLinksAndAnchors.java which is automatically run 
> against the ref-guide HTML as part of the build to use JSoup to find bad 
> links/anchors that asciidoctor doesn't complain about -- but not everyone 
> does/can build the HTML version of the ref-guide since it requires 
> manually installing jekyll.
> The PDF build only requires things installed by ivy (via JRuby) and we 
> already have some PDFBox based code in ReducePDFSize.java that operates on 
> this PDF every time it's run -- so if we can find a way to do similar checks 
> using the PDFBox API we could catch these broken links faster.



