Markus Jelsma created TIKA-975:
----------------------------------
Summary: LinkBuilder to optionally collapse anchor whitespace
Key: TIKA-975
URL: https://issues.apache.org/jira/browse/TIKA-975
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Markus Jelsma
Priority: Minor
Fix For: 1.3
Links extracted by the LinkContentHandler contain the verbatim anchor text.
This is usually fine, but many websites spread the anchor text over multiple
lines or indent it with tabs or spaces, so the extracted text carries that
extra whitespace. This patch adds a boolean option to LinkContentHandler that
toggles whitespace collapsing on or off. The default behaviour is unchanged
and the API remains backward compatible.
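
To illustrate the intended behaviour, here is a minimal sketch of whitespace
collapsing behind a boolean switch. It is not the attached patch: the class
AnchorWhitespaceSketch, the method anchorText and the flag name
collapseWhitespace are hypothetical names chosen for this example, and the
sketch assumes that "collapsing" means replacing each run of whitespace with a
single space and trimming both ends.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Tika and not the attached patch.
final class AnchorWhitespaceSketch {

    // One or more whitespace characters (spaces, tabs, line breaks).
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Returns the anchor text verbatim when collapseWhitespace is false,
    // which mirrors the unchanged default described above. Otherwise each
    // whitespace run becomes a single space and the ends are trimmed
    // (an assumption about what "collapsing" should do).
    static String anchorText(String raw, boolean collapseWhitespace) {
        if (!collapseWhitespace) {
            return raw;
        }
        return WHITESPACE_RUN.matcher(raw).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Anchor text as it might appear in indented, multi-line HTML.
        String raw = "\n\t  Apache\n\t  Tika  \n";
        System.out.println("[" + anchorText(raw, true) + "]");   // prints "[Apache Tika]"
        System.out.println(anchorText(raw, false).equals(raw));  // prints "true"
    }
}
```

With the flag off, the caller sees exactly what it sees today, which is how
the option can stay backward compatible.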