[ https://issues.apache.org/jira/browse/TIKA-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342009#comment-14342009 ]
Tyler Palsulich commented on TIKA-381:
--------------------------------------

This is still an issue in 1.8-SNAPSHOT.

{code}<a href="http://goog le.com">link</a>{code}

turns into

{{<a shape="rect" href="http://goog le.com">link</a>}}

> HtmlParser should strip linefeeds out of links
> ----------------------------------------------
>
>                 Key: TIKA-381
>                 URL: https://issues.apache.org/jira/browse/TIKA-381
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.8
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>
> A number of HTML pages contain links where the URL has a linefeed in the
> middle of it.
> Browsers such as Firefox will automatically remove the character but Tika
> passes it back, which results in a broken URL.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
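A minimal Java sketch of the cleanup this issue asks for, assuming it is applied to the raw href attribute value before HtmlParser emits it. `HrefCleaner` and `clean` are hypothetical names, not part of Tika's API. The two steps mirror the pre-parsing rules of the WHATWG URL Standard, which browsers that follow it apply: trim leading and trailing C0 controls and spaces, then remove every ASCII tab or newline left inside the value.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Tika's API): cleans an href value the way
 * the WHATWG URL Standard's parser does before it parses the URL.
 */
public final class HrefCleaner {

    // ASCII tab (U+0009), LF (U+000A) and CR (U+000D).
    private static final Pattern TAB_OR_NEWLINE = Pattern.compile("[\\t\\n\\r]");

    private HrefCleaner() {
    }

    public static String clean(String href) {
        if (href == null) {
            return null;
        }
        // Step 1: trim leading/trailing C0 controls and spaces (U+0000..U+0020).
        int start = 0;
        int end = href.length();
        while (start < end && href.charAt(start) <= ' ') {
            start++;
        }
        while (end > start && href.charAt(end - 1) <= ' ') {
            end--;
        }
        // Step 2: remove every tab or newline remaining inside the value.
        return TAB_OR_NEWLINE.matcher(href.substring(start, end)).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "http://goog\nle.com";
        System.out.println(clean(raw)); // prints http://google.com
    }
}
```

Note that an interior space is left untouched (`"http://goog le.com"` comes back unchanged), matching the standard, which only strips tabs and newlines from inside the URL.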