[ https://issues.apache.org/jira/browse/TIKA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769215#comment-17769215 ]
Tim Allison commented on TIKA-2562:
-----------------------------------

We're now getting this with the migration to JSoup in the 3.x/main branch. Does this look right?

<p>Lorem ipsum dolor sit amet, consectetur adipiscing laborum.</p>
<a href="http://www.google.com">http://www.google.com</a>
<a href="https://mail.google.com/mail/?tab=wm">https://mail.google.com/mail/?tab=wm</a>

> tika server parse HTML removes DIVs around hyperlink & adds shape
> -----------------------------------------------------------------
>
>                 Key: TIKA-2562
>                 URL: https://issues.apache.org/jira/browse/TIKA-2562
>             Project: Tika
>          Issue Type: Bug
>          Components: gui, parser, server
>    Affects Versions: 1.17
>            Reporter: NW Brad
>            Priority: Major
>         Attachments: tika_adds_shape_to_hyperlink.html
>
>
> Hyperlinks in an HTML document that is parsed via tika server:
> curl -X PUT --upload-file tika_adds_shape_to_hyperlink.html http://localhost:9998/tika --header "Accept: text/html"
> sent:
> <div>
> <a href="http://www.google.com">http://www.google.com</a>
> </div>
> received back:
> <a shape="rect" href="http://www.google.com">http://www.google.com</a>
>
> The divs are gone and a shape attribute has been added.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
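To check a parse result for the two reported symptoms (the wrapping <div> dropped, a shape attribute added to <a>), a minimal sketch using Python's standard-library html.parser could look like this. The helper name audit_links and its returned keys are hypothetical and not part of Tika; the two input strings are the "sent" and "received back" snippets from the report.

```python
from html.parser import HTMLParser


class _AnchorAudit(HTMLParser):
    """Counts <div> start tags and records the attributes of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.div_count = 0
        self.anchors = []  # one {attr_name: value} dict per <a> tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <DIV> matches too.
        if tag == "div":
            self.div_count += 1
        elif tag == "a":
            self.anchors.append(dict(attrs))


def audit_links(html):
    """Hypothetical helper: report div count, shape attributes, and hrefs."""
    parser = _AnchorAudit()
    parser.feed(html)
    parser.close()
    return {
        "divs": parser.div_count,
        "shape_attrs": [a["shape"] for a in parser.anchors if "shape" in a],
        "hrefs": [a.get("href") for a in parser.anchors],
    }


# Snippets taken from the bug report (input sent vs. 1.17 output).
sent = '<div>\n<a href="http://www.google.com">http://www.google.com</a>\n</div>'
received = '<a shape="rect" href="http://www.google.com">http://www.google.com</a>'

print(audit_links(sent))
# -> {'divs': 1, 'shape_attrs': [], 'hrefs': ['http://www.google.com']}
print(audit_links(received))
# -> {'divs': 0, 'shape_attrs': ['rect'], 'hrefs': ['http://www.google.com']}
```

The same call can be pointed at the 3.x output quoted in the comment above to see which of the two symptoms remain after the JSoup migration.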