[ https://issues.apache.org/jira/browse/TIKA-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950951#comment-17950951 ]
Tim Allison commented on TIKA-4411:
-----------------------------------
K. The xhtml issue appears to be a difference in how jsoup 1.18.3 and jsoup
1.19.1 handle broken xhtml.
Note: we're now using the latest version of jsoup, 1.20.1, which still has the
1.19.1 behavior.
The publicly available example file is here:
https://bug1554250.bmoattachments.org/attachment.cgi?id=9068831
If anyone wants to dig into this and open an issue on jsoup (if there's a
problem?!), please go for it.
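For anyone who does want to narrow this down, here is a minimal sketch. It assumes you have saved the extracted output for the example file twice, once with jsoup 1.18.3 on the classpath and once with 1.19.1, and it reports the first line where the two outputs diverge. The class name `FirstDiff` and the two file arguments are hypothetical and not part of Tika or jsoup.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class FirstDiff {

    /**
     * Returns the 0-based index of the first line that differs between the
     * two outputs, or -1 if they are identical. If one output is a strict
     * prefix of the other, the index of the first extra line is returned.
     */
    public static int firstDifference(List<String> a, List<String> b) {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : shared;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: FirstDiff <output-old.txt> <output-new.txt>");
            return;
        }
        // Hypothetical inputs: extraction output saved under each jsoup version.
        List<String> oldOut = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> newOut = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);

        int idx = firstDifference(oldOut, newOut);
        if (idx < 0) {
            System.out.println("No differences");
            return;
        }
        // Print both sides of the first divergence (1-based line number for readability).
        System.out.println("First difference at line " + (idx + 1));
        System.out.println("old: " + (idx < oldOut.size() ? oldOut.get(idx) : "<end of file>"));
        System.out.println("new: " + (idx < newOut.size() ? newOut.get(idx) : "<end of file>"));
    }
}
```

The line around the first divergence is usually enough to build a small reproducer for a jsoup issue, if it turns out to be one.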
I don't think this is a significant enough difference to warrant downgrading
jsoup to 1.18.3.
I'll start the 3.2.0 release process shortly. I'm happy to respin if anyone
disagrees or would prefer a different solution.
Onwards!
> Run the 3.2.0 release process
> -----------------------------
>
> Key: TIKA-4411
> URL: https://issues.apache.org/jira/browse/TIKA-4411
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: reports-3.2.0-pre-rc1.tgz, reports-3.2.0.tgz
>
>