Hi Tilman,

  Thank you for raising this. 3X4JRZZ4TQ2GK4QQDQEXMFCVLM3FM5I4 is not
related to TIKA-3734.  The updated junrar (7.5.0) hits a (new)
exception on this file, swallows it, and stops the parse without
rethrowing.  The earlier version of junrar (7.4.1) did not find a
problem with the file.

  My Ubuntu package util throws an exception on this file, so I think
the file itself is just kind of wonky.

  I'm going to fix the dependency convergence issues.  Is there anything else?

      Best,

                 Tim

On Tue, Apr 26, 2022 at 2:52 PM Tilman Hausherr <thaush...@t-online.de> wrote:
>
> Am 26.04.2022 um 13:07 schrieb Tim Allison:
> > Reports are here:
> > https://corpora.tika.apache.org/base/reports/reports-tika-1.28.2-SNAPSHOT.tgz
> >
> > I found two issues that should be fixed (TIKA-3733 and TIKA-3734).  I
> > think both are related to the underlying parsers being stricter (which
> > is good), but we need to change our code to handle these cases more
> > robustly.
> >
> > Let me know if you see anything else.
>
> What about commoncrawl3/3X/3X4JRZZ4TQ2GK4QQDQEXMFCVLM3FM5I4? This is
> also a rar file and the last entry in content_diffs_no_exceptions.xlsx.
> Is that related to TIKA-3734?
>
> Tilman
>
