I should clarify that I fixed the two regressions that I had
identified in the release candidate.  The regression results that I
shared were run with 1.x before those fixes.

Still, let's fix the dependency convergence, and please let me know if
there's anything else you find in the regression reports!

On Tue, Apr 26, 2022 at 3:40 PM Tim Allison <talli...@apache.org> wrote:
>
> Hi Tilman,
>
>   Thank you for raising this. 3X4JRZZ4TQ2GK4QQDQEXMFCVLM3FM5I4 is not
> related to TIKA-3734.  The updated junrar (7.5.0) is swallowing a
> (new) exception on this file and stopping the parse without throwing
> an exception.  The earlier version of junrar (7.4.1) did not find a
> problem with the file.
>
>   My Ubuntu package util throws an exception on this file, and I think
> the file itself is just kind of wonky.
>
>   I'm going to fix the dependency convergence issues.  Is there anything else?
>
>       Best,
>
>                  Tim
>
> On Tue, Apr 26, 2022 at 2:52 PM Tilman Hausherr <thaush...@t-online.de> wrote:
> >
> > On 26.04.2022 at 13:07, Tim Allison wrote:
> > > Reports are here:
> > > https://corpora.tika.apache.org/base/reports/reports-tika-1.28.2-SNAPSHOT.tgz
> > >
> > > I found two issues that should be fixed (TIKA-3733 and TIKA-3734).  I
> > > think both are related to the underlying parsers being stricter (which
> > > is good), but we need to change our code to handle these cases more
> > > robustly.
> > >
> > > Let me know if you see anything else.
> >
> > What about commoncrawl3/3X/3X4JRZZ4TQ2GK4QQDQEXMFCVLM3FM5I4? This is
> > also a rar file and the last entry in content_diffs_no_exceptions.xlsx.
> > Is that related to TIKA-3734?
> >
> > Tilman
> >
