Yes, those are embedded files within the zip (e.g., "/shuffle.bmp"). I should add "embedded resource path" and/or "resource name" as a field in that report.
I'll open a ticket for that. Those exceptions (at least the ones for '43R5U3BXJUDJXDZ25OAE33ZU47362WLV') are actually new... I need to figure out why we are getting them now but weren't with 2.8.0.

Thank you!

On Wed, Jul 19, 2023 at 11:47 PM Tilman Hausherr wrote:
> In the new_exceptions_in_B_details.xlsx file,
> commoncrawl3/43/43R5U3BXJUDJXDZ25OAE33ZU47362WLV is listed as "bmp", but
> it is a zip file. And I get no exception when trying to extract all
> attachments with the -z option.
>
> Same for commoncrawl3/M4/M4J5KAPEC5F62UXFNCPRQATQWH3FSWPG
>
> Tilman
>
> On 19.07.2023 19:19, Tim Allison wrote:
> > Results are here:
> > https://corpora.tika.apache.org/base/reports/tika-2.8.1-pre-rc1.tgz
> >
> > This is on a new set of ~1.3 million files, including fewer truncated PDFs.
> >
> > I've only had a chance to look quickly. No showstoppers leapt out at me.
> > There are some expected differences, and a couple of surprises. I'm going
> > to dig a bit tomorrow and then start the release process unless anyone
> > finds anything concerning or has a blocker.
> >
> > Thank you, all!
> >
> > Best,
> >
> > Tim
> >
> > On Thu, Jul 13, 2023 at 7:00 PM Tim Allison wrote:
> >
> >> All,
> >> I think we’re at a good place for a minor version release? Should I
> >> start the regression tests tomorrow for potential release next week or
> >> week after?
> >> Any blockers or things we should try to get in?
> >>
> >> Thank you!
> >>
> >> Best,
> >>
> >> Tim
> >>
> >> On Thu, Jul 13, 2023 at 5:20 PM Nicolò Mendola (Jira) wrote:
> >>
> >>> [ https://issues.apache.org/jira/browse/TIKA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742957#comment-17742957 ]
> >>>
> >>> Nicolò Mendola edited comment on TIKA-4064 at 7/13/23 9:19 PM:
> >>> ---------------------------------------------------------------
> >>>
> >>> Just out of interest, is there an eta for release 2.8.1 to be published?
> >>>
> >>> Best regards
> >>>
> >>>
> >>> was (Author: JIRAUSER296595):
> >>> Just out of interest, is there an eta for release 2.8.1 to be published?
> >>>
> >>> Best regards
> >>>
> >>>> Update to 2.8.1
> >>>> ---------------
> >>>>
> >>>> Key: TIKA-4064
> >>>> URL: https://issues.apache.org/jira/browse/TIKA-4064
> >>>> Project: Tika
> >>>> Issue Type: Task
> >>>> Components: build
> >>>> Affects Versions: 2.8.0
> >>>> Reporter: Tilman Hausherr
> >>>> Priority: Minor
> >>>> Fix For: 2.8.1
> >>>>
> >>>>
> >>>> The latest maven versions plugin finds much more outdated stuff than
> >>>> the previous one, so I'll do a few updates.
> >>>
> >>>
> >>>
> >>> --
> >>> This message was sent by Atlassian Jira
> >>> (v8.20.10#820010)
