[VOTE] Release Apache Tika 2.4.0 Candidate #1

2022-04-28 Thread Tim Allison
A candidate for the Tika 2.4.0 release is available at: https://dist.apache.org/repos/dist/dev/tika/2.4.0 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/2.4.0-rc1/ The SHA-512 checksum of the archive is
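Reviewers can compare a downloaded candidate against the published SHA-512 using only the JDK. The sketch below makes a few assumptions: the archive path and the expected checksum are passed on the command line, and the class name `Sha512Check` is hypothetical. The checksum value itself is truncated in the message above and is not reproduced here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Check {

    // Streams the file through a SHA-512 digest and returns the lowercase hex string.
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        return toHex(digest.digest());
    }

    // Converts raw digest bytes to two lowercase hex characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded archive; args[1]: the published SHA-512 (hex)
        String actual = sha512Hex(Paths.get(args[0]));
        boolean ok = actual.equalsIgnoreCase(args[1].trim());
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH: " + actual);
    }
}
```

After compiling with `javac`, run it as `java Sha512Check <archive.zip> <expected-sha512>`. Streaming through a fixed buffer keeps memory use flat regardless of archive size. On Linux, `sha512sum <archive.zip>` produces the same hex digest.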

[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Dan Coldrick (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529667#comment-17529667 ] Dan Coldrick commented on TIKA-3742: [~nick] I've made a start today which I can share at some point

Re: Next releases WAS: Re: 2.4.0 release?

2022-04-28 Thread Tim Allison
https://repository.apache.org is having a bad day. Requests are timing out left and right. I'll try to perform the release of 2.4.0-rc1 later today or tomorrow when the repo is happier. On Thu, Apr 28, 2022 at 9:47 AM Tim Allison wrote: > > I've upgraded junrar in both branches, and the

[jira] [Commented] (TIKA-3743) github actions -- we should install

2022-04-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529551#comment-17529551 ] Hudson commented on TIKA-3743: SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #533 (See

Re: [VOTE] Release Apache Tika 1.28.2 Candidate #2

2022-04-28 Thread Tilman Hausherr
+1 Tilman On 28.04.2022 at 16:54, Tim Allison wrote: A candidate for the Tika 1.28.2 release is available at: https://dist.apache.org/repos/dist/dev/tika/1.28.2 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/1.28.2-rc2/ The SHA-512

[jira] [Commented] (TIKA-3743) github actions -- we should install

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529503#comment-17529503 ] Tim Allison commented on TIKA-3743: Hahahahaha. That didn't work. {noformat} [INFO]

[jira] [Updated] (TIKA-3743) github actions -- we should install

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3743: Attachment: Screenshot from 2022-04-28 11-39-16.png > github actions -- we should install >

Re: How to deal with the recursive content in Tika 2

2022-04-28 Thread Sergey Beryozkin
Great, will give it a try asap. Cheers, Sergey On Thu, Apr 28, 2022 at 4:22 PM Tim Allison wrote: > Give this a try: > > https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java#L60 > > On Thu, Apr 28, 2022 at 11:12 AM Sergey Beryozkin >

[jira] [Created] (TIKA-3743) github actions -- we should install

2022-04-28 Thread Tim Allison (Jira)
Tim Allison created TIKA-3743: Summary: github actions -- we should install Key: TIKA-3743 URL: https://issues.apache.org/jira/browse/TIKA-3743 Project: Tika Issue Type: Improvement

Re: How to deal with the recursive content in Tika 2

2022-04-28 Thread Tim Allison
Give this a try: https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java#L60 On Thu, Apr 28, 2022 at 11:12 AM Sergey Beryozkin wrote: > > Hi Tim, All > > We have a pending issue in Quarkus Tika to upgrade to Tika 2. > One of the problems

How to deal with the recursive content in Tika 2

2022-04-28 Thread Sergey Beryozkin
Hi Tim, All We have a pending issue in Quarkus Tika to upgrade to Tika 2. One of the problems is that, according to a user's comment, the recursive content is treated somehow differently in Tika 2, specifically, this code:

[jira] [Commented] (TIKA-3740) Update junrar > 7.5.0 when available

2022-04-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529482#comment-17529482 ] Hudson commented on TIKA-3740: SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #531 (See

[VOTE] Release Apache Tika 1.28.2 Candidate #2

2022-04-28 Thread Tim Allison
A candidate for the Tika 1.28.2 release is available at: https://dist.apache.org/repos/dist/dev/tika/1.28.2 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/1.28.2-rc2/ The SHA-512 checksum of the archive is

[jira] [Commented] (TIKA-3740) Update junrar > 7.5.0 when available

2022-04-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529468#comment-17529468 ] Hudson commented on TIKA-3740: UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #193 (See

[jira] [Commented] (TIKA-3571) Add an interface for rendering engines

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529465#comment-17529465 ] Tim Allison commented on TIKA-3571: The other thing we need to account for is multiple renderings per

[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529459#comment-17529459 ] Tim Allison commented on TIKA-3742: [~nick] your gist looks great! [~monkmachine], I'm passing the

[jira] [Resolved] (TIKA-3740) Update junrar > 7.5.0 when available

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3740. Fix Version/s: 1.28.2, 2.4.0 Resolution: Fixed Many thanks to [~gotson] and

Re: Next releases WAS: Re: 2.4.0 release?

2022-04-28 Thread Tim Allison
I've upgraded junrar in both branches, and the regression results look good. I'll start 1.28.2-rc2 shortly, and then follow up with 2.4.0-rc1 if there aren't any objections. On Tue, Apr 26, 2022 at 9:10 AM Tim Allison wrote: > > All, > > I'm prepping rc1 for 1.28.2 now. > > I'm running the

[jira] [Comment Edited] (TIKA-3740) Update junrar > 7.5.0 when available

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529444#comment-17529444 ] Tim Allison edited comment on TIKA-3740 at 4/28/22 1:43 PM: Regression results

[jira] [Commented] (TIKA-3740) Update junrar > 7.5.0 when available

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529444#comment-17529444 ] Tim Allison commented on TIKA-3740: Regression results on the 1.x branch on the full set of rar files look

[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529431#comment-17529431 ] Tim Allison commented on TIKA-3742: IOUtils.readFully()? > Advice around DGN7 parser and whether to

[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529417#comment-17529417 ] Nick Burch commented on TIKA-3742: I believe {{readNBytes}} only came in with Java 9, and the particular
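`InputStream.readNBytes(byte[], int, int)` is indeed a Java 9 addition, and Tika 2.x still targets Java 8. Tim's suggestion, Commons IO's `IOUtils.readFully(InputStream, byte[])`, fills the buffer or throws `EOFException`. The sketch below uses only the JDK and shows roughly what such a helper has to do: loop, because a single `read()` may return fewer bytes than requested. The class and method names (`ReadFullyJava8`, `readUpTo`, `readFully`) are hypothetical, and this is not the DGN7 parser code under discussion.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyJava8 {

    // Reads up to len bytes into buf starting at off, looping because a single
    // InputStream.read() call may return fewer bytes than requested.
    // Returns the number of bytes actually read (less than len only at end of stream).
    static int readUpTo(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n == -1) {
                break; // end of stream reached before len bytes
            }
            total += n;
        }
        return total;
    }

    // Strict variant: fills buf completely or throws EOFException,
    // similar in spirit to Commons IO's IOUtils.readFully(InputStream, byte[]).
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int n = readUpTo(in, buf, 0, buf.length);
        if (n < buf.length) {
            throw new EOFException("Expected " + buf.length + " bytes, got " + n);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] header = new byte[4];
        readFully(in, header);              // succeeds: 4 of the 5 bytes are available
        byte[] rest = new byte[4];
        int got = readUpTo(in, rest, 0, 4); // only 1 byte is left in the stream
        System.out.println(got);            // prints 1
    }
}
```

For a fixed-size record header, the strict `readFully` variant is usually the right choice, since a short read means the file is truncated or corrupt rather than simply finished.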

[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Dan Coldrick (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529409#comment-17529409 ] Dan Coldrick commented on TIKA-3742: [~nick] I can have a go although I can't get the following line

Re: 1.28.2 regression results

2022-04-28 Thread Tim Allison
Tilman, Thank you for looking carefully at the reports! > commoncrawl3/OR/ORTIXLZEFH4QC5RJTV3L5XBNOVW42KPH 1Sonig is what we're getting in 2.3.0 and in the 2.4.0-soon-to-be-candidate, and it looks correct based on the underlying xml and when I open it in LibreOffice. It looks like it was

Re: 1.28.2 regression results

2022-04-28 Thread Tilman Hausherr
On 28.04.2022 at 00:25, Tim Allison wrote: Are available here: https://corpora.tika.apache.org/base/reports/tika-1.28.2-reports-20220427.tgz I haven't taken a look yet. Let me know if you find anything. commoncrawl3/OR/ORTIXLZEFH4QC5RJTV3L5XBNOVW42KPH this is minor and is related to