Re: [PR] Bump aws.version from 1.12.687 to 1.12.688 [tika]

2024-03-26 Thread via GitHub


THausherr merged PR #1694:
URL: https://github.com/apache/tika/pull/1694


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump com.fasterxml.woodstox:woodstox-core from 6.6.1 to 6.6.2 [tika]

2024-03-26 Thread via GitHub


THausherr merged PR #1693:
URL: https://github.com/apache/tika/pull/1693





[PR] Bump aws.version from 1.12.687 to 1.12.688 [tika]

2024-03-26 Thread via GitHub


dependabot[bot] opened a new pull request, #1694:
URL: https://github.com/apache/tika/pull/1694

   Bumps `aws.version` from 1.12.687 to 1.12.688.
   Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.687 to 1.12.688
   
   Changelog
   Sourced from com.amazonaws:aws-java-sdk-s3's changelog:
   https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md
   
   1.12.688 (2024-03-26)

   AWS Cost Explorer Service
   - Features: Adds support for backfill of cost allocation tags, with new 
     StartCostAllocationTagBackfill and ListCostAllocationTagBackfillHistory 
     API.

   Agents for Amazon Bedrock Runtime
   - Features: This release adds support to customize prompts sent through the 
     RetrieveAndGenerate API in Agents for Amazon Bedrock.

   Amazon EC2 Container Service
   - Features: This is a documentation update for Amazon ECS.

   Amazon Elastic Compute Cloud
   - Features: Documentation updates for Elastic Compute Cloud (EC2).

   FinSpace User Environment Management service
   - Features: Add new operation delete-kx-cluster-node and add status 
     parameter to list-kx-cluster-node operation.
   
   Commits
   
   - 1345c00 AWS SDK for Java 1.12.688
     https://github.com/aws/aws-sdk-java/commit/1345c00f58615df8caf0219d8fa752d0fa810646
   - 293d08c Update GitHub version number to 1.12.688-SNAPSHOT
     https://github.com/aws/aws-sdk-java/commit/293d08c6e8af8f7975dadcc044c7a4eb336a08d3
   - See full diff in compare view:
     https://github.com/aws/aws-sdk-java/compare/1.12.687...1.12.688
   
   Updates `com.amazonaws:aws-java-sdk-transcribe` from 1.12.687 to 1.12.688
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   



[PR] Bump com.fasterxml.woodstox:woodstox-core from 6.6.1 to 6.6.2 [tika]

2024-03-26 Thread via GitHub


dependabot[bot] opened a new pull request, #1693:
URL: https://github.com/apache/tika/pull/1693

   Bumps 
[com.fasterxml.woodstox:woodstox-core](https://github.com/FasterXML/woodstox) 
from 6.6.1 to 6.6.2.
   
   Commits
   
   - 3bed262 [maven-release-plugin] prepare release woodstox-core-6.6.2
     https://github.com/FasterXML/woodstox/commit/3bed26213d3446e50408a2f10f8eabf5219c9035
   - 06dfc28 Update release notes wrt #200
     https://github.com/FasterXML/woodstox/commit/06dfc28437aed9a4c850e0b03c002bb5e1781daa
   - d443171 Fix shading of isorelax (#200). (#202)
     https://github.com/FasterXML/woodstox/commit/d4431712fba049843cbb55031543d9b5a7b16236
   - ef10fdc Fix indentation of test class (remove tabs)
     https://github.com/FasterXML/woodstox/commit/ef10fdca71b298d3a20bdb7434e68e0e798a6812
   - 4a25647 Update oss-parent ref
     https://github.com/FasterXML/woodstox/commit/4a256472344435d4fd6954298753b7fea68d1f44
   - 85551aa [maven-release-plugin] prepare for next development iteration
     https://github.com/FasterXML/woodstox/commit/85551aa596515a5689c8c892cfa8a25425ea3440
   - Issues referenced:
     https://redirect.github.com/FasterXML/woodstox/issues/200
     https://redirect.github.com/FasterXML/woodstox/issues/202
   - See full diff in compare view:
     https://github.com/FasterXML/woodstox/compare/woodstox-core-6.6.1...woodstox-core-6.6.2
   
   
   
   
   





[jira] [Commented] (TIKA-4152) Fix tika as a service

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831017#comment-17831017
 ] 

Tim Allison commented on TIKA-4152:
---

[~epugh] is there any chance you could look into this? Totally understand 
if not. Thank you.

> Fix tika as a service
> ---------------------
>
> Key: TIKA-4152
> URL: https://issues.apache.org/jira/browse/TIKA-4152
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> We've gotten two reports on the user list in the last month or so that the 
> tika-as-a-service scripts no longer work.
> We should fix this.
> https://lists.apache.org/thread/mnf3pxlmvdy456v4s2b8r7mv3khl3msk
> https://lists.apache.org/thread/ozkrrvbwc0bvqmqb9zc4xofhnd3djqz1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release Apache Tika 2.9.2 Candidate #2

2024-03-26 Thread Tilman Hausherr

+1

Successful build on Windows 10, Oracle JDK 1.8.0_391.

Tilman

On 26.03.2024 16:52, Tim Allison wrote:

A candidate for the Tika 2.9.2 release is available at:
https://dist.apache.org/repos/dist/dev/tika/2.9.2

The release candidate is a zip archive of the sources in:
https://github.com/apache/tika/tree/2.9.2-rc2/

The SHA-512 checksum of the archive is
5ac7b981aa89d44e177dfb457d6f6b73dd54d43641da31e76b3e8bd9dbc236b9d2e6f6958d9182f36cbee6409293f3f21421f9c89837f693f5e10f997e9b063c.

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1099/org/apache/tika

Please vote on releasing this package as Apache Tika 2.9.2.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 2.9.2
[ ] -1 Do not release this package because...

Here's my +1

Best,

   Tim





[VOTE] Release Apache Tika 2.9.2 Candidate #2

2024-03-26 Thread Tim Allison
A candidate for the Tika 2.9.2 release is available at:
https://dist.apache.org/repos/dist/dev/tika/2.9.2

The release candidate is a zip archive of the sources in:
https://github.com/apache/tika/tree/2.9.2-rc2/

The SHA-512 checksum of the archive is
5ac7b981aa89d44e177dfb457d6f6b73dd54d43641da31e76b3e8bd9dbc236b9d2e6f6958d9182f36cbee6409293f3f21421f9c89837f693f5e10f997e9b063c.
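
A minimal sketch of how a voter might check a downloaded archive against the SHA-512 value quoted above. The message does not name the archive file, so the path is whatever file you downloaded; the helper names `sha512_of` and `checksum_matches` are illustrative and not part of any Tika tooling.

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-512 digest of the file at `path`."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so large release archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected: str) -> bool:
    """Compare a file's digest with a published value.

    The published value is normalized first: surrounding whitespace and a
    trailing sentence period (as in the email above) are dropped, and case
    is ignored.
    """
    normalized = expected.strip().rstrip(".").lower()
    return sha512_of(path) == normalized
```

Call `checksum_matches` with the local path of the downloaded archive and the checksum string copied from the vote email.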

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1099/org/apache/tika

Please vote on releasing this package as Apache Tika 2.9.2.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 2.9.2
[ ] -1 Do not release this package because...
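
The pass condition stated above can be sketched as a small predicate. This is one reading of "a majority of at least three +1 Tika PMC votes" (at least three binding +1 votes, and more +1 than -1 votes), stated here as an assumption and not as an official tallying tool; `vote_passes` is a hypothetical name.

```python
def vote_passes(plus_one: int, minus_one: int) -> bool:
    """Return True if a release vote passes under the assumed reading.

    Only binding (PMC) votes should be counted in `plus_one` and `minus_one`.
    Assumption: the vote needs at least three +1 votes and strictly more
    +1 votes than -1 votes.
    """
    return plus_one >= 3 and plus_one > minus_one
```

Under this reading, three +1 votes with no -1 votes pass, while two +1 votes alone do not.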

Here's my +1

Best,

   Tim


[jira] [Created] (TIKA-4227) Register tika-helm Chart in artifacthub.io

2024-03-26 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created TIKA-4227:
--

Summary: Register tika-helm Chart in artifacthub.io
Key: TIKA-4227
URL: https://issues.apache.org/jira/browse/TIKA-4227
Project: Tika
Issue Type: Task
Components: tika-helm
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 2.9.2


[https://artifacthub.io/] is the most popular search interface for Helm 
Charts (among many other kinds of artifacts).

This task will register the tika-helm Chart with [https://artifacthub.io/].





2.9.2 rc1 canceled

2024-03-26 Thread Tim Allison
All,
  I got most of the way through the 2.9.2 rc1 process (including closing
the artifact in Sonatype and committing the artifacts to the dist.dev repo)
when Tilman noticed the regression in the EpubParser. To keep our GitHub
history and general accounting simple, rather than redoing the rc1
artifact, I'm -1 on that artifact. I've made the change and will post the
vote for rc2 shortly.
  Apologies for my missteps.

 Best,

Tim


Re: Tika chart cannot be reached

2024-03-26 Thread Lewis John McGibbney
Hi Pietro,

On 2024/03/26 08:13:39 Pietro Susca wrote:

> 
> Francesco's report is that the repo URL is not working;
> also, Tika is not searchable on the Helm repo hub.

Do you mean https://artifacthub.io/?
If you want the Chart to be searchable on that platform, I can try to add an 
entry.

If there are any other problems with the Chart then please let me know.

Ciao
lewismc


[jira] [Commented] (TIKA-4219) Figure out what to do with epubs with encrypted non-core content

2024-03-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830989#comment-17830989
 ] 

Hudson commented on TIKA-4219:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1576 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1576/])
TIKA-4219 -- clean up...do not include font names in main package (tallison: 
[https://github.com/apache/tika/commit/e88be05ad588a59916f199643f51673d693b0642])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/epub/EpubParser.java


> Figure out what to do with epubs with encrypted non-core content
> ----------------------------------------------------------------
>
> Key: TIKA-4219
> URL: https://issues.apache.org/jira/browse/TIKA-4219
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> On TIKA-4218, we noticed several epubs that were now being identified as 
> encrypted, which is good. We did this work on TIKA-4176.
> On the other hand, we found several epubs that were now identified as 
> encrypted but which yielded content before we added the encryption detection.
> The issue in at least one file that I reviewed is that non-core content is 
> encrypted -- the fonts. So, from a text+metadata extraction standpoint, we 
> could still get all the content and then throw an Encrypted Exception or 
> maybe flag something as encrypted.
> I'm not sure what the best thing to do is in this case.
> An example file is here: 
> http://corpora.tika.apache.org/base/docs/commoncrawl3/47/47WOSBEUHE6CRMVDFBOOHUD36FEQAZ6T
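
The distinction raised in the issue above (core content versus encrypted fonts) could be sketched as follows. This is an illustration only, not what EpubParser does: it assumes the encrypted resources are listed as `CipherReference` elements in the EPUB's `META-INF/encryption.xml`, and it guesses "font" purely from the file extension. The function names and the extension list are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Namespace of the W3C XML Encryption elements used in encryption.xml.
XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"

# Assumption: these extensions are treated as "non-core" font resources.
FONT_SUFFIXES = {".otf", ".ttf", ".woff", ".woff2"}


def encrypted_resources(encryption_xml: str) -> list[str]:
    """Return the URI of every CipherReference found in the XML text."""
    root = ET.fromstring(encryption_xml)
    refs = root.iter(f"{{{XMLENC_NS}}}CipherReference")
    return [ref.get("URI") for ref in refs if ref.get("URI")]


def only_fonts_encrypted(paths: list[str]) -> bool:
    """True if at least one resource is encrypted and all of them look like fonts."""
    return bool(paths) and all(
        PurePosixPath(path).suffix.lower() in FONT_SUFFIXES for path in paths
    )
```

A caller could extract text as usual when `only_fonts_encrypted` returns True and record a flag, which is one of the two options the issue leaves open.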





Re: Tika chart cannot be reached

2024-03-26 Thread Lewis John McGibbney
Hi Francesco,
Thanks for letting us know that the repository was unreachable. I can only 
conclude that this was intermittent.
I can easily fetch and deploy the Chart as follows:

helm repo add tika https://apache.jfrog.io/artifactory/tika
helm install tika tika/tika --set image.tag=latest-full -n tika-test

Thanks
lewismc

On 2024/03/25 12:16:31 Francesco Scuccimarri wrote:
> Hi Team Dev Tika,
> Over the past few days, I've encountered an issue while trying to use
> tika-helm. When I attempt to add the
> repository for Tika charts using the Helm command, I receive the following
> error message:
> 
> *Looks like 'https://apache.jfrog.io/artifactory/tika/' is not a valid chart
> repository or cannot be reached.*
> 
> It seems that the issue is specific to the Tika chart repository.
> Do you have any updates regarding any changes to the Tika chart repository
> or its accessibility? I've reviewed the documentation and searched online,
> but I haven't found any recent information about this issue.
> 
> Thank you very much for your support.
> 
> Best regards,
> Francesco Scuccimarri
> 
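
For anyone who hits the same error, a quick check that does not depend on Helm is sketched below. `helm repo add` downloads `index.yaml` from the repository URL, so the sketch builds that URL and tries to fetch it. The helper names are hypothetical and the timeout is an arbitrary choice.

```python
from urllib.request import urlopen


def index_url(repo_url: str) -> str:
    """Return the URL of the index.yaml that Helm fetches for a chart repository."""
    return repo_url.rstrip("/") + "/index.yaml"


def repo_reachable(repo_url: str, timeout: float = 10.0) -> bool:
    """Return True if the repository's index.yaml answers with HTTP 200."""
    try:
        with urlopen(index_url(repo_url), timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers
        # strings that are not valid URLs.
        return False
```

Running `repo_reachable("https://apache.jfrog.io/artifactory/tika")` a few times can help tell an intermittent outage from a wrong URL.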


Re: [PR] Support for adding custom tika configuration [tika-helm]

2024-03-26 Thread via GitHub


lewismc commented on PR #15:
URL: https://github.com/apache/tika-helm/pull/15#issuecomment-2020719947

   @nddipiazza any feedback? Would be great to get this merged if possible. 
Thanks 





[jira] [Created] (TIKA-4226) Use jsoup for epubs

2024-03-26 Thread Tim Allison (Jira)
Tim Allison created TIKA-4226:
-

Summary: Use jsoup for epubs
Key: TIKA-4226
URL: https://issues.apache.org/jira/browse/TIKA-4226
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison


We're getting quite a few XML exceptions when parsing epubs (roughly 1k out of 
8k total). We should use jsoup to handle the contents of epubs more robustly.

This is a proposal for 3.x. WDYT?
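
The motivation can be illustrated with a small sketch: a strict XML parser rejects markup with a mismatched tag, while a lenient HTML parser still recovers the text. Python's standard-library `html.parser` stands in for jsoup here; this is not Tika code, and `extract_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Lenient parser that only collects character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(markup: str) -> str:
    """Try strict XML first; fall back to lenient HTML parsing on failure."""
    try:
        root = ET.fromstring(markup)
        return "".join(root.itertext())
    except ET.ParseError:
        collector = TextCollector()
        collector.feed(markup)
        collector.close()
        return "".join(collector.chunks)
```

With the input `<html><body><p>Hello <b>world</p></body></html>` the strict parser raises on the unclosed `<b>`, and the fallback still yields the text.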





[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830974#comment-17830974
 ] 

Tim Allison commented on TIKA-4218:
---

Thank you, [~tilman]! The mp4 is weird because exiftool was run in 2.9.2, but 
our regular MP4Parser was run in 2.9.1. The bad text in the epub is a side 
effect of work on TIKA-4219. I've just pushed a fix. Thank you, again.

> Run regression tests to support 2.9.2 release
> ---------------------------------------------
>
> Key: TIKA-4218
> URL: https://issues.apache.org/jira/browse/TIKA-4218
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: 2.9.1-876503.pdf.json, 2.9.2-876503.pdf.json
>






[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830954#comment-17830954
 ] 

Tilman Hausherr commented on TIKA-4218:
---

6FOMNUPGPA6IG66Z4NIUEQIVOR5ON46Q (an MP4 file) has a loss of metadata 
(bierenbach: 2 | earlier: 2 | https://www.facebook.com/speedlinecablecam: 2 | 
https://www.speedline-cablecam.com: 2 | in: 2 | of: 2 | the: 2 | this: 2 | 
woods: 2 | year: 2)
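
The token comparisons in this comment can be reproduced in miniature. The sketch below assumes that a column such as TOP_10_MORE_IN_A lists the tokens whose counts are higher in extract A than in extract B; the tokenizer (lowercased `\w+` runs) and the function names are simplifications chosen for the example, not the report's actual implementation.

```python
import re
from collections import Counter


def token_counts(text: str) -> Counter:
    """Count lowercase word tokens in a text extract."""
    return Counter(re.findall(r"\w+", text.lower()))


def top_more_in(first: str, second: str, n: int = 10):
    """Return up to `n` (token, surplus) pairs that occur more often in `first`."""
    # Counter subtraction keeps only tokens with a positive surplus.
    surplus = token_counts(first) - token_counts(second)
    return surplus.most_common(n)
```

A token that is glued to a neighbouring string in one extract shows up as a surplus on both sides, once under each spelling.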

EEXR753OKDGYAIXL36PZ2EGYPN477SZU and a few other files have one word in 
TOP_10_MORE_IN_A which reappears in TOP_10_MORE_IN_B but with "oebps". Here, 
"secretary" becomes "secretaryoebps". I don't know if this is a bug or not.



[jira] [Commented] (TIKA-4223) STL file exported with OpenSCAD not detected correctly

2024-03-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830935#comment-17830935
 ] 

Hudson commented on TIKA-4223:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1575 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1575/])
TIKA-4223 -- add detection of stl (#1691) (github: 
[https://github.com/apache/tika/commit/9d45b69dab2016342e44ee2b8bf5ed508676b38b])
* (edit) tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (add) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/test-documents/testSTL-binary.stl
* (add) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/test-documents/testSTL-ascii.stl


> STL file exported with OpenSCAD not detected correctly
> ------------------------------------------------------
>
> Key: TIKA-4223
> URL: https://issues.apache.org/jira/browse/TIKA-4223
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 2.9.1
> Reporter: Robin Schimpf
> Priority: Major
> Fix For: 2.9.2, 3.0.0
> Attachments: linear_extrude_ascii.stl, linear_extrude_binary.stl
>
> STL files can be in ASCII or in binary format. When this file 
> ([https://github.com/openscad/openscad/blob/master/examples/Basics/linear_extrude.scad])
> is exported from OpenSCAD to STL, the ASCII result file is detected as 
> text/plain.
> The binary STL is detected as application/vnd.ms-pki.stl, which differs 
> from the model/stl MIME type that Wikipedia lists for those files.
>  
> Commands used for the attached files:
> {code:java}
> openscad.exe --export-format asciistl -o result\linear_extrude_ascii.stl examples\Basics\linear_extrude.scad
> {code}
> {code:java}
> openscad.exe --export-format binstl -o result\linear_extrude_binary.stl examples\Basics\linear_extrude.scad
> {code}
> Refs:
> https://en.wikipedia.org/wiki/STL_(file_format)
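
The two STL flavours described in the issue can be told apart with a simple heuristic, sketched below. It is not Tika's detection logic (Tika matches patterns declared in tika-mimetypes.xml); it relies only on the commonly documented layout: a binary STL has an 80-byte header, a little-endian uint32 triangle count, and 50 bytes per triangle, while an ASCII STL starts with the keyword `solid`. The function name is hypothetical.

```python
import struct


def sniff_stl(data: bytes) -> str:
    """Classify bytes as 'binary' STL, 'ascii' STL, or 'unknown'."""
    # Check the binary layout first, because a binary header may also
    # begin with the word "solid".
    if len(data) >= 84:
        (triangle_count,) = struct.unpack_from("<I", data, 80)
        if len(data) == 84 + 50 * triangle_count:
            return "binary"
    if data.lstrip().startswith(b"solid"):
        return "ascii"
    return "unknown"
```

The size check is what separates the two cases when the first bytes alone are ambiguous.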





[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830934#comment-17830934
 ] 

Tim Allison commented on TIKA-4218:
---

I'll start building RC1. If we find any problems, we can cancel the vote and go 
with rc2.



[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830930#comment-17830930
 ] 

Tim Allison commented on TIKA-4218:
---

The number of new attachments from pptx based on [~xyang200]'s work on 
TIKA-4211 is amazing: ~3500 new attachments across our corpus.



[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830927#comment-17830927
 ] 

Tim Allison commented on TIKA-4218:
---

This is looking much, much better. I made a small change to the EpubParser that 
will prevent writing the names of font files into the main contents.

The fork of COMPRESS-674 helped out dramatically as did the other fixes.

I merged the recent mime detection updates for 3d files. I _think_ we're good 
to go with rc1 for 2.9.2.




[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830918#comment-17830918
 ] 

Tim Allison commented on TIKA-4218:
---

https://corpora.tika.apache.org/base/reports/tika-2.9.2b-prerc1-reports.tgz



[jira] [Resolved] (TIKA-4225) Add detection for AMF

2024-03-26 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4225.
---
Fix Version/s: 2.9.2, 3.0.0
Resolution: Fixed

> Add detection for AMF
> ---------------------
>
> Key: TIKA-4225
> URL: https://issues.apache.org/jira/browse/TIKA-4225
> Project: Tika
> Issue Type: Improvement
> Reporter: Robin Schimpf
> Priority: Major
> Fix For: 2.9.2, 3.0.0
> Attachments: linear_extrude.amf
>
> AMF is an alternative format to STL for 3D models. When this file 
> ([https://github.com/openscad/openscad/blob/master/examples/Basics/linear_extrude.scad])
> is exported from OpenSCAD to AMF, the result file is detected as 
> application/xml.
>  
> Export command:
> {code:java}
> openscad.exe -o result\linear_extrude.amf examples\Basics\linear_extrude.scad
> {code}
> Refs:
> [https://en.wikipedia.org/wiki/Additive_manufacturing_file_format]
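
Because AMF is XML, a generic detector only sees application/xml unless it looks at the root element. The sketch below shows that idea under the assumption that an AMF document's root element is named `amf`; it is an illustration, not the rule that was added to Tika, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def xml_root_name(data: bytes) -> str:
    """Return the local name of the root element without parsing the whole file."""
    for _event, element in ET.iterparse(io.BytesIO(data), events=("start",)):
        # The first "start" event is the root; drop any "{namespace}" prefix.
        return element.tag.rsplit("}", 1)[-1]
    return ""


def looks_like_amf(data: bytes) -> bool:
    """True if the bytes parse as XML whose root element is <amf>."""
    try:
        return xml_root_name(data) == "amf"
    except ET.ParseError:
        return False
```

Stopping at the first start event keeps the check cheap even for large model files.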





[jira] [Resolved] (TIKA-4224) Add detection for 3MF

2024-03-26 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4224.
---
Fix Version/s: 2.9.2, 3.0.0
Resolution: Fixed

> Add detection for 3MF
> ---------------------
>
> Key: TIKA-4224
> URL: https://issues.apache.org/jira/browse/TIKA-4224
> Project: Tika
> Issue Type: Improvement
> Reporter: Robin Schimpf
> Priority: Major
> Fix For: 2.9.2, 3.0.0
> Attachments: linear_extrude.3mf
>
> 3MF is an alternative format to STL for 3D models. When this file 
> ([https://github.com/openscad/openscad/blob/master/examples/Basics/linear_extrude.scad])
> is exported from OpenSCAD to 3MF, the result file is detected as 
> application/zip.
>  
> Export command:
> {code:java}
> openscad.exe -o result\linear_extrude.3mf examples\Basics\linear_extrude.scad
> {code}
> Refs:
> [https://en.wikipedia.org/wiki/3D_Manufacturing_Format]
> [https://3mf.io/]
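
A 3MF file is a ZIP container, which is why a generic detector reports application/zip. Telling it apart means looking inside the archive. The sketch below is a heuristic and not Tika's implementation: it assumes a 3MF package carries a `[Content_Types].xml` entry and a model part under `3D/` with a `.model` extension (commonly `3D/3dmodel.model`). The function name is hypothetical.

```python
import io
import zipfile


def looks_like_3mf(data: bytes) -> bool:
    """Heuristically decide whether ZIP bytes look like a 3MF package."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return False
    has_content_types = "[Content_Types].xml" in names
    # Assumption: the 3D model part lives under "3D/" and ends in ".model".
    has_model_part = any(
        name.lower().startswith("3d/") and name.lower().endswith(".model")
        for name in names
    )
    return has_content_types and has_model_part
```

Only the archive's entry names are read, so the check does not decompress the model itself.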





[jira] [Resolved] (TIKA-4222) Add detection for OpenSCAD

2024-03-26 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4222.
---
Fix Version/s: 2.9.2, 3.0.0
Resolution: Fixed

> Add detection for OpenSCAD
> --------------------------
>
> Key: TIKA-4222
> URL: https://issues.apache.org/jira/browse/TIKA-4222
> Project: Tika
> Issue Type: Improvement
> Reporter: Robin Schimpf
> Priority: Major
> Fix For: 2.9.2, 3.0.0
>
> OpenSCAD (https://openscad.org/index.html) is a 3D modeller based on a custom 
> script language. The files are currently detected as text/plain.
>  
> Examples can be found here: 
> https://github.com/openscad/openscad/tree/master/examples





[jira] [Updated] (TIKA-4223) STL file exported with OpenSCAD not detected correctly

2024-03-26 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-4223:
--
Fix Version/s: 3.0.0



[jira] [Resolved] (TIKA-4223) STL file exported with OpenSCAD not detected correctly

2024-03-26 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4223.
---
Fix Version/s: 2.9.2
Resolution: Fixed



[jira] [Commented] (TIKA-4223) STL file exported with OpenSCAD not detected correctly

2024-03-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830911#comment-17830911
 ] 

ASF GitHub Bot commented on TIKA-4223:
--

tballison merged PR #1691:
URL: https://github.com/apache/tika/pull/1691






[jira] [Commented] (TIKA-4223) STL file exported with OpenSCAD not detected correctly

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830910#comment-17830910
 ] 

Tim Allison commented on TIKA-4223:
---

Thank you [~nick]! I was hoping you'd have a chance to review.



Re: [PR] TIKA-4223 -- add detection of stl [tika]

2024-03-26 Thread via GitHub


tballison merged PR #1691:
URL: https://github.com/apache/tika/pull/1691





[jira] [Commented] (TIKA-4223) STL file exported with OpenSCAD not detected correctly

2024-03-26 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830867#comment-17830867
 ] 

Nick Burch commented on TIKA-4223:
--

A lot of the early file extension allocations were taken from the HTTPD mime 
magics, which for obscure formats are unlikely to be representative of use 
today. So, for something like this, I'm +1 to moving the glob to a more 
common/popular format that also shares the same extension.



[jira] [Commented] (TIKA-4223) STL file exported with OpenSCAD not detected correctly

2024-03-26 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830860#comment-17830860
 ] 

Tim Allison commented on TIKA-4223:
---

The one Microsoft stl that I can find online, 
http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab
, is the wrapper. The stl file inside it is actually {{application/pkcs7-signature}}.

Further, when I search Google or DuckDuckGo for "stl" and file format, the 
answer is far and away this shape format, not vnd.ms-pki.stl.

So I propose moving the glob for ".stl" from vnd.ms-pki.stl to 
{{model/x.stl-binary}}.
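
For readers following the detection discussion, the two STL flavours can usually be told apart from their bytes alone. The Python sketch below is only an illustration of that idea and is not the logic merged in the Tika PR. The function name sniff_stl is hypothetical, and the model/x.stl-ascii type is assumed here by analogy with the model/x.stl-binary type proposed above.

```python
import struct


def sniff_stl(data: bytes) -> str:
    """Guess the STL flavour from raw bytes (illustrative sketch only)."""
    # Binary STL layout: 80-byte header, little-endian uint32 triangle
    # count, then exactly 50 bytes per triangle. Check this first, because
    # a binary header may itself begin with the word "solid".
    if len(data) >= 84:
        (count,) = struct.unpack_from("<I", data, 80)
        if len(data) == 84 + 50 * count:
            return "model/x.stl-binary"

    # ASCII STL starts with the keyword "solid" and contains facet records
    # or a closing "endsolid" line.
    if data.lstrip().startswith(b"solid") and (
        b"facet" in data or b"endsolid" in data
    ):
        return "model/x.stl-ascii"  # assumed type name, see note above

    return "application/octet-stream"
```

Checking the binary size equation before the "solid" keyword is a deliberate choice in this sketch: a keyword check alone would misclassify binary files whose header text starts with "solid".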



Re: Tika chart cannot be reached

2024-03-26 Thread Pietro Susca
What does that mean?

Francesco's report is that the repo URL is not working.
Tika is also not searchable on the Helm repo hub.

It was available until a week ago, and now it gives an error like:

helm repo add tika https://apache.jfrog.io/artifactory/tika/
(or the backup URL; the result is the same)
Error: looks like "https://apache.jfrog.io/ui/native/tika/tika/" is not a
valid chart repository or cannot be reached: error converting YAML to JSON:
yaml: line 3: mapping values are not allowed in this context

regards
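
A note on the error above: "helm repo add" downloads index.yaml from the repository URL, so a YAML parse error often means the server answered with something other than a chart index, for example an HTML page from a web UI. That reading is an assumption and is not confirmed in this thread. The Python sketch below shows one way to check what a URL actually serves; the helper names index_url, looks_like_helm_index and fetch are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def index_url(repo_url: str) -> str:
    """Return the index.yaml location that Helm would request."""
    base = repo_url if repo_url.endswith("/") else repo_url + "/"
    return urljoin(base, "index.yaml")


def looks_like_helm_index(body: str) -> bool:
    """Rough check: a chart index has top-level apiVersion and entries keys."""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # Most likely HTML or XML, not a YAML index.
        return False
    top_level_keys = {
        line.split(":", 1)[0]
        for line in stripped.splitlines()
        if line and not line[0].isspace() and ":" in line
    }
    return "apiVersion" in top_level_keys and "entries" in top_level_keys


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a URL as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, looks_like_helm_index(fetch(index_url("https://apache.jfrog.io/artifactory/tika/"))) would report whether that address currently serves a usable index.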

On Mon, Mar 25, 2024 at 14:43, Tim Allison wrote:

> Looks like it is back up?
>
> https://apache.jfrog.io/ui/native/tika/tika/
>
> Also looks like we never pushed 2.9.1. We'll make sure to push 2.9.2 when
> that is ready.
>
> On Mon, Mar 25, 2024 at 9:04 AM Francesco Scuccimarri <
> francesco.scuccima...@maggioli.it> wrote:
>
>> Hi Team Dev Tika,
>> Over the past few days, I've encountered an issue while trying to use
>> tika-helm. When I attempt to add the repository for Tika charts using the
>> Helm command, I receive the following error message:
>>
>> Looks like 'https://apache.jfrog.io/artifactory/tika/' is not a valid chart
>> repository or cannot be reached.
>>
>> It seems that the issue is specific to the Tika chart repository.
>> Do you have any updates regarding any changes to the Tika chart repository
>> or its accessibility? I've reviewed the documentation and searched online,
>> but I haven't found any recent information about this issue.
>>
>> Thank you very much for your support.
>>
>> Best regards,
>> Francesco Scuccimarri
>>
>