[GitHub] [tika] THausherr merged pull request #743: Bump aws.version from 1.12.319 to 1.12.320

2022-10-13 Thread GitBox


THausherr merged PR #743:
URL: https://github.com/apache/tika/pull/743


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] dependabot[bot] opened a new pull request, #743: Bump aws.version from 1.12.319 to 1.12.320

2022-10-13 Thread GitBox


dependabot[bot] opened a new pull request, #743:
URL: https://github.com/apache/tika/pull/743

   Bumps `aws.version` from 1.12.319 to 1.12.320.
   Updates `aws-java-sdk-transcribe` from 1.12.319 to 1.12.320
   
   Changelog
   Sourced from aws-java-sdk-transcribe's changelog
   (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).
   
   1.12.320 2022-10-13
   AWS Amplify UI Builder
   
   
   Features
   
   We are releasing the ability for fields to be configured as arrays.
   
   
   
   AWS Directory Service
   
   
   Features
   
   This release adds support for describing and updating AWS Managed 
Microsoft AD set up.
   
   
   
   AWS Elemental MediaLive
   
   
   Features
   
   AWS Elemental MediaLive now supports forwarding SCTE-35 messages through 
the Event Signaling and Management (ESAM) API, and can read those SCTE-35 
messages from an inactive source.
   
   
   
   AWS Elemental MediaPackage VOD
   
   
   Features
   
   This release adds SPEKE v2 support for MediaPackage VOD. Speke v2 is an 
upgrade to the existing SPEKE API to support multiple encryption keys, based on 
an encryption contract selected by the customer.
   
   
   
   AWS Identity and Access Management
   
   
   Features
   
   Documentation updates for the AWS Identity and Access Management API 
Reference.
   
   
   
   AWS IoT FleetWise
   
   
   Features
   
   Documentation update for AWS IoT FleetWise
   
   
   
   AWS Panorama
   
   
   Features
   
   Pause and resume camera stream processing with 
SignalApplicationInstanceNodeInstances. Reboot an appliance with 
CreateJobForDevices. More application state information in 
DescribeApplicationInstance response.
   
   
   
   AWS RDS DataService
   
   
   Features
   
   Doc update to reflect no support for schema parameter on 
BatchExecuteStatement API
   
   
   
   AWS Systems Manager Incident Manager
   
   
   Features
   
   Update RelatedItem enum to support Tasks
   
   
   
   AWS Transfer Family
   
   
   Features
   
   This release adds an option for customers to configure workflows that 
are triggered when files are only partially received from a client due to 
premature session disconnect.
   
   
   
   Amazon Appflow
   
   
   Features
   
   With this update, you can choose which Salesforce API is used by Amazon 
AppFlow to transfer data to or from your Salesforce account. You can choose the 
Salesforce REST API or Bulk API 2.0. You can also choose for Amazon AppFlow to 
pick the API automatically.
   
   
   
   Amazon Connect Service
   
   
   Features
   
   This release adds support for a secondary email and a mobile number for 
Amazon Connect instance users.
   
   
   
   Amazon Connect Wisdom Service
   
   
   ... (truncated)
   
   
   Commits
   
   df9f71a AWS SDK for Java 1.12.320
   (https://github.com/aws/aws-sdk-java/commit/df9f71a18374665d655a05a6824c52aa625a0b2e)
   0feb249 Update GitHub version number to 1.12.320-SNAPSHOT
   (https://github.com/aws/aws-sdk-java/commit/0feb2499f78fd6028c5ba7900b3441619919b3d0)
   See full diff in compare view:
   https://github.com/aws/aws-sdk-java/compare/1.12.319...1.12.320
   
   
   
   
   Updates `aws-java-sdk-s3` from 1.12.319 to 1.12.320
   

[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617327#comment-17617327
 ] 

ASF GitHub Bot commented on TIKA-3879:
--

nddipiazza commented on code in PR #742:
URL: https://github.com/apache/tika/pull/742#discussion_r995166742


##
tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java:
##
@@ -16,33 +16,19 @@
  */
 package org.apache.tika.pipes.emitter.s3;
 
-import static org.apache.tika.config.TikaConfig.mustNotBeEmpty;

Review Comment:
   ah yeah. i'm getting fired up on a new laptop with new everything. so i'm 
setting up my checkstyle profile now





> add test containers test for s3 fetcher, emitter and pipe iterators
> ---
>
> Key: TIKA-3879
> URL: https://issues.apache.org/jira/browse/TIKA-3879
> Project: Tika
>  Issue Type: Test
>  Components: tika-pipes
>Reporter: Nicholas DiPiazza
>Priority: Major
>
> need to add a testcontainers integration test for s3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [tika] nddipiazza commented on a diff in pull request #742: TIKA-3879 - add s3 testcontainers integration test

2022-10-13 Thread GitBox


nddipiazza commented on code in PR #742:
URL: https://github.com/apache/tika/pull/742#discussion_r995166742


##
tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java:
##
@@ -16,33 +16,19 @@
  */
 package org.apache.tika.pipes.emitter.s3;
 
-import static org.apache.tika.config.TikaConfig.mustNotBeEmpty;

Review Comment:
   ah yeah. i'm getting fired up on a new laptop with new everything. so i'm 
setting up my checkstyle profile now






[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617320#comment-17617320
 ] 

ASF GitHub Bot commented on TIKA-3879:
--

tballison commented on code in PR #742:
URL: https://github.com/apache/tika/pull/742#discussion_r995161152


##
tika-integration-tests/tika-pipes-s3-integration-tests/src/test/java/org/apache/tika/pipes/s3/tests/S3PipeIntegrationTest.java:
##
@@ -0,0 +1,144 @@
+package org.apache.tika.pipes.s3.tests;

Review Comment:
   Probably need a license on this?  checkstyle should have complained!





> add test containers test for s3 fetcher, emitter and pipe iterators
> ---
>
> Key: TIKA-3879
> URL: https://issues.apache.org/jira/browse/TIKA-3879
> Project: Tika
>  Issue Type: Test
>  Components: tika-pipes
>Reporter: Nicholas DiPiazza
>Priority: Major
>
> need to add a testcontainers integration test for s3.





[GitHub] [tika] tballison commented on pull request #742: TIKA-3879 - add s3 testcontainers integration test

2022-10-13 Thread GitBox


tballison commented on PR #742:
URL: https://github.com/apache/tika/pull/742#issuecomment-1278224792

   Couple of really small things.  Thank you so much for getting this rolling!





[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617322#comment-17617322
 ] 

ASF GitHub Bot commented on TIKA-3879:
--

tballison commented on PR #742:
URL: https://github.com/apache/tika/pull/742#issuecomment-1278224792

   Couple of really small things.  Thank you so much for getting this rolling!




> add test containers test for s3 fetcher, emitter and pipe iterators
> ---
>
> Key: TIKA-3879
> URL: https://issues.apache.org/jira/browse/TIKA-3879
> Project: Tika
>  Issue Type: Test
>  Components: tika-pipes
>Reporter: Nicholas DiPiazza
>Priority: Major
>
> need to add a testcontainers integration test for s3.





[GitHub] [tika] tballison commented on a diff in pull request #742: TIKA-3879 - add s3 testcontainers integration test

2022-10-13 Thread GitBox


tballison commented on code in PR #742:
URL: https://github.com/apache/tika/pull/742#discussion_r995161152


##
tika-integration-tests/tika-pipes-s3-integration-tests/src/test/java/org/apache/tika/pipes/s3/tests/S3PipeIntegrationTest.java:
##
@@ -0,0 +1,144 @@
+package org.apache.tika.pipes.s3.tests;

Review Comment:
   Probably need a license on this?  checkstyle should have complained!






[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617321#comment-17617321
 ] 

ASF GitHub Bot commented on TIKA-3879:
--

tballison commented on code in PR #742:
URL: https://github.com/apache/tika/pull/742#discussion_r995161476


##
tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java:
##
@@ -16,33 +16,19 @@
  */
 package org.apache.tika.pipes.emitter.s3;
 
-import static org.apache.tika.config.TikaConfig.mustNotBeEmpty;

Review Comment:
   Import order had to be fixed?





> add test containers test for s3 fetcher, emitter and pipe iterators
> ---
>
> Key: TIKA-3879
> URL: https://issues.apache.org/jira/browse/TIKA-3879
> Project: Tika
>  Issue Type: Test
>  Components: tika-pipes
>Reporter: Nicholas DiPiazza
>Priority: Major
>
> need to add a testcontainers integration test for s3.





[GitHub] [tika] tballison commented on a diff in pull request #742: TIKA-3879 - add s3 testcontainers integration test

2022-10-13 Thread GitBox


tballison commented on code in PR #742:
URL: https://github.com/apache/tika/pull/742#discussion_r995161476


##
tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java:
##
@@ -16,33 +16,19 @@
  */
 package org.apache.tika.pipes.emitter.s3;
 
-import static org.apache.tika.config.TikaConfig.mustNotBeEmpty;

Review Comment:
   Import order had to be fixed?






[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617311#comment-17617311
 ] 

ASF GitHub Bot commented on TIKA-3879:
--

nddipiazza opened a new pull request, #742:
URL: https://github.com/apache/tika/pull/742

   # add s3 tika pipes integration tests
   
   add integration test for s3 pipe iterator, s3 fetcher, and s3 emitter.




> add test containers test for s3 fetcher, emitter and pipe iterators
> ---
>
> Key: TIKA-3879
> URL: https://issues.apache.org/jira/browse/TIKA-3879
> Project: Tika
>  Issue Type: Test
>  Components: tika-pipes
>Reporter: Nicholas DiPiazza
>Priority: Major
>
> need to add a testcontainers integration test for s3.





[jira] [Commented] (TIKA-3874) Add summary of missing unicode mappings for PDF

2022-10-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617308#comment-17617308
 ] 

Hudson commented on TIKA-3874:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #843 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/843/])
TIKA-3874 -- Add summary of missing unicode mappings for PDF (tallison: 
[https://github.com/apache/tika/commit/d6ae5185aff23e834af64b6ea347b638e54981d3])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java


> Add summary of missing unicode mappings for PDF
> ---
>
> Key: TIKA-3874
> URL: https://issues.apache.org/jira/browse/TIKA-3874
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.5.1
>
>






[jira] [Commented] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF

2022-10-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617310#comment-17617310
 ] 

Hudson commented on TIKA-3875:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #843 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/843/])
TIKA-3875 -- Add metadata items for "broken" fonts and non-embedded fonts for 
PDF (tallison: 
[https://github.com/apache/tika/commit/3fcda6da8155028ae915951cf448e40c0df2e348])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java


> Add metadata items for "broken" fonts and non-embedded fonts for PDF
> 
>
> Key: TIKA-3875
> URL: https://issues.apache.org/jira/browse/TIKA-3875
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.5.1
>
>






[jira] [Commented] (TIKA-3878) Improve PipesReporter and PipesIterator to report the total number of files to be processed

2022-10-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617309#comment-17617309
 ] 

Hudson commented on TIKA-3878:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #843 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/843/])
TIKA-3878 -- allow pipes iterators to count the total number of files. 
(tallison: 
[https://github.com/apache/tika/commit/339289e45eae6560155f0fb7631687cfc86ba610])
* (edit) 
tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/test/java/org/apache/tika/pipes/reporters/fs/TestFileSystemStatusReporter.java
* (edit) tika-parent/pom.xml
* (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/pom.xml
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesReporter.java
* (add) 
tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/TotalCounter.java
* (edit) 
tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/main/java/org/apache/tika/pipes/reporters/fs/FileSystemStatusReporter.java
* (add) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncStatus.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java
* (edit) 
tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/fs/FileSystemPipesIterator.java
* (add) 
tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/TotalCountResult.java


> Improve PipesReporter and PipesIterator to report the total number of files 
> to be processed
> ---
>
> Key: TIKA-3878
> URL: https://issues.apache.org/jira/browse/TIKA-3878
> Project: Tika
>  Issue Type: New Feature
>Reporter: Tim Allison
>Priority: Major
>
> For user-facing applications, it would be useful to give them a sense of 
> progress in reporting with a denominator (total files to process). 
> Some pipesiterators will have a natural shortcut (select count(1)... for jdbc 
> or other queries in OpenSearch and/or Solr).  Some will have to do twice the 
> work -- file system and s3(?).  And some simply won't be able to report a 
> total number.
> My initial target is the FileSystemPipesIterator and the 
> FileSystemStatusReporter.





[GitHub] [tika] nddipiazza opened a new pull request, #742: TIKA-3879 - add s3 testcontainers integration test

2022-10-13 Thread GitBox


nddipiazza opened a new pull request, #742:
URL: https://github.com/apache/tika/pull/742

   # add s3 tika pipes integration tests
   
   add integration test for s3 pipe iterator, s3 fetcher, and s3 emitter.





[jira] [Created] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-13 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3879:
---

 Summary: add test containers test for s3 fetcher, emitter and pipe 
iterators
 Key: TIKA-3879
 URL: https://issues.apache.org/jira/browse/TIKA-3879
 Project: Tika
  Issue Type: Test
  Components: tika-pipes
Reporter: Nicholas DiPiazza


need to add a testcontainers integration test for s3.





[GitHub] [tika-helm] lewismc merged pull request #6: Fixes a bug with the order of HPA resources in ArgoCD

2022-10-13 Thread GitBox


lewismc merged PR #6:
URL: https://github.com/apache/tika-helm/pull/6





[GitHub] [tika-helm] lewismc commented on pull request #6: Fixes a bug with the order of HPA resources in ArgoCD

2022-10-13 Thread GitBox


lewismc commented on PR #6:
URL: https://github.com/apache/tika-helm/pull/6#issuecomment-1278179073

   Hi @stijnbrouwers, it looks like the issue is going through a cycle of being 
stale then being reopened. I'll merge your PR. Thanks for your patience. I was 
away from the project for a while.





[jira] [Commented] (TIKA-3826) Helm: use appVersion from Charts.yaml instead of images.tag

2022-10-13 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617298#comment-17617298
 ] 

Lewis John McGibbney commented on TIKA-3826:


[~hairmare] good suggestion. Please file a PR and tag me. I will be happy to 
review.
Thanks

> Helm: use appVersion from Charts.yaml instead of images.tag
> ---
>
> Key: TIKA-3826
> URL: https://issues.apache.org/jira/browse/TIKA-3826
> Project: Tika
>  Issue Type: Bug
>  Components: helm
>Affects Versions: 2.2.1
>Reporter: Lucas Bickel
>Priority: Major
>
> This is about the [tika Helm chart|https://github.com/apache/tika-helm].
> In `values.yaml` we currently have 
> [this|https://github.com/apache/tika-helm/blob/492386471616713bddbc5851912acdd78bd87609/values.yaml#L25-L26]:
> {code:yaml}
> # Overrides the image tag whose default is the chart appVersion.
>   tag: "1.26"
> {code}
> This leads to {{ .Values.image.tag | default .Chart.AppVersion }} [in 
> deployment.yaml|https://github.com/apache/tika-helm/blob/492386471616713bddbc5851912acdd78bd87609/templates/deployment.yaml#L52]
>  being dead code.
> Currently the docs indicate that we should set {{image.tag}} during the 
> deployment, skipping this step results in deploying a very outdated tika 1.26.
> My proposal for fixing this is to set the appVersion in {{Chart.yaml}} to the 
> latest 2.4.1-full version and set the image.tag to an empty version so it 
> defaults to the version from Chart.yaml.





[jira] [Resolved] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF

2022-10-13 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3875.
---
Fix Version/s: 2.5.1
   Resolution: Fixed

Many thanks to [~tilman] for guidance on the PDFBox user's list for where to 
capture this info.

> Add metadata items for "broken" fonts and non-embedded fonts for PDF
> 
>
> Key: TIKA-3875
> URL: https://issues.apache.org/jira/browse/TIKA-3875
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.5.1
>
>






[jira] [Resolved] (TIKA-3874) Add summary of missing unicode mappings for PDF

2022-10-13 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3874.
---
Fix Version/s: 2.5.1
   Resolution: Fixed

> Add summary of missing unicode mappings for PDF
> ---
>
> Key: TIKA-3874
> URL: https://issues.apache.org/jira/browse/TIKA-3874
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.5.1
>
>






[jira] [Commented] (TIKA-3876) Add a main() method to AsyncProcessor

2022-10-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617129#comment-17617129
 ] 

Hudson commented on TIKA-3876:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #841 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/841/])
TIKA-3876 create a main() in AsyncProcessor (tallison: 
[https://github.com/apache/tika/commit/07386be85574e2174f2ab5564f5df0e910fefbf9])
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java
* (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java


> Add a main() method to AsyncProcessor
> -
>
> Key: TIKA-3876
> URL: https://issues.apache.org/jira/browse/TIKA-3876
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Trivial
> Fix For: 2.5.1
>
>
> This will allow users to call an async process with only tika-core.





[jira] [Commented] (TIKA-3877) FileSystemStatusReporter's reporterThread should be daemon

2022-10-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617128#comment-17617128
 ] 

Hudson commented on TIKA-3877:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #841 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/841/])
TIKA-3877 -- fix potential thread leak in FileSystemStatusReporter (tallison: 
[https://github.com/apache/tika/commit/79c4aef6ba8f31eca3e740216a5d8fa09f0e3895])
* (edit) 
tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/main/java/org/apache/tika/pipes/reporters/fs/FileSystemStatusReporter.java


> FileSystemStatusReporter's reporterThread should be daemon
> --
>
> Key: TIKA-3877
> URL: https://issues.apache.org/jira/browse/TIKA-3877
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.5.1
>
>
> Thread leak. 





[jira] [Created] (TIKA-3878) Improve PipesReporter and PipesIterator to report the total number of files to be processed

2022-10-13 Thread Tim Allison (Jira)
Tim Allison created TIKA-3878:
-

 Summary: Improve PipesReporter and PipesIterator to report the 
total number of files to be processed
 Key: TIKA-3878
 URL: https://issues.apache.org/jira/browse/TIKA-3878
 Project: Tika
  Issue Type: New Feature
Reporter: Tim Allison


For user-facing applications, it would be useful to give them a sense of 
progress in reporting with a denominator (total files to process). 

Some pipesiterators will have a natural shortcut (select count(1)... for jdbc 
or other queries in OpenSearch and/or Solr).  Some will have to do twice the 
work -- file system and s3(?).  And some simply won't be able to report a total 
number.

My initial target is the FileSystemPipesIterator and the 
FileSystemStatusReporter.
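
To illustrate the "twice the work" case for a file system, here is a minimal
sketch of a separate counting pass whose result serves as the denominator in a
progress report. It is not the TIKA-3878 implementation: the class name
TotalCountSketch and both methods are hypothetical, and a negative total is an
assumed convention for "this iterator cannot report a total".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch; names are illustrative and not part of Tika.
public class TotalCountSketch {

    // First pass: walk the tree only to count regular files.
    // The real iteration over the same tree would be a second pass.
    static long countRegularFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    // Formats progress; a negative total means the total is unknown (assumed convention).
    static String progress(long processed, long total) {
        if (total < 0) {
            return processed + " processed (total unknown)";
        }
        long pct = total == 0 ? 100 : processed * 100 / total;
        return processed + "/" + total + " (" + pct + "%)";
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("total-count-sketch");
        Files.createFile(root.resolve("a.txt"));
        Files.createFile(root.resolve("b.txt"));
        long total = countRegularFiles(root);
        System.out.println(progress(1, total));
    }
}
```

In a real reporter the counting pass would likely run on its own thread so
that processing can start before the total is known.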





[jira] [Resolved] (TIKA-3877) FileSystemStatusReporter's reporterThread should be daemon

2022-10-13 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3877.
---
Fix Version/s: 2.5.1
   Resolution: Fixed

> FileSystemStatusReporter's reporterThread should be daemon
> --
>
> Key: TIKA-3877
> URL: https://issues.apache.org/jira/browse/TIKA-3877
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.5.1
>
>
> Thread leak. 





[jira] [Created] (TIKA-3877) FileSystemStatusReporter's reporterThread should be daemon

2022-10-13 Thread Tim Allison (Jira)
Tim Allison created TIKA-3877:
-

 Summary: FileSystemStatusReporter's reporterThread should be daemon
 Key: TIKA-3877
 URL: https://issues.apache.org/jira/browse/TIKA-3877
 Project: Tika
  Issue Type: Bug
Reporter: Tim Allison


Thread leak. 
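
For context, a minimal sketch of the kind of fix the summary points at,
assuming the leak comes from a non-daemon background thread that keeps the JVM
alive. This is not the actual FileSystemStatusReporter code: the class
DaemonReporterSketch and the method startReporterThread are hypothetical.

```java
// Hypothetical sketch; names are illustrative and not taken from Tika.
public class DaemonReporterSketch {

    // Starts a background thread that runs writeStatus every intervalMillis.
    static Thread startReporterThread(Runnable writeStatus, long intervalMillis) {
        Thread reporterThread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                writeStatus.run();
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Restore the flag and stop reporting.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "status-reporter");
        // setDaemon must be called before start(); a daemon thread
        // does not prevent the JVM from exiting.
        reporterThread.setDaemon(true);
        reporterThread.start();
        return reporterThread;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startReporterThread(() -> System.out.println("status written"), 100);
        Thread.sleep(250);
        System.out.println("daemon=" + t.isDaemon());
        // main returns here; the JVM can exit although the loop never ends on its own.
    }
}
```

Marking the thread as daemon only stops it from blocking JVM exit; interrupting
it on close is still the cleaner way to end it.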





[jira] [Resolved] (TIKA-3876) Add a main() method to AsyncProcessor

2022-10-13 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3876.
---
Fix Version/s: 2.5.1
   Resolution: Fixed

> Add a main() method to AsyncProcessor
> -
>
> Key: TIKA-3876
> URL: https://issues.apache.org/jira/browse/TIKA-3876
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Trivial
> Fix For: 2.5.1
>
>
> This will allow users to call an async process with only tika-core.





[jira] [Created] (TIKA-3876) Add a main() method to AsyncProcessor

2022-10-13 Thread Tim Allison (Jira)
Tim Allison created TIKA-3876:
-

 Summary: Add a main() method to AsyncProcessor
 Key: TIKA-3876
 URL: https://issues.apache.org/jira/browse/TIKA-3876
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


This will allow users to call an async process with only tika-core.





[jira] [Commented] (TIKA-3874) Add summary of missing unicode mappings for PDF

2022-10-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616949#comment-17616949
 ] 

Tim Allison commented on TIKA-3874:
---

Not clear how we want to do this.  The simplest method would be a percentage, 
but it feels like we should have a sense of scale as well.  

If one pdf only has 10 characters and 9 of them lack mappings, is that a 
greater loss of information than a PDF with 1 characters and missing 
mappings for 1000?

Perhaps one field for overall average and one for sum of missing?
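
A minimal sketch of the two candidate fields, assuming per-page counts of total
and unmapped characters are available. The class UnmappedSummarySketch, its
methods, and the sample numbers are hypothetical and only illustrate why a
percentage alone hides scale.

```java
// Hypothetical sketch; not the fields or code that Tika ended up with.
public class UnmappedSummarySketch {

    // Sum of per-page counts.
    static long sum(int[] perPage) {
        long total = 0;
        for (int n : perPage) {
            total += n;
        }
        return total;
    }

    // Overall percentage of characters lacking a unicode mapping; 0.0 for an empty document.
    static double percentUnmapped(long unmapped, long total) {
        return total == 0 ? 0.0 : 100.0 * unmapped / total;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a tiny document and a large one.
        int[] smallTotal = {10};
        int[] smallUnmapped = {9};
        int[] largeTotal = {6000, 4000};
        int[] largeUnmapped = {700, 300};

        // The small document has the higher percentage,
        // the large one the higher absolute count.
        System.out.println(percentUnmapped(sum(smallUnmapped), sum(smallTotal))
                + "% unmapped, " + sum(smallUnmapped) + " characters");
        System.out.println(percentUnmapped(sum(largeUnmapped), sum(largeTotal))
                + "% unmapped, " + sum(largeUnmapped) + " characters");
    }
}
```

Reporting both values lets a consumer rank documents by either relative or
absolute loss.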

> Add summary of missing unicode mappings for PDF
> ---
>
> Key: TIKA-3874
> URL: https://issues.apache.org/jira/browse/TIKA-3874
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
>






[jira] [Comment Edited] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF

2022-10-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616948#comment-17616948
 ] 

Tim Allison edited comment on TIKA-3875 at 10/13/22 10:48 AM:
--

[~tilman] responded to my question on the PDFBox user list that PDFont has an 
.isEmbedded() method.  We have access to PDFonts with the document at the end 
of each page and on every call to showGlyph().

Not sure if we want a boolean for the document or counts of characters per page 
like we do for missing unicode mappings.  Or both?


was (Author: talli...@mitre.org):
[~tilman] responded to my question on the PDFBox user list that PDFont has an 
.isEmbedded() method.  We have access to PDFonts with the document at the end 
of each page and on every call to showGlyph().

Not sure if we want a boolean for the document or counts of characters per page 
like we do for missing unicode mappings.

> Add metadata items for "broken" fonts and non-embedded fonts for PDF
> 
>
> Key: TIKA-3875
> URL: https://issues.apache.org/jira/browse/TIKA-3875
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
>






[jira] [Commented] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF

2022-10-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616948#comment-17616948
 ] 

Tim Allison commented on TIKA-3875:
---

[~tilman] responded to my question on the PDFBox user list that PDFont has an 
.isEmbedded() method.  We have access to PDFonts with the document at the end 
of each page and on every call to showGlyph().

Not sure if we want a boolean for the document or counts of characters per page 
like we do for missing unicode mappings.
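
A minimal sketch of tracking both options (a document-level boolean and a
character count), assuming a callback fires once per glyph with the font in
use. To stay self-contained it does not depend on PDFBox: the interface
FontInfo is a hypothetical stand-in that mirrors only the isEmbedded() method
mentioned above, and Summary.onGlyph is a hypothetical hook, not showGlyph()
itself.

```java
// Hypothetical sketch; FontInfo stands in for a PDFBox font object.
public class NonEmbeddedFontSketch {

    // Stand-in for the font type; only isEmbedded() is mirrored.
    public interface FontInfo {
        boolean isEmbedded();
    }

    // Accumulates a document-level flag and a per-character count.
    public static final class Summary {
        boolean anyNonEmbedded = false;
        long nonEmbeddedGlyphs = 0;
        long totalGlyphs = 0;

        // Call once per rendered glyph with the font used for it.
        void onGlyph(FontInfo font) {
            totalGlyphs++;
            if (!font.isEmbedded()) {
                anyNonEmbedded = true;
                nonEmbeddedGlyphs++;
            }
        }
    }

    public static void main(String[] args) {
        Summary summary = new Summary();
        FontInfo embedded = () -> true;
        FontInfo notEmbedded = () -> false;
        summary.onGlyph(embedded);
        summary.onGlyph(notEmbedded);
        summary.onGlyph(notEmbedded);
        System.out.println("anyNonEmbedded=" + summary.anyNonEmbedded
                + " nonEmbeddedGlyphs=" + summary.nonEmbeddedGlyphs
                + " totalGlyphs=" + summary.totalGlyphs);
    }
}
```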

> Add metadata items for "broken" fonts and non-embedded fonts for PDF
> 
>
> Key: TIKA-3875
> URL: https://issues.apache.org/jira/browse/TIKA-3875
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
>






[jira] [Created] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF

2022-10-13 Thread Tim Allison (Jira)
Tim Allison created TIKA-3875:
-

 Summary: Add metadata items for "broken" fonts and non-embedded 
fonts for PDF
 Key: TIKA-3875
 URL: https://issues.apache.org/jira/browse/TIKA-3875
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison








[jira] [Updated] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF

2022-10-13 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3875:
--
Priority: Minor  (was: Major)

> Add metadata items for "broken" fonts and non-embedded fonts for PDF
> 
>
> Key: TIKA-3875
> URL: https://issues.apache.org/jira/browse/TIKA-3875
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
>






[jira] [Created] (TIKA-3874) Add summary of missing unicode mappings for PDF

2022-10-13 Thread Tim Allison (Jira)
Tim Allison created TIKA-3874:
-

 Summary: Add summary of missing unicode mappings for PDF
 Key: TIKA-3874
 URL: https://issues.apache.org/jira/browse/TIKA-3874
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison








[GitHub] [tika] THausherr merged pull request #741: Bump cxf.version from 3.5.3 to 3.5.4

2022-10-13 Thread GitBox


THausherr merged PR #741:
URL: https://github.com/apache/tika/pull/741

