[GitHub] [tika] THausherr merged pull request #743: Bump aws.version from 1.12.319 to 1.12.320
THausherr merged PR #743: URL: https://github.com/apache/tika/pull/743 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tika] dependabot[bot] opened a new pull request, #743: Bump aws.version from 1.12.319 to 1.12.320
dependabot[bot] opened a new pull request, #743: URL: https://github.com/apache/tika/pull/743 Bumps `aws.version` from 1.12.319 to 1.12.320. Updates `aws-java-sdk-transcribe` from 1.12.319 to 1.12.320 Changelog Sourced from aws-java-sdk-transcribe's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md). 1.12.320 2022-10-13 AWS Amplify UI Builder Features We are releasing the ability for fields to be configured as arrays. AWS Directory Service Features This release adds support for describing and updating AWS Managed Microsoft AD set up. AWS Elemental MediaLive Features AWS Elemental MediaLive now supports forwarding SCTE-35 messages through the Event Signaling and Management (ESAM) API, and can read those SCTE-35 messages from an inactive source. AWS Elemental MediaPackage VOD Features This release adds SPEKE v2 support for MediaPackage VOD. Speke v2 is an upgrade to the existing SPEKE API to support multiple encryption keys, based on an encryption contract selected by the customer. AWS Identity and Access Management Features Documentation updates for the AWS Identity and Access Management API Reference. AWS IoT FleetWise Features Documentation update for AWS IoT FleetWise AWS Panorama Features Pause and resume camera stream processing with SignalApplicationInstanceNodeInstances. Reboot an appliance with CreateJobForDevices. More application state information in DescribeApplicationInstance response. AWS RDS DataService Features Doc update to reflect no support for schema parameter on BatchExecuteStatement API AWS Systems Manager Incident Manager Features Update RelatedItem enum to support Tasks AWS Transfer Family Features This release adds an option for customers to configure workflows that are triggered when files are only partially received from a client due to premature session disconnect. Amazon Appflow Features With this update, you can choose which Salesforce API is used by Amazon AppFlow to transfer data to or from your Salesforce account. You can choose the Salesforce REST API or Bulk API 2.0. You can also choose for Amazon AppFlow to pick the API automatically. Amazon Connect Service Features This release adds support for a secondary email and a mobile number for Amazon Connect instance users. Amazon Connect Wisdom Service ... (truncated)
Commits df9f71a (https://github.com/aws/aws-sdk-java/commit/df9f71a18374665d655a05a6824c52aa625a0b2e) AWS SDK for Java 1.12.320 0feb249 (https://github.com/aws/aws-sdk-java/commit/0feb2499f78fd6028c5ba7900b3441619919b3d0) Update GitHub version number to 1.12.320-SNAPSHOT See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.319...1.12.320)
Updates `aws-java-sdk-s3` from 1.12.319 to 1.12.320 Changelog Sourced from aws-java-sdk-s3's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md), the same 1.12.320 entry as above.
[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators
[ https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617327#comment-17617327 ] ASF GitHub Bot commented on TIKA-3879: -- nddipiazza commented on code in PR #742: URL: https://github.com/apache/tika/pull/742#discussion_r995166742 ## tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java: ## @@ -16,33 +16,19 @@ */ package org.apache.tika.pipes.emitter.s3; -import static org.apache.tika.config.TikaConfig.mustNotBeEmpty; Review Comment: ah yeah. i'm getting fired up on a new laptop with new everything. so i'm setting up my checkstyle profile now > add test containers test for s3 fetcher, emitter and pipe iterators > --- > > Key: TIKA-3879 > URL: https://issues.apache.org/jira/browse/TIKA-3879 > Project: Tika > Issue Type: Test > Components: tika-pipes >Reporter: Nicholas DiPiazza >Priority: Major > > need to add a testcontainers integration test for s3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [tika] nddipiazza commented on a diff in pull request #742: TIKA-3879 - add s3 testcontainers integration test
nddipiazza commented on code in PR #742: URL: https://github.com/apache/tika/pull/742#discussion_r995166742 ## tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java: ## @@ -16,33 +16,19 @@ */ package org.apache.tika.pipes.emitter.s3; -import static org.apache.tika.config.TikaConfig.mustNotBeEmpty; Review Comment: ah yeah. i'm getting fired up on a new laptop with new everything. so i'm setting up my checkstyle profile now
[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators
[ https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617320#comment-17617320 ] ASF GitHub Bot commented on TIKA-3879: -- tballison commented on code in PR #742: URL: https://github.com/apache/tika/pull/742#discussion_r995161152 ## tika-integration-tests/tika-pipes-s3-integration-tests/src/test/java/org/apache/tika/pipes/s3/tests/S3PipeIntegrationTest.java: ## @@ -0,0 +1,144 @@ +package org.apache.tika.pipes.s3.tests; Review Comment: Probably need a license on this? checkstyle should have complained! > add test containers test for s3 fetcher, emitter and pipe iterators > --- > > Key: TIKA-3879 > URL: https://issues.apache.org/jira/browse/TIKA-3879 > Project: Tika > Issue Type: Test > Components: tika-pipes >Reporter: Nicholas DiPiazza >Priority: Major > > need to add a testcontainers integration test for s3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [tika] tballison commented on pull request #742: TIKA-3879 - add s3 testcontainers integration test
tballison commented on PR #742: URL: https://github.com/apache/tika/pull/742#issuecomment-1278224792 Couple of really small things. Thank you so much for getting this rolling!
[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators
[ https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617322#comment-17617322 ] ASF GitHub Bot commented on TIKA-3879: -- tballison commented on PR #742: URL: https://github.com/apache/tika/pull/742#issuecomment-1278224792 Couple of really small things. Thank you so much for getting this rolling! > add test containers test for s3 fetcher, emitter and pipe iterators > --- > > Key: TIKA-3879 > URL: https://issues.apache.org/jira/browse/TIKA-3879 > Project: Tika > Issue Type: Test > Components: tika-pipes >Reporter: Nicholas DiPiazza >Priority: Major > > need to add a testcontainers integration test for s3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [tika] tballison commented on a diff in pull request #742: TIKA-3879 - add s3 testcontainers integration test
tballison commented on code in PR #742: URL: https://github.com/apache/tika/pull/742#discussion_r995161152 ## tika-integration-tests/tika-pipes-s3-integration-tests/src/test/java/org/apache/tika/pipes/s3/tests/S3PipeIntegrationTest.java: ## @@ -0,0 +1,144 @@ +package org.apache.tika.pipes.s3.tests; Review Comment: Probably need a license on this? checkstyle should have complained!
[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators
[ https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617321#comment-17617321 ] ASF GitHub Bot commented on TIKA-3879: -- tballison commented on code in PR #742: URL: https://github.com/apache/tika/pull/742#discussion_r995161476 ## tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java: ## @@ -16,33 +16,19 @@ */ package org.apache.tika.pipes.emitter.s3; -import static org.apache.tika.config.TikaConfig.mustNotBeEmpty; Review Comment: Import order had to be fixed? > add test containers test for s3 fetcher, emitter and pipe iterators > --- > > Key: TIKA-3879 > URL: https://issues.apache.org/jira/browse/TIKA-3879 > Project: Tika > Issue Type: Test > Components: tika-pipes >Reporter: Nicholas DiPiazza >Priority: Major > > need to add a testcontainers integration test for s3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [tika] tballison commented on a diff in pull request #742: TIKA-3879 - add s3 testcontainers integration test
tballison commented on code in PR #742: URL: https://github.com/apache/tika/pull/742#discussion_r995161476 ## tika-pipes/tika-emitters/tika-emitter-s3/src/main/java/org/apache/tika/pipes/emitter/s3/S3Emitter.java: ## @@ -16,33 +16,19 @@ */ package org.apache.tika.pipes.emitter.s3; -import static org.apache.tika.config.TikaConfig.mustNotBeEmpty; Review Comment: Import order had to be fixed?
[jira] [Commented] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators
[ https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617311#comment-17617311 ] ASF GitHub Bot commented on TIKA-3879: -- nddipiazza opened a new pull request, #742: URL: https://github.com/apache/tika/pull/742 # add s3 tika pipes integration tests add integration test for s3 pipe iterator, s3 fetcher, and s3 emitter. > add test containers test for s3 fetcher, emitter and pipe iterators > --- > > Key: TIKA-3879 > URL: https://issues.apache.org/jira/browse/TIKA-3879 > Project: Tika > Issue Type: Test > Components: tika-pipes >Reporter: Nicholas DiPiazza >Priority: Major > > need to add a testcontainers integration test for s3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3874) Add summary of missing unicode mappings for PDF
[ https://issues.apache.org/jira/browse/TIKA-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617308#comment-17617308 ] Hudson commented on TIKA-3874: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #843 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/843/]) TIKA-3874 -- Add summary of missing unicode mappings for PDF (tallison: [https://github.com/apache/tika/commit/d6ae5185aff23e834af64b6ea347b638e54981d3]) * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java * (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java > Add summary of missing unicode mappings for PDF > --- > > Key: TIKA-3874 > URL: https://issues.apache.org/jira/browse/TIKA-3874 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > Fix For: 2.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF
[ https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617310#comment-17617310 ] Hudson commented on TIKA-3875: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #843 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/843/]) TIKA-3875 -- Add metadata items for "broken" fonts and non-embedded fonts for PDF (tallison: [https://github.com/apache/tika/commit/3fcda6da8155028ae915951cf448e40c0df2e348]) * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java * (edit) tika-core/src/main/java/org/apache/tika/metadata/PDF.java > Add metadata items for "broken" fonts and non-embedded fonts for PDF > > > Key: TIKA-3875 > URL: https://issues.apache.org/jira/browse/TIKA-3875 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > Fix For: 2.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3878) Improve PipesReporter and PipesIterator to report the total number of files to be processed
[ https://issues.apache.org/jira/browse/TIKA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617309#comment-17617309 ] Hudson commented on TIKA-3878: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #843 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/843/]) TIKA-3878 -- allow pipes iterators to count the total number of files. (tallison: [https://github.com/apache/tika/commit/339289e45eae6560155f0fb7631687cfc86ba610]) * (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/test/java/org/apache/tika/pipes/reporters/fs/TestFileSystemStatusReporter.java * (edit) tika-parent/pom.xml * (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/pom.xml * (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesReporter.java * (add) tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/TotalCounter.java * (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/main/java/org/apache/tika/pipes/reporters/fs/FileSystemStatusReporter.java * (add) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncStatus.java * (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java * (edit) tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/fs/FileSystemPipesIterator.java * (add) tika-core/src/main/java/org/apache/tika/pipes/pipesiterator/TotalCountResult.java > Improve PipesReporter and PipesIterator to report the total number of files > to be processed > --- > > Key: TIKA-3878 > URL: https://issues.apache.org/jira/browse/TIKA-3878 > Project: Tika > Issue Type: New Feature >Reporter: Tim Allison >Priority: Major > > For user-facing applications, it would be useful to give them a sense of > progress in reporting with a denominator (total files to process). > Some pipesiterators will have a natural shortcut (select count(1)... for jdbc > or other queries in OpenSearch and/or Solr). Some will have to do twice the > work -- file system and s3(?). 
And some simply won't be able to report a > total number. > My initial target is the FileSystemPipesIterator and the > FileSystemStatusReporter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
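The "twice the work" case for a file system can be sketched roughly as follows: a counter walks the tree in a background thread while processing goes on, and a reporter polls the running total and a status flag to show a denominator once it is known. This is only an illustration of the idea in the issue. The names `FileTreeTotalCounter`, `startCounting`, `awaitCompletion` and `Status` are hypothetical and are not the `TotalCounter` / `TotalCountResult` API added in the commit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

// Hypothetical sketch: counts regular files under a root directory in a
// background thread so that a reporter can poll a running total.
class FileTreeTotalCounter {
    enum Status { NOT_STARTED, IN_PROGRESS, COMPLETED, FAILED }

    private final Path root;
    private final AtomicLong total = new AtomicLong();
    private volatile Status status = Status.NOT_STARTED;
    private Thread worker;

    FileTreeTotalCounter(Path root) {
        this.root = root;
    }

    // Starts the second pass over the tree; later calls are no-ops.
    synchronized void startCounting() {
        if (worker != null) {
            return;
        }
        status = Status.IN_PROGRESS;
        worker = new Thread(() -> {
            try (Stream<Path> paths = Files.walk(root)) {
                paths.filter(Files::isRegularFile).forEach(p -> total.incrementAndGet());
                status = Status.COMPLETED;
            } catch (IOException | UncheckedIOException e) {
                // Some sources cannot report a total; the caller sees FAILED
                status = Status.FAILED;
            }
        }, "total-counter");
        worker.setDaemon(true); // counting must never keep the JVM alive
        worker.start();
    }

    // Running total so far; only final once the status is COMPLETED.
    long getTotal() {
        return total.get();
    }

    Status getStatus() {
        return status;
    }

    // Blocks until the counting thread has finished (mainly useful in tests).
    void awaitCompletion() {
        Thread t;
        synchronized (this) {
            t = worker;
        }
        if (t == null) {
            return;
        }
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A reporter would typically print "processed / total" only when `getStatus()` is `COMPLETED`, and fall back to the bare processed count otherwise.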
[GitHub] [tika] nddipiazza opened a new pull request, #742: TIKA-3879 - add s3 testcontainers integration test
nddipiazza opened a new pull request, #742: URL: https://github.com/apache/tika/pull/742 # add s3 tika pipes integration tests add integration test for s3 pipe iterator, s3 fetcher, and s3 emitter.
[jira] [Created] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators
Nicholas DiPiazza created TIKA-3879: --- Summary: add test containers test for s3 fetcher, emitter and pipe iterators Key: TIKA-3879 URL: https://issues.apache.org/jira/browse/TIKA-3879 Project: Tika Issue Type: Test Components: tika-pipes Reporter: Nicholas DiPiazza need to add a testcontainers integration test for s3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [tika-helm] lewismc merged pull request #6: Fixes a bug with the order of HPA resources in ArgoCD
lewismc merged PR #6: URL: https://github.com/apache/tika-helm/pull/6
[GitHub] [tika-helm] lewismc commented on pull request #6: Fixes a bug with the order of HPA resources in ArgoCD
lewismc commented on PR #6: URL: https://github.com/apache/tika-helm/pull/6#issuecomment-1278179073 Hi @stijnbrouwers it looks like the issue is going through a cycle of being stale then being reopened. I'll merge your PR. Thanks for your patience. I was away from the project for a while.
[jira] [Commented] (TIKA-3826) Helm: use appVersion from Charts.yaml intsead of images.tag
[ https://issues.apache.org/jira/browse/TIKA-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617298#comment-17617298 ] Lewis John McGibbney commented on TIKA-3826: [~hairmare] good suggestion. Please file a PR and tag me. I will be happy to review. Thanks > Helm: use appVersion from Charts.yaml intsead of images.tag > --- > > Key: TIKA-3826 > URL: https://issues.apache.org/jira/browse/TIKA-3826 > Project: Tika > Issue Type: Bug > Components: helm >Affects Versions: 2.2.1 >Reporter: Lucas Bickel >Priority: Major > > This is about the [tika Helm chart|https://github.com/apache/tika-helm]. > In `values.yaml` we currently have > [this|https://github.com/apache/tika-helm/blob/492386471616713bddbc5851912acdd78bd87609/values.yaml#L25-L26]: > {code:yaml} > # Overrides the image tag whose default is the chart appVersion. > tag: "1.26" > {code} > This leads to {{ .Values.image.tag | default .Chart.AppVersion }} [in > deployment.yaml|https://github.com/apache/tika-helm/blob/492386471616713bddbc5851912acdd78bd87609/templates/deployment.yaml#L52] > being dead code. > Currently the docs indicate that we should set {{image.tag}} during the > deployment, skipping this step results in deploying a very outdated tika 1.26. > My proposal for fixing this is to set the appVersion in {{Chart.yaml}} to the > latest 2.4.1-full version and set the image.tag to an empty version so it > defaults to the version from Chart.yaml. -- This message was sent by Atlassian Jira (v8.20.10#820010)
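The "dead code" point in the quoted issue can be shown outside Helm. The sketch below mirrors the fallback behaviour of the template expression in plain Java: the fallback is used only when the configured value is empty. `ImageTagResolver` and `resolveTag` are hypothetical names for illustration and are not part of Helm or the chart; the version strings are the ones mentioned in the issue.

```java
// Hypothetical helper mirroring the template expression
// {{ .Values.image.tag | default .Chart.AppVersion }}
class ImageTagResolver {
    // Returns the chart's appVersion only when no image tag is configured.
    static String resolveTag(String valuesTag, String chartAppVersion) {
        if (valuesTag == null || valuesTag.isEmpty()) {
            return chartAppVersion;
        }
        return valuesTag;
    }
}
```

With `tag: "1.26"` hard-coded in `values.yaml`, `resolveTag("1.26", "2.4.1-full")` never reaches the fallback. With the proposed empty tag, `resolveTag("", "2.4.1-full")` yields the appVersion.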
[jira] [Resolved] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF
[ https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3875. --- Fix Version/s: 2.5.1 Resolution: Fixed Many thanks to [~tilman] for guidance on the PDFBox user's list for where to capture this info. > Add metadata items for "broken" fonts and non-embedded fonts for PDF > > > Key: TIKA-3875 > URL: https://issues.apache.org/jira/browse/TIKA-3875 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > Fix For: 2.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (TIKA-3874) Add summary of missing unicode mappings for PDF
[ https://issues.apache.org/jira/browse/TIKA-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3874. --- Fix Version/s: 2.5.1 Resolution: Fixed > Add summary of missing unicode mappings for PDF > --- > > Key: TIKA-3874 > URL: https://issues.apache.org/jira/browse/TIKA-3874 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > Fix For: 2.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3876) Add a main() method to AsyncProcessor
[ https://issues.apache.org/jira/browse/TIKA-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617129#comment-17617129 ] Hudson commented on TIKA-3876: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #841 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/841/]) TIKA-3876 create a main() in AsyncProcessor (tallison: [https://github.com/apache/tika/commit/07386be85574e2174f2ab5564f5df0e910fefbf9]) * (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java * (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java > Add a main() method to AsyncProcessor > - > > Key: TIKA-3876 > URL: https://issues.apache.org/jira/browse/TIKA-3876 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Trivial > Fix For: 2.5.1 > > > This will allow users to call an async process with only tika-core. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3877) FileSystemStatusReporter's reporterThread should be daemon
[ https://issues.apache.org/jira/browse/TIKA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617128#comment-17617128 ] Hudson commented on TIKA-3877: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #841 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/841/]) TIKA-3877 -- fix potential thread leak in FileSystemStatusReporter (tallison: [https://github.com/apache/tika/commit/79c4aef6ba8f31eca3e740216a5d8fa09f0e3895]) * (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-fs-status/src/main/java/org/apache/tika/pipes/reporters/fs/FileSystemStatusReporter.java > FileSystemStatusReporter's reporterThread should be daemon > -- > > Key: TIKA-3877 > URL: https://issues.apache.org/jira/browse/TIKA-3877 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Major > Fix For: 2.5.1 > > > Thread leak. -- This message was sent by Atlassian Jira (v8.20.10#820010)
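As a general illustration of what the issue title asks for: a non-daemon background thread keeps the JVM alive until it finishes, so a reporter thread that loops forever can prevent shutdown. Marking it as a daemon before `start()` avoids that. The sketch below is not the code from the commit; `StatusReporterSketch` is a hypothetical stand-in for a reporter that owns its own thread.

```java
// Hypothetical stand-in for a status reporter that owns a background thread.
class StatusReporterSketch {
    private final Thread reporterThread;

    StatusReporterSketch(Runnable reportTask) {
        reporterThread = new Thread(reportTask, "status-reporter");
        // Must be called before start(); a daemon thread does not block JVM exit
        reporterThread.setDaemon(true);
    }

    void start() {
        reporterThread.start();
    }

    boolean isDaemon() {
        return reporterThread.isDaemon();
    }
}
```

A daemon thread is stopped abruptly at JVM exit, so any final status write still has to happen in an explicit close or shutdown step rather than in the thread itself.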
[jira] [Created] (TIKA-3878) Improve PipesReporter and PipesIterator to report the total number of files to be processed
Tim Allison created TIKA-3878: - Summary: Improve PipesReporter and PipesIterator to report the total number of files to be processed Key: TIKA-3878 URL: https://issues.apache.org/jira/browse/TIKA-3878 Project: Tika Issue Type: New Feature Reporter: Tim Allison For user-facing applications, it would be useful to give them a sense of progress in reporting with a denominator (total files to process). Some pipesiterators will have a natural shortcut (select count(1)... for jdbc or other queries in OpenSearch and/or Solr). Some will have to do twice the work -- file system and s3(?). And some simply won't be able to report a total number. My initial target is the FileSystemPipesIterator and the FileSystemStatusReporter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (TIKA-3877) FileSystemStatusReporter's reporterThread should be daemon
[ https://issues.apache.org/jira/browse/TIKA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3877. --- Fix Version/s: 2.5.1 Resolution: Fixed > FileSystemStatusReporter's reporterThread should be daemon > -- > > Key: TIKA-3877 > URL: https://issues.apache.org/jira/browse/TIKA-3877 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Major > Fix For: 2.5.1 > > > Thread leak. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-3877) FileSystemStatusReporter's reporterThread should be daemon
Tim Allison created TIKA-3877: - Summary: FileSystemStatusReporter's reporterThread should be daemon Key: TIKA-3877 URL: https://issues.apache.org/jira/browse/TIKA-3877 Project: Tika Issue Type: Bug Reporter: Tim Allison Thread leak. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (TIKA-3876) Add a main() method to AsyncProcessor
[ https://issues.apache.org/jira/browse/TIKA-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3876. --- Fix Version/s: 2.5.1 Resolution: Fixed > Add a main() method to AsyncProcessor > - > > Key: TIKA-3876 > URL: https://issues.apache.org/jira/browse/TIKA-3876 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Trivial > Fix For: 2.5.1 > > > This will allow users to call an async process with only tika-core. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-3876) Add a main() method to AsyncProcessor
Tim Allison created TIKA-3876: - Summary: Add a main() method to AsyncProcessor Key: TIKA-3876 URL: https://issues.apache.org/jira/browse/TIKA-3876 Project: Tika Issue Type: Task Reporter: Tim Allison This will allow users to call an async process with only tika-core. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3874) Add summary of missing unicode mappings for PDF
[ https://issues.apache.org/jira/browse/TIKA-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616949#comment-17616949 ] Tim Allison commented on TIKA-3874: --- Not clear how we want to do this. The simplest method would be a percentage, but it feels like we should have a sense of scale as well. If one pdf only has 10 characters and 9 of them lack mappings, is that a greater loss of information than a PDF with 1 characters and missing mappings for 1000? Perhaps one field for overall average and one for sum of missing? > Add summary of missing unicode mappings for PDF > --- > > Key: TIKA-3874 > URL: https://issues.apache.org/jira/browse/TIKA-3874 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
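The two candidate summaries in the comment above, a ratio and an absolute count, can be compared with a small sketch. `UnmappedCharSummary` and its methods are hypothetical names and are not the fields that were added to Tika's `PDF` metadata class. The "large" document in the usage note uses made-up numbers.

```java
// Hypothetical accumulator for characters that lack a unicode mapping.
class UnmappedCharSummary {
    private long totalChars;
    private long unmappedChars;

    // Adds one page's character counts to the document totals.
    void recordPage(long charsOnPage, long unmappedOnPage) {
        totalChars += charsOnPage;
        unmappedChars += unmappedOnPage;
    }

    // Absolute scale: how many characters were lost in total.
    long getUnmappedTotal() {
        return unmappedChars;
    }

    // Relative scale: share of characters that were lost (0.0 for an empty document).
    double getUnmappedFraction() {
        return totalChars == 0 ? 0.0 : (double) unmappedChars / totalChars;
    }
}
```

For the small document in the comment (10 characters, 9 unmapped) the fraction is 0.9 and the total is 9. For a hypothetical large document with 10,000 characters and 1,000 unmapped, the fraction is only 0.1 but the total is 1,000, which is why reporting both values gives a fuller picture.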
[jira] [Comment Edited] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF
[ https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616948#comment-17616948 ] Tim Allison edited comment on TIKA-3875 at 10/13/22 10:48 AM: -- [~tilman] responded to my question on the PDFBox user list that PDFont has an .isEmbedded() method. We have access to PDFonts with the document at the end of each page and on every call to showGlyph(). Not sure if we want a boolean for the document or counts of characters per page like we do for missing unicode mappings. Or both? was (Author: talli...@mitre.org): [~tilman] responded to my question on the PDFBox user list that PDFont has an .isEmbedded() method. We have access to PDFonts with the document at the end of each page and on every call to showGlyph(). Not sure if we want a boolean for the document or counts of characters per page like we do for missing unicode mappings. > Add metadata items for "broken" fonts and non-embedded fonts for PDF > > > Key: TIKA-3875 > URL: https://issues.apache.org/jira/browse/TIKA-3875 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
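The "boolean, per-page counts, or both" question can be sketched as a single tally that supports both answers. To stay self-contained, the code uses a small stand-in interface instead of PDFBox; with PDFBox the check would be the `isEmbedded()` method on `PDFont` mentioned in the comment. `NonEmbeddedFontTally`, `FontInfo`, `onGlyph` and `endPage` are hypothetical names, and this is not the implementation in `AbstractPDF2XHTML`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tally of characters drawn with non-embedded fonts.
class NonEmbeddedFontTally {
    // Stand-in for the single font property discussed above.
    interface FontInfo {
        boolean isEmbedded();
    }

    private final List<Integer> countsPerPage = new ArrayList<>();
    private int currentPageCount;
    private boolean anyNonEmbedded;

    // Call once for every glyph that is shown.
    void onGlyph(FontInfo font) {
        if (!font.isEmbedded()) {
            currentPageCount++;
            anyNonEmbedded = true;
        }
    }

    // Call at the end of each page to close out that page's count.
    void endPage() {
        countsPerPage.add(currentPageCount);
        currentPageCount = 0;
    }

    // Document-level boolean.
    boolean hasNonEmbeddedFont() {
        return anyNonEmbedded;
    }

    // Per-page character counts, in page order.
    List<Integer> getCountsPerPage() {
        return new ArrayList<>(countsPerPage);
    }
}
```

Keeping the per-page counts makes the document-level boolean cheap to derive, so offering both costs little beyond one extra metadata field.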
[jira] [Commented] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF
[ https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616948#comment-17616948 ] Tim Allison commented on TIKA-3875: --- [~tilman] responded to my question on the PDFBox user list that PDFont has an .isEmbedded() method. We have access to PDFonts with the document at the end of each page and on every call to showGlyph(). Not sure if we want a boolean for the document or counts of characters per page like we do for missing unicode mappings. > Add metadata items for "broken" fonts and non-embedded fonts for PDF > > > Key: TIKA-3875 > URL: https://issues.apache.org/jira/browse/TIKA-3875 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF
Tim Allison created TIKA-3875: - Summary: Add metadata items for "broken" fonts and non-embedded fonts for PDF Key: TIKA-3875 URL: https://issues.apache.org/jira/browse/TIKA-3875 Project: Tika Issue Type: Task Reporter: Tim Allison -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (TIKA-3875) Add metadata items for "broken" fonts and non-embedded fonts for PDF
[ https://issues.apache.org/jira/browse/TIKA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3875: -- Priority: Minor (was: Major) > Add metadata items for "broken" fonts and non-embedded fonts for PDF > > > Key: TIKA-3875 > URL: https://issues.apache.org/jira/browse/TIKA-3875 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-3874) Add summary of missing unicode mappings for PDF
Tim Allison created TIKA-3874: - Summary: Add summary of missing unicode mappings for PDF Key: TIKA-3874 URL: https://issues.apache.org/jira/browse/TIKA-3874 Project: Tika Issue Type: Task Reporter: Tim Allison -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [tika] THausherr merged pull request #741: Bump cxf.version from 3.5.3 to 3.5.4
THausherr merged PR #741: URL: https://github.com/apache/tika/pull/741