[GitHub] [tika] THausherr merged pull request #991: Bump jetty-bom from 9.4.50.v20221201 to 9.4.51.v20230217

2023-02-27 Thread via GitHub


THausherr merged PR #991:
URL: https://github.com/apache/tika/pull/991


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] THausherr merged pull request #988: Bump maven-compiler-plugin from 3.10.1 to 3.11.0

2023-02-27 Thread via GitHub


THausherr merged PR #988:
URL: https://github.com/apache/tika/pull/988





[GitHub] [tika] THausherr merged pull request #989: Bump aws.version from 1.12.415 to 1.12.416

2023-02-27 Thread via GitHub


THausherr merged PR #989:
URL: https://github.com/apache/tika/pull/989





[GitHub] [tika] THausherr merged pull request #990: Bump zstd-jni from 1.5.4-1 to 1.5.4-2

2023-02-27 Thread via GitHub


THausherr merged PR #990:
URL: https://github.com/apache/tika/pull/990





[GitHub] [tika] dependabot[bot] opened a new pull request, #991: Bump jetty-bom from 9.4.50.v20221201 to 9.4.51.v20230217

2023-02-27 Thread via GitHub


dependabot[bot] opened a new pull request, #991:
URL: https://github.com/apache/tika/pull/991

   Bumps [jetty-bom](https://github.com/eclipse/jetty.project) from 
9.4.50.v20221201 to 9.4.51.v20230217.
   
   Release notes
   Sourced from [jetty-bom's releases](https://github.com/eclipse/jetty.project/releases).
   
   9.4.51.v20230217
   Sponsored Release
   This is a release of the [End of Community Support](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/7958)
 Jetty 9.x series that was sponsored by a [support contract from Webtide.com](https://github.com/eclipse/jetty.project/blob/HEAD/mailto:sa...@webtide.com)
   Changelog
   
   - [#9352](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9352) - Update / Fix CookieCutter
   - [#9345](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9345) - Multipart Cleanups
   
   Dependencies
   
   - [#9269](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9269) - Bump ant.version to 1.10.13
   - [#9370](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9370) - Bump asciidoctorj-diagram to 2.2.4
   - [#9364](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9364) - Bump eclipse-jarsigner-plugin to 1.4.2
   - [#9251](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9251) - Bump infinispan.version to 11.0.17.Final
   - [#9247](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9247) - Bump maven-checkstyle-plugin to 3.2.1
   - [#9267](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9267) - Bump maven-dependency-plugin to 3.5.0
   - [#9365](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9365) - Bump maven-deploy-plugin to 3.1.0
   - [#9252](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9252) - Bump maven-enforcer-plugin to 3.2.1
   - [#9363](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9363) - Bump maven-invoker-plugin to 3.5.0
   - [#9266](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9266) - Bump maven-plugin-plugin to 3.7.1
   - [#9263](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9263) - Bump maven.plugin-tools.version to 3.7.1
   - [#9256](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9256) - Bump maven.resolver.version to 1.9.4
   - [#9368](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9368) - Bump maven.surefire.plugin.version to 3.0.0-M9
   - [#9362](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9362) - Bump maven.version to 3.9.0
   - [#9100](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9100) - Bump org.apache.aries.spifly.dynamic.bundle to 1.3.6
   - [#9103](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9103) - Bump org.eclipse.osgi to 3.18.200
   - [#9110](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9110) - Bump org.eclipse.osgi.services to 3.11.100
   - [#9262](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9262) - Bump spring-beans to 5.3.25
   
   
   
   
   Commits
   
   - [b45c405](https://github.com/eclipse/jetty.project/commit/b45c405e4544384de066f814ed42ae3dceacdd49) Updating to version 9.4.51.v20230217
   - [3beaa81](https://github.com/eclipse/jetty.project/commit/3beaa8158c589da77ff35af90a52225b938abdb8) Merge pull request [#9368](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9368) from eclipse/dependabot/maven/jetty-9.4.x/maven.sure...
   - [d382683](https://github.com/eclipse/jetty.project/commit/d382683e2be1dc7527bd628df988b3e27147a94a) Merge pull request [#9370](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9370) from eclipse/dependabot/maven/jetty-9.4.x/org.asciid...
   - [d52d133](https://github.com/eclipse/jetty.project/commit/d52d1336da67fac3a2f7a5889d5207c78d33c389) Bump maven.surefire.plugin.version from 3.0.0-M8 to 3.0.0-M9
   - [1bc959a](https://github.com/eclipse/jetty.project/commit/1bc959a9c3be3769ec59660df74663ceaf586ea7) Merge pull request [#9365](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9365) from eclipse/dependabot/maven/jetty-9.4.x/org.apache...
   - [08c89c7](https://github.com/eclipse/jetty.project/commit/08c89c797abef55c0a500e4440c6055e1f97ed90) Merge pull request [#9364](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9364) from eclipse/dependabot/maven/jetty-9.4.x/org.eclips...
   - [2a30aca](https://github.com/eclipse/jetty.project/commit/2a30acaffef584a11c1a53b371ee6ee7535d0566) Merge pull request [#9363](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9363) from eclipse/dependabot/maven/jetty-9.4.x/org.apache...
   - [6ab783d](https://github.com/eclipse/jetty.project/commit/6ab783d9c810f1a1e4469244e8194111c19345f4) Merge pull request [#9362](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/9362) from eclipse/dependabot/maven/jetty-9.4.x/maven.vers...
  

[jira] [Updated] (TIKA-3981) Tika parser meets window system file

2023-02-27 Thread Tika User (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tika User updated TIKA-3981:

Attachment: Tika_Testing.docx

> Tika parser meets window system file
> 
>
> Key: TIKA-3981
> URL: https://issues.apache.org/jira/browse/TIKA-3981
> Project: Tika
>  Issue Type: Bug
>Reporter: Tika User
>Priority: Major
> Attachments: ASK_Tika_Parser.docx, Tika_Testing.docx
>
>
> Hi All,
>  
>    I ran the command "java -jar tika-app-2.7.0.jar" and loaded the 
> Windows system executable where.exe. 
>   You can find the file on your own Windows system at 
> c:\Windows\System32\where.exe.
>   Tika reports dcterms:created as "2037-03-05T20:49:08Z", and I am 
> confused by this future time. 
>   Could you please help check why Tika reports this unusual creation date?  
>  
>  The attachment contains my testing with several Tika versions, for your 
> reference. 
> Thank you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-3981) Tika parser meets window system file

2023-02-27 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694330#comment-17694330
 ] 

Tika User commented on TIKA-3981:
-

Hi [~nick] ,

 

  Only certain files located in C:\windows\System32 show a future time or a 
1988 time. They are owned by Microsoft. Our laptops have Windows 10 
installed.

  By the way, in Windows Explorer these files show a sensible time.  
The attachment (Tika_Testing.docx) contains my testing, for your reference.

  Thank you.

[^Tika_Testing.docx]
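
One way to cross-check the value outside Tika is to read the timestamp stored inside the executable itself, rather than the filesystem time that Windows Explorer shows. The sketch below uses a hypothetical helper class, `PeTimestamp` (not part of Tika), that reads the 32-bit `TimeDateStamp` field of the PE/COFF header. It is an assumption, not confirmed in this thread, that Tika's dcterms:created for .exe files is derived from this field.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of Tika.
final class PeTimestamp {

    // Returns the COFF TimeDateStamp of a PE image, interpreted as
    // unsigned seconds since 1970-01-01T00:00:00Z.
    static Instant read(byte[] image) {
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        // DOS header starts with the bytes "MZ".
        if (image.length < 0x40 || buf.getShort(0) != 0x5A4D) {
            throw new IllegalArgumentException("not an MZ executable");
        }
        // e_lfanew at offset 0x3C holds the file offset of the PE signature.
        int peOffset = buf.getInt(0x3C);
        if (peOffset < 0 || peOffset > image.length - 12
                || buf.getInt(peOffset) != 0x00004550) { // "PE\0\0"
            throw new IllegalArgumentException("PE signature not found");
        }
        // Signature (4 bytes) + Machine (2) + NumberOfSections (2) = 8 bytes,
        // so TimeDateStamp sits at peOffset + 8.
        long seconds = buf.getInt(peOffset + 8) & 0xFFFFFFFFL;
        return Instant.ofEpochSecond(seconds);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: PeTimestamp <path-to-exe>");
            return;
        }
        System.out.println(read(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

If the value read this way matches what Tika prints, the unusual date is stored in the file and is not a parsing error. Microsoft has described filling this field with a build-derived hash instead of a clock time in Windows 10 binaries, which would be consistent with dates such as 2037 or 1988, although nothing in this thread confirms that for where.exe.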



[GitHub] [tika] dependabot[bot] opened a new pull request, #990: Bump zstd-jni from 1.5.4-1 to 1.5.4-2

2023-02-27 Thread via GitHub


dependabot[bot] opened a new pull request, #990:
URL: https://github.com/apache/tika/pull/990

   Bumps [zstd-jni](https://github.com/luben/zstd-jni) from 1.5.4-1 to 1.5.4-2.
   
   Commits
   
   - [46699bb](https://github.com/luben/zstd-jni/commit/46699bbb024a7e04a61e61d7dbe12fdb1ed9c5dd) v1.5.4-2
   - [3545ce8](https://github.com/luben/zstd-jni/commit/3545ce8d36ed27fee769943a50a4e4ecc620232d) Also update CI to codecov/codecov-action@v3
   - [b575a0a](https://github.com/luben/zstd-jni/commit/b575a0ae3afc1dd688706becdf79c6cd8bf4456c) Update CI: use actions/setup-java@v3
   - [54d2204](https://github.com/luben/zstd-jni/commit/54d22045bcb4cb0a3778f61a456880393577e2c1) Also pass the pointer explicitly in ZstCompressCtx
   - [1317e44](https://github.com/luben/zstd-jni/commit/1317e44493c676b1164c8b0398616d43fa349b5b) ZstdDecompressCtx: pass nativePtr directly to JNI calls
   - [73a378f](https://github.com/luben/zstd-jni/commit/73a378f546968216e9c27504a0cb4999db245358) Fix new lines and extra spaces
   - See full diff in [compare view](https://github.com/luben/zstd-jni/compare/v1.5.4-1...v1.5.4-2)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.github.luben:zstd-jni=maven=1.5.4-1=1.5.4-2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[GitHub] [tika] dependabot[bot] opened a new pull request, #989: Bump aws.version from 1.12.415 to 1.12.416

2023-02-27 Thread via GitHub


dependabot[bot] opened a new pull request, #989:
URL: https://github.com/apache/tika/pull/989

   Bumps `aws.version` from 1.12.415 to 1.12.416.
   Updates `aws-java-sdk-s3` from 1.12.415 to 1.12.416
   
   Changelog
   Sourced from [aws-java-sdk-s3's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).
   
   1.12.416 2023-02-27
   AWS Elemental MediaConvert
   
   
   Features
   
   The AWS Elemental MediaConvert SDK has added support for HDR10 to SDR 
tone mapping, and animated GIF video input sources.
   
   
   
   AWS Lambda
   
   
   Features
   
   This release adds the ability to create ESMs with Document DB change 
streams as event source. For more information see https://docs.aws.amazon.com/lambda/latest/dg/with-documentdb.html.
   
   
   
   Amazon CloudWatch Internet Monitor
   
   
   Features
   
   CloudWatch Internet Monitor is a new service within CloudWatch that 
will help application developers and network engineers continuously monitor 
internet performance metrics such as availability and performance between their 
AWS-hosted applications and end-users of these applications.
   
   
   
   Amazon DevOps Guru
   
   
   Features
   
   This release adds the description field on ListAnomaliesForInsight and 
DescribeAnomaly API responses for proactive anomalies.
   
   
   
   Amazon Timestream Write
   
   
   Features
   
   This release adds the ability to ingest batched historical data or 
migrate data in bulk from S3 into Timestream using CSV files.
   
   
   
   Elastic Disaster Recovery Service
   
   
   Features
   
   New fields were added to reflect availability zone data in source server 
and recovery instance description commands responses, as well as source server 
launch status.
   
   
   
   
   
   
   Commits
   
   - [8d9555d](https://github.com/aws/aws-sdk-java/commit/8d9555dca5c43682cba7c1f67981bd7a61fd0f17) AWS SDK for Java 1.12.416
   - [37b4fb8](https://github.com/aws/aws-sdk-java/commit/37b4fb884e3ed111e3211c5c8a4d8a529146a6d4) Update GitHub version number to 1.12.416-SNAPSHOT
   - See full diff in [compare view](https://github.com/aws/aws-sdk-java/compare/1.12.415...1.12.416)
   
   
   
   
   Updates `aws-java-sdk-transcribe` from 1.12.415 to 1.12.416
   
   
   
   
   
   

[GitHub] [tika] dependabot[bot] opened a new pull request, #988: Bump maven-compiler-plugin from 3.10.1 to 3.11.0

2023-02-27 Thread via GitHub


dependabot[bot] opened a new pull request, #988:
URL: https://github.com/apache/tika/pull/988

   Bumps 
[maven-compiler-plugin](https://github.com/apache/maven-compiler-plugin) from 
3.10.1 to 3.11.0.
   
   Commits
   
   - [eeda628](https://github.com/apache/maven-compiler-plugin/commit/eeda628b832bf3cc27571e2073f62d582a6d9527) [maven-release-plugin] prepare release maven-compiler-plugin-3.11.0
   - [82b799f](https://github.com/apache/maven-compiler-plugin/commit/82b799f3501d0dc3ef868859245816c563c46f04) [MCOMPILER-527] Upgrade plexus-java to 1.1.2 ([#177](https://github-redirect.dependabot.com/apache/maven-compiler-plugin/issues/177))
   - [f9c2350](https://github.com/apache/maven-compiler-plugin/commit/f9c2350c885a96638db66fbab4d9180729a31d5a) [MCOMPILER-526] Fix IT ([#178](https://github-redirect.dependabot.com/apache/maven-compiler-plugin/issues/178))
   - [4022bd0](https://github.com/apache/maven-compiler-plugin/commit/4022bd0f37626124dad394b2e4583fd6768fa74a) [MCOMPILER-494] - Add a useModulePath switch to the testCompile mojo ([#119](https://github-redirect.dependabot.com/apache/maven-compiler-plugin/issues/119))
   - [f4a8a54](https://github.com/apache/maven-compiler-plugin/commit/f4a8a54e116b07e888ac7b6371fa24b7a81517b3) [MCOMPILER-525] Incorrect detection of dependency change ([#172](https://github-redirect.dependabot.com/apache/maven-compiler-plugin/issues/172))
   - [86b9f59](https://github.com/apache/maven-compiler-plugin/commit/86b9f5972bcb005305f8abb8fb1f3c0d89df2726) [MCOMPILER-395] Allow dependency exclusions for 'annotationProcessorPaths' (#...
   - [e304ceb](https://github.com/apache/maven-compiler-plugin/commit/e304ceb91cb625399638f95be41e6c23ca0970d0) [MCOMPILER-526] Ignore reformat commit for git blame
   - [f7a4613](https://github.com/apache/maven-compiler-plugin/commit/f7a4613eaa2364dcaf10f96f04a6b1afb2feb7ed) [MCOMPILER-526] Reformat
   - [cc78aee](https://github.com/apache/maven-compiler-plugin/commit/cc78aee657a684af721b3efafd0e1525272d4201) [MCOMPILER-526] Upgrade to parent 39
   - [3dca82f](https://github.com/apache/maven-compiler-plugin/commit/3dca82f4bf91e747c81ff3fe43e670f7cd7c08e1) [MCOMPILER-526] Add packages to please the formatter
   - Additional commits viewable in [compare view](https://github.com/apache/maven-compiler-plugin/compare/maven-compiler-plugin-3.10.1...maven-compiler-plugin-3.11.0)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.apache.maven.plugins:maven-compiler-plugin=maven=3.10.1=3.11.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   
   
   





[jira] [Created] (TIKA-3983) Snapshot versions mismatch

2023-02-27 Thread Alexey Pismenskiy (Jira)
Alexey Pismenskiy created TIKA-3983:
---

 Summary: Snapshot versions mismatch
 Key: TIKA-3983
 URL: https://issues.apache.org/jira/browse/TIKA-3983
 Project: Tika
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Alexey Pismenskiy


[https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parsers-standard-package/2.7.1-SNAPSHOT/]
 has a maven-metadata.xml that points to a snapshot version that does not 
exist: 

 <timestamp>20230227.092344</timestamp>
 <buildNumber>43</buildNumber>

 
That is why a local build that uses the snapshot (2.7.1-SNAPSHOT) fails: 
 
  Apache snapshots: tried
[warn]   https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parsers-standard-package/2.7.1-SNAPSHOT/tika-parsers-standard-package-2.7.1-20230227.092344-43.pom
[warn]   https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parsers-standard-package/2.7.1-SNAPSHOT/tika-parsers-standard-package-2.7.1-SNAPSHOT.pom
[warn] ::
[warn] ::          UNRESOLVED DEPENDENCIES         ::
[warn] ::
[warn] :: org.apache.tika#tika-parsers-standard-package;2.7.1-SNAPSHOT: not found
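
The first URL in the log follows from the metadata values: a resolver replaces the `-SNAPSHOT` suffix with the timestamp and build number from maven-metadata.xml. A minimal sketch of that naming rule follows. `SnapshotNames` and `resolvedFileName` are hypothetical names for illustration, not Maven or sbt APIs.

```java
// Hypothetical helper illustrating how a timestamped snapshot file name is
// derived from maven-metadata.xml values; not a Maven or sbt API.
final class SnapshotNames {

    static String resolvedFileName(String artifactId, String snapshotVersion,
                                   String timestamp, int buildNumber,
                                   String extension) {
        String suffix = "-SNAPSHOT";
        if (!snapshotVersion.endsWith(suffix)) {
            throw new IllegalArgumentException("not a snapshot version: " + snapshotVersion);
        }
        // "2.7.1-SNAPSHOT" -> "2.7.1"
        String base = snapshotVersion.substring(0, snapshotVersion.length() - suffix.length());
        return artifactId + "-" + base + "-" + timestamp + "-" + buildNumber + "." + extension;
    }

    public static void main(String[] args) {
        // Values taken from the report above.
        System.out.println(resolvedFileName(
                "tika-parsers-standard-package", "2.7.1-SNAPSHOT",
                "20230227.092344", 43, "pom"));
    }
}
```

With the reported values this yields tika-parsers-standard-package-2.7.1-20230227.092344-43.pom, the file the build tried first. If that file was never deployed, resolution fails even though the metadata advertises it.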





[GitHub] [tika] apismensky commented on pull request #985: [TIKA-3979] OneNoteParser - Improve performance for deserialization

2023-02-27 Thread via GitHub


apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446966128

   Confirming with my file: 
   Before fix: 26844 ms
   After fix: 692 ms
   Yay yay! 
   @nddipiazza thanks for fixing!
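
To put the timings reported in this thread side by side, the ratios can be worked out directly. This is a small sketch with a hypothetical class name, `SpeedupRatio`, and the inputs are only the millisecond figures quoted in the PR comments.

```java
import java.util.Locale;

// Hypothetical helper; the inputs are the timings quoted in the PR comments.
final class SpeedupRatio {

    // How many times longer the slower run took than the faster one.
    static double ratio(long slowerMs, long fasterMs) {
        if (fasterMs <= 0) {
            throw new IllegalArgumentException("fasterMs must be positive");
        }
        return (double) slowerMs / fasterMs;
    }

    public static void main(String[] args) {
        // Before fix vs after fix on the same file.
        System.out.println(String.format(Locale.ROOT,
                "fix speedup: %.1fx", ratio(26844, 692)));
        // Tika 2.7.0 vs Tika 1.27 on the same file.
        System.out.println(String.format(Locale.ROOT,
                "1.27 -> 2.7.0 slowdown: %.1fx", ratio(27010, 2199)));
    }
}
```

The fix works out to roughly 38.8 times faster on this file. The earlier regression from 1.27 to 2.7.0 was roughly 12.3 times, in line with the "more than 10 times slower" observation.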





[jira] [Commented] (TIKA-3979) OneNoteParser - Improve performance for deserialization

2023-02-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694154#comment-17694154
 ] 

ASF GitHub Bot commented on TIKA-3979:
--

apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446966128

   Confirming with my file: 
   Before fix: 26844 ms
   After fix: 692 ms
   Yay yay! 
   @nddipiazza thanks for fixing!




> OneNoteParser - Improve performance for deserialization
> ---
>
> Key: TIKA-3979
> URL: https://issues.apache.org/jira/browse/TIKA-3979
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 2.7.0
>Reporter: David Xie
>Priority: Major
> Attachments: image-2023-02-20-14-42-10-590.png, 
> image-2023-02-25-12-01-40-311.png
>
>
> We noticed some performance issues specific to parsing OneNote files. Our cpu 
> profiler reports that the parser spends a lot of time on deserializing byte 
> arrays (image included below)
> !image-2023-02-20-14-42-10-590.png!





[jira] [Commented] (TIKA-3979) OneNoteParser - Improve performance for deserialization

2023-02-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694134#comment-17694134
 ] 

ASF GitHub Bot commented on TIKA-3979:
--

nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446882008

   yes, that is because the OneNote parser for the alternative format was just 
printing some general header information before. Now it's actually parsing it 
(slowly, due to the bug), which should hopefully now be fixed. Sorry about that!






[GitHub] [tika] nddipiazza commented on pull request #985: [TIKA-3979] OneNoteParser - Improve performance for deserialization

2023-02-27 Thread via GitHub


nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446882008

   yes, that is because the OneNote parser for the alternative format was just 
printing some general header information before. Now it's actually parsing it 
(slowly, due to the bug), which should hopefully now be fixed. Sorry about that!





[jira] [Commented] (TIKA-3979) OneNoteParser - Improve performance for deserialization

2023-02-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694096#comment-17694096
 ] 

ASF GitHub Bot commented on TIKA-3979:
--

apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446743975

   I was going to submit this issue last week. 
   My observation was similar - lots of overhead around BitSet - mem 
allocations / cpu. 
   We switched from tika 1.27 to 2.7.0 
   For one of the files we saw the difference: 
   Extraction took: 2199 ( tika 1.27) vs
   Extraction took: 27010 ( tika 2.7.0) 
   
   Both in ms, so it is more than 10 times slower.
   Original file size is 50.5 Mb
   
   






[GitHub] [tika] apismensky commented on pull request #985: [TIKA-3979] OneNoteParser - Improve performance for deserialization

2023-02-27 Thread via GitHub


apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446743975

   I was going to submit this issue last week. 
   My observation was similar - lots of overhead around BitSet - mem 
allocations / cpu. 
   We switched from tika 1.27 to 2.7.0 
   For one of the files we saw the difference: 
   Extraction took: 2199 ( tika 1.27) vs
   Extraction took: 27010 ( tika 2.7.0) 
   
   Both in ms, so it is more than 10 times slower.
   Original file size is 50.5 Mb
   
   

