[GitHub] [tika] THausherr merged pull request #991: Bump jetty-bom from 9.4.50.v20221201 to 9.4.51.v20230217
THausherr merged PR #991: URL: https://github.com/apache/tika/pull/991 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tika] THausherr merged pull request #988: Bump maven-compiler-plugin from 3.10.1 to 3.11.0
THausherr merged PR #988: URL: https://github.com/apache/tika/pull/988
[GitHub] [tika] THausherr merged pull request #989: Bump aws.version from 1.12.415 to 1.12.416
THausherr merged PR #989: URL: https://github.com/apache/tika/pull/989
[GitHub] [tika] THausherr merged pull request #990: Bump zstd-jni from 1.5.4-1 to 1.5.4-2
THausherr merged PR #990: URL: https://github.com/apache/tika/pull/990
[GitHub] [tika] dependabot[bot] opened a new pull request, #991: Bump jetty-bom from 9.4.50.v20221201 to 9.4.51.v20230217
dependabot[bot] opened a new pull request, #991:
URL: https://github.com/apache/tika/pull/991

Bumps [jetty-bom](https://github.com/eclipse/jetty.project) from 9.4.50.v20221201 to 9.4.51.v20230217.

Release notes
Sourced from [jetty-bom's releases](https://github.com/eclipse/jetty.project/releases).

9.4.51.v20230217

Sponsored Release
This is a release of the [End of Community Support Jetty 9.x series](https://github-redirect.dependabot.com/eclipse/jetty.project/issues/7958) that was sponsored by a support contract from Webtide.com

Changelog
- #9352 - Update / Fix CookieCutter
- #9345 - Multipart Cleanups

Dependencies
- #9269 - Bump ant.version to 1.10.13
- #9370 - Bump asciidoctorj-diagram to 2.2.4
- #9364 - Bump eclipse-jarsigner-plugin to 1.4.2
- #9251 - Bump infinispan.version to 11.0.17.Final
- #9247 - Bump maven-checkstyle-plugin to 3.2.1
- #9267 - Bump maven-dependency-plugin to 3.5.0
- #9365 - Bump maven-deploy-plugin to 3.1.0
- #9252 - Bump maven-enforcer-plugin to 3.2.1
- #9363 - Bump maven-invoker-plugin to 3.5.0
- #9266 - Bump maven-plugin-plugin to 3.7.1
- #9263 - Bump maven.plugin-tools.version to 3.7.1
- #9256 - Bump maven.resolver.version to 1.9.4
- #9368 - Bump maven.surefire.plugin.version to 3.0.0-M9
- #9362 - Bump maven.version to 3.9.0
- #9100 - Bump org.apache.aries.spifly.dynamic.bundle to 1.3.6
- #9103 - Bump org.eclipse.osgi to 3.18.200
- #9110 - Bump org.eclipse.osgi.services to 3.11.100
- #9262 - Bump spring-beans to 5.3.25

Commits
- [b45c405](https://github.com/eclipse/jetty.project/commit/b45c405e4544384de066f814ed42ae3dceacdd49) Updating to version 9.4.51.v20230217
- [3beaa81](https://github.com/eclipse/jetty.project/commit/3beaa8158c589da77ff35af90a52225b938abdb8) Merge pull request #9368 from eclipse/dependabot/maven/jetty-9.4.x/maven.sure...
- [d382683](https://github.com/eclipse/jetty.project/commit/d382683e2be1dc7527bd628df988b3e27147a94a) Merge pull request #9370 from eclipse/dependabot/maven/jetty-9.4.x/org.asciid...
- [d52d133](https://github.com/eclipse/jetty.project/commit/d52d1336da67fac3a2f7a5889d5207c78d33c389) Bump maven.surefire.plugin.version from 3.0.0-M8 to 3.0.0-M9
- [1bc959a](https://github.com/eclipse/jetty.project/commit/1bc959a9c3be3769ec59660df74663ceaf586ea7) Merge pull request #9365 from eclipse/dependabot/maven/jetty-9.4.x/org.apache...
- [08c89c7](https://github.com/eclipse/jetty.project/commit/08c89c797abef55c0a500e4440c6055e1f97ed90) Merge pull request #9364 from eclipse/dependabot/maven/jetty-9.4.x/org.eclips...
- [2a30aca](https://github.com/eclipse/jetty.project/commit/2a30acaffef584a11c1a53b371ee6ee7535d0566) Merge pull request #9363 from eclipse/dependabot/maven/jetty-9.4.x/org.apache...
- [6ab783d](https://github.com/eclipse/jetty.project/commit/6ab783d9c810f1a1e4469244e8194111c19345f4) Merge pull request #9362 from eclipse/dependabot/maven/jetty-9.4.x/maven.vers...
[jira] [Updated] (TIKA-3981) Tika parser meets window system file
[ https://issues.apache.org/jira/browse/TIKA-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tika User updated TIKA-3981:
    Attachment: Tika_Testing.docx

> Tika parser meets window system file
>
>   Key: TIKA-3981
>   URL: https://issues.apache.org/jira/browse/TIKA-3981
>   Project: Tika
>   Issue Type: Bug
>   Reporter: Tika User
>   Priority: Major
>   Attachments: ASK_Tika_Parser.docx, Tika_Testing.docx
>
> Hi All,
>
> I ran the command "java -jar tika-app-2.7.0.jar" and loaded the Windows system executable where.exe.
> You can find the file on your own Windows system at C:\Windows\System32\where.exe.
> Tika reports dcterms:created as "2037-03-05T20:49:08Z", and I am confused by this future date.
> Could you please help check why Tika reports this creation date?
>
> The attachment also contains my tests with several Tika versions, for your reference.
> Thank you.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
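One way to check where such a date could come from is to read the raw timestamp field of the executable directly. The sketch below assumes (this is not confirmed in the thread) that the reported `dcterms:created` is derived from the `TimeDateStamp` field of the PE/COFF header, a 32-bit count of seconds since 1970-01-01 UTC. If that field holds something other than a real build time, any tool that interprets it as a date would show an implausible value. `build_minimal_pe` is a hypothetical helper that only fabricates a header for demonstration.

```python
import struct
from datetime import datetime, timezone


def pe_timestamp(data: bytes) -> datetime:
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew: 32-bit little-endian offset of the PE signature, stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # COFF header follows the 4-byte signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes)
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


def build_minimal_pe(stamp: int) -> bytes:
    """Hypothetical helper: a synthetic header just large enough for pe_timestamp()."""
    header = bytearray(0x80)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)   # PE signature at offset 0x40
    header[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<I", header, 0x48, stamp)  # TimeDateStamp at 0x40 + 8
    return bytes(header)
```

On a real system the function could be called as `pe_timestamp(open(r"C:\Windows\System32\where.exe", "rb").read())` and the result compared with what Tika and Windows Explorer report; Explorer shows file-system times, which are stored separately from this header field.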
[jira] [Commented] (TIKA-3981) Tika parser meets window system file
[ https://issues.apache.org/jira/browse/TIKA-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694330#comment-17694330 ]

Tika User commented on TIKA-3981:
-

Hi [~nick],

Only certain files in C:\Windows\System32 show a future time or a time in 1988. They are owned by Microsoft. Our laptops run Windows 10.
By the way, Windows Explorer shows sensible times for these files.
The attachment (Tika_Testing.docx) also contains my tests, for your reference.
Thank you.
[^Tika_Testing.docx]
[GitHub] [tika] dependabot[bot] opened a new pull request, #990: Bump zstd-jni from 1.5.4-1 to 1.5.4-2
dependabot[bot] opened a new pull request, #990:
URL: https://github.com/apache/tika/pull/990

Bumps [zstd-jni](https://github.com/luben/zstd-jni) from 1.5.4-1 to 1.5.4-2.

Commits
- [46699bb](https://github.com/luben/zstd-jni/commit/46699bbb024a7e04a61e61d7dbe12fdb1ed9c5dd) v1.5.4-2
- [3545ce8](https://github.com/luben/zstd-jni/commit/3545ce8d36ed27fee769943a50a4e4ecc620232d) Also update CI to codecov/codecov-action@v3
- [b575a0a](https://github.com/luben/zstd-jni/commit/b575a0ae3afc1dd688706becdf79c6cd8bf4456c) Update CI: use actions/setup-java@v3
- [54d2204](https://github.com/luben/zstd-jni/commit/54d22045bcb4cb0a3778f61a456880393577e2c1) Also pass the pointer explicitly in ZstCompressCtx
- [1317e44](https://github.com/luben/zstd-jni/commit/1317e44493c676b1164c8b0398616d43fa349b5b) ZstdDecompressCtx: pass nativePtr directly to JNI calls
- [73a378f](https://github.com/luben/zstd-jni/commit/73a378f546968216e9c27504a0cb4999db245358) Fix new lines and extra spaces
- See full diff in [compare view](https://github.com/luben/zstd-jni/compare/v1.5.4-1...v1.5.4-2)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

---

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
[GitHub] [tika] dependabot[bot] opened a new pull request, #989: Bump aws.version from 1.12.415 to 1.12.416
dependabot[bot] opened a new pull request, #989:
URL: https://github.com/apache/tika/pull/989

Bumps `aws.version` from 1.12.415 to 1.12.416.

Updates `aws-java-sdk-s3` from 1.12.415 to 1.12.416

Changelog
Sourced from [aws-java-sdk-s3's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).

1.12.416 2023-02-27

AWS Elemental MediaConvert
Features
- The AWS Elemental MediaConvert SDK has added support for HDR10 to SDR tone mapping, and animated GIF video input sources.

AWS Lambda
Features
- This release adds the ability to create ESMs with Document DB change streams as event source. For more information see https://docs.aws.amazon.com/lambda/latest/dg/with-documentdb.html.

Amazon CloudWatch Internet Monitor
Features
- CloudWatch Internet Monitor is a new service within CloudWatch that will help application developers and network engineers continuously monitor internet performance metrics such as availability and performance between their AWS-hosted applications and end-users of these applications.

Amazon DevOps Guru
Features
- This release adds the description field on ListAnomaliesForInsight and DescribeAnomaly API responses for proactive anomalies.

Amazon Timestream Write
Features
- This release adds the ability to ingest batched historical data or migrate data in bulk from S3 into Timestream using CSV files.

Elastic Disaster Recovery Service
Features
- New fields were added to reflect availability zone data in source server and recovery instance description commands responses, as well as source server launch status.

Commits
- [8d9555d](https://github.com/aws/aws-sdk-java/commit/8d9555dca5c43682cba7c1f67981bd7a61fd0f17) AWS SDK for Java 1.12.416
- [37b4fb8](https://github.com/aws/aws-sdk-java/commit/37b4fb884e3ed111e3211c5c8a4d8a529146a6d4) Update GitHub version number to 1.12.416-SNAPSHOT
- See full diff in [compare view](https://github.com/aws/aws-sdk-java/compare/1.12.415...1.12.416)

Updates `aws-java-sdk-transcribe` from 1.12.415 to 1.12.416

Changelog
Sourced from [aws-java-sdk-transcribe's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md). The 1.12.416 entry and the commits are identical to those listed for `aws-java-sdk-s3` above.
[GitHub] [tika] dependabot[bot] opened a new pull request, #988: Bump maven-compiler-plugin from 3.10.1 to 3.11.0
dependabot[bot] opened a new pull request, #988:
URL: https://github.com/apache/tika/pull/988

Bumps [maven-compiler-plugin](https://github.com/apache/maven-compiler-plugin) from 3.10.1 to 3.11.0.

Commits
- [eeda628](https://github.com/apache/maven-compiler-plugin/commit/eeda628b832bf3cc27571e2073f62d582a6d9527) [maven-release-plugin] prepare release maven-compiler-plugin-3.11.0
- [82b799f](https://github.com/apache/maven-compiler-plugin/commit/82b799f3501d0dc3ef868859245816c563c46f04) [MCOMPILER-527] Upgrade plexus-java to 1.1.2 (#177)
- [f9c2350](https://github.com/apache/maven-compiler-plugin/commit/f9c2350c885a96638db66fbab4d9180729a31d5a) [MCOMPILER-526] Fix IT (#178)
- [4022bd0](https://github.com/apache/maven-compiler-plugin/commit/4022bd0f37626124dad394b2e4583fd6768fa74a) [MCOMPILER-494] - Add a useModulePath switch to the testCompile mojo (#119)
- [f4a8a54](https://github.com/apache/maven-compiler-plugin/commit/f4a8a54e116b07e888ac7b6371fa24b7a81517b3) [MCOMPILER-525] Incorrect detection of dependency change (#172)
- [86b9f59](https://github.com/apache/maven-compiler-plugin/commit/86b9f5972bcb005305f8abb8fb1f3c0d89df2726) [MCOMPILER-395] Allow dependency exclusions for 'annotationProcessorPaths' (#...
- [e304ceb](https://github.com/apache/maven-compiler-plugin/commit/e304ceb91cb625399638f95be41e6c23ca0970d0) [MCOMPILER-526] Ignore reformat commit for git blame
- [f7a4613](https://github.com/apache/maven-compiler-plugin/commit/f7a4613eaa2364dcaf10f96f04a6b1afb2feb7ed) [MCOMPILER-526] Reformat
- [cc78aee](https://github.com/apache/maven-compiler-plugin/commit/cc78aee657a684af721b3efafd0e1525272d4201) [MCOMPILER-526] Upgrade to parent 39
- [3dca82f](https://github.com/apache/maven-compiler-plugin/commit/3dca82f4bf91e747c81ff3fe43e670f7cd7c08e1) [MCOMPILER-526] Add packages to please the formatter
- Additional commits viewable in [compare view](https://github.com/apache/maven-compiler-plugin/compare/maven-compiler-plugin-3.10.1...maven-compiler-plugin-3.11.0)
[jira] [Created] (TIKA-3983) Snapshot versions mismatch
Alexey Pismenskiy created TIKA-3983:
---
    Summary: Snapshot versions mismatch
    Key: TIKA-3983
    URL: https://issues.apache.org/jira/browse/TIKA-3983
    Project: Tika
    Issue Type: Bug
    Components: build
    Affects Versions: 2.7.1
    Reporter: Alexey Pismenskiy

[https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parsers-standard-package/2.7.1-SNAPSHOT/] has a maven-metadata.xml that points to a snapshot version that does not exist: timestamp 20230227.092344, build number 43.

That is why a local build that uses the snapshot (2.7.1-SNAPSHOT) fails:

Apache snapshots: tried
[warn] https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parsers-standard-package/2.7.1-SNAPSHOT/tika-parsers-standard-package-2.7.1-20230227.092344-43.pom
[warn] https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parsers-standard-package/2.7.1-SNAPSHOT/tika-parsers-standard-package-2.7.1-SNAPSHOT.pom
[warn] ::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::
[warn] :: org.apache.tika#tika-parsers-standard-package;2.7.1-SNAPSHOT: not found
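To illustrate why stale metadata breaks resolution, here is a rough sketch of how a build tool could turn the snapshot entry in maven-metadata.xml into the timestamped file name it then requests. The XML below is a reconstruction for illustration only: it follows the usual Maven repository metadata layout and reuses the timestamp and build number quoted in the report, but it is not a copy of the actual file on repository.apache.org. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative metadata, reconstructed from the values in the report (assumption).
SAMPLE_METADATA = """
<metadata>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers-standard-package</artifactId>
  <version>2.7.1-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20230227.092344</timestamp>
      <buildNumber>43</buildNumber>
    </snapshot>
  </versioning>
</metadata>
"""


def snapshot_pom_name(metadata_xml: str) -> str:
    """Build the timestamped POM file name a resolver would look for."""
    root = ET.fromstring(metadata_xml.strip())
    artifact = root.findtext("artifactId")
    version = root.findtext("version")
    timestamp = root.findtext("versioning/snapshot/timestamp")
    build = root.findtext("versioning/snapshot/buildNumber")
    suffix = "-SNAPSHOT"
    if not version.endswith(suffix):
        raise ValueError("not a snapshot version: " + version)
    base = version[: -len(suffix)]
    # <artifactId>-<base version>-<timestamp>-<buildNumber>.pom
    return f"{artifact}-{base}-{timestamp}-{build}.pom"


print(snapshot_pom_name(SAMPLE_METADATA))
```

The resulting name matches the first URL in the "tried" list above. If the repository does not contain a file with that exact timestamp and build number, the lookup fails even though other snapshot builds may exist in the same directory.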
[GitHub] [tika] apismensky commented on pull request #985: [TIKA-3979] OneNoteParser - Improve performance for deserialization
apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446966128

Confirming with my file:
Before fix: 26844 ms
After fix: 692 ms
Yay yay! @nddipiazza thanks for fixing!
[jira] [Commented] (TIKA-3979) OneNoteParser - Improve performance for deserialization
[ https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694154#comment-17694154 ]

ASF GitHub Bot commented on TIKA-3979:
--

apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446966128

Confirming with my file:
Before fix: 26844 ms
After fix: 692 ms
Yay yay! @nddipiazza thanks for fixing!

> OneNoteParser - Improve performance for deserialization
> ---
>
>   Key: TIKA-3979
>   URL: https://issues.apache.org/jira/browse/TIKA-3979
>   Project: Tika
>   Issue Type: Improvement
>   Components: parser
>   Affects Versions: 2.7.0
>   Reporter: David Xie
>   Priority: Major
>   Attachments: image-2023-02-20-14-42-10-590.png, image-2023-02-25-12-01-40-311.png
>
> We noticed some performance issues specific to parsing OneNote files. Our cpu profiler reports that the parser spends a lot of time on deserializing byte arrays (image included below)
> !image-2023-02-20-14-42-10-590.png!
[jira] [Commented] (TIKA-3979) OneNoteParser - Improve performance for deserialization
[ https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694134#comment-17694134 ]

ASF GitHub Bot commented on TIKA-3979:
--

nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446882008

Yes, that is because the OneNote parser for the alternative format was previously just printing some general header information. Now it actually parses the file (slowly, due to the bug), which should hopefully now be fixed. Sorry about that!
[GitHub] [tika] nddipiazza commented on pull request #985: [TIKA-3979] OneNoteParser - Improve performance for deserialization
nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446882008

Yes, that is because the OneNote parser for the alternative format was previously just printing some general header information. Now it actually parses the file (slowly, due to the bug), which should hopefully now be fixed. Sorry about that!
[jira] [Commented] (TIKA-3979) OneNoteParser - Improve performance for deserialization
[ https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694096#comment-17694096 ]

ASF GitHub Bot commented on TIKA-3979:
--

apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446743975

I was going to submit this issue last week. My observation was similar: lots of overhead around BitSet (memory allocations and CPU). We switched from Tika 1.27 to 2.7.0. For one of the files we saw this difference:
Extraction took: 2199 (tika 1.27) vs Extraction took: 27010 (tika 2.7.0)
Both in ms, so it is more than 10 times slower. The original file size is 50.5 MB.
[GitHub] [tika] apismensky commented on pull request #985: [TIKA-3979] OneNoteParser - Improve performance for deserialization
apismensky commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446743975

I was going to submit this issue last week. My observation was similar: lots of overhead around BitSet (memory allocations and CPU). We switched from Tika 1.27 to 2.7.0. For one of the files we saw this difference:
Extraction took: 2199 (tika 1.27) vs Extraction took: 27010 (tika 2.7.0)
Both in ms, so it is more than 10 times slower. The original file size is 50.5 MB.
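For reference, the ratios behind the timings quoted in this thread can be worked out as below. This is only arithmetic on the reported numbers; whether the "before/after fix" measurements used the same 50.5 MB file as the 1.27 vs 2.7.0 comparison is not stated, so the two ratios should not be combined.

```python
def ratio(slow_ms: float, fast_ms: float) -> float:
    """How many times longer the slow run took than the fast run."""
    return slow_ms / fast_ms


# Tika 2.7.0 (27010 ms) compared with Tika 1.27 (2199 ms)
regression = ratio(27010, 2199)

# PR #985: before the fix (26844 ms) compared with after the fix (692 ms)
fix_gain = ratio(26844, 692)

print(f"2.7.0 vs 1.27: {regression:.1f}x slower")
print(f"before vs after fix: {fix_gain:.1f}x faster")
```

The first ratio is consistent with the "more than 10 times slower" statement in the comment.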