[jira] [Commented] (TIKA-4064) Update to 2.8.1
[ https://issues.apache.org/jira/browse/TIKA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731281#comment-17731281 ] Hudson commented on TIKA-4064: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk11 #1105 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1105/]) TIKA-4064: update build plugins (tilman: [https://github.com/apache/tika/commit/f2122dbcf2a8426d141e68591ad47730abfc160a]) * (edit) tika-parent/pom.xml * (edit) tika-core/pom.xml > Update to 2.8.1 > --- > > Key: TIKA-4064 > URL: https://issues.apache.org/jira/browse/TIKA-4064 > Project: Tika > Issue Type: Task > Components: build >Affects Versions: 2.8.0 >Reporter: Tilman Hausherr >Priority: Minor > Fix For: 2.8.1 > > > The latest maven versions plugin finds much more outdated stuff than the > previous one, so I'll do a few updates. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4064) Update to 2.8.1
[ https://issues.apache.org/jira/browse/TIKA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731266#comment-17731266 ] Hudson commented on TIKA-4064: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1104 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1104/]) TIKA-4064: update build plugins, cxf, aws (tilman: [https://github.com/apache/tika/commit/57d29fb6633a3c65fd40a29b93287f4d4695727d]) * (edit) tika-parent/pom.xml > Update to 2.8.1 > --- > > Key: TIKA-4064 > URL: https://issues.apache.org/jira/browse/TIKA-4064 > Project: Tika > Issue Type: Task > Components: build >Affects Versions: 2.8.0 >Reporter: Tilman Hausherr >Priority: Minor > Fix For: 2.8.1 > > > The latest maven versions plugin finds much more outdated stuff than the > previous one, so I'll do a few updates. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4061) Incorrect Automatic-Module-Name in tika-parser-crypto-module
[ https://issues.apache.org/jira/browse/TIKA-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731242#comment-17731242 ] Hudson commented on TIKA-4061: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4061 -- incorrect automatic module name in crypto parser module (tallison: [https://github.com/apache/tika/commit/710d972ee1278e347b02527269050df727ee7ce8]) * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-crypto-module/pom.xml > Incorrect Automatic-Module-Name in tika-parser-crypto-module > > > Key: TIKA-4061 > URL: https://issues.apache.org/jira/browse/TIKA-4061 > Project: Tika > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Jerome Isaac Haltom >Assignee: Tim Allison >Priority: Blocker > Fix For: 2.8.1 > > > The Automatic-Module-Name property for tika-parser-crypto-module.jar in > MANIFEST.MF is set to org.apache.tika.parser.code. This is the incorrect > value. > This currently blocks usage of Tika's Maven artifacts within IKVM projects. It > probably has ramifications for JDK9+ projects using modules as well, but > that's not me, so I don't know. > [https://github.com/ikvmnet/ikvm-maven/issues/33] -- This message was sent by Atlassian Jira (v8.20.10#820010)
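For readers who want to check a jar's manifest for this kind of problem, here is a minimal sketch using only the JDK's `java.util.jar.Manifest`. The class name `ModuleNameCheck` and the "should mention crypto" expectation are assumptions made for illustration; the issue does not state what the corrected module name is.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

class ModuleNameCheck {
    /** Returns the Automatic-Module-Name main attribute, or null if it is absent. */
    public static String automaticModuleName(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue("Automatic-Module-Name");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The value reported in the issue for the crypto module's jar.
        String manifestText = "Manifest-Version: 1.0\n"
                + "Automatic-Module-Name: org.apache.tika.parser.code\n";
        String actual = automaticModuleName(manifestText);
        // Hypothetical expectation: the name should mention the module ("crypto").
        System.out.println(actual + " mentions crypto? " + actual.contains("crypto"));
    }
}
```

In a real check the manifest text would come from the jar itself (for example via `java.util.jar.JarFile#getManifest`) rather than from a string literal.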
[jira] [Commented] (TIKA-4060) Add magic to audio/aac in tika-mimetypes.xml
[ https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731249#comment-17731249 ] Hudson commented on TIKA-4060: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4060 Test AAC files, based on testWAV.wav, one without ID3, one with dummy ID3 values (nick: [https://github.com/apache/tika/commit/500900d67ede02e87440caa9f67501d3fe59b770]) * (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/test-documents/testAACid3.aac * (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/test-documents/testAAC.aac > Add magic to audio/aac in tika-mimetypes.xml > > > Key: TIKA-4060 > URL: https://issues.apache.org/jira/browse/TIKA-4060 > Project: Tika > Issue Type: Sub-task >Reporter: Gregory Lepore >Priority: Minor > Fix For: 2.8.1 > > Attachments: > 067aece423d8694a891a61a45ac0e870914bc1314ef510ac40b36ca3397843ef, > cb1bec08898db7a733b42ac44bdd76b6177cd3a07a2435a83fd99b7453d564d1 > > > Currently tika-mimetypes only recognizes audio/aac files by the file > extension. PRONOM recently added support for identifying aac files, but the > signature is tricky. There are two signatures, below in PRONOM format curly > braces mean to look ahead between the two values for the subsequent patterns. > > The first pattern is pretty basic, the second pattern is the first pattern > after a 2048 ID3 header. 
> > ||Name|Audio Data Transport Stream sig.1| > ||Description|An FF pattern from BOF with variation of byte stream| > ||Byte sequences| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)| > | > ||Name|Audio Data Transport Stream sig.2| > ||Description|ID3 tag variation with variable byte stream| > ||Byte sequences| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|494433\{0-2045}FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)| > | -- This message was sent by Atlassian Jira (v8.20.10#820010)
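The two PRONOM signatures quoted above can be read as "ADTS sync at offset 0" or "ID3, then the same sync after a gap of 0 to 2045 bytes". The following is a simplified sketch of that logic, not the magic that was added to tika-mimetypes.xml: the class name `AdtsSniffer` is hypothetical, and only the first two bytes of the pattern (FF followed by F0, F1, F8 or F9) are checked, so it is looser than the full signature.

```java
class AdtsSniffer {
    /** True if an ADTS sync (FF followed by F0, F1, F8 or F9) starts at index i. */
    private static boolean syncAt(byte[] d, int i) {
        if (i < 0 || i + 1 >= d.length) {
            return false;
        }
        int b0 = d[i] & 0xFF;
        int b1 = d[i + 1] & 0xFF;
        return b0 == 0xFF && (b1 == 0xF0 || b1 == 0xF1 || b1 == 0xF8 || b1 == 0xF9);
    }

    /** Simplified check covering both quoted signatures (first two sync bytes only). */
    public static boolean looksLikeAdts(byte[] d) {
        if (syncAt(d, 0)) {
            return true; // signature 1: sync at the beginning of the file
        }
        if (d.length >= 3 && d[0] == 'I' && d[1] == 'D' && d[2] == '3') {
            // signature 2: "ID3", then a gap of 0..2045 bytes, then the sync
            int last = Math.min(d.length - 2, 3 + 2045);
            for (int i = 3; i <= last; i++) {
                if (syncAt(d, i)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] plain = {(byte) 0xFF, (byte) 0xF1, 0x50, (byte) 0x80};
        byte[] tagged = {'I', 'D', '3', 0, 0, (byte) 0xFF, (byte) 0xF9, 0x50, (byte) 0x80};
        System.out.println(looksLikeAdts(plain) + " " + looksLikeAdts(tagged));
    }
}
```

Checking only two bytes keeps the sketch short; the third and fourth byte alternatives in the PRONOM value narrow the match further and would be needed to avoid false positives on other FF Fx streams.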
[jira] [Commented] (TIKA-4003) application/vnd.isac.fcs
[ https://issues.apache.org/jira/browse/TIKA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731254#comment-17731254 ] Hudson commented on TIKA-4003: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4003 (#1150) (github: [https://github.com/apache/tika/commit/487f694938b99a507ea57349e3db084e6c25414b]) * (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/mime/OneOffMimeTest.java * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml TIKA-4003 -- add extra spaces to application/vnd.isac.fcs (tallison: [https://github.com/apache/tika/commit/daad9eba7ef37d570d0ee12685c7a86a687f029a]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml > application/vnd.isac.fcs > > > Key: TIKA-4003 > URL: https://issues.apache.org/jira/browse/TIKA-4003 > Project: Tika > Issue Type: Sub-task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > Attachments: 3215apc_14.fcs, > BD-FACS_Aria_II-Compensation_Controls_B515_Stained_Control.fcs, > Beckman_Coulter-Cyan.fcs > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3056) General upgrades for 1.24
[ https://issues.apache.org/jira/browse/TIKA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731245#comment-17731245 ] Hudson commented on TIKA-3056: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-3056 -- add magic for ms-fontobject (tallison: [https://github.com/apache/tika/commit/0f8ea6183f3eead20d60c9f9140680d6ad8bec6e]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml > General upgrades for 1.24 > - > > Key: TIKA-3056 > URL: https://issues.apache.org/jira/browse/TIKA-3056 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4048) Gzipped WARC not identifying all assets
[ https://issues.apache.org/jira/browse/TIKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731253#comment-17731253 ] Hudson commented on TIKA-4048: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4048 -- change default decompressConcatenated to true in CompressorParser (#1166) (github: [https://github.com/apache/tika/commit/1f41ead892b49606c8bc43c97b48d6a05af4becd]) * (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/test/resources/test-documents/multiple.gz * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/pkg/GzipParserTest.java * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/CompressorParser.java * (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/test/resources/org/apache/tika/parser/pkg/tika-gzip-config.xml * (edit) CHANGES.txt > Gzipped WARC not identifying all assets > --- > > Key: TIKA-4048 > URL: https://issues.apache.org/jira/browse/TIKA-4048 > Project: Tika > Issue Type: Bug >Reporter: Gregory Lepore >Priority: Minor > Fix For: 2.8.1 > > Attachments: Screenshot 2023-05-30 at 3.49.19 PM.png, Screenshot > 2023-05-30 at 3.50.41 PM.png, rec-20230518121844489398-5335604b8b23.warc, > rec-20230518121844489398-5335604b8b23.warc.gz, > rec-20230518121844489398-5335604b8b23.warc.gz.json, > rec-20230518121844489398-5335604b8b23.warc.json > > > The WARC parser works for non GZipped WARC files, but for GZipped WARC files > it appears not all embedded files are being identified. > > Processing a WARC.GZ file should return identical JSON output as the plain > WARC file, with the addition of the GZ file metadata. 
However, in the > attached JSON outputs, the JPEG present in the plain WARC file is not > represented in the WARC.GZ.json file. > > Additionally, the warc: metadata is not being returned for all files, > although this may be by design. > > Attached are two JSON files, one for the GZipped WARC file and one for the > plain WARC file. And the two original files. -- This message was sent by Atlassian Jira (v8.20.10#820010)
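The fix for this issue changes the default of decompressConcatenated in CompressorParser, which suggests the missing assets sat in later gzip members of the .warc.gz file. As a rough illustration of what a multi-member gzip stream is, the sketch below uses only the JDK's `java.util.zip` classes, not Tika or Commons Compress; the class name `ConcatenatedGzipDemo` and the record strings are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ConcatenatedGzipDemo {
    /** Compresses a string into a single, complete gzip member. */
    public static byte[] gzip(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
                out.write(s.getBytes(StandardCharsets.UTF_8));
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompresses everything the JDK stream yields; it continues across members. */
    public static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one byte array to another, as a multi-member .gz file does on disk. */
    public static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] first = gzip("record-1\n");
        byte[] second = gzip("record-2\n");
        // Both members are read back as one logical stream.
        System.out.print(gunzip(concat(first, second)));
        // A reader that stops after the first member would only ever see this much.
        System.out.print(gunzip(first));
    }
}
```

A decompressor that stops at the end of the first member would hand the WARC parser only the first record, which matches the symptom described in the issue.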
[jira] [Commented] (TIKA-4055) Write limit not working correctly in RecursiveParserWrapper
[ https://issues.apache.org/jira/browse/TIKA-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731246#comment-17731246 ] Hudson commented on TIKA-4055: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4055 -- fix bug in writelimit checks in RecursiveParserWrapper and a separate bug in /rmeta (#1156) (github: [https://github.com/apache/tika/commit/f41d8c35a78e845fc1adf548e8eea3df5463a63b]) * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java * (edit) CHANGES.txt * (edit) tika-core/src/main/java/org/apache/tika/parser/RecursiveParserWrapper.java * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/log4j.properties * (edit) tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/RecursiveMetadataResource.java * (edit) tika-server/tika-server-standard/src/test/java/org/apache/tika/server/standard/RecursiveMetadataResourceTest.java > Write limit not working correctly in RecursiveParserWrapper > --- > > Key: TIKA-4055 > URL: https://issues.apache.org/jira/browse/TIKA-4055 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > > [~g...@rhobard.com] noticed that the write limit in the > RecursiveParserWrapper is not working correctly. I can confirm this is a > bug. I'm working on a fix now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4005) application/x-endnote-style
[ https://issues.apache.org/jira/browse/TIKA-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731251#comment-17731251 ] Hudson commented on TIKA-4005: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4005 (#1149) (github: [https://github.com/apache/tika/commit/223ec8e47efdae5748d6377491ddb24c2feade67]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml > application/x-endnote-style > --- > > Key: TIKA-4005 > URL: https://issues.apache.org/jira/browse/TIKA-4005 > Project: Tika > Issue Type: Sub-task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4004) font/otf application/vnd.ms-opentype
[ https://issues.apache.org/jira/browse/TIKA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731252#comment-17731252 ] Hudson commented on TIKA-4004: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4004 -- add magic for application/x-font-otf (tallison: [https://github.com/apache/tika/commit/8f8c9f9190df54fa843cf7dd5cdc34a3c87496ce]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml > font/otf application/vnd.ms-opentype > > > Key: TIKA-4004 > URL: https://issues.apache.org/jira/browse/TIKA-4004 > Project: Tika > Issue Type: Sub-task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > Attachments: 00.warc, aller-bold.eot, aller-light.eot, > fleurons.eot, index.html_id=45_and_type=eot, index.html_id=67_and_type=eot, > index.html_id=75_and_type=eot, index.html_id=77_and_type=eot, > index.html_id=80_and_type=eot, index.html_id=83_and_type=eot, > index.html_id=84_and_type=eot > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4002) application/vnd.tcpdump.pcapng
[ https://issues.apache.org/jira/browse/TIKA-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731244#comment-17731244 ] Hudson commented on TIKA-4002: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4002 -- add mime type detection for pcapng (#1152) (github: [https://github.com/apache/tika/commit/b0080e7df9cc4dda9a01a5fac6631c74a0e2a97a]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/mime/OneOffMimeTest.java > application/vnd.tcpdump.pcapng > -- > > Key: TIKA-4002 > URL: https://issues.apache.org/jira/browse/TIKA-4002 > Project: Tika > Issue Type: Sub-task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > Attachments: fmt_779_pcap_Packet_Capture_small_capture.pcap > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4046) Bump siegfried detector timeout to one minute
[ https://issues.apache.org/jira/browse/TIKA-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731243#comment-17731243 ] Hudson commented on TIKA-4046: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4046 -- bump siegfried timeout to 1 minute. (tallison: [https://github.com/apache/tika/commit/8877e9fc7ab2eb004ff7b5390aa281a7357a6eb1]) * (edit) tika-detectors/tika-detector-siegfried/src/main/java/org/apache/tika/detect/siegfried/SiegfriedDetector.java > Bump siegfried detector timeout to one minute > - > > Key: TIKA-4046 > URL: https://issues.apache.org/jira/browse/TIKA-4046 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Trivial > Fix For: 2.8.1 > > > It looks like I set the siegfried timeout to 6000 milliseconds. I'm sure > that's a typo for 60000. Let's bump it to a minute. -- This message was sent by Atlassian Jira (v8.20.10#820010)
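The arithmetic behind this ticket is small but easy to get wrong by one zero: 6000 ms is six seconds, while one minute is 60000 ms. A minimal sketch of deriving the value from `java.time.Duration` instead of typing the literal is shown below; the class name `TimeoutMillis` is hypothetical and this is not the code in SiegfriedDetector.

```java
import java.time.Duration;

class TimeoutMillis {
    /** One minute expressed in milliseconds, derived rather than hand-typed. */
    public static long oneMinuteMillis() {
        return Duration.ofMinutes(1).toMillis();
    }

    /** Converts a millisecond timeout to whole seconds, for sanity checks and logging. */
    public static long toSeconds(long millis) {
        return Duration.ofMillis(millis).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(oneMinuteMillis()); // one minute in ms
        System.out.println(toSeconds(6000));   // the old value, in seconds
    }
}
```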
[jira] [Commented] (TIKA-4063) PipesServer should not initialize emitters if the server will never emit results
[ https://issues.apache.org/jira/browse/TIKA-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731240#comment-17731240 ] Hudson commented on TIKA-4063: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4063 -- skip initialization of emitter in PipesServer if emitting from the server has been turned off. (tallison: [https://github.com/apache/tika/commit/1da3b76dee4aef19f0019eea0210a58fbaabcff2]) * (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java > PipesServer should not initialize emitters if the server will never emit > results > > > Key: TIKA-4063 > URL: https://issues.apache.org/jira/browse/TIKA-4063 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Priority: Trivial > Fix For: 2.8.1 > > > As a safety valve for large extracts, we enabled direct emitting of data from > the PipesServer, without passing the data back to the PipesClient to be > emitted by the main process. > If a user has disabled emitting from the PipesServer, we should not > initialize the emitters in the PipesServer. > I ran into this recently because sqlite does not like multiple processes > interacting with the same db afaict. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4039) Allow users to set the maximum attachment size in the /unpack resource of tika-server
[ https://issues.apache.org/jira/browse/TIKA-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731250#comment-17731250 ] Hudson commented on TIKA-4039: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4039 (#1181) (github: [https://github.com/apache/tika/commit/2d9daef859296cad877caf29ad7765c0709472d0]) * (edit) CHANGES.txt * (edit) tika-server/tika-server-standard/src/test/java/org/apache/tika/server/standard/UnpackerResourceTest.java * (edit) tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/UnpackerResource.java > Allow users to set the maximum attachment size in the /unpack resource of > tika-server > - > > Key: TIKA-4039 > URL: https://issues.apache.org/jira/browse/TIKA-4039 > Project: Tika > Issue Type: Improvement > Components: config, parser >Affects Versions: 2.7.0 >Reporter: Shay barak >Assignee: Tim Allison >Priority: Blocker > Fix For: 2.8.1 > > Attachments: tika-config.xml > > > Adding the option to override the maximum bytes that Unrar parser can handle > so I would not get the TikaMemoryLimitException. > Wish to have the configuration to look like this: > > > type="int">10 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4054) Add various file identifications to reduce application/octet-stream
[ https://issues.apache.org/jira/browse/TIKA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731248#comment-17731248 ] Hudson commented on TIKA-4054: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4054 -- add a bunch of mimes via Greg Lepore (#1158) (github: [https://github.com/apache/tika/commit/4edff73f0fe3da1df0ba8d8c5a367fbd35b2af34]) * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/mime/OneOffMimeTest.java * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml > Add various file identifications to reduce application/octet-stream > --- > > Key: TIKA-4054 > URL: https://issues.apache.org/jira/browse/TIKA-4054 > Project: Tika > Issue Type: Sub-task >Reporter: Gregory Lepore >Priority: Major > Fix For: 2.8.1 > > > Catch all task for various format identification data which are currently > being identified as application/octet-stream. Most data is from PRONOM. > > SPSS Data File > application/x-spss-sav > ||External signatures|File extension: sav| > ||Internal signatures|| > ||Name|SPSS Data File| > ||Description|BOF: $FL2@(#)| > ||Byte sequences|| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|24464C3240282329| > > Amiga Disk File > application/x-amiga-disk-format > ||External signatures|File extension: adf| > ||Internal signatures|| > ||Name|Amiga Disk File| > ||Description|BOF: ‘DOS’ followed by ‘00\|01\|02\|03\|04\|05\|06\|07’ > depending on the format of the disk. 
More information on the internal > signature can be found here: [http://lclevy.free.fr/adflib/adf_info.html#p41]| > ||Byte sequences|| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|444F53(00\|01\|02\|03\|04\|05\|06\|07)| > > JEOL NMR Spectroscopy > chemical/x-jeol-jdf > ||External signatures|File extension: jdf| > ||Internal signatures| | > ||Name|JDF NMR Spectroscopy big endian| > ||Description|Big Endian: BOF: 4A454F4C2E4E4D52 (JEOL.NMR)| > ||Byte sequences|| > > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|4A454F4C2E4E4D52| > | | | > ||Name|JDF little endian| > ||Description|Little Endian: 524D4E2E4C4F454A (RMN.LOEJ)| > ||Byte sequences| | > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|524D4E2E4C4F454A| > > ASPRS Lidar Data Exchange Format > no mimetype found > ||External signatures|File extension: las > File extension: laz| > ||Internal signatures|| > ||Name|ASPRS Lidar Data Exchange Format 1.2| > ||Description|ASCII header: LASF, followed after 20 bytes by version number > 1.2| > ||Byte sequences|| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Byte order| | > ||Value|4C415346\{20}0102\{78}[00:99]| > > ASPRS Lidar Data Exchange Format v1.1 > no mimetype found > ||External signatures|File extension: las > File extension: laz| > ||Internal signatures|| > ||Name|ASPRS Lidar Data Exchange Format 1.1| > ||Description|ASCII header: LASF, followed after 20 bytes by version number > 1.1| > ||Byte sequences|| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Byte order| | > ||Value|4C415346\{20}0101\{78}[00:99]| > > 3D Studio > image/x-3ds > ||External signatures|File extension: 3ds| > ||Internal signatures|| > ||Name|3D Studio (V1)| > ||Description|Primary chunk ID, chunk length, version subchunk ID, chunk > length, version, 3D-editor chunk ID.| > ||Byte sequences|| > ||Position type|Absolute 
from BOF| > ||Offset|0| > ||Byte order|Little-endian| > ||Value|4D4D\{4}02000A00(03\|04)\{3}3D3D| > ||Name|3D Studio (V2)| > ||Description|Primary chunk ID, chunk length, 3D-editor chunk ID| > ||Byte sequences|| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|4D4D\{4}3D3D| > > TAP (ZX Spectrum) > [application/x-spectrum-tzx|https://www.digipres.org/formats/mime-types/#application/x-spectrum-tzx] > ||External signatures|File extension: tap| > ||Internal signatures|| > ||Name|TAPZX| > ||Description|…\{20}ÿ| > ||Byte sequences|| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|13\{20}FF| > > Sibelius > no mimetype found > ||External signatures|File extension: sib| > ||Internal signatures|| > ||Name|Sibelius| > ||Description|Absolute from beginning of file, magic bytes: .SIBELIUS| > ||Byte sequences|| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Maximum Offset|0| > ||Byte order| | > ||Value|0F534942454C495553| > > Portable Sound Format > no mimetype found > ||External signatur
[jira] [Commented] (TIKA-3996) audio/x-sap
[ https://issues.apache.org/jira/browse/TIKA-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731255#comment-17731255 ] Hudson commented on TIKA-3996: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-3996 (#1151) (github: [https://github.com/apache/tika/commit/7118705ef36463a4fd9836f2caedb87dbd5c6ef7]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml * (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/mime/OneOffMimeTest.java > audio/x-sap > --- > > Key: TIKA-3996 > URL: https://issues.apache.org/jira/browse/TIKA-3996 > Project: Tika > Issue Type: Sub-task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > Attachments: airwolf.sap, ala_ma_kota.sap, alchemia.sap > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4000) application/vnd.msa-disk-image
[ https://issues.apache.org/jira/browse/TIKA-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731247#comment-17731247 ] Hudson commented on TIKA-4000: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4000 -- add detection for magic shadow archiver (tallison: [https://github.com/apache/tika/commit/78ce839bcad8d21afc2ce5de48e6d5f6caddfe03]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml > application/vnd.msa-disk-image > -- > > Key: TIKA-4000 > URL: https://issues.apache.org/jira/browse/TIKA-4000 > Project: Tika > Issue Type: Sub-task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > Attachments: DREAMZ2B.MSA, SOTART2.MSA, TIKBGBB2.MSA > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3941) Consider having pipesserver return intermediate results
[ https://issues.apache.org/jira/browse/TIKA-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731256#comment-17731256 ] Hudson commented on TIKA-3941: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-3941 -- allow reporting of intermediate results from the pipes processor (#1167) (github: [https://github.com/apache/tika/commit/6cea7717c7a90014cd86fa605cc1e9125f173cf4]) * (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncConfig.java * (edit) tika-core/src/test/java/org/apache/tika/pipes/async/MockReporter.java * (edit) tika-core/src/test/java/org/apache/tika/pipes/async/AsyncProcessorTest.java * (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java * (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesResult.java * (edit) tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java * (edit) tika-pipes/tika-pipes-reporters/tika-pipes-reporter-jdbc/src/test/java/org/apache/tika/pipes/reporters/jdbc/TestJDBCPipesReporter.java * (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java * (add) tika-core/src/test/java/org/apache/tika/pipes/PipesServerTest.java * (add) tika-core/src/test/java/org/apache/tika/pipes/async/MockDigesterFactory.java * (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesClient.java * (add) tika-core/src/test/resources/org/apache/tika/pipes/TIKA-3941.xml * (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java > Consider having pipesserver return intermediate results > --- > > Key: TIKA-3941 > URL: https://issues.apache.org/jira/browse/TIKA-3941 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > > If the pipes server crashes, the only information that the pipesclient > receives is of the crash. 
It would be useful at a minimum to have the pipes > server report an intermediate result after file detection. > Ideally, at a minimum, the pipesclient could report file type, content-length > (if possible) and digest information. > > On another ticket (future work), we could extend intermediate results to > include partial parses/metadata extraction. The challenge here is that the > underlying metadata objects are not thread safe...so we'll punt this to deal > with later if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4062) OfflineContentHandler/ContentHandlerDecorator does not provide option for custom error handling
[ https://issues.apache.org/jira/browse/TIKA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731241#comment-17731241 ] Hudson commented on TIKA-4062: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4062 (#1179) (github: [https://github.com/apache/tika/commit/ceed7be8b1bffd697a79590e50a413744a0b108f]) * (edit) tika-core/src/main/java/org/apache/tika/exception/WriteLimitReachedException.java * (edit) tika-core/src/main/java/org/apache/tika/sax/ContentHandlerDecorator.java > OfflineContentHandler/ContentHandlerDecorator does not provide option for > custom error handling > --- > > Key: TIKA-4062 > URL: https://issues.apache.org/jira/browse/TIKA-4062 > Project: Tika > Issue Type: Bug > Components: tika-core >Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0 >Reporter: Ravi Ranjan Jha >Priority: Critical > > OfflineContentHandler/ContentHandlerDecorator does not provide option for > custom error handling > Prior to the change of passing OfflineContentHandler to SAX Parser in > XMLReaderUtils.parseSAX, one could pass a custom ContentHandlerDecorator to > handle exception or override error/warning etc methods. The same is not > possible now because the default impl for handleException in the > OfflineContentHandler's parent ContentHandlerDecorator just throws exception > as shown below: > > protected void handleException(SAXException exception) throws SAXException { > throw exception; > } > > which could probably be (at minimum) > public void handleException(SAXException exception) throws SAXException { > handler.handleException(exception); > } > > This is breaking our app's behavior. Please take it as priority. -- This message was sent by Atlassian Jira (v8.20.10#820010)
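The request in this issue is about an overridable exception hook on a decorating handler. The sketch below shows that general pattern with plain JDK SAX types; `Decorator`, `Lenient` and `Failing` are hypothetical stand-ins written for this example and are not Tika's ContentHandlerDecorator or OfflineContentHandler.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class HandlerHookSketch {
    /** Hypothetical decorator: forwards events and routes failures through a hook. */
    public static class Decorator extends DefaultHandler {
        private final ContentHandler delegate;

        public Decorator(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                delegate.characters(ch, start, length);
            } catch (SAXException e) {
                handleException(e);
            }
        }

        /** Default behaviour, as quoted in the issue: rethrow. */
        protected void handleException(SAXException exception) throws SAXException {
            throw exception;
        }
    }

    /** Subclass that overrides the hook to count errors instead of failing. */
    public static class Lenient extends Decorator {
        public int swallowed = 0;

        public Lenient(ContentHandler delegate) {
            super(delegate);
        }

        @Override
        protected void handleException(SAXException exception) {
            swallowed++;
        }
    }

    /** Delegate that always fails, to exercise the hook. */
    public static class Failing extends DefaultHandler {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            throw new SAXException("boom");
        }
    }

    /** Number of exceptions the lenient decorator swallowed, or -1 if one escaped. */
    public static int lenientSwallowed() {
        Lenient lenient = new Lenient(new Failing());
        try {
            lenient.characters("x".toCharArray(), 0, 1);
        } catch (SAXException e) {
            return -1;
        }
        return lenient.swallowed;
    }

    /** True if the default decorator rethrows the delegate's exception. */
    public static boolean strictRethrows() {
        try {
            new Decorator(new Failing()).characters("x".toCharArray(), 0, 1);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lenientSwallowed() + " " + strictRethrows());
    }
}
```

The override only takes effect if the caller's own decorator is the outermost handler; once another wrapper with the default hook sits around it, that wrapper's rethrowing behaviour applies to its own calls, which appears to be the situation the reporter describes.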
[jira] [Commented] (TIKA-4052) application/x-cdf
[ https://issues.apache.org/jira/browse/TIKA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731239#comment-17731239 ] Hudson commented on TIKA-4052: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1103 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1103/]) TIKA-4052 -- add detection for application/x-cdf (tallison: [https://github.com/apache/tika/commit/0f86aede1e2317b843a6f11ee702570c7d57737d]) * (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml > application/x-cdf > - > > Key: TIKA-4052 > URL: https://issues.apache.org/jira/browse/TIKA-4052 > Project: Tika > Issue Type: Sub-task >Reporter: Gregory Lepore >Priority: Major > Fix For: 2.8.1 > > Attachments: track05.cda, track06.cda, track07.cda > > > Examining the Common Crawl files that return application/octet-stream. > > application/x-cdf is one that should be fairly easy to add. > > ||Name|CD Audio| > ||Description|Files are 44 bytes in length, with header sequence ASCII: > RIFF$...CDDAfmt .| > ||Byte sequences| > ||Position type|Absolute from BOF| > ||Offset|0| > ||Byte order| | > ||Value|524946462400000043444441666D742018| > | -- This message was sent by Atlassian Jira (v8.20.10#820010)
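The header described above ("RIFF$...CDDAfmt ") can be checked byte by byte. The sketch below is an illustration, not the magic entry added to tika-mimetypes.xml: the class name `CdaSniffer` is hypothetical, and the offsets assume the standard RIFF layout (4-byte tag, 4-byte little-endian size, 4-byte form type), with 0x24 being the "$" in the description.

```java
import java.nio.charset.StandardCharsets;

class CdaSniffer {
    /** True if the ASCII tag appears at the given offset. */
    private static boolean asciiAt(byte[] d, int off, String tag) {
        byte[] t = tag.getBytes(StandardCharsets.US_ASCII);
        if (off + t.length > d.length) {
            return false;
        }
        for (int i = 0; i < t.length; i++) {
            if (d[off + i] != t[i]) {
                return false;
            }
        }
        return true;
    }

    /** Checks for "RIFF", a size field starting with 0x24, "CDDA" and "fmt ". */
    public static boolean looksLikeCda(byte[] d) {
        return d.length >= 16
                && asciiAt(d, 0, "RIFF")
                && (d[4] & 0xFF) == 0x24
                && asciiAt(d, 8, "CDDA")
                && asciiAt(d, 12, "fmt ");
    }

    /** Builds a 44-byte buffer with only the header fields filled in, for testing. */
    public static byte[] sampleHeader() {
        byte[] h = new byte[44];
        System.arraycopy("RIFF".getBytes(StandardCharsets.US_ASCII), 0, h, 0, 4);
        h[4] = 0x24; // RIFF chunk size: 36 bytes follow the 8-byte chunk header
        System.arraycopy("CDDA".getBytes(StandardCharsets.US_ASCII), 0, h, 8, 4);
        System.arraycopy("fmt ".getBytes(StandardCharsets.US_ASCII), 0, h, 12, 4);
        h[16] = 0x18; // size of the fmt chunk
        return h;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeCda(sampleHeader()));
    }
}
```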
[jira] [Created] (TIKA-4064) Update to 2.8.1
Tilman Hausherr created TIKA-4064: - Summary: Update to 2.8.1 Key: TIKA-4064 URL: https://issues.apache.org/jira/browse/TIKA-4064 Project: Tika Issue Type: Task Components: build Affects Versions: 2.8.0 Reporter: Tilman Hausherr Fix For: 2.8.1 The latest maven versions plugin finds much more outdated stuff than the previous one, so I'll do a few updates. -- This message was sent by Atlassian Jira (v8.20.10#820010)
no tika builds for 29 days
There have been no Tika builds for 29 days on the CI: I tried to start one manually, but it failed, claiming that no Maven was available. I then opened and saved the configuration, and now it's running. Tilman
[jira] [Commented] (TIKA-3941) Consider having pipesserver return intermediate results
[ https://issues.apache.org/jira/browse/TIKA-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731219#comment-17731219 ] Tilman Hausherr commented on TIKA-3941: --- {{PipesServerTest}} fails on windows, please change {{replaceAll}} to {{replace}} and it works > Consider having pipesserver return intermediate results > --- > > Key: TIKA-3941 > URL: https://issues.apache.org/jira/browse/TIKA-3941 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Major > Fix For: 2.8.1 > > > If the pipes server crashes, the only information that the pipesclient > receives is of the crash. It would be useful at a minimum to have the pipes > server report an intermediate result after file detection. > Ideally, at a minimum, the pipesclient could report file type, content-length > (if possible) and digest information. > > On another ticket (future work), we could extend intermediate results to > include partial parses/metadata extraction. The challenge here is that the > underlying metadata objects are not thread safe...so we'll punt this to deal > with later if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
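A likely reason for the Windows-only failure is that `String.replaceAll` treats its first argument as a regular expression, so a path containing backslashes is parsed as regex escapes, while `String.replace` is purely literal. The sketch below demonstrates the difference; the class name `LiteralReplaceDemo` and the sample path are made up, and what PipesServerTest actually replaces is not shown in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplaceDemo {
    /** Literal replacement: backslashes in target and replacement have no special meaning. */
    public static String literal(String text, String target, String replacement) {
        return text.replace(target, replacement);
    }

    /** Regex replacement: reports the exception type if the target is not a valid regex. */
    public static String regex(String text, String target, String replacement) {
        try {
            return text.replaceAll(target, replacement);
        } catch (IllegalArgumentException e) {
            // PatternSyntaxException is a subclass of IllegalArgumentException.
            return "error: " + e.getClass().getSimpleName();
        }
    }

    /** Regex replacement made literal by quoting both arguments. */
    public static String quotedRegex(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "path=C:\\Users\\tika\\in.txt";
        String target = "C:\\Users\\tika";
        System.out.println(literal(text, target, "<base>"));
        System.out.println(regex(text, target, "<base>"));
        System.out.println(quotedRegex(text, target, "<base>"));
    }
}
```

On Unix-style paths the two methods usually agree because forward slashes are not regex metacharacters, which would explain why the test passed on the CI and failed only on Windows.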