[jira] [Created] (TIKA-4026) Consider adding -kb to the unrar parser
Tim Allison created TIKA-4026:
Summary: Consider adding -kb to the unrar parser
Key: TIKA-4026
URL: https://issues.apache.org/jira/browse/TIKA-4026
Project: Tika
Issue Type: Task
Reporter: Tim Allison
--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4025) Extract frame count from gifs
[ https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718681#comment-17718681 ]
Tim Allison commented on TIKA-4025:
Yes, agreed. Thank you! I looked at the XMP spec that you referenced in the TIFF class, and that is fairly complex. Does anyone have recs for a standard for frame count in video?
> Extract frame count from gifs
>
> Key: TIKA-4025
> URL: https://issues.apache.org/jira/browse/TIKA-4025
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
>
> Over on TIKA-4019, an animated gif example made me realize that we're not currently extracting the number of frames for gifs into the metadata. We should do this.
>
> Any recs for the name of the metadata key?
[jira] [Commented] (TIKA-4025) Extract frame count from gifs
[ https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718674#comment-17718674 ]
Nick Burch commented on TIKA-4025:
Would a video metadata specification's frame count be a better home? XMP seems to have a pretty complex FrameCount type. From a quick glance I couldn't spot an obvious property using it, but I feel like there ought to be one...
[jira] [Comment Edited] (TIKA-4025) Extract frame count from gifs
[ https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718654#comment-17718654 ]
Tim Allison edited comment on TIKA-4025 at 5/2/23 5:09 PM:
[~nick] and fellow devs, how about using our existing TIFF#EXIF_PAGE_COUNT? I don't like that because we'd be getting the frame count out of the reader (reader.getNumImages(true)), not from exif metadata. Other recs?
[jira] [Commented] (TIKA-4025) Extract frame count from gifs
[ https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718654#comment-17718654 ]
Tim Allison commented on TIKA-4025:
[~nick] and fellow devs, how about using our existing TIFF#EXIF_PAGE_COUNT? I don't like that because we'd be getting the frame count out of the reader (reader.getNumImages(true)), not from exif metadata. Other recs?
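For readers following the thread, here is a minimal sketch of what counting frames "out of the reader" could look like with the JDK's built-in ImageIO GIF reader. The class and method names (GifFrameCount, countFrames) are hypothetical and are not taken from Tika's code; only the ImageIO calls are standard JDK API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper for illustration only; not part of Tika.
final class GifFrameCount {

    // Returns the number of frames the GIF reader finds in the stream.
    static int countFrames(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                throw new IOException("Could not create an ImageInputStream");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("gif");
            if (!readers.hasNext()) {
                throw new IOException("No GIF reader available");
            }
            ImageReader reader = readers.next();
            try {
                // setInput(Object) leaves seekForwardOnly=false, which
                // getNumImages(true) requires in order to scan the stream.
                reader.setInput(iis);
                // allowSearch=true walks the whole stream to count images.
                return reader.getNumImages(true);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

The count comes from scanning the image data itself rather than from any embedded exif block, which is the distinction the comment above draws when questioning an exif-named key.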
[jira] [Created] (TIKA-4025) Extract frame count from gifs
Tim Allison created TIKA-4025:
Summary: Extract frame count from gifs
Key: TIKA-4025
URL: https://issues.apache.org/jira/browse/TIKA-4025
Project: Tika
Issue Type: Task
Reporter: Tim Allison

Over on TIKA-4019, an animated gif example made me realize that we're not currently extracting the number of frames for gifs into the metadata. We should do this.

Any recs for the name of the metadata key?
[jira] [Commented] (TIKA-4019) Animated gif embedded in msg email triggers gifToPix exception
[ https://issues.apache.org/jira/browse/TIKA-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718604#comment-17718604 ]
Tim Allison commented on TIKA-4019:
It looks like jammy is using giflib 5.1.9, and kinetic is using 5.2.1 (as is my mac laptop). If I bump the base image to kinetic, I don't get that exception. So, we can either build giflib 5.2.1 in our jammy base image or bump the base to kinetic?
> Animated gif embedded in msg email triggers gifToPix exception
>
> Key: TIKA-4019
> URL: https://issues.apache.org/jira/browse/TIKA-4019
> Project: Tika
> Issue Type: Bug
> Environment: Docker image "apache/tika:latest-full"
> Reporter: adjenks
> Priority: Minor
>
> I get many of these errors:
> {quote}org.apache.tika.exception.TikaException: TesseractOCRParser bad exit value 1 err msg: Error in gifToPix: failed to read GIF data
> Error in pixReadStreamGif: failed to read gif from file data
> Error in pixReadStream: gif: no pix returned
> Error in pixRead: pix not read
> Error during processing.
> at org.apache.tika.parser.ocr.TesseractOCRParser.runOCRProcess(TesseractOCRParser.java:458)
> at org.apache.tika.parser.ocr.TesseractOCRParser.doOCR(TesseractOCRParser.java:412)
> ...etc...
> {quote}
> The common theme among all of the files producing these errors is that they are Outlook msg files with embedded animated gifs.
> I looked at opening a ticket for Tesseract but I recall their site said something like please open a ticket with the software that uses Tesseract first in case it's a configuration problem or something.
> Any which way, I think animated gifs should just be ignored. I get hundreds of these errors.
> Thank you. Good luck.
[GitHub] [tika] THausherr merged pull request #1105: Bump aws.version from 1.12.459 to 1.12.460
THausherr merged PR #1105: URL: https://github.com/apache/tika/pull/1105 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tika] dependabot[bot] opened a new pull request, #1105: Bump aws.version from 1.12.459 to 1.12.460
dependabot[bot] opened a new pull request, #1105:
URL: https://github.com/apache/tika/pull/1105

Bumps `aws.version` from 1.12.459 to 1.12.460.

Updates `aws-java-sdk-s3` from 1.12.459 to 1.12.460

Changelog, sourced from aws-java-sdk-s3's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md):

1.12.460 2023-05-01

AWS Compute Optimizer
Features
- support for tag filtering within compute optimizer. ability to filter recommendation results by tag and tag key value pairs. ability to filter by inferred workload type added.

AWS Key Management Service
Features
- This release makes the NitroEnclave request parameter Recipient and the response field for CiphertextForRecipient available in AWS SDKs. It also adds the regex pattern for CloudHsmClusterId validation.

Commits
- c2c2ba9 AWS SDK for Java 1.12.460 (https://github.com/aws/aws-sdk-java/commit/c2c2ba95c0a5c502df78a5ac8dff511a57de1704)
- d01e3d7 Update GitHub version number to 1.12.460-SNAPSHOT (https://github.com/aws/aws-sdk-java/commit/d01e3d7fc874d27a71bb0bdbe5007449008fefc8)
- See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.459...1.12.460)

Updates `aws-java-sdk-transcribe` from 1.12.459 to 1.12.460; its changelog and commits are identical to those listed above.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.