[jira] [Created] (TIKA-4026) Consider adding -kb to the unrar parser

2023-05-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4026:
-

 Summary: Consider adding -kb to the unrar parser
 Key: TIKA-4026
 URL: https://issues.apache.org/jira/browse/TIKA-4026
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison
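For context, `-kb` is unrar's "keep broken extracted files" switch. By default unrar deletes a file whose extraction failed, for example after a CRC error in a damaged or truncated archive. With `-kb` the partial output stays on disk. Below is a minimal sketch, assuming the parser shells out to the unrar executable through `ProcessBuilder`. `UnrarCommand` and `extractProcess` are hypothetical names and are not Tika's actual UnrarParser code.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Tika's UnrarParser: builds (but does not start)
// an unrar extraction process, optionally adding the -kb switch.
final class UnrarCommand {
    private UnrarCommand() {}

    static ProcessBuilder extractProcess(String unrarExe, Path archive,
                                         Path outputDir, boolean keepBroken) {
        List<String> cmd = new ArrayList<>();
        cmd.add(unrarExe);
        cmd.add("x");              // extract files with full paths
        if (keepBroken) {
            cmd.add("-kb");        // keep broken extracted files instead of deleting them
        }
        // absolute path, because the working directory is changed below
        cmd.add(archive.toAbsolutePath().toString());
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // no destination argument is given, so unrar extracts into the working directory
        pb.directory(outputDir.toFile());
        pb.redirectErrorStream(true);
        return pb;
    }
}
```

A caller would then `start()` the process and parse whatever files remain in `outputDir`. With `-kb`, content recovered from a damaged archive would presumably still reach downstream parsers instead of being silently dropped.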






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-4025) Extract frame count from gifs

2023-05-02 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718681#comment-17718681
 ] 

Tim Allison commented on TIKA-4025:
---

Yes, agreed. Thank you! I looked at the XMP spec that you referenced in the TIFF 
class, and it is fairly complex.

Does anyone have recs for a standard frame-count property for video?

> Extract frame count from gifs
> -
>
> Key: TIKA-4025
> URL: https://issues.apache.org/jira/browse/TIKA-4025
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Trivial
>
> Over on TIKA-4019, an animated gif example made me realize that we're not 
> currently extracting the number of frames for gifs into the metadata.  We 
> should do this.
>  
> Any recs for the name of the metadata key?





[jira] [Commented] (TIKA-4025) Extract frame count from gifs

2023-05-02 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718674#comment-17718674
 ] 

Nick Burch commented on TIKA-4025:
--

Would a video metadata specification's frame count be a better home? 

XMP seems to have a pretty complex FrameCount type. From a quick glance I 
couldn't spot an obvious property that uses it, but I feel like there ought to be 
one...






[jira] [Comment Edited] (TIKA-4025) Extract frame count from gifs

2023-05-02 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718654#comment-17718654
 ] 

Tim Allison edited comment on TIKA-4025 at 5/2/23 5:09 PM:
---

[~nick] and fellow devs, how about using our existing TIFF#EXIF_PAGE_COUNT?

 

I don't like that because we'd be getting the frame count out of the 
reader ({{reader.getNumImages(true)}}), not from EXIF metadata.

 

Other recs?


was (Author: talli...@mitre.org):
[~nick] and fellow devs, how about using our existing TIFF#EXIF_PAGE_COUNT?

 

I don't like that because we'd be getting the frame count out of the 
reader ({{reader.getNumImages(true)}}), not from EXIF metadata.

 

Other recs?






[jira] [Commented] (TIKA-4025) Extract frame count from gifs

2023-05-02 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718654#comment-17718654
 ] 

Tim Allison commented on TIKA-4025:
---

[~nick] and fellow devs, how about using our existing TIFF#EXIF_PAGE_COUNT?

 

I don't like that because we'd be getting the frame count out of the 
reader ({{reader.getNumImages(true)}}), not from EXIF metadata.

 

Other recs?
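For reference, the reader-based approach, which counts frames with `getNumImages(true)` rather than reading an EXIF field, can be sketched with the JDK's standard `javax.imageio` API. This is a minimal sketch built around a hypothetical `GifFrameCounter` helper; it is not Tika's parser code. The metadata key the count would be stored under is still undecided in this thread, so the sketch only returns the number.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper (not Tika code): counts GIF frames with the JDK's GIF ImageReader.
final class GifFrameCounter {
    private GifFrameCounter() {}

    static int countFrames(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                throw new IOException("could not create an ImageInputStream");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("gif");
            if (!readers.hasNext()) {
                throw new IOException("no GIF ImageReader is registered");
            }
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly must be false; otherwise getNumImages(true)
                // throws IllegalStateException
                reader.setInput(iis, false);
                // allowSearch=true lets the reader scan the whole stream to count frames
                return reader.getNumImages(true);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Because the value comes from the decoder's view of the stream rather than from an embedded tag, it does not map cleanly onto an EXIF-derived property such as TIFF#EXIF_PAGE_COUNT.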






[jira] [Created] (TIKA-4025) Extract frame count from gifs

2023-05-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4025:
-

 Summary: Extract frame count from gifs
 Key: TIKA-4025
 URL: https://issues.apache.org/jira/browse/TIKA-4025
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


Over on TIKA-4019, an animated gif example made me realize that we're not 
currently extracting the number of frames for gifs into the metadata.  We 
should do this.

 

Any recs for the name of the metadata key?





[jira] [Commented] (TIKA-4019) Animated gif embedded in msg email triggers gifToPix exception

2023-05-02 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718604#comment-17718604
 ] 

Tim Allison commented on TIKA-4019:
---

It looks like jammy is using giflib 5.1.9, and kinetic is using 5.2.1 (as is my 
Mac laptop). If I bump the base image to kinetic, I don't get that exception.

 

So, we can either build giflib 5.2.1 in our jammy base image or bump the base 
image to kinetic?

> Animated gif embedded in msg email triggers gifToPix exception
> --
>
> Key: TIKA-4019
> URL: https://issues.apache.org/jira/browse/TIKA-4019
> Project: Tika
>  Issue Type: Bug
> Environment: Docker image "apache/tika:latest-full"
>Reporter: adjenks
>Priority: Minor
>
> I get many of these errors:
> {quote}org.apache.tika.exception.TikaException: TesseractOCRParser bad exit 
> value 1 err msg: Error in gifToPix: failed to read GIF data
> Error in pixReadStreamGif: failed to read gif from file data
> Error in pixReadStream: gif: no pix returned
> Error in pixRead: pix not read
> Error during processing.
> at org.apache.tika.parser.ocr.TesseractOCRParser.runOCRProcess(TesseractOCRParser.java:458)
> at org.apache.tika.parser.ocr.TesseractOCRParser.doOCR(TesseractOCRParser.java:412)
> ...etc...
> {quote}
> The common theme among all of the files producing these errors is that they 
> are Outlook msg files with embedded animated gifs.
> I looked at opening a ticket for Tesseract, but I recall their site saying 
> something like "please open a ticket with the software that uses Tesseract 
> first, in case it's a configuration problem."
> Either way, I think animated gifs should just be ignored. I get hundreds 
> of these errors.
> Thank you. Good luck.
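The reporter's proposal to skip animated GIFs could be prototyped as a cheap check that runs before an image is handed to Tesseract. Below is a minimal sketch built around a hypothetical `GifAnimationCheck` helper, which is not part of Tika. It walks the GIF87a/GIF89a block structure (header, logical screen descriptor, extensions, image descriptors, trailer) and counts image descriptors without decoding any pixels, so it does not depend on giflib. Treating "more than one image descriptor" as "animated" is a simplification.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical pre-OCR guard (not Tika code): walks the GIF block structure
// and counts image descriptors without decoding any pixel data.
final class GifAnimationCheck {
    private GifAnimationCheck() {}

    static boolean isAnimated(InputStream in) throws IOException {
        return countImageDescriptors(in) > 1;
    }

    static int countImageDescriptors(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] sig = new byte[6];
        in.readFully(sig);
        String header = new String(sig, StandardCharsets.US_ASCII);
        if (!header.equals("GIF87a") && !header.equals("GIF89a")) {
            throw new IOException("not a GIF header: " + header);
        }
        // Logical Screen Descriptor: width(2) height(2) packed(1) bgIndex(1) aspect(1)
        skip(in, 4);
        int packed = in.readUnsignedByte();
        skip(in, 2);
        skipColorTable(in, packed);                   // global color table, if flagged
        int frames = 0;
        while (true) {
            int introducer = in.readUnsignedByte();   // EOFException if truncated
            if (introducer == 0x3B) {                 // trailer
                return frames;
            } else if (introducer == 0x21) {          // extension: label, then sub-blocks
                in.readUnsignedByte();
                skipSubBlocks(in);
            } else if (introducer == 0x2C) {          // image descriptor = one frame
                frames++;
                skip(in, 8);                          // left, top, width, height
                skipColorTable(in, in.readUnsignedByte()); // local color table, if flagged
                in.readUnsignedByte();                // LZW minimum code size
                skipSubBlocks(in);                    // compressed pixels, not decoded
            } else {
                throw new IOException("unexpected block 0x" + Integer.toHexString(introducer));
            }
        }
    }

    // Bit 7 of a packed field flags a color table of 3 * 2^(n+1) bytes, n = low 3 bits.
    private static void skipColorTable(DataInputStream in, int packed) throws IOException {
        if ((packed & 0x80) != 0) {
            skip(in, 3 * (1 << ((packed & 0x07) + 1)));
        }
    }

    private static void skipSubBlocks(DataInputStream in) throws IOException {
        int size;
        while ((size = in.readUnsignedByte()) != 0) {
            skip(in, size);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }
}
```

This guard would only sidestep the symptom. The fix discussed in the thread is upgrading giflib (5.2.1) in the Docker base image.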





[GitHub] [tika] THausherr merged pull request #1105: Bump aws.version from 1.12.459 to 1.12.460

2023-05-02 Thread via GitHub


THausherr merged PR #1105:
URL: https://github.com/apache/tika/pull/1105


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] dependabot[bot] opened a new pull request, #1105: Bump aws.version from 1.12.459 to 1.12.460

2023-05-02 Thread via GitHub


dependabot[bot] opened a new pull request, #1105:
URL: https://github.com/apache/tika/pull/1105

   Bumps `aws.version` from 1.12.459 to 1.12.460.
   Updates `aws-java-sdk-s3` from 1.12.459 to 1.12.460
   
   Changelog
   Sourced from aws-java-sdk-s3's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).
   
   1.12.460 2023-05-01
   AWS Compute Optimizer
   
   
   Features
   
   Support for tag filtering within Compute Optimizer. Ability to filter 
recommendation results by tag and tag key-value pairs. Ability to filter by 
inferred workload type added.
   
   
   
   AWS Key Management Service
   
   
   Features
   
   This release makes the NitroEnclave request parameter Recipient and the 
response field for CiphertextForRecipient available in AWS SDKs. It also adds 
the regex pattern for CloudHsmClusterId validation.
   
   
   
   
   
   
   Commits
   
   c2c2ba9 AWS SDK for Java 1.12.460 (https://github.com/aws/aws-sdk-java/commit/c2c2ba95c0a5c502df78a5ac8dff511a57de1704)
   d01e3d7 Update GitHub version number to 1.12.460-SNAPSHOT (https://github.com/aws/aws-sdk-java/commit/d01e3d7fc874d27a71bb0bdbe5007449008fefc8)
   See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.459...1.12.460).
   
   
   
   
   Updates `aws-java-sdk-transcribe` from 1.12.459 to 1.12.460
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually.
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   

