[jira] [Commented] (TIKA-3793) General upgrades for 1.28.5

2022-07-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564822#comment-17564822
 ] 

Hudson commented on TIKA-3793:
--

SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #236 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/236/])
TIKA-3793: update maven-assembly-plugin (tilman: 
[https://github.com/apache/tika/commit/3dcdfffd31df5ad58a0676c69340e2823e20553c])
* (edit) tika-parent/pom.xml


> General upgrades for 1.28.5
> ---------------------------
>
>                 Key: TIKA-3793
>                 URL: https://issues.apache.org/jira/browse/TIKA-3793
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tilman Hausherr
>            Priority: Minor
>             Fix For: 1.28.5
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (TIKA-3814) Extracted text from HTML file does not exclude newline chars from body

2022-07-11 Thread Sai Konuri (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Konuri updated TIKA-3814:
-
Priority: Blocker  (was: Minor)

> Extracted text from HTML file does not exclude newline chars from body
> ----------------------------------------------------------------------
>
>                 Key: TIKA-3814
>                 URL: https://issues.apache.org/jira/browse/TIKA-3814
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.3.0
>            Reporter: Sai Konuri
>            Priority: Blocker
>         Attachments: bug.html, image-2022-07-06-19-08-30-437.png, image-2022-07-06-19-09-54-534.png
>
>
> When there is a newline character ('\n') within the text of an HTML element, 
> the extracted text does not exclude those newlines. 
> A sample html file is attached.
>  
> {*}Expected{*}:
> !image-2022-07-06-19-08-30-437.png!
>  
> {*}Actual{*}: 
> !image-2022-07-06-19-09-54-534.png!
>  
>  
> This is the code I am using to extract the text of the HTML file: 
> {code:java}
> AutoDetectParser parser = new AutoDetectParser();
> BodyContentHandler handler = new BodyContentHandler();
> Metadata metadata = new Metadata();
> try (InputStream stream =
>         this.getClass().getClassLoader().getResourceAsStream("bug.html")) {
>     parser.parse(stream, handler, metadata);
>     System.out.println(handler);
> }
> {code}
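
A possible client-side workaround, sketched under assumptions rather than taken from Tika: the helper below collapses every run of whitespace (including '\n') in the already extracted text into a single space, which is roughly how a browser renders newlines inside inline text. The class and method names are hypothetical, and the approach also flattens paragraph breaks, so it only fits cases where a single line of text is acceptable.

```java
public class WhitespaceSketch {

    // Hypothetical helper (not a Tika API): replaces each run of whitespace,
    // including newlines and tabs, with one space and trims both ends.
    static String collapse(String extractedText) {
        return extractedText.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // In the reporter's code the input would be handler.toString().
        String sample = "Hello\nworld,\n  this is\ta test\n";
        System.out.println(collapse(sample));
    }
}
```

The helper works on the string returned by the handler, so it needs no change to the parser setup shown in the report.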





[jira] [Updated] (TIKA-3814) Extracted text from HTML file does not exclude newline chars from body

2022-07-11 Thread Nick Burch (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch updated TIKA-3814:
-
Priority: Trivial  (was: Blocker)

> Extracted text from HTML file does not exclude newline chars from body
> ----------------------------------------------------------------------
>
>                 Key: TIKA-3814
>                 URL: https://issues.apache.org/jira/browse/TIKA-3814
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.3.0
>            Reporter: Sai Konuri
>            Priority: Trivial
>         Attachments: bug.html, image-2022-07-06-19-08-30-437.png, image-2022-07-06-19-09-54-534.png
>
>
> When there is a newline character ('\n') within the text of an HTML element, 
> the extracted text does not exclude those newlines. 
> A sample html file is attached.
>  
> {*}Expected{*}:
> !image-2022-07-06-19-08-30-437.png!
>  
> {*}Actual{*}: 
> !image-2022-07-06-19-09-54-534.png!
>  
>  
> This is the code I am using to extract the text of the HTML file: 
> {code:java}
> AutoDetectParser parser = new AutoDetectParser();
> BodyContentHandler handler = new BodyContentHandler();
> Metadata metadata = new Metadata();
> try (InputStream stream =
>         this.getClass().getClassLoader().getResourceAsStream("bug.html")) {
>     parser.parse(stream, handler, metadata);
>     System.out.println(handler);
> }
> {code}





[jira] [Created] (TIKA-3815) Inconsistent Date/Time information extracted from Exif data

2022-07-11 Thread Jira
Luís Filipe Nassif created TIKA-3815:


             Summary: Inconsistent Date/Time information extracted from Exif data
                 Key: TIKA-3815
                 URL: https://issues.apache.org/jira/browse/TIKA-3815
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 2.4.1
            Reporter: Luís Filipe Nassif
         Attachments: IMG_20220616_111848_HDR.jpg

Running tika-app-2.4.1.jar on the attached image, this metadata is returned:

Exif IFD0:Date/Time: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
Exif SubIFD:Time Zone: -03:00
Exif SubIFD:Time Zone Digitized: -03:00
Exif SubIFD:Time Zone Original: -03:00
File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
GPS:GPS Date Stamp: 2022:06:16
GPS:GPS Time-Stamp: 14:18:47.000 UTC
dcterms:created: 2022-06-16T08:18:49
dcterms:modified: 2022-06-16T08:18:49
exif:DateTimeOriginal: 2022-06-16T08:18:49

 

The right value is 2022-06-16T14:18:49Z. Although there is no timezone 
specified for some values, I think it makes no sense converting it to timezones 
different than GMT or the one used to take the picture (-03:00), so Tika could 
be making an incorrect timezone conversion on the last 3 fields.
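
The arithmetic behind the numbers above can be checked with a small sketch. It assumes the Exif value is a wall-clock time in "yyyy:MM:dd HH:mm:ss" form and uses only java.time; the class and method names are hypothetical and this is not Tika's code. Reading 11:18:49 at offset -03:00 gives the UTC instant quoted as the right value, while reading the same digits as UTC and then displaying them at -03:00 gives the 08:18:49 seen in the last three fields.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ExifOffsetCheck {

    // Assumed textual layout of the Exif date/time fields shown above.
    static final DateTimeFormatter EXIF =
            DateTimeFormatter.ofPattern("yyyy:MM:dd HH:mm:ss");

    // Interprets the Exif wall-clock value at the given offset and
    // returns the corresponding UTC instant in ISO-8601 form.
    static String toUtc(String exifValue, String offset) {
        return LocalDateTime.parse(exifValue, EXIF)
                .atOffset(ZoneOffset.of(offset))
                .toInstant()
                .toString();
    }

    // Interprets the value as UTC, then shows it as wall-clock time at
    // the given offset (the suspected double conversion).
    static String utcShownAt(String exifValue, String offset) {
        return LocalDateTime.parse(exifValue, EXIF)
                .atOffset(ZoneOffset.UTC)
                .withOffsetSameInstant(ZoneOffset.of(offset))
                .toLocalDateTime()
                .format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2022:06:16 11:18:49", "-03:00"));      // 2022-06-16T14:18:49Z
        System.out.println(utcShownAt("2022:06:16 11:18:49", "-03:00")); // 2022-06-16T08:18:49
    }
}
```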





[jira] [Updated] (TIKA-3815) Inconsistent Date/Time information extracted from Exif data

2022-07-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TIKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luís Filipe Nassif updated TIKA-3815:
-
Description: 
Running tika-app-2.4.1.jar on the attached image, these metadata are returned:

Exif IFD0:Date/Time: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
Exif SubIFD:Time Zone: -03:00
Exif SubIFD:Time Zone Digitized: -03:00
Exif SubIFD:Time Zone Original: -03:00
File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
GPS:GPS Date Stamp: 2022:06:16
GPS:GPS Time-Stamp: 14:18:47.000 UTC
dcterms:created: 2022-06-16T08:18:49
dcterms:modified: 2022-06-16T08:18:49
exif:DateTimeOriginal: 2022-06-16T08:18:49

 

The right value is 2022-06-16T14:18:49Z. Although there is no timezone 
specified for some values, I think it makes no sense converting them to 
timezones different than GMT or the one used to take the picture (-03:00), so 
Tika could be making an incorrect timezone conversion on the last 3 fields.

  was:
Running tika-app-2.4.1.jar on the attached image, this metadata is returned:

Exif IFD0:Date/Time: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
Exif SubIFD:Time Zone: -03:00
Exif SubIFD:Time Zone Digitized: -03:00
Exif SubIFD:Time Zone Original: -03:00
File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
GPS:GPS Date Stamp: 2022:06:16
GPS:GPS Time-Stamp: 14:18:47.000 UTC
dcterms:created: 2022-06-16T08:18:49
dcterms:modified: 2022-06-16T08:18:49
exif:DateTimeOriginal: 2022-06-16T08:18:49

 

The right value is 2022-06-16T14:18:49Z. Although there is no timezone 
specified for some values, I think it makes no sense converting it to timezones 
different than GMT or the one used to take the picture (-03:00), so Tika could 
be making an incorrect timezone conversion on the last 3 fields.


> Inconsistent Date/Time information extracted from Exif data
> -----------------------------------------------------------
>
>                 Key: TIKA-3815
>                 URL: https://issues.apache.org/jira/browse/TIKA-3815
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.1
>            Reporter: Luís Filipe Nassif
>            Priority: Major
>         Attachments: IMG_20220616_111848_HDR.jpg
>
>
> Running tika-app-2.4.1.jar on the attached image, these metadata are returned:
> Exif IFD0:Date/Time: 2022:06:16 11:18:49
> Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
> Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
> Exif SubIFD:Time Zone: -03:00
> Exif SubIFD:Time Zone Digitized: -03:00
> Exif SubIFD:Time Zone Original: -03:00
> File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
> GPS:GPS Date Stamp: 2022:06:16
> GPS:GPS Time-Stamp: 14:18:47.000 UTC
> dcterms:created: 2022-06-16T08:18:49
> dcterms:modified: 2022-06-16T08:18:49
> exif:DateTimeOriginal: 2022-06-16T08:18:49
>  
> The right value is 2022-06-16T14:18:49Z. Although there is no timezone 
> specified for some values, I think it makes no sense converting them to 
> timezones different than GMT or the one used to take the picture (-03:00), so 
> Tika could be making an incorrect timezone conversion on the last 3 fields.





[jira] [Updated] (TIKA-3815) Inconsistent Date/Time information extracted from Exif data

2022-07-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/TIKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luís Filipe Nassif updated TIKA-3815:
-
Description: 
Running tika-app-2.4.1.jar on the attached image, these metadata are returned:

Exif IFD0:Date/Time: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
Exif SubIFD:Time Zone: -03:00
Exif SubIFD:Time Zone Digitized: -03:00
Exif SubIFD:Time Zone Original: -03:00
File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
GPS:GPS Date Stamp: 2022:06:16
GPS:GPS Time-Stamp: 14:18:47.000 UTC
dcterms:created: 2022-06-16T08:18:49
dcterms:modified: 2022-06-16T08:18:49
exif:DateTimeOriginal: 2022-06-16T08:18:49

 

The right value is 2022-06-16T14:18:49Z. Although there is no timezone 
specified for some values, I think it makes no sense converting them to 
timezones different than GMT, the one used to take the picture (-03:00) or the 
one used to run the application (-03:00), so Tika could be making an incorrect 
timezone conversion on the last 3 fields.

  was:
Running tika-app-2.4.1.jar on the attached image, these metadata are returned:

Exif IFD0:Date/Time: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
Exif SubIFD:Time Zone: -03:00
Exif SubIFD:Time Zone Digitized: -03:00
Exif SubIFD:Time Zone Original: -03:00
File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
GPS:GPS Date Stamp: 2022:06:16
GPS:GPS Time-Stamp: 14:18:47.000 UTC
dcterms:created: 2022-06-16T08:18:49
dcterms:modified: 2022-06-16T08:18:49
exif:DateTimeOriginal: 2022-06-16T08:18:49

 

The right value is 2022-06-16T14:18:49Z. Although there is no timezone 
specified for some values, I think it makes no sense converting them to 
timezones different than GMT or the one used to take the picture (-03:00), so 
Tika could be making an incorrect timezone conversion on the last 3 fields.


> Inconsistent Date/Time information extracted from Exif data
> -----------------------------------------------------------
>
>                 Key: TIKA-3815
>                 URL: https://issues.apache.org/jira/browse/TIKA-3815
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.1
>            Reporter: Luís Filipe Nassif
>            Priority: Major
>         Attachments: IMG_20220616_111848_HDR.jpg
>
>
> Running tika-app-2.4.1.jar on the attached image, these metadata are returned:
> Exif IFD0:Date/Time: 2022:06:16 11:18:49
> Exif SubIFD:Date/Time Digitized: 2022:06:16 11:18:49
> Exif SubIFD:Date/Time Original: 2022:06:16 11:18:49
> Exif SubIFD:Time Zone: -03:00
> Exif SubIFD:Time Zone Digitized: -03:00
> Exif SubIFD:Time Zone Original: -03:00
> File Modified Date: Thu Jun 16 11:18:50 -03:00 2022
> GPS:GPS Date Stamp: 2022:06:16
> GPS:GPS Time-Stamp: 14:18:47.000 UTC
> dcterms:created: 2022-06-16T08:18:49
> dcterms:modified: 2022-06-16T08:18:49
> exif:DateTimeOriginal: 2022-06-16T08:18:49
>  
> The right value is 2022-06-16T14:18:49Z. Although there is no timezone 
> specified for some values, I think it makes no sense converting them to 
> timezones different than GMT, the one used to take the picture (-03:00) or 
> the one used to run the application (-03:00), so Tika could be making an 
> incorrect timezone conversion on the last 3 fields.





[jira] [Commented] (TIKA-3815) Inconsistent Date/Time information extracted from Exif data

2022-07-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/TIKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565187#comment-17565187
 ] 

Luís Filipe Nassif commented on TIKA-3815:
--

Does anyone know whether the EXIF specification defines a default timezone for 
those dates when none is specified? Even if it does not, maybe we could check 
whether "Exif SubIFD:Time Zone*" fields are present and whether the date values 
differ by exactly those timezone offsets...



[jira] [Commented] (TIKA-3815) Inconsistent Date/Time information extracted from Exif data

2022-07-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/TIKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565200#comment-17565200
 ] 

Luís Filipe Nassif commented on TIKA-3815:
--

OK, we should at least change this SimpleDateFormat:

[https://github.com/apache/tika/blob/2.4.1/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-image-module/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java#L381]

so that its timezone is set explicitly to GMT. Otherwise it uses the 
default/local timezone when formatting the Date (which is UTC-based), even 
though the output pattern prints no timezone information. Without this, running 
Tika in different timezones could return different date values.

 

I'll submit a fix for that if there are no objections.
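
To illustrate the effect described above, here is a minimal sketch. It is not the code at the linked line: the pattern string, class name and method name are assumptions chosen for illustration. It formats the same Date once with the formatter's timezone set to GMT and once with it set to GMT-03:00, showing that the printed value changes although the pattern prints no timezone.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class GmtFormatSketch {

    // Formats a Date with an explicitly chosen timezone. The pattern is an
    // assumption for illustration; it prints no timezone designator.
    static String format(Date date, TimeZone zone) {
        SimpleDateFormat df =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        // Without this call the JVM default timezone would be used.
        df.setTimeZone(zone);
        return df.format(date);
    }

    public static void main(String[] args) {
        Date taken = Date.from(Instant.parse("2022-06-16T14:18:49Z"));
        System.out.println(format(taken, TimeZone.getTimeZone("GMT")));       // 2022-06-16T14:18:49
        System.out.println(format(taken, TimeZone.getTimeZone("GMT-03:00"))); // 2022-06-16T11:18:49
    }
}
```

Pinning the formatter to GMT makes the output independent of the machine's `user.timezone` setting.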



[GitHub] [tika] lfcnassif opened a new pull request, #605: TIKA-3815: set GMT timezone for unspecified timezones like drew noakes,

2022-07-11 Thread GitBox


lfcnassif opened a new pull request, #605:
URL: https://github.com/apache/tika/pull/605

   Fixes TIKA-3815.
   
   Actually, Drew Noakes' metadata-extractor library has used the **GMT** 
timezone when no timezone is specified since version 2.8.1, not the JVM default:
   [fix tests to work on different timezones](https://github.com/drewnoakes/metadata-extractor/commit/1899deac8af55934363183dae605e286bc79afa3)
   
   So I changed the code to reflect that and updated the tests. Now 
`user.timezone` no longer needs to be fixed to UTC to run the tests; they 
should work in any timezone.
   
   I would like another committer to test this, to be sure the tests work in a 
timezone different from mine (GMT-3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (TIKA-3815) Inconsistent Date/Time information extracted from Exif data

2022-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565229#comment-17565229
 ] 

ASF GitHub Bot commented on TIKA-3815:
--

lfcnassif opened a new pull request, #605:
URL: https://github.com/apache/tika/pull/605

   Fixes TIKA-3815.
   
   Actually, Drew Noakes' metadata-extractor library has used the **GMT** 
timezone when no timezone is specified since version 2.8.1, not the JVM default:
   [fix tests to work on different timezones](https://github.com/drewnoakes/metadata-extractor/commit/1899deac8af55934363183dae605e286bc79afa3)
   
   So I changed the code to reflect that and updated the tests. Now 
`user.timezone` no longer needs to be fixed to UTC to run the tests; they 
should work in any timezone.
   
   I would like another committer to test this, to be sure the tests work in a 
timezone different from mine (GMT-3).






[GitHub] [tika] lfcnassif commented on pull request #605: TIKA-3815: set GMT timezone for unspecified timezones like drew noakes,

2022-07-11 Thread GitBox


lfcnassif commented on PR #605:
URL: https://github.com/apache/tika/pull/605#issuecomment-1181250211

   Should this also be applied to the 1.x branch, or are we only fixing security 
issues there?





[jira] [Commented] (TIKA-3815) Inconsistent Date/Time information extracted from Exif data

2022-07-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565231#comment-17565231
 ] 

ASF GitHub Bot commented on TIKA-3815:
--

lfcnassif commented on PR #605:
URL: https://github.com/apache/tika/pull/605#issuecomment-1181250211

   Should this also be applied to the 1.x branch, or are we only fixing security 
issues there?






[jira] [Commented] (TIKA-3795) General upgrades for 2.4.2

2022-07-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565262#comment-17565262
 ] 

Hudson commented on TIKA-3795:
--

FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #676 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/676/])
TIKA-3795: update aws (tilman: 
[https://github.com/apache/tika/commit/4a1628da24236fcf771e47a8986cb2d2bcc70cda])
* (edit) tika-parent/pom.xml


> General upgrades for 2.4.2
> --------------------------
>
>                 Key: TIKA-3795
>                 URL: https://issues.apache.org/jira/browse/TIKA-3795
>             Project: Tika
>          Issue Type: Improvement
>          Components: build
>            Reporter: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.4.2
>
>





[jira] [Updated] (TIKA-3795) General upgrades for 2.4.2

2022-07-11 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated TIKA-3795:
--
Attachment: image-2022-07-12-06-46-00-215.png

> General upgrades for 2.4.2
> --------------------------
>
>                 Key: TIKA-3795
>                 URL: https://issues.apache.org/jira/browse/TIKA-3795
>             Project: Tika
>          Issue Type: Improvement
>          Components: build
>            Reporter: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.4.2
>
>         Attachments: image-2022-07-12-06-46-00-215.png
>
>





[jira] [Commented] (TIKA-3795) General upgrades for 2.4.2

2022-07-11 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565266#comment-17565266
 ] 

Tilman Hausherr commented on TIKA-3795:
---

!image-2022-07-12-06-46-00-215.png!



[jira] [Commented] (TIKA-3793) General upgrades for 1.28.5

2022-07-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565275#comment-17565275
 ] 

Hudson commented on TIKA-3793:
--

SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #237 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/237/])
TIKA-3793: update cxf.micrometer (tilman: 
[https://github.com/apache/tika/commit/0a6da81213eb2a52bf75e23138a64c53ce4b41f8])
* (edit) tika-server/pom.xml



[GitHub] [tika] dependabot[bot] opened a new pull request, #606: Bump netty.version from 4.1.78.Final to 4.1.79.Final

2022-07-11 Thread GitBox


dependabot[bot] opened a new pull request, #606:
URL: https://github.com/apache/tika/pull/606

   Bumps `netty.version` from 4.1.78.Final to 4.1.79.Final.
   Updates `netty-common` from 4.1.78.Final to 4.1.79.Final
   
   Commits
   
   - [aa59245](https://github.com/netty/netty/commit/aa59245955f5121c4230d46c41a4b7f55e0c9ba7) [maven-release-plugin] prepare release netty-4.1.79.Final
   - [9728d62](https://github.com/netty/netty/commit/9728d62b406df45513ee693dcfdf727957bc8020) Add support for LoongArch64 architecture ([#12580](https://github-redirect.dependabot.com/netty/netty/issues/12580))
   - [1befae8](https://github.com/netty/netty/commit/1befae89aa1bc4605132b06618afc6a43efa3fa4) Only enable test if brotli is there ([#12592](https://github-redirect.dependabot.com/netty/netty/issues/12592))
   - [15812da](https://github.com/netty/netty/commit/15812da820fd3aef3df687953bc54ee1c6eb894b) Offload multicast operations to the EventLoop if not called from within an Ev...
   - [2f7234b](https://github.com/netty/netty/commit/2f7234b7b8771cf8f6eaaf78c3f9932fbfe546bf) Fix isOriginForm and isAsteriskForm ([#12568](https://github-redirect.dependabot.com/netty/netty/issues/12568))
   - [c949fa6](https://github.com/netty/netty/commit/c949fa6a95725605a7e31be7abecbeb63262044a) Keep completed flag for retained/duplicated HttpData ([#12576](https://github-redirect.dependabot.com/netty/netty/issues/12576))
   - [29b203f](https://github.com/netty/netty/commit/29b203f75ad74146d78aa703e94be797ad1a48f2) Fix typo in KQueueChannelConfig ([#12537](https://github-redirect.dependabot.com/netty/netty/issues/12537))
   - [985971e](https://github.com/netty/netty/commit/985971e41f2b80cb6d56c99aa585068daa50716d) Update to new codeql-action ([#12551](https://github-redirect.dependabot.com/netty/netty/issues/12551))
   - [7d540fc](https://github.com/netty/netty/commit/7d540fc39ce21c305efef7c182350f611ef51933) Add Better Path Handling ([#12533](https://github-redirect.dependabot.com/netty/netty/issues/12533))
   - [f4edab3](https://github.com/netty/netty/commit/f4edab3af3492fa5c512e014ff63f260d817f5e5) Replace ctx.channel().writeAndFlush with ctx.writeAndFlush in WebSockets hand...
   - Additional commits viewable in [compare view](https://github.com/netty/netty/compare/netty-4.1.78.Final...netty-4.1.79.Final)
   
   
   
   
   Updates `netty-handler` from 4.1.78.Final to 4.1.79.Final
   
   Updates `netty-transport-native-unix-common` from 4.1.78.Final to 
4.1.79.Final
   

[jira] [Commented] (TIKA-3795) General upgrades for 2.4.2

2022-07-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565285#comment-17565285
 ] 

Hudson commented on TIKA-3795:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #677 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/677/])
TIKA-3795: add jetty-client to avoid older version (tilman: 
[https://github.com/apache/tika/commit/7cc5872a1d137a334dc04cd6bbe08a52e904ec4f])
* (edit) tika-parent/pom.xml




[GitHub] [tika] THausherr merged pull request #606: Bump netty.version from 4.1.78.Final to 4.1.79.Final

2022-07-11 Thread GitBox


THausherr merged PR #606:
URL: https://github.com/apache/tika/pull/606

