[GitHub] [tika] dependabot[bot] opened a new pull request, #1318: Bump aws.version from 1.12.543 to 1.12.544

2023-09-05 Thread via GitHub


dependabot[bot] opened a new pull request, #1318:
URL: https://github.com/apache/tika/pull/1318

   Bumps `aws.version` from 1.12.543 to 1.12.544.
   Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.543 to 1.12.544
   
   Changelog
   Sourced from com.amazonaws:aws-java-sdk-s3's changelog:
   https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md
   
   1.12.544 2023-09-05
   AWS Cloud9
   
   
   Features
   
   Added support for Ubuntu 22.04 that was not picked up in a previous 
Trebuchet request. Doc-only update.
   
   
   
   AWS Compute Optimizer
   
   
   Features
   
   This release adds support to provide recommendations for G4dn and P3 
instances that use NVIDIA GPUs.
   
   
   
   AWSBillingConductor
   
   
   Features
   
   This release adds support for line item filtering for the custom line 
item resource.
   
   
   
   Amazon EC2 Container Service
   
   
   Features
   
   Documentation only update for Amazon ECS.
   
   
   
   Amazon Elastic Compute Cloud
   
   
   Features
   
   Introducing Amazon EC2 C7gd, M7gd, and R7gd Instances with up to 3.8 TB 
of local NVMe-based SSD block-level storage. These instances are powered by AWS 
Graviton3 processors, delivering up to 25% better performance over 
Graviton2-based instances.
   
   
   
   Amazon EventBridge
   
   
   Features
   
   Improve Endpoint Ruleset test coverage.
   
   
   
   Amazon Relational Database Service
   
   
   Features
   
   Add support for feature integration with AWS Backup.
   
   
   
   Amazon SageMaker Service
   
   
   Features
   
   SageMaker Neo now supports data input shape derivation for PyTorch 2.0 
and XGBoost compilation jobs for cloud instance targets. You can skip the 
DataInputConfig field when creating a compilation job. You can also access the 
derived information from the model in the DescribeCompilationJob response.
   
   
   
   Amazon VPC Lattice
   
   
   Features
   
   This release adds Lambda event structure version config support for 
LAMBDA target groups. It also adds newline support for auth policies.
   
   
   
   
   
   
   Commits
   
   3c263e3 AWS SDK for Java 1.12.544
   https://github.com/aws/aws-sdk-java/commit/3c263e31fafd09ab5aed2de25ad107a848baa1dd
   0bfb4d6 Update GitHub version number to 1.12.544-SNAPSHOT
   https://github.com/aws/aws-sdk-java/commit/0bfb4d6126eb01b73a44ec8e18568b45fbe10591
   See full diff in compare view:
   https://github.com/aws/aws-sdk-java/compare/1.12.543...1.12.544
   
   
   
   
   Updates `com.amazonaws:aws-java-sdk-transcribe` from 1.12.543 to 1.12.544
   
   
   
   
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 

[jira] [Commented] (TIKA-4124) embedded html of type http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk is not parsed

2023-09-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762216#comment-17762216
 ] 

Hudson commented on TIKA-4124:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1230 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1230/])
TIKA-4124 -- extract alternate format chunk from ooxml (#1317) (github: 
[https://github.com/apache/tika/commit/f6290858bae72ed1c561ce75812c577e6b736a32])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLContainerExtractionTest.java
* (edit) 
tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java


> embedded html of type 
> http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk 
> is not parsed
> ---
>
> Key: TIKA-4124
> URL: https://issues.apache.org/jira/browse/TIKA-4124
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Tim Barrett
> Priority: Minor
>
> Word documents that may have been created using third party programs such as 
> docx4j sometimes contain embedded html. This is not parsed by Tika. The 
> embedded HTML file usually resides within the main folder of the docx 
> internal structure.
> Changing the code in: 
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart()
> as follows, handles this (the final else if)
>  
> if (POIXMLDocument.OLE_OBJECT_REL_TYPE.equals(type)
>         && TYPE_OLE_OBJECT.equals(target.getContentType())) {
>     handleEmbeddedOLE(target, xhtml, sourceDesc + rel.getId(), parentMetadata);
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> } else if (RELATION_MEDIA.equals(type)
>         || RELATION_VIDEO.equals(type)
>         || RELATION_AUDIO.equals(type)
>         || PackageRelationshipTypes.IMAGE_PART.equals(type)
>         || POIXMLDocument.PACK_OBJECT_REL_TYPE.equals(type)
>         || POIXMLDocument.OLE_OBJECT_REL_TYPE.equals(type)) {
>     handleEmbeddedFile(target, xhtml, sourceDesc + rel.getId());
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> } else if (XSSFRelation.VBA_MACROS.getRelation().equals(type)) {
>     handleMacros(target, xhtml);

[GitHub] [tika] tballison merged pull request #1317: TIKA-4124

2023-09-05 Thread via GitHub


tballison merged PR #1317:
URL: https://github.com/apache/tika/pull/1317


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] tballison opened a new pull request, #1317: TIKA-4124

2023-09-05 Thread via GitHub


tballison opened a new pull request, #1317:
URL: https://github.com/apache/tika/pull/1317

   
   
   Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! 
Your help is appreciated!
   
   Before opening the pull request, please verify that
   * there is an open issue on the [Tika issue 
tracker](https://issues.apache.org/jira/projects/TIKA) which describes the 
problem or the improvement. We cannot accept pull requests without an issue 
because the change wouldn't be listed in the release notes.
   * the issue ID (`TIKA-`)
     - is referenced in the title of the pull request
     - and placed in front of your commit messages surrounded by square 
brackets (`[TIKA-] Issue or pull request title`)
   * commits are squashed into a single one (or few commits for larger changes)
   * Tika is successfully built and unit tests pass by running `mvn clean test`
   * there should be no conflicts when merging the pull request branch into the 
*recent* `main` branch. If there are conflicts, please try to rebase the pull 
request branch on top of a freshly pulled `main` branch
   * if you add a new module that downstream users will depend upon, add it to 
the relevant group in `tika-bom/pom.xml`.
   
   We will be able to integrate your pull request faster if these conditions 
are met. If you have any questions about how to fix your problem or about using 
Tika in general, please sign up for the [Tika mailing 
list](http://tika.apache.org/mail-lists.html). Thanks!
   





[jira] [Commented] (TIKA-3347) Upgrade to PDFBox 3.x when available

2023-09-05 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762117#comment-17762117
 ] 

Tim Allison commented on TIKA-3347:
---

https://github.com/apache/camel-quarkus/issues/5234

> Upgrade to PDFBox 3.x when available
> 
>
> Key: TIKA-3347
> URL: https://issues.apache.org/jira/browse/TIKA-3347
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> 3.0.0-RC1 was recently released.  We should integrate it on a dev branch asap 
> so that we can help with regression testing...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-4124) embedded html of type http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk is not parsed

2023-09-05 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762096#comment-17762096
 ] 

Tim Allison commented on TIKA-4124:
---

Not clear what the licenses are but there are some example files and a helpful 
discussion here: https://github.com/jgm/pandoc/issues/3883

Looks like the alt chunks can be rtf, html or a bunch of other file formats.  
Ideally, we'd inline the content, but it will be simpler to handle these like 
attachments as in the above example code fix.
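
The "handle these like attachments" approach can be illustrated with a minimal, dependency-free Java sketch. It is not Tika's actual code: the class `AltChunkSketch`, the enum `Handling` and the method `classify` are hypothetical names introduced only for this example. The only parts taken from the thread are the relationship type URI in the issue title and the reporter's `type.endsWith("aFChunk")` check.

```java
/** Hypothetical sketch, not Tika code: decides how a relationship type is treated. */
final class AltChunkSketch {

    /** Relationship type quoted in the TIKA-4124 issue title. */
    static final String AF_CHUNK_REL_TYPE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk";

    /** Illustrative outcomes; these names are assumptions for this sketch. */
    enum Handling { EMBEDDED_ATTACHMENT, UNHANDLED }

    /** Mirrors the reporter's proposed check: type.endsWith("aFChunk"). */
    static Handling classify(String relationshipType) {
        if (relationshipType != null && relationshipType.endsWith("aFChunk")) {
            // Treat the alternate format chunk (html, rtf, ...) like an attachment
            // instead of trying to inline its content into the main document.
            return Handling.EMBEDDED_ATTACHMENT;
        }
        return Handling.UNHANDLED;
    }

    public static void main(String[] args) {
        System.out.println(classify(AF_CHUNK_REL_TYPE));
        System.out.println(classify("http://example.invalid/relationships/other"));
    }
}
```

Matching on the suffix, as the reporter does, keeps the check short; comparing against the full relationship type URI would be the stricter alternative.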

> embedded html of type 
> http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk 
> is not parsed
> ---
>
> Key: TIKA-4124
> URL: https://issues.apache.org/jira/browse/TIKA-4124
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Tim Barrett
> Priority: Minor
>
> Word documents that may have been created using third party programs such as 
> docx4j sometimes contain embedded html. This is not parsed by Tika. The 
> embedded HTML file usually resides within the main folder of the docx 
> internal structure.
> Changing the code in: 
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart()
> as follows, handles this (the final else if)
>  
> if (POIXMLDocument.OLE_OBJECT_REL_TYPE.equals(type)
>         && TYPE_OLE_OBJECT.equals(target.getContentType())) {
>     handleEmbeddedOLE(target, xhtml, sourceDesc + rel.getId(), parentMetadata);
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> } else if (RELATION_MEDIA.equals(type)
>         || RELATION_VIDEO.equals(type)
>         || RELATION_AUDIO.equals(type)
>         || PackageRelationshipTypes.IMAGE_PART.equals(type)
>         || POIXMLDocument.PACK_OBJECT_REL_TYPE.equals(type)
>         || POIXMLDocument.OLE_OBJECT_REL_TYPE.equals(type)) {
>     handleEmbeddedFile(target, xhtml, sourceDesc + rel.getId());
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> } else if (XSSFRelation.VBA_MACROS.getRelation().equals(type)) {
>     handleMacros(target, xhtml);
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> }

[jira] [Commented] (TIKA-4124) embedded html of type http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk is not parsed

2023-09-05 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762095#comment-17762095
 ] 

Tim Allison commented on TIKA-4124:
---

Thank you for opening this issue.  Any chance you can attach an example file?

> embedded html of type 
> http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk 
> is not parsed
> ---
>
> Key: TIKA-4124
> URL: https://issues.apache.org/jira/browse/TIKA-4124
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Tim Barrett
> Priority: Minor
>
> Word documents that may have been created using third party programs such as 
> docx4j sometimes contain embedded html. This is not parsed by Tika. The 
> embedded HTML file usually resides within the main folder of the docx 
> internal structure.
> Changing the code in: 
> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart()
> as follows, handles this (the final else if)
>  
> if (POIXMLDocument.OLE_OBJECT_REL_TYPE.equals(type)
>         && TYPE_OLE_OBJECT.equals(target.getContentType())) {
>     handleEmbeddedOLE(target, xhtml, sourceDesc + rel.getId(), parentMetadata);
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> } else if (RELATION_MEDIA.equals(type)
>         || RELATION_VIDEO.equals(type)
>         || RELATION_AUDIO.equals(type)
>         || PackageRelationshipTypes.IMAGE_PART.equals(type)
>         || POIXMLDocument.PACK_OBJECT_REL_TYPE.equals(type)
>         || POIXMLDocument.OLE_OBJECT_REL_TYPE.equals(type)) {
>     handleEmbeddedFile(target, xhtml, sourceDesc + rel.getId());
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> } else if (XSSFRelation.VBA_MACROS.getRelation().equals(type)) {
>     handleMacros(target, xhtml);
>     if (targetURI != null) {
>         handledTarget.add(targetURI.toString());
>     }
> } else if (type.endsWith("aFChunk")) {

[jira] [Updated] (TIKA-4119) Return media type "text/javascript" instead of "application/javascript to follow RFC-9239

2023-09-05 Thread Nick Burch (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch updated TIKA-4119:
-
Component/s: mime

> Return media type "text/javascript" instead of "application/javascript to 
> follow RFC-9239
> -
>
> Key: TIKA-4119
> URL: https://issues.apache.org/jira/browse/TIKA-4119
> Project: Tika
> Issue Type: Improvement
> Components: mime
> Reporter: Matthias Juchmes
> Priority: Major
> Labels: tika-3x
>
> [RFC-9239|https://www.rfc-editor.org/rfc/rfc9239.html] obsoletes some 
> javascript media types, including "application/javascript", which is 
> currently returned by Tika for javascript files. "text/javascript" is defined 
> as the most widely supported one, so Tika should reflect this.
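
The requested change can be illustrated with a minimal, dependency-free Java sketch that remaps the obsolete media type to the preferred one. It does not use Tika's API: the class `JavaScriptMediaTypeSketch` and the method `normalize` are hypothetical names for this example, and the map holds only the one obsolete type named in the issue.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch, not Tika code: remaps an obsolete JavaScript media type. */
final class JavaScriptMediaTypeSketch {

    /** Media type the issue asks Tika to return. */
    static final String PREFERRED = "text/javascript";

    /** Only the obsolete type named in the issue; others are deliberately left out. */
    private static final Map<String, String> OBSOLETE_TO_PREFERRED =
            Map.of("application/javascript", PREFERRED);

    /** Returns the preferred type for a known obsolete one, else the input unchanged. */
    static String normalize(String mediaType) {
        if (mediaType == null) {
            return null;
        }
        // Media type names are case-insensitive, so compare in lower case.
        String key = mediaType.trim().toLowerCase(Locale.ROOT);
        return OBSOLETE_TO_PREFERRED.getOrDefault(key, mediaType);
    }

    public static void main(String[] args) {
        System.out.println(normalize("application/javascript"));
        System.out.println(normalize("text/plain"));
    }
}
```

A lookup table like this keeps existing callers working while the returned value changes, which fits the suggestion in the thread to make the switch in 3.x.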





[jira] [Updated] (TIKA-4119) Return media type "text/javascript" instead of "application/javascript to follow RFC-9239

2023-09-05 Thread Nick Burch (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch updated TIKA-4119:
-
Labels: tika-3x  (was: )

> Return media type "text/javascript" instead of "application/javascript to 
> follow RFC-9239
> -
>
> Key: TIKA-4119
> URL: https://issues.apache.org/jira/browse/TIKA-4119
> Project: Tika
> Issue Type: Improvement
> Reporter: Matthias Juchmes
> Priority: Major
> Labels: tika-3x
>
> [RFC-9239|https://www.rfc-editor.org/rfc/rfc9239.html] obsoletes some 
> javascript media types, including "application/javascript", which is 
> currently returned by Tika for javascript files. "text/javascript" is defined 
> as the most widely supported one, so Tika should reflect this.





[jira] [Commented] (TIKA-4119) Return media type "text/javascript" instead of "application/javascript to follow RFC-9239

2023-09-05 Thread Matthias Juchmes (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761987#comment-17761987
 ] 

Matthias Juchmes commented on TIKA-4119:


I agree that changing this for 3.x probably makes the most sense.

> Return media type "text/javascript" instead of "application/javascript to 
> follow RFC-9239
> -
>
> Key: TIKA-4119
> URL: https://issues.apache.org/jira/browse/TIKA-4119
> Project: Tika
> Issue Type: Improvement
> Reporter: Matthias Juchmes
> Priority: Major
>
> [RFC-9239|https://www.rfc-editor.org/rfc/rfc9239.html] obsoletes some 
> javascript media types, including "application/javascript", which is 
> currently returned by Tika for javascript files. "text/javascript" is defined 
> as the most widely supported one, so Tika should reflect this.


