[GitHub] [tika] THausherr merged pull request #1366: Bump org.xerial.snappy:snappy-java from 1.1.10.4 to 1.1.10.5

2023-09-27 Thread via GitHub


THausherr merged PR #1366:
URL: https://github.com/apache/tika/pull/1366


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] THausherr merged pull request #1367: Bump aws.version from 1.12.558 to 1.12.559

2023-09-27 Thread via GitHub


THausherr merged PR #1367:
URL: https://github.com/apache/tika/pull/1367





[GitHub] [tika] THausherr merged pull request #1365: Bump org.netpreserve:jwarc from 0.28.2 to 0.28.3

2023-09-27 Thread via GitHub


THausherr merged PR #1365:
URL: https://github.com/apache/tika/pull/1365





[GitHub] [tika] dependabot[bot] opened a new pull request, #1367: Bump aws.version from 1.12.558 to 1.12.559

2023-09-27 Thread via GitHub


dependabot[bot] opened a new pull request, #1367:
URL: https://github.com/apache/tika/pull/1367

   Bumps `aws.version` from 1.12.558 to 1.12.559.
   Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.558 to 1.12.559
   
   Changelog
   Sourced from [com.amazonaws:aws-java-sdk-s3's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).
   
   1.12.559 2023-09-27
   
   AWS IoT
   Features
   - Added support for IoT Rules Engine Kafka Action Headers
   
   Amazon Cognito Identity Provider
   Features
   - The UserPoolType Status field is no longer used.
   
   Amazon Kinesis Firehose
   Features
   - Features : Adding support for new data ingestion source to Kinesis Firehose - AWS Managed Services Kafka.
   
   Amazon Textract
   Features
   - This release adds new feature - Layout to Analyze Document API which can automatically extract layout elements such as titles, paragraphs, headers, section headers, lists, page numbers, footers, table areas, key-value areas and figure areas and order the elements as a human would read.
   
   Commits
   - [9566425](https://github.com/aws/aws-sdk-java/commit/95664258365b60115188fbedc51aa266034a54fe) AWS SDK for Java 1.12.559
   - [f1f8370](https://github.com/aws/aws-sdk-java/commit/f1f837094a73e96203f15bd24ac95274a4f58792) Update GitHub version number to 1.12.559-SNAPSHOT
   - See full diff in [compare view](https://github.com/aws/aws-sdk-java/compare/1.12.558...1.12.559)
   
   Updates `com.amazonaws:aws-java-sdk-transcribe` from 1.12.558 to 1.12.559
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
   
   ---
   
   Dependabot commands and options
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
   





[GitHub] [tika] dependabot[bot] opened a new pull request, #1366: Bump org.xerial.snappy:snappy-java from 1.1.10.4 to 1.1.10.5

2023-09-27 Thread via GitHub


dependabot[bot] opened a new pull request, #1366:
URL: https://github.com/apache/tika/pull/1366

   Bumps [org.xerial.snappy:snappy-java](https://github.com/xerial/snappy-java) 
from 1.1.10.4 to 1.1.10.5.
   
   Release notes
   Sourced from [org.xerial.snappy:snappy-java's releases](https://github.com/xerial/snappy-java/releases).
   
   v1.1.10.5
   
   What's Changed
   
   Features
   - Feature: Add Windows arm64 (e.g., Surface Pro X, Surface Pro 9 with 5G, etc.) support by [@imsudiproy](https://github.com/imsudiproy) in [xerial/snappy-java#511](https://redirect.github.com/xerial/snappy-java/pull/511)
   - Linux ppc64-le: Use an LTS-version of cross-compiler to support GLIBC_2.28 by [@xerial](https://github.com/xerial) in [xerial/snappy-java#516](https://redirect.github.com/xerial/snappy-java/pull/516)
   
   Bug Fixes
   - Fix GLIBC_2.32 not found error in ppc64le on an older version of Linux (e.g., RedHat8.6) [#512](https://redirect.github.com/xerial/snappy-java/issues/512) by [@vineshcpaul](https://github.com/vineshcpaul) in [xerial/snappy-java#515](https://redirect.github.com/xerial/snappy-java/pull/515)
   - internal fix: Use Windows-aarch64 target name by [@xerial](https://github.com/xerial) in [xerial/snappy-java#518](https://redirect.github.com/xerial/snappy-java/pull/518)
   - win-aarch64 (fix): Fix dll name by [@xerial](https://github.com/xerial) in [xerial/snappy-java#520](https://redirect.github.com/xerial/snappy-java/pull/520)
   
   Dependency Updates
   - Bump jwlawson/actions-setup-cmake from 1.13 to 1.14 by [@dependabot](https://github.com/dependabot) in [xerial/snappy-java#514](https://redirect.github.com/xerial/snappy-java/pull/514)
   - Update native libraries by [@github-actions](https://github.com/github-actions) in [xerial/snappy-java#519](https://redirect.github.com/xerial/snappy-java/pull/519)
   - Update native libraries by [@github-actions](https://github.com/github-actions) in [xerial/snappy-java#521](https://redirect.github.com/xerial/snappy-java/pull/521)
   - internal: Support JDK21 in CI by [@xerial](https://github.com/xerial) in [xerial/snappy-java#510](https://redirect.github.com/xerial/snappy-java/pull/510)
   
   New Contributors
   - [@vineshcpaul](https://github.com/vineshcpaul) made their first contribution in [xerial/snappy-java#515](https://redirect.github.com/xerial/snappy-java/pull/515)
   
   Full Changelog: https://github.com/xerial/snappy-java/compare/v1.1.10.4...v1.1.10.5
   
   
   
   Commits
   - [08abfa4](https://github.com/xerial/snappy-java/commit/08abfa4f85b3a39c5fe8fa2f43b482440bedb5a3) Update native libraries for 4b2c1e89a42bc1fc715199974140f93cefe37d71 ([#521](https://redirect.github.com/xerial/snappy-java/issues/521))
   - [4b2c1e8](https://github.com/xerial/snappy-java/commit/4b2c1e89a42bc1fc715199974140f93cefe37d71) win-aarch64 (fix): Fix dll name ([#520](https://redirect.github.com/xerial/snappy-java/issues/520))
   - [0fff1ac](https://github.com/xerial/snappy-java/commit/0fff1ac8f59dea7d7906b85f7f3536eebcc03388) Update native libraries for e6d1196bc68dd76d19e915ee0124c4d42b020ef2 ([#519](https://redirect.github.com/xerial/snappy-java/issues/519))
   - [e6d1196](https://github.com/xerial/snappy-java/commit/e6d1196bc68dd76d19e915ee0124c4d42b020ef2) internal fix: Use Windows-aarch64 target name and compiler options ([#518](https://redirect.github.com/xerial/snappy-java/issues/518))
   - [3c67a7b](https://github.com/xerial/snappy-java/commit/3c67a7b51cc78e7c2f3b050541f37edfa450eb23) ppc64-le (Fix): Use an LTS-version of cross-compiler for Linux ppc64-le ([#516](https://redirect.github.com/xerial/snappy-java/issues/516))
   - [67f5d26](https://github.com/xerial/snappy-java/commit/67f5d2698170a7c75cda62c91106dc8201e8d75a) Bump jwlawson/actions-setup-cmake from 1.13 to 1.14 ([#514](https://redirect.github.com/xerial/snappy-java/issues/514))
   - [ee96b64](https://github.com/xerial/snappy-java/commit/ee96b64c7d878fceeb272eb5e6fcac6c0aea4088) Feature: Add Windows arm64 support ([#511](https://redirect.github.com/xerial/snappy-java/issues/511))
   - [0016fed](https://github.com/xerial/snappy-java/commit/0016fed5b455a14ec15da6d40d8422cd985bc843) Fix GLIBC_2.32 not found error on IBM PowerPC LE RedHat 8.6 OS (required by /...
   - [681b2e1](https://github.com/xerial/snappy-java/commit/681b2e1b96f8b3ada5f46162aaea32cc0d41472d) internal: Support JDK21 ([#510](https://redirect.github.com/xerial/snappy-java/issues/510))
   - See full diff in [compare view](https://github.com/xerial/snappy-java/compare/v1.1.10.4...v1.1.10.5)
   
   
   
   
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can 

[GitHub] [tika] dependabot[bot] opened a new pull request, #1365: Bump org.netpreserve:jwarc from 0.28.2 to 0.28.3

2023-09-27 Thread via GitHub


dependabot[bot] opened a new pull request, #1365:
URL: https://github.com/apache/tika/pull/1365

   Bumps [org.netpreserve:jwarc](https://github.com/iipc/jwarc) from 0.28.2 to 
0.28.3.
   
   Release notes
   Sourced from [org.netpreserve:jwarc's releases](https://github.com/iipc/jwarc/releases).
   
   v0.28.3
   Release 0.28.3
   Bugs fixed:
   - Fixed multithreading issue on GzipChannel write header [#79](https://redirect.github.com/iipc/jwarc/issues/79)
   
   Commits
   - [0c2503f](https://github.com/iipc/jwarc/commit/0c2503fc586a235bb118a98176cf385c80c1d2b3) Release 0.28.3
   - [68575d4](https://github.com/iipc/jwarc/commit/68575d42cf0c589b78923055864ab73640b5e78e) Fix multithreading issue on GzipChannel write header
   - See full diff in [compare view](https://github.com/iipc/jwarc/compare/v0.28.2...v0.28.3)
   
   
   
   
   





[jira] [Commented] (TIKA-4139) Tika modules are not JPMS friendly

2023-09-27 Thread Maxim Solodovnik (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769876#comment-17769876 ]

Maxim Solodovnik commented on TIKA-4139:


{quote}Any recs for detecting modules missing these and/or failing the build 
for missing module names?{quote}
I'm not aware of any :(

Maybe I can write some bash script ... :)
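For the question above about detecting jars that lack a module name, a minimal Java sketch (as an alternative to a bash script) could check the two places a module name can be declared: a `module-info.class` (including the versioned copy inside a multi-release jar) or an `Automatic-Module-Name` manifest entry. It relies only on JDK APIs (`java.util.jar.JarFile` and `JarFile.runtimeVersion()`); the class and method names are hypothetical, and wiring it into the Maven build is left open.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.jar.Manifest;
import java.util.zip.ZipFile;

/** Hypothetical helper: reports jars that declare no explicit module name. */
public class ModuleNameCheck {

    /**
     * Returns how the jar declares its module name, or null if it declares none
     * (in which case the JDK derives an automatic name from the file name).
     */
    static String declaredModuleName(File jarFile) throws IOException {
        // Opening with the runtime version makes getEntry() resolve
        // META-INF/versions/N/module-info.class in multi-release jars.
        try (JarFile jar = new JarFile(jarFile, false, ZipFile.OPEN_READ, JarFile.runtimeVersion())) {
            if (jar.getEntry("module-info.class") != null) {
                return "module-info.class";
            }
            Manifest mf = jar.getManifest();
            if (mf != null) {
                String name = mf.getMainAttributes().getValue("Automatic-Module-Name");
                if (name != null) {
                    return "Automatic-Module-Name: " + name;
                }
            }
            return null;
        }
    }

    /** Returns the paths of all given jars that declare no module name. */
    static List<String> findMissing(List<File> jars) throws IOException {
        List<String> missing = new ArrayList<>();
        for (File jar : jars) {
            if (declaredModuleName(jar) == null) {
                missing.add(jar.getPath());
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        List<File> jars = new ArrayList<>();
        for (String arg : args) {
            jars.add(new File(arg));
        }
        List<String> missing = findMissing(jars);
        missing.forEach(p -> System.out.println("no module name: " + p));
        // A build script could treat a non-zero count as a failure.
        System.out.println(missing.size() + " jar(s) without a module name");
    }
}
```

Run against the module jars produced by the build (for example everything under `target/`), it prints each jar that would fall back to a file-name-derived automatic module name.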

I'm afraid I found one more issue with JPMS:

I have {{src/main/resources/org/apache/tika/mime/custom-mimetypes.xml}} (as recommended here: https://tika.apache.org/2.6.0/parser_guide.html).

But this immediately introduces the {{org.apache.tika.mime}} package into our jar, which conflicts with the same package in Tika, since JPMS does not allow one package to be split across two modules :(((

Maybe it is worth creating an alternative way to implement this?

For example:
* Something like MimeTypesFactory.CUSTOM_MIMES_SYS_PROP, but pointing to a resource on the classpath?
* An extendable static Map in MimeTypesFactory?
* Some sort of Service Locator?
* Something better than the above? :)))
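The "Service Locator" option in the list above could take roughly this shape with the JDK's `java.util.ServiceLoader`. This is a hypothetical sketch, not an existing Tika API: `CustomMimeTypesProvider` and `openMimeTypesXml()` are invented names. The idea is that the XML could then live in a package the application owns, so no `org.apache.tika.mime` split package is created. In a named module, the application would declare `provides ... with ...`, and the module calling `ServiceLoader.load` would need a matching `uses` clause.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class CustomMimesLookup {

    /** Hypothetical SPI that an application would implement to supply extra MIME types. */
    public interface CustomMimeTypesProvider {
        /** Opens the provider's custom MIME types XML, e.g. from a package it owns. */
        InputStream openMimeTypesXml();
    }

    /** Opens the XML of every registered provider; empty when none is registered. */
    static List<InputStream> findCustomMimeTypes() {
        List<InputStream> streams = new ArrayList<>();
        for (CustomMimeTypesProvider provider : ServiceLoader.load(CustomMimeTypesProvider.class)) {
            InputStream in = provider.openMimeTypesXml();
            if (in != null) {
                streams.add(in);
            }
        }
        return streams;
    }

    public static void main(String[] args) {
        System.out.println("custom MIME type sources: " + findCustomMimeTypes().size());
    }
}
```

On the classpath, a provider would instead be registered in a `META-INF/services/` file named after the service interface's binary name.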

Shall I create a new JIRA?


> Tika modules are not JPMS friendly
> --
>
> Key: TIKA-4139
> URL: https://issues.apache.org/jira/browse/TIKA-4139
> Project: Tika
>  Issue Type: Improvement
>Affects Versions: 2.9.0
>Reporter: Maxim Solodovnik
>Priority: Major
> Fix For: 3.0.0-BETA
>
>
> Hello,
> Tika-3 has some major changes, let's add some more :)
> Recently I got the following warning while trying to use Tika in a JPMS web 
> application:
> {code}
> [INFO] --- compiler:3.11.0:compile (default-compile) @ openmeetings-util ---
> [WARNING] Can't extract module name from tika-parsers-standard-package-2.9.0.jar: tika.parsers.standard.package: Invalid module name: 'package' is not a Java identifier
> {code}
> I've checked the {{main}} branch and found no {{module-info.java}}, and 
> {{Automatic-Module-Name}} isn't set either.
> Maybe {{Automatic-Module-Name}} can be added to Tika modules?
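The warning in the log above comes from how the JDK derives an automatic module name from a jar's file name when no `Automatic-Module-Name` is set. The sketch below reimplements that derivation as described in the `java.lang.module.ModuleFinder.of` javadoc (simplified: it does not validate the version string) and checks the result with `javax.lang.model.SourceVersion.isName`, which rejects keywords such as `package`. The class name is hypothetical, and `SourceVersion` lives in the `java.compiler` module, so a full JDK is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.lang.model.SourceVersion;

/** Hypothetical helper mirroring the automatic module name rules from the ModuleFinder.of javadoc. */
public class AutoModuleName {

    // A hyphen followed by digits and then a dot or the end of the name marks the version.
    private static final Pattern VERSION = Pattern.compile("-(\\d+(\\.|$))");

    /** Derives the automatic module name for a jar file name (the version string is not validated). */
    static String derive(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - ".jar".length())
                : jarFileName;
        Matcher m = VERSION.matcher(name);
        if (m.find()) {
            name = name.substring(0, m.start()); // keep only the part before the version
        }
        return name.replaceAll("[^A-Za-z0-9]", ".") // non-alphanumerics become dots
                .replaceAll("\\.{2,}", ".")          // collapse repeated dots
                .replaceAll("^\\.|\\.$", "");        // trim leading and trailing dots
    }

    /** True if every dot-separated part is a Java identifier and not a keyword. */
    static boolean isValid(String moduleName) {
        return SourceVersion.isName(moduleName);
    }

    public static void main(String[] args) {
        String module = derive("tika-parsers-standard-package-2.9.0.jar");
        // prints "tika.parsers.standard.package valid=false" because "package" is a keyword
        System.out.println(module + " valid=" + isValid(module));
    }
}
```

By contrast, a jar such as `tika-core-2.9.0.jar` derives to `tika.core`, which is valid; setting an explicit `Automatic-Module-Name` bypasses the file-name derivation entirely.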



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-4144) Remove exclusions from thredds/ucar based on old licensing

2023-09-27 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769740#comment-17769740 ]

ASF GitHub Bot commented on TIKA-4144:
--

tballison opened a new pull request, #1364:
URL: https://github.com/apache/tika/pull/1364

   
   
   Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! Your help is appreciated!
   
   Before opening the pull request, please verify that
   * there is an open issue on the [Tika issue tracker](https://issues.apache.org/jira/projects/TIKA) which describes the problem or the improvement. We cannot accept pull requests without an issue because the change wouldn't be listed in the release notes.
   * the issue ID (`TIKA-`)
     - is referenced in the title of the pull request
     - and placed in front of your commit messages surrounded by square brackets (`[TIKA-] Issue or pull request title`)
   * commits are squashed into a single one (or a few commits for larger changes)
   * Tika is successfully built and unit tests pass by running `mvn clean test`
   * there should be no conflicts when merging the pull request branch into the *recent* `main` branch. If there are conflicts, please try to rebase the pull request branch on top of a freshly pulled `main` branch
   * if you add a new module that downstream users will depend upon, add it to the relevant group in `tika-bom/pom.xml`.
   
   We will be able to integrate your pull request faster if these conditions are met. If you have any questions about how to fix your problem or about using Tika in general, please sign up for the [Tika mailing list](http://tika.apache.org/mail-lists.html). Thanks!
   






[GitHub] [tika] tballison opened a new pull request, #1364: TIKA-4144

2023-09-27 Thread via GitHub


tballison opened a new pull request, #1364:
URL: https://github.com/apache/tika/pull/1364

   
   



[jira] [Created] (TIKA-4144) Remove exclusions from thredds/ucar based on old licensing

2023-09-27 Thread Tim Allison (Jira)
Tim Allison created TIKA-4144:
-

 Summary: Remove exclusions from thredds/ucar based on old licensing
 Key: TIKA-4144
 URL: https://issues.apache.org/jira/browse/TIKA-4144
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


On TIKA-763, Jukka noticed that some of the source files for netcdf had 
incompatible licenses, and we've been excluding those class files since then.

I recently looked at the version of netcdf that we're currently using and those 
source files have been re-licensed to the general thredds/netcdf license, which 
is available here: https://github.com/Unidata/thredds/blob/v4.5.5/LICENSE.txt

See also: https://issues.apache.org/jira/browse/TIKA-766

Ideally, if we could get netcdf to publish to Maven Central, we could pick up 
[5.x which changed the license to 
BSD-3|https://www.unidata.ucar.edu/blogs/developer/entry/thredds-licence-change].

Short of that, we can at least remove the class exclusions now.





[jira] [Commented] (TIKA-4143) Consider adding alternative to fat jar artifacts

2023-09-27 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769681#comment-17769681 ]

ASF GitHub Bot commented on TIKA-4143:
--

tballison opened a new pull request, #1363:
URL: https://github.com/apache/tika/pull/1363

   
   


[GitHub] [tika] tballison opened a new pull request, #1363: TIKA-4143 -- add optional thin jar distributions

2023-09-27 Thread via GitHub


tballison opened a new pull request, #1363:
URL: https://github.com/apache/tika/pull/1363

   
   



[jira] [Commented] (TIKA-4143) Consider adding alternative to fat jar artifacts

2023-09-27 Thread Maxim Solodovnik (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769674#comment-17769674 ]

Maxim Solodovnik commented on TIKA-4143:


Maybe it can be a BOM? :)



[jira] [Created] (TIKA-4143) Consider adding alternative to fat jar artifacts

2023-09-27 Thread Tim Allison (Jira)
Tim Allison created TIKA-4143:
-

 Summary: Consider adding alternative to fat jar artifacts
 Key: TIKA-4143
 URL: https://issues.apache.org/jira/browse/TIKA-4143
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


With JPMS, it feels like shading is not a great option.

I'm not proposing getting rid of fat jars in 3.x. I'm only proposing offering thin-jar options in addition to our usual shaded fat jars.

I'm opening this ticket to discuss options for packaging tika-app, tika-server 
and possibly other components in non-fat jars.

For app and server, we could put dependencies in the lib/ directory next to the 
main jar and add "lib" to the classpath of the main jar?  Then zip the main jar 
and lib directory for distribution?
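One way to read the lib/ proposal above is a plain JAR manifest whose `Class-Path` attribute lists each dependency relative to the main jar. Here is a minimal sketch using `java.util.jar.Manifest`; the class name, the main class `org.apache.tika.cli.TikaCLI`, and the jar names are assumptions for illustration. `Class-Path` does not accept wildcards, so every jar in lib/ has to be listed by name; in a Maven build, the maven-jar-plugin's `addClasspath`/`classpathPrefix` manifest options can generate that list.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper: builds the manifest for a thin main jar with dependencies in lib/. */
public class ThinJarManifest {

    static Manifest build(String mainClass, List<String> libJars) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest.write requires Manifest-Version to be set in the main attributes.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        // Class-Path is a space-separated list of URLs relative to the main jar;
        // wildcards are not supported, so every dependency is listed by name.
        StringBuilder classPath = new StringBuilder();
        for (String jar : libJars) {
            if (classPath.length() > 0) {
                classPath.append(' ');
            }
            classPath.append("lib/").append(jar);
        }
        attrs.put(Attributes.Name.CLASS_PATH, classPath.toString());
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        // Main class and jar names are placeholders for illustration.
        Manifest manifest = build("org.apache.tika.cli.TikaCLI",
                List.of("tika-core.jar", "commons-io.jar"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        System.out.print(out.toString(StandardCharsets.UTF_8));
    }
}
```

Because the relative `Class-Path` entries are resolved against the main jar's location, the main jar and the lib/ directory would have to be zipped and unpacked together, which matches the distribution idea above.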

Other recommendations?





[jira] [Created] (TIKA-4142) Upgrade tika-deployment in 3.x/main branch

2023-09-27 Thread Tim Allison (Jira)
Tim Allison created TIKA-4142:
-

 Summary: Upgrade tika-deployment in 3.x/main branch
 Key: TIKA-4142
 URL: https://issues.apache.org/jira/browse/TIKA-4142
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison








[jira] [Created] (TIKA-4141) Upgrade solrj to 9.x in our 3.x/main branch

2023-09-27 Thread Tim Allison (Jira)
Tim Allison created TIKA-4141:
-

 Summary: Upgrade solrj to 9.x in our 3.x/main branch
 Key: TIKA-4141
 URL: https://issues.apache.org/jira/browse/TIKA-4141
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison


I'm not sure whether we want to upgrade or how to go about it, but I wanted to open an issue for discussion.

It looks like solrj 9.x uses HTTP/2 where possible. Will this be compatible with Solr 7-8? Would we need to create a separate emitter for those who want to use solrj 9.x?





[jira] [Resolved] (TIKA-4109) Remove use of EOL component TagSoup 1.2.1 from tika-parsers-standard-package

2023-09-27 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4109.
---
Fix Version/s: 3.0.0-BETA
   Resolution: Fixed

> Remove use of EOL component TagSoup 1.2.1 from tika-parsers-standard-package
> 
>
> Key: TIKA-4109
> URL: https://issues.apache.org/jira/browse/TIKA-4109
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Sandeep Kulkarni
>Priority: Major
> Fix For: 3.0.0-BETA
>
>
> tika-parsers-standard-package has a dependency on 
> *org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1*. Source code scanners flag it as EOL 
> because there has been no new version in 10+ years.
> That project is no longer maintained, and its source code is no longer 
> available. The homepage is also unreachable: 
> [http://home.ccil.org/~cowan/XML/tagsoup/|http://home.ccil.org/~cowan/XML/tagsoup/].
> There is a fork on GitHub: 
> [https://github.com/zmokhtar/TagSoup-Webs], but there does not seem to be 
> any further activity there either.
> Is it possible to remove the use of TagSoup 1.2.1 by using an alternative? If I 
> were aware of one, I would have suggested it myself.





[jira] [Commented] (TIKA-4109) Remove use of EOL component TagSoup 1.2.1 from tika-parsers-standard-package

2023-09-27 Thread Tim Allison (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769564#comment-17769564 ]

Tim Allison commented on TIKA-4109:
---

TIKA-3948 is waiting for the next release of Apache SIS.  Once we can merge 
that, we should be good to go for a 3.0.0-BETA release with a 3.0.0 release a 
month later.  That's my personal speculation... we have to see what fellow Tika 
devs think.

IIRC, SIS thought they might make their next release by the upcoming Community 
Over Code conference, which is Oct 7-10.



[jira] [Commented] (TIKA-4109) Remove use of EOL component TagSoup 1.2.1 from tika-parsers-standard-package

2023-09-27 Thread Sandeep Kulkarni (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769462#comment-17769462 ]

Sandeep Kulkarni commented on TIKA-4109:


Hi [~tallison], is it possible to get an idea of the timeline for using jsoup 
in place of TagSoup?
