[GitHub] [tika] dependabot[bot] opened a new pull request, #701: Bump google-cloud-storage from 2.11.3 to 2.12.0

2022-09-15 Thread GitBox


dependabot[bot] opened a new pull request, #701:
URL: https://github.com/apache/tika/pull/701

   Bumps [google-cloud-storage](https://github.com/googleapis/java-storage) 
from 2.11.3 to 2.12.0.
   
   Release notes
   Sourced from google-cloud-storage's releases <https://github.com/googleapis/java-storage/releases>.
   
   v2.12.0
   2.12.0 (2022-09-15) <https://github.com/googleapis/java-storage/compare/v2.11.3...v2.12.0>
   
   Features
   
   - Add toString method for CustomPlacementConfig (#1602 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1602>) (51aca10 <https://github.com/googleapis/java-storage/commit/51aca10fafe685ed9e7cb41bc4ae79be10feb080>)
   
   Documentation
   
   - Add batch sample (#1559 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1559>) (583bf73 <https://github.com/googleapis/java-storage/commit/583bf73f5d58aa5d79fbaa12b24407c558235eed>)
   - Document thread safety of library (#1566 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1566>) (c740899 <https://github.com/googleapis/java-storage/commit/c7408999e811ba917edb0c136432afa29075e0f2>)
   - Fix broken links in readme (#1520 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1520>) (840b08a <https://github.com/googleapis/java-storage/commit/840b08a03fa7c0535855140244c282f79403b458>)
   
   Dependencies
   
   - Update dependency com.google.cloud:google-cloud-shared-dependencies to v3.0.2 (#1611 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1611>) (8a48aea <https://github.com/googleapis/java-storage/commit/8a48aea7e0049c64ef944b532a2874115b1e2323>)
   - Update dependency com.google.cloud:google-cloud-shared-dependencies to v3.0.3 (#1620 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1620>) (20e6378 <https://github.com/googleapis/java-storage/commit/20e63785462e7876a7ff0ca1363007cc160f>)
   
   
   
   
   Changelog
   Sourced from google-cloud-storage's changelog <https://github.com/googleapis/java-storage/blob/main/CHANGELOG.md>.
   
   
   
   
   
   Commits
   
   - 932259e <https://github.com/googleapis/java-storage/commit/932259e9a744081b5416c9fb582af519b4360146> chore(main): release 2.12.0 (#1565 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1565>)
   - 20e6378 <https://github.com/googleapis/java-storage/commit/20e63785462e7876a7ff0ca1363007cc160f> deps: update dependency com.google.cloud:google-cloud-shared-dependencies to ...
   - 5915383 <https://github.com/googleapis/java-storage/commit/5915383e68cb99d416f2b50b7f924a91b788ad13> test(deps): update dependency com.google.cloud:google-cloud-pubsub to v1.120
   - c4432fd <https://github.com/googleapis/java-storage/commit/c4432fda450b9cb6f03c984e0c0d89e4d71f3c6c> chore(bazel): Update WORKSPACE files for rules_gapic, gax_java, generator_jav...
   - c779dde <https://github.com/googleapis/java-storage/commit/c779dde5724ddc2153be06c6fae72ac4bb325e07> test(deps): update dependency com.google.cloud:google-cloud-pubsub to v1.120
   - 3ef792f <https://github.com/googleapis/java-storage/commit/3ef792fd180023ed63e2790554e2cfb772651f5a> test(deps): update dependency org.mockito:mockito-core to v4.8.0 (#1609 <https://github-redirect.dependabot.com/googleapis/java-storage/issues/1609>)
   - 34f2aa8 <https://github.com/googleapis/java-storage/commit/34f2aa85e975293c7358be9b955b3bea257e9815>
 

[GitHub] [tika] dependabot[bot] opened a new pull request, #700: Bump spring-context from 5.3.22 to 5.3.23

2022-09-15 Thread GitBox


dependabot[bot] opened a new pull request, #700:
URL: https://github.com/apache/tika/pull/700

   Bumps [spring-context](https://github.com/spring-projects/spring-framework) 
from 5.3.22 to 5.3.23.
   
   Release notes
   Sourced from spring-context's releases <https://github.com/spring-projects/spring-framework/releases>.
   
   v5.3.23
   New Features
   
   - Introduce AnnotationUtils.isSynthesizedAnnotation(Annotation) #29054 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/29054>
   - Introduce createContext() factory method in AbstractGenericWebContextLoader #28983 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28983>
   - Support TreeSet collection type in CollectionFactory.createCollection() without using reflection #28949 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28949>
   - Document when RequestEntity.getUrl() throws an UnsupportedOperationException #28930 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28930>
   - Deprecate NestedIOException #28929 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28929>
   - Make isConnected() in WebSocketConnectionManager public #28785 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28785>
   - Expose headers from STOMP RECEIPT frame to registered callbacks #28715 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28715>
   - Make WebClientException serializable #28321 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28321>
   
   Bug Fixes
   
   - Ordering inconsistency with beans defined in parent context #29105 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/29105>
   - RelativeRedirectResponseWrapper does not commit response in sendRedirect #29050 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/29050>
   - MockServerContainerContextCustomizerFactory does not support @Nested tests #29037 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/29037>
   - Request to improve KotlinSerializationJsonHttpMessageConverter logic in RestTemplate #29008 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/29008>
   - WebFlux: multipart requests hang sometimes #28963 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28963>
   - DataBufferUtils.write(Publisher, Path) loses context #28933 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28933>
   - connectionTimeOut and readTimeout not working on UrlResource #28909 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28909>
   - SockJsServiceRegistration#setSupressCors has a typo and should be deprecated #28853 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28853>
   - RenderingResponse does not set status code on redirect views #28839 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28839>
   - Avoid IllegalArgumentException when setting WebSocket error status #28836 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28836>
   - Loss of context path after using ServerRequest.from #28820 <https://github-redirect.dependabot.com/spring-projects/spring-framework/issues/28820>
   - ResponseCookie does not declare nullability annotations consistently for domain and path #28780 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28780>
   
   Documentation
   
   - Fix typo in data-access section #29048 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/29048>
   - Correct description of @RequestParam with WebFlux #28944 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28944>
   - Fix broken kdoc-api links in kotlin.adoc #28908 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28908>
   - Fix typos in Javadoc of class AbstractEncoder #28885 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28885>
   - Fix links in Javadoc and reference docs #28876 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28876>
   - Add missing closing parenthesis in reference doc #28867 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28867>
   - Fix typos in Javadoc, reference docs, and code #28822 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28822>
   - Replace use of the tt HTML tag in Javadoc #28819 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28819>
   - Fix broken link in rsocket documentation #28817 <https://github-redirect.dependabot.com/spring-projects/spring-framework/pull/28817>
   - Clarify docs on JNDI properties in Servlet environment 

[GitHub] [tika] dependabot[bot] opened a new pull request, #699: Bump aws.version from 1.12.303 to 1.12.304

2022-09-15 Thread GitBox


dependabot[bot] opened a new pull request, #699:
URL: https://github.com/apache/tika/pull/699

   Bumps `aws.version` from 1.12.303 to 1.12.304.
   Updates `aws-java-sdk-transcribe` from 1.12.303 to 1.12.304
   
   Changelog
   Sourced from aws-java-sdk-transcribe's changelog <https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md>.
   
   1.12.304 2022-09-15
   
   Amazon DynamoDB
   Features
   - Increased DynamoDB transaction limit from 25 to 100.
   
   Amazon Elastic Compute Cloud
   Features
   - This feature allows customers to create tags for vpc-endpoint-connections and vpc-endpoint-service-permissions.
   
   Amazon SageMaker Service
   Features
   - Amazon SageMaker Automatic Model Tuning now supports specifying Hyperband strategy for tuning jobs, which uses a multi-fidelity based tuning strategy to stop underperforming hyperparameter configurations early.
   
   Commits
   
   - 6550dbc <https://github.com/aws/aws-sdk-java/commit/6550dbc6d5b2c12118eecd88ac325857251a0909> AWS SDK for Java 1.12.304
   - ee307b3 <https://github.com/aws/aws-sdk-java/commit/ee307b365c9c979c81cfd5f32990de045599f064> Update GitHub version number to 1.12.304-SNAPSHOT
   See full diff in compare view <https://github.com/aws/aws-sdk-java/compare/1.12.303...1.12.304>
   
   
   
   
   Updates `aws-java-sdk-s3` from 1.12.303 to 1.12.304
   
   Changelog
   Sourced from aws-java-sdk-s3's changelog <https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md>.
   
   
   
   
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] dependabot[bot] opened a new pull request, #698: Bump jetty.version from 9.4.48.v20220622 to 9.4.49.v20220914

2022-09-15 Thread GitBox


dependabot[bot] opened a new pull request, #698:
URL: https://github.com/apache/tika/pull/698

   Bumps `jetty.version` from 9.4.48.v20220622 to 9.4.49.v20220914.
   Updates `jetty-http` from 9.4.48.v20220622 to 9.4.49.v20220914
   
   Release notes
   Sourced from jetty-http's releases <https://github.com/eclipse/jetty.project/releases>.
   
   9.4.49.v20220914
   Changelog
   
   - #8578 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8578> - getRequestURL can append null if getRequestURI is unspecified in an authority-form request-target
   - #8493 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8493> - Review HTTP client feature setRemoveIdleDestinations
   
   Dependencies
   
   - #8253 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8253> - Bump google-cloud-datastore to 2.9.1
   - #8233 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8233> - Bump jna to 5.12.1
   - #8242 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8242> - Bump mariadb-java-client to 3.0.6
   - #8238 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8238> - Bump maven-enforcer-plugin to 3.1.0
   - #8230 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8230> - Bump maven.version to 3.8.6
   - #8246 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8246> - Bump org.eclipse.osgi to 3.18.0
   - #8245 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8245> - Bump testcontainers.version to 1.17.3
   
   
   
   
   Commits
   
   - 4231a3b <https://github.com/eclipse/jetty.project/commit/4231a3b2e4cb8548a412a789936d640a97b1aa0a> Updating to version 9.4.49.v20220914
   - b32d739 <https://github.com/eclipse/jetty.project/commit/b32d739a1d158c270b98c300e9b84af245bfde2d> Merge pull request #8579 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8579> from eclipse/fix/jetty-9.4.x-abstractproxy-null-requ...
   - 5944ff4 <https://github.com/eclipse/jetty.project/commit/5944ff4b3a0aa0b9c2a5ad4048fd497e6d7a23cf> Issue #8578 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8578> - Changes from review
   - 48c16de <https://github.com/eclipse/jetty.project/commit/48c16deb21efd67d369675a9126e68459fdc9408> Issue #8578 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8578> - test both request URL/URI results
   - d3c7ee3 <https://github.com/eclipse/jetty.project/commit/d3c7ee3d71c57a32336481df7246c49ff51282b1> Issue #8578 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8578> - restore backward compat of getRequestURL and getRequestURI when...
   - 06f2fa4 <https://github.com/eclipse/jetty.project/commit/06f2fa41ddd83236a8484572e93fb3363c2084ad> Jetty 9.4.x : fix client remove idle destinations (#8495 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8495>)
   - 940455b <https://github.com/eclipse/jetty.project/commit/940455b01274d957075166d53e9b908b27ed7ad6> #8414 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8414>: fix drainTo when head == tail but the queue isn't empty
   - a846f4f <https://github.com/eclipse/jetty.project/commit/a846f4fc9dc734d40084f58af44ac925c0ba0aa8> Updating for published CVES (#8273 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8273>)
   - 064682b <https://github.com/eclipse/jetty.project/commit/064682b4ce57282e49a80a64b6d7a7a66fb47b28> Merge pull request #8253 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8253> from eclipse/dependabot/maven/jetty-9.4.x/com.google...
   - 7b40571 <https://github.com/eclipse/jetty.project/commit/7b4057142ed44a29849f24a2572d2649e9458921> Merge pull request #8245 <https://github-redirect.dependabot.com/eclipse/jetty.project/issues/8245> from eclipse/dependabot/maven/jetty-9.4.x/testcontai...
   Additional commits viewable in compare view <https://github.com/eclipse/jetty.project/compare/jetty-9.4.48.v20220622...jetty-9.4.49.v20220914>
   
   
   
   
   Updates `jetty-io` from 9.4.48.v20220622 to 9.4.49.v20220914
   
   Release notes
   Sourced from jetty-io's releases <https://github.com/eclipse/jetty.project/releases>.
   
   

[jira] [Closed] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed TIKA-3858.
-
Resolution: Duplicate

> Ligatures convert on text extraction
> ------------------------------------
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.4.1
> Environment: win 8, jre 1.5
> Reporter: tom hill
> Priority: Major
> Labels: ActualText
> Attachments: TikaChromeInboxLigature.pdf
>
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by the UTF-8 
> bytes ef bf bd.
> Is there any new resolution on this issue? Just returning the ft ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my Gmail inbox page as a PDF in 
> Chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples; it's not specific to one PDF generator. 
> I'm using tika-app-2.4.1.jar.
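
To make the byte sequence in the report concrete, here is a minimal Python sketch. It is not Tika or PDFBox code, and the helper name expand_ligatures is hypothetical. It shows that ef bf bd is the UTF-8 encoding of U+FFFD, the Unicode replacement character, and that NFKC normalization expands ligatures that have their own code point, such as U+FB02 (fl). Unicode defines no precomposed ft ligature, so normalization alone cannot recover the text in this case; the mapping has to come from the PDF itself.

```python
import unicodedata

# The bytes quoted in the report, decoded as UTF-8.
reported = bytes.fromhex("efbfbd").decode("utf-8")


def expand_ligatures(text):
    """Hypothetical helper: apply NFKC, which expands compatibility
    ligatures such as U+FB02 into their component letters."""
    return unicodedata.normalize("NFKC", text)


print(unicodedata.name(reported))      # name of the decoded character
print(expand_ligatures("\ufb02ow"))    # ligature fl followed by "ow"
```

Once U+FFFD has been written to the output, the original characters are gone, so post-processing the extracted text cannot restore them.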



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605596#comment-17605596
 ] 

Tilman Hausherr commented on TIKA-3858:
---

No, except OCR. There will always be files with incomplete extraction. I don't 
understand why Chrome is producing these weird (but legit) files, the 
/ToUnicode syntax supports ligatures. ActualText support is not being worked on 
at this time. I have added your name in the watchers list. I'll close this 
issue because it isn't the fault of tika.



[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605580#comment-17605580
 ] 

tom hill commented on TIKA-3858:


Ok, thanks.

Is there anything I can do as a Tika user to work around this issue?

Is ActualText support being considered?



[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-3858 ]


Tilman Hausherr deleted comment on TIKA-3858:
---

was (Author: tilman):
Please attach the problematic file, and compare to what you get with Adobe 
Reader.



[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-3858 ]


Tilman Hausherr deleted comment on TIKA-3858:
---

was (Author: JIRAUSER295805):
Apologies, I was still editing the cloned issue. You are responding to the old 
text. I will update.

 

 



[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-3858 ]


Tilman Hausherr deleted comment on TIKA-3858:
---

was (Author: JIRAUSER295805):
Ok, the description has been updated. 



[jira] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-3858 ]


Tilman Hausherr deleted comment on TIKA-3858:
---

was (Author: tilman):
The current PDFBox version (2.0.26) doesn't use it. It's used in PDFBox 1.8.17 
which has many drawbacks. The latest tika version is 2.4.1, please try that one.



[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated TIKA-3858:
--
Labels: ActualText  (was: )



[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605576#comment-17605576
 ] 

Tilman Hausherr commented on TIKA-3858:
---

The font has an incorrect /ToUnicode stream which can be found at 
{{Root/Pages/Kids/[0]/Resources/Font/F7/ToUnicode}} with PDFDebugger. The 
incorrect line is {{<8D> <>}} i.e. it maps the 8D code to 0. However the 
page content stream corrects this with the {{ActualText}} feature that we don't 
support
{code}
/P << /MCID 8 >> BDC
  /F7 14 Tf
  1 0 0 -1 64 293 Tm
  (\015) Tj
  9.491989 0 Td
  (j) Tj
  5.026001 0 Td
  (>) Tj
  /Span << /ActualText (ft) >> BDC
    7.6019897 0 Td
    (\215) Tj
  EMC
  9.673996 0 Td
  (k) Tj
EMC
{code}
More on this in PDFBOX-4532 and PDFBOX-5155.
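
The effect described above can be modelled with a short Python sketch. This is a simplified illustration, not PDFBox's or Tika's extraction code. The operator list mirrors the quoted content stream with the font and positioning operators left out. The glyph-to-text map is an assumption inferred from the word "Drafts" in the report, and treating a mapping to U+0000 as U+FFFD is likewise an assumption chosen to match the reported bytes.

```python
# Simplified model of the quoted marked-content sequence.
# Each entry is (operator, operand); for BDC the operand is the
# ActualText value, or None when the sequence has none.
OPS = [
    ("BDC", None),       # /P << /MCID 8 >> BDC
    ("Tj", b"\x0d"),     # (\015) Tj
    ("Tj", b"j"),
    ("Tj", b">"),
    ("BDC", "ft"),       # /Span << /ActualText (ft) >> BDC
    ("Tj", b"\x8d"),     # (\215) Tj
    ("EMC", None),
    ("Tj", b"k"),
    ("EMC", None),
]

# Hypothetical ToUnicode map (assumed, not read from the PDF).
# Code 0x8D maps to U+0000, mirroring the incorrect entry described above.
TO_UNICODE = {0x0D: "D", ord("j"): "r", ord(">"): "a", 0x8D: "\x00", ord("k"): "s"}


def extract(ops, to_unicode, honor_actual_text):
    out = []
    stack = []  # ActualText (or None) for each open marked-content sequence
    for operator, operand in ops:
        if operator == "BDC":
            stack.append(operand)
            if honor_actual_text and operand is not None:
                out.append(operand)  # replacement text stands in for the glyphs
        elif operator == "EMC":
            stack.pop()
        elif operator == "Tj":
            if honor_actual_text and any(s is not None for s in stack):
                continue  # glyphs inside an ActualText span are skipped
            for code in operand:
                ch = to_unicode.get(code, "\ufffd")
                out.append("\ufffd" if ch == "\x00" else ch)
    return "".join(out)


print(extract(OPS, TO_UNICODE, honor_actual_text=False))
print(extract(OPS, TO_UNICODE, honor_actual_text=True))
```

Without ActualText the broken 8D mapping yields a replacement character in the middle of the word; with it, the span's replacement text is used and the glyph code is never looked up.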

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.4.1
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
> Attachments: TikaChromeInboxLigature.pdf
>
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605518#comment-17605518
 ] 

tom hill commented on TIKA-3858:


When I open TikaChromeInboxLigature.pdf in Adobe Reader, the word "Drafts" uses 
the ft ligature. I can tell by selecting one character at a time. When I copy 
the word "Drafts" and paste it into TextEdit, I get "f" and "t" as separate 
characters. 
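
The normalization suggested in the issue description can be sketched in plain Java for ligatures that have their own Unicode code points, such as U+FB02 (fl). This is only an illustration: `LigatureNormalizer` and `expandLigatures` are hypothetical names, not Tika or PDFBox API, and NFKC also rewrites other compatibility characters, which may or may not be wanted. It would not help with the attached file, where the extracted text is already the replacement character rather than a ligature code point.

```java
import java.text.Normalizer;

public class LigatureNormalizer {

    // Expand compatibility characters (including the fi/fl ligatures)
    // into their plain-letter equivalents using Unicode NFKC normalization.
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB02 is the single-character "fl" ligature.
        System.out.println(expandLigatures("\uFB02ag")); // prints flag
    }
}
```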

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.4.1
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
> Attachments: TikaChromeInboxLigature.pdf
>
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Commented] (TIKA-3856) Upgrade to jempbox 1.8.17

2022-09-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605517#comment-17605517
 ] 

Hudson commented on TIKA-3856:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #800 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/800/])
TIKA-3856 -- upgrade jempbox to 1.8.17 (tallison: 
[https://github.com/apache/tika/commit/1b593b12146867ba8827ee55e7e64b01ccb4533c])
* (edit) CHANGES.txt
* (edit) tika-parent/pom.xml


> Upgrade to jempbox 1.8.17
> -
>
> Key: TIKA-3856
> URL: https://issues.apache.org/jira/browse/TIKA-3856
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.5.0
>
>
> Vote passed. In release process now. Many thanks to [~lehmi] [~tilman] and 
> our PDFBox colleagues!





[jira] [Comment Edited] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605512#comment-17605512
 ] 

tom hill edited comment on TIKA-3858 at 9/15/22 8:14 PM:
-

For the attachment TikaChromeInboxLigature.pdf

 

% java -jar tika-app-2.4.1.jar TikaChromeInboxLigature.pdf | grep Dra | hexdump -C

00000000  3c 70 3e 44 72 61 ef bf  bd 73 0a                 |<p>Dra...s.|

0000000b

 

I believe ef bf bd is the UTF-8 encoding of U+FFFD, the replacement character.
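
The byte sequence in the hexdump can be checked with a few lines of plain Java. This is a minimal sketch that uses only the standard library; `ReplacementCharCheck` and `firstCodePoint` are illustrative names, not part of Tika.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharCheck {

    // Decode a UTF-8 byte sequence and return the code point of its first character.
    static int firstCodePoint(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        // The three bytes that appear between "Dra" and "s" in the Tika output.
        byte[] bytes = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};
        System.out.printf("U+%04X%n", firstCodePoint(bytes)); // prints U+FFFD
    }
}
```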


was (Author: JIRAUSER295805):
For the attachment TikaChromeInboxLigature.pdf

 

% java -jar tika-app-2.4.1.jar TikaChromeInboxLigature.pdf | grep Dra | hexdump -C

00000000  3c 70 3e 44 72 61 ef bf  bd 73 0a                 |<p>Dra...s.|

0000000b

 

 

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.4.1
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
> Attachments: TikaChromeInboxLigature.pdf
>
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605512#comment-17605512
 ] 

tom hill commented on TIKA-3858:


For the attachment TikaChromeInboxLigature.pdf

 

% java -jar tika-app-2.4.1.jar TikaChromeInboxLigature.pdf | grep Dra | hexdump -C

00000000  3c 70 3e 44 72 61 ef bf  bd 73 0a                 |<p>Dra...s.|

0000000b

 

 

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.4.1
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
> Attachments: TikaChromeInboxLigature.pdf
>
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tom hill updated TIKA-3858:
---
Attachment: TikaChromeInboxLigature.pdf

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.4.1
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
> Attachments: TikaChromeInboxLigature.pdf
>
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-3858:
--
Affects Version/s: 2.4.1
   (was: 1.5)

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.4.1
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605486#comment-17605486
 ] 

Tilman Hausherr commented on TIKA-3858:
---

Please attach the problematic file, and compare to what you get with Adobe 
Reader.

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605485#comment-17605485
 ] 

tom hill commented on TIKA-3858:


Ok, the description has been updated. 

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tom hill updated TIKA-3858:
---
Description: 
It appears that the issue in TIKA-1289 is still present. Ligatures get replaced 
by a question mark.

As a particular example, the ft ligature is getting replaced by utf-8: ef bf  bd

Is there any new resolution on this issue? Just returning the fl ligature would 
be great, or normalizing it to f, t.

This particular example comes from saving my gmail inbox page as a pdf, in 
chrome. It uses the ft ligature in the word "Drafts".

There are many similar examples, it's not specific to one pdf generator. 

I'm using tika-app-2.4.1.jar 

  was:
It appears that the issue in TIKA-1289 is still present. Ligatures get replaced 
by a question mark.

As a particular example, the ft ligature is getting replaced by utf-8: ef bf  bd

Is there any new resolution on this issue? Just returning the fl ligature would 
be great, or normalizing it to f, t.

This particular example comes from saving my gmail inbox page as a pdf, in 
chrome. It uses the ft ligature in the word "Drafts".

There are many similar examples, it's not specific to one pdf generator. 


>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 
> I'm using tika-app-2.4.1.jar 





[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tom hill updated TIKA-3858:
---
Description: 
It appears that the issue in TIKA-1289 is still present. Ligatures get replaced 
by a question mark.

As a particular example, the ft ligature is getting replaced by utf-8: ef bf  bd

Is there any new resolution on this issue? Just returning the fl ligature would 
be great, or normalizing it to f, t.

This particular example comes from saving my gmail inbox page as a pdf, in 
chrome. It uses the ft ligature in the word "Drafts".

There are many similar examples, it's not specific to one pdf generator. 

  was:
It appears that the issue in TIKA-1289 is still present. Ligatures get replaced 
by a question mark.

As a particular example, the ft ligature is getting replaced by utf-8: ef bf  bd

Is there any new resolution on this issue? Just returning the fl ligature would 
be great, or normalizing it to f, t.


>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.
> This particular example comes from saving my gmail inbox page as a pdf, in 
> chrome. It uses the ft ligature in the word "Drafts".
> There are many similar examples, it's not specific to one pdf generator. 





[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tom hill updated TIKA-3858:
---
Description: 
It appears that the issue in TIKA-1289 is still present. Ligatures get replaced 
by a question mark.

As a particular example, the ft ligature is getting replaced by utf-8: ef bf  bd

Is there any new resolution on this issue? Just returning the fl ligature would 
be great, or normalizing it to f, t.

  was:It appears that the issue in TIKA-1289 is still present. Ligatures get 
replaced by a question mark.


>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.
> As a particular example, the ft ligature is getting replaced by utf-8: ef bf  
> bd
> Is there any new resolution on this issue? Just returning the fl ligature 
> would be great, or normalizing it to f, t.





[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605482#comment-17605482
 ] 

tom hill commented on TIKA-3858:


Apologies, I was still editing the cloned issue. You are responding to the old 
text. I will update.

 

 

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.





[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tom hill updated TIKA-3858:
---
Description: It appears that the issue in TIKA-1289 is still present. 
Ligatures get replaced by a question mark.  (was: According to tika sources 
review, it uses pdfbox to parse pdf files. 
I found that pdfbox itself uses icu4j to handle ligatures.
Unfortunately, when i added icu4j jar to my classpath nothing changed, 
ligatures are still not converted. Sample pdf file is attached.)

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> It appears that the issue in TIKA-1289 is still present. Ligatures get 
> replaced by a question mark.





[jira] [Commented] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605481#comment-17605481
 ] 

Tilman Hausherr commented on TIKA-3858:
---

The current PDFBox version (2.0.26) doesn't use icu4j. It's used in PDFBox 
1.8.17, which has many drawbacks. The latest Tika version is 2.4.1; please try 
that one.

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> According to tika sources review, it uses pdfbox to parse pdf files. 
> I found that pdfbox itself uses icu4j to handle ligatures.
> Unfortunately, when i added icu4j jar to my classpath nothing changed, 
> ligatures are still not converted. Sample pdf file is attached.





[jira] [Updated] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated TIKA-3858:
--
Fix Version/s: (was: 1.7)

>  Ligatures convert on text extraction
> -
>
> Key: TIKA-3858
> URL: https://issues.apache.org/jira/browse/TIKA-3858
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: win 8, jre 1.5
>Reporter: tom hill
>Priority: Major
>
> According to tika sources review, it uses pdfbox to parse pdf files. 
> I found that pdfbox itself uses icu4j to handle ligatures.
> Unfortunately, when i added icu4j jar to my classpath nothing changed, 
> ligatures are still not converted. Sample pdf file is attached.





[jira] [Created] (TIKA-3858) Ligatures convert on text extraction

2022-09-15 Thread tom hill (Jira)
tom hill created TIKA-3858:
--

 Summary:  Ligatures convert on text extraction
 Key: TIKA-3858
 URL: https://issues.apache.org/jira/browse/TIKA-3858
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.5
 Environment: win 8, jre 1.5
Reporter: tom hill
 Fix For: 1.7


According to tika sources review, it uses pdfbox to parse pdf files. 
I found that pdfbox itself uses icu4j to handle ligatures.
Unfortunately, when i added icu4j jar to my classpath nothing changed, 
ligatures are still not converted. Sample pdf file is attached.





[jira] [Commented] (TIKA-3855) Implement upsert for OpenSearch emitter

2022-09-15 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605474#comment-17605474
 ] 

Hudson commented on TIKA-3855:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #799 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/799/])
TIKA-3855 -- enable upsert for OpenSearchEmitter (tallison: 
[https://github.com/apache/tika/commit/0abebdc27dbfc9eb34abc3619e13ef816af9e331])
* (edit) 
tika-pipes/tika-emitters/tika-emitter-opensearch/src/main/java/org/apache/tika/pipes/emitter/opensearch/OpenSearchClient.java
* (edit) 
tika-integration-tests/tika-pipes-opensearch-integration-tests/src/test/java/org/apache/tika/pipes/opensearch/tests/TikaPipesOpenSearchTest.java
* (edit) 
tika-integration-tests/tika-pipes-opensearch-integration-tests/src/test/java/org/apache/tika/pipes/xsearch/tests/XSearchTestClient.java
* (edit) CHANGES.txt
* (edit) 
tika-integration-tests/tika-pipes-opensearch-integration-tests/src/test/java/org/apache/tika/pipes/xsearch/tests/TikaPipesXSearchBase.java
* (edit) 
tika-integration-tests/tika-pipes-opensearch-integration-tests/src/test/resources/opensearch/tika-config-opensearch.xml
* (edit) 
tika-pipes/tika-emitters/tika-emitter-opensearch/src/test/java/org/apache/tika/pipes/emitter/opensearch/OpenSearchClientTest.java
* (edit) 
tika-pipes/tika-emitters/tika-emitter-opensearch/src/main/java/org/apache/tika/pipes/emitter/opensearch/OpenSearchEmitter.java


> Implement upsert for OpenSearch emitter
> ---
>
> Key: TIKA-3855
> URL: https://issues.apache.org/jira/browse/TIKA-3855
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.5.0
>
>






[jira] [Resolved] (TIKA-3856) Upgrade to jempbox 1.8.17

2022-09-15 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3856.
---
Fix Version/s: 2.5.0
   Resolution: Fixed

> Upgrade to jempbox 1.8.17
> -
>
> Key: TIKA-3856
> URL: https://issues.apache.org/jira/browse/TIKA-3856
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.5.0
>
>
> Vote passed. In release process now. Many thanks to [~lehmi] [~tilman] and 
> our PDFBox colleagues!





[jira] [Created] (TIKA-3857) Upgrade to POI 5.2.3

2022-09-15 Thread Tim Allison (Jira)
Tim Allison created TIKA-3857:
-

 Summary: Upgrade to POI 5.2.3
 Key: TIKA-3857
 URL: https://issues.apache.org/jira/browse/TIKA-3857
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


Ran the regression tests today, and all looks good: 
https://corpora.tika.apache.org/base/reports/tika-2.5.0-poi-reports.tgz

 

Vote wraps up tomorrow. :fingers-crossed:





[jira] [Created] (TIKA-3856) Upgrade to jempbox 1.8.17

2022-09-15 Thread Tim Allison (Jira)
Tim Allison created TIKA-3856:
-

 Summary: Upgrade to jempbox 1.8.17
 Key: TIKA-3856
 URL: https://issues.apache.org/jira/browse/TIKA-3856
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


Vote passed. In release process now. Many thanks to [~lehmi] [~tilman] and our 
PDFBox colleagues!





[jira] [Resolved] (TIKA-3855) Implement upsert for OpenSearch emitter

2022-09-15 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3855.
---
Fix Version/s: 2.5.0
   Resolution: Fixed

> Implement upsert for OpenSearch emitter
> ---
>
> Key: TIKA-3855
> URL: https://issues.apache.org/jira/browse/TIKA-3855
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.5.0
>
>






[jira] [Created] (TIKA-3855) Implement upsert for OpenSearch emitter

2022-09-15 Thread Tim Allison (Jira)
Tim Allison created TIKA-3855:
-

 Summary: Implement upsert for OpenSearch emitter
 Key: TIKA-3855
 URL: https://issues.apache.org/jira/browse/TIKA-3855
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison








RE: Issue related to file mime type detection

2022-09-15 Thread Nick Burch

On Thu, 15 Sep 2022, Sindhu Mahadevappa wrote:
> We have been looking for the latest Tika 2.4.1 jar file, looks like it 
> is not available anywhere.


You can get the Tika App and Tika Server jars for 2.4.1 from
https://tika.apache.org/download.html

For the core and parser jars, manually downloading is not recommended as 
you risk missing dependencies. Declare them as dependencies in Maven or 
Gradle and the build tool will pull the jars for you.


Nick


RE: Issue related to file mime type detection

2022-09-15 Thread Sindhu Mahadevappa
Hi Team,

Thanks for the quick response.
We have been looking for the latest Tika 2.4.1 jar file, looks like it is not 
available anywhere.

Can you please share the link where we can get the latest 2.4.1 jar file, it 
will be very helpful.

Thanks & Regards
Sindhu Mahadevappa

> -Original Message-
> From: Nick Burch 
> Sent: Friday, September 9, 2022 3:48 PM
> To: Sindhu Mahadevappa 
> Cc: dev@tika.apache.org
> Subject: Re: Issue related to file mime type detection
>
> On Fri, 9 Sep 2022, Sindhu Mahadevappa wrote:
>> We are using tika-parsers 1.23
>
> Tika 1.23 was released in December 2019! You should really use
> something much more recent
>
>> for comparing uploaded file mime type from file name as well as from
>> file content for security purpose.
>
> Apache Tika's detection is not recommended for security purposes. We try our 
> best to give an answer. Our detection does not defend against specially 
> crafted files which look like one type but are actually a different one.
>
>> mime type from file name as audio/mp4 and mime type from file content as
>> video/mp4 so it is validating as file type not supported.
>
> Try with a more recent version of Apache Tika. Make sure you include
> the Tika Parsers jar and dependencies for container aware detection
> within MP4 files. If you still have an issue with Tika 2.4.1, raise a
> bug and upload a triggering file so we can investigate
>
> Nick
> This email and any files transmitted with it are confidential and intended 
> solely for the use of the individual or entity to whom they are addressed. If 
> you are not the named addressee you should not disseminate, distribute or 
> copy this e-mail. Please notify the sender or system manager by email 
> immediately if you have received this e-mail by mistake and delete this 
> e-mail from your system. If you are not the intended recipient you are 
> notified that disclosing, copying, distributing or taking any action in 
> reliance on the contents of this information is strictly prohibited and 
> against the law.
>


[GitHub] [tika] THausherr merged pull request #696: Bump aws.version from 1.12.301 to 1.12.303

2022-09-15 Thread GitBox


THausherr merged PR #696:
URL: https://github.com/apache/tika/pull/696


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] THausherr merged pull request #697: Bump maven-shade-plugin from 3.3.0 to 3.4.0

2022-09-15 Thread GitBox


THausherr merged PR #697:
URL: https://github.com/apache/tika/pull/697





[GitHub] [tika] THausherr merged pull request #695: Bump protobuf-java from 3.21.5 to 3.21.6

2022-09-15 Thread GitBox


THausherr merged PR #695:
URL: https://github.com/apache/tika/pull/695





[GitHub] [tika] dependabot[bot] opened a new pull request, #697: Bump maven-shade-plugin from 3.3.0 to 3.4.0

2022-09-15 Thread GitBox


dependabot[bot] opened a new pull request, #697:
URL: https://github.com/apache/tika/pull/697

   Bumps [maven-shade-plugin](https://github.com/apache/maven-shade-plugin) 
from 3.3.0 to 3.4.0.
   
   Commits
   
   - [885de67](https://github.com/apache/maven-shade-plugin/commit/885de678577573111568e80b45869a90e2a8fb46) [maven-release-plugin] prepare release maven-shade-plugin-3.4.0
   - [dc8f067](https://github.com/apache/maven-shade-plugin/commit/dc8f0679c129238813ea797ccebe690b53380eb4) Revert [maven-release-plugin] prepare release maven-shade-plugin-3.3.1
   - [dcd5cae](https://github.com/apache/maven-shade-plugin/commit/dcd5caed85dbec16d8222dd9d128d16db6ee9900) Revert [maven-release-plugin] prepare for next development iteration
   - [b2d5b53](https://github.com/apache/maven-shade-plugin/commit/b2d5b53f88f05616f4b92dc14f800b48bfbc9a52) [maven-release-plugin] prepare for next development iteration
   - [a09e6de](https://github.com/apache/maven-shade-plugin/commit/a09e6de960061ccf600ad0c979df99d748770a55) [maven-release-plugin] prepare release maven-shade-plugin-3.3.1
   - [875114a](https://github.com/apache/maven-shade-plugin/commit/875114a0c8f56dcce5dcc354d095d356dee0767a) [MSHADE-416] Fix Jenkins URL
   - [ad2f6f8](https://github.com/apache/maven-shade-plugin/commit/ad2f6f8e7855860b69b950d14ca8ec627b099d6b) [MSHADE-425] Relocate services name before add to serviceEntries
   - [26b5873](https://github.com/apache/maven-shade-plugin/commit/26b587384bb664daf59c30e72693ee1ae105fd71) gha shared v3
   - [3994b11](https://github.com/apache/maven-shade-plugin/commit/3994b11b02182db588aa76d928b9ecc949ef15c3) Bump xmlunit-legacy from 2.7.0 to 2.9.0
   - [89d9e79](https://github.com/apache/maven-shade-plugin/commit/89d9e791275450a0d742221d798005330ea797cc) Added release drafter.
   - Additional commits viewable in [compare view](https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.3.0...maven-shade-plugin-3.4.0)
   
   
   
   
   
   [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.apache.maven.plugins:maven-shade-plugin&package-manager=maven&previous-version=3.3.0&new-version=3.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[GitHub] [tika] dependabot[bot] opened a new pull request, #696: Bump aws.version from 1.12.301 to 1.12.303

2022-09-15 Thread GitBox


dependabot[bot] opened a new pull request, #696:
URL: https://github.com/apache/tika/pull/696

   Bumps `aws.version` from 1.12.301 to 1.12.303.
   Updates `aws-java-sdk-transcribe` from 1.12.301 to 1.12.303
   
   Changelog
   Sourced from [aws-java-sdk-transcribe's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).

   1.12.303 2022-09-14

   AWS Amplify UI Builder
   - Features
     - Amplify Studio UIBuilder is introducing forms functionality. Forms can be configured from Data Store models, JSON, or from scratch. These forms can then be generated in your project and used like any other React components.

   Amazon Elastic Compute Cloud
   - Features
     - This update introduces API operations to manage and create local gateway route tables, CoIP pools, and VIF group associations.

   1.12.302 2022-09-13

   AWS Transfer Family
   - Features
     - This release introduces the ability to have multiple server host keys for any of your Transfer Family servers that use the SFTP protocol.

   AWSKendraFrontendService
   - Features
     - This release enables our customer to choose the option of Sharepoint 2019 for the on-premise Sharepoint connector.

   Amazon CloudWatch Evidently
   - Features
     - This release adds support for the client-side evaluation - powered by AWS AppConfig feature.

   Amazon Connect Customer Profiles
   - Features
     - Added isUnstructured in response for Customer Profiles Integration APIs

   Amazon Elastic Compute Cloud
   - Features
     - Two new features for local gateway route tables: support for static routes targeting Elastic Network Interfaces and direct VPC routing.

   Elastic Disaster Recovery Service
   - Features
     - Fixed the data type of lagDuration that is returned in Describe Source Server API
   
   
   
   
   
   
   Commits
   
   - [69485f4](https://github.com/aws/aws-sdk-java/commit/69485f42087cf9dc8cc39ec64c83c6274a40ed0c) AWS SDK for Java 1.12.303
   - [16bc711](https://github.com/aws/aws-sdk-java/commit/16bc711b176e85482e324a8130ec8fc2e86be87d) Update GitHub version number to 1.12.303-SNAPSHOT
   - [cdb8ca8](https://github.com/aws/aws-sdk-java/commit/cdb8ca809845bb5e32c00f4f27c67175cfc64809) AWS SDK for Java 1.12.302
   - [6534fd1](https://github.com/aws/aws-sdk-java/commit/6534fd1bf1131bdda73a83ca3feb11669878cde3) Update GitHub version number to 1.12.302-SNAPSHOT
   - See full diff in [compare view](https://github.com/aws/aws-sdk-java/compare/1.12.301...1.12.303)
   
   
   
   
   Updates `aws-java-sdk-s3` from 1.12.301 to 1.12.303
   
   Changelog
   Sourced from [aws-java-sdk-s3's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).

   1.12.303 2022-09-14

   AWS Amplify UI Builder
   - Features
     - Amplify Studio UIBuilder is introducing forms functionality. Forms can be configured from Data Store models, JSON, or from scratch. These forms can then be generated in your project and used like any other React components.

   Amazon Elastic Compute Cloud
   - Features
     - This update introduces API operations to manage and create local gateway route tables, CoIP pools, and VIF group associations.

   1.12.302 2022-09-13

   AWS Transfer Family
   - Features
     - This release introduces the ability to have multiple server host keys for any of your Transfer Family servers that use the SFTP protocol.

   AWSKendraFrontendService
   - Features
     - This release enables our customer to choose the option of Sharepoint 2019 for the on-premise Sharepoint connector.

   Amazon CloudWatch Evidently
   - Features
     - This release adds support for the client-side evaluation - powered by AWS AppConfig feature.

   Amazon Connect Customer Profiles
   - Features
     - Added isUnstructured in response for Customer Profiles Integration APIs

   Amazon Elastic Compute Cloud
   - Features
     - Two new features for local gateway route tables: support for static routes targeting Elastic Network Interfaces and direct VPC routing.

   Elastic Disaster Recovery Service
   - Features
     - Fixed the data type of lagDuration that is returned in Describe Source Server API
   
   
   
   
   
   
   Commits
   
   - [69485f4](https://github.com/aws/aws-sdk-java/commit/69485f42087cf9dc8cc39ec64c83c6274a40ed0c) AWS SDK for Java 1.12.303
   - [16bc711](https://github.com/aws/aws-sdk-java/commit/16bc711b176e85482e324a8130ec8fc2e86be87d) Update GitHub version number to 1.12.303-SNAPSHOT
   - [cdb8ca8](https://github.com/aws/aws-sdk-java/commit/cdb8ca809845bb5e32c00f4f27c67175cfc64809) AWS SDK for Java 1.12.302
   - [6534fd1](https://github.com/aws/aws-sdk-java/commit/6534fd1bf1131bdda73a83ca3feb11669878cde3) Update GitHub version number to 1.12.302-SNAPSHOT
   See full diff in 

[GitHub] [tika] dependabot[bot] opened a new pull request, #695: Bump protobuf-java from 3.21.5 to 3.21.6

2022-09-15 Thread GitBox


dependabot[bot] opened a new pull request, #695:
URL: https://github.com/apache/tika/pull/695

   Bumps [protobuf-java](https://github.com/protocolbuffers/protobuf) from 
3.21.5 to 3.21.6.
   
   Commits
   
   - [24487dd](https://github.com/protocolbuffers/protobuf/commit/24487dd1045c7f3d64a21f38a3f0c06cc4cf2edb) Updating version.json and repo version numbers to: 21.6
   - [d88266c](https://github.com/protocolbuffers/protobuf/commit/d88266c319e42650344f3c5df3a0feecc7865fb5) Merge pull request [#10545](https://github-redirect.dependabot.com/protocolbuffers/protobuf/issues/10545) from deannagarcia/21.x
   - [cd0ee8f](https://github.com/protocolbuffers/protobuf/commit/cd0ee8f45d0d749a1e4deb9847e53efb62c04d7b) Apply patch
   - [ea2f204](https://github.com/protocolbuffers/protobuf/commit/ea2f20498e2853a58875f247b06edcb567ccd86b) Uninstall system protobuf to prevent version conflicts ([#10522](https://github-redirect.dependabot.com/protocolbuffers/protobuf/issues/10522))
   - [aafacb0](https://github.com/protocolbuffers/protobuf/commit/aafacb09c75d521b11500970827214f2247dd4aa) Remove broken use_bazel.sh ([#10511](https://github-redirect.dependabot.com/protocolbuffers/protobuf/issues/10511))
   - [40847c7](https://github.com/protocolbuffers/protobuf/commit/40847c7ee5848f41c505a1ece1f27ec4a687837b) Fix Kokoro tests to work on Monterey machines ([#10473](https://github-redirect.dependabot.com/protocolbuffers/protobuf/issues/10473))
   - [2fb33f4](https://github.com/protocolbuffers/protobuf/commit/2fb33f46a6cf6dc20fb76edef7e00162b5eedb44) Merge pull request [#10382](https://github-redirect.dependabot.com/protocolbuffers/protobuf/issues/10382) from protocolbuffers/21.x-202208092202
   - [29f03e0](https://github.com/protocolbuffers/protobuf/commit/29f03e04d3f72b1749b1bf720183b0fb9b6b7d69) Update version.json to: 21.6-dev
   - [638779f](https://github.com/protocolbuffers/protobuf/commit/638779f353731a0a04496bde20d14164684c3d93) Merge pull request [#10380](https://github-redirect.dependabot.com/protocolbuffers/protobuf/issues/10380) from protocolbuffers/21.x-202208091710
   - See full diff in [compare view](https://github.com/protocolbuffers/protobuf/compare/v3.21.5...v3.21.6)
   
   
   
   
   
   [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.google.protobuf:protobuf-java&package-manager=maven&previous-version=3.21.5&new-version=3.21.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   
   
   

