[jira] [Updated] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated TIKA-4232:
---
Fix Version/s: 2.9.3

> Create and execute unit tests for tika-helm
> ---
>
> Key: TIKA-4232
> URL: https://issues.apache.org/jira/browse/TIKA-4232
> Project: Tika
>  Issue Type: Improvement
>  Components: tika-helm
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.9.3
>
>
> The goal is to execute chart unit tests against each tika-helm pull request.
> I found the Helm Unit Tests GitHub Action
> (https://github.com/marketplace/actions/helm-unit-tests), which uses
> https://github.com/helm-unittest/helm-unittest as a Helm plugin.
> The PR will consist of one or more unit tests automated via the GitHub Action.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved TIKA-4232.

Resolution: Fixed



[jira] [Closed] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed TIKA-4232.
--



[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845510#comment-17845510
 ] 

ASF GitHub Bot commented on TIKA-4232:
--

lewismc commented on PR #17:
URL: https://github.com/apache/tika-helm/pull/17#issuecomment-2105292770

   The INFRA ticket was resolved and everything is passing now.






Re: [PR] TIKA-4232 Create and execute unit tests for tika-helm [tika-helm]

2024-05-10 Thread via GitHub


lewismc commented on PR #17:
URL: https://github.com/apache/tika-helm/pull/17#issuecomment-2105292770

   The INFRA ticket was resolved and everything is passing now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845509#comment-17845509
 ] 

ASF GitHub Bot commented on TIKA-4232:
--

lewismc merged PR #17:
URL: https://github.com/apache/tika-helm/pull/17






Re: [PR] TIKA-4232 Create and execute unit tests for tika-helm [tika-helm]

2024-05-10 Thread via GitHub


lewismc merged PR #17:
URL: https://github.com/apache/tika-helm/pull/17





[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845508#comment-17845508
 ] 

ASF GitHub Bot commented on TIKA-4232:
--

lewismc opened a new pull request, #17:
URL: https://github.com/apache/tika-helm/pull/17

   PR to address https://issues.apache.org/jira/browse/TIKA-4232






[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845507#comment-17845507
 ] 

ASF GitHub Bot commented on TIKA-4232:
--

lewismc closed pull request #17: TIKA-4232 Create and execute unit tests for 
tika-helm
URL: https://github.com/apache/tika-helm/pull/17






Re: [PR] TIKA-4232 Create and execute unit tests for tika-helm [tika-helm]

2024-05-10 Thread via GitHub


lewismc closed pull request #17: TIKA-4232 Create and execute unit tests for 
tika-helm
URL: https://github.com/apache/tika-helm/pull/17





[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845302#comment-17845302
 ] 

ASF GitHub Bot commented on TIKA-4252:
--

tballison commented on code in PR #1753:
URL: https://github.com/apache/tika/pull/1753#discussion_r1596634451


##
tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java:
##
@@ -455,33 +455,33 @@ private Fetcher getFetcher(FetchEmitTuple t) {
 }
 }
 
-protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple t, Fetcher fetcher) {
-FetchKey fetchKey = t.getFetchKey();
+protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple fetchEmitTuple, Fetcher fetcher) {
+FetchKey fetchKey = fetchEmitTuple.getFetchKey();
+Metadata fetchResponseMetadata = new Metadata();

Review Comment:
   The metadata in the FetchEmitTuple was envisioned as user-injected
   metadata that is injected after the parse and then emitted (e.g.
   provenance metadata).

   I think we need to put both metadata objects on the FetchEmitTuple.

   This is what I'm thinking...let me know what you think.

   So, there will be three metadata objects in play. The FetchEmitTuple will
   have a fetchRequestMetadata (???) and a userMetadata (???). At parse time,
   we'll create a fresh metadata object, which we'll call "responseMetadata",
   and pass it in the following call:
   fetcher.fetch(requestMetadata, responseMetadata).

   The parse will then use the responseMetadata and, after the parse, inject
   the userMetadata from the FetchEmitTuple.

   The fetcher may use the fetchRequestMetadata to carry out its request, but
   info from that one should not make it into the "responseMetadata" or into
   the emit data.
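
The three-metadata flow proposed in the review comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in PR #1753: `SketchFetcher`, `SketchTuple`, the `sketch:` keys, and the `Map<String, String>` stand-ins for Tika's `Metadata` are all hypothetical, and the "parse" is reduced to recording the content length.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class ThreeMetadataSketch {

    /** Hypothetical fetcher; a stand-in, not Tika's Fetcher interface. */
    interface SketchFetcher {
        byte[] fetch(String fetchKey, Map<String, String> requestMetadata,
                     Map<String, String> responseMetadata);
    }

    /** Hypothetical tuple carrying the two caller-supplied metadata objects. */
    static final class SketchTuple {
        final String fetchKey;
        final Map<String, String> fetchRequestMetadata; // used only to carry out the fetch
        final Map<String, String> userMetadata;         // injected after the parse, then emitted

        SketchTuple(String fetchKey, Map<String, String> fetchRequestMetadata,
                    Map<String, String> userMetadata) {
            this.fetchKey = fetchKey;
            this.fetchRequestMetadata = fetchRequestMetadata;
            this.userMetadata = userMetadata;
        }
    }

    /** Returns the metadata that would be emitted for this tuple. */
    static Map<String, String> parseFromTuple(SketchTuple tuple, SketchFetcher fetcher) {
        // A fresh object per parse; the fetcher may record what it learned here.
        Map<String, String> responseMetadata = new LinkedHashMap<>();
        byte[] content = fetcher.fetch(tuple.fetchKey, tuple.fetchRequestMetadata,
                responseMetadata);

        // Placeholder for the real parse, which would add extracted fields.
        responseMetadata.put("sketch:length", Integer.toString(content.length));

        // User metadata is injected only after the parse.
        responseMetadata.putAll(tuple.userMetadata);
        return responseMetadata;
    }

    public static void main(String[] args) {
        SketchFetcher fetcher = (key, request, response) -> {
            // The request metadata steers the fetch but is not copied into the response.
            response.put("sketch:fetched-range", request.getOrDefault("range", "all"));
            return ("content of " + key).getBytes(StandardCharsets.UTF_8);
        };

        Map<String, String> request = new HashMap<>();
        request.put("range", "0-1023");
        Map<String, String> user = new HashMap<>();
        user.put("provenance", "crawl-42");

        System.out.println(parseFromTuple(new SketchTuple("doc-1", request, user), fetcher));
    }
}
```

In this sketch the request metadata never reaches the emitted map unless the fetcher copies a value over explicitly, which is the separation the comment asks for.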





> PipesClient#process - seems to lose the Fetch input metadata?
> -
>
> Key: TIKA-4252
> URL: https://issues.apache.org/jira/browse/TIKA-4252
> Project: Tika
>  Issue Type: Bug
>Reporter: Nicholas DiPiazza
>Priority: Major
> Fix For: 3.0.0
>
>
> When calling:
> PipesResult pipesResult = pipesClient.process(new FetchEmitTuple(request.getFetchKey(),
>         new FetchKey(fetcher.getName(), request.getFetchKey()), new EmitKey(), tikaMetadata,
>         HandlerConfig.DEFAULT_HANDLER_CONFIG, FetchEmitTuple.ON_PARSE_EXCEPTION.SKIP));
> the tikaMetadata is not present in the fetch data when the fetch method is called.
>  
> It's OK through this part:
>     UnsynchronizedByteArrayOutputStream bos = UnsynchronizedByteArrayOutputStream.builder().get();
>     try (ObjectOutputStream objectOutputStream = new ObjectOutputStream(bos)) {
>         objectOutputStream.writeObject(t);
>     }
>     byte[] bytes = bos.toByteArray();
>     output.write(CALL.getByte());
>     output.writeInt(bytes.length);
>     output.write(bytes);
>     output.flush();
>  
> I verified that the bytes contain the expected metadata at that point.
>  
> UPDATE: found the issue.
>  
> org.apache.tika.pipes.PipesServer#parseFromTuple is using a new Metadata when
> it should only use an empty Metadata if the fetch tuple's metadata is null.
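
The framing quoted in the description (a marker byte, a length, then the serialized tuple) and the null fallback suggested in the UPDATE can be sketched together as below. This is a self-contained illustration rather than Tika code: `SketchTuple`, the `CALL` constant, and `metadataForFetch` are hypothetical names, a plain `ByteArrayOutputStream` stands in for `UnsynchronizedByteArrayOutputStream`, and a `HashMap` stands in for `Metadata`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class FramedTupleSketch {

    // Hypothetical marker byte standing in for CALL.getByte() in the quoted code.
    static final byte CALL = 1;

    /** Hypothetical stand-in for FetchEmitTuple; only the fields this sketch needs. */
    static final class SketchTuple implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fetchKey;
        final HashMap<String, String> metadata; // may be null

        SketchTuple(String fetchKey, HashMap<String, String> metadata) {
            this.fetchKey = fetchKey;
            this.metadata = metadata;
        }
    }

    /** Writes one marker byte, a 4-byte length, then the serialized tuple. */
    static byte[] writeFrame(SketchTuple t) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream objectOutputStream = new ObjectOutputStream(bos)) {
            objectOutputStream.writeObject(t);
        }
        byte[] bytes = bos.toByteArray();

        ByteArrayOutputStream frame = new ByteArrayOutputStream();
        DataOutputStream output = new DataOutputStream(frame);
        output.write(CALL);
        output.writeInt(bytes.length);
        output.write(bytes);
        output.flush();
        return frame.toByteArray();
    }

    /** Reads a frame produced by writeFrame and deserializes the tuple. */
    static SketchTuple readFrame(byte[] frame) throws IOException, ClassNotFoundException {
        DataInputStream input = new DataInputStream(new ByteArrayInputStream(frame));
        byte marker = input.readByte();
        if (marker != CALL) {
            throw new IOException("unexpected marker byte: " + marker);
        }
        byte[] bytes = new byte[input.readInt()];
        input.readFully(bytes);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SketchTuple) in.readObject();
        }
    }

    /** Uses the tuple's metadata when present; falls back to an empty map only when it is null. */
    static Map<String, String> metadataForFetch(SketchTuple t) {
        return t.metadata != null ? t.metadata : new HashMap<String, String>();
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, String> metadata = new HashMap<>();
        metadata.put("source", "crawler-a");
        SketchTuple withMetadata = readFrame(writeFrame(new SketchTuple("doc-1", metadata)));
        SketchTuple withoutMetadata = readFrame(writeFrame(new SketchTuple("doc-2", null)));

        System.out.println(metadataForFetch(withMetadata));
        System.out.println(metadataForFetch(withoutMetadata));
    }
}
```

Round-tripping the frame is one way to confirm that the metadata survives serialization, so any loss has to happen after the tuple is read on the server side.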





Re: [PR] TIKA-4252: add request metadata [tika]

2024-05-10 Thread via GitHub


tballison commented on code in PR #1753:
URL: https://github.com/apache/tika/pull/1753#discussion_r1596634451


##
tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java:
##
@@ -455,33 +455,33 @@ private Fetcher getFetcher(FetchEmitTuple t) {
 }
 }
 
-protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple t, Fetcher fetcher) {
-FetchKey fetchKey = t.getFetchKey();
+protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple fetchEmitTuple, Fetcher fetcher) {
+FetchKey fetchKey = fetchEmitTuple.getFetchKey();
+Metadata fetchResponseMetadata = new Metadata();

Review Comment:
   The metadata in the FetchEmitTuple was envisioned as user-injected
   metadata that is injected after the parse and then emitted (e.g.
   provenance metadata).

   I think we need to put both metadata objects on the FetchEmitTuple.

   This is what I'm thinking...let me know what you think.

   So, there will be three metadata objects in play. The FetchEmitTuple will
   have a fetchRequestMetadata (???) and a userMetadata (???). At parse time,
   we'll create a fresh metadata object, which we'll call "responseMetadata",
   and pass it in the following call:
   fetcher.fetch(requestMetadata, responseMetadata).

   The parse will then use the responseMetadata and, after the parse, inject
   the userMetadata from the FetchEmitTuple.

   The fetcher may use the fetchRequestMetadata to carry out its request, but
   info from that one should not make it into the "responseMetadata" or into
   the emit data.






[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845299#comment-17845299
 ] 

ASF GitHub Bot commented on TIKA-4252:
--

tballison commented on code in PR #1753:
URL: https://github.com/apache/tika/pull/1753#discussion_r1596634451


##
tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java:
##
@@ -455,33 +455,33 @@ private Fetcher getFetcher(FetchEmitTuple t) {
 }
 }
 
-protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple t, Fetcher fetcher) {
-FetchKey fetchKey = t.getFetchKey();
+protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple fetchEmitTuple, Fetcher fetcher) {
+FetchKey fetchKey = fetchEmitTuple.getFetchKey();
+Metadata fetchResponseMetadata = new Metadata();

Review Comment:
   The metadata in the FetchEmitTuple was envisioned as user-injected
   metadata that passed through the parse process and was emitted
   (provenance metadata).

   I think we need to put both metadata objects on the FetchEmitTuple.







Re: [PR] TIKA-4252: add request metadata [tika]

2024-05-10 Thread via GitHub


tballison commented on code in PR #1753:
URL: https://github.com/apache/tika/pull/1753#discussion_r1596634451


##
tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java:
##
@@ -455,33 +455,33 @@ private Fetcher getFetcher(FetchEmitTuple t) {
 }
 }
 
-protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple t, Fetcher fetcher) {
-FetchKey fetchKey = t.getFetchKey();
+protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple fetchEmitTuple, Fetcher fetcher) {
+FetchKey fetchKey = fetchEmitTuple.getFetchKey();
+Metadata fetchResponseMetadata = new Metadata();

Review Comment:
   The metadata in the FetchEmitTuple was envisioned as user-injected
   metadata that passed through the parse process and was emitted
   (provenance metadata).

   I think we need to put both metadata objects on the FetchEmitTuple.






[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-10 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845229#comment-17845229
 ] 

Hudson commented on TIKA-4252:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk11 #1625 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1625/])
TIKA-4252: add request metadata (#1753) (github: 
[https://github.com/apache/tika/commit/b068e4290ad311b1e5f1ddaa6afa40be9e7bd797])
* (edit) tika-core/src/main/java/org/apache/tika/pipes/fetcher/fs/FileSystemFetcher.java
* (edit) tika-core/src/test/java/org/apache/tika/pipes/fetcher/MockFetcher.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/fetcher/EmptyFetcher.java
* (edit) tika-pipes/tika-fetchers/tika-fetcher-http/src/main/java/org/apache/tika/pipes/fetcher/http/HttpFetcher.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/fetcher/url/UrlFetcher.java
* (edit) tika-pipes/tika-fetchers/tika-fetcher-gcs/src/main/java/org/apache/tika/pipes/fetcher/gcs/GCSFetcher.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/fetcher/Fetcher.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/fetcher/RangeFetcher.java
* (edit) tika-core/src/test/java/org/apache/tika/pipes/async/MockFetcher.java
* (edit) tika-pipes/tika-fetchers/tika-fetcher-az-blob/src/main/java/org/apache/tika/pipes/fetcher/azblob/AZBlobFetcher.java
* (edit) tika-pipes/tika-fetchers/tika-fetcher-s3/src/main/java/org/apache/tika/pipes/fetcher/s3/S3Fetcher.java

