tballison commented on code in PR #1753:
URL: https://github.com/apache/tika/pull/1753#discussion_r1596634451


##########
tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java:
##########
@@ -455,33 +455,33 @@ private Fetcher getFetcher(FetchEmitTuple t) {
         }
     }
 
-    protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple t, Fetcher fetcher) {
-        FetchKey fetchKey = t.getFetchKey();
+    protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple fetchEmitTuple, Fetcher fetcher) {
+        FetchKey fetchKey = fetchEmitTuple.getFetchKey();
+        Metadata fetchResponseMetadata = new Metadata();

Review Comment:
   The metadata that goes in the FetchEmitTuple was envisioned as user-injected metadata: added after the parse and then emitted (e.g. provenance metadata).
   
   I think we need to put both metadata objects on the FetchEmitTuple.
   
   This is what I'm thinking. Let me know what you think.
   
   So, there will be three metadata objects in play. The FetchEmitTuple will have a fetchRequestMetadata and a userMetadata (names TBD). At parse time, we'll create a fresh metadata object, which we'll call "responseMetadata", and pass it in the following call: fetcher.fetch(requestMetadata, responseMetadata), where requestMetadata is the tuple's fetchRequestMetadata.
   
   The parse will then use the responseMetadata and, after the parse, inject the userMetadata from the fetchEmitTuple.
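   A minimal Java sketch of that flow, assuming plain `Map<String, String>` objects stand in for Tika's `Metadata`. The `MetadataAwareFetcher` interface, the `Tuple` record, and the stub parse step are hypothetical illustrations, not existing Tika APIs. The comment doesn't say whether userMetadata goes into every embedded document's metadata or only the container's, or whether it overrides parser-set values; this sketch assumes every object and lets the user values win.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProposedParseFlow {

    // Hypothetical fetcher contract from the proposal: it may read the request
    // metadata and writes what it learns about the fetched resource into responseMetadata.
    interface MetadataAwareFetcher {
        String fetch(Map<String, String> requestMetadata, Map<String, String> responseMetadata);
    }

    // Hypothetical tuple carrying the two caller-supplied metadata objects.
    record Tuple(Map<String, String> fetchRequestMetadata, Map<String, String> userMetadata) {}

    static List<Map<String, String>> parseFromTuple(Tuple tuple, MetadataAwareFetcher fetcher) {
        // A fresh object for every parse; it is never taken from the tuple.
        Map<String, String> responseMetadata = new HashMap<>();
        String content = fetcher.fetch(tuple.fetchRequestMetadata(), responseMetadata);

        // Stub parse: the container document's metadata starts from responseMetadata.
        Map<String, String> container = new HashMap<>(responseMetadata);
        container.put("content", content);
        List<Map<String, String>> metadataList = new ArrayList<>();
        metadataList.add(container);

        // After the parse, inject the user metadata (e.g. provenance) into every
        // metadata object; user values overwrite parser values on key collisions.
        for (Map<String, String> metadata : metadataList) {
            metadata.putAll(tuple.userMetadata());
        }
        return metadataList;
    }
}
```

   Because responseMetadata is created inside parseFromTuple, nothing in fetchRequestMetadata reaches the emitted list unless the fetcher explicitly copies it there.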
   
   The fetcher may use the fetchRequestMetadata to carry out its request, but information from it should not make it into the responseMetadata or into the emitted data.
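   One way a fetcher could honor that rule, sketched in Java with plain maps standing in for `Metadata`. The `fetch` signature, the `auth-token` key, and the `checkNoLeak` guard are hypothetical illustrations, not Tika APIs. The fetcher reads a credential from the request metadata but records only facts about the fetched bytes. The guard only catches request entries copied verbatim under the same key, not renamed or transformed copies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Objects;

public class RequestMetadataIsolation {

    // Hypothetical fetcher: request metadata drives the fetch, but only facts
    // about the fetched resource are written to responseMetadata.
    static byte[] fetch(String fetchKey, Map<String, String> requestMetadata,
                        Map<String, String> responseMetadata) {
        // Stand-in for a credential the real request would send;
        // it is checked here and never copied out.
        String token = requestMetadata.get("auth-token");
        if (token == null) {
            throw new IllegalArgumentException("missing auth-token for " + fetchKey);
        }
        // Stand-in for a real authenticated fetch.
        byte[] body = ("contents of " + fetchKey).getBytes(StandardCharsets.UTF_8);
        responseMetadata.put("fetch-key", fetchKey);
        responseMetadata.put("content-length", Integer.toString(body.length));
        return body;
    }

    // Pre-emit guard: fails if any request entry appears verbatim
    // (same key, same value) in the data about to be emitted.
    static void checkNoLeak(Map<String, String> requestMetadata, Map<String, String> emitData) {
        for (Map.Entry<String, String> entry : requestMetadata.entrySet()) {
            if (emitData.containsKey(entry.getKey())
                    && Objects.equals(entry.getValue(), emitData.get(entry.getKey()))) {
                throw new IllegalStateException(
                        "request metadata leaked into emit data: " + entry.getKey());
            }
        }
    }
}
```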



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
