tballison commented on code in PR #1753:
URL: https://github.com/apache/tika/pull/1753#discussion_r1596634451
##########
tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java:
##########
@@ -455,33 +455,33 @@ private Fetcher getFetcher(FetchEmitTuple t) {
         }
     }
 
-    protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple t, Fetcher fetcher) {
-        FetchKey fetchKey = t.getFetchKey();
+    protected MetadataListAndEmbeddedBytes parseFromTuple(FetchEmitTuple fetchEmitTuple, Fetcher fetcher) {
+        FetchKey fetchKey = fetchEmitTuple.getFetchKey();
+        Metadata fetchResponseMetadata = new Metadata();

Review Comment:
The metadata that goes in the FetchEmitTuple was envisioned as user-supplied metadata (e.g. provenance metadata) that is injected after the parse and then emitted. I think we need to put both metadatas on the FetchEmitTuple.

This is what I'm thinking; let me know what you think. There will be three metadatas in play. The FetchEmitTuple will carry a fetchRequestMetadata (name TBD) and a userMetadata (name TBD). At parse time, we'll create a fresh Metadata object, which we'll call "responseMetadata", and pass it in the call fetcher.fetch(fetchRequestMetadata, responseMetadata). The parse will then use the responseMetadata and, after the parse, inject the userMetadata from the FetchEmitTuple.

The fetcher may use the fetchRequestMetadata to carry out its request, but nothing from it should make it into the responseMetadata or into the emitted data.
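A minimal Java sketch of the proposed three-metadata flow, to make the separation concrete. Everything here is hypothetical: the `Tuple` class, the three-argument `Fetcher.fetch`, and the use of `Map<String, String>` as a stand-in for Tika's `Metadata` are illustrations, not Tika's actual API. It also assumes that user metadata overwrites parse output when keys collide; the comment does not specify a precedence.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the three-metadata flow from the review comment.
 * Not Tika's real API: Map<String, String> stands in for org.apache.tika.metadata.Metadata.
 */
class MetadataFlowSketch {

    /** Hypothetical stand-in for FetchEmitTuple carrying the two caller-supplied metadatas. */
    static final class Tuple {
        final String fetchKey;
        final Map<String, String> fetchRequestMetadata; // read only by the fetcher
        final Map<String, String> userMetadata;         // injected after the parse

        Tuple(String fetchKey, Map<String, String> fetchRequestMetadata,
              Map<String, String> userMetadata) {
            this.fetchKey = fetchKey;
            this.fetchRequestMetadata = fetchRequestMetadata;
            this.userMetadata = userMetadata;
        }
    }

    /** Hypothetical fetcher: may read the request metadata, may write the response metadata. */
    interface Fetcher {
        InputStream fetch(String fetchKey, Map<String, String> requestMetadata,
                          Map<String, String> responseMetadata) throws IOException;
    }

    static Map<String, String> parseFromTuple(Tuple tuple, Fetcher fetcher) throws IOException {
        // Fresh object per parse, so request-side values only appear if the fetcher copies them.
        Map<String, String> responseMetadata = new LinkedHashMap<>();
        try (InputStream stream =
                 fetcher.fetch(tuple.fetchKey, tuple.fetchRequestMetadata, responseMetadata)) {
            // Placeholder "parse": record the byte count; a real parser would add much more.
            byte[] bytes = stream.readAllBytes();
            responseMetadata.put("content-length-parsed", Integer.toString(bytes.length));
        }
        // After the parse, overlay the user metadata (e.g. provenance) so it gets emitted.
        responseMetadata.putAll(tuple.userMetadata);
        return responseMetadata;
    }

    static Map<String, String> runExample() {
        Map<String, String> request = Map.of("Authorization", "Bearer secret-token");
        Map<String, String> user = Map.of("provenance", "crawl-2024-05");
        Tuple tuple = new Tuple("docs/report.txt", request, user);

        Fetcher fetcher = (key, req, resp) -> {
            // Uses the request metadata (an auth header here) but reports only response facts.
            if (!req.containsKey("Authorization")) {
                throw new IOException("unauthorized");
            }
            resp.put("http-status", "200");
            return new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        };
        try {
            return parseFromTuple(tuple, fetcher);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

The emitted map ends up with `http-status`, `content-length-parsed` and `provenance` entries, while the `Authorization` value the fetcher consumed never appears in it, which is the invariant the comment asks for.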