janhoy commented on PR #3670:
URL: https://github.com/apache/solr/pull/3670#issuecomment-3325360384

   @epugh I pushed another small step of code, which:
   * Subclasses the test, so all the `assertQ` tests run for both the `local` 
and `tikaserver` backends
   * Makes some more tikaserver tests pass
   <img width="348" height="459" alt="Skjermbilde 2025-09-23 kl  19 29 10" 
src="https://github.com/user-attachments/assets/2f71100b-b1a4-4807-acbc-422aad69b11d"
 />
   
   Most of the failing tests fail because the tikaserver backend lacks support 
for the SAX-style capture feature of the Tika API. Others fail because of 
metadata differences between Tika1 and TikaServer3: where Tika1 would output 
metadata keys like `title` and `author`, these are now normalized to `dc:title` 
and `dc:author`. We can either provide a (feature-toggled?) mapper in 
TikaServer that tries to fill in those old metadata keys, and make the tests 
pass that way, or modify the tests to look for the "modern" normalized 
equivalents, which are output by both Tika1 and TikaServer. I think we can 
start with the latter.
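
For reference, the first option (a feature-toggled mapper) could look roughly like the sketch below. This is only an illustration under assumptions: the class `LegacyMetadataMapper`, the method `addLegacyKeys`, and the `enabled` flag are hypothetical names, not part of Solr or Tika, and the key pairs are just the two examples mentioned above. Real Tika metadata is multi-valued; the sketch uses single values to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not existing Solr or Tika code. */
public class LegacyMetadataMapper {

  // Assumed mapping from normalized keys to the old Tika1-style keys,
  // using only the two examples from the comment above.
  private static final Map<String, String> NORMALIZED_TO_LEGACY =
      Map.of("dc:title", "title", "dc:author", "author");

  /**
   * Returns a copy of the metadata. When enabled, each known normalized key
   * that is present is also copied to its legacy key, unless the legacy key
   * already has a value.
   */
  public static Map<String, String> addLegacyKeys(
      Map<String, String> metadata, boolean enabled) {
    Map<String, String> result = new LinkedHashMap<>(metadata);
    if (!enabled) {
      return result; // feature toggle off: leave metadata untouched
    }
    for (Map.Entry<String, String> e : NORMALIZED_TO_LEGACY.entrySet()) {
      String value = metadata.get(e.getKey());
      if (value != null) {
        result.putIfAbsent(e.getValue(), value);
      }
    }
    return result;
  }
}
```

With the toggle on, a document that only carries `dc:title` would also expose `title`, so the existing assertions on the old keys could keep passing without changes to the tests.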
   
   Unfortunately, it looks like Crave cannot run Testcontainers tests (no 
Docker available?)



