On Thu, 17 Jun 2010, Max Valjanski wrote:
I tried to do that, but I found that it does not fit into the Tika architecture. The whole file has to be read in order to parse an OLE container.

Yup, I've found much the same thing. My idea was to have a new detector that you can layer in between the others, which will parse the containers and keep the parsed result around in case it's needed later. If you don't want it, just leave it out of the chain.
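
To make the layering idea a bit more concrete, here is a rough, self-contained Java sketch. It is not the patch and it does not use Tika's real API: SimpleDetector, ContainerParser, ContainerDetector, MagicDetector, the "parsedContainer" context key and the "application/x-ole2" placeholder type are all made-up names for illustration, and the whole file is assumed to be already buffered in a byte array. The container parsing itself is stubbed out behind an interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Tika's real API.
class DetectorChainSketch {

    // OLE2 compound document signature (first eight bytes of the file).
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    /** Returns a media type, or null when this detector cannot decide. */
    interface SimpleDetector {
        String detect(byte[] data, Map<String, Object> context);
    }

    /** Stand-in for a real OLE2 reader; returns the top-level entry names. */
    interface ContainerParser {
        List<String> entryNames(byte[] wholeFile);
    }

    static boolean startsWithMagic(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Cheap detector: looks at the header bytes only. */
    static class MagicDetector implements SimpleDetector {
        public String detect(byte[] data, Map<String, Object> context) {
            return startsWithMagic(data) ? "application/x-ole2" : null;
        }
    }

    /** Optional layer: parses the whole container and keeps the result. */
    static class ContainerDetector implements SimpleDetector {
        static final String CONTEXT_KEY = "parsedContainer"; // assumed key name
        private final ContainerParser parser;

        ContainerDetector(ContainerParser parser) {
            this.parser = parser;
        }

        public String detect(byte[] data, Map<String, Object> context) {
            if (!startsWithMagic(data)) {
                return null; // not an OLE2 file, nothing to parse
            }
            List<String> entries = parser.entryNames(data);
            context.put(CONTEXT_KEY, entries); // kept so a later parser can reuse it
            if (entries.contains("WordDocument")) return "application/msword";
            if (entries.contains("Workbook")) return "application/vnd.ms-excel";
            return null; // unknown container: let the next detector answer
        }
    }

    /** Runs the detectors in order; the first non-null answer wins. */
    static String detect(List<SimpleDetector> chain, byte[] data,
                         Map<String, Object> context) {
        for (SimpleDetector detector : chain) {
            String type = detector.detect(data, context);
            if (type != null) {
                return type;
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] file = Arrays.copyOf(OLE2_MAGIC, 16); // header plus zero padding
        ContainerParser fakeParser = bytes -> Arrays.asList("WordDocument");

        // Chain with the container layer in front of the cheap detector.
        List<SimpleDetector> full = new ArrayList<>();
        full.add(new ContainerDetector(fakeParser));
        full.add(new MagicDetector());
        System.out.println(detect(full, file, new HashMap<>()));

        // Same chain with the container layer left out.
        List<SimpleDetector> cheap = new ArrayList<>();
        cheap.add(new MagicDetector());
        System.out.println(detect(cheap, file, new HashMap<>()));
    }
}
```

With the container layer in the chain the sample input is reported as a Word document and the parsed entries stay in the context map; without it you only get the generic OLE2 answer from the magic bytes, and nothing has to be parsed at all.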

I'm not sure if what I've done makes sense, but I've attached a patch to TIKA-447 that demos the idea. Do people think the idea is worth pursuing further, or should we try something different?

Nick
