Hi,

On Mon, Aug 22, 2011 at 11:06 AM, nirnaydewan <nirnayde...@gmail.com> wrote:
> But for the XHTML output, i believe that is one time process while
> extraction is being done. That means again i have to store/index that xhtml
> output text as well for later use. Is this correct or am i missing
> something?

Correct, you'd need to store the preview somewhere. Note that with the TeeContentHandler class you can get both a text-only output for indexing and an XHTML output for preview from a single parsing pass through Tika.

BR,

Jukka Zitting
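
For illustration, here is a minimal sketch of the single-pass approach with TeeContentHandler. It assumes tika-core and tika-parsers are on the classpath. The class name TeeExample, the Result holder and the extract method are hypothetical names made up for this example, and the choice of BodyContentHandler for the text side and ToXMLContentHandler for the XHTML side is just one reasonable pairing, not the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.TeeContentHandler;
import org.apache.tika.sax.ToXMLContentHandler;

public class TeeExample {

    /** Hypothetical holder for the two outputs of one parse. */
    public static final class Result {
        public final String text;   // plain text, e.g. for indexing
        public final String xhtml;  // XHTML markup, e.g. for a stored preview

        Result(String text, String xhtml) {
            this.text = text;
            this.xhtml = xhtml;
        }
    }

    /** Parses the stream once and returns both representations. */
    public static Result extract(InputStream in) throws Exception {
        // -1 disables the default write limit on the text handler
        BodyContentHandler textHandler = new BodyContentHandler(-1);
        // Serializes the SAX events it receives as XHTML markup
        ToXMLContentHandler xhtmlHandler = new ToXMLContentHandler();
        // Forwards every SAX event from the parser to both handlers
        TeeContentHandler tee = new TeeContentHandler(textHandler, xhtmlHandler);

        new AutoDetectParser().parse(in, tee, new Metadata(), new ParseContext());
        return new Result(textHandler.toString(), xhtmlHandler.toString());
    }

    public static void main(String[] args) throws Exception {
        // Sample input is made up for the example; any document stream works
        byte[] sample = "Hello Tika".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            Result r = extract(in);
            System.out.println(r.text);   // send this to the index
            System.out.println(r.xhtml);  // store this for later preview
        }
    }
}
```

The document is parsed only once, but the XHTML string still has to be persisted somewhere (a stored index field, a database column, a file) if it is needed later.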