Hi,

On Sat, Aug 20, 2011 at 3:39 PM, nirnaydewan <[email protected]> wrote:
> Now, further expanding Doc A node, i want to show a preview of the whole
> text like it was in document with formatting and all.
>
> Is it really possible? Because to what i have known till now is that, only
> the text is extracted and stored with all formatting discarded.

Not really. Tika's output is designed mainly for full-text extraction,
so to get a properly formatted preview you'd also need to store the
original document and write custom preview generation code that
understands that document format.

There's been some work on making the XHTML output from Tika more
useful as a rough preview of the document, so you could try using
that. However, even with possible future improvements, the XHTML
output from Tika will never be able to mirror all (or in many cases
even most) formatting details of the original document.
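
As a rough illustration of the XHTML route, here is a minimal sketch
that uses only the JDK to turn SAX events into a markup string. The
class and method names (XhtmlPreview, newSerializer, renderSample) are
made up for this example. The hand-written events stand in for what a
Tika parser would emit; with Tika on the classpath you would instead
pass the handler to the parse() call shown in the comment.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class XhtmlPreview {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Builds a SAX ContentHandler that writes incoming events to 'out' as markup.
    static TransformerHandler newSerializer(Writer out) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Serializes a tiny hand-written event stream (an assumption for this
    // sketch). With Tika, replace the events between startDocument() and
    // endDocument() with:
    //   new AutoDetectParser().parse(stream, handler,
    //                                new Metadata(), new ParseContext());
    static String renderSample() {
        try {
            StringWriter out = new StringWriter();
            TransformerHandler handler = newSerializer(out);
            char[] text = "Hello".toCharArray();
            handler.startDocument();
            handler.startPrefixMapping("", XHTML);
            handler.startElement(XHTML, "p", "p", new AttributesImpl());
            handler.characters(text, 0, text.length);
            handler.endElement(XHTML, "p", "p");
            handler.endPrefixMapping("");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(renderSample());
    }
}
```

The resulting string could be stored next to the extracted text and
shown as the rough preview, keeping in mind the limits described above.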

Finally, there's an open feature request
(https://issues.apache.org/jira/browse/TIKA-90) for making it possible
for Tika parsers to create thumbnail images of the parsed documents.
Once such a feature is implemented, an application could also use
those images as a simple preview mechanism.

BR,

Jukka Zitting
