Hi,

On Sat, Aug 20, 2011 at 3:39 PM, nirnaydewan <[email protected]> wrote:
> Now, further expanding Doc A node, i want to show a preview of the whole
> text like it was in document with formatting and all.
>
> Is it really possible? Because to what i have known till now is that, only
> the text is extracted and stored with all formatting discarded.

Not really. Tika's output is mostly designed for full text extraction, so to
get a properly formatted preview you'd also need to store the original
document and have some custom preview generation code that understands that
document format.

There's been some work on making the XHTML output from Tika more useful as a
rough preview of the document, so you could try using that. However, even
with possible future improvements, the XHTML output from Tika will never be
able to mirror all (or in many cases even most) formatting details of the
original document.

Finally, there's an open feature request
(https://issues.apache.org/jira/browse/TIKA-90) for making it possible for
Tika parsers to create thumbnail images of the parsed documents. Once such a
feature gets implemented, it would be possible for an application to also use
such images as a simple preview mechanism.

BR,

Jukka Zitting
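
As a rough illustration of the XHTML route mentioned above, here is a minimal sketch of a SAX handler that serializes XHTML events into an HTML string that could be stored as a preview. It uses only the JDK's own XML APIs; the class name `PreviewHandlerSketch` and the helpers `newHtmlHandler` and `demo` are hypothetical names made up for this example. The assumption is that, with Tika on the classpath, the same handler would be handed to a parser as shown in the comment, instead of being fed events by hand.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class PreviewHandlerSketch {

    /** Hypothetical helper: a SAX handler that writes incoming events to 'out' as HTML. */
    static TransformerHandler newHtmlHandler(StringWriter out)
            throws TransformerConfigurationException {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    /** Fires a few SAX events by hand to stand in for a parser. */
    static String demo() throws Exception {
        StringWriter out = new StringWriter();
        TransformerHandler handler = newHtmlHandler(out);

        // Assumption: with Tika available, the events would come from a parser, e.g.
        //   new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext());
        char[] text = "A rough preview".toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The resulting string only carries whatever structure the parser emits as XHTML (paragraphs, headings, tables and so on), so it should be treated as an approximation of the document rather than a faithful rendering.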
