Dejan Nenov wrote:
Second that - I was a client of Stellent - the libs work great but are
expensive. To see Stellent in action - get a copy of the free X1 desktop
search or the X1 server (Lucene based).

I would say that the libs work great but are slow.

One problem is that they don't provide a real Java API. The "Java" API they do provide is sample code that calls a native executable; it isn't even a JNI library. So you pay the cost of starting that native application every time you extract a document.
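Roughly, the per-document pattern looks like this. It is a hypothetical sketch using the JDK's `ProcessBuilder`, not Stellent's actual sample code, and the converter command is an assumption (here `echo` stands in for it). The point is that a new process is launched, and the converter initializes itself, once for every document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical wrapper illustrating the "shell out per document" pattern.
 * The converter command is an assumption, not Stellent's real CLI.
 */
public class ExternalExtractor {

    private final List<String> converterCommand;

    public ExternalExtractor(List<String> converterCommand) {
        this.converterCommand = List.copyOf(converterCommand);
    }

    /** Runs the converter on one file and returns whatever it writes to stdout. */
    public String extract(String path) throws IOException, InterruptedException {
        List<String> cmd = new ArrayList<>(converterCommand);
        cmd.add(path);
        // A brand-new OS process is started for every document: the process
        // launch plus the converter's own initialization is paid on each call.
        Process process = new ProcessBuilder(cmd)
                .redirectErrorStream(true)
                .start();
        String output;
        // Drain stdout before waiting so the child cannot block on a full pipe.
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("converter exited with status " + exit);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in converter: "echo" just prints its argument back.
        ExternalExtractor extractor = new ExternalExtractor(List.of("echo"));
        long start = System.nanoTime();
        for (int i = 0; i < 20; i++) {
            extractor.extract("document-" + i + ".doc");
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("20 extractions took " + elapsedMs + " ms");
    }
}
```

You can time a loop like the one in `main` against a real converter. That gives a quick sense of how much of the per-document cost is process startup rather than actual extraction.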

If all you want is the plain text, for many document types it's actually fairly fast, and it beats having to write code for every document type yourself (or locating libraries to do it for you). But as soon as you want the marked-up text, it becomes a completely different story. In our benchmarks, handling markup was something like 10 times slower than handling raw text and metadata. Most of this extra time was spent parsing the XML it outputs, which is often far more verbose than it needs to be for the amount of formatting it actually contains.
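The marked-up text is only usable after the converter's XML has been parsed, and every element costs parsing work even when it carries almost no formatting. The following is a minimal sketch using the JDK's built-in SAX parser. It assumes the output is well-formed XML; the element and attribute names are hypothetical and do not reflect Stellent's actual output format.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Streams through converter XML output and keeps only the text content.
 * The element names in main() are hypothetical, not Stellent's real schema.
 */
public class MarkupTextExtractor {

    /** Collects character data and counts the elements seen along the way. */
    public static final class TextHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int elementCount = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Every tag must still be tokenized and dispatched, even when it
            // carries no useful formatting; verbose markup means more of these.
            elementCount++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the whole XML string and returns the populated handler. */
    public static TextHandler parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextHandler handler = new TextHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical output: three nested elements wrapping two plain words.
        String xml = "<doc><para style=\"normal\"><run font=\"default\">Hello world</run></para></doc>";
        TextHandler h = parse(xml);
        System.out.println(h.text + " (" + h.elementCount + " elements)");
    }
}
```

A streaming SAX parse avoids building a DOM, but it still has to tokenize every tag and attribute. Output that is many times larger than the text it wraps therefore stays expensive to consume.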

Daniel


--
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/                        Fax: +61 2 9212 6902
