Dejan Nenov wrote:
Second that - I was a client of Stellent - the libs work great but are
expensive. To see Stellent in action - get a copy of the free X1 desktop
search or the X1 server (Lucene based).
I would say that the libs work great but are slow.
One problem is that they don't provide a Java API. The "Java" API they
provide is sample code which calls a native executable, not even a JNI
library. So you pay the penalty of that native app starting up every
time you extract a document.
If all you want is the plain text, for many document types it's actually
fairly fast, and beats having to write code for every document type
yourself (or locating libraries to do it for you.) But as soon as you
want the marked up text, it becomes a completely different story. We
benchmarked it to be something like 10 times slower to handle markup
than handling raw text and metadata. Most of this extra time was spent
parsing the XML it outputs, which is often far more verbose than it
needs to be for the amount of formatting it actually contains.
Daniel
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/ Fax: +61 2 9212 6902
This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]