Hi David,
On Tue, Aug 31, 2004 at 02:02:35PM +0100, David Adams wrote:
> Martin,
>
> This is a joke, yes?
>
> If not, please note that the Lucene FAQs make it clear that it is equally
> dependent on external parsers.
That's what a colleague recommended to me: to look in the Lucene FAQ to see
whether there is any alternative to the parsers already mentioned.
Honestly, I didn't take a look before writing my email. :(
The fact is: indexing a web tree with the mentioned ppthtml, xlhtml or
xpdf takes ten times longer, at a load of 10 on a dual-processor Sun V480
with 4 GB RAM. When indexing only .doc and .html files, rundig completes in
about 30 minutes.
I keep finding hanging ppthtml and xlhtml processes, each using nearly 95% CPU
and about 1 GB RAM for a single document. Of course, those processes
never finish and have to be killed... :(
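
One possible way to contain such runaway converters is to run each one
under a hard time limit, so a hung process is killed automatically instead
of by hand. A minimal sketch in Python follows; the function name
convert_with_timeout and the 60-second default are made up for
illustration and are not part of ht://Dig or of any of the converters.

```python
import subprocess


def convert_with_timeout(cmd, timeout_s=60):
    """Run an external converter command (a list of arguments).

    Returns the converter's stdout as bytes, or None if the converter
    could not be started, exited with an error, or ran longer than
    timeout_s seconds. (Hypothetical helper, for illustration only.)
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,      # capture the converted output
            stderr=subprocess.DEVNULL,   # discard converter diagnostics
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this,
        # so no hung process is left behind.
        return None
    except OSError:
        # Converter is not installed or not executable.
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```

Assuming ppthtml writes its HTML to standard output, a call such as
convert_with_timeout(["ppthtml", "slides.ppt"], timeout_s=120) would
return either the HTML or None, where slides.ppt is a placeholder file
name. Documents that return None could simply be skipped, so one bad
file does not stall the whole indexing run. This limits only the run time;
capping the memory use would need something extra, such as ulimit.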
> We use wp2html to convert Word documents and it's fine, but we bought it only
> because we needed to convert WordPerfect documents (not that we get many!)
>
> David Adams
Yours,
Martin
--
--------------------------------------------------------
arago AG, Institut fuer komplexes Datenmanagement
Am Niddatal 3, 60488 Frankfurt/Main, [EMAIL PROTECTED]
Tel. 069/405680, Fax 069/40568111, http://www.arago.de
--------------------------------------------------------

