I have found that ppthtml 0.4 from www.xlhtml.org (now relocated to http://chicago.sourceforge.net/xlhtml), which is what I use, does not always succeed in extracting text after the first embedded image.
I have not found problems with ppthtml on RedHat Linux, but on Solaris the process size can become very large. With .ppt files over 20 MB, I doubt it would run at all.
David Adams Corporate Information Services Information Systems Services University of Southampton
----- Original Message ----- From: "F. Spitzer, GEOSYSTEMS" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, January 19, 2005 6:48 AM
Subject: [htdig] Indexing large Powerpoints
Good morning List!
I have one problem to solve. Maybe you can help me?
We have a huge PowerPoint collection (more than 250 files), so I want htdig to build an index that lets users search for keywords.
Things are working so far. Htdig does its job quite well. The only remaining problem is with .ppt files larger than 20 MB, and unfortunately nearly 50% of the files are larger than 20 MB.
I set max_doc_size to 80000000 (80 MB, the size of the largest .ppt). But running htdig produces the following output: Input file size of 45956608 at or above 20000000 limit.
It seems to me that htdig has another limit that ignores the value set by max_doc_size.
How can I overcome this limitation?
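To see how many decks are affected before changing anything, a short script can list every .ppt file at or above the 20000000-byte limit reported in the htdig output above. This is a minimal Python sketch: the threshold is copied from that message, and the comparison uses >= to match "at or above".

```python
from pathlib import Path

# Threshold taken from the htdig message quoted above (bytes).
LIMIT = 20_000_000


def files_over_limit(root: Path, limit: int = LIMIT) -> list[tuple[Path, int]]:
    """Return (path, size) for every .ppt under root whose size is >= limit."""
    hits = []
    for p in sorted(root.rglob("*")):
        # Compare the suffix case-insensitively so .PPT files are found too.
        if p.is_file() and p.suffix.lower() == ".ppt":
            size = p.stat().st_size
            if size >= limit:
                hits.append((p, size))
    return hits
```

For example, `files_over_limit(Path("/data/ppt"))` (a hypothetical path) returns the affected files together with their sizes.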
I thought about writing a shell script that converts the .ppt files to HTML before running htdig. Htdig would then use the HTML files to build the index. Using url_part_aliases both when creating the database and when searching would map the HTML document locations back to the original .ppt locations.
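The conversion step described above might look roughly like the following. The thread proposes a shell script, and this Python version is only an illustration. It assumes that ppthtml writes the HTML for one .ppt file to standard output and that the HTML mirror is served over HTTP so htdig can crawl it. All paths and URLs are hypothetical. The second half is a simplified model of the url_part_aliases round trip, not htdig's own code. The indexing configuration and the search configuration share the same short codes (`*ppt`, `*ext`) but expand them to different URL parts. Aliasing the `.ppt.html` suffix assumes url_part_aliases matches a file suffix as well as a URL prefix, so check this against the htdig documentation for your version.

```python
import subprocess
from pathlib import Path

# Assumption: ppthtml is on PATH and, given one .ppt file, writes HTML to stdout.
PPTHTML = "ppthtml"


def html_path_for(ppt: Path, src_root: Path, dst_root: Path) -> Path:
    """Mirror src_root/<sub>/deck.ppt as dst_root/<sub>/deck.ppt.html."""
    rel = ppt.relative_to(src_root)
    return dst_root / rel.parent / (rel.name + ".html")


def convert_tree(src_root: Path, dst_root: Path) -> list[Path]:
    """Convert every *.ppt under src_root; return the files ppthtml failed on."""
    failed = []
    for ppt in sorted(src_root.rglob("*.ppt")):  # case-sensitive on most Unix systems
        out = html_path_for(ppt, src_root, dst_root)
        # Skip decks whose HTML copy is at least as new as the .ppt itself.
        if out.exists() and out.stat().st_mtime >= ppt.stat().st_mtime:
            continue
        out.parent.mkdir(parents=True, exist_ok=True)
        with open(out, "wb") as fh:
            # Raises FileNotFoundError if ppthtml is not installed.
            result = subprocess.run([PPTHTML, str(ppt)], stdout=fh)
        if result.returncode != 0:
            out.unlink()  # don't leave a partial file that looks up to date
            failed.append(ppt)
    return failed


# Simplified model of url_part_aliases (hypothetical URLs): the HTML copies
# are crawled under /ppt-html/, the originals are served under /ppt/.
INDEX_ALIASES = [("http://intranet/ppt-html/", "*ppt"), (".ppt.html", "*ext")]
SEARCH_ALIASES = [("http://intranet/ppt/", "*ppt"), (".ppt", "*ext")]


def to_db(url: str, aliases) -> str:
    """Indexing side: replace each 'from' part with its short code."""
    for frm, code in aliases:
        url = url.replace(frm, code)
    return url


def from_db(url: str, aliases) -> str:
    """Search side: expand each short code with this config's 'from' part."""
    for frm, code in aliases:
        url = url.replace(code, frm)
    return url
```

A run such as `convert_tree(Path("/data/ppt"), Path("/data/ppt-html"))` returns the decks ppthtml could not convert. That list is worth reviewing, given the reports of ppthtml losing text after embedded images and using a lot of memory on large files.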
Has anybody done this before? Or, even better, is there another solution to my problem?
Thanks a lot for your help. Any hints are welcome.
Cheers Fritz
Fritz Spitzer Schulungsleitung und Systemintegration
-------------------------------------------------------------------- GEOSYSTEMS GmbH Riesstraße 10, D-82110 Germering, GERMANY www.geosystems.de
E: [EMAIL PROTECTED] T: +49-(0)89-89 43 43 -0 (Ext. -20) F: +49-(0)89-89 43 43 99
Subscribe to our newsletter to stay up to date: www.geosystems.de/newsletter
------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ ht://Dig general mailing list: <[email protected]> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html List information (subscribe/unsubscribe, etc.) https://lists.sourceforge.net/lists/listinfo/htdig-general

