Some of the tools listed shell out to a command-line executable that dumps the document to text; I then grab that text and add it to a Lucene document, and so on.
Any stats on the scalability of that? In large-scale applications I'm assuming this will cause some serious issues... anyone have any input on this?

-Chris Fraschetti

On Thu, 09 Sep 2004 09:54:43 -0700, David Spencer <[EMAIL PROTECTED]> wrote:
> Honey George wrote:
>
> > Hi,
> >  I know some of them.
> > 1. PDF
> >  + http://www.pdfbox.org/
> >  + http://www.foolabs.com/xpdf/download.html
> >    - I am using this and found good. It even supports
>
> My dated experience from 2 years ago was that (the evil, native code)
> foolabs pdf parser was the best, but obviously things could have changed.
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02912.html
>
> >      various languages.
> > 2. word
> >  + http://sourceforge.net/projects/wvware
> > 3. excel
> >  + http://www.jguru.com/faq/view.jsp?EID=1074230
> >
> > -George
> >
> >  --- [EMAIL PROTECTED] wrote:
> >
> >>Anyone know of any reliable parsers out there for
> >>pdf word
> >>excel or powerpoint?
>
> For powerpoint it's not easy. I've been using this and it has worked
> fine until recently and seems to sometimes go into an infinite loop now
> on some recent PPTs. Native code and a package that seems to be dormant
> but to some extent it does the job. The file "ppthtml" does the work.
>
> http://chicago.sourceforge.net/xlhtml
>
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>For additional commands, e-mail: [EMAIL PROTECTED]
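
For what it's worth, here is a minimal sketch of the shell-out approach described at the top of this message: run the converter as a child process, capture its stdout, and give up after a timeout so that a converter stuck in a loop (as reported for ppthtml) cannot stall the whole indexing run. The class name ExternalTextExtractor, the method name extract, and the timeout parameter are all made up for illustration, and it assumes a JVM with Process.waitFor(long, TimeUnit), i.e. Java 8 or later. It does not touch Lucene at all; the returned string is what you would put into a field of your Lucene document.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Lucene or of any of the tools in this thread.
class ExternalTextExtractor {

    /**
     * Runs a converter command and returns whatever it wrote to stdout.
     * Throws IOException if the process exits non-zero or exceeds the timeout.
     */
    static String extract(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Send stdout to a temp file so the child cannot block on a full pipe
        // while we are waiting for it to exit.
        File out = File.createTempFile("extract", ".txt");
        try {
            Process p = new ProcessBuilder(command)
                    .redirectOutput(out)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // Converter is hung or just too slow: kill it and report.
                p.destroyForcibly().waitFor();
                throw new IOException("converter timed out: " + command);
            }
            if (p.exitValue() != 0) {
                throw new IOException("converter exited with " + p.exitValue());
            }
            // Assumption: the converter emits UTF-8 text.
            return new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8);
        } finally {
            out.delete();
        }
    }
}
```

With xpdf installed, a call might look like extract(Arrays.asList("pdftotext", "in.pdf", "-"), 30), where "-" asks pdftotext to write to stdout. On the scalability question, this costs one process launch per document, so if it turns out to be a bottleneck the usual options would be to run several extractions in parallel or to use a pure-Java parser such as PDFBox for the formats that have one.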