Some of the tools listed use command-line executables to convert a document
to text, and then I grab the text and add it to a Lucene document, etc.
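
A minimal sketch of that pattern, assuming Java 10+ (the class and method
names here are hypothetical, not part of Lucene or of any tool above). It runs
the external converter, captures its standard output as text, and kills the
process after a timeout so one hung converter can't stall the whole indexing run.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ExternalTextExtractor {

    /**
     * Runs an external converter command and returns what it writes to stdout.
     * The process is forcibly killed if it runs longer than timeoutSeconds,
     * which guards against converters that loop forever on bad input.
     */
    public static String extractText(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Throw away stderr so a chatty converter cannot block on a full pipe.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();
        process.getOutputStream().close(); // the converter gets no stdin

        // Drain stdout on its own thread; otherwise a large document can fill
        // the pipe buffer and block the child while we wait for it to exit.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Thread reader = new Thread(() -> {
            try (InputStream in = process.getInputStream()) {
                in.transferTo(buffer);
            } catch (IOException ignored) {
                // The stream is closed when the process is destroyed.
            }
        });
        reader.start();

        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            reader.join();
            throw new IOException("Converter timed out: " + command);
        }
        reader.join(); // join() makes the buffer contents visible here
        if (process.exitValue() != 0) {
            throw new IOException("Converter exited with status " + process.exitValue());
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }
}
```

Hypothetical usage with xpdf's pdftotext, where `-` as the output file sends
the text to stdout: `extractText(List.of("pdftotext", "report.pdf", "-"), 30)`.
The returned string then goes into a Lucene field as before. Note that this
still forks one process per document, which is exactly the cost in question.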

Are there any stats on the scalability of that? In large-scale applications,
I'd assume spawning an external process per document will cause some serious
issues... does anyone have any input on this?

-Chris Fraschetti


On Thu, 09 Sep 2004 09:54:43 -0700, David Spencer
<[EMAIL PROTECTED]> wrote:
> Honey George wrote:
> 
> > Hi,
> >   I know some of them.
> > 1. PDF
> >  + http://www.pdfbox.org/
> >  + http://www.foolabs.com/xpdf/download.html
> >    - I am using this and have found it good. It even supports
> >      various languages.
> 
> My dated experience from two years ago was that the (evil, native-code)
> foolabs PDF parser was the best, but obviously things could have changed.
> 
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg02912.html
> 
> > 2. Word
> >   + http://sourceforge.net/projects/wvware
> > 3. Excel
> >   + http://www.jguru.com/faq/view.jsp?EID=1074230
> >
> > -George
> >  --- [EMAIL PROTECTED] wrote:
> >
> >>Does anyone know of any reliable parsers out there for
> >>PDF, Word, Excel, or PowerPoint?
> 
> For PowerPoint it's not easy. I've been using this, and it worked fine
> until recently; now it sometimes goes into an infinite loop on some
> recent PPTs. It's native code and the package seems to be dormant, but
> to some extent it does the job. The "ppthtml" program does the work.
> 
> http://chicago.sourceforge.net/xlhtml
> 
> > ---------------------------------------------------------------------
> >
> >>To unsubscribe, e-mail:
> >>[EMAIL PROTECTED]
> >>For additional commands, e-mail:
> >>[EMAIL PROTECTED]
> >>
> >>
> >
> >
> >
> >
> >
> >
> > ___________________________________________________________ALL-NEW Yahoo! 
> > Messenger - all new features - even more fun!  http://uk.messenger.yahoo.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
>
