Hi Delphians,

I'm using D7 and Vista/XP here.

I'm looking for a component that can extract text from a wide variety of
formats, such as Word, PDF, HTML, XML, and so on.  I only want the text,
not the visuals or other non-textual material, and I don't care about the
structure of the text (paragraphs, etc.) so long as I have it in a usable
format.

I know that Windows has an IFilter interface, but I'm not sure that is a
good way to get the text, because it requires the users of my software to
download a DLL for each kind of format.

Companies like Google, Yahoo, etc. have access to text in many, many
formats for their search engines, and I suspect there is some software
available that does that extraction, since it's such a widely applicable
need.

Component suggestions or shared experiences are welcome.


-Rich


_______________________________________________
Delphi mailing list -> [email protected]
http://lists.elists.org/cgi-bin/mailman/listinfo/delphi
