I would prefer to see a good open-source framework that pulls together a collection of document parsers but isn't tied directly to Lucene (that binding would be provided by *another* project). If the parser framework extracted document text into a standard, document- and application-neutral form (XML? Java objects?), it could underpin *any* IR/IE project wanting to use the parser functionality, e.g. the GATE framework. That would ultimately be a much more valuable piece of functionality, and it is the approach taken by Stellent (used by many search engines and recently purchased by Oracle).
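To make the idea concrete, here is a rough sketch of what such a neutral contract might look like in Java. All the names (`ParsedDocument`, `DocumentParser`, `PlainTextParser`) are hypothetical and are not part of Lucene, GATE or Stellent. The point is only that the parser returns plain text plus metadata with no Lucene types involved, so a Lucene binding, a GATE pipeline or anything else can sit on top as a separate layer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical neutral representation: extracted text plus metadata, no Lucene types.
final class ParsedDocument {
    private final String text;
    private final Map<String, String> metadata;

    ParsedDocument(String text, Map<String, String> metadata) {
        this.text = text;
        // Defensive, read-only copy so consumers cannot mutate parser output.
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }

    String getText() { return text; }
    Map<String, String> getMetadata() { return metadata; }
}

// Hypothetical parser contract: each format-specific parser (PDF, Word, HTML...) implements this.
interface DocumentParser {
    ParsedDocument parse(InputStream in);
}

// Trivial example parser for UTF-8 plain text (requires Java 9+ for readAllBytes).
final class PlainTextParser implements DocumentParser {
    @Override
    public ParsedDocument parse(InputStream in) {
        try {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            Map<String, String> meta = new HashMap<>();
            meta.put("Content-Type", "text/plain");
            return new ParsedDocument(text, meta);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

public class NeutralParserDemo {
    public static void main(String[] args) {
        DocumentParser parser = new PlainTextParser();
        ParsedDocument doc = parser.parse(
                new ByteArrayInputStream("Hello, world".getBytes(StandardCharsets.UTF_8)));
        // A separate Lucene (or GATE, etc.) adapter would consume doc at this point.
        System.out.println(doc.getMetadata().get("Content-Type") + ": " + doc.getText());
    }
}
```

Running `main` should print `text/plain: Hello, world`. Keeping the output as plain Java objects (rather than, say, Lucene `Document`s) is what lets the indexing binding live in *another* project.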

Cheers
Mark
