On Monday 19 April 2004 14:01, Mario Ivankovits wrote:
> Stephane James Vaucher wrote:
> > Anyone try what Joerg suggested here?
> > http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]pache.org&msgNo=6231
>
> Don't know what you would like to do, but if you simply want to
> extract text, you could try this snippet:

This leads to a question I had been thinking about: it seems this thread 
originally started with someone pointing out that OO can be used as a 
converter from other formats... but what about a tokenizer for native OO 
documents? I have written full-featured converters from OO to (simplified) 
DocBook and HTML, and creating one just for tokenizing, to be used by 
Lucene, would be much easier. It could even tokenize into separate fields 
(document metadata, content, maybe bibliography separately, etc.) without 
much extra work.
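To illustrate why this is easy: an OO document is just a ZIP archive whose
body lives in content.xml, so pulling out indexable text takes only a few
lines. Here is a minimal sketch in Python (function name and the flat
"all text nodes" approach are mine, not from this thread; a real tokenizer
would also read meta.xml and keep fields separate):

```python
import io
import xml.etree.ElementTree as ET
import zipfile

def extract_oo_text(data: bytes) -> str:
    """Extract plain text from an OpenOffice document given as raw bytes.

    The document is a ZIP archive; the main body is in content.xml.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        content = zf.read("content.xml")
    root = ET.fromstring(content)
    # Collect every non-empty text node, ignoring element names and
    # namespaces -- good enough for feeding a Lucene-style indexer.
    return " ".join(t.strip() for t in root.itertext() if t.strip())
```

The same walk, pointed at meta.xml instead of content.xml, would yield the
metadata field.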

Would anyone find a full-featured, customizable OpenOffice document 
tokenizer useful?

-+ Tatu +-



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
