Re: How to index the parsed content effectively

2014-07-02 Thread Christian Reuschling
>> initial attempt was to index the output from ToTextContentHandler.toString() as a Lucene Text field. This is unlikely to be effective for large files. So I wonder what strategies exist for a more effective indexing/tokenization of t

Re: How to index the parsed content effectively

2014-07-02 Thread Christian Reuschling
> riment with. > The feedback will be appreciated > Cheers, Sergey -- Christian Reuschling, Dipl.-Ing.(BA) Software Engineer Knowledge Management Department German Research Center for Artificial Intelligence DFKI GmbH Trippstadter Straße 122, D-67663 Kaiserslautern, Germany Phone: +49.631.2057

Re: Reply: Re: hello , how to utilize tika inside lucene ?

2013-02-25 Thread Christian Reuschling
McCandless >> To: user@tika.apache.org, >> sdr...@sina.com Subject: Re: hello , how to utilize tika inside lucene ? >> Date: 2013-02-25 03:55 >> > -- Christian Reuschling, Dipl.-Ing.(BA) Software Engineer Knowledge Management Depart

Leech crawler 1.3 released!

2013-02-05 Thread Christian Reuschling
Migrated to Tika 1.3, for those that use Tika and need further crawling capabilities. https://github.com/leechcrawler/leech Enjoy! :) Christian -- Christian