>> initial attempt
>> was to index the output from ToTextContentHandler.toString() as a Lucene
>> Text field.
>>
>> This is unlikely to be effective for large files. So I wonder what
>> strategies exist for a
>> more effective indexing/tokenization of t
riment with.
>
> The feedback will be appreciated.
> Cheers, Sergey
- --
______
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer
Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Trippstadter Straße 122, D-67663 Kaiserslautern, Germany
Phone: +49.631.2057
McCandless
>> To: user@tika.apache.org, sdr...@sina.com
>> Subject: Re: hello , how to utilize tika inside lucene ?
>> Date: February 25, 2013, 03:55
>>
>
Leech has been migrated to Tika 1.3, for those who use Tika and need further
crawling capabilities.
https://github.com/leechcrawler/leech
Enjoy! :)
Christian