Hi,

On Thu, May 28, 2009 at 11:56 AM, Paul Skinner <[email protected]> wrote:
> My preferred option would therefore be to use the jackrabbit-tika component.

OK. I'll resurrect it then in the new JCR Commons subproject from where
we can release it as a standalone component.

The basic idea behind the jackrabbit-tika component is that you can
replace all your configured text extractor classes with
org.apache.jackrabbit.tika.TikaTextExtractor that will use Apache Tika
to extract text from most major file formats.

BR,

Jukka Zitting
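
For illustration only, the replacement described above might look roughly like the fragment below. This is a sketch, not a tested configuration. It assumes a Jackrabbit 1.x style SearchIndex element in workspace.xml where extractors are listed in the textFilterClasses parameter, and that the jackrabbit-tika jar and its Tika dependencies are on the classpath. The index path value and the "before" class names in the comment are example values. Check the parameter name against the Jackrabbit version you run.

```xml
<!-- Sketch of a SearchIndex element; parameter names are assumptions
     based on Jackrabbit 1.x configuration, verify for your version. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Before: a comma-separated list of per-format extractors, e.g.
       org.apache.jackrabbit.extractor.MsWordTextExtractor,
       org.apache.jackrabbit.extractor.PdfTextExtractor, ... -->
  <!-- After: the single Tika-backed extractor named in this mail. -->
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.tika.TikaTextExtractor"/>
</SearchIndex>
```

With a setup like this, format detection and parsing would be delegated to Tika rather than to one extractor class per file format. Content that is already indexed would presumably need to be re-indexed before the change shows up in search results.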
