Hi,

Any interest in this? If not, is there some other Lucene project that I should approach?

BR,

Jukka Zitting

On 7/18/06, Jukka Zitting <[EMAIL PROTECTED]> wrote:
Hi,

I'm a committer of the Apache Jackrabbit project, and I've recently been working on improving the full text indexing support in Jackrabbit. We've used standard Lucene Java as the embedded full text search engine in Jackrabbit, but created our own set of parsers for extracting text content from binary files.

So far our parser interface, TextFilter [1], has been Jackrabbit-specific, but my recent refactoring proposal, TextExtractor [2], aims for a generic solution that converts a generic InputStream into a Reader for passing to Lucene Java.

Before coming up with the proposal I looked for similar solutions, but couldn't find any that satisfied my requirement of no external dependencies other than the JRE. Your o.a.nutch.parse.Parser interface, however, came quite close, and you already have an extensive set of existing implementations, so I'd like to leverage your work on the Parser implementations while finding a way to avoid the full Nutch and Hadoop dependencies.

I believe that there are a number of other Lucene users who have similar needs. Thus I'd like to ask if there would be interest in making your Parser interface and implementations more easily accessible to external projects, perhaps as a separate library. If you're interested, I'd be happy to participate in such an effort.

[1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit/src/main/java/org/apache/jackrabbit/core/query/TextFilter.java?view=markup
[2] http://issues.apache.org/jira/browse/JCR-415

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development