Jukka Zitting wrote:

> Hi,
>
> Any interest in this?


definitely :-)

Michi

> If not, is there some other Lucene project that
> I should approach?
>
> BR,
>
> Jukka Zitting
>
> On 7/18/06, Jukka Zitting <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I'm a committer on the Apache Jackrabbit project, and I've recently
>> been working on improving the full text indexing support in
>> Jackrabbit. We've used standard Lucene Java as the embedded full text
>> search engine in Jackrabbit, but created our own set of parsers for
>> extracting text content from binary files. So far our parser interface
>> TextFilter [1] has been Jackrabbit-specific, but my recent refactoring
>> proposal, TextExtractor [2], aims for a generic solution that converts
>> an arbitrary InputStream into a Reader for passing to Lucene Java.
>>
>> Before coming up with the proposal I tried looking for similar
>> solutions, but couldn't find any that would have satisfied my
>> requirement of no external dependencies other than the JRE. Your
>> o.a.nutch.parse.Parser interface, however, came quite close, and you
>> already have an extensive set of existing implementations, so I'd like
>> to leverage your work with the Parser implementations while finding a
>> way to avoid the full Nutch and Hadoop dependencies. I believe that
>> there are a number of other Lucene users who have similar needs.
>>
>> Thus I'd like to ask if there would be interest in making your Parser
>> interface and implementations more easily accessible to external
>> projects, perhaps as a separate library. If you're interested, I'd be
>> happy to participate in such an effort.
>>
>> [1] http://svn.apache.org/viewvc/jackrabbit/trunk/jackrabbit/src/main/java/org/apache/jackrabbit/core/query/TextFilter.java?view=markup
>>
>> [2] http://issues.apache.org/jira/browse/JCR-415
>>
>>
>> BR,
>>
>> Jukka Zitting
>>
>> -- 
>> Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
>> Software craftsmanship, JCR consulting, and Java development
>>
>


-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                        [EMAIL PROTECTED]
+41 44 272 91 61


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
