text filters ...

Nicolas Mon, 01 May 2006 02:19:40 -0700

Hi all,

I am trying to get Jackrabbit to index PDF documents with Japanese content.

Jackrabbit works really well for indexing content stored in nodes as text, even Japanese (using Lucene's CJK analyzer), and I also get proper results when searching English text in PDF documents (using the PDF text filter classes).

But I cannot get the search to return anything from PDFs with Japanese text.

So I wanted to write my own PdfFilter and use that class for debugging. Unfortunately, I cannot get my text filter class to be used. I am using the following for the SearchIndex configuration:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">

            <param name="textFilterClasses" value="my.own.PdfFilter" />
            <param name="path" value="${wsp.home}/index" />

            <param name="useCompoundFile" value="true" />
            <param name="minMergeDocs" value="100" />
            <param name="volatileIdleTime" value="3" />
            <param name="maxMergeDocs" value="100000" />
            <param name="mergeFactor" value="10" />
            <param name="bufferSize" value="10" />
   </SearchIndex>

It seems like the textFilterClasses parameter is never used.
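
One thing that could be ruled out first is whether the configured class can be instantiated at all. The sketch below is only a sanity check under an assumption: that Jackrabbit creates each class listed in textFilterClasses by name through reflection, using a public no-argument constructor. The helper class FilterClassCheck is hypothetical and not part of Jackrabbit; it would need to run with the same classpath as the repository.

```java
public class FilterClassCheck {

    /**
     * Returns true if the named class can be loaded from the current
     * classpath and instantiated through a public no-argument constructor.
     * Assumption: this mirrors how the filter class would be created.
     */
    static boolean isInstantiable(String className) {
        try {
            Class<?> c = Class.forName(className);
            c.getConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            // Report why the class could not be created (missing class,
            // missing constructor, failing static initializer, ...).
            System.err.println("Cannot instantiate " + className + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the configuration above unless one is given.
        String name = args.length > 0 ? args[0] : "my.own.PdfFilter";
        System.out.println(name + " instantiable: " + isInstantiable(name));
    }
}
```

If this reports false, the problem is probably with the classpath or the constructor rather than with the parameter being ignored.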

Can anybody confirm or refute the above?

Or, even better, does anybody have hints or advice on how to index a Japanese PDF?


Thank you in advance,

Nicolas Modrzyk,
