Hi all,

I am trying to get Jackrabbit to index PDF files with Japanese content.
Jackrabbit indexes text stored in nodes very well, even Japanese (using Lucene's CJK analyzer), and I also get proper results when searching English text in PDF documents (using the PDF text filter classes).

But I cannot get the search to return anything from PDFs with Japanese text.

So I wanted to write my own PdfFilter and use that class for debugging. Unfortunately, I cannot get my text filter class to be used. I am using the following for the SearchIndex configuration:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="textFilterClasses" value="my.own.PdfFilter" />
    <param name="path" value="${wsp.home}/index" />
    <!-- <param name="analyzer" value="org.apache.lucene.analysis.cjk.CJKAnalyzer"/> -->
    <param name="useCompoundFile" value="true" />
    <param name="minMergeDocs" value="100" />
    <param name="volatileIdleTime" value="3" />
    <param name="maxMergeDocs" value="100000" />
    <param name="mergeFactor" value="10" />
    <param name="bufferSize" value="10" />
</SearchIndex>

It seems like the textFilterClasses parameter is never used.
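
To make the debugging goal concrete: once a custom filter is actually picked up, the first thing I would check is whether the text extracted from the PDF contains any Japanese characters at all, before Lucene ever sees it. Below is a minimal sketch of such a check. It is plain Java and does not depend on Jackrabbit or on any PDF library; the class name JapaneseTextCheck and the method countJapaneseCodePoints are hypothetical names made up for illustration, and the string passed in stands for whatever the filter extracted.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical debugging helper, not part of Jackrabbit or Lucene.
public final class JapaneseTextCheck {

    private JapaneseTextCheck() {
    }

    // Counts the code points that belong to the Han, Hiragana or Katakana scripts.
    // A count of zero for a PDF known to contain Japanese suggests that the
    // extraction step, not the analyzer, is losing the text.
    public static int countJapaneseCodePoints(String text) {
        int count = 0;
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(codePoint);
            if (script == UnicodeScript.HAN
                    || script == UnicodeScript.HIRAGANA
                    || script == UnicodeScript.KATAKANA) {
                count++;
            }
            // Advance by one or two chars so surrogate pairs are handled correctly.
            i += Character.charCount(codePoint);
        }
        return count;
    }

    public static void main(String[] args) {
        // "PDF " followed by three kanji (U+65E5 U+672C U+8A9E), written as escapes
        // so the example does not depend on the source file encoding.
        String extracted = "PDF \u65E5\u672C\u8A9E";
        System.out.println(countJapaneseCodePoints(extracted));
    }
}
```

Logging this count from inside the filter for each document would at least show whether the problem is in the PDF extraction or later, in the analysis and search.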

Can anybody confirm or refute the above?
Or, even better, does anybody have hints or advice on how to index a Japanese PDF?

Thank you in advance,

Nicolas Modrzyk,
