Hi all,
I am trying to get Jackrabbit to index PDF files with Japanese content.
Jackrabbit works really well when indexing content stored in nodes
as text, even Japanese (using Lucene's CJK analyzer), and I also
get proper results when searching English text in PDF documents (using
the PDF text filter classes).
But I cannot get the search to return anything from PDFs with Japanese
text.
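One thing worth ruling out first is whether the text extracted from the PDF contains any Japanese characters at all. Below is a minimal sketch of such a check, assuming the extracted text is already available as a Java String. The class and method names are hypothetical and not part of Jackrabbit or Lucene.

```java
// Hypothetical debugging helper, not part of Jackrabbit or Lucene.
final class JapaneseTextCheck {

    // Returns true if the text contains at least one Hiragana, Katakana,
    // or Han (kanji) code point. Han also matches Chinese text, so this
    // only tells us that CJK characters survived extraction.
    static boolean containsJapanese(String text) {
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA
                    || script == Character.UnicodeScript.HAN;
        });
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source file encoding.
        String sample = "\u65e5\u672c\u8a9e PDF"; // the word "Japanese" in kanji, then " PDF"
        System.out.println(containsJapanese(sample));
        System.out.println(containsJapanese("plain English text"));
    }
}
```

If this returns false for text pulled out of a Japanese PDF, the problem is more likely in the extraction step than in the analyzer or the query.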
So I wanted to write my own PdfFilter and use that class for
debugging. Unfortunately, I cannot get my text filter class to be
used. I am using the following for the SearchIndex configuration:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="textFilterClasses" value="my.own.PdfFilter" />
  <param name="path" value="${wsp.home}/index" />
  <!-- <param name="analyzer" value="org.apache.lucene.analysis.cjk.CJKAnalyzer"/> -->
  <param name="useCompoundFile" value="true" />
  <param name="minMergeDocs" value="100" />
  <param name="volatileIdleTime" value="3" />
  <param name="maxMergeDocs" value="100000" />
  <param name="mergeFactor" value="10" />
  <param name="bufferSize" value="10" />
</SearchIndex>
It seems like the textFilterClasses parameter is never used.
Can anybody confirm or refute this?
Or, even better, does anybody have hints or advice on how
to index a Japanese PDF?
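On the textFilterClasses question, one quick thing to rule out is that the configured class simply cannot be loaded. Below is a minimal sketch of such a check. It assumes (not confirmed for Jackrabbit) that filter classes are created by name through reflection with a public no-argument constructor. FilterClassCheck is a hypothetical helper, not a Jackrabbit class.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper, not part of Jackrabbit: checks that a class name taken
// from the textFilterClasses value can be loaded and instantiated.
final class FilterClassCheck {

    // Returns null on success, or a short description of what went wrong.
    static String diagnose(String className) {
        try {
            Class<?> type = Class.forName(className.trim());
            // Assumption: a public no-argument constructor is required.
            Constructor<?> ctor = type.getConstructor();
            ctor.newInstance();
            return null;
        } catch (ClassNotFoundException e) {
            return "not on classpath: " + className;
        } catch (NoSuchMethodException e) {
            return "no public no-argument constructor: " + className;
        } catch (ReflectiveOperationException e) {
            return "could not instantiate: " + e;
        }
    }

    public static void main(String[] args) {
        // my.own.PdfFilter is the class name from the SearchIndex configuration.
        String problem = diagnose("my.own.PdfFilter");
        System.out.println(problem == null ? "loaded fine" : problem);
    }
}
```

Running this with the same classpath as the repository would show whether the class is visible at all. It says nothing about whether Jackrabbit actually reads the parameter.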
Thank you in advance,
Nicolas Modrzyk