Hi Team,
I am new to Apache Solr, so I may be missing something obvious. I am trying
to remove duplicates from the search results in Solr 8.6 using
solr.ShingleFilterFactory and solr.MinHashFilterFactory. Here is the field
type definition:
<fieldType name="text_min_hash" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="3" maxShingleSize="4"
            outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
    <filter class="solr.MinHashFilterFactory"
            bucketCount="512" hashSetSize="1" hashCount="1"
            withRotation="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
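For context, a field type only takes effect once a field is declared with
it. A minimal sketch of that wiring, assuming hypothetical field names
"content" and "content_min_hash" (both names are placeholders for
illustration only):

<!-- Hypothetical field that stores the MinHash tokens; name is assumed. -->
<field name="content_min_hash" type="text_min_hash"
       indexed="true" stored="false"/>
<!-- Hypothetical source field "content" copied into the MinHash field. -->
<copyField source="content" dest="content_min_hash"/>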
However, duplicate documents still appear in the search results. Please
let me know what I am missing. Any leads would be appreciated.
Thanks & Regards,
-Sourav.