Thank you again, Ivan (and sorry for the silence, I was away these last few days). I built the jar with Maven. The problem I have now is a compilation failure caused by the @Override annotation in NormRemovalSimilarity.java ("*method does not override or implement a method from a supertype*"). When I comment that line out, the jar builds successfully, but I think the new decodeNormValue function is not overriding the original one (which makes sense!). Indeed, when I search my contents field, which has similarity=my_similarity, the explanation of the score is:
...
{
  "value": 0.25,
  "description": "fieldNorm(doc=0)"
}
...

I suppose that under the new similarity, the value should be 1.0, shouldn't it?
Cheers,
Patrick

On Thursday, April 3, 2014 12:15:15 UTC-4, Ivan Brusic wrote:
>
> I added a simple Maven pom to the gist:
> https://gist.github.com/brusic/9786587#file-pom-xml
>
> The easiest thing to do is download Maven (if you do not have it) and let it
> handle the dependencies and build a jar. You simply execute:
> mvn package
>
> Since Elasticsearch already comes bundled with the correct jars, you can
> also add those to your classpath instead. I think you only need Lucene
> core, which is in $ES_HOME/lib/lucene-core-4-?-?.jar. Replace the
> question marks with the correct version. I am not on Elasticsearch, so I do
> not know offhand which version of Lucene is packaged.
>
> --
> Ivan
>
>
> On Thu, Apr 3, 2014 at 7:44 AM, geantbrun <agin.p...@gmail.com> wrote:
>
>> Ivan,
>> Sorry, but I realize (I know nothing about Java) that I skipped the Java
>> compile step (I simply put the java files in a jar file with jar cf). The
>> problem now is that executing:
>>
>> javac NormRemovalSimilarity.java -classpath ./elasticsearch-1.1.0.jar
>>
>> generates errors, the first one being:
>>
>> package org.apache.lucene.search.similarities does not exist
>>
>> I googled it but found nothing. Any idea?
>> Patrick
>>
>> P.S. I installed elasticsearch following the easy way
>> <https://gist.github.com/wingdspur/2026107> (dpkg the deb file)
>>
>> On Thursday, April 3, 2014 09:16:02 UTC-4, geantbrun wrote:
>>
>>> Thanks again for your great help, Ivan. It does not work for me. When I
>>> replace NormRemovalSimilarityProvider with BM25SimilarityProvider (or
>>> simply with BM25), it works. Is it possible that I put my jar file in the
>>> wrong directory (usr/share/elasticsearch/lib)? Is it necessary to
>>> *register* somewhere the new classes I define before restarting the service?
>>> Cheers,
>>> Patrick
>>>
>>> On Wednesday, April 2, 2014 17:47:46 UTC-4, Ivan Brusic wrote:
>>>>
>>>> Are you using a full class name? I have no problems with
>>>>
>>>> curl -XPOST 'http://localhost:9200/sim/' -d '
>>>> {
>>>>   "settings" : {
>>>>     "similarity" : {
>>>>       "my_similarity" : {
>>>>         "type" : "org.elasticsearch.index.similarity.NormRemovalSimilarityProvider"
>>>>       }
>>>>     }
>>>>   },
>>>>   "mappings" : {
>>>>     "post" : {
>>>>       "properties" : {
>>>>         "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>>>>         "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>>>>         "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity"}
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>> '
>>>>
>>>>
>>>>
>>>> On Wed, Apr 2, 2014 at 12:03 PM, geantbrun <agin.p...@gmail.com> wrote:
>>>>
>>>>> In order to better understand the error, I copied your
>>>>> NormRemovalSimilarity and NormRemovalSimilarityProvider code snippets into
>>>>> usr/share/elasticsearch/lib. I put these 2 files in a jar named
>>>>> NormRemovalSimilarity.jar. After restarting the elasticsearch service, I
>>>>> tried to create the index with the same mapping as before (except that I
>>>>> put "type" : "NormRemoval" in the settings of my_similarity).
>>>>>
>>>>> The result is the same:
>>>>> {"error":"IndexCreationException[[exbd] failed to create index];
>>>>> nested: NoClassSettingsException[Failed to load class setting [type]
>>>>> with value [NormRemoval]]; nested:
>>>>> ClassNotFoundException[org.elasticsearch.index.similarity.normremoval.NormRemovalSimilarityProvider]; ","status":500}]
>>>>>
>>>>> I deleted the jar file just to see if the error is the same: yes, it
>>>>> is. It's as if the new similarity is never found or loaded. Is it still
>>>>> working without modifications on your side?
>>>>> Cheers,
>>>>> Patrick
>>>>>
>>>>>
>>>>> On Wednesday, April 2, 2014 00:31:44 UTC-4, Ivan Brusic wrote:
>>>>>>
>>>>>> It has been a while since I used a custom similarity, but what you
>>>>>> have looks right. Can you try a full class name instead?
>>>>>> Use org.elasticsearch.index.similarity.tfCappedSimilarityProvider.
>>>>>> According to the error, it is looking for
>>>>>> org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider.
>>>>>>
>>>>>> --
>>>>>> Ivan
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 1, 2014 at 7:00 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>>
>>>>>>> Sure.
>>>>>>>
>>>>>>> {
>>>>>>>   "settings" : {
>>>>>>>     "index" : {
>>>>>>>       "similarity" : {
>>>>>>>         "my_similarity" : {
>>>>>>>           "type" : "tfCappedSimilarity"
>>>>>>>         }
>>>>>>>       }
>>>>>>>     }
>>>>>>>   },
>>>>>>>   "mappings" : {
>>>>>>>     "post" : {
>>>>>>>       "properties" : {
>>>>>>>         "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" },
>>>>>>>         "name" : { "type" : "string", "store" : "yes", "index" : "analyzed"},
>>>>>>>         "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "my_similarity"}
>>>>>>>       }
>>>>>>>     }
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> If I replace tfCappedSimilarity with tfCapped in the mapping, the
>>>>>>> error is the same except that the provider is referred to as
>>>>>>> tfCappedSimilarityProvider and not as tfCappedSimilaritySimilarityProvider.
>>>>>>> Cheers,
>>>>>>> Patrick
>>>>>>>
>>>>>>>
>>>>>>> On Monday, March 31, 2014 17:13:24 UTC-4, Ivan Brusic wrote:
>>>>>>>>
>>>>>>>> Can you also post your mapping where you defined the similarity?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ivan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 31, 2014 at 10:36 AM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I realize that I probably have to define the similarity property
>>>>>>>>> of my field as "my_similarity" (and not as "tfCappedSimilarity") and
>>>>>>>>> define my_similarity in the settings as being of type tfCappedSimilarity.
>>>>>>>>> When I do that, I get the following error at the index/mapping
>>>>>>>>> creation:
>>>>>>>>>
>>>>>>>>> {"error":"IndexCreationException[[exbd] failed to create index];
>>>>>>>>> nested: NoClassSettingsException[Failed to load class setting
>>>>>>>>> [type] with value [tfCappedSimilarity]]; nested:
>>>>>>>>> ClassNotFoundException[org.elasticsearch.index.similarity.tfcappedsimilarity.tfCappedSimilaritySimilarityProvider]; ","status":500}]
>>>>>>>>>
>>>>>>>>> Note that the provider is referred to in the error as
>>>>>>>>> tfCappedSimilaritySimilarityProvider (Similarity repeated twice).
>>>>>>>>> Is that normal?
>>>>>>>>> Patrick
>>>>>>>>>
>>>>>>>>> On Monday, March 31, 2014 13:06:00 UTC-4, geantbrun wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ivan,
>>>>>>>>>> I followed your instructions but it does not seem to work; I must
>>>>>>>>>> have gone wrong somewhere. I created the jar file from the following
>>>>>>>>>> two java files; could you tell me if they are ok?
>>>>>>>>>>
>>>>>>>>>> tfCappedSimilarity.java
>>>>>>>>>> ***************************
>>>>>>>>>> package org.elasticsearch.index.similarity;
>>>>>>>>>>
>>>>>>>>>> import org.apache.lucene.search.similarities.DefaultSimilarity;
>>>>>>>>>> import org.elasticsearch.common.logging.ESLogger;
>>>>>>>>>> import org.elasticsearch.common.logging.Loggers;
>>>>>>>>>>
>>>>>>>>>> public class tfCappedSimilarity extends DefaultSimilarity {
>>>>>>>>>>
>>>>>>>>>>     private ESLogger logger;
>>>>>>>>>>
>>>>>>>>>>     public tfCappedSimilarity() {
>>>>>>>>>>         logger = Loggers.getLogger(getClass());
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     /**
>>>>>>>>>>      * Capped tf value
>>>>>>>>>>      */
>>>>>>>>>>     @Override
>>>>>>>>>>     public float tf(float freq) {
>>>>>>>>>>         return (float)Math.sqrt(Math.min(9, freq));
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> tfCappedSimilarityProvider.java
>>>>>>>>>> *************************************
>>>>>>>>>> package org.elasticsearch.index.similarity;
>>>>>>>>>>
>>>>>>>>>> import org.elasticsearch.common.inject.Inject;
>>>>>>>>>> import org.elasticsearch.common.inject.assistedinject.Assisted;
>>>>>>>>>> import org.elasticsearch.common.settings.Settings;
>>>>>>>>>>
>>>>>>>>>> public class tfCappedSimilarityProvider extends AbstractSimilarityProvider {
>>>>>>>>>>
>>>>>>>>>>     private tfCappedSimilarity similarity;
>>>>>>>>>>
>>>>>>>>>>     @Inject
>>>>>>>>>>     public tfCappedSimilarityProvider(@Assisted String name, @Assisted Settings settings) {
>>>>>>>>>>         super(name);
>>>>>>>>>>         this.similarity = new tfCappedSimilarity();
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     /**
>>>>>>>>>>      * {@inheritDoc}
>>>>>>>>>>      */
>>>>>>>>>>     @Override
>>>>>>>>>>     public tfCappedSimilarity get() {
>>>>>>>>>>         return similarity;
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In my mapping, I define the similarity property of my field as
>>>>>>>>>> tfCappedSimilarity, is it ok?
>>>>>>>>>>
>>>>>>>>>> What makes me say that it does not work: I inserted a doc with a
>>>>>>>>>> word repeated 16 times in my field. When I search for that word,
>>>>>>>>>> the result shows a tf of 4 (square root of 16) and not 3 as I was
>>>>>>>>>> expecting. Is there a way to know if the similarity was loaded or
>>>>>>>>>> not (maybe in a log file)?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Patrick
>>>>>>>>>>
>>>>>>>>>> On Wednesday, March 26, 2014 17:16:36 UTC-4, Ivan Brusic wrote:
>>>>>>>>>>>
>>>>>>>>>>> I updated my gist to illustrate the SimilarityProvider that goes
>>>>>>>>>>> along with it. Similarities are easier to add to Elasticsearch than
>>>>>>>>>>> most plugins. You just need to compile the two files into a jar and
>>>>>>>>>>> then add that jar to Elasticsearch's classpath ($ES_HOME/lib most
>>>>>>>>>>> likely). The code will scan for every SimilarityProvider defined and
>>>>>>>>>>> load it.
>>>>>>>>>>>
>>>>>>>>>>> You then map the similarity to a field:
>>>>>>>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_configuring_similarity_per_field
>>>>>>>>>>>
>>>>>>>>>>> Note that you cannot change the similarity of a field
>>>>>>>>>>> dynamically.
>>>>>>>>>>>
>>>>>>>>>>> Ivan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 26, 2014 at 12:49 PM, geantbrun <agin.p...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Britta is looping over words that are passed as parameters.
>>>>>>>>>>>> It's easy to implement her script for a simple query but what
>>>>>>>>>>>> about boolean queries?
>>>>>>>>>>>> In my understanding (but I could be wrong of course), I would
>>>>>>>>>>>> have to parse the query to call the script with each sub-clause, am I
>>>>>>>>>>>> wrong?
>>>>>>>>>>>>
>>>>>>>>>>>> I prefer your custom similarity alternative. Again, sorry for
>>>>>>>>>>>> the silly question (newbie!) but where do you put your java file?
>>>>>>>>>>>> Is it the only thing that is needed (except for the modification in
>>>>>>>>>>>> the mapping)?
>>>>>>>>>>>> cheers,
>>>>>>>>>>>> Patrick
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, March 26, 2014 11:58:52 UTC-4, Ivan Brusic wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am still on a version of Elasticsearch that does not have
>>>>>>>>>>>>> access to the new scoring capabilities, so I cannot test out any
>>>>>>>>>>>>> scripts. The non-normalized term frequency should be the line:
>>>>>>>>>>>>> tf = _index[field][word].tf()
>>>>>>>>>>>>>
>>>>>>>>>>>>> If that is the case, you could replace that line with
>>>>>>>>>>>>> something like:
>>>>>>>>>>>>> tf = Math.min(10, _index[field][word].tf())
>>>>>>>>>>>>>
>>>>>>>>>>>>> As I stated before, I am used to using Similarities, so I find
>>>>>>>>>>>>> the example easier. Here is a custom similarity that I used in
>>>>>>>>>>>>> Elasticsearch (removes any norms that are indexed):
>>>>>>>>>>>>> https://gist.github.com/brusic/9786587
>>>>>>>>>>>>>
>>>>>>>>>>>>> The second part would be the tf() method you would need to
>>>>>>>>>>>>> implement instead of the decodeNormValue I used.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ivan
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "elasticsearch" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/6370b4dc-8243-4aea-918a-e4e4e9588aaf%40googlegroups.com.
>>>>>>>>>
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
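
A note on the "method does not override or implement a method from a supertype" failure reported at the top of this thread: javac emits that error when a method annotated with @Override matches no supertype method in both name and parameter types. With the annotation commented out, the method compiles as a separate overload, which callers of the supertype's signature never reach. That would be consistent with fieldNorm still showing 0.25. The sketch below reproduces the effect with made-up classes. BaseSimilarity, its long parameter, and fieldNorm are assumptions for illustration only, not the real Lucene API; the actual decodeNormValue signature should be checked against the lucene-core jar bundled with the installed Elasticsearch.

```java
public class OverrideDemo {

    // Hypothetical stand-in for a library base class (NOT the real Lucene API).
    public static class BaseSimilarity {
        // Assumed signature: the library decodes norms from a long.
        public float decodeNormValue(long norm) {
            return 0.25f; // pretend this is the decoded, stored norm
        }

        // The "framework" only ever calls the long-based signature.
        public float fieldNorm(long storedNorm) {
            return decodeNormValue(storedNorm);
        }
    }

    // The parameter type differs (byte vs long), so this is an overload.
    // Putting @Override on it would fail to compile with
    // "method does not override or implement a method from a supertype".
    public static class OverloadedSimilarity extends BaseSimilarity {
        public float decodeNormValue(byte b) {
            return 1.0f; // never reached through fieldNorm(long)
        }
    }

    // Same name and same parameter type: a real override, so @Override compiles.
    public static class OverridingSimilarity extends BaseSimilarity {
        @Override
        public float decodeNormValue(long norm) {
            return 1.0f; // ignore the stored norm entirely
        }
    }

    public static void main(String[] args) {
        System.out.println(new OverloadedSimilarity().fieldNorm(124L)); // prints 0.25
        System.out.println(new OverridingSimilarity().fieldNorm(124L)); // prints 1.0
    }
}
```

If the real supertype method has a different parameter type than the one in the gist, changing the subclass method to match it (and keeping @Override so the compiler checks it) is the usual fix; whether that applies here depends on the Lucene version in use.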