Re: Length norm not functioning in solr queries.
Mikhail,

Thank you for confirming this. However, Ahmet's proposal seems simpler to implement to me.

On Wed, Dec 10, 2014 at 5:07 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

S.L,

I briefly skimmed Lucene50NormsConsumer.writeNormsField(). My conclusion is: if you supply your own similarity that simply avoids squeezing the float into a byte in Similarity.computeNorm(FieldInvertState), you get exactly that value back in Similarity.decodeNormValue(long). It may surprise you, but this is exactly what PreciseDefaultSimilarity in TestLongNormValueSource does. I think you can just use it.

On Wed, Dec 10, 2014 at 12:11 PM, S.L simpleliving...@gmail.com wrote:

Hi Ahmet,

Is there already an implementation of the suggested workaround?

Thanks.

On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi,

The default length norm is not the best option for differentiating very short documents, like product names. Please see: http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec

I suggest you create an additional integer field that holds the number of tokens. You can populate it via an update processor, and then penalise (using function queries) according to that field. This way you have more fine-grained and flexible control over it.

Ahmet

On Tuesday, December 9, 2014 12:22 PM, S.L simpleliving...@gmail.com wrote:

Hi Mikhail,

Thanks. I looked at the explain output, and this is what I see for the two documents in question. They have identical scores even though document 2 has a shorter productName field. I do not see any lengthNorm-related information in the explain. Also, I am not exactly clear on what I need to look at in the API.

Search Query: q=iphone+4s+16gb&qf=productName&mm=1&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true

productName: Details about Apple iPhone 4s 16GB Smartphone ATT Factory Unlocked

- 100% 10.649221 sum of the following:
  - 10.58% 1.1270299 sum of the following:
    - 2.1% 0.22383358 productName:iphon
    - 3.47% 0.36922288 productName:4 s
    - 5.01% 0.53397346 productName:16 gb
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1
  - 27.79% 2.959255 sum of the following:
    - 10.97% 1.1680154 productName:iphon 4 s~1
    - 16.82% 1.7912396 productName:4 s 16 gb~1
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1

productName: Apple iPhone 4S 16GB for Net10, No Contract, White

- 100% 10.649221 sum of the following:
  - 10.58% 1.1270299 sum of the following:
    - 2.1% 0.22383358 productName:iphon
    - 3.47% 0.36922288 productName:4 s
    - 5.01% 0.53397346 productName:16 gb
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1
  - 27.79% 2.959255 sum of the following:
    - 10.97% 1.1680154 productName:iphon 4 s~1
    - 16.82% 1.7912396 productName:4 s 16 gb~1
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1

On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

It's worth looking into the explain output to check the particular scoring values. The most likely suspect, though, is the loss of precision when float norms are stored as byte values. See the javadoc for DefaultSimilarity.encodeNormValue(float).

On Mon, Dec 8, 2014 at 5:49 PM, S.L simpleliving...@gmail.com wrote:

I have two documents, doc1 and doc2, and each of them has a field called phoneName.

doc1: phoneName: Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked
doc2: phoneName: Apple iPhone 4S 16GB for Net10, No Contract, White

If I search for

q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true

doc1 and doc2 both get the same score. Since the phoneName field in doc2 is shorter, I would expect it to have a higher score, but both have an identical score of 9.961212.

The phoneName field is defined as follows. As you can see, nowhere am I specifying omitNorms=true, yet the behavior seems to be that the length norm is not functioning at all. Can someone let me know what the issue is here?

<field name="phoneName" type="text_en_splitting" indexed="true" stored="true" required="true" />
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- in this example, we will only use
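Mikhail's point about precision loss can be checked in isolation. The sketch below re-implements, from memory, the 8-bit float layout (3 mantissa bits, 5 exponent bits) that Lucene's SmallFloat.floatToByte315 uses and that DefaultSimilarity relied on for norms in the 4.x line, so that no Lucene jar is needed. The class and helper names (NormPrecisionDemo, lengthNorm, storedNorm) are made up for illustration, and the token counts 9 and 10 are plain whitespace word counts of the two productName values in the explain output, not what the real analyzer chain would produce.

```java
class NormPrecisionDemo {
    // Encode a float into one byte: 3 mantissa bits, 5 exponent bits, zero exponent 15.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: zero or smallest value
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back into a float; the dropped mantissa bits come back as zeros.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Classic TF-IDF length norm, 1/sqrt(numTerms), with an index-time boost of 1.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The norm a query would see after the round trip through a single byte.
    static float storedNorm(int numTerms) {
        return byte315ToFloat(floatToByte315(lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 8; n <= 12; n++) {
            System.out.printf("numTerms=%d exact=%.6f stored=%.6f%n",
                    n, lengthNorm(n), storedNorm(n));
        }
    }
}
```

Under these assumptions, fields of 9 and 10 terms both decode to 0.3125, which would be consistent with the identical scores reported above, while a field of 11 terms lands in a different bucket.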
Re: Length norm not functioning in solr queries.
Ahmet,

Thank you. Since the configurations in SolrCloud are uploaded to ZooKeeper, are there any special steps that need to be taken to make this work in SolrCloud?

On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi,

Or, even better, you can use your new field as a tie-breaker where scores are identical, e.g. sort=score desc, wordCount asc

Ahmet

On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

You mean an update processor factory? Here is an augmented version of your example (wordCount field added):

doc1:
phoneName: Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked
wordCount: 11

doc2:
phoneName: Apple iPhone 4S 16GB for Net10, No Contract, White
wordCount: 9

The first task is simply to calculate the wordCount values. You can do it in your indexing code or elsewhere. I quickly skimmed the existing update processors but couldn't find a stock implementation. CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is all about multivalued fields. I guess a simple JavaScript that splits on whitespace and returns the size of the resulting array would do the trick: StatelessScriptUpdateProcessorFactory

At this point you have an int field named wordCount. boost=div(1,wordCount) should work. Or you can come up with a more sophisticated math formula.

Ahmet
Re: Length norm not functioning in solr queries.
Hi,

No special steps need to be taken for a cloud setup. Please note that for both solutions, a re-index is mandatory.

Ahmet
Re: Length norm not functioning in solr queries.
Yes, I understand that reindexing is necessary. However, for some reason I was not able to invoke the JS script from the update processor, so I ended up using a Java-only solution at index time.

Thanks.
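S.L does not show the Java-only solution, so the following is only a guess at what an index-time word count could look like. The class and method names (WordCounter, wordCount) are hypothetical. Tokens without any letter or digit are skipped, which is an assumption made here so that the stand-alone hyphens in doc1 are not counted and the result matches the counts of 11 and 9 given earlier in the thread.

```java
class WordCounter {
    // Counts whitespace-separated tokens that contain at least one letter or digit.
    static int wordCount(String text) {
        if (text == null) {
            return 0;
        }
        int count = 0;
        for (String token : text.trim().split("\\s+")) {
            // Ignore pure punctuation such as a stand-alone "-".
            if (token.chars().anyMatch(Character::isLetterOrDigit)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String doc1 = "Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked";
        String doc2 = "Apple iPhone 4S 16GB for Net10, No Contract, White";
        System.out.println("doc1 wordCount = " + wordCount(doc1));
        System.out.println("doc2 wordCount = " + wordCount(doc2));
    }
}
```

The resulting value would be written to the wordCount field of each document before it is sent to Solr. Note that this counts raw words, not the tokens produced by the field's analyzer chain.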
Re: Length norm not functioning in solr queries.
Hi Ahmet,

Is there already an implementation of the suggested workaround?

Thanks.

On Mon, Dec 8, 2014 at 5:49 PM, S.L simpleliving...@gmail.com wrote:

I have two documents, doc1 and doc2, and each of them has a field called phoneName.

doc1: phoneName: Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked
doc2: phoneName: Apple iPhone 4S 16GB for Net10, No Contract, White

If I search for

q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true

doc1 and doc2 both get the same score. Since the phoneName field in doc2 is shorter, I would expect it to have a higher score, but both have an identical score of 9.961212.

The phoneName field is defined as follows. As you can see, nowhere am I specifying omitNorms=true, yet the behavior seems to be that the length norm is not functioning at all. Can someone let me know what the issue is here?

<field name="phoneName" type="text_en_splitting" indexed="true" stored="true" required="true" />
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. add enablePositionIncrements=true in both the index and query analyzers to leave a 'gap' for more accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory
Re: Length norm not functioning in solr queries.
Hi,

You mean an update processor factory? Here is an augmented version of your example (wordCount field added):

doc1:
phoneName: Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked
wordCount: 11

doc2:
phoneName: Apple iPhone 4S 16GB for Net10, No Contract, White
wordCount: 9

The first task is simply to calculate the wordCount values. You can do it in your indexing code or elsewhere. I quickly skimmed the existing update processors but couldn't find a stock implementation. CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is all about multivalued fields. I guess a simple JavaScript that splits on whitespace and returns the size of the resulting array would do the trick: StatelessScriptUpdateProcessorFactory

At this point you have an int field named wordCount. boost=div(1,wordCount) should work. Or you can come up with a more sophisticated math formula.

Ahmet
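To see what boost=div(1,wordCount) would do to two otherwise identical scores, here is a small arithmetic sketch. It assumes the boost is applied multiplicatively to the base score, which is how the edismax boost parameter is documented to behave; the class and method names (WordCountBoostDemo, boostFactor, boostedScore) are invented for this example, and the base score is the one from the explain output in this thread.

```java
class WordCountBoostDemo {
    // Mirrors the function query div(1,wordCount): the reciprocal of the token count.
    static double boostFactor(int wordCount) {
        if (wordCount <= 0) {
            throw new IllegalArgumentException("wordCount must be positive");
        }
        return 1.0 / wordCount;
    }

    // Multiplicative boost: the base relevance score is scaled by the factor.
    static double boostedScore(double baseScore, int wordCount) {
        return baseScore * boostFactor(wordCount);
    }

    public static void main(String[] args) {
        double base = 10.649221; // identical base score of both documents
        System.out.println("doc1 (11 words): " + boostedScore(base, 11));
        System.out.println("doc2 (9 words):  " + boostedScore(base, 9));
    }
}
```

With equal base scores the shorter document ends up ahead. Because the factor shrinks quickly as wordCount grows, a gentler formula may be preferable when base scores differ.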
Re: Length norm not functioning in solr queries.
Hi,

Or, even better, you can use your new field as a tie-breaker where scores are identical, e.g. sort=score desc, wordCount asc

Ahmet
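The ordering that sort=score desc, wordCount asc asks for can be mimicked outside Solr with a two-level comparator. This is only an illustration of the ordering, not of how Solr implements sorting; the Hit class and the rank method are hypothetical, and the scores and word counts are the ones quoted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TieBreakDemo {
    // Minimal stand-in for a search hit: id, relevance score, stored wordCount.
    static final class Hit {
        final String id;
        final double score;
        final int wordCount;

        Hit(String id, double score, int wordCount) {
            this.id = id;
            this.score = score;
            this.wordCount = wordCount;
        }
    }

    // Highest score first; among equal scores, the fewest words first.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.wordCount));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("doc1", 10.649221, 11),
                new Hit("doc2", 10.649221, 9));
        for (Hit h : rank(hits)) {
            System.out.println(h.id + " score=" + h.score + " wordCount=" + h.wordCount);
        }
    }
}
```

Unlike the boost approach, the tie-breaker leaves the relevance scores untouched and only affects documents whose scores are exactly equal.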
Re: Length norm not functioning in solr queries.
S.L,

I briefly skimmed Lucene50NormsConsumer.writeNormsField(). My conclusion: if you supply your own similarity that simply avoids squeezing the float into a byte in Similarity.computeNorm(FieldInvertState), you get exactly that value back in Similarity.decodeNormValue(long). It may surprise you, but this is exactly what PreciseDefaultSimilarity does in TestLongNormValueSource. I think you can just use it.

On Wed, Dec 10, 2014 at 12:11 PM, S.L simpleliving...@gmail.com wrote:

Hi Ahmet, is there already an implementation of the suggested workaround? Thanks.
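To make the idea of "not putting the float into a byte" concrete, here is a minimal standalone sketch. It is not the Lucene PreciseDefaultSimilarity class and does not depend on Lucene; the class name LosslessNormSketch is made up for illustration, and the 1/sqrt(numTerms) length norm is assumed to match the default similarity with no index-time boost. The two methods only mirror the names encodeNormValue(float) and decodeNormValue(long) mentioned in this thread.

```java
// Standalone illustration only: keeps all 32 bits of the float norm in the
// long that a norms writer would store, instead of compressing it to 1 byte.
final class LosslessNormSketch {

    // Assumed default length norm: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Lossless "encoding": the raw IEEE-754 bits of the float, widened to long.
    static long encodeNormValue(float f) {
        return Float.floatToIntBits(f);
    }

    // Exact inverse of encodeNormValue for any value it produced.
    static float decodeNormValue(long norm) {
        return Float.intBitsToFloat((int) norm);
    }

    public static void main(String[] args) {
        // Hypothetical token counts for a longer and a shorter product name.
        for (int numTerms : new int[] {12, 16}) {
            float norm = lengthNorm(numTerms);
            float decoded = decodeNormValue(encodeNormValue(norm));
            System.out.println(numTerms + " terms: norm=" + norm + " decoded=" + decoded);
        }
    }
}
```

With this kind of encoding the two field lengths keep distinct norms after the round trip, at the cost of larger stored norms. In a real similarity the same logic would live in the overridden encode/decode methods, and the index would need to be rebuilt for it to take effect.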
Re: Length norm not functioning in solr queries.
Hi Mikhail,

Thanks. I looked at the explain output, and this is what I see for the two documents in question. They have identical scores even though document 2 has a shorter productName field, and I do not see any lengthNorm-related information in the explain. Also, I am not exactly clear on what I should be looking at in the API.

Search query:

q=iphone+4s+16gb&qf=productName&mm=1&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true

productName: Details about Apple iPhone 4s 16GB Smartphone ATT Factory Unlocked

- 100% 10.649221 sum of the following:
  - 10.58% 1.1270299 sum of the following:
    - 2.1% 0.22383358 productName:iphon
    - 3.47% 0.36922288 productName:4 s
    - 5.01% 0.53397346 productName:16 gb
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1
  - 27.79% 2.959255 sum of the following:
    - 10.97% 1.1680154 productName:iphon 4 s~1
    - 16.82% 1.7912396 productName:4 s 16 gb~1
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1

productName: Apple iPhone 4S 16GB for Net10, No Contract, White

- 100% 10.649221 sum of the following:
  - 10.58% 1.1270299 sum of the following:
    - 2.1% 0.22383358 productName:iphon
    - 3.47% 0.36922288 productName:4 s
    - 5.01% 0.53397346 productName:16 gb
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1
  - 27.79% 2.959255 sum of the following:
    - 10.97% 1.1680154 productName:iphon 4 s~1
    - 16.82% 1.7912396 productName:4 s 16 gb~1
  - 30.81% 3.2814684 productName:iphon 4 s 16 gb~1
Re: Length norm not functioning in solr queries.
Hi,

The default length norm is not the best option for differentiating very short documents, such as product names. Please see: http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec

I suggest creating an additional integer field that holds the number of tokens. You can populate it via an update processor and then penalise (using function queries) according to that field. This gives you more fine-grained and flexible control over it.

Ahmet
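As a rough sketch of the token-count idea above (not an existing implementation), the following standalone Java shows the two pieces involved: computing a count to store at index time, and composing a function-query boost from it at query time. The field name productName_tokens and the class name are hypothetical. Splitting on whitespace is only an approximation, because the analyzed token count will differ once WordDelimiterFilterFactory and the stop filter have run. The boost expression uses the Solr function queries div and sqrt, passed through the edismax boost parameter, which multiplies the score.

```java
// Illustration only: in Solr the count would be written by a custom update
// processor (or by the indexing client) into an integer field.
final class TokenCountBoostSketch {

    // Approximate token count: whitespace-separated chunks of the raw value.
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Function query that shrinks as the stored count grows: 1 / sqrt(count).
    static String boostFunction(String countField) {
        return "div(1,sqrt(" + countField + "))";
    }

    public static void main(String[] args) {
        String doc1 = "Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked";
        String doc2 = "Apple iPhone 4S 16GB for Net10, No Contract, White";
        System.out.println("doc1 tokens: " + countTokens(doc1));
        System.out.println("doc2 tokens: " + countTokens(doc2));
        // Hypothetical field name; appended to the existing edismax request.
        System.out.println("boost=" + boostFunction("productName_tokens"));
    }
}
```

Because the count is an ordinary integer field, it is not subject to the one-byte norm compression, and the shape of the penalty (1/sqrt, 1/x, a step function) can be changed at query time without reindexing.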
Re: Length norm not functioning in solr queries.
I wonder why your explains are so brief; mine look like this:

0.4500489 = (MATCH) weight(text:inc in 17) [DefaultSimilarity], result of:
  0.4500489 = fieldWeight in 17, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    2.880313 = idf(docFreq=8, maxDocs=59)
    0.15625 = fieldNorm(doc=17)

0.4500489 = (MATCH) weight(text:inc in 27) [DefaultSimilarity], result of:
  0.4500489 = fieldWeight in 27, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    2.880313 = idf(docFreq=8, maxDocs=59)
    0.15625 = fieldNorm(doc=27)

Here we can see the fieldNorm factors. These two docs are rather different, yet their norm factors are equal.

Regarding "Also I am not exactly clear on what needs to be looked in the API?": look there because it shows exactly how precision is lost when the float field norm is stored in a byte.
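The precision loss can be reproduced outside Solr. The sketch below is a standalone re-implementation modelled, from memory, on the one-byte float encoding in Lucene's SmallFloat (floatToByte315 / byte315ToFloat), so treat the exact bit layout as an assumption and check it against the Lucene source for your version. The class name, the token counts, and the 1/sqrt(numTerms) length norm without boost are illustrative assumptions as well.

```java
// Standalone illustration: round-trips a length norm through a 1-byte
// encoding and shows that different field lengths collapse to one value.
final class NormByteSketch {

    private static final int ZERO_EXP_OFFSET = (63 - 15) << 3;

    // Compress a float to one byte by keeping only its top bits (truncating).
    static byte floatToByte(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        if (small <= ZERO_EXP_OFFSET) {
            // zero and negatives map to 0, underflow to the smallest non-zero byte
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_EXP_OFFSET + 0x100) {
            return (byte) -1; // overflow maps to the largest value
        }
        return (byte) (small - ZERO_EXP_OFFSET);
    }

    // Expand the byte back to a float; the truncated low bits are gone.
    static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumed default length norm, as it would read back from the index.
    static float storedNorm(int numTerms) {
        float norm = (float) (1.0 / Math.sqrt(numTerms));
        return byteToFloat(floatToByte(norm));
    }

    public static void main(String[] args) {
        for (int numTerms : new int[] {10, 12, 16, 30, 40}) {
            System.out.println(numTerms + " terms -> fieldNorm " + storedNorm(numTerms));
        }
    }
}
```

Under these assumptions, fields of 30 and 40 terms both read back as 0.15625, the same fieldNorm as in the two explains above, and fields of 12 and 16 terms both read back as 0.25. That is the kind of collapse that makes two product names of different length score identically.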
Re: Length norm not functioning in solr queries.
It's worth looking into the explain output to check the particular scoring values. But the prime suspect is the reduced precision when float norms are stored in byte values. See the javadoc for DefaultSimilarity.encodeNormValue(float).

On Mon, Dec 8, 2014 at 5:49 PM, S.L simpleliving...@gmail.com wrote:

I have two documents, doc1 and doc2, and each of them has a field called phoneName.

doc1:phoneName:Details about Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked
doc2:phoneName:Apple iPhone 4S 16GB for Net10, No Contract, White

If I search for

q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true

doc1 and doc2 get the same score. Since the phoneName field in doc2 is shorter, I would expect it to score higher, but both have an identical score of 9.961212.

The phoneName field is defined as follows. As we can see, nowhere am I specifying omitNorms=True, yet the behavior suggests that the length norm is not functioning at all. Can someone let me know what the issue is here?

<field name="phoneName" type="text_en_splitting" indexed="true" stored="true" required="true" />

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldType>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com