Re: Position increment in WordDelimiterFilter.
On 19 January 2016 at 05:41, Modassar Ather wrote:

> Thanks Shawn for your explanation.
>
> > Everything else about the analysis looks correct to me, and the
> > positions you see are needed for a phrase query to work correctly.
>
> Here "WiFi device" will not be found, as there is a gap in between
> because Fi is at position 2. The document containing WiFi device should
> be seen as a phrase with no word in between, hence it should match the
> phrase "WiFi device", but it will not, whereas "WiFi device"~1 will match.

Let's try to summarise in detail, as this is quite confusing:

1) Index: "WiFi device" is tokenized as you described:

   WiFi    1
   Wi      1
   WiFi    1
   Fi      2
   device  3

2) Query time, with simple whitespace tokenization: "WiFi device" becomes
   [ WiFi(0) device(1) ]

In this case, exactly what you described will happen: the query expects
device immediately after WiFi, but in the index device is two positions
after WiFi, so the phrase only matches with a slop of 1.

I should take a look at an old message in the mailing list; I am pretty
sure we have already had this very same discussion.
The problem with word expansion is that whatever you do, you are going to
get some side effects.

Cheers

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience - 1794 England
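The mismatch summarised above can be reproduced with a small, self-contained sketch. This is not Lucene code: `build_index` and `phrase_matches` are hypothetical helpers, and the slop handling is a deliberately simplified model that only allows in-order gaps (Lucene's real sloppy phrase scoring is more general). The token positions are the ones reported in the thread.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each term to the set of positions at which it occurs."""
    index = defaultdict(set)
    for term, position in tokens:
        index[term].add(position)
    return index


def phrase_matches(index, terms, slop=0):
    """Simplified in-order phrase match (an assumption, not Lucene's algorithm).

    Each query term must occur after the previous one, and the total number
    of skipped positions across the phrase must not exceed `slop`.
    """
    if not terms:
        return False

    def extend(i, previous, budget):
        if i == len(terms):
            return True
        for position in index.get(terms[i], ()):
            gap = position - previous - 1
            if 0 <= gap <= budget and extend(i + 1, position, budget - gap):
                return True
        return False

    return any(extend(1, start, slop) for start in index.get(terms[0], ()))


# Index-time tokens for "WiFi device", as reported in the thread.
doc_index = build_index(
    [("WiFi", 1), ("Wi", 1), ("WiFi", 1), ("Fi", 2), ("device", 3)]
)

print(phrase_matches(doc_index, ["WiFi", "device"]))          # False
print(phrase_matches(doc_index, ["WiFi", "device"], slop=1))  # True
print(phrase_matches(doc_index, ["Fi", "device"]))            # True
```

The exact phrase fails only because device sits two positions after WiFi; the sub-token Fi is adjacent to device, which is why "Fi device" still matches.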
Re: Position increment in WordDelimiterFilter.
On 1/18/2016 6:21 AM, Modassar Ather wrote:
> > Can you please send us tokens you get (and positions) when you analyze
> > *WiFi device*
>
> Tokens generated and their respective positions.
>
> WiFi    1
> Wi      1
> WiFi    1
> Fi      2
> device  3

It seems very odd to me that the original value would show up twice with
the preserveOriginal parameter set, but I am seeing the same behavior on
4.7 and 5.3. Because both copies are at the same position, this will
not affect search, but will slightly affect relevance if you are not
specifying a sort parameter. Everything else about the analysis looks
correct to me, and the positions you see are needed for a phrase query
to work correctly.

I have seen working configurations where preserveOriginal is set on the
index analysis but NOT set on query analysis. This is how my own schema
is configured. One of the reasons for this configuration is to reduce
the number of terms in the query so it is faster than it would be if
preserveOriginal were present and generated additional terms. The
preserveOriginal on the index side ensures a match whether mixed case is
used or not.

Thanks,
Shawn
Re: Position increment in WordDelimiterFilter.
On Fri, Jan 15, 2016 at 6:25 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*

Tokens generated and their respective positions:

WiFi    1
Wi      1
WiFi    1
Fi      2
device  3

Best,
Modassar
Re: Position increment in WordDelimiterFilter.
Thanks Shawn for your explanation.

On Mon, Jan 18, 2016 at 7:57 PM, Shawn Heisey wrote:
> Everything else about the analysis looks correct to me, and the
> positions you see are needed for a phrase query to work correctly.

Here "WiFi device" will not be found, as there is a gap in between
because Fi is at position 2. The document containing WiFi device should
be seen as a phrase with no word in between, hence it should match the
phrase "WiFi device", but it will not, whereas "WiFi device"~1 will match.

Best,
Modassar
Re: Position increment in WordDelimiterFilter.
Can you please send us the tokens you get (and their positions) when you
analyze *WiFi device*?

On 15.01.2016 13:15, Modassar Ather wrote:
> Wi Fi are two terms which will match, but what happens if content
> containing *WiFi device* is searched with *"WiFi device"*? It will not
> match, as there is a position increment by WordDelimiterFilter for WiFi.
> "WiFi device"~1 will match, which is confusing: there is no gap in the
> content, so why is a slop required?

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
Re: Position increment in WordDelimiterFilter.
On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?

I am using WhitespaceTokenizer in my analysis chain, so "Wi Fi" becomes
two different tokens. Please refer to the examples given in my previous
mail about the issues faced.
Wi Fi are two terms which will match, but what happens if content
containing *WiFi device* is searched with *"WiFi device"*? It will not
match, as there is a position increment by WordDelimiterFilter for WiFi.
"WiFi device"~1 will match, which is confusing: there is no gap in the
content, so why is a slop required?

> Why do you use WordDelimiterFilter? Can you give us few examples where
> it is useful?

It is useful when text like *lucene-search documentation* is indexed with
WordDelimiterFilter. The first word is broken into two terms, lucene and
search, which helps to find the document for queries like
lucene documentation or search documentation.

Best,
Modassar
Re: Position increment in WordDelimiterFilter.
Modassar,
Are you saying that WiFi, Wi-Fi and Wi Fi should not match each other? Why
do you use WordDelimiterFilter? Can you give us a few examples where it is
useful?

Thanks,
Emir

On 15.01.2016 05:13, Modassar Ather wrote:
> Thanks for your responses.
>
> > It seems to me that you don't want to split on numbers.
>
> It is not with numbers only. Even if you try to analyze WiFi it will
> create 4 tokens, one of which will be at position 2. So basically the
> issue is with the position increment, which causes a few of the queries
> to behave unexpectedly.
>
> > Which release of Solr are you using?
>
> I am using Lucene/Solr-5.4.0.
>
> Best,
> Modassar

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
Re: Position increment in WordDelimiterFilter.
I've tried out your settings and here's what I get:

3d  1
3   1
d   2
3d  2

1) Can you confirm if you've made a typo while typing out your results?
2) You'll get the d and 3d at position 2 since they're the 2nd token once
   3d is split.

Try the same thing with d3 and you'll get 3 and d3 at position 2.

On Thu, 14 Jan 2016, 15:11 Emir Arnautovic wrote:
> Hi Modassar,
> Why do you think it should be at position 1? In that case searching for
> "3 d" would not find anything. Is it what you expect?
>
> Thanks,
> Emir

--
Regards,
Binoy Dalal
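To make the "2nd token once 3d is split" point concrete, here is a toy model of the position assignment. It is an illustration only, not the Lucene WordDelimiterFilter: `toy_word_delimiter` is a hypothetical function, the split rule (digit/letter and case-change boundaries) is a rough assumption, and the placement of the catenated token follows the output reported in this message, which differs from the output in the original question.

```python
import re


def toy_word_delimiter(token, start=1):
    """Return (term, position) pairs for one input token.

    Assumptions of this toy model:
      * parts are runs of digits, or letters split at a lower-to-upper
        case change;
      * the preserved original shares the position of the first part;
      * each further part advances the position by one;
      * the catenated form is placed at the position of the last part.
    """
    parts = re.findall(r"[0-9]+|[A-Z]?[a-z]+|[A-Z]+", token)
    if len(parts) < 2:
        return [(token, start)]  # nothing to split

    tokens = [(token, start)]  # preserved original
    for offset, part in enumerate(parts):
        tokens.append((part, start + offset))
    tokens.append(("".join(parts), start + len(parts) - 1))  # catenated
    return tokens


print(toy_word_delimiter("3d"))  # [('3d', 1), ('3', 1), ('d', 2), ('3d', 2)]
print(toy_word_delimiter("d3"))  # [('d3', 1), ('d', 1), ('3', 2), ('d3', 2)]
```

In both cases the second part lands at position 2 simply because it is the second sub-token, regardless of whether it is a letter or a digit.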
Re: Position increment in WordDelimiterFilter.
Hi Modassar,
Why do you think it should be at position 1? In that case searching for
"3 d" would not find anything. Is that what you expect?

Thanks,
Emir

On 14.01.2016 10:15, Modassar Ather wrote:
> Hi,
>
> I have following definition for WordDelimiterFilter.
>
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>
> The analysis of 3d shows following four tokens and their positions.
>
> token position
> 3d    1
> 3     1
> 3d    1
> d     2
>
> Please help me understand why d is at 2? Should not it also be at
> position 1.
> Is it a bug and if not is there any attribute which I can use to restrict
> the position increment?
>
> Thanks,
> Modassar

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
Re: Position increment in WordDelimiterFilter.
On Thu, 14 Jan 2016, 18:12 Modassar Ather wrote:
> Irrespective of it what I want to understand why there is an increment
> in position. Should not all the terms be at same position as they are
> yielded from the same term/token?

No, they won't. The positions are incremented because these splits are
typically used in phrase queries, which Solr might autogenerate or which
you might have enabled.
For a phrase query to work in a case such as 3d, Solr needs to know that 3
comes before d and not the other way around. If all the positions were the
same, Solr would not be able to tell that "3 d" could be a phrase, and
hence would not be able to query it as such.

I hope that you understand what I'm trying to say.

--
Regards,
Binoy Dalal
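The ordering argument above can be checked with a minimal sketch. The helper `exact_phrase` is hypothetical and models only exact (slop 0) phrase matching over (term, position) pairs; the "flat" layout is the alternative proposed in the question, where every sub-token of 3d would share position 1.

```python
def exact_phrase(tokens, terms):
    """True if `terms` occur at consecutive positions in `tokens`.

    `tokens` is a list of (term, position) pairs; this is a simplified
    stand-in for an exact phrase query, not Lucene's implementation.
    """
    positions = {}
    for term, position in tokens:
        positions.setdefault(term, set()).add(position)
    return any(
        all(start + i in positions.get(term, ()) for i, term in enumerate(terms))
        for start in positions.get(terms[0], ())
    )


# Positions as reported in the thread: d follows 3.
incremented = [("3d", 1), ("3", 1), ("3d", 1), ("d", 2)]
# Hypothetical alternative: every sub-token at the same position.
flat = [("3d", 1), ("3", 1), ("3d", 1), ("d", 1)]

print(exact_phrase(incremented, ["3", "d"]))  # True
print(exact_phrase(incremented, ["d", "3"]))  # False
print(exact_phrase(flat, ["3", "d"]))         # False
```

With incremented positions the index records that 3 precedes d, so "3 d" matches and "d 3" does not. With flat positions that ordering is lost and the phrase "3 d" cannot match at all.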
Re: Position increment in WordDelimiterFilter.
Thanks for your responses.

On Thu, 14 Jan 2016, 15:11 Emir Arnautovic wrote:
> Why do you think it should be at position 1? In that case searching for
> "3 d" would not find anything. Is it what you expect?

During search, some of the results returned are not wanted. Here is an
example.
Search query: "3d image"
Results containing 3-d image / 3 d image / 1d image are also returned.
This is happening because of the position increment.
Another example is "1d obj*" returning results related to "d-object". This
can bring back a completely different item. Here the token d matches the d
of d-object, as that term is split the same way.
The position increment will also cause the "3d image" search to fail on a
document containing "3d image", as the "d" comes at position 2.

On Thu, Jan 14, 2016 at 3:25 PM, Binoy Dalal wrote:
> 1) can you confirm if you've made a typo while typing out your results?

I have checked the position attribute displayed on the analysis page and
found there is no typo.

> 2) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
> split.

Irrespective of that, what I want to understand is why there is an
increment in position. Should not all the terms be at the same position,
as they are yielded from the same term/token?

Best,
Modassar
Re: Position increment in WordDelimiterFilter.
Hi,
It seems to me that you don't want to split on numbers. Maybe there are
other cases where you need to, so it is turned on. If there are such
cases, I would suggest you create tests with expectations so you can check
what works best for you. It is highly likely that you will not be able to
create a solution that suits all cases, so you will have to make some
tradeoffs.

Emir

On 14.01.2016 13:42, Modassar Ather wrote:
> During search some of the results returned are not wanted. Following is
> the example.
> Search query: "3d image"
> Search results with 3-d image/3 d image/1d image are also returned. This
> is happening because of position increment.
> Another example is "1d obj*" returning results containing "d-object"
> related results. This can bring a completely different search item. Here
> the token d matches with d of d-object as this term is again split same
> way.
> The position increment will also cause the "3d image" search fail on a
> document containing "3d image" as the "d" comes at position 2.

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
Re: Position increment in WordDelimiterFilter.
Which release of Solr are you using? Last year (or so) there was a Lucene
change that had the effect of keeping all terms for WDF at the same
position. There was also some discussion about whether this was either a
bug or a bug fix, but I don't recall any resolution.

-- Jack Krupansky

On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather wrote:
> Hi,
>
> I have following definition for WordDelimiterFilter.
>
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>
> The analysis of 3d shows following four tokens and their positions.
>
> token position
> 3d    1
> 3     1
> 3d    1
> d     2
>
> Please help me understand why d is at 2? Should not it also be at
> position 1.
> Is it a bug and if not is there any attribute which I can use to restrict
> the position increment?
>
> Thanks,
> Modassar
Re: Position increment in WordDelimiterFilter.
Thanks for your responses.

> It seems to me that you don't want to split on numbers.

It is not with numbers only. Even if you try to analyze WiFi it will
create 4 tokens, one of which will be at position 2. So basically the
issue is with the position increment, which causes a few of the queries to
behave unexpectedly.

On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky wrote:
> Which release of Solr are you using?

I am using Lucene/Solr-5.4.0.

Best,
Modassar