Re: Position increment in WordDelimiterFilter.

2016-01-20 Thread Alessandro Benedetti
On 19 January 2016 at 05:41, Modassar Ather  wrote:

> Thanks Shawn for your explanation.
>
> Everything else about the analysis looks
> correct to me, and the positions you see are needed for a phrase query
> to work correctly.
>
> Here the "WiFi device" will not be searched as there is a gap in between
> because Fi is at position 2. The document containing WiFi device will be
> seen as a phrase with no word in between hence it should match phrase "WiFi
> device" but it will not whereas "WiFi device"~1 will matched.
>
Let's try to summarise in detail, as this is quite confusing:

1) Index time: "WiFi device"
tokenized as you described:
[
WiFi    1
Wi      1
WiFi    1
Fi      2
device  3
]

2) Query time, simple whitespace tokenization: "WiFi device"
[
WiFi(0)
device(1)
]

In this case, exactly what you described will happen.
I should look up an old message on the mailing list; I'm pretty sure we
have had this very same discussion before.
The problem with word expansion is that, whatever you do, you are going to
get some side effects.
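
To make the mismatch concrete, here is a minimal sketch in Python. It is a
simplified two-term, in-order model of phrase matching, not Lucene's actual
phrase scorer; the helper names are hypothetical. The token positions are
the ones reported earlier in this thread.

```python
from itertools import product

# Index-time tokens and positions reported in the thread for "WiFi device".
index_tokens = [("WiFi", 1), ("Wi", 1), ("WiFi", 1), ("Fi", 2), ("device", 3)]


def positions(tokens, term):
    """Return the sorted, de-duplicated positions at which `term` occurs."""
    return sorted({pos for text, pos in tokens if text == term})


def phrase_matches(tokens, first, second, slop=0):
    """Simplified check: `second` must follow `first`, and the number of
    extra positions between them (beyond adjacency) must not exceed `slop`."""
    for p1, p2 in product(positions(tokens, first), positions(tokens, second)):
        extra_gap = p2 - p1 - 1
        if 0 <= extra_gap <= slop:
            return True
    return False


print(phrase_matches(index_tokens, "WiFi", "device"))          # "WiFi device"
print(phrase_matches(index_tokens, "WiFi", "device", slop=1))  # "WiFi device"~1
```

WiFi sits at position 1 and device at position 3, so under this model the
exact phrase finds no adjacent pair, while a slop of 1 absorbs the gap left
by Fi at position 2.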

Cheers

> Best,
> Modassar
>
> On Mon, Jan 18, 2016 at 7:57 PM, Shawn Heisey  wrote:
>
> > On 1/18/2016 6:21 AM, Modassar Ather wrote:
> > > Can you please send us tokens you get (and positions) when you analyze
> > > *WiFi device*
> > >
> > > Tokens generated and their respective positions.
> > >
> > > WiFi    1
> > > Wi      1
> > > WiFi    1
> > > Fi      2
> > > device  3
> >
> > It seems very odd to me that the original value would show up twice with
> > the preserveOriginal parameter set, but I am seeing the same behavior on
> > 4.7 and 5.3.  Because both copies are at the same position, this will
> > not affect search, but will slightly affect relevance if you are not
> > specifying a sort parameter.  Everything else about the analysis looks
> > correct to me, and the positions you see are needed for a phrase query
> > to work correctly.
> >
> > I have seen working configurations where preserveOriginal is set on the
> > index analysis but NOT set on query analysis.  This is how my own schema
> > is configured.  One of the reasons for this configuration is to reduce
> > the number of terms in the query so it is faster than it would be if
> > preserveOriginal were present and generated additional terms.  The
> > preserveOriginal on the index side ensures a match whether mixed case is
> > used or not.
> >
> > Thanks,
> > Shawn
> >
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Position increment in WordDelimiterFilter.

2016-01-18 Thread Shawn Heisey
On 1/18/2016 6:21 AM, Modassar Ather wrote:
> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*
>
> Tokens generated and their respective positions.
>
> WiFi    1
> Wi      1
> WiFi    1
> Fi      2
> device  3

It seems very odd to me that the original value would show up twice with
the preserveOriginal parameter set, but I am seeing the same behavior on
4.7 and 5.3.  Because both copies are at the same position, this will
not affect search, but will slightly affect relevance if you are not
specifying a sort parameter.  Everything else about the analysis looks
correct to me, and the positions you see are needed for a phrase query
to work correctly.
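
As a rough illustration of the duplicate-token point, here is a small Python
sketch. It is not Solr code: a raw occurrence count stands in for term
frequency, which is an assumption and a simplification of how scoring
really works.

```python
from collections import Counter

# Tokens and positions from the analysis output quoted above.
tokens = [("WiFi", 1), ("Wi", 1), ("WiFi", 1), ("Fi", 2), ("device", 3)]

# Raw occurrence count per term: the duplicated original is counted twice.
term_freq = Counter(text for text, _ in tokens)

# Distinct (term, position) pairs: the duplicate adds nothing new here,
# so position-based matching sees the same set either way.
distinct_postings = {(text, pos) for text, pos in tokens}

print(term_freq["WiFi"])        # 2
print(len(distinct_postings))   # 4
```

The duplicate leaves the set of matchable (term, position) pairs unchanged
but inflates the count for WiFi, which is why matching is unaffected while
relevance can shift slightly.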

I have seen working configurations where preserveOriginal is set on the
index analysis but NOT set on query analysis.  This is how my own schema
is configured.  One of the reasons for this configuration is to reduce
the number of terms in the query so it is faster than it would be if
preserveOriginal were present and generated additional terms.  The
preserveOriginal on the index side ensures a match whether mixed case is
used or not.

Thanks,
Shawn



Re: Position increment in WordDelimiterFilter.

2016-01-18 Thread Modassar Ather
> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*

Tokens generated and their respective positions.

WiFi    1
Wi      1
WiFi    1
Fi      2
device  3

Best,
Modassar

On Fri, Jan 15, 2016 at 6:25 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*
>
> On 15.01.2016 13:15, Modassar Ather wrote:
>
>> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?
>> I am using WhiteSpaceTokenizer in my analysis chain so wi fi becomes two
>> different token. Please refer to my examples given in previous mail about
>> the issues faced.
>> Wi Fi are two term which will match but what happens if for a content
>> having *WiFi device* is searched with *"WiFi device"*. It will not match
>> as
>> there is a position increment by WordDelimiterFilter for WiFi.
>> "WiFi device"~1 will match which is confusing that there is no gap in the
>> content why a slop is required.
>>
>> Why do you use WordDelimiterFilter? Can you give us few examples where it
>> is useful?
>> It is useful when a word like* lucene-search documentation *is indexed
>> with
>>
>> WordDelimiterFilter and it is broken in two terms like lucene and search
>> then it will be helpful to get the documents containing it for queries
>> like
>> lucene documentation or search documentation.
>>
>> Best,
>> Modassar
>>
>> On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
>> emir.arnauto...@sematext.com> wrote:
>>
>>> Modassar,
>>> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why
>>> do you use WordDelimiterFilter? Can you give us few examples where it is
>>> useful?
>>>
>>> Thanks,
>>> Emir
>>>
>>> On 15.01.2016 05:13, Modassar Ather wrote:
>>>
>>>> Thanks for your responses.
>>>>
>>>> It seems to me that you don't want to split on numbers.
>>>> It is not with number only. Even if you try to analyze WiFi it will create
>>>> 4 token one of which will be at position 2. So basically the issue is with
>>>> position increment which causes few of the queries behave unexpectedly.
>>>>
>>>> Which release of Solr are you using?
>>>> I am using Lucene/Solr-5.4.0.
>>>>
>>>> Best,
>>>> Modassar
>>>>
>>>> On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky <
>>>> jack.krupan...@gmail.com> wrote:
>>>>
>>>>> Which release of Solr are you using? Last year (or so) there was a Lucene
>>>>> change that had the effect of keeping all terms for WDF at the same
>>>>> position. There was also some discussion about whether this was either a
>>>>> bug or a bug fix, but I don't recall any resolution.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather <
>>>>> modather1...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have following definition for WordDelimiterFilter.
>>>>>>
>>>>>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>>>> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>>>>>>
>>>>>> The analysis of 3d shows following four tokens and their positions.
>>>>>>
>>>>>> token position
>>>>>> 3d 1
>>>>>> 3   1
>>>>>> 3d 1
>>>>>> d   2
>>>>>>
>>>>>> Please help me understand why d is at 2? Should not it also be at
>>>>>> position 1.
>>>>>> Is it a bug and if not is there any attribute which I can use to
>>>>>> restrict the position increment?
>>>>>>
>>>>>> Thanks,
>>>>>> Modassar
>>>
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Position increment in WordDelimiterFilter.

2016-01-18 Thread Modassar Ather
Thanks Shawn for your explanation.

Everything else about the analysis looks
correct to me, and the positions you see are needed for a phrase query
to work correctly.

Here the phrase "WiFi device" will not match, because Fi at position 2
leaves a gap between WiFi and device. The document contains WiFi device as
a phrase with no word in between, so it should match the phrase query
"WiFi device". Instead it does not, whereas "WiFi device"~1 does match.

Best,
Modassar

On Mon, Jan 18, 2016 at 7:57 PM, Shawn Heisey  wrote:

> On 1/18/2016 6:21 AM, Modassar Ather wrote:
> > Can you please send us tokens you get (and positions) when you analyze
> > *WiFi device*
> >
> > Tokens generated and their respective positions.
> >
> > WiFi    1
> > Wi      1
> > WiFi    1
> > Fi      2
> > device  3
>
> It seems very odd to me that the original value would show up twice with
> the preserveOriginal parameter set, but I am seeing the same behavior on
> 4.7 and 5.3.  Because both copies are at the same position, this will
> not affect search, but will slightly affect relevance if you are not
> specifying a sort parameter.  Everything else about the analysis looks
> correct to me, and the positions you see are needed for a phrase query
> to work correctly.
>
> I have seen working configurations where preserveOriginal is set on the
> index analysis but NOT set on query analysis.  This is how my own schema
> is configured.  One of the reasons for this configuration is to reduce
> the number of terms in the query so it is faster than it would be if
> preserveOriginal were present and generated additional terms.  The
> preserveOriginal on the index side ensures a match whether mixed case is
> used or not.
>
> Thanks,
> Shawn
>
>


Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Emir Arnautovic
Can you please send us tokens you get (and positions) when you analyze 
*WiFi device*


On 15.01.2016 13:15, Modassar Ather wrote:

Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?
I am using WhiteSpaceTokenizer in my analysis chain so wi fi becomes two
different token. Please refer to my examples given in previous mail about
the issues faced.
Wi Fi are two term which will match but what happens if for a content
having *WiFi device* is searched with *"WiFi device"*. It will not match as
there is a position increment by WordDelimiterFilter for WiFi.
"WiFi device"~1 will match which is confusing that there is no gap in the
content why a slop is required.

Why do you use WordDelimiterFilter? Can you give us few examples where it
is useful?
It is useful when a word like* lucene-search documentation *is indexed with
WordDelimiterFilter and it is broken in two terms like lucene and search
then it will be helpful to get the documents containing it for queries like
lucene documentation or search documentation.

Best,
Modassar

On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Modassar,
Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why
do you use WordDelimiterFilter? Can you give us few examples where it is
useful?

Thanks,
Emir


On 15.01.2016 05:13, Modassar Ather wrote:


Thanks for your responses.

It seems to me that you don't want to split on numbers.
It is not with number only. Even if you try to analyze WiFi it will create
4 token one of which will be at position 2. So basically the issue is with
position increment which causes few of the queries behave unexpectedly.

Which release of Solr are you using?
I am using Lucene/Solr-5.4.0.

Best,
Modassar

On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky 
wrote:

Hi,

I have following definition for WordDelimiterFilter.



The analysis of 3d shows following four tokens and their positions.

token position
3d 1
3   1
3d 1
d   2

Please help me understand why d is at 2? Should not it also be at position
1.
Is it a bug and if not is there any attribute which I can use to restrict
the position increment?

Thanks,
Modassar



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Modassar Ather
> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?
I am using WhitespaceTokenizer in my analysis chain, so "wi fi" becomes two
different tokens. Please refer to the examples in my previous mail about
the issues faced.
Wi and Fi are two terms which will match. But what happens when content
containing *WiFi device* is searched with *"WiFi device"*? It will not
match, because WordDelimiterFilter adds a position increment within WiFi.
"WiFi device"~1 will match, which is confusing: there is no gap in the
content, so why is a slop required?

> Why do you use WordDelimiterFilter? Can you give us few examples where it
> is useful?
It is useful when text like *lucene-search documentation* is indexed with
WordDelimiterFilter and split into the two terms lucene and search.
Documents containing it can then be found with queries like
lucene documentation or search documentation.

Best,
Modassar

On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Modassar,
> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why
> do you use WordDelimiterFilter? Can you give us few examples where it is
> useful?
>
> Thanks,
> Emir
>
>
> On 15.01.2016 05:13, Modassar Ather wrote:
>
>> Thanks for your responses.
>>
>> It seems to me that you don't want to split on numbers.
>> It is not with number only. Even if you try to analyze WiFi it will create
>> 4 token one of which will be at position 2. So basically the issue is with
>> position increment which causes few of the queries behave unexpectedly.
>>
>> Which release of Solr are you using?
>> I am using Lucene/Solr-5.4.0.
>>
>> Best,
>> Modassar
>>
>> On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky
>> wrote:
>>
>>> Which release of Solr are you using? Last year (or so) there was a Lucene
>>> change that had the effect of keeping all terms for WDF at the same
>>> position. There was also some discussion about whether this was either a
>>> bug or a bug fix, but I don't recall any resolution.
>>>
>>> -- Jack Krupansky
>>>
>>> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have following definition for WordDelimiterFilter.
>>>>
>>>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>>>>
>>>> The analysis of 3d shows following four tokens and their positions.
>>>>
>>>> token position
>>>> 3d 1
>>>> 3   1
>>>> 3d 1
>>>> d   2
>>>>
>>>> Please help me understand why d is at 2? Should not it also be at
>>>> position 1.
>>>> Is it a bug and if not is there any attribute which I can use to
>>>> restrict the position increment?
>>>>
>>>> Thanks,
>>>> Modassar


> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Emir Arnautovic

Modassar,
Are you saying that WiFi, Wi-Fi and Wi Fi should not match each other?
Why do you use WordDelimiterFilter? Can you give us a few examples where
it is useful?


Thanks,
Emir

On 15.01.2016 05:13, Modassar Ather wrote:

Thanks for your responses.

It seems to me that you don't want to split on numbers.
It is not with number only. Even if you try to analyze WiFi it will create
4 token one of which will be at position 2. So basically the issue is with
position increment which causes few of the queries behave unexpectedly.

Which release of Solr are you using?
I am using Lucene/Solr-5.4.0.

Best,
Modassar

On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky 
wrote:


Which release of Solr are you using? Last year (or so) there was a Lucene
change that had the effect of keeping all terms for WDF at the same
position. There was also some discussion about whether this was either a
bug or a bug fix, but I don't recall any resolution.

-- Jack Krupansky

On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather 
wrote:


Hi,

I have following definition for WordDelimiterFilter.



The analysis of 3d shows following four tokens and their positions.

token position
3d 1
3   1
3d 1
d   2

Please help me understand why d is at 2? Should not it also be at position
1.
Is it a bug and if not is there any attribute which I can use to restrict
the position increment?

Thanks,
Modassar



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Binoy Dalal
I've tried out your settings and here's what I get:
3d 1
3   1
d   2
3d 2

1) Can you confirm whether you made a typo while typing out your results?
2) You'll get d and 3d at position 2, since they're the 2nd token once 3d
is split.
Try the same thing with d3 and you'll get 3 and d3 at position 2.

On Thu, 14 Jan 2016, 15:11 Emir Arnautovic 
wrote:

> Hi Modassar,
> Why do you think it should be at position 1? In that case searching for
> "3 d" would not find anything. Is it what you expect?
>
> Thanks,
> Emir
>
> On 14.01.2016 10:15, Modassar Ather wrote:
> > Hi,
> >
> > I have following definition for WordDelimiterFilter.
> >
> >  > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
> >
> > The analysis of 3d shows following four tokens and their positions.
> >
> > token position
> > 3d 1
> > 3   1
> > 3d 1
> > d   2
> >
> > Please help me understand why d is at 2? Should not it also be at
> position
> > 1.
> > Is it a bug and if not is there any attribute which I can use to restrict
> > the position increment?
> >
> > Thanks,
> > Modassar
> >
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> --
Regards,
Binoy Dalal


Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Emir Arnautovic

Hi Modassar,
Why do you think it should be at position 1? In that case searching for
"3 d" would not find anything. Is that what you expect?


Thanks,
Emir

On 14.01.2016 10:15, Modassar Ather wrote:

Hi,

I have following definition for WordDelimiterFilter.



The analysis of 3d shows following four tokens and their positions.

token position
3d 1
3   1
3d 1
d   2

Please help me understand why d is at 2? Should not it also be at position
1.
Is it a bug and if not is there any attribute which I can use to restrict
the position increment?

Thanks,
Modassar



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Binoy Dalal
Irrespective of it what I want to understand why there is an increment in
position. Should not all the terms be at same position as they are yielded
from the same term/token?

No, they won't be.
The positions are incremented because these splits are typically used in
phrase queries, which Solr might autogenerate or which you might have
enabled. For a phrase query to work in a case such as 3d, Solr needs to
know that 3 comes before d and not the other way around.
If all the positions were the same, Solr would not be able to tell that 3d
could be a phrase, and hence would not be able to query it as such.
I hope that makes clear what I'm trying to say.
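
A small Python sketch of the idea, in case it helps. This is a toy model of
exact phrase matching over positions, not Solr's implementation, and the
function name is made up for illustration.

```python
def exact_phrase(positions_by_term, terms):
    """Return True if `terms` occur at consecutive positions, in order."""
    first, rest = terms[0], terms[1:]
    for start in positions_by_term.get(first, set()):
        if all(start + offset in positions_by_term.get(term, set())
               for offset, term in enumerate(rest, start=1)):
            return True
    return False


# Parts of "3d" with a position increment between them.
incremented = {"3": {1}, "d": {2}}
# Hypothetical alternative: both parts stacked at the same position.
stacked = {"3": {1}, "d": {1}}

print(exact_phrase(incremented, ["3", "d"]))  # True: 3 is followed by d
print(exact_phrase(incremented, ["d", "3"]))  # False: order is preserved
print(exact_phrase(stacked, ["3", "d"]))      # False: no order to match on
```

With the increment, the order of 3 and d is recorded and the phrase "3 d"
can match. With both parts stacked at one position, this toy matcher has
nothing to tell the two orders apart.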

On Thu, 14 Jan 2016, 18:12 Modassar Ather  wrote:

> Thanks for your responses.
>
> Why do you think it should be at position 1? In that case searching for "3
> d" would not find anything. Is it what you expect?
> During search some of the results returned are not wanted. Following is the
> example.
> Search query: "3d image"
> Search results with 3-d image/3 d image/1d image are also returned. This is
> happening because of position increment.
> Another example is "1d obj*" returning results containing "d-object"
> related results. This can bring a completely different search item. Here
> the token d matches with d of d-object as this term is again split same
> way.
> The position increment will also cause the "3d image" search fail on a
> document containing "3d image" as the "d" comes at position 2.
>
> 1) can you confirm if you've made a typo while typing out your results?
> I have confirmed the position attribute displayed on analysis page and I
> found there is no typo.
> 2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
> split.
> Irrespective of it what I want to understand why there is an increment in
> position. Should not all the terms be at same position as they are yielded
> from the same term/token?
>
> Best,
> Modassar
>
> On Thu, Jan 14, 2016 at 3:25 PM, Binoy Dalal 
> wrote:
>
> > I've tried out your settings and here's what I get:
> > 3d 1
> > 3   1
> > d   2
> > 3d 2
> >
> > 1) can you confirm if you've made a typo while typing out your results?
> > 2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
> > split.
> > Try the same thing with d3 and you'll get 3 and d3 at position 2
> >
> > On Thu, 14 Jan 2016, 15:11 Emir Arnautovic  >
> > wrote:
> >
> > > Hi Modassar,
> > > Why do you think it should be at position 1? In that case searching for
> > > "3 d" would not find anything. Is it what you expect?
> > >
> > > Thanks,
> > > Emir
> > >
> > > On 14.01.2016 10:15, Modassar Ather wrote:
> > > > Hi,
> > > >
> > > > I have following definition for WordDelimiterFilter.
> > > >
> > > >  > > > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > > > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
> > > >
> > > > The analysis of 3d shows following four tokens and their positions.
> > > >
> > > > token position
> > > > 3d 1
> > > > 3   1
> > > > 3d 1
> > > > d   2
> > > >
> > > > Please help me understand why d is at 2? Should not it also be at
> > > position
> > > > 1.
> > > > Is it a bug and if not is there any attribute which I can use to
> > restrict
> > > > the position increment?
> > > >
> > > > Thanks,
> > > > Modassar
> > > >
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal


Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Thanks for your responses.

> Why do you think it should be at position 1? In that case searching for "3
> d" would not find anything. Is it what you expect?
Some of the results returned during search are unwanted. For example:
Search query: "3d image"
Results containing 3-d image/3 d image/1d image are also returned. This
happens because of the position increment.
Another example is "1d obj*", which returns results related to "d-object".
This can bring in a completely different search item. Here the token d
matches the d of d-object, as that term is split the same way.
The position increment will also cause the "3d image" search to fail on a
document containing "3d image", as the "d" comes at position 2.

> 1) can you confirm if you've made a typo while typing out your results?
I have checked the position attribute displayed on the analysis page and
found there is no typo.
> 2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
> split.
Irrespective of that, what I want to understand is why there is an
increment in position. Should not all the terms be at the same position,
as they are yielded from the same term/token?

Best,
Modassar

On Thu, Jan 14, 2016 at 3:25 PM, Binoy Dalal  wrote:

> I've tried out your settings and here's what I get:
> 3d 1
> 3   1
> d   2
> 3d 2
>
> 1) can you confirm if you've made a typo while typing out your results?
> 2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
> split.
> Try the same thing with d3 and you'll get 3 and d3 at position 2
>
> On Thu, 14 Jan 2016, 15:11 Emir Arnautovic 
> wrote:
>
> > Hi Modassar,
> > Why do you think it should be at position 1? In that case searching for
> > "3 d" would not find anything. Is it what you expect?
> >
> > Thanks,
> > Emir
> >
> > On 14.01.2016 10:15, Modassar Ather wrote:
> > > Hi,
> > >
> > > I have following definition for WordDelimiterFilter.
> > >
> > >  > > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
> > >
> > > The analysis of 3d shows following four tokens and their positions.
> > >
> > > token position
> > > 3d 1
> > > 3   1
> > > 3d 1
> > > d   2
> > >
> > > Please help me understand why d is at 2? Should not it also be at
> > position
> > > 1.
> > > Is it a bug and if not is there any attribute which I can use to
> restrict
> > > the position increment?
> > >
> > > Thanks,
> > > Modassar
> > >
> >
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> > --
> Regards,
> Binoy Dalal
>


Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Emir Arnautovic

Hi,
It seems to me that you don't want to split on numbers. Maybe there are
other cases where you need to, so it is turned on. If there are such
cases, I would suggest you create tests with expectations so you can check
what works best for you. It is highly likely that you will not be able to
create a solution that suits all cases, so you will have to make some
tradeoffs.


Emir

On 14.01.2016 13:42, Modassar Ather wrote:

Thanks for your responses.

Why do you think it should be at position 1? In that case searching for "3
d" would not find anything. Is it what you expect?
During search some of the results returned are not wanted. Following is the
example.
Search query: "3d image"
Search results with 3-d image/3 d image/1d image are also returned. This is
happening because of position increment.
Another example is "1d obj*" returning results containing "d-object"
related results. This can bring a completely different search item. Here
the token d matches with d of d-object as this term is again split same way.
The position increment will also cause the "3d image" search fail on a
document containing "3d image" as the "d" comes at position 2.

1) can you confirm if you've made a typo while typing out your results?
I have confirmed the position attribute displayed on analysis page and I
found there is no typo.
2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
split.
Irrespective of it what I want to understand why there is an increment in
position. Should not all the terms be at same position as they are yielded
from the same term/token?

Best,
Modassar

On Thu, Jan 14, 2016 at 3:25 PM, Binoy Dalal  wrote:


I've tried out your settings and here's what I get:
3d 1
3   1
d   2
3d 2

1) can you confirm if you've made a typo while typing out your results?
2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is
split.
Try the same thing with d3 and you'll get 3 and d3 at position 2

On Thu, 14 Jan 2016, 15:11 Emir Arnautovic 
wrote:


Hi Modassar,
Why do you think it should be at position 1? In that case searching for
"3 d" would not find anything. Is it what you expect?

Thanks,
Emir

On 14.01.2016 10:15, Modassar Ather wrote:

Hi,

I have following definition for WordDelimiterFilter.



The analysis of 3d shows following four tokens and their positions.

token position
3d 1
3   1
3d 1
d   2

Please help me understand why d is at 2? Should not it also be at position
1.
Is it a bug and if not is there any attribute which I can use to restrict
the position increment?

Thanks,
Modassar


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

--

Regards,
Binoy Dalal



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Jack Krupansky
Which release of Solr are you using? Last year (or so) there was a Lucene
change that had the effect of keeping all terms for WDF at the same
position. There was also some discussion about whether this was either a
bug or a bug fix, but I don't recall any resolution.

-- Jack Krupansky

On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather 
wrote:

> Hi,
>
> I have following definition for WordDelimiterFilter.
>
>  generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>
> The analysis of 3d shows following four tokens and their positions.
>
> token position
> 3d 1
> 3   1
> 3d 1
> d   2
>
> Please help me understand why d is at 2? Should not it also be at position
> 1.
> Is it a bug and if not is there any attribute which I can use to restrict
> the position increment?
>
> Thanks,
> Modassar
>


Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Thanks for your responses.

> It seems to me that you don't want to split on numbers.
It is not only with numbers. Even if you analyze WiFi, it will create
4 tokens, one of which will be at position 2. So basically the issue is the
position increment, which causes a few of the queries to behave unexpectedly.

> Which release of Solr are you using?
I am using Lucene/Solr-5.4.0.

Best,
Modassar

On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky 
wrote:

> Which release of Solr are you using? Last year (or so) there was a Lucene
> change that had the effect of keeping all terms for WDF at the same
> position. There was also some discussion about whether this was either a
> bug or a bug fix, but I don't recall any resolution.
>
> -- Jack Krupansky
>
> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather 
> wrote:
>
> > Hi,
> >
> > I have following definition for WordDelimiterFilter.
> >
> >  > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
> >
> > The analysis of 3d shows following four tokens and their positions.
> >
> > token position
> > 3d 1
> > 3   1
> > 3d 1
> > d   2
> >
> > Please help me understand why d is at 2? Should not it also be at
> position
> > 1.
> > Is it a bug and if not is there any attribute which I can use to restrict
> > the position increment?
> >
> > Thanks,
> > Modassar
> >
>