subject:"wildcard search"

Re: wildcard search doesn't fetch results when field has white spaces and special characters

2019-03-31 Thread Erick Erickson

Yes, but you haven’t told us what _type_ of field you’re working with.
If it’s a “string” type, then ComplexPhraseQueryParser won’t work. Looking
again at your example, it looks as though you are using strings, so try
escaping the space with a backslash:
abc\ d*

Adding debug=query to your url will show you how the query gets parsed and may 
help considerably.

Best,
Erick
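
A minimal sketch of the advice above, assuming an untokenized string field: the space in the prefix is backslash-escaped and debug=query is added so the response shows how the query was parsed. The core name `mycore` and field name `name_s` are hypothetical, not taken from the thread.

```python
from urllib.parse import urlencode

def escape_whitespace(term: str) -> str:
    """Backslash-escape spaces so the query parser keeps the term whole."""
    return term.replace(" ", "\\ ")

def wildcard_url(base: str, field: str, prefix: str) -> str:
    # Prefix query on an untokenized field, with debug=query so the
    # response shows the parsed form of the query.
    q = f"{field}:{escape_whitespace(prefix)}*"
    return f"{base}/select?" + urlencode({"q": q, "debug": "query"})

# "mycore" and "name_s" are placeholder names for illustration only.
print(wildcard_url("http://localhost:8983/solr/mycore", "name_s", "ali a"))
```

The query sent to Solr is `name_s:ali\ a*`; urlencode only takes care of making it safe inside a URL.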
 

> On Mar 31, 2019, at 7:24 AM, Ahemad Ali wrote:
>
> Erick,
> I tried ComplexPhraseQueryParser and still get no results.
> By escaping whitespace, do you mean using "\"?
> Thanks,
> Ahemad

Re: wildcard search doesn't fetch results when field has white spaces and special characters

2019-03-31 Thread Ahemad Ali

Erick,
I tried ComplexPhraseQueryParser and still get no results.
By escaping whitespace, do you mean using "\"?
Thanks,
Ahemad

  On Sun, Mar 31, 2019 at 1:22, Erick Erickson wrote:  
 Try complexphrasequeryparser. If (and only if) you always want to search
from the beginning of the content, you might be able to use string rather
than text-based Fields but make sure to escape whitespace...

Best,
Erick

Re: wildcard search doesn't fetch results when field has white spaces and special characters

2019-03-30 Thread Erick Erickson

Try the ComplexPhraseQueryParser. If (and only if) you always want to search
from the beginning of the content, you might be able to use a string field
rather than a text-based one, but make sure to escape whitespace...

Best,
Erick
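
For a tokenized text field, the complex phrase parser suggested above is selected with local params. A small sketch of building such a query string; the field name `name_txt` is hypothetical, and whether the phrase matches still depends on how the field is analyzed.

```python
def complex_phrase_query(field: str, phrase: str, in_order: bool = True) -> str:
    """Build a {!complexphrase} query string for a tokenized text field."""
    order = "true" if in_order else "false"
    # Escape embedded double quotes so the phrase stays a single quoted unit.
    escaped = phrase.replace('"', '\\"')
    return f'{{!complexphrase inOrder={order}}}{field}:"{escaped}"'

# "name_txt" is a placeholder field name.
print(complex_phrase_query("name_txt", "ahemad a*"))
```

The resulting string is passed as the q parameter; the wildcard sits inside the quoted phrase, which the standard parser would not interpret.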

wildcard search doesn't fetch results when field has white spaces and special characters

2019-03-30 Thread ahemad.sh...@yahoo.com.INVALID

Hi,
I have a field containing white space and special characters that needs to be
indexed so that it supports wildcard queries.
Wildcard search works for most scenarios.
E.g. if my data is "ali.abc", "abc_pqr", "ali abc" and "ahemad ali", then a
search with ali* gives the three results containing "ali".

But I am not able to search with, say, ali a*

A search with the query q="ali abc" gives the exact match and the desired result.

I want to do a wildcard search where the criteria can include spaces, for
example "ahemad a*" or ahemad a*

i.e. if a space is present, the wildcard search does not work.

Is there any way to make wildcard search work even if a space is present in
the token?

The field type I have is below:

[fieldType XML stripped by the list archive; the surviving fragments are
sortMissingLast="true" and, in two places, replacement="" replace="all"]


Any help would be great.
Thanks,Ahemad Ali

Re: Edismax leading wildcard search

2018-12-07 Thread Erick Erickson

Well, the other option is to allow leading wildcards, but use
ReversedWildcardFilterFactory. Admittedly that increases the size of
your index, but apparently your users expect leading wildcards so why
not support them?

Best,
Erick

Re: Edismax leading wildcard search

2018-12-07 Thread Kudrettin Güleryüz

Hi,

I am also wondering how to disable leading wildcards in Solr. Can you
please suggest a way? I know that in Lucene it is controlled by a flag
that is set to false by default.

> Do it on the client side. Just don't allow leading asterisks or question
> marks in your query term.

This does not look trivial to me. A search query can be very complicated.
How do you suggest detecting leading wildcards in a complicated Lucene
query?

Thank you
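
One way the client-side check could look, as a rough sketch rather than a full query parser: it flags a term whose first character is * or ?, while letting `*:*`, bare `field:*`, range brackets and quoted phrases through. The regular expression and the function name are illustrative assumptions; local params, nested parsers and regex queries are not handled.

```python
import re

# A term starts at the beginning of the query, after whitespace, "(", "+",
# "-", or a "field:" prefix; flag it when its first character is * or ?
# and at least one more term character follows.
LEADING_WILDCARD = re.compile(r'(?:^|[\s(+\-]|\w:)[*?](?=[^\s)\]])')

def has_leading_wildcard(query: str) -> bool:
    # Drop quoted phrases first: their contents are not bare terms.
    unquoted = re.sub(r'"(?:\\.|[^"\\])*"', '""', query)
    # "*:*" (match all) and "[* TO *]" ranges are not leading wildcards.
    unquoted = unquoted.replace("*:*", "")
    unquoted = re.sub(r'\[[^\]]*\]', '[]', unquoted)
    return LEADING_WILDCARD.search(unquoted) is not None

print(has_leading_wildcard("title:?oo AND body:bar"))
print(has_leading_wildcard("foo* AND price:[* TO 10]"))
```

A check like this can reject the request before it reaches Solr; anything it cannot classify confidently is better passed through than blocked.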

Re: Enable default wildcard search

2017-12-29 Thread Mikhail Khludnev

Right. Applying EdgeNGram at index time only should stop a query for [3115]
from falsely matching 3116.
The log should contain an OutOfMemoryError (heap space or similar). That's
the cause of the crashes.
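
The false matches can be reproduced with a toy model of edge n-grams, sketched below. The maxGramSize of 25 comes from the thread; the minimum gram size of 1 and the two sample documents are assumptions made for illustration.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 25) -> set[str]:
    """Prefixes of token, similar to what an edge n-gram filter emits."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

# Index-time analysis: every document is stored under all of its prefixes.
indexed = {doc: edge_ngrams(doc) for doc in ("311570", "311600")}

def search(query: str, ngram_query: bool) -> list[str]:
    # If the query is n-grammed too, any shared prefix gram causes a match.
    query_terms = edge_ngrams(query) if ngram_query else {query}
    return sorted(doc for doc, grams in indexed.items() if grams & query_terms)

print(search("3115", ngram_query=True))   # n-grams on both sides
print(search("3115", ngram_query=False))  # n-grams at index time only
```

With the query left whole, only documents that really start with 3115 share a term with it, which is the index-only setup described above.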

-- 
Sincerely yours
Mikhail Khludnev

Re: Enable default wildcard search

2017-12-29 Thread Siarhei Chystsiakou

Hi Rick!
Yes, as soon as I get the required result I'll definitely publish it on
GitHub. The default Dovecot schema doesn't suit me: with it, search only
matches whole words, but I want wildcard search. I don't have a solution
yet. I hope this group will help me.


Re: Enable default wildcard search

2017-12-29 Thread Rick Leir

Siarhei:
Will you be putting up your system at github? I would like to Solr-ize my 
dovecot.

Maybe you saw this already:
https://github.com/dovecot/core/blob/master/doc/solr-schema.xml

https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/solr-connection.c

https://github.com/dovecot/core/blob/master/src/plugins/fts-solr/fts-solr-plugin.h

https://github.com/bdraco/dovecot/blob/master/doc/wiki/Plugins.FTS.Solr.txt
Cheers -- Rick

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: Enable default wildcard search

2017-12-29 Thread Siarhei Chystsiakou

Thank you for your answer.

I tried to use EdgeNGram with the same settings

[filter definition stripped by the list archive; only maxGramSize="25" survives]

and the same problem emerged: the search was not quite correct. For instance,
I need to find the number 311570. I enter 3115 into the search bar, and the
result contains all the numbers that start with 311, not just those that start
with 3115. Perhaps this option should have been enabled for indexing only?
But I'm still concerned that with this option Solr often crashed during
indexing. How do I turn on debug output correctly so that I can show you
detailed errors?

Re: Enable default wildcard search

2017-12-28 Thread Mikhail Khludnev

Obviously, Chris has nothing in common with Christmas as far as term matching
goes, hence this classic search behavior is correct.
What people are asking for here is autocomplete, which has its own UX and
algorithms.
You can start exploring different aspects of this field from
https://lucidworks.com/2015/03/04/solr-suggester/
As you see, n-gramming just blows up the heap. So you can band-aid it with
EdgeNGram (which is probably what you want anyway) and add some heap to
your poor server.
Another approach is to stop n-gramming and really search by wildcard,
using http://yonik.com/solr-query-parameter-substitution/
It should be something like q=${text}* so that when the client passes text=foo
it searches for foo*, but it doesn't work for multiple words and is expensive
as well.
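
The parameter-substitution idea can be sketched as the request parameters a client (or a handler default) would end up with. This only builds the parameters and assumes substitution is enabled on the Solr side; the `/select` path and the helper name are illustrative.

```python
from urllib.parse import urlencode

def substituted_wildcard_params(user_text: str) -> dict[str, str]:
    # q is a fixed template; Solr expands ${text} from the "text" parameter,
    # so the client never has to type the wildcard itself.
    return {"q": "${text}*", "text": user_text}

params = substituted_wildcard_params("Chris")
print("/select?" + urlencode(params))
```

In a setup like the one in this thread, where the client cannot be changed, the q template would have to come from the request handler configuration rather than from the client.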

-- 
Sincerely yours
Mikhail Khludnev

Re: Enable default wildcard search

2017-12-28 Thread Siarhei Chystsiakou

Hi
Does anyone have any idea how to fix this?

Enable default wildcard search

2017-12-27 Thread Siarhei Chystsiakou

Hi everybody!
I am trying to integrate Solr 6.6.1 with my email server (Dovecot 2.32). I have
the following settings:

schema.xml - https://pastebin.com/1XXWTs8V
solrconfig.xml - https://pastebin.com/5HSswCcv

But with these settings, search only works on exact matches:
for instance, if I search for Chris it doesn't find Christmas. The client
does not support wildcard search. I would like to know how to turn on
wildcard search for all queries.

I tried to do that by adding the following line to schema.xml

[filter definition stripped by the list archive; only maxGramSize="25" survives]

but when I added it, Solr 6.6.1 very often showed errors during
indexing, which led to a complete crash: even the web interface didn't
respond, and only a full Solr restart helped. This problem occurred on both
Solr 6.6.1 and Solr 7.2.

Also, with this option, the search results were not what I expected.
For example, when I searched for the word domain, the words domes and
domain were both included. I suppose that from the point of view of this
operation the result is correct, but this is not what I need.

That is why I would like to know how to turn on the standard wildcard
search. As it is impossible on the client's side, I would like to manage it
from the Solr side.

Thanks.

Re: Edismax leading wildcard search

2017-12-22 Thread Michael Kuhlmann

Am 22.12.2017 um 11:57 schrieb Selvam Raman:
> 1) how can i disable leading wildcard search

Do it on the client side. Just don't allow leading asterisks or question
marks in your query term.

> 2) why leading wildcard search takes so much of time to give the response.
> 

Because Lucene can't just look in the index for all terms beginning with
something; it needs to look in all terms instead. Basically, indexed
terms are in alphabetical order, but that doesn't help with leading
wildcards.

There's a ReversedWildcardFilterFactory in Solr to address this issue.

-Michael
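
The point about alphabetical order can be illustrated with a toy term dictionary. This is a conceptual sketch only, not how Lucene stores terms internally: a trailing wildcard becomes a binary search plus a short scan, a leading wildcard forces a look at every term, and indexing reversed terms turns the leading wildcard back into a prefix lookup. The term list is made up.

```python
from bisect import bisect_left

terms = sorted(["public", "publication", "ship", "shipping", "worship"])
reversed_terms = sorted(t[::-1] for t in terms)

def prefix_matches(sorted_terms: list[str], prefix: str) -> list[str]:
    # Binary search jumps to the first candidate; the scan stops as soon
    # as a term no longer starts with the prefix.
    out = []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break
        out.append(term)
    return out

def suffix_matches_scan(suffix: str) -> list[str]:
    # Leading wildcard without help: every term must be inspected.
    return [t for t in terms if t.endswith(suffix)]

def suffix_matches_reversed(suffix: str) -> list[str]:
    # "*ship" becomes the prefix query "pihs*" against the reversed terms.
    return sorted(t[::-1] for t in prefix_matches(reversed_terms, suffix[::-1]))

print(prefix_matches(terms, "ship"))
print(suffix_matches_reversed("ship"))
```

The cost of the reversed approach is the second copy of every term, which is the index growth mentioned elsewhere in this thread.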

Edismax leading wildcard search

2017-12-22 Thread Selvam Raman

Hi,

Solr version - 6.4

Parser - Edismax

Leading wildcard search is allowed in edismax.

1) how can i disable leading wildcard search
2) why leading wildcard search takes so much of time to give the response.

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

Re: solr.TrieDoubleField deprecated with 7.1.0 but wildcard "*" search behaviour is different with solr.DoublePointField

2017-12-11 Thread Chris Hostetter


AFAICT the behavior you're describing with Trie fields was never
intentionally supported/documented.

It appears that it only worked as a fluke side effect of how the default
implementation of FieldType.getPrefixQuery() was inherited by Trie fields
*and* because "indexed=true" TrieFields use Terms (just like StrField) ...
so a prefix of "" (the empty string) matched all of the Trie terms in a
field.

(note that the syntax you're describing does *not* work for Trie fields
that are "indexed=false docValues=true")

In general, there seems to be a bit of a mess in terms of trying to
specify "prefix queries" (which is what "foo_d:*" really is under the
covers) or "wild card" queries against numeric fields. I created a jira to
try and come to a consensus about how this should behave moving forward...

https://issues.apache.org/jira/browse/SOLR-11746

...but i would suggest you move away from depending on that syntax and use
the officially supported/documented range query syntax (foo_d:[* TO *])
instead.
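
If existing clients already send the `field:*` form, a small rewrite step could move them to the range syntax. A hedged sketch: the regular expression only handles simple bare clauses and is an assumption for illustration, not a Solr feature.

```python
import re

# Matches a bare "field:*" clause (not "field:*foo" and not "*:*").
EXISTS_CLAUSE = re.compile(r'\b(\w+):\*(?=\s|\)|$)')

def to_range_syntax(query: str) -> str:
    """Rewrite field:* into the documented field:[* TO *] form."""
    return EXISTS_CLAUSE.sub(r'\1:[* TO *]', query)

print(to_range_syntax("test_d:* AND type:book"))
```

The match-all query `*:*` and real wildcard terms such as `name:*foo` are left untouched.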




: some question about the new DoublePointField which should be used
: instead of the TrieDoubleField in 7.1.
...
: If i am using the deprecated one its possible to get a match for a
: double field like this:
: 
: test_d:*
: 
: even in 7.1.0.
: 
: But with the new DoublePointField, which you should use instead, you
: won't get that match - you have to use e.g. [* TO *].

: Is this an intended change in runtime / query behaviour or some bug or
: is it possible to restore that behaviour with the new field too?




-Hoss
http://www.lucidworks.com/

solr.TrieDoubleField deprecated with 7.1.0 but wildcard "*" search behaviour is different with solr.DoublePointField

2017-12-11 Thread Torsten Krah

Hi,

A question about the new DoublePointField, which should be used
instead of the TrieDoubleField in 7.1.

https://lucene.apache.org/solr/guide/7_1/field-types-included-with-solr.html

If I am using the deprecated one, it's possible to get a match for a
double field like this:

test_d:*

even in 7.1.0.

But with the new DoublePointField, which you should use instead, you
won't get that match - you have to use e.g. [* TO *].

Short recipe can be found here to have a look yourself:

https://stackoverflow.com/questions/47473188/solr-7-1-querying-double-field-for-any-value-not-possible-with-anymore/47752445

Is this an intended change in runtime / query behaviour or some bug or
is it possible to restore that behaviour with the new field too?

kind regards

Torsten


RE: Solr Wildcard Search

2017-11-30 Thread Allison, Timothy B.

A slightly more refined answer...  In my experience with the systems I've 
worked with, Porter and other stemmers can be useful as a "fallback field" with 
a really low boost, but you should be really careful if you're only searching 
on one field.

Cannot recommend Doug Turnbull and John Berryman's "Relevant Search" enough on 
how to layer fields...among many other great insights: 
https://www.manning.com/books/relevant-search


 -Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Thursday, November 30, 2017 9:20 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

At the very least the English possessive filter, which you have.  Great!

Depending on what your query log analysis finds -- perhaps users are pretty 
much only searching on nouns? -- you might consider 
EnglishMinimalStemFilterFactory.

I wouldn't say that porter was or wasn't chosen intentionally.  It may be good 
for some use cases.  However, for the use cases I've seen, it has been 
disastrous.   

I have code that shows "equivalence sets" for analysis chain A vs analysis 
chain B...with some noise...assume same tokenization...  I should probably 
share that code on github or fold it into Luke somehow?  You can see this on a 
one-off basis in the Solr admin window via the Analysis tab, but to see this on 
your corpus/corpora across terms can be eye-opening, and then to cross-check it 
against query logs...quite powerful.


On one corpus, when I compared the same analysis chain A without Porter and B 
with porter, the output is e.g.:

"stemmed\tunstemmed #docs|unstemmed #docs..."

public  public 9834 | publication 1429 | publications 960 | publicly 662 | 
public's 176 | publicize 118 | publicized 107 | publicity 91 | publically 66 | 
publicizing 63 | publication's 6 | publicizes 4 | public_ 1 | publication_ 1 | 
publiced 1

effect  effective 6329 | effect 3157 | effectively 1745 | effectiveness 1198 | 
effects 831 | effected 139 | effecting 85 | effectives 1

new new 13279 | newness 6 | newed 3 | newe 2 | newing 1

order   order 7256 | orders 3125 | ordered 1840 | ordering 758 | orderly 241 | 
order's 17 | orderable 3 | orders_ 1

Imagine users searching for "publication" (~2500 docs) and getting back every 
document that mentions "public" (~10k).  That's a huge problem in many 
circumstances.  Good luck finding the name "newing".
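
The "equivalence sets" report can be approximated with a few lines. This is a sketch of the idea only, not the code mentioned above: `toy_stem` is a hypothetical stand-in for an aggressive stemmer (it is not the Porter algorithm), and the document frequencies are the ones quoted in the sample output.

```python
from collections import defaultdict

def toy_stem(term: str) -> str:
    """Hypothetical stand-in for an aggressive stemmer; NOT the Porter algorithm."""
    for suffix in ("ations", "ation", "ly", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def equivalence_sets(doc_freqs: dict[str, int]) -> dict[str, list[tuple[str, int]]]:
    # Group unstemmed terms by their stem, most frequent first.
    groups = defaultdict(list)
    for term, df in doc_freqs.items():
        groups[toy_stem(term)].append((term, df))
    return {stem: sorted(members, key=lambda m: -m[1]) for stem, members in groups.items()}

doc_freqs = {"public": 9834, "publication": 1429, "publications": 960, "publicly": 662}
for stem, members in equivalence_sets(doc_freqs).items():
    # Same layout as the report: stemmed<TAB>unstemmed #docs | unstemmed #docs ...
    print(stem + "\t" + " | ".join(f"{t} {df}" for t, df in members))
```

Running such a grouping over real term statistics and comparing it with query logs shows which user queries get conflated by the stemmer.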


-Original Message-
From: Georgy Nevsky [mailto:gnevsky.cn...@thomasnet.com]
Sent: Thursday, November 30, 2017 8:31 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

I understand stemming reason. Thank you.

What do you suggest to use for stemming instead of "Porter" ? I guess, it 
wasn't chosen intentionally.

In the best we trust
Georgy Nevsky


-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, November 30, 2017 8:25 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

The initial question wasn't about a phrasal search, but I largely agree that 
diff q parsers handle the analysis chain differently for multiterms.

Yes, Porter is crazily aggressive. USE WITH CAUTION!

As has been pointed out, use the Solr admin window and the "debug" in the query 
option to see what's going on.

Use the Solr admin Analysis feature to see how your tokens are being modified 
by each step in the analysis chain.

If you use solr admin and debug the query for "shipping", you see that it is 
stemmed to "ship"...hence all of your matches work.  Porter doesn't have rules 
for words ending in "pp", so it doesn't stem "shipp" to "ship".  So, your 
wildcard query is looking for words that start with "shipp", and given that 
"shipping" was stemmed to "ship", it won't find it.  It would find "shippqrs" 
because porter wouldn't know what to do with that 😊

Again, Porter can be very dangerous if it doesn't align with user expectations.



-Original Message-
From: Atita Arora [mailto:atitaar...@gmail.com]
Sent: Thursday, November 30, 2017 8:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Wildcard Search

As Rick raised the most important aspect here , that the phrase is broken into 
multiple terms ORed together , I believe if the use case requires to perform 
wildcard search on phrases , we would need to store the entire phrase as a 
single term in the index which probably is not happening right now and hence 
are not found when sent across as phrases.
I tried this on my local Solr 7.1 without phrase this works as expected , 
however as soon as I do phrase search it fails for the reason as i mentioned 
above.

Let me know if I can clarify further.

On Thu, Nov 30, 2017 at 6:31 PM, Georgy Nevsky 
wrote:

> I wish

RE: Solr Wildcard Search

2017-11-30 Thread Allison, Timothy B.

At the very least the English possessive filter, which you have.  Great!

Depending on what your query log analysis finds -- perhaps users are pretty 
much only searching on nouns? -- you might consider 
EnglishMinimalStemFilterFactory.

I wouldn't say that Porter was or wasn't chosen intentionally.  It may be good 
for some use cases.  However, for the use cases I've seen, it has been 
disastrous.   

I have code that shows "equivalence sets" for analysis chain A vs. analysis 
chain B (with some noise, and assuming the same tokenization).  I should 
probably share that code on GitHub or fold it into Luke somehow.  You can see 
this on a one-off basis in the Solr admin window via the Analysis tab, but 
seeing it across all the terms in your corpus/corpora can be eye-opening, and 
cross-checking it against query logs is quite powerful.

On one corpus, when I compared analysis chain A (without Porter) against the 
same chain with Porter added (B), the output was, e.g.:

"stemmed\tunstemmed #docs|unstemmed #docs..."

public  public 9834 | publication 1429 | publications 960 | publicly 662 | 
public's 176 | publicize 118 | publicized 107 | publicity 91 | publically 66 | 
publicizing 63 | publication's 6 | publicizes 4 | public_ 1 | publication_ 1 | 
publiced 1

effect  effective 6329 | effect 3157 | effectively 1745 | effectiveness 1198 | 
effects 831 | effected 139 | effecting 85 | effectives 1

new new 13279 | newness 6 | newed 3 | newe 2 | newing 1

order   order 7256 | orders 3125 | ordered 1840 | ordering 758 | orderly 241 | 
order's 17 | orderable 3 | orders_ 1

Imagine users searching for "publication" (~2500 docs) and getting back every 
document that mentions "public" (~10k).  That's a huge problem in many 
circumstances.  Good luck finding the name "newing".
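
For illustration only, here is a minimal Python sketch of the "equivalence
set" idea. It is not the code mentioned above: `toy_stem` is a made-up,
deliberately crude stand-in for an aggressive stemmer, `equivalence_sets` is
a hypothetical helper, and the counts are copied from the sample output.

```python
from collections import defaultdict


def toy_stem(term):
    """Hypothetical, crude suffix stripper (NOT the Porter algorithm)."""
    for suffix in ("ations", "ation", "ity", "ize", "ly", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def equivalence_sets(doc_counts, analyze):
    """Group surface terms by the token the analysis function emits."""
    groups = defaultdict(list)
    for term, count in doc_counts.items():
        groups[analyze(term)].append((term, count))
    # Most frequent surface form first within each set.
    return {stem: sorted(members, key=lambda m: -m[1])
            for stem, members in groups.items()}


# Document counts taken from the "public" line of the sample output.
doc_counts = {
    "public": 9834,
    "publication": 1429,
    "publications": 960,
    "publicly": 662,
    "publicity": 91,
}

sets = equivalence_sets(doc_counts, toy_stem)
for stem, members in sets.items():
    print(stem + "\t" + " | ".join(f"{t} {n}" for t, n in members))
```

With a real analysis chain you would swap `toy_stem` for whatever the chain
emits per term; the grouping step stays the same.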

-Original Message-
From: Georgy Nevsky [mailto:gnevsky.cn...@thomasnet.com] 
Sent: Thursday, November 30, 2017 8:31 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

I understand stemming reason. Thank you.

What do you suggest to use for stemming instead of "Porter" ? I guess, it 
wasn't chosen intentionally.

In the best we trust
Georgy Nevsky

-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, November 30, 2017 8:25 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

The initial question wasn't about a phrasal search, but I largely agree that 
diff q parsers handle the analysis chain differently for multiterms.

Yes, Porter is crazily aggressive. USE WITH CAUTION!

As has been pointed out, use the Solr admin window and the "debug" in the query 
option to see what's going on.

Use the Solr admin Analysis feature to see how your tokens are being modified 
by each step in the analysis chain.

If you use solr admin and debug the query for "shipping", you see that it is 
stemmed to "ship"...hence all of your matches work.  Porter doesn't have rules 
for words ending in "pp", so it doesn't stem "shipp" to "ship".  So, your 
wildcard query is looking for words that start with "shipp", and given that 
"shipping" was stemmed to "ship", it won't find it.  It would find "shippqrs" 
because porter wouldn't know what to do with that 😊

Again, Porter can be very dangerous if it doesn't align with user expectations.

-Original Message-
From: Atita Arora [mailto:atitaar...@gmail.com]
Sent: Thursday, November 30, 2017 8:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Wildcard Search

As Rick raised the most important aspect here , that the phrase is broken into 
multiple terms ORed together , I believe if the use case requires to perform 
wildcard search on phrases , we would need to store the entire phrase as a 
single term in the index which probably is not happening right now and hence 
are not found when sent across as phrases.
I tried this on my local Solr 7.1 without phrase this works as expected , 
however as soon as I do phrase search it fails for the reason as i mentioned 
above.

Let me know if I can clarify further.

On Thu, Nov 30, 2017 at 6:31 PM, Georgy Nevsky 
wrote:

> I wish to understand if I can do something to get in result term 
> "shipping"
> when search for "shipp*"?
>
> Here field definition:
>  multiValued="false"/>
>
>  positionIncrementGap="100">
>   
> 
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> 
> 
>  protected="protwords.txt"/>
> 
>   
>
> Anything else can be important? Most configuration parameters are 
> default to Apache Solr 7.1.0.
>
> In the bes

RE: Solr Wildcard Search

2017-11-30 Thread Georgy Nevsky

I understand the stemming explanation. Thank you.

What do you suggest using for stemming instead of "Porter"? I guess it
wasn't chosen intentionally.

In the best we trust
Georgy Nevsky

-Original Message-
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Thursday, November 30, 2017 8:25 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Wildcard Search

The initial question wasn't about a phrasal search, but I largely agree that
diff q parsers handle the analysis chain differently for multiterms.

Yes, Porter is crazily aggressive. USE WITH CAUTION!

As has been pointed out, use the Solr admin window and the "debug" in the
query option to see what's going on.

Use the Solr admin Analysis feature to see how your tokens are being
modified by each step in the analysis chain.

If you use solr admin and debug the query for "shipping", you see that it is
stemmed to "ship"...hence all of your matches work.  Porter doesn't have
rules for words ending in "pp", so it doesn't stem "shipp" to "ship".  So,
your wildcard query is looking for words that start with "shipp", and given
that "shipping" was stemmed to "ship", it won't find it.  It would find
"shippqrs" because porter wouldn't know what to do with that 😊

Again, Porter can be very dangerous if it doesn't align with user
expectations.

-Original Message-
From: Atita Arora [mailto:atitaar...@gmail.com]
Sent: Thursday, November 30, 2017 8:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Wildcard Search

As Rick raised the most important aspect here , that the phrase is broken
into multiple terms ORed together , I believe if the use case requires to
perform wildcard search on phrases , we would need to store the entire
phrase as a single term in the index which probably is not happening right
now and hence are not found when sent across as phrases.
I tried this on my local Solr 7.1 without phrase this works as expected ,
however as soon as I do phrase search it fails for the reason as i mentioned
above.

Let me know if I can clarify further.

On Thu, Nov 30, 2017 at 6:31 PM, Georgy Nevsky 
wrote:

> I wish to understand if I can do something to get in result term
> "shipping"
> when search for "shipp*"?
>
> Here field definition:
>  multiValued="false"/>
>
>  positionIncrementGap="100">
>   
> 
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> 
> 
>  protected="protwords.txt"/>
> 
>   
>
> Anything else can be important? Most configuration parameters are
> default to Apache Solr 7.1.0.
>
> In the best we trust
> Georgy Nevsky
>
>
> -Original Message-
> From: Rick Leir [mailto:rl...@leirtech.com]
> Sent: Thursday, November 30, 2017 7:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Wildcard Search
>
> George,
> When you get those results it could be due to stemming.
>
> Wildcard processing expands your term to multiple terms, OR'd
> together. It also takes you down a different analysis pathway, as many
> analysis components do not work with multiple terms. Look into the
> SolrAdmin console, and use the analysis tab to understand what is
> going on.
>
> If you still have doubts, tell us more about your config.
> Cheers --Rick
>
>
> On November 30, 2017 7:06:42 AM EST, Georgy Nevsky
>  wrote:
> >Can somebody help me understand how Solr Wildcard Search is working?
> >
> >If I’m doing search for “ship*” term I’m getting in result many
> >strings, like “Shipping Weight”, “Ship From”, “Shipping Calculator”,
> >etc.
> >
> >But if I’m searching for “shipp*” I don’t get any result.
> >
> >
> >
> >In the best we trust
> >
> >Georgy Nevsky
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>

RE: Solr Wildcard Search

2017-11-30 Thread Allison, Timothy B.

The initial question wasn't about a phrasal search, but I largely agree that 
different query parsers handle the analysis chain differently for multiterms.

Yes, Porter is crazily aggressive. USE WITH CAUTION!  

As has been pointed out, use the Solr admin window and the "debug" option on 
the query to see what's going on.

Use the Solr admin Analysis feature to see how your tokens are being modified 
by each step in the analysis chain.
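
A small Python sketch of building such a debug request outside the admin
window. The host, port, and core name (`mycore`) are assumptions for
illustration, and `debug_url` is a hypothetical helper; `debugQuery=true`
asks Solr to add a debug section (including `parsedquery`) to the response.

```python
from urllib.parse import urlencode


def debug_url(base, core, query):
    """Build a /select URL that asks Solr to explain how the query parsed."""
    params = {"q": query, "debugQuery": "true", "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


# Assumed local Solr instance and core name; adjust to your setup.
url = debug_url("http://localhost:8983/solr", "mycore", "shipp*")
print(url)
```

Fetching that URL and comparing `parsedquery` for "shipping" and "shipp*"
shows which terms each query actually looks up.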

If you use Solr admin and debug the query for "shipping", you see that it is 
stemmed to "ship"...hence all of your matches work.  Porter doesn't have rules 
for words ending in "pp", so it doesn't stem "shipp" to "ship".  So, your 
wildcard query is looking for words that start with "shipp", and given that 
"shipping" was stemmed to "ship", it won't find it.  It would find "shippqrs" 
because Porter wouldn't know what to do with that 😊
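
A minimal Python sketch of that mismatch, assuming the index-time chain
turns "shipping" into "ship" as described. The `STEMMED` table and the tiny
in-memory `index` are hypothetical stand-ins, not Solr internals; the
documents are the strings from the original question.

```python
# Assumed index-time stemming result, per the explanation above.
STEMMED = {"shipping": "ship"}

documents = ["Shipping Weight", "Ship From", "Shipping Calculator"]


def analyze(text):
    """Lowercase, split on whitespace, then apply the assumed stemming."""
    return [STEMMED.get(tok, tok) for tok in text.lower().split()]


# Inverted index: indexed term -> set of document ids.
index = {}
for doc_id, text in enumerate(documents):
    for term in analyze(text):
        index.setdefault(term, set()).add(doc_id)


def prefix_search(prefix):
    """The wildcard prefix is compared against indexed terms, unstemmed."""
    hits = set()
    for term, ids in index.items():
        if term.startswith(prefix):
            hits |= ids
    return sorted(hits)


print(prefix_search("ship"))   # every document contains the term "ship"
print(prefix_search("shipp"))  # no indexed term starts with "shipp"
```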

Again, Porter can be very dangerous if it doesn't align with user expectations.

-Original Message-
From: Atita Arora [mailto:atitaar...@gmail.com] 
Sent: Thursday, November 30, 2017 8:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Wildcard Search

As Rick raised the most important aspect here , that the phrase is broken into 
multiple terms ORed together , I believe if the use case requires to perform 
wildcard search on phrases , we would need to store the entire phrase as a 
single term in the index which probably is not happening right now and hence 
are not found when sent across as phrases.
I tried this on my local Solr 7.1 without phrase this works as expected , 
however as soon as I do phrase search it fails for the reason as i mentioned 
above.

Let me know if I can clarify further.

On Thu, Nov 30, 2017 at 6:31 PM, Georgy Nevsky 
wrote:

> I wish to understand if I can do something to get in result term "shipping"
> when search for "shipp*"?
>
> Here field definition:
>  multiValued="false"/>
>
>  positionIncrementGap="100">
>   
> 
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> 
> 
>  protected="protwords.txt"/>
> 
>   
>
> Anything else can be important? Most configuration parameters are 
> default to Apache Solr 7.1.0.
>
> In the best we trust
> Georgy Nevsky
>
>
> -Original Message-
> From: Rick Leir [mailto:rl...@leirtech.com]
> Sent: Thursday, November 30, 2017 7:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Wildcard Search
>
> George,
> When you get those results it could be due to stemming.
>
> Wildcard processing expands your term to multiple terms, OR'd 
> together. It also takes you down a different analysis pathway, as many 
> analysis components do not work with multiple terms. Look into the 
> SolrAdmin console, and use the analysis tab to understand what is 
> going on.
>
> If you still have doubts, tell us more about your config.
> Cheers --Rick
>
>
> On November 30, 2017 7:06:42 AM EST, Georgy Nevsky 
>  wrote:
> >Can somebody help me understand how Solr Wildcard Search is working?
> >
> >If I’m doing search for “ship*” term I’m getting in result many 
> >strings, like “Shipping Weight”, “Ship From”, “Shipping Calculator”, 
> >etc.
> >
> >But if I’m searching for “shipp*” I don’t get any result.
> >
> >
> >
> >In the best we trust
> >
> >Georgy Nevsky
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>

Re: Solr Wildcard Search

2017-11-30 Thread Atita Arora

As Rick pointed out, the most important aspect here is that the phrase is
broken into multiple terms ORed together.
I believe that if the use case requires wildcard search on phrases, we would
need to store the entire phrase as a single term in the index, which is
probably not happening right now; hence the documents are not found when the
query is sent as a phrase.
I tried this on my local Solr 7.1: without a phrase it works as expected,
but as soon as I do a phrase search it fails for the reason I mentioned
above.

Let me know if I can clarify further.

On Thu, Nov 30, 2017 at 6:31 PM, Georgy Nevsky 
wrote:

> I wish to understand if I can do something to get in result term "shipping"
> when search for "shipp*"?
>
> Here field definition:
>  multiValued="false"/>
>
>  positionIncrementGap="100">
>   
> 
>  ignoreCase="true"
> words="lang/stopwords_en.txt"
> />
> 
> 
>  protected="protwords.txt"/>
> 
>   
>
> Anything else can be important? Most configuration parameters are default
> to
> Apache Solr 7.1.0.
>
> In the best we trust
> Georgy Nevsky
>
>
> -Original Message-----
> From: Rick Leir [mailto:rl...@leirtech.com]
> Sent: Thursday, November 30, 2017 7:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Wildcard Search
>
> George,
> When you get those results it could be due to stemming.
>
> Wildcard processing expands your term to multiple terms, OR'd together. It
> also takes you down a different analysis pathway, as many analysis
> components do not work with multiple terms. Look into the SolrAdmin
> console,
> and use the analysis tab to understand what is going on.
>
> If you still have doubts, tell us more about your config.
> Cheers --Rick
>
>
> On November 30, 2017 7:06:42 AM EST, Georgy Nevsky
>  wrote:
> >Can somebody help me understand how Solr Wildcard Search is working?
> >
> >If I’m doing search for “ship*” term I’m getting in result many
> >strings, like “Shipping Weight”, “Ship From”, “Shipping Calculator”,
> >etc.
> >
> >But if I’m searching for “shipp*” I don’t get any result.
> >
> >
> >
> >In the best we trust
> >
> >Georgy Nevsky
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>

RE: Solr Wildcard Search

2017-11-30 Thread Georgy Nevsky

Is there anything I can do so that a search for "shipp*" returns the term
"shipping"?

Here is the field definition:

Could anything else be important? Most configuration parameters are the
Apache Solr 7.1.0 defaults.

In the best we trust
Georgy Nevsky

-Original Message-
From: Rick Leir [mailto:rl...@leirtech.com]
Sent: Thursday, November 30, 2017 7:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Wildcard Search

George,
When you get those results it could be due to stemming.

Wildcard processing expands your term to multiple terms, OR'd together. It
also takes you down a different analysis pathway, as many analysis
components do not work with multiple terms. Look into the SolrAdmin console,
and use the analysis tab to understand what is going on.

If you still have doubts, tell us more about your config.
Cheers --Rick

On November 30, 2017 7:06:42 AM EST, Georgy Nevsky
 wrote:
>Can somebody help me understand how Solr Wildcard Search is working?
>
>If I’m doing search for “ship*” term I’m getting in result many
>strings, like “Shipping Weight”, “Ship From”, “Shipping Calculator”,
>etc.
>
>But if I’m searching for “shipp*” I don’t get any result.
>
>
>
>In the best we trust
>
>Georgy Nevsky

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: Solr Wildcard Search

2017-11-30 Thread Rick Leir

George,
When you get those results it could be due to stemming.

Wildcard processing expands your term to multiple terms, OR'd together. It also 
takes you down a different analysis pathway, as many analysis components do not 
work with multiple terms. Look into the SolrAdmin console, and use the analysis 
tab to understand what is going on.

If you still have doubts, tell us more about your config.
Cheers --Rick

On November 30, 2017 7:06:42 AM EST, Georgy Nevsky 
 wrote:
>Can somebody help me understand how Solr Wildcard Search is working?
>
>If I’m doing search for “ship*” term I’m getting in result many
>strings,
>like “Shipping Weight”, “Ship From”, “Shipping Calculator”, etc.
>
>But if I’m searching for “shipp*” I don’t get any result.
>
>
>
>In the best we trust
>
>Georgy Nevsky

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Solr Wildcard Search

2017-11-30 Thread Georgy Nevsky

Can somebody help me understand how Solr wildcard search works?

If I search for the term “ship*”, I get many strings in the results, such as
“Shipping Weight”, “Ship From”, “Shipping Calculator”, etc.

But if I search for “shipp*”, I don’t get any results.



In the best we trust

Georgy Nevsky

Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan

Hi,

I think AutomatonQuery is used.
http://opensourceconnections.com/blog/2013/02/21/lucene-4-finite-state-automaton-in-10-minutes-intro-tutorial/
https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/AutomatonQuery.html
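
As a rough intuition only (this is not how AutomatonQuery is implemented):
with a StrField the whole value is a single term, and a sorted term
dictionary lets a prefix be located by seeking rather than by tokenizing
anything. The sketch below assumes a plain sorted Python list as the "term
dictionary"; `prefix_range` and the sample names other than "John Doe" are
made up for illustration.

```python
import bisect

# Each StrField value is one term; keep the terms sorted.
terms = sorted(["Jane Roe", "John Doe", "John Smith", "Johnny Appleseed"])


def prefix_range(sorted_terms, prefix):
    """Return all terms starting with prefix using two binary searches."""
    lo = bisect.bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after any character used in these sample terms.
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]


matches = prefix_range(terms, "John")
print(matches)
```

The real query compiles the wildcard pattern into a finite-state automaton
and walks the term dictionary with it, which also covers patterns that are
not simple prefixes.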

Ahmet

On Thursday, September 8, 2016 3:54 PM, Sandeep Khanzode
wrote:

Hi,

Okay.

So it seems that the wildcard searches will perform a (sort-of) dictionary
search where they will inspect every (full keyword) token at search time, and
do a match instead of a match on pre-created index-time tokens with TextField.
However, the wildcard/fuzzy functionality will still be provided no matter the
approach...

SRK

On Thursday, September 8, 2016 5:05 PM, Ahmet Arslan
wrote:

Hi,

EdgeNGram and Wildcard may be used to achieve the same goal: prefix search or
starts with search.

Lets say, wildcard enumerates the whole inverted index, thus it may get slower
for very large databases.
With this one no index time manipulation is required.

EdgeNGram does its magic at index time, indexes a lot of tokens, all possible
prefixes.
Index size gets bigger, query time no wildcard operator required in this one.

Ahmet

On Thursday, September 8, 2016 12:35 PM, Sandeep Khanzode
wrote:
Hello,
There are quite a few links that detail the difference between StrField and
TextField. Also links that explain that, even though the field is indexed, it
is not tokenized and stored as a single keyword, as can be verified by the
debug analysis on Solr admin and CURL debugQuery options.
What I am unable to understand is how a wildcard works on StrFields? For
example, if the name is "John Doe" and I search for "John*", I get that match.
Which means, that somewhere deep within, maybe a Trie or Dictionary
representation exists that allows this search with a partial string.
I would have assumed that wildcard would match on TextFields which allow
(Edge)NGramFilters, etc. -- SRK

Re: StrField with Wildcard Search

2016-09-08 Thread Sandeep Khanzode

Hi,
Okay.
So it seems that wildcard searches perform a (sort-of) dictionary search: 
they inspect every (full keyword) token at search time and match against 
those, instead of matching on pre-created index-time tokens as with TextField. 
However, the wildcard/fuzzy functionality is still provided whichever 
approach is used...

SRK

On Thursday, September 8, 2016 5:05 PM, Ahmet Arslan 
 wrote:
 

 Hi,

EdgeNGram and Wildcard may be used to achieve the same goal: prefix search or 
starts with search.

Lets say, wildcard enumerates the whole inverted index, thus it may get slower 
for very large databases.
With this one no index time manipulation is required.

EdgeNGram does its magic at index time, indexes a lot of tokens, all possible 
prefixes.
Index size gets bigger, query time no wildcard operator required in this one.

Ahmet



On Thursday, September 8, 2016 12:35 PM, Sandeep Khanzode 
 wrote:
Hello,
There are quite a few links that detail the difference between StrField and 
TextField. Also links that explain that, even though the field is indexed, it 
is not tokenized and stored as a single keyword, as can be verified by the 
debug analysis on Solr admin and CURL debugQuery options.
What I am unable to understand is how a wildcard works on StrFields? For 
example, if the name is "John Doe" and I search for "John*", I get that match. 
Which means, that somewhere deep within, maybe a Trie or Dictionary 
representation exists that allows this search with a partial string.
I would have assumed that wildcard would match on TextFields which allow 
(Edge)NGramFilters, etc.  -- SRK

Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan

Hi,

EdgeNGram and wildcard queries can both be used to achieve the same goal: 
prefix ("starts with") search.

Let's say a wildcard query enumerates the whole inverted index; it may 
therefore get slower for very large databases.
With this approach, no index-time manipulation is required.

EdgeNGram does its magic at index time: it indexes a lot of tokens, namely 
all possible prefixes.
The index gets bigger, but no wildcard operator is required at query time.
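
A minimal Python sketch of the trade-off, under the assumption that each
name is a single lowercase token. `edge_ngrams`, `ngram_index`, and
`wildcard_prefix` are hypothetical names used only for this illustration,
not Solr APIs.

```python
def edge_ngrams(token, min_gram=1):
    """All prefixes of the token, shortest first (edge n-grams)."""
    return [token[:n] for n in range(min_gram, len(token) + 1)]


names = ["john", "johnson", "jane"]

# Index-time approach: every prefix becomes its own indexed term.
ngram_index = {}
for doc_id, name in enumerate(names):
    for gram in edge_ngrams(name):
        ngram_index.setdefault(gram, set()).add(doc_id)


# Query-time approach: nothing extra indexed, scan terms for the prefix.
def wildcard_prefix(prefix):
    return {doc_id for doc_id, name in enumerate(names)
            if name.startswith(prefix)}


# Both approaches answer the same "starts with" question.
print(ngram_index.get("joh", set()), wildcard_prefix("joh"))
print(len(ngram_index), "indexed terms for", len(names), "names")
```

The n-gram lookup is a single exact-term fetch, paid for with a larger
index; the wildcard version keeps the index small and does the work at query
time.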

Ahmet



On Thursday, September 8, 2016 12:35 PM, Sandeep Khanzode 
 wrote:
Hello,
There are quite a few links that detail the difference between StrField and 
TextField. Also links that explain that, even though the field is indexed, it 
is not tokenized and stored as a single keyword, as can be verified by the 
debug analysis on Solr admin and CURL debugQuery options.
What I am unable to understand is how a wildcard works on StrFields? For 
example, if the name is "John Doe" and I search for "John*", I get that match. 
Which means, that somewhere deep within, maybe a Trie or Dictionary 
representation exists that allows this search with a partial string.
I would have assumed that wildcard would match on TextFields which allow 
(Edge)NGramFilters, etc.  -- SRK

StrField with Wildcard Search

2016-09-08 Thread Sandeep Khanzode

Hello,
There are quite a few links that detail the difference between StrField and 
TextField. Also links that explain that, even though the field is indexed, it 
is not tokenized and stored as a single keyword, as can be verified by the 
debug analysis on Solr admin and CURL debugQuery options.
What I am unable to understand is how a wildcard works on StrFields? For 
example, if the name is "John Doe" and I search for "John*", I get that match. 
Which means, that somewhere deep within, maybe a Trie or Dictionary 
representation exists that allows this search with a partial string.
I would have assumed that wildcard would match on TextFields which allow 
(Edge)NGramFilters, etc.  -- SRK

RE: Wildcard search not working

2016-08-12 Thread Ribeaud, Christian (Ext)

Hi Ahmet, Hi Upayavira,

OK, it seems that I have to dive a bit deeper into the Solr filters and 
tokenizers. I've just realized that my command of them is too limited.
Thanks a lot for the help so far, guys. Cheers and have a nice day,

christian

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Freitag, 12. August 2016 07:41
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

Please use the following filter before/above the stemmer.


Plus, you may want to add :


  
  
  

Ahmet



On Thursday, August 11, 2016 9:31 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, 
honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Following the schema snippet for the 
corresponding field:

...




 









...

What is wrong with this schema? Respectively, what should I change to be able 
to correctly do wildcard searches?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel



-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Donnerstag, 11. August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

The query r?che may not return at least the same number of matches as roche, 
depending on your analysis chain.
The difference is that roche is analyzed but r?che is not. Wildcard queries 
are executed against the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for 
instance searches with term 'roche' in a specific core. Everything fine, I am 
getting for instance two matches. I would expect at least the same number of 
matches with term 'r?che'. However, this does NOT happen. I am getting zero 
matches. The same problem occurs with 'r*che'. 'roch?' does not work either, 
but 'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian

Re: Wildcard search not working

2016-08-12 Thread Ahmet Arslan

Hi Christian,

Please use the following filter before/above the stemmer.


Plus, you may want to add :


  
  
  

Ahmet



On Thursday, August 11, 2016 9:31 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, 
honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Following the schema snippet for the 
corresponding field:

...




 









...

What is wrong with this schema? Respectively, what should I change to be able 
to correctly do wildcard searches?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel



-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Donnerstag, 11. August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

The query r?che may not return at least the same number of matches as roche, 
depending on your analysis chain.
The difference is that roche is analyzed but r?che is not. Wildcard queries 
are executed against the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for 
instance searches with term 'roche' in a specific core. Everything fine, I am 
getting for instance two matches. I would expect at least the same number of 
matches with term 'r?che'. However, this does NOT happen. I am getting zero 
matches. The same problem occurs with 'r*che'. 'roch?' does not work either, 
but 'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian

Re: Wildcard search not working

2016-08-11 Thread Upayavira

You have a stemming filter in your analysis chain. Go to the analysis
tab, select the 'text' field, and put "Roche" into both boxes. Click
analyse. I bet you you will see Roch, not Roche, because of your
stemming filter shown below.

That's what Ahmet shrewdly identified above.
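
A quick Python sketch of why only one of the four patterns from the original
question can match, assuming the stemmer really does index "Roche" as "roch"
(check the Analysis tab to confirm). Python's `fnmatchcase` is used here
only because its `*` and `?` behave like the wildcard operators in question;
it is not what Solr runs.

```python
from fnmatch import fnmatchcase

# Assumed indexed form of "Roche" after lowercasing and stemming.
indexed_term = "roch"

patterns = ["roch*", "roch?", "r?che", "r*che"]

# The wildcard pattern is matched against the indexed term as-is.
results = {p: fnmatchcase(indexed_term, p) for p in patterns}
for pattern, matched in results.items():
    print(pattern, matched)
```

`roch?` needs exactly one more character after "roch", and both `r?che` and
`r*che` need the term to end in "che", so none of them can match "roch".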

Upayavira

On Thu, 11 Aug 2016, at 08:31 PM, Ribeaud, Christian (Ext) wrote:
> Hi Ahmet,
> 
> Many thanks for your reply. I had a look at the URL you pointed out but,
> honestly, I have to admit that I did not fully understand you.
> Let's be a bit more concrete. Following the schema snippet for the
> corresponding field:
> 
> ...
>  required="false" multiValued="false" />
> 
> 
>  positionIncrementGap="100">
>  
> 
> 
>  words="lang/stopwords_de.txt" format="snowball" />
> 
> 
> 
> 
> 
> 
> ...
> 
> What is wrong with this schema? Respectively, what should I change to be
> able to correctly do wildcard searches?
> 
> Many thanks for your time. Cheers,
> 
> christian
> --
> Christian Ribeaud
> Software Engineer (External)
> NIBR / WSJ-310.5.17
> Novartis Campus
> CH-4056 Basel
> 
> 
> -----Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com] 
> Sent: Donnerstag, 11. August 2016 16:00
> To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
> Subject: Re: Wildcard search not working
> 
> Hi Christian,
> 
> The query r?che may not return at least the same number of matches as
> roche, depending on your analysis chain.
> The difference is that roche is analyzed but r?che is not. Wildcard
> queries are executed against the indexed/analyzed terms.
> For example, if roche is indexed/analyzed as roch, the query r?che won't
> match it.
> 
> Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis
> 
> Ahmet
> 
> 
> 
> On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)"
>  wrote:
> Hi,
> 
> What would be the reasons making the wildcard search for Lucene Query
> Parser NOT working?
> 
> We are using Solr 5.4.1 and, using the admin console, I am triggering for
> instance searches with term 'roche' in a specific core. Everything fine,
> I am getting for instance two matches. I would expect at least the same
> number of matches with term 'r?che'. However, this does NOT happen. I am
> getting zero matches. The same problem occurs with 'r*che'. 'roch?' does
> not work either, but 'roch*' works.
> 
> Switching debug mode brings following output:
> 
> "debug": {
> "rawquerystring": "roch?",
> "querystring": "roch?",
> "parsedquery": "text:roch?",
> "parsedquery_toString": "text:roch?",
> "explain": {},
> "QParser": "LuceneQParser",
> ...
> 
> Any idea? Thanks and cheers,
> 
> christian

RE: Wildcard search not working

2016-08-11 Thread Ribeaud, Christian (Ext)

Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, 
honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Here is the schema snippet for the 
corresponding field:

...




 









...

What is wrong with this schema? Respectively, what should I change to be able 
to correctly do wildcard searches?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Donnerstag, 11. August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

The query r?che may not return at least the same number of matches as roche, 
depending on your analysis chain.
The difference is that roche is analyzed but r?che is not. Wildcard queries are 
executed against the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for 
instance searches with term 'roche' in a specific core. Everything fine, I am 
getting for instance two matches. I would expect at least the same number of 
matches with term 'r?che'. However, this does NOT happen. I am getting zero 
matches. Same problem occurs with 'r*che'. 'roch?' does not work either, but 
'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian

Re: Wildcard search not working

2016-08-11 Thread Ahmet Arslan

Hi Christian,

The query r?che may not return at least the same number of matches as roche, 
depending on your analysis chain.
The difference is that roche is analyzed but r?che is not. Wildcard queries are 
executed against the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis
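
The mismatch described above can be sketched in a few lines of Python. This is 
only an illustration, not Solr code: toy_stem is a hypothetical stand-in for 
whatever stemmer the field type uses (here it just drops a trailing "e"), and 
fnmatch stands in for Lucene's wildcard matching. The point is that the plain 
term goes through analysis while the wildcard pattern is compared against the 
already-stemmed term.

```python
from fnmatch import fnmatchcase


def toy_stem(token: str) -> str:
    """Hypothetical stemmer stand-in: lowercase, then drop one trailing 'e'."""
    token = token.lower()
    return token[:-1] if token.endswith("e") else token


# Terms as they would sit in the index after analysis: "roche" becomes "roch".
indexed_terms = {toy_stem("roche")}


def term_query(text: str) -> bool:
    """A plain term query is analyzed (stemmed) before the index lookup."""
    return toy_stem(text) in indexed_terms


def wildcard_query(pattern: str) -> bool:
    """A wildcard pattern is only lowercased (an assumption about the
    multi-term chain), never stemmed, and is matched against indexed terms."""
    return any(fnmatchcase(term, pattern.lower()) for term in indexed_terms)


if __name__ == "__main__":
    print("roche", term_query("roche"))
    for q in ("r?che", "r*che", "roch?", "roch*"):
        print(q, wildcard_query(q))
```

With this toy index only roche and roch* find the document, which is the same 
pattern reported in the original question.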

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for 
instance searches with term 'roche' in a specific core. Everything fine, I am 
getting for instance two matches. I would expect at least the same number of 
matches with term 'r?che'. However, this does NOT happen. I am getting zero 
matches. Same problem occurs with 'r*che'. 'roch?' does not work either, but 
'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian

Wildcard search not working

2016-08-11 Thread Ribeaud, Christian (Ext)

Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for 
instance searches with term 'roche' in a specific core. Everything fine, I am 
getting for instance two matches. I would expect at least the same number of 
matches with term 'r?che'. However, this does NOT happen. I am getting zero 
matches. Same problem occurs with 'r*che'. 'roch?' does not work either, but 
'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian

RE: wildcard search for string having spaces

2016-06-15 Thread Roshan Kamble

Great.
First option worked for me. I was trying with q=abc\sp*... it should be q=abc\ 
p*

Thanks

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, June 15, 2016 6:25 PM
To: solr-user@lucene.apache.org; Roshan Kamble
Subject: Re: wildcard search for string having spaces

Hi Roshan,

I think there are two options:

1) escape the space q=abc\ p*
2) use prefix query parser q={!prefix f=my_string}abc p

Ahmet

On Wednesday, June 15, 2016 3:48 PM, Roshan Kamble 
 wrote:
Hello,

I have below custom field type defined for solr 6.0.0

I am using above field to ensure that entire string is considered as single 
token and search should be case insensitive.

It works for most of the scenarios with wildcard search.
e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr" then search with abc* 
gives these three results.

But I am not able to search with say abc p*

Search with query q="abc pqr" gives exact match and desired result.

I want to do wildcard search where criteria can include spaces like above 
example

i.e. if space is present then I am not able to do wildcard search.

Is there any way by which wildcard search will be achieved even if space is 
present in token.

Regards,
Roshan

The information in this email is confidential and may be legally privileged. It 
is intended solely for the addressee. Access to this email by anyone else is 
unauthorised. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken or omitted to be taken in reliance on it, is 
prohibited and may be unlawful.


Re: wildcard search for string having spaces

2016-06-15 Thread Ahmet Arslan

Hi Roshan,

I think there are two options:

1) escape the space q=abc\ p*
2) use prefix query parser q={!prefix f=my_string}abc p
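
Both options can be sketched as request parameters built on the client side. 
This is a minimal Python sketch, not an official client: the helper names are 
made up for illustration, my_string is the field name from the example above, 
and option 1 assumes the request's default field is the string field. As far 
as I know the prefix parser does no analysis on its input, so for a 
lowercased field the prefix would have to be lowercased by the caller.

```python
from urllib.parse import urlencode


def escape_whitespace(term: str) -> str:
    """Backslash-escape spaces so the query parser keeps the term in one piece."""
    return term.replace(" ", "\\ ")


def wildcard_params(prefix: str) -> dict:
    """Option 1: escaped space plus a trailing wildcard (hypothetical helper)."""
    return {"q": escape_whitespace(prefix) + "*"}


def prefix_parser_params(field: str, prefix: str) -> dict:
    """Option 2: the prefix query parser takes the raw prefix, no escaping."""
    return {"q": "{!prefix f=%s}%s" % (field, prefix)}


if __name__ == "__main__":
    # urlencode takes care of URL-encoding the backslash, braces and spaces.
    print(urlencode(wildcard_params("abc p")))
    print(urlencode(prefix_parser_params("my_string", "abc p")))
```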

Ahmet


On Wednesday, June 15, 2016 3:48 PM, Roshan Kamble 
 wrote:
Hello,

I have below custom field type defined for solr 6.0.0

   

  
  


  
  

  


I am using above field to ensure that entire string is considered as single 
token and search should be case insensitive.

It works for most of the scenarios with wildcard search.
e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr" then search with abc* 
gives these three results.

But I am not able to search with say abc p*

Search with query q="abc pqr" gives exact match and desired result.

I want to do wildcard search where criteria can include spaces like above 
example


i.e. if space is present then I am not able to do wildcard search.

Is there any way by which wildcard search will be achieved even if space is 
present in token.

Regards,
Roshan


wildcard search for string having spaces

2016-06-15 Thread Roshan Kamble

Hello,

I have the custom field type below defined for Solr 6.0.0

   

  
  


  
  

  


I am using above field to ensure that entire string is considered as single 
token and search should be case insensitive.

It works for most of the scenarios with wildcard search.
e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr" then search with abc* 
gives these three results.

But I am not able to search with say abc p*

Search with query q="abc pqr" gives exact match and desired result.

I want to do wildcard search where criteria can include spaces like above 
example


i.e. if space is present then I am not able to do wildcard search.

Is there any way by which wildcard search will be achieved even if space is 
present in token.

Regards,
Roshan


Re: Solr Wildcard Search for large amount of text

2015-06-27 Thread Jack Krupansky

What do you want actual user queries to look like? I mean, having to
explicitly write asterisks after every term is a real pain.

Indexing ngrams has the advantage that phrase queries and edismax phrase
boosting work automatically. Phrases don't work with explicit wildcard
queries.

The only real downside to ngrams is that they explode the size of the
index. But memory is supposed to be cheap these days. I mean, compare the
cost of the extra RAM (to keep the full index in memory) to the cost to
users of their productivity constructing queries and having expensive staff
to help them figure out why various queries don't work as expected.

How big is your corpus - number of documents and average document size?

-- Jack Krupansky

On Sat, Jun 27, 2015 at 6:27 AM, octopus  wrote:

> Hi, I'm looking at Solr's features for wildcard search used for a large
> amount of text. I read on the net that solr.EdgeNGramFilterFactory is used
> to generate tokens for wildcard searching.
>
> For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria",
> "nigeria", "nigerian"
>
> However, I have a large amount of text out there which requires wildcard
> search and it's not viable to use EdgeNGrameFilterFactory as the amount of
> processing will be too huge. Do you have any suggestions/advice please?
>
> Thank you so much for your time!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Wildcard-Search-for-large-amount-of-text-tp4214392.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Solr Wildcard Search for large amount of text

2015-06-27 Thread Erick Erickson

Try it and see ;).

My experience is that wildcards work fine (although
what counts as "fine" is up to you to decide) _if_ you restrict
them to requiring at least two leading "real" characters,
and I actually prefer three, i.e.
ab* or abc*. Note that if you require leading
wildcards, use the reverse wildcard filter.

I will vociferously argue that single-letter wildcards are
not useful anyway. I mean every single document in your
corpus will probably match every single-letter wildcard
(a*, b*, whatever), providing no benefit to the user.
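
One way to enforce such a minimum is a small check in the application before 
the query is sent to Solr. The following is a rough Python sketch under that 
assumption; the function names and the default of two characters are 
illustrative and not part of Solr.

```python
WILDCARDS = "*?"


def leading_literal_count(term: str) -> int:
    """Number of characters before the first wildcard (* or ?)."""
    for i, ch in enumerate(term):
        if ch in WILDCARDS:
            return i
    return len(term)


def is_acceptable_wildcard(term: str, min_prefix: int = 2) -> bool:
    """Accept terms without wildcards, or with enough leading real characters."""
    has_wildcard = any(ch in WILDCARDS for ch in term)
    return (not has_wildcard) or leading_literal_count(term) >= min_prefix


if __name__ == "__main__":
    for term in ("a*", "ab*", "abc*", "*abc", "abc"):
        print(term, is_acceptable_wildcard(term))
```

A term such as a* or *abc would be rejected here, while ab* and abc* pass; 
raising min_prefix to 3 gives the stricter variant.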

And, the need for wildcards can often be reduced or
eliminated if you can use autosuggest or autocomplete.
Of course if you're trying to satisfy more complex use
cases where the user is composing their own complex
clauses that may not apply.

FWIW,
Erick

On Sat, Jun 27, 2015 at 10:06 AM, Shawn Heisey  wrote:
> On 6/27/2015 4:27 AM, octopus wrote:
>> Hi, I'm looking at Solr's features for wildcard search used for a large
>> amount of text. I read on the net that solr.EdgeNGramFilterFactory is used
>> to generate tokens for wildcard searching.
>>
>> For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria",
>> "nigeria", "nigerian"
>>
>> However, I have a large amount of text out there which requires wildcard
>> search and it's not viable to use EdgeNGrameFilterFactory as the amount of
>> processing will be too huge. Do you have any suggestions/advice please?
>
> Both edgengrams and wildcards are ways to do this.  There are advantages
> and disadvantages to both ways.
>
> To do a wildcard search, Solr (Lucene really) must look up all the
> matching terms in the index and substitute them into the query so that
> it becomes a large number of simple string matches.  If you have a large
> number of terms in your index, that can be slow.  The expensive work
> (expanding the terms) is done for every single query.
>
> The edgengram filter does similar work, but it does it at *index* time,
> rather than query time.  At query time, you are doing a simple string
> match with one term, although the index contains many more terms,
> because the very expensive work was done at index time.
>
> It's difficult to know which approach will be more efficient on *your*
> index without experimentation, but there is a general rule when it comes
> to Solr performance: As much as possible, do the expensive work at index
> time.
>
> Thanks,
> Shawn
>

Re: Solr Wildcard Search for large amount of text

2015-06-27 Thread Shawn Heisey

On 6/27/2015 4:27 AM, octopus wrote:
> Hi, I'm looking at Solr's features for wildcard search used for a large
> amount of text. I read on the net that solr.EdgeNGramFilterFactory is used
> to generate tokens for wildcard searching. 
> 
> For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria",
> "nigeria", "nigerian"
> 
> However, I have a large amount of text out there which requires wildcard
> search and it's not viable to use EdgeNGrameFilterFactory as the amount of
> processing will be too huge. Do you have any suggestions/advice please?

Both edgengrams and wildcards are ways to do this.  There are advantages
and disadvantages to both ways.

To do a wildcard search, Solr (Lucene really) must look up all the
matching terms in the index and substitute them into the query so that
it becomes a large number of simple string matches.  If you have a large
number of terms in your index, that can be slow.  The expensive work
(expanding the terms) is done for every single query.

The edgengram filter does similar work, but it does it at *index* time,
rather than query time.  At query time, you are doing a simple string
match with one term, although the index contains many more terms,
because the very expensive work was done at index time.

It's difficult to know which approach will be more efficient on *your*
index without experimentation, but there is a general rule when it comes
to Solr performance: As much as possible, do the expensive work at index
time.
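
The trade-off can be illustrated with a toy index in Python. This is only a 
sketch of the idea, not how Lucene stores terms: edge_ngrams mimics what an 
edge n-gram filter emits (the min_gram and max_gram defaults are assumptions), 
expand_prefix stands in for the per-query term expansion of a wildcard, and 
the four sample terms are made up.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list:
    """Leading substrings of a token, from min_gram up to max_gram characters."""
    token = token.lower()
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def expand_prefix(prefix: str, terms: list) -> list:
    """Query-time work: scan the term list for every term matching prefix*."""
    return sorted(t for t in terms if t.startswith(prefix))


def build_ngram_index(terms: list) -> dict:
    """Index-time work: store every edge n-gram, pointing back at its documents."""
    index = {}
    for doc_id, term in enumerate(terms):
        for gram in edge_ngrams(term):
            index.setdefault(gram, set()).add(doc_id)
    return index


if __name__ == "__main__":
    terms = ["nigeria", "nigerian", "niger", "night"]
    print(edge_ngrams("Nigerian"))
    # Wildcard style: the expansion is repeated for every query.
    print(expand_prefix("nige", terms))
    # N-gram style: one dictionary lookup, paid for by a larger index.
    ngram_index = build_ngram_index(terms)
    print(sorted(ngram_index["nige"]), len(ngram_index), "terms stored")
```

The n-gram index holds many more entries than the four original terms, which 
is the index-size cost mentioned elsewhere in this thread.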

Thanks,
Shawn

Re: Solr Wildcard Search for large amount of text

2015-06-27 Thread Upayavira

That is one way to implement wildcards, but it isn't the most efficient.

Just index normally, tokenized, and search with an asterisk suffix, e.g.
foo*

This will build a finite state automaton that will make wildcard
handling efficient.

Upayavira

On Sat, Jun 27, 2015, at 11:27 AM, octopus wrote:
> Hi, I'm looking at Solr's features for wildcard search used for a large
> amount of text. I read on the net that solr.EdgeNGramFilterFactory is
> used
> to generate tokens for wildcard searching. 
> 
> For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria",
> "nigeria", "nigerian"
> 
> However, I have a large amount of text out there which requires wildcard
> search and it's not viable to use EdgeNGrameFilterFactory as the amount
> of
> processing will be too huge. Do you have any suggestions/advice please?
> 
> Thank you so much for your time! 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Wildcard-Search-for-large-amount-of-text-tp4214392.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Solr Wildcard Search for large amount of text

2015-06-27 Thread octopus

Hi, I'm looking at Solr's features for wildcard search used for a large
amount of text. I read on the net that solr.EdgeNGramFilterFactory is used
to generate tokens for wildcard searching. 

For Nigerian => "ni", "nig", "nige", "niger", "nigeri", "nigeria",
"nigerian"

However, I have a large amount of text out there which requires wildcard
search and it's not viable to use EdgeNGramFilterFactory as the amount of
processing will be too huge. Do you have any suggestions/advice please?

Thank you so much for your time! 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Wildcard-Search-for-large-amount-of-text-tp4214392.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: rq breaks wildcard search?

2015-04-22 Thread Ryan Josal

Awesome thanks!  I was on 4.10.2

Ryan

> On Apr 22, 2015, at 16:44, Joel Bernstein  wrote:
> 
> For your own implementation you'll need to implement the following methods:
> 
> public Query rewrite(IndexReader reader) throws IOException
> public void extractTerms(Set<Term> terms)
> 
> You can review the 4.10.3 version of the ReRankQParserPlugin to see how it
> implements these methods.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
>> On Wed, Apr 22, 2015 at 7:33 PM, Joel Bernstein  wrote:
>> 
>> Just confirmed that wildcard queries work with Re-Ranking following
>> SOLR-6323.
>> 
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>> 
>> On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein 
>> wrote:
>> 
>>> This should be resolved in
>>> https://issues.apache.org/jira/browse/SOLR-6323.
>>> 
>>> Solr 4.10.3
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>> 
>>>> On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal  wrote:
>>>> 
>>>> Using edismax, supplying a rq= param, like {!rerank ...} is causing an
>>>> UnsupportedOperationException because the Query doesn't implement
>>>> createWeight.  This is for WildcardQuery in particular.  From some
>>>> preliminary debugging it looks like without rq, somehow the qf Queries
>>>> might turn into ConstantScore instead of WildcardQuery.  I don't think
>>>> this
>>>> is related to the RankQuery implementation as my own subclass has the
>>>> same
>>>> issue.  Anyway the effect is that all q's containing ? or * return http
>>>> 500
>>>> because I always have rq on.  Can anyone confirm if this is a bug?  I
>>>> will
>>>> log it in Jira if so.
>>>> 
>>>> Also, does anyone know how I can work around it?  Specifically, can I
>>>> disable edismax from making WildcardQueries?
>>>> 
>>>> Ryan
>>

Re: rq breaks wildcard search?

2015-04-22 Thread Joel Bernstein

For your own implementation you'll need to implement the following methods:

public Query rewrite(IndexReader reader) throws IOException
public void extractTerms(Set<Term> terms)

You can review the 4.10.3 version of the ReRankQParserPlugin to see how it
implements these methods.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 22, 2015 at 7:33 PM, Joel Bernstein  wrote:

> Just confirmed that wildcard queries work with Re-Ranking following
> SOLR-6323.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein 
> wrote:
>
>> This should be resolved in
>> https://issues.apache.org/jira/browse/SOLR-6323.
>>
>> Solr 4.10.3
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal  wrote:
>>
>>> Using edismax, supplying a rq= param, like {!rerank ...} is causing an
>>> UnsupportedOperationException because the Query doesn't implement
>>> createWeight.  This is for WildcardQuery in particular.  From some
>>> preliminary debugging it looks like without rq, somehow the qf Queries
>>> might turn into ConstantScore instead of WildcardQuery.  I don't think
>>> this
>>> is related to the RankQuery implementation as my own subclass has the
>>> same
>>> issue.  Anyway the effect is that all q's containing ? or * return http
>>> 500
>>> because I always have rq on.  Can anyone confirm if this is a bug?  I
>>> will
>>> log it in Jira if so.
>>>
>>> Also, does anyone know how I can work around it?  Specifically, can I
>>> disable edismax from making WildcardQueries?
>>>
>>> Ryan
>>>
>>
>>
>

Re: rq breaks wildcard search?

2015-04-22 Thread Joel Bernstein

Just confirmed that wildcard queries work with Re-Ranking following
SOLR-6323.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein  wrote:

> This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323
> .
>
> Solr 4.10.3
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal  wrote:
>
>> Using edismax, supplying a rq= param, like {!rerank ...} is causing an
>> UnsupportedOperationException because the Query doesn't implement
>> createWeight.  This is for WildcardQuery in particular.  From some
>> preliminary debugging it looks like without rq, somehow the qf Queries
>> might turn into ConstantScore instead of WildcardQuery.  I don't think
>> this
>> is related to the RankQuery implementation as my own subclass has the same
>> issue.  Anyway the effect is that all q's containing ? or * return http
>> 500
>> because I always have rq on.  Can anyone confirm if this is a bug?  I will
>> log it in Jira if so.
>>
>> Also, does anyone know how I can work around it?  Specifically, can I
>> disable edismax from making WildcardQueries?
>>
>> Ryan
>>
>
>

Re: rq breaks wildcard search?

2015-04-22 Thread Joel Bernstein

This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323.

Solr 4.10.3

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal  wrote:

> Using edismax, supplying a rq= param, like {!rerank ...} is causing an
> UnsupportedOperationException because the Query doesn't implement
> createWeight.  This is for WildcardQuery in particular.  From some
> preliminary debugging it looks like without rq, somehow the qf Queries
> might turn into ConstantScore instead of WildcardQuery.  I don't think this
> is related to the RankQuery implementation as my own subclass has the same
> issue.  Anyway the effect is that all q's containing ? or * return http 500
> because I always have rq on.  Can anyone confirm if this is a bug?  I will
> log it in Jira if so.
>
> Also, does anyone know how I can work around it?  Specifically, can I
> disable edismax from making WildcardQueries?
>
> Ryan
>

rq breaks wildcard search?

2015-04-15 Thread Ryan Josal

Using edismax, supplying a rq= param, like {!rerank ...} is causing an
UnsupportedOperationException because the Query doesn't implement
createWeight.  This is for WildcardQuery in particular.  From some
preliminary debugging it looks like without rq, somehow the qf Queries
might turn into ConstantScore instead of WildcardQuery.  I don't think this
is related to the RankQuery implementation as my own subclass has the same
issue.  Anyway the effect is that all q's containing ? or * return http 500
because I always have rq on.  Can anyone confirm if this is a bug?  I will
log it in Jira if so.

Also, does anyone know how I can work around it?  Specifically, can I
disable edismax from making WildcardQueries?

Ryan

Re: Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-15 Thread Thomas Michael Engelke


Thank you very much,

this information is worth its weight in gold. So far, we've used the 
asterisk method because it seemed logical and straight-forward. We will 
slowly migrate to a version using EdgeNGramFilterFactory.


Thanks a bunch.

Am 07.10.2014 14:42 schrieb Alexandre Rafalovitch:


On 7 October 2014 08:25, Thomas Michael Engelke
 wrote:

So the culprit is the asterisk at the end. As far as we can read from 
the docs, an asterisk is just 0 or more characters, which means that 
the literal word in front of the asterisk should match the query.


Not quite: http://wiki.apache.org/solr/MultitermQueryAnalysis [1]

It's actually quite complicated and even depends on exact version of
Solr you are using. In fact, out of all the analyzers you showed
above, I think only LowerCase will be present on the chain. Look for
(multi) marker at: http://www.solr-start.com/info/analyzers/ [2] for 
more

details.

On a higher level, I would suggest getting away from *-based expansion
and looking at EdgeNGrams instead. You can see an example of
autocomplete at
http://www.solr-start.com/javadoc/solr-lucene/index.html [3] and the
matching configuration at:
https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 
[4]


Or a dedicated Suggester module, though information on that is a bit
harder to find.

Regards,
Alex.

Personal: http://www.outerthoughts.com/ [5] and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ [6] and 
@solrstart
Solr popularizers community: 
https://www.linkedin.com/groups?gid=6713853 [7]



Links:
--
[1] http://wiki.apache.org/solr/MultitermQueryAnalysis
[2] http://www.solr-start.com/info/analyzers/
[3] http://www.solr-start.com/javadoc/solr-lucene/index.html
[4] 
https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24

[5] http://www.outerthoughts.com/
[6] http://www.solr-start.com/
[7] https://www.linkedin.com/groups?gid=6713853

Re: Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-07 Thread Markus Jelsma

Hi - you should not use wildcards for autocompletion; Lucene has far better 
tools for building very good autocompletion. Also, since a wildcard query is a 
multi-term query, it is not passed through your configured query-time analyzer.

Some other comments:
- you use a Porter stemmer, but you should use one of the German-specific stem 
filters.
- you don't have an index-time tokenizer defined; this should not be possible, 
and the behaviour is undefined as far as I know.


On Tuesday 07 October 2014 14:25:27 Thomas Michael Engelke wrote:
> I have a problem with a stemmed german field. The field definition:
> 
>  stored="true" required="false" multiValued="false"/>
> ...
>  positionIncrementGap="100" autoGeneratePhraseQueries="true">
>
>   words="stopwords.txt"/>
>   generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>  
>   protected="protwords.txt"/>
>  
>
>
>  
>   ignoreCase="true" expand="true"/>
>   words="stopwords.txt"/>
>   generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>  
>   protected="protwords.txt"/>
>  
>
> 
> 
> When we search for a word from an autosuggest kind of component, we
> always add an asterisk to a word, so when somebody enters something like
> "Radbremszylinder" and waits for some milliseconds, the autosuggest list
> is filled with the results of searching for "Radbremszylinder*". This
> seemed to work quite well. Today we got a bug report from a customer for
> that exact word.
> 
> So I made an analysis for the word as "Field value (index)" and "Field
> value (query)", and it looked like this:
> 
> ST   RadbremszylinderWT   Radbremszylinder*
> SF   RadbremszylinderSF   Radbremszylinder*
> WDF  RadbremszylinderSF   Radbremszylinder*
> LCF  radbremszylinderWDF  Radbremszylinder
> SKMF radbremszylinderLCF  radbremszylinder
> PSF  radbremszylind  SKMF radbremszylinder
> 
> As you can see, the end result looks very much alike. However, records
> containing that word in their "description" field aren't reported as
> results. Strangely enough, records containing "Radbremszylindern"
> (plural) are reported as results. Removing the asterisk from the end
> reports all records with "Radbremszylinder", just as we would expect. So
> the culprit is the asterisk at the end. As far as we can read from the
> docs, an asterisk is just 0 or more characters, which means that the
> literal word in front of the asterisk should match the query.
> 
> Searching further we tried some variations, and it seems that searching
> for "Radbremszylind*" works. All records with any variation
> ("Radbremszylinder", "Radbremszylindern") are reported. So maybe there's
> a weird interaction with stemming?
> 
> Any ideas?

Re: Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-07 Thread Alexandre Rafalovitch

On 7 October 2014 08:25, Thomas Michael Engelke
 wrote:
> So the culprit is the asterisk at the end. As far as we can read from the
> docs, an asterisk is just 0 or more characters, which means that the literal
> word in front of the asterisk should match the query.

Not quite: http://wiki.apache.org/solr/MultitermQueryAnalysis

It's actually quite complicated and even depends on exact version of
Solr you are using. In fact, out of all the analyzers you showed
above, I think only LowerCase will be present on the chain. Look for
(multi) marker at: http://www.solr-start.com/info/analyzers/ for more
details.

On a higher level, I would suggest getting away from *-based expansion
and looking at EdgeNGrams instead. You can see an example of
autocomplete at
http://www.solr-start.com/javadoc/solr-lucene/index.html and the
matching configuration at:
https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24

Or a dedicated Suggester module, though information on that is a bit
harder to find.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-07 Thread Thomas Michael Engelke


I have a problem with a stemmed German field. The field definition:

stored="true" required="false" multiValued="false"/>

...
positionIncrementGap="100" autoGeneratePhraseQueries="true">

  
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="1" 
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>


protected="protwords.txt"/>


  
  

ignoreCase="true" expand="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="0" 
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>


protected="protwords.txt"/>


  


When we search for a word from an autosuggest kind of component, we 
always add an asterisk to a word, so when somebody enters something like 
"Radbremszylinder" and waits for some milliseconds, the autosuggest list 
is filled with the results of searching for "Radbremszylinder*". This 
seemed to work quite well. Today we got a bug report from a customer for 
that exact word.


So I made an analysis for the word as "Field value (index)" and "Field 
value (query)", and it looked like this:


ST   RadbremszylinderWT   Radbremszylinder*
SF   RadbremszylinderSF   Radbremszylinder*
WDF  RadbremszylinderSF   Radbremszylinder*
LCF  radbremszylinderWDF  Radbremszylinder
SKMF radbremszylinderLCF  radbremszylinder
PSF  radbremszylind  SKMF radbremszylinder

As you can see, the end result looks very much alike. However, records 
containing that word in their "description" field aren't reported as 
results. Strangely enough, records containing "Radbremszylindern" 
(plural) are reported as results. Removing the asterisk from the end 
reports all records with "Radbremszylinder", just as we would expect. So 
the culprit is the asterisk at the end. As far as we can read from the 
docs, an asterisk is just 0 or more characters, which means that the 
literal word in front of the asterisk should match the query.


Searching further we tried some variations, and it seems that searching 
for "Radbremszylind*" works. All records with any variation 
("Radbremszylinder", "Radbremszylindern") are reported. So maybe there's 
a weird interaction with stemming?


Any ideas?

Re: Wildcard search makes no sense!!

2014-10-02 Thread waynemailinglist

Ok, I think I understand your points there. Just to clarify: say the term was
"Large increased" and my filters went something like:

Large|increased
Large|increase|increased
large|increase|increased

the final tokens indexed would be large|increase|increased  ?

Once again thanks for all the help.


On Thu, Oct 2, 2014 at 2:30 PM, Shawn Heisey-2 [via Lucene] <
ml-node+s472066n4162306...@n3.nabble.com> wrote:

> On 10/2/2014 4:33 AM, waynemailinglist wrote:
>
> > Something that is still not clear in my mind is how this tokenising
> works.
> > For example with the filters I have when I run the analyser I get:
> > Field: Hello You
> >
> > Hello|You
> > Hello|You
> > Hello|You
> > hello|you
> > hello|you
> >
> >
> > Does this mean that the index is stored as 'hello|you' (the final one)
> and
> > that when I run a query and it goes through the filters whatever the end
> > result of that is must match the 'hello|you' in order to return a
> result?
>
> The index has two terms for this field if this is the whole input --
> hello and you -- which can be searched for individually.  The tokenizer
> does the initial job of separating the input into tokens (terms) ...
> some filters can create additional terms, depending on exactly what's
> left when the tokenizer is done.
>
> Thanks,
> Shawn
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162349.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search makes no sense!!

2014-10-02 Thread Erick Erickson

right, prior to 3.6, the standard way to handle wildcards was to,
essentially, pre-analyze the terms that had  wildcards. This works
fine for simple filters, things like lowercasing for instance, but
doesn't work so well for things like stemming.

So you're doing what can be done at this point, but moving to 4.x (or
even 3.6) would solve it better.

Best,
Erick

On Thu, Oct 2, 2014 at 6:29 AM, Shawn Heisey  wrote:
> On 10/2/2014 4:33 AM, waynemailinglist wrote:
>> Something that is still not clear in my mind is how this tokenising works.
>> For example with the filters I have when I run the analyser I get:
>> Field: Hello You
>>
>> Hello|You
>> Hello|You
>> Hello|You
>> hello|you
>> hello|you
>>
>>
>> Does this mean that the index is stored as 'hello|you' (the final one) and
>> that when I run a query and it goes through the filters whatever the end
>> result of that is must match the 'hello|you' in order to return a result?
>
> The index has two terms for this field if this is the whole input --
> hello and you -- which can be searched for individually.  The tokenizer
> does the initial job of separating the input into tokens (terms) ...
> some filters can create additional terms, depending on exactly what's
> left when the tokenizer is done.
>
> Thanks,
> Shawn
>

Re: Wildcard search makes no sense!!

2014-10-02 Thread Shawn Heisey

On 10/2/2014 4:33 AM, waynemailinglist wrote:
> Something that is still not clear in my mind is how this tokenising works.
> For example with the filters I have when I run the analyser I get:
> Field: Hello You
> 
> Hello|You
> Hello|You
> Hello|You
> hello|you
> hello|you
> 
> 
> Does this mean that the index is stored as 'hello|you' (the final one) and
> that when I run a query and it goes through the filters whatever the end
> result of that is must match the 'hello|you' in order to return a result?

The index has two terms for this field if this is the whole input --
hello and you -- which can be searched for individually.  The tokenizer
does the initial job of separating the input into tokens (terms) ...
some filters can create additional terms, depending on exactly what's
left when the tokenizer is done.

Thanks,
Shawn
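
The point about individual terms can be sketched as follows. This is a simplified illustration, assuming a whitespace tokenizer followed by a lowercase filter; the function names are hypothetical and the real analysis chain in the thread has more stages.

```python
from collections import defaultdict

def analyze(text):
    """Whitespace tokenizer followed by a lowercase filter (simplified)."""
    return [token.lower() for token in text.split()]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

index = build_index({1: "Hello You"})
print(sorted(index))   # the separate terms stored for the field
print(index["you"])    # each term can be looked up on its own
```

The "hello|you" shown by the analysis page is therefore a display of two separate terms, not one stored string.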

Re: Wildcard search makes no sense!!

2014-10-02 Thread waynemailinglist

Many many thanks for the replies - it was helpful for me to start
understanding how this works.

I'm using 3.5 so this goes to explain a lot. What I have done is if the
query contains a * I make the query lowercase before sending to solr. This
seems to have solved this issue given your explanation above. Many thanks 
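
A minimal sketch of that client-side workaround, assuming the index-time chain lowercases its terms. The function name is hypothetical.

```python
def prepare_query(user_query):
    """Lowercase the query only when it contains a wildcard."""
    if "*" in user_query:
        return user_query.lower()
    return user_query

print(prepare_query("Capit*"))
print(prepare_query("Capital Health"))
```

One caveat with this approach: it lowercases the whole string, so a query such as "capital* OR capital" would end up with a lowercase "or", which the query parser may no longer treat as an operator.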

Something that is still not clear in my mind is how this tokenising works.
For example with the filters I have when I run the analyser I get:
Field: Hello You

Hello|You
Hello|You
Hello|You
hello|you
hello|you


Does this mean that the index is stored as 'hello|you' (the final one) and
that when I run a query and it goes through the filters whatever the end
result of that is must match the 'hello|you' in order to return a result?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162284.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search makes no sense!!

2014-10-01 Thread Erick Erickson

Two things:

1> what version of Solr are you using? If it's prior to 3.6, then the
bits that handle applying lowercaseFilter to wildcards aren't in the
code.

2> what do you see if you add &debug=query?

I just tried it with your analysis chain and it seemed to work. Did
you completely blow your index away when trying this? I did get into a
state where my terms didn't show up. When you change the schema,
sometimes some information about the fields is written into the index
and is incompatible with later changes.

By "completely blow away" I mean
stop Solr
rm -rf blah/collection/data
start Solr
reindex
test


Best,
Erick

On Wed, Oct 1, 2014 at 10:10 AM, waynemailinglist
 wrote:
> I'm still stuck on this actually. I would really appreciate any pointers.
> If I search for :
> query 1: Κώστας
> result: Κώστας
>
> query 2: Κώστα*
> result: 
>
> I've looked at the analyser but I don't really understand what I'm looking
> at if I'm honest. It gives the output:
> Field (name): title
> Field value: Κώστας
> Field value (query): Κώστα*
>
> Index Analyzer
> Κώστας
> Κώστας
> Κώστας
> κώστας
> κώστας
> Query Analyzer
> Κώστα*
> Κώστα*
> Κώστα*
> Κώστα
> κώστα
> κώστα
>
>
> In my schema I have defined
> 
>  ignoreCase="true" expand="true"/> (only used in query)
>  words="stopwords.txt"/>
>  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0"/>
> 
> 
>
>
> I tried adding ASCIIFoldingFilterFactory but that didn't make any difference
> after reindexing.
>
> Any ideas?
>
> many thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162150.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search makes no sense!!

2014-10-01 Thread Alexandre Rafalovitch

If you use "*", you go through the multiterm analysis path, which is
semi-hidden and a lot more limited than the analysis applied to normal tokens:
https://wiki.apache.org/solr/MultitermQueryAnalysis

The Analyzer components that are NOT multiterm aware cannot be used
that way. Looking at: http://www.solr-start.com/info/analyzers/ , you
can see that only LowerCase analyzer is multiterm aware (with (multi)
in the brackets). So, the rest are not used.

You may switch to EdgeNGrams or similar instead.
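
A rough sketch of the edge n-gram idea, as a toy model rather than Solr's filter: every leading substring of each token is indexed, so a prefix typed by the user becomes an ordinary exact term lookup and no wildcard is needed. The function and parameter names are illustrative assumptions.

```python
def edge_ngrams(term, min_gram=1, max_gram=25):
    """Return the leading substrings of a term, shortest first."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]

def index_terms(text):
    """Lowercase, split on whitespace, and expand each token into edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms

terms = index_terms("Capital Health")
print(edge_ngrams("capit"))
print("capita" in terms)
```

The trade-off is a larger index, since each token contributes one term per prefix length.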

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 1 October 2014 13:10, waynemailinglist  wrote:
> I'm still stuck on this actually. I would really appreciate any pointers.
> If I search for :
> query 1: Κώστας
> result: Κώστας
>
> query 2: Κώστα*
> result: 
>
> I've looked at the analyser but I don't really understand what I'm looking
> at if I'm honest. It gives the output:
> Field (name): title
> Field value: Κώστας
> Field value (query): Κώστα*
>
> Index Analyzer
> Κώστας
> Κώστας
> Κώστας
> κώστας
> κώστας
> Query Analyzer
> Κώστα*
> Κώστα*
> Κώστα*
> Κώστα
> κώστα
> κώστα
>
>
> In my schema I have defined
> 
>  ignoreCase="true" expand="true"/> (only used in query)
>  words="stopwords.txt"/>
>  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0"/>
> 
> 
>
>
> I tried adding ASCIIFoldingFilterFactory but that didn't make any difference
> after reindexing.
>
> Any ideas?
>
> many thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162150.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search makes no sense!!

2014-10-01 Thread waynemailinglist

I'm still stuck on this actually. I would really appreciate any pointers. 
If I search for :
query 1: Κώστας
result: Κώστας

query 2: Κώστα*
result: 

I've looked at the analyser but I don't really understand what I'm looking
at if I'm honest. It gives the output:
Field (name): title
Field value: Κώστας
Field value (query): Κώστα*

Index Analyzer
Κώστας
Κώστας
Κώστας
κώστας
κώστας
Query Analyzer
Κώστα*
Κώστα*
Κώστα*
Κώστα
κώστα
κώστα


In my schema I have defined

 (only used in query)






I tried adding ASCIIFoldingFilterFactory but that didn't make any difference
after reindexing.

Any ideas?

many thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162150.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search makes no sense!!

2014-10-01 Thread waynemailinglist

Ahmet -  many thanks - I removed the EnglishPorterFilterFactory and reindexed
and this seems to behave as expected now.

Jack - thanks as well - I'm very much a noob with this, and that's a great
tip.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-makes-no-sense-tp4162069p4162086.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search makes no sense!!

2014-10-01 Thread Jack Krupansky

The presence of a wildcard in a query term short circuits some portions of 
the analysis process. Some token filters like lower case can still be 
performed on the query terms, but others, like stemming, cannot. So, either 
simplify the analysis (be more selective of what token filters you use), or 
you will have to modify your query terms so that you manually simulate the 
token transformations that your text analysis is performing.


Take one of your indexed terms that you think should match and send it 
through the Solr Admin UI analysis page for the query field and see what the 
source token gets analyzed into - that's what your wildcard prefix must 
match. Sometimes (usually!) you will be surprised.


-- Jack Krupansky

-Original Message- 
From: Wayne W

Sent: Wednesday, October 1, 2014 7:16 AM
To: solr-user@lucene.apache.org
Subject: Wildcard search makes no sense!!

Hi,

I don't understand this at all. We are indexing some contact names. When we
do a standard query:

query 1: capi*
result: Capital Health

query 2: capit*
result: Capital Health

query 3: capita*
result: 

query 4: capital*
result: 

I understand (as we are using Solr 3.5) that the wildcard search does not
actually return the query without the wildcard so I understand at least why
query 4 is not working ( I need to use: capital* OR capital ). What I don't
understand is why query 3 is not working.

Also if we place in the text field the following 3 contacts:

j...@capitalhealth.com
f...@capitalhealth.com
Capital Heath

When searching for:

query A: capita*
result: j...@capitalhealth.com, f...@capitalhealth.com

query B: capit*
result: j...@capitalhealth.com, f...@capitalhealth.com, Capital Heath


What is going on and how can I solve this?
many thanks as I'm really stuck on this

Re: Wildcard search makes no sense!!

2014-10-01 Thread Toke Eskildsen

On Wed, 2014-10-01 at 13:16 +0200, Wayne W wrote:
> query 2: capit*
> result: Capital Health
> 
> query 3: capita*
> result: 

You are likely using a stemmer for the field: "Capital Health" gets
indexed as "capit" and "health", so there are no tokens starting with
"capita".

Turn off the stemmer or add a non-stemmed copy-field for truncated
searches.
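
A small sketch of why the non-stemmed copy helps, as a toy model with a one-entry lookup standing in for the stemmer. The stem "capit" is the one quoted in this thread; the function names are hypothetical.

```python
STEMS = {"capital": "capit"}  # stand-in for a real stemmer

def stemmed_terms(text):
    """Terms as a stemmed field would store them."""
    return {STEMS.get(t, t) for t in text.lower().split()}

def unstemmed_terms(text):
    """Terms as a non-stemmed copy of the field would store them."""
    return set(text.lower().split())

def prefix_hit(prefix, terms):
    """True if any indexed term starts with the prefix."""
    return any(t.startswith(prefix) for t in terms)

doc = "Capital Health"
for prefix in ["capi", "capit", "capita", "capital"]:
    print(prefix,
          prefix_hit(prefix, stemmed_terms(doc)),
          prefix_hit(prefix, unstemmed_terms(doc)))
```

In this model the stemmed field stops matching once the prefix is longer than the stem, while the non-stemmed copy keeps matching up to the full word.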

(sanity-checked at http://9ol.es/porter_js_demo.html)

- Toke Eskildsen, State and University Library, Denmark

Re: Wildcard search makes no sense!!

2014-10-01 Thread Ahmet Arslan

Hi,

You probably have a stemmer, and it is reducing Capital to capit. That's the
reason.
Either remove the stemmer from the analyzer chain or add the keyword repeat filter.

Ahmet



On Wednesday, October 1, 2014 2:16 PM, Wayne W  
wrote:
Hi,

I don't understand this at all. We are indexing some contact names. When we
do a standard query:

query 1: capi*
result: Capital Health

query 2: capit*
result: Capital Health

query 3: capita*
result: 

query 4: capital*
result: 

I understand (as we are using Solr 3.5) that the wildcard search does not
actually return the query without the wildcard so I understand at least why
query 4 is not working ( I need to use: capital* OR capital ). What I don't
understand is why query 3 is not working.

Also if we place in the text field the following 3 contacts:

j...@capitalhealth.com
f...@capitalhealth.com
Capital Heath

When searching for:

query A: capita*
result: j...@capitalhealth.com, f...@capitalhealth.com

query B: capit*
result: j...@capitalhealth.com, f...@capitalhealth.com, Capital Heath


What is going on and how can I solve this?
many thanks as I'm really stuck on this

Wildcard search makes no sense!!

2014-10-01 Thread Wayne W

Hi,

I don't understand this at all. We are indexing some contact names. When we
do a standard query:

query 1: capi*
result: Capital Health

query 2: capit*
result: Capital Health

query 3: capita*
result: 

query 4: capital*
result: 

I understand (as we are using Solr 3.5) that the wildcard search does not
actually return the query without the wildcard so I understand at least why
query 4 is not working ( I need to use: capital* OR capital ). What I don't
understand is why query 3 is not working.

Also if we place in the text field the following 3 contacts:

j...@capitalhealth.com
f...@capitalhealth.com
Capital Heath

When searching for:

query A: capita*
result: j...@capitalhealth.com, f...@capitalhealth.com

query B: capit*
result: j...@capitalhealth.com, f...@capitalhealth.com, Capital Heath


What is going on and how can I solve this?
many thanks as I'm really stuck on this

Re: Stemming not working with wildcard search

2014-04-30 Thread Erick Erickson

Did you re-index? And what do you get when adding &debug=query? That
should show you the parsed query. Have you looked at the results of
the admin/analysis page? That tool is invaluable for seeing what the
actual transformations are.
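
A short sketch of adding debug=query to a request URL. The base URL and field name are the ones used elsewhere in this thread; the helper name is hypothetical, and only the Python standard library is used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def select_url(base, query, debug=True):
    """Build a /select URL, optionally asking for the parsed query."""
    params = {"q": query}
    if debug:
        params["debug"] = "query"
    return base + "/select?" + urlencode(params)

url = select_url("http://localhost:8080/solr/master", "page_title_t:*products*")
print(url)
print(parse_qs(urlsplit(url).query))
```

urlencode takes care of escaping characters such as ":" and "*" in the query value.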

Best,
Erick

On Mon, Apr 28, 2014 at 11:41 AM, Geepalem  wrote:
> Hi Ahmet,
>
> Thanks for your prompt response!
>
> I have added filters which you have specified but still its not working.
> Below is field Query Analyzer
>
>  
> 
> 
>
> 
> 
> 
>  
>
> http://localhost:8080/solr/master/select?q=page_title_t:*products*
> http://localhost:8080/solr/master/select?q=page_title_t:*product*
>
>
> Please let me know if I am doing anything wrong.
>
> Thanks,
> G. Naresh Kumar
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382p4133556.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working with search term having special characters and digits

2014-04-29 Thread Geepalem

Can someone help me out with this issue please?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133770.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stemming not working with wildcard search

2014-04-29 Thread Geepalem

Can someone help me out with this issue?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382p4133769.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stemming not working with wildcard search

2014-04-28 Thread Geepalem

Hi Ahmet,

Thanks for your prompt response!

I have added filters which you have specified but still its not working.
Below is field Query Analyzer

 

 




 

http://localhost:8080/solr/master/select?q=page_title_t:*products*
http://localhost:8080/solr/master/select?q=page_title_t:*product*


Please let me know if I am doing anything wrong.

Thanks,
G. Naresh Kumar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382p4133556.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working with search term having special characters and digits

2014-04-28 Thread Geepalem

Thanks jack for prompt response!

So is there any solution to make this scenario works? 
Or wildcard doesn't work with special characters and numerics?

Thanks,
G. Naresh Kumar



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133554.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stemming not working with wildcard search

2014-04-28 Thread Ahmet Arslan

Hi Naresh,

quotes are only meaningful when there are two or more terms. don't use quotes 
for products* and product*.

As regarding stemming and wildcards, use following chain, and your wildcard 
searches will be happier.





Ahmet


On Monday, April 28, 2014 5:41 PM, Jack Krupansky  
wrote:
Wildcards and stemming are incompatible at query time - you need to manually 
stem the term before applying your wildcard.

Wildcards are not supported in quoted phrases. They will be treated as 
punctuation, and ignored by the standard tokenizer or the word delimiter 
filter.

-- Jack Krupansky

-Original Message- 
From: Geepalem
Sent: Sunday, April 27, 2014 3:13 PM
To: solr-user@lucene.apache.org
Subject: Stemming not working with wildcard search

Hi,

I have added  SnowballPorterFilterFactory filter to field type to make
singular and plural search terms return same results.

So below queries (double quotes around search term) returning similar
results which is fine.

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I have analyzed results, in both result sets, documents which dont
start with words "Product" or "products" didnt come though there are few
documents available.

So I have added * as prefix and suffix to search term without double quotes
to do wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now, stemming is not working as above second query is not returning similar
results as query 1.

If double quotes are added around search term then its returning similar
results but results are not as expected. With double quotes it wont return
results like "Old products", "New products", "Cool Product".
It will only return results with the values like "Product 1", "Product
2","Products of USA".

Please suggest or guide how to make stemming work with wildcard search.


Appreciate immediate response!!

Thanks,
G. Naresh Kumar





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stemming not working with wildcard search

2014-04-28 Thread Jack Krupansky

Wildcards and stemming are incompatible at query time - you need to manually 
stem the term before applying your wildcard.
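
A minimal sketch of stemming the term by hand before wrapping it in wildcards. The one-entry lookup is a stand-in for a real stemmer and the function name is hypothetical; the field name is the one from the question.

```python
STEMS = {"products": "product"}  # stand-in for a real stemmer

def contains_query(field, word):
    """Lowercase and stem the word, then wrap it in leading/trailing wildcards."""
    lowered = word.lower()
    stem = STEMS.get(lowered, lowered)
    return field + ":*" + stem + "*"

print(contains_query("page_title_t", "Products"))
print(contains_query("page_title_t", "product"))
```

Both inputs produce the same wildcard query, which is the effect the stemmer would otherwise have provided.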


Wildcards are not supported in quoted phrases. They will be treated as 
punctuation, and ignored by the standard tokenizer or the word delimiter 
filter.


-- Jack Krupansky

-Original Message- 
From: Geepalem

Sent: Sunday, April 27, 2014 3:13 PM
To: solr-user@lucene.apache.org
Subject: Stemming not working with wildcard search

Hi,

I have added  SnowballPorterFilterFactory filter to field type to make
singular and plural search terms return same results.

So below queries (double quotes around search term) returning similar
results which is fine.

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I have analyzed results, in both result sets, documents which dont
start with words "Product" or "products" didnt come though there are few
documents available.

So I have added * as prefix and suffix to search term without double quotes
to do wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now, stemming is not working as above second query is not returning similar
results as query 1.

If double quotes are added around search term then its returning similar
results but results are not as expected. With double quotes it wont return
results like "Old products", "New products", "Cool Product".
It will only return results with the values like "Product 1", "Product
2","Products of USA".

Please suggest or guide how to make stemming work with wildcard search.


Appreciate immediate response!!

Thanks,
G. Naresh Kumar





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working with search term having special characters and digits

2014-04-28 Thread Jack Krupansky

Wildcard query only works for single terms. Any embedded special characters 
will cause a term to be split into multiple terms at index time. The use of 
a wildcard in a query term with embedded special characters will bypass 
normal analysis - you need to enter the term exactly as it would be analyzed 
at index time for wildcard to work.


Ditto if your field type uses the word delimiter filter with the split 
digits option enabled - the alpha and numeric portions will generate 
separate terms - and cause a wildcard to fail.
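
A toy illustration of the splitting described above, not Solr's tokenizer: the value is split on every character that is not a letter or digit, and a trailing-wildcard pattern has to match one single indexed term. The function names are hypothetical.

```python
import re

def index_terms(text):
    """Split on anything that is not a letter or digit, then lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def wildcard_hit(pattern, terms):
    """A trailing-* pattern must match the start of one single indexed term."""
    prefix = pattern.rstrip("*").lower()
    return any(t.startswith(prefix) for t in terms)

terms = index_terms("AN-138")
print(terms)
print(wildcard_hit("an-13*", terms))
print(wildcard_hit("13*", terms))
```

In this model no indexed term contains the hyphen, so a prefix that includes it cannot match, while a prefix of either part can.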


-- Jack Krupansky

-Original Message- 
From: Geepalem

Sent: Sunday, April 27, 2014 3:30 PM
To: solr-user@lucene.apache.org
Subject: Wildcard search not working with search term having special 
characters and digits


Hi,

The query below, without a wildcard, returns results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-138"

But the query below, with a wildcard, does not return results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-13*"

The query below, with a wildcard and no digits, returns results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-*"

I have tried adding the WordDelimiter filter, but with no luck.



Please suggest or guide how to make wildcard search works with special
characters and digits.

Appreciate immediate response!!

Thanks,
G. Naresh Kumar






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working with search term having special characters and digits

2014-04-28 Thread Geepalem

Can someone please help me with this, as I am stuck on this issue?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385p4133478.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stemming not working with wildcard search

2014-04-28 Thread Geepalem

Can someone please help me with this, as I am stuck on this issue?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382p4133477.html
Sent from the Solr - User mailing list archive at Nabble.com.

Wildcard search not working with search term having special characters and digits

2014-04-27 Thread Geepalem

Hi,

The query below, without a wildcard, returns results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-138"

But the query below, with a wildcard, does not return results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-13*"

The query below, with a wildcard and no digits, returns results.
http://localhost:8080/solr/master/select?q=page_title_t:"an-*"

I have tried adding the WordDelimiter filter, but with no luck.



Please suggest or guide how to make wildcard search works with special
characters and digits.

Appreciate immediate response!!

Thanks,
G. Naresh Kumar


 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385.html
Sent from the Solr - User mailing list archive at Nabble.com.

Stemming not working with wildcard search

2014-04-27 Thread Geepalem

Hi,

I have added  SnowballPorterFilterFactory filter to field type to make
singular and plural search terms return same results.

So below queries (double quotes around search term) returning similar
results which is fine.

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I analyzed the results, neither result set included documents that
don't start with the word "Product" or "products", although a few such
documents exist.

So I have added * as prefix and suffix to search term without double quotes
to do wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now, stemming is not working as above second query is not returning similar
results as query 1.

If double quotes are added around search term then its returning similar
results but results are not as expected. With double quotes it wont return
results like "Old products", "New products", "Cool Product".
It will only return results with the values like "Product 1", "Product
2","Products of USA".

Please suggest or guide how to make stemming work with wildcard search.


Appreciate immediate response!!

Thanks,
G. Naresh Kumar





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Ahmet Arslan

Hi,

Forget about patternReplaceCharFilter for a moment. Your example is more clear
this time.

q=titleName:1999/99*

should return following two docs:

d1) JULIUS CAESER (1999/99)

d2) ARABIAN NIGHTS - 1999/99

This is achievable with the following type.

1) MappingCharFilterFactory with mappings.txt

"(" => ""
")" => ""

2) WhitespaceTokenizerFactory
3) LowerCaseFilterFactory
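
The three steps above can be sketched as a toy analysis chain. This is plain Python standing in for the char filter, tokenizer, and filter; it is not Solr code, and the function names are hypothetical. The two titles are the ones from this message.

```python
PAREN_MAP = str.maketrans("", "", "()")  # mirrors the two mapping rules above

def analyze(text):
    """Drop parentheses, split on whitespace, then lowercase each token."""
    stripped = text.translate(PAREN_MAP)
    return [token.lower() for token in stripped.split()]

def prefix_search(prefix, titles):
    """Return the titles that have at least one token starting with the prefix."""
    p = prefix.lower()
    return [title for title in titles
            if any(token.startswith(p) for token in analyze(title))]

titles = ["JULIUS CAESER (1999/99)", "ARABIAN NIGHTS - 1999/99"]
print(analyze(titles[0]))
print(prefix_search("1999/99", titles))
```

Because the slash is not a whitespace character, "1999/99" survives as one token in both titles once the parentheses are mapped away.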

I don't understand your sentence: "i will never be able to specifically search
the title i want as 1999/99."
But please try and test the above. I also suggest you use the prefix query parser.

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-PrefixQueryParser

Ahmet

On Wednesday, March 5, 2014 11:20 PM, Kashish wrote:
Hi Ahmet,

Let me explain with another scenario .

There is a title -> ARABIAN NIGHTS - 1999/99

Now in autocomplete, if I give 1999/99, in the backend I append an asterisk
to it and form the Solr URL this way

q=titleName:1999/99*

I get the above mentioned title.- so works perfect

Now lets add another title to this.

-> JULIUS CAESER (1999/99)

If i pass the same query parameter, i would definitely expect both these
titles to come up. but this new one doesn't come(Because of the braces).

I can add patternReplaceFilter but this way i will never be able to
specifically search the title i want as 1999/99.

Hope you get what i am trying to achieve. Is my understanding wrong
somewhere?

Thanks.

--
View this message in context:
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121512.html

Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Kashish

Hi Ahmet,

Let me explain with another scenario .

There is a title ->  ARABIAN NIGHTS - 1999/99

Now in autocomplete, if I give 1999/99, in the backend I append an asterisk
to it and form the Solr URL this way

q=titleName:1999/99*

I get the above mentioned title.- so works perfect

Now lets add another title to this.

-> JULIUS CAESER (1999/99)

If i pass the same query parameter, i would definitely expect both these
titles to come up. but this new one doesn't come(Because of the braces).

I can add patternReplaceFilter but this way i will never be able to
specifically search the title i want as 1999/99.

Hope you get what i am trying to achieve. Is my understanding wrong
somewhere?

Thanks.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121512.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Ahmet Arslan

Hi Kashish,

This is confusing. You gave the following example :

query 1999/99* should return ARABIAN NIGHTS #01 (1999/99)

However you said "I cannot ignore parenthesis or other special characters..."

These two statements contradict each other.

Since you are after autocomplete you might be interested in this 
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet


On Wednesday, March 5, 2014 8:36 PM, Kashish  wrote:
Hi, Pls help me with this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121457.html

Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Kashish

Hi, Pls help me with this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121457.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-04 Thread Kashish

Hi Erick,

I understand what you are pointing out, but this is for an
autocomplete feature. I cannot ignore parentheses or other special
characters: with a title like 'A Team of five', if the user gives 'a
team', then titles containing a-team and the rest also come up, and this one
gets lost because we show only the top 6 results (the user can drill down to
get closer to the result he wants).

I modified my field type to add the word delimiter filter at index time and a
pattern filter at query time, but I still get no records for 1999/99* with the
asterisk, while I do get them without it. This is not what I want because, by
default, we append an asterisk to whatever the user enters for autocomplete
search.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121205.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-02-25 Thread Erick Erickson

The admin/analysis page is your friend. Taking some time to
get acquainted with that page will save you lots and lots and
lots of time. In this case, you'd have seen that your input
is actually tokenized as (1999/99), parentheses and all as a
_single_ token, so of course searching for 1999/99 wouldn't work.

Searching for *1999/99* is generally a bad idea. It'll work, but it's
a kludge.

What you _do_ need to do is define your use-cases. Let's
assume that you _never_ want parentheses to be relevant. You
could use PatternReplaceCharFilterFactory or PatternReplaceFilterFactory
in both index and query parts of your analysis chain to remove
parens. Or really any kinds of extraneous characters you decided
were unimportant.

But you need to decide what's "important" and enforce that.
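
To make the "same rule on both sides" point concrete, here is a minimal sketch in Python rather than Solr configuration. The regex, the helper names, and the whitespace tokenization are assumptions for illustration only; in Solr the stripping would be done by PatternReplaceCharFilterFactory or PatternReplaceFilterFactory in the index and query analyzers.

```python
import re

# Assumption for illustration: parentheses are the only "unimportant" characters.
UNIMPORTANT = re.compile(r"[()]")

def analyze(text):
    """Strip unimportant characters, lowercase, and split on whitespace."""
    return UNIMPORTANT.sub("", text).lower().split()

def prefix_match(title, user_input):
    """True if any indexed token of the title starts with the normalized input."""
    query_tokens = analyze(user_input)
    if len(query_tokens) != 1:
        return False  # a prefix (wildcard) query targets a single term
    return any(tok.startswith(query_tokens[0]) for tok in analyze(title))
```

Because analyze() is used for both the title and the user's input, typing 1999/99 or (1999/99 leads to the same indexed token.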

Best,
Erick

On Tue, Feb 25, 2014 at 7:28 PM, Kashish  wrote:

> Hi Ahmet/Erick,
>
> I tried escaping as well. See no luck.
>
> The title am looking for is  - ARABIAN NIGHTS #01 (1999/99)
>
> I figured out that if i pass the query as *1999/99* (i.e asterisk not only
> at the end but at the beginning as well), It works.
>
> The problem is the braces. I can change my field type and add
>
>   generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
>
> But this will show too many results in autocomplete.
>
> Is there any best way to handle this? Or should i pass asterisk before and
> after the query?
>
> Thanks.
>
>
>
>
>

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-02-25 Thread Kashish

Hi Ahmet/Erick,

I tried escaping as well, but no luck.

The title I am looking for is - ARABIAN NIGHTS #01 (1999/99)

I figured out that if I pass the query as *1999/99* (i.e. with an asterisk not
only at the end but at the beginning as well), it works.

The problem is the parentheses. I can change my field type and add 

 

But this will show too many results in autocomplete.

Is there a better way to handle this, or should I add an asterisk both before
and after the query?

Thanks.





Re: Wildcard search not working if the query contains numbers along with special characters.

2014-02-25 Thread Ahmet Arslan

Hi,

By escaping I mean this: q=title_autocomplete:1999\/99*
It is different from URL encoding.

http://lucene.apache.org/core/4_6_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters
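
A small sketch of the two separate steps, in Python. The helper names are hypothetical and the character set is my reading of the special characters listed in the page linked above, so treat it as an assumption rather than the parser's authoritative list. Query-parser escaping is applied first; URL encoding is applied afterwards, to the whole parameter value.

```python
from urllib.parse import quote

# Characters the Lucene classic query parser treats as syntax (assumed list).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')

def escape_term(text):
    """Backslash-escape query-parser syntax characters in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)

def build_prefix_query(field, user_input):
    """Escape the user's text, then append the wildcard unescaped."""
    return f"{field}:{escape_term(user_input)}*"

# Step 1: query-parser escaping gives title_autocomplete:1999\/99*
q = build_prefix_query("title_autocomplete", "1999/99")

# Step 2: URL encoding of the finished query string for transport over HTTP.
url_param = "q=" + quote(q, safe="")
```

Passing 1999%2f99 only performs step 2: after the server decodes the URL, the query parser still sees an unescaped '/'.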

If the prefix query parser didn't return what you want, then it must be 
something to do with the indexed terms.

Can you give an example of the raw document text that you expect to retrieve 
with this query?



On Tuesday, February 25, 2014 10:15 PM, Kashish  
wrote:
Hi Ahmet,

Thanks for your reply.

Yes. I pass my query this way - > q=title_autocomplete:1999%2f99

I tried your way too. But no luck. :( 





Re: Wildcard search not working if the query contains numbers along with special characters.

2014-02-25 Thread Erick Erickson

What does it say happens on your admin/analysis page
for that field?

And did you by any chance change your schema without
reindexing everything?

Also, try the TermsComponent to see what tokens are actually
_in_ your index. Schema-browser from the admin page can
help here too.

Best,
Erick


On Tue, Feb 25, 2014 at 12:05 PM, Ahmet Arslan  wrote:

> Hi Kashish,
>
>
> What happens when you use this q={!prefix f=title_autocomplete}1999/99
>
> I suspect '/' character is a special query parser character therefore it
> needs to be escaped.
>
> Ahmet
>
>
> On Tuesday, February 25, 2014 9:55 PM, Kashish 
> wrote:
> Hi,
>
> I have a very weird problem. The wild card search works fine for all
> scenarios but one. It doesn't seem to give any result for query 1999/99*. I
> checked the debug query and its formed perfect.
>
> title_autocomplete:1999/99*
> title_autocomplete:1999/99*
> (+title_autocomplete:1999/99* ())/no_coord
> +title_autocomplete:1999/99* ()
>
> This is my fieldType
>
>  positionIncrementGap="100">
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
>
> 
>   
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
>
>
> 
>   
> 
>
> Please help we with this.
>
> Thanks.
>
>
>
>
>

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-02-25 Thread Kashish

Hi Ahmet,

Thanks for your reply.

Yes. I pass my query this way - > q=title_autocomplete:1999%2f99

I tried your way too. But no luck. :( 





Re: Wildcard search not working if the query contains numbers along with special characters.

2014-02-25 Thread Ahmet Arslan

Hi Kashish,


What happens when you use this q={!prefix f=title_autocomplete}1999/99

I suspect the '/' character is a special query parser character and therefore 
needs to be escaped.

Ahmet


On Tuesday, February 25, 2014 9:55 PM, Kashish  
wrote:
Hi,

I have a very weird problem. The wild card search works fine for all
scenarios but one. It doesn't seem to give any result for query 1999/99*. I
checked the debug query and its formed perfect.

title_autocomplete:1999/99*
title_autocomplete:1999/99*
(+title_autocomplete:1999/99* ())/no_coord
+title_autocomplete:1999/99* ()

This is my fieldType


      
        
        
        
        
      
      
        
        

      
        
      
    

Please help me with this.

Thanks.




Wildcard search not working if the query contains numbers along with special characters.

2014-02-25 Thread Kashish

Hi,

I have a very weird problem. The wildcard search works fine for all
scenarios but one. It doesn't seem to give any result for the query 1999/99*.
I checked the debug query and it is formed correctly.

title_autocomplete:1999/99*
title_autocomplete:1999/99*
(+title_autocomplete:1999/99* ())/no_coord
+title_autocomplete:1999/99* ()

This is my fieldType

 
  




  
  



   

  


Please help me with this.

Thanks.




Re: Solr wildcard search

2013-09-14 Thread Erick Erickson

Also be aware that some analysis steps may not
be performed on wildcards. The filter has to be
MultiTermAware. See:

https://wiki.apache.org/solr/MultitermQueryAnalysis
and
http://searchhub.org/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

Best,
Erick

On Fri, Sep 13, 2013 at 12:12 PM, Jack Krupansky wrote:

> Wildcard applies only to a single term. The escaped space suggests that
> you are trying to match a wildcard on multiple terms.
>
> Try the contrib complex phrase query parser.
>
> -- Jack Krupansky
>
> -Original Message- From: Prasi S
> Sent: Friday, September 13, 2013 6:37 AM
> To: solr-user@lucene.apache.org
> Subject: Solr wildcard search
>
>
> Hi all,
> I am working with wildcard queries and few things are confusing.
>
> 1. Does a wildcard search omit the analysers on a particular field?
>
> 2. I have searched for
> q=google\ technology - >gives result
> q=google technology -> Gives results
> q=google tech*   -> gives results
> q=google\ tech* -> 0 results. The debug Query for the last query is text:google tech*
>
> Why does this happen.
>
>
> Thanks,
> Prasi
>

Solr wildcard search

2013-09-13 Thread Prasi S

Hi all,
I am working with wildcard queries and a few things are confusing.

1. Does a wildcard search skip the analysers on a particular field?

2. I have searched for
q=google\ technology -> gives results
q=google technology -> gives results
q=google tech*   -> gives results
q=google\ tech* -> 0 results. The debug query for the last query is text:google tech*

Why does this happen?


Thanks,
Prasi

Re: Solr wildcard search

2013-09-13 Thread Jack Krupansky

Wildcard applies only to a single term. The escaped space suggests that you 
are trying to match a wildcard on multiple terms.
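
A toy illustration in Python of why the escaped space returns nothing. It assumes a whitespace tokenizer with lowercasing as a stand-in for the real analysis chain; the function names are made up for the example.

```python
# Assumption: a whitespace tokenizer plus lowercasing stands in for the real analyzer.
def index_terms(text):
    """Return the set of single terms that end up in the index."""
    return set(text.lower().split())

def prefix_query(terms, prefix):
    """A trailing-wildcard query is compared against individual indexed terms."""
    return sorted(t for t in terms if t.startswith(prefix))

terms = index_terms("Google Technology")

# q=tech*          -> the prefix "tech" matches the term "technology"
tech_hits = prefix_query(terms, "tech")

# q=google\ tech*  -> the prefix is the single string "google tech",
#                     and no indexed term contains a space
escaped_hits = prefix_query(terms, "google tech")
```

With the escaped space the parser builds one wildcard term, which is what the debug output text:google tech* is showing.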


Try the contrib complex phrase query parser.

-- Jack Krupansky

-Original Message- 
From: Prasi S

Sent: Friday, September 13, 2013 6:37 AM
To: solr-user@lucene.apache.org
Subject: Solr wildcard search

Hi all,
I am working with wildcard queries and few things are confusing.

1. Does a wildcard search omit the analysers on a particular field?

2. I have searched for
q=google\ technology - >gives result
q=google technology -> Gives results
q=google tech*   -> gives results
q=google\ tech* -> 0 results. The debug Query for the last query is text:google tech*

Why does this happen.


Thanks,
Prasi

Re: Synonyms with wildcard search

2013-07-30 Thread Jack Krupansky

Sorry, but Solr synonym processing does not know about wildcards, so it is 
bypassed when a wildcard is present.


Technically, it could probably be enhanced to support them, at least for 
some common special cases such as yours, but that prospect won't help you 
right now.


Your best bet is to preprocess your queries in your application layer and 
perform the mapping there.
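
A minimal sketch of such application-layer preprocessing, in Python. The mapping table, the function names, and the choice to handle only trailing wildcards are assumptions for illustration; nothing here is a Solr API.

```python
# Assumption: a one-way mapping mirroring the "colour => color" synonym entry.
SYNONYMS = {"colour": "color"}

def rewrite_wildcard_term(term):
    """Map the literal part of a trailing-wildcard term through the synonym table."""
    if not term.endswith("*"):
        return term  # non-wildcard terms are left for Solr's own synonym filter
    stem = term[:-1]
    return SYNONYMS.get(stem.lower(), stem) + "*"

def rewrite_query(query):
    """Rewrite each whitespace-separated term before sending the query to Solr."""
    return " ".join(rewrite_wildcard_term(t) for t in query.split())
```

With this in front of Solr, a user query of colour* is sent as color*, which is the form that already returns the expected documents.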


-- Jack Krupansky

-Original Message- 
From: Sandeep Gupta

Sent: Tuesday, July 30, 2013 5:22 AM
To: solr-user@lucene.apache.org
Subject: Synonyms with wildcard search

Hello All,

I want to know whether it is possible to make a query of word which has
synonym+wildcard.

For example :  I have one field which is type of text_en (default fieldType
in 4.3.1)
And synonym.txt file has this entry
colour => color

Now when I am using full text search as colour* (with wild card) then
search result is not returning the keyword of type colorology... (as in
case If I use color* then I am getting this word)

So any suggestions as how I can achieve this Or its not possible.

Thanks
Sandeep

Synonyms with wildcard search

2013-07-30 Thread Sandeep Gupta

Hello All,

I want to know whether it is possible to query a word that has a synonym
together with a wildcard.

For example: I have one field of type text_en (the default fieldType
in 4.3.1), and the synonym.txt file has this entry:
colour => color

Now when I run a full-text search for colour* (with the wildcard), the
search result does not return words such as colorology... (whereas
if I use color* I do get this word).

Any suggestions on how I can achieve this, or is it not possible?

Thanks
Sandeep

Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-13 Thread Dmitry Kan

Just a quick comment from our experience: since we have quite a lot of data
indexed in our Solr, we take some extra measures to ensure that no bogus
wildcard queries are accepted by the system (for instance *, **, *** etc.).
That is done in the QueryParser. I wanted to mention this approach as one
way of handling simple query security checks.
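
For illustration only, a rough Python sketch of this kind of check done in front of Solr rather than inside the QueryParser. The rule used here (reject any term made up solely of wildcard characters) and the names are assumptions, not a description of an actual implementation.

```python
import re

# Assumption: a term consisting only of '*' and '?' counts as "bogus".
WILDCARD_ONLY = re.compile(r"^[*?]+$")

def is_acceptable(query):
    """Reject empty queries and queries containing a wildcard-only term."""
    terms = query.split()
    return bool(terms) and not any(WILDCARD_ONLY.match(t) for t in terms)
```

Note that the match-all query *:* passes this check because of the colon; whether that should be allowed is a separate policy decision.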

-- Dmitry

On Tue, Nov 13, 2012 at 6:22 AM, Jack Krupansky wrote:

> Be sure to realize that even with reverse wildcard support, the user can
> add a trailing wildcard as well (double-ended wildcard) and then you are
> back in the same boat.
>
> The overall idea is that: 1) Hardware is much faster than just 3 or 4
> years ago, and 2) even though document counts are getting much larger, the
> number of unique terms (which is all that matters for wildcard performance)
> does not tend to grow as fast as document count grows. And, some fields
> have a much more limited vocabulary (unique terms), so a leading wildcard
> is not necessarily a big performance hit.
>
> Technology advances. We should permit our mindsets to advance as well.
>
> -- Jack Krupansky
>
>
> -Original Message- From: François Schiettecatte
> Sent: Monday, November 12, 2012 2:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?
>
>
> John
>
> You can still use leading wildcards even if you dont have the
> ReversedWildcardFilterFactory in your analysis but it means you will be
> scanning the entire dictionary when the search is run which can be a
> performance issue. If you do use ReversedWildcardFilterFactory you wont
> have that performance issue but you will increase the overall size of your
> index. Its a tradeoff.
>
> When I looked into it for a site I built I decided that the tradeoff was
> not worth it (after benchmarking) given how few leading wildcards searches
> it was getting.
>
> Best regards
>
> François
>
>
> On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:
>
>
>>
>> Hi,
>>
>>
>> I'm migrating from Solr 1.2 to 3.6.1.  I used the same analyzer as I was,
>> and re-indexed my data.  I did not add
>> solr.ReversedWildcardFilterFactory to my index analyzer, but yet
>> leading wild cards are working!!  Does this mean it's turned on by default?
>>  If so, how do I turn it off, and what are the implication of leaving ON?
>> Won't my searches be slower and consume more memory?
>>
>>
>> Thanks,
>>
>>
>> --MJ
>>
>>


-- 
Regards,

Dmitry Kan

Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread Jack Krupansky

Be sure to realize that even with reverse wildcard support, the user can add 
a trailing wildcard as well (double-ended wildcard) and then you are back in 
the same boat.


The overall idea is that: 1) Hardware is much faster than just 3 or 4 years 
ago, and 2) even though document counts are getting much larger, the number 
of unique terms (which is all that matters for wildcard performance) does 
not tend to grow as fast as document count grows. And, some fields have a 
much more limited vocabulary (unique terms), so a leading wildcard is not 
necessarily a big performance hit.


Technology advances. We should permit our mindsets to advance as well.

-- Jack Krupansky

-Original Message- 
From: François Schiettecatte

Sent: Monday, November 12, 2012 2:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?

John

You can still use leading wildcards even if you don't have the 
ReversedWildcardFilterFactory in your analysis, but it means you will be 
scanning the entire dictionary when the search is run, which can be a 
performance issue. If you do use ReversedWildcardFilterFactory you won't have 
that performance issue, but you will increase the overall size of your index. 
It's a tradeoff.
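
A simplified Python model of that tradeoff. It assumes the term dictionary is just a sorted list and ignores how Lucene actually stores and walks terms, so it only shows the shape of the argument: a leading wildcard has to look at every term, while indexing reversed terms as well turns it into a cheap prefix lookup at the cost of a second set of terms.

```python
from bisect import bisect_left

terms = sorted(["apple", "maple", "people", "sample", "table"])
# The extra index data: every term stored again in reversed form.
reversed_terms = sorted(t[::-1] for t in terms)

def prefix_lookup(sorted_terms, prefix):
    """Binary-search to the first candidate, then read only the matching run."""
    out = []
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out

def leading_wildcard_scan(sorted_terms, suffix):
    """Without reversed terms: every term in the dictionary is examined."""
    return [t for t in sorted_terms if t.endswith(suffix)]

def leading_wildcard_reversed(sorted_reversed, suffix):
    """With reversed terms: *ple becomes the prefix lookup elp* on reversed terms."""
    return sorted(t[::-1] for t in prefix_lookup(sorted_reversed, suffix[::-1]))
```

Both functions return the same terms for *ple; the difference is how much of the dictionary each one has to touch.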


When I looked into it for a site I built, I decided (after benchmarking) that 
the tradeoff was not worth it, given how few leading wildcard searches it 
was getting.


Best regards

François


On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:




Hi,


I'm migrating from Solr 1.2 to 3.6.1.  I used the same analyzer as before 
and re-indexed my data.  I did not add
solr.ReversedWildcardFilterFactory to my index analyzer, and yet leading 
wildcards are working!!  Does this mean it's turned on by default?  If 
so, how do I turn it off, and what are the implications of leaving it ON? 
Won't my searches be slower and consume more memory?



Thanks,


--MJ

Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread Yonik Seeley

On Tue, Nov 13, 2012 at 2:27 AM,   wrote:
> I'm surprised that this has not been logged as a defect.  The fact that this 
> is ON by default, means someone can bring down a server; this is bad enough to 
> categorize this as a security issue.

It's all relative.  There are tons of queries that can take a long
time and disabling them all by default would just be frustrating for
users (range queries, prefix queries, regex queries, etc).  If a
single wildcard query like *a is bad, then a non-leading wildcard query like
a*a a*a a*a a*a a*a a*a a*a a*a will probably be just as bad (or [a TO z],
or [* TO *], etc.).  It's no real protection from a security
perspective.

Individual control of different query types in edismax would probably
be nice though (and perhaps a minimum wildcard prefix length rather
than just an on/off switch).
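
To sketch what a minimum wildcard prefix length could look like, here is a hypothetical check in Python. It is not an existing edismax option; the function names and the default of 2 are assumptions made for the example.

```python
def wildcard_prefix_length(term):
    """Number of literal characters before the first '*' or '?'."""
    for i, ch in enumerate(term):
        if ch in "*?":
            return i
    return len(term)

def allowed(term, min_prefix=2):
    """Hypothetical policy: wildcard terms need at least min_prefix literal chars."""
    has_wildcard = any(ch in "*?" for ch in term)
    return not has_wildcard or wildcard_prefix_length(term) >= min_prefix
```

Under this policy *a and a*a are both rejected with min_prefix=2, while ab* and plain terms pass, which is a finer-grained control than switching leading wildcards on or off.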

-Yonik
http://lucidworks.com

RE: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread johnmunir


I'm surprised that this has not been logged as a defect.  The fact that this is 
ON by default, means someone can bring down a server; this is bad enough to 
categorize this as a security issue.
 
--MJ
 
-Original Message-
From: Michael Ryan [mailto:mr...@moreover.com] 
Sent: Monday, November 12, 2012 8:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Is leading wildcard search turned on by default in Solr 3.6.1?
 
Yeah, the situation is kind of a pain right now. In 
https://issues.apache.org/jira/browse/SOLR-2438, it was enabled by default and 
there is no way to disable without patching SolrQueryParser. There's also the 
edismax parser which doesn't have a setting for this, which I've made a jira for 
at https://issues.apache.org/jira/browse/SOLR-3031.

I'm surprised other people haven't requested this, as any instance of serious 
size can be brought to its knees by a wildcard query.
 
-Michael
 
-Original Message-
From: johnmu...@aol.com [mailto:johnmu...@aol.com] 
Sent: Monday, November 12, 2012 7:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Is leading wildcard search turned on by default in Solr 3.6.1?
 
 
At one point, in some version of Solr, it was OFF by default, and you had to 
enable it via a setting (either in solrconfig.xml or schema.xml, I don't 
remember).  It looks like this is no longer the case.  Even worse, and if this 
is true, disabling it no longer seems to be possible to disable it via a Solr 
setting!!
 
 
-- MJ

RE: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread Michael Ryan

Yeah, the situation is kind of a pain right now. In 
https://issues.apache.org/jira/browse/SOLR-2438, it was enabled by default and 
there is no way to disable without patching SolrQueryParser. There's also the 
edismax parser which doesn't have a setting for this, which I've made a jira 
for at https://issues.apache.org/jira/browse/SOLR-3031.

I'm surprised other people haven't requested this, as any instance of serious 
size can be brought to its knees by a wildcard query.

-Michael

-Original Message-
From: johnmu...@aol.com [mailto:johnmu...@aol.com] 
Sent: Monday, November 12, 2012 7:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Is leading wildcard search turned on by default in Solr 3.6.1?


At one point, in some version of Solr, it was OFF by default, and you had to 
enable it via a setting (either in solrconfig.xml or schema.xml, I don't 
remember).  It looks like this is no longer the case.  Even worse, and if this 
is true, disabling it no longer seems to be possible to disable it via a Solr 
setting!!


-- MJ

RE: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread johnmunir


At one point, in some version of Solr, it was OFF by default, and you had to 
enable it via a setting (either in solrconfig.xml or schema.xml, I don't 
remember).  It looks like this is no longer the case.  Even worse, if this 
is true, it no longer seems possible to disable it via a Solr setting!!


-- MJ


-Original Message-
From: François Schiettecatte [mailto:fschietteca...@gmail.com] 
Sent: Monday, November 12, 2012 7:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?


I suspect it is just part of the wildcard handling, maybe someone can chime in 
here, you may need to catch this before it gets to SOLR.


François


On Nov 12, 2012, at 5:44 PM, johnmu...@aol.com wrote:


> Thanks for the quick response.
> 
> 
> So, I do not want to use ReversedWildcardFilterFactory, but leading wildcard 
> is working and thus is ON by default.  How do I disable it to prevent the use 
> of it and the issues that come with it?
> 
> 
> -- MJ
> 
> 
> 
> -Original Message-
> From: François Schiettecatte 
> To: solr-user 
> Sent: Mon, Nov 12, 2012 5:39 pm
> Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?
> 
> 
> John
> 
> You can still use leading wildcards even if you dont have the 
> ReversedWildcardFilterFactory in your analysis but it means you will 
> be scanning the entire dictionary when the search is run which can be a 
> performance issue.
> If you do use ReversedWildcardFilterFactory you wont have that 
> performance issue but you will increase the overall size of your index. Its a 
> tradeoff.
> 
> When I looked into it for a site I built I decided that the tradeoff 
> was not worth it (after benchmarking) given how few leading wildcards 
> searches it was getting.
> 
> Best regards
> 
> François
> 
> 
> On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:
> 
>> 
>> 
>> Hi,
>> 
>> 
>> I'm migrating from Solr 1.2 to 3.6.1.  I used the same analyzer as I was,
>> and re-indexed my data.  I did not add solr.ReversedWildcardFilterFactory
>> to my index analyzer, but yet leading wild cards are working!!  Does this
>> mean it's turned on by default?  If so, how do I turn it off, and what are
>> the implication of leaving ON?  Won't my searches be slower and consume
>> more memory?
>> 
>> 
>> Thanks,
>> 
>> 
>> --MJ
>> 
> 
> 
> 
>

Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread François Schiettecatte

I suspect it is just part of the wildcard handling; maybe someone can chime in 
here. You may need to catch this before it gets to Solr.

François

On Nov 12, 2012, at 5:44 PM, johnmu...@aol.com wrote:

> Thanks for the quick response.
> 
> 
> So, I do not want to use ReversedWildcardFilterFactory, but leading wildcard 
> is working and thus is ON by default.  How do I disable it to prevent the use 
> of it and the issues that come with it?
> 
> 
> -- MJ
> 
> 
> 
> -Original Message-
> From: François Schiettecatte 
> To: solr-user 
> Sent: Mon, Nov 12, 2012 5:39 pm
> Subject: Re: Is leading wildcard search turned on by default in Solr 3.6.1?
> 
> 
> John
> 
> You can still use leading wildcards even if you dont have the 
> ReversedWildcardFilterFactory in your analysis but it means you will be 
> scanning the entire dictionary when the search is run which can be a 
> performance issue. If you do use ReversedWildcardFilterFactory you wont have 
> that performance issue but you will increase the overall size of your index. 
> Its a tradeoff. 
> 
> When I looked into it for a site I built I decided that the tradeoff was not 
> worth it (after benchmarking) given how few leading wildcards searches it was 
> getting.
> 
> Best regards
> 
> François
> 
> 
> On Nov 12, 2012, at 5:33 PM, johnmu...@aol.com wrote:
> 
>> 
>> 
>> Hi,
>> 
>> 
>> I'm migrating from Solr 1.2 to 3.6.1.  I used the same analyzer as I was,
>> and re-indexed my data.  I did not add solr.ReversedWildcardFilterFactory
>> to my index analyzer, but yet leading wild cards are working!!  Does this
>> mean it's turned on by default?  If so, how do I turn it off, and what are
>> the implication of leaving ON?  Won't my searches be slower and consume
>> more memory?
>> 
>> 
>> Thanks,
>> 
>> 
>> --MJ
>> 
> 
> 
> 
>
