Re: Synonym expansions w/ phrase slop exhausting memory after upgrading to SOLR 7

2019-12-18 Thread Nick D
Michael,

Thank you so much, that was extremely helpful. My google-fu wasn't good
enough, I guess.

1. Was my initial fix, just to stop it from exploding.

2. Will be the permanent solution for now, until we can get some things
squared away for 8.0.

Sounds like even in 8 the potential for graph query expansion still grows
rather large, but it just won't consume all available memory; is that
correct?

One final question: why wouldn't the maxBooleanClauses value in the
solrconfig still apply? Reading through all the Jiras I thought that was
supposed to still be a fail-safe; did I miss something?

Thanks again for your help,

Nick

On Wed, Dec 18, 2019, 8:10 AM Michael Gibney 
wrote:

> This is related to this issue:
> https://issues.apache.org/jira/browse/SOLR-13336
>
> Also tangentially relevant:
> https://issues.apache.org/jira/browse/LUCENE-8531
> https://issues.apache.org/jira/browse/SOLR-12243
>
> I think your options include:
> 1. setting slop=0, which restores SpanNearQuery as the graph phrase
> query implementation (see LUCENE-8531)
> 2. downgrading to 7.5, which would avoid the OOM but would cause graph
> phrase queries to be effectively ignored (see SOLR-12243)
> 3. upgrading to 8.0, which will restore the failsafe maxBooleanClauses,
> avoiding the OOM but returning an error code for affected queries (which
> in your case sounds like most queries?) (see SOLR-13336)
>
> Michael
>
> On Tue, Dec 17, 2019 at 4:16 PM Nick D  wrote:
> >
> > Hello All,
> >
> > We recently upgraded from Solr 6.6 to Solr 7.7.2 and recently had spikes
> in
> > memory that eventually caused either an OOM or almost 100% utilization of
> > the available memory. After trying a few things, increasing the JVM heap,
> > making sure docValues were set for all Sort, facet fields (thought maybe
> > the fieldCache was blowing up), I was able to isolate a single query that
> > would cause the used memory to become fully exhausted and effectively
> > render the instance dead. After applying a timeAllowed value to the query
> > and reducing the query phrase (the system would crash without logging the
> > warning on longer queries containing synonyms), I was able to identify the
> > following warning in the logs:
> >
> > o.a.s.s.SolrIndexSearcher Query: <very long synonym expansion>
> >
> > the request took too long to iterate over terms. Timeout: timeoutAt:
> > 812182664173653 (System.nanoTime(): 812182715745553),
> > TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@7a0db441
> >
> > I have narrowed the problem down to the way synonyms are being expanded
> > along with phrase slop.
> >
> > With ps=5 I get 4096 possible permutations of the searched phrase because
> > of synonyms, looking similar to:
> > ngs_title:"bereavement leave type build bereavement leave type data p"~5
> >  ngs_title:"bereavement leave type build bereavement bereavement type
> data
> > p"~5
> >  ngs_title:"bereavement leave type build bereavement jury duty type data
> > p"~5
> >  ngs_title:"bereavement leave type build bereavement maternity leave type
> > data p"~5
> >  ngs_title:"bereavement leave type build bereavement paternity type data
> > p"~5
> >  ngs_title:"bereavement leave type build bereavement paternity leave type
> > data p"~5
> >  ngs_title:"bereavement leave type build bereavement adoption leave type
> > data p"~5
> >  ngs_title:"bereavement leave type build jury duty maternity leave type
> > data p"~5
> >  ngs_title:"bereavement leave type build jury duty paternity type data
> p"~5
> >  ngs_title:"bereavement leave type build jury duty paternity leave type
> > data p"~5
> >  ngs_title:"bereavement leave type build jury duty adoption leave type
> data
> > p"~5
> >  ngs_title:"bereavement leave type build jury duty absence type data p"~5
> >  ngs_title:"bereavement leave type build maternity leave leave type data
> > p"~5
> >  ngs_title:"bereavement leave type build maternity leave bereavement type
> > data p"~5
> >  ngs_title:"bereavement leave type build maternity leave jury duty type
> > data p"~5
> >
> > 
> >
> > Previously in Solr 6 that same query, with the same synonyms (and query
> > analysis chain) would produce a parsedQuery like when using a &ps=5:
> > DisjunctionMaxQuery(((ngs_field_description:\"leave leave type build
> leave
> > leave type data ? p leave leave type type.enabled\"~5

Synonym expansions w/ phrase slop exhausting memory after upgrading to SOLR 7

2019-12-17 Thread Nick D
Hello All,

We recently upgraded from Solr 6.6 to Solr 7.7.2 and have since had spikes in
memory that eventually caused either an OOM or almost 100% utilization of
the available memory. After trying a few things, increasing the JVM heap,
making sure docValues were set for all Sort, facet fields (thought maybe
the fieldCache was blowing up), I was able to isolate a single query that
would cause the used memory to become fully exhausted and effectively
render the instance dead. After applying a timeAllowed value to the query
and reducing the query phrase (the system would crash without logging the
warning on longer queries containing synonyms), I was able to identify the
following warning in the logs:

o.a.s.s.SolrIndexSearcher Query: <very long synonym expansion>

the request took too long to iterate over terms. Timeout: timeoutAt:
812182664173653 (System.nanoTime(): 812182715745553),
TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@7a0db441

I have narrowed the problem down to the way synonyms are being expanded
along with phrase slop.

With ps=5 I get 4096 possible permutations of the searched phrase because
of synonyms, looking similar to:
ngs_title:"bereavement leave type build bereavement leave type data p"~5
 ngs_title:"bereavement leave type build bereavement bereavement type data
p"~5
 ngs_title:"bereavement leave type build bereavement jury duty type data
p"~5
 ngs_title:"bereavement leave type build bereavement maternity leave type
data p"~5
 ngs_title:"bereavement leave type build bereavement paternity type data
p"~5
 ngs_title:"bereavement leave type build bereavement paternity leave type
data p"~5
 ngs_title:"bereavement leave type build bereavement adoption leave type
data p"~5
 ngs_title:"bereavement leave type build jury duty maternity leave type
data p"~5
 ngs_title:"bereavement leave type build jury duty paternity type data p"~5
 ngs_title:"bereavement leave type build jury duty paternity leave type
data p"~5
 ngs_title:"bereavement leave type build jury duty adoption leave type data
p"~5
 ngs_title:"bereavement leave type build jury duty absence type data p"~5
 ngs_title:"bereavement leave type build maternity leave leave type data
p"~5
 ngs_title:"bereavement leave type build maternity leave bereavement type
data p"~5
 ngs_title:"bereavement leave type build maternity leave jury duty type
data p"~5
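To see where a count like 4096 comes from: with graph synonym expansion, a slopped phrase query enumerates every path through the token graph, so the number of generated phrases is the product of the number of alternatives at each position. A minimal sketch of that arithmetic (the alternative lists below are hypothetical, chosen only for illustration; the real synonyms.txt is not shown in this thread):

```python
from itertools import product
from math import prod

# Hypothetical token graph: each inner list is what one query position can
# expand to after synonym expansion (made-up values for illustration).
graph = [
    ["bereavement", "jury duty", "maternity leave", "absence"],
    ["leave", "time off", "pto", "holiday"],
    ["type"],
    ["build"],
]

# One slopped phrase query per path through the graph.
phrases = ['ngs_title:"%s"~5' % " ".join(path) for path in product(*graph)]
print(len(phrases))                     # 4 * 4 * 1 * 1 = 16
print(prod(len(pos) for pos in graph))  # same count, computed directly

# Six four-way positions instead of two gives 4**6 == 4096 phrase
# permutations, matching the expansion seen above.
```

The point is that the growth is multiplicative in the number of synonym-bearing positions, which is why one more multi-term synonym in the phrase can double or quadruple the query size.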



Previously, in Solr 6, the same query with the same synonyms (and query
analysis chain) would produce a parsedQuery like the following when using &ps=5:
DisjunctionMaxQuery(((ngs_field_description:\"leave leave type build leave
leave type data ? p leave leave type type.enabled\"~5)^3.0 |
(ngs_title:\"leave leave type build leave leave type data ? p leave leave
type type.enabled\"~5)^10.0)

The expansion wasn't being applied to the DisjunctionMaxQuery that gets
added when adjusting rankings with phrase slop.

In general the parsed queries between 6 and 7 are different, with some new
`spanNears` showing up, but they don't create the memory consumption issues
that I have seen when a large synonym expansion happens along with using
a ps parameter.

I didn't see much in the release notes about synonym changes (outside of
sow=false becoming the default as of version 7).

The field being operated on has the following query analysis chain:

 




  

Not sure if there is a change in phrase slop that now takes synonyms into
account, and whether there is a way to disable that kind of expansion. I am
not sure if it is related to SOLR-10980; it does seem related, but that
issue referenced Solr 6, which does not do the expansion.

Any help would be greatly appreciated.

Nick


Re: Min-should-match and Multi-word synonyms unexpected result

2018-02-06 Thread Nick D
Thanks Steve,

I'll test out that version.

Nick

On Feb 6, 2018 6:23 AM, "Steve Rowe"  wrote:

> Hi Nick,
>
> I think this was fixed by https://issues.apache.org/
> jira/browse/LUCENE-7878 in Solr 6.6.1.
>
> --
> Steve
> www.lucidworks.com
>
> > On Feb 5, 2018, at 3:58 PM, Nick D  wrote:
> >
> > I have run into an issue with multi-word synonyms and a min-should-match
> > (MM) of anything other than `0`, *Solr version 6.6.0*.
> >
> > Here is my example query, first with mm set to zero and the second with a
> > non-zero value:
> >
> > With MM set to 0
> > select?fl=*&indent=on&wt=json&debug=ALL&q=EIB&qf=ngs_title%
> 20ngs_field_description&sow=false&mm=0
> >
> > which parses to:
> >
> > parsedquery_toString":"+(((+ngs_field_description:enterprise
> > +ngs_field_description:interface +ngs_field_description:builder)
> > ngs_field_description:eib) | ((+ngs_title:enterprise
> > +ngs_title:interface +ngs_title:builder) ngs_title:eib))~0.01"
> >
> > and using my default MM (2<-35%)
> > select?fl=*&indent=on&wt=json&debug=ALL&q=EIB&qf=ngs_title%
> 20ngs_field_description&sow=false
> >
> > which parses to
> >
> > +ngs_field_description:enterprise +ngs_field_description:interface
> > +ngs_field_description:builder) ngs_field_description:eib)~2) |
> > (((+ngs_title:enterprise +ngs_title:interface +ngs_title:builder)
> > ngs_title:eib)~2))
> >
> > My synonym here is:
> > EIB, Enterprise Interface Builder
> >
> > For my two documents I have the field ngs_title with values "EIB" (Doc 1)
> > and "enterprise interface builder" (Doc 2)
> >
> > For both queries the doc 1 is always returned as EIB is matched, but for
> > doc 2 although I have EIB and Enterprise interface builder defined as
> > equivalent synonyms when the MM is not set to zero that document is not
> > returned. From the parsestring I see the ~2 being applied for the MM but
> my
> > expectation was that it has been met via the synonyms and the fact that I
> > am not actually searching a phrase.
> >
> > I couldn't find much on the relationship between the two outside of some
> > of the things Doug Turnbull had linked to another solr-user question and
> > this blog post that mentions weirdness around MM and multi-word:
> >
> > https://lucidworks.com/2017/04/18/multi-word-synonyms-
> solr-adds-query-time-support/
> >
> > http://opensourceconnections.com/blog/2013/10/27/why-is-
> multi-term-synonyms-so-hard-in-solr/
> >
> > Also looked through the comments here,
> > https://issues.apache.org/jira/browse/SOLR-9185, but at first glance
> didn't
> > see anything that jumped out at me.
> >
> > Here is the field definition for the ngs_* fields:
> >
> >  positionIncrementGap="100">
> >  
> > > mapping="mapping-ISOLatin1Accent.txt"/>
> > > pattern="([()])" replacement=""/>
> >
> > > pattern="(^[^0-9A-Za-z_]+)|([^0-9A-Za-z_]+$)" replacement=""/>
> > > words="stopwords.txt"/>
> >
> >
> > > maxGramSize="50"/>
> >  
> >  
> >
> > > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >
> >  
> >
> >
> > I am not sure if we can no longer use MM for these types of queries or if
> > there is something I setup incorrectly, any help would be greatly
> > appreciated.
> >
> > Nick
>
>


Min-should-match and Multi-word synonyms unexpected result

2018-02-05 Thread Nick D
I have run into an issue with multi-word synonyms and a min-should-match
(MM) of anything other than `0`, *Solr version 6.6.0*.

Here is my example query, first with mm set to zero and the second with a
non-zero value:

With MM set to 0
select?fl=*&indent=on&wt=json&debug=ALL&q=EIB&qf=ngs_title%20ngs_field_description&sow=false&mm=0

which parses to:

parsedquery_toString":"+(((+ngs_field_description:enterprise
+ngs_field_description:interface +ngs_field_description:builder)
ngs_field_description:eib) | ((+ngs_title:enterprise
+ngs_title:interface +ngs_title:builder) ngs_title:eib))~0.01"

and using my default MM (2<-35%)
select?fl=*&indent=on&wt=json&debug=ALL&q=EIB&qf=ngs_title%20ngs_field_description&sow=false

which parses to:

+ngs_field_description:enterprise +ngs_field_description:interface
+ngs_field_description:builder) ngs_field_description:eib)~2) |
(((+ngs_title:enterprise +ngs_title:interface +ngs_title:builder)
ngs_title:eib)~2))

My synonym here is:
EIB, Enterprise Interface Builder

For my two documents I have the field ngs_title with values "EIB" (Doc 1)
and "enterprise interface builder" (Doc 2)

For both queries doc 1 is always returned since EIB is matched, but for
doc 2, although I have EIB and Enterprise Interface Builder defined as
equivalent synonyms, when MM is not set to zero that document is not
returned. From the parse string I see the ~2 being applied for the MM, but my
expectation was that it had been met via the synonyms and the fact that I
am not actually searching a phrase.
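For reference, the ~2 is Solr applying the mm spec to the number of top-level optional clauses in each per-field boolean query. A minimal sketch of the arithmetic for a single conditional spec of the form N<-P% (the real parser accepts a richer grammar; this only handles the one case used here):

```python
def min_should_match(num_clauses: int, spec: str = "2<-35%") -> int:
    # "N<-P%": with N or fewer clauses, all are required; with more than N,
    # P% of the clauses (rounded down) are allowed to be missing.
    cond, pct = spec.split("<")
    if num_clauses <= int(cond):
        return num_clauses
    may_be_missing = num_clauses * abs(int(pct.rstrip("%"))) // 100
    return num_clauses - may_be_missing

# The parsed queries above have two top-level clauses per field (the
# required-synonym conjunction and the single "eib" term), hence the "~2":
print(min_should_match(2))   # -> 2: both clauses must match
print(min_should_match(10))  # -> 7: up to 35% (3 clauses) may be missing
```

With only two clauses, 2<-35% degenerates to "both required", which would explain why doc 2 (matching the expanded conjunction but not the literal eib term) gets filtered out.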

I couldn't find much on the relationship between the two outside of some
of the things Doug Turnbull had linked to in another solr-user question and
this blog post that mentions weirdness around MM and multi-word synonyms:

https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/

Also looked through the comments here,
https://issues.apache.org/jira/browse/SOLR-9185, but at first glance didn't
see anything that jumped out at me.

Here is the field definition for the ngs_* fields:


  








  
  



  


I am not sure if we can no longer use MM for these types of queries or if
there is something I set up incorrectly; any help would be greatly
appreciated.

Nick


Re: Facet data type

2016-05-27 Thread Nick D
Steven,

The case I was pointing to was specifically about the need for an int
field to be set to multiValued=true in order to be used as a facet.field. I
personally ran into it when upgrading from 4.10.2 to 5.x. I believe setting
docValues=true will not have an effect (untested by me, but there was
mention of that in the Jira). There are also some linked Jiras that talk
about other issues with facets in 5.x, but my guess is that if you aren't
upgrading from 4.x to 5.x you probably won't hit the issue, though there
are some things people are finding with docValues and performance in 4.x
upgrades.

I think there are some even more knowledgeable people on here who could
chime in with a more detailed explanation or correct me if I misspoke.

Nick

On Fri, May 27, 2016 at 12:11 PM, Steven White  wrote:

> Thanks Erick.
>
> What about Solr defect SOLR-7495 that Nick mentioned?  It sounds like
> because of this defect, I should NOT set docValues="true" on a field when:
> a) type="int" and b) multiValued="true".  Can you confirm that I got this
> right?  I'm on Solr 5.2.1
>
> Steve
>
>
> On Fri, May 27, 2016 at 1:30 PM, Erick Erickson 
> wrote:
>
> > bq: my index size grew by 20%.  Is this expected
> >
> > Yes. But don't worry about it ;). Basically, you've serialized
> > to disk the "uninverted" form of the field. But, that is
> > accessed through Lucene by MMapDirectory, see:
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > If you don't use DocValues, the uninverted version
> > is built in Java's memory, which is much more expensive
> > for a variety of reasons. What you lose in disk size you gain
> > in a lower JVM footprint, fewer GC problems etc.
> >
> > But the implication is, indeed, that you should use DocValues
> > for fields you intend to facet and/or sort etc. on. If you only search,
> > it's just wasted space.
> >
> > Best,
> > Erick
> >
> > On Fri, May 27, 2016 at 6:25 AM, Steven White 
> > wrote:
> > > Thank you Erick for pointing out about DocValues.  I re-indexed my data
> > > with it set to true and my index size grew by 20%.  Is this expected?
> > >
> > > Hi Nick, I'm not clear about SOLR-7495.  Are you saying I should not
> use
> > > docValues=true if type="int" and multiValued="true"?  I'm on Solr 5.2.1.
> > > Thanks.
> > >
> > > Steve
> > >
> > > On Thu, May 26, 2016 at 9:29 PM, Nick D  wrote:
> > >
> > >> Although you did mention that you won't need to sort and you are using
> > >> multiValued=true. On the off chance you do change something like
> > >> multiValued=false docValues=false, then this will come into play:
> > >>
> > >> https://issues.apache.org/jira/browse/SOLR-7495
> > >>
> > >> This has been a rather large pain to deal with in terms of faceting.
> > (the
> > >> Lucene change that caused a number of Issues is also referenced in
> this
> > >> Jira).
> > >>
> > >> Nick
> > >>
> > >>
> > >> On Thu, May 26, 2016 at 11:45 AM, Erick Erickson <
> > erickerick...@gmail.com>
> > >> wrote:
> > >>
> > >> > I always prefer ints to strings, they can't help but take
> > >> > up less memory, comparing two ints is much faster than
> > >> > two strings etc. Although Lucene can play some tricks
> > >> > to make that less noticeable.
> > >> >
> > >> > Although if these are just a few values, it'll be hard to
> > >> > actually measure the perf difference.
> > >> >
> > >> > And if it's a _lot_ of unique values, you have other problems
> > >> > than the int/string distinction. Faceting on very high
> > >> > cardinality fields is something that can have performance
> > >> > implications.
> > >> >
> > >> > But I'd certainly add docValues="true" to the definition no matter
> > >> > which you decide on.
> > >> >
> > >> > Best,
> > >> > Erick
> > >> >
> > >> > On Wed, May 25, 2016 at 9:29 AM, Steven White  >
> > >> > wrote:
> > >> > > Hi everyone,
> > >> > >
> > >> > > I will be faceting on data of type integers and I'm wondering if
> there
> > is
> > >> > any
> > >> > > difference on how I design my schema.  I have no need to sort or
> use
> > >> > range
> > >> > > facet, given this, in terms of Lucene performance and index size,
> > does
> > >> it
> > >> > > make any difference if I use:
> > >> > >
> > >> > > #1:  > >> > indexed="true"
> > >> > > required="true" stored="false"/>
> > >> > >
> > >> > > Or
> > >> > >
> > >> > > #2:  > indexed="true"
> > >> > > required="true" stored="false"/>
> > >> > >
> > >> > > (notice how I changed the "type" from "string" to "int" in #2)
> > >> > >
> > >> > > Thanks in advance.
> > >> > >
> > >> > > Steve
> > >> >
> > >>
> >
>


Re: Facet data type

2016-05-26 Thread Nick D
Although you did mention that you won't need to sort and you are using
multiValued=true. On the off chance you do change something like
multiValued=false docValues=false, then this will come into play:

https://issues.apache.org/jira/browse/SOLR-7495

This has been a rather large pain to deal with in terms of faceting. (The
Lucene change that caused a number of issues is also referenced in this
Jira.)

Nick


On Thu, May 26, 2016 at 11:45 AM, Erick Erickson 
wrote:

> I always prefer ints to strings, they can't help but take
> up less memory, comparing two ints is much faster than
> two strings etc. Although Lucene can play some tricks
> to make that less noticeable.
>
> Although if these are just a few values, it'll be hard to
> actually measure the perf difference.
>
> And if it's a _lot_ of unique values, you have other problems
> than the int/string distinction. Faceting on very high
> cardinality fields is something that can have performance
> implications.
>
> But I'd certainly add docValues="true" to the definition no matter
> which you decide on.
>
> Best,
> Erick
>
> On Wed, May 25, 2016 at 9:29 AM, Steven White 
> wrote:
> > Hi everyone,
> >
> > I will be faceting on data of type integers and I'm wondering if there is
> any
> > difference on how I design my schema.  I have no need to sort or use
> range
> > facet, given this, in terms of Lucene performance and index size, does it
> > make any difference if I use:
> >
> > #1:  indexed="true"
> > required="true" stored="false"/>
> >
> > Or
> >
> > #2:  > required="true" stored="false"/>
> >
> > (notice how I changed the "type" from "string" to "int" in #2)
> >
> > Thanks in advance.
> >
> > Steve
>


Re: More Like This on not new documents

2016-05-13 Thread Nick D
https://wiki.apache.org/solr/MoreLikeThisHandler

Bottom of the page: using content streams. I believe this still works in
newer versions of Solr, although I have not tested it on a new version.

But if you plan on indexing the document anyway, then just indexing it and
then passing the ID to MLT isn't a bad thing at all.
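A sketch of what a content-stream MLT request can look like (core name, field names, and the document text are placeholders; note that recent Solr releases require stream.body to be explicitly enabled in solrconfig.xml):

```python
from urllib.parse import urlencode

# Supply the not-yet-indexed document as a content stream (stream.body)
# instead of pointing the handler at an existing document id.
params = {
    "mlt.fl": "title,body",   # placeholder similarity fields
    "mlt.mintf": 1,
    "mlt.mindf": 1,
    "stream.body": "full text of the new document goes here",
    "wt": "json",
}
url = "http://localhost:8983/solr/mycore/mlt?" + urlencode(params)
print(url)
```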

Nick

On Fri, May 13, 2016 at 2:23 AM, Vincenzo D'Amore 
wrote:

> Hi all,
>
> anybody know if is there a chance to use the mlt component with a new
> document not existing in the collection?
>
> In other words, if I have a new document, should I always first add it to
> my collection and only then, using the mlt component, have the list of
> similar documents?
>
>
> Best regards,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: Dynamically change solr suggest field

2016-05-11 Thread Nick D
There are only two ways I can think of to accomplish this, and neither of
them dynamically sets the suggester field: as far as I can tell from the
docs (which sometimes have lacking info, so I might be wrong), you cannot
set something like *suggest.fl=combo_box_field* at query time. But maybe
these can help you get started.

1. Multiple suggester request handlers for each option in combo box. This
way you just change the request handler in the query you submit based on
the context.

2. Use copyFields to put all possible suggestions into the same field name
(so no more dynamic field settings), with another field recording which
combo-box option applies to that document, and use context filters, which
can be passed at query time to limit the suggestions to those matching
what's in the combo box.
https://cwiki.apache.org/confluence/display/solr/Suggester#Suggester-ContextFiltering
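For option 2, the query-time side can be sketched like this (handler path, dictionary name, and context value are placeholders; suggest.cfq is the context-filter parameter described on the page linked above):

```python
from urllib.parse import urlencode

# Limit suggestions to documents whose context field matches the value
# currently selected in the combo box.
params = {
    "suggest": "true",
    "suggest.dictionary": "mySuggester",  # placeholder dictionary name
    "suggest.q": "bere",                  # what the user has typed so far
    "suggest.cfq": "option_a",            # value from the combo box
}
url = "/solr/mycore/suggest?" + urlencode(params)
print(url)
```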

Hope this helps a bit

Nick

On Wed, May 11, 2016 at 7:05 AM, Lasitha Wattaladeniya 
wrote:

> Hello devs,
>
> I'm trying to implement auto complete text suggestions using solr. I have a
> text box and next to that there's a combo box. So the auto complete should
> suggest based on the value selected in the combo box.
>
> Basically I should be able to change the suggest field based on the value
> selected in the combo box. I was trying to solve this problem whole day but
> not much luck. Can anybody tell me is there a way of doing this ?
>
> Regards,
> Lasitha.
>
> Lasitha Wattaladeniya
> Software Engineer
>
> Mobile : +6593896893
> Blog : techreadme.blogspot.com
>


Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Nick D
I don't really get what 'Q= {!dismax qf=address} "rek Dr*" - It is not
allowed since prefix in quotes is not allowed' means. Why can't you use
exact phrase matching? Do you have some limitation on quoting? Since you
are specifically looking for an exact phrase, I don't see why you wouldn't
want exact matching.

Anyways

You can look into using another type of tokenizer; my guess is you are
probably using the standard tokenizer or possibly the whitespace tokenizer.
You may want to try a different one and see what results you get. Also, you
probably won't need to use the wildcards if you set up your gram sizes the
way you want.

The shingle factory can do stuff like the following (my memory is a bit
fuzzy on this, but I play with it in the admin page):

This is a sentence
shingle = 4
this_is_a_sentence

Combine that with your ngram factory (minGramSize=4, maxGramSize=50) and
you can do something like:
this
this_i
this_is

this_is_a_sentence

his_i
his_is

his_is_a_sentence

etc.


Then apply the shingle factory at query time to take something like

his is -> his_is and you will get that phrase back.
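The shingling idea above can be sketched like this (fixed-size word shingles joined with underscores; the real ShingleFilterFactory emits more variants depending on minShingleSize/maxShingleSize):

```python
def shingles(tokens, size, sep="_"):
    # All consecutive word groups of the given size, joined like the
    # examples above.
    return [sep.join(tokens[i:i + size])
            for i in range(len(tokens) - size + 1)]

tokens = "this is a sentence".split()
print(shingles(tokens, 4))  # ['this_is_a_sentence']
print(shingles(tokens, 2))  # ['this_is', 'is_a', 'a_sentence']
```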

My personal favorite is just using edgengram on a field like the one below,
but the concept is the same with regular old ngram:

2001 N Drive Derek Fullerton

term       start  end  position
2          0      1    1
20         0      2    1
200        0      3    1
2001       0      4    1
n          5      6    2
d          7      8    3
dr         7      9    3
dri        7      10   3
driv       7      11   3
drive      7      12   3
d          13     14   4
de         13     15   4
der        13     16   4
dere       13     17   4
derek      13     18   4
f          19     20   5
fu         19     21   5
ful        19     22   5
full       19     23   5
fulle      19     24   5
fuller     19     25   5
fullert    19     26   5
fullerto   19     27   5
fullerton  19     28   5

Works great for a quick type-ahead field type.
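The per-token output in that listing can be sketched as edge n-gram prefixes (minGramSize=1 to match the output above; an assumption, since the actual field definition isn't shown):

```python
def edge_ngrams(token, min_gram=1, max_gram=50):
    # Prefixes an edge n-gram filter emits for one token; every prefix keeps
    # the token's position, which is what makes left-anchored type-ahead work.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

for tok in "2001 n drive derek fullerton".split():
    print(tok, "->", edge_ngrams(tok))
# e.g. derek -> ['d', 'de', 'der', 'dere', 'derek']
```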

Oh, and by the way, your ngram size is too small for _rek_ to be split out
from _derek_.


Setting up a few different field types and playing with the analyzer in the
admin page can give you a good idea of what both index- and query-time
results will be, and with your tiny data set it is the best way I can think
of to see instant results with your new field types.

Nick

On Tue, May 10, 2016 at 10:01 AM, Thrinadh Kuppili 
wrote:

> I have tried with  maxGramSize="12"/> and search using the Extended Dismax
>
> Q= {!dismax qf=address} rek Dr* - It did not work as expected since i am
> getting all the records which has rek, Dr .
>
> Q= {!dismax qf=address} "rek Dr*" - It is not allowed since a prefix in
> quotes is not allowed.
>
> Q= {!complexphrase inOrder=true}address:"rek dr*" - It did not work since
> it
> is searching for words starts with rek
>
> I am not aware of the shingle factory as of now; I will try it out and
> find out how I can use it.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275859.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Nick D
You can use a combination of ngram or edgengram fields, and possibly the
shingle factory if you want to combine words. You might also want an
exact-text field with no query slop if the two words, even as partial text,
need to be right next to each other. Edge ngrams are great for
left-to-right matching; plain ngrams are great for just splitting up by
size. There are a number of tokenizers you can try out.

Nick
On May 10, 2016 9:22 AM, "Thrinadh Kuppili"  wrote:

> I am trying to search a field named Address which has a space in it.
> Example :
> Address has the below values in it.
> 1. 2000 North Derek Dr Fullerton
> 2. 2011 N Derek Drive Fullerton
> 3. 2108 N Derek Drive Fullerton
> 4. 2100 N Derek Drive Fullerton
> 5. 2001 N Drive Derek Fullerton
>
> Search Query:- Derek Drive or rek Dr
> Expectation is it should return all  2,3,4 and it should not return 1 & 5 .
>
> Finally I am trying to find a word which can search similar to a database
> search of %N Derek%
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr edismax field boosting

2016-05-10 Thread Nick D
plain'=>{
>   'http://localhost:4503/baseurl/upendra-custon.html'=>'
> 0.14641379 = max of:
>   0.14641379 = weight(_text_:upendra in 0) [], result of:
> 0.14641379 = score(doc=0,freq=8.0 = termFreq=8.0
> ), product of:
>   0.074107975 = idf(docFreq=6, docCount=6)
>   1.9756819 = tfNorm, computed from:
> 8.0 = termFreq=8.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 168.3 = avgFieldLength
> 113.8 = fieldLength
> ',
>   '
> http://localhost:4503/baseurl/upendra-custon/care-body-content.html'=>'
> 0.13738367 = max of:
>   0.13738367 = weight(_text_:upendra in 1) [], result of:
> 0.13738367 = score(doc=1,freq=4.0 = termFreq=4.0
> ), product of:
>   0.074107975 = idf(docFreq=6, docCount=6)
>   1.853831 = tfNorm, computed from:
> 4.0 = termFreq=4.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 168.3 = avgFieldLength
> 83.591835 = fieldLength
> ',
>   'http://localhost:4503/baseurl/upendra-custon/care-keyword.html'=>'
> 0.13738367 = max of:
>   0.13738367 = weight(_text_:upendra in 2) [], result of:
> 0.13738367 = score(doc=2,freq=4.0 = termFreq=4.0
> ), product of:
>   0.074107975 = idf(docFreq=6, docCount=6)
>   1.853831 = tfNorm, computed from:
> 4.0 = termFreq=4.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 168.3 = avgFieldLength
> 83.591835 = fieldLength
> ',
>   'http://localhost:4503/baseurl/upendra-custon/care.html'=>'
> 0.13286635 = max of:
>   0.13286635 = weight(_text_:upendra in 3) [], result of:
> 0.13286635 = score(doc=3,freq=4.0 = termFreq=4.0
> ), product of:
>   0.074107975 = idf(docFreq=6, docCount=6)
>   1.7928753 = tfNorm, computed from:
> 4.0 = termFreq=4.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 168.3 = avgFieldLength
> 113.8 = fieldLength
> ',
>   '
> http://localhost:4503/baseurl/upendra-custon/care-description.html'=>'
> 0.13053702 = max of:
>   0.13053702 = weight(_text_:upendra in 4) [], result of:
> 0.13053702 = score(doc=4,freq=3.0 = termFreq=3.0
> ), product of:
>   0.074107975 = idf(docFreq=6, docCount=6)
>   1.7614436 = tfNorm, computed from:
> 3.0 = termFreq=3.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 168.3 = avgFieldLength
> 83.591835 = fieldLength
> ',
>   'http://localhost:4503/baseurl/upendra-custon/care-without-.html'=>'
> 0.11870542 = max of:
>   0.11870542 = weight(_text_:upendra in 5) [], result of:
> 0.11870542 = score(doc=5,freq=2.0 = termFreq=2.0
> ), product of:
>   0.074107975 = idf(docFreq=6, docCount=6)
>   1.6017901 = tfNorm, computed from:
> 2.0 = termFreq=2.0
> 1.2 = parameter k1
> 0.75 = parameter b
> 168.3 = avgFieldLength
> 83.591835 = fieldLength
> '},
> 'QParser'=>'ExtendedDismaxQParser',
> 'altquerystring'=>nil,
> 'boost_queries'=>nil,
> 'parsed_boost_queries'=>[],
> 'boostfuncs'=>nil,
> 'timing'=>{
>   'time'=>6.0,
>   'prepare'=>{
> 'time'=>0.0,
> 'query'=>{
>   'time'=>0.0},
> 'facet'=>{
>   'time'=>0.0},
> 'facet_module'=>{
>   'time'=>0.0},
> 'mlt'=>{
>   'time'=>0.0},
> 'highlight'=>{
>   'time'=>0.0},
> 'stats'=>{
>   'time'=>0.0},
> 'expand'=>{
>   'time'=>0.0},
> 'debug'=>{
>   'time'=>0.0}},
>   'process'=>{
> 'time'=>5.0,
> 'query'=>{
>   'time'=>0.0},
> 'facet'=>{
>   'time'=>0.0},
> 'facet_module'=>{
>   'time'=>0.0},
> 'mlt'=>{
>   'time'=>0.0},
> 'highlight'=>{
>   'time'=>0.0},
> 'stats'=>{
>   'time'=>0.0},
> 'expand'=>{
>   'time'=>0.0},
> 'debug'=>{
>

Re: Solr edismax field boosting

2016-05-09 Thread Nick D
;q.alt":"Upendra",
>   "indent":"on",
>   "qf":"metatag.description^9 h1^7 h2^6 h3^5 h4^4 _text_^1 id^0.5",
>   "wt":"json",
>   "debugQuery":"on",
>   "_":"1462810987788"}},
>   "response":{"numFound":3,"start":0,"maxScore":0.8430033,"docs":[
>   {
> "h2":["Looks like your browser is a little out-of-date."],
> "h3":["Already a member?"],
> "strtitle":"Upendra Custon",
> "id":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon.html
> ",
> "tstamp":"2016-05-09T17:15:57.604Z",
> "metatag.hideininternalsearch":[false],
> "segment":[20160509224553],
> "digest":["844296a63233b3e4089424fe1ec9d036"],
> "boost":[1.4142135],
> "lang":"en",
> "_version_":1533871839698223104,
> "host":"localhost",
> "url":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon.html
> ",
> "score":0.8430033},
>   {
> "metatag.description":"test",
> "h1":["Health care"],
> "h2":["Looks like your browser is a little out-of-date."],
> "h3":["Already a member?"],
> "strtitle":"Upendra",
> "id":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/health-care.html
> ",
> "tstamp":"2016-05-09T17:15:57.838Z",
> "metatag.hideininternalsearch":[false],
> "metatag.topresultthumbnailalt":[","],
> "segment":[20160509224553],
> "digest":["dd4ef8879be2d4d3f28e24928e9b84c5"],
> "boost":[1.4142135],
> "lang":"en",
> "metatag.keywords":",",
> "_version_":1533871839731777536,
> "host":"localhost",
> "url":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/health-care.html
> ",
> "score":0.7009616},
>   {
> "metatag.description":"Upendra decription testing",
> "h1":["healthcare description"],
> "h2":["Looks like your browser is a little out-of-date."],
> "h3":["Already a member?"],
> "strtitle":"healthcare description",
> "id":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/healthcare-description.html
> ",
> "tstamp":"2016-05-09T17:15:57.682Z",
> "metatag.hideininternalsearch":[false],
> "metatag.topresultthumbnailalt":[","],
> "segment":[20160509224553],
> "digest":["6262795db6aed05a5de7cc3cbe496401"],
> "boost":[1.4142135],
> "lang":"en",
> "metatag.keywords":",",
> "_version_":1533871839739117568,
> "host":"localhost",
> "url":"
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon/healthcare-description.html
> ",
> "score":0.5481102}]
>   },
>   "debug":{
> "rawquerystring":"Upendra",
> "querystring":"Upendra",
>
> "parsedquery":"(+DisjunctionMaxQuery(((metatag.description:Upendra)^9.0 |
> (h1:Upendra)^7.0 | (h2:Upendra)^6.0 | (h3:Upendra)^5.0 | (id:Upendra)^0.5 |
> (h4:Upendra)^4.0 | _text_:upendra)~0.99))/no_coord",
> "parsedquery_toString":"+((metatag.description:Upendra)^9.0 |
> (h1:Upendra)^7.0 | (h2:Upendra)^6.0 | (h3:Upendra)^5.0 | (id:Upendra)^0.5 |
> (h4:Upendra)^4.0 | _text_:upendra)~0.99",
> "explain":{
>   "
> http://localhost:4503/content/uhcdotcom/en/home/waysin/poc/upendra-custon.html":"\n0.84300333
> = max plus 0.99 times others of:\n  0.84300333 = weight(_text_:upendra in
> 0) [], result of:\n0.84300333 = score(doc=0,freq=6.0 = termFreq=6.0\n),
> product of:\n  0.44183275 = id

Re: Solr edismax field boosting

2016-05-09 Thread Nick D
You can add the debug flag to the end of the request and see exactly what
the scoring is and why things are happening.

&debug=ALL will show you everything including the scoring.

The result of the debug query (or adding it into your question here)
should help you decipher what is going on with your scoring and how the
boosts are(n't) working.
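For example, tacking the flag onto a request shaped like the one in question (core name is a placeholder, and the qf list is abbreviated from the quoted config):

```python
from urllib.parse import urlencode

params = {
    "q": "foo",
    "defType": "edismax",
    "qf": "metatag.description^9 title^8 h1^7",  # abbreviated qf list
    "debug": "ALL",   # full debug output, including per-document scoring
    "wt": "json",
}
url = "/solr/mycore/select?" + urlencode(params)
print(url)
```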

Nick

On Mon, May 9, 2016 at 7:22 PM, Megha Bhandari 
wrote:

> Hi
>
> We are trying to boost certain fields with relevancy. However we are not
> getting results as per expectation. Below is the configuration in
> solrconfig.xml.
> Even though the title field has a lesser boost than metatag.description
> results for title field are coming higher.
>
> We even created test data that has values only in metatag.description and
> title. For example, page 1 has foo in the description and page 2 has foo in
> the title. Solr is still returning page 2 before page 1.
>
> We are using Solr 5.5 and Nutch 1.11 currently.
>
> Following is the configuration we are using. Any ideas on what we are
> missing to enable correct field boosting?
>
> 
> 
>   
> metatag.keywords^10 metatag.description^9 title^8 h1^7 h2^6 h3^5
> h4^4 id _text_^1
>   
>   explicit
>   10
>
>   
>
>   explicit
>   _text_
>   default
>   on
>   false
>   10
>   5
>   5
>   false
>   true
>   10
>   5
> 
>   id title metatag.description itemtype
> lang metatag.hideininternalsearch metatag.topresultthumbnailalt
> metatag.topresultthumbnailurl playerid playerkey
>   on
>   0
>   title metatag.description
>   
>   
> 
> 
>   spellcheck
> elevator
> 
>   
>
> Thanks
> Megha
>