Re: match in order

2017-11-04 Thread Erick Erickson
This _particular_ use case might be a good candidate for shingles.
That filter combines adjacent tokens into single tokens, so your docs would have
A_B
A_C
A_D
B_D

and the search would be broken up (assuming the appropriate parameters
to ShingleFilterFactory) into
X_A A_B B_Y
and thus would match only the specified doc, assuming a default operator of OR.

This wouldn't generalize for slop at all though.
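
To make the shingle idea concrete, here is a minimal Python sketch that only emulates the pairing outside Solr. It is not ShingleFilterFactory itself: the "_" separator simply follows the notation above, and the helper names (shingles, matching_docs) are made up for illustration.

```python
def shingles(text, sep="_"):
    """Join each pair of adjacent whitespace-separated terms into one token."""
    terms = text.split()
    return [a + sep + b for a, b in zip(terms, terms[1:])]


# field1 values of the example documents from this thread
docs = {"doc1": "A B", "doc2": "A C", "doc3": "A D", "doc4": "B D"}


def matching_docs(query):
    """Return ids of docs sharing at least one shingle with the query (OR semantics)."""
    wanted = set(shingles(query))
    return [doc_id for doc_id, value in docs.items() if wanted & set(shingles(value))]


print(shingles("X A B Y"))       # ['X_A', 'A_B', 'B_Y']
print(matching_docs("X A B Y"))  # ['doc1']
```

Only doc1 shares a shingle (A_B) with the query, which is the behaviour described above.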

Best,
Erick

On Sat, Nov 4, 2017 at 12:05 PM, Vincenzo D'Amore  wrote:
> Thanks Erick.
>
> Right, if there is no slop specified it is like have an "exact match".  So I 
> can simplify the query in:
>
> bq=field1:("X A" OR "A B" OR "B Y")^10
>
> I'm struggling to understand if there is any way to split the user query in 
> pairs directly with solr.
>
>> On Sat, Nov 4, 2017 at 6:32 PM, Erick Erickson  
>> wrote:
>> Looks good to me. The only thing I'd mention is that in the example
>> given, the complexphrase query parser is unnecessary, but only because there's no
>> "slop" specified. If by "near" you also mean "within 3 words" or
>> some such, then you need complexPhraseQuery.
>>
>> FWIW,
>> Erick
>>
>> On Sat, Nov 4, 2017 at 10:12 AM, Vincenzo D'Amore  wrote:
>> > Hi,
>> >
>> > I have a field field1 where there are only pairs of terms, for example the
>> > documents
>> >
>> > doc1 { field1 : "A B", title : "Hello title 1" }
>> > doc2 { field1 : "A C", title : "Hello title 2"  }
>> > doc3 { field1 : "A D", title : "Hello title 3"  }
>> > doc4 { field1 : "B D", title : "Hello title 4"  }
>> >
>> > I have to boost the documents where there is a pair terms in the same order
>> > used in the query:
>> >
>> > To be clear, if I the user search four terms: X A B Y
>> >
>> > I have to check they are in a field:
>> >
>> > X near A, A near B,  B near Y:
>> >
>> > I've implemented this problem using complexphrase:
>> >
>> > bq={!complexphrase inOrder=true df=field1}("X A" OR "A B" OR "B Y")^10
>> >
>> > What do you think of this solution? Is there another solution, may be using
>> > a different query parser?
>> >
>> > Trying another way, I've also used with surround query parser, but I think,
>> > I was unable to write the query correctly, never matches.
>> >
>> > bq={!surround}field1:(W(X, A) OR W(A,B) OR W(B, Y))^10
>> >
>> > Not sure if this is the correct syntax, I've also not found enough
>> > documentation that explaining.
>> >
>> > Best regards,
>> > Vincenzo
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


recent utf8 problems

2017-11-04 Thread Dr. Mario Michael Krell
Hi,

We recently discovered issues with Solr when handling UTF-8 encoded characters in 
searches. One or two months ago everything was still working.

- What might have caused it is a Java update (Java 8 Update 151). 
- We are using firefox as well as chrome for displaying results.
- We tested it with Solr 6.5, Solr 7.0.0, 7.0.1, and 7.1.

We created a search engine based on the YFCC100M dataset, and in the browser 
admin UI (http://localhost:8983/solr/#/mmc_search3/query) we can search for 
"title:T%C3%BCbingen" in the query field and get more than 3 million results:

{
  "responseHeader":{
    "status":0,
    "QTime":103},
  "response":{"numFound":3092484,"start":0,"docs":[
      {
        "photoid":"6182384834",

However, when we use the corresponding URL directly, 
http://localhost:8983/solr/mmc_search3/select?q=title:T%C3%BCbingen&wt=json 

the results are reduced to zero:
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "response":{"numFound":0,"start":0,"docs":[]
  }}


I would be grateful for any suggestions on how to fix this problem. To me it 
seems like a bug in Solr triggered by the Java update.
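
To compare what the two requests actually send, a small Python sketch of UTF-8 percent-encoding may help. It uses only the standard library. Whether the admin UI encodes the content of its query field a second time is an assumption worth checking, not something established in this thread.

```python
from urllib.parse import quote, unquote, urlencode

term = "Tübingen"

# UTF-8 percent-encoding of the raw term, as it appears in the URL above
encoded = quote(term)
print(encoded)           # T%C3%BCbingen
print(unquote(encoded))  # Tübingen

# Encoding an already-encoded string escapes the "%" signs as well,
# so the server would then see the literal text "T%C3%BCbingen".
print(quote(encoded))    # T%25C3%25BCbingen

# Building the select URL from the raw term lets the library encode it exactly once.
params = urlencode({"q": "title:" + term, "wt": "json"})
url = "http://localhost:8983/solr/mmc_search3/select?" + params
print(url)
```

If the two entry points end up with differently decoded query strings, the differing hit counts would follow from that rather than from the index itself.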

Best,

Mario

Re: match in order

2017-11-04 Thread Vincenzo D'Amore
Thanks Erick. 

Right, if there is no slop specified it is like having an "exact match". So I 
can simplify the query to:

bq=field1:("X A" OR "A B" OR "B Y")^10 

I'm struggling to understand whether there is any way to split the user query into 
pairs directly in Solr.
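
For comparison, the splitting can also be done on the client before the request is sent. The following Python sketch is only an illustration of that idea: the helper names are hypothetical, and the field name and boost are taken from the bq above.

```python
def adjacent_pairs(query):
    """Split a user query into phrases made of each pair of adjacent terms."""
    terms = query.split()
    return [a + " " + b for a, b in zip(terms, terms[1:])]


def build_bq(query, field="field1", boost=10):
    """Build a boost query of OR-ed two-term phrases, or None for single-term input."""
    pairs = adjacent_pairs(query)
    if not pairs:
        return None
    clauses = " OR ".join('"' + pair + '"' for pair in pairs)
    return f"{field}:({clauses})^{boost}"


print(build_bq("X A B Y"))  # field1:("X A" OR "A B" OR "B Y")^10
```

The sketch does not escape special query characters in user input, which a real client would need to handle.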

> On Sat, Nov 4, 2017 at 6:32 PM, Erick Erickson  
> wrote:
> Looks good to me. The only thing I'd mention is that in the example
> given, the complexphrase query parser is unnecessary, but only because there's no
> "slop" specified. If by "near" you also mean "within 3 words" or
> some such, then you need complexPhraseQuery.
> 
> FWIW,
> Erick
> 
> On Sat, Nov 4, 2017 at 10:12 AM, Vincenzo D'Amore  wrote:
> > Hi,
> >
> > I have a field field1 where there are only pairs of terms, for example the
> > documents
> >
> > doc1 { field1 : "A B", title : "Hello title 1" }
> > doc2 { field1 : "A C", title : "Hello title 2"  }
> > doc3 { field1 : "A D", title : "Hello title 3"  }
> > doc4 { field1 : "B D", title : "Hello title 4"  }
> >
> > I have to boost the documents where there is a pair terms in the same order
> > used in the query:
> >
> > To be clear, if I the user search four terms: X A B Y
> >
> > I have to check they are in a field:
> >
> > X near A, A near B,  B near Y:
> >
> > I've implemented this problem using complexphrase:
> >
> > bq={!complexphrase inOrder=true df=field1}("X A" OR "A B" OR "B Y")^10
> >
> > What do you think of this solution? Is there another solution, may be using
> > a different query parser?
> >
> > Trying another way, I've also used with surround query parser, but I think,
> > I was unable to write the query correctly, never matches.
> >
> > bq={!surround}field1:(W(X, A) OR W(A,B) OR W(B, Y))^10
> >
> > Not sure if this is the correct syntax, I've also not found enough
> > documentation that explaining.
> >
> > Best regards,
> > Vincenzo



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251

Leader initiated recovery authentication failure

2017-11-04 Thread Wei
Hi All,

After enabling basic authentication for SolrCloud, I noticed that the
internal leader-initiated recovery fails with a 401 response.

The recovery request from leader:

GET
//replica1.mycloud.com:9090/solr/admin/cores?action=REQUESTRECOVERY&core=replica1&wt=javabin&version=2
HTTP/1.1" 401 310 "-"
"Solr[org.apache.solr.client.solrj.impl.HttpSolrClient] 1.0" 5

My authorization config is:

"authorization": {
   "class": "solr.RuleBasedAuthorizationPlugin",
   "permissions": [
      { "name": "security-edit", "role": "admin", "index": 1 },
      { "name": "schema-edit", "role": "admin", "index": 2 },
      { "name": "config-edit", "role": "admin", "index": 3 },
      { "name": "core-admin-edit", "role": "admin", "index": 4 },
      { "name": "collection-admin-edit", "role": "admin", "index": 5 }
   ],


It looks like the unauthorized error occurs because core-admin-edit requires the
admin role. How can I configure authentication credentials for SolrCloud's
internal requests? I appreciate your help!
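
The reading above can be written down as a simplified model in Python. This is only a sketch of rule lookup under stated assumptions, not Solr's implementation: it assumes the REQUESTRECOVERY call falls under core-admin-edit (as suspected above), and the function names are hypothetical.

```python
# Permissions copied from the configuration quoted above.
PERMISSIONS = [
    {"name": "security-edit", "role": "admin"},
    {"name": "schema-edit", "role": "admin"},
    {"name": "config-edit", "role": "admin"},
    {"name": "core-admin-edit", "role": "admin"},
    {"name": "collection-admin-edit", "role": "admin"},
]


def required_role(permission_name):
    """Return the role demanded by the first rule with this name, or None."""
    for rule in PERMISSIONS:
        if rule["name"] == permission_name:
            return rule["role"]
    return None


def status_for(permission_name, user_roles):
    """Model the HTTP status; user_roles is None for a request without credentials."""
    role = required_role(permission_name)
    if role is None:
        return 200  # no rule applies, request passes
    if user_roles is None:
        return 401  # a role is required but no user was presented
    return 200 if role in user_roles else 403


print(status_for("core-admin-edit", None))       # 401
print(status_for("core-admin-edit", ["admin"]))  # 200
```

In this model the 401 in the log corresponds to the internal request arriving without any credentials at all.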

Thanks,
Wei


Re: match in order

2017-11-04 Thread Erick Erickson
Looks good to me. The only thing I'd mention is that in the example
given, the complexphrase query parser is unnecessary, but only because there's no
"slop" specified. If by "near" you also mean "within 3 words" or
some such, then you need complexPhraseQuery.

FWIW,
Erick

On Sat, Nov 4, 2017 at 10:12 AM, Vincenzo D'Amore  wrote:
> Hi,
>
> I have a field field1 where there are only pairs of terms, for example the
> documents
>
> doc1 { field1 : "A B", title : "Hello title 1" }
> doc2 { field1 : "A C", title : "Hello title 2"  }
> doc3 { field1 : "A D", title : "Hello title 3"  }
> doc4 { field1 : "B D", title : "Hello title 4"  }
>
> I have to boost the documents where there is a pair terms in the same order
> used in the query:
>
> To be clear, if I the user search four terms: X A B Y
>
> I have to check they are in a field:
>
> X near A, A near B,  B near Y:
>
> I've implemented this problem using complexphrase:
>
> bq={!complexphrase inOrder=true df=field1}("X A" OR "A B" OR "B Y")^10
>
> What do you think of this solution? Is there another solution, may be using
> a different query parser?
>
> Trying another way, I've also used with surround query parser, but I think,
> I was unable to write the query correctly, never matches.
>
> bq={!surround}field1:(W(X, A) OR W(A,B) OR W(B, Y))^10
>
> Not sure if this is the correct syntax, I've also not found enough
> documentation that explaining.
>
> Best regards,
> Vincenzo


match in order

2017-11-04 Thread Vincenzo D'Amore
Hi,

I have a field field1 where there are only pairs of terms, for example the
documents

doc1 { field1 : "A B", title : "Hello title 1" }
doc2 { field1 : "A C", title : "Hello title 2"  }
doc3 { field1 : "A D", title : "Hello title 3"  }
doc4 { field1 : "B D", title : "Hello title 4"  }

I have to boost the documents that contain a pair of terms in the same order
as in the query.

To be clear, if the user searches for four terms: X A B Y

I have to check whether these pairs occur in the field:

X near A, A near B, B near Y

I've implemented this using complexphrase:

bq={!complexphrase inOrder=true df=field1}("X A" OR "A B" OR "B Y")^10

What do you think of this solution? Is there another solution, maybe using
a different query parser?

As another approach, I've also tried the surround query parser, but I think
I was unable to write the query correctly; it never matches.

bq={!surround}field1:(W(X, A) OR W(A,B) OR W(B, Y))^10

I'm not sure if this is the correct syntax, and I haven't found enough
documentation explaining it.

Best regards,
Vincenzo


Re: Advice on Stemming in Solr

2017-11-04 Thread Zheng Lin Edwin Yeo
Hi Emir,

We are looking at the configuration, to try to adjust the rules to suit our
use case.
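
For reference, two ideas suggested earlier in this thread (protecting specific words from the stemmer, and masking the "ing" ending before stemming) can be sketched in Python. The toy_stem function is a made-up stand-in and not KStem or Hunspell; the 'wqx' marker is the one proposed in the thread.

```python
import re


def toy_stem(token):
    """Stand-in for a real stemmer: strips a trailing 'ing' from longer tokens."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    return token


def stem_with_protected(tokens, protected):
    """Skip stemming for tokens in the protected set (the keyword-marker idea)."""
    return [t if t in protected else toy_stem(t) for t in tokens]


def stem_with_masking(tokens, mask="wqx"):
    """Hide the 'ing' ending before stemming and restore it afterwards."""
    masked = [re.sub(r"ing$", mask, t) for t in tokens]
    stemmed = [toy_stem(t) for t in masked]
    return [re.sub(mask + r"$", "ing", t) for t in stemmed]


tokens = ["walking", "ximenting"]
print(stem_with_protected(tokens, {"ximenting"}))  # ['walk', 'ximenting']
print(stem_with_masking(tokens))                   # ['walking', 'ximenting']
```

The protected-word variant still stems "walking", while the masking variant switches off the "ing" rule for every word, which is the trade-off between the two suggestions.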

Regards,
Edwin


On 3 November 2017 at 16:24, Emir Arnautović wrote:

> Hi Edwin,
> Hunspell is a configurable, language-independent library and you can define
> any morphology rules. It’s been there for a while and I would not be
> surprised if someone has already adjusted the English rules to suit your case.
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 3 Nov 2017, at 04:25, Zheng Lin Edwin Yeo wrote:
> >
> > Hi Emir,
> >
> > We are looking to change to HunspellStemFilterFactory. This has a
> > dictionary file containing words and applicable flags, and an affix file
> > that specifies how these flags will control spell checking.
> > Probably we can control it from those files in HunspellStemFilterFactory?
> >
> > Regards,
> > Edwin
> >
> >
> > On 2 November 2017 at 17:46, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Edwin,
> >> It seems that it would be best if you do not apply the *ing stemming rule at
> >> all. The first idea is to trick the stemmer by replacing the "ing" ending of
> >> any word with some nonexistent character combination, e.g. ‘wqx’. You can use
> >> solr.PatternReplaceFilterFactory to do that. You can switch it back after
> >> stemming if you want to have the proper token in the index.
> >>
> >> HTH,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 2 Nov 2017, at 03:23, Zheng Lin Edwin Yeo wrote:
> >>>
> >>> Hi Emir,
> >>>
> >>> We do have quite a lot of words that should not be stemmed. Currently, the
> >>> KStemFilterFactory is stemming all the non-English words that end with
> >>> "ing" as well. There are quite a lot of places and names which end in
> >>> "ing", and all these are being stemmed as well, which leads to
> >>> inaccurate search results.
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>>
> >>> On 1 November 2017 at 18:20, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >>>
> >>>> Hi Edwin,
> >>>> If the number of words that should not be stemmed is not high, you could
> >>>> use KeywordMarkerFilterFactory to flag those words as keywords, and it
> >>>> should prevent the stemmer from changing them.
> >>>> Depending on what you want to achieve, you might not be able to avoid
> >>>> using a stemmer at indexing time. If you want to find documents that contain
> >>>> only “walking” with the search term “walk”, then you have to stem at index
> >>>> time. Cases where you use stemming at query time only are rare and specific.
> >>>> If you want to prefer exact matches over stemmed matches, you have to
> >>>> index the same content with and without stemming and boost matches on the
> >>>> field without stemming.
> >>>>
> >>>> HTH,
> >>>> Emir
> >>>> --
> >>>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>>>
> >>>>
> >>>>
> >>>>> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> We are currently using KStemFilterFactory in Solr, but we found that it is
> >>>>> actually doing stemming on non-English words like "ximenting", which it
> >>>>> stems to "ximent". This is not what we wanted.
> >>>>>
> >>>>> Another option is to use the HunspellStemFilterFactory, but there are some
> >>>>> English words like "running" and "walking" that are not being stemmed.
> >>>>>
> >>>>> We would like to check: is it advisable to use stemming at index time? Or
> >>>>> should we not stem at index time, but at query time search for the
> >>>>> stemmed words as well? For example, if the user searches for "walking",
> >>>>> we would search for it together with "walk", and the actual word
> >>>>> "walking" would have a higher weight.
> >>>>>
> >>>>> I'm currently using Solr 6.5.1.
> >>>>>
> >>>>> Regards,
> >>>>> Edwin


Re: SolrClould 6.6 stability challenges

2017-11-04 Thread Rick Dig
We are not committing after each batch; we made sure that is turned off.
autoCommit maxTime is set to 300000 (300 seconds), and openSearcher is set to true.
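
As an illustration of indexing in batches without an explicit commit, here is a minimal Python sketch that only builds the update requests and leaves committing to autoCommit. The base URL and the collection name "mycollection" are placeholders, and the helper names are hypothetical; the batch size of 500 is the one mentioned in this thread.

```python
import json
from urllib.request import Request


def batches(docs, size=500):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def update_request(base_url, collection, batch):
    """Build (but do not send) a JSON update request with no commit parameter."""
    url = f"{base_url}/{collection}/update"
    body = json.dumps(batch).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


docs = [{"id": str(i)} for i in range(1200)]
requests = [
    update_request("http://localhost:8983/solr", "mycollection", batch)
    for batch in batches(docs)
]
print(len(requests))         # 3
print(requests[0].full_url)  # http://localhost:8983/solr/mycollection/update
```

Each request could then be sent with urllib.request.urlopen; none of them carries commit=true, so visibility of the documents depends entirely on the autoCommit settings.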


On Sat, Nov 4, 2017 at 6:50 PM, Amrit Sarkar  wrote:

> Pretty much what Emir has stated. I want to know, when you saw:
>
> > all of this runs perfectly ok when indexing isn't happening. as soon as
> > we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> > minutes.
>
>
> When you say "NRT" indexing, what is the commit strategy in indexing? With
> auto-commit set so high, are you committing after each batch? If yes, what's
> the number?
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
> > Hi Rick,
> > Do you see any errors in logs? Do you have any monitoring tool? Maybe you
> > can check heap and GC metrics around time when incident happened. It is not
> > large heap but some major GC could cause pause large enough to trigger some
> > snowball and end up with node in recovery state.
> > What is indexing rate you observe? Why do you have max warming searchers 5
> > (did you mean this with autowarmingsearchers?) when you commit every 5 min?
> > Why did you increase it - you seen errors with default 2? Maybe you commit
> > every bulk?
> > Do you see similar behaviour when you just do indexing without queries?
> >
> > Thanks,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 4 Nov 2017, at 05:15, Rick Dig  wrote:
> > >
> > > hello all,
> > > we are trying to run solrcloud 6.6 in a production setting.
> > > here's our config and issue
> > > 1) 3 nodes, 1 shard, replication factor 3
> > > 2) all nodes are 16GB RAM, 4 core
> > > 3) Our production load is about 2000 requests per minute
> > > 4) index is fairly small, index size is around 400 MB with 300k documents
> > > 5) autocommit is currently set to 5 minutes (even though ideally we would
> > > like a smaller interval).
> > > 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> > > 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> > > we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> > > minutes. from this point on the nodes never recover unless we stop
> > > indexing.  the master usually is the last one to fall.
> > > 8) there are maybe 5 to 7 processes indexing at the same time with document
> > > batch sizes of 500.
> > > 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> > > 10) no cpu and / or oom issues that we can see.
> > > 11) cpu load does go fairly high 15 to 20 at times.
> > > any help or pointers appreciated
> > >
> > > thanks
> > > rick
> >
> >
>


Re: SolrClould 6.6 stability challenges

2017-11-04 Thread Amrit Sarkar
Pretty much what Emir has stated. I want to know, when you saw:

> all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes.


When you say "NRT" indexing, what is the commit strategy during indexing? With
auto-commit set so high, are you committing after each batch? If yes, what's
the number?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Rick,
> Do you see any errors in logs? Do you have any monitoring tool? Maybe you
> can check heap and GC metrics around time when incident happened. It is not
> large heap but some major GC could cause pause large enough to trigger some
> snowball and end up with node in recovery state.
> What is indexing rate you observe? Why do you have max warming searchers 5
> (did you mean this with autowarmingsearchers?) when you commit every 5 min?
> Why did you increase it - you seen errors with default 2? Maybe you commit
> every bulk?
> Do you see similar behaviour when you just do indexing without queries?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 4 Nov 2017, at 05:15, Rick Dig  wrote:
> >
> > hello all,
> > we are trying to run solrcloud 6.6 in a production setting.
> > here's our config and issue
> > 1) 3 nodes, 1 shard, replication factor 3
> > 2) all nodes are 16GB RAM, 4 core
> > 3) Our production load is about 2000 requests per minute
> > 4) index is fairly small, index size is around 400 MB with 300k documents
> > 5) autocommit is currently set to 5 minutes (even though ideally we would
> > like a smaller interval).
> > 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> > 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> > we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> > minutes. from this point on the nodes never recover unless we stop
> > indexing.  the master usually is the last one to fall.
> > 8) there are maybe 5 to 7 processes indexing at the same time with document
> > batch sizes of 500.
> > 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> > 10) no cpu and / or oom issues that we can see.
> > 11) cpu load does go fairly high 15 to 20 at times.
> > any help or pointers appreciated
> >
> > thanks
> > rick
>
>


Re: SolrClould 6.6 stability challenges

2017-11-04 Thread Emir Arnautović
Hi Rick,
Do you see any errors in the logs? Do you have any monitoring tool? Maybe you can 
check heap and GC metrics around the time when the incident happened. It is not a 
large heap, but a major GC could cause a pause long enough to trigger a snowball 
effect and end up with the node in recovery state.
What indexing rate do you observe? Why do you have max warming searchers set to 5 
(is that what you meant by autowarmingsearchers?) when you commit every 5 min? Why 
did you increase it: did you see errors with the default of 2? Maybe you commit 
after every bulk?
Do you see similar behaviour when you just do indexing without queries?
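
The question about warming searchers can be illustrated with a rough back-of-the-envelope model in Python. It assumes every commit opens a new searcher and that each searcher takes a fixed time to warm; the 30-second warm-up and the 2-second commit interval are made-up numbers, not measurements from this cluster.

```python
import math


def overlapping_warming_searchers(commit_interval_s, warm_time_s):
    """Upper bound on searchers warming at once if each commit opens a searcher."""
    return math.ceil(warm_time_s / commit_interval_s)


# Commit every 5 minutes, hypothetical 30 s warm-up: searchers never overlap.
print(overlapping_warming_searchers(300, 30))  # 1

# Hypothetical commit after every bulk, e.g. every 2 s, with the same warm-up.
print(overlapping_warming_searchers(2, 30))    # 15
```

Under this model a 5-minute commit interval alone should not need more than the default limit, which is why a raised limit hints at more frequent commits.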

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Nov 2017, at 05:15, Rick Dig  wrote:
> 
> hello all,
> we are trying to run solrcloud 6.6 in a production setting.
> here's our config and issue
> 1) 3 nodes, 1 shard, replication factor 3
> 2) all nodes are 16GB RAM, 4 core
> 3) Our production load is about 2000 requests per minute
> 4) index is fairly small, index size is around 400 MB with 300k documents
> 5) autocommit is currently set to 5 minutes (even though ideally we would
> like a smaller interval).
> 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes. from this point on the nodes never recover unless we stop
> indexing.  the master usually is the last one to fall.
> 8) there are maybe 5 to 7 processes indexing at the same time with document
> batch sizes of 500.
> 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> 10) no cpu and / or oom issues that we can see.
> 11) cpu load does go fairly high 15 to 20 at times.
> any help or pointers appreciated
> 
> thanks
> rick