Re: REBALANCELEADERS is not reliable

2019-01-08 Thread Erick Erickson
It's weirder than that. The current test on master assumes that the
node recorded as the leader in ZK really is the leader after the
rebalance command, see TestRebalanceLeaders.checkZkLeadersAgree().
But you're right, I don't see an actual check that the
collection's status agrees.
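FWIW, one way to do that check by hand -- a rough sketch, assuming a
collection named "test" and the stock Collections API; CLUSTERSTATUS
returns the cluster state, so the replica that BALANCESHARDUNIQUE tagged
shows up with "property.preferredleader":"true" and the actual leader
with "leader":"true", and the two can be compared per shard:

curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test&wt=json'
# for each shard, the replica carrying "property.preferredleader":"true"
# should be the same one carrying "leader":"true" after a successful rebalance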

That aside, though, there are several problems I'm uncovering:

1> BALANCESHARDUNIQUE can wind up with multiple
"preferredLeader" properties defined in the same slice. Somewhere
between the original code and now, a refactoring dropped the step
that removes the unique property from the replica that already
has it when it is being assigned to another replica
in the same slice.

2> To make it much worse, I've rewritten the tests
extensively and I can beast the rewritten tests 1,000
times with no failures. If I test manually by just issuing
the commands, everything works fine. By "testing manually"
I mean (working with 4 VMs, 10 shards x 4 replicas):
> create the collection
> issue the BALANCESHARDUNIQUE command
> issue the REBALANCELEADERS command


However, if instead I
> create the collection
> issue the BALANCESHARDUNIQUE command
> shut down 3 of the 4 Solr instances so all the leaders
   are on the same host
> restart the 3 instances
> issue the REBALANCELEADERS command
then it doesn't work.

At least that's what I think I'm seeing, but it makes no
real sense yet.
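For reference, that manual sequence is just the stock Collections API
calls -- a sketch assuming a collection named "test" on a local node,
with the CREATE parameters chosen to mirror the 10x4 layout above:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=10&replicationFactor=4&maxShardsPerNode=10'
curl 'http://localhost:8983/solr/admin/collections?action=BALANCESHARDUNIQUE&collection=test&property=preferredLeader'
curl 'http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=test&maxAtOnce=10&maxWaitSeconds=60'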

So I'm first trying to understand why my manual test
fails so regularly, then I can incorporate that setup
into the unit test (I'm thinking of just shutting down
and restarting some of the Jetty instances).

It's a total mystery to me why restarting Solr instances
should have any effect. That's certainly not
something that happens in the current test, so I have
hopes that tracking it down will expose the invalid
assumption I'm making, and then we can test for that too.

On Tue, Jan 8, 2019 at 1:42 AM Bernd Fehling
 wrote:
>
> Hi Erick,
>
> after some more hours of debugging, the rough result is that whoever implemented
> this leader election never checks whether an action actually produces the expected
> result. There are only checks for exceptions, true/false, new sequence
> numbers and so on, but never whether a leader election to the preferredLeader
> really took place.
>
> When doing a rebalanceleaders to preferredLeader I also have to check that:
> - a rebalance took place
> - the preferredLeader has really become the leader (and not anyone else)
>
> Currently this is not checked, and calling rebalanceleaders for the preferredLeader
> is a shot in the dark with hope of success. And that's why these
> problems have never been discovered or reported.
>
> Bernd
>
>
> > On 21.12.18 at 18:00, Erick Erickson wrote:
> > I looked at the test last night and it's...disturbing. It succeeds
> > 100% of the time. Manual testing seems to fail very often.
> > Of course it was late and I was a bit cross-eyed, so maybe
> > I wasn't looking at the manual tests correctly. Or maybe the
> > test is buggy.
> >
> > I beasted the test 100x last night and all of them succeeded.
> >
> > This was with all NRT replicas.
> >
> > Today I'm going to modify the test into a stand-alone program
> > to see if it's something in the test environment that causes
> > it to succeed. I've got to get this to fail as a unit test before I
> > have confidence in any fixes, and also confidence that things
> > like this will be caught going forward.
> >
> > Erick
> >
> > On Fri, Dec 21, 2018 at 3:59 AM Bernd Fehling
> >  wrote:
> >>
> >> As far as I could see with the debugger there is still a problem in requeueing.
> >>
> >> There is a watcher, and it is recognized that the watcher is not the
> >> preferredLeader. So it tries to locate a preferredLeader, with success.
> >> It then calls makeReplicaFirstWatcher and gets a new sequence number for
> >> the preferredLeader replica. But now we have two replicas with the same
> >> sequence number: the replica which already owned that sequence number and
> >> the replica which was just given the same number as its new sequence number.
> >> It then tries to resolve this with queueNodesWithSameSequence.
> >> The problem might be somewhere in rejoinElection.
> >> At least the call to rejoinElection seems right: for the preferredLeader
> >> rejoinAtHead is true, and for the other replica with the same sequence number
> >> rejoinAtHead is false.
> >>
> >> A test case should have 3 shards with 3 cores per shard and should try to
> >> set preferredleader to different replicas at random. And then try to
> >> rebalance and check the results.
> >>
> >> So far, regards, Bernd
> >>
> >>
> >> On 21.12.18 at 07:11, Erick Erickson wrote:
> >>> I'm reworking the test case, so hold off on doing that. If you want to
> >>> raise a JIRA, though, please do and attach your patch...
> >>>
> >>> On Thu, Dec 20, 2018 at 10:53 AM Erick Erickson  
> >>> wrote:
> 
>  Nothing that I know of was _intentionally_ changed with this between
>  

Re: Solr relevancy score different on replicated nodes

2019-01-08 Thread Erick Erickson
bq. Shouldn't both replica and leader come to the same state
after such a long period?

No. After that long, the docs will be the same: all the docs
present on one replica will be present and searchable on
the other. However, they will be in different segments, so the
"stats skew" will remain.

But displaying the scores isn't a good reason to worry about
this. Frankly, that's almost always a mistake. Scores are
meaningless outside of ranking the docs _in a single
query_. The fact that a doc in one query got a score of 10 while
some other doc in some other query scored 5 says nothing
at all about whether one was "twice as good" as
the other. Even within the same query, those same two
scores don't mean one doc is "twice as good".

Frankly, I think this is a waste of effort. At best, I've seen
UIs that display, say, 1 to 5 stars, which just
show the percentile the particular doc had
_relative to the max score of that query_, unrelated
to any other query.

If you insist (and again I think it's a mistake) you can
optimize periodically, but if you're using anything
earlier than Solr 7.5 that has its own traps and I do
NOT recommend it unless you can do it every time
you change your index. See:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
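If you do go the optimize route, it is a single update call -- a sketch
assuming a collection named "products"; by default optimize rewrites the
whole index into one segment and purges deleted docs, which is what makes
the stats agree again, and is exactly the behavior the articles above warn
about on pre-7.5 versions:

curl 'http://localhost:8983/solr/products/update?optimize=true'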

On Tue, Jan 8, 2019 at 7:28 AM Ashish Bisht  wrote:
>
> Thank you Erick for explaining.
>
> In my scenario, I stopped indexing and updates too and waited for 1 day.
> Restarted Solr too. Shouldn't both replica and leader come to the same state
> after such a long period? As you said this gets corrected by segment
> merging; I hope that is an internal process itself and no manual activity is required.
>
> For us the score matters, as we are using it to display some scenarios on search
> and it gave changing values. As of now we depend on a single
> shard/replica, but in future we might need more replicas.
> Will planning indexing and updates outside peak query hours help?
>
> I have tried the exact cache while debugging the score difference during
> sharding. It didn't help much. Anyhow, that's a different topic.
>
> Thanks again,
>
> Regards
> Ashish Bisht
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr code base setup

2019-01-08 Thread Erick Erickson
This link seems to be temporarily down; please check it again as time
passes: https://www.wiki.apache.org/solr/HowToContribute
This one also gets you
started: https://wiki.apache.org/lucene-java/HowToContribute

In short:

Get the source code.

Install Apache Ant

In the parent directory run "ant" with no target. That'll give you
a list of targets; the interesting ones at this point are "idea",
"netbeans" or "eclipse". Those will create a project you can open in
the respective IDE. NOTE: the first time you run an ant target, you
may be prompted to run a specific target to install "ivy".

For Solr, go into the solr directory and execute "ant server" and/or
"ant dist" to build Solr. At that point you have just what you'd get
from downloading and exploding the standard distro. Of course you have
to have java installed. Java 8 is the usual standard at this point.
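Condensed into commands, the whole flow looks roughly like this -- a
sketch assuming you clone the GitHub mirror and want the IntelliJ
project (ivy-bootstrap is, as far as I recall, the target the "install
ivy" prompt points you at; the eclipse/netbeans targets work the same
way as idea):

git clone https://github.com/apache/lucene-solr.git
cd lucene-solr
ant ivy-bootstrap   # one-time: installs Ivy so Ant can resolve dependencies
ant                 # with no target: prints the list of available targets
ant idea            # or "ant eclipse" / "ant netbeans" to generate IDE project files
cd solr
ant server          # builds a runnable Solr, like the binary distro
ant dist            # builds the distribution artifacts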

Again, the Solr "How to Contribute" page walks you through the steps
in more detail. I don't know why it's not reachable at present, but it
should come back at some point.

Best,
Erick

On Tue, Jan 8, 2019 at 9:36 AM Rajdeep Sahoo  wrote:
>
> How can I set up the code base and make changes?


Re: Questions for SynonymGraphFilter and WordDelimiterGraphFilter

2019-01-08 Thread Wei
bump..

On Mon, Jan 7, 2019 at 11:53 AM Wei  wrote:

> Thanks Thomas. You mentioned "Also there is no need for the
> FlattenGraphFilter", that's quite interesting because the Solr
> documentation says it's mandatory for indexing:
> https://lucene.apache.org/solr/guide/7_6/filter-descriptions.html. Is
> there any more explanation for this?
>
> Best regards,
> Wei
>
>
> On Mon, Jan 7, 2019 at 7:56 AM Thomas Aglassinger <
> t.aglassin...@netconomy.net> wrote:
>
>> Hi Wei,
>>
>> here's a fairly simple field type we currently use in a project that
>> seems to do the job with graph synonyms. Maybe this helps as a starting
>> point for you:
>>
>> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="..." managed="de" />
>>     <filter class="solr.ManagedSynonymGraphFilterFactory" managed="de" />
>>     <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>>   </analyzer>
>> </fieldType>
>>
>> As you can see we use the same filters for both indexing and query, so
>> this might have some impact on positional queries but so far it seems
>> negligible for the short synonyms we use in practice. Also there is no need
>> for the FlattenGraphFilter.
>>
>> The WhitespaceTokenizerFactory ensures that you can define synonyms with
>> hyphens like mac-book -> macbook.
>>
>> Best regards, Thomas.
>>
>>
>> On 05.01.19, 02:11, "Wei"  wrote:
>>
>> Hello,
>>
>> We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and
>> WordDelimiterFilter have been deprecated. The Solr docs recommend using
>> SynonymGraphFilter and WordDelimiterGraphFilter instead.
>> I guess the StopFilter messes up the SynonymGraphFilter output? Not sure
>> if it's a Solr defect or there is a guideline that StopFilter should
>> not be put after graph filters.
>>
>> Thanks in advance for your input.
>>
>>
>> Thanks,
>>
>> Wei
>>
>>
>>


Fwd: Setting Solr Home via installation script

2019-01-08 Thread Stephon Harris
Seeing if anyone has any thoughts on this again.

-- Forwarded message -
From: Stephon Harris 
Date: Mon, Jan 7, 2019 at 10:05 AM
Subject: Setting Solr Home via installation script
To: 



I am trying to install Solr as a service so that when a restart takes place
the Solr home directory is set to `example/schemaless/solr`, where there are
cores I created while running Solr in the schemaless example.



As instructed in Taking Solr to Production,
I ran the command sudo bash ./install_solr_service.sh solr-7.4.0.tgz -i
/opt/ -d example/schemaless/solr -u solr -s solr -p 8983 and it started
Solr successfully; however, the Solr home was set to /var/solr/data. I
thought that with the -d option the Solr home would be set to
example/schemaless/solr. What should I do to get the Solr home set to
example/schemaless/solr? Is there another way I should go about getting the
cores that I created under the schemaless directory into the Solr home
directory?
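For context, the two workarounds I have been considering -- a rough sketch,
assuming the standard layout the install script creates (an include file at
/etc/default/solr.in.sh and the default home at /var/solr/data), and with the
source path adjusted to wherever the schemaless example actually lives; I'm
not sure either is the intended way, hence the question:

# Option 1: point the service at the existing directory
sudo vi /etc/default/solr.in.sh      # set SOLR_HOME=/opt/solr/example/schemaless/solr
sudo service solr restart

# Option 2: keep the default home and copy the core directories into it
sudo cp -r /opt/solr/example/schemaless/solr/* /var/solr/data/
sudo chown -R solr:solr /var/solr/data
sudo service solr restart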

-- 
Stephon Harris

*Enterprise Knowledge, LLC*
*Web: *http://www.enterprise-knowledge.com/

*E-mail:* shar...@enterprise-knowledge.com/

*Cell:* 832-628-8352



-- 
Stephon Harris

*Enterprise Knowledge, LLC*
*Web: *http://www.enterprise-knowledge.com/

*E-mail:* shar...@enterprise-knowledge.com/

*Cell:* 832-628-8352


Re: SolrCloud 6.5.1 Stability/Recovery Issues

2019-01-08 Thread Shawn Heisey

On 1/8/2019 12:12 PM, Johnston, Charlie wrote:

We have been using Solr 6.5.1 leveraging SolrCloud backed by ZooKeeper for a 
multi-client, multi-node cluster for several months now and have been having a 
few stability/recovery issues we’d like to confirm if they are fixed in Solr 7 
or not. We run 3 large Solr nodes in the cluster (each with 64 GBs of heap, 
100’s of collections, and 9000+ cores).


Managing that many indexes is currently SolrCloud's Achilles' heel.

Splitting the cluster across more Solr nodes will help to some degree, 
but dealing with thousands of replicas in a single cluster is simply not 
going to scale.  In addition to splitting the indexes across more nodes, 
you may also need to create multiple clusters so that each cluster is 
managing a smaller number of shard replicas.


This is a known problem, and there is constant work underway to try 
and improve the situation.  I need to repeat the experiments that I did 
on SOLR-7191 on a much newer version so that I can have a better idea of 
whether the situation has improved in 7.x versions.


The discussion on SOLR-7191 is long and very dense, but it might be 
worth reading.  You have about twice as many cores as I was creating in 
my experiments, which means that Solr will be processing more messages 
for recovery operations:


https://issues.apache.org/jira/browse/SOLR-7191

I think that SOLR-10265 pinpoints the central problem that causes these 
issues.  Some of its sub-issues have been implemented, which MIGHT mean 
that 7.x is a lot better off:


https://issues.apache.org/jira/browse/SOLR-10265

Thanks,
Shawn



Re: Inverted index and forward index in Solr sharding

2019-01-08 Thread Shawn Heisey

On 1/8/2019 10:34 AM, Rajdeep Sahoo wrote:

Can anyone explain to me what the inverted index and forward index used in Solr
sharding are? Is there any resource where I can find some knowledge about
this?


Most of Solr's functionality is provided by Lucene, which is a 
Java-based programming API that provides full-text search.


https://lucene.apache.org/core/
https://lucene.apache.org/core/7_6_0/index.html

The inverted index is the central feature of Lucene, where most of its 
capability starts.


I've never heard of a forward index.  The closest things I can think of 
that might satisfy that terminology are either stored fields or DocValues.


If you want in-depth information about Lucene, you may find better 
answers on the mailing list for that product than this mailing list.


https://lucene.apache.org/core/discussion.html

To answer a later message where you asked about the source code:

https://wiki.apache.org/solr/HowToContribute

Note that when you get the source code for Solr, you are also getting 
all the source code for Lucene.  Both projects share the same codebase.


Thanks,
Shawn



Re: Tool to format the solr query for easier reading?

2019-01-08 Thread Jan Høydahl
I find myself doing exactly the same, so such a tool would be wonderful.
I sometimes use the Solr Query Debugger Chrome plugin, but that does not help
with decoding the q parameter, only with seeing all the params more easily and
navigating explain.
And sometimes I use https://explain.solr.pl/explains/new for decoding explain.
Lucene already parses the query string and builds a Lucene query; I wonder if
it would be easiest to consume that object graph with some kind of plugin to
produce an indented view of the query. Or perhaps that could be a new display
mode for debug=query as a standard feature?
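FWIW, a quick way to look at what Lucene actually built today -- a sketch
assuming a local node and a collection named "mycoll"; the parsed form shows
up in the debug section of the response:

curl 'http://localhost:8983/solr/mycoll/select?q=type:(x+OR+y+OR+z)&rows=0&debug=query'
# the "debug" block of the response contains "rawquerystring", "querystring",
# "parsedquery" and "parsedquery_toString" -- the last two are rendered from
# the Lucene query object graph mentioned above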

Another option you have is of course to start sending your Solr queries as JSON
DSL (https://lucene.apache.org/solr/guide/7_6/json-query-dsl.html) in the first
place :)
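For example, roughly -- untested, with the structure taken from that guide page
and the collection name assumed:

curl -H 'Content-Type: application/json' 'http://localhost:8983/solr/mycoll/select' -d '
{
  "query": {
    "bool": {
      "must": [
        { "lucene": { "df": "system", "query": "a" } },
        { "lucene": { "query": "type:(x OR y OR z)" } }
      ]
    }
  }
}'
# the nesting is explicit in the JSON structure, so the grouping problem from
# the original mail largely goes away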

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com 

> On 8 Jan 2019, at 05:33, Hullegård, Jimi wrote:
> 
> Hi,
> 
> I often find myself having to analyze an already existing solr query. But 
> when the number of clauses and/or number of nested parentheses reach a 
> certain level I can no longer grasp what the query is about by just a quick 
> glance. Sometimes I can look at the code generating the query, but it might 
> be autogenerated in a complex way, or I might only have access to a log 
> output of the query.
> 
> Here is an example query, based on a real query in our system:
> 
> 
> system:(a) type:(x OR y OR z) date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS] 
> ((boolean1:false OR date2:[* TO 2019-08-31T06:15:00Z/DAY-30DAYS])) 
> -date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (((*:* -date4:*) OR date5:* OR 
> date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]))
> 
> 
> Here I find it quite difficult to see what clauses are grouped together (using 
> parentheses). What I tend to do in these circumstances is to copy the query 
> into a simple text editor, and then manually add line breaks and indentation 
> matching the parentheses levels.
> 
> For the query above, it would result in something like this:
> 
> 
> system:(a)
> type:(x OR y OR z)
> date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]
> (
> (boolean1:false OR date2:[* TO 
> 2019-08-31T06:15:00Z/DAY-30DAYS])
> )
> -date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *]
> (
> ((*:* -date4:*) OR date5:* OR date3:[* TO 
> 2019-08-31T06:15:00Z/DAY+1DAYS])
> )
> 
> 
> But that is a slow process, and I might make a mistake that messes up the 
> interpretation completely. Especially when there are several levels of nested 
> parentheses.
> 
> Does anyone know of any kind of tool that would help automate this? It 
> wouldn't have to format its output like my example, as long as it makes it 
> easier to see what start and end parentheses belong to each other, preferably 
> using multiple lines and indentation.
> 
> A java tool would be perfect, because then I could easily integrate it into 
> our existing debugging tools, but an online formatter (like 
> http://jsonformatter.curiousconcept.com) would also be very useful.
> 
> Regards
> /Jimi
> 



SolrCloud 6.5.1 Stability/Recovery Issues

2019-01-08 Thread Johnston, Charlie
Hi,

We have been using Solr 6.5.1 leveraging SolrCloud backed by ZooKeeper for a 
multi-client, multi-node cluster for several months now and have been having a 
few stability/recovery issues we’d like to confirm if they are fixed in Solr 7 
or not. We run 3 large Solr nodes in the cluster (each with 64 GBs of heap, 
100’s of collections, and 9000+ cores). We have found that if anything happens 
(e.g. long garbage collection pauses) that puts replicas of the same shard on 
more than one node into the down/recovering state at the same time, then they 
usually fail to complete the recovery process and reach the active state again, 
even though all of the nodes in the cluster are up and running. The same thing 
happens if we shut down multiple nodes and try to start them up at the same 
time. It looks like each replica thinks the other is supposed to be the leader 
and results in neither ever accepting the responsibility. Is this a common 
error in SolrCloud, and, to the best of the mailing list's knowledge, has it
been fixed in any Solr 7 release?

A second issue we have faced with the same cluster is that when the above 
situation arises, our recovery procedure is to stop all the impacted nodes 
(sometimes it’s all 3 nodes in the cluster) and start them one at a time, 
waiting for all cores on each node to recover before starting the next one. 
During this process we have found two different issues. One issue is that 
recovering the nodes seems to take a long time, sometimes more than one hour 
for all the cores to move into the active state. Another and more pressing 
issue that we have run into is sometimes it seems the order in which we start 
the servers back up matters. We sometimes run into cases where we start a node 
and all but a few (< 10 cores) won’t recover even after several hours and 
several attempts at restarting the same node. These cores never leave the down 
state. To fix this we need to then stop the node and attempt starting another 
node until we find one that fully recovers so that we can return to the 
originally problematic node to try again- which has always worked in the end 
but only after a lot of pain. Our hope is to get to a day where we can start 
all the nodes in the cluster at the same time and have it “just work”, i.e.
converge on a fully active state 100% of the time, vs. managing this
one-node-at-a-time process. Our expectation is that if all the nodes in the
cluster are up and healthy then all of the cores in the cluster would
eventually reach the active state, but that seems to just not be the case.
Are there any fixes in 
Solr 7 or potentially some configuration we can add to Solr 6 that resolve 
either of these issues?

Best,
Charlie Johnston


This message may contain information that is confidential or privileged. If you 
are not the intended recipient, please advise the sender immediately and delete 
this message. See 
http://www.blackrock.com/corporate/compliance/email-disclaimers for further 
information.  Please refer to 
http://www.blackrock.com/corporate/compliance/privacy-policy for more 
information about BlackRock’s Privacy Policy.
For a list of BlackRock's office addresses worldwide, see 
http://www.blackrock.com/corporate/about-us/contacts-locations.

© 2019 BlackRock, Inc. All rights reserved.


Solr block join

2019-01-08 Thread Rajdeep Sahoo
What is the use of a block join in Solr?


Solr code base setup

2019-01-08 Thread Rajdeep Sahoo
How can I set up the code base and make changes?


Inverted index and forward index in Solr sharding

2019-01-08 Thread Rajdeep Sahoo
Can anyone explain to me what the inverted index and forward index used in Solr
sharding are? Is there any resource where I can find some knowledge about
this?


Re: Solr relevancy score different on replicated nodes

2019-01-08 Thread Ashish Bisht
Thank you Erick for explaining. 

In my scenario, I stopped indexing and updates too and waited for 1 day.
Restarted Solr too. Shouldn't both replica and leader come to the same state
after such a long period? As you said this gets corrected by segment
merging; I hope that is an internal process itself and no manual activity is required.

For us the score matters, as we are using it to display some scenarios on search
and it gave changing values. As of now we depend on a single
shard/replica, but in future we might need more replicas.
Will planning indexing and updates outside peak query hours help?

I have tried the exact cache while debugging the score difference during
sharding. It didn't help much. Anyhow, that's a different topic.

Thanks again, 

Regards
Ashish Bisht





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


SV: SV: Tool to format the solr query for easier reading?

2019-01-08 Thread Hullegård , Jimi
I tried it now (but I had to install it from the official chrome store, the 
link from your blog didn't work). But the only formatting it seems to be doing 
is adding a line break after each "OR", which doesn't help much.

Although, to be fair, the plugin is still "Evaluating..." my query, so in 
theory I guess that it could do some more formatting when it is done (whenever 
that would be, it has been stuck in this "Evaluating..." for about 10 minutes 
now).

-----Original Message-----
From: Charlie Hull
Sent: 8 January 2019 16:32
To: solr-user@lucene.apache.org
Subject: Re: SV: Tool to format the solr query for easier reading?


Hi Jimi,

I recalled that the Chrome plugin would do this; obviously it's not a perfect
solution for you as you'd prefer a Java formatter, but it's a start - have you
tried this one?

Best

Charlie
>
> /Jimi
>
> -----Original Message-----
> From: Charlie Hull
> Sent: 8 January 2019 15:55
> To: solr-user@lucene.apache.org
> Subject: Re: Tool to format the solr query for easier reading?
>
> On 08/01/2019 04:33, Hullegård, Jimi wrote:
>> Hi,
>
> Hi Jimi,
>
> There are some suggestions in part 4 of my recent blog:
> http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/
>
> Cheers
>
> Charlie
>>
>> I often find myself having to analyze an already existing solr query. But 
>> when the number of clauses and/or number of nested parentheses reach a 
>> certain level I can no longer grasp what the query is about by just a quick 
>> glance. Sometimes I can look at the code generating the query, but it might 
>> be autogenerated in a complex way, or I might only have access to a log 
>> output of the query.
>>
>> Here is an example query, based on a real query in our system:
>>
>>
>> system:(a) type:(x OR y OR z) date1:[* TO
>> 2019-08-31T06:15:00Z/DAY+1DAYS] ((boolean1:false OR date2:[* TO
>> 2019-08-31T06:15:00Z/DAY-30DAYS]))
>> -date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (((*:* -date4:*) OR
>> date5:* OR date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]))
>>
>>
>> Here I find it quite difficult to see what clauses are grouped together (using 
>> parentheses). What I tend to do in these circumstances is to copy the query 
>> into a simple text editor, and then manually add line breaks and indentation 
>> matching the parentheses levels.
>>
>> For the query above, it would result in something like this:
>>
>>
>> system:(a)
>> type:(x OR y OR z)
>> date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS] (
>>(boolean1:false OR date2:[* TO
>> 2019-08-31T06:15:00Z/DAY-30DAYS])
>> )
>> -date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (
>>((*:* -date4:*) OR date5:* OR date3:[*
>> TO 2019-08-31T06:15:00Z/DAY+1DAYS])
>> )
>>
>>
>> But that is a slow process, and I might make a mistake that messes up the 
>> interpretation completely. Especially when there are several levels of 
>> nested parentheses.
>>
>> Does anyone know of any kind of tool that would help automate this? It 
>> wouldn't have to format its output like my example, as long as it makes it 
>> easier to see what start and end parentheses belong to each other, 
>> preferably using multiple lines and indentation.
>>
>> A java tool would be perfect, because then I could easily integrate it into 
>> our existing debugging tools, but an online formatter (like 
>> http://jsonformatter.curiousconcept.com) would also be very useful.
>>
>> Regards
>> /Jimi
>>
>>
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: REBALANCELEADERS is not reliable

2019-01-08 Thread Bernd Fehling

Hi Erick,

after some more hours of debugging, the rough result is that whoever implemented
this leader election never checks whether an action actually produces the expected
result. There are only checks for exceptions, true/false, new sequence
numbers and so on, but never whether a leader election to the preferredLeader
really took place.

When doing a rebalanceleaders to preferredLeader I also have to check that:
- a rebalance took place
- the preferredLeader has really become the leader (and not anyone else)

Currently this is not checked, and calling rebalanceleaders for the preferredLeader
is a shot in the dark with hope of success. And that's why these
problems have never been discovered or reported.

Bernd


On 21.12.18 at 18:00, Erick Erickson wrote:

I looked at the test last night and it's...disturbing. It succeeds
100% of the time. Manual testing seems to fail very often.
Of course it was late and I was a bit cross-eyed, so maybe
I wasn't looking at the manual tests correctly. Or maybe the
test is buggy.

I beasted the test 100x last night and all of them succeeded.

This was with all NRT replicas.

Today I'm going to modify the test into a stand-alone program
to see if it's something in the test environment that causes
it to succeed. I've got to get this to fail as a unit test before I
have confidence in any fixes, and also confidence that things
like this will be caught going forward.

Erick

On Fri, Dec 21, 2018 at 3:59 AM Bernd Fehling
 wrote:


As far as I could see with the debugger there is still a problem in requeueing.

There is a watcher, and it is recognized that the watcher is not the
preferredLeader. So it tries to locate a preferredLeader, with success.
It then calls makeReplicaFirstWatcher and gets a new sequence number for
the preferredLeader replica. But now we have two replicas with the same
sequence number: the replica which already owned that sequence number and
the replica which was just given the same number as its new sequence number.
It then tries to resolve this with queueNodesWithSameSequence.
The problem might be somewhere in rejoinElection.
At least the call to rejoinElection seems right: for the preferredLeader
rejoinAtHead is true, and for the other replica with the same sequence number
rejoinAtHead is false.

A test case should have 3 shards with 3 cores per shard and should try to
set preferredleader to different replicas at random. And then try to
rebalance and check the results.

So far, regards, Bernd


On 21.12.18 at 07:11, Erick Erickson wrote:

I'm reworking the test case, so hold off on doing that. If you want to
raise a JIRA, though, please do and attach your patch...

On Thu, Dec 20, 2018 at 10:53 AM Erick Erickson  wrote:


Nothing that I know of was _intentionally_ changed with this between
6x and 7x. That said, nothing that I know of was done to verify that
TLOG and PULL replicas (added in 7x) were handled correctly. There's a
test "TestRebalanceLeaders" for this functionality that has run since
the feature was put in, but it has _not_ been modified to create TLOG
and PULL replicas and test with those.

For this patch to be complete, we should either extend that test or
make another that fails without this patch and succeeds with it.

I'd probably recommend modifying TestRebalanceLeaders to randomly
create TLOG and (maybe) PULL replicas so we'd keep covering the
various cases.

Best,
Erick


On Thu, Dec 20, 2018 at 8:06 AM Bernd Fehling
 wrote:


Hi Vadim,
I just tried it with 6.6.5.
In my test cloud with 5 shards, 5 nodes, 3 cores per node it missed
one shard that was supposed to change leader. But I noticed that that one
already was leader. No errors or exceptions in the logs.
Maybe I should enable debug logging and try again to see all the logging
messages from the patch.

Maybe they also changed other parts between 6.6.5 and 7.6.0 so that
it works for you.

I also just changed from ZooKeeper 3.4.10 to 3.4.13, which works fine,
even with the 3.4.10 dataDir. No errors, no complaints. Seems to be compatible.

Regards, Bernd


On 20.12.18 at 12:31, Vadim Ivanov wrote:

Yes! It works!
I have tested RebalanceLeaders today with the patch provided by Endika Posadas. 
(http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html)
And at last it works as expected on my collection with 5 nodes and about 400 
shards.
The original patch was slightly incompatible with 7.6.0.
I hope this patch will help people try this feature with 7.6:
https://drive.google.com/file/d/19z_MPjxItGyghTjXr6zTCVsiSJg1tN20

RebalanceLeaders was not a very useful feature before 7.0 (as all replicas were
NRT).
But new replica types made it very helpful to keep big clusters in order...

I wonder why there isn't any JIRA about this case (or maybe I missed it)?
Anyone who cares, please help to create a JIRA and improve this feature in the
nearest release.



Re: SV: Tool to format the solr query for easier reading?

2019-01-08 Thread Charlie Hull

On 08/01/2019 09:20, Hullegård, Jimi wrote:

Hi Charlie,

Care to elaborate on that a little? I can't seem to find any tool in that blog 
entry that formats a given solr query. What tool did you have in mind?


Hi Jimi,

I recalled that the Chrome plugin would do this; obviously it's not a
perfect solution for you as you'd prefer a Java formatter, but it's a
start - have you tried this one?


Best

Charlie


/Jimi

-----Original Message-----
From: Charlie Hull
Sent: 8 January 2019 15:55
To: solr-user@lucene.apache.org
Subject: Re: Tool to format the solr query for easier reading?

On 08/01/2019 04:33, Hullegård, Jimi wrote:

Hi,


Hi Jimi,

There are some suggestions in part 4 of my recent blog:
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/

Cheers

Charlie


I often find myself having to analyze an already existing solr query. But when 
the number of clauses and/or number of nested parentheses reach a certain level 
I can no longer grasp what the query is about by just a quick glance. Sometimes 
I can look at the code generating the query, but it might be autogenerated in a 
complex way, or I might only have access to a log output of the query.

Here is an example query, based on a real query in our system:


system:(a) type:(x OR y OR z) date1:[* TO
2019-08-31T06:15:00Z/DAY+1DAYS] ((boolean1:false OR date2:[* TO
2019-08-31T06:15:00Z/DAY-30DAYS]))
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (((*:* -date4:*) OR
date5:* OR date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]))


Here I find it quite difficult to see what clauses are grouped together (using 
parentheses). What I tend to do in these circumstances is to copy the query 
into a simple text editor, and then manually add line breaks and indentation 
matching the parentheses levels.

For the query above, it would result in something like this:


system:(a)
type:(x OR y OR z)
date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS] (
   (boolean1:false OR date2:[* TO
2019-08-31T06:15:00Z/DAY-30DAYS])
)
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (
   ((*:* -date4:*) OR date5:* OR date3:[*
TO 2019-08-31T06:15:00Z/DAY+1DAYS])
)


But that is a slow process, and I might make a mistake that messes up the 
interpretation completely. Especially when there are several levels of nested 
parentheses.

Does anyone know of any kind of tool that would help automate this? It wouldn't 
have to format its output like my example, as long as it makes it easier to see 
what start and end parentheses belong to each other, preferably using multiple 
lines and indentation.

A java tool would be perfect, because then I could easily integrate it into our 
existing debugging tools, but an online formatter (like 
http://jsonformatter.curiousconcept.com) would also be very useful.

Regards
/Jimi





--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: SV: Tool to format the solr query for easier reading?

2019-01-08 Thread Charlie Hull

On 08/01/2019 09:20, Hullegård, Jimi wrote:

Hi Charlie,

Care to elaborate on that a little? I can't seem to find any tool in that blog 
entry that formats a given solr query. What tool did you have in mind?


This also does some basic URL splitting: 
https://www.freeformatter.com/url-parser-query-string-splitter.html


Cheers

Charlie


/Jimi

-----Original Message-----
From: Charlie Hull
Sent: 8 January 2019 15:55
To: solr-user@lucene.apache.org
Subject: Re: Tool to format the solr query for easier reading?

On 08/01/2019 04:33, Hullegård, Jimi wrote:

Hi,


Hi Jimi,

There are some suggestions in part 4 of my recent blog:
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/

Cheers

Charlie


I often find myself having to analyze an already existing solr query. But when 
the number of clauses and/or number of nested parentheses reach a certain level 
I can no longer grasp what the query is about by just a quick glance. Sometimes 
I can look at the code generating the query, but it might be autogenerated in a 
complex way, or I might only have access to a log output of the query.

Here is an example query, based on a real query in our system:


system:(a) type:(x OR y OR z) date1:[* TO
2019-08-31T06:15:00Z/DAY+1DAYS] ((boolean1:false OR date2:[* TO
2019-08-31T06:15:00Z/DAY-30DAYS]))
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (((*:* -date4:*) OR
date5:* OR date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]))


Here I find it quite difficult to see what clauses are grouped together (using 
parentheses). What I tend to do in these circumstances is to copy the query 
into a simple text editor, and then manually add line breaks and indentation 
matching the parentheses levels.

For the query above, it would result in something like this:


system:(a)
type:(x OR y OR z)
date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS] (
   (boolean1:false OR date2:[* TO
2019-08-31T06:15:00Z/DAY-30DAYS])
)
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (
   ((*:* -date4:*) OR date5:* OR date3:[*
TO 2019-08-31T06:15:00Z/DAY+1DAYS])
)


But that is a slow process, and I might make a mistake that messes up the 
interpretation completely. Especially when there are several levels of nested 
parentheses.

Does anyone know of any kind of tool that would help automate this? It wouldn't 
have to format its output like my example, as long as it makes it easier to see 
what start and end parentheses belong to each other, preferably using multiple 
lines and indentation.

A java tool would be perfect, because then I could easily integrate it into our 
existing debugging tools, but an online formatter (like 
http://jsonformatter.curiousconcept.com) would also be very useful.

Regards
/Jimi





--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


SV: Tool to format the solr query for easier reading?

2019-01-08 Thread Hullegård , Jimi
Hi Charlie,

Care to elaborate on that a little? I can't seem to find any tool in that blog 
entry that formats a given solr query. What tool did you have in mind?

/Jimi

-----Original Message-----
From: Charlie Hull
Sent: 8 January 2019 15:55
To: solr-user@lucene.apache.org
Subject: Re: Tool to format the solr query for easier reading?

On 08/01/2019 04:33, Hullegård, Jimi wrote:
> Hi,

Hi Jimi,

There are some suggestions in part 4 of my recent blog:
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/

Cheers

Charlie
>
> I often find myself having to analyze an already existing solr query. But 
> when the number of clauses and/or number of nested parentheses reach a 
> certain level I can no longer grasp what the query is about by just a quick 
> glance. Sometimes I can look at the code generating the query, but it might 
> be autogenerated in a complex way, or I might only have access to a log 
> output of the query.
>
> Here is an example query, based on a real query in our system:
>
>
> system:(a) type:(x OR y OR z) date1:[* TO
> 2019-08-31T06:15:00Z/DAY+1DAYS] ((boolean1:false OR date2:[* TO
> 2019-08-31T06:15:00Z/DAY-30DAYS]))
> -date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (((*:* -date4:*) OR
> date5:* OR date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]))
>
>
> Here I find it quite difficult to see what clauses are grouped together (using 
> parentheses). What I tend to do in these circumstances is to copy the query 
> into a simple text editor, and then manually add line breaks and indentation 
> matching the parentheses levels.
>
> For the query above, it would result in something like this:
>
>
> system:(a)
> type:(x OR y OR z)
> date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS] (
>   (boolean1:false OR date2:[* TO
> 2019-08-31T06:15:00Z/DAY-30DAYS])
> )
> -date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (
>   ((*:* -date4:*) OR date5:* OR date3:[*
> TO 2019-08-31T06:15:00Z/DAY+1DAYS])
> )
>
>
> But that is a slow process, and I might make a mistake that messes up the 
> interpretation completely. Especially when there are several levels of nested 
> parentheses.
>
> Does anyone know of any kind of tool that would help automate this? It 
> wouldn't have to format its output like my example, as long as it makes it 
> easier to see what start and end parentheses belong to each other, preferably 
> using multiple lines and indentation.
>
> A java tool would be perfect, because then I could easily integrate it into 
> our existing debugging tools, but an online formatter (like 
> http://jsonformatter.curiousconcept.com) would also be very useful.
>
> Regards
> /Jimi
>
>


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Tool to format the solr query for easier reading?

2019-01-08 Thread Charlie Hull

On 08/01/2019 04:33, Hullegård, Jimi wrote:

Hi,


Hi Jimi,

There are some suggestions in part 4 of my recent blog: 
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/


Cheers

Charlie


I often find myself having to analyze an already existing solr query. But when 
the number of clauses and/or number of nested parentheses reach a certain level 
I can no longer grasp what the query is about by just a quick glance. Sometimes 
I can look at the code generating the query, but it might be autogenerated in a 
complex way, or I might only have access to a log output of the query.

Here is an example query, based on a real query in our system:


system:(a) type:(x OR y OR z) date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS] 
((boolean1:false OR date2:[* TO 2019-08-31T06:15:00Z/DAY-30DAYS])) 
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *] (((*:* -date4:*) OR date5:* OR 
date3:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]))


Here I find it quite difficult to see what clauses are grouped together (using 
parentheses). What I tend to do in these circumstances is to copy the query 
into a simple text editor, and then manually add line breaks and indentation 
matching the parentheses levels.

For the query above, it would result in something like this:


system:(a)
type:(x OR y OR z)
date1:[* TO 2019-08-31T06:15:00Z/DAY+1DAYS]
(
  (boolean1:false OR date2:[* TO 
2019-08-31T06:15:00Z/DAY-30DAYS])
)
-date3:[2019-08-31T06:15:00Z/DAY+1DAYS TO *]
(
  ((*:* -date4:*) OR date5:* OR date3:[* TO 
2019-08-31T06:15:00Z/DAY+1DAYS])
)


But that is a slow process, and I might make a mistake that messes up the 
interpretation completely. Especially when there are several levels of nested 
parentheses.

Does anyone know of any kind of tool that would help automate this? It wouldn't 
have to format its output like my example, as long as it makes it easier to see 
what start and end parentheses belong to each other, preferably using multiple 
lines and indentation.

A java tool would be perfect, because then I could easily integrate it into our 
existing debugging tools, but an online formatter (like 
http://jsonformatter.curiousconcept.com) would also be very useful.

Regards
/Jimi





--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk