Re: Primary vs. replica shard inconsistencies?

Paul Smith Thu, 30 Jan 2014 17:30:17 -0800

if it helps at all, i've pushed the flappy item detector tool (*cough*)
here:


https://github.com/Aconex/es-flappyitem-detector

We have a simple 3-node cluster, 5 shards, 1 replica, so I'm sure there's
code in there that is built around those assumptions, but should be easily
modified to suit your purpose perhaps.

cheers,

Paul


On 31 January 2014 11:59, Paul Smith <tallpsm...@gmail.com> wrote:

> the flappy detection tool I have connects to the cluster using the
> standard java autodiscovery mechanism, and, works out which shards are
> involved, and then creates explicit TransportClient connection to each
> host, so would need access to 9300 (the SMILE based protocol port).  Would
> that help? (is 9300 accessible from a host that can run java ?
>
>
> On 31 January 2014 11:45, <xav...@gaikai.com> wrote:
>
>> We have 4 query heads total (esq1.r6, esq2.r6, esq3.r7, esq4.r7).
>> Interestingly query heads in the same rack give the same results. We don't
>> do deletes at all on these indices so that shouldn't be an issue.
>> Unfortunately at the moment I can't do preference=_local while getting the
>> _id(s) directly because we don't allow access on 9200 on our worker nodes.
>> I might be able to right some code to figure this out though. Either way
>> here's my id results from the different heads.
>>
>> esq2.r6 gets 28 total results
>> esq3.r7 gets 9 total results
>>
>> $curl -XGET "
>> http://esq2.r6:9200/events/_search?q=sessionId:1390953880&size=100"; | jq
>> '.hits.hits[]._id' | sort
>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
>>  Current
>>                                  Dload  Upload   Total   Spent    Left
>>  Speed
>> 100 19337  100 19337    0     0  1039k      0 --:--:-- --:--:-- --:--:--
>> 1049k
>> "0LcI_px4SZy5ZQkI_V7Qyw"
>> "1sAGREtMSfK8OIxZErm8RQ"
>> "6IV2v4TFTr-Gl1eC6hrj0Q"
>> "6nwMexTHQBmFxfykOgKqWA"
>> "7hFYs6y-QG6wGYEkoBKmdg"
>> "9MTM10SeQ2yqWIb08oPnFA"
>> "aELtGN6DQpmdRlQbr8i0uA"
>> "AUHUg6k0QZOf_oGjsjSsGA"
>> "Bo_u1eYGSF2LeU78kbcFZg"
>> "EWs1K8YsR9-IBSAWK6ld7A"
>> "Fx4l6_axSGCxpyFm7C7BSQ"
>> "gpCrAZrNTNezWPfensER3g"
>> "HAFmGcWuQAylxGjmnZZkSQ"
>> "HB4Kwz3RSWWH5NHvyH4JMg"
>> "H-eP-33FREOtq7v0uBPWbQ"
>> "_IH6W4DoTRmdms0FJNlg4g"
>> "iK_3TbzcSj2-MbMXip_XFg"
>> "J4bjPFIcQ1ewrQqjN2qz6Q"
>> "kfonMDBuR--UIhkyM2cWrg"
>> "Kr6-9-3uR9Wp2923n-O2NA"
>> "Nw_9rjwvQ62u-HsuWIm53A"
>> "QRmY8R2MQemuePb0EkYxWA"
>> "usloSJzQRzCpOQ8bxKi2vA"
>> "w9NGEWg-QiivMpjyurYKrA"
>> "wKy-YzB-TK2lnK86Sx2RBA"
>> "y2ZmJ-_GRAmi3eHy1y8jzw"
>> "ZmFj7w4hR5Cvy-owCLmZ1Q"
>> "ZmlndPBLT-ivuOxm_A7yDA"
>>
>> $curl -XGET "
>> http://esq3.r7:9200/events/_search?q=sessionId:1390953880&size=100"; | jq
>> '.hits.hits[]._id' | sort
>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
>>  Current
>>                                  Dload  Upload   Total   Spent    Left
>>  Speed
>> 100  6808  100  6808    0     0  70082      0 --:--:-- --:--:-- --:--:--
>> 70185
>> "1sAGREtMSfK8OIxZErm8RQ"
>> "7hFYs6y-QG6wGYEkoBKmdg"
>> "aELtGN6DQpmdRlQbr8i0uA"
>> "Fx4l6_axSGCxpyFm7C7BSQ"
>> "HAFmGcWuQAylxGjmnZZkSQ"
>> "H-eP-33FREOtq7v0uBPWbQ"
>> "QRmY8R2MQemuePb0EkYxWA"
>> "wKy-YzB-TK2lnK86Sx2RBA"
>> "y2ZmJ-_GRAmi3eHy1y8jzw"
>>
>> And here is es3.r7 with preference=_primary_first:
>>
>> $curl -XGET "
>> http://esq3.r7/events/_search?q=sessionId:1390953880&size=100&preference=_primary_first";
>> | jq '.hits.hits[]._id' | sort
>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
>>  Current
>>                                  Dload  Upload   Total   Spent    Left
>>  Speed
>> 100 19335  100 19335    0     0   871k      0 --:--:-- --:--:-- --:--:--
>>  899k
>> "0LcI_px4SZy5ZQkI_V7Qyw"
>> "1sAGREtMSfK8OIxZErm8RQ"
>> "6IV2v4TFTr-Gl1eC6hrj0Q"
>> "6nwMexTHQBmFxfykOgKqWA"
>> "7hFYs6y-QG6wGYEkoBKmdg"
>> "9MTM10SeQ2yqWIb08oPnFA"
>> "aELtGN6DQpmdRlQbr8i0uA"
>> "AUHUg6k0QZOf_oGjsjSsGA"
>> "Bo_u1eYGSF2LeU78kbcFZg"
>> "EWs1K8YsR9-IBSAWK6ld7A"
>> "Fx4l6_axSGCxpyFm7C7BSQ"
>> "gpCrAZrNTNezWPfensER3g"
>> "HAFmGcWuQAylxGjmnZZkSQ"
>> "HB4Kwz3RSWWH5NHvyH4JMg"
>> "H-eP-33FREOtq7v0uBPWbQ"
>> "_IH6W4DoTRmdms0FJNlg4g"
>> "iK_3TbzcSj2-MbMXip_XFg"
>> "J4bjPFIcQ1ewrQqjN2qz6Q"
>> "kfonMDBuR--UIhkyM2cWrg"
>> "Kr6-9-3uR9Wp2923n-O2NA"
>> "Nw_9rjwvQ62u-HsuWIm53A"
>> "QRmY8R2MQemuePb0EkYxWA"
>> "usloSJzQRzCpOQ8bxKi2vA"
>> "w9NGEWg-QiivMpjyurYKrA"
>> "wKy-YzB-TK2lnK86Sx2RBA"
>> "y2ZmJ-_GRAmi3eHy1y8jzw"
>> "ZmFj7w4hR5Cvy-owCLmZ1Q"
>> "ZmlndPBLT-ivuOxm_A7yDA"
>>
>>
>> On Thursday, January 30, 2014 4:00:49 PM UTC-8, tallpsmith wrote:
>>
>>> If you can narrow down a specific few IDs of results that
>>> appear/disappear based on the primary/replica shard, and confirm through an
>>> explicit GET of that ID with the preference=_local on the primary shard &
>>> replica for that result.  To work out which shard # a specific ID belongs
>>> to, you can run this query:
>>>
>>> curl -XGET '*http://127.0.0.1:9200/_all/_search?pretty=1
>>> <http://127.0.0.1:9200/_all/_search?pretty=1>*' -d '
>>> {
>>> "fields" : [],
>>> "query" : {
>>> "ids" : {
>>> "values" : [
>>> "123456789"
>>> ]
>>> }
>>> },
>>> "explain" : 1
>>> }
>>> '
>>>
>>> where the "values" attribute you place the ID of the item you're after.
>>>  Within the result response you'l see the shard Id, use that to identify
>>> which host is the primary and which is the replica.  You can then run the
>>> GET query with the preference=_local on each of those hosts and see if the
>>> primary or replica shows the result.  You will need to understand whether
>>> the item that is 'flappy' (appearing/disappearing depending on the shard
>>> being searched) is supposed to be in there or not, perhaps checking the
>>> data store that is the source of the index (is it a dB?).
>>>
>>> We have very infrequent case where the replica shard is not properly
>>> receiving a delete at least with 0.19.10.  The delete successfully applies
>>> to the Primary, but the Replica still holds the value and returns it within
>>> search results.  We have loads of insert/update/delete activity and the
>>> number of flappy items is very small, but it is definitely a thing.
>>>
>>> see this previous thread:  http://elasticsearch-users.
>>> 115913.n3.nabble.com/Deleted-items-appears-when-searching-
>>> a-replica-shard-td4029075.html
>>>
>>> If it is the replica shard that's incorrect (my bet), the way to fix it
>>> is to relocate the replica shard to another host.  The relocation will take
>>> the copy of the primary (correct copy) and recreate a new replica shard,
>>> effectively neutralizing the inconsistency.
>>>
>>> We have written a tool, Scrutineer (https://github.com/aconex/scrutineer)
>>> which can help detect inconsistencies in your cluster.  I also have a tool
>>> not yet published to github that can help check these Primary/Replica
>>> inconsistencies if that would help (you pass a list of IDs to it and it'll
>>> check whether they're flappy between the primary & replica or not).  It can
>>> also help automate the rebuilding of just the replica shards by shunting
>>> them around (rather than a full rolling restart of ALL the shards, just the
>>> shard replicas you want)
>>>
>>> cheers,
>>>
>>> Paul Smith
>>>
>>>
>>>
>>>
>>>
>>> On 31 January 2014 09:44, Binh Ly <bi...@hibalo.com> wrote:
>>>
>>>> Xavier, can you post an example of 1 full query and then also show how
>>>> the results of this one query is inconsistent? Just trying to understand
>>>> what is inconsistent. Thanks.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>> msgid/elasticsearch/038735ba-cef6-4634-9d46-7ff39dffc4d2%
>>>> 40googlegroups.com.
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>  --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/a99ab249-ddf4-4c38-97d7-3bfe8ec41b5f%40googlegroups.com
>> .
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHfYWB5%3DUVzDmvmNHHmhkGnFaEAV37-c2cnMPKSvuqsJ3MzC2Q%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Primary vs. replica shard inconsistencies?

Reply via email to