[jira] [Updated] (SOLR-12974) RandomSort not consistent in SolrCloud Mode

Shrey Shivam (JIRA) Wed, 07 Nov 2018 23:02:44 -0800


     [ 
https://issues.apache.org/jira/browse/SOLR-12974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Shrey Shivam updated SOLR-12974:
--------------------------------
    Description: 
The expected behaviour of RandomSortField is that, given the same random field 
name (random_<seed>), which acts as a seed, the sort order remains consistent 
for the same version of the Solr index.

From schema.xml:

{{~<!-- The "RandomSortField" is not used to store or search any data. You can 
declare fields of this type it in your schema to generate pseudo-random 
orderings of your docs for sorting or function purposes. The ordering is 
generated based on the field name and the version of the index. As long as the 
index version remains unchanged, and the same field name is reused, the 
ordering of the docs will be consistent. If you want different psuedo-random 
orderings of documents, for the same version of the index, use a dynamicField 
and change the field name in the request. -->~}}

 

In master-slave mode, replication is driven by the index version. If a slave's 
index version differs from the master's, the slave replicates the index and its 
index version is updated to match the master's.

In SolrCloud mode, however, replicas of the same shard have been observed not 
to maintain the same index version at all times, even though their documents 
are the same and consistent.

This has been discussed previously on the [mailing 
list|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201508.mbox/%3ccae3utzmggprv-p6juwjwm2yyyxfw893xayq7+2hav7mmobm...@mail.gmail.com%3E]
as well.
{quote}SolrCloud works very differently than the old master-slave replication.

The index is NOT copied from the leader to the other replicas, except
 in extreme recovery circumstances.

Each replica builds its own copy of the index independently from the
 others. Due to slight timing differences in the indexing operations,
 and possible actions related to transaction log replay on node restart,
 each replica may end up with a different index layout. There also could
 be differences in the number of deleted documents. Unless something
 goes really wrong, all replicas should contain the same live documents.
{quote}
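
One way to check whether the replicas of a shard currently report the same index version is to ask each replica core for its index info and compare the values. The Python sketch below assumes the Luke request handler (/admin/luke with show=index) is enabled and that its JSON response carries the version under index.version; the helper names and the core URLs are hypothetical and only for illustration. Fetching the responses is left out so the comparison logic stays self-contained.

```python
from urllib.parse import urlencode


def luke_index_url(core_base_url: str) -> str:
    """Build the Luke URL for one replica core (the core URL is supplied by the caller)."""
    params = urlencode({"show": "index", "numTerms": 0, "wt": "json"})
    return f"{core_base_url.rstrip('/')}/admin/luke?{params}"


def versions_by_replica(luke_responses: dict) -> dict:
    """Map replica name -> index version, assuming the layout {"index": {"version": ...}}."""
    return {name: resp["index"]["version"] for name, resp in luke_responses.items()}


def replicas_agree(luke_responses: dict) -> bool:
    """True when every replica reports the same index version."""
    return len(set(versions_by_replica(luke_responses).values())) <= 1


# Hypothetical, already-parsed responses from two replicas of one shard.
sample = {
    "replica_n1": {"index": {"numDocs": 100, "version": 41}},
    "replica_n2": {"index": {"numDocs": 100, "version": 57}},
}
```

With the sample data, replicas_agree(sample) is False even though both replicas hold the same number of documents, which is the situation described in this issue.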
 

When a query is sent to a shard that has two or more replicas, any one of the 
replicas may be chosen to respond. If the replicas do not all have the same 
index version, RandomSortField generates a different hash seed on each of them 
for the same random_<seed> field name.

The source code of the 
[RandomSortField|https://github.com/apache/lucene-solr/blob/branch_6_5/solr/core/src/java/org/apache/solr/schema/RandomSortField.java]
 class (line 86) uses the index version (of the shard) to create the random 
hash seed.
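
To illustrate the effect, here is a minimal Python model of the behaviour described above. It is a sketch under stated assumptions, not the Solr implementation: model_seed and model_order are hypothetical names, and the seed is modelled simply as a hash of the field name plus the index version.

```python
import random


def model_seed(field_name: str, index_version: int) -> int:
    """Toy seed: a deterministic 32-bit string hash of the field name plus the index version."""
    h = 0
    for ch in field_name:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return (h + index_version) & 0xFFFFFFFF


def model_order(doc_ids, field_name: str, index_version: int):
    """Return doc_ids in the pseudo-random order implied by the toy seed."""
    rng = random.Random(model_seed(field_name, index_version))
    keyed = [(rng.random(), doc_id) for doc_id in doc_ids]
    return [doc_id for _, doc_id in sorted(keyed)]


docs = [f"doc{i}" for i in range(10)]
field = "random_SW84gaDAf3RynhOyGQDZlgAAAYc1"

# Same field name and same index version: the order is reproducible.
replica_a_first = model_order(docs, field, 41)
replica_a_again = model_order(docs, field, 41)

# Same field name and same documents, but a different index version:
# the seed changes, so the order will almost always change as well.
replica_b = model_order(docs, field, 57)
```

In this model the two calls for version 41 always agree, while the call for version 57 starts from a different seed. That mirrors the report: which ordering a client sees depends on which replica happens to answer.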

Hence, for the same query against a Solr collection, Solr returns different 
results depending on the version mismatch between replicas and on which replica 
serves each request.

 

Example of a Solr query that uses the random field:

https://solr-stage.mydomain.com:8983/solr/mycollection/select?wt=json&q=*:*&defType=edismax&fl=id&boost=if(query({!v='documentDate:[2018-11-07 TO *]'}),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),1),sub(1,div(1,1))),if(or(exists(query({!v='documentType:sponsored'})),exists(query({!v='documentType:featured'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),4),sub(1,div(1,4))),if(or(exists(query({!v='documentType:listing'})),exists(query({!v='documentType:promotional'}))),sum(div(scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1),2),sub(1,div(1,2))),scale(random_SW84gaDAf3RynhOyGQDZlgAAAYc1,0,1))))
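
The nested boost expression in the example is easier to read when assembled from its parts. The Python sketch below rebuilds the same function query and URL-encodes it; the helper names (scaled, tier, has_type, build_boost, build_url) are hypothetical. Each tier(n) maps the scaled random value from [0,1] into [1-1/n, 1], so higher n compresses the random component toward 1.

```python
from urllib.parse import urlencode

FIELD = "random_SW84gaDAf3RynhOyGQDZlgAAAYc1"
BASE = "https://solr-stage.mydomain.com:8983/solr/mycollection/select"


def scaled(field: str) -> str:
    """Random field value rescaled to the range [0, 1]."""
    return f"scale({field},0,1)"


def tier(field: str, n: int) -> str:
    """scaled/n + (1 - 1/n): squeezes the random value into [1 - 1/n, 1]."""
    return f"sum(div({scaled(field)},{n}),sub(1,div(1,{n})))"


def has_type(*types: str) -> str:
    """True if the document matches any of the given documentType values."""
    clauses = ",".join(f"exists(query({{!v='documentType:{t}'}}))" for t in types)
    return f"or({clauses})"


def build_boost(field: str) -> str:
    """Recent documents first, then sponsored/featured, then listing/promotional, else plain random."""
    recent = "query({!v='documentDate:[2018-11-07 TO *]'})"
    return (
        f"if({recent},{tier(field, 1)},"
        f"if({has_type('sponsored', 'featured')},{tier(field, 4)},"
        f"if({has_type('listing', 'promotional')},{tier(field, 2)},"
        f"{scaled(field)})))"
    )


def build_url(base: str, field: str) -> str:
    """Full select URL with every parameter percent-encoded."""
    params = [("wt", "json"), ("q", "*:*"), ("defType", "edismax"),
              ("fl", "id"), ("boost", build_boost(field))]
    return f"{base}?{urlencode(params)}"
```

Calling build_url(BASE, FIELD) yields an encoded form of the example request; sending it repeatedly to a collection whose replicas report different index versions is one way to reproduce the inconsistent ordering.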


> RandomSort not consistent in SolrCloud Mode
> -------------------------------------------
>
>                 Key: SOLR-12974
>                 URL: https://issues.apache.org/jira/browse/SOLR-12974
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.5.1
>            Reporter: Shrey Shivam
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
