Re: Elastic Search and consistency
On Fri, Jun 13, 2014 at 12:11 PM, shikhar shik...@schmizz.net wrote:

> I take this back, I understand the ES model better now. So although the write-consistency-level check is only applied before the write is about to be issued, with sync replication the client can only get an ack if it succeeded on the primary shard as well as all replicas (as per the same cluster state the check is performed on). In case it fails on some replica(s), the operation would be retried (together with the write-consistency-level check using a possibly-updated cluster state).

FWIW, I'm really not sure anymore. TransportShardReplicationOperationAction, where this stuff is happening, has a bunch of logic in performReplicas(..)
https://github.com/elasticsearch/elasticsearch/blob/a06fd46a72193a387024b00e226241511a3851d0/src/main/java/org/elasticsearch/action/support/replication/TransportShardReplicationOperationAction.java#L557-L676
where it decides to take updated cluster state into account, and there seem to be exceptions for certain kinds of failures being tolerated.

It seems like this would be so much more straightforward if a write were fanned out and then blocked, up to a maximum of the timeout, to check that the required number of replicas succeeded (with success on the primary being required).

--
You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHWG4DMM_-0ySs3iCFkVKxL3DrNW2vT9daUAT9eWfXHpUrN2wQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
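The fan-out-and-block idea proposed in this message can be sketched in a few lines. This is only an illustration of the proposal, not how Elasticsearch behaves: `fanout_write`, the callable-per-copy interface, and the `required` parameter are all hypothetical names chosen for the example.

```python
import concurrent.futures as cf
import time


def fanout_write(primary, replicas, doc, required, timeout):
    """Send doc to every copy at once and wait at most `timeout` seconds.

    Returns True only if the primary acknowledged and at least `required`
    copies in total acknowledged in time. `primary` and each entry of
    `replicas` are hypothetical callables that return True on success and
    return False or raise on failure.
    """
    copies = [primary] + list(replicas)
    pool = cf.ThreadPoolExecutor(max_workers=len(copies))
    futures = [pool.submit(copy, doc) for copy in copies]
    done, _not_done = cf.wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on copies that are still busy

    def acknowledged(future):
        # Only count copies that finished in time, without error, with True.
        return (
            future in done
            and future.exception() is None
            and future.result() is True
        )

    acks = sum(1 for future in futures if acknowledged(future))
    return acknowledged(futures[0]) and acks >= required
```

A copy that is slow or broken simply does not count toward `required`; the caller gets a single yes/no answer after at most `timeout` seconds, which is the property the message asks for.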
Re: Elastic Search and consistency
index.gateway.local.sync: 0 relates to durability: it means the underlying data really goes to disk, using the guarantee of FileChannel.force(false). This destroys performance compared to the ES default, because there are many more I/O operations at the OS layer when fsync() is performed on every operation.

Note that there is a small time gap between the API call, the transfer of the API call to the executing node(s), and the write-ahead logging of the operation in the translog (no matter whether sync=0 or sync!=0). In the early stage of request processing, a response may not have been sent back to the client, because the request is still on the way. In rare cases, all nodes that are designated to receive the API call may crash, and the responding node is also unable to create a response to send back to the client.

So it is up to the ES client to maintain a list of as yet unanswered API calls, and to wait for the successes, failures, and timeouts of all requests that were sent. API calls that were not answered should be handled accordingly on the client side, to avoid data inconsistency. For example, unanswered insert operations should be repeated (after the ES cluster comes up again).

This situation is rare: it takes a sudden failure of the ES cluster's ability to create responses combined with a lot of outstanding API requests. On well-balanced ES clusters, the gap should be minimal and almost unnoticeable. On heavily loaded clusters, or on clusters with slow I/O, latencies are high and the gap may be more noticeable. The default settings of ES help a lot to balance out these situations, by trying to prevent an overrun of the ES API.

Jörg
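Jörg's advice to keep a client-side list of unanswered API calls could look roughly like the sketch below. `PendingRequests` and `replay` are hypothetical names, and `send` stands in for whatever client call actually talks to the cluster; none of this is part of an Elasticsearch client library.

```python
import itertools


class PendingRequests:
    """Remember every request that was sent but has not been answered yet."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # request id -> payload

    def sent(self, payload):
        """Record a request at send time and return its tracking id."""
        request_id = next(self._ids)
        self._pending[request_id] = payload
        return request_id

    def answered(self, request_id):
        """Forget a request once any response (success or failure) arrived."""
        self._pending.pop(request_id, None)

    def unanswered(self):
        """Return (request id, payload) pairs still waiting for a response."""
        return list(self._pending.items())


def replay(tracker, send):
    """Resend every unanswered request, e.g. once the cluster is back up.

    `send` is a hypothetical callable that returns True when the cluster
    acknowledged the request.
    """
    for request_id, payload in tracker.unanswered():
        if send(payload):
            tracker.answered(request_id)
```

Replaying is only safe when the operation is idempotent, for example an index request that carries an explicit document ID, since the original request may have been applied even though no response arrived.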
Re: Elastic Search and consistency
On Thu, Jun 12, 2014 at 8:52 PM, shikhar shik...@schmizz.net wrote:

> ES currently does not seem to provide any guarantee that an acknowledged write (from the caller's perspective) succeeded on a quorum of replicas.

I take this back, I understand the ES model better now. So although the write-consistency-level check is only applied before the write is about to be issued, with sync replication the client can only get an ack if it succeeded on the primary shard as well as all replicas (as per the same cluster state the check is performed on). In case it fails on some replica(s), the operation would be retried (together with the write-consistency-level check using a possibly-updated cluster state).

> This makes it unsuitable for a primary data store, given you can see data loss despite having replicas!

If using ES as a primary store, you should really be running it with index.gateway.local.sync: 0 to make sure the translog is fsynced on every write operation.

A follow-up question: what if there is a failure on one of the replicas that prevents writes (e.g. disk full), but the node does not drop out of the discovery state because it is otherwise healthy? Does that not make the node a SPOF? This is something we have run into with SolrCloud: https://issues.apache.org/jira/browse/SOLR-5805
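To make "fsync on every write operation" concrete, here is a small Python analogy of a write-ahead log that forces each record to disk before returning. It is not Elasticsearch's translog code; `OperationLog` and its `sync_every_op` flag are hypothetical, and the flag only loosely mirrors what the message expects from index.gateway.local.sync: 0.

```python
import json
import os


class OperationLog:
    """Append-only operation log that can fsync after every record."""

    def __init__(self, path, sync_every_op=True):
        self._file = open(path, "ab")
        self._sync_every_op = sync_every_op

    def append(self, operation):
        """Write one operation as a JSON line; optionally force it to disk."""
        self._file.write(json.dumps(operation).encode("utf-8") + b"\n")
        if self._sync_every_op:
            self._file.flush()             # Python buffer -> OS page cache
            os.fsync(self._file.fileno())  # OS page cache -> storage device

    def close(self):
        """Flush and sync whatever is still buffered, then close the file."""
        self._file.flush()
        os.fsync(self._file.fileno())
        self._file.close()
```

With `sync_every_op=False`, records appended since the last sync can be lost if the machine crashes, which is the window the setting is meant to close, at the cost of one forced disk write per operation.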
Re: Elastic Search and consistency
Hi Anand,

I would love to know as well, and asked in this thread today: "quorum write + sync replication: guarantees".

It is pretty important information given that ElasticSearch is being touted as a reliable data store... which the presence of a write consistency level setting would tempt one to believe, but there doesn't seem to be evidence to support the guarantees it implies.

Cheers,
Shikhar
Re: Elastic Search and consistency
I think the documentation is quite clear, but I will try to explain in my own words.

1.1. I am not sure what you mean by "after the quorum check". Write consistency is a model where ES makes sure there are enough recipients (nodes) before writes are executed. consistency=quorum fails if you have too few nodes to match the actual replica count of the index.

1.2. No new shard is created. I am not sure I understand why such a thing should happen?

2.1. No, the operation fails early if the replica count is too low. replication=async means the operation is executed without waiting for nodes to respond. It does not mean the replica count is ignored.

2.2. When a node comes back, index recovery is performed to restore consistency. The node holding a replica with intact checksums and the youngest translog wins and will send its version of the shard to the other node.

Jörg
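The pre-flight nature of the consistency check described in answers 1.1 and 2.1 can be illustrated as follows. The sketch assumes a plain majority rule for quorum and the level names one, quorum, and all; the real implementation may treat small replica counts differently, and `required_copies` and `may_execute` are hypothetical helper names.

```python
def required_copies(total_copies, consistency):
    """Number of shard copies that must be active before a write is attempted.

    `total_copies` counts the primary plus its replicas. A plain majority is
    assumed for "quorum"; this is an assumption, not the confirmed ES rule.
    """
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return total_copies // 2 + 1
    if consistency == "all":
        return total_copies
    raise ValueError("unknown consistency level: %r" % (consistency,))


def may_execute(active_copies, total_copies, consistency="quorum"):
    """Pre-flight check only: it says nothing about how many copies
    actually end up holding the write afterwards."""
    return active_copies >= required_copies(total_copies, consistency)
```

The check runs once, before the write is issued, which is why a copy that becomes unreachable just after it passes is not caught by the check itself.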
Re: Elastic Search and consistency
On Thu, Jun 12, 2014 at 8:42 PM, joergpra...@gmail.com wrote:

> 1.1 Not sure what you mean after the quorum check. Write consistency is a model where ES makes sure there are enough recipients (nodes) before writes are executed. consistency=quorum fails if you have too few nodes to match the actual replica count of the index.

This is the part that I am also after. ES currently does not seem to provide any guarantee that an acknowledged write (from the caller's perspective) succeeded on a quorum of replicas. This makes it unsuitable for a primary data store, given you can see data loss despite having replicas!
Elastic Search and consistency
Hi,

We are trying to use ES as a system of record, so I am trying to understand its consistency- and durability-related guarantees and knobs. For this scenario, assume I will have 2 copies (1 replica).

From what I have read and gleaned (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency):

- write.consistency is a check to see if the operation can be executed. It offers no guarantees in terms of data durability or operation guarantees.
- replication type: is this for durability guarantees?
  - sync: all copies of the data are saved/indexed on the nodes before it returns.
  - async: the primary node will be synchronous, the others will be async.

Failure scenario question

The scenario is a PUT with write.consistency=quorum, but different replication types (sync/async). What happens if there is a network glitch after the quorum check succeeds, such that one of the non-primary shard machines is unreachable or down?

1. PUT with write.consistency=quorum and replication=sync:
   1. Would the operation fail from the caller's perspective? It seems like it would, after some retries?
   2. I have read that if the primary succeeds and it has problems with a replica, then it will try to create a new shard. Is that a correct understanding? I assume that should be preventable, to handle network failures/partitions?
2. PUT with write.consistency=quorum and replication=async:
   1. The operation succeeds, since the primary shard is up.
   2. Some time later the unreachable node is back. How does it get the copy of the data that it did not get? Is there some step/process that will do that automatically?

If these have been asked/answered or documented, please point me to a source, and my apologies.

Thanks
Anand
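For reference, the two PUT variants in the scenario above differ only in their request parameters. The helper below builds such a URL without sending anything. It assumes the per-request query parameters are named `consistency`, `replication`, and `timeout`, as in the index API documentation of that era; the host, index, type, and ID values are placeholders, and `index_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def index_url(base, index, doc_type, doc_id,
              consistency="quorum", replication="sync", timeout=None):
    """Build the URL for an index (PUT) request with explicit write settings.

    The parameter names are assumed from the index API docs linked in the
    message; nothing is sent over the network here.
    """
    params = {"consistency": consistency, "replication": replication}
    if timeout is not None:
        params["timeout"] = timeout
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    return "%s/%s?%s" % (base.rstrip("/"), path, urlencode(params))


# Scenario 1: quorum check, wait for replicas.
sync_url = index_url("http://localhost:9200", "records", "record", "1")
# Scenario 2: quorum check, do not wait for replicas.
async_url = index_url("http://localhost:9200", "records", "record", "1",
                      replication="async")
```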