[ https://issues.apache.org/jira/browse/KAFKA-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837218#comment-15837218 ]

Ewen Cheslack-Postava commented on KAFKA-4666:
----------------------------------------------

[~ecesena] Absolutely -- if you're very familiar with Kafka's semantics this 
can seem obvious, but clarifications in the docs are always welcome. I'm sure 
plenty of folks would appreciate having this spelled out (with the tradeoff 
that too much detail can become confusing).

Re: the test, I'm all for merging more tests, but this is a sort of weird one 
-- it technically will be correct since it catches the expected exception, but 
after that there aren't any useful assertions, are there? (If I'm missing it, 
let me know -- I only quickly scanned the patch.) What I might expect from a 
test like this is to assert that there *is* data loss due to the situation, but 
even that is tricky because data loss is only possible, not required. This is 
what gets pretty tricky about testing cases like this (and writing system 
tests in general, since useful tests are generally non-deterministic).

I'm definitely all for adding a test that validates the expected behavior, I 
just want to make sure we have a test that a) verifies what we want to test and 
b) is actually robust.
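
For concreteness, here's a rough sketch of the expected-exception behavior in 
question -- a plain Java producer rather than the attached system test, and 
the broker address, topic name, and serializers below are illustrative 
assumptions. With acks=all and min.insync.replicas in effect, a send should 
fail with NotEnoughReplicasException once too many replicas are down, which is 
about the only hard assertion available:

    import java.util.Properties;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;

    public class AcksAllProbe {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            props.put(ProducerConfig.ACKS_CONFIG, "all"); // the setting under discussion
            props.put(ProducerConfig.RETRIES_CONFIG, 0);  // surface the error instead of retrying
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                try {
                    // "test" is assumed to be a topic whose min.insync.replicas is higher
                    // than the number of replicas currently in sync.
                    producer.send(new ProducerRecord<>("test", "key", "value")).get();
                    System.out.println("acknowledged by all in-sync replicas");
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof NotEnoughReplicasException) {
                        // The write is rejected rather than silently accepted -- this is
                        // the expected exception the test catches.
                        System.out.println("rejected: " + e.getCause().getMessage());
                    } else {
                        throw e;
                    }
                }
            }
        }
    }

With acks=1 the same send typically returns successfully, so there's no 
exception to assert on, and any loss only shows up later when consuming.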

Again, props for jumping into the system tests to do validation here, I think 
that's awesome :)

> Failure test for Kafka configured for consistency vs availability
> -----------------------------------------------------------------
>
>                 Key: KAFKA-4666
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4666
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Emanuele Cesena
>         Attachments: consistency_test.py
>
>
> We recently had an issue with our Kafka setup because of a misconfiguration.
> In short, we thought we had configured Kafka for durability, but we hadn't 
> set the producers to acks=all. During a full outage, we had situations where 
> some partitions were "partitioned", meaning that followers started up 
> without properly waiting for the right leader, and thus we lost data. Again, 
> this is not an issue with Kafka, but a misconfiguration on our side.
> I think we reproduced the issue, and we built a Docker test showing that, 
> if the producer isn't set with acks=all, then data can be lost during an 
> almost-full outage. The test is attached.
> I was thinking of sending a PR, but wanted to run this by you first, as it 
> doesn't necessarily prove that a feature works as expected.
> In addition, I think the documentation could be slightly improved, for 
> instance in the section:
> http://kafka.apache.org/documentation/#design_ha
> by clearly stating that there are 3 steps one should take to configure Kafka 
> for consistency, the third being that producers should be set with acks=all 
> (which is currently folded into the 2nd point).
> Please let me know what you think, and I can send a PR if you agree.
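
For reference, a minimal sketch of what those three steps presumably are -- 
disabling unclean leader election, setting a minimum ISR size, and setting 
producer acks -- with illustrative values (replication factor 3, 
min.insync.replicas=2) that are assumptions, not taken from the attached test:

    # Step 1 -- broker (or topic) config: never elect an out-of-sync replica as leader
    unclean.leader.election.enable=false
    # Step 2 -- broker (or topic) config: e.g. with a replication factor of 3
    min.insync.replicas=2
    # Step 3 -- producer config: wait for acknowledgement from all in-sync replicas
    acks=all

Without the third setting, the first two only constrain the brokers; a 
producer using acks=1 can still have its writes dropped when leadership 
changes during an outage.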


