[ https://issues.apache.org/jira/browse/KAFKA-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404030#comment-13404030 ]
Jay Kreps commented on KAFKA-382:
---------------------------------

Yes, that makes sense.

> Write ordering guarantee violated
> ---------------------------------
>
>                 Key: KAFKA-382
>                 URL: https://issues.apache.org/jira/browse/KAFKA-382
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>             Fix For: 0.8
>
>
> The guarantee is that if the producer does
>   send(X)
>   send(Y)
> the client sees X first and Y second, but this may not actually happen in 0.8.
> The cause is the combination of parallel I/O threads and the single work
> queue in the network server. The current model is one shared work queue and
> one response queue per selector. The single work queue is great from a
> parallelism point of view (if one thread is blocked, another can do the
> work), but it breaks the ordering guarantee: two requests from the same
> connection can be picked up by different I/O threads and complete out of
> order. Not sure how I missed this in the initial work. :-(
> The reason for the single work queue was to avoid blocking a whole selector
> when one thread does a flush. But I wonder how relevant that is now. If the
> durability guarantee comes from replication, I think there is not much
> reason to have a blocking flush; we can rely on pdflush to do it in the
> background, so doing the write synchronously may be fine.
> I think the solution is to modify RequestChannel to have one work queue per
> I/O thread and hash into the work queue by connection id. In this solution a
> blocked I/O thread only blocks clients that hash onto it. This retains the
> current async model but gives up the property that a blocked thread blocks
> no one. (At first I thought we didn't need a RequestChannel at all any more
> and could just synchronously return zero or more requests from KafkaApis,
> but in reality, because of the possibility of request timeout from a
> background thread, this won't work.)
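The per-I/O-thread queue proposal above can be sketched roughly as follows. This is a minimal illustration, not the actual Kafka RequestChannel: the class name PartitionedRequestChannel, the Request type, and the method names are all hypothetical, and connection ids are assumed to be strings. The point it shows is that every request from one connection lands in the same FIFO queue, so a single I/O thread sees them in send order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one work queue per I/O thread, selected by hashing
// the connection id. Not the real Kafka RequestChannel API.
final class PartitionedRequestChannel {

    // Hypothetical request type: a connection id plus an opaque payload.
    public static final class Request {
        public final String connectionId;
        public final String payload;

        public Request(String connectionId, String payload) {
            this.connectionId = connectionId;
            this.payload = payload;
        }
    }

    private final List<BlockingQueue<Request>> workQueues = new ArrayList<>();

    public PartitionedRequestChannel(int numIoThreads) {
        if (numIoThreads <= 0) {
            throw new IllegalArgumentException("numIoThreads must be positive");
        }
        for (int i = 0; i < numIoThreads; i++) {
            workQueues.add(new LinkedBlockingQueue<>());
        }
    }

    // Maps a connection id to a queue index. floorMod keeps the index
    // non-negative even when hashCode() is negative.
    public int queueIndexFor(String connectionId) {
        return Math.floorMod(connectionId.hashCode(), workQueues.size());
    }

    // Called by the network layer. The queues are unbounded here, so add()
    // never blocks; a real server would likely bound them.
    public void sendRequest(Request request) {
        workQueues.get(queueIndexFor(request.connectionId)).add(request);
    }

    // Blocking receive for I/O thread number ioThreadId.
    public Request receiveRequest(int ioThreadId) throws InterruptedException {
        return workQueues.get(ioThreadId).take();
    }

    // Non-blocking variant; returns null when the thread's queue is empty.
    public Request pollRequest(int ioThreadId) {
        return workQueues.get(ioThreadId).poll();
    }
}
```

Because each queue is drained by exactly one I/O thread, ordering within a connection follows from the queue's FIFO order. The trade-off is the one the description names: a thread that blocks stalls every connection hashed onto its queue.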
> It would also be possible to be smarter still and attempt a non-blocking
> solution that only preserves the write-ordering guarantees. One solution
> would be as follows. Each request from a given connection would be assigned
> an increasing number, starting with 0, by the network layer. KafkaApis would
> keep a "last processed" number for each connection. Any request whose number
> is greater than that connection's last processed number + 1 would be
> re-enqueued. I don't like this solution because it is more complex and
> because I don't think blocking flushes are needed now that we have
> replication (e.g. you can just turn on replication and rely on pdflush,
> which is async), so optimizing this case is not useful imo.
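For comparison, the sequence-number alternative from the description could look roughly like this. It is a single-threaded sketch for clarity only: the class SequencedRequestProcessor, its Request type, and the drain method are hypothetical names, and "no request processed yet" is assumed to be represented as a last processed number of -1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the re-enqueue scheme; not Kafka code.
final class SequencedRequestProcessor {

    // Hypothetical request type. The network layer is assumed to assign
    // sequence numbers per connection, starting at 0.
    public static final class Request {
        public final String connectionId;
        public final long sequence;
        public final String payload;

        public Request(String connectionId, long sequence, String payload) {
            this.connectionId = connectionId;
            this.sequence = sequence;
            this.payload = payload;
        }
    }

    // Last processed sequence number per connection; a missing entry is
    // treated as -1, meaning nothing has been processed yet.
    private final Map<String, Long> lastProcessed = new HashMap<>();

    // Returns true if the request was processed, false if an earlier
    // request from the same connection is still outstanding.
    public boolean tryProcess(Request request) {
        long last = lastProcessed.getOrDefault(request.connectionId, -1L);
        if (request.sequence > last + 1) {
            return false; // caller should re-enqueue
        }
        lastProcessed.put(request.connectionId, request.sequence);
        return true;
    }

    // Drains the queue, re-enqueueing requests that arrive too early, and
    // returns payloads in processing order. Note: this loops forever if a
    // sequence number never arrives; a real version would need a timeout.
    public List<String> drain(Queue<Request> queue) {
        List<String> order = new ArrayList<>();
        Request request;
        while ((request = queue.poll()) != null) {
            if (tryProcess(request)) {
                order.add(request.payload);
            } else {
                queue.add(request);
            }
        }
        return order;
    }
}
```

Even in this reduced form the scheme needs per-connection state and a retry path, which illustrates the extra complexity the description objects to.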