Jay, this is great news! I'm so happy to see progress on this front.
It will help our use case immensely.



On Apr 4, 2012, at 8:38 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> I ran a quick timing test on the long-poll fetches in 0.8
> (https://issues.apache.org/jira/browse/KAFKA-48). This measures
> end-to-end latency of a send to the broker followed by a read by the
> consumer. Previously this number was actually pretty bad for us: we
> had a backoff of a few hundred ms in the consumer to avoid
> spin-waiting on data arrival, so on average you waited 50% of that
> interval. The bottom line is that after the long-poll patch the
> end-to-end latency is 0.93 ms. This is not amazing by low-latency
> messaging standards, but absolutely incredible by the standards of
> high-throughput log aggregation systems, and there are a few ways
> this test is pessimistic (see below).
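The 50%-of-backoff figure follows from the arrival time being uniformly distributed within a polling interval. A toy model (not part of the original test; the 300 ms backoff is an illustrative value):

```python
import random

def avg_wait(backoff_ms, trials=100_000, seed=1):
    """Average time a polling consumer waits for a message that
    arrives at a uniformly random point within the backoff interval:
    the consumer only notices at the next poll, so it waits the
    remainder of the interval."""
    rng = random.Random(seed)
    total = sum(backoff_ms - rng.uniform(0, backoff_ms) for _ in range(trials))
    return total / trials

print(round(avg_wait(300)))  # roughly 150, i.e. 50% of a 300 ms backoff
```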
>
> This is using a flush.interval of 1 to avoid batching writes because
> right now we don't hand out messages until post-flush. After
> replication I don't think we need this delay, because the durability
> guarantees will come from replication rather than disk flush, which
> is arguably better.
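For reference, forcing a flush per message is a broker setting along these lines; the exact key name varies across Kafka versions, so treat this as illustrative rather than quoted from the mail:

```properties
# Illustrative broker config: flush to disk after every message.
# Kafka 0.7 used log.flush.interval; later versions use
# log.flush.interval.messages.
log.flush.interval.messages=1
```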
>
> This number is obviously important for people who hope for low-latency
> messaging. More importantly this is also very closely related to the
> number we will see for quorum writes with replication. Obviously it
> makes a big difference whether a send() with acks > 1 takes 5 ms,
> 50 ms, or 500 ms to be replicated and acknowledged. This send,
> replicate, and acknowledge loop is pretty similar to the fetch and
> consume loop I am testing, so we can probably put together a
> reasonable simulation of performance for different replication
> scenarios with this data.
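The kind of back-of-the-envelope estimate Jay describes can be sketched like this. The hop counts and the parallel-vs-serial distinction are my assumptions, not measurements from the mail; only the 0.93 ms base figure comes from the test above:

```python
def replicated_send_ms(produce_fetch_ms, replicas, parallel=True):
    """Rough model of a replicated, acknowledged send: one leader
    write plus the follower fetch/ack round trips. With parallel
    follower fetches that is roughly one extra produce/fetch hop;
    with serial fetches, one extra hop per follower."""
    followers = replicas - 1
    extra_hops = 1 if (parallel and followers > 0) else followers
    return produce_fetch_ms * (1 + extra_hops)

base = 0.93  # measured produce/fetch latency from the test above
print(replicated_send_ms(base, replicas=3))                  # parallel fetches
print(replicated_send_ms(base, replicas=3, parallel=False))  # serial fetches
```

Either way the estimate lands well under the sub-10 ms target mentioned below.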
>
> I think a sub-1ms produce/fetch path means we should definitely be
> able to get a sub-10ms replicated send().
>
> This test is something like:
> loop {
>  start = System.nanoTime
>  producer.send(message)
>  iterator.next()
>  recordTime(System.nanoTime - start)
> }
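The shape of that measurement loop can be reproduced with a self-contained stand-in. This uses an in-process queue pair instead of a real Kafka producer and consumer, so it shows the timing structure only, not Kafka's numbers:

```python
import queue
import statistics
import threading
import time

def measure_roundtrip(n=1000):
    """Toy version of the produce -> fetch loop: send a message,
    block until the 'broker' thread hands it back, record the time.
    An in-process queue stands in for the broker, not Kafka itself."""
    to_broker, to_consumer = queue.Queue(), queue.Queue()

    def broker():
        while True:
            msg = to_broker.get()
            if msg is None:          # shutdown sentinel
                return
            to_consumer.put(msg)     # immediate handoff, like long poll

    t = threading.Thread(target=broker, daemon=True)
    t.start()
    samples = []
    for i in range(n):
        start = time.perf_counter_ns()
        to_broker.put(i)             # producer.send(message)
        to_consumer.get()            # iterator.next()
        samples.append(time.perf_counter_ns() - start)
    to_broker.put(None)
    t.join()
    return statistics.mean(samples) / 1e6  # mean latency in ms

print(f"mean round trip: {measure_roundtrip():.3f} ms")
```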
>
> This measurement is just over localhost, so there is no actual
> network latency, which is optimistic. There are two ways the test is
> pessimistic. First, as I mentioned, it includes a disk flush on each
> message because of the flush.interval; this may account for much of
> the time. Second, the send() call is now synchronous, so the
> consumer is actually blocked on the producer's acknowledgement.
>
> -Jay
