Okay, I checked on this. The producer computes the checksum twice, which is once too many. The server computes the checksum an astounding 39 times in 0.8 branch! I think this answers the earlier questions: The right thing to do is clearly to remove the check from the iterator as people are iterating many times.
-Jay On Fri, Sep 14, 2012 at 9:12 AM, Jay Kreps <jay.kr...@gmail.com> wrote: > I actually think the validBytes() call is looping and hence computing CRCs > too. So actually at least three times. > > -Jay > > > On Fri, Sep 14, 2012 at 9:10 AM, Jay Kreps <jay.kr...@gmail.com> wrote: > >> Hey guys, >> >> I noticed that our ByteBufferMessageSet iterator computes the message CRC >> on iteration. Log.append also loops through all messages and checks the >> same checksum. This means we are effectively computing the CRC32 twice >> inside Log.append (we may actually be doing it more than that we would have >> to look for other times we loop through the message set). This is actually >> quite expensive. Does anyone remember why the second check was added? I am >> not completely sure of the right fix--intuitively I think we should remove >> the automatic check in the iterator and instead add a check in the >> high-level consumer. I think it is okay for the simple consumer to be left >> to manually call isValid or not since it is a lower-level interface. >> >> Thoughts? >> >> -Jay >> > >