[ 
https://issues.apache.org/jira/browse/KAFKA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402344#comment-13402344
 ] 

Jay Kreps commented on KAFKA-374:
---------------------------------

Here are the full performance results.

Size is in bytes. The native and java columns are nanoseconds per message, 
averaged over a large number of messages. Improvement is the reduction in 
time relative to native, i.e. (native - java) / native:

size (bytes)  native (ns)  java (ns)  improvement
16            149.47       108.11     27.7%
32            197.8        149.78     24.3%
64            291.01       219.89     24.4%
128           487.36       357.64     26.6%
256           892.78       631.15     29.3%
512           1774.22      1251.4     29.5%
1024          3412.79      2470.58    27.6%
2048          6594.28      4421.38    33.0%
4096          13121.85     8751.19    33.3%
8192          25689.03     18173.61   29.3%
16384         51258.21     36278.3    29.2%
32768         103584.61    73240.5    29.3%
65536         207569.05    146748.51  29.3%
131072        415893.86    292083.12  29.8%

I suspect there is still some Scala numeric boxing happening in the Java 
path that would be good to get rid of.
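
For anyone who wants to reproduce numbers of this kind, here is a minimal 
sketch of the comparison. It is not the code from the attached patch: the 
class name Crc32Sketch and its methods are hypothetical, the pure-Java 
routine is a plain byte-at-a-time table lookup rather than the Hadoop 
implementation from HADOOP-6148, and the timing loop is a rough harness 
with no claim to match the methodology behind the table above. It uses only 
primitive ints and longs, so no boxing is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: compares java.util.zip.CRC32 with a simple pure-Java CRC-32.
public class Crc32Sketch {
    // Lookup table for the reflected CRC-32 polynomial 0xEDB88320,
    // the same polynomial java.util.zip.CRC32 uses.
    private static final int[] TABLE = new int[256];
    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    /** Pure-Java, byte-at-a-time CRC-32 using primitives only. */
    public static long pureJavaCrc(byte[] buf, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ buf[i]) & 0xFF];
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    /** The JDK implementation, one checksum object per message. */
    public static long jdkCrc(byte[] buf, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(buf, off, len);
        return crc.getValue();
    }

    /** Average nanoseconds per checksum over the given number of iterations. */
    public static double nanosPerMessage(byte[] msg, int iterations, boolean pure) {
        long sink = 0; // accumulate results so the loop body cannot be discarded
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += pure ? pureJavaCrc(msg, 0, msg.length) : jdkCrc(msg, 0, msg.length);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) System.out.println(sink);
        return (double) elapsed / iterations;
    }

    /** Percentage reduction in time relative to the native figure. */
    public static double improvementPercent(double nativeNs, double javaNs) {
        return (nativeNs - javaNs) / nativeNs * 100.0;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        if (pureJavaCrc(check, 0, check.length) != jdkCrc(check, 0, check.length)) {
            throw new IllegalStateException("checksum mismatch");
        }
        int iterations = 1_000_000; // arbitrary choice for this sketch
        System.out.println("size\tnative\tjava\timprovement");
        for (int size = 16; size <= 1024; size *= 2) {
            byte[] msg = new byte[size];
            nanosPerMessage(msg, iterations, true);   // warm-up pass
            nanosPerMessage(msg, iterations, false);  // warm-up pass
            double nativeNs = nanosPerMessage(msg, iterations, false);
            double javaNs = nanosPerMessage(msg, iterations, true);
            System.out.printf("%d\t%.2f\t%.2f\t%.1f%%%n",
                    size, nativeNs, javaNs, improvementPercent(nativeNs, javaNs));
        }
    }
}
```

Timings from a loop like this vary a lot with JVM version and warm-up, so 
treat any output as indicative only.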
                
> Move to java CRC32 implementation
> ---------------------------------
>
>                 Key: KAFKA-374
>                 URL: https://issues.apache.org/jira/browse/KAFKA-374
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Priority: Minor
>              Labels: newbie
>         Attachments: KAFKA-374-draft.patch
>
>
> We keep a per-record CRC32. This is a fairly cheap algorithm, but the Java 
> implementation uses JNI, and it seems to be a bit expensive for small 
> records. I have seen this before in Kafka profiles, and I noticed it on 
> another application I was working on. Basically, with small records the 
> native implementation can only checksum < 100 MB/sec. Hadoop has done some 
> analysis of this and replaced it with a pure-Java implementation that is 2x 
> faster for large values and 5-10x faster for small values. Details are in 
> HADOOP-6148. We should do a quick read/write benchmark on log and message 
> set iteration and see if this improves things.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
