[ 
https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment: Sample.java

JMH benchmark attached. For small sizes the JDK CRC32 is not as far ahead 
(roughly 1.3x at 128 bytes on arrays), but at a kilobyte it is several times 
faster (roughly 5.8x). It also doesn't evict random cache lines, so the real 
impact could be larger than what the microbenchmark shows.

For large sizes (1 MB) it does indeed reach about 13 gigabytes/sec, which is 
pretty crazy.
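
The gigabytes/sec figure follows from the ops/s scores in the results for the 1048576-byte case: throughput is simply operations per second times bytes per operation. A minimal sketch of that arithmetic, where the class and method names are made up for illustration and the two scores are copied from the CRC32Array and PureJavaCrc32 rows:

```java
/** Hypothetical helper: converts a JMH ops/s score into bytes per second. */
final class Throughput {
    static double bytesPerSecond(double opsPerSecond, long bytesPerOp) {
        return opsPerSecond * bytesPerOp;
    }

    public static void main(String[] args) {
        long byteSize = 1_048_576L; // the 1 MB benchmark parameter
        // Scores copied from the CRC32Array and PureJavaCrc32 rows.
        double jdk = bytesPerSecond(12364.723, byteSize);
        double pureJava = bytesPerSecond(1412.017, byteSize);
        // Decimal gigabytes (1e9 bytes), which is what the "13" refers to.
        System.out.printf("CRC32Array:    %.2f GB/s%n", jdk / 1e9);
        System.out.printf("PureJavaCrc32: %.2f GB/s%n", pureJava / 1e9);
    }
}
```

Measured in binary units (GiB) the numbers come out about 7% lower.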

There is a performance delta between direct and non-direct byte buffers in 
favor of direct byte buffers; in the one case I looked at, direct was 2x faster.
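
For reference, a minimal sketch of the two buffer cases being compared, assuming the benchmark goes through the JDK 8 `CRC32.update(ByteBuffer)` overload. This is not the code from Sample.java; the class and helper names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical sketch: checksum a heap buffer and a direct buffer with the JDK CRC32. */
final class BufferChecksum {
    /** Checksums the remaining bytes without moving the caller's position. */
    static long crcOf(ByteBuffer buf) {
        CRC32 crc = new CRC32();
        crc.update(buf.duplicate()); // update(ByteBuffer) exists since JDK 8
        return crc.getValue();
    }

    /** Builds a buffer of the given size filled with a simple byte pattern. */
    static ByteBuffer filled(int size, boolean direct) {
        ByteBuffer buf = direct ? ByteBuffer.allocateDirect(size)
                                : ByteBuffer.allocate(size);
        for (int i = 0; i < size; i++) {
            buf.put((byte) i);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        long heap = crcOf(filled(1024, false));
        long direct = crcOf(filled(1024, true));
        // Same bytes, same checksum; only the speed differs between the two.
        System.out.println(Long.toHexString(heap) + " " + Long.toHexString(direct));
    }
}
```

The checksum value is identical either way; the delta in the results is purely about how quickly the bytes can be read.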

{noformat}
     [java] Benchmark                                         (byteSize)   Mode  Samples         Score         Error  Units
     [java] o.a.c.t.m.Sample.CRC32Array                              128  thrpt        6  13905041.788 ±  598179.976  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32                           128  thrpt        6  10525663.252 ±  507525.667  ops/s

     [java] o.a.c.t.m.Sample.CRC32Array                              512  thrpt        6  14571599.254 ± 8930061.376  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32                           512  thrpt        6   2835430.274 ±   92029.259  ops/s

     [java] o.a.c.t.m.Sample.CRC32Array                             1024  thrpt        6   8337714.641 ± 3988493.638  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32                          1024  thrpt        6   1428928.434 ±   31709.319  ops/s

     [java] o.a.c.t.m.Sample.CRC32Array                          1048576  thrpt        6     12364.723 ±     344.434  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32                       1048576  thrpt        6      1412.017 ±      89.214  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBuffer                         128  thrpt        6  15925509.375 ±  779733.985  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer                 128  thrpt        6  10446360.681 ±  599847.210  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBuffer                         512  thrpt        6  10906108.722 ±  346735.334  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer                 512  thrpt        6   2873179.754 ±  140004.771  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBuffer                        1024  thrpt        6   6582936.616 ± 2219292.645  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer                1024  thrpt        6   1440343.345 ±   42303.806  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBuffer                     1048576  thrpt        6     12555.846 ±     514.918  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer             1048576  thrpt        6      1414.886 ±      58.363  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                   128  thrpt        6  31786603.552 ± 2000265.643  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect           128  thrpt        6   9169128.441 ±  296419.993  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                   512  thrpt        6  15768165.220 ±  589215.966  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect           512  thrpt        6   2614215.362 ±  171099.973  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                  1024  thrpt        6   9846566.689 ±  447235.143  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect          1024  thrpt        6   1327731.561 ±   41147.584  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirect               1048576  thrpt        6     12467.127 ±     543.952  ops/s
     [java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect       1048576  thrpt        6      1333.941 ±      20.311  ops/s

     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped            128  thrpt        6  30545863.214 ± 2669919.886  ops/s
     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped            512  thrpt        6  14929967.141 ± 1596223.606  ops/s
     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped           1024  thrpt        6   9408037.238 ±  564849.404  ops/s
     [java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped        1048576  thrpt        6     12020.464 ±     417.515  ops/s
     [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped                  128  thrpt        6  12996481.274 ± 9216253.478  ops/s
     [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped                  512  thrpt        6   9632311.965 ± 4249496.365  ops/s
     [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped                 1024  thrpt        6   7068335.746 ± 2112734.871  ops/s
     [java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped              1048576  thrpt        6     12580.275 ±     838.737  ops/s
{noformat}

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch, Sample.java
>
>
> JDK 8 has support for an intrinsic for CRC32 that runs at 12-13 gigabytes/sec 
> per core in my quick and dirty test. PureJavaCRC32 is < 800 megabytes/sec if 
> I recall and it has a lookup table that evicts random cache lines every time 
> it runs.
> In order to capture the benefit of that when it is available we can select a 
> CRC32 implementation at startup in a static block.
> If JDK 8 is not what is running we can fall back to the existing 
> PureJavaCRC32 implementation.
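
The selection described above could look roughly like the sketch below. It is not the attached 8614.patch. The class names, the version check via the `java.specification.version` system property, and the `BitwiseCrc32` fallback are all assumptions made for illustration; `BitwiseCrc32` is a slow, table-free stand-in so the example is self-contained, whereas the real fallback would be the existing PureJavaCRC32.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical sketch: choose a CRC32 implementation once, in a static block. */
final class Crc32Factory {
    /** Stand-in for the pure-Java fallback (bitwise, no lookup table). */
    static final class BitwiseCrc32 implements Checksum {
        private int crc = 0xFFFFFFFF;

        @Override public void update(int b) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                // Reflected CRC-32 polynomial; the mask is all ones when the low bit is set.
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
            }
        }

        @Override public void update(byte[] buf, int off, int len) {
            for (int i = off; i < off + len; i++) {
                update(buf[i]);
            }
        }

        @Override public long getValue() { return (~crc) & 0xFFFFFFFFL; }

        @Override public void reset() { crc = 0xFFFFFFFF; }
    }

    private static final boolean USE_JDK_CRC32;

    static {
        // Decided once at class-load time, as the description suggests.
        USE_JDK_CRC32 = isJava8OrLater(System.getProperty("java.specification.version"));
    }

    /** Accepts both the old "1.7"/"1.8" style and the newer "9", "11", ... style. */
    static boolean isJava8OrLater(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        if (major == 1 && parts.length > 1) {
            major = Integer.parseInt(parts[1]);
        }
        return major >= 8;
    }

    /** Returns a fresh checksum of whichever kind was selected at startup. */
    static Checksum create() {
        return USE_JDK_CRC32 ? new CRC32() : new BitwiseCrc32();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        Checksum crc = create();
        crc.update(data, 0, data.length);
        System.out.println(crc.getClass().getSimpleName() + " -> "
                + Long.toHexString(crc.getValue()));
    }
}
```

Callers only see `Checksum`, so both implementations have to produce identical values for the same input; otherwise checksums written under one JVM would fail verification under another.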



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
