[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Component/s: Local Write-Read Paths
                 Compaction

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction, Local Write-Read Paths
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch, CRC32.class, Sample.java
>
>
> JDK 8 has support for an intrinsic for CRC32 that runs at 12-13 gigabytes/sec
> per core in my quick and dirty test. PureJavaCRC32 is < 800 megabytes/sec if
> I recall and it has a lookup table that evicts random cache lines every time
> it runs.
> In order to capture the benefit of that when it is available we can select a
> CRC32 implementation at startup in a static block.
> If JDK 8 is not what is running we can fall back to the existing
> PureJavaCRC32 implementation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
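The selection described in the ticket (pick a CRC32 implementation once, in a static block, and fall back when JDK 8 is not running) could look roughly like the sketch below. This is not the code from 8614.patch. The class names `Crc32Selector` and `TableCrc32` are hypothetical, the detection strategy (probing for `CRC32.update(ByteBuffer)`, which was added in Java 8) is an assumption, and `TableCrc32` is a simple stand-in for PureJavaCRC32 rather than the real class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical factory; names and detection strategy are illustrative only. */
final class Crc32Selector {
    /** True when java.util.zip.CRC32 exposes update(ByteBuffer), added in Java 8. */
    static final boolean JDK_CRC32_PREFERRED;

    static {
        boolean found;
        try {
            // Assumption: the presence of this overload is used as the "JDK 8 or later" signal.
            CRC32.class.getMethod("update", ByteBuffer.class);
            found = true;
        } catch (NoSuchMethodException e) {
            found = false;
        }
        JDK_CRC32_PREFERRED = found;
    }

    /** Returns the JDK implementation when preferred, otherwise the pure Java fallback. */
    static Checksum create() {
        return JDK_CRC32_PREFERRED ? new CRC32() : new TableCrc32();
    }

    /** Stand-in for PureJavaCRC32: a one-byte-at-a-time table-driven CRC-32. */
    static final class TableCrc32 implements Checksum {
        private static final int[] TABLE = new int[256];

        static {
            // Build the lookup table for the reflected polynomial 0xEDB88320.
            for (int i = 0; i < 256; i++) {
                int c = i;
                for (int k = 0; k < 8; k++) {
                    c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
                }
                TABLE[i] = c;
            }
        }

        private int crc = 0xFFFFFFFF;

        @Override
        public void update(int b) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        }

        @Override
        public void update(byte[] b, int off, int len) {
            for (int i = off; i < off + len; i++) {
                update(b[i]);
            }
        }

        @Override
        public long getValue() {
            return (~crc) & 0xFFFFFFFFL;
        }

        @Override
        public void reset() {
            crc = 0xFFFFFFFF;
        }
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        Checksum checksum = create();
        checksum.update(data, 0, data.length);
        System.out.println(checksum.getClass().getName() + " -> " + Long.toHexString(checksum.getValue()));
    }
}
```

Callers only ever see `Checksum`, so the choice is made once at class-initialization time and costs nothing per call beyond the interface dispatch.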
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment: CRC32.class

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch, CRC32.class, Sample.java
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Reviewer: Benedict

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch, Sample.java
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment: 8614.patch

New patch addressing a few things.

I had to add a stub Checksum class that declares the JDK 8 byte buffer methods so there is something to compile against. javac is surprisingly OK with just being pointed at a source file. I fixed the formatting and added a test to make sure JDK detection actually detects JDK 8 and hands back the intrinsic-backed implementation.

I think this should go in so we at least get it for the commit log.

It looks like Adler32 is not fast in JDK 8 on Linux. It is inexplicably fast on OS X, the same speed as CRC32, and I don't have an explanation for those numbers. On Linux I get the expected behavior: both disabling the intrinsic and switching to JDK 7 are slow.

I will create a separate ticket to discuss the right way to replace Adler32 with CRC32 in SSTables.

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch, Sample.java
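For readers unfamiliar with the JDK 8 byte buffer methods mentioned above: `java.util.zip.CRC32` gained an `update(ByteBuffer)` overload in Java 8 that checksums the bytes between the buffer's position and limit and leaves the position at the limit. The sketch below is only an illustration of that overload on heap and direct buffers; the class name `ByteBufferCrcDemo` and its helpers are hypothetical and are not part of the patch. It requires Java 8 or later to compile.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32;

/** Hypothetical demo of the Java 8 CRC32.update(ByteBuffer) overload. */
final class ByteBufferCrcDemo {
    /** Checksum of a whole byte array via the pre-Java 8 array API. */
    static long crcOfArray(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Checksum of the remaining bytes of a buffer; the buffer's position ends at its limit. */
    static long crcOfBuffer(ByteBuffer buf) {
        CRC32 crc = new CRC32();
        crc.update(buf);
        return crc.getValue();
    }

    /** Copies the array into a freshly allocated direct buffer, ready for reading. */
    static ByteBuffer directCopy(byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        return direct;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        new Random(42).nextBytes(data);

        long fromArray = crcOfArray(data);
        long fromHeap = crcOfBuffer(ByteBuffer.wrap(data));
        long fromDirect = crcOfBuffer(directCopy(data));

        System.out.println("array  = " + Long.toHexString(fromArray));
        System.out.println("heap   = " + Long.toHexString(fromHeap));
        System.out.println("direct = " + Long.toHexString(fromDirect));
    }
}
```

All three paths should report the same value, since they checksum the same bytes; only the speed differs between them.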
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment:     (was: 8614.patch)

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: Sample.java
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8614:
--------------------------------
    Assignee: Ariel Weisberg  (was: Benedict)

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch, Sample.java
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment: Sample.java

JMH benchmark. For small sizes the gain is smaller (roughly 1.3x at 128 bytes in the array case), but at a kilobyte it is many times faster. It also doesn't evict random cache lines, so the impact could be larger than what the micro benchmark shows. For large sizes it does indeed reach about 13 gigabytes/sec, which is pretty crazy.

There is a performance delta between direct and non-direct byte buffers in favor of direct byte buffers; in the one case I looked at, direct was 2x faster.

{noformat}
[java] Benchmark                                       (byteSize)   Mode  Samples         Score         Error  Units
[java] o.a.c.t.m.Sample.CRC32Array                            128  thrpt        6  13905041.788 ±  598179.976  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32                         128  thrpt        6  10525663.252 ±  507525.667  ops/s
[java] o.a.c.t.m.Sample.CRC32Array                            512  thrpt        6  14571599.254 ± 8930061.376  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32                         512  thrpt        6   2835430.274 ±   92029.259  ops/s
[java] o.a.c.t.m.Sample.CRC32Array                           1024  thrpt        6   8337714.641 ± 3988493.638  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32                        1024  thrpt        6   1428928.434 ±   31709.319  ops/s
[java] o.a.c.t.m.Sample.CRC32Array                        1048576  thrpt        6     12364.723 ±     344.434  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32                     1048576  thrpt        6      1412.017 ±      89.214  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBuffer                       128  thrpt        6  15925509.375 ±  779733.985  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer               128  thrpt        6  10446360.681 ±  599847.210  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBuffer                       512  thrpt        6  10906108.722 ±  346735.334  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer               512  thrpt        6   2873179.754 ±  140004.771  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBuffer                      1024  thrpt        6   6582936.616 ± 2219292.645  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer              1024  thrpt        6   1440343.345 ±   42303.806  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBuffer                   1048576  thrpt        6     12555.846 ±     514.918  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBuffer           1048576  thrpt        6      1414.886 ±      58.363  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                 128  thrpt        6  31786603.552 ± 2000265.643  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect         128  thrpt        6   9169128.441 ±  296419.993  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                 512  thrpt        6  15768165.220 ±  589215.966  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect         512  thrpt        6   2614215.362 ±  171099.973  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirect                1024  thrpt        6   9846566.689 ±  447235.143  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect        1024  thrpt        6   1327731.561 ±   41147.584  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirect             1048576  thrpt        6     12467.127 ±     543.952  ops/s
[java] o.a.c.t.m.Sample.PureJavaCrc32ByteBufferDirect     1048576  thrpt        6      1333.941 ±      20.311  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped          128  thrpt        6  30545863.214 ± 2669919.886  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped          512  thrpt        6  14929967.141 ± 1596223.606  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped         1024  thrpt        6   9408037.238 ±  564849.404  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferDirectWrapped      1048576  thrpt        6     12020.464 ±     417.515  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped                128  thrpt        6  12996481.274 ± 9216253.478  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped                512  thrpt        6   9632311.965 ± 4249496.365  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped               1024  thrpt        6   7068335.746 ± 2112734.871  ops/s
[java] o.a.c.t.m.Sample.CRC32ByteBufferWrapped            1048576  thrpt        6     12580.275 ±     838.737  ops/s
{noformat}

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://is
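The benchmark reports operations per second, while the prose talks in gigabytes per second, so a small conversion helps when reading the table. The sketch below is just arithmetic on two rows copied from the output above (the 1048576-byte `CRC32Array` and `PureJavaCrc32` rows). The class `ThroughputMath` and its method names are hypothetical, and it assumes each operation checksums exactly `byteSize` bytes and that a gigabyte is 10^9 bytes.

```java
/** Hypothetical helper for converting JMH ops/s scores into byte throughput. */
final class ThroughputMath {
    /** Throughput in gigabytes (10^9 bytes) per second for a given score and input size. */
    static double gigabytesPerSecond(double opsPerSecond, long bytesPerOp) {
        return opsPerSecond * bytesPerOp / 1e9;
    }

    /** Ratio of two scores measured at the same input size. */
    static double speedup(double fasterOpsPerSecond, double slowerOpsPerSecond) {
        return fasterOpsPerSecond / slowerOpsPerSecond;
    }

    public static void main(String[] args) {
        long oneMiB = 1048576L;
        double intrinsicOps = 12364.723; // CRC32Array, byteSize 1048576
        double pureJavaOps = 1412.017;   // PureJavaCrc32, byteSize 1048576

        System.out.printf("CRC32Array:    %.2f GB/s%n", gigabytesPerSecond(intrinsicOps, oneMiB));
        System.out.printf("PureJavaCrc32: %.2f GB/s%n", gigabytesPerSecond(pureJavaOps, oneMiB));
        System.out.printf("speedup:       %.1fx%n", speedup(intrinsicOps, pureJavaOps));
    }
}
```

Under those assumptions the 1 MiB array row works out to just under 13 GB/s for the JDK implementation, which matches the figure quoted in the comment.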
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment: 8614.patch

Patch including all/missing files

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment:     (was: 8614.patch)

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>              Labels: performance
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment:     (was: CRC32FactoryTest.java)

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>              Labels: performance
[jira] [Updated] (CASSANDRA-8614) Select optimal CRC32 implementation at runtime
[ https://issues.apache.org/jira/browse/CASSANDRA-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ariel Weisberg updated CASSANDRA-8614:
--------------------------------------
    Attachment: CRC32FactoryTest.java
                8614.patch

The patch compiles on Java 7, and when run on Java 8 you get the intrinsic. There is a test case to validate that the two checksum implementations behave the same.

> Select optimal CRC32 implementation at runtime
> ----------------------------------------------
>
>                 Key: CASSANDRA-8614
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8614
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>              Labels: performance
>         Attachments: 8614.patch, CRC32FactoryTest.java
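A test that two checksum implementations behave the same can be written as a randomized comparison over many inputs, offsets, and lengths. The sketch below is not CRC32FactoryTest.java, whose contents are not shown in this thread. It is a minimal illustration that compares `java.util.zip.CRC32` against a bit-at-a-time reference CRC-32, which stands in for PureJavaCRC32. The class name `CrcEquivalenceCheck`, the seed, and the size bounds are all assumptions.

```java
import java.util.Random;
import java.util.zip.CRC32;

/** Hypothetical equivalence check between the JDK CRC32 and a reference implementation. */
final class CrcEquivalenceCheck {
    /** Bit-at-a-time CRC-32 using the reflected polynomial 0xEDB88320. */
    static long bitwiseCrc32(byte[] data, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc ^= data[i] & 0xFF;
            for (int k = 0; k < 8; k++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ 0xEDB88320 : crc >>> 1;
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    /** The same range checksummed with the JDK implementation. */
    static long jdkCrc32(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    /** Runs random inputs through both implementations and counts disagreements. */
    static int countMismatches(long seed, int rounds) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int round = 0; round < rounds; round++) {
            byte[] data = new byte[random.nextInt(4096) + 1];
            random.nextBytes(data);
            // Random sub-range, including zero-length ranges.
            int off = random.nextInt(data.length);
            int len = random.nextInt(data.length - off + 1);
            if (bitwiseCrc32(data, off, len) != jdkCrc32(data, off, len)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(8614L, 1000));
    }
}
```

Exercising random offsets and lengths matters because a range-handling bug in either implementation would not show up when only whole arrays are compared.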