See also: http://cr.openjdk.java.net/~drchase/7088419/webrev.01/

This is the version that was not coded as an intrinsic and that also included
fork/join. The crazy Intel instructions are accessed from native code, so you can
get a feel for what the code looks like before it is converted to an intrinsic.
There’s a certain amount of brain-hurt involved in the fork/join code, but it works.

I’m still trying to figure out if the whole thing is just bit-flipped.

David

On 2014-10-17, at 4:50 PM, David Chase <david.r.ch...@oracle.com> wrote:

>
> On 2014-10-17, at 2:53 PM, Staffan Friberg <staffan.frib...@oracle.com> wrote:
>
>> Fully agree that using Unsafe makes one sad.
>>
>> I'm just about to send out a new webrev with Alan's and Peter's comments,
>> once I have that done I will give using the NIO-Buffer API a second try to
>> see if using IntBuffer and LongBuffer is able to achieve similar performance.
>>
>> As I noted in my reply the second goal after adding this API is to
>> create intrinsics that make use of the crc32c instructions available on both
>> x86 and SPARC which will bump the performance even further. So one thing I
>> try to do is make sure the implementation makes it easy to do that without
>> having to completely rewrite it again.
>
> I’d like to review this, but it will take me a little bit of time.
> Recall that I did a lot of work on CRC for Intel to take advantage
> of the carryless multiply instructions for CRC32.
>
> Reading the comments, if I understand this properly, the difference between
> CRC32 and CRC32C is that CRC32C is just the bit or byte flip of CRC32 as
> we currently compute it. If so, wouldn’t it make more sense to not reinvent
> that rather tricky wheel? The code you have carefully written will run substantially
> slower than CRC32 on recent Intel hardware (Haswell and newer in particular)
> because there an intrinsic is already substituted.
>
> Can you verify whether this bit/byte flipping equivalence holds, or not?
>
> If we were interested in true peak performance, we’d also investigate
> fork/join parallelism; I did this once and it worked just fine if you made the
> block sizes large enough.
>
> David
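Staffan mentions giving the NIO-Buffer API another try in place of Unsafe. A minimal sketch of what that could look like, assuming a table-driven "slicing-by-4" loop (a swapped-in technique, not necessarily what the webrev uses): a little-endian view of a `ByteBuffer` is read with `getInt()` four bytes at a time, and the tail is finished byte by byte. The class name `Crc32cSliceBy4` and its methods are hypothetical, and no performance claim relative to the Unsafe version is implied.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: CRC32C (reflected Castagnoli polynomial) using slicing-by-4
// over a ByteBuffer instead of Unsafe word reads.
final class Crc32cSliceBy4 {
    private static final int POLY = 0x82F63B78; // reflected form of 0x1EDC6F41
    private static final int[][] T = new int[4][256];

    static {
        // T[0] is the ordinary byte-at-a-time table.
        for (int i = 0; i < 256; i++) {
            int crc = i;
            for (int k = 0; k < 8; k++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ POLY : crc >>> 1;
            }
            T[0][i] = crc;
        }
        // T[k][i] advances T[k-1][i] through one more zero byte.
        for (int k = 1; k < 4; k++) {
            for (int i = 0; i < 256; i++) {
                int prev = T[k - 1][i];
                T[k][i] = (prev >>> 8) ^ T[0][prev & 0xFF];
            }
        }
    }

    /** Updates a raw (non-inverted) CRC32C register; the caller's buffer is left untouched. */
    static int update(int crc, ByteBuffer buf) {
        ByteBuffer le = buf.slice().order(ByteOrder.LITTLE_ENDIAN);
        while (le.remaining() >= 4) {
            int x = crc ^ le.getInt(); // first stream byte lands in the low 8 bits
            crc = T[3][x & 0xFF]
                ^ T[2][(x >>> 8) & 0xFF]
                ^ T[1][(x >>> 16) & 0xFF]
                ^ T[0][x >>> 24];
        }
        while (le.hasRemaining()) {
            crc = (crc >>> 8) ^ T[0][(crc ^ le.get()) & 0xFF];
        }
        return crc;
    }

    /** CRC32C with the usual 0xFFFFFFFF initial value and final inversion. */
    static int crc32c(byte[] data) {
        return ~update(0xFFFFFFFF, ByteBuffer.wrap(data));
    }
}
```

Reading through a `ByteBuffer` keeps bounds checking inside the JDK rather than in hand-computed `Unsafe` offsets; whether the JIT removes enough of that checking to match the `Unsafe` loop is the part that would need measuring.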
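David asks whether CRC32C is just a bit or byte flip of CRC32. One quick way to probe that, sketched under the assumption that comparing outputs on the standard check string `"123456789"` is a fair first test: compute CRC32 with `java.util.zip.CRC32`, compute CRC32C with a plain bitwise loop over the reflected Castagnoli polynomial `0x82F63B78`, and compare the result against `Integer.reverse` and `Integer.reverseBytes` of the CRC32 value. The class name `Crc32VsCrc32c` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical check: is CRC32C equal to a bit-reversed or byte-swapped CRC32?
final class Crc32VsCrc32c {
    // Reflected form of the Castagnoli polynomial 0x1EDC6F41 used by CRC32C.
    private static final int CASTAGNOLI_REFLECTED = 0x82F63B78;

    // Bitwise CRC32C: init 0xFFFFFFFF, reflected input/output, final XOR 0xFFFFFFFF.
    static int crc32c(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= b & 0xFF;
            for (int k = 0; k < 8; k++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ CASTAGNOLI_REFLECTED : crc >>> 1;
            }
        }
        return ~crc;
    }

    // CRC32 (IEEE polynomial) via the existing JDK class.
    static int crc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return (int) c.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        int a = crc32(check);
        int c = crc32c(check);
        System.out.printf("CRC32              = %08x%n", a);
        System.out.printf("CRC32C             = %08x%n", c);
        System.out.printf("bit-reversed CRC32 = %08x%n", Integer.reverse(a));
        System.out.printf("byte-swapped CRC32 = %08x%n", Integer.reverseBytes(a));
        System.out.println("flip equivalent? "
                + (c == Integer.reverse(a) || c == Integer.reverseBytes(a)));
    }
}
```

The published check values are 0xCBF43926 for CRC32 and 0xE3069283 for CRC32C, and neither `Integer.reverse(0xCBF43926)` (0x649C2FD3) nor `Integer.reverseBytes(0xCBF43926)` (0x2639F4CB) equals the latter. That is consistent with the two CRCs using different generator polynomials (0x04C11DB7 for CRC32, 0x1EDC6F41 for CRC32C), so a fixed flip of the output cannot map one onto the other.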
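On the fork/join point: splitting the input only pays off if per-block CRCs can be merged cheaply. A minimal sketch, assuming zlib's `crc32_combine` approach (repeatedly squaring a GF(2) "append zero bits" operator, so a merge costs O(log n) in the right-hand block length) ported to the CRC32C polynomial; how the webrev above actually merges blocks is not reproduced here. `Crc32cForkJoin`, `THRESHOLD`, and the 64 KiB leaf size are hypothetical, chosen only to reflect the remark that block sizes must be large enough.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: fork/join CRC32C with a zlib-style combine step.
final class Crc32cForkJoin {
    private static final int POLY = 0x82F63B78;   // reflected Castagnoli polynomial
    private static final int THRESHOLD = 1 << 16; // hypothetical leaf size
    private static final int[] TABLE = new int[256];

    static {
        for (int i = 0; i < 256; i++) {
            int crc = i;
            for (int k = 0; k < 8; k++) crc = (crc & 1) != 0 ? (crc >>> 1) ^ POLY : crc >>> 1;
            TABLE[i] = crc;
        }
    }

    /** Sequential CRC32C of data[from, to). */
    static int crc32c(byte[] data, int from, int to) {
        int crc = 0xFFFFFFFF;
        for (int i = from; i < to; i++) crc = (crc >>> 8) ^ TABLE[(crc ^ data[i]) & 0xFF];
        return ~crc;
    }

    // 32x32 GF(2) matrix times vector; mat[i] is the image of bit i.
    private static int gf2Times(int[] mat, int vec) {
        int sum = 0;
        for (int i = 0; vec != 0; i++, vec >>>= 1) {
            if ((vec & 1) != 0) sum ^= mat[i];
        }
        return sum;
    }

    private static void gf2Square(int[] square, int[] mat) {
        for (int n = 0; n < 32; n++) square[n] = gf2Times(mat, mat[n]);
    }

    /** crc(A||B) from crc(A), crc(B) and len(B) in bytes. */
    static int combine(int crcA, int crcB, long lenB) {
        if (lenB <= 0) return crcA;
        int[] even = new int[32];
        int[] odd = new int[32];
        odd[0] = POLY;                            // operator for one zero bit
        for (int n = 1; n < 32; n++) odd[n] = 1 << (n - 1);
        gf2Square(even, odd);                     // two zero bits
        gf2Square(odd, even);                     // four zero bits
        do {
            gf2Square(even, odd);                 // first pass: one zero byte
            if ((lenB & 1) != 0) crcA = gf2Times(even, crcA);
            lenB >>= 1;
            if (lenB == 0) break;
            gf2Square(odd, even);
            if ((lenB & 1) != 0) crcA = gf2Times(odd, crcA);
            lenB >>= 1;
        } while (lenB != 0);
        return crcA ^ crcB;
    }

    static final class Task extends RecursiveTask<Integer> {
        private final byte[] data;
        private final int from, to;

        Task(byte[] data, int from, int to) { this.data = data; this.from = from; this.to = to; }

        @Override
        protected Integer compute() {
            if (to - from <= THRESHOLD) return crc32c(data, from, to);
            int mid = (from + to) >>> 1;
            Task left = new Task(data, from, mid);
            left.fork();                          // left half runs asynchronously
            int right = new Task(data, mid, to).compute();
            return combine(left.join(), right, to - mid);
        }
    }

    static int parallelCrc32c(byte[] data) {
        return ForkJoinPool.commonPool().invoke(new Task(data, 0, data.length));
    }
}
```

Each merge costs a few 32x32 GF(2) matrix squarings, growing only logarithmically with the block length and never touching the data, so the overhead is amortized well when leaves are large and dominates when they are small.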