See also: http://cr.openjdk.java.net/~drchase/7088419/webrev.01/

This is the version that was not coded as an intrinsic, that also included
fork/join.  The crazy Intel instructions are accessed from native code,
so you can get a feel for what the code looks like before it is converted
to an intrinsic.  There’s a certain amount of brain-hurt involved in the
fork/join code, but it works.

I’m still trying to figure out if the whole thing is just bit-flipped.

David

On 2014-10-17, at 4:50 PM, David Chase <david.r.ch...@oracle.com> wrote:

> 
> On 2014-10-17, at 2:53 PM, Staffan Friberg <staffan.frib...@oracle.com> wrote:
> 
>> Fully agree that using Unsafe makes one sad.
>> 
>> I'm just about to send out a new webrev with Alan's and Peter's comments, 
>> once I have that done I will give using the NIO-Buffer API a second try to 
>> see if using IntBuffer and LongBuffer is able to achieve similar performance.
>> 
>> As I noted in my reply the second goal after adding this API will is to 
>> create intrinsics that make use of the crc32c instructions available on both 
>> x86 and SPARC which will bump the performance even further. So one thing I 
>> try to do is make sure the implementation makes it easy to do that without 
>> having to completely rewrite it again.
> 
> I’d like to review this, but it will take me a little bit of time.
> Recall that I did a lot of work on CRC for Intel to take advantage
> of the carryless multiply instructions for CRC32.
> 
> Reading the comments, if I understand this properly, the difference between
> CRC32 and CRC32C is that CRC32C is just the bit or byte flip of CRC32 as
> we currently compute it.   If so, wouldn’t it make more sense to not reinvent
> that rather tricky wheel?  The code you have carefully written will run 
> substantially
> slower than CRC32 on recent Intel hardware (Haswell and newer in particular)
> because there an intrinsic is already substituted.
> 
> Can you verify whether this bit/byte flipping equivalence holds, or not?
> 
> If we were interested in true peak performance, we’d also investigate
> fork/join parallelism; I did this once and it worked just fine if you made the
> block sizes large enough.
> 
> David
> 

Reply via email to