On Sat, Sep 13, 2014 at 10:27 PM, k...@rice.edu <k...@rice.edu> wrote:

> On Sat, Sep 13, 2014 at 09:50:55PM -0300, Arthur Silva wrote:
> > On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > > Andres Freund <and...@2ndquadrant.com> writes:
> > > > On 2014-09-13 08:52:33 +0300, Ants Aasma wrote:
> > > >> On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva <arthur...@gmail.com>
> > > wrote:
> > > >>> That's not entirely true. CRC-32C beats pretty much everything with
> > > the same
> > > >>> length quality-wise and has both hardware implementations and
> highly
> > > >>> optimized software versions.
> > >
> > > >> For better or for worse CRC is biased by detecting all single bit
> > > >> errors, the detection capability of larger errors is slightly
> > > >> diminished. The quality of the other algorithms I mentioned is also
> > > >> very good, while producing uniformly varying output.
> > >
> > > > There's also much more literature about the various CRCs in
> comparison
> > > > to some of these hash allgorithms.
> > >
> > > Indeed.  CRCs have well-understood properties for error detection.
> > > Have any of these new algorithms been analyzed even a hundredth as
> > > thoroughly?  No.  I'm unimpressed by evidence-free claims that
> > > something else is "also very good".
> > >
> > > Now, CRCs are designed for detecting the sorts of short burst errors
> > > that are (or were, back in the day) common on phone lines.  You could
> > > certainly make an argument that that's not the type of threat we face
> > > for PG data.  However, I've not seen anyone actually make such an
> > > argument, let alone demonstrate that some other algorithm would be
> better.
> > > To start with, you'd need to explain precisely what other error pattern
> > > is more important to defend against, and why.
> > >
> > >                         regards, tom lane
> > >
> >
> > Mysql went this way as well, changing the CRC polynomial in 5.6.
> >
> > What we are looking for here is uniqueness thus better error detection.
> Not
> > avalanche effect, nor cryptographically secure, nor bit distribution.
> > As far as I'm aware CRC32C is unbeaten collision wise and time proven.
> >
> > I couldn't find tests with xxhash and crc32 on the same hardware so I
> spent
> > some time putting together a benchmark (see attachment, to run it just
> > start run.sh)
> >
> > I included a crc32 implementation using ssr4.2 instructions (which works
> on
> > pretty much any Intel processor built after 2008 and AMD built after
> 2012),
> > a portable Slice-By-8 software implementation and xxhash since it's the
> > fastest software 32bit hash I know of.
> >
> > Here're the results running the test program on my i5-4200M
> >
> > crc sb8: 90444623
> > elapsed: 0.513688s
> > speed: 1.485220 GB/s
> >
> > crc hw: 90444623
> > elapsed: 0.048327s
> > speed: 15.786877 GB/s
> >
> > xxhash: 7f4a8d5
> > elapsed: 0.182100s
> > speed: 4.189663 GB/s
> >
> > The hardware version is insanely and works on the majority of Postgres
> > setups and the fallback software implementations is 2.8x slower than the
> > fastest 32bit hash around.
> >
> > Hopefully it'll be useful in the discussion.
>
> Thank you for running this sample benchmark. It definitely shows that the
> hardware version of the CRC is very fast, unfortunately it is really only
> available on x64 Intel/AMD processors which leaves all the rest lacking.
> For current 64-bit hardware, it might be instructive to also try using
> the XXH64 version and just take one half of the hash. It should come in
> at around 8.5 GB/s, or very nearly the speed of the hardware accelerated
> CRC. Also, while I understand that CRC has a very venerable history and
> is well studied for transmission type errors, I have been unable to find
> any research on its applicability to validating file/block writes to a
> disk drive. While it is to quote you "unbeaten collision wise", xxhash,
> both the 32-bit and 64-bit version are its equal. Since there seems to
> be a lack of research on disk based error detection versus CRC polynomials,
> it seems likely that any of the proposed hash functions are on an equal
> footing in this regard. As Andres commented up-thread, xxhash comes along
> for "free" with lz4.
>
> Regards,
> Ken
>

For the sake of completeness the results for xxhash64 in my machine

xxhash64
speed:  7.365398 GB/s

Which is indeed very fast.

Reply via email to