Sounds good! fwiw binprot simply lacks a method for batching responses. I'll be fixing it this year but that's going to be a big refactor of the old frontend code. There's some use to it but there should be an option for batching anyway.
VM's fine. I do most of my quick tests on a NUC, which is essentially a laptop with better cooling. Have some donated hardware from packet.net for bigger tests, thankfully :) mc-crusher is nice since it uses very little CPU compared to something like memtier. A lot of "memcached" benchmarks end up benchmarking the benchmark, and are only useful for measuring the relative latency of changes in the daemon, not scalability/throughput.

Just uploaded some fixes and an updated README file. Pretty sure I'm the only regular user of the thing, so please ask away if anything is weird/confusing.

have fun,
-Dormando

On Sun, 17 Mar 2019, 'Eamonn Nugent' via memcached wrote:
> Hi,
>
> I've been meaning to get myself dedicated test environments at one point, for my own stuff as well. Guess now is as good a time as any to try it out.
>
> I'll start using mc-crusher for benchmarks. I'm going to throw it on at least a VM for now, run some long-term benchmarks, and likely try out a couple algos to figure out which one runs best. Good to know that ascii is a different workload; I wasn't sure if the protocols were essentially aliased or each individually unique. Maybe if I get annoyed enough at syscalls, I'll find a way of writing an OS that lets user applications run safely without the same overhead. Tempting, tempting...
>
> No worries about it being a pain in the ass - this is the process behind good and reliable software, and I personally wouldn't want it any other way. I just wanted to get back with early results to get your feedback.
>
> Thanks,
>
> Eamonn
>
> On Sun, Mar 17, 2019 at 4:16 PM dormando <dorma...@rydia.net> wrote:
> > Yo,
> >
> > Fwiw, I use mc-crusher as the "official" benchmark:
> > https://github.com/memcached/mc-crusher
> > though I need to update the README slightly; will do that in a bit.
> > I also test on hardware uninterrupted by other things, with turbo disabled :) Testing on laptops can be really tough since you'll get different frequencies minute to minute. You have to interleave test runs A/B/A/B/A/B, then average to get through the noise.
> >
> > Also, make sure to test both binprot/ascii. With ascii multiget you can deeply pipeline requests and get more hashing vs syscalls.
> >
> > Also, test short vs long keys. A for loop and some scripting should get you there. :)
> >
> > I don't really want to add a ton of algos. More or less a "best average" is fine. Given the syscall overhead, hashing is barely measurable for a lot of workloads. When I switched from jenkins to murmur I did a somewhat comprehensive set of tests, then swapped the default and left the old one in just in case. I highly value good defaults! libmemcached ended up kitchen-sinking hash algos, and I think that didn't work out well in the long run.
> >
> > I did also test hash chain bucket depth a bit. Finally, load up different counts of keys (1m, 10m, 100m, etc) and re-run uniform random benchmarks, since fairness will affect the bucket depth and thus latency.
> >
> > Sorry if that's a pain in the ass, but this thing is quite widely used and there aren't really beta testers :) Have to be thorough these days.
> >
> > -Dormando
> >
> > On Sun, 17 Mar 2019, 'Eamonn Nugent' via memcached wrote:
> > > Reporting back with very preliminary benchmarks. Somehow, xxh64 is actually faster than xxh3 on my machine. One thing I forgot to mention before - I also looked at latencies with xxh32/64, and saw the 99th-percentile latency drop by about half compared to mmh3, so it could be beneficial in that sense. Latencies with xxh3 are in the 3.6ms 99% range, xxh64 goes down to about 3.0ms (I saw 2.5 yesterday; maybe testing on a laptop with about a billion Chrome tabs open isn't a brilliant idea), and mmh3 was in the 4.x ms range.
> > > All of this was with modern options, but with non-modern, xxh64 shone quite a bit; I was doing my testing there yesterday. I used the following memtier_benchmark command to stress test:
> > >
> > > ./memtier_benchmark -P memcache_binary -p 11211 --key-prefix="" --key-maximum=9999999999999999999
> > >
> > > A lot of this seems to be very architecture dependent. Maybe it would make sense to include a lot of hash algos long term, and let power users figure out which they feel like using? Not sure, though, and you're the expert here :P
> > >
> > > Thanks,
> > >
> > > Eamonn
> > >
> > > On Sun, Mar 17, 2019 at 2:59 PM Eamonn Nugent <eamonn.nug...@demilletech.net> wrote:
> > > > Hiya,
> > > >
> > > > Last night, I was running memtier_benchmark on my laptop (mid-2015 15" MBP, 2.5GHz 4c i7) and achieved about a 10-15% throughput improvement on both modern and non-modern settings with the 64-bit variant. The 32-bit variant was about equal in performance (the results showed them within about 3% of each other, but most of the difference was probably just noise). I was able to solve the 32/64-bit compile-time problem by adding a wrapper and some compile-time declarations, so I'd say that's about 50% solved for x86-based systems. But yeah, with ARM, it could turn interesting.
> > > >
> > > > As a next-ish step, I'm going to attempt to drop in xxh3, but since it's still in active development, it's probably not good for anything more than a tech demo. I'm happy to just go nuts adding a dozen different algos into hash.c, though, if it would help (cityhash/farmhash, as you mentioned). In xxhash's implementation, I played with some compile-time flags to make it a bit faster, and I've been toying with the idea of modifying it so no seed logic ever occurs, to maybe gain a couple cycles.
> > > > I'm also looking into finding a pure assembly version to squeeze a bit more speed out of the x86 and ARM versions. I should probably get one of my ARM systems running and test the difference...
> > > >
> > > > But hey, thanks for humoring me. Maybe next I'll take a look at the command reading & processing steps and see if there's anything I can do. Or maybe parallelizing rotl... Hm. I'll keep trying it out :)
> > > >
> > > > Thanks,
> > > >
> > > > Eamonn
> > > >
> > > > On Sun, Mar 17, 2019 at 2:46 PM dormando <dorma...@rydia.net> wrote:
> > > > > Hey,
> > > > >
> > > > > What exact test did you do?
> > > > >
> > > > > Well, to be honest I've been wanting to swap in xxhash for a long time, but in my own profiling other things show up higher than murmur, so I keep deprioritizing it :)
> > > > >
> > > > > One big problem with the hash algo is that mc keys can be short and are hashed one at a time. xxhash is more optimized for longer data (kilobytes to megabytes). The original author tries to address this with an updated algorithm:
> > > > > https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html
> > > > >
> > > > > xxhash makes significant use of instruction parallelism, such that if a key is 8 bytes or less you could end up waiting on the pipeline more than with murmur. Other algos like cityhash/farmhash are better at short keys, IIRC. Also, xx's 32-bit algo is a bit slower on 64-bit machines... so if I wanted to use it, I was going to test both the 32-bit and 64-bit hashes and then do compile-time testing to figure out which to use. It's also heavily x86-optimized, so we might have to default to something else for ARM.
> > > > >
> > > > > Sorry, not debated on the list, just in my own head :) It's not quite as straightforward as just dropping it in. If you're willing to get all the conditions tested, go nuts!
> > > > > :)
> > > > >
> > > > > -Dormando
> > > > >
> > > > > On Sat, 16 Mar 2019, eamonn.nugent via memcached wrote:
> > > > > > Hi there,
> > > > > >
> > > > > > I started using memcached in prod a week or two ago, and am loving it. I wanted to give back, and took a look through the issues board, but most of them looked solved. So, in my usual "it's never fast enough" style, I went and profiled its performance, and had some fun.
> > > > > >
> > > > > > After seeing that MurmurHash3 was taking a good amount of the execution time, I decided to run a test integrating one of my old favorite hash functions, xxhash. My guess is that memcached could benefit from this hash function, as it is faster than MMH3 and has several native variants. I ran some of my own tests and found roughly equal performance, but with no tuning performed on xxhash. For example, using an assembly (x86/arm/etc) version could likely speed up hashing, along with properly implementing it in memcached. However, I was also running this on a much older Nehalem CPU, so there could be unseen advantages to one or both of the algorithms on a newer CPU. I'm in the process of fighting with my newer systems to get libevent installed properly, so I'll report back with more up-to-date tests later.
> > > > > >
> > > > > > I did a cursory search but didn't find any mention of the algo on the mailing list. If this has been discussed, though, apologies for bringing it up again. Otherwise, I would be happy to write a PR to add it, using the `hash_algorithm` CLI arg.
> > > > > >
> > > > > > Thanks,
> > > > > > Eamonn
> > > > > >
> > > > > > --
> > > > > > ---
> > > > > > You received this message because you are subscribed to the Google Groups "memcached" group.
> > > > > > To unsubscribe from this group and stop receiving emails from it, send an email to memcached+unsubscr...@googlegroups.com.
> > > > > > For more options, visit https://groups.google.com/d/optout.