Re: Scalability and benchmarks

2010-07-09 Thread Guille -bisho-
 I hadn't read the paper, just the abstract of the talk at Velocity. It
 read like FUD, so I wanted to confirm/deny that here. Thanks to Vladimir
 for posting the link and allowing us to sort that out.


Well, there are some bottlenecks. For example, with UDP memcached wakes
up all of its worker threads (children) when a packet arrives, which degrades
performance a lot and limits it to one core.
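
To make that concrete, here is a minimal sketch of the pattern (my own
illustration, not memcached's actual event loop; the port and thread count are
just placeholders): every worker thread watches the same non-blocking UDP
socket, so one arriving datagram makes the socket readable in all of them,
they all race to recvfrom(), one wins and the rest have wasted a wakeup.

    /* Illustrative sketch only (not memcached source): every worker thread
     * watches the SAME non-blocking UDP socket. When one datagram arrives,
     * poll() reports it readable in every thread; they all race to
     * recvfrom(), one wins and the rest get EWOULDBLOCK. */
    #include <arpa/inet.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <netinet/in.h>
    #include <poll.h>
    #include <pthread.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    #define NUM_WORKERS 4

    static int udp_fd;   /* one socket shared by all workers */

    static void *worker(void *arg) {
        (void)arg;
        char buf[1500];
        struct pollfd pfd = { .fd = udp_fd, .events = POLLIN };
        for (;;) {
            if (poll(&pfd, 1, -1) <= 0)
                continue;                 /* every idle worker wakes here */
            ssize_t n = recvfrom(udp_fd, buf, sizeof(buf), 0, NULL, NULL);
            if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
                continue;                 /* lost the race; wasted wakeup */
            /* parse the request, look up the key, sendto() the reply ... */
        }
        return NULL;
    }

    int main(void) {
        struct sockaddr_in addr;
        pthread_t tid[NUM_WORKERS];

        udp_fd = socket(AF_INET, SOCK_DGRAM, 0);
        fcntl(udp_fd, F_SETFL, O_NONBLOCK);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(11211);     /* placeholder port */
        bind(udp_fd, (struct sockaddr *)&addr, sizeof(addr));

        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

How badly that hurts depends on the kernel and the load, but the
serialization on the single shared socket is the core of the complaint.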

-- 
Guille -ℬḭṩḩø- bishi...@gmail.com
:wq


Re: Scalability and benchmarks

2010-07-08 Thread Simon Riggs
On Thu, 2010-07-01 at 07:37 -0700, dormando wrote:

 First, "Hidden scalability gotchas" - for the love of fuck, nobody gets
 hit by that. Especially nobody attending velocity. I'm not even sure
 facebook bothered to scale that lock.
 
 Given the title, the overtly academic content, and the lack of serious
 discussion as to the application of such knowledge, we end up with stupid
 threads like this. Imagine how many people are just walking away with that
 poor idea of "holy shit I should use REDIS^WCassandra^Wetc because
 memcached doesn't scale!" - without anyone to inform them that redis isn't
 even multithreaded and cassandra apparently sucks at scaling down. Google
 is absolutely loaded with people benchmarking various shit vs memcached
 and declaring that memcached is slower, despite the issue being in the
 *client software* or even the benchmark itself.
 
 People can't understand the difference! Please don't confuse them more!
 
 There are *many* more interesting topics with scalability gotchas, like
 the memory reallocation problem that we're working on. That one *actually
 affects people*, is mitigated through education, and will be solved
 through code. Other NoSQL solutions are absolutely riddled with
 usability bugs that have nothing to do with how many non-fsync'd writes
 you can push through a cluster per second. What separates academic wank
 from truly useful topics is whether or not you can take the subject (given
 the context applied to it) and actually do anything with it.
 
 There're dozens of real problems to pick on us about - I sorely wish
 people would stop hyperfocusing on the ones that don't matter. Sorry if
 I've picked on your favorite memcached alternative; I don't really care
 for a rebuttal, I'm sure everyone's working on fixing their problems :P

Yeh, agreed.

I hadn't read the paper, just the abstract of the talk at Velocity. It
read like FUD, so I wanted to confirm/deny that here. Thanks to Vladimir
for posting the link and allowing us to sort that out.

The paper is laughably academico-technical. The measurements look good, not
bad, and as you say, in absolute terms they are really very good. The title
certainly doesn't match the contents.

One point of note was that performance on SPARC was lower than on other
CPUs, so Sun/Oracle were hit the hardest. They made no attempt to resolve
that in more conventional ways, nor did they even point out that their own
hardware had come out poorly.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services



Re: Scalability and benchmarks

2010-07-01 Thread Vladimir Vuksan
I believe the FUD you are referring to was the following presentation at
Velocity:

Hidden Scalability Gotchas in Memcached and Friends

http://en.oreilly.com/velocity2010/public/schedule/detail/13046

There is a link to the PDF of slides so you can see what they talked
about. Here is the short link to it

http://j.mp/cgIsE9

It was an example of applying a model to a set of controlled measurements,
not necessarily picking on memcached. One of the findings was that
contention increased from about 2.5% to 9.8% between versions 1.2.8 and
1.4.5. Another is that memcached didn't perform well beyond 6 threads,
which they attributed to locks. At Sun they coded some patches that
replaced the single lock with multiple locks; you can view the performance
on slide 25.
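
For readers who haven't seen the slides, the general shape of that change is
lock striping. The following is a rough sketch of the technique only, not the
actual Sun patch; hash_lookup() is a stub standing in for the real hash-table
walk.

    /* Lock-striping sketch (not the actual Sun patch); hash_lookup() is a
     * hypothetical stub, not a real memcached function. */
    #include <pthread.h>
    #include <stdint.h>

    #define LOCK_STRIPES 32      /* power of two, picked arbitrarily here */

    static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER; /* "before" */
    static pthread_mutex_t stripes[LOCK_STRIPES];                   /* "after"  */

    void stripes_init(void) {
        for (int i = 0; i < LOCK_STRIPES; i++)
            pthread_mutex_init(&stripes[i], NULL);
    }

    static void *hash_lookup(uint32_t keyhash) {  /* stub for the sketch */
        (void)keyhash;
        return NULL;
    }

    /* Before: every worker serializes on one mutex, whatever key it wants. */
    void *get_with_global_lock(uint32_t keyhash) {
        pthread_mutex_lock(&global_lock);
        void *it = hash_lookup(keyhash);
        pthread_mutex_unlock(&global_lock);
        return it;
    }

    /* After: threads contend only when their keys fall in the same stripe. */
    void *get_with_striped_lock(uint32_t keyhash) {
        pthread_mutex_t *m = &stripes[keyhash & (LOCK_STRIPES - 1)];
        pthread_mutex_lock(m);
        void *it = hash_lookup(keyhash);
        pthread_mutex_unlock(m);
        return it;
    }

    int main(void) {
        stripes_init();
        get_with_global_lock(12345u);
        get_with_striped_lock(12345u);
        return 0;
    }

With the single lock, every worker serializes on one mutex regardless of which
key it wants; with striping, two threads only contend when their keys land in
the same stripe.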

The measurements do show that memcached performs extremely well, i.e. 300k
ops on a single instance; however, memcached was not able to take advantage
of additional threads.

The key takeaway from this talk was not that memcached doesn't scale, but
that it can perform even better.

Vladimir

On Wed, 2010-06-30 at 11:35 -0700, dormando wrote:

  I've seen some FUD from people claiming that memcached doesn't scale
  very well on multiple CPUs, which surprised me.
 
  Is there an accepted benchmark we can use to examine performance in more
  detail?
 
  Does anybody have any testing results in that area?
 
 While we don't have a standard set of benchmarks yet, we do test it,
 people have tested it, and it's routinely heavily hammered in production
 all over the place, including several top 10 websites.
 
 The FUD is easy enough to dispel: just squint at their numbers a little
 bit and think it through carefully.
 
 Most of these are saying that you hit a wall scaling memcached past,
 say, 300,000 requests per second on a single box. (though I think with the
 latest 1.4 it's easier to hit 500,000+). Remember that 300,000 requests
 per second at 1k per request is over 2.5 gigabits of outbound alone.
 
 If your requests are much smaller than that, you might skid in under
 1gbps. For sanity's sake though, take a realistic look at how many
 requests per second you actually hit memcached with. 100,000+ per second
 per box is unlikely but sometimes happens, and works fine.
 
 Even large sites are likely doing less than that, as they will need more
 memory before hosing a box.
 
 "memcached doesn't scale!" is a pile of horseshit. The same industry
 claims all the single-threaded NoSQL stuff "Scales just fine". Figure out
 what your own needs are for requests per second, then try to do that with
 your hardware. Odds are good you'll be able to hit 10x-100x that mark.
 
 I think sometime this year we'll be making it scale across CPUs better
 than it presently does, but the only people who would ever notice would be
 users with massive memcached instances on 10gbps ethernet servicing small
 requests. I'm sure there's someone out there like that, but I doubt anyone
 listening to the FUD would be one of them.
 
 -Dormando




Re: Scalability and benchmarks

2010-07-01 Thread dormando


On Thu, 1 Jul 2010, Vladimir Vuksan wrote:

 I believe the FUD you are referring to was following presentation at Velocity

 Hidden Scalability Gotchas in Memcached and Friends

 http://en.oreilly.com/velocity2010/public/schedule/detail/13046

 There is a link to the PDF of slides so you can see what they talked about. 
 Here is the short link to it

 http://j.mp/cgIsE9

 It was an example of applying a model to a set of controlled measurements and 
 not necessarily picking on memcached. One of the findings was that contention 
 has increased between versions 1.2.8 and 1.4.5
 from about 2.5% to 9.8%. Another is that memcached didn't perform well beyond 
 6 threads which they attributed to locks. At Sun they coded some patches 
 where they replaced the single lock with multiple
 locks and you can view performance on slide 25.

 The measurements do show that memcached performs extremely well ie. 300k ops 
 on a single instance however memcached was not able to take advantage of 
 additional threads.

 Key takeaway from this talk was not that memcached doesn't scale but that it 
 can perform even better.

I don't want to turn this into a flamebait trollfest, but I've always been
seriously annoyed by this benchmark that those Sun/Oracle folks keep
doing. So I'm going to toss out some flamebait and pray that it doesn't
make this thread much longer.

First, "Hidden scalability gotchas" - for the love of fuck, nobody gets
hit by that. Especially nobody attending velocity. I'm not even sure
facebook bothered to scale that lock.

Given the title, the overtly academic content, and the lack of serious
discussion as to the application of such knowledge, we end up with stupid
threads like this. Imagine how many people are just walking away with that
poor idea of "holy shit I should use REDIS^WCassandra^Wetc because
memcached doesn't scale!" - without anyone to inform them that redis isn't
even multithreaded and cassandra apparently sucks at scaling down. Google
is absolutely loaded with people benchmarking various shit vs memcached
and declaring that memcached is slower, despite the issue being in the
*client software* or even the benchmark itself.

People can't understand the difference! Please don't confuse them more!

There are *many* more interesting topics with scalability gotchas, like
the memory reallocation problem that we're working on. That one *actually
affects people*, is mitigated through education, and will be solved
through code. Other NoSQL solutions are absolutely riddled with
usability bugs that have nothing to do with how many non-fsync'd writes
you can push through a cluster per second. What separates academic wank
from truly useful topics is whether or not you can take the subject (given
the context applied to it) and actually do anything with it.

There're dozens of real problems to pick on us about - I sorely wish
people would stop hyperfocusing on the ones that don't matter. Sorry if
I've picked on your favorite memcached alternative; I don't really care
for a rebuttal, I'm sure everyone's working on fixing their problems :P

-Dormando


Re: Scalability and benchmarks

2010-07-01 Thread Artur Ejsmont
hehe, Oracle now owns Sun, and they have Oracle Coherence (a more fully-featured
data grid), so they have to come up with studies with lots of scary numbers to
make sure the message goes out that open source sucks and you should stay
away from it. Buy the real enterprise product today and save tomorrow ;-)

To me it seemed biased.

Art

On 1 July 2010 15:37, dormando dorma...@rydia.net wrote:



 On Thu, 1 Jul 2010, Vladimir Vuksan wrote:

  I believe the FUD you are referring to was following presentation at
 Velocity
 
  Hidden Scalability Gotchas in Memcached and Friends
 
  http://en.oreilly.com/velocity2010/public/schedule/detail/13046
 
  There is a link to the PDF of slides so you can see what they talked
 about. Here is the short link to it
 
  http://j.mp/cgIsE9
 
  It was an example of applying a model to a set of controlled measurements
 and not necessarily picking on memcached. One of the findings was that
 contention has increased between versions 1.2.8 and 1.4.5
  from about 2.5% to 9.8%. Another is that memcached didn't perform well
 beyond 6 threads which they attributed to locks. At Sun they coded some
 patches where they replaced the single lock with multiple
  locks and you can view performance on slide 25.
 
  The measurements do show that memcached performs extremely well ie. 300k
 ops on a single instance however memcached was not able to take advantage of
 additional threads.
 
  Key takeaway from this talk was not that memcached doesn't scale but that
 it can perform even better.

 I don't want to turn this into a flamebait trollfest, but I've always been
 seriously annoyed by this benchmark that those Sun/Oracle folks keep
 doing. So I'm going to toss out some flamebait and pray that it doesn't
 turn this thread much longer.

 First, "Hidden scalability gotchas" - for the love of fuck, nobody gets
 hit by that. Especially nobody attending velocity. I'm not even sure
 facebook bothered to scale that lock.

 Given the title, the overtly academic content, and the lack of serious
 discussion as to the application of such knowledge, we end up with stupid
 threads like this. Imagine how many people are just walking away with that
 poor idea of "holy shit I should use REDIS^WCassandra^Wetc because
 memcached doesn't scale!" - without anyone to inform them that redis isn't
 even multithreaded and cassandra apparently sucks at scaling down. Google
 is absolutely loaded with people benchmarking various shit vs memcached
 and declaring that memcached is slower, despite the issue being in the
 *client software* or even the benchmark itself.

 People can't understand the difference! Please don't confuse them more!

 There are *many* more interesting topics with scalability gotchas, like
 the memory reallocation problem that we're working on. That one *actually
 affects people*, is mitigated through education, and will be solved
 through code. Other NoSQL solutions are absolutely riddled with
 usability bugs that have nothing to do with how many non-fsync'd writes
 you can push through a cluster per second. What separates academic wank
 from truly useful topics is whether or not you can take the subject (given
 the context applied to it) and actually do anything with it.

 There're dozens of real problems to pick on us about - I sorely wish
 people would stop hyperfocusing on the ones that don't matter. Sorry if
 I've picked on your favorite memcached alternative; I don't really care
 for a rebuttal, I'm sure everyone's working on fixing their problems :P

 -Dormando



Re: Scalability and benchmarks

2010-07-01 Thread Les Mikesell

On 7/1/2010 9:37 AM, dormando wrote:


Given the title, the overtly academic content, and the lack of serious
discussion as to the application of such knowledge, we end up with stupid
threads like this. Imagine how many people are just walking away with that
poor idea of "holy shit I should use REDIS^WCassandra^Wetc because
memcached doesn't scale!" - without anyone to inform them that redis isn't
even multithreaded and cassandra apparently sucks at scaling down. Google
is absolutely loaded with people benchmarking various shit vs memcached
and declaring that memcached is slower, despite the issue being in the
*client software* or even the benchmark itself.

People can't understand the difference! Please don't confuse them more!

There are *many* more interesting topics with scalability gotchas, like
the memory reallocation problem that we're working on.


I have the cache spread across about 20 servers per site, with the 
servers also doing some other work, and could add more if the 
performance of a single server ever becomes an issue.  The question I 
find more interesting, and probably a lot harder to nail down, is what 
kind of performance impact it will have if one or a few of the servers 
go down at a busy time.  Ignoring the impact on the origin servers, how 
long does it typically take a client to figure out that a server is gone, 
and what happens if it is supposed to be fielding 100k requests/second 
at the time?  And do all the different clients rebalance in the same way 
once they notice a configured server is dead?
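
For what it's worth, many (though not all) clients handle a dead server with
some flavor of consistent hashing, where dropping the dead server's points
from the ring remaps only the keys that lived on it; whether two different
client libraries rebalance identically is exactly the open question. Below is
a minimal sketch of the idea only, not any particular client's
implementation, and the server names are placeholders.

    /* Minimal consistent-hashing sketch.  Each server gets POINTS_PER_SERVER
     * positions on a 32-bit ring; a key maps to the first server point at or
     * after its hash (wrapping around).  Removing a dead server removes only
     * its points, so only the keys that lived on it move, unlike naive
     * "hash % nservers", which reshuffles almost every key. */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_POINTS        1024
    #define POINTS_PER_SERVER 100

    struct point { uint32_t hash; int server; };
    static struct point ring[MAX_POINTS];
    static int ring_len;

    /* FNV-1a, just to have a deterministic hash for the sketch */
    static uint32_t fnv1a(const char *s) {
        uint32_t h = 2166136261u;
        for (; *s; s++) { h ^= (uint8_t)*s; h *= 16777619u; }
        return h;
    }

    static void add_server(int id, const char *name) {
        char buf[64];
        for (int i = 0; i < POINTS_PER_SERVER && ring_len < MAX_POINTS; i++) {
            snprintf(buf, sizeof(buf), "%s-%d", name, i);
            ring[ring_len].hash = fnv1a(buf);
            ring[ring_len].server = id;
            ring_len++;
        }
    }

    static int server_for(const char *key) {
        uint32_t h = fnv1a(key);
        int best = -1, lowest = -1;           /* linear scan; fine for a sketch */
        for (int i = 0; i < ring_len; i++) {
            if (lowest < 0 || ring[i].hash < ring[lowest].hash) lowest = i;
            if (ring[i].hash >= h && (best < 0 || ring[i].hash < ring[best].hash))
                best = i;
        }
        return ring[best >= 0 ? best : lowest].server;  /* wrap if nothing >= h */
    }

    int main(void) {
        /* placeholder server names, not real hosts */
        add_server(0, "cache-a"); add_server(1, "cache-b"); add_server(2, "cache-c");
        const char *keys[] = { "user:1", "user:2", "session:9", "page:/index" };
        for (int i = 0; i < 4; i++)
            printf("%s -> server %d\n", keys[i], server_for(keys[i]));
        return 0;
    }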


In real-world situations I think the client performance is much more 
important than the server, especially since in this case it has to deal 
with the load balancing and failover logic - and you can just add more 
servers if you need them.


--
  Les Mikesell
   lesmikes...@gmail.com


Re: Scalability and benchmarks

2010-06-30 Thread dormando
 I've seen some FUD from people claiming that memcached doesn't scale
 very well on multiple CPUs, which surprised me.

 Is there an accepted benchmark we can use to examine performance in more
 detail?

 Does anybody have any testing results in that area?

While we don't have a standard set of benchmarks yet, we do test it,
people have tested it, and it's routinely heavily hammered in production
all over the place, including several top 10 websites.

The FUD is easy enough to dispel: just squint at their numbers a little
bit and think it through carefully.

Most of these are saying that you hit a wall scaling memcached past,
say, 300,000 requests per second on a single box. (though I think with the
latest 1.4 it's easier to hit 500,000+). Remember that 300,000 requests
per second at 1k per request is over 2.5 gigabits of outbound alone.
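
Back-of-the-envelope, assuming roughly 1 KiB of payload per response and
ignoring memcached protocol and TCP/IP overhead (a sketch, not a measurement):

    /* Payload bandwidth at 300k responses/sec with ~1 KiB values. */
    #include <stdio.h>

    int main(void) {
        double rps   = 300000.0;   /* responses per second      */
        double bytes = 1024.0;     /* ~1k of value per response */
        double gbps  = rps * bytes * 8.0 / 1e9;
        /* 300000 * 1024 * 8 = ~2.46 Gbit/s of payload alone; headers and
           protocol overhead push the wire total past 2.5 Gbit/s. */
        printf("%.2f Gbit/s outbound payload\n", gbps);
        return 0;
    }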

If your requests are much smaller than that, you might skid in under
1gbps. For sanity's sake though, take a realistic look at how many
requests per second you actually hit memcached with. 100,000+ per second
per box is unlikely but sometimes happens, and works fine.

Even large sites are likely doing less than that, as they will need more
memory before hosing a box.

"memcached doesn't scale!" is a pile of horseshit. The same industry
claims all the single-threaded NoSQL stuff "Scales just fine". Figure out
what your own needs are for requests per second, then try to do that with
your hardware. Odds are good you'll be able to hit 10x-100x that mark.

I think sometime this year we'll be making it scale across CPUs better
than it presently does, but the only people who would ever notice would be
users with massive memcached instances on 10gbps ethernet servicing small
requests. I'm sure there's someone out there like that, but I doubt anyone
listening to the FUD would be one of them.

-Dormando


Re: Scalability and benchmarks

2010-06-30 Thread Jason Dixon
On Wed, Jun 30, 2010 at 11:35:08AM -0700, dormando wrote:
  I've seen some FUD from people claiming that memcached doesn't scale
  very well on multiple CPUs, which surprised me.
 
  Is there an accepted benchmark we can use to examine performance in more
  detail?
 
  Does anybody have any testing results in that area?
 
 While we don't have a standard set of benchmarks yet, we do test it,
 people have tested it, and it's routinely heavily hammered in production
 all over the place, including several top 10 websites.
 
 The FUD is easy enough to dispel: just squint at their numbers a little
 bit and think it through carefully.
 
 Most of these are saying that you hit a wall scaling memcached past,
 say, 300,000 requests per second on a single box. (though I think with the
 latest 1.4 it's easier to hit 500,000+). Remember that 300,000 requests
 per second at 1k per request is over 2.5 gigabits of outbound alone.
 
 If your requests are much smaller than that, you might skid in under
 1gbps. For sanity's sake though, take a realistic look at how many
 requests per second you actually hit memcached with. 100,000+ per second
 per box is unlikely but sometimes happens, and works fine.
 
 Even large sites are likely doing less than that, as they will need more
 memory before hosing a box.
 
 "memcached doesn't scale!" is a pile of horseshit. The same industry
 claims all the single-threaded NoSQL stuff "Scales just fine". Figure out
 what your own needs are for requests per second, then try to do that with
 your hardware. Odds are good you'll be able to hit 10x-100x that mark.
 
 I think sometime this year we'll be making it scale across CPUs better
 than it presently does, but the only people who would ever notice would be
 users with massive memcached instances on 10gbps ethernet servicing small
 requests. I'm sure there's someone out there like that, but I doubt anyone
 listening to the FUD would be one of them.

Great analysis, and exactly the sort of case studies we'd like to see at
Surge this year.  We haven't received any memcached-related CFPs yet.  I
hope the community sees this and someone attempts to rectify it.  :)

http://omniti.com/surge/2010/cfp

-- 
Jason Dixon
OmniTI Computer Consulting, Inc.
jdi...@omniti.com
443.325.1357 x.241


Re: Scalability and benchmarks

2010-06-30 Thread Ben Manes
I've seen load-testing benchmarks using memslap which showed linear scaling, 
but nothing to the extent that demonstrated locking to be a bottleneck (for the 
reasons that dormando stated). When I peeked at the code a while back it looked 
like the locking on the hashtable / LRU chain was coarse, but that's not 
surprising once you realize that the protocol handling will most likely 
dominate the time. If the LRU lock ever does become a bottleneck (which would 
be quite an achievement), there are some fairly simple algorithmic tricks that 
can amortize the lock operations to make it concurrent without sacrificing 
a global recency ordering. If I recall correctly, though, there was some work 
about 1-1/2 years ago to avoid contention on the statistics counters, which was 
causing some degradation under extreme conditions.
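
One concrete way such amortization can work (a sketch only, not memcached
code; lru_move_to_head() is a hypothetical stub standing in for the real list
relink): each thread buffers the items it has hit and replays the whole buffer
under the LRU lock once per BATCH hits, so the lock is taken 1/BATCH as often
while the recency order is only slightly stale.

    /* Amortized LRU-bump sketch, not memcached code.  Uses the GCC/Clang
     * __thread keyword for per-thread buffers (C11 _Thread_local works too). */
    #include <pthread.h>

    #define BATCH 64

    struct item { int id; };                         /* minimal stand-in */

    static void lru_move_to_head(struct item *it) {  /* stub */
        (void)it;  /* real code would relink the item at the LRU head */
    }

    static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
    static __thread struct item *pending[BATCH];     /* per-thread hit buffer */
    static __thread int npending;

    void record_hit(struct item *it) {
        pending[npending++] = it;
        if (npending < BATCH)
            return;                                  /* common case: no lock taken */
        pthread_mutex_lock(&lru_lock);
        for (int i = 0; i < npending; i++)
            lru_move_to_head(pending[i]);            /* one lock, BATCH bumps */
        pthread_mutex_unlock(&lru_lock);
        npending = 0;
    }

    int main(void) {
        struct item it = { 1 };
        for (int i = 0; i < 1000; i++)
            record_hit(&it);         /* lock taken only ~1000/BATCH times */
        return 0;
    }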





From: dormando dorma...@rydia.net
To: memcached@googlegroups.com
Sent: Wed, June 30, 2010 11:35:08 AM
Subject: Re: Scalability and benchmarks

 I've seen some FUD from people claiming that memcached doesn't scale
 very well on multiple CPUs, which surprised me.

 Is there an accepted benchmark we can use to examine performance in more
 detail?

 Does anybody have any testing results in that area?

While we don't have a standard set of benchmarks yet, we do test it,
people have tested it, and it's routinely heavily hammered in production
all over the place, including several top 10 websites.

The FUD is easy enough to dispel: just squint at their numbers a little
bit and think it through carefully.

Most of these are saying that you hit a wall scaling memcached past,
say, 300,000 requests per second on a single box. (though I think with the
latest 1.4 it's easier to hit 500,000+). Remember that 300,000 requests
per second at 1k per request is over 2.5 gigabits of outbound alone.

If your requests are much smaller than that, you might skid in under
1gbps. For sanity's sake though, take a realistic look at how many
requests per second you actually hit memcached with. 100,000+ per second
per box is unlikely but sometimes happens, and works fine.

Even large sites are likely doing less than that, as they will need more
memory before hosing a box.

"memcached doesn't scale!" is a pile of horseshit. The same industry
claims all the single-threaded NoSQL stuff "Scales just fine". Figure out
what your own needs are for requests per second, then try to do that with
your hardware. Odds are good you'll be able to hit 10x-100x that mark.

I think sometime this year we'll be making it scale across CPUs better
than it presently does, but the only people who would ever notice would be
users with massive memcached instances on 10gbps ethernet servicing small
requests. I'm sure there's someone out there like that, but I doubt anyone
listening to the FUD would be one of them.

-Dormando



  

Re: Scalability and benchmarks

2010-06-30 Thread Arjen van der Meijden

On 30-6-2010 20:35 dormando wrote:

Most of these are saying that you hit a wall scaling memcached past,
say, 300,000 requests per second on a single box. (though I think with the
latest 1.4 it's easier to hit 500,000+). Remember that 300,000 requests
per second at 1k per request is over 2.5 gigabits of outbound alone.

If your requests are much smaller than that, you might skid in under
1gbps. For sanity's sake though, take a realistic look at how many
requests per second you actually hit memcached with. 100,000+ per second
per box is unlikely but sometimes happens, and works fine.


With what kind of boxes would that be?

With 300-500k/sec you're getting really close to the low-level limitations of 
single network interfaces. With Dell 1950s (with Broadcom NetXtreme II 
5708 and dual Xeon 5150) we were able to produce about 550-600,000 
packets/second with traffic/load-generating software (pktgen in the 
Linux kernel) to test DDoS protection. And that was with fire-and-forget 
64-byte TCP or UDP packets, not with traffic where some server was actually 
producing useful responses to requests it had received earlier.


With a much more recent Dell R210 (Broadcom NetXtreme II 5709 and Core 
i3 530) we were able to reach twice that, though. That one was able to 
reach about 1.1 million pps. But still, that's with a packet generator 
generating unusable traffic. If you actually have to read the requests, 
process them and produce a response with a body, then reaching up to 500k 
requests/second even on higher-grade hardware with multiple interfaces 
sounds pretty good to me.


Best regards,

Arjen


Re: Scalability and benchmarks

2010-06-30 Thread dormando
 With what kind of boxes would that be?

 With 300-500k/sec you're getting really close to lowlevel limitations of
 single network interfaces. With dell 1950's (with broadcom netextreme II 5708
 and dual xeon 5150) we were able to produce about 550-600,000 packets/second
 with traffic/load-generating software (pktgen in the linux kernel) to test
 ddos protection. And that was with fire-and-forget 64byte tcp or udp-packets.
 Not with traffic some server was actually producing useful responses to
 requests it received earlier.

 With a much more recent Dell R210 (broadcom netextreme II 5709 and core i3
 530) we were able to reach twice that though. That one was able to reach about
 1.1 million pps. But still, that's with a packet generator generating unusable
 traffic. If you actually have to read the requests, process them and produce a
 response with a body, reaching up to 500k requests/second even on higher grade
 hardware with multiple interfaces sounds pretty good to me.

For most hardware memcached is limited by the NIC. I'd welcome someone to
prove a simple case showing otherwise, at which time we'd prioritize an
easy fix :)

-Dormando


Re: Scalability and benchmarks

2010-06-30 Thread Les Mikesell

On 6/30/2010 5:35 PM, dormando wrote:

With what kind of boxes would that be?

With 300-500k/sec you're getting really close to lowlevel limitations of
single network interfaces. With dell 1950's (with broadcom netextreme II 5708
and dual xeon 5150) we were able to produce about 550-600,000 packets/second
with traffic/load-generating software (pktgen in the linux kernel) to test
ddos protection. And that was with fire-and-forget 64byte tcp or udp-packets.
Not with traffic some server was actually producing useful responses to
requests it received earlier.

With a much more recent Dell R210 (broadcom netextreme II 5709 and core i3
530) we were able to reach twice that though. That one was able to reach about
1.1 million pps. But still, that's with a packet generator generating unusable
traffic. If you actually have to read the requests, process them and produce a
response with a body, reaching up to 500k requests/second even on higher grade
hardware with multiple interfaces sounds pretty good to me.


For most hardware memcached is limited by the NIC. I'd welcome someone to
prove a simple case showing otherwise, at which time we'd prioritize an
easy fix :)


Does that mean you should use multiple NICs on the servers and spread 
the clients over different networks?


--
  Les Mikesell
   lesmikes...@gmail.com



Re: Scalability and benchmarks

2010-06-30 Thread dormando
 
  For most hardware memcached is limited by the NIC. I'd welcome someone to
  prove a simple case showing otherwise, at which time we'd prioritize an
  easy fix :)

 Does that mean you should use multiple NICs on the servers and spread the
 clients over different networks?

It means you probably don't need to worry about it. Most people will start
nailing evictions and need to add more instances long before they overrun
the NIC.

Also uh, no, you'd probably use port bonding instead of doing something
crazy with client networking.


Re: Scalability and benchmarks

2010-06-30 Thread Arjen van der Meijden

On 1-7-2010 0:49 Les Mikesell wrote:

On 6/30/2010 5:35 PM, dormando wrote:

For most hardware memcached is limited by the NIC. I'd welcome someone to
prove a simple case showing otherwise, at which time we'd prioritize an
easy fix :)


Does that mean you should use multiple NICs on the servers and spread
the clients over different networks?


Afaik most operating systems and managed switches nowadays support 
port trunking to make a bigger pipe out of several network interfaces. We 
use it on our fileserver, for instance, which shipped with four 1 
Gbit ports; we configured it as a trunk of three on our main 
network and a single failover port on our backup network.


Long story short, you don't need to spread the clients across networks 
to be able to use multiple NICs/ports on your server, as long as the 
switches and OS (and maybe the NICs too) support port trunking 
(server-grade versions of all of those probably do).


Best regards,

Arjen