Re: [ceph-users] Deep scrub, cache pools, replica 1
On Tue, Nov 11, 2014 at 2:32 PM, Christian Balzer <ch...@gol.com> wrote:
> On Tue, 11 Nov 2014 10:21:49 -0800, Gregory Farnum wrote:
>> On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer <ch...@gol.com> wrote:
>>> Hello,
>>>
>>> One of my clusters has become busy enough (I'm looking at you, evil Windows VMs that I shall banish elsewhere soon) to experience client-noticeable performance impacts during deep scrub. Before this, I instructed all OSDs to deep scrub in parallel on Saturday night, and that finished before Sunday morning. So for now I'll fire them off one by one to reduce the load.
>>>
>>> Looking forward, that cluster doesn't need more space, so instead of adding more hosts and OSDs I was thinking of a cache pool. I suppose that will keep the clients happy while the slow pool gets scrubbed. Has anybody tested cache pools with Firefly and compared the performance to Giant?
>>>
>>> For testing I'm currently playing with a single storage node and 8 SSD-backed OSDs. Now, what very much blew my mind is that a pool with a replication of 1 still does quite the impressive read orgy, clearly reading all the data in the PGs. Why? And what is it comparing that data with, the cosmic background radiation?
>>
>> Yeah, cache pools currently do full-object promotions whenever an object is accessed. There are some ideas and projects to improve this or reduce its effects, but they're mostly just getting started.
>
> Thanks for confirming that; so probably not much better than Firefly, _aside_ from the fact that SSD pools should be quite a bit faster in and of themselves in Giant. I guess there is no other way to find out than to test things; I have a feeling that determining the hot working set otherwise will be rather difficult.
>
>> At least, I assume that's what you mean by a read orgy; perhaps you are seeing something else entirely?
>
> Indeed I did; this was just an observation that any pool with a replica of 1 will still read ALL the data during a deep scrub. What good would that do?

Oh, I see what you're saying; you mean it was reading all the data during a scrub, not just that it was promoting things.

Anyway, reading all the data during a deep scrub verifies that we *can* read all the data. That's one of the fundamental tasks of scrubbing data in a storage system. It's often accompanied by other checks or recovery behaviors to easily repair issues that are discovered, but simply maintaining confidence that the data actually exists is the principal goal. :)

-Greg
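A short, hedged aside for anyone reproducing the scrub-load side of this thread: these are the scrub-throttling knobs available in the Firefly/Giant era, injected at runtime. The values are purely illustrative, not recommendations, and the option names are worth re-checking against your release before use:

    # Limit concurrent scrubs per OSD and add a small pause between scrub chunks
    ceph tell osd.* injectargs '--osd_max_scrubs 1 --osd_scrub_sleep 0.1'

    # Stretch out the automatic deep-scrub cadence (seconds; roughly two weeks here)
    ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600'

Settings applied via injectargs do not persist across OSD restarts, so anything that works out would also belong in ceph.conf.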
Re: [ceph-users] Deep scrub, cache pools, replica 1
On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer <ch...@gol.com> wrote:
> Hello,
>
> One of my clusters has become busy enough (I'm looking at you, evil Windows VMs that I shall banish elsewhere soon) to experience client-noticeable performance impacts during deep scrub. Before this, I instructed all OSDs to deep scrub in parallel on Saturday night, and that finished before Sunday morning. So for now I'll fire them off one by one to reduce the load.
>
> Looking forward, that cluster doesn't need more space, so instead of adding more hosts and OSDs I was thinking of a cache pool. I suppose that will keep the clients happy while the slow pool gets scrubbed. Has anybody tested cache pools with Firefly and compared the performance to Giant?
>
> For testing I'm currently playing with a single storage node and 8 SSD-backed OSDs. Now, what very much blew my mind is that a pool with a replication of 1 still does quite the impressive read orgy, clearly reading all the data in the PGs. Why? And what is it comparing that data with, the cosmic background radiation?

Yeah, cache pools currently do full-object promotions whenever an object is accessed. There are some ideas and projects to improve this or reduce its effects, but they're mostly just getting started. At least, I assume that's what you mean by a read orgy; perhaps you are seeing something else entirely?

Also, even on cache pools you don't really want to run with 1x replication, as they hold the only copy of whatever data is dirty...

-Greg
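As a rough sketch of the tier setup being weighed in this exchange: the base pool name 'rbd', the cache pool name, and the PG count below are assumptions for illustration only, and, per the point about dirty data, the cache keeps more than one copy:

    # SSD-backed cache pool, with more than one copy of whatever becomes dirty
    ceph osd pool create cache-pool 512
    ceph osd pool set cache-pool size 2

    # Attach it as a writeback tier in front of the (hypothetical) base pool 'rbd'
    ceph osd tier add rbd cache-pool
    ceph osd tier cache-mode cache-pool writeback
    ceph osd tier set-overlay rbd cache-pool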
Re: [ceph-users] Deep scrub, cache pools, replica 1
On Tue, 11 Nov 2014 10:21:49 -0800, Gregory Farnum wrote:
> On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer <ch...@gol.com> wrote:
>> Hello,
>>
>> One of my clusters has become busy enough (I'm looking at you, evil Windows VMs that I shall banish elsewhere soon) to experience client-noticeable performance impacts during deep scrub. Before this, I instructed all OSDs to deep scrub in parallel on Saturday night, and that finished before Sunday morning. So for now I'll fire them off one by one to reduce the load.
>>
>> Looking forward, that cluster doesn't need more space, so instead of adding more hosts and OSDs I was thinking of a cache pool. I suppose that will keep the clients happy while the slow pool gets scrubbed. Has anybody tested cache pools with Firefly and compared the performance to Giant?
>>
>> For testing I'm currently playing with a single storage node and 8 SSD-backed OSDs. Now, what very much blew my mind is that a pool with a replication of 1 still does quite the impressive read orgy, clearly reading all the data in the PGs. Why? And what is it comparing that data with, the cosmic background radiation?
>
> Yeah, cache pools currently do full-object promotions whenever an object is accessed. There are some ideas and projects to improve this or reduce its effects, but they're mostly just getting started.

Thanks for confirming that; so probably not much better than Firefly, _aside_ from the fact that SSD pools should be quite a bit faster in and of themselves in Giant. I guess there is no other way to find out than to test things; I have a feeling that determining the hot working set otherwise will be rather difficult.

> At least, I assume that's what you mean by a read orgy; perhaps you are seeing something else entirely?

Indeed I did; this was just an observation that any pool with a replica of 1 will still read ALL the data during a deep scrub. What good would that do?

> Also, even on cache pools you don't really want to run with 1x replication, as they hold the only copy of whatever data is dirty...

Oh, I agree, this is for testing only. Also, a replica of 1 doesn't have to mean that the data is unsafe (the OSDs could be RAIDed). But even so, in production the loss of a single node shouldn't impact things, and once you go there a replica of 2 comes naturally.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
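On the hot-working-set question raised above: the cache tier itself tracks object recency with HitSets, so one low-effort way to probe the working set is to configure those plus the cache sizing targets and watch what the tiering agent keeps resident. A minimal sketch; the pool name and all values are illustrative only:

    # HitSets record which objects were touched during each period
    ceph osd pool set cache-pool hit_set_type bloom
    ceph osd pool set cache-pool hit_set_count 4
    ceph osd pool set cache-pool hit_set_period 3600

    # Size limits so the agent flushes dirty objects and evicts clean ones in time
    ceph osd pool set cache-pool target_max_bytes 500000000000
    ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
    ceph osd pool set cache-pool cache_target_full_ratio 0.8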
[ceph-users] Deep scrub, cache pools, replica 1
Hello,

One of my clusters has become busy enough (I'm looking at you, evil Windows VMs that I shall banish elsewhere soon) to experience client-noticeable performance impacts during deep scrub. Before this, I instructed all OSDs to deep scrub in parallel on Saturday night, and that finished before Sunday morning. So for now I'll fire them off one by one to reduce the load.

Looking forward, that cluster doesn't need more space, so instead of adding more hosts and OSDs I was thinking of a cache pool. I suppose that will keep the clients happy while the slow pool gets scrubbed. Has anybody tested cache pools with Firefly and compared the performance to Giant?

For testing I'm currently playing with a single storage node and 8 SSD-backed OSDs. Now, what very much blew my mind is that a pool with a replication of 1 still does quite the impressive read orgy, clearly reading all the data in the PGs. Why? And what is it comparing that data with, the cosmic background radiation?

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
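For completeness, a sketch of the two operations described above: firing off deep scrubs one OSD at a time, and creating a throw-away pool with a single replica for the SSD test node. The pool name, PG count, and sleep interval are invented for illustration:

    # Queue deep scrubs one OSD at a time rather than all in parallel
    for osd in $(ceph osd ls); do
        ceph osd deep-scrub "$osd"
        sleep 600    # crude pacing; "ceph osd deep-scrub" only queues the work
    done

    # Throw-away test pool with a single replica
    ceph osd pool create ssdtest 256
    ceph osd pool set ssdtest size 1
    ceph osd pool set ssdtest min_size 1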