Re: [ceph-users] Effect of tunables on client system load

2017-06-14 Thread Nathanial Byrnes
Thanks for the input, David. I'm not sold on XenServer per se, but it is
what we've been using for the past 7 years... Proxmox has been coming up a
lot recently; I guess it is time to give it a look. I like the sound of
directly using librbd.

   Regards,
   Nate


Re: [ceph-users] Effect of tunables on client system load

2017-06-14 Thread David Turner
I don't know if you're sold on Xen and only Xen, but I've been running a
3-node Ceph cluster hyper-converged on a 4-node Proxmox cluster for my home
projects.  Three of the nodes run Proxmox with Ceph OSD, Mon, and MDS
daemons.  The fourth node is a much beefier system handling the majority of
the virtualization.  This setup works pretty much right out of the box and
uses librbd instead of dealing with FUSE or kernel drivers to access the
Ceph disks.  Proxmox also has built-in support for Gluster if you want to
compare against that as well.
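
For what it's worth, "uses librbd" here just means the guest's disk I/O goes
from the hypervisor process straight through librados/librbd in userspace to
the OSDs, with no kernel rbd or nbd block device in between. A minimal sketch
of that access path using the official Python bindings (the pool and image
names are made up for illustration; this is not Proxmox's actual code path):

    # Userspace client -> librados/librbd -> OSDs; no kernel block device.
    # Pool 'rbd' and image 'vm-100-disk-1' are hypothetical names.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')
        try:
            image = rbd.Image(ioctx, 'vm-100-disk-1')
            try:
                print('image size: %d bytes' % image.size())
                first_block = image.read(0, 4096)   # read 4 KiB directly
                print('read %d bytes from offset 0' % len(first_block))
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()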

In my setup I keep all of my VMs primarily on the fourth, dedicated VM host,
but if it goes down for any reason, the important VMs will redistribute
themselves onto the other Proxmox nodes, and when the primary VM host is
back up they will move back.  Live migrations between nodes take less than a
minute because only the current system state and RAM contents need to be
sent over; the disks already live in Ceph.

I would not suggest putting VM disks on Ceph with any hypervisor that does
not use librbd to access the disks as RBDs.  That is just not how Ceph was
designed to host VM disks.  I know this doesn't answer your question, but I
feel like you should have been asking a different question.


Re: [ceph-users] Effect of tunables on client system load

2017-06-13 Thread Nathanial Byrnes
Thanks very much for the insights, Greg!

My most recent suspicion around the resource consumption is that, with my
current configuration, Xen is provisioning rbd-nbd storage for guests rather
than just using the kernel module like I was last time around. And (while
I'm unsure of how this works) it seems there is a tapdisk process for each
guest on each XenServer along with the rbd-nbd processes. Perhaps due to
this use of NBD, XenServer is taking a scenic route through userspace that
it wasn't taking before... That said, Gluster is attached via FUSE... I
apparently need to dig more into how Xen is attaching to Ceph vs. Gluster.
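
As a quick (and admittedly crude) sanity check on that suspicion, I can count
those helper processes in dom-0. A rough sketch, assuming a standard Linux
ps and that the processes are literally named "rbd-nbd" and "tapdisk":

    # Count userspace block-device helpers visible in dom-0.
    import subprocess
    from collections import Counter

    out = subprocess.check_output(['ps', '-eo', 'comm='],
                                  universal_newlines=True)
    counts = Counter(line.strip() for line in out.splitlines())
    for name in ('rbd-nbd', 'tapdisk'):
        # Counter returns 0 for names that are not running.
        print('%-8s processes: %d' % (name, counts[name]))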

   Anyway, thanks again!

   Nate


Re: [ceph-users] Effect of tunables on client system load

2017-06-13 Thread Gregory Farnum
On Thu, Jun 8, 2017 at 11:11 PM Nathanial Byrnes  wrote:

> Hi All,
>    First, some background:
>    I have been running a small (4 compute nodes) Xen server cluster
> backed by both a small Ceph cluster (4 other nodes with a total of 18
> single-spindle OSDs) and a small Gluster cluster (2 nodes, each with a
> 14-spindle RAID array). I started with Gluster 3-4 years ago, at first
> using NFS to access it, then upgraded to the Gluster FUSE client. However,
> I had been fascinated with Ceph since I first read about it, and probably
> added Ceph as soon as XCP released a kernel with RBD support, possibly
> approaching 2 years ago.
>    With Ceph, since I started out with the kernel RBD client, I believe
> it locked me to Bobtail tunables. I connected to XCP via a project that
> tricks XCP into running LVM on the RBDs, managing all of this through the
> iSCSI management infrastructure somehow... Only recently I've switched to
> a newer project that uses the rbd-nbd mapping instead. This should let me
> use whatever tunables my client software supports, AFAIK. I have not yet
> changed my tunables, as the data reorganization will probably take a day
> or two (only 1Gb networking...).
>
>    Over this time period, I've observed that my Gluster-backed guests tend
> not to consume as much of domain-0's (the Xen VM management host) resources
> as my Ceph-backed guests do. To me, this is somewhat intuitive, as the Ceph
> client has to do more "thinking" than the Gluster client. However, it seems
> to me that the gap in IO performance between the VM guests is well beyond
> what the difference in spindle count would suggest. I am open to the notion
> that there are probably quite a few sub-optimal design choices/constraints
> within the environment. However, I don't have the resources to conduct all
> that many experiments and benchmarks. So, over time I've ended up treating
> Ceph as my resilient storage and Gluster as my more performant storage
> (3x vs. 2x replication, and, as mentioned above, my Gluster guests had
> quicker guest IO and lower dom-0 load).
>
> So, on to my questions:
>
>    Would setting my tunables to jewel (my present release), or anything
> newer than bobtail (which is what I think I am set to, if I read the ceph
> status warning correctly), reduce my dom-0 load and/or improve any aspect
> of the client IO performance?
>

Unfortunately no. The tunables are entirely about how CRUSH works, and
while it's possible to construct pessimal CRUSH maps that are impossible to
satisfy and take a long time to churn through calculations, it's hard and
you clearly haven't done that here. I think you're just seeing that the
basic CPU cost of a Ceph IO is higher than in Gluster, or else there is
something unusual about the Xen configuration you have here compared to
more common deployments.
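
If it helps to confirm which profile the cluster is actually running before
deciding whether a change is worth the data movement, the monitors report it
directly. A small illustrative sketch using the python-rados bindings,
equivalent to running "ceph osd crush show-tunables" (it assumes a readable
/etc/ceph/ceph.conf and client keyring on the machine you run it from):

    # Ask the monitors for the current CRUSH tunables and print the profile.
    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        cmd = json.dumps({'prefix': 'osd crush show-tunables',
                          'format': 'json'})
        ret, out, errs = cluster.mon_command(cmd, b'')
        if ret != 0:
            raise RuntimeError(errs)
        tunables = json.loads(out)
        # 'profile' is e.g. "bobtail", "firefly", "hammer", or "jewel".
        print('CRUSH tunables profile:', tunables.get('profile'))
    finally:
        cluster.shutdown()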


>
>    Will adding nodes to the Ceph cluster reduce load on dom-0 and/or
> improve client IO performance (I doubt the former and would expect the
> latter...)?
>

In general adding nodes will increase parallel throughput (i.e., async IO on
one client or the aggregate performance of multiple clients), but won't
reduce latencies. It shouldn't have much (any?) impact on client CPU usage
(other than that, if the client is pushing through more IO, it will use
proportionally more CPU), nor on the CPU usage of existing daemons.
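
As a back-of-the-envelope illustration of that distinction (every number
below is a made-up assumption, not a measurement from your cluster):

    # Toy model: more OSDs raise aggregate parallel throughput, but a single
    # synchronous write still pays the same per-operation round trip.
    per_osd_iops = 120        # assumed random-write IOPS of one spindle
    replication = 3           # assume each client write is stored on 3 OSDs
    per_op_latency_ms = 5.0   # assumed network + OSD latency per write

    for osds in (18, 36):
        aggregate_write_iops = osds * per_osd_iops / replication
        print('%2d OSDs: ~%.0f aggregate write IOPS, still ~%.0f ms per '
              'synchronous write' % (osds, aggregate_write_iops,
                                     per_op_latency_ms))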


>
>    So, why did I bring up Gluster at all? In an ideal world, I would like
> to have just one storage environment that satisfies all of my
> organization's needs. If forced to choose with the knowledge I have today,
> I would have to select Gluster. I am hoping to come up with some actionable
> data points that might help me discover some of my mistakes, which might
> explain my experience to date and maybe even help remedy said mistakes. As
> I mentioned earlier, I like Ceph more than Gluster and would like to employ
> it more within my environment. But, given budgetary constraints, I need to
> do what's best for my organization.
>
>
Yeah. I'm a little surprised you noticed it in the environment you
described, but there aren't many people running Xen on Ceph, so perhaps
there's something odd happening in that setup which I and others aren't
picking up on. :/

Good luck!
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com