Re: [rdo-users] Poor Ceph Performance

2018-11-26 Thread Cody
Hi John,

Thank you so much for the reply. I will make a post to the Ceph ML and
add the link back to this thread.

Here are the cluster specs for reference. The cluster uses 9
bare-metal nodes (1 Undercloud, 3 Controllers (HA), 3 Ceph, 2
Compute) with the following details:

Undercloud & Compute nodes:
CPU: E3-1230V2 @3.7GHz
RAM: 16GB
Ports: 1Gbps for provisioning; 1Gbps for external/VLANs

Controller nodes (with Ceph mon & mgr):
CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Ports: 1Gbps for provisioning; 1Gbps for VLANs

Ceph nodes:
CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Ports: 1Gbps for provisioning; 1Gbps for VLANs
Journaling: 1 SSD (SATA3, consumer grade)
OSDs: 2 x 2TB @ 7200rpm (SATA3, consumer grade)

Switch:
HUAWEI S1700 Series (24 x 1Gbps ports, 56Gbps switching capacity)

The hardware is old and under-provisioned, especially in RAM. But
this is just a PoC with minimal usage, and there was no sign of
CPU/RAM starvation during the test.

On the software side, it is running the Queens release. The
ceph-ansible version is 3.1.6, using FileStore with a non-collocated
journal setup.
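
In case it helps with the diagnosis, I can also run a couple of quick
checks, e.g. testing the journal SSD for synchronous write performance
(consumer SSDs are often slow at O_DSYNC writes) and measuring raw
cluster throughput. A rough sketch, where /dev/sdX and the 'volumes'
pool are placeholders for my actual devices and pools:

    # Journal SSD sync-write test (run on a Ceph node; this writes to
    # the raw device, so only point it at a spare disk or partition)
    fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 \
        --time_based --group_reporting

    # Raw Ceph write throughput for 10 seconds, then clean up
    rados bench -p volumes 10 write --no-cleanup
    rados -p volumes cleanup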


Best regards,
Cody



___
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users

To unsubscribe: users-unsubscr...@lists.rdoproject.org


Re: [rdo-users] Poor Ceph Performance

2018-11-26 Thread Donny Davis
Also, how are you uploading the images?

On Mon, Nov 26, 2018 at 10:54 AM Donny Davis  wrote:

> What kind of images are you using?
>
___
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users

To unsubscribe: users-unsubscr...@lists.rdoproject.org


[rdo-users] [Fedocal] Reminder meeting : RDO meeting

2018-11-26 Thread hguemar
Dear all,

You are kindly invited to the meeting:
   RDO meeting on 2018-11-28 from 15:00:00 to 16:00:00 UTC
   At r...@irc.freenode.net

The meeting will be about:
RDO IRC meeting
Agenda at https://etherpad.openstack.org/p/RDO-Meeting

Every Wednesday on #rdo on Freenode IRC


Source: https://apps.fedoraproject.org/calendar/meeting/8759/

___
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users

To unsubscribe: users-unsubscr...@lists.rdoproject.org


Re: [rdo-users] Poor Ceph Performance

2018-11-26 Thread John Fulton
On Sun, Nov 25, 2018 at 11:29 PM Cody  wrote:
>
> Hello,
>
> My TripleO cluster is deployed with Ceph, and both Cinder and Nova use
> RBD as the backend. While all essential functions work, services
> involving Ceph get very poor performance. E.g., it takes several hours
> to upload an 8GB image into Cinder and about 20 minutes to completely
> boot up an instance (from launch to ssh ready).
>
> Running 'ceph -s' shows a top write speed of 600-700 KiB/s during image
> upload and a read speed of 2 MiB/s during instance launch.
>
> I used the default scheme for network isolation and a single 1G port
> for all VLAN traffic on each overcloud node. I haven't set jumbo
> frames on the storage network VLAN yet, but I think the performance
> should not be this bad with MTU 1500. Something must be wrong. Any
> suggestions for debugging?

Hi Cody,

If you're using Queens or Rocky, then Ceph Luminous was deployed in
containers. Though TripleO did the overall deployment, ceph-ansible
did the actual Ceph deployment and configuration, and you can
determine the ceph-ansible version via 'rpm -q ceph-ansible' on your
undercloud. It probably makes sense to pass along what you mentioned
above, plus some other info I'll note below, to the ceph-users list
(http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com), which is
focused on Ceph itself. When you contact them (I'm on the list too),
also let them know the following:

1. How many OSD servers you have and how many OSDs per server
2. What type of disks you're using per OSD and how you set up journaling
3. Specs of the servers themselves (CPU and RAM of the OpenStack
controller servers hosting the Ceph monitors, and of the Ceph storage
servers)
4. Did you override the RAM/CPU for the Mon, Mgr, and OSD containers?
If so, what did you override them to?
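
If it's easier, a few commands can gather most of 1-3 (run the ceph
command from a node with an admin keyring; on a containerized
deployment you may need to run it inside the mon container):

    # 1. OSD servers and OSDs per server
    ceph osd tree

    # 2. Disk layout per Ceph node (ROTA=1 means a spinning disk)
    lsblk -o NAME,SIZE,ROTA,TYPE,MOUNTPOINT

    # 3. CPU and RAM of each server
    lscpu | grep 'Model name'
    free -h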

TripleO can pass any parameter you would normally pass to ceph-ansible
as described in the following:

 
https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html#customizing-ceph-conf-with-ceph-ansible
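
As a rough sketch, an environment file like the one below would pass
an extra ceph-ansible variable and a ceph.conf override (the
particular values here are only illustrations, not recommendations;
check the ceph-ansible docs for what applies to your version):

    cat > ~/ceph-tuning.yaml <<'EOF'
    parameter_defaults:
      # Variables passed straight through to ceph-ansible
      CephAnsibleExtraConfig:
        ceph_osd_docker_memory_limit: 5g
      # Settings merged into the generated ceph.conf
      CephConfigOverrides:
        osd_recovery_op_priority: 1
    EOF
    # Then include it in the deploy command with: -e ~/ceph-tuning.yaml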

So if you describe things in terms of a containerized ceph-ansible
Luminous deployment, include your ceph.conf, and they have
suggestions, then you can apply those suggestions back to ceph-ansible
through TripleO as described above. If you start troubleshooting the
cluster per the troubleshooting guide [2] and share the results, that
would also help.

I've gotten better performance than you describe on a completely
virtualized deployment on my PC [1], using quickstart with the
defaults that TripleO passes for Queens and Rocky. (TripleO tends to
favor the defaults that ceph-ansible uses.) That said, with a single
1G port for all network traffic I wouldn't expect great performance.
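
Since you mention MTU, it's also worth ruling out the network itself.
A quick sanity check, assuming iperf3 is installed on two overcloud
nodes and 172.16.1.10 stands in for a storage-network IP on your
setup:

    # On one Ceph node, start a server
    iperf3 -s

    # On a compute or controller node, test toward the Ceph node's
    # storage-network address
    iperf3 -c 172.16.1.10

    # Check the MTU actually in use on the storage VLAN interface
    ip link show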

Feel free to CC me when you email ceph-users and feel free to share on
rdo-users a link to the thread you started there in case anyone else
on this list is interested.

  John

[1] http://blog.johnlikesopenstack.com/2018/08/pc-for-tripleo-quickstart.html
[2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/pdf/troubleshooting_guide/Red_Hat_Ceph_Storage-3-Troubleshooting_Guide-en-US.pdf

> Thank you very much.
>
> Best regards,
> Cody
> ___
> users mailing list
> users@lists.rdoproject.org
> http://lists.rdoproject.org/mailman/listinfo/users
>
> To unsubscribe: users-unsubscr...@lists.rdoproject.org
___
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users

To unsubscribe: users-unsubscr...@lists.rdoproject.org