Re: [ceph-users] Rbd map command doesn't work
I changed the profile to "Hammer" and it works. This brings up a question: by changing the profile to "Hammer", am I going to lose some of the performance optimizations done in Jewel?

- epk

From: Bruce McFarland [mailto:bkmcfarl...@earthlink.net]
Sent: Tuesday, August 16, 2016 4:52 PM
To: Somnath Roy
Cc: EP Komarla; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Rbd map command doesn't work

EP,
Try setting the crush map to use legacy tunables. I've had the same issue with the "feature mismatch" errors when using a krbd that didn't support format 2 while running Jewel 10.2.2 on the storage nodes. From the command line:

ceph osd crush tunables legacy

Bruce

On Aug 16, 2016, at 4:21 PM, Somnath Roy <somnath@sandisk.com> wrote:

This is the usual feature-mismatch situation: the inbox krbd you are using does not support Jewel. Search for the error message and you will find plenty of prior discussion around it.

From: EP Komarla [mailto:ep.koma...@flextronics.com]
Sent: Tuesday, August 16, 2016 4:15 PM
To: Somnath Roy; ceph-users@lists.ceph.com
Subject: RE: Rbd map command doesn't work

Somnath,

Thanks. I am trying your suggestion - see the commands below. It still doesn't seem to work; I am missing something here...

Thanks,
- epk

[test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G --image-format 1
rbd: image format 1 is deprecated
[test@ep-c2-client-01 ~]$ rbd map rbd/test1
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (13) Permission denied
[test@ep-c2-client-01 ~]$ sudo rbd map rbd/test1
^C
[test@ep-c2-client-01 ~]$ dmesg | tail -20
[1201954.248195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1201954.253365] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1201964.274082] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1201964.281195] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1201974.298195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1201974.305300] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204128.917562] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204128.924173] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204138.956737] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204138.964011] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204148.980701] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204148.987892] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204159.004939] libceph: mon2 172.20.60.53:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204159.012136] libceph: mon2 172.20.60.53:6789 missing required protocol features
[1204169.028802] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204169.035992] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204476.803192] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204476.810578] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204486.821279] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400

From: Somnath Roy [mailto:somnath@sandisk.com]
Sent: Tuesday, August 16, 2016 3:59 PM
To: EP Komarla; ceph-users@lists.ceph.com
Subject: RE: Rbd map command doesn't work

The default format of an rbd image in Jewel is 2, with a bunch of other features enabled, so you have the following two options:

1. Create a format 1 image: --image-format 1
2. Or set this in ceph.conf under [client] or [global] before creating the image: rbd_default_features = 3

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of EP Komarla
Sent: Tuesday, August 16, 2016 2:52 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Rbd map command doesn't work

All,

I am creating an image and mapping it. The below commands used to work in Hammer; now the same is not working in Jewel.
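For reference, a condensed sketch of the CRUSH-tunables side of this thread's workaround; the profile names are the standard Ceph tunables profiles, and the check step is optional:

# show the tunables profile the cluster currently requires
ceph osd crush show-tunables
# relax the profile so an older inbox krbd client can connect
# (note: changing the profile triggers data movement and gives up the newer
# Jewel-era CRUSH behaviour, which is the trade-off asked about above)
ceph osd crush tunables hammer     # the setting that worked in this thread
ceph osd crush tunables legacy     # for even older kernels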
Re: [ceph-users] Rbd map command doesn't work
Somnath,

Thanks. I am trying your suggestion - see the commands below. It still doesn't seem to work; I am missing something here...

Thanks,
- epk

[test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G --image-format 1
rbd: image format 1 is deprecated
[test@ep-c2-client-01 ~]$ rbd map rbd/test1
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (13) Permission denied
[test@ep-c2-client-01 ~]$ sudo rbd map rbd/test1
^C
[test@ep-c2-client-01 ~]$ dmesg | tail -20
[1201954.248195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1201954.253365] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1201964.274082] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1201964.281195] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1201974.298195] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1201974.305300] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204128.917562] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204128.924173] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204138.956737] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204138.964011] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204148.980701] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204148.987892] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204159.004939] libceph: mon2 172.20.60.53:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204159.012136] libceph: mon2 172.20.60.53:6789 missing required protocol features
[1204169.028802] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204169.035992] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204476.803192] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1204476.810578] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1204486.821279] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400

From: Somnath Roy [mailto:somnath@sandisk.com]
Sent: Tuesday, August 16, 2016 3:59 PM
To: EP Komarla; ceph-users@lists.ceph.com
Subject: RE: Rbd map command doesn't work

The default format of an rbd image in Jewel is 2, with a bunch of other features enabled, so you have the following two options:

1. Create a format 1 image: --image-format 1
2. Or set this in ceph.conf under [client] or [global] before creating the image: rbd_default_features = 3

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of EP Komarla
Sent: Tuesday, August 16, 2016 2:52 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Rbd map command doesn't work

All,

I am creating an image and mapping it. The below commands used to work in Hammer; now the same is not working in Jewel. I see the message about some feature set mismatch - what features are we talking about here? Is this a known issue in Jewel with a workaround?

Thanks,
- epk

[test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G
[test@ep-c2-client-01 ~]$ rbd info test1
rbd image 'test1':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.8146238e1f29
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
[test@ep-c2-client-01 ~]$ rbd map rbd/test1
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (13) Permission denied
[test@ep-c2-client-01 ~]$ dmesg | tail
[1197731.547522] libceph: mon1 172.20.60.52:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1197731.554621] libceph: mon1 172.20.60.52:6789 missing required protocol features
[1197741.571645] libceph: mon2 172.20.60.53:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1197741.578760] libceph: mon2 172.20.60.53:6789 missing required protocol features
[ceph-users] Rbd map command doesn't work
All,

I am creating an image and mapping it. The below commands used to work in Hammer; now the same is not working in Jewel. I see the message about some feature set mismatch - what features are we talking about here? Is this a known issue in Jewel with a workaround?

Thanks,
- epk

[test@ep-c2-client-01 ~]$ rbd create rbd/test1 --size 1G
[test@ep-c2-client-01 ~]$ rbd info test1
rbd image 'test1':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.8146238e1f29
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
[test@ep-c2-client-01 ~]$ rbd map rbd/test1
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (13) Permission denied
[test@ep-c2-client-01 ~]$ dmesg | tail
[1197731.547522] libceph: mon1 172.20.60.52:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1197731.554621] libceph: mon1 172.20.60.52:6789 missing required protocol features
[1197741.571645] libceph: mon2 172.20.60.53:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1197741.578760] libceph: mon2 172.20.60.53:6789 missing required protocol features
[1198586.766120] libceph: mon1 172.20.60.52:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1198586.771248] libceph: mon1 172.20.60.52:6789 missing required protocol features
[1198596.789453] libceph: mon0 172.20.60.51:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1198596.796557] libceph: mon0 172.20.60.51:6789 missing required protocol features
[1198606.813825] libceph: mon1 172.20.60.52:6789 feature set mismatch, my 102b84a842a42 < server's 40102b84a842a42, missing 400
[1198606.820929] libceph: mon1 172.20.60.52:6789 missing required protocol features
[test@ep-c2-client-01 ~]$ sudo rbd map rbd/test1

EP Komarla
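A sketch of the image-feature side of the workaround, using the image and pool names from above (Jewel-era rbd CLI spellings; treat this as a best-effort reconstruction rather than verified syntax):

# the inbox kernel client here only understands the 'layering' feature, so either
# create the image with just that feature:
rbd create rbd/test2 --size 1G --image-feature layering
# or strip the unsupported features from an existing image:
rbd feature disable rbd/test1 exclusive-lock object-map fast-diff deep-flatten
# or make it the default for new images via ceph.conf ([client] or [global]):
#   rbd default features = 3
rbd map rbd/test1

Note that in this particular thread the mismatch turned out to be on the CRUSH-tunables side (fixed by the tunables change discussed above), so the image-feature change alone was not sufficient.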
[ceph-users] rbd readahead settings
Team,

I am trying to configure the rbd readahead value. Before I increase it, I would like to find out what it is currently set to. How do I look up the values of these parameters?

rbd readahead max bytes
rbd readahead trigger requests
rbd readahead disable after bytes

Thanks,
- epk
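One way to answer this (a sketch; the admin-socket path is an example and only exists if an admin socket is configured for the client):

# dump the defaults plus anything overridden in ceph.conf (option names use underscores)
ceph --show-config | grep rbd_readahead
# or ask a running client/daemon through its admin socket
ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok config show | grep rbd_readahead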
[ceph-users] Ceph-deploy on Jewel error
Hi All,

I am trying to do a fresh install of Ceph Jewel on my cluster. I went through all the steps of configuring the network, ssh, passwordless login, etc. Now I am at the stage of running the ceph-deploy commands to install the monitors and other nodes. I am getting the error below when deploying the first monitor, and I am not able to figure out what I am missing here. Any pointers or help appreciated. Thanks in advance.

- epk

[ep-c2-mon-01][DEBUG ] ---> Package librbd1.x86_64 1:0.94.7-0.el7 will be updated
[ep-c2-mon-01][DEBUG ] ---> Package librbd1.x86_64 1:10.2.2-0.el7 will be an update
[ep-c2-mon-01][DEBUG ] ---> Package python-cephfs.x86_64 1:0.94.7-0.el7 will be updated
[ep-c2-mon-01][DEBUG ] ---> Package python-cephfs.x86_64 1:10.2.2-0.el7 will be an update
[ep-c2-mon-01][DEBUG ] ---> Package python-rados.x86_64 1:0.94.7-0.el7 will be updated
[ep-c2-mon-01][DEBUG ] ---> Package python-rados.x86_64 1:10.2.2-0.el7 will be an update
[ep-c2-mon-01][DEBUG ] ---> Package python-rbd.x86_64 1:0.94.7-0.el7 will be updated
[ep-c2-mon-01][DEBUG ] ---> Package python-rbd.x86_64 1:10.2.2-0.el7 will be an update
[ep-c2-mon-01][DEBUG ] --> Running transaction check
[ep-c2-mon-01][DEBUG ] ---> Package ceph-selinux.x86_64 1:10.2.2-0.el7 will be installed
[ep-c2-mon-01][DEBUG ] --> Processing Dependency: selinux-policy-base >= 3.13.1-60.el7_2.3 for package: 1:ceph-selinux-10.2.2-0.el7.x86_64
[ep-c2-mon-01][DEBUG ] ---> Package python-setuptools.noarch 0:0.9.8-4.el7 will be installed
[ep-c2-mon-01][DEBUG ] --> Finished Dependency Resolution
[ep-c2-mon-01][WARNIN] Error: Package: 1:ceph-selinux-10.2.2-0.el7.x86_64 (ceph)
[ep-c2-mon-01][DEBUG ] You could try using --skip-broken to work around the problem
[ep-c2-mon-01][WARNIN]     Requires: selinux-policy-base >= 3.13.1-60.el7_2.3
[ep-c2-mon-01][WARNIN]     Installed: selinux-policy-targeted-3.13.1-60.el7.noarch (@CentOS/7)
[ep-c2-mon-01][WARNIN]         selinux-policy-base = 3.13.1-60.el7
[ep-c2-mon-01][WARNIN]     Available: selinux-policy-minimum-3.13.1-60.el7.noarch (CentOS-7)
[ep-c2-mon-01][WARNIN]         selinux-policy-base = 3.13.1-60.el7
[ep-c2-mon-01][WARNIN]     Available: selinux-policy-mls-3.13.1-60.el7.noarch (CentOS-7)
[ep-c2-mon-01][WARNIN]         selinux-policy-base = 3.13.1-60.el7
[ep-c2-mon-01][DEBUG ] You could try running: rpm -Va --nofiles --nodigest
[ep-c2-mon-01][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install ceph ceph-radosgw

EP Komarla
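Reading the dependency error: ceph-selinux wants selinux-policy-base >= 3.13.1-60.el7_2.3, while the node only has the base 3.13.1-60.el7 from the install media. A likely fix (a sketch, assuming the standard CentOS updates repository is reachable from the monitor node) is to update the SELinux policy packages first and then rerun ceph-deploy:

# on ep-c2-mon-01 (and the other nodes), pull the newer policy from CentOS updates
sudo yum clean all
sudo yum update selinux-policy selinux-policy-targeted
# then retry from the admin node
ceph-deploy install ep-c2-mon-01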
Re: [ceph-users] Ceph performance pattern
I am using O_DIRECT=1

-----Original Message-----
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Wednesday, July 27, 2016 8:33 AM
To: EP Komarla; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph performance pattern

Ok. Are you using O_DIRECT? That will disable readahead on the client, but if you don't use O_DIRECT you won't get the benefit of iodepth=16. See fio's man page:

"Number of I/O units to keep in flight against the file. Note that increasing iodepth beyond 1 will not affect synchronous ioengines (except for small degrees when verify_async is in use). Even async engines may impose OS restrictions causing the desired depth not to be achieved. This may happen on Linux when using libaio and not setting direct=1, since buffered IO is not async on that OS. Keep an eye on the IO depth distribution in the fio output to verify that the achieved depth is as expected. Default: 1."

IE, how you are testing could really affect the ability to do client-side readahead and may affect how much client-side concurrency you are getting.

Mark

On 07/27/2016 10:14 AM, EP Komarla wrote:
> I am using aio engine in fio.
>
> Fio is working on rbd images
>
> - epk
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson
> Sent: Tuesday, July 26, 2016 6:27 PM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph performance pattern
>
> Hi epk,
>
> Which ioengine are you using? If it's librbd, you might try playing with librbd readahead as well:
>
> # don't disable readahead after a certain number of bytes
> rbd readahead disable after bytes = 0
>
> # Set the librbd readahead to whatever:
> rbd readahead max bytes = 4194304
>
> If it's with kvm+guests, you may be better off playing with the guest readahead, but you can try the librbd readahead if you want.
>
> Another thing to watch out for is fragmentation. btrfs OSDs, for example, will fragment terribly after small random writes to RBD images due to how copy-on-write works. That can cause havoc with RBD sequential reads in general.
>
> Mark
>
> On 07/26/2016 06:38 PM, EP Komarla wrote:
>> Hi,
>>
>> I am showing below fio results for Sequential Read on my Ceph cluster.
>> I am trying to understand this pattern:
>>
>> - why is there a dip in the performance for block sizes 32k-256k?
>> - is this an expected performance graph?
>> - have you seen this kind of pattern before?
>>
>> My cluster details:
>> Ceph: Hammer release
>> Cluster: 6 nodes (dual Intel sockets) each with 20 OSDs and 4 SSDs (5 OSD journals on one SSD)
>> Client network: 10Gbps
>> Cluster network: 10Gbps
>>
>> FIO test:
>> - 2 Client servers
>> - Sequential Read
>> - Run time of 600 seconds
>> - Filesize = 1TB
>> - 10 rbd images per client
>> - Queue depth=16
>>
>> Any ideas on tuning this cluster? Where should I look first?
>>
>> Thanks,
>>
>> - epk
Re: [ceph-users] Ceph performance pattern
I am using aio engine in fio.

Fio is working on rbd images

- epk

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson
Sent: Tuesday, July 26, 2016 6:27 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph performance pattern

Hi epk,

Which ioengine are you using? If it's librbd, you might try playing with librbd readahead as well:

# don't disable readahead after a certain number of bytes
rbd readahead disable after bytes = 0

# Set the librbd readahead to whatever:
rbd readahead max bytes = 4194304

If it's with kvm+guests, you may be better off playing with the guest readahead, but you can try the librbd readahead if you want.

Another thing to watch out for is fragmentation. btrfs OSDs, for example, will fragment terribly after small random writes to RBD images due to how copy-on-write works. That can cause havoc with RBD sequential reads in general.

Mark

On 07/26/2016 06:38 PM, EP Komarla wrote:
> Hi,
>
> I am showing below fio results for Sequential Read on my Ceph cluster.
> I am trying to understand this pattern:
>
> - why is there a dip in the performance for block sizes 32k-256k?
> - is this an expected performance graph?
> - have you seen this kind of pattern before?
>
> My cluster details:
> Ceph: Hammer release
> Cluster: 6 nodes (dual Intel sockets) each with 20 OSDs and 4 SSDs (5 OSD journals on one SSD)
> Client network: 10Gbps
> Cluster network: 10Gbps
>
> FIO test:
> - 2 Client servers
> - Sequential Read
> - Run time of 600 seconds
> - Filesize = 1TB
> - 10 rbd images per client
> - Queue depth=16
>
> Any ideas on tuning this cluster? Where should I look first?
>
> Thanks,
>
> - epk
Re: [ceph-users] Ceph performance pattern
Thanks Somnath. I am running CentOS 7.2. Have you seen this pattern before?

- epk

From: Somnath Roy [mailto:somnath@sandisk.com]
Sent: Tuesday, July 26, 2016 4:44 PM
To: EP Komarla; ceph-users@lists.ceph.com
Subject: RE: Ceph performance pattern

Which OS/kernel are you running with? Try setting a bigger read_ahead_kb for sequential runs.

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of EP Komarla
Sent: Tuesday, July 26, 2016 4:38 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph performance pattern

Hi,

I am showing below fio results for Sequential Read on my Ceph cluster. I am trying to understand this pattern:

- why is there a dip in the performance for block sizes 32k-256k?
- is this an expected performance graph?
- have you seen this kind of pattern before?

[inline graph: sequential-read bandwidth vs. block size - image not included]

My cluster details:
Ceph: Hammer release
Cluster: 6 nodes (dual Intel sockets) each with 20 OSDs and 4 SSDs (5 OSD journals on one SSD)
Client network: 10Gbps
Cluster network: 10Gbps

FIO test:
- 2 Client servers
- Sequential Read
- Run time of 600 seconds
- Filesize = 1TB
- 10 rbd images per client
- Queue depth=16

Any ideas on tuning this cluster? Where should I look first?

Thanks,

- epk
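Somnath's read_ahead_kb suggestion applies to kernel-mapped rbd devices (and to the data disks on the OSD nodes). A sketch of how it is usually set; the device name and values are examples:

# check and raise the per-device readahead on a mapped rbd device (value in KB)
cat /sys/block/rbd0/queue/read_ahead_kb
echo 4096 | sudo tee /sys/block/rbd0/queue/read_ahead_kb
# equivalent via blockdev (this value is in 512-byte sectors)
sudo blockdev --setra 8192 /dev/rbd0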
[ceph-users] Ceph performance pattern
Hi,

I am showing below fio results for Sequential Read on my Ceph cluster. I am trying to understand this pattern:

- why is there a dip in the performance for block sizes 32k-256k?
- is this an expected performance graph?
- have you seen this kind of pattern before?

[inline graph: sequential-read bandwidth vs. block size - image not included]

My cluster details:
Ceph: Hammer release
Cluster: 6 nodes (dual Intel sockets) each with 20 OSDs and 4 SSDs (5 OSD journals on one SSD)
Client network: 10Gbps
Cluster network: 10Gbps

FIO test:
- 2 Client servers
- Sequential Read
- Run time of 600 seconds
- Filesize = 1TB
- 10 rbd images per client
- Queue depth=16

Any ideas on tuning this cluster? Where should I look first?

Thanks,

- epk
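For context, a fio job file along the lines of the test described above. This is a sketch only: the image and pool names are made up, and the librbd engine is an assumption (the thread later establishes that the actual runs used the aio engine against mapped images):

; seq-read.fio - rough reconstruction of the test described above
[global]
ioengine=rbd
clientname=admin
pool=rbd
rw=read
bs=64k
iodepth=16
direct=1
runtime=600
time_based

[img1]
rbdname=testimg1

[img2]
rbdname=testimg2

; ...one job section per rbd image, 10 per client; run with: fio seq-read.fio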
[ceph-users] Ceph performance calculator
Team,

I have a performance-related question on Ceph. I know the performance of a Ceph cluster depends on many factors: the type of storage servers, processors (number of processors, raw processor performance), memory, network links, type of disks, journal disks, and so on. On top of the hardware, it is also influenced by the type of operation you are doing - seqRead, seqWrite, block size, etc.

Today, one way we demonstrate performance is with benchmarks and test configurations. As a result, it is difficult to compare performance without understanding the underlying system and the use cases.

Now to my question: is there a Ceph performance calculator that takes all (or some) of these factors and gives an estimate of the performance you can expect for different scenarios?

I was asked this question and didn't know how to answer it, so I am checking with the wider user group to see if someone is aware of such a tool or knows how to do the calculation. Any pointers will be appreciated.

Thanks,

- epk
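Absent such a tool, a common back-of-envelope approach (a rough sketch only, using the 6-node/120-OSD cluster described elsewhere in these threads and generic assumptions of ~100 MB/s per HDD and 3x replication) looks something like this:

# rough ceiling for large sequential client writes, filestore with SSD journals:
#   raw disk bandwidth : 120 OSDs x ~100 MB/s   = ~12,000 MB/s
#   divided by replicas: 12,000 / 3             = ~4,000 MB/s
#   client network cap : 2 clients x 10 Gbps    = ~2,300 MB/s
#   expected ceiling   : min(4,000, 2,300)      = ~2.3 GB/s aggregate
# small random IO is instead bounded by per-HDD IOPS (~150-200 each) and latency,
# which is why 4K results look nothing like the large-block numbers.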
Re: [ceph-users] OSD dropped out, now trying to get them back on to the cluster
The first question I have is to understand why some disks/OSDs showed a status of 'down' - there was no activity on the cluster, and last night all the OSDs were up. What can cause OSDs to go down?

- epk

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of EP Komarla
Sent: Monday, July 18, 2016 6:43 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] OSD dropped out, now trying to get them back on to the cluster

Hi,

I have created a cluster with the below configuration:
- 6 Storage nodes, each with 20 disks
- I have a total of 120 OSDs

The cluster was working fine. All of a sudden this morning I noticed some OSDs (7 to be exact) were down on one server. I rebooted the server and 4 OSDs came back; three OSDs were stuck in the 'down' state. I tried to bring them online and wasn't successful, so I removed those OSDs using:

ceph osd rm osd.0
ceph osd crush remove osd.0

I ran the same commands for all 3 OSDs that were down. Now I am left with 17 OSDs on that node, and I am trying to bring the removed ones back online using:

ceph-deploy osd create ep-storage-2-14:sda:sdu

where sda is the OSD partition and sdu is the journal partition. I am getting the error below:

[inline screenshot of the error - image not included]

I am not following what this error means. Can someone help me bring these OSDs back? I know I am making some mistake, but I can't figure it out.

Thanks in advance,
- epk

EP Komarla
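To answer the "why did they go down" part, these are the usual places to look on the affected node (a sketch; the systemd unit names assume a Jewel RPM install, and the osd id and device names are examples):

ceph osd tree | grep down                      # which OSDs the cluster considers down
systemctl status ceph-osd@0                    # did the daemon crash or fail to start?
journalctl -u ceph-osd@0 --since "yesterday"   # daemon logs (also /var/log/ceph/ceph-osd.0.log)
dmesg | grep -iE "sd[a-z]|i/o error|xfs"       # kernel-level disk/controller errors
sudo smartctl -a /dev/sda                      # SMART health of the underlying disk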
[ceph-users] OSD dropped out, now trying to get them back on to the cluster
Hi,

I have created a cluster with the below configuration:
- 6 Storage nodes, each with 20 disks
- I have a total of 120 OSDs

The cluster was working fine. All of a sudden this morning I noticed some OSDs (7 to be exact) were down on one server. I rebooted the server and 4 OSDs came back; three OSDs were stuck in the 'down' state. I tried to bring them online and wasn't successful, so I removed those OSDs using:

ceph osd rm osd.0
ceph osd crush remove osd.0

I ran the same commands for all 3 OSDs that were down. Now I am left with 17 OSDs on that node, and I am trying to bring the removed ones back online using:

ceph-deploy osd create ep-storage-2-14:sda:sdu

where sda is the OSD partition and sdu is the journal partition. I am getting the error below:

[inline screenshot of the error - image not included]

I am not following what this error means. Can someone help me bring these OSDs back? I know I am making some mistake, but I can't figure it out.

Thanks in advance,
- epk

EP Komarla
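The screenshot is not readable here, but one common cause of re-create failures is leftover state from the removed OSDs (the cephx key and old partition metadata were never cleaned up). A fuller removal/re-deploy sequence looks roughly like this - a sketch only, and the disk zap step destroys whatever is still on sda:

ceph osd out osd.0
ceph osd crush remove osd.0
ceph auth del osd.0          # this step was missing from the sequence above
ceph osd rm osd.0
# wipe the old data/journal partitions, then re-create the OSD
ceph-deploy disk zap ep-storage-2-14:sda
ceph-deploy osd create ep-storage-2-14:sda:sdu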
Re: [ceph-users] rbd command anomaly
Thanks. It works.

From: c.y. lee [mailto:c...@inwinstack.com]
Sent: Wednesday, July 13, 2016 6:17 PM
To: EP Komarla
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] rbd command anomaly

Hi,

You need to specify the pool name.

rbd -p testpool info testvol11

On Thu, Jul 14, 2016 at 8:55 AM, EP Komarla <ep.koma...@flextronics.com> wrote:

Hi,

I am seeing an issue. I created 5 images, testvol11-15, and mapped them to /dev/rbd0-4. When I execute the command 'rbd showmapped', it correctly shows the images and the mappings:

[root@ep-compute-2-16 run1]# rbd showmapped
id pool     image     snap device
0  testpool testvol11 -    /dev/rbd0
1  testpool testvol12 -    /dev/rbd1
2  testpool testvol13 -    /dev/rbd2
3  testpool testvol14 -    /dev/rbd3
4  testpool testvol15 -    /dev/rbd4

I created each image with:

rbd create testvol11 -p testpool --size 512 -m ep-compute-2-15

and mapped it with:

rbd map testvol11 -p testpool --name client.admin -m ep-compute-2-15

However, when I try to find details about an image, it fails:

[root@ep-compute-2-16 run1]# rbd info testvol11
2016-07-13 17:50:23.093293 7f3372c1a7c0 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
rbd: error opening image testvol11: (2) No such file or directory

Even the image list comes back empty:

[root@ep-compute-2-16 run1]# rbd ls
[root@ep-compute-2-16 run1]#

I am unable to understand why I am seeing this anomaly. Any clues or pointers are appreciated.

Thanks,

- epk
[ceph-users] rbd command anomaly
Hi,

I am seeing an issue. I created 5 images, testvol11-15, and mapped them to /dev/rbd0-4. When I execute the command 'rbd showmapped', it correctly shows the images and the mappings:

[root@ep-compute-2-16 run1]# rbd showmapped
id pool     image     snap device
0  testpool testvol11 -    /dev/rbd0
1  testpool testvol12 -    /dev/rbd1
2  testpool testvol13 -    /dev/rbd2
3  testpool testvol14 -    /dev/rbd3
4  testpool testvol15 -    /dev/rbd4

I created each image with:

rbd create testvol11 -p testpool --size 512 -m ep-compute-2-15

and mapped it with:

rbd map testvol11 -p testpool --name client.admin -m ep-compute-2-15

However, when I try to find details about an image, it fails:

[root@ep-compute-2-16 run1]# rbd info testvol11
2016-07-13 17:50:23.093293 7f3372c1a7c0 -1 librbd::ImageCtx: error finding header: (2) No such file or directory
rbd: error opening image testvol11: (2) No such file or directory

Even the image list comes back empty:

[root@ep-compute-2-16 run1]# rbd ls
[root@ep-compute-2-16 run1]#

I am unable to understand why I am seeing this anomaly. Any clues or pointers are appreciated.

Thanks,

- epk
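For reference, the resolution from the reply above in both accepted spellings: rbd ls and rbd info default to the pool literally named "rbd", so images in testpool have to be addressed explicitly:

rbd ls testpool
rbd -p testpool info testvol11
rbd info testpool/testvol11     # pool/image spelling, equivalent to the -p form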
[ceph-users] Question on Sequential Write performance at 4K blocksize
Hi All,

I have a question on the performance of sequential writes at 4K block size.

Here is my configuration:

Ceph cluster: 6 nodes. Each node with:
- 20x HDDs (OSDs) - 10K RPM 1.2 TB SAS disks
- SSDs - 4x Intel S3710, 400GB, for OSD journals shared across the 20 HDDs (i.e., SSD:journal ratio 1:5)

Network:
- Client network - 10Gbps
- Cluster network - 10Gbps
- Each node with dual NIC - Intel 82599 ES - driver version 4.0.1

Traffic generators: 2 client servers, each with dual Intel sockets and 16 physical cores (32 with hyper-threading enabled)

Test program: FIO - sequential read/write, random read/write
- Block sizes - 4k, 32k, 256k...
- Number of jobs = 32; IO depth = 64
- Runtime = 10 minutes; ramp time = 5 minutes
- Filesize = 4096g (5TB)

I observe that my sequential write performance at 4K block size is very low - I am getting around 6 MB/s bandwidth. The performance improves significantly at larger block sizes (shown below).

FIO - Sequential Write test:

Block size    Sequential-write bandwidth (KB/s)
4K            5,694
32K           141,020
256K          747,421
1024K         602,236
4096K         683,029

Here are my questions:
- Why is the sequential write performance at 4K block size so low? Is this in line with what others see?
- Is it because of too few clients, i.e., traffic generators? I am planning to increase the number of clients to 4 servers.
- There is a later version of the Intel NIC driver, v4.3.15 - do you think upgrading to it will improve performance?

Any thoughts or pointers will be helpful.

Thanks,

- epk
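Two things worth checking before adding more clients (hedged suggestions, not a diagnosis): 5,694 KB/s at 4K is only about 1,400 write IOPS aggregate, which points at per-op latency rather than raw bandwidth; and for librbd-based runs the client-side writeback cache is what coalesces small sequential writes into larger OSD operations, so it is worth confirming it is enabled on the test clients:

[client]
# librbd writeback cache; merges adjacent small writes before they reach the OSDs
rbd cache = true
rbd cache writethrough until flush = true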
[ceph-users] Ceph OSD journal utilization
Hi,

I am looking for a way to monitor the utilization of OSD journals - by observing the utilization pattern over time, I can determine whether I have over-provisioned them. Is there a way to do this?

When I googled this topic, I saw one similar request from about 4 years back. I am wondering whether there has been any traction on it since then.

Thanks a lot.

- epk
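One avenue is the per-OSD perf counters (a sketch; exact counter names vary by release, and this assumes filestore OSDs with the default admin socket location):

# journal-related perf counters are exposed on each OSD's admin socket
ceph daemon osd.0 perf dump | grep -i journal
# e.g. journal queue ops/bytes and journal latency; sampling these periodically
# (cron, collectd, etc.) gives the utilization pattern over time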
Re: [ceph-users] Do you see a data loss if a SSD hosting several OSD journals crashes
So which is correct: must all replicas be written, or only min_size, before the ack? Either way, the takeaway for me is that writes are protected - even if the journal drive crashes, I am covered.

- epk

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Anthony D'Atri
Sent: Friday, May 20, 2016 1:32 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Do you see a data loss if a SSD hosting several OSD journals crashes

> Ceph will not acknowledge a client write before all journals (replica
> size, 3 by default) have received the data, so losing one journal SSD
> will NEVER result in an actual data loss.

Some say that all replicas must be written; others say that only min_size, 2 by default, must be written before ack.

--aad
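For what it's worth, the size and min_size a given pool is actually running with can be checked directly (the pool name here is an example). As I understand it, min_size governs whether a PG keeps accepting I/O when replicas are missing, while the write ack waits on the OSDs currently in the acting set:

ceph osd pool get rbd size
ceph osd pool get rbd min_size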
[ceph-users] NVRAM cards as OSD journals
Hi,

I am contemplating using an NVRAM card for OSD journals in place of SSD drives in our Ceph cluster.

Configuration:
* 4 Ceph servers
* Each server has 24 OSDs (each OSD is a 1TB SAS drive)
* 1 PCIe NVRAM card of 16GB capacity per Ceph server
* Both client & cluster networks are 10Gbps

As per the Ceph documents: "The expected throughput number should include the expected disk throughput (i.e., sustained data transfer rate), and network throughput. For example, a 7200 RPM disk will likely have approximately 100 MB/s. Taking the min() of the disk and network throughput should provide a reasonable expected throughput. Some users just start off with a 10GB journal size. For example: osd journal size = 10000"

Given that I have a single 16GB card per server that has to be carved up among all 24 OSDs, I will have to configure each OSD journal to be much smaller - around 600MB, i.e., 16GB / 24 drives. This value is much smaller than the 10GB/OSD journal that is generally used. So I am wondering if this configuration and journal size are valid. Is there a performance benefit to having a journal that is this small? Also, do I have to reduce the default "filestore max sync interval" from 5 seconds to a smaller value, say 2 seconds, to match the smaller journal size?

Have people used NVRAM cards in Ceph clusters as journals? What has their experience been? Any thoughts?
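Following the documentation's rule of thumb (journal sized to roughly 2 x expected throughput x filestore max sync interval - quoted here from memory, so double-check the docs), the arithmetic for this layout would be approximately:

# per-OSD journal available: 16 GB / 24 OSDs               ~ 650 MB
# needed at the default sync interval: 2 x 100 MB/s x 5 s  = 1000 MB  (too big)
# needed at a 2 s sync interval:       2 x 100 MB/s x 2 s  =  400 MB  (fits)
# so a smaller journal plus a shorter sync interval is self-consistent, e.g.:
[osd]
osd journal size = 650               # in MB
filestore max sync interval = 2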
[ceph-users] Do you see a data loss if a SSD hosting several OSD journals crashes
We are trying to assess whether we are going to see data loss if an SSD that is hosting journals for a few OSDs crashes. In our configuration, each SSD is partitioned into 5 chunks, and each chunk is mapped as the journal drive for one OSD.

What I understand from the Ceph documentation: "Consistency: Ceph OSD Daemons require a filesystem interface that guarantees atomic compound operations. Ceph OSD Daemons write a description of the operation to the journal and apply the operation to the filesystem. This enables atomic updates to an object (for example, placement group metadata). Every few seconds - between filestore max sync interval and filestore min sync interval - the Ceph OSD Daemon stops writes and synchronizes the journal with the filesystem, allowing Ceph OSD Daemons to trim operations from the journal and reuse the space. On failure, Ceph OSD Daemons replay the journal starting after the last synchronization operation."

So my question is: what happens if an SSD fails - am I going to lose all the data that has not yet been written/synchronized to the OSD? In my case, am I going to lose data for all 5 OSDs, which would be bad? This is of concern to us. What are the options to prevent any data loss at all?

Is it better to have the journals on the same hard drive, i.e., one journal per OSD hosted on the same hard drive as the OSD? Of course, performance will not be as good as having an SSD for the OSD journal. In this case, I am thinking I will not lose data, as there are secondary OSDs where the data is replicated (we are using triple replication).

Any thoughts? What other solutions have people adopted for data reliability and consistency to address the case I am describing?
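If a journal SSD does die, the affected OSDs are the ones whose journal partitions live on it. A sketch of how to identify and fail them out so replication rebuilds the data elsewhere (the osd ids are examples, and this assumes the usual ceph-disk-based partition layout):

# show which data partitions use which journal partitions on this node
sudo ceph-disk list
# mark the OSDs that lost their journal out so recovery re-replicates their PGs
ceph osd out osd.10
ceph osd out osd.11
# ...repeat for each OSD that had its journal on the failed SSD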