[ceph-users] OSD latency inaccurate reports?
Hello, I noticed that the commit/apply latency reported by: ceph pg dump -f json-pretty is very different from the values reported when querying the OSD admin sockets. What is your opinion? Which targets should I fetch metrics from in order to be as precise as possible? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
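For reference, both sources can be queried directly; a minimal sketch, assuming the default admin-socket path and that the daemon commands are run on the node hosting the OSD:

```shell
# Monitor-aggregated view: per-OSD fs_commit_latency / fs_apply_latency,
# the same figures that end up in "ceph pg dump" (reported periodically
# by the OSDs, so they lag behind reality)
ceph osd perf

# Per-daemon view, straight from the OSD's own perf counters via the
# admin socket; this is the most precise source
ceph daemon osd.0 perf dump

# Equivalent form naming the socket explicitly (path is the default one)
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
```

Since the pg dump values are aggregated from periodic OSD stat reports, polling each OSD's admin socket is generally the more accurate target.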
[ceph-users] Slow requests during ceph osd boot
Hello, after rebooting a ceph node, while the OSDs are booting and joining the cluster, we experience slow requests that get resolved immediately after the cluster recovers. It is important to note that before the node reboot we set the noout flag in order to prevent recovery - so there are only degraded PGs while the OSDs are down - and let the cluster handle the OSDs going down/up in the lightest way. Is there any tunable we should consider in order to avoid service degradation for our ceph clients? Regards, Kostis
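For what it's worth, a common approach is to pause recovery traffic entirely for the maintenance window and to throttle what remains afterwards; a sketch, assuming a Firefly/Hammer-era cluster (these flag and option names exist there):

```shell
# Before rebooting the node: keep OSDs "in" and stop recovery/backfill I/O
ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover

# ... reboot the node, wait until "ceph -s" shows all OSDs up again ...

ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout

# Throttle whatever recovery is still needed so client I/O keeps priority
ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
```

Lowering osd_max_backfills and osd_recovery_max_active is the usual tunable answer to recovery-induced slow requests; the right values depend on your hardware.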
Re: [ceph-users] Firefly 0.80.10 ready to upgrade to?
On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas dante1...@gmail.com wrote: Hello, it seems that new packages for firefly have been uploaded to repo. However, I can't find any details in Ceph Release notes. There is only one thread in ceph-devel [1], but it is not clear what this new version is about. Is it safe to upgrade from 0.80.9 to 0.80.10? These packages got created and uploaded to the repository without release notes. I'm not sure why but I believe they're safe to use. Hopefully Sage and our release guys can resolve that soon as we've gotten several queries on the subject. :) -Greg
Re: [ceph-users] All pgs with - up [0] acting [0], new cluster installation
Maybe this can help to get to the origin of the problem. If I run ceph pg dump, at the end of the response I get:

osdstat kbused  kbavail  kb       hb in              hb out
0       36688   5194908  5231596  [1,2,3,4,5,6,7,8]  []
1       34004   5197592  5231596  []                 []
2       34004   5197592  5231596  [1]                []
3       34004   5197592  5231596  [0,1,2,4,5,6,7,8]  []
4       34004   5197592  5231596  [1,2]              []
5       34004   5197592  5231596  [1,2,4]            []
6       34004   5197592  5231596  [0,1,2,3,4,5,7,8]  []
7       34004   5197592  5231596  [1,2,4,5]          []
8       34004   5197592  5231596  [1,2,4,5,7]        []
sum     308720  46775644 47084364

Can someone please help me? 2015-07-13 11:45 GMT+02:00 alberto ayllon albertoayllon...@gmail.com: Hello everybody, and thanks for your help. I'm a newbie to Ceph and I'm trying to install a Ceph cluster for test purposes. I have just installed a Ceph cluster with three VMs (Ubuntu 14.04); each one has one mon daemon and three OSDs, and each server has 3 disks. The cluster has only one pool (rbd) with pg_num and pgp_num = 280, and osd pool get rbd size = 2. I made the cluster's installation with ceph-deploy; the ceph version is 0.94.2. I think the cluster's OSDs are having peering problems, because if I run ceph status, it returns:

# ceph status
    cluster d54a2216-b522-4744-a7cc-a2106e1281b6
     health HEALTH_WARN
            280 pgs degraded
            280 pgs stuck degraded
            280 pgs stuck unclean
            280 pgs stuck undersized
            280 pgs undersized
     monmap e3: 3 mons at {ceph01=172.16.70.158:6789/0,ceph02=172.16.70.159:6789/0,ceph03=172.16.70.160:6789/0}
            election epoch 38, quorum 0,1,2 ceph01,ceph02,ceph03
     osdmap e46: 9 osds: 9 up, 9 in
      pgmap v129: 280 pgs, 1 pools, 0 bytes data, 0 objects
            301 MB used, 45679 MB / 45980 MB avail
                 280 active+undersized+degraded

And for all pgs, the command ceph pg map X.yy returns something like: osdmap e46 pg 0.d7 (0.d7) -> up [0] acting [0]. As far as I know the Acting Set and Up Set must hold enough OSDs for the pool size, but as they are just [0], there are no OSDs defined to store the PGs' replicas, and I think this is why all PGs are in active+undersized+degraded state. Does anyone have any idea what I have to do for the Acting Set and Up Set to reach correct values? Thanks a lot!
Re: [ceph-users] He8 drives
On 13 July 2015 at 21:36, Emmanuel Florac eflo...@intellique.com wrote: I've benchmarked it and found it has almost exactly the same performance profile as the He6. Compared to the Seagate 6TB it draws much less power (almost half), and that's the main selling point IMO, along with durability. That's consistent with this other published review (which I found after the storagereview one): http://www.tomsitpro.com/articles/hgst-ultrastar-he8-8tb-hdd,2-921-8.html So it seems like a decent option for a capacity-first Ceph cluster. -- Cheers, ~Blairo
Re: [ceph-users] All pgs with - up [0] acting [0], new cluster installation
On 13-07-15 14:07, alberto ayllon wrote: [...] Thanks for your answer Wido. Here is the output of ceph osd tree:

# ceph osd tree
ID WEIGHT TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1      0 root default
-2      0     host ceph01
 0      0         osd.0       up  1.00000          1.00000
 3      0         osd.3       up  1.00000          1.00000
 6      0         osd.6       up  1.00000          1.00000
-3      0     host ceph02
 1      0         osd.1       up  1.00000          1.00000
 4      0         osd.4       up  1.00000          1.00000
 7      0         osd.7       up  1.00000          1.00000
-4      0     host ceph03
 2      0         osd.2       up  1.00000          1.00000
 5      0         osd.5       up  1.00000          1.00000
 8      0         osd.8       up  1.00000          1.00000

The weights of all the OSDs are zero (0). How big are the disks? I think they are very tiny, e.g. 10GB? You probably want somewhat bigger disks to test with. Or set the weight of each OSD manually: $ ceph osd crush reweight osd.X 1 Wido [rest of quoted thread trimmed]
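If rebuilding isn't convenient, the manual reweight Wido mentions can be applied to every OSD in one go; a sketch (weight 1 is an arbitrary uniform value for a small test cluster, not a recommendation for production, where weight usually tracks capacity in TB):

```shell
# Assign a uniform CRUSH weight to every OSD in the cluster
for id in $(ceph osd ls); do
    ceph osd crush reweight osd.$id 1
done

# Weights should now be non-zero and the PGs should peer and go clean
ceph osd tree
ceph -s
```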
Re: [ceph-users] All pgs with - up [0] acting [0], new cluster installation
Hi Wido. Thanks again. I will rebuild the cluster with bigger disks. Again, thanks for your help. 2015-07-13 14:15 GMT+02:00 Wido den Hollander w...@42on.com: [rest of quoted thread trimmed]
Re: [ceph-users] CephFS kernel client reboots on write
On Mon, Jul 13, 2015 at 9:49 AM, Ilya Dryomov idryo...@gmail.com wrote: On Fri, Jul 10, 2015 at 9:36 PM, Jan Pekař jan.pe...@imatic.cz wrote: Hi all, I think I found a bug in the cephfs kernel client. When I create a directory in cephfs and set its layout to ceph.dir.layout=stripe_unit=1073741824 stripe_count=1 object_size=1073741824 pool=somepool, attempts to write a larger file will cause a kernel hang or reboot. When I'm using the fuse-based cephfs client it works (I now have some issues with fuse and concurrent writes too, but that is a different kind of problem). Which kernel are you running? What do you see in dmesg when it hangs? What is the panic splat when it crashes? How big is the larger file that you are trying to write? I think object_size and stripe_unit 1073741824 is the max value, or can I set it higher? The default values stripe_unit=4194304 stripe_count=1 object_size=4194304 work without problems on write. My goal was not to split a file between OSDs every 4MB of its size but to save it in one piece. This is generally not a very good idea - you have to consider the distribution of objects across PGs and how your OSDs will be utilized. Yeah. Beyond that, the OSDs will reject writes exceeding a certain size (90MB by default). I'm not sure exactly what mismatch you're running into here, but I can think of several different ways a 1GB write/single object could get stuck; it's just not a good idea. -Greg
[ceph-users] Firefly 0.80.10 ready to upgrade to?
Hello, it seems that new packages for firefly have been uploaded to the repo. However, I can't find any details in the Ceph release notes. There is only one thread in ceph-devel [1], but it is not clear what this new version is about. Is it safe to upgrade from 0.80.9 to 0.80.10? Regards, Kostis [1] http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/25684
Re: [ceph-users] CephFS kernel client reboots on write
On Fri, Jul 10, 2015 at 9:36 PM, Jan Pekař jan.pe...@imatic.cz wrote: Hi all, I think I found a bug in the cephfs kernel client. When I create a directory in cephfs and set its layout to ceph.dir.layout=stripe_unit=1073741824 stripe_count=1 object_size=1073741824 pool=somepool, attempts to write a larger file will cause a kernel hang or reboot. When I'm using the fuse-based cephfs client it works (I now have some issues with fuse and concurrent writes too, but that is a different kind of problem). Which kernel are you running? What do you see in dmesg when it hangs? What is the panic splat when it crashes? How big is the larger file that you are trying to write? I think object_size and stripe_unit 1073741824 is the max value, or can I set it higher? The default values stripe_unit=4194304 stripe_count=1 object_size=4194304 work without problems on write. My goal was not to split a file between OSDs every 4MB of its size but to save it in one piece. This is generally not a very good idea - you have to consider the distribution of objects across PGs and how your OSDs will be utilized. Thanks, Ilya
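For completeness, directory layouts are set through CephFS virtual xattrs; a sketch of inspecting a layout and restoring the default 4 MB striping (the mount point and directory name are examples, not from the thread):

```shell
# Show the layout currently applied to a directory (errors if none is set)
getfattr -n ceph.dir.layout /mnt/cephfs/mydir

# Reset to the default layout: 4 MB objects, no striping across objects
setfattr -n ceph.dir.layout.object_size  -v 4194304 /mnt/cephfs/mydir
setfattr -n ceph.dir.layout.stripe_unit  -v 4194304 /mnt/cephfs/mydir
setfattr -n ceph.dir.layout.stripe_count -v 1       /mnt/cephfs/mydir
```

New files created under the directory inherit the layout; existing files keep the layout they were created with.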
[ceph-users] 32 bit limitation for ceph on arm
Hi, I am building a ceph cluster on Arm. Is there any limitation for 32 bit in regard to number of nodes, storage capacity etc? Please suggest.. Thanks. Daleep Singh Bais
Re: [ceph-users] Firefly 0.80.10 ready to upgrade to?
On 13-07-15 12:25, Kostis Fardelas wrote: Hello, it seems that new packages for firefly have been uploaded to the repo. However, I can't find any details in the Ceph release notes. There is only one thread in ceph-devel [1], but it is not clear what this new version is about. Is it safe to upgrade from 0.80.9 to 0.80.10? I already have multiple systems running 0.80.10, which came from .7, .8 and .9. .10 works just fine. One of them is a 500TB production cluster. Wido Regards, Kostis [1] http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/25684
[ceph-users] All pgs with - up [0] acting [0], new cluster installation
Hello everybody, and thanks for your help. I'm a newbie to Ceph and I'm trying to install a Ceph cluster for test purposes. I have just installed a Ceph cluster with three VMs (Ubuntu 14.04); each one has one mon daemon and three OSDs, and each server has 3 disks. The cluster has only one pool (rbd) with pg_num and pgp_num = 280, and osd pool get rbd size = 2. I made the cluster's installation with ceph-deploy; the ceph version is 0.94.2. I think the cluster's OSDs are having peering problems, because if I run ceph status, it returns:

# ceph status
    cluster d54a2216-b522-4744-a7cc-a2106e1281b6
     health HEALTH_WARN
            280 pgs degraded
            280 pgs stuck degraded
            280 pgs stuck unclean
            280 pgs stuck undersized
            280 pgs undersized
     monmap e3: 3 mons at {ceph01=172.16.70.158:6789/0,ceph02=172.16.70.159:6789/0,ceph03=172.16.70.160:6789/0}
            election epoch 38, quorum 0,1,2 ceph01,ceph02,ceph03
     osdmap e46: 9 osds: 9 up, 9 in
      pgmap v129: 280 pgs, 1 pools, 0 bytes data, 0 objects
            301 MB used, 45679 MB / 45980 MB avail
                 280 active+undersized+degraded

And for all pgs, the command ceph pg map X.yy returns something like: osdmap e46 pg 0.d7 (0.d7) -> up [0] acting [0]. As far as I know the Acting Set and Up Set must hold enough OSDs for the pool size, but as they are just [0], there are no OSDs defined to store the PGs' replicas, and I think this is why all PGs are in active+undersized+degraded state. Does anyone have any idea what I have to do for the Acting Set and Up Set to reach correct values? Thanks a lot!
Re: [ceph-users] All pgs with - up [0] acting [0], new cluster installation
On 13-07-15 13:12, alberto ayllon wrote: Maybe this can help to get to the origin of the problem. If I run ceph pg dump, at the end of the response I get: [osdstat table trimmed] What does 'ceph osd tree' tell you? It seems there is something wrong with your CRUSH map. Wido

Thanks for your answer Wido. Here is the output of ceph osd tree:

# ceph osd tree
ID WEIGHT TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1      0 root default
-2      0     host ceph01
 0      0         osd.0       up  1.00000          1.00000
 3      0         osd.3       up  1.00000          1.00000
 6      0         osd.6       up  1.00000          1.00000
-3      0     host ceph02
 1      0         osd.1       up  1.00000          1.00000
 4      0         osd.4       up  1.00000          1.00000
 7      0         osd.7       up  1.00000          1.00000
-4      0     host ceph03
 2      0         osd.2       up  1.00000          1.00000
 5      0         osd.5       up  1.00000          1.00000
 8      0         osd.8       up  1.00000          1.00000

[rest of quoted thread trimmed]
Re: [ceph-users] xattrs vs omap
Sorry for reviving an old thread, but could I get some input on this, pretty please? ext4 has 256-byte inodes by default (at least according to the docs) but the fragment below says: OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512) The default 512b is too much if the inode is just 256b, so shouldn't that be 256b in case people use the default ext4 inode size? Anyway, is it better to format ext4 with larger inodes (say 2048b) and set filestore_max_inline_xattr_size_other=1536, or leave it at defaults? (As I understand it, on ext4 xattrs are limited to one block; what doesn't fit in the inode can spill into one separate block - maybe someone knows better.) Is filestore_max_inline_xattr_size an absolute limit, or is it filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality? Does the OSD do the sane thing if for some reason the xattrs do not fit? What are the performance implications of storing the xattrs in leveldb? And lastly - what size of xattrs should I really expect if all I use is RBD for OpenStack instances? (No radosgw, no cephfs, but heavy on rbd image and pool snapshots.) This overhead is quite large. My plan so far is to format the drives like this: mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256 (2048b inodes, 4096b block size, one inode per 512k of space) and set filestore_max_inline_xattr_size_other=1536. Does that make sense? Thanks! Jan On 02 Jul 2015, at 12:18, Jan Schermer j...@schermer.cz wrote: Does anyone have a known-good set of parameters for ext4? I want to try it as well but I'm a bit worried what happens if I get it wrong. Thanks Jan On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Balzer Sent: 02 July 2015 02:23 To: Ceph Users Subject: Re: [ceph-users] xattrs vs omap On Thu, 2 Jul 2015 00:36:18 + Somnath Roy wrote: It is replaced with the following config option..
// Use omap for xattrs for attrs over // filestore_max_inline_xattr_size or OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536) OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048) OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512) // for more than filestore_max_inline_xattrs attrs OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10) OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10) OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2) If these limits are crossed, xattrs will be stored in omap.. Sounds fair. Since I only use RBD I don't think it will ever exceed this. Possibly, see my thread about the performance difference between new and old pools. Still not quite sure what's going on, but for some reason some of the objects behind RBDs have larger xattrs, which is causing really poor performance. Thanks, Chibi For ext4, you can use either filestore_max*_other or filestore_max_inline_xattrs/ filestore_max_inline_xattr_size. In any case, the latter two will override everything. Thanks Regards Somnath -Original Message- From: Christian Balzer [mailto:ch...@gol.com] Sent: Wednesday, July 01, 2015 5:26 PM To: Ceph Users Cc: Somnath Roy Subject: Re: [ceph-users] xattrs vs omap Hello, On Wed, 1 Jul 2015 15:24:13 + Somnath Roy wrote: It doesn't matter, I think filestore_xattr_use_omap is a 'noop' and not used in Hammer. Then what was this functionality replaced with, esp. considering EXT4 based OSDs? Chibi Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Adam Tygart Sent: Wednesday, July 01, 2015 8:20 AM To: Ceph Users Subject: [ceph-users] xattrs vs omap Hello all, I've got a coworker who put filestore_xattr_use_omap = true in the ceph.conf when we first started building the cluster. Now he can't remember why.
He thinks it may be a holdover from our first Ceph cluster (running dumpling on ext4, iirc). In the newly built cluster, we are using XFS with 2048 byte inodes, running Ceph 0.94.2. It currently has production data in it. From my reading of other threads, it looks like this is probably not something you want set to true (at least on XFS), due to performance implications. Is this something you can change on a running cluster? Is it worth the hassle? Thanks, Adam
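Jan's ext4 plan from earlier in the thread, expressed as a config fragment (the values are his proposal for 2048-byte-inode ext4, not Ceph defaults):

```ini
[osd]
; ext4 formatted with mkfs.ext4 -I 2048 ..., so the inode is 2048b;
; cap inline xattrs below that to leave room for other inode metadata,
; anything larger spills to omap (leveldb)
filestore_max_inline_xattr_size_other = 1536
```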
Re: [ceph-users] All pgs with - up [0] acting [0], new cluster installation
On 13-07-15 13:12, alberto ayllon wrote: Maybe this can help to get to the origin of the problem. If I run ceph pg dump, at the end of the response I get: [osdstat table trimmed] What does 'ceph osd tree' tell you? It seems there is something wrong with your CRUSH map. Wido [rest of quoted thread trimmed]
Re: [ceph-users] He8 drives
On Wed, 8 Jul 2015 10:28:17 +1000, Blair Bethwaite blair.bethwa...@gmail.com wrote: Does anyone have any experience with the newish HGST He8 8TB Helium filled HDDs? I've benchmarked it and found it has almost exactly the same performance profile as the He6. Compared to the Seagate 6TB it draws much less power (almost half), and that's the main selling point IMO, along with durability. -- Emmanuel Florac | Direction technique | Intellique | eflo...@intellique.com | +33 1 78 94 84 02
Re: [ceph-users] 32 bit limitation for ceph on arm
Why do you stick to 32-bit? Kinjo On Mon, Jul 13, 2015 at 7:35 PM, Daleep Bais daleepb...@gmail.com wrote: Hi, I am building a ceph cluster on Arm. Is there any limitation for 32 bit in regard to number of nodes, storage capacity etc? Please suggest.. Thanks. Daleep Singh Bais -- Life w/ Linux http://i-shinobu.hatenablog.com/
Re: [ceph-users] slow requests going up and down
Does ceph health detail show anything about stale or unclean PGs, or are you just getting the blocked ops messages? On 7/13/15, 5:38 PM, Deneau, Tom tom.den...@amd.com wrote: I have a cluster where over the weekend something happened, and successive calls to ceph health detail show things like below. What does it mean when the number of blocked requests goes up and down like this? Some clients are still running successfully. -- Tom Deneau, AMD

HEALTH_WARN 20 requests are blocked > 32 sec; 2 osds have slow requests
20 ops are blocked > 536871 sec
2 ops are blocked > 536871 sec on osd.5
18 ops are blocked > 536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests
4 ops are blocked > 536871 sec
2 ops are blocked > 536871 sec on osd.5
2 ops are blocked > 536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 27 requests are blocked > 32 sec; 2 osds have slow requests
27 ops are blocked > 536871 sec
2 ops are blocked > 536871 sec on osd.5
25 ops are blocked > 536871 sec on osd.7
2 osds have slow requests

HEALTH_WARN 34 requests are blocked > 32 sec; 2 osds have slow requests
34 ops are blocked > 536871 sec
9 ops are blocked > 536871 sec on osd.5
25 ops are blocked > 536871 sec on osd.7
2 osds have slow requests
Re: [ceph-users] Issue with journal on another drive
Thank you Lionel, this was very helpful. I actually chose to split the partition and then recreated the OSDs. Everything is up and running now. Rimma On 7/13/15 6:34 PM, Lionel Bouton wrote: On 07/14/15 00:08, Rimma Iontel wrote: Hi all, [...] Is there something that needs to be done to the journal partition to enable sharing between multiple OSDs? Or is there something else that's causing the issue? IIRC you can't share a volume between multiple OSDs. What you could do, if splitting this partition isn't possible, is create an LVM volume group with it as the single physical volume (change the type of the partition to lvm, pvcreate /dev/sda6, vgcreate journal_vg /dev/sda6). Then you can create a logical volume in it for each of your OSDs (lvcreate -n osdN_journal -L one_third_of_available_space journal_vg) and use them (/dev/journal_vg/osdN_journal) in your configuration. Lionel
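Lionel's LVM suggestion, collected into one runnable sketch (the device name is the one from the thread; the journal size and OSD count are examples, adjust to your system):

```shell
# Turn the single journal partition into an LVM volume group
pvcreate /dev/sda6
vgcreate journal_vg /dev/sda6

# One logical volume per OSD journal (here: three OSDs, 5G each)
for n in 0 1 2; do
    lvcreate -n osd${n}_journal -L 5G journal_vg
done

# Then point each OSD at /dev/journal_vg/osdN_journal (osd journal = ...)
# in its ceph.conf section
```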
Re: [ceph-users] 32 bit limitation for ceph on arm
Hi, I have existing hardware which I have to use; please suggest what I could implement with it. Thanks On Mon, Jul 13, 2015 at 5:51 PM, Shinobu Kinjo shinobu...@gmail.com wrote: Why do you stick to 32bit? Kinjo On Mon, Jul 13, 2015 at 7:35 PM, Daleep Bais daleepb...@gmail.com wrote: Hi, I am building a ceph cluster on Arm. Is there any limitation for 32 bit in regard to number of nodes, storage capacity etc? Please suggest. Thanks. Daleep Singh Bais -- Life w/ Linux http://i-shinobu.hatenablog.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] slow requests going up and down
Hello, to quote Sherlock Holmes: Data, data, data. I cannot make bricks without clay. That the number of blocked requests is varying is indeed interesting, but I presume you're more interested in fixing this than dissecting this particular tidbit? If so... Start with the basics: all relevant software versions, a description of your cluster, full outputs of ceph osd tree and ceph -s, etc. The same 2 OSDs are affected; anything peculiar going on in their logs? How about their SMART status? Are they being deep-scrubbed (logs above) or otherwise busy (atop, iostat)? You may find something in the performance counters, blocked requests section, see: http://ceph.com/docs/v0.69/dev/perf_counters/ Lastly, the most likely fix will be restarting the affected OSDs. See also: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg15410.html Christian On Mon, 13 Jul 2015 22:38:57 + Deneau, Tom wrote: I have a cluster where over the weekend something happened and successive calls to ceph health detail show things like below. What does it mean when the number of blocked requests goes up and down like this? Some clients are still running successfully. 
-- Tom Deneau, AMD HEALTH_WARN 20 requests are blocked > 32 sec; 2 osds have slow requests 20 ops are blocked > 536871 sec 2 ops are blocked > 536871 sec on osd.5 18 ops are blocked > 536871 sec on osd.7 2 osds have slow requests HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests 4 ops are blocked > 536871 sec 2 ops are blocked > 536871 sec on osd.5 2 ops are blocked > 536871 sec on osd.7 2 osds have slow requests HEALTH_WARN 27 requests are blocked > 32 sec; 2 osds have slow requests 27 ops are blocked > 536871 sec 2 ops are blocked > 536871 sec on osd.5 25 ops are blocked > 536871 sec on osd.7 2 osds have slow requests HEALTH_WARN 34 requests are blocked > 32 sec; 2 osds have slow requests 34 ops are blocked > 536871 sec 9 ops are blocked > 536871 sec on osd.5 25 ops are blocked > 536871 sec on osd.7 2 osds have slow requests -- Christian Balzer, Network/Systems Engineer ch...@gol.com Global OnLine Japan/Fusion Communications http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
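The per-OSD breakdown in these health messages can be tracked mechanically. Below is a minimal sketch: the sample input is the last HEALTH_WARN block quoted in this thread (with the `>` signs that the list archive stripped put back); in practice you would pipe `ceph health detail` itself into the awk filter. Only the osd numbers and counts from the thread are used; nothing else is assumed.

```shell
# Extract "osd-name blocked-op-count" pairs from `ceph health detail` output.
# Sample input is the last HEALTH_WARN block from this thread.
sample='HEALTH_WARN 34 requests are blocked > 32 sec; 2 osds have slow requests
34 ops are blocked > 536871 sec
9 ops are blocked > 536871 sec on osd.5
25 ops are blocked > 536871 sec on osd.7
2 osds have slow requests'

# Keep only the per-OSD lines; the last field is the OSD, the first the count.
blocked=$(printf '%s\n' "$sample" | awk '/on osd\./ {print $NF, $1}')
printf '%s\n' "$blocked"
# → osd.5 9
# → osd.7 25
```

Running this repeatedly (for example under watch) shows whether the same OSDs keep accumulating blocked ops over time, which is exactly what points toward restarting those two OSDs as suggested above.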
[ceph-users] how to recover from: 1 pgs down; 10 pgs incomplete; 10 pgs stuck inactive; 10 pgs stuck unclean
Hello everybody, I was testing a ceph cluster with osd_pool_default_size = 2. While rebuilding the OSD on one ceph node, a disk in another node started getting read errors and ceph kept taking that OSD down. Instead of executing ceph osd set nodown while the other node was rebuilding, I kept restarting the OSD for a while; ceph took the OSD in for a few minutes and then took it back down again. I then removed the bad OSD from the cluster and later added it back in with the nodown flag set and a weight of zero, moving all the data away. Then I removed the OSD again and added a new OSD with a new hard drive. However, I ended up with the following cluster status and I can't seem to find how to get the cluster healthy again. I'm doing this as a test before taking this ceph configuration into further production. http://paste.debian.net/plain/281922 If I lost data, my bad, but how could I figure out in what pool the data was lost and in what rbd volume (so which kvm guest lost data)? Kind regards, Jelle de Jong ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ruby bindings for Librados
Hi, I have a Ruby application which currently talks S3, but I want to have the application talk native RADOS. Now looking online I found various Ruby bindings for librados, but none of them seem official. What I found: * desperados: https://github.com/johnl/desperados * ceph-ruby: https://github.com/netskin/ceph-ruby The last commit for desperados was in March 2013 and ceph-ruby in April 2015. Anybody out there using Ruby bindings? If so, which one and what are the experiences? -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ruby bindings for Librados
Hi Wido, I'm the dev of https://github.com/netskin/ceph-ruby and still use it in production on some systems. It has everything I need, so I didn't develop it any further. If you find any bugs or need new features, just open an issue and I'm happy to have a look. Best Corin On 13.07.2015 at 21:24, Wido den Hollander wrote: Hi, I have a Ruby application which currently talks S3, but I want to have the application talk native RADOS. Now looking online I found various Ruby bindings for librados, but none of them seem official. What I found: * desperados: https://github.com/johnl/desperados * ceph-ruby: https://github.com/netskin/ceph-ruby The last commit for desperados was in March 2013 and ceph-ruby in April 2015. Anybody out there using Ruby bindings? If so, which one and what are the experiences? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] He8 drives
Hi, I have just expanded our ceph-cluster (7 nodes) with one 8TB HGST (changed from 4TB to 8TB) on each node (and 11 4TB HGST). But I have set the primary affinity to 0 for the 8 TB disks... in this case my performance values are not 8-TB-disk related. Udo On 08.07.2015 02:28, Blair Bethwaite wrote: Hi folks, Does anyone have any experience with the newish HGST He8 8TB Helium filled HDDs? Storagereview looked at them here: http://www.storagereview.com/hgst_ultrastar_helium_he8_8tb_enterprise_hard_drive_review. I'm torn as to the lower read performance shown there than e.g. the He6 or Seagate 6TB, but thing is, I think we probably have enough aggregate IOPs with ~170 drives. Has anyone tried these in a Ceph cluster yet? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
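For reference, the primary-affinity trick Udo describes is a one-liner per disk. A sketch with a placeholder OSD id (note that on Firefly-era clusters the monitors must have mon_osd_allow_primary_affinity = true set before the command is accepted):

```
# Stop an 8 TB disk from being chosen as primary for its PGs; it still
# stores replicas, so its capacity is used, but client reads are served
# from the faster 4 TB primaries.
ceph osd primary-affinity osd.<id> 0

# Restore normal behaviour later:
ceph osd primary-affinity osd.<id> 1
```

This is why the performance numbers above are not dominated by the 8 TB drives: with affinity 0 they are never picked as the primary copy.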
Re: [ceph-users] xattrs vs omap
inline -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Monday, July 13, 2015 2:32 AM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] xattrs vs omap Sorry for reviving an old thread, but could I get some input on this, pretty please? ext4 has 256-byte inodes by default (at least according to docs) but the fragment below says: OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512) The default 512b is too much if the inode is just 256b, so shouldn’t that be 256b in case people use the default ext4 inode size? Anyway, is it better to format ext4 with larger inodes (say 2048b) and set filestore_max_inline_xattr_size_other=1536, or leave it at defaults? [Somnath] Why 1536? Why not 1024 or some power of 2? I am not seeing any harm though, but curious. (As I understand it, on ext4 xattrs are limited to one block; inode size + something can spill to one different inode - maybe someone knows better). [Somnath] The xattr size (_) is now more than 256 bytes and it will spill over, so a bigger inode size will be good. But I would suggest you do your benchmark before putting it into production. Is filestore_max_inline_xattr_size an absolute limit, or is it filestore_max_inline_xattr_size*filestore_max_inline_xattrs in reality? [Somnath] The *_size option tracks the xattr size per attribute, and *inline_xattrs tracks the max number of inline attributes allowed. So, if an xattr's size is > *_size, it will go to omap, and likewise if the total number of xattrs is > *inline_xattrs, it will go to omap. If you are only using rbd, the number of inline xattrs will always be 2 and it will not cross that default max limit. Does the OSD do the sane thing if for some reason the xattrs do not fit? What are the performance implications of storing the xattrs in leveldb? [Somnath] Even though I don't have the exact numbers, it has a significant overhead if the xattrs go to leveldb. 
And lastly - what size of xattrs should I really expect if all I use is RBD for OpenStack instances? (No radosgw, no cephfs, but heavy on rbd image and pool snapshots.) This overhead is quite large. [Somnath] It will be 2 xattrs; the default _ will be a little bigger than 256 bytes, and _snapset is small, depending on the number of snaps/clones, but it is unlikely to cross the 256 bytes range. My plan so far is to format the drives like this: mkfs.ext4 -I 2048 -b 4096 -i 524288 -E stride=32,stripe-width=256 (2048b inode, 4096b block size, one inode per 512k of space) and set filestore_max_inline_xattr_size_other=1536. [Somnath] Not much idea on ext4, sorry.. Does that make sense? Thanks! Jan On 02 Jul 2015, at 12:18, Jan Schermer j...@schermer.cz wrote: Does anyone have a known-good set of parameters for ext4? I want to try it as well but I’m a bit worried what happens if I get it wrong. Thanks Jan On 02 Jul 2015, at 09:40, Nick Fisk n...@fisk.me.uk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Balzer Sent: 02 July 2015 02:23 To: Ceph Users Subject: Re: [ceph-users] xattrs vs omap On Thu, 2 Jul 2015 00:36:18 + Somnath Roy wrote: It is replaced with the following config option.. // Use omap for xattrs for attrs over // filestore_max_inline_xattr_size or OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536) OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048) OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512) // for more than filestore_max_inline_xattrs attrs OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10) OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10) OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2) If these limits are crossed, xattrs will be stored in omap.. Sounds fair. Since I only use RBD I don't think it will ever exceed this. 
Possibly, see my thread about the performance difference between new and old pools. Still not quite sure what's going on, but for some reason some of the objects behind RBDs have larger xattrs, which is causing really poor performance. Thanks, Chibi For ext4, you can use either filestore_max*_other or filestore_max_inline_xattrs/ filestore_max_inline_xattr_size. In any case, the latter two will override everything. Thanks Regards Somnath -Original Message- From: Christian Balzer [mailto:ch...@gol.com] Sent: Wednesday, July 01, 2015 5:26 PM To: Ceph Users Cc: Somnath Roy Subject: Re: [ceph-users] xattrs vs omap Hello, On Wed, 1 Jul 2015 15:24:13 + Somnath Roy wrote: It doesn't matter, I think filestore_xattr_use_omap is a 'noop' and not used in Hammer. Then what was this functionality replaced with, esp. considering EXT4 based OSDs? Chibi Thanks Regards Somnath -Original Message- From: ceph-users
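Pulling Jan's plan and Somnath's answers together, the resulting configuration would look roughly like the fragment below. This is a sketch of the settings discussed in the thread, not a tested recommendation - as Somnath says, benchmark before production:

```
# ceph.conf - filestore xattr settings for ext4-backed OSDs (Hammer-era names)
[osd]
# ext4 formatted with 2048-byte inodes (mkfs.ext4 -I 2048 ...),
# leaving room for ~1536 bytes of inline xattrs per object:
filestore_max_inline_xattr_size_other = 1536
# RBD-only workloads carry just 2 xattrs (_ and _snapset), so the
# default filestore_max_inline_xattrs_other = 2 is already sufficient;
# anything over either limit spills to omap (leveldb), with the
# performance overhead Somnath describes.
```

Per Somnath's explanation, the _size option is a per-attribute limit and the _xattrs option a count limit; crossing either one sends the attribute to omap.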
Re: [ceph-users] mds0: Client failing to respond to cache pressure
Thanks John. I will back the test down to the simple case of 1 client without the kernel driver and only running NFS Ganesha, and work forward till I trip the problem and report my findings. Eric On Mon, Jul 13, 2015 at 2:18 AM, John Spray john.sp...@redhat.com wrote: On 13/07/2015 04:02, Eric Eastman wrote: Hi John, I am seeing this problem with Ceph v9.0.1 with the v4.1 kernel on all nodes. This system is using 4 Ceph FS client systems. They all have the kernel driver version of CephFS loaded, but none are mounting the file system. All 4 clients are using the libcephfs VFS interface to Ganesha NFS (V2.2.0-2) and Samba (Version 4.3.0pre1-GIT-0791bb0) to share out the Ceph file system. # ceph -s cluster 6d8aae1e-1125-11e5-a708-001b78e265be health HEALTH_WARN 4 near full osd(s) mds0: Client ede-c2-gw01 failing to respond to cache pressure mds0: Client ede-c2-gw02:cephfs failing to respond to cache pressure mds0: Client ede-c2-gw03:cephfs failing to respond to cache pressure monmap e1: 3 mons at {ede-c2-mon01= 10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0 } election epoch 8, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03 mdsmap e912: 1/1/1 up {0=ede-c2-mds03=up:active}, 2 up:standby osdmap e272: 8 osds: 8 up, 8 in pgmap v225264: 832 pgs, 4 pools, 188 GB data, 5173 kobjects 212 GB used, 48715 MB / 263 GB avail 832 active+clean client io 1379 kB/s rd, 20653 B/s wr, 98 op/s It would help if we knew whether it's the kernel clients or the userspace clients that are generating the warnings here. You've probably already done this, but I'd get rid of any unused kernel client mounts to simplify the situation. We haven't tested the cache limit enforcement with NFS Ganesha, so there is a decent chance that it is broken. The ganesha FSAL is doing ll_get/ll_put reference counting on inodes, so it seems quite possible that its cache is pinning things that we would otherwise be evicting in response to cache pressure. 
You mention Samba as well. You can see if the MDS cache is indeed exceeding its limit by looking at the output of: ceph daemon mds.<daemon id> perf dump mds ... where the inodes value tells you how many are in the cache, vs. inode_max. If you can, it would be useful to boil this down to a straightforward test case: if you start with a healthy cluster, mount a single ganesha client, and do your 5 million file procedure, do you get the warning? Same for samba/kernel mounts -- this is likely to be a client side issue, so we need to confirm which client is misbehaving. Cheers, John # cat /proc/version Linux version 4.1.0-040100-generic (kernel@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201506220235 SMP Mon Jun 22 06:36:19 UTC 2015 # ceph -v ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64) The systems are all running Ubuntu Trusty upgraded to the 4.1 kernel. These are all physical machines, no VMs. The test run that caused the problem was creating and verifying 5 million small files. We have some tools that flag when Ceph is in a WARN state, so it would be nice to get rid of this warning. Please let me know what additional information you need. Thanks, Eric On Fri, Jul 10, 2015 at 4:19 AM, 谷枫 feiche...@gmail.com wrote: Thank you John, All my servers are ubuntu14.04 with the 3.16 kernel. Not all of the clients show this problem; the cluster seems to be functioning well now. As you say, I will change the mds_cache_size to 50 from 10 to run a test, thanks again! 2015-07-10 17:00 GMT+08:00 John Spray john.sp...@redhat.com: This is usually caused by use of older kernel clients. I don't remember exactly what version it was fixed in, but iirc we've seen the problem with 3.14 and seen it go away with 3.18. If your system is otherwise functioning well, this is not a critical error -- it just means that the MDS might not be able to fully control its memory usage (i.e. it can exceed mds_cache_size). 
John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
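John's check can be scripted. The sketch below compares inodes against inode_max from a perf dump; the JSON sample here is invented for illustration (in practice substitute the output of ceph daemon mds.<daemon id> perf dump mds for the here-string), but the two counter names are the ones John cites.

```shell
# Invented sample of the "mds" section of `ceph daemon mds.<id> perf dump mds`.
perf='{"mds":{"inodes":123450,"inode_max":100000}}'

# Pull the two counters out with sed (jq would also work if available).
inodes=$(printf '%s\n' "$perf" | sed -n 's/.*"inodes":\([0-9]*\).*/\1/p')
inode_max=$(printf '%s\n' "$perf" | sed -n 's/.*"inode_max":\([0-9]*\).*/\1/p')

if [ "$inodes" -gt "$inode_max" ]; then
  status="cache over limit: $inodes > $inode_max"
else
  status="cache within limit: $inodes <= $inode_max"
fi
echo "$status"
```

If inodes persistently stays above inode_max while the warnings are shown, the clients really are pinning entries, which fits John's theory about the ganesha FSAL's ll_get/ll_put reference counting.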
[ceph-users] ceph packages for openSUSE 13.2, Factory, Tumbleweed
This is to announce that ceph has been packaged for openSUSE 13.2, openSUSE Factory, and openSUSE Tumbleweed. It is building in the OpenSUSE Build Service (OBS), filesystems:ceph project, from the development branch of what will become SUSE Enterprise Storage 2. https://build.opensuse.org/package/show/filesystems:ceph/ceph If you have the time and inclination to test the OBS ceph packages on openSUSE 13.2, Factory, and/or Tumbleweed, I will be interested to hear from you. The same applies if you need help downloading/installing the packages. Thanks and regards. -- Nathan Cutler Software Engineer Distributed Storage SUSE LINUX, s.r.o. Tel.: +420 284 084 037 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ruby bindings for Librados
On 07/13/2015 09:43 PM, Corin Langosch wrote: Hi Wido, I'm the dev of https://github.com/netskin/ceph-ruby and still use it in production on some systems. It has everything I need, so I didn't develop it any further. If you find any bugs or need new features, just open an issue and I'm happy to have a look. Ah, that's great! We should look into making a Ruby binding official and moving it to Ceph's Github project. That would make it clearer for end-users. I see that RADOS namespaces are currently not implemented in the Ruby bindings. Not many bindings have them though. Might be worth looking at. I'll give the current bindings a try btw! Best Corin On 13.07.2015 at 21:24, Wido den Hollander wrote: Hi, I have a Ruby application which currently talks S3, but I want to have the application talk native RADOS. Now looking online I found various Ruby bindings for librados, but none of them seem official. What I found: * desperados: https://github.com/johnl/desperados * ceph-ruby: https://github.com/netskin/ceph-ruby The last commit for desperados was in March 2013 and ceph-ruby in April 2015. Anybody out there using Ruby bindings? If so, which one and what are the experiences? -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS kernel client reboots on write
On 2015-07-13 12:01, Gregory Farnum wrote: On Mon, Jul 13, 2015 at 9:49 AM, Ilya Dryomov idryo...@gmail.com wrote: On Fri, Jul 10, 2015 at 9:36 PM, Jan Pekař jan.pe...@imatic.cz wrote: Hi all, I think I found a bug in the cephfs kernel client. When I create a directory in cephfs and set its layout to ceph.dir.layout=stripe_unit=1073741824 stripe_count=1 object_size=1073741824 pool=somepool attempts to write a larger file will cause a kernel hang or reboot. When I'm using the cephfs client based on fuse, it works (but now I have some issues with fuse and concurrent writes too, but it is not this kind of problem). Which kernel are you running? What do you see in the dmesg when it hangs? What is the panic splat when it crashes? How big is the larger file that you are trying to write? I'm running the 4.0.3 kernel but it was the same with older ones. The computer hangs, so I cannot display dmesg. I will try to catch it with remote syslog. The larger file is about 500MB. Last time 300MB was ok. I think object_size and stripe_unit 1073741824 is the max value, or can I set it higher? The default values stripe_unit=4194304 stripe_count=1 object_size=4194304 work without problem on write. My goal was not to split the file between osd's every 4MB of its size but to save it in one piece. This is generally not a very good idea - you have to consider the distribution of objects across PGs and how your OSDs will be utilized. Yeah. Beyond that, the OSDs will reject writes exceeding a certain size (90MB by default). I'm not sure exactly what mismatch you're running into here but I can think of several different ways a 1GB write/single object could get stuck; it's just not a good idea. -Greg I'm using it this way from the beginning and with FUSE I had no problem with big files. Objects in my OSD are often 1GB with no problem. -- Ing. 
Jan Pekař jan.pe...@imatic.cz | +420603811737 Imatic | Jagellonská 14 | Praha 3 | 130 00 http://www.imatic.cz -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
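For anyone reproducing this: directory layouts like Jan's are set through virtual xattrs on a CephFS mount. A sketch with placeholder paths and values (keeping in mind Greg's warning that 1 GB objects exceed the default OSD write limits and distribute poorly across PGs):

```
# Inspect the current layout of a directory:
getfattr -n ceph.dir.layout /mnt/cephfs/<dir>

# Set a (more reasonable) layout before creating files in it;
# these are the default 4 MB values quoted in the thread:
setfattr -n ceph.dir.layout.object_size  -v 4194304 /mnt/cephfs/<dir>
setfattr -n ceph.dir.layout.stripe_unit  -v 4194304 /mnt/cephfs/<dir>
setfattr -n ceph.dir.layout.stripe_count -v 1       /mnt/cephfs/<dir>
setfattr -n ceph.dir.layout.pool         -v <pool>  /mnt/cephfs/<dir>
```

Layouts only affect files created after the xattr is set; existing files keep the layout they were written with, which is why Jan's existing 1 GB objects remain on disk either way.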
[ceph-users] Issue with journal on another drive
Hi all, I am trying to set up a three-node ceph cluster. Each node is running RHEL 7.1 and has three 1TB HDD drives for OSDs (sdb, sdc, sdd) and an SSD partition (/dev/sda6) for the journal. I zapped the HDDs and used the following to create OSDs: # ceph-deploy --overwrite-conf osd create node:/dev/sdb:/dev/sda6 # ceph-deploy --overwrite-conf osd create node:/dev/sdc:/dev/sda6 # ceph-deploy --overwrite-conf osd create node:/dev/sdd:/dev/sda6 Didn't get any errors but some of the OSDs are not coming up on the nodes:

# ceph osd tree
# id  weight  type name     up/down  reweight
-1    8.19    root default
-2    2.73    host osd-01
3     0.91    osd.3         up       1
0     0.91    osd.0         up       1
1     0.91    osd.1         down     0
-3    2.73    host osd-02
4     0.91    osd.4         up       1
2     0.91    osd.2         down     0
7     0.91    osd.7         down     0
-4    2.73    host osd-03
8     0.91    osd.8         up       1
5     0.91    osd.5         down     0
6     0.91    osd.6         up       1

Cluster is not doing well:

# ceph -s
cluster a1a1fa57-d9eb-4eb1-b0de-7729ce7eb10c
health HEALTH_WARN 1724 pgs degraded; 96 pgs incomplete; 2 pgs stale; 96 pgs stuck inactive; 2 pgs stuck stale; 2666 pgs stuck unclean; recovery 4/24 objects degraded (16.667%)
monmap e1: 3 mons at {cntrl-01=10.10.103.21:6789/0,cntrl-02=10.10.103.22:6789/0,cntrl-03=10.10.103.23:6789/0}, election epoch 18, quorum 0,1,2 cntrl-01,cntrl-02,cntrl-03
osdmap e345: 9 osds: 5 up, 5 in
pgmap v16755: 4096 pgs, 2 pools, 12976 kB data, 8 objects
385 MB used, 4654 GB / 4655 GB avail
4/24 objects degraded (16.667%)
46 active
627 active+degraded+remapped
1430 active+clean
52 incomplete
1097 active+degraded
798 active+remapped
2 stale+active
44 remapped+incomplete

I see the following in the logs for the failed OSDs:

2015-07-13 13:58:39.562223 7fafeb12d7c0 0 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7), process ceph-osd, pid 4906
2015-07-13 13:58:39.592437 7fafeb12d7c0 0 filestore(/var/lib/ceph/osd/ceph-7) mount detected xfs (libxfs)
2015-07-13 13:58:39.592447 7fafeb12d7c0 1 filestore(/var/lib/ceph/osd/ceph-7) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-07-13 13:58:39.635624 7fafeb12d7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is supported and appears to work
2015-07-13 13:58:39.635633 7fafeb12d7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-07-13 13:58:39.643786 7fafeb12d7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-07-13 13:58:39.643838 7fafeb12d7c0 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-7) detect_feature: extsize is disabled by conf
2015-07-13 13:58:39.792118 7fafeb12d7c0 0 filestore(/var/lib/ceph/osd/ceph-7) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-07-13 13:58:40.064871 7fafeb12d7c0 1 journal _open /var/lib/ceph/osd/ceph-7/journal fd 20: 131080388608 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-07-13 13:58:40.064897 7fafeb12d7c0 -1 journal FileJournal::open: ondisk fsid 60436b03-ece2-4709-a847-cf46ae9d7481 doesn't match expected 1d4e4290-0e91-4f53-a477-bfc09990ef72, invalid (someone else's?) journal
2015-07-13 13:58:40.064928 7fafeb12d7c0 -1 filestore(/var/lib/ceph/osd/ceph-7) mount failed to open journal /var/lib/ceph/osd/ceph-7/journal: (22) Invalid argument
2015-07-13 13:58:40.073118 7fafeb12d7c0 -1 ** ERROR: error converting store /var/lib/ceph/osd/ceph-7: (22) Invalid argument

Is there something that needed to be done to the journal partition to enable sharing between multiple OSDs? Or is there something else that's causing the issue? Thanks. -- Rimma ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] slow requests going up and down
I have a cluster where over the weekend something happened and successive calls to ceph health detail show things like below. What does it mean when the number of blocked requests goes up and down like this? Some clients are still running successfully. -- Tom Deneau, AMD HEALTH_WARN 20 requests are blocked > 32 sec; 2 osds have slow requests 20 ops are blocked > 536871 sec 2 ops are blocked > 536871 sec on osd.5 18 ops are blocked > 536871 sec on osd.7 2 osds have slow requests HEALTH_WARN 4 requests are blocked > 32 sec; 2 osds have slow requests 4 ops are blocked > 536871 sec 2 ops are blocked > 536871 sec on osd.5 2 ops are blocked > 536871 sec on osd.7 2 osds have slow requests HEALTH_WARN 27 requests are blocked > 32 sec; 2 osds have slow requests 27 ops are blocked > 536871 sec 2 ops are blocked > 536871 sec on osd.5 25 ops are blocked > 536871 sec on osd.7 2 osds have slow requests HEALTH_WARN 34 requests are blocked > 32 sec; 2 osds have slow requests 34 ops are blocked > 536871 sec 9 ops are blocked > 536871 sec on osd.5 25 ops are blocked > 536871 sec on osd.7 2 osds have slow requests ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph daemons stucked in FUTEX_WAIT syscall
Hi, I'm running a small CephFS cluster (21 TB, 16 OSDs having different sizes between 400G and 3.5 TB) that is used as a file warehouse (both small and big files). Every day there are times when a lot of processes running on the client servers (using either fuse or the kernel client) become stuck in D state, and when I run a strace on them I see them waiting in the FUTEX_WAIT syscall. The same issue I'm able to see on all OSD daemons. The ceph version I'm running is Firefly 0.80.10, both on clients and on server daemons. I use ext4 as the osd filesystem. Operating system on servers: Ubuntu 14.04 with kernel 3.13. Operating system on clients: Ubuntu 12.04 LTS with the HWE option, kernel 3.13. The osd daemons are using RAID5 virtual disks (6 x 300 GB 10K RPM disks on a Dell PERC H700 RAID controller with 512MB BBU using write-back mode). The servers which the ceph daemons are running on are also hosting KVM VMs (OpenStack Nova). Because of this unfortunate setup the performance is really bad, but at least I shouldn't see as many locking issues (or should I?). The only thing which temporarily improves the performance is restarting every osd. After such a restart I see some processes on client machines resume I/O, but only for a couple of hours; then the whole process must be repeated. I cannot afford to run a setup without RAID because there isn't enough RAM left for a couple of osd daemons. The ceph.conf settings I use: auth cluster required = cephx auth service required = cephx auth client required = cephx filestore xattr use omap = true osd pool default size = 2 osd pool default min size = 1 osd pool default pg num = 128 osd pool default pgp num = 128 public network = 10.71.13.0/24 cluster network = 10.71.12.0/24 Did someone else experience this kind of behaviour (stuck processes in the FUTEX_WAIT syscall) when running the firefly release on Ubuntu 14.04? Thanks, Simion Rad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue with journal on another drive
On 07/14/15 00:08, Rimma Iontel wrote: Hi all, [...] Is there something that needed to be done to the journal partition to enable sharing between multiple OSDs? Or is there something else that's causing the issue? IIRC you can't share a volume between multiple OSDs. What you could do if splitting this partition isn't possible is create an LVM volume group with it as a single physical volume (change the type of the partition to lvm, pvcreate /dev/sda6, vgcreate journal_vg /dev/sda6). Then you can create a logical volume in it for each of your OSDs (lvcreate -n osdn_journal -L one_third_of_available_space journal_vg) and use them (/dev/journal_vg/osdn_journal) in your configuration. Lionel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
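Lionel's "one third of available space" can be computed and turned into concrete commands. The sketch below only prints the lvcreate commands rather than running them (so it is safe to execute anywhere); the 60 GiB journal-partition size and the volume/OSD names are assumptions for illustration, and the pvcreate/vgcreate steps from the mail would run first.

```shell
# Assumed size of the shared journal partition (/dev/sda6): 60 GiB, in KiB.
journal_kb=$((60 * 1024 * 1024))
osds=3
# Integer division gives each OSD an equal share (rounding down also
# leaves a little slack for LVM metadata).
lv_kb=$((journal_kb / osds))

# Print one lvcreate per OSD journal, as in Lionel's mail:
cmds=$(for i in 0 1 2; do
  echo "lvcreate -n osd${i}_journal -L ${lv_kb}k journal_vg"
done)
printf '%s\n' "$cmds"
```

The resulting /dev/journal_vg/osdN_journal devices would then be passed as the journal argument to ceph-deploy in place of the raw /dev/sda6, which is what caused the "ondisk fsid doesn't match" errors in the original mail.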
Re: [ceph-users] Configuring Ceph without DNS
On 13 Jul 2015, at 4:58 pm, Abhishek Varshney abhishekvrs...@gmail.com wrote: I have a requirement wherein I wish to setup Ceph where hostname resolution is not supported and I just have IP addresses to work with. Is there a way through which I can achieve this in Ceph? If yes, what are the caveats associated with that approach? We’ve been operating our Dumpling (now Firefly) cluster this way since it was put into production over 18-months ago, using host files to define all the Ceph hosts, works perfectly well. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Configuring Ceph without DNS
Hi, I have a requirement wherein I wish to setup Ceph where hostname resolution is not supported and I just have IP addresses to work with. Is there a way through which I can achieve this in Ceph? If yes, what are the caveats associated with that approach? PS: I am using ceph-deploy for deployment. Thanks Abhishek ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Configuring Ceph without DNS
Hi, Could you try to use host files instead of DNS? Defining all CEPH hosts in /etc/hosts with their IPs should solve the problem. Thanks, Peter Calum From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Abhishek Varshney Sent: 13 July 2015 08:59 To: ceph-users@lists.ceph.com Subject: [ceph-users] Configuring Ceph without DNS Hi, I have a requirement wherein I wish to setup Ceph where hostname resolution is not supported and I just have IP addresses to work with. Is there a way through which I can achieve this in Ceph? If yes, what are the caveats associated with that approach? PS: I am using ceph-deploy for deployment. Thanks Abhishek ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Configuring Ceph without DNS
Hi Peter and Nigel, I have tried /etc/hosts and it works perfectly fine! But I am looking for an alternative (if any) to do away completely with hostnames and just use IP addresses instead. Thanks Abhishek On 13 July 2015 at 12:40, Nigel Williams nigel.d.willi...@gmail.com wrote: On 13 Jul 2015, at 4:58 pm, Abhishek Varshney abhishekvrs...@gmail.com wrote: I have a requirement wherein I wish to setup Ceph where hostname resolution is not supported and I just have IP addresses to work with. Is there a way through which I can achieve this in Ceph? If yes, what are the caveats associated with that approach? We’ve been operating our Dumpling (now Firefly) cluster this way since it was put into production over 18 months ago, using host files to define all the Ceph hosts; works perfectly well. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs without admin key
Yes: clients need an MDS key that says allow, and an OSD key that permits it access to the RADOS pool you're using as your CephFS data pool. If you're already trying that and getting an error, please post the caps you're using. Thanks, John On 12/07/2015 14:12, Bernhard Duebi wrote: Hi, I'm new to ceph. I setup a small cluster and successfully connected kvm/qemu to use block devices. Now I'm experimenting with CephFS. I use ceph-fuse on SLES12 (ceph 0.94). I can mount the file-system and write to it, but only when the admin keyring is present, which gives the FS client full admin privileges. For kvm/qemu I can limit the privileges by creating key with limited privileges. I was googling if the same is possible for CephFS. I found some answers but none of them work because I always get permission denied. Any hints how the key should look like? Thanks Bernhard ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Configuring Ceph without DNS
On 13-07-15 09:13, Abhishek Varshney wrote: Hi Peter and Nigel, I have tried /etc/hosts and it works perfectly fine! But I am looking for an alternative (if any) to do away completely with hostnames and just use IP addresses instead. It's just that ceph-deploy wants DNS, but if you go for manual bootstrapping there is no requirement for DNS at all. Ceph internally doesn't do anything with DNS; it has the monitor addresses hardcoded in the monmap, and that is leading for the cluster. Wido
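To illustrate Wido's point about manual bootstrapping with IPs only -- this is a sketch, with example addresses and an fsid borrowed from elsewhere in this digest, not a complete tested procedure -- the monitors are listed by IP in ceph.conf and the initial monmap is built from those same addresses:

```shell
# Example ceph.conf fragment: monitors identified purely by IP.
# No hostname resolution is needed once the monmap holds IPs.
cat > /etc/ceph/ceph.conf <<'EOF'
[global]
fsid = d54a2216-b522-4744-a7cc-a2106e1281b6
mon initial members = a, b, c
mon host = 172.16.70.158, 172.16.70.159, 172.16.70.160
EOF

# During manual bootstrap, build the initial monmap from the same IPs
monmaptool --create \
    --add a 172.16.70.158:6789 \
    --add b 172.16.70.159:6789 \
    --add c 172.16.70.160:6789 \
    --fsid d54a2216-b522-4744-a7cc-a2106e1281b6 \
    /tmp/monmap
```

OSDs and clients then find the monitors through `mon host`, so nothing in the running cluster ever consults DNS.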
Re: [ceph-users] mds0: Client failing to respond to cache pressure
On 13/07/2015 04:02, Eric Eastman wrote: Hi John, I am seeing this problem with Ceph v9.0.1 with the v4.1 kernel on all nodes. This system is using 4 CephFS client systems. They all have the kernel driver version of CephFS loaded, but none are mounting the file system. All 4 clients are using the libcephfs VFS interface to Ganesha NFS (V2.2.0-2) and Samba (Version 4.3.0pre1-GIT-0791bb0) to share out the Ceph file system. # ceph -s cluster 6d8aae1e-1125-11e5-a708-001b78e265be health HEALTH_WARN 4 near full osd(s) mds0: Client ede-c2-gw01 failing to respond to cache pressure mds0: Client ede-c2-gw02:cephfs failing to respond to cache pressure mds0: Client ede-c2-gw03:cephfs failing to respond to cache pressure monmap e1: 3 mons at {ede-c2-mon01=10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0} election epoch 8, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03 mdsmap e912: 1/1/1 up {0=ede-c2-mds03=up:active}, 2 up:standby osdmap e272: 8 osds: 8 up, 8 in pgmap v225264: 832 pgs, 4 pools, 188 GB data, 5173 kobjects 212 GB used, 48715 MB / 263 GB avail 832 active+clean client io 1379 kB/s rd, 20653 B/s wr, 98 op/s It would help if we knew whether it's the kernel clients or the userspace clients that are generating the warnings here. You've probably already done this, but I'd get rid of any unused kernel client mounts to simplify the situation. We haven't tested the cache limit enforcement with NFS Ganesha, so there is a decent chance that it is broken. The Ganesha FSAL is doing ll_get/ll_put reference counting on inodes, so it seems quite possible that its cache is pinning things that we would otherwise be evicting in response to cache pressure. You mention Samba as well. You can see if the MDS cache is indeed exceeding its limit by looking at the output of: ceph daemon mds.<daemon id> perf dump mds ...where the inodes value tells you how many are in the cache, vs. inode_max. 
If you can, it would be useful to boil this down to a straightforward test case: if you start with a healthy cluster, mount a single Ganesha client, and do your 5 million file procedure, do you get the warning? Same for Samba/kernel mounts -- this is likely to be a client-side issue, so we need to confirm which client is misbehaving. Cheers, John # cat /proc/version Linux version 4.1.0-040100-generic (kernel@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201506220235 SMP Mon Jun 22 06:36:19 UTC 2015 # ceph -v ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64) The systems are all running Ubuntu Trusty upgraded to the 4.1 kernel. These are all physical machines, no VMs. The test run that caused the problem was creating and verifying 5 million small files. We have some tools that flag when Ceph is in a WARN state, so it would be nice to get rid of this warning. Please let me know what additional information you need. Thanks, Eric On Fri, Jul 10, 2015 at 4:19 AM, 谷枫 feiche...@gmail.com wrote: Thank you John. All my servers are Ubuntu 14.04 with the 3.16 kernel. Not all clients show this problem, and the cluster seems to be functioning well now. As you say, I will change the mds_cache_size to 50 from 10 to take a test. Thanks again! 2015-07-10 17:00 GMT+08:00 John Spray john.sp...@redhat.com: This is usually caused by use of older kernel clients. I don't remember exactly what version it was fixed in, but iirc we've seen the problem with 3.14 and seen it go away with 3.18. If your system is otherwise functioning well, this is not a critical error -- it just means that the MDS might not be able to fully control its memory usage (i.e. it can exceed mds_cache_size). John
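On a live cluster the check John suggests is `ceph daemon mds.<daemon id> perf dump mds`, run on the MDS host against its admin socket. The sketch below extracts and compares the two counters from a captured sample of that JSON; the sample values are made up for illustration, and the real output nests them under the "mds" section:

```shell
# On the MDS host you would capture the real counters with (daemon name from
# this thread used as an example):
#   ceph daemon mds.ede-c2-mds03 perf dump mds
# Here we filter a hard-coded sample of that JSON to show the comparison.
sample='{"mds": {"inodes": 123456, "inode_max": 100000}}'

inodes=$(printf '%s' "$sample" | sed -n 's/.*"inodes": \([0-9]*\).*/\1/p')
inode_max=$(printf '%s' "$sample" | sed -n 's/.*"inode_max": \([0-9]*\).*/\1/p')

if [ "$inodes" -gt "$inode_max" ]; then
  echo "MDS cache over limit: $inodes inodes > inode_max $inode_max"
else
  echo "MDS cache within limit: $inodes inodes <= inode_max $inode_max"
fi
```

If inodes stays well above inode_max while the "failing to respond to cache pressure" warnings persist, that supports the theory that a client (e.g. the Ganesha FSAL's pinned references) is preventing the MDS from trimming its cache.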