Re: [ceph-users] 0.87 rados df fault
Thanks for your answer Greg.

Unfortunately, the three monitors were working perfectly for at least 30
minutes after the upgrade. I don't know their memory usage at the time.

What I did was: upgrade mons, upgrade osds, upgrade mds (single mds),
upgrade fuse clients. I checked that everything was OK (health OK and
data available). Then I started an rsync of around 7TB of data, mostly
files between 100KB and 10MB, with 6TB of data already in CephFS.

Currently the memory usage of my mons is around 110MB (on 1GB of memory
and 1GB of swap). I'll keep an eye on this.

On another matter (maybe I should start another thread), sometimes I have:

  health HEALTH_WARN mds0: Client wimi-recette-files-nginx:recette-files-rw
  failing to respond to cache pressure; mds0: Client
  wimi-prod-backupmanager:files-rw failing to respond to cache pressure

And two minutes later:

  health HEALTH_OK

CephFS fuse clients only. But everything is working well, so I'm not too
worried.

Regards,
--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information

On lun., 2014-11-03 at 09:57 -0800, Gregory Farnum wrote:
> On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand wrote:
> > Update:
> >
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084] [21787] 0 21780 492110 185044 920 240143 0 ceph-mon
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115] [13136] 0 1313652172 1753 590 0 ceph
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262] Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB, file-rss:0kB
> >
> > OOM kill.
> > I have 1GB memory on my mons, and 1GB swap.
> > It's the only mon that crashed. Is there a change in memory requirement
> > from Firefly?
>
> There generally shouldn't be, but I don't think it's something we
> monitored closely.
> More likely your monitor was running near its memory limit already and
> restarting all the OSDs (and servicing the resulting changes) pushed
> it over the edge.
> -Greg
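A minimal way to keep that eye on mon memory over time is a plain shell
loop around ps; the log path below is only an example:

root@a-mon:~# while true; do echo "$(date '+%F %T') $(ps -o rss= -C ceph-mon)" >> /var/log/ceph-mon-rss.log; sleep 60; done &

ps -o rss= prints the resident set size in KB, so a climb toward the 1GB of
physical memory on these mons would show up well before the OOM killer fires.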
Re: [ceph-users] 0.87 rados df fault
On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand wrote:
> Update:
>
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084] [21787] 0 21780 492110 185044 920 240143 0 ceph-mon
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115] [13136] 0 1313652172 1753 590 0 ceph
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262] Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB, file-rss:0kB
>
> OOM kill.
> I have 1GB memory on my mons, and 1GB swap.
> It's the only mon that crashed. Is there a change in memory requirement
> from Firefly?

There generally shouldn't be, but I don't think it's something we
monitored closely.
More likely your monitor was running near its memory limit already and
restarting all the OSDs (and servicing the resulting changes) pushed
it over the edge.
-Greg
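A quick way to see how much headroom a monitor has while the OSDs are being
restarted, assuming watch and pidof are available on the mon host:

root@a-mon:~# free -m
root@a-mon:~# watch -n 5 'ps -o pid,rss,vsz,cmd -p $(pidof ceph-mon)'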
Re: [ceph-users] 0.87 rados df fault
Update:

/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084] [21787] 0 21780 492110 185044 920 240143 0 ceph-mon
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115] [13136] 0 1313652172 1753 590 0 ceph
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262] Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB, file-rss:0kB

OOM kill.
I have 1GB memory on my mons, and 1GB swap.
It's the only mon that crashed. Is there a change in memory requirement
from Firefly?

Regards,
--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information

On lun., 2014-11-03 at 11:47 +0100, Thomas Lemarchand wrote:
> Update: this error is linked to a crashed mon. It crashed during the
> weekend. I'm trying to understand why. I never had a mon crash before Giant.
>
> On lun., 2014-11-03 at 11:08 +0100, Thomas Lemarchand wrote:
> > Hello all,
> >
> > I upgraded my cluster to Giant. Everything is working well, but on one
> > mon I get a strange error when I do "rados df":
> >
> > root@a-mon:~# rados df
> > 2014-11-03 10:57:15.313618 7ff2434f0700 0 -- :/1009400 >> 10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0xe37b20).fault
> > pool name           category            KB   objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> > data                -                    0   8805791       0         0        0   434686      434686   9053362           0
> > metadata            -               639915     17680       0         0        0  1852535  1746370585  15900570   178050318
> > wimi-files          -           8893618079   9983397       0         0        0   296284     2747513  18874311  8951883370
> > wimi-recette-files  -               978453    235134       0         0        0   272389     1321262    498429     1042175
> >   total used       27056765864   19076090
> >   total avail      78381176704
> >   total space     105437942568
> >
> > root@a-mon:~# ceph -v
> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >
> > In the same cluster, on another mon, no problem:
> >
> > root@c-mon:/etc/ceph# rados df
> > pool name           category            KB   objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> > data                -                    0   8805634       0         0        0   434686      434686   9053205           0
> > metadata            -               636265     17680       0         0        0  1852535  1746370585  15900450   178049886
> > wimi-files          -           8893618079   9983397       0         0        0   296284     2747513  18874311  8951883370
> > wimi-recette-files  -               978449    235100       0         0        0   272352     1321225    498232     1042138
> >   total used       27056761472   19075899
> >   total avail      78381181096
> >   total space     105437942568
> >
> > root@c-mon:/etc/ceph# ceph -v
> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >
> > Is it a known error?
> > I can file a formal bug report if needed. This problem is not important,
> > but I fear implications outside of "rados df".
> >
> > Regards,
> > --
> > Thomas Lemarchand
> > Cloud Solutions SAS - Responsable des systèmes d'information
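The OOM history and current memory headroom on a mon host can be checked
with standard tools; the log path matches the kern.log quoted above:

root@c-mon:~# grep -i "out of memory" /var/log/kern.log*
root@c-mon:~# free -m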
Re: [ceph-users] 0.87 rados df fault
Update: this error is linked to a crashed mon. It crashed during the
weekend. I'm trying to understand why. I never had a mon crash before Giant.

--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information

On lun., 2014-11-03 at 11:08 +0100, Thomas Lemarchand wrote:
> Hello all,
>
> I upgraded my cluster to Giant. Everything is working well, but on one
> mon I get a strange error when I do "rados df":
>
> root@a-mon:~# rados df
> 2014-11-03 10:57:15.313618 7ff2434f0700 0 -- :/1009400 >> 10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0xe37b20).fault
> pool name           category            KB   objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> data                -                    0   8805791       0         0        0   434686      434686   9053362           0
> metadata            -               639915     17680       0         0        0  1852535  1746370585  15900570   178050318
> wimi-files          -           8893618079   9983397       0         0        0   296284     2747513  18874311  8951883370
> wimi-recette-files  -               978453    235134       0         0        0   272389     1321262    498429     1042175
>   total used       27056765864   19076090
>   total avail      78381176704
>   total space     105437942568
>
> root@a-mon:~# ceph -v
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>
> In the same cluster, on another mon, no problem:
>
> root@c-mon:/etc/ceph# rados df
> pool name           category            KB   objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> data                -                    0   8805634       0         0        0   434686      434686   9053205           0
> metadata            -               636265     17680       0         0        0  1852535  1746370585  15900450   178049886
> wimi-files          -           8893618079   9983397       0         0        0   296284     2747513  18874311  8951883370
> wimi-recette-files  -               978449    235100       0         0        0   272352     1321225    498232     1042138
>   total used       27056761472   19075899
>   total avail      78381181096
>   total space     105437942568
>
> root@c-mon:/etc/ceph# ceph -v
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
>
> Is it a known error?
> I can file a formal bug report if needed. This problem is not important,
> but I fear implications outside of "rados df".
>
> Regards,
> --
> Thomas Lemarchand
> Cloud Solutions SAS - Responsable des systèmes d'information
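To confirm which mon dropped out of quorum and to bring it back, something
along these lines should work; the mon id "b" and the sysvinit service call
are assumptions for a Giant-era Debian-style deployment:

root@a-mon:~# ceph quorum_status
root@a-mon:~# ceph mon stat
root@b-mon:~# service ceph start mon.b     # run on the crashed mon's host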
[ceph-users] 0.87 rados df fault
Hello all,

I upgraded my cluster to Giant. Everything is working well, but on one
mon I get a strange error when I do "rados df":

root@a-mon:~# rados df
2014-11-03 10:57:15.313618 7ff2434f0700 0 -- :/1009400 >> 10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0xe37b20).fault
pool name           category            KB   objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
data                -                    0   8805791       0         0        0   434686      434686   9053362           0
metadata            -               639915     17680       0         0        0  1852535  1746370585  15900570   178050318
wimi-files          -           8893618079   9983397       0         0        0   296284     2747513  18874311  8951883370
wimi-recette-files  -               978453    235134       0         0        0   272389     1321262    498429     1042175
  total used       27056765864   19076090
  total avail      78381176704
  total space     105437942568

root@a-mon:~# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

In the same cluster, on another mon, no problem:

root@c-mon:/etc/ceph# rados df
pool name           category            KB   objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
data                -                    0   8805634       0         0        0   434686      434686   9053205           0
metadata            -               636265     17680       0         0        0  1852535  1746370585  15900450   178049886
wimi-files          -           8893618079   9983397       0         0        0   296284     2747513  18874311  8951883370
wimi-recette-files  -               978449    235100       0         0        0   272352     1321225    498232     1042138
  total used       27056761472   19075899
  total avail      78381181096
  total space     105437942568

root@c-mon:/etc/ceph# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Is it a known error?
I can file a formal bug report if needed. This problem is not important,
but I fear implications outside of "rados df".

Regards,
--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information
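For context: the "pipe(...).fault" line at the top of the a-mon output
means the client tried the monitor at 10.94.67.202:6789, failed to
establish a session, and fell back to another mon, so the df output itself
came from a different monitor. Each mon can be tested individually with the
-m switch; the address below is the one from the fault line, to be repeated
with the other mons' addresses:

root@a-mon:~# ceph -m 10.94.67.202:6789 mon_status    # hangs or errors if that mon is down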