Re: [ceph-users] 0.87 rados df fault

2014-11-04 Thread Thomas Lemarchand
Thanks for your answer, Greg.

Unfortunately, the three monitors were working perfectly for at least 30
minutes after the upgrade.

I don't know their memory usage at the time.
What I did was: upgrade mons, upgrade osds, upgrade mds (single mds),
upgrade fuse clients. I checked that everything was OK (health OK and
data available). Then I started an rsync of around 7TB of data, mostly
files between 100KB and 10MB, with 6TB of data already in CephFS.
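
For example, checks along these lines (a sketch using the standard ceph CLI from an admin node; the exact output of course depends on the cluster) are enough to confirm each daemon type is back and healthy after its upgrade step:

ceph health detail   # overall health, with the reason for any WARN/ERR
ceph -s              # mon quorum, OSD up/in counts, PG states, MDS state
ceph osd stat        # e.g. "N osds: N up, N in"
ceph mds stat        # the single MDS should report up:active
ceph -v              # run on each node to confirm the 0.87 binaries are installed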

Currently, the memory usage of my mons is around 110MB (out of 1GB of
memory and 1GB of swap).

I'll keep an eye on this.
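
A minimal way to do that (a sketch; any process monitor does the job equally well) is to watch the ceph-mon resident set size and the node's overall headroom:

watch -n 60 'ps -C ceph-mon -o pid,rss,vsz,cmd'   # RSS and VSZ in kB, refreshed every minute
free -m                                           # remaining memory and swap on the mon node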

On another matter (maybe I should start another thread), sometimes I
have: health HEALTH_WARN mds0: Client
wimi-recette-files-nginx:recette-files-rw failing to respond to cache
pressure; mds0: Client wimi-prod-backupmanager:files-rw failing to
respond to cache pressure

And two minutes later:
health HEALTH_OK

CephFS FUSE clients only. But everything is working well, so I'm not too
worried.
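
When the warning does show up, one thing worth looking at (a sketch; it assumes admin-socket access on the MDS host, uses "mds.a" purely as a placeholder daemon name, and the available asok commands may vary by version) is what each client session is holding:

ceph daemon mds.a session ls   # per-client session info, including how many caps each client holds
ceph daemon mds.a perf dump    # MDS-side counters (inodes, caps, request rates)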

Regards,

-- 
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager



On Mon, 2014-11-03 at 09:57 -0800, Gregory Farnum wrote:
> On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand
>  wrote:
> > Update:
> >
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084]
> > [21787] 0 21780   492110   185044 920   240143 0
> > ceph-mon
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115]
> > [13136] 0 13136   52172 1753  590 0
> > ceph
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out
> > of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
> > /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262]
> > Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB,
> > file-rss:0kB
> >
> > OOM kill.
> > I have 1GB of memory on my mons, and 1GB of swap.
> > It's the only mon that crashed. Is there a change in memory requirements
> > from Firefly?
> 
> There generally shouldn't be, but I don't think it's something we've
> monitored closely.
> More likely your monitor was running near its memory limit already and
> restarting all the OSDs (and servicing the resulting changes) pushed
> it over the edge.
> -Greg
> 





Re: [ceph-users] 0.87 rados df fault

2014-11-03 Thread Gregory Farnum
On Mon, Nov 3, 2014 at 4:40 AM, Thomas Lemarchand
 wrote:
> Update:
>
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084]
> [21787] 0 21780   492110   185044 920   240143 0
> ceph-mon
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115]
> [13136] 0 13136   52172 1753  590 0
> ceph
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out
> of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
> /var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262]
> Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB,
> file-rss:0kB
>
> OOM kill.
> I have 1GB of memory on my mons, and 1GB of swap.
> It's the only mon that crashed. Is there a change in memory requirements
> from Firefly?

There generally shouldn't be, but I don't think it's something we've
monitored closely.
More likely your monitor was running near its memory limit already and
restarting all the OSDs (and servicing the resulting changes) pushed
it over the edge.
-Greg


Re: [ceph-users] 0.87 rados df fault

2014-11-03 Thread Thomas Lemarchand
Update:

/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746084]
[21787] 0 21780   492110   185044 920   240143 0
ceph-mon
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746115]
[13136] 0 13136   52172 1753  590 0
ceph
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746126] Out
of memory: Kill process 21787 (ceph-mon) score 827 or sacrifice child
/var/log/kern.log.1:Oct 31 17:19:17 c-mon kernel: [17289149.746262]
Killed process 21787 (ceph-mon) total-vm:1968440kB, anon-rss:740176kB,
file-rss:0kB

OOM kill.
I have 1GB of memory on my mons, and 1GB of swap.
It's the only mon that crashed. Is there a change in memory requirements
from Firefly?
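
For reference, a sketch of how such OOM kills can be spotted on any node (assuming Debian-style log locations, as in the paths above):

grep -i "out of memory" /var/log/kern.log*    # OOM-killer invocations
grep -i "killed process" /var/log/kern.log*   # which process was sacrificed, and its memory footprint
dmesg | grep -iE "oom|killed process"         # if the event is recent enough to still be in the ring buffer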

Regards,
-- 
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager



On Mon, 2014-11-03 at 11:47 +0100, Thomas Lemarchand wrote:
> Update: this error is linked to a crashed mon. It crashed during the
> weekend. I am trying to understand why. I never had a mon crash before Giant.
> 
> -- 
> Thomas Lemarchand
> Cloud Solutions SAS - Information Systems Manager
> 
> 
> 
> On Mon, 2014-11-03 at 11:08 +0100, Thomas Lemarchand wrote:
> > Hello all,
> > 
> > I upgraded my cluster to Giant. Everything is working well, but on one
> > mon I get a strange error when I do "rados df":
> > 
> > root@a-mon:~# rados df
> > 2014-11-03 10:57:15.313618 7ff2434f0700  0 -- :/1009400 >>
> > 10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1
> > c=0xe37b20).fault
> > pool name           category          KB    objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> > data                -                  0    8805791       0         0        0   434686      434686   9053362           0
> > metadata            -              63991      51768       0         0        0  1852535  1746370585  15900570   178050318
> > wimi-files          -         8893618079    9983397       0         0        0   296284     2747513  18874311  8951883370
> > wimi-recette-files  -             978453     235134       0         0        0   272389     1321262    498429     1042175
> >   total used     27056765864    19076090
> >   total avail    78381176704
> >   total space   105437942568
> > 
> > root@a-mon:~# ceph -v
> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> > 
> > 
> > In the same cluster, on another mon, no problem:
> > 
> > root@c-mon:/etc/ceph# rados df
> > pool name           category          KB    objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> > data                -                  0    8805634       0         0        0   434686      434686   9053205           0
> > metadata            -              63626      51768       0         0        0  1852535  1746370585  15900450   178049886
> > wimi-files          -         8893618079    9983397       0         0        0   296284     2747513  18874311  8951883370
> > wimi-recette-files  -             978449     235100       0         0        0   272352     1321225    498232     1042138
> >   total used     27056761472    19075899
> >   total avail    78381181096
> >   total space   105437942568
> > 
> > root@c-mon:/etc/ceph# ceph -v
> > ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> > 
> > Is it a known error?
> > I can file a formal bug report if needed. This problem is not important,
> > but I fear implications outside of "rados df".
> > 
> > Regards,
> > -- 
> > Thomas Lemarchand
> > Cloud Solutions SAS - Information Systems Manager
> > 
> > 
> > 
> > 
> > 
> 
> 





Re: [ceph-users] 0.87 rados df fault

2014-11-03 Thread Thomas Lemarchand
Update: this error is linked to a crashed mon. It crashed during the
weekend. I am trying to understand why. I never had a mon crash before Giant.
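
A sketch of the first places to look (paths assume a default Debian-style install; the mon id "c" is just an example matching the c-mon hostname):

tail -n 200 /var/log/ceph/ceph-mon.c.log   # asserts, aborts, or a clean shutdown message before the gap?
df -h /var/lib/ceph/mon /var/log           # a full mon store or log partition is another common cause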

-- 
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager



On Mon, 2014-11-03 at 11:08 +0100, Thomas Lemarchand wrote:
> Hello all,
> 
> I upgraded my cluster to Giant. Everything is working well, but on one
> mon I get a strange error when I do "rados df":
> 
> root@a-mon:~# rados df
> 2014-11-03 10:57:15.313618 7ff2434f0700  0 -- :/1009400 >>
> 10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0xe37b20).fault
> pool name           category          KB    objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> data                -                  0    8805791       0         0        0   434686      434686   9053362           0
> metadata            -              63991      51768       0         0        0  1852535  1746370585  15900570   178050318
> wimi-files          -         8893618079    9983397       0         0        0   296284     2747513  18874311  8951883370
> wimi-recette-files  -             978453     235134       0         0        0   272389     1321262    498429     1042175
>   total used     27056765864    19076090
>   total avail    78381176704
>   total space   105437942568
> 
> root@a-mon:~# ceph -v
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> 
> 
> In the same cluster, on another mon, no problem:
> 
> root@c-mon:/etc/ceph# rados df
> pool name           category          KB    objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
> data                -                  0    8805634       0         0        0   434686      434686   9053205           0
> metadata            -              63626      51768       0         0        0  1852535  1746370585  15900450   178049886
> wimi-files          -         8893618079    9983397       0         0        0   296284     2747513  18874311  8951883370
> wimi-recette-files  -             978449     235100       0         0        0   272352     1321225    498232     1042138
>   total used     27056761472    19075899
>   total avail    78381181096
>   total space   105437942568
> 
> root@c-mon:/etc/ceph# ceph -v
> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> 
> Is it a known error?
> I can file a formal bug report if needed. This problem is not important,
> but I fear implications outside of "rados df".
> 
> Regards,
> -- 
> Thomas Lemarchand
> Cloud Solutions SAS - Information Systems Manager
> 
> 
> 
> 
> 





[ceph-users] 0.87 rados df fault

2014-11-03 Thread Thomas Lemarchand
Hello all,

I upgraded my cluster to Giant. Everything is working well, but on one
mon I get a strange error when I do "rados df":

root@a-mon:~# rados df
2014-11-03 10:57:15.313618 7ff2434f0700  0 -- :/1009400 >>
10.94.67.202:6789/0 pipe(0xe37890 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0xe37b20).fault
pool name           category          KB    objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
data                -                  0    8805791       0         0        0   434686      434686   9053362           0
metadata            -              63991      51768       0         0        0  1852535  1746370585  15900570   178050318
wimi-files          -         8893618079    9983397       0         0        0   296284     2747513  18874311  8951883370
wimi-recette-files  -             978453     235134       0         0        0   272389     1321262    498429     1042175
  total used     27056765864    19076090
  total avail    78381176704
  total space   105437942568

root@a-mon:~# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)


In the same cluster, on another mon, no problem:

root@c-mon:/etc/ceph# rados df
pool name           category          KB    objects  clones  degraded  unfound       rd       rd KB        wr       wr KB
data                -                  0    8805634       0         0        0   434686      434686   9053205           0
metadata            -              63626      51768       0         0        0  1852535  1746370585  15900450   178049886
wimi-files          -         8893618079    9983397       0         0        0   296284     2747513  18874311  8951883370
wimi-recette-files  -             978449     235100       0         0        0   272352     1321225    498232     1042138
  total used     27056761472    19075899
  total avail    78381181096
  total space   105437942568

root@c-mon:/etc/ceph# ceph -v
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Is it a known error?
I can file a formal bug report if needed. This problem is not important,
but I fear implications outside of "rados df".
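
That ".fault" line indicates the client could not talk to the mon at 10.94.67.202:6789 and fell back to another one (consistent with the crashed mon found later in the thread). A quick sanity check (standard ceph/rados CLI; the second address below is only a hypothetical example of a known-good mon) is to verify quorum and, if needed, point the client at a specific mon:

ceph mon stat                             # which mons exist and which are currently in quorum
ceph quorum_status --format json-pretty   # detailed quorum membership as seen by the mons
rados -m 10.94.67.201:6789 df             # hypothetical address: query via an explicitly chosen mon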

Regards,
-- 
Thomas Lemarchand
Cloud Solutions SAS - Information Systems Manager






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com