Re: [ceph-users] New eu.ceph.com mirror machine
Hi Wido,

If I disable the EPEL repo then the error changes:

[root@ninja ~]# yum install --disablerepo=epel ceph
Loaded plugins: langpacks, priorities, product-id, subscription-manager
10 packages excluded due to repository priority protections
Resolving Dependencies
...
--> Finished Dependency Resolution
Error: Package: gperftools-libs-2.1-1.el7.x86_64 (ceph)
       Requires: libunwind.so.8()(64bit)

So this is related to the EPEL repo breaking Ceph again. I have check_obsoletes=1 set, as recommended on this list a couple of weeks ago. Is there any chance you could copy the libunwind repo to eu.ceph.com?

Paul Hewlett
Senior Systems Engineer
Velocix, Cambridge
Alcatel-Lucent
t: +44 1223 435893 m: +44 7985327353

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] New eu.ceph.com mirror machine
Hi Wido,

Has something broken with this move? The following has worked for me repeatedly over the last two months. This morning I tried to install Ceph using the following repo file:

[root@citrus ~]# cat /etc/yum.repos.d/ceph.repo
[ceph]
name=Ceph packages for $basearch
baseurl=http://ceph.com/rpm-giant/rhel7/$basearch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-giant/rhel7/noarch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://ceph.com/rpm-giant/rhel7/SRPMS
enabled=0
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

and ceph now fails to install:

msg: Error: Package: 1:ceph-0.87.1-0.el7.x86_64 (ceph)
         Requires: python-ceph = 1:0.87.1-0.el7
         Available: 1:python-ceph-0.86-0.el7.x86_64 (ceph)
             python-ceph = 1:0.86-0.el7
         Available: 1:python-ceph-0.87-0.el7.x86_64 (ceph)
             python-ceph = 1:0.87-0.el7
         Available: 1:python-ceph-0.87.1-0.el7.x86_64 (ceph)
             python-ceph = 1:0.87.1-0.el7
     Error: Package: 1:ceph-common-0.87.1-0.el7.x86_64 (ceph)
         Requires: python-ceph = 1:0.87.1-0.el7
         Available: 1:python-ceph-0.86-0.el7.x86_64 (ceph)
             python-ceph = 1:0.86-0.el7
         Available: 1:python-ceph-0.87-0.el7.x86_64 (ceph)
             python-ceph = 1:0.87-0.el7
         Available: 1:python-ceph-0.87.1-0.el7.x86_64 (ceph)
             python-ceph = 1:0.87.1-0.el7

Regards
Paul Hewlett
Senior Systems Engineer
Velocix, Cambridge
Alcatel-Lucent
t: +44 1223 435893 m: +44 7985327353
Re: [ceph-users] New eu.ceph.com mirror machine
On 03/09/2015 12:54 PM, HEWLETT, Paul (Paul)** CTR ** wrote:
> Hi Wido
> Has something broken with this move? The following has worked for me
> repeatedly over the last 2 months:

It shouldn't have broken anything, but you never know. The machine rsyncs the data from ceph.com directly.

> This a.m. I tried to install ceph using the following repo file:
> [...]
> and ceph now fails to install:
> [...]

The directories you are pointing at do exist and contain data. Anybody else noticing something?

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
[ceph-users] Stuck PGs blocked_by non-existent OSDs
Hi,

I'm trying to fix an issue within 0.93 on our internal cloud related to incomplete PGs (yes, I realise the folly of running the dev release; it's a not-so-test env now, so I really need to recover this). Current outage details: 72 initial (now 65) OSDs across 6 nodes.

* Update to 0.92 from Giant
* Fine for a day
* MDS outage overnight and subsequent node failure
* Massive increase in RAM utilisation (10 GB per OSD!)
* More failures
* OSDs marked 'out' to try to alleviate the new, larger cluster requirements; a couple died under the additional load
* Superfluous and faulty OSDs removed, auth keys deleted
* RAM added to nodes (96 GB each, serving 10-12 OSDs)
* Upgrade to 0.93
* Fixed broken journals caused by the 0.92 update
* No more missing objects or degradation

So, that brings me to today: I still have 73/2264 PGs listed as stuck incomplete/inactive, and I also have blocked requests. Querying those placement groups, I notice they are 'blocked_by' non-existent OSDs (ones I removed due to issues). I have no way to tell the cluster those OSDs are lost, as they have already been removed from both the osdmap and the crushmap. Exporting the crushmap shows the non-existent OSDs as deviceN (i.e. device36 for the removed osd.36); deleting those entries and re-importing the crushmap has no effect.

Some further pg detail: https://gist.github.com/joelio/cecca9b48aca6d44451b

So I'm stuck: I can't recover the PGs because I can't remove a non-existent OSD that the PG thinks is blocking it. Help graciously accepted!

Joel

--
$ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge'
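One workaround that has been suggested for this symptom (a sketch, not verified against 0.93; osd.36 stands in for whichever id shows up in the PG's "blocked_by" list, and it assumes that id is currently free) is to re-create the missing OSD id just long enough to mark it lost, so the PG can stop waiting on it:

```shell
# Re-create the ghost OSD id so "ceph osd lost" has something to act on,
# then remove it again afterwards.
ceph osd create                          # should re-allocate the lowest free id (36 here)
ceph osd lost 36 --yes-i-really-mean-it  # tell the cluster the data on it is gone
ceph osd crush remove osd.36             # clean the entry back out
ceph auth del osd.36
ceph osd rm 36
```

Note that marking an OSD lost tells recovery to proceed without that copy of the data, so it is only safe when the removed OSD really is unrecoverable.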
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Hi Karan,

We faced the same issue and resolved it after increasing the open file limit and the maximum number of threads.

Config reference:

/etc/security/limits.conf:
root hard nofile 65535

sysctl -w kernel.pid_max=4194303

http://tracker.ceph.com/issues/10554#change-47024

Cheers
Mohamed Pakkeer

On Mon, Mar 9, 2015 at 4:20 PM, Azad Aliyar <azad.ali...@sparksupport.com> wrote:
> Check max thread count: if you have a node with a lot of OSDs, you may be
> hitting the default maximum number of threads (usually 32k), especially
> during recovery. You can use sysctl to see if raising the limit to the
> maximum possible number of threads (4194303) helps. For example:
>
> sysctl -w kernel.pid_max=4194303
>
> If increasing the maximum thread count resolves the issue, you can make it
> permanent by adding a kernel.pid_max setting to /etc/sysctl.conf. For example:
>
> kernel.pid_max = 4194303

On Mon, Mar 9, 2015 at 4:11 PM, Karan Singh <karan.si...@csc.fi> wrote:
> Hello Community, need help to fix a long-running Ceph problem. The cluster
> is unhealthy and multiple OSDs are DOWN. When I try to restart OSDs I get
> this error:
>
> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
> 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
> common/Thread.cc: 129: FAILED assert(ret == 0)
>
> Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5,
> 3.17.2-1.el6.elrepo.x86_64
> Tried upgrading from 0.80.7 to 0.80.8 but no luck.
> Tried the CentOS stock kernel 2.6.32 but no luck.
> Memory is not a problem; more than 150 GB is free.
> Did anyone ever face this problem?
>
> Cluster status:
>
>   cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
>    health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete;
>           1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive;
>           8938 pgs stuck stale; 10320 pgs stuck unclean; recovery
>           6061/31080 objects degraded (19.501%); 111/196 in osds are down;
>           clock skew detected on mon.pouta-s02, mon.pouta-s03
>    monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0},
>           election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
>    osdmap e26633: 239 osds: 85 up, 196 in
>     pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
>           4699 GB used, 707 TB / 711 TB avail
>           6061/31080 objects degraded (19.501%)
>             14 down+remapped+peering
>             39 active
>           3289 active+clean
>            547 peering
>            663 stale+down+peering
>            705 stale+active+remapped
>              1 active+degraded+remapped
>              1 stale+down+incomplete
>            484 down+peering
>            455 active+remapped
>           3696 stale+active+degraded
>              4 remapped+peering
>             23 stale+down+remapped+peering
>             51 stale+active
>           3637 active+degraded
>           3799 stale+active+clean
>
> OSD logs:
>
> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
> 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
> common/Thread.cc: 129: FAILED assert(ret == 0)
>  ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
>  1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
>  2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
>  3: (Accepter::entry()+0x265) [0xb5c635]
>  4: /lib64/libpthread.so.0() [0x3c8a6079d1]
>  5: (clone()+0x6d) [0x3c8a2e89dd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>  to interpret this.
>
> More information at Ceph tracker issue:
> http://tracker.ceph.com/issues/10988#change-49018
>
> Karan Singh
> Systems Specialist, Storage Platforms
> CSC - IT Center for Science,
> Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
> mobile: +358 503 812758
> tel. +358 9 4572001
> fax +358 9 4572302
> http://www.csc.fi/

--
Warm Regards,
Azad Aliyar
Linux Server Engineer
Email: azad.ali...@sparksupport.com | Skype: spark.azad
http://www.sparksupport.com
3rd Floor, Leela Infopark, Phase-2, Kakanad, Kochi-30, Kerala, India
Phone: +91 484 6561696, Mobile: +91-8129270421
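As a quick sanity check on a node like this (a sketch assuming a Linux host; it only reads standard procfs files), you can compare the current system-wide thread count against kernel.pid_max to see how much headroom is left:

```shell
# Every thread consumes a pid, so a system-wide thread count approaching
# kernel.pid_max reproduces the "Thread::create ... FAILED assert(ret == 0)"
# failure above. Sum the Threads: field of every process in /proc.
threads=$(grep -h '^Threads:' /proc/[0-9]*/status 2>/dev/null | awk '{s+=$2} END {print s}')
limit=$(cat /proc/sys/kernel/pid_max)
echo "threads=$threads pid_max=$limit"
```

On a 60-OSD server idling at roughly 66,000 threads, a pid_max of 65536 leaves no headroom at all, which is why the 4194303 maximum is the safer choice.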
Re: [ceph-users] New eu.ceph.com mirror machine
On 03/09/2015 02:47 PM, HEWLETT, Paul (Paul)** CTR ** wrote:
> Hi Wido
> If I disable the epel repo then the error changes:
>
> [root@ninja ~]# yum install --disablerepo=epel ceph
> [...]
> Error: Package: gperftools-libs-2.1-1.el7.x86_64 (ceph)
>        Requires: libunwind.so.8()(64bit)
>
> So this is related to the EPEL repo breaking ceph again. I have
> check_obsoletes=1 as recommended on this list a couple weeks ago. Is there
> any chance you could copy the libunwind repo to eu.ceph.com?

Hmm, I'll check the rsync script again. No manual copy should be required; it should fully sync the whole repository.

I'll look into that!

Wido
Re: [ceph-users] New eu.ceph.com mirror machine
When did you make the change? It worked on Friday, albeit with these extra lines in ceph.repo:

[Ceph-el7]
name=Ceph-el7
baseurl=http://eu.ceph.com/rpms/rhel7/noarch/
enabled=1
gpgcheck=0

which I removed when I discovered this no longer existed.

Regards
Paul Hewlett
Senior Systems Engineer
Velocix, Cambridge
Alcatel-Lucent
t: +44 1223 435893 m: +44 7985327353
Re: [ceph-users] New eu.ceph.com mirror machine
On 03/09/2015 02:27 PM, HEWLETT, Paul (Paul)** CTR ** wrote:
> When did you make the change?

Yesterday.

> It worked on Friday albeit with these extra lines in ceph.repo:
>
> [Ceph-el7]
> name=Ceph-el7
> baseurl=http://eu.ceph.com/rpms/rhel7/noarch/
> enabled=1
> gpgcheck=0
>
> which I removed when I discovered this no longer existed.

Ah, I think I know. The rsync script probably didn't clean up those old directories, since they don't exist here either: http://ceph.com/rpms/rhel7/noarch/

That caused some confusion, since this machine is a fresh sync from ceph.com.

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
[ceph-users] New eu.ceph.com mirror machine
Hi,

Since the recent reports of rsync failing on eu.ceph.com, I moved eu.ceph.com to a new machine. It went from physical to a KVM VM backed by RBD, so it's now running on Ceph.

URLs and rsync paths haven't changed; it's still eu.ceph.com, available over IPv4 and IPv6. This virtual machine is dedicated to running eu.ceph.com, so hopefully rsync won't fail anymore.

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Hi Karan, as you are actually writing in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still to low for high density hardware. In our cluster, one OSD server has in an idle situation about 66.000 Threads (60 OSDs per Server). The number of threads increases when you increase the number of placement groups in the cluster, which I think has triggered your problem. Set the kernel.pid_max setting to 4194303 (the maximum) like Azad Aliyar suggested, and the problem should be gone. Regards, Christian Am 09.03.2015 11:41, schrieb Karan Singh: Hello Community need help to fix a long going Ceph problem. Cluster is unhealthy , Multiple OSDs are DOWN. When i am trying to restart OSD’s i am getting this error /2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc http://Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970/ /common/Thread.cc http://Thread.cc: 129: FAILED assert(ret == 0)/ *Environment *: 4 Nodes , OSD+Monitor , Firefly latest , CentOS6.5 , 3.17.2-1.el6.elrepo.x86_64 Tried upgrading from 0.80.7 to 0.80.8 but no Luck Tried centOS stock kernel 2.6.32 but no Luck Memory is not a problem more then 150+GB is free Did any one every faced this problem ?? 
*Cluster status * * * / cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33/ / health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1/ /736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19/ /.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03/ / monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/ //0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03/ / * osdmap e26633: 239 osds: 85 up, 196 in*/ / pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects/ /4699 GB used, 707 TB / 711 TB avail/ /6061/31080 objects degraded (19.501%)/ / 14 down+remapped+peering/ / 39 active/ /3289 active+clean/ / 547 peering/ / 663 stale+down+peering/ / 705 stale+active+remapped/ / 1 active+degraded+remapped/ / 1 stale+down+incomplete/ / 484 down+peering/ / 455 active+remapped/ /3696 stale+active+degraded/ / 4 remapped+peering/ / 23 stale+down+remapped+peering/ / 51 stale+active/ /3637 active+degraded/ /3799 stale+active+clean/ *OSD : Logs * /2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc http://Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970/ /common/Thread.cc http://Thread.cc: 129: FAILED assert(ret == 0)/ / / / ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)/ / 1: (Thread::create(unsigned long)+0x8a) [0xaf41da]/ / 2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]/ / 3: (Accepter::entry()+0x265) [0xb5c635]/ / 4: /lib64/libpthread.so.0() [0x3c8a6079d1]/ / 5: (clone()+0x6d) [0x3c8a2e89dd]/ / NOTE: a copy of the executable, or `objdump -rdS executable` is needed to interpret this./ *More information at Ceph Tracker Issue : *http://tracker.ceph.com/issues/10988#change-49018 Karan Singh Systems Specialist , Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. 
Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel. +358 9 4572001 fax +358 9 4572302 http://www.csc.fi/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Christian Eichelmann Systemadministrator 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting Brauerstraße 48 · DE-76135 Karlsruhe Telefon: +49 721 91374-8026 christian.eichelm...@1und1.de Amtsgericht Montabaur / HRB 6484 Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen Aufsichtsratsvorsitzender: Michael Scheeren ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
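Christian's numbers above imply a quick sizing rule: roughly 66,000 threads for 60 OSDs works out to about 1,100 threads per OSD per host. A sketch of that arithmetic, together with the fix from the thread (the 2x headroom factor is an illustrative assumption, not a value anyone posted):

```shell
# Rough per-host thread budget, from Christian's observation of
# ~66,000 threads for 60 OSDs (about 1,100 threads per OSD).
OSDS_PER_HOST=60
THREADS_PER_OSD=1100
REQUIRED=$((OSDS_PER_HOST * THREADS_PER_OSD * 2))   # 2x headroom (assumption)
echo "estimated thread budget per host: $REQUIRED"

# The fix from the thread (run as root, and persist it across reboots):
#   sysctl -w kernel.pid_max=4194303
#   echo 'kernel.pid_max = 4194303' >> /etc/sysctl.conf
```

Even the doubled estimate comfortably fits under the 4194303 maximum, which is why simply setting pid_max to the maximum is the usual advice for dense OSD hosts.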
[ceph-users] Disk serial number from OSD
Hi All, I just created this little bash script to retrieve the /dev/disk/by-id string for each OSD on a host. Our disks are internally mounted, so there is no concept of drive bays; this should make it easier to work out which disk has failed.

#!/bin/bash
# List the "ceph data" partitions known to ceph-disk, one per line.
DISKS=$(ceph-disk list | grep "ceph data")
old_IFS=$IFS
IFS=$'\n'
echo "$DISKS"
for DISK in $DISKS; do
    # Field 1 is the device node, field 7 the osd name (e.g. osd.3).
    DEV=$(echo "$DISK" | awk '{print $1}')
    OSD=$(echo "$DISK" | awk '{print $7}')
    DEV=$(echo "$DEV" | sed -e 's/\/dev\///g')
    # Match the persistent by-id name for that device, ignoring wwn-* aliases.
    ID=$(ls -l /dev/disk/by-id | grep "$DEV" | awk '{print $9}' | egrep -v wwn)
    echo "$OSD $ID"
done
IFS=$old_IFS

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] New eu.ceph.com mirror machine
Hi Wido

It seems that your move coincided with yet another change in the EPEL repo. For anyone who is interested, I fixed this by:

1. Ensuring that check_obsoletes=1 is in /etc/yum/pluginconf.d/priorities.conf
2. Installing libunwind explicitly: yum install libunwind
3. Installing ceph with EPEL disabled: yum install --disablerepo=epel ceph

Regards Paul Hewlett Senior Systems Engineer Velocix, Cambridge Alcatel-Lucent t: +44 1223 435893 m: +44 7985327353

From: Wido den Hollander [w...@42on.com] Sent: 09 March 2015 13:43 To: HEWLETT, Paul (Paul)** CTR **; ceph-users Subject: Re: [ceph-users] New eu.ceph.com mirror machine

On 03/09/2015 02:27 PM, HEWLETT, Paul (Paul)** CTR ** wrote: When did you make the change? Yesterday. It worked on Friday, albeit with these extra lines in ceph.repo:

[Ceph-el7]
name=Ceph-el7
baseurl=http://eu.ceph.com/rpms/rhel7/noarch/
enabled=1
gpgcheck=0

which I removed when I discovered this no longer existed. Ah, I think I know. The rsync script probably didn't clean up those old directories, since they don't exist here either: http://ceph.com/rpms/rhel7/noarch/ That caused some confusion since this machine is a fresh sync from ceph.com Regards Paul Hewlett Senior Systems Engineer Velocix, Cambridge Alcatel-Lucent t: +44 1223 435893 m: +44 7985327353

From: Wido den Hollander [w...@42on.com] Sent: 09 March 2015 12:15 To: HEWLETT, Paul (Paul)** CTR **; ceph-users Subject: Re: [ceph-users] New eu.ceph.com mirror machine

On 03/09/2015 12:54 PM, HEWLETT, Paul (Paul)** CTR ** wrote: Hi Wido Has something broken with this move? The following has worked for me repeatedly over the last 2 months: It shouldn't have broken anything, but you never know. The machine rsyncs the data from ceph.com directly. The directories you are pointing at do exist and contain data. Anybody else noticing something? This a.m.
I tried to install ceph using the following repo file:

[root@citrus ~]# cat /etc/yum.repos.d/ceph.repo
[ceph]
name=Ceph packages for $basearch
baseurl=http://ceph.com/rpm-giant/rhel7/$basearch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-giant/rhel7/noarch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://ceph.com/rpm-giant/rhel7/SRPMS
enabled=0
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

and ceph now fails to install:

msg: Error: Package: 1:ceph-0.87.1-0.el7.x86_64 (ceph)
  Requires: python-ceph = 1:0.87.1-0.el7
  Available: 1:python-ceph-0.86-0.el7.x86_64 (ceph)
      python-ceph = 1:0.86-0.el7
  Available: 1:python-ceph-0.87-0.el7.x86_64 (ceph)
      python-ceph = 1:0.87-0.el7
  Available: 1:python-ceph-0.87.1-0.el7.x86_64 (ceph)
      python-ceph = 1:0.87.1-0.el7
Error: Package: 1:ceph-common-0.87.1-0.el7.x86_64 (ceph)
  Requires: python-ceph = 1:0.87.1-0.el7
  Available: 1:python-ceph-0.86-0.el7.x86_64 (ceph)
      python-ceph = 1:0.86-0.el7
  Available: 1:python-ceph-0.87-0.el7.x86_64 (ceph)
      python-ceph = 1:0.87-0.el7
  Available: 1:python-ceph-0.87.1-0.el7.x86_64 (ceph)
      python-ceph = 1:0.87.1-0.el7

Regards Paul Hewlett Senior Systems Engineer Velocix, Cambridge Alcatel-Lucent t: +44 1223 435893 m: +44 7985327353

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] on behalf of Wido den Hollander [w...@42on.com] Sent: 09 March 2015 11:15 To: ceph-users Subject: [ceph-users] New eu.ceph.com mirror machine

Hi, Since the recent reports of rsync failing on eu.ceph.com I moved eu.ceph.com to a new machine. It went from physical to a KVM VM backed by RBD, so it's now running on Ceph. URLs or rsync paths haven't changed; it's still eu.ceph.com and available over IPv4 and IPv6.
This Virtual Machine is dedicated for running eu.ceph.com, so hopefully rsync won't fail anymore. -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
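Paul's three-step workaround above can be scripted. A sketch of step 1, made idempotent so it is safe to re-run (the priorities.conf path is the one from the thread; the CONF variable and its scratch-file default are illustrative so the logic can be shown without touching a live system):

```shell
# Step 1: enable check_obsoletes in the yum priorities plugin, idempotently.
# On a real system set CONF=/etc/yum/pluginconf.d/priorities.conf before
# running; the mktemp default here is only for safe illustration.
CONF=${CONF:-$(mktemp)}
grep -q '^check_obsoletes=1' "$CONF" 2>/dev/null || echo 'check_obsoletes=1' >> "$CONF"
grep -c '^check_obsoletes=1' "$CONF"   # stays at 1 even if run repeatedly

# Steps 2 and 3, exactly as given in the thread:
#   yum install libunwind
#   yum install --disablerepo=epel ceph
```

The grep-before-append guard matters because the priorities.conf file already contains a `[main]` section; blindly appending on every run would accumulate duplicate lines.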
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Thanks guys, kernel.pid_max=4194303 did the trick. - Karan - On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Karan, as you are actually writing your own book on Ceph: the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has about 66,000 threads in an idle situation (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think is what triggered your problem. Set kernel.pid_max to 4194303 (the maximum) as Azad Aliyar suggested, and the problem should be gone. Regards, Christian Am 09.03.2015 11:41, schrieb Karan Singh: Hello Community, I need help to fix a long-standing Ceph problem. The cluster is unhealthy and multiple OSDs are DOWN.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs
You'll probably have to recreate OSDs with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam On Mon, 2015-03-09 at 12:24 +0000, joel.merr...@gmail.com wrote: Hi, I'm trying to fix an issue within 0.93 on our internal cloud related to incomplete PGs (yes, I realise the folly of running the dev release; it's a not-so-test env now, so I really need to recover this). Current outage info: 72 initial (now 65) OSDs across 6 nodes. * Update to 0.92 from Giant * Fine for a day * MDS outage overnight and subsequent node failure * Massive increase in RAM utilisation (10 GB per OSD!) * More failures * OSDs marked 'out' to try to alleviate the new, larger cluster requirements; a couple died under the additional load * Superfluous and faulty OSDs removed, auth keys deleted * RAM added to nodes (96 GB each, serving 10-12 OSDs) * Upgrade to 0.93 * Fixed broken journals due to the 0.92 update * No more missing objects or degradation So, that brings me to today: I still have 73/2264 PGs listed as stuck incomplete/inactive, and requests that are blocked. Upon querying said placement groups, I notice that they are 'blocked_by' non-existent OSDs (ones I removed due to issues). I have no way to tell the cluster the OSD is lost, as it's already been removed from both the osdmap and the crushmap. Exporting the crushmap shows non-existent OSDs as deviceN (i.e. device36 for the removed osd.36). Deleting those and reimporting the crushmap has no effect. Some further pg detail: https://gist.github.com/joelio/cecca9b48aca6d44451b So I'm stuck: I can't recover the PGs because I can't remove a non-existent OSD that the PG thinks is blocking it. Help graciously accepted! Joel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
- Original Message - From: Steffen Winther ceph.u...@siimnet.dk To: ceph-users@lists.ceph.com Sent: Monday, March 9, 2015 12:43:58 AM Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP Steffen W Sørensen stefws@... writes:

Response:
HTTP/1.1 200 OK
Date: Fri, 06 Mar 2015 10:41:14 GMT
Server: Apache/2.2.22 (Fedora)
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml

This response makes the App say: S3.createBucket, class S3, code UnexpectedContent, message Inconsistency in S3 response. error response is not a valid xml message. Is our S3 GW not responding properly? Why doesn't the radosGW return a Content-Length: 0 header when the body is empty? If you're using apache, then it filters out zero Content-Length. Nothing much radosgw can do about it. http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html Maybe this is confusing my App into expecting some XML in the body. You can try using the radosgw civetweb frontend and see if it changes anything. Yehuda 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool; is this normal, or how can one avoid multiple such objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW? # rados -p .rgw ls .bucket.meta.mssCl:default.6309817.1 .bucket.meta.mssCl:default.6187712.3 .bucket.meta.mssCl:default.6299841.7 .bucket.meta.mssCl:default.6309817.5 .bucket.meta.mssCl:default.6187712.2 .bucket.meta.mssCl:default.6187712.19 .bucket.meta.mssCl:default.6187712.12 mssCl ... # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12 ceph.objclass.version user.rgw.acl /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
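Yehuda's suggestion to try the civetweb frontend amounts to a one-line ceph.conf change for Firefly/Giant-era radosgw. A minimal fragment (the client section name is an assumption; port 7480 matches the test shown later in this thread):

```ini
[client.radosgw.gateway]
rgw frontends = civetweb port=7480
```

After restarting radosgw, Apache and mod_fastcgi are no longer in the request path, so any header rewriting they perform (such as dropping a zero Content-Length) goes away, which isolates whether the behavior comes from radosgw itself or the web server in front of it.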
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Umm, too many threads are created in SimpleMessenger: every pipe creates two worker threads, one for sending and one for receiving messages. Thus AsyncMessenger would be promising, but it is still in development. Regards Ning Yao 2015-03-09 20:48 GMT+08:00 Christian Eichelmann christian.eichelm...@1und1.de: Hi Karan, as you are actually writing your own book on Ceph: the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has about 66,000 threads in an idle situation (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think is what triggered your problem. Set kernel.pid_max to 4194303 (the maximum) as Azad Aliyar suggested, and the problem should be gone. Regards, Christian Am 09.03.2015 11:41, schrieb Karan Singh: Hello Community, I need help to fix a long-standing Ceph problem. The cluster is unhealthy and multiple OSDs are DOWN.
___ ceph-users mailing list ceph-users@lists.ceph.com
Re: [ceph-users] tgt and krbd
Hi Mike, I was using bs_aio with the krbd and still saw a small caching effect. I'm not sure if it was on the ESXi or tgt/krbd page cache side, but I was definitely seeing the IOs being coalesced into larger ones on the krbd device in iostat. Either way, it would make me nervous to run it like that in an HA setup. tgt itself does not do any type of caching, but depending on how you have tgt access the underlying block device you might end up using the normal old Linux page cache, like you would if you did:

dd if=/dev/rbd0 of=/dev/null bs=4K count=1
dd if=/dev/rbd0 of=/dev/null bs=4K count=1

This is what Ronnie meant in that thread when he said there might be caching in the underlying device. If you use tgt bs_rdwr.c (--bstype=rdwr) with the default settings and with krbd, then you will end up doing caching, because the krbd's block device will be accessed like in the dd example above (no direct bits set). You can tell tgt bs_rdwr devices to use O_DIRECT or O_SYNC. When you create the LUN, pass in --bsoflags {direct | sync}. Here is an example from the man page:

tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --bsoflags=sync --backing-store=/data/100m_image.raw

If you use bs_aio.c then we always set O_DIRECT when opening the krbd device, so no page caching is done. I think Linux aio might require this, or at least it did at the time it was written. Also, the cache settings exported to the other OS's initiator with that mode page command might affect performance too. It might change how that OS does writes, like sending cache syncs down or doing some sort of barrier or FUA. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
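The buffered-versus-O_DIRECT distinction that --bsoflags controls can be demonstrated with dd alone. A sketch using a scratch file in place of /dev/rbd0 (an assumption for illustration; the open-flag semantics are the same for any block-backed read, though iflag=direct can fail on filesystems like tmpfs that don't support direct I/O, which the fallback branch handles):

```shell
# Create a small scratch image standing in for the krbd device.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=4K count=16 2>/dev/null

# Buffered read: goes through the page cache (bs_rdwr default behavior).
dd if="$IMG" of=/dev/null bs=4K count=1 2>/dev/null && echo "buffered read ok"

# O_DIRECT read: bypasses the page cache (bs_aio / --bsoflags=direct behavior).
dd if="$IMG" of=/dev/null bs=4K count=1 iflag=direct 2>/dev/null \
  && echo "direct read ok" || echo "direct read unsupported on this filesystem"

rm -f "$IMG"
```

Running the buffered read twice in a row is the quickest way to see the cache at work: the second pass completes from RAM without touching the device, which is exactly the effect that worries people in HA failover scenarios.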
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Thank you Nick for explaining the problem with 4k writes. The queue depth used in this setup is 256, the maximum supported. Can you clarify why adding more nodes will not increase IOPS? In general, how do we increase the IOPS of a Ceph cluster? Thanks for your help. On Sat, Mar 7, 2015 at 5:57 PM, Nick Fisk n...@fisk.me.uk wrote: You are hitting serial latency limits. For a 4kb sync write to happen it has to:

1. Travel across the network from the client to the primary OSD
2. Be processed by Ceph
3. Get written to the primary OSD
4. The ack travels across the network back to the client

At 4kb these 4 steps take up a very high percentage of the total processing time, compared to the actual write to the SSD. Apart from faster (more GHz) CPUs, which will improve step 2, there's not much that can be done. Future Ceph releases may improve step 2 as well, but I wouldn't imagine it will change dramatically. Replication above level 1 will also see the IOPS drop, as you are introducing yet more Ceph processing and network delays, unless a future Ceph feature can be implemented where the ack is returned to the client once data has hit the 1st OSD. Still, 1000 IOPS is not that bad. You mention it needs to achieve 8000 IOPS to replace your existing SAN; at what queue depth is this required? You are getting way above that at a queue depth of only 16. I doubt most Ethernet-based enterprise SANs would be able to provide 8000 IOPS at a queue depth of 1, as network delays alone would limit you to around that figure. A network delay of 0.1 ms will limit you to 10,000 IOPS, 0.2 ms to 5,000 IOPS, and so on. If you really do need pure SSD performance for a certain client you will need to move the SSD local to it using some sort of caching software running on the client, although this can bring its own challenges.
Nick

-----Original Message----- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer Sent: 07 March 2015 10:55 To: Somnath Roy Cc: ceph-users Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel

Update:
Hardware: Upgraded the RAID controller to an LSI MegaRAID 9341 (12 Gbps). 3x Samsung 840 EVO, showing 45K IOPS in an fio test with 7 threads and 4k block size in JBOD mode. CPU: 16 cores @ 2.27 GHz. RAM: 24 GB. NIC: 10 Gbit/s with under 1 ms latency; iperf shows 9.18 Gbps between host and client.
Software: Ubuntu 14.04 with stock kernel 3.13. Upgraded from Firefly to Giant [ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)]. Changed the file system to btrfs and the I/O scheduler to noop.
Ceph setup: replication set to 1, using 2 SSD OSDs and 1 SSD for the journal. All are Samsung 840 EVO in JBOD mode on a single server.

Configuration:

[global]
fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648
mon_initial_members = ceph1
mon_host = 10.99.10.118
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 1
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 250
osd_pool_default_pgp_num = 250
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[client]
rbd_cache = true

Client: Ubuntu 14.04 with 16 cores @ 2.53 GHz and 24 GB RAM.

Results:

rados bench -p rbd -b 4096 -t 16 10 write

Maintaining 16 concurrent
writes of 4096 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ubuntucompute_3931
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
    0       0         0         0         0         0         -          0
    1      16      6370      6354   24.8124   24.8203   0.00221 0.00251512
    2      16     11618     11602   22.6536      20.5  0.001025 0.00275493
    3      16     16889     16873   21.9637   20.5898  0.001288 0.00281797
    4      16     17310     17294    16.884   1.64453  0.054066 0.00365805
    5      16     17695     17679    13.808   1.50391  0.001451     0.0009
    6      16     18127     18111   11.7868    1.6875  0.001463 0.00527521
    7      16     21647     21631   12.0669     13.75  0.001601  0.0051773
    8      16     28056     28040   13.6872   25.0352  0.005268 0.00456353
    9      16     28947     28931    12.553   3.48047   0.06647 0.00494762
   10      16     29346     29330   11.4536   1.55859
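Nick's latency arithmetic above is worth making explicit: at queue depth 1, one round trip must complete before the next write can be issued, so the IOPS ceiling is simply one second divided by the round-trip latency. A sketch of the calculation:

```shell
# Max IOPS at queue depth 1 is bounded by round-trip latency:
# 0.1 ms -> 10,000 IOPS, 0.2 ms -> 5,000 IOPS, and so on.
for LAT_US in 100 200 500; do
  echo "latency ${LAT_US} us -> at most $((1000000 / LAT_US)) IOPS at QD1"
done
```

This is why higher queue depths help so much: with 16 writes in flight, their latencies overlap, which matches the thread's observation of far more than 8000 IOPS at a queue depth of 16 even though QD1 throughput is latency-bound.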
[ceph-users] [ANN] ceph-deploy 1.5.22 released
Hi All, This is a new release of ceph-deploy that changes a couple of behaviors. On RPM-based distros, ceph-deploy will now automatically enable check_obsoletes in the Yum priorities plugin. This resolves an issue many community members hit where package dependency resolution was breaking due to conflicts between upstream packaging (hosted on ceph.com) and downstream (i.e., Fedora or EPEL). The other important change is that when using ceph-deploy to install Ceph packages on a RHEL machine, the --release flag *must* be used if you want to install upstream packages. In other words, if you want to install Giant on a RHEL machine, you would need to use ceph-deploy install --release giant. If the --release flag is not used, ceph-deploy will expect to use downstream package on RHEL. This is documented at [1]. The full changelog can be seen at [2]. Please update! - Travis [1] http://ceph.com/ceph-deploy/docs/install.html#distribution-notes [2] http://ceph.com/ceph-deploy/docs/changelog.html#id1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
- Original Message - From: Steffen Winther ceph.u...@siimnet.dk To: ceph-users@lists.ceph.com Sent: Monday, March 9, 2015 1:25:43 PM Subject: Re: [ceph-users] S3 RadosGW - Create bucket OP Yehuda Sadeh-Weinraub yehuda@... writes: If you're using apache, then it filters out zero Content-Length. Nothing much radosgw can do about it. You can try using the radosgw civetweb frontend, see if it changes anything. Thanks, but it made no difference...

Req:
PUT /mssCl/ HTTP/1.1
Host: rgw.gsp.sprawl.dk:7480
Authorization: AWS auth id
Date: Mon, 09 Mar 2015 20:18:16 GMT
Content-Length: 0

Response:
HTTP/1.1 200 OK
Content-type: application/xml
Content-Length: 0

The App still says: S3.createBucket, class S3, code UnexpectedContent, message Inconsistency in S3 response. error response is not a valid xml message :/ According to the API specified at http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html, no response body is expected. I can only assume that the application tries to decode the XML if an XML content type is returned. What kind of application is that? Yehuda, any comments on the second issue below? 2. At every create-bucket OP the GW creates what look like new containers for ACLs in the .rgw pool; is this normal, or how can one avoid multiple such objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my App tries to recreate the same bucket, or is this a feature/bug in radosGW? That's a bug. Yehuda # rados -p .rgw ls .bucket.meta.mssCl:default.6309817.1 .bucket.meta.mssCl:default.6187712.3 .bucket.meta.mssCl:default.6299841.7 .bucket.meta.mssCl:default.6309817.5 .bucket.meta.mssCl:default.6187712.2 .bucket.meta.mssCl:default.6187712.19 .bucket.meta.mssCl:default.6187712.12 mssCl ...
# rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12 ceph.objclass.version user.rgw.acl /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
I know I'm not even close to this type of problem yet with my small clusters (both test and production), but it would be great if something like that could appear as a cluster HEALTH_WARN: if Ceph could determine the number of used processes and compare it against the current limit, it could throw a health warning when it gets within, say, 10 or 15% of the max value. That would be a really quick indicator for anyone who frequently checks the health status (like through a web portal), as they may see it more quickly than during their regular log check interval. Just a thought. -Tony On Mon, Mar 9, 2015 at 2:01 PM, Sage Weil s...@newdream.net wrote: On Mon, 9 Mar 2015, Karan Singh wrote: Thanks guys, kernel.pid_max=4194303 did the trick. Great to hear! Sorry we missed that you only had it at 65536. This is a really common problem that people hit when their clusters start to grow. Is there somewhere in the docs we can put this to catch more users? Or maybe a warning issued by the OSDs themselves if they see limits that are low? sage - Karan - On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Karan, as you are actually writing your own book on Ceph: the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has about 66,000 threads in an idle situation (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think is what triggered your problem. Set kernel.pid_max to 4194303 (the maximum) as Azad Aliyar suggested, and the problem should be gone. Regards, Christian Am 09.03.2015 11:41, schrieb Karan Singh: Hello Community, I need help to fix a long-standing Ceph problem. The cluster is unhealthy and multiple OSDs are DOWN.
Re: [ceph-users] EC Pool and Cache Tier Tuning
Either option #1 or #2, depending on whether your data has hot spots or you need to use EC pools. I'm finding that the cache tier can actually slow stuff down depending on how much data is in the cache tier vs on the slower tier. Writes will be about the same speed for both solutions; reads will be a lot faster using a cache tier if the data resides in it. -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steffen Winther Sent: 09 March 2015 20:47 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] EC Pool and Cache Tier Tuning Nick Fisk nick@... writes: My Ceph cluster comprises 4 nodes, each with the following: 10x 3TB WD Red Pro disks (7200rpm) - EC pool k=3 m=3; 2x S3700 100GB SSDs (20k write IOPS) for HDD journals; 1x S3700 400GB SSD (35k write IOPS) for cache tier - 3x replica. If I have the following 4x node config: 2x S3700 200GB SSDs, 4x 4TB HDDs - what config should I aim for to optimize RBD write/read ops: 1x S3700 200GB SSD for 4x journals, 1x S3700 200GB cache tier, 4x 4TB HDD OSD disks; or: 2x S3700 200GB SSDs for 2x journals, 4x 4TB HDD OSD disks; or: 2x S3700 200GB cache tier, 4x 4TB HDD OSD disks? /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph repo - RSYNC?
Hi David, also regarding the Calamari GUI monitoring interface: is there any way to get an Inktank user account and password, since the repo to install Calamari seems to be only for people inside Inktank? Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 8, 2015, at 10:38 AM, David Moreau Simard dmsim...@iweb.com wrote: Hi, With the help of Inktank we have been providing a Ceph mirror at ceph.mirror.iweb.ca. Quick facts: - Located on the east coast of Canada (Montreal, Quebec) - Syncs every four hours directly off of the official repositories - Available over http (http://ceph.mirror.iweb.ca/) and rsync (rsync://mirror.iweb.ca/ceph) We're working on a brand new, faster and improved infrastructure for all of our mirrors, and it will be backed by Ceph... so the Ceph mirror will soon be stored on a Ceph cluster :) Feel free to use it! -- David Moreau Simard On 2015-03-05, 1:14 PM, Brian Rak b...@gameservers.com wrote: Do any of the Ceph repositories run rsync? We generally mirror the repository locally so we don't encounter any unexpected upgrades. eu.ceph.com used to run this, but it seems to be down now. # rsync rsync://eu.ceph.com rsync: failed to connect to eu.ceph.com: Connection refused (111) rsync error: error in socket IO (code 10) at clientserver.c(124) [receiver=3.0.6] ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] EC Pool and Cache Tier Tuning
Nick Fisk nick@... writes: My Ceph cluster comprises 4 nodes, each with the following: 10x 3TB WD Red Pro disks (7200rpm) - EC pool k=3 m=3; 2x S3700 100GB SSDs (20k write IOPS) for HDD journals; 1x S3700 400GB SSD (35k write IOPS) for cache tier - 3x replica. If I have the following 4x node config: 2x S3700 200GB SSDs, 4x 4TB HDDs - what config should I aim for to optimize RBD write/read ops: 1x S3700 200GB SSD for 4x journals, 1x S3700 200GB cache tier, 4x 4TB HDD OSD disks; or: 2x S3700 200GB SSDs for 2x journals, 4x 4TB HDD OSD disks; or: 2x S3700 200GB cache tier, 4x 4TB HDD OSD disks? /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Hi Tony, sounds like a good idea! Udo On 09.03.2015 21:55, Tony Harris wrote: I know I'm not even close to this type of problem yet with my small cluster (both test and production clusters) - but it would be great if something like that could appear in the cluster HEALTHWARN: if Ceph could determine the number of threads in use and compare it against the current limit, it could throw a health warning when it gets within say 10 or 15% of the max value. That would be a really quick indicator for anyone who frequently checks the health status (like through a web portal), as they may see it more quickly than during their regular log check interval. Just a thought. -Tony ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel
Can you run the Fio test again but with a queue depth of 32? This will probably show what your cluster is capable of. Adding more nodes with SSDs will probably help scale, but only at higher IO depths. At low queue depths you are probably already at the limit, as per my earlier email. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer Sent: 09 March 2015 17:23 To: Nick Fisk Cc: ceph-users Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel Thank you Nick for explaining the problem with 4k writes. The queue depth used in this setup is 256, the maximum supported. Can you clarify that adding more nodes will not increase iops? In general, how can we increase the iops of a ceph cluster? Thanks for your help On Sat, Mar 7, 2015 at 5:57 PM, Nick Fisk n...@fisk.me.uk wrote: You are hitting serial latency limits. For a 4kb sync write to happen it has to:- 1. Travel across network from client to Primary OSD 2. Be processed by Ceph 3. Get written to Pri OSD 4. Ack travels across network to client At 4kb these 4 steps take up a very high percentage of the actual processing time as compared to the actual write to the SSD. Apart from faster (more GHz) CPUs, which will improve step 2, there's not much that can be done. Future Ceph releases may improve step 2 as well, but I wouldn't imagine it will change dramatically. A replication level above 1 will also see the IOPS drop, as you are introducing yet more Ceph processing and network delays - unless a future Ceph feature can be implemented where it returns the ack to the client once data has hit the 1st OSD. Still, 1000 IOPS is not that bad. You mention it needs to achieve 8000 IOPS to replace your existing SAN; at what queue depth is this required? You are getting way above that at a queue depth of only 16.
I doubt most Ethernet-based enterprise SANs would be able to provide 8000 IOPS at a queue depth of 1, as network delays alone would be limiting you to around that figure. A network delay of .1ms will limit you to 10,000 IOPS, .2ms = 5,000 IOPS, and so on. If you really do need pure SSD performance for a certain client you will need to move the SSD local to it using some sort of caching software running on the client, although this can bring its own challenges. Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of mad Engineer Sent: 07 March 2015 10:55 To: Somnath Roy Cc: ceph-users Subject: Re: [ceph-users] Extreme slowness in SSD cluster with 3 nodes and 9 OSD with 3.16-3 kernel Update: Hardware: Upgraded RAID controller to LSI Megaraid 9341 - 12Gbps. 3x Samsung 840 EVO - showed 45K IOPS in a fio test with 7 threads and 4k block size in JBOD mode. CPU - 16 cores @2.27GHz. RAM - 24GB. NIC - 10Gbit with under 1 ms latency; iperf shows 9.18 Gbps between host and client. Software: Ubuntu 14.04 with stock kernel 3.13 - upgraded from firefly to giant [ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)]. Changed file system to btrfs and I/O scheduler to noop. Ceph setup: replication set to 1, using 2 SSD OSDs and 1 SSD for the journal. All are Samsung 840 EVO in JBOD mode on a single server.
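Nick's round-trip arithmetic above can be sketched in a few lines; this is a minimal illustration of the latency ceiling, not Ceph code:

```python
def max_sync_iops(round_trip_ms, queue_depth=1):
    """Upper bound on IOPS when each write must wait one network
    round trip: IOPS = queue_depth / latency_in_seconds."""
    return queue_depth / (round_trip_ms / 1000.0)

# 0.1 ms round trip caps a queue-depth-1 client at about 10,000 IOPS,
# 0.2 ms at about 5,000 IOPS, and so on.
print(round(max_sync_iops(0.1)))                 # about 10000
print(round(max_sync_iops(0.2)))                 # about 5000
# Higher queue depths hide the latency, which is why the fio numbers
# at queue depth 16 or 32 look so much better:
print(round(max_sync_iops(0.1, queue_depth=32))) # about 320000
```

This also shows why adding nodes does not help queue-depth-1 IOPS: the per-operation round trip is unchanged.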
Configuration:

[global]
fsid = 979f32fc-6f31-43b0-832f-29fcc4c5a648
mon_initial_members = ceph1
mon_host = 10.99.10.118
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 1
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 250
osd_pool_default_pgp_num = 250
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[client]
rbd_cache = true

Client: Ubuntu 14.04 with 16 cores @2.53 GHz and 24G RAM

Results: rados bench -p rbd -b 4096 -t 16 10 write

Maintaining 16 concurrent writes of 4096 bytes for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ubuntucompute_3931
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
    0       0         0         0         0         0         -          0
    1      16      6370      6354   24.8124   24.8203   0.00221  0.00251512
    2      16     11618     11602   22.6536      20.5  0.001025  0.00275493
    3      16     16889     16873   21.9637   20.5898  0.001288  0.00281797
    4      16     17310     17294    16.884   1.64453  0.054066  0.00365805
    5      16
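As a sanity check on the bench output above, the MB/s column can be converted back to operations per second at the 4096-byte block size; a small sketch (the conversion assumes rados bench reports MiB/s):

```python
def mbps_to_iops(mb_per_s, block_bytes=4096):
    """Convert a rados bench throughput figure (MiB/s) back to ops/s
    for a given block size: one op moves block_bytes bytes."""
    return mb_per_s * (2 ** 20) / block_bytes

# Second one reported ~24.81 MB/s at 4 KiB objects:
print(round(mbps_to_iops(24.81)))  # roughly 6351 ops/s, matching the ~6354 ops finished
```

That is consistent with the ~1000-6000 IOPS range being discussed in this thread.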
Re: [ceph-users] qemu-kvm and cloned rbd image
On 03/05/2015 07:19 PM, Josh Durgin wrote: client.libvirt key: caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rw class-read pool=rbd This includes everything except class-write on the pool you're using. You'll need that so that a copy_up call (used just for clones) works. That's what was getting a permissions error. You can use rwx for short. Josh thanks! That was the problem indeed. I removed class-write capability because I also use this user as the default for ceph cli commands. Without class-write this user can't erase an existing image from the pool, while at the same time being able to create new ones. I should probably come up with a better scheme if I am to utilize cloned images. Thanks again! -Kostas ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Prioritize Heartbeat packets
I've found commit 9b9a682fe035c985e416ee1c112fa58f9045a27c and I see that when 'osd heartbeat use min delay socket = true' it will mark the packet with DSCP CS6. Based on the setting of the socket in msg/simple/Pipe.cc is it possible that this can apply to both OSD and monitor? I don't understand the code enough to know how the set_socket_options() is called from the OSD and monitor. If this applies to both monitor and OSD, would it be better to rename the option to a more generic name? Thanks, On Sat, Mar 7, 2015 at 4:23 PM, Daniel Swarbrick daniel.swarbr...@gmail.com wrote: Judging by the commit, this ought to do the trick: osd heartbeat use min delay socket = true On 07/03/15 01:20, Robert LeBlanc wrote: I see that Jian Wen has done work on this for 0.94. I tried looking through the code to see if I can figure out how to configure this new option, but it all went over my head pretty quick. Can I get a brief summary on how to set the priority of heartbeat packets or where to look in the code to figure it out? Thanks, On Thu, Aug 28, 2014 at 2:01 AM, Daniel Swarbrick daniel.swarbr...@profitbricks.com mailto:daniel.swarbr...@profitbricks.com wrote: On 28/08/14 02:56, Sage Weil wrote: I seem to remember someone telling me there were hooks/hints you could call that would tag either a socket or possibly data on that socket with a label for use by iptables and such.. but I forget what it was. Something like setsockopt() SO_MARK? *SO_MARK *(since Linux 2.6.25) Set the mark for each packet sent through this socket (similar to the netfilter MARK target but socket-based). Changing the mark can be used for mark-based routing without netfilter or for packet filtering. Setting this option requires the *CAP_NET_ADMIN *capability. Alternatively, directly set IP_TOS options on the socket, or SO_PRIORITY which sets the IP TOS bits as well. 
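For reference, the DSCP marking being discussed boils down to a single setsockopt() call; here is a minimal Python sketch of the same mechanism (this is an illustration, not the actual Pipe.cc code, and the constant names are my own):

```python
import socket

# DSCP CS6 (network control) is 48; the TOS byte carries the DSCP in
# its upper six bits, so the value handed to IP_TOS is 48 << 2 == 0xC0.
DSCP_CS6 = 48
TOS_CS6 = DSCP_CS6 << 2

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_CS6)

# Read the option back to confirm the kernel accepted the marking.
tos_value = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(tos_value)  # 192 (0xC0) on Linux
sock.close()
```

Unlike SO_MARK or high SO_PRIORITY values, setting IP_TOS does not require CAP_NET_ADMIN, which is presumably why the heartbeat code can use it unconditionally.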
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
On Mon, 9 Mar 2015, Karan Singh wrote: Thanks Guys kernel.pid_max=4194303 did the trick. Great to hear! Sorry we missed that you only had it at 65536. This is a really common problem that people hit when their clusters start to grow. Is there somewhere in the docs we can put this to catch more users? Or maybe a warning issued by the osds themselves or something if they see limits that are low? sage - Karan - On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Karan, as you are actually writing in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has about 66,000 threads in an idle situation (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think has triggered your problem. Set the kernel.pid_max setting to 4194303 (the maximum) like Azad Aliyar suggested, and the problem should be gone. Regards, Christian Am 09.03.2015 11:41, schrieb Karan Singh: Hello Community, I need help to fix a long-running Ceph problem. The cluster is unhealthy; multiple OSDs are DOWN. When I try to restart OSDs I get this error: 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970 common/Thread.cc: 129: FAILED assert(ret == 0) Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5, kernel 3.17.2-1.el6.elrepo.x86_64. Tried upgrading from 0.80.7 to 0.80.8 but no luck. Tried the CentOS stock kernel 2.6.32 but no luck. Memory is not a problem; more than 150 GB is free. Has anyone ever faced this problem?
Cluster status

 cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
 health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
 monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
 osdmap e26633: 239 osds: 85 up, 196 in
 pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
 4699 GB used, 707 TB / 711 TB avail
 6061/31080 objects degraded (19.501%)
 14 down+remapped+peering
 39 active
 3289 active+clean
 547 peering
 663 stale+down+peering
 705 stale+active+remapped
 1 active+degraded+remapped
 1 stale+down+incomplete
 484 down+peering
 455 active+remapped
 3696 stale+active+degraded
 4 remapped+peering
 23 stale+down+remapped+peering
 51 stale+active
 3637 active+degraded
 3799 stale+active+clean

OSD logs

2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)

 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
 2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
 3: (Accepter::entry()+0x265) [0xb5c635]
 4: /lib64/libpthread.so.0() [0x3c8a6079d1]
 5: (clone()+0x6d) [0x3c8a2e89dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

More information at Ceph Tracker issue: http://tracker.ceph.com/issues/10988#change-49018

Karan Singh Systems Specialist, Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel. +358 9 4572001 fax +358 9 4572302
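A check along the lines Sage and Tony suggest could be as simple as comparing the thread count against kernel.pid_max; a hypothetical sketch (the function name and 10% threshold are made up for illustration, this is not actual Ceph code):

```python
def pid_max_warning(threads_in_use, pid_max, margin=0.10):
    """Return a HEALTH_WARN-style string if the number of threads/PIDs
    in use is within `margin` (e.g. 10%) of the kernel.pid_max limit,
    else None."""
    if threads_in_use >= pid_max * (1.0 - margin):
        return ("HEALTH_WARN: %d of %d pids in use; consider raising "
                "kernel.pid_max (max 4194303)" % (threads_in_use, pid_max))
    return None

# An idle 60-OSD server with ~60,000 threads is already within 10% of
# the 65536 limit Karan had, but nowhere near the 4194303 maximum:
print(pid_max_warning(60000, 65536))    # warns
print(pid_max_warning(60000, 4194303))  # None
```

On a live node the inputs could come from /proc/sys/kernel/pid_max and a count of entries in /proc, but the comparison itself is the whole idea.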
Re: [ceph-users] S3 RadosGW - Create bucket OP
Yehuda Sadeh-Weinraub yehuda@... writes: If you're using apache, then it filters out zero Content-Length. Nothing much radosgw can do about it. You can try using the radosgw civetweb frontend, see if it changes anything. Thanks, but no difference... Req: PUT /mssCl/ HTTP/1.1 Host: rgw.gsp.sprawl.dk:7480 Authorization: AWS auth id Date: Mon, 09 Mar 2015 20:18:16 GMT Content-Length: 0 Response: HTTP/1.1 200 OK Content-type: application/xml Content-Length: 0 App still says: S3.createBucket, class S3, code UnexpectedContent, message Inconsistency in S3 response. error response is not a valid xml message :/ Yehuda, any comments on issue 2 below? 2. On every create-bucket op the GW creates what look like new containers for ACLs in the .rgw pool; is this normal, or how can I avoid multiple such objects cluttering the GW pools? Is there something wrong, since I get multiple ACL objects for this bucket every time my app tries to recreate the same bucket, or is this a feature/bug in radosGW? # rados -p .rgw ls .bucket.meta.mssCl:default.6309817.1 .bucket.meta.mssCl:default.6187712.3 .bucket.meta.mssCl:default.6299841.7 .bucket.meta.mssCl:default.6309817.5 .bucket.meta.mssCl:default.6187712.2 .bucket.meta.mssCl:default.6187712.19 .bucket.meta.mssCl:default.6187712.12 mssCl ... # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12 ceph.objclass.version user.rgw.acl /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Tr : RadosGW - Bucket link and ACLs
Yeah, I was thinking about that, and it will be the alternative for me too... Regards. Italo Santos http://italosantos.com.br/ On Friday, March 6, 2015 at 18:20, ghislain.cheval...@orange.com wrote: Original message From: CHEVALIER Ghislain IMT/OLPS ghislain.cheval...@orange.com Date: 06/03/2015 21:56 (GMT+01:00) To: Italo Santos okd...@gmail.com Cc: Subject: RE: [ceph-users] RadosGW - Bucket link and ACLs Hi, We encountered this behavior when developing the rgw admin module in inkscope, and we fixed it as follows: as you created the user access key and secret key with the admin user, it seems better to create the bucket with these credentials. Best regards Sent from my Galaxy Ace4 Orange Original message From: Italo Santos okd...@gmail.com Date: 06/03/2015 20:52 (GMT+01:00) To: ceph-users@lists.ceph.com Cc: Subject: [ceph-users] RadosGW - Bucket link and ACLs Hello, I'm building an object storage environment and I'm in trouble with some administration ops. To manage the entire environment I decided to create an admin user and use that to manage the client users which I'll create further. Using the admin (called “italux”) I created a new user (called “cliente”) and after that I created a new bucket with the admin user (called “cliente-bucket”). After that, still using the admin, I changed the permissions of the “cliente-bucket” (which is owned by the admin), granting FULL_CONTROL to the “cliente” user.
So, using the admin API I unlinked the “cliente-bucket” from the admin user and linked it to the “cliente” user, changing the ownership of the bucket: In [86]: url = 'http://radosgw.example.com/admin/bucket?format=json&bucket=cliente-bucket' In [87]: r = requests.get(url, auth=S3Auth(access_key, secret_key, server)) In [88]: r.content Out[88]: '{"bucket":"cliente-bucket","pool":".rgw.buckets","index_pool":".rgw.buckets.index","id":"default.4361528.1","marker":"default.4361528.1","owner":"cliente","ver":1,"master_ver":0,"mtime":1425670280,"max_marker":"","usage":{},"bucket_quota":{"enabled":false,"max_size_kb":-1,"max_objects":-1}}' After that, when I try to change the permissions/ACLs of the bucket using the “cliente” user, I get AccessDenied. Looking at the raw debug logs it seems that the owner of the bucket wasn't changed. Anyone know why? RadosGW debug logs: 2015-03-06 16:32:55.943167 7fd32bf57700 1 == starting new request req=0x3cf78a0 = 2015-03-06 16:32:55.943183 7fd32bf57700 2 req 2:0.16::PUT /::initializing 2015-03-06 16:32:55.943189 7fd32bf57700 10 host=cliente-bucket.radosgw.example.com rgw_dns_name=object-storage.locaweb.com.br 2015-03-06 16:32:55.943220 7fd32bf57700 10 s->object=NULL s->bucket=cliente-bucket 2015-03-06 16:32:55.943225 7fd32bf57700 2 req 2:0.57:s3:PUT /::getting op 2015-03-06 16:32:55.943230 7fd32bf57700 2 req 2:0.62:s3:PUT /:put_acls:authorizing 2015-03-06 16:32:55.943269 7fd32bf57700 10 get_canon_resource(): dest=/cliente-bucket/?acl 2015-03-06 16:32:55.943272 7fd32bf57700 10 auth_hdr: PUT Fri, 06 Mar 2015 19:32:55 GMT /cliente-bucket/?acl 2015-03-06 16:32:55.943370 7fd32bf57700 15 calculated digest=xtSrQR+GsHyqjqGLdiPmjoP62x4= 2015-03-06 16:32:55.943375 7fd32bf57700 15 auth_sign=xtSrQR+GsHyqjqGLdiPmjoP62x4= 2015-03-06 16:32:55.943377 7fd32bf57700 15 compare=0 2015-03-06 16:32:55.943384 7fd32bf57700 2 req 2:0.000216:s3:PUT /:put_acls:reading permissions 2015-03-06 16:32:55.943425 7fd32bf57700 15 Read AccessControlPolicy <AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>italux</ID><DisplayName>Italo Santos</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>cliente</ID><DisplayName>Cliente</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy> 2015-03-06 16:32:55.943441 7fd32bf57700 2 req 2:0.000273:s3:PUT /:put_acls:init op 2015-03-06 16:32:55.943447 7fd32bf57700 2 req 2:0.000280:s3:PUT /:put_acls:verifying op mask 2015-03-06 16:32:55.943451 7fd32bf57700 20 required_mask= 2 user.op_mask=7 2015-03-06 16:32:55.943453 7fd32bf57700 2 req 2:0.000286:s3:PUT /:put_acls:verifying op permissions 2015-03-06 16:32:55.943457 7fd32bf57700 5 Searching permissions for uid=cliente mask=56 2015-03-06 16:32:55.943461 7fd32bf57700 5 Found permission: 15 2015-03-06 16:32:55.943462 7fd32bf57700 5 Searching permissions for group=1 mask=56 2015-03-06 16:32:55.943464 7fd32bf57700 5 Permissions for group not found 2015-03-06 16:32:55.943466 7fd32bf57700 5 Searching permissions for group=2 mask=56 2015-03-06 16:32:55.943468 7fd32bf57700 5 Permissions for group not found 2015-03-06 16:32:55.943469 7fd32bf57700
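One way to verify whether the ownership actually changed is to parse the ACL document that radosgw returns and inspect the Owner element; a small stdlib sketch, using a sample document shaped like the one in the log above (not live output):

```python
import xml.etree.ElementTree as ET

S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

acl_xml = """<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner><ID>italux</ID><DisplayName>Italo Santos</DisplayName></Owner>
  <AccessControlList>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="CanonicalUser">
        <ID>cliente</ID><DisplayName>Cliente</DisplayName>
      </Grantee>
      <Permission>FULL_CONTROL</Permission>
    </Grant>
  </AccessControlList>
</AccessControlPolicy>"""

root = ET.fromstring(acl_xml)
# The Owner/ID is what the put_acls request is authorized against.
owner_id = root.find(S3_NS + "Owner/" + S3_NS + "ID").text
print(owner_id)  # italux
```

Here the Owner is still "italux" even though FULL_CONTROL was granted to "cliente", which matches the AccessDenied symptom: the grant changed, but the ownership did not.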
Re: [ceph-users] RadosGW - Create bucket via admin API
Hello Georgios, I thought there was some admin alternative to do that, but I realised there isn't, since the bucket belongs to a specific user. So the alternative is, after creating the user, to authenticate with the created credentials and create the bucket. Thanks. Regards. Italo Santos http://italosantos.com.br/ On Friday, March 6, 2015 at 07:40, Georgios Dimitrakakis wrote: Hi Italo, Check the S3 Bucket OPS at: http://ceph.com/docs/master/radosgw/s3/bucketops/ or use any of the examples provided in Python (http://ceph.com/docs/master/radosgw/s3/python/) or PHP (http://ceph.com/docs/master/radosgw/s3/php/) or Java (http://ceph.com/docs/master/radosgw/s3/java/) or anything else that is provided through the S3 API (http://ceph.com/docs/master/radosgw/s3/) Regards, George Hello guys, In the adminops documentation I saw how to remove a bucket, but I can't find the URI to create one. I'd like to know if this is possible? Regards. ITALO SANTOS http://italosantos.com.br/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs
On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote: You'll probably have to recreate osds with the same ids (empty ones), let them boot, stop them, and mark them lost. There is a feature in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976 -Sam Thanks Sam, I've re-added the OSDs and they became unblocked, but there are still the same number of pgs stuck. I looked at them in some more detail and it seems they all have num_bytes='0'. Tried a repair too, for good measure. Still nothing, I'm afraid. Does this mean some underlying catastrophe has happened and they are never going to recover? Following on, would that cause data loss? There are no missing objects, and I'm hoping there's appropriate checksumming / replicas to balance that out, but now I'm not so sure. Thanks again, Joel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Hello Community, I need help to fix a long-running Ceph problem. The cluster is unhealthy; multiple OSDs are DOWN. When I try to restart OSDs I get this error:

2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)

Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5, kernel 3.17.2-1.el6.elrepo.x86_64. Tried upgrading from 0.80.7 to 0.80.8 but no luck. Tried the CentOS stock kernel 2.6.32 but no luck. Memory is not a problem; more than 150 GB is free. Has anyone ever faced this problem?

Cluster status

 cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
 health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
 monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
 osdmap e26633: 239 osds: 85 up, 196 in
 pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
 4699 GB used, 707 TB / 711 TB avail
 6061/31080 objects degraded (19.501%)
 14 down+remapped+peering
 39 active
 3289 active+clean
 547 peering
 663 stale+down+peering
 705 stale+active+remapped
 1 active+degraded+remapped
 1 stale+down+incomplete
 484 down+peering
 455 active+remapped
 3696 stale+active+degraded
 4 remapped+peering
 23 stale+down+remapped+peering
 51 stale+active
 3637 active+degraded
 3799 stale+active+clean

OSD logs

2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)

 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
 2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
 3: (Accepter::entry()+0x265) [0xb5c635]
 4: /lib64/libpthread.so.0() [0x3c8a6079d1]
 5: (clone()+0x6d) [0x3c8a2e89dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

More information at Ceph Tracker issue: http://tracker.ceph.com/issues/10988#change-49018

Karan Singh Systems Specialist, Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel. +358 9 4572001 fax +358 9 4572302 http://www.csc.fi/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph mds zombie
Hi, On 09/03/2015 04:06, kenmasida wrote: I have resolved the problem, thank you very much. When I use ceph-fuse to mount the client, it works well. Good news, but can you give the kernel version of your cephfs client OS? Like you, I had a problem with cephfs on the client side, and it probably comes from the 3.16 kernel of my cephfs clients, because (like you) my problem didn't occur with ceph-fuse or with a 3.13 kernel. -- François Lafont ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Check Max Threadcount: If you have a node with a lot of OSDs, you may be hitting the default maximum number of threads (e.g., usually 32k), especially during recovery. You can use sysctl to see if increasing the maximum number of threads to the maximum allowed value (i.e., 4194303) helps. For example: sysctl -w kernel.pid_max=4194303 If increasing the maximum thread count resolves the issue, you can make it permanent by including a kernel.pid_max setting in the /etc/sysctl.conf file. For example: kernel.pid_max = 4194303 On Mon, Mar 9, 2015 at 4:11 PM, Karan Singh karan.si...@csc.fi wrote: Hello Community, I need help to fix a long-running Ceph problem. The cluster is unhealthy; multiple OSDs are DOWN. When I try to restart OSDs I get this error: 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970 common/Thread.cc: 129: FAILED assert(ret == 0) Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5, kernel 3.17.2-1.el6.elrepo.x86_64. Tried upgrading from 0.80.7 to 0.80.8 but no luck. Tried the CentOS stock kernel 2.6.32 but no luck. Memory is not a problem; more than 150 GB is free. Has anyone ever faced this problem?
Cluster status

 cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
 health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
 monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
 osdmap e26633: 239 osds: 85 up, 196 in
 pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
 4699 GB used, 707 TB / 711 TB avail
 6061/31080 objects degraded (19.501%)
 14 down+remapped+peering
 39 active
 3289 active+clean
 547 peering
 663 stale+down+peering
 705 stale+active+remapped
 1 active+degraded+remapped
 1 stale+down+incomplete
 484 down+peering
 455 active+remapped
 3696 stale+active+degraded
 4 remapped+peering
 23 stale+down+remapped+peering
 51 stale+active
 3637 active+degraded
 3799 stale+active+clean

OSD logs

2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
common/Thread.cc: 129: FAILED assert(ret == 0)

 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
 2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
 3: (Accepter::entry()+0x265) [0xb5c635]
 4: /lib64/libpthread.so.0() [0x3c8a6079d1]
 5: (clone()+0x6d) [0x3c8a2e89dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

More information at Ceph Tracker issue: http://tracker.ceph.com/issues/10988#change-49018

Karan Singh Systems Specialist, Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O.
Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758, tel. +358 9 4572001, fax +358 9 4572302
http://www.csc.fi/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Warm Regards,
Azad Aliyar
Linux Server Engineer, SparkSupport
Email: azad.ali...@sparksupport.com | Skype: spark.azad
http://www.sparksupport.com | http://www.sparkmycloud.com
3rd Floor, Leela Infopark, Phase-2, Kakanad, Kochi-30, Kerala, India
Phone: +91 484 6561696, Mobile: +91 8129270421
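Before raising anything with sysctl, it can help to see how close a node actually is to the limit. This is a minimal sketch (not from the original advice) that compares the number of kernel tasks currently in use on an OSD node with kernel.pid_max, using only /proc:

```shell
# Count kernel tasks (processes + threads) on this node and compare
# against kernel.pid_max. OSDs fail with "Thread::create ... FAILED
# assert(ret == 0)" when this limit is exhausted, e.g. during recovery.
used=$(ls -d /proc/[0-9]*/task/[0-9]* 2>/dev/null | wc -l)
max=$(cat /proc/sys/kernel/pid_max)
echo "threads in use: $used, kernel.pid_max: $max"
```

If "threads in use" approaches pid_max on a dense node, the sysctl change above is the likely fix.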
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
Great Karan.

On Mon, Mar 9, 2015 at 9:32 PM, Karan Singh karan.si...@csc.fi wrote:

Thanks guys, kernel.pid_max=4194303 did the trick.

- Karan -

On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote:

Hi Karan,

as you are actually writing in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has about 66,000 threads in an idle state (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think is what triggered your problem. Set kernel.pid_max to 4194303 (the maximum), as Azad Aliyar suggested, and the problem should be gone.

Regards,
Christian

Am 09.03.2015 11:41, schrieb Karan Singh:
[quoted text trimmed; the full report, cluster status and OSD log appear in the original message above]

--
Christian Eichelmann
Systemadministrator
1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de
Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren

--
Warm Regards,
Azad Aliyar
Linux Server Engineer, SparkSupport
Email: azad.ali...@sparksupport.com | Skype: spark.azad
http://www.sparksupport.com
[ceph-users] how to improve seek time using hammer-test release
Hello all,

I just set up a single-node Ceph cluster with no replication to familiarize myself with Ceph, using two Intel S3500 800 GB SSDs, 8 GB RAM and a 16-core CPU. The OS is Ubuntu 14.04 64-bit, and the rbd kernel module is loaded (modprobe rbd).

When running bonnie++ against /dev/rbd0 it shows a seek rate of 892.2/s. How can the seek rate be improved? If I run 5 bonnie++ instances on /mnt, where /dev/rbd0 is mounted as ext4, seeks/s drop to 500/s. I am trying to achieve over 1000 seeks/s for each thread. What can I do to improve performance?

Tried the following (all parameters found in the mailing list archives):
- scheduler set to noop
- filesystem set to btrfs
- debugging set to 0/0 (this showed some noticeable difference)

Would configuring the SSDs in RAID 0, with a single OSD on top of the RAID 0 device, improve this?

Regards,
Kevin
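For what it's worth, bonnie++'s random-seek phase can be approximated with fio, which makes it easier to vary queue depth when chasing seeks/s. This is only a sketch; the device path and all parameters are assumptions to adjust for your setup:

```shell
# Random 4k reads against the mapped RBD device, roughly matching
# bonnie++'s seek test; --readonly avoids writing to the device.
DEV=/dev/rbd0   # assumption: your mapped RBD device
if [ -b "$DEV" ] && command -v fio >/dev/null 2>&1; then
    fio --name=seek --filename="$DEV" --rw=randread --bs=4k \
        --iodepth=32 --runtime=30 --time_based --direct=1 \
        --ioengine=libaio --readonly
else
    echo "skipping: $DEV not present or fio not installed"
fi
```

Raising --iodepth or --numjobs shows whether the bottleneck is per-thread latency or total device IOPS.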
Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread
2015-03-10 3:01 GMT+08:00 Sage Weil s...@newdream.net:

On Mon, 9 Mar 2015, Karan Singh wrote:
Thanks guys, kernel.pid_max=4194303 did the trick.

Great to hear! Sorry we missed that you only had it at 65536. This is a really common problem that people hit when their clusters start to grow. Is there somewhere in the docs we can put this to catch more users? Or maybe a warning issued by the OSDs themselves if they see limits that are low?

sage

Um, I think we can add the command to the shell script /etc/init.d/ceph, something like the way the max fd limit is handled (ulimit -n 32768). Then, if we use "service ceph start osd.*" to start the OSDs, the limit will automatically be raised to the proper value.

- Karan -

On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote:

Hi Karan, as you are actually writing in your own book, the problem is the sysctl setting kernel.pid_max. I've seen in your bug report that you were setting it to 65536, which is still too low for high-density hardware. In our cluster, one OSD server has about 66,000 threads in an idle state (60 OSDs per server). The number of threads increases when you increase the number of placement groups in the cluster, which I think is what triggered your problem. Set kernel.pid_max to 4194303 (the maximum), as Azad Aliyar suggested, and the problem should be gone.

[quoted text trimmed; the full report, cluster status and OSD log appear earlier in this thread]

More information at the Ceph tracker issue: http://tracker.ceph.com/issues/10988#change-49018
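The init-script idea proposed above could look something like the following. This is only a sketch of the suggestion, not actual Ceph code; the sysctl call is commented out since it requires root:

```shell
# Hypothetical fragment for /etc/init.d/ceph, mirroring the existing
# "ulimit -n 32768" handling: raise kernel.pid_max before starting OSDs.
wanted=4194303                              # kernel maximum on 64-bit
current=$(cat /proc/sys/kernel/pid_max)
if [ "$current" -lt "$wanted" ]; then
    echo "kernel.pid_max is $current; raising to $wanted"
    # sysctl -w kernel.pid_max="$wanted"    # requires root
fi
```

Running this on every "service ceph start" would make the fix automatic rather than depending on each admin knowing about pid_max.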
Re: [ceph-users] Prioritize Heartbeat packets
Jian,

Thanks for the clarification. I'll mark traffic destined for the monitors as well.

We are getting ready to put our first cluster into production. If you are interested, we will be testing the heartbeat priority to see if we can saturate the network (not an easy task for 40 Gb) and keep the cluster from falling apart. Our network team is marking CoS based on the DSCP and enforcing priority. We have three VLANs on bonded 40 GbE: management, storage (monitors, clients, OSDs), and cluster (replication). We have three priority classes: management (heartbeats on all VLANs, SSH, DNS, etc.), storage traffic (no marking), and replication (scavenger class). We are interested to see how things pan out.

Thanks,
Robert

On Mon, Mar 9, 2015 at 8:58 PM, Jian Wen wenjia...@gmail.com wrote: Only OSD calls set_socket_priority(). See https://github.com/ceph/ceph/pull/3353 On Tue, Mar 10, 2015 at 3:36 AM, Robert LeBlanc rob...@leblancnet.us wrote: I've found commit 9b9a682fe035c985e416ee1c112fa58f9045a27c and I see that when 'osd heartbeat use min delay socket = true' is set, it will mark the packet with DSCP CS6. Based on the setting of the socket in msg/simple/Pipe.cc, is it possible that this can apply to both the OSD and the monitor? I don't understand the code well enough to know how set_socket_options() is called from the OSD and monitor. If this applies to both monitor and OSD, would it be better to rename the option to a more generic name? Thanks, On Sat, Mar 7, 2015 at 4:23 PM, Daniel Swarbrick daniel.swarbr...@gmail.com wrote: Judging by the commit, this ought to do the trick: osd heartbeat use min delay socket = true On 07/03/15 01:20, Robert LeBlanc wrote: I see that Jian Wen has done work on this for 0.94. I tried looking through the code to see if I can figure out how to configure this new option, but it all went over my head pretty quickly. Can I get a brief summary of how to set the priority of heartbeat packets, or where to look in the code to figure it out?
Thanks,

On Thu, Aug 28, 2014 at 2:01 AM, Daniel Swarbrick daniel.swarbr...@profitbricks.com wrote: On 28/08/14 02:56, Sage Weil wrote: I seem to remember someone telling me there were hooks/hints you could call that would tag either a socket or possibly data on that socket with a label for use by iptables and such, but I forget what it was. Something like setsockopt() SO_MARK?

SO_MARK (since Linux 2.6.25)
    Set the mark for each packet sent through this socket (similar to the netfilter MARK target, but socket-based). Changing the mark can be used for mark-based routing without netfilter, or for packet filtering. Setting this option requires the CAP_NET_ADMIN capability.

Alternatively, directly set IP_TOS options on the socket, or SO_PRIORITY, which sets the IP TOS bits as well.

--
Best,
Jian
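Once 'osd heartbeat use min delay socket = true' is in place, the CS6 marking discussed above can be verified on the wire. This is a sketch; the interface name is an assumption, and CS6 (DSCP 48) corresponds to a TOS byte of 0xc0:

```shell
# Capture a few packets whose DSCP field equals CS6: the top six bits
# of the IP TOS byte (ip[1]) are 110000, so (tos & 0xfc) == 0xc0.
IFACE=${IFACE:-eth0}                       # assumption: cluster-facing NIC
FILTER='ip and (ip[1] & 0xfc) == 0xc0'
if command -v tcpdump >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    tcpdump -ni "$IFACE" -c 5 "$FILTER"
else
    echo "need root and tcpdump; would capture with filter: $FILTER"
fi
```

Seeing heartbeat packets match this filter confirms the marking survives as far as the capture point; whether switches honor it is a separate QoS question.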
Re: [ceph-users] Prioritize Heartbeat packets
Only OSD calls set_socket_priority(). See https://github.com/ceph/ceph/pull/3353

On Tue, Mar 10, 2015 at 3:36 AM, Robert LeBlanc rob...@leblancnet.us wrote:
[quoted text trimmed; the same exchange is quoted in full in the previous message]

--
Best,
Jian
[ceph-users] rados import error: short write
We used `rados export poolA /opt/zs.rgw-buckets` to export the pool named poolA from our Ceph cluster into the local directory /opt/zs.rgw-buckets, and then tried to import that directory into a pool named hello on another Ceph cluster. The import fails as follows:

$ rados import /opt/zs.rgw-buckets hello --create
[ERROR] upload: rados_write error: short write
[ERROR] upload error: -5

The directory /opt/zs.rgw-buckets includes Chinese characters. How can we solve this problem when migrating a rados pool?
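For reference, the two halves of the migration described above can be sketched as below; this assumes the rados CLI and each cluster's conf/keyring are available on the host running the commands, and uses the pool names and paths from the report:

```shell
# Export a pool to a local directory, then import that directory into a
# (newly created) pool on the destination cluster. Guarded so the sketch
# is a no-op on hosts without the rados CLI.
if command -v rados >/dev/null 2>&1; then
    rados export poolA /opt/zs.rgw-buckets
    rados import /opt/zs.rgw-buckets hello --create
    result="import attempted"
else
    result="rados CLI not available; skipping"
fi
echo "$result"
```

The "short write" error itself (-5, EIO) comes from the destination cluster rejecting a write, so checking the destination OSD logs during the import may say more than the client-side message does.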
Re: [ceph-users] S3 RadosGW - Create bucket OP
Steffen W Sørensen stefws@... writes:

Response:

HTTP/1.1 200 OK
Date: Fri, 06 Mar 2015 10:41:14 GMT
Server: Apache/2.2.22 (Fedora)
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xml

This response makes the app say:

S3.createBucket, class S3, code UnexpectedContent, message "Inconsistency in S3 response. error response is not a valid xml message"

Is our S3 gateway not responding properly? Why doesn't the radosgw return a Content-Length: 0 header when the body is empty? http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html Maybe this is confusing my app into expecting some XML in the body.

2. At every create-bucket operation the gateway creates what look like new objects for ACLs in the .rgw pool. Is this normal, or how can I avoid such multiple objects cluttering the gateway pools? Is something wrong, since I get multiple ACL objects for this bucket every time my app tries to recreate the same bucket, or is this a feature/bug in radosgw?

# rados -p .rgw ls
.bucket.meta.mssCl:default.6309817.1
.bucket.meta.mssCl:default.6187712.3
.bucket.meta.mssCl:default.6299841.7
.bucket.meta.mssCl:default.6309817.5
.bucket.meta.mssCl:default.6187712.2
.bucket.meta.mssCl:default.6187712.19
.bucket.meta.mssCl:default.6187712.12
mssCl
...

# rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12
ceph.objclass.version
user.rgw.acl

/Steffen
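To see exactly which headers the gateway returns on the create-bucket PUT (and confirm whether Content-Length is present), the raw response headers can be dumped with curl. The endpoint below is a placeholder, and an unauthenticated PUT will normally be rejected, but the headers of whatever response comes back are still visible:

```shell
# Dump only the response headers of a PUT to the bucket URL.
URL=${RGW_URL:-http://rgw.example.com/mssCl}   # placeholder endpoint
resp=$(curl -s -m 5 -D - -o /dev/null -X PUT "$URL" 2>/dev/null \
       || echo "no response (gateway unreachable)")
echo "$resp"
```

Comparing these headers against a response from S3 itself would show whether the app's XML-body expectation is triggered by the chunked encoding, the missing Content-Length, or something else.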
[ceph-users] Ceph node operating system high availability and osd restoration best practices.
Hi,

I have a 4-node Ceph cluster; the operating system used on the nodes is Ubuntu 14.04. The cluster currently has 12 OSDs spread across the 4 nodes. One of the nodes has just been restored after an operating-system file-system corruption, which made the node and the OSDs on it inaccessible to the rest of the cluster. I had to re-install the operating system to make the node accessible, and I am currently in the process of restoring the OSDs on the re-installed node.

I have 3 questions:
1) Is there any mechanism to provide node operating-system high availability on a Ceph cluster?
2) Are there any best practices to follow while restoring the OSDs on a node that has been rebuilt after an operating-system crash?
3) Is there any way to check that the data stored on the Ceph cluster is safe and was replicated to the other 3 nodes while one node was down?

Regards,
--
Vivek