Re: [ceph-users] ceph-giant installation error on centos 6.6

2015-02-17 Thread Brad Hubbard

On 02/18/2015 12:43 PM, Wenxiao He wrote:


Hello,

I need some help as I am getting package dependency errors when trying to 
install ceph-giant on centos 6.6. See below for repo files and also the yum 
install output.




---> Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
--> Finished Dependency Resolution
Error: Package: 1:librbd1-0.87-0.el6.x86_64 (Ceph)
Requires: liblttng-ust.so.0()(64bit)
Error: Package: gperftools-libs-2.0-11.el6.3.x86_64 (Ceph)
Requires: libunwind.so.8()(64bit)
Error: Package: 1:librados2-0.87-0.el6.x86_64 (Ceph)
Requires: liblttng-ust.so.0()(64bit)
Error: Package: 1:ceph-0.87-0.el6.x86_64 (Ceph)
Requires: liblttng-ust.so.0()(64bit)


Looks like you may need to install libunwind and lttng-ust from EPEL 6?

They seem to be the packages that supply liblttng-ust.so.0 and libunwind.so.8, so you
could try installing those from EPEL 6 and see how that goes?

Note that this should not be taken as the, or even an, authoritative answer :)
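A minimal sketch of the EPEL approach suggested above (the epel-release URL and the package names are assumptions, not from the original thread; verify with "yum provides" before installing):

```shell
# Enable EPEL 6, then confirm which packages supply the missing libraries
sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
yum provides 'liblttng-ust.so.0()(64bit)' 'libunwind.so.8()(64bit)'

# Install the providers (names assumed; use whatever "yum provides" reported)
sudo yum install lttng-ust libunwind
```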

Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unexpectedly low number of concurrent backfills

2015-02-17 Thread Gregory Farnum
On Tue, Feb 17, 2015 at 9:48 PM, Florian Haas  wrote:
> On Tue, Feb 17, 2015 at 11:19 PM, Gregory Farnum  wrote:
>> On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas  wrote:
>>> Hello everyone,
>>>
>>> I'm seeing some OSD behavior that I consider unexpected; perhaps
>>> someone can shed some insight.
>>>
>>> Ceph giant (0.87.0), osd max backfills and osd recovery max active
>>> both set to 1.
>>>
>>> Please take a moment to look at the following "ceph health detail" screen 
>>> dump:
>>>
>>> HEALTH_WARN 14 pgs backfill; 1 pgs backfilling; 15 pgs stuck unclean;
>>> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
>>> objects misplaced (0.499%)
>>> pg 20.3db is stuck unclean for 13547.432043, current state
>>> active+remapped+wait_backfill, last acting [45,90,157]
>>> pg 15.318 is stuck unclean for 13547.380581, current state
>>> active+remapped+wait_backfill, last acting [41,17,120]
>>> pg 15.34a is stuck unclean for 13548.115170, current state
>>> active+remapped+wait_backfill, last acting [64,87,80]
>>> pg 20.6f is stuck unclean for 13548.019218, current state
>>> active+remapped+wait_backfill, last acting [13,38,98]
>>> pg 20.44c is stuck unclean for 13548.075430, current state
>>> active+remapped+wait_backfill, last acting [174,127,139]
>>> pg 20.bc is stuck unclean for 13545.743397, current state
>>> active+remapped+wait_backfill, last acting [72,64,104]
>>> pg 15.1ac is stuck unclean for 13548.181461, current state
>>> active+remapped+wait_backfill, last acting [121,145,84]
>>> pg 15.1af is stuck unclean for 13547.962269, current state
>>> active+remapped+backfilling, last acting [150,62,101]
>>> pg 20.396 is stuck unclean for 13547.835109, current state
>>> active+remapped+wait_backfill, last acting [134,49,96]
>>> pg 15.1ba is stuck unclean for 13548.128752, current state
>>> active+remapped+wait_backfill, last acting [122,63,162]
>>> pg 15.3fd is stuck unclean for 13547.644431, current state
>>> active+remapped+wait_backfill, last acting [156,38,131]
>>> pg 20.41c is stuck unclean for 13548.133470, current state
>>> active+remapped+wait_backfill, last acting [78,85,168]
>>> pg 20.525 is stuck unclean for 13545.272774, current state
>>> active+remapped+wait_backfill, last acting [76,57,148]
>>> pg 15.1ca is stuck unclean for 13547.944928, current state
>>> active+remapped+wait_backfill, last acting [157,19,36]
>>> pg 20.11e is stuck unclean for 13545.368614, current state
>>> active+remapped+wait_backfill, last acting [36,134,8]
>>> pg 20.525 is active+remapped+wait_backfill, acting [76,57,148]
>>> pg 20.44c is active+remapped+wait_backfill, acting [174,127,139]
>>> pg 20.41c is active+remapped+wait_backfill, acting [78,85,168]
>>> pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
>>> pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
>>> pg 20.396 is active+remapped+wait_backfill, acting [134,49,96]
>>> pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
>>> pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
>>> pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
>>> pg 15.1ba is active+remapped+wait_backfill, acting [122,63,162]
>>> pg 15.1ac is active+remapped+wait_backfill, acting [121,145,84]
>>> pg 15.1af is active+remapped+backfilling, acting [150,62,101]
>>> pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
>>> pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
>>> pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
>>> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
>>> objects misplaced (0.499%)
>>>
>>> As you can see, there is barely any overlap between the acting OSDs
>>> for those PGs. osd max backfills should only limit the number of
>>> concurrent backfills out of a single OSD, and so in the situation
>>> above I would expect the 15 backfills to happen mostly concurrently.
>>> As it is they are being serialized, and that seems to needlessly slow
>>> down the process and extend the time needed to complete recovery.
>>>
>>> I'm pretty sure I'm missing something obvious here, but what is it?
>>
>> The max backfill values cover both incoming and outgoing backfills.
>> Presumably these are all waiting on a small set of target OSDs which
>> are currently receiving backfills of some other PG.
>
> Thanks for the reply, and I am aware of that, but I am not sure how it
> applies here.
>
> What I quoted was the complete list of then-current backfills in the
> cluster. Those are *all* the PGs affected by backfills. And they're so
> scattered across OSDs that there is barely any overlap. The only OSDs
> I even see listed twice are 38 and 64, which would affect PGs
> 15.3fd/20.6f 15.34a/20.bc. What is causing the others to wait?
>
> Or am I misunderstanding the "acting" value here and some other OSDs
> are involved, and if so, how would I find out what those are?

Yes, unless I'm misremembering. Look at the pg dump for those PGs and
check out the "up" versus "acting" values. The "act
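The up-versus-acting comparison suggested above can be done along these lines (a sketch; the pg ID is taken from the listing earlier, and output formatting varies by release):

```shell
# Show the "up" (target) and "acting" (current) OSD sets for one PG
ceph pg map 15.1af

# Or dump per-PG state, including up and acting columns, for the whole cluster
ceph pg dump pgs_brief
```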

Re: [ceph-users] Unexpectedly low number of concurrent backfills

2015-02-17 Thread Florian Haas
On Tue, Feb 17, 2015 at 11:19 PM, Gregory Farnum  wrote:
> On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas  wrote:
>> Hello everyone,
>>
>> I'm seeing some OSD behavior that I consider unexpected; perhaps
>> someone can shed some insight.
>>
>> Ceph giant (0.87.0), osd max backfills and osd recovery max active
>> both set to 1.
>>
>> Please take a moment to look at the following "ceph health detail" screen 
>> dump:
>>
>> HEALTH_WARN 14 pgs backfill; 1 pgs backfilling; 15 pgs stuck unclean;
>> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
>> objects misplaced (0.499%)
>> pg 20.3db is stuck unclean for 13547.432043, current state
>> active+remapped+wait_backfill, last acting [45,90,157]
>> pg 15.318 is stuck unclean for 13547.380581, current state
>> active+remapped+wait_backfill, last acting [41,17,120]
>> pg 15.34a is stuck unclean for 13548.115170, current state
>> active+remapped+wait_backfill, last acting [64,87,80]
>> pg 20.6f is stuck unclean for 13548.019218, current state
>> active+remapped+wait_backfill, last acting [13,38,98]
>> pg 20.44c is stuck unclean for 13548.075430, current state
>> active+remapped+wait_backfill, last acting [174,127,139]
>> pg 20.bc is stuck unclean for 13545.743397, current state
>> active+remapped+wait_backfill, last acting [72,64,104]
>> pg 15.1ac is stuck unclean for 13548.181461, current state
>> active+remapped+wait_backfill, last acting [121,145,84]
>> pg 15.1af is stuck unclean for 13547.962269, current state
>> active+remapped+backfilling, last acting [150,62,101]
>> pg 20.396 is stuck unclean for 13547.835109, current state
>> active+remapped+wait_backfill, last acting [134,49,96]
>> pg 15.1ba is stuck unclean for 13548.128752, current state
>> active+remapped+wait_backfill, last acting [122,63,162]
>> pg 15.3fd is stuck unclean for 13547.644431, current state
>> active+remapped+wait_backfill, last acting [156,38,131]
>> pg 20.41c is stuck unclean for 13548.133470, current state
>> active+remapped+wait_backfill, last acting [78,85,168]
>> pg 20.525 is stuck unclean for 13545.272774, current state
>> active+remapped+wait_backfill, last acting [76,57,148]
>> pg 15.1ca is stuck unclean for 13547.944928, current state
>> active+remapped+wait_backfill, last acting [157,19,36]
>> pg 20.11e is stuck unclean for 13545.368614, current state
>> active+remapped+wait_backfill, last acting [36,134,8]
>> pg 20.525 is active+remapped+wait_backfill, acting [76,57,148]
>> pg 20.44c is active+remapped+wait_backfill, acting [174,127,139]
>> pg 20.41c is active+remapped+wait_backfill, acting [78,85,168]
>> pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
>> pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
>> pg 20.396 is active+remapped+wait_backfill, acting [134,49,96]
>> pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
>> pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
>> pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
>> pg 15.1ba is active+remapped+wait_backfill, acting [122,63,162]
>> pg 15.1ac is active+remapped+wait_backfill, acting [121,145,84]
>> pg 15.1af is active+remapped+backfilling, acting [150,62,101]
>> pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
>> pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
>> pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
>> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
>> objects misplaced (0.499%)
>>
>> As you can see, there is barely any overlap between the acting OSDs
>> for those PGs. osd max backfills should only limit the number of
>> concurrent backfills out of a single OSD, and so in the situation
>> above I would expect the 15 backfills to happen mostly concurrently.
>> As it is they are being serialized, and that seems to needlessly slow
>> down the process and extend the time needed to complete recovery.
>>
>> I'm pretty sure I'm missing something obvious here, but what is it?
>
> The max backfill values cover both incoming and outgoing backfills.
> Presumably these are all waiting on a small set of target OSDs which
> are currently receiving backfills of some other PG.

Thanks for the reply, and I am aware of that, but I am not sure how it
applies here.

What I quoted was the complete list of then-current backfills in the
cluster. Those are *all* the PGs affected by backfills. And they're so
scattered across OSDs that there is barely any overlap. The only OSDs
I even see listed twice are 38 and 64, which would affect PGs
15.3fd/20.6f 15.34a/20.bc. What is causing the others to wait?

Or am I misunderstanding the "acting" value here and some other OSDs
are involved, and if so, how would I find out what those are?

Cheers,
Florian
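As an aside, the overlap among the quoted acting sets can be tallied mechanically (a sketch in Python; note it only counts acting members and says nothing about the backfill *target* OSDs in the "up" sets, which is where the contention Greg describes would actually appear):

```python
from collections import Counter

# "acting" OSD sets copied from the "ceph health detail" output quoted above
acting_sets = [
    [45, 90, 157], [41, 17, 120], [64, 87, 80], [13, 38, 98],
    [174, 127, 139], [72, 64, 104], [121, 145, 84], [150, 62, 101],
    [134, 49, 96], [122, 63, 162], [156, 38, 131], [78, 85, 168],
    [76, 57, 148], [157, 19, 36], [36, 134, 8],
]

# Tally how many backfilling/waiting PGs each OSD participates in
load = Counter(osd for pg in acting_sets for osd in pg)

# OSDs appearing in more than one PG are potential serialization points
contended = {osd: n for osd, n in load.items() if n > 1}
print(dict(sorted(contended.items())))  # → {36: 2, 38: 2, 64: 2, 134: 2, 157: 2}
```

Interestingly, this finds five OSDs listed twice, not just 38 and 64 — but even so, most of the waiting PGs share no acting OSD, which is why the "up" sets matter.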


[ceph-users] ceph-giant installation error on centos 6.6

2015-02-17 Thread Wenxiao He
Hello,

I need some help as I am getting package dependency errors when trying to
install ceph-giant on centos 6.6. See below for repo files and also the yum
install output.

# lsb_release -a
LSB Version:
 
:base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:CentOS release 6.6 (Final)
Release:6.6
Codename:   Final

# cat ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://ceph.com/rpm-giant/el6/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-giant/el6/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://ceph.com/rpm-giant/el6/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

# cat ceph-extras.repo
[ceph-extras]
name=Ceph Extras Packages
baseurl=http://ceph.com/packages/ceph-extras/rpm/centos6/$basearch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-extras-noarch]
name=Ceph Extras noarch
baseurl=http://ceph.com/packages/ceph-extras/rpm/centos6/noarch
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

[ceph-extras-source]
name=Ceph Extras Sources
baseurl=http://ceph.com/packages/ceph-extras/rpm/centos6/SRPMS
enabled=1
priority=2
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc


# yum install ceph
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
Loading mirror speeds from cached hostfile
 * base: mirrors.easynews.com
 * extras: mirror.steadfast.net
 * updates: mirrors.sonic.net
Resolving Dependencies
--> Running transaction check
---> Package ceph.x86_64 1:0.87-0.el6 will be installed
--> Processing Dependency: librbd1 = 1:0.87-0.el6 for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: python-ceph = 1:0.87-0.el6 for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: libcephfs1 = 1:0.87-0.el6 for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: librados2 = 1:0.87-0.el6 for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: ceph-common = 1:0.87-0.el6 for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: python-flask for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: python-requests for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: gdisk for package: 1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: python-argparse for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: liblttng-ust.so.0()(64bit) for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: libtcmalloc.so.4()(64bit) for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: libleveldb.so.1()(64bit) for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: libcephfs.so.1()(64bit) for package:
1:ceph-0.87-0.el6.x86_64
--> Processing Dependency: librados.so.2()(64bit) for package:
1:ceph-0.87-0.el6.x86_64
--> Running transaction check
---> Package ceph.x86_64 1:0.87-0.el6 will be installed
--> Processing Dependency: liblttng-ust.so.0()(64bit) for package:
1:ceph-0.87-0.el6.x86_64
---> Package ceph-common.x86_64 1:0.87-0.el6 will be installed
---> Package gdisk.x86_64 0:0.8.10-1.el6 will be installed
---> Package gperftools-libs.x86_64 0:2.0-11.el6.3 will be installed
--> Processing Dependency: libunwind.so.8()(64bit) for package:
gperftools-libs-2.0-11.el6.3.x86_64
---> Package leveldb.x86_64 0:1.7.0-2.el6 will be installed
---> Package libcephfs1.x86_64 1:0.87-0.el6 will be installed
---> Package librados2.x86_64 1:0.87-0.el6 will be installed
--> Processing Dependency: liblttng-ust.so.0()(64bit) for package:
1:librados2-0.87-0.el6.x86_64
---> Package librbd1.x86_64 1:0.87-0.el6 will be installed
--> Processing Dependency: liblttng-ust.so.0()(64bit) for package:
1:librbd1-0.87-0.el6.x86_64
---> Package python-argparse.noarch 0:1.2.1-2.el6.centos will be installed
---> Package python-ceph.x86_64 1:0.87-0.el6 will be installed
---> Package python-flask.noarch 1:0.9-5.el6 will be installed
--> Processing Dependency: python-werkzeug for package:
1:python-flask-0.9-5.el6.noarch
--> Processing Dependency: python-sphinx for package:
1:python-flask-0.9-5.el6.noarch
--> Processing Dependency: python-jinja2-26 for package:
1:python-flask-0.9-5.el6.noarch
---> Package python-requests.noarch 0:1.1.0-4.el6.centos will be installed
--> Processing Dependency: python-urllib3 for package:
python-requests-1.1.0-4.el6.centos.noarch
--> Processing Dependency: python-ordereddict for package:
python-requests-1.1.0-4.el6.centos.noarch
--> Processing Dependency: python-chardet for p

Re: [ceph-users] Ceph Block Device

2015-02-17 Thread Brad Hubbard

On 02/18/2015 11:48 AM, Garg, Pankaj wrote:

libkmod: ERROR ../libkmod/libkmod.c:556 kmod_search_moddep: could not open 
moddep file


Try "sudo depmod" and then run your modprobe again.

This seems more like an OS issue than a Ceph specific issue.
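For the record, depmod(8) is the tool that rebuilds modules.dep and modules.dep.bin; a sketch of the retry (requires root and a populated /lib/modules tree for the running kernel):

```shell
# Rebuild the module dependency files for the running kernel, then retry
sudo depmod -a "$(uname -r)"
sudo modprobe rbd

# Confirm the module actually loaded
lsmod | grep rbd
```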

Cheers,
Brad


Re: [ceph-users] Ceph Block Device

2015-02-17 Thread Garg, Pankaj
Hi Brad,

This is Ubuntu 14.04, running on ARM.
/lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin doesn't exist.
The "rmmod rbd" command says "rmmod: ERROR: Module rbd is not currently loaded".

Running as root doesn't make any difference. I was running with sudo anyway.

Thanks
Pankaj

-Original Message-
From: Brad Hubbard [mailto:bhubb...@redhat.com] 
Sent: Tuesday, February 17, 2015 5:06 PM
To: Garg, Pankaj; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph Block Device

On 02/18/2015 09:56 AM, Garg, Pankaj wrote:
> Hi,
>
> I have a Ceph cluster and I am trying to create a block device. I execute the 
> following command, and get errors:
>
> sudo rbd map cephblockimage --pool rbd -k /etc/ceph/ceph.client.admin.keyring
>
> libkmod: ERROR ../libkmod/libkmod.c:556 kmod_search_moddep: could not open 
> moddep file '/lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin'
>
> modinfo: ERROR: Module alias rbd not found.
>
> modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open 
> moddep file '/lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin'
>
> rbd: modprobe rbd failed! (256)

What distro/release is this?

Does /lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin exist?

Can you run the command as root?

>
> Need help with what is wrong. I installed the Ceph package on the machine 
> where I execute the command. This is on ARM BTW.  Is there something I am 
> missing?
>
> I am able to run Object storage and rados bench just fine on the cluster.
>
> Thanks
>
> Pankaj
>
>
>


-- 


Kindest Regards,

Brad Hubbard
Senior Software Maintenance Engineer
Red Hat Global Support Services
Asia Pacific Region


Re: [ceph-users] Ceph Block Device

2015-02-17 Thread Brad Hubbard

On 02/18/2015 09:56 AM, Garg, Pankaj wrote:

Hi,

I have a Ceph cluster and I am trying to create a block device. I execute the 
following command, and get errors:

sudo rbd map cephblockimage --pool rbd -k /etc/ceph/ceph.client.admin.keyring

libkmod: ERROR ../libkmod/libkmod.c:556 kmod_search_moddep: could not open 
moddep file '/lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin'

modinfo: ERROR: Module alias rbd not found.

modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open 
moddep file '/lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin'

rbd: modprobe rbd failed! (256)


What distro/release is this?

Does /lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin exist?

Can you run the command as root?



Need help with what is wrong. I installed the Ceph package on the machine where 
I execute the command. This is on ARM BTW.  Is there something I am missing?

I am able to run Object storage and rados bench just fine on the cluster.

Thanks

Pankaj







--


Kindest Regards,

Brad Hubbard
Senior Software Maintenance Engineer
Red Hat Global Support Services
Asia Pacific Region


Re: [ceph-users] Happy Chinese New Year!

2015-02-17 Thread Mark Nelson

Xīnnián kuàilè! (Happy New Year!)

Mark

On 02/17/2015 06:23 PM, xmdx...@gmail.com wrote:

hi, everyone:

Happy Chinese New Year!

—
Sent via Mailbox






[ceph-users] Happy Chinese New Year!

2015-02-17 Thread xmdxcxz
hi, everyone:

Happy Chinese New Year!

—
Sent via Mailbox


[ceph-users] Ceph Block Device

2015-02-17 Thread Garg, Pankaj
Hi,
I have a Ceph cluster and I am trying to create a block device. I execute the 
following command, and get errors:


sudo rbd map cephblockimage --pool rbd -k /etc/ceph/ceph.client.admin.keyring
libkmod: ERROR ../libkmod/libkmod.c:556 kmod_search_moddep: could not open 
moddep file '/lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin'
modinfo: ERROR: Module alias rbd not found.
modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open 
moddep file '/lib/modules/3.18.0-02094-gab62ac9/modules.dep.bin'
rbd: modprobe rbd failed! (256)


Need help with what is wrong. I installed the Ceph package on the machine where 
I execute the command. This is on ARM BTW.  Is there something I am missing?
I am able to run Object storage and rados bench just fine on the cluster.


Thanks
Pankaj


Re: [ceph-users] Unexpectedly low number of concurrent backfills

2015-02-17 Thread Gregory Farnum
On Tue, Feb 17, 2015 at 12:09 PM, Florian Haas  wrote:
> Hello everyone,
>
> I'm seeing some OSD behavior that I consider unexpected; perhaps
> someone can shed some insight.
>
> Ceph giant (0.87.0), osd max backfills and osd recovery max active
> both set to 1.
>
> Please take a moment to look at the following "ceph health detail" screen 
> dump:
>
> HEALTH_WARN 14 pgs backfill; 1 pgs backfilling; 15 pgs stuck unclean;
> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
> objects misplaced (0.499%)
> pg 20.3db is stuck unclean for 13547.432043, current state
> active+remapped+wait_backfill, last acting [45,90,157]
> pg 15.318 is stuck unclean for 13547.380581, current state
> active+remapped+wait_backfill, last acting [41,17,120]
> pg 15.34a is stuck unclean for 13548.115170, current state
> active+remapped+wait_backfill, last acting [64,87,80]
> pg 20.6f is stuck unclean for 13548.019218, current state
> active+remapped+wait_backfill, last acting [13,38,98]
> pg 20.44c is stuck unclean for 13548.075430, current state
> active+remapped+wait_backfill, last acting [174,127,139]
> pg 20.bc is stuck unclean for 13545.743397, current state
> active+remapped+wait_backfill, last acting [72,64,104]
> pg 15.1ac is stuck unclean for 13548.181461, current state
> active+remapped+wait_backfill, last acting [121,145,84]
> pg 15.1af is stuck unclean for 13547.962269, current state
> active+remapped+backfilling, last acting [150,62,101]
> pg 20.396 is stuck unclean for 13547.835109, current state
> active+remapped+wait_backfill, last acting [134,49,96]
> pg 15.1ba is stuck unclean for 13548.128752, current state
> active+remapped+wait_backfill, last acting [122,63,162]
> pg 15.3fd is stuck unclean for 13547.644431, current state
> active+remapped+wait_backfill, last acting [156,38,131]
> pg 20.41c is stuck unclean for 13548.133470, current state
> active+remapped+wait_backfill, last acting [78,85,168]
> pg 20.525 is stuck unclean for 13545.272774, current state
> active+remapped+wait_backfill, last acting [76,57,148]
> pg 15.1ca is stuck unclean for 13547.944928, current state
> active+remapped+wait_backfill, last acting [157,19,36]
> pg 20.11e is stuck unclean for 13545.368614, current state
> active+remapped+wait_backfill, last acting [36,134,8]
> pg 20.525 is active+remapped+wait_backfill, acting [76,57,148]
> pg 20.44c is active+remapped+wait_backfill, acting [174,127,139]
> pg 20.41c is active+remapped+wait_backfill, acting [78,85,168]
> pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
> pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
> pg 20.396 is active+remapped+wait_backfill, acting [134,49,96]
> pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
> pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
> pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
> pg 15.1ba is active+remapped+wait_backfill, acting [122,63,162]
> pg 15.1ac is active+remapped+wait_backfill, acting [121,145,84]
> pg 15.1af is active+remapped+backfilling, acting [150,62,101]
> pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
> pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
> pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
> recovery 16/65732491 objects degraded (0.000%); 328254/65732491
> objects misplaced (0.499%)
>
> As you can see, there is barely any overlap between the acting OSDs
> for those PGs. osd max backfills should only limit the number of
> concurrent backfills out of a single OSD, and so in the situation
> above I would expect the 15 backfills to happen mostly concurrently.
> As it is they are being serialized, and that seems to needlessly slow
> down the process and extend the time needed to complete recovery.
>
> I'm pretty sure I'm missing something obvious here, but what is it?

The max backfill values cover both incoming and outgoing backfills.
Presumably these are all waiting on a small set of target OSDs which
are currently receiving backfills of some other PG.
-Greg
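If per-OSD backfill limits really are the bottleneck, concurrency can be raised at runtime (a sketch; injected values revert when OSDs restart unless also set in ceph.conf, and higher values increase recovery impact on client I/O):

```shell
# Raise the per-OSD backfill limit across all OSDs at runtime
ceph tell 'osd.*' injectargs '--osd-max-backfills 4'
```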


Re: [ceph-users] Help needed

2015-02-17 Thread SUNDAY A. OLUTAYO
I appreciate you all. 

Yes, this fixed it. 

Thanks, 
Sunday Olutayo 

- Original Message -

From: "Alan Johnson"  
To: "SUNDAY A. OLUTAYO" , "Jacob Weeks (RIS-BCT)" 
 
Cc: ceph-de...@lists.ceph.com, ceph-users@lists.ceph.com, 
maintain...@lists.ceph.com 
Sent: Tuesday, February 17, 2015 10:46:48 PM 
Subject: RE: [ceph-users] Help needed 


Did you set permissions to "sudo chmod +r /etc/ceph/ceph.client.admin.keyring"? 

Thx 



Alan 



From: ceph-users  on behalf of SUNDAY A. 
OLUTAYO  
Sent: Tuesday, February 17, 2015 4:59 PM 
To: Jacob Weeks (RIS-BCT) 
Cc: ceph-de...@lists.ceph.com; ceph-users@lists.ceph.com; 
maintain...@lists.ceph.com 
Subject: Re: [ceph-users] Help needed 


I did that but the problem still persists. 

Thanks, 
Sunday Olutayo 


- Original Message -

From: "Jacob Weeks (RIS-BCT)"  
To: "SUNDAY A. OLUTAYO" , ceph-users@lists.ceph.com, 
ceph-de...@lists.ceph.com, maintain...@lists.ceph.com 
Sent: Tuesday, February 17, 2015 9:57:11 PM 
Subject: RE: [ceph-users] Help needed 



There should be a *.client.admin.keyring file in the directory you were in 
while you ran ceph-deploy. 

Try copying that file to /etc/ceph/ 

Thanks, 

Jacob 



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of SUNDAY 
A. OLUTAYO 
Sent: Tuesday, February 17, 2015 3:39 PM 
To: ceph-users@lists.ceph.com; ceph-de...@lists.ceph.com; 
maintain...@lists.ceph.com 
Subject: [ceph-users] Help needed 


I am setting up a ceph cluster on Ubuntu 14.04.1 LTS, all went well without 
error 
but the "ceph status" after "ceph-deploy mon create-initial" indicates otherwise 

This is the error message; monclient[hunting]: 
Error: missing keyring cannot use cephx for authentication 
librados: client.admin initialization error No such file or directory 

Thanks, 
Sunday Olutayo 


 The information contained in this 
e-mail message is intended only for the personal and confidential use of the 
recipient(s) named above. This message may be an attorney-client communication 
and/or work product and as such is privileged and confidential. If the reader 
of this message is not the intended recipient or an agent responsible for 
delivering it to the intended recipient, you are hereby notified that you have 
received this document in error and that any review, dissemination, 
distribution, or copying of this message is strictly prohibited. If you have 
received this communication in error, please notify us immediately by e-mail, 
and delete the original message. 




Re: [ceph-users] Help needed

2015-02-17 Thread Alan Johnson
Did you set permissions to "sudo chmod +r /etc/ceph/ceph.client.admin.keyring"?



Thx

Alan


From: ceph-users  on behalf of SUNDAY A. 
OLUTAYO 
Sent: Tuesday, February 17, 2015 4:59 PM
To: Jacob Weeks (RIS-BCT)
Cc: ceph-de...@lists.ceph.com; ceph-users@lists.ceph.com; 
maintain...@lists.ceph.com
Subject: Re: [ceph-users] Help needed

I did that but the problem still persists.

Thanks,
Sunday Olutayo


From: "Jacob Weeks (RIS-BCT)" 
To: "SUNDAY A. OLUTAYO" , ceph-users@lists.ceph.com, 
ceph-de...@lists.ceph.com, maintain...@lists.ceph.com
Sent: Tuesday, February 17, 2015 9:57:11 PM
Subject: RE: [ceph-users] Help needed

There should be a *.client.admin.keyring file in the directory you were in 
while you ran ceph-deploy.

Try copying that file to /etc/ceph/

Thanks,

Jacob

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of SUNDAY 
A. OLUTAYO
Sent: Tuesday, February 17, 2015 3:39 PM
To: ceph-users@lists.ceph.com; ceph-de...@lists.ceph.com; 
maintain...@lists.ceph.com
Subject: [ceph-users] Help needed

I am setting up a ceph cluster on Ubuntu 14.04.1 LTS, all went well without 
error
but the "ceph status" after "ceph-deploy mon create-initial" indicates otherwise

This is the error message; monclient[hunting]:
Error: missing keyring cannot use cephx for authentication
librados: client.admin initialization error No such file or directory
Thanks,
Sunday Olutayo





Re: [ceph-users] Help needed

2015-02-17 Thread SUNDAY A. OLUTAYO
I did that but the problem still persists. 

Thanks, 
Sunday Olutayo 


- Original Message -

From: "Jacob Weeks (RIS-BCT)"  
To: "SUNDAY A. OLUTAYO" , ceph-users@lists.ceph.com, 
ceph-de...@lists.ceph.com, maintain...@lists.ceph.com 
Sent: Tuesday, February 17, 2015 9:57:11 PM 
Subject: RE: [ceph-users] Help needed 



There should be a *.client.admin.keyring file in the directory you were in 
while you ran ceph-deploy. 

Try copying that file to /etc/ceph/ 

Thanks, 

Jacob 



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of SUNDAY 
A. OLUTAYO 
Sent: Tuesday, February 17, 2015 3:39 PM 
To: ceph-users@lists.ceph.com; ceph-de...@lists.ceph.com; 
maintain...@lists.ceph.com 
Subject: [ceph-users] Help needed 


I am setting up a ceph cluster on Ubuntu 14.04.1 LTS, all went well without 
error 
but the "ceph status" after "ceph-deploy mon create-initial" indicates otherwise 

This is the error message; monclient[hunting]: 
Error: missing keyring cannot use cephx for authentication 
librados: client.admin initialization error No such file or directory 

Thanks, 
Sunday Olutayo 





Re: [ceph-users] Introducing "Learning Ceph" : The First ever Book on Ceph

2015-02-17 Thread Federico Lucifredi

To be exact, the platform used throughout is CentOS 6.4... I am reading my copy 
right now :)

Best -F

- Original Message -
From: "SUNDAY A. OLUTAYO" 
To: "Andrei Mikhailovsky" 
Cc: ceph-users@lists.ceph.com
Sent: Monday, February 16, 2015 3:28:45 AM
Subject: Re: [ceph-users] Introducing "Learning Ceph" : The First ever Book on 
Ceph

I bought a copy some days ago; great job, but it is Red Hat specific. 

Thanks, 

Sunday Olutayo 


Re: [ceph-users] Help needed

2015-02-17 Thread Weeks, Jacob (RIS-BCT)
There should be a *.client.admin.keyring file in the directory you were in 
while you ran ceph-deploy.

Try copying that file to /etc/ceph/

Thanks,

Jacob

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of SUNDAY 
A. OLUTAYO
Sent: Tuesday, February 17, 2015 3:39 PM
To: ceph-users@lists.ceph.com; ceph-de...@lists.ceph.com; 
maintain...@lists.ceph.com
Subject: [ceph-users] Help needed

I am setting up a ceph cluster on Ubuntu 14.04.1 LTS. All went well without
errors,
but "ceph status" after "ceph-deploy mon create-initial" indicates otherwise.

This is the error message; monclient[hunting]:
Error: missing keyring cannot use cephx for authentication
librados: client.admin initialization error No such file or directory
Thanks,
Sunday Olutayo




Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-17 Thread Stephen Hindle
Awesome!  Thanks Much!

On Tue, Feb 17, 2015 at 1:28 PM, Mark Nelson  wrote:
> Hi Stephen,
>
> It's a benchmark automation tool we wrote that builds a ceph cluster and
> then runs benchmarks against it.  It's still pretty rough (no real error
> checking,  no documentation, etc).  We have some partners that are
> interested in using it too and I'd like to make it useful for the community
> so we're going to try to make it a bit more accessible.
>
> cbt is here:
>
> https://github.com/ceph/ceph-tools/tree/master/cbt
>
> We've also been using it to prototype nightly performance testing of firefly
> and master for the last month or two on some of our lab nodes. The cron job
> and test suites are here:
>
> https://github.com/ceph/ceph-tools/tree/master/regression
>
> Mark
>
>
>
> On 02/17/2015 02:16 PM, Stephen Hindle wrote:
>>
>> I was wondering what the 'CBT' tool is ?  Google is useless for that
>> acronym...
>>
>> Thanks!
>> Steve
>>
>> On Tue, Feb 17, 2015 at 10:37 AM, Mark Nelson  wrote:
>>>
>>> Hi All,
>>>
>>> I wrote up a short document describing some tests I ran recently to look
>>> at
>>> how SSD backed OSD performance has changed across our LTS releases. This
>>> is
>>> just looking at RADOS performance and not RBD or RGW.  It also doesn't
>>> offer
>>> any real explanations regarding the results.  It's just a first high
>>> level
>>> step toward understanding some of the behaviors folks on the mailing list
>>> have reported over the last couple of releases.  I hope you find it
>>> useful.
>>>
>>> Mark
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>

-- 
The information in this message may be confidential.  It is intended solely 
for
the addressee(s).  If you are not the intended recipient, any disclosure,
copying or distribution of the message, or any action or omission taken by 
you
in reliance on it, is prohibited and may be unlawful.  Please immediately
contact the sender if you have received this message in error.



Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-17 Thread Mark Nelson

Hi Stephen,

It's a benchmark automation tool we wrote that builds a ceph cluster and 
then runs benchmarks against it.  It's still pretty rough (no real error 
checking,  no documentation, etc).  We have some partners that are 
interested in using it too and I'd like to make it useful for the 
community so we're going to try to make it a bit more accessible.


cbt is here:

https://github.com/ceph/ceph-tools/tree/master/cbt

We've also been using it to prototype nightly performance testing of 
firefly and master for the last month or two on some of our lab nodes. 
The cron job and test suites are here:


https://github.com/ceph/ceph-tools/tree/master/regression

Mark


On 02/17/2015 02:16 PM, Stephen Hindle wrote:

I was wondering what the 'CBT' tool is ?  Google is useless for that acronym...

Thanks!
Steve

On Tue, Feb 17, 2015 at 10:37 AM, Mark Nelson  wrote:

Hi All,

I wrote up a short document describing some tests I ran recently to look at
how SSD backed OSD performance has changed across our LTS releases. This is
just looking at RADOS performance and not RBD or RGW.  It also doesn't offer
any real explanations regarding the results.  It's just a first high level
step toward understanding some of the behaviors folks on the mailing list
have reported over the last couple of releases.  I hope you find it useful.

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-17 Thread Karan Singh
Thanks Mark , for a superb explanation . This is indeed very useful.


Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


> On 17 Feb 2015, at 22:16, Stephen Hindle  wrote:
> 
> I was wondering what the 'CBT' tool is ?  Google is useless for that 
> acronym...
> 
> Thanks!
> Steve
> 
> On Tue, Feb 17, 2015 at 10:37 AM, Mark Nelson  wrote:
>> Hi All,
>> 
>> I wrote up a short document describing some tests I ran recently to look at
>> how SSD backed OSD performance has changed across our LTS releases. This is
>> just looking at RADOS performance and not RBD or RGW.  It also doesn't offer
>> any real explanations regarding the results.  It's just a first high level
>> step toward understanding some of the behaviors folks on the mailing list
>> have reported over the last couple of releases.  I hope you find it useful.
>> 
>> Mark
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-17 Thread Stephen Hindle
I was wondering what the 'CBT' tool is ?  Google is useless for that acronym...

Thanks!
Steve

On Tue, Feb 17, 2015 at 10:37 AM, Mark Nelson  wrote:
> Hi All,
>
> I wrote up a short document describing some tests I ran recently to look at
> how SSD backed OSD performance has changed across our LTS releases. This is
> just looking at RADOS performance and not RBD or RGW.  It also doesn't offer
> any real explanations regarding the results.  It's just a first high level
> step toward understanding some of the behaviors folks on the mailing list
> have reported over the last couple of releases.  I hope you find it useful.
>
> Mark
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



[ceph-users] Help needed

2015-02-17 Thread SUNDAY A. OLUTAYO
I am setting up a ceph cluster on Ubuntu 14.04.1 LTS, all went well without 
error 
but the "ceph status" after "ceph-deploy mon create-initial" indecate otherwise 

This is the error message; monclient[hunting]: 
Error: missing keyring cannot use cephx for authentication 
librados: client.admin initialization error No such file or directory 


Thankd, 
Sunday Olutayo 


[ceph-users] Unexpectedly low number of concurrent backfills

2015-02-17 Thread Florian Haas
Hello everyone,

I'm seeing some OSD behavior that I consider unexpected; perhaps
someone can shed some insight.

Ceph giant (0.87.0), osd max backfills and osd recovery max active
both set to 1.

Please take a moment to look at the following "ceph health detail" screen dump:

HEALTH_WARN 14 pgs backfill; 1 pgs backfilling; 15 pgs stuck unclean;
recovery 16/65732491 objects degraded (0.000%); 328254/65732491
objects misplaced (0.499%)
pg 20.3db is stuck unclean for 13547.432043, current state
active+remapped+wait_backfill, last acting [45,90,157]
pg 15.318 is stuck unclean for 13547.380581, current state
active+remapped+wait_backfill, last acting [41,17,120]
pg 15.34a is stuck unclean for 13548.115170, current state
active+remapped+wait_backfill, last acting [64,87,80]
pg 20.6f is stuck unclean for 13548.019218, current state
active+remapped+wait_backfill, last acting [13,38,98]
pg 20.44c is stuck unclean for 13548.075430, current state
active+remapped+wait_backfill, last acting [174,127,139]
pg 20.bc is stuck unclean for 13545.743397, current state
active+remapped+wait_backfill, last acting [72,64,104]
pg 15.1ac is stuck unclean for 13548.181461, current state
active+remapped+wait_backfill, last acting [121,145,84]
pg 15.1af is stuck unclean for 13547.962269, current state
active+remapped+backfilling, last acting [150,62,101]
pg 20.396 is stuck unclean for 13547.835109, current state
active+remapped+wait_backfill, last acting [134,49,96]
pg 15.1ba is stuck unclean for 13548.128752, current state
active+remapped+wait_backfill, last acting [122,63,162]
pg 15.3fd is stuck unclean for 13547.644431, current state
active+remapped+wait_backfill, last acting [156,38,131]
pg 20.41c is stuck unclean for 13548.133470, current state
active+remapped+wait_backfill, last acting [78,85,168]
pg 20.525 is stuck unclean for 13545.272774, current state
active+remapped+wait_backfill, last acting [76,57,148]
pg 15.1ca is stuck unclean for 13547.944928, current state
active+remapped+wait_backfill, last acting [157,19,36]
pg 20.11e is stuck unclean for 13545.368614, current state
active+remapped+wait_backfill, last acting [36,134,8]
pg 20.525 is active+remapped+wait_backfill, acting [76,57,148]
pg 20.44c is active+remapped+wait_backfill, acting [174,127,139]
pg 20.41c is active+remapped+wait_backfill, acting [78,85,168]
pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
pg 20.396 is active+remapped+wait_backfill, acting [134,49,96]
pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
pg 15.1ba is active+remapped+wait_backfill, acting [122,63,162]
pg 15.1ac is active+remapped+wait_backfill, acting [121,145,84]
pg 15.1af is active+remapped+backfilling, acting [150,62,101]
pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
recovery 16/65732491 objects degraded (0.000%); 328254/65732491
objects misplaced (0.499%)

As you can see, there is barely any overlap between the acting OSDs
for those PGs. osd max backfills should only limit the number of
concurrent backfills out of a single OSD, and so in the situation
above I would expect the 15 backfills to happen mostly concurrently.
As it is they are being serialized, and that seems to needlessly slow
down the process and extend the time needed to complete recovery.
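For what it's worth, the acting sets in the listing above can be checked for overlap mechanically. The sketch below (illustrative only; it parses a subset of the lines quoted above) counts how many of the affected PGs each OSD appears in. Note that backfill reservations are also held on the backfill targets (the up set), which "ceph health detail" does not print, so limited overlap among acting sets alone does not prove the backfills should run concurrently.

```python
import re
from collections import Counter

# A subset of the "ceph health detail" lines quoted above.
health = """\
pg 20.3db is active+remapped+wait_backfill, acting [45,90,157]
pg 15.318 is active+remapped+wait_backfill, acting [41,17,120]
pg 15.34a is active+remapped+wait_backfill, acting [64,87,80]
pg 20.6f is active+remapped+wait_backfill, acting [13,38,98]
pg 15.1ca is active+remapped+wait_backfill, acting [157,19,36]
pg 20.11e is active+remapped+wait_backfill, acting [36,134,8]
pg 20.bc is active+remapped+wait_backfill, acting [72,64,104]
pg 15.3fd is active+remapped+wait_backfill, acting [156,38,131]
"""

osd_use = Counter()
for line in health.splitlines():
    m = re.search(r"acting \[([\d,]+)\]", line)
    if m:
        for osd in m.group(1).split(","):
            osd_use[int(osd)] += 1

# OSDs that appear in more than one affected PG; with osd_max_backfills = 1
# each of these can hold only one backfill reservation at a time.
shared = {osd: n for osd, n in sorted(osd_use.items()) if n > 1}
print(shared)  # -> {36: 2, 38: 2, 64: 2, 157: 2}
```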

I'm pretty sure I'm missing something obvious here, but what is it?

All insight greatly appreciated. :) Thank you!

Cheers,
Florian


Re: [ceph-users] CephFS and data locality?

2015-02-17 Thread Gregory Farnum
On Tue, Feb 17, 2015 at 10:36 AM, Jake Kugel  wrote:
> Hi,
>
> I'm just starting to look at Ceph and CephFS.  I see that Ceph supports
> dynamic object interfaces to allow some processing of object data on the
> same node where the data is stored [1].  This might be a naive question,
> but is there any way to get data locality when using CephFS? For example,
> somehow arrange for parts of the filesystem to reside on OSDs on same
> system using CephFS client?

It's unrelated to the in-place RADOS class computation, but you can do
some intelligent placement by having specialized CRUSH rules and
making use of the CephFS' data layouts. Check the docs! :)
-Greg
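As a rough illustration of the data-layout side (a sketch; "fastpool" is a placeholder pool that would be governed by a specialized CRUSH rule, and the mount point is assumed):

```shell
# Make the pool usable as a CephFS data pool
ceph mds add_data_pool fastpool

# Direct new files under this directory to that pool via the layout xattr
setfattr -n ceph.dir.layout.pool -v fastpool /mnt/cephfs/local-data

# Inspect the layout actually applied to a file
getfattr -n ceph.file.layout /mnt/cephfs/local-data/somefile
```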


[ceph-users] CephFS and data locality?

2015-02-17 Thread Jake Kugel
Hi,

I'm just starting to look at Ceph and CephFS.  I see that Ceph supports 
dynamic object interfaces to allow some processing of object data on the 
same node where the data is stored [1].  This might be a naive question, 
but is there any way to get data locality when using CephFS? For example, 
somehow arrange for parts of the filesystem to reside on OSDs on same 
system using CephFS client?

Thank you,
Jake

[1] http://ceph.com/rados/dynamic-object-interfaces-with-lua/




Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-17 Thread Irek Fasikhov
Mark, very very good!

2015-02-17 20:37 GMT+03:00 Mark Nelson :

> Hi All,
>
> I wrote up a short document describing some tests I ran recently to look
> at how SSD backed OSD performance has changed across our LTS releases. This
> is just looking at RADOS performance and not RBD or RGW.  It also doesn't
> offer any real explanations regarding the results.  It's just a first high
> level step toward understanding some of the behaviors folks on the mailing
> list have reported over the last couple of releases.  I hope you find it
> useful.
>
> Mark
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Best regards, Irek Fasikhov
Mobile: +79229045757


[ceph-users] My PG is UP and Acting, yet it is unclean

2015-02-17 Thread Bahaa A. L.
Hi All,

I have a group of PGs that are up and acting, yet they are not clean,
causing the cluster to be in a warning, i.e. non-healthy, state.

This is my cluster status:

$ ceph -s

cluster 17bea68b-1634-4cd1-8b2a-00a60ef4761d
 health HEALTH_WARN 203 pgs stuck unclean; recovery 6/132 objects degraded 
(4.545%)
 monmap e1: 1 mons at {ceph-node1=172.31.0.84:6789/0}, election epoch 2, 
quorum 0 ceph-node1
 osdmap e122: 6 osds: 6 up, 6 in
  pgmap v6732: 1920 pgs, 16 pools, 10243 kB data, 66 objects
288 MB used, 18077 MB / 18365 MB avail
6/132 objects degraded (4.545%)
 203 active
1717 active+clean

So, as we can see, all the PGs are active, yet some are unclean.

Also .. When picking one of my placement groups: 

$ ceph pg map 0.592350

osdmap e122 pg 0.592350 (0.50) -> up [0,5] acting [0,5]


Why would this be?

B.R.
Beanos
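To dig into why those PGs stay unclean, querying one of them is usually the next step (a sketch against the output above; 0.50 is the normalized PG id the map printed):

```shell
# List stuck-unclean PGs with their states and acting sets
ceph pg dump_stuck unclean

# Detailed peering/recovery state for one PG
ceph pg 0.50 query
```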


Re: [ceph-users] Dedicated disks for monitor and mds?

2015-02-17 Thread John Spray

- Original Message -
> From: "Francois Lafont" 
> To: ceph-users@lists.ceph.com
> Sent: Monday, February 16, 2015 4:13:40 PM
> Subject: [ceph-users] Dedicated disks for monitor and mds?

> 1. I have read "10 GB per daemon for the monitor". But is
> I/O disk performance important for a monitor? Is it unreasonable
> to put the working directory of the monitor in the same partition
> of the root filesystem (ie /)?
> 
> 2. I have exactly the same question for the mds daemon.

The MDS does not use local storage at all -- CephFS metadata is stored in RADOS 
(i.e. the MDS stores data via the OSDs).

John


Re: [ceph-users] CentOS7 librbd1-devel problem.

2015-02-17 Thread Ken Dreyer
On 02/17/2015 01:07 AM, Leszek Master wrote:
> Hello all. I have to install qemu on one of my ceph nodes to test
> something. I added a ceph-giant repository there and connected it to
> the ceph cluster. The problem is that I need to build qemu from source
> with rbd support, and there is no librbd1-devel in the ceph repository.
> Also, in EPEL I only have librbd1-devel at version 0.80.7, while my
> installed ceph version is 0.87, so there is a dependency problem. How can
> I get it working properly? Where can I find a 0.87 version of
> librbd1-devel that I can install on my CentOS 7?

The -devel RPMs were split up downstream in EPEL's 0.80.7 packages, but
this change has not yet been done in the upstream packaging. It's in
progress, at http://tracker.ceph.com/issues/10884

If you need RBD headers for the v0.87 release, you can install the
"ceph-devel-0.87" RPM.
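Concretely, something like the following (a sketch; it assumes the Ceph giant repo is enabled and you are in the qemu source tree):

```shell
# Headers for librados/librbd at the v0.87 release
yum install ceph-devel-0.87

# Build qemu with rbd support
./configure --enable-rbd
make
```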

- Ken


Re: [ceph-users] Ceph Supermicro hardware recommendation

2015-02-17 Thread Mike
17.02.2015 04:11, Christian Balzer wrote:
> 
> Hello,
> 
> re-adding the mailing list.
> 
> On Mon, 16 Feb 2015 17:54:01 +0300 Mike wrote:
> 
>> Hello
>>
>> 05.02.2015 08:35, Christian Balzer wrote:
>>>
>>> Hello,
>>>
>
>> LSI 2308 IT
>> 2 x SSD Intel DC S3700 400GB
>> 2 x SSD Intel DC S3700 200GB
> Why the separation of SSDs? 
> They aren't going to be that busy with regards to the OS.

 We would like to use 400GB SSD for a cache pool, and 200GB SSD for
 the journaling.

>>> Don't, at least not like that.
>>> First and foremost, SSD based OSDs/pools have different requirements,
>>> especially when it comes to CPU. 
>>> Mixing your HDD and SSD based OSDs in the same chassis is a generally
>>> a bad idea.
>>
>> Why? If we have for example SuperServer 6028U-TR4+ with proper
>> configuration  (4 x SSD DC S3700 for cache pool/8 x 6-8Tb SATA HDD for
>> Cold storage/E5-2695V3 CPU/128Gb RAM), why it's still bad idea? It's
>> something inside Ceph don't work well?
>>
> 
> Ceph in and by itself will of course work.
> 
> But your example up there is total overkill on one hand and simply not
> balanced on the other hand.
> You'd be much better off (both performance and price wise) if you'd go
> with something less powerful for a HDD storage node like this:
> http://www.supermicro.com/products/system/2U/6027/SSG-6027R-E1R12T.cfm
> with 2 400GB Intels in the back for journals and 16 cores total.
> 
> While your SSD based storage nodes would be nicely dense by using
> something like:
> http://www.supermicro.com/products/system/2U/2028/SYS-2028TP-DC0TR.cfm
> with 2 E5-2690 v3 per node (I'd actually rather prefer E5-2687W v3, but
> those are running too hot).
> Alternatively one of the 1U cases with up to 10 SSDs.
> 
> Also maintaining a crush map that separates the SSD from HDD pools is made
> a lot easier, less error prone by segregating nodes into SSD and HDD ones.
> 
> There are several more reasons below.
> 
> 

Yes, these are normal configuration variants. But this way you have 2
different node types instead of 1, which requires more support inside the company.

In the whole setup you would have one configuration for the MON, OSD, and
SSD-cache servers and another configuration for the compute nodes.

A lot of support, supplies, attention.

That's why we are still trying to reduce the number of configurations to support.
It's a balance of support versus cost/speed/etc.

>> For me cache pool it's 1-st fast small storage between big slow storage.
>>
> That's the idea, yes.
> But besides the problems with performance I'm listing again below, that
> "small" is another, very difficult to judge in advance problem.
> By mixing your cache pool SSD OSDs into the HDD OSD chassis, you're
> making yourself inflexible in that area (as in just add another SSD cache
> pool node when needed). 
> 

Yes, in some ways inflexible, but I have one configuration instead of two
and can grow the cluster simply by adding nodes.

>> You don't need journal anymore and if you need you can enlarge fast
>> storage.
>>
> You still need the journal of course, it's (unfortunately in some cases)
> a basic requirement in Ceph. 
> I suppose what you meant is "don't need journal on SSDs anymore".
> And while that is true, this makes your slow storage at least twice as
> slow, which at some point (deep-scrub, data re-balancing, very busy
> cluster) is likely to make you wish you had those journal SSDs.
> 
>  

Yes, the journal on cold storage is needed for re-balancing the cluster if
some node/HDD fails, or when promoting/evicting objects from the SSD cache.

I remember an email on this mailing list from one of the Inktank guys (sorry,
I don't remember his full email address and name) who wrote that "you don't
need a journal if you use a cache pool".

>>> If you really want to use SSD based OSDs, got at least with Giant,
>>> probably better even to wait for Hammer. 
>>> Otherwise your performance will be nowhere near the investment you're
>>> making. 
>>> Read up in the ML archives about SSD based clusters and their
>>> performance, as well as cache pools.
>>>
>>> Which brings us to the second point, cache pools are pretty pointless
>>> currently when it comes to performance. So unless you're planning to
>>> use EC pools, you will gain very little from them.
>>
>> So, ssd cache pool useless at all?
>>
> They're (currently) not performing all that well, ask people on the ML
> who're actually using them. 

I know that's true by now; I'm reading the ML every day.

> This is a combination of Ceph currently being unable to fully utilize the
> full potential of SSDs in general and the cache pool code (having to
> promote/demote whole objects mainly) in particular.
> 
> Both of these things are of course known to the Ceph developers and being
> improved, but right now I don't think they will give you what you expect
> from them.
> 
> I would build a good, solid, classic Ceph cluster at this point in time
> and have a small cache pool for testing. 
> Once that pool performs to your satisfaction, you can always grow it.
> Another r
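For reference, the basic commands for putting a small cache pool in front of an existing pool look roughly like this (a sketch; "cold" and "cache" are placeholder pool names, and the cache pool would normally be mapped to an SSD-only CRUSH root):

```shell
# Attach the cache pool to the backing pool
ceph osd tier add cold cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay cold cache

# A hit set is required for promotion decisions
ceph osd pool set cache hit_set_type bloom
```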

Re: [ceph-users] Power failure recovery woes

2015-02-17 Thread Michal Kozanecki
Oh one more thing, the OSD's partitions/drives, how did they get mounted (mount 
options)?



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Michal 
Kozanecki
Sent: February-17-15 9:27 AM
To: Jeff; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Power failure recovery woes

Hi Jeff,

What type model drives are you using as OSDs? Any Journals? If so, what model? 
What does your ceph.conf look like? What sort of load is on the cluster (if 
it's still "online")? What distro/version? Firewall rules set properly?

Michal Kozanecki


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jeff
Sent: February-17-15 9:17 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Power failure recovery woes

Some additional information/questions:

Here is the output of "ceph osd tree"

Some of the "down" OSD's are actually running, but are "down". For example 
osd.1:

 root 30158  8.6 12.7 1542860 781288 ?  Ssl 07:47   4:40 
/usr/bin/ceph-osd --cluster=ceph -i 0 -f

  Is there any way to get the cluster to recognize them as being up?  
osd-1 has the "FAILED assert(last_e.version.version < e.version.version)" 
errors.

Thanks,
  Jeff


# id  weight  type name       up/down  reweight
-1    10.22   root default
-2    2.72      host ceph1
0     0.91        osd.0       up       1
1     0.91        osd.1       down     0
2     0.9         osd.2       down     0
-3    1.82      host ceph2
3     0.91        osd.3       down     0
4     0.91        osd.4       down     0
-4    2.04      host ceph3
5     0.68        osd.5       up       1
6     0.68        osd.6       up       1
7     0.68        osd.7       up       1
8     0.68        osd.8       down     0
-5    1.82      host ceph4
9     0.91        osd.9       up       1
10    0.91        osd.10      down     0
-6    1.82      host ceph5
11    0.91        osd.11      up       1
12    0.91        osd.12      up       1

On 2/17/2015 8:28 AM, Jeff wrote:
>
>
>  Original Message 
> Subject: Re: [ceph-users] Power failure recovery woes
> Date: 2015-02-17 04:23
> From: Udo Lembke 
> To: Jeff , ceph-users@lists.ceph.com
>
> Hi Jeff,
> is the osd /var/lib/ceph/osd/ceph-2 mounted?
>
> If not, does it help if you mount the osd and start it with service 
> ceph start osd.2 ?
>
> Udo
>
> Am 17.02.2015 09:54, schrieb Jeff:
>> Hi,
>>
>> We had a nasty power failure yesterday and even with UPS's our small
>> (5 node, 12 OSD) cluster is having problems recovering.
>>
>> We are running ceph 0.87
>>
>> 3 of our OSD's are down consistently (others stop and are 
>> restartable, but our cluster is so slow that almost everything we do times 
>> out).
>>
>> We are seeing errors like this on the OSD's that never run:
>>
>> ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1) 
>> Operation not permitted
>>
>> We are seeing errors like these of the OSD's that run some of the time:
>>
>> osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
>> e.version.version)
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
>> timeout")
>>
>> Does anyone have any suggestions on how to recover our cluster?
>>
>> Thanks!
>>   Jeff
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

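A minimal triage sequence for OSDs that are running but marked down might look like this (a sketch; paths and the sysvinit service style assume a default 0.87 deployment, and /dev/sdb1 is a placeholder for the OSD's actual data device):

```shell
# Check whether each OSD's data directory is actually mounted
mount | grep /var/lib/ceph/osd

# For an OSD whose directory is empty/unmounted, mount its data partition
mount /dev/sdb1 /var/lib/ceph/osd/ceph-2

# Then try starting the daemon and watch whether it gets marked up
service ceph start osd.2
ceph osd tree | grep osd.2
```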


Re: [ceph-users] Power failure recovery woes

2015-02-17 Thread Michal Kozanecki
Hi Jeff,

What type model drives are you using as OSDs? Any Journals? If so, what model? 
What does your ceph.conf look like? What sort of load is on the cluster (if 
it's still "online")? What distro/version? Firewall rules set properly?

Michal Kozanecki


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jeff
Sent: February-17-15 9:17 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Power failure recovery woes

Some additional information/questions:

Here is the output of "ceph osd tree"

Some of the "down" OSD's are actually running, but are "down". For example 
osd.1:

 root 30158  8.6 12.7 1542860 781288 ?  Ssl 07:47   4:40 
/usr/bin/ceph-osd --cluster=ceph -i 0 -f

  Is there any way to get the cluster to recognize them as being up?  
osd-1 has the "FAILED assert(last_e.version.version < e.version.version)" 
errors.

Thanks,
  Jeff


# id  weight  type name       up/down  reweight
-1    10.22   root default
-2    2.72      host ceph1
0     0.91        osd.0       up       1
1     0.91        osd.1       down     0
2     0.9         osd.2       down     0
-3    1.82      host ceph2
3     0.91        osd.3       down     0
4     0.91        osd.4       down     0
-4    2.04      host ceph3
5     0.68        osd.5       up       1
6     0.68        osd.6       up       1
7     0.68        osd.7       up       1
8     0.68        osd.8       down     0
-5    1.82      host ceph4
9     0.91        osd.9       up       1
10    0.91        osd.10      down     0
-6    1.82      host ceph5
11    0.91        osd.11      up       1
12    0.91        osd.12      up       1

On 2/17/2015 8:28 AM, Jeff wrote:
>
>
>  Original Message 
> Subject: Re: [ceph-users] Power failure recovery woes
> Date: 2015-02-17 04:23
> From: Udo Lembke 
> To: Jeff , ceph-users@lists.ceph.com
>
> Hi Jeff,
> is the osd /var/lib/ceph/osd/ceph-2 mounted?
>
> If not, does it help if you mount the osd and start it with service 
> ceph start osd.2 ?
>
> Udo
>
> Am 17.02.2015 09:54, schrieb Jeff:
>> Hi,
>>
>> We had a nasty power failure yesterday and even with UPS's our small 
>> (5 node, 12 OSD) cluster is having problems recovering.
>>
>> We are running ceph 0.87
>>
>> 3 of our OSD's are down consistently (others stop and are 
>> restartable, but our cluster is so slow that almost everything we do times 
>> out).
>>
>> We are seeing errors like this on the OSD's that never run:
>>
>> ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1) 
>> Operation not permitted
>>
>> We are seeing errors like these of the OSD's that run some of the time:
>>
>> osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
>> e.version.version)
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
>> timeout")
>>
>> Does anyone have any suggestions on how to recover our cluster?
>>
>> Thanks!
>>   Jeff
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



Re: [ceph-users] Power failure recovery woes

2015-02-17 Thread Jeff

Some additional information/questions:

Here is the output of "ceph osd tree"

Some of the "down" OSDs are actually running, but are still marked "down". For 
example osd.1:


root 30158  8.6 12.7 1542860 781288 ?  Ssl 07:47   4:40 
/usr/bin/ceph-osd --cluster=ceph -i 0 -f


Is there any way to get the cluster to recognize them as being up?
osd.1 has the "FAILED assert(last_e.version.version <
e.version.version)" errors.


Thanks,
 Jeff


# id    weight  type name       up/down reweight
-1  10.22   root default
-2  2.72host ceph1
0   0.91osd.0   up  1
1   0.91osd.1   down0
2   0.9 osd.2   down0
-3  1.82host ceph2
3   0.91osd.3   down0
4   0.91osd.4   down0
-4  2.04host ceph3
5   0.68osd.5   up  1
6   0.68osd.6   up  1
7   0.68osd.7   up  1
8   0.68osd.8   down0
-5  1.82host ceph4
9   0.91osd.9   up  1
10  0.91osd.10  down0
-6  1.82host ceph5
11  0.91osd.11  up  1
12  0.91osd.12  up  1
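As an aside for anyone triaging a similar state: the down OSDs can be pulled out of a saved copy of that tree output with a one-line awk filter. This is only a sketch — the heredoc reproduces the tree from the message above; on a live cluster you would pipe `ceph osd tree` straight into awk:

```shell
# Save the OSD rows of the tree output (sample data from the message above).
cat <<'EOF' > /tmp/osd_tree.txt
0   0.91  osd.0   up    1
1   0.91  osd.1   down  0
2   0.9   osd.2   down  0
3   0.91  osd.3   down  0
4   0.91  osd.4   down  0
5   0.68  osd.5   up    1
6   0.68  osd.6   up    1
7   0.68  osd.7   up    1
8   0.68  osd.8   down  0
9   0.91  osd.9   up    1
10  0.91  osd.10  down  0
11  0.91  osd.11  up    1
12  0.91  osd.12  up    1
EOF
# Column 4 is the up/down state; print the OSD name for every "down" row.
awk '$4 == "down" { print $3 }' /tmp/osd_tree.txt
# prints osd.1 osd.2 osd.3 osd.4 osd.8 osd.10, one per line
```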

On 2/17/2015 8:28 AM, Jeff wrote:



 Original Message 
Subject: Re: [ceph-users] Power failure recovery woes
Date: 2015-02-17 04:23
From: Udo Lembke 
To: Jeff , ceph-users@lists.ceph.com

Hi Jeff,
is the osd /var/lib/ceph/osd/ceph-2 mounted?

If not, does it helps, if you mounted the osd and start with
service ceph start osd.2
??

Udo

On 17.02.2015 09:54, Jeff wrote:

Hi,

We had a nasty power failure yesterday and even with UPS's our small (5
node, 12 OSD) cluster is having problems recovering.

We are running ceph 0.87

3 of our OSD's are down consistently (others stop and are restartable,
but our cluster is so slow that almost everything we do times out).

We are seeing errors like this on the OSD's that never run:

ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
Operation not permitted

We are seeing errors like these of the OSD's that run some of the time:

osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
e.version.version)
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide 
timeout")


Does anyone have any suggestions on how to recover our cluster?

Thanks!
  Jeff


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power failure recovery woes

2015-02-17 Thread Jeff

Udo,

Yes, the osd is mounted:  /dev/sda4  963605972 260295676 703310296  
28% /var/lib/ceph/osd/ceph-2


Thanks,
Jeff

 Original Message 
Subject: Re: [ceph-users] Power failure recovery woes
Date: 2015-02-17 04:23
From: Udo Lembke 
To: Jeff , ceph-users@lists.ceph.com

Hi Jeff,
is the osd /var/lib/ceph/osd/ceph-2 mounted?

If not, does it helps, if you mounted the osd and start with
service ceph start osd.2
??

Udo

On 17.02.2015 09:54, Jeff wrote:

Hi,

We had a nasty power failure yesterday and even with UPS's our small (5
node, 12 OSD) cluster is having problems recovering.

We are running ceph 0.87

3 of our OSD's are down consistently (others stop and are restartable,
but our cluster is so slow that almost everything we do times out).

We are seeing errors like this on the OSD's that never run:

ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
Operation not permitted

We are seeing errors like these of the OSD's that run some of the time:

osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
e.version.version)
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
timeout")

Does anyone have any suggestions on how to recover our cluster?

Thanks!
  Jeff


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Introducing "Learning Ceph" : The First ever Book on Ceph

2015-02-17 Thread Vivek Varghese Cherian
On Fri, Feb 6, 2015 at 4:23 AM, Karan Singh  wrote:

> Hello Community Members
>
> I am happy to introduce the first book on Ceph with the title "*Learning
> Ceph*".
>
> Me and many folks from the publishing house together with technical
> reviewers spent several months to get this book compiled and published.
>
> Finally the book is up for sale on , I hope you will like it and surely
> will learn a lot from it.
>
> Amazon :
> http://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/ref=sr_1_1?s=books&ie=UTF8&qid=1423174441&sr=1-1&keywords=ceph
> Packtpub : https://www.packtpub.com/application-development/learning-ceph
>
>
>
Hi Karan,

It would have been great if you could release the book under a Creative
Commons or another free/open-source license, so that people like me can
download and read it.
After all, Ceph is open source; I don't see why a book on Ceph should not
follow the same licensing pattern as Ceph does.

Regards,
-- 
Vivek Varghese Cherian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] My PG is UP and Acting, yet it is unclean

2015-02-17 Thread B L
Hi All,

I have a group of PGs that are up and acting, yet they are not clean, which 
puts the cluster in a warning state, i.e. non-healthy.

This is my cluster status:

$ ceph -s

    cluster 17bea68b-1634-4cd1-8b2a-00a60ef4761d
     health HEALTH_WARN 203 pgs stuck unclean; recovery 6/132 objects degraded (4.545%)
     monmap e1: 1 mons at {ceph-node1=172.31.0.84:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e122: 6 osds: 6 up, 6 in
      pgmap v6732: 1920 pgs, 16 pools, 10243 kB data, 66 objects
            288 MB used, 18077 MB / 18365 MB avail
            6/132 objects degraded (4.545%)
                 203 active
                1717 active+clean

So, as we can see, all the PGs are active, yet some are unclean.
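For what it's worth, the 4.545% in that health line is nothing more than degraded object copies divided by total object copies (6/132 here), which a one-liner can confirm:

```shell
# 6 degraded out of 132 object copies, as reported by `ceph -s` above.
awk 'BEGIN { printf "%.3f%%\n", 6 / 132 * 100 }'   # prints 4.545%
```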

Also, when picking one of my placement groups:

$ ceph pg map 0.592350

osdmap e122 pg 0.592350 (0.50) -> up [0,5] acting [0,5]


Why would this be?

B.R.
Beanos
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "store is getting too big" on monitors

2015-02-17 Thread Mohamed Pakkeer
Hi Joao,

We followed your instructions to create the store dump:

ceph-kvstore-tool /var/lib/ceph/mon/ceph-FOO/store.db list > store.dump

for above store's location, let's call it $STORE:

for m in osdmap pgmap; do
  for k in first_committed last_committed; do
ceph-kvstore-tool $STORE get $m $k >> store.dump
  done
done

ceph-kvstore-tool $STORE get pgmap_meta last_osdmap_epoch >> store.dump
ceph-kvstore-tool $STORE get pgmap_meta version >> store.dump
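When eyeballing such a dump, it can also help to count how many keys each prefix holds, since a prefix with an unexpectedly huge count is the usual suspect when the store grows without bound. A sketch over an illustrative listing (the real `ceph-kvstore-tool ... list` output format may differ slightly between versions):

```shell
# Illustrative store listing: one "prefix key" pair per line.
cat <<'EOF' > /tmp/store.list
osdmap full_97395
osdmap full_97396
osdmap last_committed
pgmap  version
pgmap  last_osdmap_epoch
EOF
# Tally keys per prefix and print the totals, sorted by prefix name.
awk '{ count[$1]++ } END { for (p in count) print p, count[p] }' /tmp/store.list | sort
# prints: osdmap 3
#         pgmap 2
```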


Please find the store dump on the following link.

http://jmp.sh/LUh6iWo


-- 
Thanks & Regards
K.Mohamed Pakkeer



On Mon, Feb 16, 2015 at 8:14 PM, Joao Eduardo Luis  wrote:

> On 02/16/2015 12:57 PM, Mohamed Pakkeer wrote:
>
>>
>>   Hi ceph-experts,
>>
>>We are getting "store is getting too big" on our test cluster.
>> Cluster is running with giant release and configured as EC pool to test
>> cephFS.
>>
>> cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
>>   health HEALTH_WARN too few pgs per osd (0 < min 20); mon.master01
>> store is getting too big! 15376 MB >= 15360 MB; mon.master02 store is
>> getting too big! 15402 MB >= 15360 MB; mon.master03 store is getting too
>> big! 15402 MB >= 15360 MB; clock skew detected on mon.master02,
>> mon.master03
>>   monmap e3: 3 mons at
>> {master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0},
>> election epoch 38, quorum 0,1,2 master01,master02,master03
>>   osdmap e97396: 552 osds: 552 up, 552 in
>>pgmap v354736: 0 pgs, 0 pools, 0 bytes data, 0 objects
>>  8547 GB used, 1953 TB / 1962 TB avail
>>
>> We tried monitor restart with mon compact on start = true as well as
>> manual compaction using 'ceph tell mon.FOO compact'. But it didn't
>> reduce the size of store.db. We already deleted the pools and mds to
>> start fresh cluster. Do we need to delete the mon and recreate again or
>> do we have any solution to reduce the store size?
>>
>
> Could you get us a list of all the keys on the store using
> 'ceph-kvstore-tool' ?  Instructions on the email you quoted.
>
> Cheers!
>
>   -Joao
>
>
>> Regards,
>> K.Mohamed Pakkeer
>>
>>
>>
>> On 12/10/2014 07:30 PM, Kevin Sumner wrote:
>>
>> The mons have grown another 30GB each overnight (except for 003?),
>> which
>> is quite worrying.  I ran a little bit of testing yesterday after my
>> post, but not a significant amount.
>>
>> I wouldn’t expect compact on start to help this situation based on the
>> name since we don’t (shouldn’t?) restart the mons regularly, but there
>> appears to be no documentation on it.  We’re pretty good on disk space
>> on the mons currently, but if that changes, I’ll probably use this to
>> see about bringing these numbers in line.
>>
>> This is an issue that has been seen on larger clusters, and it usually
>> takes a monitor restart, with 'mon compact on start = true' or manual
>> compaction 'ceph tell mon.FOO compact' to bring the monitor back to a
>> sane disk usage level.
>>
>> However, I have not been able to reproduce this in order to track the
>> source. I'm guessing I lack the scale of the cluster, or the appropriate
>> workload (maybe both).
>>
>> What kind of workload are you running the cluster through? You mention
>> cephfs, but do you have any more info you can share that could help us
>> reproducing this state?
>>
>> Sage also fixed an issue that could potentially cause this (depending on
>> what is causing it in the first place) [1,2,3]. This bug, #9987, is due
>> to a given cached value not being updated, leading to the monitor not
>> removing unnecessary data, potentially causing this growth. This cached
>> value would be set to its proper value when the monitor is restarted
>> though, so a simple restart would have all this unnecessary data blown
>> away.
>>
>> Restarting the monitor ends up masking the true cause of the store
>> growth: whether from #9987 or from obsolete data kept by the monitor's
>> backing store (leveldb), either due to misuse of leveldb or due to
>> leveldb's nature (haven't been able to ascertain which may be at fault,
>> partly due to being unable to reproduce the problem).
>>
>> If you are up to it, I would suggest the following approach in hope to
>> determine what may be at fault:
>>
>> 1) 'ceph tell mon.FOO compact' -- which will force the monitor to
>> compact its store. This won't close leveldb, so it won't have much
>> effect on the store size if it happens to be leveldb holding on to some
>> data (I could go into further detail, but I don't think this is the
>> right medium). 1.a) you may notice the store increasing in size during
>> this period; it's expected. 1.b) compaction may take a while, but in the
>> end you'll hopefully see a significant reduction in size.
>>
>> 2) Assuming that failed, I would suggest doing the following:
>>
>> 2.1) grab ceph-kvstore-tool from the ceph-test package
>> 2.2) stop the 

Re: [ceph-users] Dedicated disks for monitor and mds?

2015-02-17 Thread Francois Lafont
Hi,

On 17/02/2015 11:15, John Spray wrote:
 
> The MDS does not use local storage at all -- CephFS metadata is stored in 
> RADOS (i.e. the MDS stores data via the OSDs).

Ah, OK. So, consequently, I can put the working directory of
the MDS (i.e. /var/lib/ceph/mds/ceph-$id/) anywhere,
for instance on the same disk used by the OS.
Good news. ;)

Thank you John.

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dedicated disks for monitor and mds?

2015-02-17 Thread Francois Lafont
Hello,

On 17/02/2015 05:55, Christian Balzer wrote:

>> 1. I have read "10 GB per daemon for the monitor". But is
>> I/O disk performance important for a monitor? Is it unreasonable
>> to put the working directory of the monitor in the same partition
>> of the root filesystem (ie /)?
>>
> Yes, monitors are quite I/O sensitive, they like their leveldb to be on a
> fast disk, preferably an SSD. 
> So if your OS in on SSD(s), no worries.
> If your OS is on plain HDDs w/o any caching controller, you may run into
> problems if your cluster gets busy.

Ok, I see. So, for instance, if I have a server with:

- 4 spinning HDDs of 500 GB, one OSD per disk,
- 2 SSDs for the OSD journals (2 journals per SSD),

I can put the monitor's working directory on one of the SSDs
without problems, is that correct?

>> 2. I have exactly the same question for the mds daemon.
>>
> No idea (not running MDS), but I suspect it would be fine as well as long
> as the OS is on SSD(s).

Ok.

>> I'm asking these questions because if these daemons must have
>> dedicated disks, with the OS too, it consumes disks which could
>> not be used for osd daemons.
>>
>> Off chance, here is my third question:
>>
>> 3. Is there a web site which lists precise examples of hardwares
>> "ceph-approved" by "ceph-users" with the kernel and ceph version?
>>
> Searching this mailing list is probably your best bet.
> Never mind that people tend to update things constantly.

Ok. It could be interesting to have a centralized page.

> In general you will want the newest stable kernel you can run, from what I
> remember the 3.13 in one Ubuntu version was particular bad.

Ah? But Ubuntu 14.04 Trusty seems to be well supported and tested
by Ceph (for the Firefly version, which is the version I use):
http://ceph.com/docs/master/start/os-recommendations/#platforms

Should I use another distribution (using an LTS distribution seemed
like a good idea to me)? Or should I keep Trusty and upgrade the kernel
(with "apt-get install linux-image-3.16.0-30-generic")?
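A version comparison like the following can tell whether a node already runs a recent enough kernel. This is only a sketch: the sample string stands in for `uname -r` on a live host, and 3.16 is the target version discussed above:

```shell
kernel="3.13.0-24-generic"   # substitute "$(uname -r)" on a real node
minimum="3.16"
# Strip the "-24-generic" suffix, then compare with sort -V, which orders
# version strings numerically; if the minimum sorts first, the running
# kernel is at least that version.
base="${kernel%%-*}"
if [ "$(printf '%s\n%s\n' "$minimum" "$base" | sort -V | head -n1)" = "$minimum" ]; then
  echo "kernel $kernel >= $minimum"
else
  echo "kernel $kernel < $minimum -- consider upgrading"
fi
# prints: kernel 3.13.0-24-generic < 3.16 -- consider upgrading
```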

Thanks for your help Christian.

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power failure recovery woes

2015-02-17 Thread Udo Lembke
Hi Jeff,
is the osd /var/lib/ceph/osd/ceph-2 mounted?

If not, does it help if you mount the OSD and then start it with
'service ceph start osd.2'?

Udo

On 17.02.2015 09:54, Jeff wrote:
> Hi,
> 
> We had a nasty power failure yesterday and even with UPS's our small (5
> node, 12 OSD) cluster is having problems recovering.
> 
> We are running ceph 0.87
> 
> 3 of our OSD's are down consistently (others stop and are restartable,
> but our cluster is so slow that almost everything we do times out).
> 
> We are seeing errors like this on the OSD's that never run:
> 
> ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
> Operation not permitted
> 
> We are seeing errors like these of the OSD's that run some of the time:
> 
> osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
> e.version.version)
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
> 
> Does anyone have any suggestions on how to recover our cluster?
> 
> Thanks!
>   Jeff
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Power failure recovery woes

2015-02-17 Thread Jeff

Hi,

We had a nasty power failure yesterday and even with UPS's our small (5 
node, 12 OSD) cluster is having problems recovering.


We are running ceph 0.87

3 of our OSD's are down consistently (others stop and are restartable, 
but our cluster is so slow that almost everything we do times out).


We are seeing errors like this on the OSD's that never run:

ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1) 
Operation not permitted


We are seeing errors like these of the OSD's that run some of the time:

osd/PGLog.cc: 844: FAILED assert(last_e.version.version < 
e.version.version)

common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

Does anyone have any suggestions on how to recover our cluster?

Thanks!
  Jeff


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CentOS7 librbd1-devel problem.

2015-02-17 Thread Leszek Master
Hello all. I have to install qemu on one of my Ceph nodes to test some
things. I added the ceph-giant repository there and connected the node to the
Ceph cluster. The problem is that I need to build qemu from source with rbd
support, and there is no librbd1-devel in the Ceph repository. In EPEL I only
have librbd1-devel at version 0.80.7, while my installed Ceph version is
0.87, so there is a dependency problem. How can I get this working properly?
Where can I find librbd1-devel at version 0.87 that I can install on my
CentOS 7?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com