[ceph-users] New Geek on Duty!

2013-09-10 Thread Ross David Turk

Greetings, ceph-users :)

I’m pleased to share that Xiaoxi from the Intel Asia Pacific R&D Center
is our newest volunteer Geek on Duty!

If you’re not familiar with the Geek on Duty program, here are the
basics: members of our community take shifts on IRC and on the mailing
list to help new users get Ceph up and running quickly.

Xiaoxi will be taking the 10:00 - 13:00 shift in China (which is 7pm PDT,
10pm EDT, 04:00 CEST).  His handle on IRC is “xiaoxi” - everyone say
hello when you see him in the channel next!

Cheers,
Ross

--
Ross Turk
Community, Inktank

@rossturk @inktank @ceph

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] errors after kernel-upgrade

2013-09-10 Thread Markus Goldberg

Hi,
I did a 'stop ceph-all' on my ceph-admin host and then a kernel upgrade
from 3.9 to 3.11 on all 3 of my nodes.

Ubuntu 13.04, ceph 0.68.
The kernel upgrade required a reboot.
Now, after rebooting, I get the following errors:

root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 133 pgs peering; 272 pgs stale; 265 pgs stuck unclean; 2 requests are blocked > 32 sec; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464358: 3 osds: 3 up, 3 in
    pgmap v1343477: 792 pgs, 9 pools, 15145 MB data, 4986 objects
          30927 MB used, 61372 GB / 61408 GB avail
               387 active+clean
               122 stale+active
               140 stale+active+clean
               133 peering
                10 stale+active+replay

root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 6 pgs down; 377 pgs peering; 296 pgs stuck unclean; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464400: 3 osds: 3 up, 3 in
    pgmap v1343586: 792 pgs, 9 pools, 15145 MB data, 4986 objects
          31046 MB used, 61372 GB / 61408 GB avail
               142 active
               270 active+clean
                 3 active+replay
               371 peering
                 6 down+peering

root@bd-a:~# ceph -s
  cluster e0dbf70d-af59-42a5-b834-7ad739a7f89b
   health HEALTH_WARN 257 pgs peering; 359 pgs stuck unclean; 1 requests are blocked > 32 sec; mds cluster is degraded
   monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 782, quorum 0,1,2 bd-0,bd-1,bd-2
   mdsmap e451467: 1/1/1 up {0=bd-0=up:replay}, 2 up:standby
   osdmap e464403: 3 osds: 3 up, 3 in
    pgmap v1343594: 792 pgs, 9 pools, 15145 MB data, 4986 objects
          31103 MB used, 61372 GB / 61408 GB avail
               373 active
               157 active+clean
                 5 active+replay
               257 peering

root@bd-a:~#

As you can see above, the errors keep changing, so perhaps some self-repair
is running in the background. But it has been like this for 12 hours now.
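
The usual next diagnostic steps would be something like the following (a
sketch using standard ceph CLI commands):

# which PGs are stale/unclean, and which OSDs they map to
ceph health detail | head -n 30
ceph pg dump_stuck stale
ceph pg dump_stuck unclean
# check that all OSD daemons really came back after the reboot
ceph osd tree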

What should I do?

Thank you,
  Markus
On 09.09.2013 13:52, Yan, Zheng wrote:
The bug has been fixed in the 3.11 kernel by commit ccca4e37b1 (libceph:
fix truncate size calculation). We don't backport cephfs bug fixes to
old kernels. Please update the kernel or use ceph-fuse. Regards, Yan, Zheng
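
For reference, switching to ceph-fuse is a one-liner along these lines
(monitor address taken from the monmap above; the mount point is just an
example):

ceph-fuse -m xxx.xxx.xxx.20:6789 /mnt/cephfs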

Best regards,
Tobi

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
MfG,
  Markus Goldberg


Markus Goldberg | Universität Hildesheim
| Rechenzentrum
Tel +49 5121 883212 | Marienburger Platz 22, D-31141 Hildesheim, Germany
Fax +49 5121 883205 | email goldb...@uni-hildesheim.de



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
Hi,

I have a space problem on a production cluster, as if there were unused
data that is not being freed: ceph df and rados df report 613GB of data, but
disk usage is 2640GB (with 3 replicas). It should be near 1839GB.


I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
rules to put pools on SAS or on SSD.

My pools :
# ceph osd dump | grep ^pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 576 pgp_num 576 last_change 68317 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
576 pgp_num 576 last_change 68321 owner 0
pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins 
pg_num 200 pgp_num 200 last_change 172933 owner 0
pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash rjenkins 
pg_num 800 pgp_num 800 last_change 172929 owner 0
pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins 
pg_num 2048 pgp_num 2048 last_change 172935 owner 0

Only hdd3copies, sas3copies and ssd3copies are really used :
# ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED 
76498G 51849G 24648G   32.22 

POOLS:
NAME   ID USED  %USED OBJECTS 
data   0  46753 0 72  
metadata   1  0 0 0   
rbd2  8 0 1   
hdd3copies 3  2724G 3.56  5190954 
ssd3copies 6  613G  0.80  347668  
sas3copies 9  3692G 4.83  764394  


My CRUSH rules were:

rule SASperHost {
ruleset 4
type replicated
min_size 1
max_size 10
step take SASroot
step chooseleaf firstn 0 type host
step emit
}

and :

rule SSDperOSD {
ruleset 3
type replicated
min_size 1
max_size 10
step take SSDroot
step choose firstn 0 type osd
step emit
}


but since the cluster was full because of that space problem, I switched to a
different rule:

rule SSDperOSDfirst {
ruleset 7
type replicated
min_size 1
max_size 10
step take SSDroot
step choose firstn 1 type osd
step emit
step take SASroot
step chooseleaf firstn -1 type net
step emit
}


So with that last rule, I should have only one replica on my SSD OSDs, i.e. 613GB
of space used. But if I check on the OSDs, I see 1212GB really used.
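
A quick way to sanity-check this is to compare what ceph reports for the pool
against what the pool's PG directories actually occupy on disk (a sketch; the
pool id 6 and the filestore path layout are assumptions based on the figures
in this thread):

# logical data in the pool, as reported by ceph
rados df | grep ssd3copies
# on-disk usage of pool 6's PG directories on this host's OSDs
du -sch /var/lib/ceph/osd/ceph-*/current/6.*_head 2>/dev/null | tail -n1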

I also use snapshots; maybe snapshots are ignored by ceph df and rados df?

Thanks for any help.

Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] trouble with ceph-deploy

2013-09-10 Thread Pavel Timoschenkov
The OSD is created only if I use a single disk for both data and journal.

Situation with separate disks:
1.
ceph-deploy disk zap ceph001:sdaa ceph001:sda1 [ceph_deploy.osd][DEBUG ] 
zapping /dev/sdaa on ceph001 [ceph_deploy.osd][DEBUG ] zapping /dev/sda1 on 
ceph001
2.
Wiped file system on ceph001
wipefs /dev/sdaa
wipefs: WARNING: /dev/sdaa: appears to contain 'gpt' partition table wipefs 
/dev/sdaa1
wipefs: error: /dev/sdaa1: probing initialization failed
3. 
ceph-deploy osd create ceph001:sdaa:/dev/sda1
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
ceph001:/dev/sdaa:/dev/sda1
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph001 [ceph_deploy.osd][DEBUG ] 
Host ceph001 is now ready for osd use.
[ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal 
/dev/sda1 activate True
4.
ceph -k ceph.client.admin.keyring -s
  cluster d4d39e90-9610-41f3-be73-db361908b433
   health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
   monmap e1: 1 mons at {ceph001=172.16.4.32:6789/0}, election epoch 2, quorum 
0 ceph001
   osdmap e1: 0 osds: 0 up, 0 in
pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
   mdsmap e1: 0/0/1 up

With single disk:
1.
ceph-deploy disk zap ceph001:sdaa
[ceph_deploy.osd][DEBUG ] zapping /dev/sdaa on ceph001
2.
ceph-deploy osd create ceph001:sdaa
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph001:/dev/sdaa:
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph001 [ceph_deploy.osd][DEBUG ] 
Host ceph001 is now ready for osd use.
[ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal None 
activate True
3.
ceph@ceph-admin:~$ ceph -k ceph.client.admin.keyring -s
  cluster d4d39e90-9610-41f3-be73-db361908b433
   health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
   monmap e1: 1 mons at {ceph001=172.16.4.32:6789/0}, election epoch 2, quorum 
0 ceph001
   osdmap e2: 1 osds: 0 up, 0 in
pgmap v3: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
   mdsmap e1: 0/0/1 up
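
For completeness, these are the checks usually run next on ceph001 to see why
the prepared OSD never registered (a sketch; 'ceph-disk list' may not be
available on older ceph-disk versions):

ceph-disk list                            # how each disk/partition was prepared
ls /var/lib/ceph/osd/                     # mounted osd data directories, if any
tail -n 50 /var/log/ceph/ceph-osd.*.log   # errors from osd activation/start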

-Original Message-
From: Sage Weil [mailto:s...@inktank.com] 
Sent: Monday, September 09, 2013 7:09 PM
To: Pavel Timoschenkov
Cc: Alfredo Deza; ceph-users@lists.ceph.com
Subject: RE: [ceph-users] trouble with ceph-deploy

If you manually use wipefs to clear out the fs signatures after you zap, does 
it work then?

I've opened http://tracker.ceph.com/issues/6258 as I think that is the answer 
here, but if you could confirm that wipefs does in fact solve the problem, that 
would be helpful!

Thanks-
sage


On Mon, 9 Sep 2013, Pavel Timoschenkov wrote:

 for the experiment:
 
 - blank disk sdae for data
 
 blkid -p /dev/sdaf
 /dev/sdaf: PTTYPE=gpt
 
 - and sda4 partition for journal
 
 blkid -p /dev/sda4
 /dev/sda4: PTTYPE=gpt PART_ENTRY_SCHEME=gpt PART_ENTRY_NAME=Linux 
 filesystem PART_ENTRY_UUID=cdc46436-b6ed-40bb-adb4-63cf1c41cbe3 
 PART_ENTRY_TYPE=0fc63daf-8483-4772-8e79-3d69d8477de4 PART_ENTRY_NUMBER=4 
 PART_ENTRY_OFFSET=62916608 PART_ENTRY_SIZE=20971520 PART_ENTRY_DISK=8:0
 
 - zapped disk
 
 ceph-deploy disk zap ceph001:sdaf ceph001:sda4 [ceph_deploy.osd][DEBUG 
 ] zapping /dev/sdaf on ceph001 [ceph_deploy.osd][DEBUG ] zapping 
 /dev/sda4 on ceph001
 
 - after this:
 
 ceph-deploy osd create ceph001:sdae:/dev/sda4 [ceph_deploy.osd][DEBUG 
 ] Preparing cluster ceph disks ceph001:/dev/sdaf:/dev/sda4 
 [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001 
 [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
 [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaf 
 journal /dev/sda4 activate True
 
 
 - after this:
 
 blkid -p /dev/sdaf1
 /dev/sdaf1: ambivalent result (probably more filesystems on the 
 device, use wipefs(8) to see more details)
 
 wipefs /dev/sdaf1
 offset   type
 
 0x3  zfs_member   [raid]
 
 0x0  xfs   [filesystem]
  UUID:  aba50262-0427-4f8b-8eb9-513814af6b81
 
 - and OSD not created
 
 but if I'm using a single disk for data and journal:
 
 ceph-deploy disk zap ceph001:sdaf
 [ceph_deploy.osd][DEBUG ] zapping /dev/sdaf on ceph001
 
 ceph-deploy osd create ceph001:sdaf
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph001:/dev/sdaf:
 [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001 
 [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
 [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaf 
 journal None activate True
 
 OSD created!
 
 -Original Message-
 From: Sage Weil [mailto:s...@inktank.com]
 Sent: Friday, September 06, 2013 6:41 PM
 To: Pavel Timoschenkov
 Cc: Alfredo Deza; ceph-users@lists.ceph.com
 Subject: RE: [ceph-users] trouble with ceph-deploy
 
 On Fri, 6 Sep 2013, Pavel Timoschenkov wrote:
  Try
  ceph-disk -v activate /dev/sdaa1
  
  ceph-disk -v activate /dev/sdaa1
  /dev/sdaa1: ambivalent result (probably more filesystems on the 
  device, use wipefs(8) to see more details)
 
 

Re: [ceph-users] rbd cp copies of sparse files become fully allocated

2013-09-10 Thread Andrey Korolyov
On Tue, Sep 10, 2013 at 3:03 AM, Josh Durgin josh.dur...@inktank.com wrote:
 On 09/09/2013 04:57 AM, Andrey Korolyov wrote:

 May I also suggest the same for export/import mechanism? Say, if image
 was created by fallocate we may also want to leave holes upon upload
 and vice-versa for export.


 Import and export already omit runs of zeroes. They could detect
 smaller runs (currently they look at object size chunks), and export
 might be more efficient if it used diff_iterate() instead of
 read_iterate(). Have you observed them misbehaving with sparse images?



Did you mean dumpling? As far as I could tell when I checked some months ago,
cuttlefish did not have such a feature.
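
As a side note, one way to see how much of a copied image is actually
allocated is to sum the extents reported by 'rbd diff' (a sketch; image name
from the example below, and the awk filter skips any header line):

rbd diff rbd/test2 | awk '$2 ~ /^[0-9]+$/ { sum += $2 } END { printf "%.1f MB allocated\n", sum/1024/1024 }'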

 On Mon, Sep 9, 2013 at 8:45 AM, Sage Weil s...@inktank.com wrote:

 On Sat, 7 Sep 2013, Oliver Daudey wrote:

 Hey all,

 This topic has been partly discussed here:

 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-March/000799.html

 Tested on Ceph version 0.67.2.

 If you create a fresh empty image of, say, 100GB in size on RBD and then
 use rbd cp to make a copy of it, even though the image is sparse, the
 command will attempt to read every part of it and take far more time
 than expected.

 After reading the above thread, I understand why the copy of an
 essentially empty sparse image on RBD would take so long, but it doesn't
 explain why the copy won't be sparse itself.  If I use rbd cp to copy
 an image, the copy will take its full allocated size on disk, even if
 the original was empty.  If I use the QEMU qemu-img-tool's
 convert-option to convert the original image to the copy without
 changing the format, essentially only making a copy, it takes its time
 as well, but will be faster than rbd cp and the resulting copy will be
 sparse.

 Example-commands:
 rbd create --size 102400 test1
 rbd cp test1 test2
 qemu-img convert -p -f rbd -O rbd rbd:rbd/test1 rbd:rbd/test3

 Shouldn't rbd cp at least have an option to attempt to sparsify the
 copy, or copy the sparse parts as sparse?  Same goes for rbd clone,
 BTW.


 Yep, this is in fact a bug.  Opened http://tracker.ceph.com/issues/6257.

 Thanks!
 sage


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
Some additional information: if I look at one PG only, for example
the 6.31f, ceph pg dump reports a size of 616GB:

# ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
631717

But on disk, on the 3 replicas, I have:
# du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
1,3G/var/lib/ceph/osd/ceph-50/current/6.31f_head/

Since I suspected a snapshot problem, I tried to count only head
files:
# find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' 
-print0 | xargs -r -0 du -hc | tail -n1
448Mtotal

and the content of the directory : http://pastebin.com/u73mTvjs
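
The complementary count (everything that is not a head object, i.e. the
snapshot clones) can be obtained the same way; a sketch using the same path:

find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f ! -name '*head*' -print0 | xargs -r -0 du -hc | tail -n1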


On Tuesday, 10 September 2013 at 10:31 +0200, Olivier Bonvalet wrote:
 Hi,
 
 I have a space problem on a production cluster, like if there is unused
 data not freed : ceph df and rados df reports 613GB of data, and
 disk usage is 2640GB (with 3 replica). It should be near 1839GB.
 
 
 I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
 rules to put pools on SAS or on SSD.
 
 My pools :
 # ceph osd dump | grep ^pool
 pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
 pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
 pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins 
 pg_num 576 pgp_num 576 last_change 68317 owner 0
 pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins 
 pg_num 576 pgp_num 576 last_change 68321 owner 0
 pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
 rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
 pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash 
 rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
 pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
 rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
 
 Only hdd3copies, sas3copies and ssd3copies are really used :
 # ceph df
 GLOBAL:
 SIZE   AVAIL  RAW USED %RAW USED 
 76498G 51849G 24648G   32.22 
 
 POOLS:
 NAME   ID USED  %USED OBJECTS 
 data   0  46753 0 72  
 metadata   1  0 0 0   
 rbd2  8 0 1   
 hdd3copies 3  2724G 3.56  5190954 
 ssd3copies 6  613G  0.80  347668  
 sas3copies 9  3692G 4.83  764394  
 
 
 My CRUSH rules was :
 
 rule SASperHost {
   ruleset 4
   type replicated
   min_size 1
   max_size 10
   step take SASroot
   step chooseleaf firstn 0 type host
   step emit
 }
 
 and :
 
 rule SSDperOSD {
   ruleset 3
   type replicated
   min_size 1
   max_size 10
   step take SSDroot
   step choose firstn 0 type osd
   step emit
 }
 
 
 but, since the cluster was full because of that space problem, I swith to a 
 different rule :
 
 rule SSDperOSDfirst {
   ruleset 7
   type replicated
   min_size 1
   max_size 10
   step take SSDroot
   step choose firstn 1 type osd
   step emit
 step take SASroot
 step chooseleaf firstn -1 type net
 step emit
 }
 
 
 So with that last rule, I should have only one replica on my SSD OSD, so 
 613GB of space used. But if I check on OSD I see 1212GB really used.
 
 I also use snapshots, maybe snapshots are ignored by ceph df and rados df 
 ?
 
 Thanks for any help.
 
 Olivier
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
I also checked that all files in that PG are still mapped to that PG :

for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' |
sort --unique` ; do echo -n $IMG ; ceph osd map ssd3copies $IMG | grep
-v 6\\.31f ; echo ; done

And all objects are referenced in rados (compared against a listing made
with rados --pool ssd3copies ls > rados.ssd3copies.dump).



On Tuesday, 10 September 2013 at 13:46 +0200, Olivier Bonvalet wrote:
 Some additionnal informations : if I look on one PG only, for example
 the 6.31f. ceph pg dump report a size of 616GB :
 
 # ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
 631717
 
 But on disk, on the 3 replica I have :
 # du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
 1,3G  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
 
 Since I was suspected a snapshot problem, I try to count only head
 files :
 # find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' 
 -print0 | xargs -r -0 du -hc | tail -n1
 448M  total
 
 and the content of the directory : http://pastebin.com/u73mTvjs
 
 
 On Tuesday, 10 September 2013 at 10:31 +0200, Olivier Bonvalet wrote:
  Hi,
  
  I have a space problem on a production cluster, like if there is unused
  data not freed : ceph df and rados df reports 613GB of data, and
  disk usage is 2640GB (with 3 replica). It should be near 1839GB.
  
  
  I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
  rules to put pools on SAS or on SSD.
  
  My pools :
  # ceph osd dump | grep ^pool
  pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
  pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
  pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash 
  rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
  pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins 
  pg_num 576 pgp_num 576 last_change 68321 owner 0
  pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
  rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
  pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash 
  rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
  pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
  rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
  
  Only hdd3copies, sas3copies and ssd3copies are really used :
  # ceph df
  GLOBAL:
  SIZE   AVAIL  RAW USED %RAW USED 
  76498G 51849G 24648G   32.22 
  
  POOLS:
  NAME   ID USED  %USED OBJECTS 
  data   0  46753 0 72  
  metadata   1  0 0 0   
  rbd2  8 0 1   
  hdd3copies 3  2724G 3.56  5190954 
  ssd3copies 6  613G  0.80  347668  
  sas3copies 9  3692G 4.83  764394  
  
  
  My CRUSH rules was :
  
  rule SASperHost {
  ruleset 4
  type replicated
  min_size 1
  max_size 10
  step take SASroot
  step chooseleaf firstn 0 type host
  step emit
  }
  
  and :
  
  rule SSDperOSD {
  ruleset 3
  type replicated
  min_size 1
  max_size 10
  step take SSDroot
  step choose firstn 0 type osd
  step emit
  }
  
  
  but, since the cluster was full because of that space problem, I swith to a 
  different rule :
  
  rule SSDperOSDfirst {
  ruleset 7
  type replicated
  min_size 1
  max_size 10
  step take SSDroot
  step choose firstn 1 type osd
  step emit
  step take SASroot
  step chooseleaf firstn -1 type net
  step emit
  }
  
  
  So with that last rule, I should have only one replica on my SSD OSD, so 
  613GB of space used. But if I check on OSD I see 1212GB really used.
  
  I also use snapshots, maybe snapshots are ignored by ceph df and rados 
  df ?
  
  Thanks for any help.
  
  Olivier
  
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] status of glance/cinder/nova integration in openstack grizzly

2013-09-10 Thread Darren Birkett
Hi All,

tl;dr - does glance/rbd and cinder/rbd play together nicely in grizzly?

I'm currently testing a ceph/rados back end with an openstack installation.
 I have the following things working OK:

1. cinder configured to create volumes in RBD
2. nova configured to boot from RBD backed cinder volumes (libvirt UUID
secret set etc)
3. glance configured to use RBD as a back end store for images

With this setup, when I create a bootable volume in cinder, passing an id
of an image in glance, the image gets downloaded, converted to raw, and
then created as an RBD object and made available to cinder.  The correct
metadata field for the cinder volume is populated (volume_image_metadata)
and so the cinder client marks the volume as bootable.  This is all fine.

If I want to take advantage of the fact that both glance images and cinder
volumes are stored in RBD, I can add the following flag to the
glance-api.conf:

show_image_direct_url = True

This enables cinder to see that the glance image is stored in RBD, and the
cinder rbd driver then, instead of downloading the image and creating an
RBD image from it, just issues an 'rbd clone' command (seen in the
cinder-volume.log):

rbd clone --pool images --image dcb2f16d-a09d-4064-9198-1965274e214d --snap
snap --dest-pool volumes --dest volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

This is all very nice, and the cinder volume is available immediately as
you'd expect.  The problem is that the metadata field is not populated so
it's not seen as bootable.  Even manually populating this field leaves the
volume unbootable.  The volume can not even be attached to another instance
for inspection.

libvirt doesn't seem to be able to access the rbd device. From
nova-compute.log:

qemu-system-x86_64: -drive
file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
error reading header from volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

qemu-system-x86_64: -drive
file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
could not open disk image
rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none:
Operation not permitted

It's almost like a permission issue, but my ceph/rbd knowledge is still
fledgling.
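
One quick way to test the key outside of libvirt is to use the rbd CLI
directly with the same client id (a sketch; the client name 'volumes' and the
pool/volume names are the ones from the commands above):

rbd --id volumes --pool volumes info volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d
rbd --id volumes --pool images ls    # the clone's parent lives here, so read access is needed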

I know that the cinder rbd driver has been rewritten to use librbd in
havana, and I'm wondering if this will change any of this behaviour?  I'm
also wondering if anyone has actually got this working with grizzly, and
how?

Many thanks
Darren
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] About the data movement in Ceph

2013-09-10 Thread atrmat
Hi all,
recently I read the source code and the paper, and I have some questions about
data movement:
1. When OSDs are added or removed, how does Ceph migrate the data and rebalance
against the CRUSH map? Is it RADOS that modifies the CRUSH map or cluster map, and
does the primary OSD then move the data according to the cluster map? Where can I
find the data-migration code in the source?
2. When an OSD goes down or fails, how does Ceph recover its data onto the other
OSDs? Does the primary OSD copy the PG to the newly chosen OSD?
3. The OSD has 4 status bits: up, down, in, out. But I can't find a defined
status CEPH_OSD_DOWN; does the OSD call the function mark_osd_down() to
modify the OSD status in the OSDMap?
I would very much appreciate your reply!
Thanks!

atrmat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] trouble with ceph-deploy

2013-09-10 Thread Sage Weil
On Tue, 10 Sep 2013, Pavel Timoschenkov wrote:
 OSD created only if I use single disk for data and journal.
 
 Situation with separate disks:
 1.
 ceph-deploy disk zap ceph001:sdaa ceph001:sda1
 [ceph_deploy.osd][DEBUG ] zapping /dev/sdaa on ceph001
 [ceph_deploy.osd][DEBUG ] zapping /dev/sda1 on ceph001
 2.
 Wiped file system on ceph001
 wipefs /dev/sdaa
 wipefs: WARNING: /dev/sdaa: appears to contain 'gpt' partition table
 wipefs /dev/sdaa1
 wipefs: error: /dev/sdaa1: probing initialization failed

I think this is still the problem.  What happens if you do wipefs *before* 
the zap?  I wonder if the signature offsets are relative to sdaa1 and it 
doesn't see them after the partition table is zeroed out by zap?
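
Concretely, the order to test would be something like this (a sketch using
the device names from this thread; wipefs -a erases all detected signatures,
and the wipefs commands are run on ceph001 itself):

wipefs -a /dev/sdaa1        # clear signatures while the partition still exists
wipefs -a /dev/sdaa
ceph-deploy disk zap ceph001:sdaa ceph001:sda1
ceph-deploy osd create ceph001:sdaa:/dev/sda1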

Thanks-
sage


 3. 
 ceph-deploy osd create ceph001:sdaa:/dev/sda1
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
 ceph001:/dev/sdaa:/dev/sda1
 [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001
 [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
 [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal 
 /dev/sda1 activate True
 4.
 ceph -k ceph.client.admin.keyring -s
   cluster d4d39e90-9610-41f3-be73-db361908b433
health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
monmap e1: 1 mons at {ceph001=172.16.4.32:6789/0}, election epoch 2, 
 quorum 0 ceph001
osdmap e1: 0 osds: 0 up, 0 in
 pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB 
 avail
mdsmap e1: 0/0/1 up
 
 With single disk:
 1.
 ceph-deploy disk zap ceph001:sdaa
 [ceph_deploy.osd][DEBUG ] zapping /dev/sdaa on ceph001
 2.
 ceph-deploy osd create ceph001:sdaa
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph001:/dev/sdaa:
 [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001
 [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
 [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal None 
 activate True
 3.
 ceph@ceph-admin:~$ ceph -k ceph.client.admin.keyring -s
   cluster d4d39e90-9610-41f3-be73-db361908b433
health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
monmap e1: 1 mons at {ceph001=172.16.4.32:6789/0}, election epoch 2, 
 quorum 0 ceph001
osdmap e2: 1 osds: 0 up, 0 in
 pgmap v3: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB 
 avail
mdsmap e1: 0/0/1 up
 
 
 -Original Message-
 From: Sage Weil [mailto:s...@inktank.com] 
 Sent: Monday, September 09, 2013 7:09 PM
 To: Pavel Timoschenkov
 Cc: Alfredo Deza; ceph-users@lists.ceph.com
 Subject: RE: [ceph-users] trouble with ceph-deploy
 
 If you manually use wipefs to clear out the fs signatures after you zap, does 
 it work then?
 
 I've opened http://tracker.ceph.com/issues/6258 as I think that is the answer 
 here, but if you could confirm that wipefs does in fact solve the problem, 
 that would be helpful!
 
 Thanks-
 sage
 
 
 On Mon, 9 Sep 2013, Pavel Timoschenkov wrote:
 
  for the experiment:
  
  - blank disk sdae for data
  
  blkid -p /dev/sdaf
  /dev/sdaf: PTTYPE=gpt
  
  - and sda4 partition for journal
  
  blkid -p /dev/sda4
  /dev/sda4: PTTYPE=gpt PART_ENTRY_SCHEME=gpt PART_ENTRY_NAME=Linux 
  filesystem PART_ENTRY_UUID=cdc46436-b6ed-40bb-adb4-63cf1c41cbe3 
  PART_ENTRY_TYPE=0fc63daf-8483-4772-8e79-3d69d8477de4 
  PART_ENTRY_NUMBER=4 PART_ENTRY_OFFSET=62916608 
  PART_ENTRY_SIZE=20971520 PART_ENTRY_DISK=8:0
  
  - zapped disk
  
  ceph-deploy disk zap ceph001:sdaf ceph001:sda4 [ceph_deploy.osd][DEBUG 
  ] zapping /dev/sdaf on ceph001 [ceph_deploy.osd][DEBUG ] zapping 
  /dev/sda4 on ceph001
  
  - after this:
  
  ceph-deploy osd create ceph001:sdae:/dev/sda4 [ceph_deploy.osd][DEBUG 
  ] Preparing cluster ceph disks ceph001:/dev/sdaf:/dev/sda4 
  [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001 
  [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
  [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaf 
  journal /dev/sda4 activate True
  
  
  - after this:
  
  blkid -p /dev/sdaf1
  /dev/sdaf1: ambivalent result (probably more filesystems on the 
  device, use wipefs(8) to see more details)
  
  wipefs /dev/sdaf1
  offset   type
  
  0x3  zfs_member   [raid]
  
  0x0  xfs   [filesystem]
   UUID:  aba50262-0427-4f8b-8eb9-513814af6b81
  
  - and OSD not created
  
  but if I'm using a single disk for data and journal:
  
  ceph-deploy disk zap ceph001:sdaf
  [ceph_deploy.osd][DEBUG ] zapping /dev/sdaf on ceph001
  
  ceph-deploy osd create ceph001:sdaf
  [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph001:/dev/sdaf:
  [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001 
  [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
  [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaf 
  journal None activate True
  
  OSD created!
  
  -Original Message-
  From: Sage Weil 

Re: [ceph-users] Understanding ceph status

2013-09-10 Thread Joao Eduardo Luis

On 09/10/2013 12:59 AM, Gaylord Holder wrote:

There are a lot of numbers ceph status prints.

Is there any documentation on what they are?

I'm particulary curious about what seems a total data.

ceph status says I have 314TB, when I calculate I have 24TB.

It also says:

10615 GB used, 8005 GB / 18621 GB avail;


This is, respectively, the sum of used space as reported by each osd, 
sum of available space as reported by each osd, and sum of total space 
as reported by each osd.


Each osd always reports the result from 'statfs', issued on the osd data 
dir.  This means that it will always obtain the results as reported by 
the fs where the osd lives.


This sum doesn't take into account the replication factor, thus this 
should always be taken as the raw size for the file system on which each 
osd lives.


The difference between your 26TB and the reported 18TB would most likely 
be the space taken by the fs for metadata and whatnot, leaving you with 
a given N bytes available for storage, such that N*num_osds accounts for the 
8 missing TBs you're noticing.
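
As a rough worked example with the numbers above and 2x replication (a
sketch, ignoring fs overhead):

echo "18621 / 2" | bc    # ~9310 GB of usable capacity at replica size 2
echo "10615 / 2" | bc    # ~5307 GB of logical data currently stored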




which I take to be 10TB used/8T available for use, and 18TB total
available.

This doesn't make sense to me as I have 24TB raw and with default 2x
replication, I should only have 12TB available??

I see MB/s, K/s, o/s, but what are E/s units?


'E/s' would be a reference to 'Exa'.  Could you provide a sample line 
where you're seeing this?


  -Joao



-Gaylord
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] About the data movement in Ceph

2013-09-10 Thread Sage Weil
On Tue, 10 Sep 2013, atrmat wrote:
 Hi all,
 recently i read the source code and paper, and i have some questions about
 the data movement:
 1. when OSD's add or removal, how Ceph do this data migration and rebalance
 the crush map? is it the rados modify the crush map or cluster map, and the
 primary OSD does the data movement according to the cluster map? how to
 found the data migration in the source code?

The OSDMap changes when the osd is added or removed (or some other event 
or administrator action happens).  In response, the OSDs recalculate where 
the PGs should be stored, and move data in response to that.

 2. when OSD's down or failed, how Ceph recover the data in other OSDs? is it
 the primary OSD copy the PG to the new located OSD?

The (new) primary figures out where data is/was (peering) and then 
coordinates any data migration (recovery) to where the data should now be 
(according to the latest OSDMap and its embedded CRUSH map).
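
The effect of this can be watched from the command line; a sketch with
standard commands:

ceph osd dump | grep epoch     # current OSDMap epoch
ceph -w                        # streams map changes and PG state transitions
ceph pg dump_stuck unclean     # PGs still being peered/backfilled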

 3. the OSD has 4 status bits: up,down,in,out. But i can't found the defined
 status-- CEPH_OSD_DOWN, is it the OSD call the function mark_osd_down() to
 modify the OSD status in OSDMap?

See OSDMap.h: is_up() and is_down().  For in/out, it is either binary 
(is_in() and is_out()) or can be somewhere in between; see get_weight().

Hope that helps!

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.67.3 Dumpling released

2013-09-10 Thread Sage Weil
This point release fixes a few important performance regressions with the 
OSD (both with CPU and disk utilization), as well as several other 
important but less common problems. We recommend that all production users 
upgrade.

Notable changes since v0.67.2 include:

 * ceph-disk: partprobe after creating the journal partition
 * ceph-disk: specify fs type when mounting
 * ceph-post-file: new utility to help share logs and other files with 
   ceph developers
 * libcephfs: fix truncate vs readahead race (crash)
 * mds: fix flock/fcntl lock deadlock
 * mds: fix rejoin loop when encountering pre-dumpling backpointers
 * mon: allow name and addr discovery during election stage
 * mon: always refresh after Paxos store_state (fixes recovery corner 
   case)
 * mon: fix off-by-4x bug with osd byte counts
 * osd: add and disable 'pg log keys debug' by default
 * osd: add option to disable throttling
 * osd: avoid leveldb iterators for pg log append and trim
 * osd: fix readdir_r invocations
 * osd: use fdatasync instead of sync
 * radosgw: fix sysvinit script return status
 * rbd: relicense as LGPL2
 * rgw: flush pending data on multipart upload
 * rgw: recheck object name during S3 POST
 * rgw: reorder init/startup
 * rpm: fix debuginfo package build

For more detailed information, see

 * http://ceph.com/docs/master/release-notes/#v0-67-3-dumpling
 * http://ceph.com/docs/master/_downloads/v0.67.3.txt

You can get v0.67.3 from the usual locations:

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.67.3.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm
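
A minimal upgrade sketch for a Debian/Ubuntu node (assuming the sysvinit ceph
script; restart monitors, then OSDs, then MDSs, one node at a time):

apt-get update && apt-get install ceph ceph-common
service ceph restart mon
service ceph restart osd
service ceph restart mds
ceph -v     # should now report 0.67.3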
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Samuel Just
Can you post the rest of your crush map?
-Sam

On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
 I also checked that all files in that PG still are on that PG :

 for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' |
 sort --unique` ; do echo -n $IMG ; ceph osd map ssd3copies $IMG | grep
 -v 6\\.31f ; echo ; done

 And all objects are referenced in rados (compared with rados --pool
 ssd3copies ls rados.ssd3copies.dump).



 On Tuesday, 10 September 2013 at 13:46 +0200, Olivier Bonvalet wrote:
 Some additionnal informations : if I look on one PG only, for example
 the 6.31f. ceph pg dump report a size of 616GB :

 # ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
 631717

 But on disk, on the 3 replica I have :
 # du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
 1,3G  /var/lib/ceph/osd/ceph-50/current/6.31f_head/

 Since I was suspected a snapshot problem, I try to count only head
 files :
 # find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' 
 -print0 | xargs -r -0 du -hc | tail -n1
 448M  total

 and the content of the directory : http://pastebin.com/u73mTvjs


  On Tuesday, 10 September 2013 at 10:31 +0200, Olivier Bonvalet wrote:
  Hi,
 
  I have a space problem on a production cluster, like if there is unused
  data not freed : ceph df and rados df reports 613GB of data, and
  disk usage is 2640GB (with 3 replica). It should be near 1839GB.
 
 
  I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
  rules to put pools on SAS or on SSD.
 
  My pools :
  # ceph osd dump | grep ^pool
  pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
  pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
  pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash 
  rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
  pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins 
  pg_num 576 pgp_num 576 last_change 68321 owner 0
  pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
  rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
  pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash 
  rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
  pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
  rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
 
  Only hdd3copies, sas3copies and ssd3copies are really used :
  # ceph df
  GLOBAL:
  SIZE   AVAIL  RAW USED %RAW USED
  76498G 51849G 24648G   32.22
 
  POOLS:
  NAME   ID USED  %USED OBJECTS
  data   0  46753 0 72
  metadata   1  0 0 0
  rbd2  8 0 1
  hdd3copies 3  2724G 3.56  5190954
  ssd3copies 6  613G  0.80  347668
  sas3copies 9  3692G 4.83  764394
 
 
  My CRUSH rules was :
 
  rule SASperHost {
  ruleset 4
  type replicated
  min_size 1
  max_size 10
  step take SASroot
  step chooseleaf firstn 0 type host
  step emit
  }
 
  and :
 
  rule SSDperOSD {
  ruleset 3
  type replicated
  min_size 1
  max_size 10
  step take SSDroot
  step choose firstn 0 type osd
  step emit
  }
 
 
  but, since the cluster was full because of that space problem, I swith to 
  a different rule :
 
  rule SSDperOSDfirst {
  ruleset 7
  type replicated
  min_size 1
  max_size 10
  step take SSDroot
  step choose firstn 1 type osd
  step emit
  step take SASroot
  step chooseleaf firstn -1 type net
  step emit
  }
 
 
  So with that last rule, I should have only one replica on my SSD OSD, so 
  613GB of space used. But if I check on OSD I see 1212GB really used.
 
  I also use snapshots, maybe snapshots are ignored by ceph df and rados 
  df ?
 
  Thanks for any help.
 
  Olivier
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl wrote:
 Hey list,

 I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
 cluster was:
 - Unmount CephFS everywhere.
 - Upgrade the Ceph-packages.
 - Restart MON.
 - Restart OSD.
 - Restart MDS.

 As soon as I got to the second node, the MDS crashed right after startup.

 Part of the logs (more on request):

 - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
 0~0] 1.d902
 70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
-11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
 1: openin
 g mds log
-10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
 discovering lo
 g bounds
 -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
 recover s
 tart
 -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
 read_head
 -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 -
 - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read 0~0]
 1.844f3
 494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
 -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
 20+0+0 (42
 35168662 0 0) 0x1e93380 con 0x1e5d580
 -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
 handle_subscribe_a
 ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
 -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.12:6802/53419
 -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.13:6802/45791
 -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.11:6800/16562
 -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
 [read 0~
 0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
 0) 0x1e4d
 e00 con 0x1e5ddc0
  0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
 function
 'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
 7fd1ba81f700 ti
 me 2013-09-10 19:35:02.803673
 mds/MDSTable.cc: 152: FAILED assert(r = 0)

  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
  1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f) [0x77ce7f]
  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
  6: (DispatchQueue::entry()+0x592) [0x92e432]
  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
  8: (()+0x68ca) [0x7fd1bed298ca]
  9: (clone()+0x6d) [0x7fd1bda5cb6d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.

 When trying to mount CephFS, it just hangs now.  Sometimes, an MDS stays
 up for a while, but will eventually crash again.  This CephFS was
 created on 0.67 and I haven't done anything but mount and use it under
 very light load in the mean time.

 Any ideas, or if you need more info, let me know.  It would be nice to
 get my data back, but I have backups too.

Does the filesystem have any data in it? Every time we've seen this
error it's been on an empty cluster which had some weird issue with
startup.
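
One way to check from the RADOS side whether the table object the log
complains about is really missing (a sketch; 'metadata' is the default CephFS
metadata pool name):

rados -p metadata ls | grep -E 'mds_anchortable|mds_snaptable'
rados -p metadata stat mds_anchortable    # ENOENT here matches the -2 in the MDS log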
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] quick-ceph-deploy

2013-09-10 Thread sriram
Yes I am able to do that.
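
If it helps, the same import can be done by hand, without the quoting that
ceph-deploy wraps around su -c (a sketch; URL as in the log above):

curl -o /tmp/release.asc 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
sudo rpm --import /tmp/release.asc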


On Fri, Sep 6, 2013 at 8:19 AM, Alfredo Deza alfredo.d...@inktank.comwrote:

 On Fri, Sep 6, 2013 at 11:05 AM, sriram sriram@gmail.com wrote:
  sudo su -c 'rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  error: https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc:
 key
  1 import failed.
 

 Can you actually get to that URL and see the GPG key?

 Via curl/wget or the browser (if you have one in that host)
 
  On Fri, Sep 6, 2013 at 8:01 AM, Alfredo Deza alfredo.d...@inktank.com
  wrote:
 
  On Fri, Sep 6, 2013 at 10:54 AM, sriram sriram@gmail.com wrote:
   I am running it on the same machine.
  
   [abc@abc-ld ~]$ ceph-deploy install abc-ld
  
  
   On Fri, Sep 6, 2013 at 5:42 AM, Alfredo Deza 
 alfredo.d...@inktank.com
   wrote:
  
   On Thu, Sep 5, 2013 at 8:25 PM, sriram sriram@gmail.com wrote:
I am trying to deploy ceph reading the instructions from this link.
   
http://ceph.com/docs/master/start/quick-ceph-deploy/
   
I get the error below. Can someone let me know if this is something
related
to what I am doing wrong or the script?
   
[abc@abc-ld ~]$ ceph-deploy install abc-ld
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on
cluster
ceph hosts abc-ld
[ceph_deploy.install][DEBUG ] Detecting platform for host abc-ld
 ...
[sudo] password for abc:
[ceph_deploy.install][INFO  ] Distro info:
RedHatEnterpriseWorkstation
6.1
Santiago
[abc-ld][INFO  ] installing ceph on abc-ld
[abc-ld][INFO  ] Running command: su -c 'rpm --import
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 '
[abc-ld][ERROR ] Traceback (most recent call last):
[abc-ld][ERROR ]   File
   
   
 /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py,
line
21, in install
[abc-ld][ERROR ]   File
/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py,
line
10,
in inner
[abc-ld][ERROR ] def inner(*args, **kwargs):
[abc-ld][ERROR ]   File
/usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py,
 line
6,
in
remote_call
[abc-ld][ERROR ] This allows us to only remote-execute the
 actual
calls,
not whole functions.
[abc-ld][ERROR ]   File /usr/lib64/python2.6/subprocess.py, line
502,
in
check_call
[abc-ld][ERROR ] raise CalledProcessError(retcode, cmd)
[abc-ld][ERROR ] CalledProcessError: Command '['su -c \'rpm
 --import
   
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 \'']'
returned non-zero exit status 1
[abc-ld][ERROR ] error:
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc:
 key
1
import failed.
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: su
 -c
'rpm
--import
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 '
  
   Can you try running that command on the host that it failed (I think
   that would be abc-ld)
   and paste the output?
 
  I mean, to run the actual command (from the log output) that caused the
  failure.
 
  In your case, it would be:
 
  rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;
 
  
   For some reason that `rpm --import` failed. Could be network related.
  
   
   
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
   
  
  
 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey list,

I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
cluster was:
- Unmount CephFS everywhere.
- Upgrade the Ceph-packages.
- Restart MON.
- Restart OSD.
- Restart MDS.

As soon as I got to the second node, the MDS crashed right after startup.

Part of the logs (more on request):

- 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
0~0] 1.d902
70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
   -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
1: openin
g mds log
   -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
discovering lo
g bounds
-9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
recover s
tart
-8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
read_head
-7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
194.109.43.12:6800/67277 -
- 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read 0~0]
1.844f3
494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
-6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
194.109.43.12:6800/67277 
== mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
20+0+0 (42
35168662 0 0) 0x1e93380 con 0x1e5d580
-5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
handle_subscribe_a
ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
-4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
ms_handle_connect on
 194.109.43.12:6802/53419
-3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
ms_handle_connect on
 194.109.43.13:6802/45791
-2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
ms_handle_connect on
 194.109.43.11:6800/16562
-1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
194.109.43.12:6800/67277 
== osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
[read 0~
0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
0) 0x1e4d
e00 con 0x1e5ddc0
 0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
function
'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
7fd1ba81f700 ti
me 2013-09-10 19:35:02.803673
mds/MDSTable.cc: 152: FAILED assert(r = 0)

 ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
 1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f) [0x77ce7f]
 2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
 3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
 4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
 5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
 6: (DispatchQueue::entry()+0x592) [0x92e432]
 7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
 8: (()+0x68ca) [0x7fd1bed298ca]
 9: (clone()+0x6d) [0x7fd1bda5cb6d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is
needed to interpret this.

When trying to mount CephFS, it just hangs now.  Sometimes, an MDS stays
up for a while, but will eventually crash again.  This CephFS was
created on 0.67 and I haven't done anything but mount and use it under
very light load in the mean time.

Any ideas, or if you need more info, let me know.  It would be nice to
get my data back, but I have backups too.

PS: Note the "No such file or directory" in the above logs.


   Regards,

  Oliver
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
On Tuesday, 10 September 2013 at 11:19 -0700, Samuel Just wrote:
 Can you post the rest of you crush map?
 -Sam
 

Yes :

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 device3
device 4 device4
device 5 device5
device 6 device6
device 7 device7
device 8 device8
device 9 device9
device 10 device10
device 11 device11
device 12 device12
device 13 device13
device 14 device14
device 15 device15
device 16 device16
device 17 device17
device 18 device18
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 device24
device 25 device25
device 26 device26
device 27 device27
device 28 device28
device 29 device29
device 30 device30
device 31 device31
device 32 device32
device 33 device33
device 34 device34
device 35 device35
device 36 device36
device 37 device37
device 38 device38
device 39 device39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 osd.52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77
device 78 osd.78

# types
type 0 osd
type 1 host
type 2 rack
type 3 net
type 4 room
type 5 datacenter
type 6 root

# buckets
host dragan {
id -17  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.70 weight 2.720
item osd.71 weight 2.720
item osd.72 weight 2.720
item osd.73 weight 2.720
item osd.74 weight 2.720
item osd.75 weight 2.720
item osd.76 weight 2.720
item osd.77 weight 2.720
item osd.78 weight 2.720
}
rack SAS15B01 {
id -40  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item dragan weight 24.480
}
net SAS188-165-15 {
id -72  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS15B01 weight 24.480
}
room SASs15 {
id -90  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS188-165-15 weight 24.480
}
datacenter SASrbx1 {
id -100 # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SASs15 weight 24.480
}
host taman {
id -16  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.49 weight 2.720
item osd.62 weight 2.720
item osd.63 weight 2.720
item osd.64 weight 2.720
item osd.65 weight 2.720
item osd.66 weight 2.720
item osd.67 weight 2.720
item osd.68 weight 2.720
item osd.69 weight 2.720
}
rack SAS31A10 {
id -15  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item taman weight 24.480
}
net SAS178-33-62 {
id -14  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS31A10 weight 24.480
}
room SASs31 {
id -13  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS178-33-62 weight 24.480
}
host kaino {
id -9   # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.40 weight 2.720
item osd.41 weight 2.720
item osd.42 weight 2.720
item osd.43 weight 2.720
item osd.44 weight 2.720
item osd.45 weight 2.720
item osd.46 weight 2.720
item osd.47 weight 2.720
item osd.48 weight 2.720
}
rack SAS34A14 {
id -10  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item kaino weight 24.480
}
net SAS5-135-135 {
id -11  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS34A14 weight 24.480
}
room SASs34 {
id -12  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS5-135-135 weight 24.480
}
datacenter SASrbx2 {
id -101 # do not change unnecessarily
# weight 48.960
alg straw
hash 0  # rjenkins1
 

Re: [ceph-users] status of glance/cinder/nova integration in openstack grizzly

2013-09-10 Thread Mike Dawson

Darren,

I can confirm Copy on Write (show_image_direct_url = True) does work in 
Grizzly.


It sounds like you are close. To check permissions, run 'ceph auth 
list', and reply with client.images and client.volumes (or whatever 
keys you use in Glance and Cinder).
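
For comparison, a commonly used caps layout for this kind of setup looks like
the following (a sketch; the client and pool names match the ones in this
thread, adjust to your own):

ceph auth caps client.images mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
ceph auth caps client.volumes mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'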


Cheers,
Mike Dawson


On 9/10/2013 10:12 AM, Darren Birkett wrote:

Hi All,

tl;dr - does glance/rbd and cinder/rbd play together nicely in grizzly?

I'm currently testing a ceph/rados back end with an openstack
installation.  I have the following things working OK:

1. cinder configured to create volumes in RBD
2. nova configured to boot from RBD backed cinder volumes (libvirt UUID
secret set etc)
3. glance configured to use RBD as a back end store for images

With this setup, when I create a bootable volume in cinder, passing an
id of an image in glance, the image gets downloaded, converted to raw,
and then created as an RBD object and made available to cinder.  The
correct metadata field for the cinder volume is populated
(volume_image_metadata) and so the cinder client marks the volume as
bootable.  This is all fine.

If I want to take advantage of the fact that both glance images and
cinder volumes are stored in RBD, I can add the following flag to the
glance-api.conf:

show_image_direct_url = True

This enables cinder to see that the glance image is stored in RBD, and
the cinder rbd driver then, instead of downloading the image and
creating an RBD image from it, just issues an 'rbd clone' command (seen
in the cinder-volume.log):

rbd clone --pool images --image dcb2f16d-a09d-4064-9198-1965274e214d
--snap snap --dest-pool volumes --dest
volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

This is all very nice, and the cinder volume is available immediately as
you'd expect.  The problem is that the metadata field is not populated
so it's not seen as bootable.  Even manually populating this field
leaves the volume unbootable.  The volume can not even be attached to
another instance for inspection.

libvirt doesn't seem to be able to access the rbd device. From
nova-compute.log:

qemu-system-x86_64: -drive
file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
error reading header from volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

qemu-system-x86_64: -drive
file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
could not open disk image
rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none:
Operation not permitted

It's almost like a permission issue, but my ceph/rbd knowledge is still
fledgeling.

I know that the cinder rbd driver has been rewritten to use librbd in
havana, and I'm wondering if this will change any of this behaviour?
  I'm also wondering if anyone has actually got this working with
grizzly, and how?

Many thanks
Darren




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
And for more visibility (I hope :D), here is the osd tree:


# id    weight  type name                       up/down reweight
-8      11.65   root SSDroot
-33     5.8       datacenter SSDrbx1
-32     5.8         room SSDs01
-31     5.8           net SSD188-165-15
-30     5.8             rack SSD01B04
-29     5.8               host skullface
50      0.9                 osd.50          up      1
51      0.85                osd.51          up      1
52      1.05                osd.52          up      1
53      1                   osd.53          up      1
54      1                   osd.54          up      1
55      1                   osd.55          up      1
-27     5.85      datacenter SSDrbx2
-34     5.85        room SSDs31
-35     5.85          net SSD5-135-134
-36     5.85            rack SSD31B22
-37     5.85              host myra
56      1.1                 osd.56          up      1
57      1.1                 osd.57          up      1
58      1                   osd.58          up      1
59      0.9                 osd.59          up      1
60      0.9                 osd.60          up      1
61      0.85                osd.61          up      1
-1      73.44   root SASroot
-100    24.48     datacenter SASrbx1
-90     24.48       room SASs15
-72     24.48         net SAS188-165-15
-40     24.48           rack SAS15B01
-17     24.48             host dragan
70      2.72                osd.70          up      1
71      2.72                osd.71          up      1
72      2.72                osd.72          up      1
73      2.72                osd.73          up      1
74      2.72                osd.74          up      1
75      2.72                osd.75          up      1
76      2.72                osd.76          up      1
77      2.72                osd.77          up      1
78      2.72                osd.78          up      1
-101    48.96     datacenter SASrbx2
-13     24.48       room SASs31
-14     24.48         net SAS178-33-62
-15     24.48           rack SAS31A10
-16     24.48             host taman
49      2.72                osd.49          up      1
62      2.72                osd.62          up      1
63      2.72                osd.63          up      1
64      2.72                osd.64          up      0
65      2.72                osd.65          down    0
66      2.72                osd.66          up      1
67      2.72                osd.67          up      1
68      2.72                osd.68          up      1
69      2.72                osd.69          up      1
-12     24.48       room SASs34
-11     24.48         net SAS5-135-135
-10     24.48           rack SAS34A14
-9      24.48             host kaino
40      2.72                osd.40          up      1
41      2.72                osd.41          up      1
42      2.72                osd.42          up      1
43      2.72                osd.43          up      1
44      2.72                osd.44          up      1
45      2.72                osd.45          up      1
46      2.72                osd.46          up      1
47      2.72                osd.47          up      1
48      2.72                osd.48          up      1

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet

I removed some garbage about hosts faude / rurkh / murmillia (they were
temporarily added because the cluster was full). So here is the clean CRUSH map:


# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3
device 4 device4
device 5 device5
device 6 device6
device 7 device7
device 8 device8
device 9 device9
device 10 device10
device 11 device11
device 12 device12
device 13 device13
device 14 device14
device 15 device15
device 16 device16
device 17 device17
device 18 device18
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 device24
device 25 device25
device 26 device26
device 27 device27
device 28 device28
device 29 device29
device 30 device30
device 31 device31
device 32 device32
device 33 device33
device 34 device34
device 35 device35
device 36 device36
device 37 device37
device 38 device38
device 39 device39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 osd.52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77
device 78 osd.78

# types
type 0 osd
type 1 host
type 2 rack
type 3 net
type 4 room
type 5 datacenter
type 6 root

# buckets
host dragan {
id -17  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.70 weight 2.720
item osd.71 weight 2.720
item osd.72 weight 2.720
item osd.73 weight 2.720
item osd.74 weight 2.720
item osd.75 weight 2.720
item osd.76 weight 2.720
item osd.77 weight 2.720
item osd.78 weight 2.720
}
rack SAS15B01 {
id -40  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item dragan weight 24.480
}
net SAS188-165-15 {
id -72  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS15B01 weight 24.480
}
room SASs15 {
id -90  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS188-165-15 weight 24.480
}
datacenter SASrbx1 {
id -100 # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SASs15 weight 24.480
}
host taman {
id -16  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.49 weight 2.720
item osd.62 weight 2.720
item osd.63 weight 2.720
item osd.64 weight 2.720
item osd.65 weight 2.720
item osd.66 weight 2.720
item osd.67 weight 2.720
item osd.68 weight 2.720
item osd.69 weight 2.720
}
rack SAS31A10 {
id -15  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item taman weight 24.480
}
net SAS178-33-62 {
id -14  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS31A10 weight 24.480
}
room SASs31 {
id -13  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS178-33-62 weight 24.480
}
host kaino {
id -9   # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.40 weight 2.720
item osd.41 weight 2.720
item osd.42 weight 2.720
item osd.43 weight 2.720
item osd.44 weight 2.720
item osd.45 weight 2.720
item osd.46 weight 2.720
item osd.47 weight 2.720
item osd.48 weight 2.720
}
rack SAS34A14 {
id -10  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item kaino weight 24.480
}
net SAS5-135-135 {
id -11  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS34A14 weight 24.480
}
room SASs34 {
id -12  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS5-135-135 weight 24.480
}
datacenter SASrbx2 {
id -101 # do not change unnecessarily
# weight 48.960
alg straw
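
For reference, the usual round-trip to edit and re-inject a CRUSH map
like this one (standard ceph/crushtool commands; the file names are just
examples):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# ... edit crush.txt ...
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new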

Re: [ceph-users] status of glance/cinder/nova integration in openstack grizzly

2013-09-10 Thread Darren Birkett
Hi Mike,

Thanks - glad to hear it definitely works as expected!  Here's my
client.glance and client.volumes from 'ceph auth list':

client.glance
key: AQAWFi9SOKzAABAAPV1ZrpWkx72tmJ5E7nOi3A==
caps: [mon] allow r
caps: [osd] allow rwx pool=images, allow class-read object_prefix
rbd_children
client.volumes
key: AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx
pool=volumes

Thanks
Darren


On 10 September 2013 20:08, Mike Dawson mike.daw...@cloudapt.com wrote:

 Darren,

 I can confirm Copy on Write (show_image_direct_url = True) does work in
 Grizzly.

 It sounds like you are close. To check permissions, run 'ceph auth list',
 and reply with client.images and client.volumes (or whatever keys you
 use in Glance and Cinder).

 Cheers,
 Mike Dawson



 On 9/10/2013 10:12 AM, Darren Birkett wrote:

 Hi All,

 tl;dr - does glance/rbd and cinder/rbd play together nicely in grizzly?

 I'm currently testing a ceph/rados back end with an openstack
 installation.  I have the following things working OK:

 1. cinder configured to create volumes in RBD
 2. nova configured to boot from RBD backed cinder volumes (libvirt UUID
 secret set etc)
 3. glance configured to use RBD as a back end store for images

 With this setup, when I create a bootable volume in cinder, passing an
 id of an image in glance, the image gets downloaded, converted to raw,
 and then created as an RBD object and made available to cinder.  The
 correct metadata field for the cinder volume is populated
 (volume_image_metadata) and so the cinder client marks the volume as
 bootable.  This is all fine.

 If I want to take advantage of the fact that both glance images and
 cinder volumes are stored in RBD, I can add the following flag to the
 glance-api.conf:

 show_image_direct_url = True

 This enables cinder to see that the glance image is stored in RBD, and
 the cinder rbd driver then, instead of downloading the image and
 creating an RBD image from it, just issues an 'rbd clone' command (seen
 in the cinder-volume.log):

 rbd clone --pool images --image dcb2f16d-a09d-4064-9198-1965274e214d
 --snap snap --dest-pool volumes --dest
 volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

 This is all very nice, and the cinder volume is available immediately as
 you'd expect.  The problem is that the metadata field is not populated
 so it's not seen as bootable.  Even manually populating this field
 leaves the volume unbootable.  The volume can not even be attached to
 another instance for inspection.

 libvirt doesn't seem to be able to access the rbd device. From
 nova-compute.log:

 qemu-system-x86_64: -drive
 file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
 error reading header from volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

 qemu-system-x86_64: -drive
 file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
 could not open disk image
 rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none:
 Operation not permitted

 It's almost like a permission issue, but my ceph/rbd knowledge is still
 fledgeling.

 I know that the cinder rbd driver has been rewritten to use librbd in
 havana, and I'm wondering if this will change any of this behaviour?
   I'm also wondering if anyone has actually got this working with
 grizzly, and how?

 Many thanks
 Darren





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] quick-ceph-deploy

2013-09-10 Thread Tamil Muthamizhan
Hi Sriram,

this should help: http://ceph.com/docs/master/install/rpm/
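
In short, the RHEL-side equivalent of those apt steps is to import the
release key and add a ceph.repo before installing with yum; roughly (a
sketch for EL6 and dumpling, double-check the exact baseurl against the
doc above):

sudo rpm --import 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
sudo tee /etc/yum.repos.d/ceph.repo <<'EOF'
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-dumpling/el6/noarch
enabled=1
gpgcheck=1
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
EOF
sudo yum update && sudo yum install ceph-deploy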

Regards,
Tamil


On Tue, Sep 10, 2013 at 12:55 PM, sriram sriram@gmail.com wrote:

 Can someone tell me the equivalent steps in RHEL for the steps below -

 wget -q -O- 
 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo 
 apt-key add -
 echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main | sudo tee 
 /etc/apt/sources.list.d/ceph.list
 sudo apt-get update
 sudo apt-get install ceph-deploy



 On Tue, Sep 10, 2013 at 12:40 PM, sriram sriram@gmail.com wrote:

 Any help here is appreciated. I am pretty much stuck in trying to install
 ceph on my local box.


 On Tue, Sep 10, 2013 at 11:02 AM, sriram sriram@gmail.com wrote:

 Yes I am able to do that.


 On Fri, Sep 6, 2013 at 8:19 AM, Alfredo Deza 
 alfredo.d...@inktank.comwrote:

 On Fri, Sep 6, 2013 at 11:05 AM, sriram sriram@gmail.com wrote:
  sudo su -c 'rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  error:
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: key
  1 import failed.
 

 Can you actually get to that URL and see the GPG key?

 Via curl/wget or the browser (if you have one in that host)
 
  On Fri, Sep 6, 2013 at 8:01 AM, Alfredo Deza 
 alfredo.d...@inktank.com
  wrote:
 
  On Fri, Sep 6, 2013 at 10:54 AM, sriram sriram@gmail.com
 wrote:
   I am running it on the same machine.
  
   [abc@abc-ld ~]$ ceph-deploy install abc-ld
  
  
   On Fri, Sep 6, 2013 at 5:42 AM, Alfredo Deza 
 alfredo.d...@inktank.com
   wrote:
  
   On Thu, Sep 5, 2013 at 8:25 PM, sriram sriram@gmail.com
 wrote:
I am trying to deploy ceph reading the instructions from this
 link.
   
http://ceph.com/docs/master/start/quick-ceph-deploy/
   
I get the error below. Can someone let me know if this is
 something
related
to what I am doing wrong or the script?
   
[abc@abc-ld ~]$ ceph-deploy install abc-ld
[ceph_deploy.install][DEBUG ] Installing stable version
 dumpling on
cluster
ceph hosts abc-ld
[ceph_deploy.install][DEBUG ] Detecting platform for host
 abc-ld ...
[sudo] password for abc:
[ceph_deploy.install][INFO  ] Distro info:
RedHatEnterpriseWorkstation
6.1
Santiago
[abc-ld][INFO  ] installing ceph on abc-ld
[abc-ld][INFO  ] Running command: su -c 'rpm --import

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
[abc-ld][ERROR ] Traceback (most recent call last):
[abc-ld][ERROR ]   File
   
   
 /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py,
line
21, in install
[abc-ld][ERROR ]   File
   
 /usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py,
line
10,
in inner
[abc-ld][ERROR ] def inner(*args, **kwargs):
[abc-ld][ERROR ]   File
   
 /usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py, line
6,
in
remote_call
[abc-ld][ERROR ] This allows us to only remote-execute the
 actual
calls,
not whole functions.
[abc-ld][ERROR ]   File /usr/lib64/python2.6/subprocess.py,
 line
502,
in
check_call
[abc-ld][ERROR ] raise CalledProcessError(retcode, cmd)
[abc-ld][ERROR ] CalledProcessError: Command '['su -c \'rpm
 --import
   

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc\'']'
returned non-zero exit status 1
[abc-ld][ERROR ] error:
   
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: key
1
import failed.
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
 su -c
'rpm
--import

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  
   Can you try running that command on the host that it failed (I
 think
   that would be abc-ld)
   and paste the output?
 
  I mean, to run the actual command (from the log output) that caused
 the
  failure.
 
  In your case, it would be:
 
  rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;
 
  
   For some reason that `rpm --import` failed. Could be network
 related.
  
   
   
   
  
  
 
 









-- 
Regards,
Tamil
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] quick-ceph-deploy

2013-09-10 Thread sriram
Can someone tell me the equivalent steps in RHEL for the steps below -

wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
| sudo apt-key add -
echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main |
sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update
sudo apt-get install ceph-deploy



On Tue, Sep 10, 2013 at 12:40 PM, sriram sriram@gmail.com wrote:

 Any help here is appreciated. I am pretty much stuck in trying to install
 ceph on my local box.


 On Tue, Sep 10, 2013 at 11:02 AM, sriram sriram@gmail.com wrote:

 Yes I am able to do that.


 On Fri, Sep 6, 2013 at 8:19 AM, Alfredo Deza alfredo.d...@inktank.comwrote:

 On Fri, Sep 6, 2013 at 11:05 AM, sriram sriram@gmail.com wrote:
  sudo su -c 'rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  error:
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: key
  1 import failed.
 

 Can you actually get to that URL and see the GPG key?

 Via curl/wget or the browser (if you have one in that host)
 
  On Fri, Sep 6, 2013 at 8:01 AM, Alfredo Deza alfredo.d...@inktank.com
 
  wrote:
 
  On Fri, Sep 6, 2013 at 10:54 AM, sriram sriram@gmail.com wrote:
   I am running it on the same machine.
  
   [abc@abc-ld ~]$ ceph-deploy install abc-ld
  
  
   On Fri, Sep 6, 2013 at 5:42 AM, Alfredo Deza 
 alfredo.d...@inktank.com
   wrote:
  
   On Thu, Sep 5, 2013 at 8:25 PM, sriram sriram@gmail.com
 wrote:
I am trying to deploy ceph reading the instructions from this
 link.
   
http://ceph.com/docs/master/start/quick-ceph-deploy/
   
I get the error below. Can someone let me know if this is
 something
related
to what I am doing wrong or the script?
   
[abc@abc-ld ~]$ ceph-deploy install abc-ld
[ceph_deploy.install][DEBUG ] Installing stable version dumpling
 on
cluster
ceph hosts abc-ld
[ceph_deploy.install][DEBUG ] Detecting platform for host abc-ld
 ...
[sudo] password for abc:
[ceph_deploy.install][INFO  ] Distro info:
RedHatEnterpriseWorkstation
6.1
Santiago
[abc-ld][INFO  ] installing ceph on abc-ld
[abc-ld][INFO  ] Running command: su -c 'rpm --import

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
[abc-ld][ERROR ] Traceback (most recent call last):
[abc-ld][ERROR ]   File
   
   
 /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py,
line
21, in install
[abc-ld][ERROR ]   File
   
 /usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py,
line
10,
in inner
[abc-ld][ERROR ] def inner(*args, **kwargs):
[abc-ld][ERROR ]   File
/usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py,
 line
6,
in
remote_call
[abc-ld][ERROR ] This allows us to only remote-execute the
 actual
calls,
not whole functions.
[abc-ld][ERROR ]   File /usr/lib64/python2.6/subprocess.py,
 line
502,
in
check_call
[abc-ld][ERROR ] raise CalledProcessError(retcode, cmd)
[abc-ld][ERROR ] CalledProcessError: Command '['su -c \'rpm
 --import
   

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc\'']'
returned non-zero exit status 1
[abc-ld][ERROR ] error:
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc:
 key
1
import failed.
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
 su -c
'rpm
--import

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  
   Can you try running that command on the host that it failed (I
 think
   that would be abc-ld)
   and paste the output?
 
  I mean, to run the actual command (from the log output) that caused
 the
  failure.
 
  In your case, it would be:
 
  rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;
 
  
   For some reason that `rpm --import` failed. Could be network
 related.
  
   
   
   
  
  
 
 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] status of glance/cinder/nova integration in openstack grizzly

2013-09-10 Thread Mike Dawson


On 9/10/2013 4:50 PM, Darren Birkett wrote:

Hi Mike,

That led me to realise what the issue was.  My cinder (volumes) client
did not have the correct perms on the images pool.  I ran the following
to update the perms for that client:

ceph auth caps client.volumes mon 'allow r' osd 'allow class-read
object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'

...and was then able to successfully boot an instance from a cinder
volume that was created by cloning a glance image from the images pool!

Glad you found it. This has been a sticking point for several people.
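
If you want to double-check that a caps change like that actually took
effect, something like this works (a sketch):

ceph auth get client.volumes
# the osd caps line should now include: allow rx pool=images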



One last question: I presume the fact that the 'volume_image_metadata'
field is not populated when cloning a glance image into a cinder volume
is a bug?  It means that the cinder client doesn't show the volume as
bootable, though I'm not sure what other detrimental effect it actually
has (clearly the volume can be booted from).

I think you are talking about data in the cinder table of your database
backend (mysql?). I don't have 'volume_image_metadata' at all here. I
don't think this is the issue.


To create a Cinder volume from Glance, I do something like:

cinder --os_tenant_name MyTenantName create --image-id 
00e0042e-d007-400a-918a-d5e00cea8b0f --display-name MyVolumeName 40


I can then spin up an instance backed by MyVolumeName and boot as expected.
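
For example, something along these lines to boot from it (flag syntax
from the Grizzly-era novaclient, so treat it as a sketch and adjust the
flavor and IDs to your setup):

nova --os_tenant_name MyTenantName boot MyInstanceName \
  --flavor m1.small \
  --block-device-mapping vda=<volume-uuid>:::0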



Thanks
Darren


On 10 September 2013 21:04, Darren Birkett darren.birk...@gmail.com
mailto:darren.birk...@gmail.com wrote:

Hi Mike,

Thanks - glad to hear it definitely works as expected!  Here's my
client.glance and client.volumes from 'ceph auth list':

client.glance
key: AQAWFi9SOKzAABAAPV1ZrpWkx72tmJ5E7nOi3A==
caps: [mon] allow r
caps: [osd] allow rwx pool=images, allow class-read object_prefix
rbd_children
client.volumes
key: AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx
pool=volumes

Thanks
Darren


On 10 September 2013 20:08, Mike Dawson mike.daw...@cloudapt.com
mailto:mike.daw...@cloudapt.com wrote:

Darren,

I can confirm Copy on Write (show_image_direct_url = True) does
work in Grizzly.

It sounds like you are close. To check permissions, run 'ceph
auth list', and reply with client.images and client.volumes
(or whatever keys you use in Glance and Cinder).

Cheers,
Mike Dawson



On 9/10/2013 10:12 AM, Darren Birkett wrote:

Hi All,

tl;dr - does glance/rbd and cinder/rbd play together nicely
in grizzly?

I'm currently testing a ceph/rados back end with an openstack
installation.  I have the following things working OK:

1. cinder configured to create volumes in RBD
2. nova configured to boot from RBD backed cinder volumes
(libvirt UUID
secret set etc)
3. glance configured to use RBD as a back end store for images

With this setup, when I create a bootable volume in cinder,
passing an
id of an image in glance, the image gets downloaded,
converted to raw,
and then created as an RBD object and made available to
cinder.  The
correct metadata field for the cinder volume is populated
(volume_image_metadata) and so the cinder client marks the
volume as
bootable.  This is all fine.

If I want to take advantage of the fact that both glance
images and
cinder volumes are stored in RBD, I can add the following
flag to the
glance-api.conf:

show_image_direct_url = True

This enables cinder to see that the glance image is stored
in RBD, and
the cinder rbd driver then, instead of downloading the image and
creating an RBD image from it, just issues an 'rbd clone'
command (seen
in the cinder-volume.log):

rbd clone --pool images --image
dcb2f16d-a09d-4064-9198-1965274e214d
--snap snap --dest-pool volumes --dest
volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

This is all very nice, and the cinder volume is available
immediately as
you'd expect.  The problem is that the metadata field is not
populated
so it's not seen as bootable.  Even manually populating this
field
leaves the volume unbootable.  The volume can not even be
attached to
another instance for inspection.

libvirt doesn't seem to be able to access the rbd device. From
nova-compute.log:

qemu-system-x86_64: -drive


Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
It's not an upgrade issue. There's an MDS object that is somehow
missing. If it exists, then on restart you'll be fine.

Oliver, what is your general cluster config? What filesystem are your
OSDs running on? What version of Ceph were you upgrading from? There's
really no way for this file to not exist once created unless the
underlying FS ate it or the last write both was interrupted and hit
some kind of bug in our transaction code (of which none are known)
during replay.
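
A quick way to check whether the object is really gone from RADOS
(assuming the default pool name 'metadata') is something like:

rados -p metadata stat mds_anchortable
rados -p metadata ls | grep -i table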
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
 This is scary. Should I hold on upgrade?

 On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:

Hey Gregory,

On 10-09-13 20:21, Gregory Farnum wrote:
 On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
wrote:
 Hey list,

 I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
 cluster was:
 - Unmount CephFS everywhere.
 - Upgrade the Ceph-packages.
 - Restart MON.
 - Restart OSD.
 - Restart MDS.

 As soon as I got to the second node, the MDS crashed right after
startup.

 Part of the logs (more on request):

 - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
 0~0] 1.d902
 70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
-11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
 1: openin
 g mds log
-10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
 discovering lo
 g bounds
 -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
 recover s
 tart
 -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
 read_head
 -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 -
 - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
0~0]
 1.844f3
 494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
 -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
 20+0+0 (42
 35168662 0 0) 0x1e93380 con 0x1e5d580
 -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
 handle_subscribe_a
 ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10
19:37:32.796448
 -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.12:6802/53419
 -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.13:6802/45791
 -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.11:6800/16562
 -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
 [read 0~
 0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
 0) 0x1e4d
 e00 con 0x1e5ddc0
  0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
 function
 'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
 7fd1ba81f700 ti
 me 2013-09-10 19:35:02.803673
 mds/MDSTable.cc: 152: FAILED assert(r >= 0)

  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
  1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f)
[0x77ce7f]
  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
  6: (DispatchQueue::entry()+0x592) [0x92e432]
  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
  8: (()+0x68ca) [0x7fd1bed298ca]
  9: (clone()+0x6d) [0x7fd1bda5cb6d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.

 When trying to mount CephFS, it just hangs now.  Sometimes, an MDS
stays
 up for a while, but will eventually crash again.  This CephFS was
 created on 0.67 and I haven't done anything but mount and use it under
 very light load in the mean time.

 Any ideas, or if you need more info, let me know.  It would be nice to
 get my data back, but I have backups too.

 Does the filesystem have any data in it? Every time we've seen this
 error it's been on an empty cluster which had some weird issue with
 startup.

This one certainly had some data on it, yes.  A couple of 100's of GBs
of disk-images and a couple of trees of smaller files.  Most of them
accessed very rarely since being copied on.


   Regards,

  Oliver
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] status of glance/cinder/nova integration in openstack grizzly

2013-09-10 Thread Josh Durgin

On 09/10/2013 01:50 PM, Darren Birkett wrote:

One last question: I presume the fact that the 'volume_image_metadata'
field is not populated when cloning a glance image into a cinder volume
is a bug?  It means that the cinder client doesn't show the volume as
bootable, though I'm not sure what other detrimental effect it actually
has (clearly the volume can be booted from).


I think this is populated in Havana, but nothing actually uses that
field still afaik. It's just a proxy for 'was this volume created from
an image'.

Josh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
Also, can you scrub the PG which contains the mds_anchortable object
and see if anything comes up? You should be able to find the key from
the logs (in the osd_op line that contains mds_anchortable) and
convert that into the PG. Or you can just scrub all of osd 2.
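
Something like this, for example (the pg id here is just a placeholder,
substitute the one you work out from the log):

ceph pg scrub 1.a7     # just the PG that should hold mds_anchortable
ceph osd scrub 2       # or scrub every PG on osd.2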
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum g...@inktank.com wrote:
 It's not an upgrade issue. There's an MDS object that is somehow
 missing. If it exists, then on restart you'll be fine.

 Oliver, what is your general cluster config? What filesystem are your
 OSDs running on? What version of Ceph were you upgrading from? There's
 really no way for this file to not exist once created unless the
 underlying FS ate it or the last write both was interrupted and hit
 some kind of bug in our transaction code (of which none are known)
 during replay.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
 This is scary. Should I hold on upgrade?

 On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:

Hey Gregory,

On 10-09-13 20:21, Gregory Farnum wrote:
 On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
wrote:
 Hey list,

 I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
 cluster was:
 - Unmount CephFS everywhere.
 - Upgrade the Ceph-packages.
 - Restart MON.
 - Restart OSD.
 - Restart MDS.

 As soon as I got to the second node, the MDS crashed right after
startup.

 Part of the logs (more on request):

 - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
 0~0] 1.d902
 70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
-11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
 1: openin
 g mds log
-10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
 discovering lo
 g bounds
 -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
 recover s
 tart
 -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
 read_head
 -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 -
 - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
0~0]
 1.844f3
 494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
 -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
 20+0+0 (42
 35168662 0 0) 0x1e93380 con 0x1e5d580
 -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
 handle_subscribe_a
 ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10
19:37:32.796448
 -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.12:6802/53419
 -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.13:6802/45791
 -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.11:6800/16562
 -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
 [read 0~
 0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
 0) 0x1e4d
 e00 con 0x1e5ddc0
  0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
 function
 'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
 7fd1ba81f700 ti
 me 2013-09-10 19:35:02.803673
 mds/MDSTable.cc: 152: FAILED assert(r >= 0)

  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
  1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f)
[0x77ce7f]
  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
  6: (DispatchQueue::entry()+0x592) [0x92e432]
  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
  8: (()+0x68ca) [0x7fd1bed298ca]
  9: (clone()+0x6d) [0x7fd1bda5cb6d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.

 When trying to mount CephFS, it just hangs now.  Sometimes, an MDS
stays
 up for a while, but will eventually crash again.  This CephFS was
 created on 0.67 and I haven't done anything but mount and use it under
 very light load in the mean time.

 Any ideas, or if you need more info, let me know.  It would be nice to
 get my data back, but I have backups too.

 Does the filesystem have any data in it? Every time we've seen this
 error it's been on an empty cluster which had some weird issue with
 startup.

This one certainly had some data on it, yes.  A couple of 100's of GBs
of disk-images and a couple of trees of smaller files.  Most of them
accessed very rarely since being copied on.


   Regards,

  Oliver
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Liu, Larry
This is scary. Should I hold off on upgrading?

On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:

Hey Gregory,

On 10-09-13 20:21, Gregory Farnum wrote:
 On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
wrote:
 Hey list,

 I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
 cluster was:
 - Unmount CephFS everywhere.
 - Upgrade the Ceph-packages.
 - Restart MON.
 - Restart OSD.
 - Restart MDS.

 As soon as I got to the second node, the MDS crashed right after
startup.

 Part of the logs (more on request):

 - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
 0~0] 1.d902
 70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
-11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
 1: openin
 g mds log
-10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
 discovering lo
 g bounds
 -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
 recover s
 tart
 -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
 read_head
 -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 -
 - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
0~0]
 1.844f3
 494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
 -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
 20+0+0 (42
 35168662 0 0) 0x1e93380 con 0x1e5d580
 -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
 handle_subscribe_a
 ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10
19:37:32.796448
 -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.12:6802/53419
 -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.13:6802/45791
 -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.11:6800/16562
 -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
 [read 0~
 0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
 0) 0x1e4d
 e00 con 0x1e5ddc0
  0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
 function
 'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
 7fd1ba81f700 ti
 me 2013-09-10 19:35:02.803673
 mds/MDSTable.cc: 152: FAILED assert(r >= 0)

  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
  1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f)
[0x77ce7f]
  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
  6: (DispatchQueue::entry()+0x592) [0x92e432]
  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
  8: (()+0x68ca) [0x7fd1bed298ca]
  9: (clone()+0x6d) [0x7fd1bda5cb6d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.

 When trying to mount CephFS, it just hangs now.  Sometimes, an MDS
stays
 up for a while, but will eventually crash again.  This CephFS was
 created on 0.67 and I haven't done anything but mount and use it under
 very light load in the mean time.

 Any ideas, or if you need more info, let me know.  It would be nice to
 get my data back, but I have backups too.
 
 Does the filesystem have any data in it? Every time we've seen this
 error it's been on an empty cluster which had some weird issue with
 startup.

This one certainly had some data on it, yes.  A couple of 100's of GBs
of disk-images and a couple of trees of smaller files.  Most of them
accessed very rarely since being copied on.


   Regards,

  Oliver
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] status of glance/cinder/nova integration in openstack grizzly

2013-09-10 Thread Darren Birkett
Hi Mike,

That led me to realise what the issue was.  My cinder (volumes) client did
not have the correct perms on the images pool.  I ran the following to
update the perms for that client:

ceph auth caps client.volumes mon 'allow r' osd 'allow class-read
object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'

...and was then able to successfully boot an instance from a cinder volume
that was created by cloning a glance image from the images pool!
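
For anyone wanting to double check that the volume really is a COW clone
rather than a full copy, 'rbd info' on it should show a parent line
pointing back at the glance image, e.g. (IDs from the log above, output
abridged):

rbd -p volumes info volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d
# a clone shows something like:
#   parent: images/dcb2f16d-a09d-4064-9198-1965274e214d@snap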

One last question: I presume the fact that the 'volume_image_metadata'
field is not populated when cloning a glance image into a cinder volume is
a bug?  It means that the cinder client doesn't show the volume as
bootable, though I'm not sure what other detrimental effect it actually has
(clearly the volume can be booted from).

Thanks
Darren


On 10 September 2013 21:04, Darren Birkett darren.birk...@gmail.com wrote:

 Hi Mike,

 Thanks - glad to hear it definitely works as expected!  Here's my
 client.glance and client.volumes from 'ceph auth list':

 client.glance
  key: AQAWFi9SOKzAABAAPV1ZrpWkx72tmJ5E7nOi3A==
 caps: [mon] allow r
 caps: [osd] allow rwx pool=images, allow class-read object_prefix
 rbd_children
 client.volumes
 key: AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==
 caps: [mon] allow r
 caps: [osd] allow class-read object_prefix rbd_children, allow rwx
 pool=volumes

 Thanks
 Darren


 On 10 September 2013 20:08, Mike Dawson mike.daw...@cloudapt.com wrote:

 Darren,

 I can confirm Copy on Write (show_image_direct_url = True) does work in
 Grizzly.

 It sounds like you are close. To check permissions, run 'ceph auth list',
 and reply with client.images and client.volumes (or whatever keys you
 use in Glance and Cinder).

 Cheers,
 Mike Dawson



 On 9/10/2013 10:12 AM, Darren Birkett wrote:

 Hi All,

 tl;dr - does glance/rbd and cinder/rbd play together nicely in grizzly?

 I'm currently testing a ceph/rados back end with an openstack
 installation.  I have the following things working OK:

 1. cinder configured to create volumes in RBD
 2. nova configured to boot from RBD backed cinder volumes (libvirt UUID
 secret set etc)
 3. glance configured to use RBD as a back end store for images

 With this setup, when I create a bootable volume in cinder, passing an
 id of an image in glance, the image gets downloaded, converted to raw,
 and then created as an RBD object and made available to cinder.  The
 correct metadata field for the cinder volume is populated
 (volume_image_metadata) and so the cinder client marks the volume as
 bootable.  This is all fine.

 If I want to take advantage of the fact that both glance images and
 cinder volumes are stored in RBD, I can add the following flag to the
 glance-api.conf:

 show_image_direct_url = True

 This enables cinder to see that the glance image is stored in RBD, and
 the cinder rbd driver then, instead of downloading the image and
 creating an RBD image from it, just issues an 'rbd clone' command (seen
 in the cinder-volume.log):

 rbd clone --pool images --image dcb2f16d-a09d-4064-9198-1965274e214d
 --snap snap --dest-pool volumes --dest
 volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

 This is all very nice, and the cinder volume is available immediately as
 you'd expect.  The problem is that the metadata field is not populated
 so it's not seen as bootable.  Even manually populating this field
 leaves the volume unbootable.  The volume can not even be attached to
 another instance for inspection.

 libvirt doesn't seem to be able to access the rbd device. From
 nova-compute.log:

 qemu-system-x86_64: -drive
 file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
 error reading header from volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d

 qemu-system-x86_64: -drive
 file=rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,serial=20987f9d-b4fb-463d-8b8f-fa667bd47c6d,cache=none:
 could not open disk image
 rbd:volumes/volume-20987f9d-b4fb-463d-8b8f-fa667bd47c6d:id=volumes:key=AQAnAy9ScPB4IRAAtxD/V1rDciqFiT9AMPPr+A==:auth_supported=cephx\;none:
 Operation not permitted

 It's almost like a permission issue, but my ceph/rbd knowledge is still
 fledgeling.

 I know that the cinder rbd driver has been rewritten to use librbd in
 havana, and I'm wondering if this will change any of this behaviour?
   I'm also wondering if anyone has actually got this working with
 grizzly, and how?

 Many thanks
 Darren






Re: [ceph-users] quick-ceph-deploy

2013-09-10 Thread Alfredo Deza
On Tue, Sep 10, 2013 at 4:33 PM, sriram sriram@gmail.com wrote:
 I had followed that to install ceph-deploy but then I am not sure if there
 is a difference between -

 INSTALL-CEPH-DEPLOY described in
 http://ceph.com/docs/master/start/quick-start-preflight/

The preflight seems to cover installation of ceph-deploy with
Debian-based distros, not RPM ones.

 and

 INSTALLING CEPH DEPLOY described in http://ceph.com/docs/master/install/rpm/


That guide goes into detail about what you need to do to add the ceph
repos to your system so that you can install ceph-deploy via yum.

 The reason ceph-deploy install mon-ceph-node seems to fail is related to
 accessing 
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc. The
 http://ceph.com/docs/master/start/quick-start-preflight/ has info about it
 while the second link does not. Hence my question.

The second link (with instructions for RPM) does talk about the
release key in the first section:

http://ceph.com/docs/master/install/rpm/#install-release-key

Have you tried accessing that URL from the host that is failing?

Can you use curl/wget to grab that url?
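
For example (a sketch; fetching the key to a local file first also rules
out problems with rpm doing the HTTPS fetch itself):

curl -v -o release.asc 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
sudo rpm --import ./release.asc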


 On Tue, Sep 10, 2013 at 1:21 PM, Tamil Muthamizhan
 tamil.muthamiz...@inktank.com wrote:

 Hi Sriram,

 this should help: http://ceph.com/docs/master/install/rpm/

 Regards,
 Tamil


 On Tue, Sep 10, 2013 at 12:55 PM, sriram sriram@gmail.com wrote:

 Can someone tell me the equivalent steps in RHEL for the steps below -

 wget -q -O-
 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo
 apt-key add -
 echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main | sudo
 tee /etc/apt/sources.list.d/ceph.list
 sudo apt-get update
 sudo apt-get install ceph-deploy



 On Tue, Sep 10, 2013 at 12:40 PM, sriram sriram@gmail.com wrote:

 Any help here is appreciated. I am pretty much stuck in trying to
 install ceph on my local box.


 On Tue, Sep 10, 2013 at 11:02 AM, sriram sriram@gmail.com wrote:

 Yes I am able to do that.


 On Fri, Sep 6, 2013 at 8:19 AM, Alfredo Deza alfredo.d...@inktank.com
 wrote:

 On Fri, Sep 6, 2013 at 11:05 AM, sriram sriram@gmail.com wrote:
  sudo su -c 'rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  error:
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: key
  1 import failed.
 

 Can you actually get to that URL and see the GPG key?

 Via curl/wget or the browser (if you have one in that host)
 
  On Fri, Sep 6, 2013 at 8:01 AM, Alfredo Deza
  alfredo.d...@inktank.com
  wrote:
 
  On Fri, Sep 6, 2013 at 10:54 AM, sriram sriram@gmail.com
  wrote:
   I am running it on the same machine.
  
   [abc@abc-ld ~]$ ceph-deploy install abc-ld
  
  
   On Fri, Sep 6, 2013 at 5:42 AM, Alfredo Deza
   alfredo.d...@inktank.com
   wrote:
  
   On Thu, Sep 5, 2013 at 8:25 PM, sriram sriram@gmail.com
   wrote:
I am trying to deploy ceph reading the instructions from this
link.
   
http://ceph.com/docs/master/start/quick-ceph-deploy/
   
I get the error below. Can someone let me know if this is
something
related
to what I am doing wrong or the script?
   
[abc@abc-ld ~]$ ceph-deploy install abc-ld
[ceph_deploy.install][DEBUG ] Installing stable version
dumpling on
cluster
ceph hosts abc-ld
[ceph_deploy.install][DEBUG ] Detecting platform for host
abc-ld ...
[sudo] password for abc:
[ceph_deploy.install][INFO  ] Distro info:
RedHatEnterpriseWorkstation
6.1
Santiago
[abc-ld][INFO  ] installing ceph on abc-ld
[abc-ld][INFO  ] Running command: su -c 'rpm --import
   
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
[abc-ld][ERROR ] Traceback (most recent call last):
[abc-ld][ERROR ]   File
   
   
/usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py,
line
21, in install
[abc-ld][ERROR ]   File
   
/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py,
line
10,
in inner
[abc-ld][ERROR ] def inner(*args, **kwargs):
[abc-ld][ERROR ]   File
   
/usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py, 
line
6,
in
remote_call
[abc-ld][ERROR ] This allows us to only remote-execute the
actual
calls,
not whole functions.
[abc-ld][ERROR ]   File /usr/lib64/python2.6/subprocess.py,
line
502,
in
check_call
[abc-ld][ERROR ] raise CalledProcessError(retcode, cmd)
[abc-ld][ERROR ] CalledProcessError: Command '['su -c \'rpm
--import
   
   
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc\'']'
returned non-zero exit status 1
[abc-ld][ERROR ] error:
   
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc:
 key
1
import failed.
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
su -c
'rpm
--import
   

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
If the problem is somewhere in RADOS/xfs/whatever, then there's a good
chance that the mds_anchortable object exists in its replica OSDs,
but when listing objects those aren't queried, so they won't show up
in a listing. You can use the osdmaptool to map from an object name to
the PG it would show up in, or if you look at your log you should see
a line something like
1 -- LOCAL IP -- OTHER IP -- osd_op(mds.0.31:3 mds_anchortable
[read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0
In this example, metadata is pool 1 and 1.a977f6a7 is the hash of the
msd_anchortable object, and depending on how many PGs are in the pool
it will be in pg 1.a7, or 1.6a7, or 1.f6a7...
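
An even quicker way is to ask the cluster for the mapping directly; the
placement is purely name-based, so this works even though the object
itself is missing. Assuming the pool is called metadata, something like:

ceph osd map metadata mds_anchortable
# prints roughly:
#   osdmap eN pool 'metadata' (1) object 'mds_anchortable' -> pg 1.a977f6a7 (1.a7) -> up [...] acting [...]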
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey oli...@xs4all.nl wrote:
 Hey Gregory,

 The only objects containing table I can find at all, are in the
 metadata-pool:
 # rados --pool=metadata ls | grep -i table
 mds0_inotable

 Looking at another cluster where I use CephFS, there is indeed an object
 named mds_anchortable, but the broken cluster is missing it.  I don't
 see how I can scrub the PG for an object that doesn't appear to exist.
 Please elaborate.


Regards,

  Oliver

 On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
 Also, can you scrub the PG which contains the mds_anchortable object
 and see if anything comes up? You should be able to find the key from
 the logs (in the osd_op line that contains mds_anchortable) and
 convert that into the PG. Or you can just scrub all of osd 2.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum g...@inktank.com wrote:
  It's not an upgrade issue. There's an MDS object that is somehow
  missing. If it exists, then on restart you'll be fine.
 
  Oliver, what is your general cluster config? What filesystem are your
  OSDs running on? What version of Ceph were you upgrading from? There's
  really no way for this file to not exist once created unless the
  underlying FS ate it or the last write both was interrupted and hit
  some kind of bug in our transaction code (of which none are known)
  during replay.
  -Greg
  Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
  On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
  This is scary. Should I hold on upgrade?
 
  On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
 
 Hey Gregory,
 
 On 10-09-13 20:21, Gregory Farnum wrote:
  On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
 wrote:
  Hey list,
 
  I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
  cluster was:
  - Unmount CephFS everywhere.
  - Upgrade the Ceph-packages.
  - Restart MON.
  - Restart OSD.
  - Restart MDS.
 
  As soon as I got to the second node, the MDS crashed right after
 startup.
 
  Part of the logs (more on request):
 
  - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
  0~0] 1.d902
  70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
 -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
  1: openin
  g mds log
 -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
  discovering lo
  g bounds
  -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
  recover s
  tart
  -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
  read_head
  -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 -
  - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
 0~0]
  1.844f3
  494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
  -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 
  == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
  20+0+0 (42
  35168662 0 0) 0x1e93380 con 0x1e5d580
  -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
  handle_subscribe_a
  ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10
 19:37:32.796448
  -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.12:6802/53419
  -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.13:6802/45791
  -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.11:6800/16562
  -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 
  == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
  [read 0~
  0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
  0) 0x1e4d
  e00 con 0x1e5ddc0
   0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
  function
  'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
  7fd1ba81f700 ti
  me 2013-09-10 19:35:02.803673
  mds/MDSTable.cc: 152: FAILED assert(r >= 0)
 
   ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
   1: (MDSTable::load_2(int, ceph::buffer::list, 

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory,

The only objects containing table I can find at all, are in the
metadata-pool:
# rados --pool=metadata ls | grep -i table
mds0_inotable

Looking at another cluster where I use CephFS, there is indeed an object
named mds_anchortable, but the broken cluster is missing it.  I don't
see how I can scrub the PG for an object that doesn't appear to exist.
Please elaborate.


   Regards,

 Oliver

On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
 Also, can you scrub the PG which contains the mds_anchortable object
 and see if anything comes up? You should be able to find the key from
 the logs (in the osd_op line that contains mds_anchortable) and
 convert that into the PG. Or you can just scrub all of osd 2.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
 On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum g...@inktank.com wrote:
  It's not an upgrade issue. There's an MDS object that is somehow
  missing. If it exists, then on restart you'll be fine.
 
  Oliver, what is your general cluster config? What filesystem are your
  OSDs running on? What version of Ceph were you upgrading from? There's
  really no way for this file to not exist once created unless the
  underlying FS ate it or the last write both was interrupted and hit
  some kind of bug in our transaction code (of which none are known)
  during replay.
  -Greg
  Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
  On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
  This is scary. Should I hold on upgrade?
 
  On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
 
 Hey Gregory,
 
 On 10-09-13 20:21, Gregory Farnum wrote:
  On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
 wrote:
  Hey list,
 
  I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
  cluster was:
  - Unmount CephFS everywhere.
  - Upgrade the Ceph-packages.
  - Restart MON.
  - Restart OSD.
  - Restart MDS.
 
  As soon as I got to the second node, the MDS crashed right after
 startup.
 
  Part of the logs (more on request):
 
  - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
  0~0] 1.d902
  70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
 -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
  1: openin
  g mds log
 -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
  discovering lo
  g bounds
  -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
  recover s
  tart
  -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
  read_head
  -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 -
  - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
 0~0]
  1.844f3
  494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
  -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 
  == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
  20+0+0 (42
  35168662 0 0) 0x1e93380 con 0x1e5d580
  -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
  handle_subscribe_a
  ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10
 19:37:32.796448
  -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.12:6802/53419
  -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.13:6802/45791
  -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.11:6800/16562
  -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 
  == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
  [read 0~
  0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
  0) 0x1e4d
  e00 con 0x1e5ddc0
   0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
  function
  'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
  7fd1ba81f700 ti
  me 2013-09-10 19:35:02.803673
  mds/MDSTable.cc: 152: FAILED assert(r >= 0)
 
   ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
   1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f)
 [0x77ce7f]
   2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
   3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
   4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
   5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
   6: (DispatchQueue::entry()+0x592) [0x92e432]
   7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
   8: (()+0x68ca) [0x7fd1bed298ca]
   9: (clone()+0x6d) [0x7fd1bda5cb6d]
   NOTE: a copy of the executable, or `objdump -rdS executable` is
  needed to interpret this.
 
  When trying to mount CephFS, it just hangs now.  Sometimes, an MDS
 stays
  up for a while, but will eventually crash again.  This CephFS was
  created on 0.67 and I haven't done anything but mount and use it under
  very light load in the mean time.
 
  Any ideas, or if you need more info, let me 

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
On Tue, Sep 10, 2013 at 2:36 PM, Oliver Daudey oli...@xs4all.nl wrote:
 Hey Gregory,

 My cluster consists of 3 nodes, each running 1 mon, 1 osd and 1 mds.  I
 upgraded from 0.67, but was still running 0.61.7 OSDs at the time of the
 upgrade, because of performance-issues that have just recently been
 fixed.  These have now been upgraded to 0.67.3, along with the rest of
 Ceph.  My OSDs are using XFS as the underlying FS.  I have been
 switching one OSD in my cluster back and forth between 0.61.7 and some
 test-versions, which were based on 0.67.x, to debug aforementioned
 performance-issues with Samuel, but that was before I newfs'ed and
 started using this instance of CephFS.  Furthermore, I don't seem to
 have lost any other data during these tests.

Sam, any idea how we could have lost an object? I checked into how we
touch this one, and all we ever do is read_full and write_full.


 BTW: CephFS has never been very stable for me during stress-tests.  If
 some components are brought down and back up again during operations,
 like stopping and restarting all components on one node while generating
 some load with a cp of a big CephFS directory-tree on another, then,
 once things settle again, doing the same on another node, it always
 quickly ends up like what I see now.

Do you have multiple active MDSes? Or do you just mean when you do a
reboot while generating load it migrates?

 MDSs crashing on start or on
 attempts to mount the CephFS and the only way out being to stop the
 MDSs, wipe the contents of the data and metadata-pools and doing the
 newfs-thing.  I can only assume you guys are putting it through similar
 stress-tests, but if not, try it.

Yeah, we have a bunch of these. I'm not sure that we coordinate
killing an entire node at a time, but I can't think of any way that
would matter. :/

 PS: Is there a way to get back at the data after something like this?
 Do you still want me to keep the current situation to debug it further,
 or can I zap everything, restore my backups and move on?  Thanks!

You could figure out how to build a fake anchortable (just generate an
empty one with ceph-dencoder) and that would let you do most stuff,
although if you have any hard links then those would be lost and I'm
not sure exactly what that would mean at this point — it's possible
with the new lookup-by-ino stuff that it wouldn't matter at all, or it
might make them inaccessible from one link and un-deletable when
removed from the other. (via the FS, that is.) If restoring from
backups is feasible I'd probably just shoot for that after doing a
scrub. (If the scrub turns up something dirty then probably it can be
recovered via a RADOS repair.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
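
A rough sketch of that recovery idea, in case anyone needs it later. The
exact type name and ceph-dencoder incantation are assumptions (check
'ceph-dencoder list_types' for the real name), not a tested recipe:

ceph-dencoder list_types | grep -i anchor    # find the anchor table type
ceph-dencoder type <that-type> encode export /tmp/mds_anchortable
rados -p metadata put mds_anchortable /tmp/mds_anchortable

The second command encodes a default-constructed (i.e. empty) object of the
selected type; the third pushes it into the metadata pool under the missing
object name, which should at least get the MDS past the failed read, with
the caveats about hard links noted above.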

Regards,

   Oliver

 On di, 2013-09-10 at 13:59 -0700, Gregory Farnum wrote:
 It's not an upgrade issue. There's an MDS object that is somehow
 missing. If it exists, then on restart you'll be fine.

 Oliver, what is your general cluster config? What filesystem are your
 OSDs running on? What version of Ceph were you upgrading from? There's
 really no way for this file to not exist once created unless the
 underlying FS ate it or the last write both was interrupted and hit
 some kind of bug in our transaction code (of which none are known)
 during replay.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
  This is scary. Should I hold on upgrade?
 
  On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
 
 Hey Gregory,
 
 On 10-09-13 20:21, Gregory Farnum wrote:
  On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
 wrote:
  Hey list,
 
  I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
  cluster was:
  - Unmount CephFS everywhere.
  - Upgrade the Ceph-packages.
  - Restart MON.
  - Restart OSD.
  - Restart MDS.
 
  As soon as I got to the second node, the MDS crashed right after
 startup.
 
  Part of the logs (more on request):
 
  - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
  0~0] 1.d902
  70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
 -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
  1: openin
  g mds log
 -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
  discovering lo
  g bounds
  -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
  recover s
  tart
  -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
  read_head
  -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 -
  - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
 0~0]
  1.844f3
  494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
  -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 
  == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
  20+0+0 (42
  35168662 0 0) 0x1e93380 con 0x1e5d580
  -5 2013-09-10 

Re: [ceph-users] rbd cp copies of sparse files become fully allocated

2013-09-10 Thread Josh Durgin

On 09/10/2013 01:51 AM, Andrey Korolyov wrote:

On Tue, Sep 10, 2013 at 3:03 AM, Josh Durgin josh.dur...@inktank.com wrote:

On 09/09/2013 04:57 AM, Andrey Korolyov wrote:


May I also suggest the same for export/import mechanism? Say, if image
was created by fallocate we may also want to leave holes upon upload
and vice-versa for export.



Import and export already omit runs of zeroes. They could detect
smaller runs (currently they look at object size chunks), and export
might be more efficient if it used diff_iterate() instead of
read_iterate(). Have you observed them misbehaving with sparse images?




Did you mean dumpling? When I checked some months ago, cuttlefish did
not have such a feature.


It's been there at least since bobtail. Export to stdout can't be sparse
though, since you can't seek stdout. Import and export haven't changed
much in a while, and the sparse detection certainly still works on
master (just tried with an empty 1G file).
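
For anyone who wants to verify that on their own cluster, a minimal sketch
(image and file names are made up; export to a regular file, then compare
apparent size with allocated blocks):

rbd create --size 1024 sparsetest      # 1 GB image, never written to
rbd export sparsetest /tmp/sparsetest.img
ls -lh /tmp/sparsetest.img             # apparent size: 1 GB
du -h /tmp/sparsetest.img              # allocated blocks: near zero if the holes survived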


On Mon, Sep 9, 2013 at 8:45 AM, Sage Weil s...@inktank.com wrote:


On Sat, 7 Sep 2013, Oliver Daudey wrote:


Hey all,

This topic has been partly discussed here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-March/000799.html

Tested on Ceph version 0.67.2.

If you create a fresh empty image of, say, 100GB in size on RBD and then
use rbd cp to make a copy of it, even though the image is sparse, the
command will attempt to read every part of it and take far more time
than expected.

After reading the above thread, I understand why the copy of an
essentially empty sparse image on RBD would take so long, but it doesn't
explain why the copy won't be sparse itself.  If I use rbd cp to copy
an image, the copy will take its full allocated size on disk, even if
the original was empty.  If I use the convert option of QEMU's qemu-img
tool to convert the original image to the copy without changing the
format, essentially only making a copy, it takes its time as well, but
it will be faster than rbd cp and the resulting copy will be sparse.

Example-commands:
rbd create --size 102400 test1
rbd cp test1 test2
qemu-img convert -p -f rbd -O rbd rbd:rbd/test1 rbd:rbd/test3

Shouldn't rbd cp at least have an option to attempt to sparsify the
copy, or copy the sparse parts as sparse?  Same goes for rbd clone,
BTW.
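
One way to see the difference in allocation between the two copies, as a
rough sketch (this assumes rbd diff is available, which it should be on
0.67, and that its Length column is in bytes):

rbd diff rbd/test2 | awk '{ sum += $2 } END { print sum/1024/1024 " MB" }'
rbd diff rbd/test3 | awk '{ sum += $2 } END { print sum/1024/1024 " MB" }'

If test2 really is fully allocated and test3 is sparse, the first number
should come out near the full 102400 MB and the second near zero.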



Yep, this is in fact a bug.  Opened http://tracker.ceph.com/issues/6257.

Thanks!
sage





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory,

My cluster consists of 3 nodes, each running 1 mon, 1 osd and 1 mds.  I
upgraded from 0.67, but was still running 0.61.7 OSDs at the time of the
upgrade, because of performance-issues that have just recently been
fixed.  These have now been upgraded to 0.67.3, along with the rest of
Ceph.  My OSDs are using XFS as the underlying FS.  I have been
switching one OSD in my cluster back and forth between 0.61.7 and some
test-versions, which were based on 0.67.x, to debug the aforementioned
performance issues with Samuel, but that was before I newfs'ed and
started using this instance of CephFS.  Furthermore, I don't seem to
have lost any other data during these tests.

BTW: CephFS has never been very stable for me during stress-tests.  If
some components are brought down and back up again during operations
(say, stopping and restarting all components on one node while
generating some load with a cp of a big CephFS directory-tree on
another, then, once things settle again, doing the same on another
node), it always quickly ends up like what I see now: MDSs crashing on
start or on attempts to mount the CephFS, with the only way out being
to stop the MDSs, wipe the contents of the data and metadata-pools and
do the newfs-thing.  I can only assume you guys are putting it through
similar stress-tests, but if not, try it.

PS: Is there a way to get back at the data after something like this?
Do you still want me to keep the current situation to debug it further,
or can I zap everything, restore my backups and move on?  Thanks!


   Regards,

  Oliver

On di, 2013-09-10 at 13:59 -0700, Gregory Farnum wrote:
 It's not an upgrade issue. There's an MDS object that is somehow
 missing. If it exists, then on restart you'll be fine.
 
 Oliver, what is your general cluster config? What filesystem are your
 OSDs running on? What version of Ceph were you upgrading from? There's
 really no way for this file to not exist once created unless the
 underlying FS ate it or the last write both was interrupted and hit
 some kind of bug in our transaction code (of which none are known)
 during replay.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
 On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
  This is scary. Should I hold on upgrade?
 
  On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
 
 Hey Gregory,
 
 On 10-09-13 20:21, Gregory Farnum wrote:
  On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
 wrote:
  Hey list,
 
  I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
  cluster was:
  - Unmount CephFS everywhere.
  - Upgrade the Ceph-packages.
  - Restart MON.
  - Restart OSD.
  - Restart MDS.
 
  As soon as I got to the second node, the MDS crashed right after
 startup.
 
  Part of the logs (more on request):
 
  - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
  0~0] 1.d902
  70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
 -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
  1: openin
  g mds log
 -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
  discovering lo
  g bounds
  -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
  recover s
  tart
  -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
  read_head
  -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 -
  - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
 0~0]
  1.844f3
  494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
  -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 
  == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
  20+0+0 (42
  35168662 0 0) 0x1e93380 con 0x1e5d580
  -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
  handle_subscribe_a
  ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10
 19:37:32.796448
  -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.12:6802/53419
  -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.13:6802/45791
  -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
  ms_handle_connect on
   194.109.43.11:6800/16562
  -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
  194.109.43.12:6800/67277 
  == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
  [read 0~
  0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
  0) 0x1e4d
  e00 con 0x1e5ddc0
   0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
  function
  'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
  7fd1ba81f700 ti
  me 2013-09-10 19:35:02.803673
  mds/MDSTable.cc: 152: FAILED assert(r >= 0)
 
   ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
   1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f)
 [0x77ce7f]
   2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) 

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory,

On 10-09-13 20:21, Gregory Farnum wrote:
 On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl wrote:
 Hey list,

 I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
 cluster was:
 - Unmount CephFS everywhere.
 - Upgrade the Ceph-packages.
 - Restart MON.
 - Restart OSD.
 - Restart MDS.

 As soon as I got to the second node, the MDS crashed right after startup.

 Part of the logs (more on request):

 - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
 0~0] 1.d902
 70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
-11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 boot_start
 1: openin
 g mds log
-10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
 discovering lo
 g bounds
 -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 mds.0.journaler(ro)
 recover s
 tart
 -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 mds.0.journaler(ro)
 read_head
 -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 -
 - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read 0~0]
 1.844f3
 494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
 -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
 20+0+0 (42
 35168662 0 0) 0x1e93380 con 0x1e5d580
 -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
 handle_subscribe_a
 ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10 19:37:32.796448
 -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.12:6802/53419
 -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.13:6802/45791
 -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
 ms_handle_connect on
  194.109.43.11:6800/16562
 -1 2013-09-10 19:35:02.803546 7fd1ba81f700  1 --
 194.109.43.12:6800/67277 
 == osd.2 194.109.43.13:6802/45791 1  osd_op_reply(3 mds_anchortable
 [read 0~
 0] ack = -2 (No such file or directory)) v4  114+0+0 (3107677671 0
 0) 0x1e4d
 e00 con 0x1e5ddc0
  0 2013-09-10 19:35:02.805611 7fd1ba81f700 -1 mds/MDSTable.cc: In
 function
 'void MDSTable::load_2(int, ceph::bufferlist, Context*)' thread
 7fd1ba81f700 ti
 me 2013-09-10 19:35:02.803673
 mds/MDSTable.cc: 152: FAILED assert(r >= 0)

  ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
  1: (MDSTable::load_2(int, ceph::buffer::list, Context*)+0x44f) [0x77ce7f]
  2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe3b) [0x7d891b]
  3: (MDS::handle_core_message(Message*)+0x987) [0x56f527]
  4: (MDS::_dispatch(Message*)+0x2f) [0x56f5ef]
  5: (MDS::ms_dispatch(Message*)+0x19b) [0x5710bb]
  6: (DispatchQueue::entry()+0x592) [0x92e432]
  7: (DispatchQueue::DispatchThread::entry()+0xd) [0x8a59bd]
  8: (()+0x68ca) [0x7fd1bed298ca]
  9: (clone()+0x6d) [0x7fd1bda5cb6d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.

 When trying to mount CephFS, it just hangs now.  Sometimes, an MDS stays
 up for a while, but will eventually crash again.  This CephFS was
 created on 0.67 and I haven't done anything but mount and use it under
 very light load in the mean time.

 Any ideas, or if you need more info, let me know.  It would be nice to
 get my data back, but I have backups too.
 
 Does the filesystem have any data in it? Every time we've seen this
 error it's been on an empty cluster which had some weird issue with
 startup.

This one certainly had some data on it, yes: a couple of hundred GB of
disk-images and a couple of trees of smaller files, most of them
accessed very rarely since they were copied on.


   Regards,

  Oliver
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
Nope, a repair won't change anything if scrub doesn't detect any
inconsistencies. There must be something else going on, but I can't
fathom what...I'll try and look through it a bit more tomorrow. :/
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Sep 10, 2013 at 3:49 PM, Oliver Daudey oli...@xs4all.nl wrote:
 Hey Gregory,

 Thanks for your explanation.  Turns out to be 1.a7 and it seems to scrub
 OK.

 # ceph osd getmap -o osdmap
 # osdmaptool --test-map-object mds_anchortable --pool 1 osdmap
 osdmaptool: osdmap file 'osdmap'
  object 'mds_anchortable' -> 1.a7 -> [2,0]
 # ceph pg scrub 1.a7

 osd.2 logs:
 2013-09-11 00:41:15.843302 7faf56b1b700  0 log [INF] : 1.a7 scrub ok

 osd.0 didn't show anything in its logs, though.  Should I try a repair
 next?


Regards,

   Oliver

 On di, 2013-09-10 at 15:01 -0700, Gregory Farnum wrote:
 If the problem is somewhere in RADOS/xfs/whatever, then there's a good
 chance that the mds_anchortable object exists in its replica OSDs,
 but when listing objects those aren't queried, so they won't show up
 in a listing. You can use the osdmaptool to map from an object name to
 the PG it would show up in, or if you look at your log you should see
 a line something like
 1 -- LOCAL IP -- OTHER IP -- osd_op(mds.0.31:3 mds_anchortable
 [read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0
 In this example, metadata is pool 1 and 1.a977f6a7 is the hash of the
 mds_anchortable object, and depending on how many PGs are in the pool
 it will be in pg 1.a7, or 1.6a7, or 1.f6a7...
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com

 On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey oli...@xs4all.nl wrote:
  Hey Gregory,
 
  The only objects containing table I can find at all, are in the
  metadata-pool:
  # rados --pool=metadata ls | grep -i table
  mds0_inotable
 
  Looking at another cluster where I use CephFS, there is indeed an object
  named mds_anchortable, but the broken cluster is missing it.  I don't
  see how I can scrub the PG for an object that doesn't appear to exist.
  Please elaborate.
 
 
 Regards,
 
   Oliver
 
  On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
  Also, can you scrub the PG which contains the mds_anchortable object
  and see if anything comes up? You should be able to find the key from
  the logs (in the osd_op line that contains mds_anchortable) and
  convert that into the PG. Or you can just scrub all of osd 2.
  -Greg
  Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
  On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum g...@inktank.com wrote:
   It's not an upgrade issue. There's an MDS object that is somehow
   missing. If it exists, then on restart you'll be fine.
  
   Oliver, what is your general cluster config? What filesystem are your
   OSDs running on? What version of Ceph were you upgrading from? There's
   really no way for this file to not exist once created unless the
   underlying FS ate it or the last write both was interrupted and hit
   some kind of bug in our transaction code (of which none are known)
   during replay.
   -Greg
   Software Engineer #42 @ http://inktank.com | http://ceph.com
  
  
   On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com 
   wrote:
   This is scary. Should I hold on upgrade?
  
   On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
  
  Hey Gregory,
  
  On 10-09-13 20:21, Gregory Farnum wrote:
   On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
  wrote:
   Hey list,
  
   I just upgraded to Ceph 0.67.3.  What I did on every node of my 
   3-node
   cluster was:
   - Unmount CephFS everywhere.
   - Upgrade the Ceph-packages.
   - Restart MON.
   - Restart OSD.
   - Restart MDS.
  
   As soon as I got to the second node, the MDS crashed right after
  startup.
  
   Part of the logs (more on request):
  
   - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
   0~0] 1.d902
   70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
  -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 
   boot_start
   1: openin
   g mds log
  -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
   discovering lo
   g bounds
   -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 
   mds.0.journaler(ro)
   recover s
   tart
   -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 
   mds.0.journaler(ro)
   read_head
   -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
   194.109.43.12:6800/67277 -
   - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
  0~0]
   1.844f3
   494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
   -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
   194.109.43.12:6800/67277 
   == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
   
   20+0+0 (42
   35168662 0 0) 0x1e93380 con 0x1e5d580
   -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
   handle_subscribe_a
   ck sent 2013-09-10 19:35:02.796448 renew after 

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory,

Thanks for your explanation.  Turns out to be 1.a7 and it seems to scrub
OK.

# ceph osd getmap -o osdmap
# osdmaptool --test-map-object mds_anchortable --pool 1 osdmap
osdmaptool: osdmap file 'osdmap'
 object 'mds_anchortable' -> 1.a7 -> [2,0]
# ceph pg scrub 1.a7

osd.2 logs:
2013-09-11 00:41:15.843302 7faf56b1b700  0 log [INF] : 1.a7 scrub ok

osd.0 didn't show anything in its logs, though.  Should I try a repair
next?
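
Another thing that can be checked directly, since the OSDs sit on XFS with
the filestore: look for the object file inside the PG directory on both
replicas (default paths assumed; find will also descend into any hashed
subdirectories the PG directory may have been split into):

find /var/lib/ceph/osd/ceph-2/current/1.a7_head/ -iname '*anchortable*'
find /var/lib/ceph/osd/ceph-0/current/1.a7_head/ -iname '*anchortable*'

If neither replica has a matching file, the object really is gone from
RADOS rather than just hidden from the listing.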


   Regards,

  Oliver

On di, 2013-09-10 at 15:01 -0700, Gregory Farnum wrote:
 If the problem is somewhere in RADOS/xfs/whatever, then there's a good
 chance that the mds_anchortable object exists in its replica OSDs,
 but when listing objects those aren't queried, so they won't show up
 in a listing. You can use the osdmaptool to map from an object name to
 the PG it would show up in, or if you look at your log you should see
 a line something like
 1 -- LOCAL IP -- OTHER IP -- osd_op(mds.0.31:3 mds_anchortable
 [read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0
 In this example, metadata is pool 1 and 1.a977f6a7 is the hash of the
  mds_anchortable object, and depending on how many PGs are in the pool
 it will be in pg 1.a7, or 1.6a7, or 1.f6a7...
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey oli...@xs4all.nl wrote:
  Hey Gregory,
 
  The only objects containing table I can find at all, are in the
  metadata-pool:
  # rados --pool=metadata ls | grep -i table
  mds0_inotable
 
  Looking at another cluster where I use CephFS, there is indeed an object
  named mds_anchortable, but the broken cluster is missing it.  I don't
  see how I can scrub the PG for an object that doesn't appear to exist.
  Please elaborate.
 
 
 Regards,
 
   Oliver
 
  On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
  Also, can you scrub the PG which contains the mds_anchortable object
  and see if anything comes up? You should be able to find the key from
  the logs (in the osd_op line that contains mds_anchortable) and
  convert that into the PG. Or you can just scrub all of osd 2.
  -Greg
  Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
  On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum g...@inktank.com wrote:
   It's not an upgrade issue. There's an MDS object that is somehow
   missing. If it exists, then on restart you'll be fine.
  
   Oliver, what is your general cluster config? What filesystem are your
   OSDs running on? What version of Ceph were you upgrading from? There's
   really no way for this file to not exist once created unless the
   underlying FS ate it or the last write both was interrupted and hit
   some kind of bug in our transaction code (of which none are known)
   during replay.
   -Greg
   Software Engineer #42 @ http://inktank.com | http://ceph.com
  
  
   On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
   This is scary. Should I hold on upgrade?
  
   On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
  
  Hey Gregory,
  
  On 10-09-13 20:21, Gregory Farnum wrote:
   On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
  wrote:
   Hey list,
  
   I just upgraded to Ceph 0.67.3.  What I did on every node of my 
   3-node
   cluster was:
   - Unmount CephFS everywhere.
   - Upgrade the Ceph-packages.
   - Restart MON.
   - Restart OSD.
   - Restart MDS.
  
   As soon as I got to the second node, the MDS crashed right after
  startup.
  
   Part of the logs (more on request):
  
   - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
   0~0] 1.d902
   70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
  -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 
   boot_start
   1: openin
   g mds log
  -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
   discovering lo
   g bounds
   -9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 
   mds.0.journaler(ro)
   recover s
   tart
   -8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 
   mds.0.journaler(ro)
   read_head
   -7 2013-09-10 19:35:02.799028 7fd1ba81f700  1 --
   194.109.43.12:6800/67277 -
   - 194.109.43.11:6800/16562 -- osd_op(mds.0.58:5 200. [read
  0~0]
   1.844f3
   494 e37647) v4 -- ?+0 0x1e48b40 con 0x1e5db00
   -6 2013-09-10 19:35:02.799053 7fd1ba81f700  1 --
   194.109.43.12:6800/67277 
   == mon.2 194.109.43.13:6789/0 16  mon_subscribe_ack(300s) v1 
   20+0+0 (42
   35168662 0 0) 0x1e93380 con 0x1e5d580
   -5 2013-09-10 19:35:02.799099 7fd1ba81f700 10 monclient:
   handle_subscribe_a
   ck sent 2013-09-10 19:35:02.796448 renew after 2013-09-10
  19:37:32.796448
   -4 2013-09-10 19:35:02.800907 7fd1ba81f700  5 mds.0.58
   ms_handle_connect on
194.109.43.12:6802/53419
   -3 2013-09-10 19:35:02.800927 7fd1ba81f700  5 mds.0.58
   ms_handle_connect on
194.109.43.13:6802/45791
   -2 2013-09-10 19:35:02.801176 7fd1ba81f700  5 mds.0.58
   ms_handle_connect on

Re: [ceph-users] xfsprogs not found in RHEL

2013-09-10 Thread sriram
I installed xfsprogs from
http://rpm.pbone.net/index.php3/stat/26/dist/74/size/1400502/name/xfsprogs-3.1.1-4.el6.src.rpm
.
I then ran sudo yum install ceph and I still get the same error. Any
ideas?
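
One thing worth double-checking: the link above is a source RPM (.src.rpm),
which does not provide the xfsprogs binary package that yum is looking for.
A quick sketch of what to verify (standard EL6 package names):

rpm -q xfsprogs              # should print a version, not "not installed"
sudo yum install xfsprogs    # from base, or the Scalable File System add-on on RHEL
sudo yum install ceph        # retry once the xfsprogs dependency resolves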


On Wed, Aug 28, 2013 at 3:47 PM, sriram sriram@gmail.com wrote:

 Can anyone point me to which xfsprogs RPM to use for RHEL 6


 On Wed, Aug 28, 2013 at 5:46 AM, Sriram sriram@gmail.com wrote:

 Yes, I read that, but I was not sure if installing from the CentOS 6
 repository could cause issues.

 On Aug 27, 2013, at 11:46 PM, Stroppa Daniele (strp) s...@zhaw.ch
 wrote:

  Check this issue: http://tracker.ceph.com/issues/5193

  You might need the RHEL Scalable File System add-on.

  Cheers,
   --
 Daniele Stroppa
 Researcher
 Institute of Information Technology
 Zürich University of Applied Sciences
 http://www.cloudcomp.ch


   From: sriram sriram@gmail.com
 Date: Tue, 27 Aug 2013 22:50:41 -0700
 To: Lincoln Bryant linco...@uchicago.edu
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] xfsprogs not found in RHEL

  Tried

  yum clean all followed by
 yum install ceph

  and the same result.


 On Tue, Aug 27, 2013 at 7:44 PM, Lincoln Bryant linco...@uchicago.eduwrote:

 Hi,

  xfsprogs should be included in the EL6 base.

  Perhaps run yum clean all and try again?

  Cheers,
 Lincoln

On Aug 27, 2013, at 9:16 PM, sriram wrote:

I am trying to install CEPH and I get the following error -

  --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-babel.noarch 0:0.9.4-5.1.el6 will be installed
 --- Package python-backports-ssl_match_hostname.noarch 0:3.2-0.3.a3.el6
 will be installed
 --- Package python-docutils.noarch 0:0.6-1.el6 will be installed
 -- Processing Dependency: python-imaging for package:
 python-docutils-0.6-1.el6.noarch
 --- Package python-jinja2.x86_64 0:2.2.1-1.el6 will be installed
 --- Package python-pygments.noarch 0:1.1.1-1.el6 will be installed
 --- Package python-six.noarch 0:1.1.0-2.el6 will be installed
 -- Running transaction check
 --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
 -- Finished Dependency Resolution
 Error: Package: ceph-0.67.2-0.el6.x86_64 (ceph)
Requires: xfsprogs


  Machine Info -

  Linux version 2.6.32-131.4.1.el6.x86_64 (
 mockbu...@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214
 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Fri Jun 10 10:54:26 EDT 2011
   ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



  ___ ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory,

Ok, thanks for all your help!  It's weird, as if the object gets deleted
somewhere along the way, but the problem only becomes visible once you
restart the MDSs, which probably have it in memory and then fail to load
it after restart.  I'll answer the questions you had about my test-setup
in a moment.


   Regards,

 Oliver

On di, 2013-09-10 at 16:24 -0700, Gregory Farnum wrote:
 Nope, a repair won't change anything if scrub doesn't detect any
 inconsistencies. There must be something else going on, but I can't
 fathom what...I'll try and look through it a bit more tomorrow. :/
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
 On Tue, Sep 10, 2013 at 3:49 PM, Oliver Daudey oli...@xs4all.nl wrote:
  Hey Gregory,
 
  Thanks for your explanation.  Turns out to be 1.a7 and it seems to scrub
  OK.
 
  # ceph osd getmap -o osdmap
  # osdmaptool --test-map-object mds_anchortable --pool 1 osdmap
  osdmaptool: osdmap file 'osdmap'
   object 'mds_anchortable' -> 1.a7 -> [2,0]
  # ceph pg scrub 1.a7
 
  osd.2 logs:
  2013-09-11 00:41:15.843302 7faf56b1b700  0 log [INF] : 1.a7 scrub ok
 
  osd.0 didn't show anything in its logs, though.  Should I try a repair
  next?
 
 
 Regards,
 
Oliver
 
  On di, 2013-09-10 at 15:01 -0700, Gregory Farnum wrote:
  If the problem is somewhere in RADOS/xfs/whatever, then there's a good
  chance that the mds_anchortable object exists in its replica OSDs,
  but when listing objects those aren't queried, so they won't show up
  in a listing. You can use the osdmaptool to map from an object name to
  the PG it would show up in, or if you look at your log you should see
  a line something like
  1 -- LOCAL IP -- OTHER IP -- osd_op(mds.0.31:3 mds_anchortable
  [read 0~0] 1.a977f6a7 e165) v4 -- ?+0 0x1e88d80 con 0x1f189a0
  In this example, metadata is pool 1 and 1.a977f6a7 is the hash of the
  mds_anchortable object, and depending on how many PGs are in the pool
  it will be in pg 1.a7, or 1.6a7, or 1.f6a7...
  -Greg
  Software Engineer #42 @ http://inktank.com | http://ceph.com
 
  On Tue, Sep 10, 2013 at 2:51 PM, Oliver Daudey oli...@xs4all.nl wrote:
   Hey Gregory,
  
   The only objects containing table I can find at all, are in the
   metadata-pool:
   # rados --pool=metadata ls | grep -i table
   mds0_inotable
  
   Looking at another cluster where I use CephFS, there is indeed an object
   named mds_anchortable, but the broken cluster is missing it.  I don't
   see how I can scrub the PG for an object that doesn't appear to exist.
   Please elaborate.
  
  
  Regards,
  
Oliver
  
   On di, 2013-09-10 at 14:06 -0700, Gregory Farnum wrote:
   Also, can you scrub the PG which contains the mds_anchortable object
   and see if anything comes up? You should be able to find the key from
   the logs (in the osd_op line that contains mds_anchortable) and
   convert that into the PG. Or you can just scrub all of osd 2.
   -Greg
   Software Engineer #42 @ http://inktank.com | http://ceph.com
  
  
   On Tue, Sep 10, 2013 at 1:59 PM, Gregory Farnum g...@inktank.com 
   wrote:
It's not an upgrade issue. There's an MDS object that is somehow
missing. If it exists, then on restart you'll be fine.
   
Oliver, what is your general cluster config? What filesystem are your
OSDs running on? What version of Ceph were you upgrading from? There's
really no way for this file to not exist once created unless the
underlying FS ate it or the last write both was interrupted and hit
some kind of bug in our transaction code (of which none are known)
during replay.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
   
   
On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com 
wrote:
This is scary. Should I hold on upgrade?
   
On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
   
   Hey Gregory,
   
   On 10-09-13 20:21, Gregory Farnum wrote:
On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
   wrote:
Hey list,
   
I just upgraded to Ceph 0.67.3.  What I did on every node of my 
3-node
cluster was:
- Unmount CephFS everywhere.
- Upgrade the Ceph-packages.
- Restart MON.
- Restart OSD.
- Restart MDS.
   
As soon as I got to the second node, the MDS crashed right after
   startup.
   
Part of the logs (more on request):
   
- 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable 
[read
0~0] 1.d902
70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
   -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 mds.0.58 
boot_start
1: openin
g mds log
   -10 2013-09-10 19:35:02.798968 7fd1ba81f700  5 mds.0.log open
discovering lo
g bounds
-9 2013-09-10 19:35:02.798988 7fd1ba81f700  1 
mds.0.journaler(ro)
recover s
tart
-8 2013-09-10 19:35:02.798990 7fd1ba81f700  1 
mds.0.journaler(ro)
read_head
-7 2013-09-10 

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory,

On di, 2013-09-10 at 14:48 -0700, Gregory Farnum wrote:
 On Tue, Sep 10, 2013 at 2:36 PM, Oliver Daudey oli...@xs4all.nl wrote:
  Hey Gregory,
 
  My cluster consists of 3 nodes, each running 1 mon, 1 osd and 1 mds.  I
  upgraded from 0.67, but was still running 0.61.7 OSDs at the time of the
  upgrade, because of performance-issues that have just recently been
  fixed.  These have now been upgraded to 0.67.3, along with the rest of
  Ceph.  My OSDs are using XFS as the underlying FS.  I have been
  switching one OSD in my cluster back and forth between 0.61.7 and some
  test-versions, which were based on 0.67.x, to debug aforementioned
  performance-issues with Samuel, but that was before I newfs'ed and
  started using this instance of CephFS.  Furthermore, I don't seem to
  have lost any other data during these tests.
 
 Sam, any idea how we could have lost an object? I checked into how we
 touch this one, and all we ever do is read_full and write_full.
 
 
  BTW: CephFS has never been very stable for me during stress-tests.  If
  some components are brought down and back up again during operations,
  like stopping and restarting all components on one node while generating
  some load with a cp of a big CephFS directory-tree on another, then,
  once things settle again, doing the same on another node, it always
  quickly ends up like what I see now.
 
 Do you have multiple active MDSes? Or do you just mean when you do a
 reboot while generating load it migrates?

I use 1 active/2 standby.  If I happen to stop the active MDS, access to
CephFS hangs for a bit and then it switches to a standby-MDS, after
which access resumes.  By that time, I may bring the node with the MDS I
shut down back up, wait for things to settle and stop services on
another node.  I haven't used configurations with multiple active MDSs
much, because it was considered less well tested.
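
For reference, a quick way to watch that failover from the CLI (plain
status commands, nothing cluster-specific assumed):

ceph mds stat    # one-line summary: which MDS is active, which are standby
ceph mds dump    # full mdsmap, including up:replay / up:active transitions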

 
  MDSs crashing on start or on
  attempts to mount the CephFS and the only way out being to stop the
  MDSs, wipe the contents of the data and metadata-pools and doing the
  newfs-thing.  I can only assume you guys are putting it through similar
  stress-tests, but if not, try it.
 
 Yeah, we have a bunch of these. I'm not sure that we coordinate
 killing an entire node at a time, but I can't think of any way that
 would matter. :/

Now that I know what to look for when CephFS fails in this manner again,
I'll be sure to have a better look at the objects themselves and make a
detailed report to the list.

 
  PS: Is there a way to get back at the data after something like this?
  Do you still want me to keep the current situation to debug it further,
  or can I zap everything, restore my backups and move on?  Thanks!
 
 You could figure out how to build a fake anchortable (just generate an
 empty one with ceph-dencoder) and that would let you do most stuff,
 although if you have any hard links then those would be lost and I'm
 not sure exactly what that would mean at this point — it's possible
 with the new lookup-by-ino stuff that it wouldn't matter at all, or it
 might make them inaccessible from one link and un-deletable when
 removed from the other. (via the FS, that is.) If restoring from
 backups is feasible I'd probably just shoot for that after doing a
 scrub. (If the scrub turns up something dirty then probably it can be
 recovered via a RADOS repair.)

Thanks for your explanation!  I'll zap the whole thing and restore from
backup.


   Regards,

  Oliver

  On di, 2013-09-10 at 13:59 -0700, Gregory Farnum wrote:
  It's not an upgrade issue. There's an MDS object that is somehow
  missing. If it exists, then on restart you'll be fine.
 
  Oliver, what is your general cluster config? What filesystem are your
  OSDs running on? What version of Ceph were you upgrading from? There's
  really no way for this file to not exist once created unless the
  underlying FS ate it or the last write both was interrupted and hit
  some kind of bug in our transaction code (of which none are known)
  during replay.
  -Greg
  Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 
  On Tue, Sep 10, 2013 at 1:44 PM, Liu, Larry larry@disney.com wrote:
   This is scary. Should I hold on upgrade?
  
   On 9/10/13 11:33 AM, Oliver Daudey oli...@xs4all.nl wrote:
  
  Hey Gregory,
  
  On 10-09-13 20:21, Gregory Farnum wrote:
   On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey oli...@xs4all.nl
  wrote:
   Hey list,
  
   I just upgraded to Ceph 0.67.3.  What I did on every node of my 3-node
   cluster was:
   - Unmount CephFS everywhere.
   - Upgrade the Ceph-packages.
   - Restart MON.
   - Restart OSD.
   - Restart MDS.
  
   As soon as I got to the second node, the MDS crashed right after
  startup.
  
   Part of the logs (more on request):
  
   - 194.109.43.12:6802/53419 -- osd_op(mds.0.58:4 mds_snaptable [read
   0~0] 1.d902
   70ad e37647) v4 -- ?+0 0x1e48d80 con 0x1e5d9a0
  -11 2013-09-10 19:35:02.798962 7fd1ba81f700  2 

[ceph-users] Hit suicide timeout on osd start

2013-09-10 Thread Andrey Korolyov
Hello,

I got the (by now) well-known suicide-timeout error on 0.61.8, triggered
by only a little disk overload at OSD daemon start. I currently have
very large metadata per OSD (about 20G), which may be part of the issue.

#0  0x7f2f46adeb7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00860469 in reraise_fatal (signum=6) at
global/signal_handler.cc:58
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
#3  <signal handler called>
#4  0x7f2f44b45405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x7f2f44b48b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x7f2f4544389d in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x7f2f45441996 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x7f2f454419c3 in std::terminate() () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x7f2f45441bee in __cxa_throw () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0090d2fa in ceph::__ceph_assert_fail (assertion=0xa38ab1
"0 == \"hit suicide timeout\"", file=<optimized out>, line=79,
func=0xa38c60 "bool
ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
time_t)") at common/assert.cc:77
#11 0x0087914b in ceph::HeartbeatMap::_check
(this=this@entry=0x26560e0, h=<optimized out>, who=who@entry=0xa38b40
"is_healthy",
now=now@entry=1378860192) at common/HeartbeatMap.cc:79
#12 0x00879956 in ceph::HeartbeatMap::is_healthy
(this=this@entry=0x26560e0) at common/HeartbeatMap.cc:130
#13 0x00879f08 in ceph::HeartbeatMap::check_touch_file
(this=0x26560e0) at common/HeartbeatMap.cc:141
#14 0x009189f5 in CephContextServiceThread::entry
(this=0x2652200) at common/ceph_context.cc:68
#15 0x7f2f46ad6e9a in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#16 0x7f2f44c013dd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x in ?? ()
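
If the OSD is simply taking too long to come up because of the amount of
metadata, one workaround that has been used is to raise the internal
heartbeat/suicide timeouts in ceph.conf. A sketch only, with arbitrary
values and option names worth double-checking against your version:

[osd]
    osd op thread suicide timeout = 600
    filestore op thread suicide timeout = 1200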
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] quick-ceph-deploy

2013-09-10 Thread sriram
Any help here is appreciated. I am pretty much stuck in trying to install
ceph on my local box.


On Tue, Sep 10, 2013 at 11:02 AM, sriram sriram@gmail.com wrote:

 Yes I am able to do that.


 On Fri, Sep 6, 2013 at 8:19 AM, Alfredo Deza alfredo.d...@inktank.comwrote:

 On Fri, Sep 6, 2013 at 11:05 AM, sriram sriram@gmail.com wrote:
  sudo su -c 'rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  error: https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc:
 key
  1 import failed.
 

 Can you actually get to that URL and see the GPG key?

 Via curl/wget or the browser (if you have one in that host)
 
  On Fri, Sep 6, 2013 at 8:01 AM, Alfredo Deza alfredo.d...@inktank.com
  wrote:
 
  On Fri, Sep 6, 2013 at 10:54 AM, sriram sriram@gmail.com wrote:
   I am running it on the same machine.
  
   [abc@abc-ld ~]$ ceph-deploy install abc-ld
  
  
   On Fri, Sep 6, 2013 at 5:42 AM, Alfredo Deza 
 alfredo.d...@inktank.com
   wrote:
  
   On Thu, Sep 5, 2013 at 8:25 PM, sriram sriram@gmail.com
 wrote:
I am trying to deploy ceph reading the instructions from this
 link.
   
http://ceph.com/docs/master/start/quick-ceph-deploy/
   
I get the error below. Can someone let me know if this is
 something
related
to what I am doing wrong or the script?
   
[abc@abc-ld ~]$ ceph-deploy install abc-ld
[ceph_deploy.install][DEBUG ] Installing stable version dumpling
 on
cluster
ceph hosts abc-ld
[ceph_deploy.install][DEBUG ] Detecting platform for host abc-ld
 ...
[sudo] password for abc:
[ceph_deploy.install][INFO  ] Distro info:
RedHatEnterpriseWorkstation
6.1
Santiago
[abc-ld][INFO  ] installing ceph on abc-ld
[abc-ld][INFO  ] Running command: su -c 'rpm --import
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 '
[abc-ld][ERROR ] Traceback (most recent call last):
[abc-ld][ERROR ]   File
   
   
 /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py,
line
21, in install
[abc-ld][ERROR ]   File
/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py,
line
10,
in inner
[abc-ld][ERROR ] def inner(*args, **kwargs):
[abc-ld][ERROR ]   File
/usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py,
 line
6,
in
remote_call
[abc-ld][ERROR ] This allows us to only remote-execute the
 actual
calls,
not whole functions.
[abc-ld][ERROR ]   File /usr/lib64/python2.6/subprocess.py, line
502,
in
check_call
[abc-ld][ERROR ] raise CalledProcessError(retcode, cmd)
[abc-ld][ERROR ] CalledProcessError: Command '['su -c \'rpm
 --import
   
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 \'']'
returned non-zero exit status 1
[abc-ld][ERROR ] error:
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc:
 key
1
import failed.
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: su
 -c
'rpm
--import
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 '
  
   Can you try running that command on the host that it failed (I think
   that would be abc-ld)
   and paste the output?
 
  I mean, to run the actual command (from the log output) that caused the
  failure.
 
  In your case, it would be:
 
  rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;
 
  
   For some reason that `rpm --import` failed. Could be network
 related.
  
   
   
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
   
  
  
 
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] quick-ceph-deploy

2013-09-10 Thread sriram
I had followed that to install ceph-deploy but then I am not sure if there
is a difference between -

INSTALL-CEPH-DEPLOY described in
http://ceph.com/docs/master/start/quick-start-preflight/

and

INSTALLING CEPH DEPLOY described in http://ceph.com/docs/master/install/rpm/


The reason "ceph-deploy install mon-ceph-node" seems to fail is related to
accessing https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc.
The http://ceph.com/docs/master/start/quick-start-preflight/ page has info
about it, while the second link does not. Hence my question.
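
If it is only the rpm --import of that URL that fails (the semicolons in it
are easily mangled by the shell), one workaround sketch is to download the
key first and import it from a local file:

wget -O /tmp/release.asc 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
sudo rpm --import /tmp/release.asc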


On Tue, Sep 10, 2013 at 1:21 PM, Tamil Muthamizhan 
tamil.muthamiz...@inktank.com wrote:

 Hi Sriram,

 this should help: http://ceph.com/docs/master/install/rpm/

 Regards,
 Tamil


 On Tue, Sep 10, 2013 at 12:55 PM, sriram sriram@gmail.com wrote:

 Can someone tell me the equivalent steps in RHEL for the steps below -

 wget -q -O- 
 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo 
 apt-key add -
 echo deb http://ceph.com/debian-dumpling/ $(lsb_release -sc) main | sudo tee 
 /etc/apt/sources.list.d/ceph.list
 sudo apt-get update
 sudo apt-get install ceph-deploy
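
For what it's worth, the rough yum equivalents of those four steps look
like this (the dumpling el6 repo URL is my assumption; check the RPM
install page for the exact path):

sudo rpm --import 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
sudo tee /etc/yum.repos.d/ceph.repo <<'EOF'
[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-dumpling/el6/noarch
enabled=1
gpgcheck=1
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
EOF
sudo yum update && sudo yum install ceph-deploy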



 On Tue, Sep 10, 2013 at 12:40 PM, sriram sriram@gmail.com wrote:

 Any help here is appreciated. I am pretty much stuck in trying to
 install ceph on my local box.


 On Tue, Sep 10, 2013 at 11:02 AM, sriram sriram@gmail.com wrote:

 Yes I am able to do that.


 On Fri, Sep 6, 2013 at 8:19 AM, Alfredo Deza 
 alfredo.d...@inktank.comwrote:

 On Fri, Sep 6, 2013 at 11:05 AM, sriram sriram@gmail.com wrote:
  sudo su -c 'rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  error:
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: key
  1 import failed.
 

 Can you actually get to that URL and see the GPG key?

 Via curl/wget or the browser (if you have one in that host)
 
  On Fri, Sep 6, 2013 at 8:01 AM, Alfredo Deza 
 alfredo.d...@inktank.com
  wrote:
 
  On Fri, Sep 6, 2013 at 10:54 AM, sriram sriram@gmail.com
 wrote:
   I am running it on the same machine.
  
   [abc@abc-ld ~]$ ceph-deploy install abc-ld
  
  
   On Fri, Sep 6, 2013 at 5:42 AM, Alfredo Deza 
 alfredo.d...@inktank.com
   wrote:
  
   On Thu, Sep 5, 2013 at 8:25 PM, sriram sriram@gmail.com
 wrote:
I am trying to deploy ceph reading the instructions from this
 link.
   
http://ceph.com/docs/master/start/quick-ceph-deploy/
   
I get the error below. Can someone let me know if this is
 something
related
to what I am doing wrong or the script?
   
[abc@abc-ld ~]$ ceph-deploy install abc-ld
[ceph_deploy.install][DEBUG ] Installing stable version
 dumpling on
cluster
ceph hosts abc-ld
[ceph_deploy.install][DEBUG ] Detecting platform for host
 abc-ld ...
[sudo] password for abc:
[ceph_deploy.install][INFO  ] Distro info:
RedHatEnterpriseWorkstation
6.1
Santiago
[abc-ld][INFO  ] installing ceph on abc-ld
[abc-ld][INFO  ] Running command: su -c 'rpm --import

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
[abc-ld][ERROR ] Traceback (most recent call last):
[abc-ld][ERROR ]   File
   
   
 /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py,
line
21, in install
[abc-ld][ERROR ]   File
   
 /usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py,
line
10,
in inner
[abc-ld][ERROR ] def inner(*args, **kwargs):
[abc-ld][ERROR ]   File
   
 /usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py, line
6,
in
remote_call
[abc-ld][ERROR ] This allows us to only remote-execute the
 actual
calls,
not whole functions.
[abc-ld][ERROR ]   File /usr/lib64/python2.6/subprocess.py,
 line
502,
in
check_call
[abc-ld][ERROR ] raise CalledProcessError(retcode, cmd)
[abc-ld][ERROR ] CalledProcessError: Command '['su -c \'rpm
 --import
   

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc\'']'
returned non-zero exit status 1
[abc-ld][ERROR ] error:
   
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: key
1
import failed.
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command:
 su -c
'rpm
--import

 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
  
   Can you try running that command on the host that it failed (I
 think
   that would be abc-ld)
   and paste the output?
 
  I mean, to run the actual command (from the log output) that caused
 the
  failure.
 
  In your case, it would be:
 
  rpm --import
  https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;
 
  
   For some reason that `rpm --import` failed. Could be network
 related.
  
   
   
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com