Re: [ceph-users] LevelDB Backend For Ceph OSD Preview

2013-11-25 Thread Sebastien Han
Nice job Haomai!

 
Sébastien Han 
Cloud Engineer 

"Always give 100%. Unless you're giving blood.” 

Phone: +33 (0)1 49 70 99 72 
Mail: sebastien@enovance.com 
Address : 10, rue de la Victoire - 75009 Paris 
Web : www.enovance.com - Twitter : @enovance 

On 25 Nov 2013, at 02:50, Haomai Wang  wrote:

> 
> 
> 
> On Mon, Nov 25, 2013 at 2:17 AM, Mark Nelson  wrote:
> Great Work! This is very exciting!  Did you happen to try RADOS bench at 
> different object sizes and concurrency levels?
> 
> 
> Maybe can try it later. :-)
>  
> Mark
> 
> 
> On 11/24/2013 03:01 AM, Haomai Wang wrote:
> Hi all,
> 
> For the Emperor
> blueprint (http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Add_LevelDB_support_to_ceph_cluster_backend_store),
> I'm sorry for the delay in progress. I have now done most of the work
> toward the blueprint's goal. Because of Sage's Firefly
> blueprint (http://wiki.ceph.com/index.php?title=01Planning/02Blueprints/Firefly/osd:_new_key%2F%2Fvalue_backend),
> I need to adjust some code to match it. The branch is
> here (https://github.com/yuyuyu101/ceph/tree/wip/6173).
> 
> I have tested the LevelDB backend on three nodes (eight OSDs) and compared
> it to FileStore (ext4). I used the internal benchmark tool "rados bench" to
> get the comparison. The default ceph configuration is used and the
> replication size is 2. The filesystem is ext4 and nothing else was changed. The
> results are below:
> 
> *Rados Bench*  (bandwidth columns in MB/sec; "-" = not reported for the read runs)
> 
>            Backend    Bandwidth  AvgLat    MaxLat    MinLat     StddevLat  StddevBW  MaxBW  MinBW
> Write 30   KVStore    24.590     4.87257   14.752    0.580851   2.97708    9.91938   44     0
>            FileStore  23.495     5.07716   13.0885   0.605118   3.30538    10.5986   76     0
> Write 20   KVStore    23.515     3.39745   11.6089   0.169507   2.58285    9.14467   44     0
>            FileStore  23.064     3.45711   11.5996   0.138595   2.75962    8.54156   40     0
> Write 10   KVStore    22.927     1.73815   5.53792   0.171028   1.05982    9.18403   44     0
>            FileStore  21.980     1.8198    6.46675   0.143392   1.20303    8.74401   40     0
> Write 5    KVStore    19.680     1.01492   3.10783   0.143758   0.561548   5.92575   36     0
>            FileStore  20.017     0.997019  3.05008   0.138161   0.571459   6.844     32     0
> Read 30    KVStore    65.852     1.80069   9.30039   0.115153   -          -         -      -
>            FileStore  60.688     1.96009   10.1146   0.061657   -          -         -      -
> Read 20    KVStore    59.372     1.30479   6.28435   0.016843   -          -         -      -
>            FileStore  60.738     1.28383   8.21304   0.012073   -          -         -      -
> Read 10    KVStore    65.502     0.608805  3.3917    0.016267   -          -         -      -
>            FileStore  55.814     0.7087    4.72626   0.011998   -          -         -      -
> 
> (Columns: Bandwidth (MB/sec), Average Latency, Max Latency, Min Latency,
> Stddev Latency, Stddev Bandwidth (MB/sec), Max Bandwidth (MB/sec), Min Bandwidth (MB/sec).)
>
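For reference, the kind of sweep Mark asked about (different object sizes and
concurrency levels) could be scripted along these lines; the pool name "rbd",
the 60-second runs, and the size/concurrency values below are only placeholders,
not something that was actually run here:

    for bs in 4096 65536 4194304; do
        for t in 5 10 20 30; do
            rados bench -p rbd 60 write -b $bs -t $t
        done
    done

Read numbers should be obtainable the same way by writing once with --no-cleanup
and then running "rados bench -p rbd 60 seq -t $t".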

Re: [ceph-users] alternative approaches to CEPH-FS

2013-11-25 Thread Sebastien Han
Hi,

1) nfs over rbd (http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/)

This has been in production for more than a year now and was heavily tested before.
Performance was not expected to be high, since the frontend servers mostly do reads (90%).

Cheers.
 
Sébastien Han 
Cloud Engineer 

"Always give 100%. Unless you're giving blood.” 

Phone: +33 (0)1 49 70 99 72 
Mail: sebastien@enovance.com 
Address : 10, rue de la Victoire - 75009 Paris 
Web : www.enovance.com - Twitter : @enovance 

On 14 Nov 2013, at 17:08, Gautam Saxena  wrote:

> 1) nfs over rbd (http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel 520/530 SSD for ceph

2013-11-25 Thread James Pearce
Having a configurable would be ideal.  Users should be made aware of 
the need for super-caps via documentation in that case.


Quickly eye-balling the code... can this be patched via journaller.cc 
for testing?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel 520/530 SSD for ceph

2013-11-25 Thread Stefan Priebe - Profihost AG
Hi James,

after having some discussions with the kernel guys and after digging
through the kernel code and sending a patch today ;-)

It is quite easy to do this via the kernel using this commit:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=39c60a0948cc06139e2fbfe084f83cb7e7deae3b

Just do:
echo temporary write through > /sys/class/scsi_disk//cache_type

for the disk in question, and the kernel then does not issue any FLUSH internally. Even
though the command looks like it disables write-back caching, it does not.
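To make that concrete (the SCSI address 0:0:0:0 below is only a placeholder for
the disk in question, and the setting does not survive a reboot):

    # check the current setting, then switch it temporarily to write-through
    cat /sys/class/scsi_disk/0:0:0:0/cache_type
    echo "temporary write through" > /sys/class/scsi_disk/0:0:0:0/cache_type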

But it still needs a patch of mine from today:


Signed-off-by: Stefan Priebe 
---
 drivers/scsi/sd.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 734a29a..ccc6242 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -174,7 +174,7 @@ sd_store_cache_type(struct device *dev, struct
device_attribute *attr,
if (sdkp->cache_override) {
sdkp->WCE = wce;
sdkp->RCD = rcd;
-   return count;
+   goto out;
}

if (scsi_mode_sense(sdp, 0x08, 8, buffer, sizeof(buffer), SD_TIMEOUT,
@@ -194,6 +194,7 @@ sd_store_cache_type(struct device *dev, struct
device_attribute *attr,
sd_print_sense_hdr(sdkp, &sshdr);
return -EINVAL;
}
+out:
revalidate_disk(sdkp->disk);
return count;
 }
-- 1.7.10.4



Stefan

Am 25.11.2013 10:59, schrieb James Pearce:
> Having a configurable would be ideal.  Users should be made aware of the
> need for super-caps via documentation in that case.
> 
> Quickly eye-balling the code... can this be patched via journaller.cc
> for testing?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Container size via s3api

2013-11-25 Thread Mihály Árva-Tóth
Hello,

I can retrieve a container's usage (the sum of the sizes of the objects inside it) via
the Swift API:

$ swift -V 1.0 -A http://localhost/auth -U test:swift -K xxx stat
test_container

  Account: v1
Container: test_container
  Objects: 549
Bytes: 31665126

How can I get this information via s3api?

Thank you,
Mihaly
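(Note: there appears to be no single S3 API call that returns this directly from
radosgw; two common workarounds, with the bucket name below as a placeholder, are
to sum the object sizes client-side, e.g. with s3cmd, or to query the gateway
directly given admin access to it:

    s3cmd du s3://test_container
    radosgw-admin bucket stats --bucket=test_container
)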
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rest api json format incomplet informations

2013-11-25 Thread eric mourgaya
Hi,

I'm trying to list the snapshots of a pool using ceph-rest-api. The JSON
format only displays the last snapshot of the pool, not all of them.
The ceph version is 0.67.3
(408cd61584c72c0d97b774b3d8f95c6b1b06341a).


http://@ip/api/v0.1/osd/dump  :


(XML output, listing all three pool snapshots:)
  snapid 3   stamp 2013-11-25 11:37:34.695874   name ericsnap1
  snapid 4   stamp 2013-11-25 11:40:30.832976   name ericsnap2
  snapid 5   stamp 2013-11-25 12:40:02.365069   name ericsnap2wq

versus

http://@ip/api/v0.1/osd/dump.json or with
headers={"Accept":"application/json"}

 "pool_snaps": {  "pool_snap_info": {"stamp":
"2013-11-25 12:40:02.365069","snapid": 5,
"name": "ericsnap2wq"  }},

Do you also have the same problem?



-- 
Eric Mourgaya,


Respectons la planete!
Luttons contre la mediocrite!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] alternative approaches to CEPH-FS

2013-11-25 Thread Gautam Saxena
Hi Sebastien.

Thanks! When you say "performance was not expected", can you elaborate a
little? Specifically, what did you notice in terms of performance?



On Mon, Nov 25, 2013 at 4:39 AM, Sebastien Han
wrote:

> Hi,
>
> 1) nfs over rbd (http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
> )
>
> This has been in production for more than a year now and was heavily tested
> before.
> Performance was not expected to be high, since the frontend servers mostly do reads (90%).
>
> Cheers.
> 
> Sébastien Han
> Cloud Engineer
>
> "Always give 100%. Unless you're giving blood.”
>
> Phone: +33 (0)1 49 70 99 72
> Mail: sebastien@enovance.com
> Address : 10, rue de la Victoire - 75009 Paris
> Web : www.enovance.com - Twitter : @enovance
>
> On 14 Nov 2013, at 17:08, Gautam Saxena  wrote:
>
> > 1) nfs over rbd (
> http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/)
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to enable rbd cache

2013-11-25 Thread Shu, Xinxin
Recently, I have wanted to enable the rbd cache to identify its performance benefit. I added the 
rbd_cache=true option to my ceph configuration file, and I use 'virsh attach-device' to 
attach the rbd to a vm; below is my vdb xml file.


  
  
  
  6b5ff6f4-9f8c-4fe0-84d6-9d795967c7dd
  i


I do not know whether this is enough to enable the rbd cache. I see perf counters for the rbd 
cache in the source code, but when I used the admin daemon to check the rbd cache statistics:

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump

I did not get any rbd cache entries.

My question is how to enable the rbd cache and check the rbd cache perf counters, or how 
I can make sure the rbd cache is enabled. Any tips will be appreciated. Thanks in 
advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] alternative approaches to CEPH-FS

2013-11-25 Thread Sebastien Han
Hi,

Well, basically, the frontend is composed of web servers. 
They mostly do reads on the NFS mount. 
I believe that the biggest frontend has around 60 virtual machines, accessing 
the share and serving it.

Unfortunately, I don’t have any figures anymore but performances were really 
poor in general. However they were fair enough for us since the workload was 
going to be “mixed read”.

 
Sébastien Han 
Cloud Engineer 

"Always give 100%. Unless you're giving blood.” 

Phone: +33 (0)1 49 70 99 72 
Mail: sebastien@enovance.com 
Address : 10, rue de la Victoire - 75009 Paris 
Web : www.enovance.com - Twitter : @enovance 

On 25 Nov 2013, at 13:50, Gautam Saxena  wrote:

> Hi Sebastien.
> 
> Thanks! When you say "performance was not expected", can you elaborate a 
> little? Specifically, what did you notice in terms of performance?
> 
> 
> 
> On Mon, Nov 25, 2013 at 4:39 AM, Sebastien Han  
> wrote:
> Hi,
> 
> 1) nfs over rbd (http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/)
> 
> This has been in production for more than a year now and was heavily tested 
> before.
> Performance was not expected to be high, since the frontend servers mostly do reads (90%).
> 
> Cheers.
> 
> Sébastien Han
> Cloud Engineer
> 
> "Always give 100%. Unless you're giving blood.”
> 
> Phone: +33 (0)1 49 70 99 72
> Mail: sebastien@enovance.com
> Address : 10, rue de la Victoire - 75009 Paris
> Web : www.enovance.com - Twitter : @enovance
> 
> On 14 Nov 2013, at 17:08, Gautam Saxena  wrote:
> 
> > 1) nfs over rbd (http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/)
> 
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to enable rbd cache

2013-11-25 Thread Mark Nelson

On 11/25/2013 07:21 AM, Shu, Xinxin wrote:

Recently , I want to enable rbd cache to identify performance benefit. I
add rbd_cache=true option in my ceph configure file, I use ’virsh
attach-device’ to attach rbd to vm, below is my vdb xml file.


Ceph configuration files are a bit confusing because sometimes you'll 
see something like "rbd_cache" listed somewhere but in the ceph.conf 
file you'll want a space instead:


rbd cache = true

with no underscore.  That should (hopefully) fix it for you!
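A minimal sketch of what that could look like in ceph.conf on the compute node
(the [client] section name is just the usual choice; a [global] entry also works):

    [client]
        rbd cache = true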





   

   

   

   6b5ff6f4-9f8c-4fe0-84d6-9d795967c7dd

   i



I do not know this is ok to enable rbd cache. I see perf counter for rbd
cache in source code, but when I used admin daemon to check rbd cache
statistics,

Ceph –admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump

But I did not get any rbd cahce flags.

My question is how to enable rbd cahce and check rbd cache perf counter,
or how can I make sure rbd cache is enabled, any tips will be
appreciated? Thanks in advanced.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Openstack Havana, boot from volume fails

2013-11-25 Thread Jens-Christian Fischer
Hi Narendra

rbd for cinder and glance is configured according to the ceph documentation here: 
http://ceph.com/docs/master/rbd/rbd-openstack/

rbd for VM images is configured like so: https://review.openstack.org/#/c/36042/

config sample (nova.conf):

--- cut ---

volume_driver=nova.volume.driver.RBDDriver
rbd_pool=volumes
rbd_user=volumes
rbd_secret_uuid=--


libvirt_images_type=rbd
# the RADOS pool in which rbd volumes are stored (string value)
libvirt_images_rbd_pool=volumes
# path to the ceph configuration file to use (string value)
libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf


# don't inject stuff into partitions; RBD-backed partitions don't work that way
libvirt_inject_partition = -2

--- cut ---

and finally, used the following files from this repository: 
https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd

image/glance.py
virt/images.py
virt/libvirt/driver.py
virt/libvirt/imagebackend.py
virt/libvirt/utils.py

good luck :)

cheers
jc

-- 
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/socialmedia

On 22.11.2013, at 17:41, "Trivedi, Narendra"  
wrote:

> Hi Jean,
>  
> Could you please tell me which link you followed to install RBD etc. for 
> Havana?
>  
> Thanks!
> Narendra
>  
> From: ceph-users-boun...@lists.ceph.com 
> [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jens-Christian Fischer
> Sent: Thursday, November 21, 2013 8:06 AM
> To: ceph-users@lists.ceph.com
> Cc: Rüdiger Rissmann
> Subject: [ceph-users] Openstack Havana, boot from volume fails
>  
> Hi all
>  
> I'm playing with the boot from volume options in Havana and have run into 
> problems:
>  
> (Openstack Havana, Ceph Dumpling (0.67.4), rbd for glance, cinder and 
> experimental ephemeral disk support)
>  
> The following things do work:
> - glance images are in rbd
> - cinder volumes are in rbd
> - creating a VM from an image works
> - creating a VM from a snapshot works
>  
>  
> However, the booting from volume fails:
>  
> Steps to reproduce:
>  
> Boot from image
> Create snapshot from running instance
> Create volume from this snapshot
> Start a new instance with "boot from volume" and the volume just created:
>  
> The boot process hangs after around 3 seconds, and the console.log of the 
> instance shows this:
>  
> [0.00] Linux version 3.11.0-12-generic (buildd@allspice) (gcc version 
> 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu7) ) #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 
> 2013 (Ubuntu 3.11.0-12.19-generic 3.11.3)
> [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.11.0-12-generic 
> root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
> ...
> [0.098221] Brought up 1 CPUs
> [0.098964] smpboot: Total of 1 processors activated (4588.94 BogoMIPS)
> [0.100408] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> [0.102667] devtmpfs: initialized
> …
> [0.560202] Linux agpgart interface v0.103
> [0.562276] brd: module loaded
> [0.563599] loop: module loaded
> [0.565315]  vda: vda1
> [0.568386] scsi0 : ata_piix
> [0.569217] scsi1 : ata_piix
> [0.569972] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc0a0 irq 14
> [0.571289] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc0a8 irq 15
> …
> [0.742082] Freeing unused kernel memory: 1040K (8800016fc000 - 
> 88000180)
> [0.746153] Freeing unused kernel memory: 836K (880001b2f000 - 
> 880001c0)
> Loading, please wait...
> [0.764177] systemd-udevd[95]: starting version 204
> [0.787913] floppy: module verification failed: signature and/or required 
> key missing - tainting kernel
> [0.825174] FDC 0 is a S82078B
> …
> [1.448178] tsc: Refined TSC clocksource calibration: 2294.376 MHz
> error: unexpectedly disconnected from boot status daemon
> Begin: Loading essential drivers ... done.
> Begin: Running /scripts/init-premount ... done.
> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... 
> done.
> Begin: Running /scripts/local-premount ... done.
> [2.384452] EXT4-fs (vda1): mounted filesystem with ordered data mode. 
> Opts: (null)
> Begin: Running /scripts/local-bottom ... done.
> done.
> Begin: Running /scripts/init-bottom ... done.
> [3.021268] init: mountall main process (193) killed by FPE signal
> General error mounting filesystems.
> A maintenance shell will now be started.
> CONTROL-D will terminate this shell and reboot the system.
> root@box-web1:~# 
> The console is stuck, I can't get to the rescue shell
>  
> I can "rbd map" the volume and mount it from a physical host - the filesystem 
> etc all is in good order.
>  
> Any ideas?
>  
> cheers
> jc
>  
> -- 
> SWITCH
> Jens-Christian Fischer, Peta Solutions
> Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
> phone +41 44 268 15 15, direct +41 44 268 15 71
> jens-chris

Re: [ceph-users] Openstack Havana, boot from volume fails

2013-11-25 Thread Jens-Christian Fischer
Hi Steffen

the virsh secret is defined on all compute hosts. Booting from a volume works; 
it's the "boot from image (create volume)" part that doesn't work.

cheers
jc
 
-- 
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/socialmedia

On 21.11.2013, at 15:46, Steffen Thorhauer  
wrote:

> Hi,
> I think you have to set the libvirt secret for your ceph UUID on your 
> nova-compute node  like 
> 
> <secret ephemeral='no' private='no'>
>   <uuid>e1915277-e3a5-4547-bc9e-4991c6864dc7</uuid>
>   <usage type='ceph'>
>     <name>client.volumes secret</name>
>   </usage>
> </secret>
> 
> in secret.xml 
> 
> virsh secret-define secret.xml
> and set the secret
> 
> virsh  secret-set-value e1915277-e3a5-4547-bc9e-4991c6864dc7 
> ceph-secret-of.client-volumes
> 
> Regards,
>   Steffen Thorhauer
> 
> On 11/21/2013 03:05 PM, Jens-Christian Fischer wrote:
>> Hi all
>> 
>> I'm playing with the boot from volume options in Havana and have run into 
>> problems:
>> 
>> (Openstack Havana, Ceph Dumpling (0.67.4), rbd for glance, cinder and 
>> experimental ephemeral disk support)
>> 
>> The following things do work:
>> - glance images are in rbd
>> - cinder volumes are in rbd
>> - creating a VM from an image works
>> - creating a VM from a snapshot works
>> 
>> 
>> However, the booting from volume fails:
>> 
>> Steps to reproduce:
>> 
>> Boot from image
>> Create snapshot from running instance
>> Create volume from this snapshot
>> Start a new instance with "boot from volume" and the volume just created:
>> 
>> The boot process hangs after around 3 seconds, and the console.log of the 
>> instance shows this:
>> 
>> [0.00] Linux version 3.11.0-12-generic (buildd@allspice) (gcc 
>> version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu7) ) #19-Ubuntu SMP Wed Oct 9 
>> 16:20:46 UTC 2013 (Ubuntu 3.11.0-12.19-generic 3.11.3)
>> [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.11.0-12-generic 
>> root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
>> ...
>> [0.098221] Brought up 1 CPUs
>> [0.098964] smpboot: Total of 1 processors activated (4588.94 BogoMIPS)
>> [0.100408] NMI watchdog: enabled on all CPUs, permanently consumes one 
>> hw-PMU counter.
>> [0.102667] devtmpfs: initialized
>> …
>> [0.560202] Linux agpgart interface v0.103
>> [0.562276] brd: module loaded
>> [0.563599] loop: module loaded
>> [0.565315]  vda: vda1
>> [0.568386] scsi0 : ata_piix
>> [0.569217] scsi1 : ata_piix
>> [0.569972] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc0a0 irq 14
>> [0.571289] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc0a8 irq 15
>> …
>> [0.742082] Freeing unused kernel memory: 1040K (8800016fc000 - 
>> 88000180)
>> [0.746153] Freeing unused kernel memory: 836K (880001b2f000 - 
>> 880001c0)
>> Loading, please wait...
>> [0.764177] systemd-udevd[95]: starting version 204
>> [0.787913] floppy: module verification failed: signature and/or required 
>> key missing - tainting kernel
>> [0.825174] FDC 0 is a S82078B
>> …
>> [1.448178] tsc: Refined TSC clocksource calibration: 2294.376 MHz
>> error: unexpectedly disconnected from boot status daemon
>> Begin: Loading essential drivers ... done.
>> Begin: Running /scripts/init-premount ... done.
>> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... 
>> done.
>> Begin: Running /scripts/local-premount ... done.
>> [2.384452] EXT4-fs (vda1): mounted filesystem with ordered data mode. 
>> Opts: (null)
>> Begin: Running /scripts/local-bottom ... done.
>> done.
>> Begin: Running /scripts/init-bottom ... done.
>> [3.021268] init: mountall main process (193) killed by FPE signal
>> General error mounting filesystems.
>> A maintenance shell will now be started.
>> CONTROL-D will terminate this shell and reboot the system.
>> root@box-web1:~# 
>> The console is stuck, I can't get to the rescue shell
>> 
>> I can "rbd map" the volume and mount it from a physical host - the 
>> filesystem etc all is in good order.
>> 
>> Any ideas?
>> 
>> cheers
>> jc
>> 
>> -- 
>> SWITCH
>> Jens-Christian Fischer, Peta Solutions
>> Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
>> phone +41 44 268 15 15, direct +41 44 268 15 71
>> jens-christian.fisc...@switch.ch
>> http://www.switch.ch
>> 
>> http://www.switch.ch/socialmedia
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> -- 
> __
> Steffen Thorhauer
> 
> Department of Technichal and Business Information Systems (ITI)
> Faculty of Computer Science (FIN)
>   Otto von Guericke University Magdeburg
> Universitaetsplatz 2
> 39106 Magdeburg, Germany
> 
> phone: 0391 67 52996
> fax: 0391 67 12341
> email: s...@iti.cs.uni-magdeburg.de
> url: http://wwwiti.cs.uni

[ceph-users] PG state diagram

2013-11-25 Thread Regola, Nathan (Contractor)
Is there a vector graphics file (or a higher resolution file of some type)
of the state diagram on the page below, as I can't read the text.

Thanks,
Nate


http://ceph.com/docs/master/dev/peering/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG state diagram

2013-11-25 Thread Ирек Фасихов
Yes, I would like to see this graph.

Thanks


2013/11/25 Regola, Nathan (Contractor) 

> Is there a vector graphics file (or a higher resolution file of some type)
> of the state diagram on the page below, as I can't read the text.
>
> Thanks,
> Nate
>
>
> http://ceph.com/docs/master/dev/peering/
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HEALTH_WARN # requests are blocked > 32 sec

2013-11-25 Thread Michael

Hi,

Any ideas on troubleshooting a "requests are blocked" when all of the 
nodes appear to be running OK?
Nothing gets reported in  /var/log/ceph/ceph.log as everything is 
active+clean throughout the event. All of the nodes can be accessed and 
all report the warning while they are blocking.


root@srv-8:~# ceph -w
cluster ab3f7bc0-4cf7-4489-9cde-1af11d68a834
 health HEALTH_WARN 151 requests are blocked > 32 sec
 monmap e5: 3 mons at 
{srv-10=#:6789/0,srv-11=#:6789/0,srv-12=#:6789/0}, election epoch 398, 
quorum 0,1,2 srv-10,srv-11,srv-12

 mdsmap e182: 1/1/1 up {0=srv-195-8-125-10=up:active}
 osdmap e12081: 9 osds: 9 up, 9 in
  pgmap v4454492: 1864 pgs, 4 pools, 244 GB data, 1555 kobjects
488 GB used, 6494 GB / 6982 GB avail
1864 active+clean
  client io 18656 B/s wr, 1 op/s

2013-11-25 13:55:10.080698 mon.0 [INF] pgmap v4454490: 1864 pgs: 1864 
active+clean; 244 GB data, 488 GB used, 6494 GB / 6982 GB avail; 10378 
B/s wr, 1 op/s
2013-11-25 13:55:11.165616 mon.0 [INF] pgmap v4454491: 1864 pgs: 1864 
active+clean; 244 GB data, 488 GB used, 6494 GB / 6982 GB avail; 37331 
B/s wr, 4 op/s
2013-11-25 13:55:13.078688 mon.0 [INF] pgmap v4454492: 1864 pgs: 1864 
active+clean; 244 GB data, 488 GB used, 6494 GB / 6982 GB avail; 18656 
B/s wr, 1 op/s


All reads seem to get prevented from happening while it's going on.

root@srv-10:~# ceph -v
ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de)

Seems to happen for periods of a couple of minutes, then things wake up again.

Thanks much,
-Michael
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN # requests are blocked > 32 sec

2013-11-25 Thread Ирек Фасихов
ceph health detail


2013/11/25 Michael 

> Hi,
>
> Any ideas on troubleshooting a "requests are blocked" when all of the
> nodes appear to be running OK?
> Nothing gets reported in  /var/log/ceph/ceph.log as everything is
> active+clean throughout the event. All of the nodes can be accessed and all
> report the warning while they are blocking.
>
> root@srv-8:~# ceph -w
> cluster ab3f7bc0-4cf7-4489-9cde-1af11d68a834
>  health HEALTH_WARN 151 requests are blocked > 32 sec
>  monmap e5: 3 mons at {srv-10=#:6789/0,srv-11=#:6789/0,srv-12=#:6789/0},
> election epoch 398, quorum 0,1,2 srv-10,srv-11,srv-12
>  mdsmap e182: 1/1/1 up {0=srv-195-8-125-10=up:active}
>  osdmap e12081: 9 osds: 9 up, 9 in
>   pgmap v4454492: 1864 pgs, 4 pools, 244 GB data, 1555 kobjects
> 488 GB used, 6494 GB / 6982 GB avail
> 1864 active+clean
>   client io 18656 B/s wr, 1 op/s
>
> 2013-11-25 13:55:10.080698 mon.0 [INF] pgmap v4454490: 1864 pgs: 1864
> active+clean; 244 GB data, 488 GB used, 6494 GB / 6982 GB avail; 10378 B/s
> wr, 1 op/s
> 2013-11-25 13:55:11.165616 mon.0 [INF] pgmap v4454491: 1864 pgs: 1864
> active+clean; 244 GB data, 488 GB used, 6494 GB / 6982 GB avail; 37331 B/s
> wr, 4 op/s
> 2013-11-25 13:55:13.078688 mon.0 [INF] pgmap v4454492: 1864 pgs: 1864
> active+clean; 244 GB data, 488 GB used, 6494 GB / 6982 GB avail; 18656 B/s
> wr, 1 op/s
>
> All reads seem to get prevented from happening while it's going on.
>
> root@srv-10:~# ceph -v
> ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de)
>
> Seems to happen for periods of a couple of minutes then wake up again.
>
> Thanks much,
> -Michael
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN # requests are blocked > 32 sec

2013-11-25 Thread Michael

OK waited for it to happen again and detail started with:

HEALTH_WARN 2 requests are blocked > 32 sec; 1 osds have slow requests
2 ops are blocked > 32.768 sec
2 ops are blocked > 32.768 sec on osd.3
1 osds have slow requests

and slowly moving on to:

HEALTH_WARN 154 requests are blocked > 32 sec; 2 osds have slow requests
57 ops are blocked > 524.288 sec
71 ops are blocked > 262.144 sec
16 ops are blocked > 131.072 sec
10 ops are blocked > 65.536 sec
40 ops are blocked > 524.288 sec on osd.3
21 ops are blocked > 262.144 sec on osd.3
10 ops are blocked > 131.072 sec on osd.3
5 ops are blocked > 65.536 sec on osd.3
17 ops are blocked > 524.288 sec on osd.8
50 ops are blocked > 262.144 sec on osd.8
6 ops are blocked > 131.072 sec on osd.8
5 ops are blocked > 65.536 sec on osd.8
2 osds have slow requests

Writes seem to be happening during the block, but this is now getting 
more frequent and seems to last for longer periods.

Looking at the osd logs for 3 and 8 there's nothing of relevance in there.

Any ideas on the next step?

Thanks,
-Michael

On 25/11/2013 15:28, Ирек Фасихов wrote:

ceph health detail

--
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pg inconsistent : found clone without head

2013-11-25 Thread Laurent Barbe

Hello,

Since yesterday, scrub has detected an inconsistent pg :( :

# ceph health detail(ceph version 0.61.9)
HEALTH_ERR 1 pgs inconsistent; 9 scrub errors
pg 3.136 is active+clean+inconsistent, acting [9,1]
9 scrub errors

# ceph pg map 3.136
osdmap e4363 pg 3.136 (3.136) -> up [9,1] acting [9,1]

But when I try to repair, osd.9 daemon failed :

# ceph pg repair 3.136
instructing pg 3.136 on osd.9 to repair

2013-11-25 10:04:09.758845 7fc2f0706700  0 log [ERR] : 3.136 osd.9 
missing 96ad1336/rb.0.32a6.238e1f29.00034d6a/5ab//3
2013-11-25 10:04:09.759862 7fc2f0706700  0 log [ERR] : repair 3.136 
96ad1336/rb.0.32a6.238e1f29.00034d6a/5ab//3 found clone without head
2013-11-25 10:04:12.872908 7fc2f0706700  0 log [ERR] : 3.136 osd.9 
missing e5822336/rb.0.32a6.238e1f29.00036552/5b3//3
2013-11-25 10:04:12.873064 7fc2f0706700  0 log [ERR] : repair 3.136 
e5822336/rb.0.32a6.238e1f29.00036552/5b3//3 found clone without head
2013-11-25 10:04:14.497750 7fc2f0706700  0 log [ERR] : 3.136 osd.9 
missing 38372336/rb.0.32a6.238e1f29.00011379/5bb//3
2013-11-25 10:04:14.497796 7fc2f0706700  0 log [ERR] : repair 3.136 
38372336/rb.0.32a6.238e1f29.00011379/5bb//3 found clone without head
2013-11-25 10:04:57.557894 7fc2f0706700  0 log [ERR] : 3.136 osd.9 
missing 109b8336/rb.0.32a6.238e1f29.0003ad6b/5ab//3
2013-11-25 10:04:57.558052 7fc2f0706700  0 log [ERR] : repair 3.136 
109b8336/rb.0.32a6.238e1f29.0003ad6b/5ab//3 found clone without head
2013-11-25 10:17:45.835145 7fc2f0706700  0 log [ERR] : 3.136 repair stat 
mismatch, got 8289/8292 objects, 1981/1984 clones, 
26293444608/26294251520 bytes.
2013-11-25 10:17:45.835248 7fc2f0706700  0 log [ERR] : 3.136 repair 4 
missing, 0 inconsistent objects
2013-11-25 10:17:45.835320 7fc2f0706700  0 log [ERR] : 3.136 repair 9 
errors, 5 fixed
2013-11-25 10:17:45.839963 7fc2f0f07700 -1 osd/ReplicatedPG.cc: In 
function 'int ReplicatedPG::recover_primary(int)' thread 7fc2f0f07700 
time 2013-11-25 10:17:45.836790

osd/ReplicatedPG.cc: 6643: FAILED assert(latest->is_update())


The objects ("found clone without head") concern the rbd image below 
(which is in use):


# rbd info datashare/share3
rbd image 'share3':
size 1024 GB in 262144 objects
order 22 (4096 KB objects)
block_name_prefix: rb.0.32a6.238e1f29
format: 1


Directory contents :
In OSD.9 (Primary) :
/var/lib/ceph/osd/ceph-9/current/3.136_head/DIR_6/DIR_3/DIR_3/DIR_1# ls 
-l rb.0.32a6.238e1f29.00034d6a*
-rw-r--r-- 1 root root 4194304 nov.   6 02:25 
rb.0.32a6.238e1f29.00034d6a__7ed_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.   8 02:40 
rb.0.32a6.238e1f29.00034d6a__7f5_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.   9 02:44 
rb.0.32a6.238e1f29.00034d6a__7fd_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  12 02:52 
rb.0.32a6.238e1f29.00034d6a__815_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  14 02:39 
rb.0.32a6.238e1f29.00034d6a__825_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  16 02:45 
rb.0.32a6.238e1f29.00034d6a__835_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  19 01:59 
rb.0.32a6.238e1f29.00034d6a__84d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  20 02:25 
rb.0.32a6.238e1f29.00034d6a__855_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  22 02:18 
rb.0.32a6.238e1f29.00034d6a__865_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  23 02:24 
rb.0.32a6.238e1f29.00034d6a__86d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  23 02:24 
rb.0.32a6.238e1f29.00034d6a__head_96AD1336__3


In OSD.1 (Replica) :
/var/lib/ceph/osd/ceph-1/current/3.136_head/DIR_6/DIR_3/DIR_3/DIR_1# ls 
-l rb.0.32a6.238e1f29.00034d6a*
-rw-r--r-- 1 root root 4194304 oct.  11 17:13 
rb.0.32a6.238e1f29.00034d6a__5ab_96AD1336__3   <--- 
-rw-r--r-- 1 root root 4194304 nov.   6 02:25 
rb.0.32a6.238e1f29.00034d6a__7ed_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.   8 02:40 
rb.0.32a6.238e1f29.00034d6a__7f5_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.   9 02:44 
rb.0.32a6.238e1f29.00034d6a__7fd_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  12 02:52 
rb.0.32a6.238e1f29.00034d6a__815_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  14 02:39 
rb.0.32a6.238e1f29.00034d6a__825_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  16 02:45 
rb.0.32a6.238e1f29.00034d6a__835_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  19 01:59 
rb.0.32a6.238e1f29.00034d6a__84d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  20 02:25 
rb.0.32a6.238e1f29.00034d6a__855_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  22 02:18 
rb.0.32a6.238e1f29.00034d6a__865_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  23 02:24 
rb.0.32a6.238e1f29.00034d6a__86d_96AD1336__3
-rw-r--r-- 1 root root 4194304 nov.  23 02:24 
rb.0.32a6.238e1f29.00034d6a__head_96AD1336__3



The file rb.0.32a6.238e1f29.00034d6a__5ab_96AD1336__3 is only 
present on the replica on osd.1. It seems that this snapshot (5ab) no longer 
exists.

Re: [ceph-users] how to enable rbd cache

2013-11-25 Thread Gregory Farnum
On Mon, Nov 25, 2013 at 5:58 AM, Mark Nelson  wrote:
> On 11/25/2013 07:21 AM, Shu, Xinxin wrote:
>>
>> Recently , I want to enable rbd cache to identify performance benefit. I
>> add rbd_cache=true option in my ceph configure file, I use ’virsh
>> attach-device’ to attach rbd to vm, below is my vdb xml file.
>
>
> Ceph configuration files are a bit confusing because sometimes you'll see
> something like "rbd_cache" listed somewhere but in the ceph.conf file you'll
> want a space instead:
>
> rbd cache = true
>
> with no underscore.  That should (hopefully) fix it for you!

I believe the config file will take either format.

The RBD cache is a client-side thing, though, so it's not ever going
to show up in the OSD! You want to look at the admin socket created by
QEMU (via librbd) to see if it's working. :)
-Greg
-Greg

>
>>
>> 
>>
>>
>>
>>>
>> name='rbd/node12_2:rbd_cache=true:rbd_cache_writethrough_until_flush=true'/>
>>
>>
>>
>>6b5ff6f4-9f8c-4fe0-84d6-9d795967c7dd
>>
>>> function='0x0'/>i
>>
>> 
>>
>> I do not know this is ok to enable rbd cache. I see perf counter for rbd
>> cache in source code, but when I used admin daemon to check rbd cache
>> statistics,
>>
>> Ceph –admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
>>
>> But I did not get any rbd cahce flags.
>>
>> My question is how to enable rbd cahce and check rbd cache perf counter,
>> or how can I make sure rbd cache is enabled, any tips will be
>> appreciated? Thanks in advanced.
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to enable rbd cache

2013-11-25 Thread Mike Dawson
Greg is right, you need to enable RBD admin sockets. This can be a bit 
tricky though, so here are a few tips:


1) In ceph.conf on the compute node, explicitly set a location for the 
admin socket:


[client.volumes]
admin socket = /var/run/ceph/rbd-$pid.asok

In this example, libvirt/qemu is running with permissions from 
ceph.client.volumes.keyring. If you use something different, adjust 
accordingly. You can put this under a more generic [client] section, but 
there are some downsides (like a new admin socket for each ceph cli 
command).


2) Watch for permissions issues creating the admin socket at the path 
you used above. For me, I needed to explicitly grant some permissions in 
/etc/apparmor.d/abstractions/libvirt-qemu, specifically I had to add:


  # for rbd
  capability mknod,

and

  # for rbd
  /etc/ceph/ceph.conf r,
  /var/log/ceph/* rw,
  /{,var/}run/ceph/** rw,

3) Be aware that if you have multiple rbd volumes attached to a single 
guest, you'll only get an admin socket for the volume mounted last. 
If you can set admin_socket via the libvirt xml for each volume, you can 
avoid this issue. This thread explains it better:


http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg16168.html

4) Once you get an RBD admin socket, query it like:

ceph --admin-daemon /var/run/ceph/rbd-29050.asok config show | grep rbd
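For the cache counters the original question was about, the same socket should
also answer a "perf dump" (the socket path below is the same placeholder as
above, and the exact counter names vary by version; look for the librbd section
of the output):

    ceph --admin-daemon /var/run/ceph/rbd-29050.asok perf dump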


Cheers,
Mike Dawson


On 11/25/2013 11:12 AM, Gregory Farnum wrote:

On Mon, Nov 25, 2013 at 5:58 AM, Mark Nelson  wrote:

On 11/25/2013 07:21 AM, Shu, Xinxin wrote:


Recently , I want to enable rbd cache to identify performance benefit. I
add rbd_cache=true option in my ceph configure file, I use ’virsh
attach-device’ to attach rbd to vm, below is my vdb xml file.



Ceph configuration files are a bit confusing because sometimes you'll see
something like "rbd_cache" listed somewhere but in the ceph.conf file you'll
want a space instead:

rbd cache = true

with no underscore.  That should (hopefully) fix it for you!


I believe the config file will take either format.

The RBD cache is a client-side thing, though, so it's not ever going
to show up in the OSD! You want to look at the admin socket created by
QEMU (via librbd) to see if it's working. :)
-Greg
-Greg













6b5ff6f4-9f8c-4fe0-84d6-9d795967c7dd

i



I do not know this is ok to enable rbd cache. I see perf counter for rbd
cache in source code, but when I used admin daemon to check rbd cache
statistics,

Ceph –admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump

But I did not get any rbd cahce flags.

My question is how to enable rbd cahce and check rbd cache perf counter,
or how can I make sure rbd cache is enabled, any tips will be
appreciated? Thanks in advanced.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg inconsistent : found clone without head

2013-11-25 Thread Gregory Farnum
On Mon, Nov 25, 2013 at 8:10 AM, Laurent Barbe  wrote:
> Hello,
>
> Since yesterday, scrub has detected an inconsistent pg :( :
>
> # ceph health detail(ceph version 0.61.9)
> HEALTH_ERR 1 pgs inconsistent; 9 scrub errors
> pg 3.136 is active+clean+inconsistent, acting [9,1]
> 9 scrub errors
>
> # ceph pg map 3.136
> osdmap e4363 pg 3.136 (3.136) -> up [9,1] acting [9,1]
>
> But when I try to repair, osd.9 daemon failed :
>
> # ceph pg repair 3.136
> instructing pg 3.136 on osd.9 to repair
>
> 2013-11-25 10:04:09.758845 7fc2f0706700  0 log [ERR] : 3.136 osd.9 missing
> 96ad1336/rb.0.32a6.238e1f29.00034d6a/5ab//3
> 2013-11-25 10:04:09.759862 7fc2f0706700  0 log [ERR] : repair 3.136
> 96ad1336/rb.0.32a6.238e1f29.00034d6a/5ab//3 found clone without head
> 2013-11-25 10:04:12.872908 7fc2f0706700  0 log [ERR] : 3.136 osd.9 missing
> e5822336/rb.0.32a6.238e1f29.00036552/5b3//3
> 2013-11-25 10:04:12.873064 7fc2f0706700  0 log [ERR] : repair 3.136
> e5822336/rb.0.32a6.238e1f29.00036552/5b3//3 found clone without head
> 2013-11-25 10:04:14.497750 7fc2f0706700  0 log [ERR] : 3.136 osd.9 missing
> 38372336/rb.0.32a6.238e1f29.00011379/5bb//3
> 2013-11-25 10:04:14.497796 7fc2f0706700  0 log [ERR] : repair 3.136
> 38372336/rb.0.32a6.238e1f29.00011379/5bb//3 found clone without head
> 2013-11-25 10:04:57.557894 7fc2f0706700  0 log [ERR] : 3.136 osd.9 missing
> 109b8336/rb.0.32a6.238e1f29.0003ad6b/5ab//3
> 2013-11-25 10:04:57.558052 7fc2f0706700  0 log [ERR] : repair 3.136
> 109b8336/rb.0.32a6.238e1f29.0003ad6b/5ab//3 found clone without head
> 2013-11-25 10:17:45.835145 7fc2f0706700  0 log [ERR] : 3.136 repair stat
> mismatch, got 8289/8292 objects, 1981/1984 clones, 26293444608/26294251520
> bytes.
> 2013-11-25 10:17:45.835248 7fc2f0706700  0 log [ERR] : 3.136 repair 4
> missing, 0 inconsistent objects
> 2013-11-25 10:17:45.835320 7fc2f0706700  0 log [ERR] : 3.136 repair 9
> errors, 5 fixed
> 2013-11-25 10:17:45.839963 7fc2f0f07700 -1 osd/ReplicatedPG.cc: In function
> 'int ReplicatedPG::recover_primary(int)' thread 7fc2f0f07700 time 2013-11-25
> 10:17:45.836790
> osd/ReplicatedPG.cc: 6643: FAILED assert(latest->is_update())
>
>
> The object (found clone without head) concern the rbd images below (which is
> in use) :
>
> # rbd info datashare/share3
> rbd image 'share3':
> size 1024 GB in 262144 objects
> order 22 (4096 KB objects)
> block_name_prefix: rb.0.32a6.238e1f29
> format: 1
>
>
> Directory contents :
> In OSD.9 (Primary) :
> /var/lib/ceph/osd/ceph-9/current/3.136_head/DIR_6/DIR_3/DIR_3/DIR_1# ls -l
> rb.0.32a6.238e1f29.00034d6a*
> -rw-r--r-- 1 root root 4194304 nov.   6 02:25
> rb.0.32a6.238e1f29.00034d6a__7ed_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.   8 02:40
> rb.0.32a6.238e1f29.00034d6a__7f5_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.   9 02:44
> rb.0.32a6.238e1f29.00034d6a__7fd_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  12 02:52
> rb.0.32a6.238e1f29.00034d6a__815_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  14 02:39
> rb.0.32a6.238e1f29.00034d6a__825_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  16 02:45
> rb.0.32a6.238e1f29.00034d6a__835_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  19 01:59
> rb.0.32a6.238e1f29.00034d6a__84d_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  20 02:25
> rb.0.32a6.238e1f29.00034d6a__855_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  22 02:18
> rb.0.32a6.238e1f29.00034d6a__865_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  23 02:24
> rb.0.32a6.238e1f29.00034d6a__86d_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  23 02:24
> rb.0.32a6.238e1f29.00034d6a__head_96AD1336__3
>
> In OSD.1 (Replica) :
> /var/lib/ceph/osd/ceph-1/current/3.136_head/DIR_6/DIR_3/DIR_3/DIR_1# ls -l
> rb.0.32a6.238e1f29.00034d6a*
> -rw-r--r-- 1 root root 4194304 oct.  11 17:13
> rb.0.32a6.238e1f29.00034d6a__5ab_96AD1336__3   <--- 
> -rw-r--r-- 1 root root 4194304 nov.   6 02:25
> rb.0.32a6.238e1f29.00034d6a__7ed_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.   8 02:40
> rb.0.32a6.238e1f29.00034d6a__7f5_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.   9 02:44
> rb.0.32a6.238e1f29.00034d6a__7fd_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  12 02:52
> rb.0.32a6.238e1f29.00034d6a__815_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  14 02:39
> rb.0.32a6.238e1f29.00034d6a__825_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  16 02:45
> rb.0.32a6.238e1f29.00034d6a__835_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  19 01:59
> rb.0.32a6.238e1f29.00034d6a__84d_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  20 02:25
> rb.0.32a6.238e1f29.00034d6a__855_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  22 02:18
> rb.0.32a6.238e1f29.00034d6a__865_96AD1336__3
> -rw-r--r-- 1 root root 4194304 nov.  23 02:24
> rb.0.32a6.238e1f29.00034d6a__86d_96AD1336__3
> -rw-r--r-- 1 root root 4194

Re: [ceph-users] PG state diagram

2013-11-25 Thread Gregory Farnum
It's generated from a .dot file which you can render as you like. :)
Please be aware that that diagram is for developers and will be
meaningless without that knowledge.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
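For example, once the graphviz source has been extracted to a file (the
filename below is just a placeholder), it can be rendered at whatever size or
format you need:

    dot -Tsvg peering_graph.dot -o peering_graph.svg
    dot -Tpng -Gdpi=300 peering_graph.dot -o peering_graph.png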


On Mon, Nov 25, 2013 at 6:42 AM, Regola, Nathan (Contractor)
 wrote:
> Is there a vector graphics file (or a higher resolution file of some type)
> of the state diagram on the page below, as I can't read the text.
>
> Thanks,
> Nate
>
>
> http://ceph.com/docs/master/dev/peering/
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rest api json format incomplet informations

2013-11-25 Thread John Spray
You have found a bug in the underlying ceph command.  One can see the
same thing using "ceph -f json-pretty osd dump": we get a dict with
the same "pool_snap_info" key used more than once, like this:
  "pool_snaps": { "pool_snap_info": { "snapid": 1,
  "stamp": "2013-11-25 18:27:34.152418",
  "name": "test4.1"},
  "pool_snap_info": { "snapid": 2,
  "stamp": "2013-11-25 18:27:36.370861",
  "name": "test4.2"}},

When deserialized and then serialized by a JSON library such as the
python one, this malformed dict results in one of those entries being
arbitrarily dropped.
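A quick way to see that behaviour, using the Python json module mentioned above
(the duplicated key here is just a minimal stand-in for the osd dump output):

    $ python -c 'import json; print json.loads("{\"pool_snap_info\": {\"snapid\": 1}, \"pool_snap_info\": {\"snapid\": 2}}")'
    {u'pool_snap_info': {u'snapid': 2}}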

I have opened a ticket: http://tracker.ceph.com/issues/6894

Cheers,
John

On Mon, Nov 25, 2013 at 12:21 PM, eric mourgaya  wrote:
> Hi,
>
> I'm trying to list the snapshots of a pool using ceph-rest-api. The JSON
> format only displays the last snapshot of the pool, not all of them.
> The ceph version is 0.67.3
> (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
>
>
> http://@ip/api/v0.1/osd/dump  :
>
>
> (XML output, listing all three pool snapshots:)
>   snapid 3   stamp 2013-11-25 11:37:34.695874   name ericsnap1
>   snapid 4   stamp 2013-11-25 11:40:30.832976   name ericsnap2
>   snapid 5   stamp 2013-11-25 12:40:02.365069   name ericsnap2wq
>
> versus
>
> http://@ip/api/v0.1/osd/dump.json or with
> headers={"Accept":"application/json"}
>
>  "pool_snaps": {
>   "pool_snap_info": {
> "stamp": "2013-11-25 12:40:02.365069",
> "snapid": 5,
> "name": "ericsnap2wq"
>   }
> },
>
> Do you also have the same problem?
>
>
>
> --
> Eric Mourgaya,
>
>
> Respectons la planete!
> Luttons contre la mediocrite!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] installing OS on software RAID

2013-11-25 Thread Gautam Saxena
We need to install the OS on the 3TB harddisks that come with our Dell
servers. (After many attempts, I've discovered that Dell servers won't
allow attaching an external harddisk via the PCIe slot. (I've tried
everything). )

But, must I therefore sacrifice two hard disks (RAID-1) for the OS?  I
don't see why I can't just create a small partition  (~30GB) on all 6 of my
hard disks, do a software-based RAID 1 on it, and be done.

I know that software based RAID-5 seems computationally expensive, but
shouldn't RAID 1 be fast and computationally inexpensive for a computer
built over the last 4 years? I wouldn't think that a Ceph system (with
lots of VMs but few data changes) would even do much writing to the OS
partition... but I'm not sure. (And in the past, I have noticed that RAID5
systems did suck up a lot of CPU and caused lots of waits, unlike what the
blogs implied. But I'm thinking that RAID 1 takes little CPU and the OS
does little writing to disk; it's mostly reads, which should hit the RAM.)

Does anyone see any holes in the above idea? Any gut instincts? (I would
try it, but it's hard to tell how well the system would really behave under
"real" load conditions without some degree of experience and/or strong
theoretical knowledge.)
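A sketch of the layout being described, purely for illustration (the device
names, the six-way mirror, and the filesystem are assumptions, not a
recommendation):

    # sda1..sdf1 are the ~30GB partitions set aside for the OS on each data disk
    mdadm --create /dev/md0 --level=1 --raid-devices=6 /dev/sd[a-f]1
    mkfs.ext4 /dev/md0

The rest of each disk would stay a plain partition for its OSD.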
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] installing OS on software RAID

2013-11-25 Thread James Harper
> 
> We need to install the OS on the 3TB harddisks that come with our Dell
> servers. (After many attempts, I've discovered that Dell servers won't allow
> attaching an external harddisk via the PCIe slot. (I've tried everything). )
> 
> But, must I therefore sacrifice two hard disks (RAID-1) for the OS?  I don't 
> see
> why I can't just create a small partition  (~30GB) on all 6 of my hard disks, 
> do a
> software-based RAID 1 on it, and be done.
> 
> I know that software based RAID-5 seems computationally expensive, but
> shouldn't RAID 1 be fast and computationally inexpensive for a computer
> built over the last 4 years? I wouldn't think that a CEPH systems (with lots 
> of
> VMs but little data changes) would even do much writing to the OS
> partitionbut I'm not sure. (And in the past, I have noticed that RAID5
> systems did suck up a lot of CPU and caused lots of waits, unlike what the
> blogs implied. But I'm thinking that a RAID 1 takes little CPU and the OS does
> little writing to disk; it's mostly reads, which should hit the RAM.)
> 
> Does anyone see any holes in the above idea? Any gut instincts? (I would try
> it, but it's hard to tell how well the system would really behave under "real"
> load conditions without some degree of experience and/or strong
> theoretical knowledge.)

Is the OS doing anything apart from ceph? Would booting a ramdisk-only system 
from USB or compact flash work?

If the OS doesn't produce a lot of writes then having it on the main disk 
should work okay. I've done it exactly as you describe before.

James

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN # requests are blocked > 32 sec

2013-11-25 Thread James Harper
> 
> Writes seem to be happing during the block but this is now getting more
> frequent and seems to be for longer periods.
> Looking at the osd logs for 3 and 8 there's nothing of relevance in there.
> 
> Any ideas on the next step?
> 

Look for iowait and other disk metrics:

iostat -x  1

high iowait with low throughput could indicate a disk problem. dmesg might show 
something unless you are using consumer disks that tend to mask errors from the 
OS.

get smartmontools and do smartctl -H on each disk and see if smart thinks the 
disk is okay, then smartctl -a on each disk and look for evidence of errors 
(uncorrectable etc)
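Something like the following loop can cover all the data disks in one pass (the
device list is only an example):

    for d in /dev/sd[a-f]; do
        echo "== $d"
        smartctl -H $d
        smartctl -a $d | egrep -i 'reallocated|pending|uncorrectable'
    done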

James

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CDS About to Begin

2013-11-25 Thread Patrick McGarry
Our first day of the online Ceph Developer Summit is about to begin.
Connection info is as follows:

IRC: irc.oftc.net  #ceph-summit
YouTube Stream: https://www.youtube.com/watch?v=DWK5RrNRhHU
G+ Event Page: 
https://plus.google.com/b/100228383599142686318/events/ca4mb81hi3j57nvs9lrcm988m4s

If you have questions I am 'scuttlemonkey' on irc, pmcgarry@gmail
(xmpp), or this email.


Best Regards,

Patrick McGarry
Director, Community || Inktank
http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] installing OS on software RAID

2013-11-25 Thread Kyle Bader
Several people have reported issues with combining OS and OSD journals
on the same SSD drives/RAID due to contention. If you do something
like this I would definitely test to make sure it meets your
expectations. Ceph logs are going to compose the majority of the
writes to the OS storage devices.

On Mon, Nov 25, 2013 at 12:46 PM, James Harper
 wrote:
>>
>> We need to install the OS on the 3TB harddisks that come with our Dell
>> servers. (After many attempts, I've discovered that Dell servers won't allow
>> attaching an external harddisk via the PCIe slot. (I've tried everything). )
>>
>> But, must I therefore sacrifice two hard disks (RAID-1) for the OS?  I don't 
>> see
>> why I can't just create a small partition  (~30GB) on all 6 of my hard 
>> disks, do a
>> software-based RAID 1 on it, and be done.
>>
>> I know that software based RAID-5 seems computationally expensive, but
>> shouldn't RAID 1 be fast and computationally inexpensive for a computer
>> built over the last 4 years? I wouldn't think that a CEPH systems (with lots 
>> of
>> VMs but little data changes) would even do much writing to the OS
>> partitionbut I'm not sure. (And in the past, I have noticed that RAID5
>> systems did suck up a lot of CPU and caused lots of waits, unlike what the
>> blogs implied. But I'm thinking that a RAID 1 takes little CPU and the OS 
>> does
>> little writing to disk; it's mostly reads, which should hit the RAM.)
>>
>> Does anyone see any holes in the above idea? Any gut instincts? (I would try
>> it, but it's hard to tell how well the system would really behave under 
>> "real"
>> load conditions without some degree of experience and/or strong
>> theoretical knowledge.)
>
> Is the OS doing anything apart from ceph? Would booting a ramdisk-only system 
> from USB or compact flash work?
>
> If the OS doesn't produce a lot of writes then having it on the main disk 
> should work okay. I've done it exactly as you describe before.
>
> James
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 

Kyle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CDS Day 1.2

2013-11-25 Thread Patrick McGarry
For those of you wishing to tune in to the second half of CDS day one,
please join us at:

https://www.youtube.com/watch?v=_kjjCAib_4E

The associated discussion is on irc.oftc.net on channel #ceph-summit

If you have questions please feel free to contact me.


Best Regards,

Patrick McGarry
Director, Community || Inktank
http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG state diagram

2013-11-25 Thread Mark Kirkwood
That's rather cool (very easy to change). However given that the current 
generated size is kinda a big thumbnail and too small to be actually 
read meaningfully, would it not make sense to generate a larger 
resolution version by default and make the current one a link to it?


Cheers

Mark

On 26/11/13 07:17, Gregory Farnum wrote:

It's generated from a .dot file which you can render as you like. :)
Please be aware that that diagram is for developers and will be
meaningless without that knowledge.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Nov 25, 2013 at 6:42 AM, Regola, Nathan (Contractor)
 wrote:

Is there a vector graphics file (or a higher resolution file of some type)
of the state diagram on the page below, as I can't read the text.

Thanks,
Nate


http://ceph.com/docs/master/dev/peering/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG state diagram

2013-11-25 Thread Dan Mick

Yes


On 11/25/2013 04:25 PM, Mark Kirkwood wrote:

That's rather cool (very easy to change). However given that the current
generated size is kinda a big thumbnail and too small to be actually
read meaningfully, would it not make sense to generate a larger
resolution version by default and make the current one a link to it?

Cheers

Mark

On 26/11/13 07:17, Gregory Farnum wrote:

It's generated from a .dot file which you can render as you like. :)
Please be aware that that diagram is for developers and will be
meaningless without that knowledge.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Nov 25, 2013 at 6:42 AM, Regola, Nathan (Contractor)
 wrote:

Is there a vector graphics file (or a higher resolution file of some
type)
of the state diagram on the page below, as I can't read the text.

Thanks,
Nate


http://ceph.com/docs/master/dev/peering/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.   http://inktank.com
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] meet up in shanghai? or user group in China?

2013-11-25 Thread jiangang duan
After talking with Sage, Ross, Patrick and Loic, I am thinking of building up
a Ceph user group in China - a place for Ceph developers and users to talk, learn and
have fun together - and to promote Ceph in China. Is anybody on the lists
interested in this? Please drop me a mail for further discussion.

I can arrange something in Shanghai (if you guys are OK with it, we can use a meeting
room in the Intel office, with snacks provided), or we can pick an industry
forum to gather at.

-jiangang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] meet up in shanghai? or user group in China?

2013-11-25 Thread Mark Nelson

That's great!  I will join you in spirit here in cold Minnesota. :)

Mark

On 11/25/2013 08:59 PM, jiangang duan wrote:

After talking with Sage, Ross, Patrick and Loic, I am thinking of building
up a Ceph user group in China - a place for Ceph developers and users to talk,
learn and have fun together - and to promote Ceph in China. Is anybody on the
lists interested in this? Please drop me a mail for further discussion.

I can arrange something in Shanghai (if you guys are OK with it, we can use a meeting
room in the Intel office, with snacks provided), or we can pick an industry
forum to gather at.

-jiangang


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: meet up in shanghai? or user group in China?

2013-11-25 Thread jiangang duan
Hi, after talking with Sage, Ross, Patrick and Loic, I am thinking of building
up a Ceph user group in China - a place for Ceph developers and users to talk, learn
and have fun together - and to promote Ceph in China. Is anybody on the lists
interested in this? Please drop me a mail to work on this together.

I can arrange an event in Shanghai (we may use a meeting room in the Intel
office, with snacks provided, or some tea bar), or we can pick an industry
forum to gather at.

-jiangang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to enable rbd cache

2013-11-25 Thread Shu, Xinxin
Hi Mike, I enabled the rbd admin socket according to your suggestions: I added the 
admin socket option to my ceph.conf, but there is no .asok file in the /var/run/ceph 
directory. I used nova to boot the instances. Below are my steps to enable the rbd 
admin socket; if there is something wrong, please let me know:

1: Add the rbd admin socket option to /etc/ceph/ceph.conf. Here is my ceph.conf on the client 
hosts:
[global]
log file = /var/log/ceph/$name.log
max open files = 131072
auth cluster required = none
auth service required = none
auth client required = none
rbd cache = true
debug perfcounter = 20
[client.volumes]
admin socket=/var/run/ceph/rbd-$pid.asok
[mon.a]
host = {monitor_host_name}
mon addr = {monitor_host_addr}

2: Modify /etc/apparmor.d/abstractions/libvirt-qemu, adding:
 # for rbd
 capability mknod,

 # for rbd
 /etc/ceph/ceph.conf r,
 /var/log/ceph/* rw,
 /var/run/ceph/** rw,

Then restart the libvirt-bin and nova-compute services

3: Recreate the nova instances and attach the rbd volume, then execute 'dd if=/dev/zero 
of=/dev/vdb bs=64k'. After that, check for the /var/run/ceph/rbd-$pid.asok socket, but 
it did not exist.

My ceph version is cuttlefish and OpenStack is Folsom. Does anything look weird 
to you? Please let me know.
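
One thing not ruled out yet is whether the user qemu runs as can create files
under /var/run/ceph at all. A rough check (the libvirt-qemu user name is a
guess and may differ on other distributions):

    # confirm the directory exists and note its owner and mode
    ls -ld /var/run/ceph

    # try to create a file there as the qemu user
    sudo -u libvirt-qemu touch /var/run/ceph/test.asok && echo ok

    # apparmor denials, if any, usually end up in syslog
    grep -i denied /var/log/syslog | grep -i ceph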
  
-Original Message-
From: Mike Dawson [mailto:mike.daw...@cloudapt.com] 
Sent: Tuesday, November 26, 2013 12:41 AM
To: Shu, Xinxin
Cc: Gregory Farnum; Mark Nelson; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] how to enable rbd cache

Greg is right, you need to enable RBD admin sockets. This can be a bit tricky 
though, so here are a few tips:

1) In ceph.conf on the compute node, explicitly set a location for the admin 
socket:

[client.volumes]
 admin socket = /var/run/ceph/rbd-$pid.asok

In this example, libvirt/qemu is running with permissions from 
ceph.client.volumes.keyring. If you use something different, adjust 
accordingly. You can put this under a more generic [client] section, but there 
are some downsides (like a new admin socket for each ceph cli command).

2) Watch for permissions issues creating the admin socket at the path you used 
above. For me, I needed to explicitly grant some permissions in 
/etc/apparmor.d/abstractions/libvirt-qemu, specifically I had to add:

   # for rbd
   capability mknod,

and

   # for rbd
   /etc/ceph/ceph.conf r,
   /var/log/ceph/* rw,
   /{,var/}run/ceph/** rw,
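
Note that editing the abstraction alone does nothing until the profiles that
include it are reloaded. A sketch, assuming an Ubuntu-style init (service
names may differ):

    sudo service apparmor reload
    sudo service libvirt-bin restart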

3) Be aware that if you have multiple rbd volumes attached to a single guest, 
you'll only get an admin socket for the volume attached last, since they all 
end up contending for the same $pid-based socket path. 
If you can set admin_socket via the libvirt xml for each volume, you can avoid 
this issue. This thread explains it better:

http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg16168.html
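
For reference, a rough sketch of what the per-volume option can look like when
passed through the source name; the pool/image names and socket path below are
placeholders, and the colon-separated key=value syntax is the same one used for
the rbd_cache options quoted further down in this message:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source protocol='rbd'
        name='rbd/volume-0001:rbd_cache=true:admin_socket=/var/run/ceph/rbd-volume-0001.asok'/>
      <target dev='vdb' bus='virtio'/>
    </disk>

Giving each volume its own admin_socket path is what keeps them from clobbering
each other.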

4) Once you get an RBD admin socket, query it like:

ceph --admin-daemon /var/run/ceph/rbd-29050.asok config show | grep rbd
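
The rbd cache counters themselves can be pulled from the same socket; a rough
example (the pid in the socket path will differ, and the exact counter-set
names vary by release, so grepping the pretty-printed output for 'cache' is the
safest bet):

    ceph --admin-daemon /var/run/ceph/rbd-29050.asok perf dump | python -mjson.tool | grep -i cache

Non-zero hit/miss counts there are a good sign the cache is actually being
exercised.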


Cheers,
Mike Dawson


On 11/25/2013 11:12 AM, Gregory Farnum wrote:
> On Mon, Nov 25, 2013 at 5:58 AM, Mark Nelson  wrote:
>> On 11/25/2013 07:21 AM, Shu, Xinxin wrote:
>>>
>>> Recently, I wanted to enable the rbd cache to measure the performance 
>>> benefit. I added the rbd_cache=true option to my ceph configuration file and 
>>> used 'virsh attach-device' to attach the rbd volume to the vm; below is my vdb xml file.
>>
>>
>> Ceph configuration files are a bit confusing because sometimes you'll 
>> see something like "rbd_cache" listed somewhere but in the ceph.conf 
>> file you'll want a space instead:
>>
>> rbd cache = true
>>
>> with no underscore.  That should (hopefully) fix it for you!
>
> I believe the config file will take either format.
>
> The RBD cache is a client-side thing, though, so it's never going 
> to show up in the OSD! You want to look at the admin socket created by 
> QEMU (via librbd) to see if it's working. :) -Greg
>
>>
>>>
>>> <disk type='network' device='disk'>
>>>   <source protocol='rbd'
>>>     name='rbd/node12_2:rbd_cache=true:rbd_cache_writethrough_until_flush=true'/>
>>>   <target dev='vdb' bus='virtio'/>
>>>   <serial>6b5ff6f4-9f8c-4fe0-84d6-9d795967c7dd</serial>
>>>   <address type='pci' function='0x0'/>
>>> </disk>
>>>
>>> I do not know if this is enough to enable the rbd cache. I can see perf 
>>> counters for the rbd cache in the source code, but when I used the admin 
>>> daemon to check the rbd cache statistics,
>>>
>>> ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
>>>
>>> But I did not get any rbd cache flags.
>>>
>>> My question is how to enable the rbd cache and check the rbd cache perf 
>>> counters, or how I can make sure the rbd cache is enabled; any tips will 
>>> be appreciated. Thanks in advance.
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.c