Re: [ceph-users] Perl Bindings for Ceph

2013-10-20 Thread Jon
Hello Michael,

1.  Perhaps I'm misunderstanding, but can Ceph present a SCSI interface?  I
don't understand how that would help with reducing the size of the RBD.

4.  Heh, tell me about it [3].  But based on that experience, it *seemed* like
I could read OK on the different nodes where the RBD was mounted.  Guess
there's only one way to find out.

Thanks for your feedback!

Best Regards,
Jon A

[3]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-May/001913.html


On Sun, Oct 20, 2013 at 10:26 PM, Michael Lowe wrote:

> 1. How about enabling trim/discard support in virtio-SCSI and using
> fstrim?  That might work for you.
>
> 4.  Well, you can mount them rw in multiple VMs with predictably bad
> results, so I don't see any reason why you could not specify ro as a mount
> option and do OK.
>
> Sent from my iPad
>
> On Oct 21, 2013, at 12:09 AM, Jon  wrote:
>
> Hello,
>
> Are there any current Perl modules for Ceph?  I found a thread [1] from
> 2011 with a version of Ceph::RADOS, but it only has functions to deal with
> pools, and the ->list_pools function causes a seg. fault.
>
> I'm interested in controlling Ceph via script / application and I was
> wondering [hoping] if anyone else had a current module before I go
> reinventing the wheel.  (My wheel would likely leverage calls to system()
> and use the rbd/rados/ceph functions directly initially...  I'm not
> proficient with C/XS)
>
> I've been primarily using OpenNebula, though I've evaluated OpenStack,
> CloudStack, and even Eucalyptus and they all seem to meet ($x-1)/$x
> criteria (one project seems to do one thing better than another, but they
> all are missing one feature that another project has--this is a
> generalization, but this isn't the OpenNebula mailing list).  What I'm
> looking to do at the moment is simplify my lab deployments.  My current
> workflow only takes 10 minutes or so to deploy a new vm:
>
> 1) dump xml of existing vm (usually the "base" vm that the template was
> created from, I actually have a "template" that I just copy and modify now)
> 2) clone rbd to new vm (usually using vmname)
> 3) edit vm template to reflect new values
>    -- change name of vm to the new vmname
>    -- remove specific identifiers (MAC, etc. unnecessary when copying
> "template")
>    -- update disk to reflect new rbd
> 4) login to console and "pre-provision" vm
>   -- update system
>   -- assign hostname
>   -- generate ssh-keys (I remove the sshd host keys when "sysprepping" for
> cloning; Ubuntu, I know for sure, doesn't regenerate the keys on boot, I
> _THINK_ RHEL might)
>
> I actually already did this work on automating deployments[2], but that
> was back when I was primarily using qcow2 images.  It leverages guestfish
> to do all of the vm "management" (setting IP, hostname, generating ssh host
> keys, etc).  But now I want to leverage my Ceph cluster for images.
>
> Couple of tangentially related questions that I don't think warrant a
> whole thread:
>
> 1) Is it possible to zero and compress rbds?  (I like to use virt-sysprep
> and virt-sparsify to prepare my images; when I was using qcow images,
> I would compress them before cloning)
> 2) Has anyone used virt-sysprep or virt-sparsify against rbd images?  I
> suppose if I'm creating a template image, I could create the qcow image
> then convert it to an rbd, but qemu-img creates format 1 images.
> 3) Anyone know of a way to create format 2 images with qemu-img?  When I
> specify -O rbd, qemu-img seg faults, and rbd2 is an invalid format.
> 4) Is it possible to mount an RBD to multiple VMs as read-only?  I'm
> thinking of read-only ISO images converted to rbds. (Is it even possible
> to convert an ISO to an rbd image?)
>
>
> Thanks for your help.
>
> Best Regards,
> Jon A
>
> [1]  http://www.spinics.net/lists/ceph-devel/msg04147.html
> [2]  https://github.com/three18ti/PrepVM-App
>


Re: [ceph-users] Perl Bindings for Ceph

2013-10-20 Thread Michael Lowe
1. How about enabling trim/discard support in virtio-SCSI and using fstrim?
That might work for you (rough sketch below).

4.  Well, you can mount them rw in multiple VMs with predictably bad results,
so I don't see any reason why you could not specify ro as a mount option and do
OK (also sketched below).
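
To be a bit more concrete, something along these lines is what I have in mind
(untested here; the image name, mount point, and qemu/libvirt details are just
illustrative, so adjust for your setup):

# attach the rbd-backed disk on a virtio-scsi controller with discard enabled
qemu-system-x86_64 ... \
  -device virtio-scsi-pci,id=scsi0 \
  -drive file=rbd:rbd/vm01,format=raw,if=none,id=drive0,cache=writeback,discard=unmap \
  -device scsi-hd,bus=scsi0.0,drive=drive0

# inside the guest: release unused filesystem blocks so the rbd objects shrink
fstrim -v /

# for the shared read-only case, mark the disk read-only on the host side
# (libvirt: a <readonly/> element in the disk definition) and/or mount it ro
mount -o ro /dev/sdb1 /mnt/shared

With libvirt the same thing should be a <controller type='scsi' model='virtio-scsi'/>
plus discard='unmap' on the disk's <driver> element, if I remember the XML right.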

Sent from my iPad

> On Oct 21, 2013, at 12:09 AM, Jon  wrote:
> 
> Hello,
> 
> Are there any current Perl modules for Ceph?  I found a thread [1] from 2011 
> with a version of Ceph::RADOS, but it only has functions to deal with pools, 
> and the ->list_pools function causes a seg. fault.
> 
> I'm interested in controlling Ceph via script / application and I was 
> wondering [hoping] if anyone else had a current module before I go 
> reinventing the wheel.  (My wheel would likely leverage calls to system() and 
> use the rbd/rados/ceph functions directly initially...  I'm not proficient 
> with C/XS)
> 
> I've been primarily using OpenNebula, though I've evaluated OpenStack, 
> CloudStack, and even Eucalyptus and they all seem to meet ($x-1)/$x criteria 
> (one project seems to do one thing better than another, but they all are 
> missing one feature that another project has--this is a generalization, but 
> this isn't the OpenNebula mailing list).  What I'm looking to do at the 
> moment is simplify my lab deployments.  My current workflow only takes 10 
> minutes or so to deploy a new vm:  
> 
> 1) dump xml of existing vm (usually the "base" vm that the template was 
> created from, I actually have a "template" that I just copy and modify now)
> 2) clone rbd to new vm (usually using vmname)
> 3) edit vm template to reflect new values
>    -- change name of vm to the new vmname
>    -- remove specific identifiers (MAC, etc. unnecessary when copying 
> "template")
>    -- update disk to reflect new rbd
> 4) login to console and "pre-provision" vm
>   -- update system
>   -- assign hostname
>   -- generate ssh-keys (I remove the sshd host keys when "sysprepping" for 
> cloning; Ubuntu, I know for sure, doesn't regenerate the keys on boot, I 
> _THINK_ RHEL might)
> 
> I actually already did this work on automating deployments[2], but that was 
> back when I was primarily using qcow2 images.  It leverages guestfish to do 
> all of the vm "management" (setting IP, hostname, generating ssh host keys, 
> etc).  But now I want to leverage my Ceph cluster for images.
> 
> Couple of tangentially related questions that I don't think warrant a whole 
> thread:
> 
> 1) Is it possible to zero and compress rbds?  (I like to use virt-sysprep and 
> virt-sparsify to prepare my images; when I was using qcow images, I 
> would compress them before cloning)
> 2) Has anyone used virt-sysprep or virt-sparsify against rbd images?  I suppose 
> if I'm creating a template image, I could create the qcow image then convert 
> it to an rbd, but qemu-img creates format 1 images.
> 3) Anyone know of a way to create format 2 images with qemu-img?  When I 
> specify -O rbd, qemu-img seg faults, and rbd2 is an invalid format.
> 4) Is it possible to mount an RBD to multiple VMs as read-only?  I'm thinking 
> of read-only ISO images converted to rbds. (Is it even possible to convert 
> an ISO to an rbd image?)
> 
> 
> Thanks for your help.
> 
> Best Regards,
> Jon A
> 
> [1]  http://www.spinics.net/lists/ceph-devel/msg04147.html
> [2]  https://github.com/three18ti/PrepVM-App


[ceph-users] Perl Bindings for Ceph

2013-10-20 Thread Jon
Hello,

Are there any current Perl modules for Ceph?  I found a thread [1] from
2011 with a version of Ceph::RADOS, but it only has functions to deal with
pools, and the ->list_pools function causes a seg. fault.

I'm interested in controlling Ceph via script / application and I was
wondering [hoping] if anyone else had a current module before I go
reinventing the wheel.  (My wheel would likely leverage calls to system()
and use the rbd/rados/ceph functions directly initially...  I'm not
proficient with C/XS)
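
(For what it's worth, this is the sort of thing the wrapper would be shelling
out to -- most of the CLI tools can emit JSON, which makes parsing from a script
much less painful.  Flags are from memory, so double-check against your version:)

ceph -s --format json          # cluster status as JSON
ceph osd dump --format json    # osd map, pools, flags, etc.
ceph osd lspools               # pool id/name pairs
rbd ls rbd                     # images in the "rbd" pool, one per line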

I've been primarily using OpenNebula, though I've evaluated OpenStack,
CloudStack, and even Eucalyptus and they all seem to meet ($x-1)/$x
criteria (one project seems to do one thing better than another, but they
all are missing one feature that another project has--this is a
generalization, but this isn't the OpenNebula mailing list).  What I'm
looking to do at the moment is simplify my lab deployments.  My current
workflow only takes 10 minutes or so to deploy a new VM (a rough command-level
sketch follows the list):

1) dump xml of existing vm (usually the "base" vm that the template was
created from, I actually have a "template" that I just copy and modify now)
2) clone rbd to new vm (usually using vmname)
3) edit vm template to reflect new values
   -- change name of vm to the new vmname
   -- remove specific identifiers (MAC, etc. unnecessary when copying
"template")
   -- update disk to reflect new rbd
4) login to console and "pre-provision" vm
  -- update system
  -- assign hostname
  -- generate ssh-keys (I remove the sshd host keys when "sysprepping" for
cloning; Ubuntu, I know for sure, doesn't regenerate the keys on boot, I
_THINK_ RHEL might)
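
Roughly, at the command level, it looks like this (names and paths here are
made up, the template snapshot has to be a protected format 2 snapshot, and the
sed bit is obviously fragile):

virsh dumpxml template-vm > vm01.xml               # 1) dump xml of the template vm
rbd clone rbd/template@gold rbd/vm01               # 2) clone the rbd for the new vm
sed -i 's/template-vm/vm01/g' vm01.xml             # 3) new vm name
sed -i 's#rbd/template#rbd/vm01#g' vm01.xml        #    point the disk at the new rbd
#   ...and delete the <uuid> and <mac> lines so libvirt regenerates them
virsh define vm01.xml && virsh start vm01
# 4) in the console: update, set the hostname, and regenerate ssh host keys,
#    e.g. with dpkg-reconfigure openssh-server or ssh-keygen -A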

I actually already did this work on automating deployments[2], but that was
back when I was primarily using qcow2 images.  It leverages guestfish to do
all of the vm "management" (setting IP, hostname, generating ssh host keys,
etc).  But now I want to leverage my Ceph cluster for images.

A couple of tangentially related questions that I don't think warrant a whole
thread (rough sketches of what I have in mind follow the list):

1) Is it possible to zero and compress rbds?  (I like to use virt-sysprep
and virt-sparsify to prepare my images; when I was using qcow images,
I would compress them before cloning)
2) Has anyone used virt-sysprep or virt-sparsify against rbd images?  I
suppose if I'm creating a template image, I could create the qcow image
then convert it to an rbd, but qemu-img creates format 1 images.
3) Anyone know of a way to create format 2 images with qemu-img?  When I
specify -O rbd, qemu-img seg faults, and rbd2 is an invalid format.
4) Is it possible to mount an RBD to multiple VMs as read-only?  I'm
thinking of read-only ISO images converted to rbds. (Is it even possible
to convert an ISO to an rbd image?)
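
For 2), 3) and 4), what I have in mind is roughly the following -- untested
against my cluster so far, and I'm assuming rbd import accepts --image-format
(on older releases it may be spelled --format):

# sysprep/sparsify the image locally, then import it as a format 2 rbd
virt-sysprep -a base.qcow2
virt-sparsify base.qcow2 base-sparse.qcow2
qemu-img convert -O raw base-sparse.qcow2 base.raw
rbd import --image-format 2 base.raw rbd/base
rbd snap create rbd/base@gold && rbd snap protect rbd/base@gold   # ready for cloning

# an iso is already a raw block image, so importing it should work the same way
rbd import ubuntu-13.04.iso rbd/ubuntu-13.04-iso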


Thanks for your help.

Best Regards,
Jon A

[1]  http://www.spinics.net/lists/ceph-devel/msg04147.html
[2]  https://github.com/three18ti/PrepVM-App


[ceph-users] Radosgw usage

2013-10-20 Thread Derek Yarnell
So I have tried to enable usage logging on a new production Ceph RadosGW
cluster but nothing seems to show up.

I have added the following to the [client.radosgw.] section:

rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1

Restarted the radosgw, but I don't see anything in the logs (running at
debug 20).

# radosgw-admin usage show --uid=derek --bucket=derek
{ "entries": [],
  "summary": []}

Is there something more I can poke to figure out why the gateway is not
logging?
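
For reference, here is roughly what I was planning to poke at next (the socket
path and pool name are from memory and depend on how the gateway was deployed):

# confirm the running gateway actually picked up the usage settings
ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config show \
    | grep -i usage

# push a few requests through the gateway, wait past the tick interval,
# then ask again with log entries included
s3cmd put testfile s3://derek/testfile
radosgw-admin usage show --uid=derek --show-log-entries=true

# the usage log is kept as objects in the usage pool
rados lspools | grep usage
rados -p .usage ls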

Thanks,
derek

-- 
---
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-20 Thread Sage Weil
On Sun, 20 Oct 2013, Ugis wrote:
> >> output follows:
> >> #pvs -o pe_start /dev/rbd1p1
> >>   1st PE
> >> 4.00m
> >> # cat /sys/block/rbd1/queue/minimum_io_size
> >> 4194304
> >> # cat /sys/block/rbd1/queue/optimal_io_size
> >> 4194304
> >
> > Well, the parameters are being set at least.  Mike, is it possible that
> > having minimum_io_size set to 4m is causing some read amplification
> > in LVM, translating a small read into a complete fetch of the PE (or
> > something along those lines)?
> >
> > Ugis, if your cluster is on the small side, it might be interesting to see
> > what requests the client is generating in the LVM and non-LVM case by
> > setting 'debug ms = 1' on the osds (e.g., ceph tell osd.* injectargs
> > '--debug-ms 1') and then looking at the osd_op messages that appear in
> > /var/log/ceph/ceph-osd*.log.  It may be obvious that the IO pattern is
> > different.
> >
> Sage, here follows the debug output. I am no pro at reading this, but it
> seems the read block sizes differ (or what is that number following the ~
> sign)?

Yep, it's offset~length.  

It looks like without LVM we're getting 128KB requests (which IIRC is 
typical), but with LVM it's only 4KB.  Unfortunately my memory is a bit 
fuzzy here, but I seem to recall a property on the request_queue or device 
that affected this.  RBD is currently doing

segment_size = rbd_obj_bytes(&rbd_dev->header);
blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);
blk_queue_max_segment_size(q, segment_size);
blk_queue_io_min(q, segment_size);
blk_queue_io_opt(q, segment_size);

where segment_size is 4MB (so, much more than 128KB); maybe it has 
something to do with how many smaller IOs get coalesced into larger requests?
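
One quick way to see where the merging stops might be to compare the average
request size at each layer while the workload is running, e.g. (substitute the
dm-N device backing the LV):

iostat -x 1 /dev/rbd1 /dev/dm-0               # avgrq-sz is in 512-byte sectors: ~256 = 128KB, ~8 = 4KB
blktrace -d /dev/rbd1 -o - | blkparse -i -    # or watch the individual requests hitting rbd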

In any case, something appears to be lost due to the pass through LVM, but 
I'm not very familiar with the block layer code at all...  :/

sage


> 
> OSD.2 read with LVM:
> 2013-10-20 16:59:05.307159 7f95acfa5700  1 -- x.x.x.x:6804/1944 -->
> x.x.x.y:0/269199468 -- osd_op_reply(176566434
> rbd_data.3ad974b0dc51.7cef [read 4083712~4096] ondisk = 0)
> v4 -- ?+0 0xdc35c00 con 0xd9e4840
> 2013-10-20 16:59:05.307655 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
> client.38069 x.x.x.y:0/269199468 5548 
> osd_op(client.38069.1:176566435 rbd_data.3ad974b0dc51.7cef
> [read 4087808~4096] 4.5672f053 e6870) v4  177+0+0 (1554835253 0 0)
> 0x12593d80 con 0xd9e4840
> 2013-10-20 16:59:05.307824 7f95ac7a4700  1 -- x.x.x.x:6804/1944 -->
> x.x.x.y:0/269199468 -- osd_op_reply(176566435
> rbd_data.3ad974b0dc51.7cef [read 4087808~4096] ondisk = 0)
> v4 -- ?+0 0xe24fc00 con 0xd9e4840
> 2013-10-20 16:59:05.308316 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
> client.38069 x.x.x.y:0/269199468 5549 
> osd_op(client.38069.1:176566436 rbd_data.3ad974b0dc51.7cef
> [read 4091904~4096] 4.5672f053 e6870) v4  177+0+0 (3467296840 0 0)
> 0xe28f6c0 con 0xd9e4840
> 2013-10-20 16:59:05.308499 7f95acfa5700  1 -- x.x.x.x:6804/1944 -->
> x.x.x.y:0/269199468 -- osd_op_reply(176566436
> rbd_data.3ad974b0dc51.7cef [read 4091904~4096] ondisk = 0)
> v4 -- ?+0 0xdc35a00 con 0xd9e4840
> 2013-10-20 16:59:05.308985 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
> client.38069 x.x.x.y:0/269199468 5550 
> osd_op(client.38069.1:176566437 rbd_data.3ad974b0dc51.7cef
> [read 4096000~4096] 4.5672f053 e6870) v4  177+0+0 (3104591620 0 0)
> 0xe0b46c0 con 0xd9e4840
> 
> OSD.2 read without LVM
> 2013-10-20 17:03:13.730881 7f95ac7a4700  1 -- x.x.x.x:6804/1944 -->
> x.x.x.y:0/269199468 -- osd_op_reply(176708854
> rb.0.967b.238e1f29.0071 [read 2359296~131072] ondisk = 0) v4
> -- ?+0 0x1019d200 con 0xd9e4840
> 2013-10-20 17:03:13.731318 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
> client.38069 x.x.x.y:0/269199468 18232 
> osd_op(client.38069.1:176708855 rb.0.967b.238e1f29.0071 [read
> 2490368~131072] 4.c0d1e4cb e6870) v4  170+0+0 (1987168552 0 0)
> 0x171a7480 con 0xd9e4840
> 2013-10-20 17:03:13.731664 7f95acfa5700  1 -- x.x.x.x:6804/1944 -->
> x.x.x.y:0/269199468 -- osd_op_reply(176708855
> rb.0.967b.238e1f29.0071 [read 2490368~131072] ondisk = 0) v4
> -- ?+0 0x12b81200 con 0xd9e4840
> 2013-10-20 17:03:13.733112 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
> client.38069 x.x.x.y:0/269199468 18233 
> osd_op(client.38069.1:176708856 rb.0.967b.238e1f29.0071 [read
> 2621440~131072] 4.c0d1e4cb e6870) v4  170+0+0 (527551382 0 0)
> 0x12593d80 con 0xd9e4840
> 2013-10-20 17:03:13.733393 7f95ac7a4700  1 -- x.x.x.x:6804/1944 -->
> x.x.x.y:0/269199468 -- osd_op_reply(176708856
> rb.0.967b.238e1f29.0071 [read 2621440~131072] ondisk = 0) v4
> -- ?+0 0xeba9000 con 0xd9e4840
> 2013-10-20 17:03:13.733741 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
> client.38069 x.x.x.y:0/269199468 18234 
> osd_op(client.38069.1:176708857 rb.0.967b.238e1f29.0071 [read
> 2752512~131072] 4.c0d1e4cb e6870) v4  170+0+0 (178955972 0 0)
> 0xe0b4d80 con 0xd9e4840
>

Re: [ceph-users] health HEALTH_WARN 1 is laggy

2013-10-20 Thread Yan, Zheng
On Mon, Oct 21, 2013 at 9:50 AM, 鹏  wrote:
>
> hi all,
>   Today my ceph cluster has something wrong!  One of my MDSs is laggy!
>
>    # ceph -s
>    health HEALTH_WARN 1 is laggy
>
>    I restarted it:
>
>   # service ceph -a restart mds.1
>
>   It is OK at first, but only for a few minutes!

Looks like mds.1 crashed. What does the mds log say?

>
>    # ceph -s
>    health HEALTH_WARN 0 is laggy
>
>   The 2 MDSs cannot work at the same time!  One is healthy, but the other
> is laggy!
>
>    How can I solve this problem?
>
>  By the way, I found that when I typed:
>  # ceph osd dump
>  ...
>  max_mds 1
>  ...
>
>  Is the field max_mds the cause of this problem?  In my ceph.conf, max_mds,
>  whose value is 2, is in the [global] section!
>

A multiple-MDS setup is not stable yet; please keep 'max_mds = 1'.
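
Roughly (from memory, so double-check the exact syntax for your release): remove
the max_mds = 2 line from the [global] section of ceph.conf, then

ceph mds set_max_mds 1          # cap the cluster at a single active mds
ceph mds dump | grep max_mds    # verify the mdsmap
# the second mds daemon can keep running; it will simply act as a standby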

Regards
Yan, Zheng


[ceph-users] health HEALTH_WARN 1 is laggy

2013-10-20 Thread
 
hi all,
  Today my ceph cluster has something wrong!  One of my MDSs is laggy!

   # ceph -s
   health HEALTH_WARN 1 is laggy

   I restarted it:

  # service ceph -a restart mds.1

  It is OK at first, but only for a few minutes!

   # ceph -s
   health HEALTH_WARN 0 is laggy

  The 2 MDSs cannot work at the same time!  One is healthy, but the other is
laggy!

   How can I solve this problem?

 By the way, I found that when I typed:
 # ceph osd dump
 ...
   max_mds 1
   ...

 Is the field max_mds the cause of this problem?  In my ceph.conf, max_mds,
whose value is 2, is in the [global] section!

   Thanks,
peng







Re: [ceph-users] Cuttlefish: pool recreation results in cluster crash

2013-10-20 Thread Sage Weil
Moved to ceph-devel, and opened http://tracker.ceph.com/issues/6598

Have you tried to reproduce this on dumpling or later?
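
e.g., something along these lines (pool name and pg count are arbitrary; on
dumpling the delete wants the pool name twice plus the confirmation flag, IIRC):

ceph osd pool create crashtest 128
ceph -s                          # wait until the new pool's pgs are active+clean
ceph osd pool delete crashtest crashtest --yes-i-really-really-mean-it
sleep 10
ceph osd pool create crashtest 128
ceph osd tree                    # see whether any osds have gone down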

Thanks!
sage


On Sat, 19 Oct 2013, Andrey Korolyov wrote:

> Hello,
> 
> I was able to reproduce the following on top of current cuttlefish:
> 
> - create pool,
> - delete it after all pgs initialized,
> - create new pool with same name after, say, ten seconds.
> 
> All OSDs die immediately with the attached trace. The problem exists in
> bobtail as well.
> 


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-20 Thread Josh Durgin

On 10/20/2013 08:18 AM, Ugis wrote:

>>> output follows:
>>> #pvs -o pe_start /dev/rbd1p1
>>>    1st PE
>>>  4.00m
>>> # cat /sys/block/rbd1/queue/minimum_io_size
>>> 4194304
>>> # cat /sys/block/rbd1/queue/optimal_io_size
>>> 4194304
>>
>> Well, the parameters are being set at least.  Mike, is it possible that
>> having minimum_io_size set to 4m is causing some read amplification
>> in LVM, translating a small read into a complete fetch of the PE (or
>> something along those lines)?
>>
>> Ugis, if your cluster is on the small side, it might be interesting to see
>> what requests the client is generating in the LVM and non-LVM case by
>> setting 'debug ms = 1' on the osds (e.g., ceph tell osd.* injectargs
>> '--debug-ms 1') and then looking at the osd_op messages that appear in
>> /var/log/ceph/ceph-osd*.log.  It may be obvious that the IO pattern is
>> different.
>
> Sage, here follows the debug output. I am no pro at reading this, but it
> seems the read block sizes differ (or what is that number following the ~
> sign)?


Yes, that's the I/O length. LVM is sending requests for 4k at a time,
while plain kernel rbd is sending 128k.




> How to proceed with tuning read performance on LVM? Is there some
> change needed in the code of Ceph/LVM, or does my config need to be tuned?
> If what is shown in the logs means a 4k read block in the LVM case, then it
> seems I need to tell LVM (or does xfs on top of LVM dictate the read block
> size?) that the IO block should rather be 4m?


It's a client-side issue of sending much smaller requests than it needs
to. Check the queue minimum and optimal sizes for the LVM device; it
sounds like they might be getting set to 4k for some reason.
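
Something like this should show what the LVM device ended up with (the dm name
and VG/LV names will differ on your system):

dmsetup ls                       # find the dm-N device backing the LV
cat /sys/block/rbd1/queue/minimum_io_size /sys/block/rbd1/queue/optimal_io_size
cat /sys/block/dm-0/queue/minimum_io_size /sys/block/dm-0/queue/optimal_io_size
blockdev --getiomin --getioopt /dev/mapper/vg0-lv0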

Josh


Re: [ceph-users] poor read performance on rbd+LVM, LVM overload

2013-10-20 Thread Ugis
>> output follows:
>> #pvs -o pe_start /dev/rbd1p1
>>   1st PE
>> 4.00m
>> # cat /sys/block/rbd1/queue/minimum_io_size
>> 4194304
>> # cat /sys/block/rbd1/queue/optimal_io_size
>> 4194304
>
> Well, the parameters are being set at least.  Mike, is it possible that
> having minimum_io_size set to 4m is causing some read amplification
> in LVM, translating a small read into a complete fetch of the PE (or
> something along those lines)?
>
> Ugis, if your cluster is on the small side, it might be interesting to see
> what requests the client is generating in the LVM and non-LVM case by
> setting 'debug ms = 1' on the osds (e.g., ceph tell osd.* injectargs
> '--debug-ms 1') and then looking at the osd_op messages that appear in
> /var/log/ceph/ceph-osd*.log.  It may be obvious that the IO pattern is
> different.
>
Sage, here follows the debug output. I am no pro at reading this, but it
seems the read block sizes differ (or what is that number following the ~ sign)?

OSD.2 read with LVM:
2013-10-20 16:59:05.307159 7f95acfa5700  1 -- x.x.x.x:6804/1944 -->
x.x.x.y:0/269199468 -- osd_op_reply(176566434
rbd_data.3ad974b0dc51.7cef [read 4083712~4096] ondisk = 0)
v4 -- ?+0 0xdc35c00 con 0xd9e4840
2013-10-20 16:59:05.307655 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
client.38069 x.x.x.y:0/269199468 5548 
osd_op(client.38069.1:176566435 rbd_data.3ad974b0dc51.7cef
[read 4087808~4096] 4.5672f053 e6870) v4  177+0+0 (1554835253 0 0)
0x12593d80 con 0xd9e4840
2013-10-20 16:59:05.307824 7f95ac7a4700  1 -- x.x.x.x:6804/1944 -->
x.x.x.y:0/269199468 -- osd_op_reply(176566435
rbd_data.3ad974b0dc51.7cef [read 4087808~4096] ondisk = 0)
v4 -- ?+0 0xe24fc00 con 0xd9e4840
2013-10-20 16:59:05.308316 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
client.38069 x.x.x.y:0/269199468 5549 
osd_op(client.38069.1:176566436 rbd_data.3ad974b0dc51.7cef
[read 4091904~4096] 4.5672f053 e6870) v4  177+0+0 (3467296840 0 0)
0xe28f6c0 con 0xd9e4840
2013-10-20 16:59:05.308499 7f95acfa5700  1 -- x.x.x.x:6804/1944 -->
x.x.x.y:0/269199468 -- osd_op_reply(176566436
rbd_data.3ad974b0dc51.7cef [read 4091904~4096] ondisk = 0)
v4 -- ?+0 0xdc35a00 con 0xd9e4840
2013-10-20 16:59:05.308985 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
client.38069 x.x.x.y:0/269199468 5550 
osd_op(client.38069.1:176566437 rbd_data.3ad974b0dc51.7cef
[read 4096000~4096] 4.5672f053 e6870) v4  177+0+0 (3104591620 0 0)
0xe0b46c0 con 0xd9e4840

OSD.2 read without LVM
2013-10-20 17:03:13.730881 7f95ac7a4700  1 -- x.x.x.x:6804/1944 -->
x.x.x.y:0/269199468 -- osd_op_reply(176708854
rb.0.967b.238e1f29.0071 [read 2359296~131072] ondisk = 0) v4
-- ?+0 0x1019d200 con 0xd9e4840
2013-10-20 17:03:13.731318 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
client.38069 x.x.x.y:0/269199468 18232 
osd_op(client.38069.1:176708855 rb.0.967b.238e1f29.0071 [read
2490368~131072] 4.c0d1e4cb e6870) v4  170+0+0 (1987168552 0 0)
0x171a7480 con 0xd9e4840
2013-10-20 17:03:13.731664 7f95acfa5700  1 -- x.x.x.x:6804/1944 -->
x.x.x.y:0/269199468 -- osd_op_reply(176708855
rb.0.967b.238e1f29.0071 [read 2490368~131072] ondisk = 0) v4
-- ?+0 0x12b81200 con 0xd9e4840
2013-10-20 17:03:13.733112 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
client.38069 x.x.x.y:0/269199468 18233 
osd_op(client.38069.1:176708856 rb.0.967b.238e1f29.0071 [read
2621440~131072] 4.c0d1e4cb e6870) v4  170+0+0 (527551382 0 0)
0x12593d80 con 0xd9e4840
2013-10-20 17:03:13.733393 7f95ac7a4700  1 -- x.x.x.x:6804/1944 -->
x.x.x.y:0/269199468 -- osd_op_reply(176708856
rb.0.967b.238e1f29.0071 [read 2621440~131072] ondisk = 0) v4
-- ?+0 0xeba9000 con 0xd9e4840
2013-10-20 17:03:13.733741 7f95b27b0700  1 -- x.x.x.x:6804/1944 <==
client.38069 x.x.x.y:0/269199468 18234 
osd_op(client.38069.1:176708857 rb.0.967b.238e1f29.0071 [read
2752512~131072] 4.c0d1e4cb e6870) v4  170+0+0 (178955972 0 0)
0xe0b4d80 con 0xd9e4840

How to proceed with tuning read performance on LVM? Is there some
change needed in the code of Ceph/LVM, or does my config need to be tuned?
If what is shown in the logs means a 4k read block in the LVM case, then it
seems I need to tell LVM (or does xfs on top of LVM dictate the read block
size?) that the IO block should rather be 4m?

Ugis