Re: [ceph-users] Broken RPMs for CentOS 6 / RHEL 6

2013-05-03 Thread Gary Lowell
There were a couple of other issues with the v0.56.5 build.  It's been pulled and
replaced with a v0.56.6 build, which just adds the few bits that were missing.

Cheers,
Gary

On May 3, 2013, at 5:12 PM, Travis Austin wrote:

> Gary,
> 
> Any update on the missing RPM packages? I'm assuming it's the same issue with 
> the missing DEB packages for Ubuntu. I'm getting a 404 error at this URL:
> 
> http://ceph.com/debian/dists/precise/main/binary-amd64/Packages
> 
> Thanks,
> Travis
> 
> 
> Travis Austin
> Rezitech, Inc.
> 625 S Palm Street
> La Habra, CA 90631
> 
> Direct: 714-784-0334
> Main: 877-407-2000
> Fax: 866-881-0148
> Web: www.rezitech.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[ceph-users] Is it possible to make Ceph the virtualization layer for all my storage backends?

2013-05-03 Thread ????
and provide a unified interface to all my applications?
What impacts to throughput/IOPs/latency?
Thanks!


Re: [ceph-users] Broken RPMs for CentOS 6 / RHEL 6

2013-05-03 Thread Travis Austin
Gary,

Any update on the missing RPM packages? I'm assuming it's the same issue with 
the missing DEB packages for Ubuntu. I'm getting a 404 error at this URL:

http://ceph.com/debian/dists/precise/main/binary-amd64/Packages

Thanks,
Travis



Travis Austin
Rezitech, Inc.
625 S Palm Street
La Habra, CA 90631

Direct: 714-784-0334
Main: 877-407-2000
Fax: 866-881-0148
Web: www.rezitech.com



Re: [ceph-users] v0.56.5 released

2013-05-03 Thread Josh Durgin

On 05/03/2013 01:48 PM, Jens Kristian Søgaard wrote:

> Hi,
>
>>  * librbd: new async flush method to resolve qemu hangs (requires Qemu
>>    update as well)
>
> I'm very interested in this update, as it has held our system back.
> Which version of qemu is needed?

It's not in a release yet.

> The release notes for qemu 1.4 don't seem to mention this, so perhaps
> it is not yet in the newest version? If so, could someone please
> direct me to a patch?

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=dc7588c1eb3008bda53dde1d6b890cd299758155



Re: [ceph-users] v0.56.5 released

2013-05-03 Thread Jens Kristian Søgaard

Hi,

>  * librbd: new async flush method to resolve qemu hangs (requires Qemu
>    update as well)

I'm very interested in this update, as it has held our system back.
Which version of qemu is needed?

The release notes for qemu 1.4 don't seem to mention this, so perhaps
it is not yet in the newest version? If so, could someone please
direct me to a patch?


Thanks!

--
Jens Kristian Søgaard, Mermaid Consulting ApS,
j...@mermaidconsulting.dk,
http://www.mermaidconsulting.com/


Re: [ceph-users] v0.56.5 released --- not yet!

2013-05-03 Thread Sage Weil
On Fri, 3 May 2013, John Nielsen wrote:
> Will you please also bump the package revision number to e.g. 0.56.5-1.el6? 
> That will make life easier for those of us who eagerly installed the update 
> already by letting package managers do the right thing.

We will; it'll be v0.56.6.  :)

sage
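Sage's point about shipping the fix as v0.56.6 works because package managers order the new build after the one already installed. A minimal sketch of that ordering, using GNU `sort -V` as a rough stand-in for rpm's version comparison (the package strings below are illustrative, not the exact NVRs from the repo):

```shell
# Illustrative version strings; sort -V approximates rpmvercmp closely
# enough here. The highest version sorts last, so yum treats 0.56.6 as
# an upgrade over an already-installed 0.56.5 without a release bump.
printf '%s\n' 'ceph-0.56.5-0.el6' 'ceph-0.56.6-0.el6' | sort -V | tail -n1
```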

> 
> On May 3, 2013, at 1:26 PM, Sage Weil  wrote:
> 
> > We just discovered a problem with the sysvinit script and with mkcephfs. A 
> > fix is building now, but in the meantime, hold off on this.  We've renamed 
> > the repositories temporarily until a non-broken package is posted.
> > 
> > Thanks!
> > sage
> > 
> > 
> > On Fri, 3 May 2013, Sage Weil wrote:
> > 
> >> Behold, another Bobtail update!  This one serves three main purposes: it 
> >> fixes a small issue with monitor features that is important when upgrading 
> >> from argonaut -> bobtail -> cuttlefish, it backports many changes to the 
> >> ceph-disk helper scripts that allow bobtail clusters to be deployed with 
> >> the new ceph-deploy tool or our chef cookbooks, and it fixes several 
> >> important bugs in librbd.  There is also, of course, the usual collection 
> >> of important bug fixes in other parts of the system.
> >> 
> >> Notable changes include:
> >> 
> >> * mon: fix recording of quorum feature set (important for argonaut -> 
> >>   bobtail -> cuttlefish mon upgrades)
> >> * osd: minor peering bug fixes
> >> * osd: fix a few bugs when pools are renamed
> >> * osd: fix occasionally corrupted pg stats
> >> * osd: fix behavior when broken v0.56[.0] clients connect
> >> * rbd: avoid FIEMAP ioctl on import (it is broken on some kernels)
> >> * librbd: fixes for several request/reply ordering bugs
> >> * librbd: only set STRIPINGV2 feature on new images when needed
> >> * librbd: new async flush method to resolve qemu hangs (requires Qemu 
> >>   update as well)
> >> * librbd: a few fixes to flatten
> >> * ceph-disk: support for dm-crypt
> >> * ceph-disk: many backports to allow bobtail deployments with 
> >>   ceph-deploy, chef
> >> * sysvinit: do not stop starting daemons on first failure
> >> * udev: fixed rules for redhat-based distros
> >> * build fixes for raring
> >> 
> >> For more detailed information, see the complete changelog.
> >> 
> >> You can get v0.56.5 from the usual places:
> >> 
> >> * Git at git://github.com/ceph/ceph.git
> >> * Tarball at http://ceph.com/download/ceph-0.56.5.tar.gz
> >> * For Debian/Ubuntu packages, see 
> >> http://ceph.com/docs/master/install/debian
> >> * For RPMs, see http://ceph.com/docs/master/install/rpm
> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> 
> >> 
> > 
> 
> 


Re: [ceph-users] v0.56.5 released --- not yet!

2013-05-03 Thread John Nielsen
Will you please also bump the package revision number to e.g. 0.56.5-1.el6? 
That will make life easier for those of us who eagerly installed the update 
already by letting package managers do the right thing.

On May 3, 2013, at 1:26 PM, Sage Weil  wrote:

> We just discovered a problem with the sysvinit script and with mkcephfs. A 
> fix is building now, but in the meantime, hold off on this.  We've renamed 
> the repositories temporarily until a non-broken package is posted.
> 
> Thanks!
> sage
> 
> 
> On Fri, 3 May 2013, Sage Weil wrote:
> 
>> Behold, another Bobtail update!  This one serves three main purposes: it 
>> fixes a small issue with monitor features that is important when upgrading 
>> from argonaut -> bobtail -> cuttlefish, it backports many changes to the 
>> ceph-disk helper scripts that allow bobtail clusters to be deployed with 
>> the new ceph-deploy tool or our chef cookbooks, and it fixes several 
>> important bugs in librbd.  There is also, of course, the usual collection 
>> of important bug fixes in other parts of the system.
>> 
>> Notable changes include:
>> 
>> * mon: fix recording of quorum feature set (important for argonaut -> 
>>   bobtail -> cuttlefish mon upgrades)
>> * osd: minor peering bug fixes
>> * osd: fix a few bugs when pools are renamed
>> * osd: fix occasionally corrupted pg stats
>> * osd: fix behavior when broken v0.56[.0] clients connect
>> * rbd: avoid FIEMAP ioctl on import (it is broken on some kernels)
>> * librbd: fixes for several request/reply ordering bugs
>> * librbd: only set STRIPINGV2 feature on new images when needed
>> * librbd: new async flush method to resolve qemu hangs (requires Qemu 
>>   update as well)
>> * librbd: a few fixes to flatten
>> * ceph-disk: support for dm-crypt
>> * ceph-disk: many backports to allow bobtail deployments with 
>>   ceph-deploy, chef
>> * sysvinit: do not stop starting daemons on first failure
>> * udev: fixed rules for redhat-based distros
>> * build fixes for raring
>> 
>> For more detailed information, see the complete changelog.
>> 
>> You can get v0.56.5 from the usual places:
>> 
>> * Git at git://github.com/ceph/ceph.git
>> * Tarball at http://ceph.com/download/ceph-0.56.5.tar.gz
>> * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
>> * For RPMs, see http://ceph.com/docs/master/install/rpm
>> 
>> 
>> 
> 



Re: [ceph-users] v0.56.5 released --- not yet!

2013-05-03 Thread Sage Weil
We just discovered a problem with the sysvinit script and with mkcephfs. A 
fix is building now, but in the meantime, hold off on this.  We've renamed 
the repositories temporarily until a non-broken package is posted.

Thanks!
sage


On Fri, 3 May 2013, Sage Weil wrote:

> Behold, another Bobtail update!  This one serves three main purposes: it 
> fixes a small issue with monitor features that is important when upgrading 
> from argonaut -> bobtail -> cuttlefish, it backports many changes to the 
> ceph-disk helper scripts that allow bobtail clusters to be deployed with 
> the new ceph-deploy tool or our chef cookbooks, and it fixes several 
> important bugs in librbd.  There is also, of course, the usual collection 
> of important bug fixes in other parts of the system.
> 
> Notable changes include:
> 
>  * mon: fix recording of quorum feature set (important for argonaut -> 
>bobtail -> cuttlefish mon upgrades)
>  * osd: minor peering bug fixes
>  * osd: fix a few bugs when pools are renamed
>  * osd: fix occasionally corrupted pg stats
>  * osd: fix behavior when broken v0.56[.0] clients connect
>  * rbd: avoid FIEMAP ioctl on import (it is broken on some kernels)
>  * librbd: fixes for several request/reply ordering bugs
>  * librbd: only set STRIPINGV2 feature on new images when needed
>  * librbd: new async flush method to resolve qemu hangs (requires Qemu 
>update as well)
>  * librbd: a few fixes to flatten
>  * ceph-disk: support for dm-crypt
>  * ceph-disk: many backports to allow bobtail deployments with 
>ceph-deploy, chef
>  * sysvinit: do not stop starting daemons on first failure
>  * udev: fixed rules for redhat-based distros
>  * build fixes for raring
> 
> For more detailed information, see the complete changelog.
> 
> You can get v0.56.5 from the usual places:
> 
>  * Git at git://github.com/ceph/ceph.git
>  * Tarball at http://ceph.com/download/ceph-0.56.5.tar.gz
>  * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
>  * For RPMs, see http://ceph.com/docs/master/install/rpm
> 
> 
> 


Re: [ceph-users] Broken RPMs for CentOS 6 / RHEL 6

2013-05-03 Thread Gary Lowell
Hi Ricardo -

The rpm build picked up the gptfdisk Requires from the SLES spec due to a merge error.  
I'm re-spinning the rpms to correct that.  I'll send an email when they have 
been pushed out.

Cheers,
Gary

On May 3, 2013, at 10:34 AM, Ricardo J. Barberis wrote:

> Hi, I'm trying to install ceph 0.56 from the repo, following instructions in 
> http://ceph.com/docs/master/install/rpm/
> 
> The installation fails with:
> 
> Error: Package: ceph-0.56.5-0.el6.x86_64 (ceph)
>   Requires: gptfdisk
> 
> I have searched EPEL, ElRepo, RepoForge (ex-RPMForge) but I can't find 
> gptfdisk anywhere for CentOS 6.4.
> 
> There is a gdisk in EPEL but it won't satisfy ceph's dependencies.
> 
> BTW, ceph-0.56.4-0.el6.x86_64 installed correctly a few weeks ago, so I'm 
> guessing it's a recent change in ceph's spec file.
> 
> I checked on github and the spec seems fine, maybe I just have to wait for new
> packages to be compiled, but I thought I could let you know, just in case :)
> 
> Cheers,
> -- 
> Ricardo J. Barberis
> 



[ceph-users] v0.56.5 released

2013-05-03 Thread Sage Weil
Behold, another Bobtail update!  This one serves three main purposes: it 
fixes a small issue with monitor features that is important when upgrading 
from argonaut -> bobtail -> cuttlefish, it backports many changes to the 
ceph-disk helper scripts that allow bobtail clusters to be deployed with 
the new ceph-deploy tool or our chef cookbooks, and it fixes several 
important bugs in librbd.  There is also, of course, the usual collection 
of important bug fixes in other parts of the system.

Notable changes include:

 * mon: fix recording of quorum feature set (important for argonaut -> 
   bobtail -> cuttlefish mon upgrades)
 * osd: minor peering bug fixes
 * osd: fix a few bugs when pools are renamed
 * osd: fix occasionally corrupted pg stats
 * osd: fix behavior when broken v0.56[.0] clients connect
 * rbd: avoid FIEMAP ioctl on import (it is broken on some kernels)
 * librbd: fixes for several request/reply ordering bugs
 * librbd: only set STRIPINGV2 feature on new images when needed
 * librbd: new async flush method to resolve qemu hangs (requires Qemu 
   update as well)
 * librbd: a few fixes to flatten
 * ceph-disk: support for dm-crypt
 * ceph-disk: many backports to allow bobtail deployments with 
   ceph-deploy, chef
 * sysvinit: do not stop starting daemons on first failure
 * udev: fixed rules for redhat-based distros
 * build fixes for raring

For more detailed information, see the complete changelog.

You can get v0.56.5 from the usual places:

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.56.5.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm



Re: [ceph-users] ceph osd tell bench

2013-05-03 Thread Travis Rhoden
Thanks Gregory, that's perfect.  Just the clarification I needed.

I was having the same thought, that if I increased the data to be written
to a sufficient size, I would hit the backing store.  But even then, since we
just get one number from "osd tell bench", it would really be some sort
of funky average of the two.  Like you said, it returns when it is "safe",
and that will happen once the last bit has been committed to the journal,
not flushed.  (I believe my thinking there is correct).  Not a complaint or
an issue.  Just like to make sure I understand things correctly.

Thanks again.


On Fri, May 3, 2013 at 1:21 PM, Gregory Farnum  wrote:

> On Fri, May 3, 2013 at 7:34 AM, Travis Rhoden  wrote:
> > I have a question about "tell bench" command.
> >
> > When I run this, is it behaving more or less like a dd on the drive?  It
> > appears to be, but I wanted to confirm whether or not it is bypassing all
> > the normal Ceph stack that would be writing metadata, calculating
> checksums,
> > etc.
> >
> > One bit of behavior I noticed a while back that I was not expecting is
> that
> > this command does write to the journal.  It made sense when I thought
> about
> > it, but when I have an SSD journal in front of an OSD, I can't get the
> "tell
> > bench" command to really show me accurate numbers of the raw speed of the
> > OSD -- instead I get write speeds of the SSD.  Just a small caveat there.
> >
> > The upside to that is when you do something like "tell \* bench", you are
> > able to see if that SSD becomes a bottleneck by hosting multiple journals,
> > so I'm not really complaining.  But it does make it a bit tough to see if
> > perhaps one OSD is performing much differently than others.
> >
> > But really, I'm mainly curious if it skips any normal metadata/checksum
> > overhead that may be there otherwise.
>
> The way this is implemented, it writes data via the FileStore in a
> given chunk size. I believe the defaults are 1GB of data and 4MB, but
> you can set this: "ceph osd tell <osd-id> bench <total_bytes> <block_size>"
> (IIRC). By going through the FileStore it maintains much of the same
> workload as an incoming client request would (so it reports as
> complete at the same time it would return a "safe" response to a
> client, for instance, and does write to the journal), but it does
> leave some stuff out:
> 1) The OSD runs CRCs on the data in incoming messages; here the data
> is generated locally so of course this doesn't happen.
> 2) Normal writes require updating PG metadata (not included here),
> which adds generally one write that's not included here.
>
> If you increase the amount of data written in the bench to exceed the
> journal by some reasonable amount, you should be able to test your
> backing store throughput and not just your journal. :)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>


[ceph-users] Broken RPMs for CentOS 6 / RHEL 6

2013-05-03 Thread Ricardo J. Barberis
Hi, I'm trying to install ceph 0.56 from the repo, following instructions in 
http://ceph.com/docs/master/install/rpm/

The installation fails with:

Error: Package: ceph-0.56.5-0.el6.x86_64 (ceph)
   Requires: gptfdisk

I have searched EPEL, ElRepo, RepoForge (ex-RPMForge) but I can't find 
gptfdisk anywhere for CentOS 6.4.

There is a gdisk in EPEL but it won't satisfy ceph's dependencies.

BTW, ceph-0.56.4-0.el6.x86_64 installed correctly a few weeks ago, so I'm 
guessing it's a recent change in ceph's spec file.

I checked on github and the spec seems fine, maybe I just have to wait for new
packages to be compiled, but I thought I could let you know, just in case :)

Cheers,
-- 
Ricardo J. Barberis
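A quick way to triage a failure like this is to pull the missing capability name out of yum's error text and then ask the repos who provides it. A small sketch; the error text is copied from the message above, and the final `yum provides` step is only suggested in a comment since it needs a live system:

```shell
# The yum error text from above, captured verbatim.
err='Error: Package: ceph-0.56.5-0.el6.x86_64 (ceph)
          Requires: gptfdisk'
# Extract the missing capability name from the "Requires:" line.
dep=$(printf '%s\n' "$err" | sed -n 's/^ *Requires: *//p')
echo "$dep"
# On a live box you would then ask the repos: yum provides "$dep"
```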


Re: [ceph-users] ceph osd tell bench

2013-05-03 Thread Gregory Farnum
On Fri, May 3, 2013 at 7:34 AM, Travis Rhoden  wrote:
> I have a question about "tell bench" command.
>
> When I run this, is it behaving more or less like a dd on the drive?  It
> appears to be, but I wanted to confirm whether or not it is bypassing all
> the normal Ceph stack that would be writing metadata, calculating checksums,
> etc.
>
> One bit of behavior I noticed a while back that I was not expecting is that
> this command does write to the journal.  It made sense when I thought about
> it, but when I have an SSD journal in front of an OSD, I can't get the "tell
> bench" command to really show me accurate numbers of the raw speed of the
> OSD -- instead I get write speeds of the SSD.  Just a small caveat there.
>
> The upside to that is when you do something like "tell \* bench", you are
> able to see if that SSD becomes a bottleneck by hosting multiple journals,
> so I'm not really complaining.  But it does make it a bit tough to see if
> perhaps one OSD is performing much differently than others.
>
> But really, I'm mainly curious if it skips any normal metadata/checksum
> overhead that may be there otherwise.

The way this is implemented, it writes data via the FileStore in a
given chunk size. I believe the defaults are 1GB of data and 4MB, but
you can set this: "ceph osd tell <osd-id> bench <total_bytes> <block_size>"
(IIRC). By going through the FileStore it maintains much of the same
workload as an incoming client request would (so it reports as
complete at the same time it would return a "safe" response to a
client, for instance, and does write to the journal), but it does
leave some stuff out:
1) The OSD runs CRCs on the data in incoming messages; here the data
is generated locally so of course this doesn't happen.
2) Normal writes require updating PG metadata (not included here),
which adds generally one write that's not included here.

If you increase the amount of data written in the bench to exceed the
journal by some reasonable amount, you should be able to test your
backing store throughput and not just your journal. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
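To put numbers on Greg's suggestion, here is a rough sizing sketch. The 1 GiB journal and OSD id 0 are hypothetical, and the command is only assembled and echoed, not run; the argument order follows the total-bytes/block-size form described above:

```shell
# Hypothetical journal size; write several times this amount so most of
# the benchmark data spills past the journal onto the backing store.
journal_bytes=$((1024 * 1024 * 1024))   # 1 GiB journal (assumed)
total_bytes=$((4 * journal_bytes))      # write 4x the journal size
block_bytes=$((4 * 1024 * 1024))        # keep the default 4 MB write size
# Echoed rather than executed; run against a real OSD to benchmark it.
echo "ceph osd tell 0 bench $total_bytes $block_bytes"
```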


Re: [ceph-users] ceph osd tell bench

2013-05-03 Thread Greg

On 03/05/2013 16:34, Travis Rhoden wrote:

> I have a question about the "tell bench" command.
>
> When I run this, is it behaving more or less like a dd on the drive?
> It appears to be, but I wanted to confirm whether or not it is
> bypassing all the normal Ceph stack that would be writing metadata,
> calculating checksums, etc.
>
> One bit of behavior I noticed a while back that I was not expecting is
> that this command does write to the journal. It made sense when I
> thought about it, but when I have an SSD journal in front of an OSD, I
> can't get the "tell bench" command to really show me accurate numbers
> of the raw speed of the OSD -- instead I get write speeds of the SSD.
> Just a small caveat there.
>
> The upside to that is when you do something like "tell \* bench", you
> are able to see if that SSD becomes a bottleneck by hosting multiple
> journals, so I'm not really complaining.  But it does make it a bit
> tough to see if perhaps one OSD is performing much differently than
> others.
>
> But really, I'm mainly curious if it skips any normal
> metadata/checksum overhead that may be there otherwise.


Travis,

I'm no expert but, to me, the bench doesn't bypass the Ceph stack.
On a test setup, I set up the journal on the same drive as the data
drive. When I "tell bench" I see ~160MB/s throughput on the SSD
block device while the benchmark result is ~80MB/s, which leads me to
think the data is written twice: once to the journal and once to the
"permanent" storage.
I see almost no reads on the block device, but the written data probably
is in the page cache.
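Those ~160 vs ~80 MB/s figures match the double-write arithmetic: with journal and data on one device, every client byte costs two device writes, so the expected bench result is roughly half the raw device rate. Using the observed figure from above:

```shell
# Observed raw write rate on the shared device (from the message above).
device_mb_s=160
# Each byte is written twice (journal + data), halving the visible rate.
echo "$((device_mb_s / 2)) MB/s expected bench result"
```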


Cheers,


[ceph-users] ceph osd tell bench

2013-05-03 Thread Travis Rhoden
I have a question about the "tell bench" command.

When I run this, is it behaving more or less like a dd on the drive?  It
appears to be, but I wanted to confirm whether or not it is bypassing all
the normal Ceph stack that would be writing metadata, calculating
checksums, etc.

One bit of behavior I noticed a while back that I was not expecting is that
this command does write to the journal.  It made sense when I thought about
it, but when I have an SSD journal in front of an OSD, I can't get the
"tell bench" command to really show me accurate numbers of the raw speed of
the OSD -- instead I get write speeds of the SSD.  Just a small caveat
there.

The upside to that is when you do something like "tell \* bench", you are
able to see if that SSD becomes a bottleneck by hosting multiple journals,
so I'm not really complaining.  But it does make it a bit tough to see if
perhaps one OSD is performing much differently than others.

But really, I'm mainly curious if it skips any normal metadata/checksum
overhead that may be there otherwise.

Thanks,

 - Travis


Re: [ceph-users] Peering and disk utilization

2013-05-03 Thread Erdem Agaoglu
I'm not sure if the problems we are seeing are the same, but it looks like
it. Just a few hours ago, one slow OSD caused a lot of problems for us. It
was somehow reported down, and while the cluster was trying to adjust, it
said it was wrongly marked down, so it seems some pgs were stuck in
peering. We restarted the OSD, the cluster adjusted, after a while it was
reported down again, and the whole process repeated. We thought we should
keep the OSD down, set noup, waited a while, with no luck, and repeated.
Even though there seemed to be no hardware problem, we decided to set the
osd out and start recovery.

Initial peering, as you said, seems so resource-intensive that it caused
another ~10 OSDs to be reported down, which increased the number of pgs in
peering, and then they all said they were wrongly marked down... We have
already lowered all the recovery parameters, so recovery takes about 2-3
hours now, but that makes no difference in the starting phase of the
recovery process, which may take up to 10 minutes. We have RBD-backed KVM
instances and they are totally frozen for those 10 minutes. And if some
pgs are stuck in peering, a manual operation (a restart is what we could
come up with) is required before anything can continue working.

We've found http://www.spinics.net/lists/ceph-users/msg9.html but it
doesn't offer much. We run 0.56.4.
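"Lowered all the recovery parameters" can be done at runtime via injectargs. A sketch with hypothetical throttle values (the option names are the bobtail-era ones; verify them against your running version); the command is only assembled and echoed here, not executed:

```shell
# Hypothetical values: one active recovery op and one backfill per OSD.
args='--osd_recovery_max_active 1 --osd_max_backfills 1'
# Echoed rather than executed; run it on a live cluster to apply.
echo "ceph osd tell \\* injectargs '$args'"
```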


On Thu, May 2, 2013 at 4:57 PM, Andrey Korolyov  wrote:

> Hello,
>
> Speaking of the rotating-media-under-filestore case (probably the most
> common in Ceph deployments), can peering be made less greedy with disk
> operations without stretching the entire 'blackhole timeout', i.e. the
> period when it blocks client operations? I'm suffering very long and
> very disk-intensive peering even on relatively small reweighs when there
> is a more or less significant commit on the underlying storage (50% is
> very hard to deal with; a 10% disk commit is far more acceptable).
> Recovery by itself can be throttled low enough not to compete with
> client disk I/O, but slowing the peering process means freezing client
> I/O for longer, that's all.
> Cuttlefish seems to take over part of the disk controller's job of
> merging writes, but peering is still unacceptably long for an
> IOPS-intensive cluster (5MB/s and 800 IOPS on every disk during peering;
> despite the controller aligning head movements, the disks are 100%
> busy). An SSD-based cluster would not die from lack of IOPS, but prices
> for such a thing are still closer to TrueEnterpriseStorage(tm) than to
> any solution I can afford.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
erdem agaoglu