Re: ceph:rgw issue #10698

2015-02-18 Thread Valery Tschopp

Hi Yehuda,

Would it be possible to have the bug fixes #10698 and #10062 for the S3 
POST issue backported for the new firefly release?


This feature is very important for us: our video conversion engine 
relies on user S3 browser POST uploads.


Best regards,
Valery

On 17/02/15 18:08 , Yehuda Sadeh-Weinraub wrote:

Subject: ceph:rgw issue #10698

Hello Yehuda,

The issue http://tracker.ceph.com/issues/10698 "rgw: not failing
POST requests if keystone not configured" is marked as resolved,
but I don't think it has been backported to firefly.

Issue http://tracker.ceph.com/issues/10062 should already be
backported, if I'm not wrong...



Neither made it in. The bug wasn't set to be backported to firefly.
We can set it to be backported if there's demand; however, I'm not
sure that it's going to be a trivial backport.

Yehuda



--
SWITCH
--
Valery Tschopp, Software Engineer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: valery.tsch...@switch.ch phone: +41 44 268 1544






Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-18 Thread Alexandre DERUMIER
Nice work, Mark!

I don't see any tuning for sharding in the sample config file

(osd_op_num_threads_per_shard, osd_op_num_shards, ...)

As you only used one SSD for the benchmark, I think tuning these should improve 
the results for hammer?



- Original Message -
From: Mark Nelson mnel...@redhat.com
To: ceph-devel ceph-devel@vger.kernel.org
Cc: ceph-users ceph-us...@lists.ceph.com
Sent: Tuesday, February 17, 2015 18:37:01
Subject: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

Hi All, 

I wrote up a short document describing some tests I ran recently to look 
at how SSD backed OSD performance has changed across our LTS releases. 
This is just looking at RADOS performance and not RBD or RGW. It also 
doesn't offer any real explanations regarding the results. It's just a 
first high level step toward understanding some of the behaviors folks 
on the mailing list have reported over the last couple of releases. I 
hope you find it useful. 

Mark 



Re: dumpling integration branch for v0.67.12 ready for QE

2015-02-18 Thread Loic Dachary


On 18/02/2015 18:38, Yuri Weinstein wrote:
 Hi all
 
 I updated all issues in http://tracker.ceph.com/issues/10560
 
 Based on what is listed there, we have 
 http://tracker.ceph.com/issues/10801 - Yehuda pls comment
 http://tracker.ceph.com/issues/10694 - Sam pls re-confirm
 
 rbd - Josh, I understood that we are good to go, pls re-confirm.
 
 I can re-run some suites if you'd like and we can make a call on this release.
 
 Loic - back to you, let me know what you think.

As long as you're satisfied with the test results, I have no further comment :-)

Cheers

 Thx
 YuriW
 
 - Original Message -
 From: Loic Dachary l...@dachary.org
 To: Yuri Weinstein ywein...@redhat.com
 Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil 
 s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com, Zack Cerza 
 z...@redhat.com, Sandon Van Ness svann...@redhat.com
 Sent: Thursday, February 12, 2015 2:17:49 PM
 Subject: Re: dumpling integration branch for v0.67.12 ready for QE
 
 
 
 On 12/02/2015 23:06, Yuri Weinstein wrote:
 I linked all issues related to this release testing to the ticket 
 http://tracker.ceph.com/issues/10560

 After the team leads make a call on those, including the environment issues, I 
 suggest re-running the suites that failed.

 Loic, I'd re-run them in the Octo lab, since we already started there, if you 
 agree?
 
 Sure :-)
 

 Thx
 YuriW

 - Original Message -
 From: Yuri Weinstein ywein...@redhat.com
 To: Loic Dachary l...@dachary.org
 Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil 
 s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com
 Sent: Wednesday, February 11, 2015 2:24:33 PM
 Subject: Re: dumpling integration branch for v0.67.12 ready for QE

 I replied to the individual suite runs, but wanted to summarize the QE 
 validation status.

 The following suites were executed in the Octo lab (we will use Sepia in the 
 future if nobody objects).

 upgrade:dumpling
 
 ['45493']
 http://tracker.ceph.com/issues/10694 - Known Won't fix
 Assertion: osd/Watch.cc: 290: FAILED assert(!cb)

 *** Sam - pls confirm the Won't fix status.

 ['45495', '45496', '45498', '45499', '45500']
 http://tracker.ceph.com/issues/10838
 s3tests failed

 *** Yehuda - need your verdict on s3tests.

 fs
 
 All green !

 rados
 
 ['45054']
 http://tracker.ceph.com/issues/10841
 Issued certificate has expired 
 *** Sandon pls comment.

 ['45168', '45169']
 http://tracker.ceph.com/issues/10840
 coredump ceph_test_filestore_idempotent_sequence
 *** Sam - pls comment

 ['45215']
 Missing packages - no ticket FYI
 Failed to fetch 
 http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com/ubuntu/dists/trusty-updates/universe/binary-i386/Packages
   Hash Sum mismatch

 *** Zack, Sandon ?

 ceph-deploy
 

 Travis - pls suggest
 In general I am not sure if we needed to test this - Sage?

 rbd
 
 ['45365', '45366', '45367']
 http://tracker.ceph.com/issues/10842
 unable to connect to apt-mirror.front.sepia.ceph.com

 ['45349', '45350', '45351', '45355', '45356', '45357', '45363']
 http://tracker.ceph.com/issues/10802
 error: image still has watchers 
 (duplicate of 10680)

 *** Zack, Sandon, Josh - all environment noise, pls comment. 

 rgw
 
 ['45382', '45390']
 http://tracker.ceph.com/issues/10843
 s3tests failed - could be related or duplicate of 10838

 *** Yehuda - same as issues in upgrades?

 I am standing by for your analysis/replies and recommendations for next steps.

 Loic - let me know if you want to follow specific items in our backport 
 testing process that I missed here.
 PS:  I would have thought you'd want to assign the release ticket to 
 QE (me) for validation, and at that point I could have re-assigned it back to 
 devel (you), no?

 Thx
 YuriW

 - Original Message -
 From: Loic Dachary l...@dachary.org
 To: Yuri Weinstein ywein...@redhat.com
 Cc: Ceph Development ceph-devel@vger.kernel.org
 Sent: Tuesday, February 10, 2015 9:05:31 AM
 Subject: dumpling integration branch for v0.67.12 ready for QE

 Hi Yuri,

 The dumpling integration branch for v0.67.12 as found at 
 https://github.com/ceph/ceph/commits/dumpling-backports has been approved by 
 Yehuda, Josh and Sam and is ready for QE. 

 For the record, the head is 
 https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410

 I think it would be best for the QE tests to use the dumpling-backports. The 
 alternative would be to merge dumpling-backports into dumpling. However, 
 since testing may take a long time and require more patches, it probably is 
 better to not do that iterative process on the dumpling branch itself. As it 
 is now, there already are a number of commits in the dumpling branch that 
 should really be in the dumpling-backports: they do not belong to v0.67.11 
 and are going to be released in v0.67.12. In the future though, the dumpling 
 branch will only receive commits that have been carefully tested and all the 
 integration work will be on the dumpling-backports branch exclusively.

Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-18 Thread Alexandre DERUMIER
I don't have really good insight yet into how tweaking these would 
affect single-osd performance. I know the PCIe SSDs do have multiple 
controllers on-board so perhaps increasing the number of shards would 
improve things, but I suspect that going too high could maybe start 
hurting performance as well. Have you done any testing here? It could 
be an interesting follow-up paper. 

I think it should be tuned according to the number of OSDs and the number of cores 
you have.
I have done tests in the past with Somnath's values:

osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32

But I haven't taken the time to try different values.
If I remember correctly I was able to reach 12iops 4k read with 3 OSDs (but 
I was limited by client CPU).

I'm going to do a big benchmark next month (3 nodes (20 cores) with 6 SSDs each),
so I'll try to test different sharding values with different numbers of OSDs.
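
For reference, a minimal sketch of how such a small-block run can be measured
with rados bench; the pool name "testpool", the run time and the queue depth
below are placeholders, not values from this thread:

  # write 4k objects for 60s at 32 concurrent ops, keeping them for the read pass
  rados bench -p testpool 60 write -b 4096 -t 32 --no-cleanup
  # then random-read the same objects back
  rados bench -p testpool 60 rand -t 32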


- Original Message -
From: Mark Nelson mnel...@redhat.com
To: aderumier aderum...@odiso.com
Cc: ceph-devel ceph-devel@vger.kernel.org, ceph-users 
ceph-us...@lists.ceph.com
Sent: Wednesday, February 18, 2015 15:56:44
Subject: Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance 
comparison

Hi Alex, 

Thanks! I didn't tweak the sharding settings at all, so they are just 
at the default values: 

OPTION(osd_op_num_threads_per_shard, OPT_INT, 2) 
OPTION(osd_op_num_shards, OPT_INT, 5) 

I don't have really good insight yet into how tweaking these would 
affect single-osd performance. I know the PCIe SSDs do have multiple 
controllers on-board so perhaps increasing the number of shards would 
improve things, but I suspect that going too high could maybe start 
hurting performance as well. Have you done any testing here? It could 
be an interesting follow-up paper. 

Mark 

On 02/18/2015 02:34 AM, Alexandre DERUMIER wrote: 
 Nice work, Mark! 
 
 I don't see any tuning for sharding in the sample config file 
 
 (osd_op_num_threads_per_shard, osd_op_num_shards, ...) 
 
 As you only used one SSD for the benchmark, I think tuning these should improve 
 the results for hammer?
 
 
 
 - Original Message - 
 From: Mark Nelson mnel...@redhat.com 
 To: ceph-devel ceph-devel@vger.kernel.org 
 Cc: ceph-users ceph-us...@lists.ceph.com 
 Sent: Tuesday, February 17, 2015 18:37:01 
 Subject: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance 
 comparison 
 
 Hi All, 
 
 I wrote up a short document describing some tests I ran recently to look 
 at how SSD backed OSD performance has changed across our LTS releases. 
 This is just looking at RADOS performance and not RBD or RGW. It also 
 doesn't offer any real explanations regarding the results. It's just a 
 first high level step toward understanding some of the behaviors folks 
 on the mailing list have reported over the last couple of releases. I 
 hope you find it useful. 
 
 Mark 
 


[radosgw] inconsistency between bucket and bucket.instance metadata

2015-02-18 Thread ghislain.chevalier
Hi all,

Context : Firefly 0.80.8, Ubuntu 14.04 LTS, Lab cluster


Yesterday, I successfully deleted an S3 bucket "Bucket001ghis" after removing 
the contents that were in it.

Today, as I was browsing the radosgw system metadata, I discovered a 
discrepancy between the bucket metadata and the bucket.instance metadata, as 
follows.
radosgw-admin --name client.radosgw.fr-rennes-radosgw1 metadata list bucket
[
    "bucket001ghis",
    "ghis",
    "bucket001johndoe",
    "bucket001transfert",
    "myb1",
    "mybucket"]


radosgw-admin --name client.radosgw.fr-rennes-radosgw1 metadata list 
bucket.instance
[
    "bucket001ghis:fr-rennes-radosgw1.247011.1",
    "Bucket001ghis:fr-rennes-radosgw1.244654.2",
    "myb1:fr-rennes-radosgw1.246846.1",
    "mybucket:fr-rennes-radosgw1.244748.1",
    "bucket001transfert:fr-rennes-radosgw1.244654.1",
    "bucket001johndoe:fr-rennes-radosgw1.244742.1",
    "ghis:fr-rennes-radosgw1.246056.1"]

Bucket001ghis:fr-rennes-radosgw1.244654.2 is still referenced in the 
bucket.instance metadata, even though the bucket was deleted.

What could be causing this?
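
For reference, a stale instance entry like this can be inspected with the
generic metadata commands; the following is only a sketch, using the key from
the listing above, and the removal should only be run once the instance is
confirmed to be orphaned:

  radosgw-admin --name client.radosgw.fr-rennes-radosgw1 metadata get \
      bucket.instance:Bucket001ghis:fr-rennes-radosgw1.244654.2
  # only if the instance is confirmed to be an orphan:
  radosgw-admin --name client.radosgw.fr-rennes-radosgw1 metadata rm \
      bucket.instance:Bucket001ghis:fr-rennes-radosgw1.244654.2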

Best regards






Re: dumpling integration branch for v0.67.12 ready for QE

2015-02-18 Thread Samuel Just
Yup, 10694 is a known bug in dumpling which we probably don't want to
fix.  The rados tests look OK to me, I think.
-Sam

On Wed, Feb 18, 2015 at 9:38 AM, Yuri Weinstein ywein...@redhat.com wrote:
 Hi all

 I updated all issues in http://tracker.ceph.com/issues/10560

 Based on what is listed there, we have
 http://tracker.ceph.com/issues/10801 - Yehuda pls comment
 http://tracker.ceph.com/issues/10694 - Sam pls re-confirm

 rbd - Josh, I understood that we are good to go, pls re-confirm.

 I can re-run some suites if you'd like and we can make a call on this release.

 Loic - back to you, let me know what you think.

 Thx
 YuriW

 - Original Message -
 From: Loic Dachary l...@dachary.org
 To: Yuri Weinstein ywein...@redhat.com
 Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil 
 s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com, Zack Cerza 
 z...@redhat.com, Sandon Van Ness svann...@redhat.com
 Sent: Thursday, February 12, 2015 2:17:49 PM
 Subject: Re: dumpling integration branch for v0.67.12 ready for QE



 On 12/02/2015 23:06, Yuri Weinstein wrote:
 I linked all issues related to this release testing to the ticket 
 http://tracker.ceph.com/issues/10560

 After the team leads make a call on those, including the environment issues, I 
 suggest re-running the suites that failed.

 Loic, I'd re-run them in the Octo lab, since we already started there, if you 
 agree?

 Sure :-)


 Thx
 YuriW

 - Original Message -
 From: Yuri Weinstein ywein...@redhat.com
 To: Loic Dachary l...@dachary.org
 Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil 
 s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com
 Sent: Wednesday, February 11, 2015 2:24:33 PM
 Subject: Re: dumpling integration branch for v0.67.12 ready for QE

 I replied to the individual suite runs, but wanted to summarize the QE 
 validation status.

 The following suites were executed in the Octo lab (we will use Sepia in the 
 future if nobody objects).

 upgrade:dumpling
 
 ['45493']
 http://tracker.ceph.com/issues/10694 - Known Won't fix
 Assertion: osd/Watch.cc: 290: FAILED assert(!cb)

 *** Sam - pls confirm the Won't fix status.

 ['45495', '45496', '45498', '45499', '45500']
 http://tracker.ceph.com/issues/10838
 s3tests failed

 *** Yehuda - need your verdict on s3tests.

 fs
 
 All green !

 rados
 
 ['45054']
 http://tracker.ceph.com/issues/10841
 Issued certificate has expired
 *** Sandon pls comment.

 ['45168', '45169']
 http://tracker.ceph.com/issues/10840
 coredump ceph_test_filestore_idempotent_sequence
 *** Sam - pls comment

 ['45215']
 Missing packages - no ticket FYI
 Failed to fetch 
 http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com/ubuntu/dists/trusty-updates/universe/binary-i386/Packages
   Hash Sum mismatch

 *** Zack, Sandon ?

 ceph-deploy
 

 Travis - pls suggest
 In general I am not sure if we needed to test this - Sage?

 rbd
 
 ['45365', '45366', '45367']
 http://tracker.ceph.com/issues/10842
 unable to connect to apt-mirror.front.sepia.ceph.com

 ['45349', '45350', '45351', '45355', '45356', '45357', '45363']
 http://tracker.ceph.com/issues/10802
 error: image still has watchers
 (duplicate of 10680)

 *** Zack, Sandon, Josh - all environment noise, pls comment.

 rgw
 
 ['45382', '45390']
 http://tracker.ceph.com/issues/10843
 s3tests failed - could be related or duplicate of 10838

 *** Yehuda - same as issues in upgrades?

 I am standing by for your analysis/replies and recommendations for next steps.

 Loic - let me know if you want to follow specific items in our backport 
 testing process that I missed here.
 PS:  I would have thought you'd want to assign the release ticket to 
 QE (me) for validation, and at that point I could have re-assigned it back to 
 devel (you), no?

 Thx
 YuriW

 - Original Message -
 From: Loic Dachary l...@dachary.org
 To: Yuri Weinstein ywein...@redhat.com
 Cc: Ceph Development ceph-devel@vger.kernel.org
 Sent: Tuesday, February 10, 2015 9:05:31 AM
 Subject: dumpling integration branch for v0.67.12 ready for QE

 Hi Yuri,

 The dumpling integration branch for v0.67.12 as found at 
 https://github.com/ceph/ceph/commits/dumpling-backports has been approved by 
 Yehuda, Josh and Sam and is ready for QE.

 For the record, the head is 
 https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410

 I think it would be best for the QE tests to use the dumpling-backports. The 
 alternative would be to merge dumpling-backports into dumpling. However, 
 since testing may take a long time and require more patches, it probably is 
 better to not do that iterative process on the dumpling branch itself. As it 
 is now, there already are a number of commits in the dumpling branch that 
 should really be in the dumpling-backports: they do not belong to v0.67.11 
 and are going to be released in v0.67.12. In the future though, the dumpling 
 branch will only receive commits that have been carefully tested and all the 
 integration work will be on the dumpling-backports branch exclusively.

Re: ceph:rgw issue #10698

2015-02-18 Thread Yehuda Sadeh-Weinraub


- Original Message -
 From: Valery Tschopp valery.tsch...@switch.ch
 To: Yehuda Sadeh-Weinraub yeh...@redhat.com
 Cc: ceph-devel ceph-devel@vger.kernel.org
 Sent: Wednesday, February 18, 2015 12:32:47 AM
 Subject: Re: ceph:rgw issue #10698
 


 Hi Yehuda,
 
 Would it be possible to have the bug fixes #10698 and #10062 for the S3
 POST issue backported for the new firefly release?
 
 This feature is very important for us: our video conversion engine
 relies on user S3 browser POST uploads.
 

I reopened the issues and set them as pending backport, and pushed the 
wip-firefly-rgw-backports branch with these fixes in it.
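
For anyone who wants to test before the backport lands, a minimal sketch,
assuming the branch was pushed to the main ceph repository on GitHub:

  git clone https://github.com/ceph/ceph.git
  cd ceph
  # check out the backport branch carrying the #10698 and #10062 fixes
  git checkout wip-firefly-rgw-backports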

Yehuda

 Best regards,
 Valery
 
 On 17/02/15 18:08 , Yehuda Sadeh-Weinraub wrote:
  Subject: ceph:rgw issue #10698
 
  Hello Yehuda,
 
  The issue http://tracker.ceph.com/issues/10698 "rgw: not failing
  POST requests if keystone not configured" is marked as resolved,
  but I don't think it has been backported to firefly.
 
  Issue http://tracker.ceph.com/issues/10062 should already be
  backported, if I'm not wrong...
 
 
  Neither made it in. The bug wasn't set to be backported to firefly.
  We can set it to be backported if there's demand; however, I'm not
  sure that it's going to be a trivial backport.
 
  Yehuda
 
 
 --
 SWITCH
 --
 Valery Tschopp, Software Engineer, Peta Solutions
 Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
 email: valery.tsch...@switch.ch phone: +41 44 268 1544
 
 
 


Re: dumpling integration branch for v0.67.12 ready for QE

2015-02-18 Thread Yuri Weinstein
Hi all

I updated all issues in http://tracker.ceph.com/issues/10560

Based on what is listed there, we have 
http://tracker.ceph.com/issues/10801 - Yehuda pls comment
http://tracker.ceph.com/issues/10694 - Sam pls re-confirm

rbd - Josh, I understood that we are good to go, pls re-confirm.

I can re-run some suites if you'd like and we can make a call on this release.

Loic - back to you, let me know what you think.

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil 
s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com, Zack Cerza 
z...@redhat.com, Sandon Van Ness svann...@redhat.com
Sent: Thursday, February 12, 2015 2:17:49 PM
Subject: Re: dumpling integration branch for v0.67.12 ready for QE



On 12/02/2015 23:06, Yuri Weinstein wrote:
 I linked all issues related to this release testing to the ticket 
 http://tracker.ceph.com/issues/10560
 
 After the team leads make a call on those, including the environment issues, I 
 suggest re-running the suites that failed.
 
 Loic, I'd re-run them in the Octo lab, since we already started there, if you 
 agree?

Sure :-)

 
 Thx
 YuriW
 
 - Original Message -
 From: Yuri Weinstein ywein...@redhat.com
 To: Loic Dachary l...@dachary.org
 Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil 
 s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com
 Sent: Wednesday, February 11, 2015 2:24:33 PM
 Subject: Re: dumpling integration branch for v0.67.12 ready for QE
 
 I replied to the individual suite runs, but wanted to summarize the QE 
 validation status.
 
 The following suites were executed in the Octo lab (we will use Sepia in the 
 future if nobody objects).
 
 upgrade:dumpling
 
 ['45493']
 http://tracker.ceph.com/issues/10694 - Known Won't fix
 Assertion: osd/Watch.cc: 290: FAILED assert(!cb)
 
 *** Sam - pls confirm the Won't fix status.
 
 ['45495', '45496', '45498', '45499', '45500']
 http://tracker.ceph.com/issues/10838
 s3tests failed
 
 *** Yehuda - need your verdict on s3tests.
 
 fs
 
 All green !
 
 rados
 
 ['45054']
 http://tracker.ceph.com/issues/10841
 Issued certificate has expired 
 *** Sandon pls comment.
 
 ['45168', '45169']
 http://tracker.ceph.com/issues/10840
 coredump ceph_test_filestore_idempotent_sequence
 *** Sam - pls comment
 
 ['45215']
 Missing packages - no ticket FYI
 Failed to fetch 
 http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com/ubuntu/dists/trusty-updates/universe/binary-i386/Packages
   Hash Sum mismatch
 
 *** Zack, Sandon ?
 
 ceph-deploy
 
 
 Travis - pls suggest
 In general I am not sure if we needed to test this - Sage?
 
 rbd
 
 ['45365', '45366', '45367']
 http://tracker.ceph.com/issues/10842
 unable to connect to apt-mirror.front.sepia.ceph.com
 
 ['45349', '45350', '45351', '45355', '45356', '45357', '45363']
 http://tracker.ceph.com/issues/10802
 error: image still has watchers 
 (duplicate of 10680)
 
 *** Zack, Sandon, Josh - all environment noise, pls comment. 
 
 rgw
 
 ['45382', '45390']
 http://tracker.ceph.com/issues/10843
 s3tests failed - could be related or duplicate of 10838
 
 *** Yehuda - same as issues in upgrades?
 
 I am standing by for your analysis/replies and recommendations for next steps.
 
 Loic - let me know if you want to follow specific items in our backport 
 testing process that I missed here.
 PS:  I would have thought you'd want to assign the release ticket to 
 QE (me) for validation, and at that point I could have re-assigned it back to 
 devel (you), no?
 
 Thx
 YuriW
 
 - Original Message -
 From: Loic Dachary l...@dachary.org
 To: Yuri Weinstein ywein...@redhat.com
 Cc: Ceph Development ceph-devel@vger.kernel.org
 Sent: Tuesday, February 10, 2015 9:05:31 AM
 Subject: dumpling integration branch for v0.67.12 ready for QE
 
 Hi Yuri,
 
 The dumpling integration branch for v0.67.12 as found at 
 https://github.com/ceph/ceph/commits/dumpling-backports has been approved by 
 Yehuda, Josh and Sam and is ready for QE. 
 
 For the record, the head is 
 https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410
 
 I think it would be best for the QE tests to use the dumpling-backports. The 
 alternative would be to merge dumpling-backports into dumpling. However, 
 since testing may take a long time and require more patches, it probably is 
 better to not do that iterative process on the dumpling branch itself. As it 
 is now, there already are a number of commits in the dumpling branch that 
 should really be in the dumpling-backports: they do not belong to v0.67.11 
 and are going to be released in v0.67.12. In the future though, the dumpling 
 branch will only receive commits that have been carefully tested and all the 
 integration work will be on the dumpling-backports branch exclusively. So 
 that third parties do not have 

12 March - Ceph Day San Francisco

2015-02-18 Thread Patrick McGarry
Hey cephers,

We still have a couple of speaking slots open for Ceph Day San
Francisco on 12 March. I'm open to both high-level "what have you been
doing with Ceph" type talks as well as more technical "here is what
we're writing and/or integrating with Ceph" talks.

I know many folks will be at VAULT, but we figured there would still
be plenty of folks left on the west coast, so let me know if you'd be
interested in speaking. Thanks!


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


Re: dumpling integration branch for v0.67.12 ready for QE

2015-02-18 Thread Josh Durgin


- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Tamil Muthamizhan 
tmuth...@redhat.com
Sent: Wednesday, February 18, 2015 9:56:14 AM
Subject: Re: dumpling integration branch for v0.67.12 ready for QE



On 18/02/2015 18:38, Yuri Weinstein wrote:
 Hi all
 
 I updated all issues in http://tracker.ceph.com/issues/10560
 
 Based on what is listed there, we have 
 http://tracker.ceph.com/issues/10801 - Yehuda pls comment
 http://tracker.ceph.com/issues/10694 - Sam pls re-confirm
 
 rbd - Josh, I understood that we are good to go, pls re-confirm.

Yes, good to go from my perspective.


poll: If calamari could monitor and alert on 5 things...

2015-02-18 Thread Gregory Meno
What would they be?
Please respond with your top 5 "things I want to know about a Ceph cluster".


I want Calamari to have improved monitoring of Ceph.
I would like to focus on getting a few things exposed really well through the 
calamari-api.

If you need some inspiration, there is 
http://redhatstorage.redhat.com/2015/02/12/10-commands-every-ceph-administrator-should-know/
Calamari already exposes a number of these.

regards,
Gregory



full_ratios - please explain?

2015-02-18 Thread Wyllys Ingersoll
Can someone explain the interaction and effects of all of these
full_ratio parameters?  I haven't found any really good explanation of how
they affect the distribution of data once the cluster gets above the
nearfull ratio and close to the full ratio.


mon_osd_full_ratio
mon_osd_nearfull_ratio

osd_backfill_full_ratio
osd_failsafe_full_ratio
osd_failsafe_nearfull_ratio

We have a cluster with about 144 OSDs (518 TB) and are trying to get it to a
90% full rate for testing purposes.

We've found that when some of the OSDs get above the mon_osd_full_ratio
value (.95 in our system), then it stops accepting any new data, even
though there is plenty of space left on other OSDs that are not yet even up
to 90%.  Tweaking the osd_failsafe ratios enabled data to move again for a
bit, but eventually it becomes unbalanced and stops working again.

Is there a recommended combination of values to use that will allow the
cluster to continue accepting data and rebalancing correctly above 90%?

thanks,
 Wyllys Ingersoll


[PATCH] libceph: kfree() in put_osd() shouldn't depend on authorizer

2015-02-18 Thread Ilya Dryomov
a255651d4cad ("ceph: ensure auth ops are defined before use") made
kfree() in put_osd() conditional on the authorizer.  A mechanical
mistake most likely - fix it.

Cc: Alex Elder el...@linaro.org
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 net/ceph/osd_client.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index f693a2f8ac86..41a4abc7e98e 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1035,10 +1035,11 @@ static void put_osd(struct ceph_osd *osd)
 {
 	dout("put_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref),
 	     atomic_read(&osd->o_ref) - 1);
-	if (atomic_dec_and_test(&osd->o_ref) && osd->o_auth.authorizer) {
+	if (atomic_dec_and_test(&osd->o_ref)) {
 		struct ceph_auth_client *ac = osd->o_osdc->client->monc.auth;
 
-		ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer);
+		if (osd->o_auth.authorizer)
+			ceph_auth_destroy_authorizer(ac,
+						     osd->o_auth.authorizer);
 		kfree(osd);
 	}
 }
-- 
1.9.3





Re: [PATCH] libceph: kfree() in put_osd() shouldn't depend on authorizer

2015-02-18 Thread Ilya Dryomov
On Wed, Feb 18, 2015 at 4:27 PM, Ilya Dryomov idryo...@gmail.com wrote:
 a255651d4cad ("ceph: ensure auth ops are defined before use") made
 kfree() in put_osd() conditional on the authorizer.  A mechanical
 mistake most likely - fix it.

 Cc: Alex Elder el...@linaro.org
 Signed-off-by: Ilya Dryomov idryo...@gmail.com
 ---
  net/ceph/osd_client.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

 diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
 index f693a2f8ac86..41a4abc7e98e 100644
 --- a/net/ceph/osd_client.c
 +++ b/net/ceph/osd_client.c
 @@ -1035,10 +1035,11 @@ static void put_osd(struct ceph_osd *osd)
  {
  	dout("put_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref),
  	     atomic_read(&osd->o_ref) - 1);
 -	if (atomic_dec_and_test(&osd->o_ref) && osd->o_auth.authorizer) {
 +	if (atomic_dec_and_test(&osd->o_ref)) {
  		struct ceph_auth_client *ac = osd->o_osdc->client->monc.auth;
 
 -		ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer);
 +		if (osd->o_auth.authorizer)
 +			ceph_auth_destroy_authorizer(ac,
 +						     osd->o_auth.authorizer);
  		kfree(osd);
  	}
  }

Sorry, this is a dup - ignore it.

Thanks,

Ilya


Re: [PATCH] libceph: kfree() in put_osd() shouldn't depend on authorizer

2015-02-18 Thread Alex Elder
On 02/18/2015 07:27 AM, Ilya Dryomov wrote:
 a255651d4cad ("ceph: ensure auth ops are defined before use") made
 kfree() in put_osd() conditional on the authorizer.  A mechanical
 mistake most likely - fix it.

You are generous in suggesting it's a mechanical mistake.
But it is a mistake nevertheless.  The fix looks good.

Reviewed-by: Alex Elder el...@linaro.org

 Cc: Alex Elder el...@linaro.org
 Signed-off-by: Ilya Dryomov idryo...@gmail.com
 ---
  net/ceph/osd_client.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
 index f693a2f8ac86..41a4abc7e98e 100644
 --- a/net/ceph/osd_client.c
 +++ b/net/ceph/osd_client.c
 @@ -1035,10 +1035,11 @@ static void put_osd(struct ceph_osd *osd)
  {
  	dout("put_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref),
  	     atomic_read(&osd->o_ref) - 1);
 -	if (atomic_dec_and_test(&osd->o_ref) && osd->o_auth.authorizer) {
 +	if (atomic_dec_and_test(&osd->o_ref)) {
  		struct ceph_auth_client *ac = osd->o_osdc->client->monc.auth;
 
 -		ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer);
 +		if (osd->o_auth.authorizer)
 +			ceph_auth_destroy_authorizer(ac,
 +						     osd->o_auth.authorizer);
  		kfree(osd);
  	}
  }
 



Re: full_ratios - please explain?

2015-02-18 Thread Sage Weil
On Wed, 18 Feb 2015, Wyllys Ingersoll wrote:
 Thanks!  More below inline...
 
 On Wed, Feb 18, 2015 at 10:05 AM, Wido den Hollander w...@42on.com wrote:
  On 18-02-15 15:39, Wyllys Ingersoll wrote:
  Can someone explain the interaction and effects of all of these
  full_ratio parameters?  I haven't found any really good explanation of how
  they affect the distribution of data once the cluster gets above the
  nearfull ratio and close to the full ratio.
 
 
  When only ONE (1) OSD goes over the mon_osd_nearfull_ratio the cluster
  goes from HEALTH_OK into HEALTH_WARN state.
 
 
  mon_osd_full_ratio
  mon_osd_nearfull_ratio
 
  osd_backfill_full_ratio
  osd_failsafe_full_ratio
  osd_failsafe_nearfull_ratio
 
  We have a cluster with about 144 OSDs (518 TB) and are trying to get it to a
  90% full rate for testing purposes.
 
  We've found that when some of the OSDs get above the mon_osd_full_ratio
  value (.95 in our system), then it stops accepting any new data, even
  though there is plenty of space left on other OSDs that are not yet even up
  to 90%.  Tweaking the osd_failsafe ratios enabled data to move again for a
  bit, but eventually it becomes unbalanced and stops working again.
 
 
  Yes, that is because with Ceph, safety comes first. When only one OSD goes
  over the full ratio, the whole cluster stops I/O.
 
 
 
 Which full_ratio?  The problem is that there are at least 3
 full_ratios - mon_osd_full_ratio, osd_failsafe_full_ratio, and
 osd_backfill_full_ratio - how do they interact? What is the
 consequence of having one be higher than the others?

mon_osd_full_ratio (.95) ... when any OSD reaches this threshold the 
monitor marks the cluster as 'full' and client writes are not accepted.

mon_osd_nearfull_ratio (.85) ... when any OSD reaches this threshold the 
cluster goes HEALTH_WARN and calls out near-full OSDs.

osd_backfill_full_ratio (.85) ... when an OSD locally reaches this 
threshold it will refuse to migrate a PG to itself.  This prevents 
rebalancing or repair from overfilling an OSD.  It should be lower than 
the mon_osd_full_ratio.

The osd_failsafe_full_ratio (.97) is a final sanity check that makes the 
OSD throw out writes if it is really close to full.

It's bad news if an OSD fills up completely so we do what we can to 
prevent it.
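
To make the above concrete, here is a sketch of how these could be overridden 
in ceph.conf for a test cluster that deliberately runs fuller than usual; the 
values are illustrative only, not a recommendation:

  [global]
      mon osd full ratio = .96
      mon osd nearfull ratio = .90

  [osd]
      osd backfill full ratio = .90
      osd failsafe full ratio = .98

The monitor thresholds can also be raised on a running cluster, e.g. with 
"ceph pg set_full_ratio 0.96" and "ceph pg set_nearfull_ratio 0.90".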

 It seems extreme that 1 full OSD out of potentially hundreds would
 cause all I/O into the cluster to stop when there are literally 10s or
 100s of terabytes of space left on other, less-full OSDs.

Yes, but the nature of hash-based distribution is that you don't know 
where a write will go, so you don't want to let the cluster fill up.  85% 
is pretty conservative; you could increase it if you're comfortable.  Just 
be aware that file systems over 80% start to get very slow so it is a 
bad idea to run them this full anyway.

 The confusion for me (and probably for others) is the proliferation of
 full_ratio parameters and a lack of clarity on how they all affect
 the cluster health and ability to balance when things start to fill
 up.
 
 
 
  CRUSH does not take OSD utilization into account when placing data, so
  it's almost impossible to predict which I/O can continue.
 
  Data safety and integrity is priority number 1. Full disks are a danger
  to those priorities, so I/O is stopped.
 
 
 Understood, but 1 full disk out of hundreds should not cause the
 entire system to stop accepting new data or even balancing out the
 data that it already has especially when there is room to grow yet on
 other OSDs.

The proper response to this currently is that if an OSD reaches the 
lower nearfull threshold the admin gets a warning and triggers some 
rebalancing.  That's why it's 10% lower than the actual full cutoff--there 
is plenty of time to adjust weights and/or expand the cluster.

It's not an ideal approach, perhaps, but it's simple and works well 
enough.  And it's not clear that there's anything better we can do that 
isn't also very complicated...

sage


Re: full_ratios - please explain?

2015-02-18 Thread Wyllys Ingersoll
OK, thanks for the clarifications!

-Wyllys


On Wed, Feb 18, 2015 at 10:52 AM, Sage Weil s...@newdream.net wrote:
 On Wed, 18 Feb 2015, Wyllys Ingersoll wrote:
 Thanks!  More below inline...

 On Wed, Feb 18, 2015 at 10:05 AM, Wido den Hollander w...@42on.com wrote:
  On 18-02-15 15:39, Wyllys Ingersoll wrote:
  Can someone explain the interaction and effects of all of these
  full_ratio parameters?  I haven't found any really good explanation of how
  they affect the distribution of data once the cluster gets above the
  nearfull ratio and close to the full ratio.
 
 
  When only ONE (1) OSD goes over the mon_osd_nearfull_ratio the cluster
  goes from HEALTH_OK into HEALTH_WARN state.
 
 
  mon_osd_full_ratio
  mon_osd_nearfull_ratio
 
  osd_backfill_full_ratio
  osd_failsafe_full_ratio
  osd_failsafe_nearfull_ratio
 
  We have a cluster with about 144 OSDs (518 TB) and are trying to get it to a
  90% full rate for testing purposes.
 
  We've found that when some of the OSDs get above the mon_osd_full_ratio
  value (.95 in our system), then it stops accepting any new data, even
  though there is plenty of space left on other OSDs that are not yet even 
  up
  to 90%.  Tweaking the osd_failsafe ratios enabled data to move again for a
  bit, but eventually it becomes unbalanced and stops working again.
 
 
  Yes, that is because with Ceph, safety comes first. When only one OSD goes
  over the full ratio, the whole cluster stops I/O.



 Which full_ratio?  The problem is that there are at least 3
 full_ratios - mon_osd_full_ratio, osd_failsafe_full_ratio, and
 osd_backfill_full_ratio - how do they interact? What is the
 consequence of having one be higher than the others?

 mon_osd_full_ratio (.95) ... when any OSD reaches this threshold the
 monitor marks the cluster as 'full' and client writes are not accepted.

 mon_osd_nearfull_ratio (.85) ... when any OSD reaches this threshold the
 cluster goes HEALTH_WARN and calls out near-full OSDs.

 osd_backfill_full_ratio (.85) ... when an OSD locally reaches this
 threshold it will refuse to migrate a PG to itself.  This prevents
 rebalancing or repair from overfilling an OSD.  It should be lower than
 the mon_osd_full_ratio.

 The osd_failsafe_full_ratio (.97) is a final sanity check that makes the
 OSD throw out writes if it is really close to full.

 It's bad news if an OSD fills up completely so we do what we can to
 prevent it.

 It seems extreme that 1 full OSD out of potentially hundreds would
 cause all I/O into the cluster to stop when there are literally 10s or
 100s of terabytes of space left on other, less-full OSDs.

 Yes, but the nature of hash-based distribution is that you don't know
 where a write will go, so you don't want to let the cluster fill up.  85%
 is pretty conservative; you could increase it if you're comfortable.  Just
 be aware that file systems over 80% start to get very slow so it is a
 bad idea to run them this full anyway.

 The confusion for me (and probably for others) is the proliferation of
 full_ratio parameters and a lack of clarity on how they all affect
 the cluster health and ability to balance when things start to fill
 up.


 
  CRUSH does not take OSD utilization into account when placing data, so
  it's almost impossible to predict which I/O can continue.
 
  Data safety and integrity is priority number 1. Full disks are a danger
  to those priorities, so I/O is stopped.


 Understood, but 1 full disk out of hundreds should not cause the
 entire system to stop accepting new data or even balancing out the
 data that it already has especially when there is room to grow yet on
 other OSDs.

 The proper response to this currently is that if an OSD reaches the
 lower nearfull threshold the admin gets a warning and triggers some
 rebalancing.  That's why it's 10% lower than the actual full cutoff--there
 is plenty of time to adjust weights and/or expand the cluster.

 It's not an ideal approach, perhaps, but it's simple and works well
 enough.  And it's not clear that there's anything better we can do that
 isn't also very complicated...

 sage


Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-18 Thread Mark Nelson

Hi Alex,

Thanks!  I didn't tweak the sharding settings at all, so they are just 
at the default values:


OPTION(osd_op_num_threads_per_shard, OPT_INT, 2)
OPTION(osd_op_num_shards, OPT_INT, 5)

I don't have really good insight yet into how tweaking these would 
affect single-osd performance.  I know the PCIe SSDs do have multiple 
controllers on-board so perhaps increasing the number of shards would 
improve things, but I suspect that going too high could maybe start 
hurting performance as well.  Have you done any testing here?  It could 
be an interesting follow-up paper.
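
For anyone who wants to experiment, a sketch of overriding the two sharding 
options in ceph.conf; the values here are just an example to vary, not a 
tested recommendation:

  [osd]
      osd op num threads per shard = 1
      osd op num shards = 10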


Mark

On 02/18/2015 02:34 AM, Alexandre DERUMIER wrote:

Nice work, Mark!

I don't see any tuning for sharding in the sample config file

(osd_op_num_threads_per_shard, osd_op_num_shards, ...)

As you only used one SSD for the benchmark, I think tuning these should improve 
the results for hammer?



- Original Message -
From: Mark Nelson mnel...@redhat.com
To: ceph-devel ceph-devel@vger.kernel.org
Cc: ceph-users ceph-us...@lists.ceph.com
Sent: Tuesday, February 17, 2015 18:37:01
Subject: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance 
comparison

Hi All,

I wrote up a short document describing some tests I ran recently to look
at how SSD backed OSD performance has changed across our LTS releases.
This is just looking at RADOS performance and not RBD or RGW. It also
doesn't offer any real explanations regarding the results. It's just a
first high level step toward understanding some of the behaviors folks
on the mailing list have reported over the last couple of releases. I
hope you find it useful.

Mark



Re: [ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore performance comparison

2015-02-18 Thread Mark Nelson

Hi Andrei,

On 02/18/2015 09:08 AM, Andrei Mikhailovsky wrote:


Mark, many thanks for your effort on these Ceph performance tests. This puts
things in perspective.

Looking at the results, I was a bit concerned that the IOPS performance
in neither release comes even marginally close to the capabilities of
the underlying SSD device. Even the fastest PCIe SSDs have only managed
to achieve about 1/6th the IOPS of the raw device.


Perspective is definitely good!  Any time you are dealing with latency 
sensitive workloads, there are a lot of bottlenecks that can limit your 
performance.  There's a world of difference between streaming data to a 
raw SSD as fast as possible and writing data out to a distributed 
storage system that is calculating data placement, invoking the TCP 
stack, doing CRC checks, journaling writes, invoking the VM layer to 
cache data in case it's hot (which in this case it's not).




I guess there is a great deal more optimisation to be done in the
upcoming LTS releases to bring the IOPS rate closer to the raw device
performance.


There is definitely still room for improvement!  It's important to 
remember though that there is always going to be a trade off between 
flexibility, data integrity, and performance.  If low latency is your 
number one need before anything else, you are probably best off 
eliminating as much software as possible between you and the device 
(except possibly if you can make clever use of caching).  While Ceph 
itself is sometimes the bottleneck, in many cases we've found that 
bottlenecks in the software that surrounds Ceph are just as big 
obstacles (filesystem, VM layer, TCP stack, leveldb, etc).  If you need 
a distributed storage system that can universally maintain native SSD 
levels of performance, the entire stack has to be highly tuned.




I have done some testing in the past and noticed that despite the server
having a lot of unused resources (about 40-50% server idle and about
60-70% SSD idle), Ceph would not perform well when used with SSDs. I
was testing with Firefly + auth and my IOPS rate was around the 3K mark.
Something is holding Ceph back from performing well with SSDs (((


Out of curiosity, did you try the same tests directly on the SSD?
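
For a raw-device baseline, a minimal fio sketch (/dev/sdX is a placeholder;
randread keeps the test non-destructive):

  fio --name=raw-4k-randread --filename=/dev/sdX --direct=1 --rw=randread \
      --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting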



Andrei



*From: *Mark Nelson mnel...@redhat.com
*To: *ceph-devel ceph-devel@vger.kernel.org
*Cc: *ceph-us...@lists.ceph.com
*Sent: *Tuesday, 17 February, 2015 5:37:01 PM
*Subject: *[ceph-users] Ceph Dumpling/Firefly/Hammer SSD/Memstore
performance comparison

Hi All,

I wrote up a short document describing some tests I ran recently to
look
at how SSD backed OSD performance has changed across our LTS releases.
This is just looking at RADOS performance and not RBD or RGW.  It also
doesn't offer any real explanations regarding the results.  It's just a
first high level step toward understanding some of the behaviors folks
on the mailing list have reported over the last couple of releases.  I
hope you find it useful.

Mark



Re: [PATCH] libceph: fix double __remove_osd() problem

2015-02-18 Thread Alex Elder
On 02/18/2015 07:25 AM, Ilya Dryomov wrote:
 It turns out it's possible to get __remove_osd() called twice on the
 same OSD.  That doesn't sit well with rb_erase() - depending on the
 shape of the tree we can get a NULL dereference, a soft lockup or
 a random crash at some point in the future as we end up touching freed
 memory.  One scenario that I was able to reproduce is as follows:
 
 <osd3 is idle, on the osd lru list>
 <con reset - osd3>
 con_fault_finish()
   osd_reset()
             <osdmap - osd3 down>
             ceph_osdc_handle_map()
               <takes map_sem>
               kick_requests()
                 <takes request_mutex>
                 reset_changed_osds()
                   __reset_osd()
                     __remove_osd()
                 <releases request_mutex>
               <releases map_sem>
     <takes map_sem>
     <takes request_mutex>
     __kick_osd_requests()
       __reset_osd()
         __remove_osd() <-- !!!
 
 A case can be made that osd refcounting is imperfect and reworking it
 would be a proper resolution, but for now Sage and I decided to fix
 this by adding a safe guard around __remove_osd().
 
 Fixes: http://tracker.ceph.com/issues/8087

So now you rely on the RB node in the osd getting cleared
as a signal that it has been removed already.  Yes, that
sounds like refcounting isn't working as desired...

The mutex around all calls to (now) remove_osd() avoids
races.  I like the RB_CLEAR_NODE() call anyway.
OK, enough chit chat.  This looks OK to me.

Reviewed-by: Alex Elder el...@linaro.org

 Cc: Sage Weil sw...@redhat.com
 Cc: sta...@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular 
 and lingering lists in __remove_osd()
 Cc: sta...@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to 
 WARN for __remove_osd() asserts
 Cc: sta...@vger.kernel.org # 3.9+
 Signed-off-by: Ilya Dryomov idryo...@gmail.com
 ---
  net/ceph/osd_client.c | 26 ++++++++++++++++++++--------
  1 file changed, 18 insertions(+), 8 deletions(-)
 
 diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
 index 53299c7b0ca4..f693a2f8ac86 100644
 --- a/net/ceph/osd_client.c
 +++ b/net/ceph/osd_client.c
 @@ -1048,14 +1048,24 @@ static void put_osd(struct ceph_osd *osd)
   */
  static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
  {
 -	dout("__remove_osd %p\n", osd);
 +	dout("%s %p osd%d\n", __func__, osd, osd->o_osd);
  	WARN_ON(!list_empty(&osd->o_requests));
  	WARN_ON(!list_empty(&osd->o_linger_requests));
 
 -	rb_erase(&osd->o_node, &osdc->osds);
  	list_del_init(&osd->o_osd_lru);
 -	ceph_con_close(&osd->o_con);
 -	put_osd(osd);
 +	rb_erase(&osd->o_node, &osdc->osds);
 +	RB_CLEAR_NODE(&osd->o_node);
 +}
 +
 +static void remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
 +{
 +	dout("%s %p osd%d\n", __func__, osd, osd->o_osd);
 +
 +	if (!RB_EMPTY_NODE(&osd->o_node)) {
 +		ceph_con_close(&osd->o_con);
 +		__remove_osd(osdc, osd);
 +		put_osd(osd);
 +	}
  }
 
  static void remove_all_osds(struct ceph_osd_client *osdc)
 @@ -1065,7 +1075,7 @@ static void remove_all_osds(struct ceph_osd_client *osdc)
  	while (!RB_EMPTY_ROOT(&osdc->osds)) {
  		struct ceph_osd *osd = rb_entry(rb_first(&osdc->osds),
  						struct ceph_osd, o_node);
 -		__remove_osd(osdc, osd);
 +		remove_osd(osdc, osd);
  	}
  	mutex_unlock(&osdc->request_mutex);
  }
 @@ -1106,7 +1116,7 @@ static void remove_old_osds(struct ceph_osd_client *osdc)
  	list_for_each_entry_safe(osd, nosd, &osdc->osd_lru, o_osd_lru) {
  		if (time_before(jiffies, osd->lru_ttl))
  			break;
 -		__remove_osd(osdc, osd);
 +		remove_osd(osdc, osd);
  	}
  	mutex_unlock(&osdc->request_mutex);
  }
 @@ -1121,8 +1131,7 @@ static int __reset_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
  	dout("__reset_osd %p osd%d\n", osd, osd->o_osd);
  	if (list_empty(&osd->o_requests) &&
  	    list_empty(&osd->o_linger_requests)) {
 -		__remove_osd(osdc, osd);
 -
 +		remove_osd(osdc, osd);
  		return -ENODEV;
  	}
 
 @@ -1926,6 +1935,7 @@ static void reset_changed_osds(struct ceph_osd_client *osdc)
  {
  	struct rb_node *p, *n;
 
 +	dout("%s %p\n", __func__, osdc);
  	for (p = rb_first(&osdc->osds); p; p = n) {
  		struct ceph_osd *osd = rb_entry(p, struct ceph_osd, o_node);
 



[PATCH] libceph: fix double __remove_osd() problem

2015-02-18 Thread Ilya Dryomov
It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

<osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
  osd_reset()
            <osdmap - osd3 down>
            ceph_osdc_handle_map()
              <takes map_sem>
              kick_requests()
                <takes request_mutex>
                reset_changed_osds()
                  __reset_osd()
                    __remove_osd()
                <releases request_mutex>
              <releases map_sem>
    <takes map_sem>
    <takes request_mutex>
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() <-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safe guard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil sw...@redhat.com
Cc: sta...@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular 
and lingering lists in __remove_osd()
Cc: sta...@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to 
WARN for __remove_osd() asserts
Cc: sta...@vger.kernel.org # 3.9+
Signed-off-by: Ilya Dryomov idryo...@gmail.com
---
 net/ceph/osd_client.c | 26 ++++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 53299c7b0ca4..f693a2f8ac86 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1048,14 +1048,24 @@ static void put_osd(struct ceph_osd *osd)
  */
 static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
 {
-	dout("__remove_osd %p\n", osd);
+	dout("%s %p osd%d\n", __func__, osd, osd->o_osd);
 	WARN_ON(!list_empty(&osd->o_requests));
 	WARN_ON(!list_empty(&osd->o_linger_requests));
 
-	rb_erase(&osd->o_node, &osdc->osds);
 	list_del_init(&osd->o_osd_lru);
-	ceph_con_close(&osd->o_con);
-	put_osd(osd);
+	rb_erase(&osd->o_node, &osdc->osds);
+	RB_CLEAR_NODE(&osd->o_node);
+}
+
+static void remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
+{
+	dout("%s %p osd%d\n", __func__, osd, osd->o_osd);
+
+	if (!RB_EMPTY_NODE(&osd->o_node)) {
+		ceph_con_close(&osd->o_con);
+		__remove_osd(osdc, osd);
+		put_osd(osd);
+	}
 }
 
 static void remove_all_osds(struct ceph_osd_client *osdc)
@@ -1065,7 +1075,7 @@ static void remove_all_osds(struct ceph_osd_client *osdc)
 	while (!RB_EMPTY_ROOT(&osdc->osds)) {
 		struct ceph_osd *osd = rb_entry(rb_first(&osdc->osds),
 						struct ceph_osd, o_node);
-		__remove_osd(osdc, osd);
+		remove_osd(osdc, osd);
 	}
 	mutex_unlock(&osdc->request_mutex);
 }
@@ -1106,7 +1116,7 @@ static void remove_old_osds(struct ceph_osd_client *osdc)
 	list_for_each_entry_safe(osd, nosd, &osdc->osd_lru, o_osd_lru) {
 		if (time_before(jiffies, osd->lru_ttl))
 			break;
-		__remove_osd(osdc, osd);
+		remove_osd(osdc, osd);
 	}
 	mutex_unlock(&osdc->request_mutex);
 }
@@ -1121,8 +1131,7 @@ static int __reset_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
 	dout("__reset_osd %p osd%d\n", osd, osd->o_osd);
 	if (list_empty(&osd->o_requests) &&
 	    list_empty(&osd->o_linger_requests)) {
-		__remove_osd(osdc, osd);
-
+		remove_osd(osdc, osd);
 		return -ENODEV;
 	}
 
@@ -1926,6 +1935,7 @@ static void reset_changed_osds(struct ceph_osd_client *osdc)
 {
 	struct rb_node *p, *n;
 
+	dout("%s %p\n", __func__, osdc);
 	for (p = rb_first(&osdc->osds); p; p = n) {
 		struct ceph_osd *osd = rb_entry(p, struct ceph_osd, o_node);
 
-- 
1.9.3



Re: full_ratios - please explain?

2015-02-18 Thread Wido den Hollander
On 18-02-15 15:39, Wyllys Ingersoll wrote:
 Can someone explain the interaction and effects of all of these
 full_ratio parameters?  I haven't found any really good explanation of how
 they affect the distribution of data once the cluster gets above the
 nearfull ratio and close to the full ratio.
 

When only ONE (1) OSD goes over the mon_osd_nearfull_ratio the cluster
goes from HEALTH_OK into HEALTH_WARN state.

 
 mon_osd_full_ratio
 mon_osd_nearfull_ratio
 
 osd_backfill_full_ratio
 osd_failsafe_full_ratio
 osd_failsafe_nearfull_ratio
 
 We have a cluster with about 144 OSDs (518 TB) and are trying to get it to a
 90% full rate for testing purposes.
 
 We've found that when some of the OSDs get above the mon_osd_full_ratio
 value (.95 in our system), the cluster stops accepting any new data, even
 though there is plenty of space left on other OSDs that are not yet even up
 to 90%.  Tweaking the osd_failsafe ratios enabled data to move again for a
 bit, but eventually the cluster became unbalanced and stopped working again.
 

Yes, that is because with Ceph safety comes first. When even one OSD goes
over the full ratio, the whole cluster stops I/O.

CRUSH does not take OSD utilization into account when placing data, so
it's almost impossible to predict which I/O could safely continue.

Data safety and integrity are priority number 1. Full disks are a danger
to those priorities, so I/O is stopped.

 Is there a recommended combination of values to use that will allow the
 cluster to continue accepting data and rebalancing correctly above 90%?
 

No, not with those values. Monitor your filesystems and make sure they stay
below those values. If one OSD becomes too full you can weight it down using
CRUSH to move some data away from it.
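
For illustration, weighting down an over-full OSD might look like the
following (osd.17 and the weight values are made-up placeholders, not a
recommendation):

  # Near-full OSDs are listed in the health details.
  ceph health detail

  # Temporary override weight in [0..1]; data moves off osd.17 until
  # the weight is raised back towards 1.0.
  ceph osd reweight 17 0.85

  # Or lower the CRUSH weight itself (persistent; in the same units as
  # the rest of the CRUSH map, typically the disk size in TB).
  ceph osd crush reweight osd.17 1.6

The override weight is the lighter touch of the two, since it leaves the
CRUSH map itself untouched.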

 thanks,
  Wyllys Ingersoll
 


-- 
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [PATCH] libceph: fix double __remove_osd() problem

2015-02-18 Thread Sage Weil
On Wed, 18 Feb 2015, Ilya Dryomov wrote:
 It turns out it's possible to get __remove_osd() called twice on the
 same OSD.  That doesn't sit well with rb_erase() - depending on the
 shape of the tree we can get a NULL dereference, a soft lockup or
 a random crash at some point in the future as we end up touching freed
 memory.  One scenario that I was able to reproduce is as follows:
 
 osd3 is idle, on the osd lru list
 con reset -> osd3
 con_fault_finish()
   osd_reset()
   osdmap -> osd3 down
   ceph_osdc_handle_map()
 takes map_sem
 kick_requests()
   takes request_mutex
   reset_changed_osds()
 __reset_osd()
   __remove_osd()
   releases request_mutex
 releases map_sem
 takes map_sem
 takes request_mutex
 __kick_osd_requests()
   __reset_osd()
 __remove_osd() <-- !!!
 
 A case can be made that osd refcounting is imperfect and reworking it
 would be a proper resolution, but for now Sage and I decided to fix
 this by adding a safeguard around __remove_osd().
 
 Fixes: http://tracker.ceph.com/issues/8087
 
 Cc: Sage Weil sw...@redhat.com
 Cc: sta...@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd()
 Cc: sta...@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts
 Cc: sta...@vger.kernel.org # 3.9+
 Signed-off-by: Ilya Dryomov idryo...@gmail.com

Reviewed-by: Sage Weil s...@redhat.com


 
 


Re: [PATCH] libceph: kfree() in put_osd() shouldn't depend on authorizer

2015-02-18 Thread Sage Weil
On Wed, 18 Feb 2015, Ilya Dryomov wrote:
 a255651d4cad ("ceph: ensure auth ops are defined before use") made
 kfree() in put_osd() conditional on the authorizer.  A mechanical
 mistake most likely - fix it.
 
 Cc: Alex Elder el...@linaro.org
 Signed-off-by: Ilya Dryomov idryo...@gmail.com

Reviewed-by: Sage Weil s...@redhat.com
 ---
  net/ceph/osd_client.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
 index f693a2f8ac86..41a4abc7e98e 100644
 --- a/net/ceph/osd_client.c
 +++ b/net/ceph/osd_client.c
 @@ -1035,10 +1035,11 @@ static void put_osd(struct ceph_osd *osd)
  {
      dout("put_osd %p %d -> %d\n", osd, atomic_read(&osd->o_ref),
           atomic_read(&osd->o_ref) - 1);
 -    if (atomic_dec_and_test(&osd->o_ref) && osd->o_auth.authorizer) {
 +    if (atomic_dec_and_test(&osd->o_ref)) {
          struct ceph_auth_client *ac = osd->o_osdc->client->monc.auth;
  
 -        ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer);
 +        if (osd->o_auth.authorizer)
 +            ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer);
          kfree(osd);
      }
  }
 -- 
 1.9.3
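
To spell out the consequence with a simplified plain-C sketch (not the
kernel code; the struct and function names here are made up): with the old
"ref hits zero && authorizer" condition, an osd whose authorizer was never
set up skipped the whole block on its final put, including the kfree(),
so the struct leaked.

  #include <stdlib.h>

  struct obj {
      int ref;
      void *authorizer;         /* may legitimately be NULL */
  };

  static void put_obj_buggy(struct obj *o)
  {
      if (--o->ref == 0 && o->authorizer) {   /* NULL authorizer => leak */
          free(o->authorizer);
          free(o);
      }
  }

  static void put_obj_fixed(struct obj *o)
  {
      if (--o->ref == 0) {
          if (o->authorizer)
              free(o->authorizer);
          free(o);              /* always freed on the last put */
      }
  }

  int main(void)
  {
      struct obj *o = calloc(1, sizeof(*o));

      o->ref = 1;
      put_obj_fixed(o);         /* freed; put_obj_buggy() would leak it */
      return 0;
  }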
 
 
 


02/18/2015 Weekly Ceph Performance Meeting IS ON!

2015-02-18 Thread Mark Nelson
8AM PST as usual!  Please add an agenda item if there is something you 
want to talk about.  I'll be talking a little bit about some of the SSD 
testing we posted about on the list yesterday.


Here are the links:

Etherpad URL:
http://pad.ceph.com/p/performance_weekly

To join the Meeting:
https://bluejeans.com/268261044

To join via Browser:
https://bluejeans.com/268261044/browser

To join with Lync:
https://bluejeans.com/268261044/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
  +1 408 740 7256
  +1 888 240 2560(US Toll Free)
  +1 408 317 9253(Alternate Number)
  (see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark


Re: full_ratios - please explain?

2015-02-18 Thread Wyllys Ingersoll
Thanks!  More below inline...

On Wed, Feb 18, 2015 at 10:05 AM, Wido den Hollander w...@42on.com wrote:
 On 18-02-15 15:39, Wyllys Ingersoll wrote:
 Can someone explain the interaction and effects of all of these
 full_ratio parameters?  I haven't found any real good explanation of how
 they affect the distribution of data once the cluster gets above the
 nearfull ratio and close to the full ratio.


 When only ONE (1) OSD goes over the mon_osd_nearfull_ratio, the cluster
 goes from HEALTH_OK into the HEALTH_WARN state.


 mon_osd_full_ratio
 mon_osd_nearfull_ratio

 osd_backfill_full_ratio
 osd_failsafe_full_ratio
 osd_failsafe_nearfull_ratio

 We have a cluster with about 144 OSDs (518 TB) and are trying to get it to a
 90% full rate for testing purposes.

 We've found that when some of the OSDs get above the mon_osd_full_ratio
 value (.95 in our system), the cluster stops accepting any new data, even
 though there is plenty of space left on other OSDs that are not yet even up
 to 90%.  Tweaking the osd_failsafe ratios enabled data to move again for a
 bit, but eventually the cluster became unbalanced and stopped working again.


 Yes, that is because with Ceph safety comes first. When even one OSD goes
 over the full ratio, the whole cluster stops I/O.



Which full_ratio?  The problem is that there are at least 3
full_ratios - mon_osd_full_ratio, osd_failsafe_full_ratio, and
osd_backfill_full_ratio - how do they interact? What is the
consequence of having one be higher than the others?


It seems extreme that 1 full OSD out of potentially hundreds would
cause all I/O into the cluster to stop when there are literally tens or
hundreds of terabytes of space left on other, less-full OSDs.

The confusion for me (and probably for others) is the proliferation of
full_ratio parameters and a lack of clarity on how they all affect
the cluster health and ability to balance when things start to fill
up.
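
For reference, my understanding of the defaults (worth double-checking
against the docs for the version you run) is roughly:

  [global]
  mon osd nearfull ratio = .85     # any OSD above this => HEALTH_WARN
  mon osd full ratio = .95         # any OSD above this => writes stop
  osd backfill full ratio = .85    # OSD refuses backfill above this
  osd failsafe nearfull ratio = .90
  osd failsafe full ratio = .97    # hard stop, last line of defense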



 CRUSH does not take OSD utilization into account when placing data, so
 it's almost impossible to predict which I/O could safely continue.

 Data safety and integrity are priority number 1. Full disks are a danger
 to those priorities, so I/O is stopped.


Understood, but 1 full disk out of hundreds should not cause the
entire system to stop accepting new data, or even stop balancing the
data it already has, especially when there is still room to grow on
other OSDs.

If 1 disk reaches the full_ratio, but 99 (or 999) others are still
well below that value, why doesn't the data get balanced out (assuming the
CRUSH map considers all OSDs equal and all the pools have similar
pg_num values)?


disk failure prediction

2015-02-18 Thread Sage Weil
Interesting paper at FAST:


https://www.usenix.org/system/files/conference/fast15/fast15-paper-ma.pdf

Short version: reallocated sectors correlate with impending disk
failures (this sounds like what Sandon has been telling us for ages), and
by preemptively replacing disks with impending failures EMC reduced its
rate of triple-failures by 80%; looking at the joint failure probability
within each RAID set reduced the failure rate by 98%.  We wouldn't see
quite the same results since our RAID sets are effectively entire pools,
but this seems like a strong case for adding SMART monitoring to the OSDs
or to Calamari already and doing some preemptive disk replacement.
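
A minimal sketch of the kind of check this could start from (smartctl is
part of smartmontools; the device path is a placeholder):

  # Print the reallocated-sector attribute the paper keys on; a raw
  # value that starts climbing flags the disk for preemptive replacement.
  smartctl -A /dev/sda | awk '$2 == "Reallocated_Sector_Ct" { print $2, $10 }'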

sage