Re: issue 8747 / 9011
Hi Sage,

On Fri, 19 Sep 2014 16:43:31 Sage Weil wrote:
> Are you still seeing this crash?
>
> osd/ReplicatedPG.cc: 5297: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

Thanks for following up on this, Sage. Yes, I've seen this crash just recently on 0.80.5. It usually happens during long recovery, like when an OSD is replaced. I've seen this happening after hours of backfilling/remapping, although it may take a long time to manifest.

--
Cheers,
 Dmitry Smirnov
 GPG key : 4096R/53968D1B

---
However beautiful the strategy, you should occasionally look at the results.
        -- Winston Churchill
issue 8747 / 9011
Hey Dmitry,

Are you still seeing this crash?

osd/ReplicatedPG.cc: 5297: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

We haven't turned it up in our testing in the last two months, so we still have no log of it occurring.

Thanks!
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
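The assert in question enforces that an object being pushed during recovery lies outside the half-open chunk [scrubber.start, scrubber.end) that the scrubber currently holds. A minimal sketch of that invariant, in Python with plain strings standing in for hobject_t and all surrounding OSD code omitted:

```python
def outside_scrub_chunk(soid, scrub_start, scrub_end):
    """Mirror of `soid < scrubber.start || soid >= scrubber.end`:
    True when recovering `soid` cannot race with the active scrub chunk."""
    return soid < scrub_start or soid >= scrub_end

# Objects sorting before the chunk, or at/after its exclusive end, are safe:
assert outside_scrub_chunk("obj_a", "obj_b", "obj_d")
assert outside_scrub_chunk("obj_d", "obj_b", "obj_d")  # end is exclusive

# An object inside the chunk violates the invariant; in the OSD this is
# exactly the FAILED assert at ReplicatedPG.cc:5297.
assert not outside_scrub_chunk("obj_c", "obj_b", "obj_d")
```

The crash reports above correspond to the third case: during long backfills an object under recovery ends up inside the locked scrub chunk.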
Re: Fwd: S3 API Compatibility support
On Fri, Sep 19, 2014 at 8:32 AM, M Ranga Swami Reddy wrote: > Hi Sage, > Thanks for quick reply. > >>Ceph doesn't interact at all with AWS services like Glacier, if that's > > No. I meant that - Ceph interaction with a glacier and RRS type of > storages along with currently used OSD (or standard storage). > >>what you mean. >>For RRS, though, I assume you mean the ability to create buckets with >>reduced redundancy with radosgw? That is supported, although not quite >>the way AWS does it. You can create different pools that back RGW >>buckets, and each bucket is stored in one of those pools. So you could >>make one of them 2x instead of 3x, or use an erasure code of your choice. > > Yes, we can confiure ceph to use 2x replicas, which will look like > reduced redundancy, but AWS uses a separate RRS storage-low cost > (instead of > standard) storage for this purpose. I am checking, if we could > similarly in ceph too. You can use multiple placement targets and can specify on bucket creation which placement target to use. At this time we don't support the exact S3 reduced redundancy fields, although it should be pretty easy to add. > >>What isn't currently supported is the ability to reduce the redundancy of >>individual objects in a bucket. I don't think there is anything >>architecturally preventing that, but it is not implemented or supported. > > OK. Do we have the issue id for the above? Else, we can file one. Please > advise. I think #8929 would cover it. Yehuda > >>When we look at the S3 archival features in more detail (soon!) I'm sure >>this will come up! The current plan is to address object versioning >>first. That is, unless a developer surfaces who wants to start hacking on >>this right away... > > Great to know this. Even we are keen with S3 support in Ceph and we > are happy support you here. 
> > Thanks > Swami > > On Fri, Sep 19, 2014 at 11:08 AM, Sage Weil wrote: >> On Fri, 19 Sep 2014, M Ranga Swami Reddy wrote: >>> Hi Sage, >>> Could you please advise, if Ceph support the low cost object >>> storages(like Amazon Glacier or RRS) for archiving objects like log >>> file etc.? >> >> Ceph doesn't interact at all with AWS services like Glacier, if that's >> what you mean. >> >> For RRS, though, I assume you mean the ability to create buckets with >> reduced redundancy with radosgw? That is supported, although not quite >> the way AWS does it. You can create different pools that back RGW >> buckets, and each bucket is stored in one of those pools. So you could >> make one of them 2x instead of 3x, or use an erasure code of your choice. >> >> What isn't currently supported is the ability to reduce the redundancy of >> individual objects in a bucket. I don't think there is anything >> architecturally preventing that, but it is not implemented or supported. >> >> When we look at the S3 archival features in more detail (soon!) I'm sure >> this will come up! The current plan is to address object versioning >> first. That is, unless a developer surfaces who wants to start hacking on >> this right away... >> >> sage >> >> >> >>> >>> Thanks >>> Swami >>> >>> On Thu, Sep 18, 2014 at 6:20 PM, M Ranga Swami Reddy >>> wrote: >>> > Hi , >>> > >>> > Could you please check and clarify the below question on object >>> > lifecycle and notification S3 APIs support: >>> > >>> > 1. To support the bucket lifecycle - we need to support the >>> > moving/deleting the objects/buckets based lifecycle settings. >>> > For ex: If an object lifecyle set as below: >>> > 1. Archive it after 10 days - means move this object to low >>> > cost object storage after 10 days of the creation date. >>> >2. Remove this object after 90days - mean remove this >>> > object from the low cost object after 90days of creation date. 
>>> > >>> > Q1- Does the ceph support the above concept like moving to low cost >>> > storage and delete from that storage? >>> > >>> > 2. To support the object notifications: >>> > - First there should be low cost and high availability storage >>> > with single replica only. If an object created with this type of >>> > object storage, >>> > There could be chances that object could lose, so if an object >>> > of this type of storage lost, set the notifications. >>> > >>> > Q2- Does Ceph support low cost and high availability storage type? >>> > >>> > Thanks >>> > >>> > On Fri, Sep 12, 2014 at 8:00 PM, M Ranga Swami Reddy >>> > wrote: >>> >> Hi Yehuda, >>> >> >>> >> Could you please check and clarify the below question on object >>> >> lifecycle and notification S3 APIs support: >>> >> >>> >> 1. To support the bucket lifecycle - we need to support the >>> >> moving/deleting the objects/buckets based lifecycle settings. >>> >> For ex: If an object lifecyle set as below: >>> >> 1. Archive it after 10 days - means move this object to low >>> >> cost object storage after 10 days of the creation date. >>> >>2. Remove this object after 90days -
Re: Fwd: S3 API Compatibility support
> What do you mean by "RRS storage-low cost storage"? My read of the RRS
> numbers is that they simply have a different tier of S3 that runs fewer
> replicas and (probably) cheaper disks. In radosgw-land, this would just
> be a different rados pool with 2x replicas and (probably) a CRUSH rule
> mapping it to different hardware (with bigger and/or cheaper disks).

That's correct. If we could do this with a different rados pool using 2x replicas, along with a CRUSH rule mapping it to different hardware (with bigger and cheaper disks), then it's the same as the RRS support in AWS.

>> > What isn't currently supported is the ability to reduce the redundancy of
>> > individual objects in a bucket. I don't think there is anything
>> > architecturally preventing that, but it is not implemented or supported.
>>
>> OK. Do we have the issue id for the above? Else, we can file one. Please advise.

> There is the main #4099 issue for object expiration, but there is no real
> detail there. The plan is (as always) to have equivalent functionality to S3.
>
> Do you mind creating a new feature ticket that specifically references the
> ability to move objects to a second storage tier based on policy? Any
> references to AWS docs about the API or functionality would be helpful in
> the ticket.

Sure, I will create a new feature ticket and add the needful information there.

Thanks
Swami

On Fri, Sep 19, 2014 at 9:08 PM, Sage Weil wrote:
> On Fri, 19 Sep 2014, M Ranga Swami Reddy wrote:
>> Hi Sage,
>> Thanks for the quick reply.
>>
>> > what you mean.
>> > For RRS, though, I assume you mean the ability to create buckets with
>> > reduced redundancy with radosgw? That is supported, although not quite
>> > the way AWS does it. You can create different pools that back RGW
>> > buckets, and each bucket is stored in one of those pools. So you could
>> > make one of them 2x instead of 3x, or use an erasure code of your choice.
>> >> Yes, we can confiure ceph to use 2x replicas, which will look like >> reduced redundancy, but AWS uses a separate RRS storage-low cost >> (instead of >> standard) storage for this purpose. I am checking, if we could >> similarly in ceph too. > > What do you mean by "RRS storage-low cost storage"? My read of the RRS > numbers is that they simply have a different tier of S3 that runs fewer > replicas and (probably) cheaper disks. In radosgw-land, this would just > be a different rados pool with 2x replicas and (probably) a CRUSH rule > mapping it to different hardware (with bigger and/or cheaper disks). > >> >What isn't currently supported is the ability to reduce the redundancy of >> >individual objects in a bucket. I don't think there is anything >> >architecturally preventing that, but it is not implemented or supported. >> >> OK. Do we have the issue id for the above? Else, we can file one. Please >> advise. > > There is the main #4099 issue for object expiration, but there is no real > detail there. The plan is (as always) to have equivalent functionality to > S3. > > Do you mind creating a new feature ticket that specifically references the > ability to move objects to a second storage tier based on policy? Any > references to AWS docs about the API or functionality would be helpful in > the ticket. > >> >When we look at the S3 archival features in more detail (soon!) I'm sure >> >this will come up! The current plan is to address object versioning >> >first. That is, unless a developer surfaces who wants to start hacking on >> >this right away... >> >> Great to know this. Even we are keen with S3 support in Ceph and we >> are happy support you here. > > Great to hear! 
> > Thanks- > sage > > >> >> Thanks >> Swami >> >> On Fri, Sep 19, 2014 at 11:08 AM, Sage Weil wrote: >> > On Fri, 19 Sep 2014, M Ranga Swami Reddy wrote: >> >> Hi Sage, >> >> Could you please advise, if Ceph support the low cost object >> >> storages(like Amazon Glacier or RRS) for archiving objects like log >> >> file etc.? >> > >> > Ceph doesn't interact at all with AWS services like Glacier, if that's >> > what you mean. >> > >> > For RRS, though, I assume you mean the ability to create buckets with >> > reduced redundancy with radosgw? That is supported, although not quite >> > the way AWS does it. You can create different pools that back RGW >> > buckets, and each bucket is stored in one of those pools. So you could >> > make one of them 2x instead of 3x, or use an erasure code of your choice. >> > >> > What isn't currently supported is the ability to reduce the redundancy of >> > individual objects in a bucket. I don't think there is anything >> > architecturally preventing that, but it is not implemented or supported. >> > >> > When we look at the S3 archival features in more detail (soon!) I'm sure >> > this will come up! The current plan is to address object versioning >> > first. That is, unless a developer surfaces who wants to start hacking on >> > this right away... >> > >> > sage >> > >> > >> > >> >> >> >> Thanks >> >> Swami >> >> >> >
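For reference, the reduced-redundancy pool setup Sage describes can be sketched with firefly-era CLI commands. The pool name, PG counts, and ruleset id below are assumptions for illustration only, and the CRUSH ruleset must already map to the bigger/cheaper hardware; this is not a command sequence from the thread itself.

```shell
# Hypothetical reduced-redundancy pool backing RGW bucket data.
# Assumes CRUSH ruleset 1 already selects the cheap-disk tier.
ceph osd pool create .rgw.buckets.rrs 128 128
ceph osd pool set .rgw.buckets.rrs size 2           # 2x instead of 3x
ceph osd pool set .rgw.buckets.rrs crush_ruleset 1  # place on the cheap tier
```

radosgw placement targets can then be pointed at this pool in the region/zone configuration, with the placement target selected at bucket creation time as Yehuda notes above.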
Re: why ZFS on ceph is unstable?
On 09/19/2014 10:40 AM, Eric Eastman wrote:
>> Hi developers, it is mentioned in the source code that
>> OPTION(filestore_zfs_snap, OPT_BOOL, false) // zfsonlinux is still unstable
>> So if we turn on filestore_zfs_snap and neglect the journal like btrfs, will it be unstable?
>> As is mentioned in the ZFS on Linux community, it is stable enough to run a ZFS root
>> filesystem on a GNU/Linux installation for your workstation as something to play around
>> with. It is copy-on-write, supports compression, deduplication, file atomicity, off-disk
>> caching (encryption not supported), and much more. So it seems that all features are
>> supported except for encryption.
>> Thus, I am puzzled: by "unstable", do you mean that ZFS itself is unstable, or that it
>> is now already stable on Linux but still unstable when used as the Ceph FileStore
>> filesystem? If so, what will happen if we use it: losing data or frequent crashes?
>
> In testing I did last year, there were multiple issues with using ZFS for my OSD
> backend that would lock up the ZFS file systems and take the OSD down. Several of
> these have been fixed by the ZFS team. See:
>
> https://github.com/zfsonlinux/zfs/issues/1891
> https://github.com/zfsonlinux/zfs/issues/1961
> https://github.com/zfsonlinux/zfs/issues/2015
>
> The recommendation is to use xattr=sa, but looking at the current open issues for
> ZFS, there still seem to be issues with this option. See:
>
> https://github.com/zfsonlinux/zfs/issues/2700
> https://github.com/zfsonlinux/zfs/issues/2717
> https://github.com/zfsonlinux/zfs/issues/2663
>
> and others

SA xattrs are pretty important from a performance perspective for Ceph on ZFS, based on some testing I did a while back with Brian Behlendorf.

> Also, per the recent ZFS posting on ClusterHQ, aio will not be supported until
> 0.6.4, so the following needs to be added to your ceph.conf file:
>
> filestore zfs_snap = 1
> journal aio = 0
> journal dio = 0
>
> My plans are to retest ZFS as an OSD backend once ZFS version 0.6.4 has been
> released. Please test ZFS with Ceph, and submit bugs, as this is how it will get
> stable enough to use in production.
>
> Eric
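Putting the thread's recommendations together, an OSD section for a ZFS-backed filestore would look roughly like the fragment below. This is a sketch assembled from the settings quoted above, not a tested configuration; the SA-xattr step is done on the ZFS side with something like `zfs set xattr=sa <pool>/<dataset>` (dataset name is an assumption).

```
[osd]
# Use ZFS snapshots for filestore consistency instead of the btrfs-style path.
filestore zfs_snap = 1
# ZFS on Linux does not support aio until 0.6.4, so disable aio/dio journaling.
journal aio = 0
journal dio = 0
```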
Re: Fwd: S3 API Compatibility support
On Fri, 19 Sep 2014, M Ranga Swami Reddy wrote: > Hi Sage, > Thanks for quick reply. > > >what you mean. > >For RRS, though, I assume you mean the ability to create buckets with > >reduced redundancy with radosgw? That is supported, although not quite > >the way AWS does it. You can create different pools that back RGW > >buckets, and each bucket is stored in one of those pools. So you could > >make one of them 2x instead of 3x, or use an erasure code of your choice. > > Yes, we can confiure ceph to use 2x replicas, which will look like > reduced redundancy, but AWS uses a separate RRS storage-low cost > (instead of > standard) storage for this purpose. I am checking, if we could > similarly in ceph too. What do you mean by "RRS storage-low cost storage"? My read of the RRS numbers is that they simply have a different tier of S3 that runs fewer replicas and (probably) cheaper disks. In radosgw-land, this would just be a different rados pool with 2x replicas and (probably) a CRUSH rule mapping it to different hardware (with bigger and/or cheaper disks). > >What isn't currently supported is the ability to reduce the redundancy of > >individual objects in a bucket. I don't think there is anything > >architecturally preventing that, but it is not implemented or supported. > > OK. Do we have the issue id for the above? Else, we can file one. Please > advise. There is the main #4099 issue for object expiration, but there is no real detail there. The plan is (as always) to have equivalent functionality to S3. Do you mind creating a new feature ticket that specifically references the ability to move objects to a second storage tier based on policy? Any references to AWS docs about the API or functionality would be helpful in the ticket. > >When we look at the S3 archival features in more detail (soon!) I'm sure > >this will come up! The current plan is to address object versioning > >first. 
That is, unless a developer surfaces who wants to start hacking on > >this right away... > > Great to know this. Even we are keen with S3 support in Ceph and we > are happy support you here. Great to hear! Thanks- sage > > Thanks > Swami > > On Fri, Sep 19, 2014 at 11:08 AM, Sage Weil wrote: > > On Fri, 19 Sep 2014, M Ranga Swami Reddy wrote: > >> Hi Sage, > >> Could you please advise, if Ceph support the low cost object > >> storages(like Amazon Glacier or RRS) for archiving objects like log > >> file etc.? > > > > Ceph doesn't interact at all with AWS services like Glacier, if that's > > what you mean. > > > > For RRS, though, I assume you mean the ability to create buckets with > > reduced redundancy with radosgw? That is supported, although not quite > > the way AWS does it. You can create different pools that back RGW > > buckets, and each bucket is stored in one of those pools. So you could > > make one of them 2x instead of 3x, or use an erasure code of your choice. > > > > What isn't currently supported is the ability to reduce the redundancy of > > individual objects in a bucket. I don't think there is anything > > architecturally preventing that, but it is not implemented or supported. > > > > When we look at the S3 archival features in more detail (soon!) I'm sure > > this will come up! The current plan is to address object versioning > > first. That is, unless a developer surfaces who wants to start hacking on > > this right away... > > > > sage > > > > > > > >> > >> Thanks > >> Swami > >> > >> On Thu, Sep 18, 2014 at 6:20 PM, M Ranga Swami Reddy > >> wrote: > >> > Hi , > >> > > >> > Could you please check and clarify the below question on object > >> > lifecycle and notification S3 APIs support: > >> > > >> > 1. To support the bucket lifecycle - we need to support the > >> > moving/deleting the objects/buckets based lifecycle settings. > >> > For ex: If an object lifecyle set as below: > >> > 1. 
Archive it after 10 days - means move this object to low > >> > cost object storage after 10 days of the creation date. > >> >2. Remove this object after 90days - mean remove this > >> > object from the low cost object after 90days of creation date. > >> > > >> > Q1- Does the ceph support the above concept like moving to low cost > >> > storage and delete from that storage? > >> > > >> > 2. To support the object notifications: > >> > - First there should be low cost and high availability storage > >> > with single replica only. If an object created with this type of > >> > object storage, > >> > There could be chances that object could lose, so if an object > >> > of this type of storage lost, set the notifications. > >> > > >> > Q2- Does Ceph support low cost and high availability storage type? > >> > > >> > Thanks > >> > > >> > On Fri, Sep 12, 2014 at 8:00 PM, M Ranga Swami Reddy > >> > wrote: > >> >> Hi Yehuda, > >> >> > >> >> Could you please check and clarify the below question on object > >> >> lifecycle and notification S3 APIs support: >
Re: Fwd: S3 API Compatibility support
Hi Sage,
Thanks for the quick reply.

> Ceph doesn't interact at all with AWS services like Glacier, if that's

No. I meant Ceph interacting with Glacier- and RRS-type storage along with the currently used OSDs (or standard storage).

> what you mean.
> For RRS, though, I assume you mean the ability to create buckets with
> reduced redundancy with radosgw? That is supported, although not quite
> the way AWS does it. You can create different pools that back RGW
> buckets, and each bucket is stored in one of those pools. So you could
> make one of them 2x instead of 3x, or use an erasure code of your choice.

Yes, we can configure Ceph to use 2x replicas, which will look like reduced redundancy, but AWS uses a separate low-cost RRS storage class (instead of standard storage) for this purpose. I am checking if we could do similarly in Ceph too.

> What isn't currently supported is the ability to reduce the redundancy of
> individual objects in a bucket. I don't think there is anything
> architecturally preventing that, but it is not implemented or supported.

OK. Do we have the issue id for the above? Else, we can file one. Please advise.

> When we look at the S3 archival features in more detail (soon!) I'm sure
> this will come up! The current plan is to address object versioning
> first. That is, unless a developer surfaces who wants to start hacking on
> this right away...

Great to know this. We are also keen on S3 support in Ceph and happy to support you here.

Thanks
Swami

On Fri, Sep 19, 2014 at 11:08 AM, Sage Weil wrote:
> On Fri, 19 Sep 2014, M Ranga Swami Reddy wrote:
>> Hi Sage,
>> Could you please advise if Ceph supports low-cost object
>> storage (like Amazon Glacier or RRS) for archiving objects like log
>> files, etc.?
>
> Ceph doesn't interact at all with AWS services like Glacier, if that's
> what you mean.
>
> For RRS, though, I assume you mean the ability to create buckets with
> reduced redundancy with radosgw?
That is supported, although not quite > the way AWS does it. You can create different pools that back RGW > buckets, and each bucket is stored in one of those pools. So you could > make one of them 2x instead of 3x, or use an erasure code of your choice. > > What isn't currently supported is the ability to reduce the redundancy of > individual objects in a bucket. I don't think there is anything > architecturally preventing that, but it is not implemented or supported. > > When we look at the S3 archival features in more detail (soon!) I'm sure > this will come up! The current plan is to address object versioning > first. That is, unless a developer surfaces who wants to start hacking on > this right away... > > sage > > > >> >> Thanks >> Swami >> >> On Thu, Sep 18, 2014 at 6:20 PM, M Ranga Swami Reddy >> wrote: >> > Hi , >> > >> > Could you please check and clarify the below question on object >> > lifecycle and notification S3 APIs support: >> > >> > 1. To support the bucket lifecycle - we need to support the >> > moving/deleting the objects/buckets based lifecycle settings. >> > For ex: If an object lifecyle set as below: >> > 1. Archive it after 10 days - means move this object to low >> > cost object storage after 10 days of the creation date. >> >2. Remove this object after 90days - mean remove this >> > object from the low cost object after 90days of creation date. >> > >> > Q1- Does the ceph support the above concept like moving to low cost >> > storage and delete from that storage? >> > >> > 2. To support the object notifications: >> > - First there should be low cost and high availability storage >> > with single replica only. If an object created with this type of >> > object storage, >> > There could be chances that object could lose, so if an object >> > of this type of storage lost, set the notifications. >> > >> > Q2- Does Ceph support low cost and high availability storage type? 
>> > >> > Thanks >> > >> > On Fri, Sep 12, 2014 at 8:00 PM, M Ranga Swami Reddy >> > wrote: >> >> Hi Yehuda, >> >> >> >> Could you please check and clarify the below question on object >> >> lifecycle and notification S3 APIs support: >> >> >> >> 1. To support the bucket lifecycle - we need to support the >> >> moving/deleting the objects/buckets based lifecycle settings. >> >> For ex: If an object lifecyle set as below: >> >> 1. Archive it after 10 days - means move this object to low >> >> cost object storage after 10 days of the creation date. >> >>2. Remove this object after 90days - mean remove this >> >> object from the low cost object after 90days of creation date. >> >> >> >> Q1- Does the ceph support the above concept like moving to low cost >> >> storage and delete from that storage? >> >> >> >> 2. To support the object notifications: >> >> - First there should be low cost and high availability storage >> >> with single replica only. If an object created with this type of >> >> object storage, >> >> There c
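The lifecycle policy described in the question ("archive after 10 days, remove after 90 days") maps directly onto the AWS S3 bucket lifecycle configuration XML, shown below for reference. This is the AWS API shape, not something radosgw implements at this point in the thread; the rule id and prefix are made-up examples.

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>archive-logs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <!-- "Archive it after 10 days": move to low-cost storage -->
    <Transition>
      <Days>10</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <!-- "Remove this object after 90 days of creation date" -->
    <Expiration>
      <Days>90</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```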
Re: snap_trimming + backfilling is inefficient with many purged_snaps
September 19 2014 5:19 PM, "Sage Weil" wrote:
> On Fri, 19 Sep 2014, Dan van der Ster wrote:
>> On Fri, Sep 19, 2014 at 10:41 AM, Dan Van Der Ster wrote:
>>> On 19 Sep 2014, at 08:12, Florian Haas wrote:
>>>> On Fri, Sep 19, 2014 at 12:27 AM, Sage Weil wrote:
>>>>> On Fri, 19 Sep 2014, Florian Haas wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> was the off-list reply intentional?
>>>>>
>>>>> Whoops! Nope :)
>>>>>
>>>>>> On Thu, Sep 18, 2014 at 11:47 PM, Sage Weil wrote:
>>>>>>>> So, disaster is a pretty good description. Would anyone from the core
>>>>>>>> team like to suggest another course of action or workaround, or are
>>>>>>>> Dan and I generally on the right track to make the best out of a
>>>>>>>> pretty bad situation?
>>>>>>>
>>>>>>> The short term fix would probably be to just prevent backfill for the
>>>>>>> time being until the bug is fixed.
>>>>>>
>>>>>> As in, osd max backfills = 0?
>>>>>
>>>>> Yeah :)
>>>>>
>>>>> Just managed to reproduce the problem...
>>>>>
>>>>> sage
>>>>
>>>> Saw the wip branch. Color me freakishly impressed on the turnaround. :)
>>>
>>> Indeed :) Thanks Sage!
>>>
>>> wip-9487-dumpling fixes the problem on my test cluster. Trying in prod now?
>>
>> Final update, after 4 hours in prod and after draining 8 OSDs -- zero
>> slow requests :)
>
> That's great news!
>
> But, please be careful. This code hasn't been reviewed yet or been through
> any testing! I would hold off on further backfills until it's merged.

Roger; I've been watching it very closely and so far it seems to work very well. Looking forward to that merge :)

Cheers, Dan

> Thanks!
> sage
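The "osd max backfills = 0" workaround discussed above can be applied at runtime without restarting OSDs via injectargs. A sketch, assuming admin access to the cluster; the restore value of 10 was the default at the time:

```shell
# Pause all new backfills cluster-wide while the fix is pending.
ceph tell osd.* injectargs '--osd-max-backfills 0'

# Once the fix is merged and deployed, restore the default.
ceph tell osd.* injectargs '--osd-max-backfills 10'
```

Note that injected values do not survive an OSD restart, so the setting would also need to go into ceph.conf if OSDs may restart in the interim.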
Re: severe librbd performance degradation in Giant
On Fri, 19 Sep 2014, Alexandre DERUMIER wrote:
>>> Crazy, I've 56 SSDs and can't go above 20 000 iops.
>
> I just noticed that my fio benchmark is cpu bound...
>
> I can reach around 4iops. Don't have more client machines for the moment to bench

A quick aside on the fio testing: Mark noticed a few weeks back that the fio rbd driver isn't doing quite the right thing when you turn up the number of threads: each one issues its own IOs but they touch the same blocks in the image (or something like that). See

http://tracker.ceph.com/issues/9391

It would be great to get this fixed in fio...

sage

----- Mail original -----

De: "Stefan Priebe - Profihost AG"
À: "Xinxin Shu" , "Somnath Roy" , "Alexandre DERUMIER" , "Haomai Wang"
Cc: "Sage Weil" , "Josh Durgin" , ceph-devel@vger.kernel.org
Envoyé: Vendredi 19 Septembre 2014 15:31:14
Objet: Re: severe librbd performance degradation in Giant

Am 19.09.2014 um 15:02 schrieb Shu, Xinxin:
> 12 x Intel DC 3700 200GB, every SSD has two OSDs.

Crazy, I've 56 SSDs and can't go above 20 000 iops.

Grüße Stefan

> Cheers,
> xinxin
>
> -----Original Message-----
> From: Stefan Priebe [mailto:s.pri...@profihost.ag]
> Sent: Friday, September 19, 2014 2:54 PM
> To: Shu, Xinxin; Somnath Roy; Alexandre DERUMIER; Haomai Wang
> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org
> Subject: Re: severe librbd performance degradation in Giant
>
> Am 19.09.2014 03:08, schrieb Shu, Xinxin:
>> I also observed performance degradation on my full SSD setup; I
>> got ~270K IOPS for 4KB random read with 0.80.4, but with latest
>> master I only got ~12K IOPS
>
> These are impressive numbers. Can you tell me how many OSDs you have and
> which SSDs you use?
> > > > Thanks, > > Stefan > > > > > >> Cheers, > >> xinxin > >> > >> -Original Message- > >> From: ceph-devel-ow...@vger.kernel.org > >> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy > >> Sent: Friday, September 19, 2014 2:03 AM > >> To: Alexandre DERUMIER; Haomai Wang > >> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org > >> Subject: RE: severe librbd performance degradation in Giant > >> > >> Alexandre, > >> What tool are you using ? I used fio rbd. > >> > >> Also, I hope you have Giant package installed in the client side as well > >> and rbd_cache =true is set on the client conf file. > >> FYI, firefly librbd + librados and Giant cluster will work seamlessly and > >> I had to make sure fio rbd is really loading giant librbd (if you have > >> multiple copies around , which was in my case) for reproducing it. > >> > >> Thanks & Regards > >> Somnath > >> > >> -Original Message- > >> From: Alexandre DERUMIER [mailto:aderum...@odiso.com] > >> Sent: Thursday, September 18, 2014 2:49 AM > >> To: Haomai Wang > >> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy > >> Subject: Re: severe librbd performance degradation in Giant > >> > According http://tracker.ceph.com/issues/9513, do you mean that rbd > cache will make 10x performance degradation for random read? > >> > >> Hi, on my side, I don't see any degradation performance on read (seq or > >> rand) with or without. > >> > >> firefly : around 12000iops (with or without rbd_cache) giant : around > >> 12000iops (with or without rbd_cache) > >> > >> (and I can reach around 2-3 iops on giant with disabling > >> optracker). 
> >> > >> > >> rbd_cache only improve write performance for me (4k block ) > >> > >> > >> > >> - Mail original - > >> > >> De: "Haomai Wang" > >> ?: "Somnath Roy" > >> Cc: "Sage Weil" , "Josh Durgin" > >> , ceph-devel@vger.kernel.org > >> Envoy?: Jeudi 18 Septembre 2014 04:27:56 > >> Objet: Re: severe librbd performance degradation in Giant > >> > >> According http://tracker.ceph.com/issues/9513, do you mean that rbd cache > >> will make 10x performance degradation for random read? > >> > >> On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy > >> wrote: > >>> Josh/Sage, > >>> I should mention that even after turning off rbd cache I am getting ~20% > >>> degradation over Firefly. > >>> > >>> Thanks & Regards > >>> Somnath > >>> > >>> -Original Message- > >>> From: Somnath Roy > >>> Sent: Wednesday, September 17, 2014 2:44 PM > >>> To: Sage Weil > >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org > >>> Subject: RE: severe librbd performance degradation in Giant > >>> > >>> Created a tracker for this. > >>> > >>> http://tracker.ceph.com/issues/9513 > >>> > >>> Thanks & Regards > >>> Somnath > >>> > >>> -Original Message- > >>> From: ceph-devel-ow...@vger.kernel.org > >>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy > >>> Sent: Wednesday, September 17, 2014 2:39 PM > >>> To: Sage Weil > >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org >
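For context, the fio rbd benchmarks being compared in this thread use job files of roughly the following shape. This is a sketch: the pool, image, and client names are placeholders, the image must already exist, and (per issue 9391 mentioned above) raising numjobs makes every job touch the same blocks of the one image, so multi-job numbers should be treated with care.

```ini
; 4k random-read job against librbd (rbd ioengine, available in fio 2.x).
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
rw=randread
bs=4k
iodepth=32
direct=1

[rbd-randread]
```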
Re: [ceph-users] Status of snapshots in CephFS
On Fri, 19 Sep 2014, Florian Haas wrote:
> Hello everyone,
>
> Just thought I'd circle back on some discussions I've had with people
> earlier in the year:
>
> Shortly before firefly, snapshot support for CephFS clients was
> effectively disabled by default at the MDS level, and can only be
> enabled after accepting a scary warning that your filesystem is highly
> likely to break if snapshot support is enabled. Has any progress been
> made on this in the interim?
>
> With libcephfs support slowly maturing in Ganesha, the option of
> deploying a Ceph-backed userspace NFS server is becoming more
> attractive -- and it's probably a better use of resources than mapping
> a boatload of RBDs on an NFS head node and then exporting all the data
> from there. Recent snapshot trimming issues notwithstanding, RBD
> snapshot support is reasonably stable, but even so, making snapshot
> data available via NFS, that way, is rather ugly. In addition, the
> libcephfs/Ganesha approach would obviously include much better
> horizontal scalability.

We haven't done any work on snapshot stability. It is probably moderately stable if snapshots are only done at the root or at a consistent point in the hierarchy (as opposed to random directories), but there are still some basic problems that need to be resolved. I would not suggest deploying this in production! But some stress testing would as always be very welcome. :)

> In addition,
> https://github.com/nfs-ganesha/nfs-ganesha/wiki/ReleaseNotes_2.0#CEPH
> states:
>
> "The current requirement to build and use the Ceph FSAL is a Ceph
> build environment which includes Ceph client enhancements staged on
> the libwipcephfs development branch. These changes are expected to be
> part of the Ceph Firefly release."
>
> ... though it's not clear whether they ever did make it into firefly.
> Could someone in the know comment on that?

I think this is referring to the libcephfs API changes that the cohortfs folks did.
That all merged shortly before firefly.

By the way, we have some basic samba integration tests in our regular regression tests, but nothing based on ganesha. If you really want this to work, the most valuable thing you could do would be to help get the tests written and integrated into ceph-qa-suite.git. Probably the biggest piece of work there is creating a task/ganesha.py that installs and configures ganesha with the ceph FSAL.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
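For anyone curious what such a task/ganesha.py might look like: a hedged sketch only. The real teuthology task contract, the ganesha export grammar, and the install/teardown steps should be checked against ceph-qa-suite.git and the nfs-ganesha docs; every name below is an assumption for illustration, not the actual API.

```python
# Sketch of a teuthology-style task module (hypothetical task/ganesha.py).
# Teuthology tasks are typically context managers: set up on entry,
# tear down after the yield. None of these names are the real API.
from contextlib import contextmanager


@contextmanager
def task(ctx, config):
    """Install and configure nfs-ganesha with the Ceph FSAL (sketch)."""
    config = config or {}
    export_path = config.get('export', '/')

    # A real task would install the ganesha packages on the remotes,
    # write an export block like this one, and start the daemon.
    ganesha_conf = (
        'EXPORT {\n'
        '  Export_ID = 1;\n'
        '  Path = %s;\n'
        '  FSAL { Name = CEPH; }\n'
        '}\n' % export_path
    )
    try:
        yield ganesha_conf
    finally:
        # Teardown: stop ganesha and remove the config on the remotes.
        pass
```

A real implementation would also need a way to point the CEPH FSAL at the test cluster's ceph.conf and keyring, which is exactly the glue work Sage describes above.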
Re: snap_trimming + backfilling is inefficient with many purged_snaps
On Fri, 19 Sep 2014, Dan van der Ster wrote: > On Fri, Sep 19, 2014 at 10:41 AM, Dan Van Der Ster > wrote: > >> On 19 Sep 2014, at 08:12, Florian Haas wrote: > >> > >> On Fri, Sep 19, 2014 at 12:27 AM, Sage Weil wrote: > >>> On Fri, 19 Sep 2014, Florian Haas wrote: > Hi Sage, > > was the off-list reply intentional? > >>> > >>> Whoops! Nope :) > >>> > On Thu, Sep 18, 2014 at 11:47 PM, Sage Weil wrote: > >> So, disaster is a pretty good description. Would anyone from the core > >> team like to suggest another course of action or workaround, or are > >> Dan and I generally on the right track to make the best out of a > >> pretty bad situation? > > > > The short term fix would probably be to just prevent backfill for the > > time > > being until the bug is fixed. > > As in, osd max backfills = 0? > >>> > >>> Yeah :) > >>> > >>> Just managed to reproduce the problem... > >>> > >>> sage > >> > >> Saw the wip branch. Color me freakishly impressed on the turnaround. :) > >> Thanks! > > > > Indeed :) Thanks Sage! > > wip-9487-dumpling fixes the problem on my test cluster. Trying in prod now... > > Final update, after 4 hours in prod and after draining 8 OSDs -- zero > slow requests :)

That's great news! But, please be careful. This code hasn't been reviewed yet or been through any testing! I would hold off on further backfills until it's merged.

Thanks!
sage
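For reference, the short-term workaround discussed in this thread -- pausing backfill until the fix is merged -- is a one-line setting. A sketch of the ceph.conf fragment (the same value can also be injected at runtime with `ceph tell osd.\* injectargs '--osd-max-backfills 0'`, then raised back once the fix lands):

```
[osd]
; temporarily prevent backfill until the snap-trim fix is merged
osd max backfills = 0
```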
Re: severe librbd performance degradation in Giant
On Fri, 19 Sep 2014, Alexandre DERUMIER wrote: > >> with rbd_cache=true , I got around 6iops (and I don't see any network > >> traffic) > >> > >>So maybe there is a bug in fio ? > >>maybe this is related to: > > Oh, sorry, this was my fault, I didn't fill the rbd with data before doing > the bench > > Now the results are (for 1 osd) > > firefly > -- > bw=37460KB/s, iops=9364 > > giant > - > bw=32741KB/s, iops=8185 > > > So, a little regression > > (the results are equal with rbd_cache=true|false)

Do you see a difference with rados bench, or is it just librbd?

Thanks!
sage

> > > I'll try to compare with more osds > > - Mail original - > > De: "Alexandre DERUMIER" > À: "Somnath Roy" > Cc: "Sage Weil" , "Josh Durgin" , > ceph-devel@vger.kernel.org, "Haomai Wang" > Envoyé: Vendredi 19 Septembre 2014 12:09:41 > Objet: Re: severe librbd performance degradation in Giant > > >>What tool are you using ? I used fio rbd. > > fio rbd too > > > [global] > ioengine=rbd > clientname=admin > pool=test > rbdname=test > invalidate=0 > #rw=read > #rw=randwrite > #rw=write > rw=randread > bs=4k > direct=1 > numjobs=2 > group_reporting=1 > size=10G > > [rbd_iodepth32] > iodepth=32 > > > > I just noticed something strange > > with rbd_cache=true , I got around 6iops (and I don't see any network > traffic) > > So maybe there is a bug in fio ? > maybe this is related to: > > > http://tracker.ceph.com/issues/9391 > "fio rbd driver rewrites same blocks" > > - Mail original - > > De: "Somnath Roy" > À: "Alexandre DERUMIER" , "Haomai Wang" > > Cc: "Sage Weil" , "Josh Durgin" , > ceph-devel@vger.kernel.org > Envoyé: Jeudi 18 Septembre 2014 20:02:49 > Objet: RE: severe librbd performance degradation in Giant > > Alexandre, > What tool are you using ? I used fio rbd. > > Also, I hope you have Giant package installed in the client side as well and > rbd_cache =true is set on the client conf file.
> FYI, firefly librbd + librados and Giant cluster will work seamlessly and I > had to make sure fio rbd is really loading giant librbd (if you have multiple > copies around , which was in my case) for reproducing it. > > Thanks & Regards > Somnath > > -Original Message- > From: Alexandre DERUMIER [mailto:aderum...@odiso.com] > Sent: Thursday, September 18, 2014 2:49 AM > To: Haomai Wang > Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy > Subject: Re: severe librbd performance degradation in Giant > > >>According http://tracker.ceph.com/issues/9513, do you mean that rbd > >>cache will make 10x performance degradation for random read? > > Hi, on my side, I don't see any degradation performance on read (seq or rand) > with or without. > > firefly : around 12000iops (with or without rbd_cache) giant : around > 12000iops (with or without rbd_cache) > > (and I can reach around 2-3 iops on giant with disabling optracker). > > > rbd_cache only improve write performance for me (4k block ) > > > > - Mail original - > > De: "Haomai Wang" > ?: "Somnath Roy" > Cc: "Sage Weil" , "Josh Durgin" , > ceph-devel@vger.kernel.org > Envoy?: Jeudi 18 Septembre 2014 04:27:56 > Objet: Re: severe librbd performance degradation in Giant > > According http://tracker.ceph.com/issues/9513, do you mean that rbd cache > will make 10x performance degradation for random read? > > On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy wrote: > > Josh/Sage, > > I should mention that even after turning off rbd cache I am getting ~20% > > degradation over Firefly. > > > > Thanks & Regards > > Somnath > > > > -Original Message- > > From: Somnath Roy > > Sent: Wednesday, September 17, 2014 2:44 PM > > To: Sage Weil > > Cc: Josh Durgin; ceph-devel@vger.kernel.org > > Subject: RE: severe librbd performance degradation in Giant > > > > Created a tracker for this. 
> > > > http://tracker.ceph.com/issues/9513 > > > > Thanks & Regards > > Somnath > > > > -Original Message- > > From: ceph-devel-ow...@vger.kernel.org > > [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy > > Sent: Wednesday, September 17, 2014 2:39 PM > > To: Sage Weil > > Cc: Josh Durgin; ceph-devel@vger.kernel.org > > Subject: RE: severe librbd performance degradation in Giant > > > > Sage, > > It's a 4K random read. > > > > Thanks & Regards > > Somnath > > > > -Original Message- > > From: Sage Weil [mailto:sw...@redhat.com] > > Sent: Wednesday, September 17, 2014 2:36 PM > > To: Somnath Roy > > Cc: Josh Durgin; ceph-devel@vger.kernel.org > > Subject: RE: severe librbd performance degradation in Giant > > > > What was the io pattern? Sequential or random? For random a slowdown makes > > sense (tho maybe not 10x!) but not for sequential > > > > s > > > > On Wed, 17 Sep 2014, Somnath Roy wrote: > > > >> I set the following in the client side
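For what it's worth, the single-OSD randread numbers Alexandre reports earlier in this thread (firefly bw=37460KB/s, iops=9364 vs. giant bw=32741KB/s, iops=8185) work out to roughly a 13% drop -- consistent with his "a little regression" rather than the 10x slowdown seen with rbd_cache enabled:

```python
# Relative regression between the reported firefly and giant runs
# (4k randread, 1 OSD; iops figures quoted from the thread above).
firefly_iops = 9364
giant_iops = 8185

regression_pct = 100.0 * (firefly_iops - giant_iops) / firefly_iops
print("giant vs firefly 4k randread: %.1f%% fewer iops" % regression_pct)
```

This prints about 12.6%, which matches the roughly 20% ballpark Somnath saw with the cache off, given the different setups.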
Re: why ZFS on ceph is unstable?
On Fri, 19 Sep 2014, Nicheal wrote:
> Hi developers,
>
> it is mentioned in the source code that OPTION(filestore_zfs_snap,
> OPT_BOOL, false) // zfsonlinux is still unstable. So if we turn on
> filestore_zfs_snap and neglect the journal like btrfs, will it be unstable?
>
> As is mentioned by the "zfs on linux community", it is stable enough
> to run a ZFS root filesystem on a GNU/Linux installation for your
> workstation as something to play around with. It is copy-on-write,
> supports compression, deduplication, file atomicity, off-disk caching
> (encryption not supported), and much more. So it seems that all
> features are supported except for encryption.
> Thus, I am puzzled about what "unstable" means here: is ZFS unstable
> itself, or is it already stable on linux but still unstable when
> used as the ceph FileStore filesystem?
>
> If so, what will happen if we use it, losing data or frequent crashes?

At the time the libzfs support was added, zfsonlinux would crash very quickly under the ceph-osd workload. If that has changed, great! We haven't tested it, though, since Zheng added the initial support.

sage
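For anyone who wants to experiment despite that caveat, the option from the quoted source line maps to a ceph.conf fragment like the following -- a sketch only, and given the stability warning above, not something to run on data you care about:

```
[osd]
; experimental: take ZFS snapshots for filestore consistency,
; analogous to the btrfs snapshot mode; marked unstable upstream
filestore zfs snap = true
```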
Re: severe librbd performance degradation in Giant
>>Crazy, I've 56 SSDs and can't go above 20 000 iops. I just noticed that my fio benchmark is cpu bound... I can reach around 4iops. Don't have more client machines for the moment to bench - Mail original - De: "Stefan Priebe - Profihost AG" À: "Xinxin Shu" , "Somnath Roy" , "Alexandre DERUMIER" , "Haomai Wang" Cc: "Sage Weil" , "Josh Durgin" , ceph-devel@vger.kernel.org Envoyé: Vendredi 19 Septembre 2014 15:31:14 Objet: Re: severe librbd performance degradation in Giant On 19.09.2014 15:02, Shu, Xinxin wrote: > 12 x Intel DC 3700 200GB, every SSD has two OSDs. Crazy, I've 56 SSDs and can't go above 20 000 iops. Regards, Stefan > Cheers, > xinxin > > -Original Message- > From: Stefan Priebe [mailto:s.pri...@profihost.ag] > Sent: Friday, September 19, 2014 2:54 PM > To: Shu, Xinxin; Somnath Roy; Alexandre DERUMIER; Haomai Wang > Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org > Subject: Re: severe librbd performance degradation in Giant > > On 19.09.2014 03:08, Shu, Xinxin wrote: >> I also observed performance degradation on my full SSD setup , I can >> get ~270K IOPS for 4KB random read with 0.80.4 , but with latest >> master , I only got ~12K IOPS > > These are impressive numbers. Can you tell me how many OSDs you have and which > SSDs you use? > > Thanks, > Stefan > > >> Cheers, >> xinxin >> >> -Original Message- >> From: ceph-devel-ow...@vger.kernel.org >> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy >> Sent: Friday, September 19, 2014 2:03 AM >> To: Alexandre DERUMIER; Haomai Wang >> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org >> Subject: RE: severe librbd performance degradation in Giant >> >> Alexandre, >> What tool are you using ? I used fio rbd. >> >> Also, I hope you have Giant package installed in the client side as well and >> rbd_cache =true is set on the client conf file.
>> FYI, firefly librbd + librados and Giant cluster will work seamlessly and I >> had to make sure fio rbd is really loading giant librbd (if you have >> multiple copies around , which was in my case) for reproducing it. >> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: Alexandre DERUMIER [mailto:aderum...@odiso.com] >> Sent: Thursday, September 18, 2014 2:49 AM >> To: Haomai Wang >> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy >> Subject: Re: severe librbd performance degradation in Giant >> According http://tracker.ceph.com/issues/9513, do you mean that rbd cache will make 10x performance degradation for random read? >> >> Hi, on my side, I don't see any degradation performance on read (seq or >> rand) with or without. >> >> firefly : around 12000iops (with or without rbd_cache) giant : around >> 12000iops (with or without rbd_cache) >> >> (and I can reach around 2-3 iops on giant with disabling optracker). >> >> >> rbd_cache only improve write performance for me (4k block ) >> >> >> >> - Mail original - >> >> De: "Haomai Wang" >> À: "Somnath Roy" >> Cc: "Sage Weil" , "Josh Durgin" >> , ceph-devel@vger.kernel.org >> Envoyé: Jeudi 18 Septembre 2014 04:27:56 >> Objet: Re: severe librbd performance degradation in Giant >> >> According http://tracker.ceph.com/issues/9513, do you mean that rbd cache >> will make 10x performance degradation for random read? >> >> On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy >> wrote: >>> Josh/Sage, >>> I should mention that even after turning off rbd cache I am getting ~20% >>> degradation over Firefly. >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: Somnath Roy >>> Sent: Wednesday, September 17, 2014 2:44 PM >>> To: Sage Weil >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org >>> Subject: RE: severe librbd performance degradation in Giant >>> >>> Created a tracker for this. 
>>> >>> http://tracker.ceph.com/issues/9513 >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: ceph-devel-ow...@vger.kernel.org >>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy >>> Sent: Wednesday, September 17, 2014 2:39 PM >>> To: Sage Weil >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org >>> Subject: RE: severe librbd performance degradation in Giant >>> >>> Sage, >>> It's a 4K random read. >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: Sage Weil [mailto:sw...@redhat.com] >>> Sent: Wednesday, September 17, 2014 2:36 PM >>> To: Somnath Roy >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org >>> Subject: RE: severe librbd performance degradation in Giant >>> >>> What was the io pattern? Sequential or random? For random a slowdown makes >>> sense (tho maybe not 10x!) but not for sequential >>> >>> s >>> >>> On Wed, 17 Sep 2014, Somnath Roy wrote: >>> I set the following in the client side /etc/ceph/ceph.conf
Re: severe librbd performance degradation in Giant
Numbers vary a lot from brand to brand and from model to model. Just within Intel, you'd be surprised at the large difference between DC S3500 and DC S3700: http://ark.intel.com/compare/75680,71914 -- David Moreau Simard On 2014-09-19, 9:31 AM, "Stefan Priebe - Profihost AG" wrote: >On 19.09.2014 15:02, Shu, Xinxin wrote: >> 12 x Intel DC 3700 200GB, every SSD has two OSDs. > >Crazy, I've 56 SSDs and can't go above 20 000 iops. > >Regards, Stefan > >> Cheers, >> xinxin >> >> -Original Message- >> From: Stefan Priebe [mailto:s.pri...@profihost.ag] >> Sent: Friday, September 19, 2014 2:54 PM >> To: Shu, Xinxin; Somnath Roy; Alexandre DERUMIER; Haomai Wang >> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org >> Subject: Re: severe librbd performance degradation in Giant >> >> On 19.09.2014 03:08, Shu, Xinxin wrote: >>> I also observed performance degradation on my full SSD setup , I can >>> get ~270K IOPS for 4KB random read with 0.80.4 , but with latest >>> master , I only got ~12K IOPS >> >> These are impressive numbers. Can you tell me how many OSDs you have and >> which SSDs you use? >> >> Thanks, >> Stefan >> >> >>> Cheers, >>> xinxin >>> >>> -Original Message- >>> From: ceph-devel-ow...@vger.kernel.org >>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy >>> Sent: Friday, September 19, 2014 2:03 AM >>> To: Alexandre DERUMIER; Haomai Wang >>> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org >>> Subject: RE: severe librbd performance degradation in Giant >>> >>> Alexandre, >>> What tool are you using ? I used fio rbd. >>> >>> Also, I hope you have Giant package installed in the client side as >>> well and rbd_cache =true is set on the client conf file. >>> FYI, firefly librbd + librados and Giant cluster will work seamlessly >>> and I had to make sure fio rbd is really loading giant librbd (if you >>> have multiple copies around , which was in my case) for reproducing it.
>>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: Alexandre DERUMIER [mailto:aderum...@odiso.com] >>> Sent: Thursday, September 18, 2014 2:49 AM >>> To: Haomai Wang >>> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy >>> Subject: Re: severe librbd performance degradation in Giant >>> > According http://tracker.ceph.com/issues/9513, do you mean that rbd > cache will make 10x performance degradation for random read? >>> >>> Hi, on my side, I don't see any degradation performance on read (seq >>>or rand) with or without. >>> >>> firefly : around 12000iops (with or without rbd_cache) giant : around >>> 12000iops (with or without rbd_cache) >>> >>> (and I can reach around 2-3 iops on giant with disabling >>>optracker). >>> >>> >>> rbd_cache only improve write performance for me (4k block ) >>> >>> >>> >>> - Mail original - >>> >>> De: "Haomai Wang" >>> À: "Somnath Roy" >>> Cc: "Sage Weil" , "Josh Durgin" >>> , ceph-devel@vger.kernel.org >>> Envoyé: Jeudi 18 Septembre 2014 04:27:56 >>> Objet: Re: severe librbd performance degradation in Giant >>> >>> According http://tracker.ceph.com/issues/9513, do you mean that rbd >>>cache will make 10x performance degradation for random read? >>> >>> On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy >>>wrote: Josh/Sage, I should mention that even after turning off rbd cache I am getting ~20% degradation over Firefly. Thanks & Regards Somnath -Original Message- From: Somnath Roy Sent: Wednesday, September 17, 2014 2:44 PM To: Sage Weil Cc: Josh Durgin; ceph-devel@vger.kernel.org Subject: RE: severe librbd performance degradation in Giant Created a tracker for this. 
http://tracker.ceph.com/issues/9513 Thanks & Regards Somnath -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy Sent: Wednesday, September 17, 2014 2:39 PM To: Sage Weil Cc: Josh Durgin; ceph-devel@vger.kernel.org Subject: RE: severe librbd performance degradation in Giant Sage, It's a 4K random read. Thanks & Regards Somnath -Original Message- From: Sage Weil [mailto:sw...@redhat.com] Sent: Wednesday, September 17, 2014 2:36 PM To: Somnath Roy Cc: Josh Durgin; ceph-devel@vger.kernel.org Subject: RE: severe librbd performance degradation in Giant What was the io pattern? Sequential or random? For random a slowdown makes sense (tho maybe not 10x!) but not for sequential s On Wed, 17 Sep 2014, Somnath Roy wrote: > I set the following in the client side /etc/ceph/ceph.conf where I >am running fio rbd. > > rbd_cache_writethrough_until_flush = false > > But, no difference. BTW, I am doing Random read, not write. Still >this setting applies ? > >
Re: severe librbd performance degradation in Giant
On 19.09.2014 15:02, Shu, Xinxin wrote: > 12 x Intel DC 3700 200GB, every SSD has two OSDs. Crazy, I've 56 SSDs and can't go above 20 000 iops. Regards, Stefan > Cheers, > xinxin > > -Original Message- > From: Stefan Priebe [mailto:s.pri...@profihost.ag] > Sent: Friday, September 19, 2014 2:54 PM > To: Shu, Xinxin; Somnath Roy; Alexandre DERUMIER; Haomai Wang > Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org > Subject: Re: severe librbd performance degradation in Giant > > On 19.09.2014 03:08, Shu, Xinxin wrote: >> I also observed performance degradation on my full SSD setup , I can >> get ~270K IOPS for 4KB random read with 0.80.4 , but with latest >> master , I only got ~12K IOPS > > These are impressive numbers. Can you tell me how many OSDs you have and which > SSDs you use? > > Thanks, > Stefan > > >> Cheers, >> xinxin >> >> -Original Message- >> From: ceph-devel-ow...@vger.kernel.org >> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy >> Sent: Friday, September 19, 2014 2:03 AM >> To: Alexandre DERUMIER; Haomai Wang >> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org >> Subject: RE: severe librbd performance degradation in Giant >> >> Alexandre, >> What tool are you using ? I used fio rbd. >> >> Also, I hope you have Giant package installed in the client side as well and >> rbd_cache =true is set on the client conf file. >> FYI, firefly librbd + librados and Giant cluster will work seamlessly and I >> had to make sure fio rbd is really loading giant librbd (if you have >> multiple copies around , which was in my case) for reproducing it.
>> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: Alexandre DERUMIER [mailto:aderum...@odiso.com] >> Sent: Thursday, September 18, 2014 2:49 AM >> To: Haomai Wang >> Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy >> Subject: Re: severe librbd performance degradation in Giant >> According http://tracker.ceph.com/issues/9513, do you mean that rbd cache will make 10x performance degradation for random read? >> >> Hi, on my side, I don't see any degradation performance on read (seq or >> rand) with or without. >> >> firefly : around 12000iops (with or without rbd_cache) giant : around >> 12000iops (with or without rbd_cache) >> >> (and I can reach around 2-3 iops on giant with disabling optracker). >> >> >> rbd_cache only improve write performance for me (4k block ) >> >> >> >> - Mail original - >> >> De: "Haomai Wang" >> À: "Somnath Roy" >> Cc: "Sage Weil" , "Josh Durgin" >> , ceph-devel@vger.kernel.org >> Envoyé: Jeudi 18 Septembre 2014 04:27:56 >> Objet: Re: severe librbd performance degradation in Giant >> >> According http://tracker.ceph.com/issues/9513, do you mean that rbd cache >> will make 10x performance degradation for random read? >> >> On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy wrote: >>> Josh/Sage, >>> I should mention that even after turning off rbd cache I am getting ~20% >>> degradation over Firefly. >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: Somnath Roy >>> Sent: Wednesday, September 17, 2014 2:44 PM >>> To: Sage Weil >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org >>> Subject: RE: severe librbd performance degradation in Giant >>> >>> Created a tracker for this. 
>>> >>> http://tracker.ceph.com/issues/9513 >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: ceph-devel-ow...@vger.kernel.org >>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy >>> Sent: Wednesday, September 17, 2014 2:39 PM >>> To: Sage Weil >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org >>> Subject: RE: severe librbd performance degradation in Giant >>> >>> Sage, >>> It's a 4K random read. >>> >>> Thanks & Regards >>> Somnath >>> >>> -Original Message- >>> From: Sage Weil [mailto:sw...@redhat.com] >>> Sent: Wednesday, September 17, 2014 2:36 PM >>> To: Somnath Roy >>> Cc: Josh Durgin; ceph-devel@vger.kernel.org >>> Subject: RE: severe librbd performance degradation in Giant >>> >>> What was the io pattern? Sequential or random? For random a slowdown makes >>> sense (tho maybe not 10x!) but not for sequential >>> >>> s >>> >>> On Wed, 17 Sep 2014, Somnath Roy wrote: >>> I set the following in the client side /etc/ceph/ceph.conf where I am running fio rbd. rbd_cache_writethrough_until_flush = false But, no difference. BTW, I am doing Random read, not write. Still this setting applies ? Next, I tried to tweak the rbd_cache setting to false and I *got back* the old performance. Now, it is similar to firefly throughput ! So, looks like rbd_cache=true was the culprit. Thanks Josh ! Regards Somnath -Original Message- From: Josh Durgin [mailto:josh.dur...@inktank.com] Sent: Wednesday, September 17, 2014 2:20 PM To: Somnath Ro
RE: severe librbd performance degradation in Giant
12 x Intel DC 3700 200GB, every SSD has two OSDs. Cheers, xinxin -Original Message- From: Stefan Priebe [mailto:s.pri...@profihost.ag] Sent: Friday, September 19, 2014 2:54 PM To: Shu, Xinxin; Somnath Roy; Alexandre DERUMIER; Haomai Wang Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org Subject: Re: severe librbd performance degradation in Giant On 19.09.2014 03:08, Shu, Xinxin wrote: > I also observed performance degradation on my full SSD setup , I can > get ~270K IOPS for 4KB random read with 0.80.4 , but with latest > master , I only got ~12K IOPS These are impressive numbers. Can you tell me how many OSDs you have and which SSDs you use? Thanks, Stefan > Cheers, > xinxin > > -Original Message- > From: ceph-devel-ow...@vger.kernel.org > [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy > Sent: Friday, September 19, 2014 2:03 AM > To: Alexandre DERUMIER; Haomai Wang > Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org > Subject: RE: severe librbd performance degradation in Giant > > Alexandre, > What tool are you using ? I used fio rbd. > > Also, I hope you have Giant package installed in the client side as well and > rbd_cache =true is set on the client conf file. > FYI, firefly librbd + librados and Giant cluster will work seamlessly and I > had to make sure fio rbd is really loading giant librbd (if you have multiple > copies around , which was in my case) for reproducing it. > > Thanks & Regards > Somnath > > -Original Message- > From: Alexandre DERUMIER [mailto:aderum...@odiso.com] > Sent: Thursday, September 18, 2014 2:49 AM > To: Haomai Wang > Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy > Subject: Re: severe librbd performance degradation in Giant > >>> According http://tracker.ceph.com/issues/9513, do you mean that rbd >>> cache will make 10x performance degradation for random read? > > Hi, on my side, I don't see any degradation performance on read (seq or rand) > with or without.
> > firefly : around 12000iops (with or without rbd_cache) giant : around > 12000iops (with or without rbd_cache) > > (and I can reach around 2-3 iops on giant with disabling optracker). > > > rbd_cache only improve write performance for me (4k block ) > > > > - Mail original - > > De: "Haomai Wang" > À: "Somnath Roy" > Cc: "Sage Weil" , "Josh Durgin" > , ceph-devel@vger.kernel.org > Envoyé: Jeudi 18 Septembre 2014 04:27:56 > Objet: Re: severe librbd performance degradation in Giant > > According http://tracker.ceph.com/issues/9513, do you mean that rbd cache > will make 10x performance degradation for random read? > > On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy wrote: >> Josh/Sage, >> I should mention that even after turning off rbd cache I am getting ~20% >> degradation over Firefly. >> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: Somnath Roy >> Sent: Wednesday, September 17, 2014 2:44 PM >> To: Sage Weil >> Cc: Josh Durgin; ceph-devel@vger.kernel.org >> Subject: RE: severe librbd performance degradation in Giant >> >> Created a tracker for this. >> >> http://tracker.ceph.com/issues/9513 >> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: ceph-devel-ow...@vger.kernel.org >> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy >> Sent: Wednesday, September 17, 2014 2:39 PM >> To: Sage Weil >> Cc: Josh Durgin; ceph-devel@vger.kernel.org >> Subject: RE: severe librbd performance degradation in Giant >> >> Sage, >> It's a 4K random read. >> >> Thanks & Regards >> Somnath >> >> -Original Message- >> From: Sage Weil [mailto:sw...@redhat.com] >> Sent: Wednesday, September 17, 2014 2:36 PM >> To: Somnath Roy >> Cc: Josh Durgin; ceph-devel@vger.kernel.org >> Subject: RE: severe librbd performance degradation in Giant >> >> What was the io pattern? Sequential or random? For random a slowdown makes >> sense (tho maybe not 10x!) 
but not for sequential >> >> s >> >> On Wed, 17 Sep 2014, Somnath Roy wrote: >> >>> I set the following in the client side /etc/ceph/ceph.conf where I am >>> running fio rbd. >>> >>> rbd_cache_writethrough_until_flush = false >>> >>> But, no difference. BTW, I am doing Random read, not write. Still this >>> setting applies ? >>> >>> Next, I tried to tweak the rbd_cache setting to false and I *got back* the >>> old performance. Now, it is similar to firefly throughput ! >>> >>> So, looks like rbd_cache=true was the culprit. >>> >>> Thanks Josh ! >>> >>> Regards >>> Somnath >>> >>> -Original Message- >>> From: Josh Durgin [mailto:josh.dur...@inktank.com] >>> Sent: Wednesday, September 17, 2014 2:20 PM >>> To: Somnath Roy; ceph-devel@vger.kernel.org >>> Subject: Re: severe librbd performance degradation in Giant >>> >>> On 09/17/2014 01:55 PM, Somnath Roy wrote: Hi Sage, We are experiencing severe librbd performance degradation in Giant over firefly release. Here is the experiment we
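The workaround Somnath describes -- disabling the rbd cache on the client until the regression is understood -- lives in the client-side ceph.conf. A sketch of the relevant fragment, using the two options named in the thread:

```
[client]
; work around the giant random-read regression by disabling the rbd cache
rbd cache = false
; if the cache is left enabled instead, it stays in writethrough mode
; until the first flush unless this is set
rbd cache writethrough until flush = true
```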
Re: snap_trimming + backfilling is inefficient with many purged_snaps
On Fri, Sep 19, 2014 at 10:41 AM, Dan Van Der Ster wrote: >> On 19 Sep 2014, at 08:12, Florian Haas wrote: >> >> On Fri, Sep 19, 2014 at 12:27 AM, Sage Weil wrote: >>> On Fri, 19 Sep 2014, Florian Haas wrote: Hi Sage, was the off-list reply intentional? >>> >>> Whoops! Nope :) >>> On Thu, Sep 18, 2014 at 11:47 PM, Sage Weil wrote: >> So, disaster is a pretty good description. Would anyone from the core >> team like to suggest another course of action or workaround, or are >> Dan and I generally on the right track to make the best out of a >> pretty bad situation? > > The short term fix would probably be to just prevent backfill for the time > being until the bug is fixed. As in, osd max backfills = 0? >>> >>> Yeah :) >>> >>> Just managed to reproduce the problem... >>> >>> sage >> >> Saw the wip branch. Color me freakishly impressed on the turnaround. :) >> Thanks! > > Indeed :) Thanks Sage! > wip-9487-dumpling fixes the problem on my test cluster. Trying in prod now… Final update, after 4 hours in prod and after draining 8 OSDs -- zero slow requests :) Thanks again! Dan
Re: severe librbd performance degradation in Giant
giant results with 6 osd bw=118129KB/s, iops=29532 : rbd_cache = false bw=101771KB/s, iops=25442 : rbd_cache = true fio config (note that numjobs is important, i'm going from 18000iops -> 29000 iops for numjobs 1->4) -- [global] #logging #write_iops_log=write_iops_log #write_bw_log=write_bw_log #write_lat_log=write_lat_log ioengine=rbd clientname=admin pool=test rbdname=test invalidate=0# mandatory #rw=read #rw=randwrite #rw=write rw=randread bs=4K direct=1 numjobs=4 group_reporting=1 size=10G [rbd_iodepth32] iodepth=32 ceph.conf - debug lockdep = 0/0 debug context = 0/0 debug crush = 0/0 debug buffer = 0/0 debug timer = 0/0 debug journaler = 0/0 debug osd = 0/0 debug optracker = 0/0 debug objclass = 0/0 debug filestore = 0/0 debug journal = 0/0 debug ms = 0/0 debug monc = 0/0 debug tp = 0/0 debug auth = 0/0 debug finisher = 0/0 debug heartbeatmap = 0/0 debug perfcounter = 0/0 debug asok = 0/0 debug throttle = 0/0 osd_op_num_threads_per_shard = 2 osd_op_num_shards = 25 filestore_fd_cache_size = 64 filestore_fd_cache_shards = 32 ms_nocrc = true cephx sign messages = false cephx require signatures = false ms_dispatch_throttle_bytes = 0 throttler_perf_counter = false [osd] osd_client_message_size_cap = 0 osd_client_message_cap = 0 osd_enable_op_tracker = false - Mail original - De: "Alexandre DERUMIER" À: "Somnath Roy" Cc: "Sage Weil" , "Josh Durgin" , ceph-devel@vger.kernel.org, "Haomai Wang" Envoyé: Vendredi 19 Septembre 2014 13:30:24 Objet: Re: severe librbd performance degradation in Giant >> with rbd_cache=true , I got around 6iops (and I don't see any network >> traffic) >> >>So maybe they are a bug in fio ? 
>> maybe this is related to:

Oh, sorry, this was my fault, I didn't fill the rbd with data before doing the bench.

Now the results are (for 1 osd):

firefly
--
bw=37460KB/s, iops=9364

giant
-
bw=32741KB/s, iops=8185

So, a little regression (the results are equal for rbd_cache=true|false).

I'll try to compare with more osds.

----- Mail original -----

De: "Alexandre DERUMIER"
À: "Somnath Roy"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org, "Haomai Wang"
Envoyé: Vendredi 19 Septembre 2014 12:09:41
Objet: Re: severe librbd performance degradation in Giant

>> What tool are you using? I used fio rbd.

fio rbd too

[global]
ioengine=rbd
clientname=admin
pool=test
rbdname=test
invalidate=0
#rw=read
#rw=randwrite
#rw=write
rw=randread
bs=4k
direct=1
numjobs=2
group_reporting=1
size=10G

[rbd_iodepth32]
iodepth=32

I just noticed something strange: with rbd_cache=true, I got around 6iops (and I don't see any network traffic).

So maybe there is a bug in fio?

Maybe this is related to:
http://tracker.ceph.com/issues/9391 "fio rbd driver rewrites same blocks"

----- Mail original -----

De: "Somnath Roy"
À: "Alexandre DERUMIER", "Haomai Wang"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org
Envoyé: Jeudi 18 Septembre 2014 20:02:49
Objet: RE: severe librbd performance degradation in Giant

Alexandre,
What tool are you using? I used fio rbd.

Also, I hope you have the Giant package installed on the client side as well, and that rbd_cache=true is set in the client conf file.

FYI, firefly librbd + librados and a Giant cluster will work seamlessly, and I had to make sure fio rbd was really loading the giant librbd (if you have multiple copies around, which was the case for me) to reproduce it.
Thanks & Regards
Somnath

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
Sent: Thursday, September 18, 2014 2:49 AM
To: Haomai Wang
Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy
Subject: Re: severe librbd performance degradation in Giant

>> According to http://tracker.ceph.com/issues/9513, do you mean that rbd
>> cache will make a 10x performance degradation for random read?

Hi, on my side, I don't see any performance degradation on read (seq or rand) with or without rbd_cache.

firefly: around 12000 iops (with or without rbd_cache)
giant: around 12000 iops (with or without rbd_cache)

(and I can reach around 2-3 iops on giant with the optracker disabled).

rbd_cache only improves write performance for me (4k blocks).

----- Mail original -----

De: "Haomai Wang"
À: "Somnath Roy"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org
Envoyé: Jeudi 18 Septembre 2014 04:27:56
Objet: Re: severe librbd performance degradation in Giant

According to http://tracker.ceph.com/issues/9513, do you mean that rbd cache will make a 10x performance degradation for random read?

On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy wrote:
Re: severe librbd performance degradation in Giant
>> with rbd_cache=true, I got around 6iops (and I don't see any network traffic)
>>
>> So maybe there is a bug in fio?
>>
>> maybe this is related to:

Oh, sorry, this was my fault, I didn't fill the rbd with data before doing the bench.

Now the results are (for 1 osd):

firefly
--
bw=37460KB/s, iops=9364

giant
-
bw=32741KB/s, iops=8185

So, a little regression (the results are equal for rbd_cache=true|false).

I'll try to compare with more osds.

----- Mail original -----

De: "Alexandre DERUMIER"
À: "Somnath Roy"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org, "Haomai Wang"
Envoyé: Vendredi 19 Septembre 2014 12:09:41
Objet: Re: severe librbd performance degradation in Giant

>> What tool are you using? I used fio rbd.

fio rbd too

[global]
ioengine=rbd
clientname=admin
pool=test
rbdname=test
invalidate=0
#rw=read
#rw=randwrite
#rw=write
rw=randread
bs=4k
direct=1
numjobs=2
group_reporting=1
size=10G

[rbd_iodepth32]
iodepth=32

I just noticed something strange: with rbd_cache=true, I got around 6iops (and I don't see any network traffic).

So maybe there is a bug in fio?

Maybe this is related to:
http://tracker.ceph.com/issues/9391 "fio rbd driver rewrites same blocks"

----- Mail original -----

De: "Somnath Roy"
À: "Alexandre DERUMIER", "Haomai Wang"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org
Envoyé: Jeudi 18 Septembre 2014 20:02:49
Objet: RE: severe librbd performance degradation in Giant

Alexandre,
What tool are you using? I used fio rbd.

Also, I hope you have the Giant package installed on the client side as well, and that rbd_cache=true is set in the client conf file.

FYI, firefly librbd + librados and a Giant cluster will work seamlessly, and I had to make sure fio rbd was really loading the giant librbd (if you have multiple copies around, which was the case for me) to reproduce it.

Thanks & Regards
Somnath

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
Sent: Thursday, September 18, 2014 2:49 AM
To: Haomai Wang
Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy
Subject: Re: severe librbd performance degradation in Giant

>> According to http://tracker.ceph.com/issues/9513, do you mean that rbd
>> cache will make a 10x performance degradation for random read?

Hi, on my side, I don't see any performance degradation on read (seq or rand) with or without rbd_cache.

firefly: around 12000 iops (with or without rbd_cache)
giant: around 12000 iops (with or without rbd_cache)

(and I can reach around 2-3 iops on giant with the optracker disabled).

rbd_cache only improves write performance for me (4k blocks).

----- Mail original -----

De: "Haomai Wang"
À: "Somnath Roy"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org
Envoyé: Jeudi 18 Septembre 2014 04:27:56
Objet: Re: severe librbd performance degradation in Giant

According to http://tracker.ceph.com/issues/9513, do you mean that rbd cache will make a 10x performance degradation for random read?

On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy wrote:
> Josh/Sage,
> I should mention that even after turning off rbd cache I am getting ~20%
> degradation over Firefly.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Somnath Roy
> Sent: Wednesday, September 17, 2014 2:44 PM
> To: Sage Weil
> Cc: Josh Durgin; ceph-devel@vger.kernel.org
> Subject: RE: severe librbd performance degradation in Giant
>
> Created a tracker for this.
>
> http://tracker.ceph.com/issues/9513
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Wednesday, September 17, 2014 2:39 PM
> To: Sage Weil
> Cc: Josh Durgin; ceph-devel@vger.kernel.org
> Subject: RE: severe librbd performance degradation in Giant
>
> Sage,
> It's a 4K random read.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: Wednesday, September 17, 2014 2:36 PM
> To: Somnath Roy
> Cc: Josh Durgin; ceph-devel@vger.kernel.org
> Subject: RE: severe librbd performance degradation in Giant
>
> What was the io pattern? Sequential or random? For random a slowdown makes
> sense (tho maybe not 10x!) but not for sequential
>
> s
>
> On Wed, 17 Sep 2014, Somnath Roy wrote:
>
>> I set the following in the client side /etc/ceph/ceph.conf where I am
>> running fio rbd.
>>
>> rbd_cache_writethrough_until_flush = false
>>
>> But, no difference. BTW, I am doing random read, not write. Does this
>> setting still apply?
>>
>> Next, I tried to tweak the rbd_cache setting to false and I *got back* the
>> old performance. Now, it is similar to firefly throughput!
>>
>> So, looks like rbd_cache=true was the culprit.
>>
>> Thanks Josh!
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>
Re: severe librbd performance degradation in Giant
>> What tool are you using? I used fio rbd.

fio rbd too

[global]
ioengine=rbd
clientname=admin
pool=test
rbdname=test
invalidate=0
#rw=read
#rw=randwrite
#rw=write
rw=randread
bs=4k
direct=1
numjobs=2
group_reporting=1
size=10G

[rbd_iodepth32]
iodepth=32

I just noticed something strange: with rbd_cache=true, I got around 6iops (and I don't see any network traffic).

So maybe there is a bug in fio?

Maybe this is related to:
http://tracker.ceph.com/issues/9391 "fio rbd driver rewrites same blocks"

----- Mail original -----

De: "Somnath Roy"
À: "Alexandre DERUMIER", "Haomai Wang"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org
Envoyé: Jeudi 18 Septembre 2014 20:02:49
Objet: RE: severe librbd performance degradation in Giant

Alexandre,
What tool are you using? I used fio rbd.

Also, I hope you have the Giant package installed on the client side as well, and that rbd_cache=true is set in the client conf file.

FYI, firefly librbd + librados and a Giant cluster will work seamlessly, and I had to make sure fio rbd was really loading the giant librbd (if you have multiple copies around, which was the case for me) to reproduce it.

Thanks & Regards
Somnath

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
Sent: Thursday, September 18, 2014 2:49 AM
To: Haomai Wang
Cc: Sage Weil; Josh Durgin; ceph-devel@vger.kernel.org; Somnath Roy
Subject: Re: severe librbd performance degradation in Giant

>> According to http://tracker.ceph.com/issues/9513, do you mean that rbd
>> cache will make a 10x performance degradation for random read?

Hi, on my side, I don't see any performance degradation on read (seq or rand) with or without rbd_cache.

firefly: around 12000 iops (with or without rbd_cache)
giant: around 12000 iops (with or without rbd_cache)

(and I can reach around 2-3 iops on giant with the optracker disabled).

rbd_cache only improves write performance for me (4k blocks).

----- Mail original -----

De: "Haomai Wang"
À: "Somnath Roy"
Cc: "Sage Weil", "Josh Durgin", ceph-devel@vger.kernel.org
Envoyé: Jeudi 18 Septembre 2014 04:27:56
Objet: Re: severe librbd performance degradation in Giant

According to http://tracker.ceph.com/issues/9513, do you mean that rbd cache will make a 10x performance degradation for random read?

On Thu, Sep 18, 2014 at 7:44 AM, Somnath Roy wrote:
> Josh/Sage,
> I should mention that even after turning off rbd cache I am getting ~20%
> degradation over Firefly.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Somnath Roy
> Sent: Wednesday, September 17, 2014 2:44 PM
> To: Sage Weil
> Cc: Josh Durgin; ceph-devel@vger.kernel.org
> Subject: RE: severe librbd performance degradation in Giant
>
> Created a tracker for this.
>
> http://tracker.ceph.com/issues/9513
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Wednesday, September 17, 2014 2:39 PM
> To: Sage Weil
> Cc: Josh Durgin; ceph-devel@vger.kernel.org
> Subject: RE: severe librbd performance degradation in Giant
>
> Sage,
> It's a 4K random read.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Sage Weil [mailto:sw...@redhat.com]
> Sent: Wednesday, September 17, 2014 2:36 PM
> To: Somnath Roy
> Cc: Josh Durgin; ceph-devel@vger.kernel.org
> Subject: RE: severe librbd performance degradation in Giant
>
> What was the io pattern? Sequential or random? For random a slowdown makes
> sense (tho maybe not 10x!) but not for sequential
>
> s
>
> On Wed, 17 Sep 2014, Somnath Roy wrote:
>
>> I set the following in the client side /etc/ceph/ceph.conf where I am
>> running fio rbd.
>>
>> rbd_cache_writethrough_until_flush = false
>>
>> But, no difference. BTW, I am doing random read, not write. Does this
>> setting still apply?
>>
>> Next, I tried to tweak the rbd_cache setting to false and I *got back* the
>> old performance. Now, it is similar to firefly throughput!
>>
>> So, looks like rbd_cache=true was the culprit.
>>
>> Thanks Josh!
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Josh Durgin [mailto:josh.dur...@inktank.com]
>> Sent: Wednesday, September 17, 2014 2:20 PM
>> To: Somnath Roy; ceph-devel@vger.kernel.org
>> Subject: Re: severe librbd performance degradation in Giant
>>
>> On 09/17/2014 01:55 PM, Somnath Roy wrote:
>> > Hi Sage,
>> > We are experiencing severe librbd performance degradation in Giant over
>> > the firefly release. Here is the experiment we did to isolate it as a
>> > librbd problem.
>> >
>> > 1. Single OSD is running latest Giant and client is running fio rbd on top
>> > of firefly based librbd/librados. For one client it is giving ~11-12K iops
>> > (4K RR).
>> > 2. Single OSD is running Giant and client is running fio rbd on top of
>> > Giant based librbd/librados. For
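For anyone retracing the comparison in this thread, the client-side toggle that Somnath identified is an ordinary ceph.conf option. A minimal client section (illustrative values only, reflecting what the thread reports) might look like:

```ini
[client]
# Disabling the librbd cache restored firefly-level 4K random-read
# throughput in the tests above
rbd cache = false
# Only relevant when the cache is on; the thread notes it made no
# difference for the random-read case
rbd cache writethrough until flush = false
```

Note that, as Somnath points out above, the setting must be in the conf file on the client running fio, and the client must actually load the Giant librbd to reproduce the regression.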
Re: [PATCH v2 2/3] ec: use 32-byte aligned buffers
Hi Janne,

This looks good! The 32-byte aligned buffer applies to the diff related to buffer.h though, could you update the title? I tend to prefer erasure-code over ec: it is easier to grep / search ;-)

Cheers

On 18/09/2014 12:33, Janne Grunau wrote:
> Requiring page aligned buffers and realigning the input if necessary
> creates measurable overhead. ceph_erasure_code_benchmark is ~30% faster
> with this change for technique=reed_sol_van,k=2,m=1.
>
> Also prevents a misaligned buffer when bufferlist::c_str(bufferlist)
> has to allocate a new buffer to provide a continuous one. See bug #9408
>
> Signed-off-by: Janne Grunau
> ---
>  src/erasure-code/ErasureCode.cc | 57 -
>  src/erasure-code/ErasureCode.h  |  3 ++-
>  2 files changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/src/erasure-code/ErasureCode.cc b/src/erasure-code/ErasureCode.cc
> index 5953f49..7aa5235 100644
> --- a/src/erasure-code/ErasureCode.cc
> +++ b/src/erasure-code/ErasureCode.cc
> @@ -54,22 +54,49 @@ int ErasureCode::minimum_to_decode_with_cost(const set<int> &want_to_read,
>  }
>
>  int ErasureCode::encode_prepare(const bufferlist &raw,
> -                                bufferlist *prepared) const
> +                                map<int, bufferlist> &encoded) const
>  {
>    unsigned int k = get_data_chunk_count();
>    unsigned int m = get_chunk_count() - k;
>    unsigned blocksize = get_chunk_size(raw.length());
> -  unsigned padded_length = blocksize * k;
> -  *prepared = raw;
> -  if (padded_length - raw.length() > 0) {
> -    bufferptr pad(padded_length - raw.length());
> -    pad.zero();
> -    prepared->push_back(pad);
> +  unsigned pad_len = blocksize * k - raw.length();
> +  unsigned padded_chunks = k - raw.length() / blocksize;
> +  bufferlist prepared = raw;
> +
> +  if (!prepared.is_aligned()) {
> +    // splice padded chunks off to make the rebuild faster
> +    if (padded_chunks)
> +      prepared.splice((k - padded_chunks) * blocksize,
> +                      padded_chunks * blocksize - pad_len);
> +    prepared.rebuild_aligned();
> +  }
> +
> +  for (unsigned int i = 0; i < k - padded_chunks; i++) {
> +    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
> +    bufferlist &chunk = encoded[chunk_index];
> +    chunk.substr_of(prepared, i * blocksize, blocksize);
> +  }
> +  if (padded_chunks) {
> +    unsigned remainder = raw.length() - (k - padded_chunks) * blocksize;
> +    bufferlist padded;
> +    bufferptr buf(buffer::create_aligned(padded_chunks * blocksize));
> +
> +    raw.copy((k - padded_chunks) * blocksize, remainder, buf.c_str());
> +    buf.zero(remainder, pad_len);
> +    padded.push_back(buf);
> +
> +    for (unsigned int i = k - padded_chunks; i < k; i++) {
> +      int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
> +      bufferlist &chunk = encoded[chunk_index];
> +      chunk.substr_of(padded, (i - (k - padded_chunks)) * blocksize, blocksize);
> +    }
> +  }
> +  for (unsigned int i = k; i < k + m; i++) {
> +    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
> +    bufferlist &chunk = encoded[chunk_index];
> +    chunk.push_back(buffer::create_aligned(blocksize));
>    }
> -  unsigned coding_length = blocksize * m;
> -  bufferptr coding(buffer::create_page_aligned(coding_length));
> -  prepared->push_back(coding);
> -  prepared->rebuild_page_aligned();
> +
>    return 0;
>  }
>
> @@ -80,15 +107,9 @@ int ErasureCode::encode(const set<int> &want_to_encode,
>    unsigned int k = get_data_chunk_count();
>    unsigned int m = get_chunk_count() - k;
>    bufferlist out;
> -  int err = encode_prepare(in, &out);
> +  int err = encode_prepare(in, *encoded);
>    if (err)
>      return err;
> -  unsigned blocksize = get_chunk_size(in.length());
> -  for (unsigned int i = 0; i < k + m; i++) {
> -    int chunk_index = chunk_mapping.size() > 0 ? chunk_mapping[i] : i;
> -    bufferlist &chunk = (*encoded)[chunk_index];
> -    chunk.substr_of(out, i * blocksize, blocksize);
> -  }
>    encode_chunks(want_to_encode, encoded);
>    for (unsigned int i = 0; i < k + m; i++) {
>      if (want_to_encode.count(i) == 0)
> diff --git a/src/erasure-code/ErasureCode.h b/src/erasure-code/ErasureCode.h
> index 7aaea95..62aa383 100644
> --- a/src/erasure-code/ErasureCode.h
> +++ b/src/erasure-code/ErasureCode.h
> @@ -46,7 +46,8 @@ namespace ceph {
>                  const map<int, int> &available,
>                  set<int> *minimum);
>
> -    int encode_prepare(const bufferlist &raw, bufferlist *prepared) const;
> +    int encode_prepare(const bufferlist &raw,
> +                       map<int, bufferlist> &encoded) const;
>
>      virtual int encode(const set<int> &want_to_encode,
>                         const bufferlist &in,

--
Loïc Dachary, Artisan Logiciel Libre
why ZFS on ceph is unstable?
Hi developers,

The source code mentions:

OPTION(filestore_zfs_snap, OPT_BOOL, false) // zfsonlinux is still unstable

So if we turn on filestore_zfs_snap and skip the journal, as with btrfs, will it be unstable?

As mentioned in the ZFS on Linux community, it is stable enough to run a ZFS root filesystem on a GNU/Linux installation for your workstation as something to play around with. It is copy-on-write, supports compression, deduplication, file atomicity, off-disk caching (encryption not supported), and much more.

So it seems that all features are supported except for encryption. Thus, I am puzzled by "unstable": do you mean that ZFS is unstable itself, or that it is now stable on Linux but still unstable when used as the ceph FileStore filesystem? If so, what will happen if we use it: losing data or frequent crashes?

Nicheal
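For context, the option quoted above is a plain FileStore boolean (false by default). Turning it on would be a one-line ceph.conf change; this is only an illustrative sketch, and per the comment in the source the feature is considered unstable:

```ini
[osd]
# Default is false; the source comment warns that zfsonlinux is still
# unstable, so this is experimental
filestore zfs snap = true
```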
Re: v2 aligned buffer changes for erasure codes
Hi Andreas,

The per_chunk_alignment addresses a backward compatible change in the way they are calculated. The problem was that the initial calculation led to oversized chunks. The long explanation is at https://github.com/ceph/ceph/commit/c7daaaf5e63d0bd1d444385e62611fe276f6ce29

Please let me know if you see something wrong :-)

Cheers

On 18/09/2014 14:34, Andreas Joachim Peters wrote:
> Hi Janne/Loic,
>
> there is more confusion, at least on my side ...
>
> I had now a look at the jerasure plug-in and I am now slightly confused why
> you have two ways to return in get_alignment ... one is as I assume and
> another one is "per_chunk_alignment" ... what should the function return, Loic?
>
> Cheers Andreas.
>
> ________________________________________
> From: ceph-devel-ow...@vger.kernel.org [ceph-devel-ow...@vger.kernel.org] on
> behalf of Andreas Joachim Peters [andreas.joachim.pet...@cern.ch]
> Sent: 18 September 2014 14:18
> To: Janne Grunau; ceph-devel@vger.kernel.org
> Subject: RE: v2 aligned buffer changes for erasure codes
>
> Hi Janne,
>
> => (src/erasure-code/isa/README claims it needs 16*k byte aligned buffers)
>
> I should update the README since it is misleading ... it should say 8*k or
> 16*k byte aligned chunk size depending on the compiler/platform used; it is
> not the alignment of the allocated buffer addresses. The get_alignment
> function in the plug-in is used to compute the chunk size for the encoding
> (as I said, not the start address alignment).
>
> If you pass k buffers for decoding, each buffer should be aligned at least
> to 16 or, as you pointed out, better 32 bytes.
>
> For encoding there is normally a single buffer split 'virtually' into k
> pieces. To make all pieces start at an aligned address one needs to align
> the chunk size to e.g. 16*k. For the best possible performance on all
> platforms we should change the get_alignment function in the ISA plug-in to
> return 32*k if there are no other objections ?!?!
>
> Cheers Andreas.
> ________________________________________
> From: ceph-devel-ow...@vger.kernel.org [ceph-devel-ow...@vger.kernel.org] on
> behalf of Janne Grunau [j...@jannau.net]
> Sent: 18 September 2014 12:33
> To: ceph-devel@vger.kernel.org
> Subject: v2 aligned buffer changes for erasure codes
>
> Hi,
>
> following is an updated patchset. It now passes make check in src.
>
> It has the following changes:
> * use 32-byte alignment since the isa plugin uses AVX2
>   (src/erasure-code/isa/README claims it needs 16*k byte aligned buffers
>   but I can't see a reason why it would need more than 32 bytes)
> * ErasureCode::encode_prepare() handles more than one chunk with padding
>
> cheers
>
> Janne

--
Loïc Dachary, Artisan Logiciel Libre
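To make the chunk-size discussion above concrete, here is a minimal sketch (illustrative helper names, not the actual ISA plugin API) of what a get_alignment returning 32*k implies: the padded object length becomes a multiple of 32*k, so each of the k equal chunks carved out of one contiguous aligned buffer starts on a 32-byte boundary, which is what AVX2 wants.

```cpp
#include <cassert>

// Hypothetical sketch: round the object length up to a multiple of 32*k so
// that k equal chunks inside one contiguous buffer each start 32-byte
// aligned. Names are illustrative, not the real plugin interface.
constexpr unsigned kSimdAlign = 32;  // AVX2-friendly alignment

constexpr unsigned get_alignment(unsigned k) { return kSimdAlign * k; }

constexpr unsigned get_chunk_size(unsigned k, unsigned object_size) {
  // round object_size up to the next multiple of get_alignment(k),
  // then split into k equal chunks
  return ((object_size + get_alignment(k) - 1) / get_alignment(k))
         * get_alignment(k) / k;
}

// Every chunk size is a multiple of 32 bytes, so chunk i at offset
// i * chunk_size within an aligned buffer is itself 32-byte aligned.
static_assert(get_chunk_size(2, 100) == 64, "2 chunks of 64B cover 100B");
static_assert(get_chunk_size(3, 1) % kSimdAlign == 0, "chunks stay aligned");
```

Swapping kSimdAlign between 16 and 32 reproduces the 16*k vs 32*k chunk sizes debated in the thread; only the constant changes, not the rounding logic.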
Re: snap_trimming + backfilling is inefficient with many purged_snaps
> On 19 Sep 2014, at 08:12, Florian Haas wrote:
>
> On Fri, Sep 19, 2014 at 12:27 AM, Sage Weil wrote:
>> On Fri, 19 Sep 2014, Florian Haas wrote:
>>> Hi Sage,
>>>
>>> was the off-list reply intentional?
>>
>> Whoops! Nope :)
>>
>>> On Thu, Sep 18, 2014 at 11:47 PM, Sage Weil wrote:
>>>>> So, disaster is a pretty good description. Would anyone from the core
>>>>> team like to suggest another course of action or workaround, or are
>>>>> Dan and I generally on the right track to make the best out of a
>>>>> pretty bad situation?
>>>>
>>>> The short term fix would probably be to just prevent backfill for the
>>>> time being until the bug is fixed.
>>>
>>> As in, osd max backfills = 0?
>>
>> Yeah :)
>>
>> Just managed to reproduce the problem...
>>
>> sage
>
> Saw the wip branch. Color me freakishly impressed on the turnaround. :)
> Thanks!

Indeed :) Thanks Sage!

wip-9487-dumpling fixes the problem on my test cluster. Trying in prod now…

Cheers,
Dan
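For the record, the short-term workaround agreed on in this thread maps to a single OSD option. As a sketch, it can go in ceph.conf (it can also be injected at runtime with `ceph tell osd.* injectargs '--osd-max-backfills 0'`; check the exact flag spelling against your release):

```ini
[osd]
# Short-term workaround from this thread: stop backfill entirely until
# the snap-trim fix (wip-9487) is deployed, then restore the default
osd max backfills = 0
```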