Re: [ceph-users] Ceph cluster with SSDs
Hello,

On Sat, 19 Aug 2017 23:22:11 +0530 M Ranga Swami Reddy wrote:
> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage -
> MZ-75E4T0B/AM | Samsung

And there's your answer.
A bit of googling in the archives here would have shown you that these are TOTALLY unsuitable for use with Ceph.

Not only because of the horrid speed when used with/for Ceph journaling (direct/sync I/O), but also because of their abysmal endurance of 0.04 DWPD over 5 years.
In other words 160GB/day, which after the Ceph journal double writes, FS journals, other overhead and write amplification in general probably means less than an effective 40GB/day.

In contrast, the lowest-endurance DC-grade SSDs tend to be rated 0.3 DWPD, and more commonly 1 DWPD.
And I'm not buying anything below 3 DWPD for use with Ceph.

Your only chance to improve the speed here is to take the journals off them and put them onto fast and durable enough NVMes like the Intel DC P3700 or at worst P3600 types.
That still leaves you with their crappy endurance, only twice as high as before with the journals offloaded.

Christian

> On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy wrote:
> > Yes, it's in production and used the pg count as per the pg calculator @
> > ceph.com.
> >
> > On Fri, Aug 18, 2017 at 3:30 AM, Mehmet wrote:
> >> Which ssds are used? Are they in production? If so how is your PG Count?
> >>
> >> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy wrote:
> >>>
> >>> Hello,
> >>> I am using the Ceph cluster with HDDs and SSDs. Created a separate pool for
> >>> each.
> >>> Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
> >>> and the SSD OSDs show around 280 MB/s.
> >>>
> >>> Ideally, what I expected was: the SSD OSDs should be at least 40% higher
> >>> than the HDD OSD bench.
> >>>
> >>> Did I miss anything here? Any hint is appreciated.
> >>>
> >>> Thanks
> >>> Swami

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
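For the curious, the endurance arithmetic in the thread above can be sketched as follows. The capacity, DWPD and GB/day figures come from the thread itself; the combined 4x write-amplification factor is an assumption read off the 160GB/day -> ~40GB/day estimate, not a measured value:

```python
# Sketch of the endurance estimate from the thread. Figures are taken
# from the post; the combined 4x write amplification (journal double
# writes, FS journals, other overhead) is an assumed round factor.
def effective_daily_writes_gb(capacity_gb: float, dwpd: float,
                              write_amplification: float) -> float:
    """Rated daily write budget divided by total write amplification."""
    return capacity_gb * dwpd / write_amplification

rated = 4000 * 0.04                                  # 160 GB/day for a 4TB 850 EVO at 0.04 DWPD
usable = effective_daily_writes_gb(4000, 0.04, 4.0)  # ~40 GB/day effective
print(rated, usable)
```

The same function makes the contrast with DC-grade drives obvious: at 1 DWPD a 4TB drive budgets 4000GB/day before amplification, a 25x difference.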
[ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7
Where can you get the nfs-ganesha-ceph rpm? Is there a repository that has these?
Re: [ceph-users] VMware + Ceph using NFS sync/async ?
Hi Nick,

Interesting, your note on PG locking, but I would be surprised if its effect is that bad. I would think that in your example the 2 ms is the total latency; the lock will probably be applied to a small portion of that, so the concurrent operations are not serialized for the entire time... but again I may be wrong. Also, if the lock were that bad, then we should see 4k sequential writes be much slower than random ones in general testing, which is not the case.

Another thing that may help in VM migration, as per your description, is reducing the rbd stripe size to be a couple of times smaller than 2M (32 x 64k).

Maged

On 2017-08-16 16:12, Nick Fisk wrote:
> Hi Matt,
>
> Well-behaved applications are the problem here. ESXi sends all writes as sync
> writes. So although OSs will still do their own buffering, any ESXi-level
> operation is all done as sync. This is probably seen the greatest when
> migrating VMs between datastores: everything gets done as sync 64KB IOs,
> meaning copying a 1TB VM can often take nearly 24 hours.
>
> Osama, can you describe the difference in performance you see between
> OpenStack and ESXi, and what type of operations these are? Sync writes should
> be the same no matter the client, except in the NFS case you will have an
> extra network hop and potentially a little bit of PG congestion around the FS
> journal on the RBD device.
>
> Osama, you can't compare Ceph to a SAN. Just in terms of network latency you
> have an extra 2 hops. In an ideal scenario you might be able to get Ceph write
> latency down to 0.5-1ms for a 4KB IO, compared to about 0.1-0.3ms for a
> storage array. However, what you will find with Ceph is that other things
> start to increase this average long before you would start to see this on
> storage arrays.
>
> The migration is a good example of this. As I said, ESXi migrates a VM in
> 64KB IOs, but does 32 of these blocks in parallel at a time. On storage
> arrays, these 64KB IOs are coalesced in the battery-protected write cache
> into bigger IOs before being persisted to disk. The storage array can also
> accept all 32 of these requests at once.
>
> A similar thing happens in Ceph/RBD/NFS via the Ceph filestore journal, but
> that coalescing is now an extra 2 hops away, and with a bit of extra latency
> introduced by the Ceph code, we are already a bit slower. But here's the
> killer: PG locking!!! You can't write 32 IOs in parallel to the same
> object/PG; each one has to be processed sequentially because of the locks.
> (Please someone correct me if I'm wrong here.) If your 64KB write latency is
> 2ms, then you can only do 500 64KB IOs a second. 64KB*500 = ~30MB/s, vs a
> storage array which would be doing the operation in the hundreds of MB/s
> range.
>
> Note: When proper iSCSI support for RBD is finished, you might be able to use
> the VAAI offloads, which would dramatically increase performance for
> migrations as well.
>
> Also, once persistent SSD write caching for librbd becomes available, a lot of
> these problems will go away, as the SSD will behave like a storage array's
> write cache and will only be 1 hop away from the client as well.
>
> From: Matt Benjamin [mailto:mbenj...@redhat.com]
> Sent: 16 August 2017 14:49
> To: Osama Hasebou
> Cc: n...@fisk.me.uk; ceph-users
> Subject: Re: [ceph-users] VMware + Ceph using NFS sync/async ?
>
> Hi Osama,
>
> I don't have a clear sense of the application workflow here--and Nick
> appears to--but I thought it worth noting that NFSv3 and NFSv4 clients
> shouldn't normally need the sync mount option to achieve I/O stability with
> well-behaved applications. In both versions of the protocol, an application
> write that is synchronous (or, more typically, the equivalent application
> sync barrier) should not succeed until an NFS-protocol COMMIT (or in some
> cases with NFSv4, a WRITE with the stable flag set) has been acknowledged by
> the NFS server. If the NFS I/O stability model is insufficient for your
> workflow, moreover, I'd be worried that -o sync writes (which might be
> incompletely applied during a failure event) may not be correctly enforcing
> your invariant, either.
>
> Matt
>
> On Wed, Aug 16, 2017 at 8:33 AM, Osama Hasebou wrote:
>
>> Hi Nick,
>>
>> Thanks for replying! If Ceph is combined with OpenStack, does that mean
>> that when OpenStack writes are happening, it is not fully synced
>> (as in written to disks) before it starts receiving more data, so acting as
>> async? In that scenario, is there a chance of data loss if things go bad,
>> i.e. a power outage or something like that?
>>
>> As for the slow operations, reading is quite fine when I compare it to a SAN
>> storage system connected to VMware. It is writing data, small chunks or big
>> ones, that suffers when trying to use the sync option with FIO for
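Nick's back-of-the-envelope number can be reproduced with a small sketch. Nothing here touches a Ceph API; it is just the serialization arithmetic from the paragraph above, under the thread's own assumption that PG locking fully serializes writes to one object:

```python
# If PG locking serializes the 32 parallel 64KB writes to one object,
# throughput is bounded by io_size / latency no matter how many
# requests the client has in flight (the thread's 2ms -> ~30MB/s case).
def serialized_throughput_mib_s(io_size_kib: float, latency_ms: float) -> float:
    iops = 1000.0 / latency_ms          # one IO completes at a time
    return iops * io_size_kib / 1024.0  # KiB/s -> MiB/s

print(serialized_throughput_mib_s(64, 2.0))  # 31.25, i.e. the ~30 MB/s above
```

This also shows why Maged's doubt matters: if the lock is held for only a fraction of the 2ms, the effective serialized latency drops and the bound rises proportionally.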
Re: [ceph-users] Ceph cluster with SSDs
SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage - MZ-75E4T0B/AM | Samsung

On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy wrote:
> Yes, it's in production and used the pg count as per the pg calculator @
> ceph.com.
>
> On Fri, Aug 18, 2017 at 3:30 AM, Mehmet wrote:
>> Which ssds are used? Are they in production? If so how is your PG Count?
>>
>> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy wrote:
>>>
>>> Hello,
>>> I am using the Ceph cluster with HDDs and SSDs. Created a separate pool for
>>> each.
>>> Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
>>> and the SSD OSDs show around 280 MB/s.
>>>
>>> Ideally, what I expected was: the SSD OSDs should be at least 40% higher
>>> than the HDD OSD bench.
>>>
>>> Did I miss anything here? Any hint is appreciated.
>>>
>>> Thanks
>>> Swami
[ceph-users] How much max size of Bluestore WAL and DB can used in the normal environment?
Hi,

According to the source code of Ceph 11.2.0, we know that the Bluestore WAL and DB are stored in the top 4% of the OSD disk space. But I found that it didn't really need so much. I decided to modify it to 0.5%, and the metadata size was usually just less than 0.2% in my experiments. Due to my experimental environment (only 10 1TB HDDs), I can't be sure that the modification won't be a problem, so I'd like to ask you how much space the WAL and DB usually need.

PS: In order to reduce memory consumption, I had to modify the rocksdb configuration:

OPTION(bluestore_rocksdb_options, OPT_STR, "compression=kNoCompression, max_write_buffer_number=2, min_write_buffer_number_to_merge=1, recycle_log_file_num=4, write_buffer_size=32768")
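For anyone who would rather not patch the source, the same rocksdb tuning can presumably be expressed as a ceph.conf override. This is a sketch only: the option name and values are the ones quoted above, but the effect of such small write buffers on stability has not been verified in this thread.

```
# Hedged sketch: the rocksdb tuning from the post as a ceph.conf
# override instead of a source modification. Values as quoted in the
# thread; their safety in production is not established there.
[osd]
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=2,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=32768
```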
Re: [ceph-users] Ceph cluster with SSDs
I did not only run "osd bench". I also mapped an rbd image and performed a dd test on it... there I also got a very low number with the image on the SSD pool as compared with the image on the HDD pool.
As per the SSD datasheet they claim 500 MB/s, but I am getting somewhere near 50 MB/s with the dd cmd.

On Fri, Aug 18, 2017 at 6:32 AM, Christian Balzer wrote:
>
> Hello,
>
> On Fri, 18 Aug 2017 00:00:09 +0200 Mehmet wrote:
>
>> Which ssds are used? Are they in production? If so how is your PG Count?
>>
> What he wrote.
> W/o knowing which apples you're comparing to what oranges, this is
> pointless.
>
> Also, testing osd bench is the LEAST relevant test you can do, as it only
> deals with local bandwidth, while what people nearly always want/need in
> the end is IOPS and low latency.
> Which you test best from a real client perspective.
>
> Christian
>
>> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy wrote:
>> > Hello,
>> > I am using the Ceph cluster with HDDs and SSDs. Created a separate pool
>> > for each.
>> > Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
>> > and the SSD OSDs show around 280 MB/s.
>> >
>> > Ideally, what I expected was: the SSD OSDs should be at least 40% higher
>> > than the HDD OSD bench.
>> >
>> > Did I miss anything here? Any hint is appreciated.
>> >
>> > Thanks
>> > Swami
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Rakuten Communications
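For reference, a minimal sketch of the kind of sync-write dd test described above. The target path is a placeholder for a file on the mounted rbd image; oflag=dsync forces every block to stable storage, which is what exposes poor sync-write SSDs, and adding `direct` to oflag (where the filesystem supports O_DIRECT) additionally bypasses the page cache:

```shell
# Rough sequential sync-write test, as in the thread's dd comparison.
# TARGET is a placeholder -- point it at a file on the mapped/mounted
# rbd image you want to test. oflag=dsync makes dd wait for each 4M
# block to reach stable storage before issuing the next one.
TARGET=./ddtest.bin
dd if=/dev/zero of="$TARGET" bs=4M count=16 oflag=dsync 2>&1 | tail -n 1
rm -f "$TARGET"
```

As Christian notes, a bandwidth number like this is still only part of the picture; a 4k sync-IOPS test from a real client is usually more telling.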
Re: [ceph-users] Ceph cluster with SSDs
Yes, it's in production and used the pg count as per the pg calculator @ ceph.com.

On Fri, Aug 18, 2017 at 3:30 AM, Mehmet wrote:
> Which ssds are used? Are they in production? If so how is your PG Count?
>
> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy wrote:
>>
>> Hello,
>> I am using the Ceph cluster with HDDs and SSDs. Created a separate pool for
>> each.
>> Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
>> and the SSD OSDs show around 280 MB/s.
>>
>> Ideally, what I expected was: the SSD OSDs should be at least 40% higher
>> than the HDD OSD bench.
>>
>> Did I miss anything here? Any hint is appreciated.
>>
>> Thanks
>> Swami
Re: [ceph-users] Luminous radosgw hangs after a few hours
Hi!

Apparently the message had nothing to do with the issue. It was just that after the threads affected by the SIGHUP issue crashed, the keystone-related stuff was all that was left.

Regards,
Martin

On 19.08.17, 00:34, "Kamble, Nitin A" wrote:

    I see the same issue with ceph v12.1.4 as well. We are not using openstack or keystone, and we see these errors in the rgw log. RGW is not hanging, though.

    Thanks,
    Nitin

From: ceph-users on behalf of Martin Emrich
Date: Monday, July 24, 2017 at 10:08 PM
To: Vasu Kulkarni, Vaibhav Bhembre
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Luminous radosgw hangs after a few hours

I created an issue: http://tracker.ceph.com/issues/20763

Regards,
Martin

From: Vasu Kulkarni
Date: Monday, 24 July 2017 at 19:26
To: Vaibhav Bhembre
Cc: Martin Emrich, "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Luminous radosgw hangs after a few hours

Please raise a tracker for rgw and also provide some additional journalctl logs and info (ceph version, OS version etc.): http://tracker.ceph.com/projects/rgw

On Mon, Jul 24, 2017 at 9:03 AM, Vaibhav Bhembre wrote:

I am seeing the same issue on upgrade to Luminous v12.1.0 from Jewel. I am not using Keystone or OpenStack either, and my radosgw daemon hangs as well. I have to restart it to resume processing.
2017-07-24 00:23:33.057401 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22
2017-07-24 00:38:33.057524 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22
2017-07-24 00:53:33.057648 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22
2017-07-24 01:08:33.057749 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22
2017-07-24 01:23:33.057878 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22
2017-07-24 01:38:33.057964 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22
2017-07-24 01:53:33.058098 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22
2017-07-24 02:08:33.058225 7f196096a700 0 ERROR: keystone revocation processing returned error r=-22

The following are my keystone config options:

"rgw_keystone_url": ""
"rgw_keystone_admin_token": ""
"rgw_keystone_admin_user": ""
"rgw_keystone_admin_password": ""
"rgw_keystone_admin_tenant": ""
"rgw_keystone_admin_project": ""
"rgw_keystone_admin_domain": ""
"rgw_keystone_barbican_user": ""
"rgw_keystone_barbican_password": ""
"rgw_keystone_barbican_tenant": ""
"rgw_keystone_barbican_project": ""
"rgw_keystone_barbican_domain": ""
"rgw_keystone_api_version": "2"
"rgw_keystone_accepted_roles": "Member"
"rgw_keystone_accepted_admin_roles": ""
"rgw_keystone_token_cache_size": "1"
"rgw_keystone_revocation_interval": "900"
"rgw_keystone_verify_ssl": "true"
"rgw_keystone_implicit_tenants": "false"
"rgw_s3_auth_use_keystone": "false"

Is this fixed in RC2 by any chance?
On Thu, Jun 29, 2017 at 3:11 AM, Martin Emrich wrote:
> Since upgrading to 12.1, our Object Gateways hang after a few hours. I only
> see these messages in the log file:
>
> 2017-06-29 07:52:20.877587 7fa8e01e5700 0 ERROR: keystone revocation processing returned error r=-22
> 2017-06-29 08:07:20.877761 7fa8e01e5700 0 ERROR: keystone revocation processing returned error r=-22
> 2017-06-29 08:07:29.994979 7fa8e11e7700 0 process_single_logshard: Error in get_bucket_info: (2) No such file or directory
> 2017-06-29 08:22:20.877911 7fa8e01e5700 0 ERROR: keystone revocation processing returned error r=-22
> 2017-06-29 08:27:30.086119 7fa8e11e7700 0 process_single_logshard: Error in get_bucket_info: (2) No such file or directory
> 2017-06-29 08:37:20.878108 7fa8e01e5700 0 ERROR: keystone revocation processing returned error r=-22
> 2017-06-29 08:37:30.187696 7fa8e11e7700 0 process_single_logshard: Error in get_bucket_info: (2) No such file or directory
> 2017-06-29 08:52:20.878283 7fa8e01e5700 0 ERROR: keystone revocation processing returned error r=-22
> 2017-06-29 08:57:30.280881 7fa8e11e7700 0 process_single_logshard: Error in get_bucket_info: (2) No such file or directory
> 2017-06-29 09:07:20.878451 7fa8e01e5700 0 ERROR: keystone
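Since keystone is not actually in use in these setups, a plausible way to silence the periodic revocation pass is a ceph.conf tweak. This is a sketch only: the option name appears in the config dump earlier in the thread, but using 0 as a disable switch is an assumption on my part, not something confirmed here.

```
# Hypothetical workaround (not confirmed in this thread): with no
# keystone configured, stop the periodic revocation pass that logs
# "keystone revocation processing returned error r=-22".
# rgw_keystone_revocation_interval is shown as 900 in the config dump
# above; 0 as "disabled" is an assumption.
[client.rgw]
rgw_keystone_revocation_interval = 0
```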