Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread Christian Balzer

Hello,

On Sat, 19 Aug 2017 23:22:11 +0530 M Ranga Swami Reddy wrote:

> SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage -
> MZ-75E4T0B/AM | Samsung
>
And there's your answer.

A bit of googling in the archives here would have shown you that these are
TOTALLY unsuitable for use with Ceph.
Not only because of the horrid speed when used with/for Ceph journaling
(direct/sync I/O), but also because of their abysmal endurance of 0.04 DWPD
over 5 years.
In other words about 160GB/day (4TB x 0.04 DWPD), which after the Ceph
journal double writes, FS journals, other overhead and write amplification
in general probably means less than an effective 40GB/day of client writes.

In contrast the lowest endurance DC grade SSDs tend to be 0.3 DWPD and
more commonly 1 DWPD.
And I'm not buying anything below 3 DWPD for use with Ceph.

Your only chance to improve the speed here is to take the journals off
them and put them onto fast and durable enough NVMes like the Intel DC
P3700 or at worst P3600 types.

That still leaves you with their crappy endurance, only about twice as high
as before with the journals offloaded.
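
If you want to verify this yourself, the usual check is a direct, synchronous
4k write benchmark, which is what the journal workload looks like. A minimal
sketch (assuming fio is installed and /dev/sdX is a scratch SSD with nothing
on it -- the test overwrites it):

  # single-threaded 4k sync writes, the journal-like worst case
  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=journal-test

DC-grade SSDs typically sustain tens of thousands of IOPS here; consumer
drives like the EVOs usually manage only a few hundred to a couple of
thousand.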
 
Christian

> On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy
>  wrote:
> > Yes, it's in production and used the PG count as per the PG calculator @ 
> > ceph.com.
> >
> > On Fri, Aug 18, 2017 at 3:30 AM, Mehmet  wrote:  
> >> Which ssds are used? Are they in production? If so how is your PG Count?
> >>
> >> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy
> >> wrote:  
> >>>
> >>> Hello,
> >>> I am using the Ceph cluster with HDDs and SSDs. Created separate pool for
> >>> each.
> >>> Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
> >>> and the SSD OSDs show around 280 MB/s.
> >>>
> >>> Ideally, what I expected was that the SSD OSDs would be at least 40%
> >>> higher than the HDD OSD bench results.
> >>>
> >>> Did I miss anything here? Any hint is appreciated.
> >>>
> >>> Thanks
> >>> Swami
> >>> 
> >>>
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>  
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

2017-08-19 Thread Marc Roos


Where can you get the nfs-ganesha-ceph rpm? Is there a repository that 
has these?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VMware + Ceph using NFS sync/async ?

2017-08-19 Thread Maged Mokhtar
Hi Nick, 

Interesting your note on PG locking, but I would be surprised if its
effect is that bad. I would think that in your example the 2 ms is a
total latency; the lock is probably only held for a small portion of
that, so the concurrent operations are not serialized for the entire
time... but again I may be wrong. Also, if the lock were that bad, we
should see 4k sequential writes being much slower than random ones in
general testing, which is not the case. 
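
(If someone wants to check that claim, a rough sketch -- assuming fio and a
scratch, already-mapped RBD device at /dev/rbd0, a placeholder path; both
runs overwrite the device:

  # 4k sequential sync writes, many in flight against the same objects
  fio --name=seq4k --filename=/dev/rbd0 --rw=write --bs=4k --direct=1 \
      --sync=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based
  # 4k random sync writes spread across objects/PGs for comparison
  fio --name=rand4k --filename=/dev/rbd0 --rw=randwrite --bs=4k --direct=1 \
      --sync=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based

If PG/object locking fully serialized writes, the sequential run would lag
far behind the random one.)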

Another thing that may help with VM migration, as per your description, is
reducing the RBD stripe size to a couple of times smaller than 2M
(32 x 64k), for example along the lines of the sketch below. 
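
Something like this, as a sketch only (pool/image names are placeholders,
and non-default striping generally needs librbd clients rather than the
kernel rbd driver):

  # 64k stripe unit, 16-way striping => 1M stripe width instead of hitting
  # a single 4M object at a time
  rbd create rbd/esxi-vm01 --size 102400 --stripe-unit 65536 --stripe-count 16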

Maged 

On 2017-08-16 16:12, Nick Fisk wrote:

> Hi Matt, 
> 
> Well-behaved applications are the problem here. ESXi sends all writes as sync 
> writes. So although OSes will still do their own buffering, any ESXi-level 
> operation is all done as sync. This is probably seen the most when 
> migrating VMs between datastores: everything gets done as sync 64KB IOs, 
> meaning copying a 1TB VM can often take nearly 24 hours. 
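> 
> You can reproduce roughly that pattern outside ESXi with fio if you want to 
> see the effect -- a sketch, assuming a scratch RBD device at /dev/rbd0 
> (a placeholder; the run overwrites it): 
> 
>   # 64KB sequential sync writes, 32 outstanding, like a datastore migration
>   fio --name=esxi-migrate-sim --filename=/dev/rbd0 --rw=write --bs=64k \
>       --iodepth=32 --ioengine=libaio --direct=1 --sync=1 --runtime=60 \
>       --time_based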
> 
> Osama, can you describe the difference in performance you see between 
> Openstack and ESXi and what type of operations are these? Sync writes should 
> be the same no matter the client, except in the NFS case you will have an 
> extra network hop and potentially a little bit of PG congestion around the FS 
> journal on the RBD device. 
> 
> Osama, you can't compare Ceph to a SAN. Just in terms of network latency you 
> have an extra 2 hops. In ideal scenario you might be able to get Ceph write 
> latency down to 0.5-1ms for a 4KB IO, compared to about 0.1-0.3ms for a 
> storage array. However, what you will find with Ceph is that other things 
> start to increase this average long before you would start to see this on 
> storage arrays. 
> 
> The migration is a good example of this. As I said, ESXi migrates a vm in 
> 64KB io's, but does 32 of these blocks in parallel at a time. On storage 
> arrays, these 64KB IOs are coalesced in the battery-protected write cache 
> into bigger IOs before being persisted to disk. The storage array can also 
> accept all 32 of these requests at once. 
> 
> A similar thing happens in Ceph/RBD/NFS via the Ceph filestore journal, but 
> that coalescing is now an extra 2 hops away and with a bit of extra latency 
> introduced by the Ceph code, we are already a bit slower. But here's the 
> killer, PG locking!!! You can't write 32 IO's in parallel to the same 
> object/PG, each one has to be processed sequentially because of the locks. 
> (Please someone correct me if I'm wrong here). If your 64KB write latency is 
> 2ms, then you can only do 500 64KB IO's a second. 64KB*500=~30MB/s vs a 
> Storage Array which would be doing the operation in the hundreds of MB/s 
> range. 
> 
> Note: When proper iSCSI for RBD support is finished, you might be able to use 
> the VAAI offloads, which would dramatically increase performance for 
> migrations as well. 
> 
> Also once persistent SSD write caching for librbd becomes available, a lot of 
> these problems will go away, as the SSD will behave like a storage array's 
> write cache and will only be 1 hop away from the client as well. 
> 
> FROM: Matt Benjamin [mailto:mbenj...@redhat.com] 
> SENT: 16 August 2017 14:49
> TO: Osama Hasebou 
> CC: n...@fisk.me.uk; ceph-users 
> SUBJECT: Re: [ceph-users] VMware + Ceph using NFS sync/async ? 
> 
> Hi Osama, 
> 
> I don't have a clear sense of the application workflow here--and Nick 
> appears to--but I thought it worth noting that NFSv3 and NFSv4 clients 
> shouldn't normally need the sync mount option to achieve i/o stability with 
> well-behaved applications.  In both versions of the protocol, an application 
> write that is synchronous (or, more typically, the equivalent application 
> sync barrier) should not succeed until an NFS-protocol COMMIT (or in some 
> cases w/NFSv4, a WRITE with the stable flag set) has been acknowledged by the 
> NFS server.  If the NFS i/o stability model is insufficient for your 
> workflow, moreover, I'd be worried that -osync writes (which might be 
> incompletely applied during a failure event) may not be correctly enforcing 
> your invariant, either. 
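> 
> To make that concrete, a typical setup does not use the sync mount option at 
> all and relies on the application's own sync barriers -- a sketch, with 
> server:/export and the paths as placeholders: 
> 
>   # no 'sync' mount option; stability comes from COMMIT on fsync/close
>   mount -t nfs -o vers=4.1,hard server:/export /mnt/nfs
>   # application-style barrier: one fsync at the end of the copy
>   dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 conv=fsync
>   # per-write stability instead: every write is synchronous
>   dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 oflag=sync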
> 
> Matt 
> 
> On Wed, Aug 16, 2017 at 8:33 AM, Osama Hasebou  wrote:
> 
>> Hi Nick, 
>> 
>> Thanks for replying! If Ceph is combined with OpenStack, does that mean 
>> that when OpenStack writes are happening, the data is not fully synced (as 
>> in written to disk) before Ceph starts receiving more data, i.e. acting as 
>> async? In that scenario, is there a chance of data loss if things go bad, 
>> e.g. a power outage or something like that? 
>> 
>> As for the slow operations, reading is quite fine when I compare it to a SAN 
>> storage system connected to VMware. It is writing data, small chunks or big 
>> ones, that suffers when trying to use the sync option with FIO for 
>> 

Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread M Ranga Swami Reddy
SSD make details : SSD 850 EVO 2.5" SATA III 4TB Memory & Storage -
MZ-75E4T0B/AM | Samsung

On Sat, Aug 19, 2017 at 10:44 PM, M Ranga Swami Reddy
 wrote:
> Yes, it's in production and used the PG count as per the PG calculator @ 
> ceph.com.
>
> On Fri, Aug 18, 2017 at 3:30 AM, Mehmet  wrote:
>> Which ssds are used? Are they in production? If so how is your PG Count?
>>
>> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy
>> wrote:
>>>
>>> Hello,
>>> I am using the Ceph cluster with HDDs and SSDs. Created separate pool for
>>> each.
>>> Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
>>> and the SSD OSDs show around 280 MB/s.
>>>
>>> Ideally, what I expected was that the SSD OSDs would be at least 40%
>>> higher than the HDD OSD bench results.
>>>
>>> Did I miss anything here? Any hint is appreciated.
>>>
>>> Thanks
>>> Swami
>>> 
>>>
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How much max size of Bluestore WAL and DB can be used in the normal environment?

2017-08-19 Thread liao junwei
Hi,

According to the source code of Ceph 11.2.0, we know that the BlueStore WAL and 
DB are stored in the top 4% of the OSD disk space. But I found that they didn't 
really need that much. I decided to modify it to 0.5%, and the metadata size was 
usually just less than 0.2% in my experiments. Due to my experimental 
environment (only 10 1TB HDDs), I can't be sure whether the modification will be 
a problem, so I'd like to ask how much space the WAL and DB usually need.


PS: In order to reduce the memory consumption, I have to modify the RocksDB 
configuration: OPTION(bluestore_rocksdb_options, OPT_STR, 
compression=kNoCompression, max_write_buffer_number=2, 
min_write_buffer_number_to_merge=1, recycle_log_file_num=4, 
write_buffer_size=32768)
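
The same tuning can also be applied via ceph.conf instead of changing the
OPTION default in the source -- a sketch, reproducing the values above and
assuming the usual comma-separated RocksDB option string format:

  [osd]
  bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=2,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=32768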
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread M Ranga Swami Reddy
I did not run only "osd bench". I also mapped an RBD image and ran a dd test
on it; there too I got a much lower number with an image on the SSD pool
compared with an image on the HDD pool.
As per the SSD datasheet they claim 500 MB/s, but I am getting somewhere
near 50 MB/s with the dd command.


On Fri, Aug 18, 2017 at 6:32 AM, Christian Balzer  wrote:
>
> Hello,
>
> On Fri, 18 Aug 2017 00:00:09 +0200 Mehmet wrote:
>
>> Which ssds are used? Are they in production? If so how is your PG Count?
>>
> What he wrote.
> W/o knowing which apples you're comparing to what oranges, this is
> pointless.
>
> Also testing osd bench is the LEAST relevant test you can do, as it only
> deals with local bandwidth, while what people nearly always want/need in
> the end is IOPS and low latency.
> Which you test best from a real client perspective.
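> For example, something along these lines -- a sketch only; device and pool
> names are placeholders, and the writes are destructive to whatever is on
> them:
>
>   # large sequential writes, bypassing the page cache
>   dd if=/dev/zero of=/dev/rbd0 bs=4M count=1024 oflag=direct
>   # single-depth 4k sync writes: the latency/IOPS number that usually matters
>   fio --name=ssd-pool-test --filename=/dev/rbd0 --rw=randwrite --bs=4k \
>       --direct=1 --sync=1 --iodepth=1 --runtime=60 --time_based
>   # or straight against the pool
>   rados bench -p ssd-pool 60 write -b 4096 -t 16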
>
> Christian
>
>> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy 
>> wrote:
>> >Hello,
>> >I am using the Ceph cluster with HDDs and SSDs. Created separate pool
>> >for each.
>> >Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
>> >and the SSD OSDs show around 280 MB/s.
>> >
>> >Ideally, what I expected was that the SSD OSDs would be at least 40%
>> >higher than the HDD OSD bench results.
>> >
>> >Did I miss anything here? Any hint is appreciated.
>> >
>> >Thanks
>> >Swami
>> >___
>> >ceph-users mailing list
>> >ceph-users@lists.ceph.com
>> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Rakuten Communications
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster with SSDs

2017-08-19 Thread M Ranga Swami Reddy
Yes, it's in production and used the PG count as per the PG calculator @ ceph.com.

On Fri, Aug 18, 2017 at 3:30 AM, Mehmet  wrote:
> Which ssds are used? Are they in production? If so how is your PG Count?
>
> On 17 August 2017 20:04:25 MESZ, M Ranga Swami Reddy
> wrote:
>>
>> Hello,
>> I am using the Ceph cluster with HDDs and SSDs. Created separate pool for
>> each.
>> Now, when I ran "ceph osd bench", the HDD OSDs show around 500 MB/s
>> and the SSD OSDs show around 280 MB/s.
>>
>> Ideally, what I expected was that the SSD OSDs would be at least 40%
>> higher than the HDD OSD bench results.
>>
>> Did I miss anything here? Any hint is appreciated.
>>
>> Thanks
>> Swami
>> 
>>
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous radosgw hangs after a few hours

2017-08-19 Thread Martin Emrich
Hi!

Apparently the message had nothing to do with the issue. It was just that after 
the threads affected by the SIGHUP issue crashed, the keystone-related stuff 
was all that’s left.

Regards,

Martin

On 19.08.17, 00:34, "Kamble, Nitin A"  wrote:

I see the same issue with ceph v12.1.4 as well. We are not using openstack 
or keystone, and see these errors in the rgw log. RGW is not hanging though.

Thanks,
Nitin


From: ceph-users  on behalf of Martin 
Emrich 
Date: Monday, July 24, 2017 at 10:08 PM
To: Vasu Kulkarni , Vaibhav Bhembre 

Cc: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] Luminous radosgw hangs after a few hours

I created an issue: http://tracker.ceph.com/issues/20763
 
Regards,
 
Martin
 
From: Vasu Kulkarni 
Date: Monday, 24 July 2017 at 19:26
To: Vaibhav Bhembre 
Cc: Martin Emrich , "ceph-users@lists.ceph.com" 

Subject: Re: [ceph-users] Luminous radosgw hangs after a few hours
 
Please raise a tracker for rgw and also provide some additional journalctl 
logs and info(ceph version, os version etc): 
http://tracker.ceph.com/projects/rgw
 
On Mon, Jul 24, 2017 at 9:03 AM, Vaibhav Bhembre  
wrote:
I am seeing the same issue on upgrade to Luminous v12.1.0 from Jewel.
I am not using Keystone or OpenStack either and my radosgw daemon
hangs as well. I have to restart it to resume processing.

2017-07-24 00:23:33.057401 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 00:38:33.057524 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 00:53:33.057648 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:08:33.057749 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:23:33.057878 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:38:33.057964 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 01:53:33.058098 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22
2017-07-24 02:08:33.058225 7f196096a700  0 ERROR: keystone revocation
processing returned error r=-22

The following are my keystone config options:

"rgw_keystone_url": ""
"rgw_keystone_admin_token": ""
"rgw_keystone_admin_user": ""
"rgw_keystone_admin_password": ""
"rgw_keystone_admin_tenant": ""
"rgw_keystone_admin_project": ""
"rgw_keystone_admin_domain": ""
"rgw_keystone_barbican_user": ""
"rgw_keystone_barbican_password": ""
"rgw_keystone_barbican_tenant": ""
"rgw_keystone_barbican_project": ""
"rgw_keystone_barbican_domain": ""
"rgw_keystone_api_version": "2"
"rgw_keystone_accepted_roles": "Member
"rgw_keystone_accepted_admin_roles": ""
"rgw_keystone_token_cache_size": "1"
"rgw_keystone_revocation_interval": "900"
"rgw_keystone_verify_ssl": "true"
"rgw_keystone_implicit_tenants": "false"
"rgw_s3_auth_use_keystone": "false"

Is this fixed in RC2 by any chance?
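
One thing that might be worth trying in the meantime (an assumption on my
part, not a confirmed fix) is making sure nothing references keystone and
stopping the revocation thread from polling -- a ceph.conf sketch, with the
section name as a placeholder for the actual rgw instance:

    [client.rgw.gateway1]
    rgw_keystone_url =
    rgw_s3_auth_use_keystone = false
    # assumption: 0 disables the periodic revocation check; verify against
    # your release's documentation
    rgw_keystone_revocation_interval = 0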

On Thu, Jun 29, 2017 at 3:11 AM, Martin Emrich
 wrote:
> Since upgrading to 12.1, our Object Gateways hang after a few hours, I only
> see these messages in the log file:
>
>
>
> 2017-06-29 07:52:20.877587 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:07:20.877761 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:07:29.994979 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:22:20.877911 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:27:30.086119 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:37:20.878108 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:37:30.187696 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 08:52:20.878283 7fa8e01e5700  0 ERROR: keystone revocation
> processing returned error r=-22
>
> 2017-06-29 08:57:30.280881 7fa8e11e7700  0 process_single_logshard: Error in
> get_bucket_info: (2) No such file or directory
>
> 2017-06-29 09:07:20.878451 7fa8e01e5700  0 ERROR: keystone