[ceph-users] ceph admin ops 403 forever

2016-09-10 Thread zhu tong
Powerful list, please help!


I have spent hours figuring out why the server always responds with 403 to the
following request:

GET /{admin}/usage?format=json HTTP/1.1
Host: {fqdn}

s3Key=someKey
s3Secret=someSecret
dateValue=`TZ=GMT date +"%a, %d %b %Y %T"`
dateValue="$dateValue GMT"
host=http://192.168.57.101
if [ "$1" = "--get-usage" ]; then
  resource="/admin/usage?format=json"
  stringToSign="GET\n\n\n${dateValue}\n${resource}"
  signature=`echo -en "${stringToSign}" | openssl sha1 -hmac "${s3Secret}" -binary | base64`
  curl -v -i -X GET \
    -H "Date: ${dateValue}" \
    -H "Authorization: AWS ${s3Key}:${signature}" \
    "${host}${resource}"
fi
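
For what it's worth, the most likely cause of the 403 is signing the query
string: under AWS signature v2 (which the radosgw admin ops API uses), ordinary
query parameters such as format=json are not part of the CanonicalizedResource,
so only /admin/usage should go into the string to sign. A minimal sketch of the
corrected request, reusing the variables above (a guess at the fix, not a
verified one):

# sign only the resource path; append the query string to the URL afterwards
resource="/admin/usage"
query="?format=json"
stringToSign="GET\n\n\n${dateValue}\n${resource}"
signature=`echo -en "${stringToSign}" | openssl sha1 -hmac "${s3Secret}" -binary | base64`
curl -v -i -X GET \
  -H "Date: ${dateValue}" \
  -H "Authorization: AWS ${s3Key}:${signature}" \
  "${host}${resource}${query}"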


I have found some helpful examples:

The official:

https://github.com/ceph/ceph/blob/master/src/test/test_rgw_admin_log.cc


The community starred:

https://github.com/dyarnell/rgwadmin/blob/master/rgwadmin/rgw.py


And I modeled my authentication on theirs; the only difference is the language
(I think).


Thanks!



Re: [ceph-users] rgw meta pool

2016-09-10 Thread Pavan Rallabhandi
Thanks Casey for the reply, more on the tracker.

Thanks!

On 9/9/16, 11:32 PM, "ceph-users on behalf of Casey Bodley"  wrote:

Hi,

My (limited) understanding of this metadata heap pool is that it's an 
archive of metadata entries and their versions. According to Yehuda, 
this was intended to support recovery operations by reverting specific 
metadata objects to a previous version. But nothing has been implemented 
so far, and I'm not aware of any plans to do so. So these objects are 
being created, but never read or deleted.

This was discussed in the rgw standup this morning, and we agreed that 
this archival should be made optional (and default to off), most likely 
by assigning an empty pool name to the zone's 'metadata_heap' field. 
I've created a ticket at http://tracker.ceph.com/issues/17256 to track 
this issue.
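
If that change lands as described, disabling the archive might look something
like the following with the existing zone tooling (a sketch only, not a tested
procedure; it assumes the default zone):

# dump the zone params, blank the heap pool name, and write them back
radosgw-admin zone get --rgw-zone=default > zone.json
# edit zone.json so the field reads:  "metadata_heap": ""
radosgw-admin zone set --rgw-zone=default < zone.json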

Casey


On 09/09/2016 11:01 AM, Warren Wang - ISD wrote:
> A little extra context here. Currently the metadata pool looks like it is
> on track to exceed the number of objects in the data pool, over time. In a
> brand new cluster, we're already up to almost 2 million in each pool.
>
>  NAME                      ID  USED   %USED  MAX AVAIL  OBJECTS
>  default.rgw.buckets.data  17  3092G  0.86   345T       2013585
>  default.rgw.meta          25  743M   0      172T       1975937
>
> We're concerned this will be unmanageable over time.
>
> Warren Wang
>
>
> On 9/9/16, 10:54 AM, "ceph-users on behalf of Pavan Rallabhandi"
> <prallabha...@walmartlabs.com> wrote:
>
>> Any help on this is much appreciated; I am considering fixing this, given
>> it's a confirmed issue, unless I am missing something obvious.
>>
>> Thanks,
>> -Pavan.
>>
>> On 9/8/16, 5:04 PM, "ceph-users on behalf of Pavan Rallabhandi"
>> <prallabha...@walmartlabs.com> wrote:
>>
>> Trying it one more time on the users list.
>> 
>> In our clusters running Jewel 10.2.2, I see the default.rgw.meta pool
>> running into a large number of objects, potentially in the same range as
>> the number of objects contained in the data pool.
>> 
>> I understand that the immutable metadata entries are now stored in
>> this heap pool, but I couldn't work out why the metadata objects are
>> left in this pool even after the actual bucket/object/user deletions.
>> 
>> put_entry() seems to promptly store the same entries in the heap pool
>> (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_metadata.cc#L880),
>> but I never see them being reaped. Are they left there for some reason?
>> 
>> Thanks,
>> -Pavan.
>> 
>> 



[ceph-users] Re: ceph admin ops 403 forever

2016-09-10 Thread zhu tong
I want to see the RGW log output, so I tried to turn debug mode on. However,
after running the following commands, I still see only limited log output. How
do I turn on RGW debug mode?

# ceph daemon mon.ceph-node1 config show | grep debug_rgw
"debug_rgw": "1\/5",

# ceph tell mon.ceph-node1 injectargs --debug-rgw 5/5
injectargs:debug_rgw=5/5
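
Note that both commands above talk to the monitor, while debug_rgw is read by
the radosgw process itself, so raising it on the mon has no visible effect. A
hedged sketch of raising it on the gateway instead (the instance name
client.rgw.ceph-node1 and the socket path are assumptions; adjust them to your
deployment):

# at runtime, via the gateway's admin socket
ceph daemon /var/run/ceph/ceph-client.rgw.ceph-node1.asok config set debug_rgw 20/20

# or persistently in ceph.conf on the gateway host, then restart radosgw:
#   [client.rgw.ceph-node1]
#       debug rgw = 20/20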




From: ceph-users  on behalf of zhu tong
Sent: September 10, 2016 7:48:11
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph admin ops 403 forever




Re: [ceph-users] NFS gateway

2016-09-10 Thread jan hugo prins
Based on the advice of some people on this list, I have started testing
Ganesha-NFS in combination with Ceph. First results are very good and the
product looks promising. When I want to use this, I need to create a setup
where different systems can mount different parts of the tree. How do I
configure this? Do I just need to create different sets of exports, each with
the specific tree it exports and the correct access rights?
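
For reference, separate EXPORT blocks per subtree are the usual pattern; a
minimal ganesha.conf sketch with the CephFS FSAL (the export IDs, subtree
paths, pseudo paths, and client networks are made-up placeholders, and exact
option names can vary between Ganesha versions):

EXPORT {
    Export_Id = 1;
    Path = "/projects/team-a";      # subtree inside CephFS
    Pseudo = "/team-a";             # where NFS clients see it
    Access_Type = RW;
    FSAL { Name = CEPH; }
    CLIENT { Clients = 192.168.10.0/24; }
}

EXPORT {
    Export_Id = 2;
    Path = "/projects/team-b";
    Pseudo = "/team-b";
    Access_Type = RO;
    FSAL { Name = CEPH; }
    CLIENT { Clients = 192.168.20.0/24; }
}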

I read somewhere that you can have a distributed lock manager. Does this
mean that I could create a cluster of multiple NFS servers that all
share the same CephFS filesystem and that I can spread the client load?

Jan Hugo
 

On 09/07/2016 04:30 PM, jan hugo prins wrote:
> Hi,
>
> One of the use-cases I'm currently testing is the possibility to replace
> a NFS storage cluster using a Ceph cluster.
>
> The idea I have is to use a server as an intermediate gateway. On the
> client side it will expose a NFS share and on the Ceph side it will
> mount the CephFS using mount.ceph. The whole network that holds the Ceph
> environment is 10G connected and when I use the same server as S3
> gateway I can store files rather quickly. When I use the same server as
> a NFS gateway putting data on the Ceph cluster is really very slow.
>
> The reason we want to do this is that we want to create a dedicated Ceph
> storage network and have all clients that need some data access either
> use S3 or NFS to access the data. I want to do this this way because I
> don't want to give the clients in some specific networks full access to
> the Ceph filesystem.
>
> Has anyone tried this before? Is this the way to go, or are there better
> ways to fix this?
>

-- 
Met vriendelijke groet / Best regards,

Jan Hugo Prins
Infra and Isilon storage consultant

Better.be B.V.
Auke Vleerstraat 140 E | 7547 AN Enschede | KvK 08097527
T +31 (0) 53 48 00 694 | M +31 (0)6 26 358 951
jpr...@betterbe.com | www.betterbe.com




[ceph-users] active+clean+inconsistent: is an unexpected clone

2016-09-10 Thread Dzianis Kahanovich
ceph version 10.2.2-508-g9bfc0cf (9bfc0cf178dc21b0fe33e0ce3b90a18858abaf1b)

After adding and then re-adding an OSD on 1 of 3 nodes (size=3, min_size=2; by
mistake I killed a second OSD instead of marking it "out"), I got 2
active+clean+inconsistent PGs (in the RBD pool), both with one of these 2 "new"
OSDs as primary.

On the 2 other nodes the MD5 sums are equal; on the primary, the "clone" object
has zero size. Trying to remove the primary object(s) in various combinations
(clone only, clone & head) only results in the same files being recreated.

Lastly, I copied and removed the related RBD images, so now both objects are
not linked to anything.

Can I do something like removing every "clone" object (the ones with the clone
ID in the name, not "head") on all 3 replicas? Or will an OSD map check
somewhere be unhappy?
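
Before deleting anything by hand, it may help to dump exactly which shards
disagree; Jewel's rados has commands for this (a sketch; the pool name is
assumed from the description above):

# list PGs flagged inconsistent in the pool, then dump per-object details
rados list-inconsistent-pg rbd
rados list-inconsistent-obj 3.4e --format=json-pretty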

Also, I sometimes hit OSD assertions in various places. One workaround patch:
--- a/src/osd/ReplicatedPG.cc   2016-09-09 04:44:43.0 +0300
+++ b/src/osd/ReplicatedPG.cc   2016-09-09 04:45:10.0 +0300
@@ -3369,6 +3369,7 @@ ReplicatedPG::OpContextUPtr ReplicatedPG
   ObjectContextRef obc = get_object_context(coid, false, NULL);
   if (!obc) {
 derr << __func__ << "could not find coid " << coid << dendl;
+return NULL;
 assert(0);
   }
   assert(obc->ssc);


Example log output:

# grep -F " 3.4e " /var/log/ceph/ceph.log
2016-09-09 04:03:53.098672 osd.1 10.227.227.103:6801/24502 79 : cluster [INF]
3.4e repair starts
2016-09-09 04:13:14.387856 osd.1 10.227.227.103:6801/24502 80 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368
data_digest 0x != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 04:13:14.387916 osd.1 10.227.227.103:6801/24502 81 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368 is an
unexpected clone
2016-09-09 07:14:02.269172 osd.1 10.227.227.103:6802/23450 125 : cluster [INF]
3.4e repair starts
2016-09-09 07:25:21.918189 osd.1 10.227.227.103:6802/23450 126 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368
data_digest 0x != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 07:25:21.918297 osd.1 10.227.227.103:6802/23450 127 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368 is an
unexpected clone
2016-09-09 07:27:19.679535 osd.1 10.227.227.103:6802/23450 128 : cluster [ERR]
3.4e repair 0 missing, 1 inconsistent objects
2016-09-09 07:27:19.679565 osd.1 10.227.227.103:6802/23450 129 : cluster [ERR]
3.4e repair 2 errors, 1 fixed
2016-09-09 07:27:19.692833 osd.1 10.227.227.103:6802/23450 131 : cluster [INF]
3.4e deep-scrub starts
2016-09-09 07:46:53.794432 osd.1 10.227.227.103:6802/23450 132 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368
data_digest 0x != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 07:46:53.794546 osd.1 10.227.227.103:6802/23450 133 : cluster [ERR]
deep-scrub 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368 is an
unexpected clone
2016-09-09 07:49:27.524132 osd.1 10.227.227.103:6802/23450 134 : cluster [ERR]
3.4e deep-scrub 0 missing, 1 inconsistent objects
2016-09-09 07:49:27.524140 osd.1 10.227.227.103:6802/23450 135 : cluster [ERR]
3.4e deep-scrub 2 errors
2016-09-09 13:17:01.440590 osd.1 10.227.227.103:6802/23450 168 : cluster [INF]
3.4e repair starts
2016-09-09 13:27:49.534417 osd.1 10.227.227.103:6802/23450 169 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368
data_digest 0x != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 13:27:49.534482 osd.1 10.227.227.103:6802/23450 170 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368 is an
unexpected clone
2016-09-09 13:39:44.991204 osd.0 10.227.227.104:6803/32191 130 : cluster [INF]
3.4e starting backfill to osd.7 from (0'0,0'0] MAX to 27023'4325836
2016-09-09 17:18:49.709971 osd.1 10.227.227.103:6802/5237 14 : cluster [INF]
3.4e repair starts
2016-09-09 17:23:18.244064 osd.1 10.227.227.103:6802/5237 15 : cluster [ERR]
3.4e shard 1: soid 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368
data_digest 0x != known data_digest 0x83fa1440 from auth shard 0, size 0
!= known size 4194304
2016-09-09 17:23:18.244116 osd.1 10.227.227.103:6802/5237 16 : cluster [ERR]
repair 3.4e 3:73d0516f:::rbd_data.2d2082ae8944a.3239:2368 is an
unexpected clone
2016-09-09 17:24:26.490788 osd.1 10.227.227.103:6802/5237 17 : cluster [ERR]
3.4e repair 0 missing, 1 inconsistent objects
2016-09-09 17:24:26.490807 osd.1 10.227.227.103:6802/5237 18 : cluster [ERR]
3.4e repair 2 errors, 1 fixed

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.by/


[ceph-users] RGWZoneParams::create(): error creating default zone params: (17) File exists

2016-09-10 Thread Helmut Garrison
Hi

I installed Ceph and created an object store following the documentation, but
when I want to create a user this message appears, and after one or two seconds
the system shows my new user's details:

RGWZoneParams::create(): error creating default zone params: (17) File exists

What is this message for? Did I do something wrong, or do I have to change some
configuration? If so, please tell me what I should do. Thank you so much.

regards


Re: [ceph-users] NFS gateway

2016-09-10 Thread jan hugo prins
Hi Sean,

Thanks for the advice. I'm currently looking at it. First results are
promising.

Jan Hugo


On 09/07/2016 04:48 PM, Sean Redmond wrote:
> Have you seen this :
>
> https://github.com/nfs-ganesha/nfs-ganesha/wiki/Fsalsupport#CEPH
>
>




Re: [ceph-users] NFS gateway

2016-09-10 Thread jan hugo prins
I really think that using async exports in big production environments is a
no-go, but it could very well explain the issues. Last week I started testing
Ganesha, and so far the results look promising.

Jan Hugo.


On 09/07/2016 06:31 PM, David wrote:
> I have clients accessing CephFS over nfs (kernel nfs). I was seeing
> slow writes with sync exports. I haven't had a chance to investigate
> and in the meantime I'm exporting with async (not recommended, but
> acceptable in my environment). 
>
> I've been meaning to test out Ganesha for a while now
>
> @Sean, have you used Ganesha with Ceph? How does performance compare
> with kernel nfs?
>




Re: [ceph-users] NFS gateway

2016-09-10 Thread jan hugo prins
Hi John,
> Exporting kernel client mounts with the kernel NFS server is tested as
> part of the regular testing we do on CephFS, so you should find it
> pretty stable.  This is definitely a legitimate way of putting a layer
> of security between your application servers and your storage cluster.
>
> NFS Ganesha is also an option, that is not as well tested (yet) but it
> has the advantage that you can get nice up to date Ceph client code
> without worrying about upgrading the kernel.  I'm not sure if there
> are recent ganesha packages with the ceph FSAL enabled available
> online, so you may need to compile your own.
The CentOS 7 packages have it off by default, and I need them compiled
with Ceph 10.2.2 code instead of the default Ceph packages that come
with CentOS. So I did indeed do my own recompile.

>
> When you say you tried using the server as an NFS gateway, was that
> with kernel NFS + kernel CephFS?  What kind of activities did you find
> were running slowly (big files, small files, etc...)?
>
The gateway was indeed using both kernel NFS and kernel CephFS.
I only tested a simple rsync process, because I believe it shows very well what
production use would look like: a lot of reads and writes, a lot of files, all
sizes, etc. When this turned out to be really very slow, I put it aside for a
while and then got the idea to test Ganesha.






Re: [ceph-users] Ceph + VMware + Single Thread Performance

2016-09-10 Thread Alex Gorbachev
Confirming again: much better performance with ESXi and NFS on RBD using the
XFS hint Nick uses, below.
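
For reference, the XFS hint in question is the extent size hint, set on the
directory backing the NFS datastore so that newly created files inherit it; a
sketch (the 16m value and the path are assumptions):

# a larger extent size hint limits vmdk fragmentation with thin provisioning
xfs_io -c "extsize 16m" /srv/nfs/vmware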

I saw high load averages on the NFS server nodes, corresponding to iowait; it
does not seem to cause too much trouble so far.

Here are HDTune Pro testing results from some recent runs. The puzzling part is
the better random IO performance with a 16 MB object size on both iSCSI and
NFS. In my thinking this should be slower; however, this has been confirmed by
the timed vMotion tests and by more random IO tests from my coworker as well:
Test type    read MB/s  write MB/s  read IOPS  write IOPS  read multi IOPS  write multi IOPS
NFS 1mb            460         103       8753          66            47466              1616
NFS 4mb            441         147       8863          82            47556               764
iSCSI 1mb          117          76        326          90              672               938
iSCSI 4mb          275          60        205          24             2015              1212
NFS 16mb           455         177       7761         119            36403              3175
iSCSI 16mb         300          65       1117         237            12389              1826

( prettier view at
http://storcium.blogspot.com/2016/09/latest-tests-on-nfs-vs.html )

Alex

>
> From: Alex Gorbachev [mailto:a...@iss-integration.com]
> Sent: 04 September 2016 04:45
> To: Nick Fisk 
> Cc: Wilhelm Redbrake ; Horace Ng ; 
> ceph-users 
> Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance
>
>
>
>
>
> On Saturday, September 3, 2016, Alex Gorbachev  
> wrote:
>
> HI Nick,
>
> On Sun, Aug 21, 2016 at 3:19 PM, Nick Fisk  wrote:
>
> From: Alex Gorbachev [mailto:a...@iss-integration.com]
> Sent: 21 August 2016 15:27
> To: Wilhelm Redbrake 
> Cc: n...@fisk.me.uk; Horace Ng ; ceph-users 
> 
> Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance
>
>
>
>
>
> On Sunday, August 21, 2016, Wilhelm Redbrake  wrote:
>
> Hi Nick,
> I understand all of your technical improvements.
> But: why do you not use, for example, a simple Areca RAID controller with
> 8 GB cache and BBU on top in every Ceph node?
> Configure n times RAID 0 on the controller and enable write-back cache.
> That must be a latency "killer" like in all the proprietary storage arrays,
> or not?
>
> Best Regards !!
>
>
>
> What we saw specifically with Areca cards is that performance is excellent in 
> benchmarking and for bursty loads. However, once we started loading with more 
> constant workloads (we replicate databases and files to our Ceph cluster), 
> this looks to have saturated the relatively small Areca NVDIMM caches and we 
> went back to pure drive based performance.
>
>
>
> Yes, I think that is a valid point. Although low latency, you are still
> having to write to the disks twice (journal+data), so once the caches on the
> cards start filling up, you are going to hit problems.
>
>
>
>
>
> So we built 8 new nodes with no Arecas, with M500 SSDs for journals (1 SSD
> per 3 HDDs), in hopes that it would help reduce the noisy-neighbor impact.
> That worked, but now the overall latency is really high at times, though not
> always. A Red Hat engineer suggested this is due to loading the 7200 rpm
> NL-SAS drives with too many IOPS, which drives their latency sky high.
> Overall we are functioning fine, but I sure would like storage vMotion and
> other large operations to be faster.
>
>
>
>
>
> Yeah this is the biggest pain point I think. Normal VM ops are fine, but if 
> you ever have to move a multi-TB VM, it’s just too slow.
>
>
>
> If you use iscsi with vaai and are migrating a thick provisioned vmdk, then 
> performance is actually quite good, as the block sizes used for the copy are 
> a lot bigger.
>
>
>
> However, my use case required thin-provisioned VMs + snapshots, and I found
> that using iSCSI you have no control over the fragmentation of the vmdks, and
> so the read performance is then what suffers (certainly with 7.2k disks).
>
>
>
> Also, with thin-provisioned vmdks I think I was seeing PG contention with
> the updating of the VMFS metadata, although I can't be sure.
>
>
>
>
>
> I am thinking I will test a few different schedulers and readahead settings 
> to see if we can improve this by parallelizing reads. Also will test NFS, but 
> need to determine whether to do krbd/knfsd or something more interesting like 
> CephFS/Ganesha.
>
>
>
> As you know I’m on NFS now. I’ve found it a lot easier to get going and a lot 
> less sensitive to making config adjustments without suddenly everything 
> dropping offline. The fact that you can specify the extent size on XFS helps 
> massively with using thin vmdks/snapshots to avoid fragmentation. Storage 
> v-motions are a bit faster than iscsi, but I think I am hitting PG contention 
> when esxi tries to write 32 copy threads to the same object. There is 
> probably some tuning that could be done here (RBD striping???) but this is 
> the best it’s been for a long time and I’m reluctant to fiddle any further.
>
>
>
> We have moved ahead and added NFS support to Storcium, and are now able to
> run NFS servers with Pacemaker in HA mode (all agents are public at
> https://github.com/akurz/resource-agents/tree/master/heartbeat). I can
> confirm that VM performance is definitely better and benchmarks are smoother
> (in Windows we can see a lot of choppiness with iSC