Re: [Gluster-devel] glusterfs 3.12.2: bricks do not start on NetBSD
Emmanuel Dreyfus wrote:

> [2017-11-02 12:32:57.429885] E [MSGID: 115092]
> [server-handshake.c:586:server_setvolume] 0-gfs-server: No xlator
> /export/wd0e is found in child status list
> [2017-11-02 12:32:57.430162] I [MSGID: 115091]
> [server-handshake.c:761:server_setvolume] 0-gfs-server: Failed to get
> client opversion

Problem solved through gluster volume sync on each server right after upgrading. I still do not know what went wrong, but I have a workaround.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Coverity fixes
On Fri, Nov 3, 2017 at 9:25 AM, Atin Mukherjee wrote:

> On Fri, 3 Nov 2017 at 18:31, Kaleb S. KEITHLEY wrote:
>
>> On 11/02/2017 10:19 AM, Atin Mukherjee wrote:
>> > While I appreciate the folks to contribute lot of coverity fixes over
>> > last few days, I have an observation for some of the patches the
>> > coverity issue id(s) are *not* mentioned which gets maintainers in a
>> > difficult situation to understand the exact complaint coming out of the
>> > coverity. From my past experience in fixing coverity defects, sometimes
>> > the fixes might look simple but they are not.
>> >
>> > May I request all the developers to include the defect id in the commit
>> > message for all the coverity fixes?
>>
>> How does that work? AFAIK the defect IDs are constantly changing as some
>> get fixed and new ones get added.
>
> We’d need atleast (a) the defect id with pointer to the coverity link
> which most of the devs are now following I guess but with a caveat that
> link goes stale in 7 days and the review needs to be done by that time or
> (b) the commit message should exactly have the coverity description which
> is more neat.
>
> ( I was not knowing the fact the defect id are not constant and later on
> got to know this from Nigel today)

+1 to providing a clean description of the issue rather than using a temporary defect ID.

-Vijay
[Gluster-devel] Gluster Summit Discussion: Time taken for regression tests
All,

While we discussed many other things, we also discussed reducing the time taken for the regression jobs. As it stands now, it takes around 5hr 40mins to complete a single run.

There were many suggestions:

- Run them in parallel (as each .t test is independent of the others).
- Revisit the tests that take a long time (20 tests take almost 6000 seconds as of now).
- See if we can run the tests in docker (but the issue is that the machines we have are 2-core, so there may not be much gain).

There are other suggestions as well:

- Spend effort to see if there are repeated steps, and merge the tests.
  - Most of the time is spent in starting the processes and cleaning up.
  - Most of the tests run a similar volume create command (depending on the volume type), and run a few different types of I/O in different tests.
  - Try to see if these things can be merged.
  - Most of the bug-fix .t files belong to this category too.
- Classify the tests specific to a few non-overlapping volume types and, depending on the changeset in the patch (based on the files changed), decide which groups to run.
  - For example, you can't have replicate and disperse volume types together.

More ideas and suggestions welcome.

Regards,
Amar
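The last suggestion above, picking test groups from the files a patch touches, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function `demo_group_for_path`, the `demo_rule` table and the group labels are hypothetical names, and the mapping from source directories to volume types is an assumption, not something the regression framework is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a source-tree prefix and the test group it selects.
 * The group names are illustrative labels, not existing job names. */
struct demo_rule {
    const char *prefix;
    const char *group;
};

/* Assumed mapping from translator directories to volume-type groups. */
static const struct demo_rule demo_rules[] = {
    { "xlators/cluster/afr/", "replicate"  },
    { "xlators/cluster/ec/",  "disperse"   },
    { "xlators/cluster/dht/", "distribute" },
};

/* Return the test group for one changed file. Paths that match no rule
 * fall back to "all", so unclassified changes still run the full suite. */
static const char *
demo_group_for_path (const char *path)
{
    size_t i;
    size_t n = sizeof (demo_rules) / sizeof (demo_rules[0]);

    for (i = 0; i < n; i++) {
        size_t len = strlen (demo_rules[i].prefix);

        if (strncmp (path, demo_rules[i].prefix, len) == 0)
            return demo_rules[i].group;
    }
    return "all";
}
```

Falling back to the full suite for unknown paths keeps the selection conservative: a patch touching shared code such as libglusterfs would still run everything.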
Re: [Gluster-devel] [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+
Just so I am clear the upgrade process will be as follows: upgrade all clients to 4.0 rolling upgrade all servers to 4.0 (with GD1) kill all GD1 daemons on all servers and run upgrade script (new clients unable to connect at this point) start GD2 ( necessary or does the upgrade script do this?) I assume that once the cluster had been migrated to GD2 the glusterd startup script will be smart enough to start the correct version? -Thanks On 3 November 2017 at 04:06, Kaushal Mwrote: > On Thu, Nov 2, 2017 at 7:53 PM, Darrell Budic > wrote: > > Will the various client packages (centos in my case) be able to > > automatically handle the upgrade vs new install decision, or will we be > > required to do something manually to determine that? > > We should be able to do this with CentOS (and other RPM based distros) > which have well split glusterfs packages currently. > At this moment, I don't know exactly how much can be handled > automatically, but I expect the amount of manual intervention to be > minimal. > The least minimum amount of manual work needed would be enabling and > starting GD2 and starting the migration script. > > > > > It’s a little unclear that things will continue without interruption > because > > of the way you describe the change from GD1 to GD2, since it sounds like > it > > stops GD1. > > With the described upgrade strategy, we can ensure continuous volume > access to clients during the whole process (provided volumes have been > setup with replication or ec). > > During the migration from GD1 to GD2, any existing clients still > retain access, and can continue to work without interruption. > This is possible because gluster keeps the management (glusterds) and > data (bricks and clients) parts separate. > So it is possible to interrupt the management parts, without > interrupting data access to existing clients. > Clients and the server side brick processes need GlusterD to start up. > But once they're running, they can run without GlusterD. 
GlusterD is > only required again if something goes wrong. > Stopping GD1 during the migration process, will not lead to any > interruptions for existing clients. > The brick process continue to run, and any connected clients continue > to remain connected to the bricks. > Any new clients which try to mount the volumes during this migration > will fail, as a GlusterD will not be available (either GD1 or GD2). > > > Early days, obviously, but if you could clarify if that’s what > > we’re used to as a rolling upgrade or how it works, that would be > > appreciated. > > A Gluster rolling upgrade process, allows data access to volumes > during the process, while upgrading the brick processes as well. > Rolling upgrades with uninterrupted access requires that volumes have > redundancy (replicate or ec). > Rolling upgrades involves upgrading servers belonging to a redundancy > set (replica set or ec set), one at a time. > One at a time, > - A server is picked from a redundancy set > - All Gluster processes are killed on the server, glusterd, bricks and > other daemons included. > - Gluster is upgraded and restarted on the server > - A heal is performed to heal new data onto the bricks. > - Move onto next server after heal finishes. > > Clients maintain uninterrupted access, because a full redundancy set > is never taken offline all at once. > > > Also clarification that we’ll be able to upgrade from 3.x > > (3.1x?) to 4.0, manually or automatically? > > Rolling upgrades from 3.1x to 4.0 are a manual process. But I believe, > gdeploy has playbooks to automate it. > At the end of this you will be left with a 4.0 cluster, but still be > running GD1. > Upgrading from GD1 to GD2, in 4.0 will be a manual process. A script > that automates this is planned only for 4.1. 
> > > > > > > > > From: Kaushal M > > Subject: [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+ > > Date: November 2, 2017 at 3:56:05 AM CDT > > To: gluster-us...@gluster.org; Gluster Devel > > > > We're fast approaching the time for Gluster-4.0. And we would like to > > set out the expected upgrade strategy and try to polish it to be as > > user friendly as possible. > > > > We're getting this out here now, because there was quite a bit of > > concern and confusion regarding the upgrades between 3.x and 4.0+. > > > > --- > > ## Background > > > > Gluster-4.0 will bring a newer management daemon, GlusterD-2.0 (GD2), > > which is backwards incompatible with the GlusterD (GD1) in > > GlusterFS-3.1+. As a hybrid cluster of GD1 and GD2 cannot be > > established, rolling upgrades are not possible. This meant that > > upgrades from 3.x to 4.0 would require a volume downtime and possible > > client downtime. > > > > This was a cause of concern among many during the recently concluded > > Gluster Summit 2017. > > > > We would like to keep pains experienced by our users to a
Re: [Gluster-devel] Coverity fixes
On Fri, 3 Nov 2017 at 18:31, Kaleb S. KEITHLEY wrote:

> On 11/02/2017 10:19 AM, Atin Mukherjee wrote:
> > While I appreciate the folks to contribute lot of coverity fixes over
> > last few days, I have an observation for some of the patches the
> > coverity issue id(s) are *not* mentioned which gets maintainers in a
> > difficult situation to understand the exact complaint coming out of the
> > coverity. From my past experience in fixing coverity defects, sometimes
> > the fixes might look simple but they are not.
> >
> > May I request all the developers to include the defect id in the commit
> > message for all the coverity fixes?
> >
>
> How does that work? AFAIK the defect IDs are constantly changing as some
> get fixed and new ones get added.

We’d need at least one of the following:

(a) the defect id with a pointer to the coverity link, which most of the devs are now following I guess, but with the caveat that the link goes stale in 7 days and the review needs to be done by that time, or
(b) the commit message should contain the exact coverity description, which is neater.

(I did not know that the defect ids are not constant; I got to know this from Nigel today.)

> (And I know everyone looks at the coverity report after their new code
> is committed to see if they might have added a new issue.)
>
> Today's defect ID 435 might be 436 or 421 tomorrow.
>
> --
>
> Kaleb

--
- Atin (atinm)
Re: [Gluster-devel] Coverity fixes
On 11/02/2017 10:19 AM, Atin Mukherjee wrote:
> While I appreciate the folks to contribute lot of coverity fixes over
> last few days, I have an observation for some of the patches the
> coverity issue id(s) are *not* mentioned which gets maintainers in a
> difficult situation to understand the exact complaint coming out of the
> coverity. From my past experience in fixing coverity defects, sometimes
> the fixes might look simple but they are not.
>
> May I request all the developers to include the defect id in the commit
> message for all the coverity fixes?
>

How does that work? AFAIK the defect IDs are constantly changing as some get fixed and new ones get added.

(And I know everyone looks at the coverity report after their new code is committed to see if they might have added a new issue.)

Today's defect ID 435 might be 436 or 421 tomorrow.

--

Kaleb
[Gluster-devel] Coverity covscan for 2017-11-03-2ef2b600 (master branch)
GlusterFS Coverity covscan results are available from http://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-11-03-2ef2b600
Re: [Gluster-devel] [Gluster-users] BoF - Gluster for VM store use case
Below you can find the three fio commands used for running each benchmark test: sequential write, random 4k read and random 4k write.

# fio --name=writefile --size=10G --filesize=10G --filename=fio_file --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio

# fio --time_based --name=benchmark --size=10G --runtime=30 --filename=fio_file --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k --group_reporting

# fio --time_based --name=benchmark --size=10G --runtime=30 --filename=fio_file --ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting

And here is the timed extraction of the kernel source, first run:

# time tar xf linux-4.13.11.tar.xz

real 0m8.180s
user 0m5.932s
sys 0m2.924s

Second run, after deleting the first:

# rm -rf linux-4.13.11
# time tar xf linux-4.13.11.tar.xz

real 0m6.454s
user 0m6.012s
sys 0m2.440s

On 03/11/17 at 09:33, Gandalf Corvotempesta wrote:

Could you please share fio command line used for this test? Additionally, can you tell me the time needed to extract the kernel source?

On 2 Nov 2017 11:24 PM, "Ramon Selga" wrote:

Hi,

Just for your reference we got some similar values in a customer setup with three nodes single Xeon and 4x8TB HDD each with a double 10GbE backbone.

We did a simple benchmark with fio tool on a virtual disk (virtio) of a 1TiB of size, XFS formatted directly no partitions no LVM, inside a VM (debian stretch, dual core 4GB RAM) deployed in a gluster volume disperse 3 redundancy 1 distributed 2, sharding enabled.
We run a sequential write test 10GB file in 1024k blocks, a random read test with 4k blocks and a random write test also with 4k blocks several times with results very similar to the following: writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=200 fio-2.16 Starting 1 process writefile: (groupid=0, jobs=1): err= 0: pid=11515: Thu Nov 2 16:50:05 2017 write: io=10240MB, bw=473868KB/s, iops=462, runt= 22128msec slat (usec): min=20, max=98830, avg=1972.11, stdev=6612.81 clat (msec): min=150, max=2979, avg=428.49, stdev=189.96 lat (msec): min=151, max=2979, avg=430.47, stdev=189.90 clat percentiles (msec): | 1.00th=[ 204], 5.00th=[ 249], 10.00th=[ 273], 20.00th=[ 293], | 30.00th=[ 306], 40.00th=[ 318], 50.00th=[ 351], 60.00th=[ 502], | 70.00th=[ 545], 80.00th=[ 578], 90.00th=[ 603], 95.00th=[ 627], | 99.00th=[ 717], 99.50th=[ 775], 99.90th=[ 2966], 99.95th=[ 2966], | 99.99th=[ 2966] lat (msec) : 250=5.09%, 500=54.65%, 750=39.64%, 1000=0.31%, 2000=0.07% lat (msec) : >=2000=0.24% cpu : usr=7.81%, sys=1.48%, ctx=1221, majf=0, minf=11 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.4% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=0/w=10240/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=200 Run status group 0 (all jobs): WRITE: io=10240MB, aggrb=473868KB/s, minb=473868KB/s, maxb=473868KB/s, mint=22128msec, maxt=22128msec Disk stats (read/write): vdg: ios=0/10243, merge=0/0, ticks=0/2745892, in_queue=2745884, util=99.18 benchmark: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128 ... 
fio-2.16 Starting 4 processes benchmark: (groupid=0, jobs=4): err= 0: pid=11529: Thu Nov 2 16:52:40 2017 read : io=1123.9MB, bw=38347KB/s, iops=9586, runt= 30011msec slat (usec): min=1, max=228886, avg=415.40, stdev=3975.72 clat (usec): min=482, max=328648, avg=52664.65, stdev=30216.00 lat (msec): min=9, max=527, avg=53.08, stdev=30.38 clat percentiles (msec): | 1.00th=[ 12], 5.00th=[ 22], 10.00th=[ 23], 20.00th=[ 25], | 30.00th=[ 33], 40.00th=[ 38], 50.00th=[ 47], 60.00th=[ 55], | 70.00th=[ 64], 80.00th=[ 76], 90.00th=[ 95], 95.00th=[ 111], | 99.00th=[ 151], 99.50th=[ 163], 99.90th=[ 192], 99.95th=[ 196], | 99.99th=[ 210] lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 10=0.03%, 20=3.59%, 50=52.41%, 100=36.01%, 250=7.96% lat (msec) : 500=0.01% cpu : usr=0.29%, sys=1.10%, ctx=10157, majf=0, minf=549 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%,
[Gluster-devel] New Defects reported by Coverity Scan for gluster/glusterfs
Hi,

Please find the latest report on new defect(s) introduced to gluster/glusterfs found with Coverity Scan.

146 new defect(s) introduced to gluster/glusterfs found with Coverity Scan.
180 defect(s), reported by Coverity Scan earlier, were marked fixed in the recent build analyzed by Coverity Scan.

New defect(s) Reported-by: Coverity Scan
Showing 20 of 146 defect(s)

** CID 1382343: Incorrect expression (NO_EFFECT)
/xlators/cluster/dht/src/dht-common.c: 4962 in dht_dir_common_setxattr()

*** CID 1382343: Incorrect expression (NO_EFFECT)
/xlators/cluster/dht/src/dht-common.c: 4962 in dht_dir_common_setxattr()
4956
4957         conf = this->private;
4958         local = frame->local;
4959         call_cnt = conf->subvolume_cnt;
4960         local->flags = flags;
4961
>>> CID 1382343: Incorrect expression (NO_EFFECT)
>>> Comparing an array to null is not useful: "local->gfid", since the test will always evaluate as true.
4962         if (local->gfid)
4963                 gf_uuid_unparse(local->gfid, gfid_local);
4964
4965         /* Check if any user xattr present in xattr
4966          */
4967         dict_foreach_fnmatch (xattr, "user*", dht_is_user_xattr,

** CID 1382342: Null pointer dereferences (FORWARD_NULL)
/rpc/rpc-transport/socket/src/socket.c: 2981 in socket_server_event_handler()

*** CID 1382342: Null pointer dereferences (FORWARD_NULL)
/rpc/rpc-transport/socket/src/socket.c: 2981 in socket_server_event_handler()
2975                          * the new_trans since we've failed at everything so far
2976                          */
2977                         rpc_transport_unref (new_trans);
2978                 }
2979         }
2980 out:
>>> CID 1382342: Null pointer dereferences (FORWARD_NULL)
>>> Dereferencing null pointer "ctx".
2981         event_handled (ctx->event_pool, fd, idx, gen);
2982
2983         if (cname && (cname != this->ssl_name)) {
2984                 GF_FREE(cname);
2985         }
2986         return ret;

** CID 1382341: Null pointer dereferences (FORWARD_NULL)
/libglusterfs/src/ctx.c: 50 in glusterfs_ctx_new()

*** CID 1382341: Null pointer dereferences (FORWARD_NULL)
/libglusterfs/src/ctx.c: 50 in glusterfs_ctx_new()
44 #endif
45
46         /* lock is never destroyed! */
47         ret = LOCK_INIT (&ctx->lock);
48         if (ret) {
49                 free (ctx);
>>> CID 1382341: Null pointer dereferences (FORWARD_NULL)
>>> Assigning: "ctx" = "NULL".
50                 ctx = NULL;
51         }
52
53         GF_ATOMIC_INIT (ctx->stats.max_dict_pairs, 0);
54         GF_ATOMIC_INIT (ctx->stats.total_pairs_used, 0);
55         GF_ATOMIC_INIT (ctx->stats.total_dicts_used, 0);

** CID 1325526: (USE_AFTER_FREE)
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()

*** CID 1325526: (USE_AFTER_FREE)
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()
6055
6056 out:
6057         if (op_ret < 0)
6058                 filler->op_errno = op_errno;
6059
6060         if (array)
>>> CID 1325526: (USE_AFTER_FREE)
>>> Calling "__gf_free" frees pointer "array" which has already been freed.
6061                 GF_FREE (array);
6062
6063         return op_ret;
6064 }
6065
6066 /**
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()
6055
6056 out:
6057         if (op_ret < 0)
6058                 filler->op_errno = op_errno;
6059
6060         if (array)
>>> CID 1325526: (USE_AFTER_FREE)
>>> Passing freed pointer "array" as an argument to "__gf_free".
6061                 GF_FREE (array);
6062
6063         return op_ret;
6064 }
6065
6066 /**

** CID 1292646: Insecure data handling (TAINTED_SCALAR)

*** CID 1292646: Insecure data handling (TAINTED_SCALAR)
/libglusterfs/src/store.c: 611 in gf_store_iter_get_next()
605                 store_errno = GD_STORE_ENOMEM;
606                 goto out;
607         }
608         ret = 0;
609
610 out:
>>> CID 1292646: Insecure data handling (TAINTED_SCALAR)
>>> Passing tainted variable "scan_str" to a tainted sink.
611 GF_FREE
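For readers unfamiliar with the FORWARD_NULL pattern reported against glusterfs_ctx_new() (CID 1382341), the sketch below shows the general shape of the problem and one common way to avoid it. It is not the fix that was applied upstream: `struct demo_ctx`, `demo_lock_init` and `demo_ctx_new` are hypothetical stand-ins, and the `simulate_failure` parameter exists only so the error path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real context structure. */
struct demo_ctx {
    int  lock_ready;
    long max_dict_pairs;
};

/* Stand-in for a lock initialiser: 0 on success, non-zero on failure.
 * simulate_failure is an illustration-only knob. */
static int
demo_lock_init (struct demo_ctx *ctx, int simulate_failure)
{
    if (simulate_failure)
        return -1;
    ctx->lock_ready = 1;
    return 0;
}

/* Allocate and initialise a context. On lock failure the function
 * returns right away, so the later field initialisation can never
 * run with ctx == NULL (the fall-through Coverity flagged). */
static struct demo_ctx *
demo_ctx_new (int simulate_failure)
{
    struct demo_ctx *ctx = calloc (1, sizeof (*ctx));

    if (!ctx)
        return NULL;

    if (demo_lock_init (ctx, simulate_failure) != 0) {
        free (ctx);
        return NULL;
    }

    ctx->max_dict_pairs = 0;
    return ctx;
}
```

The point of the early return is that freeing the pointer and setting it to NULL is not enough on its own when the code that follows still dereferences it.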
Re: [Gluster-devel] About GF_ASSERT() macro
As per your review comments, I introduced GF_ABORT as part of patch https://review.gluster.org/#/c/18309/5

I wouldn't get into changing anything with ASSERT at the moment, as there are around 2800 instances :-o

Wherever it is critical, let's call 'GF_ABORT()' in future, and also we should have a 'checkpatches.pl' check to warn people about usage of GF_ABORT().

Regards,
Amar

On Fri, Nov 3, 2017 at 2:05 PM, Xavi Hernandez wrote:

> Hi all,
>
> I've seen that GF_ASSERT() macro is defined in different ways depending on
> if we are building in debug mode or not.
>
> In debug mode, it's an alias of assert(), but in non-debug mode it simply
> logs an error message and continues.
>
> I think that an assert should be a critical check that should always be
> true, specially in production code. Allowing the program to continue after
> a failure on one of these checks is dangerous. Most probably it will crash
> later, losing some information about the real cause of the error. But even
> if it doesn't crash, some internal data will be invalid, leading to a bad
> behavior.
>
> I think we should always terminate the process if an assertion fails, even
> in production level code. If some failure is not considered so much
> critical by the coder, it should not use GF_ASSERT() and only write a log
> message or use another of the condition check macros.
>
> Thoughts ?
>
> Xavi

--
Amar Tumballi (amarts)
[Gluster-devel] About GF_ASSERT() macro
Hi all,

I've seen that the GF_ASSERT() macro is defined in different ways depending on whether we are building in debug mode or not.

In debug mode, it's an alias of assert(), but in non-debug mode it simply logs an error message and continues.

I think that an assert should be a critical check that should always be true, especially in production code. Allowing the program to continue after a failure on one of these checks is dangerous. Most probably it will crash later, losing some information about the real cause of the error. But even if it doesn't crash, some internal data will be invalid, leading to bad behavior.

I think we should always terminate the process if an assertion fails, even in production-level code. If the coder does not consider a failure that critical, they should not use GF_ASSERT() and should only write a log message or use another of the condition check macros.

Thoughts?

Xavi
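The two behaviours contrasted in this message can be illustrated with a small sketch. The macros `DEMO_ASSERT_SOFT` and `DEMO_ASSERT_HARD` and the helper functions are made-up names for illustration; they are not the GlusterFS definitions of GF_ASSERT() or GF_ABORT(), and the failure counter exists only to make the log-and-continue path observable.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts soft-assert failures; illustration only. */
static int demo_soft_failures = 0;

/* "Log and continue": roughly the non-debug behaviour described above. */
#define DEMO_ASSERT_SOFT(cond)                                          \
    do {                                                                \
        if (!(cond)) {                                                  \
            demo_soft_failures++;                                       \
            fprintf (stderr, "assertion failed: %s (%s:%d)\n",          \
                     #cond, __FILE__, __LINE__);                        \
        }                                                               \
    } while (0)

/* "Always terminate": the behaviour proposed in the message. */
#define DEMO_ASSERT_HARD(cond)                                          \
    do {                                                                \
        if (!(cond)) {                                                  \
            fprintf (stderr, "fatal assertion failed: %s (%s:%d)\n",    \
                     #cond, __FILE__, __LINE__);                        \
            abort ();                                                   \
        }                                                               \
    } while (0)

/* With a soft assert the caller must still handle the bad input,
 * otherwise strlen(NULL) would crash later, far from the real cause. */
static int
demo_length_soft (const char *s)
{
    DEMO_ASSERT_SOFT (s != NULL);
    if (s == NULL)
        return -1;
    return (int) strlen (s);
}

/* With a hard assert the process stops at the point of failure. */
static int
demo_length_hard (const char *s)
{
    DEMO_ASSERT_HARD (s != NULL);
    return (int) strlen (s);
}
```

Calling `demo_length_hard (NULL)` would abort the process at the check itself, which is the trade-off under discussion: the failure is loud and close to its cause, at the price of terminating a production daemon.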
Re: [Gluster-devel] [Gluster-users] BoF - Gluster for VM store use case
Could you please share fio command line used for this test? Additionally, can you tell me the time needed to extract the kernel source?

On 2 Nov 2017 11:24 PM, "Ramon Selga" wrote:

> Hi,
>
> Just for your reference we got some similar values in a customer setup
> with three nodes single Xeon and 4x8TB HDD each with a double 10GbE
> backbone.
>
> We did a simple benchmark with fio tool on a virtual disk (virtio) of a
> 1TiB of size, XFS formatted directly no partitions no LVM, inside a VM
> (debian stretch, dual core 4GB RAM) deployed in a gluster volume disperse 3
> redundancy 1 distributed 2, sharding enabled.
>
> We run a sequential write test 10GB file in 1024k blocks, a random read
> test with 4k blocks and a random write test also with 4k blocks several
> times with results very similar to the following:
>
> writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=200
> fio-2.16
> Starting 1 process
>
> writefile: (groupid=0, jobs=1): err= 0: pid=11515: Thu Nov 2 16:50:05 2017
>   write: io=10240MB, bw=473868KB/s, iops=462, runt= 22128msec
>     slat (usec): min=20, max=98830, avg=1972.11, stdev=6612.81
>     clat (msec): min=150, max=2979, avg=428.49, stdev=189.96
>      lat (msec): min=151, max=2979, avg=430.47, stdev=189.90
>     clat percentiles (msec):
>      | 1.00th=[ 204], 5.00th=[ 249], 10.00th=[ 273], 20.00th=[ 293],
>      | 30.00th=[ 306], 40.00th=[ 318], 50.00th=[ 351], 60.00th=[ 502],
>      | 70.00th=[ 545], 80.00th=[ 578], 90.00th=[ 603], 95.00th=[ 627],
>      | 99.00th=[ 717], 99.50th=[ 775], 99.90th=[ 2966], 99.95th=[ 2966],
>      | 99.99th=[ 2966]
>     lat (msec) : 250=5.09%, 500=54.65%, 750=39.64%, 1000=0.31%, 2000=0.07%
>     lat (msec) : >=2000=0.24%
>   cpu : usr=7.81%, sys=1.48%, ctx=1221, majf=0, minf=11
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.4%
>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
>      issued : total=r=0/w=10240/d=0,
short=r=0/w=0/d=0, drop=r=0/w=0/d=0 > latency : target=0, window=0, percentile=100.00%, depth=200 > > Run status group 0 (all jobs): > WRITE: io=10240MB, aggrb=473868KB/s, minb=473868KB/s, maxb=473868KB/s, > mint=22128msec, maxt=22128msec > > Disk stats (read/write): > vdg: ios=0/10243, merge=0/0, ticks=0/2745892, in_queue=2745884, > util=99.18 > > benchmark: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, > iodepth=128 > ... > fio-2.16 > Starting 4 processes > > benchmark: (groupid=0, jobs=4): err= 0: pid=11529: Thu Nov 2 16:52:40 2017 > read : io=1123.9MB, bw=38347KB/s, iops=9586, runt= 30011msec > slat (usec): min=1, max=228886, avg=415.40, stdev=3975.72 > clat (usec): min=482, max=328648, avg=52664.65, stdev=30216.00 > lat (msec): min=9, max=527, avg=53.08, stdev=30.38 > clat percentiles (msec): > | 1.00th=[ 12], 5.00th=[ 22], 10.00th=[ 23], 20.00th=[ 25], > | 30.00th=[ 33], 40.00th=[ 38], 50.00th=[ 47], 60.00th=[ 55], > | 70.00th=[ 64], 80.00th=[ 76], 90.00th=[ 95], 95.00th=[ 111], > | 99.00th=[ 151], 99.50th=[ 163], 99.90th=[ 192], 99.95th=[ 196], > | 99.99th=[ 210] > lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01% > lat (msec) : 10=0.03%, 20=3.59%, 50=52.41%, 100=36.01%, 250=7.96% > lat (msec) : 500=0.01% > cpu : usr=0.29%, sys=1.10%, ctx=10157, majf=0, minf=549 > IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, > >=64=99.9% > submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.1% > issued: total=r=287705/w=0/d=0, short=r=0/w=0/d=0, > drop=r=0/w=0/d=0 > latency : target=0, window=0, percentile=100.00%, depth=128 > > Run status group 0 (all jobs): >READ: io=1123.9MB, aggrb=38346KB/s, minb=38346KB/s, maxb=38346KB/s, > mint=30011msec, maxt=30011msec > > Disk stats (read/write): > vdg: ios=286499/2, merge=0/0, ticks=3707064/64, in_queue=3708680, > util=99.83% > > benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, > iodepth=128 
> ... > fio-2.16 > Starting 4 processes > > benchmark: (groupid=0, jobs=4): err= 0: pid=11545: Thu Nov 2 16:55:54 2017 > write: io=422464KB, bw=14079KB/s, iops=3519, runt= 30006msec > slat (usec): min=1, max=230620, avg=1130.75, stdev=6744.31 > clat (usec): min=643, max=540987, avg=143999.57, stdev=66693.45 > lat (msec): min=8, max=541, avg=145.13, stdev=67.01 > clat percentiles (msec): > | 1.00th=[ 34], 5.00th=[ 75], 10.00th=[ 87], 20.00th=[ 100], > | 30.00th=[ 109], 40.00th=[ 116], 50.00th=[ 123], 60.00th=[ 135], > | 70.00th=[ 151], 80.00th=[ 182], 90.00th=[ 241], 95.00th=[ 289], > | 99.00th=[ 359], 99.50th=[ 416], 99.90th=[ 465], 99.95th=[
Re: [Gluster-devel] [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+
On Thu, Nov 2, 2017 at 7:53 PM, Darrell Budicwrote: > Will the various client packages (centos in my case) be able to > automatically handle the upgrade vs new install decision, or will we be > required to do something manually to determine that? We should be able to do this with CentOS (and other RPM based distros) which have well split glusterfs packages currently. At this moment, I don't know exactly how much can be handled automatically, but I expect the amount of manual intervention to be minimal. The least minimum amount of manual work needed would be enabling and starting GD2 and starting the migration script. > > It’s a little unclear that things will continue without interruption because > of the way you describe the change from GD1 to GD2, since it sounds like it > stops GD1. With the described upgrade strategy, we can ensure continuous volume access to clients during the whole process (provided volumes have been setup with replication or ec). During the migration from GD1 to GD2, any existing clients still retain access, and can continue to work without interruption. This is possible because gluster keeps the management (glusterds) and data (bricks and clients) parts separate. So it is possible to interrupt the management parts, without interrupting data access to existing clients. Clients and the server side brick processes need GlusterD to start up. But once they're running, they can run without GlusterD. GlusterD is only required again if something goes wrong. Stopping GD1 during the migration process, will not lead to any interruptions for existing clients. The brick process continue to run, and any connected clients continue to remain connected to the bricks. Any new clients which try to mount the volumes during this migration will fail, as a GlusterD will not be available (either GD1 or GD2). > Early days, obviously, but if you could clarify if that’s what > we’re used to as a rolling upgrade or how it works, that would be > appreciated. 
A Gluster rolling upgrade process, allows data access to volumes during the process, while upgrading the brick processes as well. Rolling upgrades with uninterrupted access requires that volumes have redundancy (replicate or ec). Rolling upgrades involves upgrading servers belonging to a redundancy set (replica set or ec set), one at a time. One at a time, - A server is picked from a redundancy set - All Gluster processes are killed on the server, glusterd, bricks and other daemons included. - Gluster is upgraded and restarted on the server - A heal is performed to heal new data onto the bricks. - Move onto next server after heal finishes. Clients maintain uninterrupted access, because a full redundancy set is never taken offline all at once. > Also clarification that we’ll be able to upgrade from 3.x > (3.1x?) to 4.0, manually or automatically? Rolling upgrades from 3.1x to 4.0 are a manual process. But I believe, gdeploy has playbooks to automate it. At the end of this you will be left with a 4.0 cluster, but still be running GD1. Upgrading from GD1 to GD2, in 4.0 will be a manual process. A script that automates this is planned only for 4.1. > > > > From: Kaushal M > Subject: [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+ > Date: November 2, 2017 at 3:56:05 AM CDT > To: gluster-us...@gluster.org; Gluster Devel > > We're fast approaching the time for Gluster-4.0. And we would like to > set out the expected upgrade strategy and try to polish it to be as > user friendly as possible. > > We're getting this out here now, because there was quite a bit of > concern and confusion regarding the upgrades between 3.x and 4.0+. > > --- > ## Background > > Gluster-4.0 will bring a newer management daemon, GlusterD-2.0 (GD2), > which is backwards incompatible with the GlusterD (GD1) in > GlusterFS-3.1+. As a hybrid cluster of GD1 and GD2 cannot be > established, rolling upgrades are not possible. 
This meant that > upgrades from 3.x to 4.0 would require a volume downtime and possible > client downtime. > > This was a cause of concern among many during the recently concluded > Gluster Summit 2017. > > We would like to keep pains experienced by our users to a minimum, so > we are trying to develop an upgrade strategy that avoids downtime as > much as possible. > > ## (Expected) Upgrade strategy from 3.x to 4.0 > > Gluster-4.0 will ship with both GD1 and GD2. > For fresh installations, only GD2 will be installed and available by > default. > For existing installations (upgrades) GD1 will be installed and run by > default. GD2 will also be installed simultaneously, but will not run > automatically. > > GD1 will allow rolling upgrades, and allow properly setup Gluster > volumes to be upgraded to 4.0 binaries, without downtime. > > Once the full pool is upgraded, and all bricks and other daemons are > running 4.0 binaries, migration to GD2 can happen. > > To