Re: [Gluster-devel] glusters 3.12.2: bricks do not start on NetBSD

2017-11-03 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> [2017-11-02 12:32:57.429885] E [MSGID: 115092]
> [server-handshake.c:586:server_setvolume] 0-gfs-server: No xlator
> /export/wd0e is found in child status list
> [2017-11-02 12:32:57.430162] I [MSGID: 115091]
> [server-handshake.c:761:server_setvolume] 0-gfs-server: Failed to get
> client opversion

Problem solved by running a gluster volume sync on each server right after
upgrading. I still do not know what went wrong, but I have a workaround.
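
For anyone hitting the same symptom, the workaround amounts to pulling the
volume configuration back from a healthy peer. A minimal sketch (the peer
hostname is a placeholder):

# On each server whose bricks refuse to start, resync the volume
# definitions from a peer that has the correct configuration.
gluster volume sync <PEER-HOSTNAME> all

# Verify that the bricks come up afterwards.
gluster volume status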

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Coverity fixes

2017-11-03 Thread Vijay Bellur
On Fri, Nov 3, 2017 at 9:25 AM, Atin Mukherjee  wrote:

>
> On Fri, 3 Nov 2017 at 18:31, Kaleb S. KEITHLEY 
> wrote:
>
>> On 11/02/2017 10:19 AM, Atin Mukherjee wrote:
>> > While I appreciate the many Coverity fixes folks have contributed over
>> > the last few days, I have an observation: for some of the patches, the
>> > Coverity issue id(s) are *not* mentioned, which puts maintainers in a
>> > difficult position when trying to understand the exact complaint raised
>> > by Coverity. From my past experience fixing Coverity defects, the fixes
>> > sometimes look simple but they are not.
>> >
>> > May I request that all developers include the defect id in the commit
>> > message for all Coverity fixes?
>> >
>>
>> How does that work? AFAIK the defect IDs are constantly changing as some
>> get fixed and new ones get added.
>
>
> We’d need at least (a) the defect id along with a pointer to the Coverity
> link, which I guess most of the devs are now following, with the caveat
> that the link goes stale in 7 days and the review needs to be done by that
> time, or (b) the commit message should carry the exact Coverity
> description, which is neater.
>
> (I was not aware that the defect ids are not constant; I only got to know
> this from Nigel today.)
>
>>
>>

+1 to providing a clean description of the issue rather than using a
temporary defect ID.
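
As an illustration only (the component, CID and wording below are lifted
from the covscan report later in this digest, not from an actual patch), a
commit message that carries the Coverity complaint itself could look like:

git commit -s -F - <<'EOF'
storage/posix: avoid double free of "array" flagged by Coverity

CID 1325526 (USE_AFTER_FREE): "Calling __gf_free frees pointer 'array'
which has already been freed." Reset the pointer to NULL after the first
GF_FREE so the cleanup path cannot free it a second time.
EOF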

-Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Gluster Summit Discussion: Time taken for regression tests

2017-11-03 Thread Amar Tumballi
All,

While we discussed many other things, we also discussed reducing the time
taken by the regression jobs. As it stands now, it takes around 5 hours 40
minutes to complete a single run.

There were many suggestions:


   - Run them in parallel (each .t test is independent of the others); a
   rough sketch follows the suggestions below.
   - Revisit the tests that take a long time (20 tests account for almost
   6000 seconds as of now).
   - See if we can run the tests in docker (but the machines we have only
   have 2 cores, so there may not be much gain).


There are other suggestions as well:


   - Spend some effort to identify repeated steps, and merge the tests.
  - Most of the time is spent starting the processes and cleaning up.
  - Most of the tests run a similar volume create command (depending
  on the volume type) and then run a few different types of I/O.
  - Try to see if these steps can be merged.
  - Most of the bug-fix .t files belong to this category too.
   - Classify the tests into a few non-overlapping volume-type groups, and
   depending on the changeset in the patch (based on the files changed),
   decide which groups to run.
  - For example, you can't have replicate and disperse volume types
  together.
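
On the parallel-run suggestion, a very rough sketch of a worker split,
assuming GNU parallel is available and that the .t files really do not
collide on ports, volume names or scratch space (which is exactly what
would need to be verified first):

# Run every .t file under tests/ through prove, four at a time, and keep
# each test's output in a per-test results directory.
mkdir -p /tmp/regression-results
find tests -name '*.t' | sort | \
    parallel --jobs 4 --results /tmp/regression-results prove -v {}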




More ideas and suggestions welcome.


Regards,
Amar
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+

2017-11-03 Thread Alastair Neil
Just so I am clear, the upgrade process will be as follows:

upgrade all clients to 4.0

rolling upgrade all servers to 4.0 (with GD1)

kill all GD1 daemons on all servers and run the upgrade script (new clients
unable to connect at this point)

start GD2 (is this necessary, or does the upgrade script do this?)


I assume that once the cluster has been migrated to GD2, the glusterd
startup script will be smart enough to start the correct version?

-Thanks





On 3 November 2017 at 04:06, Kaushal M  wrote:

> On Thu, Nov 2, 2017 at 7:53 PM, Darrell Budic 
> wrote:
> > Will the various client packages (centos in my case) be able to
> > automatically handle the upgrade vs new install decision, or will we be
> > required to do something manually to determine that?
>
> We should be able to do this with CentOS (and other RPM based distros)
> which have well split glusterfs packages currently.
> At this moment, I don't know exactly how much can be handled
> automatically, but I expect the amount of manual intervention to be
> minimal.
> The minimum amount of manual work needed would be enabling and
> starting GD2 and running the migration script.
>
> >
> > It’s a little unclear that things will continue without interruption
> > because of the way you describe the change from GD1 to GD2, since it
> > sounds like it stops GD1.
>
> With the described upgrade strategy, we can ensure continuous volume
> access to clients during the whole process (provided volumes have been
> setup with replication or ec).
>
> During the migration from GD1 to GD2, any existing clients still
> retain access, and can continue to work without interruption.
> This is possible because gluster keeps the management  (glusterds) and
> data (bricks and clients) parts separate.
> So it is possible to interrupt the management parts, without
> interrupting data access to existing clients.
> Clients and the server side brick processes need GlusterD to start up.
> But once they're running, they can run without GlusterD. GlusterD is
> only required again if something goes wrong.
> Stopping GD1 during the migration process will not lead to any
> interruptions for existing clients.
> The brick processes continue to run, and any connected clients
> remain connected to the bricks.
> Any new clients which try to mount the volumes during this migration
> will fail, as a GlusterD will not be available (either GD1 or GD2).
>
> > Early days, obviously, but if you could clarify if that’s what
> > we’re used to as a rolling upgrade or how it works, that would be
> > appreciated.
>
> A Gluster rolling upgrade allows data access to volumes
> during the process, while the brick processes are upgraded as well.
> Rolling upgrades with uninterrupted access requires that volumes have
> redundancy (replicate or ec).
> Rolling upgrades involve upgrading the servers belonging to a redundancy
> set (replica set or ec set) one at a time.
> For each server:
> - A server is picked from a redundancy set
> - All Gluster processes are killed on the server, glusterd, bricks and
> other daemons included.
> - Gluster is upgraded and restarted on the server
> - A heal is performed to heal new data onto the bricks.
> - Move onto next server after heal finishes.
>
> Clients maintain uninterrupted access, because a full redundancy set
> is never taken offline all at once.
>
> > Also clarification that we’ll be able to upgrade from 3.x
> > (3.1x?) to 4.0, manually or automatically?
>
> Rolling upgrades from 3.1x to 4.0 are a manual process, but I believe
> gdeploy has playbooks to automate it.
> At the end of this you will be left with a 4.0 cluster, but still be
> running GD1.
> Upgrading from GD1 to GD2, in 4.0 will be a manual process. A script
> that automates this is planned only for 4.1.
>
> >
> >
> > 
> > From: Kaushal M 
> > Subject: [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+
> > Date: November 2, 2017 at 3:56:05 AM CDT
> > To: gluster-us...@gluster.org; Gluster Devel
> >
> > We're fast approaching the time for Gluster-4.0. And we would like to
> > set out the expected upgrade strategy and try to polish it to be as
> > user friendly as possible.
> >
> > We're getting this out here now, because there was quite a bit of
> > concern and confusion regarding the upgrades between 3.x and 4.0+.
> >
> > ---
> > ## Background
> >
> > Gluster-4.0 will bring a newer management daemon, GlusterD-2.0 (GD2),
> > which is backwards incompatible with the GlusterD (GD1) in
> > GlusterFS-3.1+.  As a hybrid cluster of GD1 and GD2 cannot be
> > established, rolling upgrades are not possible. This meant that
> > upgrades from 3.x to 4.0 would require a volume downtime and possible
> > client downtime.
> >
> > This was a cause of concern among many during the recently concluded
> > Gluster Summit 2017.
> >
> > We would like to keep pains experienced by our users to a 

Re: [Gluster-devel] Coverity fixes

2017-11-03 Thread Atin Mukherjee
On Fri, 3 Nov 2017 at 18:31, Kaleb S. KEITHLEY  wrote:

> On 11/02/2017 10:19 AM, Atin Mukherjee wrote:
> > While I appreciate the many Coverity fixes folks have contributed over
> > the last few days, I have an observation: for some of the patches, the
> > Coverity issue id(s) are *not* mentioned, which puts maintainers in a
> > difficult position when trying to understand the exact complaint raised
> > by Coverity. From my past experience fixing Coverity defects, the fixes
> > sometimes look simple but they are not.
> >
> > May I request that all developers include the defect id in the commit
> > message for all Coverity fixes?
> >
>
> How does that work? AFAIK the defect IDs are constantly changing as some
> get fixed and new ones get added.


We’d need at least (a) the defect id along with a pointer to the Coverity
link, which I guess most of the devs are now following, with the caveat that
the link goes stale in 7 days and the review needs to be done by that time,
or (b) the commit message should carry the exact Coverity description, which
is neater.

(I was not aware that the defect ids are not constant; I only got to know
this from Nigel today.)

>
>
> (And I know everyone looks at the coverity report after their new code
> is committed to see if they might have added a new issue.)
>
> Today's defect ID 435 might be 436 or 421 tomorrow.
>
>
> --
>
> Kaleb
>
-- 
- Atin (atinm)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Coverity fixes

2017-11-03 Thread Kaleb S. KEITHLEY
On 11/02/2017 10:19 AM, Atin Mukherjee wrote:
> While I appreciate the many Coverity fixes folks have contributed over
> the last few days, I have an observation: for some of the patches, the
> Coverity issue id(s) are *not* mentioned, which puts maintainers in a
> difficult position when trying to understand the exact complaint raised
> by Coverity. From my past experience fixing Coverity defects, the fixes
> sometimes look simple but they are not.
> 
> May I request that all developers include the defect id in the commit
> message for all Coverity fixes?
> 

How does that work? AFAIK the defect IDs are constantly changing as some
get fixed and new ones get added.

(And I know everyone looks at the coverity report after their new code
is committed to see if they might have added a new issue.)

Today's defect ID 435 might be 436 or 421 tomorrow.


-- 

Kaleb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Coverity covscan for 2017-11-03-2ef2b600 (master branch)

2017-11-03 Thread staticanalysis
GlusterFS Coverity covscan results are available from
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-11-03-2ef2b600
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] BoF - Gluster for VM store use case

2017-11-03 Thread Ramon Selga
Below you can find the three fio commands used for each benchmark test:
sequential write, random 4k read, and random 4k write.


# fio --name=writefile --size=10G --filesize=10G --filename=fio_file --bs=1M 
--nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers 
--end_fsync=1 --iodepth=200 --ioengine=libaio


# fio --time_based --name=benchmark --size=10G --runtime=30 --filename=fio_file 
--ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 
--verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k 
--group_reporting


# fio --time_based --name=benchmark --size=10G --runtime=30 --filename=fio_file 
--ioengine=libaio --randrepeat=0 --iodepth=128 --direct=1 --invalidate=1 
--verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k 
--group_reporting


And here is the timed extraction of the kernel source, first run:

# time tar xf linux-4.13.11.tar.xz

real    0m8.180s
user    0m5.932s
sys     0m2.924s

second run, after deleting the first:

# rm -rf linux-4.13.11
# time tar xf linux-4.13.11.tar.xz

real    0m6.454s
user    0m6.012s
sys     0m2.440s


On 03/11/17 at 09:33, Gandalf Corvotempesta wrote:

Could you please share the fio command line used for this test?
Additionally, can you tell me the time needed to extract the kernel source?

On 2 Nov 2017 at 11:24 PM, "Ramon Selga" wrote:


Hi,

    Just for your reference, we got similar values in a customer setup with
    three single-Xeon nodes, each with 4x8TB HDDs and a dual 10GbE backbone.

    We did a simple benchmark with the fio tool on a 1TiB virtual disk
    (virtio), formatted directly with XFS (no partitions, no LVM), inside a
    VM (Debian stretch, dual core, 4GB RAM) deployed on a gluster volume
    (disperse 3, redundancy 1, distributed 2, sharding enabled).

    We ran a sequential write test (10GB file in 1024k blocks), a random
    read test with 4k blocks, and a random write test also with 4k blocks,
    several times, with results very similar to the following:

writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, 
iodepth=200
fio-2.16
Starting 1 process

writefile: (groupid=0, jobs=1): err= 0: pid=11515: Thu Nov  2 16:50:05 2017
  write: io=10240MB, bw=473868KB/s, iops=462, runt= 22128msec
    slat (usec): min=20, max=98830, avg=1972.11, stdev=6612.81
    clat (msec): min=150, max=2979, avg=428.49, stdev=189.96
 lat (msec): min=151, max=2979, avg=430.47, stdev=189.90
    clat percentiles (msec):
 |  1.00th=[  204],  5.00th=[  249], 10.00th=[  273], 20.00th=[  293],
 | 30.00th=[  306], 40.00th=[  318], 50.00th=[  351], 60.00th=[  502],
 | 70.00th=[  545], 80.00th=[  578], 90.00th=[  603], 95.00th=[  627],
 | 99.00th=[  717], 99.50th=[  775], 99.90th=[ 2966], 99.95th=[ 2966],
 | 99.99th=[ 2966]
    lat (msec) : 250=5.09%, 500=54.65%, 750=39.64%, 1000=0.31%, 2000=0.07%
    lat (msec) : >=2000=0.24%
  cpu  : usr=7.81%, sys=1.48%, ctx=1221, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, 
>=64=99.4%
 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.1%
 issued    : total=r=0/w=10240/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=200

Run status group 0 (all jobs):
  WRITE: io=10240MB, aggrb=473868KB/s, minb=473868KB/s, maxb=473868KB/s,
mint=22128msec, maxt=22128msec

Disk stats (read/write):
  vdg: ios=0/10243, merge=0/0, ticks=0/2745892, in_queue=2745884, util=99.18

benchmark: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio,
iodepth=128
...
fio-2.16
Starting 4 processes

benchmark: (groupid=0, jobs=4): err= 0: pid=11529: Thu Nov  2 16:52:40 2017
  read : io=1123.9MB, bw=38347KB/s, iops=9586, runt= 30011msec
    slat (usec): min=1, max=228886, avg=415.40, stdev=3975.72
    clat (usec): min=482, max=328648, avg=52664.65, stdev=30216.00
 lat (msec): min=9, max=527, avg=53.08, stdev=30.38
    clat percentiles (msec):
 |  1.00th=[   12],  5.00th=[   22], 10.00th=[   23], 20.00th=[   25],
 | 30.00th=[   33], 40.00th=[   38], 50.00th=[   47], 60.00th=[   55],
 | 70.00th=[   64], 80.00th=[   76], 90.00th=[   95], 95.00th=[  111],
 | 99.00th=[  151], 99.50th=[  163], 99.90th=[  192], 99.95th=[  196],
 | 99.99th=[  210]
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 10=0.03%, 20=3.59%, 50=52.41%, 100=36.01%, 250=7.96%
    lat (msec) : 500=0.01%
  cpu  : usr=0.29%, sys=1.10%, ctx=10157, majf=0, minf=549
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, 
>=64=99.9%
 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 

[Gluster-devel] New Defects reported by Coverity Scan for gluster/glusterfs

2017-11-03 Thread scan-admin

Hi,

Please find the latest report on new defect(s) introduced to gluster/glusterfs 
found with Coverity Scan.

146 new defect(s) introduced to gluster/glusterfs found with Coverity Scan.
180 defect(s), reported by Coverity Scan earlier, were marked fixed in the 
recent build analyzed by Coverity Scan.

New defect(s) Reported-by: Coverity Scan
Showing 20 of 146 defect(s)


** CID 1382343:  Incorrect expression  (NO_EFFECT)
/xlators/cluster/dht/src/dht-common.c: 4962 in dht_dir_common_setxattr()



*** CID 1382343:  Incorrect expression  (NO_EFFECT)
/xlators/cluster/dht/src/dht-common.c: 4962 in dht_dir_common_setxattr()
4956 
4957 conf = this->private;
4958 local= frame->local;
4959 call_cnt = conf->subvolume_cnt;
4960 local->flags = flags;
4961 
>>> CID 1382343:  Incorrect expression  (NO_EFFECT)
>>> Comparing an array to null is not useful: "local->gfid", since the test 
>>> will always evaluate as true.
4962 if (local->gfid)
4963 gf_uuid_unparse(local->gfid, gfid_local);
4964 
4965 /* Check if any user xattr present in xattr
4966 */
4967 dict_foreach_fnmatch (xattr, "user*", dht_is_user_xattr,

** CID 1382342:  Null pointer dereferences  (FORWARD_NULL)
/rpc/rpc-transport/socket/src/socket.c: 2981 in socket_server_event_handler()



*** CID 1382342:  Null pointer dereferences  (FORWARD_NULL)
/rpc/rpc-transport/socket/src/socket.c: 2981 in socket_server_event_handler()
2975  * the new_trans since we've failed at 
everything so far
2976  */
2977 rpc_transport_unref (new_trans);
2978 }
2979 }
2980 out:
>>> CID 1382342:  Null pointer dereferences  (FORWARD_NULL)
>>> Dereferencing null pointer "ctx".
2981 event_handled (ctx->event_pool, fd, idx, gen);
2982 
2983 if (cname && (cname != this->ssl_name)) {
2984 GF_FREE(cname);
2985 }
2986 return ret;

** CID 1382341:  Null pointer dereferences  (FORWARD_NULL)
/libglusterfs/src/ctx.c: 50 in glusterfs_ctx_new()



*** CID 1382341:  Null pointer dereferences  (FORWARD_NULL)
/libglusterfs/src/ctx.c: 50 in glusterfs_ctx_new()
44 #endif
45 
46 /* lock is never destroyed! */
47  ret = LOCK_INIT (&ctx->lock);
48  if (ret) {
49  free (ctx);
>>> CID 1382341:  Null pointer dereferences  (FORWARD_NULL)
>>> Assigning: "ctx" = "NULL".
50  ctx = NULL;
51  }
52 
53 GF_ATOMIC_INIT (ctx->stats.max_dict_pairs, 0);
54 GF_ATOMIC_INIT (ctx->stats.total_pairs_used, 0);
55 GF_ATOMIC_INIT (ctx->stats.total_dicts_used, 0);

** CID 1325526:(USE_AFTER_FREE)
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()



*** CID 1325526:(USE_AFTER_FREE)
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()
6055 
6056 out:
6057 if (op_ret < 0)
6058 filler->op_errno = op_errno;
6059 
6060 if (array)
>>> CID 1325526:(USE_AFTER_FREE)
>>> Calling "__gf_free" frees pointer "array" which has already been freed.
6061 GF_FREE (array);
6062 
6063 return op_ret;
6064 }
6065 
6066 /**
/xlators/storage/posix/src/posix.c: 6061 in _posix_handle_xattr_keyvalue_pair()
6055 
6056 out:
6057 if (op_ret < 0)
6058 filler->op_errno = op_errno;
6059 
6060 if (array)
>>> CID 1325526:(USE_AFTER_FREE)
>>> Passing freed pointer "array" as an argument to "__gf_free".
6061 GF_FREE (array);
6062 
6063 return op_ret;
6064 }
6065 
6066 /**

** CID 1292646:  Insecure data handling  (TAINTED_SCALAR)



*** CID 1292646:  Insecure data handling  (TAINTED_SCALAR)
/libglusterfs/src/store.c: 611 in gf_store_iter_get_next()
605 store_errno = GD_STORE_ENOMEM;
606 goto out;
607 }
608 ret = 0;
609 
610 out:
>>> CID 1292646:  Insecure data handling  (TAINTED_SCALAR)
>>> Passing tainted variable "scan_str" to a tainted sink.
611 GF_FREE 

Re: [Gluster-devel] About GF_ASSERT() macro

2017-11-03 Thread Amar Tumballi
As per your review comments, GF_ABORT was introduced as part of patch
https://review.gluster.org/#/c/18309/5

I wouldn't get into changing anything with ASSERT at the moment, as there
are around ~2800 instances :-o

Wherever it is critical, let's call GF_ABORT() in future, and we should
also add a 'checkpatches.pl' check to warn people about the usage of
GF_ABORT().

Regards,
Amar

On Fri, Nov 3, 2017 at 2:05 PM, Xavi Hernandez  wrote:

> Hi all,
>
> I've seen that the GF_ASSERT() macro is defined in different ways depending
> on whether we are building in debug mode or not.
>
> In debug mode, it's an alias of assert(), but in non-debug mode it simply
> logs an error message and continues.
>
> I think that an assert should be a critical check that should always be
> true, especially in production code. Allowing the program to continue after
> a failure on one of these checks is dangerous. Most probably it will crash
> later, losing some information about the real cause of the error. But even
> if it doesn't crash, some internal data will be invalid, leading to bad
> behavior.
>
> I think we should always terminate the process if an assertion fails, even
> in production-level code. If the coder does not consider a failure that
> critical, they should not use GF_ASSERT(), and should instead only write a
> log message or use another of the condition-check macros.
>
> Thoughts ?
>
> Xavi
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] About GF_ASSERT() macro

2017-11-03 Thread Xavi Hernandez
Hi all,

I've seen that the GF_ASSERT() macro is defined in different ways depending
on whether we are building in debug mode or not.

In debug mode, it's an alias of assert(), but in non-debug mode it simply
logs an error message and continues.

I think that an assert should be a critical check that should always be
true, especially in production code. Allowing the program to continue after
a failure on one of these checks is dangerous. Most probably it will crash
later, losing some information about the real cause of the error. But even
if it doesn't crash, some internal data will be invalid, leading to bad
behavior.

I think we should always terminate the process if an assertion fails, even
in production-level code. If the coder does not consider a failure that
critical, they should not use GF_ASSERT(), and should instead only write a
log message or use another of the condition-check macros.

Thoughts ?

Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] BoF - Gluster for VM store use case

2017-11-03 Thread Gandalf Corvotempesta
Could you please share the fio command line used for this test?
Additionally, can you tell me the time needed to extract the kernel source?

On 2 Nov 2017 at 11:24 PM, "Ramon Selga" wrote:

> Hi,
>
> Just for your reference, we got similar values in a customer setup with
> three single-Xeon nodes, each with 4x8TB HDDs and a dual 10GbE backbone.
>
> We did a simple benchmark with the fio tool on a 1TiB virtual disk
> (virtio), formatted directly with XFS (no partitions, no LVM), inside a VM
> (Debian stretch, dual core, 4GB RAM) deployed on a gluster volume
> (disperse 3, redundancy 1, distributed 2, sharding enabled).
>
> We ran a sequential write test (10GB file in 1024k blocks), a random read
> test with 4k blocks, and a random write test also with 4k blocks, several
> times, with results very similar to the following:
>
> writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio,
> iodepth=200
> fio-2.16
> Starting 1 process
>
> writefile: (groupid=0, jobs=1): err= 0: pid=11515: Thu Nov  2 16:50:05 2017
>   write: io=10240MB, bw=473868KB/s, iops=462, runt= 22128msec
> slat (usec): min=20, max=98830, avg=1972.11, stdev=6612.81
> clat (msec): min=150, max=2979, avg=428.49, stdev=189.96
>  lat (msec): min=151, max=2979, avg=430.47, stdev=189.90
> clat percentiles (msec):
>  |  1.00th=[  204],  5.00th=[  249], 10.00th=[  273], 20.00th=[  293],
>  | 30.00th=[  306], 40.00th=[  318], 50.00th=[  351], 60.00th=[  502],
>  | 70.00th=[  545], 80.00th=[  578], 90.00th=[  603], 95.00th=[  627],
>  | 99.00th=[  717], 99.50th=[  775], 99.90th=[ 2966], 99.95th=[ 2966],
>  | 99.99th=[ 2966]
> lat (msec) : 250=5.09%, 500=54.65%, 750=39.64%, 1000=0.31%, 2000=0.07%
> lat (msec) : >=2000=0.24%
>   cpu  : usr=7.81%, sys=1.48%, ctx=1221, majf=0, minf=11
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%,
> >=64=99.4%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.1%
>  issued: total=r=0/w=10240/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=200
>
> Run status group 0 (all jobs):
>   WRITE: io=10240MB, aggrb=473868KB/s, minb=473868KB/s, maxb=473868KB/s,
> mint=22128msec, maxt=22128msec
>
> Disk stats (read/write):
>   vdg: ios=0/10243, merge=0/0, ticks=0/2745892, in_queue=2745884,
> util=99.18
>
> benchmark: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio,
> iodepth=128
> ...
> fio-2.16
> Starting 4 processes
>
> benchmark: (groupid=0, jobs=4): err= 0: pid=11529: Thu Nov  2 16:52:40 2017
>   read : io=1123.9MB, bw=38347KB/s, iops=9586, runt= 30011msec
> slat (usec): min=1, max=228886, avg=415.40, stdev=3975.72
> clat (usec): min=482, max=328648, avg=52664.65, stdev=30216.00
>  lat (msec): min=9, max=527, avg=53.08, stdev=30.38
> clat percentiles (msec):
>  |  1.00th=[   12],  5.00th=[   22], 10.00th=[   23], 20.00th=[   25],
>  | 30.00th=[   33], 40.00th=[   38], 50.00th=[   47], 60.00th=[   55],
>  | 70.00th=[   64], 80.00th=[   76], 90.00th=[   95], 95.00th=[  111],
>  | 99.00th=[  151], 99.50th=[  163], 99.90th=[  192], 99.95th=[  196],
>  | 99.99th=[  210]
> lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 10=0.03%, 20=3.59%, 50=52.41%, 100=36.01%, 250=7.96%
> lat (msec) : 500=0.01%
>   cpu  : usr=0.29%, sys=1.10%, ctx=10157, majf=0, minf=549
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
> >=64=99.9%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.1%
>  issued: total=r=287705/w=0/d=0, short=r=0/w=0/d=0,
> drop=r=0/w=0/d=0
>  latency   : target=0, window=0, percentile=100.00%, depth=128
>
> Run status group 0 (all jobs):
>READ: io=1123.9MB, aggrb=38346KB/s, minb=38346KB/s, maxb=38346KB/s,
> mint=30011msec, maxt=30011msec
>
> Disk stats (read/write):
>   vdg: ios=286499/2, merge=0/0, ticks=3707064/64, in_queue=3708680,
> util=99.83%
>
> benchmark: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio,
> iodepth=128
> ...
> fio-2.16
> Starting 4 processes
>
> benchmark: (groupid=0, jobs=4): err= 0: pid=11545: Thu Nov  2 16:55:54 2017
>   write: io=422464KB, bw=14079KB/s, iops=3519, runt= 30006msec
> slat (usec): min=1, max=230620, avg=1130.75, stdev=6744.31
> clat (usec): min=643, max=540987, avg=143999.57, stdev=66693.45
>  lat (msec): min=8, max=541, avg=145.13, stdev=67.01
> clat percentiles (msec):
>  |  1.00th=[   34],  5.00th=[   75], 10.00th=[   87], 20.00th=[  100],
>  | 30.00th=[  109], 40.00th=[  116], 50.00th=[  123], 60.00th=[  135],
>  | 70.00th=[  151], 80.00th=[  182], 90.00th=[  241], 95.00th=[  289],
>  | 99.00th=[  359], 99.50th=[  416], 99.90th=[  465], 99.95th=[  

Re: [Gluster-devel] [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+

2017-11-03 Thread Kaushal M
On Thu, Nov 2, 2017 at 7:53 PM, Darrell Budic  wrote:
> Will the various client packages (centos in my case) be able to
> automatically handle the upgrade vs new install decision, or will we be
> required to do something manually to determine that?

We should be able to do this with CentOS (and other RPM based distros)
which have well split glusterfs packages currently.
At this moment, I don't know exactly how much can be handled
automatically, but I expect the amount of manual intervention to be
minimal.
The minimum amount of manual work needed would be enabling and
starting GD2 and running the migration script.
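
That minimal step would roughly be the following (a sketch only; it assumes
the GD2 package ships a systemd unit named glusterd2, and the exact unit
name and migration script invocation are not final):

# On every server, once the packages are upgraded to 4.0:
systemctl enable glusterd2
systemctl start glusterd2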

>
> It’s a little unclear that things will continue without interruption because
> of the way you describe the change from GD1 to GD2, since it sounds like it
> stops GD1.

With the described upgrade strategy, we can ensure continuous volume
access to clients during the whole process (provided volumes have been
setup with replication or ec).

During the migration from GD1 to GD2, any existing clients still
retain access, and can continue to work without interruption.
This is possible because gluster keeps the management  (glusterds) and
data (bricks and clients) parts separate.
So it is possible to interrupt the management parts, without
interrupting data access to existing clients.
Clients and the server side brick processes need GlusterD to start up.
But once they're running, they can run without GlusterD. GlusterD is
only required again if something goes wrong.
Stopping GD1 during the migration process will not lead to any
interruptions for existing clients.
The brick processes continue to run, and any connected clients
remain connected to the bricks.
Any new clients which try to mount the volumes during this migration
will fail, as a GlusterD will not be available (either GD1 or GD2).

> Early days, obviously, but if you could clarify if that’s what
> we’re used to as a rolling upgrade or how it works, that would be
> appreciated.

A Gluster rolling upgrade allows data access to volumes
during the process, while the brick processes are upgraded as well.
Rolling upgrades with uninterrupted access requires that volumes have
redundancy (replicate or ec).
Rolling upgrades involve upgrading the servers belonging to a redundancy
set (replica set or ec set) one at a time.
For each server (a rough command sketch follows the list):
- A server is picked from a redundancy set
- All Gluster processes are killed on the server, glusterd, bricks and
other daemons included.
- Gluster is upgraded and restarted on the server
- A heal is performed to heal new data onto the bricks.
- Move onto next server after heal finishes.
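
Roughly, for one server, the above translates to something like this (a
sketch only; the package and service names assume an RPM/systemd based
setup, and VOLNAME is a placeholder):

# 1. Stop the management daemon and kill the remaining gluster processes.
systemctl stop glusterd
pkill glusterfsd              # brick processes
pkill glusterfs               # self-heal and other auxiliary daemons

# 2. Upgrade the gluster packages and restart the management daemon,
#    which respawns the bricks and the other daemons.
yum -y update glusterfs-server
systemctl start glusterd

# 3. Watch the heal status and move on only after healing completes.
gluster volume heal <VOLNAME> info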

Clients maintain uninterrupted access, because a full redundancy set
is never taken offline all at once.

> Also clarification that we’ll be able to upgrade from 3.x
> (3.1x?) to 4.0, manually or automatically?

Rolling upgrades from 3.1x to 4.0 are a manual process, but I believe
gdeploy has playbooks to automate it.
At the end of this you will be left with a 4.0 cluster, but still be
running GD1.
Upgrading from GD1 to GD2, in 4.0 will be a manual process. A script
that automates this is planned only for 4.1.

>
>
> 
> From: Kaushal M 
> Subject: [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+
> Date: November 2, 2017 at 3:56:05 AM CDT
> To: gluster-us...@gluster.org; Gluster Devel
>
> We're fast approaching the time for Gluster-4.0. And we would like to
> set out the expected upgrade strategy and try to polish it to be as
> user friendly as possible.
>
> We're getting this out here now, because there was quite a bit of
> concern and confusion regarding the upgrades between 3.x and 4.0+.
>
> ---
> ## Background
>
> Gluster-4.0 will bring a newer management daemon, GlusterD-2.0 (GD2),
> which is backwards incompatible with the GlusterD (GD1) in
> GlusterFS-3.1+.  As a hybrid cluster of GD1 and GD2 cannot be
> established, rolling upgrades are not possible. This meant that
> upgrades from 3.x to 4.0 would require a volume downtime and possible
> client downtime.
>
> This was a cause of concern among many during the recently concluded
> Gluster Summit 2017.
>
> We would like to keep pains experienced by our users to a minimum, so
> we are trying to develop an upgrade strategy that avoids downtime as
> much as possible.
>
> ## (Expected) Upgrade strategy from 3.x to 4.0
>
> Gluster-4.0 will ship with both GD1 and GD2.
> For fresh installations, only GD2 will be installed and available by
> default.
> For existing installations (upgrades) GD1 will be installed and run by
> default. GD2 will also be installed simultaneously, but will not run
> automatically.
>
> GD1 will allow rolling upgrades, and allow properly setup Gluster
> volumes to be upgraded to 4.0 binaries, without downtime.
>
> Once the full pool is upgraded, and all bricks and other daemons are
> running 4.0 binaries, migration to GD2 can happen.
>
> To