Re: [Gluster-devel] Release 4.0: Branched

2018-01-25 Thread Shyam Ranganathan
On 01/23/2018 03:17 PM, Shyam Ranganathan wrote:
> 4.0 release has been branched!
> 
> I will follow this up with a more detailed schedule for the release, and
> also the granted feature backport exceptions that we are waiting on.
> 
> Feature backports would need to make it in by this weekend, so that we
> can tag RC0 by the end of the month.

Backports need to be ready for merge on or before Jan 29th, 2018, 3:00 PM
Eastern time.

Features for which backport exceptions were requested, and hence granted,
are as follows:

1) Dentry fop serializer xlator on brick stack
https://github.com/gluster/glusterfs/issues/397

@Du, please backport this to the 4.0 branch, as the patch in master is
merged.

2) Leases support on GlusterFS
https://github.com/gluster/glusterfs/issues/350

@Jiffin and @ndevos, there is one patch pending against master
(https://review.gluster.org/#/c/18785/); please do the needful and backport
this to the 4.0 branch.

3) Data corruption in write ordering of rebalance and application writes
https://github.com/gluster/glusterfs/issues/308

@susant, @du: if we can conclude on the strategy here, please backport as
needed.

4) A couple of patches are tracked for backport:
https://review.gluster.org/#/c/19223/
https://review.gluster.org/#/c/19267/ (prep for ctime changes in later
releases)

Other features discussed are not in scope for backport to 4.0.

If you asked for one and do not see it in this list, shout out!

> 
> Only exception could be: https://review.gluster.org/#/c/19223/
> 
> Thanks,
> Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression tests time

2018-01-25 Thread Xavi Hernandez
On Thu, Jan 25, 2018 at 3:03 PM, Jeff Darcy wrote:

>
> On Wed, Jan 24, 2018, at 9:37 AM, Xavi Hernandez wrote:
>
> That happens when we use arbitrary delays. If we use an explicit check, it
> will work on all systems.
>
>
> You're arguing against a position not taken. I'm not expressing opposition
> to explicit checks. I'm just saying they don't come for free. If you don't
> believe me, try adding explicit checks in some of the harder cases where
> we're waiting for something that's subject to OS scheduling delays, or for
> large numbers of operations to complete. Geo-replication or multiplexing
> tests should provide some good examples. Adding explicit conditions is the
> right thing to do in the abstract, but as a practical matter the returns
> must justify the cost.
>
> BTW, some of our longest-running tests are in EC. Do we need all of those,
> and do they all need to run as long, or could some be eliminated/shortened?
>

Some tests were already removed some time ago. Anyway, with the changes
introduced, it takes between 10 and 15 minutes to execute all ec-related
tests from basic/ec and bugs/ec (an average of 16 to 25 seconds per test).
Before the changes, the same tests were taking between 30 and 60 minutes.

AFR tests have also improved from almost 60 minutes to around 30.


> I agree that parallelizing tests is the way to go, but if we reduce the
> total time to 50%, the parallelized tests will also take 50% less of the
> time.
>
>
> Taking 50% less time but failing spuriously 1% of the time, or all of the
> time in some environments, is not a good thing. If you want to add explicit
> checks that's great, but you also mentioned shortening timeouts and that's
> much more risky.
>

If we have a single test that takes 45 minutes (as we currently have in
some executions: bugs/nfs/bug-1053579.t), parallelization won't help much.
We need to make this test run faster.
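One way to cut such times, per the explicit-check discussion above, is to replace fixed sleeps with a bounded poll. Gluster's test framework already provides EXPECT_WITHIN for this; the helper below is only an illustrative sketch of the underlying idea, and all names in it are made up.

```shell
# Hedged sketch: poll a command until it prints the expected value, with a
# hard upper bound, instead of sleeping a fixed number of seconds.
wait_for() {
    local timeout=$1 expected=$2
    shift 2
    local end=$(( $(date +%s) + timeout ))
    while [ "$(date +%s)" -lt "$end" ]; do
        # Re-run the supplied command until it prints the expected value.
        [ "$("$@")" = "$expected" ] && return 0
        sleep 1
    done
    return 1
}

# Illustrative usage: wait up to 20s for a pending-heal count to reach 0:
#   wait_for 20 0 get_pending_heal_count
```

On a fast machine the condition is usually met on the first check, so the test finishes immediately instead of waiting out a worst-case sleep; on a slow machine the bound still protects against hangs.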

Some tests that were failing after the changes have revealed errors in the
tests themselves or even in the code, so I think it's a good thing. Currently
I'm investigating what seems to be a race in the rpc layer during connections
that causes some tests to fail. This is a real problem that high delays or
slow machines were hiding. It seems to cause some gluster requests to fail
spuriously after reconnecting to a brick or glusterd. I'm not 100% sure
about this yet, but initial analysis seems to indicate that.
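The kind of reconnect race described here can be shown with a toy model (this is not Gluster's actual rpc code, and all names below are invented): if a client treats itself as connected before the handshake completes, requests submitted in that window fail spuriously; deferring them until the handshake finishes avoids that.

```python
import queue
import threading

class Client:
    """Toy rpc client: defers requests until the handshake completes."""

    def __init__(self):
        self.handshake_done = threading.Event()
        self.pending = queue.Queue()

    def submit(self, req):
        if not self.handshake_done.is_set():
            # Racy variant would send (and fail) here; we queue instead.
            self.pending.put(req)
            return "queued"
        return f"sent:{req}"

    def finish_handshake(self):
        self.handshake_done.set()
        flushed = []
        while not self.pending.empty():
            flushed.append(f"sent:{self.pending.get()}")
        return flushed

c = Client()
print(c.submit("lookup"))      # "queued" -- connection not ready yet
print(c.finish_handshake())    # ["sent:lookup"]
print(c.submit("write"))       # "sent:write"
```

High delays hide this class of bug because the handshake has almost always finished by the time the first request arrives; a fast machine shrinks that window to nothing and exposes it.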

Xavi

Re: [Gluster-devel] Regression tests time

2018-01-25 Thread Jeff Darcy



On Wed, Jan 24, 2018, at 9:37 AM, Xavi Hernandez wrote:
> That happens when we use arbitrary delays. If we use an explicit
> check, it will work on all systems.
You're arguing against a position not taken. I'm not expressing
opposition to explicit checks. I'm just saying they don't come for free.
If you don't believe me, try adding explicit checks in some of the
harder cases where we're waiting for something that's subject to OS
scheduling delays, or for large numbers of operations to complete. Geo-
replication or multiplexing tests should provide some good examples.
Adding explicit conditions is the right thing to do in the abstract, but
as a practical matter the returns must justify the cost.
BTW, some of our longest-running tests are in EC. Do we need all of
those, and do they all need to run as long, or could some be
eliminated/shortened?
> I agree that parallelizing tests is the way to go, but if we reduce
> the total time to 50%, the parallelized tests will also take 50% less
> of the time.
Taking 50% less time but failing spuriously 1% of the time, or all of
the time in some environments, is not a good thing. If you want to add
explicit checks that's great, but you also mentioned shortening timeouts
and that's much more risky.

[Gluster-devel] Glusto failures with dispersed volumes + Samba

2018-01-25 Thread Henrik Kuhn

Hi gluster-devel experts,

I've stumbled upon the same issue/observations as the one
mentioned in
http://lists.gluster.org/pipermail/gluster-devel/2017-July/053234.html.
Nigel Babu and Pranith Kumar Karampuri told me that this selinux-related
issue had also been fixed in >= v3.13.0.


The test setup is as follows:
OS: OpenSUSE Leap 42.3 (selinux features installed/enabled)
SW: gluster 3.13.1
- 3 gluster nodes (gnode[1,2,3], with their own vlan for gluster communication)
- 1 virtualization server (snode) on which the test samba server
instance is running under KVM/QEMU. The image is provided by the
gluster volume /vmvol/.

The gluster volume /vol1/ is for the samba share.

snode:~ # ssh gnode1 gluster vol info all

Volume Name: vmvol
Type: Replicate
Volume ID: a03b8fc1-4fcb-4268-bf09-0f554ba5e7a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gs-gnode1:/data/glusterfs/vmvol/brick1/brick
Brick2: gs-gnode2:/data/glusterfs/vmvol/brick1/brick
Brick3: gs-gnode3:/data/glusterfs/vmvol/brick1/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
server.allow-insecure: on
storage.owner-uid: 500
storage.owner-gid: 500

Volume Name: vol1
Type: Disperse
Volume ID: fb081b58-bffc-4ddd-bf62-a87a13abec9b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: gs-gnode1:/data/glusterfs/vol1/brick1/brick
Brick2: gs-gnode2:/data/glusterfs/vol1/brick1/brick
Brick3: gs-gnode3:/data/glusterfs/vol1/brick1/brick
Brick4: gs-gnode1:/data/glusterfs/vol1/brick2/brick
Brick5: gs-gnode2:/data/glusterfs/vol1/brick2/brick
Brick6: gs-gnode3:/data/glusterfs/vol1/brick2/brick
Options Reconfigured:
performance.readdir-ahead: on
storage.batch-fsync-delay-usec: 0
performance.stat-prefetch: off
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
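
For reference, the "1 x (4 + 2) = 6" layout above means each file is encoded into 4 data fragments plus 2 redundancy fragments, so the volume survives the loss of any 2 bricks and usable capacity is 4/6 of raw. A small sketch of that arithmetic (the helper name and brick size are made up for illustration):

```python
def disperse_stats(k: int, m: int, brick_size_gb: float) -> dict:
    """Capacity arithmetic for a disperse layout with k data and m
    redundancy fragments (vol1 above uses k=4, m=2)."""
    bricks = k + m
    return {
        "bricks": bricks,
        "tolerated_failures": m,          # any m bricks may fail
        "usable_gb": k * brick_size_gb,   # data fraction of raw space
        "raw_gb": bricks * brick_size_gb,
    }

stats = disperse_stats(4, 2, 100.0)
print(stats)  # 6 bricks, 2 tolerated failures, 400.0 GB usable of 600.0 raw
```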

The samba server config is:
samba:~ # cat /etc/samba/smb.conf
[global]
  workgroup = TESTGROUP
  server string = Samba Server Version %v
  log file = /var/log/samba/log.%m
  log level = 2
  map to guest = Bad User
  unix charset = UTF-8
  idmap config * : backend = autorid
  idmap config * : range = 100-199

[gluster]
  kernel share modes = no
  vfs objects = acl_xattr glusterfs
  comment = glusterfs based volume
  browseable = Yes
  read only = No
  writeable = Yes
  public = Yes
  guest ok = Yes
  inherit acls = Yes
  path = /shares/data/
  glusterfs:volume = vol1
  glusterfs:volfile_server = gs-gnode1.origenis.de gs-gnode2.origenis.de gs-gnode3.origenis.de
#  glusterfs:volfile_server = 172.17.20.1 172.17.20.2 172.17.20.3
  glusterfs:loglevel = 9
  glusterfs:logfile = /var/log/samba/glusterfs-vol1.%M.log


The corresponding logs on the samba server are:
* The vfs_gluster log shows lots of:
[2018-01-15 15:27:49.349995] D [logging.c:1817:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2018-01-15 15:27:49.351598] D [rpc-clnt.c:1047:rpc_clnt_connection_init] 0-gfapi: defaulting frame-timeout to 30mins
[2018-01-15 15:27:49.351625] D [rpc-clnt.c:1061:rpc_clnt_connection_init] 0-gfapi: disable ping-timeout
[2018-01-15 15:27:49.351644] D [rpc-transport.c:279:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.13.1/rpc-transport/socket.so
[2018-01-15 15:27:49.352372] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-gfapi: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-01-15 15:27:49.352402] T [MSGID: 0] [options.c:86:xlator_option_validate_int] 0-gfapi: no range check required for 'option remote-port 24007'
[2018-01-15 15:27:49.354865] D [socket.c:4236:socket_init] 0-gfapi: Configued transport.tcp-user-timeout=0
[2018-01-15 15:27:49.354885] D [socket.c:4254:socket_init] 0-gfapi: Reconfigued transport.keepalivecnt=9
[2018-01-15 15:27:49.354901] D [socket.c:4339:socket_init] 0-gfapi: SSL support on the I/O path is NOT enabled
[2018-01-15 15:27:49.354914] D [socket.c:4342:socket_init] 0-gfapi: SSL support for glusterd is NOT enabled
[2018-01-15 15:27:49.354926] D [socket.c:4359:socket_init] 0-gfapi: using system polling thread
[2018-01-15 15:27:49.354943] D [rpc-clnt.c:1567:rpcclnt_cbk_program_register] 0-gfapi: New program registered: GlusterFS Callback, Num: 52743234, Ver: 1
[2018-01-15 15:27:49.354958] T [rpc-clnt.c:406:rpc_clnt_reconnect] 0-gfapi: attempting reconnect
[2018-01-15 15:27:49.354971] T [socket.c:3146:socket_connect] 0-gfapi: connecting

[Gluster-devel] Coverity covscan for 2018-01-25-b7844629 (master branch)

2018-01-25 Thread staticanalysis
GlusterFS Coverity covscan results are available from
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-01-25-b7844629


Re: [Gluster-devel] [FAILED][master] tests/basic/afr/durability-off.t

2018-01-25 Thread Pranith Kumar Karampuri
On Thu, Jan 25, 2018 at 3:09 PM, Milind Changire wrote:

> could AFR engineers check why tests/basic/afr/durability-off.t fails in
> brick-mux mode;
>

The issue seems to be something with the connections to the bricks at the
time of mount.

09:30:04 dd: opening `/mnt/glusterfs/0/a.txt': Transport endpoint is not connected
09:30:10 ./tests/basic/afr/durability-off.t ..


> here's the job URL: https://build.gluster.org/job/centos6-regression/8654/
> console
>
> --
> Milind
>
>



-- 
Pranith

[Gluster-devel] [FAILED][master] tests/basic/afr/durability-off.t

2018-01-25 Thread Milind Changire
could AFR engineers check why tests/basic/afr/durability-off.t fails in
brick-mux mode;

here's the job URL:
https://build.gluster.org/job/centos6-regression/8654/console

-- 
Milind

Re: [Gluster-devel] [Gluster-Maintainers] Release 4.0: Branched

2018-01-25 Thread Atin Mukherjee
Shyam,

We need a 4.0 version created in Bugzilla for GlusterFS; it is currently
missing. I have a patch to backport to this branch.

On Wed, Jan 24, 2018 at 1:47 AM, Shyam Ranganathan wrote:

> 4.0 release has been branched!
>
> I will follow this up with a more detailed schedule for the release, and
> also the granted feature backport exceptions that we are waiting on.
>
> Feature backports would need to make it in by this weekend, so that we
> can tag RC0 by the end of the month.
>
> Only exception could be: https://review.gluster.org/#/c/19223/
>
> Thanks,
> Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> http://lists.gluster.org/mailman/listinfo/maintainers
>