Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Nithya Balachandran
On 31 July 2018 at 22:11, Atin Mukherjee  wrote:

> I just went through the nightly regression report of brick mux runs and
> here's what I can summarize.
>
> 
> 
> ============================================================
> Fails only with brick-mux
> ============================================================
> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
> 400 secs. Refer to
> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
> specifically the latest report
> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
> It wasn't timing out as frequently until 12 July, but since 27 July it has
> timed out twice. Beginning to believe commit
> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
> secs isn't sufficient (Mohit?)
>

One of the failed regression-test-burn-in runs was an actual failure, not a
timeout:
https://build.gluster.org/job/regression-test-burn-in/4049
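
To locate the failing step in that run, the console log can be pulled and
filtered; a minimal sketch (the "not ok" pattern is an assumption about the
prove/TAP-style lines in the job console):

    curl -s https://build.gluster.org/job/regression-test-burn-in/4049/consoleText \
      | grep -n "not ok" | head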

The brick disconnects from glusterd:

[2018-07-27 16:28:42.882668] I [MSGID: 106005]
[glusterd-handler.c:6129:__glusterd_brick_rpc_notify] 0-management: Brick
builder103.cloud.gluster.org:/d/backends/vol01/brick0 has disconnected from
glusterd.
[2018-07-27 16:28:42.891031] I [MSGID: 106143]
[glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick
/d/backends/vol01/brick0 on port 49152
[2018-07-27 16:28:42.892379] I [MSGID: 106143]
[glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick (null) on
port 49152
[2018-07-27 16:29:02.636027]:++
G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 56 _GFS
--attribute-timeout=0 --entry-timeout=0 -s builder103.cloud.gluster.org
--volfile-id=patchy-vol20 /mnt/glusterfs/vol20 ++


So the client cannot connect to the bricks after this as it never gets the
port info from glusterd. From mnt-glusterfs-vol20.log:

[2018-07-27 16:29:02.769947] I [MSGID: 114020] [client.c:2329:notify]
0-patchy-vol20-client-1: parent translators are ready, attempting connect
on transport
[2018-07-27 16:29:02.770677] E [MSGID: 114058]
[client-handshake.c:1518:client_query_portmap_cbk]
0-patchy-vol20-client-0: failed
to get the port number for remote subvolume. Please run 'gluster volume
status' on server to see if brick process is running.
[2018-07-27 16:29:02.770767] I [MSGID: 114018]
[client.c:2255:client_rpc_notify] 0-patchy-vol20-client-0: disconnected
from patchy-vol20-client-0. Client process will keep trying to connect to
glusterd until brick's port is available
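
To confirm that glusterd really lost the brick's portmap entry (rather than
the client asking for the wrong volume), something like the following can be
run on the server; a minimal sketch, assuming the default glusterd log
location, which may differ on the builders:

    # does glusterd still advertise a port for this brick?
    gluster volume status patchy-vol20 | grep -A1 brick0

    # was the portmap entry removed around the time of the disconnect?
    grep pmap_registry_remove /var/log/glusterfs/glusterd.log | tail -n 5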


From the brick logs:
[2018-07-27 16:28:34.729241] I [login.c:111:gf_auth] 0-auth/login: allowed
user names: 2b65c380-392e-459f-b722-c130aac29377
[2018-07-27 16:28:34.945474] I [MSGID: 115029]
[server-handshake.c:786:server_setvolume] 0-patchy-vol01-server: accepted
client from
CTX_ID:72dcd65e-2125-4a79-8331-48c0fe9abce7-GRAPH_ID:0-PID:8483-HOST:builder103.cloud.gluster.org-PC_NAME:patchy-vol06-client-2-RECON_NO:-0
(version: 4.2dev)
[2018-07-27 16:28:35.946588] I [MSGID: 101016]
[glusterfs3.h:739:dict_to_xdr] 0-dict: key 'glusterfs.xattrop_index_gfid'
is would not be sent on wire in future [Invalid argument]   <--- Last
brick log. It looks like the brick went down at this point.
[2018-07-27 16:29:02.636027]:++
G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 56 _GFS
--attribute-timeout=0 --entry-timeout=0 -s builder103.cloud.gluster.org
--volfile-id=patchy-vol20 /mnt/glusterfs/vol20 ++
[2018-07-27 16:29:12.021827]:++
G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 83 dd
if=/dev/zero of=/mnt/glusterfs/vol20/a_file bs=4k count=1 ++
[2018-07-27 16:29:12.039248]:++
G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 87 killall
-9 glusterd ++
[2018-07-27 16:29:17.073995]:++
G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 89 killall
-9 glusterfsd ++
[2018-07-27 16:29:22.096385]:++
G_LOG:./tests/bugs/core/bug-1432542-mpx-restart-crash.t: TEST: 95 glusterd
++
[2018-07-27 16:29:24.481555] I [MSGID: 100030] [glusterfsd.c:2728:main]
0-/build/install/sbin/glusterfsd: Started running
/build/install/sbin/glusterfsd version 4.2dev (args:
/build/install/sbin/glusterfsd -s builder103.cloud.gluster.org --volfile-id
patchy-vol01.builder103.cloud.gluster.org.d-backends-vol01-brick0 -p
/var/run/gluster/vols/patchy-vol01/builder103.cloud.gluster.org-d-backends-vol01-brick0.pid
-S /var/run/gluster/f4d6c8f7c3f85b18.socket --brick-name
/d/backends/vol01/brick0 -l
/var/log/glusterfs/bricks/d-backends-vol01-brick0.log --xlator-option
*-posix.glusterd-uuid=0db25f79-8880-4f2d-b1e8-584e751ff0b9 --process-name
brick --brick-port 49153 --xlator-option
patchy-vol01-serv

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Raghavendra Gowdappa
On Fri, Aug 3, 2018 at 5:58 PM, Yaniv Kaul  wrote:

> Why not revert, fix and resubmit (unless you can quickly fix it)?
> Y.
>

https://review.gluster.org/20634


>
> On Fri, Aug 3, 2018, 5:04 PM Raghavendra Gowdappa 
> wrote:
>
>> Will take a look.
>>
>> On Fri, Aug 3, 2018 at 3:08 PM, Krutika Dhananjay 
>> wrote:
>>
>>> Adding Raghavendra G who actually restored and reworked on this after it
>>> was abandoned.
>>>
>>> -Krutika
>>>
>>> On Fri, Aug 3, 2018 at 2:38 PM, Nithya Balachandran wrote:
>>>
 Using git bisect, the patch that introduced this behaviour is :

 commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
 Author: Krutika Dhananjay 
 Date:   Tue Jan 17 16:40:04 2017 +0530

 performance/readdir-ahead: Invalidate cached dentries if they're
 modified while in cache

 Krutika, can you take a look and fix this?

 To summarize, this is _not_ a spurious failure.


 regards,
 Nithya


 On 3 August 2018 at 14:13, Nithya Balachandran 
 wrote:

> This is a new issue - the test uses ls -l to get some information.
> With the latest master, ls -l returns strange results the first time it is
> called on the mount point causing the test to fail:
>
>
> With the latest master, I created a single brick volume and some files
> inside it.
>
> [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
> 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
> again"; ls -l /mnt/fuse1
> umount: /mnt/fuse1: not mounted
> total 0
> *--. 0 root root 0 Jan  1  1970 file-1*
> *--. 0 root root 0 Jan  1  1970 file-2*
> *--. 0 root root 0 Jan  1  1970 file-3*
> *--. 0 root root 0 Jan  1  1970 file-4*
> *--. 0 root root 0 Jan  1  1970 file-5*
> *d-. 0 root root 0 Jan  1  1970 subdir*
> Trying again
> total 3
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
> d-. 0 root root  0 Jan  1  1970 subdir
> [root@rhgs313-6 ~]#
>
>
>
> This is consistently reproducible. I am still debugging this to see
> which patch caused this.
>
> regards,
> Nithya
>
>
> On 2 August 2018 at 07:13, Atin Mukherjee 
> wrote:
>
>>
>>
>> On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:
>>
>>> Will have a look at it and update.
>>>
>>
>> There’s already a patch from Mohit for this.
>>
>>
>>> Susant
>>>
>>> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
>>> wrote:
>>>
 Same here - https://build.gluster.org/job/centos7-regression/2024/
 console

 -Krutika

 On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee <
 amukh...@redhat.com> wrote:

> tests/bugs/distribute/bug-1122443.t fails my set up (3 out of 5
> times) running with master branch. As per my knowledge I've not seen 
> this
> test failing earlier. Looks like some recent changes has caused it. 
> One of
> such instance is https://build.gluster.org/job/
> centos7-regression/1955/ .
>
> Request the component owners to take a look at it.
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>> --
>> --Atin
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>

>>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Coverity covscan for 2018-08-03-93d7f3f2 (master branch)

2018-08-03 Thread staticanalysis


GlusterFS Coverity covscan results for the master branch are available from
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2018-08-03-93d7f3f2/

Coverity covscan results for other active branches are also available at
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Yaniv Kaul
Why not revert, fix and resubmit (unless you can quickly fix it)?
Y.


On Fri, Aug 3, 2018, 5:04 PM Raghavendra Gowdappa 
wrote:

> Will take a look.
>
> On Fri, Aug 3, 2018 at 3:08 PM, Krutika Dhananjay 
> wrote:
>
>> Adding Raghavendra G who actually restored and reworked on this after it
>> was abandoned.
>>
>> -Krutika
>>
>> On Fri, Aug 3, 2018 at 2:38 PM, Nithya Balachandran 
>> wrote:
>>
>>> Using git bisect, the patch that introduced this behaviour is :
>>>
>>> commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
>>> Author: Krutika Dhananjay 
>>> Date:   Tue Jan 17 16:40:04 2017 +0530
>>>
>>> performance/readdir-ahead: Invalidate cached dentries if they're
>>> modified while in cache
>>>
>>> Krutika, can you take a look and fix this?
>>>
>>> To summarize, this is _not_ a spurious failure.
>>>
>>>
>>> regards,
>>> Nithya
>>>
>>>
>>> On 3 August 2018 at 14:13, Nithya Balachandran 
>>> wrote:
>>>
 This is a new issue - the test uses ls -l to get some information. With
 the latest master, ls -l returns strange results the first time it is
 called on the mount point causing the test to fail:


 With the latest master, I created a single brick volume and some files
 inside it.

 [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
 again"; ls -l /mnt/fuse1
 umount: /mnt/fuse1: not mounted
 total 0
 *--. 0 root root 0 Jan  1  1970 file-1*
 *--. 0 root root 0 Jan  1  1970 file-2*
 *--. 0 root root 0 Jan  1  1970 file-3*
 *--. 0 root root 0 Jan  1  1970 file-4*
 *--. 0 root root 0 Jan  1  1970 file-5*
 *d-. 0 root root 0 Jan  1  1970 subdir*
 Trying again
 total 3
 -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
 -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
 -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
 -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
 -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
 d-. 0 root root  0 Jan  1  1970 subdir
 [root@rhgs313-6 ~]#



 This is consistently reproducible. I am still debugging this to see
 which patch caused this.

 regards,
 Nithya


 On 2 August 2018 at 07:13, Atin Mukherjee 
 wrote:

>
>
> On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:
>
>> Will have a look at it and update.
>>
>
> There’s already a patch from Mohit for this.
>
>
>> Susant
>>
>> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
>> wrote:
>>
>>> Same here -
>>> https://build.gluster.org/job/centos7-regression/2024/console
>>>
>>> -Krutika
>>>
>>> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee >> > wrote:
>>>
 tests/bugs/distribute/bug-1122443.t fails my set up (3 out of 5
 times) running with master branch. As per my knowledge I've not seen 
 this
 test failing earlier. Looks like some recent changes has caused it. 
 One of
 such instance is
 https://build.gluster.org/job/centos7-regression/1955/ .

 Request the component owners to take a look at it.

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel

>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
> --
> --Atin
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>


>>>
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Milind Changire
On Fri, Aug 3, 2018 at 11:04 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> On Thu, Aug 2, 2018 at 10:03 PM Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> On Thu, Aug 2, 2018 at 7:19 PM Atin Mukherjee 
>> wrote:
>>
>>> New addition - tests/basic/volume.t - failed at least twice with a shd core.
>>>
>>> One such ref - https://build.gluster.org/job/centos7-regression/2058/
>>> console
>>>
>>
>> I will take a look.
>>
>
> The crash is happening inside libc and there are no line numbers to debug
> further. Is there any way to get symbols and line numbers even for that? We
> could find hints as to what is going wrong. Let me try to re-create it on
> the machines I have in the meanwhile.
>
> (gdb) bt
> #0  0x7feae916bb4f in _IO_cleanup () from ./lib64/libc.so.6
> #1  0x7feae9127b8b in __run_exit_handlers () from ./lib64/libc.so.6
> #2  0x7feae9127c27 in exit () from ./lib64/libc.so.6
> #3  0x00408ba5 in cleanup_and_exit (signum=15) at
> /home/jenkins/root/workspace/centos7-regression/glusterfsd/
> src/glusterfsd.c:1570
> #4  0x0040a75f in glusterfs_sigwaiter (arg=0x7ffe6faa7540) at
> /home/jenkins/root/workspace/centos7-regression/glusterfsd/
> src/glusterfsd.c:2332
> #5  0x7feae9b27e25 in start_thread () from ./lib64/libpthread.so.0
> #6  0x7feae91ecbad in clone () from ./lib64/libc.so.6
>
You could install glibc-debuginfo and the other relevant debuginfo packages on
the system you are trying to reproduce this issue on. That will get you the
line numbers and symbols.
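
On a CentOS 7 builder that is roughly the following; a minimal sketch,
assuming the debuginfo repos are enabled, and with the core file path as a
placeholder:

    # pull debug symbols for the library that shows up without line info
    debuginfo-install -y glibc

    # re-open the core with the freshly installed symbols
    gdb /build/install/sbin/glusterfsd /path/to/core -ex "bt full" -ex quit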


>>
>>>
>>>
>>> On Thu, Aug 2, 2018 at 6:28 PM Sankarshan Mukhopadhyay <
>>> sankarshan.mukhopadh...@gmail.com> wrote:
>>>
 On Thu, Aug 2, 2018 at 5:48 PM, Kotresh Hiremath Ravishankar
  wrote:
 > I am facing different issue in softserve machines. The fuse mount
 itself is
 > failing.
 > I tried day before yesterday to debug geo-rep failures. I discussed
 with
 > Raghu,
 > but could not root cause it. So none of the tests were passing. It
 happened
 > on
 > both machine instances I tried.
 >

 Ugh! -infra team should have an issue to work with and resolve this.


 --
 sankarshan mukhopadhyay
 
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel

>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>
>> --
>> Pranith
>>
>
>
> --
> Pranith
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Milind
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Raghavendra Gowdappa
On Fri, Aug 3, 2018 at 4:01 PM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi Du/Poornima,
>
> I was analysing bitrot and geo-rep failures and I suspect there is a bug
> in some perf xlator
> that was one of the cause. I was seeing following behaviour in few runs.
>
> 1. Geo-rep synced data to slave. It creats empty file and then rsync syncs
> data.
> But test does "stat --format "%F" " to confirm. If it's empty,
> it returns
> "regular empty file" else "regular file". I believe it did get the
> "regular empty file"
> instead of "regular file" until timeout.
>

https://review.gluster.org/20549 might be relevant.


> 2. Other behaviour is with bitrot, with brick-mux. If a file is deleted on
> the back end on one brick
> and the look up is done. What all performance xlators needs to be
> disabled to get the lookup/revalidate
> on the brick where the file was deleted. Earlier, only md-cache was
> disable and it used to work.
> No it's failing intermittently.
>

You need to disable readdirplus in the entire stack. Refer to
https://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
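
From that thread, turning readdirp off end to end involves roughly the
following; a minimal sketch where the option names should be cross-checked
against the linked post, and <volname>/<host> are placeholders:

    # stop md-cache and the layers above it from forcing readdirp
    gluster volume set <volname> performance.force-readdirp off

    # drop readdir-ahead, which operates on readdirp replies
    gluster volume set <volname> performance.readdir-ahead off

    # remount the client without readdirp on the FUSE side
    mount -t glusterfs -o use-readdirp=no <host>:/<volname> /mnt/test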


> Are there any pending patches around these areas that needs to be merged ?
> If there are, then it could be affecting other tests as well.
>
> Thanks,
> Kotresh HR
>
> On Fri, Aug 3, 2018 at 3:07 PM, Karthik Subrahmanya 
> wrote:
>
>>
>>
>> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya 
>> wrote:
>>
>>>
>>>
>>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
>>> wrote:
>>>


 On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, 
 wrote:

> I just went through the nightly regression report of brick mux runs
> and here's what I can summarize.
>
> 
> 
> =
> Fails only with brick-mux
> 
> 
> =
> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even
> after 400 secs. Refer https://fstat.gluster.org/fail
> ure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
> specifically the latest report https://build.gluster.org/job/
> regression-test-burn-in/4051/consoleText . Wasn't timing out as
> frequently as it was till 12 July. But since 27 July, it has timed out
> twice. Beginning to believe commit 
> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2
> has added the delay and now 400 secs isn't sufficient enough (Mohit?)
>
> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Ref - https://build.gluster.org/job/regression-test-with-multiplex
> /814/console) -  Test fails only in brick-mux mode, AI on Atin to
> look at and get back.
>
> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
> https://build.gluster.org/job/regression-test-with-multiple
> x/813/console) - Seems like failed just twice in last 30 days as per
> https://fstat.gluster.org/failure/251?state=2&start_date=
> 2018-06-30&end_date=2018-07-31&branch=all. Need help from AFR team.
>
> tests/bugs/quota/bug-1293601.t (https://build.gluster.org/job
> /regression-test-with-multiplex/812/console) - Hasn't failed after 26
> July and earlier it was failing regularly. Did we fix this test through 
> any
> patch (Mohit?)
>
> tests/bitrot/bug-1373520.t - (https://build.gluster.org/job
> /regression-test-with-multiplex/811/console)  - Hasn't failed after
> 27 July and earlier it was failing regularly. Did we fix this test through
> any patch (Mohit?)
>
> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a
> core, not sure if related to brick mux or not, so not sure if brick mux is
> culprit here or not. Ref - https://build.gluster.org/job/
> regression-test-with-multiplex/806/console . Seems to be a glustershd
> crash. Need help from AFR folks.
>
> 
> 
> =
> Fails for non-brick mux case too
> 
> 
> =
> tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
> very often, with out brick mux as well. Refer
> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
> . There's an email in gluster-devel and a BZ 1610240 for the same.
>
> tests/bugs/bug-1368312.t - Seems to be recent failures (
> https://build.glus

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Karthik Subrahmanya
On Fri, Aug 3, 2018 at 3:07 PM Karthik Subrahmanya 
wrote:

>
>
> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya 
> wrote:
>
>>
>>
>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
>> wrote:
>>
>>>
>>>
>>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, 
>>> wrote:
>>>
 I just went through the nightly regression report of brick mux runs and
 here's what I can summarize.


 =
 Fails only with brick-mux

 =
 tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
 400 secs. Refer
 https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
 specifically the latest report
 https://build.gluster.org/job/regression-test-burn-in/4051/consoleText
 . Wasn't timing out as frequently as it was till 12 July. But since 27
 July, it has timed out twice. Beginning to believe commit
 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
 secs isn't sufficient enough (Mohit?)

 tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
 (Ref -
 https://build.gluster.org/job/regression-test-with-multiplex/814/console)
 -  Test fails only in brick-mux mode, AI on Atin to look at and get back.

 tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
 https://build.gluster.org/job/regression-test-with-multiplex/813/console)
 - Seems like failed just twice in last 30 days as per
 https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
 Need help from AFR team.

 tests/bugs/quota/bug-1293601.t (
 https://build.gluster.org/job/regression-test-with-multiplex/812/console)
 - Hasn't failed after 26 July and earlier it was failing regularly. Did we
 fix this test through any patch (Mohit?)

 tests/bitrot/bug-1373520.t - (
 https://build.gluster.org/job/regression-test-with-multiplex/811/console)
 - Hasn't failed after 27 July and earlier it was failing regularly. Did we
 fix this test through any patch (Mohit?)

 tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
 not sure if related to brick mux or not, so not sure if brick mux is
 culprit here or not. Ref -
 https://build.gluster.org/job/regression-test-with-multiplex/806/console
 . Seems to be a glustershd crash. Need help from AFR folks.


 =
 Fails for non-brick mux case too

 =
 tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
 very often, with out brick mux as well. Refer
 https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
 . There's an email in gluster-devel and a BZ 1610240 for the same.

 tests/bugs/bug-1368312.t - Seems to be recent failures (
 https://build.gluster.org/job/regression-test-with-multiplex/815/console)
 - seems to be a new failure, however seen this for a non-brick-mux case too
 -
 https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
 . Need some eyes from AFR folks.

 tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to
 brick mux, have seen this failing at multiple default regression runs.
 Refer
 https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
 . We need help from geo-rep dev to root cause this earlier than later

 tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
 mux, have seen this failing at multiple default regression runs. Refer
 https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
 . We need help from geo-rep dev to root cause this earlier than later

 tests/bugs/glusterd/validating-server-quorum.t (
 https://build.gluster.org/job/regression-test-with-multiplex/810/console)
 - Fails for non-brick-mux cases too,
 https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
 .  Atin has a patch https://review.gluster.org/20584 which resolves it
 but patch is failing regression for a different test which is unrelated.

 tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Raghavendra Gowdappa
Will take a look.

On Fri, Aug 3, 2018 at 3:08 PM, Krutika Dhananjay 
wrote:

> Adding Raghavendra G who actually restored and reworked on this after it
> was abandoned.
>
> -Krutika
>
> On Fri, Aug 3, 2018 at 2:38 PM, Nithya Balachandran 
> wrote:
>
>> Using git bisect, the patch that introduced this behaviour is :
>>
>> commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
>> Author: Krutika Dhananjay 
>> Date:   Tue Jan 17 16:40:04 2017 +0530
>>
>> performance/readdir-ahead: Invalidate cached dentries if they're
>> modified while in cache
>>
>> Krutika, can you take a look and fix this?
>>
>> To summarize, this is _not_ a spurious failure.
>>
>>
>> regards,
>> Nithya
>>
>>
>> On 3 August 2018 at 14:13, Nithya Balachandran 
>> wrote:
>>
>>> This is a new issue - the test uses ls -l to get some information. With
>>> the latest master, ls -l returns strange results the first time it is
>>> called on the mount point causing the test to fail:
>>>
>>>
>>> With the latest master, I created a single brick volume and some files
>>> inside it.
>>>
>>> [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
>>> 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
>>> again"; ls -l /mnt/fuse1
>>> umount: /mnt/fuse1: not mounted
>>> total 0
>>> *--. 0 root root 0 Jan  1  1970 file-1*
>>> *--. 0 root root 0 Jan  1  1970 file-2*
>>> *--. 0 root root 0 Jan  1  1970 file-3*
>>> *--. 0 root root 0 Jan  1  1970 file-4*
>>> *--. 0 root root 0 Jan  1  1970 file-5*
>>> *d-. 0 root root 0 Jan  1  1970 subdir*
>>> Trying again
>>> total 3
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
>>> d-. 0 root root  0 Jan  1  1970 subdir
>>> [root@rhgs313-6 ~]#
>>>
>>>
>>>
>>> This is consistently reproducible. I am still debugging this to see
>>> which patch caused this.
>>>
>>> regards,
>>> Nithya
>>>
>>>
>>> On 2 August 2018 at 07:13, Atin Mukherjee 
>>> wrote:
>>>


 On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:

> Will have a look at it and update.
>

 There’s already a patch from Mohit for this.


> Susant
>
> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
> wrote:
>
>> Same here - https://build.gluster.org/job/
>> centos7-regression/2024/console
>>
>> -Krutika
>>
>> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee 
>> wrote:
>>
>>> tests/bugs/distribute/bug-1122443.t fails my set up (3 out of 5
>>> times) running with master branch. As per my knowledge I've not seen 
>>> this
>>> test failing earlier. Looks like some recent changes has caused it. One 
>>> of
>>> such instance is https://build.gluster.org/job/
>>> centos7-regression/1955/ .
>>>
>>> Request the component owners to take a look at it.
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

 --
 --Atin

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel

>>>
>>>
>>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Kotresh Hiremath Ravishankar
Hi Du/Poornima,

I was analysing the bitrot and geo-rep failures and I suspect there is a bug
in some perf xlator that is one of the causes. I was seeing the following
behaviour in a few runs.

1. Geo-rep synced data to the slave. It creates an empty file and then rsync
syncs the data. The test runs "stat --format "%F" " on the file to confirm:
if the file is empty, it returns "regular empty file", else "regular file".
I believe it kept getting "regular empty file" instead of "regular file"
until the timeout (a minimal sketch of this kind of check is included after
point 2 below).

2. The other behaviour is with bitrot, with brick-mux. A file is deleted on
the backend on one brick and then a lookup is done. Which performance xlators
need to be disabled so that the lookup/revalidate reaches the brick where the
file was deleted? Earlier, disabling only md-cache used to work; now it's
failing intermittently.
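
A minimal sketch of the kind of check I mean for (1), with a hypothetical
file path and retry window rather than the actual test code:

    f=/mnt/slave/some-synced-file      # hypothetical path on the slave mount
    for i in $(seq 1 30); do
        ftype=$(stat --format "%F" "$f")
        # once rsync has landed the data this flips from "regular empty file"
        [ "$ftype" = "regular file" ] && break
        sleep 1
    done
    echo "final type: $ftype"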

Are there any pending patches around these areas that need to be merged? If
there are, they could be affecting other tests as well.

Thanks,
Kotresh HR

On Fri, Aug 3, 2018 at 3:07 PM, Karthik Subrahmanya 
wrote:

>
>
> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya 
> wrote:
>
>>
>>
>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
>> wrote:
>>
>>>
>>>
>>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, 
>>> wrote:
>>>
 I just went through the nightly regression report of brick mux runs and
 here's what I can summarize.

 
 
 =
 Fails only with brick-mux
 
 
 =
 tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
 400 secs. Refer https://fstat.gluster.org/failure/209?state=2&start_
 date=2018-06-30&end_date=2018-07-31&branch=all, specifically the
 latest report https://build.gluster.org/job/
 regression-test-burn-in/4051/consoleText . Wasn't timing out as
 frequently as it was till 12 July. But since 27 July, it has timed out
 twice. Beginning to believe commit 9400b6f2c8aa219a493961e0ab9770b7f12e80d2
 has added the delay and now 400 secs isn't sufficient enough (Mohit?)

 tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
 (Ref - https://build.gluster.org/job/regression-test-with-
 multiplex/814/console) -  Test fails only in brick-mux mode, AI on
 Atin to look at and get back.

 tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
 https://build.gluster.org/job/regression-test-with-
 multiplex/813/console) - Seems like failed just twice in last 30 days
 as per https://fstat.gluster.org/failure/251?state=2&start_
 date=2018-06-30&end_date=2018-07-31&branch=all. Need help from AFR
 team.

 tests/bugs/quota/bug-1293601.t (https://build.gluster.org/
 job/regression-test-with-multiplex/812/console) - Hasn't failed after
 26 July and earlier it was failing regularly. Did we fix this test through
 any patch (Mohit?)

 tests/bitrot/bug-1373520.t - (https://build.gluster.org/
 job/regression-test-with-multiplex/811/console)  - Hasn't failed after
 27 July and earlier it was failing regularly. Did we fix this test through
 any patch (Mohit?)

 tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a
 core, not sure if related to brick mux or not, so not sure if brick mux is
 culprit here or not. Ref - https://build.gluster.org/job/
 regression-test-with-multiplex/806/console . Seems to be a glustershd
 crash. Need help from AFR folks.

 
 
 ============================================================
 Fails for non-brick mux case too
 ============================================================
 tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
 very often, with out brick mux as well. Refer
 https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
 . There's an email in gluster-devel and a BZ 1610240 for the same.

 tests/bugs/bug-1368312.t - Seems to be recent failures (
 https://build.gluster.org/job/regression-test-with-
 multiplex/815/console) - seems to be a new failure, however seen this
 for a non-brick-mux case too - https://build.gluster.org/job/
 regression-test-burn-in/4039/consoleText . Need some eyes from AFR
 folks.

 tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to
 brick mux, have seen this failing at multiple default regression runs.
 Refer 

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Krutika Dhananjay
Adding Raghavendra G who actually restored and reworked on this after it
was abandoned.

-Krutika

On Fri, Aug 3, 2018 at 2:38 PM, Nithya Balachandran 
wrote:

> Using git bisect, the patch that introduced this behaviour is :
>
> commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
> Author: Krutika Dhananjay 
> Date:   Tue Jan 17 16:40:04 2017 +0530
>
> performance/readdir-ahead: Invalidate cached dentries if they're
> modified while in cache
>
> Krutika, can you take a look and fix this?
>
> To summarize, this is _not_ a spurious failure.
>
>
> regards,
> Nithya
>
>
> On 3 August 2018 at 14:13, Nithya Balachandran 
> wrote:
>
>> This is a new issue - the test uses ls -l to get some information. With
>> the latest master, ls -l returns strange results the first time it is
>> called on the mount point causing the test to fail:
>>
>>
>> With the latest master, I created a single brick volume and some files
>> inside it.
>>
>> [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
>> 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
>> again"; ls -l /mnt/fuse1
>> umount: /mnt/fuse1: not mounted
>> total 0
>> *--. 0 root root 0 Jan  1  1970 file-1*
>> *--. 0 root root 0 Jan  1  1970 file-2*
>> *--. 0 root root 0 Jan  1  1970 file-3*
>> *--. 0 root root 0 Jan  1  1970 file-4*
>> *--. 0 root root 0 Jan  1  1970 file-5*
>> *d-. 0 root root 0 Jan  1  1970 subdir*
>> Trying again
>> total 3
>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
>> d-. 0 root root  0 Jan  1  1970 subdir
>> [root@rhgs313-6 ~]#
>>
>>
>>
>> This is consistently reproducible. I am still debugging this to see which
>> patch caused this.
>>
>> regards,
>> Nithya
>>
>>
>> On 2 August 2018 at 07:13, Atin Mukherjee 
>> wrote:
>>
>>>
>>>
>>> On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:
>>>
 Will have a look at it and update.

>>>
>>> There’s already a patch from Mohit for this.
>>>
>>>
 Susant

 On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
 wrote:

> Same here - https://build.gluster.org/job/
> centos7-regression/2024/console
>
> -Krutika
>
> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee 
> wrote:
>
>> tests/bugs/distribute/bug-1122443.t fails my set up (3 out of 5
>> times) running with master branch. As per my knowledge I've not seen this
>> test failing earlier. Looks like some recent changes has caused it. One 
>> of
>> such instance is https://build.gluster.org/job/
>> centos7-regression/1955/ .
>>
>> Request the component owners to take a look at it.
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>> --
>>> --Atin
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>
>>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Karthik Subrahmanya
On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya 
wrote:

>
>
> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
> wrote:
>
>>
>>
>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, 
>> wrote:
>>
>>> I just went through the nightly regression report of brick mux runs and
>>> here's what I can summarize.
>>>
>>>
>>> =
>>> Fails only with brick-mux
>>>
>>> =
>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>>> 400 secs. Refer
>>> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
>>> specifically the latest report
>>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText
>>> . Wasn't timing out as frequently as it was till 12 July. But since 27
>>> July, it has timed out twice. Beginning to believe commit
>>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
>>> secs isn't sufficient enough (Mohit?)
>>>
>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>> (Ref -
>>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>>> -  Test fails only in brick-mux mode, AI on Atin to look at and get back.
>>>
>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
>>> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>>> - Seems like failed just twice in last 30 days as per
>>> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>> Need help from AFR team.
>>>
>>> tests/bugs/quota/bug-1293601.t (
>>> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>>> - Hasn't failed after 26 July and earlier it was failing regularly. Did we
>>> fix this test through any patch (Mohit?)
>>>
>>> tests/bitrot/bug-1373520.t - (
>>> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>>> - Hasn't failed after 27 July and earlier it was failing regularly. Did we
>>> fix this test through any patch (Mohit?)
>>>
>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
>>> not sure if related to brick mux or not, so not sure if brick mux is
>>> culprit here or not. Ref -
>>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>>> . Seems to be a glustershd crash. Need help from AFR folks.
>>>
>>>
>>> =
>>> Fails for non-brick mux case too
>>>
>>> =
>>> tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
>>> very often, with out brick mux as well. Refer
>>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
>>> . There's an email in gluster-devel and a BZ 1610240 for the same.
>>>
>>> tests/bugs/bug-1368312.t - Seems to be recent failures (
>>> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>>> - seems to be a new failure, however seen this for a non-brick-mux case too
>>> - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>>> . Need some eyes from AFR folks.
>>>
>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to brick
>>> mux, have seen this failing at multiple default regression runs. Refer
>>> https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>> . We need help from geo-rep dev to root cause this earlier than later
>>>
>>> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
>>> mux, have seen this failing at multiple default regression runs. Refer
>>> https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>> . We need help from geo-rep dev to root cause this earlier than later
>>>
>>> tests/bugs/glusterd/validating-server-quorum.t (
>>> https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>>> - Fails for non-brick-mux cases too,
>>> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>>> .  Atin has a patch https://review.gluster.org/20584 which resolves it
>>> but patch is failing regression for a different test which is unrelated.
>>>
>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>>> (Ref -
>>> https://build.gluster.org/job/regression-test-with-multiplex/809/console)
>>> - fails for non brick mux case too -
>>> https://build.gl

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Nithya Balachandran
On 31 July 2018 at 22:11, Atin Mukherjee  wrote:

> I just went through the nightly regression report of brick mux runs and
> here's what I can summarize.
>
> 
> 
> ============================================================
> Fails only with brick-mux
> ============================================================
> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
> 400 secs. Refer to
> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
> specifically the latest report
> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
> It wasn't timing out as frequently until 12 July, but since 27 July it has
> timed out twice. Beginning to believe commit
> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
> secs isn't sufficient (Mohit?)
>
> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Ref - https://build.gluster.org/job/regression-test-with-
> multiplex/814/console) -  Test fails only in brick-mux mode, AI on Atin
> to look at and get back.
>
> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
> - Seems like failed just twice in last 30 days as per
> https://fstat.gluster.org/failure/251?state=2&start_
> date=2018-06-30&end_date=2018-07-31&branch=all. Need help from AFR team.
>
> tests/bugs/quota/bug-1293601.t (https://build.gluster.org/
> job/regression-test-with-multiplex/812/console) - Hasn't failed after 26
> July and earlier it was failing regularly. Did we fix this test through any
> patch (Mohit?)
>
> tests/bitrot/bug-1373520.t - (https://build.gluster.org/
> job/regression-test-with-multiplex/811/console)  - Hasn't failed after 27
> July and earlier it was failing regularly. Did we fix this test through any
> patch (Mohit?)
>
> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
> not sure if related to brick mux or not, so not sure if brick mux is
> culprit here or not. Ref - https://build.gluster.org/job/
> regression-test-with-multiplex/806/console . Seems to be a glustershd
> crash. Need help from AFR folks.
>
> 
> 
> ============================================================
> Fails for non-brick mux case too
> ============================================================
> tests/bugs/distribute/bug-1122443.t - Seems to be failing on my setup
> very often, without brick mux as well. Refer to
> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText .
> There's an email in gluster-devel and a BZ 1610240 for the same.
>

Not a spurious failure. This is a bug introduced by commit
7131de81f72dda0ef685ed60d0887c6e14289b8c. I have provided more details in
the other email thread around this.

regards,
Nithya


>
>
> tests/bugs/bug-1368312.t - Seems to be recent failures (
> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
> - seems to be a new failure, however seen this for a non-brick-mux case too
> - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
> . Need some eyes from AFR folks.
>
> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to brick
> mux, have seen this failing at multiple default regression runs. Refer
> https://fstat.gluster.org/failure/392?state=2&start_
> date=2018-06-30&end_date=2018-07-31&branch=all . We need help from
> geo-rep dev to root cause this earlier than later
>
> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
> mux, have seen this failing at multiple default regression runs. Refer
> https://fstat.gluster.org/failure/393?state=2&start_
> date=2018-06-30&end_date=2018-07-31&branch=all . We need help from
> geo-rep dev to root cause this earlier than later
>
> tests/bugs/glusterd/validating-server-quorum.t (https://build.gluster.org/
> job/regression-test-with-multiplex/810/console) - Fails for non-brick-mux
> cases too, https://fstat.gluster.org/failure/580?state=2&start_
> date=2018-06-30&end_date=2018-07-31&branch=all .  Atin has a patch
> https://review.gluster.org/20584 which resolves it but patch is failing
> regression for a different test which is unrelated.
>
> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> (Ref - https://build.gluster.org/job/regression-test-with-
> multiplex/809/console) - fails for non brick mux case too -
> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText -
> Need some eyes from AFR folks.
>

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Nithya Balachandran
Using git bisect, the patch that introduced this behaviour is:

commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
Author: Krutika Dhananjay 
Date:   Tue Jan 17 16:40:04 2017 +0530

performance/readdir-ahead: Invalidate cached dentries if they're
modified while in cache
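
For reference, the bisect itself was driven roughly as below; a minimal
sketch where the known-good tag and the build/remount steps are placeholders
for what was actually run locally:

    git bisect start
    git bisect bad HEAD           # current master shows the bogus first ls -l
    git bisect good v3.12.0       # placeholder for a known-good point

    # at each step: rebuild and reinstall, restart the volume, remount,
    # check the first ls -l on the mount, then mark the commit by hand
    ./autogen.sh && ./configure && make -j4 && make install
    mount -t glusterfs 192.168.122.6:/thunder /mnt/fuse1 && ls -l /mnt/fuse1
    umount /mnt/fuse1
    git bisect good    # or: git bisect bad, depending on the listing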

Krutika, can you take a look and fix this?

To summarize, this is _not_ a spurious failure.


regards,
Nithya


On 3 August 2018 at 14:13, Nithya Balachandran  wrote:

> This is a new issue - the test uses ls -l to get some information. With
> the latest master, ls -l returns strange results the first time it is
> called on the mount point causing the test to fail:
>
>
> With the latest master, I created a single brick volume and some files
> inside it.
>
> [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
> 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
> again"; ls -l /mnt/fuse1
> umount: /mnt/fuse1: not mounted
> total 0
> *--. 0 root root 0 Jan  1  1970 file-1*
> *--. 0 root root 0 Jan  1  1970 file-2*
> *--. 0 root root 0 Jan  1  1970 file-3*
> *--. 0 root root 0 Jan  1  1970 file-4*
> *--. 0 root root 0 Jan  1  1970 file-5*
> *d-. 0 root root 0 Jan  1  1970 subdir*
> Trying again
> total 3
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
> d-. 0 root root  0 Jan  1  1970 subdir
> [root@rhgs313-6 ~]#
>
>
>
> This is consistently reproducible. I am still debugging this to see which
> patch caused this.
>
> regards,
> Nithya
>
>
> On 2 August 2018 at 07:13, Atin Mukherjee 
> wrote:
>
>>
>>
>> On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:
>>
>>> Will have a look at it and update.
>>>
>>
>> There’s already a patch from Mohit for this.
>>
>>
>>> Susant
>>>
>>> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
>>> wrote:
>>>
 Same here - https://build.gluster.org/job/
 centos7-regression/2024/console

 -Krutika

 On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee 
 wrote:

> tests/bugs/distribute/bug-1122443.t fails my set up (3 out of 5
> times) running with master branch. As per my knowledge I've not seen this
> test failing earlier. Looks like some recent changes has caused it. One of
> such instance is https://build.gluster.org/job/
> centos7-regression/1955/ .
>
> Request the component owners to take a look at it.
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>> --
>> --Atin
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Nithya Balachandran
This is a new issue - the test uses ls -l to get some information. With the
latest master, ls -l returns strange results the first time it is called on
the mount point, causing the test to fail:


With the latest master, I created a single brick volume and some files
inside it.

[root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying again";
ls -l /mnt/fuse1
umount: /mnt/fuse1: not mounted
total 0
----------. 0 root root 0 Jan  1  1970 file-1
----------. 0 root root 0 Jan  1  1970 file-2
----------. 0 root root 0 Jan  1  1970 file-3
----------. 0 root root 0 Jan  1  1970 file-4
----------. 0 root root 0 Jan  1  1970 file-5
d---------. 0 root root 0 Jan  1  1970 subdir
Trying again
total 3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
d---------. 0 root root  0 Jan  1  1970 subdir
[root@rhgs313-6 ~]#



This is consistently reproducible. I am still debugging this to see which
patch caused this.
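
One way to narrow it down to a perf xlator is to toggle the usual suspects
and repeat the first ls -l; a minimal sketch reusing the volume from the
reproduction above, with readdir-ahead as the first candidate:

    # if the first ls -l is sane with readdir-ahead off, that xlator (or the
    # readdirp path it sits on) is the likely culprit
    gluster volume set thunder performance.readdir-ahead off
    umount -f /mnt/fuse1
    mount -t glusterfs 192.168.122.6:/thunder /mnt/fuse1
    ls -l /mnt/fuse1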

regards,
Nithya


On 2 August 2018 at 07:13, Atin Mukherjee 
wrote:

>
>
> On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:
>
>> Will have a look at it and update.
>>
>
> There’s already a patch from Mohit for this.
>
>
>> Susant
>>
>> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay,  wrote:
>>
>>> Same here - https://build.gluster.org/job/centos7-regression/2024/
>>> console
>>>
>>> -Krutika
>>>
>>> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee 
>>> wrote:
>>>
 tests/bugs/distribute/bug-1122443.t fails on my setup (3 out of 5 times)
 running with the master branch. As far as I know, I've not seen this test
 failing earlier. It looks like some recent change has caused it. One such
 instance is https://build.gluster.org/job/centos7-regression/1955/ .

 Request the component owners to take a look at it.

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 https://lists.gluster.org/mailman/listinfo/gluster-devel

>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
> --
> --Atin
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Karthik Subrahmanya
On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
wrote:

>
>
> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee,  wrote:
>
>> I just went through the nightly regression report of brick mux runs and
>> here's what I can summarize.
>>
>>
>> ============================================================
>> Fails only with brick-mux
>> ============================================================
>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>> 400 secs. Refer
>> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
>> specifically the latest report
>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
>> Wasn't timing out as frequently as it was till 12 July. But since 27 July,
>> it has timed out twice. Beginning to believe commit
>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
>> secs isn't sufficient enough (Mohit?)
>>
>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> (Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>> -  Test fails only in brick-mux mode, AI on Atin to look at and get back.
>>
>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>> - Seems like failed just twice in last 30 days as per
>> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>> Need help from AFR team.
>>
>> tests/bugs/quota/bug-1293601.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>> - Hasn't failed after 26 July and earlier it was failing regularly. Did we
>> fix this test through any patch (Mohit?)
>>
>> tests/bitrot/bug-1373520.t - (
>> https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>> - Hasn't failed after 27 July and earlier it was failing regularly. Did we
>> fix this test through any patch (Mohit?)
>>
>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core,
>> not sure if related to brick mux or not, so not sure if brick mux is
>> culprit here or not. Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/806/console
>> . Seems to be a glustershd crash. Need help from AFR folks.
>>
>>
>> ============================================================
>> Fails for non-brick mux case too
>> ============================================================
>> tests/bugs/distribute/bug-1122443.t 0 Seems to be failing at my setup
>> very often, with out brick mux as well. Refer
>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText .
>> There's an email in gluster-devel and a BZ 1610240 for the same.
>>
>> tests/bugs/bug-1368312.t - Seems to be recent failures (
>> https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>> - seems to be a new failure, however seen this for a non-brick-mux case too
>> - https://build.gluster.org/job/regression-test-burn-in/4039/consoleText
>> . Need some eyes from AFR folks.
>>
>> tests/00-geo-rep/georep-basic-dr-tarssh.t - this isn't specific to brick
>> mux, have seen this failing at multiple default regression runs. Refer
>> https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>> . We need help from geo-rep dev to root cause this earlier than later
>>
>> tests/00-geo-rep/georep-basic-dr-rsync.t - this isn't specific to brick
>> mux, have seen this failing at multiple default regression runs. Refer
>> https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>> . We need help from geo-rep dev to root cause this earlier than later
>>
>> tests/bugs/glusterd/validating-server-quorum.t (
>> https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>> - Fails for non-brick-mux cases too,
>> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all
>> .  Atin has a patch https://review.gluster.org/20584 which resolves it
>> but patch is failing regression for a different test which is unrelated.
>>
>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>> (Ref -
>> https://build.gluster.org/job/regression-test-with-multiplex/809/console)
>> - fails for non brick mux case too -
>> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText -
>> Need some eyes from AFR folks.
>>
> I am looking at this. It is not reproducible locally. Trying to