Re: [Gluster-devel] .glusterfs directory?

2020-12-21 Thread Emmanuel Dreyfus
On Mon, Dec 21, 2020 at 01:53:06PM +0530, Ravishankar N wrote:
> Are you talking about the entries inside .glusterfs/indices/xattrop/* ? Any
> stale entries here should automatically be purged by the self-heal daemon
> as it crawls the folder periodically.

I mean for instance:
# ls -l .glusterfs/aa/aa/dd69-7b3d-45e9-bd0f-8a8bbaa189a5
lrwxrwxrwx  1 root  wheel  60 Nov  4  2018 
.glusterfs//aa/aa/dd69-7b3d-45e9-bd0f-8a8bbaa189a5 -> 
../../f0/91/f091de81-a4e2-4548-acf4-4b19c7bdac5e/tpm_nvwrite
# ls -l .glusterfs/f0/91/f091de81-a4e2-4548-acf4-4b19c7bdac5e/tpm_nvwrite
ls: .glusterfs/f0/91/f091de81-a4e2-4548-acf4-4b19c7bdac5e/tpm_nvwrite: No such 
file or directory
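
For reference, dangling symlinks like this one can be listed directly on the
brick with find(1). A minimal sketch, assuming the brick root is /export/wd0e
(any brick path works); it only catches directory gfid entries, which are
symlinks, while regular file gfid entries are hardlinks and need a different
check:

# cd /export/wd0e/.glusterfs
# find . -type l ! -exec test -e {} \; -print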



-- 
Emmanuel Dreyfus
m...@netbsd.org
---

Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] .glusterfs directory?

2020-12-20 Thread Emmanuel Dreyfus
> On a healthy system, one should definitely not remove any files or sub
> directories inside .glusterfs as they contain important metadata. Which
> entries specifically inside .glusterfs do you think are stale and why?

There are indexes leading to files that no longer exist, which cause heal
complaints.

-- 
Emmanuel Dreyfus
m...@netbsd.org
---

Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] .glusterfs directory?

2020-12-20 Thread Emmanuel Dreyfus
Hello

I have a lot of stale entries in bricks' .glusterfs directories. Is it
safe to just rm -rf the directory and hope for an automatic rebuild? From
reading the source and experimenting, it does not seem obvious.

Or is there a way to clean up stale entries that lead to files that do
not exist anymore? 

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
---

Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] readdir performance

2020-08-18 Thread Emmanuel Dreyfus
 us  61.70 us  1GETXATTR
  0.00  44.23 us  40.46 us  47.99 us  2  STATFS
  0.00  66.94 us  53.19 us  80.69 us  2 OPENDIR
  0.01 214.48 us 132.89 us 350.46 us 12  LOOKUP
 99.99 19333949.34 us  808361.68 us 37859537.00 us  2
READDIRP
 
Duration: 138824 seconds
   Data Read: 146374656 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
   Block Size:       4096b+       8192b+      16384b+
 No. of Reads:            9            5            2
No. of Writes:            0            0            0
 
   Block Size:      32768b+
 No. of Reads:         4463
No. of Writes:            0
 %-latency   Avg-latency     Min-Latency     Max-Latency     No. of calls   Fop
 ---------   -----------     -----------     -----------     ------------   ----------
      0.00       0.00 us         0.00 us         0.00 us              127   FORGET
      0.00       0.00 us         0.00 us         0.00 us              127   RELEASE
      0.00       0.00 us         0.00 us         0.00 us            66931   RELEASEDIR
      0.00      61.70 us        61.70 us        61.70 us                1   GETXATTR
      0.00      44.23 us        40.46 us        47.99 us                2   STATFS
      0.00      66.94 us        53.19 us        80.69 us                2   OPENDIR
      0.01     214.48 us       132.89 us       350.46 us               12   LOOKUP
     99.99 19333949.34 us    808361.68 us  37859537.00 us                2   READDIRP
 
Duration: 138824 seconds
   Data Read: 146374656 bytes
Data Written: 0 bytes
 
Brick: baril:/export/wd2e
-
Cumulative Stats:
   Block Size:   8192b+   16384b+   32768b+ 
 No. of Reads:   10 220 
No. of Writes:0 0 0 
 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ----------
      0.00       0.00 us       0.00 us       0.00 us            134   FORGET
      0.00       0.00 us       0.00 us       0.00 us            135   RELEASE
      0.00       0.00 us       0.00 us       0.00 us          67991   RELEASEDIR
      0.07      52.79 us      52.79 us      52.79 us              1   GETXATTR
      0.10      39.25 us      38.38 us      40.13 us              2   STATFS
      0.13      51.45 us      45.36 us      57.55 us              2   OPENDIR
     20.28    1406.18 us     133.88 us   13190.43 us             11   LOOKUP
     79.41   60571.65 us   60571.65 us   60571.65 us              1   READDIRP
 
Duration: 138822 seconds
   Data Read: 786432 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
   Block Size:   8192b+   16384b+   32768b+ 
 No. of Reads:   10 220 
No. of Writes:0 0 0 
 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ----------
      0.00       0.00 us       0.00 us       0.00 us            134   FORGET
      0.00       0.00 us       0.00 us       0.00 us            135   RELEASE
      0.00       0.00 us       0.00 us       0.00 us          67991   RELEASEDIR
      0.07      52.79 us      52.79 us      52.79 us              1   GETXATTR
      0.10      39.25 us      38.38 us      40.13 us              2   STATFS
      0.13      51.45 us      45.36 us      57.55 us              2   OPENDIR
     20.28    1406.18 us     133.88 us   13190.43 us             11   LOOKUP
     79.41   60571.65 us   60571.65 us   60571.65 us              1   READDIRP
 
Duration: 138822 seconds
   Data Read: 786432 bytes
Data Written: 0 bytes
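
For context, a profile like the one above is normally gathered with the
volume profile commands; a minimal sketch, assuming the volume is named gfs
as in the other threads:

# gluster volume profile gfs start
(run the slow readdir workload on a client, e.g. ls -lR on the mount)
# gluster volume profile gfs info
# gluster volume profile gfs stop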




-- 
Emmanuel Dreyfus
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] Corrupted list in iot_worker

2020-07-18 Thread Emmanuel Dreyfus
Hello

I experienced multiple cases of this crash:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  list_del_init (old=0x24103800)
at ../../../../libglusterfs/src/glusterfs/list.h:82

warning: Source file is more recent than executable.
82  old->prev->next = old->next;
[Current thread is 1 (process 100)]

(gdb) bt
#0  list_del_init (old=0x24103800)
at ../../../../libglusterfs/src/glusterfs/list.h:82
#1  __iot_dequeue (conf=conf@entry=0xb5c03338, pri=pri@entry=0xb1c53fb4)
at io-threads.c:110
#2  0xb59fe379 in iot_worker (data=0xb5c03338) at io-threads.c:222

(gdb) print old
$1 = (struct list_head *) 0x24103800
(gdb) print *old
Cannot access memory at address 0x24103800

Offending code in frame 1:
108 /* Get the first request on that queue. */
109 stub = list_first_entry(>reqs, call_stub_t, list);
110 list_del_init(>list);

(gdb) print stub
$1 = (call_stub_t *) 0x979fb800
(gdb) print *stub
Cannot access memory at address 0x979fb800
(gdb) print ctx->reqs
Cannot access memory at address 0xa8979ff4
(gdb) print *ctx
Cannot access memory at address 0xa8979ff4

We got ctx a bit earlier:
98  /* Get the first per-client queue for this priority. */
99  ctx = list_first_entry(>clients[i], iot_client_ctx_t, 
clients);

And the list is obviously bad here:
(gdb) print *conf->clients[3]->next
$11 = {next = 0x144, prev = 0x0}

Known bug?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Help with smoke test failure

2020-07-17 Thread Emmanuel Dreyfus
Hello

I am still stuck on this one: how should I address the missing 
SpecApproved and DocApproved here?

On Fri, Jul 10, 2020 at 05:43:54PM +0200, Emmanuel Dreyfus wrote:
> > What should I do to get this passed?
> > 
> > https://build.gluster.org/job/comment-on-issue/19308/ : FAILURE <<<
> > Missing SpecApproved flag on Issue 1361
> > Missing DocApproved flag on Issue 1361

-- 
Emmanuel Dreyfus
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] Help with smoke test failure

2020-07-10 Thread Emmanuel Dreyfus
Sorry, wrong list! I am reposting to the right place.

Emmanuel Dreyfus  wrote:

> Hello
> 
> What should I do to get this passed?
> 
> https://build.gluster.org/job/comment-on-issue/19308/ : FAILURE <<<
> Missing SpecApproved flag on Issue 1361
> Missing DocApproved flag on Issue 1361


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] heal info output

2020-07-08 Thread Emmanuel Dreyfus
On Mon, Jul 06, 2020 at 12:27:38PM +0200, Xavi Hernandez wrote:
> Is the '.attribute' directory only present on the root directory of a
> filesystem ? if so I strongly recommend to never use the root of a
> filesystem to place bricks. Always place the brick into a subdirectory.

Right, but once the user has made the mistake, we need a way out. I found
the new places in the posix xlator where that directory was not properly
ignored; I will submit a patch.

> > 2) /owncloud/data  is a directory. mode, owner and groups are the same
> > on bricks. Why is it listed here?
> 
> If files or subdirectories have been created or removed from that directory
> and the operation failed on some brick (or the brick was down), the
> directory is also marked as bad. You should also check the contents.

Indeed there was some messy stuff, but the only way I found to fix it was
cp -rp dir dir.bak && mv dir dir.orig && mv dir.bak dir && rm -rf dir.orig

But now I have directories in split brain. The only difference I can find
is a file mtime. How should I fix that? gluster volume heal gfs split-brain
latest-mtime dir does not work, and if I try it on the file inside, I am told
it is not in split brain.
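
For reference, the general CLI syntax for split-brain resolution looks like
this (a sketch with placeholder file and brick names; these commands are
documented for data/metadata split-brain on files, so a directory in entry
split-brain may still need manual resolution):

# gluster volume heal gfs info split-brain
# gluster volume heal gfs split-brain latest-mtime <path-to-file>
# gluster volume heal gfs split-brain source-brick <hostname:brickpath> <path-to-file>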

-- 
Emmanuel Dreyfus
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] self heal daemon not running

2020-07-07 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> bidon# gluster volume heal gfs full
> Launching heal operation to perform full self heal on volume gfs has been
> unsuccessful: Self-heal daemon is not running. Check self-heal daemon log
> file.

I noticed that gluster volume heal gfs info shows:
Brick bidon:/export/wd2e
Status: Socket is not connected
Number of entries: -

Killing the glusterfsd process for this brick and issuing gluster volume
start gfs force managed to get me out of this situation.
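
For reference, the recovery sequence described above looks roughly like this
(a sketch; the brick PID placeholder comes from the volume status output):

# gluster volume status gfs          (note the PID of the disconnected brick)
# kill <brick-pid>
# gluster volume start gfs force
# gluster volume heal gfs full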


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] self heal daemon not running

2020-07-07 Thread Emmanuel Dreyfus
 
using set value 42 
[2020-07-08 01:10:10.107534] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-client-4: option strict-locks 
using set value off 
[2020-07-08 01:10:10.107553] I [MSGID: 0] 
[options.c:1240:xlator_option_reconf_int32] 0-gfs-client-5: option ping-timeout 
using set value 42 
[2020-07-08 01:10:10.107569] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-client-5: option strict-locks 
using set value off 
[2020-07-08 01:10:10.107582] I [MSGID: 0] 
[options.c:1239:xlator_option_reconf_uint32] 0-gfs-replicate-2: option 
background-self-heal-count using set value 0 
[2020-07-08 01:10:10.107596] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option 
metadata-self-heal using set value on 
[2020-07-08 01:10:10.107607] I [MSGID: 0] 
[options.c:1236:xlator_option_reconf_str] 0-gfs-replicate-2: option 
data-self-heal using set value on 
[2020-07-08 01:10:10.107618] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option 
entry-self-heal using set value on 
[2020-07-08 01:10:10.107647] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option 
self-heal-daemon using set value enable 
[2020-07-08 01:10:10.107659] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option 
iam-self-heal-daemon using set value yes 
[2020-07-08 01:10:10.108327] I [MSGID: 0] 
[options.c:1240:xlator_option_reconf_int32] 0-gfs-client-6: option ping-timeout 
using set value 42 
[2020-07-08 01:10:10.108372] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-client-6: option strict-locks 
using set value off 
[2020-07-08 01:10:10.108387] I [MSGID: 0] 
[options.c:1240:xlator_option_reconf_int32] 0-gfs-client-7: option ping-timeout 
using set value 42 
[2020-07-08 01:10:10.108402] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-client-7: option strict-locks 
using set value off 
[2020-07-08 01:10:10.108414] I [MSGID: 0] 
[options.c:1239:xlator_option_reconf_uint32] 0-gfs-replicate-3: option 
background-self-heal-count using set value 0 
[2020-07-08 01:10:10.108436] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option 
metadata-self-heal using set value on 
[2020-07-08 01:10:10.108447] I [MSGID: 0] 
[options.c:1236:xlator_option_reconf_str] 0-gfs-replicate-3: option 
data-self-heal using set value on 
[2020-07-08 01:10:10.108458] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option 
entry-self-heal using set value on 
[2020-07-08 01:10:10.108485] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option 
self-heal-daemon using set value enable 
[2020-07-08 01:10:10.108496] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option 
iam-self-heal-daemon using set value yes 
[2020-07-08 01:10:10.108523] I [MSGID: 0] 
[options.c:1236:xlator_option_reconf_str] 0-gfs: option log-level using set 
value INFO 
[2020-07-08 01:10:10.112105] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 
0-glusterfs: Received list of available volfile servers: baril:24007 
[2020-07-08 01:10:10.112210] I [MSGID: 101221] 
[common-utils.c:3822:gf_set_volfile_server_common] 0-gluster: duplicate entry 
for volfile-server [{errno=17}, {error=File exists}] 
[2020-07-08 01:10:10.112309] I [MSGID: 100040] 
[glusterfsd-mgmt.c:109:mgmt_process_volfile] 0-glusterfs: No change in volfile, 
countinuing [] 
[2020-07-08 01:10:09.927148] I [MSGID: 0] 
[options.c:1240:xlator_option_reconf_int32] 0-gfs-client-0: option ping-timeout 
using set value 42 
[2020-07-08 01:10:09.927159] I [MSGID: 0] 
[options.c:1245:xlator_option_reconf_bool] 0-gfs-client-0: option strict-locks 
using set value off 
[2020-07-08 01:10:10.108539] I [MSGID: 0] 
[options.c:1240:xlator_option_reconf_int32] 0-gfs: option threads using set 
value 16 
[2020-07-08 01:11:19.582209] I [socket.c:849:__socket_shutdown] 0-gfs-client-4: 
intentional socket shutdown(19)
[2020-07-08 01:11:19.582327] I [socket.c:849:__socket_shutdown] 0-gfs-client-6: 
intentional socket shutdown(20)
[2020-07-08 01:13:28.372643] I [socket.c:849:__socket_shutdown] 0-gfs-client-4: 
intentional socket shutdown(19)
[2020-07-08 01:13:28.417206] I [socket.c:849:__socket_shutdown] 0-gfs-client-6: 
intentional socket shutdown(20)
[2020-07-08 01:15:49.007545] I [socket.c:849:__socket_shutdown] 0-gfs-client-4: 
intentional socket shutdown(19)
[2020-07-08 01:15:49.035180] I [socket.c:849:__socket_shutdown] 0-gfs-client-6: 
intentional socket shutdown(20)



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] glusterfsd memory usage

2020-07-06 Thread Emmanuel Dreyfus
Hello

I see glusterfsd processes growing up to multiple gigabytes of virtual
memory. Is that something that should be expected? After some time, the
machine runs out of swap and kills the processes:

  PID SIZERES COMMAND
 8397 2113M  684M glusterfsd
19427 1412M  168M glusterfsd
16873 1914M  279M glusterfsd
 6809  183M   27M glusterfsd
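
A statedump can help tell a leak apart from normal caching; a minimal sketch,
assuming the volume is named gfs (the dump files usually land under
/var/run/gluster, one per brick process, and contain [mallinfo] and per-xlator
memory accounting sections):

# gluster volume statedump gfs
# ls /var/run/gluster/*.dump.*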



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] heal info output

2020-07-01 Thread Emmanuel Dreyfus
Hello

gluster volume heal info shows me questionable entries. I wonder if these
are bugs, or if I should handle them, and how.

bidon# gluster volume heal gfs info 
Brick bidon:/export/wd0e_tmp
Status: Connected
Number of entries: 0

Brick baril:/export/wd0e
/.attribute/system 
 
Status: Connected
Number of entries: 2

(...)
Brick bidon:/export/wd2e
 
 
/owncloud/data 
 
 
 
 

There are three cases:
1) The /.attribute directory is special on NetBSD: it is where extended
attributes are stored for the filesystem. The posix xlator takes care of
screening it, but there must be some other software component that
should learn to disregard it. Hints are welcome about where I
should look.

2) /owncloud/data is a directory. Mode, owner and group are the same
on all bricks. Why is it listed here?

3) What should I do with this?
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] NetBSD build fixes for 8.0rc

2020-06-29 Thread Emmanuel Dreyfus
Hello

After a long absence, I tried to upgrade glusterfs on NetBSD. I am a bit
sad to discover that, after investing a lot of effort to set up the NetBSD
tests, even the build is broken now.

Here are the build fixes. Can someone explain what went wrong in gerrit
review? Two tests failed, but I am not sure I understand why.
https://review.gluster.org/#/c/glusterfs/+/24648/



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



[Gluster-devel] directory filehandles

2019-07-12 Thread Emmanuel Dreyfus
Hello

I have trouble figuring out the whole story about how to cope with FUSE
directory filehandles in the NetBSD implementation.

libfuse makes special use of the filehandles exposed to the filesystem for
OPENDIR, READDIR, FSYNCDIR, and RELEASEDIR. For those four operations,
the fh is a pointer to a struct fuse_dh, in which the fh field is
exposed to the filesystem. All other filesystem operations pass the fh
as is between kernel and filesystem, back and forth.

That means that a fh obtained by OPENDIR should never be passed to
operations other than READDIR, FSYNCDIR and RELEASEDIR. For instance,
when porting ltfs to NetBSD, I experienced that passing a fh obtained
from OPENDIR to SETATTR would crash.

The glusterfs implementation differs from libfuse because it seems the
filehandle is always passed as is: there is nothing like libfuse's struct
fuse_dh. It will therefore happily accept a fh obtained by OPENDIR for any
operation, something that I do not expect to work in libfuse-based
filesystems.

My real concern is SETLK on a directory. Here glusterfs really wants a fh
or it will report an error. The NetBSD implementation passes the fh it
got from OPENDIR, but I expect a libfuse-based filesystem to crash in
such a situation. So far I have not found any libfuse-based filesystem
that implements locking, so I could not test that.

Could someone clarify this? What are the FUSE operations that should be
sent to the filesystem for that kind of program?

#include <err.h>
#include <fcntl.h>
#include <sys/file.h>

int
main(void)
{
        int fd;

        /* NetBSD calls FUSE LOOKUP and OPENDIR */
        if ((fd = open("/gfs/tmp", O_RDONLY, 0)) == -1)
                err(1, "open failed");

        /* NetBSD calls FUSE SETLKW */
        if (flock(fd, LOCK_EX) == -1)
                err(1, "flock failed");
}



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] I/O performance

2019-02-01 Thread Emmanuel Dreyfus
On Thu, Jan 31, 2019 at 10:53:48PM -0800, Vijay Bellur wrote:
> Perhaps we could throttle both aspects - number of I/O requests per disk

While there, it would be nice to detect and report a disk with lower
performance than its peers: that happens sometimes when a disk is dying, and
the last time I was hit by that performance problem, I had a hard time
finding the culprit.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] FUSE directory filehandle

2019-01-09 Thread Emmanuel Dreyfus
Hello

This is not strictly a GlusterFS question since I came to it porting
LTFS to NetBSD, however I would like to make sure I will not break
GlusterFS by fixing NetBSD FUSE implementation for LTFS.

The current NetBSD FUSE implementation sends the filehandle in any FUSE
request for an open node, regardless of its type (directory or file).

I discovered that the libfuse low-level code manages filehandles differently
for opendir/readdir/fsyncdir/releasedir than for other operations. As a
result, when a getattr is done on a directory, setting the filehandle
obtained from opendir can cause a crash in libfuse.

The fix for the NetBSD FUSE implementation is to avoid setting the
filehandle for the following FUSE operations on directories: getattr,
setattr, poll, getlk, setlk, setlkw, read, write (only the first two
are likely to be actually used, though).

Does anyone foresee a possible problem for GlusterFS with such a
behavior? In other words, will it be fine to always have a
FUSE_UNKNOWN_FH (aka null) filehandle for getattr/setattr on
directories?


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Split brain after replacing a brick

2018-04-06 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> Could you give the extended attributes of that directory on all the bricks
> to figure out the kind of split-brain?

In the meantime, I cleared all extended attributes on .attribute
directories and it fixed it.

The bug is just that something added extended attributes to this
directory at some time. IIRC the storage/posix translator avoids it, but
there must be some other code path.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Split brain after replacing a brick

2018-03-31 Thread Emmanuel Dreyfus
Hello

After doing a replace-brick and a full heal, I am left with:

Brick bidon:/export/wd0e
Status: Connected
Number of entries: 0

Brick baril:/export/wd0e
Status: Connected
Number of entries: 0

Brick bidon:/export/wd1e
Status: Connected
Number of entries: 0

Brick baril:/export/wd1e
Status: Connected
Number of entries: 0

Brick bidon:/export/wd2e
Status: Connected
Number of entries: 0

Brick baril:/export/wd2e
/.attribute 
Status: Connected
Number of entries: 1

Brick bidon:/export/wd3e_tmp
Status: Connected
Number of entries: 0

Brick baril:/export/wd3e
/ - Is in split-brain

Status: Connected
Number of entries: 1


I guess the baril:/export/wd3e split-brain for / matches /.attribute on
baril:/export/wd2e? How can I check?

.attribute is the hidden directory where NetBSD stores extended
attributes. It should be ignored by healing. Is there a bug to fix here?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Slow volume, gluster volume status bug

2017-11-14 Thread Emmanuel Dreyfus
Emmanuel Dreyfus <m...@netbsd.org> wrote:

> What happens if I remove the trusted.gfid2path.* attributes? Are
> they just re-created?

After reading the source, I concluded I could safely remove the
trusted.gfid2path.* attributes. It fixed the (NetBSD-specific)
performance problem.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Slow volume, gluster volume status bug

2017-11-14 Thread Emmanuel Dreyfus
On Tue, Nov 14, 2017 at 06:38:44PM +0530, Atin Mukherjee wrote:
> So this is the origin of why the peers don't understand they are connected.
> Friend handshaking got stuck in the middle and it never recovered back.
> Restarting the glusterd services ideally should fix the state, if not then
> you'd have to manually edit the /var/lib/glusterd/peers/UUID files with
> state=3 and then restart glusterd service.

That fixed it: I now see all my bricks again.
Where could I have found that in the documentation?
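
For the record, the sequence that fixed it looks roughly like this (a sketch;
the peer file name and the way glusterd is restarted depend on the
installation):

# grep state= /var/lib/glusterd/peers/*      (find the stuck peer entry)
(edit the stuck peer's file so that the line reads: state=3)
(restart the glusterd service, e.g. /etc/rc.d/glusterd restart)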

Now I just need to know if trusted.gfid2path attributes can be
safely removed once the feature is disabled.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Slow volume, gluster volume status bug

2017-11-14 Thread Emmanuel Dreyfus
On Tue, Nov 14, 2017 at 10:43:39AM +, Emmanuel Dreyfus wrote:
> What happens if I remove the trusted.gfid2path.* attributes? Are
> they just re-created?
> 
> Some hint on how to disable the feature? 

gluster volume set gfs gfid2path off

Can I just delete the trusted.gfid2path.* attributes once I have done that?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Slow volume, gluster volume status bug

2017-11-14 Thread Emmanuel Dreyfus
On Tue, Nov 14, 2017 at 09:17:35AM +, Emmanuel Dreyfus wrote:
> In the meantime I tracked the performance problem to exteded atributes
> system calls. The root of the problem is outside of glusterfs, but fixing
> the consequuences would be nice.

I think I found the problem: listxattr() scales badly on NetBSD with
the number of extended attribute names on the filesystem. I now have
14954 different extended attribute names, 14910 being of the form
trusted.gfid2path.7c8a8ff2db92b4ec

I see news about trusted.gfid2path in 
http://docs.gluster.org/en/latest/release-notes/3.12.0/

This explains why I got hurt when upgrading to 3.12.2.

What happens if I remove the trusted.gfid2path.* attributes? Are
they just re-created?

Any hints on how to disable the feature?


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Slow volume, gluster volume status bug

2017-11-14 Thread Emmanuel Dreyfus
On Tue, Nov 14, 2017 at 12:17:05PM +0530, Atin Mukherjee wrote:
> > gluster volume status also exhibits trouble: each server will only
> > list its bricks, but not the other's one. I suspect it could just
> > be some tiemout because of slow answer from the peer.

> Have you checked the output of gluster peer status? Also does glusterd log
> file give any hint on time outs, rpc failures, disconnections et all?

gluster peer status says "State: Sent and Received peer request (Connected)"
on both sides.

I have this in glusterd.log:
[2017-11-14 08:49:47.289423] I [MSGID: 106143] 
[glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd3e on 
port 49155
[2017-11-14 08:49:52.289926] I [MSGID: 106143] 
[glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd0e on 
port 49152
[2017-11-14 08:49:52.295394] I [MSGID: 106143] 
[glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd1e on 
port 49153
[2017-11-14 08:49:52.302973] I [MSGID: 106143] 
[glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd2e on 
port 49154
[2017-11-14 08:54:31.535066] W [socket.c:593:__socket_rwv] 0-management: readv 
on 192.0.2.109:24007 failed (Connection reset by peer)
[2017-11-14 08:54:32.567745] I [MSGID: 106004] 
[glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer  
(<2d7719d9-0466-434c-a881-4081156fac47>), in state , has 
disconnected from glusterd.

An odd thing: the registration messages suggest the local bricks should
show as online in the gluster volume status output. They are displayed as
offline until I kill the glusterfsd processes and issue a
gluster volume start gfs force.

Along with symmetrical messages, the peer has this:
[2017-11-14 08:56:05.799686] E [socket.c:2369:socket_connect_finish] 
0-management: connection to 192.0.2.110:24007 failed (Connection timed out); 
disconnecting socket

In the meantime I tracked the performance problem down to extended attribute
system calls. The root of the problem is outside of glusterfs, but fixing
the consequences would be nice.
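
For the record, one way to confirm that on NetBSD is to ktrace the brick
process while reproducing the slow operation; a rough sketch (the PID
placeholder is the glusterfsd process of the brick, and the exact syscall
names shown by kdump depend on the release):

# ktrace -p <glusterfsd-pid>
# stat /gfs/dl                       (reproduce the slow access)
# ktrace -C
# kdump | grep -c extattr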

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Slow volume, gluster volume status bug

2017-11-13 Thread Emmanuel Dreyfus
Hello

I am looking for hints about how to debug this:

I have a 4x2 Distributed-Replicate volume which exhibits extremely slow
operations. Example:
# time stat /gfs/dl
51969 10143657874486987692 drwxr-xr-x 4 _httpd wheel 172912968 4096 "Nov 13 
17:22:12 2017" "Sep 22 11:53:35 2017" "Sep 22 11:53:35 2017" "Jan  1 01:00:00 
1970" 131072 8 0 /gfs/dl
8.72s real 0.00s user 0.01s system

But the thing is not 100% reproducible. Sometimes I get an instant
(normal) response.

gluster volume status also exhibits trouble: each server will only
list its own bricks, but not the other's. I suspect it could just
be some timeout because of a slow answer from the peer.

tcpdump tells me that the server can take seconds to answer. 
Brick logs show nothing special.

Any idea?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfs 3.12.2: bricks do not start on NetBSD

2017-11-03 Thread Emmanuel Dreyfus
Emmanuel Dreyfus <m...@netbsd.org> wrote:

> [2017-11-02 12:32:57.429885] E [MSGID: 115092]
> [server-handshake.c:586:server_setvolume] 0-gfs-server: No xlator
> /export/wd0e is found in child status list
> [2017-11-02 12:32:57.430162] I [MSGID: 115091]
> [server-handshake.c:761:server_setvolume] 0-gfs-server: Failed to get
> client opversion

Problem solved through gluster volume sync on each server right after
upgrading. I still do not know what went wrong, but I have a workaround.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs 3.12.2: bricks do not start on NetBSD

2017-11-02 Thread Emmanuel Dreyfus
Hello

I have been missing updates for a while. Now I am trying to upgrade
from 3.8.9 to 3.12.2 and I hit a regression: brick processes
start, but gluster volume status shows them as not started.
A relevant line in the brick process log is:
[2017-11-02 12:32:56.867606] E [MSGID: 115092] [server-handshake.c:586:server_se
tvolume] 0-gfs-server: No xlator /export/wd0e is found in child status list
[2017-11-02 12:32:56.867803] I [addr.c:55:compare_addr_and_update] 0-/export/wd0
e: allowed = "*", received addr = "192.0.2.109"
[2017-11-02 12:32:56.867803] I [addr.c:55:compare_addr_and_update] 0-/export/wd0
e: allowed = "*", received addr = "192.0.2.109"
[2017-11-02 12:32:56.867863] I [MSGID: 115029] 
[server-handshake.c:793:server_setvolume] 0-gfs-server: accepted client from 
bidon.example.net-25092-2017/11/02-12:32:48:770637-gfs-client-0-0-0 (version: 
3.12.2)
[2017-11-02 12:32:57.429885] E [MSGID: 115092] 
[server-handshake.c:586:server_setvolume] 0-gfs-server: No xlator /export/wd0e 
is found in child status list
[2017-11-02 12:32:57.430162] I [MSGID: 115091] 
[server-handshake.c:761:server_setvolume] 0-gfs-server: Failed to get client 
opversion

Any idea of what goes wrong?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] file upload to a gluster mount

2017-09-22 Thread Emmanuel Dreyfus
Hello

I discovered that uploading a file through PHP to a gluster mount is
quite slow, because of the small chunk size (5 kB).

Besides patching PHP to increase the chunk size, I can imagine
writing an Apache module that would use the gluster API to efficiently
handle a file upload. Perhaps someone has done it already?


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Attackers hitting vulnerable HDFS installations

2017-02-10 Thread Emmanuel Dreyfus
On Fri, Feb 10, 2017 at 08:30:40AM -0500, Ira Cooper wrote:
> But I suspect... You got it right, Gluster isn't big enough to attack today.

It is just a matter of time.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Attackers hitting vulnerable HDFS installations

2017-02-10 Thread Emmanuel Dreyfus
On Thu, Feb 09, 2017 at 03:53:52PM -0500, Jeff Darcy wrote:
> https://www.theregister.co.uk/2017/02/09/hadoop_clusters_fked/
> Similar attacks have occurred against MongoDB and ElasticSearch.  
> How long before they target us?  How will we do?

It is true that the default glusterfs installation is too open. A simple
solution would be to introduce access control, either by
IP whitelist, or better by shared secret.

The obvious problem is that it breaks updates. At least peers
know each other and could agree on automatically creating
a shared secret if it is missing, but we would need to break clients.
The annoyance can be mitigated with a helpful message on mount
failure, in the log and on stdout, such as "please copy
/etc/glusterd/secret from a server".

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Invitation: Re: Question on merging zfs snapshot supp... @ Tue Dec 20, 2016 2:30pm - 3:30pm (IST) (sri...@marirs.net.in)

2016-12-21 Thread Emmanuel Dreyfus
On Wed, Dec 21, 2016 at 10:00:17AM +0530, sri...@marirs.net.in wrote:
> In continuation to the discussion we'd yesterday, I'd be working on the
> change we'd initiated sometime back for pluggable FS specific snapshot
> implementation

Let me know how I can contribute the FFS implementation for NetBSD.
In case it helps for designing the API, here is the relevant man page:
http://netbsd.gw.com/cgi-bin/man-cgi?fss+.NONE+NetBSD-7.0.2

Basically, you iterate on /dev/fss[0-9], open it and call ioctl
FSSIOCGET to check if it is already in use. Once you have an unused
one, you call ioctl FSSIOCSET to take the snapshot. It requires a backing
store file, which may be created by mktemp() and unlinked immediately.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota-rename.t core in netbsd

2016-11-04 Thread Emmanuel Dreyfus
Sanoj Unnikrishnan <sunni...@redhat.com> wrote:

> Ran the same steps as in the quota-rename.t (manually though. multiple
> times!), Could not reproduce the issue.

But running the test framework hits the bug reliably?


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota-rename.t core in netbsd

2016-10-26 Thread Emmanuel Dreyfus
Vijay Bellur <vbel...@redhat.com> wrote:

> Emmanuel might be able to help with problems related to NetBSD
> environment.

Sure, feel free to ask if you hit NetBSD-specific troubles.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster and FreeBSD

2016-09-20 Thread Emmanuel Dreyfus
On Tue, Sep 20, 2016 at 09:16:54AM +0530, Nigel Babu wrote:
> Giving this thread a signal boost. We should think about this if we're going 
> to
> continue to support *BSD.

An attempt to clarify some apparent confusion: despite their very similar
names, the *BSDs are not different distributions of the same software like
Linux distributions are. NetBSD and FreeBSD are distinct operating systems,
with their own kernels and userlands, which diverged from a common ancestor
23 years ago.

This is why you should not take FreeBSD behaviors for granted on NetBSD, 
and vice-versa. 

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster and FreeBSD

2016-09-19 Thread Emmanuel Dreyfus
Nigel Babu <nig...@redhat.com> wrote:

> Emmanuel, I know you work on NetBSD, but do you have thoughts to add here?

I can help fixing NetBSD bugs, but does this FreeBSD problem also
apply to NetBSD? A quick test shows that although NetBSD does not use the
sticky bit on files, it still retains it:

# touch test
# ls -l test
-rw-r--r--  1 root  wheel  0 Sep 20 06:04 test
# chmod u+t test
# ls -l test
-rw-r--r-T  1 root  wheel  0 Sep 20 06:04 test

Note that T means t without x:

# chmod uog+rx test
# ls -l test
-rwxr-xr-t  1 root  wheel  0 Sep 20 06:04 test



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Libunwind

2016-09-09 Thread Emmanuel Dreyfus
On Thu, Sep 08, 2016 at 09:07:33AM -0400, Jeff Darcy wrote:
> (1) Has somebody already gone down this path?  Does it work?

I recall attempting to port the Julia programming language to
NetBSD, and libunwind gave me a hard time because the NetBSD version
does not implement an API as large as the one on Linux.

My advice is to review the supported platforms' header files before
using a feature, otherwise the result will not be easily portable.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Anyone wants to maintain Mac-OSX port of gluster?

2016-09-06 Thread Emmanuel Dreyfus
On Tue, Sep 06, 2016 at 07:30:08AM -0400, Kaleb S. KEITHLEY wrote:
> Mac OS X doesn't build at the present time because its sed utility (used in
> the xdrgen/rpcgen part of the build) doesn't support the (linux compatible)
> '-r' command line option. (NetBSD and FreeBSD do.)
> 
> (There's an easy fix)

Easy fix: replace sed -r with $SED_R, and set
SED_R="sed -r" on Linux vs SED_R="sed -E" on BSDs, including OSX.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression is now netbsd6-regression

2016-08-18 Thread Emmanuel Dreyfus
On Thu, Aug 18, 2016 at 12:07:08PM +0530, Nigel Babu wrote:
> As in the case of CentOS yesterday, the NetBSD job is now netbsd6-regression.

But we run regressions on the netbsd-7 branch. Smoke tests are run on netbsd-6.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Changing the names of regression jobs

2016-08-11 Thread Emmanuel Dreyfus
On Thu, Aug 11, 2016 at 11:07:22AM +0530, Nigel Babu wrote:
> I'd like to propose renaming them to:
> * centos-regression
> * netbsd-regression

I suggest you keep an OS version number;
netbsd7-regression

That way we can introduce an OS update as experimental without breaking 
what is known to work.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD Regression Failures for 2 weeks

2016-08-09 Thread Emmanuel Dreyfus
Jeff Darcy <jda...@redhat.com> wrote:

> I think there's an experiment we should do, which I've discussed with a
> couple of others: redefine EXPECT_WITHIN on NetBSD to double or triple
> the time given, and see if it makes a difference.

I am pretty sure it would help. I already raised a few limits to fix
tests for NetBSD. We could express the limits in a per-OS unit, which would
be 1s for Linux; let's start with 2s for NetBSD and see if we get a
difference in the overall failure ratio.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD Regression Failures for 2 weeks

2016-08-09 Thread Emmanuel Dreyfus
On Tue, Aug 09, 2016 at 03:44:43PM +0530, Nigel Babu wrote:
> Here are the netbsd regressions for the last 2 weeks. Please let me know if
> there are infra issues particularly in nbslave7h. As far as I can see, it
> just gets assigned more jobs than other machines, and hence more failures.

Probably right. Since we see no systemic failures, that suggests there
are real but rare bugs there. But if they are that difficult to reproduce,
that does not push people to track them down :-/

> *96* of *247* regressions failed

That is huge.


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] readdir() harmful in threaded code

2016-07-24 Thread Emmanuel Dreyfus
Vijay Bellur <vbel...@redhat.com> wrote:

> Do you have any concrete examples of problems encountered due to the 
> same directory stream being invoked from multiple threads?

I am not sure this scenario can happen, but what we had were directory
offsets reused among different DIR * opened on the same directory. This
works on Linux but is a standard violation, as directory offsets are
supposed to be valid only for a given DIR *.

It broke the NetBSD regression enough that I added a test against it in
xlators/storage/posix/src/posix.c:

        seekdir (dir, off);
#ifndef GF_LINUX_HOST_OS
        if ((u_long)telldir(dir) != off && off != pfd->dir_eof)
        {
                gf_msg (THIS->name, GF_LOG_ERROR, EINVAL,
                        P_MSG_DIR_OPERATION_FAILED,
                        "seekdir(0x%llx) failed on dir=%p: "
                        "Invalid argument (offset reused from "
                        "another DIR * structure?)", off, dir);
                errno = EINVAL;
                count = -1;
                goto out;
        }
#endif /* GF_LINUX_HOST_OS */



About standards and portability, here is the relevant part in Linux man
page: 
> In the current POSIX.1 specification (POSIX.1-2008), readdir(3) is not
> required to be thread-safe.  However, in modern implementations
> (including the glibc implementation), concurrent calls to readdir(3)
> that specify different directory streams are thread-safe.
> 
> It is expected that a future version of POSIX.1 will make readdir_r()
> obsolete, and require that readdir() be thread-safe when concurrently
> employed on different directory streams.

This means Linux recommends using readdir(), but such a practice is likely
to break on other systems, since the standards do not currently require it
to be thread-safe. We can go the readdir() way, but please add locks.

Alternatively we can use #ifdef to use alternate code on Linux (readdir)
and on the others (readdir_r).
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] readdir() harmful in threaded code

2016-07-23 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> So should we do readdir() with external locks for everything instead?

readdir() with a per-directory lock is safe. However, it may come with a
performance hit in some scenarios, since two threads cannot read the
same directory at once. But I am not sure it can happen in GlusterFS.

I am a bit disturbed by readdir_r() being planned for deprecation. The
Open Group does not say that, or I missed it:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] netbsd smoke tests fail when code patches are backported to release-3.6

2016-05-20 Thread Emmanuel Dreyfus
On Fri, May 20, 2016 at 05:43:07PM +0300, Angelos SAKELLAROPOULOS wrote:
> May I ask why following review requests are not submitted to release-3.6 ?
> It seems that they fail in netbsd, freebsd smoke tests which are not
> related to code changes.

There are build errors. I am not sure how you could have inherited
them from the git checkout, since previous changes were supposed to
pass smoke too. If you are sure the errors are not yours, you
can try to rebase.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] self heal start failure on 3.8rc1

2016-05-19 Thread Emmanuel Dreyfus
Ravishankar N <ravishan...@redhat.com> wrote:

> Yes, since 3.8 was based off master, it has the same issue. 
> http://review.gluster.org/#/c/14414/ has been merged to fix it. If you
> want to temporarily workaround it, just do some dummy 'gluster volume
> set` operation to regenerate the client vol files.

Well, since the goal is to test the migration path, I will wait for 3.8rc2.
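
For reference, the dummy volume set mentioned above can be any settable
option, since a successful volume set regenerates the client volfiles;
a sketch (the option chosen here is arbitrary):

# gluster volume set gfs diagnostics.client-log-level INFO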

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] self heal start failure on 3.8rc1

2016-05-19 Thread Emmanuel Dreyfus
Hello

After updating from 3.7.11 to 3.8rc1, the self-heal daemon will
not start anymore. Here is the log. The "op-version >= 30707"
error reminds me of something we already saw in the past.

Any hint?

[2016-05-20 03:34:40.709337] I [MSGID: 100030] [glusterfsd.c:2350:main] 
0-/usr/pkg/sbin/glusterfs: Started running /usr/pkg/sbin/glusterfs version 
3.8rc1 (args: /usr/pkg/sbin/glusterfs -s localhost --volfile-id 
gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l 
/var/log/glusterfs/glustershd.log -S 
/var/run/gluster/c7e5574af4b5b4ffdcb61b1d5e63d8da.socket --xlator-option 
*replicate*.node-uuid=85eb78cd-8ffa-49ca-b3e7-d5030bc3124d)
[2016-05-20 03:34:40.734688] E [socket.c:2391:socket_connect_finish] 
0-glusterfs: connection to ::1:24007 failed (Connection refused)
[2016-05-20 03:34:40.734927] E [glusterfsd-mgmt.c:1902:mgmt_rpc_notify] 
0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Invalid 
argument)
[2016-05-20 03:34:43.781626] I [MSGID: 101173] 
[graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-3: adding option 
'node-uuid' for volume 'gfs-replicate-3' with value 
'85eb78cd-8ffa-49ca-b3e7-d5030bc3124d'
[2016-05-20 03:34:43.781818] I [MSGID: 101173] 
[graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-2: adding option 
'node-uuid' for volume 'gfs-replicate-2' with value 
'85eb78cd-8ffa-49ca-b3e7-d5030bc3124d'
[2016-05-20 03:34:43.781859] I [MSGID: 101173] 
[graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-1: adding option 
'node-uuid' for volume 'gfs-replicate-1' with value 
'85eb78cd-8ffa-49ca-b3e7-d5030bc3124d'
[2016-05-20 03:34:43.781883] I [MSGID: 101173] 
[graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-0: adding option 
'node-uuid' for volume 'gfs-replicate-0' with value 
'85eb78cd-8ffa-49ca-b3e7-d5030bc3124d'
[2016-05-20 03:34:43.782267] E [MSGID: 108040] [afr.c:448:init] 
0-gfs-replicate-3: Unable to fetch afr pending changelogs. Is op-version >= 
30707? [Invalid argument]
[2016-05-20 03:34:43.782390] E [MSGID: 101019] [xlator.c:433:xlator_init] 
0-gfs-replicate-3: Initialization of volume 'gfs-replicate-3' failed, review 
your volfile again
[2016-05-20 03:34:43.782415] E [MSGID: 101066] 
[graph.c:324:glusterfs_graph_init] 0-gfs-replicate-3: initializing translator 
failed
[2016-05-20 03:34:43.782436] E [MSGID: 101176] 
[graph.c:670:glusterfs_graph_activate] 0-graph: init failed
[2016-05-20 03:34:43.783084] W [glusterfsd.c:1265:cleanup_and_exit] 
(-->0xbbbd9dab <rpc_clnt_handle_reply+452> at /usr/pkg/lib/libgfrpc.so.0 
-->0x8056207 <mgmt_getspec_cbk+850> at /usr/pkg/sbin/glusterfs -->0x8051b03 
<glusterfs_process_volfp+467> at /usr/pkg/sbin/glusterfs ) 0-: received signum 
(22), shutting down


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] tests/performance/open-behind.t fails on NetBSD

2016-05-08 Thread Emmanuel Dreyfus
Joseph Fernandes <josfe...@redhat.com> wrote:

> ./tests/performance/open-behind.t is failing continuously on 3.7.11

This is the fate of non enforced tests. It may be a good idea to
invetigate it: perhaps NetBSD gets a reliable failure for a rare bug
that is not NetBSD specific. We already saw such situations.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Requesting for a NetBSD setup

2016-05-02 Thread Emmanuel Dreyfus
On Mon, May 02, 2016 at 01:55:43PM +0530, Manikandan Selvaganesh wrote:
> Could you please provide us a NetBSD machine as the test cases are failing
> and we need to have a look on it?

nbslave72.cloud.gluster.org was put offline for some Jenkins breakage
that does not seem to be slave-related: I gave it a quick try,
and it is able to build and run tests.


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Requesting for NetBSD setup

2016-04-29 Thread Emmanuel Dreyfus
On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote:
> I would like to ask for a NetBSD setup

nbslave7[4gh] are disabled in Jenkins right now. They are labeled
"Disconnected by kaushal", but I don't know why. Once it is confirmed
that they are not already used for testing, you could pick one.

I still do not know who the password guardian at Red Hat is, though.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...

2016-04-25 Thread Emmanuel Dreyfus
On Sun, Apr 24, 2016 at 03:59:40PM +0200, Niels de Vos wrote:
> Well, slaves go into offline, and should be woken up when needed.
> However it seems that Jenkins fails to connect to many slaves :-/

Nothing new here. I tracked this kind of trouble with NetBSD slaves
and only got frustration as a result.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD FUSE and filehandles

2016-04-19 Thread Emmanuel Dreyfus
On Tue, Apr 19, 2016 at 04:25:07PM +0200, Csaba Henk wrote:
> I also have a vague memory that in Linux VFS the file operations
> are dispatched to file objects in quite a pure oop manner (which
> suggests itself to practices like "storing the file handle identifier
> along with the file object"), while in traditional BSD VFS the file
> ops just get the vnode (from which modernization efforts departed
> to various degree across the recent BSD variants).

Yes, the NetBSD VFS has no clue about the upper representation of the file;
it just has a reference to the vnode. That one will be difficult to
implement.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Easy build fix to review

2016-04-18 Thread Emmanuel Dreyfus
Hi

Here is an easy build fix to review: remove undefined variable in
Makefile:
http://review.gluster.org/13867
http://review.gluster.org/13868

Any taker?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] More news on 3.7.11

2016-04-15 Thread Emmanuel Dreyfus
On Fri, Apr 15, 2016 at 01:32:23PM +0530, Kaushal M wrote:
> Or,
> 2. Revert the IPv6 patch that exposed this problem

IMO the good practice when a change breaks a stable release
is to back it out, and work on a better fix on master for a later
pull-up to stable.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] NetBSD FUSE and filehandles

2016-04-03 Thread Emmanuel Dreyfus
Hi

Anoop C S asked me about a NetBSD FUSE bug that prevented mandatory
locks from working properly. In order to work on a fix, I need confirmation
about how it works on Linux, which is the reference implementation of
FUSE.

Here is what I understand, please tell me if there is something wrong:

Each time a process opens a file within the FUSE filesystem, the kernel
will call the FUSE open method, and the filesystem shall return a
filehandle. For subsequent operations on the open file descriptor, the
kernel will include the adequate filehandle in the FUSE requests. 

The filehandle is tied to the pair (calling process, file descriptor
within the calling process). Each time the calling process calls open again
on the same file, a new filehandle is returned. This means that this creates
two distinct filehandles:

fd1 = open("/mnt/foo", O_RDWR);
fd2 = open("/mnt/foo", O_RDWR);

Is all of that correct? If it is, here is the problem I face now: the
NetBSD kernel implements PUFFS, an interface similar to but incompatible
with FUSE that was developed before FUSE became the de-facto standard.
I have been maintaining the PUFFS-to-FUSE compatibility layer we use to
run GlusterFS on NetBSD.

PUFFS sends the userland filesystem requests about vnode operations; the
userland filesystem gets a reference to the vnode, and it can also get the
calling process PID, but currently the file descriptor within the calling
process is not provided to the userland filesystem.

If my understanding of FUSE filehandle semantics is correct, that means
I will have to modify the PUFFS interface so that operations on open
files get a reference to the file descriptor within the calling process,
since this is a requirement for retrieving the appropriate filehandle for
FUSE.

Anyone can confirm?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] !! operator

2016-04-02 Thread Emmanuel Dreyfus
Jeff Darcy <jda...@redhat.com> wrote:

> Sorry if my comment came off as dismissive.

It was not dismissive. My reply was not ironic; I was gladly surprised
to learn something about C syntax.

> From an objective code-readability standpoint it's probably a bad
> idiom.

IMO the root of the problem is the lack of a built-in boolean type in C,
but that is not a big deal.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] !! operator

2016-04-02 Thread Emmanuel Dreyfus
Jeff Darcy <jda...@redhat.com> wrote:

> It's a common idiom in the Linux kernel/coreutils community.  I
> thought it was in BSD too.

Thanks for the explanation. I was able to practice C for 18 years in
numerous projects without having the opportunity to see it. I will go
to bed less ignorant tonight. :-)

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] !! operator

2016-04-02 Thread Emmanuel Dreyfus
Hello 

I found a !! in the glusterfs sources. Is it a C construct I do not know,
a bug, or just weird syntax?

xlators/cluster/afr/src/afr-inode-write.c: 
 local->stable_write = !!((fd->flags|flags)&(O_SYNC|O_DSYNC));


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] netbsd regression failure in open-behind.t

2016-03-19 Thread Emmanuel Dreyfus
On Fri, Mar 18, 2016 at 09:08:04AM -0400, Prasanna Kumar Kalever wrote:

gluster volume top $V0 open | grep -w "$F0" >/dev/null 2>&1
TEST [ $? -eq 0 ];

What do we expect here and what do we get?

I note that the test fails either if gluster volume top fails,
or if its output does not contain $F0 (why not use fgrep "$F0" ?)

What happens? Removing >/dev/null 2>&1 above may be insightful.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD Regression failure on 3.7: ./tests/features/trash.t

2016-03-14 Thread Emmanuel Dreyfus
On Mon, Mar 14, 2016 at 12:08:31PM +0530, Anoop C S wrote:
> Test #59 is a volume heal command:
> TEST $CLI volume heal $V1
> I am not sure why this command itself failed. Let me take a look
> through the archived logs.

It would not be the first time an unrelated bug pops up where
we do not expect it.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to cope with spurious regression failures

2016-03-02 Thread Emmanuel Dreyfus
Raghavendra Talur <rta...@redhat.com> wrote:

> Yes,  because I updated from patch set 2 to 3 and tests for 2 were running
> on the same slave.

It seems my test for concurrent runs misfires when the previous run was
aborted. I need to improve that.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to cope with spurious regression failures

2016-03-02 Thread Emmanuel Dreyfus
Raghavendra Talur <rta...@redhat.com> wrote:

> The tests passed on the first run itself, except for the NetBSD with
> "another test running on slave" error.

Was the previous test on the slave canceled?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] FreeBSD smoke failure

2016-02-20 Thread Emmanuel Dreyfus
Jeff Darcy <jda...@redhat.com> wrote:

> OK, so are you proposing that we add the same thing on the FreeBSD
> slaves?

This is how I fixed that exact same problem for NetBSD.


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] FreeBSD smoke failure

2016-02-20 Thread Emmanuel Dreyfus
Jeff Darcy <jda...@redhat.com> wrote:

> What solution do you suggest? 

On the NetBSD Jenkins slave VMs, /opt/qa/build.sh contains this:

PYDIR=`$PYTHONBIN -c 'from distutils.sysconfig import get_python_lib;
print(get_python_lib())'`
su -m root -c "/usr/bin/install -d -o jenkins -m 755 $PYDIR/gluster"



--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] FreeBSD smoke failure

2016-02-20 Thread Emmanuel Dreyfus
Jeff Darcy <jda...@redhat.com> wrote:

> I've seen the same thing, but not all the time even for the same code.
> The fact that it's not consistent suggests that it's a configuration
> issue on some of the workers.

The problem is that the glusterfs install target copies the glupy python
module outside of the glusterfs install directory. The permissions must be
properly set up for the unprivileged build process to succeed in the
copy.


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] readdir() harmful in threaded code

2016-02-11 Thread Emmanuel Dreyfus
Just to make sure there is no misunderstanding here: unfortunately I
do not have time right now to submit a fix. It would be nice if someone
else could look at it.

On Wed, Feb 10, 2016 at 01:48:52PM +, Emmanuel Dreyfus wrote:
> Hi
> 
> After obtaining a core in a regression, I noticed there are a few readdir()
> uses in threaded code. This is begging for a crash, as readdir() maintains
> an internal state that will be trashed on concurrent use. readdir_r()
> should be used instead.
> 
> A quick search shows readdir() usage here:
> contrib/fuse-util/mount_util.c:30
> extras/test/ld-preload-test/ld-preload-test.c:310
> extras/test/test-ffop.c:550
> libglusterfs/src/compat.c:256
> libglusterfs/src/compat.c:315
> libglusterfs/src/syscall.c:97
> tests/basic/fops-sanity.c:662
> tests/utils/arequal-checksum.c:331
> 
> Occurrences in contrib, extras and tests are probably harmless as they
> are used in standalone programs that are not threaded. We are left with
> three groups of problems:
> 
> 1) libglusterfs/src/compat.c:256 and libglusterfs/src/compat.c:315
> This is Solaris compatibility code. Is it used at all?
> 
> 2)  libglusterfs/src/syscall.c:97 This is the sys_readdir() wrapper, 
> which is in turn used in:
> libglusterfs/src/run.c:284
> xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c:582
> xlators/features/changelog/lib/src/gf-history-changelog.c:854
> xlators/features/index/src/index.c:471
> xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c
> xlators/storage/posix/src/posix.c:3700
> xlators/storage/posix/src/posix.c:5896
> 
> 3) We also find sys_readdir() in libglusterfs/src/common-utils.h for
> GF_FOR_EACH_ENTRY_IN_DIR() which in turn appears in:
> libglusterfs/src/common-utils.c:3979
> libglusterfs/src/common-utils.c:4002
> xlators/mgmt/glusterd/src/glusterd-hooks.c:365
> xlators/mgmt/glusterd/src/glusterd-hooks.c:379
> xlators/mgmt/glusterd/src/glusterd-store.c:651
> xlators/mgmt/glusterd/src/glusterd-store.c:661
> xlators/mgmt/glusterd/src/glusterd-store.c:1781
> xlators/mgmt/glusterd/src/glusterd-store.c:1806
> xlators/mgmt/glusterd/src/glusterd-store.c:3044
> xlators/mgmt/glusterd/src/glusterd-store.c:3072
> xlators/mgmt/glusterd/src/glusterd-store.c:3593
> xlators/mgmt/glusterd/src/glusterd-store.c:3606
> xlators/mgmt/glusterd/src/glusterd-store.c:4032
> xlators/mgmt/glusterd/src/glusterd-store.c:4111
> 
> There is a hive of spurious bugs to squash here.
> 
> -- 
> Emmanuel Dreyfus
> m...@netbsd.org
> _______
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] readdir() harmful in threaded code

2016-02-10 Thread Emmanuel Dreyfus
Hi

After obtaining a core in a regression, I noticed there are a few readdir()
uses in threaded code. This is begging for a crash, as readdir() maintains
an internal state that will be trashed on concurrent use. readdir_r()
should be used instead.
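
For reference, here is a minimal sketch of the thread-safe pattern
(illustrative code, not taken from the tree; proper sizing of the entry
buffer with NAME_MAX is glossed over, which is fine on NetBSD and Linux
where d_name is a fixed-size array):

#include <dirent.h>
#include <stdio.h>

/*
 * Thread-safe directory scan: the caller supplies the dirent storage,
 * so concurrent scans of different DIR handles do not trample a shared
 * static buffer the way readdir() implementations may.
 */
int
list_dir (const char *path)
{
        DIR *dirp = opendir (path);
        struct dirent entry;
        struct dirent *result = NULL;
        int ret;

        if (dirp == NULL)
                return -1;

        while ((ret = readdir_r (dirp, &entry, &result)) == 0 &&
               result != NULL)
                printf ("%s\n", result->d_name);

        closedir (dirp);
        return ret ? -1 : 0;
}

The call sites below would need that kind of conversion, or an
equivalent lock around readdir().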

A quick search shows readdir() usage here:
contrib/fuse-util/mount_util.c:30
extras/test/ld-preload-test/ld-preload-test.c:310
extras/test/test-ffop.c:550
libglusterfs/src/compat.c:256
libglusterfs/src/compat.c:315
libglusterfs/src/syscall.c:97
tests/basic/fops-sanity.c:662
tests/utils/arequal-checksum.c:331

Occurrences in contrib, extras and tests are probably harmless as they
are used in standalone programs that are not threaded. We are left with
three groups of problems:

1) libglusterfs/src/compat.c:256 and libglusterfs/src/compat.c:315
This is Solaris compatibility code. Is it used at all?

2)  libglusterfs/src/syscall.c:97 This is the sys_readdir() wrapper, 
which is in turn used in:
libglusterfs/src/run.c:284
xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c:582
xlators/features/changelog/lib/src/gf-history-changelog.c:854
xlators/features/index/src/index.c:471
xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c
xlators/storage/posix/src/posix.c:3700
xlators/storage/posix/src/posix.c:5896

3) We also find sys_readdir() in libglusterfs/src/common-utils.h for
GF_FOR_EACH_ENTRY_IN_DIR() which in turn appears in:
libglusterfs/src/common-utils.c:3979
libglusterfs/src/common-utils.c:4002
xlators/mgmt/glusterd/src/glusterd-hooks.c:365
xlators/mgmt/glusterd/src/glusterd-hooks.c:379
xlators/mgmt/glusterd/src/glusterd-store.c:651
xlators/mgmt/glusterd/src/glusterd-store.c:661
xlators/mgmt/glusterd/src/glusterd-store.c:1781
xlators/mgmt/glusterd/src/glusterd-store.c:1806
xlators/mgmt/glusterd/src/glusterd-store.c:3044
xlators/mgmt/glusterd/src/glusterd-store.c:3072
xlators/mgmt/glusterd/src/glusterd-store.c:3593
xlators/mgmt/glusterd/src/glusterd-store.c:3606
xlators/mgmt/glusterd/src/glusterd-store.c:4032
xlators/mgmt/glusterd/src/glusterd-store.c:4111

There is a hive of spurious bugs to squash here.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd core on NetBSD (https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14139/consoleFull)

2016-02-10 Thread Emmanuel Dreyfus
On Wed, Feb 10, 2016 at 02:26:35PM +0530, Soumya Koduri wrote:
> Is this issue related to bug1221629 as well?

I do not know, but please someone replace readdir by readdir_r! :-)
-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfsd core on NetBSD (https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14139/consoleFull)

2016-02-10 Thread Emmanuel Dreyfus
On Wed, Feb 10, 2016 at 12:17:23PM +0530, Soumya Koduri wrote:
> I see a core generated in this regression run though all the tests seem to
> have passed. I do not have a netbsd machine to analyze the core.
> Could you please take a look and let me know what the issue could have been?

changelog bug. I am not sure how this could become NULL after it has been 
checked at the beginning of gf_history_changelog().

I note this uses readdir() which is not thread-safe. readdir_r() should
probably be used instead.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xb99912b4 in gf_history_changelog (changelog_dir=0xb7b160f0 "\003", 
start=3081873456, end=0, n_parallel=-1217773520, actual_end=0xb7b05310)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-history-changelog.c:834
834 gf_log (this->name, GF_LOG_ERROR,
(gdb) print this
$1 = (xlator_t *) 0x0
#0  0xb99912b4 in gf_history_changelog (changelog_dir=0xb7b160f0 "\003", 
start=3081873456, end=0, n_parallel=-1217773520, actual_end=0xb7b05310)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-history-changelog.c:834
#1  0xbb6fec17 in rpcsvc_record_build_header (recordstart=0x0, 
rlen=3077193776, reply=..., payload=3081855216)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:857
#2  0xbb6fec95 in rpcsvc_record_build_header (recordstart=0xb7b10030 "", 
rlen=3077193776, reply=..., payload=3081855216)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:874
#3  0xbb6ffa81 in rpcsvc_submit_generic (req=0xb7b10030, proghdr=0xb7b160f0, 
hdrcount=0, payload=0xb76a4030, payloadcount=1, iobref=0x0)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:1316
#4  0xbb70506c in xdr_to_rpc_reply (msgbuf=0xb7b10030 "", len=0, 
reply=0xb76a4030, payload=0xb76a4030, 
verfbytes=0x1 )
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/xdr-rpcclnt.c:40
#5  0xbb26cbb5 in socket_server_event_handler (fd=16, idx=3, data=0xb7b10030, 
poll_in=1, poll_out=0, poll_err=0)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-transport/socket/src/socket.c:2765
#6  0xbb7908da in syncop_rename (subvol=0xbb143030, oldloc=0xba45b4b0, 
newloc=0x3, xdata_in=0x75, xdata_out=0xbb7e8000)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/syncop.c:2225
#7  0xbb790c21 in syncop_ftruncate (subvol=0xbb143030, fd=0x8062cc0 , 
offset=-4647738537632864458, xdata_in=0xbb7efe75 <_rtld_bind_start+17>, 
xdata_out=0xbb7e8000)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/syncop.c:2265
#8  0xbb75f6d1 in inode_table_dump (itable=0xbb143030, 
prefix=0x2 )
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/inode.c:2352
#9  0x08050e20 in main (argc=12, argv=0xbf7feaac)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/glusterfsd/src/glusterfsd.c:2345

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Cores on NetBSD of brick https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14100/consoleFull

2016-02-09 Thread Emmanuel Dreyfus
On Tue, Feb 09, 2016 at 11:56:37AM +0530, Pranith Kumar Karampuri wrote:
> I think the regression run is not giving that link anymore when the crash
> happens? Could you please add that also as a link in regression run?

There was the path of the archive; I changed it to an http:// link

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] changelog bug

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 12:53:33AM -0500, Manikandan Selvaganesh wrote:
> Thanks and as you have mentioned, I have no clue how my changes 
> produced a core due to a NULL pointer in changelog. 

It is probably an unrelated bug that was nice enough to pop up here.

Too often people disregard NetBSD failures and just retrigger without
looking at the cause, but the NetBSD regression has already proven its
ability to expose bugs that do not come to light in Linux regressions
but still exist on Linux.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 03:26:54PM +0530, Milind Changire wrote:
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull
> 
> 
> [08:44:20] ./tests/basic/afr/self-heald.t ..
> not ok 37 Got "0" instead of "1"
> not ok 52 Got "0" instead of "1"
> not ok 67
> Failed 4/83 subtests

There is a core but it is from the NetBSD FUSE subsystem. The trace is
not helpful but suggests an abort() call because of an unexpected
situation:

Core was generated by `perfused'.
Program terminated with signal SIGABRT, Aborted.
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12

/var/log/messages has a hint:
Feb  8 08:43:15 nbslave7c perfused: file write grow without resize

Indeed I have this assertion in NetBSD FUSE to catch a race condition.
I think it is the first time I have seen it raised, but I am unable to
conclude on the cause. Let us retrigger (I did it) and see if someone
else ever hits it again. The bug is more likely in NetBSD FUSE than
in glusterfs.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
> The patch to add it to bad tests has already been merged, so I guess this
> .t's failure won't pop up again.

IMO that was a bit too quick. What is the procedure to get out of the
list?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 10:26:22AM +, Emmanuel Dreyfus wrote:
> Indeed, same problem. But unfortunately it is not very reproducible since
> we need to make a full week of runs to see it again. I am tempted to
> just remove the assertion.

NB: this does not fail on stock NetBSD release: the assertion is only there
because FUSE is built with -DDEBUG on the NetBSD slave VMs.

OTOH if it happens only in tests/basic/afr/self-heal.t I may be able to 
get it by looping on the test for a while. I will try this on nbslave70.

In the meantime if that one pops up too often and gets annoying, I can get
rid of it by just disabling debug mode.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 03:44:43PM +0530, Ravishankar N wrote:
> The .t has been added to bad tests for now @

I am not sure this is relevant: does it fail again? I am very interested
if it is reproducible.

> http://review.gluster.org/#/c/13344/, so you can probably rebase your patch.
> I'm not sure this is a problem with the case, the same issue was reported by
> Manikandan last week : 
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13895/consoleFull

Indeed, same problem. But unfortunately it is not very reproducible since
we need to make a full week of runs to see it again. I am tempted to
just remove the assertion.

> Is it one of those vndconfig errors? The .t seems to have skipped a few
> tests:

This is because FUSE went away during the test.
The vnconfig problems are fixed now and should not happen anymore.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/quota-anon-fd-nfs.t, ./tests/basic/tier/fops-during-migration.t, ./tests/basic/tier/record-metadata-heat.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 06:25:09PM +0530, Milind Changire wrote:
> Looks like some cores are available as well.
> Please advise.

#0  0xb99912b4 in gf_changelog_reborp_rpcsvc_notify (rpc=0xb7b160f0, 
mydata=0xb7b1a830, event=RPCSVC_EVENT_ACCEPT, data=0xb76a4030)
at 
/home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-changelog-reborp.c:110
110 return 0;

Crash on return: that smells like stack corruption.



-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Cores on NetBSD of brick https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14100/consoleFull

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 07:27:46PM +0530, Pranith Kumar Karampuri wrote:
>   I don't see any logs in the archive. Did we change something?

I think they are in a different tarball, in /archives/logs
-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] changelog bug

2016-02-06 Thread Emmanuel Dreyfus
quot; " " " " " " " " " " "s"w"i"t"c"h" "("e"v"e"n"t")" "{"
"1"1"7" " " " " " " " " " " " " "c"a"s"e" 
"R"P"C"S"V"C"_"E"V"E"N"T"_"A"C"C"E"P"T":"
"1"1"8" " " " " " " " " " " " " " " " " " " " " "r"e"t" "=" 
"s"y"s"_"u"n"l"i"n"k" "("R"P"C"_"S"O"C"K"("e"n"t"r"y")")";"
"("g"d"b")" "p"r"i"n"t" "e"n"t"r"y"
"$"2" "=" "("g"f"_"c"h"a"n"g"e"l"o"g"_"t" "*")" "0"x"b"7"b"1"a"8"3"0"
"("g"d"b")" "p"r"i"n"t" "*"e"n"t"r"y"
"$"3" "=" "{"s"t"a"t"e"l"o"c"k" "=" "{"p"t"s"_"m"a"g"i"c" "=" "0"," 
"p"t"s"_"s"p"i"n" "=" "0" "'"\"0"0"0"'"," "p"t"s"_"f"l"a"g"s" "=" "0"}"," "
" " "c"o"n"n"s"t"a"t"e" "=" 
"G"F"_"C"H"A"N"G"E"L"O"G"_"C"O"N"N"_"S"T"A"T"E"_"P"E"N"D"I"N"G"," "t"h"i"s" "=" 
"0"x"0"," "l"i"s"t" "=" "{"n"e"x"t" "=" "0"x"0"," "
" " " " "p"r"e"v" "=" "0"x"0"}"," "b"r"i"c"k" "=" "'"\"0"0"0"'" 
"<"r"e"p"e"a"t"s" "5"8"0" "t"i"m"e"s">"."."."," "g"r"p"c" "=" "{"s"v"c" "=" 
"0"x"4"f"c"0"0"," "
" " " " "r"p"c" "=" "0"x"b"1"a"8"3"0"0"0"," "
" " " " "s"o"c"k" "="
"""∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0
"0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0
"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0
"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"""}"," "
" " "n"o"t"i"f"y" "=" "5"2"3"2"3"9"6"4"6"," "f"i"n"i" "=" "0"x"9"4"b"b"," 
"c"a"l"l"b"a"c"k" "=" "0"x"4"f"c"0"0"," "
" " "c"o"n"n"e"c"t"e"d" "=" "0"x"b"1"a"8"3"0"0"0"," "d"i"s"c"o"n"n"e"c"t"e"d" 
"=" "0"x"a"d"c"0"d"e"b"7"," "p"t"r" "=" "0"x"1"f"3"0"0"0"d"e"," "
" " "i"n"v"o"k"e"r"x"l" "=" "0"x"9"4"b"b"," "o"r"d"e"r"e"d" "=" 
"("u"n"k"n"o"w"n":" "3"2"6"6"5"6")"," "q"u"e"u"e"e"v"e"n"t" "=" 
"0"x"b"1"a"8"3"0"0"0"," "
" " "p"i"c"k"e"v"e"n"t" "=" "0"x"a"d"c"0"d"e"b"7"," "e"v"e"n"t" "=" "{"l"o"c"k" 
"=" "{"p"t"m"_"m"a"g"i"c" "=" "5"2"3"2"3"9"6"4"6"," "
" " " " " " "p"t"m"_"e"r"r"o"r"c"h"e"c"k" "=" "1"8"7" "'"ª"'"," 
"p"t"m"_"p"a"d"1" "=" """î"\"0"0"0"""," "p"t"m"_"i"n"t"e"r"l"o"c"k" "=" "0" 
"'"\"0"0"0"'"," "
" " " " " " "p"t"m"_"p"a"d"2" "=" """\"3"7"4"\"0"0"4"""," "p"t"m"_"o"w"n"e"r" 
"=" "0"x"b"1"a"8"3"0"0"0"," "p"t"m"_"w"a"i"t"e"r"s" "=" "0"x"a"d"c"0"d"e"b"7"," 
"
" " " " " " "p"t"m"_"r"e"c"u"r"s"e"d" "=" "5"2"3"2"3"9"6"4"6"," 
"p"t"m"_"s"p"a"r"e"2" "=" "0"x"9"4"b"b"}"," "c"o"n"d" "=" "{"
" " " " " " "p"t"c"_"m"a"g"i"c" "=" "3"2"6"6"5"6"," "p"t"c"_"l"o"c"k" "=" "0" 
"'"\"0"0"0"'"," "p"t"c"_"w"a"i"t"e"r"s" "=" "{"
" " " " " " " " "p"t"q"h"_"f"i"r"s"t" "=" "0"x"a"d"c"0"d"e"b"7"," 
"p"t"q"h"_"l"a"s"t" "=" "0"x"1"f"3"0"0"0"d"e"}"," "p"t"c"_"m"u"t"e"x" "=" 
"0"x"9"4"b"b"," "
" " " " " " "p"t"c"_"p"r"i"v"a"t"e" "=" "0"x"4"f"c"0"0"}"," "i"n"v"o"k"e"r" "=" 
"0"x"b"1"a"8"3"0"0"0"," "n"e"x"t"_"s"e"q" "=" "2"9"1"5"0"9"8"2"9"5"," "
" " " " "e"n"t"r"y" "=" "0"x"1"f"3"0"0"0"d"e"," "e"v"e"n"t"s" "=" "{"n"e"x"t" 
"=" "0"x"9"4"b"b"," "p"r"e"v" "=" "0"x"4"f"c"0"0"}"}"}"
"("g"d"b")" "b"t"
"#"0" " "0"x"b"9"a"9"1"2"e"c" "i"n" 
"g"f"_"c"h"a"n"g"e"l"o"g"_"r"e"b"o"r"p"_"r"p"c"s"v"c"_"n"o"t"i"f"y" 
"("r"p"c"="0"x"b"7"b"1"6"0"f"0"," "
" " " " "m"y"d"a"t"a"="0"x"b"7"b"1"a"8"3"0"," 
"e"v"e"n"t"="R"P"C"S"V"C"_"E"V"E"N"T"_"A"C"C"E"P"T"," 
"d"a"t"a"="0"x"b"7"8"b"6"0"3"0")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"x"l"a"t"o"r"s"/"f"e"a"t"u"r"e"s"/"c"h"a"n"g
"e"l"o"g"/"l"i"b"/"s"r"c"/"g"f"-"c"h"a"n"g"e"l"o"g"-"r"e"b"o"r"p"."c":"1"1"4"
"#"1" " "0"x"b"b"6"f"b"c"1"7" "i"n" "r"p"c"s"v"c"_"p"r"o"g"r"a"m"_"n"o"t"i"f"y" 
"("l"i"s"t"e"n"e"r"="0"x"b"7"b"2"7"1"1"0"," "
" " " " "e"v"e"n"t"="R"P"C"S"V"C"_"E"V"E"N"T"_"A"C"C"E"P"T"," 
"d"a"t"a"="0"x"b"7"8"b"6"0"3"0")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"s"v"c
"."c":"3"3"5"
"#"2" " "0"x"b"b"6"f"b"c"9"5" "i"n" "r"p"c"s"v"c"_"a"c"c"e"p"t" 
"("s"v"c"="0"x"b"7"b"1"6"0"f"0"," 
"l"i"s"t"e"n"_"t"r"a"n"s"="0"x"b"7"b"1"0"0"3"0"," "
" " " " "n"e"w"_"t"r"a"n"s"="0"x"b"7"8"b"6"0"3"0")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"s"v"c
"."c":"3"5"8"
"#"3" " "0"x"b"b"6"f"c"a"8"1" "i"n" "r"p"c"s"v"c"_"n"o"t"i"f"y" 
"("t"r"a"n"s"="0"x"b"7"b"1"0"0"3"0"," "m"y"d"a"t"a"="0"x"b"7"b"1"6"0"f"0"," "
" " " " "e"v"e"n"t"="R"P"C"_"T"R"A"N"S"P"O"R"T"_"A"C"C"E"P"T"," 
"d"a"t"a"="0"x"b"7"8"b"6"0"3"0")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"s"v"c
"."c":"7"8"6"
"#"4" " "0"x"b"b"7"0"2"0"6"c" "i"n" "r"p"c"_"t"r"a"n"s"p"o"r"t"_"n"o"t"i"f"y" 
"("t"h"i"s"="0"x"b"7"b"1"0"0"3"0"," "
" " " " "e"v"e"n"t"="R"P"C"_"T"R"A"N"S"P"O"R"T"_"A"C"C"E"P"T"," 
"d"a"t"a"="0"x"b"7"8"b"6"0"3"0")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"-"t"r
"a"n"s"p"o"r"t"."c":"5"4"1"
"#"5" " "0"x"b"b"2"6"9"b"b"5" "i"n" 
"s"o"c"k"e"t"_"s"e"r"v"e"r"_"e"v"e"n"t"_"h"a"n"d"l"e"r" "("f"d"="1"6"," 
"i"d"x"="3"," "d"a"t"a"="0"x"b"7"b"1"0"0"3"0"," "
" " " " "p"o"l"l"_"i"n"="1"," "p"o"l"l"_"o"u"t"="0"," "p"o"l"l"_"e"r"r"="0")"
" " " " "a"t" 
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"-"-"-"T"y"p"e"
"<"r"e"t"u"r"n">" "t"o" "c"o"n"t"i"n"u"e"," "o"r" "q" "<"r"e"t"u"r"n">" "t"o" 
"q"u"i"t"-"-"-"
"p"c"-"t"r"a"n"s"p"o"r"t"/"s"o"c"k"e"t"/"s"r"c"/"s"o"c"k"e"t"."c":"2"7"6"5"
"#"6" " "0"x"b"b"7"8"e"9"6"6" "i"n" 
"e"v"e"n"t"_"d"i"s"p"a"t"c"h"_"p"o"l"l"_"h"a"n"d"l"e"r" 
"("e"v"e"n"t"_"p"o"o"l"="0"x"b"b"1"4"3"0"3"0"," "
" " " " "u"f"d"s"="0"x"b"7"8"9"e"0"b"0"," "i"="3")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"l"i"b"g"l"u"s"t"e"r"f"s"/"s"r"c"/"e"v"e"n"t
"-"p"o"l"l"."c":"3"8"9"
"#"7" " "0"x"b"b"7"8"e"c"a"d" "i"n" "e"v"e"n"t"_"d"i"s"p"a"t"c"h"_"p"o"l"l" 
"("e"v"e"n"t"_"p"o"o"l"="0"x"b"b"1"4"3"0"3"0")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"l"i"b"g"l"u"s"t"e"r"f"s"/"s"r"c"/"e"v"e"n"t
"-"p"o"l"l"."c":"4"8"2"
"#"8" " "0"x"b"b"7"5"d"2"1"9" "i"n" "e"v"e"n"t"_"d"i"s"p"a"t"c"h" 
"("e"v"e"n"t"_"p"o"o"l"="0"x"b"b"1"4"3"0"3"0")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"l"i"b"g"l"u"s"t"e"r"f"s"/"s"r"c"/"e"v"e"n"t
"."c":"1"2"2"
"#"9" " "0"x"0"8"0"5"0"e"2"0" "i"n" "m"a"i"n" "("a"r"g"c"="1"2"," 
"a"r"g"v"="0"x"b"f"7"f"e"a"a"4")"
" " " " "a"t"
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"g"l"u"s"t"e"r"f"s"d"/"s"r"c"/"g"l"u"s"t"e"r
"f"s"d"."c":"2"3"4"5"



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-infra] Different version of run-tests.sh in jenkin slaves?

2016-01-28 Thread Emmanuel Dreyfus
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote:
> Where do I find config in NetBSD which decides which location to dump core
> in?

sysctl kern.defcorename for the default location and name. It can be
overridden per process using sysctl proc.$$.corename

> Any particular reason you added /d/backends/*/*.core to list of path to
> search for core?

Yes, this is required for standard compliance of the exposed glusterfs
filesystem in the case of low system PATH_MAX. See in posix.c:

/*  
 * _XOPEN_PATH_MAX is the longest file path len we MUST 
 * support according to POSIX standard. When prepended
 * by the brick base path it may exceed backed filesystem
 * capacity (which MAY be bigger than _XOPEN_PATH_MAX). If
 * this is the case, chdir() to the brick base path and
 * use relative paths when they are too long. See also
 * MAKE_REAL_PATH in posix-handle.h   
  */  
_private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
if (_private->path_max != -1 &&   
_XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) {
ret = chdir(_private->base_path); 
if (ret) {
gf_msg (this->name, GF_LOG_ERROR, 0,
P_MSG_BASEPATH_CHDIR_FAILED,
"chdir() to \"%s\" failed",
_private->base_path);
goto out;
}
And the core goes in the current directory by default. We could use
sysctl(3) to change that if we need to.


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] Different version of run-tests.sh in jenkin slaves?

2016-01-28 Thread Emmanuel Dreyfus
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote:
> Where do I find config in NetBSD which decides which location to dump core
> in?

I crafted the patch below, but it is probably much simpler to just
set kern.defcorename to /%n-%p.core on all VM slaves. I will do it.

diff --git a/xlators/storage/posix/src/posix.c 
b/xlators/storage/posix/src/posix.c
index 272d08f..2fd2d7d 100644
--- a/xlators/storage/posix/src/posix.c
+++ b/xlators/storage/posix/src/posix.c
@@ -29,6 +29,10 @@
 #include 
 #endif /* HAVE_LINKAT */
 
+#ifdef __NetBSD__
+#include <sys/sysctl.h>
+#endif /* __NetBSD__ */
+
 #include "glusterfs.h"
 #include "checksum.h"
 #include "dict.h"
@@ -6631,6 +6635,8 @@ init (xlator_t *this)
 _private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
 if (_private->path_max != -1 &&
 _XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) 
{
+char corename[] = "/%n-%p.core";
+
 ret = chdir(_private->base_path);
 if (ret) {
 gf_msg (this->name, GF_LOG_ERROR, 0,
@@ -6639,7 +6645,15 @@ init (xlator_t *this)
 _private->base_path);
 goto out;
 }
+
 #ifdef __NetBSD__
+/* 
+ * Make sure cores go to the root and not in current 
+ * directory
+ */
+(void)sysctlbyname("proc.curproc.corename", NULL, NULL, 
+   corename, strlen(corename) + 1);
+
 /*
  * At least on NetBSD, the chdir() above uncovers a
  * race condition which cause file lookup to fail


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] Different version of run-tests.sh in jenkin slaves?

2016-01-28 Thread Emmanuel Dreyfus
On Thu, Jan 28, 2016 at 12:10:49PM +0530, Atin Mukherjee wrote:
> So does that mean we never analyzed any core reported by NetBSD
> regression failure? That's strange.

We got the cores from / but not from d/backends/*/ as I understand.

I am glad someone figured out the mystery. 

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-21 Thread Emmanuel Dreyfus
On Thu, Jan 21, 2016 at 04:49:28PM +0100, Michael Scherer wrote:
> > review.gluster.org[0: 184.107.76.10]: errno=Connection refused
> 
> SO I found nothing in gerrit nor netbsd. ANd not the DNS, since it
> managed to resolve stuff fine.
> 
> I suspect the problem was on gerrit, nor on netbsd. Did it happened
> again ?

I could imagine problems with exhausted system resources, but it would
not produce a "Connection refused".

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-21 Thread Emmanuel Dreyfus
Michael Scherer <msche...@redhat.com> wrote:

> Depend, if they exhausted FD or something ? I am not a java specialist.

It is not the same errno, AFAIK.
 
> Could also just be too long to answer due to the load, but it was not
> loaded :/

High loads give timeouts. I may be wrong, but I believe connection
refused is really when it gets a TCP RST.


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-20 Thread Emmanuel Dreyfus
Vijay Bellur <vbel...@redhat.com> wrote:

> Does not look like a DNS problem. It is happening to me outside of
> rackspace too.

I mean I have already seen rackspace VMs failing to initiate connections
because the rackspace DNS failed to answer DNS requests. This was the cause
of failed regressions at some point.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?

2016-01-20 Thread Emmanuel Dreyfus
Vijay Bellur <vbel...@redhat.com> wrote:

> There is some problem with review.gluster.org now. git clone/pull fails
> for me consistently.

First check that DNS is working. I recall seeing the rackspace DNS failing
to answer.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ./tests/bugs/changelog/bug-1208470.t failed NetBSD

2016-01-19 Thread Emmanuel Dreyfus
On Tue, Jan 19, 2016 at 09:38:19AM +0530, Ravishankar N wrote:
> ./tests/bugs/changelog/bug-1208470.t seems to have failed a NetBSD run: 
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/17651/consoleFull
>  Not sure if it is spurious as it passed in the subsequent run. Please have
> a look.

I am puzzled: NetBSD regression is supposed to skip the bugs subdirectory.
Did someone change something here?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to cope with spurious regression failures

2016-01-19 Thread Emmanuel Dreyfus
On Tue, Jan 19, 2016 at 07:08:03PM +0530, Raghavendra Talur wrote:
> a. Allowing re-running of tests to make them pass leads to complacency with
> how tests are written.
> b. A test is bad if it is not deterministic and running a bad test has *no*
> value. We are wasting time even if the test runs for a few seconds.

I agree with your vision for the long term, but my proposal addresses the
short term situation. We could also use the retry approach to fuel your
blacklist approach:

We could imagine a system where the retry feature would cast votes on
individual tests: each time a test fails once and succeeds on retry, cast
a +1 unreliable vote for that test.

After a few days, we will have a wall of shame for unreliable tests, 
which could either be fixed or go to the blacklist.

I do not know what software to use to collect and display the results, 
though. Should we have a gerrit change for each test?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression fixes

2016-01-18 Thread Emmanuel Dreyfus
Hi all

I have the following changes awaiting code review/merge:
http://review.gluster.org/13204
http://review.gluster.org/13205
http://review.gluster.org/13245
http://review.gluster.org/13247

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] How to cope with spurious regression failures

2016-01-18 Thread Emmanuel Dreyfus
Hi

Spurious regression failures make developers frustrated. One submits a
change and gets completely unrelated failures. The only way out is to
retrigger regression until it passes, a boring and time-wasting task.
Sometimes after 4 or 5 failed runs, the submitter realizes there is a
real issue and looks at it, which is a waste of time and resources.

The fact that we run regression on multiple platforms makes the
situation worse. If you have a 10% chance to hit a spurious failure on
Linux and a 20% chance to hit a spurious failure on NetBSD (random
numbers chosen), that means you get roughly one failure per four
submissions (a rough prediction, as I used random input numbers, but you
get the idea).

Two solutions are proposed:

1) do not run unreliable tests, as proposed by Raghavendra Talur:
http://review.gluster.org/13173

I have nothing against the idea, but I voted down the change because it
fails to address the need for different test blacklists on different
platforms: we do not have the same unreliable tests on Linux and NetBSD.

2) add a regression option to retry a failed test once, and to validate
the regression if second attempt passes, as I proposed:
http://review.gluster.org/13245

The idea is basically to automatically do what every submitter has been
doing: retry without a thought when regression fails. The benefit of
this approach is also that it gives us a better view of what test failed
because of the change, and what test failed because it was unreliable.

The retry feature is optional and triggered by using the -r flag to
run-tests.sh. I intend to use it on NetBSD regression to reduce the
number of failures that annoy people. It could be used on Linux
regression too, though I do not plan to touch that on my own.

Please tell us which approach you prefer.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Emmanuel Dreyfus <m...@netbsd.org> wrote:

> But I just realized the change is wrong, since running tests "new way"
> stops on the first failed test. My change just retries the failed test and
> considers the regression run to be good on success, without running next
> tests.
> 
> I will post an update shortly.

Done:
http://review.gluster.org/13245
http://review.gluster.org/13247
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD hang in quota-anon-fd-nfs.t

2016-01-11 Thread Emmanuel Dreyfus
On Mon, Jan 11, 2016 at 11:51:25AM +0530, Vijaikumar Mallikarjuna wrote:
> All quota test-cases uses 'tests/basic/quota.c' to write data
> Does sync flags have any impact?

It seems to change the internal behavior but not the result; I can still
see write calls taking e.g. 1169s.

For the sake of completeness: instead of waiting for a locked page
(probably locked by the NFS subsystem), it now waits for NFS RPC replies
from the server. Example of a kernel backtrace:
sleepq_block
cv_timedwait
nfs_rcvlock
nfs_request
nfs_writerpc
nfs_doio
VOP_STRATEGY
genfs_do_io
genfs_gop_write
genfs_do_putpages
genfs_putpages
VOP_PUTPAGES
nfs_write
VOP_WRITE
vn_write
dofilewrite
sys_write
syscall

I note we mount with -o noac,soft,nolock,vers=3,
which the scripts turn into -o tcp,-R=2,soft,nfs3 for NetBSD.
-R is the retry count. There is no timeout. Do we need one?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing

2016-01-11 Thread Emmanuel Dreyfus
Niels de Vos <nde...@redhat.com> wrote:

> How would we handle patches that get sent by maintainers? Most
> developers that do code reviews will only +1 those changes. Those will
> never get automatically regression tested then. I dont think a
> maintainer should +2 their own patch immediately either, that suggests
> no further reviews are needed.

Indeed it is a bit odd, but I just CR +2 my own changes...

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD hang in quota-anon-fd-nfs.t

2016-01-10 Thread Emmanuel Dreyfus
Emmanuel Dreyfus <m...@netbsd.org> wrote:

> ps -axl shows the quota helper program is waiting on genput:
> UID  PID  PPID CPU PRI NI  VSZ  RSS WCHAN  STAT TTY  TIME COMMAND
>   0 9660 23707   0 124  0 3360 1080 genput D+   pts/2 0:00.01
> ./tests/basic/quota /mnt/nfs/0//0/1/2/3/4/5/6/7/8/9/new_file_2 256 4 
> 
> The process is stuck in the kernel waiting for a memory page to get
> unlocked. 

I reproduced the situation, and discovered the process is not really
hung. Tracing system calls in the quota process shows that it does
complete write operations, though after a very long time: one write
system call lasted 963s, for instance.

It does not hang, but it does not look sane either.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-10 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> I tried to look into 3 instances of this failure:
(...)
> same issue as above, two tests are running in parallel.

How is that possible? A stray & that sends a job to the background?
Are we sure it is the same regression test run? Or is it two regression
test runs that are scheduled simultaneously?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] NetBSD hang in quota-anon-fd-nfs.t

2016-01-10 Thread Emmanuel Dreyfus
Starting a new thread for the sake of clarity.

While looking for the spurious reboot problem, I got a hang in
quota-anon-fd-nfs.t.
[23:39:54] ./tests/basic/quota-anon-fd-nfs.t .. 16/40

ps -axl shows the quota helper program is waiting on genput:
UID  PID  PPID CPU PRI NI  VSZ  RSS WCHAN  STAT TTY  TIME COMMAND
  0 9660 23707   0 124  0 3360 1080 genput D+   pts/2 0:00.01
./tests/basic/quota /mnt/nfs/0//0/1/2/3/4/5/6/7/8/9/new_file_2 256 4 

The process is stuck in the kernel waiting for a memory page to get
unlocked. That filesystem is still alive, which suggests an unwind
operation like the one fixed in http://review.gluster.org/13177

I can unlock the situation by killing the glusterfs daemons. Does it ring
a bell for someone?


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-10 Thread Emmanuel Dreyfus
Emmanuel Dreyfus <m...@netbsd.org> wrote:

> While trying to reproduce the problem in
> ./tests/basic/afr/arbiter-statfs.t, I came to many failures here:
> 
> [03:53:07] ./tests/basic/afr/split-brain-resolution.t 

I was running tests from the wrong directory :-/
This one is fine with HEAD.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-09 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> tests/basic/afr/arbiter-statfs.t

I posted patches to fix this one (but it seems Jenkins is down? No
regression is running)

> tests/basic/afr/self-heal.t
> tests/basic/afr/entry-self-heal.t

Those two are still to be investigated, and it seems
tests/basic/afr/split-brain-resolution.t is now reliably broken as
well.

> tests/basic/quota-nfs.t 

That one is marked as a bad test and should not cause harm on a spurious
failure, as its result is ignored.

I am trying to reproduce a spurious VM reboot during tests by looping on
the whole test suite on nbslave70, with reboot on panic disabled (it
will drop into kernel debugger instead). No result so far.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 12:42:36PM +0530, Sachidananda URS wrote:
> I have a NetBSD 7.0 installation which I can share with you, to get
> started.
> Once manu@ gets back on a specific version, I can set that up too.

NetBSD 7.0 is fine and has everything required in GENERIC kernel.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
Ravishankar N <ravishan...@redhat.com> wrote:

> It failed with EIO.
> 
> mount_nfs: can't access /patchy: Permission denied
> mount_nfs: can't access /patchy: Permission denied
> mount_nfs: can't access /patchy: Permission denied
> dd: /mnt/nfs/0/test-big-write: Input/output error

I suspect the EIO is just a consequence of the failed mount.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
> With your support I think we can make things better. To avoid duplication of
> work, did you take any tests that you are already investigating? If not that
> is the first thing I will try to find out.

I will look at the ./tests/basic/afr/arbiter-statfs.t problem with
loopback device.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
Emmanuel Dreyfus <m...@netbsd.org> wrote:

> > With your support I think we can make things better. To avoid duplication of
> > work, did you take any tests that you are already investigating? If not that
> > is the first thing I will try to find out.
> 
> I will look at the ./tests/basic/afr/arbiter-statfs.t problem with
> loopback device.

I tracked it down: vnconfig -l complains about "vnconfig: VNDIOCGET: Bad
file descriptor" when we had a configured loopback device with the
backing store on a filesystem we unmounted.

# dd if=/dev/zero of=/scratch/backend bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 3.034 secs (34560843 bytes/sec)
# vnconfig vnd0 /scratch/backend
# vnconfig -l
vnd0: /scratch (/dev/xbd1a) inode 6
vnd1: not in use
vnd2: not in use
vnd3: not in use
# umount -f /scratch/
# vnconfig -l 
vnconfig: VNDIOCGET: Bad file descriptor

But it seems the workaround is easy:
# vnconfig -u vnd0
# vnconfig -l  
vnd0: not in use
vnd1: not in use
vnd2: not in use
vnd3: not in use

Here are my fixes:
http://review.gluster.org/13204 (master)
http://review.gluster.org/13205 (release-3.7)

And while there, a portability fix in rfc.sh:
http://review.gluster.org/13206 (master)
That bug is not present in release-3.7.
 
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD tests not running to completion.

2016-01-08 Thread Emmanuel Dreyfus
Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> With your support I think we can make things better. To avoid 
> duplication of work, did you take any tests that you are already 
> investigating? If not that is the first thing I will try to find out.

While trying to reproduce the problem in
./tests/basic/afr/arbiter-statfs.t, I came to many failures here:

[03:53:07] ./tests/basic/afr/split-brain-resolution.t .. 20/43 getfattr:
Removing leading '/' from absolute path names
cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
not ok 25 Got "" instead of "brick0_alive"
cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
not ok 27 Got "" instead of "brick1_alive"
getfattr: Removing leading '/' from absolute path names
not ok 30 Got "" instead of "brick0"
not ok 32 Got "" instead of "brick1"

It is not in the lists posted here. Is it happening only on my setup?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


  1   2   3   4   5   6   >