Re: [Gluster-devel] .glusterfs directory?
On Mon, Dec 21, 2020 at 01:53:06PM +0530, Ravishankar N wrote:
> Are you talking about the entries inside .glusterfs/indices/xattrop/* ?
> Any stale entries here should automatically be purged by the self-heal
> daemon as it crawls the folder periodically.

I mean for instance:

# ls -l .glusterfs/aa/aa/dd69-7b3d-45e9-bd0f-8a8bbaa189a5
lrwxrwxrwx 1 root wheel 60 Nov 4 2018 .glusterfs//aa/aa/dd69-7b3d-45e9-bd0f-8a8bbaa189a5 -> ../../f0/91/f091de81-a4e2-4548-acf4-4b19c7bdac5e/tpm_nvwrite
# ls -l .glusterfs/f0/91/f091de81-a4e2-4548-acf4-4b19c7bdac
ls: .glusterfs/f0/91/f091de81-a4e2-4548-acf4-4b19c7bdac5e/tpm_nvwrite: No such file or directory

-- 
Emmanuel Dreyfus
m...@netbsd.org

---
Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
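A quick way to enumerate dangling entries like the tpm_nvwrite one above is to walk the brick's .glusterfs tree with find(1) and test each symlink's target. This is only a read-only diagnostic sketch; the function name and the brick path in the usage example are mine, not an existing gluster tool:

```shell
# Print gfid symlinks under BRICK/.glusterfs whose target no longer
# exists. "test -e" follows the symlink, so it fails exactly for
# dangling links like the one shown above.
list_dangling_gfid_links() {
    find "$1/.glusterfs" -type l ! -exec test -e {} \; -print
}
```

For example, list_dangling_gfid_links /export/wd0e (hypothetical brick root). It only prints candidates; whether removing them is safe is exactly the question of this thread.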
Re: [Gluster-devel] .glusterfs directory?
> On a healthy system, one should definitely not remove any files or sub
> directories inside .glusterfs as they contain important metadata. Which
> entries specifically inside .glusterfs do you think are stale and why?

There are indexes leading to no file, causing heal complaints.

-- 
Emmanuel Dreyfus
m...@netbsd.org
[Gluster-devel] .glusterfs directory?
Hello

I have a lot of stale entries in the bricks' .glusterfs directories. Is it
safe to just rm -rf it and hope for an automatic rebuild? Reading the
source and experimenting, it does not seem obvious. Or is there a way to
clean up stale entries that lead to files that do not exist anymore?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] readdir performance
      0.00      61.70 us      61.70 us      61.70 us             1    GETXATTR
      0.00      44.23 us      40.46 us      47.99 us             2      STATFS
      0.00      66.94 us      53.19 us      80.69 us             2     OPENDIR
      0.01     214.48 us     132.89 us     350.46 us            12      LOOKUP
     99.99 19333949.34 us 808361.68 us 37859537.00 us            2    READDIRP

    Duration: 138824 seconds
   Data Read: 146374656 bytes
Data Written: 0 bytes

Interval 0 Stats:
   Block Size:               4096b+               8192b+              16384b+
 No. of Reads:                    9                    5                    2
No. of Writes:                    0                    0                    0

   Block Size:              32768b+
 No. of Reads:                 4463
No. of Writes:                    0

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           127      FORGET
      0.00       0.00 us       0.00 us       0.00 us           127     RELEASE
      0.00       0.00 us       0.00 us       0.00 us         66931  RELEASEDIR
      0.00      61.70 us      61.70 us      61.70 us             1    GETXATTR
      0.00      44.23 us      40.46 us      47.99 us             2      STATFS
      0.00      66.94 us      53.19 us      80.69 us             2     OPENDIR
      0.01     214.48 us     132.89 us     350.46 us            12      LOOKUP
     99.99 19333949.34 us 808361.68 us 37859537.00 us            2    READDIRP

    Duration: 138824 seconds
   Data Read: 146374656 bytes
Data Written: 0 bytes

Brick: baril:/export/wd2e
-------------------------
Cumulative Stats:
   Block Size:               8192b+              16384b+              32768b+
 No. of Reads: 10 220
No. of Writes: 0 0 0

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           134      FORGET
      0.00       0.00 us       0.00 us       0.00 us           135     RELEASE
      0.00       0.00 us       0.00 us       0.00 us         67991  RELEASEDIR
      0.07      52.79 us      52.79 us      52.79 us             1    GETXATTR
      0.10      39.25 us      38.38 us      40.13 us             2      STATFS
      0.13      51.45 us      45.36 us      57.55 us             2     OPENDIR
     20.28    1406.18 us     133.88 us   13190.43 us            11      LOOKUP
     79.41   60571.65 us   60571.65 us   60571.65 us             1    READDIRP

    Duration: 138822 seconds
   Data Read: 786432 bytes
Data Written: 0 bytes

Interval 0 Stats:
   Block Size:               8192b+              16384b+              32768b+
 No. of Reads: 10 220
No. of Writes: 0 0 0

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us           134      FORGET
      0.00       0.00 us       0.00 us       0.00 us           135     RELEASE
      0.00       0.00 us       0.00 us       0.00 us         67991  RELEASEDIR
      0.07      52.79 us      52.79 us      52.79 us             1    GETXATTR
      0.10      39.25 us      38.38 us      40.13 us             2      STATFS
      0.13      51.45 us      45.36 us      57.55 us             2     OPENDIR
     20.28    1406.18 us     133.88 us   13190.43 us            11      LOOKUP
     79.41   60571.65 us   60571.65 us   60571.65 us             1    READDIRP

    Duration: 138822 seconds
   Data Read: 786432 bytes
Data Written: 0 bytes

-- 
Emmanuel Dreyfus
m...@netbsd.org
[Gluster-devel] Corrupted list in iot_worker
Hello

I experienced multiple cases of this crash:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  list_del_init (old=0x24103800)
    at ../../../../libglusterfs/src/glusterfs/list.h:82
warning: Source file is more recent than executable.
82              old->prev->next = old->next;
[Current thread is 1 (process 100)]
(gdb) bt
#0  list_del_init (old=0x24103800)
    at ../../../../libglusterfs/src/glusterfs/list.h:82
#1  __iot_dequeue (conf=conf@entry=0xb5c03338, pri=pri@entry=0xb1c53fb4)
    at io-threads.c:110
#2  0xb59fe379 in iot_worker (data=0xb5c03338) at io-threads.c:222
(gdb) print old
$1 = (struct list_head *) 0x24103800
(gdb) print *old
Cannot access memory at address 0x24103800

Offending code in frame 1:
108         /* Get the first request on that queue. */
109         stub = list_first_entry(&ctx->reqs, call_stub_t, list);
110         list_del_init(&stub->list);

(gdb) print stub
$1 = (call_stub_t *) 0x979fb800
(gdb) print *stub
Cannot access memory at address 0x979fb800
(gdb) print ctx->reqs
Cannot access memory at address 0xa8979ff4
(gdb) print *ctx
Cannot access memory at address 0xa8979ff4

We got ctx a bit earlier:
98          /* Get the first per-client queue for this priority. */
99          ctx = list_first_entry(&conf->clients[i], iot_client_ctx_t, clients);

And the list is obviously bad here:
(gdb) print *conf->clients[3]->next
$11 = {next = 0x144, prev = 0x0}

Known bug?

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Help with smoke test failure
Hello

I am still stuck on this one: how should I address the missing
SpecApproved and DocApproved flags here?

On Fri, Jul 10, 2020 at 05:43:54PM +0200, Emmanuel Dreyfus wrote:
> > What should I do to get this passed?
> >
> > https://build.gluster.org/job/comment-on-issue/19308/ : FAILURE <<<
> > Missing SpecApproved flag on Issue 1361
> > Missing DocApproved flag on Issue 1361

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Help with smoke test failure
Sorry, wrong list! I repost at the right place.

Emmanuel Dreyfus wrote:
> Hello
>
> What should I do to get this passed?
>
> https://build.gluster.org/job/comment-on-issue/19308/ : FAILURE <<<
> Missing SpecApproved flag on Issue 1361
> Missing DocApproved flag on Issue 1361

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] heal info output
On Mon, Jul 06, 2020 at 12:27:38PM +0200, Xavi Hernandez wrote:
> Is the '.attribute' directory only present on the root directory of a
> filesystem ? if so I strongly recommend to never use the root of a
> filesystem to place bricks. Always place the brick into a subdirectory.

Right, but once the user has made the mistake, we need a way out. I found
the new places in the posix xlator where that directory was not properly
ignored; I will submit a patch.

> > 2) /owncloud/data is a directory. mode, owner and groups are the same
> > on bricks. Why is it listed here?
>
> If files or subdirectories have been created or removed from that directory
> and the operation failed on some brick (or the brick was down), the
> directory is also marked as bad. You should also check the contents.

Indeed there was some messy stuff, but the only way I found to fix it was:

cp -rp dir dir.bak && mv dir dir.orig && mv dir.bak dir
rm -rf dir.orig

But now I have directories in split brain. The only difference I can find
is a file mtime. How should I fix that?

gluster volume heal gfs split-brain latest-mtime dir

does not work, and if I try it on the file inside, I am told it is not in
split brain.

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] self heal deamon not running
Emmanuel Dreyfus wrote:
> bidon# gluster volume heal gfs full
> Launching heal operation to perform full self heal on volume gfs has been
> unsuccessful: Self-heal daemon is not running. Check self-heal daemon log
> file.

I noticed that gluster volume heal gfs info shows:

Brick bidon:/export/wd2e
Status: Socket is not connected
Number of entries: -

Killing the glusterfsd process for this brick and issuing
gluster volume start gfs force
managed to get me out of this situation.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] self heal deamon not running
using set value 42
[2020-07-08 01:10:10.107534] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-client-4: option strict-locks using set value off
[2020-07-08 01:10:10.107553] I [MSGID: 0] [options.c:1240:xlator_option_reconf_int32] 0-gfs-client-5: option ping-timeout using set value 42
[2020-07-08 01:10:10.107569] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-client-5: option strict-locks using set value off
[2020-07-08 01:10:10.107582] I [MSGID: 0] [options.c:1239:xlator_option_reconf_uint32] 0-gfs-replicate-2: option background-self-heal-count using set value 0
[2020-07-08 01:10:10.107596] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option metadata-self-heal using set value on
[2020-07-08 01:10:10.107607] I [MSGID: 0] [options.c:1236:xlator_option_reconf_str] 0-gfs-replicate-2: option data-self-heal using set value on
[2020-07-08 01:10:10.107618] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option entry-self-heal using set value on
[2020-07-08 01:10:10.107647] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option self-heal-daemon using set value enable
[2020-07-08 01:10:10.107659] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-2: option iam-self-heal-daemon using set value yes
[2020-07-08 01:10:10.108327] I [MSGID: 0] [options.c:1240:xlator_option_reconf_int32] 0-gfs-client-6: option ping-timeout using set value 42
[2020-07-08 01:10:10.108372] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-client-6: option strict-locks using set value off
[2020-07-08 01:10:10.108387] I [MSGID: 0] [options.c:1240:xlator_option_reconf_int32] 0-gfs-client-7: option ping-timeout using set value 42
[2020-07-08 01:10:10.108402] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-client-7: option strict-locks using set value off
[2020-07-08 01:10:10.108414] I [MSGID: 0] [options.c:1239:xlator_option_reconf_uint32] 0-gfs-replicate-3: option background-self-heal-count using set value 0
[2020-07-08 01:10:10.108436] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option metadata-self-heal using set value on
[2020-07-08 01:10:10.108447] I [MSGID: 0] [options.c:1236:xlator_option_reconf_str] 0-gfs-replicate-3: option data-self-heal using set value on
[2020-07-08 01:10:10.108458] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option entry-self-heal using set value on
[2020-07-08 01:10:10.108485] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option self-heal-daemon using set value enable
[2020-07-08 01:10:10.108496] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-replicate-3: option iam-self-heal-daemon using set value yes
[2020-07-08 01:10:10.108523] I [MSGID: 0] [options.c:1236:xlator_option_reconf_str] 0-gfs: option log-level using set value INFO
[2020-07-08 01:10:10.112105] I [glusterfsd-mgmt.c:2170:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: baril:24007
[2020-07-08 01:10:10.112210] I [MSGID: 101221] [common-utils.c:3822:gf_set_volfile_server_common] 0-gluster: duplicate entry for volfile-server [{errno=17}, {error=File exists}]
[2020-07-08 01:10:10.112309] I [MSGID: 100040] [glusterfsd-mgmt.c:109:mgmt_process_volfile] 0-glusterfs: No change in volfile, countinuing []
[2020-07-08 01:10:09.927148] I [MSGID: 0] [options.c:1240:xlator_option_reconf_int32] 0-gfs-client-0: option ping-timeout using set value 42
[2020-07-08 01:10:09.927159] I [MSGID: 0] [options.c:1245:xlator_option_reconf_bool] 0-gfs-client-0: option strict-locks using set value off
[2020-07-08 01:10:10.108539] I [MSGID: 0] [options.c:1240:xlator_option_reconf_int32] 0-gfs: option threads using set value 16
[2020-07-08 01:11:19.582209] I [socket.c:849:__socket_shutdown] 0-gfs-client-4: intentional socket shutdown(19)
[2020-07-08 01:11:19.582327] I [socket.c:849:__socket_shutdown] 0-gfs-client-6: intentional socket shutdown(20)
[2020-07-08 01:13:28.372643] I [socket.c:849:__socket_shutdown] 0-gfs-client-4: intentional socket shutdown(19)
[2020-07-08 01:13:28.417206] I [socket.c:849:__socket_shutdown] 0-gfs-client-6: intentional socket shutdown(20)
[2020-07-08 01:15:49.007545] I [socket.c:849:__socket_shutdown] 0-gfs-client-4: intentional socket shutdown(19)
[2020-07-08 01:15:49.035180] I [socket.c:849:__socket_shutdown] 0-gfs-client-6: intentional socket shutdown(20)

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] glusterfsd memory usage
Hello

I see glusterfsd processes growing up to multiple gigabytes of virtual
memory. Is it something that should be expected? After some time, the
machine runs out of swap and kills the processes.

  PID  SIZE   RES   COMMAND
 8397  2113M  684M  glusterfsd
19427  1412M  168M  glusterfsd
16873  1914M  279M  glusterfsd
 6809   183M   27M  glusterfsd

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] heal info output
Hello

gluster volume heal info shows me questionable entries. I wonder if these
are bugs, or if I should handle them, and how.

bidon# gluster volume heal gfs info
Brick bidon:/export/wd0e_tmp
Status: Connected
Number of entries: 0

Brick baril:/export/wd0e
/.attribute/system
Status: Connected
Number of entries: 2
(...)
Brick bidon:/export/wd2e
/owncloud/data

There are three cases:

1) The /.attribute directory is special on NetBSD: it is where extended
attributes are stored for the filesystem. The posix xlator takes care of
screening it, but there must be some other software component that should
learn to disregard it. Hints are welcome about where I should look.

2) /owncloud/data is a directory. Mode, owner and group are the same on
the bricks. Why is it listed here?

3) What should I do with this?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] NetBSD build fixes for 8.0rc
Hello

After a long absence, I tried to upgrade glusterfs on NetBSD. I am a bit
sad to discover that, after investing a lot of effort to set up the NetBSD
tests, even the build is broken now.

Here are the build fixes. Can someone explain what went wrong in the
gerrit review? Two tests failed, but I am not sure I understand why.
https://review.gluster.org/#/c/glusterfs/+/24648/

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] directory filehandles
Hello

I have trouble figuring out the whole story about how to cope with FUSE
directory filehandles in the NetBSD implementation.

libfuse makes special use of the filehandles exposed to the filesystem for
OPENDIR, READDIR, FSYNCDIR, and RELEASEDIR. For those four operations, the
fh is a pointer to a struct fuse_dh, in which the fh field is exposed to
the filesystem. All other filesystem operations pass the fh as is from
kernel to filesystem back and forth.

That means a fh obtained by OPENDIR should never be passed to operations
other than READDIR, FSYNCDIR and RELEASEDIR. For instance, when porting
ltfs to NetBSD, I experienced that passing a fh obtained from OPENDIR to
SETATTR would crash.

The glusterfs implementation differs from libfuse because it seems the
filehandle is always passed as is: there is nothing like libfuse's struct
fuse_dh. It will therefore happily accept a fh obtained by OPENDIR for any
operation, something that I do not expect to work in libfuse-based
filesystems.

My real concern is SETLK on a directory. Here glusterfs really wants a fh,
or it will report an error. The NetBSD implementation passes the fh it got
from OPENDIR, but I expect a libfuse-based filesystem to crash in such a
situation. For now I did not find any libfuse-based filesystem that
implements locking, so I could not test that.

Could someone clarify this? What are the FUSE operations that should be
sent to the filesystem for this kind of program?

#include <err.h>
#include <fcntl.h>
#include <sys/file.h>

int fd;

/* NetBSD calls FUSE LOOKUP and OPENDIR */
if ((fd = open("/gfs/tmp", O_RDONLY, 0)) == -1)
        err(1, "open failed");

/* NetBSD calls FUSE SETLKW */
if (flock(fd, LOCK_EX) == -1)
        err(1, "flock failed");

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] I/O performance
On Thu, Jan 31, 2019 at 10:53:48PM -0800, Vijay Bellur wrote:
> Perhaps we could throttle both aspects - number of I/O requests per disk

While we are at it, it would be nice to detect and report a disk with
lower performance than its peers: that sometimes happens when a disk is
dying, and the last time I was hit by that performance problem, I had a
hard time finding the culprit.

-- 
Emmanuel Dreyfus
m...@netbsd.org
[Gluster-devel] FUSE directory filehandle
Hello

This is not strictly a GlusterFS question, since I came to it porting LTFS
to NetBSD; however, I would like to make sure I will not break GlusterFS
by fixing the NetBSD FUSE implementation for LTFS.

The current NetBSD FUSE implementation sends the filehandle in any FUSE
request for an open node, regardless of its type (directory or file). I
discovered that the libfuse low-level code manages the filehandle
differently for opendir/readdir/syncdir/releasedir than for other
operations. As a result, when a getattr is done on a directory, setting
the filehandle obtained from opendir can cause a crash in libfuse.

The fix for the NetBSD FUSE implementation is to avoid setting the
filehandle for the following FUSE operations on directories: getattr,
setattr, poll, getlk, setlk, setlkw, read, write (only the first two are
likely to be actually used, though).

Does anyone foresee a possible problem for GlusterFS with such a behavior?
In other words, will it be fine to always have a FUSE_UNKNOWN_FH (aka
null) filehandle for getattr/setattr on directories?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] Split brain after replacing a brick
Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
> Could you give the extended attributes of that directory on all the bricks
> to figure out the kind of split-brain?

In the meantime, I cleared all extended attributes on the .attribute
directories and it fixed it. The bug is just that something added extended
attributes to this directory at some time. IIRC the storage/posix
translator avoids it, but there must be some other code path.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] Split brain after replacing a brick
Hello

After doing a replace-brick and a full heal, I am left with:

Brick bidon:/export/wd0e
Status: Connected
Number of entries: 0

Brick baril:/export/wd0e
Status: Connected
Number of entries: 0

Brick bidon:/export/wd1e
Status: Connected
Number of entries: 0

Brick baril:/export/wd1e
Status: Connected
Number of entries: 0

Brick bidon:/export/wd2e
Status: Connected
Number of entries: 0

Brick baril:/export/wd2e
/.attribute
Status: Connected
Number of entries: 1

Brick bidon:/export/wd3e_tmp
Status: Connected
Number of entries: 0

Brick baril:/export/wd3e
/ - Is in split-brain
Status: Connected
Number of entries: 1

I guess the baril:/export/wd3e split-brain for / fits /.attribute on
baril:/export/wd2e? How can I check?

.attribute is the hidden directory where NetBSD stores extended
attributes. It should be ignored by healing. Is there a bug to fix here?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] Slow volume, gluster volume status bug
Emmanuel Dreyfus <m...@netbsd.org> wrote:
> What happens if I remove the trusted.gfid2path.* attributes? Are
> they just re-created?

After reading the source, I concluded I could safely remove the
trusted.gfid2path.* attributes. It fixed the (NetBSD specific)
performance problem.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] Slow volume, gluster volume status bug
On Tue, Nov 14, 2017 at 06:38:44PM +0530, Atin Mukherjee wrote:
> So this is the origin of why the peers don't understand they are connected.
> Friend handshaking got stuck in the middle and it never recovered back.
> Restarting the glusterd services ideally should fix the state, if not then
> you'd have to manually edit the /var/lib/glusterd/peers/UUID files with
> state=3 and then restart glusterd service.

That fixed it: I now see all my bricks again. Where could I have found
that in the documentation?

Now I just need to know whether the trusted.gfid2path attributes can be
safely removed once the feature is disabled.

-- 
Emmanuel Dreyfus
m...@netbsd.org
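For the record, the manual edit Atin describes amounts to something like the sketch below. That the peers directory holds one file per peer UUID, that state=3 is the wanted value, and that glusterd should be stopped while editing are all taken from the advice above; the function name is mine:

```shell
# Force every stored peer entry back to state=3, as suggested above.
# Run only with glusterd stopped; restart glusterd afterwards.
# sed -i.bak keeps a backup of each peer file next to the original.
fix_peer_state() {    # e.g. fix_peer_state /var/lib/glusterd/peers
    for f in "$1"/*; do
        sed -i.bak 's/^state=.*/state=3/' "$f"
    done
}
```

On a real node this would be fix_peer_state /var/lib/glusterd/peers, run on every affected server before restarting the glusterd services.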
Re: [Gluster-devel] Slow volume, gluster volume status bug
On Tue, Nov 14, 2017 at 10:43:39AM +, Emmanuel Dreyfus wrote:
> What happens if I remove the trusted.gfid2path.* attributes? Are
> they just re-created?
>
> Some hint on how to disable the feature?

gluster volume set gfs gfid2path off

Can I just delete the trusted.gfid2path.* attributes once I have done
that?

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Slow volume, gluster volume status bug
On Tue, Nov 14, 2017 at 09:17:35AM +, Emmanuel Dreyfus wrote:
> In the meantime I tracked the performance problem to extended attribute
> system calls. The root of the problem is outside of glusterfs, but fixing
> the consequences would be nice.

I think I found the problem: listxattr() scales badly on NetBSD with the
number of extended attribute names on the filesystem. I now have 14954
different extended attribute names, 14910 of them of the form
trusted.gfid2path.7c8a8ff2db92b4ec

I see news about trusted.gfid2path in
http://docs.gluster.org/en/latest/release-notes/3.12.0/
This explains why I got hurt when upgrading to 3.12.2.

What happens if I remove the trusted.gfid2path.* attributes? Are they
just re-created?

Some hint on how to disable the feature?

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Slow volume, gluster volume status bug
On Tue, Nov 14, 2017 at 12:17:05PM +0530, Atin Mukherjee wrote:
> > gluster volume status also exhibits trouble: each server will only
> > list its bricks, but not the other's one. I suspect it could just
> > be some timeout because of slow answer from the peer.
> Have you checked the output of gluster peer status? Also does glusterd log
> file give any hint on time outs, rpc failures, disconnections et all?

gluster peer status says "State: Sent and Received peer request
(Connected)" on both sides.

I have this in glusterd.log:

[2017-11-14 08:49:47.289423] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd3e on port 49155
[2017-11-14 08:49:52.289926] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd0e on port 49152
[2017-11-14 08:49:52.295394] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd1e on port 49153
[2017-11-14 08:49:52.302973] I [MSGID: 106143] [glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick /export/wd2e on port 49154
[2017-11-14 08:54:31.535066] W [socket.c:593:__socket_rwv] 0-management: readv on 192.0.2.109:24007 failed (Connection reset by peer)
[2017-11-14 08:54:32.567745] I [MSGID: 106004] [glusterd-handler.c:6284:__glusterd_peer_rpc_notify] 0-management: Peer (<2d7719d9-0466-434c-a881-4081156fac47>), in state , has disconnected from glusterd.

An odd thing: the registration messages suggest the local bricks should
show as online in gluster volume status output. They are displayed as
offline until I kill the glusterfsd processes and issue a
gluster volume start gfs force.

Along with symmetrical stuff, the peer has this:

[2017-11-14 08:56:05.799686] E [socket.c:2369:socket_connect_finish] 0-management: connection to 192.0.2.110:24007 failed (Connection timed out); disconnecting socket

In the meantime I tracked the performance problem to extended attribute
system calls. The root of the problem is outside of glusterfs, but fixing
the consequences would be nice.

-- 
Emmanuel Dreyfus
m...@netbsd.org
[Gluster-devel] Slow volume, gluster volume status bug
Hello

I am looking for hints about how to debug this: I have a 4x2
Distributed-Replicate volume which exhibits extremely slow operations.
Example:

# time stat /gfs/dl
51969 10143657874486987692 drwxr-xr-x 4 _httpd wheel 172912968 4096 "Nov 13 17:22:12 2017" "Sep 22 11:53:35 2017" "Sep 22 11:53:35 2017" "Jan  1 01:00:00 1970" 131072 8 0 /gfs/dl
        8.72s real  0.00s user  0.01s system

But the thing is not 100% reproducible. Sometimes I get an instant
(normal) response.

gluster volume status also exhibits trouble: each server will only list
its own bricks, but not the other's. I suspect it could just be some
timeout because of a slow answer from the peer. tcpdump tells me that the
server can take seconds to answer.

Brick logs show nothing special. Any idea?

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] glusters 3.12.2: bricks do not start on NetBSD
Emmanuel Dreyfus <m...@netbsd.org> wrote:
> [2017-11-02 12:32:57.429885] E [MSGID: 115092]
> [server-handshake.c:586:server_setvolume] 0-gfs-server: No xlator
> /export/wd0e is found in child status list
> [2017-11-02 12:32:57.430162] I [MSGID: 115091]
> [server-handshake.c:761:server_setvolume] 0-gfs-server: Failed to get
> client opversion

Problem solved through gluster volume sync on each server right after
upgrading. I still do not know what went wrong, but I have a workaround.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] glusters 3.12.2: bricks do not start on NetBSD
Hello

I have been missing updates for a while. Now I try to upgrade from 3.8.9
to 3.12.2 and I hit a regression: brick processes start, but gluster
volume status shows them as not started. The relevant lines in the brick
process log are:

[2017-11-02 12:32:56.867606] E [MSGID: 115092] [server-handshake.c:586:server_setvolume] 0-gfs-server: No xlator /export/wd0e is found in child status list
[2017-11-02 12:32:56.867803] I [addr.c:55:compare_addr_and_update] 0-/export/wd0e: allowed = "*", received addr = "192.0.2.109"
[2017-11-02 12:32:56.867803] I [addr.c:55:compare_addr_and_update] 0-/export/wd0e: allowed = "*", received addr = "192.0.2.109"
[2017-11-02 12:32:56.867863] I [MSGID: 115029] [server-handshake.c:793:server_setvolume] 0-gfs-server: accepted client from bidon.example.net-25092-2017/11/02-12:32:48:770637-gfs-client-0-0-0 (version: 3.12.2)
[2017-11-02 12:32:57.429885] E [MSGID: 115092] [server-handshake.c:586:server_setvolume] 0-gfs-server: No xlator /export/wd0e is found in child status list
[2017-11-02 12:32:57.430162] I [MSGID: 115091] [server-handshake.c:761:server_setvolume] 0-gfs-server: Failed to get client opversion

Any idea of what goes wrong?

-- 
Emmanuel Dreyfus
m...@netbsd.org
[Gluster-devel] file upload to a gluster mount
Hello

I discovered that uploading a file through PHP to a gluster mount is
quite slow, because of the small chunk size (5 kB). Besides patching PHP
to increase the chunk size, I can imagine writing an Apache module that
would use the gluster API to efficiently handle a file upload. Perhaps
someone did it already?

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Attackers hitting vulnerable HDFS installations
On Fri, Feb 10, 2017 at 08:30:40AM -0500, Ira Cooper wrote:
> But I suspect... You got it right, Gluster isn't big enough to attack today.

It is just a matter of time.

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Attackers hitting vulnerable HDFS installations
On Thu, Feb 09, 2017 at 03:53:52PM -0500, Jeff Darcy wrote: > https://www.theregister.co.uk/2017/02/09/hadoop_clusters_fked/ > Similar attacks have occurred against MongoDB and ElasticSearch. > How long before they target us? How will we do? It is true that the default glusterfs installation is too open. A simple solution would be to introduce access control, either by IP whitelist, or better by shared secret. The obvious problem is that it breaks updates. At least peers know each other and could agree on automatically creating a shared secret if it is missing, but we would need to break clients. The annoyance can be mitigated with a helpful message on mount failure, in the log and on stdout, such as "please copy /etc/glusterd/secret from a server" -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Invitation: Re: Question on merging zfs snapshot supp... @ Tue Dec 20, 2016 2:30pm - 3:30pm (IST) (sri...@marirs.net.in)
On Wed, Dec 21, 2016 at 10:00:17AM +0530, sri...@marirs.net.in wrote: > In continuation to the discussion we'd yesterday, I'd be working on the > change we'd initiated sometime back for pluggable FS specific snapshot > implementation Let me know how I can contribute the FFS implementation for NetBSD. In case it helps for designing the API, here is the relevant man page: http://netbsd.gw.com/cgi-bin/man-cgi?fss+.NONE+NetBSD-7.0.2 Basically, you iterate over /dev/fss[0-9], open it and call ioctl FSSIOCGET to check if it is already in use. Once you have an unused one, ioctl FSSIOCSET takes the snapshot. It requires a backing store file, which may be created by mktemp() and unlinked immediately. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] quota-rename.t core in netbsd
Sanoj Unnikrishnan <sunni...@redhat.com> wrote: > Ran the same steps as in the quota-rename.t (manually though. multiple > times!), Could not reproduce the issue. But running the test framework hits the bug reliably? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] quota-rename.t core in netbsd
Vijay Bellur <vbel...@redhat.com> wrote: > Emmanuel might be able to help with problems related to NetBSD > environment. Sure, feel free to ask if you hit NetBSD-specific troubles. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Gluster and FreeBSD
On Tue, Sep 20, 2016 at 09:16:54AM +0530, Nigel Babu wrote: > Giving this thread a signal boost. We should think about this if we're going > to > continue to support *BSD. An attempt to clarify some apparent confusion: Despite their very similar names, the *BSDs are not different distributions of the same software like Linux distributions are. NetBSD and FreeBSD are distinct operating systems, with their own kernels and userlands, that diverged from a common ancestor 23 years ago. This is why you should not take FreeBSD behaviors for granted on NetBSD, and vice versa. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Gluster and FreeBSD
Nigel Babu <nig...@redhat.com> wrote: > Emmanuel, I know you work on NetBSD, but do you have thoughts to add here? I can help fixing NetBSD bugs, but does this FreeBSD problem also apply to NetBSD? A quick test shows that although NetBSD does not make use of the sticky bit on files, it still retains it: # touch test # ls -l test -rw-r--r-- 1 root wheel 0 Sep 20 06:04 test # chmod u+t test # ls -l test -rw-r--r-T 1 root wheel 0 Sep 20 06:04 test Note that T means t without x: # chmod uog+rx test # ls -l test -rwxr-xr-t 1 root wheel 0 Sep 20 06:04 test -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Libunwind
On Thu, Sep 08, 2016 at 09:07:33AM -0400, Jeff Darcy wrote: > (1) Has somebody already gone down this path? Does it work? I recall attempting to port the Julia programming language to NetBSD, and libunwind gave me a hard time because on NetBSD it does not implement as large an API as on Linux. My advice is to review the supported platforms' header files before using some feature, otherwise the result will not be easily portable -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Anyone wants to maintain Mac-OSX port of gluster?
On Tue, Sep 06, 2016 at 07:30:08AM -0400, Kaleb S. KEITHLEY wrote: > Mac OS X doesn't build at the present time because its sed utility (used in > the xdrgen/rpcgen part of the build) doesn't support the (linux compatible) > '-r' command line option. (NetBSD and FreeBSD do.) > > (There's an easy fix) Easy fix: replace sed -r with $SED_R, and set SED_R="sed -r" on Linux vs SED_R="sed -E" on the BSDs, including OS X. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD regression is now netbsd6-regression
On Thu, Aug 18, 2016 at 12:07:08PM +0530, Nigel Babu wrote: > As in the case of CentOS yesterday, the NetBSD job is now netbsd6-regression. But we run regressions on the netbsd-7 branch. Smoke tests are run on netbsd-6. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Changing the names of regression jobs
On Thu, Aug 11, 2016 at 11:07:22AM +0530, Nigel Babu wrote: > I'd like to propose renaming them to: > * centos-regression > * netbsd-regression I suggest you keep an OS version number: netbsd7-regression. That way we can introduce an OS update as experimental without breaking what is known to work. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD Regression Failures for 2 weeks
Jeff Darcy <jda...@redhat.com> wrote: > I think there's an experiment we should do, which I've discussed with a > couple of others: redefine EXPECT_WITHIN on NetBSD to double or triple > the time given, and see if it makes a difference. I am pretty sure it would help. I already raised a few limits to fix tests for NetBSD. We could express limits in a per-OS unit, which would be 1s on Linux; let's start with 2s for NetBSD and see if we get a difference in the overall failure ratio. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD Regression Failures for 2 weeks
On Tue, Aug 09, 2016 at 03:44:43PM +0530, Nigel Babu wrote: > Here are the netbsd regressions for the last 2 weeks. Please let me know if > there are infra issues particularly in nbslave7h. As far as I can see, it > just gets assigned more jobs than other machines, and hence more failures. Probably right. Since we see no systemic failures, that suggests there are real rare bugs there. But if they are that difficult to reproduce, that does not push people to track them :-/ > *96* of *247* regressions failed That is huge. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] readdir() harmful in threaded code
Vijay Bellur <vbel...@redhat.com> wrote: > Do you have any concrete examples of problems encountered due to the > same directory stream being invoked from multiple threads? I am not sure this scenario can happen, but what we had were directory offsets reused among different DIR * opened on the same directory. This works on Linux but is a standard violation, as directory offsets are supposed to be valid only for a given DIR *. It broke NetBSD regression enough that I added a test against it in xlators/storage/posix/src/posix.c:

        seekdir (dir, off);
#ifndef GF_LINUX_HOST_OS
        if ((u_long)telldir(dir) != off && off != pfd->dir_eof) {
                gf_msg (THIS->name, GF_LOG_ERROR, EINVAL,
                        P_MSG_DIR_OPERATION_FAILED,
                        "seekdir(0x%llx) failed on dir=%p: "
                        "Invalid argument (offset reused from "
                        "another DIR * structure?)", off, dir);
                errno = EINVAL;
                count = -1;
                goto out;
        }
#endif /* GF_LINUX_HOST_OS */

About standards and portability, here is the relevant part in the Linux man page: > In the current POSIX.1 specification (POSIX.1-2008), readdir(3) is not > required to be thread-safe. However, in modern implementations > (including the glibc implementation), concurrent calls to readdir(3) > that specify different directory streams are thread-safe. > > It is expected that a future version of POSIX.1 will make readdir_r() > obsolete, and require that readdir() be thread-safe when concurrently > employed on different directory streams. This means Linux recommends using readdir(), but such practice is likely to break on other systems, since the standards do not currently require it to be thread-safe. We can go the readdir() way, but please add locks. Alternatively we can use #ifdef to use alternate code on Linux (readdir) and others (readdir_r) -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] readdir() harmful in threaded code
Pranith Kumar Karampuri <pkara...@redhat.com> wrote: > So should we do readdir() with external locks for everything instead? readdir() with a per-directory lock is safe. However, it may come with a performance hit in some scenarios, since two threads cannot read the same directory at once. But I am not sure it can happen in GlusterFS. I am a bit disturbed by readdir_r() being planned for deprecation. The Open Group does not say that, or I missed it: http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] netbsd smoke tests fail when code patches are backported to release-3.6
On Fri, May 20, 2016 at 05:43:07PM +0300, Angelos SAKELLAROPOULOS wrote: > May I ask why following review requests are not submitted to release-3.6 ? > It seems that they fail in netbsd, freebsd smoke tests which are not > related to code changes. There are build errors. I am not sure how you could have inherited them from a git checkout, since previous changes were supposed to pass smoke too. If you are sure the errors are not yours, you can try to rebase. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] self heal start failure on 3.8rc1
Ravishankar N <ravishan...@redhat.com> wrote: > Yes, since 3.8 was based off master, it has the same issue. > http://review.gluster.org/#/c/14414/ has been merged to fix it. If you > want to temporarily workaround it, just do some dummy 'gluster volume > set` operation to regenerate the client vol files. Well, since the goal is to test the migration path, I will wait for 3.8rc2 -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] self heal start failure on 3.8rc1
Hello After updating from 3.7.11 to 3.8rc1, the self heal daemon will not start anymore. Here is the log. The "op-version >= 30707" error reminds me of something we already saw in the past. Any hint? [2016-05-20 03:34:40.709337] I [MSGID: 100030] [glusterfsd.c:2350:main] 0-/usr/pkg/sbin/glusterfs: Started running /usr/pkg/sbin/glusterfs version 3.8rc1 (args: /usr/pkg/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/c7e5574af4b5b4ffdcb61b1d5e63d8da.socket --xlator-option *replicate*.node-uuid=85eb78cd-8ffa-49ca-b3e7-d5030bc3124d) [2016-05-20 03:34:40.734688] E [socket.c:2391:socket_connect_finish] 0-glusterfs: connection to ::1:24007 failed (Connection refused) [2016-05-20 03:34:40.734927] E [glusterfsd-mgmt.c:1902:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Invalid argument) [2016-05-20 03:34:43.781626] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-3: adding option 'node-uuid' for volume 'gfs-replicate-3' with value '85eb78cd-8ffa-49ca-b3e7-d5030bc3124d' [2016-05-20 03:34:43.781818] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-2: adding option 'node-uuid' for volume 'gfs-replicate-2' with value '85eb78cd-8ffa-49ca-b3e7-d5030bc3124d' [2016-05-20 03:34:43.781859] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-1: adding option 'node-uuid' for volume 'gfs-replicate-1' with value '85eb78cd-8ffa-49ca-b3e7-d5030bc3124d' [2016-05-20 03:34:43.781883] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-gfs-replicate-0: adding option 'node-uuid' for volume 'gfs-replicate-0' with value '85eb78cd-8ffa-49ca-b3e7-d5030bc3124d' [2016-05-20 03:34:43.782267] E [MSGID: 108040] [afr.c:448:init] 0-gfs-replicate-3: Unable to fetch afr pending changelogs. Is op-version >= 30707? 
[Invalid argument] [2016-05-20 03:34:43.782390] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-gfs-replicate-3: Initialization of volume 'gfs-replicate-3' failed, review your volfile again [2016-05-20 03:34:43.782415] E [MSGID: 101066] [graph.c:324:glusterfs_graph_init] 0-gfs-replicate-3: initializing translator failed [2016-05-20 03:34:43.782436] E [MSGID: 101176] [graph.c:670:glusterfs_graph_activate] 0-graph: init failed [2016-05-20 03:34:43.783084] W [glusterfsd.c:1265:cleanup_and_exit] (-->0xbbbd9dab <rpc_clnt_handle_reply+452> at /usr/pkg/lib/libgfrpc.so.0 -->0x8056207 <mgmt_getspec_cbk+850> at /usr/pkg/sbin/glusterfs -->0x8051b03 <glusterfs_process_volfp+467> at /usr/pkg/sbin/glusterfs ) 0-: received signum (22), shutting down -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] tests/performance/open-behind.t fails on NetBSD
Joseph Fernandes <josfe...@redhat.com> wrote: > ./tests/performance/open-behind.t is failing continuously on 3.7.11 This is the fate of non-enforced tests. It may be a good idea to investigate it: perhaps NetBSD gets a reliable failure for a rare bug that is not NetBSD-specific. We already saw such situations. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Requesting for a NetBSD setup
On Mon, May 02, 2016 at 01:55:43PM +0530, Manikandan Selvaganesh wrote: > Could you please provide us a NetBSD machine as the test cases are failing > and we need to have a look on it? nbslave72.cloud.gluster.org was put offline for some jenkins breakage that does not seem to be slave-related: I gave it a quick try, and it is able to build and run tests. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Requesting for NetBSD setup
On Fri, Apr 29, 2016 at 01:28:53AM -0400, Karthik Subrahmanya wrote: > I would like to ask for a NetBSD setup nbslave7[4gh] are disabled in Jenkins right now. They are labeled "Disconnected by kaushal", but I don't know why. Once it is confirmed that they are not already used for testing, you could pick one. I still do not know who the password guardian at Redhat is, though. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] regression machines reporting slowly ? here is the reason ...
On Sun, Apr 24, 2016 at 03:59:40PM +0200, Niels de Vos wrote: > Well, slaves go into offline, and should be woken up when needed. > However it seems that Jenkins fails to connect to many slaves :-/ Nothing new here. I tracked this kind of trouble with NetBSD slaves and only got frustration as a result. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD FUSE and filehandles
On Tue, Apr 19, 2016 at 04:25:07PM +0200, Csaba Henk wrote: > I also have a vague memory that in Linux VFS the file operations > are dispatched to file objects in quite a pure oop manner (which > suggests itself to practices like "storing the file handle identifier > along with the file object"), while in traditional BSD VFS the file > ops just get the vnode (from which modernization efforts departed > to various degree across the recent BSD variants). Yes, NetBSD VFS has no clue about the upper representation of the file, it just has a reference on the vnode. That one will be difficult to implement. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Easy build fix to review
Hi Here is an easy build fix to review: remove undefined variable in Makefile: http://review.gluster.org/13867 http://review.gluster.org/13868 Any taker? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] More news on 3.7.11
On Fri, Apr 15, 2016 at 01:32:23PM +0530, Kaushal M wrote: > Or, > 2. Revert the IPv6 patch that exposed this problem IMO the good practice when a change breaks a stable release is to back it out, and work on a better fix on master for a later pull-up to stable. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] NetBSD FUSE and filehandles
Hi Anoop C S asked me about a NetBSD FUSE bug that prevented mandatory locks from working properly. In order to work on a fix, I need confirmation about how it works on Linux, which is the reference implementation of FUSE. Here is what I understand, please tell me if there is something wrong: Each time a process opens a file within the FUSE filesystem, the kernel will call the FUSE open method, and the filesystem shall return a filehandle. For subsequent operations on the open file descriptor, the kernel will include the adequate filehandle in the FUSE requests. The filehandle is tied to the couple (calling process, file descriptor within calling process). Each time the calling process calls open again on the same file, a new filehandle is returned. This means this creates two distinct filehandles: fd1 = open("/mnt/foo", O_RDWR); fd2 = open("/mnt/foo", O_RDWR); Is all of that correct? If it is, here is the problem I face now: the NetBSD kernel implements PUFFS, an interface similar to but incompatible with FUSE that was developed before FUSE became the de-facto standard. I have been maintaining the PUFFS to FUSE compatibility layer we use to run Glusterfs on NetBSD. PUFFS sends userland requests about vnode operations, the userland filesystem gets references to the vnode, it can also get the calling process PID, but currently the file descriptor within the calling process is not provided to the userland filesystem. If my understanding of FUSE filehandle semantics is correct, that means I will have to modify the PUFFS interface so that operations on open files get a reference to the file descriptor within the calling process, since this is a requirement to retrieve the appropriate filehandle for FUSE. Can anyone confirm? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] !! operator
Jeff Darcy <jda...@redhat.com> wrote: > Sorry if my comment came off as dismissive. It was not dismissive. My reply was not ironic, I was gladly surprised to learn something about C syntax. > From an objective code-readability standpoint it's probably a bad > idiom. IMO the root of the problem is the lack of a built-in boolean type in C, but that is not a big deal. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] !! operator
Jeff Darcy <jda...@redhat.com> wrote: > It's a common idiom in the Linux kernel/coreutils community. I > thought it was in BSD too. Thanks for the explanation. I was able to practice C for 18 years in numerous projects without having the opportunity to see it. I will go to bed less ignorant tonight. :-) -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] !! operator
Hello I found a !! in glusterfs sources. Is it a C syntax I do not know, a bug, or just a weird syntax? xlators/cluster/afr/src/afr-inode-write.c: local->stable_write = !!((fd->flags|flags)&(O_SYNC|O_DSYNC)); -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] netbsd regression failure in open-behind.t
On Fri, Mar 18, 2016 at 09:08:04AM -0400, Prasanna Kumar Kalever wrote: gluster volume top $V0 open | grep -w "$F0" >/dev/null 2>&1 TEST [ $? -eq 0 ]; What do we expect here and what do we get? I note that the test fails either if gluster volume top fails, or if its output does not contain $F0 (why not use fgrep "$F0"?) What happens? Removing >/dev/null 2>&1 above may be insightful. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD Regression failure on 3.7: ./tests/features/trash.t
On Mon, Mar 14, 2016 at 12:08:31PM +0530, Anoop C S wrote: > Test #59 is a volume heal command: > TEST $CLI volume heal $V1 > I am not sure why this command itself failed. Let me take a look > through the archived logs. It would not be the first time an unrelated bug pops up where we do not expect it. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] How to cope with spurious regression failures
Raghavendra Talur <rta...@redhat.com> wrote: > Yes, because I updated from patch set 2 to 3 and tests for 2 were running > on the same slave. It seems my test for concurrent runs misfires when the previous run was aborted. I need to improve that. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] How to cope with spurious regression failures
Raghavendra Talur <rta...@redhat.com> wrote: > The tests passed on the first run itself, except for the NetBSD with > "another test running on slave" error. Was the previous test on the slave canceled? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] FreeBSD smoke failure
Jeff Darcy <jda...@redhat.com> wrote: > OK, so are you proposing that we add the same thing on the FreeBSD > slaves? This is how I fixed that exact same problem for NetBSD. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] FreeBSD smoke failure
Jeff Darcy <jda...@redhat.com> wrote: > What solution do you suggest? On the NetBSD Jenkins slave VMs, /opt/qa/build.sh contains this:

PYDIR=`$PYTHONBIN -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'`
su -m root -c "/usr/bin/install -d -o jenkins -m 755 $PYDIR/gluster"

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] FreeBSD smoke failure
Jeff Darcy <jda...@redhat.com> wrote: > I've seen the same thing, but not all the time even for the same code. > The fact that it's not consistent suggests that it's a configuration > issue on some of the workers. The problem is that the glusterfs install target copies the glupy python module outside of the glusterfs install directory. The permissions must be properly set up for the unprivileged build process to succeed in the copy. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] readdir() harmful in threaded code
Just to make sure there is no misunderstanding here: unfortunately I do not have time right now to submit a fix. It would be nice if someone else could look at it. On Wed, Feb 10, 2016 at 01:48:52PM +, Emmanuel Dreyfus wrote: > Hi > > After obtaining a core in a regression, I noticed there are a few readdir() > uses in threaded code. This is begging for a crash, as readdir() maintains > an internal state that will be trashed on concurrent use. readdir_r() > should be used instead. > > A quick search shows readdir() usage here: > contrib/fuse-util/mount_util.c:30 > extras/test/ld-preload-test/ld-preload-test.c:310 > extras/test/test-ffop.c:550 > libglusterfs/src/compat.c:256 > libglusterfs/src/compat.c:315 > libglusterfs/src/syscall.c:97 > tests/basic/fops-sanity.c:662 > tests/utils/arequal-checksum.c:331 > > Occurrences in contrib, extras and tests are probably harmless as these are > usages in standalone programs that are not threaded. We are left with > three groups of problems: > > 1) libglusterfs/src/compat.c:256 and libglusterfs/src/compat.c:315 > This is Solaris compatibility code. Is it used at all? 
> > 2) libglusterfs/src/syscall.c:97 This is the sys_readdir() wrapper, > which is in turn used in: > libglusterfs/src/run.c:284 > xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c:582 > xlators/features/changelog/lib/src/gf-history-changelog.c:854 > xlators/features/index/src/index.c:471 > xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c > xlators/storage/posix/src/posix.c:3700 > xlators/storage/posix/src/posix.c:5896 > > 3) We also find sys_readdir() in libglusterfs/src/common-utils.h for > GF_FOR_EACH_ENTRY_IN_DIR() which in turn appears in: > libglusterfs/src/common-utils.c:3979 > libglusterfs/src/common-utils.c:4002 > xlators/mgmt/glusterd/src/glusterd-hooks.c:365 > xlators/mgmt/glusterd/src/glusterd-hooks.c:379 > xlators/mgmt/glusterd/src/glusterd-store.c:651 > xlators/mgmt/glusterd/src/glusterd-store.c:661 > xlators/mgmt/glusterd/src/glusterd-store.c:1781 > xlators/mgmt/glusterd/src/glusterd-store.c:1806 > xlators/mgmt/glusterd/src/glusterd-store.c:3044 > xlators/mgmt/glusterd/src/glusterd-store.c:3072 > xlators/mgmt/glusterd/src/glusterd-store.c:3593 > xlators/mgmt/glusterd/src/glusterd-store.c:3606 > xlators/mgmt/glusterd/src/glusterd-store.c:4032 > xlators/mgmt/glusterd/src/glusterd-store.c:4111 > > There is a hive of spurious bugs to squash here. > > -- > Emmanuel Dreyfus > m...@netbsd.org > _______ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] readdir() harmful in threaded code
Hi After obtaining a core in a regression, I noticed there are a few readdir() uses in threaded code. This is begging for a crash, as readdir() maintains an internal state that will be trashed on concurrent use. readdir_r() should be used instead. A quick search shows readdir() usage here: contrib/fuse-util/mount_util.c:30 extras/test/ld-preload-test/ld-preload-test.c:310 extras/test/test-ffop.c:550 libglusterfs/src/compat.c:256 libglusterfs/src/compat.c:315 libglusterfs/src/syscall.c:97 tests/basic/fops-sanity.c:662 tests/utils/arequal-checksum.c:331 Occurrences in contrib, extras and tests are probably harmless as these are usages in standalone programs that are not threaded. We are left with three groups of problems: 1) libglusterfs/src/compat.c:256 and libglusterfs/src/compat.c:315 This is Solaris compatibility code. Is it used at all? 2) libglusterfs/src/syscall.c:97 This is the sys_readdir() wrapper, which is in turn used in: libglusterfs/src/run.c:284 xlators/features/bit-rot/src/stub/bit-rot-stub-helpers.c:582 xlators/features/changelog/lib/src/gf-history-changelog.c:854 xlators/features/index/src/index.c:471 xlators/mgmt/glusterd/src/glusterd-snapshot-utils.c xlators/storage/posix/src/posix.c:3700 xlators/storage/posix/src/posix.c:5896 3) We also find sys_readdir() in libglusterfs/src/common-utils.h for GF_FOR_EACH_ENTRY_IN_DIR() which in turn appears in: libglusterfs/src/common-utils.c:3979 libglusterfs/src/common-utils.c:4002 xlators/mgmt/glusterd/src/glusterd-hooks.c:365 xlators/mgmt/glusterd/src/glusterd-hooks.c:379 xlators/mgmt/glusterd/src/glusterd-store.c:651 xlators/mgmt/glusterd/src/glusterd-store.c:661 xlators/mgmt/glusterd/src/glusterd-store.c:1781 xlators/mgmt/glusterd/src/glusterd-store.c:1806 xlators/mgmt/glusterd/src/glusterd-store.c:3044 xlators/mgmt/glusterd/src/glusterd-store.c:3072 xlators/mgmt/glusterd/src/glusterd-store.c:3593 xlators/mgmt/glusterd/src/glusterd-store.c:3606 xlators/mgmt/glusterd/src/glusterd-store.c:4032 
xlators/mgmt/glusterd/src/glusterd-store.c:4111 There is a hive of spurious bugs to squash here. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] glusterfsd core on NetBSD (https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14139/consoleFull)
On Wed, Feb 10, 2016 at 02:26:35PM +0530, Soumya Koduri wrote: > Is this issue related to bug1221629 as well? I do not know, but please someone replace readdir by readdir_r! :-) -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] glusterfsd core on NetBSD (https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14139/consoleFull)
On Wed, Feb 10, 2016 at 12:17:23PM +0530, Soumya Koduri wrote: > I see a core generated in this regression run though all the tests seem to > have passed. I do not have a netbsd machine to analyze the core. > Could you please take a look and let me know what the issue could have been? Changelog bug. I am not sure how this could become NULL after it has been checked at the beginning of gf_history_changelog(). I note this uses readdir(), which is not thread-safe; readdir_r() should probably be used instead.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xb99912b4 in gf_history_changelog (changelog_dir=0xb7b160f0 "\003",
    start=3081873456, end=0, n_parallel=-1217773520, actual_end=0xb7b05310)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-history-changelog.c:834
834             gf_log (this->name, GF_LOG_ERROR,
(gdb) print this
$1 = (xlator_t *) 0x0
#0  0xb99912b4 in gf_history_changelog (changelog_dir=0xb7b160f0 "\003",
    start=3081873456, end=0, n_parallel=-1217773520, actual_end=0xb7b05310)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-history-changelog.c:834
#1  0xbb6fec17 in rpcsvc_record_build_header (recordstart=0x0,
    rlen=3077193776, reply=..., payload=3081855216)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:857
#2  0xbb6fec95 in rpcsvc_record_build_header (recordstart=0xb7b10030 "",
    rlen=3077193776, reply=..., payload=3081855216)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:874
#3  0xbb6ffa81 in rpcsvc_submit_generic (req=0xb7b10030, proghdr=0xb7b160f0,
    hdrcount=0, payload=0xb76a4030, payloadcount=1, iobref=0x0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/rpcsvc.c:1316
#4  0xbb70506c in xdr_to_rpc_reply (msgbuf=0xb7b10030 "", len=0,
    reply=0xb76a4030, payload=0xb76a4030, verfbytes=0x1)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-lib/src/xdr-rpcclnt.c:40
#5  0xbb26cbb5 in socket_server_event_handler (fd=16, idx=3, data=0xb7b10030,
    poll_in=1, poll_out=0, poll_err=0)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/rpc/rpc-transport/socket/src/socket.c:2765
#6  0xbb7908da in syncop_rename (subvol=0xbb143030, oldloc=0xba45b4b0,
    newloc=0x3, xdata_in=0x75, xdata_out=0xbb7e8000)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/syncop.c:2225
#7  0xbb790c21 in syncop_ftruncate (subvol=0xbb143030, fd=0x8062cc0,
    offset=-4647738537632864458, xdata_in=0xbb7efe75 <_rtld_bind_start+17>,
    xdata_out=0xbb7e8000)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/syncop.c:2265
#8  0xbb75f6d1 in inode_table_dump (itable=0xbb143030, prefix=0x2)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/libglusterfs/src/inode.c:2352
#9  0x08050e20 in main (argc=12, argv=0xbf7feaac)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/glusterfsd/src/glusterfsd.c:2345
-- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Cores on NetBSD of brick https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14100/consoleFull
On Tue, Feb 09, 2016 at 11:56:37AM +0530, Pranith Kumar Karampuri wrote: > I think the regression run is not giving that link anymore when the crash > happens? Could you please add that also as a link in regression run? There was the path of the archive; I changed it to an http:// link. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] changelog bug
On Mon, Feb 08, 2016 at 12:53:33AM -0500, Manikandan Selvaganesh wrote: > Thanks and as you have mentioned, I have no clue how my changes > produced a core due to a NULL pointer in changelog. It is probably an unrelated bug that was nice enough to pop up here. Too often people disregard NetBSD failures and just retrigger without looking at the cause, but the NetBSD regression has already proven its ability to expose bugs that are unwilling to come to light in Linux regression runs, yet still exist on Linux. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 03:26:54PM +0530, Milind Changire wrote: > https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull > > > [08:44:20] ./tests/basic/afr/self-heald.t .. > not ok 37 Got "0" instead of "1" > not ok 52 Got "0" instead of "1" > not ok 67 > Failed 4/83 subtests There is a core, but it is from the NetBSD FUSE subsystem. The trace is not helpful, but it suggests an abort() call because of an unexpected situation:

Core was generated by `perfused'.
Program terminated with signal SIGABRT, Aborted.
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12

/var/log/messages has a hint:
Feb 8 08:43:15 nbslave7c perfused: file write grow without resize

Indeed, I have this assertion in NetBSD FUSE to catch a race condition. I think it is the first time I see it raised, but I am unable to conclude on the cause. Let us retrigger (I did it) and see if someone ever hits it again. The bug is more likely in NetBSD FUSE than in glusterfs. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote: > The patch to add it to bad tests has already been merged, so I guess this > .t's failure won't pop up again. IMO that was a bit too quick. What is the procedure to get out of the list? -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 10:26:22AM +, Emmanuel Dreyfus wrote: > Indeed, same problem. But unfortunately it is not very reproductible since > we need to make a full week of runs to see it again. I am tempted to > just remove the assertion. NB: this does not fail on a stock NetBSD release: the assertion is only there because FUSE is built with -DDEBUG on the NetBSD slave VMs. OTOH, if it happens only in tests/basic/afr/self-heald.t, I may be able to get it by looping on the test for a while. I will try this on nbslave70. In the meantime, if that one pops up too often and gets annoying, I can get rid of it by just disabling debug mode. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 03:44:43PM +0530, Ravishankar N wrote: > The .t has been added to bad tests for now @ I am not sure this is relevant: does it fail again? I am very interested if it is reproducible. > http://review.gluster.org/#/c/13344/, so you can probably rebase your patch. > I'm not sure this is a problem with the case, the same issue was reported by > Manikandan last week : > https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13895/consoleFull Indeed, same problem. But unfortunately it is not very reproducible, since we need a full week of runs to see it again. I am tempted to just remove the assertion. > Is it one of those vndconfig errors? The .t seems to have skipped a few > tests: This is because FUSE went away during the test. The vnconfig problems are fixed now and should not happen anymore. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/quota-anon-fd-nfs.t, ./tests/basic/tier/fops-during-migration.t, ./tests/basic/tier/record-metadata-heat.t
On Mon, Feb 08, 2016 at 06:25:09PM +0530, Milind Changire wrote: > Looks like some cores are available as well. > Please advise.

#0  0xb99912b4 in gf_changelog_reborp_rpcsvc_notify (rpc=0xb7b160f0,
    mydata=0xb7b1a830, event=RPCSVC_EVENT_ACCEPT, data=0xb76a4030)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-changelog-reborp.c:110
110             return 0;

Crash on return: that smells like stack corruption. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Cores on NetBSD of brick https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14100/consoleFull
On Mon, Feb 08, 2016 at 07:27:46PM +0530, Pranith Kumar Karampuri wrote: > I don't see any logs in the archive. Did we change something? I think they are in a different tarball, in /archives/logs -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] changelog bug
quot; " " " " " " " " " " "s"w"i"t"c"h" "("e"v"e"n"t")" "{" "1"1"7" " " " " " " " " " " " " "c"a"s"e" "R"P"C"S"V"C"_"E"V"E"N"T"_"A"C"C"E"P"T":" "1"1"8" " " " " " " " " " " " " " " " " " " " " "r"e"t" "=" "s"y"s"_"u"n"l"i"n"k" "("R"P"C"_"S"O"C"K"("e"n"t"r"y")")";" "("g"d"b")" "p"r"i"n"t" "e"n"t"r"y" "$"2" "=" "("g"f"_"c"h"a"n"g"e"l"o"g"_"t" "*")" "0"x"b"7"b"1"a"8"3"0" "("g"d"b")" "p"r"i"n"t" "*"e"n"t"r"y" "$"3" "=" "{"s"t"a"t"e"l"o"c"k" "=" "{"p"t"s"_"m"a"g"i"c" "=" "0"," "p"t"s"_"s"p"i"n" "=" "0" "'"\"0"0"0"'"," "p"t"s"_"f"l"a"g"s" "=" "0"}"," " " " "c"o"n"n"s"t"a"t"e" "=" "G"F"_"C"H"A"N"G"E"L"O"G"_"C"O"N"N"_"S"T"A"T"E"_"P"E"N"D"I"N"G"," "t"h"i"s" "=" "0"x"0"," "l"i"s"t" "=" "{"n"e"x"t" "=" "0"x"0"," " " " " " "p"r"e"v" "=" "0"x"0"}"," "b"r"i"c"k" "=" "'"\"0"0"0"'" "<"r"e"p"e"a"t"s" "5"8"0" "t"i"m"e"s">"."."."," "g"r"p"c" "=" "{"s"v"c" "=" "0"x"4"f"c"0"0"," " " " " " "r"p"c" "=" "0"x"b"1"a"8"3"0"0"0"," " " " " " "s"o"c"k" "=" """∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0 "0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0 "6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0"0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"fi"\"0"0"0"\"0"6"0"\"0"3"7"\"2"7"3"î"\"0"0"0"\"0"0"0"\"0"0"0"\"3"7"4"\"0"0"4"\"0"0 "0"\"0"0"0"\"0"6"0"®"±"∑"fi"¿"≠"""}"," " " " "n"o"t"i"f"y" "=" "5"2"3"2"3"9"6"4"6"," "f"i"n"i" "=" "0"x"9"4"b"b"," "c"a"l"l"b"a"c"k" "=" "0"x"4"f"c"0"0"," " " " "c"o"n"n"e"c"t"e"d" "=" "0"x"b"1"a"8"3"0"0"0"," "d"i"s"c"o"n"n"e"c"t"e"d" "=" "0"x"a"d"c"0"d"e"b"7"," "p"t"r" "=" "0"x"1"f"3"0"0"0"d"e"," " " " "i"n"v"o"k"e"r"x"l" "=" "0"x"9"4"b"b"," "o"r"d"e"r"e"d" "=" "("u"n"k"n"o"w"n":" "3"2"6"6"5"6")"," "q"u"e"u"e"e"v"e"n"t" "=" "0"x"b"1"a"8"3"0"0"0"," " " " "p"i"c"k"e"v"e"n"t" "=" "0"x"a"d"c"0"d"e"b"7"," 
"e"v"e"n"t" "=" "{"l"o"c"k" "=" "{"p"t"m"_"m"a"g"i"c" "=" "5"2"3"2"3"9"6"4"6"," " " " " " " " "p"t"m"_"e"r"r"o"r"c"h"e"c"k" "=" "1"8"7" "'"ª"'"," "p"t"m"_"p"a"d"1" "=" """î"\"0"0"0"""," "p"t"m"_"i"n"t"e"r"l"o"c"k" "=" "0" "'"\"0"0"0"'"," " " " " " " " "p"t"m"_"p"a"d"2" "=" """\"3"7"4"\"0"0"4"""," "p"t"m"_"o"w"n"e"r" "=" "0"x"b"1"a"8"3"0"0"0"," "p"t"m"_"w"a"i"t"e"r"s" "=" "0"x"a"d"c"0"d"e"b"7"," " " " " " " " "p"t"m"_"r"e"c"u"r"s"e"d" "=" "5"2"3"2"3"9"6"4"6"," "p"t"m"_"s"p"a"r"e"2" "=" "0"x"9"4"b"b"}"," "c"o"n"d" "=" "{" " " " " " " "p"t"c"_"m"a"g"i"c" "=" "3"2"6"6"5"6"," "p"t"c"_"l"o"c"k" "=" "0" "'"\"0"0"0"'"," "p"t"c"_"w"a"i"t"e"r"s" "=" "{" " " " " " " " " "p"t"q"h"_"f"i"r"s"t" "=" "0"x"a"d"c"0"d"e"b"7"," "p"t"q"h"_"l"a"s"t" "=" "0"x"1"f"3"0"0"0"d"e"}"," "p"t"c"_"m"u"t"e"x" "=" "0"x"9"4"b"b"," " " " " " " " "p"t"c"_"p"r"i"v"a"t"e" "=" "0"x"4"f"c"0"0"}"," "i"n"v"o"k"e"r" "=" "0"x"b"1"a"8"3"0"0"0"," "n"e"x"t"_"s"e"q" "=" "2"9"1"5"0"9"8"2"9"5"," " " " " " "e"n"t"r"y" "=" "0"x"1"f"3"0"0"0"d"e"," "e"v"e"n"t"s" "=" "{"n"e"x"t" "=" "0"x"9"4"b"b"," "p"r"e"v" "=" "0"x"4"f"c"0"0"}"}"}" "("g"d"b")" "b"t" "#"0" " "0"x"b"9"a"9"1"2"e"c" "i"n" "g"f"_"c"h"a"n"g"e"l"o"g"_"r"e"b"o"r"p"_"r"p"c"s"v"c"_"n"o"t"i"f"y" "("r"p"c"="0"x"b"7"b"1"6"0"f"0"," " " " " " "m"y"d"a"t"a"="0"x"b"7"b"1"a"8"3"0"," "e"v"e"n"t"="R"P"C"S"V"C"_"E"V"E"N"T"_"A"C"C"E"P"T"," "d"a"t"a"="0"x"b"7"8"b"6"0"3"0")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"x"l"a"t"o"r"s"/"f"e"a"t"u"r"e"s"/"c"h"a"n"g "e"l"o"g"/"l"i"b"/"s"r"c"/"g"f"-"c"h"a"n"g"e"l"o"g"-"r"e"b"o"r"p"."c":"1"1"4" "#"1" " "0"x"b"b"6"f"b"c"1"7" "i"n" "r"p"c"s"v"c"_"p"r"o"g"r"a"m"_"n"o"t"i"f"y" "("l"i"s"t"e"n"e"r"="0"x"b"7"b"2"7"1"1"0"," " " " " " "e"v"e"n"t"="R"P"C"S"V"C"_"E"V"E"N"T"_"A"C"C"E"P"T"," "d"a"t"a"="0"x"b"7"8"b"6"0"3"0")" " " " " "a"t" 
"/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"s"v"c "."c":"3"3"5" "#"2" " "0"x"b"b"6"f"b"c"9"5" "i"n" "r"p"c"s"v"c"_"a"c"c"e"p"t" "("s"v"c"="0"x"b"7"b"1"6"0"f"0"," "l"i"s"t"e"n"_"t"r"a"n"s"="0"x"b"7"b"1"0"0"3"0"," " " " " " "n"e"w"_"t"r"a"n"s"="0"x"b"7"8"b"6"0"3"0")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"s"v"c "."c":"3"5"8" "#"3" " "0"x"b"b"6"f"c"a"8"1" "i"n" "r"p"c"s"v"c"_"n"o"t"i"f"y" "("t"r"a"n"s"="0"x"b"7"b"1"0"0"3"0"," "m"y"d"a"t"a"="0"x"b"7"b"1"6"0"f"0"," " " " " " "e"v"e"n"t"="R"P"C"_"T"R"A"N"S"P"O"R"T"_"A"C"C"E"P"T"," "d"a"t"a"="0"x"b"7"8"b"6"0"3"0")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"s"v"c "."c":"7"8"6" "#"4" " "0"x"b"b"7"0"2"0"6"c" "i"n" "r"p"c"_"t"r"a"n"s"p"o"r"t"_"n"o"t"i"f"y" "("t"h"i"s"="0"x"b"7"b"1"0"0"3"0"," " " " " " "e"v"e"n"t"="R"P"C"_"T"R"A"N"S"P"O"R"T"_"A"C"C"E"P"T"," "d"a"t"a"="0"x"b"7"8"b"6"0"3"0")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"p"c"-"l"i"b"/"s"r"c"/"r"p"c"-"t"r "a"n"s"p"o"r"t"."c":"5"4"1" "#"5" " "0"x"b"b"2"6"9"b"b"5" "i"n" "s"o"c"k"e"t"_"s"e"r"v"e"r"_"e"v"e"n"t"_"h"a"n"d"l"e"r" "("f"d"="1"6"," "i"d"x"="3"," "d"a"t"a"="0"x"b"7"b"1"0"0"3"0"," " " " " " "p"o"l"l"_"i"n"="1"," "p"o"l"l"_"o"u"t"="0"," "p"o"l"l"_"e"r"r"="0")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"r"p"c"/"r"-"-"-"T"y"p"e" "<"r"e"t"u"r"n">" "t"o" "c"o"n"t"i"n"u"e"," "o"r" "q" "<"r"e"t"u"r"n">" "t"o" "q"u"i"t"-"-"-" 
"p"c"-"t"r"a"n"s"p"o"r"t"/"s"o"c"k"e"t"/"s"r"c"/"s"o"c"k"e"t"."c":"2"7"6"5" "#"6" " "0"x"b"b"7"8"e"9"6"6" "i"n" "e"v"e"n"t"_"d"i"s"p"a"t"c"h"_"p"o"l"l"_"h"a"n"d"l"e"r" "("e"v"e"n"t"_"p"o"o"l"="0"x"b"b"1"4"3"0"3"0"," " " " " " "u"f"d"s"="0"x"b"7"8"9"e"0"b"0"," "i"="3")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"l"i"b"g"l"u"s"t"e"r"f"s"/"s"r"c"/"e"v"e"n"t "-"p"o"l"l"."c":"3"8"9" "#"7" " "0"x"b"b"7"8"e"c"a"d" "i"n" "e"v"e"n"t"_"d"i"s"p"a"t"c"h"_"p"o"l"l" "("e"v"e"n"t"_"p"o"o"l"="0"x"b"b"1"4"3"0"3"0")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"l"i"b"g"l"u"s"t"e"r"f"s"/"s"r"c"/"e"v"e"n"t "-"p"o"l"l"."c":"4"8"2" "#"8" " "0"x"b"b"7"5"d"2"1"9" "i"n" "e"v"e"n"t"_"d"i"s"p"a"t"c"h" "("e"v"e"n"t"_"p"o"o"l"="0"x"b"b"1"4"3"0"3"0")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"l"i"b"g"l"u"s"t"e"r"f"s"/"s"r"c"/"e"v"e"n"t "."c":"1"2"2" "#"9" " "0"x"0"8"0"5"0"e"2"0" "i"n" "m"a"i"n" "("a"r"g"c"="1"2"," "a"r"g"v"="0"x"b"f"7"f"e"a"a"4")" " " " " "a"t" "/"h"o"m"e"/"j"e"n"k"i"n"s"/"r"o"o"t"/"w"o"r"k"s"p"a"c"e"/"r"a"c"k"s"p"a"c"e"-"n"e"t"b"s"d"7"-"r"e"g"r"e"s"s"i"o"n"-"t"r"i"g"g"e"r"e"d"/"g"l"u"s"t"e"r"f"s"d"/"s"r"c"/"g"l"u"s"t"e"r "f"s"d"."c":"2"3"4"5" -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] Different version of run-tests.sh in jenkin slaves?
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote: > Where do I find config in NetBSD which decides which location to dump core > in? sysctl kern.defcorename gives the default location and name. It can be overridden per process using sysctl proc.$$.corename. > Any particular reason you added /d/backends/*/*.core to list of path to > search for core? Yes, this is required for standards compliance of the exposed glusterfs filesystem in the case of a low system PATH_MAX. See in posix.c:

        /*
         * _XOPEN_PATH_MAX is the longest file path len we MUST
         * support according to POSIX standard. When prepended
         * by the brick base path it may exceed backed filesystem
         * capacity (which MAY be bigger than _XOPEN_PATH_MAX). If
         * this is the case, chdir() to the brick base path and
         * use relative paths when they are too long. See also
         * MAKE_REAL_PATH in posix-handle.h
         */
        _private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
        if (_private->path_max != -1 &&
            _XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) {
                ret = chdir(_private->base_path);
                if (ret) {
                        gf_msg (this->name, GF_LOG_ERROR, 0,
                                P_MSG_BASEPATH_CHDIR_FAILED,
                                "chdir() to \"%s\" failed",
                                _private->base_path);
                        goto out;
                }

And the core goes in the current directory by default. We could use sysctl(3) to change that if we need. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] Different version of run-tests.sh in jenkin slaves?
On Thu, Jan 28, 2016 at 12:17:58PM +0530, Raghavendra Talur wrote: > Where do I find config in NetBSD which decides which location to dump core > in? I crafted the patch below, but it is probably much simpler to just set kern.defcorename to /%n-%p.core on all VM slaves. I will do it.

diff --git a/xlators/storage/posix/src/posix.c b/xlators/storage/posix/src/posix.c
index 272d08f..2fd2d7d 100644
--- a/xlators/storage/posix/src/posix.c
+++ b/xlators/storage/posix/src/posix.c
@@ -29,6 +29,10 @@
 #include
 #endif /* HAVE_LINKAT */
 
+#ifdef __NetBSD__
+#include <sys/sysctl.h>
+#endif /* __NetBSD__ */
+
 #include "glusterfs.h"
 #include "checksum.h"
 #include "dict.h"
@@ -6631,6 +6635,8 @@ init (xlator_t *this)
         _private->path_max = pathconf(_private->base_path, _PC_PATH_MAX);
         if (_private->path_max != -1 &&
             _XOPEN_PATH_MAX + _private->base_path_length > _private->path_max) {
+                char corename[] = "/%n-%p.core";
+
                 ret = chdir(_private->base_path);
                 if (ret) {
                         gf_msg (this->name, GF_LOG_ERROR, 0,
@@ -6639,7 +6645,15 @@ init (xlator_t *this)
                                 _private->base_path);
                         goto out;
                 }
+
+#ifdef __NetBSD__
+                /*
+                 * Make sure cores go to the root and not in current
+                 * directory
+                 */
+                (void)sysctlbyname("proc.curproc.corename", NULL, NULL,
+                                   corename, strlen(corename) + 1);
+
                 /*
                  * At least on NetBSD, the chdir() above uncovers a
                  * race condition which cause file lookup to fail

-- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] Different version of run-tests.sh in jenkin slaves?
On Thu, Jan 28, 2016 at 12:10:49PM +0530, Atin Mukherjee wrote: > So does that mean we never analyzed any core reported by NetBSD > regression failure? That's strange. We got the cores from / but not from d/backends/*/ as I understand. I am glad someone figured out the mystery. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?
On Thu, Jan 21, 2016 at 04:49:28PM +0100, Michael Scherer wrote: > > review.gluster.org[0: 184.107.76.10]: errno=Connection refused > > SO I found nothing in gerrit nor netbsd. ANd not the DNS, since it > managed to resolve stuff fine. > > I suspect the problem was on gerrit, nor on netbsd. Did it happened > again ? I could imagine problems with exhausted system resources, but it would not produce a "Connection refused". -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?
Michael Scherer <msche...@redhat.com> wrote: > Depend, if they exhausted FD or something ? I am not a java specialist. It is not the same errno, AFAIK. > Could also just be too long to answer due to the load, but it was not > loaded :/ High loads give timeouts. I may be wrong, but I believe connection refused is really when it gets a TCP RST. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?
Vijay Bellur <vbel...@redhat.com> wrote: > Does not look like a DNS problem. It is happening to me outside of > rackspace too. I mean I have already seen rackspace VMs failing to initiate connections because the rackspace DNS failed to answer DNS requests. This was the cause of failed regressions at some point. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Netbsd regressions are failing because of connection problems?
Vijay Bellur <vbel...@redhat.com> wrote: > There is some problem with review.gluster.org now. git clone/pull fails > for me consistently. First check that DNS is working. I recall seeing the rackspace DNS failing to answer. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] ./tests/bugs/changelog/bug-1208470.t failed NetBSD
On Tue, Jan 19, 2016 at 09:38:19AM +0530, Ravishankar N wrote: > ./tests/bugs/changelog/bug-1208470.t seems to have failed a NetBSD run: > https://build.gluster.org/job/rackspace-regression-2GB-triggered/17651/consoleFull > Not sure if it is spurious as it passed in the subsequent run. Please have > a look. I am puzzled: NetBSD regression is supposed to skip the bugs subdirectory. Did someone change something here? -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] How to cope with spurious regression failures
On Tue, Jan 19, 2016 at 07:08:03PM +0530, Raghavendra Talur wrote: > a. Allowing re-running to tests to make them pass leads to complacency with > how tests are written. > b. A test is bad if it is not deterministic and running a bad test has *no* > value. We are wasting time even if the test runs for a few seconds. I agree with your vision for the long term, but my proposal addresses the short-term situation. We could also use the retry approach to fuel your blacklist approach: imagine a system where the retry feature casts votes on individual tests. Each time a test fails once and succeeds on retry, cast a +1 unreliable for that test. After a few days, we would have a wall of shame for unreliable tests, which could either be fixed or go to the blacklist. I do not know what software to use to collect and display the results, though. Should we have a gerrit change for each test? -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD regression fixes
Hi all

I have the following changes awaiting code review/merge:
http://review.gluster.org/13204
http://review.gluster.org/13205
http://review.gluster.org/13245
http://review.gluster.org/13247
-- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] How to cope with spurious regression failures
Hi

Spurious regression failures make developers frustrated. One submits a change and gets completely unrelated failures. The only way out is to retrigger regression until it passes, a boring and time-wasting task. Sometimes after 4 or 5 failed runs, the submitter realizes there is a real issue and looks at it, which is a waste of time and resources.

The fact that we run regression on multiple platforms makes the situation worse. If you have a 10% chance of hitting a spurious failure on Linux and a 20% chance of hitting a spurious failure on NetBSD (random numbers chosen), that means you get roughly one failure every four submissions (a random prediction, as I used random input numbers, but you get the idea).

Two solutions are proposed:

1) Do not run unreliable tests, as proposed by Raghavendra Talur:
http://review.gluster.org/13173
I have nothing against the idea, but I voted down the change because it fails to address the need for different test blacklists on different platforms: we do not have the same unreliable tests on Linux and NetBSD.

2) Add a regression option to retry a failed test once, and to validate the regression if the second attempt passes, as I proposed:
http://review.gluster.org/13245
The idea is basically to automatically do what every submitter has been doing: retry without a thought when regression fails. The benefit of this approach is also that it gives us a better view of which tests failed because of the change, and which tests failed because they were unreliable.

The retry feature is optional and is triggered by passing the -r flag to run-tests.sh. I intend to use it on NetBSD regression to reduce the number of failures that annoy people. It could be used on Linux regression too, though I do not plan to touch that on my own.

Please tell us which approach you prefer. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-infra] NetBSD regression fixes
Emmanuel Dreyfus <m...@netbsd.org> wrote: > But I just realized the change is wrong, since running tests "new way" > stops on first failed test. My change just retry the failed test and > considers the regression run to be good on success, without running next > tests. > > I will post an update shortly. Done: http://review.gluster.org/13245 http://review.gluster.org/13247 -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD hang in quota-anon-fd-nfs.t
On Mon, Jan 11, 2016 at 11:51:25AM +0530, Vijaikumar Mallikarjuna wrote: > All quota test-cases uses 'tests/basic/quota.c' to write data > Does sync flags have any impact? It seems to change internal behavior but not the result: I can still see write calls taking e.g. 1169s. For the sake of completeness: instead of waiting for a locked page (probably locked by the NFS subsystem), it now waits for NFS RPC replies from the server. Example of kernel backtrace:

sleepq_block
cv_timedwait
nfs_rcvlock
nfs_request
nfs_writerpc
nfs_doio
VOP_STRATEGY
genfs_do_io
genfs_gop_write
genfs_do_putpages
genfs_putpages
VOP_PUTPAGES
nfs_write
VOP_WRITE
vn_write
dofilewrite
sys_write
syscall

I note we mount with -o noac,soft,nolock,vers=3, which the scripts turn into -o tcp,-R=2,soft,nfs3 for NetBSD. -R is retry. There is no timeout. Do we need one? -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Gerrit review, submit type and Jenkins testing
Niels de Vos <nde...@redhat.com> wrote: > How would we handle patches that get sent by maintainers? Most > developers that do code reviews will only +1 those changes. Those will > never get automatically regression tested then. I dont think a > maintainer should +2 their own patch immediately either, that suggests > no further reviews are needed. Indeed it is a bit odd, but I just CR +2 my own changes... -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD hang in quota-anon-fd-nfs.t
Emmanuel Dreyfus <m...@netbsd.org> wrote: > ps -axl shows the quota helper program is waiting on genput: > UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND > 0 9660 23707 0 124 0 3360 1080 genput D+ pts/2 0:00.01 > ./tests/basic/quota /mnt/nfs/0//0/1/2/3/4/5/6/7/8/9/new_file_2 256 4 > > The process is stuck in kernel awaiting for a memory page to get > unlocked. I reproduced the situation, and discovered the process is not really hung. Tracing system calls in the quota process shows that it does complete write operations, though after a very long time: one write system call lasted 963s, for instance. It does not hang, but it does not look sane either. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD tests not running to completion.
Pranith Kumar Karampuri <pkara...@redhat.com> wrote: > I tried to look into 3 instances of this failure: (...) > same issue as above, two tests are running in parallel. How is that possible? An "&" that sends a job to the background? Are we sure it is the same regression test run? Or is it two regression test runs that were scheduled simultaneously? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] NetBSD hang in quota-anon-fd-nfs.t
Starting a new thread for the sake of clarity. While looking for the spurious reboot problem, I got a hang in quota-anon-fd-nfs.t.

[23:39:54] ./tests/basic/quota-anon-fd-nfs.t .. 16/40

ps -axl shows the quota helper program is waiting on genput:

UID  PID  PPID CPU PRI NI  VSZ  RSS WCHAN  STAT TTY   TIME COMMAND
  0 9660 23707   0 124  0 3360 1080 genput D+   pts/2 0:00.01 ./tests/basic/quota /mnt/nfs/0//0/1/2/3/4/5/6/7/8/9/new_file_2 256 4

The process is stuck in the kernel, waiting for a memory page to get unlocked. That filesystem is still alive, which suggests an unwind operation like the one fixed in http://review.gluster.org/13177

I can unlock the situation by killing the glusterfs daemons. Does it ring a bell for someone? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD tests not running to completion.
Emmanuel Dreyfus <m...@netbsd.org> wrote: > While trying to reproduce the problem in > ./tests/basic/afr/arbiter-statfs.t, I came to many failures here: > > [03:53:07] ./tests/basic/afr/split-brain-resolution.t I was running tests from wrong directory :-/ This one is fine with HEAD. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD tests not running to completion.
Pranith Kumar Karampuri <pkara...@redhat.com> wrote: > tests/basic/afr/arbiter-statfs.t I posted patches to fix this one (but it seems Jenkins is down? No regression is running). > tests/basic/afr/self-heal.t > tests/basic/afr/entry-self-heal.t Those two are still to be investigated, and it seems tests/basic/afr/split-brain-resolution.t is now reliably broken as well. > tests/basic/quota-nfs.t That one is marked as a bad test and should not cause harm on spurious failure, as its result is ignored. I am trying to reproduce a spurious VM reboot during tests by looping on the whole test suite on nbslave70, with reboot on panic disabled (it will drop into the kernel debugger instead). No result so far. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 12:42:36PM +0530, Sachidananda URS wrote: > I have a NetBSD 7.0 installation which I can share with you, to get > started. > Once manu@ gets back on a specific version, I can set that up too. NetBSD 7.0 is fine and has everything required in GENERIC kernel. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD tests not running to completion.
Ravishankar N <ravishan...@redhat.com> wrote: > It failed with EIO. > > mount_nfs: can't access /patchy: Permission denied > mount_nfs: can't access /patchy: Permission denied > mount_nfs: can't access /patchy: Permission denied > dd: /mnt/nfs/0/test-big-write: Input/output error I suspect the EIO is just a consequence of the failed mount. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD tests not running to completion.
On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:

> With your support I think we can make things better. To avoid duplication
> of work, did you take any tests that you are already investigating? If not,
> that is the first thing I will try to find out.

I will look at the ./tests/basic/afr/arbiter-statfs.t problem with the loopback device.

-- 
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] NetBSD tests not running to completion.
Emmanuel Dreyfus <m...@netbsd.org> wrote:

> > With your support I think we can make things better. To avoid duplication
> > of work, did you take any tests that you are already investigating? If not,
> > that is the first thing I will try to find out.
>
> I will look at the ./tests/basic/afr/arbiter-statfs.t problem with the
> loopback device.

I tracked it down: vnconfig -l complains with "vnconfig: VNDIOCGET: Bad file descriptor" when we had a configured loopback device whose backing store was on a filesystem we unmounted:

# dd if=/dev/zero of=/scratch/backend bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 3.034 secs (34560843 bytes/sec)
# vnconfig vnd0 /scratch/backend
# vnconfig -l
vnd0: /scratch (/dev/xbd1a) inode 6
vnd1: not in use
vnd2: not in use
vnd3: not in use
# umount -f /scratch/
# vnconfig -l
vnconfig: VNDIOCGET: Bad file descriptor

But the workaround is easy:

# vnconfig -u vnd0
# vnconfig -l
vnd0: not in use
vnd1: not in use
vnd2: not in use
vnd3: not in use

Here are my fixes:
http://review.gluster.org/13204 (master)
http://review.gluster.org/13205 (release-3.7)

And while there, a portability fix in rfc.sh:
http://review.gluster.org/13206 (master)
That bug is not present in release-3.7.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
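[Editorial note: the workaround above boils down to teardown ordering: detach the vnd device before unmounting the filesystem that holds its backing store. A minimal sketch of that ordering follows; the helper name and the vnd0 / /scratch example are illustrative, not taken from the actual review.gluster.org patch.]

```shell
# Sketch of the teardown order the workaround relies on (helper name and
# the vnd0 / /scratch example are illustrative, not from the actual patch).
cleanup_vnd() {
    dev="$1"; mnt="$2"
    vnconfig -u "$dev"   # detach while the backing store is still reachable
    umount "$mnt"        # only then unmount the filesystem underneath it
}

# Only meaningful on NetBSD, where vnconfig(8) exists.
if command -v vnconfig >/dev/null 2>&1; then
    cleanup_vnd vnd0 /scratch
fi
```

Doing it in the opposite order leaves the vnd device pointing at a vanished backing store, which is what makes vnconfig -l fail with VNDIOCGET errors.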
Re: [Gluster-devel] NetBSD tests not running to completion.
Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> With your support I think we can make things better. To avoid duplication
> of work, did you take any tests that you are already investigating? If not,
> that is the first thing I will try to find out.

While trying to reproduce the problem in ./tests/basic/afr/arbiter-statfs.t, I came to many failures here:

[03:53:07] ./tests/basic/afr/split-brain-resolution.t .. 20/43
getfattr: Removing leading '/' from absolute path names
cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
not ok 25 Got "" instead of "brick0_alive"
cat: /mnt/glusterfs/0/data-split-brain.txt: Input/output error
not ok 27 Got "" instead of "brick1_alive"
getfattr: Removing leading '/' from absolute path names
not ok 30 Got "" instead of "brick0"
not ok 32 Got "" instead of "brick1"

It is not in the lists posted here. Is it happening only on my setup?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org