Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
- Original Message - From: Raghavendra Gowdappa rgowd...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Vijay Bellur vbel...@redhat.com, gluster-devel@gluster.org, Anand Avati aav...@redhat.com Sent: Wednesday, May 7, 2014 3:42:16 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr I think with repetitive log message suppression patch being merged, we don't really need gf_log_occasionally (except if they are logged in DEBUG or TRACE levels). That definitely helps. But still, setxattr calls are not supposed to fail with ENOTSUP on FS where we support gluster. If there are special keys which fail with ENOTSUPP, we can conditionally log setxattr failures only when the key is something new? Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Vijay Bellur vbel...@redhat.com Cc: gluster-devel@gluster.org, Anand Avati aav...@redhat.com Sent: Wednesday, 7 May, 2014 3:12:10 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr - Original Message - From: Vijay Bellur vbel...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com, Anand Avati aav...@redhat.com Cc: gluster-devel@gluster.org Sent: Tuesday, May 6, 2014 7:16:12 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr On 05/06/2014 01:07 PM, Pranith Kumar Karampuri wrote: hi, Why is there occasional logging for ENOTSUP errno when setxattr fails? In the absence of occasional logging, the log files would be flooded with this message every time there is a setxattr() call. How to know which keys are failing setxattr with ENOTSUPP if it is not logged when the key keeps changing? Pranith -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Need inputs for command deprecation output
- Original Message - From: Ravishankar N ravishan...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Friday, May 16, 2014 7:15:58 AM Subject: Re: [Gluster-devel] Need inputs for command deprecation output On 05/16/2014 06:25 AM, Pranith Kumar Karampuri wrote: Hi, As part of changing behaviour of 'volume heal' commands. I want the commands to show the following output. Any feedback in making them better would be awesome :-). root@pranithk-laptop - ~ 06:20:10 :) ⚡ gluster volume heal r2 info healed This command has been deprecated root@pranithk-laptop - ~ 06:20:13 :( ⚡ gluster volume heal r2 info heal-failed This command has been deprecated When a command is deprecated, it still works the way it did but gives out a warning about it not being maintained and possible alternatives to it. If I understand http://review.gluster.org/#/c/7766/ correctly, we are not supporting these commands any more, in which case the right message would be Command not supported I am wondering if we should even let the command be sent to self-heal-daemons from glusterd. How about 06:20:10 :) ⚡ gluster volume heal r2 info healed Command not supported. Instead of 06:20:10 :) ⚡ gluster volume heal r2 info healed brick: brick-1 status: Command not supported brick: brick-2 status: Command not supported Pranith -Ravi Pranith. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious failures in tests/encryption/crypt.t
hi, crypt.t is failing regression builds once in a while and most of the times it is because of the failures just after the remount in the script. TEST rm -f $M0/testfile-symlink TEST rm -f $M0/testfile-link Both of these are failing with ENOTCONN. I got a chance to look at the logs. According to the brick logs, this is what I see: [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open] 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink: Transport endpoint is not connected This is the very first time I saw posix failing with ENOTCONN. Do we have these bricks on some other network mounts? I wonder why it fails with ENOTCONN. I also see that it happens right after a call_bail on the mount. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Changes to Regression script
- Original Message - From: Vijay Bellur vbel...@redhat.com To: gluster-infra gluster-in...@gluster.org Cc: gluster-devel@gluster.org Sent: Tuesday, May 13, 2014 4:13:02 PM Subject: [Gluster-devel] Changes to Regression script Hi All, Me and Kaushal have effected the following changes on regression.sh in build.gluster.org: 1. If a regression run results in a core and all tests pass, that particular run will be flagged as a failure. Previously a core that would cause test failures only would get marked as a failure. 2. Cores from a particular test run are now archived and are available at /d/archived_builds/. This will also prevent manual intervention for managing cores. 3. Logs from failed regression runs are now archived and are available at /d/logs/glusterfs-timestamp.tgz Do let us know if you have any comments on these changes. This is already proving to be useful :-). I was able to debug one of the spurious failures for crypt.t. But the only problem is I was not able copy out the logs. Had to take avati's help to get the log files. Will it be possible to give access to these files so that anyone can download them? Pranith Thanks, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
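For reference, the post-run behaviour described above amounts to only a few lines of shell. The sketch below is illustrative, not the actual regression.sh: the archive paths, timestamp format, core-file pattern, and run-tests.sh entry point are assumptions taken from the thread.

#!/bin/bash
# Sketch: fail the run when cores exist, and archive cores and logs with a timestamp.
timestamp=$(date +%Y%m%d:%H:%M:%S)

./run-tests.sh
RET=$?

cores=$(ls /core.* 2>/dev/null)
if [ -n "$cores" ]; then
    RET=1                                                   # a core marks the run as failed even if all tests passed
    mkdir -p /d/archived_builds
    tar -czf /d/archived_builds/build-install-${timestamp}.tgz $cores
fi

if [ $RET -ne 0 ]; then
    mkdir -p /d/logs
    tar -czf /d/logs/glusterfs-logs-${timestamp}.tgz /var/log/glusterfs
fi
exit $RET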
Re: [Gluster-devel] Changes to Regression script
- Original Message - From: Vijay Bellur vbel...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: gluster-infra gluster-in...@gluster.org, gluster-devel@gluster.org Sent: Saturday, May 17, 2014 2:52:03 PM Subject: Re: [Gluster-devel] Changes to Regression script On 05/17/2014 02:10 PM, Pranith Kumar Karampuri wrote: - Original Message - From: Vijay Bellur vbel...@redhat.com To: gluster-infra gluster-in...@gluster.org Cc: gluster-devel@gluster.org Sent: Tuesday, May 13, 2014 4:13:02 PM Subject: [Gluster-devel] Changes to Regression script Hi All, Me and Kaushal have effected the following changes on regression.sh in build.gluster.org: 1. If a regression run results in a core and all tests pass, that particular run will be flagged as a failure. Previously a core that would cause test failures only would get marked as a failure. 2. Cores from a particular test run are now archived and are available at /d/archived_builds/. This will also prevent manual intervention for managing cores. 3. Logs from failed regression runs are now archived and are available at /d/logs/glusterfs-timestamp.tgz Do let us know if you have any comments on these changes. This is already proving to be useful :-). I was able to debug one of the spurious failures for crypt.t. But the only problem is I was not able copy out the logs. Had to take avati's help to get the log files. Will it be possible to give access to these files so that anyone can download them? Good to know! You can access the .tgz files from: http://build.gluster.org:443/logs/ Awesome!! Pranith -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
Sent the following patch to remove the special treatment of ENOTSUP here: http://review.gluster.org/7788 Pranith - Original Message - From: Kaleb KEITHLEY kkeit...@redhat.com To: gluster-devel@gluster.org Sent: Tuesday, May 13, 2014 8:01:53 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote: On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote: - Original Message - From: Raghavendra Gowdappa rgowd...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Vijay Bellur vbel...@redhat.com, gluster-devel@gluster.org, Anand Avati aav...@redhat.com Sent: Wednesday, May 7, 2014 3:42:16 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr I think with repetitive log message suppression patch being merged, we don't really need gf_log_occasionally (except if they are logged in DEBUG or TRACE levels). That definitely helps. But still, setxattr calls are not supposed to fail with ENOTSUP on FS where we support gluster. If there are special keys which fail with ENOTSUPP, we can conditionally log setxattr failures only when the key is something new? I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by setxattr(2) for legitimate attrs. But I can't help but wondering if this isn't related to other bugs we've had with, e.g., lgetxattr(2) called on invalid xattrs? E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a hack where xlators communicate with each other by getting (and setting?) invalid xattrs; the posix xlator has logic to filter out invalid xattrs, but due to bugs this hasn't always worked perfectly. It would be interesting to know which xattrs are getting errors and on which fs types. FWIW, in a quick perusal of a fairly recent (3.14.3) kernel, in xfs there are only six places where EOPNOTSUPP is returned, none of them related to xattrs. In ext[34] EOPNOTSUPP can be returned if the user_xattr option is not enabled (enabled by default in ext4.) And in the higher level vfs xattr code there are many places where EOPNOTSUPP _might_ be returned, primarily only if subordinate function calls aren't invoked which would clear the default or return a different error. -- Kaleb ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
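Kaleb's ext[34] observation above can be checked from a shell without involving gluster at all. A hedged demonstration (device and mount point are placeholders):

mkdir -p /mnt/test
mount -o nouser_xattr /dev/sdb1 /mnt/test
touch /mnt/test/file
setfattr -n user.foo -v bar /mnt/test/file         # fails with "Operation not supported" when user_xattr is off
setfattr -n trusted.foo -v bar /mnt/test/file      # trusted.* (as root) is unaffected by the user_xattr option
mount -o remount,user_xattr /mnt/test
setfattr -n user.foo -v bar /mnt/test/file         # succeeds once user_xattr is enabled
getfattr -n user.foo /mnt/test/file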
Re: [Gluster-devel] Changes to Regression script
- Original Message - From: Vijay Bellur vbel...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: gluster-infra gluster-in...@gluster.org, gluster-devel@gluster.org Sent: Saturday, 17 May, 2014 2:52:03 PM Subject: Re: [Gluster-devel] Changes to Regression script On 05/17/2014 02:10 PM, Pranith Kumar Karampuri wrote: - Original Message - From: Vijay Bellur vbel...@redhat.com To: gluster-infra gluster-in...@gluster.org Cc: gluster-devel@gluster.org Sent: Tuesday, May 13, 2014 4:13:02 PM Subject: [Gluster-devel] Changes to Regression script Hi All, Me and Kaushal have effected the following changes on regression.sh in build.gluster.org: 1. If a regression run results in a core and all tests pass, that particular run will be flagged as a failure. Previously a core that would cause test failures only would get marked as a failure. 2. Cores from a particular test run are now archived and are available at /d/archived_builds/. This will also prevent manual intervention for managing cores. 3. Logs from failed regression runs are now archived and are available at /d/logs/glusterfs-timestamp.tgz Do let us know if you have any comments on these changes. This is already proving to be useful :-). I was able to debug one of the spurious failures for crypt.t. But the only problem is I was not able copy out the logs. Had to take avati's help to get the log files. Will it be possible to give access to these files so that anyone can download them? Good to know! You can access the .tgz files from: http://build.gluster.org:443/logs/ I was able to access these yesterday. But now it gives 404. Pranith -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
- Original Message - From: Justin Clift jus...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Monday, 19 May, 2014 10:26:04 AM Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots On 16/05/2014, at 1:49 AM, Pranith Kumar Karampuri wrote: hi, In the latest build I fired for review.gluster.com/7766 (http://build.gluster.org/job/regression/4443/console) failed because of spurious failure. The script doesn't wait for nfs export to be available. I fixed that, but interestingly I found quite a few scripts with same problem. Some of the scripts are relying on 'sleep 5' which also could lead to spurious failures if the export is not available in 5 seconds. Cool. Fixing this NFS problem across all of the tests would be really welcome. That specific failed test (bug-1087198.t) is the most common one I've seen over the last few weeks, causing about half of all failures in master. Eliminating this class of regression failure would be really helpful. :) This particular class is eliminated :-). Patch was merged on Friday. Pranith + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
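For reference, the class of fix being described replaces fixed sleeps with a poll on the NFS export. A sketch using the test framework's EXPECT_WITHIN (the helper body and timeout are illustrative; the merged patch may differ):

function is_nfs_export_available {
    # count how many times the volume shows up in the NFS export list
    showmount -e 127.0.0.1 | grep "$V0" | wc -l
}
EXPECT_WITHIN 20 "1" is_nfs_export_available        # instead of "sleep 5"
TEST mount -t nfs -o vers=3,nolock 127.0.0.1:/$V0 $N0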
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
hi, Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to prevent frequent regression failures. Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, May 19, 2014 2:40:47 PM Subject: Re: Spurious failures because of nfs and snapshots Brick disconnected with ping-time out: Here is the log message [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/br icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba 7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164) 2 [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs 3 [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting. Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where ping-timer will be disabled by default for all the rpc connection except for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec). Thanks, Vijay On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote: The latest build failure also has the same issue: Download it from here: http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Joseph Fernandes josfe...@redhat.com Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, 19 May, 2014 11:41:28 AM Subject: Re: Spurious failures because of nfs and snapshots Hi Joseph, In the log mentioned below, it say ping-time is set to default value 30sec.I think issue is different. Can you please point me to the logs where you where able to re-create the problem. Thanks, Vijay On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote: hi Vijai, Joseph, In 2 of the last 3 build failures, http://build.gluster.org/job/regression/4479/console, http://build.gluster.org/job/regression/4478/console this test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better to revert this test until the fix is available? Please send a patch to revert the test case if you guys feel so. You can re-submit it along with the fix to the bug mentioned by Joseph. Pranith. 
- Original Message - From: Joseph Fernandes josfe...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Friday, 16 May, 2014 5:13:57 PM Subject: Re: Spurious failures because of nfs and snapshots Hi All, tests/bugs/bug-1090042.t : I was able to reproduce the issue i.e when this test is done in a loop for i in {1..135} ; do ./bugs/bug-1090042.t When checked the logs [2014-05-16 10:49:49.003978] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2014-05-16 10:49:49.004035] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs [2014-05-16 10:49:49.004303] I [rpc-clnt.c:973:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2014-05-16 10:49:49.004340] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-management: defaulting ping-timeout to 30secs The issue is with ping-timeout and is tracked under the bug https://bugzilla.redhat.com/show_bug.cgi?id=1096729 The workaround is mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1096729#c8 Regards, Joe - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Joseph Fernandes josfe...@redhat.com Sent: Friday, May 16, 2014 6:19:54 AM Subject: Spurious failures because of nfs and snapshots hi, In the latest build I fired for review.gluster.com/7766 (http://build.gluster.org/job/regression/4443/console) failed
[Gluster-devel] Split-brain present and future in afr
hi,
Thanks to Vijay Bellur for helping with the re-write of the draft I sent him :-).

Present:
Split-brains of files happen in afr today due to 2 primary reasons:
1. Split-brains due to network partition or network split-brains
2. Split-brains due to servers in a replicated group being offline at different points in time without self-heal happening in the common period of time when the servers were online. For further discussion, this is referred to as split-brain over time.

To prevent the occurrence of split-brains, we have the following quorum implementations in place:
a) Client quorum - Driven by afr (client); writes are allowed when a majority of bricks in a replica group are online. Majority is by default N/2 + 1, where N is the replication factor for files in a volume.
b) Server quorum - Driven by glusterd (server); writes are allowed when a majority of peers are online. Majority by default is N/2 + 1, where N is the number of peers in a trusted storage pool.

Both a) and b) primarily safeguard against network split-brains. The protection these quorum implementations offer for split-brain over time scenarios is not very high. Let us consider how replica 3 and replica 2 can be protected against split-brains.

Replica 3: Client quorum is quite effective in this case as writes are only allowed when at least 2 of the 3 bricks that form a replica group are seen by afr/client. A recent fix for a corner-case race in client quorum (http://review.gluster.org/7600) makes it very robust. This patch is now part of master and release-3.5. We plan to backport it to release-3.4 too.

Replica 2: Majority for client quorum in a deployment with 2 bricks per replica group is 2. Hence availability becomes a problem with replica 2 when either of the bricks is offline. To provide better availability for replica-2, the first brick in a replica set is given higher weight and quorum is met as long as the first brick is online. If the first brick is offline, then quorum is lost. Let us consider the following cases with B1 and B2 forming a replicated set:

B1       B2       Quorum
Online   Online   Met
Online   Offline  Met
Offline  Online   Not Met
Offline  Offline  Not Met

Though better availability is provided by client quorum in replica 2 scenarios, it is not very optimal and hence an improvement in behavior seems desirable.

Future:
Our focus in afr going forward would be to solve three problems to provide better protection against split-brains and for resolving them:
1. Better protection for split-brain over time.
2. Policy based split-brain resolution.
3. Provide better availability with client quorum and replica 2.

For 1, implementation of outcasting logic will address the problem:
- An outcast is a copy of a file on which writes have been performed only when quorum is met.
- When a brick goes down and comes back up, the self-heal daemon will mark the affected files on the brick that just came back up as outcasts. The outcast marking can be implemented even before the brick is declared available to regular clients. Once a copy of a file is marked as needing self-heal (or as an outcast), writes from clients will not land on that copy till self-heal is completed and the outcast tag is removed.

For 2, we plan to provide commands that can heal based on user-configurable policies. Examples of policies would be:
- Pick up the largest file as the winner for resolving a self-heal
- Choose brick foo as the winner for resolving split-brains
- Pick up the file with the latest version as the winner (when versioning for files is available).
For 3, we are planning to introduce arbiter bricks that can be used to determine quorum. The arbiter bricks will be dummy bricks that host only files that will be updated from multiple clients. This will be achieved by bringing about a variable replication count for a configurable class of files within a volume. In the case of a replicated volume with one arbiter brick per replica group, certain files that are prone to split-brain will be in 3 bricks (2 data bricks + 1 arbiter brick). All other files will be present in the regular data bricks.

For example, when oVirt VM disks are hosted on a replica 2 volume, sanlock is used by oVirt for arbitration. sanlock lease files will be written by all clients and VM disks are written by only a single client at any given point of time. In this scenario, we can place the sanlock lease files on 2 data + 1 arbiter bricks. The VM disk files will only be present on the 2 data bricks. Client quorum is now determined by looking at 3 bricks instead of 2 and we have better protection when network split-brains happen.

A combination of 1. and 3. does seem
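For reference, the client and server quorum mechanisms described in the "Present" section above are controlled through volume options along these lines (option names as exposed by the gluster CLI; defaults may differ between releases):

# Client quorum (enforced by afr on the client side):
gluster volume set VOLNAME cluster.quorum-type auto       # a majority of the replica set must be up for writes
gluster volume set VOLNAME cluster.quorum-count 2         # or: a fixed brick count instead of "auto"

# Server quorum (enforced by glusterd across peers):
gluster volume set VOLNAME cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%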
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
Hey, Seems like even after this fix is merged, the regression tests are failing for the same script. You can check the logs at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz Relevant logs: [2014-05-20 20:17:07.026045] : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS [2014-05-20 20:17:08.030673] : volume start patchy : SUCCESS [2014-05-20 20:17:08.279148] : volume barrier patchy enable : SUCCESS [2014-05-20 20:17:08.476785] : volume barrier patchy enable : FAILED : Failed to reconfigure barrier. [2014-05-20 20:17:08.727429] : volume barrier patchy disable : SUCCESS [2014-05-20 20:17:08.926995] : volume barrier patchy disable : FAILED : Failed to reconfigure barrier. Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Joseph Fernandes josfe...@redhat.com, Vijaikumar M vmall...@redhat.com Sent: Tuesday, May 20, 2014 3:41:11 PM Subject: Re: Spurious failures because of nfs and snapshots hi, Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to prevent frequent regression failures. Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, May 19, 2014 2:40:47 PM Subject: Re: Spurious failures because of nfs and snapshots Brick disconnected with ping-time out: Here is the log message [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/br icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba 7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164) 2 [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs 3 [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting. Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where ping-timer will be disabled by default for all the rpc connection except for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec). 
Thanks, Vijay On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote: The latest build failure also has the same issue: Download it from here: http://build.gluster.org:443/logs/glusterfs-logs-20140518%3a22%3a27%3a31.tgz Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Joseph Fernandes josfe...@redhat.com Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, 19 May, 2014 11:41:28 AM Subject: Re: Spurious failures because of nfs and snapshots Hi Joseph, In the log mentioned below, it say ping-time is set to default value 30sec.I think issue is different. Can you please point me to the logs where you where able to re-create the problem. Thanks, Vijay On Monday 19 May 2014 09:39 AM, Pranith Kumar Karampuri wrote: hi Vijai, Joseph, In 2 of the last 3 build failures, http://build.gluster.org/job/regression/4479/console, http://build.gluster.org/job/regression/4478/console this test(tests/bugs/bug-1090042.t) failed. Do you guys think it is better to revert this test until the fix is available? Please send a patch to revert the test case if you guys feel so. You can re-submit it along with the fix to the bug mentioned by Joseph. Pranith. - Original Message - From: Joseph Fernandes josfe...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Friday, 16 May, 2014 5:13:57 PM Subject: Re: Spurious failures because of nfs and snapshots Hi All, tests/bugs/bug-1090042.t : I was able to reproduce the issue i.e when this test is done in a loop for i in {1
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
- Original Message - From: Atin Mukherjee amukh...@redhat.com To: gluster-devel@gluster.org, Pranith Kumar Karampuri pkara...@redhat.com Sent: Wednesday, May 21, 2014 3:39:21 PM Subject: Re: Fwd: Re: [Gluster-devel] Spurious failures because of nfs and snapshots On 05/21/2014 11:42 AM, Atin Mukherjee wrote: On 05/21/2014 10:54 AM, SATHEESARAN wrote: Guys, This is the issue pointed out by Pranith with regard to Barrier. I was reading through it. But I wanted to bring it to concern -- S Original Message Subject: Re: [Gluster-devel] Spurious failures because of nfs and snapshots Date: Tue, 20 May 2014 21:16:57 -0400 (EDT) From: Pranith Kumar Karampuri pkara...@redhat.com To:Vijaikumar M vmall...@redhat.com, Joseph Fernandes josfe...@redhat.com CC:Gluster Devel gluster-devel@gluster.org Hey, Seems like even after this fix is merged, the regression tests are failing for the same script. You can check the logs at http://build.gluster.org:443/logs/glusterfs-logs-20140520%3a14%3a06%3a46.tgz Pranith, Is this the correct link? I don't see any log having this sequence there. Also looking at the log from this mail, this is expected as per the barrier functionality, an enable request followed by another enable should always fail and the same happens for disable. Can you please confirm the link and which particular regression test is causing this issue, is it bug-1090042.t? --Atin Relevant logs: [2014-05-20 20:17:07.026045] : volume create patchy build.gluster.org:/d/backends/patchy1 build.gluster.org:/d/backends/patchy2 : SUCCESS [2014-05-20 20:17:08.030673] : volume start patchy : SUCCESS [2014-05-20 20:17:08.279148] : volume barrier patchy enable : SUCCESS [2014-05-20 20:17:08.476785] : volume barrier patchy enable : FAILED : Failed to reconfigure barrier. [2014-05-20 20:17:08.727429] : volume barrier patchy disable : SUCCESS [2014-05-20 20:17:08.926995] : volume barrier patchy disable : FAILED : Failed to reconfigure barrier. This log is for bug-1092841.t and its expected. Damn :-(. I think I screwed up the timestamps while checking Sorry about that :-(. But there are failures. Check http://build.gluster.org/job/regression/4501/consoleFull Pranith --Atin Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Joseph Fernandes josfe...@redhat.com, Vijaikumar M vmall...@redhat.com Sent: Tuesday, May 20, 2014 3:41:11 PM Subject: Re: Spurious failures because of nfs and snapshots hi, Please resubmit the patches on top of http://review.gluster.com/#/c/7753 to prevent frequent regression failures. 
Pranith - Original Message - From: Vijaikumar M vmall...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Joseph Fernandes josfe...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Monday, May 19, 2014 2:40:47 PM Subject: Re: Spurious failures because of nfs and snapshots Brick disconnected with ping-time out: Here is the log message [2014-05-19 04:29:38.133266] I [MSGID: 100030] [glusterfsd.c:1998:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbi n/glusterfsd version 3.5qa2 (args: /build/install/sbin/glusterfsd -s build.gluster.org --volfile-id /snaps/patchy_snap1/3f2ae3fbb4a74587b1a9 1013f07d327f.build.gluster.org.var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3 -p /var/lib/glusterd/snaps/patchy_snap1/3f2ae3f bb4a74587b1a91013f07d327f/run/build.gluster.org-var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.pid -S /var/run/51fe50a6faf0aae006c815da946caf3a.socket --brick-name /var/run/gluster/snaps/3f2ae3fbb4a74587b1a91013f07d327f/brick3 -l /build/install/var/log/glusterfs/br icks/var-run-gluster-snaps-3f2ae3fbb4a74587b1a91013f07d327f-brick3.log --xlator-option *-posix.glusterd-uuid=494ef3cd-15fc-4c8c-8751-2d441ba 7b4b0 --brick-port 49164 --xlator-option 3f2ae3fbb4a74587b1a91013f07d327f-server.listen-port=49164) 2 [2014-05-19 04:29:38.141118] I [rpc-clnt.c:988:rpc_clnt_connection_init] 0-glusterfs: defaulting ping-timeout to 30secs 3 [2014-05-19 04:30:09.139521] C [rpc-clnt-ping.c:105:rpc_clnt_ping_timer_expired] 0-glusterfs: server 10.3.129.13:24007 has not responded in the last 30 seconds, disconnecting. Patch 'http://review.gluster.org/#/c/7753/' will fix the problem, where ping-timer will be disabled by default for all the rpc connection except for glusterd-glusterd (set to 30sec) and client-glusterd (set to 42sec). Thanks, Vijay On Monday 19 May 2014 11:56 AM, Pranith Kumar Karampuri wrote: The latest build failure also has the same issue: Download it from here: http://build.gluster.org:443/logs/glusterfs-logs
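For clarity on the barrier behaviour discussed above: bug-1092841.t intentionally issues each barrier operation twice, so the second enable and the second disable are expected to fail. A sketch of what such a test asserts (the actual test may differ in detail):

TEST   $CLI volume barrier $V0 enable      # first enable succeeds
TEST ! $CLI volume barrier $V0 enable      # enabling an already-enabled barrier must fail
TEST   $CLI volume barrier $V0 disable     # first disable succeeds
TEST ! $CLI volume barrier $V0 disable     # disabling an already-disabled barrier must fail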
Re: [Gluster-devel] spurious failures in tests/encryption/crypt.t
- Original Message - From: Anand Avati av...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Edward Shishkin edw...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Wednesday, May 21, 2014 12:36:22 PM Subject: Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t On Tue, May 20, 2014 at 10:54 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: - Original Message - From: Anand Avati av...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Edward Shishkin edw...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Wednesday, May 21, 2014 10:53:54 AM Subject: Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t There are a few suspicious things going on here.. On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: hi, crypt.t is failing regression builds once in a while and most of the times it is because of the failures just after the remount in the script. TEST rm -f $M0/testfile-symlink TEST rm -f $M0/testfile-link Both of these are failing with ENOTCONN. I got a chance to look at the logs. According to the brick logs, this is what I see: [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open] 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink: Transport endpoint is not connected posix_open() happening on a symlink? This should NEVER happen. glusterfs itself should NEVER EVER by triggering symlink resolution on the server. In this case, for whatever reason an open() is attempted on a symlink, and it is getting followed back onto gluster's own mount point (test case is creating an absolute link). So first find out: who is triggering fop-open() on a symlink. Fix the caller. http://review.gluster.org/7824 Next: add a check in posix_open() to fail with ELOOP or EINVAL if the inode is a symlink. http://review.gluster.org/7823 I think I understood what you are saying. Open call for symlink on fuse mount lead to an open call again for the target on the same fuse mount. It's not that simple. The client VFS is intelligent enough to resolve symlinks and send open() only on non-symlinks. And the test case script was doing an obvious unlink() (TEST rm -f filename), so it was not initiated by an open() attempt in the first place. My guess is that some xlator (probably crypt?) is doing an open() on an inode and that is going through unchecked in posix. It is a bug in both the caller and posix, but the onus/responsibility is on posix to disallow open() on anything but regular files (even open() on character or block devices should not happen in posix). Which lead to deadlock :). That is why we disallow opens on symlink in gluster? That's not just why open on symlink is disallowed in gluster, it is a more generic problem of following symlinks in general inside gluster. Symlink resolution must strictly happen only in the outermost VFS. Following symlinks inside the filesystem is not only an invalid operation, but can lead to all kinds of deadlocks, security holes (what if you opened a symlink which points to /etc/passwd, should it show the contents of the client machine's /etc/passwd or the server? Now what if you wrote to the file through the symlink? etc. you get the idea..) and wrong/weird/dangerous behaviors. This is not just related to following symlinks, even open()ing special devices.. e.g if you create a char device file with major/minor number of an audio device and wrote pcm data into it, should it play music on the client machine or in the server machine? etc. 
The summary is, following symlinks or opening non-regular files is VFS/client operation and are invalid operations in a filesystem context. Now only one question remains. How could it not hang everytime? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
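For context, the failing part of crypt.t exercises exactly the pattern discussed: a symlink whose absolute target points back into the glusterfs mount, which only the client VFS may resolve. Roughly (a sketch; variable names follow the test framework, the exact commands in crypt.t may differ):

TEST glusterfs --volfile-server=$H0 --volfile-id=$V0 $M0
TEST dd if=/dev/zero of=$M0/testfile bs=1k count=1
TEST ln -s $M0/testfile $M0/testfile-symlink     # absolute target resolving back into the mount
TEST ln $M0/testfile $M0/testfile-link
# ... remount ...
TEST rm -f $M0/testfile-symlink                  # the unlink itself must never trigger a server-side open
TEST rm -f $M0/testfile-link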
[Gluster-devel] Regarding spurious failure of bug-884455.t
hi Raghavendra, The failures are happening because rebalance status command is failing. I could see the following logs in glusterd log if I start it with -L DEBUG and try to recreate the bug. Could you please take a look at it. Test Summary Report --- tests/bugs/bug-884455.t (Wstat: 0 Tests: 22 Failed: 1) Failed test: 11 [2014-05-22 01:49:29.493657] D [glusterd-op-sm.c:5961:glusterd_op_ac_send_brick_op] 0-management: Returning with 0 [2014-05-22 01:49:29.493662] D [glusterd-utils.c:8027:glusterd_sm_tr_log_transition_add] 0-management: Transitioning from 'Stage op sent' to 'Brick op sent' due to event 'GD_OP_EVENT_ALL_ACC' [2014-05-22 01:49:29.493667] D [glusterd-utils.c:8029:glusterd_sm_tr_log_transition_add] 0-management: returning 0 [2014-05-22 01:49:29.493673] D [glusterd-op-sm.c:268:glusterd_set_txn_opinfo] 0-: Successfully set opinfo for transaction ID : 3825ae51-bb80-4031-9d61- 82799fe0bc81 [2014-05-22 01:49:29.493678] D [glusterd-op-sm.c:275:glusterd_set_txn_opinfo] 0-: Returning 0 [2014-05-22 01:49:29.493820] D [glusterd-rpc-ops.c:1796:__glusterd_brick_op_cbk] 0-: transaction ID = 3825ae51-bb80-4031-9d61-82799fe0bc81 [2014-05-22 01:49:29.493829] D [glusterd-op-sm.c:6411:glusterd_op_sm_inject_event] 0-management: Enqueue event: 'GD_OP_EVENT_RCVD_RJT' [2014-05-22 01:49:29.493835] D [glusterd-op-sm.c:6489:glusterd_op_sm] 0-management: Dequeued event of type: 'GD_OP_EVENT_RCVD_RJT' Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
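One way to chase a spurious failure like this, sketched below rather than taken from the test suite, is to run the single test in a loop until it trips and then search the glusterd log for the rejection event quoted above (log path is the usual default and may differ on the build slaves):

for i in {1..50}; do
    prove -vf tests/bugs/bug-884455.t || break
done
grep -n "GD_OP_EVENT_RCVD_RJT" /var/log/glusterfs/*glusterd*.log | tail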
Re: [Gluster-devel] Regression testing results for master branch
- Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 22, 2014 6:23:16 AM Subject: Re: [Gluster-devel] Regression testing results for master branch - Original Message - From: Justin Clift jus...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Wednesday, May 21, 2014 11:01:36 PM Subject: Re: [Gluster-devel] Regression testing results for master branch On 21/05/2014, at 6:17 PM, Justin Clift wrote: Hi all, Kicked off 21 VM's in Rackspace earlier today, running the regression tests against master branch. Only 3 VM's failed out of the 21 (86% PASS, 14% FAIL), with all three being for the same test: Test Summary Report --- ./tests/bugs/bug-948686.t (Wstat: 0 Tests: 20 Failed: 2) Failed tests: 13-14 Files=230, Tests=4373, 5601 wallclock secs ( 2.09 usr 1.58 sys + 1012.66 cusr 688.80 csys = 1705.13 CPU) Result: FAIL Interestingly, this one looks like a simple time based thing too. The failed tests are the ones after the sleep: ... #modify volume config to see change in volume-sync TEST $CLI_1 volume set $V0 write-behind off #add some files to the volume to see effect of volume-heal cmd TEST touch $M0/{1..100}; TEST $CLI_1 volume stop $V0; TEST $glusterd_3; sleep 3; TEST $CLI_3 volume start $V0; TEST $CLI_2 volume stop $V0; TEST $CLI_2 volume delete $V0; Do you already have this one on your radar? It wasn't, thanks for bringing it on my radar :-). Sent http://review.gluster.org/7837 to address this. Kaushal, I made this fix based on the assumption that the script seems to be waiting for all glusterds to be online. I could not check the logs because glusterds spawned by cluster.rc seem to be storing the logs not in the default location. Do you think we can make changes to the script so that we can get logs from glusterds spawned by cluster.rc as well? Pranith Pranith + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
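The suspicious part of the quoted test is the bare "sleep 3" after restarting glusterd. The fix at http://review.gluster.org/7837 is presumably along these lines, waiting for the restarted daemon to rejoin the cluster instead of sleeping (a sketch reusing helpers the test already defines):

TEST $glusterd_3;
EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers        # wait until all glusterds see each other again
TEST $CLI_3 volume start $V0;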
[Gluster-devel] regression tests and DEBUG flags
hi, I think we should run the regression tests with DEBUG builds so that GF_ASSERTs are caught. I will work with Justin to make sure we don't see too many failures before turning it on. I also want the regression tests to catch memory-corruption (invalid read/write of deallocated memory). For that I sent the following patch http://review.gluster.com/7835 to minimize the effects of mem-pool. Please let me know your comments. Review on the patch would be nice too :-). Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
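For reference, a build of the kind being proposed can be produced as below, so that GF_ASSERT() aborts are actually triggered during regression; treat the exact invocation as a sketch rather than the agreed procedure:

./autogen.sh
./configure --enable-debug          # debug build: -g -O0, assertions stay live
make -j"$(nproc)" && make install
./run-tests.sh                      # regression run against the debug build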
Re: [Gluster-devel] bug-857330/normal.t failure
Kaushal, Rebalance status command seems to be failing sometimes. I sent a mail about such spurious failure earlier today. Did you get a chance to look at the logs and confirm that rebalance didn't fail and it is indeed a timeout? Pranith - Original Message - From: Kaushal M kshlms...@gmail.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Justin Clift jus...@gluster.org, Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 22, 2014 4:40:25 PM Subject: Re: [Gluster-devel] bug-857330/normal.t failure The test is waiting for rebalance to finish. This is a rebalance with some actual data so it could have taken a long time to finish. I did set a pretty high timeout, but it seems like it's not enough for the new VMs. Possible options are, - Increase this timeout further - Reduce the amount of data. Currently this is 100 directories with 10 files each of size between 10-500KB ~kaushal On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: Kaushal has more context about these CCed. Keep the setup until he responds so that he can take a look. Pranith - Original Message - From: Justin Clift jus...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 22, 2014 3:54:46 PM Subject: bug-857330/normal.t failure Hi Pranith, Ran a few VM's with your Gerrit CR 7835 applied, and in DEBUG mode (I think). One of the VM's had a failure in bug-857330/normal.t: Test Summary Report --- ./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed: 0) Parse errors: Bad plan. You planned 8 tests but ran 0. ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24 Failed: 1) Failed test: 13 Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + 941.82 cusr 645.54 csys = 1591.22 CPU) Result: FAIL Seems to be this test: COMMAND=volume rebalance $V0 status PATTERN=completed EXPECT_WITHIN 300 $PATTERN get-task-status Is this one on your radar already? Btw, this VM is still online. Can give you access to retrieve logs if useful. + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
- Original Message - From: Kaushal M kshlms...@gmail.com To: Justin Clift jus...@gluster.org, Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 22, 2014 6:04:29 PM Subject: Re: [Gluster-devel] bug-857330/normal.t failure Thanks Justin, I found the problem. The VM can be deleted now. Turns out, there was more than enough time for the rebalance to complete. But we hit a race, which caused a command to fail. The particular test that failed is waiting for rebalance to finish. It does this by doing a 'gluster volume rebalance status' command and checking the result. The EXPECT_WITHIN function runs this command till we have a match, the command fails or the timeout happens. For a rebalance status command, glusterd sends a request to the rebalance process (as a brick_op) to get the latest stats. It had done the same in this case as well. But while glusterd was waiting for the reply, the rebalance completed and the process stopped itself. This caused the rpc connection between glusterd and rebalance proc to close. This caused the all pending requests to be unwound as failures. Which in turnlead to the command failing. Do you think we can print the status of the process as 'not-responding' when such a thing happens, instead of failing the command? Pranith I cannot think of a way to avoid this race from within glusterd. For this particular test, we could avoid using the 'rebalance status' command if we directly checked the rebalance process state using its pid etc. I don't particularly approve of this approach, as I think I used the 'rebalance status' command for a reason. But I currently cannot recall the reason, and if cannot come with it soon, I wouldn't mind changing the test to avoid rebalance status. ~kaushal On Thu, May 22, 2014 at 5:22 PM, Justin Clift jus...@gluster.org wrote: On 22/05/2014, at 12:32 PM, Kaushal M wrote: I haven't yet. But I will. Justin, Can I get take a peek inside the vm? Sure. IP: 23.253.57.20 User: root Password: foobar123 The stdout log from the regression test is in /tmp/regression.log. The GlusterFS git repo is in /root/glusterfs. Um, you should be able to find everything else pretty easily. Btw, this is just a temp VM, so feel free to do anything you want with it. When you're finished with it let me know so I can delete it. :) + Justin ~kaushal On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: Kaushal, Rebalance status command seems to be failing sometimes. I sent a mail about such spurious failure earlier today. Did you get a chance to look at the logs and confirm that rebalance didn't fail and it is indeed a timeout? Pranith - Original Message - From: Kaushal M kshlms...@gmail.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Justin Clift jus...@gluster.org , Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 22, 2014 4:40:25 PM Subject: Re: [Gluster-devel] bug-857330/normal.t failure The test is waiting for rebalance to finish. This is a rebalance with some actual data so it could have taken a long time to finish. I did set a pretty high timeout, but it seems like it's not enough for the new VMs. Possible options are, - Increase this timeout further - Reduce the amount of data. Currently this is 100 directories with 10 files each of size between 10-500KB ~kaushal On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: Kaushal has more context about these CCed. Keep the setup until he responds so that he can take a look. 
Pranith - Original Message - From: Justin Clift jus...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 22, 2014 3:54:46 PM Subject: bug-857330/normal.t failure Hi Pranith, Ran a few VM's with your Gerrit CR 7835 applied, and in DEBUG mode (I think). One of the VM's had a failure in bug-857330/normal.t: Test Summary Report --- ./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed: 0) Parse errors: Bad plan. You planned 8 tests but ran 0. ./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 Failed: 1) Failed test: 13 Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + 941.82 cusr 645.54 csys = 1591.22 CPU) Result: FAIL Seems to be this test: COMMAND=volume rebalance $V0 status PATTERN=completed EXPECT_WITHIN 300 $PATTERN get-task-status Is this one on your radar already? Btw, this VM is still online. Can give you access to retrieve logs if useful. + Justin -- Open Source and Standards @ Red Hat
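Kaushal's alternative, checking the rebalance process directly instead of issuing "rebalance status", could look roughly like the sketch below. The helper name and the process match pattern are hypothetical, for illustration only:

function rebalance_process_running {
    # assumption: the rebalance daemon's command line contains rebalance/<volname>
    pgrep -f "rebalance/$V0" > /dev/null && echo "Y" || echo "N"
}
EXPECT_WITHIN 300 "N" rebalance_process_running    # done once the rebalance process has exited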
Re: [Gluster-devel] Split-brain present and future in afr
- Original Message - From: Jeff Darcy jda...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Tuesday, May 20, 2014 10:08:12 PM Subject: Re: [Gluster-devel] Split-brain present and future in afr 1. Better protection for split-brain over time. 2. Policy based split-brain resolution. 3. Provide better availability with client quorum and replica 2. I would add the following: (4) Quorum enforcement - any kind - on by default. For replica - 3 we can do that. For replica 2, quorum implementation at the moment is not good enough. Until we fix it correctly may be we should let it be. We can revisit that decision once we come up with better solution for replica 2. (5) Fix the problem of volumes losing quorum because unrelated nodes went down (i.e. implement volume-level quorum). (6) Better tools for users to resolve split brain themselves. Agreed. Already in plan for 3.6. For 3, we are planning to introduce arbiter bricks that can be used to determine quorum. The arbiter bricks will be dummy bricks that host only files that will be updated from multiple clients. This will be achieved by bringing about variable replication count for configurable class of files within a volume. In the case of a replicated volume with one arbiter brick per replica group, certain files that are prone to split-brain will be in 3 bricks (2 data bricks + 1 arbiter brick). All other files will be present in the regular data bricks. For example, when oVirt VM disks are hosted on a replica 2 volume, sanlock is used by oVirt for arbitration. sanloclk lease files will be written by all clients and VM disks are written by only a single client at any given point of time. In this scenario, we can place sanlock lease files on 2 data + 1 arbiter bricks. The VM disk files will only be present on the 2 data bricks. Client quorum is now determined by looking at 3 bricks instead of 2 and we have better protection when network split-brains happen. Constantly filtering requests to use either N or N+1 bricks is going to be complicated and hard to debug. Every data-structure allocation or loop based on replica count will have to be examined, and many will have to be modified. That's a *lot* of places. This also overlaps significantly with functionality that can be achieved with data classification (i.e. supporting multiple replica levels within the same volume). What use case requires that it be implemented within AFR instead of more generally and flexibly? 1) It wouldn't still bring in arbiter for replica 2. 2) That would need more bricks, more processes, more ports. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
Please review http://review.gluster.com/7788 submitted to remove the filtering of that error. Pranith - Original Message - From: Harshavardhana har...@harshavardhana.net To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Kaleb KEITHLEY kkeit...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Friday, May 23, 2014 2:12:02 AM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr http://review.gluster.com/#/c/7823/ - the fix here On Thu, May 22, 2014 at 1:41 PM, Harshavardhana har...@harshavardhana.net wrote: Here are the important locations in the XFS tree coming from 2.6.32 branch STATIC int xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) { struct xfs_inode *ip = XFS_I(inode); unsigned char *ea_name; int error; if (S_ISLNK(inode-i_mode)) I would generally think this is the issue. return -EOPNOTSUPP; STATIC long xfs_vn_fallocate( struct inode*inode, int mode, loff_t offset, loff_t len) { longerror; loff_t new_size = 0; xfs_flock64_t bf; xfs_inode_t *ip = XFS_I(inode); int cmd = XFS_IOC_RESVSP; int attr_flags = XFS_ATTR_NOLOCK; if (mode ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) return -EOPNOTSUPP; STATIC int xfs_ioc_setxflags( xfs_inode_t *ip, struct file *filp, void__user *arg) { struct fsxattr fa; unsigned intflags; unsigned intmask; int error; if (copy_from_user(flags, arg, sizeof(flags))) return -EFAULT; if (flags ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \ FS_NOATIME_FL | FS_NODUMP_FL | \ FS_SYNC_FL)) return -EOPNOTSUPP; Perhaps some sort of system level acl's are being propagated by us over symlinks() ? - perhaps this is the related to the same issue of following symlinks? On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote: Sent the following patch to remove the special treatment of ENOTSUP here: http://review.gluster.org/7788 Pranith - Original Message - From: Kaleb KEITHLEY kkeit...@redhat.com To: gluster-devel@gluster.org Sent: Tuesday, May 13, 2014 8:01:53 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote: On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote: - Original Message - From: Raghavendra Gowdappa rgowd...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Vijay Bellur vbel...@redhat.com, gluster-devel@gluster.org, Anand Avati aav...@redhat.com Sent: Wednesday, May 7, 2014 3:42:16 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr I think with repetitive log message suppression patch being merged, we don't really need gf_log_occasionally (except if they are logged in DEBUG or TRACE levels). That definitely helps. But still, setxattr calls are not supposed to fail with ENOTSUP on FS where we support gluster. If there are special keys which fail with ENOTSUPP, we can conditionally log setxattr failures only when the key is something new? I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by setxattr(2) for legitimate attrs. But I can't help but wondering if this isn't related to other bugs we've had with, e.g., lgetxattr(2) called on invalid xattrs? E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a hack where xlators communicate with each other by getting (and setting?) invalid xattrs; the posix xlator has logic to filter out invalid xattrs, but due to bugs this hasn't always worked perfectly. It would be interesting to know which xattrs are getting errors and on which fs types. 
FWIW, in a quick perusal of a fairly recent (3.14.3) kernel, in xfs there are only six places where EOPNOTSUPP is returned, none of them related to xattrs. In ext[34] EOPNOTSUPP can be returned if the user_xattr option is not enabled (enabled by default in ext4.) And in the higher level vfs xattr code there are many places where EOPNOTSUPP _might_ be returned, primarily only if subordinate function calls aren't invoked which would clear the default or return a different error. -- Kaleb ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
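Harshavardhana's symlink hypothesis can be probed from a shell against a brick's backend directory. A hedged sketch (the path is the usual test backend; the exact errno returned depends on the xattr namespace and the kernel):

ln -s /nonexistent /d/backends/patchy1/probe-symlink
# Operate on the link itself (-h / --no-dereference), as an xlator following a symlink would:
setfattr -h -n trusted.probe -v 1 /d/backends/patchy1/probe-symlink   # trusted.* is generally accepted
setfattr -h -n user.probe -v 1 /d/backends/patchy1/probe-symlink      # user.* on a symlink is rejected by the kernel
getfattr -h -d -m . /d/backends/patchy1/probe-symlink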
Re: [Gluster-devel] Spurious failure ./tests/bugs/bug-1049834.t [16]
CC gluster-devel
Pranith

- Original Message -
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Avra Sengupta aseng...@redhat.com
Sent: Wednesday, May 28, 2014 6:42:53 AM
Subject: Spurious failure ./tests/bugs/bug-1049834.t [16]

hi Avra,
Could you look into it.

Patch              == http://review.gluster.com/7889/1
Author             == Avra Sengupta aseng...@redhat.com
Build triggered by == amarts
Build-url          == http://build.gluster.org/job/regression/4586/consoleFull
Download-log-at    == http://build.gluster.org:443/logs/regression/glusterfs-logs-20140527:14:51:09.tgz
Test written by    == Author: Avra Sengupta aseng...@redhat.com

./tests/bugs/bug-1049834.t [16]

#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../cluster.rc
. $(dirname $0)/../volume.rc
. $(dirname $0)/../snapshot.rc

cleanup;

1  TEST verify_lvm_version
2  TEST launch_cluster 2
3  TEST setup_lvm 2
4  TEST $CLI_1 peer probe $H2
5  EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
6  TEST $CLI_1 volume create $V0 $H1:$L1 $H2:$L2
7  EXPECT 'Created' volinfo_field $V0 'Status'
8  TEST $CLI_1 volume start $V0
9  EXPECT 'Started' volinfo_field $V0 'Status'

#Setting the snap-max-hard-limit to 4
10 TEST $CLI_1 snapshot config $V0 snap-max-hard-limit 4
PID_1=$!
wait $PID_1

#Creating 3 snapshots on the volume (which is the soft-limit)
11 TEST create_n_snapshots $V0 3 $V0_snap
12 TEST snapshot_n_exists $V0 3 $V0_snap

#Creating the 4th snapshot on the volume and expecting it to be created
# but with the deletion of the oldest snapshot i.e 1st snapshot
13 TEST $CLI_1 snapshot create ${V0}_snap4 ${V0}
14 TEST snapshot_exists 1 ${V0}_snap4
15 TEST ! snapshot_exists 1 ${V0}_snap1
***16 TEST $CLI_1 snapshot delete ${V0}_snap4
17 TEST $CLI_1 snapshot create ${V0}_snap1 ${V0}
18 TEST snapshot_exists 1 ${V0}_snap1

#Deleting the 4 snaps
#TEST delete_n_snapshots $V0 4 $V0_snap
#TEST ! snapshot_n_exists $V0 4 $V0_snap

cleanup;

Pranith
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Spurious failure in ./tests/bugs/bug-948686.t [14, 15, 16]
hi kp,
Could you look into it.

Patch              == http://review.gluster.com/7889/1
Author             == Avra Sengupta aseng...@redhat.com
Build triggered by == amarts
Build-url          == http://build.gluster.org/job/regression/4586/consoleFull
Download-log-at    == http://build.gluster.org:443/logs/regression/glusterfs-logs-20140527:14:51:09.tgz
Test written by    == Author: Krishnan Parthasarathi kpart...@redhat.com

./tests/bugs/bug-948686.t [14, 15, 16]

#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc
. $(dirname $0)/../cluster.rc

function check_peers {
    $CLI_1 peer status | grep 'Peer in Cluster (Connected)' | wc -l
}

cleanup;

#setup cluster and test volume
1  TEST launch_cluster 3; # start 3-node virtual cluster
2  TEST $CLI_1 peer probe $H2; # peer probe server 2 from server 1 cli
3  TEST $CLI_1 peer probe $H3; # peer probe server 3 from server 1 cli
4  EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers;
5  TEST $CLI_1 volume create $V0 replica 2 $H1:$B1/$V0 $H1:$B1/${V0}_1 $H2:$B2/$V0 $H3:$B3/$V0
6  TEST $CLI_1 volume start $V0
7  TEST glusterfs --volfile-server=$H1 --volfile-id=$V0 $M0

#kill a node
8  TEST kill_node 3

#modify volume config to see change in volume-sync
9  TEST $CLI_1 volume set $V0 write-behind off

#add some files to the volume to see effect of volume-heal cmd
10 TEST touch $M0/{1..100};
11 TEST $CLI_1 volume stop $V0;
12 TEST $glusterd_3;
13 EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers;
***14 TEST $CLI_3 volume start $V0;
***15 TEST $CLI_2 volume stop $V0;
***16 TEST $CLI_2 volume delete $V0;

cleanup;

17 TEST glusterd;
18 TEST $CLI volume create $V0 $H0:$B0/$V0
19 TEST $CLI volume start $V0
pkill glusterd;
pkill glusterfsd;
20 TEST glusterd
21 TEST $CLI volume status $V0

cleanup;
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
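A hedged guess at how the failing steps 14-16 could be hardened: besides waiting for the peer count, also wait for the restarted glusterd on server 3 to have synced the volume configuration before operating on it. The helper below is hypothetical and only illustrates the idea; it is not necessarily what http://review.gluster.org/7837 does:

function volume_synced_on_3 {
    $CLI_3 volume info $V0 > /dev/null 2>&1 && echo "Y" || echo "N"
}
EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers;
EXPECT_WITHIN $PROBE_TIMEOUT "Y" volume_synced_on_3
TEST $CLI_3 volume start $V0;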
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
Vijay, Could you please merge http://review.gluster.com/7788 if there are no more concerns. Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Harshavardhana har...@harshavardhana.net Cc: Gluster Devel gluster-devel@gluster.org Sent: Monday, May 26, 2014 1:18:18 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr Please review http://review.gluster.com/7788 submitted to remove the filtering of that error. Pranith - Original Message - From: Harshavardhana har...@harshavardhana.net To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Kaleb KEITHLEY kkeit...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Friday, May 23, 2014 2:12:02 AM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr http://review.gluster.com/#/c/7823/ - the fix here On Thu, May 22, 2014 at 1:41 PM, Harshavardhana har...@harshavardhana.net wrote: Here are the important locations in the XFS tree coming from 2.6.32 branch STATIC int xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) { struct xfs_inode *ip = XFS_I(inode); unsigned char *ea_name; int error; if (S_ISLNK(inode-i_mode)) I would generally think this is the issue. return -EOPNOTSUPP; STATIC long xfs_vn_fallocate( struct inode*inode, int mode, loff_t offset, loff_t len) { longerror; loff_t new_size = 0; xfs_flock64_t bf; xfs_inode_t *ip = XFS_I(inode); int cmd = XFS_IOC_RESVSP; int attr_flags = XFS_ATTR_NOLOCK; if (mode ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) return -EOPNOTSUPP; STATIC int xfs_ioc_setxflags( xfs_inode_t *ip, struct file *filp, void__user *arg) { struct fsxattr fa; unsigned intflags; unsigned intmask; int error; if (copy_from_user(flags, arg, sizeof(flags))) return -EFAULT; if (flags ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \ FS_NOATIME_FL | FS_NODUMP_FL | \ FS_SYNC_FL)) return -EOPNOTSUPP; Perhaps some sort of system level acl's are being propagated by us over symlinks() ? - perhaps this is the related to the same issue of following symlinks? On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote: Sent the following patch to remove the special treatment of ENOTSUP here: http://review.gluster.org/7788 Pranith - Original Message - From: Kaleb KEITHLEY kkeit...@redhat.com To: gluster-devel@gluster.org Sent: Tuesday, May 13, 2014 8:01:53 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote: On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote: - Original Message - From: Raghavendra Gowdappa rgowd...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Vijay Bellur vbel...@redhat.com, gluster-devel@gluster.org, Anand Avati aav...@redhat.com Sent: Wednesday, May 7, 2014 3:42:16 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr I think with repetitive log message suppression patch being merged, we don't really need gf_log_occasionally (except if they are logged in DEBUG or TRACE levels). That definitely helps. But still, setxattr calls are not supposed to fail with ENOTSUP on FS where we support gluster. If there are special keys which fail with ENOTSUPP, we can conditionally log setxattr failures only when the key is something new? I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by setxattr(2) for legitimate attrs. But I can't help but wondering if this isn't related to other bugs we've had with, e.g., lgetxattr(2) called on invalid xattrs? E.g. 
see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a hack where xlators communicate with each other by getting (and setting?) invalid xattrs; the posix xlator has logic to filter out invalid xattrs, but due to bugs this hasn't always worked perfectly. It would be interesting to know which xattrs are getting errors and on which fs types. FWIW, in a quick perusal of a fairly recent (3.14.3) kernel, in xfs there are only six places where EOPNOTSUPP is returned, none of them related to xattrs. In ext[34] EOPNOTSUPP can be returned
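A quick way to narrow down which keys fail with EOPNOTSUPP, and on which backend filesystems, is to probe a brick directly with setfattr. The sketch below is only illustrative: the brick path and the key list are assumptions, and it simply reports whichever key/target combinations the filesystem rejects with "Operation not supported" (running it on both xfs and ext4 bricks, and on a symlink as well as a regular file, would also exercise the symlink theory above):

    cd /bricks/brick1
    touch probe-file && ln -sf probe-file probe-link
    for key in user.test trusted.glusterfs.test security.selinux; do
        for target in probe-file probe-link; do
            setfattr -h -n "$key" -v probe "$target" 2>&1 |
                grep -qi 'not supported' && echo "$key on $target: EOPNOTSUPP"
        done
    done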
Re: [Gluster-devel] Spurious failure ./tests/bugs/bug-1049834.t [16]
- Original Message - From: Avra Sengupta aseng...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Wednesday, May 28, 2014 5:04:40 PM Subject: Re: [Gluster-devel] Spurious failire ./tests/bugs/bug-1049834.t [16] Pranith am looking into a priority issue for snapshot(https://bugzilla.redhat.com/show_bug.cgi?id=1098045) right now, I will get started with this spurious failure as soon as I finish it, which should be max by eod tomorrow. Thanks for the ack Avra. Pranith Regards, Avra On 05/28/2014 06:46 AM, Pranith Kumar Karampuri wrote: FYI, this test failed more than once yesterday. Same test failed both the times. Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Avra Sengupta aseng...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Wednesday, May 28, 2014 6:43:52 AM Subject: Re: [Gluster-devel] Spurious failire ./tests/bugs/bug-1049834.t [16] CC gluster-devel Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Avra Sengupta aseng...@redhat.com Sent: Wednesday, May 28, 2014 6:42:53 AM Subject: Spurious failire ./tests/bugs/bug-1049834.t [16] hi Avra, Could you look into it. Patch == http://review.gluster.com/7889/1 Author== Avra Sengupta aseng...@redhat.com Build triggered by== amarts Build-url == http://build.gluster.org/job/regression/4586/consoleFull Download-log-at == http://build.gluster.org:443/logs/regression/glusterfs-logs-20140527:14:51:09.tgz Test written by == Author: Avra Sengupta aseng...@redhat.com ./tests/bugs/bug-1049834.t [16] #!/bin/bash . $(dirname $0)/../include.rc . $(dirname $0)/../cluster.rc . $(dirname $0)/../volume.rc . $(dirname $0)/../snapshot.rc cleanup; 1 TEST verify_lvm_version 2 TEST launch_cluster 2 3 TEST setup_lvm 2 4 TEST $CLI_1 peer probe $H2 5 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 6 TEST $CLI_1 volume create $V0 $H1:$L1 $H2:$L2 7 EXPECT 'Created' volinfo_field $V0 'Status' 8 TEST $CLI_1 volume start $V0 9 EXPECT 'Started' volinfo_field $V0 'Status' #Setting the snap-max-hard-limit to 4 10 TEST $CLI_1 snapshot config $V0 snap-max-hard-limit 4 PID_1=$! wait $PID_1 #Creating 3 snapshots on the volume (which is the soft-limit) 11 TEST create_n_snapshots $V0 3 $V0_snap 12 TEST snapshot_n_exists $V0 3 $V0_snap #Creating the 4th snapshot on the volume and expecting it to be created # but with the deletion of the oldest snapshot i.e 1st snapshot 13 TEST $CLI_1 snapshot create ${V0}_snap4 ${V0} 14 TEST snapshot_exists 1 ${V0}_snap4 15 TEST ! snapshot_exists 1 ${V0}_snap1 ***16 TEST $CLI_1 snapshot delete ${V0}_snap4 17 TEST $CLI_1 snapshot create ${V0}_snap1 ${V0} 18 TEST snapshot_exists 1 ${V0}_snap1 #Deleting the 4 snaps #TEST delete_n_snapshots $V0 4 $V0_snap #TEST ! snapshot_n_exists $V0 4 $V0_snap cleanup; Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [wireshark] TODO features
- Original Message - From: Vikhyat Umrao vum...@redhat.com To: Niels de Vos nde...@redhat.com Cc: gluster-devel@gluster.org Sent: Wednesday, May 28, 2014 3:37:47 PM Subject: Re: [Gluster-devel] [wireshark] TODO features Hi Niels, Thanks for all your inputs and help, I have submitted a patch: https://code.wireshark.org/review/1833 I have absolutely no idea how this is supposed to work, but just wanted to ask what will the 'name' variable be if the file name is 'EMPTY' i.e. RPC_STRING_EMPTY Pranith glusterfs: show filenames in the summary for common procedures With this patch we will have filename on the summary for procedures MKDIR, CREATE and LOOKUP. Example output: 173 18.309307 192.168.100.3 - 192.168.100.4 GlusterFS 224 MKDIR V330 MKDIR Call, Filename: testdir 2606 36.767766 192.168.100.3 - 192.168.100.4 GlusterFS 376 LOOKUP V330 LOOKUP Call, Filename: 1.txt 2612 36.768242 192.168.100.3 - 192.168.100.4 GlusterFS 228 CREATE V330 CREATE Call, Filename: 1.txt That looks good :-) Pranith Thanks, Vikhyat From: Niels de Vos nde...@redhat.com To: Vikhyat Umrao vum...@redhat.com Cc: gluster-devel@gluster.org Sent: Tuesday, April 29, 2014 11:16:20 PM Subject: Re: [Gluster-devel] [wireshark] TODO features On Tue, Apr 29, 2014 at 06:25:15AM -0400, Vikhyat Umrao wrote: Hi, I am interested in TODO wireshark features for GlusterFS : I can start from below given feature for one procedure: = display the filename or filehandle on the summary for common procedures Things to get you and others prepared: 1. go to https://forge.gluster.org/wireshark/pages/Todo 2. login and edit the wiki page, add your name to the topic 3. clone the wireshark repository: $ git clone g...@forge.gluster.org:wireshark/wireshark.git (you have been added to the 'wireshark' group, so you should have push access over ssh) 4. create a new branch for your testing $ git checkout -t -b wip/master/visible-filenames upstream/master 5. make sure you have all the dependencies for compiling Wireshark (quite a lot are needed) $ ./autogen.sh $ ./configure --disable-wireshark (I tend to build only the commandline tools like 'tshark') $ make 6. you should now have a ./tshark executable that you can use for testing The changes you want to make are in epan/dissectors/packet-glusterfs.c. For example, start with adding the name of the file/dir that is passed to LOOKUP. The work to dissect the data in the network packet is done in glusterfs_gfs3_3_op_lookup_call(). It does not really matter on how that function gets executed, that is more a thing for an other task (add support for new procedures). In the NFS-dissector, you can see how this is done. Check the implementation of the dissect_nfs3_lookup_call() function in epan/dissectors/packet-nfs.c. The col_append_fstr() function achieves what you want to do. Of course, you really should share your changes! Now, 'git commit' your change with a suitable commit message and do $ git push origin wip/master/visible-filenames Your branch should now be visible under https://forge.gluster.org/wireshark/wireshark. Let me know, and I'll give it a whirl. Now you've done the filename for LOOKUP, I'm sure you can think of other things that make sense to get displayed. Do ask questions and send corrections if something is missing, or not working as explained here. This email should probably get included in the projects wiki https://forge.gluster.org/wireshark/pages/Home some where. 
Good luck, Niels ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
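For anyone trying Vikhyat's change, a rough end-to-end check with the freshly built tshark could look like the following; the interface name and the brick port range are assumptions, and the grep simply looks for the new Filename: text in the summary column shown in the example output above:

    # capture ~30 seconds of client traffic against the volume
    sudo ./tshark -i eth0 -f 'tcp portrange 24007-24050' -a duration:30 -w /tmp/gluster.pcap
    # read it back and confirm the summary now carries the filenames
    ./tshark -r /tmp/gluster.pcap | grep -E 'GlusterFS.*(LOOKUP|CREATE|MKDIR)'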
[Gluster-devel] Spurious failure in ./tests/bugs/bug-1038598.t [28]
hi Anuradha, Please look into this. Patch == http://review.gluster.com/#/c/7880/1 Author== Emmanuel Dreyfus m...@netbsd.org Build triggered by== kkeithle Build-url == http://build.gluster.org/job/regression/4603/consoleFull Download-log-at == http://build.gluster.org:443/logs/regression/glusterfs-logs-20140528:18:25:12.tgz Test written by == Author: Anuradha ata...@redhat.com ./tests/bugs/bug-1038598.t [28] 0 #!/bin/bash 1 . $(dirname $0)/../include.rc 2 . $(dirname $0)/../volume.rc 3 4 cleanup; 5 6 TEST glusterd 7 TEST pidof glusterd 8 TEST $CLI volume info; 9 10 TEST $CLI volume create $V0 replica 2 $H0:$B0/${V0}{1,2}; 11 12 function hard_limit() 13 { 14 local QUOTA_PATH=$1; 15 $CLI volume quota $V0 list $QUOTA_PATH | grep $QUOTA_PATH | awk '{print $2}' 16 } 17 18 function soft_limit() 19 { 20 local QUOTA_PATH=$1; 21 $CLI volume quota $V0 list $QUOTA_PATH | grep $QUOTA_PATH | awk '{print $3}' 22 } 23 24 function usage() 25 { 26 local QUOTA_PATH=$1; 27 $CLI volume quota $V0 list $QUOTA_PATH | grep $QUOTA_PATH | awk '{print $4}' 28 } 29 30 function sl_exceeded() 31 { 32 local QUOTA_PATH=$1; 33 $CLI volume quota $V0 list $QUOTA_PATH | grep $QUOTA_PATH | awk '{print $6}' 34 } 35 36 function hl_exceeded() 37 { 38 local QUOTA_PATH=$1; 39 $CLI volume quota $V0 list $QUOTA_PATH | grep $QUOTA_PATH | awk '{print $7}' 40 41 } 42 43 EXPECT $V0 volinfo_field $V0 'Volume Name'; 44 EXPECT 'Created' volinfo_field $V0 'Status'; 45 EXPECT '2' brick_count $V0 46 47 TEST $CLI volume start $V0; 48 EXPECT 'Started' volinfo_field $V0 'Status'; 49 50 TEST $CLI volume quota $V0 enable 51 sleep 5 52 53 TEST glusterfs -s $H0 --volfile-id $V0 $M0; 54 55 TEST mkdir -p $M0/test_dir 56 TEST $CLI volume quota $V0 limit-usage /test_dir 10MB 50 57 58 EXPECT 10.0MB hard_limit /test_dir; 59 EXPECT 50% soft_limit /test_dir; 60 61 TEST dd if=/dev/zero of=$M0/test_dir/file1.txt bs=1M count=4 62 EXPECT 4.0MB usage /test_dir; 63 EXPECT 'No' sl_exceeded /test_dir; 64 EXPECT 'No' hl_exceeded /test_dir; 65 66 TEST dd if=/dev/zero of=$M0/test_dir/file1.txt bs=1M count=6 67 EXPECT 6.0MB usage /test_dir; 68 EXPECT 'Yes' sl_exceeded /test_dir; 69 EXPECT 'No' hl_exceeded /test_dir; 70 71 #set timeout to 0 so that quota gets enforced without any lag 72 TEST $CLI volume set $V0 features.hard-timeout 0 73 TEST $CLI volume set $V0 features.soft-timeout 0 74 75 TEST ! dd if=/dev/zero of=$M0/test_dir/file1.txt bs=1M count=15 76 EXPECT 'Yes' sl_exceeded /test_dir; ***77 EXPECT 'Yes' hl_exceeded /test_dir; 78 79 cleanup; Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
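Not a root cause, but one low-risk way to make check 28 less timing-sensitive would be to poll instead of asserting immediately, since the quota accounting can lag the write slightly. A minimal sketch reusing EXPECT_WITHIN from include.rc and the helpers defined in the test above (the 20-second timeout is only illustrative):

    TEST ! dd if=/dev/zero of=$M0/test_dir/file1.txt bs=1M count=15
    EXPECT_WITHIN 20 'Yes' sl_exceeded /test_dir
    EXPECT_WITHIN 20 'Yes' hl_exceeded /test_dir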
Re: [Gluster-devel] Change in glusterfs[master]: NetBSD build fix for gettext
Done Pranith - Original Message - From: Emmanuel Dreyfus m...@netbsd.org To: Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 29, 2014 1:53:12 PM Subject: Re: [Gluster-devel] Change in glusterfs[master]: NetBSD build fix for gettext http://build.gluster.org/job/regression/4603/consoleFull : FAILED Is it possible to reschedule this test? I feel like something went wrong that is not related to my change. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] regarding fsetattr
hi, When I run the following program on a fuse mount, it fails with ENOENT. When I look at the mount logs, it prints an error for setattr instead of fsetattr. Does anyone know why the fop comes in as setattr instead of fsetattr?
Log: [2014-05-29 09:33:38.658023] W [fuse-bridge.c:1056:fuse_setattr_cbk] 0-glusterfs-fuse: 2569: SETATTR() gfid:ae44dd74-ff45-42a8-886e-b4ce2373a267 => -1 (No such file or directory)
Program:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
int main ()
{
        int ret = 0;
        int fd = open ("a.txt", O_CREAT|O_RDWR);
        if (fd < 0)
                printf ("open failed: %s\n", strerror (errno));
        ret = unlink ("a.txt");
        if (ret < 0)
                printf ("unlink failed: %s\n", strerror (errno));
        if (write (fd, "abc", 3) < 0)
                printf ("Not able to print %s\n", strerror (errno));
        ret = fchmod (fd, S_IRUSR|S_IWUSR|S_IXUSR);
        if (ret < 0)
                printf ("fchmod failed %s\n", strerror (errno));
        return 0;
}
Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding fsetattr
- Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: Brian Foster bfos...@redhat.com Sent: Thursday, May 29, 2014 3:08:33 PM Subject: regarding fsetattr
hi, When I run the following program on a fuse mount, it fails with ENOENT. When I look at the mount logs, it prints an error for setattr instead of fsetattr. Does anyone know why the fop comes in as setattr instead of fsetattr?
Log: [2014-05-29 09:33:38.658023] W [fuse-bridge.c:1056:fuse_setattr_cbk] 0-glusterfs-fuse: 2569: SETATTR() gfid:ae44dd74-ff45-42a8-886e-b4ce2373a267 => -1 (No such file or directory)
Program:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
int main ()
{
        int ret = 0;
        int fd = open ("a.txt", O_CREAT|O_RDWR);
        if (fd < 0)
                printf ("open failed: %s\n", strerror (errno));
        ret = unlink ("a.txt");
        if (ret < 0)
                printf ("unlink failed: %s\n", strerror (errno));
        if (write (fd, "abc", 3) < 0)
                printf ("Not able to print %s\n", strerror (errno));
        ret = fchmod (fd, S_IRUSR|S_IWUSR|S_IXUSR);
        if (ret < 0)
                printf ("fchmod failed %s\n", strerror (errno));
        return 0;
}
Based on Vijay's inputs I checked in fuse-bridge and this is what I see:
1162        if (fsi->valid & FATTR_FH &&
1163            !(fsi->valid & (FATTR_ATIME|FATTR_MTIME))) {
1164                /* We need no loc if kernel sent us an fd and
1165                 * we are not fiddling with times */
1166                state->fd = FH_TO_FD (fsi->fh);
(gdb)
1167                fuse_resolve_fd_init (state, &state->resolve, state->fd);
1168        } else {
1169                fuse_resolve_inode_init (state, &state->resolve, finh->nodeid);
1170        }
1171
(gdb) p fsi->valid
$4 = 1
(gdb) p (fsi->valid & FATTR_FH)
$5 = 0
(gdb)
fsi->valid doesn't have FATTR_FH. Who is supposed to set it? Pranith
Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding fsetattr
- Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: jGluster Devel gluster-devel@gluster.org Sent: Thursday, May 29, 2014 3:37:37 PM Subject: Re: [Gluster-devel] regarding fsetattr - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: jGluster Devel gluster-devel@gluster.org Cc: Brian Foster bfos...@redhat.com Sent: Thursday, May 29, 2014 3:08:33 PM Subject: regarding fsetattr hi, When I run the following program on fuse mount it fails with ENOENT. When I look at the mount logs, it prints error for setattr instead of fsetattr. Wondering anyone knows why the fop comes as setattr instead of fsetattr. Log: [2014-05-29 09:33:38.658023] W [fuse-bridge.c:1056:fuse_setattr_cbk] 0-glusterfs-fuse: 2569: SETATTR() gfid:ae44dd74-ff45-42a8-886e-b4ce2373a267 = -1 (No such file or directory) Program: #include stdio.h #include unistd.h #include sys/types.h #include sys/stat.h #include fcntl.h #include errno.h #include string.h int main () { int ret = 0; int fd=open(a.txt, O_CREAT|O_RDWR); if (fd 0) printf (open failed: %s\n, strerror(errno)); ret = unlink(a.txt); if (ret 0) printf (unlink failed: %s\n, strerror(errno)); if (write (fd, abc, 3) 0) printf (Not able to print %s\n, strerror (errno)); ret = fchmod (fd, S_IRUSR|S_IWUSR|S_IXUSR); if (ret 0) printf (fchmod failed %s\n, strerror(errno)); return 0; } Based on vijay's inputs I checked in fuse-brige and this is what I see: 1162 if (fsi-valid FATTR_FH 1163 !(fsi-valid (FATTR_ATIME|FATTR_MTIME))) { 1164 /* We need no loc if kernel sent us an fd and 1165 * we are not fiddling with times */ 1166 state-fd = FH_TO_FD (fsi-fh); (gdb) 1167 fuse_resolve_fd_init (state, state-resolve, state-fd); 1168 } else { 1169 fuse_resolve_inode_init (state, state-resolve, finh-nodeid); 1170 } 1171 (gdb) p fsi-valid $4 = 1 (gdb) p (fsi-valid FATTR_FH) $5 = 0 (gdb) fsi-valid doesn't have FATTR_FH. Who is supposed to set it? had a discussion with brian foster on IRC. The issue is that gluster depends on client fd to be passed down to perform the operations where as setattr is sent on an inode from vfs to fuse and since gluster doesn't have any reference to inode once unlink happens, this issue is seen. I will have one more conversation with brian to find what needs to be fixed. Pranith. Pranith Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
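For anyone who wants to reproduce this while the fix is worked out, a minimal sketch; the mount point and log path are assumptions, and the C program is the one quoted above saved as fchmod-unlinked.c:

    gcc -o /tmp/fchmod-unlinked fchmod-unlinked.c
    cd /mnt/glusterfs && /tmp/fchmod-unlinked   # prints: fchmod failed No such file or directory
    grep 'SETATTR()' /var/log/glusterfs/mnt-glusterfs.log | tail -n 1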
[Gluster-devel] Initiative to increase developer participation
hi, We are taking an initiative to come up with some easy bugs that volunteers in the community can send patches for, with our help. Goals of this initiative: - Each maintainer comes up with a list of bugs that are easy to fix in their components. - All the developers who are already active in the community help the newcomers by answering their questions. - Improve developer documentation to address FAQs. - Over time, turn these newcomers into experienced glusterfs developers :-) Maintainers, could you please come up with the initial list of bugs by next Wednesday, before the community meeting? Niels, could you send out the guidelines for marking bugs as easy-fix, as well as the wiki link for backports. PS: This is not just for newcomers to the community but also for existing developers to explore other components. Please feel free to suggest and give feedback to improve this process :-). Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failure in tests/basic/bd.t [22, 23, 24, 25]
- Original Message - From: Bharata B Rao bharata@gmail.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: jGluster Devel gluster-devel@gluster.org, M. Mohan Kumar mohankuma...@gmail.com Sent: Friday, May 30, 2014 8:28:15 AM Subject: Re: [Gluster-devel] Spurious failure in tests/basic/bd.t [22, 23, 24, 25] CC'ing to the correct ID of Mohan Thanks! Pranith On Fri, May 30, 2014 at 5:45 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote: hi Mohan, Could you please look into this: Patch == http://review.gluster.com/#/c/7926/1 Author== Avra Sengupta aseng...@redhat.com Build triggered by== amarts Build-url == http://build.gluster.org/job/regression/4615/consoleFull Download-log-at == http://build.gluster.org:443/logs/regression/glusterfs-logs-20140529:10:51:46.tgz Test written by == Author: M. Mohan Kumar mo...@in.ibm.com ./tests/basic/bd.t [22, 23, 24, 25] 0 #!/bin/bash 1 2 . $(dirname $0)/../include.rc 3 4 function execute() 5 { 6 cmd=$1 7 shift 8 ${cmd} $@ /dev/null 21 9 } 10 11 function bd_cleanup() 12 { 13 execute vgremove -f ${V0} 14 execute pvremove ${ld} 15 execute losetup -d ${ld} 16 execute rm ${BD_DISK} 17 cleanup 18 } 19 20 function check() 21 { 22 if [ $? -ne 0 ]; then 23 echo prerequsite $@ failed 24 bd_cleanup 25 exit 26 fi 27 } 28 29 SIZE=256 #in MB 30 31 bd_cleanup; 32 33 ## Configure environment needed for BD backend volumes 34 ## Create a file with configured size and 35 ## set it as a temporary loop device to create 36 ## physical volume VG. These are basic things needed 37 ## for testing BD xlator if anyone of these steps fail, 38 ## test script exits 39 function configure() 40 { 41 GLDIR=`$CLI system:: getwd` 42 BD_DISK=${GLDIR}/bd_disk 43 44 execute truncate -s${SIZE}M ${BD_DISK} 45 check ${BD_DISK} creation 46 47 execute losetup -f 48 check losetup 49 ld=`losetup -f` 50 51 execute losetup ${ld} ${BD_DISK} 52 check losetup ${BD_DISK} 53 execute pvcreate -f ${ld} 54 check pvcreate ${ld} 55 execute vgcreate ${V0} ${ld} 56 check vgcreate ${V0} 57 execute lvcreate --thin ${V0}/pool --size 128M 58 } 59 60 function volinfo_field() 61 { 62 local vol=$1; 63 local field=$2; 64 $CLI volume info $vol | grep ^$field: | sed 's/.*: //'; 65 } 66 67 function volume_type() 68 { 69 getfattr -n volume.type $M0/. 
--only-values --absolute-names -e text 70 } 71 72 TEST glusterd 73 TEST pidof glusterd 74 configure 75 76 TEST $CLI volume create $V0 ${H0}:/$B0/$V0?${V0} 77 EXPECT $V0 volinfo_field $V0 'Volume Name'; 78 EXPECT 'Created' volinfo_field $V0 'Status'; 79 80 ## Start volume and verify 81 TEST $CLI volume start $V0; 82 EXPECT 'Started' volinfo_field $V0 'Status' 83 84 TEST glusterfs --volfile-id=/$V0 --volfile-server=$H0 $M0 85 EXPECT '1' volume_type 86 87 ## Create posix file 88 TEST touch $M0/posix 89 90 TEST touch $M0/lv 91 gfid=`getfattr -n glusterfs.gfid.string $M0/lv --only-values --absolute-names` 92 TEST setfattr -n user.glusterfs.bd -v lv:4MB $M0/lv 93 # Check if LV is created 94 TEST stat /dev/$V0/${gfid} 95 96 ## Create filesystem 97 sleep 1 98 TEST mkfs.ext4 -qF $M0/lv 99 # Cloning 100 TEST touch $M0/lv_clone 101 gfid=`getfattr -n glusterfs.gfid.string $M0/lv_clone --only-values --absolute-names` 102 TEST setfattr -n clone -v ${gfid} $M0/lv 103 TEST stat /dev/$V0/${gfid} 104 105 sleep 1 106 ## Check mounting 107 TEST mount -o loop $M0/lv $M1 108 umount $M1 109 110 # Snapshot 111 TEST touch $M0/lv_sn 112 gfid=`getfattr -n glusterfs.gfid.string $M0/lv_sn --only-values --absolute-names` 113 TEST setfattr -n snapshot -v ${gfid} $M0/lv 114 TEST stat /dev/$V0/${gfid} 115 116 # Merge 117 sleep 1 **118 TEST setfattr -n merge -v $M0/lv_sn $M0/lv_sn **119 TEST ! stat $M0/lv_sn **120 TEST ! stat /dev/$V0/${gfid} 121 122 123 rm $M0/* -f 124 **125 TEST umount $M0 126 TEST $CLI volume stop ${V0} 127 EXPECT 'Stopped' volinfo_field $V0 'Status'; 128 TEST $CLI volume delete ${V0} 129 130 bd_cleanup Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http
Re: [Gluster-devel] Initiative to increase developer participation
CC gluster-devel Pranith - Original Message - From: HUANG Qiulan huan...@ihep.ac.cn To: Pranith Kumar Karampuri pkara...@redhat.com Sent: Friday, May 30, 2014 9:10:54 AM Subject: Re: [Gluster-devel] Initiative to increase developer participation Hi Pranith, I'm glad to participate in the Gluster developer team. To introduce myself briefly: I'm a staff member of the Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences. I have deployed Gluster 3.2.7 in our computing farm with 5 servers, which provides about 315TB of storage services for physicists. For the production package, I have made many changes, such as to data distribution, optimizing the lookup request to send requests only to the hash and hash+1 bricks instead of all bricks, and so on. Recently, I developed a distributed metadata service for Gluster which is being tested. I hope you are interested in the work I have done. Thank you. Cheers, Qiulan Computing Center, the Institute of High Energy Physics, China Huang, Qiulan Tel: (+86) 10 8823 6010-105 P.O. Box 918-7 Fax: (+86) 10 8823 6839 Beijing 100049 P.R. China Email: huan...@ihep.ac.cn === - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com Sent: Friday, May 30, 2014 To: Gluster Devel gluster-devel@gluster.org, gluster-users gluster-us...@gluster.org Cc: Kaushal Madappa kmada...@redhat.com Subject: [Gluster-devel] Initiative to increase developer participation hi, We are taking an initiative to come up with some easy bugs that volunteers in the community can send patches for, with our help. Goals of this initiative: - Each maintainer comes up with a list of bugs that are easy to fix in their components. - All the developers who are already active in the community help the newcomers by answering their questions. - Improve developer documentation to address FAQs. - Over time, turn these newcomers into experienced glusterfs developers :-) Maintainers, could you please come up with the initial list of bugs by next Wednesday, before the community meeting? Niels, could you send out the guidelines for marking bugs as easy-fix, as well as the wiki link for backports. PS: This is not just for newcomers to the community but also for existing developers to explore other components. Please feel free to suggest and give feedback to improve this process :-). Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
- Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Vijay Bellur vbel...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Wednesday, May 28, 2014 4:16:32 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr Vijay, Could you please merge http://review.gluster.com/7788 if there are no more concerns. Gentle reminder. Pranith. Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Harshavardhana har...@harshavardhana.net Cc: Gluster Devel gluster-devel@gluster.org Sent: Monday, May 26, 2014 1:18:18 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr Please review http://review.gluster.com/7788 submitted to remove the filtering of that error. Pranith - Original Message - From: Harshavardhana har...@harshavardhana.net To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Kaleb KEITHLEY kkeit...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Friday, May 23, 2014 2:12:02 AM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr http://review.gluster.com/#/c/7823/ - the fix here On Thu, May 22, 2014 at 1:41 PM, Harshavardhana har...@harshavardhana.net wrote: Here are the important locations in the XFS tree coming from 2.6.32 branch STATIC int xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) { struct xfs_inode *ip = XFS_I(inode); unsigned char *ea_name; int error; if (S_ISLNK(inode-i_mode)) I would generally think this is the issue. return -EOPNOTSUPP; STATIC long xfs_vn_fallocate( struct inode*inode, int mode, loff_t offset, loff_t len) { longerror; loff_t new_size = 0; xfs_flock64_t bf; xfs_inode_t *ip = XFS_I(inode); int cmd = XFS_IOC_RESVSP; int attr_flags = XFS_ATTR_NOLOCK; if (mode ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) return -EOPNOTSUPP; STATIC int xfs_ioc_setxflags( xfs_inode_t *ip, struct file *filp, void__user *arg) { struct fsxattr fa; unsigned intflags; unsigned intmask; int error; if (copy_from_user(flags, arg, sizeof(flags))) return -EFAULT; if (flags ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \ FS_NOATIME_FL | FS_NODUMP_FL | \ FS_SYNC_FL)) return -EOPNOTSUPP; Perhaps some sort of system level acl's are being propagated by us over symlinks() ? - perhaps this is the related to the same issue of following symlinks? On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote: Sent the following patch to remove the special treatment of ENOTSUP here: http://review.gluster.org/7788 Pranith - Original Message - From: Kaleb KEITHLEY kkeit...@redhat.com To: gluster-devel@gluster.org Sent: Tuesday, May 13, 2014 8:01:53 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote: On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote: - Original Message - From: Raghavendra Gowdappa rgowd...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Vijay Bellur vbel...@redhat.com, gluster-devel@gluster.org, Anand Avati aav...@redhat.com Sent: Wednesday, May 7, 2014 3:42:16 PM Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr I think with repetitive log message suppression patch being merged, we don't really need gf_log_occasionally (except if they are logged in DEBUG or TRACE levels). That definitely helps. But still, setxattr calls are not supposed to fail with ENOTSUP on FS where we support gluster. 
If there are special keys which fail with ENOTSUPP, we can conditionally log setxattr failures only when the key is something new? I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by setxattr(2) for legitimate attrs. But I can't help but wondering if this isn't related to other bugs we've had with, e.g., lgetxattr(2) called on invalid xattrs? E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a hack where xlators communicate
Re: [Gluster-devel] All builds are failing with BUILD ERROR
Guys its failing again with the same error: Please proceed with configuring, compiling, and installing. rm: cannot remove `/build/install/var/run/gluster/patchy': Device or resource busy + RET=1 + '[' 1 '!=' 0 ']' + VERDICT='BUILD FAILURE' Pranith On 06/02/2014 09:08 PM, Justin Clift wrote: On 02/06/2014, at 7:04 AM, Kaleb KEITHLEY wrote: snip someone cleaned the loopback devices. I deleted 500 unix domain sockets in /d/install/var/run and requeued the regressions. Interesting. The extra sockets problem is what prompted me to rewrite the cleanup function. The sockets are being created by glusterd during each test startup, but aren't removed by the existing cleanup function. (so, substantial build up over time) I'm not sure which of those two things was the solution. _Probably_ the loopback device thing. The extra sockets seem to be messy but (so far) I haven't seen them break anything. + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
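In case it helps whoever scripts the slave cleanup, something along these lines between runs would cover both the busy install tree and the stale sockets. This is only a sketch: the loop-device step is an assumption based on the earlier cleanup Kaleb mentioned, and the paths are the ones from the errors above:

    pkill -f gluster || true
    # unmount anything still holding /build/install, deepest paths first
    grep /build/install /proc/mounts | awk '{print $2}' | sort -r | while read -r m; do
        umount -f "$m" || umount -l "$m"
    done
    # detach leftover loopback devices and remove stale unix sockets
    losetup -a | cut -d: -f1 | xargs -r -n1 losetup -d
    rm -rf /build/install/var/run/gluster/* /d/install/var/run/*.socket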
[Gluster-devel] Erasure coding doubts session
hi Xavier, Some of the developers are reading the code you submitted for erasure code. We want to know if you would be available on Friday IST so that we can have a discussion and doubt clarification session on IRC. Could you tell which time is good for you. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] Need testers for GlusterFS 3.4.4
On 06/04/2014 01:35 AM, Ben Turner wrote: - Original Message - From: Justin Clift jus...@gluster.org To: Ben Turner btur...@redhat.com Cc: James purplei...@gmail.com, gluster-us...@gluster.org, Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 29, 2014 6:12:40 PM Subject: Re: [Gluster-users] [Gluster-devel] Need testers for GlusterFS 3.4.4 On 29/05/2014, at 8:04 PM, Ben Turner wrote: From: James purplei...@gmail.com Sent: Wednesday, May 28, 2014 5:21:21 PM On Wed, May 28, 2014 at 5:02 PM, Justin Clift jus...@gluster.org wrote: Hi all, Are there any Community members around who can test the GlusterFS 3.4.4 beta (rpms are available)? I've provided all the tools and how-to to do this yourself. Should probably take about ~20 min. Old example: https://ttboj.wordpress.com/2014/01/16/testing-glusterfs-during-glusterfest/ Same process should work, except base your testing on the latest vagrant article: https://ttboj.wordpress.com/2014/05/13/vagrant-on-fedora-with-libvirt-reprise/ If you haven't set it up already. I can help out here, I'll have a chance to run through some stuff this weekend. Where should I post feedback? Excellent Ben! Please send feedback to gluster-devel. :) So far so good on 3.4.4, sorry for the delay here. I had to fix my downstream test suites to run outside of RHS / downstream gluster. I did basic sanity testing on glusterfs mounts including: FSSANITY_TEST_LIST: arequal bonnie glusterfs_build compile_kernel dbench dd ffsb fileop fsx fs_mark iozone locks ltp multiple_files posix_compliance postmark read_large rpc syscallbench tiobench I am starting on NFS now, I'll have results tonight or tomorrow morning. I'll look updating the component scripts to work and run them as well. Thanks a lot for this ben. Justin, Ben, Do you think we can automate running of these scripts without a lot of human intervention? If yes, how can I help? We can use that just before making any release in future :-). Pranith -b + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
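To get the ball rolling on automating that, a rough sketch of the kind of wrapper that could run before each release; the suite names are the ones Ben lists above, while the runner script, server and volume names are placeholders:

    SERVER=server1.example.com
    VOLUME=testvol
    SUITES="arequal bonnie dbench dd fileop fsx fs_mark iozone locks ltp posix_compliance postmark"
    for s in $SUITES; do
        ./run-suite.sh "$s" --server "$SERVER" --volume "$VOLUME" > "/var/log/sanity-$s.log" 2>&1 ||
            echo "$s FAILED"
    done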
Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator
On 06/04/2014 11:37 AM, Krutika Dhananjay wrote: Hi, Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with the following backtrace: (gdb) bt #0 uuid_unpack (in=0x8 Address 0x8 out of bounds, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44 #1 0x7feeba9e19d6 in uuid_unparse_x (uu=value optimized out, out=0x2350fc0 081bbc7a-7551-44ac-85c7-aad5e2633db9, fmt=0x7feebaa08e00 %08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x) at ../../contrib/uuid/unparse.c:55 #2 0x7feeba9be837 in uuid_utoa (uuid=0x8 Address 0x8 out of bounds) at common-utils.c:2138 #3 0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 #5 0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=value optimized out) at posix.c:2550 #6 0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368 #7 0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=value optimized out) at server-helpers.c:354 #8 0x7feeab77ae2c in server_rpc_notify (rpc=value optimized out, xl=0x2316390, event=value optimized out, data=0x2bf51c0) at server.c:527 #9 0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720 #10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=value optimized out, event=value optimized out, data=0x2bf51c0) at rpcsvc.c:758 #11 0x7feeba778638 in rpc_transport_notify (this=value optimized out, event=value optimized out, data=value optimized out) at rpc-transport.c:512 #12 0x7feeb115e971 in socket_event_poll_err (fd=value optimized out, idx=value optimized out, data=0x2bf51c0, poll_in=value optimized out, poll_out=0, poll_err=0) at socket.c:1071 #13 socket_event_handler (fd=value optimized out, idx=value optimized out, data=0x2bf51c0, poll_in=value optimized out, poll_out=0, poll_err=0) at socket.c:2240 #14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384 #15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445 #16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023 (gdb) f 4 #4 pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428 428pl_inodelk_log_cleanup (l); (gdb) p l-pl_inode-refkeeper $1 = (inode_t *) 0x0 (gdb) pl_inode-refkeeper was found to be NULL even when there were some blocked inodelks in a certain domain of the inode, which when dereferenced by the epoll thread in the cleanup codepath led to a crash. On inspecting the code (for want of a consistent reproducer), three things were found: 1. The function where the crash happens (pl_inodelk_log_cleanup()), makes an attempt to resolve the inode to path as can be seen below. But the way inode_path() itself works is to first construct the path based on the given inode's ancestry and place it in the buffer provided. And if all else fails, the gfid of the inode is placed in a certain format (gfid:%s). This eliminates the need for statements from line 4 through 7 below, thereby preventing dereferencing of pl_inode-refkeeper. Now, although this change prevents the crash altogether, it still does not fix the race that led to pl_inode-refkeeper becoming NULL, and comes at the cost of printing (null) in the log message on line 9 every time pl_inode-refkeeper is found to be NULL, rendering the logged messages somewhat useless. 
code 0 pl_inode = lock-pl_inode; 1 2 inode_path (pl_inode-refkeeper, NULL, path); 3 4 if (path) 5 file = path; 6 else 7 file = uuid_utoa (pl_inode-refkeeper-gfid); 8 9 gf_log (THIS-name, GF_LOG_WARNING, 10 releasing lock on %s held by 11 {client=%p, pid=%PRId64 lk-owner=%s}, 12 file, lock-client, (uint64_t) lock-client_pid, 13 lkowner_utoa (lock-owner)); \code I think this logging code is from the days when gfid handle concept was not there. So it wasn't returning gfid:gfid-str in cases the path is not present in the dentries. I believe the else block can be deleted safely now. Pranith 2. There is at least one codepath found that can lead to this crash: Imagine an inode on which an inodelk operation is attempted by a client and is successfully granted too. Now, between the time the lock was granted and pl_update_refkeeper() was called by this thread, the client could send a DISCONNECT event, causing cleanup codepath to be executed, where the epoll thread crashes on dereferencing pl_inode-refkeeper which is STILL NULL at this point. Besides, there are still places in locks xlator where the refkeeper is NOT updated whenever the lists are modified - for instance in the cleanup codepath from a
Re: [Gluster-devel] [Gluster-users] Need testers for GlusterFS 3.4.4
On 06/04/2014 07:44 PM, Ben Turner wrote: - Original Message - From: Justin Clift jus...@gluster.org To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Ben Turner btur...@redhat.com, gluster-us...@gluster.org, Gluster Devel gluster-devel@gluster.org Sent: Wednesday, June 4, 2014 9:35:47 AM Subject: Re: [Gluster-users] [Gluster-devel] Need testers for GlusterFS 3.4.4 On 04/06/2014, at 6:33 AM, Pranith Kumar Karampuri wrote: On 06/04/2014 01:35 AM, Ben Turner wrote: Sent: Thursday, May 29, 2014 6:12:40 PM snip FSSANITY_TEST_LIST: arequal bonnie glusterfs_build compile_kernel dbench dd ffsb fileop fsx fs_mark iozone locks ltp multiple_files posix_compliance postmark read_large rpc syscallbench tiobench I am starting on NFS now, I'll have results tonight or tomorrow morning. I'll look updating the component scripts to work and run them as well. Thanks a lot for this ben. Justin, Ben, Do you think we can automate running of these scripts without a lot of human intervention? If yes, how can I help? We can use that just before making any release in future :-). It's a decent idea. :) Do you have time to get this up and running? Yep, can do. I'll see what else I can get going as well, I'll start with the sanity tests I mentioned above and go from there. How often do we want these run? Daily? Weekly? On GIT checkin? Only on RC? How long does it take to run them? Pranith -b + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Shall we revert quota-anon-fd.t?
hi, I see that quota-anon-fd.t is causing too many spurious failures. I think we should revert it and raise a bug so that it can be fixed and committed again along with the fix. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
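If it is reverted, the usual flow would be roughly the following; the commit id is a placeholder, and rfc.sh is the Gerrit submission script in the glusterfs tree:

    git revert <commit-that-added-quota-anon-fd.t>
    # file a bug for the spurious failure and reference it in the commit message, then:
    ./rfc.sh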
[Gluster-devel] Please use http://build.gluster.org/job/rackspace-regression/
hi Guys, Rackspace slaves are in action now, thanks to Justin. Please use the URL in Subject to run the regressions. I already shifted some jobs to rackspace. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t
Thanks a lot for quick resolution Sachin Pranith On 06/12/2014 04:38 PM, Sachin Pandit wrote: http://review.gluster.org/#/c/8041/ is merged upstream. ~ Sachin. - Original Message - From: Sachin Pandit span...@redhat.com To: Raghavendra Talur rta...@redhat.com Cc: Pranith Kumar Karampuri pkara...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Thursday, June 12, 2014 12:58:44 PM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Patch link http://review.gluster.org/#/c/8041/. ~ Sachin. - Original Message - From: Raghavendra Talur rta...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Sachin Pandit span...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Thursday, June 12, 2014 10:46:14 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Sachin and I looked at the failure. Current guess is that glusterd_2 had not yet completed the handshake with glusterd_1 and hence did not know about the option set. KP suggested that instead of having a sleep before this command, we could get peer status and verify that it is 1 and then get the vol info. Although even this does not make the test fully deterministic, we will be closer to it. Sachin will send out a patch for the same. Raghavendra Talur - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Sachin Pandit span...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Thursday, June 12, 2014 9:54:03 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Check the logs to find the reason. Pranith. On 06/12/2014 09:24 AM, Sachin Pandit wrote: I am not hitting this even after running the test case in a loop. I'll update in this thread once I find out the root cause of the failure. ~ Sachin - Original Message - From: Sachin Pandit span...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Thursday, June 12, 2014 8:50:40 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t I will look into this. - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Cc: rta...@redhat.com, span...@redhat.com Sent: Wednesday, June 11, 2014 9:08:44 PM Subject: spurious regression failure in tests/bugs/bug-1104642.t Raghavendra/Sachin, Could one of you guys take a look at this please. pk1@localhost - ~/workspace/gerrit-repo (master) 21:04:46 :) ⚡ ~/.scripts/regression.py http://build.gluster.org/job/regression/4831/consoleFull Patch == http://review.gluster.com/#/c/7994/2 Author == Raghavendra Talur rta...@redhat.com Build triggered by == amarts Build-url == http://build.gluster.org/job/regression/4831/consoleFull Download-log-at == http://build.gluster.org:443/logs/regression/glusterfs-logs-20140611:08:39:04.tgz Test written by == Author: Sachin Pandit span...@redhat.com ./tests/bugs/bug-1104642.t [13] 0 #!/bin/bash 1 2 . $(dirname $0)/../include.rc 3 . $(dirname $0)/../volume.rc 4 . 
$(dirname $0)/../cluster.rc 5 6 7 function get_value() 8 { 9 local key=$1 10 local var=CLI_$2 11 12 eval cli_index=\$$var 13 14 $cli_index volume info | grep ^$key\ 15 | sed 's/.*: //' 16 } 17 18 cleanup 19 20 TEST launch_cluster 2 21 22 TEST $CLI_1 peer probe $H2; 23 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count 24 25 TEST $CLI_1 volume create $V0 $H1:$B1/${V0}0 $H2:$B2/${V0}1 26 EXPECT $V0 get_value 'Volume Name' 1 27 EXPECT Created get_value 'Status' 1 28 29 TEST $CLI_1 volume start $V0 30 EXPECT Started get_value 'Status' 1 31 32 #Bring down 2nd glusterd 33 TEST kill_glusterd 2 34 35 #set the volume all options from the 1st glusterd 36 TEST $CLI_1 volume set all cluster.server-quorum-ratio 80 37 38 #Bring back the 2nd glusterd 39 TEST $glusterd_2 40 41 #Verify whether the value has been synced 42 EXPECT '80' get_value 'cluster.server-quorum-ratio' 1 ***43 EXPECT '80' get_value 'cluster.server-quorum-ratio' 2 44 45 cleanup; Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
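For reference, KP's suggestion translated into the test's own vocabulary would look roughly like this; peer_count and PROBE_TIMEOUT are the helpers already used earlier in the same script, and the sketch only narrows the race rather than removing it:

    #Bring back the 2nd glusterd
    TEST $glusterd_2
    #wait for the restarted glusterd to rejoin and finish its handshake
    EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
    #Verify whether the value has been synced
    EXPECT '80' get_value 'cluster.server-quorum-ratio' 1
    EXPECT '80' get_value 'cluster.server-quorum-ratio' 2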
Re: [Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t
Thanks for reporting. Will take a look. Pranith On 06/12/2014 05:52 PM, Raghavendra Talur wrote: Hi Pranith, This test failed for my patch set today and seems to be a spurious failure. Here is the console output for the run. http://build.gluster.org/job/rackspace-regression/107/consoleFull Could you please have a look at it? -- Thanks! Raghavendra Talur | Red Hat Storage Developer | Bangalore |+918039245176 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] glusterfs split-brain problem
hi, Could you let us know the exact problem you are running into? Pranith On 06/13/2014 09:27 AM, Krishnan Parthasarathi wrote: Hi, Pranith, who is the AFR maintainer, would be the best person to answer this question. CC'ing Pranith and gluster-devel. Krish - Original Message - hi Krishnan Parthasarathi, Could you tell me which glusterfs version has significant improvements for the glusterfs split-brain problem? Can you point me to the relevant links? Thank you very much! justgluste...@gmail.com ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Want more spurious regression failure alerts... ?
On 06/13/2014 06:41 PM, Justin Clift wrote: Hi Pranith, Do you want me to keep sending you spurious regression failure notification? There's a fair few of them isn't there? I am doing one run on my VM. I will get back with the ones that fail on my VM. You can also do the same on your machine. Give the output of for i in `cat problematic-ones.txt`; do echo $i $(git log $i | grep Author| tail -1); done Maybe we should make 1 BZ for the lot, and attach the logs to that BZ for later analysis? I am already using 1092850 for this. + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Want more spurious regression failure alerts... ?
On 06/15/2014 03:55 PM, Justin Clift wrote: On 15/06/2014, at 3:36 AM, Pranith Kumar Karampuri wrote: On 06/13/2014 06:41 PM, Justin Clift wrote: Hi Pranith, Do you want me to keep sending you spurious regression failure notification? There's a fair few of them isn't there? I am doing one run on my VM. I will get back with the ones that fail on my VM. You can also do the same on your machine. Cool, that should help. :) These are the spurious failures found when running the rackspace-regression-2G tests over friday and yesterday: * bug-859581.t -- SPURIOUS * 4846 - http://slave24.cloud.gluster.org/logs/glusterfs-logs-20140614:14:33:41.tgz * 6009 - http://slave20.cloud.gluster.org/logs/glusterfs-logs-20140613:20:24:58.tgz * 6652 - http://slave22.cloud.gluster.org/logs/glusterfs-logs-20140613:22:04:16.tgz * 7796 - http://slave20.cloud.gluster.org/logs/glusterfs-logs-20140614:14:22:53.tgz * 7987 - http://slave22.cloud.gluster.org/logs/glusterfs-logs-20140613:15:21:04.tgz * 7992 - http://slave10.cloud.gluster.org/logs/glusterfs-logs-20140613:20:21:15.tgz * 8014 - http://slave24.cloud.gluster.org/logs/glusterfs-logs-20140613:20:39:01.tgz * 8054 - http://slave24.cloud.gluster.org/logs/glusterfs-logs-20140613:13:15:50.tgz * 8062 - http://slave10.cloud.gluster.org/logs/glusterfs-logs-20140613:13:28:48.tgz Xavi, Please review http://review.gluster.org/8069 * mgmt_v3-locks.t -- SPURIOUS * 6483 - build.gluster.org - http://build.gluster.org/job/regression/4847/consoleFull * 6630 - http://slave22.cloud.gluster.org/logs/glusterfs-logs-20140614:15:42:39.tgz * 6946 - http://slave21.cloud.gluster.org/logs/glusterfs-logs-20140613:20:57:27.tgz * 7392 - http://slave21.cloud.gluster.org/logs/glusterfs-logs-20140613:13:57:20.tgz * 7852 - http://slave24.cloud.gluster.org/logs/glusterfs-logs-20140613:19:23:17.tgz * 8014 - http://slave24.cloud.gluster.org/logs/glusterfs-logs-20140613:20:39:01.tgz * 8015 - http://slave23.cloud.gluster.org/logs/glusterfs-logs-20140613:14:26:01.tgz * 8048 - http://slave24.cloud.gluster.org/logs/glusterfs-logs-20140613:18:13:07.tgz Avra, Could you take a look. * bug-918437-sh-mtime.t -- SPURIOUS * 6459 - http://slave21.cloud.gluster.org/logs/glusterfs-logs-20140614:18:28:43.tgz * 7493 - http://slave22.cloud.gluster.org/logs/glusterfs-logs-20140613:10:30:16.tgz * 7987 - http://slave10.cloud.gluster.org/logs/glusterfs-logs-20140613:14:23:02.tgz * 7992 - http://slave10.cloud.gluster.org/logs/glusterfs-logs-20140613:20:21:15.tgz Vijay, Could you review and merge http://review.gluster.com/8068 * fops-sanity.t -- SPURIOUS * 8014 - http://slave20.cloud.gluster.org/logs/glusterfs-logs-20140613:18:18:33.tgz * 8066 - http://slave20.cloud.gluster.org/logs/glusterfs-logs-20140614:21:35:57.tgz Still trying to figure this one out. May take a while. * bug-857330/xml.t - SPURIOUS * 7523 - logs may (?) be hard to parse due to other failure data for this CR in them * 8029 - http://slave23.cloud.gluster.org/logs/glusterfs-logs-20140613:16:46:03.tgz Kaushal, Do you want to change the regression test to expect failures in commands executed by EXPECT_WITHIN. i.e. if the command it executes fails then give different output than the one it expects. I fixed quite a few of 'heal full' based spurious failures where they wait for 'cat some-file' to give some output but by the time EXPECT_WITHIN executes 'cat' the file wouldn't even be created. I guess even normal.t will be benefited by this change? Pranith If we resolve these five, our regression testing should be a *lot* more predictable. 
:) Text file (attached to this email) has the bulk test results. Manually cut-n-pasted from browser to the text doc, so be wary of possible typos. ;) Give the output of for i in `cat problematic-ones.txt`; do echo $i $(git log $i | grep Author| tail -1); done Maybe we should make 1 BZ for the lot, and attach the logs to that BZ for later analysis? I am already using 1092850 for this. Good info. :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
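On the EXPECT_WITHIN point above, the shape of the change would be to make the probe report failure explicitly instead of printing nothing, so a missing file reads as "not ready yet". A minimal sketch with illustrative function and file names (HEAL_TIMEOUT is the existing framework timeout):

    function pending_heal_count() {
        cat $B0/${V0}0/pending-heal-file 2>/dev/null || echo "NOT_CREATED_YET"
    }
    EXPECT_WITHIN $HEAL_TIMEOUT "0" pending_heal_count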
[Gluster-devel] idea for reducing probability of spurious regression failures.
hi, Whenever a changed or new test file is submitted as part of a commit, we run it 5 (maybe 10?) times and expect it to succeed every time. This should decrease the probability of spurious regression failures. I am not sure if I have the bandwidth this month with the upcoming deadlines for 3.6. I guess I can pursue this change next month if no one completes it by then. Pranith. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
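A first cut of this could live in the regression job itself. A minimal sketch, assuming the job runs from the git checkout of the change under test:

    RUNS=5
    CHANGED_TESTS=$(git diff --name-only HEAD~1 -- tests/ | grep '\.t$')
    for t in $CHANGED_TESTS; do
        for i in $(seq 1 $RUNS); do
            prove -v "$t" || { echo "$t failed on run $i of $RUNS"; exit 1; }
        done
    done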
[Gluster-devel] quota tests and usage of sleep
hi, Could you remove the 'sleep' calls from the quota tests you authored, if it can be done? They are leading to spurious failures. I will be sending out a patch removing 'sleep' from the other tests. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
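For anyone picking this up, the general shape of the change is to wrap whatever the sleep was waiting for in a small probe and poll it with EXPECT_WITHIN. A sketch using the CLI syntax from the quota tests above (the probe name and the 20-second timeout are illustrative):

    function quota_limit_set() {
        $CLI volume quota $V0 list /test_dir | grep -q /test_dir && echo "Y" || echo "N"
    }
    TEST $CLI volume quota $V0 limit-usage /test_dir 10MB 50
    EXPECT_WITHIN 20 "Y" quota_limit_set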
[Gluster-devel] tests and umount
hi, I see that most of the tests are doing umount and these may fail sometimes because of EBUSY etc. I am wondering if we should change all of them to umount -l. Let me know if you foresee any problems. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression testing status report
On 06/16/2014 09:24 PM, Justin Clift wrote: On 16/06/2014, at 4:50 PM, Jeff Darcy wrote: Can't thank you enough for this :-) +100 Justin has done a lot of hard, tedious work whipping this infrastructure into better shape, and has significantly improved the project as a result. Such efforts deserve to be recognized. Justin, I owe you a beer. Written thanks to my manager are definitely welcome: :D Daniel Veillard veill...@redhat.com Just saying. ;) Done. + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious failure - ./tests/bugs/bug-859581.t
On 06/18/2014 10:11 AM, Atin Mukherjee wrote: On 06/18/2014 10:04 AM, Pranith Kumar Karampuri wrote: On 06/18/2014 09:39 AM, Atin Mukherjee wrote: Pranith, The regression test mentioned in $SUBJECT failed (testcases 14 and 16). The console log can be found at http://build.gluster.org/job/rackspace-regression-2GB/227/consoleFull My initial suspicion is HEAL_TIMEOUT (set to 60 seconds): healing might not have completed within this time frame, which is why EXPECT_WITHIN fails. I am not sure on what basis this HEAL_TIMEOUT value was derived. You would probably be the better person to analyse it. Would a larger timeout value help here? I don't think it is a spurious failure. There seems to be a bug in afr-v2. I will have to fix that. If it's not a spurious failure, why is it not failing every time? It depends on which subvolume afr picks in readdir. If it reads the one with the directory it will succeed; otherwise it will fail. Pranith Pranith Cheers, Atin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] tests and umount
On 06/16/2014 09:08 PM, Pranith Kumar Karampuri wrote: On 06/16/2014 09:00 PM, Jeff Darcy wrote: I see that most of the tests are doing umount and these may fail sometimes because of EBUSY etc. I am wondering if we should change all of them to umount -l. Let me know if you foresee any problems. I think I'd try umount -f first. Using -l too much can cause an accumulation of zombie mounts. When I'm hacking around on my own, I sometimes have to do umount -f twice but that's always sufficient. Cool, I will do some kind of EXPECT_WITHIN with umount -f, maybe 5 times, just to be on the safe side. I submitted http://review.gluster.com/8104 for one of the tests as it is failing frequently. Will do the next round later. Pranith If no one has any objections I will send out a patch tomorrow for this. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
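A minimal sketch of the retry idea described above, under the assumption that tests keep using a plain forced unmount rather than 'umount -l'; the function name and the $M0 mount-point variable are just illustrative:

# Hypothetical helper: try a forced unmount a few times before giving up,
# instead of falling back to a lazy unmount that can leave zombie mounts.
force_umount() {
    local mnt=$1 tries=${2:-5}
    local i
    for i in $(seq 1 "$tries"); do
        umount -f "$mnt" 2>/dev/null && return 0
        sleep 1   # give EBUSY a moment to clear
    done
    return 1
}

force_umount "$M0" 5 || echo "could not unmount $M0" >&2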
Re: [Gluster-devel] Automating spurious failure status
On 06/19/2014 06:14 PM, Justin Clift wrote: On 19/06/2014, at 1:23 PM, Pranith Kumar Karampuri wrote: hi, I was told that Justin and I were given permission to mark a patch as verified+1 when the tests that failed are spurious failures. I think this process can be automated as well. I already have a script to parse the console log to identify the tests that failed (I send mails using this, yet to automate the mailing part). What we need to do now is the following: 1) Find the list of tests that are modified/added as part of the commit. 2) Parse the list of tests that failed the full regression (I already have this script). Run 'prove' on these files separately, say 5/10 times. If a particular test fails every time, it is most likely a real failure; otherwise it is a spurious failure. If a file that is added as a new test fails even a single time, let's accept the patch only after the failures are fixed. Otherwise we can give +1 on it automatically, instead of Justin or me doing it manually. Sounds good to me. :) + Justin Also send a mail to gluster-devel about the failures for each test. We might want to make that weekly or something? There are several failures every day. :/ Agreed. Pranith + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
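A rough sketch of steps 1 and 2 above, assuming the change under test is the top commit of the checkout and the console-log parsing already exists separately; paths and the run count are placeholders:

#!/bin/bash
# Re-run every test file touched by the commit under verification a few
# times, so a reviewer can tell a real failure from a spurious one.
RUNS=5
tests=$(git diff --name-only HEAD^ HEAD -- tests/ | grep '\.t$')
status=0
for t in $tests; do
    fails=0
    for i in $(seq 1 "$RUNS"); do
        prove "$t" || fails=$((fails + 1))
    done
    echo "$t: $fails/$RUNS runs failed"
    [ "$fails" -gt 0 ] && status=1
done
exit "$status"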
Re: [Gluster-devel] 3.5.1 beta 2 Sanity tests
On 06/19/2014 11:32 PM, Justin Clift wrote: On 19/06/2014, at 6:55 PM, Benjamin Turner wrote: snip I went through these a while back and removed anything that wasn't valid for GlusterFS. This test was passing on 3.4.59 when it was released; I am thinking it may have something to do with a symlink-to-the-same-directory BZ I found a while back? I don't know; I'll get it sorted tomorrow. I got this sorted: I needed to add a sleep between the file create and the link. I ran through it manually and it worked every time; it took me a few goes to realise it was a timing issue. I didn't need this on 3.4.0.59, so is there anything that needs investigating? Any ideas? :) Nope :-( Pranith + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] regarding inode-unref on root inode
Does anyone know why inode_unref is a no-op for the root inode? I see the following code in inode.c: static inode_t * __inode_unref (inode_t *inode) { if (!inode) return NULL; if (__is_root_gfid(inode->gfid)) return inode; ... } Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding inode-unref on root inode
On 06/25/2014 11:52 AM, Raghavendra Bhat wrote: On Tuesday 24 June 2014 08:17 PM, Pranith Kumar Karampuri wrote: Does anyone know why inode_unref is a no-op for the root inode? I see the following code in inode.c: static inode_t * __inode_unref (inode_t *inode) { if (!inode) return NULL; if (__is_root_gfid(inode->gfid)) return inode; ... } I think it's done with the intention that the root inode should *never* ever get removed from the active inodes list (not even accidentally). So unref on the root inode is a no-op. Don't know whether there are any other reasons. Thanks, That helps. Pranith. Regards, Raghavendra Bhat Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Feature review: Improved rebalance performance
On 07/01/2014 11:15 AM, Harshavardhana wrote: Besides bandwidth limits, there also needs to be monitors on brick latency. We don't want so many queued iops that operating performance is impacted. AFAIK - rebalance and self-heal threads run in low-priority queue in io-threads by default. No, they don't. We tried doing that but based on experiences from users we disabled that in io-threads. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Running gfid-mismatch.t on NetBSD
Yes, this is the expected behavior. Pranith On 07/03/2014 03:25 PM, Emmanuel Dreyfus wrote: Hi Running the first test on NetBSD, I get: = TEST 11 (line 22): ! find /mnt/glusterfs/0/file | xargs stat find: /mnt/glusterfs/0/file: Input/output error not ok 11 RESULT 11: 1 = Why is this failing? If I read the test correctly, we have set up a gfid mismatch, and find /mnt/glusterfs/0/file getting EIO is the expected behavior. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Running gfid-mismatch.t on NetBSD
On 07/03/2014 04:56 PM, Emmanuel Dreyfus wrote: Pranith Kumar Karampuri pkara...@redhat.com wrote: Yes this is the expected behavior. Then is the not ok 11 something I should see? Yes. See why it is succeeding instead of failing? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-822830.t fails on release-3.5 branch
On 07/04/2014 11:19 AM, Ravishankar N wrote: On 07/04/2014 11:09 AM, Pranith Kumar Karampuri wrote: Ravi, I already sent a patch for it in the morning at http://review.gluster.com/8233 Review please :-) 830665.t is identical in master where it succeeds. Looks like *match_subnet_v4() changes in master need to be backported to 3.5 as well. That is because Avati's patch where EXPECT matches reg-ex is not present on release-3.5 commit 9a34ea6a0a95154013676cabf8528b2679fb36c4 Author: Anand Avati av...@redhat.com Date: Fri Jan 24 18:30:32 2014 -0800 tests: support regex in EXPECT constructs Instead of just strings, provide the ability to specify a regex of the pattern to expect Change-Id: I6ada978197dceecc28490a2a40de73a04ab9abcd Signed-off-by: Anand Avati av...@redhat.com Reviewed-on: http://review.gluster.org/6788 Reviewed-by: Pranith Kumar Karampuri pkara...@redhat.com Tested-by: Gluster Build System jenk...@build.gluster.com Shall we backport this? Pranith Pranith On 07/04/2014 11:00 AM, Ravishankar N wrote: Hi Niels/ Santosh, tests/bugs/bug-830665.t is consistently failing on 3.5 branch: not ok 17 Got *.redhat.com instead of \*.redhat.com not ok 19 Got 192.168.10.[1-5] instead of 192.168.10.\[1-5] and seems to be introduced by http://review.gluster.org/#/c/8223/ Could you please look into it? Thanks, Ravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
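For context, the commit quoted above makes the first argument of EXPECT a pattern rather than a literal string. A couple of hedged, self-contained illustrations (not taken from bug-830665.t itself):

# With regex-capable EXPECT, the expectation is matched as a pattern
# against the command output.
EXPECT "^[0-9]+$" echo 42                          # passes: output is all digits
EXPECT "192\.168\.10\.[1-9]" echo 192.168.10.5     # metacharacters must be escaped when meant literally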
Re: [Gluster-devel] bug-822830.t fails on release-3.5 branch
On 07/04/2014 12:00 PM, Santosh Pradhan wrote: Thanks guys for looking into this. I am just wondering how this passed the regression before Niels could merged this in? Good part is test case needs modification not code ;) There seems to be some bug in our regression testing code. Even though the regression failed it gave the verdict as SUCCESS http://build.gluster.org/job/rackspace-regression-2GB-triggered/97/consoleFull Pranith -Santosh On 07/04/2014 11:51 AM, Ravishankar N wrote: On 07/04/2014 11:20 AM, Pranith Kumar Karampuri wrote: On 07/04/2014 11:19 AM, Ravishankar N wrote: On 07/04/2014 11:09 AM, Pranith Kumar Karampuri wrote: Ravi, I already sent a patch for it in the morning at http://review.gluster.com/8233 Review please :-) 830665.t is identical in master where it succeeds. Looks like *match_subnet_v4() changes in master need to be backported to 3.5 as well. That is because Avati's patch where EXPECT matches reg-ex is not present on release-3.5 commit 9a34ea6a0a95154013676cabf8528b2679fb36c4 Author: Anand Avati av...@redhat.com Date: Fri Jan 24 18:30:32 2014 -0800 tests: support regex in EXPECT constructs Instead of just strings, provide the ability to specify a regex of the pattern to expect Change-Id: I6ada978197dceecc28490a2a40de73a04ab9abcd Signed-off-by: Anand Avati av...@redhat.com Reviewed-on: http://review.gluster.org/6788 Reviewed-by: Pranith Kumar Karampuri pkara...@redhat.com Tested-by: Gluster Build System jenk...@build.gluster.com Shall we backport this? I think we should; reviewed http://review.gluster.org/#/c/8235/. Thanks for the fix :) Pranith Pranith On 07/04/2014 11:00 AM, Ravishankar N wrote: Hi Niels/ Santosh, tests/bugs/bug-830665.t is consistently failing on 3.5 branch: not ok 17 Got *.redhat.com instead of \*.redhat.com not ok 19 Got 192.168.10.[1-5] instead of 192.168.10.\[1-5] and seems to be introduced by http://review.gluster.org/#/c/8223/ Could you please look into it? Thanks, Ravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-822830.t fails on release-3.5 branch
On 07/04/2014 12:06 PM, Harshavardhana wrote: There seems to be some bug in our regression testing code. Even though the regression failed, it gave the verdict as SUCCESS: http://build.gluster.org/job/rackspace-regression-2GB-triggered/97/consoleFull This was fixed by Justin Clift recently. All is well then :-) Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-822830.t fails on release-3.5 branch
On 07/04/2014 12:04 PM, Harshavardhana wrote: On Thu, Jul 3, 2014 at 11:30 PM, Santosh Pradhan sprad...@redhat.com wrote: Thanks guys for looking into this. I am just wondering how this passed the regression before Niels merged it in? The good part is that the test case needs modification, not the code ;) We need a single maintainer for the test cases alone to keep them stable; races like this will keep appearing as we add more and more test cases. I don't mind maintaining them along with Justin if people are okay with it. Pranith. For example, chmod.t from posix-compliance fails once in a while and it is not really maintained by us. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding inode_link/unlink
On 07/04/2014 04:28 PM, Raghavendra Gowdappa wrote: - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org, Anand Avati av...@gluster.org, Brian Foster bfos...@redhat.com, Raghavendra Gowdappa rgowd...@redhat.com, Raghavendra Bhat rab...@redhat.com Sent: Friday, July 4, 2014 3:44:29 PM Subject: regarding inode_link/unlink hi, I have a doubt about when a particular dentry_unset thus inode_unref on parent dir happens on fuse-bridge in gluster. When a file is looked up for the first time fuse_entry_cbk does 'inode_link' with parent-gfid/bname. Whenever an unlink/rmdir/(lookup gives ENOENT) happens then corresponding inode unlink happens. The question is, will the present set of operations lead to leaks: 1) Mount 'M0' creates a file 'a' 2) Mount 'M1' of same volume deletes file 'a' M0 never touches 'a' anymore. When will inode_unlink happen for such cases? Will it lead to memory leaks? Kernel will eventually send forget (a) on M0 and that will cleanup the dentries and inode. Its equivalent to a file being looked up and never used again (deleting doesn't matter in this case). Do you know the trigger points for that? When I do 'touch a' on the mount point and leave the system like that, forget is not coming. If I do unlink on the file then forget is coming. Pranith Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] triggers for sending inode forgets
hi, I work on glusterfs and was debugging a memory leak. I need your help in figuring out whether something is done properly or not. When a file is looked up for the first time in gluster through fuse, gluster remembers the (parent-inode, basename) pair for that inode. Whenever an unlink/rmdir/(lookup returning ENOENT) happens, the corresponding (parent-inode, basename) association is forgotten. In all other cases it relies on fuse to send a forget for an inode to release these associations. I was wondering what the trigger points are for fuse to send forgets. Let's say M0 and M1 are fuse mounts of the same volume. 1) Mount 'M0' creates a file 'a' 2) Mount 'M1' deletes file 'a' M0 never touches 'a' anymore. Will a forget be sent on the inode of 'a'? If yes, when? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] triggers for sending inode forgets
On 07/05/2014 08:17 AM, Anand Avati wrote: On Fri, Jul 4, 2014 at 7:03 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: hi, I work on glusterfs and was debugging a memory leak. I need your help in figuring out whether something is done properly or not. When a file is looked up for the first time in gluster through fuse, gluster remembers the (parent-inode, basename) pair for that inode. Whenever an unlink/rmdir/(lookup returning ENOENT) happens, the corresponding (parent-inode, basename) association is forgotten. This is because the path resolver explicitly calls d_invalidate() on a dentry when d_revalidate() fails on it. In all other cases it relies on fuse to send a forget for an inode to release these associations. I was wondering what the trigger points are for fuse to send forgets. Let's say M0 and M1 are fuse mounts of the same volume. 1) Mount 'M0' creates a file 'a' 2) Mount 'M1' deletes file 'a' M0 never touches 'a' anymore. Will a forget be sent on the inode of 'a'? If yes, when? It really depends on when the memory manager decides to start reclaiming memory from the dcache due to memory pressure. If the system is not under memory pressure, and if the stale dentry is never encountered by the path resolver, the inode may never receive a forget. To keep a tight utilization limit on the inode/dcache, you will have to proactively fuse_notify_inval_entry on old/deleted files. Thanks for this info, Avati. I see that in fuse-bridge for glusterfs there is a setxattr interface to do that. Is that what you are referring to? Pranith Thanks ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
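For completeness, one way to observe kernel-initiated forgets without waiting for real memory pressure is to drop the dentry/inode caches. This is only a debugging aid, and the mount path below is the placeholder used earlier in the thread:

# Reproduce the M0/M1 scenario and then force the kernel to shed unused
# dentries, after which FORGETs for stale entries should reach the client.
mount -t glusterfs gluster-host:/test /mnt/gluster   # this is "M0"
touch /mnt/gluster/a                                 # step 1 on M0
# ... delete the file from another mount (M1) of the same volume ...
sync
echo 2 > /proc/sys/vm/drop_caches   # 2 = reclaim dentries and inodes
# Then inspect the client (e.g. via a statedump) to see whether the
# inode for 'a' has been released.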
[Gluster-devel] regarding spurious failure tests/bugs/bug-1112559.t
hi Joseph, The test above failed on a documentation patch, so it has got to be a spurious failure. Check http://build.gluster.org/job/rackspace-regression-2GB-triggered/150/consoleFull for more information Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Problem with smoke, regression ordering
hi Justin, If the regression run completes before the smoke test, the green tick mark gets overwritten, and a simple glance at the list of patches no longer shows that the regression succeeded. Can we do anything about it? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] regarding message for '-1' on gerrit
hi Justin/Vijay, I always felt that '-1' saying 'I prefer you didn't submit this' is a bit harsh. Most of the time all it means is 'Needs some more changes'. Do you think we can change this message? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding message for '-1' on gerrit
On 07/06/2014 11:05 PM, Vijay Bellur wrote: On 07/06/2014 07:47 PM, Pranith Kumar Karampuri wrote: hi Justin/Vijay, I always felt '-1' saying 'I prefer you didn't submit this' is a bit harsh. Most of the times all it means is 'Need some more changes' Do you think we can change this message? The message can be changed. What would everyone like to see as appropriate messages accompanying values '-1' and '-2'? For '-1' - 'Please address the comments and Resubmit.' I am not sure about '-2' Pranith -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding message for '-1' on gerrit
On 07/07/2014 03:11 PM, Justin Clift wrote: On 07/07/2014, at 2:50 AM, Pranith Kumar Karampuri wrote: On 07/06/2014 11:05 PM, Vijay Bellur wrote: On 07/06/2014 07:47 PM, Pranith Kumar Karampuri wrote: hi Justin/Vijay, I always felt '-1' saying 'I prefer you didn't submit this' is a bit harsh. Most of the times all it means is 'Need some more changes' Do you think we can change this message? The message can be changed. What would everyone like to see as appropriate messages accompanying values '-1' and '-2'? For '-1' - 'Please address the comments and Resubmit.' That sounds good. :) I am not sure about '-2' Maybe something like? I have strong doubts about this approach (seems to reflect it's usage) Agree :-) Pranith + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding spurious failure tests/bugs/bug-1112559.t
On 07/07/2014 06:18 PM, Pranith Kumar Karampuri wrote: Joseph, Any updates on this? It failed 5 regressions today. http://build.gluster.org/job/rackspace-regression-2GB/541/consoleFull http://build.gluster.org/job/rackspace-regression-2GB-triggered/175/consoleFull http://build.gluster.org/job/rackspace-regression-2GB-triggered/173/consoleFull http://build.gluster.org/job/rackspace-regression-2GB-triggered/166/consoleFull http://build.gluster.org/job/rackspace-regression-2GB-triggered/172/consoleFull One more : http://build.gluster.org/job/rackspace-regression-2GB/543/console Pranith CC some more folks who work on snapshot. Pranith On 07/05/2014 11:19 AM, Pranith Kumar Karampuri wrote: hi Joseph, The test above failed on a documentation patch, so it has got to be a spurious failure. Check http://build.gluster.org/job/rackspace-regression-2GB-triggered/150/consoleFull for more information Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Locale problem in master
Including Bala who is the author of the commit Pranith On 07/07/2014 10:18 PM, Anders Blomdell wrote: Due to the line (commit 040319d8bced2f25bf25d8f6b937901c3a40e34b): ./libglusterfs/src/logging.c:503:setlocale(LC_ALL, ); The command env -i LC_NUMERIC=sv_SE.utf8 /usr/sbin/glusterfs ... will fail due to the fact that the swedish decimal separator is not '.', but ',', i.e. _gf_string2double will fail due to strtod ('1.0', tail) will give the tail '.0'. /Anders ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] FS Sanity daily results.
On 07/06/2014 07:58 PM, Pranith Kumar Karampuri wrote: On 07/06/2014 02:53 AM, Benjamin Turner wrote: Hi all. I have been running FS sanity on daily builds(glusterfs mounts only at this point) for a few days for a few days and I have been hitting a couple of problems: final pass/fail report = Test Date: Sat Jul 5 01:53:00 EDT 2014 Total : [44] Passed: [41] Failed: [3] Abort : [0] Crash : [0] - [ PASS ] FS Sanity Setup [ PASS ] Running tests. [ PASS ] FS SANITY TEST - arequal [ PASS ] FS SANITY LOG SCAN - arequal [ PASS ] FS SANITY LOG SCAN - bonnie [ PASS ] FS SANITY TEST - glusterfs_build [ PASS ] FS SANITY LOG SCAN - glusterfs_build [ PASS ] FS SANITY TEST - compile_kernel [ PASS ] FS SANITY LOG SCAN - compile_kernel [ PASS ] FS SANITY TEST - dbench [ PASS ] FS SANITY LOG SCAN - dbench [ PASS ] FS SANITY TEST - dd [ PASS ] FS SANITY LOG SCAN - dd [ PASS ] FS SANITY TEST - ffsb [ PASS ] FS SANITY LOG SCAN - ffsb [ PASS ] FS SANITY TEST - fileop [ PASS ] FS SANITY LOG SCAN - fileop [ PASS ] FS SANITY TEST - fsx [ PASS ] FS SANITY LOG SCAN - fsx [ PASS ] FS SANITY LOG SCAN - fs_mark [ PASS ] FS SANITY TEST - iozone [ PASS ] FS SANITY LOG SCAN - iozone [ PASS ] FS SANITY TEST - locks [ PASS ] FS SANITY LOG SCAN - locks [ PASS ] FS SANITY TEST - ltp [ PASS ] FS SANITY LOG SCAN - ltp [ PASS ] FS SANITY TEST - multiple_files [ PASS ] FS SANITY LOG SCAN - multiple_files [ PASS ] FS SANITY TEST - posix_compliance [ PASS ] FS SANITY LOG SCAN - posix_compliance [ PASS ] FS SANITY TEST - postmark [ PASS ] FS SANITY LOG SCAN - postmark [ PASS ] FS SANITY TEST - read_large [ PASS ] FS SANITY LOG SCAN - read_large [ PASS ] FS SANITY TEST - rpc [ PASS ] FS SANITY LOG SCAN - rpc [ PASS ] FS SANITY TEST - syscallbench [ PASS ] FS SANITY LOG SCAN - syscallbench [ PASS ] FS SANITY TEST - tiobench [ PASS ] FS SANITY LOG SCAN - tiobench [ PASS ] FS Sanity Cleanup [ FAIL ] FS SANITY TEST - bonnie [ FAIL ] FS SANITY TEST - fs_mark [ FAIL ] /rhs-tests/beaker/rhs/auto-tests/components/sanity/fs-sanity-tests-v2 Bonnie++ is just very slow(running for 10+ hours on 1 16 GB file) and FS mark has been failing. The bonnie slowness is in re read, here is the best explanation I can find on it: https://blogs.oracle.com/roch/entry/decoding_bonnie *Rewriting...done* This gets a little interesting. It actually reads 8K, lseek back to the start of the block, overwrites the 8K with new data and loops. (see article for more.). On FS mark I am seeing: # fs_mark -d . -D 4 -t 4 -S 5 # Version 3.3, 4 thread(s) starting at Sat Jul 5 00:54:00 2014 # Sync method: POST: Reopen and fsync() each file in order after main write loop. # Directories: Time based hash between directories across 4 subdirectories with 180 seconds per subdirectory. # File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name) # Files info: size 51200 bytes, written with an IO size of 16384 bytes per write # App overhead is time in microseconds spent in the test not doing file writing related system calls. FSUse%Count SizeFiles/sec App Overhead Error in unlink of ./00/53b784e8SKZ0QS9BO7O2EG1DIFQLRDYY : No such file or directory fopen failed to open: fs_log.txt.26676 fs-mark pass # 5 failed I am working on reporting so look for a daily status report email from my jenkins server soon. How do we want to handle failures like this moving forward? Should I just open a BZ after I triage? Do you guys do a new BZ for every failure in the normal regressions tests? Yes bz would be great with all the logs. 
For spurious regressions at least I just opened one bz and fixed all the bugs reported by Justin against that one. Ben, Did you get a chance to raise the bug? Pranith Pranith -b ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] EXPECT_WITHIN output change
hi, I sent the following patch to change the output of EXPECT_WITHIN: http://review.gluster.org/8263 Patch got one +1 and regressions passed. Merge it please :-). Test: #!/bin/bash . $(dirname $0)/../include.rc EXPECT_WITHIN 10 abc echo def EXPECT_WITHIN 10 def echo def EXPECT_WITHIN 10 abc ls asjfrhg Old-style-output: === 15:10:03 :) ⚡ prove -rfv tests/basic/self-heald-test.t tests/basic/self-heald-test.t .. 1..3 not ok 1 FAILED COMMAND: abc echo def ok 2 ls: cannot access asjfrhg: No such file or directory not ok 3 FAILED COMMAND: abc ls asjfrhg Failed 2/3 subtests Test Summary Report --- tests/basic/self-heald-test.t (Wstat: 0 Tests: 3 Failed: 2) Failed tests: 1, 3 New-style-output: root@pranithk-laptop - /home/pk1/workspace/gerrit-repo (master) 15:10:21 :( ⚡ prove -rfv tests/basic/self-heald-test.t tests/basic/self-heald-test.t .. 1..3 not ok 1 Got def instead of abc FAILED COMMAND: abc echo def ok 2 ls: cannot access asjfrhg: No such file or directory not ok 3 Got instead of abc FAILED COMMAND: abc ls asjfrhg Failed 2/3 subtests Test Summary Report --- tests/basic/self-heald-test.t (Wstat: 0 Tests: 3 Failed: 2) Failed tests: 1, 3 Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] documentation about statedump
hi, I wanted to document the core data structures and debugging infrastructure in gluster. This is the first patch in that series. Please review and provide comments. I am not very familiar with the iobuf infrastructure, so please feel free to provide comments in the patch for that section as well. I can amend the document with those changes and resend the patch. http://review.gluster.org/8288 Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] regarding warnings on master
hi Harsha, Know anything about the following warnings on latest master? In file included from msg-nfs3.h:20:0, from msg-nfs3.c:22: nlm4-xdr.h:6:14: warning: extra tokens at end of #ifndef directive [enabled by default] #ifndef _NLM4-XDR_H_RPCGEN ^ nlm4-xdr.h:7:14: warning: missing whitespace after the macro name [enabled by default] #define _NLM4-XDR_H_RPCGEN Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
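The warnings come from the rpcgen-generated include guard: the macro name is derived from the file name, so nlm4-xdr.x yields the invalid macro _NLM4-XDR_H_RPCGEN (hyphens are not legal in macro names). One possible workaround, offered here only as an assumption and not the actual fix, is a post-processing step after rpcgen:

# Hypothetical post-rpcgen fixup in the build: rewrite the invalid guard.
sed -i 's/_NLM4-XDR_H_RPCGEN/_NLM4_XDR_H_RPCGEN/g' nlm4-xdr.h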
Re: [Gluster-devel] Is this a transient failure?
On 07/11/2014 07:05 PM, Justin Clift wrote: On 11/07/2014, at 11:36 AM, Anders Blomdell wrote: In http://build.gluster.org/job/rackspace-regression-2GB-triggered/297/consoleFull, I have one failure: No volumes present read failed: No data available read returning junk fd based file operation 1 failed read failed: No data available read returning junk fstat failed : No data available fd based file operation 2 failed read failed: No data available read returning junk dup fd based file operation failed [18:51:01] ./tests/basic/fops-sanity.t ... What should I do about it? Kick off a regression test manually here, and see if the same failure occurs: http://build.gluster.org/job/rackspace-regression-2GB/ If it happens again, it's not a spurious one. I believe this is a spurious one. I didn't get a chance to debug this issue. Pranith I'll send you a login for Jenkins, so you can kick off jobs manually and stuff. + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Change in glusterfs[master]: porting: use __builtin_ffsll() instead of ffsll()
CC gluster-devel, Anuradha who committed the test. Pranith On 07/15/2014 01:58 AM, Harshavardhana wrote: Mr Spurious is here again! Patch Set 2: Verified-1 http://build.gluster.org/job/rackspace-regression-2GB-triggered/351/consoleFull : FAILED Test Summary Report --- ./tests/bugs/bug-1038598.t (Wstat: 0 Tests: 28 Failed: 1) Failed test: 28 Files=262, Tests=7738, 5904 wallclock secs ( 4.09 usr 2.35 sys + 546.04 cusr 750.64 csys = 1303.12 CPU) Result: FAIL ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Developer Documentation for datastructures in gluster
hi, Please respond if you volunteer to add documentation for any of the following things that are not already taken. client_t - pranith integration with statedump - pranith mempool - Pranith event-history + circ-buff - Raghavendra Bhat inode - Raghavendra Bhat call-stub fd iobuf graph xlator option-framework rbthash runner-framework stack/frame strfd timer store gid-cache (source is heavily documented) dict event-poll Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Developer Documentation for datastructures in gluster
On 07/15/2014 04:47 PM, Kaushal M wrote: What do you mean by 'option-framework'? Is it the xlator options table that we have in each xlator? Or the glusterd volume set framework (which requires the xlator options tables to function anyway)? options.c in libglusterfs Pranith On Tue, Jul 15, 2014 at 4:39 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: hi, Please respond if you guys volunteer to add documentation for any of the following things that are not already taken. client_t - pranith integration with statedump - pranith mempool - Pranith event-hostory + circ-buff - Raghavendra Bhat inode - Raghavendra Bhat call-stub fd iobuf graph xlator option-framework rbthash runner-framework stack/frame strfd timer store gid-cache(source is heavily documented) dict event-poll Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Developer Documentation for datastructures in gluster
On 07/15/2014 07:22 PM, Niels de Vos wrote: On Tue, Jul 15, 2014 at 08:45:45AM -0400, Jeff Darcy wrote: Please respond if you guys volunteer to add documentation for any of the following things that are not already taken. I think the most important thing to describe for each of these is the life cycle rules. When I've tried to teach people about translators, one of the biggest stumbling blocks has been the question of what gets freed after the fop, what gets freed after the callback, and what lives on even longer. There are different rules for dict_t, loc_t, inode_t, etc. Dict_set_*str is one of the worst offenders; even after all this time, I have to go back and re-check which variants do what when the dict itself is freed. If the only thing that comes out of this effort is greater clarity regarding what should be freed when, it will be worth it. client_t - pranith integration with statedump - pranith mempool - Pranith event-hostory + circ-buff - Raghavendra Bhat inode - Raghavendra Bhat call-stub fd iobuf graph xlator option-framework rbthash runner-framework stack/frame strfd timer store gid-cache(source is heavily documented) dict event-poll My Translator 101 series already covers xlators and call frames, so I might as well continue with those. Can you make these available in MarkDown format somewhere under the docs/ directory? Oops sorry. That is what we are going to do. Send patches :-). Pranith. Thanks, Niels ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious regression failures again!
hi, We have 4 tests failing once in a while causing problems: 1) tests/bugs/bug-1087198.t - Author: Varun 2) tests/basic/mgmt_v3-locks.t - Author: Avra 3) tests/basic/fops-sanity.t - Author: Pranith Please take a look at them and post updates. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Developer Documentation for datastructures in gluster
On 07/16/2014 11:57 AM, Kaushal M wrote: I'll take up documenting the options framework. I'd like take up graph and dict, if Jeff doesn't mind. Also, I think we should be aiming to document the complete API provided by these components instead of just the data structure. That would be more helpful to everyone IMO. Yes. Will keep that in mind while writing the documentation :-) Pranith ~kaushal On Wed, Jul 16, 2014 at 11:21 AM, Raghavendra Gowdappa rgowd...@redhat.com wrote: syncop-framework is not listed here. I would like to take that up. Also, if nobody is willing to pick up runner framework, I can handle that too. - Original Message - From: Krutika Dhananjay kdhan...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Wednesday, July 16, 2014 10:41:28 AM Subject: Re: [Gluster-devel] Developer Documentation for datastructures in gluster Hi, I'd like to pick up timer and call-stub. -Krutika From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Sent: Tuesday, July 15, 2014 4:39:39 PM Subject: [Gluster-devel] Developer Documentation for datastructures in gluster hi, Please respond if you guys volunteer to add documentation for any of the following things that are not already taken. client_t - pranith integration with statedump - pranith mempool - Pranith event-hostory + circ-buff - Raghavendra Bhat inode - Raghavendra Bhat call-stub fd iobuf graph xlator option-framework rbthash runner-framework stack/frame strfd timer store gid-cache(source is heavily documented) dict event-poll Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] What's the impact of enabling the profiler?
On 07/18/2014 03:05 AM, Joe Julian wrote: What impact, if any, does starting profiling (gluster volume profile $vol start) have on performance? Joe, According to the code, the only extra things it does are calling gettimeofday() at the beginning and end of each FOP to calculate latency and incrementing some counters. So I guess not much? Pranith ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
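For reference, the profiling workflow under discussion looks roughly like the sketch below; the volume name is a placeholder and the exact output columns vary by release:

# Enable profiling, run the workload, then inspect per-brick FOP stats.
gluster volume profile myvol start
# ... run the workload to be measured ...
gluster volume profile myvol info    # cumulative + interval stats per brick
gluster volume profile myvol stop    # turn the gettimeofday() accounting back off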
Re: [Gluster-devel] Inspiration for improving our contributor documentation
On 07/17/2014 07:25 PM, Kaushal M wrote: I came across mediawiki's developer documentation and guides when browsing. These docs felt really good to me, and easy to approach. I feel that we should take inspiration from them and start enhancing our docs. (Outright copying with modifications as necessary, could work too. But that just doesn't feel right) Any volunteers? (I'll start as soon as I finish with the developer documentation for data structures for the components I volunteered earlier) ~kaushal [0] - https://www.mediawiki.org/wiki/Developer_hub I love the idea but not sure about the implementation. i.e. considering we already started with .md pages, why not have same kind of pages as .md files in /doc of gluster? We can modify the README in our project so that people can browse all the details in github? Please let me know your thoughts. Pranith [1] - https://www.mediawiki.org/wiki/Category:New_contributors [2] - https://www.mediawiki.org/wiki/Gerrit/Code_review [3] - https://www.mediawiki.org/wiki/Gerrit [4] - https://www.mediawiki.org/wiki/Gerrit/Tutorial [5] - https://www.mediawiki.org/wiki/Gerrit/Getting_started [6] - https://www.mediawiki.org/wiki/Gerrit/Advanced_usage ... and lots more. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/18/2014 07:57 PM, Anders Blomdell wrote: During testing of a 3*4 gluster (from master as of yesterday), I encountered two major weirdnesses: 1. A 'rm -rf some_dir' needed several invocations to finish, each time reporting a number of lines like these: rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty 2. After having successfully deleted all files from the volume, I have a single directory that is duplicated in gluster-fuse, like this: # ls -l /mnt/gluster total 24 drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/ drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/ Any idea how to debug this issue? What are the steps to recreate? We first need to find what led to this, and then probably which xlator is responsible. Pranith /Anders ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
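As a first data point for a duplicated directory entry, a common check is whether the directory carries the same trusted.gfid on every brick; mismatching gfids across subvolumes usually explain a doubled listing. The host names and brick paths below are placeholders for the 3*4 setup described above:

# Compare the gfid of the duplicated directory across all bricks.
for h in host1 host2 host3; do
    for b in /bricks/a /bricks/b /bricks/c /bricks/d; do
        printf '%s:%s/work2 -> ' "$h" "$b"
        ssh "$h" "getfattr -n trusted.gfid -e hex $b/work2 2>/dev/null" \
            | awk -F= '/trusted.gfid/ {print $2}'
    done
done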
Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
On 07/19/2014 11:25 AM, Andrew Lau wrote: On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri pkara...@redhat.com mailto:pkara...@redhat.com wrote: On 07/18/2014 05:43 PM, Andrew Lau wrote: On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur vbel...@redhat.com mailto:vbel...@redhat.com wrote: [Adding gluster-devel] On 07/18/2014 05:20 PM, Andrew Lau wrote: Hi all, As most of you have got hints from previous messages, hosted engine won't work on gluster . A quote from BZ1097639 Using hosted engine with Gluster backed storage is currently something we really warn against. I think this bug should be closed or re-targeted at documentation, because there is nothing we can do here. Hosted engine assumes that all writes are atomic and (immediately) available for all hosts in the cluster. Gluster violates those assumptions. I tried going through BZ1097639 but could not find much detail with respect to gluster there. A few questions around the problem: 1. Can somebody please explain in detail the scenario that causes the problem? 2. Is hosted engine performing synchronous writes to ensure that writes are durable? Also, if there is any documentation that details the hosted engine architecture that would help in enhancing our understanding of its interactions with gluster. Now my question, does this theory prevent a scenario of perhaps something like a gluster replicated volume being mounted as a glusterfs filesystem and then re-exported as the native kernel NFS share for the hosted-engine to consume? It could then be possible to chuck ctdb in there to provide a last resort failover solution. I have tried myself and suggested it to two people who are running a similar setup. Now using the native kernel NFS server for hosted-engine and they haven't reported as many issues. Curious, could anyone validate my theory on this? If we obtain more details on the use case and obtain gluster logs from the failed scenarios, we should be able to understand the problem better. That could be the first step in validating your theory or evolving further recommendations :). I'm not sure how useful this is, but Jiri Moskovcak tracked this down in an off list message. Message Quote: == We were able to track it down to this (thanks Andrew for providing the testing setup): -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine' Traceback (most recent call last): File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 165, in handle response = success + self._dispatch(data) File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 261, in _dispatch .get_all_stats_for_service_type(**options) File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 41, in get_all_stats_for_service_type d = self.get_raw_stats_for_service_type(storage_dir, service_type) File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 74, in get_raw_stats_for_service_type f = os.open(path, direct_flag | os.O_RDONLY) OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata' Andrew/Jiri, Would it be possible to post gluster logs of both the mount and bricks on the bz? I can take a look at it once. If I gather nothing then probably I will ask for your help in re-creating the issue. Pranith Unfortunately, I don't have the logs for that setup any more.. I'll try replicate when I get a chance. 
If I understand the comment from the BZ, I don't think it's a gluster bug per-say, more just how gluster does its replication. hi Andrew, Thanks for that. I couldn't come to any conclusions because no logs were available. It is unlikely that self-heal is involved because there were no bricks going down/up according to the bug description. Pranith It's definitely connected to the storage which leads us to the gluster, I'm not very familiar with the gluster so I need to check this with our gluster gurus. == Thanks, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org http
Re: [Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
On 07/21/2014 02:08 PM, Jiri Moskovcak wrote: On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote: On 07/19/2014 11:25 AM, Andrew Lau wrote: On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri pkara...@redhat.com mailto:pkara...@redhat.com wrote: On 07/18/2014 05:43 PM, Andrew Lau wrote: On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur vbel...@redhat.com mailto:vbel...@redhat.com wrote: [Adding gluster-devel] On 07/18/2014 05:20 PM, Andrew Lau wrote: Hi all, As most of you have got hints from previous messages, hosted engine won't work on gluster . A quote from BZ1097639 Using hosted engine with Gluster backed storage is currently something we really warn against. I think this bug should be closed or re-targeted at documentation, because there is nothing we can do here. Hosted engine assumes that all writes are atomic and (immediately) available for all hosts in the cluster. Gluster violates those assumptions. I tried going through BZ1097639 but could not find much detail with respect to gluster there. A few questions around the problem: 1. Can somebody please explain in detail the scenario that causes the problem? 2. Is hosted engine performing synchronous writes to ensure that writes are durable? Also, if there is any documentation that details the hosted engine architecture that would help in enhancing our understanding of its interactions with gluster. Now my question, does this theory prevent a scenario of perhaps something like a gluster replicated volume being mounted as a glusterfs filesystem and then re-exported as the native kernel NFS share for the hosted-engine to consume? It could then be possible to chuck ctdb in there to provide a last resort failover solution. I have tried myself and suggested it to two people who are running a similar setup. Now using the native kernel NFS server for hosted-engine and they haven't reported as many issues. Curious, could anyone validate my theory on this? If we obtain more details on the use case and obtain gluster logs from the failed scenarios, we should be able to understand the problem better. That could be the first step in validating your theory or evolving further recommendations :). I'm not sure how useful this is, but Jiri Moskovcak tracked this down in an off list message. Message Quote: == We were able to track it down to this (thanks Andrew for providing the testing setup): -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine' Traceback (most recent call last): File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 165, in handle response = success + self._dispatch(data) File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py, line 261, in _dispatch .get_all_stats_for_service_type(**options) File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 41, in get_all_stats_for_service_type d = self.get_raw_stats_for_service_type(storage_dir, service_type) File /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py, line 74, in get_raw_stats_for_service_type f = os.open(path, direct_flag | os.O_RDONLY) OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata' Andrew/Jiri, Would it be possible to post gluster logs of both the mount and bricks on the bz? I can take a look at it once. If I gather nothing then probably I will ask for your help in re-creating the issue. 
Pranith Unfortunately, I don't have the logs for that setup any more.. I'll try replicate when I get a chance. If I understand the comment from the BZ, I don't think it's a gluster bug per-say, more just how gluster does its replication. hi Andrew, Thanks for that. I couldn't come to any conclusions because no logs were available. It is unlikely that self-heal is involved because there were no bricks going down/up according to the bug description. Hi, I've never had such setup, I guessed problem with gluster based on OSError: [Errno 116] Stale file handle: which happens when the file opened by application on client gets removed on the server. I'm pretty sure we (hosted-engine) don't remove that file, so I think it's some gluster magic moving the data around
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/21/2014 05:03 PM, Anders Blomdell wrote: On 2014-07-19 04:43, Pranith Kumar Karampuri wrote: On 07/18/2014 07:57 PM, Anders Blomdell wrote: During testing of a 3*4 gluster (from master as of yesterday), I encountered two major weirdnesses: 1. A 'rm -rf some_dir' needed several invocations to finish, each time reporting a number of lines like these: rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty 2. After having successfully deleted all files from the volume, i have a single directory that is duplicated in gluster-fuse, like this: # ls -l /mnt/gluster total 24 drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/ drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/ any idea on how to debug this issue? What are the steps to recreate? We need to first find what lead to this. Then probably which xlator leads to this. Would a pcap network dump + the result from 'tar -c --xattrs /brick/a/gluster' on all the hosts before and after the following commands are run be of any help: # mount -t glusterfs gluster-host:/test /mnt/gluster # mkdir /mnt/gluster/work2 ; # ls /mnt/gluster work2 work2 Are you using ext4? Is this on latest upstream? Pranith If so, where should I send them (size is 2*12*31MB [.tar] + 220kB [pcap]) /Anders ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/21/2014 05:17 PM, Anders Blomdell wrote: On 2014-07-21 13:36, Pranith Kumar Karampuri wrote: On 07/21/2014 05:03 PM, Anders Blomdell wrote: On 2014-07-19 04:43, Pranith Kumar Karampuri wrote: On 07/18/2014 07:57 PM, Anders Blomdell wrote: During testing of a 3*4 gluster (from master as of yesterday), I encountered two major weirdnesses: 1. A 'rm -rf some_dir' needed several invocations to finish, each time reporting a number of lines like these: rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty 2. After having successfully deleted all files from the volume, i have a single directory that is duplicated in gluster-fuse, like this: # ls -l /mnt/gluster total 24 drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/ drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/ any idea on how to debug this issue? What are the steps to recreate? We need to first find what lead to this. Then probably which xlator leads to this. Would a pcap network dump + the result from 'tar -c --xattrs /brick/a/gluster' on all the hosts before and after the following commands are run be of any help: # mount -t glusterfs gluster-host:/test /mnt/gluster # mkdir /mnt/gluster/work2 ; # ls /mnt/gluster work2 work2 Are you using ext4? Yes Is this on latest upstream? kernel is 3.14.9-200.fc20.x86_64, if that is latest upstream, I don't know. gluster is from master as of end of last week If there are known issues with ext4 i could switch to something else, but during the last 15 years or so, I have had very little problems with ext2/3/4, thats the reason for choosing it. The problem is afrv2 + dht + ext4 offsets. Soumya and Xavier were working on it last I heard(CCed) Pranith /Anders ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [Gluster-users] What's the impact of enabling the profiler?
On 07/22/2014 11:56 AM, Joe Julian wrote: On 07/21/2014 11:20 PM, Pranith Kumar Karampuri wrote: On 07/22/2014 11:39 AM, Joe Julian wrote: On 07/17/2014 07:30 PM, Pranith Kumar Karampuri wrote: On 07/18/2014 03:05 AM, Joe Julian wrote: What impact, if any, does starting profiling (gluster volume profile $vol start) have on performance? Joe, According to the code, the only extra things it does are calling gettimeofday() at the beginning and end of each FOP to calculate latency and incrementing some counters. So I guess not much? So far so good. Is the only way to clear the stats to restart the brick? I think when the feature was initially proposed we wanted two things: 1) cumulative stats 2) interval stats. Interval stats get cleared whenever 'gluster volume profile volname info' is executed (although counting starts again with the next set of fops that happen after this command execution). But there is no way to clear the cumulative stats. It would be nice if you could give some feedback about what you liked and what you think should change to make better use of it. So I am guessing there wasn't a big performance hit? Pranith No noticeable performance hit, no. I'm writing a whitepaper on best practices for OpenStack on GlusterFS, so I needed some idea of how qemu actually uses the filesystem and what the operations are, so I can look at not only the best ways to tune for that use, but how to build the systems around that. At this point, I'm just collecting data. TBH, I hadn't noticed the interval data. That should be perfect for this. I'll poll it in XML and run the numbers in a few days. Joe, Do let us know your feedback. It needs some real-world usage suggestions from users like you :-). Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
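A hedged sketch of the polling idea Joe mentions: each 'info' call resets the interval window, so sampling on a fixed period gives per-period FOP counts and latencies. The volume name and sampling period are placeholders, and the exact placement of the --xml option should be checked against the installed CLI version:

# Collect interval profile data every 60 seconds into timestamped XML files.
vol=gv-nova                      # placeholder volume name
while true; do
    ts=$(date +%Y%m%d-%H%M%S)
    gluster volume profile "$vol" info --xml > "profile-$vol-$ts.xml"
    sleep 60
done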
Re: [Gluster-devel] Developer Documentation for datastructures in gluster
Here is my first draft of mem-pool data structure for review: http://review.gluster.org/8343 Please don't laugh at the ascii art ;-). Pranith On 07/17/2014 04:10 PM, Ravishankar N wrote: On 07/15/2014 04:39 PM, Pranith Kumar Karampuri wrote: hi, Please respond if you guys volunteer to add documentation for any of the following things that are not already taken. client_t - pranith integration with statedump - pranith mempool - Pranith event-hostory + circ-buff - Raghavendra Bhat inode - Raghavendra Bhat call-stub fd iobuf graph xlator option-framework rbthash runner-framework stack/frame strfd timer store gid-cache(source is heavily documented) dict event-poll I'll take up event-poll. I have created an etherpad link with the components and volunteers thus far: https://etherpad.wikimedia.org/p/glusterdoc Feel free to update this doc with your patch details, other components etc. - Ravi Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Symlinks change date while migrating
On 07/23/2014 02:44 PM, Anders Blomdell wrote: When migrating approx 1 GB of data data by doing gluster volume add-brick test new-host1:/path/to/new/brick ... gluster volume remove-brick old-host1:/path/to/old/brick ... start ... wait for removal to finish gluster volume remove-brick old-host1:/path/to/old/brick ... commit on a 3*4 - 6*4 - 3*4 gluster [version 3.7dev-0.9.git5b8de97] approximately 40% of the symlinks change their mtime to the time they were copied. Is this expected/known or should I file a bug? hi, Seems like a dht issue. File a bug. Assign the component to dht/'distribute' for now. If it is different component, assignee of that bug can change it accordingly. Pranith /Anders ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Can anyone else shed any light on this warning?
On 07/26/2014 03:06 AM, Joe Julian wrote: How can it come about? Is this from replacing a brick days ago? Can I prevent it from happening? [2014-07-25 07:00:29.287680] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x7f17cb846444 inode-gfid:87544fde-9bad-46d8-b610-1a8c93b85113) did not complete, failing fop with EBADF (old-subvolume:gv-nova-3 new-subvolume:gv-nova-4) It's critical because it causes a segfault every time. :( Joe, This is fd migration code. When a brick layout changes (graph change) the file needs to be re-opened in the new graph. This re-open seemed to have failed. It leads to crash probably because extra unref in failure code path. Could you add brick/mount logs to the bug https://bugzilla.redhat.com/show_bug.cgi?id=1123289. What is the configuration of the volume? pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Can anyone else shed any light on this warning?
On 07/26/2014 11:06 AM, Pranith Kumar Karampuri wrote: On 07/26/2014 03:06 AM, Joe Julian wrote: How can it come about? Is this from replacing a brick days ago? Can I prevent it from happening? [2014-07-25 07:00:29.287680] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd (ptr:0x7f17cb846444 inode-gfid:87544fde-9bad-46d8-b610-1a8c93b85113) did not complete, failing fop with EBADF (old-subvolume:gv-nova-3 new-subvolume:gv-nova-4) It's critical because it causes a segfault every time. :( Joe, This is fd migration code. When a brick layout changes (graph change) the file needs to be re-opened in the new graph. This re-open seemed to have failed. It leads to crash probably because extra unref in failure code path. Could you add brick/mount logs to the bug https://bugzilla.redhat.com/show_bug.cgi?id=1123289. What is the configuration of the volume? I checked the code, I don't see any extra unrefs as of now. Please provide the details I asked for in the bug. CC Raghavendra G, Raghavendra Bhat who know this code path a bit more. Pranith pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] How should I submit a testcase without proper solution
hi Anders, Generally, new test cases are submitted along with the fix, precisely to avoid this situation. You can either submit the fix together with the test case, or, if you are not actively working on a fix, wait until one is submitted by someone else; we can then re-trigger the regression build for this patch and it will be taken in. Pranith
On 07/29/2014 08:51 PM, Anders Blomdell wrote: Hi, I finally got around to looking into "Symlink mtime changes when rebalancing" (https://bugzilla.redhat.com/show_bug.cgi?id=1122443) and have submitted a test case (http://review.gluster.org/#/c/8383/), but it is expected to fail (since I have not managed to write a patch that addresses the problem), and hence it will be voted down by Jenkins. Is there something I should do about this? /Anders ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Monotonically increasing memory
Yes, even I saw the following leaks when I tested it a week back. You should probably take a statedump and see what datatypes are leaking. These were the leaks:

root@localhost - /usr/local/var/run/gluster 14:10:26 ? awk -f /home/pk1/mem-leaks.awk glusterdump.22412.dump.1406174043
[mount/fuse.fuse - usage-type gf_common_mt_char memusage] size=341240 num_allocs=23602 max_size=347987 max_num_allocs=23604 total_allocs=653194
[mount/fuse.fuse - usage-type gf_common_mt_mem_pool memusage] size=4335440 num_allocs=45159 max_size=7509032 max_num_allocs=77391 total_allocs=530058
[performance/quick-read.r2-quick-read - usage-type gf_common_mt_asprintf memusage] size=182526 num_allocs=30421 max_size=182526 max_num_allocs=30421 total_allocs=30421
[performance/quick-read.r2-quick-read - usage-type gf_common_mt_char memusage] size=547578 num_allocs=30421 max_size=547578 max_num_allocs=30421 total_allocs=30421
[performance/quick-read.r2-quick-read - usage-type gf_common_mt_mem_pool memusage] size=3117196 num_allocs=52999 max_size=3117368 max_num_allocs=53000 total_allocs=109484
[cluster/distribute.r2-dht - usage-type gf_common_mt_asprintf memusage] size=257304 num_allocs=82988 max_size=257304 max_num_allocs=82988 total_allocs=97309
[cluster/distribute.r2-dht - usage-type gf_common_mt_char memusage] size=2082904 num_allocs=82985 max_size=2082904 max_num_allocs=82985 total_allocs=101346
[cluster/distribute.r2-dht - usage-type gf_common_mt_mem_pool memusage] size=9958372 num_allocs=165972 max_size=9963396 max_num_allocs=165980 total_allocs=467956
[performance/quick-read.r2-quick-read - usage-type gf_common_mt_asprintf memusage] size=182526 num_allocs=30421 max_size=182526 max_num_allocs=30421 total_allocs=30421
[performance/quick-read.r2-quick-read - usage-type gf_common_mt_char memusage] size=547578 num_allocs=30421 max_size=547578 max_num_allocs=30421 total_allocs=30421
[performance/quick-read.r2-quick-read - usage-type gf_common_mt_mem_pool memusage] size=3117196 num_allocs=52999 max_size=3117368 max_num_allocs=53000 total_allocs=109484
[cluster/distribute.r2-dht - usage-type gf_common_mt_asprintf memusage] size=257304 num_allocs=82988 max_size=257304 max_num_allocs=82988 total_allocs=97309
[cluster/distribute.r2-dht - usage-type gf_common_mt_char memusage] size=2082904 num_allocs=82985 max_size=2082904 max_num_allocs=82985 total_allocs=101346
[cluster/distribute.r2-dht - usage-type gf_common_mt_mem_pool memusage] size=9958372 num_allocs=165972 max_size=9963396 max_num_allocs=165980 total_allocs=467956
root@localhost - /usr/local/var/run/gluster 14:10:28 ?

Pranith
On 08/01/2014 12:01 AM, Anders Blomdell wrote: During an rsync of 35 files, memory consumption of glusterfs rose to 12 GB (after approx 14 hours). I take it this is a bug I should try to track down? Version is 3.7dev as of Tuesday... /Anders ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
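For context on how to read the numbers above: builds with memory accounting keep per-translator, per-type counters (size, num_allocs, max_size, max_num_allocs, total_allocs) and the statedump prints them; a type whose size and num_allocs only ever grow across successive dumps is a leak suspect. Below is a rough C sketch of that style of accounting. The names, the header layout, and the dump format are invented for illustration; this is not the actual GlusterFS memory-accounting code.

    /* Illustrative per-type allocation accounting, similar in spirit to the
     * counters shown in the dump above. All names are invented. */
    #include <stdio.h>
    #include <stdlib.h>

    enum { MT_CHAR, MT_ASPRINTF, MT_MEM_POOL, MT_MAX };

    struct mem_acct_rec {
        size_t size;           /* bytes currently allocated for this type */
        size_t num_allocs;     /* live allocations for this type */
        size_t max_size;       /* high-water mark of size */
        size_t max_num_allocs; /* high-water mark of num_allocs */
        size_t total_allocs;   /* allocations ever made for this type */
    };

    static struct mem_acct_rec acct[MT_MAX];

    struct hdr { size_t size; int type; };  /* bookkeeping header per allocation */

    static void *acct_malloc(size_t size, int type)
    {
        struct hdr *h = malloc(sizeof(*h) + size);
        if (!h)
            return NULL;
        h->size = size;
        h->type = type;
        acct[type].size += size;
        acct[type].num_allocs++;
        acct[type].total_allocs++;
        if (acct[type].size > acct[type].max_size)
            acct[type].max_size = acct[type].size;
        if (acct[type].num_allocs > acct[type].max_num_allocs)
            acct[type].max_num_allocs = acct[type].num_allocs;
        return h + 1;
    }

    static void acct_free(void *ptr)
    {
        struct hdr *h = (struct hdr *)ptr - 1;
        acct[h->type].size -= h->size;
        acct[h->type].num_allocs--;
        free(h);
    }

    /* Print the counters in a statedump-like layout. */
    static void acct_dump(void)
    {
        static const char *names[MT_MAX] = {
            "mt_char", "mt_asprintf", "mt_mem_pool"
        };
        int i;
        for (i = 0; i < MT_MAX; i++)
            printf("[usage-type %s] size=%zu num_allocs=%zu max_size=%zu "
                   "max_num_allocs=%zu total_allocs=%zu\n",
                   names[i], acct[i].size, acct[i].num_allocs, acct[i].max_size,
                   acct[i].max_num_allocs, acct[i].total_allocs);
    }

Taking two statedumps some time apart and diffing counters like these per type is usually enough to narrow a leak down to a translator and an allocation type.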
[Gluster-devel] regarding mempool documentation patch
hi, If there are no more comments, could we take http://review.gluster.com/#/c/8343 in? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] regarding resolution for fuse/server
hi, Does anyone know why there is different code for resolution in fuse and in server? There are some behavioural differences too; for example, server asserts on the resolution types such as RESOLVE_MUST/RESOLVE_NOT, whereas fuse doesn't do any such thing. I am wondering whether there is a reason why the code is different in these two xlators. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
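For readers unfamiliar with the resolver: the resolution types mentioned above express the precondition a fop places on the entry it is about to operate on (for example, a create expects the entry not to exist, while an open expects it to exist). The sketch below is hypothetical; the enum values are modelled loosely on the names in the mail, but this is not the server or fuse xlator code.

    /* Hypothetical sketch of resolution-type checks; not GlusterFS xlator code. */
    #include <assert.h>
    #include <stddef.h>

    typedef enum {
        RESOLVE_DFLT,   /* no particular requirement                  */
        RESOLVE_MUST,   /* the entry must already exist (e.g. open)   */
        RESOLVE_NOT,    /* the entry must NOT exist (e.g. create)     */
        RESOLVE_MAY     /* either outcome is acceptable (e.g. lookup) */
    } resolve_type_t;

    struct resolve_result {
        resolve_type_t type;
        void          *inode;     /* non-NULL if the entry was found */
        int            op_errno;  /* error from the resolution step  */
    };

    /* After resolution, enforce the fop's precondition. */
    static int resolve_check(struct resolve_result *res)
    {
        switch (res->type) {
        case RESOLVE_MUST:
            /* A server-side resolver could assert here, on the assumption
             * that the protocol client sends well-formed requests. */
            assert(res->inode != NULL || res->op_errno != 0);
            return res->inode ? 0 : -1;
        case RESOLVE_NOT:
            return res->inode ? -1 : 0;
        case RESOLVE_MAY:
        case RESOLVE_DFLT:
            return 0;
        }
        return -1;
    }

One plausible reading of the difference asked about in the mail is visible in a function like resolve_check(): a server-side resolver can afford to assert on malformed resolution requests, while a fuse resolver has to cope with whatever the kernel sends without asserting.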