Re: [Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

2014-10-31 Thread Kiran Patil
I set the zfs pool failmode to continue, which should fail only writes and
not reads, as explained below:

failmode=wait | continue | panic

    Controls the system behavior in the event of catastrophic pool
    failure. This condition is typically a result of a loss of
    connectivity to the underlying storage device(s) or a failure of
    all devices within the pool. The behavior of such an event is
    determined as follows:

    wait      Blocks all I/O access until the device connectivity is
              recovered and the errors are cleared. This is the
              default behavior.

    continue  Returns EIO to any new write I/O requests but allows
              reads to any of the remaining healthy devices. Any write
              requests that have yet to be committed to disk would be
              blocked.

    panic     Prints out a message to the console and generates a
              system crash dump.
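
For reference, the property can be set and verified roughly as follows (a
sketch; the pool name zp2 is taken from this setup, adjust as needed):

# zpool set failmode=continue zp2
# zpool get failmode zp2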


Now, I rebuilt glusterfs master and tried to see whether a failed drive
results in a failed brick and in turn kills the brick process, but the
brick is not going offline.

# gluster volume status
Status of volume: repvol
Gluster process Port Online Pid
--
Brick 192.168.1.246:/zp1/brick1 49152 Y 2400
Brick 192.168.1.246:/zp2/brick2 49153 Y 2407
NFS Server on localhost 2049 Y 30488
Self-heal Daemon on localhost N/A Y 30495

Task Status of Volume repvol
--
There are no active volume tasks
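
For reference, the brick processes can also be checked directly by PID (a
sketch; PIDs taken from the status output above):

# ps -p 2400,2407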


The /var/log/gluster/mnt.log output:

[2014-10-31 09:18:15.934700] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk]
0-repvol-client-1: socket disconnected
[2014-10-31 09:18:15.934725] I [client.c:2215:client_rpc_notify]
0-repvol-client-1: disconnected from repvol-client-1. Client process will
keep trying to connect to glusterd until brick's port is available
[2014-10-31 09:18:15.935238] I [rpc-clnt.c:1765:rpc_clnt_reconfig]
0-repvol-client-1: changing port to 49153 (from 0)

Now if I copy a file to /mnt, it copies without any hang and the brick
still shows as online.

Thanks,
Kiran.

On Tue, Oct 28, 2014 at 3:44 PM, Niels de Vos nde...@redhat.com wrote:

 On Tue, Oct 28, 2014 at 02:08:32PM +0530, Kiran Patil wrote:
  The content of file zp2-brick2.log is at http://ur1.ca/iku0l (
  http://fpaste.org/145714/44849041/ )
 
  I can't open the file /zp2/brick2/.glusterfs/health_check since it hangs
  due to no disk present.
 
  Let me know the filename pattern, so that I can find it.

 Hmm, if there is a hang while reading from the disk, it will not get
 detected in the current solution. We implemented failure detection on
 top of the detection that is done by the filesystem. Suspending a
 filesystem with fsfreeze or similar should probably not be seen as a
 failure.

 In your case, it seems that the filesystem suspends itself when the disk
 went away. I have no idea if it is possible to configure ZFS to not
 suspend, but return an error to the reading/writing application. Please
 check with such an option.

 If you find such an option, please update the wiki page and recommend
 enabling it:
 - http://gluster.org/community/documentation/index.php/GlusterOnZFS


 Thanks,
 Niels


 
  On Tue, Oct 28, 2014 at 1:42 PM, Niels de Vos nde...@redhat.com wrote:
 
   On Tue, Oct 28, 2014 at 01:10:56PM +0530, Kiran Patil wrote:
I applied the patches, compiled and installed the gluster.
   
# glusterfs --version
glusterfs 3.7dev built on Oct 28 2014 12:03:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. http://www.redhat.com/
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
   
# git log
commit 990ce16151c3af17e4cdaa94608b737940b60e4d
Author: Lalatendu Mohanty lmoha...@redhat.com
Date:   Tue Jul 1 07:52:27 2014 -0400
   
Posix: Brick failure detection fix for ext4 filesystem
...
...
   
I see below messages
  
   Many thanks Kiran!
  
   Do you have the messages from the brick that uses the zp2 mountpoint?
  
   There also should be a file with a timestamp when the last check was
   done successfully. If the brick is still running, this timestamp should
   get updated every storage.health-check-interval seconds:
   /zp2/brick2/.glusterfs/health_check
  
   Niels
  
   
File /var/log/glusterfs/etc-glusterfs-glusterd.vol.log :
   
The message I [MSGID: 106005]
[glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management:
 Brick
192.168.1.246:/zp2/brick2 has disconnected from glusterd. repeated
 39
times between [2014-10-28 05:58:09.209419] and [2014-10-28
   06:00:06.226330]
[2014-10-28 

Re: [Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

2014-10-31 Thread Kiran Patil
I am not seeing the below message in any log files under the
/var/log/glusterfs directory and its subdirectories:

"health-check failed, going down"
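
For reference, a minimal sketch of such a search (assuming the default log
directory):

# grep -r "health-check failed" /var/log/glusterfs/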


On Fri, Oct 31, 2014 at 3:16 PM, Kiran Patil ki...@fractalio.com wrote:

 I set the zfs pool failmode to continue, which should fail only writes and
 not reads, as explained below:

 failmode=wait | continue | panic

     Controls the system behavior in the event of catastrophic pool
     failure. This condition is typically a result of a loss of
     connectivity to the underlying storage device(s) or a failure of
     all devices within the pool. The behavior of such an event is
     determined as follows:

     wait      Blocks all I/O access until the device connectivity is
               recovered and the errors are cleared. This is the
               default behavior.

     continue  Returns EIO to any new write I/O requests but allows
               reads to any of the remaining healthy devices. Any write
               requests that have yet to be committed to disk would be
               blocked.

     panic     Prints out a message to the console and generates a
               system crash dump.


 Now, I rebuilt glusterfs master and tried to see whether a failed drive
 results in a failed brick and in turn kills the brick process, but the
 brick is not going offline.

 # gluster volume status
 Status of volume: repvol
 Gluster process Port Online Pid

 --
 Brick 192.168.1.246:/zp1/brick1 49152 Y 2400
 Brick 192.168.1.246:/zp2/brick2 49153 Y 2407
 NFS Server on localhost 2049 Y 30488
 Self-heal Daemon on localhost N/A Y 30495

 Task Status of Volume repvol

 --
 There are no active volume tasks


 The /var/log/gluster/mnt.log output:

 [2014-10-31 09:18:15.934700] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk]
 0-repvol-client-1: socket disconnected
 [2014-10-31 09:18:15.934725] I [client.c:2215:client_rpc_notify]
 0-repvol-client-1: disconnected from repvol-client-1. Client process will
 keep trying to connect to glusterd until brick's port is available
 [2014-10-31 09:18:15.935238] I [rpc-clnt.c:1765:rpc_clnt_reconfig]
 0-repvol-client-1: changing port to 49153 (from 0)

 Now if I copy a file to /mnt, it copies without any hang and the brick
 still shows as online.

 Thanks,
 Kiran.

 On Tue, Oct 28, 2014 at 3:44 PM, Niels de Vos nde...@redhat.com wrote:

 On Tue, Oct 28, 2014 at 02:08:32PM +0530, Kiran Patil wrote:
  The content of file zp2-brick2.log is at http://ur1.ca/iku0l (
  http://fpaste.org/145714/44849041/ )
 
  I can't open the file /zp2/brick2/.glusterfs/health_check since it hangs
  due to no disk present.
 
  Let me know the filename pattern, so that I can find it.

 Hmm, if there is a hang while reading from the disk, it will not get
 detected in the current solution. We implemented failure detection on
 top of the detection that is done by the filesystem. Suspending a
 filesystem with fsfreeze or similar should probably not be seen as a
 failure.

 In your case, it seems that the filesystem suspends itself when the disk
 went away. I have no idea if it is possible to configure ZFS to not
 suspend, but return an error to the reading/writing application. Please
 check with such an option.

 If you find such an option, please update the wiki page and recommend
 enabling it:
 - http://gluster.org/community/documentation/index.php/GlusterOnZFS


 Thanks,
 Niels


 
  On Tue, Oct 28, 2014 at 1:42 PM, Niels de Vos nde...@redhat.com
 wrote:
 
   On Tue, Oct 28, 2014 at 01:10:56PM +0530, Kiran Patil wrote:
I applied the patches, compiled and installed the gluster.
   
# glusterfs --version
glusterfs 3.7dev built on Oct 28 2014 12:03:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. http://www.redhat.com/
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
   
# git log
commit 990ce16151c3af17e4cdaa94608b737940b60e4d
Author: Lalatendu Mohanty lmoha...@redhat.com
Date:   Tue Jul 1 07:52:27 2014 -0400
   
Posix: Brick failure detection fix for ext4 filesystem
...
...
   
I see below messages
  
   Many thanks Kiran!
  
   Do you have the messages from the brick that uses the zp2 mountpoint?
  
   There also should be a file with a timestamp when the last check was
   done successfully. If the brick is still running, this timestamp
 should
   get updated every storage.health-check-interval seconds:
   /zp2/brick2/.glusterfs/health_check
  
   Niels
  
   
File /var/log/glusterfs/etc-glusterfs-glusterd.vol.log :
   
The message I [MSGID: 

Re: [Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

2014-10-28 Thread Kiran Patil
I changed "git fetch git://review.gluster.org/glusterfs" to "git fetch
http://review.gluster.org/glusterfs" and now it works.
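
In other words, the fetch that worked looks roughly like this (a sketch;
the change ref is taken from the steps below):

# git fetch http://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD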

Thanks,
Kiran.

On Tue, Oct 28, 2014 at 11:13 AM, Kiran Patil ki...@fractalio.com wrote:

 Hi Niels,

 I am getting a "fatal: Couldn't find remote ref refs/changes/13/8213/9"
 error.

 Steps to reproduce the issue.

 1) # git clone git://review.gluster.org/glusterfs
 Initialized empty Git repository in /root/gluster-3.6/glusterfs/.git/
 remote: Counting objects: 84921, done.
 remote: Compressing objects: 100% (48307/48307), done.
 remote: Total 84921 (delta 57264), reused 63233 (delta 36254)
 Receiving objects: 100% (84921/84921), 23.23 MiB | 192 KiB/s, done.
 Resolving deltas: 100% (57264/57264), done.

 2) # cd glusterfs
 # git branch
 * master

 3) # git fetch git://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD
 fatal: Couldn't find remote ref refs/changes/13/8213/9

 Note: I also tried the above steps on the git repo
 https://github.com/gluster/glusterfs and the result is the same as above.

 Please let me know if I miss any steps.

 Thanks,
 Kiran.

 On Mon, Oct 27, 2014 at 5:53 PM, Niels de Vos nde...@redhat.com wrote:

 On Mon, Oct 27, 2014 at 05:19:13PM +0530, Kiran Patil wrote:
  Hi,
 
  I created replicated vol with two bricks on the same node and copied
 some
  data to it.
 
  Now I removed the disk which hosted one of the bricks of the volume.
 
  Storage.health-check-interval is set to 30 seconds.
 
  I could see the disk is unavailable using the zpool command of ZFS on
  Linux, but gluster volume status still displays the brick process as
  running, when it should have been shut down by this time.
 
  Is this a bug in 3.6, since it is mentioned as a feature at
  https://github.com/gluster/glusterfs/blob/release-3.6/doc/features/brick-failure-detection.md
  or am I making a mistake here?

 The initial detection of brick failures did not work for all
 filesystems. It may not work for ZFS either. A fix has been posted, but it
 has not been merged into the master branch yet. When the change has been
 merged, it can get backported to 3.6 and 3.5.

 You may want to test with the patch applied, and add your +1 Verified
 to the change in case it makes it functional for you:
 - http://review.gluster.org/8213

 Cheers,
 Niels

 
  [root@fractal-c92e gluster-3.6]# gluster volume status
  Status of volume: repvol
  Gluster process Port Online Pid
 
 --
  Brick 192.168.1.246:/zp1/brick1 49154 Y 17671
  Brick 192.168.1.246:/zp2/brick2 49155 Y 17682
  NFS Server on localhost 2049 Y 17696
  Self-heal Daemon on localhost N/A Y 17701
 
  Task Status of Volume repvol
 
 --
  There are no active volume tasks
 
 
  [root@fractal-c92e gluster-3.6]# gluster volume info
 
  Volume Name: repvol
  Type: Replicate
  Volume ID: d4f992b1-1393-43b8-9fda-2e2b6e3b5039
  Status: Started
  Number of Bricks: 1 x 2 = 2
  Transport-type: tcp
  Bricks:
  Brick1: 192.168.1.246:/zp1/brick1
  Brick2: 192.168.1.246:/zp2/brick2
  Options Reconfigured:
  storage.health-check-interval: 30
 
  [root@fractal-c92e gluster-3.6]# zpool status zp2
pool: zp2
   state: UNAVAIL
  status: One or more devices are faulted in response to IO failures.
  action: Make sure the affected devices are connected, then run 'zpool
  clear'.
 see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: none requested
  config:
 
  NAMESTATE READ WRITE CKSUM
  zp2 UNAVAIL  0 0 0  insufficient replicas
sdb   UNAVAIL  0 0 0
 
  errors: 2 data errors, use '-v' for a list
 
 
  Thanks,
  Kiran.

  ___
  Gluster-devel mailing list
  Gluster-devel@gluster.org
  http://supercolony.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

2014-10-28 Thread Kiran Patil
I applied the patches, compiled, and installed gluster.

# glusterfs --version
glusterfs 3.7dev built on Oct 28 2014 12:03:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. http://www.redhat.com/
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

# git log
commit 990ce16151c3af17e4cdaa94608b737940b60e4d
Author: Lalatendu Mohanty lmoha...@redhat.com
Date:   Tue Jul 1 07:52:27 2014 -0400

Posix: Brick failure detection fix for ext4 filesystem
...
...

I see below messages

File /var/log/glusterfs/etc-glusterfs-glusterd.vol.log :

The message I [MSGID: 106005]
[glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
192.168.1.246:/zp2/brick2 has disconnected from glusterd. repeated 39
times between [2014-10-28 05:58:09.209419] and [2014-10-28 06:00:06.226330]
[2014-10-28 06:00:09.226507] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:09.226712] I [MSGID: 106005]
[glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
192.168.1.246:/zp2/brick2 has disconnected from glusterd.
[2014-10-28 06:00:12.226881] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:15.227249] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:18.227616] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:21.227976] W [socket.c:545:__socket_rwv] 0-management:
readv on

.
.

[2014-10-28 06:19:15.142867] I
[glusterd-handler.c:1280:__glusterd_handle_cli_get_volume] 0-glusterd:
Received get vol req
The message I [MSGID: 106005]
[glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
192.168.1.246:/zp2/brick2 has disconnected from glusterd. repeated 12
times between [2014-10-28 06:18:09.368752] and [2014-10-28 06:18:45.373063]
[2014-10-28 06:23:38.207649] W [glusterfsd.c:1194:cleanup_and_exit] (--
0-: received signum (15), shutting down


dmesg output:

SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
encountered an uncorrectable I/O failure and has been suspended.

SPLError: 7868:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
encountered an uncorrectable I/O failure and has been suspended.

SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
encountered an uncorrectable I/O failure and has been suspended.

The brick is still online.

# gluster volume status
Status of volume: repvol
Gluster process Port Online Pid
--
Brick 192.168.1.246:/zp1/brick1 49152 Y 4067
Brick 192.168.1.246:/zp2/brick2 49153 Y 4078
NFS Server on localhost 2049 Y 4092
Self-heal Daemon on localhost N/A Y 4097

Task Status of Volume repvol
--
There are no active volume tasks

# gluster volume info

Volume Name: repvol
Type: Replicate
Volume ID: ba1e7c6d-1e1c-45cd-8132-5f4fa4d2d22b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.246:/zp1/brick1
Brick2: 192.168.1.246:/zp2/brick2
Options Reconfigured:
storage.health-check-interval: 30
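
For reference, the option shown above can be set with something like the
following (a sketch; volume name repvol from this setup):

# gluster volume set repvol storage.health-check-interval 30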

Let me know if you need further information.

Thanks,
Kiran.

On Tue, Oct 28, 2014 at 11:44 AM, Kiran Patil ki...@fractalio.com wrote:

 I changed "git fetch git://review.gluster.org/glusterfs" to "git fetch
 http://review.gluster.org/glusterfs" and now it works.

 Thanks,
 Kiran.

 On Tue, Oct 28, 2014 at 11:13 AM, Kiran Patil ki...@fractalio.com wrote:

 Hi Niels,

 I am getting a "fatal: Couldn't find remote ref refs/changes/13/8213/9"
 error.

 Steps to reproduce the issue.

 1) # git clone git://review.gluster.org/glusterfs
 Initialized empty Git repository in /root/gluster-3.6/glusterfs/.git/
 remote: Counting objects: 84921, done.
 remote: Compressing objects: 100% (48307/48307), done.
 remote: Total 84921 (delta 57264), reused 63233 (delta 36254)
 Receiving objects: 100% (84921/84921), 23.23 MiB | 192 KiB/s, done.
 Resolving deltas: 100% (57264/57264), done.

 2) # cd glusterfs
 # git branch
 * master

 3) # git fetch git://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD
 fatal: Couldn't find remote ref refs/changes/13/8213/9

 Note: I also tried the above steps on the git repo
 https://github.com/gluster/glusterfs and the result is the same as above.

 Please let me know if I miss any steps.

 Thanks,
 Kiran.

 On Mon, Oct 27, 2014 at 5:53 PM, Niels de Vos 

Re: [Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

2014-10-28 Thread Niels de Vos
On Tue, Oct 28, 2014 at 01:10:56PM +0530, Kiran Patil wrote:
 I applied the patches, compiled and installed the gluster.
 
 # glusterfs --version
 glusterfs 3.7dev built on Oct 28 2014 12:03:10
 Repository revision: git://git.gluster.com/glusterfs.git
 Copyright (c) 2006-2013 Red Hat, Inc. http://www.redhat.com/
 GlusterFS comes with ABSOLUTELY NO WARRANTY.
 It is licensed to you under your choice of the GNU Lesser
 General Public License, version 3 or any later version (LGPLv3
 or later), or the GNU General Public License, version 2 (GPLv2),
 in all cases as published by the Free Software Foundation.
 
 # git log
 commit 990ce16151c3af17e4cdaa94608b737940b60e4d
 Author: Lalatendu Mohanty lmoha...@redhat.com
 Date:   Tue Jul 1 07:52:27 2014 -0400
 
 Posix: Brick failure detection fix for ext4 filesystem
 ...
 ...
 
 I see below messages

Many thanks Kiran!

Do you have the messages from the brick that uses the zp2 mountpoint?

There also should be a file with a timestamp when the last check was
done successfully. If the brick is still running, this timestamp should
get updated every storage.health-check-interval seconds:
/zp2/brick2/.glusterfs/health_check
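
For example, the last-check timestamp can be inspected with something along
these lines (a sketch; path as above):

# cat /zp2/brick2/.glusterfs/health_check
# stat -c '%y' /zp2/brick2/.glusterfs/health_check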

Niels

 
 File /var/log/glusterfs/etc-glusterfs-glusterd.vol.log :
 
 The message I [MSGID: 106005]
 [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
 192.168.1.246:/zp2/brick2 has disconnected from glusterd. repeated 39
 times between [2014-10-28 05:58:09.209419] and [2014-10-28 06:00:06.226330]
 [2014-10-28 06:00:09.226507] W [socket.c:545:__socket_rwv] 0-management:
 readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
 argument)
 [2014-10-28 06:00:09.226712] I [MSGID: 106005]
 [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
 192.168.1.246:/zp2/brick2 has disconnected from glusterd.
 [2014-10-28 06:00:12.226881] W [socket.c:545:__socket_rwv] 0-management:
 readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
 argument)
 [2014-10-28 06:00:15.227249] W [socket.c:545:__socket_rwv] 0-management:
 readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
 argument)
 [2014-10-28 06:00:18.227616] W [socket.c:545:__socket_rwv] 0-management:
 readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
 argument)
 [2014-10-28 06:00:21.227976] W [socket.c:545:__socket_rwv] 0-management:
 readv on
 
 .
 .
 
 [2014-10-28 06:19:15.142867] I
 [glusterd-handler.c:1280:__glusterd_handle_cli_get_volume] 0-glusterd:
 Received get vol req
 The message I [MSGID: 106005]
 [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
 192.168.1.246:/zp2/brick2 has disconnected from glusterd. repeated 12
 times between [2014-10-28 06:18:09.368752] and [2014-10-28 06:18:45.373063]
 [2014-10-28 06:23:38.207649] W [glusterfsd.c:1194:cleanup_and_exit] (--
 0-: received signum (15), shutting down
 
 
 dmesg output:
 
 SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
 encountered an uncorrectable I/O failure and has been suspended.
 
 SPLError: 7868:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
 encountered an uncorrectable I/O failure and has been suspended.
 
 SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
 encountered an uncorrectable I/O failure and has been suspended.
 
 The brick is still online.
 
 # gluster volume status
 Status of volume: repvol
 Gluster process Port Online Pid
 --
 Brick 192.168.1.246:/zp1/brick1 49152 Y 4067
 Brick 192.168.1.246:/zp2/brick2 49153 Y 4078
 NFS Server on localhost 2049 Y 4092
 Self-heal Daemon on localhost N/A Y 4097
 
 Task Status of Volume repvol
 --
 There are no active volume tasks
 
 # gluster volume info
 
 Volume Name: repvol
 Type: Replicate
 Volume ID: ba1e7c6d-1e1c-45cd-8132-5f4fa4d2d22b
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: 192.168.1.246:/zp1/brick1
 Brick2: 192.168.1.246:/zp2/brick2
 Options Reconfigured:
 storage.health-check-interval: 30
 
 Let me know if you need further information.
 
 Thanks,
 Kiran.
 
 On Tue, Oct 28, 2014 at 11:44 AM, Kiran Patil ki...@fractalio.com wrote:
 
  I changed "git fetch git://review.gluster.org/glusterfs" to "git fetch
  http://review.gluster.org/glusterfs" and now it works.
 
  Thanks,
  Kiran.
 
  On Tue, Oct 28, 2014 at 11:13 AM, Kiran Patil ki...@fractalio.com wrote:
 
  Hi Niels,
 
  I am getting a "fatal: Couldn't find remote ref refs/changes/13/8213/9"
  error.
 
  Steps to reproduce the issue.
 
  1) # git clone git://review.gluster.org/glusterfs
  Initialized empty Git repository in /root/gluster-3.6/glusterfs/.git/
  remote: Counting objects: 84921, done.
  remote: Compressing objects: 100% (48307/48307), done.
  remote: Total 84921 (delta 57264), reused 63233 (delta 36254)
  Receiving objects: 100% 

Re: [Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

2014-10-28 Thread Niels de Vos
On Tue, Oct 28, 2014 at 02:08:32PM +0530, Kiran Patil wrote:
 The content of file zp2-brick2.log is at http://ur1.ca/iku0l (
 http://fpaste.org/145714/44849041/ )
 
 I can't open the file /zp2/brick2/.glusterfs/health_check since it hangs
 due to no disk present.
 
 Let me know the filename pattern, so that I can find it.

Hmm, if there is a hang while reading from the disk, it will not get
detected in the current solution. We implemented failure detection on
top of the detection that is done by the filesystem. Suspending a
filesystem with fsfreeze or similar should probably not be seen as a
failure.

In your case, it seems that the filesystem suspends itself when the disk
went away. I have no idea if it is possible to configure ZFS to not
suspend, but return an error to the reading/writing application. Please
check with such an option.

If you find such an option, please update the wiki page and recommend
enabling it:
- http://gluster.org/community/documentation/index.php/GlusterOnZFS


Thanks,
Niels


 
 On Tue, Oct 28, 2014 at 1:42 PM, Niels de Vos nde...@redhat.com wrote:
 
  On Tue, Oct 28, 2014 at 01:10:56PM +0530, Kiran Patil wrote:
   I applied the patches, compiled and installed the gluster.
  
   # glusterfs --version
   glusterfs 3.7dev built on Oct 28 2014 12:03:10
   Repository revision: git://git.gluster.com/glusterfs.git
   Copyright (c) 2006-2013 Red Hat, Inc. http://www.redhat.com/
   GlusterFS comes with ABSOLUTELY NO WARRANTY.
   It is licensed to you under your choice of the GNU Lesser
   General Public License, version 3 or any later version (LGPLv3
   or later), or the GNU General Public License, version 2 (GPLv2),
   in all cases as published by the Free Software Foundation.
  
   # git log
   commit 990ce16151c3af17e4cdaa94608b737940b60e4d
   Author: Lalatendu Mohanty lmoha...@redhat.com
   Date:   Tue Jul 1 07:52:27 2014 -0400
  
   Posix: Brick failure detection fix for ext4 filesystem
   ...
   ...
  
   I see below messages
 
  Many thanks Kiran!
 
  Do you have the messages from the brick that uses the zp2 mountpoint?
 
  There also should be a file with a timestamp when the last check was
  done successfully. If the brick is still running, this timestamp should
  get updated every storage.health-check-interval seconds:
  /zp2/brick2/.glusterfs/health_check
 
  Niels
 
  
   File /var/log/glusterfs/etc-glusterfs-glusterd.vol.log :
  
   The message I [MSGID: 106005]
   [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
   192.168.1.246:/zp2/brick2 has disconnected from glusterd. repeated 39
   times between [2014-10-28 05:58:09.209419] and [2014-10-28
  06:00:06.226330]
   [2014-10-28 06:00:09.226507] W [socket.c:545:__socket_rwv] 0-management:
   readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
   argument)
   [2014-10-28 06:00:09.226712] I [MSGID: 106005]
   [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
   192.168.1.246:/zp2/brick2 has disconnected from glusterd.
   [2014-10-28 06:00:12.226881] W [socket.c:545:__socket_rwv] 0-management:
   readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
   argument)
   [2014-10-28 06:00:15.227249] W [socket.c:545:__socket_rwv] 0-management:
   readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
   argument)
   [2014-10-28 06:00:18.227616] W [socket.c:545:__socket_rwv] 0-management:
   readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
   argument)
   [2014-10-28 06:00:21.227976] W [socket.c:545:__socket_rwv] 0-management:
   readv on
  
   .
   .
  
   [2014-10-28 06:19:15.142867] I
   [glusterd-handler.c:1280:__glusterd_handle_cli_get_volume] 0-glusterd:
   Received get vol req
   The message I [MSGID: 106005]
   [glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
   192.168.1.246:/zp2/brick2 has disconnected from glusterd. repeated 12
   times between [2014-10-28 06:18:09.368752] and [2014-10-28
  06:18:45.373063]
   [2014-10-28 06:23:38.207649] W [glusterfsd.c:1194:cleanup_and_exit] (--
   0-: received signum (15), shutting down
  
  
   dmesg output:
  
   SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
   encountered an uncorrectable I/O failure and has been suspended.
  
   SPLError: 7868:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
   encountered an uncorrectable I/O failure and has been suspended.
  
   SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
   encountered an uncorrectable I/O failure and has been suspended.
  
   The brick is still online.
  
   # gluster volume status
   Status of volume: repvol
   Gluster process Port Online Pid
  
  --
   Brick 192.168.1.246:/zp1/brick1 49152 Y 4067
   Brick 192.168.1.246:/zp2/brick2 49153 Y 4078
   NFS Server on localhost 2049 Y 4092
   Self-heal Daemon on localhost N/A Y 4097
  
   

Re: [Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

2014-10-27 Thread Kiran Patil
Hi Niels,

I am getting a "fatal: Couldn't find remote ref refs/changes/13/8213/9" error.

Steps to reproduce the issue.

1) # git clone git://review.gluster.org/glusterfs
Initialized empty Git repository in /root/gluster-3.6/glusterfs/.git/
remote: Counting objects: 84921, done.
remote: Compressing objects: 100% (48307/48307), done.
remote: Total 84921 (delta 57264), reused 63233 (delta 36254)
Receiving objects: 100% (84921/84921), 23.23 MiB | 192 KiB/s, done.
Resolving deltas: 100% (57264/57264), done.

2) # cd glusterfs
# git branch
* master

3) # git fetch git://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD
fatal: Couldn't find remote ref refs/changes/13/8213/9

Note: I also tried the above steps on the git repo
https://github.com/gluster/glusterfs and the result is the same as above.

Please let me know if I miss any steps.

Thanks,
Kiran.

On Mon, Oct 27, 2014 at 5:53 PM, Niels de Vos nde...@redhat.com wrote:

 On Mon, Oct 27, 2014 at 05:19:13PM +0530, Kiran Patil wrote:
  Hi,
 
  I created replicated vol with two bricks on the same node and copied some
  data to it.
 
  Now I removed the disk which hosted one of the bricks of the volume.
 
  Storage.health-check-interval is set to 30 seconds.
 
  I could see the disk is unavailable using the zpool command of ZFS on
  Linux, but gluster volume status still displays the brick process as
  running, when it should have been shut down by this time.
 
  Is this a bug in 3.6, since it is mentioned as a feature at
  https://github.com/gluster/glusterfs/blob/release-3.6/doc/features/brick-failure-detection.md
  or am I making a mistake here?

 The initial detection of brick failures did not work for all
 filesystems. It may not work for ZFS either. A fix has been posted, but it
 has not been merged into the master branch yet. When the change has been
 merged, it can get backported to 3.6 and 3.5.

 You may want to test with the patch applied, and add your +1 Verified
 to the change in case it makes it functional for you:
 - http://review.gluster.org/8213

 Cheers,
 Niels

 
  [root@fractal-c92e gluster-3.6]# gluster volume status
  Status of volume: repvol
  Gluster process Port Online Pid
 
 --
  Brick 192.168.1.246:/zp1/brick1 49154 Y 17671
  Brick 192.168.1.246:/zp2/brick2 49155 Y 17682
  NFS Server on localhost 2049 Y 17696
  Self-heal Daemon on localhost N/A Y 17701
 
  Task Status of Volume repvol
 
 --
  There are no active volume tasks
 
 
  [root@fractal-c92e gluster-3.6]# gluster volume info
 
  Volume Name: repvol
  Type: Replicate
  Volume ID: d4f992b1-1393-43b8-9fda-2e2b6e3b5039
  Status: Started
  Number of Bricks: 1 x 2 = 2
  Transport-type: tcp
  Bricks:
  Brick1: 192.168.1.246:/zp1/brick1
  Brick2: 192.168.1.246:/zp2/brick2
  Options Reconfigured:
  storage.health-check-interval: 30
 
  [root@fractal-c92e gluster-3.6]# zpool status zp2
pool: zp2
   state: UNAVAIL
  status: One or more devices are faulted in response to IO failures.
  action: Make sure the affected devices are connected, then run 'zpool
  clear'.
 see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: none requested
  config:
 
  NAMESTATE READ WRITE CKSUM
  zp2 UNAVAIL  0 0 0  insufficient replicas
sdb   UNAVAIL  0 0 0
 
  errors: 2 data errors, use '-v' for a list
 
 
  Thanks,
  Kiran.

  ___
  Gluster-devel mailing list
  Gluster-devel@gluster.org
  http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel