[Gluster-users] Directory ctime/mtime not synced on node being healed

2015-11-27 Thread Tom Pepper
Recently, we lost a brick in a 4-node distribute + replica 2 volume.  The host
itself was fine, so we simply fixed the hardware failure, recreated the zpool and
zfs filesystem, set the correct trusted.glusterfs.volume-id xattr on the new
brick, restarted the gluster daemons on the host, and the heal got to work.  The
version running is 3.7.4 atop Ubuntu Trusty.
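
For reference, roughly the commands involved (the exact brick path shown,
hdfs6:/hdfs6/edc1, and the Ubuntu service name are illustrative assumptions,
not copied from the actual history):

  # Restore the volume-id xattr on the freshly recreated (empty) brick; the
  # value is the volume UUID from "gluster volume info" with the dashes removed.
  setfattr -n trusted.glusterfs.volume-id \
           -v 0x2f6b5804e2d8440093e9b172952b1aae /hdfs6/edc1

  # Bring the brick process back up.
  service glusterfs-server restart

  # Optionally kick off a full self-heal (the self-heal daemon would also
  # pick the brick up on its own).
  gluster volume heal edc1 full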

However, we’ve noticed that directories created on the brick being healed are
not getting the correct ctime and mtime.  Files, however, are getting the
correct timestamps.
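
A quick way to compare a directory’s timestamps and AFR xattrs directly on each
brick (a diagnostic sketch; the directory name is taken from the example further
down, and ssh access to the brick hosts is assumed):

  for b in fs4:/fs4/edc1 fs5:/fs5/edc1 hdfs5:/hdfs5/edc1 hdfs6:/hdfs6/edc1; do
    host=${b%%:*}; dir=${b#*:}/BSA_9781483021973
    # Timestamps as each brick sees them, plus the replication (AFR) xattrs.
    ssh "$host" "stat -c '%n mtime=%y ctime=%z' $dir && getfattr -d -m . -e hex $dir"
  done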

$ gluster volume info edc1
 
Volume Name: edc1
Type: Distributed-Replicate
Volume ID: 2f6b5804-e2d8-4400-93e9-b172952b1aae
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: fs4:/fs4/edc1
Brick2: fs5:/fs5/edc1
Brick3: hdfs5:/hdfs5/edc1
Brick4: hdfs6:/hdfs6/edc1
Options Reconfigured:
performance.write-behind-window-size: 1GB
performance.cache-size: 1GB
performance.readdir-ahead: enable
performance.read-ahead: enable

Example:

On the glusterfs mount:

  File: ‘BSA_9781483021973’
  Size: 36  Blocks: 2  IO Block: 131072 directory
Device: 19h/25d Inode: 11345194644681878130  Links: 2
Access: (0777/drwxrwxrwx)  Uid: ( 1007/ UNKNOWN)   Gid: ( 1007/ UNKNOWN)
Access: 2015-11-27 04:01:49.520001319 -0800
Modify: 2014-08-29 09:20:50.006294000 -0700
Change: 2015-02-16 00:04:21.312079523 -0800
 Birth: -

On the unfailed brick:

  File: ‘BSA_9781483021973’
  Size: 10  Blocks: 6  IO Block: 1024   directory
Device: 1ah/26d Inode: 25261   Links: 2
Access: (0777/drwxrwxrwx)  Uid: ( 1007/ UNKNOWN)   Gid: ( 1007/ UNKNOWN)
Access: 2015-11-27 04:01:49.520001319 -0800
Modify: 2014-08-29 09:20:50.006294000 -0700
Change: 2015-02-16 00:04:21.312079523 -0800
 Birth: -

On the failed brick that’s healing:

  File: ‘BSA_9781483021973’
  Size: 10  Blocks: 6  IO Block: 131072 directory
Device: 17h/23d Inode: 252324  Links: 2
Access: (0777/drwxrwxrwx)  Uid: ( 1007/ UNKNOWN)   Gid: ( 1007/ UNKNOWN)
Access: 2015-11-27 10:10:35.441261192 -0800
Modify: 2015-11-25 04:07:36.354860631 -0800
Change: 2015-11-25 04:07:36.354860631 -0800
 Birth: -

Normally, this wouldn’t be an issue, except that the glusterfs mount is now
reporting the (incorrect) ctime and mtime from the healed brick for the
directories that the failed node is now the authoritative replica for.  An
example (a possible workaround sketch follows the example):

On a non-failed brick:

  File: ‘BSA_9780792765073’
  Size: 23  Blocks: 6  IO Block: 3072   directory
Device: 1ah/26d Inode: 3734793 Links: 2
Access: (0777/drwxrwxrwx)  Uid: ( 1007/ UNKNOWN)   Gid: ( 1007/ UNKNOWN)
Access: 2015-11-27 10:22:25.374931735 -0800
Modify: 2015-03-24 13:56:53.371733811 -0700
Change: 2015-03-24 13:56:53.371733811 -0700
 Birth: -

On the glusterfs mount:

  File: ‘BSA_9780792765073’
  Size: 97  Blocks: 2  IO Block: 131072 directory
Device: 19h/25d Inode: 13293019492851992284  Links: 2
Access: (0777/drwxrwxrwx)  Uid: ( 1007/ UNKNOWN)   Gid: ( 1007/ UNKNOWN)
Access: 2015-11-27 10:22:20.922782180 -0800
Modify: 2015-11-25 04:03:21.889978948 -0800
Change: 2015-11-25 04:03:21.889978948 -0800
 Birth: -
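
One possible (untested) workaround sketch: copy each directory’s mtime from the
surviving brick back through the glusterfs mount so all replicas pick it up.
The paths here are assumptions (surviving brick mounted locally at /fs5/edc1,
client mount at /mnt/edc1), and ctime cannot be set from userspace, so it would
still be left wrong:

  cd /fs5/edc1
  find . -type d | while read -r d; do
    # Re-apply the surviving brick's mtime via the glusterfs mount.
    touch -m -r "$d" "/mnt/edc1/$d"
  done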

Thanks,
-t



[Gluster-users] 3.6.3 Ubuntu PPA

2015-05-23 Thread Tom Pepper
Just wondering if we can expect 3.6.3 to make it to the Launchpad PPA anytime soon?

Thanks,
-t



[Gluster-users] NFS I/O errors after replicated -> distributed+replicate add-brick

2015-02-26 Thread Tom Pepper
Hi, all:

We had a two-node gluster cluster (replicated, 2 replicas) to which we recently
added two more nodes/bricks and then ran a rebalance, making it a
distributed-replicate volume.  Since doing so, any NFS access, read or write,
returns a “Remote I/O error” for every operation (stat, read, write, whatever),
although the operation appears to in fact succeed.
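
For context, the expansion was roughly along these lines (a reconstruction, not
the exact command history):

  # Add one new replica pair (bricks must be added in multiples of the replica count).
  gluster volume add-brick edc1 hdfs5:/hdfs5/edc1 hdfs6:/hdfs6/edc1

  # Redistribute existing data across the new distribute subvolume and watch progress.
  gluster volume rebalance edc1 start
  gluster volume rebalance edc1 status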

I don’t actually see anything in the gluster logs that would assist.  The bricks
are backed by ZFS volumes.

Any hints?  It’s Gluster 3.6.2 on Ubuntu Trusty.
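
For reference, the built-in gluster NFS server logs separately from the brick
logs; assuming standard packaging, the relevant places to look would be (the
log path is the default and may differ):

  less /var/log/glusterfs/nfs.log    # gluster NFS server log
  gluster volume heal edc1 info      # entries still pending self-heal, if any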

Clients using glusterfs throw some concerning errors as well - see the log
excerpts at the bottom for examples.

Thanks,
-t



Status of volume: edc1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick fs4:/fs4/edc1                             49154   Y       2435
Brick fs5:/fs5/edc1                             49154   Y       2328
Brick hdfs5:/hdfs5/edc1                         49152   Y       26725
Brick hdfs6:/hdfs6/edc1                         49152   Y       4994
NFS Server on localhost                         2049    Y       31503
Self-heal Daemon on localhost                   N/A     Y       31510
NFS Server on 10.54.90.13                       2049    Y       16310
Self-heal Daemon on 10.54.90.13                 N/A     Y       16317
NFS Server on hdfs6                             2049    Y       5006
Self-heal Daemon on hdfs6                       N/A     Y       5013
NFS Server on hdfs5                             2049    Y       26737
Self-heal Daemon on hdfs5                       N/A     Y       26744
 
Task Status of Volume edc1
------------------------------------------------------------------------------
Task : Rebalance   
ID   : b3095ab2-c428-4681-b545-36941a8816f6
Status   : completed   
 
Volume Name: edc1
Type: Distributed-Replicate
Volume ID: 2f6b5804-e2d8-4400-93e9-b172952b1aae
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: fs4:/fs4/edc1
Brick2: fs5:/fs5/edc1
Brick3: hdfs5:/hdfs5/edc1
Brick4: hdfs6:/hdfs6/edc1
Options Reconfigured:
performance.cache-size: 1GB
performance.write-behind-window-size: 1GB

volume edc1-client-0
type protocol/client
option send-gids true
option transport-type tcp
option remote-subvolume /fs4/edc1
option remote-host fs4
option ping-timeout 42
end-volume

volume edc1-client-1
type protocol/client
option send-gids true
option transport-type tcp
option remote-subvolume /fs5/edc1
option remote-host fs5
option ping-timeout 42
end-volume

volume edc1-client-2
type protocol/client
option send-gids true
option transport-type tcp
option remote-subvolume /hdfs5/edc1
option remote-host hdfs5
option ping-timeout 42
end-volume

volume edc1-client-3
type protocol/client
option send-gids true
option transport-type tcp
option remote-subvolume /hdfs6/edc1
option remote-host hdfs6
option ping-timeout 42
end-volume

volume edc1-replicate-0
type cluster/replicate
subvolumes edc1-client-0 edc1-client-1
end-volume

volume edc1-replicate-1
type cluster/replicate
subvolumes edc1-client-2 edc1-client-3
end-volume

volume edc1-dht
type cluster/distribute
subvolumes edc1-replicate-0 edc1-replicate-1
end-volume

volume edc1-write-behind
type performance/write-behind
option cache-size 1GB
subvolumes edc1-dht
end-volume

volume edc1-read-ahead
type performance/read-ahead
subvolumes edc1-write-behind
end-volume

volume edc1-io-cache
type performance/io-cache
option cache-size 1GB
subvolumes edc1-read-ahead
end-volume

volume edc1-quick-read
type performance/quick-read
option cache-size 1GB
subvolumes edc1-io-cache
end-volume

volume edc1-open-behind
type performance/open-behind
subvolumes edc1-quick-read
end-volume

volume edc1-md-cache
type performance/md-cache
subvolumes edc1-open-behind
end-volume

volume edc1
type debug/io-stats
option count-fop-hits off
option latency-measurement off
subvolumes edc1-md-cache
end-volume





[2015-02-26 22:13:07.473839] I [dht-common.c:1822:dht_lookup_cbk] 0-edc1-dht: 
Entry /cc/aspera/sandbox/mwt-wea/.local missing on subvol edc1-replicate-0
[2015-02-26 22:13:07.474890] I [dht-common.c:1822:dht_lookup_cbk] 0-edc1-dht: 
Entry /cc/aspera/sandbox/mwt-wea/.local missing on subvol edc1-replicate-0
[2015-02-26 22:13:07.475891] I [dht-common.c:1822:dht_lookup_cbk] 0-edc1-dht: 
Entry /cc/aspera/sandbox/mwt-wea/.local missing on subvol edc1-replicate-0
[2015-02-26 22:13:07.531037] I [dht-common.c:1822:dht_lookup_cbk] 0-edc1-dht: 
Entry /cc/aspera/sandbox/mwt-wea/.local missing on subvol edc1-replicate-0
[2015-02-26 22:13:07.532210] I