Re: [Gluster-users] after upgrade to 3.6.7 : Internal error xfs_attr3_leaf_write_verify

2015-12-08 Thread Brian Foster
On Thu, Dec 03, 2015 at 03:16:54PM -0500, Vijay Bellur wrote:
> Looks like an issue with xfs. Adding Brian to check if it is a familiar 
> problem.
> 
> Regards,
> Vijay
> 
> - Original Message -
> > From: "Dietmar Putz" 
> > To: gluster-users@gluster.org
> > Sent: Thursday, December 3, 2015 6:06:11 AM
> > Subject: [Gluster-users] after upgrade to 3.6.7 : Internal error
> > xfs_attr3_leaf_write_verify
> > 
> > Hello all,
> > 
> > On 1 December I upgraded two 6-node clusters from glusterfs 3.5.6 to 3.6.7.
> > All nodes are identical in hw, os and patch level, currently running Ubuntu
> > 14.04 LTS, reached via do-release-upgrade from 12.04 LTS (this was done
> > before the gfs upgrade to 3.5.6, not directly before upgrading to 3.6.7).
> > Because of a geo-replication issue, all of the nodes have rsync 3.1.1.3
> > installed instead of the 3.1.0 that comes from the repositories. This is
> > the only deviation from the Ubuntu repositories for 14.04 LTS.
> > Since the upgrade to gfs 3.6.7, glusterd on two nodes of the same cluster
> > goes offline after an xfs_attr3_leaf_write_verify error on the underlying
> > bricks, as shown below.
> > The problem can be cleared by an umount / remount of the brick, but it
> > recurs about 4-5 hours later. Running xfs_check / xfs_repair before the
> > remount makes no difference.
> > xfs_check / xfs_repair did not show any faults. The underlying hw is a
> > RAID 5 volume on an lsi-9271 8i. megacli does not show any errors.
> > The syslog does not show more than the dmesg output below.
> > Every time, the same two nodes of the same cluster are affected.
> > As shown in dmesg and syslog, the system reports the
> > xfs_attr3_leaf_write_verify error about 38 min. before finally giving up.
> > For both events I cannot find corresponding entries in the gluster logs.
> > This is strange... the gluster installation has grown historically from
> > 3.2.5 via 3.3 to 3.4.6/7, which ran well for months; gfs 3.5.6 ran for
> > about two weeks, and the upgrade to 3.6.7 was done because of a geo-repl
> > log flood.
> > Even though I have no evidence that this is caused by gfs 3.6.7, I
> > somehow believe that it is...
> > Has anybody experienced such an error, or does anybody have hints for
> > getting out of this big problem...?
> > Unfortunately the affected cluster is the master of a geo-replication
> > that has not been running well since the update from gfs 3.4.7...
> > Fortunately the two affected gluster nodes are not in the same sub-volume.
> > 
> > any help is appreciated...
> > 
> > best regards
> > dietmar
> > 
...
> > - root@gluster-ger-ber-10  /var/log $dmesg -T
> > ...
> > [Wed Dec  2 12:43:47 2015] XFS (sdc1): xfs_log_force: error 5 returned.
> > [Wed Dec  2 12:43:48 2015] XFS (sdc1): xfs_log_force: error 5 returned.
> > [Wed Dec  2 12:45:58 2015] XFS (sdc1): Mounting Filesystem
> > [Wed Dec  2 12:45:58 2015] XFS (sdc1): Starting recovery (logdev: internal)
> > [Wed Dec  2 12:45:59 2015] XFS (sdc1): Ending recovery (logdev: internal)
> > [Wed Dec  2 13:11:53 2015] XFS (sdc1): Mounting Filesystem
> > [Wed Dec  2 13:11:54 2015] XFS (sdc1): Ending clean mount
> > [Wed Dec  2 13:12:29 2015] init: statd main process (25924) killed by KILL signal
> > [Wed Dec  2 13:12:29 2015] init: statd main process ended, respawning
> > [Wed Dec  2 13:13:24 2015] init: statd main process (13433) killed by KILL signal
> > [Wed Dec  2 13:13:24 2015] init: statd main process ended, respawning
> > [Wed Dec  2 17:22:28 2015] 8807076b1000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
> > [Wed Dec  2 17:22:28 2015] 8807076b1010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
> > [Wed Dec  2 17:22:28 2015] 8807076b1020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > [Wed Dec  2 17:22:28 2015] 8807076b1030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > [Wed Dec  2 17:22:28 2015] XFS (sdc1): Internal error xfs_attr3_leaf_write_verify at line 216 of file /build/linux-XHaR1x/linux-3.13.0/fs/xfs/xfs_attr_leaf.c.  Caller 0xa01a66f0

That's a write verifier error on an extended attribute write. The
purpose of the write verifier is to check metadata structure immediately
prior to write submission. Failure means some kind of corruption has
occurred in memory and the filesystem shuts down to prevent any further
damage.
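
For anyone following along, the first bytes of the buffer dumped in the dmesg output quoted above can be decoded by hand. The sketch below is only an illustration and not part of the kernel or of any XFS tool. It assumes the buffer begins with the pre-v5 XFS attribute leaf header, all fields big-endian, in the order forw, back, magic, pad, count, usedbytes, firstused, holes, pad1, followed by the first freemap entry (base, size). The field names and the `decode_attr_leaf_header` helper are labels chosen here for illustration.

```python
import struct

# First 32 bytes of the metadata buffer, copied from the dmesg dump
# (two rows of 16 bytes each).
DUMP = bytes.fromhex(
    "00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00 "
    "10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00"
)

# Assumed layout (big-endian): two 32-bit sibling pointers, a 16-bit magic
# and pad, then count / usedbytes / firstused, two single bytes, and the
# first freemap entry as two 16-bit values.
HEADER_FORMAT = ">IIHHHHHBBHH"
FIELD_NAMES = (
    "forw", "back", "magic", "pad",
    "count", "usedbytes", "firstused",
    "holes", "pad1", "free_base", "free_size",
)


def decode_attr_leaf_header(buf):
    """Return the assumed header fields of an attr leaf block as a dict."""
    values = struct.unpack_from(HEADER_FORMAT, buf, 0)
    return dict(zip(FIELD_NAMES, values))


if __name__ == "__main__":
    for name, value in decode_attr_leaf_header(DUMP).items():
        print(f"{name:>10}: {value:#06x} ({value})")
```

Under that assumed layout the magic field reads 0xfbee and the entry count reads 0, so the block looks like an empty leaf rather than random garbage. Whether that is what trips the verifier on this kernel is not established by the dump alone.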

Is this an upstream stable 3.13 kernel or a distro kernel? You could try
something more recent and see if it resolves t

Re: [Gluster-users] after upgrade to 3.6.7 : Internal error xfs_attr3_leaf_write_verify

2015-12-06 Thread Julius Thomas

Hi Saravana,

we have had this issue since upgrading from glusterfs 3.4.7 to 3.5.6 
and then to 3.6.7.

We are now trying to downgrade the kernel.

The bug is reported here:
https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1468039


Regards

Julius



On 06.12.2015 19:00, Saravanakumar Arumugam wrote:

Hi,
This seems like an XFS filesystem issue.
Can you report this error to the XFS mailing list?

Thanks,
Saravana

On 12/06/2015 05:23 AM, Julius Thomas wrote:

Dear Gluster Users,

after trying to fix the problem from my colleague's last mail by 
upgrading to kernel 3.19.0-39-generic, in case the xfs tree contained 
changes for this bug,

the xfs filesystem crashes again after 4 - 5 hours on several peers.

Does anyone have recommendations for fixing this problem?
Are there known issues with xfs and ubuntu 14.04?

What is the latest stable release of gluster3, v3.6.3?


You can find the latest gluster here:
http://download.gluster.org/pub/gluster/glusterfs/LATEST/

and follow the link here for Ubuntu:
http://download.gluster.org/pub/gluster/glusterfs/LATEST/Ubuntu/

Dec 5 21:14:48 gluster-ger-ber-11 kernel: [16564.018838] XFS (sdc1): 
Metadata corruption detected at 
xfs_attr3_leaf_write_verify+0xe5/0x100 [xfs], block 0x44458e670
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018879] XFS (sdc1): 
Unmount and run xfs_repair
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018895] XFS (sdc1): 
First 64 bytes of corrupted metadata buffer:
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018916] 
880417ff3000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018956] 
880417ff3010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018984] 
880417ff3020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019011] 
880417ff3030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019041] XFS (sdc1): 
xfs_do_force_shutdown(0x8) called from line 1249 of file 
/build/linux-lts-vivid-1jarlV/linux-lts-vivid-3.19.0/fs/xfs/xfs_buf.c. Return 
address = 0xc02bbd22
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019044] XFS (sdc1): 
Corruption of in-memory data detected.  Shutting down filesystem
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019069] XFS (sdc1): 
Please umount the filesystem and rectify the problem(s)
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.069906] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:15:08 gluster-ger-ber-11 gluster-export[4447]: [2015-12-05 
21:15:08.797327] M 
[posix-helpers.c:1559:posix_health_check_thread_proc] 
0-ger-ber-01-posix: health-check failed, going down
Dec  5 21:15:18 gluster-ger-ber-11 kernel: [16594.068660] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:15:38 gluster-ger-ber-11 gluster-export[4447]: [2015-12-05 
21:15:38.797422] M 
[posix-helpers.c:1564:posix_health_check_thread_proc] 
0-ger-ber-01-posix: still alive! -> SIGTERM
Dec  5 21:15:48 gluster-ger-ber-11 kernel: [16624.119428] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:16:18 gluster-ger-ber-11 kernel: [16654.170134] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:16:48 gluster-ger-ber-11 kernel: [16684.220834] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:17:01 gluster-ger-ber-11 CRON[17656]: (root) CMD ( cd / && 
run-parts --report /etc/cron.hourly)
Dec  5 21:17:18 gluster-ger-ber-11 kernel: [16714.271507] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:17:48 gluster-ger-ber-11 kernel: [16744.322244] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:18:18 gluster-ger-ber-11 kernel: [16774.372948] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:18:48 gluster-ger-ber-11 kernel: [16804.423650] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:19:18 gluster-ger-ber-11 kernel: [16834.474365] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:19:48 gluster-ger-ber-11 kernel: [16864.525082] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:20:18 gluster-ger-ber-11 kernel: [16894.575778] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:20:49 gluster-ger-ber-11 kernel: [16924.626464] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:21:19 gluster-ger-ber-11 kernel: [16954.677161] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:21:49 gluster-ger-ber-11 kernel: [16984.727791] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:22:19 gluster-ger-ber-11 kernel: [17014.778570] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:22:49 gluster-ger-ber-11 kernel: [17044.829240] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:23:19 gluster-ger-ber-11 kernel: [17074.880003] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:23:49 gluster-ger-ber-11 kernel: [17104.930643] XFS (sdc1): 
xfs_log_

Re: [Gluster-users] after upgrade to 3.6.7 : Internal error xfs_attr3_leaf_write_verify

2015-12-06 Thread Saravanakumar Arumugam

Hi,
This seems like an XFS filesystem issue.
Can you report this error to the XFS mailing list?

Thanks,
Saravana

On 12/06/2015 05:23 AM, Julius Thomas wrote:

Dear Gluster Users,

after trying to fix the problem from my colleague's last mail by 
upgrading to kernel 3.19.0-39-generic, in case the xfs tree contained 
changes for this bug,

the xfs filesystem crashes again after 4 - 5 hours on several peers.

Does anyone have recommendations for fixing this problem?
Are there known issues with xfs and ubuntu 14.04?

What is the latest stable release of gluster3, v3.6.3?


You can find the latest gluster here:
http://download.gluster.org/pub/gluster/glusterfs/LATEST/

and follow the link here for Ubuntu:
http://download.gluster.org/pub/gluster/glusterfs/LATEST/Ubuntu/

Dec 5 21:14:48 gluster-ger-ber-11 kernel: [16564.018838] XFS (sdc1): 
Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe5/0x100 
[xfs], block 0x44458e670
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018879] XFS (sdc1): 
Unmount and run xfs_repair
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018895] XFS (sdc1): 
First 64 bytes of corrupted metadata buffer:
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018916] 
880417ff3000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018956] 
880417ff3010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018984] 
880417ff3020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019011] 
880417ff3030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019041] XFS (sdc1): 
xfs_do_force_shutdown(0x8) called from line 1249 of file 
/build/linux-lts-vivid-1jarlV/linux-lts-vivid-3.19.0/fs/xfs/xfs_buf.c. 
Return address = 0xc02bbd22
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019044] XFS (sdc1): 
Corruption of in-memory data detected.  Shutting down filesystem
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019069] XFS (sdc1): 
Please umount the filesystem and rectify the problem(s)
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.069906] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:15:08 gluster-ger-ber-11 gluster-export[4447]: [2015-12-05 
21:15:08.797327] M 
[posix-helpers.c:1559:posix_health_check_thread_proc] 
0-ger-ber-01-posix: health-check failed, going down
Dec  5 21:15:18 gluster-ger-ber-11 kernel: [16594.068660] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:15:38 gluster-ger-ber-11 gluster-export[4447]: [2015-12-05 
21:15:38.797422] M 
[posix-helpers.c:1564:posix_health_check_thread_proc] 
0-ger-ber-01-posix: still alive! -> SIGTERM
Dec  5 21:15:48 gluster-ger-ber-11 kernel: [16624.119428] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:16:18 gluster-ger-ber-11 kernel: [16654.170134] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:16:48 gluster-ger-ber-11 kernel: [16684.220834] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:17:01 gluster-ger-ber-11 CRON[17656]: (root) CMD (   cd / && 
run-parts --report /etc/cron.hourly)
Dec  5 21:17:18 gluster-ger-ber-11 kernel: [16714.271507] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:17:48 gluster-ger-ber-11 kernel: [16744.322244] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:18:18 gluster-ger-ber-11 kernel: [16774.372948] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:18:48 gluster-ger-ber-11 kernel: [16804.423650] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:19:18 gluster-ger-ber-11 kernel: [16834.474365] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:19:48 gluster-ger-ber-11 kernel: [16864.525082] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:20:18 gluster-ger-ber-11 kernel: [16894.575778] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:20:49 gluster-ger-ber-11 kernel: [16924.626464] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:21:19 gluster-ger-ber-11 kernel: [16954.677161] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:21:49 gluster-ger-ber-11 kernel: [16984.727791] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:22:19 gluster-ger-ber-11 kernel: [17014.778570] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:22:49 gluster-ger-ber-11 kernel: [17044.829240] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:23:19 gluster-ger-ber-11 kernel: [17074.880003] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:23:49 gluster-ger-ber-11 kernel: [17104.930643] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:24:19 gluster-ger-ber-11 kernel: [17134.981336] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:24:49 gluster-ger-ber-11 kernel: [17165.032049] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:25:19 gluster-ger-ber-11 kernel: [17195.082689] XFS (sdc1): 
xfs_log_forc

Re: [Gluster-users] after upgrade to 3.6.7 : Internal error xfs_attr3_leaf_write_verify

2015-12-05 Thread Julius Thomas

Dear Gluster Users,

after trying to fix the problem from my colleague's last mail by 
upgrading to kernel 3.19.0-39-generic, in case the xfs tree contained 
changes for this bug,

the xfs filesystem crashes again after 4 - 5 hours on several peers.

Does anyone have recommendations for fixing this problem?
Are there known issues with xfs and ubuntu 14.04?

What is the latest stable release of gluster3, v3.6.3?

Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018838] XFS (sdc1): 
Metadata corruption detected at xfs_attr3_leaf_write_verify+0xe5/0x100 
[xfs], block 0x44458e670
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018879] XFS (sdc1): 
Unmount and run xfs_repair
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018895] XFS (sdc1): 
First 64 bytes of corrupted metadata buffer:
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018916] 
880417ff3000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018956] 
880417ff3010: 10 00 00 00 00 20 0f e0 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.018984] 
880417ff3020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019011] 
880417ff3030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019041] XFS (sdc1): 
xfs_do_force_shutdown(0x8) called from line 1249 of file 
/build/linux-lts-vivid-1jarlV/linux-lts-vivid-3.19.0/fs/xfs/xfs_buf.c. 
Return address = 0xc02bbd22
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019044] XFS (sdc1): 
Corruption of in-memory data detected.  Shutting down filesystem
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.019069] XFS (sdc1): 
Please umount the filesystem and rectify the problem(s)
Dec  5 21:14:48 gluster-ger-ber-11 kernel: [16564.069906] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:15:08 gluster-ger-ber-11 gluster-export[4447]: [2015-12-05 
21:15:08.797327] M [posix-helpers.c:1559:posix_health_check_thread_proc] 
0-ger-ber-01-posix: health-check failed, going down
Dec  5 21:15:18 gluster-ger-ber-11 kernel: [16594.068660] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:15:38 gluster-ger-ber-11 gluster-export[4447]: [2015-12-05 
21:15:38.797422] M [posix-helpers.c:1564:posix_health_check_thread_proc] 
0-ger-ber-01-posix: still alive! -> SIGTERM
Dec  5 21:15:48 gluster-ger-ber-11 kernel: [16624.119428] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:16:18 gluster-ger-ber-11 kernel: [16654.170134] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:16:48 gluster-ger-ber-11 kernel: [16684.220834] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:17:01 gluster-ger-ber-11 CRON[17656]: (root) CMD (   cd / && 
run-parts --report /etc/cron.hourly)
Dec  5 21:17:18 gluster-ger-ber-11 kernel: [16714.271507] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:17:48 gluster-ger-ber-11 kernel: [16744.322244] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:18:18 gluster-ger-ber-11 kernel: [16774.372948] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:18:48 gluster-ger-ber-11 kernel: [16804.423650] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:19:18 gluster-ger-ber-11 kernel: [16834.474365] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:19:48 gluster-ger-ber-11 kernel: [16864.525082] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:20:18 gluster-ger-ber-11 kernel: [16894.575778] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:20:49 gluster-ger-ber-11 kernel: [16924.626464] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:21:19 gluster-ger-ber-11 kernel: [16954.677161] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:21:49 gluster-ger-ber-11 kernel: [16984.727791] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:22:19 gluster-ger-ber-11 kernel: [17014.778570] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:22:49 gluster-ger-ber-11 kernel: [17044.829240] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:23:19 gluster-ger-ber-11 kernel: [17074.880003] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:23:49 gluster-ger-ber-11 kernel: [17104.930643] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:24:19 gluster-ger-ber-11 kernel: [17134.981336] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:24:49 gluster-ger-ber-11 kernel: [17165.032049] XFS (sdc1): 
xfs_log_force: error -5 returned.
Dec  5 21:25:19 gluster-ger-ber-11 kernel: [17195.082689] XFS (sdc1): 
xfs_log_force: error -5 returned.


On 03.12.2015 12:06, Dietmar Putz wrote:

Hello all,

On 1 December I upgraded two 6-node clusters from glusterfs 3.5.6 to 
3.6.7.
All nodes are identical in hw, os and patch level, currently running 
Ubuntu 14.04 LTS, reached via do-release-upgrade from 12.04 LTS (this was 
done before the gfs upgrade to 3.5.6, not directly before upgrading to 

Re: [Gluster-users] after upgrade to 3.6.7 : Internal error xfs_attr3_leaf_write_verify

2015-12-03 Thread Vijay Bellur
Looks like an issue with xfs. Adding Brian to check if it is a familiar problem.

Regards,
Vijay

- Original Message -
> From: "Dietmar Putz" 
> To: gluster-users@gluster.org
> Sent: Thursday, December 3, 2015 6:06:11 AM
> Subject: [Gluster-users] after upgrade to 3.6.7 : Internal error  
> xfs_attr3_leaf_write_verify
> 
> Hello all,
> 
> On 1 December I upgraded two 6-node clusters from glusterfs 3.5.6 to 3.6.7.
> All nodes are identical in hw, os and patch level, currently running Ubuntu
> 14.04 LTS, reached via do-release-upgrade from 12.04 LTS (this was done
> before the gfs upgrade to 3.5.6, not directly before upgrading to 3.6.7).
> Because of a geo-replication issue, all of the nodes have rsync 3.1.1.3
> installed instead of the 3.1.0 that comes from the repositories. This is
> the only deviation from the Ubuntu repositories for 14.04 LTS.
> Since the upgrade to gfs 3.6.7, glusterd on two nodes of the same cluster
> goes offline after an xfs_attr3_leaf_write_verify error on the underlying
> bricks, as shown below.
> The problem can be cleared by an umount / remount of the brick, but it
> recurs about 4-5 hours later. Running xfs_check / xfs_repair before the
> remount makes no difference.
> xfs_check / xfs_repair did not show any faults. The underlying hw is a
> RAID 5 volume on an lsi-9271 8i. megacli does not show any errors.
> The syslog does not show more than the dmesg output below.
> Every time, the same two nodes of the same cluster are affected.
> As shown in dmesg and syslog, the system reports the
> xfs_attr3_leaf_write_verify error about 38 min. before finally giving up.
> For both events I cannot find corresponding entries in the gluster logs.
> This is strange... the gluster installation has grown historically from
> 3.2.5 via 3.3 to 3.4.6/7, which ran well for months; gfs 3.5.6 ran for
> about two weeks, and the upgrade to 3.6.7 was done because of a geo-repl
> log flood.
> Even though I have no evidence that this is caused by gfs 3.6.7, I
> somehow believe that it is...
> Has anybody experienced such an error, or does anybody have hints for
> getting out of this big problem...?
> Unfortunately the affected cluster is the master of a geo-replication
> that has not been running well since the update from gfs 3.4.7...
> Fortunately the two affected gluster nodes are not in the same sub-volume.
> 
> any help is appreciated...
> 
> best regards
> dietmar
> 
> 
> 
> 
> [ 09:32:29 ] - root@gluster-ger-ber-10  /var/log $gluster volume info
> 
> Volume Name: ger-ber-01
> Type: Distributed-Replicate
> Volume ID: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
> Status: Started
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: gluster-ger-ber-11-int:/gluster-export
> Brick2: gluster-ger-ber-12-int:/gluster-export
> Brick3: gluster-ger-ber-09-int:/gluster-export
> Brick4: gluster-ger-ber-10-int:/gluster-export
> Brick5: gluster-ger-ber-07-int:/gluster-export
> Brick6: gluster-ger-ber-08-int:/gluster-export
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> cluster.min-free-disk: 200GB
> geo-replication.indexing: on
> auth.allow: 10.0.1.*,188.138.82.*,188.138.123.*,82.193.249.198,82.193.249.200,31.7.178.137,31.7.178.135,31.7.180.109,31.7.180.98,82.199.147.*,104.155.22.202,104.155.30.201,104.155.5.117,104.155.11.253,104.155.15.34,104.155.25.145,146.148.120.255,31.7.180.148
> nfs.disable: off
> performance.cache-refresh-timeout: 2
> performance.io-thread-count: 32
> performance.cache-size: 1024MB
> performance.read-ahead: on
> performance.cache-min-file-size: 0
> network.ping-timeout: 10
> [ 09:32:52 ] - root@gluster-ger-ber-10  /var/log $
> 
> 
> 
> 
> [ 19:10:55 ] - root@gluster-ger-ber-10  /var/log $gluster volume status
> Status of volume: ger-ber-01
> Gluster process                                 Port    Online  Pid
> ------------------------------------------------------------------------
> Brick gluster-ger-ber-11-int:/gluster-export    49152   Y       15994
> Brick gluster-ger-ber-12-int:/gluster-export    N/A     N       N/A
> Brick gluster-ger-ber-09-int:/gluster-export    49152   Y       10965
> Brick gluster-ger-ber-10-int:/gluster-export    N/A     N       N/A
> Brick gluster-ger-ber-07-int:/gluster-export    49152   Y       18542
> Brick gluster-ger-ber-08-int:/gluster-export    49152   Y       20275
> NFS Server on localhost                         2049    Y       13658
> Self-heal Daemon on localhost                   N/A     Y       13666
> NFS Server on gluster-ger-ber-09-int            2049    Y       13503
> Self-heal Daemon on gluster-ger-ber-09-int      N/A     Y       13511
> NFS Server on gluster-ger-ber-07-int            2049    Y       21526
> Self-heal Daemon on gluster-ger-ber-07-int      N/A     Y       21534
> NFS Server on gluster-ger

[Gluster-users] after upgrade to 3.6.7 : Internal error xfs_attr3_leaf_write_verify

2015-12-03 Thread Dietmar Putz

Hello all,

On 1 December I upgraded two 6-node clusters from glusterfs 3.5.6 to 3.6.7.
All nodes are identical in hw, os and patch level, currently running Ubuntu 
14.04 LTS, reached via do-release-upgrade from 12.04 LTS (this was done 
before the gfs upgrade to 3.5.6, not directly before upgrading to 3.6.7).
Because of a geo-replication issue, all of the nodes have rsync 3.1.1.3 
installed instead of the 3.1.0 that comes from the repositories. This is 
the only deviation from the Ubuntu repositories for 14.04 LTS.
Since the upgrade to gfs 3.6.7, glusterd on two nodes of the same cluster 
goes offline after an xfs_attr3_leaf_write_verify error on the underlying 
bricks, as shown below.
The problem can be cleared by an umount / remount of the brick, but it 
recurs about 4-5 hours later. Running xfs_check / xfs_repair before the 
remount makes no difference.
xfs_check / xfs_repair did not show any faults. The underlying hw is a 
RAID 5 volume on an lsi-9271 8i. megacli does not show any errors.

The syslog does not show more than the dmesg output below.
Every time, the same two nodes of the same cluster are affected.
As shown in dmesg and syslog, the system reports the 
xfs_attr3_leaf_write_verify error about 38 min. before finally giving up. 
For both events I cannot find corresponding entries in the gluster logs.
This is strange... the gluster installation has grown historically from 
3.2.5 via 3.3 to 3.4.6/7, which ran well for months; gfs 3.5.6 ran for 
about two weeks, and the upgrade to 3.6.7 was done because of a geo-repl 
log flood.
Even though I have no evidence that this is caused by gfs 3.6.7, I 
somehow believe that it is...
Has anybody experienced such an error, or does anybody have hints for 
getting out of this big problem...?
Unfortunately the affected cluster is the master of a geo-replication 
that has not been running well since the update from gfs 3.4.7... 
Fortunately the two affected gluster nodes are not in the same sub-volume.


any help is appreciated...

best regards
dietmar




[ 09:32:29 ] - root@gluster-ger-ber-10  /var/log $gluster volume info

Volume Name: ger-ber-01
Type: Distributed-Replicate
Volume ID: 6a071cfa-b150-4f0b-b1ed-96ab5d4bd671
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gluster-ger-ber-11-int:/gluster-export
Brick2: gluster-ger-ber-12-int:/gluster-export
Brick3: gluster-ger-ber-09-int:/gluster-export
Brick4: gluster-ger-ber-10-int:/gluster-export
Brick5: gluster-ger-ber-07-int:/gluster-export
Brick6: gluster-ger-ber-08-int:/gluster-export
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
cluster.min-free-disk: 200GB
geo-replication.indexing: on
auth.allow: 10.0.1.*,188.138.82.*,188.138.123.*,82.193.249.198,82.193.249.200,31.7.178.137,31.7.178.135,31.7.180.109,31.7.180.98,82.199.147.*,104.155.22.202,104.155.30.201,104.155.5.117,104.155.11.253,104.155.15.34,104.155.25.145,146.148.120.255,31.7.180.148
nfs.disable: off
performance.cache-refresh-timeout: 2
performance.io-thread-count: 32
performance.cache-size: 1024MB
performance.read-ahead: on
performance.cache-min-file-size: 0
network.ping-timeout: 10
[ 09:32:52 ] - root@gluster-ger-ber-10  /var/log $




[ 19:10:55 ] - root@gluster-ger-ber-10  /var/log $gluster volume status
Status of volume: ger-ber-01
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------
Brick gluster-ger-ber-11-int:/gluster-export    49152   Y       15994
Brick gluster-ger-ber-12-int:/gluster-export    N/A     N       N/A
Brick gluster-ger-ber-09-int:/gluster-export    49152   Y       10965
Brick gluster-ger-ber-10-int:/gluster-export    N/A     N       N/A
Brick gluster-ger-ber-07-int:/gluster-export    49152   Y       18542
Brick gluster-ger-ber-08-int:/gluster-export    49152   Y       20275
NFS Server on localhost                         2049    Y       13658
Self-heal Daemon on localhost                   N/A     Y       13666
NFS Server on gluster-ger-ber-09-int            2049    Y       13503
Self-heal Daemon on gluster-ger-ber-09-int      N/A     Y       13511
NFS Server on gluster-ger-ber-07-int            2049    Y       21526
Self-heal Daemon on gluster-ger-ber-07-int      N/A     Y       21534
NFS Server on gluster-ger-ber-08-int            2049    Y       24004
Self-heal Daemon on gluster-ger-ber-08-int      N/A     Y       24011
NFS Server on gluster-ger-ber-11-int            2049    Y       18944
Self-heal Daemon on gluster-ger-ber-11-int      N/A     Y       18952
NFS Server on gluster-ger-ber-12-int            2049    Y       19138
Self-heal Daemon on gluster-ger-ber-12-int      N/A     Y       19146

Task Status of Volume ger-ber-01
------------------------------------------------------------------------
There are no active volume tasks

- root@gluster-ger-ber-10  /var/log $
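
The remark that the two affected nodes are not in the same sub-volume can be checked against the two outputs above. This is a minimal sketch, assuming the usual rule for a Distributed-Replicate volume that consecutive bricks in the listed order form one replica set. `replica_sets` and `offline_per_set` are hypothetical helpers written for this illustration, and the offline set is read off the Online column of the volume status output.

```python
# Brick order as listed by `gluster volume info`.
BRICKS = [
    "gluster-ger-ber-11-int:/gluster-export",
    "gluster-ger-ber-12-int:/gluster-export",
    "gluster-ger-ber-09-int:/gluster-export",
    "gluster-ger-ber-10-int:/gluster-export",
    "gluster-ger-ber-07-int:/gluster-export",
    "gluster-ger-ber-08-int:/gluster-export",
]

# From "Number of Bricks: 3 x 2 = 6": three sub-volumes of two replicas.
REPLICA_COUNT = 2

# Bricks reported with Online = N by `gluster volume status`.
OFFLINE = {
    "gluster-ger-ber-12-int:/gluster-export",
    "gluster-ger-ber-10-int:/gluster-export",
}


def replica_sets(bricks, replica_count):
    """Group consecutive bricks into replica sets (sub-volumes)."""
    return [
        bricks[i:i + replica_count]
        for i in range(0, len(bricks), replica_count)
    ]


def offline_per_set(bricks, replica_count, offline):
    """For each replica set, list the member bricks that are offline."""
    return [
        [brick for brick in group if brick in offline]
        for group in replica_sets(bricks, replica_count)
    ]


if __name__ == "__main__":
    per_set = offline_per_set(BRICKS, REPLICA_COUNT, OFFLINE)
    for index, down in enumerate(per_set):
        state = "degraded" if down else "healthy"
        print(f"replica set {index}: {state} {down}")
```

With the data above, every replica set still has at least one brick online, which is consistent with the volume staying reachable while two bricks are down.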

- root@gluster-ger-ber-10  /var/log $dmesg -T
...
[Wed Dec  2 12:43:47 2015] XFS (sdc1): xfs_log_force: error 5 returned.
[Wed Dec  2 12:43:48 2015] XFS (sdc1): xfs_log_force: error 5 returned.
[Wed Dec  2 12:45:58 2015] XFS (sdc1): Mounting Filesystem
[Wed Dec  2 12:45