Dear Kernel People,

I am running a tiny cluster (27 nodes). The setup is as follows:

* NFS server:
  kernel: vanilla 2.6.18.2 + areca drivers
  mounted partition is XFS
  nfs-kernel-server: 1.0.10-1~bpo.1 (from backports.org)
* clients:
  kernel: 2.6.17-2-amd64 (Debian stock from etch, installed a few months ago)

Today, under heavy load in which the nodes were manipulating tons of small files, I started to notice that software reported weird errors like:

EOFError: EOF read where object expected
mv: cannot stat <FILENAME> : No such file or directory

etc. Looking at the clients' dmesg, I found these nasty messages:

node17.ravana.rutgers.edu: Feb 23 13:32:06 node17 kernel: nfs_update_inode: inode 680223263 mode changed, 0042755 to 0100644
node17.ravana.rutgers.edu: Feb 23 13:41:19 node17 kernel: nfs_update_inode: inode 681427742 mode changed, 0042755 to 0100644
node22.ravana.rutgers.edu: Feb 22 22:58:15 node22 kernel: nfs_update_inode: inode 677306152 mode changed, 0042755 to 0100644
node22.ravana.rutgers.edu: Feb 23 13:48:33 node22 kernel: nfs_update_inode: inode 681695507 mode changed, 0100644 to 0042755
node12.ravana.rutgers.edu: Feb 23 13:31:01 node12 kernel: nfs_update_inode: inode 680150798 mode changed, 0100644 to 0042755
node12.ravana.rutgers.edu: Feb 23 13:34:57 node12 kernel: nfs_update_inode: inode 680418141 mode changed, 0100644 to 0042755
node12.ravana.rutgers.edu: Feb 23 13:37:01 node12 kernel: nfs_update_inode: inode 680637478 mode changed, 0100644 to 0042755
node12.ravana.rutgers.edu: Feb 23 13:39:25 node12 kernel: nfs_update_inode: inode 681034087 mode changed, 0100644 to 0042755
node12.ravana.rutgers.edu: Feb 23 13:40:06 node12 kernel: nfs_update_inode: inode 681225056 mode changed, 0042755 to 0100644
node12.ravana.rutgers.edu: Feb 23 13:43:26 node12 kernel: nfs_update_inode: inode 681474682 mode changed, 0100644 to 0042755
......

The server logs didn't show anything abnormal... Where should I look for the source of the problem? Googling hasn't turned up anything useful...
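The mode values in these messages are octal st_mode words: 0042755 is a directory (S_IFDIR) with the setgid bit and 0755 permissions, while 0100644 is a regular file (S_IFREG) with 0644 permissions. So each message says the client saw the file type of an already cached inode flip between directory and regular file. A minimal Python sketch for decoding such lines; the regular expression is an assumption fitted only to the message format quoted above, and LOG_LINES is just a subset of those lines:

```python
import re
import stat

# A subset of the client dmesg lines quoted above.
LOG_LINES = [
    "node17.ravana.rutgers.edu: Feb 23 13:32:06 node17 kernel: nfs_update_inode: "
    "inode 680223263 mode changed, 0042755 to 0100644",
    "node22.ravana.rutgers.edu: Feb 23 13:48:33 node22 kernel: nfs_update_inode: "
    "inode 681695507 mode changed, 0100644 to 0042755",
]

# Assumption: pattern fitted to the message format seen in these logs only.
# Captures the short hostname, the inode number and the two octal modes.
PATTERN = re.compile(
    r"(\S+) kernel: nfs_update_inode: inode (\d+) mode changed, (0[0-7]+) to (0[0-7]+)"
)


def decode(line):
    """Return (node, inode, old_mode, new_mode) with modes rendered ls-style,
    or None if the line is not an nfs_update_inode mode-change message."""
    m = PATTERN.search(line)
    if m is None:
        return None
    node, inode, old, new = m.groups()
    # The kernel prints the modes in octal, so parse them with base 8.
    old_mode, new_mode = int(old, 8), int(new, 8)
    return node, int(inode), stat.filemode(old_mode), stat.filemode(new_mode)


for line in LOG_LINES:
    print(decode(line))
```

For the first line this yields the node name, the inode number, 'drwxr-sr-x' and '-rw-r--r--', which makes the directory/regular-file flip explicit.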
Is there a way to eliminate the cause, maybe by tuning some performance parameters (i.e. sacrificing performance for stability)?

Thanks everyone in advance for any hints

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW: http://www.linkedin.com/in/yarik