Re: writeout stalls in current -git
On 11/5/07, David Chinner <[EMAIL PROTECTED]> wrote: > On Sun, Nov 04, 2007 at 12:19:19PM +0100, Torsten Kaiser wrote: > > I can now confirm, that I see this also with the current > > mainline-git-version > > I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c > > plus the fix for the sg changes in ieee1394. > > Ok, so it's probably a side effect of the writeback changes. > > Attached are two patches (two because one was in a separate patchset as > a standalone change) that should prevent async writeback from blocking > on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first. > Can you see if this fixes the problem? Applied both patches against the kernel mentioned above. This blows up at boot: [ 80.807589] Filesystem "dm-0": Disabling barriers, not supported by the underlying device [ 80.820241] XFS mounting filesystem dm-0 [ 80.913144] [ cut here ] [ 80.914932] kernel BUG at drivers/md/raid5.c:143! [ 80.916751] invalid opcode: [1] SMP [ 80.918338] CPU 3 [ 80.919142] Modules linked in: [ 80.920345] Pid: 974, comm: md1_raid5 Not tainted 2.6.24-rc1 #3 [ 80.922628] RIP: 0010:[] [] __release_stripe+0x164/0x170 [ 80.925935] RSP: 0018:8100060e7dd0 EFLAGS: 00010002 [ 80.927987] RAX: RBX: 81010141c288 RCX: [ 80.930738] RDX: RSI: 81010141c288 RDI: 810004fb3200 [ 80.933488] RBP: 810004fb3200 R08: R09: 0005 [ 80.936240] R10: 0e00 R11: e200038465e8 R12: 81010141c298 [ 80.938990] R13: 0286 R14: 810004fb3330 R15: [ 80.941741] FS: 0060c870() GS:810100313700() knlGS: [ 80.944861] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b [ 80.947080] CR2: 7fff7b295000 CR3: 000101842000 CR4: 06e0 [ 80.949830] DR0: DR1: DR2: [ 80.952580] DR3: DR6: 0ff0 DR7: 0400 [ 80.955332] Process md1_raid5 (pid: 974, threadinfo 8100060e6000, task 81000645c730) [ 80.958584] Stack: 81010141c288 01f4 810004fb3200 804b6f2d [ 80.961761] 01f4 81010141c288 804c8bd0 [ 80.964681] 8100060e7ee8 804bd094 81000645c730 8100060e7e70 [ 80.967518] Call Trace: [ 80.968558] [] release_stripe+0x3d/0x60 [ 80.970677] [] md_thread+0x0/0x100 [ 80.972629] [] raid5d+0x344/0x450 [ 80.974549] [] process_timeout+0x0/0x10 [ 80.976668] [] schedule_timeout+0x5a/0xd0 [ 80.978855] [] md_thread+0x0/0x100 [ 80.980807] [] md_thread+0x30/0x100 [ 80.982794] [] autoremove_wake_function+0x0/0x30 [ 80.985214] [] md_thread+0x0/0x100 [ 80.987167] [] kthread+0x4b/0x80 [ 80.989054] [] child_rip+0xa/0x12 [ 80.990972] [] kthread+0x0/0x80 [ 80.992824] [] child_rip+0x0/0x12 [ 80.994743] [ 80.995588] [ 80.995588] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 48 83 ec 28 48 89 5c 24 [ 80.999307] RIP [] __release_stripe+0x164/0x170 [ 81.001711] RSP Switching back to unpatched 2.6.23-mm1 boots sucessfull... Torsten - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: writeout stalls in current -git
On Sun, Nov 04, 2007 at 12:19:19PM +0100, Torsten Kaiser wrote: > On 11/2/07, David Chinner <[EMAIL PROTECTED]> wrote: > > That's stalled waiting on the inode cluster buffer lock. That implies > > that the inode lcuser is already being written out and the inode has > > been redirtied during writeout. > > > > Does the kernel you are testing have the "flush inodes in ascending > > inode number order" patches applied? If so, can you remove that > > patch and see if the problem goes away? > > I can now confirm, that I see this also with the current mainline-git-version > I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c > plus the fix for the sg changes in ieee1394. Ok, so it's probably a side effect of the writeback changes. Attached are two patches (two because one was in a separate patchset as a standalone change) that should prevent async writeback from blocking on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first. Can you see if this fixes the problem? Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group --- fs/xfs/xfs_inode.c | 283 - 1 file changed, 129 insertions(+), 154 deletions(-) Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c === --- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c 2007-09-12 15:41:22.0 +1000 +++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c2007-09-13 08:57:06.395641940 +1000 @@ -124,6 +124,126 @@ xfs_inobp_check( #endif /* + * Simple wrapper for calling xfs_imap() that includes error + * and bounds checking + */ +STATIC int +xfs_ino_to_imap( + xfs_mount_t *mp, + xfs_trans_t *tp, + xfs_ino_t ino, + xfs_imap_t *imap, + uintimap_flags) +{ + int error; + + error = xfs_imap(mp, tp, ino, imap, imap_flags); + if (error) { + cmn_err(CE_WARN, "xfs_ino_to_imap: xfs_imap() returned an " + "error %d on %s. Returning error.", + error, mp->m_fsname); + return error; + } + + /* +* If the inode number maps to a block outside the bounds +* of the file system then return NULL rather than calling +* read_buf and panicing when we get an error from the +* driver. +*/ + if ((imap->im_blkno + imap->im_len) > + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) { + xfs_fs_cmn_err(CE_ALERT, mp, "xfs_ino_to_imap: " + "(imap->im_blkno (0x%llx) + imap->im_len (0x%llx)) > " + " XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) (0x%llx)", + (unsigned long long) imap->im_blkno, + (unsigned long long) imap->im_len, + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)); + return XFS_ERROR(EINVAL); + } + return 0; +} + +/* + * Find the buffer associated with the given inode map + * We do basic validation checks on the buffer once it has been + * retrieved from disk. + */ +STATIC int +xfs_imap_to_bp( + xfs_mount_t *mp, + xfs_trans_t *tp, + xfs_imap_t *imap, + xfs_buf_t **bpp, + uintbuf_flags, + uintimap_flags) +{ + int error; + int i; + int ni; + xfs_buf_t *bp; + + error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno, + (int)imap->im_len, XFS_BUF_LOCK, &bp); + if (error) { + cmn_err(CE_WARN, "xfs_imap_to_bp: xfs_trans_read_buf()returned " + "an error %d on %s. Returning error.", + error, mp->m_fsname); + return error; + } + + /* +* Validate the magic number and version of every inode in the buffer +* (if DEBUG kernel) or the first inode in the buffer, otherwise. +*/ +#ifdef DEBUG + ni = BBTOB(imap->im_len) >> mp->m_sb.sb_inodelog; +#else /* usual case */ + ni = 1; +#endif + + for (i = 0; i < ni; i++) { + int di_ok; + xfs_dinode_t*dip; + + dip = (xfs_dinode_t *)xfs_buf_offset(bp, + (i << mp->m_sb.sb_inodelog)); + di_ok = be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC && + XFS_DINODE_GOOD_VERSION(dip->di_core.di_version); + if (unlikely(XFS_TEST_ERROR(!di_ok, mp, + XFS_ERRTAG_ITOBP_INOTOBP, + XFS_RANDOM_ITOBP_INOTOBP))) { + if (imap_flags & XFS_IMAP_BULKSTAT) { + xfs_trans_brelse(tp, bp); + return XFS_ERROR(EINVAL)
Re: Massive slowdown when re-querying large nfs dir
Matthew Wilcox wrote: > How about tcpdumping and seeing what requests are flowing across the > wire? You might be able to figure out what's being done differently. I think lookup is faster than getattr. Thanks! -- Al - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Massive slowdown when re-querying large nfs dir
On Mon, Nov 05, 2007 at 07:58:38AM +0300, Al Boldi wrote: > Any ideas? How about tcpdumping and seeing what requests are flowing across the wire? You might be able to figure out what's being done differently. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Massive slowdown when re-querying large nfs dir
There is a massive (3-18x) slowdown when re-querying a large nfs dir (2k+ entries) using a simple ls -l. On 2.6.23 client and server running userland rpc.nfs.V2: first try: time -p ls -l <2k+ entry dir> in ~2.5sec more tries: time -p ls -l <2k+ entry dir> in ~8sec first try: time -p ls -l <5k+ entry dir> in ~9sec more tries: time -p ls -l <5k+ entry dir> in ~180sec On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2: first try: time -p ls -l <2k+ entry dir> in ~2.5sec more tries: time -p ls -l <2k+ entry dir> in ~7sec first try: time -p ls -l <5k+ entry dir> in ~8sec more tries: time -p ls -l <5k+ entry dir> in ~43sec Remounting the nfs-dir on the client resets the problem. Any ideas? Thanks! -- Al - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: writeout stalls in current -git
On 11/2/07, David Chinner <[EMAIL PROTECTED]> wrote: > That's stalled waiting on the inode cluster buffer lock. That implies > that the inode lcuser is already being written out and the inode has > been redirtied during writeout. > > Does the kernel you are testing have the "flush inodes in ascending > inode number order" patches applied? If so, can you remove that > patch and see if the problem goes away? I can now confirm, that I see this also with the current mainline-git-version I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c plus the fix for the sg changes in ieee1394. Bisecting would be troublesome, as the sg changes prevent mainline to boot with my normal config / kill my network. treogen ~ # vmstat 10 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa -> starting emerge 1 0 0 3627072332 157724009713 41 189 2 2 94 2 0 0 0 3607240332 16373600 59910 332 951 2 1 93 4 0 0 0 3601920332 16759200 380 2 218 870 1 1 98 0 0 0 0 3596356332 17164800 40421 182 818 0 0 99 0 0 0 0 3579328332 18043600 87812 147 912 1 1 97 2 0 0 0 3575376332 18277600 236 4 244 953 1 1 95 3 2 1 0 3571792332 18508400 232 7 256 1003 2 1 95 2 0 0 0 3564844332 18736400 228 605 246 1167 2 1 93 4 0 0 0 3562128332 18978400 230 4 527 1238 2 1 93 4 0 1 0 3558764332 19196400 21624 438 1059 1 1 93 6 0 0 0 3555120332 19386800 19936 406 959 0 0 92 8 0 0 0 3552008332 19592800 19711 458 1023 1 1 90 8 0 0 0 3548728332 19766000 183 7 496 1086 1 1 90 8 0 0 0 3545560332 19937200 170 8 483 1017 1 1 90 9 0 1 0 3542124332 20125600 190 1 544 1137 1 1 88 10 1 0 0 3536924332 20329600 195 7 637 1209 2 1 89 8 1 1 0 3485096332 24918400 10116 10372 4537 13 3 76 8 2 0 0 3442004332 27972800 108640 219 1349 7 3 87 4 -> emerge is done reading its package database 1 0 0 3254796332 44863600 027 128 8360 24 6 70 0 2 0 0 3143304332 554016004733 213 4480 16 11 72 1 -> kernel unpacked 1 0 0 3125700332 56041600 120 122 1675 24 1 75 0 1 0 0 3117356332 56796800 0 674 157 2975 24 2 73 1 2 0 0 3111636332 57373600 0 1143 151 1924 23 1 75 1 2 0 0 3102836332 58133200 0 890 153 1330 24 1 75 0 1 0 0 3097236332 58736000 0 656 194 1593 24 1 74 0 1 0 0 3086824332 59548000 0 812 235 2657 25 1 74 0 -> tar.bz2 created, installing starts now 0 0 0 3091612332 6010240082 708 499 2397 17 4 78 1 0 0 0 3086088332 6021800069 2459 769 2237 3 4 88 6 0 0 0 3085916332 60223600 2 1752 693 949 1 2 96 1 0 0 0 3084544332 6035640066 4057 1176 2850 3 6 91 0 0 0 0 3078780332 6055720098 3194 1169 3288 5 6 89 0 0 0 0 3077940332 6059240017 1139 823 1547 1 2 97 0 0 0 0 3078268332 60592400 0 888 807 1329 0 1 99 0 -> first short stall procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 0 0 0 3077040332 60592400 0 1950 785 1495 0 2 89 8 0 0 0 3076588332 60589600 2 3807 925 2046 1 4 95 0 0 0 0 3076900332 6060520011 2564 768 1471 1 3 95 1 0 0 0 3071584332 6079280087 2499 1108 3433 4 6 90 0 -> second longer stall (emerge was not able to complete a single filemove until the 'resume' line) 0 0 0 3071592332 60792800 0 693 692 1289 0 0 99 0 0 0 0 3072584332 60792800 0 792 731 1507 0 1 99 0 0 0 0 3072840332 60792800 0 806 707 1521 0 1 99 0 0 0 0 3072724332 60792800 0 782 695 1372 0 0 99 0 0 0 0 3072972332 60792800 0 677 612 1301 0 0 99 0 0 0 0 3072772332 60792800 0 738 681 1352 1 1 99 0 0 0 0 3073020332 60792800 0 785 708 1328 0 1 99 0 0 0 0 3072896332 60792800 0 833 722 1383 0 0 99 0 -> emerge resumed 0 0 0 3069476332 60797200 2 4885 812 2062 1 4 90 5 1 0 0 3069648332 60806800 4