Re: writeout stalls in current -git

2007-11-04 Thread Torsten Kaiser
On 11/5/07, David Chinner <[EMAIL PROTECTED]> wrote:
> On Sun, Nov 04, 2007 at 12:19:19PM +0100, Torsten Kaiser wrote:
> > I can now confirm, that I see this also with the current 
> > mainline-git-version
> > I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c
> > plus the fix for the sg changes in ieee1394.
>
> Ok, so it's probably a side effect of the writeback changes.
>
> Attached are two patches (two because one was in a separate patchset as
> a standalone change) that should prevent async writeback from blocking
> on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first.
> Can you see if this fixes the problem?

Applied both patches against the kernel mentioned above.
This blows up at boot:
[   80.807589] Filesystem "dm-0": Disabling barriers, not supported by
the underlying device
[   80.820241] XFS mounting filesystem dm-0
[   80.913144] [ cut here ]
[   80.914932] kernel BUG at drivers/md/raid5.c:143!
[   80.916751] invalid opcode:  [1] SMP
[   80.918338] CPU 3
[   80.919142] Modules linked in:
[   80.920345] Pid: 974, comm: md1_raid5 Not tainted 2.6.24-rc1 #3
[   80.922628] RIP: 0010:[]  []
__release_stripe+0x164/0x170
[   80.925935] RSP: 0018:8100060e7dd0  EFLAGS: 00010002
[   80.927987] RAX:  RBX: 81010141c288 RCX: 
[   80.930738] RDX:  RSI: 81010141c288 RDI: 810004fb3200
[   80.933488] RBP: 810004fb3200 R08:  R09: 0005
[   80.936240] R10: 0e00 R11: e200038465e8 R12: 81010141c298
[   80.938990] R13: 0286 R14: 810004fb3330 R15: 
[   80.941741] FS:  0060c870() GS:810100313700()
knlGS:
[   80.944861] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[   80.947080] CR2: 7fff7b295000 CR3: 000101842000 CR4: 06e0
[   80.949830] DR0:  DR1:  DR2: 
[   80.952580] DR3:  DR6: 0ff0 DR7: 0400
[   80.955332] Process md1_raid5 (pid: 974, threadinfo
8100060e6000, task 81000645c730)
[   80.958584] Stack:  81010141c288 01f4
810004fb3200 804b6f2d
[   80.961761]  01f4 81010141c288 804c8bd0

[   80.964681]  8100060e7ee8 804bd094 81000645c730
8100060e7e70
[   80.967518] Call Trace:
[   80.968558]  [] release_stripe+0x3d/0x60
[   80.970677]  [] md_thread+0x0/0x100
[   80.972629]  [] raid5d+0x344/0x450
[   80.974549]  [] process_timeout+0x0/0x10
[   80.976668]  [] schedule_timeout+0x5a/0xd0
[   80.978855]  [] md_thread+0x0/0x100
[   80.980807]  [] md_thread+0x30/0x100
[   80.982794]  [] autoremove_wake_function+0x0/0x30
[   80.985214]  [] md_thread+0x0/0x100
[   80.987167]  [] kthread+0x4b/0x80
[   80.989054]  [] child_rip+0xa/0x12
[   80.990972]  [] kthread+0x0/0x80
[   80.992824]  [] child_rip+0x0/0x12
[   80.994743]
[   80.995588]
[   80.995588] Code: 0f 0b eb fe 0f 1f 84 00 00 00 00 00 48 83 ec 28
48 89 5c 24
[   80.999307] RIP  [] __release_stripe+0x164/0x170
[   81.001711]  RSP 

Switching back to unpatched 2.6.23-mm1 boots sucessfull...

Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: writeout stalls in current -git

2007-11-04 Thread David Chinner
On Sun, Nov 04, 2007 at 12:19:19PM +0100, Torsten Kaiser wrote:
> On 11/2/07, David Chinner <[EMAIL PROTECTED]> wrote:
> > That's stalled waiting on the inode cluster buffer lock. That implies
> > that the inode lcuser is already being written out and the inode has
> > been redirtied during writeout.
> >
> > Does the kernel you are testing have the "flush inodes in ascending
> > inode number order" patches applied? If so, can you remove that
> > patch and see if the problem goes away?
> 
> I can now confirm, that I see this also with the current mainline-git-version
> I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c
> plus the fix for the sg changes in ieee1394.

Ok, so it's probably a side effect of the writeback changes.

Attached are two patches (two because one was in a separate patchset as
a standalone change) that should prevent async writeback from blocking
on locked inode cluster buffers. Apply the xfs-factor-inotobp patch first.
Can you see if this fixes the problem?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


---
 fs/xfs/xfs_inode.c |  283 -
 1 file changed, 129 insertions(+), 154 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_inode.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_inode.c   2007-09-12 15:41:22.0 
+1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_inode.c2007-09-13 08:57:06.395641940 +1000
@@ -124,6 +124,126 @@ xfs_inobp_check(
 #endif
 
 /*
+ * Simple wrapper for calling xfs_imap() that includes error
+ * and bounds checking
+ */
+STATIC int
+xfs_ino_to_imap(
+   xfs_mount_t *mp,
+   xfs_trans_t *tp,
+   xfs_ino_t   ino,
+   xfs_imap_t  *imap,
+   uintimap_flags)
+{
+   int error;
+
+   error = xfs_imap(mp, tp, ino, imap, imap_flags);
+   if (error) {
+   cmn_err(CE_WARN, "xfs_ino_to_imap: xfs_imap()  returned an "
+   "error %d on %s.  Returning error.",
+   error, mp->m_fsname);
+   return error;
+   }
+
+   /*
+* If the inode number maps to a block outside the bounds
+* of the file system then return NULL rather than calling
+* read_buf and panicing when we get an error from the
+* driver.
+*/
+   if ((imap->im_blkno + imap->im_len) >
+   XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)) {
+   xfs_fs_cmn_err(CE_ALERT, mp, "xfs_ino_to_imap: "
+   "(imap->im_blkno (0x%llx) + imap->im_len (0x%llx)) > "
+   " XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) (0x%llx)",
+   (unsigned long long) imap->im_blkno,
+   (unsigned long long) imap->im_len,
+   XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
+   return XFS_ERROR(EINVAL);
+   }
+   return 0;
+}
+
+/*
+ * Find the buffer associated with the given inode map
+ * We do basic validation checks on the buffer once it has been
+ * retrieved from disk.
+ */
+STATIC int
+xfs_imap_to_bp(
+   xfs_mount_t *mp,
+   xfs_trans_t *tp,
+   xfs_imap_t  *imap,
+   xfs_buf_t   **bpp,
+   uintbuf_flags,
+   uintimap_flags)
+{
+   int error;
+   int i;
+   int ni;
+   xfs_buf_t   *bp;
+
+   error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno,
+  (int)imap->im_len, XFS_BUF_LOCK, &bp);
+   if (error) {
+   cmn_err(CE_WARN, "xfs_imap_to_bp: xfs_trans_read_buf()returned "
+   "an error %d on %s.  Returning error.",
+   error, mp->m_fsname);
+   return error;
+   }
+
+   /*
+* Validate the magic number and version of every inode in the buffer
+* (if DEBUG kernel) or the first inode in the buffer, otherwise.
+*/
+#ifdef DEBUG
+   ni = BBTOB(imap->im_len) >> mp->m_sb.sb_inodelog;
+#else  /* usual case */
+   ni = 1;
+#endif
+
+   for (i = 0; i < ni; i++) {
+   int di_ok;
+   xfs_dinode_t*dip;
+
+   dip = (xfs_dinode_t *)xfs_buf_offset(bp,
+   (i << mp->m_sb.sb_inodelog));
+   di_ok = be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC 
&&
+   XFS_DINODE_GOOD_VERSION(dip->di_core.di_version);
+   if (unlikely(XFS_TEST_ERROR(!di_ok, mp,
+   XFS_ERRTAG_ITOBP_INOTOBP,
+   XFS_RANDOM_ITOBP_INOTOBP))) {
+   if (imap_flags & XFS_IMAP_BULKSTAT) {
+   xfs_trans_brelse(tp, bp);
+   return XFS_ERROR(EINVAL)

Re: Massive slowdown when re-querying large nfs dir

2007-11-04 Thread Al Boldi
Matthew Wilcox wrote:
> How about tcpdumping and seeing what requests are flowing across the
> wire?  You might be able to figure out what's being done differently.

I think lookup is faster than getattr.


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Massive slowdown when re-querying large nfs dir

2007-11-04 Thread Matthew Wilcox
On Mon, Nov 05, 2007 at 07:58:38AM +0300, Al Boldi wrote:
> Any ideas?

How about tcpdumping and seeing what requests are flowing across the
wire?  You might be able to figure out what's being done differently.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Massive slowdown when re-querying large nfs dir

2007-11-04 Thread Al Boldi
There is a massive (3-18x) slowdown when re-querying a large nfs dir (2k+ 
entries) using a simple ls -l.

On 2.6.23 client and server running userland rpc.nfs.V2:
first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
more tries: time -p ls -l <2k+ entry dir>  in ~8sec

first  try: time -p ls -l <5k+ entry dir>  in ~9sec
more tries: time -p ls -l <5k+ entry dir>  in ~180sec

On 2.6.23 client and 2.4.31 server running userland rpc.nfs.V2:
first  try: time -p ls -l <2k+ entry dir>  in ~2.5sec
more tries: time -p ls -l <2k+ entry dir>  in ~7sec

first  try: time -p ls -l <5k+ entry dir>  in ~8sec
more tries: time -p ls -l <5k+ entry dir>  in ~43sec

Remounting the nfs-dir on the client resets the problem.

Any ideas?


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: writeout stalls in current -git

2007-11-04 Thread Torsten Kaiser
On 11/2/07, David Chinner <[EMAIL PROTECTED]> wrote:
> That's stalled waiting on the inode cluster buffer lock. That implies
> that the inode lcuser is already being written out and the inode has
> been redirtied during writeout.
>
> Does the kernel you are testing have the "flush inodes in ascending
> inode number order" patches applied? If so, can you remove that
> patch and see if the problem goes away?

I can now confirm, that I see this also with the current mainline-git-version
I used 2.6.24-rc1-git-b4f555081fdd27d13e6ff39d455d5aefae9d2c0c
plus the fix for the sg changes in ieee1394.
Bisecting would be troublesome, as the sg changes prevent mainline to
boot with my normal config / kill my network.

treogen ~ # vmstat 10
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
-> starting emerge
 1  0  0 3627072332 157724009713   41  189  2  2 94  2
 0  0  0 3607240332 16373600   59910  332  951  2  1 93  4
 0  0  0 3601920332 16759200   380 2  218  870  1  1 98  0
 0  0  0 3596356332 17164800   40421  182  818  0  0 99  0
 0  0  0 3579328332 18043600   87812  147  912  1  1 97  2
 0  0  0 3575376332 18277600   236 4  244  953  1  1 95  3
 2  1  0 3571792332 18508400   232 7  256 1003  2  1 95  2
 0  0  0 3564844332 18736400   228   605  246 1167  2  1 93  4
 0  0  0 3562128332 18978400   230 4  527 1238  2  1 93  4
 0  1  0 3558764332 19196400   21624  438 1059  1  1 93  6
 0  0  0 3555120332 19386800   19936  406  959  0  0 92  8
 0  0  0 3552008332 19592800   19711  458 1023  1  1 90  8
 0  0  0 3548728332 19766000   183 7  496 1086  1  1 90  8
 0  0  0 3545560332 19937200   170 8  483 1017  1  1 90  9
 0  1  0 3542124332 20125600   190 1  544 1137  1  1 88 10
 1  0  0 3536924332 20329600   195 7  637 1209  2  1 89  8
 1  1  0 3485096332 24918400   10116 10372 4537 13  3 76  8
 2  0  0 3442004332 27972800  108640  219 1349  7  3 87  4
-> emerge is done reading its package database
 1  0  0 3254796332 44863600 027  128 8360 24  6 70  0
 2  0  0 3143304332 554016004733  213 4480 16 11 72  1
-> kernel unpacked
 1  0  0 3125700332 56041600 120  122 1675 24  1 75  0
 1  0  0 3117356332 56796800 0   674  157 2975 24  2 73  1
 2  0  0 3111636332 57373600 0  1143  151 1924 23  1 75  1
 2  0  0 3102836332 58133200 0   890  153 1330 24  1 75  0
 1  0  0 3097236332 58736000 0   656  194 1593 24  1 74  0
 1  0  0 3086824332 59548000 0   812  235 2657 25  1 74  0
-> tar.bz2 created, installing starts now
 0  0  0 3091612332 6010240082   708  499 2397 17  4 78  1
 0  0  0 3086088332 6021800069  2459  769 2237  3  4 88  6
 0  0  0 3085916332 60223600 2  1752  693  949  1  2 96  1
 0  0  0 3084544332 6035640066  4057 1176 2850  3  6 91  0
 0  0  0 3078780332 6055720098  3194 1169 3288  5  6 89  0
 0  0  0 3077940332 6059240017  1139  823 1547  1  2 97  0
 0  0  0 3078268332 60592400 0   888  807 1329  0  1 99  0
-> first short stall
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 0  0  0 3077040332 60592400 0  1950  785 1495  0  2 89  8
 0  0  0 3076588332 60589600 2  3807  925 2046  1  4 95  0
 0  0  0 3076900332 6060520011  2564  768 1471  1  3 95  1
 0  0  0 3071584332 6079280087  2499 1108 3433  4  6 90  0
-> second longer stall
(emerge was not able to complete a single filemove until the 'resume' line)
 0  0  0 3071592332 60792800 0   693  692 1289  0  0 99  0
 0  0  0 3072584332 60792800 0   792  731 1507  0  1 99  0
 0  0  0 3072840332 60792800 0   806  707 1521  0  1 99  0
 0  0  0 3072724332 60792800 0   782  695 1372  0  0 99  0
 0  0  0 3072972332 60792800 0   677  612 1301  0  0 99  0
 0  0  0 3072772332 60792800 0   738  681 1352  1  1 99  0
 0  0  0 3073020332 60792800 0   785  708 1328  0  1 99  0
 0  0  0 3072896332 60792800 0   833  722 1383  0  0 99  0
-> emerge resumed
 0  0  0 3069476332 60797200 2  4885  812 2062  1  4 90  5
 1  0  0 3069648332 60806800 4