Re: 2.6.18 mmap hangs unrelated apps
On 2006/12/15 at 15:44:14 Trond Myklebust <[EMAIL PROTECTED]> wrote
> On Fri, 2006-12-15 at 15:06 -0600, Michal Sabala wrote:
> >
> > What nfs_debug information would be useful in tracking this
> > problem? Is there any other information I can provide you?
>
> Could you just out of interest try 2.6.20-rc1?

Hello Trond, Andrew,

For what it's worth, after running 2.6.20-rc1 for ~12 hours, I did not
observe the uninterruptible sleep condition.

Thanks,

Michal

--
Michal "Saahbs" Sabala
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.18 mmap hangs unrelated apps
On 2006/12/15 at 15:44:14 Trond Myklebust <[EMAIL PROTECTED]> wrote
> On Fri, 2006-12-15 at 15:06 -0600, Michal Sabala wrote:
> > Could this be related to the fact that the nfs mmaped file is unlinked
> > before it is unmapped? The .nfsXXX file disappears from the NFS
> > server as soon as test-mmap.c exits.
>
> That shouldn't normally matter. The file won't be deleted until after
> the last user has stopped referencing it. However it is true that the
> trace you sent indicated that XFree86 was hanging in iput().
>
> > What nfs_debug information would be useful in tracking this
> > problem? Is there any other information I can provide you?
>
> Could you just out of interest try 2.6.20-rc1?

Trond,

I'll try 2.6.20-rc1 on Monday and post results to the list.

Thanks,

Michal

--
Michal "Saahbs" Sabala
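Trond's point above, that an unlinked file is not deleted until its last user stops referencing it, can be checked on a local filesystem with a small sketch like the one below. It is only an illustration under stated assumptions: the function name mapping_survives_unlink and the /tmp path are made up for this example, and it says nothing about the NFS client behaviour under memory pressure that the thread is chasing. On NFS the unlinked file would show up as a .nfsXXX entry on the server instead of vanishing outright.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a shared mapping stays usable after its file has been
 * closed and unlinked, 0 if it does not, -1 if the setup itself failed.
 * The function name and the /tmp template are hypothetical. */
int mapping_survives_unlink(void)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";
    const size_t len = 4096;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Same order as steps 3-5 of the report: mmap, close, unlink. */
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    unlink(path);
    if (map == MAP_FAILED)
        return -1;

    /* The name is gone, but the mapping still holds a reference. */
    int name_gone = (access(path, F_OK) == -1);
    memset(map, 'X', len);
    int data_ok = (map[0] == 'X' && map[len - 1] == 'X');

    munmap(map, len);   /* last reference dropped; file is freed here */
    return (name_gone && data_ok) ? 1 : 0;
}
```

Calling mapping_survives_unlink() from a main() and printing the result is enough to try it. A return value of 1 matches the behaviour Trond describes.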
Re: 2.6.18 mmap hangs unrelated apps
On 2006/12/15 at 15:12:06 Arjan van de Ven <[EMAIL PROTECTED]> wrote
> >
> > I do not have any indication that it is the server not responding. Other
> > applications which have NFS files open are continuing to work while in
> > this case XFree86 blocks.
>
> just a strange question, but which video driver do you use in X? maybe
> that one is blocking say the pci bus or something...

Arjan,

The P3 box with nfs root uses the "ati" X11 driver with:
:01:00.0 VGA compatible controller: ATI Technologies Inc Rage Mobility P/M AGP 2x (rev 64)

The P4 box with nfs /home uses the "i810" X11 driver with:
:00:02.0 VGA compatible controller: Intel Corp. 82865G Integrated Graphics Device (rev 02)

Thanks,

Michal

--
Michal "Saahbs" Sabala
Re: 2.6.18 mmap hangs unrelated apps
On 2006/12/15 at 14:42:08 Andrew Morton <[EMAIL PROTECTED]> wrote
> On Fri, 15 Dec 2006 11:50:30 -0600
> Michal Sabala <[EMAIL PROTECTED]> wrote:
>
> > On 2006/12/15 at 10:24:15 Trond Myklebust <[EMAIL PROTECTED]> wrote
> > > On Thu, 2006-12-14 at 20:30 -0600, Michal Sabala wrote:
> > > >
> > > > `cat /proc/*PID*/wchan` for all hanging processes contains page_sync.
> > >
> > > Have you tried an 'echo t >/proc/sysrq-trigger' on a client with one of
> > > these hanging processes? If so, what does the output look like?
> >
> > Hello Trond,
> >
> > Below is the sysrq trace output for XFree86 which entered the
> > uninterruptible sleep state on the P4 machine with nfs /home. Please
> > note that XFree86 does not have any files open in /home - as reported by
> > `lsof`. Below, I also listed the output of vmstat.
>
> We'd need to see the trace of all D-state processes, please. Xfree86 might
> just be a victim of a deadlock elsewhere. However there is a problem here..

Hi Andrew,

In most cases only a single process enters the D-state. This time it was
XFree86, but I've also seen gimp, firefox, gconfd and bash. Once or twice I
did see two or three processes end up in uninterruptible sleep, but I
suspect they entered this state during different test-mmap.c runs (I left
test-mmap.c running in a bash loop and checked the system after a few
hours).

Would it be beneficial to keep running test-mmap.c on this machine until
two or more processes end up in D-state? I can leave this machine running
test-mmap.c over the weekend.

Please advise,

Sincerely,

Michal

--
Michal "Saahbs" Sabala
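For gathering the list of D-state processes that Andrew asks about, a rough sketch along the following lines might help when checking the machine after a long test loop. It assumes the usual Linux /proc layout: the state letter follows the parenthesised command name in /proc/PID/stat, and /proc/PID/wchan is the file already used earlier in the thread. The names stat_state and list_dstate_tasks are invented for this example. It does not replace the sysrq-t trace; it only shows which PIDs to look for in it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the state letter from one /proc/PID/stat line, or 0 on a parse
 * failure. The command name is enclosed in parentheses and may itself
 * contain spaces or ')', so look for the last ')' in the line. */
char stat_state(const char *line)
{
    assert(line != NULL);
    const char *p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Print "PID D wchan" for every task in uninterruptible sleep.
 * Returns the number of such tasks, or -1 if /proc cannot be opened. */
int list_dstate_tasks(FILE *out)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int found = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                       /* not a PID directory */

        char path[300], line[512], wchan[128] = "?";
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                       /* task exited meanwhile */
        char state = fgets(line, sizeof line, f) ? stat_state(line) : 0;
        fclose(f);
        if (state != 'D')
            continue;

        /* wchan may be unreadable or empty; fall back to "?" then. */
        snprintf(path, sizeof path, "/proc/%s/wchan", de->d_name);
        f = fopen(path, "r");
        if (f != NULL) {
            if (fgets(wchan, sizeof wchan, f) == NULL)
                strcpy(wchan, "?");
            fclose(f);
        }
        wchan[strcspn(wchan, "\n")] = '\0';
        fprintf(out, "%s D %s\n", de->d_name, wchan);
        found++;
    }
    closedir(proc);
    return found;
}
```

A main() that calls list_dstate_tasks(stdout) once a minute from the same bash loop that runs test-mmap.c would record when each process first got stuck, which should help tell whether several hung processes belong to the same run.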
Re: 2.6.18 mmap hangs unrelated apps
On 2006/12/15 at 13:44:44 Trond Myklebust <[EMAIL PROTECTED]> wrote
> On Fri, 2006-12-15 at 11:50 -0600, Michal Sabala wrote:
> > On 2006/12/15 at 10:24:15 Trond Myklebust <[EMAIL PROTECTED]> wrote
> > > On Thu, 2006-12-14 at 20:30 -0600, Michal Sabala wrote:
> > > >
> > > > `cat /proc/*PID*/wchan` for all hanging processes contains page_sync.
> > >
> > > Have you tried an 'echo t >/proc/sysrq-trigger' on a client with one of
> > > these hanging processes? If so, what does the output look like?
> >
> > Hello Trond,
> >
> > Below is the sysrq trace output for XFree86 which entered the
> > uninterruptible sleep state on the P4 machine with nfs /home. Please
> > note that XFree86 does not have any files open in /home - as reported by
> > `lsof`. Below, I also listed the output of vmstat.
>
>
> It is hanging because it is trying to free up memory by reclaiming pages
> that are held by your mmaped file on NFS. Do you know why NFS is
> hanging?

Trond,

I do not have any indication that it is the server not responding. Other
applications which have NFS files open are continuing to work while in
this case XFree86 blocks.

Also, please note that test-mmap.c has successfully finished execution
and is no longer running while XFree86 is still hanging.

Could this be related to the fact that the nfs mmaped file is unlinked
before it is unmapped? The .nfsXXX file disappears from the NFS
server as soon as test-mmap.c exits.

What nfs_debug information would be useful in tracking this
problem? Is there any other information I can provide you?

Thank You,

Sincerely,

Michal

--
Michal "Saahbs" Sabala
Re: 2.6.18 mmap hangs unrelated apps
On 2006/12/15 at 10:24:15 Trond Myklebust <[EMAIL PROTECTED]> wrote
> On Thu, 2006-12-14 at 20:30 -0600, Michal Sabala wrote:
> >
> > `cat /proc/*PID*/wchan` for all hanging processes contains page_sync.
>
> Have you tried an 'echo t >/proc/sysrq-trigger' on a client with one of
> these hanging processes? If so, what does the output look like?

Hello Trond,

Below is the sysrq trace output for XFree86 which entered the
uninterruptible sleep state on the P4 machine with nfs /home. Please
note that XFree86 does not have any files open in /home - as reported by
`lsof`. Below, I also listed the output of vmstat.

XFree86 D 0003 0 2471 2453 (NOTLB)
 c4871c0c 3082 c86b72bc 0003 cb7c94a4 001d 3b67f3ff c0146dd2
 c1184180 cb3e7110 001ec7ff a60f8097 0089 c02e1e60 cb3e7000 c1184180
 c1180030 c4871c18 c028c7d8 c4871c5c c01435b6 c01435f3
Call Trace:
 [] free_pages_bulk+0x1d/0x1d4
 [] io_schedule+0x26/0x30
 [] sync_page+0x0/0x40
 [] sync_page+0x3d/0x40
 [] __wait_on_bit_lock+0x2c/0x52
 [] __lock_page+0x6a/0x72
 [] wake_bit_function+0x0/0x3c
 [] wake_bit_function+0x0/0x3c
 [] pagevec_lookup+0x17/0x1d
 [] truncate_inode_pages_range+0x20a/0x260
 [] truncate_inode_pages+0x9/0xc
 [] generic_delete_inode+0xb6/0x10f
 [] iput+0x5f/0x61
 [] dentry_iput+0x68/0x83
 [] dput+0x100/0x118
 [] put_nfs_open_context+0x67/0x88 [nfs]
 [] nfs_release_request+0x38/0x47 [nfs]
 [] nfs_wait_on_requests_locked+0x62/0x98 [nfs]
 [] nfs_sync_inode_wait+0x4a/0x130 [nfs]
 [] nfs_release_page+0x0/0x30 [nfs]
 [] nfs_release_page+0x1c/0x30 [nfs]
 [] try_to_release_page+0x34/0x46
 [] shrink_page_list+0x263/0x350
 [] do_IRQ+0x48/0x50
 [] common_interrupt+0x1a/0x20
 [] shrink_inactive_list+0x9b/0x248
 [] shrink_zone+0xb5/0xd0
 [] shrink_zones+0x6a/0x7e
 [] try_to_free_pages+0xf8/0x1da
 [] __alloc_pages+0x17c/0x278
 [] do_anonymous_page+0x45/0x150
 [] __handle_mm_fault+0xda/0x1bf
 [] do_page_fault+0x1c4/0x4bc
 [] restore_sigcontext+0x10c/0x15f
 [] do_page_fault+0x0/0x4bc
 [] error_code+0x39/0x40

$> vmstat
procs ---memory-- ---swap-- -io --system-- cpu
 r b swpd free buff cache si so bi bo in cs us sy id wa
 2 1 0 82128 11484 36096001221 311 287 8 3 0 89

$> vmstat -m
Cache                       Num  Total   Size  Pages
nfs_direct_cache              0      0     76     50
nfs_write_data               36     91    512      7
nfs_read_data                32     35    512      7
nfs_inode_cache              28    108    648      6
nfs_page                      1     59     64     59
rpc_buffers                   8      8   2048      2
rpc_tasks                     8     15    256     15
rpc_inode_cache               8     14    512      7
fib6_nodes                    5    113     32    113
ip6_dst_cache                 4     15    256     15
ndisc_cache                   1     15    256     15
RAWv6                         4      6    640      6
UDPv6                         1      6    640      6
tw_sock_TCPv6                 0      0    128     30
request_sock_TCPv6            0      0    128     30
TCPv6                         2      3   1280      3
ip_conntrack_expect           0      0     96     40
ip_conntrack                 13     68    224     17
ip_fib_alias                  9    113     32    113
ip_fib_hash                   9    113     32    113
jbd_4k                        0      0   4096      1
Cache                       Num  Total   Size  Pages
ext3_inode_cache          12284  25352    504      8
ext3_xattr                    0      0     48     78
journal_handle                2    169     20    169
journal_head                 85    504     52     72
revoke_table                  2    254     12    254
revoke_record                 0      0     16    203
uhci_urb_priv                 1    127     28    127
clip_arp_cache                0      0    256     15
UNIX                         46    133    512      7
flow_cache                    0      0    128     30
cfq_ioc_pool                 29     84     92     42
cfq_pool                     27     80     96     40
crq_pool                     20     84     44     84
deadline_drq                  0      0     44     84
as_arq                        0      0     56     67
mqueue_inode_cache            1      6    640      6
dnotify_cache                 0      0     20    169
dquot                         0      0    128     30
eventpoll_pwq                 0      0     36    101
eventpoll_epi                 0      0    128     30
inotify_event_cache           0      0     28    127
Cache                       Num  Total   Size  Pages
inotify_watch_cache           1     92     40     92
kioctx                        0      0    256     15
kiocb                         0      0    128     30
fasync_cache                  2    203     16    203
shm
2.6.18 mmap hangs unrelated apps
Hello LKML,

I am observing processes entering uninterruptible sleep, apparently due to
an unrelated application using mmap over nfs. Applications in
"uninterruptible sleep" hang indefinitely while other applications
continue working properly.

The code causing the mmap nfs hangs does the following (as replicated by
the included test-mmap.c file):

1. create file on nfs (file_A, descr_A)
2. make file_A a sparse 200MB file
3. mmap descr_A
4. close descr_A
5. unlink file_A
6. memcpy 200MB to mmaped buffer
7. create a second file on nfs (file_B, descr_B)
8. write() 200MB from mmaped buffer to descr_B
9. close descr_B
10. munmap first file

This code may need to be run tens to hundreds of times to trigger the
condition.

During the execution of the above code, unrelated applications enter
uninterruptible sleep (D) - usually firefox2.0, Xorg/XFree86, gimp2.2,
gconfd or bash; probably the most active processes.

`dmesg` shows nothing of interest.

`free` shows anywhere between 1MB and 80MB of memory still remaining
free when the problem occurs.

`cat /proc/*PID*/wchan` for all hanging processes contains page_sync.

* Client Setups:

Linux 2.6.18 debian kernel (not tainted)
Intel P3/800
512MB ram
0 swap
NFS root (rw,noatime,rsize=8192,wsize=8192,nfsvers=3,hard,lock,udp)
NIC: 100mbit tulip Cardbus
NFS server is Linux 2.6.8 (debian)
Gnome running with ooffice, gimp2.2 and firefox2 open

and

Linux 2.6.18 debian kernel (not tainted)
Intel P4/2.8
mem=192M boot option
0 swap
NFS home (rw,nosuid,rsize=8192,wsize=8192,hard)
NIC: 100mbit e100 PCI
NFS server is Apple OSX 10.3
Gnome running with ooffice, gimp2.2 and firefox2 open

This happens with NFS servers based on Linux 2.6.8 and OSX 10.3.x. There
is nothing unusual in the server log files.

Other than large nfs mmaps on limited ram clients, NFS clients are 100%
stable (file locking, performance, 6 month uptimes, etc..)
NOTE: I also ran the same code on the P4 machine in /tmp (local disk)
and it too caused some applications to enter uninterruptible sleep
(dozens of consecutive runs were needed). As such, this does not look to
be directly related to nfs.

I would like to assist in any way I can in tracking this bug. I am open
to running patched kernels, etc...

Thank You,

Sincerely,

Michal Sabala

PS. thank you for all the hard work on the Linux kernel.

--- test-mmap.c:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int main (int argc, char * argv[] ){

    char * data = 0;
    int blocks = 12800;
    int bSize = 16384;    /* 12800 * 16384 bytes = 200MB */

    /* mkstemp() requires the template to end in six X characters */
    char mmapFileName[] = "temp-XXXXXX";
    int mmapFileDes = mkstemp( mmapFileName );

    if ( mmapFileDes == -1 ){
        printf( "cannot make temporary file %s !\n", mmapFileName );
        exit( -1 );
    }

    printf( "using desc %d tempfile %s\n", mmapFileDes, mmapFileName );

    /* make the file sparse: seek to the last byte and write one byte */
    errno = 0;
    if ( lseek( mmapFileDes, (blocks*bSize)-1, SEEK_SET ) == -1 ){
        if ( errno != 0 ){
            perror ( "lseek error: " );
        }
        printf( "cannot lseek tempfile %s !\n", mmapFileName);
        close( mmapFileDes );
        unlink( mmapFileName );
        exit( -1 );
    }

    if ( write( mmapFileDes, "X", 1 ) != 1 ){
        printf( "cannot sparse write tempfile %s !\n", mmapFileName);
        close( mmapFileDes );
        unlink( mmapFileName );
        exit( -1 );
    }

    data = mmap ( NULL, (blocks*bSize), PROT_READ | PROT_WRITE,
                  MAP_SHARED, mmapFileDes, 0 );

    if ( data == MAP_FAILED ){
        printf( "mmap of %s failed!\n", mmapFileName );
        close( mmapFileDes );
        unlink( mmapFileName );
        exit( -1 );
    }

    printf( "block size: %d, blocks num: %d\n", bSize, blocks);

    /* close and unlink the file while it is still mapped */
    close( mmapFileDes );
    unlink( mmapFileName );

    /* dirty every page of the mapping */
    int i;
    char * ptr = data;
    for ( i = 1; i <= blocks; i++ ){
        printf( "wrote %d of %d blocks to %s\n", i, blocks, mmapFileName );
        memset( ptr, 0, bSize );
        ptr += bSize;
    }

    // msync( data, blocks*bSize, MS_SYNC );

    char destFile[] = "destination-XXXXXX";
    int destDes = mkstemp( destFile );

    if ( destDes == -1 ){
        printf( "cannot make destination file %s !\n", destFile );
        exit( -1 );
    }

    printf( "using desc %d destfile %s\n", destDes, destFile);

    /* copy the mapped buffer out to the second file with write() */
    ptr = data;
    for ( i = 1; i <= blocks; i++ ){
        int wLen = write( destDes, ptr, bSize );
        printf( "wrote %d of %d blocks to %s\n", i, blocks, destFile );
        if ( wLen != bSize ){
            printf( "debug: short write to %s at %d bytes\n", destFile, wLen );
        }
        ptr += bSize;
    }

    close( destDes );
    munmap( data, blocks*bSize );

    exit( 0 );
}

--
Michal "Saahbs" Sabala