Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Tue, Nov 22, 2016 at 05:14:02PM +0100, Vlastimil Babka wrote:
> On 11/22/2016 05:06 PM, Marc MERLIN wrote:
> > On Mon, Nov 21, 2016 at 01:56:39PM -0800, Marc MERLIN wrote:
> >> On Mon, Nov 21, 2016 at 10:50:20PM +0100, Vlastimil Babka wrote:
> 4.9rc5 however seems to be doing better, and is still running after 18
> hours. However, I got a few page allocation failures as per below, but
> the system seems to recover.
> Vlastimil, do you want me to continue the copy on 4.9 (may take 3-5
> days) or is that good enough, and i should go back to 4.8.8 with that
> patch applied?
> https://marc.info/?l=linux-mm&m=147423605024993
> >>>
> >>> Hi, I think it's enough for 4.9 for now and I would appreciate trying
> >>> 4.8 with that patch, yeah.
> >>
> >> So the good news is that it's been running for almost 5H and so far so
> >> good.
> >
> > And the better news is that the copy is still going strong, 4.4TB and
> > going. So 4.8.8 is fixed with that one single patch as far as I'm
> > concerned.
> >
> > So thanks for that, looks good to me to merge.
>
> Thanks a lot for the testing. So what do we do now about 4.8? (4.7 is
> already EOL AFAICS).
>
> - send the patch [1] as 4.8-only stable. Greg won't like that, I expect.
> - alternatively a simpler (again 4.8-only) patch that just outright
>   prevents OOM for 0 < order < costly, as Michal already suggested.
> - backport 10+ compaction patches to 4.8 stable
> - something else?
>
> Michal? Linus?
>
> [1] https://marc.info/?l=linux-mm&m=147423605024993

Sorry for my molasses rate of feedback. I found a workaround, setting vm/watermark_scale_factor to 500, and threw that in sysctl. This was on the MythTV box that OOMs everything after about a day on 4.8 otherwise.

I've been running [1] for 9 days on it (4.8.4 + [1]) without issue, but just realized I forgot to remove the watermark_scale_factor workaround. I've removed it now, so I'll see if it becomes unhappy by tomorrow.
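For reference, my understanding of what vm.watermark_scale_factor does in 4.6 and later (an assumption based on reading mm/page_alloc.c, not something measured on this box): each zone's low and high watermarks sit one and two "gaps" above the min watermark, where the gap is the larger of min/4 and managed_pages * watermark_scale_factor / 10000. The default factor of 10 is 0.1% of the zone, so 500 means 5%, which makes kswapd start reclaiming much earlier. A minimal C sketch; zone_wmarks() and struct wmarks are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical result type: per-zone watermarks, in pages. */
struct wmarks {
	uint64_t min, low, high;
};

/*
 * Sketch of the assumed v4.6+ rule:
 *   gap  = max(min / 4, managed_pages * scale_factor / 10000)
 *   low  = min + gap
 *   high = min + 2 * gap
 */
static struct wmarks zone_wmarks(uint64_t min_pages, uint64_t managed_pages,
				 unsigned int scale_factor)
{
	uint64_t gap = min_pages >> 2;
	uint64_t scaled = managed_pages * scale_factor / 10000;
	struct wmarks w;

	if (scaled > gap)
		gap = scaled;
	w.min = min_pages;
	w.low = min_pages + gap;
	w.high = min_pages + 2 * gap;
	return w;
}
```

For a zone with 1048576 managed pages (4GB of 4KB pages) and a min watermark of 4096 pages, zone_wmarks(4096, 1048576, 10) gives low/high of 5144/6192 pages, while a factor of 500 gives 56524/108952.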
I also put up a few other things you had asked for (vmstat, zoneinfo before and after the first OOM on 4.8.4):

http://0x.ca/sim/ref/4.8.4/

(that was before booting into a rebuild with [1] applied)

Simon-
Re: Hung task detector versus NFS (TASK_KILLABLE)
On Mon, Mar 07, 2016 at 07:11:19PM -0800, Andi Kleen wrote:
> > I write this because I would actually find it useful to see the original
> > backtrace, even if it is interruptible, not just the collateral damage.
> > Since the "skipping" of NFS is basically incomplete anyway, how big a
> > deal is this "feature"?
>
> Random backtrace spewing is always a misfeature for 99.99+% of the users
> for whom it is gibberish.

Distributions all seem to ship with it on because apparently some people can read it. There was even discussion that the default limit of 10 warnings (hung_task_warnings) is not enough.

> If you really need it yourself add a kprobe.

To emulate a hung task backtrace even when TASK_KILLABLE? That sounds like some hoop-jumping, but I don't know kprobes.

I'm just saying the current "NFS filter" is broken ("cat a" twice), but this really will make more noise for people (in cases where NFS is stuck for minutes), so I guess I'll just sit in a corner with that line changed in my tree.

Simon-
Hung task detector versus NFS (TASK_KILLABLE)
Hello!

Back in 2008, you committed 316d9679f33caf7e683471647d1472bfe133d858, which changed softlockup.c (now moved to hung_task.c) to avoid logging a spew of soft lockup warnings when the Ethernet cable is unplugged with active NFS mounts.

Meanwhile, I've been seeing hung task warnings like this for years, so I wondered what the deal is. It seems there are VFS paths that can enter uninterruptible sleep as a result of locks held by tasks in interruptible sleep. For example, I can reproduce hung task warnings by firewalling NFS, then running "cat a" twice: the second hangs in mutex_lock() from path_openat(), which then spews a hung task warning.

I write this because I would actually find it useful to see the original backtrace, even if it is interruptible, not just the collateral damage. Since the "skipping" of NFS is basically incomplete anyway, how big a deal is this "feature"? Would anybody object if we just went back to checking anything blocked?

The lines in question these days are here in kernel/hung_task.c:

	/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
	if (t->state == TASK_UNINTERRUPTIBLE)
		check_hung_task(t, timeout);

It used to be "t->state & TASK_UNINTERRUPTIBLE".

Simon-
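To make the difference between the two checks concrete: TASK_KILLABLE is TASK_UNINTERRUPTIBLE with the TASK_WAKEKILL bit added, so the exact "==" comparison skips killable sleepers while the old "&" test catches them. A minimal user-space C sketch; the bit values mirror include/linux/sched.h of roughly this era but are assumptions here, and checked_exact()/checked_mask() are hypothetical names:

```c
#include <stdbool.h>

/* Assumed task state bits (only their relationship matters here). */
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_WAKEKILL		0x0080
#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Current check: exact match, so TASK_KILLABLE sleepers are skipped. */
static bool checked_exact(long state)
{
	return state == TASK_UNINTERRUPTIBLE;
}

/*
 * Check before commit 316d9679: any task with the uninterruptible bit
 * set, which includes TASK_KILLABLE sleepers such as NFS waiters.
 */
static bool checked_mask(long state)
{
	return (state & TASK_UNINTERRUPTIBLE) != 0;
}
```

checked_exact(TASK_KILLABLE) is false while checked_mask(TASK_KILLABLE) is true; both agree on a plain TASK_UNINTERRUPTIBLE sleeper.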
Re: Dirty pages underflow on 3.14.23
On Wed, Jan 07, 2015 at 10:48:10PM +0100, Vlastimil Babka wrote:
> On 01/07/2015 10:28 PM, Simon Kirby wrote:
>
> > Hmm...A possibly-related issue...Before trying this, after a fresh boot,
> > /proc/vmstat showed:
> >
> > nr_alloc_batch 4294541205
>
> This can happen, and not be a problem in general. However, there was a fix
> abe5f972912d086c080be4bde67750630b6fb38b in 3.17 for a potential performance
> issue if this counter overflows on single processor configuration. It was
> marked stable, but the 3.16 series was discontinued before the fix could be
> backported.
> So if you are on single-core, you might hit the performance issue.

That particular commit seems to just change the code path in that case, but should it be underflowing at all on UP?

> > Still, nr_alloc_batch reads as 4294254379 after MySQL restart, and now
> > seems to stay up there.
>
> Hm if it stays there, then you are probably hitting the performance issue.
> Look at /proc/zoneinfo, which zone has the underflow. It means this zone will
> get unfair amount of allocations, while others may contain stale data and
> would be better candidates.

In this case, the box has only 640MB, and there's only DMA and Normal. This is affecting Normal, and DMA is so small that it probably doesn't matter.

Simon-
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
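Values like nr_alloc_batch 4294541205 are easier to read as signed numbers: on a 32-bit kernel the counter is 32 bits wide, so anything at or above 2^31 is most likely a wrapped negative value. A minimal C sketch, assuming the raw value came from a 32-bit counter; counter32_as_signed() is a hypothetical helper:

```c
#include <stdint.h>

/*
 * Reinterpret a raw value read from a 32-bit kernel counter as signed.
 * Assumes raw < 2^32; values >= 2^31 are treated as wrapped negatives.
 */
static int64_t counter32_as_signed(uint64_t raw)
{
	if (raw >= (UINT64_C(1) << 31))
		return (int64_t)raw - (INT64_C(1) << 32);
	return (int64_t)raw;
}
```

With this, 4294541205 reads as -426091 and 4294254379 as -712917, while a sane value like 161 is returned unchanged.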
Re: Dirty pages underflow on 3.14.23
On Wed, Jan 07, 2015 at 10:57:46AM +, Holger Hoffstätte wrote:
> On Tue, 06 Jan 2015 12:54:43 -0500, Mikulas Patocka wrote:
>
> > I can't reproduce it. It happened just once.
> >
> > That patch is supposed to fix an occasional underflow by a single page -
> > while my meminfo showed underflow by 22952KiB (5738 pages).
>
> You are probably looking for:
> commit 835f252c6debd204fcd607c79975089b1ecd3472
> "aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer"
>
> It definitely went into 3.14.26, don't know about 3.16.x.

I can confirm that a MySQL shutdown/restart triggers it for me, even immediately following a fresh boot:

# uname -a ; grep '^nr_dirty ' /proc/vmstat; /etc/init.d/mysql restart; \
  grep '^nr_dirty ' /proc/vmstat
Linux blue 3.16.6-blue #51 Mon Oct 20 14:00:47 PDT 2014 i686 GNU/Linux
nr_dirty 13
[ ok ] Stopping MySQL database server: mysqld.
[ ok ] Starting MySQL database server: mysqld . ..
[info] Checking for tables which need an upgrade, are corrupt or were not closed cleanly..
nr_dirty 4294967245

Hmm...A possibly-related issue...Before trying this, after a fresh boot, /proc/vmstat showed:

nr_alloc_batch 4294541205

and after the restart, it shows:

nr_alloc_batch 161

...anyway, git cherry-pick ce4b66be6cd964e84363afd4a603633dd061b3b8 on a 3.16.6 tree does seem to stop nr_dirty from underflowing...Yay!

Still, nr_alloc_batch reads as 4294254379 after MySQL restart, and now seems to stay up there.

Simon-
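A quick way to spot the same wraparound in other counters is to scan /proc/vmstat for values between 2^31 and 2^32. A minimal C sketch of the per-line check, assuming the "name value" format shown above and a 32-bit kernel; vmstat_line_wrapped() is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse one "name value" line as printed by /proc/vmstat and report
 * whether the value looks like a 32-bit counter that went negative.
 * 'name' must have room for 64 bytes.
 */
static bool vmstat_line_wrapped(const char *line, char name[64],
				unsigned long long *value)
{
	if (sscanf(line, "%63s %llu", name, value) != 2)
		return false;
	return *value >= (1ULL << 31) && *value < (1ULL << 32);
}
```

Fed "nr_dirty 4294967245" it returns true (the counter is really -51 pages); fed "nr_dirty 13" it returns false. A caller would loop over the file with fgets().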
Re: Dirty pages underflow on 3.14.23
On Mon, Jan 05, 2015 at 06:05:59PM -0500, Mikulas Patocka wrote:
> Hi
>
> I would like to report a memory management bug where the dirty pages count
> underflowed.

Hello!

I've been hitting this problem for a while now. I've seen it on:

3.12.9
3.14.4
3.16
3.16.6

When it occurs, /proc/vmstat shows nr_dirty values such as:

nr_dirty 4294967031 (3.12.9)
nr_dirty 4294967251 (3.16.6)

No other counters appear to be negative or have wrapped in 32 bits, and /proc/meminfo is similar to your report. See proc file copies and .config here:

http://0x.ca/sim/ref/3.16.6-blue/ (hosting box is this one)

> It happened after some time that the Dirty pages count underflowed, as can
> be seen in /proc/meminfo. The underflow condition was persistent,
> /proc/meminfo was showing the big value even when the system was
> completely idle. The counter never returned to zero.
>
> The system didn't crash, but it became very slow - because of the big
> value in the "Dirty" field, lazy writing was not working anymore, any
> process that created a dirty page triggered immediate writeback, which
> slowed down the system very much. The only fix was to reboot the machine.

This is also the case for me, although each time it occurs it seems to be when I'm running apt-get upgrade to apply updates. Today, it occurred on 3.16.6 as I started an "apt-get update".

It is still possible to dirty new pages and make some progress, but it becomes unusably slow. It ends up writing the same blocks forever (from blktrace | grep D):

 33,00 2776 1.220890482 20335 D W 43765671 + 8 [kworker/u2:0]
 33,00 2783 1.221073198 20335 D W 7439223 + 8 [kworker/u2:0]
 33,00 2791 1.224824452 20335 D W 43765671 + 8 [kworker/u2:0]
 33,00 2800 1.232559686 20335 D W 7439223 + 8 [kworker/u2:0]

> The kernel version where this happened is 3.14.23. The kernel is compiled
> without SMP and with preemption. The system is single-core 32-bit x86.

Same.
The only other oddity to note is that the IDE driver is still enabled in my case; root is on /dev/md6, which is a RAID 1 of hde1 and hdg1.

> I see that 3.14.24 contains some fix for underflow (commit
> 6619741f17f541113a02c30f22a9ca22e32c9546, upstream commit
> abe5f972912d086c080be4bde67750630b6fb38b), but it doesn't seem that that
> commit fixes this condition. If you have a commit that could fix this, say
> it.

That doesn't seem to have made it to 3.16.6, but it sounds like a fairness thing more than a race fix.

Vlastimil pointed at this as possibly useful:

http://ozlabs.org/~akpm/mmots/broken-out/mm-protect-set_page_dirty-from-ongoing-truncation.patch

...but I can't reproduce this immediately. So far, I have to forget about it for a while, then do an apt-get upgrade.

> MemTotal: 253504 kB

MemTotal: 639396 kB

640MB should be enough for anybody. :)

Hmm, just tried to shut it down as cleanly as possible with sysrq-s, sysrq-u, and got:

SysRq : Emergency Sync
Emergency Sync complete
SysRq : Emergency Remount R/O
[ cut here ]
WARNING: CPU: 0 PID: 24535 at fs/ext3/inode.c:1590 ext3_ordered_writepage+0x7c/0x240()
Modules linked in: xt_recent ts_kmp xt_string nfnetlink_log e100 xt_hashlimit xt_state xt_REDIRECT nf_conntrack_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack [last unloaded: xt_recent]
CPU: 0 PID: 24535 Comm: kworker/u2:0 Not tainted 3.16.6-blue #51
Hardware name: MICRO-STAR INTERNATIONAL CO., LTD MS-6330/MS-6330, BIOS 6.00 PG 06/15/2001
Workqueue: writeback bdi_writeback_workfn (flush-9:6)
 dd3f5c70 c16091f1 dd3f5ca0 c103231b c17bf9ac 5fd7 c17e8a3b 0636
 c115db7c c115db7c e7c3d140 e4d673d0 dd3f5cb0 c103235d 0009 dd3f5cd8
 c115db7c dd3f5cc8 c10fe13c
Call Trace:
 [] dump_stack+0x16/0x18
 [] warn_slowpath_common+0x7b/0xa0
 [] ? ext3_ordered_writepage+0x7c/0x240
 [] ? ext3_ordered_writepage+0x7c/0x240
 [] warn_slowpath_null+0x1d/0x20
 [] ext3_ordered_writepage+0x7c/0x240
 [] ? __set_page_dirty_buffers+0xc/0x90
 [] __writepage+0xb/0x30
 [] ? mapping_tagged+0x10/0x10
 [] write_cache_pages+0x161/0x3a0
 [] ? blk_finish_plug+0xd/0x30
 [] ? mapping_tagged+0x10/0x10
 [] generic_writepages+0x2f/0x60
 [] do_writepages+0x35/0x40
 [] __writeback_single_inode+0x3b/0x1e0
 [] writeback_sb_inodes+0x160/0x2e0
 [] __writeback_inodes_wb+0x6c/0xa0
 [] wb_writeback+0x1a2/0x240
 [] bdi_writeback_workfn+0x149/0x370
 [] process_one_work+0xef/0x310
 [] worker_thread+0xe8/0x410
 [] ? mod_delayed_work_on+0x60/0x60
 [] ? mod_delayed_work_on+0x60/0x60
 [] kthread+0x95/0xb0
 [] ret_from_kernel_thread+0x20/0x30
 [] ? __kthread_parkme+0x60/0x60
---[ end trace ca1dc42be1a0b8e5 ]---
EXT4-fs (md7): re-mounted. Opts: (null)
EXT4-fs (md2): re-mounted. Opts: (null)
Emergency Remount complete
EXT4-fs (md2): ext4_writepages: jbd2_start: 1024 pages, ino 9438915; err -30
EXT4-fs (md2): ext4_writepages: jbd2_start: 1024 pages, ino 9438915; err -30
Re: net_ns cleanup / RCU overhead
On Thu, Aug 28, 2014 at 01:46:58PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 28, 2014 at 03:33:42PM -0500, Eric W. Biederman wrote:
> >
> > I just want to add a little bit more analysis to this.
> >
> > What we desire to be fast is the copy_net_ns, cleanup_net is batched and
> > asynchronous which nothing really cares how long it takes except that
> > cleanup_net holds the net_mutex and thus blocks copy_net_ns.
> >
> > The puzzle is why and which rcu delays Simon is seeing in the network
> > namespace cleanup path, as it seems like the synchronize_rcu is not
> > the only one, and in the case of vsftp with trivial network namespaces
> > where nothing has been done we should not need to delay.
>
> Indeed, given the version and .config, I can't see why any individual
> RCU grace-period operation would be particularly slow.
>
> I suggest using ftrace on synchronize_rcu() and friends.

I made a parallel net namespace create/destroy benchmark that prints the progress and time to create and clean up 32 unshare()d child processes:

http://0x.ca/sim/ref/tools/netnsbench.c

I noticed that if I haven't run it for a while, the first batch often is fast, followed by slowness from then on:

0.039478s
-++-++-- 4.463837s
+--+++-- 3.011882s
+++---+- 2.283993s

Fiddling around on a stock kernel, "echo 1 > /sys/kernel/rcu_expedited" makes the behaviour change as it did with my patch:

++-++-+++-+-+-+-++-+-++--++-+--+-+-++--++-+-+-+-++-+--++ 0.801406s
+-+-+-++-+-+-+-+-++--+-+-++-+--++-+-+-+-+-+-+-+-+-+-+-+--++-+--- 0.872011s
++--+-++--+-++--+-++--+-+-+-+-++-+--++--+-++-+-+-+-+--++-+-+-+-- 0.946745s

How would I use ftrace on synchronize_rcu() here?

As Eric said, cleanup_net() is batched, but while it is cleaning up, net_mutex is held. Isn't the issue just that net_mutex is held while some other things are going on that are meant to be lazy / batched?

What is net_mutex protecting in cleanup_net()?
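On a kernel built with CONFIG_FUNCTION_GRAPH_TRACER, one way to follow Paul's suggestion is to restrict the function_graph tracer to synchronize_rcu() through set_graph_function, run netnsbench, then read the "trace" file for per-call durations. A minimal C sketch of that setup; the tracing directory (normally /sys/kernel/debug/tracing when debugfs is mounted there) is passed in as a parameter, and trace_write()/trace_synchronize_rcu() are hypothetical helpers. The kworker stacks point at synchronize_sched() too, so that name may be worth swapping in:

```c
#include <stdio.h>

/*
 * Write one value into a tracing control file under 'dir'.
 * Returns 0 on success, -1 on any error.
 */
static int trace_write(const char *dir, const char *file, const char *val)
{
	char path[512];
	FILE *f;

	if (snprintf(path, sizeof(path), "%s/%s", dir, file) >= (int)sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Limit the function_graph tracer to synchronize_rcu() and its callees,
 * then switch tracing on. Results end up in <dir>/trace.
 */
static int trace_synchronize_rcu(const char *dir)
{
	if (trace_write(dir, "set_graph_function", "synchronize_rcu\n") ||
	    trace_write(dir, "current_tracer", "function_graph\n") ||
	    trace_write(dir, "tracing_on", "1\n"))
		return -1;
	return 0;
}
```

Writing "nop" to current_tracer afterwards turns the graph tracer off again.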
I noticed that [kworker/u16:0]'s stack is often:

[] wait_rcu_gp+0x46/0x50
[] synchronize_sched+0x2e/0x50
[] nf_nat_net_exit+0x2c/0x50 [nf_nat]
[] ops_exit_list.isra.4+0x39/0x60
[] cleanup_net+0xf0/0x1a0
[] process_one_work+0x157/0x440
[] worker_thread+0x63/0x520
[] kthread+0xd6/0xf0
[] ret_from_fork+0x7c/0xb0
[] 0x

and:

[] _rcu_barrier+0x154/0x1f0
[] rcu_barrier+0x10/0x20
[] kmem_cache_destroy+0x6c/0xb0
[] nf_conntrack_cleanup_net_list+0x167/0x1c0 [nf_conntrack]
[] nf_conntrack_pernet_exit+0x65/0x70 [nf_conntrack]
[] ops_exit_list.isra.4+0x53/0x60
[] cleanup_net+0xf0/0x1a0
[] process_one_work+0x157/0x440
[] worker_thread+0x63/0x520
[] kthread+0xd6/0xf0
[] ret_from_fork+0x7c/0xb0
[] 0x

So I tried flushing iptables rules and rmmoding netfilter bits:

-++++--- 0.179940s
++--+-+- 0.151988s
---+--+++--- 0.159967s
++--++-- 0.175964s

Expedited:

++-+--++-+-+-+-+-+-+--++-+-+-++-+-+-+--++-+-+-+-+-+-+-+-+-+-+--- 0.079988s
++-+-+-+-+-+-+-+-+-+-+-+--++-+--++-+--+-++-+-+--++-+-+-+-+-+-+-- 0.089347s
--+++--++--+-+++-+--++-+-+--++-+-+--++-- 0.081566s
+-+++---++-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+--- 0.089026s

So, much faster. It seems that just loading nf_conntrack_ipv4 (e.g. by running iptables -t nat -nvL) is enough to slow it way down. But it is still capable of being fast, as above.

Simon-
Re: net_ns cleanup / RCU overhead
On Thu, Aug 28, 2014 at 12:24:31PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 19, 2014 at 10:58:55PM -0700, Simon Kirby wrote:
> > Hello!
> >
> > In trying to figure out what happened to a box running lots of vsftpd
> > since we deployed a CONFIG_NET_NS=y kernel to it, we found that the
> > (wall) time needed for cleanup_net() to complete, even on an idle box,
> > can be quite long:
> >
> > #!/bin/bash
> >
> > ip netns delete test >&/dev/null
> > while ip netns add test; do
> >     echo hi
> >     ip netns delete test
> > done
> >
> > On my desktop and typical hosts, this prints at only around 4 or 6 per
> > second. While this is happening, "vmstat 1" reports 100% idle, and there
> > are D-state processes with stacks similar to:
> >
> > 30566 [kworker/u16:1] D wait_rcu_gp+0x48, synchronize_sched+0x2f,
> > cleanup_net+0xdb, process_one_work+0x175, worker_thread+0x119,
> > kthread+0xbb, ret_from_fork+0x7c, 0x
> >
> > 32220 ip D copy_net_ns+0x68, create_new_namespaces+0xfc,
> > unshare_nsproxy_namespaces+0x66, SyS_unshare+0x159,
> > system_call_fastpath+0x16, 0x
> >
> > copy_net_ns() is waiting on net_mutex which is held by cleanup_net().
> >
> > vsftpd uses CLONE_NEWNET to set up privsep processes. There is a comment
> > about it being really slow before 2.6.35 (it avoids CLONE_NEWNET in that
> > case). I didn't find anything that makes 2.6.35 any faster, but on Debian
> > 2.6.36-5-amd64, I notice it does seem to be a bit faster than 3.2, 3.10,
> > 3.16, though still not anything I'd ever want to rely on per connection.
> >
> > C implementation of the above: http://0x.ca/sim/ref/tools/netnsloop.c
> >
> > Kernel stack "top": http://0x.ca/sim/ref/tools/pstack
> >
> > What's going on here?
>
> That is a bit slow for many configurations, but there are some exceptions.
>
> So, what is your kernel's .config?
I was unable to find a config (or stock kernel) that was any different, but here's the one we're using:

http://0x.ca/sim/ref/3.10/config-3.10.53

How fast does the above test run for you?

We've been running with the attached, which has helped a little, but it's still quite slow in our particular use case (vsftpd), and with the above test. Should I enable RCU_TRACE or STALL_INFO with a low timeout or something?

Simon-

-- >8 --
Subject: [PATCH] netns: use synchronize_rcu_expedited instead of synchronize_rcu

Similar to ef323088, with synchronize_rcu(), we are only able to create and destroy about 4 or 7 net namespaces per second, which really puts a dent in the performance of programs attempting to use CLONE_NEWNET for privilege separation (vsftpd, chromium).
---
 net/core/net_namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 85b6269..6dcb4b3 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -296,7 +296,7 @@ static void cleanup_net(struct work_struct *work)
 	 * This needs to be before calling the exit() notifiers, so
 	 * the rcu_barrier() below isn't sufficient alone.
 	 */
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	/* Run all of the network namespace exit methods */
 	list_for_each_entry_reverse(ops, &pernet_list, list)
--
1.7.10.4
net_ns cleanup / RCU overhead
Hello!

In trying to figure out what happened to a box running lots of vsftpd since we deployed a CONFIG_NET_NS=y kernel to it, we found that the (wall) time needed for cleanup_net() to complete, even on an idle box, can be quite long:

#!/bin/bash

ip netns delete test >&/dev/null
while ip netns add test; do
    echo hi
    ip netns delete test
done

On my desktop and typical hosts, this prints at only around 4 or 6 per second. While this is happening, "vmstat 1" reports 100% idle, and there are D-state processes with stacks similar to:

30566 [kworker/u16:1] D wait_rcu_gp+0x48, synchronize_sched+0x2f, cleanup_net+0xdb, process_one_work+0x175, worker_thread+0x119, kthread+0xbb, ret_from_fork+0x7c, 0x

32220 ip D copy_net_ns+0x68, create_new_namespaces+0xfc, unshare_nsproxy_namespaces+0x66, SyS_unshare+0x159, system_call_fastpath+0x16, 0x

copy_net_ns() is waiting on net_mutex, which is held by cleanup_net().

vsftpd uses CLONE_NEWNET to set up privsep processes. There is a comment about it being really slow before 2.6.35 (it avoids CLONE_NEWNET in that case). I didn't find anything that makes 2.6.35 any faster, but on Debian 2.6.36-5-amd64, I notice it does seem to be a bit faster than 3.2, 3.10, 3.16, though still not anything I'd ever want to rely on per connection.

C implementation of the above: http://0x.ca/sim/ref/tools/netnsloop.c

Kernel stack "top": http://0x.ca/sim/ref/tools/pstack

What's going on here?

Simon-
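A minimal C sketch of the same create/destroy pattern without ip(8), in the spirit of netnsloop.c but not that program: each short-lived child calls unshare(CLONE_NEWNET) and exits, which drops the namespace and queues it for cleanup_net(), so the next unshare() can end up waiting on net_mutex. It needs CAP_SYS_ADMIN; netns_cycles() is a hypothetical name, and timing it with clock_gettime() around the call is left to the caller:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create and drop 'n' network namespaces, one per child, sequentially.
 * Returns n on success, or -1 if fork() fails or unshare() fails in a
 * child (e.g. without CAP_SYS_ADMIN).
 */
static int netns_cycles(int n)
{
	for (int i = 0; i < n; i++) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			/* The namespace is released when the child exits. */
			_exit(unshare(CLONE_NEWNET) == 0 ? 0 : 1);
		if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			return -1;
	}
	return n;
}
```

Run as root, netns_cycles(20) taking several seconds on an otherwise idle box would match the 4-6 per second seen with the shell loop.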
Re: [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races
On Wed, Dec 04, 2013 at 01:14:56PM -0800, Linus Torvalds wrote:
> The lock we're moving up isn't the lock that actually protects the
> whole allocation logic (it's the lock that then protects the pipe
> contents when a pipe is *used*). So it's a useless lock, and moving it
> up is a good idea regardless (because it makes the locks only protect
> the parts they are actually *supposed* to protect.
>
> And while extraneous lock wouldn't normally hurt, the sleeping locks
> (both mutexes and semaphores) aren't actually safe wrt de-allocation -
> they protect anything *inside* the lock, but the lock data structure
> itself is accessed racily wrt other lockers (in a way that still
> leaves the locked region protected, but not the lock itself). If you
> care about details, you can walk through my example.

Yes, this makes sense now. It was spin_unlock_mutex() on the pipe lock that itself was already freed and poisoned by another CPU. This explicit poison check also fires:

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index bf156de..ae425d0 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -159,6 +159,7 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 		__ticket_unlock_slowpath(lock, prev);
 	} else
 		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
+	WARN_ON(*(unsigned int *)&lock->tickets.head == 0x6b6b6b6c);
 }
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)

It warns only as often as the poison checking already did, with a stack of warn_*, __mutex_unlock_slowpath(), mutex_unlock(), pipe_release().

Trying to prove a negative, of course, but I tested with your first fix overnight and got no errors. Current git (with b0d8d2292160bb63de) also looks good. I will leave it running for a few days.

Thanks for getting stuck on this one. It was educational, at least!
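Why 0x6b6b6b6c in particular: freed slab objects are filled with the POISON_FREE byte 0x6b, and the unlock path adds one to the 16-bit ticket head. A minimal C sketch simulating that late unlock on poisoned memory, assuming a little-endian machine where head is the low half of the 32-bit ticket word, as on x86; poisoned_unlock_word() is a hypothetical helper:

```c
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b	/* slab poison byte for freed objects */

/*
 * Simulate an unlock that lands on a freed, poisoned ticket lock:
 * fill the 32-bit ticket word with poison, then increment the 16-bit
 * head (assumed to be the low half on little-endian).
 */
static uint32_t poisoned_unlock_word(void)
{
	uint32_t word;
	uint16_t head;

	memset(&word, POISON_FREE, sizeof(word));	/* 0x6b6b6b6b */
	memcpy(&head, &word, sizeof(head));
	head += 1;					/* TICKET_LOCK_INC */
	memcpy(&word, &head, sizeof(head));
	return word;
}
```

It returns 0x6b6b6b6c, the value the WARN_ON in the spinlock.h hunk looks for.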
Simon-
Re: [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races
On Tue, Dec 03, 2013 at 09:52:33AM +0100, Ingo Molnar wrote:
> Indeed: this comes from mutex->count being separate from
> mutex->wait_lock, and this should affect every architecture that has a
> mutex->count fast-path implemented (essentially every architecture
> that matters).
>
> Such bugs should also magically go away with mutex debugging enabled.

Confirmed: I ran the reproducer with CONFIG_DEBUG_MUTEXES for a few hours, and never got a single poison overwritten notice.

> I'd expect such bugs to be more prominent with unlucky object
> size/alignment: if mutex->count lies on a separate cache line from
> mutex->wait_lock.
>
> Side note: this might be a valid light weight debugging technique, we
> could add padding between the two fields to force them into separate
> cache lines, without slowing it down.
>
> Simon, would you be willing to try the fairly trivial patch below?
> Please enable CONFIG_DEBUG_MUTEX_FASTPATH=y. Does your kernel fail
> faster that way?

I didn't see much of a change, other than that the incremented poison byte is now further into the object due to the padding, and it shows up in kmalloc-256.

I also tried with Linus' udelay() suggestion, below. With this, there were many occurrences per second.
Simon-

diff --git a/kernel/mutex.c b/kernel/mutex.c
index d24105b..f65e735 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * In the DEBUG case we are using the "NULL fastpath" for mutexes,
@@ -740,6 +741,11 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
 		wake_up_process(waiter->task);
 	}
 
+	/* udelay a bit if the spinlock isn't contended */
+	if (lock->wait_lock.rlock.raw_lock.tickets.head + 1 ==
+	    lock->wait_lock.rlock.raw_lock.tickets.tail)
+		udelay(1);
+
 	spin_unlock_mutex(&lock->wait_lock, flags);
 }
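Ingo's padding idea can be illustrated outside the kernel. A minimal C11 sketch, not his patch: _Alignas pushes the second field onto its own cache line, assuming 64-byte lines; struct padded_mutex_like and on_separate_lines() are hypothetical stand-ins for the mutex->count / mutex->wait_lock layout:

```c
#include <stddef.h>

#define CACHELINE 64	/* assumed cache line size in bytes */

/* Hypothetical stand-in for the two mutex fields, not struct mutex. */
struct padded_mutex_like {
	int count;				/* fast-path counter */
	_Alignas(CACHELINE) int wait_lock;	/* forced onto the next line */
};

/* True if byte offsets a and b fall in different cache lines. */
static int on_separate_lines(size_t a, size_t b)
{
	return a / CACHELINE != b / CACHELINE;
}
```

With this layout count sits at offset 0 and wait_lock at offset 64, so on_separate_lines() on the two offsets is true, and the struct grows to 128 bytes.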
Re: [PATCH] mutexes: Add CONFIG_DEBUG_MUTEX_FASTPATH=y debug variant to debug SMP races
On Tue, Dec 03, 2013 at 10:10:29AM -0800, Linus Torvalds wrote:
> On Tue, Dec 3, 2013 at 12:52 AM, Ingo Molnar wrote:
> >
> > I'd expect such bugs to be more prominent with unlucky object
> > size/alignment: if mutex->count lies on a separate cache line from
> > mutex->wait_lock.
>
> I doubt that makes much of a difference. It's still just "CPU cycles"
> away, and the window will be tiny unless you have multi-socket
> machines and/or are just very unlucky.
>
> For stress-testing, it would be much better to use some hack like
>
>    /* udelay a bit if the spinlock isn't contended */
>    if (mutex->wait_lock.ticket.head+1 == mutex->wait_lock.ticket.tail)
>        udelay(1);
>
> in __mutex_unlock_common() just before the spin_unlock(). Make the
> window really *big*.

I haven't had a chance yet to do much testing of the proposed race fix and race-enlarging patches, but I did write a tool to reproduce the race. I started it running and left for dinner, and sure enough, it actually seems to work on plain 3.12 on a Dell PowerEdge R410 with dual E5520s, reproducing "Poison overwritten" at a rate of about once every 15 minutes (running 6 in parallel, booted with "slub_debug").

I have no idea if actually relying on TSC alignment between cores and sockets is a reasonable idea these days, but it seems to work. I first used a read() on a pipe close()d by the other process to synchronize them, but this didn't seem to work as well as busy-waiting until the timestamp counters pass a previously-decided-upon start time.

Meanwhile, I still don't understand how moving the unlock _up_ to cover less of the code can solve the race, but I will stare at your long explanation more tomorrow.
Simon-

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/wait.h>

#define MAX_PIPES	450
#define FORK_CYCLES	200
#define CLOSE_CYCLES	10
#define STAT_SHIFT	6

static inline unsigned long readtsc(void)
{
	unsigned int low, high;

	asm volatile("rdtsc" : "=a" (low), "=d" (high));
	return low | ((unsigned long)(high) << 32);
}

static int pipefd[MAX_PIPES][2];

int main(int argc, char *argv[])
{
	unsigned long loop, race_start, miss;
	unsigned long misses = 0, races = 0;
	int i;
	pid_t pid;
	struct rlimit rlim = {
		.rlim_cur = MAX_PIPES * 2 + 96,
		.rlim_max = MAX_PIPES * 2 + 96,
	};

	if (setrlimit(RLIMIT_NOFILE, &rlim) != 0)
		perror("setrlimit(RLIMIT_NOFILE)");

	for (loop = 1;;loop++) {
		/*
		 * Make a bunch of pipes
		 */
		for (i = 0;i < MAX_PIPES;i++) {
			if (pipe(pipefd[i]) == -1) {
				perror("pipe()");
				exit(EXIT_FAILURE);
			}
		}

		race_start = readtsc() + FORK_CYCLES;

		asm("":::"memory");

		pid = fork();
		if (pid == -1) {
			perror("fork()");
			exit(EXIT_FAILURE);
		}
		pid = !!pid;

		/*
		 * Close one pipe descriptor per process
		 */
		for (i = 0;i < MAX_PIPES;i++)
			close(pipefd[i][!pid]);

		for (i = 0;i < MAX_PIPES;i++) {
			/*
			 * Line up and try to close at the same time
			 */
			miss = 1;
			while (readtsc() < race_start)
				miss = 0;

			close(pipefd[i][pid]);

			misses+= miss;

			race_start+= CLOSE_CYCLES;
		}

		races+= MAX_PIPES;

		if (!(loop & (~(~0UL << STAT_SHIFT))))
			fprintf(stderr, "%c %lu (%.2f%% false starts)\n",
				"CP"[pid], readtsc(), misses * 100. / races);

		if (pid)
			wait(NULL);	/* Parent */
		else
			exit(0);	/* Child */
	}
}
Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()
On Sat, Nov 30, 2013 at 09:25:33AM -0800, Linus Torvalds wrote:
> On Sat, Nov 30, 2013 at 1:43 AM, Simon Kirby wrote:
> >
> > I turned on kmalloc-192 tracing to find what else is using it: struct
> > nfs_fh, struct bio, and struct cred. Poking around those, struct bio has
> > bi_cnt, but it is way down in the struct. struct cred has "usage", but it
> > comes first. Hmm. Nevertheless, I set:
> >
> > CONFIG_DEBUG_MUTEXES=y
> > CONFIG_DEBUG_LIST=y
> > CONFIG_DEBUG_CREDENTIALS=y
> >
> > And tried:
> >
> > diff --git a/include/linux/cred.h b/include/linux/cred.h
> > index 04421e8..2646fe9 100644
> > --- a/include/linux/cred.h
> > +++ b/include/linux/cred.h
> > @@ -205,7 +205,9 @@ static inline void validate_process_creds(void)
> >   */
> >  static inline struct cred *get_new_cred(struct cred *cred)
> >  {
> > -	atomic_inc(&cred->usage);
> > +	if (atomic_inc_return(&cred->usage) == 0x6c) {
> > +		WARN_ON(cred->uid == 0x6b);
>
> Oh, damn, I thought you had found it, and got very excited and already
> wrote a long email about things I wanted you to try. And then I
> started looking closer...
>
> That test is wrong. Both of those fields are 32-bit, so testing them
> against 0x6b/0x6c is bogus: you're just catching real cases. The
> reason it catches omreport is presumably because omreport runs as some
> special user that happens to have uid 107 (on my machine that happens
> to be qemu). And having a usage count of 108 isn't particularly
> strange either - creds get a lot of re-use.
>
> So close. It *might* still be one of those cases, but it doesn't
> really sound very likely. "bi_cnt" is deep inside the struct bio, and
> "usage" is at offset 0, not offset 4. And the ns_fh isn't very
> interesting.

*head smack* Too much 8-bit AVR coding... Makes sense now:

uid=107(nagios) gid=109(nagios) groups=109(nagios)

Well, the chances of atomic_inc intentionally incrementing 0x6b6b6b6b are probably pretty rare. I'll try that.
Simon-
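A minimal C sketch of the corrected check: both fields are 32 bits wide, so the signature of an increment on freed memory is the full poisoned word plus one, 0x6b6b6b6c, not 0x6c (which is just a live usage count of 108, just as 0x6b is uid 107, nagios). looks_like_poisoned_inc() is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

#define POISON_WORD 0x6b6b6b6bu	/* four POISON_FREE bytes */

/* True if a 32-bit counter, just incremented, was poisoned free memory. */
static bool looks_like_poisoned_inc(uint32_t after_inc)
{
	return after_inc == POISON_WORD + 1;
}
```

In get_new_cred() that would mean comparing the atomic_inc_return() result against 0x6b6b6b6c instead of 0x6c.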
Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()
On Tue, Nov 26, 2013 at 03:16:09PM -0800, Linus Torvalds wrote:
> On Mon, Nov 25, 2013 at 4:44 PM, Simon Kirby wrote:
> >
> > I was hoping this or something else by 3.12 would have fixed it, so after
> > testing we deployed this everywhere and turned off the rest of the debug
> > options. I missed slub_debug on one server, though...and it just hit
> > another case of overwritten poison.
>
> Your thing is *very* consistent, it's once more four bytes into that
> pipe-info. And it's once more that exact same "increment second word
> in the allocation" pattern.
>
> > Is it true that with slub_debug, aliasing of equal-sized objects is
> > turned off, and so they shouldn't be immediately side-by-side? In other
> > words, would there be similar scrawling victim chances as allocating
> > pipe_inode_info with pages instead of slabs? "slabinfo -a" is empty.
>
> So the thing is, with slub debugging, slub shouldn't be merging
> different slab caches.
>
> HOWEVER.
>
> The pipe-info structure isn't using its own slab cache, it's just
> using "kmalloc()". So it by definition will merge with all other
> kmalloc() allocations of the same size (or, to be exact, of "similar
> enough size to hit the same size bucket"). In your case it's the
> 192-byte-sized bucket.

I turned on kmalloc-192 tracing to find what else is using it: struct nfs_fh, struct bio, and struct cred. Poking around those, struct bio has bi_cnt, but it is way down in the struct. struct cred has "usage", but it comes first. Hmm. Nevertheless, I set:

CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_CREDENTIALS=y

And tried:

diff --git a/include/linux/bio.h b/include/linux/bio.h
index ec48bac..216dc43 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -168,7 +168,7 @@ static inline void *bio_data(struct bio *bio)
  * returns. and then bio would be freed memory when if (bio->bi_flags ...)
  * runs
  */
-#define bio_get(bio) atomic_inc(&(bio)->bi_cnt)
+#define bio_get(bio) WARN_ON(atomic_inc_return(&(bio)->bi_cnt) == 0x6c)
 
 #if defined(CONFIG_BLK_DEV_INTEGRITY)
 /*
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 04421e8..2646fe9 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -205,7 +205,9 @@ static inline void validate_process_creds(void)
  */
 static inline struct cred *get_new_cred(struct cred *cred)
 {
-	atomic_inc(&cred->usage);
+	if (atomic_inc_return(&cred->usage) == 0x6c) {
+		WARN_ON(cred->uid == 0x6b);
+	}
 	return cred;
 }
 

On the same server, this last hunk warned fairly quickly:

[ 850.303535] [ cut here ]
[ 850.312774] WARNING: CPU: 3 PID: 6169 at include/linux/cred.h:209 get_empty_filp+0x109/0x1b0()
[ 850.329974] Modules linked in: ipmi_devintf aoe ipmi_si bnx2 ipmi_msghandler evdev serio_raw
[ 850.346913] CPU: 3 PID: 6169 Comm: omreport Not tainted 3.12.0-hw-debug-mutexes+ #83
[ 850.362374] Hardware name: Dell Inc. PowerEdge 1950/0UR033, BIOS 2.0.1 10/30/2007
[ 850.377316]  0009 880428d0fd28 817f2407 88043fccf9e8
[ 850.392134]  880428d0fd68 8105a537 880428d0fd58
[ 850.406936]  880428d89e00 88042960f480 880428d0ff24 88042a19
[ 850.421746] Call Trace:
[ 850.426627]  [] dump_stack+0x46/0x58
[ 850.436888]  [] warn_slowpath_common+0x87/0xb0
[ 850.448878]  [] warn_slowpath_null+0x15/0x20
[ 850.460523]  [] get_empty_filp+0x109/0x1b0
[ 850.471818]  [] path_openat+0x43/0x660
[ 850.482426]  [] ? fcntl_setlk+0x5b/0x2d0
[ 850.493391]  [] do_filp_open+0x3e/0xa0
[ 850.504008]  [] ? mntput_no_expire+0x44/0x130
[ 850.515842]  [] ? __alloc_fd+0x42/0x110
[ 850.526630]  [] do_sys_open+0x13c/0x230
[ 850.537428]  [] compat_SyS_open+0x16/0x20
[ 850.548579]  [] sysenter_dispatch+0x7/0x25
[ 850.559888] ---[ end trace acdbea3e141dbaec ]---

All traces are the same, and all Comms are "omreport", which is from the Dell OpenManage tools blob, executed regularly for RAID monitoring. Running it directly does not seem to cause the warning.
kern.log shows it seems to warn every 20 minutes. No warnings from CONFIG_DEBUG_CREDENTIALS magic checking at all. Is there anything interesting about this tool? It is 32-bit. I can hook path_openat() and check for the cred contents there to print the path, if that would help. Simon- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
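Linus's point that unrelated kmalloc() users share a cache whenever their sizes land in "the same size bucket" can be modelled in a few lines. This is a simplified userspace sketch, assuming the generic size classes of a typical x86-64 SLUB kernel (8 through 256 with the extra 96 and 192 classes, then powers of two); `kmalloc_bucket()` is a hypothetical helper, not the kernel's own size-class code, which has more cases.

```c
#include <stddef.h>

/* Simplified model of the generic kmalloc size classes on a typical
 * x86-64 SLUB kernel: 96 and 192 sit between the powers of two. */
static const size_t kmalloc_sizes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Hypothetical helper: smallest size class that fits `size`.
 * Returns 0 for size 0 (kmalloc(0) hands back ZERO_SIZE_PTR rather than
 * an object) and for requests larger than this table models. */
static size_t kmalloc_bucket(size_t size)
{
    if (size == 0)
        return 0;
    for (size_t i = 0; i < sizeof(kmalloc_sizes) / sizeof(kmalloc_sizes[0]); i++) {
        if (size <= kmalloc_sizes[i])
            return kmalloc_sizes[i];
    }
    return 0;
}
```

Under this model any two structures whose sizes fall between 129 and 192 bytes end up in kmalloc-192, which is why a stale increment aimed at one type (a cred, a bio) could land in a freed pipe_inode_info, or the other way round.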
Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()
On Tue, Aug 20, 2013 at 12:51:11AM -0700, Ian Applegate wrote:

> Unfortunately no boxen with CONFIG_DEBUG_MUTEXES among them. I can
> enable on a few and should have some results within the day. These
> mainly serve (quite a bit of) HTTP/S cache traffic.
>
> On Tue, Aug 20, 2013 at 12:21 AM, Al Viro wrote:
> > On Tue, Aug 20, 2013 at 12:17:52AM -0700, Ian Applegate wrote:
> >> We are also seeing this or a similar issue. On a fairly widespread
> >> deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the
> >> order of 36 days (combined MTBF.)
> >
> > Do you have any boxen with CONFIG_DEBUG_MUTEXES among those? What
> > kind of setup do you have on those, BTW?

Hmm. So, we went through a few months of running with Linus' suggested
culprit-catching patch w/DEBUG_PAGE_ALLOC, but it never tripped. We also
ran with DEBUG_MUTEXES, but that never seemed to catch anything, either.

Ian, is it true that what you saw involved no btrfs? I was still guessing
this is related to btrfs, as we are only seeing this on boxes doing btrfs
rsync-snapshot backups. I don't know what else is interesting about our
workload there, since we're not doing anything exotic.

Meanwhile, with DEBUG_LIST on 3.12-rc, we found list corruption, which
Josef fixed in 93858769172c4e3678917810e9d5de360eb991cc. This missed
3.12, unfortunately, so I built a 3.12 with Josef's btrfs-next merged (to
54563d41a58be77e9bd9ef7af1ea4026cf0e7e07, which contained that fix).

I was hoping this or something else by 3.12 would have fixed it, so after
testing we deployed this everywhere and turned off the rest of the debug
options. I missed slub_debug on one server, though...and it just hit
another case of overwritten poison.

Is it true that with slub_debug, aliasing of equal-sized objects is
turned off, and so they shouldn't be immediately side-by-side? In other
words, would there be similar scrawling victim chances as allocating
pipe_inode_info with pages instead of slabs? "slabinfo -a" is empty.
[158037.526662] =
[158037.528014] BUG kmalloc-192 (Not tainted): Poison overwritten
[158037.528014] -
[158037.528014]
[158037.528014] Disabling lock debugging due to kernel taint
[158037.528014] INFO: 0x88013af3da6c-0x88013af3da6c. First byte 0x6c instead of 0x6b
[158037.528014] INFO: Allocated in alloc_pipe_info+0x1f/0xb0 age=22 cpu=3 pid=26402
[158037.528014] __slab_alloc.constprop.63+0x35b/0x3a0
[158037.528014] kmem_cache_alloc_trace+0xab/0x110
[158037.528014] alloc_pipe_info+0x1f/0xb0
[158037.528014] create_pipe_files+0x41/0x1f0
[158037.528014] __do_pipe_flags+0x3c/0xb0
[158037.528014] SyS_pipe2+0x1b/0xa0
[158037.528014] SyS_pipe+0xb/0x10
[158037.528014] system_call_fastpath+0x16/0x1b
[158037.528014] INFO: Freed in free_pipe_info+0x6a/0x70 age=39 cpu=1 pid=26402
[158037.528014] __slab_free+0x2d/0x2d4
[158037.528014] kfree+0xfd/0x130
[158037.528014] free_pipe_info+0x6a/0x70
[158037.528014] pipe_release+0x94/0xf0
[158037.528014] __fput+0xa7/0x230
[158037.528014] fput+0x9/0x10
[158037.528014] task_work_run+0x97/0xd0
[158037.528014] do_notify_resume+0x66/0x70
[158037.528014] int_signal+0x12/0x17
[158037.528014] INFO: Slab 0xea0004ebcf00 objects=31 used=29 fp=0x88013af3e080 flags=0x80004080
[158037.528014] INFO: Object 0x88013af3da68 @offset=6760 fp=0x88013af3ca28
[158037.528014]
[158037.528014] Bytes b4 88013af3da58: 97 b8 59 02 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ..Y.
[158037.528014] Object 88013af3da68: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b lkkk
[158037.528014] Object 88013af3da78: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3da88: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3da98: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3daa8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3dab8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3dac8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3dad8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3dae8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3daf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3db08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[158037.528014] Object 88013af3db18: 6b 6
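The list corruption mentioned above was caught by CONFIG_DEBUG_LIST, which checks before each insertion that the neighbouring nodes still point at each other. A minimal userspace sketch in the spirit of those checks; the struct and function names are hypothetical, and the kernel's real version warns and carries on rather than refusing the insert.

```c
#include <stdbool.h>
#include <stdio.h>

struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

/* Insert `node` between `prev` and `next`, but first verify that the two
 * neighbours still agree with each other. A stray write to either link
 * makes them disagree, so the corruption is reported at the next insert
 * instead of surfacing later as a crash somewhere else. */
static bool checked_list_add(struct list_node *node,
                             struct list_node *prev,
                             struct list_node *next)
{
    if (next->prev != prev) {
        fprintf(stderr, "list_add corruption: next->prev should be %p, was %p\n",
                (void *)prev, (void *)next->prev);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr, "list_add corruption: prev->next should be %p, was %p\n",
                (void *)next, (void *)prev->next);
        return false;
    }
    next->prev = node;
    node->next = next;
    node->prev = prev;
    prev->next = node;
    return true;
}
```

Adding at the head is `checked_list_add(node, head, head->next)`; if something has scribbled on `head->next->prev` in the meantime, the check fails before the list is damaged further.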
Re: [3.12-rc] sg_open: leaving the kernel with locks still held!
On Wed, Oct 23, 2013 at 10:10:47AM -0400, Douglas Gilbert wrote:

> On 13-10-23 03:44 AM, James Bottomley wrote:
> >On Tue, 2013-10-22 at 20:41 -0400, Douglas Gilbert wrote:
> >>On 13-10-22 04:56 PM, Simon Kirby wrote:
> >>>Hello!
> >>>
> >>>While trying to figure out why the request queue to sda (ext4) was
> >>>clogging up on one of our btrfs backup boxes, I noticed a megarc process
> >>>in D state, so enabled locking debugging, and got this (on 3.12-rc6):
> >>>
> >>>[ 205.372823]
> >>>[ 205.372901] [ BUG: lock held when returning to user space! ]
> >>>[ 205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted
> >>>[ 205.373055]
> >>>[ 205.373132] megarc.bin/5283 is leaving the kernel with locks still held!
> >>>[ 205.373212] 1 lock held by megarc.bin/5283:
> >>>[ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: []
> >>>sg_open+0x3a0/0x4d0
> >>>
> >>>Vaughan, it seems you touched this area last in 15b06f9a02406e, and git
> >>>tag --contains says this went in for 3.12-rc. We didn't see this on 3.11,
> >>>though I haven't tried with lockdep.
> >>>
> >>>This is caused by some of our internal RAID monitoring scripts that run
> >>>"megarc.bin -dispCfg -a0" (even though that controller isn't present on
> >>>this server -- a PowerEdge 2950 w/Perc 5).
> >>>
> >>>strace output of the program execution that causes the above message is
> >>>here: http://0x.ca/sim/ref/3.12-rc6/megarc_strace.txt
> >>
> >>This has been reported. That patch will be reverted or,
> >>if there is enough time, a fix will (or at least should)
> >>go in before the release of lk 3.12 .
> >
> >I think you've got about a week to prove you can fix it (before 3.12
> >goes final). I'll send my current set of fixes to Linus without doing
> >anything about sg.
>
> "prove" is a big ask, especially coming from a
> mathematician. I consider it more hacking (in the
> golf sense) on my part to tweak well-meaning patches
> to the sg driver that cause collateral damage. Further,
> I suspect Vaughan's patch was an attempt to fix
> damage left by a previous sg_open() hacker.
>
> I have asked Simon Kirby to apply the patch:
> http://marc.info/?l=linux-scsi&m=138237283432010&w=2
> and report if it fixes his problems. Further I have
> written three test programs to test O_EXCL handling on
> SCSI devices, two of which are in the examples directory
> of sg3_utils version 1.37 . The latest one (single
> exclusive writer, multiple readers) can be found in
> the News section of:
> http://sg.danny.cz/sg/
> These tests don't check all possibilities (e.g. random
> signals, ml error processing and detached devices) but
> they are better than nothing. And, as a side issue, they
> break bsg (cause it ignores O_EXCL) and break the block
> layer (e.g. /dev/sdb) so perhaps it should be reverted :-)

Well, this patch works for me in that I see no more lockdep warnings or
unintended consequences when running the same "megarc.bin -dispCfg -a0"
command.

Simon-
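Lockdep's "leaving the kernel with locks still held" report means a lock taken in sg_open() was still held when the open() system call returned. One common way to express "one exclusive opener or many shared openers" without doing that is to keep the state in counters guarded by a lock held only for the duration of each call. The userspace sketch below illustrates that idea under those assumptions; `struct open_gate`, `gate_open()` and `gate_release()` are hypothetical names, and this is neither the sg driver's code nor the patch Douglas posted. It models only the non-blocking case (conflicts return -EBUSY); a blocking open would instead wait, for example on a condition variable.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical open-state tracker: either one exclusive opener or any
 * number of shared openers. The mutex is only held inside these calls,
 * never across a return to the caller. */
struct open_gate {
    pthread_mutex_t lock;
    int open_count;
    bool exclusive;
};

static int gate_open(struct open_gate *g, bool excl)
{
    int ret = 0;

    pthread_mutex_lock(&g->lock);
    if (g->exclusive || (excl && g->open_count > 0)) {
        ret = -EBUSY;               /* a conflicting opener is present */
    } else {
        g->open_count++;
        if (excl)
            g->exclusive = true;
    }
    pthread_mutex_unlock(&g->lock); /* released on every path */
    return ret;
}

static void gate_release(struct open_gate *g)
{
    pthread_mutex_lock(&g->lock);
    if (--g->open_count == 0)
        g->exclusive = false;       /* last closer clears exclusivity */
    pthread_mutex_unlock(&g->lock);
}
```

The "single exclusive writer, multiple readers" test Douglas describes exercises exactly these transitions: shared opens succeed together, an exclusive open is refused while any are outstanding, and vice versa.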
[3.12-rc] sg_open: leaving the kernel with locks still held!
Hello!

While trying to figure out why the request queue to sda (ext4) was
clogging up on one of our btrfs backup boxes, I noticed a megarc process
in D state, so enabled locking debugging, and got this (on 3.12-rc6):

[ 205.372823]
[ 205.372901] [ BUG: lock held when returning to user space! ]
[ 205.372979] 3.12.0-rc6-hw-debug-pagealloc+ #67 Not tainted
[ 205.373055]
[ 205.373132] megarc.bin/5283 is leaving the kernel with locks still held!
[ 205.373212] 1 lock held by megarc.bin/5283:
[ 205.373285] #0: (&sdp->o_sem){.+.+..}, at: [] sg_open+0x3a0/0x4d0

Vaughan, it seems you touched this area last in 15b06f9a02406e, and git
tag --contains says this went in for 3.12-rc. We didn't see this on 3.11,
though I haven't tried with lockdep.

This is caused by some of our internal RAID monitoring scripts that run
"megarc.bin -dispCfg -a0" (even though that controller isn't present on
this server -- a PowerEdge 2950 w/Perc 5).

strace output of the program execution that causes the above message is
here: http://0x.ca/sim/ref/3.12-rc6/megarc_strace.txt

Simon-
Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()
On Mon, Aug 19, 2013 at 04:31:38PM -0700, Simon Kirby wrote:

> On Mon, Aug 19, 2013 at 05:24:41PM -0400, Chris Mason wrote:
>
> > Quoting Linus Torvalds (2013-08-19 17:16:36)
> > > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter
> > > wrote:
> > > > On Mon, 19 Aug 2013, Simon Kirby wrote:
> > > >
> > > >>[... ] The
> > > >> alloc/free traces are always the same -- always alloc_pipe_info and
> > > >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4:
> > > >>
> > > >> Object 880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > >> 6b lkkk
> > > >
> > > > This looks like an increment after free in the second 32 bit value of
> > > > the
> > > > structure. First 32 bit value's poison is unchanged.
> > >
> > > Ugh. If that is "struct pipe_inode_info" and I read it right, that's
> > > the "wait_lock" spinlock that is part of the mutex.
> > >
> > > Doing a "spin_lock()" could indeed cause an increment operation. But
> > > it still sounds like a very odd case. And even for some wild pointer
> > > I'd then expect the spin_unlock to also happen, and to then increment
> > > the next byte (or word) too. More importantly, for a mutex, I'd expect
> > > the *other* fields to be corrupted too (the "waiter" field etc). That
> > > is, unless we're still spinning waiting for the mutex, but with that
> > > value we shouldn't, as far as I can see.
> > >
> >
> > Simon, is this box doing btrfs send/receive? If so, it's probably where
> > this pipe is coming from.
>
> No, not for some time (a few kernel versions ago).
>
> > Linus' CONFIG_DEBUG_PAGE_ALLOC suggestions are going to be the fastest
> > way to find it, I can give you a patch if it'll help.
>
> I presume it's just:
>
> diff --git a/fs/pipe.c b/fs/pipe.c
> index d2c45e1..30d5b8d 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -780,7 +780,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
>  {
>  	struct pipe_inode_info *pipe;
>  
> -	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL);
> +	pipe = (void *)get_zeroed_page(GFP_KERNEL);
>  	if (pipe) {
>  		pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL);
>  		if (pipe->bufs) {
> @@ -790,7 +790,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
>  			mutex_init(&pipe->mutex);
>  			return pipe;
>  		}
> -		kfree(pipe);
> +		free_page((unsigned long)pipe);
>  	}
>  
>  	return NULL;
> @@ -808,7 +808,7 @@ void free_pipe_info(struct pipe_inode_info *pipe)
>  	if (pipe->tmp_page)
>  		__free_page(pipe->tmp_page);
>  	kfree(pipe->bufs);
> -	kfree(pipe);
> +	free_page((unsigned long)pipe);
>  }
>  
>  static struct vfsmount *pipe_mnt __read_mostly;
>
> ...and CONFIG_DEBUG_PAGEALLOC enabled.
>
> > It would be nice if you could trigger this on plain 3.11-rcX instead of
> > btrfs-next.
>
> On 3.10 it was with some btrfs-next pulled in, but the 3.11-rc4 traces
> were from 3.11-rc4 with just some of our local patches:
>
> > git diff --stat v3.11-rc4..master
>  firmware/Makefile                         |    4 +-
>  firmware/bnx2/bnx2-mips-06-6.2.3.fw.ihex  | 5804 ++
>  firmware/bnx2/bnx2-mips-09-6.2.1b.fw.ihex | 6496 +
>  kernel/acct.c                             |   21 +-
>  net/sunrpc/auth.c                         |    2 +-
>  net/sunrpc/clnt.c                         |   10 +
>  net/sunrpc/xprt.c                         |    8 +-
>  7 files changed, 12335 insertions(+), 10 deletions(-)
>
> None of them look relevant, but I'm building vanilla -rc4 with
> CONFIG_DEBUG_PAGEALLOC and the patch above.

Stock 3.11-rc4 plus the above get_zeroed_page() for pipe allocations has
been running since August 19th on a few btrfs boxes. It has been quiet
until a few days ago, where we hit this:

BUG: soft lockup - CPU#5 stuck for 22s! [btrfs-cleaner:5871]
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe serio_raw bnx2 evdev
CPU: 5 PID: 5871 Comm: btrfs-cleaner Not tainted 3.11.0-rc4-hw+ #48
Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010
task: 8804261117d0 ti: 8804120d8000 task.ti: 8804120d8000
RIP: 0010:[] [] _raw_spin_unlock_irqrestore+0xc/0x20
RSP: 0018:8804120d98b8 EFLAGS: 0296
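Christoph's reading of the dump ("an increment after free in the second 32 bit value") can be checked mechanically. SLUB fills a freed object with the POISON_FREE byte 0x6b; on little-endian x86 a 32-bit increment at offset 4 turns 0x6b6b6b6b into 0x6b6b6b6c, which changes only the byte at offset 4, exactly the `6b 6b 6b 6b 6c 6b ...` pattern. A minimal userspace sketch; the buffer size and the helper name are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b   /* byte value SLUB writes into freed objects */
#define OBJ_SIZE    192    /* a kmalloc-192 object, as in the reports */

/* Hypothetical helper: make `obj` look like a freed, poisoned object,
 * then apply one stray 32-bit increment at `offset`, the way a late
 * atomic_inc() or spin_lock() through a dangling pointer would. */
static void poison_then_increment(uint8_t obj[OBJ_SIZE], size_t offset)
{
    uint32_t word;

    memset(obj, POISON_FREE, OBJ_SIZE);
    memcpy(&word, obj + offset, sizeof(word));  /* 0x6b6b6b6b */
    word++;                                     /* 0x6b6b6b6c */
    memcpy(obj + offset, &word, sizeof(word));
}
```

On a little-endian machine `poison_then_increment(obj, 4)` leaves every byte at 0x6b except `obj[4]`, which becomes 0x6c; on a big-endian machine the 0x6c would show up at offset 7 instead.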
Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()
On Mon, Aug 19, 2013 at 05:24:41PM -0400, Chris Mason wrote:

> Quoting Linus Torvalds (2013-08-19 17:16:36)
> > On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter wrote:
> > > On Mon, 19 Aug 2013, Simon Kirby wrote:
> > >
> > >>[... ] The
> > >> alloc/free traces are always the same -- always alloc_pipe_info and
> > >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4:
> > >>
> > >> Object 880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > >> lkkk
> > >
> > > This looks like an increment after free in the second 32 bit value of the
> > > structure. First 32 bit value's poison is unchanged.
> >
> > Ugh. If that is "struct pipe_inode_info" and I read it right, that's
> > the "wait_lock" spinlock that is part of the mutex.
> >
> > Doing a "spin_lock()" could indeed cause an increment operation. But
> > it still sounds like a very odd case. And even for some wild pointer
> > I'd then expect the spin_unlock to also happen, and to then increment
> > the next byte (or word) too. More importantly, for a mutex, I'd expect
> > the *other* fields to be corrupted too (the "waiter" field etc). That
> > is, unless we're still spinning waiting for the mutex, but with that
> > value we shouldn't, as far as I can see.
> >
>
> Simon, is this box doing btrfs send/receive? If so, it's probably where
> this pipe is coming from.

No, not for some time (a few kernel versions ago).

> Linus' CONFIG_DEBUG_PAGE_ALLOC suggestions are going to be the fastest
> way to find it, I can give you a patch if it'll help.

I presume it's just:

diff --git a/fs/pipe.c b/fs/pipe.c
index d2c45e1..30d5b8d 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -780,7 +780,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
 {
 	struct pipe_inode_info *pipe;
 
-	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL);
+	pipe = (void *)get_zeroed_page(GFP_KERNEL);
 	if (pipe) {
 		pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL);
 		if (pipe->bufs) {
@@ -790,7 +790,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
 			mutex_init(&pipe->mutex);
 			return pipe;
 		}
-		kfree(pipe);
+		free_page((unsigned long)pipe);
 	}
 
 	return NULL;
@@ -808,7 +808,7 @@ void free_pipe_info(struct pipe_inode_info *pipe)
 	if (pipe->tmp_page)
 		__free_page(pipe->tmp_page);
 	kfree(pipe->bufs);
-	kfree(pipe);
+	free_page((unsigned long)pipe);
 }
 
 static struct vfsmount *pipe_mnt __read_mostly;

...and CONFIG_DEBUG_PAGEALLOC enabled.

> It would be nice if you could trigger this on plain 3.11-rcX instead of
> btrfs-next.

On 3.10 it was with some btrfs-next pulled in, but the 3.11-rc4 traces
were from 3.11-rc4 with just some of our local patches:

> git diff --stat v3.11-rc4..master
 firmware/Makefile                         |    4 +-
 firmware/bnx2/bnx2-mips-06-6.2.3.fw.ihex  | 5804 ++
 firmware/bnx2/bnx2-mips-09-6.2.1b.fw.ihex | 6496 +
 kernel/acct.c                             |   21 +-
 net/sunrpc/auth.c                         |    2 +-
 net/sunrpc/clnt.c                         |   10 +
 net/sunrpc/xprt.c                         |    8 +-
 7 files changed, 12335 insertions(+), 10 deletions(-)

None of them look relevant, but I'm building vanilla -rc4 with
CONFIG_DEBUG_PAGEALLOC and the patch above.

Simon-
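The get_zeroed_page() change helps because, with CONFIG_DEBUG_PAGEALLOC, pages are unmapped from the kernel's address space when freed, so a late write through a stale pipe pointer faults at the culprit instead of silently bumping whatever object reused that memory. A rough userspace analogue, assuming Linux mmap()/mprotect(): each object gets its own anonymous page, and "freeing" it revokes access. The helper names are hypothetical, and unlike the kernel this sketch never hands the freed address out again.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* One page per object; anonymous mappings start out zero-filled,
 * much like get_zeroed_page(). */
static void *guarded_alloc(void)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
                   PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Free" by revoking all access, so any later touch faults. */
static int guarded_free(void *p)
{
    return mprotect(p, (size_t)sysconf(_SC_PAGESIZE), PROT_NONE);
}

/* Demonstration: a child process performs a use-after-free increment.
 * Returns true if the child was killed by SIGSEGV. */
static bool late_write_faults(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    int *obj = guarded_alloc();
    bool faulted = false;
    int status;
    pid_t pid;

    if (obj == NULL)
        return false;
    if (guarded_free(obj) == 0) {
        pid = fork();
        if (pid == 0) {
            volatile int *stale = obj;
            stale[1]++;   /* stray increment of the second int */
            _exit(0);     /* only reached if the write did not fault */
        }
        if (pid > 0 && waitpid(pid, &status, 0) == pid)
            faulted = WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
    }
    munmap(obj, pagesz);
    return faulted;
}
```

The cost is the same as in the kernel experiment: a whole page per small object, which is acceptable for a debugging run but not as a permanent allocation strategy.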
Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()
On Sat, Jul 06, 2013 at 11:27:38AM +0300, Pekka Enberg wrote: > On Sat, Jul 6, 2013 at 3:09 AM, Simon Kirby wrote: > > We saw two Oopses overnight on two separate boxes that seem possibly > > related, but both are weird. These boxes typically run btrfs for rsync > > snapshot backups (and usually Oops in btrfs ;), but not this time! > > backup02 was running 3.10-rc6 plus btrfs-next at the time, and backup03 > > was running 3.10 release plus btrfs-next from yesterday. Full kern.log > > and .config at http://0x.ca/sim/ref/3.10/ > > > > backup02's first Oops: > > > > BUG: unable to handle kernel paging request at 0001 > > IP: [] kmem_cache_alloc+0x4b/0x110 > > PGD 1f54f7067 PUD 0 > > Oops: [#1] SMP > > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe microcode > > serio_raw bnx2 evdev > > CPU: 0 PID: 23112 Comm: ionice Not tainted 3.10.0-rc6-hw+ #46 > > Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010 > > task: 8802c3f08000 ti: 8801b4876000 task.ti: 8801b4876000 > > RIP: 0010:[] [] > > kmem_cache_alloc+0x4b/0x110 > > RSP: 0018:8801b4877e88 EFLAGS: 00010206 > > RAX: RBX: 8802c3f08000 RCX: 017f040e > > RDX: 017f040d RSI: 00d0 RDI: 8107a503 > > RBP: 8801b4877ec8 R08: 00016a80 R09: > > R10: 7fff025fe120 R11: 0246 R12: 00d0 > > R13: 88042d8019c0 R14: 0001 R15: 7fc3588ee97f > > FS: () GS:88043fc0() knlGS: > > CS: 0010 DS: ES: CR0: 8005003b > > CR2: 0001 CR3: 000409d68000 CR4: 07f0 > > DR0: DR1: DR2: > > DR3: DR6: 0ff0 DR7: 0400 > > Stack: > > 8801b4877ed8 8112a1bc 8800985acd20 8802c3f08000 > > 0001 7fc3588ee334 7fc358af5758 7fc3588ee97f > > 8801b4877ee8 8107a503 8801b4877ee8 ffea > > Call Trace: > > [] ? 
__fput+0x12c/0x240 > > [] prepare_creds+0x23/0x150 > > [] SyS_faccessat+0x34/0x1f0 > > [] SyS_access+0x13/0x20 > > [] system_call_fastpath+0x16/0x1b > > Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 > > 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 > > 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 > > RIP [] kmem_cache_alloc+0x4b/0x110 > > RSP > > CR2: 0001 > > ---[ end trace 744477356cd98306 ]--- > > > > backup03's first Oops: > > > > BUG: unable to handle kernel paging request at 880502efc240 > > IP: [] kmem_cache_alloc+0x4b/0x110 > > PGD 1d3a067 PUD 0 > > Oops: [#1] SMP > > Modules linked in: aoe ipmi_devintf ipmi_msghandler bnx2 microcode > > serio_raw evdev > > CPU: 6 PID: 14066 Comm: perl Not tainted 3.10.0-hw+ #2 > > Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012 > > task: 88040111c3b0 ti: 8803c23ae000 task.ti: 8803c23ae000 > > RIP: 0010:[] [] > > kmem_cache_alloc+0x4b/0x110 > > RSP: 0018:8803c23afd90 EFLAGS: 00010282 > > RAX: RBX: 88040111c3b0 RCX: 0002a76e > > RDX: 0002a76d RSI: 00d0 RDI: 8107a4e3 > > RBP: 8803c23afdd0 R08: 00016a80 R09: > > R10: fffe R11: ffd0 R12: 00d0 > > R13: 88041d403980 R14: 880502efc240 R15: 88010e375a40 > > FS: 7f2cae496700() GS:88041f2c() knlGS: > > CS: 0010 DS: ES: CR0: 8005003b > > CR2: 880502efc240 CR3: 0001e0ced000 CR4: 07e0 > > DR0: DR1: DR2: > > DR3: DR6: 0ff0 DR7: 0400 > > Stack: > > 8803c23afe98 8803c23afdb8 81133811 88040111c3b0 > > 88010e375a40 01200011 7f2cae4969d0 88010e375a40 > > 8803c23afdf0 8107a4e3 81b49b80 01200011 > > Call Trace: > > [] ? final_putname+0x21/0x50 > > [] prepare_creds+0x23/0x150 > > [] copy_creds+0x31/0x160 > > [] ? unlazy_fpu+0x9b/0xb0 > > [] copy_process.part.49+0x239/0x1390 > > [] ? __alloc_fd+0x42/0x100 > > [] do_fork+0xa4/0x320 > > [] ? __do_pipe_flags+0x77/0xb0 > > [] ? __fd_install+0x26/0x60 > > [] SyS_clone+0x11/0x20 > > [] s
[3.10] Oopses in kmem_cache_allocate() via prepare_creds()
We saw two Oopses overnight on two separate boxes that seem possibly related, but both are weird. These boxes typically run btrfs for rsync snapshot backups (and usually Oops in btrfs ;), but not this time! backup02 was running 3.10-rc6 plus btrfs-next at the time, and backup03 was running 3.10 release plus btrfs-next from yesterday. Full kern.log and .config at http://0x.ca/sim/ref/3.10/ backup02's first Oops: BUG: unable to handle kernel paging request at 0001 IP: [] kmem_cache_alloc+0x4b/0x110 PGD 1f54f7067 PUD 0 Oops: [#1] SMP Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe microcode serio_raw bnx2 evdev CPU: 0 PID: 23112 Comm: ionice Not tainted 3.10.0-rc6-hw+ #46 Hardware name: Dell Inc. PowerEdge 2950/0NH278, BIOS 2.7.0 10/30/2010 task: 8802c3f08000 ti: 8801b4876000 task.ti: 8801b4876000 RIP: 0010:[] [] kmem_cache_alloc+0x4b/0x110 RSP: 0018:8801b4877e88 EFLAGS: 00010206 RAX: RBX: 8802c3f08000 RCX: 017f040e RDX: 017f040d RSI: 00d0 RDI: 8107a503 RBP: 8801b4877ec8 R08: 00016a80 R09: R10: 7fff025fe120 R11: 0246 R12: 00d0 R13: 88042d8019c0 R14: 0001 R15: 7fc3588ee97f FS: () GS:88043fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0001 CR3: 000409d68000 CR4: 07f0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Stack: 8801b4877ed8 8112a1bc 8800985acd20 8802c3f08000 0001 7fc3588ee334 7fc358af5758 7fc3588ee97f 8801b4877ee8 8107a503 8801b4877ee8 ffea Call Trace: [] ? 
__fput+0x12c/0x240 [] prepare_creds+0x23/0x150 [] SyS_faccessat+0x34/0x1f0 [] SyS_access+0x13/0x20 [] system_call_fastpath+0x16/0x1b Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 RIP [] kmem_cache_alloc+0x4b/0x110 RSP CR2: 0001 ---[ end trace 744477356cd98306 ]--- backup03's first Oops: BUG: unable to handle kernel paging request at 880502efc240 IP: [] kmem_cache_alloc+0x4b/0x110 PGD 1d3a067 PUD 0 Oops: [#1] SMP Modules linked in: aoe ipmi_devintf ipmi_msghandler bnx2 microcode serio_raw evdev CPU: 6 PID: 14066 Comm: perl Not tainted 3.10.0-hw+ #2 Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012 task: 88040111c3b0 ti: 8803c23ae000 task.ti: 8803c23ae000 RIP: 0010:[] [] kmem_cache_alloc+0x4b/0x110 RSP: 0018:8803c23afd90 EFLAGS: 00010282 RAX: RBX: 88040111c3b0 RCX: 0002a76e RDX: 0002a76d RSI: 00d0 RDI: 8107a4e3 RBP: 8803c23afdd0 R08: 00016a80 R09: R10: fffe R11: ffd0 R12: 00d0 R13: 88041d403980 R14: 880502efc240 R15: 88010e375a40 FS: 7f2cae496700() GS:88041f2c() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 880502efc240 CR3: 0001e0ced000 CR4: 07e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Stack: 8803c23afe98 8803c23afdb8 81133811 88040111c3b0 88010e375a40 01200011 7f2cae4969d0 88010e375a40 8803c23afdf0 8107a4e3 81b49b80 01200011 Call Trace: [] ? final_putname+0x21/0x50 [] prepare_creds+0x23/0x150 [] copy_creds+0x31/0x160 [] ? unlazy_fpu+0x9b/0xb0 [] copy_process.part.49+0x239/0x1390 [] ? __alloc_fd+0x42/0x100 [] do_fork+0xa4/0x320 [] ? __do_pipe_flags+0x77/0xb0 [] ? __fd_install+0x26/0x60 [] SyS_clone+0x11/0x20 [] stub_clone+0x69/0x90 [] ? 
system_call_fastpath+0x16/0x1b Code: 75 f0 4c 89 7d f8 49 8b 4d 00 65 48 03 0c 25 68 da 00 00 48 8b 51 08 4c 8b 31 4d 85 f6 74 5f 49 63 45 20 4d 8b 45 00 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 c8 49 63 RIP [] kmem_cache_alloc+0x4b/0x110 RSP CR2: 880502efc240 ---[ end trace 956d153150ecc57f ]--- Simon-
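Both Oopses fault inside kmem_cache_alloc() with CR2 equal to R14, which is consistent with the allocator following a bad freelist pointer. One way this could connect to the later pipe poison reports, offered only as a hypothesis that this thread does not establish: without slub_debug, SLUB stores a free object's next-free pointer inside the object, assumed here to be at offset 0. On x86-64 that 8-byte pointer covers offsets 0 to 7, so the same stray 32-bit increment at offset 4 would hit its upper half and add 2^32 to it. The helper and input value below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: place a 64-bit next-free pointer at offset 0 of a
 * free object, apply a stray 32-bit increment at offset 4 (little-endian
 * x86-64), and return the value the allocator would read back. */
static uint64_t freelist_ptr_after_stray_inc(uint64_t free_ptr)
{
    unsigned char obj[192];
    uint32_t word;
    uint64_t result;

    memset(obj, 0, sizeof(obj));
    memcpy(obj, &free_ptr, sizeof(free_ptr));  /* next-free pointer at offset 0 */
    memcpy(&word, obj + 4, sizeof(word));      /* its upper 32 bits */
    word++;
    memcpy(obj + 4, &word, sizeof(word));
    memcpy(&result, obj, sizeof(result));
    return result;
}
```

For a pointer such as 0xffff880400001000 the result is 0xffff880500001000, an address 4 GiB away that may not be mapped. backup03's faulting address (880502efc240) begins 8805 where other kernel pointers in the same dump, such as R13 (88041d403980), begin 8804; that fits this pattern but does not prove it.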
Re: [ 02/42] TTY: do not update atime/mtime on read/write
On Tue, Apr 30, 2013 at 06:41:44PM -0700, Linus Torvalds wrote:

> On Tue, Apr 30, 2013 at 5:57 PM, Linus Torvalds
> wrote:
> >
> > Patch is whitespace-damaged and totally untested! Caveat applicator.
>
> Ok, so it's still whitespace-damaged, but it seems to work. The
> appended has the "8 second rule" too..
>
> Comments? Simon?

Tested -- both hunks seem to work as intended. Thanks!

Simon-

Below became b0b885657b6c8ef63a46bc9299b2a7715d19acde

>                Linus
>
> --- snip snip ---
>  drivers/tty/pty.c    | 3 +++
>  drivers/tty/tty_io.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
> index a62798fcc014..59bfaecc4e14 100644
> --- a/drivers/tty/pty.c
> +++ b/drivers/tty/pty.c
> @@ -681,6 +681,9 @@ static int ptmx_open(struct inode *inode, struct file *filp)
>  
>  	nonseekable_open(inode, filp);
>  
> +	/* We refuse fsnotify events on ptmx, since it's a shared resource */
> +	filp->f_mode |= FMODE_NONOTIFY;
> +
>  	retval = tty_alloc_file(filp);
>  	if (retval)
>  		return retval;
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 97ebc8c5864e..6464029e4860 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -988,10 +988,10 @@ void start_tty(struct tty_struct *tty)
>  
>  EXPORT_SYMBOL(start_tty);
>  
> +/* We limit tty time update visibility to every 8 seconds or so. */
>  static void tty_update_time(struct timespec *time)
>  {
> -	unsigned long sec = get_seconds();
> -	sec -= sec % 60;
> +	unsigned long sec = get_seconds() & ~7;
>  	if ((long)(sec - time->tv_sec) > 0)
>  		time->tv_sec = sec;
>  }
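The granularities argued over in these threads differ only in how tty_update_time() rounds the current time before storing it: the 60-second version does `sec -= sec % 60`, Simon's proposal uses 10, and the version above clears the low three bits with `& ~7`, rounding down to a multiple of 8 with a single AND. A userspace sketch, with get_seconds() replaced by an explicit argument and struct timespec reduced to a hypothetical one-field struct:

```c
#include <stdbool.h>

/* Stand-in for the tv_sec field of the tty inode's timestamp. */
struct tty_stamp {
    unsigned long tv_sec;
};

/* Mask rounding: clear the low 3 bits, i.e. round down to a multiple of 8.
 * (In the kernel, ~7 is an int that converts to the same mask.) */
static unsigned long round_mask8(unsigned long sec)
{
    return sec & ~7UL;
}

/* Modulo rounding used by the 60 s and 10 s variants. */
static unsigned long round_mod(unsigned long sec, unsigned long step)
{
    return sec - sec % step;
}

/* Mirrors tty_update_time(): only move the stamp forward, comparing via a
 * signed difference so the test still works if the counter wraps. */
static bool update_stamp(struct tty_stamp *t, unsigned long rounded_now)
{
    if ((long)(rounded_now - t->tv_sec) > 0) {
        t->tv_sec = rounded_now;
        return true;
    }
    return false;
}
```

With the mask, the stored time lags real activity by at most 7 seconds (versus up to 59 with the 60-second rounding), while typing within the same 8-second window leaves the timestamp unchanged.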
Re: [ 02/42] TTY: do not update atime/mtime on read/write
On Mon, Apr 29, 2013 at 06:37:24PM -0700, Greg Kroah-Hartman wrote:

> On Mon, Apr 29, 2013 at 05:36:40PM -0700, Simon Kirby wrote:
> > On Mon, Apr 29, 2013 at 05:21:17PM -0700, Greg Kroah-Hartman wrote:
> >
> > > On Mon, Apr 29, 2013 at 05:14:45PM -0700, Simon Kirby wrote:
> > > > On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote:
> > > >
> > > > > 3.8-stable review patch. If anyone has any objections, please let me
> > > > > know.
> > > >
> > > > I object. This breaks functionality I use every day (seeing who else is
> > > > working on stuff with "w").
> > > >
> > > > Furthermore, the patch does not actually fix the hole referenced (see
> > > > ptmx-keystroke-latency.c on
> > > > http://vladz.devzero.fr/013_ptmx-timing.php).
> > > > I can still reproduce the timing capture even with this patch applied
> > > > (in 3.9-rc8).
> > >
> > > How? There are no keystrokes being reported to other users, or did we
> > > miss something with this patch?
> >
> > wget http://vladz.devzero.fr/svn/codes/PoC/ptmx-keystroke-latency.c
> > gcc -o ptmx-keystroke-latency ptmx-keystroke-latency.c
> > ./ptmx-keystroke-latency
> >
> > Log in to another tty, as another user. See keystroke timing. 3.9-rc8.
> >
> > Seems like it was missed. Meanwhile, idle times in "w" do not update.
>
> Ah, it's using inotify on the /dev/ptmx device. Jiri, your change
> really doesn't affect that at all :(
>
> Simon, you mention a grsec change somewhere that addresses this issue.
> Any hints on where that would be?

Yes, see Jiri's comments in the original patch (b0de59b5733d):

http://vladz.devzero.fr/013_ptmx-timing.php

The grsec patch is linked from there:

http://grsecurity.net/~spender/sidechannel.diff

Simon-
Re: [PATCH v2] TTY: fix atime/mtime regression
On Fri, Apr 26, 2013 at 10:02:12AM -0700, Linus Torvalds wrote: > On Fri, Apr 26, 2013 at 4:48 AM, Jiri Slaby wrote: > > > > To revert to the old behaviour while still preventing attackers to > > guess the password length, we update the timestamps in one-minute > > intervals by this patch. > > Thanks, applied. > > And now that I see the behavior of "w", I can kind of understand why > you picked 10s intervals. That "w" output is really really quite ugly. > Talking about "27.00s" idle for the current terminal when we only > update at even minutes ends up not being sensible. Ah, so it was your suggestion to go with one minute. I objected to the stable-backporting of this, since it was broken and didn't actually fix the inotify path, but I care more about the time granularity chosen here. > Craig, background: the current git kernel (so 3.9, and these commits > will presumably be back-ported) does not update tty timestamps very > often, because you can use the timestamps to look at peoples typing > behavior. Initially it didn't update the timestamps AT ALL, but that > broke the whole idle routine. Now it updates it only at minute > boundaries, so things like "w" _work_, but the hundreth-of-a-second > idle precision is obviously just totally random noise. > > Not a biggie, I doubt I would even have noticed unless I was > explicitly looking at that field, but I look at this field all the time, and would really like to see seconds. Surely anybody typing a password types it faster than 1 character per second. Why stretch it out so much? Can we at least make it 10 seconds? Simon- --- Subject: [PATCH] TTY: increase atime/mtime update rate 37b7f3c76595 introduces an update interval for TTY atime updates, making "w"'s IDLE column less useful than in the past. Since this is often used for checking to see if other users are actually using the system, reduce the time to 10 seconds. 
Signed-off-by: Simon Kirby
Cc: # follow 37b7f3c76595e23257f61bd80b223de865
---
 drivers/tty/tty_io.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index b045268..dee88ff 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -944,7 +944,7 @@ EXPORT_SYMBOL(start_tty);
 static void tty_update_time(struct timespec *time)
 {
 	unsigned long sec = get_seconds();
-	sec -= sec % 60;
+	sec -= sec % 10;
 	if ((long)(sec - time->tv_sec) > 0)
 		time->tv_sec = sec;
 }
-- 
1.7.10.4
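The whole change is the modulus in tty_update_time(). As a rough userspace model of that logic (a sketch, not kernel code: get_seconds() is replaced by an explicit `now` argument, and the function name and `interval` parameter are made up for illustration), the stored timestamp is rounded down to a multiple of the interval and only ever moves forward:

```c
#include <assert.h>
#include <time.h>

/* Userspace model of tty_update_time(): round `now` down to a multiple
 * of `interval` seconds and only ever move the stored timestamp forward.
 * The patch above changes the interval from 60 to 10. */
void coarse_update_time(struct timespec *ts, time_t now, time_t interval)
{
    time_t sec;

    assert(interval > 0);
    sec = now - now % interval;   /* e.g. now = 125, interval = 60 -> 120 */
    if (sec > ts->tv_sec)         /* never step backwards */
        ts->tv_sec = sec;         /* tv_nsec is left alone, as in the patch */
}
```

With an interval of 60, the timestamp that "w" reads can trail the last activity by up to 59 seconds; with 10, by up to 9; with an interval of 1, only the sub-second part is dropped.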
Re: [ 02/42] TTY: do not update atime/mtime on read/write
On Mon, Apr 29, 2013 at 05:21:17PM -0700, Greg Kroah-Hartman wrote: > On Mon, Apr 29, 2013 at 05:14:45PM -0700, Simon Kirby wrote: > > On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote: > > > > > 3.8-stable review patch. If anyone has any objections, please let me > > > know. > > > > I object. This breaks functionality I use every day (seeing who else is > > working on stuff with "w"). > > > > Furthermore, the patch does not actually fix the hole referenced (see > > ptmx-keystroke-latency.c on http://vladz.devzero.fr/013_ptmx-timing.php). > > I can still reproduce the timing capture even with this patch applied > > (in 3.9-rc8). > > How? There are no keystrokes being reported to other users, or did we > miss something with this patch? wget http://vladz.devzero.fr/svn/codes/PoC/ptmx-keystroke-latency.c gcc -o ptmx-keystroke-latency ptmx-keystroke-latency.c ./ptmx-keystroke-latency Log in to another tty, as another user. See keystroke timing. 3.9-rc8. Seems like it was missed. Meanwhile, idle times in "w" do not update. > > The grsec patch instead introdues another test within the inotify code > > (is_sidechannel_device()-related bits) -- untested by me, but probably > > more relevant. > > > > Even 37b7f3c76595e23257f61bd80b223de8658617ee, the "regression fix", > > which Linus merged in for the 3.9 release, is still a regression for me. > > And I applied that one as well. Right, so this restores updates but increases the granularity to 60 seconds. I'm complaining that this still affects my occupational performance. > > 60 seconds means somebody is asleep in my environment, and so is still > > the kind of thing that just pisses me off. I'd rather revert this whole > > thing. > > Users taking a break for longer than a minute upset you? What are you > really trying to keep track of here? Really? In a team environment, a person idle for 30 seconds means they've stopped to look at something else.
Now we have to wait 2 minutes to know if this has happened or not. Now it becomes faster to interrupt somebody to ask them if maintenance can be done, etc. > > I'd stand maybe 1 seconds as maximum granularity. You could do that with > > less code and no test. > > Patch to show this? I was thinking of just updating the seconds field of the timespec struct, or leaving this particular part and setting sb->s_time_gran to 1, though that would probably break other things. Since I've never looked at this stuff before, I'm not sure I should make a patch, but I can... Simon-
Re: [ 02/42] TTY: do not update atime/mtime on read/write
On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote:
> 3.8-stable review patch. If anyone has any objections, please let me know.

I object. This breaks functionality I use every day (seeing who else is working on stuff with "w").

Furthermore, the patch does not actually fix the hole referenced (see ptmx-keystroke-latency.c on http://vladz.devzero.fr/013_ptmx-timing.php). I can still reproduce the timing capture even with this patch applied (in 3.9-rc8). The grsec patch instead introduces another test within the inotify code (is_sidechannel_device()-related bits) -- untested by me, but probably more relevant.

Even 37b7f3c76595e23257f61bd80b223de8658617ee, the "regression fix", which Linus merged in for the 3.9 release, is still a regression for me. 60 seconds means somebody is asleep in my environment, and so is still the kind of thing that just pisses me off. I'd rather revert this whole thing. I could live with maybe 1 second as the maximum granularity. You could do that with less code and no test.

"watch -n.1 ls --full-time /dev/pts/1" shows that the exposed resolution (without inotify) is to the nanosecond.

Simon-

> --
>
> From: Jiri Slaby
>
> commit b0de59b5733d18b0d1974a060860a8b5c1b36a2e upstream.
>
> On http://vladz.devzero.fr/013_ptmx-timing.php, we can see how to find
> out length of a password using timestamps of /dev/ptmx. It is
> documented in "Timing Analysis of Keystrokes and Timing Attacks on
> SSH". To avoid that problem, do not update time when reading
> from/writing to a TTY.
>
> I am afraid of regressions as this is a behavior we have since 0.97
> and apps may expect the time to be current, e.g. for monitoring
> whether there was a change on the TTY. Now, there is no change. So
> this would better have a lot of testing before it goes upstream.
>
> References: CVE-2013-0160
>
> Signed-off-by: Jiri Slaby
> Signed-off-by: Greg Kroah-Hartman
>
> ---
>  drivers/tty/tty_io.c |    8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -977,8 +977,7 @@ static ssize_t tty_read(struct file *fil
>  	else
>  		i = -EIO;
>  	tty_ldisc_deref(ld);
> -	if (i > 0)
> -		inode->i_atime = current_fs_time(inode->i_sb);
> +
>  	return i;
>  }
>
> @@ -1079,11 +1078,8 @@ static inline ssize_t do_tty_write(
>  			break;
>  		cond_resched();
>  	}
> -	if (written) {
> -		struct inode *inode = file->f_path.dentry->d_inode;
> -		inode->i_mtime = current_fs_time(inode->i_sb);
> +	if (written)
>  		ret = written;
> -	}
>  out:
>  	tty_write_unlock(tty);
>  	return ret;
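The `watch -n.1 ls --full-time /dev/pts/1` observation works because stat(2) hands the device node's timestamps, down to the nanosecond field, to any user allowed to stat it. A minimal C equivalent of one poll of that command, as a sketch using the POSIX.1-2008 `st_mtim` field (the helper names `get_mtime` and `print_mtime` are hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Read a file's mtime at full precision via stat(2)'s st_mtim,
 * the same value `ls --full-time` prints. Returns 0 or -1. */
int get_mtime(const char *path, struct timespec *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *out = st.st_mtim;
    return 0;
}

/* Print the timestamp as seconds.nanoseconds, like one poll of `watch`. */
int print_mtime(const char *path)
{
    struct timespec ts;

    if (get_mtime(path, &ts) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: %lld.%09ld\n", path, (long long)ts.tv_sec, ts.tv_nsec);
    return 0;
}
```

Calling print_mtime("/dev/pts/1") in a loop with a short sleep approximates the watch command; on a kernel that still updates TTY times on every read/write, the printed values change as the terminal is used.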
Re: Regression with initramfs and nfsroot (appears to be in the dcache)
On Fri, Nov 30, 2012 at 02:00:48AM +, Al Viro wrote:
> OK, that settles it. WARN_ON() and printks in the area can be dropped;
> the right fix is below. However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup(). It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself. Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks... Trond?
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  		nfs_refresh_inode(dentry->d_inode, entry->fattr);
>  		goto out;
>  	} else {
> -		d_drop(dentry);
> +		if (d_invalidate(dentry) != 0)
> +			goto out;
>  		dput(dentry);
>  	}
>  }

Hello,

With your previous patch (with the WARN_ON), I hit the WARN_ON() in the test case described here: https://patchwork.kernel.org/patch/1446851/ . The __d_move()ing mountpoint case no longer hits, and there is no longer an EBUSY, so this seems to work for me (in 3.6, where it broke).

Simon-
run_posix_cpu_timers panic on v2.6.22-rc6
Having recently upgraded our Asterisk server, I figured it would be a good time to try a NO_HZ kernel. Everything was running well, until... it decided to panic. All I have is a fuzzy picture of the console to work from. The panic was a fatal exception in interrupt, with EIP within run_posix_cpu_timers. I can't quite read the offsets, but the stack backtrace was:

run_rebalance_domains
scheduler_tick
tick_periodic
tick_handle_periodic
smp_apic_timer_interrupt
apic_timer_interrupt
default_idle
default_idle
cpu_idle
start_kernel

Seeing as this is all new code and the box has been otherwise stable for the past 3 years, there is probably a problem still lurking in the NO_HZ code somewhere. But it looks like I don't have any other info. I'll try to get a better shot of the Oops next time...

Simon-
Re: 3c590 vs. tulip
On Fri, May 11, 2001 at 09:27:29AM -0400, Dan Mann wrote: > The server has lots (ok, about 20,000 not counting the os itself) of medium > sized files on it, ranging in size from 60k to 40MB. When I run gqview > (image viewing program) on the client and point to a local directory that is > mapped to the server using samba, the images (over 4000 in one directory) > are displayed absolutely as fast as I can click my mouse button. No lag > time whatsoever. How can this be so fast? Even with the images on my local > faster machine it is much slower. Images take at least .5 to 1 second to > load when they are stored locally. But over the network, with 2.4.4 and > samba 2.2, It's as if the server "knows" what I'm going to ask for before I > actually do. Is this normal? I honestly don't think it was this fast when > server was on 2.2 Kernel with samba 2.07. Note that the newer gqviews preload the "next" image (next based on your previous clicking direction). If you are clicking sequentially and give it enough time between images, it will immediately display the next image when you click on it. Even if this were some sort of caching bug, I don't see how gqview could load the images that much faster -- it still has to decode them at some point. Simon- [ Stormix Technologies Inc. ][ NetNation Communications Inc. ] [ [EMAIL PROTECTED] ][ [EMAIL PROTECTED]] [ Opinions expressed are not necessarily those of my employers. ]
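The direction-based preloading described above can be modelled in a few lines. This is a hypothetical sketch of the idea, not gqview's actual code: infer the browsing direction from the previous and current image indices, and pick the neighbour in that direction to decode ahead of time.

```c
#include <assert.h>

/* Guess which image the user will ask for next, given the image just
 * shown (`current`) and the one shown before it (`previous`), out of
 * `count` images. Returns -1 when there is nothing to preload. */
int preload_index(int current, int previous, int count)
{
    int step;

    assert(count > 0 && current >= 0 && current < count);
    step = (current < previous) ? -1 : 1;   /* default to moving forward */
    if (current + step < 0 || current + step >= count)
        return -1;                          /* at either end of the list */
    return current + step;
}
```

If the preloaded image has finished decoding by the time the user clicks, it can be displayed immediately, regardless of how fast the file was fetched.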
2.4.3 freeze under heavy writing + open rxvt
Three times now I've had 2.4.3 freeze on my dual CPU box while doing a "dd if=/dev/zero of=/dev/hdc bs=1024k" (a drive to be RMA'd :)). I got bored and opened an rxvt, and as the machine was swapping in (I assume), everything froze. The mouse still moved for about 5 seconds before the freeze, and the window was visible as it was attempting to start tcsh. I'm guessing that what's happening is something is waiting on a lock and blocking interrupts (?) for five seconds while it is swapping in, and the NMI lockup detector is kicking in and really breaking it. I have my serial console plugged in and minicom actually capturing now, so I'll see if I can get a trace of some sort. Simon-
Re: LDT allocated for cloned task!
On Tue, Mar 20, 2001 at 09:23:14AM -0800, Linus Torvalds wrote: > It's harmless. > > It's really a warning that says: the mm that you allocated a new LDT for > may have multiple users, and while the LDT is added to all of them, we > don't guarantee _when_ the other users will actually see the LDT. > > It so happens that the other users are probably just something like > "top" or similar, that increment the MM count to make sure that the MM > doesn't go away while they get information about the process. And those > users don't care about the LDT in the least. xmms with the xmms-avi (or avi-xmms?) plugin reproduces the message each and every time xmms starts up. Simon-
Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B))
On Wed, Feb 21, 2001 at 03:52:37PM -0800, David S. Miller wrote: > There is no reason my patch should have this effect. > > All of this is what appears to be a bug in Windows TCP header > compression, if the ID field of the IPv4 header does not change then > it drops every other packet. > > The change I posted as-is, is unacceptable because it adds unnecessary > cost to a fast path. The final change I actually use will likely > involve using the TCP sequence numbers to calculate an "always > changing" ID number in the IPv4 headers to placate these broken > windows machines. Has such a patch gone into the kernel yet? Simon-
Re: 2.4 TCP(?) timeouts
On Fri, Feb 16, 2001 at 07:08:05PM -0500, Simon Kirby wrote:
> Hello,
>
> Today we put 2.4.1 on our mail server after having see it perform well on
> some other boxes. It seems now we are receiving a few calls every hour
> from customers reporting that the server tends to hang and eventually
> time out on them when downloading mail. All customers that have reported
> this problem so far are on a didalup connection. Apparently the server
> will stop transmitting data (or the client seems to think so), and then
> their mail client will time out.

We recorded a trace on the mail server end of a connection from one of the customers having the problem. At first their client closed the connection because it was set to a timeout of 1 minute, but when they changed it to 5 minutes, it seemed to limp along further. It looks to me just like there's a huge amount of packet loss, but pinging the machine just after this shows 0% loss (just occasional jumps in response time). During this trace, when long periods of nothing went by, "netstat -tan |grep ip" showed nothing abnormal: a 0 byte receive queue and some data in the send queue equal to what would be retransmitted and eventually go through two minutes later.

nmap: Remote operating system guess: Windows 2000 Professional, Build 2128

16:26:14.738836 < client.1104 > mail.pop3: S 1263956200:1263956200(0) win 8760 (DF)
16:26:14.73 > mail.pop3 > client.1104: S 26894293:26894293(0) ack 1263956201 win 5840 (DF)
16:26:15.014145 < client.1104 > mail.pop3: . 1:1(0) ack 1 win 9112 (DF)
16:26:15.014866 > mail.pop3 > client.1104: P 1:92(91) ack 1 win 5840 (DF)
16:26:15.291998 < client.1104 > mail.pop3: P 1:16(15) ack 92 win 9021 (DF)
16:26:15.292199 > mail.pop3 > client.1104: . 92:92(0) ack 16 win 5840 (DF)
16:26:15.292305 > mail.pop3 > client.1104: P 92:115(23) ack 16 win 5840 (DF)
16:26:16.686295 > mail.pop3 > client.1104: P 92:115(23) ack 16 win 5840 (DF)
16:26:16.954563 < client.1104 > mail.pop3: P 16:30(14) ack 115 win 8998 (DF)
16:26:16.976908 > mail.pop3 > client.1104: P 115:137(22) ack 30 win 5840 (DF)
16:26:19.776322 > mail.pop3 > client.1104: P 115:137(22) ack 30 win 5840 (DF)
16:26:20.033951 < client.1104 > mail.pop3: P 30:36(6) ack 137 win 8976 (DF)
16:26:20.034063 > mail.pop3 > client.1104: P 137:149(12) ack 36 win 5840 (DF)
16:26:25.626301 > mail.pop3 > client.1104: P 137:149(12) ack 36 win 5840 (DF)
16:26:25.922151 < client.1104 > mail.pop3: P 36:42(6) ack 149 win 8964 (DF)
16:26:25.922254 > mail.pop3 > client.1104: P 149:219(70) ack 42 win 5840 (DF)
16:26:36.949499 < client.1104 > mail.pop3: P 36:42(6) ack 149 win 8964 (DF)
16:26:36.949533 > mail.pop3 > client.1104: . 219:219(0) ack 42 win 5840 (DF)
16:26:37.116302 > mail.pop3 > client.1104: P 149:219(70) ack 42 win 5840 (DF)
16:26:37.380554 < client.1104 > mail.pop3: P 42:50(8) ack 219 win 8894 (DF)
16:26:37.380645 > mail.pop3 > client.1104: . 219:219(0) ack 50 win 5840 (DF)
16:26:37.380709 > mail.pop3 > client.1104: P 219:231(12) ack 50 win 5840 (DF)
16:26:59.567440 < client.1104 > mail.pop3: P 42:50(8) ack 219 win 8894 (DF)
16:26:59.567476 > mail.pop3 > client.1104: . 231:231(0) ack 50 win 5840 (DF)
16:26:59.776301 > mail.pop3 > client.1104: P 219:231(12) ack 50 win 5840 (DF)
16:27:00.043125 < client.1104 > mail.pop3: P 50:59(9) ack 231 win 8882 (DF)
16:27:00.043186 > mail.pop3 > client.1104: . 231:231(0) ack 59 win 5840 (DF)
16:27:00.043475 > mail.pop3 > client.1104: . 231:767(536) ack 59 win 5840 (DF)
16:27:00.043491 > mail.pop3 > client.1104: P 767:1220(453) ack 59 win 5840 (DF)
16:27:44.399831 < client.1104 > mail.pop3: P 50:59(9) ack 231 win 8882 (DF)
16:27:44.399869 > mail.pop3 > client.1104: . 1220:1220(0) ack 59 win 5840 (DF)
16:27:44.836304 > mail.pop3 > client.1104: . 231:767(536) ack 59 win 5840 (DF)
16:27:45.295946 < client.1104 > mail.pop3: . 59:59(0) ack 767 win 9112 (DF)
16:27:45.296003 > mail.pop3 > client.1104: P 767:1220(453) ack 59 win 5840 (DF)
16:29:14.886322 > mail.pop3 > client.1104: P 767:1220(453) ack 59 win 5840 (DF)
16:29:15.264417 < client.1104 > mail.pop3: P 59:67(8) ack 1220 win 8659 (DF)
16:29:15.264479 > mail.pop3 > client.1104: . 1220:1220(0) ack 67 win 5840 (DF)
16:29:15.265127 > mail.pop3 > client.1104: . 1220:1756(536) ack 67 win 5840 (DF)
16:29:15.265145 > mail.pop3 > client.1104: . 1756:2292(536) ack 67 win 5840 (DF)
16:30:45.187652 < client.1104 > mail.pop3: P 59:67(8) ack 1220 win 8659 (DF)
16:30:45.187727 > mail.pop3 > client.1104: . 2292:2292(0) ack 67 win 5840 (DF)
16:31:16.326378 > mail.pop3 > client.1104: . 1220:1756(536) ack 67 win 5840 (DF)
16:31:17.513053 < client.1104 > mail.pop3: . 67:67(0) ack 1756 win 9112 (DF)
16:31:17.513129 > mail.pop3 >
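Reading the server side of the trace, the delay before each segment is sent again roughly doubles: about 1.4 s (92:115), 2.8 s (115:137), 5.6 s (137:149), 11.2 s (149:219), 22.4 s (219:231), 44.8 s (231:767), 89.6 s (767:1220) and finally about 121 s (1220:1756). That is the usual exponential backoff of the retransmission timeout (RTO) up to a ceiling, which is why the stalls grow to minutes. A sketch of that arithmetic in C, assuming a 1.4 s starting RTO read off the trace and a 120 s cap (Linux's TCP_RTO_MAX); the helper name is hypothetical:

```c
#include <assert.h>

/* Retransmission timeout in milliseconds after `backoffs` consecutive
 * timeouts: start at base_ms, double on each timeout, never exceed max_ms. */
unsigned long rto_after_backoffs(unsigned long base_ms,
                                 unsigned int backoffs,
                                 unsigned long max_ms)
{
    unsigned long rto = base_ms;

    assert(base_ms > 0 && base_ms <= max_ms);
    for (unsigned int i = 0; i < backoffs; i++) {
        /* Double, but clamp at the cap (this also avoids overflow). */
        rto = (rto > max_ms / 2) ? max_ms : rto * 2;
    }
    return rto;
}
```

With base_ms = 1400 and max_ms = 120000, backoffs 0 through 6 give 1400, 2800, 5600, 11200, 22400, 44800 and 89600 ms, and every later value stays at 120000 ms, matching the gaps in the trace to within timer slop.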
2.4 TCP(?) timeouts
Hello, Today we put 2.4.1 on our mail server after having seen it perform well on some other boxes. Now we are receiving a few calls every hour from customers reporting that the server tends to hang and eventually time out on them when downloading mail. All customers that have reported this problem so far are on a dialup connection. Apparently the server will stop transmitting data (or the client seems to think so), and then their mail client will time out. I noticed that 2.4.1 on my desktop seems to time out SSH connections to servers that have become unreachable in about 10 seconds or so, which is many times faster than 2.2, which used to sit for hours before it timed out (if at all). I'm not sure if this is related. I would expect the client to attempt to retransmit some ACKs and eventually get some RSTs back if this were the case. Has anybody seen similar problems? The box was previously running 2.2.19pre8 and no customers reported such problems. We're using cucipop w/ldap on a dual PIII 800 MHz box with 1.5 GB of RAM. Simon-
Re: LDT allocated for cloned task!
On Tue, Feb 13, 2001 at 06:22:26PM +, Alan Cox wrote: > > LDT allocated for cloned task! > > > > I'm seeing this message come up fairly often while running vanilla > > 2.4.2-pre3 on my dual Celeron system. I don't think I saw it before > > while running 2.4.1, but I may have just missed it. > > Are you running wine or dosemu ? Actually, I've run both of them at least a few times this boot. I think I've found what's doing it: xmms with the avi-xmms plugin causes the message to appear at startup even without playing anything. Moving the plugin's libraries out of the /usr/lib/xmms/Input directory and starting xmms again produces no message. I only just recently downloaded this plugin, which is probably why I didn't see it before. It's also happening on my second (non-DRI) head, so it's probably not related to that (I'll reboot and try again without any DRI modules loaded and see). Simon-
LDT allocated for cloned task!
LDT allocated for cloned task! I'm seeing this message come up fairly often while running vanilla 2.4.2-pre3 on my dual Celeron system. I don't think I saw it before while running 2.4.1, but I may have just missed it. My system has been up around two days and has 11 of these messages in the ring buffer. Actually, I just remembered that I'm using the mga DRI driver module from the DRI CVS tree rather than the built-in module, so that's not part of the official kernel...maybe that is causing the messages. Simon-
ECN
On Fri, Jan 26, 2001 at 07:14:42AM -0800, David S. Miller wrote: > Jamie Lokier writes: > > Does ECN provide perceived benefits to the node using it? > > Yes, endpoints and intermediate routers can tell the TCP sender about > congestion instead of TCP having to guess about it based upon observed > packet drop. > > It is a major enhancement to performance over any WAN. > > The endpoint based congestion notification happens _now_ if both > sides speak ECN. The router based notification will be happening > in the near future as Cisco and others deploy ECN speaking versions of > their router software. Hmm... Just wondering: what does TCP then do when it receives this ECN notification? Try harder, try less? Or does it get a specific packet saying "I dropped your packet", and then the sender retransmits? I suppose I could go find the RFC... Simon-
Re: Subtle MM bug
On Tue, Jan 09, 2001 at 10:47:57AM -0800, Linus Torvalds wrote: > And this _is_ a downside, there's no question about it. There's the worry > about the potential loss of locality, but there's also the fact that you > effectively need a bigger swap partition with 2.4.x - never mind that > large portions of the allocations may never be used. You still need the > disk space for good VM behaviour. > > There are always trade-offs, I think the 2.4.x tradeoff is a good one. Hmm, perhaps you could clarify... For boxes that rarely ever use swap with 2.2, will they now need more swap space on 2.4 to perform well, or is it just boxes which don't have enough RAM to handle everything nicely? I've been tending to make swap partitions smaller lately, as it helps in the case where we have to wait for a runaway process to eat up all of the swap space before it gets killed. Making the swap size smaller shortens the time it takes for this to happen, though that isn't supposed to happen anyway. Simon-
Re: Pentium 4 and 2.4/2.5
On Wed, Nov 08, 2000 at 06:47:40PM +, Alan Cox wrote: > Ok. Issue settled. So 'rep nop' is safe. Ok that can get into the spinlocks > for 2.2.18 Just curious... What does "rep nop" actually accomplish, anyway? Simon-
2.2.17 toasting cache?
Hmm... This seems to be happening every 20 minutes or so on a mail server here. This box handles about 25-35 POP3 logins per second and has 1 GB of RAM (compiled with the kernel at 1GB currently, oops). I have 2.2.18pre15+VM_global on there ready to go, but we haven't rebooted it to that yet. The box runs cucipop and exim and has some staff logins etc., but, for a number of reasons, it doesn't look like any process is eating up the memory and then dumping it. This will probably word wrap for lots of people...sorry.

"vmstat 1":

procs / memory / swap / io / system / cpu
r b w swpd free buff cache si so bi bo in cs us sy id
26 13 1 1260 5216 125304 695560 0 0 18351 829 1473 62 38 0
18 27 1 1260 2172 125304 696940 0 0 31356 910 1545 77 22 0
23 36 1 1260 2312 124692 693468 0 0 41176 970 2362 76 24 0
27 53 2 1260 654044 34656 132704 0 0 1773 1652 3881 15430 43 31 26
(no response for at least 30 seconds here)
8 43 20 39528 857256 19660 17056 388 38332 985 10906 5160 32640 1 65 34
0 51 17 39704 856308 19688 18408 560 352 40888 586 818 4 7 89
0 47 16 39564 854304 19748 21128 420 0 753 0 898 5054 6 9 85
0 45 16 39144 851136 19808 24640 376 0 914 0 1158 12984 7 10 84

As you can see, it decided to throw out around 700 MB of cache. I've been watching top and "vmstat 1" for a while now trying to find out what does it, but no process ever seems to be eating up memory or anything when it happens -- it seems to just free all the memory, and then the box goes very slowly as the RAID array is saturated while it reads back in all of the mailboxes as people log in (417 blocked cucipop processes at one point... ouch :)). It doesn't look like anything is slowly eating up the memory (and cache) and then exiting, because if it were, there would be many more blocked cucipop processes trying to read back in the mail. It also doesn't look like something is quickly eating it up and exiting in a single second, because I can't even do that if I try with an optimized malloc()-and-dirty program.
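For comparison, a "malloc()-and-dirty" program of the kind mentioned above can be sketched as follows; this is a hypothetical helper, not the program used in the report. It writes one byte per page so the kernel has to back every page with real memory, which, once free memory runs out, forces it to reclaim page cache or swap:

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate `bytes` and dirty every page. Returns the number of pages
 * touched, or 0 if the allocation failed. */
size_t malloc_and_dirty(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;
    char *buf;
    volatile char *p;   /* volatile so the stores are not optimized away */

    if (page <= 0)
        page = 4096;    /* fallback assumption if sysconf() fails */
    buf = malloc(bytes);
    if (buf == NULL)
        return 0;
    p = buf;
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        p[off] = 1;     /* the first write to each page faults it in */
        touched++;
    }
    free(buf);
    return touched;
}
```

Running it with a size close to the machine's RAM while watching "vmstat 1" shows how fast a single process can push out cache on a given box.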
It also looks weird that it kicks out some stuff to swap _after_ all of the memory becomes free. This is a dual PIII 700 MHz box, the 2.2.17 kernel has no funky patches other than one to raise the maximum number of simultaneous processes/threads (as you can probably guess). Hmm...it'd be interesting to try 2.4 on there. ;) Any ideas? Simon-
Re: eepro100: card reports no resources [was VM-global...]
On Mon, Oct 30, 2000 at 02:23:56PM +0800, Andrey Savochkin wrote: > > > > Oct 26 16:38:01 ns29 kernel: eth0: card reports no resources. > > > > > > > let me guess: intel eepro100 or similar?? > > > Well known problem with that one. dont know if its fully fixed ... With > > > > Happens here too, with 2xPPro200, 2.2.18pre17, Eepro100 and light load. > > The network stalls for several minutes when it happens. > > > > > 2.4.0-test9-pre3 it doesnt happen on my machine ... > > > > What about a fix for a 2.2.x...? > > The exact reason for this problem is still unknown. We were seeing this on a firewall a week or so ago -- it was actually coming from some sort of arp flood/loop on the uplink that we weren't causing, and the rate of the incoming arp packets would cause these messages to occur. We tried ifconfig up/down, warm reboot, cold reboot, power cycle, card swapping, and the messages continued. We replaced the card with a 3c905 and the messages stopped, but "ifconfig" showed Rx overruns at about the same frequency as the messages used to occur. This is probably a different way of triggering this error than what most people are seeing. Simon-
Re: kqueue microbenchmark results
On Wed, Oct 25, 2000 at 12:23:07PM -0500, Jonathan Lemon wrote: > Consider a program which reads from point A, writes to point B. If > the buffer associated with B fills up, then we don't want to continue > reading from A. > > A/B may be network sockets, pipes, or ptys. Fine, but we can bind the event watching to the device or socket or pipe that will clog up, right? In which case, we'll later get a write event (just like with select()), and then once there is some progress you can go back to read()ing from the original descriptor. This is even easier than using select() because you don't have to take the descriptor out of the read set and put it in the write set temporarily -- it will automatically work that way. > Or perhaps you receive a request to use a resource that is currently > busy. Does your application want to postpone the request, or read the > data immediately, even if the request can't be serviced yet? Assuming this "resource" has a way of waking up the process when it unclogs, then you can go back and read the remaining data later, which is what you would want to do anyway. > My point is that I can easily think of several examples as to where > this behavior may be beneficial to the application, and I use some of > them myself. You can indeed get the same result by forcing each and > every application that wants this behavior to implement their own > tracking mechanism, but this strikes me as error-prone and places an > undue burden on the application programmer. I can see that you could write it this way... I'm just trying to see if it's really needed. :) As I wrote in my last email to Jamie, you would need to implement a tracking mechanism in any case to avoid DoS attacks from clients or a case where a single client can clog up the reading from any other client. And you'd need to take the descriptor out of the read() set in the select() case anyway, so I don't really see what's different. 
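For comparison, the select() bookkeeping described above (taking a descriptor out of the read set and watching the output for writability while data is queued) might look like this minimal sketch; the `struct relay` type and helper are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Per-connection state for a relay that reads from in_fd and writes to out_fd. */
struct relay {
    int    in_fd;
    int    out_fd;
    size_t pending;   /* bytes read but not yet written */
};

/* Add this relay's descriptors to the select() sets: while output is
 * clogged, stop reading and wait for write space instead. */
void relay_fill_sets(const struct relay *r, fd_set *rd, fd_set *wr, int *maxfd)
{
    assert(r->in_fd >= 0 && r->in_fd < FD_SETSIZE);
    assert(r->out_fd >= 0 && r->out_fd < FD_SETSIZE);
    if (r->pending > 0) {
        FD_SET(r->out_fd, wr);
        if (r->out_fd > *maxfd)
            *maxfd = r->out_fd;
    } else {
        FD_SET(r->in_fd, rd);
        if (r->in_fd > *maxfd)
            *maxfd = r->in_fd;
    }
}
```

The caller would FD_ZERO both sets, call this for every connection, and pass maxfd + 1 to select(); with per-descriptor event registration, the same toggling falls out of simply asking for the write event when a write comes up short.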
> You can find my paper at http://people.freebsd.org/~jlemon

I'll go and read it now. :)

Simon-
Re: kqueue microbenchmark results
On Wed, Oct 25, 2000 at 07:08:48PM +0200, Jamie Lokier wrote:
> Simon Kirby wrote:
> > What applications would do better by postponing some of the reading?
> > I can't think of any reason off the top of my head why an application
> > wouldn't want to read everything it can.
> 
> Pipelined server.
> 
>   1. Wait for event.
>   2. Read block
>   3. If EAGAIN, goto 1.
>   4. If next request in block is incomplete, goto 2.
>   5. Process next request in block.
>   6. Write response.
>   7. If EAGAIN, wait until output is ready for writing then goto 6.
>   8. Goto 1 or 2, your choice.
>      (Here I'd go to 2 if the last read was complete -- it avoids a
>      redundant call to poll()).
> 
> If you simply read everything you can at step 2, you'll run out of
> memory the moment someone sends you 10 requests.
> 
> This doesn't happen if you leave unread data in kernel space --
> TCP windows and all that.

Hmm, I don't understand. What happens at "wait until output is ready for writing then goto 6"? Do you mean you would stop the main loop to wait for a single client to unclog? Wouldn't you just do this?

->  1. Wait for event (read and write queued). Event occurs: incoming data available.
    2. Read a block.
    3. Process the block just read: does it contain a full request? If not, queue it, goto 2, and munge the pieces together. If there is no more data, queue the beginning of the request, if any, and goto 1.
    4. Walk over the available requests in the block just read. Process them.
    5. Attempt to write the response, if any.
    6. Attempted write: did it all get out? If not, queue the waiting writable data and goto 1 to wait for a write event.
    7. Goto 2.

Assume we got write-clogged. Some loop later:

   10. Wait for event (read and write queued). Event occurs: write space available.
   11. Write the remaining queued data.
   12. Attempted write: did it all get out? If not, queue the remaining writable data and goto 1 to wait for another write event.
   13. Goto 2.
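Steps 5-6 and 10-12 above amount to a per-client output queue. A rough sketch of just that part follows, assuming non-blocking descriptors and a poll()-based loop; struct conn, the 8 KB limit, and all function names are hypothetical, and the read side (steps 2-4) is left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-client state: response bytes the socket has not
 * accepted yet. */
struct conn {
    int fd;
    char out[8192];
    size_t out_len;
};

int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Steps 6 and 11-12: write what we can and keep the rest queued.
 * Returns 0 on success (even if data remains queued), -1 on a hard error. */
int conn_flush(struct conn *c)
{
    size_t off = 0;

    while (off < c->out_len) {
        ssize_t n = write(c->fd, c->out + off, c->out_len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                  /* clogged: wait for POLLOUT */
            return -1;
        }
        off += (size_t)n;
    }
    memmove(c->out, c->out + off, c->out_len - off);
    c->out_len -= off;
    return 0;
}

/* Step 5: queue a response and try to send it right away. Returns -1
 * when the queue is full, so the caller can drop a client that never
 * reads instead of letting it stall the whole daemon. */
int conn_send(struct conn *c, const char *data, size_t len)
{
    if (len > sizeof c->out - c->out_len)
        return -1;
    memcpy(c->out + c->out_len, data, len);
    c->out_len += len;
    return conn_flush(c);
}

/* Steps 1 and 10: always wait for input; also wait for write space,
 * but only while something is queued. */
short conn_events(const struct conn *c)
{
    return c->out_len > 0 ? (POLLIN | POLLOUT) : POLLIN;
}
```

conn_events() gives the mask to put in the client's struct pollfd on each pass through the loop, so a clogged client only adds a write watch and never blocks the main loop.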
(If we're some sort of forwarding daemon and the receiving end of our forward has just unclogged, we want to read any readable data we had left waiting. The same applies if we're just answering requests, since the send direction could still get clogged.)

What can't you do here? What's wrong? Note that the write event will let you read any remaining queued data. If you actually avoid going back to the main loop while you're write-clogged, you will stall the whole daemon and create an easy DoS problem. There's no way around needing to queue writable data, at least.

This is how I wrote my IRC daemon a while back, and it works fine with select(). I can't see what wouldn't work with edge-triggered events, except perhaps the write event; I'm not sure what would be considered "triggered" there, perhaps when it goes under a watermark or something. In any case, it should all still work, assuming get_events() offers the ability to receive "write space available" events.

You don't have to read all the data if you don't want to, assuming you will get another event later that will unclog the situation (meaning the obstacle must also trigger an event when it is cleared).

In fact, if you left the read pending in a daemon using select(), you'd keep looping endlessly, taking all the CPU and never idling, because there would always be readable data available. You'd have to leave the descriptor out of the read set and put it in the write set instead, so that you can sleep waiting for it to become writable, effectively ignoring any further events on the read set until the write unclogs. This sounds just like what would happen if you only got one notification (edge-triggered) in the first place.

Simon-
Re: kqueue microbenchmark results
On Wed, Oct 25, 2000 at 01:02:46AM -0500, Jonathan Lemon wrote:
> Yes, someone pointed me to those today. I would suggest reading
> some of the relevant literature before embarking on a design. My
> paper discusses some of the issues, and Mogul/Banga make some good
> points too.
> 
> While an 'edge-trigger' design is indeed simpler, I feel that it
> ends up making the job of the application harder. A simple example
> to illustrate the point: what if the application does not choose
> to read all the data from an incoming packet? The app now has to
> implement its own state mechanism to remember that there may be pending
> data in the buffer, since it will not get another event notification
> unless another packet arrives.

What applications would do better by postponing some of the reading? I can't think of any reason off the top of my head why an application wouldn't want to read everything it can. Doing everything in smaller chunks would increase overhead (though it might reduce latency very slightly, probably not by much when using a get_events()-style interface).

Isn't it better to keep the kernel implementation as efficient as possible, so that the majority of applications, which will read (and write) all the data they can, can do so as efficiently as possible? Queueing up the events, even in the same form as they are received from the kernel, is pretty simple for a userspace program to do, and I think that's the best place for it.

I know nothing about any other implementations, though, and I'm speaking mainly from my experience writing daemons using select().

You mention you wrote a paper discussing this issue... Where can I find it?

Simon-
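For the common case described above (read everything available, then go back to waiting), a drain loop might look like the following. It is only an illustration, not code from the thread: drain_fd() is a hypothetical name, and the caller is assumed to have put the descriptor in O_NONBLOCK mode.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read everything currently available on fd into buf (at most cap bytes).
 * Stops at EAGAIN (nothing more for now), at EOF, or when buf is full.
 * Returns the number of bytes read, or -1 on a hard error; *eof is set
 * to 1 when the peer has closed its end. */
ssize_t drain_fd(int fd, char *buf, size_t cap, int *eof)
{
    size_t total = 0;

    *eof = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n > 0) {
            total += (size_t)n;
        } else if (n == 0) {
            *eof = 1;
            break;
        } else if (errno == EINTR) {
            continue;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break;
        } else {
            return -1;
        }
    }
    return (ssize_t)total;
}
```

If drain_fd() returns a full buffer (total == cap) with *eof still 0, more data may be waiting in the kernel; with a one-shot, edge-style notification no new event will arrive for it, which is the case Jonathan's state-tracking argument is about.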
Re: Linux's implementation of poll() not scalable?
On Tue, Oct 24, 2000 at 04:12:38PM -0700, Dan Kegel wrote:
> With poll(), it was *not a bug* for the user code to drop events; with
> your proposed interface, it *is a bug* for the user code to drop events.
> I'm just emphasizing this because Simon Kirby ([EMAIL PROTECTED]) posted
> incorrectly that your interface "has the same semantics as poll from
> the event perspective".

I missed this because I've never written anything that drops or forgets events, and I didn't think about it. With non-blocking sockets, most programs will read() until EOF or EAGAIN is returned and write() until EAGAIN is returned. Is there any reason to ignore events other than to slow down the response to some events in favor of others?

I don't see why this is a problem, as this interface _isn't_ replacing select or poll, so it shouldn't matter for existing programs that aren't converted to use the new interface.

In any case, I think I would prefer that the kernel be optimized for the common case and leave any strange processing up to userspace, so that the majority of programs, which don't need this special case, can run as fast as possible. Besides, it wouldn't be difficult for a program to stack up a list of events, even in the same structure as it would get from the kernel, so that it can process them later. At least then this data would be in swappable memory. Heck, even from an efficiency perspective, it would be faster for userspace to store the data, as it wouldn't keep getting it returned from a syscall each time...

Am I missing something else?

Simon-
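Stacking up events in userspace for later processing can be illustrated with a small fixed-size FIFO. The sketch is hypothetical: struct pending_event is modeled loosely on the struct event proposed in this thread (an fd plus a bitmask of active events), and the 1024-entry capacity is arbitrary.

```c
#include <stddef.h>

/* Hypothetical userspace copy of an event record: which fd, which bits. */
struct pending_event {
    int fd;
    unsigned events;
};

#define PENDING_MAX 1024

/* Fixed-size FIFO of events the application chose to handle later. */
struct pending_queue {
    struct pending_event ev[PENDING_MAX];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of queued entries */
};

/* Append an event; returns 0 on success, -1 if the queue is full. */
int pending_push(struct pending_queue *q, int fd, unsigned events)
{
    if (q->count == PENDING_MAX)
        return -1;
    q->ev[(q->head + q->count) % PENDING_MAX] = (struct pending_event){ fd, events };
    q->count++;
    return 0;
}

/* Remove the oldest event into *out; returns 0, or -1 if empty. */
int pending_pop(struct pending_queue *q, struct pending_event *out)
{
    if (q->count == 0)
        return -1;
    *out = q->ev[q->head];
    q->head = (q->head + 1) % PENDING_MAX;
    q->count--;
    return 0;
}
```

A full queue is where an application would have to decide what to do with events it cannot hold, which is the "dropping events" case Dan raises.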
Re: Linux's implementation of poll() not scalable?
On Tue, Oct 24, 2000 at 10:03:04AM -0700, Linus Torvalds wrote:
> Basically, with get_events(), there is a maximum of one event per "bind".
> And the memory for that is statically allocated at bind_event() time.
> ...
> But you'd be doing so in a controlled manner: the memory use wouldn't go
> up just because there is a sudden influx of 5 packets. So it scales
> with load by virtue of simply not _caring_ about the load - it only cares
> about the number of fd's you're waiting on.

Nice. I like this. It would be easy for existing userspace code to start using this interface, as it has the same semantics as select/poll from the event perspective. But it would make things even easier, as the bind would follow the life of the descriptor and thus wouldn't need to be "requeued" before every get_events call, so that part of userspace code could just be ripped out^W^W disabled and kept only for portability.

In most of the daemons I have written, I've ended up using memcpy() to keep a non-scribbled-over copy of the fdsets around, so I don't have to walk data structures and requeue fds on every loop for select()... nasty.

Simon-
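The memcpy() trick mentioned above is roughly the following. wait_readable() is a hypothetical helper name; since fd_set is a structure type, plain struct assignment would work just as well as memcpy().

```c
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* select() overwrites the sets it is given, so keep a long-lived master
 * set (updated only when fds are added or removed) and hand select() a
 * scratch copy on every pass through the loop. */
int wait_readable(const fd_set *master, int maxfd, fd_set *ready,
                  struct timeval *timeout)
{
    memcpy(ready, master, sizeof *ready);   /* or: *ready = *master; */
    return select(maxfd + 1, ready, NULL, NULL, timeout);
}
```

The master set survives a timeout (when select() clears every bit in its argument) untouched, so nothing has to be requeued before the next call.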
Re: Linux's implementation of poll() not scalable?
On Mon, Oct 23, 2000 at 10:39:36PM -0700, Linus Torvalds wrote:
> Actually, forget the mmap, it's not needed.
> 
> Here's a suggested "good" interface that would certainly be easy to
> implement, and very easy to use, with none of the scalability issues that
> many interfaces have.
> ...
> Basically, the perfect interface for events would be
> 
> 	struct event {
> 		unsigned long id;	/* file descriptor ID the event is on */
> 		unsigned long event;	/* bitmask of active events */
> 	};
> 
> 	int get_events(struct event * event_array, int maxnr, struct timeval *tmout);

I like. :)

However, isn't there already something like this, albeit maybe without the ability to return multiple events at a time? When discussing select/poll on IRC a while ago with sct, sct said:

  Simon: You just put your sockets into O_NONBLOCK|FASYNC mode for SIGIO as usual.
  Simon: Then fcntl(fd, F_SETSIG, rtsignum)
  Simon: And you'll get a signal queue which passes you the fd of each SIGIO in turn.
  sct: easy :)
  Simon: You don't even need the overhead of a signal handler: instead of select(), you just do "sigwaitinfo(&siginfo, timeout)" and it will do a select-style IO wait, returning the fd in the siginfo when it's available.

(Captured from IRC on Nov 12th, 1998.)

Or does this method still have the overhead of encapsulating the events into signals within the kernel?

Also, what is different about your interface above that prevents it from queueing up too many events? I guess the structure is only sizeof(int) * 2 bytes per fd, so it would only take, say, 160kB for 20,000 FDs on x86, but I don't see how the other method would be significantly different. The kernel would still have to store the queued events, surely...

Simon-
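sct's recipe can be sketched as follows. This is an illustration under assumptions, not code from the thread: it is Linux-specific (F_SETSIG needs _GNU_SOURCE), the helper names are hypothetical, and it uses sigtimedwait() because sigwaitinfo() itself takes no timeout argument.

```c
#define _GNU_SOURCE                     /* F_SETSIG is Linux-specific */
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Ask the kernel to queue real-time signal `sig`, carrying the fd in
 * si_fd, whenever fd becomes ready, instead of sending plain SIGIO. */
int rtsig_watch(int fd, int sig)
{
    sigset_t set;
    int flags;

    /* Block the signal so it stays queued for sigtimedwait() instead of
     * being delivered (its default action would kill the process). */
    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0 || fcntl(fd, F_SETSIG, sig) < 0)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK | O_ASYNC);
}

/* Wait up to timeout_ms for the next queued readiness signal and return
 * the fd it refers to, or -1 on timeout or error. If the real-time
 * signal queue overflows, the kernel falls back to plain SIGIO, and a
 * real program must then rescan all fds with poll()/select(). */
int rtsig_next_fd(int sig, long timeout_ms)
{
    sigset_t set;
    siginfo_t info;
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigtimedwait(&set, &info, &ts) < 0)
        return -1;
    return info.si_fd;
}
```

The kernel's real-time signal queue is bounded, which is why the overflow fallback described in the comment has to exist; it is one concrete answer to where the queued events are stored.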
Re: [2.4.0-test9-pre5] SCSI still broken, trident/mixer still broken
On Thu, Sep 21, 2000 at 09:39:07PM +0200, Torben Mathiasen wrote:
> Ok, small patch cooked up. Not tested, not compiled. Give
> it a try, and if it works please send it off to Linus.
> I really need to get some work done on a project...

This worked, thanks. :)

Simon-

> diff -ur --exclude-from=/root/torben /opt/kernel/kernels/linux/drivers/scsi/sg.c linux/drivers/scsi/sg.c
> --- /opt/kernel/kernels/linux/drivers/scsi/sg.c	Thu Sep 21 21:29:44 2000
> +++ linux/drivers/scsi/sg.c	Thu Sep 21 21:35:46 2000
> @@ -1298,18 +1298,18 @@
>  }
>  
>  #ifdef MODULE
> -
>  MODULE_PARM(def_reserved_size, "i");
>  MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd");
> +#endif
>  
> -int init_module(void) {
> +static int __init init_sg(void) {
>  	if (def_reserved_size >= 0)
>  		sg_big_buff = def_reserved_size;
>  	sg_template.module = THIS_MODULE;
>  	return scsi_register_module(MODULE_SCSI_DEV, &sg_template);
>  }
>  
> -void cleanup_module( void)
> +static void __exit exit_sg( void)
>  {
>  #ifdef CONFIG_PROC_FS
>  	sg_proc_cleanup();
> @@ -1324,7 +1324,9 @@
>  	}
>  	sg_template.dev_max = 0;
>  }
> -#endif /* MODULE */
> +
> +module_init(init_sg);
> +module_exit(exit_sg);
>  
>  
>  #if 0
Re: [2.4.0-test9-pre5] SCSI still broken, trident/mixer still broken
On Thu, Sep 21, 2000 at 02:34:01PM -0400, Douglas Gilbert wrote:
> I do nearly all of my testing with sg as a module.
> So this looks like (another recent) breakage.
> 
> It is beginning to look like the sg driver is not
> (properly) initialized when it is built into the
> kernel. Perhaps you could put a printk in
> sg_init() and sg_attach() to see if they are called.

Actually, I also had a printk in sg_init() and it never got printed. I didn't have one in sg_attach(), but I can try that.

> > At one point before I followed some of the debug/logging commands listed
> > at the top of sg.c and got an Oops as well...
> 
> Seems as though I've got a lot of retesting to do.

The oops may have been the result of it not being properly initialized or something...

Simon-
Re: [2.4.0-test9-pre5] SCSI still broken, trident/mixer still broken
On Thu, Sep 21, 2000 at 01:12:27PM -0400, Douglas Gilbert wrote:
> Interesting. 'cat /proc/scsi/scsi' should show the same
> devices as 'cat /proc/scsi/sg/device_strs' [and
> 'cat /proc/scsi/sg/devices']. If not, then the SCSI
> mid-level is not calling sg_detect() [in sg.c] for
> all new scsi devices detected by the mid-level.
> 
> The sg_detect() routine is silent for all devices that
> are "owned" by other upper level drivers (i.e. disks,
> cdroms and tapes) but outputs a line for any other
> scsi type (e.g. scanners which are scsi type 6).

I didn't fiddle with it too much, but I added a printk to sg_detect() and verified it was not getting called at all. I notice now, however, that I don't even have a /proc/scsi/sg. Does that mean it's not getting initialized at all? CONFIG_CHR_DEV_SG=y, assuming that's what needs to be set (the config didn't change between kernel versions).

At one point before I followed some of the debug/logging commands listed at the top of sg.c and got an Oops as well...

Simon-
[2.4.0-test9-pre5] SCSI still broken, trident/mixer still broken
Hi n' stuff,

Around 2.4.0-test9-pre2 (or so, definitely in pre3), both my SCSI scanner and my trident sound card stopped being happy. Both are still broken in pre5. On test8, both work perfectly.

On test8:

(scsi0:6:0) Synchronous Data Transfer Request was rejected
  Vendor:          Model: Scanner           Rev: 1.70
  Type:   Scanner                            ANSI SCSI revision: 04
Detected scsi generic sg0 at scsi0, channel 0, id 6, lun 0, type 6
(scsi1:0:3:0) Synchronous at 8.0 Mbyte/sec, offset 31.
  Vendor: YAMAHA    Model: CRW4416S          Rev: 1.0e
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi1, channel 0, id 3, lun 0
scsi : detected 1 SCSI cdrom total.
sr0: scsi3-mmc drive: 16x/16x writer cd/rw xa/form2 cdda tray
...

On test9pre5 and test9pre3:

(scsi0:6:0) Synchronous Data Transfer Request was rejected
  Vendor:          Model: Scanner           Rev: 1.70
  Type:   Scanner                            ANSI SCSI revision: 04
(scsi0:0:3:0) Synchronous at 8.0 Mbyte/sec, offset 31.
  Vendor: YAMAHA    Model: CRW4416S          Rev: 1.0e
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 3, lun 0
sr0: scsi3-mmc drive: 16x/16x writer cd/rw xa/form2 cdda tray

("Detected scsi generic..." line missing.)

The trident driver appears to be working, but the mixer (ac97_codec?) seems to keep everything muted at all times, even though programs apparently let me adjust the levels. Turning the volume all the way up on my receiver lets me hear some very faint sound leaking through, which sounds like a mixer problem rather than a playback problem. An ALSA CVS snapshot works fine.

Simon-
2.4.0-test9-pre3 boots, 2.4.0-test9-pre4 doesn't (SCSI)
Hello,

2.4.0-test9-pre3 seems to boot and work fine, but 2.4.0-test9-pre4 with the same .config doesn't. It stops here:

agpgart: AGP aperture is 64M @ 0xe400
aha152x: processing commandline: ok
aha152x: BIOS test: passed, detected 1 controller(s)
aha152x: resetting bus...
aha152x0: vital data: rev=1, io=0x140 (0x140/0x140), irq=9, scsiid=7, reconnect=enabled, parity=enabled, synchronous=enabled, delay=100, extended translation=disabled
aha152x0: trying software interrupt, ok.
scsi0 : Adaptec 152x SCSI driver; $Revision: 2.0 $
scsi : 1 host.

(Nothing more.)

Pressing SysRq-P always gives me the same EIP, c01088ed. System.map:

c01088c0 t default_idle
c01088f4 t poll_idle

This is a dual-CPU machine. Both aha152x and aic7xxx are compiled in, but I have compiled aha152x in only since test9-pre2, because it seemed to break when used as a module then (it would loop endlessly, detecting my scanner over and over again; it got up to sg50something and I rebooted). Perhaps something else is broken that's just showing up differently now, as the test9-pre3 to test9-pre4 diff is pretty small and I don't see anything obviously broken.

On test9-pre3, the next lines are:

(scsi1) found at PCI 0/6/0
(scsi1) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi1) Downloading sequencer code... 392 instructions downloaded
...etc.

.config.gz attached.

Simon-
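Mapping an EIP such as c01088ed back to a function amounts to finding the last System.map entry at or below it. A minimal sketch follows; struct ksym and the helper names are hypothetical, and the entries are assumed to be sorted by address, as System.map is.

```c
#include <stdio.h>
#include <string.h>

/* One System.map entry, e.g. "c01088c0 t default_idle". */
struct ksym {
    unsigned long addr;
    char name[64];
};

/* Parse one System.map line; returns 0 on success, -1 otherwise. */
int ksym_parse(const char *line, struct ksym *out)
{
    char type;
    return sscanf(line, "%lx %c %63s", &out->addr, &type, out->name) == 3 ? 0 : -1;
}

/* Binary search over entries sorted by address: return the index of the
 * last symbol whose address is <= addr, or -1 if addr is below them all. */
int ksym_lookup(const struct ksym *syms, int n, unsigned long addr)
{
    int lo = 0, hi = n - 1, found = -1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr) {
            found = mid;
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return found;
}
```

With the two entries quoted above, ksym_lookup() picks default_idle, at offset 0xc01088ed - 0xc01088c0 = 0x2d, which suggests the CPU that handled SysRq-P was sitting in the idle loop rather than spinning inside the SCSI code.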
Re: Still ext2-corruption in test8-pre5 (incl. OOPS)
On Wed, Sep 06, 2000 at 02:55:29AM +0200, Udo A. Steinberg wrote:
> >>EIP; c0130400 <__block_commit_write+50/c0> <=

Just got the same Oops with test8-pre5 while exiting mutt:

Writing /var/spool/mail/sim...Unable to handle kernel NULL pointer dereference at virtual address 0018
 printing eip:
c0130583
*pde =
Oops:
CPU:    0
EIP:    0010:[<c0130583>]
EFLAGS: 00010293
eax:    ebx:    ecx: 0800   edx:
esi: 0800   edi: 0001   ebp:    esp: ceb19e40
ds: 0018   es: 0018   ss: 0018
Process mutt (pid: 2153, stackpage=ceb19000)
Stack: c1382a80 ce0ab000 0649 0800 c0130b52 ceb640a0 c1382a80 09b7 1000 0dea 000b ceb640a0 ceb640a0 09b7 c014d31e ceb6413c 006f49b7 0649 ceb640a0 ceb6413c
Call Trace: [<c0130b52>] [<c014d31e>] [<c012336d>] [<c012156d>] [<c012168d>] [<c014189e>] [<c014198e>] [<c012c71a>] [<c0124608>] [<c012c963>] [<c010a65f>]
Code: 8b 43 18 83 e0 01 0f 44 ef eb 35 89 f6 f6 43 18 10 74 2d f0

>>EIP; c0130583 <__block_commit_write+43/c0>   <=
Trace; c0130b52
Trace; c014d31e
Trace; c012336d
Trace; c012156d
Trace; c012168d
Trace; c014189e
Trace; c014198e
Trace; c012c71a
Trace; c0124608
Trace; c012c963
Trace; c010a65f
Code;  c0130583 <__block_commit_write+43/c0>
<_EIP>:
Code;  c0130583 <__block_commit_write+43/c0>   <=
   0:   8b 43 18                  mov    0x18(%ebx),%eax   <=
Code;  c0130586 <__block_commit_write+46/c0>
   3:   83 e0 01                  and    $0x1,%eax
Code;  c0130589 <__block_commit_write+49/c0>
   6:   0f 44 ef                  cmove  %edi,%ebp
Code;  c013058c <__block_commit_write+4c/c0>
   9:   eb 35                     jmp    40 <_EIP+0x40> c01305c3 <__block_commit_write+83/c0>
Code;  c013058e <__block_commit_write+4e/c0>
   b:   89 f6                     mov    %esi,%esi
Code;  c0130590 <__block_commit_write+50/c0>
   d:   f6 43 18 10               testb  $0x10,0x18(%ebx)
Code;  c0130594 <__block_commit_write+54/c0>
  11:   74 2d                     je     40 <_EIP+0x40> c01305c3 <__block_commit_write+83/c0>
Code;  c0130596 <__block_commit_write+56/c0>
  13:   f0 00 00                  lock add %al,(%eax)

Simon-
[Danger] Re: test8-pre4: innd fixed?
On Mon, Sep 04, 2000 at 09:09:43PM -0700, Linus Torvalds wrote:
> On Mon, 4 Sep 2000, Mohammad A. Haque wrote:
> >
> > Is this file corruption 'thing' specific to innd or is it the same
> > problem reported with corrupt mailboxes with pre2 and high disk
> > activity?
> 
> The mailbox corruption thread is at least partly due to a pine bug that is
> triggered by a bugtraq posting.
> 
> The truncate issue is unrelated to that, but may certainly show up on
> mailboxes too.

Something is now definitely even more broken in test8pre4. I just upgraded to test8pre4 from test7 and was reading this and some other emails with mutt. Upon quitting, mutt reported some sort of error while attempting to write the folder. My folder now looks like this:

<1073152 bytes of the start of original folder>
<67045376 bytes of NULL (0x00)>
<51704 bytes of the end of the original folder>

Obviously, the folder was in need of some pruning to begin with, but this pruned a bit more than I would have liked. I'm not exactly sure how this happened, but it definitely didn't happen before with test7.

Simon-
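A layout like the one above (a long run of NUL bytes between two intact pieces of the folder) can be measured by scanning the file for its longest zero run. The sketch is hypothetical (longest_nul_run() is not an existing tool) and simply reads the stream byte by byte through stdio.

```c
#include <stdio.h>

/* Scan a stream for its longest run of NUL bytes. Stores the run's
 * starting offset and length (both 0 if there is none); returns 0 on
 * success or -1 on a read error. */
int longest_nul_run(FILE *f, long long *start, long long *len)
{
    long long off = 0, cur_start = 0, cur_len = 0;
    int c;

    *start = 0;
    *len = 0;
    while ((c = getc(f)) != EOF) {
        if (c == 0) {
            if (cur_len == 0)
                cur_start = off;        /* a new run begins here */
            cur_len++;
            if (cur_len > *len) {
                *len = cur_len;
                *start = cur_start;
            }
        } else {
            cur_len = 0;
        }
        off++;
    }
    return ferror(f) ? -1 : 0;
}
```

Run against the damaged folder, it would be expected to report the NUL region starting at offset 1073152 with a length of 67045376, since ordinary mbox text contains no long NUL runs of its own.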