Re: kswapd craziness in 3.7
Note that in the meantime, I've also applied (through Andrew) the
patch that reverts commit c654345924f7 (see commit 82b212f40059
'Revert "mm: remove __GFP_NO_KSWAPD"').

I wonder if that revert may be bogus, and a result of this same issue.
Maybe that revert should be reverted, and replaced with your patch?

Mel? Zdenek? What's the status here?

                Linus

On Tue, Nov 27, 2012 at 12:48 PM, Johannes Weiner wrote:
> Hi everyone,
>
> I hope I included everybody that participated in the various threads
> on kswapd getting stuck / exhibiting high CPU usage. We were looking
> at at least three root causes as far as I can see, so it's not really
> clear who observed which problem. Please correct me if the
> reported-by, tested-by, bisected-by tags are incomplete.
>
> One problem was, as it seems, overly aggressive reclaim due to scaling
> up reclaim goals based on compaction failures. This one was reverted
> in 9671009 mm: revert "mm: vmscan: scale number of pages reclaimed by
> reclaim/compaction based on failures".
>
> Another one was an accounting problem where a freed higher order page
> was underreported, and so kswapd had trouble restoring watermarks.
> This one was fixed in ef6c5be fix incorrect NR_FREE_PAGES accounting
> (appears like memory leak).
>
> The third one is a problem with small zones, like the DMA zone, where
> the high watermark is lower than the low watermark plus compaction gap
> (2 * allocation size). The zonelist reclaim in kswapd would do
> nothing because all high watermarks are met, but the compaction logic
> would find its own requirements unmet and loop over the zones again.
> Indefinitely, until some third party would free enough memory to help
> meet the higher compaction watermark. The problematic code has been
> there since the 3.4 merge window for non-THP higher order allocations
> but has been more prominent since the 3.7 merge window, where kswapd
> is also woken up for the much more common THP allocations.
>
> The following patch should fix the third issue by making both reclaim
> and compaction code in kswapd use the same predicate to determine
> whether a zone is balanced or not.
>
> Hopefully, the sum of all three fixes should tame kswapd enough for
> 3.7.
>
> Johannes
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
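The third root cause described above can be illustrated with a toy model. This is only a sketch, not the kernel code: the struct, the function names and the exact comparisons are assumptions made for illustration. It shows how a small zone can pass the reclaim check (free pages at or above the high watermark) while failing the compaction check (low watermark plus a gap of 2 * allocation size), which is the disagreement that kept kswapd looping.

```c
#include <stdbool.h>

/* Hypothetical, simplified zone; all values are in pages. */
struct toy_zone {
    unsigned long free_pages;
    unsigned long low_wmark;
    unsigned long high_wmark;
};

/* Reclaim's view: nothing to do once free pages reach the high watermark. */
static bool reclaim_satisfied(const struct toy_zone *z)
{
    return z->free_pages >= z->high_wmark;
}

/* Compaction's view: low watermark plus a gap of 2 * allocation size. */
static bool compaction_satisfied(const struct toy_zone *z, unsigned int order)
{
    unsigned long gap = 2UL << order;   /* 2 * (1 << order) pages */

    return z->free_pages >= z->low_wmark + gap;
}

/*
 * True when the two checks disagree in the problematic direction:
 * reclaim sees a balanced zone, compaction still wants more free pages.
 */
static bool predicates_disagree(const struct toy_zone *z, unsigned int order)
{
    return reclaim_satisfied(z) && !compaction_satisfied(z, order);
}
```

With made-up numbers for a tiny zone (100 free pages, low watermark 80, high watermark 96) and an order-9 request, the gap alone is 1024 pages, so the predicates disagree. Using one shared predicate for both decisions, as the mail proposes, removes that state by construction.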
Re: kswapd craziness in 3.7
On Tue, Nov 27, 2012 at 2:26 PM, Johannes Weiner wrote:
> On Tue, Nov 27, 2012 at 05:02:36PM -0500, Rik van Riel wrote:
>>
>> Kswapd going crazy is certainly a large part of the problem.
>>
>> However, that leaves the issue of page_alloc.c waking up
>> kswapd when the system is not actually low on memory.
>>
>> Instead, kswapd is woken up because memory compaction failed,
>> potentially even due to lock contention during compaction!
>>
>> Ideally the allocation code would only wake up kswapd if
>> memory needs to be freed, or in order for kswapd to do
>> memory compaction (so the allocator does not have to).
>
> Maybe I missed something, but shouldn't this be solved with my patch?

Ok, guys. Cage fight!

The rules are simple: two men enter, one man leaves.

And the one who comes out gets to explain to me which patch(es) I
should apply, and which I should revert, if any.

My current guess is that I should apply the one Johannes just sent
("mm: vmscan: fix kswapd endless loop on higher order allocation")
after having added the cc to stable to it, and then revert the recent
revert (commit 82b212f40059).

But I await the Thunderdome.

                Linus
Re: linux-next: unusual update of the security tree
On Tue, Nov 27, 2012 at 3:28 PM, Stephen Rothwell wrote:
>
> If that is what happened, it may be worth always using the --no-ff flag
> to git merge/pull to make sure that the top commit on your tree always
> has you as the committer (and maybe SOB).
>
> Linus, does that make sense in general for maintainers?

No. That just hides the real problem - back-merges of random points in
history.

Don't do them, people. EVER.

                Linus
Re: Acpi deadlocks with 3.7.0-rc4
Adding more people (and the acpi list) to this report.

I'm seeing *very* few changes to the core suspend/resume path in 3.7,
and while there are some acpica updates, they seem to be pretty mild
too.

I think the acpi_os_wait_semaphore thing is a red herring - that's
just stale on the stack.

Do you have the register state from the oops? Or at least the "Code:"
line? It would be nice to see exactly where the oops happens, and I
cannot line up your "acpi_ns_lookup + 0xa1/0x5b9" with any code due to
different compilers (and configurations etc).

                Linus

On Thu, Nov 15, 2012 at 8:09 AM, Zdenek Kabelac wrote:
> Hello
>
>
> I've already seen this oops twice after resuming my Lenovo T61 in
> docking station.
>
> Since for some reason currently the serial line doesn't work correctly
> after resume (while I'm pretty sure it used to work in past) here is
> at least hand-written oops message from mobile camera picture.
>
> From the trace it seems os_wait semaphore is accessed twice.
> Unsure which device is behind it - but it seems docking station is
> needed to hit this issue.
>
>
> kernel 3.7.0-rc4
>
> Pid: pm-suspend
>
> RIP: acpi_ns_lookup + 0xa1/0x5b9
>
> Call Trace:
>
> ? acpi_os_wait_semaphore + 0x136/0x149
> acpi_ns_get_mode + 0x96/0x102
> ? __lock_is_held +0x5f/0x90
> acpi_ns_evaluate +0x47/0x2de
> ? _raw_spin_lock_irqsave
> ? acpi_ut_evaluate_object
> ? sub_preempt_count
> ? pnpacpi_can_wakeup
> acpi_rs_get_method_data
> ? acpi_os_signal_semaphore
> acpi_walk_resources
> ? acpi_ut_release_mutex
> pnpacpi_build_resource_template
> ? acpi_bus_get_device
> pnpacpi_set_resources
> ? pnp_device_shutdown
> pnp_start_dev
> pnp_bus_resume
> dpm_run_callback
> device_resume
> dpm_resume
> dpm_resume_end
> ? acpi_suspend_begin_old
> suspend_devices_and_enter
> pm_suspend
> state_store
> kobj_attr_store
> sysfs_write_file
> vgs_write
> sys_write
> system_call_fastpath
>
> Zdenek
>
>
> PS: jpg on request
Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer
On Wed, Nov 28, 2012 at 8:22 AM, Kay Sievers wrote:
> On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk
> wrote:
>
>> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
>> SYSLOG_ACTION_SIZE_UNREAD returns 0.
>
> Hmm, sounds like the right thing to do.

Right. And that's the *OLD* behavior (2.6.31).

>> On 3.5, immediately after SYSLOG_ACTION_READ_CLEAR, the value returned
>> by SYSLOG_ACTION_SIZE_UNREAD is unchanged

And this is the *NEW* behavior, and as you say:

> Which sounds at least like weird behaviour, if not "broken".

So the new behavior is insane and different. Let's fix it.

It looks like it is because the new SYSLOG_ACTION_SIZE_UNREAD code
does not take the new clear_seq code into account. Hmm?

                Linus
Re: Acpi deadlocks with 3.7.0-rc4
On Wed, Nov 28, 2012 at 8:21 AM, Zdenek Kabelac wrote:
>
> I've opened https://bugzilla.kernel.org/show_bug.cgi?id=51071
> and attached picture there which is all I have.
>
> I'll try to decode exact code line.

Uhhuh. It's missing much of the relevant parts of the code line, in
particular the actual oopsing instruction. But what is there decodes
to

    41 b8 10 00 00 00       mov    $0x10,%r8d
    48 c7 c1 88 52 64 81    mov    $0x81645288,%rcx
    31 c0                   xor    %eax,%eax
    48 c7 c2 98 52 64 81    mov    $0x81645298,%rdx
    bf 00 04 00 0.          mov    $0x0.00400,%edi
    .. oops in here ..
    74 33                   je     0x50
    48 89 df                mov    %rbx,%rdi
    e8 4d c9 00 00          callq  ?
    48 89 d9                mov    %rbx,%rcx
    48 c7 c2 0a .. .. ..    mov    $0x..0a,%rdx

which isn't really very obvious. Do you have that kernel around (or at
least the same compiler and configuration)? Doing a

    objdump --disassemble drivers/acpi/acpica/nsaccess.o

might help pinpoint where that is..

> It's probably not a regression from 3.6 - since this problem was there for
> much longer - but now it has just become much more visible.

Ok.

                Linus
Re: Acpi deadlocks with 3.7.0-rc4
On Wed, Nov 28, 2012 at 9:27 AM, Zdenek Kabelac wrote:
>
> I've attached bigger disasfun script output to BZ 51071.
> https://bugzilla.kernel.org/show_bug.cgi?id=51071#c1
>
>
> if (ACPI_GET_DESCRIPTOR_TYPE(prefix_node) !=
> 00a1 cmpb $0xf,0x8(%rbx)
> 00a5 je 0da
>
> seems to be going out of bounds.

The whole "prefix_node" pointer is bogus. It seems to have the value
0x1000.

I wonder how that happened. It's loaded from 'scope_info->scope.node',
and it *should* be a valid pointer.

Can you add a print-out of scope_info->common.descriptor_type and
check that it is ACPI_DESC_TYPE_STATE_WSCOPE (== 8)? If it is not,
return early.

Or just something like the attached, which just uses the root node
(and warns once) if it's not a valid WSCOPE thing.

                Linus

[Attachment: patch.diff]
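The fallback suggested in this mail (check the descriptor type, and use the root node with a one-time warning when it is not a WSCOPE state) could look roughly like the sketch below. It is not the attached patch, whose contents are not shown here. The struct layouts, the `toy_` names and the warning text are hypothetical stand-ins; only the constant value 8 comes from the mail.

```c
#include <stdio.h>
#include <stddef.h>

#define DESC_TYPE_STATE_WSCOPE 0x08   /* value quoted in the mail */

/* Hypothetical stand-ins for the ACPICA structures, reduced to the
 * two fields under discussion. */
struct toy_node {
    const char *name;
};

struct toy_scope_state {
    unsigned char descriptor_type;
    struct toy_node *node;
};

static struct toy_node toy_root_node = { "\\" };

/*
 * Pick the node a lookup should start from. If the state object is not
 * a WSCOPE descriptor, its node pointer cannot be trusted, so fall back
 * to the root node and warn only the first time this happens.
 */
static struct toy_node *pick_prefix_node(const struct toy_scope_state *scope_info)
{
    static int warned;

    if (scope_info != NULL &&
        scope_info->descriptor_type == DESC_TYPE_STATE_WSCOPE)
        return scope_info->node;

    if (!warned) {
        warned = 1;
        fprintf(stderr, "scope_info is not a valid WSCOPE state, using root node\n");
    }
    return &toy_root_node;
}
```

The point of the check is diagnostic as much as defensive: a wrong descriptor type would show that the bogus 0x1000 pointer comes from a state object of the wrong kind rather than from a corrupted node.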
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
No, this is crap.

We don't introduce random hooks like this just because the block layer
has shit-for-brains and cannot be bothered to do things right.

The fact is, the whole locking in the block layer open routine is
total and utter crap. It doesn't lock the right thing, even with your
change *anyway* (or with the change Jens had). Absolutely nothing in
"mmap_region()" cares at all about the block-size anywhere - it's
generic, after all - so locking around it is f*cking pointless. There
is no way in hell that the caller of ->mmap can *ever* care about the
block size, since it never even looks at it.

Don't do random crap like this.

Why does the code think that mmap matters so much anyway? As you say,
the mmap itself does *nothing*. It has no impact for the block size.

                Linus

On Wed, Nov 28, 2012 at 9:25 AM, Mikulas Patocka wrote:
>
>
> On Wed, 28 Nov 2012, Jens Axboe wrote:
>
>> On 2012-11-28 04:57, Mikulas Patocka wrote:
>> >
>> > This patch is wrong because you must check if the device is mapped while
>> > holding bdev->bd_block_size_semaphore (because
>> > bdev->bd_block_size_semaphore prevents new mappings from being created)
>>
>> No it doesn't. If you read the patch, that was moved to i_mmap_mutex.
>
> Hmm, it was wrong before the patch and it is wrong after the patch too.
>
> The problem is that ->mmap method doesn't do the actual mapping, the
> caller of ->mmap (mmap_region) does it. So we must actually catch
> mmap_region and protect it with the lock, not ->mmap.
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 11:43 AM, Al Viro wrote:
> Have a
> private vm_operations - a copy of generic_file_vm_ops with ->open()/->close()
> added to it.

That sounds more reasonable.

However, I suspect the *most* reasonable thing to do is to just remove
the whole damn thing. We really shouldn't care about mmap. If somebody
does a mmap on a block device, and somebody else then changes the
block size, why-ever should we bother to go through any contortions at
*all* to make that kind of insane behavior do anything sane at all?

Just let people mmap things. Then just let the normal page cache
invalidation work right. In fact, it is entirely possible that we
could/should just not even invalidate the page cache at all, just make
sure that the buffer heads attached to any pages get disconnected.

No?

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 11:50 AM, Mikulas Patocka wrote:
>
> mmap_region() doesn't care about the block size. But a lot of
> page-in/page-out code does.

That seems a bogus argument.

mmap() is in *no* way special. The exact same thing happens for
regular read/write. Yet somehow the mmap code is special-cased, while
the normal read-write code is not.

I suspect it might be *easier* to trigger some issues with mmap, but
that still isn't a good enough reason to special-case it. We don't add
locking to one place just because that one place shows some race
condition more easily. We fix the locking.

So for example, maybe the code that *actually* cares about the buffer
size (the stuff that allocates buffers in fs/buffer.c) needs to take
that new percpu read lock. Basically, any caller of
"alloc_page_buffers()/create_empty_buffers()" or whatever.

I also wonder whether we need it *at*all*. I suspect that we could
easily have multiple block-sizes these days for the same block device.
It *used* to be (millions of years ago, when dinosaurs roamed the
earth) that the block buffers were global and shared with all users of
a partition. But that hasn't been true since we started using the page
cache, and I suspect that some of the block size changing issues are
simply entirely stale.

Yeah, yeah, there could be some coherency issues if people write to
the block device through different block sizes, but I think we have
those coherency issues anyway. The page-cache is not coherent across
different mapping inodes anyway.

So I really suspect that some of this is "legacy logic". Or at least
perhaps _should_ be.

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 12:03 PM, Linus Torvalds wrote:
>
> mmap() is in *no* way special. The exact same thing happens for
> regular read/write. Yet somehow the mmap code is special-cased, while
> the normal read-write code is not.

I just double-checked, because it's been a long time since I actually
looked at the code.

But yeah, block device read/write uses the pure page cache functions.
IOW, it has the *exact* same IO engine as mmap() would have.

So here's my suggestion:

 - get rid of *all* the locking in aio_read/write and the splice paths

 - get rid of all the stupid mmap games

 - instead, add the locking to the functions that actually use
   "blkdev_get_block()" and "blkdev_get_blocks()" and nowhere else.

   That's a fairly limited number of functions:
   blkdev_{read,write}page(), blkdev_direct_IO() and
   blkdev_write_{begin,end}()

Doesn't that sound simpler? And more logical: it protects the actual
places that use the block size of the device.

I dunno. Maybe there is some fundamental reason why the above is
broken, but it seems to be a much simpler approach. Sure, you need to
guarantee that the people who get the write-lock cannot possibly cause
IO while holding it, but since the only reason to get the write lock
would be to change the block size, that should be pretty simple, no?

Yeah, yeah, I'm probably missing something fundamental, but the above
sounds like the simple approach to fixing things. Aiming for having
the block size read-lock be taken by the things that pass in the
block-size itself.

It would be nice for things to be logical and straightforward.

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 12:13 PM, Linus Torvalds wrote:
>
> I dunno. Maybe there is some fundamental reason why the above is
> broken, but it seems to be a much simpler approach. Sure, you need to
> guarantee that the people who get the write-lock cannot possibly cause
> IO while holding it, but since the only reason to get the write lock
> would be to change the block size, that should be pretty simple, no?

Here is a *COMPLETELY* untested patch. Caveat emptor. It will probably
do unspeakable things to your family and pets.

                Linus

[Attachment: patch.diff]
Re: [git pull] drm fixes
[ Hmm. For some reason this seems to have never gone out, and was in
my drafts folder. If you get it twice, my bad ]

On Thu, Nov 22, 2012 at 12:57 AM, Dave Airlie wrote:
>
> Doh!, yes I picked wrong place to generate report from, okay here is
> one corresponding to what you saw,

You should never even need to "pick" any place to generate the report
from.

Just do something like

    git fetch upstream

(where "upstream" is a branch description for the upstream repository
- see "man git-remote" etc, although you can obviously always just
type out the whole repo details etc in full if you would want to).

Note the "fetch" - not pull - you just want to get it, not merge it.

Then you can just point git request-pull at the upstream, and git will
figure out what the latest common point is. No need for you to
manually try to figure it out.

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 12:32 PM, Linus Torvalds wrote:
>
> Here is a *COMPLETELY* untested patch. Caveat emptor. It will probably
> do unspeakable things to your family and pets.

Btw, *if* this approach works, I suspect we could just switch the
bd_block_size_semaphore semaphore to be a regular rw-sem.

Why? Because now it's no longer ever gotten in the cached IO paths, we
only get it when we're doing much more expensive things (ie actual IO,
and buffer head allocations etc etc). As long as we just work with the
page cache, we never get to the whole lock at all.

Which means that the whole percpu-optimized thing is likely no longer
all that relevant.

But that's an independent thing, and it's only true *if* my patch
works. It looks fine on paper, but maybe there's something
fundamentally broken about it.

One big change my patch does is to move the sync_bdev/kill_bdev to
*after* changing the block size. It does that so that it can guarantee
that any old data (which didn't see the new block size) will be
sync'ed even if there is new IO coming in as we change the block size.

The old code locked the whole sync() region, which doesn't work with
my approach, since the sync will do IO and would thus cause potential
deadlocks while holding the rwsem for writing.

So with this patch, as the block size changes, you can actually have
some old pages with the old block size *and* some different new pages
with the new block size all at the same time. It should all be
perfectly fine, but it's worth pointing out.

(It probably won't trigger in practice, though, since doing IO while
somebody else is changing the blocksize is fundamentally an odd thing
to do, but whatever. I also suspect that we *should* perhaps use the
inode->i_sem thing to serialize concurrent block size changes, but
that's again an independent issue)

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 1:29 PM, Mikulas Patocka wrote:
>
> The problem with this approach is that it is very easy to miss points
> where it is assumed that the block size doesn't change - and if you miss a
> point, it results in a hidden bug that has a little possibility of being
> found.

Umm. Mikulas, *your* approach has resulted in bugs. So let's not throw
stones in glass houses, shall we?

The whole reason for this long thread (and several threads before it)
is that your model isn't working and is causing problems. I already
pointed out how bogus your arguments about mmap() locking were, and
then you have the gall to talk about potential bugs, when I have
pointed you to *actual* bugs, and actual mistakes.

> For example, __block_write_full_page and __block_write_begin do
> if (!page_has_buffers(page)) { create_empty_buffers... }
> and then they do
> WARN_ON(bh->b_size != blocksize)
> err = get_block(inode, block, bh, 1)

Right. And none of this is new.

> ... so if the buffers were left over from some previous call to
> create_empty_buffers with a different blocksize, that WARN_ON is triggered.

None of this can happen.

> Locking the whole read/write/mmap operations is crude, but at least it can
> be done without thorough review of all the memory management code.

Umm. Which you clearly didn't do, and raised totally crap arguments for.

In contrast, I have a very simple argument for the correctness of my
patch: every single user of the "get_block[s]()" interface now takes
the lock for as long as get_block[s]() is passed off to somebody else.
And since get_block[s]() is the only way to create those empty
buffers, I think I pretty much proved exactly what you ask for.

And THAT is the whole point and advantage of making locking sane. Sane
locking you can actually *think* about!

In contrast, locking around "mmap()" is absolutely *guaranteed* to be
insane, because mmap() doesn't actually do any of the IO that the lock
is supposed to protect against!

So Mikulas, quite frankly, your arguments argue against you. When you
say "Locking the whole read/write/mmap operations is crude, but at
least it can be done without thorough", you are doubly correct: it
*is* crude, and it clearly *was* done without thought, since it's a
f*cking idiotic AND INCORRECT thing to do.

Seriously. Locking around "mmap()" is insane. It leads to insane
semantics (the whole EBUSY thing is purely because of that problem)
and it leads to bad code (your "let's add a new mmap_region hook" is
just disgusting, and while Al's idea of doing it in the existing
"->open" method is at least not nasty, it's definitely extra code and
complexity).

There are serious *CORRECTNESS* advantages to simplicity and
directness. And locking at the right point is definitely very much
part of that.

Anyway, as far as block size goes, we have exactly two cases:

 - random IO that does not care about the block size, and will just do
   whatever the current block size is (ie normal anonymous accesses to
   the block device). This is the case that needs the locking - but it
   only needs it around the individual page operations, ie exactly
   where I put it. In fact, they can happily deal with different block
   sizes for different pages, they don't really care.

 - mounted filesystems etc that require a particular block size and
   set it at mount time, and they have exclusivity rules

The second case is the case that actually calls set_blocksize(), and
if "kill_bdev()" doesn't get rid of the old blocksizes, then they have
always been in trouble, and would always _continue_ to be in trouble,
regardless of locking.

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 2:52 PM, Linus Torvalds wrote:
>
>> For example, __block_write_full_page and __block_write_begin do
>> if (!page_has_buffers(page)) { create_empty_buffers... }
>> and then they do
>> WARN_ON(bh->b_size != blocksize)
>> err = get_block(inode, block, bh, 1)
>
> Right. And none of this is new.

.. which, btw, is not to say that *other* things aren't new. They are.

The change to actually change the block device buffer size before then
calling "sync_bdev()" is definitely a real change, and as mentioned, I
have not tested the patch in any way. If any block device driver were
to actually compare the IO size they get against the bdev->block_size
thing, they'd see very different behavior (ie they'd see the new block
size as they are asked to write out the old blocks with the old block
size).

So it does change semantics, no question about that. I don't think any
block device does it, though.

A bigger issue is for things that emulate what blkdev.c does, and
don't do the locking. I see code in md/bitmap.c that seems a bit
suspicious, for example. That said, it's not *new* breakage, and the
"lock at mmap/read/write() time" approach doesn't fix it either (since
the mapping will be different for the underlying MD device). So I do
think that we should take a look at all the users of
"alloc_page_buffers()" and "create_empty_buffers()" to see what *they*
do to protect the block-size, but I think that's an independent issue
from the raw device access case in fs/block_dev.c..

I guess I have to actually test my patch. I don't have very
interesting test-cases, though.

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
[ Sorry, I was offline for a while driving kids around ]

On Wed, Nov 28, 2012 at 4:38 PM, Mikulas Patocka wrote:
>
> It can happen. Take your patch (the one that moves bd_block_size_semaphore
> into blkdev_readpage, blkdev_writepage and blkdev_write_begin).

Interesting. The code *has* the block size (it's in "bh->b_size"), but
it actually then uses the inode blocksize instead, and verifies the
two against each other. It could just have used the block size
directly (and then used the inode i_blkbits only when no buffers
existed), avoiding that dependency entirely..

It actually does the same thing (with the same verification) in
__block_write_full_page() and (_without_ the verification) in
__block_commit_write().

Ho humm. All of those places actually do hold the rwsem for reading,
it's just that I don't want to hold it for writing over the sync..

Need to think on this,

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 6:04 PM, Linus Torvalds wrote:
>
> Interesting. The code *has* the block size (it's in "bh->b_size"), but
> it actually then uses the inode blocksize instead, and verifies the
> two against each other. It could just have used the block size
> directly (and then used the inode i_blkbits only when no buffers
> existed), avoiding that dependency entirely..

Looking more at this code, that really would be the nicest solution.

There's two cases for the whole get_block() thing:

 - filesystems. The block size will not change randomly, and
   "get_block()" seriously depends on the block size.

 - the raw device. The block size *will* change, but to simplify the
   problem, "get_block()" is a 1:1 mapping, so it doesn't even care
   about the block size because it will always return "bh->b_blocknr =
   nr".

So we *could* just say that all the fs/buffer.c code should use
"inode->i_blkbits" for creating buffers (because that's the size new
buffers should always use), but use "bh->b_size" for any *existing*
buffer use.

And looking at it, it's even simple. Except for one *very* annoying
thing: several users really don't want the size of the buffer, they
really do want the *shift* of the buffer size.

In fact, that single issue seems to be the reason why
"inode->i_blkbits" is really used in fs/buffer.c.

Otherwise it would be fairly trivial to just make the pattern be just a simple

        if (!page_has_buffers(page))
                create_empty_buffers(page, 1 << inode->i_blkbits, 0);
        head = page_buffers(page);
        blocksize = head->b_size;

and just use the blocksize that way, without any other games. All
done, no silly WARN_ON() to verify against some global block-size, and
the fs/buffer.c code would be perfectly simple, and would have no
problem at all with multiple different blocksizes in different pages
(the page lock serializes the buffers and thus the blocksize at the
per-page level).

But the fact that the code wants to do things like

        block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits);

seriously seems to be the main thing that keeps us using
'inode->i_blkbits'. Calculating bbits from bh->b_size is just costly
enough to hurt (not everywhere, but on some machines).

Very annoying.

                Linus
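To make the size-versus-shift point concrete, here is a small standalone sketch of deriving bbits from a buffer's own size and using it for the page-to-block computation quoted above. The `toy_` names and the 4 KiB page shift are assumptions; the loop is a portable stand-in for the kernel's ilog2(), not its implementation.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* log2 of a power-of-two block size (portable stand-in for ilog2()). */
static unsigned int toy_ilog2(uint32_t size)
{
    unsigned int bits = 0;

    while (size > 1) {
        size >>= 1;
        bits++;
    }
    return bits;
}

/*
 * First block number covered by a page, computed from the size stored
 * in the buffer itself rather than from a per-inode shift value.
 * Assumes b_size is a power of two no larger than the page size.
 */
static uint64_t toy_first_block(uint64_t page_index, uint32_t b_size)
{
    unsigned int bbits = toy_ilog2(b_size);

    return page_index << (TOY_PAGE_SHIFT - bbits);
}
```

The extra work is the log2 step on every use, which is the cost the mail calls "just costly enough to hurt" on some machines; on many compilers a count-leading-zeros builtin would replace the loop.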
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 6:58 PM, Linus Torvalds wrote:
>
> But the fact that the code wants to do things like
>
> block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits);
>
> seriously seems to be the main thing that keeps us using
> 'inode->i_blkbits'. Calculating bbits from bh->b_size is just costly
> enough to hurt (not everywhere, but on some machines).
>
> Very annoying.

Hmm. Here's a patch that does that anyway. I'm not 100% happy with the
whole ilog2 thing, but at the same time, in other cases it actually
seems to improve code generation (ie gets rid of the whole unnecessary
two dereferences through page->mapping->host just to get the block
size, when we have it in the buffer-head that we have to touch
*anyway*).

Comments? Again, untested.

And I notice that Al Viro hasn't been cc'd, which is sad, since he's
been involved in much of fs/block_dev.c.

Al - this is an independent patch to fs/buffer.c to make
fs/block_dev.c able to change the block size of a block device while
there is IO in progress that may still use the old block size. The
discussion has been on fsdevel and lkml, but you may have missed it...

                Linus

[Attachment: patch.diff]
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 10:25 PM, Al Viro wrote:
>
> Umm... set_blocksize() is calling kill_bdev(), which does
> truncate_inode_pages(mapping, 0). What's going to happen to data in
> the dirty pages? IO in progress is not the only thing to worry about...

Hmm. Yes. I think it works by virtue of "if you change the blocksize
while there is active IO, you're insane and you deserve whatever you
get".

It shouldn't even be fundamentally hard to make it work, although I
suspect it would be more code than it would be worth. The sane model
would be to not use truncate_inode_pages(), but instead just walk the
pages and get rid of the buffer heads with the wrong size. Preferably
*combining* that with the sync_blockdev().

We have no real reason to even invalidate the page cache, it's just
the buffers we want to get rid of.

But I suspect it's true that none of that is really *worth* it,
considering that nobody likely wants to do any concurrent IO. We don't
want to crash, or corrupt the data structures, but I suspect "you get
what you deserve" might actually be the right model ;)

So the current "sync_blockdev()+kill_bdev()" takes care of the *sane*
case (we flush any data that happened *before* the block size change),
and any concurrent writes with block-size changes are "good luck with
that".

                Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Wed, Nov 28, 2012 at 10:30 PM, Al Viro wrote:
>
> Note that sync_blockdev() a few lines prior to that is good only if we
> have no other processes doing write(2) (or dirtying the mmapped pages,
> for that matter). The window isn't too wide, but...

So with Mikulas' patches, the write actually would block (at write
level) due to the locking. The mmap'ed pages may be around and
flushed, but the logic to not allow currently *active* mmaps (with the
rather nasty random -EBUSY return value) should mean that there is no
race.

Or rather, there's a race, but it results in that EBUSY thing.

With my simplified locking, the sync_blockdev() is right before (not a
few lines prior) to the kill_bdev(), and in a perfect world they'd
actually be one single operation ("write back and invalidate pages
with the wrong block-size"). But they aren't.

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Wed, Nov 28, 2012 at 2:01 PM, Mikulas Patocka wrote:
>
> This sounds sensible. I'm sending this patch.

This looks much better.

I think I'll apply this for 3.7 (since it's too late to do anything
fancier), and then for 3.8 I will rip out all the locking entirely,
because looking at the fs/buffer.c patch I wrote up, it's all totally
unnecessary.

Adding an ACCESS_ONCE() to the read of the i_blkbits value (when
creating new buffers) simply makes the whole locking thing pointless.
Just make the page lock protect the block size, and make it per-page,
and we're done.

No RCU grace period crap, no expedited mess, no nothing.

Linus
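For readers unfamiliar with the idiom, the following is a minimal userspace sketch of what "read i_blkbits once" means. The macro has the same shape as the kernel's ACCESS_ONCE() of that era and relies on the GNU `__typeof__` extension; `struct inode_sketch` and `snapshot_blocksize` are invented for the example and are not taken from the patch.

```c
#include <assert.h>

/* Force exactly one load of x; needs GCC or Clang for __typeof__. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the one inode field we care about. */
struct inode_sketch {
    unsigned int i_blkbits;
};

/*
 * Take one snapshot of the block-size shift and derive everything from
 * it, so a concurrent change cannot produce a mix of old and new values
 * within a single call.
 */
static unsigned int snapshot_blocksize(struct inode_sketch *inode)
{
    unsigned int bits = ACCESS_ONCE(inode->i_blkbits);

    /* Sanity range assumed for this sketch: 512 bytes and up. */
    assert(bits >= 9 && bits < 32);
    return 1u << bits;
}
```

Which value the reader sees is not defined when a change races with the read; the sketch only guarantees that the caller works from a single value.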
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Thu, Nov 29, 2012 at 6:12 AM, Chris Mason wrote:
>
> Jumping in based on Linus original patch, which is doing something like
> this:
>
> set_blocksize() {
>         block new calls to writepage, prepare/commit_write
>         set the block size
>         unblock
>
>         < --- can race in here and find bad buffers --->
>
>         sync_blockdev()
>         kill_bdev()
>
>         < --- now we're safe --- >
> }
>
> We could add a second semaphore and a page_mkwrite call:

Yeah, we could be fancy, but the more I think about it, the less I can
say I care.

After all, the only things that do the whole set_blocksize() thing should be:

 - filesystems at mount-time

 - things like loop/md at block device init time.

and quite frankly, if there are any *concurrent* writes with either of
the above, I really *really* don't think we should care. I mean,
seriously.

So the _only_ real reason for the locking in the first place is to
make sure of internal kernel consistency. We do not want to oops or
corrupt memory if people do odd things. But we really *really* don't
care if somebody writes to a partition at the same time as somebody
else mounts it. Not enough to do extra work to please insane people.

It's also worth noting that NONE OF THIS HAS EVER WORKED IN THE PAST.
The whole sequence always used to be unlocked. The locking is entirely
new. There are certainly not any legacy users that can possibly rely
on "I did writes at the same time as the mount with no serialization,
and it worked". It never has worked.

So I think this is a case of "perfect is the enemy of good".
Especially since I think that with the fs/buffer.c approach, we don't
actually need any locking at all at higher levels.

Linus
Re: [PATCH] Introduce a method to catch mmap_region (was: Recent kernel "mount" slow)
On Thu, Nov 29, 2012 at 9:51 AM, Chris Mason wrote:
>
> The bigger question is do we have users that expect to be able to set
> the blocksize after mmaping the block device (no writes required)? I
> actually feel a little bad for taking up internet bandwidth asking, but
> it is a change in behaviour.

Yeah, it is. That said, I don't think people will really notice.
Nobody mmap's block devices outside of some databases, afaik, and
nobody sane mounts a partition at the same time a DB is using it. So I
think the new EBUSY check is *ugly*, but I don't realistically believe
that it is a problem. The ugliness of the locking is why I'm not a
huge fan of it, but if it works I can live with it.

But yes, the mmap tests are new with the locking, and could in theory
be problematic if somebody reports that it breaks anything.

And like the locking, they'd just go away if we just do the
fs/buffer.c approach instead. Because doing things in fs/buffer.c
simply means that we don't care (and serialization is provided by the
page lock on a per-page basis, which is what mmap relies on anyway).

So doing the per-page fs/buffer.c approach (along with the
"ACCESS_ONCE()" on inode->i_blkbits to make sure we get *one*
consistent value, even if we don't care *which* value it is) would
basically revert to all the old semantics. The only thing it would
change is that we wouldn't see oopses.

(And in theory, it would allow us to actively mix-and-match different
block sizes for a block device, but realistically I don't think there
are any actual users of that - although I could imagine that a
filesystem would use a smaller block size for file tail-blocks etc,
and still want to use the fs/buffer.c code, so it's *possible* that it
would be useful, but filesystems have been able to do things like that
by just doing their buffers by hand anyway, so it's not really
fundamentally new, just a possible generalization of code)

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 10:23 AM, Mikulas Patocka wrote:
>
> If you remove that percpu rw lock, you also need to rewrite direct i/o
> code.
>
> In theory, block device direct i/o doesn't need buffer block size at all.
> But in practice, it shares a lot of code with filesystem direct i/o, it
> reads the block size multiple times and it crashes if it changes.

If it's a filesystem, then the size will never change while it is mounted.

So only the direct-block-device case needs to be worried about, no?
And that uses __generic_file_aio_write() and friends, which in turn
use the readpage/writepage functions.

So for block devices, it should be sufficient to make
readpage/writepage (with the writing obviously having all the
"write_begin/write_end/fullpage" variants) be safe as far as I can
see.

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 9:19 AM, Linus Torvalds wrote:
>
> I think I'll apply this for 3.7 (since it's too late to do anything
> fancier), and then for 3.8 I will rip out all the locking entirely,
> because looking at the fs/buffer.c patch I wrote up, it's all totally
> unnecessary.
>
> Adding an ACCESS_ONCE() to the read of the i_blkbits value (when
> creating new buffers) simply makes the whole locking thing pointless.
> Just make the page lock protect the block size, and make it per-page,
> and we're done.

There's a 'block-dev' branch in my git tree, if you guys want to play
around with it.

It actually reverts fs/block_dev.c back to the 3.6 state (except for
some whitespace damage that I refused to re-introduce), so that part
of the changes should be pretty safe and well tested.

The fs/buffer.c changes, of course, are new. It's largely the same
patch I already sent out, with a small helper function to simplify it,
and to keep the whole ACCESS_ONCE() thing in just a single place.

That branch may be re-based in case I get reports or acks or whatever,
so don't rely on it (or if you do, please let me know, and I'll stop
editing it).

The fact that I could just revert the fs/block_dev.c part to a known
state makes me wonder if this might be safe for 3.7 after all (the
fs/buffer.c changes all *look* safe). That way we'd not have to worry
about any new semantics (whether they be EBUSY or any possible locking
slowdowns or RT issues).

But I'll think about it, and it would be good for people to
double-check my fs/buffer.c stuff. Mikulas, does that pass your
testing?

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 11:15 AM, Chris Mason wrote:
>
> The fs/buffer.c part makes sense during a quick read. But
> fs/direct-io.c plays with i_blkbits too. The semaphore was fixing real
> bugs there.

Ugh. I _hate_ direct-IO. What a mess. And yeah, it seems to be
incestuously playing games that should be in fs/buffer.c. I thought it
was doing the sane thing with the page cache.

(I now realize that Mikulas was talking about this mess, while I
thought he was talking about the AIO code which is largely sane).

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 11:26 AM, Linus Torvalds wrote:
>
> (I now realize that Mikulas was talking about this mess, while I
> thought he was talking about the AIO code which is largely sane).

Oh wow. The direct-IO code really doesn't seem to care at all. I don't
think it needs locking either (it seems to do everything with a
private buffer-head), and the problem appears solely to be that it
reads i_blksize multiple times, so changing it just happens to confuse
the direct-io code.

If it were to read it only once, and then use that value, it looks
like it should all JustWork(tm).

And the right thing to do would seem to just add it to the
"dio_submit" structure, that we already have. And it already *has* a
blkbits field, but that's the "IO blocksize", not the "getblocks
blocksize", if I read that mess correctly.

Of course, it then *ALREADY* has that "blkfactor" thing, which is the
difference between i_blkbits and blkbits, so it effectively *does*
have i_blkbits already in the dio_submit structure. But despite it
all, it keeps re-reading i_blksize.

Christ. That code is a mess.

Linus
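The relationship between the two shifts can be written down in a few lines. This is only a sketch under the reading given in the message above: `struct dio_submit_sketch`, `dio_setup_sketch` and `dio_fs_blkbits` are hypothetical names, and the real dio_submit structure has many more fields.

```c
#include <assert.h>

/* Hypothetical cut-down version of the per-request direct-IO state. */
struct dio_submit_sketch {
    unsigned int blkbits;   /* shift for the IO block size */
    unsigned int blkfactor; /* i_blkbits minus blkbits */
};

/* Record both values once, at setup time, from a single read of i_blkbits. */
static void dio_setup_sketch(struct dio_submit_sketch *sdio,
                             unsigned int i_blkbits,
                             unsigned int io_blkbits)
{
    /* The IO block size is assumed never to exceed the get_block size. */
    assert(io_blkbits <= i_blkbits);
    sdio->blkbits = io_blkbits;
    sdio->blkfactor = i_blkbits - io_blkbits;
}

/* Recover the get_block shift without touching the inode again. */
static unsigned int dio_fs_blkbits(const struct dio_submit_sketch *sdio)
{
    return sdio->blkbits + sdio->blkfactor;
}
```

With the value carried in the request state, later code has no reason to go back to the inode, which is the "read it only once" property the message asks for.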
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 11:48 AM, Chris Mason wrote:
>
> blkdev_get_blocks (called during DIO) is also checking i_blkbits, but I
> really don't get why that isn't byte based instead. DIO is already
> doing the shift & mask game.

The blkdev_get_blocks() thing is just sad.

The underlying data structure is actually byte-based (it's
"i_size_read(bdev->bd_inode)"), but we convert it to a block-based
number.

Oops.

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 11:55 AM, Linus Torvalds wrote:
>
> The blkdev_get_blocks() thing is just sad.
>
> The underlying data structure is actually byte-based (it's
> "i_size_read(bdev->bd_inode)"), but we convert it to a block-based
> number.
>
> Oops.

Oh, it's even worse than that.

The DIO code ends up passing in buffer heads that have sizes bigger
than the inode i_blksize, which can cause problems at the end of the
disk. So blkdev_get_blocks() knows about it, and will then "fix" that
and shrink them down. The games with "max_block" are hilarious. In a
really sad way.

That whole blkdev_get_blocks() function is pure and utter shit.

Linus
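To make the "max_block" games concrete, here is a small sketch of clamping a requested mapping to the end of a device whose size is known in bytes. It is an illustration of the idea only; `clamp_mapping` and its parameters are invented here and do not reproduce blkdev_get_blocks().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many bytes of a want_bytes request starting at block iblock
 * fit on a device of dev_bytes bytes, using blocks of (1 << blkbits) bytes.
 * Returns 0 when the starting block is already past the end.
 */
static uint64_t clamp_mapping(uint64_t dev_bytes, unsigned int blkbits,
                              uint64_t iblock, uint64_t want_bytes)
{
    uint64_t max_blocks;
    uint64_t avail;

    assert(blkbits < 64);

    /* Byte-based size converted to whole blocks; a partial tail is dropped. */
    max_blocks = dev_bytes >> blkbits;
    if (iblock >= max_blocks)
        return 0;

    avail = (max_blocks - iblock) << blkbits;
    return want_bytes < avail ? want_bytes : avail;
}
```

The block-number comparison is done before shifting back to bytes, so the only shift that grows a value operates on a count that is already known to fit on the device.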
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 11:48 AM, Chris Mason wrote:
>
> It was all a trick to get you to say the AIO code was sane.

It's only sane compared to the DIO code.

That said, I hate AIO much less these days now that we've largely
merged the code with the regular IO. It's still a horrible interface,
but at least it is no longer a really disgusting separate
implementation in the kernel of that horrible interface.

So yeah, I guess AIO really is pretty sane these days.

> It looks like we could use the private copy of i_blkbits that DIO is
> already recording.

Yes. But that didn't fix the blkdev_get_blocks() mess you pointed out.

I've pushed out two more commits to the 'block-dev' branch at

    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux block-dev

in case anybody wants to take a look.

It is - as usual - entirely untested. It compiles, and I *think* that
blkdev_get_blocks() makes a whole lot more sense this way - as you
said, it should be byte-based (although it actually does the block
number conversion because I worried about overflow - probably
unnecessarily).

Comments?

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 1:29 PM, Chris Mason wrote:
>
> Just reading the new blkdev_get_blocks, it looks like we're mixing
> shifts. In direct-io.c map_bh->b_size is how much we'd like to map, and
> it has no relation at all to the actual block size of the device. The
> interface is abusing b_size to ask for as large a mapping as possible.

Ugh. That's a big violation of how buffer-heads are supposed to work:
the block number is very much defined to be in multiples of b_size
(see for example "submit_bh()" that turns it into a sector number).

But you're right. The direct-IO code really *is* violating that, and
knows that get_block() ends up being defined in i_blkbits regardless
of b_size.

What a crock. That direct-IO code is hack-upon-hack. Whoever wrote it
should be shot.

I think the only sane way to fix it is to pass in the block size to
get_blocks(). Which we admittedly should have done long ago, so that's
not a bad fix, but without actually looking at what it involves, I
think it's going to be a pretty big patch. All the filesystems that
support the interface need to update it, even if they can then ignore
it, because direct-IO does all these hacks only for the raw device.

And I think it will improve the interface, but damn, direct-IO is
still horrible for playing these kinds of games.

Linus
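The rule that "the block number is in multiples of b_size" is easiest to see in the block-to-sector conversion that the message attributes to submit_bh(). The sketch below uses a hypothetical two-field `struct bh_sketch` and assumes 512-byte sectors; it shows why the same block number means a different disk location once b_size is used as a "how much I want" hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field stand-in for a buffer head. */
struct bh_sketch {
    uint64_t b_blocknr; /* block number, in units of b_size */
    size_t b_size;      /* size of the buffer in bytes */
};

/* Convert a buffer's block number to a 512-byte sector number. */
static uint64_t bh_to_sector(const struct bh_sketch *bh)
{
    /* A buffer is assumed to cover a whole number of sectors. */
    assert(bh->b_size >= 512 && bh->b_size % 512 == 0);
    return bh->b_blocknr * (bh->b_size >> 9);
}
```

For example, block 10 is sector 80 with 4096-byte buffers but sector 1280 if b_size has been set to 65536, which is why a b_size that is unrelated to the device block size cannot be fed through this conversion.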
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 2:16 PM, Linus Torvalds wrote:
>
> But you're right. The direct-IO code really *is* violating that, and
> knows that get_block() ends up being defined in i_blkbits regardless
> of b_size.

It turns out fs/ioctl.c does the same - it fills in the buffer head
with some random bh->b_size too. I think it's not even a power of two
in that case.

And I guess it's understandable - they don't actually *use* the
buffer, they just want the offset. So the b_size field really is just
random crap to the users of the get_block interfaces, since they've
never cared before.

Ugh, this was definitely a dark and disgusting underbelly of the VFS
layer. We've not had to really touch it for a *looong* time..

Linus
Re: [PATCH v2] Do a proper locking for mmap and block size change
On Thu, Nov 29, 2012 at 5:16 PM, Chris Mason wrote:
>
> I searched through filemap.c for the magic i_size check that would let
> us get away with ignoring i_blkbits in get_blocks, but it's just not
> there. The whole fallback-to-buffered scheme seems to rely on
> get_blocks checking for i_size. I really hope I'm just missing
> something.

So generic_write_checks() limits the size to i_size for writes (and
for "isblk").

Sure, then it will do the buffered part after that, but that should
all be fine anyway, since by then we use the normal page cache.

For reads, generic_file_aio_read() will check pos < size, but doesn't
seem to actually limit the size of the iovec. I'm not sure why it
doesn't just do "iov_shorten()".

Anyway, having looked at actually passing in the block size to
get_block(), I can say that is a horrible idea. There are tons of
get_block functions (for various filesystems), and *none* of them
really want the block size, because they tend to work on block
indexes. And if they do want the block size, they'll just get it from
the inode or sb, since they are filesystems and it's all stable.

So the *only* one of the places that would want the block size is
fs/block_dev.c. And the callers really already seem to do the i_size
check, although they sometimes do it badly. And since there are fewer
callers than there are get_block() implementations, I think we should
just fix the callers and be done with it.

Those generic_file_aio_read/write() functions in fs/direct-io.c really
just seem to be badly written. The fact that they may depend on the
i_size check in get_blocks() is sad, but I think we should fix it and
just remove the check for block devices. That's going to simplify so
much..

I updated the 'block-dev' branch to have that simpler fs/block_dev.c
model instead. I'll look at the iovec shortening later. It's a
non-fast-forward thing, look out!

(I actually think we should just add the max-offset check to
rw_copy_check_uvector(). That one already does the MAX_RW_COUNT thing,
and we could make it do a max_offset check as well).

Linus
Linux 3.9-rc3
Josef Bacik (1):
      Btrfs: return EIO if we have extent tree corruption

Josh Boyer (1):
      serial: 8250: Keep 8250. module options functional after driver rename

Junwei Zhang (1):
      afkey: fix a typo

Kamal Mostafa (1):
      Input: cypress_ps2 - fix trackpadi found in Dell XPS12

Kees Cook (2):
      final removal of CONFIG_EXPERIMENTAL
      signal: always clear sa_restorer on execve

Kevin Cernekee (1):
      Input: ALPS - remove unused argument to alps_enter_command_mode()

Kishon Vijay Abraham I (1):
      usb: gadget: make usb functions to load before gadget driver

Konrad Rzeszutek Wilk (2):
      xen/pciback: Don't disable a PCI device that is already disabled.
      acpi: Export the acpi_processor_get_performance_info

Konstantin Khlebnikov (3):
      e1000e: fix pci-device enable-counter balance
      e1000e: fix runtime power management transitions
      e1000e: fix accessing to suspended device

Kumar Amit Mehta (3):
      staging: comedi: drivers: usbdux.c: fix DMA buffers on stack
      staging: comedi: drivers: usbduxfast.c: fix for DMA buffers on stack
      staging: comedi: drivers: usbduxsigma.c: fix DMA buffers on stack

Lars-Peter Clausen (4):
      iio:ad5064: Fix address of the second channel for ad5065/ad5045/ad5025
      iio:ad5064: Fix off by one in DAC value range check
      iio:ad5064: Initialize register cache correctly
      ext3: Fix format string issues

Laxman Dewangan (1):
      mfd: palmas: Provide irq flags through DT/platform data

Ley Foon Tan (1):
      tty/serial: Add support for Altera serial port

Li Zefan (1):
      s390: Fix a header dependencies related build error

Linus Torvalds (2):
      perf,x86: fix wrmsr_on_cpu() warning on suspend/resume
      Linux 3.9-rc3

Liu Bo (4):
      Btrfs: get better concurrency for snapshot-aware defrag work
      Btrfs: remove btrfs_try_spin_lock
      Btrfs: fix warning when creating snapshots
      Btrfs: fix warning of free_extent_map

Liu Jinsong (1):
      xen/acpi: remove redundant acpi/acpi_drivers.h include

Luis Alves (2):
      m68knommu: add CPU_NAME for 68000
      m68knommu: fix MC68328.h defines

Maarten Lankhorst (1):
      drm/nouveau: fix regression in vblanking

Malcolm Priestley (1):
      staging: vt6656: Fix oops on resume from suspend.

Marc Kleine-Budde (1):
      usb: otg: use try_module_get in all usb_get_phy functions and add missing module_put

Marcin Jurkowski (1):
      w1: fix oops when w1_search is called from netlink connector

Marcin Slusarz (2):
      drm/nouveau: idle channel before releasing notify object
      drm/nv50: use correct tiling methods for m2mf buffer moves

Marco Porsch (1):
      mac80211: fix oops on mesh PS broadcast forwarding

Marco Stornelli (1):
      hostfs: fix a not needed double check

Marek Szyprowski (1):
      ARM: DMA-mapping: add missing GFP_DMA flag for atomic buffer allocation

Mark Brown (5):
      Input: ads7864 - check return value of regulator enable
      Input: mms114 - Fix regulator enable and disable paths
      mfd: tps65912: Declare and use tps65912_irq_exit()
      mfd: twl4030-audio: Fix argument type for twl4030_audio_disable_resource()
      mfd: wm831x: Don't forward declare enum wm831x_auxadc

Mathias Krause (3):
      bridge: fix mdb info leaks
      rtnl: fix info leak on RTM_GETLINK request for VF devices
      dcbnl: fix various netlink info leaks

Mathieu Desnoyers (1):
      Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys

Matwey V. Kornilov (1):
      usb: cp210x new Vendor/Device IDs

Maxime Ripard (2):
      ARM: mxs: cfa10049: Fix fb initialisation function
      ARM: multiplatform: Sort the max gpio numbers.

Maxin B. John (1):
      tools: usb: ffs-test: Fix build failure

Michel Lespinasse (1):
      mm/fremap.c: fix possible oops on error path

Nicolas Pitre (1):
      ARM: mach-imx: move early resume code out of the .data section

Nishanth Menon (2):
      ARM: dts: remove generated .dtb files on clean
      usb: gadget: composite: fix kernel-doc warnings

Nithin Sujir (1):
      tg3: Update link_up flag for phylib devices

Oliver Neukum (1):
      USB: cdc-wdm: fix buffer overflow

Padmavathi Venna (1):
      Arm: socfpga: pl330: Add #dma-cells for generic dma binding support

Paolo Valente (6):
      pkt_sched: sch_qfq: properly cap timestamps in charge_actual_service
      pkt_sched: sch_qfq: fix the update of eligible-group sets
      pkt_sched: sch_qfq: serve activated aggregates immediately if the scheduler is empty
      pkt_sched: sch_qfq: prevent budget from wrapping around after a dequeue
      pkt_sched: sch_qfq: do not allow virtual time to jump if an aggregate is in service
      pkt_sched: sch_qfq: remove a useless invocation of qfq_update_eligible

Paul Bolle (8):
      netfilter: nfnetlink: silence warning if CONFIG_PROVE_RCU isn't set
      ARM: SPEAr13xx: Fix typo "ARCH_HAVE_CPUFREQ"
      m68k: drop "select EMAC_INC"
Re: [GIT PULL] arm-soc fixes for v3.9-rc3
On Mon, Mar 18, 2013 at 7:22 AM, Arnd Bergmann wrote:
>
> are available in the git repository at:
>
>   git+ssh://gitol...@ra.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git
> tags/fixes

What the heck happened to your script? Please use the public address
so that others could look at it if they want to (and so that my merge
messages make sense in a public setting).

I fixed it up, but please fix your script for next time.

Linus
Re: Regression: Screen turns off when booting in EFI mode
This is apparently still outstanding, and Mantas hadn't cc'd the
people involved with the commit itself.

Background: with UEFI, commit f9a37be0f02a ("x86: Use PCI setup data")
apparently results in a black screen for Mantas.

The commit reverts fairly easily (there's been a trivial change to the
function since due to dev->rom now being a proper phys_addr_t), and
considering that the commit doesn't explain what the f*ck it is needed
for, or what it would help, I'm inclined to do just that.

Trusting firmware-provided values over the things we can find
ourselves is known to be fundamentally crap, so what the hell is the
point of that commit in the first place? The likelihood that firmware
messes up is pretty damn high. Why would we take idiotic "here's the
PCI ROM" data from firmware in the first place? What did this fix? We
know what it broke..

Doing things like blindly trusting the firmware data without even
validating it is a really really bad idea. The commit actually looks
seriously broken in other ways too, like blindly doing phys_to_virt()
on that, and then trusting the result.

Mantas, mind changing that "pcibios_add_device()" function so that
instead of setting dev->rom/romlen, it just prints out the values
(including the device address)? Please also make it print out the
"data->len" field in addition to the rom->xyz fields..

Linus

On Sat, Mar 9, 2013 at 1:42 PM, Mantas Mikulėnas wrote:
> On 2013-02-22 03:03, Mantas Mikulėnas wrote:
>> On 2013-02-22 01:54, Dave Airlie wrote:
>>>> | radeon :01:00.0: No connectors reported connected with modes
>>>> | [drm] Cannot find any crtc or sizes - going 1024x768
>>>>
>>>> The connector is definitely connected, since this is a laptop with a
>>>> built-in screen...
>>>
>>> Can you get the log with drm.debug=6 from both boots as well?
>>
>> Attached.
>
> The log is also at http://nullroute.eu.org/tmp/2013/dmesg-drm-debug.txt
>
> Not to be annoying, but I hope this can be fixed until 3.9...
>
> (I just tested v3.9-rc1-278-g8343bce, and it still does not detect any
> displays. And if I understood it correctly, "nomodeset" is going to go
> away?)
>
> --
> Mantas Mikulėnas
Re: Regression: Screen turns off when booting in EFI mode
On Tue, Mar 19, 2013 at 10:09 AM, Linus Torvalds wrote:
>
> Doing things like blindly trusting the firmware data without even
> validating it is a really really bad idea. The commit actually looks
> seriously broken in other ways too, like blindly doing phys_to_virt()
> on that, and then trusting the result

Ok, looks like the only thing filling it in is eboot.c, and I guess it
relies on the EFI memory allocations having been mapped. Which they
hopefully have been.

Still, even that seems somewhat debatable. eboot.c does a plain
memcpy() on the pci->romimage returned by
EfiPciIoAttributeOperationGet. And I can *guarantee* that that doesn't
work on some PCI chips that end up sharing the decoder for the ROM and
the graphics aperture or other device oddities. Afaik, some Radeons do
that, for example.

So whoever wrote that eboot thing seems to assume that the world is a
lot simpler and saner than it actually is, and that everybody
magically got things right. Which they never do. The code was
presumably tested on just a couple of machines.

The problem (well, at least *one* problem) is that pci_map_rom()
actually knows about some of these issues, but if dev->rom and
dev->romlen have been set, it trusts them unconditionally. So we'd
either need to fix that, or we need to be really *really* sure that we
only set dev->rom to guaranteed-correct buffers.

At least the radeon code seems to verify that the ROM image starts
with 0x55/0xaa, but I'm guessing we can't do that in general, even if
it is the traditional PC rom pattern.

We only have a few users of "pci_map_rom()", I'm wondering if we can
move the "dev->rom/romlen" cases into the callers. Then the callers
could decide if they want to trust that "pseudo-shadowed" ROM image
(which would test that 55/aa pattern for example), or whether they
want to try to map the actual physical ROM.

Linus
Re: Regression: Screen turns off when booting in EFI mode
On Tue, Mar 19, 2013 at 12:59 PM, Matthew Garrett wrote:
>
> Because it's the only way to get the PCI ROM in some cases, like on
> pretty much all Apples with Radeons. Only using it if we have no other
> options probably makes sense, though. Something like this (entirely
> untested)?

This looks reasonable. Mantas?

Trusting the firmware-provided image when we can't find the actual HW
image is quite reasonable. It's the "trust firmware unconditionally"
part that gets my goat.

Linus
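One way to picture the "hardware first, firmware copy only as a fallback" policy, combined with the 0x55/0xAA check mentioned earlier in the thread, is the sketch below. It is not Matthew's patch: `struct rom_image`, `rom_signature_ok` and `choose_rom` are hypothetical, and the thread itself doubts that the signature test can be applied in general, so treat it as one possible caller-side policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a candidate ROM image; data may be NULL. */
struct rom_image {
    const uint8_t *data;
    size_t len;
};

/* Traditional PC option-ROM signature: first two bytes are 0x55, 0xAA. */
static bool rom_signature_ok(const struct rom_image *rom)
{
    assert(rom != NULL);
    return rom->data != NULL && rom->len >= 2 &&
           rom->data[0] == 0x55 && rom->data[1] == 0xAA;
}

/*
 * Prefer the image read from the device itself; use the firmware-provided
 * copy only if the hardware image is unusable and the copy passes the
 * same check. Returns NULL when neither candidate looks valid.
 */
static const struct rom_image *choose_rom(const struct rom_image *from_hw,
                                          const struct rom_image *from_firmware)
{
    assert(from_hw != NULL && from_firmware != NULL);
    if (rom_signature_ok(from_hw))
        return from_hw;
    if (rom_signature_ok(from_firmware))
        return from_firmware;
    return NULL;
}
```

The ordering is the design choice that matters here: the firmware copy is never trusted unconditionally, only consulted when nothing better is available.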
Re: ipc,sem: sysv semaphore scalability
On Wed, Mar 20, 2013 at 12:55 PM, Rik van Riel wrote:
>
> This series makes the sysv semaphore code more scalable,
> by reducing the time the semaphore lock is held, and making
> the locking more scalable for semaphore arrays with multiple
> semaphores.

The series looks sane to me, and I like how each individual step is
pretty small and makes sense.

It *would* be lovely to see this run with the actual Swingbench
numbers. The microbenchmark always looked much nicer. Do the
additional multi-semaphore scalability patches on top of Davidlohr's
patches help with the swingbench issue, or are we still totally
swamped by the ipc lock there?

Maybe there were already numbers for that, but the last swingbench
numbers I can actually recall were from before the finer-grained
locking..

And obviously, getting this tested so that there aren't any more
missed wakeups etc would be lovely. I'm assuming the plan is that this
all goes through Andrew? Do we have big semop users who could test it
on real loads? Considering that I *suspect* the main users are things
like Oracle etc, I'd assume that there's some RH lab or partner or
similar that is interested in making sure this not only helps, but
also that it doesn't break anything ;)

Linus
Re: ipc,sem: sysv semaphore scalability
On Wed, Mar 20, 2013 at 1:49 PM, Linus Torvalds wrote:
>
> It *would* be lovely to see this run with the actual Swingbench
> numbers. The microbenchmark always looked much nicer. Do the
> additional multi-semaphore scalability patches on top of Davidlohr's
> patches help with the swingbench issue, or are we still totally
> swamped by the ipc lock there?
>
> Maybe there were already numbers for that, but the last swingbench
> numbers I can actually recall were from before the finer-grained
> locking..

Ok, and if the spinlock is still a big deal even with the finer granularity, it might be interesting to hear if Michel's fast locks make a difference. I'm guessing that this series might actually make it easier/cleaner to do the semaphore locking using another lock, since the ipc_lock got split up and out...

I think Michel did it for some socket code too. I think that was fairly independent and was for netperf.

Linus
Re: [GIT PULL tip/core/urgent] Fix for hlist_entry_safe() regression
On Thu, Mar 21, 2013 at 7:22 AM, Paul E. McKenney wrote:
> [Reposting with corrected subject line.]
>
> Hello, Ingo,
>
> This series contains a single commit that fixes a regression in
> hlist_entry_safe().

..

You do realize that I already merged this a week ago directly?

(Merge commit f4846e52c517)

Linus
Re: [PATCH] rwsem: steal writing sem for better performance
On Wed, Feb 6, 2013 at 10:28 PM, Ingo Molnar wrote:
>
> Linus, Andrew, what is your thinking about the patch and about
> the timing of the patch?

Not for 3.8. Queue it for 3.9, with possibly a stable tag with a big comment "apply after much testing".

Linus
Re: [GIT PULL] f2fs fixes for v3.8-rc7
No.

You guys need to realize that I'm not taking crap like this this late. This is not major bugfixes. I already looked away once just because it's a new filesystem, but enough is enough. This is way way WAY too late to start sending "enhancements".

Seriously. Send them for the next merge window. Not just before rc7.

Linus

On Thu, Feb 7, 2013 at 11:21 AM, Jaegeuk Kim wrote:
> Hi Linus,
>
> Here are four patches which are critical bug fixes on f2fs, three
> enhancement patches, and a number of trivial patches.
> Please pull the following tag. Sorry for the late request.
>
> Thanks,
>
> The following changes since commit
> 6abb7c25775b7fb2225ad0508236d63ca710e65f:
>
>   Merge tag 'regulator-3.8-rc5' of
>   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
>   (2013-01-28 22:44:53 -0800)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
>   tags/f2fs-for-v3.8
>
> for you to fetch changes up to 1efc6d3277f59b764384781c0f8dfc821f229380:
>
>   f2fs: add compat_ioctl to provide backward compatability
>   (2013-02-06 17:38:59 +0900)
>
> f2fs fixes for v3.8
>
> [Major bug fixes]
> o Store device file information correctly
> o Fix -EIO handling with respect to power-off-recovery
> o Allocate blocks with global locks
> o Fix wrong calculation of the SSR cost
>
> [Enhancement]
> o Support (un)freeze_fs
> o Enhance the f2fs_gc flow
> o Support 32-bit binary execution on 64-bit kernel
>
> Alejandro Martinez Ruiz (1):
>       f2fs: fix disable_ext_identify option spelling
>
> Changman Lee (5):
>       f2fs: save device node number into f2fs_inode
>       f2fs: add un/freeze_fs into super_operations
>       f2fs: stop repeated checking if cp is needed
>       f2fs: remove repeated F2FS_SET_SB_DIRT call
>       f2fs: remove unnecessary gc option check and balance_fs
>
> Jaegeuk Kim (6):
>       f2fs: prevent checkpoint once any IO failure is detected
>       f2fs: cover global locks for reserve_new_block
>       f2fs: remove the use of page_cache_release
>       f2fs: avoid balanc_fs during evict_inode
>       f2fs: clarify and enhance the f2fs_gc flow
>       f2fs: fix calculation of max. gc cost in the SSR case
>
> Namjae Jeon (8):
>       f2fs: avoid redundant call to has_not_enough_free_secs in f2fs_gc
>       f2fs: reorganize code for ra_node_page
>       f2fs: fix typo mistake for data_version description
>       f2fs: name gc task as per the block device
>       f2fs: mark gc_thread as NULL when thread creation is failed
>       f2fs: make an accessor to get sections for particular block type
>       f2fs: optimize the return condition for has_not_enough_free_secs
>       f2fs: add compat_ioctl to provide backward compatability
>
> majianpeng (4):
>       f2fs: clean up the add_orphan_inode func
>       f2fs: add device name in debugfs
>       f2fs: use F2FS_BLKSIZE to judge bloksize and page_cache_size
>       f2fs: when check superblock failed, try to check another superblock
>
>  fs/f2fs/checkpoint.c | 63 +++---
>  fs/f2fs/debug.c | 4 +-
>  fs/f2fs/f2fs.h | 32 ++---
>  fs/f2fs/file.c | 35 ---
>  fs/f2fs/gc.c | 124 ++-
>  fs/f2fs/gc.h | 21 -
>  fs/f2fs/inode.c | 53 +-
>  fs/f2fs/node.c | 14 +++--
>  fs/f2fs/recovery.c | 4 +-
>  fs/f2fs/segment.c| 29
>  fs/f2fs/segment.h| 23 +++---
>  fs/f2fs/super.c | 92 +-
>  12 files changed, 262 insertions(+), 232 deletions(-)
>
> --
> Jaegeuk Kim
> Samsung
Re: [GIT] Networking
Pulled.

However, there's still the r8169 regressions (see the emails with the subject "regression: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out").

It's bisected, and a revert is reported to fix things. It's not in this pull request. Comments?

Linus

On Sat, Feb 9, 2013 at 7:17 AM, David Miller wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
>
> for you to fetch changes up to a1c83b054ebe1264ed9ae9d5c286f9eae68e60ea:
Re: kvmtool tree (Was: Re: [patch] config: fix make kvmconfig)
On Sat, Feb 9, 2013 at 1:55 AM, Ingo Molnar wrote:
>
> I'll remove it if Linus insists on it, but I think you guys are
> putting form before substance and utility :-(

No. Your pull requests are just illogical. I have yet to see a single reason why it should be merged.

I *thought* "ease of use" could be a reason, and then people posted five-liner scripts to give some of the other virtual boxes the same kind of interface.

Avoiding five lines of shell script is not a reason to pull a project into the kernel.

> tools/kvm/ is a useful utility to kernel development, that in
> just this past cycle was used as an incubator to:

That's total bullshit.

It could be useful whether it is merged into the kernel or not.

"git" is a hell of a lot more useful utility for kernel development, to the point that practically we couldn't do without it any more, and it isn't merged into the kernel. It's a separate project with a separate life, and it is *better* for it.

The same goes for all the tools that everybody uses every day for kernel development, often without even thinking about them.

So "utility to kernel development" is not a reason for merging it into the kernel. NOT AT ALL.

> *Please* don't try to harm useful code just for the heck of it.

Again, total *bullshit*.

Right now, the whole "attach the kvmtool to the kernel as a remora" doesn't make any sense at all, and not merging it doesn't harm anything AT ALL. Quite the reverse. The fact that kvmtool isn't available as a standalone project probably keeps people actively from using it. You can't just fetch kvmtool. You have to fetch the kernel and kvmtool, and if you're a kernel developer you either have to make a whole new kernel tree for it (which is stupid) or merge it into your normal kernel tree that has development that has nothing to do with kvmtool (which is stupid AND F*CKING INSANE)

> Please stop this silliness, IMO it's not constructive at all ...

Ingo, it's not us being silly, it is *you*.

What the heck is the advantage of merging it into the kernel? It has never ever been explained.

This is not like "perf", where there weren't any alternatives, and oprofile had shown over many many years that the situation wasn't improving: it was only getting worse, and more disconnected from the actual capabilities of the hardware.

But kvmtool is no "perf". There are other virtual boxes, and rather than being limited, they are more capable! The selling tool of kvmtool was never that it did something particularly magical, it was always that it did less, and was easier to use.

But that does not in any way mean "should be merged". You can do less and be easier to use and stand on your own legs.

So here, let me state it very very clearly: I will not be merging kvmtool. It's not about "useful code". It's not about the project keeping to improve. Both of those would seem to be *better* outside the kernel, where there isn't that artificial and actually harmful tie-in.

In other words, I don't see *any* advantage to merging kvmtool. I think merging it would be an active mistake, and would just tie two projects together that just shouldn't be tied together.

So nobody is "hurting useful code", except perhaps you.

Explain to me why I'm wrong. I don't think you can. You certainly haven't so far.

Linus
Linux v3.8-rc7
      : Provide dma_mmap_coherent() and dma_get_sgtable()
      blackfin: Provide dma_mmap_coherent() and dma_get_sgtable()
      c6x: Provide dummy dma_mmap_coherent() and dma_get_sgtable()
      cris: Provide dma_mmap_coherent() and dma_get_sgtable()
      frv: Provide dummy dma_mmap_coherent() and dma_get_sgtable()
      m68k: Provide dma_mmap_coherent() and dma_get_sgtable()
      mn10300: Provide dummy dma_mmap_coherent() and dma_get_sgtable()
      parisc: Provide dummy dma_mmap_coherent() and dma_get_sgtable()
      xtensa: Provide dummy dma_mmap_coherent() and dma_get_sgtable()

Glauber Costa (1):
      memcg: fix typo in kmemcg cache walk macro

H. Peter Anvin (1):
      x86, doc: Boot protocol 2.12 is in 3.8

Hans Verkuil (1):
      [media] radio: set vfl_dir correctly to fix modulator regression

Haojian Zhuang (1):
      drivers/rtc/rtc-pl031.c: fix the missing operation on enable

Hauke Mehrtens (2):
      bcma: unregister gpios before unloading bcma
      ssb: unregister gpios before unloading ssb

Heiko Carstens (1):
      atm/iphase: rename fregt_t -> ffreg_t

Ian Campbell (3):
      xen/netback: shutdown the ring if it contains garbage.
      xen/netback: free already allocated memory on failure in xen_netbk_get_requests
      netback: correct netbk_tx_err to handle wrap around.

Ilpo Järvinen (1):
      tcp: fix for zero packets_in_flight was too broad

Jan Beulich (2):
      x86-64: Replace left over sti/cli in ia32 audit exit code
      xen-pciback: rate limit error messages from xen_pcibk_enable_msi{,x}()

Jan Luebbe (1):
      drivers/rtc/rtc-isl1208.c: call rtc_update_irq() from the alarm irq handler

Jan Schmidt (1):
      Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata

Jason Wang (3):
      vhost_net: correct error handling in vhost_net_set_backend()
      vhost_net: handle polling errors when setting backend
      tuntap: allow polling/writing/reading when detached

Jesse Gross (1):
      openvswitch: Move LRO check from transmit to receive.

Jiri Olsa (1):
      perf: Fix event group context move

Joe Perches (1):
      checkpatch: fix $Float creation of match variables

Johan Hedberg (1):
      Bluetooth: Fix handling of unexpected SMP PDUs

Johannes Naab (1):
      netem: fix delay calculation in rate extension

Joonsoo Kim (1):
      tools/vm: add .gitignore to ignore built binaries

Josef Bacik (3):
      Btrfs: do not merge logged extents if we've removed them from the tree
      Btrfs: fix missing i_size update
      Btrfs: fix possible stale data exposure

Kirill A. Shutemov (1):
      thp: avoid dumping huge zero page

Kukjin Kim (1):
      pinctrl: exynos: change PINCTRL_EXYNOS option

Lan Tianyu (1):
      usb: Using correct way to clear usb3.0 device's remote wakeup feature.

Larry Finger (2):
      rtlwifi: Fix the usage of the wrong variable in usb.c
      rtlwifi: Fix scheduling while atomic bug

Lars Ellenberg (1):
      drbd: fix potential protocol error and resulting disconnect/reconnect

Linus Torvalds (1):
      Linux 3.8-rc7

Liu Bo (1):
      Btrfs: fix race between snapshot deletion and getting inode

Lucas Stach (1):
      net: usb: fix regression from FLAG_NOARP code

Luis Llorente Campo (1):
      USB: add OWL CM-160 support to cp210x driver

Marcelo Ricardo Leitner (1):
      ipv6: do not create neighbor entries for local delivery

Marek Szyprowski (1):
      regulator: max8998: fix incorrect min_uV value for ldo10

Matthew Daley (1):
      xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.

Matthias Brugger (1):
      MAINTAINERS: update avr32 web ressources

Matthieu CASTET (1):
      mtd: nand: onfi don't WARN if we are in 16 bits mode

Miao Xie (2):
      Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write()
      Btrfs: fix missing release of the space/qgroup reservation in start_transaction()

Michael S. Tsirkin (2):
      tun: fix carrier on/off status
      tcm_vhost: fix pr_err on early kick

Mike Marciniszyn (1):
      IB/qib: Fix for broken sparse warning fix

Mikko Tiihonen (1):
      drm/radeon: protect against div by 0 in backend setup

Milos Vyletel (1):
      bonding: unset primary slave via sysfs

Neil Horman (1):
      vmxnet3: set carrier state properly on probe

Nicholas Bellinger (5):
      target: Fix zero-length INQUIRY additional sense code regression
      target: Fix zero-length MODE_SENSE regression
      target: Fix zero-length READ_CAPACITY_16 regression
      target: Fix regression allowing unconfigured devices to fabric port link
      target: Fix divide by zero bug in fabric_max_sectors for unconfigured devices

Nickolai Zeldovich (1):
      drivers: xhci: fix incorrect bit test

Nivedita Singhvi (1):
      tcp: Increment LISTENOVERFLOW and LISTENDROPS in tcp_v4_conn_request()

Or Gerlitz (1):
      mlx4_core: Fix advertisement of wrong PF context behaviour

Paul Gortmaker (2):
      rcu: Prevent soft-lockup complaints about no-CBs CPUs
      rcu: Make rcu_nocb_poll an early_pa
Re: kvmtool tree (Was: Re: [patch] config: fix make kvmconfig)
On Sat, Feb 9, 2013 at 10:57 AM, Pekka Enberg wrote:
>
> And yes, you are absolutely correct that living in the kernel tree is
> suboptimal for the casual user. However, it's a trade-off to make
> tools/kvm *development* easier especially when you need to touch both
> kernel and userspace code.

Quite frankly, that's just optimizing for the wrong case.

The merged case seems to make sense for you and Ingo, and nobody else.

And then you wonder why nobody else wants to merge it. I've told you my reasons, you didn't give me *any* actual reasons for me to merge the code. NONE of your statements made any sense at all, since everything you talk about could have been done with a separate project.

The only thing the lock-step does is to generate the kind of dependency that I ABSOLUTELY DETEST, where one version of kvmtools goes along with one version of the kernel. That's a huge disadvantage (and we've actually seen signs of that in the perf tool too, where old versions of the tools have been broken, because the people working on them have been way too much in lock-step with the kernel it is used on).

And if it's not the case that they have to be used in lockstep, then clearly kvmtool developers could just as easily just have a separate git repository.

So you can't have it both ways. What's so wrong with just making it a separate project?

Linus
Re: kvmtool tree (Was: Re: [patch] config: fix make kvmconfig)
You do realize that none of your arguments touched the "why should Linus merge the tree" question at all?

Everything you said was about how it's more convenient for you and Ingo, not at all about why it should be better for anybody else. You haven't bothered to even try making it an external project, so it doesn't compile that way. You're the only one working on it, so being convenient for you is the primary issue. Arguments like that actively make me not want to merge it, because they are only arguments for you continuing to work the way you have, not arguments for why the project would make sense to merge into the main kernel repository.

So I think we should just remove this from linux-next, and be done with the fantasy that it makes sense to merge this. You're not even trying to convince anybody else about the merge making sense.

You might as well continue to work the way you do, and I don't mind - but none of it argues for me merging it into the kernel. There's no reason why kvmtool couldn't be external the way all the other virtualization projects are.

Linus

On Feb 9, 2013 2:01 AM, "Pekka Enberg" wrote:
>
> On Sat, Feb 9, 2013 at 2:45 AM, Linus Torvalds wrote:
> > Quite frankly, that's just optimizing for the wrong case.
>
> I obviously don't agree. I'm fairly sure there wouldn't be a kvmtool
> that supports x86, PPC64, ARM, and all the virtio drivers had we not
> optimized for making development for kernel folks easy.
>
> In fact that's something Ingo pushed for pretty hard early on and we
> also worked hard just to make the code 'feel familiar' to kernel folks.
> The assumption was that if we did that, we'd see contributions from
> people who would normally not write userspace code.
>
> On Sat, Feb 9, 2013 at 2:45 AM, Linus Torvalds wrote:
> > The merged case seems to make sense for you and Ingo, and nobody else.
>
> That's hardly surprising. I'm the only person who was crazy enough to
> listen to Ingo and follow through with the damn thing. So I either have
> the same experience and perspective as Ingo does on the matter - or I'm
> just as full of 'bullshit' as he is.
>
> On Sat, Feb 9, 2013 at 2:45 AM, Linus Torvalds wrote:
> > The only thing the lock-step does is to generate the kind of
> > dependency that I ABSOLUTELY DETEST, where one version of kvmtools
> > goes along with one version of the kernel.
>
> That is simply NOT TRUE. We have never done such a thing with 'kvmtool'
> nor do I have any evidence that 'perf' has done that either. I regularly
> run old versions to make sure that we stay that way.
>
> On Sat, Feb 9, 2013 at 2:45 AM, Linus Torvalds wrote:
> > So you can't have it both ways. What's so wrong with just making it a
> > separate project?
>
> Do you really think it's an option I have not considered several times?
>
> There are the immediate practical problems:
>
> - What code should we take with us from the Linux repository. If I take
>   just tools/kvm, it won't even build.
>
> - Where do we do our development? Right now we are using the KVM list
>   and are part of tip tree workflow. As a separate project, we need to
>   build the kind of infrastructure we already are relying on now.
>
> Then there are the long term issues:
>
> - How do we keep up with KVM and virtio improvements?
>
> - How do we ensure we get improvements that happened in the kernel
>   tree to our codebase for the code we share?
>
> - How do we make it easy for future KVM and virtio developers to
>   access our code?
>
> If you want perspective on this just ask Ingo sometime how he feels
> about making tools/perf a separate project (which I have actually done).
> Much of the *practical* aspects applies to tools/kvm.
>
> And really, I'm a practical kind of guy. Why do you think I'm willing to
> bang my head against the wall if spinning off as a separate project would
> be as simple as you seem to think it is?
>
> Pekka
Re: kvmtool tree (Was: Re: [patch] config: fix make kvmconfig)
On Sun, Feb 10, 2013 at 6:39 AM, Pekka Enberg wrote:
>
> The main argument for merging into the main kernel repository has always been
> that (we think) it improves the kernel because significant amount of
> development is directly linked to kernel code (think KVM ARM port here, for
> example). The secondary argument has been to make it easy for kernel
> developers to work on both userspace and kernel in tandem (like has happened
> with vhost drivers). In short: it speeds up development of Linux
> virtualization code.

Why? You've made this statement over and over and over again, and I've dismissed it over and over and over again because I simply don't think it's true.

It's simply a statement with nothing to back it up. Why repeat it?

THAT is my main contention. I told you why I think it's actually actively untrue. You claim it helps, but what is it about kvmtool that makes it so magically helpful to be inside the kernel repository? What is it about this that makes it so critical that you get the kernel and kvmtool with a single pull, and they have to be in sync? When you then at the same time claim that you make very sure that they don't have to be in sync at all. See your earlier emails about how you claim to have worked very hard to make sure they work across different versions.

So you make these unsubstantiated claims about how much easier it is, and they make no sense. You never explain *why* it's so magically easier. Is git so hard to use that you can't do "git pull" twice? And why would you normally even *want* to do git pull twice? 99% of the work in the kernel has nothing what-so-ever to do with kvmtool, and hopefully the reverse is equally true.

And tying into the kernel just creates this myopic world of only looking at the current kernel. What if somebody decides that they actually want to try to boot Windows with kvmtool? What if somebody tells you that they are really tired of Xen, and actually want to turn kvmtool into *replacement* for Xen instead? What if somebody wants to branch off their own work, concentrating on some other issue entirely, and wants to merge with upstream kvmtool but not worry about the kernel, because they aren't working on the Linux kernel at all, and their work is about something else?

I just don't think it makes sense. I don't see what the huge advantage of a single git tree is.

Anyway, I'm done arguing. You can do what you want, but just stop misrepresenting it as being "linux-next" material unless you are willing to actually explain why it should be so.

Linus
Re: kvmtool tree (Was: Re: [patch] config: fix make kvmconfig)
On Mon, Feb 11, 2013 at 4:26 AM, Ingo Molnar wrote:
>
> If you are asking whether it is critical for the kernel project
> to have tools/kvm/ integrated then it isn't. The kernel will
> live just fine without it, even if that decision is a mistake.

You go on to explain how this helps kvmtool, and quite frankly, I DO NOT CARE.

Everything you talk about is about helping your work, by making the kernel maintenance be more. The fact that you want to use kernel infrastructure in kvmtool is a great example: you may think it's a great thing, but for the kernel it's just extra work, and extra layers of abstraction etc etc.

And then you make it clear that you haven't even *bothered* to try to make it a separate project.

Sorry, but with that kind of approach, I get less and less interested. I think this whole tying together is a big mistake. It encourages linkages that simply shouldn't be there.

And no, perf is not the perfect counter-example. With perf, the linkages made sense! There's supposed to be deep linkages to profiling and event counting. There is ABSOLUTELY NOT supposed to be deep linkages with virtualization. Quite the reverse.

And no, I don't want to maintain the mess that is both. There's just no gain, and lots of potential pain.

Linus
Re: kvmtool tree (Was: Re: [patch] config: fix make kvmconfig)
On Mon, Feb 11, 2013 at 5:18 AM, David Woodhouse wrote:
>
> That's complete nonsense. If you want to use pieces of the kernel
> infrastructure, then just *take* them. There are loads of projects which
> use the kernel config tools, for example. There's no need to be *in* the
> kernel repo.

Exactly. I do *not* want an abstraction layer just because somebody wants to use it. It causes idiotic guards in the header files etc. We already had that pain with the user-level header inclusions etc.

Just copy it.

Linus
Re: kvmtool tree (Was: Re: [patch] config: fix make kvmconfig)
On Mon, Feb 11, 2013 at 9:58 AM, Ingo Molnar wrote:
>
> So basically Pekka optimistically thought it's an eventual 'tit
> for tat', a constant stream of benefits to the kernel, in the
> hope of finding a home in the upstream kernel which would
> further help both projects. The kernel wants to keep the 'tit'
> only though.

Ingo, stop this idiotic nonsense.

You seem to think that "kvmtool is useful for kernel" is somehow relevant. IT IS TOTALLY IRRELEVANT.

"sparse" is useful for kernel development. "git" is useful for kernel development. "xterm" is useful for kernel development.

See a pattern? We have tons of tools that are helping develop the kernel, and for absolutely NONE of them is that at all an argument for merging them into the kernel.

If the Xen people came and asked me to merge their virtualizer code into the kernel, I'd call them idiots. Why is kvmtool so magical that you use this argument for merging it into the kernel? It makes no sense. Yet you continue to use it as if it was somehow an argument for merging it. Despite the hundreds of projects to the contrary.

So this whole "constant stream of benefits" you talk about is PURE AND UTTER GARBAGE. And that's not a commentary on whether it is true or not, it's a commentary on the fact that it is entirely irrelevant to whether something should be merged.

Merging two projects does not make them easier to maintain. Quite the reverse. It just ties the maintenance together in irrelevant ways.

Linus
Re: [tip:x86/mm] x86, mm: Use a bitfield to mask nuisance get_user() warnings
On Mon, Feb 11, 2013 at 5:37 PM, tip-bot for H. Peter Anvin wrote:
>
> However, we can declare a bitfield using sizeof(), which is legal
> because sizeof() is a constant expression. This quiets the warning,
> although the code generated isn't 100% identical from the baseline
> before 96477b4 x86-32: Add support for 64bit get_user():

Christ. This is so ugly that it's almost a work of art.

Has anybody run this past any gcc developers? And if so, did they run away screaming?

Linus
Re: [tip:x86/mm] x86, mm: Use a bitfield to mask nuisance get_user() warnings
On Mon, Feb 11, 2013 at 8:21 PM, H. Peter Anvin wrote:
> On 02/11/2013 07:33 PM, Linus Torvalds wrote:
>
>> Has anybody run this past any gcc developers? And if so, did they run
>> away screaming?
>
> I haven't no... H.J., any comments on this patch?

I'd be most worried about any known pitfalls about bitfield code generation. Looking at your code size numbers, it actually seems to *improve* code generation except for the odd i386.pae case (bigger code but also a different data size - odd) and i386 noconfig (different bss, bigger code).

The code/data changes make me wonder if the variable sometimes gets flushed to memory as an 8-byte entry, and maybe there are things gcc people can suggest..

But I don't see anything fundamentally wrong with it. Certainly it looks much better than the disgusting and warning-prone

    unsigned long long __val_gu8

thing.

Linus
Re: [tip:x86/mm] x86, mm: Use a bitfield to mask nuisance get_user() warnings
On Mon, Feb 11, 2013 at 8:42 PM, Linus Torvalds wrote:
>
> But I don't see anything fundamentally wrong with it. Certainly it
> looks much better than the disgusting and warning-prone
>
>    unsigned long long __val_gu8
>
> thing.

Oh. I just realized. That was your _baseline_ in the comparisons, wasn't it?

Can you please make the baseline be the current mainline git version of , not the first "unsigned long long __val_gu8" version of the 64-bit get_user()?

Because we should compare against the straightforward code, not the one that could have messed things up already..

Linus
Re: [patch for-3.8] fs, dlm: fix build error when EXPERIMENTAL is disabled
On Tue, Feb 12, 2013 at 1:50 AM, Steven Whitehouse wrote:
>
> That doesn't seem right to me... DLM has not been experimental for a
> long time now. Why not just select CRC32 in addition to IP_SCTP ?

Hmm. IP_SCTP already does a "select libcrc32c". So why doesn't that end up working?

Linus
Re: [tip:x86/mm] x86, mm: Use a bitfield to mask nuisance get_user() warnings
On Tue, Feb 12, 2013 at 8:38 AM, H.J. Lu wrote:
>
> Can you do something similar to what we did in glibc:

No. Because we use macros to be type-independent (i.e. "get_user()" works *regardless* of type), so casting to "uintptr_t" doesn't work. It throws away the type information, and truncates 64-bit values on 32-bit architectures.

The whole point of the bitmask thing is that it doesn't have that issue, and gets the size correct automatically. It's not pretty, but it allows the rest of the sources to be readable.

Linus
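The truncation problem described above can be modelled in a few lines. This is only a sketch in Python, not kernel code: `POINTER_BITS`, `cast_to_uintptr` and `bitfield_store` are hypothetical names, and the model assumes a 32-bit architecture where `uintptr_t` is 32 bits wide. It illustrates that forcing every value through a pointer-sized integer drops the upper half of a 64-bit value, while a field whose width is derived from the size of the pointed-to type (the `8 * sizeof` idea behind the bitfield) keeps each value at its own width.

```python
POINTER_BITS = 32  # assumption: 32-bit architecture, so uintptr_t is 32 bits


def cast_to_uintptr(value):
    """Model a C cast to uintptr_t: keep only the low POINTER_BITS bits."""
    return value & ((1 << POINTER_BITS) - 1)


def bitfield_store(value, type_size_bytes):
    """Model storing into a bitfield declared 8 * sizeof(type) bits wide."""
    width = 8 * type_size_bytes
    return value & ((1 << width) - 1)


if __name__ == "__main__":
    u64_value = 0x1122334455667788
    print(hex(cast_to_uintptr(u64_value)))    # upper 32 bits are lost
    print(hex(bitfield_store(u64_value, 8)))  # full 64-bit value survives
    print(hex(bitfield_store(0x1FF, 1)))      # a 1-byte type keeps only 8 bits
```

The width follows the type automatically, which is the property the macro needs: the caller never states a size, so the same code path works for 1-, 2-, 4- and 8-byte accesses.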
Re: Abysmal HDD/USB write speed after sleep on a UEFI system
On Mon, Feb 11, 2013 at 10:25 PM, Artem S. Tashkinov wrote:
> Hello Linus,
>
> I've already posted a bug report
> (https://bugzilla.kernel.org/show_bug.cgi?id=53551),
> a message to LKML
> (http://lkml.indiana.edu/hypermail/linux/kernel/1302.1/00837.html)
> and so far I've received zero response even though the bug is quite critical
> as it prevents
> me from using suspend altogether.
>
> I wonder if you could tell me who is responsible for this problem and who I
> need to CC in
> bugzilla.

According to your bugzilla it doesn't really seem to be strictly UEFI-specific, and it's hard to tell what subsystem is to blame. A few things to try to pinpoint:

 (a) Is it *only* write performance that suffers, or is it other performance too? Networking (DMA? Perhaps only writing *to* the network?)? CPU?

 (b) the fact that it apparently happens with both SATA and USB implies that it's neither, and is more likely something core like memory speed (mtrr, caching) or PCI (DMA, burst sizes, whatever).

 (c) can you find anything that changes over the suspend/resume? IOW, look at things like "lspci -vvxxx" before-and-after, and see what changed on the bridges leading to both things etc.

The performance drop sounds extreme enough that it sounds like caches got disabled or something, but that should show up as CPU performance in general being slow, not just writes to disk.

But basically, I think we need more clues about which sub-area is actually the culprit. My *guess* would be some core PCI thing not being initialized, but I don't see how you could even make PCI go that slow. Interrupt problems? DMA failures? I have no idea.

Has it ever worked? Suspend on desktop motherboards used to be quite spotty (nobody ever used it, manufacturers didn't care), but it generally has gotten better since people use it more these days..

Added lkml and Bjorn to the participants, in case anybody has any ideas..
Linus
Re: [tip:x86/mm] x86, mm: Use a bitfield to mask nuisance get_user() warnings
On Tue, Feb 12, 2013 at 9:14 AM, H. Peter Anvin wrote:
>
> No, I think what he is talking about is this bit:

Ok, I agree that the bitfield code actually looks cleaner.

That said, maybe gcc has an easier time using a few odd builtins and magic typeof's. But at least the bitfield trick looks half-way portable..

Linus
Re: [tip:x86/mm] x86, mm: Use a bitfield to mask nuisance get_user() warnings
On Tue, Feb 12, 2013 at 9:35 AM, H. Peter Anvin wrote:
>
> On the other hand, it still uses two gcc extensions: long long bitfields and
> typeof.
>
> I'll see what kind of code we get with the macro.

At least one thing to look out for is the poor LLVM people who are trying to make the kernel compile with that compiler.. We shouldn't make it arbitrarily harder for them, so *some* level of portability is a good idea.

Then there is icc, but I don't know how relevant that would ever be. At least LLVM has the potential to be widely available.

Of course, they may both already support even the odd gcc builtins - we already use a lot of the more straightforward ones...

Linus
Re: [tip:x86/mm] x86, mm: Use a bitfield to mask nuisance get_user() warnings
On Tue, Feb 12, 2013 at 10:25 AM, H. Peter Anvin wrote:
> I just thought up this variant, I'm about to test it, but H.J., do you
> see any problems with it?

Looks good to me. And we already use __builtin_choose_expr(), so it's "portable". And it should avoid all the potential issues with bitfields (rmk already pointed out how bitfields don't work well with the ARM model, who knows what other pitfalls bitfield code generation could have)

I wonder if we could/should eventually do some of the sizeof() in generic code - not have these magic things duplicated in all the architectures, just have the architectures specify the raw typed details (__copy_to_user_4() etc). So cross-platform portability could be a good thing. That's a separate discussion, though, and possibly not worth it.

Linus
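[ Editor's sketch, not part of the original mail. ] The size-based type selection discussed in this thread can be approximated in userspace as below. `__builtin_choose_expr` and `__typeof__` are GCC extensions that Clang also accepts. The macro name `INTTYPE` and the helper `roundtrip_u64` are hypothetical names chosen for illustration; this is a sketch of the idea, not the code from the patch under review.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick an integer type wide enough for x: unsigned long long when x is
 * larger than unsigned long, unsigned long otherwise. The choice happens
 * at compile time, so the caller's size information is not thrown away.
 */
#define INTTYPE(x) \
	__typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))

/* A 64-bit object must always get at least 64 bits of storage. */
static_assert(sizeof(INTTYPE(uint64_t)) >= sizeof(uint64_t),
	      "64-bit values must not be truncated");

/* Round-trip a 64-bit value through the selected temporary type. */
static uint64_t roundtrip_u64(uint64_t value)
{
	INTTYPE(value) tmp = value;

	return (uint64_t)tmp;
}
```

Unlike a cast to a pointer-sized integer, the temporary here is as wide as the object being fetched, on 32-bit and 64-bit targets alike.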
Re: Abysmal HDD/USB write speed after sleep on a UEFI system
On Tue, Feb 12, 2013 at 10:29 AM, Artem S. Tashkinov wrote:
> Feb 12, 2013 11:30:20 PM, Linus Torvalds wrote:
>>
>>A few things to try to pinpoint:
>>
>> (a) Is it *only* write performance that suffers, or is it other
>>performance too? Networking (DMA? Perhaps only writing *to* the
>>network?)? CPU?
>
> I've tested hdparm -tT --direct and the output on boot and after suspend
> is quite similar.
>
> I've also checked my network read/write speed, and it's the same
> ~ 100MBit/sec (I have no 1Gbit computers on my network
> unfortunately).

Ok. So it really sounds like just USB and HD writes. Which is quite odd, since they have basically nothing in common I can think of (except the obvious block layer issues).

>> (b) the fact that it apparently happens with both SATA and USB
>>implies that it's neither, and is more likely something core like
>>memory speed (mtrr, caching) or PCI (DMA, burst sizes, whatever).
>
> I've no idea, please, check my bug report where I've just added lots of
> information including a diff between on boot and after suspend.

I'm not seeing anything particularly interesting there. Except why/how did the MSI address/data change for the SATA controller? The irq itself hasn't changed.. There's probably some sane reason for that too (it's an odd encoding, maybe they code for the same thing), and there's nothing like that for USB, so...

And if it was irq problems, I'd expect you to see it more for reads than for writes anyway. Along with a few messages about missed irqs and whatever.

I'm stumped, and have no ideas. I can't even begin to guess how this would happen.

One thing to try is if it happens for all USB ports (you have multiple controllers) and I assume performance doesn't come back if you unplug and replug the USB disk..
Linus
Re: Debugging Thinkpad T430s occasional suspend failure.
On Tue, Feb 12, 2013 at 11:39 AM, Dave Jones wrote:
> My Thinkpad T430s suspend/resumes fine most of the time. But every so often
> (like one in ten times or so), as soon as I suspend, I get a black screen,
> and a blinking power button.
>
> (Note: Not the capslock lights like when we panic, this laptop 'conveniently'
> doesn't have those. This is the light surrounding the power button, which
> afaik
> isn't even OS controlled, so maybe we're dying somewhere in SMI/BIOS land?)

Yeah, the blinking power light is a feature of the chipset, the SMI code sets a magic bit in one of the registers and it will pulse a pin at a given frequency so that you get the "power light blinking while suspended" thing. So the suspend finished, and

> I tried debugging this with pm_trace, which told me..
>
> [4.576035] Magic number: 0:455:740
> [4.576037] hash matches drivers/base/power/main.c:645
>
> Which points me at..
>
> 642 Complete:
> 643 complete_all(&dev->power.completion);
> 644
> 645 TRACE_RESUME(error);
> 646
> 647 return error;
> 648 }

I suspect it's the last tracepoint, and the kernel thinks it successfully resumed all devices.

You *should* be able to match the magic number with the last device too, but that's only interesting if you get the hash matching *before* the device is resumed (ie you can try to figure out if the resume hung in the device resume list). And it only works if it gets a matching name on the dpm_list (see show_dev_hash), and it apparently didn't. I suspect it's some system device and not interesting, and you really just hit the last entry in the resume tree.

> The only thing interesting here I think is that this is the resume path.
> So perhaps something failed to suspend, and we tried to back out of
> suspending,
> but something was too screwed up to abort cleanly ?

Yes, the trace is definitely in the resume path. And maybe we have something

> I've tried hooking up a serial console, and even tried console_noblank,
> which yielded no additional info at all. (I'm guessing the consoles are
> suspended
> at the time of panic)

serial consoles and even nonblanking consoles seldom tend to work well for suspend debugging. It *has* happened, but it's rare.

> I also tried unloading all the modules I have loaded before the suspend, which
> seemed to reduce the chances of it happening, but eventually it reoccurred.
>
> Any ideas on how I can further debug this ?

The design of the TRACE_RESUME() thing really is as a really poor man's "printf()". IOW, the existing points are more "suggested starting points" than anything else, and the idea is that you can start adding more and more of them as you try to narrow down exactly where it fails..

And it's painful as hell. Plus add too many of them, and you get hash collisions etc. It's a last-ditch effort, but it exists mainly because we have never really figured out anything better.

There's a reason I've asked Intel for better CPU lockup tracing facilities for the last 10+ years ;)

Linus
Re: [tip:x86/mm] x86, mm: Redesign get_user with a __builtin_choose_expr hack
So this looks clean, but I noticed something (that was true even of the old 64-bit accesses)

On Tue, Feb 12, 2013 at 12:55 PM, tip-bot for H. Peter Anvin wrote:
> + register __inttype(*(ptr)) __val_gu asm("%edx");\

How does gcc even allow this? On x86-32, you cannot put a 64-bit value in %edx. Where do the upper bits go? It clearly cannot be %edx:%eax, since we put the error value in %eax.

So is the rule for x86-32 that naming "long long" register values names the first register, and the high bits go into the next one (I forget the crazy register numbering, I assume it's %ecx). Or what? This should have a comment.

Also, come to think of it, we have tried the "named register variables" thing before, and it has resulted in problems with scope. In particular, two variables within the same scope and the same register have been problematic. And it *does* happen, when you have things like

 /* copy_user */
 put_user(get_user(.., addr), addr2);

and then things go downhill.

Maybe we do not have these issues, but there are good reasons why we've tried very hard on x86 to avoid named register variables.

(I realize that they happen, and some other architectures don't even have good support for naming registers any other way so they are way more common there, so I probably worry needlessly, but it does worry me).

Linus
Re: [tip:x86/mm] x86, mm: Redesign get_user with a __builtin_choose_expr hack
On Tue, Feb 12, 2013 at 3:19 PM, H. Peter Anvin wrote:
>
> Yes, but there doesn't seem to be any other way to do this. gcc won't
> even allow "=cd" even if we know the variable is 64 bits, even though
> "=A" is documented to be equivalent to "=da".

No, "=da" means value "in edx _or_ %eax". Not the same as "A".

But you're right, there's nothing similar for %ebx:%ecx. I thought there was. I was really sure we did something special for 64-bit adc etc.

> Let me know what you think.

I guess we don't have any choice. And the other cleanups certainly look good.

Linus
Re: ipc,sem: sysv semaphore scalability
On Tue, Apr 2, 2013 at 9:08 AM, Sasha Levin wrote:
>
> If you guys are already looking at this, the conversions between size_t,
> long and int in the do_msgrcv/load_msg/alloc_msg code are a mess. You could
> trigger anything from:

Good catch.

Let's just change the "(long)bufsz < 0" into "bufsz > INT_MAX".

I suspect we should change some of the "int" arguments to "size_t" too so that we don't have these kinds of odd "different routines see different values due to subtle casting errors", but in the end we don't really want to ever help people have these kinds of potential overflow issues. We already limit normal read/write/sendmsg etc to INT_MAX (although we tend to *truncate* it to INT_MAX rather than return an error, but I think the simpler patch here is preferable unless somebody complains).

Comments?

Linus
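[ Editor's sketch, not part of the original mail. ] The difference between the two checks named above can be sketched in userspace. The function names are hypothetical and this is not the ipc/msg.c code; it only contrasts the two predicates. The assumption to keep in mind is that on an LP64 system `long` is wider than `int`, so a size just above INT_MAX is still positive as a `long` and slips past the sign check, even though it would go negative once stored in an `int` further down the call chain.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The comparison below only makes sense if size_t can exceed INT_MAX. */
static_assert(SIZE_MAX > INT_MAX,
	      "size_t must be able to hold values above INT_MAX");

/* Old-style check: only catches sizes that look negative as a long. */
static bool rejected_by_sign_check(size_t bufsz)
{
	return (long)bufsz < 0;
}

/* Proposed check: rejects anything that cannot be represented as an int. */
static bool rejected_by_int_max_check(size_t bufsz)
{
	return bufsz > INT_MAX;
}
```

With the INT_MAX check, every later conversion of the size to `int` is safe, regardless of how wide `long` happens to be on the target.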
Re: ipc,sem: sysv semaphore scalability
On Tue, Apr 2, 2013 at 9:08 AM, Sasha Levin wrote:
>
> By just playing with the 'msgsz' parameter with MSG_COPY set.

Hmm. Looking closer, I suspect you're testing without commit 88b9e456b164 ("ipc: don't allocate a copy larger than max"). That should limit the size passed in to prepare_copy -> load_copy to msg_ctlmax.

Now, I think it's possibly still a good idea to limit bufsz to INT_MAX regardless, but as far as I can see that prepare_copy -> load_copy path is the only place that can get confused. Everybody else uses size_t (or "long" in the case of r_maxsize) as far as I can tell.

Linus
af_unix udev startup regression
[ Fixed odd legacy subject line that has nothing to do with the actual bug ]

Hmm. Can you double-check and verify that reverting that commit makes things work again for you?

Also, what's your distribution and setup? I'd like this to get verified, just to see that it's not some timing-dependent thing or a bisection mistake, but if so, then the LSB test-cases obviously have to be fixed, and the commit that causes the problem needs to be reverted. Test-cases count for nothing compared to actual users.

Linus

On Thu, Apr 4, 2013 at 9:17 AM, Lai Jiangshan wrote:
> Hi, ALL
>
> I also encountered the same problem.
>
> git bisect:
>
> 14134f6584212d585b310ce95428014b653dfaf6 is the first bad commit
> commit 14134f6584212d585b310ce95428014b653dfaf6
> Author: dingtianhong
> Date: Mon Mar 25 17:02:04 2013 +
>
> af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
>
> SCM_SCREDENTIALS should apply to write() syscalls only either source or
> destination
> socket asserted SOCK_PASSCRED. The original implememtation in
> maybe_add_creds is wrong,
> and breaks several LSB testcases ( i.e.
> /tset/LSB.os/netowkr/recvfrom/T.recvfrom).
>
> Origionally-authored-by: Karel Srot
> Signed-off-by: Ding Tianhong
> Acked-by: Eric Dumazet
> Signed-off-by: David S. Miller
>
> :04 04 ef0356cc0fc168a39c0f94cff0ba27c46c4d0048 ae34e59f235c379f04d6145f0103cccd5b3a307a M net
>
> ===
> Like Brian Gerst, no obvious bug, but the system can't boot, "service udev
> start" fails when boot
> (also DEBUG_PAGEALLOC=n, I did not try to test with it=y)
>
> [ 11.022976] systemd[1]: udev-control.socket failed to listen on sockets: Address already in use
> [ 11.023293] systemd[1]: Unit udev-control.socket entered failed state.
> [ 11.182478] systemd-readahead-replay[399]: Bumped block_nr parameter of 8:16 to 16384. This is a temporary hack and should be removed one day.
> [ 14.473283] udevd[410]: bind failed: Address already in use
> [ 14.478630] udevd[410]: error binding udev control socket
> [ 15.201158] systemd[1]: udev.service: main process exited, code=exited, status=1
> [ 16.900792] udevd[427]: error binding udev control socket
> [ 18.356484] EXT4-fs (sdb7): re-mounted. Opts: (null)
> [ 19.738401] systemd[1]: udev.service holdoff time over, scheduling restart.
> [ 19.742494] systemd[1]: Job pending for unit, delaying automatic restart.
> [ 19.747764] systemd[1]: Unit udev.service entered failed state.
> [ 19.752303] systemd[1]: udev-control.socket failed to listen on sockets: Address already in use
> [ 19.770723] udevd[459]: bind failed: Address already in use
> [ 19.771027] udevd[459]: error binding udev control socket
> [ 19.771175] udevd[459]: error binding udev control socket
> [ 19.813256] systemd[1]: udev.service: main process exited, code=exited, status=1
> [ 19.914450] systemd[1]: udev.service holdoff time over, scheduling restart.
> [ 19.918374] systemd[1]: Job pending for unit, delaying automatic restart.
> [ 19.923392] systemd[1]: Unit udev.service entered failed state.
> [ 19.923808] systemd[1]: udev-control.socket failed to listen on sockets: Address already in use
> [ 19.943792] udevd[465]: bind failed: Address already in use
> [ 19.944056] udevd[465]: error binding udev control socket
> [ 19.944210] udevd[465]: error binding udev control socket
> [ 19.946071] systemd[1]: udev.service: main process exited, code=exited, status=1
> [ 20.047524] systemd[1]: udev.service holdoff time over, scheduling restart.
> [ 20.051939] systemd[1]: Job pending for unit, delaying automatic restart.
> [ 20.057539] systemd[1]: Unit udev.service entered failed state.
> [ 20.058069] systemd[1]: udev-control.socket failed to listen on sockets: Address already in use
> [ 20.081141] udevd[467]: bind failed: Address already in use
> [ 20.087120] udevd[467]: error binding udev control socket
> [ 20.092040] udevd[467]: error binding udev control socket
> [ 20.096519] systemd[1]: udev.service: main process exited, code=exited, status=1
> [ 20.184910] systemd[1]: udev.service holdoff time over, scheduling restart.
> [ 20.189863] systemd[1]: Job pending for unit, delaying automatic restart.
> [ 20.195440] systemd[1]: Unit udev.service entered failed state.
> [ 20.196012] systemd[1]: udev-control.socket failed to listen on sockets: Address already in use
> [ 20.220543] udevd[469]: bind failed: Address already in use
> [ 20.220584] udevd[469]: error binding udev control socket
> [ 20.220780] udevd[469]: error binding udev control socket
> [ 20.222830] systemd[1]: udev.service: main process exited, code=exited, status=1
> [ 20.323906] systemd[1]: udev.service holdoff time over, scheduling restart.
> [ 20.329170] systemd[1]: Job pending for unit, delaying automatic restart.
> [ 20.334785] systemd[1]: Unit udev.service entered failed state.
> [ 20.335318] systemd[1]:
Re: [PATCH] mm: prevent mmap_cache race in find_vma()
On Thu, Apr 4, 2013 at 11:35 AM, Hugh Dickins wrote:
>
> find_vma() can be called by multiple threads with read lock
> held on mm->mmap_sem and any of them can update mm->mmap_cache.
> Prevent compiler from re-fetching mm->mmap_cache, because other
> readers could update it in the meantime:

Ack. I do wonder if we should mark the unlocked update too some way (also in find_vma()), although it's probably not a problem in practice since there's no way the compiler can reasonably really do anything odd with it. We *could* make that an ACCESS_ONCE() write too just to highlight the fact that it's an unlocked write to this optimistic data structure.

Anyway, applied.

Linus
Re: [PATCH] mm: prevent mmap_cache race in find_vma()
On Thu, Apr 4, 2013 at 12:01 PM, Hugh Dickins wrote:
>
> When Paul reminded us of it yesterday, I came to wonder if actually
> every use of ACCESS_ONCE in the read form should strictly be matched
> by ACCESS_ONCE whenever modifying the location.
>
> My uneducated guess is that strictly it ought to, in the sense of
> insurance policy; but that (apart from that strange split writing
> issue which came up a couple of months ago) in practice our compilers
> have not "advanced" to the point of making this an issue yet.

I don't see how a compiler could reasonably really ever do anything different, but I do think the ACCESS_ONCE() modification version might be a good thing just as a "documentation".

This is a good example of this issue, exactly because we have a mix of both speculative cases (the find_vma() lookup and modification) together with strictly exclusive locked accesses to the same field (the ones that invalidate the cache under the write lock). So documenting that the write in find_vma() is this kind of "optimistic unlocked access" is actually a potentially interesting piece of information for programmers, completely independently of whether the compiler will then treat it really differently or not.

Of course, a plain comment would do the same, but would be less greppable.

And despite the verbiage here, I don't really have a very strong opinion on this. I'm going to let it go, and if somebody sends me a patch with a good explanation in the next merge window, I'll probably apply it.

Linus
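[ Editor's sketch, not part of the original mail. ] The pattern discussed in this thread, a single marked read of an optimistic cache plus a marked unlocked write, can be sketched in userspace as below. The `ACCESS_ONCE()` definition uses the GCC `__typeof__` extension. The `struct region`, `struct address_space` and `find_region()` names are hypothetical stand-ins, and the lookup is deliberately simpler than the real find_vma(); this is an illustration under those assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Force exactly one access through a volatile-qualified lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct region {			/* hypothetical stand-in for a vma */
	unsigned long start, end;
};

struct address_space {		/* hypothetical stand-in for an mm */
	struct region *regions;
	size_t nr_regions;
	struct region *cache;	/* optimistic, updated without exclusive lock */
};

static struct region *find_region(struct address_space *as, unsigned long addr)
{
	struct region *r;
	size_t i;

	assert(as != NULL);

	/* Read the cache pointer once; all checks use this snapshot only. */
	r = ACCESS_ONCE(as->cache);
	if (r && r->start <= addr && addr < r->end)
		return r;

	for (i = 0; i < as->nr_regions; i++) {
		r = &as->regions[i];
		if (r->start <= addr && addr < r->end) {
			/* Marked write: documents the unlocked cache update. */
			ACCESS_ONCE(as->cache) = r;
			return r;
		}
	}
	return NULL;
}
```

The marked write does not change what a sane compiler emits here; as the mail says, its value is mostly that a grep for `ACCESS_ONCE` finds every optimistic access to the field.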
Re: [GIT PULL] Sound fixes for 3.9-rc6
On Fri, Apr 5, 2013 at 12:46 AM, Takashi Iwai wrote:
>
> please pull sound fixes for v3.9-rc6 from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-linus

Argh, Takashi, you're usually so reliable...

But you actually meant for me to pull the sound-3.9 tag, didn't you? That "for-linus" branch isn't a signed tag..

Please double-check your scripts,

Linus
Re: GFS2: Pull request (fixes)
On Fri, Apr 5, 2013 at 9:27 AM, David Teigland wrote:
> On Fri, Apr 05, 2013 at 11:34:45AM +0100, Steven Whitehouse wrote:
>> Please consider pulling the following changes,
>
> There's some mixup here that should be cleared up first.
>
>> David Teigland (2):
>> GFS2: Fix unlock of fcntl locks during withdrawn state
>>
>> Steven Whitehouse (1):
>> GFS2: Fix unlock of fcntl locks during withdrawn state

Looks like the summary line for one got leaked through an email follow-up to the other. So now the summary of the second commit is meaningless and doesn't actually describe it.

Linus
Re: [PATCH v2] firmware,IB/qib: revert firmware file move
On Fri, Apr 5, 2013 at 11:15 AM, Mike Marciniszyn wrote:
> Commit e2eed58 ("IB/qib: change QLogic to Intel") moved a firmware file
> potentially breaking the ABI.

Please send things like this generated with the "-M" flag so that you can see it as a rename, instead of a huge add/del patch.

Sure, some people may still use traditional "patch", but catering to them when it actually hides what the patch does is just not worth it.

Linus
Re: [git pull] Please pull powerpc.git merge branch
On Mon, Jan 28, 2013 at 3:42 PM, Benjamin Herrenschmidt wrote:
>
> Whenever you have a chance between two dives, you might want to consider
> pulling my merge branch to pickup a few fixes for 3.8 that have been
> accumulating for the last couple of weeks (I was myself travelling
> then on vacation).

I'll have you know that I haven't quite even left for Au yet, and I have LCA before diving. So no snarky "in between dives" comments, please. At least not for a few days.

> git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git

Nothing there. Forgot to push? Or some unnamed branch/tag?

(And I _am_ leaving for the airport soon, so I may not get to it for a while unless you reply asap)

Linus
Re: BUG: circular locking dependency detected
On Thu, Jan 31, 2013 at 9:19 AM, Russell King wrote:
>
> So... what you seem to be telling me is that 3.9 is going to be a
> release which issues lockdep complaints when the console blanks, and
> you think that's acceptable?
>
> Adding Linus and Andrew so they're aware of this issue...

Oh, we're extremely aware of it. And it's not a new issue, the locking problems have apparently been around forever, although I'm not sure why the lockdep splat itself started happening only recently.

They'll make it into 3.9, it's 3.8 that won't have them. The patches initially caused way *worse* behavior than just a lockdep splat - they caused actual hard lockups (and that was *after* the initial series of fixes). That got fixed (hopefully for the last case!) fairly recently, and I'm not willing to take the scary patch-series that has had several problem cases.

Linus
Re: BUG: circular locking dependency detected
On Thu, Jan 31, 2013 at 11:13 AM, Russell King wrote:
>
> Which may or may not be a good thing depending how you look at it; it
> means that once your kernel blanks, you get a lockdep dump. At that
> point you lose lockdep checking for everything else because lockdep
> disables itself after the first dump.

Fair enough, we may want to revert the lockdep checking for console_lock, and make re-enabling it part of the patch-series that fixes the locking.

Daniel/Dave? Does that sound reasonable?

Linus
Re: [patch 00/40] CPU hotplug rework - episode I
On Fri, Feb 1, 2013 at 8:48 AM, Thomas Gleixner wrote:
>> Methinks Tejun needed a cc on this lot ;)
>
> Not really.

I think we want as many people as possible cc'd on this. You may think it's an obvious improvement, but maybe it's just because you now understand the code because you wrote it yourself, not because it's *actually* better.

Having some explicitly documented states may be nice, but do we need eleven of them? And do we want to expose them? At least not for the f*cking notifiers, I hope.

Notifiers are a disgrace, and almost all of them are a major design mistake. They all have locking problems, they introduce internal arbitrary API's that are hard to fix later (because you have random people who decided to hook into them, which is the whole *point* of those notifier chains).

Since the patches themselves weren't cc'd, I don't know if you actually made each state transition do those insane notifiers or not, but I seriously hope you didn't. With that many states, hopefully the idea is that you don't have any notifiers at all, and you just then call the people associated with a particular state directly. Yes? No?

Because if this adds tons of new notifiers, I'm going to say that we need about a hundred people signing off on the patches. Part of your explanation made me think you got rid of the notifiers, but then it became clear that you just renamed them as "state callbacks". If that's some generic exposed interface, I'll NAK it. No way in hell do we want to expose eleven states with some random generic "SMP state callback interface". F*ck no.

Linus
Re: [patch 00/40] CPU hotplug rework - episode I
On Fri, Feb 1, 2013 at 9:44 AM, Thomas Gleixner wrote:
>
> Just face it. The current hotplug maze has 100+ states which are
> completely undocumented. They are asymmetric vs. startup and
> teardown. They just exist and work somehow aside of the occasional
> hard to decode hiccup.
>
> Do you really want to preserve that state by all means [F*ck no]?

No. But I also don't want to replace it with "there's now eleven documented states, and random people hook into random documented states".

So for me it's the "expose these states" that I get worried about.. A random driver should not necessarily even be able to *see* this, and decide to be clever and take advantage of the ordering.

So I'd hope there would be some visibility restrictions. We currently have drivers already being confused by DOWN_PREPARE vs DOWN_FAILED etc etc random state transitions, and giving them even more flexibility to pick random states sounds like a really bad idea. I'd like to make sure that drivers and filesystems etc do not even *see* the states that are meant for the scheduler or workqueues, for example.

So 11 states (although some of those seem to have lots of substates, so there may be many more) is too many to *expose*. It's not necessarily too many to "have and document", if you see the difference.

Linus
Linux 3.8-rc6
      DM-RAID: Fix RAID10's check for sufficient redundancy

Kukjin Kim (1):
      pinctrl: samsung: removing duplicated condition for PINCTRL_SAMSUNG

Larry Finger (1):
      rtlwifi: Fix build warning introduced by commit a290593

Lee Jones (1):
      mfd: Fix compile errors and warnings when !CONFIG_AB8500_BM

Li RongQing (2):
      ah4/esp4: set transport header correctly for IPsec tunnel mode.
      ah6/esp6: set transport header correctly for IPsec tunnel mode.

Li Zhong (1):
      powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning for ppc32

Liam Girdwood (2):
      regulator: MAINTAINERS: update email address
      ASoC: MAINTAINERS: Update email address.

Lingzhu Xiang (1):
      efivarfs: Drop link count of the right inode

Linus Torvalds (1):
      Linux 3.8-rc6

Linus Walleij (2):
      mfd: db8500-prcmu: Fix irqdomain usage
      mfd: tc3589x: Use simple irqdomain

Maarten Lankhorst (2):
      x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES
      x86, efi: remove attribute check from setup_efi_pci

Mark Brown (8):
      ASoC: dapm: Fix sense of regulator bypass mode
      ASoC: wm5102: Correct AEC loopback mask
      ASoC: wm5110: Correct AEC loopback mask
      ASoC: arizona: Use actual rather than desired BCLK when calculating LRCLK
      ASoC: wm_adsp: Use GFP_DMA for things that may be DMAed
      mfd: arizona: Disable control interface reporting for WM5102 and WM5110
      mfd: arizona: Check errors from regcache_sync()
      mfd: wm5102: Fix definition of WM5102_MAX_REGISTER

Matt Fleming (5):
      efivarfs: Never return ENOENT from firmware
      efivarfs: Delete dentry from dcache in efivarfs_file_write()
      x86, efi: Set runtime_version to the EFI spec revision
      efi: Make 'efi_enabled' a function to query EFI facilities
      samsung-laptop: Disable on EFI hardware

Matthias Schiffer (3):
      batman-adv: fix skb leak in batadv_dat_snoop_incoming_arp_reply()
      batman-adv: check for more types of invalid IP addresses in DAT
      batman-adv: filter ARP packets with invalid MAC addresses in DAT

Michal Kubecek (1):
      xfrm: fix freed block size calculation in xfrm_policy_fini()

Michel Dänzer (1):
      drm/radeon: Enable DMA_IB_SWAP_ENABLE on big endian hosts.

Mike Snitzer (1):
      dm thin: fix queue limits stacking

Nathan Zimmer (1):
      efi, x86: Pass a proper identity mapping in efi_call_phys_prelog

Neil Horman (1):
      sctp: refactor sctp_outq_teardown to insure proper re-initalization

Nicholas Santos (1):
      HID: usbhid: quirk for Formosa IR receiver

Nickolai Zeldovich (2):
      3c574_cs: fix operator precedence between << and &
      net/xfrm/xfrm_replay: avoid division by zero

Nithin Nayak Sujir (2):
      tg3: Avoid null pointer dereference in tg3_interrupt in netconsole mode
      tg3: Fix crc errors on jumbo frame receive

Olivier Sobrie (3):
      can: c_can: fix invalid error codes
      can: ti_hecc: fix invalid error codes
      can: pch_can: fix invalid error codes

Or Gerlitz (1):
      net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults

Pablo Neira Ayuso (2):
      netfilter: xt_CT: fix unset return value if conntrack zone are disabled
      netfilter: nf_conntrack: fix BUG_ON while removing nf_conntrack with netns

Paul Moore (2):
      selinux: add the "attach_queue" permission to the "tun_socket" class
      tun: fix LSM/SELinux labeling of tun/tap devices

Peter Korsgaard (1):
      dm9601: support dm9620 variant

Piotr Haber (1):
      brcmsmac: increase timer reference count for new timers only

Pravin B Shelar (1):
      IP_GRE: Fix kernel panic in IP_GRE with GRE csum.

Rahul Sharma (1):
      drm/exynos: let drm handle edid allocations

Ralf Baechle (5):
      MIPS: BCM47xx: Enable SSB prerequisite SSB_DRIVER_PCICORE.
      MIPS: Export .
      MIPS: Add struct p_format to union mips_instruction.
      MIPS: PNX833x: Fix comment.
      MIPS: Octeon: Fix warning.

Randy Dunlap (1):
      x86/olpc: Fix olpc-xo1-sci.c build errors

Rob Herring (1):
      net: calxedaxgmac: throw away overrun frames

Romain KUNTZ (1):
      ipv6: fix header length calculation in ip6_append_data()

Sachin Kamat (4):
      drm/exynos: Make g2d_userptr_get_dma_addr static
      drm/exynos: Make ipp_handle_cmd_work static
      drm/exynos: Add missing static specifiers in exynos_drm_rotator.c
      drm/exynos: Make 'drm_hdmi_get_edid' static

Sean Paul (2):
      drm/exynos: Replace mdelay with usleep_range
      drm/exynos: Remove "internal" interrupt handling

Sergio Cambra (1):
      Bluetooth device 04ca:3008 should use ath3k

Seung-Woo Kim (1):
      drm/exynos: added validation of edid for vidi connection

Shawn Guo (1):
      ASoC: fsl: fix multiple definition of init_module

Shirish S (1):
      drm/exynos: add check for the device power status

Simon Guinot (1):
      pinctrl: mvebu: fix MPP6 value for kirkwood driver

Stanislaw Gruszka (2):
      mac80211
Re: [git pull] fbcon locking fixes.
On Thu, Jan 24, 2013 at 4:42 PM, Dave Airlie wrote: > > These patches have been sailing around long enough, waiting for a maintainer > to reappear, so I've decided enough is enough, lockdep is kinda useful to > have. Last this was tried, these patches failed miserably. They caused instant lockdep splat and then a total lockup with efifb. It may be that Takashi's patch helps fix that problem, but it's in no way clear that it does, so the patch series isn't at all obviously stable. Yes, lockdep is indeed "kinda useful", and there clearly are locking problems in fbdev. But I'm not seeing myself pulling these for 3.8. They've been too problematic to pull in at this late stage. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git pull] fbcon locking fixes.
On Thu, Jan 24, 2013 at 5:45 PM, Dave Airlie wrote: > > Okay I've just sent out another fbcon patch to fix the locking harder. > > There was a path going into set_con2fb_path if an fb driver was > already registered, I just pushed the locking out further on anyone > going in there. > > it boots on my EFI macbook here. Ok, good. Sounds like we'll finally get it fixed, but I'm still too much of a scaredy-cat to take it for now, so -next it is... Linus
Re: [GIT PULL] Btrfs fixes
On Thu, Jan 24, 2013 at 1:52 PM, Chris Mason wrote: > > Update on this, we've tracked down the crc errors and are doing final > checks on the patches. Linus are you planning on taking this pull? If > not I can just fold the new stuff into a bigger request. If you have them basically ready, add them to this, I haven't pulled yet. So I'll just ignore this and wait for another pull request. Linus
Re: [tip:x86/asm] x86/defconfig: Turn on CONFIG_CC_OPTIMIZE_FOR_SIZE=y in the 64-bit defconfig
On Sat, Jan 26, 2013 at 7:18 AM, H. Peter Anvin wrote: > On the CPUs Ling is testing on the downsides of -Os probably matter less, in > particular since rep movsb works well. > > It is questionable as a generic default, though. So being the person who really pushed for -Os to begin with (I think I$ and instruction decode bandwidth is one of the most fundamental limits to CPU performance), I wouldn't mind it if we reintroduced it. HOWEVER. It wasn't just "rep movs". The thing that killed -Os for me was that it makes it impossible to try to optimize hot code, because -Os seems to throw out branch prediction information. So when you use "likely()" etc to try to teach the compiler to lay out code a certain way so that code that never really gets executed isn't even brought into the I$, -Os then screws it up completely. Of course, maybe newer versions of gcc might not suck so horribly with -Os, I haven't actually tried in a while. [ Just tested. Still does it ] Also, I doubt Ling was testing a SB CPU. Because "rep movsb" still sucks pretty bad on SB. What core *is* Ling testing? Haswell? Ugh. We could make it depend on the optimization target. I'd also wish there was some way to just tune gcc -Os to be closer to reasonable. Or make -O2 not do some of the excessive crap it does (it aligns code *much* too much, for example - who cares if you can do it with a single instruction, if that instruction is so long that it uses up half your decode bandwidth?) The problem, of course, is that most -O2 code generation is done assuming hot loops that don't show much if any I$ issues. And the -Os thing is done *purely* for size, not taking any performance into account at all. There's no balanced middle ground, which is what _we_ would want. Linus
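For readers who have not used the "likely()" style of hint mentioned above, here is a minimal sketch of what it looks like in plain C. The macros follow the usual __builtin_expect pattern (a GCC/Clang builtin); the function sum_bytes() is a made-up example and is not taken from this thread or from the kernel.

```c
#include <stddef.h>

/* Hint macros in the usual __builtin_expect form (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum a buffer, treating a NULL buffer as the cold path. */
long sum_bytes(const unsigned char *buf, size_t len)
{
	long total = 0;
	size_t i;

	/* Hint that this branch is rarely taken, so the compiler is free to
	 * place the error return away from the hot loop. */
	if (unlikely(buf == NULL))
		return -1;

	for (i = 0; i < len; i++)
		total += buf[i];
	return total;
}
```

One way to see the effect discussed in the message is to compare the assembly from "gcc -O2 -S" and "gcc -Os -S" for a file like this. What the two outputs look like depends on the compiler version and target, so treat it as an experiment rather than a guaranteed result.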
Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing wrote: > I get below warning every day with 3.7, > one or two times per day. > > [ 2235.186027] WARNING: at > /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 > default_send_IPI_mask_logical+0x2f/0xb8() > [ 2235.186030] Hardware name: Aspire 4741 > [ 2235.186032] empty IPI mask > [ 2235.186079] [] native_send_call_func_ipi+0x4f/0x57 > [ 2235.186087] [] smp_call_function_many+0x191/0x1a9 > [ 2235.186097] [] native_flush_tlb_others+0x21/0x24 > [ 2235.186101] [] flush_tlb_page+0x63/0x89 > [ 2235.186105] [] ptep_set_access_flags+0x20/0x26 > [ 2235.186111] [] do_wp_page+0x234/0x502 > [ 2235.186121] [] handle_pte_fault+0x50d/0x54c > [ 2235.186148] [] handle_mm_fault+0xd0/0xe2 > [ 2235.186153] [] __do_page_fault+0x411/0x42d > [ 2235.186166] [] do_page_fault+0x8/0xa > [ 2235.186170] [] error_code+0x5a/0x60 > > This patch fix it. > > This patch also fix some system hang problem: > If the data->cpumask been cleared after pass > > if (WARN_ONCE(!mask, "empty IPI mask")) > return; > then the problem 83d349f3 fix will happen again. Hmm. We have very consciously tried to avoid the extra copy, although I'm not entirely sure why (it might possibly hurt on the MAXSMP configuration). See for example commit 723aae25d5cd ("smp_call_function_many: handle concurrent clearing of mask") which fixed another version of this problem. But I do agree that it looks like the copy is required, simply because - as you say - once we've done the "list_add_rcu()" to add it to the queue, we can have (another) IPI to the target CPU that can now see it and clear the mask. So by the time we get to actually send the IPI, the mask might have been cleared by another IPI. So I do agree that your patch seems correct, but I really really want to run it by other people. Guys? Original patch on lkml. 
The other possible fix might be to take the &call_function.lock earlier in generic_smp_call_function_interrupt(), so that we can never clear the bit while somebody is adding entries to the list... But I think it very much tries to avoid that on purpose right now, with only the last CPU responding to that IPI taking the lock. So copying the IPI mask seems to be the reasonable approach. Comments? Linus
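The "copy the mask" idea can be shown with a small user-space model. This is only a sketch of the pattern, assuming a toy 64-bit mask instead of the kernel's struct cpumask; send_with_snapshot(), publish_fn and racing_clear() are hypothetical names invented for illustration and are not the code from the patch under discussion.

```c
#include <stdint.h>

typedef uint64_t toy_cpumask_t;   /* toy stand-in: one bit per CPU */

struct call_data {
	toy_cpumask_t cpumask;    /* shared: target CPUs clear their own bit when done */
};

/* Hypothetical hook standing in for "the entry is now visible to other CPUs". */
typedef void (*publish_fn)(struct call_data *data);

/* Returns the mask that would be handed to the IPI-sending routine. */
toy_cpumask_t send_with_snapshot(struct call_data *data, toy_cpumask_t targets,
				 publish_fn publish)
{
	toy_cpumask_t snapshot;

	data->cpumask = targets;
	snapshot = data->cpumask;   /* private copy, taken before publishing */

	publish(data);              /* from here on data->cpumask may change under us */

	return snapshot;            /* non-empty whenever targets was non-empty */
}

/* Simulates a target CPU that reacts to an earlier IPI and clears the mask at once. */
void racing_clear(struct call_data *data)
{
	data->cpumask = 0;
}
```

With racing_clear() as the publish hook, the shared mask ends up empty while the returned snapshot still names the original targets, which is the situation the "empty IPI mask" warning was catching. The cost is the extra copy that the message says the code had been trying to avoid.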
Re: [GIT PULL] parisc updates for 3.9
On Fri, Feb 22, 2013 at 1:16 PM, Helge Deller wrote: > > git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git > parisc-3.9 In general, I'd love to also get a short human-readable explanation of what the pull does for the merge message. As it is, I just made something up. Linus
Re: [PATCH 2/2] irq: Cleanup context state transitions in irq_exit()
On Sat, Feb 23, 2013 at 10:21 AM, Frederic Weisbecker wrote: > > But tick_nohz_irq_exit() may trigger the timer softirq itself. Suggestion: merge it with the whole softirq handler. The softirq code *already* knows about the whole "oops, one softirq may trigger another" issue, and has a loop - with protection against excess - for exactly this reason. See the whole "goto restart" thing. And tick_nohz_irq_exit() really has very similar semantics to softirq's, it's just "CPU is idle and no pending reschedule" instead of a softirq. But the basic rules are the same ("only run this at the top-level context when exiting the last irq"). So maybe the right thing to do is move the whole "goto restart" one level up, and do softirq's and tick_nohz_irq_exit both inside that loop. Linus
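A rough user-space model of the suggested structure, with softirq processing and the tick hook inside one bounded "goto restart" loop, might look like the following. Everything here is a toy: struct cpu_state, tick_exit_hook(), irq_exit_sketch() and the MAX_RESTART value are assumptions made for illustration, not the kernel's actual code.

```c
#define MAX_RESTART 10   /* arbitrary bound chosen for this sketch */

struct cpu_state {
	unsigned int pending;   /* bitmask of pending softirqs (toy model) */
	int idle;               /* CPU is idle with no pending reschedule */
	int tick_work_left;     /* how many times the tick hook will raise a softirq */
	int softirqs_run;       /* counters so the behaviour can be observed */
	int tick_calls;
};

static void run_softirqs(struct cpu_state *s)
{
	while (s->pending) {
		s->pending &= s->pending - 1;   /* "handle" the lowest pending bit */
		s->softirqs_run++;
	}
}

/* Stand-in for tick_nohz_irq_exit(): it may raise the timer softirq itself. */
static void tick_exit_hook(struct cpu_state *s)
{
	s->tick_calls++;
	if (s->tick_work_left > 0) {
		s->tick_work_left--;
		s->pending |= 1u;
	}
}

/* Returns 0 when everything drained, 1 if we gave up after MAX_RESTART passes. */
int irq_exit_sketch(struct cpu_state *s)
{
	int max_restart = MAX_RESTART;

restart:
	run_softirqs(s);
	if (s->idle)
		tick_exit_hook(s);
	if (s->pending) {
		if (--max_restart)
			goto restart;
		return 1;   /* the kernel would hand the leftover to ksoftirqd */
	}
	return 0;
}
```

The point of the single loop is that a softirq raised by the tick hook gets handled by the same "protection against excess" logic as a softirq raised by another softirq, instead of needing a second special case.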
Re: [git pull] signal.git
On Wed, Feb 20, 2013 at 2:52 PM, Al Viro wrote: > * a bunch of signal-related syscalls (both native and compat) unified. Ok, in the meantime I had merged the parisc and powerpc trees, which had their own fixes in this area: powerpc added the transactional memory support for power8 (which impacted signal save/restore), and parisc had some fixes to the routines you then removed in favor of generic ones. I fixed up the conflicts, and they didn't look that bad, but I could easily have messed something up, so people - please double-check the end result. Linus
Re: [GIT PULL] KVM updates for the 3.9 merge window
On Wed, Feb 20, 2013 at 5:17 PM, Marcelo Tosatti wrote: > > Please pull from > > git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.9-1 > > to receive the KVM updates for the 3.9 merge window [..] Ok, particularly the s390 people should check my resolution of the conflicts, since they include the renaming of IOINT_VIR to IRQIO_VIR. But the uapi header file move should be double-checked by people who use this too. Linus
Re: [git pull] drm merge for 3.9-rc1
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie wrote: > > So up front, this has a massive merge conflict in > drivers/gpu/drm/radeon/evergreen_cs.c I've fixed it up in drm-next-merged > in the same tree, I fixed up some small ordering issues in my merge as > well, however they aren't important if you want the fun of doing a major > conflict resolution. I did the fun conflict resolution, so my tree doesn't have the ordering changes. I also did some things slightly differently from you - you had left some direct ib[] accesses that I spotted (see for example "case 0x48" (aka "Copy L2T Frame to Field")), and yours apparently has a few cases where you use "idx_value" instead of my mindless conflict resolution that just re-did the brute-force "replace direct ib[] read accesses with the radeon_get_ib_value() helper function". But you don't do it for *all* the radeon_get_ib_value(p, idx+2) users, so whatever. Anyway - my conflict resolution isn't exactly the same as yours, and maybe I screwed something up. But it's damn close, and the differences _seem_ to all be benign. Btw, why is it ok that some functions still read the ib[] array directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg() etc)? Whatever. I prefer doing my own resolutions just so that I know what's going on, and it all seems to build and looks reasonable, but it's always good to get a second opinion. Particularly since I can't actually test the radeon stuff, so just eyeballing it and saying "looks semantically identical to Dave's resolution" may not be 100% sufficient.. Linus
Re: [GIT PULL] Load keys from signed PE binaries
On Mon, Feb 25, 2013 at 7:28 PM, Matthew Garrett wrote: > > You're happy advising Linux vendors that they don't need to worry about > module signing because it's "not obvious" that Microsoft would actually > enforce the security model they've spent significant money developing > and advertising? And you're happy shilling for a broken model? The fact is, the only valid user for the whole security model is to PROTECT THE USER. Your arguments constantly seem to miss that rather big point. You think this is about bending over when MS whispers sweet nothings in your ear.. The whole and only reason I ever merged module signatures is because it actually allows *users* to do a good job at security. You, on the other hand, seem to have drunk the cool-aid on the whole "let's control the user" crap. Did you forget what security was all about? Linus
Re: [GIT PULL] Load keys from signed PE binaries
On Mon, Feb 25, 2013 at 7:42 PM, Matthew Garrett wrote: > > The user Microsoft care about isn't running Linux How f*cking hard is it for you to understand? Stop arguing about what MS wants. We do not care. We care about the *user*. You are continually missing the whole point of security, and then you make some idiotic arguments about what MS wants you to do. It's irrelevant. The only thing that matters is what our *users* want us to do, and protecting *their* rights. As long as you seem to treat this as some kind of "let's please MS, not our users" issue, all your arguments are going to be crap. Linus
Re: [GIT PULL] Load keys from signed PE binaries
On Mon, Feb 25, 2013 at 7:48 PM, Matthew Garrett wrote: > > Our users want to be able to boot Linux. If Microsoft blacklist a > distribution's bootloader, that user isn't going to be able to boot > Linux any more. How does that benefit our users? How does bringing up an unlikely and bogus scenario - and when people call you on it, just double down on it - help users? Stop the fear mongering already. So here's what I would suggest, and it is based on REAL SECURITY and on PUTTING THE USER FIRST instead of your continual "let's please microsoft by doing idiotic crap" approach. So instead of pleasing microsoft, try to see how we can add real security: - a distro should sign its own modules AND NOTHING ELSE by default. And it damn well shouldn't allow any other modules to be loaded at all by default, because why the f*ck should it? And what the hell should a microsoft signature have to do with *anything*? - before loading any third-party module, you'd better make sure you ask the user for permission. On the console. Not using keys. Nothing like that. Keys will be compromised. Try to limit the damage, but more importantly, let the user be in control. - encourage things like per-host random keys - with the stupid UEFI checks disabled entirely if required. They are almost certainly going to be *more* secure than depending on some crazy root of trust based on a big company, with key signing authorities that trust anybody with a credit card. Try to teach people about things like that instead. Encourage people to do their own (random) keys, and adding those to their UEFI setups (or not: the whole UEFI thing is more about control than security), and strive to do things like one-time signing with the private key thrown out entirely. IOW try to encourage *that* kind of "we made sure to ask the user very explicitly with big warnings and create his own key for that particular module" security. Real security, not "we control the user" security. Sure, users will screw that up too. 
They'll want to load crazy nvidia binary modules etc crap. But make it *their* decision, and under *their* control, instead of trying to tell the world about how this should be blessed by Microsoft. Because it really shouldn't be about MS blessings, it should be about the *user* blessing kernel modules. Quite frankly, *you* are what the key-hating crazies were afraid of. You peddle the "control, not security" crap-ware. The whole "MS owns your machine" is *exactly* the wrong way to use keys. Linus
Re: [GIT PULL] Load keys from signed PE binaries
On Mon, Feb 25, 2013 at 8:23 PM, Matthew Garrett wrote: > > If the user has explicitly enrolled a hash then they're stepping outside > the trust model. This is the kind of totally bogus crap that no sane person should ever spout. Stop it. If the user has explicitly enrolled a hash, then that should be the *primary* trust model, dammit. That should be very much what you should care about first and foremost, and that should be your goal in life. That's when the user says "I'm in control of my own machine, and I want to trust *this*". It's not about "stepping outside of the trust model". Quite the reverse. It's about actually being *part* of the trust model, and taking control of your own machine. It's the *good* scenario. It's what you should encourage users to do. No, it likely can't be the default because we shouldn't expect users to care enough, but on the other hand the default should definitely *not* be "enable random third party modules signed indirectly by MS", which is what your crazy world-view seems to be. So the first order should be: "we provide modules to cover all normal users". You use the RH key for that. The *second* order should be: "we encourage and tell people how to add their own keys and sign modules they trust". The third order should probably be "we encourage people to use random one-time keys - probably with UEFI key checking turned off entirely, because let's face it, that doesn't really add any real security for most people". It's what kernel developers and most servers would probably want to use. They likely don't do the whole UEFI crap anyway, and random one-time keys are actually better against things like rootkits etc than *any* centrally administered chain of trust. Only somewhere really really deep down should the "ok, what about a MS signature" thing be. It could be part of the user-level application (part of your distribution) that displays the "are you really sure you want to load this module with an unrecognized signature? 
I can tell that it has a MS signature on it". But by the time you get this far, you've already failed the first few normal levels. Linus
Re: [GIT PULL] PCI changes for v3.9
On Sat, Feb 23, 2013 at 6:49 PM, Yinghai Lu wrote: > > Please check if attached diff is right, and hope it could save Linus some > time. Hmm. I did things a bit differently, moving things around more in drivers/acpi/internal.h. Also, my *gut* feel is that the new _handle_hotplug_event_root() function should do that whole dance with acpi_scan_lock_acquire()/acpi_scan_lock_release(), but I didn't really know if it's required or appropriate, so I left it alone. Could you take a look? Linus
Re: [GIT PULL] ACPI and power management fixes for v3.9-rc1
On Mon, Feb 25, 2013 at 7:17 PM, Rafael J. Wysocki wrote: > > I wonder if this went unnoticed or there's anything wrong with it or it just > needs to wait for some more time? Just going through things slowly. It's merged in my tree now. Oh, and a request: _please_ don't use unknown TLA's like OPP. This has become a huge problem, to the point that we have a "Documentation/power/opp.txt" file THAT NEVER CLEARLY STATES WHAT THE F*CK OPP ACTUALLY MEANS! What nice "documentation". Ok, I can look up things like this and find that it is "Operating Performance Points". At least in this context. But no, it's not some kind of generic standard, and no, it's not something people should be expected to know in general. Please stop doing "explanations" of things that use TLA's like this. And people shouldn't have to even wonder. Linus
Re: bug in generic strncpy_from_user
On Tue, Feb 26, 2013 at 4:57 AM, Heiko Carstens wrote: > > I was wrong. -EFAULT will be returned, however the vma will grow nevertheless. > Which in turn will cause trouble. Ok. We should fix that too. The whole "access just past the end of the previous vma" should never cause the stack above to expand. The guard page at least gives people a SIGSEGV, but one of the main reasons for the guard page was actually to make sure that new "mmap()" calls do not create mappings just under the stack (in addition to the obvious SIGSEGV when you then access into that thing). So while part of the meaning of the guard page is to get that SIGSEGV, that part is "for safety". And apparently it works. But at the same time, there is absolutely no reason to ever expand the stack only to hit the guard page _anyway_, so if the stack expansion will cause the requested address to be in the guard page, then the stack expansion should just have failed. I think the problem is that we add the guard page *after* we do the normal "let's try to expand" logic. I'll take a look. Linus
Re: bug in generic strncpy_from_user
On Tue, Feb 26, 2013 at 7:51 AM, Linus Torvalds wrote: > > I think the problem is that we add the guard page *after* we do the > normal "let's try to expand" logic. > > I'll take a look. Ahh, no. The guard page logic happens later at the fault time. We do this in two phases - first "find_extend_vma()" does what the name claims, and then check_stack_guard_page() is done for the last-page case from within do_anonymous_page() when we actually touch the last page itself. But that's actually fine. We can simply make "find_extend_vma()" do the obvious "refuse to extend the vma all the way", because we will later allow the guard page to extend downwards to "touch" the mapping, but that uses separate logic. So the attached trivial patch seems to make perfect sense: It is totally untested, though. Does it work for you (and we should do the same thing for the grows-up case, obviously)? Linus patch.diff Description: Binary data
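The attachment itself is not reproduced in this archive, so the following is only a guess at the kind of check being described, written as a user-space toy. It assumes a grows-down stack, page-aligned mapping boundaries and a 4096-byte page; struct toy_vma and toy_expand_stack_down() are invented names, and this should not be read as the actual patch.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))

/* Toy stand-in for a vma: a half-open [start, end) address range. */
struct toy_vma {
	unsigned long start;
	unsigned long end;
};

/*
 * Try to grow a grows-down stack so that it covers addr.
 * prev is the mapping just below the stack, or NULL if there is none.
 * Returns 0 on success, -1 if the request is refused.
 */
int toy_expand_stack_down(struct toy_vma *stack, const struct toy_vma *prev,
			  unsigned long addr)
{
	unsigned long new_start = addr & TOY_PAGE_MASK;

	if (new_start >= stack->start)
		return 0;   /* already covered, nothing to do */

	/* Refuse to extend all the way down to the previous mapping: an access
	 * just past the end of prev would otherwise grow the stack only to
	 * land in what has to remain the guard gap. */
	if (prev && new_start < prev->end + TOY_PAGE_SIZE)
		return -1;

	stack->start = new_start;
	return 0;
}
```

In this model an access at prev->end + 0x10 is refused and leaves the stack untouched, while an access a page or more above the previous mapping still grows the stack as before. The grows-up case mentioned in the message would need the mirror-image check.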
Re: [GIT PULL] PCI changes for v3.9
On Mon, Feb 25, 2013 at 10:46 PM, Yinghai Lu wrote: > On Mon, Feb 25, 2013 at 9:19 PM, Linus Torvalds > wrote: >> >> Also, my *gut* feel is that the new _handle_hotplug_event_root() >> function should do that whole dance with >> acpi_scan_lock_acquire()/acpi_scan_lock_release(), but I didn't really >> know if it's required or appropriate, so I left it alone. Could you >> take a look? > > Yes, we need that for root bridge hot add path. > > for hot remove path, we already have lock acquire/release in > acpi_bus_hot_remove_device(). > > Please check attached patch for hot add path. Quite frankly, doing this in handle_root_bridge_insertion() doesn't match the pattern elsewhere. Elsewhere you also protected the whole acpi_get_name() lookup etc. Which is why I felt that it would make more sense to add this to _handle_hotplug_event_root(). But there may be good reasons why the root bridge case is different, and I don't have strong opinions, I just wanted people to look at this case. I'll let you and Bjorn sort it out... Linus
Re: [GIT PULL] ACPI and power management fixes for v3.9-rc1
On Tue, Feb 26, 2013 at 8:10 AM, Nishanth Menon wrote: > On 16:55-20130226, Rafael J. Wysocki wrote: >> >> It says that in "Introduction", but it would be clearer if the title of the >> doc was something like "Operating Performance Points (OPP) Library". >> Nishanth? > > Yes indeed. Will the following help? I can post it as an official patch > if the direction is proper Yes, this will definitely help. I didn't even find it in the introduction (Rafael is correct that it is indeed there), because it's hard to see when you don't know what to scan for and it's in a big block of text. I am also happy to note that it is in the Kconfig help and single-line description. Which wasn't true for the new SATA_ZPODD ("Zero Power ODD" - what the heck is ODD?) which was another new entry I wondered about. It turns out that ODD is an odd TLA for "Optical Disk Drive". I'm sure it makes perfect sense if you are a SATA person, but it sure doesn't for any normal human being, even otherwise highly technical ones. Aaron, Tejun, Jeff, can I ask you to also not use specialized TLA's without explaining them? Especially in help text and "documentation", it's very unhelpful to have TLA's that aren't common. We don't have to explain *all* TLA's, since there's a lot that really are rather widespread. But there's a big difference between something like CPU or TLB that have been in generic literature for decades, wrt OPP and ODD that are specialized terms used inside a very particular group and haven't been around for very long either. Linus