[patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 4)
Various little cleanups and commenting fixes. Fixed up the patchset so each one, incrementally, should give a properly compiling and running kernel. I'd still like Hugh to ack the anon/swap changes when he can find the time. It would be desirable to get at least one ack as to the overall problem and design of the fix (Martin's ack is just for the s390 changes at this stage). Meanwhile, can it go into -mm for wider testing, if it isn't too much trouble? Thanks, Nick -- SuSE Labs - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 1/3] mm: make read_cache_page synchronous
Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls.

I didn't have a great look down the call chains, but this appears to fix 7 possible use-before-uptodate cases in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible case of cleared data being overwritten by readpage in block2mtd. All of these depend on whether the filler is async and/or can return with a !uptodate page.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 drivers/mtd/devices/block2mtd.c |  3 --
 fs/afs/dir.c                    |  3 --
 fs/afs/mntpt.c                  | 11 +++-
 fs/cramfs/inode.c               |  3 +-
 fs/ecryptfs/mmap.c              | 11
 fs/ext2/dir.c                   |  3 --
 fs/freevxfs/vxfs_subr.c         |  3 --
 fs/minix/dir.c                  |  1
 fs/namei.c                      | 12 -
 fs/nfs/dir.c                    |  5
 fs/nfs/symlink.c                |  6
 fs/ntfs/aops.h                  |  3 --
 fs/ntfs/attrib.c                | 18 +-
 fs/ntfs/file.c                  |  3 --
 fs/ntfs/super.c                 | 30 +++-
 fs/ocfs2/symlink.c              |  7 -
 fs/partitions/check.c           |  3 --
 fs/reiserfs/xattr.c             |  4 ---
 fs/sysv/dir.c                   | 10
 fs/ufs/dir.c                    |  6
 fs/ufs/util.c                   |  6 +---
 include/linux/pagemap.h         | 11
 mm/filemap.c                    | 49 +++-
 mm/swapfile.c                   |  3 --
 24 files changed, 70 insertions(+), 144 deletions(-)

Index: linux-2.6/fs/afs/dir.c
===
--- linux-2.6.orig/fs/afs/dir.c
+++ linux-2.6/fs/afs/dir.c
@@ -187,10 +187,7 @@ static struct page *afs_dir_get_page(str
 	page = read_mapping_page(dir->i_mapping, index, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			afs_dir_check_page(dir, page);
 		if (PageError(page))
Index: linux-2.6/fs/afs/mntpt.c
===
--- linux-2.6.orig/fs/afs/mntpt.c
+++ linux-2.6/fs/afs/mntpt.c
@@ -77,13 +77,11 @@ int afs_mntpt_check_symlink(struct afs_v
 	}

 	ret = -EIO;
-	wait_on_page_locked(page);
-	buf = kmap(page);
-	if (!PageUptodate(page))
-		goto out_free;
 	if (PageError(page))
 		goto out_free;

+	buf = kmap(page);
+
 	/* examine the symlink's contents */
 	size = vnode->status.size;
 	_debug("symlink to %*.*s", size, (int) size, buf);
@@ -100,8 +98,8 @@ int afs_mntpt_check_symlink(struct afs_v
 	ret = 0;

- out_free:
 	kunmap(page);
+ out_free:
 	page_cache_release(page);
  out:
 	_leave(" = %d", ret);
@@ -184,8 +182,7 @@ static struct vfsmount *afs_mntpt_do_aut
 	}

 	ret = -EIO;
-	wait_on_page_locked(page);
-	if (!PageUptodate(page) || PageError(page))
+	if (PageError(page))
 		goto error;

 	buf = kmap(page);
Index: linux-2.6/fs/cramfs/inode.c
===
--- linux-2.6.orig/fs/cramfs/inode.c
+++ linux-2.6/fs/cramfs/inode.c
@@ -180,7 +180,8 @@ static void *cramfs_read(struct super_bl
 		struct page *page = NULL;

 		if (blocknr + i < devsize) {
-			page = read_mapping_page(mapping, blocknr + i, NULL);
+			page = read_mapping_page_async(mapping, blocknr + i,
+									NULL);
 			/* synchronous error? */
 			if (IS_ERR(page))
 				page = NULL;
Index: linux-2.6/fs/ext2/dir.c
===
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -161,10 +161,7 @@ static struct page * ext2_get_page(struc
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			ext2_check_page(page);
 		if (PageError(page))
Index: linux-2.6/fs/freevxfs/vxfs_subr.c
===
--- linux-2.6.orig/fs/freevxfs/vxfs_subr.c
+++ linux-2.6/fs/freevxfs/vxfs_subr.c
@@ -74,10 +74,7 @@ vxfs_get_page(struct address_space *mapp
 	pp = read_mapping_page(mapping, n,
Re: GPL vs non-GPL device drivers
On Wednesday February 14, [EMAIL PROTECTED] wrote:
> I am well aware of what Greg KH's position is, in fact he is the reason I started the whole rant. This is only a plea to the higher authorities. Linus, please save Linux!

Linus is not in any position to do anything. The die is cast.

You should speak to a lawyer.

The key issue is this: does combining your work with Linux create a derived work? If it does not, you have nothing to worry about. If it does, then maybe you should worry.

If someone who owns copyright in part of the Linux kernel that you are using decides that they think you have created a derived work, then they might bring this to your attention and ask you to abide by the conditions in the license under which you obtained the Linux kernel. If no suitable resolution can be found, they might take you to court for using their protected work without a valid license (the GPL becomes void if you breach its requirements).

And then the judge might or might not find against you. But it is very hard to know in advance how the judge will decide in a particular case. Hence the best advice is to speak to a lawyer. They have the best chance of advising you how to minimise your risk.

I hope that makes the situation clear enough.

NeilBrown
[patch 2/3] fs: buffer don't PageUptodate without page locked
__block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set.

However the next patch will require that SetPageUptodate always be called with the page locked. Simply don't bother setting the page uptodate in this case (it is unusual that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]

 fs/buffer.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.
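For readers unfamiliar with the check this patch removes from the write path, here is a minimal userspace sketch of the same idea: walk a circular list of per-page buffers and report whether all of them are uptodate. It is only an illustration; struct buf, its fields, and all_buffers_uptodate() are hypothetical stand-ins for struct buffer_head, buffer_uptodate() and b_this_page, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct buffer_head. */
struct buf {
    bool uptodate;          /* models buffer_uptodate(bh) */
    struct buf *this_page;  /* models bh->b_this_page; the list is circular */
};

/*
 * Return true only if every buffer on the circular list starting at
 * 'head' is uptodate. The do/while shape mirrors the removed loop:
 * the head is visited once and the walk stops when it comes back around.
 */
static bool all_buffers_uptodate(struct buf *head)
{
    struct buf *bh = head;

    assert(head != NULL);
    do {
        if (!bh->uptodate)
            return false;
        bh = bh->this_page;
    } while (bh != head);

    return true;
}
```

In the scheme described above, a check of this kind would be left to the read side, which holds the page lock when it decides to mark the page uptodate.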
[patch 3/3] mm: fix PageUptodate memorder
After running SetPageUptodate, preceding stores to the page contents to actually bring it uptodate may not be ordered with the store to set the page uptodate. Therefore, another CPU which checks that PageUptodate is true and then reads the page contents can get stale data.

Fix this by ensuring SetPageUptodate is always called with the page locked (except in the case of a new page that cannot be visible to other CPUs), and requiring PageUptodate be checked only when the page is locked.

To facilitate lockless checks, SetPageUptodate contains an smp_wmb to order preceding stores before the store to page flags, and a new PageUptodate_NoLock is introduced, which issues an smp_rmb after the page flags are loaded for the test. A DMA memory barrier is not required, because the driver / IO subsystem must bring that into order before telling the core kernel that the read has completed.

One thing I like about it is that it unifies the anonymous page handling with the rest of the page management, by marking anon pages as uptodate when they _are_ uptodate, rather than when our implementation requires that they be marked as such. Doing this let me get rid of the smp_wmb's in the page copying functions which, specially added for anonymous pages for a closely related issue, didn't quite match file backed page handling.

Convert core code to use PageUptodate_NoLock. Filesystems are unaffected thanks to the change to read_cache_page.
Signed-off-by: Nick Piggin [EMAIL PROTECTED]
Acked-by: Martin Schwidefsky [EMAIL PROTECTED]

 fs/splice.c                |  4 +--
 include/linux/highmem.h    |  4 ---
 include/linux/page-flags.h | 57 +
 mm/filemap.c               | 20 +++
 mm/hugetlb.c               |  2 +
 mm/memory.c                |  9 +++
 mm/page_io.c               |  2 -
 mm/swap_state.c            |  2 -
 8 files changed, 74 insertions(+), 26 deletions(-)

Index: linux-2.6/include/linux/highmem.h
===
--- linux-2.6.orig/include/linux/highmem.h
+++ linux-2.6/include/linux/highmem.h
@@ -57,8 +57,6 @@ static inline void clear_user_highpage(s
 	void *addr = kmap_atomic(page, KM_USER0);
 	clear_user_page(addr, vaddr, page);
 	kunmap_atomic(addr, KM_USER0);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }

 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
@@ -108,8 +106,6 @@ static inline void copy_user_highpage(st
 	copy_user_page(vto, vfrom, vaddr, to);
 	kunmap_atomic(vfrom, KM_USER0);
 	kunmap_atomic(vto, KM_USER1);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }

 #endif
Index: linux-2.6/include/linux/page-flags.h
===
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -126,16 +126,65 @@
 #define ClearPageReferenced(page)	clear_bit(PG_referenced, &(page)->flags)
 #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)

-#define PageUptodate(page)	test_bit(PG_uptodate, &(page)->flags)
-#ifdef CONFIG_S390
+static inline int PageUptodate(struct page *page)
+{
+	WARN_ON(!PageLocked(page));
+	return test_bit(PG_uptodate, &(page)->flags);
+}
+
+/*
+ * PageUptodate to be used when not holding the page lock.
+ */
+static inline int PageUptodate_NoLock(struct page *page)
+{
+	int ret = test_bit(PG_uptodate, &(page)->flags);
+
+	/*
+	 * Must ensure that the data we read out of the page is loaded
+	 * _after_ we've loaded page->flags and found that it is uptodate.
+	 * See SetPageUptodate() for the other side of the story.
+	 */
+	if (ret)
+		smp_rmb();
+
+	return ret;
+}
+
 static inline void SetPageUptodate(struct page *page)
 {
+	WARN_ON(!PageLocked(page));
+#ifdef CONFIG_S390
 	if (!test_and_set_bit(PG_uptodate, &page->flags))
 		page_test_and_clear_dirty(page);
-}
 #else
-#define SetPageUptodate(page)	set_bit(PG_uptodate, &(page)->flags)
+	/*
+	 * Memory barrier must be issued before setting the PG_uptodate bit,
+	 * so all previous writes that served to bring the page uptodate are
+	 * visible before PageUptodate becomes true.
+	 *
+	 * S390 is guaranteed to have a barrier in the test_and_set operation
+	 * (see Documentation/atomic_ops.txt).
+	 *
+	 * This memory barrier should not need to provide ordering against
+	 * DMA writes into the page, because the IO completion should really
+	 * be doing that.
+	 */
+	smp_wmb();
+	set_bit(PG_uptodate, &(page)->flags);
 #endif
+}
+
+static inline void SetNewPageUptodate(struct page *page)
+{
+	/*
+	 * S390 sets page dirty bit on IO operations, which is why it is
+	 * cleared in
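The ordering requirement behind this patch can be illustrated in userspace with C11 atomics. This is a sketch under stated assumptions rather than the kernel implementation: struct fake_page, fill_and_set_uptodate() and read_if_uptodate() are hypothetical names, and the release and acquire fences only play the role of smp_wmb() and smp_rmb() (the C11 fences are somewhat stronger than the kernel barriers they model).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES 64   /* small stand-in for a real page */

/* Hypothetical model of a page: contents plus an "uptodate" flag. */
struct fake_page {
    char data[PAGE_BYTES];
    atomic_bool uptodate;   /* models the PG_uptodate bit */
};

/*
 * Writer side: fill the contents first, then publish the flag.
 * The release fence keeps the stores to data[] ordered before the
 * store to the flag, which is the job smp_wmb() does in the patch.
 */
static void fill_and_set_uptodate(struct fake_page *p, const char *src, size_t len)
{
    assert(len <= PAGE_BYTES);
    memcpy(p->data, src, len);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->uptodate, true, memory_order_relaxed);
}

/*
 * Lockless reader side: test the flag, and only if it is set read the
 * contents. The acquire fence keeps the loads from data[] ordered after
 * the load of the flag, which is the job smp_rmb() does in the patch.
 */
static bool read_if_uptodate(struct fake_page *p, char *dst, size_t len)
{
    assert(len <= PAGE_BYTES);
    if (!atomic_load_explicit(&p->uptodate, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    memcpy(dst, p->data, len);
    return true;
}
```

Without the two fences, a reader on another CPU could in principle see the flag set and still copy out stale bytes, which is the failure mode the patch description is about.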
[PATCH] fix mempolicy's check on a system with memory-less-node take4
please ack if O.K.
-Kame
--
bind_zonelist() can create a zero-length zonelist if there is a memory-less node. This patch checks the length of the zonelist. If the length is 0, it returns -EINVAL.

Changelog: v3 -> v4
- changed the name of a temporary void * variable to error_code

Changelog: v2 -> v3
- removed ambiguous void * pointer usage.
- fixed warnings (misuse of PTR_ERR).

Changelog: v1 -> v2
- avoid extra pgdat scanning; it is not necessary.

Tested on ia64/NUMA with a memory-less node.

Signed-Off-By: KAMEZAWA Hiroyuki [EMAIL PROTECTED]

Index: linux-2.6.20/mm/mempolicy.c
===
--- linux-2.6.20.orig/mm/mempolicy.c	2007-02-13 15:14:13.0 +0900
+++ linux-2.6.20/mm/mempolicy.c	2007-02-15 16:11:17.0 +0900
@@ -144,7 +144,7 @@
 	max++;			/* space for zlcache_ptr (see mmzone.h) */
 	zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
 	if (!zl)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	zl->zlcache_ptr = NULL;
 	num = 0;
 	/* First put in the highest zones from all nodes, then all the next
@@ -162,6 +162,10 @@
 			break;
 		k--;
 	}
+	if (num == 0) {
+		kfree(zl);
+		return ERR_PTR(-EINVAL);
+	}
 	zl->zones[num] = NULL;
 	return zl;
 }
@@ -193,9 +197,10 @@
 		break;
 	case MPOL_BIND:
 		policy->v.zonelist = bind_zonelist(nodes);
-		if (policy->v.zonelist == NULL) {
+		if (IS_ERR(policy->v.zonelist)) {
+			void *error_code = policy->v.zonelist;
 			kmem_cache_free(policy_cache, policy);
-			return ERR_PTR(-ENOMEM);
+			return error_code;
 		}
 		break;
 	}
@@ -1667,7 +1672,7 @@
 	 * then zonelist_policy() will FALL THROUGH to MPOL_DEFAULT.
 	 */

-	if (zonelist) {
+	if (!IS_ERR(zonelist)) {
 		/* Good - got mem - substitute new zonelist */
 		kfree(pol->v.zonelist);
 		pol->v.zonelist = zonelist;
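The patch relies on the kernel's error-pointer convention so that one return value can carry either a valid pointer or a specific errno. A rough userspace sketch of that convention follows. The helpers err_ptr(), ptr_err() and is_err() are simplified re-implementations written for this illustration (the kernel's own versions are ERR_PTR, PTR_ERR and IS_ERR), and make_list() is a hypothetical analogue of bind_zonelist(), not the real function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095  /* error codes live in the top 4095 address values */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Decode the errno value back out of an error pointer. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer is really an encoded error. */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical analogue of bind_zonelist(): an empty result is reported
 * as -EINVAL and an allocation failure as -ENOMEM, so the caller can
 * tell the two apart instead of seeing NULL for both.
 */
static int *make_list(size_t count)
{
    int *list;

    if (count == 0)
        return err_ptr(-EINVAL);

    list = calloc(count, sizeof(*list));
    if (!list)
        return err_ptr(-ENOMEM);

    return list;
}
```

A caller would test the result with is_err() and pass the encoded error up unchanged, which is the shape of the MPOL_BIND hunk above.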
Re: GPL vs non-GPL device drivers
On Wed, Feb 14, 2007 at 10:46:13PM -0800, v j wrote:
> You don't get it do you. Our source code is meaningless to the Open Source community at large.

Linux supports entire _architectures_ of which there are single figures of people using it. What makes your hardware special ?

> We are only _using_ Linux.

If you're adding kernel modules, you're more than using Linux, you're developing _for_ linux. You're just choosing to keep the fruits of those labors to yourself.

> Just as we could have used VxWorks or OSE.

You could. But would you have had access to thousands of worldwide contributors making your code better? This is what you've missed out on with your current stance.

> Using our source code would not benefit anybody but our competitors.

This excuse has been given time and time again, and repeatedly been proven false. And as soon as one of your competitors makes their drivers open, guess which one gets 1000+ free developers working on their code ?

> Sure we could make our drivers open-source. This is a decision that is made FIRST when evaluating an OS. If we were required to make our drivers/HW open, we would just not have chosen Linux. It is as simple as that.

Please, revisit the 1990s. Read the cathedral and the bazaar.[1] Listen to MC Hammer. Realise the funky horror. Then when you're ready to revisit us with some points that haven't already been dismissed please post again. Until then, you're offering nothing new.

Dave

[1] Jesus, I'm recommending ESR texts, I must be desperate.

--
http://www.codemonkey.org.uk
Re: GPL vs non-GPL device drivers
On Wed, 2007-02-14 at 21:16 -0800, v j wrote:
> This is in reference to the following thread: http://lkml.org/lkml/2006/12/14/63
> I am not sure if this is ever addressed in LKML, but linux is _very_ popular in the embedded space. We (an embedded vendor) chose Linux 3 years back because of its lack of royalty model, robustness and availability of infinite number of open-source tools.

I think you have a bit of a misunderstanding... Linux is not royalty free. It is just that the royalty is not in the form of cash, but in the form of having to give your improvements back to the open source world. (This is basically paraphrasing the intent of the GPL; you can argue for hours about whether drivers are separate works or improvements, and I'm not interested in that debate. It has been debated to death before, and only lawyers will in the end be able to settle it on a case by case basis.)

If your mindset is "how much can I take take take without giving back back back" then personally I think you're sort of acting like a parasite in this context.
Re: GPL vs non-GPL device drivers
Ben Nizette wrote:
> v j wrote:
>> This is in reference to the following thread: http://lkml.org/lkml/2006/12/14/63
>> I am not sure if this is ever addressed in LKML, but linux is _very_ popular in the embedded space. We (an embedded vendor) chose Linux 3 years back because of its lack of royalty model, robustness and availability of infinite number of open-source tools.
>> [...]
>> However we have a worrying trend here. If at some point it becomes illegal to load our modules into the linux kernel, then it is unacceptable to us. We would have been better off choosing VxWorks or OSE 3 years ago when we made an OS choice. The fact that Linux is becoming more and more closed is very very alarming.
>
> Question to the world here: Distros make, as a matter of course, a series of modifications to the Linux Kernel so that their modules or features work. What stops VJ making a patchset which effectively s/EXPORT_SYMBOL_GPL/EXPORT_SYMBOL/g 's the kernel source then distributing that under the GPL? He then supplies his un-GPL'd modules to the world which just happen to only run on the modified kernel. I've read the GPL of course (IANAL though) and I can't see what this violates except the /spirit/ of the license.
>
> Don't get me wrong, I'm strongly against anyone doing what I just mentioned, I believe it to be immoral taking someone's GPL'd code and mangling it in such a way. I speak as an embedded developer myself whose company decided that running our code under Linux and distributing our code under the GPL was far preferable to running closed-source software on a closed-source platform.

The best bet would be to read up on lots of past discussions related to exactly these kinds of questions, then ask your Lawyer.

Rhetorical question: what stops me from taking somebody's copyrighted work, stripping the copyrights or falsely claiming to have a license to redistribute it, then selling it?

--
SUSE Labs, Novell Inc.
Re: GPL vs non-GPL device drivers
On Wednesday February 14, [EMAIL PROTECTED] wrote:
> On 2/14/07, Randy Dunlap [EMAIL PROTECTED] wrote:
>> We seem to have different definitions of open and closed.
>
> Open = 3rd party Linux drivers can be loaded.
> Closed = No third party Linux drivers can be loaded.

Loading a driver is not at issue. Anyone may load a driver. The issue is when you *distribute* a driver. If that driver is a derived work of the Linux kernel, then you may only distribute it under the terms of the GPLv2, which essentially means that you make the source code available - under the GPLv2 - to everyone you give the driver to.

How do you know if the driver is a derived work? Well, if it uses POSIX syscalls only, it isn't. (You can write USB drivers in user-space which do this.) If it uses symbols exported with EXPORT_SYMBOL_GPL, then the author of the code which provides those symbols thinks that the driver is a derived work. If it uses EXPORT_SYMBOL symbols, then it is less clear what people believe, though there are certainly some who believe it will still be a derived work.

But of course the person whose opinion really counts is the judge. So you need to get legal advice.

NeilBrown
Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
Nadia Derbey [EMAIL PROTECTED] writes:
> But, what do you do with Oracle that's asking maxfiles to be set to 0x1, while the default value might be enough for a system that's not running Oracle. I'm afraid that giving boot time values to the max_* tunables we will lose all the benefits from /proc (or /sys): it is impossible to anticipate what an OS will be used for. So allowing such things to be changed without having to reboot the machine is in my mind quite a powerful feature we should keep taking advantage of.

I'm not saying remove user space's ability to set the denial-of-service limits. I'm saying that if they need to be frequently changed, we need to update the defaults so they are higher.

There really is no cost in moving those values up and down; each is just an arbitrary integer used in comparisons. But if we can make a good guess that still catches runaway programs before they kill the machine but also allows more programs to work out of the box, we are in better shape.

Eric
Re: GPL vs non-GPL device drivers
On 2/15/07, Neil Brown [EMAIL PROTECTED] wrote:
> [..] then it is less clear what people believe

Another area where it is less clear what people believe is if you are distributing the module separately to the kernel, but, as I understand it, vj says he is not.

> But of course the person whose opinion really counts is the judge.

The judge's opinion only counts if you actually get to court and manage to put up a legal defense.

> So you need to get legal advice.

Or, ya know, you could take the moral/ethical advice that you're being a worm and stop now.

Trent
Re: GPL vs non-GPL device drivers
On Wednesday February 14, [EMAIL PROTECTED] wrote:
> You don't get it do you. Our source code is meaningless to the Open Source community at large. It is only useful to our tiny set of competitors that have nothing to do with Linux. The Embedded space is very specific. We are only _using_ Linux. Just as we could have used VxWorks or OSE. Using our source code would not benefit anybody but our competitors.

It would also benefit your *customers*. And you might find that providing such benefits increases the number of your customers.

NeilBrown
Re: [ANNOUNCE] GIT 1.5.0
Jakub Narebski [EMAIL PROTECTED] wrote:
> Junio C Hamano wrote:
>> - git-blame learned a new option, --incremental, that tells it to output the blames as they are assigned. A sample script to use it is also included as contrib/blameview.
>
> And there are an example GUI blameview (Perl GTK2) and an example Emacs module for incremental git-blame, both in the contrib/ area.

Not to mention the incremental blame viewer built into git-gui:

  git gui blame HEAD foo.c

--
Shawn.
Re: [PATCH 0/6] MODSIGN: Kernel module signing
On Wednesday 14 February 2007 20:13, Dave Jones wrote:
> I've not investigated it, but I hear rumours that suse has something similar.

Actually, no. We don't believe that module signing adds significant value, and it also doesn't work well with external modules. (The external modules we really care about are GPL ones; it gives us a way to update drivers without pushing out entirely new kernels.)

Cheers,
Andreas
Re: [PATCH 0/6] MODSIGN: Kernel module signing
On Wed, Feb 14, 2007 at 09:35:40PM -0800, Andreas Gruenbacher wrote:
> On Wednesday 14 February 2007 20:13, Dave Jones wrote:
>> I've not investigated it, but I hear rumours that suse has something similar.
>
> Actually, no. We don't believe that module signing adds significant value,

ok, then I was misinformed.

> and it also doesn't work well with external modules.

well, the situation for external modules is no worse than usual. They still work, they just aren't signed. Which from a distributor point of view, is actually a nice thing, as they stick out like a sore thumb in oops reports with (U) markers :)

> (The external modules we really care about are GPL ones; it gives us a way to update drivers without pushing out entirely new kernels.)

external modules still compile, and run just fine. The signed modules code doesn't prevent loading of them unless the user decides to do so with a special boot option (which is no different really than say, reducing the cap-bound sysctl to prevent module loading).

Dave

--
http://www.codemonkey.org.uk
[PATCH 0/6] atl1: bugfix, cleanup, enhancement
Jeff,

Please accept the following patchset for the atl1 network device driver.

* Drop unnecessary NET_PCI config
* Fix incorrect hash table address
* Read MAC address from register
* Remove unused define
* Add Attansic L1 device id to pci_ids
* Bump version number

This patchset contains changes to the following files.

 drivers/net/Kconfig          |  2 +-
 drivers/net/atl1/atl1_hw.c   | 37 +
 drivers/net/atl1/atl1_main.c |  5 ++---
 include/linux/pci_ids.h      |  1 +
 4 files changed, 25 insertions(+), 20 deletions(-)
[PATCH 1/6] atl1: drop NET_PCI from Kconfig
From: Jay Cliburn [EMAIL PROTECTED]

The atl1 driver doesn't need NET_PCI. Remove it from Kconfig. Noticed by Chad Sprouse.

Signed-off-by: Jay Cliburn [EMAIL PROTECTED]
Signed-off-by: Chris Snook [EMAIL PROTECTED]
---
 drivers/net/Kconfig |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0bb3c1e..1b624b4 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2350,7 +2350,7 @@ config QLA3XXX

 config ATL1
 	tristate "Attansic L1 Gigabit Ethernet support (EXPERIMENTAL)"
-	depends on NET_PCI && PCI && EXPERIMENTAL
+	depends on PCI && EXPERIMENTAL
 	select CRC32
 	select MII
 	help
[PATCH 2/6] atl1: fix bad ioread address
From: Al Viro [EMAIL PROTECTED]

An ioread32 statement reads the wrong address. Fix it.

Signed-off-by: Al Viro [EMAIL PROTECTED]
Signed-off-by: Jay Cliburn [EMAIL PROTECTED]
Signed-off-by: Chris Snook [EMAIL PROTECTED]
---
 drivers/net/atl1/atl1_hw.c |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/atl1/atl1_hw.c b/drivers/net/atl1/atl1_hw.c
index 08b2d78..e28707a 100644
--- a/drivers/net/atl1/atl1_hw.c
+++ b/drivers/net/atl1/atl1_hw.c
@@ -357,7 +357,7 @@ void atl1_hash_set(struct atl1_hw *hw, u32 hash_value)
 	 */
 	hash_reg = (hash_value >> 31) & 0x1;
 	hash_bit = (hash_value >> 26) & 0x1F;
-	mta = ioread32((hw + REG_RX_HASH_TABLE) + (hash_reg << 2));
+	mta = ioread32((hw->hw_addr + REG_RX_HASH_TABLE) + (hash_reg << 2));
 	mta |= (1 << hash_bit);
 	iowrite32(mta, (hw->hw_addr + REG_RX_HASH_TABLE) + (hash_reg << 2));
 }
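The read-modify-write in atl1_hash_set() picks one of two 32-bit hash registers from the top bit of the hash value and one bit within that register from the next five bits. The sketch below models this with a plain array instead of device registers, so it can run in userspace. struct hash_table and hash_set() are hypothetical names for this illustration, and the shift and mask operators are an assumption reconstructed from the hunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two 32-bit words model the 64-bit multicast hash table. */
struct hash_table {
    uint32_t word[2];
};

/*
 * Set the hash-table bit selected by 'hash_value':
 *   bit 31       selects the word (0 or 1),
 *   bits 30..26  select the bit inside that word (0..31).
 * Read and write go to the same word, which is the property the fix
 * restores for the real register accesses.
 */
static void hash_set(struct hash_table *t, uint32_t hash_value)
{
    uint32_t hash_reg = (hash_value >> 31) & 0x1;
    uint32_t hash_bit = (hash_value >> 26) & 0x1F;

    assert(t != NULL);
    t->word[hash_reg] |= (UINT32_C(1) << hash_bit);
}
```

In the driver, the word index is turned into a byte offset with hash_reg << 2 and added to the mapped register base; the bug was that the read used the struct pointer itself as that base.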
[PATCH 3/6] atl1: read MAC address from register
From: Jay Cliburn [EMAIL PROTECTED]

On some Asus motherboards containing the L1 NIC, the MAC address is written by the BIOS directly to the MAC register during POST, and is not stored in eeprom. If we don't succeed in fetching the MAC address from eeprom or spi, try reading it directly from the MAC register. Suggested by Xiong Huang.

And do some cleanup while we've got the hood up...

Signed-off-by: Jay Cliburn [EMAIL PROTECTED]
Signed-off-by: Chris Snook [EMAIL PROTECTED]
---
 drivers/net/atl1/atl1_hw.c | 35 ---
 1 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/atl1/atl1_hw.c b/drivers/net/atl1/atl1_hw.c
index e28707a..314dbaa 100644
--- a/drivers/net/atl1/atl1_hw.c
+++ b/drivers/net/atl1/atl1_hw.c
@@ -243,14 +243,8 @@ static int atl1_get_permanent_address(struct atl1_hw *hw)
 		i += 4;
 	}

-/*
- * The following 2 lines are the Attansic originals.  Saving for posterity.
- *	*(u32 *) &eth_addr[2] = LONGSWAP(addr[0]);
- *	*(u16 *) &eth_addr[0] = SHORTSWAP(*(u16 *) &addr[1]);
- */
-	*(u32 *) &eth_addr[2] = swab32(addr[0]);
-	*(u16 *) &eth_addr[0] = swab16(*(u16 *) &addr[1]);
-
+	*(u32 *) &eth_addr[2] = swab32(addr[0]);
+	*(u16 *) &eth_addr[0] = swab16(*(u16 *) &addr[1]);
 	if (is_valid_ether_addr(eth_addr)) {
 		memcpy(hw->perm_mac_addr, eth_addr, ETH_ALEN);
 		return 0;
@@ -281,17 +275,28 @@ static int atl1_get_permanent_address(struct atl1_hw *hw)
 		i += 4;
 	}

-/*
- * The following 2 lines are the Attansic originals.  Saving for posterity.
- *	*(u32 *) &eth_addr[2] = LONGSWAP(addr[0]);
- *	*(u16 *) &eth_addr[0] = SHORTSWAP(*(u16 *) &addr[1]);
- */
-	*(u32 *) &eth_addr[2] = swab32(addr[0]);
-	*(u16 *) &eth_addr[0] = swab16(*(u16 *) &addr[1]);
+	*(u32 *) &eth_addr[2] = swab32(addr[0]);
+	*(u16 *) &eth_addr[0] = swab16(*(u16 *) &addr[1]);
 	if (is_valid_ether_addr(eth_addr)) {
 		memcpy(hw->perm_mac_addr, eth_addr, ETH_ALEN);
 		return 0;
 	}
+
+	/*
+	 * On some motherboards, the MAC address is written by the
+	 * BIOS directly to the MAC register during POST, and is
+	 * not stored in eeprom.  If all else thus far has failed
+	 * to fetch the permanent MAC address, try reading it directly.
+	 */
+	addr[0] = ioread32(hw->hw_addr + REG_MAC_STA_ADDR);
+	addr[1] = ioread16(hw->hw_addr + (REG_MAC_STA_ADDR + 4));
+	*(u32 *) &eth_addr[2] = swab32(addr[0]);
+	*(u16 *) &eth_addr[0] = swab16(*(u16 *) &addr[1]);
+	if (is_valid_ether_addr(eth_addr)) {
+		memcpy(hw->perm_mac_addr, eth_addr, ETH_ALEN);
+		return 0;
+	}
+
 	return 1;
 }
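The register fallback boils down to two steps: assemble six address bytes from a 32-bit and a 16-bit register value, then accept the result only if it is a plausible unicast address. The following userspace sketch shows both steps with explicit shifts instead of the pointer casts and swab32()/swab16() calls in the patch. It assumes the byte order those calls produce on a little-endian host; mac_from_regs() and mac_is_valid() are hypothetical helpers for illustration, with mac_is_valid() approximating what is_valid_ether_addr() checks (not all zeros, not multicast).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Build a MAC address from two register values: 'hi' supplies bytes 0-1
 * and 'lo' supplies bytes 2-5, most significant byte first.
 */
static void mac_from_regs(uint32_t lo, uint16_t hi, uint8_t mac[ETH_ALEN])
{
    assert(mac != NULL);
    mac[0] = (uint8_t)(hi >> 8);
    mac[1] = (uint8_t)(hi & 0xFF);
    mac[2] = (uint8_t)(lo >> 24);
    mac[3] = (uint8_t)((lo >> 16) & 0xFF);
    mac[4] = (uint8_t)((lo >> 8) & 0xFF);
    mac[5] = (uint8_t)(lo & 0xFF);
}

/*
 * Reject the all-zero address and any multicast address (low bit of the
 * first byte set, which also covers the broadcast address).
 */
static bool mac_is_valid(const uint8_t mac[ETH_ALEN])
{
    bool all_zero = true;

    for (size_t i = 0; i < ETH_ALEN; i++) {
        if (mac[i] != 0)
            all_zero = false;
    }
    return !all_zero && (mac[0] & 0x01) == 0;
}
```

A register that the BIOS never programmed would typically read back as all zeros or all ones, and both are rejected by the validity check, so the function in the patch can still fall through to its failure return.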
[PATCH 4/6] atl1: remove unused define
From: Chris Snook [EMAIL PROTECTED]

Remove unused define from atl1_main.c.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Signed-off-by: Jay Cliburn [EMAIL PROTECTED]
---
 drivers/net/atl1/atl1_main.c |  1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index 6655640..abce97e 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -82,7 +82,6 @@

 #include "atl1.h"

-#define RUN_REALTIME 0
 #define DRIVER_VERSION "2.0.6"

 char atl1_driver_name[] = "atl1";
[PATCH 5/6] atl1: add L1 device id to pci_ids, then use it
From: Chris Snook [EMAIL PROTECTED]

Add device id for the Attansic L1 chip to pci_ids.h, then use it.

Signed-off-by: Chris Snook [EMAIL PROTECTED]
Signed-off-by: Jay Cliburn [EMAIL PROTECTED]
---
 drivers/net/atl1/atl1_main.c |    2 +-
 include/linux/pci_ids.h      |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index abce97e..09f3375 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -99,7 +99,7 @@ MODULE_VERSION(DRIVER_VERSION);
  * atl1_pci_tbl - PCI Device ID Table
  */
 static const struct pci_device_id atl1_pci_tbl[] = {
-	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, 0x1048)},
+	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1)},
 	/* required last entry */
 	{0,}
 };
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 68a7be9..bd21933 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2067,6 +2067,7 @@
 #define PCI_DEVICE_ID_TDI_EHCI		0x0101
 
 #define PCI_VENDOR_ID_ATTANSIC		0x1969
+#define PCI_DEVICE_ID_ATTANSIC_L1	0x1048
 
 #define PCI_VENDOR_ID_JMICRON		0x197B
 #define PCI_DEVICE_ID_JMICRON_JMB360	0x2360
[PATCH 6/6] atl1: bump version number
From: Jay Cliburn [EMAIL PROTECTED]

Bump the version number.

Signed-off-by: Jay Cliburn [EMAIL PROTECTED]
---
 drivers/net/atl1/atl1_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index 09f3375..6567348 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -82,7 +82,7 @@
 
 #include "atl1.h"
 
-#define DRIVER_VERSION "2.0.6"
+#define DRIVER_VERSION "2.0.7"
 
 char atl1_driver_name[] = "atl1";
 static const char atl1_driver_string[] = "Attansic L1 Ethernet Network Driver";
Re: [PATCH 0/6] MODSIGN: Kernel module signing
On Wednesday 14 February 2007 21:45, Dave Jones wrote:
> well, the situation for external modules is no worse than usual.
> They still work, they just aren't signed. Which from a distributor
> point of view, is actually a nice thing, as they stick out like a
> sore thumb in oops reports with (U) markers :)

I agree, that's really what should happen. We solve this by marking
modules as supported, partner supported, or unsupported, but in an
insecure way, so partners and users could try to fake the support
status of a module and/or remove status flags from Oopses, and
cryptography wouldn't save us. We could try to sign Oopses, which I
guess you guys are doing. This whole issue hasn't been a serious
problem in the past though, and we generally try to trust users not to
play games on us.

In the end, it all seems to boil down to a difference in philosophy.

Thanks,
Andreas
Re: [PATCH 0/6] MODSIGN: Kernel module signing
On Wed, Feb 14, 2007 at 10:14:53PM -0800, Andreas Gruenbacher wrote:
> On Wednesday 14 February 2007 21:45, Dave Jones wrote:
> > well, the situation for external modules is no worse than usual.
> > They still work, they just aren't signed. Which from a distributor
> > point of view, is actually a nice thing, as they stick out like a
> > sore thumb in oops reports with (U) markers :)
>
> I agree, that's really what should happen. We solve this by marking
> modules as supported, partner supported, or unsupported, but in an
> insecure way, so partners and users could try to fake the support
> status of a module and/or remove status flags from Oopses, and
> cryptography wouldn't save us. We could try to sign Oopses, which I
> guess you guys are doing. This whole issue hasn't been a serious
> problem in the past though, and we generally try to trust users not
> to play games on us.

For the most part it works out. I've had users file oopses where
they've edited out Tainted: P, and left in nvidia(U) for example :-)

Dave

-- 
http://www.codemonkey.org.uk
[PATCH 000/196] V4L/DVB updates
Linus,

Please pull 'master' from:
	git://git.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb.git master

Basically, this series adds support for a bunch of newer cards and
newer drivers, does some relevant cleanups on cx88 (improving source
code readability and reducing binary code size), adds FM radio support
on pvrusb2, and makes several other fixes and improvements.

A more detailed log:

   - Add support for the ASUS P7131 remote control
   - Add the Composite over S-Video input on the Asus P7131 Dual
   - Update cx2341x documentation.
   - Update cx2341x documentation.
   - Removed unimplemented cx2341x API commands
   - Improve cx2341x documentation
   - Saa7134: add support for the Encore ENL-TV
   - Updated cardlist to reflect the newly added saa7134 board
   - DIB3000MC and NOVA T USB2 #2
   - Cablestar2 support
   - DVB: Remove unneeded void * casts in ttpci/av7110
   - Remove some unused code from kernel mainstream
   - Add support for more Encore TV cards
   - DVB: fix compile error
   - Make usbvision_rvfree() static
   - MAINTAINERS: tag pvrusb2 list as subscribers-only
   - Pvrusb2-hdw kfree cleanup
   - Cpia module_put cleanup
   - Tvmixer module_put cleanup
   - Cleanup: switch to using msecs_to_jiffies() on bttv
   - Improves some USBVision info messages
   - Bt8xx: add support for Ultraview DVB-T Lite
   - SN9C102 driver updates
   - ZC0301 driver updates.
   - ET61X251 driver updates.
   - Fix authorship references
   - Budget-ci: add support for the Technotrend 1500 bundled remote
   - Fix OOPS on some waitqueue conditions
   - Some fixes at stream waitqueue on vivi
   - Pvrusb2: Enable radio mode round #1
   - Pvrusb2: Enable radio mode round #2
   - Pvrusb2: Fix for min/max control value checking
   - Pvrusb2: Implement multiple minor device number handling
   - Pvrusb2: Implement stream claim checking function
   - Pvrusb2: Implement /dev/radioX
   - Pvrusb2: Use enumeration for minor number get / store code
   - Pvrusb2: Use separate enumeration for get/store of minor number
   - Pvrusb2: Make units uniform when tracking tuning frequency
   - Pvrusb2: video standard broadcast fix for radio mode
   - Pvrusb2: Allow overriding vbi and radio device minor numbers
   - Pvrusb2: Fix heap corruption introduced by radio mods
   - Pvrusb2: Fix tuner frequency calculation
   - Pvrusb2: Fix tuning calculation when in radio mode
   - Pvrusb2: v4l2 API implementation frequency tweaks
   - Pvrusb2: Enable radio mode for 24xxx devices
   - Pvrusb2: Newer frequency range checking
   - Pvrusb2: Better radio versus tv frequency handling
   - Pvrusb2: Remove stream claiming hack from /dev/radio
   - Pvrusb2: Change default volume to something sane
   - Pvrusb2: cosmetic comment tweak
   - Pvrusb2: Fix cut/paste bug in auto_mode_switch control
   - Pvrusb2: Stream configuration cleanups
   - Pvrusb2: bug fix involving switch into radio mode
   - Pvrusb2: Be smarter about mode restoration
   - Cpia.c: buffer overflow
   - Bttv cropping support
   - Pvrusb2: It's safe to kfree() a null pointer
   - Pvrusb2: Use kzalloc instead of kmalloc+memset pairs
   - Pvrusb2: Allow streaming from /dev/radioX
   - Pvrusb2: VIDIOC_G_TUNER cleanup
   - Pvrusb2: Slight debug printing efficiency fixup
   - Pvrusb2: Remove automodeswitch control
   - Pvrusb2: Stop hardcoding frequency ranges
   - Pvrusb2: trace print added
   - Pvrusb2: Fix missing break statement on VIDIOC_S_TUNER
   - Pvrusb2: Fix sizeof() calculation foul-up
   - Pvrusb2: Minor dead code / comment cleanups
   - Pvrusb2: V4L EXT_CTRLS fixup
   - Pvrusb2: A patch to use ARRAY_SIZE macro when appropriate
   - Pvrusb2: Use kzalloc in place of kmalloc/memset pairs
   - Pvrusb2: Use ARRAY_SIZE wherever possible
   - Pvrusb2: Emit VIDIOC_S_TUNER correctly
   - Pvrusb2: Introduce fake audio input selection
   - Pvrusb2: Allow VIDIOC_S_FMT with -1 for resolution values
   - Convert cx8800 driver to video_ioctl2 handler
   - Added support for V4L2_STD_NTSC_443
   - Uncommented NTSC/443 video standard
   - Make cx88-blackbird to work again
   - Renamed video_mux to cx88_video_mux
   - make videodev to auto-generate standards
   - Fix vidioc_g_tuner handling
   - Moved several stuff that were at cx88-video to cx88-blackbird.c
   - Reorder some ioctl handlers
   - Do some cleanups at cx88-blackbird
   - Use cx88_set_freq() on cx88-blackbird.c
   - Remove_cx88_ioctl
   - Convert cx88-blackbird to use video_ioctl2
   - Keep the previous tvnorm default for cx88 and cx88-blackbird
   - Saa7134: add support for Terratec Cinergy HT PCI
   - Adds video output routing
   - Cx88: Add support for svideo/composite input of the Terratec Cinergy 1400 DVB-T
   - Remove some warnings when compiling on x86_64
   - Fix: VIDIOC_G_TUNER were returning an endless number of tuners
   - Various cx2341x documentation updates/fixes.
   - Proper vendor/device ID for the CinergyT2 input device
   - Dvb-usb: Initial support for MSI Mega Sky 580 based on Uli m9206
   - Dvb-usb:
Re: [PATCH 1/1] Fabric7 VIOC driver source code
On Wed, 07 Feb 2007 13:07:40 -0800 Sriram Chidambaram [EMAIL PROTECTED] wrote:

> This patch provides the Fabric7 VIOC driver source code.
> This git mbox patch is built against
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
>
> The patch can be pulled from
> ftp://ftp.fabric7.com/VIOC/Fabric7-VIOC-driver-patch.FEB-07-2007

For people wondering what this is, the documentation file is below.

I'll pull this driver into my queue so that it doesn't get lost and to
give people an opportunity to review it more easily.

From a quick peek, I'd expect some changes to be needed: stylistic
things, plus some suspicious looking PCI-poking in vioc_irq.c. But I
didn't look at it at all closely.

The driver needed a bit of help to make it compile on ia64 (I haven't
tried any other architectures). If it's simply not possible that this
device will ever be present on any non-x86 machines then perhaps we
should restrict it to those architectures at kernel configuration time.
But then, all the changes I made were good ones..


Overview
========

A Virtual Input-Output Controller (VIOC) is a PCI device that provides
10Gbps of I/O bandwidth that can be shared by up to 16 virtual network
interfaces (VNICs). VIOC hardware supports several features such as
large frames, checksum offload, gathered send, MSI/MSI-X, bandwidth
control, interrupt mitigation, etc.

VNICs are provisioned to a host partition via an out-of-band interface
from the System Controller -- typically before the partition boots,
although they can be dynamically added or removed from a running
partition as well.

Each provisioned VNIC appears as an Ethernet netdevice to the host OS,
and maintains its own transmit ring in DMA memory.

VNICs are configured to share up to 4 of total 16 receive rings and 1
of total 16 receive-completion rings in DMA memory. VIOC hardware
classifies packets into receive rings based on size, allowing more
efficient use of DMA buffer memory. The default, and recommended,
configuration uses groups of 'receive sets' (rxsets), each with 3
receive rings, a receive completion ring, and a VIOC Rx interrupt. The
driver gives each rxset a NAPI poll handler associated with a phantom
(invisible) netdevice, for concurrency. VNICs are assigned to rxsets
using a simple modulus.

VIOC provides 4 interrupts in INTx mode: 2 for Rx, 1 for Tx, and 1 for
out-of-band messages from the System Controller and errors. VIOC also
provides 19 MSI-X interrupts: 16 for Rx, 1 for Tx, 1 for out-of-band
messages from the System Controller, and 1 for error signalling from
the hardware. The VIOC driver determines whether MSI-X functionality
is supported and initializes interrupts accordingly. [Note: The Linux
kernel disables MSI-X for VIOCs on modules with AMD 8131, even if the
device is on the HT link.]


Module loadable parameters
==========================

- poll_weight (default 8) - the number of received packets that will
  be processed during one call into the NAPI poll handler.

- rx_intr_timeout (default 1) - hardware rx interrupt mitigation
  timer, in units of 5us.

- rx_intr_pkt_cnt (default 64) - hardware rx interrupt mitigation
  counter, in units of packets.

- tx_pkts_per_irq (default 64) - hardware tx interrupt mitigation
  counter, in units of packets.

- tx_pkts_per_bell (default 1) - the number of packets to enqueue on a
  transmit ring before issuing a doorbell to hardware.


Performance Tuning
==================

You may want to use the following sysctl settings to improve
performance. [NOTE: To be re-checked]

# set in /etc/sysctl.conf

net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_rmem = 1000 1000 1000
net.ipv4.tcp_wmem = 1000 1000 1000
net.ipv4.tcp_mem = 1000 1000 1000

net.core.rmem_max = 5242879
net.core.wmem_max = 5242879
net.core.rmem_default = 5242879
net.core.wmem_default = 5242879
net.core.optmem_max = 5242879
net.core.netdev_max_backlog = 10


Out-of-band Communications with System Controller
=================================================

System operators can use the out-of-band facility to allow for remote
shutdown or reboot of the host partition. Upon receiving such a
command, the VIOC driver executes /sbin/reboot or /sbin/shutdown via
the usermodehelper() call.

This same communications facility is used for dynamic VNIC provisioning
(plug in and out).

The VIOC driver also registers a callback with
register_reboot_notifier(). When the callback is executed, the driver
records the shutdown event and reason in a VIOC register to notify the
System Controller.
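As an aside on two details of the documentation above, the "simple modulus" used to assign VNICs to rxsets and the 5us units of rx_intr_timeout, here is a small sketch. It is only an illustration under assumptions: the function names are hypothetical, and the number of rxsets is passed in as a parameter because the document does not state it.

```c
#include <assert.h>

/*
 * Pick the rxset for a VNIC with a plain modulus, as the documentation
 * describes.  Both names are hypothetical; num_rxsets is whatever the
 * driver configured and must be non-zero.
 */
static unsigned int vnic_to_rxset(unsigned int vnic_id, unsigned int num_rxsets)
{
	assert(num_rxsets > 0);
	return vnic_id % num_rxsets;
}

/* rx_intr_timeout is expressed in units of 5us; convert to microseconds. */
static unsigned int rx_intr_timeout_to_usecs(unsigned int units)
{
	return units * 5u;
}
```

With four rxsets, for example, VNICs 0, 4, 8 and 12 would share rxset 0, and the default rx_intr_timeout of 1 corresponds to a 5us mitigation timer.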
Re: [patch 16/21] Xen-paravirt: Add code into head.S to handle being booted by Xen
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:

> Eric W. Biederman wrote:
>> Ok. If that is all this may be a difference that makes no difference.
>> binutils has a bad habit of looking at sections (which are fully
>> optional) instead of segments on ET_EXEC and ET_DYN objects. Only
>> ET_REL objects (.o files) are required to have sections.
>
> The Xen domain loader will have to be changed to deal with that, which
> isn't too much of a problem.

Ok. Please fix the Xen domain loader to not look at sections. It is a
bug for any kind of executable loader to look at anything other than
segments.

> My main concern is the randomness of it, and whether it will fail in
> some more harmful way on other versions of binutils.

Reasonable, and it's probably worth letting the binutils developers
know. I do agree that it is weird. It might be that something in
binutils doesn't like us dropping some of the notes.

>> So I recommend for testing write a 100 line program that includes
>> elf.h and reads out the note segment. If all is well we can split
>> this code out.
>
> The Xen readnotes utility is essentially that. I'll hack it.

Sounds good.

Eric
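For anyone wanting to try the suggested test, here is a rough sketch of such a note reader. It is not the Xen readnotes utility. It assumes a 32-bit ELF image already loaded into memory, a host with the same byte order as the file, and a glibc-style elf.h; the function names are made up. In line with the point made above, it walks program headers only and never touches the section table.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Note names and descriptors are padded to 4-byte boundaries. */
#define NOTE_ALIGN(n) (((n) + (size_t)3) & ~(size_t)3)

/*
 * Walk the raw bytes of one PT_NOTE segment and print every note.
 * Returns the number of notes, or -1 if the segment is malformed.
 */
static int dump_notes(const unsigned char *seg, size_t len)
{
	size_t off = 0;
	int count = 0;

	assert(seg != NULL || len == 0);

	while (len - off >= sizeof(Elf32_Nhdr)) {
		Elf32_Nhdr nh;
		size_t avail, namesz, descsz;

		memcpy(&nh, seg + off, sizeof(nh));
		off += sizeof(nh);
		avail = len - off;

		/* Reject sizes that cannot fit before aligning them. */
		if (nh.n_namesz > avail || nh.n_descsz > avail)
			return -1;
		namesz = NOTE_ALIGN((size_t)nh.n_namesz);
		descsz = NOTE_ALIGN((size_t)nh.n_descsz);
		if (namesz > avail || descsz > avail - namesz)
			return -1;

		printf("note: name=%.*s type=%u descsz=%u\n",
		       (int)nh.n_namesz, (const char *)(seg + off),
		       (unsigned)nh.n_type, (unsigned)nh.n_descsz);

		off += namesz + descsz;
		count++;
	}
	return off == len ? count : -1;
}

/*
 * Find every PT_NOTE program header in a 32-bit ELF image and dump it.
 * Returns the total number of notes, or -1 on any inconsistency.
 */
static int dump_elf32_notes(const unsigned char *img, size_t size)
{
	Elf32_Ehdr eh;
	int total = 0;
	unsigned int i;

	if (size < sizeof(eh))
		return -1;
	memcpy(&eh, img, sizeof(eh));
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS32)
		return -1;

	for (i = 0; i < eh.e_phnum; i++) {
		Elf32_Phdr ph;
		size_t pos = (size_t)eh.e_phoff + (size_t)i * eh.e_phentsize;
		int n;

		if (pos > size || size - pos < sizeof(ph))
			return -1;
		memcpy(&ph, img + pos, sizeof(ph));
		if (ph.p_type != PT_NOTE)
			continue;
		if (ph.p_offset > size || ph.p_filesz > size - ph.p_offset)
			return -1;
		n = dump_notes(img + ph.p_offset, ph.p_filesz);
		if (n < 0)
			return -1;
		total += n;
	}
	return total;
}
```

To use it, read the kernel image into a buffer (or mmap it) and call dump_elf32_notes() on it. A return of 0 on an image that is supposed to carry notes would suggest the PT_NOTE program header is missing or empty.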
Re: [patch 16/21] Xen-paravirt: Add code into head.S to handle being booted by Xen
Eric W. Biederman wrote:
> Reasonable and it's probably worth letting the binutils developer know.
> I do agree that it is weird. It might be that something in binutils
> doesn't like us dropping some of the notes.

What do you mean by dropping some of the notes? I think the only notes
(at least in this case) are the Xen ones, and they're all included.

    J
Re: [patch 16/21] Xen-paravirt: Add code into head.S to handle being booted by Xen
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:

> Eric W. Biederman wrote:
>> Reasonable and it's probably worth letting the binutils developer know.
>> I do agree that it is weird. It might be that something in binutils
>> doesn't like us dropping some of the notes.
>
> What do you mean by dropping some of the notes? I think the only notes
> (at least in this case) are the Xen ones, and they're all included.

I'm pretty certain we explicitly drop the weird GNU note that is
automatically generated by gcc and specifies something informational.
Basically into .note we include *(.note.*) but not *(.note).

I don't think anything we are doing is wrong, but ld gets confused
easily in the corner cases. I'm modestly surprised we didn't have to
mark our .note.xxx sections as .section .note.xxx @note or whatever the
proper gas syntax is.

Eric
Re: [patch 16/21] Xen-paravirt: Add code into head.S to handle being booted by Xen
Eric W. Biederman wrote:
> I'm pretty certain we explicitly drop the weird GNU note that is
> automatically generated by gcc and specifies something informational.

But that's something else again, since it appears as a PT_GNU_STACK
phdr.

> I don't think anything we are doing is wrong, but ld gets confused
> easily in the corner cases. I'm modestly surprised we didn't have to
> mark our .note.xxx sections as .section .note.xxx @note or whatever
> the proper gas syntax is.

I did try that, and it didn't make a difference. The manual says that
the output section type follows the input section type, so I agree it's
a bit surprising we ever get a SHT_NOTE out of it without the @note
stuff.

    J
Re: [patch 16/21] Xen-paravirt: Add code into head.S to handle being booted by Xen
Jeremy Fitzhardinge [EMAIL PROTECTED] writes:

> Eric W. Biederman wrote:
>> I'm pretty certain we explicitly drop the weird GNU note that is
>> automatically generated by gcc and specifies something informational.
>
> But that's something else again, since it appears as a PT_GNU_STACK
> phdr.

Not that. It's more like abi version or gcc version or something like
that. At least there used to be one of those notes in every .o file and
compiled program.

>> I don't think anything we are doing is wrong, but ld gets confused
>> easily in the corner cases. I'm modestly surprised we didn't have to
>> mark our .note.xxx sections as .section .note.xxx @note or whatever
>> the proper gas syntax is.
>
> I did try that, and it didn't make a difference. The manual says that
> the output section type follows the input section type, so I agree it's
> a bit surprising we ever get a SHT_NOTE out of it without the @note
> stuff.

Right. So the surprise is that SHT_NOTE got set. There are some
defaults based on the section name somewhere that appear to have done
the right thing.

My best hunch really is that ld treated the .note sections normally and
just missed the handling of the magic SHT_NOTE type. Which is why I'm
not too worried.

Eric
[PATCH 1/2] autofs4 - fix another race between mount and expire
Hi Andrew,

Jeff Moyer has identified a race between mount and expire.

What happens is that during an expire the situation can arise that a
directory is removed and another lookup is done before the expire
issues a completion status to the kernel module. In this case, since
the lookup gets a new dentry, it doesn't know that there is an expire
in progress and when it posts its mount request, matches the existing
expire request and waits for its completion. ENOENT is then returned
to user space from lookup (as the dentry passed in is now unhashed)
without having performed the mount request.

The solution used here is to keep track of dentries in this unhashed
state and reuse them, if possible, in order to preserve the flags.
Additionally, this infrastructure will provide the framework for the
reintroduction of caching of mount fails removed earlier in
development.

Signed-off-by: Ian Kent [EMAIL PROTECTED]
Acked-by: Jeff Moyer [EMAIL PROTECTED]

Ian

---
--- linux-2.6.20/fs/autofs4/autofs_i.h.lookup-expire-race	2007-02-05 03:44:54.0 +0900
+++ linux-2.6.20/fs/autofs4/autofs_i.h	2007-02-12 12:15:17.0 +0900
@@ -52,6 +52,8 @@ struct autofs_info {
 
 	int		flags;
 
+	struct list_head rehash;
+
 	struct autofs_sb_info *sbi;
 	unsigned long last_used;
 	atomic_t count;
@@ -110,6 +112,8 @@ struct autofs_sb_info {
 	struct mutex wq_mutex;
 	spinlock_t fs_lock;
 	struct autofs_wait_queue *queues; /* Wait queue pointer */
+	spinlock_t rehash_lock;
+	struct list_head rehash_list;
 };
 
 static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
--- linux-2.6.20/fs/autofs4/root.c.lookup-expire-race	2007-02-05 03:44:54.0 +0900
+++ linux-2.6.20/fs/autofs4/root.c	2007-02-12 12:14:51.0 +0900
@@ -263,7 +263,7 @@ static int try_to_fill_dentry(struct den
 		 */
 		status = d_invalidate(dentry);
 		if (status != -EBUSY)
-			return -ENOENT;
+			return -EAGAIN;
 	}
 
 	DPRINTK("dentry=%p %.*s ino=%p",
@@ -413,7 +413,16 @@ static int autofs4_revalidate(struct den
 		 */
 		status = try_to_fill_dentry(dentry, flags);
 		if (status == 0)
-				return 1;
+			return 1;
+
+		/*
+		 * A status of EAGAIN here means that the dentry has gone
+		 * away while waiting for an expire to complete. If we are
+		 * racing with expire lookup will wait for it so this must
+		 * be a revalidate and we need to send it to lookup.
+		 */
+		if (status == -EAGAIN)
+			return 0;
 
 		return status;
 	}
@@ -459,9 +468,18 @@ void autofs4_dentry_release(struct dentr
 	de->d_fsdata = NULL;
 
 	if (inf) {
+		struct autofs_sb_info *sbi = autofs4_sbi(de->d_sb);
+
 		inf->dentry = NULL;
 		inf->inode = NULL;
 
+		if (sbi) {
+			spin_lock(&sbi->rehash_lock);
+			if (!list_empty(&inf->rehash))
+				list_del(&inf->rehash);
+			spin_unlock(&sbi->rehash_lock);
+		}
+
 		autofs4_free_ino(inf);
 	}
 }
@@ -478,10 +496,80 @@ static struct dentry_operations autofs4_
 	.d_release	= autofs4_dentry_release,
 };
 
+static struct dentry *autofs4_lookup_unhashed(struct autofs_sb_info *sbi, struct dentry *parent, struct qstr *name)
+{
+	unsigned int len = name->len;
+	unsigned int hash = name->hash;
+	const unsigned char *str = name->name;
+	struct list_head *p, *head;
+
+	spin_lock(&dcache_lock);
+	spin_lock(&sbi->rehash_lock);
+	head = &sbi->rehash_list;
+	list_for_each(p, head) {
+		struct autofs_info *ino;
+		struct dentry *dentry;
+		struct qstr *qstr;
+
+		ino = list_entry(p, struct autofs_info, rehash);
+		dentry = ino->dentry;
+
+		spin_lock(&dentry->d_lock);
+
+		/* Bad luck, we've already been dentry_iput */
+		if (!dentry->d_inode)
+			goto next;
+
+		qstr = &dentry->d_name;
+
+		if (dentry->d_name.hash != hash)
+			goto next;
+		if (dentry->d_parent != parent)
+			goto next;
+
+		if (qstr->len != len)
+			goto next;
+		if (memcmp(qstr->name, str, len))
+			goto next;
+
+		if (d_unhashed(dentry)) {
+			struct autofs_info *ino = autofs4_dentry_ino(dentry);
+			struct inode *inode = dentry->d_inode;
+
+			list_del_init(&ino->rehash);
+			dget(dentry);
+
[PATCH 2/2] autofs4 - check for directory re-create in lookup
Hi Andrew,

This problem was identified and fixed some time ago by Jeff Moyer
but it fell through the cracks somehow.

It is possible that a user space application could remove and
re-create a directory during a request. To avoid returning a
failure from lookup incorrectly when our current dentry is
unhashed we need to check if another positive, hashed dentry
matching this one exists and if so return it instead of a fail.

Signed-off-by: Jeff Moyer [EMAIL PROTECTED]
Signed-off-by: Ian Kent [EMAIL PROTECTED]

Ian

---
--- linux-2.6.20/fs/autofs4/root.c.lookup-check-unhased	2007-02-12 13:49:46.0 +0900
+++ linux-2.6.20/fs/autofs4/root.c	2007-02-12 13:54:58.0 +0900
@@ -655,14 +655,29 @@ static struct dentry *autofs4_lookup(str
 
 	/*
 	 * If this dentry is unhashed, then we shouldn't honour this
-	 * lookup even if the dentry is positive.  Returning ENOENT here
-	 * doesn't do the right thing for all system calls, but it should
-	 * be OK for the operations we permit from an autofs.
+	 * lookup.  Returning ENOENT here doesn't do the right thing
+	 * for all system calls, but it should be OK for the operations
+	 * we permit from an autofs.
 	 */
 	if (dentry->d_inode && d_unhashed(dentry)) {
+		/*
+		 * A user space application can (and has done in the past)
+		 * remove and re-create this directory during the callback.
+		 * This can leave us with an unhashed dentry, but a
+		 * successful mount!  So we need to perform another
+		 * cached lookup in case the dentry now exists.
+		 */
+		struct dentry *parent = dentry->d_parent;
+		struct dentry *new = d_lookup(parent, &dentry->d_name);
+		if (new != NULL)
+			dentry = new;
+		else
+			dentry = ERR_PTR(-ENOENT);
+
 		if (unhashed)
 			dput(unhashed);
-		return ERR_PTR(-ENOENT);
+
+		return dentry;
 	}
 
 	if (unhashed)
Re: [uml-devel] x86_64: fix 2.6.18 regression - PTRACE_OLDSETOPTIONS should be accepted
On Thu, 15 Feb 2007 04:43:41 +0100 Blaisorblade [EMAIL PROTECTED] wrote:

> > I sent an equivalent patch in earlier today:
>
> Doh! Interesting this timing...
>
> > Index: linux-2.6/arch/x86_64/ia32/ptrace32.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86_64/ia32/ptrace32.c
> > +++ linux-2.6/arch/x86_64/ia32/ptrace32.c
> > @@ -239,6 +239,8 @@ asmlinkage long sys32_ptrace(long reques
> >  	__u32 val;
> >
> >  	switch (request) {
> > +	case PTRACE_OLDSETOPTIONS:
> > +		request = PTRACE_SETOPTIONS;
> >  	case PTRACE_TRACEME:
> >  	case PTRACE_ATTACH:
> >  	case PTRACE_KILL:
> >
> > I change the request so that PTRACE_OLDSETOPTIONS doesn't need to
> > propagate any further. However, it is present in include/asm-x86_64,
> > so I guess that counts as being part of the x86_64 ABI. That being
> > the case, I guess my patch can be dropped in favor of this one.
>
> It is handled in ptrace_request, unless there are include problems.
> I'm going to reboot and test mine for any remaining problem.

Whatever happens, please ensure that the final fix makes it into
-stable as well. Jeff's version of this patch wasn't cc'ed to
[EMAIL PROTECTED]
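To make the fall-through in the quoted hunk explicit, here is a small standalone sketch of the same idea: rewrite the old request number to the new one, then share the path that hands the request on to the generic code. This is not the kernel function. The SK_ constants are local stand-ins whose values (0, 21 and 0x4200) are what I recall from the x86 headers and should be treated as assumptions, and only a subset of the forwarded requests is listed.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the real request numbers; values are assumptions. */
#define SK_PTRACE_TRACEME        0
#define SK_PTRACE_OLDSETOPTIONS  21
#define SK_PTRACE_SETOPTIONS     0x4200

/*
 * Returns 1 if the (possibly rewritten) request can be handed to the
 * generic ptrace code, 0 if it would need compat-specific handling.
 */
static int compat_forward_to_generic(long *request)
{
	assert(request != NULL);

	switch (*request) {
	case SK_PTRACE_OLDSETOPTIONS:
		/* Translate once here so later code never sees the old number. */
		*request = SK_PTRACE_SETOPTIONS;
		/* fall through */
	case SK_PTRACE_TRACEME:
	case SK_PTRACE_SETOPTIONS:
		return 1;
	default:
		return 0;
	}
}
```

The design choice under discussion is exactly where this translation should live: in the compat entry point as sketched here, or further down in the generic request handler.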
sata_nv ADMA controller lockup investigation
While testing out some libata FUA changes I was working on, I was
inadvertently able to reproduce the kind of NCQ command timeouts in
sata_nv that a few people have reported. I since verified that the FUA
stuff had nothing to do with it as it still happens even with FUA
disabled. However I'm somewhat at a loss as to how to further debug
this, so I'm posting my findings in the hope that somebody has some
more ideas (or anyone at NVIDIA decides to come forth with a tip or
two).

The conditions in which I can reproduce this are with:

- ext3 filesystem mounted with -o barrier=1
- Two instances of a program which truncates a file, then writes single
  bytes to it, fsyncing after each one.
- Simultaneously, repeatedly writing 100MB from /dev/zero to a file
  using dd.

A command timeout usually happens within a few minutes.

With my working copy loaded up with a ton of extra debugging, the
exception report for one of these looks like this. My comments are
indented.

ata4: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 cpb count 0x0 next cpb idx 0x0

    This is just dumping all of the ADMA registers when the timeout
    happened.

ata4: last intr at 1171511467:501179, status 0x1540

    This shows the time of the last interrupt in seconds:microseconds
    and the ADMA status register contents at that time.

ata4: cmd 61/08:00:40:36:75/00:00:0c:00:00/40 tag 0 at 1171511467:360525 done 1171511467:393064, stat before 0x400 after 0x400
ata4: cmd 61/40:00:80:a1:64/00:00:0a:00:00/40 tag 0 at 1171511467:393928 done 1171511467:394345, stat before 0x500 after 0x400
ata4: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 at 1171511467:394400 done 1171511467:425548, stat before 0x500 after 0x400
ata4: cmd 61/08:00:c0:a1:64/00:00:0a:00:00/40 tag 0 at 1171511467:425556 done 1171511467:425694, stat before 0x500 after 0x400
ata4: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 at 1171511467:425699 done 1171511467:433896, stat before 0x500 after 0x400
ata4: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 at 1171511467:433958 done 1171511467:433971, stat before 0x500 after 0x400
ata4: cmd 61/08:00:c8:a1:64/00:00:0a:00:00/40 tag 0 at 1171511467:433978 done 1171511467:434152, stat before 0x500 after 0x400
ata4: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 at 1171511467:434160 done 1171511467:442326, stat before 0x500 after 0x400
ata4: cmd 61/08:00:d0:a1:64/00:00:0a:00:00/40 tag 0 at 1171511467:442389 done 1171511467:442843, stat before 0x500 after 0x400
ata4: cmd 61/08:08:88:7e:75/00:00:0c:00:00/40 tag 1 at 1171511467:442395 done 1171511467:442846, stat before 0x400 after 0x400
ata4: cmd 61/e8:10:08:58:77/01:00:0c:00:00/40 tag 2 at 1171511467:442419 done 1171511467:445010, stat before 0x400 after 0x400
ata4: cmd 61/e8:18:f0:59:77/01:00:0c:00:00/40 tag 3 at 1171511467:442437 done 1171511467:447182, stat before 0x0 after 0x0
ata4: cmd 61/e8:20:d8:5b:77/01:00:0c:00:00/40 tag 4 at 1171511467:442455 done 1171511467:449343, stat before 0x0 after 0x0
ata4: cmd 61/e8:28:c0:5d:77/01:00:0c:00:00/40 tag 5 at 1171511467:442475 done 1171511467:451543, stat before 0x0 after 0x0
ata4: cmd 61/30:30:a8:5f:77/00:00:0c:00:00/40 tag 6 at 1171511467:442481 done 1171511467:451833, stat before 0x0 after 0x0
ata4: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 at 1171511467:451858 done 1171511467:492486, stat before 0x500 after 0x400
ata4: cmd 61/08:00:d8:a1:64/00:00:0a:00:00/40 tag 0 at 1171511467:492498 done 1171511467:492666, stat before 0x500 after 0x400
ata4: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 at 1171511467:492671 done 1171511467:500909, stat before 0x500 after 0x400
ata4: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 at 1171511467:501167 done 1171511467:501181, stat before 0x500 after 0x400
ata4: cmd 61/08:00:e0:a1:64/00:00:0a:00:00/40 tag 0 at 1171511467:501187 done 0:0, stat before 0x500 after 0x400

    These lines show the last 20 commands issued, the contents of the
    taskfile, the tag, the time in sec:usec they were issued, the time
    in sec:usec they completed (0:0 for still incomplete), the ADMA
    status register contents before issuing the command, and the
    register contents after issuing the command.

ata4: CPB 0: ctl_flags 0x1f, resp_flags 0x0

    Contents of the outstanding CPB's flags, showing that the
    controller seems not to have touched it: the released and done
    flags are clear.

ata4: timeout waiting for ADMA IDLE, stat=0x400
ata4: timeout waiting for ADMA LEGACY, stat=0x400

    As part of error handling we try to switch the controller back to
    legacy mode. We time out waiting for the controller to show IDLE,
    and then clear the GO bit, and then time out waiting for it to show
    the LEGACY state. Right after this we beat it over the head with
    NV_ADMA_CTL_CHANNEL_RESET which finally seems to restore its
    senses, until one of these happens again.

ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0
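When reading a command history like the one above, it can help to turn the "at sec:usec ... done sec:usec" pairs into per-command latencies. The helper below is a minimal sketch for doing that from the two timestamp strings. The function names are made up, and treating a "done" stamp of 0:0 as "never completed" follows the explanation given in the report.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a "sec:usec" stamp into microseconds; returns 0 on success. */
static int parse_stamp(const char *s, long long *out_us)
{
	long long sec, usec;

	assert(s != NULL && out_us != NULL);
	if (sscanf(s, "%lld:%lld", &sec, &usec) != 2)
		return -1;
	*out_us = sec * 1000000LL + usec;
	return 0;
}

/*
 * Latency of one command in microseconds, or -1 if the command never
 * completed (done stamp of 0:0) or a stamp could not be parsed.
 */
static long long cmd_latency_us(const char *issued, const char *done)
{
	long long t0, t1;

	if (parse_stamp(issued, &t0) != 0 || parse_stamp(done, &t1) != 0)
		return -1;
	if (t1 == 0)
		return -1;
	return t1 - t0;
}
```

For the first command in the log (issued at 1171511467:360525, done at 1171511467:393064) this gives 32539 us, and for the last one, which shows "done 0:0", it reports -1.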
[PATCH 2.6.21-rc1 0/5] ehca patch set for 2.6.21-rc1
Hello Roland!
Here is a patch set for ehca with the following changes and bug fixes:
* Reworked irq handler to avoid/reduce missed irq events
* Fix race condition bug in find_next_online_cpu() and other potential locking issues in the scaling code
* Allow scaling code to be enabled/disabled via module parameter
* Replace yield() in ehca_destroy_cq() with wait_for_completion()
* ehca_query_port() now returns LINK_UP for phys_state instead of UNKNOWN
Thanks!
Nam
[PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events
Hi,
here is a patch for ehca with the reworked irq handler.
Thanks
Nam

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 ehca_classes.h | 18 +++--
 ehca_eq.c      |  1
 ehca_irq.c     | 200 -
 ehca_irq.h     |  1
 ehca_main.c    | 24 +-
 ipz_pt_fn.h    |  9 ++
 6 files changed, 172 insertions(+), 81 deletions(-)

diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-11 21:31:06.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 12:53:41.0 +0100
@@ -42,8 +42,6 @@
 #ifndef __EHCA_CLASSES_H__
 #define __EHCA_CLASSES_H__
 
-#include "ehca_classes.h"
-#include "ipz_pt_fn.h"
 
 struct ehca_module;
 struct ehca_qp;
@@ -54,14 +52,22 @@ struct ehca_mw;
 struct ehca_pd;
 struct ehca_av;
 
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_user_verbs.h>
+
 #ifdef CONFIG_PPC64
 #include "ehca_classes_pSeries.h"
 #endif
+#include "ipz_pt_fn.h"
+#include "ehca_qes.h"
+#include "ehca_irq.h"
 
-#include <rdma/ib_verbs.h>
-#include <rdma/ib_user_verbs.h>
+#define EHCA_EQE_CACHE_SIZE 20
 
-#include "ehca_irq.h"
+struct ehca_eqe_cache_entry {
+	struct ehca_eqe *eqe;
+	struct ehca_cq *cq;
+};
 
 struct ehca_eq {
 	u32 length;
@@ -74,6 +80,8 @@ struct ehca_eq {
 	spinlock_t spinlock;
 	struct tasklet_struct interrupt_task;
 	u32 ist;
+	spinlock_t irq_spinlock;
+	struct ehca_eqe_cache_entry eqe_cache[EHCA_EQE_CACHE_SIZE];
 };
 
 struct ehca_sport {
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_eq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_eq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_eq.c 2007-02-11 21:31:06.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_eq.c 2007-02-14 12:53:40.0 +0100
@@ -61,6 +61,7 @@ int ehca_create_eq(struct ehca_shca *shc
 	struct ib_device *ib_dev = &shca->ib_device;
 
 	spin_lock_init(&eq->spinlock);
+	spin_lock_init(&eq->irq_spinlock);
 	eq->is_initialized = 0;
 
 	if (type != EHCA_EQ && type != EHCA_NEQ) {
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-11 21:36:12.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 13:07:54.0 +0100
@@ -401,87 +400,143 @@ irqreturn_t ehca_interrupt_eq(int irq, v
 	return IRQ_HANDLED;
 }
 
-void ehca_tasklet_eq(unsigned long data)
-{
-	struct ehca_shca *shca = (struct ehca_shca*)data;
-	struct ehca_eqe *eqe;
-	int int_state;
-	int query_cnt = 0;
 
-	do {
-		eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
+static inline void process_eqe(struct ehca_shca *shca, struct ehca_eqe *eqe)
+{
+	u64 eqe_value;
+	u32 token;
+	unsigned long flags;
+	struct ehca_cq *cq;
+	eqe_value = eqe->entry;
+	ehca_dbg(&shca->ib_device, "eqe_value=%lx", eqe_value);
+	if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, eqe_value)) {
+		ehca_dbg(&shca->ib_device, "... completion event");
+		token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe_value);
+		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
+		cq = idr_find(&ehca_cq_idr, token);
+		if (cq == NULL) {
+			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+			ehca_err(&shca->ib_device,
+				 "Invalid eqe for non-existing cq token=%x",
+				 token);
+			return;
+		}
+		reset_eq_pending(cq);
+#ifdef CONFIG_INFINIBAND_EHCA_SCALING
+		queue_comp_task(cq);
+		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+#else
+		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+		comp_event_callback(cq);
+#endif
+	} else {
+		ehca_dbg(&shca->ib_device,
+			 "Got non completion event");
+		parse_identifier(shca, eqe_value);
+	}
+}
 
-	if ((shca->hw_level >= 2) && eqe)
-		int_state = 1;
-	else
-		int_state = 0;
+void ehca_process_eq(struct ehca_shca *shca, int is_irq)
+{
+	struct ehca_eq *eq = &shca->eq;
+	struct ehca_eqe_cache_entry *eqe_cache = eq->eqe_cache;
+	u64 eqe_value;
+	unsigned long flags;
+	int eqe_cnt, i;
+	int eq_empty = 0;
 
-	while ((int_state == 1) || eqe) {
-		while (eqe) {
-			u64 eqe_value = eqe->entry;
-
-			ehca_dbg(&shca->ib_device,
-				 "eqe_value=%lx", eqe_value);
-
-
[PATCH 2.6.21-rc1 2/5] ehca: fix race condition/locking issues in scaling code
Hi,
this patch fixes a race condition in find_next_online_cpu() and some other locking issues in the scaling code.
Thanks
Nam

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 ehca_irq.c | 68 +
 1 files changed, 33 insertions(+), 35 deletions(-)

diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:16:45.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:16:35.0 +0100
@@ -544,28 +544,30 @@ void ehca_tasklet_eq(unsigned long data)
 
 static inline int find_next_online_cpu(struct ehca_comp_pool* pool)
 {
-	unsigned long flags_last_cpu;
+	int cpu;
+	unsigned long flags;
 
+	WARN_ON_ONCE(!in_interrupt());
 	if (ehca_debug_level)
 		ehca_dmp(&cpu_online_map, sizeof(cpumask_t), "");
 
-	spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu);
-	pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map);
-	if (pool->last_cpu == NR_CPUS)
-		pool->last_cpu = first_cpu(cpu_online_map);
-	spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu);
+	spin_lock_irqsave(&pool->last_cpu_lock, flags);
+	cpu = next_cpu(pool->last_cpu, cpu_online_map);
+	if (cpu == NR_CPUS)
+		cpu = first_cpu(cpu_online_map);
+	pool->last_cpu = cpu;
+	spin_unlock_irqrestore(&pool->last_cpu_lock, flags);
 
-	return pool->last_cpu;
+	return cpu;
 }
 
 static void __queue_comp_task(struct ehca_cq *__cq,
 			      struct ehca_cpu_comp_task *cct)
 {
-	unsigned long flags_cct;
-	unsigned long flags_cq;
+	unsigned long flags;
 
-	spin_lock_irqsave(&cct->task_lock, flags_cct);
-	spin_lock_irqsave(&__cq->task_lock, flags_cq);
+	spin_lock_irqsave(&cct->task_lock, flags);
+	spin_lock(&__cq->task_lock);
 
 	if (__cq->nr_callbacks == 0) {
 		__cq->nr_callbacks++;
@@ -576,8 +578,8 @@ static void __queue_comp_task(struct ehc
 	else
 		__cq->nr_callbacks++;
 
-	spin_unlock_irqrestore(&__cq->task_lock, flags_cq);
-	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
+	spin_unlock(&__cq->task_lock);
+	spin_unlock_irqrestore(&cct->task_lock, flags);
 }
 
 static void queue_comp_task(struct ehca_cq *__cq)
@@ -588,69 +590,69 @@ static void queue_comp_task(struct ehca_
 
 	cpu = get_cpu();
 	cpu_id = find_next_online_cpu(pool);
-
 	BUG_ON(!cpu_online(cpu_id));
 
 	cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+	BUG_ON(!cct);
 
 	if (cct->cq_jobs > 0) {
 		cpu_id = find_next_online_cpu(pool);
 		cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
+		BUG_ON(!cct);
 	}
 
 	__queue_comp_task(__cq, cct);
-
-	put_cpu();
-
-	return;
 }
 
 static void run_comp_task(struct ehca_cpu_comp_task* cct)
 {
 	struct ehca_cq *cq;
-	unsigned long flags_cct;
-	unsigned long flags_cq;
+	unsigned long flags;
 
-	spin_lock_irqsave(&cct->task_lock, flags_cct);
+	spin_lock_irqsave(&cct->task_lock, flags);
 
 	while (!list_empty(&cct->cq_list)) {
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
-		spin_unlock_irqrestore(&cct->task_lock, flags_cct);
+		spin_unlock_irqrestore(&cct->task_lock, flags);
 		comp_event_callback(cq);
-		spin_lock_irqsave(&cct->task_lock, flags_cct);
+		spin_lock_irqsave(&cct->task_lock, flags);
 
-		spin_lock_irqsave(&cq->task_lock, flags_cq);
+		spin_lock(&cq->task_lock);
 		cq->nr_callbacks--;
 		if (cq->nr_callbacks == 0) {
 			list_del_init(cct->cq_list.next);
 			cct->cq_jobs--;
 		}
-		spin_unlock_irqrestore(&cq->task_lock, flags_cq);
-
+		spin_unlock(&cq->task_lock);
 	}
 
-	spin_unlock_irqrestore(&cct->task_lock, flags_cct);
-
-	return;
+	spin_unlock_irqrestore(&cct->task_lock, flags);
 }
 
 static int comp_task(void *__cct)
 {
 	struct ehca_cpu_comp_task* cct = __cct;
+	int cql_empty;
 	DECLARE_WAITQUEUE(wait, current);
 
 	set_current_state(TASK_INTERRUPTIBLE);
 	while(!kthread_should_stop()) {
 		add_wait_queue(&cct->wait_queue, &wait);
 
-		if (list_empty(&cct->cq_list))
+		spin_lock_irq(&cct->task_lock);
+		cql_empty = list_empty(&cct->cq_list);
+		spin_unlock_irq(&cct->task_lock);
+		if (cql_empty)
 			schedule();
 		else
 			__set_current_state(TASK_RUNNING);
 
 		remove_wait_queue(&cct->wait_queue, &wait);
 
-		if (!list_empty(&cct->cq_list))
+
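The find_next_online_cpu() change above comes down to one rule: compute the result in a local variable while holding the lock and return that local, instead of re-reading pool->last_cpu after the lock has been dropped (where another CPU may already have advanced it). A minimal userspace sketch of that pattern follows. It assumes a pthread mutex in place of the spinlock and a plain bitmask in place of cpu_online_map, and every name ending in _sketch is made up for illustration.

```c
#include <pthread.h>

#define NR_CPUS_SKETCH 8

struct comp_pool_sketch {
	pthread_mutex_t last_cpu_lock;
	int last_cpu;
	unsigned int online_mask;	/* bit n set: CPU n is online */
};

/* Next online CPU strictly after prev, or NR_CPUS_SKETCH if there is none. */
static int next_cpu_sketch(int prev, unsigned int mask)
{
	for (int cpu = prev + 1; cpu < NR_CPUS_SKETCH; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS_SKETCH;
}

/* Round-robin over the online CPUs; assumes at least one bit is set. */
static int find_next_online_cpu_sketch(struct comp_pool_sketch *pool)
{
	int cpu;

	pthread_mutex_lock(&pool->last_cpu_lock);
	cpu = next_cpu_sketch(pool->last_cpu, pool->online_mask);
	if (cpu == NR_CPUS_SKETCH)
		cpu = next_cpu_sketch(-1, pool->online_mask);	/* wrap around */
	pool->last_cpu = cpu;
	pthread_mutex_unlock(&pool->last_cpu_lock);

	return cpu;	/* local copy; the shared field is not read again */
}
```

With CPUs 0, 3 and 5 online and last_cpu starting at 0, successive calls hand out 3, 5, 0, 3 and so on, and each caller gets exactly the value it stored under the lock.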
[PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
Hi,
this patch replaces yield() with wait_for_completion() so that ehca_destroy_cq() waits for running completion handlers to finish before destroying the associated completion queue.
Thanks
Nam

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 ehca_classes.h | 3 +++
 ehca_cq.c      | 3 ++-
 ehca_irq.c     | 6 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 13:52:49.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 13:52:06.0 +0100
@@ -52,6 +52,8 @@ struct ehca_mw;
 struct ehca_pd;
 struct ehca_av;
 
+#include <linux/completion.h>
+
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_user_verbs.h>
 
@@ -154,6 +156,7 @@ struct ehca_cq {
 	struct hlist_head qp_hashtab[QP_HASHTAB_LEN];
 	struct list_head entry;
 	u32 nr_callbacks;
+	struct completion zero_callbacks;
 	spinlock_t task_lock;
 	u32 ownpid;
 	/* mmap counter for resources mapped into user space */
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_cq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_cq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_cq.c 2007-02-14 13:52:49.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_cq.c 2007-02-14 13:52:06.0 +0100
@@ -147,6 +147,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
 	spin_lock_init(&my_cq->spinlock);
 	spin_lock_init(&my_cq->cb_lock);
 	spin_lock_init(&my_cq->task_lock);
+	init_completion(&my_cq->zero_callbacks);
 	my_cq->ownpid = current->tgid;
 
 	cq = &my_cq->ib_cq;
@@ -332,7 +333,7 @@ int ehca_destroy_cq(struct ib_cq *cq)
 	spin_lock_irqsave(&ehca_cq_idr_lock, flags);
 	while (my_cq->nr_callbacks) {
 		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-		yield();
+		wait_for_completion(&my_cq->zero_callbacks);
 		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
 	}
 
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 13:52:49.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 13:52:06.0 +0100
@@ -605,6 +605,7 @@ static void run_comp_task(struct ehca_cp
 	spin_lock_irqsave(&cct->task_lock, flags);
 
 	while (!list_empty(&cct->cq_list)) {
+		int is_complete = 0;
 		cq = list_entry(cct->cq_list.next, struct ehca_cq, entry);
 		spin_unlock_irqrestore(&cct->task_lock, flags);
 		comp_event_callback(cq);
@@ -612,11 +613,14 @@ static void run_comp_task(struct ehca_cp
 
 		spin_lock(&cq->task_lock);
 		cq->nr_callbacks--;
-		if (cq->nr_callbacks == 0) {
+		is_complete = (cq->nr_callbacks == 0);
+		if (is_complete) {
 			list_del_init(cct->cq_list.next);
 			cct->cq_jobs--;
 		}
 		spin_unlock(&cq->task_lock);
+		if (is_complete) /* wake up waiting destroy_cq() */
+			complete(&cq->zero_callbacks);
 	}
 
 	spin_unlock_irqrestore(&cct->task_lock, flags);
[PATCH 2.6.21-rc1 3/5] ehca: allow en/disabling scaling code via module parameter
Hi,
here is a patch for ehca that allows users to enable or disable the scaling code when loading the ib_ehca module.
Thanks
Nam

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 Kconfig        |  8
 ehca_classes.h |  1 +
 ehca_irq.c     | 47 +--
 ehca_main.c    |  4
 4 files changed, 26 insertions(+), 34 deletions(-)

diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/Kconfig infiniband_work/drivers/infiniband/hw/ehca/Kconfig
--- infiniband_orig/drivers/infiniband/hw/ehca/Kconfig 2007-02-14 14:18:16.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/Kconfig 2007-02-14 14:20:52.0 +0100
@@ -7,11 +7,3 @@ config INFINIBAND_EHCA
 	To compile the driver as a module, choose M here. The module
 	will be called ib_ehca.
 
-config INFINIBAND_EHCA_SCALING
-	bool "Scaling support (EXPERIMENTAL)"
-	depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL
-	default y
-	---help---
-	eHCA scaling support schedules the CQ callbacks to different CPUs.
-
-	To enable this feature choose Y here.
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 14:18:16.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_classes.h 2007-02-14 14:20:17.0 +0100
@@ -277,6 +277,7 @@ extern struct idr ehca_cq_idr;
 extern int ehca_static_rate;
 extern int ehca_port_act_time;
 extern int ehca_use_hp_mr;
+extern int ehca_scaling_code;
 
 struct ipzu_queue_resp {
 	u32 qe_size; /* queue entry size */
diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:18:16.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_irq.c 2007-02-14 14:20:17.0 +0100
@@ -63,15 +63,11 @@
 #define ERROR_DATA_LENGTH EHCA_BMASK_IBM(52,63)
 #define ERROR_DATA_TYPE EHCA_BMASK_IBM(0,7)
 
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-
 static void queue_comp_task(struct ehca_cq *__cq);
 
 static struct ehca_comp_pool* pool;
 static struct notifier_block comp_pool_callback_nb;
 
-#endif
-
 static inline void comp_event_callback(struct ehca_cq *cq)
 {
 	if (!cq->ib_cq.comp_handler)
@@ -423,13 +419,13 @@ static inline void process_eqe(struct eh
 			return;
 		}
 		reset_eq_pending(cq);
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-		queue_comp_task(cq);
-		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-#else
-		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
-		comp_event_callback(cq);
-#endif
+		if (ehca_scaling_code) {
+			queue_comp_task(cq);
+			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+		} else {
+			spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
+			comp_event_callback(cq);
+		}
 	} else {
 		ehca_dbg(&shca->ib_device,
 			 "Got non completion event");
@@ -508,13 +504,12 @@ void ehca_process_eq(struct ehca_shca *s
 	/* call completion handler for cached eqes */
 	for (i = 0; i < eqe_cnt; i++)
 		if (eq->eqe_cache[i].cq) {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-			spin_lock(&ehca_cq_idr_lock);
-			queue_comp_task(eq->eqe_cache[i].cq);
-			spin_unlock(&ehca_cq_idr_lock);
-#else
-			comp_event_callback(eq->eqe_cache[i].cq);
-#endif
+			if (ehca_scaling_code) {
+				spin_lock(&ehca_cq_idr_lock);
+				queue_comp_task(eq->eqe_cache[i].cq);
+				spin_unlock(&ehca_cq_idr_lock);
+			} else
+				comp_event_callback(eq->eqe_cache[i].cq);
 		} else {
 			ehca_dbg(&shca->ib_device, "Got non completion event");
 			parse_identifier(shca, eq->eqe_cache[i].eqe->entry);
@@ -540,8 +535,6 @@ void ehca_tasklet_eq(unsigned long data)
 	ehca_process_eq((struct ehca_shca*)data, 1);
 }
 
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
-
 static inline int find_next_online_cpu(struct ehca_comp_pool* pool)
 {
 	int cpu;
@@ -764,14 +757,14 @@ static int comp_pool_callback(struct not
 	return NOTIFY_OK;
 }
 
-#endif
-
 int ehca_create_comp_pool(void)
 {
-#ifdef CONFIG_INFINIBAND_EHCA_SCALING
 	int cpu;
 	struct task_struct *task;
 
+	if (!ehca_scaling_code)
+		return 0;
+
 	pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL);
 	if (pool == NULL)
 		return -ENOMEM;
@@ -796,16 +789,19 @@ int
[PATCH 2.6.21-rc1 5/5] ehca: query_port() returns LINK_UP instead UNKNOWN
Hi,
this patch makes ehca_query_port() report the port's phys_state as LINK_UP. On pSeries, ehca actually represents a logical HCA whose phys/link state is always LINK_UP.
Thanks
Nam

Signed-off-by: Hoang-Nam Nguyen [EMAIL PROTECTED]
---

 ehca_hca.c | 3 +++
 1 files changed, 3 insertions(+)

diff -Nurp infiniband_orig/drivers/infiniband/hw/ehca/ehca_hca.c infiniband_work/drivers/infiniband/hw/ehca/ehca_hca.c
--- infiniband_orig/drivers/infiniband/hw/ehca/ehca_hca.c 2007-02-14 13:11:45.0 +0100
+++ infiniband_work/drivers/infiniband/hw/ehca/ehca_hca.c 2007-02-14 12:53:52.0 +0100
@@ -162,6 +162,9 @@ int ehca_query_port(struct ib_device *ib
 	props->active_width = IB_WIDTH_12X;
 	props->active_speed = 0x1;
 
+	/* at the moment (logical) link state is always LINK_UP */
+	props->phys_state = 0x5;
+
 query_port1:
 	ehca_free_fw_ctrlblock(rblock);
 
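For readers who do not have the numbering in their head, the bare 0x5 in the hunk above can be decoded with a small lookup. This is a sketch only: the enum and function names are invented for illustration, and the value-to-name mapping is given as I understand the InfiniBand PortInfo physical-state encoding, so check it against the spec before relying on it.

```c
/* Physical port state codes; names are illustrative, not kernel symbols. */
enum phys_state_sketch {
	PHYS_SLEEP = 1,
	PHYS_POLLING = 2,
	PHYS_DISABLED = 3,
	PHYS_PORT_CONFIG_TRAINING = 4,
	PHYS_LINK_UP = 5,
	PHYS_LINK_ERROR_RECOVERY = 6,
	PHYS_PHY_TEST = 7,
};

/* Human-readable name for a phys_state value; anything else is "Unknown". */
static const char *phys_state_name(unsigned int state)
{
	switch (state) {
	case PHYS_SLEEP:			return "Sleep";
	case PHYS_POLLING:			return "Polling";
	case PHYS_DISABLED:			return "Disabled";
	case PHYS_PORT_CONFIG_TRAINING:		return "PortConfigurationTraining";
	case PHYS_LINK_UP:			return "LinkUp";
	case PHYS_LINK_ERROR_RECOVERY:		return "LinkErrorRecovery";
	case PHYS_PHY_TEST:			return "Phy Test";
	default:				return "Unknown";
	}
}
```

Under that mapping, a phys_state left at zero decodes as "Unknown", which matches the behaviour the patch description says it is replacing.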
Re: [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events
On Wed, Feb 14, 2007 at 05:40:47PM +0100, Hoang-Nam Nguyen wrote:
> Hi,
> here is a patch for ehca with the reworked irq handler.
> Thanks
> Nam

This looks okay to me (and sorry for not replying earlier to your private mail).
Re: [PATCH 2.6.21-rc1 1/5] ehca: reworked irq handler to avoid/reduce missed irq events
Looks fine but this patch at least has serious whitespace damage... please resend a fixed version.
 - R.
Re: [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
> @@ -332,7 +333,7 @@ int ehca_destroy_cq(struct ib_cq *cq)
>  	spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>  	while (my_cq->nr_callbacks) {
>  		spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
> -		yield();
> +		wait_for_completion(&my_cq->zero_callbacks);
>  		spin_lock_irqsave(&ehca_cq_idr_lock, flags);
>  	}

A while loop around wait_for_completion doesn't make all that much sense.
I suspect a simple

	if (my_cq->nr_callbacks)
		wait_for_completion(&my_cq->zero_callbacks);

Is what you need.
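The reason an outer loop adds nothing is that a completion already remembers that it was signalled: if complete() runs before the waiter arrives, the later wait returns at once, and any re-checking after a wakeup happens inside the wait itself. A rough userspace analogue is sketched below, assuming pthreads in place of the kernel primitives. The _sketch names are hypothetical and this is not the kernel implementation.

```c
#include <pthread.h>

/* Userspace stand-in for a completion: a counter guarded by a mutex. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;
};

static void init_completion_sketch(struct completion_sketch *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Record one completion and wake one waiter, if any. */
static void complete_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until a completion has been recorded, then consume it. */
static void wait_for_completion_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)		/* the only loop that is needed */
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}
```

The sketch says nothing about whether the nr_callbacks test in ehca_destroy_cq() is itself race-free; it only illustrates why the wait does not need to be retried by the caller.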
Re: [PATCH 2.6.21-rc1 4/5] ehca: replace yield() by wait_for_completion()
I agree with Christoph -- the use of wait_for_completion() in a loop makes no sense. When you send a new copy of this patch without whitespace damage, please fix that up too...