Re: [PATCH -mm1 0/2] Fix unlocked call to idr_find()
On Thu, Sep 27, 2007 at 04:33:54PM +0200, [EMAIL PROTECTED] wrote:
>
> This is a series of 2 patches that should be applied on top of the other ipc
> patches, in 2.6.23-rc6-mm1.
...
> They should be applied to 2.6.23-rc6-mm1, in the following order:

Didn't you mean 2.6.23-rc8-mm1, btw?

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Module use count must be updated as bridges are created/destroyed
Jan Beulich <[EMAIL PROTECTED]> wrote:
>
> So we have an unsolvable problem here then, unless infrastructure gets added
> that allows a module to declare itself as not-implicit-unload-safe, forcing
> modprobe -r to keep its hands off it. Ugly.

Yes, I've always wanted to have a separate count that indicates a module
is in use but does not prevent its immediate removal.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: sata_sil24 broken since 2.6.23-rc4-mm1
On 9/27/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1.
> > I will try to look at the diff between these revisions some more, but
> > the change in sata_sil24.c looked like a perfect match for the
> > symptoms I was seeing.
>
> I think the first thing to do here is to verify 2.6.23-rc3-mm1 still
> works fine and my previous debug patch is pretty much meaningless if
> address initialization failure isn't the cause.

After the first trouble with -rc4-mm1 I switched back to -rc3-mm1.
I booted that kernel 7 times over 4 days and never had trouble.
(Before -rc4-mm1 came out, I used -rc3-mm1 for over a week.)
So in the case of -rc3-mm1 I'm pretty sure that it works.

What I'm not completely sure about is whether the 2.6.23-rc7-sglist kernel
works. I booted that 9 times, but from a quick look in /var/log/messages,
I might not have hit the "correct" situation to trigger the error.
That kernel is vanilla 2.6.23-rc7 plus the patch from
http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
( http://marc.info/?l=linux-ide=119055574826083=2 )

Torsten
Re: Problems with SMP & ACPI powering off
On Thursday 27 September 2007 18:00, Rafael J. Wysocki wrote:
> On Thursday, 27 September 2007 23:29, Mark Lord wrote:
> > Question: do we disable all CPUs except 0 when doing ACPI power off?
>
> No, but we should.

We used to.

It is absolutely mandatory -- else it confuses the BIOS on some boards,
because it isn't expecting SMM to get entered from other than cpu0.

-Len
Re: [13/17] Virtual compound page freeing in interrupt context
On Tue, 25 Sep 2007 16:42:17 -0700
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> +static noinline void vcompound_free(void *addr)
> +{
> +	if (in_interrupt()) {

Should be (in_interrupt() || irqs_disabled()) ?

Regards,
-Kame
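Kame's question is whether freeing must also be deferred when interrupts are merely disabled, and not only when running in interrupt context. The sketch below models that decision in userspace. It assumes a hypothetical `struct free_ctx` standing in for what in_interrupt() and irqs_disabled() would report, and a hypothetical deferred list that a later, safe context drains. It does not reproduce Christoph's patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for what in_interrupt()/irqs_disabled() report. */
struct free_ctx {
	bool in_interrupt;
	bool irqs_disabled;
};

/* Hypothetical list node; it lives inside the block being freed. */
struct deferred_free {
	struct deferred_free *next;
};

static struct deferred_free *deferred_head;

/* The check Kame proposes: defer in either atomic situation. */
static bool must_defer_free(struct free_ctx ctx)
{
	return ctx.in_interrupt || ctx.irqs_disabled;
}

/* Returns true if the free was deferred, false if it was done at once.
 * addr must point to at least sizeof(struct deferred_free) bytes. */
static bool sketch_vcompound_free(struct free_ctx ctx, void *addr)
{
	if (must_defer_free(ctx)) {
		/* Reuse the freed block as the list node, so nothing has
		 * to be allocated while in atomic context. */
		struct deferred_free *d = addr;

		d->next = deferred_head;
		deferred_head = d;
		return true;
	}
	free(addr);
	return false;
}

/* Run later from a context where freeing is safe; returns count freed. */
static unsigned int drain_deferred_frees(void)
{
	unsigned int n = 0;

	while (deferred_head) {
		struct deferred_free *d = deferred_head;

		deferred_head = d->next;
		free(d);
		n++;
	}
	return n;
}
```

Reusing the block being freed as the list node avoids needing an allocation at the very point where allocating may be unsafe.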
Re: drivers/usb/misc/emi*.c have the biggest data objects in the whole tree
On Fri, Sep 14, 2007 at 11:35:34AM +0100, Denys Vlasenko wrote:
> Hi Tapio,
>
> You are the author of these files. Are you still maintaining them?
> If not, do you know who is the current maintainer?
>
> These two object files hold the biggest data objects in the whole Linux kernel
> after lockdep:
>
>    text    data     bss     dec     hex filename
>    1258  160516       0  161774   277ee ./drivers/usb/misc/emi26.o
>    1504  209296       0  210800   33770 ./drivers/usb/misc/emi62.o
>
> Basically, these are big arrays of the following structures:
>
> typedef struct _INTEL_HEX_RECORD
> {
> 	__u32	length;
> 	__u32	address;
> 	__u32	type;
> 	__u8	data[MAX_INTEL_HEX_RECORD_LENGTH];
> } INTEL_HEX_RECORD;
>
> I suggest the following optimizations:
>
> Change structure to
>
> typedef struct _INTEL_HEX_RECORD
> {
> 	__u8	type;
> 	__u8	length;
> 	__u16	address;
> 	__u8	data[MAX_INTEL_HEX_RECORD_LENGTH];
> } INTEL_HEX_RECORD __attribute__((__packed__));

Only if you redo the whole firmware image too :)

What is this really hurting? It's only relevant if you load the specific
module, if you have this device type. It's a firmware blob, nothing
really interesting at all.

thanks,

greg k-h
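The sketch below compares the two record layouts to show where Denys's saving would come from. It assumes MAX_INTEL_HEX_RECORD_LENGTH is 16, which is an illustrative value not given in the thread. It also uses <stdint.h> types in place of the kernel's __u8/__u16/__u32. The narrower fields match the Intel HEX record header, which carries an 8-bit byte count, a 16-bit address and an 8-bit record type. That holds only if every address stored in the firmware tables fits in 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed payload size, for illustration only. */
#define MAX_INTEL_HEX_RECORD_LENGTH 16

/* Current layout: three 32-bit header fields ahead of the payload. */
struct ihex_record_wide {
	uint32_t length;
	uint32_t address;
	uint32_t type;
	uint8_t data[MAX_INTEL_HEX_RECORD_LENGTH];
};

/* Proposed layout. On common ABIs these fields already fit in 4 header
 * bytes with no padding; the packed attribute makes that explicit. */
struct ihex_record_narrow {
	uint8_t type;
	uint8_t length;
	uint16_t address;
	uint8_t data[MAX_INTEL_HEX_RECORD_LENGTH];
} __attribute__((__packed__));

/* Bytes saved across a firmware table of n records. */
static size_t ihex_table_savings(size_t n)
{
	return n * (sizeof(struct ihex_record_wide) -
		    sizeof(struct ihex_record_narrow));
}
```

Under the assumed 16-byte payload, each record shrinks from 28 to 20 bytes, roughly 29%.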
[git patches] net driver fixes
And an e1000 id patch.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 drivers/net/e1000/e1000_ethtool.c |    1 +
 drivers/net/e1000/e1000_hw.c      |    1 +
 drivers/net/e1000/e1000_hw.h      |    1 +
 drivers/net/e1000/e1000_main.c    |    2 +
 drivers/net/sky2.c                |   53 +++--
 5 files changed, 44 insertions(+), 14 deletions(-)

Auke Kok (1):
      e1000: Add device IDs of blade version of the 82571 quad port

Stephen Hemminger (3):
      sky2: sky2 FE+ receive status workaround
      sky2: FE+ vlan workaround
      sky2: fix transmit state on resume

diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 4c3785c..9ecc3ad 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1726,6 +1726,7 @@ static int e1000_wol_exclusion(struct e1000_adapter *adapter, struct ethtool_wol
 	case E1000_DEV_ID_82571EB_QUAD_COPPER:
 	case E1000_DEV_ID_82571EB_QUAD_FIBER:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
+	case E1000_DEV_ID_82571PT_QUAD_COPPER:
 	case E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3:
 		/* quad port adapters only support WoL on port A */
 		if (!adapter->quad_port_a) {
diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index ba120f7..8604adb 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -387,6 +387,7 @@ e1000_set_mac_type(struct e1000_hw *hw)
 	case E1000_DEV_ID_82571EB_SERDES_DUAL:
 	case E1000_DEV_ID_82571EB_SERDES_QUAD:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER:
+	case E1000_DEV_ID_82571PT_QUAD_COPPER:
 	case E1000_DEV_ID_82571EB_QUAD_FIBER:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
 		hw->mac_type = e1000_82571;
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index fe87146..07f0ea7 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -475,6 +475,7 @@ int32_t e1000_check_phy_reset_block(struct e1000_hw *hw);
 #define E1000_DEV_ID_82571EB_FIBER       0x105F
 #define E1000_DEV_ID_82571EB_SERDES      0x1060
 #define E1000_DEV_ID_82571EB_QUAD_COPPER 0x10A4
+#define E1000_DEV_ID_82571PT_QUAD_COPPER 0x10D5
 #define E1000_DEV_ID_82571EB_QUAD_FIBER  0x10A5
 #define E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE  0x10BC
 #define E1000_DEV_ID_82571EB_SERDES_DUAL 0x10D9
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 4a22595..e7c8951 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -108,6 +108,7 @@ static struct pci_device_id e1000_pci_tbl[] = {
 	INTEL_E1000_ETHERNET_DEVICE(0x10BC),
 	INTEL_E1000_ETHERNET_DEVICE(0x10C4),
 	INTEL_E1000_ETHERNET_DEVICE(0x10C5),
+	INTEL_E1000_ETHERNET_DEVICE(0x10D5),
 	INTEL_E1000_ETHERNET_DEVICE(0x10D9),
 	INTEL_E1000_ETHERNET_DEVICE(0x10DA),
 	/* required last entry */
@@ -1101,6 +1102,7 @@ e1000_probe(struct pci_dev *pdev,
 	case E1000_DEV_ID_82571EB_QUAD_COPPER:
 	case E1000_DEV_ID_82571EB_QUAD_FIBER:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
+	case E1000_DEV_ID_82571PT_QUAD_COPPER:
 		/* if quad port adapter, disable WoL on all but port A */
 		if (global_quad_port_a != 0)
 			adapter->eeprom_wol = 0;
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 0792031..162489b 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -910,6 +910,20 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2)
 	return le;
 }
 
+static void tx_init(struct sky2_port *sky2)
+{
+	struct sky2_tx_le *le;
+
+	sky2->tx_prod = sky2->tx_cons = 0;
+	sky2->tx_tcpsum = 0;
+	sky2->tx_last_mss = 0;
+
+	le = get_tx_le(sky2);
+	le->addr = 0;
+	le->opcode = OP_ADDR64 | HW_OWNER;
+	sky2->tx_addr64 = 0;
+}
+
 static inline struct tx_ring_info *tx_le_re(struct sky2_port *sky2,
 					    struct sky2_tx_le *le)
 {
@@ -1320,7 +1334,8 @@ static int sky2_up(struct net_device *dev)
 				GFP_KERNEL);
 	if (!sky2->tx_ring)
 		goto err_out;
-	sky2->tx_prod = sky2->tx_cons = 0;
+
+	tx_init(sky2);
 
 	sky2->rx_le = pci_alloc_consistent(hw->pdev, RX_LE_BYTES,
 					   >rx_le_map);
@@ -2148,6 +2163,18 @@ static struct sk_buff *sky2_receive(struct net_device *dev,
 	sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
 	prefetch(sky2->rx_ring + sky2->rx_next);
 
+	if (length < ETH_ZLEN || length > sky2->rx_data_size)
+		goto len_error;
+
+	/* This chip has hardware problems that generates bogus status.
+	 * So do only marginal
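The e1000 hunks add the blade 82571PT quad-port ID (0x10D5) to each switch that special-cases quad-port boards. On those boards only port A supports Wake-on-LAN. The userspace sketch below models that rule using the IDs from the e1000_hw.h hunk. The helper names are hypothetical; the driver itself spreads the check across e1000_probe() and e1000_wol_exclusion(). The 82546GB KSP3 ID is omitted because its value does not appear in the pull.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device IDs as listed in the e1000_hw.h hunk. */
#define E1000_DEV_ID_82571EB_FIBER                  0x105F
#define E1000_DEV_ID_82571EB_QUAD_COPPER            0x10A4
#define E1000_DEV_ID_82571EB_QUAD_FIBER             0x10A5
#define E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE 0x10BC
#define E1000_DEV_ID_82571PT_QUAD_COPPER            0x10D5

/* Hypothetical helper: is this one of the 82571 quad-port boards? */
static bool is_82571_quad_port(uint16_t device_id)
{
	switch (device_id) {
	case E1000_DEV_ID_82571EB_QUAD_COPPER:
	case E1000_DEV_ID_82571EB_QUAD_FIBER:
	case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
	case E1000_DEV_ID_82571PT_QUAD_COPPER:	/* the ID this pull adds */
		return true;
	default:
		return false;
	}
}

/* Quad-port adapters only support WoL on port A. */
static bool wol_allowed(uint16_t device_id, bool is_port_a)
{
	return !is_82571_quad_port(device_id) || is_port_a;
}
```

Because the driver repeats this ID list in several switch statements rather than calling one helper, the same case label has to be added in each file.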
[PATCH] libata drain fifo on stuck DRQ HSM violation
Tejun Heo wrote:
> Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> Alan Cox wrote:
>>>>> I think there have been enough cases where this draining was
>>>>> necessary. IIRC, ata_piix was involved in those cases, right? If
>>>>> so, can you please submit a patch which applies this only to
>>>>> affected controllers? I don't feel too confident about applying
>>>>> this to all SFF controllers.
>>>> Old IDE does it on all controllers bar a couple. So we have a very
>>>> good knowledge of what does/doesn't work. The one that needs care in
>>>> old ide is an ordering issue where a state machine reset done first
>>>> causes the drain of the I/O to hang.
>>> Hmmm... So, do we apply draining to all PATA? Or is ata_piix SATA
>>> affected too?
>> I would think all SFF controllers, since a lot of first gen SATA are
>> really bridged solutions.
>>
>> If they are flagging DRQ, I say oblige them :)
>
> Alright, then the posted patch should be good enough. Mark, can you be
> bothered to regenerate the patch and post it one more time (again)? It
> seems we all agree the update is needed.

I think this original patch still applies cleanly on at least 2.6.23-rc7.

Drain up to 512 words from host/bridge FIFO on stuck DRQ HSM violation,
rather than just getting stuck there forever.

Signed-Off-By: Mark Lord <[EMAIL PROTECTED]>
---

--- old/drivers/ata/libata-sff.c	2007-04-26 12:02:46.0 -0400
+++ linux/drivers/ata/libata-sff.c	2007-04-29 08:29:27.0 -0400
@@ -413,6 +413,24 @@
 	ap->ops->irq_on(ap);
 }
 
+static void ata_drain_fifo (struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	u8 stat = ata_chk_status(ap);
+	/*
+	 * Try to clear stuck DRQ if necessary.
+	 */
+	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+		unsigned int i, limit = 512;
+		printk("Draining up to %u words from data FIFO.\n", limit);
+		for (i = 0; i < limit ; ++i) {
+			ioread16(ap->ioaddr.data_addr);
+			if (!(ata_chk_status(ap) & ATA_DRQ))
+				break;
+		}
+		printk("Drained %u/%u words.\n", i, limit);
+	}
+}
+
 /**
  *	ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  *	@ap: port to handle error for
@@ -469,7 +487,7 @@
 	}
 
 	ata_altstatus(ap);
-	ata_chk_status(ap);
+	ata_drain_fifo(ap, qc);
 	ap->ops->irq_clear(ap);
 
 	spin_unlock_irqrestore(ap->lock, flags);
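ata_drain_fifo() reads 16-bit words from the data register until DRQ drops or a 512-word cap is reached. It skips draining entirely for host-to-device (DMA_TO_DEVICE) commands. The userspace sketch below models that loop against a hypothetical mock FIFO. `struct mock_fifo`, `mock_drq()` and `mock_read16()` stand in for ata_chk_status() and ioread16(); they are not libata APIs. The sketch returns the number of words actually read. In the patch, when DRQ clears before the cap, the printed `i` is one less than that, because `break` skips the final `++i`.

```c
#include <stdbool.h>

/* Hypothetical mock of a device data FIFO with a stuck DRQ. */
struct mock_fifo {
	unsigned int pending;	/* words the device still wants to hand over */
};

/* DRQ stays asserted while data remains (stand-in for ATA_DRQ). */
static bool mock_drq(const struct mock_fifo *f)
{
	return f->pending > 0;
}

/* Stand-in for ioread16() on the data register: consumes one word. */
static unsigned short mock_read16(struct mock_fifo *f)
{
	if (f->pending)
		f->pending--;
	return 0;
}

/* Returns the number of words read from the FIFO. */
static unsigned int drain_fifo(struct mock_fifo *f, bool to_device,
			       unsigned int limit)
{
	unsigned int words = 0;

	/* Nothing stuck, or a write command: leave the FIFO alone. */
	if (!mock_drq(f) || to_device)
		return 0;

	while (words < limit) {
		(void)mock_read16(f);
		words++;
		if (!mock_drq(f))
			break;
	}
	return words;
}
```

The cap bounds the time spent in error handling even if the device never drops DRQ.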
Re: Stardom SATA HSM violation
Tejun Heo wrote:
> Alan Cox wrote:
>>> I think there have been enough cases where this draining was
>>> necessary. IIRC, ata_piix was involved in those cases, right? If
>>> so, can you please submit a patch which applies this only to
>>> affected controllers? I don't feel too confident about applying
>>> this to all SFF controllers.
>> Old IDE does it on all controllers bar a couple. So we have a very
>> good knowledge of what does/doesn't work. The one that needs care in
>> old ide is an ordering issue where a state machine reset done first
>> causes the drain of the I/O to hang.
>
> Hmmm... So, do we apply draining to all PATA? Or is ata_piix SATA
> affected too?

ata_piix SATA is definitely affected when a PATA_drive to SATA_host
bridge is present. Possibly other times.

Cheers
Re: [3/4] dma: document dma_flags_set_dmabarrier()
On Thu, Sep 27, 2007 at 06:13:02PM -0700, [EMAIL PROTECTED] wrote:
>
> Document dma_flags_set_dmabarrier().
>
> Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

This looks really good!
thanks,
grant

Acked-by: Grant Grundler <[EMAIL PROTECTED]>

>
> ---
>  DMA-API.txt |   26 ++
>  1 files changed, 26 insertions(+)
>
> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> index cc7a8c3..5fc0bba 100644
> --- a/Documentation/DMA-API.txt
> +++ b/Documentation/DMA-API.txt
> @@ -544,3 +544,29 @@ size is the size (and should be a page-sized multiple).
>  The return value will be either a pointer to the processor virtual
>  address of the memory, or an error (via PTR_ERR()) if any part of the
>  region is occupied.
> +
> +int
> +dma_flags_set_dmabarrier(int dir)
> +
> +Amend dir (one of the enum dma_data_direction values), with a
> +platform-specific "dmabarrier" attribute. The dmabarrier attribute
> +forces a flush of all in-flight DMA when the associated memory
> +region is written to (see example below.)
> +
> +This provides a mechanism to enforce ordering of DMA on platforms that
> +permit DMA to be reordered between device and host memory (within a
> +NUMA interconnect). On other platforms this is a nop.
> +
> +The dmabarrier would be set when the memory region is mapped for DMA,
> +e.g.:
> +
> +	int count, flags = dma_flags_set_dmabarrier(DMA_BIDIRECTIONAL);
> +
> +	count = dma_map_sg(dev, sglist, nents, flags);
> +
> +As an example of a situation where this would be useful, suppose that
> +the device does a DMA write to indicate that data is ready and
> +available in memory. The DMA of the "completion indication" could
> +race with data DMA. Using a dmabarrier on the memory used for
> +completion indications would prevent the race.
> +
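The documented call folds an extra attribute into the direction value that is later passed to dma_map_sg(). The userspace sketch below shows that encoding. The bit value `DMA_BARRIER_ATTR`, the mask, and the helpers `dma_flags_get_dir()` and `dma_flags_has_barrier()` are hypothetical illustrations, not the platform implementation. The enum mirrors the kernel's `enum dma_data_direction`. Per the documentation, a platform without DMA reordering would simply return `dir` unchanged.

```c
#include <stdbool.h>

/* Mirrors the kernel's enum dma_data_direction. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

/* Hypothetical attribute bit, chosen well above the direction values. */
#define DMA_BARRIER_ATTR 0x10000
#define DMA_DIR_MASK     0x3

/* Sketch of a barrier-capable platform's version of the call. */
static int sketch_dma_flags_set_dmabarrier(int dir)
{
	return dir | DMA_BARRIER_ATTR;
}

/* Hypothetical helpers a mapping routine might use to split the flags. */
static int dma_flags_get_dir(int flags)
{
	return flags & DMA_DIR_MASK;
}

static bool dma_flags_has_barrier(int flags)
{
	return (flags & DMA_BARRIER_ATTR) != 0;
}
```

Keeping the attribute outside the direction bits lets existing code that only looks at the direction keep working after masking.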
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27, 2007 at 05:28:57PM -0600, Matthew Wilcox wrote:
> On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote:
> > Would you accept a patch which causes the deprecated sysfs
> > files/directories to disappear, even if CONFIG_SYS_DEPRECATED is
> > defined, via a boot-time parameter?
>
> How about a mount option? That way people can test without a reboot:
>
> mount -o remount,deprecated={yes,no} /sys

Unfortunately, due to the way sysfs and kobjects are built up, this is
pretty impossible to do.

sorry,

greg k-h
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote:
> On Thu, Sep 27, 2007 at 02:34:45PM -0700, Greg KH wrote:
> > Ok, how then should I advertise this better? What can we do better to
> > help userspace programmers out in this regard?
>
> Would you accept a patch which causes the deprecated sysfs
> files/directories to disappear, even if CONFIG_SYS_DEPRECATED is
> defined, via a boot-time parameter?

As discussed in the kernel summit talk about this very topic, Kay is
working on a patch to do just that :)

> Many people and distros are
> likely to keep CONFIG_SYS_DEPRECATED defined just out of paranoia that
> things might break. Doing a quick google, I note that Fedora has been
> going back and forth of turning it off, watching things break, and
> then turning it back on. The latest time, the changelog said:
>
> * Fri Jan 26 23:00:00 2007 Bill Nottingham
>
> - turn on CONFIG_SYSFS_DEPRECATED so that things actually work. *sigh*
>
> (and I've checked, Fedora's CVS still has CONFIG_SYSFS_DEPRECATED
> defined; it's not just Debian at fault here.)

That's odd, SuSE and Gentoo have been working for quite some time just
fine with that option disabled :)

thanks,

greg k-h
Re: iwl4965 and driver merging policy
On Thu, 2007-09-27 at 22:30 -0400, John W. Linville wrote:
> > It doesn't seem to pull any dependency nor affect any other external
> > piece of code unless I'm missing something, so it's a perfect example of
> > what we've been discussing back then: there is just no point not merging
> > it at any time right ? :-)
>
> It is queued for 2.6.24. I'm not even sure it was originally posted
> in time for the 2.6.23 merge window, but even if it was there was
> a lot of opposition to merging it until fairly recently. In fact,
> I'm sure there are still some wireless developers that are less than
> happy about merging it now.
>
> Anyway, coming soon to a kernel near you...

All right, thanks. I was mostly trying to figure out where we stood with
this whole idea that driver additions were not necessarily constrained
by the merge window (which I think is fair to do), and this looked like a
good example to pick since it affects my new laptop :-)

Out of curiosity, what's the main source of opposition? Since it's being
shipped by distros or built out of tree by most users -anyway-, I think
it's pretty clear that we'd be better off having it merged asap rather
than trying to figure out what random version was included by
users/distros and trying to support it, in addition to wider exposure &
all the yadda yadda of being upstream in the first place.

Ben.
Re: iwl4965 and driver merging policy
>
> Well, pulling in iwlwifi would require also pulling in the mac80211
> subsystem, so it's not quite that simple (although I'm not sure what's
> holding back that going into the kernel.)

I thought that was already in 2.6.23 ... my bad if I missed something
(there is definitely something there called net/mac80211)

> I had no problem building my personal production kernel by taking
> 2.6.23-rc8, and doing a git pull from the everything branch in John
> Linville's wireless-dev git tree. It's probably too late to pull it
> for 2.6.23-rc8 (although if Linus wanted to do it it's only one git
> pull command away :-), but it would be really nice if it could get
> merged in for 2.6.24.

Yes, I agree -rc8 seems to be a tad too late, I'm just surprised we
didn't get it in earlier though since it seems it's been around and
usable for some time.

Cheers,
Ben.
Re: iwl4965 and driver merging policy
On Fri, Sep 28, 2007 at 11:39:27AM +1000, Benjamin Herrenschmidt wrote:
> Just a little question in the light of the discussion we had at Kernel
> Summit about merging drivers upstream (and here, I strongly agree with
> Linus, hence my message).

You must not have been watching me SPAM netdev for the past two weeks. :-)

> I just got that new T61 laptop which happens to have an iwl4xxx chip.
> The distro I installed on it (ubuntu) has a driver for it. I suspect
> others do too and most users get it from some random external tree and
> use it.
>
> Thus my question, why are we about to release 2.6.23 without it ?
>
> It doesn't seem to pull any dependency nor affect any other external
> piece of code unless I'm missing something, so it's a perfect example of
> what we've been discussing back then: there is just no point not merging
> it at any time right ? :-)

It is queued for 2.6.24. I'm not even sure it was originally posted
in time for the 2.6.23 merge window, but even if it was there was
a lot of opposition to merging it until fairly recently. In fact,
I'm sure there are still some wireless developers that are less than
happy about merging it now.

Anyway, coming soon to a kernel near you...

John
-- 
John W. Linville
[EMAIL PROTECTED]
Re: iwl4965 and driver merging policy
On Fri, Sep 28, 2007 at 11:39:27AM +1000, Benjamin Herrenschmidt wrote:
>
> Just a little question in the light of the discussion we had at Kernel
> Summit about merging drivers upstream (and here, I strongly agree with
> Linus, hence my message).
>
> I just got that new T61 laptop which happens to have an iwl4xxx chip.
> The distro I installed on it (ubuntu) has a driver for it. I suspect
> others do too and most users get it from some random external tree and
> use it.
>
> Thus my question, why are we about to release 2.6.23 without it ?
>
> It doesn't seem to pull any dependency nor affect any other external
> piece of code unless I'm missing something, so it's a perfect example of
> what we've been discussing back then: there is just no point not merging
> it at any time right ? :-)

Well, pulling in iwlwifi would require also pulling in the mac80211
subsystem, so it's not quite that simple (although I'm not sure what's
holding back that going into the kernel.)

I had no problem building my personal production kernel by taking
2.6.23-rc8, and doing a git pull from the everything branch in John
Linville's wireless-dev git tree. It's probably too late to pull it
for 2.6.23-rc8 (although if Linus wanted to do it it's only one git
pull command away :-), but it would be really nice if it could get
merged in for 2.6.24.

- Ted
WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask()
On Thu, Sep 27, 2007 at 02:22:20AM -0700, Andrew Morton wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc8/2.6.23-rc8-mm2/

Laurent,

It triggered a WARNING on first run in qemu:

[0.31] WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask()
[0.31]
[0.31] Call Trace:
[0.31] [] dump_trace+0x3ee/0x4a0
[0.31] [] show_trace+0x43/0x70
[0.31] [] dump_stack+0x15/0x20
[0.31] [] smp_call_function_mask+0x94/0xa0
[0.31] [] smp_call_function+0x19/0x20
[0.31] [] on_each_cpu+0x1f/0x50
[0.31] [] global_flush_tlb+0x8c/0x110
[0.31] [] free_init_pages+0xe5/0xf0
[0.31] [] alternative_instructions+0x7e/0x150
[0.31] [] check_bugs+0x1a/0x20
[0.31] [] start_kernel+0x2da/0x380
[0.31] [] _sinittext+0x132/0x140

Here is the more complete log:

[0.00] Linux version 2.6.23-rc8-mm2 ([EMAIL PROTECTED]) (gcc version 4.2.1 (Debian 4.2.1-5)) #3 SMP Fri Sep 28 10:29:34 CST 2007
[0.00] Command line: root=/dev/hda rw console=ttyS0 clock=pit init=/bin/bash
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: - 0009fc00 (usable)
[0.00] BIOS-e820: 0009fc00 - 000a (reserved)
[0.00] BIOS-e820: 000e8000 - 0010 (reserved)
[0.00] BIOS-e820: 0010 - 39ff (usable)
[0.00] BIOS-e820: 39ff - 3a00 (ACPI data)
[0.00] BIOS-e820: fffc - 0001 (reserved)
[0.00] end_pfn_map = 1048576
[0.00] DMI not present or invalid.
[0.00] ACPI: RSDP 000FAA30, 0014 (r0 BOCHS )
[0.00] ACPI: RSDT 39FF, 002C (r0 BOCHS BXPCRSDT 1 BXPC 1)
[0.00] ACPI: FACP 39FF002C, 0074 (r0 BOCHS BXPCFACP 1 BXPC 1)
[0.00] ACPI: DSDT 39FF0100, 0832 (r1 BXPC BXDSDT 1 INTL 20060912)
[0.00] ACPI: FACS 39FF00C0, 0040
[0.00] ACPI: APIC 39FF0938, 0040 (r0 BOCHS BXPCAPIC 1 BXPC 1)
[0.00] No NUMA configuration found
[0.00] Faking a node at -39ff
[0.00] Bootmem setup node 0 -39ff
[0.00] Zone PFN ranges:
[0.00]   DMA             0 ->     4096
[0.00]   DMA32        4096 ->  1048576
[0.00]   Normal    1048576 ->  1048576
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00]     0:        0 ->      159
[0.00]     0:      256 ->   237552
[0.00] ACPI: PM-Timer IO Port: 0xb008
[0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Allocating PCI resources starting at 4000 (gap: 3a00:c5fc)
[0.00] .eh_frame_hdr for 'kernel' present but unusable
[0.00] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[0.00] PERCPU: Allocating 429480 bytes of per cpu data
[0.00] Built 1 zonelists in Node order, mobility grouping on. Total pages: 231879
[0.00] Policy zone: DMA32
[0.00] Kernel command line: root=/dev/hda rw console=ttyS0 clock=pit init=/bin/bash
[0.00] Warning! clock= boot option is deprecated. Use clocksource=xyz
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)
[0.00] TSC calibrated against PM_TIMER
[0.00] time.c: Detected 2932.892 MHz processor.
[0.02] console [kgdb0] enabled
[0.03] Console: colour VGA+ 80x25
[0.04] console [ttyS0] enabled
[0.05] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[0.05] ... MAX_LOCKDEP_SUBCLASSES: 8
[0.05] ... MAX_LOCK_DEPTH: 30
[0.05] ... MAX_LOCKDEP_KEYS: 2048
[0.05] ... CLASSHASH_SIZE: 1024
[0.05] ... MAX_LOCKDEP_ENTRIES: 8192
[0.05] ... MAX_LOCKDEP_CHAINS: 16384
[0.05] ... CHAINHASH_SIZE: 8192
[0.05] memory used by lock dependency info: 1712 kB
[0.05] per task-struct memory footprint: 2160 bytes
[0.05] Checking aperture...
[0.10] Memory: 905832k/950208k available (3018k kernel code, 43988k reserved, 2171k data, 720k init)
[0.10] SLUB: Genslabs=12, HWalign=64, Order=0-3, MinObjects=16, CPUs=1, Nodes=1
[0.25] Calibrating delay using timer specific routine.. 5880.64 BogoMIPS (lpj=29403242)
[0.25] kswapd reclaim order set to 3
[0.25] Security Framework initialized
[0.25] SELinux: Initializing.
[0.25]
Re: [patch 2/2] VFS: allow filesystem to override mknod capability checks
On Monday September 24, [EMAIL PROTECTED] wrote:
> From: Miklos Szeredi <[EMAIL PROTECTED]>
>
> Add a new super block flag, that results in the VFS not checking if
> the current process has enough privileges to do an mknod().
>
> If this flag is set, all mounts for this super block will have the
> "nodev" flag implied.
>
> This is needed on filesystems, where an unprivileged user may be able
> to create a device node, without causing security problems.
>
> One such example is "mountlo" a loopback mount utility implemented
> with fuse and UML, which runs as an unprivileged userspace process.
> In this case the user does in fact have the right to create device
> nodes within the filesystem image, as long as the user has write
> access to the image. Since the filesystem is mounted with "nodev",
> adding device nodes is not a security concern.

I must admit that I don't feel very comfortable about this.
I wonder how many more flags we might be tempted to add to allow
user-controlled filesystems to do interesting things.
Somehow I doubt this will be the last, so we should be very careful
about allowing it to be the first (or is it the second already?)

A more concrete comment on the patch:
  Is it really necessary to introduce IS_MNT_NODEV? Why not simply
  test both flags (MS_MKNOD_NOCAP and MNT_NODEV) before allowing the
  mknod? That would localise the change to where it is really relevant.

Do we actually need a new flag? Would not a combination of MS_NODEV
and MS_SETUSER achieve the same thing (near enough)?

Do you imagine this flag being set as a mount option (-o unprivmknod),
or does the filesystem set it itself?
If the latter, maybe this test should be moved down into the
filesystem's ->mknod operation. Most filesystems get

	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
		return -EPERM;

at the top of ->mknod. fuse can do whatever it likes without bothering
common code.

According to fs.h, we only support 32 fs-independent mount-flags, and
over half are in use. I'm not convinced we should spend one on such a
narrow requirement.

NeilBrown

>
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> ---
>
> Index: linux/fs/namei.c
> ===
> --- linux.orig/fs/namei.c	2007-09-24 13:52:17.0 +0200
> +++ linux/fs/namei.c	2007-09-24 13:54:57.0 +0200
> @@ -1617,7 +1617,7 @@ int may_open(struct nameidata *nd, int a
>  	if (S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
>  		flag &= ~O_TRUNC;
>  	} else if (S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode)) {
> -		if (nd->mnt->mnt_flags & MNT_NODEV)
> +		if (IS_MNT_NODEV(nd->mnt))
>  			return -EACCES;
>  
>  		flag &= ~O_TRUNC;
> @@ -1920,7 +1920,8 @@ int vfs_mknod(struct inode *dir, struct
>  	if (error)
>  		return error;
>  
> -	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
> +	if (!(dir->i_sb->s_flags & MS_MKNOD_NOCAP) &&
> +	    (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
>  		return -EPERM;
>  
>  	if (!dir->i_op || !dir->i_op->mknod)
> Index: linux/include/linux/fs.h
> ===
> --- linux.orig/include/linux/fs.h	2007-09-24 13:52:17.0 +0200
> +++ linux/include/linux/fs.h	2007-09-24 13:54:57.0 +0200
> @@ -130,6 +130,8 @@ extern int dir_notify_enable;
>  #define MS_SETUSER	(1<<23)	/* set mnt_uid to current user */
>  #define MS_NOMNT	(1<<24)	/* don't allow unprivileged submounts */
>  #define MS_KERNMOUNT	(1<<25)	/* this is a kern_mount call */
> +#define MS_MKNOD_NOCAP	(1<<26)	/* no capability check in mknod,
> +					   implies "nodev" */
>  #define MS_ACTIVE	(1<<30)
>  #define MS_NOUSER	(1<<31)
>  
> @@ -190,6 +192,10 @@ extern int dir_notify_enable;
>  #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
>  #define IS_PRIVATE(inode)	((inode)->i_flags & S_PRIVATE)
>  
> +#define IS_MNT_NODEV(mnt)	(((mnt)->mnt_flags & MNT_NODEV) || \
> +				 ((mnt)->mnt_sb->s_flags & MS_MKNOD_NOCAP))
> +
> +
>  /* the read-only stuff doesn't really belong here, but any other place is
>     probably as bad and I don't want to create yet another include file. */
>  
> Index: linux/drivers/mtd/mtdsuper.c
> ===
> --- linux.orig/drivers/mtd/mtdsuper.c	2007-09-24 13:52:17.0 +0200
> +++ linux/drivers/mtd/mtdsuper.c	2007-09-24 13:54:57.0 +0200
> @@ -194,7 +194,7 @@ int get_sb_mtd(struct file_system_type *
>  	if (!S_ISBLK(nd.dentry->d_inode->i_mode))
>  		goto out;
>  
> -	if (nd.mnt->mnt_flags & MNT_NODEV) {
> +	if (IS_MNT_NODEV(nd.mnt)) {
>  		ret = -EACCES;
>  		goto out;
>  	}
> Index: linux/fs/block_dev.c
>
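The patched check in vfs_mknod() reduces to a small predicate. The userspace model below assumes a hypothetical `may_mknod()` whose `has_cap_mknod` argument stands in for `capable(CAP_MKNOD)`. MS_MKNOD_NOCAP uses the (1<<26) value proposed in the patch. The same predicate could equally sit at the top of a filesystem's ->mknod, which is Neil's alternative placement.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Flag value proposed in the patch. */
#define MS_MKNOD_NOCAP (1 << 26)

/* Model of the patched vfs_mknod() check: creating a character or block
 * device node requires CAP_MKNOD unless the superblock has opted out via
 * MS_MKNOD_NOCAP. Other file types are never refused here. */
static int may_mknod(mode_t mode, unsigned long sb_flags, bool has_cap_mknod)
{
	if (!(sb_flags & MS_MKNOD_NOCAP) &&
	    (S_ISCHR(mode) || S_ISBLK(mode)) && !has_cap_mknod)
		return -EPERM;
	return 0;
}
```

This models only the capability side. The patch separately makes MS_MKNOD_NOCAP imply nodev through IS_MNT_NODEV(), so that may_open() refuses to open such nodes as devices.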
Re: Floating Point Issue
On Thu, Sep 27, 2007 at 05:17:44PM +0200, Jan Engelhardt wrote: > >On Sep 27 2007 12:41, mahamuni ashish wrote: >>I have small code > >This is not a kernel problem. (Read your C book and/or ask in >a C newsgroup.) Please goto comp.lang.c for help. ;) -- "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step."
Re: [PATCH -mm] Hook up group scheduler with control groups
On Thu, Sep 27, 2007 at 04:42:41PM -0700, Andrew Morton wrote: > > @@ -219,6 +225,9 @@ static inline struct task_grp *task_grp( > > > > #ifdef CONFIG_FAIR_USER_SCHED > > tg = p->user->tg; > > +#elif CONFIG_FAIR_CGROUP_SCHED > > + tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id), > > + struct task_grp, css); > > #else > > tg = _task_grp; > > #endif > > that's a bit funny-looking. Are CONFIG_FAIR_CGROUP_SCHED and > CONFIG_FAIR_USER_SCHED mutually exclusive? Yes. While configuring the kernel, the user can choose only one of those options, not both. > Doesn't seem that way. Hmm... why do you say that? > if > they're both defined then CONFIG_FAIR_USER_SCHED "wins". > Anyway, please confirm that this is correct? They can't both be defined. > I'll switch that to `#elif defined(CONFIG_FAIR_CGROUP_SCHED)'. We can get > gcc warnings with `#if CONFIG_FOO', and people should be using `#ifdef > CONFIG_FOO', so I assume the same applies to #elif. Thx for fixing it! -- Regards, vatsa
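Andrew's point about `#elif CONFIG_FOO` can be shown with a small userspace sketch. The `DEMO_*` macro names below are hypothetical stand-ins for the two Kconfig options. The `#error` guard is only an illustration of enforcing "they can't both be defined" at compile time; it is not part of the patch. In Kconfig itself, that exclusivity would normally come from a `choice` block.

```c
/* Hypothetical stand-ins for the Kconfig symbols; a real build gets
 * CONFIG_* from the generated config header, not from the source. */
#define DEMO_FAIR_CGROUP_SCHED 1
/* #define DEMO_FAIR_USER_SCHED 1 */

enum demo_group_policy { GROUP_BY_INIT, GROUP_BY_USER, GROUP_BY_CGROUP };

/* Fail the build if both options are ever enabled together. */
#if defined(DEMO_FAIR_USER_SCHED) && defined(DEMO_FAIR_CGROUP_SCHED)
#error "DEMO_FAIR_USER_SCHED and DEMO_FAIR_CGROUP_SCHED are mutually exclusive"
#endif

/* defined() tests only whether the macro exists, so an unset option
 * neither triggers -Wundef nor leaves #elif without an expression. */
static enum demo_group_policy demo_task_group_policy(void)
{
#if defined(DEMO_FAIR_USER_SCHED)
	return GROUP_BY_USER;
#elif defined(DEMO_FAIR_CGROUP_SCHED)
	return GROUP_BY_CGROUP;
#else
	return GROUP_BY_INIT;
#endif
}
```

With `defined()`, an option that is unset simply falls through to the next branch. A bare `#elif DEMO_FAIR_CGROUP_SCHED` behaves differently in two cases. If the name is undefined, it evaluates as 0, which `gcc -Wundef` warns about. If it is defined with no value, `#elif` is left with no expression.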
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27, 2007 at 05:28:57PM -0600, Matthew Wilcox wrote: > On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote: > > Would you accept a patch which causes the deprecated sysfs > > files/directories to disappear, even if CONFIG_SYS_DEPRECATED is > > defined, via a boot-time parameter? > > How about a mount option? That way people can test without a reboot: > > mount -o remount,deprecated={yes,no} /sys It would be nice if that were easy to make work, but the problem is that remounting /sys doesn't change the entries that have already been created in the sysfs tree. We could do something such as creating a sysfs_create_link_deprecated() call which created a kobject with a new flag indicating it's deprecated, so it could be filtered out dynamically when /sys is remounted, or when some file such as /sys/kernel/deprecated_sysfs_files has "0" or "1" written to it. The question is whether it's worth it, since we'd have to bloat the kobject structure by 4 bytes (it currently doesn't have a flags field from which we could borrow a bit), or whether it's OK just to make the user reboot. (I do agree it would be nicer if the user didn't have to reboot, but most of the time they will need to test the initrd and init scripts anyway.) - Ted
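Here is a rough userspace sketch of the filtering Ted describes. It assumes a per-entry flags word and a global visibility switch. The names `struct demo_entry`, `DEMO_ENTRY_DEPRECATED` and `demo_show_deprecated` are hypothetical; they are not sysfs or kobject API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the kobject of the time had no flags field. */
#define DEMO_ENTRY_DEPRECATED (1u << 0)

struct demo_entry {
	const char *name;
	unsigned int flags;	/* the extra word per object Ted mentions */
};

/* Global switch, e.g. flipped by a remount option or a control file. */
static bool demo_show_deprecated = true;

/* Visibility is decided at lookup/readdir time, not at creation time. */
static bool demo_entry_visible(const struct demo_entry *e)
{
	return demo_show_deprecated || !(e->flags & DEMO_ENTRY_DEPRECATED);
}

/* Count the entries a directory walk would report. */
static size_t demo_count_visible(const struct demo_entry *entries, size_t n)
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++)
		if (demo_entry_visible(&entries[i]))
			visible++;
	return visible;
}
```

Because visibility is checked while the tree is walked, flipping the switch takes effect without recreating any entries. The cost is the extra flags word in every object.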
Re: [PATCH] i915: make vbl interrupts work properly on i965g/gm
On Thursday, September 27, 2007 7:05:31 pm Dave Airlie wrote: > Hi Linus, > > The attached patch is to fix a bug reported on 965gm chipsets (lots of new > laptops), I think distros will all have to patch this in to fix it, so can > we get it into the 2.6.23 final? > > (Otherwise I'll wait until stable..) Without this patch, my 965GM drops vblank interrupts, so I'd really like to see it upstream ASAP too. Acked-by: Jesse Barnes <[EMAIL PROTECTED]>
2.6.23-rc8 network problem. Mem leak? ip1000a?
Uniprocessor Athlon 64, 64-bit kernel, 2G ECC RAM, 2.6.23-rc8 + linuxpps (5.0.0) + ip1000a driver. (patch from http://marc.info/?l=linux-netdev=118980588419882)

After a few hours of operation, ntp loses the ability to send packets. sendto() returns -EAGAIN to everything, including the 24-byte UDP packet that is a response to ntpq.

-EAGAIN on a sendto() makes me think of memory problems, so here's meminfo at the time:

### FAILED state ###
# cat /proc/meminfo
MemTotal:      2059384 kB
MemFree:         15332 kB
Buffers:        665608 kB
Cached:          18212 kB
SwapCached:          0 kB
Active:         380384 kB
Inactive:       355020 kB
SwapTotal:     5855208 kB
SwapFree:      5854552 kB
Dirty:           28504 kB
Writeback:           0 kB
AnonPages:       51608 kB
Mapped:          11852 kB
Slab:          1285348 kB
SReclaimable:   152968 kB
SUnreclaim:    1132380 kB
PageTables:       3888 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6884900 kB
Committed_AS:   590528 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    265628 kB
VmallocChunk: 34359472059 kB

Killing and restarting ntpd gets it running again for a few hours. Here's after about two hours of successful operation. (I'll try to remember to run slabinfo before killing ntpd next time.)
### WORKING state ###
# cat /proc/meminfo
MemTotal:      2059384 kB
MemFree:         20252 kB
Buffers:        242688 kB
Cached:          41556 kB
SwapCached:        200 kB
Active:         285012 kB
Inactive:       147348 kB
SwapTotal:     5855208 kB
SwapFree:      5854212 kB
Dirty:              36 kB
Writeback:           0 kB
AnonPages:      148052 kB
Mapped:          12756 kB
Slab:          1582512 kB
SReclaimable:   134348 kB
SUnreclaim:    1448164 kB
PageTables:       4500 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   6884900 kB
Committed_AS:   689956 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    265628 kB
VmallocChunk: 34359472059 kB

# /usr/src/linux/Documentation/vm/slabinfo
Name              Objects Objsize   Space Slabs/Part/Cpu  O/S  O  %Fr  %Ef Flg
:016                 1478      16   24.5K          6/3/1  256  0   50   96 *
:024                  170      24    4.0K          1/0/1  170  0    0   99 *
:032                 1339      32   45.0K         11/2/1  128  0   18   95 *
:040                  102      40    4.0K          1/0/1  102  0    0   99 *
:064                 5937      64  413.6K       101/15/1   64  0   14   91 *
:072                   56      72    4.0K          1/0/1   56  0    0   98 *
:088                 6946      88  618.4K        151/0/1   46  0    0   98 *
:096                23851      96    2.5M      616/144/1   42  0   23   90 *
:128                  730     128  114.6K         28/6/1   32  0   21   81 *
:136                  232     136   36.8K          9/6/1   30  0   66   85 *
:192                  474     192   98.3K         24/4/1   21  0   16   92 *
:256              1385376     256  354.6M      86587/0/1   16  0    0   99 *
:320                   12     304    4.0K          1/0/1   12  0    0   89 *A
:384                  359     384  180.2K        44/23/1   10  0   52   76 *A
:512              1384316     512  708.7M     173040/1/1    8  0    0   99 *
:640                   72     616   53.2K         13/5/1    6  0   38   83 *A
:704                 1870     696    1.3M        170/0/1   11  1    0   93 *A
:0001024              427    1024  454.6K        111/9/1    4  0    8   96 *
:0001472              150    1472  245.7K         30/0/1    5  1    0   89 *
:0002048           158991    2048  325.7M     39759/25/1    4  1    0   99 *
:0004096               51    4096  245.7K         30/9/1    2  1   30   85 *
Acpi-State               51      80    4.0K          1/0/1   51  0    0   99
anon_vma               1032      16   28.6K          7/5/1  170  0   71   57
bdev_cache               43     720   36.8K          9/1/1    5  0   11   83 Aa
blkdev_requests          42     288   12.2K          3/0/1   14  0    0   98
buffer_head           59173     104   11.1M    2734/1690/1   39  0   61   54 a
cfq_io_context          223     152   40.9K         10/6/1   26  0   60   82
dentry                98641     192   19.7M     4813/274/1   21  0    5   96 a
ext3_inode_cache     115690     688   86.3M     10545/77/1   11  1    0   92 a
file_lock_cache          23     168    4.0K          1/0/1   23  0    0   94
idr_layer_cache         118     528   69.6K         17/1/1    7  0    5   89
inode_cache            1365     528  798.7K        195/0/1    7  0    0   90 a
kmalloc-131072            1  131072  131.0K          1/0/1    1  5    0  100
kmalloc-16384             8   16384  131.0K          8/0/1    1  2    0  100
kmalloc-32768             1   32768   32.7K          1/0/1    1  3    0  100
kmalloc-8              1535       8   12.2K          3/1/1  512  0   33   99
kmalloc-8192             10
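One way to check whether unreclaimable slab keeps growing between the failed and working states is to pull single fields out of `/proc/meminfo` periodically and log them. This is a minimal sketch. `meminfo_kb()` is a hypothetical helper, and it assumes only the `Key:   value kB` line format shown above.

```c
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of `key` from /proc/meminfo-formatted text,
 * or -1 if no line starts with exactly "key:". */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Require the colon right after the key so "SRe" does not
		 * match "SReclaimable". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long value;

			/* %ld skips the padding spaces before the number. */
			if (sscanf(line + klen + 1, "%ld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;		/* advance to the start of the next line */
	}
	return -1;
}
```

In practice, you could read `/proc/meminfo` into a buffer with `fopen()`/`fread()` every few minutes and log `meminfo_kb(buf, "SUnreclaim")` alongside ntpd's status. That way the value at the moment sendto() starts failing is on record.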
[PATCH] i915: make vbl interrupts work properly on i965g/gm
Hi Linus, The attached patch is to fix a bug reported on 965gm chipsets (lots of new laptops), I think distros will all have to patch this in to fix it, so can we get it into the 2.6.23 final? (Otherwise I'll wait until stable..) Dave.From 14e53712e5e2ccc72dac1131de78e590e9a9d451 Mon Sep 17 00:00:00 2001 From: Dave Airlie <[EMAIL PROTECTED]> Date: Fri, 28 Sep 2007 11:46:28 +1000 Subject: [PATCH] i915: make vbl interrupts work properly on i965g/gm hw. This code is ported from the DRM git tree and allows the vblank interrupts to function on the i965 hw. It also requires a change in Mesa's 965 driver to actually use them. Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> --- drivers/char/drm/i915_drv.h |6 ++ drivers/char/drm/i915_irq.c | 12 2 files changed, 18 insertions(+), 0 deletions(-) diff --git a/drivers/char/drm/i915_drv.h b/drivers/char/drm/i915_drv.h index 737088b..28b9873 100644 --- a/drivers/char/drm/i915_drv.h +++ b/drivers/char/drm/i915_drv.h @@ -210,6 +210,12 @@ extern int i915_wait_ring(struct drm_device * dev, int n, const char *caller); #define I915REG_INT_MASK_R 0x020a8 #define I915REG_INT_ENABLE_R 0x020a0 +#define I915REG_PIPEASTAT 0x70024 +#define I915REG_PIPEBSTAT 0x71024 + +#define I915_VBLANK_INTERRUPT_ENABLE (1UL<<17) +#define I915_VBLANK_CLEAR (1UL<<1) + #define SRX_INDEX 0x3c4 #define SRX_DATA 0x3c5 #define SR01 1 diff --git a/drivers/char/drm/i915_irq.c b/drivers/char/drm/i915_irq.c index 4b4b2ce..bb8e9e9 100644 --- a/drivers/char/drm/i915_irq.c +++ b/drivers/char/drm/i915_irq.c @@ -214,6 +214,10 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS) struct drm_device *dev = (struct drm_device *) arg; drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private; u16 temp; + u32 pipea_stats, pipeb_stats; + + pipea_stats = I915_READ(I915REG_PIPEASTAT); + pipeb_stats = I915_READ(I915REG_PIPEBSTAT); temp = I915_READ16(I915REG_INT_IDENTITY_R); @@ -225,6 +229,8 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS) return IRQ_NONE; 
I915_WRITE16(I915REG_INT_IDENTITY_R, temp); + (void) I915_READ16(I915REG_INT_IDENTITY_R); + DRM_READMEMORYBARRIER(); dev_priv->sarea_priv->last_dispatch = READ_BREADCRUMB(dev_priv); @@ -252,6 +258,12 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS) if (dev_priv->swaps_pending > 0) drm_locked_tasklet(dev, i915_vblank_tasklet); + I915_WRITE(I915REG_PIPEASTAT, + pipea_stats|I915_VBLANK_INTERRUPT_ENABLE| + I915_VBLANK_CLEAR); + I915_WRITE(I915REG_PIPEBSTAT, + pipeb_stats|I915_VBLANK_INTERRUPT_ENABLE| + I915_VBLANK_CLEAR); } return IRQ_HANDLED; -- 1.5.3.1
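The interrupt handler in this patch reads both pipe status registers on entry. After servicing the interrupt, it writes each saved value back with `I915_VBLANK_INTERRUPT_ENABLE` and `I915_VBLANK_CLEAR` ORed in. The sketch below models only that value computation, using plain integers instead of MMIO. Treating bit 1 as a write-one-to-clear status bit is an assumption suggested by the macro name; the patch does not state it. `demo_pipestat_writeback()` is a hypothetical name.

```c
#include <stdint.h>

/* Bit definitions as given in the patch's i915_drv.h hunk. */
#define I915_VBLANK_INTERRUPT_ENABLE (1UL << 17)
#define I915_VBLANK_CLEAR            (1UL << 1)

/* Value written back to PIPEASTAT/PIPEBSTAT: the status read on entry,
 * with the vblank enable bit kept set and the vblank status bit set
 * (assumed write-one-to-clear, so writing 1 acknowledges the event). */
static uint32_t demo_pipestat_writeback(uint32_t read_value)
{
	return read_value | (uint32_t)I915_VBLANK_INTERRUPT_ENABLE |
	       (uint32_t)I915_VBLANK_CLEAR;
}
```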
Re: State of the Linux PCI Subsystem for 2.6.23-rc8
On Thu, Sep 27, 2007 at 09:18:50PM -0400, Jeff Garzik wrote: > Greg KH wrote: >> On Thu, Sep 27, 2007 at 03:22:35AM -0400, Jeff Garzik wrote: >>> Greg KH wrote: On Wed, Sep 26, 2007 at 11:40:58PM +0200, Brice Goglin wrote: > Greg KH wrote: >> Here's a summary of the current state of the Linux PCI subsystem, as >> of >> 2.6.23-rc8. >> >> If the information in here is incorrect, or anyone knows of any >> outstanding issues not listed here, please let me know. >> >> List of outstanding regressions from 2.6.22: >> - none known. >> >> List of outstanding regressions from older kernel versions: >> - none known. >> > What about http://marc.info/?l=linux-pci=11907248538=2 ? That's not a regression, right? It's probably never worked for that kind of box :) I think the pci bus patches that are pending from Jeff Garzik should fix up these issues. They are in one of his trees, and in the -mm release, if you are able to test those. >>> jgarzik/misc-2.6.git#pciseg has my only outstanding PCI stuff, which is a >>> small x86[-64] PCI domain support patch. Mostly unrelated to the thread >>> at hand, alas, even though it was touching that area. >>> >>> I need to make a few changes required by Andi, who made several good points, >>> then the PCI domains thing should be ready for upstream. I don't care >>> much who merges it, you, Andi or me. >> I'll take it, as I guess it should go through me, Andi is going to have >> enough merge issues for 2.6.24 :) >> I'll add them to my tree later today. > > Please don't pull 'pciseg' just yet... it needs the fixes Andi pointed > out, namely, it should be turned on by default in x86 / x86-64 platform > Kconfig, and have a boot-time method of disabling it. Ok, let me know when you want me to pull it and I will.
Or just send me the patches by email, that's much easier for me :) thanks, greg k-h
iwl4965 and driver merging policy
Hi! Just a little question in light of the discussion we had at Kernel Summit about merging drivers upstream (and here, I strongly agree with Linus, hence my message). I just got that new T61 laptop which happens to have an iwl4xxx chip. The distro I installed on it (Ubuntu) has a driver for it. I suspect others do too and most users get it from some random external tree and use it. Thus my question: why are we about to release 2.6.23 without it? It doesn't seem to pull in any dependency nor affect any other external piece of code unless I'm missing something, so it's a perfect example of what we've been discussing back then: there is just no point not merging it at any time, right? :-) Cheers, Ben.
Re: [RFC] New kernel-message logging API (take 2)
> Example: {
>	struct kprint_block out;
>	kprint_block_init(&out, KPRINT_DEBUG);
>	kprint_block(&out, "Stack trace:");
>
>	while(unwind_stack()) {
>		kprint_block(&out, "%p %s", address, symbol);
>	}
>	kprint_block_flush(&out);
> }

Assuming that kprint_block_flush() is a combination of kprint_block_printit() and kprint_block_abort(), you could make a macro wrapper for this to preclude leaks:

#define KPRINT_BLOCK(block, level, code)			\
	do {							\
		struct kprint_block block;			\
		kprint_block_init(&block, KPRINT_##level);	\
		do {						\
			code ;					\
			kprint_block_printit(&block);		\
		} while (0);					\
		kprint_block_abort(&block);			\
	} while(0)

The inner do { } while(0) region is so you can abort with "break". (Or you can split it into KPRINT_BEGIN() and KPRINT_END() macros, if that works out to be cleaner.)
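The following is a userspace sketch of the wrapper's control flow. Hypothetical stubs stand in for the RFC's `kprint_block_*()` functions; they only count calls, so the `break` path can be checked. Normal completion prints and then aborts. A `break` inside `code` skips the print but still runs the abort, so the block is always released.

```c
/* Hypothetical stubs for the proposed (RFC-only) API. */
enum { KPRINT_DEBUG = 7 };

struct kprint_block {
	int level;
};

static int kprint_printed_count;	/* blocks emitted */
static int kprint_aborted_count;	/* blocks released */

static void kprint_block_init(struct kprint_block *b, int level)
{
	b->level = level;
}

static void kprint_block_printit(struct kprint_block *b)
{
	(void)b;
	kprint_printed_count++;
}

static void kprint_block_abort(struct kprint_block *b)
{
	(void)b;
	kprint_aborted_count++;
}

/* The wrapper from the mail: "break" in code skips printit(),
 * but abort() always runs afterwards. */
#define KPRINT_BLOCK(block, level, code)			\
	do {							\
		struct kprint_block block;			\
		kprint_block_init(&block, KPRINT_##level);	\
		do {						\
			code;					\
			kprint_block_printit(&block);		\
		} while (0);					\
		kprint_block_abort(&block);			\
	} while (0)
```

One limitation of passing a statement as a macro argument is commas. A top-level comma in `code` (for example `int a, b;`) splits it into extra arguments. Braces do not protect such a comma; only parentheses do. The `KPRINT_BEGIN()`/`KPRINT_END()` split avoids that problem.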
Re: [PATCH] UML - Correctly handle skb allocation failures
On Thu, Sep 27, 2007 at 04:53:40PM -0700, Andrew Morton wrote: > Still wanna know why it is safe for uml_net_rx to be playing with > drop_skb when update_drop_skb() could be concurrently reallocating > and freeing it. Ah, yes, I missed that point in the horror of my botch last night. I'll add irqsave/irqrestore to the locking - keep this patch, and I'll send in a fix. Jeff -- Work email - jdike at linux dot intel dot com
Re: State of the Linux PCI Subsystem for 2.6.23-rc8
Greg KH wrote: On Thu, Sep 27, 2007 at 03:22:35AM -0400, Jeff Garzik wrote: Greg KH wrote: On Wed, Sep 26, 2007 at 11:40:58PM +0200, Brice Goglin wrote: Greg KH wrote: Here's a summary of the current state of the Linux PCI subsystem, as of 2.6.23-rc8. If the information in here is incorrect, or anyone knows of any outstanding issues not listed here, please let me know. List of outstanding regressions from 2.6.22: - none known. List of outstanding regressions from older kernel versions: - none known. What about http://marc.info/?l=linux-pci=11907248538=2 ? That's not a regression, right? It's probably never worked for that kind of box :) I think the pci bus patches that are pending from Jeff Garzik should fix up these issues. They are in one of his trees, and in the -mm release, if you are able to test those. jgarzik/misc-2.6.git#pciseg has my only outstanding PCI stuff, which is a small x86[-64] PCI domain support patch. Mostly unrelated to the thread at hand, alas, even though it was touching that area. I need to make a few changes required by Andi, who made several good points, then the PCI domains thing should be ready for upstream. I don't care much who merges it, you, Andi or me. I'll take it, as I guess it should go through me, Andi is going to have enough merge issues for 2.6.24 :) I'll add them to my tree later today. Please don't pull 'pciseg' just yet... it needs the fixes Andi pointed out, namely, it should be turned on by default in x86 / x86-64 platform Kconfig, and have a boot-time method of disabling it. Jeff
Re: [PATCH] writeback: remove unnecessary wait in throttle_vm_writeout()
On Thu, Sep 27, 2007 at 11:16:10AM -0400, Rik van Riel wrote: > On Thu, 27 Sep 2007 09:50:16 +0800 > Fengguang Wu <[EMAIL PROTECTED]> wrote: > > > We don't want to introduce pointless delays in throttle_vm_writeout() > > when the writeback limits are not yet exceeded, do we? > > Good catch. Thank you. > > Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]> > > Reviewed-by: Rik van Riel <[EMAIL PROTECTED]> It could be a good fix for 2.6.22/23. But for -mm, I'm not sure if throttle_vm_writeout() will be eventually removed ;-)
[4/4] mthca: allow setting "dmabarrier" on user-allocated memory
Use the dma_flags_set_dmabarrier() interface to allow a "dmabarrier" attribute to be associated with user-allocated memory. (For now, it's only implemented for mthca.) Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]> --- drivers/infiniband/core/umem.c |7 +-- drivers/infiniband/hw/amso1100/c2_provider.c |2 +- drivers/infiniband/hw/cxgb3/iwch_provider.c |2 +- drivers/infiniband/hw/ehca/ehca_mrmw.c |2 +- drivers/infiniband/hw/ipath/ipath_mr.c |2 +- drivers/infiniband/hw/mlx4/cq.c |2 +- drivers/infiniband/hw/mlx4/doorbell.c|2 +- drivers/infiniband/hw/mlx4/mr.c |3 ++- drivers/infiniband/hw/mlx4/qp.c |2 +- drivers/infiniband/hw/mlx4/srq.c |2 +- drivers/infiniband/hw/mthca/mthca_provider.c |7 ++- drivers/infiniband/hw/mthca/mthca_user.h | 10 +- include/rdma/ib_umem.h |4 ++-- 13 files changed, 32 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 664d2fa..5b30b0c 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -69,9 +69,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d * @addr: userspace virtual address to start at * @size: length of region to pin * @access: IB_ACCESS_xxx flags for memory being pinned + * @dmabarrier: set "dmabarrier" attribute on this memory, if necessary */ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, - size_t size, int access) + size_t size, int access, int dmabarrier) { struct ib_umem *umem; struct page **page_list; @@ -83,6 +84,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, int ret; int off; int i; + int flags = dmabarrier ? 
dma_flags_set_dmabarrier(DMA_BIDIRECTIONAL): + DMA_BIDIRECTIONAL; if (!can_do_mlock()) return ERR_PTR(-EPERM); @@ -160,7 +163,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, chunk->nmap = ib_dma_map_sg(context->device, >page_list[0], chunk->nents, - DMA_BIDIRECTIONAL); + flags); if (chunk->nmap <= 0) { for (i = 0; i < chunk->nents; ++i) put_page(chunk->page_list[i].page); diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c index 997cf15..17243b7 100644 --- a/drivers/infiniband/hw/amso1100/c2_provider.c +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -449,7 +449,7 @@ static struct ib_mr *c2_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, return ERR_PTR(-ENOMEM); c2mr->pd = c2pd; - c2mr->umem = ib_umem_get(pd->uobject->context, start, length, acc); + c2mr->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0); if (IS_ERR(c2mr->umem)) { err = PTR_ERR(c2mr->umem); kfree(c2mr); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index f0c7775..d0a514c 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -601,7 +601,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, if (!mhp) return ERR_PTR(-ENOMEM); - mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc); + mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0); if (IS_ERR(mhp->umem)) { err = PTR_ERR(mhp->umem); kfree(mhp); diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c index d97eda3..c13c11c 100644 --- a/drivers/infiniband/hw/ehca/ehca_mrmw.c +++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c @@ -329,7 +329,7 @@ struct ib_mr *ehca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, } e_mr->umem = ib_umem_get(pd->uobject->context, start, length, -mr_access_flags); +mr_access_flags, 0); if (IS_ERR(e_mr->umem)) 
{ ib_mr = (void *)e_mr->umem; goto reg_user_mr_exit1; diff --git a/drivers/infiniband/hw/ipath/ipath_mr.c b/drivers/infiniband/hw/ipath/ipath_mr.c index e442470..e351222 100644 --- a/drivers/infiniband/hw/ipath/ipath_mr.c +++ b/drivers/infiniband/hw/ipath/ipath_mr.c @@ -195,7 +195,7 @@ struct ib_mr *ipath_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
[3/4] dma: document dma_flags_set_dmabarrier()
Document dma_flags_set_dmabarrier(). Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]> --- DMA-API.txt | 26 ++ 1 files changed, 26 insertions(+) diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index cc7a8c3..5fc0bba 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -544,3 +544,29 @@ size is the size (and should be a page-sized multiple). The return value will be either a pointer to the processor virtual address of the memory, or an error (via PTR_ERR()) if any part of the region is occupied. + +int +dma_flags_set_dmabarrier(int dir) + +Amend dir (one of the enum dma_data_direction values), with a +platform-specific "dmabarrier" attribute. The dmabarrier attribute +forces a flush of all in-flight DMA when the associated memory +region is written to (see example below.) + +This provides a mechanism to enforce ordering of DMA on platforms that +permit DMA to be reordered between device and host memory (within a +NUMA interconnect). On other platforms this is a nop. + +The dmabarrier would be set when the memory region is mapped for DMA, +e.g.: + + int count, flags = dma_flags_set_dmabarrier(DMA_BIDIRECTIONAL); + + count = dma_map_sg(dev, sglist, nents, flags); + +As an example of a situation where this would be useful, suppose that +the device does a DMA write to indicate that data is ready and +available in memory. The DMA of the "completion indication" could +race with data DMA. Using a dmabarrier on the memory used for +completion indications would prevent the race. +
[2/4] dma: redefine dma_flags_set_dmabarrier() for sn-ia64
define dma_flags_set_dmabarrier() for sn-ia64 - it "borrows" bits from the direction argument (renamed "flags") to the dma_map_* routines to pass an additional "dmabarrier" attribute. Also define routines to retrieve the original direction and attribute from "flags". Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]> --- arch/ia64/sn/pci/pci_dma.c | 35 ++- include/asm-ia64/sn/io.h | 24 2 files changed, 50 insertions(+), 9 deletions(-) diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c index d79ddac..6c0a498 100644 --- a/arch/ia64/sn/pci/pci_dma.c +++ b/arch/ia64/sn/pci/pci_dma.c @@ -153,7 +153,7 @@ EXPORT_SYMBOL(sn_dma_free_coherent); * @dev: device to map for * @cpu_addr: kernel virtual address of the region to map * @size: size of the region - * @direction: DMA direction + * @flags: DMA direction, and arch-specific attributes * * Map the region pointed to by @cpu_addr for DMA and return the * DMA address. @@ -167,17 +167,23 @@ EXPORT_SYMBOL(sn_dma_free_coherent); * figure out how to save dmamap handle so can use two step. 
*/ dma_addr_t sn_dma_map_single(struct device *dev, void *cpu_addr, size_t size, -int direction) +int flags) { dma_addr_t dma_addr; unsigned long phys_addr; struct pci_dev *pdev = to_pci_dev(dev); struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev); + int dmabarrier = dma_flags_get_dmabarrier(flags); BUG_ON(dev->bus != _bus_type); phys_addr = __pa(cpu_addr); - dma_addr = provider->dma_map(pdev, phys_addr, size, SN_DMA_ADDR_PHYS); + if (dmabarrier) + dma_addr = provider->dma_map_consistent(pdev, phys_addr, size, + SN_DMA_ADDR_PHYS); + else + dma_addr = provider->dma_map(pdev, phys_addr, size, +SN_DMA_ADDR_PHYS); if (!dma_addr) { printk(KERN_ERR "%s: out of ATEs\n", __FUNCTION__); return 0; @@ -240,18 +246,20 @@ EXPORT_SYMBOL(sn_dma_unmap_sg); * @dev: device to map for * @sg: scatterlist to map * @nhwentries: number of entries - * @direction: direction of the DMA transaction + * @flags: direction of the DMA transaction, and arch-specific attributes * * Maps each entry of @sg for DMA. */ int sn_dma_map_sg(struct device *dev, struct scatterlist *sg, int nhwentries, - int direction) + int flags) { unsigned long phys_addr; struct scatterlist *saved_sg = sg; struct pci_dev *pdev = to_pci_dev(dev); struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev); int i; + int dmabarrier = dma_flags_get_dmabarrier(flags); + int direction = dma_flags_get_direction(flags); BUG_ON(dev->bus != _bus_type); @@ -259,12 +267,21 @@ int sn_dma_map_sg(struct device *dev, struct scatterlist *sg, int nhwentries, * Setup a DMA address for each entry in the scatterlist. 
*/ for (i = 0; i < nhwentries; i++, sg++) { + dma_addr_t dma_addr; phys_addr = SG_ENT_PHYS_ADDRESS(sg); - sg->dma_address = provider->dma_map(pdev, - phys_addr, sg->length, - SN_DMA_ADDR_PHYS); - if (!sg->dma_address) { + if (dmabarrier) { + dma_addr = provider->dma_map_consistent(pdev, + phys_addr, + sg->length, + SN_DMA_ADDR_PHYS); + } else { + dma_addr = provider->dma_map(pdev, +phys_addr, sg->length, +SN_DMA_ADDR_PHYS); + } + + if (!(sg->dma_address = dma_addr)) { printk(KERN_ERR "%s: out of ATEs\n", __FUNCTION__); /* diff --git a/include/asm-ia64/sn/io.h b/include/asm-ia64/sn/io.h index 41c73a7..301bc47 100644 --- a/include/asm-ia64/sn/io.h +++ b/include/asm-ia64/sn/io.h @@ -271,4 +271,28 @@ sn_pci_set_vchan(struct pci_dev *pci_dev, unsigned long *addr, int vchan) return 0; } +#define ARCH_CAN_REORDER_DMA +/* here we steal some upper bits from the "direction" argument to the + * dma_map_* routines */ +#define DMA_ATTR_SHIFT 8 +/* bottom 8 bits for direction, remaining bits for additional "attributes" */ +#define DMA_BARRIER_ATTR 0x1 +/* Setting DMA_BARRIER_ATTR on a DMA-mapped memory region causes all in- + * flight DMA to be flushed when the memory region is written to. So + *
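The header comment above describes the flag layout. The direction stays in the low 8 bits, and attributes such as `DMA_BARRIER_ATTR` are shifted up by `DMA_ATTR_SHIFT`. The patch text is cut off before the helper definitions, so the `demo_*` functions below are an assumed implementation of that layout, not the sn-ia64 code. The direction values follow `enum dma_data_direction`.

```c
/* Constants as given in the sn-ia64 io.h hunk. */
#define DMA_ATTR_SHIFT   8	/* low 8 bits: direction */
#define DMA_BARRIER_ATTR 0x1	/* first attribute bit above the shift */

/* Same values as enum dma_data_direction. */
enum demo_dma_data_direction {
	DEMO_DMA_BIDIRECTIONAL = 0,
	DEMO_DMA_TO_DEVICE = 1,
	DEMO_DMA_FROM_DEVICE = 2,
	DEMO_DMA_NONE = 3,
};

/* Assumed encoder: keep the direction, add the barrier attribute. */
static int demo_flags_set_dmabarrier(int dir)
{
	return dir | (DMA_BARRIER_ATTR << DMA_ATTR_SHIFT);
}

/* Assumed decoders used by the dma_map_* paths. */
static int demo_flags_get_direction(int flags)
{
	return flags & ((1 << DMA_ATTR_SHIFT) - 1);
}

static int demo_flags_get_dmabarrier(int flags)
{
	return (flags >> DMA_ATTR_SHIFT) & DMA_BARRIER_ATTR;
}
```

The generic `dma_flags_set_dmabarrier()` from [1/4] returns `dir` unchanged. As a result, only platforms that define `ARCH_CAN_REORDER_DMA` ever see the extra bits. Their `dma_map_*` routines must strip those bits before using the direction, as `sn_dma_map_sg()` does with `dma_flags_get_direction()`.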
[1/4] dma: add dma_flags_set_dmabarrier() to dma interface
Introduce the dma_flags_set_dmabarrier() interface and give it a default no-op implementation. Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]> --- dma-mapping.h |6 ++ 1 files changed, 6 insertions(+) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 2dc21cb..4d1d199 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -99,4 +99,10 @@ static inline void dmam_release_declared_memory(struct device *dev) } #endif /* ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY */ +#ifndef ARCH_CAN_REORDER_DMA +static inline int dma_flags_set_dmabarrier(int dir) { + return dir; +} +#endif /* ARCH_CAN_REORDER_DMA */ + #endif
[PATCH 0/4] allow drivers to flush in-flight DMA v2
On Altix, DMA may be reordered between a device and host memory. This reordering can happen in the NUMA interconnect, and it usually results in correct operation and improved performance. In some situations it may be necessary to explicitly synchronize DMA from the device. This patchset allows a memory region to be mapped with a "dmabarrier". Writes to the memory region will cause in-flight DMA to be flushed, providing a mechanism to order DMA from a device. There are 4 patches in this patchset: [1/4] dma: add dma_flags_set_dmabarrier() to dma interface [2/4] dma: redefine dma_flags_set_dmabarrier() for sn-ia64 [3/4] dma: document dma_flags_set_dmabarrier() [4/4] mthca: allow setting "dmabarrier" on user-allocated memory -- Arthur
Re: sys_chroot+sys_fchdir Fix
Bill Davidsen wrote: It seems there are (at least) two parts to this, one regarding changing working directory which is clearly stated in the standards and must work as it does, and the various issues regarding getting out of the chroot after the cwd has entered that changed root. That second part seems to offer room for additional controls on getting out of the chroot which do not violate any of the obvious standards, and which therefore might be valid candidates for discussion on the basis of benefit rather than portability. Correct. BSDs solved the problem by changing cwd on subsequent use of chroot; I think there's a better way. I think the solution might be to add a "previous root", and restrict the process there as well as the new root. That is, once cwd is set within the new root, that new root is the limit. Prior to setting cwd within the new root, the previous root is the limit.
Re: [PATCH] pci: use pci=bfsort for HP DL385 G2, DL585 G2
On Thu, Sep 27, 2007 at 11:18:44AM +0200, Michal Schmidt wrote: > Hello, > > HP ProLiant systems DL385 G2 and DL585 G2 need pci=bfsort to enumerate PCI > devices in the expected order. > > (John, can you please confirm and ACK this?) As a shameless plug, biosdevname is a userspace app I wrote to help solve this so we don't need to patch the kernel for future systems. It's not integrated into any distributions properly yet, but is included in openSUSE 10.3 and Fedora 8 for people who want to download and install it there. It acts as a udev helper. For the time being, patching the kernel is necessary. I really hope biosdevname eliminates that need in future distributions. http://linux.dell.com/biosdevname/ Thanks, Matt -- Matt Domsch Linux Technology Strategist, Dell Office of the CTO linux.dell.com & www.dell.com/linux
Re: [PATCH 0/4] allow drivers to flush in-flight DMA
On Wed, Sep 26, 2007 at 12:49:50AM -0600, Grant Grundler wrote: [edited out several points that I think have already been addressed by others in this thread.] > > Defining it in terms of completion queues won't mean much to most folks. > Better to add a description of completion queues to the DMA-API.txt if > necessary. dma_alloc_coherent() API is pretty well understood. OK, next time I'll use a more generic description. > > > There are four patches in this set: > > > > [1/4] dma: add dma_flags_set_dmaflush() to dma interface > > Sorry - this feels like a "color of the shed" argument, but isn't > this about DMA ordering attribute? > "dmaflush" is an action and not an attribute to me. Right - an attribute is a noun, not a verb. I'm going to try "s/dmaflush/dmabarrier/" in the next version. > > This patch updates Documentation/DMA-mapping.txt. But it's a change to > the generic (not PCI specific) API described in DMA-API.txt. > Can you update that as well please? > Ja, I realized that soon after hitting the send button. I'll move the documentation to DMA-API.txt. -- Arthur
Re: [PATCH 1/4] dma: add dma_flags_set_dmaflush() to dma interface
On Tue, Sep 25, 2007 at 10:13:33PM -0700, Randy Dunlap wrote: > On Tue, 25 Sep 2007 17:00:57 -0700 [EMAIL PROTECTED] wrote: > .. > 1. Function signature should be on one line if possible (and it is). > Aw crud, I looked at dma-mapping.h and it uses this format sometimes. > Well, it's undesirable, so please don't propagate it. > > 2. No parens on return: it's not a function. > > static inline int dma_flags_set_dmaflush(int dir) > { > return dir; > } > > > Similar comments for patch 2/4: sn-ia64. > Both fixed in next version. Thanks, Randy. -- Arthur
Re: More E820 brokenness
Jordan Crouse wrote: > > Worked, but that just raises more questions. Why didn't more x86 boxes > break or, alternatively, why did a new version of the BIOS fix the problem? > I guess we shouldn't look a gift horse in the mouth. Or something. > Why didn't more x86 boxes break... well, it's pretty natural for an implementation of the BIOS not to clobber registers that aren't outputs. Arguably the BIOSes that do are still buggy, since there isn't a well-defined calling sequence for the BIOS and the convention that has evolved is "don't clobber anything unless it's an output." It's still wrong, however, especially since it means omitting the *real* SMAP check. -hpa
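To make the "real SMAP check" concrete: with INT 15h, AX=E820h the caller passes the "SMAP" signature in EDX and the BIOS returns it in EAX. The code before the fix used "+d" (id), so the value it compared against SMAP was just its own EDX input coming back, and a BIOS that clobbers EDX made a good map look bad. The following is a minimal userspace model under that assumption, not real-mode boot code; the two fake BIOS routines and the check functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SMAP 0x534d4150u /* ASCII "SMAP", the E820 signature */

/* Only the registers relevant to the signature check. */
struct bios_regs {
	uint32_t eax;
	uint32_t edx;
	int cf; /* carry flag: set on error */
};

typedef void (*bios_e820_fn)(struct bios_regs *r);

/* Hypothetical BIOS following the convention: EAX is an output,
 * EDX (an input only) is left untouched. */
static void bios_polite(struct bios_regs *r)
{
	r->eax = SMAP;
	r->cf = 0;
}

/* Hypothetical BIOS that also trashes EDX, which is not an E820 output. */
static void bios_clobbers_edx(struct bios_regs *r)
{
	r->eax = SMAP;
	r->edx = 0;
	r->cf = 0;
}

/* Old logic, "+d" (id): the "signature" read back is our own EDX input. */
static int old_check(bios_e820_fn bios)
{
	struct bios_regs r = { .eax = 0xe820, .edx = SMAP, .cf = 0 };

	bios(&r);
	return !r.cf && r.edx == SMAP;
}

/* Patched logic, "=a" (id): the signature is read from EAX. */
static int new_check(bios_e820_fn bios)
{
	struct bios_regs r = { .eax = 0xe820, .edx = SMAP, .cf = 0 };

	bios(&r);
	return !r.cf && r.eax == SMAP;
}
```

On a polite BIOS both checks pass, but only the EAX-based check actually tests what the BIOS returned; on an EDX-clobbering BIOS the old check rejects a valid map.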
Re: [RFC] QoS params patch update.
On Thu, Sep 27, 2007 at 01:17:39PM -0700, Mark Gross wrote: > Updated qos PM parameter patch: > Note: the replacing of latency.c with this is a separate patch. > > this patch attempts to address the issues raised so far. > [snip] > +static int register_new_qos_misc(struct qos_object *qos) > +{ > + int ret; > + > + qos->qos_power_miscdev.minor = MISC_DYNAMIC_MINOR; > + qos->qos_power_miscdev.name = qos->name; > + qos->qos_power_miscdev.fops = _power_fops; > + > + ret = misc_register(&qos->qos_power_miscdev); > + > + return ret; > +} > + Minor nit: ret is a pointless variable here; you can just return misc_register() directly. Other than that, this looks much better!
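The suggested simplification, sketched in userspace so it can be compiled on its own. struct miscdevice, MISC_DYNAMIC_MINOR and misc_register() below are hypothetical stand-ins that keep only what this function touches (the fops assignment is omitted); they are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's misc device API. */
#define MISC_DYNAMIC_MINOR 255

struct miscdevice {
	int minor;
	const char *name;
};

struct qos_object {
	const char *name;
	struct miscdevice qos_power_miscdev;
};

/* Fake registration: fails with -EINVAL when no name is set. */
static int misc_register(struct miscdevice *misc)
{
	return misc->name ? 0 : -EINVAL;
}

static int register_new_qos_misc(struct qos_object *qos)
{
	qos->qos_power_miscdev.minor = MISC_DYNAMIC_MINOR;
	qos->qos_power_miscdev.name = qos->name;

	/* No local 'ret' needed: hand the result straight back. */
	return misc_register(&qos->qos_power_miscdev);
}
```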
Re: More E820 brokenness
On 27/09/07 16:36 -0700, H. Peter Anvin wrote: > Jordan Crouse wrote: > >>> > >> Oh bugger, looks like this one might be genuinely my fault after all. > >> The ID check in the new code is buggy. > >> > >> Can you please test this revised patch out (against current -git)? > > > > > > That looks the same as the previous patch you sent? > > > > Sorry, this is the right one... > > -hpa > diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c > index bccaa1c..2f37568 100644 > --- a/arch/i386/boot/memory.c > +++ b/arch/i386/boot/memory.c > @@ -28,11 +28,10 @@ static int detect_memory_e820(void) > > do { > size = sizeof(struct e820entry); > - id = SMAP; > asm("int $0x15; setc %0" > - : "=am" (err), "+b" (next), "+d" (id), "+c" (size), > + : "=dm" (err), "+b" (next), "=a" (id), "+c" (size), > "=m" (*desc) > - : "D" (desc), "a" (0xe820)); > + : "D" (desc), "d" (SMAP), "a" (0xe820)); > > /* Some BIOSes stop returning SMAP in the middle of > the search loop. We don't know exactly how the BIOS Worked, but that just raises more questions. Why didn't more x86 boxes break or, alternatively, why did a new version of the BIOS fix the problem? I guess we shouldn't look a gift horse in the mouth. Or something. Thanks very much for your help. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc.
Re: Stardom SATA HSM violation
Jeff Garzik wrote: > Tejun Heo wrote: >> Alan Cox wrote: I think there have been enough cases where this draining was necessary. IIRC, ata_piix was involved in those cases, right? If so, can you please submit a patch which applies this only to affected controllers? I don't feel too confident about applying this to all SFF controllers. >>> Old IDE does it on all controllers bar a couple. So we have a very good >>> knowledge of what does/doesn't work. The one that needs care in old ide >>> is an ordering issue where a state machine reset done first causes the >>> drain of the I/O to hang. >> >> Hmmm... So, do we apply draining to all PATA? Or is ata_piix SATA >> affected too? > > I would think all SFF controllers, since a lot of first gen SATA are > really bridged solutions. If they are flagging DRQ, I say oblige them :) Alright, then the posted patch should be good enough. Mark, can you be bothered to regenerate the patch and post it one more time (again)? It seems we all agree the update is needed. Thanks a lot. -- tejun
Re: [PATCH] UML - Correctly handle skb allocation failures
On Thu, 27 Sep 2007 13:01:26 -0400 Jeff Dike <[EMAIL PROTECTED]> wrote: > +static int update_drop_skb(int max) > +{ > + struct sk_buff *new; > + int err = 0; > + > + spin_lock(&drop_lock); > + > + if (max <= drop_max) > + goto out; > + > + err = -ENOMEM; > + new = dev_alloc_skb(max); > + if (new == NULL) > + goto out; > + > + skb_put(new, max); > + > + kfree_skb(drop_skb); > + drop_skb = new; > + drop_max = max; > + err = 0; > +out: > + spin_unlock(&drop_lock); > + > + return err; > +} > + > static int uml_net_rx(struct net_device *dev) > { > struct uml_net_private *lp = dev->priv; > @@ -43,6 +82,9 @@ static int uml_net_rx(struct net_device > /* If we can't allocate memory, try again next round. */ > skb = dev_alloc_skb(lp->max_packet); > if (skb == NULL) { > + drop_skb->dev = dev; > + /* Read a packet into drop_skb and don't do anything with it. */ > + (*lp->read)(lp->fd, drop_skb, lp); > lp->stats.rx_dropped++; > return 0; Still wanna know why it is safe for uml_net_rx to be playing with drop_skb when update_drop_skb() could be concurrently reallocating and freeing it.
Re: [PATCH] Some IO scheduler cleanup in Documentation/block
On Thu, Sep 27 2007, Alan D. Brunelle wrote: > > [PATCH] Some IO scheduler cleanup in Documentation/block [snip] Thanks Alan, applied. -- Jens Axboe
Re: Stardom SATA HSM violation
Tejun Heo wrote: Alan Cox wrote: I think there have been enough cases where this draining was necessary. IIRC, ata_piix was involved in those cases, right? If so, can you please submit a patch which applies this only to affected controllers? I don't feel too confident about applying this to all SFF controllers. Old IDE does it on all controllers bar a couple. So we have a very good knowledge of what does/doesn't work. The one that needs care in old ide is an ordering issue where a state machine reset done first causes the drain of the I/O to hang. Hmmm... So, do we apply draining to all PATA? Or is ata_piix SATA affected too? I would think all SFF controllers, since a lot of first gen SATA are really bridged solutions. If they are flagging DRQ, I say oblige them :) Jeff
Re: [PATCH -mm] Hook up group scheduler with control groups
On Fri, 28 Sep 2007 01:05:12 +0530 Dhaval Giani <[EMAIL PROTECTED]> wrote: > On Thu, Sep 27, 2007 at 12:00:33PM -0700, Randy Dunlap wrote: > > On Thu, 27 Sep 2007 23:34:15 +0530 Dhaval Giani wrote: > > > > > > > > > +config RESOURCE_COUNTERS > > > + bool "Resource counters" > > > + help > > > + This option enables controller independent resource accounting > > > > Above line is tab + 2 spaces (i.e., correct). > > > > > + infrastructure that works with cgroups. > > > > Above line indent is 10 spaces (i.e., not correct). > > > > Ah! Thanks for the explanation. Corrected patch follows. > > Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]> > Signed-off-by : Dhaval Giani <[EMAIL PROTECTED]> > > ... > > @@ -219,6 +225,9 @@ static inline struct task_grp *task_grp( > > #ifdef CONFIG_FAIR_USER_SCHED > tg = p->user->tg; > +#elif CONFIG_FAIR_CGROUP_SCHED > + tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id), > + struct task_grp, css); > #else > tg = _task_grp; > #endif that's a bit funny-looking. Are CONFIG_FAIR_CGROUP_SCHED and CONFIG_FAIR_USER_SCHED mutually exclusive? Doesn't seem that way. If they're both defined then CONFIG_FAIR_USER_SCHED "wins". Anyway, please confirm that this is correct? I'll switch that to `#elif defined(CONFIG_FAIR_CGROUP_SCHED)'. We can get gcc warnings with `#if CONFIG_FOO', and people should be using `#ifdef CONFIG_FOO', so I assume the same applies to #elif.
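Both points can be checked with a small standalone program. Kconfig writes enabled bool options as `#define CONFIG_X 1`, so `#elif CONFIG_X` happens to work when the option is set, but an undefined macro in `#if`/`#elif` silently evaluates to 0 (gcc's -Wundef warns about exactly that), while `defined()` states the intent. The sketch below assumes a hypothetical configuration with both options enabled, which, as Andrew notes, does not appear to be ruled out; `task_grp_source()` is an illustrative name, not scheduler code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical .config with both options enabled. */
#define CONFIG_FAIR_USER_SCHED 1
#define CONFIG_FAIR_CGROUP_SCHED 1

static const char *task_grp_source(void)
{
#ifdef CONFIG_FAIR_USER_SCHED
	return "user";   /* first true branch wins */
#elif defined(CONFIG_FAIR_CGROUP_SCHED)
	return "cgroup"; /* never reached while FAIR_USER_SCHED is set */
#else
	return "init";
#endif
}
```

With both macros defined the function returns "user", which is the precedence Andrew points out.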
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27 2007, Theodore Tso wrote: > On Thu, Sep 27, 2007 at 04:19:12PM +0100, Alan Cox wrote: > > > Well it's not my call, just seems like a really bad idea to change the > > > error value. You can't claim full coverage for such testing anyway, it's > > > one of those things that people will complain about two releases later > > > saying it broke app foo. > > > > Strange since we've spent years changing error values and getting them > > right in the past. > > I doubt there are any apps which are going to specifically check for EFBIG > and do something different if they get EOVERFLOW instead. If it was > something like EAGAIN or EPERM, I'd be more concerned, but EFBIG > vs. EOVERFLOW? C'mon! It's not checking EFBIG vs EOVERFLOW, it's checking one and not the other. But I digress, not trying to NAK the patch, just voicing my opinion on the matter. It's not something you can easily test and claim good app coverage, though. -- Jens Axboe
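Jens's "checking one and not the other" case looks like the hypothetical handler below. An application that special-cases EFBIG and treats everything else as a generic failure quietly takes a different path once open() of a too-large file reports EOVERFLOW (the value POSIX specifies for that case) instead, and nothing at build time flags the change. The function and its messages are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical application code that special-cases exactly one errno
 * value after a failed open() of a large file. */
static const char *handle_open_error(int err)
{
	switch (err) {
	case EFBIG:
		return "retry with O_LARGEFILE";
	default:
		return "give up";
	}
}
```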
Re: More E820 brokenness
Jordan Crouse wrote: >>> >> Oh bugger, looks like this one might be genuinely my fault after all. >> The ID check in the new code is buggy. >> >> Can you please test this revised patch out (against current -git)? > > > That looks the same as the previous patch you sent? > Sorry, this is the right one... -hpa diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index bccaa1c..2f37568 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -28,11 +28,10 @@ static int detect_memory_e820(void) do { size = sizeof(struct e820entry); - id = SMAP; asm("int $0x15; setc %0" - : "=am" (err), "+b" (next), "+d" (id), "+c" (size), + : "=dm" (err), "+b" (next), "=a" (id), "+c" (size), "=m" (*desc) - : "D" (desc), "a" (0xe820)); + : "D" (desc), "d" (SMAP), "a" (0xe820)); /* Some BIOSes stop returning SMAP in the middle of the search loop. We don't know exactly how the BIOS
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Thu, Sep 27, 2007 at 04:10:31PM -0600, Matthew Wilcox wrote: > In the error handler, we wait_for_completion(io_reset_wait). > In sym2_io_error_detected, we init_completion(io_reset_wait). > Isn't it possible that we hit the error handler before we hit the > io_error_detected path, and thus the completion wait is lost? > Since the completion is already initialised in sym_attach(), I don't > think we need to initialise it in sym2_io_error_detected(). > Makes sense to just delete it? Good catch. But no ... and I had to study this a bit. Bear with me: It is enough to call init_completion() once, and not once per use: it initializes spinlocks, which shouldn't be initialized twice. But, that completion might be used multiple times when there are multiple errors, and so, before using it a second time, one must set completion->done = 0. The INIT_COMPLETION() macro does this. One must have completion->done = 0 before every use, as otherwise, wait_for_completion() won't actually wait. And since complete_all() sets x->done += UINT_MAX/2, I'm pretty sure x->done won't be zero the next time we use it, unless we make it so. So I need to find a place to safely call INIT_COMPLETION() again, after the completion has been used. At the moment, I'm stumped as to where to do this. [think ... think ... think] I think the race you describe above is harmless. The first time that sym_eh_handler() will run, it will be with SYM_EH_ABORT, and it doesn't matter if we lose that, since the device is hosed anyway. At some later time, it will run with SYM_EH_DEVICE_RESET and then SYM_EH_BUS_RESET and then SYM_EH_HOST_RESET, and we won't miss those, since, by now, sym2_io_error_detected() will have run. So, by my reading, I'd say that init_completion() in sym2_io_error_detected() has to stay (although perhaps it should be replaced by the INIT_COMPLETION() macro.) Removing it will prevent correct operation on the second and subsequent errors. 
--Linas
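Linas's reasoning about the `done` counter can be checked with a userspace model that keeps only that counter. It assumes the 2.6.23-era behaviour he describes: wait_for_completion() sleeps only while done == 0 and then consumes one count, complete_all() adds UINT_MAX/2, and INIT_COMPLETION() just resets the counter. The wait queue and its spinlock (the part init_completion() must not redo) are left out, and all names here are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Only the counter of a struct completion; no wait queue, no lock. */
struct completion_model {
	unsigned int done;
};

static void init_completion_model(struct completion_model *x)
{
	x->done = 0;
}

/* complete_all(): bump done by UINT_MAX/2 so every waiter gets through. */
static void complete_all_model(struct completion_model *x)
{
	x->done += UINT_MAX / 2;
}

/* wait_for_completion(): returns true if the caller would have slept
 * (done == 0); otherwise consumes one count, as the real code does. */
static bool wait_would_block(struct completion_model *x)
{
	if (x->done == 0)
		return true;
	x->done--;
	return false;
}

/* INIT_COMPLETION(): reset the counter only. */
static void reinit_completion_model(struct completion_model *x)
{
	x->done = 0;
}
```

After one complete_all(), every later wait falls straight through until the counter is reset, which is why the reset in sym2_io_error_detected() matters for the second and later errors.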
Re: More E820 brokenness
On 27/09/07 16:27 -0700, H. Peter Anvin wrote: > Jordan Crouse wrote: > > On 27/09/07 15:47 -0700, H. Peter Anvin wrote: > >> Jordan Crouse wrote: > >>> Breaks on the Geode - original behavior. > >>> > >>> I think that having boot_params.e820_entries != 0 makes the kernel > >>> assume the e820 data is correct. > >>> > >> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode, > >> because this, to the best of my reading, mimics the 2.6.22 behavior > >> exactly. DID IT REALLY, and/or did you make any kind of configuration > >> changes? > > > > I copied in a 2.6.22 kernel to see that it really did work, and it did. > > But here's the crazy part - I did a dmesg, and it looks like it > > *is* using e820 data, and it looks complete (I see the entire map - > > including the ACPI and reserved blocks way up high). > > > > So apparently it was the 2.6.22 code that was buggy, but reading it, > > I don't immediately see how. > > > > Oh bugger, looks like this one might be genuinely my fault after all. > The ID check in the new code is buggy. > > Can you please test this revised patch out (against current -git)? > -hpa > > > > diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c > index bccaa1c..84939b7 100644 > --- a/arch/i386/boot/memory.c > +++ b/arch/i386/boot/memory.c > @@ -34,17 +34,7 @@ static int detect_memory_e820(void) > "=m" (*desc) > : "D" (desc), "a" (0xe820)); > > - /* Some BIOSes stop returning SMAP in the middle of > -the search loop. We don't know exactly how the BIOS > -screwed up the map at that point, we might have a > -partial map, the full map, or complete garbage, so > -just return failure. */ > - if (id != SMAP) { > - count = 0; > - break; > - } > - > - if (err) > + if (id != SMAP || err) > break; > > count++; That looks the same as the previous patch you sent? Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc.
Re: Stardom SATA HSM violation
Alan Cox wrote: >> I think there have been enough cases where this draining was necessary. >> IIRC, ata_piix was involved in those cases, right? If so, can you >> please submit a patch which applies this only to affected controllers? >> I don't feel too confident about applying this to all SFF controllers. > > Old IDE does it on all controllers bar a couple. So we have a very good > knowledge of what does/doesn't work. The one that needs care in old ide > is an ordering issue where a state machine reset done first causes the > drain of the I/O to hang. Hmmm... So, do we apply draining to all PATA? Or is ata_piix SATA affected too? -- tejun
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote: > Would you accept a patch which causes the deprecated sysfs > files/directories to disappear, even if CONFIG_SYSFS_DEPRECATED is > defined, via a boot-time parameter? How about a mount option? That way people can test without a reboot: mount -o remount,deprecated={yes,no} /sys -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step."
Re: More E820 brokenness
Jordan Crouse wrote: > On 27/09/07 15:47 -0700, H. Peter Anvin wrote: >> Jordan Crouse wrote: >>> Breaks on the Geode - original behavior. >>> >>> I think that having boot_params.e820_entries != 0 makes the kernel >>> assume the e820 data is correct. >>> >> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode, >> because this, to the best of my reading, mimics the 2.6.22 behavior >> exactly. DID IT REALLY, and/or did you make any kind of configuration >> changes? > > I copied in a 2.6.22 kernel to see that it really did work, and it did. > But here's the crazy part - I did a dmesg, and it looks like it > *is* using e820 data, and it looks complete (I see the entire map - > including the ACPI and reserved blocks way up high). > > So apparently it was the 2.6.22 code that was buggy, but reading it, > I don't immediately see how. > Oh bugger, looks like this one might be genuinely my fault after all. The ID check in the new code is buggy. Can you please test this revised patch out (against current -git)? -hpa diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c index bccaa1c..84939b7 100644 --- a/arch/i386/boot/memory.c +++ b/arch/i386/boot/memory.c @@ -34,17 +34,7 @@ static int detect_memory_e820(void) "=m" (*desc) : "D" (desc), "a" (0xe820)); - /* Some BIOSes stop returning SMAP in the middle of - the search loop. We don't know exactly how the BIOS - screwed up the map at that point, we might have a - partial map, the full map, or complete garbage, so - just return failure. */ - if (id != SMAP) { - count = 0; - break; - } - - if (err) + if (id != SMAP || err) break; count++;
Re: sata_sil24 broken since 2.6.23-rc4-mm1
Torsten Kaiser wrote: > Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1. > I will try to look at the diff between these revisions some more, but > the change in sata_sil24.c looked like a perfect match for the > symptoms I was seeing. I think the first thing to do here is to verify 2.6.23-rc3-mm1 still works fine and my previous debug patch is pretty much meaningless if address initialization failure isn't the cause. Thanks. -- tejun
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27, 2007 at 06:27:48PM -0400, Kyle Moffett wrote: > On Sep 27, 2007, at 17:34:45, Greg KH wrote: >> On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote: >>> The fact that sysfs is all laid out in a directory, but for which some >>> directories/symlinks are OK to use, and some are NOT OK to use --- is why >>> I call the sysfs interface "an open pit". >> >> And because of the original design mistakes, we have only been able to >> change things for the better in a slow manner. We have had userspace >> programs fixed up for _years_ before we are able to make the corresponding >> changes in the kernel, so as to not break the distros that are slow to >> upgrade packages and kernels (like Debian.) > > Hey! No poking fingers at Debian here; it's been *MUCH* improved lately. Heh, sorry, but Debian in the past had a lot of problems in this area. It's good to know that this is no longer an issue :) thanks, greg k-h
Re: More E820 brokenness
Jordan Crouse wrote: > > I copied in a 2.6.22 kernel to see that it really did work, and it did. > But here's the crazy part - I did a dmesg, and it looks like it > *is* using e820 data, and it looks complete (I see the entire map - > including the ACPI and reserved blocks way up high). > > So apparently it was the 2.6.22 code that was buggy, but reading it, > I don't immediately see how. > Was this a stock 2.6.22 kernel, or might it have been modified? There is, of course, also the possibility that triggering the BIOS bug in your case depends on some delicate combination of input state. -hpa
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27, 2007 at 02:34:45PM -0700, Greg KH wrote: > Ok, how then should I advertise this better? What can we do better to > help userspace programmers out in this regard? Would you accept a patch which causes the deprecated sysfs files/directories to disappear, even if CONFIG_SYSFS_DEPRECATED is defined, via a boot-time parameter? Many people and distros are likely to keep CONFIG_SYSFS_DEPRECATED defined just out of paranoia that things might break. Doing a quick google, I note that Fedora has been going back and forth on turning it off, watching things break, and then turning it back on. The latest time, the changelog said: * Fri Jan 26 23:00:00 2007 Bill Nottingham - turn on CONFIG_SYSFS_DEPRECATED so that things actually work. *sigh* (and I've checked, Fedora's CVS still has CONFIG_SYSFS_DEPRECATED defined; it's not just Debian at fault here.) So having a boot-time parameter would make it much easier for application programmers (who run distro kernels and who are unlikely to want to compile their own custom kernel) to test to see what breaks without CONFIG_SYSFS_DEPRECATED. - Ted
Re: More E820 brokenness
On 27/09/07 15:47 -0700, H. Peter Anvin wrote: > Jordan Crouse wrote: > > > > Breaks on the Geode - original behavior. > > > > I think that having boot_params.e820_entries != 0 makes the kernel > > assume the e820 data is correct. > > > > Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode, > because this, to the best of my reading, mimics the 2.6.22 behavior > exactly. DID IT REALLY, and/or did you make any kind of configuration > changes? I copied in a 2.6.22 kernel to see that it really did work, and it did. But here's the crazy part - I did a dmesg, and it looks like it *is* using e820 data, and it looks complete (I see the entire map - including the ACPI and reserved blocks way up high). So apparently it was the 2.6.22 code that was buggy, but reading it, I don't immediately see how. Jordan -- Jordan Crouse Systems Software Development Engineer Advanced Micro Devices, Inc.
Re: [PATCH] kswapd should only wait on IO if there is IO
On Thu, 27 Sep 2007 15:59:07 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote: > And lost the changelog ;) Good point. The current kswapd (and try_to_free_pages) code has an oddity where the code will wait on IO, even if there is no IO in flight. This problem is notable especially when the system scans through many unfreeable pages, causing unnecessary stalls in the VM. Additionally, tasks without __GFP_FS or __GFP_IO in the direct reclaim path will sleep if a significant number of pages are encountered that should be written out. This gives kswapd a chance to write out those pages, while the direct reclaim task sleeps. Signed-off-by: Rik van Riel <[EMAIL PROTECTED]> diff -up linux-2.6.22/mm/vmscan.c.wait linux-2.6.22/mm/vmscan.c --- linux-2.6.22/mm/vmscan.c.wait 2007-09-27 18:45:57.0 -0400 +++ linux-2.6.22/mm/vmscan.c2007-09-27 18:48:43.0 -0400 @@ -68,6 +68,13 @@ struct scan_control { int all_unreclaimable; int order; + + /* +* Pages that have (or should have) IO pending. If we run into +* a lot of these, we're better off waiting a little for IO to +* finish rather than scanning more pages in the VM. 
+*/ + int nr_io_pages; }; #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru)) @@ -489,8 +496,10 @@ static unsigned long shrink_page_list(st */ if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs) wait_on_page_writeback(page); - else + else { + sc->nr_io_pages++; goto keep_locked; + } } referenced = page_referenced(page, 1); @@ -529,8 +538,10 @@ static unsigned long shrink_page_list(st if (PageDirty(page)) { if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced) goto keep_locked; - if (!may_enter_fs) + if (!may_enter_fs) { + sc->nr_io_pages++; goto keep_locked; + } if (!sc->may_writepage) goto keep_locked; @@ -541,8 +552,10 @@ static unsigned long shrink_page_list(st case PAGE_ACTIVATE: goto activate_locked; case PAGE_SUCCESS: - if (PageWriteback(page) || PageDirty(page)) + if (PageWriteback(page) || PageDirty(page)) { + sc->nr_io_pages++; goto keep; + } /* * A synchronous write - probably a ramdisk. Go * ahead and try to reclaim the page. @@ -1201,6 +1214,7 @@ unsigned long try_to_free_pages(struct z for (priority = DEF_PRIORITY; priority >= 0; priority--) { sc.nr_scanned = 0; + sc.nr_io_pages = 0; if (!priority) disable_swap_token(); nr_reclaimed += shrink_zones(priority, zones, ); @@ -1229,7 +1243,8 @@ unsigned long try_to_free_pages(struct z } /* Take a nap, wait for some writeback to complete */ - if (sc.nr_scanned && priority < DEF_PRIORITY - 2) + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && + sc.nr_io_pages > sc.swap_cluster_max) congestion_wait(WRITE, HZ/10); } /* top priority shrink_caches still had more to do? don't OOM, then */ @@ -1315,6 +1330,7 @@ loop_again: if (!priority) disable_swap_token(); + sc.nr_io_pages = 0; all_zones_ok = 1; /* @@ -1398,7 +1414,8 @@ loop_again: * OK, kswapd is getting into trouble. Take a nap, then take * another pass across the zones. 
*/ - if (total_scanned && priority < DEF_PRIORITY - 2) + if (total_scanned && priority < DEF_PRIORITY - 2 && + sc.nr_io_pages > sc.swap_cluster_max) congestion_wait(WRITE, HZ/10); /*
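The patched nap condition can be read as a pure predicate. Below is a minimal userspace sketch of the test in try_to_free_pages() (kswapd's balance_pgdat() makes the same test with total_scanned in place of nr_scanned). It assumes DEF_PRIORITY is 12, as in kernels of this era; should_nap() is a hypothetical name, not a vmscan.c function.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12 /* value from include/linux/mmzone.h at the time */

/*
 * Hypothetical helper mirroring the patched test:
 *   if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
 *       sc.nr_io_pages > sc.swap_cluster_max)
 *           congestion_wait(WRITE, HZ/10);
 */
static bool should_nap(unsigned long nr_scanned, int priority,
		       unsigned long nr_io_pages,
		       unsigned long swap_cluster_max)
{
	return nr_scanned != 0 &&
	       priority < DEF_PRIORITY - 2 &&
	       nr_io_pages > swap_cluster_max;
}
```

Before the patch, any pass at priority 9 or lower that scanned pages would nap. With swap_cluster_max at its usual SWAP_CLUSTER_MAX (32), the reclaimer now sleeps only after running into more than 32 pages that have, or should have, IO pending; otherwise it keeps scanning.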
Re: Problems with SMP & ACPI powering off
Rafael J. Wysocki wrote: On Thursday, 27 September 2007 23:29, Mark Lord wrote: Question: do we disable all CPUs except 0 when doing ACPI power off? No, but we should. Background: I have a machine here dedicated to running MythTV. It powers up to record, and then sets the RTC alarm for next time and powers down again in between recordings. It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard. Previously it was on a completely different (vendor,bios,...) ICH7 motherboard. In both cases, "halt -p" sometimes fails to actually turn off the power, which means that it later then fails to "turn on" to record again. Annoying. This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support enabled. So I'm wondering if it may be due to the old SMP-poweroff bogeyman? Maybe. Which kernel? Latest 2.6.23-rc-git. Same problem from time to time on 2.6.17, as well. Dunno about in between those Revs., but it's much more common on the latest than it was on the old kernel. For now, I've hardcoded a cpu_down(1) into the poweroff code, and we'll see if that helps or is merely redundant. But I do wonder where else to look for a cause? Two different boards, vendors, BIOSs, same CPU chip. Same problem. Same chipset, perchance? Mmmm I originally didn't think so. But actually one board is ICH8, the other ICH8R, so yes, they use the same chipset. Cheers
Re: [Announce] Linux-tiny project revival
On Thursday 27 September 2007 2:00:36 am Arnd Bergmann wrote: > #define KERN_NOTICE "<5>", > > #define PRINTK_CONTINUED "", > > #define printk(level, str, ...) \ >do { \ > if (sizeof(level) == 1) /* continued printk */\ > actual_printk(str, __VA_ARGS__); \ > else if ((level[1] - '0') < CONFIG_PRINTK_DOICARE) \ >actual_printk(level str, __VA_ARGS__); \ >} while(0); > > Then you don't have to change every single printk in the kernel, but > only those that don't currently come with a log level. More importantly, > you can do the conversion without a flag day, by spreading (an empty) > PRINTK_CONTINUED in places that do need a printk without a log level. The "change every printk in the kernel" suggestion came from me trying to figure out how to get the printk() calls below a certain log level to optimize out and not take up space in the binary. The above doesn't address the original cause of the thread, as far as I can tell. > Arnd <>< Rob -- "One of my most productive days was throwing away 1000 lines of code." - Ken Thompson.
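To make the mechanics of Arnd's macro concrete, here is a userspace sketch of the same idea, not the kernel's printk. Assumptions and changes, all hypothetical: the level is passed as its own string-literal argument instead of relying on trailing commas in the KERN_* defines; the format string travels inside `__VA_ARGS__` so calls with no extra arguments compile in standard C; the stray semicolon after `while (0)` is dropped; and `LOGLEVEL_THRESHOLD`, `MY_KERN_*`, `emit()` and `my_printk()` stand in for CONFIG_PRINTK_DOICARE, KERN_*, actual_printk() and printk(). Because `"<N>"[1]` is known at compile time, an optimizing compiler can usually fold the comparison and drop filtered calls, which is the space saving Rob is after.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define LOGLEVEL_THRESHOLD 6   /* stand-in for CONFIG_PRINTK_DOICARE */
#define MY_KERN_NOTICE "<5>"
#define MY_KERN_DEBUG "<7>"
#define MY_PRINTK_CONTINUED ""

static int emitted; /* how many messages passed the filter */

static void emit(const char *prefix, const char *fmt, ...)
{
	va_list ap;

	fputs(prefix, stdout);
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
	emitted++;
}

/* sizeof() tells "" (size 1) apart from "<N>" (size 4); level[1] is the
 * digit, compared against the threshold as in Arnd's version. */
#define my_printk(level, ...)                                   \
	do {                                                    \
		if (sizeof(level) == 1)                         \
			emit("", __VA_ARGS__);                  \
		else if ((level)[1] - '0' < LOGLEVEL_THRESHOLD) \
			emit(level, __VA_ARGS__);               \
	} while (0)
```

With a threshold of 6, a `<5>` notice and a continuation are printed while a `<7>` debug message is filtered out.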
Re: [Bluez-devel] Warnings and Bug on 2.6.23-rc6 closing rfcomm links (device_move() API ?)
Hi Cornelia, > > >> Yet another report, once again while putting rfcomm system under load. > > >> Several USB adapters, several links. > > > > > > Is this a regression or does it happen with 2.6.22 too? > > > > I've not tested with 2.6.22, but have done it a few days ago with > > 2.6.21-2-486 (stock debian package), and got the 2 Oops below. Maybe > > that's a different problem, or maybe not? > > > > > > kobject_add failed for rfcomm1 with -EEXIST, don't try to register > > things with the same name in the same directory. > > There's something wrong with rfcomm trying to create objects with > duplicate names... that should have been fixed. Regards Marcel
Re: [PATCH] kswapd should only wait on IO if there is IO
On Thu, 27 Sep 2007 18:50:27 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > On Thu, 27 Sep 2007 15:21:21 -0700 > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > Nope, sc.nr_io_pages will also be incremented when the code runs into > > > pages that are already PageWriteback. > > > > yup, I didn't think of that. Hopefully someone else will be in there > > working on that zone too. If this caller yields and defers to kswapd > > then that's very likely. Except we just took away the ability to do that.. > > if (PageDirty(page)) { > if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && > referenced) > goto keep_locked; > if (!may_enter_fs) > goto keep_locked; > > I think we can fix that problem by adding a sc->nr_io_pages++ > between the last if and the goto keep_locked in shrink_page_list. > > That way !GFP_IO or !GFP_FS tasks will cause themselves to sleep > if there are pages that need to be written out, even if those > pages are not in flight to disk yet. yeah, that's prudent I guess. > I have also added the comment you wanted. And lost the changelog ;) > - if (sc.nr_scanned && priority < DEF_PRIORITY - 2) > + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && > + sc.nr_io_pages > sc.swap_cluster_max) I do think this design decision needs a bit of explanation too. > congestion_wait(WRITE, HZ/10); > } > /* top priority shrink_caches still had more to do? don't OOM, then */ > @@ -1315,6 +1330,7 @@ loop_again: > if (!priority) > disable_swap_token(); > > + sc.nr_io_pages = 0; > all_zones_ok = 1; > > /* > @@ -1398,7 +1414,8 @@ loop_again: >* OK, kswapd is getting into trouble. Take a nap, then take >* another pass across the zones. >*/ > - if (total_scanned && priority < DEF_PRIORITY - 2) > + if (total_scanned && priority < DEF_PRIORITY - 2 && As did that one. Ho hum :( Maybe it's in the git history somewhere. 
Re: [PATCH] kswapd should only wait on IO if there is IO
On Thu, 27 Sep 2007 15:21:21 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote: > > Nope, sc.nr_io_pages will also be incremented when the code runs into > > pages that are already PageWriteback. > > yup, I didn't think of that. Hopefully someone else will be in there > working on that zone too. If this caller yields and defers to kswapd > then that's very likely. Except we just took away the ability to do that.. if (PageDirty(page)) { if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced) goto keep_locked; if (!may_enter_fs) goto keep_locked; I think we can fix that problem by adding a sc->nr_io_pages++ between the last if and the goto keep_locked in shrink_page_list. That way !GFP_IO or !GFP_FS tasks will cause themselves to sleep if there are pages that need to be written out, even if those pages are not in flight to disk yet. I have also added the comment you wanted. Signed-off-by: Rik van Riel <[EMAIL PROTECTED]> diff -up linux-2.6.23-rc7/mm/vmscan.c.wait linux-2.6.22/mm/vmscan.c --- linux-2.6.23-rc7/mm/vmscan.c.wait 2007-09-27 18:45:57.0 -0400 +++ linux-2.6.23-rc7/mm/vmscan.c2007-09-27 18:48:43.0 -0400 @@ -68,6 +68,13 @@ struct scan_control { int all_unreclaimable; int order; + + /* +* Pages that have (or should have) IO pending. If we run into +* a lot of these, we're better off waiting a little for IO to +* finish rather than scanning more pages in the VM. 
+*/ + int nr_io_pages; }; #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru)) @@ -489,8 +496,10 @@ static unsigned long shrink_page_list(st */ if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs) wait_on_page_writeback(page); - else + else { + sc->nr_io_pages++; goto keep_locked; + } } referenced = page_referenced(page, 1); @@ -529,8 +538,10 @@ static unsigned long shrink_page_list(st if (PageDirty(page)) { if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced) goto keep_locked; - if (!may_enter_fs) + if (!may_enter_fs) { + sc->nr_io_pages++; goto keep_locked; + } if (!sc->may_writepage) goto keep_locked; @@ -541,8 +552,10 @@ static unsigned long shrink_page_list(st case PAGE_ACTIVATE: goto activate_locked; case PAGE_SUCCESS: - if (PageWriteback(page) || PageDirty(page)) + if (PageWriteback(page) || PageDirty(page)) { + sc->nr_io_pages++; goto keep; + } /* * A synchronous write - probably a ramdisk. Go * ahead and try to reclaim the page. @@ -1201,6 +1214,7 @@ unsigned long try_to_free_pages(struct z for (priority = DEF_PRIORITY; priority >= 0; priority--) { sc.nr_scanned = 0; + sc.nr_io_pages = 0; if (!priority) disable_swap_token(); nr_reclaimed += shrink_zones(priority, zones, ); @@ -1229,7 +1243,8 @@ unsigned long try_to_free_pages(struct z } /* Take a nap, wait for some writeback to complete */ - if (sc.nr_scanned && priority < DEF_PRIORITY - 2) + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && + sc.nr_io_pages > sc.swap_cluster_max) congestion_wait(WRITE, HZ/10); } /* top priority shrink_caches still had more to do? don't OOM, then */ @@ -1315,6 +1330,7 @@ loop_again: if (!priority) disable_swap_token(); + sc.nr_io_pages = 0; all_zones_ok = 1; /* @@ -1398,7 +1414,8 @@ loop_again: * OK, kswapd is getting into trouble. Take a nap, then take * another pass across the zones. 
 */ - if (total_scanned && priority < DEF_PRIORITY - 2) + if (total_scanned && priority < DEF_PRIORITY - 2 && + sc.nr_io_pages > sc.swap_cluster_max) congestion_wait(WRITE, HZ/10); /*
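The throttling rule the patch introduces can be read as a plain predicate. The sketch below models it in userspace; `struct scan_state`, `old_should_wait()` and `new_should_wait()` are hypothetical names, and only `DEF_PRIORITY` (12) is taken from the kernel headers. It shows that a pass which found no pages under IO no longer naps, while one that found more than a batch's worth still does.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12 /* value used by the kernel's reclaim code */

/* Hypothetical userspace mirror of the scan_control fields involved. */
struct scan_state {
	unsigned long nr_scanned;       /* pages looked at this pass */
	unsigned long nr_io_pages;      /* pages with IO pending (or due) */
	unsigned long swap_cluster_max; /* reclaim batch size */
};

/* Old rule: nap whenever we scanned something at elevated priority. */
static bool old_should_wait(const struct scan_state *sc, int priority)
{
	return sc->nr_scanned && priority < DEF_PRIORITY - 2;
}

/* Patched rule: additionally require more than one batch of pages to be
 * waiting on IO, so congestion_wait() has something to wait for. */
static bool new_should_wait(const struct scan_state *sc, int priority)
{
	return old_should_wait(sc, priority) &&
	       sc->nr_io_pages > sc->swap_cluster_max;
}
```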
[PATCH] removes array_size duplicates
This patch removes some ARRAY_SIZE macro duplicates. There is also one in arch/um/include/user.h, which isn't fixed here because comments in that file explicitly state a preference for the 'less fancy' version. If that's the case as well for any of the other replacements please comment. Signed-off-by: Roel Kluin <[EMAIL PROTECTED]> --- Documentation/spi/spidev_test.c|2 -- arch/i386/boot/compressed/relocs.c |1 - arch/m68k/amiga/amisound.c |3 +-- arch/powerpc/boot/types.h |2 -- arch/sparc64/kernel/pci.c |6 ++ drivers/acpi/utilities/uteval.c|4 ++-- drivers/net/irda/actisys-sir.c |6 ++ drivers/net/lp486e.c |4 +--- drivers/net/sk98lin/skgemib.c |5 - drivers/net/skfp/smt.c |4 +--- drivers/net/skfp/srf.c | 18 +++--- drivers/net/wireless/ipw2100.c | 13 - drivers/serial/68328serial.c |6 ++ drivers/video/sgivwfb.c|4 ++-- include/acpi/acmacros.h|2 -- include/linux/netfilter/xt_sctp.h | 12 +--- include/net/ip_vs.h|1 - include/video/sgivw.h |1 - net/ipv4/ipvs/ip_vs_proto_tcp.c|2 +- scripts/mod/file2alias.c |2 -- 20 files changed, 30 insertions(+), 68 deletions(-) diff --git a/Documentation/spi/spidev_test.c b/Documentation/spi/spidev_test.c index 218e862..0f23aac 100644 --- a/Documentation/spi/spidev_test.c +++ b/Documentation/spi/spidev_test.c @@ -21,8 +21,6 @@ #include #include -#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0])) - static void pabort(const char *s) { perror(s); diff --git a/arch/i386/boot/compressed/relocs.c b/arch/i386/boot/compressed/relocs.c index 2d77ee7..5d8dbff 100644 --- a/arch/i386/boot/compressed/relocs.c +++ b/arch/i386/boot/compressed/relocs.c @@ -11,7 +11,6 @@ #include #define MAX_SHDRS 100 -#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) static Elf32_Ehdr ehdr; static Elf32_Shdr shdr[MAX_SHDRS]; static Elf32_Sym *symtab[MAX_SHDRS]; diff --git a/arch/m68k/amiga/amisound.c b/arch/m68k/amiga/amisound.c index 1f5bfb5..8d013a1 100644 --- a/arch/m68k/amiga/amisound.c +++ b/arch/m68k/amiga/amisound.c @@ -21,7 +21,6 @@ static const signed char 
sine_data[] = { 0, 39, 75, 103, 121, 127, 121, 103, 75, 39, 0, -39, -75, -103, -121, -127, -121, -103, -75, -39 }; -#define DATA_SIZE (sizeof(sine_data)/sizeof(sine_data[0])) #define custom amiga_custom @@ -55,7 +54,7 @@ void __init amiga_init_sound(void) memcpy (snd_data, sine_data, sizeof(sine_data)); /* setup divisor */ - clock_constant = (amiga_colorclock+DATA_SIZE/2)/DATA_SIZE; + clock_constant = (amiga_colorclock + ARRAY_SIZE(sine_data) /2) / ARRAY_SIZE(sine_data); /* without amifb, turn video off and enable high quality sound */ #ifndef CONFIG_FB_AMIGA diff --git a/arch/powerpc/boot/types.h b/arch/powerpc/boot/types.h index 31393d1..733622a 100644 --- a/arch/powerpc/boot/types.h +++ b/arch/powerpc/boot/types.h @@ -1,8 +1,6 @@ #ifndef _TYPES_H_ #define _TYPES_H_ -#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) - typedef unsigned char u8; typedef unsigned short u16; typedef unsigned int u32; diff --git a/arch/sparc64/kernel/pci.c b/arch/sparc64/kernel/pci.c index e8dac81..5c8c433 100644 --- a/arch/sparc64/kernel/pci.c +++ b/arch/sparc64/kernel/pci.c @@ -209,14 +209,12 @@ static struct { { "SUNW,sun4v-pci", sun4v_pci_init }, { "pciex108e,80f0", fire_pci_init }, }; -#define PCI_NUM_CONTROLLER_TYPES (sizeof(pci_controller_table) / \ - sizeof(pci_controller_table[0])) static int __init pci_controller_init(const char *model_name, int namelen, struct device_node *dp) { int i; - for (i = 0; i < PCI_NUM_CONTROLLER_TYPES; i++) { + for (i = 0; i < ARRAY_SIZE(pci_controller_table); i++) { if (!strncmp(model_name, pci_controller_table[i].model_name, namelen)) { @@ -232,7 +230,7 @@ static int __init pci_is_controller(const char *model_name, int namelen, struct { int i; - for (i = 0; i < PCI_NUM_CONTROLLER_TYPES; i++) { + for (i = 0; i < ARRAY_SIZE(pci_controller_table); i++) { if (!strncmp(model_name, pci_controller_table[i].model_name, namelen)) { diff --git a/drivers/acpi/utilities/uteval.c b/drivers/acpi/utilities/uteval.c index 0042b7e..5da86d5 100644 --- 
a/drivers/acpi/utilities/uteval.c +++ b/drivers/acpi/utilities/uteval.c @@ -122,7 +122,7 @@ acpi_status acpi_ut_osi_implementation(struct acpi_walk_state *walk_state) /* Compare input string to static table of supported interfaces */ - for (i = 0; i < ACPI_ARRAY_LENGTH(acpi_interfaces_supported); i++) { +
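For readers outside the kernel tree, the helper the patch standardizes on is the classic element-count idiom. The userspace sketch below uses the plain form (current kernels' `ARRAY_SIZE()` additionally rejects pointers at compile time) and spells out what the amisound hunk computes: `(colorclock + n/2) / n` is integer division rounded to the nearest value. `rounded_div()` is a hypothetical helper name introduced only for illustration.

```c
#include <assert.h>

/* Plain element-count idiom; only valid on true arrays, not pointers. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const signed char sine_data[] = {
	0, 39, 75, 103, 121, 127, 121, 103, 75, 39,
	0, -39, -75, -103, -121, -127, -121, -103, -75, -39
};

/* Divide num by den, rounding to nearest (halves round up): the same
 * arithmetic as the amisound clock_constant computation. */
static unsigned long rounded_div(unsigned long num, unsigned long den)
{
	return (num + den / 2) / den;
}
```

Here `ARRAY_SIZE(sine_data)` is 20, so `rounded_div(105, ARRAY_SIZE(sine_data))` yields 5 (5.25 rounds down) and `rounded_div(110, ARRAY_SIZE(sine_data))` yields 6 (5.5 rounds up).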
Re: More E820 brokenness
Jordan Crouse wrote: > > Breaks on the Geode - original behavior. > > I think that having boot_params.e820_entries != 0 makes the kernel > assume the e820 data is correct. > Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode, because this, to the best of my reading, mimics the 2.6.22 behavior exactly. DID IT REALLY, and/or did you make any kind of configuration changes? >> I want to emphasize that this is seriously broken. Using a partial e820 >> map could have disastrous results, since the kernel will have partial >> memory map information and not know about reserved areas, etc. Part of >> me feels that the right thing to do is what the current git kernel does >> -- either fall back to e801, or stop and error. > > I'm inclined to agree. Arguably the right thing to do is to find the responsible BIOS engineer and shoot them, but that's hard to do without robotics. -hpa
Re: [PATCHSET 4/4] sysfs: implement new features
On Sep 25, 2007, at 18:50:05, Greg KH wrote: On Thu, Sep 20, 2007 at 05:31:37PM +0900, Tejun Heo wrote: * Name-formatting for symlinks. e.g. symlink pointing to /dira/dirb/leaf can be named as "symlink:%1-%0" and it will show up as "symlink:dirb-leaf". This only applies when new interface is used. Is this really necessary? It looks like we are adding a "special" type of parser here that no one uses. IMHO this would be nicer if it could reuse existing sprintf code to handle all the nice shiny sprintf format specifiers. The only challenge would be how to dynamically build a varargs list from an array of component names although perhaps there could be an internal __csprintf function which took a callback for retrieving arguments. Also since all of the path components are strings I don't know that numeric specifiers could be made useful, so perhaps it's not the greatest idea. I think the primary importance for this functionality is: * Autorenaming of symlinks according to the name format string when target or one of its ancestors is renamed or moved. This only applies when new interface is used. Nice. Cheers, Kyle Moffett
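As a rough illustration of the name-formatting feature under discussion, here is a userspace sketch. It assumes, from the example in the thread, that `%N` selects the N-th path component counted back from the target, so `%0` is the leaf and `%1` its parent. `format_link_name()` is a hypothetical helper, not Tejun's sysfs code, and it only handles single-digit indices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand fmt into out, replacing "%N" (N = 0..9) with the N-th path
 * component counted back from the leaf of path ("%0" is the leaf).
 * Returns 0 on success, -1 on a bad index or if out is too small. */
static int format_link_name(const char *fmt, const char *path,
			    char *out, size_t outlen)
{
	const char *comp[16];
	size_t clen[16];
	int n = 0;
	size_t used = 0;

	if (outlen == 0)
		return -1;

	/* Split path on '/' into at most 16 non-empty components. */
	for (const char *p = path; *p; ) {
		while (*p == '/')
			p++;
		if (!*p)
			break;
		if (n == 16)
			return -1;
		comp[n] = p;
		while (*p && *p != '/')
			p++;
		clen[n] = (size_t)(p - comp[n]);
		n++;
	}

	/* Copy fmt, substituting components for "%N" sequences. */
	for (const char *f = fmt; *f; f++) {
		const char *src = f;
		size_t len = 1;

		if (f[0] == '%' && f[1] >= '0' && f[1] <= '9') {
			int idx = f[1] - '0';

			if (idx >= n)
				return -1;
			src = comp[n - 1 - idx];
			len = clen[n - 1 - idx];
			f++; /* skip the digit */
		}
		if (used + len >= outlen) /* keep room for the NUL */
			return -1;
		memcpy(out + used, src, len);
		used += len;
	}
	out[used] = '\0';
	return 0;
}
```

With the thread's example, `format_link_name("symlink:%1-%0", "/dira/dirb/leaf", buf, sizeof(buf))` produces `symlink:dirb-leaf`; re-running it after a rename is one way the autorenaming feature could recompute the link name.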
Re: [Announce] Linux-tiny project revival
On Thursday 27 September 2007, you wrote: > > Then you don't have to change every single printk in the kernel, but > > only those that don't currently come with a log level. More importantly, > > you can do the conversion without a flag day, by spreading (an empty) > > PRINTK_CONTINUED in places that do need a printk without a log level. > > The problem is, how do you know whether to print a continued printk or not? > It depends on the loglevel of the first printk. Those need to be looked at individually. You can normally see easily from the context whether the missing log level was an accident, or the author actually has multiple printk statements for a single line. In one case, you would add a log level, in the other case, you can add PRINTK_CONTINUED, or something similar. An alternative to PRINTK_CONTINUED might be a new function, e.g. printk_continued() or similar that does not expect a log level. > So besides compile-time parsing of the source code, replacing printk with > loglevel specific alternatives (one way or the other) seems the only option. That would mean replacing all of them, not just those that currently lack a loglevel. Arnd <><
Re: More E820 brokenness
On 27/09/07 15:17 -0700, H. Peter Anvin wrote: > As luck would have it, it's not just an obscure Geode system which has a > broken E820 implementation. Today I received a bug report about a Dell > system (XPS M1330) with broken E820. > > Unfortunately, the workaround for the Geode breaks this system, because > x86-64 doesn't fall back to the e801/88 information like the i386 kernel > does. > > I wonder if the relevant people could test out this patch to see how it > works on their respective system. This patch reverts to 2.6.23-rc8 > behaviour of simply truncating the map, but still makes e801/88 info > available to the kernel; this hopefully should match 2.6.22 behaviour. Breaks on the Geode - original behavior. I think that having boot_params.e820_entries != 0 makes the kernel assume the e820 data is correct. > I want to emphasize that this is seriously broken. Using a partial e820 > map could have disastrous results, since the kernel will have partial > memory map information and not know about reserved areas, etc. Part of > me feels that the right thing to do is what the current git kernel does > -- either fall back to e801, or stop and error. I'm inclined to agree. Jordan
Re: 2.6.23-rc7 - _random_ IRQ23 : nobody cared
On Thu, 2007-09-27 at 10:05 +, Paul Rolland wrote: > Hello, > > On Thu, 27 Sep 2007 19:04:11 +1000 > Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote: > > > Let me guess... this is a T61 or X61 ? > Bad luck ;) > > This is an Asus P5W-DH Deluxe motherboard, with a Core2 6400 CPU, > a bunch of disk (2 IDE, 3 SATA, 1 CDRW and 1 DVDRW-DL), and a damned > Olitec PCI V92 V2 modem. What chipset ? 965gm ? Ben.
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Sep 27, 2007, at 17:34:45, Greg KH wrote: On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote: The fact that sysfs is all laid out in a directory, but for which some directories/symlinks are OK to use, and some are NOT OK to use --- is why I call the sysfs interface "an open pit". And because of the original design mistakes, we have only been able to change things for the better in a slow manner. We have had userspace programs fixed up for _years_ before we are able to make the corresponding changes in the kernel, so as to not break the distros that are slow to upgrade packages and kernels (like Debian.) Hey! No poking fingers at Debian here; it's been *MUCH* improved lately. I far more frequently have problems with boxes still running some ancient release of RHEL-4 or something than I do with those running Debian stable (virtually always the latest Debian stable). Cheers, Kyle Moffett
Re: [PATCH] kswapd should only wait on IO if there is IO
On Thu, 27 Sep 2007 18:13:25 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > On Thu, 27 Sep 2007 14:47:02 -0700 > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > On Thu, 27 Sep 2007 17:08:16 -0400 > > Rik van Riel <[EMAIL PROTECTED]> wrote: > > > > > The current kswapd (and try_to_free_pages) code has an oddity where the > > > code will wait on IO, even if there is no IO in flight. This problem is > > > notable especially when the system scans through many unfreeable pages, > > > causing unnecessary stalls in the VM. > > > > > > > What effect did this change have? > > Kswapd was no longer sitting in "D" state as often and pages got > freed more promptly. The test was done on a RHEL kernel with > this change though - I guess I should redo it with a current upstream > kernel. OK. Yes, it should help quite a bit in the common cases. > > > > > > /* Take a nap, wait for some writeback to complete */ > > > - if (sc.nr_scanned && priority < DEF_PRIORITY - 2) > > > + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && > > > + sc.nr_io_pages > sc.swap_cluster_max) > > > > The comparison with swap_cluster_max is unobvious, and merits a > > comment. What is the thinking here? > > If the number of pages undergoing IO is really small, waiting > for them may be a waste of time. > > Maybe my thinking is wrong, not sure... The thinking sounds good to me, but I'm looking for weirdo side-effects in corner cases. And I'm trying to work out what actual design we want to have behind these various magic numbers and thresholds. > > Also, we now have this: > > > > if (total_scanned > sc.swap_cluster_max + > > sc.swap_cluster_max / 2) { > > wakeup_pdflush(laptop_mode ? 
0 : total_scanned); > > sc.may_writepage = 1; > > } > > > > /* Take a nap, wait for some writeback to complete */ > > if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && > > sc.nr_io_pages > sc.swap_cluster_max) > > congestion_wait(WRITE, HZ/10); > > > > > > So in the case where total_scanned has not yet reached > > swap_cluster_max, this process isn't initiating writeout and it isn't > > sleeping, either. Nor is it incrementing nr_io_pages. > > Actually, nr_io_pages is also incremented when we run into pages that > are already PageWriteback - even if we did not start the IO ourselves. OK, that'll help a lot in this scenario. > > In the range (swap_cluster_max < nr_io_pages < 1.5*swap_cluster_max) this > > process still isn't incrementing nr_io_pages, but it _is_ running > > congestion_wait(). > > It is incrementing sc.nr_io_pages and will wait on IO to complete if > the amount of pages in flight to disk that it scanned over is larger > than the number of pages that it is trying to free. > > > Once nr_io_pages exceeds 1.5*swap_cluster_max, this process is both > > initiating IO and is throttling on writeback completion events. > > > > This all seems a bit weird and arbitrary - what is the reason for > > throttling-but-not-writing in that 1.0->1.5 window? > > Good question. Note that the throttling-but-not-writing window in > the current code is 0.0->1.5, so this patch does reduce the throttling > window compared to the current code. > > What is the reason that the current code does IO throttling even if > there is no IO at all in flight? Buggered if I know ;) It may have the accidental effect that it opens a window in which some may_enter_fs-capable process can get scheduled and do some writeout, perhaps. > > If there _is_ a reason and it's all been carefully thought out and > > designed, then can we please capture a description of that design in the > > changelog or in the code? > > I'll add a description for the sc.nr_io_pages > sc.swap_cluster_max > test. 
OK, thanks. Perhaps a few words tacked onto the nr_io_pages definition site would be the place to capture this. > > Also, I wonder about what this change will do to the dynamic behaviour of > > GFP_NOFS direct-reclaimers. Previously they would throttle if they > > encounter dirty pages which they can't write out. Hopefully someone else > > (kswapd or a __GFP_FS direct-reclaimer) will write some of those pages > > and this caller will be woken when that writeout completes and will go off > > and scoop them off the tail of the LRU. > > > > But after this change, such a GFP_NOFS caller will, afaict, burn its way > > through potentially the entire inactive list and will then declare oom. > > Nope, sc.nr_io_pages will also be incremented when the code runs into > pages that are already PageWriteback. yup, I didn't think of that. Hopefully someone else will be in there working on that zone too. If this caller yields and defers to kswapd then that's very likely. Except we just took away the ability to do that..
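Andrew's case analysis can be tabulated with a small model. The sketch below is a simplification, not vmscan code: it treats the two thresholds from the quoted snippet independently, assumes `nr_scanned != 0` and `priority < DEF_PRIORITY - 2` already hold, and uses the hypothetical names `struct reclaim_action` and `classify()`. Writeback is kicked once `total_scanned` exceeds 1.5 × `swap_cluster_max`; the patched nap happens once `nr_io_pages` exceeds `swap_cluster_max`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what one reclaim pass decides to do. */
struct reclaim_action {
	bool start_writeout; /* wakeup_pdflush() + sc.may_writepage = 1 */
	bool throttle;       /* congestion_wait(WRITE, HZ/10) */
};

/* Model of the two thresholds in the quoted try_to_free_pages() code. */
static struct reclaim_action classify(unsigned long total_scanned,
				      unsigned long nr_io_pages,
				      unsigned long swap_cluster_max)
{
	struct reclaim_action a;

	a.start_writeout = total_scanned >
			   swap_cluster_max + swap_cluster_max / 2;
	a.throttle = nr_io_pages > swap_cluster_max;
	return a;
}
```

If, as in Andrew's description, the scanned and in-flight counts move together, then with `swap_cluster_max = 32` a pass at 40 pages throttles without kicking writeout (the 1.0 to 1.5 window), a pass at 100 does both, and a pass at 20 does neither.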
Re: 2.6.23-rc8-mm1 -- powerpc link failure
On Thu, 27 Sep 2007, Andrew Morton wrote: > > +extern void arch_randomize_brk(void); > > #include "../../../fs/binfmt_elf.c" > Is this sinful extern-decl-in-C actually needed? Some time has passed since I wrote the patch, but I remember that this was needed, otherwise under some circumstances the build failed, but I don't remember details ... I will try to look at it more in a while. But it definitely was needed to work with that horrible include-huge-c-file-from-another-one. Thanks, -- Jiri Kosina SUSE Labs
More E820 brokenness
As luck would have it, it's not just an obscure Geode system which has a broken E820 implementation. Today I received a bug report about a Dell system (XPS M1330) with broken E820.

Unfortunately, the workaround for the Geode breaks this system, because x86-64 doesn't fall back to the e801/88 information like the i386 kernel does.

I wonder if the relevant people could test out this patch to see how it works on their respective system. This patch reverts to 2.6.23-rc8 behaviour of simply truncating the map, but still makes e801/88 info available to the kernel; this hopefully should match 2.6.22 behaviour.

I want to emphasize that this is seriously broken. Using a partial e820 map could have disastrous results, since the kernel will have partial memory map information and not know about reserved areas, etc. Part of me feels that the right thing to do is what the current git kernel does -- either fall back to e801, or stop and error.

(Andi: I would particularly appreciate your opinion on this issue.)

	-hpa

diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
 			      "=m" (*desc)
 			      : "D" (desc), "a" (0xe820));
 
-		/* Some BIOSes stop returning SMAP in the middle of
-		   the search loop. We don't know exactly how the BIOS
-		   screwed up the map at that point, we might have a
-		   partial map, the full map, or complete garbage, so
-		   just return failure. */
-		if (id != SMAP) {
-			count = 0;
-			break;
-		}
-
-		if (err)
+		if (id != SMAP || err)
 			break;
 
 		count++;
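The behavioural difference in the diff can be modelled outside the boot code. In the sketch below, a hypothetical `struct fake_e820_reply` array stands in for successive E820 BIOS calls, and the end of the array stands in for the BIOS's end-of-map marker. With `discard_on_bad_sig` set, the loop behaves like the removed code and throws the whole map away when the `SMAP` signature disappears; without it, the loop behaves like the patched code and keeps the entries collected so far, which is exactly the partial map hpa warns about. `SMAP` is the ASCII "SMAP" signature value used by the protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMAP 0x534d4150u /* ASCII "SMAP", the E820 signature */

/* Hypothetical stand-in for the result of one BIOS E820 call. */
struct fake_e820_reply {
	uint32_t id; /* signature returned in EAX */
	bool err;    /* carry flag set */
};

/* Count usable entries before the loop stops. */
static int count_e820_entries(const struct fake_e820_reply *r, int n,
			      bool discard_on_bad_sig)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		if (r[i].id != SMAP) {
			if (discard_on_bad_sig)
				count = 0; /* old behaviour: no map at all */
			break;             /* new behaviour: keep partial map */
		}
		if (r[i].err)
			break;
		count++;
	}
	return count;
}
```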
[PATCH] clockevents: fix bogus next_event reset for oneshot broadcast devices
In periodic broadcast mode the next_event member of the broadcast device structure is set to KTIME_MAX in the interrupt handler. This is wrong, as we calculate the next periodic interrupt with this variable. Remove it.

Noticed by Ralf. MIPS is the first user of this mode, it does not affect existing users.

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Acked-and-tested-by: Ralf Baechle <[EMAIL PROTECTED]>

---
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 0962e05..acf15b4 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -176,8 +176,6 @@ static void tick_do_periodic_broadcast(void)
  */
 static void tick_handle_periodic_broadcast(struct clock_event_device *dev)
 {
-	dev->next_event.tv64 = KTIME_MAX;
-
 	tick_do_periodic_broadcast();
 
 	/*
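Why the removed line matters can be seen with a toy model of periodic re-arming. Assuming, as the changelog says, that the next periodic expiry is computed from `next_event`, overwriting `next_event` with `KTIME_MAX` in the handler loses the base time, so the device is effectively never re-armed. `struct toy_clock`, `rearm_periodic()`, `handle_periodic()` and the saturation logic are hypothetical; only the `KTIME_MAX` name is borrowed from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KTIME_MAX INT64_MAX /* "never" */

/* Hypothetical stand-in for a clock_event_device's bookkeeping. */
struct toy_clock {
	int64_t next_event; /* absolute expiry time in ns */
	int64_t period;     /* tick period in ns */
};

/* Advance next_event by one period, saturating at KTIME_MAX. */
static void rearm_periodic(struct toy_clock *dev)
{
	if (dev->next_event > KTIME_MAX - dev->period)
		dev->next_event = KTIME_MAX;
	else
		dev->next_event += dev->period;
}

/* Periodic broadcast handler; reset_first mimics the removed line. */
static void handle_periodic(struct toy_clock *dev, bool reset_first)
{
	if (reset_first)
		dev->next_event = KTIME_MAX; /* base time is lost here */
	/* ... broadcast the tick to the other CPUs here ... */
	rearm_periodic(dev);
}
```

Starting from `next_event = 1000000` with a 1 ms period, the fixed handler re-arms at 2000000 ns, while the buggy variant leaves `next_event` stuck at `KTIME_MAX`.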
Re: 2.6.23-rc8-mm1 -- powerpc link failure
On Thu, 27 Sep 2007 14:13:21 +0200 (CEST) Jiri Kosina <[EMAIL PROTECTED]> wrote: > i386 and x86_64: randomize brk() > > ... > > --- a/arch/x86_64/ia32/ia32_binfmt.c > +++ b/arch/x86_64/ia32/ia32_binfmt.c > @@ -262,6 +262,7 @@ static void elf32_init(struct pt_regs *); > #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1 > #define arch_setup_additional_pages syscall32_setup_pages > extern int syscall32_setup_pages(struct linux_binprm *, int exstack); > +extern void arch_randomize_brk(void); > > #include "../../../fs/binfmt_elf.c" Is this sinful extern-decl-in-C actually needed? > index b4fbe47..5a1adf9 100644 > --- a/include/asm-x86_64/elf.h > +++ b/include/asm-x86_64/elf.h > @@ -177,4 +177,6 @@ do if (vdso_enabled) { > \ > > #endif > > +extern void arch_randomize_brk(void); > + > #endif Because we already have a declaration in the correct place?
Re: [PATCH] kswapd should only wait on IO if there is IO
On Thu, 27 Sep 2007 14:47:02 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Thu, 27 Sep 2007 17:08:16 -0400 > Rik van Riel <[EMAIL PROTECTED]> wrote: > > > The current kswapd (and try_to_free_pages) code has an oddity where the > > code will wait on IO, even if there is no IO in flight. This problem is > > notable especially when the system scans through many unfreeable pages, > > causing unnecessary stalls in the VM. > > > > What effect did this change have? Kswapd was no longer sitting in "D" state as often and pages got freed more promptly. The test was done on a RHEL kernel with this change though - I guess I should redo it with a current upstream kernel. > > diff -up linux-2.6.22.x86_64/mm/vmscan.c.wait > > linux-2.6.22.x86_64/mm/vmscan.c > > --- linux-2.6.22.x86_64/mm/vmscan.c.wait2007-09-25 11:33:30.0 > > -0400 > > +++ linux-2.6.22.x86_64/mm/vmscan.c 2007-09-25 21:27:08.0 -0400 > > @@ -68,6 +68,8 @@ struct scan_control { > > int all_unreclaimable; > > > > int order; > > + > > + int nr_io_pages; > > }; > > > > #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru)) > > @@ -489,8 +491,10 @@ static unsigned long shrink_page_list(st > > */ > > if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs) > > wait_on_page_writeback(page); > > - else > > + else { > > + sc->nr_io_pages++; > > goto keep_locked; > > + } > > } > > > > referenced = page_referenced(page, 1); > > @@ -541,8 +545,10 @@ static unsigned long shrink_page_list(st > > case PAGE_ACTIVATE: > > goto activate_locked; > > case PAGE_SUCCESS: > > - if (PageWriteback(page) || PageDirty(page)) > > + if (PageWriteback(page) || PageDirty(page)) { > > + sc->nr_io_pages++; > > goto keep; > > + } > > /* > > * A synchronous write - probably a ramdisk. Go > > * ahead and try to reclaim the page. 
> > @@ -1201,6 +1207,7 @@ unsigned long try_to_free_pages(struct z > > > > for (priority = DEF_PRIORITY; priority >= 0; priority--) { > > sc.nr_scanned = 0; > > + sc.nr_io_pages = 0; > > if (!priority) > > disable_swap_token(); > > nr_reclaimed += shrink_zones(priority, zones, ); > > @@ -1229,7 +1236,8 @@ unsigned long try_to_free_pages(struct z > > } > > > > /* Take a nap, wait for some writeback to complete */ > > - if (sc.nr_scanned && priority < DEF_PRIORITY - 2) > > + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && > > + sc.nr_io_pages > sc.swap_cluster_max) > > The comparison with swap_cluster_max is unobvious, and merits a > comment. What is the thinking here? If the number of pages undergoing IO is really small, waiting for them may be a waste of time. Maybe my thinking is wrong, not sure... > Also, we now have this: > > if (total_scanned > sc.swap_cluster_max + > sc.swap_cluster_max / 2) { > wakeup_pdflush(laptop_mode ? 0 : total_scanned); > sc.may_writepage = 1; > } > > /* Take a nap, wait for some writeback to complete */ > if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && > sc.nr_io_pages > sc.swap_cluster_max) > congestion_wait(WRITE, HZ/10); > > > So in the case where total_scanned has not yet reached > swap_cluster_max, this process isn't initiating writeout and it isn't > sleeping, either. Nor is it incrementing nr_io_pages. Actually, nr_io_pages is also incremented when we run into pages that are already PageWriteback - even if we did not start the IO ourselves. > In the range (swap_cluster_max < nr_io_pages < 1.5*swap_cluster_max) this > process still isn't incrementing nr_io_pages, but it _is_ running > congestion_wait(). It is incrementing sc.nr_io_pages and will wait on IO to complete if the amount of pages in flight to disk that it scanned over is larger than the number of pages that it is trying to free. 
> Once nr_io_pages exceeds 1.5*swap_cluster_max, this process is both > initiating IO and is throttling on writeback completion events. > > This all seems a bit weird and arbitrary - what is the reason for > throttling-but-not-writing in that 1.0->1.5 window? Good question. Note that the throttling-but-not-writing window in the current code is 0.0->1.5, so this patch does reduce the throttling window compared to the
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Thu, Sep 27, 2007 at 05:00:22PM -0500, Linas Vepstas wrote: > On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote: > > I'm a little concerned by the mention of MMIO. It's entirely possible > > for the sym2 driver to be using ioports to access the card rather than > > MMIO. Is it simply that it can't on the platform you test on? > > The comment is misleading. I've been in the bad habit of calling > it "mmio" whenever it's not DMA. OK, cool, thanks. I'll update the comment for you. One last thing (sorry, I only just noticed): In the error handler, we wait_for_completion(io_reset_wait). In sym2_io_error_detected, we init_completion(io_reset_wait). Isn't it possible that we hit the error handler before we hit the io_error_detected path, and thus the completion wait is lost? Since the completion is already initialised in sym_attach(), I don't think we need to initialise it in sym2_io_error_detected(). Makes sense to just delete it? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step."
Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations
On Thu, 2007-09-27 at 14:51 -0700, Andrew Morton wrote:

> So your stuff becomes dependent on Nick's stuff, and Nick's stuff is still
> failing on NFS, I think.

It worked today; it turned out to be a UML bug. Real hardware seemed to work properly, but I will test a bit more tomorrow.
Re: Why do so many machines need "noapic"?
Dave Jones wrote:
> If memory serves correctly, that was circa 2.6.10, back in these commits..
>
> commit a068ea13d1db406e15c346e93530343f6e70184c
> Author: Len Brown <[EMAIL PROTECTED]>
> Date:   Sun Oct 10 05:21:08 2004 -0400
>
>     [ACPI] If BIOS disabled the LAPIC, believe it by default.
>
>     "lapic" is available to force enabling the LAPIC
>     in the event you know more than your BIOS vendor.
>
>     http://bugzilla.kernel.org/show_bug.cgi?id=3238
>
> commit 2fcfece90db9643b6f30a7ad343898a2871e6a81
> Author: Len Brown <[EMAIL PROTECTED]>
> Date:   Sat Oct 9 20:12:45 2004 -0400
>
>     [ACPI] Don't enable LAPIC when the BIOS disabled it.
>
>     Doing so apparently breaks every Dell on Earth.
>
>     http://bugzilla.kernel.org/show_bug.cgi?id=3238
>
> But those changes relate to the local APIC, which 'noapic' shouldn't
> have any effect on, should it?

If the LAPIC is disabled, then you CAN'T use the IO-APIC, right? So wouldn't the noapic option have no effect, since the APIC is already disabled?
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote:
> On Fri, Apr 20, 2007 at 03:47:20PM -0500, Linas Vepstas wrote:
> > Implement the so-called "first failure data capture" (FFDC) for the
> > symbios PCI error recovery. After a PCI error event is reported,
> > the driver requests that MMIO be enabled. Once enabled, it
> > then reads and dumps assorted status registers, and concludes
> > by requesting the usual reset sequence.
>
> > +	/* Request that MMIO be enabled, so register dump can be taken. */
> > +	return PCI_ERS_RESULT_CAN_RECOVER;
> > +}
>
> I'm a little concerned by the mention of MMIO.  It's entirely possible
> for the sym2 driver to be using ioports to access the card rather than
> MMIO.  Is it simply that it can't on the platform you test on?

The comment is misleading. I've been in the bad habit of calling it "mmio" whenever it's not DMA.

The habit is because there are two distinct enable bits in the PCI host bridge during error recovery: one to enable mmio/ioports, and the other to enable DMA. If the adapter has gone crazy, I don't want to enable DMA, so that it doesn't scribble to bad places. But, by enabling mmio/ioports, perhaps it can be finessed back into a semi-sane state, e.g. sane enough to perform a dump of its internal state.

--linas
Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations
On Thu, 27 Sep 2007 14:51:25 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Plus, reiserfs seems to compile with that patch I just sent.  Sure as
> > heck surprised me.
> >
>
> That'll be because reiserfs-convert-to-new-aops.patch switched reiserfs over
> to ->write_begin() and ->write_end().

Actually, we should rename reiserfs_prepare_write and reiserfs_commit_write to something else to reduce confusion.

Probably lots of other filesystems would benefit from the same change, post-Nick's-stuff.
2.6.23-rc8-mm2 NULL dereference in __mnt_is_readonly in ftruncate
Kernel 2.6.23-rc8-mm2 on an AMD64; the filesystems mounted are reiserfs, reiser4 and tmpfs. netconsole dmesg output and .config are included below.

Near the end of my boot sequence, there is a kernel error. I am not sure exactly what user-space is doing to make this happen, but I know that a simple shell and some filesystem operations do not cause it.

This error also occurred in 2.6.23-rc8-mm1, but I didn't have time to post it and hoped it would just go away. I never tested 2.6.23-rc7-mm*, and the error did not happen in rc6-mm1.

console [netcon0] enabled
netconsole: network logging started
eth0: no IPv6 routers present
Unable to handle kernel NULL pointer dereference at 0053 RIP:
 [] __mnt_is_readonly+0x0/0x20
PGD 0
Oops: [1] SMP
last sysfs file: /block/sr0/size
CPU 0
Modules linked in: netconsole configfs sg ipv6 evdev usbhid hid usb_storage libusual psmouse serio_raw ssb video output ehci_hcd ohci_hcd usbcore snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd snd_page_alloc
Pid: 7291, comm: smbd Not tainted 2.6.23-rc8-mm2 #1
RIP: 0010:[] [] __mnt_is_readonly+0x0/0x20
RSP: 0018:8100068b1b60 EFLAGS: 00010296
RAX: 810007108000 RBX: 81000261d8c0 RCX: 8093aca0
RDX: 0004 RSI: 8092e950 RDI: 0003
RBP: 0003 R08: 0003 R09: 8061f7cd
R10: b256aacb R11: R12: ffe2
R13: 8100068b1bd8 R14: 8100068b1ee8 R15: 81000655a910
FS: 7f6f0930c6f0() GS:806ce000() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 0053 CR3: 07cb2000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process smbd (pid: 7291, threadinfo 8100068b, task 810007108000)
last branch before last exception/interrupt
 from [] mnt_want_write+0x3a/0x90
 to [] __mnt_is_readonly+0x0/0x20
Stack: 802cc37f 8100078cd9a0 8100068b1bd8 8100078cd9a0 802c82bc 8100078cd780 8100078cd9a0 8100068b1bd8 8100068b1ee8 3000
Call Trace:
 [] mnt_want_write+0x3f/0x90
 [] file_update_time+0x2c/0xe0
 [] truncate_file_body+0x148/0x3f0
 [] __lock_acquire+0x583/0x1180
 [] _spin_unlock+0x17/0x20
 [] store_black_box+0x82/0x90
 [] safe_link_add+0x75/0xd0
 [] setattr_unix_file+0x207/0x220
 [] _spin_unlock_irq+0x24/0x30
 [] __down_write_nested+0xa1/0xc0
 [] notify_change+0xf7/0x2c0
 [] do_truncate+0x5e/0x80
 [] sys_ftruncate+0x119/0x130
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.

Code: f6 47 50 40 b8 01 00 00 00 75 0a 48 8b 47 28 8b 40 58 83 e0
RIP [] __mnt_is_readonly+0x0/0x20
 RSP
CR2: 0053
BUG: spinlock lockup on CPU#0, syslogd/5128, 81000261d8c0

Call Trace:
 [] _raw_spin_lock+0x134/0x140
 [] mnt_want_write+0x37/0x90
 [] mnt_want_write+0x37/0x90
 [] file_update_time+0x2c/0xe0
 [] write_unix_file+0x275/0x530
 [] write_unix_file+0x0/0x530
 [] do_loop_readv_writev+0x45/0x70
 [] do_readv_writev+0x20a/0x220
 [] sys_writev+0x53/0xc0
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.
BUG: soft lockup - CPU#0 stuck for 11s! [syslogd:5128]
CPU 0:
Modules linked in: netconsole configfs sg ipv6 evdev usbhid hid usb_storage libusual psmouse serio_raw ssb video output ehci_hcd ohci_hcd usbcore snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd snd_page_alloc
Pid: 5128, comm: syslogd Tainted: G D 2.6.23-rc8-mm2 #1
RIP: 0010:[] [] _raw_spin_lock+0xb8/0x140
RSP: 0018:810006067d18 EFLAGS: 0246
RAX: RBX: 080c83f0 RCX: 75413ee4
RDX: 0027 RSI: 002182fd RDI: 0001
RBP: R08: 0001 R09: 0001
R10: 8023f8bb R11: 62da686f R12: 0001
R13: 8080ef80 R14: 810006066000 R15: 810081e11000
FS: 7fe85cf206f0() GS:806ce000() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0053 CR3: 064a8000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400

Call Trace:
 [] _raw_spin_lock+0xc9/0x140
 [] mnt_want_write+0x37/0x90
 [] mnt_want_write+0x37/0x90
 [] file_update_time+0x2c/0xe0
 [] write_unix_file+0x275/0x530
 [] write_unix_file+0x0/0x530
 [] do_loop_readv_writev+0x45/0x70
 [] do_readv_writev+0x20a/0x220
 [] sys_writev+0x53/0xc0
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.
SysRq : HELP : loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
SysRq : Resetting

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc8-mm2
# Thu Sep 27 14:06:06 2007
#
CONFIG_X86_64=y
Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations
On Thu, 27 Sep 2007 14:27:14 -0700
Dave Hansen <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-09-27 at 22:04 +0100, Christoph Hellwig wrote:
> > On Thu, Sep 27, 2007 at 01:53:39PM -0700, Dave Hansen wrote:
> > > -int reiserfs_commit_write(struct file *f, struct page *page,
> > > -			  unsigned from, unsigned to);
> > > -int reiserfs_prepare_write(struct file *f, struct page *page,
> > > -			   unsigned from, unsigned to);
> > > +int reiserfs_commit_write(struct page *page, unsigned from, unsigned to);
> > > +int reiserfs_prepare_write(struct page *page, unsigned from, unsigned to);
> >
> > I doubt this will work.  These are also used for the ->prepare_write
> > and ->commit_write aops, and the method signature definitively wants
> > a file there, even if it's zero..
>
> Oddly enough, I don't see those functions being used in aops:
>
> const struct address_space_operations reiserfs_address_space_operations = {
> 	.writepage = reiserfs_writepage,
> 	.readpage = reiserfs_readpage,
> 	.readpages = reiserfs_readpages,
> 	.releasepage = reiserfs_releasepage,
> 	.invalidatepage = reiserfs_invalidatepage,
> 	.sync_page = block_sync_page,
> 	.write_begin = reiserfs_write_begin,
> 	.write_end = reiserfs_write_end,
> 	.bmap = reiserfs_aop_bmap,
> 	.direct_IO = reiserfs_direct_IO,
> 	.set_page_dirty = reiserfs_set_page_dirty,
> };
>
> Plus, reiserfs seems to compile with that patch I just sent.  Sure as
> heck surprised me.
>

That'll be because reiserfs-convert-to-new-aops.patch switched reiserfs over to ->write_begin() and ->write_end().

So your stuff becomes dependent on Nick's stuff, and Nick's stuff is still failing on NFS, I think.
Re: [PATCH] kswapd should only wait on IO if there is IO
On Thu, 27 Sep 2007 17:08:16 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:

> The current kswapd (and try_to_free_pages) code has an oddity where the
> code will wait on IO, even if there is no IO in flight.  This problem is
> notable especially when the system scans through many unfreeable pages,
> causing unnecessary stalls in the VM.
>

What effect did this change have?

>
> diff -up linux-2.6.22.x86_64/mm/vmscan.c.wait linux-2.6.22.x86_64/mm/vmscan.c
> --- linux-2.6.22.x86_64/mm/vmscan.c.wait	2007-09-25 11:33:30.0 -0400
> +++ linux-2.6.22.x86_64/mm/vmscan.c	2007-09-25 21:27:08.0 -0400
> @@ -68,6 +68,8 @@ struct scan_control {
>  	int all_unreclaimable;
>
>  	int order;
> +
> +	int nr_io_pages;
>  };
>
>  #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
> @@ -489,8 +491,10 @@ static unsigned long shrink_page_list(st
>  			 */
>  			if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
>  				wait_on_page_writeback(page);
> -			else
> +			else {
> +				sc->nr_io_pages++;
>  				goto keep_locked;
> +			}
>  		}
>
>  		referenced = page_referenced(page, 1);
> @@ -541,8 +545,10 @@ static unsigned long shrink_page_list(st
>  			case PAGE_ACTIVATE:
>  				goto activate_locked;
>  			case PAGE_SUCCESS:
> -				if (PageWriteback(page) || PageDirty(page))
> +				if (PageWriteback(page) || PageDirty(page)) {
> +					sc->nr_io_pages++;
>  					goto keep;
> +				}
>  				/*
>  				 * A synchronous write - probably a ramdisk.  Go
>  				 * ahead and try to reclaim the page.
> @@ -1201,6 +1207,7 @@ unsigned long try_to_free_pages(struct z
>
>  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
>  		sc.nr_scanned = 0;
> +		sc.nr_io_pages = 0;
>  		if (!priority)
>  			disable_swap_token();
>  		nr_reclaimed += shrink_zones(priority, zones, );
> @@ -1229,7 +1236,8 @@ unsigned long try_to_free_pages(struct z
>  		}
>
>  		/* Take a nap, wait for some writeback to complete */
> -		if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
> +		if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
> +				sc.nr_io_pages > sc.swap_cluster_max)

The comparison with swap_cluster_max is unobvious, and merits a comment.  What is the thinking here?

Also, we now have this:

		if (total_scanned > sc.swap_cluster_max +
					sc.swap_cluster_max / 2) {
			wakeup_pdflush(laptop_mode ? 0 : total_scanned);
			sc.may_writepage = 1;
		}

		/* Take a nap, wait for some writeback to complete */
		if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
				sc.nr_io_pages > sc.swap_cluster_max)
			congestion_wait(WRITE, HZ/10);


So in the case where total_scanned has not yet reached swap_cluster_max, this process isn't initiating writeout and it isn't sleeping, either.  Nor is it incrementing nr_io_pages.

In the range (swap_cluster_max < nr_io_pages < 1.5*swap_cluster_max) this process still isn't incrementing nr_io_pages, but it _is_ running congestion_wait().

Once nr_io_pages exceeds 1.5*swap_cluster_max, this process is both initiating IO and is throttling on writeback completion events.

This all seems a bit weird and arbitrary - what is the reason for throttling-but-not-writing in that 1.0->1.5 window?

If there _is_ a reason and it's all been carefully thought out and designed, then can we please capture a description of that design in the changelog or in the code?

Also, I wonder about what this change will do to the dynamic behaviour of GFP_NOFS direct-reclaimers.  Previously they would throttle if they encountered dirty pages which they couldn't write out.  Hopefully someone else (kswapd or a __GFP_FS direct-reclaimer) will write some of those pages and this caller will be woken when that writeout completes and will go off and scoop them off the tail of the LRU.

But after this change, such a GFP_NOFS caller will, AFAICT, burn its way through potentially the entire inactive list and will then declare oom.  Non-preemptible uniprocessor kernels would be most at risk from this.

>  			congestion_wait(WRITE, HZ/10);
>  	}
>  	/* top priority shrink_caches still had more to do? don't OOM, then */
> @@ -1315,6 +1323,7 @@ loop_again:
>  		if (!priority)
>
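The interaction of the two thresholds discussed above can be sketched as a pure C function of the counters involved. The conditions are transcribed from the quoted code; DEF_PRIORITY is assumed to be 12, its value in kernels of this era; struct reclaim_step and classify() are hypothetical names; and the sketch ignores everything else try_to_free_pages() does.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12	/* assumed: value used by kernels of this era */

/* What one pass of the priority loop would do (hypothetical type). */
struct reclaim_step {
	bool wakes_pdflush;	/* wakeup_pdflush() + may_writepage = 1 */
	bool naps;		/* congestion_wait(), with the patch */
	bool naps_before_patch;	/* congestion_wait(), pre-patch condition */
};

static struct reclaim_step classify(unsigned long total_scanned,
				    unsigned long nr_scanned,
				    unsigned long nr_io_pages,
				    int priority,
				    unsigned long swap_cluster_max)
{
	struct reclaim_step s;

	assert(priority >= 0 && priority <= DEF_PRIORITY);

	/* Writeout starts only past 1.5 * swap_cluster_max scanned. */
	s.wakes_pdflush = total_scanned >
			  swap_cluster_max + swap_cluster_max / 2;

	/* Old rule: nap whenever we scanned something at low priority. */
	s.naps_before_patch = nr_scanned && priority < DEF_PRIORITY - 2;

	/* New rule: additionally require enough pages under IO. */
	s.naps = s.naps_before_patch && nr_io_pages > swap_cluster_max;
	return s;
}
```

For example, with swap_cluster_max = 32 at priority 5, classify(40, 40, 40, 5, 32) naps without waking pdflush, which is the "throttling-but-not-writing" window Andrew asks about, while classify(100, 100, 10, 5, 32) would have napped before the patch but no longer does.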
Re: Problems with SMP & ACPI powering off
On Thursday, 27 September 2007 23:29, Mark Lord wrote:
> Question:  do we disable all CPUs except 0 when doing ACPI power off?

No, but we should.

> Background:
> I have a machine here dedicated to running MythTV.
> It powers up to record, and then sets the RTC alarm for next time
> and powers down again in between recordings.
>
> It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard.
> Previously it was on a completely different (vendor,bios,...) ICH7 motherboard.
>
> In both cases, "halt -p" sometimes fails to actually turn off the power,
> which means that it later then fails to "turn on" to record again.
>
> Annoying.
>
> This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support enabled.
>
> So I'm wondering if it may be due to the old SMP-poweroff bogeyman?

Maybe.  Which kernel?

> For now, I've hardcoded a cpu_down(1) into the poweroff code,
> and we'll see if that helps or is merely redundant.
>
> But I do wonder where else to look for a cause?
>
> Two different boards, vendors, BIOSs, same CPU chip.  Same problem.

Same chipset, perchance?

Greetings,
Rafael
Re: [patch 02/10] Dont touch fs_struct in usermodehelper
On Thu, Sep 27, 2007 at 10:39:22PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 27, 2007 at 10:46:04AM -0700, Greg KH wrote:
> > On Thu, Sep 27, 2007 at 04:12:02PM +0200, [EMAIL PROTECTED] wrote:
> > > This test seems to be unnecessary since we always have rootfs mounted
> > > before calling a usermodehelper.
> >
> > Are you sure this is true?  I thought we called the usermode helper for
> > hotplug _very_ early in the boot sequence when the device tree starts to
> > get populated.
>
> rootfs is mounted by init_mount_tree, and current->fs is set up for init
> there as well.  This is called by mnt_init, which is called by
> vfs_caches_init, which is called by start_kernel far before we go to
> rest_init which finally creates a thread to call kernel_init which then
> calls do_basic_setup which calls do_initcalls to initialize drivers and
> afterwards runs the initrd/initramfs.
>
> While the actual function names in main.c changed quite a bit we've
> initialized the initial namespace very early on since the 2.5 days.

Ah, ok, great, thanks for correcting me.  I have no objection to this patch then.

thanks,

greg k-h
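The boot ordering Christoph describes can be modelled with stub C functions. The function names follow his description, but every body is a hypothetical stub that only records ordering: in the real kernel rest_init() starts kernel_init() in a new kernel thread rather than calling it directly, and call_usermodehelper_stub() is an invented stand-in for a hotplug helper launched from a driver initcall.

```c
#include <assert.h>
#include <stdbool.h>

static bool rootfs_mounted;	/* set by init_mount_tree() */
static bool helper_saw_rootfs;	/* what the helper observed */

/* Hypothetical stand-in for a hotplug usermode helper started from a
 * driver initcall; it records whether a root fs existed at that time. */
static void call_usermodehelper_stub(void)
{
	helper_saw_rootfs = rootfs_mounted;
}

/* Stubs named after the functions in the call chain above. */
static void init_mount_tree(void) { rootfs_mounted = true; }
static void mnt_init(void)        { init_mount_tree(); }
static void vfs_caches_init(void) { mnt_init(); }
static void do_initcalls(void)    { call_usermodehelper_stub(); }
static void do_basic_setup(void)  { do_initcalls(); }
static void kernel_init(void)     { do_basic_setup(); }
/* The real rest_init() runs kernel_init() in a new kernel thread. */
static void rest_init(void)       { kernel_init(); }

static void start_kernel(void)
{
	vfs_caches_init();	/* rootfs is mounted here, early */
	assert(rootfs_mounted);
	rest_init();		/* drivers, and their helpers, come later */
}
```

After start_kernel() returns, helper_saw_rootfs is true: any helper reached through do_initcalls() runs after init_mount_tree(), which is why the check in the patch is redundant.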
Re: [PATCH] fs: Correct SuS compliance for open of large file without options
On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote:
> On Thu, Sep 27, 2007 at 10:59:17AM -0700, Greg KH wrote:
> > Come on now, I'm _very_ tired of this kind of discussion.  Please go
> > read the documentation on how to _use_ sysfs from userspace in such a
> > way that you can properly access these data structures so that no
> > breakage occurs.
>
> I've read it; the question is whether every single application
> programmer or system shell script programmer who writes code my system
> depends upon has read this document buried in the kernel sources,
> or whether things will break spectacularly --- one of those things
> that leaves me in suspense each time I update the kernel.

OK, how then should I advertise this better?  What can we do better to help userspace programmers out in this regard?

> I'm reminded of Rusty's 2003 OLS Keynote, where he points out that
> what's important is not making an interface easy to use, but _hard_
> _to_ _misuse_.

Pat Mochel and I sat in that talk and instantly had an "oh shit" moment when it came to the in-kernel usage of sysfs and the driver model.  Ever since then, I have been working to change the code to make it better.  With the exception of the recent help from Kay, I am the only one doing this, as Pat has been gone for a few years and isn't coming back.

> The fact that sysfs is all laid out in a directory, but for which
> some directories/symlinks are OK to use, and some are NOT OK to use
> --- is why I call the sysfs interface "an open pit".

We (well, Kay mostly) have also been working on fixing this all up to make it much harder to use sysfs incorrectly.  We will have a single device tree (well, almost a single tree, it's getting there), so that all of the information is only in one place, and you don't have to go searching all over the place for it.  That is a direct improvement over the old design, where some things were in one place and others in another.

And because of the original design mistakes, we have only been able to change things for the better in a slow manner.  We have had userspace programs fixed up for _years_ before we are able to make the corresponding changes in the kernel, so as to not break the distros that are slow to upgrade packages and kernels (like Debian).

If I had my druthers, we could instantly put some patches into the tree to fix up the sysfs "mess" once and for all, creating a unified, single tree, with only a handful of needed symlinks to be able to categorize certain things.  We have the patches (Kay wrote them over a year ago), and userspace programs work just fine with them (udev and HAL), but because we need to support 5-year-old userspace programs running tomorrow's kernel, we must take very tiny, slow steps to get there.

And yes, sysfs has slowly changed over the years, and along the way we have kept things working, with only very minor problems.  You have no idea the crazy mismatch of kernels and userspace programs we have had to deal with.  And it will continue to change, slowly, until we reach the unified-tree goal, and all of those old crufty userspace programs are dead and buried (I got a bug report about RHEL's udev version 039 just yesterday).

So you can't have it both ways.  You can't complain that sysfs isn't stable, and isn't "properly userspace friendly" at the same time.  In order to fix the issues, we have to change it, and do it slowly, because I don't want to break some distros that can't keep up with the others.

thanks,

greg k-h
[PATCH] spin_lock_unlocked cleanups
Replace some SPIN_LOCK_UNLOCKED with DEFINE_SPINLOCK

Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
---
diff --git a/arch/mips/pci/ops-pmcmsp.c b/arch/mips/pci/ops-pmcmsp.c
index 09fa007..059eade 100644
--- a/arch/mips/pci/ops-pmcmsp.c
+++ b/arch/mips/pci/ops-pmcmsp.c
@@ -206,7 +206,7 @@ static void pci_proc_init(void)
 }
 #endif /* CONFIG_PROC_FS && PCI_COUNTERS */
 
-spinlock_t bpci_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(bpci_lock);
 
 /*
  *
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index d5fd390..cd2766e 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -34,7 +34,7 @@
 #include
 #include
 
-static spinlock_t slice_convert_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(slice_convert_lock);
 
 
 #ifdef DEBUG
diff --git a/drivers/char/watchdog/bfin_wdt.c b/drivers/char/watchdog/bfin_wdt.c
index 309d279..31dc7a6 100644
--- a/drivers/char/watchdog/bfin_wdt.c
+++ b/drivers/char/watchdog/bfin_wdt.c
@@ -71,7 +71,7 @@ static int nowayout = WATCHDOG_NOWAYOUT;
 static struct watchdog_info bfin_wdt_info;
 static unsigned long open_check;
 static char expect_close;
-static spinlock_t bfin_wdt_spinlock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bfin_wdt_spinlock);
 
 /**
  * bfin_wdt_keepalive - Keep the Userspace Watchdog Alive
diff --git a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c
index 98fd985..36c747b 100644
--- a/drivers/ieee1394/ieee1394_core.c
+++ b/drivers/ieee1394/ieee1394_core.c
@@ -488,7 +488,7 @@ void hpsb_selfid_complete(struct hpsb_host *host, int phyid, int isroot)
 	highlevel_host_reset(host);
 }
 
-static spinlock_t pending_packets_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(pending_packets_lock);
 
 /**
  * hpsb_packet_sent - notify core of sending a packet
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 83e76b3..94fd78f 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -15,9 +15,9 @@
 #include "sysfs.h"
 
 DEFINE_MUTEX(sysfs_mutex);
-spinlock_t sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(sysfs_assoc_lock);
 
-static spinlock_t sysfs_ino_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(sysfs_ino_lock);
 static DEFINE_IDA(sysfs_ino_ida);
 
 /**
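As background: in kernels of this era, include/linux/spinlock_types.h defines DEFINE_SPINLOCK(x) as `spinlock_t x = __SPIN_LOCK_UNLOCKED(x)`, so each lock's lockdep map is named after its variable, whereas SPIN_LOCK_UNLOCKED passes one fixed name (old_style_spin_init) and is documented there as defeating lockdep state tracking. The userspace C sketch below mimics only that naming aspect with a pthread mutex; struct named_lock, NAMED_LOCK_UNLOCKED, DEFINE_NAMED_LOCK and same_name() are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Userspace analogue, not kernel code: a lock that carries a name,
 * much as a lockdep-enabled spinlock carries one in its dep_map. */
struct named_lock {
	pthread_mutex_t mutex;
	const char *name;
};

/* Old style: one shared initializer, so every lock gets the same name. */
#define NAMED_LOCK_UNLOCKED { PTHREAD_MUTEX_INITIALIZER, "old_style_init" }

/* New style: the macro sees the variable and stringifies its name. */
#define DEFINE_NAMED_LOCK(x) \
	struct named_lock x = { PTHREAD_MUTEX_INITIALIZER, #x }

static struct named_lock a_lock = NAMED_LOCK_UNLOCKED;
static struct named_lock b_lock = NAMED_LOCK_UNLOCKED;
static DEFINE_NAMED_LOCK(c_lock);
static DEFINE_NAMED_LOCK(d_lock);

/* A name-based checker cannot tell a_lock from b_lock. */
static int same_name(const struct named_lock *p, const struct named_lock *q)
{
	assert(p->name != NULL && q->name != NULL);
	return strcmp(p->name, q->name) == 0;
}
```

Here same_name(&a_lock, &b_lock) is true while c_lock and d_lock carry their own names, which is the property the conversion to DEFINE_SPINLOCK buys.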
Re: NO_HZ hangs up AMD MK-36
2007/9/28, Thomas Gleixner <[EMAIL PROTECTED]>:
> On Fri, 2007-09-28 at 00:01 +0300, Dmitry Tyschenko wrote:
> > Sorry, I am a newbie in Linux. I hope you were talking about:
> > /boot/vmlinuz-2.6.22-1-k7 root=/dev/sda5 ro nohz=off
>
> Yes.
>
> > But it doesn't help for Debian's 2.6.22-1 (I don't have another
> > prebuilt one); still the same problems.
>
> Can you please add: nolapic_timer instead ?
>
> Thanks,
>
> 	tglx

It works with nolapic_timer! Thank you!
Re: Problems with SMP & ACPI powering off
Mark Lord wrote:
> Question:  do we disable all CPUs except 0 when doing ACPI power off?
>
> Background:
> I have a machine here dedicated to running MythTV.
> It powers up to record, and then sets the RTC alarm for next time
> and powers down again in between recordings.
>
> It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard.
> Previously it was on a completely different (vendor,bios,...) ICH7 motherboard.
>
> In both cases, "halt -p" sometimes fails to actually turn off the power,
> which means that it later then fails to "turn on" to record again.
>
> Annoying.
>
> This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support enabled.
>
> So I'm wondering if it may be due to the old SMP-poweroff bogeyman?
>
> For now, I've hardcoded a cpu_down(1) into the poweroff code,
> and we'll see if that helps or is merely redundant.
>
> But I do wonder where else to look for a cause?
>
> Two different boards, vendors, BIOSs, same CPU chip.  Same problem.

Oh, and two different power-supplies, too.

-ml
Problems with SMP & ACPI powering off
Question:  do we disable all CPUs except 0 when doing ACPI power off?

Background:
I have a machine here dedicated to running MythTV.
It powers up to record, and then sets the RTC alarm for next time
and powers down again in between recordings.

It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard.
Previously it was on a completely different (vendor,bios,...) ICH7 motherboard.

In both cases, "halt -p" sometimes fails to actually turn off the power,
which means that it later then fails to "turn on" to record again.

Annoying.

This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support enabled.

So I'm wondering if it may be due to the old SMP-poweroff bogeyman?

For now, I've hardcoded a cpu_down(1) into the poweroff code,
and we'll see if that helps or is merely redundant.

But I do wonder where else to look for a cause?

Two different boards, vendors, BIOSs, same CPU chip.  Same problem.
Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations
On Thu, 2007-09-27 at 22:04 +0100, Christoph Hellwig wrote:
> On Thu, Sep 27, 2007 at 01:53:39PM -0700, Dave Hansen wrote:
> > -int reiserfs_commit_write(struct file *f, struct page *page,
> > -			  unsigned from, unsigned to);
> > -int reiserfs_prepare_write(struct file *f, struct page *page,
> > -			   unsigned from, unsigned to);
> > +int reiserfs_commit_write(struct page *page, unsigned from, unsigned to);
> > +int reiserfs_prepare_write(struct page *page, unsigned from, unsigned to);
>
> I doubt this will work.  These are also used for the ->prepare_write
> and ->commit_write aops, and the method signature definitively wants
> a file there, even if it's zero..

Oddly enough, I don't see those functions being used in aops:

const struct address_space_operations reiserfs_address_space_operations = {
	.writepage = reiserfs_writepage,
	.readpage = reiserfs_readpage,
	.readpages = reiserfs_readpages,
	.releasepage = reiserfs_releasepage,
	.invalidatepage = reiserfs_invalidatepage,
	.sync_page = block_sync_page,
	.write_begin = reiserfs_write_begin,
	.write_end = reiserfs_write_end,
	.bmap = reiserfs_aop_bmap,
	.direct_IO = reiserfs_direct_IO,
	.set_page_dirty = reiserfs_set_page_dirty,
};

Plus, reiserfs seems to compile with that patch I just sent.  Sure as heck surprised me.

-- Dave
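Why the changed prototypes still compile can be illustrated with a cut-down operations table in C. struct demo_aops and the demo_* functions are hypothetical and their signatures are simplified (the real ->write_begin takes different arguments). The point is only that the compiler checks a function against a member's type when the function is stored in the table, so a helper that is no longer stored there can drop its file argument freely, and members not named in a designated initializer are null.

```c
#include <assert.h>
#include <stddef.h>

struct demo_file;	/* opaque, like struct file */
struct demo_page;	/* opaque, like struct page */

/* Hypothetical cut-down operations table with simplified signatures. */
struct demo_aops {
	int (*prepare_write)(struct demo_file *, struct demo_page *,
			     unsigned, unsigned);
	int (*commit_write)(struct demo_file *, struct demo_page *,
			    unsigned, unsigned);
	int (*write_begin)(struct demo_file *, struct demo_page *,
			   unsigned, unsigned);
};

/* Internal helper that is no longer stored in the table: its file
 * argument can be dropped without any type mismatch. */
static int demo_prepare_write(struct demo_page *page, unsigned from,
			      unsigned to)
{
	(void)page;
	assert(from <= to);
	return (int)(to - from);	/* pretend: bytes prepared */
}

/* Stored in the table, so it must match the member's type exactly. */
static int demo_write_begin(struct demo_file *file, struct demo_page *page,
			    unsigned from, unsigned to)
{
	(void)file;
	return demo_prepare_write(page, from, to);
}

/* Members not named in a designated initializer are null pointers. */
static const struct demo_aops demo_ops = {
	.write_begin = demo_write_begin,
};
```

Storing demo_prepare_write itself in .prepare_write would now draw an incompatible-pointer-type diagnostic, which is the breakage Christoph expected; with the table converted to write_begin, nothing references it that way.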
Re: [PATCH] bw-qcam: use data_reverse instead of manually poking the control register
On Thu, 27 Sep 2007 12:28:31 -0700
"Brett Warden" <[EMAIL PROTECTED]> wrote:

> Fixes use of parport_write_control() to match the newer interface that
> requires explicit parport_data_reverse() and parport_data_forward()
> calls. This eliminates the following error message and restores the
> original intended behavior:

Looks good

> parport0 (bw-qcam): use data_reverse for this!
>
> Also increases threshold in qc_detect() from 300 to 400, as my camera
> often results in a count of approx 330. Added a kernel error message
> to indicate detection failure.

Likewise

> Signed-off-by: Brett T. Warden <[EMAIL PROTECTED]>

Acked-by: Alan Cox <[EMAIL PROTECTED]>
Re: kernel Oops in ext3 code
Hi,

Could you please send the objdump of the ext4_discard_reservation function? It doesn't match what I see here.

Thanks,
Mingming

On Thu, 2007-09-27 at 12:31 +0200, [EMAIL PROTECTED] wrote:
> Hi all!
>
> (Please Cc)
>
> kernel 2.6.23-rc6
> Debian/sid
>
> kernel ooops:
>
> BUG: unable to handle kernel paging request at virtual address 104b
>  printing eip:
> c0195bd3
> *pde =
> Oops: [#1]
> PREEMPT SMP
> Modules linked in: vboxdrv binfmt_misc fuse coretemp hwmon gspca videodev v4l2_common v4l1_compat iwl3945 mac80211 tifm_7xx1 tifm_core joydev irda crc_ccitt 8250_pnp 8250 serial_core firewire_ohci firewire_core crc_itu_t
> CPU: 0
> EIP: 0060:[] Not tainted VLI
> EFLAGS: 00010206 (2.6.23-rc6 #1)
> EIP is at ext3_discard_reservation+0x18/0x4d
> eax: dff23800 ebx: 1033 ecx: dfc15ec0 edx:
> esi: c0007c44 edi: 1033 ebp: dfc2bef4 esp: dfc2beac
> ds: 007b es: 007b fs: 00d8 gs: ss: 0068
> Process kswapd0 (pid: 261, ti=dfc2a000 task=dfcac570 task.ti=dfc2a000)
> Stack: c0007ba4 c0007c44 1033 c019ec51 c0007c44 c0007d8c 002c c0171b1b
>        002c c0007c44 c0007c4c c0171da2 c050880c 0080 0080
>        c0171fb8 0080 c0007e48 df9e3910 7404 c03f5634 0080 00d0
> Call Trace:
>  [] ext3_clear_inode+0x5d/0x76
>  [] clear_inode+0x6b/0xb9
>  [] dispose_list+0x48/0xc9
>  [] shrink_icache_memory+0x195/0x1bd
>  [] shrink_slab+0xe2/0x159
>  [] kswapd+0x2d3/0x431
>  [] autoremove_wake_function+0x0/0x33
>  [] kswapd+0x0/0x431
>  [] kthread+0x38/0x5d
>  [] kthread+0x0/0x5d
>  [] kernel_thread_helper+0x7/0x10
>  ===
> Code: 83 f8 01 19 c0 f7 d0 83 e0 08 89 42 0c 89 56 b4 5b 5e c3 57 56 89 c6 53 8b 58 b4 8b 80 a4 00 00 00 85 db 8b 80 78 01 00 00 74 30 <83> 7b 18 00 74 2a 8d b8 00 03 00 00 89 f8 e8 b8 ca 1a 00 83 7b
> EIP: [] ext3_discard_reservation+0x18/0x4d SS:ESP 0068:dfc2beac
>
>
> Sysrq did work, so the oops was saved. Good.
>
> Any ideas?
>
> Best wishes
>
> Norbert
>
> ---
> Dr. Norbert Preining <[EMAIL PROTECTED]>    Vienna University of Technology
> Debian Developer <[EMAIL PROTECTED]>    Debian TeX Group
> gpg DSA: 0x09C5B094    fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
> ---
> As he came into the light they could see his black and
> gold uniform on which the buttons were so highly polished
> that they shone with an intensity that would have made an
> approaching motorist flash his lights in annoyance.
>                --- Douglas Adams, The Hitchhikers Guide to the Galaxy
[RFC] New kernel-message logging API (take 2)
Hello,

A big thanks to everybody who read and replied to my first e-mail; I have tried my best to incorporate your feedback and suggestions. I also added some CCs who recently participated in logging-related discussions.

Changes (since Sept. 22):
* Extensibility -> Allowing the compiler to eliminate messages below a certain threshold requires changing the API.
* Add some special-purpose logging functions (printk_detected(), _registered(), _settings(), and _copyright())
* Fine-grained log-level control. "Everything above" or "everything below" can be emulated by turning the specific log-levels on or off.
* Define an extra header containing the (optional) secondary interface (err()/warn()/info())
* Remove kprint_*() aliases.
* kprint_() is better than kprint( CONFIG_KPRINT_LOGLEVEL_MAX) { \
	kprint_real_block_init(block, loglevel);
#define kprint_block(block, fmt, ...) \
	kprint_real_block(block, fmt, ## __VA_ARGS__);
#define kprint_block_flush(block) \
	kprint_real_block_flush(block); \
	}

/* Thus, this C code: */
kprint_block_init(, KPRINT_INFO);
kprint_block(, "Hello world");
kprint_block_flush();

/* Would pre-process into this: */
if(6 < 4) {
	kprint_real_block_init(, 6);
	kprint_real_block(, "Hello world");
	kprint_block_flush();
}
}

References
[1] http://lkml.org/lkml/2007/9/21/267 (Joe Perches)
[2] http://lkml.org/lkml/2007/9/20/352 (Rob Landley)
[3] http://lkml.org/lkml/2007/9/21/151 (Dick Streefland)
[4] http://lkml.org/lkml/2007/6/13/146 (Michael Holzheu)
[5] http://lkml.org/lkml/2007/9/24/320 (Jesse Barnes)
[6] http://lkml.org/lkml/2007/9/22/162 (Miguel Ojeda)
[7] http://lkml.org/lkml/2007/9/25/62 (Vegard Nossum)
[8] http://lkml.org/lkml/2007/9/22/157 (Joe Perches)
[PATCH] kswapd should only wait on IO if there is IO
The current kswapd (and try_to_free_pages) code has an oddity where the code will wait on IO, even if there is no IO in flight. This problem is especially noticeable when the system scans through many unfreeable pages, causing unnecessary stalls in the VM.

Signed-off-by: Rik van Riel <[EMAIL PROTECTED]>

diff -up linux-2.6.22.x86_64/mm/vmscan.c.wait linux-2.6.22.x86_64/mm/vmscan.c
--- linux-2.6.22.x86_64/mm/vmscan.c.wait	2007-09-25 11:33:30.0 -0400
+++ linux-2.6.22.x86_64/mm/vmscan.c	2007-09-25 21:27:08.0 -0400
@@ -68,6 +68,8 @@ struct scan_control {
 	int all_unreclaimable;
 
 	int order;
+
+	int nr_io_pages;
 };
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -489,8 +491,10 @@ static unsigned long shrink_page_list(st
 			 */
 			if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
 				wait_on_page_writeback(page);
-			else
+			else {
+				sc->nr_io_pages++;
 				goto keep_locked;
+			}
 		}
 
 		referenced = page_referenced(page, 1);
@@ -541,8 +545,10 @@ static unsigned long shrink_page_list(st
 			case PAGE_ACTIVATE:
 				goto activate_locked;
 			case PAGE_SUCCESS:
-				if (PageWriteback(page) || PageDirty(page))
+				if (PageWriteback(page) || PageDirty(page)) {
+					sc->nr_io_pages++;
 					goto keep;
+				}
 				/*
 				 * A synchronous write - probably a ramdisk. Go
 				 * ahead and try to reclaim the page.
@@ -1201,6 +1207,7 @@ unsigned long try_to_free_pages(struct z
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		sc.nr_scanned = 0;
+		sc.nr_io_pages = 0;
 		if (!priority)
 			disable_swap_token();
 		nr_reclaimed += shrink_zones(priority, zones, );
 
@@ -1229,7 +1236,8 @@ unsigned long try_to_free_pages(struct z
 		}
 
 		/* Take a nap, wait for some writeback to complete */
-		if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
+		if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
+				sc.nr_io_pages > sc.swap_cluster_max)
 			congestion_wait(WRITE, HZ/10);
 	}
 	/* top priority shrink_caches still had more to do? don't OOM, then */
@@ -1315,6 +1323,7 @@ loop_again:
 		if (!priority)
 			disable_swap_token();
+		sc.nr_io_pages = 0;
 
 		all_zones_ok = 1;
 
 		/*
@@ -1398,7 +1407,8 @@ loop_again:
 		 * OK, kswapd is getting into trouble. Take a nap, then take
 		 * another pass across the zones.
 		 */
-		if (total_scanned && priority < DEF_PRIORITY - 2)
+		if (total_scanned && priority < DEF_PRIORITY - 2 &&
+				sc.nr_io_pages > sc.swap_cluster_max)
 			congestion_wait(WRITE, HZ/10);
 
 		/*
Re: NO_HZ hangs up AMD MK-36
On Fri, 2007-09-28 at 00:01 +0300, Dmitry Tyschenko wrote:
> Sorry, I am newbie in linux. Hope you was talking about:
> /boot/vmlinuz-2.6.22-1-k7 root=/dev/sda5 ro nohz=off

Yes.

> But it doesn't help for Debians 2.6.22-1 (I don't have another
> prebuiled) still same problems.

Can you please add nolapic_timer instead?

Thanks,

	tglx
Re: NO_HZ hangs up AMD MK-36
Please don't top post.

On Thursday, 27 September 2007 23:01, Dmitry Tyschenko wrote:
> Sorry, I am newbie in linux. Hope you was talking about:
> /boot/vmlinuz-2.6.22-1-k7 root=/dev/sda5 ro nohz=off

Yes.

> But it doesn't help for Debians 2.6.22-1 (I don't have another
> prebuiled) still same problems.

So, you need to explicitly unset NO_HZ in the kernel configuration to make things work.

Well, in that case please wait until the 2.6.23 kernel is out and test it. There will be some important fixes related to NO_HZ in it.

Greetings,
Rafael

> 2007/9/27, Rafael J. Wysocki <[EMAIL PROTECTED]>:
> > On Thursday, 27 September 2007 22:28, Dmitry Tyschenko wrote:
> > > Hello,
> > >
> > > I have laptop Asus X50M. Using old Debian Etch from February.
> > > Kernel from 2.6.21 doesn't boot, hangs up just in 10seconds - 1minute
> > > after GRUB screen.
> > > I have tryed different versions of gcc (4.1.1, 4.1.2, 4.2.1) to build
> > > 2.6.22.8 kernel, but no results.
> > > But if I disable NO_HZ option 2.6.21 is working fine for me.
> > >
> > > I think this is important problem, because some of the project, Debian
> > > for example,
> > > are building kernel with this options enabled (in
> > > linux-image-2.6.22-1-k7 package it is enabled),
> > > and some people, like me, can not use new kernels.
> > >
> > > I have attached some of my PC info, hope this can help
> >
> > You can use the "nohz=off" kernel command line switch. Please check if it
> > works for you.
> >
> > Greetings,
> > Rafael

--
"Premature optimization is the root of all evil." - Donald Knuth