Re: [PATCH -mm1 0/2] Fix unlocked call to idr_find()

2007-09-27 Thread Jarek Poplawski
On Thu, Sep 27, 2007 at 04:33:54PM +0200, [EMAIL PROTECTED] wrote:
> 
> This is a series of 2 patches that should be applied on top of the other ipc
> patches, in 2.6.23-rc6-mm1.
...
> They should be applied to 2.6.23-rc6-mm1, in the following order:

Didn't you mean 2.6.23-rc8-mm1, btw?

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Module use count must be updated as bridges are created/destroyed

2007-09-27 Thread Herbert Xu
Jan Beulich <[EMAIL PROTECTED]> wrote:
>
> So we have an unsolvable problem here then, unless infrastructure gets added
> that allows a module to declare itself as not-implicit-unload-safe, forcing
> modprobe -r to keep its hands off it. Ugly.

Yes I've always wanted to have a separate count that indicates
a module is in use but does not prevent its immediate removal.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: sata_sil24 broken since 2.6.23-rc4-mm1

2007-09-27 Thread Torsten Kaiser
On 9/27/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1.
> > I will try to look at the diff between these revisions some more, but
> > the change in sata_sil24.c looked like a perfect match for the
> > symptoms I was seeing.
>
> I think the first thing to do here is to verify 2.6.23-rc3-mm1 still
> works fine and my previous debug patch is pretty much meaningless if
> address initialization failure isn't the cause.

After the first trouble with -rc4-mm1 I switched back to -rc3-mm1. I
booted that kernel 7 times over 4 days and never had trouble. (Before
-rc4-mm1 came out, I used -rc3-mm1 for over a week)

So in case of -rc3-mm1 I'm pretty sure that it works.

What I'm not completely sure about is whether the 2.6.23-rc7-sglist kernel
works. I booted that 9 times, but from a quick look in /var/log/messages, I
might not have hit the "correct" situation to trigger the error.
That kernel is vanilla 2.6.23-rc7 plus the patch from
http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
( http://marc.info/?l=linux-ide&m=119055574826083&w=2 )

Torsten


Re: Problems with SMP & ACPI powering off

2007-09-27 Thread Len Brown
On Thursday 27 September 2007 18:00, Rafael J. Wysocki wrote:
> On Thursday, 27 September 2007 23:29, Mark Lord wrote:
> > Question:  do we disable all CPUs except 0 when doing ACPI power off?
> 
> No, but we should.

We used to.
It is absolutely mandatory -- else it confuses the BIOS on some boards
b/c it isn't expecting SMM to get entered from other than cpu0.

-Len


Re: [13/17] Virtual compound page freeing in interrupt context

2007-09-27 Thread KAMEZAWA Hiroyuki
On Tue, 25 Sep 2007 16:42:17 -0700
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> +static noinline void vcompound_free(void *addr)
> +{
> + if (in_interrupt()) {

Should be (in_interrupt() || irqs_disabled()) ?

Regards,
-Kame



Re: drivers/usb/misc/emi*.c have the biggest data objects in the whole tree

2007-09-27 Thread Greg KH
On Fri, Sep 14, 2007 at 11:35:34AM +0100, Denys Vlasenko wrote:
> Hi Tapio,
> 
> You are the author of these files. Are you still maintaining them?
> If not, do you know who is the current maintainer?
> 
> These two object files hold the biggest data objects in the whole Linux kernel
> after lockdep:
> 
>    text    data     bss     dec     hex filename
>    1258  160516       0  161774   277ee ./drivers/usb/misc/emi26.o
>    1504  209296       0  210800   33770 ./drivers/usb/misc/emi62.o
> 
> Basically, these are big arrays of the following structures:
> 
> typedef struct _INTEL_HEX_RECORD
> {
>     __u32   length;
>     __u32   address;
>     __u32   type;
>     __u8    data[MAX_INTEL_HEX_RECORD_LENGTH];
> } INTEL_HEX_RECORD;
> 
> I suggest the following optimizations:
> 
> Change structure to
> 
> typedef struct _INTEL_HEX_RECORD
> {
>     __u8    type;
>     __u8    length;
>     __u16   address;
>     __u8    data[MAX_INTEL_HEX_RECORD_LENGTH];
> } INTEL_HEX_RECORD __attribute__((__packed__));

Only if you redo the whole firmware image too :)

What is this really hurting?  It's only relevant if you load the
specific module, if you have this device type.  It's a firmware blob,
nothing really interesting at all.

thanks,

greg k-h
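
To put a number on the proposal quoted above, here is a minimal userspace
sketch comparing the two record layouts. It assumes
MAX_INTEL_HEX_RECORD_LENGTH is 16, which the thread does not state, and the
struct names are made up for the illustration. Under that assumption the
current layout takes 28 bytes per record and the compact one 20 bytes on
common ABIs. The compact layout only works if every record address fits in
16 bits, which ties in with the point that the firmware tables would have to
be regenerated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the thread does not state it. */
#define MAX_INTEL_HEX_RECORD_LENGTH 16

/* Layout as quoted in the thread: three 32-bit fields plus the data. */
struct hex_record_wide {
	uint32_t length;
	uint32_t address;
	uint32_t type;
	uint8_t  data[MAX_INTEL_HEX_RECORD_LENGTH];
};

/* Proposed compact layout (hypothetical name, GCC/Clang attribute). */
struct hex_record_compact {
	uint8_t  type;
	uint8_t  length;
	uint16_t address;
	uint8_t  data[MAX_INTEL_HEX_RECORD_LENGTH];
} __attribute__((__packed__));

/* The whole point of the proposal: the compact record must be smaller. */
static_assert(sizeof(struct hex_record_compact) < sizeof(struct hex_record_wide),
	      "compact layout must be smaller than the wide one");

/* Bytes saved per record by switching layouts. */
size_t hex_record_saving(void)
{
	return sizeof(struct hex_record_wide) - sizeof(struct hex_record_compact);
}
```

Multiplying hex_record_saving() by the number of records in a firmware table
gives the expected reduction in the data section for that table.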


[git patches] net driver fixes

2007-09-27 Thread Jeff Garzik

And an e1000 id patch.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
upstream-linus

to receive the following updates:

 drivers/net/e1000/e1000_ethtool.c |    1 +
 drivers/net/e1000/e1000_hw.c      |    1 +
 drivers/net/e1000/e1000_hw.h      |    1 +
 drivers/net/e1000/e1000_main.c    |    2 +
 drivers/net/sky2.c                |   53 +++--
 5 files changed, 44 insertions(+), 14 deletions(-)

Auke Kok (1):
  e1000: Add device IDs of blade version of the 82571 quad port

Stephen Hemminger (3):
  sky2: sky2 FE+ receive status workaround
  sky2: FE+ vlan workaround
  sky2: fix transmit state on resume

diff --git a/drivers/net/e1000/e1000_ethtool.c 
b/drivers/net/e1000/e1000_ethtool.c
index 4c3785c..9ecc3ad 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1726,6 +1726,7 @@ static int e1000_wol_exclusion(struct e1000_adapter 
*adapter, struct ethtool_wol
case E1000_DEV_ID_82571EB_QUAD_COPPER:
case E1000_DEV_ID_82571EB_QUAD_FIBER:
case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
+   case E1000_DEV_ID_82571PT_QUAD_COPPER:
case E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3:
/* quad port adapters only support WoL on port A */
if (!adapter->quad_port_a) {
diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index ba120f7..8604adb 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -387,6 +387,7 @@ e1000_set_mac_type(struct e1000_hw *hw)
case E1000_DEV_ID_82571EB_SERDES_DUAL:
case E1000_DEV_ID_82571EB_SERDES_QUAD:
case E1000_DEV_ID_82571EB_QUAD_COPPER:
+   case E1000_DEV_ID_82571PT_QUAD_COPPER:
case E1000_DEV_ID_82571EB_QUAD_FIBER:
case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
hw->mac_type = e1000_82571;
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index fe87146..07f0ea7 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -475,6 +475,7 @@ int32_t e1000_check_phy_reset_block(struct e1000_hw *hw);
 #define E1000_DEV_ID_82571EB_FIBER   0x105F
 #define E1000_DEV_ID_82571EB_SERDES  0x1060
 #define E1000_DEV_ID_82571EB_QUAD_COPPER 0x10A4
+#define E1000_DEV_ID_82571PT_QUAD_COPPER 0x10D5
 #define E1000_DEV_ID_82571EB_QUAD_FIBER  0x10A5
 #define E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE  0x10BC
 #define E1000_DEV_ID_82571EB_SERDES_DUAL 0x10D9
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 4a22595..e7c8951 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -108,6 +108,7 @@ static struct pci_device_id e1000_pci_tbl[] = {
INTEL_E1000_ETHERNET_DEVICE(0x10BC),
INTEL_E1000_ETHERNET_DEVICE(0x10C4),
INTEL_E1000_ETHERNET_DEVICE(0x10C5),
+   INTEL_E1000_ETHERNET_DEVICE(0x10D5),
INTEL_E1000_ETHERNET_DEVICE(0x10D9),
INTEL_E1000_ETHERNET_DEVICE(0x10DA),
/* required last entry */
@@ -1101,6 +1102,7 @@ e1000_probe(struct pci_dev *pdev,
case E1000_DEV_ID_82571EB_QUAD_COPPER:
case E1000_DEV_ID_82571EB_QUAD_FIBER:
case E1000_DEV_ID_82571EB_QUAD_COPPER_LOWPROFILE:
+   case E1000_DEV_ID_82571PT_QUAD_COPPER:
/* if quad port adapter, disable WoL on all but port A */
if (global_quad_port_a != 0)
adapter->eeprom_wol = 0;
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 0792031..162489b 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -910,6 +910,20 @@ static inline struct sky2_tx_le *get_tx_le(struct 
sky2_port *sky2)
return le;
 }
 
+static void tx_init(struct sky2_port *sky2)
+{
+   struct sky2_tx_le *le;
+
+   sky2->tx_prod = sky2->tx_cons = 0;
+   sky2->tx_tcpsum = 0;
+   sky2->tx_last_mss = 0;
+
+   le = get_tx_le(sky2);
+   le->addr = 0;
+   le->opcode = OP_ADDR64 | HW_OWNER;
+   sky2->tx_addr64 = 0;
+}
+
 static inline struct tx_ring_info *tx_le_re(struct sky2_port *sky2,
struct sky2_tx_le *le)
 {
@@ -1320,7 +1334,8 @@ static int sky2_up(struct net_device *dev)
GFP_KERNEL);
if (!sky2->tx_ring)
goto err_out;
-   sky2->tx_prod = sky2->tx_cons = 0;
+
+   tx_init(sky2);
 
sky2->rx_le = pci_alloc_consistent(hw->pdev, RX_LE_BYTES,
   &sky2->rx_le_map);
@@ -2148,6 +2163,18 @@ static struct sk_buff *sky2_receive(struct net_device 
*dev,
sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
prefetch(sky2->rx_ring + sky2->rx_next);
 
+   if (length < ETH_ZLEN || length > sky2->rx_data_size)
+   goto len_error;
+
+   /* This chip has hardware problems that generates bogus status.
+* So do only marginal 

[PATCH] libata drain fifo on stuck DRQ HSM violation

2007-09-27 Thread Mark Lord

Tejun Heo wrote:
> Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> Alan Cox wrote:
>>>>> I think there have been enough cases where this draining was necessary.
>>>>>  IIRC, ata_piix was involved in those cases, right?  If so, can you
>>>>> please submit a patch which applies this only to affected controllers?
>>>>> I don't feel too confident about applying this to all SFF controllers.
>>>>
>>>> Old IDE does it on all controllers bar a couple. So we have a very good
>>>> knowledge of what does/doesn't work. The one that needs care in old ide
>>>> is an ordering issue where a state machine reset done first causes the
>>>> drain of the I/O to hang.
>>>
>>> Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
>>> affected too?
>>
>> I would think all SFF controllers, since a lot of first gen SATA are
>> really bridged solutions.  If they are flagging DRQ, I say oblige them :)
>
> Alright, then the posted patch should be good enough.  Mark, can you be
> bothered to regenerate the patch and post it one more time (again)?  It
> seems we all agree the update is needed.


I think this original patch still applies cleanly on at least 2.6.23-rc7.

Drain up to 512 words from host/bridge FIFO on stuck DRQ HSM violation,
rather than just getting stuck there forever.

Signed-Off-By:  Mark Lord <[EMAIL PROTECTED]>
---

--- old/drivers/ata/libata-sff.c  2007-04-26 12:02:46.0 -0400
+++ linux/drivers/ata/libata-sff.c  2007-04-29 08:29:27.0 -0400
@@ -413,6 +413,24 @@
ap->ops->irq_on(ap);
}

+static void ata_drain_fifo (struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+   u8 stat = ata_chk_status(ap);
+   /*
+* Try to clear stuck DRQ if necessary.
+*/
+   if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+   unsigned int i, limit = 512;
+   printk("Draining up to %u words from data FIFO.\n", limit);
+   for (i = 0; i < limit ; ++i) {
+   ioread16(ap->ioaddr.data_addr);
+   if (!(ata_chk_status(ap) & ATA_DRQ))
+   break;
+   }
+   printk("Drained %u/%u words.\n", i, limit);
+   }
+}
+
/**
 *  ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
 *  @ap: port to handle error for
@@ -469,7 +487,7 @@
}

ata_altstatus(ap);
-   ata_chk_status(ap);
+   ata_drain_fifo(ap, qc);
ap->ops->irq_clear(ap);

spin_unlock_irqrestore(ap->lock, flags);
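
For readers who want to see the bounded-drain idea in isolation, here is a
minimal userspace sketch. It is not the libata code: struct sim_port,
sim_status(), sim_read_data() and sim_drain_fifo() are hypothetical stand-ins
for the port, ata_chk_status() and ioread16(), and the simulated device simply
keeps DRQ set while words remain in its FIFO. The sketch counts every word it
actually reads and gives up after the limit instead of spinning forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_ATA_DRQ 0x08	/* DRQ is bit 3 of the ATA status register */

/* Hypothetical stand-in for a port: only tracks words left in the FIFO. */
struct sim_port {
	unsigned int fifo_words;
};

/* Simulated status read: DRQ stays set while data is pending. */
uint8_t sim_status(const struct sim_port *p)
{
	return p->fifo_words ? SIM_ATA_DRQ : 0;
}

/* Simulated 16-bit data register read: consumes one pending word. */
uint16_t sim_read_data(struct sim_port *p)
{
	if (p->fifo_words)
		p->fifo_words--;
	return 0;
}

/*
 * Read and discard words while DRQ is set, but never more than `limit`.
 * Returns the number of words actually read.
 */
unsigned int sim_drain_fifo(struct sim_port *p, unsigned int limit)
{
	unsigned int drained = 0;

	assert(p != NULL);
	while (drained < limit && (sim_status(p) & SIM_ATA_DRQ)) {
		(void)sim_read_data(p);
		drained++;
	}
	return drained;
}
```

If the return value equals the limit and DRQ is still set, the device is
still stuck and the caller has to fall back to its normal error handling.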


Re: Stardom SATA HSM violation

2007-09-27 Thread Mark Lord

Tejun Heo wrote:
> Alan Cox wrote:
>>> I think there have been enough cases where this draining was necessary.
>>>  IIRC, ata_piix was involved in those cases, right?  If so, can you
>>> please submit a patch which applies this only to affected controllers?
>>> I don't feel too confident about applying this to all SFF controllers.
>>
>> Old IDE does it on all controllers bar a couple. So we have a very good
>> knowledge of what does/doesn't work. The one that needs care in old ide
>> is an ordering issue where a state machine reset done first causes the
>> drain of the I/O to hang.
>
> Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
> affected too?

ata_piix SATA is definitely affected when a PATA_drive to SATA_host bridge is
present.  Possibly other times.

Cheers



Re: [3/4] dma: document dma_flags_set_dmabarrier()

2007-09-27 Thread Grant Grundler
On Thu, Sep 27, 2007 at 06:13:02PM -0700, [EMAIL PROTECTED] wrote:
> 
> Document dma_flags_set_dmabarrier().
> 
> Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

This looks really good!

thanks,
grant

Acked-by: Grant Grundler <[EMAIL PROTECTED]>

> 
> ---
>  DMA-API.txt |   26 ++
>  1 files changed, 26 insertions(+)
> 
> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> index cc7a8c3..5fc0bba 100644
> --- a/Documentation/DMA-API.txt
> +++ b/Documentation/DMA-API.txt
> @@ -544,3 +544,29 @@ size is the size (and should be a page-sized multiple).
>  The return value will be either a pointer to the processor virtual
>  address of the memory, or an error (via PTR_ERR()) if any part of the
>  region is occupied.
> +
> +int 
> +dma_flags_set_dmabarrier(int dir)
> +
> +Amend dir (one of the enum dma_data_direction values), with a 
> +platform-specific "dmabarrier" attribute.  The dmabarrier attribute 
> +forces a flush of all in-flight DMA when the associated memory 
> +region is written to (see example below.)
> +
> +This provides a mechanism to enforce ordering of DMA on platforms that 
> +permit DMA to be reordered between device and host memory (within a 
> +NUMA interconnect).  On other platforms this is a nop.
> +
> +The dmabarrier would be set when the memory region is mapped for DMA, 
> +e.g.:
> +
> + int count, flags = dma_flags_set_dmabarrier(DMA_BIDIRECTIONAL);
> + 
> + count = dma_map_sg(dev, sglist, nents, flags);
> +
> +As an example of a situation where this would be useful, suppose that 
> +the device does a DMA write to indicate that data is ready and 
> +available in memory.  The DMA of the "completion indication" could 
> +race with data DMA.  Using a dmabarrier on the memory used for 
> +completion indications would prevent the race.
> +
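
The "completion indication" race described in the quoted text has a familiar
CPU-side analogue, sketched below with C11 atomics. This is only an analogy
and does not use the DMA API or dma_flags_set_dmabarrier(): struct mailbox,
producer_publish() and consumer_poll() are hypothetical names. The release
store on the flag plays the role the dmabarrier plays for the completion
write, in that the payload must be visible before the flag is.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox: a payload plus a "data is ready" flag. */
struct mailbox {
	uint32_t   payload;
	atomic_int done;
};

/* Write the payload first, then publish the flag with release ordering. */
void producer_publish(struct mailbox *m, uint32_t value)
{
	assert(m != NULL);
	m->payload = value;
	atomic_store_explicit(&m->done, 1, memory_order_release);
}

/*
 * Returns 1 and stores the payload in *out once the flag is set,
 * 0 if nothing has been published yet. The acquire load pairs with
 * the release store above.
 */
int consumer_poll(struct mailbox *m, uint32_t *out)
{
	assert(m != NULL && out != NULL);
	if (!atomic_load_explicit(&m->done, memory_order_acquire))
		return 0;
	*out = m->payload;
	return 1;
}
```

Without the ordering on the flag, a consumer could see "done" and still read
a stale payload, which is the same shape of bug the documentation describes
for reordered DMA.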


Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Greg KH
On Thu, Sep 27, 2007 at 05:28:57PM -0600, Matthew Wilcox wrote:
> On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote:
> > Would you accept a patch which causes the deprecated sysfs
> > files/directories to disappear, even if CONFIG_SYSFS_DEPRECATED is
> > defined, via a boot-time parameter?
> 
> How about a mount option?  That way people can test without a reboot:
> 
> mount -o remount,deprecated={yes,no} /sys

Unfortunately, due to the way sysfs and kobjects are built up, this is
pretty much impossible to do.

sorry,

greg k-h


Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Greg KH
On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote:
> On Thu, Sep 27, 2007 at 02:34:45PM -0700, Greg KH wrote:
> > Ok, how then should I advertise this better?  What can we do better to
> > help userspace programmers out in this regard?
> 
> Would you accept a patch which causes the deprecated sysfs
> files/directories to disappear, even if CONFIG_SYSFS_DEPRECATED is
> defined, via a boot-time parameter?

As discussed in the kernel summit talk about this very topic, Kay is
working on a patch to do just that :)

> Many people and distros are
> likely to keep CONFIG_SYSFS_DEPRECATED defined just out of paranoia that
> things might break.  Doing a quick google, I note that Fedora has been
> going back and forth of turning it off, watching things break, and
> then turning it back on.  The latest time, the changelog said:
> 
> * Fri Jan 26 23:00:00 2007 Bill Nottingham 
> 
> - turn on CONFIG_SYSFS_DEPRECATED so that things actually work. *sigh*
> 
> (and I've checked, Fedora's CVS still has CONFIG_SYSFS_DEPRECATED
> defined; it's not just Debian at fault here.)

That's odd, SuSE and Gentoo have been working for quite some time just
fine with that option disabled :)

thanks,

greg k-h


Re: iwl4965 and driver merging policy

2007-09-27 Thread Benjamin Herrenschmidt

On Thu, 2007-09-27 at 22:30 -0400, John W. Linville wrote:
> > It doesn't seem to pull any dependency nor affect any other external
> > piece of code unless I'm missing something, so it's a perfect example of
> > what we've been discussing back then: there is just no point not merging
> > it at any time right ? :-)
> 
> It is queued for 2.6.24.  I'm not even sure it was originally posted
> in time for the 2.6.23 merge window, but even if it was there was
> a lot of opposition to merging it until fairly recently.  In fact,
> I'm sure there are still some wireless developers that are less than
> happy about merging it now.
> 
> Anyway, coming soon to a kernel near you...

All right, thanks. I was mostly trying to figure out where we stood
with this whole idea that driver additions were not necessarily
constrained by the merge window (which I think is fair to do) and this
looked like a good example to pick since it affects my new laptop :-)

Out of curiosity, what's the main source of opposition ? Since it's
being shipped by distro or built out of tree by most users -anyway-, I
think it's pretty clear that we'd be better off having it merged asap
rather than trying to figure out what random version was included by
users/distros and try to support it, in addition to wider exposure & all
the yadada of being upstream in the first place.

Ben.



Re: iwl4965 and driver merging policy

2007-09-27 Thread Benjamin Herrenschmidt

> 
> Well, pulling in iwlwifi would require also pulling in the mac80211
> subsystem, so it's not quite that simple (although I'm not sure what's
> holding back that going into the kernel.)

I thought that was already in 2.6.23 ... my bad if I missed something
(there is definitely something there called net/mac80211)

> I had no problem building my personal production kernel by taking
> 2.6.23-rc8, and doing a git pull from the everything branch in John
> Linville's wireless-dev git tree.  It's probably too late to pull it
> for 2.6.23-rc8 (although if Linus wanted to do it it's only one git
> pull command away :-), but it would be really nice if it could get
> merged in for 2.6.24.

Yes, I agree -rc8 seems to be a tad too late, I'm just surprised we
didn't get it in earlier though since it seems it's been around and
useable for some time.

Cheers,
Ben.




Re: iwl4965 and driver merging policy

2007-09-27 Thread John W. Linville
On Fri, Sep 28, 2007 at 11:39:27AM +1000, Benjamin Herrenschmidt wrote:

> Just a little question in the light of the discussion we had at Kernel
> Summit about merging drivers upstream (and here, I strongly agree with
> Linus, hence my message).

You must not have been watching me SPAM netdev for the past two
weeks. :-)

> I just got that new T61 laptop which happens to have an iwl4xxx chip.
> The distro I installed on it (ubuntu) has a driver for it. I suspect
> others do too and most users get it from some random external tree and
> use it.
> 
> Thus my question, why are we about to release 2.6.23 without it ?
> 
> It doesn't seem to pull any dependency nor affect any other external
> piece of code unless I'm missing something, so it's a perfect example of
> what we've been discussing back then: there is just no point not merging
> it at any time right ? :-)

It is queued for 2.6.24.  I'm not even sure it was originally posted
in time for the 2.6.23 merge window, but even if it was there was
a lot of opposition to merging it until fairly recently.  In fact,
I'm sure there are still some wireless developers that are less than
happy about merging it now.

Anyway, coming soon to a kernel near you...

John
-- 
John W. Linville
[EMAIL PROTECTED]


Re: iwl4965 and driver merging policy

2007-09-27 Thread Theodore Tso
On Fri, Sep 28, 2007 at 11:39:27AM +1000, Benjamin Herrenschmidt wrote:
> 
> Just a little question in the light of the discussion we had at Kernel
> Summit about merging drivers upstream (and here, I strongly agree with
> Linus, hence my message).
> 
> I just got that new T61 laptop which happens to have an iwl4xxx chip.
> The distro I installed on it (ubuntu) has a driver for it. I suspect
> others do too and most users get it from some random external tree and
> use it.
> 
> Thus my question, why are we about to release 2.6.23 without it ?
> 
> It doesn't seem to pull any dependency nor affect any other external
> piece of code unless I'm missing something, so it's a perfect example of
> what we've been discussing back then: there is just no point not merging
> it at any time right ? :-)

Well, pulling in iwlwifi would require also pulling in the mac80211
subsystem, so it's not quite that simple (although I'm not sure what's
holding back that going into the kernel.)

I had no problem building my personal production kernel by taking
2.6.23-rc8, and doing a git pull from the everything branch in John
Linville's wireless-dev git tree.  It's probably too late to pull it
for 2.6.23-rc8 (although if Linus wanted to do it it's only one git
pull command away :-), but it would be really nice if it could get
merged in for 2.6.24.

- Ted


WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask()

2007-09-27 Thread Fengguang Wu
On Thu, Sep 27, 2007 at 02:22:20AM -0700, Andrew Morton wrote:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc8/2.6.23-rc8-mm2/
 
Laurent,

It triggered a WARNING on first run in qemu:

[0.31] WARNING: at arch/x86_64/kernel/smp.c:397 smp_call_function_mask()
[0.31]
[0.31] Call Trace:
[0.31]  [] dump_trace+0x3ee/0x4a0
[0.31]  [] show_trace+0x43/0x70
[0.31]  [] dump_stack+0x15/0x20
[0.31]  [] smp_call_function_mask+0x94/0xa0
[0.31]  [] smp_call_function+0x19/0x20
[0.31]  [] on_each_cpu+0x1f/0x50
[0.31]  [] global_flush_tlb+0x8c/0x110
[0.31]  [] free_init_pages+0xe5/0xf0
[0.31]  [] alternative_instructions+0x7e/0x150
[0.31]  [] check_bugs+0x1a/0x20
[0.31]  [] start_kernel+0x2da/0x380
[0.31]  [] _sinittext+0x132/0x140


Here is the more complete log:

[0.00] Linux version 2.6.23-rc8-mm2 ([EMAIL PROTECTED]) (gcc version 
4.2.1 (Debian 4.2.1-5)) #3 SMP Fri Sep 28 10:29:34 CST 2007
[0.00] Command line: root=/dev/hda rw console=ttyS0 clock=pit 
init=/bin/bash
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e8000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 39ff (usable)
[0.00]  BIOS-e820: 39ff - 3a00 (ACPI data)
[0.00]  BIOS-e820: fffc - 0001 (reserved)
[0.00] end_pfn_map = 1048576
[0.00] DMI not present or invalid.
[0.00] ACPI: RSDP 000FAA30, 0014 (r0 BOCHS )
[0.00] ACPI: RSDT 39FF, 002C (r0 BOCHS  BXPCRSDT1 BXPC  
  1)
[0.00] ACPI: FACP 39FF002C, 0074 (r0 BOCHS  BXPCFACP1 BXPC  
  1)
[0.00] ACPI: DSDT 39FF0100, 0832 (r1   BXPC   BXDSDT1 INTL 
20060912)
[0.00] ACPI: FACS 39FF00C0, 0040
[0.00] ACPI: APIC 39FF0938, 0040 (r0 BOCHS  BXPCAPIC1 BXPC  
  1)
[0.00] No NUMA configuration found
[0.00] Faking a node at -39ff
[0.00] Bootmem setup node 0 -39ff
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   DMA324096 ->  1048576
[0.00]   Normal1048576 ->  1048576
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00] 0:0 ->  159
[0.00] 0:  256 ->   237552
[0.00] ACPI: PM-Timer IO Port: 0xb008
[0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Allocating PCI resources starting at 4000 (gap: 
3a00:c5fc)
[0.00] .eh_frame_hdr for 'kernel' present but unusable
[0.00] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[0.00] PERCPU: Allocating 429480 bytes of per cpu data
[0.00] Built 1 zonelists in Node order, mobility grouping on.  Total 
pages: 231879
[0.00] Policy zone: DMA32
[0.00] Kernel command line: root=/dev/hda rw console=ttyS0 clock=pit 
init=/bin/bash
[0.00] Warning! clock= boot option is deprecated. Use clocksource=xyz
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)
[0.00] TSC calibrated against PM_TIMER
[0.00] time.c: Detected 2932.892 MHz processor.
[0.02] console [kgdb0] enabled
[0.03] Console: colour VGA+ 80x25
[0.04] console [ttyS0] enabled
[0.05] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., 
Ingo Molnar
[0.05] ... MAX_LOCKDEP_SUBCLASSES:8
[0.05] ... MAX_LOCK_DEPTH:  30
[0.05] ... MAX_LOCKDEP_KEYS:2048
[0.05] ... CLASSHASH_SIZE:   1024
[0.05] ... MAX_LOCKDEP_ENTRIES: 8192
[0.05] ... MAX_LOCKDEP_CHAINS:  16384
[0.05] ... CHAINHASH_SIZE:  8192
[0.05]  memory used by lock dependency info: 1712 kB
[0.05]  per task-struct memory footprint: 2160 bytes
[0.05] Checking aperture...
[0.10] Memory: 905832k/950208k available (3018k kernel code, 43988k 
reserved, 2171k data, 720k init)
[0.10] SLUB: Genslabs=12, HWalign=64, Order=0-3, MinObjects=16, CPUs=1, 
Nodes=1
[0.25] Calibrating delay using timer specific routine.. 5880.64 
BogoMIPS (lpj=29403242)
[0.25] kswapd reclaim order set to 3
[0.25] Security Framework initialized
[0.25] SELinux:  Initializing.
[0.25] 

Re: [patch 2/2] VFS: allow filesystem to override mknod capability checks

2007-09-27 Thread Neil Brown
On Monday September 24, [EMAIL PROTECTED] wrote:
> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Add a new super block flag, that results in the VFS not checking if
> the current process has enough privileges to do an mknod().
> 
> If this flag is set, all mounts for this super block will have the
> "nodev" flag implied.
> 
> This is needed on filesystems, where an unprivileged user may be able
> to create a device node, without causing security problems.
> 
> One such example is "mountlo" a loopback mount utility implemented
> with fuse and UML, which runs as an unprivileged userspace process.
> In this case the user does in fact have the right to create device
> nodes within the filesystem image, as long as the user has write
> access to the image.  Since the filesystem is mounted with "nodev",
> adding device nodes is not a security concern.

I must admit that I don't feel very comfortable about this.  I wonder
how many more flags we might be tempted to add to allow
user-controlled filesystems to do interesting things.  Somehow I doubt
this will be the last, so we should be very careful allowing it to be
the first (or is it the second already?)

A more concrete comment on the patch:  Is it really necessary to
introduce IS_MNT_NODEV??  Why not simply test both the flags
(MS_MKNOD_NOCAP and MNT_NODEV) before allowing the mknod?  That would
localise the change to where it is really relevant.

Do we actually need a new flag?  Would not a combination of MS_NODEV
and MS_SETUSER achieve the same thing (near enough)?

Do you imagine this flag being set as a mount option (-o unprivmknod)
or does the filesystem set it itself?
If the latter, maybe this test should be moved down into the
filesystems ->mknod operation.  Most filesystems get
 
	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
		return -EPERM;

at the top of ->mknod.  fuse can do whatever it likes without
bothering common code.

According to fs.h, we only support 32 fs-independent mount-flags, and
over half are in use.  I'm not convinced we should spend one on such a
narrow requirement.

NeilBrown


> 
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> ---
> 
> Index: linux/fs/namei.c
> ===
> --- linux.orig/fs/namei.c 2007-09-24 13:52:17.0 +0200
> +++ linux/fs/namei.c  2007-09-24 13:54:57.0 +0200
> @@ -1617,7 +1617,7 @@ int may_open(struct nameidata *nd, int a
>   if (S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
>   flag &= ~O_TRUNC;
>   } else if (S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode)) {
> - if (nd->mnt->mnt_flags & MNT_NODEV)
> + if (IS_MNT_NODEV(nd->mnt))
>   return -EACCES;
>  
>   flag &= ~O_TRUNC;
> @@ -1920,7 +1920,8 @@ int vfs_mknod(struct inode *dir, struct 
>   if (error)
>   return error;
>  
> - if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
> + if (!(dir->i_sb->s_flags & MS_MKNOD_NOCAP) &&
> + (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
>   return -EPERM;
>  
>   if (!dir->i_op || !dir->i_op->mknod)
> Index: linux/include/linux/fs.h
> ===
> --- linux.orig/include/linux/fs.h 2007-09-24 13:52:17.0 +0200
> +++ linux/include/linux/fs.h  2007-09-24 13:54:57.0 +0200
> @@ -130,6 +130,8 @@ extern int dir_notify_enable;
>  #define MS_SETUSER   (1<<23) /* set mnt_uid to current user */
>  #define MS_NOMNT (1<<24) /* don't allow unprivileged submounts */
>  #define MS_KERNMOUNT (1<<25) /* this is a kern_mount call */
> +#define MS_MKNOD_NOCAP   (1<<26) /* no capability check in mknod,
> +implies "nodev" */
>  #define MS_ACTIVE(1<<30)
>  #define MS_NOUSER(1<<31)
>  
> @@ -190,6 +192,10 @@ extern int dir_notify_enable;
>  #define IS_SWAPFILE(inode)   ((inode)->i_flags & S_SWAPFILE)
>  #define IS_PRIVATE(inode)((inode)->i_flags & S_PRIVATE)
>  
> +#define IS_MNT_NODEV(mnt)(((mnt)->mnt_flags & MNT_NODEV) || \
> + ((mnt)->mnt_sb->s_flags & MS_MKNOD_NOCAP))
> +
> +
>  /* the read-only stuff doesn't really belong here, but any other place is
> probably as bad and I don't want to create yet another include file. */
>  
> Index: linux/drivers/mtd/mtdsuper.c
> ===
> --- linux.orig/drivers/mtd/mtdsuper.c 2007-09-24 13:52:17.0 +0200
> +++ linux/drivers/mtd/mtdsuper.c  2007-09-24 13:54:57.0 +0200
> @@ -194,7 +194,7 @@ int get_sb_mtd(struct file_system_type *
>   if (!S_ISBLK(nd.dentry->d_inode->i_mode))
>   goto out;
>  
> - if (nd.mnt->mnt_flags & MNT_NODEV) {
> + if (IS_MNT_NODEV(nd.mnt)) {
>   ret = -EACCES;
>   goto out;
>   }
> Index: linux/fs/block_dev.c
> 

Re: Floating Point Issue

2007-09-27 Thread WANG Cong
On Thu, Sep 27, 2007 at 05:17:44PM +0200, Jan Engelhardt wrote:
>
>On Sep 27 2007 12:41, mahamuni ashish wrote:
>>I have small code
>
>This is not a kernel problem. (Read your C book and/or ask in
>a C newsgroup.)

Please go to comp.lang.c for help. ;)

-- 
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."



Re: [PATCH -mm] Hook up group scheduler with control groups

2007-09-27 Thread Srivatsa Vaddagiri
On Thu, Sep 27, 2007 at 04:42:41PM -0700, Andrew Morton wrote:
> > @@ -219,6 +225,9 @@ static inline struct task_grp *task_grp(
> >  
> >  #ifdef CONFIG_FAIR_USER_SCHED
> > tg = p->user->tg;
> > +#elif CONFIG_FAIR_CGROUP_SCHED
> > +   tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id),
> > +   struct task_grp, css);
> >  #else
> > tg  = &init_task_grp;
> >  #endif
> 
> that's a bit funny-looking.  Are CONFIG_FAIR_CGROUP_SCHED and
> CONFIG_FAIR_USER_SCHED mutually exclusive?

Yes. While configuring kernel, user can choose only one of those options
and not both.

>  Doesn't seem that way.

Hmm ..why do you say that?

> if
> they're both defined then CONFIG_FAIR_USER_SCHED "wins".
> Anyway, please confirm that this is correct?

They can't both be defined.

> I'll switch that to `#elif defined(CONFIG_FAIR_CGROUP_SCHED)'.  We can get
> gcc warnings with `#if CONFIG_FOO', and people should be using `#ifdef
> CONFIG_FOO', so I assume the same applies to #elif.

Thx for fixing it!
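
For reference, a minimal userspace sketch of why `#elif defined(...)` is the
safer spelling.  The CONFIG_ symbols below are defined by hand as stand-ins
(they are not Kconfig output), and selected_group_source() is a hypothetical
helper that only reports which preprocessor branch was kept:

```c
#include <assert.h>
#include <string.h>

/* Hand-written stand-ins for Kconfig symbols (assumption: only the
 * cgroup option is enabled, since the two are mutually exclusive). */
#define CONFIG_FAIR_CGROUP_SCHED 1
/* CONFIG_FAIR_USER_SCHED is intentionally left undefined. */

/* Hypothetical helper: reports which branch the preprocessor kept. */
static const char *selected_group_source(void)
{
#ifdef CONFIG_FAIR_USER_SCHED
	return "user";
#elif defined(CONFIG_FAIR_CGROUP_SCHED)
	/* defined() only tests for existence, so it behaves the same
	 * whether the symbol is undefined, empty, or 1.  A bare
	 * `#elif CONFIG_FOO` evaluates the symbol instead: an undefined
	 * name silently becomes 0 and gcc warns about it under -Wundef. */
	return "cgroup";
#else
	return "init";
#endif
}
```

With the definitions above, selected_group_source() returns "cgroup";
removing the #define makes it fall through to "init" without any warning.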

-- 
Regards,
vatsa


Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Theodore Tso
On Thu, Sep 27, 2007 at 05:28:57PM -0600, Matthew Wilcox wrote:
> On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote:
> > Would you accept a patch which causes the deprecated sysfs
> > files/directories to disappear, even if CONFIG_SYS_DEPRECATED is
> > defined, via a boot-time parameter?
> 
> How about a mount option?  That way people can test without a reboot:
> 
> mount -o remount,deprecated={yes,no} /sys

It would be nice if that were easy to make work, but the problem
is that remounting /sys doesn't change the entries that have already
been created in the sysfs tree.  We could do something such
as creating a sysfs_create_link_deprecated() call which creates a
kobject with a new flag indicating it's deprecated, so it could be
filtered out dynamically when /sys is remounted, or when some file
such as /sys/kernel/deprecated_sysfs_files has "0" or "1" written to
it.

The question is whether it's worth it, since we'd have to bloat the
kobject structure by 4 bytes (it currently doesn't have a flags field
from which we could borrow a bit), or whether it's OK just to make the
user reboot.  (I do agree it would be nicer if the user didn't have to
reboot, but most of the time they will need to test the initrd and
init scripts anyway.)

- Ted


Re: [PATCH] i915: make vbl interrupts work properly on i965g/gm

2007-09-27 Thread Jesse Barnes
On Thursday, September 27, 2007 7:05:31 pm Dave Airlie wrote:
> Hi Linus,
>
> The attached patch is to fix a bug reported on 965gm chipsets (lots of new
> laptops), I think distros will all have to patch this in to fix it, so can
> we get it into the 2.6.23 final?
>
> (Otherwise I'll wait until stable..)

Without this patch, my 965GM drops vblank interrupts, so I'd really like to 
see it upstream ASAP too.

Acked-by:  Jesse Barnes <[EMAIL PROTECTED]>


2.6.23-rc8 network problem. Mem leak? ip1000a?

2007-09-27 Thread linux
Uniprocessor Athlon 64, 64-bit kernel, 2G ECC RAM,
2.6.23-rc8 + linuxpps (5.0.0) + ip1000a driver.
(patch from http://marc.info/?l=linux-netdev&m=118980588419882)

After a few hours of operation, ntp loses the ability to send packets.
sendto() returns -EAGAIN to everything, including the 24-byte UDP packet
that is a response to ntpq.

-EAGAIN on a sendto() makes me think of memory problems, so here's
meminfo at the time:

### FAILED state ###
# cat /proc/meminfo 
MemTotal:  2059384 kB
MemFree: 15332 kB
Buffers:665608 kB
Cached:  18212 kB
SwapCached:  0 kB
Active: 380384 kB
Inactive:   355020 kB
SwapTotal: 5855208 kB
SwapFree:  5854552 kB
Dirty:   28504 kB
Writeback:   0 kB
AnonPages:   51608 kB
Mapped:  11852 kB
Slab:  1285348 kB
SReclaimable:   152968 kB
SUnreclaim:1132380 kB
PageTables:   3888 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   6884900 kB
Committed_AS:   590528 kB
VmallocTotal: 34359738367 kB
VmallocUsed:265628 kB
VmallocChunk: 34359472059 kB


Killing and restarting ntpd gets it running again for a few hours.
Here's after about two hours of successful operation.  (I'll try to
remember to run slabinfo before killing ntpd next time.)

### WORKING state ###
# cat /proc/meminfo
MemTotal:  2059384 kB
MemFree: 20252 kB
Buffers:242688 kB
Cached:  41556 kB
SwapCached:200 kB
Active: 285012 kB
Inactive:   147348 kB
SwapTotal: 5855208 kB
SwapFree:  5854212 kB
Dirty:  36 kB
Writeback:   0 kB
AnonPages:  148052 kB
Mapped:  12756 kB
Slab:  1582512 kB
SReclaimable:   134348 kB
SUnreclaim:1448164 kB
PageTables:   4500 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:   6884900 kB
Committed_AS:   689956 kB
VmallocTotal: 34359738367 kB
VmallocUsed:265628 kB
VmallocChunk: 34359472059 kB
# /usr/src/linux/Documentation/vm/slabinfo
Name                  Objects Objsize    Space  Slabs/Part/Cpu  O/S O %Fr %Ef Flg
:016                     1478      16    24.5K           6/3/1  256 0  50  96 *
:024                      170      24     4.0K           1/0/1  170 0   0  99 *
:032                     1339      32    45.0K          11/2/1  128 0  18  95 *
:040                      102      40     4.0K           1/0/1  102 0   0  99 *
:064                     5937      64   413.6K        101/15/1   64 0  14  91 *
:072                       56      72     4.0K           1/0/1   56 0   0  98 *
:088                     6946      88   618.4K         151/0/1   46 0   0  98 *
:096                    23851      96     2.5M       616/144/1   42 0  23  90 *
:128                      730     128   114.6K          28/6/1   32 0  21  81 *
:136                      232     136    36.8K           9/6/1   30 0  66  85 *
:192                      474     192    98.3K          24/4/1   21 0  16  92 *
:256                  1385376     256   354.6M       86587/0/1   16 0   0  99 *
:320                       12     304     4.0K           1/0/1   12 0   0  89 *A
:384                      359     384   180.2K         44/23/1   10 0  52  76 *A
:512                  1384316     512   708.7M      173040/1/1    8 0   0  99 *
:640                       72     616    53.2K          13/5/1    6 0  38  83 *A
:704                     1870     696     1.3M         170/0/1   11 1   0  93 *A
:0001024                  427    1024   454.6K         111/9/1    4 0   8  96 *
:0001472                  150    1472   245.7K          30/0/1    5 1   0  89 *
:0002048               158991    2048   325.7M      39759/25/1    4 1   0  99 *
:0004096                   51    4096   245.7K          30/9/1    2 1  30  85 *
Acpi-State                 51      80     4.0K           1/0/1   51 0   0  99
anon_vma                 1032      16    28.6K           7/5/1  170 0  71  57
bdev_cache                 43     720    36.8K           9/1/1    5 0  11  83 Aa
blkdev_requests            42     288    12.2K           3/0/1   14 0   0  98
buffer_head             59173     104    11.1M     2734/1690/1   39 0  61  54 a
cfq_io_context            223     152    40.9K          10/6/1   26 0  60  82
dentry                  98641     192    19.7M      4813/274/1   21 0   5  96 a
ext3_inode_cache       115690     688    86.3M      10545/77/1   11 1   0  92 a
file_lock_cache            23     168     4.0K           1/0/1   23 0   0  94
idr_layer_cache           118     528    69.6K          17/1/1    7 0   5  89
inode_cache              1365     528   798.7K         195/0/1    7 0   0  90 a
kmalloc-131072              1  131072   131.0K           1/0/1    1 5   0 100
kmalloc-16384               8   16384   131.0K           8/0/1    1 2   0 100
kmalloc-32768               1   32768    32.7K           1/0/1    1 3   0 100
kmalloc-8                1535       8    12.2K           3/1/1  512 0  33  99
kmalloc-819210

[PATCH] i915: make vbl interrupts work properly on i965g/gm

2007-09-27 Thread Dave Airlie


Hi Linus,

The attached patch fixes a bug reported on 965GM chipsets (found in lots of
new laptops).  I think distros will all have to patch this in, so can
we get it into 2.6.23 final?


(Otherwise I'll wait until stable..)

Dave.

From 14e53712e5e2ccc72dac1131de78e590e9a9d451 Mon Sep 17 00:00:00 2001
From: Dave Airlie <[EMAIL PROTECTED]>
Date: Fri, 28 Sep 2007 11:46:28 +1000
Subject: [PATCH] i915: make vbl interrupts work properly on i965g/gm hw.

This code is ported from the DRM git tree and allows the vblank interrupts
to function on the i965 hw. It also requires a change in Mesa's 965 driver
to actually use them.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>
---
 drivers/char/drm/i915_drv.h |6 ++
 drivers/char/drm/i915_irq.c |   12 
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/char/drm/i915_drv.h b/drivers/char/drm/i915_drv.h
index 737088b..28b9873 100644
--- a/drivers/char/drm/i915_drv.h
+++ b/drivers/char/drm/i915_drv.h
@@ -210,6 +210,12 @@ extern int i915_wait_ring(struct drm_device * dev, int n, const char *caller);
 #define I915REG_INT_MASK_R 	0x020a8
 #define I915REG_INT_ENABLE_R	0x020a0
 
+#define I915REG_PIPEASTAT	0x70024
+#define I915REG_PIPEBSTAT	0x71024
+
+#define I915_VBLANK_INTERRUPT_ENABLE	(1UL<<17)
+#define I915_VBLANK_CLEAR		(1UL<<1)
+
 #define SRX_INDEX		0x3c4
 #define SRX_DATA		0x3c5
 #define SR01			1
diff --git a/drivers/char/drm/i915_irq.c b/drivers/char/drm/i915_irq.c
index 4b4b2ce..bb8e9e9 100644
--- a/drivers/char/drm/i915_irq.c
+++ b/drivers/char/drm/i915_irq.c
@@ -214,6 +214,10 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
 	struct drm_device *dev = (struct drm_device *) arg;
 	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
 	u16 temp;
+	u32 pipea_stats, pipeb_stats;
+
+	pipea_stats = I915_READ(I915REG_PIPEASTAT);
+	pipeb_stats = I915_READ(I915REG_PIPEBSTAT);
 
 	temp = I915_READ16(I915REG_INT_IDENTITY_R);
 
@@ -225,6 +229,8 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
 		return IRQ_NONE;
 
 	I915_WRITE16(I915REG_INT_IDENTITY_R, temp);
+	(void) I915_READ16(I915REG_INT_IDENTITY_R);
+	DRM_READMEMORYBARRIER();
 
 	dev_priv->sarea_priv->last_dispatch = READ_BREADCRUMB(dev_priv);
 
@@ -252,6 +258,12 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
 
 		if (dev_priv->swaps_pending > 0)
 			drm_locked_tasklet(dev, i915_vblank_tasklet);
+		I915_WRITE(I915REG_PIPEASTAT,
+			pipea_stats|I915_VBLANK_INTERRUPT_ENABLE|
+			I915_VBLANK_CLEAR);
+		I915_WRITE(I915REG_PIPEBSTAT,
+			pipeb_stats|I915_VBLANK_INTERRUPT_ENABLE|
+			I915_VBLANK_CLEAR);
 	}
 
 	return IRQ_HANDLED;
-- 
1.5.3.1



Re: State of the Linux PCI Subsystem for 2.6.23-rc8

2007-09-27 Thread Greg KH
On Thu, Sep 27, 2007 at 09:18:50PM -0400, Jeff Garzik wrote:
> Greg KH wrote:
>> On Thu, Sep 27, 2007 at 03:22:35AM -0400, Jeff Garzik wrote:
>>> Greg KH wrote:
>>>> On Wed, Sep 26, 2007 at 11:40:58PM +0200, Brice Goglin wrote:
>>>>> Greg KH wrote:
>>>>>> Here's a summary of the current state of the Linux PCI subsystem, as of
>>>>>> 2.6.23-rc8.
>>>>>>
>>>>>> If the information in here is incorrect, or anyone knows of any
>>>>>> outstanding issues not listed here, please let me know.
>>>>>>
>>>>>> List of outstanding regressions from 2.6.22:
>>>>>>  - none known.
>>>>>>
>>>>>> List of outstanding regressions from older kernel versions:
>>>>>>  - none known.
>>>>>>
>>>>> What about http://marc.info/?l=linux-pci=11907248538=2 ?
>>>> That's not a regression, right?  It's probably never worked for that
>>>> kind of box :)
>>>> I think the pci bus patches that are pending from Jeff Garzik should fix
>>>> up these issues.  They are in one of his trees, and in the -mm release,
>>>> if you are able to test those.
>>> jgarzik/misc-2.6.git#pciseg has my only outstanding PCI stuff, which is a 
>>> small x86[-64] PCI domain support patch.  Mostly unrelated to the thread 
>>> at hand, alas, even though it was touching that area.
>>>
>>> I need to make a few changes required by Andi, who made several good points, 
>>> then the PCI domains thing should be ready for upstream.  I don't care 
>>> much who merges it, you, Andi or me.
>> I'll take it, as I guess it should go through me, Andi is going to have
>> enough merge issues for 2.6.24 :)
>> I'll add them to my tree later today.
>
> Please don't pull 'pciseg' just yet...  it needs the fixes Andi pointed 
> out, namely, it should be turned on by default in x86 / x86-64 platform 
> Kconfig, and have a boot-time method of disabling it.

Ok, let me know when you want me to pull it and I will.  Or just send me
the patches by email, that's much easier for me :)

thanks,

greg k-h


iwl4965 and driver merging policy

2007-09-27 Thread Benjamin Herrenschmidt
Hi !

Just a little question in the light of the discussion we had at Kernel
Summit about merging drivers upstream (and here, I strongly agree with
Linus, hence my message).

I just got that new T61 laptop which happens to have an iwl4xxx chip.
The distro I installed on it (ubuntu) has a driver for it. I suspect
others do too and most users get it from some random external tree and
use it.

Thus my question, why are we about to release 2.6.23 without it ?

It doesn't seem to pull any dependency nor affect any other external
piece of code unless I'm missing something, so it's a perfect example of
what we've been discussing back then: there is just no point not merging
it at any time right ? :-)

Cheers,
Ben.




Re: [RFC] New kernel-message logging API (take 2)

2007-09-27 Thread linux
> Example: {
>   struct kprint_block out;
>   kprint_block_init(&out, KPRINT_DEBUG);
>   kprint_block(&out, "Stack trace:");
>
>   while(unwind_stack()) {
>   kprint_block(&out, "%p %s", address, symbol);
>   }
>   kprint_block_flush(&out);
> }

Assuming that kprint_block_flush() is a combination of
kprint_block_printit() and kprint_block_abort(), you
could make a macro wrapper for this to preclude leaks:

#define KPRINT_BLOCK(block, level, code) \
do { \
	struct kprint_block block; \
	kprint_block_init(&block, KPRINT_##level); \
	do { \
		code ; \
		kprint_block_printit(&block); \
	} while (0); \
	kprint_block_abort(&block); \
} while(0)

The inner do { } while(0) region is so you can abort with "break".

(Or you can split it into KPRINT_BEGIN() and KPRINT_END() macros,
if that works out to be cleaner.)
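
A userspace sketch of how that wrapper would behave, to make the "break"
path concrete.  Everything here is an assumption for illustration: the
kprint_block_*() functions are stubs that write into a buffer, and
kprint_block_add(), printed, open_blocks and trace() are hypothetical names
that are not part of the proposed API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum kprint_level { KPRINT_DEBUG, KPRINT_ERR };

/* Stub block: collects text until it is printed or aborted. */
struct kprint_block {
	enum kprint_level level;
	char buf[128];
};

static char printed[128];  /* last block that "reached the log" */
static int open_blocks;    /* non-zero after a call would mean a leak */

static void kprint_block_init(struct kprint_block *b, enum kprint_level level)
{
	b->level = level;
	b->buf[0] = '\0';
	open_blocks++;
}

static void kprint_block_add(struct kprint_block *b, const char *s)
{
	strncat(b->buf, s, sizeof(b->buf) - strlen(b->buf) - 1);
}

static void kprint_block_printit(struct kprint_block *b)
{
	snprintf(printed, sizeof(printed), "%s", b->buf);
}

static void kprint_block_abort(struct kprint_block *b)
{
	b->buf[0] = '\0';
	open_blocks--;
}

#define KPRINT_BLOCK(block, level, code) \
do { \
	struct kprint_block block; \
	kprint_block_init(&block, KPRINT_##level); \
	do { \
		code ; \
		kprint_block_printit(&block); \
	} while (0); \
	kprint_block_abort(&block); \
} while (0)

/* Emits a fake trace; bails out early when unwinding "fails". */
static void trace(int unwind_ok)
{
	printed[0] = '\0';
	KPRINT_BLOCK(out, DEBUG,
		kprint_block_add(&out, "Stack trace:");
		if (!unwind_ok)
			break;	/* leaves the inner loop: no printit, abort still runs */
		kprint_block_add(&out, " frame0")
	);
}
```

Calling trace(1) leaves "Stack trace: frame0" in printed; trace(0) prints
nothing.  In both cases open_blocks is back to 0 afterwards, which is the
point of routing every exit through kprint_block_abort().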


Re: [PATCH] UML - Correctly handle skb allocation failures

2007-09-27 Thread Jeff Dike
On Thu, Sep 27, 2007 at 04:53:40PM -0700, Andrew Morton wrote:
> Still wanna know why it is safe for uml_net_rx to be playing with
> drop_skb when update_drop_skb() could be concurrently reallocating
> and freeing it.

Ah, yes, I missed that point in the horror of my botch last night.

I'll add irqsave/irqrestore to the locking - keep this patch, and I'll
send in a fix.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: State of the Linux PCI Subsystem for 2.6.23-rc8

2007-09-27 Thread Jeff Garzik

Greg KH wrote:
> On Thu, Sep 27, 2007 at 03:22:35AM -0400, Jeff Garzik wrote:
>> Greg KH wrote:
>>> On Wed, Sep 26, 2007 at 11:40:58PM +0200, Brice Goglin wrote:
>>>> Greg KH wrote:
>>>>> Here's a summary of the current state of the Linux PCI subsystem, as of
>>>>> 2.6.23-rc8.
>>>>>
>>>>> If the information in here is incorrect, or anyone knows of any
>>>>> outstanding issues not listed here, please let me know.
>>>>>
>>>>> List of outstanding regressions from 2.6.22:
>>>>>  - none known.
>>>>>
>>>>> List of outstanding regressions from older kernel versions:
>>>>>  - none known.
>>>>
>>>> What about http://marc.info/?l=linux-pci=11907248538=2 ?
>>>
>>> That's not a regression, right?  It's probably never worked for that
>>> kind of box :)
>>> I think the pci bus patches that are pending from Jeff Garzik should fix
>>> up these issues.  They are in one of his trees, and in the -mm release,
>>> if you are able to test those.
>> jgarzik/misc-2.6.git#pciseg has my only outstanding PCI stuff, which is a
>> small x86[-64] PCI domain support patch.  Mostly unrelated to the thread at
>> hand, alas, even though it was touching that area.
>>
>> I need to make a few changes required by Andi, who made several good points,
>> then the PCI domains thing should be ready for upstream.  I don't care much
>> who merges it, you, Andi or me.
>
> I'll take it, as I guess it should go through me, Andi is going to have
> enough merge issues for 2.6.24 :)
>
> I'll add them to my tree later today.

Please don't pull 'pciseg' just yet...  it needs the fixes Andi pointed 
out, namely, it should be turned on by default in x86 / x86-64 platform 
Kconfig, and have a boot-time method of disabling it.

Jeff





Re: [PATCH] writeback: remove unnecessary wait in throttle_vm_writeout()

2007-09-27 Thread Fengguang Wu
On Thu, Sep 27, 2007 at 11:16:10AM -0400, Rik van Riel wrote:
> On Thu, 27 Sep 2007 09:50:16 +0800
> Fengguang Wu <[EMAIL PROTECTED]> wrote:
> 
> > We don't want to introduce pointless delays in throttle_vm_writeout()
> > when the writeback limits are not yet exceeded, do we?
> 
> Good catch.

Thank you.
 
> > Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
> 
> Reviewed-by: Rik van Riel <[EMAIL PROTECTED]>

It could be a good fix for 2.6.22/23.  But for -mm, I'm not
sure if throttle_vm_writeout() will be eventually removed ;-)



[4/4] mthca: allow setting "dmabarrier" on user-allocated memory

2007-09-27 Thread akepner

Use the dma_flags_set_dmabarrier() interface to allow a "dmabarrier"
attribute to be associated with user-allocated memory. (For now,
it's only implemented for mthca.)

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

--- 
 drivers/infiniband/core/umem.c   |7 +--
 drivers/infiniband/hw/amso1100/c2_provider.c |2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |2 +-
 drivers/infiniband/hw/ehca/ehca_mrmw.c   |2 +-
 drivers/infiniband/hw/ipath/ipath_mr.c   |2 +-
 drivers/infiniband/hw/mlx4/cq.c  |2 +-
 drivers/infiniband/hw/mlx4/doorbell.c|2 +-
 drivers/infiniband/hw/mlx4/mr.c  |3 ++-
 drivers/infiniband/hw/mlx4/qp.c  |2 +-
 drivers/infiniband/hw/mlx4/srq.c |2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |7 ++-
 drivers/infiniband/hw/mthca/mthca_user.h |   10 +-
 include/rdma/ib_umem.h   |4 ++--
 13 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 664d2fa..5b30b0c 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -69,9 +69,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
  * @addr: userspace virtual address to start at
  * @size: length of region to pin
  * @access: IB_ACCESS_xxx flags for memory being pinned
+ * @dmabarrier: set "dmabarrier" attribute on this memory, if necessary 
  */
 struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
-   size_t size, int access)
+   size_t size, int access, int dmabarrier)
 {
struct ib_umem *umem;
struct page **page_list;
@@ -83,6 +84,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
int ret;
int off;
int i;
+   int flags = dmabarrier ? dma_flags_set_dmabarrier(DMA_BIDIRECTIONAL): 
+   DMA_BIDIRECTIONAL;
 
if (!can_do_mlock())
return ERR_PTR(-EPERM);
@@ -160,7 +163,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
chunk->nmap = ib_dma_map_sg(context->device,
&chunk->page_list[0],
chunk->nents,
-   DMA_BIDIRECTIONAL);
+   flags);
if (chunk->nmap <= 0) {
for (i = 0; i < chunk->nents; ++i)
put_page(chunk->page_list[i].page);
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index 997cf15..17243b7 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -449,7 +449,7 @@ static struct ib_mr *c2_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
return ERR_PTR(-ENOMEM);
c2mr->pd = c2pd;
 
-   c2mr->umem = ib_umem_get(pd->uobject->context, start, length, acc);
+   c2mr->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
if (IS_ERR(c2mr->umem)) {
err = PTR_ERR(c2mr->umem);
kfree(c2mr);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index f0c7775..d0a514c 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -601,7 +601,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
if (!mhp)
return ERR_PTR(-ENOMEM);
 
-   mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc);
+   mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0);
if (IS_ERR(mhp->umem)) {
err = PTR_ERR(mhp->umem);
kfree(mhp);
diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c
index d97eda3..c13c11c 100644
--- a/drivers/infiniband/hw/ehca/ehca_mrmw.c
+++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c
@@ -329,7 +329,7 @@ struct ib_mr *ehca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
}
 
e_mr->umem = ib_umem_get(pd->uobject->context, start, length,
-mr_access_flags);
+mr_access_flags, 0);
if (IS_ERR(e_mr->umem)) {
ib_mr = (void *)e_mr->umem;
goto reg_user_mr_exit1;
diff --git a/drivers/infiniband/hw/ipath/ipath_mr.c b/drivers/infiniband/hw/ipath/ipath_mr.c
index e442470..e351222 100644
--- a/drivers/infiniband/hw/ipath/ipath_mr.c
+++ b/drivers/infiniband/hw/ipath/ipath_mr.c
@@ -195,7 +195,7 @@ struct ib_mr *ipath_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
  

[3/4] dma: document dma_flags_set_dmabarrier()

2007-09-27 Thread akepner

Document dma_flags_set_dmabarrier().

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

---
 DMA-API.txt |   26 ++
 1 files changed, 26 insertions(+)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index cc7a8c3..5fc0bba 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -544,3 +544,29 @@ size is the size (and should be a page-sized multiple).
 The return value will be either a pointer to the processor virtual
 address of the memory, or an error (via PTR_ERR()) if any part of the
 region is occupied.
+
+int 
+dma_flags_set_dmabarrier(int dir)
+
+Amend dir (one of the enum dma_data_direction values), with a 
+platform-specific "dmabarrier" attribute.  The dmabarrier attribute 
+forces a flush of all in-flight DMA when the associated memory 
+region is written to (see example below.)
+
+This provides a mechanism to enforce ordering of DMA on platforms that 
+permit DMA to be reordered between device and host memory (within a 
+NUMA interconnect).  On other platforms this is a nop.
+
+The dmabarrier would be set when the memory region is mapped for DMA, 
+e.g.:
+
+   int count, flags = dma_flags_set_dmabarrier(DMA_BIDIRECTIONAL);
+   
+   count = dma_map_sg(dev, sglist, nents, flags);
+
+As an example of a situation where this would be useful, suppose that 
+the device does a DMA write to indicate that data is ready and 
+available in memory.  The DMA of the "completion indication" could 
+race with data DMA.  Using a dmabarrier on the memory used for 
+completion indications would prevent the race.
+


[2/4] dma: redefine dma_flags_set_dmabarrier() for sn-ia64

2007-09-27 Thread akepner

define dma_flags_set_dmabarrier() for sn-ia64 - it "borrows"
bits from the direction argument (renamed "flags") to the
dma_map_* routines to pass an additional "dmabarrier" attribute.
Also define routines to retrieve the original direction and
attribute from "flags".
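
The bit-borrowing scheme can be sketched in plain C as follows.  This is
only an illustration under stated assumptions: DMA_ATTR_SHIFT and
DMA_BARRIER_ATTR are the values from the io.h hunk, but the sketch_*()
helpers and the enum are hypothetical stand-ins, not the definitions from
the patch.

```c
#include <assert.h>

/* Values taken from the io.h hunk of this patch. */
#define DMA_ATTR_SHIFT   8
#define DMA_BARRIER_ATTR 0x1

/* Stand-in mirroring the values of the kernel's enum dma_data_direction. */
enum sketch_dma_direction {
	SKETCH_DMA_BIDIRECTIONAL = 0,
	SKETCH_DMA_TO_DEVICE     = 1,
	SKETCH_DMA_FROM_DEVICE   = 2,
	SKETCH_DMA_NONE          = 3,
};

/* Store the attribute above the low 8 bits that hold the direction. */
static inline int sketch_set_dmabarrier(int dir)
{
	return dir | (DMA_BARRIER_ATTR << DMA_ATTR_SHIFT);
}

/* Recover the original direction by masking off the attribute bits. */
static inline int sketch_get_direction(int flags)
{
	return flags & ((1 << DMA_ATTR_SHIFT) - 1);
}

/* Recover the barrier attribute: 1 if set, 0 otherwise. */
static inline int sketch_get_dmabarrier(int flags)
{
	return (flags >> DMA_ATTR_SHIFT) & DMA_BARRIER_ATTR;
}
```

A plain direction value passes through unchanged, so callers that never set
the attribute see the same behaviour as before.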

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

---
 arch/ia64/sn/pci/pci_dma.c |   35 ++-
 include/asm-ia64/sn/io.h   |   24 
 2 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index d79ddac..6c0a498 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -153,7 +153,7 @@ EXPORT_SYMBOL(sn_dma_free_coherent);
  * @dev: device to map for
  * @cpu_addr: kernel virtual address of the region to map
  * @size: size of the region
- * @direction: DMA direction
+ * @flags: DMA direction, and arch-specific attributes
  *
  * Map the region pointed to by @cpu_addr for DMA and return the
  * DMA address.
@@ -167,17 +167,23 @@ EXPORT_SYMBOL(sn_dma_free_coherent);
  *   figure out how to save dmamap handle so can use two step.
  */
 dma_addr_t sn_dma_map_single(struct device *dev, void *cpu_addr, size_t size,
-int direction)
+int flags)
 {
dma_addr_t dma_addr;
unsigned long phys_addr;
struct pci_dev *pdev = to_pci_dev(dev);
struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev);
+   int dmabarrier = dma_flags_get_dmabarrier(flags);
 
BUG_ON(dev->bus != &pci_bus_type);
 
phys_addr = __pa(cpu_addr);
-   dma_addr = provider->dma_map(pdev, phys_addr, size, SN_DMA_ADDR_PHYS);
+   if (dmabarrier)
+   dma_addr = provider->dma_map_consistent(pdev, phys_addr, size, 
+   SN_DMA_ADDR_PHYS);
+   else
+   dma_addr = provider->dma_map(pdev, phys_addr, size, 
+SN_DMA_ADDR_PHYS);
if (!dma_addr) {
printk(KERN_ERR "%s: out of ATEs\n", __FUNCTION__);
return 0;
@@ -240,18 +246,20 @@ EXPORT_SYMBOL(sn_dma_unmap_sg);
  * @dev: device to map for
  * @sg: scatterlist to map
  * @nhwentries: number of entries
- * @direction: direction of the DMA transaction
+ * @flags: direction of the DMA transaction, and arch-specific attributes
  *
  * Maps each entry of @sg for DMA.
  */
 int sn_dma_map_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
- int direction)
+ int flags)
 {
unsigned long phys_addr;
struct scatterlist *saved_sg = sg;
struct pci_dev *pdev = to_pci_dev(dev);
struct sn_pcibus_provider *provider = SN_PCIDEV_BUSPROVIDER(pdev);
int i;
+   int dmabarrier = dma_flags_get_dmabarrier(flags);
+   int direction = dma_flags_get_direction(flags);
 
BUG_ON(dev->bus != &pci_bus_type);
 
@@ -259,12 +267,21 @@ int sn_dma_map_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
 * Setup a DMA address for each entry in the scatterlist.
 */
for (i = 0; i < nhwentries; i++, sg++) {
+   dma_addr_t dma_addr;
phys_addr = SG_ENT_PHYS_ADDRESS(sg);
-   sg->dma_address = provider->dma_map(pdev,
-   phys_addr, sg->length,
-   SN_DMA_ADDR_PHYS);
 
-   if (!sg->dma_address) {
+   if (dmabarrier) {
+   dma_addr = provider->dma_map_consistent(pdev,
+   phys_addr,
+   sg->length,
+   SN_DMA_ADDR_PHYS);
+   } else {
+   dma_addr = provider->dma_map(pdev,
+phys_addr, sg->length,
+SN_DMA_ADDR_PHYS);
+   }
+
+   if (!(sg->dma_address = dma_addr)) {
printk(KERN_ERR "%s: out of ATEs\n", __FUNCTION__);
 
/*
diff --git a/include/asm-ia64/sn/io.h b/include/asm-ia64/sn/io.h
index 41c73a7..301bc47 100644
--- a/include/asm-ia64/sn/io.h
+++ b/include/asm-ia64/sn/io.h
@@ -271,4 +271,28 @@ sn_pci_set_vchan(struct pci_dev *pci_dev, unsigned long *addr, int vchan)
return 0;
 }
 
+#define ARCH_CAN_REORDER_DMA
+/* here we steal some upper bits from the "direction" argument to the 
+ * dma_map_* routines */
+#define DMA_ATTR_SHIFT 8
+/* bottom 8 bits for direction, remaining bits for additional "attributes" */
+#define DMA_BARRIER_ATTR   0x1
+/* Setting DMA_BARRIER_ATTR on a DMA-mapped memory region causes all in-
+ * flight DMA to be flushed when the memory region is written to. So 
+ * 

[1/4] dma: add dma_flags_set_dmabarrier() to dma interface

2007-09-27 Thread akepner

Introduce the dma_flags_set_dmabarrier() interface and give it
a default no-op implementation.

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

---
 dma-mapping.h |6 ++
 1 files changed, 6 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2dc21cb..4d1d199 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -99,4 +99,10 @@ static inline void dmam_release_declared_memory(struct device *dev)
 }
 #endif /* ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY */
 
+#ifndef ARCH_CAN_REORDER_DMA
+static inline int dma_flags_set_dmabarrier(int dir) {
+   return dir;
+}
+#endif /* ARCH_CAN_REORDER_DMA */
+
 #endif


[PATCH 0/4] allow drivers to flush in-flight DMA v2

2007-09-27 Thread akepner

On Altix, DMA may be reordered between a device and host memory. 
This reordering can happen in the NUMA interconnect, and it usually 
results in correct operation and improved performance. In some 
situations it may be necessary to explicitly synchronize DMA from 
the device.

This patchset allows a memory region to be mapped with a "dmabarrier". 
Writes to the memory region will cause in-flight DMA to be flushed, 
providing a mechanism to order DMA from a device.

There are 4 patches in this patchset:

  [1/4] dma: add dma_flags_set_dmabarrier() to dma interface
  [2/4] dma: redefine dma_flags_set_dmabarrier() for sn-ia64
  [3/4] dma: document dma_flags_set_dmabarrier()
  [4/4] mthca: allow setting "dmabarrier" on user-allocated memory

-- 
Arthur



Re: sys_chroot+sys_fchdir Fix

2007-09-27 Thread David Newall
Bill Davidsen wrote:
> It seems there are (at least) two parts to this, one regarding
> changing working directory which is clearly stated in the standards
> and must work as it does, and the various issues regarding getting out
> of the chroot after the cwd has entered that changed root. That second
> part seems to offer room for additional controls on getting out of the
> chroot which do not violate any of the obvious standards, and which
> therefore might be valid candidates for discussion on the basis of
> benefit rather than portability.


Correct.  BSDs solved the problem by changing cwd on subsequent use of 
chroot; I think there's a better way.  I think the solution might be to 
add a "previous root", and restrict the process there as well as the new 
root.  That is, once cwd is set within the new root, that new root is 
the limit.  Prior to setting cwd within the new root, the previous root 
is the limit.
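
A rough userspace model of the "previous root" idea, to make the rule concrete. Everything here is hypothetical: the struct, the helper names, and the use of plain path strings in place of dentries. It only captures the decision of which root confines the process.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if path lies at or below root (component-wise prefix match). */
static bool path_is_under(const char *path, const char *root)
{
	size_t n = strlen(root);

	if (strcmp(root, "/") == 0)
		return path[0] == '/';
	return strncmp(path, root, n) == 0 &&
	       (path[n] == '/' || path[n] == '\0');
}

/* Hypothetical per-process state: both roots plus the cwd. */
struct proc_roots_sketch {
	const char *prev_root;
	const char *new_root;
	const char *cwd;
};

/* Until cwd enters the new root, the previous root is the limit;
 * once cwd is inside the new root, the new root is the limit. */
static const char *effective_limit(const struct proc_roots_sketch *p)
{
	return path_is_under(p->cwd, p->new_root) ? p->new_root
						  : p->prev_root;
}
```

A real implementation would have to make the transition one-way and work on dentry/vfsmount pairs; the model above just recomputes the limit from the current cwd.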



Re: [PATCH] pci: use pci=bfsort for HP DL385 G2, DL585 G2

2007-09-27 Thread Matt Domsch
On Thu, Sep 27, 2007 at 11:18:44AM +0200, Michal Schmidt wrote:
> Hello,
> 
> HP ProLiant systems DL385 G2 and DL585 G2 need pci=bfsort to enumerate PCI
> devices in the expected order.
> 
> (John, can you please confirm and ACK this?)

As a shameless plug, biosdevname is a userspace app I wrote to help
solve this so we don't need to patch the kernel for future systems.
It's not integrated into any distributions properly yet, but is
included in openSUSE 10.3 and Fedora 8 for people who want to download
and install it there.  It acts as a udev helper.

For the time being, patching the kernel is necessary.  I really hope
biosdevname eliminates that need in future distributions.

http://linux.dell.com/biosdevname/

Thanks,
Matt

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux


Re: [PATCH 0/4] allow drivers to flush in-flight DMA

2007-09-27 Thread akepner
On Wed, Sep 26, 2007 at 12:49:50AM -0600, Grant Grundler wrote:

[edited out several points that I think have already been
addressed by others in this thread.]

> 
> Defining it in terms of completion queues won't mean much to most folks.
> Better to add a description of completion queues to the DMA-API.txt if
> necessary.  dma_alloc_coherent() API is pretty well understood.

OK, next time I'll use a more generic description.

> 
> > There are four patches in this set:
> > 
> >   [1/4] dma: add dma_flags_set_dmaflush() to dma interface
> 
> Sorry - this feels like a "color of the shed" argument, but isn't
> this about DMA ordering attribute?
> "dmaflush" is an action and not an attribute to me.

Right - an attribute is a noun, not a verb. I'm going to try 
"s/dmaflush/dmabarrier/" in the next version.

> 
> This patch updates Documentation/DMA-mapping.txt. But it's a change to
> the generic (not PCI specific) API described in DMA-API.txt.
> Can you update that as well please?
>

Ja, I realized that soon after hitting the send button. I'll 
move the documentation to DMA-API.txt.

-- 
Arthur



Re: [PATCH 1/4] dma: add dma_flags_set_dmaflush() to dma interface

2007-09-27 Thread akepner
On Tue, Sep 25, 2007 at 10:13:33PM -0700, Randy Dunlap wrote:
> On Tue, 25 Sep 2007 17:00:57 -0700 [EMAIL PROTECTED] wrote:
> ..
> 1.  Function signature should be on one line if possible (and it is).
> Aw crud, I looked at dma-mapping.h and it uses this format sometimes.
> Well, it's undesirable, so please don't propagate it.
> 
> 2.  No parens on return: it's not a function.
> 
> static inline int dma_flags_set_dmaflush(int dir)
> {
>   return dir;
> }
> 
> 
> Similar comments for patch 2/4: sn-ia64.
> 

Both fixed in next version. Thanks, Randy.

-- 
Arthur



Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> Worked, but that just raises more questions.  Why didn't more x86 boxes
> break or, alternatively, why did a new version of the BIOS fix the problem? 
> I guess we shouldn't look a gift horse in the mouth. Or something.
> 

Why didn't more x86 boxes break... well, it's pretty natural for an
implementation of the BIOS to not clobber registers that aren't outputs.
Arguably the BIOSes that do are still buggy, since there isn't a
well-defined calling sequence for the BIOS and the convention that has
evolved is "don't clobber anything unless it's an output."

It's still wrong, however, especially since it means omitting the *real*
SMAP check.
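
To illustrate the point, here is a small userspace model of the E820 loop. The fake BIOS, the register struct and the function names are all made up for this sketch; only the calling convention it mimics (EAX=0xe820 and EDX="SMAP" on entry, EAX="SMAP" on return, EBX as the continuation value) comes from the interface under discussion. A BIOS that leaves EDX alone makes a check on EDX pass by accident, while one that clobbers EDX makes it fail even though the call succeeded.

```c
#include <stdint.h>

#define SMAP 0x534d4150u	/* ASCII "SMAP" */

struct regs_sketch {
	uint32_t eax, ebx, ecx, edx;
	int carry;
};

/* Hypothetical BIOS with nentries map entries. If clobber_edx is set
 * it trashes EDX on return, which is legal since EDX is not an output. */
static void fake_bios_e820(struct regs_sketch *r, int nentries,
			   int clobber_edx)
{
	if (r->eax != 0xe820 || r->edx != SMAP) {
		r->carry = 1;
		return;
	}
	r->eax = SMAP;		/* the signature comes back in EAX */
	r->ecx = 20;		/* size of one e820 entry */
	r->ebx = (r->ebx + 1 < (uint32_t)nentries) ? r->ebx + 1 : 0;
	if (clobber_edx)
		r->edx = 0;
	r->carry = 0;
}

/* check_eax == 0 models looking at EDX after the call; check_eax == 1
 * models the revised code, which looks at EAX. Returns the entry count. */
static int detect_memory_e820_model(int nentries, int clobber_edx,
				    int check_eax)
{
	struct regs_sketch r = { 0 };
	int count = 0;

	do {
		uint32_t id;

		r.eax = 0xe820;	/* reload the inputs on every iteration */
		r.edx = SMAP;
		r.ecx = 20;
		fake_bios_e820(&r, nentries, clobber_edx);

		id = check_eax ? r.eax : r.edx;
		if (id != SMAP) {
			count = 0;
			break;
		}
		if (r.carry)
			break;
		count++;
	} while (r.ebx != 0 && count < 128);

	return count;
}
```

With a well-behaved fake BIOS both variants see the full map; with the clobbering one, only the EAX check does.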

-hpa


Re: [RFC] QoS params patch update.

2007-09-27 Thread Paul Mundt
On Thu, Sep 27, 2007 at 01:17:39PM -0700, Mark Gross wrote:
> Updated qos PM parameter patch:
> Note: the replacing of latency.c with this is a separate patch.
> 
> this patch attempts to address the issues raised so far.
> 
[snip]

> +static int register_new_qos_misc(struct qos_object *qos)
> +{
> + int ret;
> +
> + qos->qos_power_miscdev.minor = MISC_DYNAMIC_MINOR;
> + qos->qos_power_miscdev.name = qos->name;
> + qos->qos_power_miscdev.fops = &qos_power_fops;
> +
> + ret = misc_register(&qos->qos_power_miscdev);
> +
> + return ret;
> +}
> +
Minor nit, ret is a pointless variable here, you can just return
misc_register directly.

Other than that, this looks much better!
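
For reference, the suggested simplification would look roughly like this. The types and misc_register() below are userspace stand-ins so the fragment compiles on its own; they are not the kernel definitions, and the stub's return values are invented.

```c
/* Userspace stand-ins for the kernel types, illustration only. */
#define MISC_DYNAMIC_MINOR 255

struct file_operations {
	int unused;
};

struct miscdevice {
	int minor;
	const char *name;
	const struct file_operations *fops;
};

struct qos_object {
	const char *name;
	struct miscdevice qos_power_miscdev;
};

static const struct file_operations qos_power_fops = { 0 };

/* Stub: the real misc_register() lives in drivers/char/misc.c. */
static int misc_register(struct miscdevice *misc)
{
	return misc->name ? 0 : -22;
}

static int register_new_qos_misc(struct qos_object *qos)
{
	qos->qos_power_miscdev.minor = MISC_DYNAMIC_MINOR;
	qos->qos_power_miscdev.name = qos->name;
	qos->qos_power_miscdev.fops = &qos_power_fops;

	/* No temporary needed: hand the result straight back. */
	return misc_register(&qos->qos_power_miscdev);
}
```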


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 16:36 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> >>>
> >> Oh bugger, looks like this one might be genuinely my fault after all.
> >> The ID check in the new code is buggy.
> >>
> >> Can you please test this revised patch out (against current -git)?
> > 
> > 
> > That looks the same as the previous patch you sent?
> > 
> 
> Sorry, this is the right one...
> 
>   -hpa

> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index bccaa1c..2f37568 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -28,11 +28,10 @@ static int detect_memory_e820(void)
>  
>   do {
>   size = sizeof(struct e820entry);
> - id = SMAP;
>   asm("int $0x15; setc %0"
> - : "=am" (err), "+b" (next), "+d" (id), "+c" (size),
> + : "=dm" (err), "+b" (next), "=a" (id), "+c" (size),
> "=m" (*desc)
> - : "D" (desc), "a" (0xe820));
> + : "D" (desc), "d" (SMAP), "a" (0xe820));
>  
>   /* Some BIOSes stop returning SMAP in the middle of
>  the search loop.  We don't know exactly how the BIOS

Worked, but that just raises more questions.  Why didn't more x86 boxes
break or, alternatively, why did a new version of the BIOS fix the problem? 
I guess we shouldn't look a gift horse in the mouth. Or something.

Thanks very much for your help.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: Stardom SATA HSM violation

2007-09-27 Thread Tejun Heo
Jeff Garzik wrote:
> Tejun Heo wrote:
>> Alan Cox wrote:
>>>> I think there have been enough cases where this draining was necessary.
>>>>  IIRC, ata_piix was involved in those cases, right?  If so, can you
>>>> please submit a patch which applies this only to affected controllers?
>>>> I don't feel too confident about applying this to all SFF controllers.
>>> Old IDE does it on all controllers bar a couple. So we have a very good
>>> knowledge of what does/doesn't work. The one that needs care in old ide
>>> is an ordering issue where a state machine reset done first causes the
>>> drain of the I/O to hang.
>>
>> Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
>> affected too?
> 
> I would think all SFF controllers, since a lot of first gen SATA are
> really bridged solutions.  If they are flagging DRQ, I say oblige them :)

Alright, then the posted patch should be good enough.  Mark, can you be
bothered to regenerate the patch and post it one more time (again)?  It
seems we all agree the update is needed.

Thanks a lot.

-- 
tejun


Re: [PATCH] UML - Correctly handle skb allocation failures

2007-09-27 Thread Andrew Morton
On Thu, 27 Sep 2007 13:01:26 -0400
Jeff Dike <[EMAIL PROTECTED]> wrote:

> +static int update_drop_skb(int max)
> +{
> + struct sk_buff *new;
> + int err = 0;
> +
> + spin_lock(&drop_lock);
> +
> + if (max <= drop_max)
> + goto out;
> +
> + err = -ENOMEM;
> + new = dev_alloc_skb(max);
> + if (new == NULL)
> + goto out;
> +
> + skb_put(new, max);
> +
> + kfree_skb(drop_skb);
> + drop_skb = new;
> + drop_max = max;
> + err = 0;
> +out:
> + spin_unlock(&drop_lock);
> +
> + return err;
> +}
> +
>  static int uml_net_rx(struct net_device *dev)
>  {
>   struct uml_net_private *lp = dev->priv;
> @@ -43,6 +82,9 @@ static int uml_net_rx(struct net_device 
>   /* If we can't allocate memory, try again next round. */
>   skb = dev_alloc_skb(lp->max_packet);
>   if (skb == NULL) {
> + drop_skb->dev = dev;
> + /* Read a packet into drop_skb and don't do anything with it. */
> + (*lp->read)(lp->fd, drop_skb, lp);
>   lp->stats.rx_dropped++;
>   return 0;

Still wanna know why it is safe for uml_net_rx to be playing with
drop_skb when update_drop_skb() could be concurrently reallocating
and freeing it.
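
One way to picture the concern, as a userspace sketch only: if the consumer takes the same lock as the updater, the buffer cannot be freed out from under it. The names below mirror the patch loosely, but the lock is a C11 atomic_flag spinlock and the buffer is a plain malloc() block, so none of this is the UML code itself, and it is not necessarily the fix that should go in.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

static atomic_flag drop_lock = ATOMIC_FLAG_INIT;
static char *drop_buf;		/* stands in for drop_skb */
static int drop_max;

static void sketch_lock(void)
{
	while (atomic_flag_test_and_set(&drop_lock))
		;	/* spin until the flag is released */
}

static void sketch_unlock(void)
{
	atomic_flag_clear(&drop_lock);
}

/* Grow the drop buffer if max exceeds the current size. */
static int update_drop_buf(int max)
{
	char *fresh;
	int err = 0;

	sketch_lock();
	if (max <= drop_max)
		goto out;

	err = -1;
	fresh = malloc(max);
	if (fresh == NULL)
		goto out;

	free(drop_buf);
	drop_buf = fresh;
	drop_max = max;
	err = 0;
out:
	sketch_unlock();
	return err;
}

/* Consumer side: touch the buffer only while holding the lock, so a
 * concurrent update_drop_buf() cannot free it in the middle. */
static int drop_one_packet(int len)
{
	int n;

	sketch_lock();
	n = len < drop_max ? len : drop_max;
	if (drop_buf != NULL)
		memset(drop_buf, 0, n);
	sketch_unlock();
	return n;
}
```

In the kernel the receive path runs in interrupt context, so the real answer also has to say which lock variant is safe there; the sketch ignores that entirely.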


Re: [PATCH] Some IO scheduler cleanup in Documentation/block

2007-09-27 Thread Jens Axboe
On Thu, Sep 27 2007, Alan D. Brunelle wrote:
> 

> [PATCH] Some IO scheduler cleanup in Documentation/block

[snip]

Thanks Alan, applied.

-- 
Jens Axboe



Re: Stardom SATA HSM violation

2007-09-27 Thread Jeff Garzik

Tejun Heo wrote:
> Alan Cox wrote:
>>> I think there have been enough cases where this draining was necessary.
>>>  IIRC, ata_piix was involved in those cases, right?  If so, can you
>>> please submit a patch which applies this only to affected controllers?
>>> I don't feel too confident about applying this to all SFF controllers.
>>
>> Old IDE does it on all controllers bar a couple. So we have a very good
>> knowledge of what does/doesn't work. The one that needs care in old ide
>> is an ordering issue where a state machine reset done first causes the
>> drain of the I/O to hang.
>
> Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
> affected too?

I would think all SFF controllers, since a lot of first gen SATA are
really bridged solutions.  If they are flagging DRQ, I say oblige them :)


Jeff





Re: [PATCH -mm] Hook up group scheduler with control groups

2007-09-27 Thread Andrew Morton
On Fri, 28 Sep 2007 01:05:12 +0530
Dhaval Giani <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 27, 2007 at 12:00:33PM -0700, Randy Dunlap wrote:
> > On Thu, 27 Sep 2007 23:34:15 +0530 Dhaval Giani wrote:
> > > 
> > > 
> > > +config RESOURCE_COUNTERS
> > > + bool "Resource counters"
> > > + help
> > > +   This option enables controller independent resource accounting
> > 
> > Above line is tab + 2 spaces (i.e., correct).
> > 
> > > +  infrastructure that works with cgroups.
> > 
> > Above line indent is 10 spaces (i.e., not correct).
> > 
> 
> Ah! Thanks for the explanation. Corrected patch follows.
> 
> Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]>
> Signed-off-by : Dhaval Giani <[EMAIL PROTECTED]>
> 
> ...
>

> @@ -219,6 +225,9 @@ static inline struct task_grp *task_grp(
>  
>  #ifdef CONFIG_FAIR_USER_SCHED
>   tg = p->user->tg;
> +#elif CONFIG_FAIR_CGROUP_SCHED
> + tg = container_of(task_subsys_state(p, cpu_cgroup_subsys_id),
> + struct task_grp, css);
>  #else
>   tg  = &init_task_grp;
>  #endif

that's a bit funny-looking.  Are CONFIG_FAIR_CGROUP_SCHED and
CONFIG_FAIR_USER_SCHED mutually exclusive?  Doesn't seem that way.  if
they're both defined then CONFIG_FAIR_USER_SCHED "wins".

Anyway, please confirm that this is correct?

I'll switch that to `#elif defined(CONFIG_FAIR_CGROUP_SCHED)'.  We can get
gcc warnings with `#if CONFIG_FOO', and people should be using `#ifdef
CONFIG_FOO', so I assume the same applies to #elif.
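
A small standalone illustration of the difference, using the two option names from the patch. The string values and the helper function are invented for the example. With `#elif CONFIG_FOO` an undefined CONFIG_FOO silently evaluates to 0 (and gcc warns about it under -Wundef), whereas `defined()` asks the question that was actually meant.

```c
/* Pretend Kconfig enabled the cgroup variant only. */
#define CONFIG_FAIR_CGROUP_SCHED 1
/* CONFIG_FAIR_USER_SCHED is deliberately left undefined. */

#if defined(CONFIG_FAIR_USER_SCHED)
#define SCHED_GROUP_SOURCE "user"
#elif defined(CONFIG_FAIR_CGROUP_SCHED)
#define SCHED_GROUP_SOURCE "cgroup"
#else
#define SCHED_GROUP_SOURCE "init"
#endif

/* Reports which branch the preprocessor selected. */
static const char *sched_group_source(void)
{
	return SCHED_GROUP_SOURCE;
}
```

Note that the chain is still order-dependent: if both options were defined, the first branch would be taken.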




Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Jens Axboe
On Thu, Sep 27 2007, Theodore Tso wrote:
> On Thu, Sep 27, 2007 at 04:19:12PM +0100, Alan Cox wrote:
> > > Well it's not my call, just seems like a really bad idea to change the
> > > error value. You can't claim full coverage for such testing anyway, it's
> > > one of those things that people will complain about two releases later
> > > saying it broke app foo.
> > 
> > Strange since we've spent years changing error values and getting them
> > right in the past. 
> 
> I doubt there are any apps which are going to specifically check for EFBIG
> and do something different if they get EOVERFLOW instead.  If it was
> something like EAGAIN or EPERM, I'd be more concerned, but EFBIG
> vs. EOVERFLOW?  C'mon!

It's not checking EFBIG vs EOVERFLOW, it's checking one and not the
other. But I digress, not trying to NAK the patch, just voicing my
opinion on the matter. It's not something you can easily test and claim
good app coverage, though.

-- 
Jens Axboe



Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
>>>
>> Oh bugger, looks like this one might be genuinely my fault after all.
>> The ID check in the new code is buggy.
>>
>> Can you please test this revised patch out (against current -git)?
> 
> 
> That looks the same as the previous patch you sent?
> 

Sorry, this is the right one...

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..2f37568 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -28,11 +28,10 @@ static int detect_memory_e820(void)
 
 	do {
 		size = sizeof(struct e820entry);
-		id = SMAP;
 		asm("int $0x15; setc %0"
-		: "=am" (err), "+b" (next), "+d" (id), "+c" (size),
+		: "=dm" (err), "+b" (next), "=a" (id), "+c" (size),
 		  "=m" (*desc)
-		: "D" (desc), "a" (0xe820));
+		: "D" (desc), "d" (SMAP), "a" (0xe820));
 
 		/* Some BIOSes stop returning SMAP in the middle of
 		   the search loop.  We don't know exactly how the BIOS


Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas
On Thu, Sep 27, 2007 at 04:10:31PM -0600, Matthew Wilcox wrote:
> In the error handler, we wait_for_completion(io_reset_wait).
> In sym2_io_error_detected, we init_completion(io_reset_wait).
> Isn't it possible that we hit the error handler before we hit the
> io_error_detected path, and thus the completion wait is lost?
> Since the completion is already initialised in sym_attach(), I don't
> think we need to initialise it in sym2_io_error_detected().
> Makes sense to just delete it?

Good catch. But no ... and I had to study this a bit. Bear with me:

It is enough to call init_completion() once, and not once per use:
it initializes spinlocks, which shouldn't be initialized twice.

But, that completion might be used multiple times when there are
multiple errors, and so, before using it a second time, one must 
set completion->done = 0.  The INIT_COMPLETION() macro does this. 

One must have completion->done = 0 before every use, as otherwise, 
wait_for_completion() won't actually wait. And since complete_all()
sets x->done += UINT_MAX/2, I'm pretty sure x->done won't be zero
the next time we use it, unless we make it so.

So I need to find a place to safely call INIT_COMPLETION() again, 
after the completion has been used. At the moment, I'm stumped
as to where to do this. 

 [think ... think ... think] 

I think the race you describe above is harmless. The first time
that sym_eh_handler() will run, it will be with SYM_EH_ABORT,
and it doesn't matter if we lose that, since the device is hosed
anyway. At some later time, it will run with SYM_EH_DEVICE_RESET
and then SYM_EH_BUS_RESET and then SYM_EH_HOST_RESET, and we won't 
miss those, since, by now, sym2_io_error_detected() will have run.

So, by my reading, I'd say that init_completion() in
sym2_io_error_detected() has to stay (although perhaps
it should be replaced by the INIT_COMPLETION() macro.)
Removing it will prevent correct operation on the second 
and subsequent errors.
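
The counter behaviour described above can be modelled in a few lines of userspace C. This is only a model: the struct and function names are invented, and there is no wait queue or locking, just the "done" arithmetic that decides whether a waiter would sleep.

```c
#include <limits.h>
#include <stdbool.h>

/* Model of the "done" counter inside a completion; no wait queue. */
struct completion_model {
	unsigned int done;
};

static void model_init(struct completion_model *x)
{
	x->done = 0;
}

/* complete_all() bumps done by a huge amount so every waiter passes. */
static void model_complete_all(struct completion_model *x)
{
	x->done += UINT_MAX / 2;
}

/* A waiter sleeps only while done is zero. */
static bool model_wait_would_block(const struct completion_model *x)
{
	return x->done == 0;
}

/* A waiter that gets through consumes one count. */
static void model_wait_consume(struct completion_model *x)
{
	if (x->done)
		x->done--;
}

/* Stand-in for INIT_COMPLETION(): reset the counter, nothing else. */
#define MODEL_REINIT(x) ((x).done = 0)
```

After one complete_all(), a single waiter consuming one count leaves done far from zero, so a later wait would fall straight through unless the counter is reset first.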

--Linas



Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 16:27 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
> >> Jordan Crouse wrote:
> >>> Breaks on the Geode - original behavior.
> >>>
> >>> I think that having boot_params.e820_entries != 0 makes the kernel
> >>> assume the e820 data is correct.
> >>>
> >> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
> >> because this, to the best of my reading, mimics the 2.6.22 behavior
> >> exactly.  DID IT REALLY, and/or did you make any kind of configuration
> >> changes?
> > 
> > I copied in a 2.6.22 kernel to see that it really did work, and it did.
> > But here's the crazy part - I did a dmesg, and it looks like it
> > *is* using e820 data, and it looks complete (I see the entire map - 
> > including the ACPI and reserved blocks way up high).
> > 
> > So apparently it was the 2.6.22 code that was buggy, but reading it,
> > I don't immediately see how. 
> > 
> 
> Oh bugger, looks like this one might be genuinely my fault after all.
> The ID check in the new code is buggy.
> 
> Can you please test this revised patch out (against current -git)?

>   -hpa
> 
> 
> 

> diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
> index bccaa1c..84939b7 100644
> --- a/arch/i386/boot/memory.c
> +++ b/arch/i386/boot/memory.c
> @@ -34,17 +34,7 @@ static int detect_memory_e820(void)
> "=m" (*desc)
>   : "D" (desc), "a" (0xe820));
>  
> - /* Some BIOSes stop returning SMAP in the middle of
> -the search loop.  We don't know exactly how the BIOS
> -screwed up the map at that point, we might have a
> -partial map, the full map, or complete garbage, so
> -just return failure. */
> - if (id != SMAP) {
> - count = 0;
> - break;
> - }
> -
> - if (err)
> + if (id != SMAP || err)
>   break;
>  
>   count++;


That looks the same as the previous patch you sent?

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: Stardom SATA HSM violation

2007-09-27 Thread Tejun Heo
Alan Cox wrote:
>> I think there have been enough cases where this draining was necessary.
>>  IIRC, ata_piix was involved in those cases, right?  If so, can you
>> please submit a patch which applies this only to affected controllers?
>> I don't feel too confident about applying this to all SFF controllers.
> 
> Old IDE does it on all controllers bar a couple. So we have a very good
> knowledge of what does/doesn't work. The one that needs care in old ide
> is an ordering issue where a state machine reset done first causes the
> drain of the I/O to hang.

Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
affected too?

-- 
tejun


Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Matthew Wilcox
On Thu, Sep 27, 2007 at 07:19:27PM -0400, Theodore Tso wrote:
> Would you accept a patch which causes the deprecated sysfs
> files/directories to disappear, even if CONFIG_SYSFS_DEPRECATED is
> defined, via a boot-time parameter?

How about a mount option?  That way people can test without a reboot:

mount -o remount,deprecated={yes,no} /sys

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
>> Jordan Crouse wrote:
>>> Breaks on the Geode - original behavior.
>>>
>>> I think that having boot_params.e820_entries != 0 makes the kernel
>>> assume the e820 data is correct.
>>>
>> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
>> because this, to the best of my reading, mimics the 2.6.22 behavior
>> exactly.  DID IT REALLY, and/or did you make any kind of configuration
>> changes?
> 
> I copied in a 2.6.22 kernel to see that it really did work, and it did.
> But here's the crazy part - I did a dmesg, and it looks like it
> *is* using e820 data, and it looks complete (I see the entire map - 
> including the ACPI and reserved blocks way up high).
> 
> So apparently it was the 2.6.22 code that was buggy, but reading it,
> I don't immediately see how. 
> 

Oh bugger, looks like this one might be genuinely my fault after all.
The ID check in the new code is buggy.

Can you please test this revised patch out (against current -git)?

-hpa



diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
  "=m" (*desc)
: "D" (desc), "a" (0xe820));
 
-   /* Some BIOSes stop returning SMAP in the middle of
-  the search loop.  We don't know exactly how the BIOS
-  screwed up the map at that point, we might have a
-  partial map, the full map, or complete garbage, so
-  just return failure. */
-   if (id != SMAP) {
-   count = 0;
-   break;
-   }
-
-   if (err)
+   if (id != SMAP || err)
break;
 
count++;


Re: sata_sil24 broken since 2.6.23-rc4-mm1

2007-09-27 Thread Tejun Heo
Torsten Kaiser wrote:
> Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1.
> I will try to look at the diff between these revisions some more, but
> the change in sata_sil24.c looked like a perfect match for the
> symptoms I was seeing.

I think the first thing to do here is to verify 2.6.23-rc3-mm1 still
works fine and my previous debug patch is pretty much meaningless if
address initialization failure isn't the cause.

Thanks.

-- 
tejun



Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Greg KH
On Thu, Sep 27, 2007 at 06:27:48PM -0400, Kyle Moffett wrote:
> On Sep 27, 2007, at 17:34:45, Greg KH wrote:
>> On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote:
>>> The fact that sysfs is all laid out in a directory, but for which some
>>> directories/symlinks are OK to use, and some are NOT OK to use --- is why 
>>> I call the sysfs interface "an open pit".
>>
>> And because of the original design mistakes, we have only been able to 
>> change things for the better in a slow manner.  We have had userspace 
>> programs fixed up for _years_ before we are able to make the corresponding 
>> changes in the kernel, so as to not break the distros that are slow to 
>> upgrade packages and kernels (like Debian.)
>
> Hey!  No poking fingers at Debian here; it's been *MUCH* improved lately.  

Heh, sorry, but Debian in the past had a lot of problems in this area.
It's good to know that this is no longer an issue :)

thanks,

greg k-h


Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> I copied in a 2.6.22 kernel to see that it really did work, and it did.
> But here's the crazy part - I did a dmesg, and it looks like it
> *is* using e820 data, and it looks complete (I see the entire map - 
> including the ACPI and reserved blocks way up high).
> 
> So apparently it was the 2.6.22 code that was buggy, but reading it,
> I don't immediately see how. 
> 

Was this a stock 2.6.22 kernel, or might it have been modified?

There is, of course, also the possibility that triggering the BIOS bug
in your case depends on some delicate combination of input state.

-hpa


Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Theodore Tso
On Thu, Sep 27, 2007 at 02:34:45PM -0700, Greg KH wrote:
> Ok, how then should I advertise this better?  What can we do better to
> help userspace programmers out in this regard?

Would you accept a patch which causes the deprecated sysfs
files/directories to disappear, even if CONFIG_SYSFS_DEPRECATED is
defined, via a boot-time parameter?  Many people and distros are
likely to keep CONFIG_SYSFS_DEPRECATED defined just out of paranoia that
things might break.  Doing a quick google, I note that Fedora has been
going back and forth of turning it off, watching things break, and
then turning it back on.  The latest time, the changelog said:

* Fri Jan 26 23:00:00 2007 Bill Nottingham 

- turn on CONFIG_SYSFS_DEPRECATED so that things actually work. *sigh*

(and I've checked, Fedora's CVS still has CONFIG_SYSFS_DEPRECATED
defined; it's not just Debian at fault here.)

So having a boot-time parameter would make it much easier for
application programmers (who run distro kernels and who are unlikely
to want to compile their own custom kernel) to test to see what breaks
without CONFIG_SYSFS_DEPRECATED.

- Ted


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 15:47 -0700, H. Peter Anvin wrote:
> Jordan Crouse wrote:
> > 
> > Breaks on the Geode - original behavior.
> > 
> > I think that having boot_params.e820_entries != 0 makes the kernel
> > assume the e820 data is correct.
> > 
> 
> Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
> because this, to the best of my reading, mimics the 2.6.22 behavior
> exactly.  DID IT REALLY, and/or did you make any kind of configuration
> changes?

I copied in a 2.6.22 kernel to see that it really did work, and it did.
But here's the crazy part - I did a dmesg, and it looks like it
*is* using e820 data, and it looks complete (I see the entire map - 
including the ACPI and reserved blocks way up high).

So apparently it was the 2.6.22 code that was buggy, but reading it,
I don't immediately see how. 

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: [PATCH] kswapd should only wait on IO if there is IO

2007-09-27 Thread Rik van Riel
On Thu, 27 Sep 2007 15:59:07 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> And lost the changelog ;)

Good point.

The current kswapd (and try_to_free_pages) code has an oddity where the
code will wait on IO, even if there is no IO in flight.  This problem is
notable especially when the system scans through many unfreeable pages,
causing unnecessary stalls in the VM.

Additionally, tasks without __GFP_FS or __GFP_IO in the direct reclaim
path will sleep if a significant number of pages are encountered that
should be written out.  This gives kswapd a chance to write out those
pages, while the direct reclaim task sleeps.

Signed-off-by: Rik van Riel <[EMAIL PROTECTED]>
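
As a reading aid, the new sleep condition boils down to the predicate below. The function name is made up and the fields are passed as plain arguments; DEF_PRIORITY is given the value 12 it has in the kernel headers. It is a restatement of the two modified if-statements, not additional code for the patch.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12

/* Nap only if pages were scanned, reclaim has already dropped a few
 * priority levels, and more pages with pending (or needed) IO were
 * seen than one reclaim batch (swap_cluster_max). */
static bool should_wait_for_io(unsigned long nr_scanned, int priority,
			       int nr_io_pages, int swap_cluster_max)
{
	return nr_scanned && priority < DEF_PRIORITY - 2 &&
	       nr_io_pages > swap_cluster_max;
}
```

Before the change the third term was absent, so a scan over many unfreeable pages with no IO in flight would still call congestion_wait().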

diff -up linux-2.6.22/mm/vmscan.c.wait linux-2.6.22/mm/vmscan.c
--- linux-2.6.22/mm/vmscan.c.wait   2007-09-27 18:45:57.0 -0400
+++ linux-2.6.22/mm/vmscan.c2007-09-27 18:48:43.0 -0400
@@ -68,6 +68,13 @@ struct scan_control {
int all_unreclaimable;
 
int order;
+
+   /*
+* Pages that have (or should have) IO pending.  If we run into
+* a lot of these, we're better off waiting a little for IO to
+* finish rather than scanning more pages in the VM.
+*/
+   int nr_io_pages;
 };
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -489,8 +496,10 @@ static unsigned long shrink_page_list(st
 */
if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
wait_on_page_writeback(page);
-   else
+   else {
+   sc->nr_io_pages++;
goto keep_locked;
+   }
}
 
referenced = page_referenced(page, 1);
@@ -529,8 +538,10 @@ static unsigned long shrink_page_list(st
if (PageDirty(page)) {
if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
goto keep_locked;
-   if (!may_enter_fs)
+   if (!may_enter_fs) {
+   sc->nr_io_pages++;
goto keep_locked;
+   }
if (!sc->may_writepage)
goto keep_locked;
 
@@ -541,8 +552,10 @@ static unsigned long shrink_page_list(st
case PAGE_ACTIVATE:
goto activate_locked;
case PAGE_SUCCESS:
-   if (PageWriteback(page) || PageDirty(page))
+   if (PageWriteback(page) || PageDirty(page)) {
+   sc->nr_io_pages++;
goto keep;
+   }
/*
 * A synchronous write - probably a ramdisk.  Go
 * ahead and try to reclaim the page.
@@ -1201,6 +1214,7 @@ unsigned long try_to_free_pages(struct z
 
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
sc.nr_scanned = 0;
+   sc.nr_io_pages = 0;
if (!priority)
disable_swap_token();
nr_reclaimed += shrink_zones(priority, zones, &sc);
@@ -1229,7 +1243,8 @@ unsigned long try_to_free_pages(struct z
}
 
/* Take a nap, wait for some writeback to complete */
-   if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
+   if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
+   sc.nr_io_pages > sc.swap_cluster_max)
congestion_wait(WRITE, HZ/10);
}
/* top priority shrink_caches still had more to do? don't OOM, then */
@@ -1315,6 +1330,7 @@ loop_again:
if (!priority)
disable_swap_token();
 
+   sc.nr_io_pages = 0;
all_zones_ok = 1;
 
/*
@@ -1398,7 +1414,8 @@ loop_again:
 * OK, kswapd is getting into trouble.  Take a nap, then take
 * another pass across the zones.
 */
-   if (total_scanned && priority < DEF_PRIORITY - 2)
+   if (total_scanned && priority < DEF_PRIORITY - 2 &&
+   sc.nr_io_pages > sc.swap_cluster_max)
congestion_wait(WRITE, HZ/10);
 
/*


Re: Problems with SMP & ACPI powering off

2007-09-27 Thread Mark Lord

Rafael J. Wysocki wrote:
> On Thursday, 27 September 2007 23:29, Mark Lord wrote:
>> Question:  do we disable all CPUs except 0 when doing ACPI power off?
>
> No, but we should.
>
>> Background:
>> I have a machine here dedicated to running MythTV.
>> It powers up to record, and then sets the RTC alarm for next time
>> and powers down again in between recordings.
>>
>> It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard.
>> Previously it was on a completely different (vendor,bios,...) ICH7 motherboard.
>>
>> In both cases, "halt -p" sometimes fails to actually turn off the power,
>> which means that it later then fails to "turn on" to record again.
>>
>> Annoying.
>>
>> This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support
>> enabled.
>>
>> So I'm wondering if it may be due to the old SMP-poweroff bogeyman ?
>
> May be.
>
> Which kernel?

Latest 2.6.23-rc-git.  Same problem from time to time on 2.6.17, as well.
Dunno about in between those Revs., but it's much more common on the latest
than it was on the old kernel.

>> For now, I've hardcoded a cpu_down(1) into the poweroff code,
>> and we'll see if that helps or is merely redundant.
>>
>> But I do wonder where else to look for a cause?
>>
>> Two different boards, vendors, BIOSs, same CPU chip.  Same problem.
>
> Same chipset, perchance?

Mmmm I originally didn't think so.

But actually one board is ICH8, the other ICH8R,
so yes, they use the same chipset.

Cheers


Re: [Announce] Linux-tiny project revival

2007-09-27 Thread Rob Landley
On Thursday 27 September 2007 2:00:36 am Arnd Bergmann wrote:
> #define KERN_NOTICE "<5>",
>
> #define PRINTK_CONTINUED "",
>
>  #define printk(level, str, ...) \
>do { \
>  if (sizeof(level) == 1) /* continued printk */\
>   actual_printk(str, __VA_ARGS__); \
>  else if ((level[1] - '0') < CONFIG_PRINTK_DOICARE) \
>actual_printk(level str, __VA_ARGS__); \
>} while(0);
>
> Then you don't have to change every single printk in the kernel, but
> only those that don't currently come with a log level. More importantly,
> you can do the conversion without a flag day, by spreading (an empty)
> PRINTK_CONTINUED in places that do need a printk without a log level.

The "change every printk in the kernel" suggestion came from me trying to 
figure out how to get the printk() calls below a certain log level to 
optimize out and not take up space in the binary.

The above doesn't address the original cause of the thread, as far as I can 
tell.

>   Arnd <><

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.


Re: [Bluez-devel] Warnings and Bug on 2.6.23-rc6 closing rfcomm links (device_move() API ?)

2007-09-27 Thread Marcel Holtmann
Hi Cornelia,

> > >> Yet another report, once again while putting rfcomm system under load. 
> > >> Several USB adapters, several links.
> > > 
> > > Is this a regression or does it happen with 2.6.22 too?
> > 
> > I've not tested with 2.6.22, but have done it a few days ago with 
> > 2.6.21-2-486 (stock debian package), and got the 2 Oops below. Maybe 
> > that's a different problem, or maybe not?
> > 
> > 
> > kobject_add failed for rfcomm1 with -EEXIST, don't try to register 
> > things with the same name in the same directory.
> 
> There's something wrong with rfcomm trying to create objects with
> duplicate names...

that should have been fixed.

Regards

Marcel




Re: [PATCH] kswapd should only wait on IO if there is IO

2007-09-27 Thread Andrew Morton
On Thu, 27 Sep 2007 18:50:27 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:

> On Thu, 27 Sep 2007 15:21:21 -0700
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > > Nope, sc.nr_io_pages will also be incremented when the code runs into
> > > pages that are already PageWriteback.
> > 
> > yup, I didn't think of that.  Hopefully someone else will be in there
> > working on that zone too.  If this caller yields and defers to kswapd
> > then that's very likely.  Except we just took away the ability to do that..
> 
> if (PageDirty(page)) {
> if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && 
> referenced)
> goto keep_locked;
> if (!may_enter_fs)
> goto keep_locked;
> 
> I think we can fix that problem by adding a sc->nr_io_pages++
> between the last if and the goto keep_locked in shrink_page_list.
> 
> That way !GFP_IO or !GFP_FS tasks will cause themselves to sleep
> if there are pages that need to be written out, even if those
> pages are not in flight to disk yet.

yeah, that's prudent I guess.

> I have also added the comment you wanted.

And lost the changelog ;)

> - if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
> + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
> + sc.nr_io_pages > sc.swap_cluster_max)

I do think this design decision needs a bit of explanation too.

>   congestion_wait(WRITE, HZ/10);
>   }
>   /* top priority shrink_caches still had more to do? don't OOM, then */
> @@ -1315,6 +1330,7 @@ loop_again:
>   if (!priority)
>   disable_swap_token();
>  
> + sc.nr_io_pages = 0;
>   all_zones_ok = 1;
>  
>   /*
> @@ -1398,7 +1414,8 @@ loop_again:
>* OK, kswapd is getting into trouble.  Take a nap, then take
>* another pass across the zones.
>*/
> - if (total_scanned && priority < DEF_PRIORITY - 2)
> + if (total_scanned && priority < DEF_PRIORITY - 2 &&

As did that one.  Ho hum :(  Maybe it's in the git history somewhere.



Re: [PATCH] kswapd should only wait on IO if there is IO

2007-09-27 Thread Rik van Riel
On Thu, 27 Sep 2007 15:21:21 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Nope, sc.nr_io_pages will also be incremented when the code runs into
> > pages that are already PageWriteback.
> 
> yup, I didn't think of that.  Hopefully someone else will be in there
> working on that zone too.  If this caller yields and defers to kswapd
> then that's very likely.  Except we just took away the ability to do that..

if (PageDirty(page)) {
if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
goto keep_locked;
if (!may_enter_fs)
goto keep_locked;

I think we can fix that problem by adding a sc->nr_io_pages++
between the last if and the goto keep_locked in shrink_page_list.

That way !GFP_IO or !GFP_FS tasks will cause themselves to sleep
if there are pages that need to be written out, even if those
pages are not in flight to disk yet.
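
The accounting described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: `enum page_state`, `struct scan_stats`, `scan_pages()` and the two `should_nap_*()` helpers are hypothetical stand-ins, and the synchronous-writeback path is ignored. Only `nr_io_pages`, `swap_cluster_max`, `DEF_PRIORITY` and the two nap conditions are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12 /* assumed to match the kernel's definition */

/* Hypothetical, simplified page states for illustration only. */
enum page_state {
    PAGE_CLEAN,           /* can be reclaimed right away */
    PAGE_UNDER_WRITEBACK, /* IO is already in flight */
    PAGE_DIRTY,           /* needs IO that has not been started yet */
};

/* Hypothetical subset of struct scan_control. */
struct scan_stats {
    int nr_scanned;
    int nr_io_pages;
    int swap_cluster_max;
};

/*
 * Count pages that have, or should have, IO pending.  A dirty page is
 * only counted when the caller may not start the IO itself.
 */
static void scan_pages(struct scan_stats *sc, const enum page_state *pages,
                       size_t n, bool may_enter_fs)
{
    for (size_t i = 0; i < n; i++) {
        sc->nr_scanned++;
        if (pages[i] == PAGE_UNDER_WRITEBACK)
            sc->nr_io_pages++;
        else if (pages[i] == PAGE_DIRTY && !may_enter_fs)
            sc->nr_io_pages++;
    }
}

/* Nap condition before the patch: any scanning at low priority. */
static bool should_nap_old(const struct scan_stats *sc, int priority)
{
    return sc->nr_scanned && priority < DEF_PRIORITY - 2;
}

/* Nap condition with the patch: also require enough pages waiting on IO. */
static bool should_nap_new(const struct scan_stats *sc, int priority)
{
    return sc->nr_scanned && priority < DEF_PRIORITY - 2 &&
           sc->nr_io_pages > sc->swap_cluster_max;
}
```

With 40 dirty pages and `swap_cluster_max` of 32, a caller that may not enter the filesystem ends up with `nr_io_pages` of 40 and naps; scanning 40 clean pages leaves `nr_io_pages` at 0, so only the old condition would nap.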

I have also added the comment you wanted.

Signed-off-by: Rik van Riel <[EMAIL PROTECTED]>

diff -up linux-2.6.23-rc7/mm/vmscan.c.wait linux-2.6.22/mm/vmscan.c
--- linux-2.6.23-rc7/mm/vmscan.c.wait   2007-09-27 18:45:57.0 -0400
+++ linux-2.6.23-rc7/mm/vmscan.c   2007-09-27 18:48:43.0 -0400
@@ -68,6 +68,13 @@ struct scan_control {
int all_unreclaimable;
 
int order;
+
+   /*
+* Pages that have (or should have) IO pending.  If we run into
+* a lot of these, we're better off waiting a little for IO to
+* finish rather than scanning more pages in the VM.
+*/
+   int nr_io_pages;
 };
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -489,8 +496,10 @@ static unsigned long shrink_page_list(st
 */
if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
wait_on_page_writeback(page);
-   else
+   else {
+   sc->nr_io_pages++;
goto keep_locked;
+   }
}
 
referenced = page_referenced(page, 1);
@@ -529,8 +538,10 @@ static unsigned long shrink_page_list(st
if (PageDirty(page)) {
if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced)
goto keep_locked;
-   if (!may_enter_fs)
+   if (!may_enter_fs) {
+   sc->nr_io_pages++;
goto keep_locked;
+   }
if (!sc->may_writepage)
goto keep_locked;
 
@@ -541,8 +552,10 @@ static unsigned long shrink_page_list(st
case PAGE_ACTIVATE:
goto activate_locked;
case PAGE_SUCCESS:
-   if (PageWriteback(page) || PageDirty(page))
+   if (PageWriteback(page) || PageDirty(page)) {
+   sc->nr_io_pages++;
goto keep;
+   }
/*
 * A synchronous write - probably a ramdisk.  Go
 * ahead and try to reclaim the page.
@@ -1201,6 +1214,7 @@ unsigned long try_to_free_pages(struct z
 
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
sc.nr_scanned = 0;
+   sc.nr_io_pages = 0;
if (!priority)
disable_swap_token();
nr_reclaimed += shrink_zones(priority, zones, &sc);
@@ -1229,7 +1243,8 @@ unsigned long try_to_free_pages(struct z
}
 
/* Take a nap, wait for some writeback to complete */
-   if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
+   if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
+   sc.nr_io_pages > sc.swap_cluster_max)
congestion_wait(WRITE, HZ/10);
}
/* top priority shrink_caches still had more to do? don't OOM, then */
@@ -1315,6 +1330,7 @@ loop_again:
if (!priority)
disable_swap_token();
 
+   sc.nr_io_pages = 0;
all_zones_ok = 1;
 
/*
@@ -1398,7 +1414,8 @@ loop_again:
 * OK, kswapd is getting into trouble.  Take a nap, then take
 * another pass across the zones.
 */
-   if (total_scanned && priority < DEF_PRIORITY - 2)
+   if (total_scanned && priority < DEF_PRIORITY - 2 &&
+   sc.nr_io_pages > sc.swap_cluster_max)
congestion_wait(WRITE, HZ/10);
 
/*

[PATCH] removes array_size duplicates

2007-09-27 Thread roel
This patch removes some ARRAY_SIZE macro duplicates. There is also one in
arch/um/include/user.h, which isn't fixed here because comments in that file
explicitly state a preference for the 'less fancy' version. If that's the
case as well for any of the other replacements please comment.
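
For readers unfamiliar with the macro, here is a small standalone sketch of what the replacements amount to. It uses the plain form that the patch deletes from several files; as far as I know the shared kernel definition adds a compile-time check that the argument really is an array, which is left out here. `clock_constant_for()` is a hypothetical helper that mirrors the rounding division in the amisound.c hunk, with the clock passed in as a parameter instead of read from `amiga_colorclock`.

```c
#include <stddef.h>

/* Plain element-count macro; only valid for true arrays, not pointers. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const signed char sine_data[] = {
    0,  39,  75,  103,  121,  127,  121,  103,  75,  39,
    0, -39, -75, -103, -121, -127, -121, -103, -75, -39
};

/*
 * Divide the clock by the table length, rounding to nearest by adding
 * half the divisor first (hypothetical helper, for illustration only).
 */
static unsigned long clock_constant_for(unsigned long colorclock)
{
    return (colorclock + ARRAY_SIZE(sine_data) / 2) / ARRAY_SIZE(sine_data);
}
```

Because the size is derived from the array itself, adding or removing table entries needs no second edit, which is the point of dropping the per-file `DATA_SIZE` style defines.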

Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
---

 Documentation/spi/spidev_test.c|2 --
 arch/i386/boot/compressed/relocs.c |1 -
 arch/m68k/amiga/amisound.c |3 +--
 arch/powerpc/boot/types.h  |2 --
 arch/sparc64/kernel/pci.c  |6 ++
 drivers/acpi/utilities/uteval.c|4 ++--
 drivers/net/irda/actisys-sir.c |6 ++
 drivers/net/lp486e.c   |4 +---
 drivers/net/sk98lin/skgemib.c  |5 -
 drivers/net/skfp/smt.c |4 +---
 drivers/net/skfp/srf.c |   18 +++---
 drivers/net/wireless/ipw2100.c |   13 -
 drivers/serial/68328serial.c   |6 ++
 drivers/video/sgivwfb.c|4 ++--
 include/acpi/acmacros.h|2 --
 include/linux/netfilter/xt_sctp.h  |   12 +---
 include/net/ip_vs.h|1 -
 include/video/sgivw.h  |1 -
 net/ipv4/ipvs/ip_vs_proto_tcp.c|2 +-
 scripts/mod/file2alias.c   |2 --
 20 files changed, 30 insertions(+), 68 deletions(-)

diff --git a/Documentation/spi/spidev_test.c b/Documentation/spi/spidev_test.c
index 218e862..0f23aac 100644
--- a/Documentation/spi/spidev_test.c
+++ b/Documentation/spi/spidev_test.c
@@ -21,8 +21,6 @@
 #include 
 #include 
 
-#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
-
 static void pabort(const char *s)
 {
perror(s);
diff --git a/arch/i386/boot/compressed/relocs.c b/arch/i386/boot/compressed/relocs.c
index 2d77ee7..5d8dbff 100644
--- a/arch/i386/boot/compressed/relocs.c
+++ b/arch/i386/boot/compressed/relocs.c
@@ -11,7 +11,6 @@
 #include 
 
 #define MAX_SHDRS 100
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 static Elf32_Ehdr ehdr;
 static Elf32_Shdr shdr[MAX_SHDRS];
 static Elf32_Sym  *symtab[MAX_SHDRS];
diff --git a/arch/m68k/amiga/amisound.c b/arch/m68k/amiga/amisound.c
index 1f5bfb5..8d013a1 100644
--- a/arch/m68k/amiga/amisound.c
+++ b/arch/m68k/amiga/amisound.c
@@ -21,7 +21,6 @@ static const signed char sine_data[] = {
0,  39,  75,  103,  121,  127,  121,  103,  75,  39,
0, -39, -75, -103, -121, -127, -121, -103, -75, -39
 };
-#define DATA_SIZE  (sizeof(sine_data)/sizeof(sine_data[0]))
 
 #define custom amiga_custom
 
@@ -55,7 +54,7 @@ void __init amiga_init_sound(void)
memcpy (snd_data, sine_data, sizeof(sine_data));
 
/* setup divisor */
-   clock_constant = (amiga_colorclock+DATA_SIZE/2)/DATA_SIZE;
+   clock_constant = (amiga_colorclock + ARRAY_SIZE(sine_data) /2) / ARRAY_SIZE(sine_data);
 
/* without amifb, turn video off and enable high quality sound */
 #ifndef CONFIG_FB_AMIGA
diff --git a/arch/powerpc/boot/types.h b/arch/powerpc/boot/types.h
index 31393d1..733622a 100644
--- a/arch/powerpc/boot/types.h
+++ b/arch/powerpc/boot/types.h
@@ -1,8 +1,6 @@
 #ifndef _TYPES_H_
 #define _TYPES_H_
 
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
-
 typedef unsigned char  u8;
 typedef unsigned short u16;
 typedef unsigned int   u32;
diff --git a/arch/sparc64/kernel/pci.c b/arch/sparc64/kernel/pci.c
index e8dac81..5c8c433 100644
--- a/arch/sparc64/kernel/pci.c
+++ b/arch/sparc64/kernel/pci.c
@@ -209,14 +209,12 @@ static struct {
{ "SUNW,sun4v-pci", sun4v_pci_init },
{ "pciex108e,80f0", fire_pci_init },
 };
-#define PCI_NUM_CONTROLLER_TYPES (sizeof(pci_controller_table) / \
- sizeof(pci_controller_table[0]))
 
 static int __init pci_controller_init(const char *model_name, int namelen, struct device_node *dp)
 {
int i;
 
-   for (i = 0; i < PCI_NUM_CONTROLLER_TYPES; i++) {
+   for (i = 0; i < ARRAY_SIZE(pci_controller_table); i++) {
if (!strncmp(model_name,
 pci_controller_table[i].model_name,
 namelen)) {
@@ -232,7 +230,7 @@ static int __init pci_is_controller(const char *model_name, int namelen, struct
 {
int i;
 
-   for (i = 0; i < PCI_NUM_CONTROLLER_TYPES; i++) {
+   for (i = 0; i < ARRAY_SIZE(pci_controller_table); i++) {
if (!strncmp(model_name,
 pci_controller_table[i].model_name,
 namelen)) {
diff --git a/drivers/acpi/utilities/uteval.c b/drivers/acpi/utilities/uteval.c
index 0042b7e..5da86d5 100644
--- a/drivers/acpi/utilities/uteval.c
+++ b/drivers/acpi/utilities/uteval.c
@@ -122,7 +122,7 @@ acpi_status acpi_ut_osi_implementation(struct acpi_walk_state *walk_state)
 
/* Compare input string to static table of supported interfaces */
 
-   for (i = 0; i < ACPI_ARRAY_LENGTH(acpi_interfaces_supported); i++) {
+   

Re: More E820 brokenness

2007-09-27 Thread H. Peter Anvin
Jordan Crouse wrote:
> 
> Breaks on the Geode - original behavior.
> 
> I think that having boot_params.e820_entries != 0 makes the kernel
> assume the e820 data is correct.
> 

Okay, now I'm utterly baffled how 2.6.22 ever worked on this Geode,
because this, to the best of my reading, mimics the 2.6.22 behavior
exactly.  DID IT REALLY, and/or did you make any kind of configuration
changes?

>> I want to emphasize that this is seriously broken.  Using a partial e820
>> map could have disastrous results, since the kernel will have partial
>> memory map information and not know about reserved areas, etc.  Part of
>> me feels that the right thing to do is what the current git kernel does
>> -- either fall back to e801, or stop and error.
> 
> I'm inclined to agree.  

Arguably the right thing to do is to find the responsible BIOS engineer
and shoot them, but that's hard to do without robotics.

-hpa





Re: [PATCHSET 4/4] sysfs: implement new features

2007-09-27 Thread Kyle Moffett

On Sep 25, 2007, at 18:50:05, Greg KH wrote:
> On Thu, Sep 20, 2007 at 05:31:37PM +0900, Tejun Heo wrote:
>> * Name-formatting for symlinks.  e.g. symlink pointing to
>> /dira/dirb/leaf can be named as "symlink:%1-%0" and it will show up
>> as "symlink:dirb-leaf".  This only applies when new interface is used.
>
> Is this really necessary?  It looks like we are adding a "special"
> type of parser here that no one uses.


IMHO this would be nicer if it could reuse existing sprintf code to  
handle all the nice shiny sprintf format specifiers.  The only  
challenge would be how to dynamically build a varargs list from an  
array of component names although perhaps there could be an internal  
__csprintf function which took a callback for retrieving arguments.   
Also since all of the path components are strings I don't know that  
numeric specifiers could be made useful, so perhaps it's not the  
greatest idea.
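
To make the proposed naming rule concrete, here is a small userspace sketch of how a format such as "symlink:%1-%0" could be expanded. It is not the sysfs implementation: `format_symlink_name()` and `MAX_COMPONENTS` are hypothetical, only single-digit indices are handled, and the numbering (`%0` is the leaf, `%1` its parent, and so on) is inferred from the single example in the quoted text.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENTS 16 /* hypothetical limit for this sketch */

/*
 * Expand "%N" in fmt to the Nth component of path, counted from the leaf.
 * Returns 0 on success, -1 on a bad index or if out is too small.
 */
static int format_symlink_name(const char *fmt, const char *path,
                               char *out, size_t outlen)
{
    const char *comp[MAX_COMPONENTS];
    size_t len[MAX_COMPONENTS];
    size_t ncomp = 0, pos = 0;

    if (outlen == 0)
        return -1;

    /* Split the path into components, skipping empty ones. */
    for (const char *p = path; *p; ) {
        while (*p == '/')
            p++;
        if (!*p)
            break;
        if (ncomp == MAX_COMPONENTS)
            return -1;
        comp[ncomp] = p;
        len[ncomp] = strcspn(p, "/");
        p += len[ncomp];
        ncomp++;
    }

    for (const char *f = fmt; *f; f++) {
        if (f[0] == '%' && f[1] >= '0' && f[1] <= '9') {
            size_t idx = (size_t)(f[1] - '0');
            size_t c;

            if (idx >= ncomp)
                return -1;
            c = ncomp - 1 - idx; /* %0 is the leaf */
            if (pos + len[c] >= outlen)
                return -1;
            memcpy(out + pos, comp[c], len[c]);
            pos += len[c];
            f++; /* skip the digit */
        } else {
            if (pos + 1 >= outlen)
                return -1;
            out[pos++] = *f;
        }
    }
    out[pos] = '\0';
    return 0;
}
```

Since every argument is a path component, a tiny positional expander like this covers the example without any numeric conversions, which is roughly the trade-off weighed above against reusing the sprintf code.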


I think the primary importance for this functionality is:

>> * Autorenaming of symlinks according to the name format string
>> when target or one of its ancestors is renamed or moved.  This
>> only applies when new interface is used.
>
> Nice.


Cheers,
Kyle Moffett



Re: [Announce] Linux-tiny project revival

2007-09-27 Thread Arnd Bergmann
On Thursday 27 September 2007, you wrote:
> > Then you don't have to change every single printk in the kernel, but
> > only those that don't currently come with a log level. More importantly,
> > you can do the conversion without a flag day, by spreading (an empty)
> > PRINTK_CONTINUED in places that do need a printk without a log level.
> 
> The problem is, how do you know whether to print a continued printk or not?
> It depends on the loglevel of the first printk.

Those need to be looked at individually. You can normally see easily from
the context whether the missing log level was an accident, or the author
actually has multiple printk statements for a single line. In one case,
you would add a log level, in the other case, you can add PRINTK_CONTINUED,
or something similar. An alternative to PRINTK_CONTINUED might be a new
function, e.g. printk_continued() or similar that does not expect a log
level.
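
As a rough userspace sketch of the idea (not kernel code): the trailing comma in each level macro turns the level into its own macro argument, so existing call sites keep their shape. Several things here are assumptions for illustration: `actual_printk()` simply appends to a buffer, the threshold value 6 is made up, and `printk_level()` is a hypothetical helper. As far as I can tell, the comma hidden inside `KERN_NOTICE` only splits the arguments after one extra expansion step, so the sketch forwards through a second macro. `##__VA_ARGS__` is a GNU extension.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define CONFIG_PRINTK_DOICARE 6 /* hypothetical threshold value */

/* The trailing comma makes the level a separate macro argument. */
#define KERN_ERR         "<3>",
#define KERN_NOTICE      "<5>",
#define KERN_DEBUG       "<7>",
#define PRINTK_CONTINUED "",

static char log_buf[256];
static size_t log_len;

/* Stand-in for the real printk: append the formatted text to log_buf. */
static void actual_printk(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(log_buf + log_len, sizeof(log_buf) - log_len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = sizeof(log_buf) - log_len - 1;

        log_len += (size_t)n < room ? (size_t)n : room;
    }
}

#define printk_level(level, str, ...)                          \
    do {                                                       \
        if (sizeof(level) == 1) /* continued printk */         \
            actual_printk(str, ##__VA_ARGS__);                 \
        else if ((level[1] - '0') < CONFIG_PRINTK_DOICARE)     \
            actual_printk(level str, ##__VA_ARGS__);           \
    } while (0)

/* Expand the level macro first, then let printk_level() see the comma. */
#define printk(...) printk_level(__VA_ARGS__)
```

With these definitions, `printk(KERN_NOTICE "x=%d", 5)` is logged, `printk(PRINTK_CONTINUED " more\n")` is appended without a level prefix, and `printk(KERN_DEBUG "dropped\n")` produces nothing. Both conditions are compile-time constants, so a compiler can in principle drop the unused call, but the sketch does not demonstrate that.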

> So besides compile-time parsing of the source code, replacing printk with
> loglevel specific alternatives (one way or the other) seems the only option.

That would mean replacing all of them, not just those that currently lack
a loglevel.

Arnd <><


Re: More E820 brokenness

2007-09-27 Thread Jordan Crouse
On 27/09/07 15:17 -0700, H. Peter Anvin wrote:
> As luck would have it, it's not just an obscure Geode system which has a
> broken E820 implementation.  Today I received a bug report about a Dell
> system (XPS M1330) with broken E820.
> 
> Unfortunately, the workaround for the Geode breaks this system, because
> x86-64 doesn't fall back to the e801/88 information like the i386 kernel
> does.
> 
> I wonder if the relevant people could test out this patch to see how it
> works on their respective system.  This patch reverts to 2.6.23-rc8
> behaviour of simply truncating the map, but still makes e801/88 info
> available to the kernel; this hopefully should match 2.6.22 behaviour.

Breaks on the Geode - original behavior.

I think that having boot_params.e820_entries != 0 makes the kernel
assume the e820 data is correct.

> I want to emphasize that this is seriously broken.  Using a partial e820
> map could have disastrous results, since the kernel will have partial
> memory map information and not know about reserved areas, etc.  Part of
> me feels that the right thing to do is what the current git kernel does
> -- either fall back to e801, or stop and error.

I'm inclined to agree.  

Jordan




Re: 2.6.23-rc7 - _random_ IRQ23 : nobody cared

2007-09-27 Thread Benjamin Herrenschmidt

On Thu, 2007-09-27 at 10:05 +, Paul Rolland wrote:
> Hello,
> 
> On Thu, 27 Sep 2007 19:04:11 +1000
> Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
> 
> > Let me guess... this is a T61 or X61 ?
> Bad luck ;)
>  
> This is an Asus P5W-DH Deluxe motherboard, with a Core2 6400 CPU, 
> a bunch of disk (2 IDE, 3 SATA, 1 CDRW and 1 DVDRW-DL), and a damned
> Olitec PCI V92 V2 modem.

What chipset ? 965gm ?

Ben.




Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Kyle Moffett

On Sep 27, 2007, at 17:34:45, Greg KH wrote:
> On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote:
>> The fact that sysfs is all laid out in a directory, but for which
>> some directories/symlinks are OK to use, and some are NOT OK to
>> use --- is why I call the sysfs interface "an open pit".
>
> And because of the original design mistakes, we have only been able
> to change things for the better in a slow manner.  We have had
> userspace programs fixed up for _years_ before we are able to make
> the corresponding changes in the kernel, so as to not break the
> distros that are slow to upgrade packages and kernels (like Debian.)


Hey!  No poking fingers at Debian here; it's been *MUCH* improved  
lately.  I far more frequently have problems with boxes still running  
some ancient release of RHEL-4 or something than I do with those  
running Debian stable (virtually always the latest Debian stable).


Cheers,
Kyle Moffett



Re: [PATCH] kswapd should only wait on IO if there is IO

2007-09-27 Thread Andrew Morton
On Thu, 27 Sep 2007 18:13:25 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:

> On Thu, 27 Sep 2007 14:47:02 -0700
> Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, 27 Sep 2007 17:08:16 -0400
> > Rik van Riel <[EMAIL PROTECTED]> wrote:
> > 
> > > The current kswapd (and try_to_free_pages) code has an oddity where the
> > > code will wait on IO, even if there is no IO in flight.  This problem is
> > > notable especially when the system scans through many unfreeable pages,
> > > causing unnecessary stalls in the VM.
> > > 
> > 
> > What effect did this change have?
> 
> Kswapd was no longer sitting in "D" state as often and pages got
> freed more promptly.  The test was done on a RHEL kernel with
> this change though - I guess I should redo it with a current upstream
> kernel.

OK.  Yes, it should help quite a bit in the common cases.

> > >  
> > >   /* Take a nap, wait for some writeback to complete */
> > > - if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
> > > + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
> > > + sc.nr_io_pages > sc.swap_cluster_max)
> > 
> > The comparison with swap_cluster_max is unobvious, and merits a
> > comment.  What is the thinking here?  
> 
> If the number of pages undergoing IO is really small, waiting
> for them may be a waste of time.
> 
> Maybe my thinking is wrong, not sure...

The thinking sounds good to me, but I'm looking for weirdo side-effects in
corner cases.  And I'm trying to work out what actual design we want to
have behind these various magic numbers and thresholds.

> > Also, we now have this:
> > 
> > if (total_scanned > sc.swap_cluster_max +
> > sc.swap_cluster_max / 2) {
> > wakeup_pdflush(laptop_mode ? 0 : total_scanned);
> > sc.may_writepage = 1;
> > }
> > 
> > /* Take a nap, wait for some writeback to complete */
> > if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && 
> > sc.nr_io_pages > sc.swap_cluster_max)
> > congestion_wait(WRITE, HZ/10);
> > 
> > 
> > So in the case where total_scanned has not yet reached
> > swap_cluster_max, this process isn't initiating writeout and it isn't
> > sleeping, either.  Nor is it incrementing nr_io_pages.
> 
> Actually, nr_io_pages is also incremented when we run into pages that
> are already PageWriteback - even if we did not start the IO ourselves.

OK, that'll help a lot in this scenario.

> > In the range (swap_cluster_max < nr_io_pages < 1.5*swap_cluster_max) this
> > process still isn't incrementing nr_io_pages, but it _is_ running
> > congestion_wait().
> 
> It is incrementing sc.nr_io_pages and will wait on IO to complete if
> the amount of pages in flight to disk that it scanned over is larger
> than the number of pages that it is trying to free.
> 
> > Once nr_io_pages exceeds 1.5*swap_cluster_max, this process is both
> > initiating IO and is throttling on writeback completion events.
> > 
> > This all seems a bit weird and arbitrary - what is the reason for
> > throttling-but-not-writing in that 1.0->1.5 window?
> 
> Good question.  Note that the throttling-but-not-writing window in
> the current code is 0.0->1.5, so this patch does reduce the throttling
> window compared to the current code.
> 
> What is the reason that the current code does IO throttling even if
> there is no IO at all in flight?

Buggered if I know ;)

It may have the accidental effect that it opens a window in which some
may_enter_fs-capable process can get scheduled and do some writeout,
perhaps.
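
The windows being discussed can be written down as a small decision function. This is a simplified sketch, not the code in mm/vmscan.c: `struct reclaim_action` and `classify_pass()` are hypothetical, the priority test is assumed to have already passed, and `nr_scanned` and `total_scanned` are folded into one value.

```c
#include <stdbool.h>

/* What one reclaim pass ends up doing (hypothetical type). */
struct reclaim_action {
    bool starts_writeout; /* wakes pdflush and sets may_writepage */
    bool naps;            /* calls congestion_wait() */
};

/*
 * Classify a pass from the number of pages scanned and the number of
 * pages found waiting on IO.  "patched" selects the new nap condition.
 */
static struct reclaim_action classify_pass(int total_scanned, int nr_io_pages,
                                           int swap_cluster_max, bool patched)
{
    struct reclaim_action a;

    a.starts_writeout =
        total_scanned > swap_cluster_max + swap_cluster_max / 2;
    a.naps = total_scanned > 0 &&
             (!patched || nr_io_pages > swap_cluster_max);
    return a;
}
```

With `swap_cluster_max` at 32, scanning 40 pages with none waiting on IO naps without writing in the old code and does neither in the patched code; 40 scanned with 35 waiting on IO still throttles without writing, and only beyond 48 scanned pages does the pass both write and throttle.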

> > If there _is_ a reason and it's all been carefully thought out and
> > designed, then can we please capture a description of that design in the
> > changelog or in the code?
> 
> I'll add a description for the sc.nr_io_pages > sc.swap_cluster_max
> test.

OK, thanks.  Perhaps a few words tacked onto the nr_io_pages definition
site would be the place to capture this.

> > Also, I wonder about what this change will do to the dynamic behaviour of
> > GFP_NOFS direct-reclaimers.  Previously they would throttle if they
> > encounter dirty pages which they can't write out.  Hopefully someone else
> > (kswapd or a __GFP_FS direct-reclaimer) will write some of those pages
> > and this caller will be woken when that writeout completes and will go off
> > and scoop them off the tail of the LRU.
> > 
> > But after this change, such a GFP_NOFS caller will, afaict, burn its way
> > through potentially the entire inactive list and will then declare oom. 
> 
> Nope, sc.nr_io_pages will also be incremented when the code runs into
> pages that are already PageWriteback.

yup, I didn't think of that.  Hopefully someone else will be in there
working on that zone too.  If this caller yields and defers to kswapd
then that's very likely.  Except we just took away the ability to do that..


Re: 2.6.23-rc8-mm1 -- powerpc link failure

2007-09-27 Thread Jiri Kosina
On Thu, 27 Sep 2007, Andrew Morton wrote:

> > +extern void arch_randomize_brk(void);
> >  #include "../../../fs/binfmt_elf.c" 
> Is this sinful extern-decl-in-C actually needed?

Some time has passed since I wrote the patch, but I remember that this
was needed; otherwise the build failed under some circumstances. I don't
remember the details ... I will try to look at it more in a while.

But it definitely was needed to work with that horrible 
include-huge-c-file-from-another-one.

Thanks,

-- 
Jiri Kosina
SUSE Labs


More E820 brokenness

2007-09-27 Thread H. Peter Anvin
As luck would have it, it's not just an obscure Geode system which has a
broken E820 implementation.  Today I received a bug report about a Dell
system (XPS M1330) with broken E820.

Unfortunately, the workaround for the Geode breaks this system, because
x86-64 doesn't fall back to the e801/88 information like the i386 kernel
does.

I wonder if the relevant people could test out this patch to see how it
works on their respective system.  This patch reverts to 2.6.23-rc8
behaviour of simply truncating the map, but still makes e801/88 info
available to the kernel; this hopefully should match 2.6.22 behaviour.

I want to emphasize that this is seriously broken.  Using a partial e820
map could have disastrous results, since the kernel will have partial
memory map information and not know about reserved areas, etc.  Part of
me feels that the right thing to do is what the current git kernel does
-- either fall back to e801, or stop and error.
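
The two policies can be compared with a small simulation of the detection loop. This is a sketch only: `struct bios_reply`, `enum bad_sig_policy` and `count_e820_entries()` are hypothetical, and the BIOS call is replaced by an array of canned replies. `SMAP` is the signature value the boot code checks for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMAP 0x534d4150u /* ASCII "SMAP" */

/* One simulated reply from the E820 BIOS call (hypothetical struct). */
struct bios_reply {
    uint32_t id; /* signature handed back by the BIOS, should be SMAP */
    bool err;    /* the call reported an error */
    bool last;   /* the BIOS indicated there are no further entries */
};

enum bad_sig_policy {
    TRUNCATE_MAP, /* keep the entries collected so far */
    DISCARD_MAP,  /* treat the whole map as unusable */
};

/* Return how many entries the kernel would accept under each policy. */
static int count_e820_entries(const struct bios_reply *r, size_t n,
                              enum bad_sig_policy policy, int max_entries)
{
    int count = 0;

    for (size_t i = 0; i < n && count < max_entries; i++) {
        if (r[i].id != SMAP) {
            if (policy == DISCARD_MAP)
                count = 0;
            break;
        }
        if (r[i].err)
            break;
        count++;
        if (r[i].last)
            break;
    }
    return count;
}
```

For a BIOS that returns two good entries and then a bad signature, `TRUNCATE_MAP` yields a two-entry partial map while `DISCARD_MAP` yields zero entries, which is the case where a fallback to e801/88 matters.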

(Andi: I would particularly appreciate your opinion on this issue.)

-hpa
diff --git a/arch/i386/boot/memory.c b/arch/i386/boot/memory.c
index bccaa1c..84939b7 100644
--- a/arch/i386/boot/memory.c
+++ b/arch/i386/boot/memory.c
@@ -34,17 +34,7 @@ static int detect_memory_e820(void)
  "=m" (*desc)
: "D" (desc), "a" (0xe820));
 
-   /* Some BIOSes stop returning SMAP in the middle of
-  the search loop.  We don't know exactly how the BIOS
-  screwed up the map at that point, we might have a
-  partial map, the full map, or complete garbage, so
-  just return failure. */
-   if (id != SMAP) {
-   count = 0;
-   break;
-   }
-
-   if (err)
+   if (id != SMAP || err)
break;
 
count++;


[PATCH] clockevents: fix bogus next_event reset for oneshot broadcast devices

2007-09-27 Thread Thomas Gleixner
In periodic broadcast mode the next_event member of the broadcast device
structure is set to KTIME_MAX in the interrupt handler. This is wrong,
as we calculate the next periodic interrupt with this variable.

Remove it.

Noticed by Ralf. MIPS is the first user of this mode, so the change does
not affect existing users.

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Acked-and-tested-by: Ralf Baechle <[EMAIL PROTECTED]>
---

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 0962e05..acf15b4 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -176,8 +176,6 @@ static void tick_do_periodic_broadcast(void)
  */
 static void tick_handle_periodic_broadcast(struct clock_event_device *dev)
 {
-   dev->next_event.tv64 = KTIME_MAX;
-
tick_do_periodic_broadcast();
 
/*
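
A small userspace model may help show why the reset is harmful.  It is a
sketch only: mock_clock_event_device and mock_handle_periodic() are
hypothetical names, times are plain nanosecond counters, and the
saturating add is an assumption made here to avoid signed overflow.  The
real handler uses ktime_add() on dev->next_event and then reprograms the
device.

```c
#include <stdint.h>

#define MOCK_KTIME_MAX INT64_MAX  /* stand-in for KTIME_MAX */

struct mock_clock_event_device {
	int64_t next_event;           /* absolute expiry time in ns */
};

/*
 * Compute the next periodic expiry from dev->next_event.
 * reset_first != 0 models the line removed by the patch.
 */
static int64_t mock_handle_periodic(struct mock_clock_event_device *dev,
				    int64_t period_ns, int reset_first)
{
	if (reset_first)
		dev->next_event = MOCK_KTIME_MAX;

	/* Saturating add: next = next_event + period. */
	if (dev->next_event > MOCK_KTIME_MAX - period_ns)
		dev->next_event = MOCK_KTIME_MAX;
	else
		dev->next_event += period_ns;

	return dev->next_event;
}
```

Starting from next_event = 1000000 with a 4000000 ns period, the handler
without the reset yields 5000000; with the reset the expiry stays pinned
at the maximum value, so in this model the next tick never arrives.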


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.23-rc8-mm1 -- powerpc link failure

2007-09-27 Thread Andrew Morton
On Thu, 27 Sep 2007 14:13:21 +0200 (CEST)
Jiri Kosina <[EMAIL PROTECTED]> wrote:

> i386 and x86_64: randomize brk()
> 
> ...
>
> --- a/arch/x86_64/ia32/ia32_binfmt.c
> +++ b/arch/x86_64/ia32/ia32_binfmt.c
> @@ -262,6 +262,7 @@ static void elf32_init(struct pt_regs *);
>  #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
>  #define arch_setup_additional_pages syscall32_setup_pages
>  extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
> +extern void arch_randomize_brk(void);
>  
>  #include "../../../fs/binfmt_elf.c" 

Is this sinful extern-decl-in-C actually needed?
 
> index b4fbe47..5a1adf9 100644
> --- a/include/asm-x86_64/elf.h
> +++ b/include/asm-x86_64/elf.h
> @@ -177,4 +177,6 @@ do if (vdso_enabled) {
> \
>  
>  #endif
>  
> +extern void arch_randomize_brk(void);
> +
>  #endif

Because we already have a declaration in the correct place?


Re: [PATCH] kswapd should only wait on IO if there is IO

2007-09-27 Thread Rik van Riel
On Thu, 27 Sep 2007 14:47:02 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 27 Sep 2007 17:08:16 -0400
> Rik van Riel <[EMAIL PROTECTED]> wrote:
> 
> > The current kswapd (and try_to_free_pages) code has an oddity where the
> > code will wait on IO, even if there is no IO in flight.  This problem is
> > notable especially when the system scans through many unfreeable pages,
> > causing unnecessary stalls in the VM.
> > 
> 
> What effect did this change have?

Kswapd was no longer sitting in "D" state as often and pages got
freed more promptly.  The test was done on a RHEL kernel with
this change though - I guess I should redo it with a current upstream
kernel.

> > > diff -up linux-2.6.22.x86_64/mm/vmscan.c.wait linux-2.6.22.x86_64/mm/vmscan.c
> > > --- linux-2.6.22.x86_64/mm/vmscan.c.wait 2007-09-25 11:33:30.0 -0400
> > > +++ linux-2.6.22.x86_64/mm/vmscan.c 2007-09-25 21:27:08.0 -0400
> > @@ -68,6 +68,8 @@ struct scan_control {
> > int all_unreclaimable;
> >  
> > int order;
> > +
> > +   int nr_io_pages;
> >  };
> >  
> >  #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
> > @@ -489,8 +491,10 @@ static unsigned long shrink_page_list(st
> >  */
> > if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
> > wait_on_page_writeback(page);
> > -   else
> > +   else {
> > +   sc->nr_io_pages++;
> > goto keep_locked;
> > +   }
> > }
> >  
> > referenced = page_referenced(page, 1);
> > @@ -541,8 +545,10 @@ static unsigned long shrink_page_list(st
> > case PAGE_ACTIVATE:
> > goto activate_locked;
> > case PAGE_SUCCESS:
> > -   if (PageWriteback(page) || PageDirty(page))
> > +   if (PageWriteback(page) || PageDirty(page)) {
> > +   sc->nr_io_pages++;
> > goto keep;
> > +   }
> > /*
> >  * A synchronous write - probably a ramdisk.  Go
> >  * ahead and try to reclaim the page.
> > @@ -1201,6 +1207,7 @@ unsigned long try_to_free_pages(struct z
> >  
> > for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> > sc.nr_scanned = 0;
> > +   sc.nr_io_pages = 0;
> > if (!priority)
> > disable_swap_token();
> > nr_reclaimed += shrink_zones(priority, zones, &sc);
> > @@ -1229,7 +1236,8 @@ unsigned long try_to_free_pages(struct z
> > }
> >  
> > /* Take a nap, wait for some writeback to complete */
> > -   if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
> > +   if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
> > +   sc.nr_io_pages > sc.swap_cluster_max)
> 
> The comparison with swap_cluster_max is unobvious, and merits a
> comment.  What is the thinking here?  

If the number of pages undergoing IO is really small, waiting
for them may be a waste of time.

Maybe my thinking is wrong, not sure...

> Also, we now have this:
> 
>   if (total_scanned > sc.swap_cluster_max +
>   sc.swap_cluster_max / 2) {
>   wakeup_pdflush(laptop_mode ? 0 : total_scanned);
>   sc.may_writepage = 1;
>   }
> 
>   /* Take a nap, wait for some writeback to complete */
>   if (sc.nr_scanned && priority < DEF_PRIORITY - 2 && 
>   sc.nr_io_pages > sc.swap_cluster_max)
>   congestion_wait(WRITE, HZ/10);
> 
> 
> So in the case where total_scanned has not yet reached
> swap_cluster_max, this process isn't initiating writeout and it isn't
> sleeping, either.  Nor is it incrementing nr_io_pages.

Actually, nr_io_pages is also incremented when we run into pages that
are already PageWriteback - even if we did not start the IO ourselves.

> In the range (swap_cluster_max < nr_io_pages < 1.5*swap_cluster_max) this
> process still isn't incrementing nr_io_pages, but it _is_ running
> congestion_wait().

It is incrementing sc.nr_io_pages and will wait on IO to complete if
the amount of pages in flight to disk that it scanned over is larger
than the number of pages that it is trying to free.

> Once nr_io_pages exceeds 1.5*swap_cluster_max, this process is both
> initiating IO and is throttling on writeback completion events.
> 
> This all seems a bit weird and arbitrary - what is the reason for
> throttling-but-not-writing in that 1.0->1.5 window?

Good question.  Note that the throttling-but-not-writing window in
the current code is 0.0->1.5, so this patch does reduce the throttling
window compared to the 

Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Matthew Wilcox
On Thu, Sep 27, 2007 at 05:00:22PM -0500, Linas Vepstas wrote:
> On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote:
> > I'm a little concerned by the mention of MMIO.  It's entirely possible
> > for the sym2 driver to be using ioports to access the card rather than
> > MMIO.  Is it simply that it can't on the platform you test on?
> 
> The comment is misleading. I've been in the bad habit of calling
> it "mmio" whenever it's not DMA.

OK, cool, thanks.  I'll update the comment for you.

One last thing (sorry, I only just noticed):
In the error handler, we wait_for_completion(io_reset_wait).
In sym2_io_error_detected, we init_completion(io_reset_wait).
Isn't it possible that we hit the error handler before we hit the
io_error_detected path, and thus the completion wait is lost?
Since the completion is already initialised in sym_attach(), I don't
think we need to initialise it in sym2_io_error_detected().
Makes sense to just delete it?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
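
The lost-wakeup concern raised above can be illustrated with a toy,
single-threaded model of a completion.  Everything here is hypothetical
(mock_completion and its helpers are not the kernel API); the model only
tracks a done count and a waiter count, whereas the real struct
completion carries a wait queue that init_completion() reinitialises.

```c
/* Toy model: 'waiters' stands in for tasks parked on the wait queue. */
struct mock_completion {
	unsigned int done;
	unsigned int waiters;
};

/* Reinitialising forgets both pending completions and parked waiters. */
static void mock_init_completion(struct mock_completion *c)
{
	c->done = 0;
	c->waiters = 0;
}

/* Returns 1 if the caller would have to sleep, 0 if already completed. */
static int mock_wait_begin(struct mock_completion *c)
{
	if (c->done) {
		c->done--;
		return 0;
	}
	c->waiters++;
	return 1;
}

/* Returns the number of waiters woken (0 or 1). */
static int mock_complete(struct mock_completion *c)
{
	if (c->waiters) {
		c->waiters--;
		return 1;
	}
	c->done++;
	return 0;
}
```

In this model, wait, then init, then complete wakes nobody: the waiter
registered before the second init is simply forgotten.  Initialising
once at attach time, as suggested above, avoids that ordering problem.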


Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations

2007-09-27 Thread Peter Zijlstra

On Thu, 2007-09-27 at 14:51 -0700, Andrew Morton wrote:

> So your stuff becomes dependent on Nick's stuff, and Nick's stuff is still
> failing on NFS, I think.

It worked today, it turned out to be a UML bug. Real hardware seemed to
work properly, but will test a bit more tomorrow.



Re: Why do so many machines need "noapic"?

2007-09-27 Thread Phillip Susi

Dave Jones wrote:

If memory serves correctly, that was circa 2.6.10, back in these commits..

commit a068ea13d1db406e15c346e93530343f6e70184c
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Sun Oct 10 05:21:08 2004 -0400

[ACPI] If BIOS disabled the LAPIC, believe it by default.
"lapic" is available to force enabling the LAPIC
in the event you know more than your BIOS vendor.
http://bugzilla.kernel.org/show_bug.cgi?id=3238

commit 2fcfece90db9643b6f30a7ad343898a2871e6a81
Author: Len Brown <[EMAIL PROTECTED]>
Date:   Sat Oct 9 20:12:45 2004 -0400

[ACPI] Don't enable LAPIC when the BIOS disabled it.
Doing so apparently breaks every Dell on Earth.
http://bugzilla.kernel.org/show_bug.cgi?id=3238


But those changes relate to the local APIC, which 'noapic' shouldn't
have any effect on should it ?


If the LAPIC is disabled, then you CAN'T use the IO-APIC, right?  So then
wouldn't the noapic option have no effect, since the APIC is already
disabled?





Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-09-27 Thread Linas Vepstas
On Wed, Sep 26, 2007 at 09:02:16AM -0600, Matthew Wilcox wrote:
> On Fri, Apr 20, 2007 at 03:47:20PM -0500, Linas Vepstas wrote:
> > Implement the so-called "first failure data capture" (FFDC) for the
> > symbios PCI error recovery.  After a PCI error event is reported,
> > the driver requests that MMIO be enabled. Once enabled, it 
> > then reads and dumps assorted status registers, and concludes
> > by requesting the usual reset sequence.
> 
> > +   /* Request that MMIO be enabled, so register dump can be taken. */
> > +   return PCI_ERS_RESULT_CAN_RECOVER;
> > +}
> 
> I'm a little concerned by the mention of MMIO.  It's entirely possible
> for the sym2 driver to be using ioports to access the card rather than
> MMIO.  Is it simply that it can't on the platform you test on?

The comment is misleading. I've been in the bad habit of calling
it "mmio" whenever it's not DMA.

The habit is because there are two distinct enable bits in the 
pci-host bridge during error recovery: one to enable mmio/ioports, 
and the other to enable DMA. If the adapter has gone crazy, I don't 
want to enable DMA, so that it doesn't scribble to bad places. But, 
by enabling mmio/ioports, perhaps it can be finessed back into a 
semi-sane state, e.g. sane enough to perform a dump of its internal
state.

--linas
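
The sequence described above (re-enable register access but not DMA,
dump state, then ask for a reset) can be sketched as a userspace mock.
All names below are hypothetical stand-ins; the real driver callbacks
live in struct pci_error_handlers and return pci_ers_result_t values
such as PCI_ERS_RESULT_CAN_RECOVER and PCI_ERS_RESULT_NEED_RESET.

```c
enum mock_ers_result {
	MOCK_ERS_CAN_RECOVER,
	MOCK_ERS_NEED_RESET,
	MOCK_ERS_RECOVERED
};

/* The two separate enable bits in the (mock) PCI-host bridge. */
struct mock_bridge {
	int pio_enabled;   /* mmio/ioport access to the adapter */
	int dma_enabled;   /* adapter-initiated DMA */
};

struct mock_adapter {
	struct mock_bridge bridge;
	int registers_dumped;
	int dma_on_during_dump;
};

/* Error reported: the bridge has frozen both paths. */
static enum mock_ers_result mock_error_detected(struct mock_adapter *a)
{
	a->bridge.pio_enabled = 0;
	a->bridge.dma_enabled = 0;
	return MOCK_ERS_CAN_RECOVER;   /* ask for register access back */
}

/* Register access is back: capture first-failure data, then reset. */
static enum mock_ers_result mock_mmio_enabled(struct mock_adapter *a)
{
	if (a->bridge.pio_enabled) {
		a->registers_dumped = 1;
		a->dma_on_during_dump = a->bridge.dma_enabled;
	}
	return MOCK_ERS_NEED_RESET;
}

static enum mock_ers_result mock_slot_reset(struct mock_adapter *a)
{
	a->bridge.pio_enabled = 1;
	a->bridge.dma_enabled = 1;
	return MOCK_ERS_RECOVERED;
}

/* Mock of the platform side driving the callbacks in order. */
static enum mock_ers_result mock_recover(struct mock_adapter *a)
{
	enum mock_ers_result r = mock_error_detected(a);

	if (r == MOCK_ERS_CAN_RECOVER) {
		a->bridge.pio_enabled = 1;   /* DMA deliberately stays off */
		r = mock_mmio_enabled(a);
	}
	if (r == MOCK_ERS_NEED_RESET)
		r = mock_slot_reset(a);
	return r;
}
```

The point of the split is visible in dma_on_during_dump: the dump runs
with register access enabled while DMA is still blocked, so a confused
adapter cannot scribble on memory in the meantime.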


Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations

2007-09-27 Thread Andrew Morton
On Thu, 27 Sep 2007 14:51:25 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Plus, reiserfs seems to compile with that patch I just sent.  Sure as
> > heck surprised me.
> > 
> 
> That'll be because reiserfs-convert-to-new-aops.patch switched reiserfs over
> to ->write_begin() and ->write_end().

Actually, we should rename reiserfs_prepare_write and reiserfs_commit_write
to something else to reduce confusion.  Probably lots of other filesystems
would benefit from the same change, post-Nick's-stuff.


2.6.23-rc8-mm2 NULL dereference in __mnt_is_readonly in ftruncate

2007-09-27 Thread Zan Lynx
Kernel 2.6.23-rc8-mm2 on an AMD-64, filesystems mounted are reiserfs,
reiser4 and tmpfs.
netconsole dmesg output and .config are included below.

Near the end of my boot sequence, there is a kernel error.  I am not
sure exactly what user-space is doing to make this happen, but I know
that a simple shell and some filesystem operations do not cause it.

This error also occurred in 2.6.23-rc8-mm1 but I didn't have time to
post it and hoped it would just go away.  I never tested 2.6.23-rc7-mm*,
and the error did not happen in rc6-mm1.

console [netcon0] enabled
netconsole: network logging started
eth0: no IPv6 routers present
Unable to handle kernel NULL pointer dereference at 0053 RIP: 
 [] __mnt_is_readonly+0x0/0x20
PGD 0 
Oops:  [1] SMP 
last sysfs file: /block/sr0/size
CPU 0 
Modules linked in: netconsole configfs sg ipv6 evdev usbhid hid usb_storage 
libusual psmouse serio_raw ssb video output ehci_hcd ohci_hcd usbcore 
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd snd_page_alloc
Pid: 7291, comm: smbd Not tainted 2.6.23-rc8-mm2 #1
RIP: 0010:[]  [] __mnt_is_readonly+0x0/0x20
RSP: 0018:8100068b1b60  EFLAGS: 00010296
RAX: 810007108000 RBX: 81000261d8c0 RCX: 8093aca0
RDX: 0004 RSI: 8092e950 RDI: 0003
RBP: 0003 R08: 0003 R09: 8061f7cd
R10: b256aacb R11:  R12: ffe2
R13: 8100068b1bd8 R14: 8100068b1ee8 R15: 81000655a910
FS:  7f6f0930c6f0() GS:806ce000() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0053 CR3: 07cb2000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process smbd (pid: 7291, threadinfo 8100068b, task 810007108000)
last branch before last exception/interrupt
 from  [] mnt_want_write+0x3a/0x90
 to  [] __mnt_is_readonly+0x0/0x20
Stack:  802cc37f 8100078cd9a0 8100068b1bd8 8100078cd9a0
 802c82bc 8100078cd780  8100078cd9a0
 8100068b1bd8 8100068b1ee8 3000 
Call Trace:
 [] mnt_want_write+0x3f/0x90
 [] file_update_time+0x2c/0xe0
 [] truncate_file_body+0x148/0x3f0
 [] __lock_acquire+0x583/0x1180
 [] _spin_unlock+0x17/0x20
 [] store_black_box+0x82/0x90
 [] safe_link_add+0x75/0xd0
 [] setattr_unix_file+0x207/0x220
 [] _spin_unlock_irq+0x24/0x30
 [] __down_write_nested+0xa1/0xc0
 [] notify_change+0xf7/0x2c0
 [] do_truncate+0x5e/0x80
 [] sys_ftruncate+0x119/0x130
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.

Code: f6 47 50 40 b8 01 00 00 00 75 0a 48 8b 47 28 8b 40 58 83 e0 
RIP  [] __mnt_is_readonly+0x0/0x20
 RSP 
CR2: 0053
BUG: spinlock lockup on CPU#0, syslogd/5128, 81000261d8c0

Call Trace:
 [] _raw_spin_lock+0x134/0x140
 [] mnt_want_write+0x37/0x90
 [] mnt_want_write+0x37/0x90
 [] file_update_time+0x2c/0xe0
 [] write_unix_file+0x275/0x530
 [] write_unix_file+0x0/0x530
 [] do_loop_readv_writev+0x45/0x70
 [] do_readv_writev+0x20a/0x220
 [] sys_writev+0x53/0xc0
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.
BUG: soft lockup - CPU#0 stuck for 11s! [syslogd:5128]
CPU 0:
Modules linked in: netconsole configfs sg ipv6 evdev usbhid hid usb_storage 
libusual psmouse serio_raw ssb video output ehci_hcd ohci_hcd usbcore 
snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd snd_page_alloc
Pid: 5128, comm: syslogd Tainted: G  D 2.6.23-rc8-mm2 #1
RIP: 0010:[]  [] _raw_spin_lock+0xb8/0x140
RSP: 0018:810006067d18  EFLAGS: 0246
RAX:  RBX: 080c83f0 RCX: 75413ee4
RDX: 0027 RSI: 002182fd RDI: 0001
RBP:  R08: 0001 R09: 0001
R10: 8023f8bb R11: 62da686f R12: 0001
R13: 8080ef80 R14: 810006066000 R15: 810081e11000
FS:  7fe85cf206f0() GS:806ce000() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0053 CR3: 064a8000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400

Call Trace:
 [] _raw_spin_lock+0xc9/0x140
 [] mnt_want_write+0x37/0x90
 [] mnt_want_write+0x37/0x90
 [] file_update_time+0x2c/0xe0
 [] write_unix_file+0x275/0x530
 [] write_unix_file+0x0/0x530
 [] do_loop_readv_writev+0x45/0x70
 [] do_readv_writev+0x20a/0x220
 [] sys_writev+0x53/0xc0
 [] system_call+0x7e/0x83

INFO: lockdep is turned off.
SysRq : HELP : loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll 
saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks 
Unmount shoW-blocked-tasks 
SysRq : Resetting


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc8-mm2
# Thu Sep 27 14:06:06 2007
#
CONFIG_X86_64=y

Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations

2007-09-27 Thread Andrew Morton
On Thu, 27 Sep 2007 14:27:14 -0700
Dave Hansen <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-09-27 at 22:04 +0100, Christoph Hellwig wrote:
> > On Thu, Sep 27, 2007 at 01:53:39PM -0700, Dave Hansen wrote:
> > > -int reiserfs_commit_write(struct file *f, struct page *page,
> > > -   unsigned from, unsigned to);
> > > -int reiserfs_prepare_write(struct file *f, struct page *page,
> > > -unsigned from, unsigned to);
> > > +int reiserfs_commit_write(struct page *page, unsigned from, unsigned to);
> > > +int reiserfs_prepare_write(struct page *page, unsigned from, unsigned to);
> > 
> > I doubt this will work.  These are also used for the ->prepare_write
> > and ->commit_write aops, and the method signature definitively wants
> > a file there, even if it's zero..
> 
> Oddly enough, I don't see those functions being used in aops:
> 
> const struct address_space_operations reiserfs_address_space_operations = {
> .writepage = reiserfs_writepage,
> .readpage = reiserfs_readpage,
> .readpages = reiserfs_readpages,
> .releasepage = reiserfs_releasepage,
> .invalidatepage = reiserfs_invalidatepage,
> .sync_page = block_sync_page,
> .write_begin = reiserfs_write_begin,
> .write_end = reiserfs_write_end,
> .bmap = reiserfs_aop_bmap,
> .direct_IO = reiserfs_direct_IO,
> .set_page_dirty = reiserfs_set_page_dirty,
> };
> 
> Plus, reiserfs seems to compile with that patch I just sent.  Sure as
> heck surprised me.
> 

That'll be because reiserfs-convert-to-new-aops.patch switched reiserfs over
to ->write_begin() and ->write_end().

So your stuff becomes dependent on Nick's stuff, and Nick's stuff is still
failing on NFS, I think.




Re: [PATCH] kswapd should only wait on IO if there is IO

2007-09-27 Thread Andrew Morton
On Thu, 27 Sep 2007 17:08:16 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:

> The current kswapd (and try_to_free_pages) code has an oddity where the
> code will wait on IO, even if there is no IO in flight.  This problem is
> notable especially when the system scans through many unfreeable pages,
> causing unnecessary stalls in the VM.
> 

What effect did this change have?

> 
> diff -up linux-2.6.22.x86_64/mm/vmscan.c.wait linux-2.6.22.x86_64/mm/vmscan.c
> --- linux-2.6.22.x86_64/mm/vmscan.c.wait  2007-09-25 11:33:30.0 -0400
> +++ linux-2.6.22.x86_64/mm/vmscan.c   2007-09-25 21:27:08.0 -0400
> @@ -68,6 +68,8 @@ struct scan_control {
>   int all_unreclaimable;
>  
>   int order;
> +
> + int nr_io_pages;
>  };
>  
>  #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
> @@ -489,8 +491,10 @@ static unsigned long shrink_page_list(st
>*/
>   if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
>   wait_on_page_writeback(page);
> - else
> + else {
> + sc->nr_io_pages++;
>   goto keep_locked;
> + }
>   }
>  
>   referenced = page_referenced(page, 1);
> @@ -541,8 +545,10 @@ static unsigned long shrink_page_list(st
>   case PAGE_ACTIVATE:
>   goto activate_locked;
>   case PAGE_SUCCESS:
> - if (PageWriteback(page) || PageDirty(page))
> + if (PageWriteback(page) || PageDirty(page)) {
> + sc->nr_io_pages++;
>   goto keep;
> + }
>   /*
>* A synchronous write - probably a ramdisk.  Go
>* ahead and try to reclaim the page.
> @@ -1201,6 +1207,7 @@ unsigned long try_to_free_pages(struct z
>  
>   for (priority = DEF_PRIORITY; priority >= 0; priority--) {
>   sc.nr_scanned = 0;
> + sc.nr_io_pages = 0;
>   if (!priority)
>   disable_swap_token();
>   nr_reclaimed += shrink_zones(priority, zones, &sc);
> @@ -1229,7 +1236,8 @@ unsigned long try_to_free_pages(struct z
>   }
>  
>   /* Take a nap, wait for some writeback to complete */
> - if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
> + if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
> + sc.nr_io_pages > sc.swap_cluster_max)

The comparison with swap_cluster_max is unobvious, and merits a
comment.  What is the thinking here?  


Also, we now have this:

if (total_scanned > sc.swap_cluster_max +
sc.swap_cluster_max / 2) {
wakeup_pdflush(laptop_mode ? 0 : total_scanned);
sc.may_writepage = 1;
}

/* Take a nap, wait for some writeback to complete */
if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
sc.nr_io_pages > sc.swap_cluster_max)
congestion_wait(WRITE, HZ/10);


So in the case where total_scanned has not yet reached
swap_cluster_max, this process isn't initiating writeout and it isn't
sleeping, either.  Nor is it incrementing nr_io_pages.

In the range (swap_cluster_max < nr_io_pages < 1.5*swap_cluster_max) this
process still isn't incrementing nr_io_pages, but it _is_ running
congestion_wait().

Once nr_io_pages exceeds 1.5*swap_cluster_max, this process is both
initiating IO and is throttling on writeback completion events.

This all seems a bit weird and arbitrary - what is the reason for
throttling-but-not-writing in that 1.0->1.5 window?

If there _is_ a reason and it's all been carefully thought out and
designed, then can we please capture a description of that design in the
changelog or in the code?
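
To make the windows being discussed easier to see, here is a userspace
sketch of the two conditions, old and new.  The struct and function
names are hypothetical; only the expressions are taken from the code
quoted above, and DEF_PRIORITY is assumed to be 12 as in mm/vmscan.c.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12   /* assumed to match mm/vmscan.c */

/* Hypothetical cut-down version of struct scan_control. */
struct mock_scan_control {
	unsigned long nr_scanned;
	unsigned long nr_io_pages;
	unsigned long swap_cluster_max;
};

/* Writeout is kicked once total_scanned exceeds 1.5 * swap_cluster_max. */
static bool mock_should_wake_pdflush(unsigned long total_scanned,
				     const struct mock_scan_control *sc)
{
	return total_scanned > sc->swap_cluster_max + sc->swap_cluster_max / 2;
}

/* Old condition: nap whenever anything was scanned at low priority. */
static bool mock_should_nap_old(const struct mock_scan_control *sc,
				int priority)
{
	return sc->nr_scanned && priority < DEF_PRIORITY - 2;
}

/* New condition: also require enough pages seen under (or needing) IO. */
static bool mock_should_nap_new(const struct mock_scan_control *sc,
				int priority)
{
	return sc->nr_scanned && priority < DEF_PRIORITY - 2 &&
	       sc->nr_io_pages > sc->swap_cluster_max;
}
```

With swap_cluster_max = 32, a pass at priority 5 that scanned pages but
saw no IO naps under the old condition and does not under the new one;
it naps again once nr_io_pages reaches 33.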



Also, I wonder about what this change will do to the dynamic behaviour of
GFP_NOFS direct-reclaimers.  Previously they would throttle if they
encounter dirty pages which they can't write out.  Hopefully someone else
(kswapd or a __GFP_FS direct-reclaimer) will write some of those pages
and this caller will be woken when that writeout completes and will go off
and scoop them off the tail of the LRU.

But after this change, such a GFP_NOFS caller will, afaict, burn its way
through potentially the entire inactive list and will then declare oom. 
Non-preemptible uniprocessor kernels would be most at risk from this.


>   congestion_wait(WRITE, HZ/10);
>   }
>   /* top priority shrink_caches still had more to do? don't OOM, then */
> @@ -1315,6 +1323,7 @@ loop_again:
>   if (!priority)
>   

Re: Problems with SMP & ACPI powering off

2007-09-27 Thread Rafael J. Wysocki
On Thursday, 27 September 2007 23:29, Mark Lord wrote:
> Question:  do we disable all CPUs except 0 when doing ACPI power off?

No, but we should.

> Background:
> I have a machine here dedicated to running MythTV.
> It powers up to record, and then sets the RTC alarm for next time
> and powers down again in between recordings.
> 
> It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard.
> Previously it was on a completely different (vendor,bios,...) ICH7 
> motherboard.
> 
> In both cases, "halt -p" sometimes fails to actually turn off the power,
> which means that it later then fails to "turn on" to record again.
> 
> Annoying.
> 
> This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support 
> enabled.
> 
> So I'm wondering if it may be due to the old SMP-poweroff bogeyman ?

May be.

Which kernel?

> For now, I've hardcoded a cpu_down(1) into the poweroff code,
> and we'll see if that helps or is merely redundant.
> 
> But I do wonder where else to look for a cause?
> 
> Two different boards, vendors, BIOSs, same CPU chip.  Same problem.

Same chipset, perchance?

Greetings,
Rafael


Re: [patch 02/10] Dont touch fs_struct in usermodehelper

2007-09-27 Thread Greg KH
On Thu, Sep 27, 2007 at 10:39:22PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 27, 2007 at 10:46:04AM -0700, Greg KH wrote:
> > On Thu, Sep 27, 2007 at 04:12:02PM +0200, [EMAIL PROTECTED] wrote:
> > > This test seems to be unnecessary since we always have rootfs mounted 
> > > before
> > > calling a usermodehelper.
> > 
> > Are you sure this is true?  I thought we called the usermode helper for
> > hotplug _very_ early in the boot sequence when the device tree starts to
> > get populated.
> 
> rootfs is mounted by init_mount_tree, and current->fs is set up for init
> there as well.  This is called by mnt_init, which is called by
> vfs_caches_init, which is called by start_kernel far before we go to
> rest_init which finally creates a thread to call kernel_init which then
> calls do_basic_setup which calls do_initcalls to initialize drivers and
> afterwards runs the initrd/initramfs.
> 
> While the actual function names in main.c changed quite a bit we've
> initialized the initial namespace very early on since the 2.5 days.

Ah, ok, great, thanks for correcting me.  I have no objection to this
patch then.

thanks,

greg k-h


Re: [PATCH] fs: Correct SuS compliance for open of large file without options

2007-09-27 Thread Greg KH
On Thu, Sep 27, 2007 at 02:37:42PM -0400, Theodore Tso wrote:
> On Thu, Sep 27, 2007 at 10:59:17AM -0700, Greg KH wrote:
> > Come on now, I'm _very_ tired of this kind of discussion.  Please go
> > read the documentation on how to _use_ sysfs from userspace in such a
> > way that you can properly access these data structures so that no
> > breakage occurs.
> 
> I've read it; the question is whether every single application
> programmer or system shell script programmer who writes code my system
> depends upon has read this document buried in the kernel sources,
> or whether things will break spectacularly --- one of those things
> that leaves me in suspense each time I update the kernel.

Ok, how then should I advertise this better?  What can we do better to
help userspace programmers out in this regard?

> I'm reminded of Rusty's 2003 OLS Keynote, where he points out that
> what's important is not making an interface easy to use, but _hard_
> _to_ _misuse_.

Me and Pat Mochel sat in that talk and instantly had an "oh shit" moment
when it came to the in-kernel usage of sysfs and the driver model.  Ever
since then, I have been working to change the code to make it better.
With the exception of the recent help from Kay, I am the only one doing
this as Pat has been gone for a few years and isn't coming back.

> That fact that sysfs is all laid out in a directory, but for which
> some directories/symlinks are OK to use, and some are NOT OK to use
> --- is why I call the sysfs interface "an open pit".

We (well, Kay mostly) have also been working on fixing this all up to
make it much harder to use sysfs incorrectly.  We will have a single
device tree (well, almost a single tree, it's getting there), so that
all of the information is only in one place, and you don't have to go
searching all over the place for it.  That is a direct improvement over
the old design where some things were in one place, and others in
another.

And because of the original design mistakes, we have only been able to
change things for the better in a slow manner.  We have had userspace
programs fixed up for _years_ before we are able to make the
corresponding changes in the kernel, so as to not break the distros that
are slow to upgrade packages and kernels (like Debian.)

If I had my druthers, we could instantly put some patches into the tree
to fix up the sysfs "mess" once and for all, creating a unified, single
tree, with only a handful of needed symlinks to be able to categorize
certain things.  We have the patches (Kay wrote them over a year ago),
and userspace programs work just fine with them (udev and HAL), but
because we need to support 5 year old userspace programs running
tomorrow's kernel, we must take very tiny, slow steps to get there.

And yes, sysfs has slowly changed over the years, and along the way we
have kept things working, with only very minor problems.  You have no
idea the crazy mismatch of kernels and userspace programs we have had to
deal with.  And it will continue to change, slowly, until we reach the
unified-tree goal, and all of those old crufty userspace programs are
dead and buried (I got a bug report about RHEL's udev version 039 just
yesterday.)

So you can't have it both ways.  You can't complain that sysfs isn't
stable, and isn't "properly userspace friendly" at the same time.  In
order to fix the issues, we have to change it, and do it slowly, because
I don't want to break some distros that can't keep up with the others.

thanks,

greg k-h


[PATCH] spin_lock_unlocked cleanups

2007-09-27 Thread roel
Replace some SPIN_LOCK_UNLOCKED with DEFINE_SPINLOCK

Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
---
diff --git a/arch/mips/pci/ops-pmcmsp.c b/arch/mips/pci/ops-pmcmsp.c
index 09fa007..059eade 100644
--- a/arch/mips/pci/ops-pmcmsp.c
+++ b/arch/mips/pci/ops-pmcmsp.c
@@ -206,7 +206,7 @@ static void pci_proc_init(void)
 }
 #endif /* CONFIG_PROC_FS && PCI_COUNTERS */
 
-spinlock_t bpci_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(bpci_lock);
 
 /*
  *
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index d5fd390..cd2766e 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -34,7 +34,7 @@
 #include 
 #include 
 
-static spinlock_t slice_convert_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(slice_convert_lock);
 
 
 #ifdef DEBUG
diff --git a/drivers/char/watchdog/bfin_wdt.c b/drivers/char/watchdog/bfin_wdt.c
index 309d279..31dc7a6 100644
--- a/drivers/char/watchdog/bfin_wdt.c
+++ b/drivers/char/watchdog/bfin_wdt.c
@@ -71,7 +71,7 @@ static int nowayout = WATCHDOG_NOWAYOUT;
 static struct watchdog_info bfin_wdt_info;
 static unsigned long open_check;
 static char expect_close;
-static spinlock_t bfin_wdt_spinlock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(bfin_wdt_spinlock);
 
 /**
  * bfin_wdt_keepalive - Keep the Userspace Watchdog Alive
diff --git a/drivers/ieee1394/ieee1394_core.c b/drivers/ieee1394/ieee1394_core.c
index 98fd985..36c747b 100644
--- a/drivers/ieee1394/ieee1394_core.c
+++ b/drivers/ieee1394/ieee1394_core.c
@@ -488,7 +488,7 @@ void hpsb_selfid_complete(struct hpsb_host *host, int phyid, int isroot)
highlevel_host_reset(host);
 }
 
-static spinlock_t pending_packets_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(pending_packets_lock);
 
 /**
  * hpsb_packet_sent - notify core of sending a packet
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 83e76b3..94fd78f 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -15,9 +15,9 @@
 #include "sysfs.h"
 
 DEFINE_MUTEX(sysfs_mutex);
-spinlock_t sysfs_assoc_lock = SPIN_LOCK_UNLOCKED;
+DEFINE_SPINLOCK(sysfs_assoc_lock);
 
-static spinlock_t sysfs_ino_lock = SPIN_LOCK_UNLOCKED;
+static DEFINE_SPINLOCK(sysfs_ino_lock);
 static DEFINE_IDA(sysfs_ino_ida);
 
 /**
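
The changelog above is terse, so here is a userspace mock of what the
conversion buys, as far as I understand it: the old initializer gives
every lock the same debug name, while the DEFINE_ form names each lock
after its own variable, which is what lock debugging wants.  The MOCK_
macros and mock_spinlock_t are hypothetical; the shared name
old_style_spin_init is an assumption about how the old macro was
defined at the time.

```c
#include <string.h>

/* Mock lock type carrying only a state word and a debug name. */
typedef struct {
	int locked;
	const char *name;
} mock_spinlock_t;

#define MOCK_SPIN_LOCK_UNLOCKED_NAMED(n)  { 0, #n }

/* Old style: every lock shares one debug name (assumed name). */
#define MOCK_SPIN_LOCK_UNLOCKED \
	MOCK_SPIN_LOCK_UNLOCKED_NAMED(old_style_spin_init)

/* New style: the lock is named after the variable being defined. */
#define MOCK_DEFINE_SPINLOCK(x) \
	mock_spinlock_t x = MOCK_SPIN_LOCK_UNLOCKED_NAMED(x)

static mock_spinlock_t old_a = MOCK_SPIN_LOCK_UNLOCKED;
static mock_spinlock_t old_b = MOCK_SPIN_LOCK_UNLOCKED;
static MOCK_DEFINE_SPINLOCK(new_a);
static MOCK_DEFINE_SPINLOCK(new_b);

/* Returns 1 when two locks are indistinguishable by debug name. */
static int mock_same_name(const mock_spinlock_t *a, const mock_spinlock_t *b)
{
	return strcmp(a->name, b->name) == 0;
}
```

Here old_a and old_b cannot be told apart by name, while new_a and new_b
can.  Note that the static qualifier composes with the macro exactly as
in the hunks above.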


Re: NO_HZ hangs up AMD MK-36

2007-09-27 Thread Dmitry Tyschenko
2007/9/28, Thomas Gleixner <[EMAIL PROTECTED]>:
> On Fri, 2007-09-28 at 00:01 +0300, Dmitry Tyschenko wrote:
> > Sorry, I am a newbie in Linux. I hope you were talking about:
> > /boot/vmlinuz-2.6.22-1-k7 root=/dev/sda5 ro nohz=off
>
> Yes.
>
> > But it doesn't help for Debian's 2.6.22-1 (I don't have another
> > prebuilt kernel); still the same problems.
>
> Can you please add: nolapic_timer instead ?
>
> Thanks,
>
> tglx

It works with nolapic_timer!
Thank you!


Re: Problems with SMP & ACPI powering off

2007-09-27 Thread Mark Lord

Mark Lord wrote:

Question:  do we disable all CPUs except 0 when doing ACPI power off?

Background:
I have a machine here dedicated to running MythTV.
It powers up to record, and then sets the RTC alarm for next time
and powers down again in between recordings.

It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard.
Previously it was on a completely different (vendor,bios,...) ICH7 motherboard.


In both cases, "halt -p" sometimes fails to actually turn off the power,
which means that it later then fails to "turn on" to record again.

Annoying.

This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support enabled.


So I'm wondering if it may be due to the old SMP-poweroff bogeyman ?

For now, I've hardcoded a cpu_down(1) into the poweroff code,
and we'll see if that helps or is merely redundant.

But I do wonder where else to look for a cause?

Two different boards, vendors, BIOSs, same CPU chip.  Same problem.


Oh, and two different power-supplies, too.

-ml


Problems with SMP & ACPI powering off

2007-09-27 Thread Mark Lord

Question:  do we disable all CPUs except 0 when doing ACPI power off?

Background:
I have a machine here dedicated to running MythTV.
It powers up to record, and then sets the RTC alarm for next time
and powers down again in between recordings.

It has an Intel Core2duo E6300 CPU, currently on an ICH8 motherboard.
Previously it was on a completely different (vendor,bios,...) ICH7 motherboard.

In both cases, "halt -p" sometimes fails to actually turn off the power,
which means that it later then fails to "turn on" to record again.

Annoying.

This is a 32-bit kernel/runtime, with full ACPI (not APM) kernel support enabled.

So I'm wondering if it may be due to the old SMP-poweroff bogeyman ?

For now, I've hardcoded a cpu_down(1) into the poweroff code,
and we'll see if that helps or is merely redundant.

But I do wonder where else to look for a cause?

Two different boards, vendors, BIOSs, same CPU chip.  Same problem.




Re: [RFC][PATCH] make reiserfs stop using 'struct file' for internal xattr operations

2007-09-27 Thread Dave Hansen
On Thu, 2007-09-27 at 22:04 +0100, Christoph Hellwig wrote:
> On Thu, Sep 27, 2007 at 01:53:39PM -0700, Dave Hansen wrote:
> > -int reiserfs_commit_write(struct file *f, struct page *page,
> > - unsigned from, unsigned to);
> > -int reiserfs_prepare_write(struct file *f, struct page *page,
> > -  unsigned from, unsigned to);
> > +int reiserfs_commit_write(struct page *page, unsigned from, unsigned to);
> > +int reiserfs_prepare_write(struct page *page, unsigned from, unsigned to);
> 
> I doubt this will work.  These are also used for the ->prepare_write
> and ->commit_write aops, and the method signature definitively wants
> a file there, even if it's zero..

Oddly enough, I don't see those functions being used in aops:

const struct address_space_operations reiserfs_address_space_operations = {
.writepage = reiserfs_writepage,
.readpage = reiserfs_readpage,
.readpages = reiserfs_readpages,
.releasepage = reiserfs_releasepage,
.invalidatepage = reiserfs_invalidatepage,
.sync_page = block_sync_page,
.write_begin = reiserfs_write_begin,
.write_end = reiserfs_write_end,
.bmap = reiserfs_aop_bmap,
.direct_IO = reiserfs_direct_IO,
.set_page_dirty = reiserfs_set_page_dirty,
};

Plus, reiserfs seems to compile with that patch I just sent.  Sure as
heck surprised me.

-- Dave



Re: [PATCH] bw-qcam: use data_reverse instead of manually poking the control register

2007-09-27 Thread Alan Cox
On Thu, 27 Sep 2007 12:28:31 -0700
"Brett Warden" <[EMAIL PROTECTED]> wrote:

> Fixes use of parport_write_control() to match the newer interface that
> requires explicit parport_data_reverse() and parport_data_forward()
> calls. This eliminates the following error message and restores the
> original intended behavior:

Looks good

> parport0 (bw-qcam): use data_reverse for this!
> 
> Also increases threshold in qc_detect() from 300 to 400, as my camera
> often results in a count of approx 330. Added a kernel error message
> to indicate detection failure.

Likewise

> Signed-off-by: Brett T. Warden <[EMAIL PROTECTED]>

Acked-by: Alan Cox <[EMAIL PROTECTED]>


Re: kernel Oops in ext3 code

2007-09-27 Thread Mingming Cao
Hi,
Could you please send the objdump of the ext4_discard_reservation
function? It doesn't match what I see here.

Thanks,
Mingming

On Thu, 2007-09-27 at 12:31 +0200, [EMAIL PROTECTED]
wrote:
> Hi all!
> 
> (Please Cc)
> 
> kernel 2.6.23-rc6
> Debian/sid
> 
> kernel ooops:
> 
> BUG: unable to handle kernel paging request at virtual address 104b
>  printing eip:
>  c0195bd3
>  *pde = 
>  Oops:  [#1]
>  PREEMPT SMP 
>  Modules linked in: vboxdrv binfmt_misc fuse coretemp hwmon gspca videodev v4l2_common v4l1_compat iwl3945 mac80211 tifm_7xx1 tifm_core joydev irda crc_ccitt 8250_pnp 8250 serial_core firewire_ohci firewire_core crc_itu_t
>  CPU:0
>  EIP:0060:[]Not tainted VLI
>  EFLAGS: 00010206   (2.6.23-rc6 #1)
>  EIP is at ext3_discard_reservation+0x18/0x4d
>  eax: dff23800   ebx: 1033   ecx: dfc15ec0   edx: 
>  esi: c0007c44   edi: 1033   ebp: dfc2bef4   esp: dfc2beac
>  ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
>  Process kswapd0 (pid: 261, ti=dfc2a000 task=dfcac570 task.ti=dfc2a000)
>  Stack: c0007ba4 c0007c44 1033 c019ec51 c0007c44 c0007d8c 002c 
> c0171b1b 
> 002c c0007c44 c0007c4c c0171da2 c050880c  0080 
> 0080 
> c0171fb8 0080 c0007e48 df9e3910 7404 c03f5634 0080 
> 00d0 
>  Call Trace:
>   [] ext3_clear_inode+0x5d/0x76
>   [] clear_inode+0x6b/0xb9
>   [] dispose_list+0x48/0xc9
>   [] shrink_icache_memory+0x195/0x1bd
>   [] shrink_slab+0xe2/0x159
>   [] kswapd+0x2d3/0x431
>   [] autoremove_wake_function+0x0/0x33
>   [] kswapd+0x0/0x431
>   [] kthread+0x38/0x5d
>   [] kthread+0x0/0x5d
>   [] kernel_thread_helper+0x7/0x10
>   ===
>  Code: 83 f8 01 19 c0 f7 d0 83 e0 08 89 42 0c 89 56 b4 5b 5e c3 57 56 89 c6 
> 53 8b 58 b4 8b 80 a4 00 00 00 85 db 8b 80 78 01 00 00 74 30 <83> 7b 18 00 74 
> 2a 8d b8 00 03 00 00 89 f8 e8 b8 ca 1a 00 83 7b 
>  EIP: [] ext3_discard_reservation+0x18/0x4d SS:ESP 0068:dfc2beac
> 
> 
> Sysrq did work, so the oops was saved. Good.
> 
> Any ideas?
> 
> Best wishes
> 
> Norbert
> 
> ---
> Dr. Norbert Preining <[EMAIL PROTECTED]>    Vienna University of Technology
> Debian Developer <[EMAIL PROTECTED]>        Debian TeX Group
> gpg DSA: 0x09C5B094  fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
> ---
> As he came into the light they could see his black and
> gold uniform on which the buttons were so highly polished
> that they shone with an intensity that would have made an
> approaching motorist flash his lights in annoyance.
>  --- Douglas Adams, The Hitchhikers Guide to the Galaxy



[RFC] New kernel-message logging API (take 2)

2007-09-27 Thread Vegard Nossum
Hello,

A big thanks to everybody who read and replied to my first e-mail; I
have tried my best to incorporate your feedback and suggestions. I
also added some CCs who recently participated in logging-related
discussions.


Changes (since Sept. 22):

  * Extensibility -> Allowing the compiler to eliminate messages below
a certain threshold requires changing the API.
  * Add some special-purpose logging functions
(printk_detected(), _registered(), _settings(), and _copyright())
  * Fine-grained log-level control. "Everything above" or "everything
below" can be emulated by turning the specific log-levels on or off.
  * Define an extra header containing the (optional) secondary
interface (err()/warn()/info())
  * Remove kprint_*() aliases.
  * kprint_() is better than kprint( CONFIG_KPRINT_LOGLEVEL_MAX) { \
kprint_real_block_init(block, loglevel);

#define kprint_block(block, fmt, ...)   \
kprint_real_block(block, fmt, ## __VA_ARGS__);

#define kprint_block_flush(block)   \
kprint_real_block_flush(block); \
}

/* Thus, this C code: */
kprint_block_init(, KPRINT_INFO);
kprint_block(, "Hello world");
kprint_block_flush();

/* Would pre-process into this: */
if(6 < 4) {
kprint_real_block_init(, 6);
kprint_real_block(, "Hello world");
kprint_block_flush();
}
}
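
A minimal userspace sketch of the same idea, for illustration only. It is
not the proposed kprint API: every name here (sketch_print, sketch_real,
SKETCH_LOGLEVEL_MAX, the SKETCH_* levels) is hypothetical, and it assumes
the usual kernel convention that a lower number means a more severe
message. Because the level and the threshold are both compile-time
constants, an optimizing compiler can drop the call, and the format
string with it, for any message above the threshold.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_KPRINT_LOGLEVEL_MAX. */
#define SKETCH_LOGLEVEL_MAX 4

/* Hypothetical levels, numbered like the kernel's (lower = more severe). */
enum { SKETCH_ERR = 3, SKETCH_WARNING = 4, SKETCH_INFO = 6 };

/* Counts the messages that actually reached the backend. */
static int sketch_emitted;

/* Backend: formats and prints the message with a "<level>" prefix. */
static void sketch_real(int level, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	printf("<%d>", level);
	vprintf(fmt, ap);
	va_end(ap);
	sketch_emitted++;
}

/*
 * Front end: the comparison involves only constants, so the compiler
 * can discard the whole statement when the level is above the maximum.
 * The do/while wrapper keeps the macro safe inside if/else bodies.
 */
#define sketch_print(level, ...)				\
	do {							\
		if ((level) <= SKETCH_LOGLEVEL_MAX)		\
			sketch_real((level), __VA_ARGS__);	\
	} while (0)
```

With the threshold at 4, sketch_print(SKETCH_INFO, "Hello world\n") never
reaches sketch_real(), while sketch_print(SKETCH_ERR, "disk %d failed\n", 0)
does. Wrapping each call in do/while also avoids the unbalanced braces
that the three-macro block form depends on.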


References

[1] http://lkml.org/lkml/2007/9/21/267 (Joe Perches)
[2] http://lkml.org/lkml/2007/9/20/352 (Rob Landley)
[3] http://lkml.org/lkml/2007/9/21/151 (Dick Streefland)
[4] http://lkml.org/lkml/2007/6/13/146 (Michael Holzheu)
[5] http://lkml.org/lkml/2007/9/24/320 (Jesse Barnes)
[6] http://lkml.org/lkml/2007/9/22/162 (Miguel Ojeda)
[7] http://lkml.org/lkml/2007/9/25/62 (Vegard Nossum)
[8] http://lkml.org/lkml/2007/9/22/157 (Joe Perches)


[PATCH] kswapd should only wait on IO if there is IO

2007-09-27 Thread Rik van Riel
The current kswapd (and try_to_free_pages) code has an oddity where the
code will wait on IO, even if there is no IO in flight.  This problem is
notable especially when the system scans through many unfreeable pages,
causing unnecessary stalls in the VM.

Signed-off-by: Rik van Riel <[EMAIL PROTECTED]>

diff -up linux-2.6.22.x86_64/mm/vmscan.c.wait linux-2.6.22.x86_64/mm/vmscan.c
--- linux-2.6.22.x86_64/mm/vmscan.c.wait 2007-09-25 11:33:30.0 -0400
+++ linux-2.6.22.x86_64/mm/vmscan.c 2007-09-25 21:27:08.0 -0400
@@ -68,6 +68,8 @@ struct scan_control {
int all_unreclaimable;
 
int order;
+
+   int nr_io_pages;
 };
 
 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
@@ -489,8 +491,10 @@ static unsigned long shrink_page_list(st
 */
if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
wait_on_page_writeback(page);
-   else
+   else {
+   sc->nr_io_pages++;
goto keep_locked;
+   }
}
 
referenced = page_referenced(page, 1);
@@ -541,8 +545,10 @@ static unsigned long shrink_page_list(st
case PAGE_ACTIVATE:
goto activate_locked;
case PAGE_SUCCESS:
-   if (PageWriteback(page) || PageDirty(page))
+   if (PageWriteback(page) || PageDirty(page)) {
+   sc->nr_io_pages++;
goto keep;
+   }
/*
 * A synchronous write - probably a ramdisk.  Go
 * ahead and try to reclaim the page.
@@ -1201,6 +1207,7 @@ unsigned long try_to_free_pages(struct z
 
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
sc.nr_scanned = 0;
+   sc.nr_io_pages = 0;
if (!priority)
disable_swap_token();
nr_reclaimed += shrink_zones(priority, zones, &sc);
@@ -1229,7 +1236,8 @@ unsigned long try_to_free_pages(struct z
}
 
/* Take a nap, wait for some writeback to complete */
-   if (sc.nr_scanned && priority < DEF_PRIORITY - 2)
+   if (sc.nr_scanned && priority < DEF_PRIORITY - 2 &&
+   sc.nr_io_pages > sc.swap_cluster_max)
congestion_wait(WRITE, HZ/10);
}
/* top priority shrink_caches still had more to do? don't OOM, then */
@@ -1315,6 +1323,7 @@ loop_again:
if (!priority)
disable_swap_token();
 
+   sc.nr_io_pages = 0;
all_zones_ok = 1;
 
/*
@@ -1398,7 +1407,8 @@ loop_again:
 * OK, kswapd is getting into trouble.  Take a nap, then take
 * another pass across the zones.
 */
-   if (total_scanned && priority < DEF_PRIORITY - 2)
+   if (total_scanned && priority < DEF_PRIORITY - 2 &&
+   sc.nr_io_pages > sc.swap_cluster_max)
congestion_wait(WRITE, HZ/10);
 
/*
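
For readers who want the new condition in isolation, here is a small
userspace sketch of the decision the patch makes. It is an illustration,
not kernel code: struct scan_sketch and should_wait_for_io() are
hypothetical names, the struct carries only the fields the test needs,
and SKETCH_DEF_PRIORITY is assumed to match the kernel's DEF_PRIORITY
of 12.

```c
#include <stdbool.h>

/* Assumed to mirror DEF_PRIORITY in mm/vmscan.c. */
#define SKETCH_DEF_PRIORITY 12

/* Hypothetical, cut-down stand-in for struct scan_control. */
struct scan_sketch {
	unsigned long nr_scanned;	/* pages scanned in this pass */
	int nr_io_pages;		/* pages skipped: dirty or under writeback */
	int swap_cluster_max;		/* reclaim batch size */
};

/*
 * Returns true when the reclaim loop should nap in congestion_wait():
 * something was scanned, the scan priority has already dropped below
 * DEF_PRIORITY - 2, and more than one batch worth of pages is actually
 * waiting on IO.
 */
static bool should_wait_for_io(const struct scan_sketch *sc, int priority)
{
	return sc->nr_scanned != 0 &&
	       priority < SKETCH_DEF_PRIORITY - 2 &&
	       sc->nr_io_pages > sc->swap_cluster_max;
}
```

Before the patch only the first two conditions applied, so a pass over
many unfreeable pages with nr_io_pages == 0 would still sleep for HZ/10.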



Re: NO_HZ hangs up AMD MK-36

2007-09-27 Thread Thomas Gleixner
On Fri, 2007-09-28 at 00:01 +0300, Dmitry Tyschenko wrote:
> Sorry, I am a newbie in Linux. I hope you were talking about:
> /boot/vmlinuz-2.6.22-1-k7 root=/dev/sda5 ro nohz=off

Yes.

> But it doesn't help for Debian's 2.6.22-1 (I don't have another
> prebuilt kernel); still the same problems.

Can you please add: nolapic_timer instead ?

Thanks,

tglx




Re: NO_HZ hangs up AMD MK-36

2007-09-27 Thread Rafael J. Wysocki
Please don't top post.

On Thursday, 27 September 2007 23:01, Dmitry Tyschenko wrote:
> Sorry, I am a newbie in Linux. I hope you were talking about:
> /boot/vmlinuz-2.6.22-1-k7 root=/dev/sda5 ro nohz=off

Yes.

> But it doesn't help for Debian's 2.6.22-1 (I don't have another
> prebuilt kernel); still the same problems.

So, you need to explicitly unset NO_HZ in the kernel configuration to make
things work.

Well, in that case please wait until the 2.6.23 kernel is out and test it.
There will be some important fixes related to NO_HZ in it.

Greetings,
Rafael


> 2007/9/27, Rafael J. Wysocki <[EMAIL PROTECTED]>:
> > On Thursday, 27 September 2007 22:28, Dmitry Tyschenko wrote:
> > > Hello,
> > >
> > > I have an Asus X50M laptop, running an old Debian Etch from February.
> > > Kernels from 2.6.21 onward don't boot; they hang 10 seconds to 1 minute
> > > after the GRUB screen.
> > > I have tried different versions of gcc (4.1.1, 4.1.2, 4.2.1) to build
> > > the 2.6.22.8 kernel, with no success.
> > > But if I disable the NO_HZ option, 2.6.21 works fine for me.
> > >
> > > I think this is an important problem, because some projects, Debian
> > > for example, build their kernels with this option enabled (it is
> > > enabled in the linux-image-2.6.22-1-k7 package), and some people,
> > > like me, cannot use new kernels.
> > >
> > > I have attached some of my PC info; I hope it helps.
> >
> > You can use the "nohz=off" kernel command line switch.  Please check if it
> > works for you.
> >
> > Greetings,
> > Rafael
> >
> 
> 

-- 
"Premature optimization is the root of all evil." - Donald Knuth

