Re: [PATCH 6/10] ACPI: register ACPI Video LCD as generic thermal cooling device
Hi, Matthew,

On Fri, 2008-01-18 at 09:42 +0800, Matthew Garrett wrote:
> On Fri, Jan 18, 2008 at 09:31:40AM +0800, Zhang Rui wrote:
>
> > Just like I don't think lcd should be used for ACPI thermal management
> > before I saw it is listed in _TZD and intel_menlow requires to throttle
> > it when overheating, why not let the individual drivers implement the
> > callbacks if there is clearly a request to do this.
> > And we can add this to the generic acpi_device struct then if this is a
> > common feature for all ACPI devices.
>
> It'll probably never be common for all ACPI devices,

I agree.

> but it's already
> required for three types. I think that's a strong argument for making
> it generic.

I don't think it's worth doing this as it's only the common feature for
three ACPI devices.

> > Well, you're right.
> > But in order to throttle the lcd, this is reasonable, right?
>
> Moving the common code into its own routine and then calling that from
> each of the others would probably work.

Yes. I can send an on top patch if the patch "Rationalise ACPI backlight
implementation" is applied.

Thanks,
Rui

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 6/6] spi : use class iteration api
On Monday 21 January 2008, Dave Young wrote:
> Convert to use the class iteration api.
>
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>

Ack.

> ---
>  drivers/spi/spi.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff -upr linux/drivers/spi/spi.c linux.new/drivers/spi/spi.c
> --- linux/drivers/spi/spi.c	2008-01-22 15:09:49.0 +0800
> +++ linux.new/drivers/spi/spi.c	2008-01-22 15:09:49.0 +0800
> @@ -485,6 +485,15 @@ void spi_unregister_master(struct spi_ma
>  }
>  EXPORT_SYMBOL_GPL(spi_unregister_master);
>  
> +static int __spi_master_match(struct device *dev, void *data)
> +{
> +	struct spi_master *m;
> +	u16 *bus_num = data;
> +
> +	m = container_of(dev, struct spi_master, dev);
> +	return m->bus_num == *bus_num;
> +}
> +
>  /**
>   * spi_busnum_to_master - look up master associated with bus_num
>   * @bus_num: the master's bus number
> @@ -499,17 +508,12 @@ struct spi_master *spi_busnum_to_master(
>  {
>  	struct device *dev;
>  	struct spi_master *master = NULL;
> -	struct spi_master *m;
>  
> -	down(&spi_master_class.sem);
> -	list_for_each_entry(dev, &spi_master_class.children, node) {
> -		m = container_of(dev, struct spi_master, dev);
> -		if (m->bus_num == bus_num) {
> -			master = spi_master_get(m);
> -			break;
> -		}
> -	}
> -	up(&spi_master_class.sem);
> +	dev = class_find_device(&spi_master_class, &bus_num,
> +				__spi_master_match);
> +	if (dev)
> +		master = container_of(dev, struct spi_master, dev);
> +	/* reference got in class_find_device */
>  	return master;
>  }
>  EXPORT_SYMBOL_GPL(spi_busnum_to_master);
>
Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class
On 22-01-2008 01:55, Dave Young wrote:
...
> Hi, thanks your effort. Now I think we should stop this thread and
> waiting the class_device going away :)

Sure! But, if you change your mind I'm interested in this subject.

Thanks,
Jarek P.
Re: [PATCH 1/6] driver-core : add class iteration api
On Mon, Jan 21, 2008 at 10:24:17PM -0800, David Brownell wrote: > On Monday 21 January 2008, Dave Young wrote: > > > > +/** > > + * class_for_each_device - device iterator > > + * @class: the class we're iterating > > + * @data: data for the callback > > + * @fn: function to be called for each device > > + * > > + * Iterate over @class's list of devices, and call @fn for each, > > + * passing it @data. > > + * > > + * We check the return of @fn each time. If it returns anything > > + * other than 0, we break out and return that value. > > I have a suggestion for better documentation, which > applies to all these utilities: > > > > + */ > > +int class_for_each_device(struct class *class, void *data, > > + int (*fn)(struct device *, void *)) > > +{ > > + struct device *dev; > > + int error = 0; > > + > > + if (!class) > > + return -EINVAL; > > + down(&class->sem); > > + list_for_each_entry(dev, &class->devices, node) { > > + dev = get_device(dev); > > + if (dev) { > > + error = fn(dev, data); > > This is called with class->sem held. So fn() has a > constraint to not re-acquire that ... else it'd be > self-deadlocking. I'd like to see docs at least > mention that; calls to add or remove class members > would be verboten, for example, which isn't an issue > with most other driver model iterators. > > > > + put_device(dev); > > + } else > > + error = -ENODEV; > > + if (error) > > + break; > > + } > > + up(&class->sem); > > + > > + return error; > > +} > > +EXPORT_SYMBOL_GPL(class_for_each_device); Update kerneldoc as david brownell's sugestion. Is it right for me add Cornelia Huck's ack after this change? 
--- Add the following class iteration functions for driver use: class_for_each_device class_find_device class_for_each_child class_find_child Signed-off-by: Dave Young <[EMAIL PROTECTED]> Acked-by: Cornelia Huck <[EMAIL PROTECTED]> --- drivers/base/class.c | 175 + include/linux/device.h | 11 ++- 2 files changed, 184 insertions(+), 2 deletions(-) diff -upr linux/drivers/base/class.c linux.new/drivers/base/class.c --- linux/drivers/base/class.c 2008-01-22 15:06:55.0 +0800 +++ linux.new/drivers/base/class.c 2008-01-22 15:06:55.0 +0800 @@ -798,6 +798,181 @@ void class_device_put(struct class_devic kobject_put(&class_dev->kobj); } +/** + * class_for_each_device - device iterator + * @class: the class we're iterating + * @data: data for the callback + * @fn: function to be called for each device + * + * Iterate over @class's list of devices, and call @fn for each, + * passing it @data. + * + * We check the return of @fn each time. If it returns anything + * other than 0, we break out and return that value. + * + * Note, we hold class->sem in this function, so it can not be + * re-acquired in @fn, otherwise it will self-deadlocking. For + * example, calls to add or remove class members would be verboten. 
+ */ +int class_for_each_device(struct class *class, void *data, + int (*fn)(struct device *, void *)) +{ + struct device *dev; + int error = 0; + + if (!class) + return -EINVAL; + down(&class->sem); + list_for_each_entry(dev, &class->devices, node) { + dev = get_device(dev); + if (dev) { + error = fn(dev, data); + put_device(dev); + } else + error = -ENODEV; + if (error) + break; + } + up(&class->sem); + + return error; +} +EXPORT_SYMBOL_GPL(class_for_each_device); + +/** + * class_find_device - device iterator for locating a particular device + * @class: the class we're iterating + * @data: data for the match function + * @match: function to check device + * + * This is similar to the class_for_each_dev() function above, but it + * returns a reference to a device that is 'found' for later use, as + * determined by the @match callback. + * + * The callback should return 0 if the device doesn't match and non-zero + * if it does. If the callback returns non-zero, this function will + * return to the caller and not iterate over any more devices. + + * Note, you will need to drop the reference with put_device() after use. + * + * We hold class->sem in this function, so it can not be + * re-acquired in @match, otherwise it will self-deadlocking. For + * example, calls to add or remove class members would be verboten. + */ +struct device *class_find_device(struct class *class, void *data, + int (*match)(struct device *, void *)) +{ + struct device *dev; + int found = 0; + + if (!class) + return NULL; + + down(&class->sem); + list_for_each_entry(dev, &c
Re: [PATCH 0/6] RFC: Typesafe callbacks
On Tuesday 22 January 2008 10:57:03 Linus Torvalds wrote:
> On Tue, 22 Jan 2008, Rusty Russell wrote:
> > Attempt to create callbacks which take unsigned long as well as
> > correct pointer types.
>
> I bow down before you.
>
> I thought I had done some rather horrible things with gcc built-ins and
> macros, but I hereby hand over my crown to you.
>
> As my daughter would say: that patch fell out of the ugly tree, and hit
> every branch on the way down. Very impressive.
>
> All hail Rusty, undisputed ruler of Ugly-land.

Err, thanks. I read some old SCSI drivers and felt inspired...

> Side note: can you verify that __builtin_choose_expr() exists in gcc-3? I
> don't think we've relied on it before except on arm, and that one has
> always had its own compiler version dependencies..

Hmm, looks like not in 3.0.4, is in 3.1.1. I'll make it appropriately
#ifdef'ed (which as a bonus will make things that little bit uglier still...)

If we can stomach it the effect is nice, but the version which simply allows
pointer correctness (rather than trying to do unsigned long too) is less
bletcherous.

Thanks,
Rusty.
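For readers who have not met the built-ins discussed above, here is a minimal sketch of how `__builtin_choose_expr()` and `__builtin_types_compatible_p()` combine to pick an expression at compile time based on an argument's type. It is not taken from Rusty's patch; the `ARG_KIND` macro and the two helper functions are hypothetical names made up for illustration, and the code assumes a GCC-compatible compiler.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical illustration, not code from the patch under discussion.
 * __builtin_types_compatible_p() is a compile-time constant, and
 * __builtin_choose_expr() keeps only the selected branch, so the two
 * branches may even have different types.
 */
#define ARG_KIND(x)							\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(x), unsigned long), \
		"unsigned long", "something else")

/* An unsigned long argument selects the first branch. */
static const char *kind_of_ulong(void)
{
	unsigned long v = 0;

	(void)v;
	return ARG_KIND(v);
}

/* A pointer argument selects the second branch. */
static const char *kind_of_pointer(void)
{
	int *p = 0;

	(void)p;
	return ARG_KIND(p);
}
```

A typesafe-callback macro could presumably use the same selection to accept either an `unsigned long` or a correctly typed pointer, which is the effect described in the cover letter; how the actual patch does it is not shown in this thread.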
Re: [PATCH 6/6] spi : use class iteration api
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]>

---
 drivers/spi/spi.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff -upr linux/drivers/spi/spi.c linux.new/drivers/spi/spi.c
--- linux/drivers/spi/spi.c	2008-01-22 15:09:49.0 +0800
+++ linux.new/drivers/spi/spi.c	2008-01-22 15:09:49.0 +0800
@@ -485,6 +485,15 @@ void spi_unregister_master(struct spi_ma
 }
 EXPORT_SYMBOL_GPL(spi_unregister_master);
 
+static int __spi_master_match(struct device *dev, void *data)
+{
+	struct spi_master *m;
+	u16 *bus_num = data;
+
+	m = container_of(dev, struct spi_master, dev);
+	return m->bus_num == *bus_num;
+}
+
 /**
  * spi_busnum_to_master - look up master associated with bus_num
  * @bus_num: the master's bus number
@@ -499,17 +508,12 @@ struct spi_master *spi_busnum_to_master(
 {
 	struct device *dev;
 	struct spi_master *master = NULL;
-	struct spi_master *m;
 
-	down(&spi_master_class.sem);
-	list_for_each_entry(dev, &spi_master_class.children, node) {
-		m = container_of(dev, struct spi_master, dev);
-		if (m->bus_num == bus_num) {
-			master = spi_master_get(m);
-			break;
-		}
-	}
-	up(&spi_master_class.sem);
+	dev = class_find_device(&spi_master_class, &bus_num,
+				__spi_master_match);
+	if (dev)
+		master = container_of(dev, struct spi_master, dev);
+	/* reference got in class_find_device */
 	return master;
 }
 EXPORT_SYMBOL_GPL(spi_busnum_to_master);
Re: [RFC] Parallelize IO for e2fsck
On Jan 22, 2008 14:38 +1100, David Chinner wrote:
> On Mon, Jan 21, 2008 at 04:00:41PM -0700, Andreas Dilger wrote:
> > I discussed this with Ted at one point also. This is a generic problem,
> > not just for readahead, because "fsck" can run multiple e2fsck in parallel
> > and in case of many large filesystems on a single node this can cause
> > memory usage problems also.
> >
> > What I was proposing is that "fsck.{fstype}" be modified to return an
> > estimated minimum amount of memory needed, and some "desired" amount of
> > memory (i.e. readahead) to fsck the filesystem, using some parameter like
> > "fsck.{fstype} --report-memory-needed /dev/XXX". If this does not
> > return the output in the expected format, or returns an error then fsck
> > will assume some amount of memory based on the device size and continue
> > as it does today.
>
> And while fsck is running, some other program runs that uses
> memory and blows your carefully calculated parameters to smithereens?

Well, fsck has a rather restricted working environment, because it is
run before most other processes start (i.e. single-user mode). For fsck
initiated by an admin in other runlevels the admin would need to specify
the upper limit of memory usage. My proposal was only for the single-user
fsck at boot time.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Re: [RFC] Parallelize IO for e2fsck
On Jan 21, 2008 23:17 -0500, [EMAIL PROTECTED] wrote:
> On Tue, 22 Jan 2008 14:38:30 +1100, David Chinner said:
> > Perhaps instead of swapping immediately, a SIGLOWMEM could be sent
> > to a processes that aren't masking the signal followed by a short
> > grace period to allow the processes to free up some memory before
> > swapping out pages from that process?
>
> AIX had SIGDANGER some 15 years ago. Admittedly, that was sent when
> the system was about to hit OOM, not when it was about to start swapping.

I'd tried to advocate SIGDANGER some years ago as well, but none of
the kernel maintainers were interested. It definitely makes sense
to have some sort of mechanism like this. At the time I first brought
it up it was in conjunction with Netscape using too much cache on some
system, but it would be just as useful for all kinds of other
memory-hungry applications.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Re: [PATCH 6/6] spi : use class iteration api
On Jan 22, 2008 2:56 PM, David Brownell <[EMAIL PROTECTED]> wrote:
> On Monday 21 January 2008, Dave Young wrote:
> > +static int __spi_master_match(struct device *dev, void *data)
> > +{
> > +	struct spi_master *m;
> > +	u16 *bus_num = (u16 *)data;
>
> That's "void *data" so "u16 *bus_num = data" is preferred.
>

Fine.
Re: [PATCH 6/6] spi : use class iteration api
On Monday 21 January 2008, Dave Young wrote:
> +static int __spi_master_match(struct device *dev, void *data)
> +{
> +	struct spi_master *m;
> +	u16 *bus_num = (u16 *)data;

That's "void *data" so "u16 *bus_num = data" is preferred.
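The review comment above relies on a C language rule: a `void *` converts implicitly to any object pointer type, so the cast adds nothing. A small userspace sketch of a match callback written that way follows; `bus_num_matches` and its first parameter are hypothetical stand-ins, not the kernel's `__spi_master_match`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for a match callback that follows the
 * "void *data" convention. In C the assignment below needs no cast.
 */
static int bus_num_matches(uint16_t candidate, void *data)
{
	uint16_t *bus_num = data;	/* implicit conversion from void * */

	return candidate == *bus_num;
}
```

Calling `bus_num_matches(3, &want)` with `want` set to 3 returns 1, and any other candidate returns 0. The same assignment would need an explicit cast in C++, which is one reason the cast sometimes creeps into C code.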
Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
Hi Dave,

> > Add people missed in cc-list.
>
> Thanks Dave for your continued efforts on Bluetooth bugs like this.
>
> Marcel, are you going to review/ACK/integrate/push-upstream/whatever
> any of these Bluetooth patches?
>
> It hasn't been getting much love from you as of late, you are one of
> the listed maintainers, and I don't want to lose any of Dave's
> valuable bug fixing work.

I will be fully back in business next week. Just got stuck in a project
that needed 200% of my time to get it going.

> Or should I just handle it all directly?

I followed the list only a little bit, but from what I have seen, Dave
is doing a great job in tracking all issues down to the real cause.

I had a look at his last patch and after review, I agree that this is a
possible solution. I only have two nitpicks about the coding style. In
del_conn the struct device declaration should be made after the
struct hci_conn assignment from the container, and I would put an extra
empty line before the device_del, put_device block. Nitpicks only.

Right now I can't think of any side effects by this patch. Actually I
only see an improvement with this patch. So please take it directly and
starting with next week, I gonna make sure that they are handled again
properly by me.

Regards

Marcel
Re: [PATCH 1/6] driver-core : add class iteration api
On Jan 22, 2008 2:24 PM, David Brownell <[EMAIL PROTECTED]> wrote:
> On Monday 21 January 2008, Dave Young wrote:
> >
> > +/**
> > + * class_for_each_device - device iterator
> > + * @class: the class we're iterating
> > + * @data: data for the callback
> > + * @fn: function to be called for each device
> > + *
> > + * Iterate over @class's list of devices, and call @fn for each,
> > + * passing it @data.
> > + *
> > + * We check the return of @fn each time. If it returns anything
> > + * other than 0, we break out and return that value.
>
> I have a suggestion for better documentation, which
> applies to all these utilities:
>
>
> > + */
> > +int class_for_each_device(struct class *class, void *data,
> > +			   int (*fn)(struct device *, void *))
> > +{
> > +	struct device *dev;
> > +	int error = 0;
> > +
> > +	if (!class)
> > +		return -EINVAL;
> > +	down(&class->sem);
> > +	list_for_each_entry(dev, &class->devices, node) {
> > +		dev = get_device(dev);
> > +		if (dev) {
> > +			error = fn(dev, data);
>
> This is called with class->sem held. So fn() has a
> constraint to not re-acquire that ... else it'd be
> self-deadlocking. I'd like to see docs at least
> mention that; calls to add or remove class members
> would be verboten, for example, which isn't an issue
> with most other driver model iterators.

Very good comment, thanks david. I will update after a while.

>
>
> > +			put_device(dev);
> > +		} else
> > +			error = -ENODEV;
> > +		if (error)
> > +			break;
> > +	}
> > +	up(&class->sem);
> > +
> > +	return error;
> > +}
> > +EXPORT_SYMBOL_GPL(class_for_each_device);
>
Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
From: Marcel Holtmann <[EMAIL PROTECTED]>
Date: Tue, 22 Jan 2008 07:18:16 +0100

> Right now I can't think of any side effects by this patch. Actually I
> only see an improvement with this patch. So please take it directly and
> starting with next week, I gonna make sure that they are handled again
> properly by me.

Excellent, I'll do that. Thanks for the feedback.
Re: [PATCH 1/6] driver-core : add class iteration api
On Monday 21 January 2008, Dave Young wrote:
>
> +/**
> + * class_for_each_device - device iterator
> + * @class: the class we're iterating
> + * @data: data for the callback
> + * @fn: function to be called for each device
> + *
> + * Iterate over @class's list of devices, and call @fn for each,
> + * passing it @data.
> + *
> + * We check the return of @fn each time. If it returns anything
> + * other than 0, we break out and return that value.

I have a suggestion for better documentation, which
applies to all these utilities:

> + */
> +int class_for_each_device(struct class *class, void *data,
> +			   int (*fn)(struct device *, void *))
> +{
> +	struct device *dev;
> +	int error = 0;
> +
> +	if (!class)
> +		return -EINVAL;
> +	down(&class->sem);
> +	list_for_each_entry(dev, &class->devices, node) {
> +		dev = get_device(dev);
> +		if (dev) {
> +			error = fn(dev, data);

This is called with class->sem held. So fn() has a
constraint to not re-acquire that ... else it'd be
self-deadlocking. I'd like to see docs at least
mention that; calls to add or remove class members
would be verboten, for example, which isn't an issue
with most other driver model iterators.

> +			put_device(dev);
> +		} else
> +			error = -ENODEV;
> +		if (error)
> +			break;
> +	}
> +	up(&class->sem);
> +
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(class_for_each_device);
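The locking constraint raised above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct registry`, `struct item`, and every function below are hypothetical stand-ins for `struct class`, `struct device`, and the proposed iterators; a plain flag models `class->sem`, reference counting is left out, and where the real semaphore would block forever the model returns `-EDEADLK` so the problem is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct device and struct class. */
struct item {
	int id;
	struct item *next;
};

struct registry {
	struct item *head;
	int locked;	/* models class->sem: 1 while an iteration runs */
};

/* Models adding a class member, which needs the lock. */
static int registry_add(struct registry *r, struct item *it)
{
	if (r->locked)
		return -EDEADLK;	/* a real semaphore would block here */
	it->next = r->head;
	r->head = it;
	return 0;
}

/* Call fn on each item with the lock held; stop at the first non-zero. */
static int registry_for_each(struct registry *r, void *data,
			     int (*fn)(struct item *, void *))
{
	struct item *it;
	int error = 0;

	if (!r)
		return -EINVAL;
	r->locked = 1;
	for (it = r->head; it; it = it->next) {
		error = fn(it, data);
		if (error)
			break;
	}
	r->locked = 0;
	return error;
}

/* Return the first item for which match returns non-zero, or NULL. */
static struct item *registry_find(struct registry *r, void *data,
				  int (*match)(struct item *, void *))
{
	struct item *it, *found = NULL;

	if (!r)
		return NULL;
	r->locked = 1;
	for (it = r->head; it; it = it->next) {
		if (match(it, data)) {
			found = it;
			break;
		}
	}
	r->locked = 0;
	return found;
}

/* Well-behaved match callback: only inspects the item. */
static int match_id(struct item *it, void *data)
{
	int *id = data;

	return it->id == *id;
}

/* Badly behaved callback: tries to add a member during iteration. */
struct bad_ctx {
	struct registry *r;
	struct item extra;
};

static int bad_callback(struct item *it, void *data)
{
	struct bad_ctx *ctx = data;

	(void)it;
	return registry_add(ctx->r, &ctx->extra);
}
```

In this model `registry_find()` with `match_id` behaves like the lookups in the driver conversions, while passing `bad_callback` to `registry_for_each()` fails on the first item with `-EDEADLK`. That is the case the kerneldoc note is meant to warn about.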
[PATCH] pci-skeleton: Misc fixes to build neatly
Hello Jeff, The pci-skeleton.c has several problems with compilation, such as missing args when calling synchronize_irq(). Fix it. Signed-off-by: Jike Song <[EMAIL PROTECTED]> --- drivers/net/pci-skeleton.c | 49 ++- 1 files changed, 25 insertions(+), 24 deletions(-) diff --git a/drivers/net/pci-skeleton.c b/drivers/net/pci-skeleton.c index ed402e0..fffc49b 100644 --- a/drivers/net/pci-skeleton.c +++ b/drivers/net/pci-skeleton.c @@ -541,7 +541,7 @@ static void netdrv_hw_start (struct net_device *dev); #define NETDRV_W32_F(reg, val32) do { writel ((val32), ioaddr + (reg)); readl (ioaddr + (reg)); } while (0) -#if MMIO_FLUSH_AUDIT_COMPLETE +#ifdef MMIO_FLUSH_AUDIT_COMPLETE /* write MMIO register */ #define NETDRV_W8(reg, val8) writeb ((val8), ioaddr + (reg)) @@ -603,7 +603,7 @@ static int __devinit netdrv_init_board (struct pci_dev *pdev, return -ENOMEM; } SET_NETDEV_DEV(dev, &pdev->dev); - tp = dev->priv; + tp = netdev_priv(dev); /* enable device (incl. PCI PM wakeup), and bus-mastering */ rc = pci_enable_device (pdev); @@ -759,7 +759,7 @@ static int __devinit netdrv_init_one (struct pci_dev *pdev, return i; } - tp = dev->priv; + tp = netdev_priv(dev); assert (ioaddr != NULL); assert (dev != NULL); @@ -783,7 +783,7 @@ static int __devinit netdrv_init_one (struct pci_dev *pdev, dev->base_addr = (unsigned long) ioaddr; /* dev->priv/tp zeroed and aligned in alloc_etherdev */ - tp = dev->priv; + tp = netdev_priv(dev); /* note: tp->chipset set in netdrv_init_board */ tp->drv_flags = PCI_COMMAND_IO | PCI_COMMAND_MEMORY | @@ -841,7 +841,7 @@ static void __devexit netdrv_remove_one (struct pci_dev *pdev) assert (dev != NULL); - np = dev->priv; + np = netdev_priv(dev); assert (np != NULL); unregister_netdev (dev); @@ -974,7 +974,7 @@ static void mdio_sync (void *mdio_addr) static int mdio_read (struct net_device *dev, int phy_id, int location) { - struct netdrv_private *tp = dev->priv; + struct netdrv_private *tp = netdev_priv(dev); void *mdio_addr = tp->mmio_addr + Config4; 
int mii_cmd = (0xf6 << 10) | (phy_id << 5) | location; int retval = 0; @@ -1017,7 +1017,7 @@ static int mdio_read (struct net_device *dev, int phy_id, int location) static void mdio_write (struct net_device *dev, int phy_id, int location, int value) { - struct netdrv_private *tp = dev->priv; + struct netdrv_private *tp = netdev_priv(dev); void *mdio_addr = tp->mmio_addr + Config4; int mii_cmd = (0x5002 << 16) | (phy_id << 23) | (location << 18) | value; @@ -1060,7 +1060,7 @@ static void mdio_write (struct net_device *dev, int phy_id, int location, static int netdrv_open (struct net_device *dev) { - struct netdrv_private *tp = dev->priv; + struct netdrv_private *tp = netdev_priv(dev); int retval; #ifdef NETDRV_DEBUG void *ioaddr = tp->mmio_addr; @@ -1121,7 +1121,7 @@ static int netdrv_open (struct net_device *dev) /* Start the hardware at open or resume. */ static void netdrv_hw_start (struct net_device *dev) { - struct netdrv_private *tp = dev->priv; + struct netdrv_private *tp = netdev_priv(dev); void *ioaddr = tp->mmio_addr; u32 i; @@ -1191,7 +1191,7 @@ static void netdrv_hw_start (struct net_device *dev) /* Initialize the Rx and Tx rings, along with various 'dev' bits. 
*/ static void netdrv_init_ring (struct net_device *dev) { - struct netdrv_private *tp = dev->priv; + struct netdrv_private *tp = netdev_priv(dev); int i; DPRINTK ("ENTER\n"); @@ -1213,7 +1213,7 @@ static void netdrv_init_ring (struct net_device *dev) static void netdrv_timer (unsigned long data) { struct net_device *dev = (struct net_device *) data; - struct netdrv_private *tp = dev->priv; + struct netdrv_private *tp = netdev_priv(dev); void *ioaddr = tp->mmio_addr; int next_tick = 60 * HZ; int mii_lpa; @@ -1252,9 +1252,10 @@ static void netdrv_timer (unsigned long data) } -static void netdrv_tx_clear (struct netdrv_private *tp) +static void netdrv_tx_clear (struct net_device *dev) { int i; + struct netdrv_private *tp = netdev_priv(dev); atomic_set (&tp->cur_tx, 0); atomic_set (&tp->dirty_tx, 0); @@ -1278,7 +1279,7 @@ static void netdrv_tx_clear (struct netdrv_private *tp) static void netdrv_tx_timeout (struct net_device *dev) { - struct netdrv_private *tp = dev->priv; + struct netdrv_private *tp = netdev_priv(dev); void *ioaddr = tp->mmio_addr; int i; u8 tmp8; @@ -1311,7 +1312,7 @@ static void netdrv_tx_timeout (struct net_device *dev) /* Stop a shared interrupt from scavenging while we are. */
Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class
> > > > Hope the iteration patches 1-6/7 could be applied. > > Can you resend them again, and CC: me on all of them, with the latest > updates, so I know what I should be reviewing this time around? Hi, sent. > > thanks, > > greg k-h > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/6] driver-core : add class iteration api
On Jan 22, 2008 1:54 PM, Dave Young <[EMAIL PROTECTED]> wrote:
>
> Add the following class iteration functions for driver use:
>   class_for_each_device
>   class_find_device
>   class_for_each_child
>   class_find_child
>
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> Acked-by: Cornelia Huck <[EMAIL PROTECTED]>
[PATCH 6/6] spi : use class iteration api
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]>

---
 drivers/spi/spi.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff -upr linux/drivers/spi/spi.c linux.new/drivers/spi/spi.c
--- linux/drivers/spi/spi.c	2008-01-16 08:43:35.0 +0800
+++ linux.new/drivers/spi/spi.c	2008-01-16 08:43:35.0 +0800
@@ -485,6 +485,15 @@ void spi_unregister_master(struct spi_ma
 }
 EXPORT_SYMBOL_GPL(spi_unregister_master);
 
+static int __spi_master_match(struct device *dev, void *data)
+{
+	struct spi_master *m;
+	u16 *bus_num = (u16 *)data;
+
+	m = container_of(dev, struct spi_master, dev);
+	return m->bus_num == *bus_num;
+}
+
 /**
  * spi_busnum_to_master - look up master associated with bus_num
  * @bus_num: the master's bus number
@@ -499,17 +508,12 @@ struct spi_master *spi_busnum_to_master(
 {
 	struct device *dev;
 	struct spi_master *master = NULL;
-	struct spi_master *m;
 
-	down(&spi_master_class.sem);
-	list_for_each_entry(dev, &spi_master_class.children, node) {
-		m = container_of(dev, struct spi_master, dev);
-		if (m->bus_num == bus_num) {
-			master = spi_master_get(m);
-			break;
-		}
-	}
-	up(&spi_master_class.sem);
+	dev = class_find_device(&spi_master_class, &bus_num,
+				__spi_master_match);
+	if (dev)
+		master = container_of(dev, struct spi_master, dev);
+	/* reference got in class_find_device */
 	return master;
 }
 EXPORT_SYMBOL_GPL(spi_busnum_to_master);
[PATCH 5/6] scsi : use class iteration api
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]>

---
 drivers/scsi/hosts.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff -upr linux/drivers/scsi/hosts.c linux.new/drivers/scsi/hosts.c
--- linux/drivers/scsi/hosts.c	2008-01-16 08:43:35.0 +0800
+++ linux.new/drivers/scsi/hosts.c	2008-01-16 08:43:35.0 +0800
@@ -429,6 +429,15 @@ void scsi_unregister(struct Scsi_Host *s
 }
 EXPORT_SYMBOL(scsi_unregister);
 
+static int __scsi_host_match(struct class_device *cdev, void *data)
+{
+	struct Scsi_Host *p;
+	unsigned short *hostnum = (unsigned short *)data;
+
+	p = class_to_shost(cdev);
+	return p->host_no == *hostnum;
+}
+
 /**
  * scsi_host_lookup - get a reference to a Scsi_Host by host no
  *
@@ -439,19 +448,12 @@ EXPORT_SYMBOL(scsi_unregister);
  **/
 struct Scsi_Host *scsi_host_lookup(unsigned short hostnum)
 {
-	struct class *class = &shost_class;
 	struct class_device *cdev;
-	struct Scsi_Host *shost = ERR_PTR(-ENXIO), *p;
+	struct Scsi_Host *shost = ERR_PTR(-ENXIO);
 
-	down(&class->sem);
-	list_for_each_entry(cdev, &class->children, node) {
-		p = class_to_shost(cdev);
-		if (p->host_no == hostnum) {
-			shost = scsi_host_get(p);
-			break;
-		}
-	}
-	up(&class->sem);
+	cdev = class_find_child(&shost_class, &hostnum, __scsi_host_match);
+	if (cdev)
+		shost = scsi_host_get(class_to_shost(cdev));
 
 	return shost;
 }
[PATCH 4/6] rtc : use class iteration api
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]>

---
 drivers/rtc/interface.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff -upr linux/drivers/rtc/interface.c linux.new/drivers/rtc/interface.c
--- linux/drivers/rtc/interface.c	2008-01-11 18:06:38.0 +0800
+++ linux.new/drivers/rtc/interface.c	2008-01-11 18:06:38.0 +0800
@@ -251,20 +251,23 @@ void rtc_update_irq(struct rtc_device *r
 }
 EXPORT_SYMBOL_GPL(rtc_update_irq);
 
+static int __rtc_match(struct device *dev, void *data)
+{
+	char *name = (char *)data;
+
+	if (strncmp(dev->bus_id, name, BUS_ID_SIZE) == 0)
+		return 1;
+	return 0;
+}
+
 struct rtc_device *rtc_class_open(char *name)
 {
 	struct device *dev;
 	struct rtc_device *rtc = NULL;
 
-	down(&rtc_class->sem);
-	list_for_each_entry(dev, &rtc_class->devices, node) {
-		if (strncmp(dev->bus_id, name, BUS_ID_SIZE) == 0) {
-			dev = get_device(dev);
-			if (dev)
-				rtc = to_rtc_device(dev);
-			break;
-		}
-	}
+	dev = class_find_device(rtc_class, name, __rtc_match);
+	if (dev)
+		rtc = to_rtc_device(dev);
 
 	if (rtc) {
 		if (!try_module_get(rtc->owner)) {
@@ -272,7 +275,6 @@ struct rtc_device *rtc_class_open(char *
 			rtc = NULL;
 		}
 	}
-	up(&rtc_class->sem);
 
 	return rtc;
 }
[PATCH 3/6] power supply : use class iteration api
Convert to use the class iteration api. Signed-off-by: Dave Young <[EMAIL PROTECTED]> --- drivers/power/apm_power.c | 116 ++ drivers/power/power_supply_core.c | 72 --- 2 files changed, 106 insertions(+), 82 deletions(-) diff -upr linux/drivers/power/apm_power.c linux.new/drivers/power/apm_power.c --- linux/drivers/power/apm_power.c 2008-01-11 18:06:38.0 +0800 +++ linux.new/drivers/power/apm_power.c 2008-01-11 18:06:38.0 +0800 @@ -13,6 +13,7 @@ #include #include +static DEFINE_MUTEX(apm_mutex); #define PSY_PROP(psy, prop, val) psy->get_property(psy, \ POWER_SUPPLY_PROP_##prop, val) @@ -23,67 +24,86 @@ static struct power_supply *main_battery; -static void find_main_battery(void) -{ - struct device *dev; - struct power_supply *bat = NULL; - struct power_supply *max_charge_bat = NULL; - struct power_supply *max_energy_bat = NULL; +struct find_bat_param { + struct power_supply *main; + struct power_supply *bat; + struct power_supply *max_charge_bat; + struct power_supply *max_energy_bat; union power_supply_propval full; - int max_charge = 0; - int max_energy = 0; + int max_charge; + int max_energy; +}; - main_battery = NULL; +static int __find_main_battery(struct device *dev, void *data) +{ + struct find_bat_param *bp = (struct find_bat_param *)data; - list_for_each_entry(dev, &power_supply_class->devices, node) { - bat = dev_get_drvdata(dev); + bp->bat = dev_get_drvdata(dev); - if (bat->use_for_apm) { - /* nice, we explicitly asked to report this battery. */ - main_battery = bat; - return; - } + if (bp->bat->use_for_apm) { + /* nice, we explicitly asked to report this battery. 
*/ + bp->main = bp->bat; + return 1; + } - if (!PSY_PROP(bat, CHARGE_FULL_DESIGN, &full) || - !PSY_PROP(bat, CHARGE_FULL, &full)) { - if (full.intval > max_charge) { - max_charge_bat = bat; - max_charge = full.intval; - } - } else if (!PSY_PROP(bat, ENERGY_FULL_DESIGN, &full) || - !PSY_PROP(bat, ENERGY_FULL, &full)) { - if (full.intval > max_energy) { - max_energy_bat = bat; - max_energy = full.intval; - } + if (!PSY_PROP(bp->bat, CHARGE_FULL_DESIGN, &bp->full) || + !PSY_PROP(bp->bat, CHARGE_FULL, &bp->full)) { + if (bp->full.intval > bp->max_charge) { + bp->max_charge_bat = bp->bat; + bp->max_charge = bp->full.intval; + } + } else if (!PSY_PROP(bp->bat, ENERGY_FULL_DESIGN, &bp->full) || + !PSY_PROP(bp->bat, ENERGY_FULL, &bp->full)) { + if (bp->full.intval > bp->max_energy) { + bp->max_energy_bat = bp->bat; + bp->max_energy = bp->full.intval; } } + return 0; +} + +static void find_main_battery(void) +{ + struct find_bat_param bp; + int error; + + memset(&bp, 0, sizeof(struct find_bat_param)); + main_battery = NULL; + bp.main = main_battery; + + error = class_for_each_device(power_supply_class, &bp, + __find_main_battery); + if (error) { + main_battery = bp.main; + return; + } - if ((max_energy_bat && max_charge_bat) && - (max_energy_bat != max_charge_bat)) { + if ((bp.max_energy_bat && bp.max_charge_bat) && + (bp.max_energy_bat != bp.max_charge_bat)) { /* try guess battery with more capacity */ - if (!PSY_PROP(max_charge_bat, VOLTAGE_MAX_DESIGN, &full)) { - if (max_energy > max_charge * full.intval) - main_battery = max_energy_bat; + if (!PSY_PROP(bp.max_charge_bat, VOLTAGE_MAX_DESIGN, + &bp.full)) { + if (bp.max_energy > bp.max_charge * bp.full.intval) + main_battery = bp.max_energy_bat; else - main_battery = max_charge_bat; - } else if (!PSY_PROP(max_energy_bat, VOLTAGE_MAX_DESIGN, - &full)) { - if (max_charge > max_energy / full.intval) - main_battery = max_charge_bat; + main_battery
[PATCH 2/6] ieee1394 : use class iteration api
Convert to use the class iteration api. Signed-off-by: Dave Young <[EMAIL PROTECTED]> --- drivers/ieee1394/nodemgr.c | 312 + 1 file changed, 175 insertions(+), 137 deletions(-) diff -upr linux/drivers/ieee1394/nodemgr.c linux.new/drivers/ieee1394/nodemgr.c --- linux/drivers/ieee1394/nodemgr.c2008-01-16 08:43:35.0 +0800 +++ linux.new/drivers/ieee1394/nodemgr.c2008-01-16 08:43:35.0 +0800 @@ -727,33 +727,31 @@ static int nodemgr_bus_match(struct devi static DEFINE_MUTEX(nodemgr_serialize_remove_uds); +static int __match_ne(struct device *dev, void *data) +{ + struct unit_directory *ud; + struct node_entry *ne = (struct node_entry *)data; + + ud = container_of(dev, struct unit_directory, unit_dev); + return ud->ne == ne; +} + static void nodemgr_remove_uds(struct node_entry *ne) { struct device *dev; - struct unit_directory *tmp, *ud; + struct unit_directory *ud; - /* Iteration over nodemgr_ud_class.devices has to be protected by -* nodemgr_ud_class.sem, but device_unregister() will eventually -* take nodemgr_ud_class.sem too. Therefore pick out one ud at a time, -* release the semaphore, and then unregister the ud. Since this code -* may be called from other contexts besides the knodemgrds, protect the -* gap after release of the semaphore by nodemgr_serialize_remove_uds. + /* Use class_find device to iterate the devices. Since this code +* may be called from other contexts besides the knodemgrds, +* protect it by nodemgr_serialize_remove_uds. 
*/ mutex_lock(&nodemgr_serialize_remove_uds); for (;;) { - ud = NULL; - down(&nodemgr_ud_class.sem); - list_for_each_entry(dev, &nodemgr_ud_class.devices, node) { - tmp = container_of(dev, struct unit_directory, - unit_dev); - if (tmp->ne == ne) { - ud = tmp; - break; - } - } - up(&nodemgr_ud_class.sem); - if (ud == NULL) + dev = class_find_device(&nodemgr_ud_class, ne, __match_ne); + if (!dev) break; + ud = container_of(dev, struct unit_directory, unit_dev); + put_device(dev); device_unregister(&ud->unit_dev); device_unregister(&ud->device); } @@ -882,45 +880,66 @@ fail_alloc: return NULL; } +static int __match_ne_guid(struct device *dev, void *data) +{ + struct node_entry *ne; + u64 *guid = (u64 *)data; + + ne = container_of(dev, struct node_entry, node_dev); + return ne->guid == *guid; +} static struct node_entry *find_entry_by_guid(u64 guid) { struct device *dev; - struct node_entry *ne, *ret_ne = NULL; - - down(&nodemgr_ne_class.sem); - list_for_each_entry(dev, &nodemgr_ne_class.devices, node) { - ne = container_of(dev, struct node_entry, node_dev); + struct node_entry *ne; - if (ne->guid == guid) { - ret_ne = ne; - break; - } - } - up(&nodemgr_ne_class.sem); + dev = class_find_device(&nodemgr_ne_class, &guid, __match_ne_guid); + if (!dev) + return NULL; + ne = container_of(dev, struct node_entry, node_dev); + put_device(dev); - return ret_ne; + return ne; } +struct match_nodeid_param { + struct hpsb_host *host; + nodeid_t nodeid; +}; + +static int __match_ne_nodeid(struct device *dev, void *data) +{ + int found = 0; + struct node_entry *ne; + struct match_nodeid_param *param = (struct match_nodeid_param *)data; + + if (!dev) + goto ret; + ne = container_of(dev, struct node_entry, node_dev); + if (ne->host == param->host && ne->nodeid == param->nodeid) + found = 1; +ret: + return found; +} static struct node_entry *find_entry_by_nodeid(struct hpsb_host *host, nodeid_t nodeid) { struct device *dev; - struct node_entry *ne, *ret_ne = NULL; + struct node_entry 
*ne; + struct match_nodeid_param param; - down(&nodemgr_ne_class.sem); - list_for_each_entry(dev, &nodemgr_ne_class.devices, node) { - ne = container_of(dev, struct node_entry, node_dev); + param.host = host; + param.nodeid = nodeid; - if (ne->host == host && ne->nodeid == nodeid) { - ret_ne = ne; - break; - } - } - up(&nodemgr_ne_class.sem); + dev = class_find_device(&nodemgr_ne_class, ¶m, __match_ne_nodeid); + i
[PATCH 1/6] driver-core : add class iteration api
Add the following class iteration functions for driver use: class_for_each_device class_find_device class_for_each_child class_find_child Signed-off-by: Dave Young <[EMAIL PROTECTED]> --- drivers/base/class.c | 159 + include/linux/device.h | 11 ++- 2 files changed, 168 insertions(+), 2 deletions(-) diff -upr linux/drivers/base/class.c linux.new/drivers/base/class.c --- linux/drivers/base/class.c 2008-01-15 11:12:29.0 +0800 +++ linux.new/drivers/base/class.c 2008-01-15 11:12:29.0 +0800 @@ -798,6 +798,165 @@ void class_device_put(struct class_devic kobject_put(&class_dev->kobj); } +/** + * class_for_each_device - device iterator + * @class: the class we're iterating + * @data: data for the callback + * @fn: function to be called for each device + * + * Iterate over @class's list of devices, and call @fn for each, + * passing it @data. + * + * We check the return of @fn each time. If it returns anything + * other than 0, we break out and return that value. + */ +int class_for_each_device(struct class *class, void *data, + int (*fn)(struct device *, void *)) +{ + struct device *dev; + int error = 0; + + if (!class) + return -EINVAL; + down(&class->sem); + list_for_each_entry(dev, &class->devices, node) { + dev = get_device(dev); + if (dev) { + error = fn(dev, data); + put_device(dev); + } else + error = -ENODEV; + if (error) + break; + } + up(&class->sem); + + return error; +} +EXPORT_SYMBOL_GPL(class_for_each_device); + +/** + * class_find_device - device iterator for locating a particular device + * @class: the class we're iterating + * @data: data for the match function + * @match: function to check device + * + * This is similar to the class_for_each_dev() function above, but it + * returns a reference to a device that is 'found' for later use, as + * determined by the @match callback. + * + * The callback should return 0 if the device doesn't match and non-zero + * if it does. 
If the callback returns non-zero, this function will + * return to the caller and not iterate over any more devices. + + * Note, you will need to drop the reference with put_device() after use. + */ +struct device *class_find_device(struct class *class, void *data, + int (*match)(struct device *, void *)) +{ + struct device *dev; + int found = 0; + + if (!class) + return NULL; + + down(&class->sem); + list_for_each_entry(dev, &class->devices, node) { + dev = get_device(dev); + if (dev) { + if (match(dev, data)) { + found = 1; + break; + } else + put_device(dev); + } else + break; + } + up(&class->sem); + + return found ? dev : NULL; +} +EXPORT_SYMBOL_GPL(class_find_device); + +/** + * class_for_each_child - class child iterator + * @class: the class we're iterating + * @data: data for the callback + * @fn: function to be called for each child of the class + * + * Iterate over @class's list of children, and call @fn for each, + * passing it @data. + * + * We check the return of @fn each time. If it returns anything + * other than 0, we break out and return that value. + */ +int class_for_each_child(struct class *class, void *data, + int (*fn)(struct class_device *, void *)) +{ + struct class_device *dev; + int error = 0; + + if (!class) + return -EINVAL; + down(&class->sem); + list_for_each_entry(dev, &class->children, node) { + dev = class_device_get(dev); + if (dev) { + error = fn(dev, data); + class_device_put(dev); + } else + error = -ENODEV; + if (error) + break; + } + up(&class->sem); + + return error; +} +EXPORT_SYMBOL_GPL(class_for_each_child); + +/** + * class_find_child - device iterator for locating a particular class_device + * @class: the class we're iterating + * @data: data for the match function + * @match: function to check class_device + * + * This is similar to the class_for_each_child() function above, but it + * returns a reference to a class_device that is 'found' for later use, as + * determined by the @match callback. 
+ * + * The callback should return 0 if the class_device doesn't match and non-zero + * if it does. If the callback returns non-zero, this function will + * return to the caller and not iter
Re: questions on NAPI processing latency and dropped network packets
Chris Friesen wrote:
> Eric Dumazet wrote:
> > Chris Friesen wrote:
> > > I've done some further digging, and it appears that one of the
> > > problems we may be facing is very high instantaneous traffic rates.
> > >
> > > Instrumentation showed up to 222K packets/sec for short periods (at
> > > least 1.1 ms, possibly longer), although the long-term average is down
> > > around 14-16K packets/sec.
> >
> > Instrumentation done where exactly ?
>
> I added some code to e1000_clean_rx_irq() to track rx_fifo drops, total
> packets received, and an accurate timestamp. If rx_fifo errors changed,
> it would dump the information.
>
> > > Is there anything else we can do to minimize the latency of network
> > > packet processing and avoid having to crank the rx ring size up so
> > > high?
> >
> > You have some tasks that disable softirqs too long. Sometimes, bumping
> > RX ring size is OK (but you will still have delays), sometimes it is
> > not an option, since 4096 is the limit on current hardware.
>
> I added some instrumentation to take timestamps in __do_softirq() as
> well. Based on these timestamps, I can see the following code sequence:
>
> 2374604616 usec, start processing softirqs in __do_softirq()
> 2374610337 usec, log values in e1000_clean_rx_irq()
> 2374611411 usec, log values in e1000_clean_rx_irq()
>
> In between the successive calls to e1000_clean_rx_irq() the rx_fifo
> counts went up.
>
> Does anyone have any patchsets to track down what softirqs are taking a
> long time, and/or who's disabling softirqs?

Not for linux-2.6.10 unfortunately.

Check net/ipv4/route.c, where many improvements can be done, especially
if you have a large rt cache

grep . /proc/sys/net/ipv4/route/*
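The figures quoted in this message can be turned into a rough back-of-the-envelope estimate. The sketch below is not from the thread: the helper names are hypothetical, and it assumes a constant arrival rate with nothing draining the ring during the stall. It uses only numbers that appear above (222K packets/sec, the 4096-entry limit, and the 5721 usec gap between the first two timestamps).

```c
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 * Microseconds until a ring of ring_entries descriptors fills at a
 * constant pkts_per_sec, assuming nothing drains it in the meantime.
 */
static uint64_t ring_fill_time_us(uint64_t ring_entries, uint64_t pkts_per_sec)
{
	return ring_entries * 1000000ULL / pkts_per_sec;
}

/* Packets that arrive while RX processing is stalled for stall_us. */
static uint64_t packets_during_stall(uint64_t pkts_per_sec, uint64_t stall_us)
{
	return pkts_per_sec * stall_us / 1000000ULL;
}
```

Under those assumptions, a 4096-entry ring fills in about 18 ms at the peak rate, and the 5721 usec between the start of __do_softirq() and the first e1000_clean_rx_irq() log corresponds to roughly 1270 packets arriving if the peak rate held for the whole interval.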
Re: 2.6.24 regression: pan hanging unkilleable and un-straceable
On Tue, 2008-01-22 at 16:25 +1100, Nick Piggin wrote:
> On Tuesday 22 January 2008 16:03, Mike Galbraith wrote:
> > I've hit same twice recently (not pan, and not repeatable).
>
> Nasty. The attached patch is something really simple that can sometimes help.
> sysrq+p is also an option, if you're on a UP system.

SMP (P4/HT imitating real cores)

> Any luck getting traces?

We'll see. Armed.

-Mike
Re: [PATCH] rcu: fix section mismatch
On Mon, Jan 21, 2008 at 03:34:09PM -0800, Randy Dunlap wrote:
> On Mon, 21 Jan 2008 11:38:38 +1100 Rusty Russell wrote:
>
> > On Sunday 20 January 2008 08:25:49 Sam Ravnborg wrote:
> > > On Sat, Jan 19, 2008 at 11:56:43AM -0800, Randy Dunlap wrote:
> > > > rcu_online_cpu() should be __cpuinit instead of __devinit.
> > >
> > > So if we have:
> > > CONFIG_HOTPLUG=n
> > > CONFIG_HOTPLUG_CPU=y
> > >
> > > then this is a oops candidate.
> >
> > At first glance, this can't happen because all CONFIG_HOTPLUG_CPU depends on
> > CONFIG_HOTPLUG or selects it, for all archs.
>
> Mostly, but arch/mips/ seems to be different (neither depends nor selects)
> unless it has changed very recently (I looked at 2.6.24-rc8).

mips has default n
So they at least try to turn off this feature.

	Sam
[PATCHSET] driver core : add class iteration api
Repost for review.

[PATCH 1/6] Add some class iteration functions in driver core
[PATCH 2-6/6] Convert the drivers that iterate over class devices to use the class iteration api

toc:
---
1-driver-core-add-class-iteration-api.patch
2-ieee1394-use-class-iteration-api.patch
3-power_supply-use-class-iteration-api.patch
4-rtc-use-class-iteration-api.patch
5-scsi-use-class-iteration-api.patch
6-spi-use-class-iteration-api.patch

Summary diffstat:
---
 drivers/base/class.c              | 159 +++
 drivers/ieee1394/nodemgr.c        | 312 +-
 drivers/power/apm_power.c         | 116 --
 drivers/power/power_supply_core.c |  72
 drivers/rtc/interface.c           |  22 +-
 drivers/scsi/hosts.c              |  24 +-
 drivers/spi/spi.c                 |  24 +-
 include/linux/device.h            |  11 +
 8 files changed, 488 insertions(+), 252 deletions(-)

Regards
dave
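For readers skimming the series, the find-with-callback pattern it introduces can be modelled in a few lines of userspace C. This is only a sketch: every toy_* name is hypothetical, the list and reference count are plain stand-ins for the kernel's struct device machinery, and the locking that the real class_find_device() does around the walk (class->sem in patch 1/6) is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct toy_device {
	const char *name;
	int refcount;
	struct toy_device *next;
};

struct toy_class {
	struct toy_device *devices;
};

typedef int (*toy_match_fn)(struct toy_device *dev, void *data);

/*
 * Walk the class's devices and return the first one for which match()
 * returns non-zero. A reference is taken before the callback runs and
 * kept only on a match, so the caller must drop it when done.
 */
static struct toy_device *toy_class_find_device(struct toy_class *cls,
						void *data, toy_match_fn match)
{
	struct toy_device *dev;

	if (!cls)
		return NULL;

	for (dev = cls->devices; dev; dev = dev->next) {
		dev->refcount++;	/* models get_device() */
		if (match(dev, data))
			return dev;	/* caller now owns this reference */
		dev->refcount--;	/* models put_device() */
	}
	return NULL;
}

/* Models put_device(): drop the reference returned by the finder. */
static void toy_put_device(struct toy_device *dev)
{
	dev->refcount--;
}

/* Example match callback: compare the device name with a string key. */
static int toy_match_name(struct toy_device *dev, void *data)
{
	return strcmp(dev->name, (const char *)data) == 0;
}
```

The point of the design is that the driver supplies only the match callback and a key, while iteration and reference handling stay in one place; the caller's only obligation is to drop the returned reference once it is finished with the device.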
Re: sysfs network namespace support - was this patch set forgotten ?
On Sun, Jan 20, 2008 at 09:08:43AM +0200, Ian Brown wrote:
> Hello,
>
> I saw some posts (from about a month ago) about network namespace
> support patches; I wonder: what
> is the status of this patch set ? was it somehow forgotten ?
> (I don't see it in v2.6.24-rc8 mm tree).

It wasn't "forgotten", but rather, not applied as Al Viro started to
have some serious questions about these changes...

I'll wait for his review before applying them.

thanks,

greg k-h
Re: [PATCH 0/12] ide-floppy redux v2.5
On Mon, Jan 21, 2008 at 11:45:35PM +0100, Bartlomiej Zolnierkiewicz wrote:
>
> Hi Borislav,
>
> On Sunday 20 January 2008, Borislav Petkov wrote:
> > On Mon, Jan 14, 2008 at 10:38:17PM +0100, Bartlomiej Zolnierkiewicz wrote:
> > > > By the way, I have an Iomega ZIP 100 drive somewhere in my hardware pile and
> > > > will do some testing with the "new" :) driver just in case.
> > >
> > > This would be great. :)
> > Hi Bart,
> >
> > i just whipped rc8 along with your pata-2.6 tree on top and had several test
> > runs of the ide-floppy driver (raw reads, software floppy disk eject, etc) and
> > everything seems to work fine. I will keep this hardware setup here so that we
> > could at least test ide-floppy occasionally. We should probably acknowledge this
>
> Big thanks for all great ide-floppy work! I hope that you'll continue with
> putting IDE device drivers in shape. :)

Sure, no problem :). I'm on ide-tape right now and probably will have most of it
ready for submission on the weekend so keep your fingers crossed... :)

--
Regards/Gruß,
    Boris.
Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN
cc'ing Tanaka-san given his recent raid1 BUG report: http://lkml.org/lkml/2008/1/14/515 On Jan 21, 2008 6:04 PM, Mike Snitzer <[EMAIL PROTECTED]> wrote: > Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to > an aacraid controller) that was acting as the local raid1 member of > /dev/md30. > > Linux MD didn't see an /dev/sdac1 error until I tried forcing the issue by > doing a read (with dd) from /dev/md30: > > Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result: > hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK > Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key : > Hardware Error [current] > Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0 > Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense: > Internal target failure > Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 71 > Jan 21 17:08:07 lab17-233 kernel: printk: 3 messages suppressed. > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 8 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 16 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 24 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 32 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 40 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 48 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 56 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 64 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 72 > Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 80 > Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result: > hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK > Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key : > Hardware Error [current] > Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0 > Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. 
Sense: > Internal target failure > Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 343 > Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Result: > hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK > Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key : > Hardware Error [current] > Jan 21 17:08:08 lab17-233 kernel: Info fld=0x0 > ... > Jan 21 17:08:12 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense: > Internal target failure > Jan 21 17:08:12 lab17-233 kernel: end_request: I/O error, dev sdac, sector > 3399 > Jan 21 17:08:12 lab17-233 kernel: printk: 765 messages suppressed. > Jan 21 17:08:12 lab17-233 kernel: raid1: sdac1: rescheduling sector 3336 > > However, the MD layer still hasn't marked the sdac1 member faulty: > > md30 : active raid1 nbd2[1](W) sdac1[0] > 4016204 blocks super 1.0 [2/2] [UU] > bitmap: 1/8 pages [4KB], 256KB chunk > > The dd I used to read from /dev/md30 is blocked on IO: > > Jan 21 17:13:55 lab17-233 kernel: ddD 0afa9cf5c346 > 0 12337 7702 (NOTLB) > Jan 21 17:13:55 lab17-233 kernel: 81010c449868 0082 > 80268f14 > Jan 21 17:13:55 lab17-233 kernel: 81015da6f320 81015de532c0 > 0008 81012d9d7780 > Jan 21 17:13:55 lab17-233 kernel: 81015fae2880 4926 > 81012d9d7970 0001802879a0 > Jan 21 17:13:55 lab17-233 kernel: Call Trace: > Jan 21 17:13:55 lab17-233 kernel: [] > mempool_alloc+0x24/0xda > Jan 21 17:13:55 lab17-233 kernel: [] > :raid1:wait_barrier+0x84/0xc2 > Jan 21 17:13:55 lab17-233 kernel: [] > default_wake_function+0x0/0xe > Jan 21 17:13:55 lab17-233 kernel: [] > :raid1:make_request+0x83/0x5c0 > Jan 21 17:13:55 lab17-233 kernel: [] > __make_request+0x57f/0x668 > Jan 21 17:13:55 lab17-233 kernel: [] > generic_make_request+0x26e/0x2a9 > Jan 21 17:13:55 lab17-233 kernel: [] > mempool_alloc+0x24/0xda > Jan 21 17:13:55 lab17-233 kernel: [] __next_cpu+0x19/0x28 > Jan 21 17:13:55 lab17-233 kernel: [] submit_bio+0xb6/0xbd > Jan 21 17:13:55 lab17-233 kernel: [] submit_bh+0xdf/0xff > Jan 21 17:13:55 
lab17-233 kernel: [] > block_read_full_page+0x271/0x28e > Jan 21 17:13:55 lab17-233 kernel: [] > blkdev_get_block+0x0/0x46 > Jan 21 17:13:55 lab17-233 kernel: [] > radix_tree_insert+0xcb/0x18c > Jan 21 17:13:55 lab17-233 kernel: [] > __do_page_cache_readahead+0x16d/0x1df > Jan 21 17:13:55 lab17-233 kernel: [] > getnstimeofday+0x32/0x8d > Jan 21 17:13:55 lab17-233 kernel: [] ktime_get_ts+0x1a/0x4e > Jan 21 17:13:55 lab17-233 kernel: [] > delayacct_end+0x7d/0x88 > Jan 21 17:13:55 lab17-233 kernel: [] > blockable_page_cache_readahead+0x53/0xb2 > Jan 21 17:13:55 lab17-233 kernel: [] > make_ahead_window+0x82/0x9e > Jan 21 17:13:55 lab17-233 kernel: [] > page_cache_readahead+0x18a/0x1c1 > Jan 21 17:13:55 lab17-233 kernel: [] > do_generic_mapping_read+0x135/0x3fc > Jan 21 17:13:55 lab17-233 kernel: [] > file_read_actor+0x0/0x170 > Jan 21 17:13:55 lab17-233 kernel:
Re: The SMP alternatives code breaks exception fixup?
Chuck Ebbert <[EMAIL PROTECTED]> writes:
>
> There is a fixup, so this should never happen. But the lock instruction
> was replaced with a nop by the altinstruction code, and that makes the fixup
> address wrong. AFAICT we don't fix up the exception table when we replace
> a lock with a nop, which makes the fixup table point to the nop instead
> of the cmpxchg instruction and causes us to miss the fixup.

Indeed. Nasty issue. A quick fix would be to add another fixup to handle
both cases.
I checked the other LOCK_PREFIX users and they look ok.

Does this fix it?

-Andi

(untested)

---

Add exception handlers for both the LOCK and no LOCK prefix case in futex.

Hopefully fixes https://bugzilla.redhat.com/show_bug.cgi?id=429412

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux/include/asm-x86/futex.h
===
--- linux.orig/include/asm-x86/futex.h
+++ linux/include/asm-x86/futex.h
@@ -30,7 +30,7 @@
 "1:movl%2, %0\n\
 movl%0, %3\n" \
 insn "\n" \
-"2:" LOCK_PREFIX "cmpxchgl %3, %2\n\
+"2:" LOCK_PREFIX "\n5: cmpxchgl %3, %2\n \
 jnz 1b\n\
 3: .section .fixup,\"ax\"\n\
 4: mov %5, %1\n\
@@ -38,7 +38,7 @@
 .previous\n \
 .section __ex_table,\"a\"\n \
 .align 8\n"\
- _ASM_PTR "1b,4b,2b,4b\n \
+ _ASM_PTR "1b,4b,2b,4b,5b,4b\n \
 .previous" \
 : "=&a" (oldval), "=&r" (ret), "+m" (*uaddr), \
 "=&r" (tem) \
Re: 2.6.24 regression: pan hanging unkilleable and un-straceable
On Tuesday 22 January 2008 16:03, Mike Galbraith wrote:
> On Tue, 2008-01-22 at 11:05 +1100, Nick Piggin wrote:
> > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote:
> > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet
> > > reader starts using 100% of CPU time after some time. When this
> > > happens, kill -9 does not work, and strace just hangs when trying to
> > > attach to the process. The same with gdb. ps shows the process as
> > > being in the R state.
> > >
> > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan:
> > > Jan 21 21:45:01 Anastacia kernel: pan R running task 0
> >
> > Well I've twice tried to submit a patch to print stacks for running
> > tasks as well, but nobody seems interested. It would at least give a
> > chance to see something.
>
> I've hit same twice recently (not pan, and not repeatable).

Nasty. The attached patch is something really simple that can sometimes help.
sysrq+p is also an option, if you're on a UP system.

Any luck getting traces?

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4920,8 +4920,7 @@ static void show_task(struct task_struct
 	printk(KERN_CONT "%5lu %5d %6d\n", free,
 		task_pid_nr(p), task_pid_nr(p->real_parent));
 
-	if (state != TASK_RUNNING)
-		show_stack(p, NULL);
+	show_stack(p, NULL);
 }
 
 void show_state_filter(unsigned long state_filter)
Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class
On Tue, Jan 22, 2008 at 08:55:05AM +0800, Dave Young wrote:
> On Jan 22, 2008 5:16 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > Dave Young wrote, On 01/21/2008 09:44 AM:
> > ...
> > > I applied it in my kernel, built and run without warnings, but it need
> > > more testing.
> > > I will be very glad to see the test result about this if you could,
> > > thanks.
> >
> > Bad news. (Alas I won't be able to check this today.)
>
> Hi, thanks your effort. Now I think we should stop this thread and
> waiting the class_device going away :)
>
> Hope the iteration patches 1-6/7 could be applied.

Can you resend them again, and CC: me on all of them, with the latest
updates, so I know what I should be reviewing this time around?

thanks,

greg k-h
Re: 2.6.24 regression: pan hanging unkilleable and un-straceable
On Tue, 2008-01-22 at 11:05 +1100, Nick Piggin wrote:
> On Tuesday 22 January 2008 07:58, Frederik Himpe wrote:
> > With Linux 2.6.24-rc8 I often have the problem that the pan usenet
> > reader starts using 100% of CPU time after some time. When this happens,
> > kill -9 does not work, and strace just hangs when trying to attach to
> > the process. The same with gdb. ps shows the process as being in the R
> > state.
> >
> > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan:
> > Jan 21 21:45:01 Anastacia kernel: pan R running task 0
>
> Well I've twice tried to submit a patch to print stacks for running
> tasks as well, but nobody seems interested. It would at least give a
> chance to see something.

I've hit same twice recently (not pan, and not repeatable).
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
Anton Salikhmetov <[EMAIL PROTECTED]> writes:

You should probably put your design document somewhere in Documentation
with a patch.

> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This means on i386 with highmem ptes you will map/flush tlb/unmap each
PTE individually. You will do 512 times as much work as really needed
per PTE leaf page.

The performance critical address space walkers use a different design
pattern that avoids this.

> +		if (pte_dirty(*pte) && pte_write(*pte)) {
> +			pte_t entry = ptep_clear_flush(vma, addr, pte);

Flushing TLBs unbatched can also be very expensive because if the MM is
shared by several CPUs you'll have an inter-processor interrupt for
each iteration. They are quite costly even on smaller systems.

It would be better if you did a single flush_tlb_range() at the end.
This means on x86 this will currently always do a full flush, but that's
still better than really slowing down in the heavily multithreaded case.

-Andi
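The cost argument in this review can be illustrated with a toy counting model. This is a userspace sketch rather than kernel code: the toy_* names are hypothetical, the 512 entries per leaf page is simply the figure quoted above, the range is assumed to start on a leaf boundary, and the per-PTE variant models the worst case in which every PTE is dirty and writable.

```c
#include <stddef.h>

#define TOY_PTES_PER_LEAF 512	/* assumption: the 512 figure quoted above */

struct toy_cost {
	size_t leaf_maps;	/* times a PTE leaf page was mapped */
	size_t tlb_flushes;	/* TLB flush operations issued */
};

/* Per-page pattern: map the leaf and flush once for every single PTE. */
static struct toy_cost toy_walk_per_pte(size_t npages)
{
	struct toy_cost c = { 0, 0 };
	size_t i;

	for (i = 0; i < npages; i++) {
		c.leaf_maps++;		/* map + lock + unmap per PTE */
		c.tlb_flushes++;	/* one flush (and possible IPI) per PTE */
	}
	return c;
}

/* Batched pattern: map each leaf once, then flush the range once. */
static struct toy_cost toy_walk_batched(size_t npages)
{
	struct toy_cost c = { 0, 0 };
	size_t done = 0;

	while (done < npages) {
		size_t chunk = npages - done;

		if (chunk > TOY_PTES_PER_LEAF)
			chunk = TOY_PTES_PER_LEAF;
		c.leaf_maps++;		/* one map covers the whole leaf */
		done += chunk;
	}
	if (npages)
		c.tlb_flushes = 1;	/* single range flush at the end */
	return c;
}
```

For a 1024-page mapping the model counts 1024 leaf maps and 1024 flushes for the per-PTE walk, against 2 leaf maps and 1 flush for the batched walk. It says nothing about the relative cost of a full flush versus many single-page flushes, which is the trade-off the last paragraph above weighs.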
Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
On Jan 22, 2008 5:14 AM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote: > > On Mon, 21 Jan 2008, Dave Young wrote: > > > Please see the kernel messages following,(trigged while using some qemu > > session) > > BTW, seems there's some e100 error message as well. > > > > PCI: Setting latency timer of device :00:1b.0 to 64 > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI > > e100: Copyright(c) 1999-2006 Intel Corporation > > ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20 > > modprobe:2331 conflicting cache attribute efaff000-efb0 > > uncached<->default > > e100: :03:08.0: e100_probe: Cannot map device registers, aborting. > > ACPI: PCI interrupt for device :03:08.0 disabled > > e100: probe of :03:08.0 failed with error -12 > > eth0: setting full-duplex. > > [ cut here ] > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150() > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq > > snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel > > snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw > > intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore > > dcdbas 3c59x pcspkr snd_page_alloc > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4 > > [] ? printk+0x0/0x20 > > [] warn_on_slowpath+0x54/0x80 > > [] ? ip_finish_output+0x128/0x2e0 > > [] ? ip_output+0xe7/0x100 > > [] ? ip_local_out+0x18/0x20 > > [] ? ip_queue_xmit+0x3dc/0x470 > > [] ? _spin_unlock_irqrestore+0x5e/0x70 > > [] ? check_pad_bytes+0x61/0x80 > > [] tcp_mark_head_lost+0x121/0x150 > > [] tcp_update_scoreboard+0x4c/0x170 > > [] tcp_fastretrans_alert+0x48a/0x6b0 > > [] tcp_ack+0x1b3/0x3a0 > > [] tcp_rcv_established+0x3eb/0x710 > > [] tcp_v4_do_rcv+0xe5/0x100 > > [] tcp_v4_rcv+0x5db/0x660 > > Doh, once more these S+L things..., the rest are symptom of the first > problem. What is the S+L thing? Could you explain a bit? 
> > What is strange is that it doesn't show up until now, the last TCP > changes that could have some significance are from early Dec/Nov. Is > there some reason why you haven't seen this before this (e.g., not > tested with similar cfg or so)? Hmm, don't know how to answer ... I'm a bit worried about its > reproducability if it takes this far to see it... > > > -- > i.
Re: [PATCH 0/6] RFC: Typesafe callbacks
Rusty Russell <[EMAIL PROTECTED]> writes:

> ===
> Attempt to create callbacks which take unsigned long as well as
> correct pointer types.

FWIW i had something similar using the gcc union extension at some point
for ioctls because I was tired of all the ugly casts from unsigned long
arg to void * in ioctl handlers. But I decided to not push it because
sparse would have likely choked on it, and sparse actually finds a lot of
bugs so it's more important than having a few more casts.

-Andi
Re: [RFC] Parallelize IO for e2fsck
On Tue, 22 Jan 2008 14:38:30 +1100, David Chinner said:

> Perhaps instead of swapping immediately, a SIGLOWMEM could be sent
> to processes that aren't masking the signal followed by a short
> grace period to allow the processes to free up some memory before
> swapping out pages from that process?

AIX had SIGDANGER some 15 years ago. Admittedly, that was sent when
the system was about to hit OOM, not when it was about to start
swapping.

I suspect both approaches have their merits...
[PATCH] ARM: Ignore memory tags with invalid data
From: Corey Minyard <[EMAIL PROTECTED]>

The DNS-323 system has several bogus memory entries in the tag table,
and it caused the system to crash at startup. Ignore tag entries that
are obviously bogus.

Signed-off-by: Corey Minyard <[EMAIL PROTECTED]>
---
 arch/arm/kernel/setup.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index bf56eb3..dfdb469 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -630,7 +630,12 @@ __tagtable(ATAG_CORE, parse_tag_core);
 
 static int __init parse_tag_mem32(const struct tag *tag)
 {
-	if (meminfo.nr_banks >= NR_BANKS) {
+	/*
+	 * Make sure that the memory size is non-zero, page aligned,
+	 * and that it doesn't overflow the meminfo table.
+	 */
+	if (meminfo.nr_banks >= NR_BANKS || tag->u.mem.size & ~PAGE_MASK ||
+	    tag->u.mem.size == 0 || tag->u.mem.start & ~PAGE_MASK) {
 		printk(KERN_WARNING
 		       "Ignoring memory bank 0x%08x size %dKB\n",
 		       tag->u.mem.start, tag->u.mem.size / 1024);
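The sanity test in this patch can be tried outside the kernel. The sketch below assumes 4 KiB pages and uses made-up names (DEMO_PAGE_SIZE, DEMO_PAGE_MASK, mem_tag_is_sane); it leaves out the NR_BANKS check because that depends on kernel state.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u				/* assumption: 4 KiB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1u))	/* same shape as the kernel's PAGE_MASK */

/* Hypothetical helper: true if a memory tag looks usable. */
static bool mem_tag_is_sane(uint32_t start, uint32_t size)
{
	if (size == 0)
		return false;		/* empty bank */
	if (size & ~DEMO_PAGE_MASK)
		return false;		/* size is not a multiple of the page size */
	if (start & ~DEMO_PAGE_MASK)
		return false;		/* start is not page aligned */
	return true;
}
```

A 64 MB bank at address 0 passes, while a bank with size 0 or with a start address of 0x100 is rejected, which is the class of bogus entry the changelog describes.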
[patch] mm: fix PageUptodate data race
After running SetPageUptodate, preceding stores to the page contents to
actually bring it uptodate may not be ordered with the store to set the
page uptodate.

Therefore, another CPU which sees PageUptodate as true and then reads
the page contents can get stale data.

Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after
PageUptodate.

Many places that test PageUptodate do so with the page locked, and this
would be enough to ensure memory ordering in those places if
SetPageUptodate were only called while the page is locked. Unfortunately
that is not always the case for some filesystems, but it could be an
idea for the future.

Also bring the handling of anonymous page uptodateness in line with that
of file backed page management, by marking anon pages as uptodate when
they _are_ uptodate, rather than when our implementation requires that
they be marked as such. Doing so allows us to get rid of the smp_wmb's
in the page copying functions, which were especially added for anonymous
pages for an analogous memory ordering problem. Both file and anonymous
pages are handled with the same barriers.

FAQ:
Q. Why not do this in flush_dcache_page?
A. Firstly, flush_dcache_page handles only one side (the smp_wmb side)
of the ordering protocol; we'd still need smp_rmb somewhere. Secondly,
hiding away memory barriers in a completely unrelated function is nasty;
at least in the PageUptodate macros, they are located together with
(half) the operations involved in the ordering. Thirdly, the smp_wmb is
only required when first bringing the page uptodate, whereas
flush_dcache_page should be called each time it is written to through
the kernel mapping. It is logically the wrong place to put it.

Q. Why does this increase my text size / reduce my performance / etc.?
A. Because it is adding the necessary instructions to eliminate the
data-race.

Q. Can it be improved?
A. Yes, eg.
if you were to create a rule that all SetPageUptodate operations run under the page lock, we could avoid the smp_rmb places where PageUptodate is queried under the page lock. Requires audit of all filesystems and at least some would need reworking. That's great you're interested, I'm eagerly awaiting your patches. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> --- Index: linux-2.6/include/linux/highmem.h === --- linux-2.6.orig/include/linux/highmem.h +++ linux-2.6/include/linux/highmem.h @@ -68,8 +68,6 @@ static inline void clear_user_highpage(s void *addr = kmap_atomic(page, KM_USER0); clear_user_page(addr, vaddr, page); kunmap_atomic(addr, KM_USER0); - /* Make sure this page is cleared on other CPU's too before using it */ - smp_wmb(); } #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE @@ -160,8 +158,6 @@ static inline void copy_user_highpage(st copy_user_page(vto, vfrom, vaddr, to); kunmap_atomic(vfrom, KM_USER0); kunmap_atomic(vto, KM_USER1); - /* Make sure this page is cleared on other CPU's too before using it */ - smp_wmb(); } #endif Index: linux-2.6/include/linux/page-flags.h === --- linux-2.6.orig/include/linux/page-flags.h +++ linux-2.6/include/linux/page-flags.h @@ -131,16 +131,52 @@ #define ClearPageReferenced(page) clear_bit(PG_referenced, &(page)->flags) #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags) -#define PageUptodate(page) test_bit(PG_uptodate, &(page)->flags) +static inline int PageUptodate(struct page *page) +{ + int ret = test_bit(PG_uptodate, &(page)->flags); + + /* +* Must ensure that the data we read out of the page is loaded +* _after_ we've loaded page->flags to check for PageUptodate. +* We can skip the barrier if the page is not uptodate, because +* we wouldn't be reading anything from it. +* +* See SetPageUptodate() for the other side of the story. 
+*/ + if (ret) + smp_rmb(); + + return ret; +} + +static inline void __SetPageUptodate(struct page *page) +{ + smp_wmb(); + __set_bit(PG_uptodate, &(page)->flags); #ifdef CONFIG_S390 + page_clear_dirty(page); +#endif +} + static inline void SetPageUptodate(struct page *page) { +#ifdef CONFIG_S390 if (!test_and_set_bit(PG_uptodate, &page->flags)) page_clear_dirty(page); -} #else -#define SetPageUptodate(page) set_bit(PG_uptodate, &(page)->flags) + /* +* Memory barrier must be issued before setting the PG_uptodate bit, +* so that all previous stores issued in order to bring the page +* uptodate are actually visible before PageUptodate becomes true. +* +* s390 doesn't need an explicit smp_wmb here because the test and +* set bit already provides full barriers. +*/ + smp_wmb(); + set_bit(PG_uptodate, &(p
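The write-barrier/read-barrier pairing described in this patch can be mimicked in user space with C11 atomics. This is only an analogy under stated assumptions: struct demo_page and both helpers are made-up names, and the C11 release and acquire fences are somewhat stronger than the kernel's smp_wmb() and smp_rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page: some contents plus an "uptodate" flag. */
struct demo_page {
	int data;
	atomic_bool uptodate;
};

/* Writer side: fill the contents, then publish the flag. */
static void demo_fill_and_set_uptodate(struct demo_page *p, int value)
{
	p->data = value;
	/* Plays the role of smp_wmb(): data store is ordered before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->uptodate, true, memory_order_relaxed);
}

/* Reader side: test the flag, then read the contents. */
static bool demo_read_if_uptodate(struct demo_page *p, int *out)
{
	if (!atomic_load_explicit(&p->uptodate, memory_order_relaxed))
		return false;	/* not uptodate: nothing is read, no barrier needed */
	/* Plays the role of smp_rmb(): flag load is ordered before the data load. */
	atomic_thread_fence(memory_order_acquire);
	*out = p->data;
	return true;
}
```

As in the patch, the reader skips the barrier when the flag is clear, because it will not touch the contents in that case.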
[PATCH 49/49] jbd2: sparse pointer use of zero as null
From: Mingming Cao <[EMAIL PROTECTED]> Get rid of sparse related warnings from places that use integer as NULL pointer. (Ported from upstream ext3/jbd changes.) Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]> --- fs/jbd2/transaction.c | 12 ++-- 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 0c8adab..b9b0b6f 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1182,7 +1182,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) } /* That test should have eliminated the following case: */ - J_ASSERT_JH(jh, jh->b_frozen_data == 0); + J_ASSERT_JH(jh, jh->b_frozen_data == NULL); JBUFFER_TRACE(jh, "file as BJ_Metadata"); spin_lock(&journal->j_list_lock); @@ -1532,7 +1532,7 @@ void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh) J_ASSERT_JH(jh, jh->b_jlist < BJ_Types); if (jh->b_jlist != BJ_None) - J_ASSERT_JH(jh, transaction != 0); + J_ASSERT_JH(jh, transaction != NULL); switch (jh->b_jlist) { case BJ_None: @@ -1601,11 +1601,11 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh) if (buffer_locked(bh) || buffer_dirty(bh)) goto out; - if (jh->b_next_transaction != 0) + if (jh->b_next_transaction != NULL) goto out; spin_lock(&journal->j_list_lock); - if (jh->b_transaction != 0 && jh->b_cp_transaction == 0) { + if (jh->b_transaction != NULL && jh->b_cp_transaction == NULL) { if (jh->b_jlist == BJ_SyncData || jh->b_jlist == BJ_Locked) { /* A written-back ordered data buffer */ JBUFFER_TRACE(jh, "release data"); @@ -1613,7 +1613,7 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh) jbd2_journal_remove_journal_head(bh); __brelse(bh); } - } else if (jh->b_cp_transaction != 0 && jh->b_transaction == 0) { + } else if (jh->b_cp_transaction != NULL && jh->b_transaction == NULL) { /* written-back checkpointed metadata buffer */ if (jh->b_jlist == BJ_None) { 
JBUFFER_TRACE(jh, "remove from checkpoint list"); @@ -1973,7 +1973,7 @@ void __jbd2_journal_file_buffer(struct journal_head *jh, J_ASSERT_JH(jh, jh->b_jlist < BJ_Types); J_ASSERT_JH(jh, jh->b_transaction == transaction || - jh->b_transaction == 0); + jh->b_transaction == NULL); if (jh->b_transaction && jh->b_jlist == jlist) return; -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Parallelize IO for e2fsck
On Mon, Jan 21, 2008 at 04:00:41PM -0700, Andreas Dilger wrote:
> On Jan 16, 2008 13:30 -0800, Valerie Henson wrote:
> > I have a partial solution that sort of blindly manages the buffer
> > cache. First, the user passes e2fsck a parameter saying how much
> > memory is available as buffer cache. The readahead thread reads
> > things in and immediately throws them away so they are only in buffer
> > cache (no double-caching). Then readahead and e2fsck work together so
> > that readahead only reads in new blocks when the main thread is done
> > with earlier blocks. The already-used blocks get kicked out of buffer
> > cache to make room for the new ones.
> >
> > What would be nice is to take into account the current total memory
> > usage of the whole fsck process and factor that in. I don't think it
> > would be hard to add to the existing cache management framework.
> > Thoughts?
>
> I discussed this with Ted at one point also. This is a generic problem,
> not just for readahead, because "fsck" can run multiple e2fsck in parallel
> and in case of many large filesystems on a single node this can cause
> memory usage problems also.
>
> What I was proposing is that "fsck.{fstype}" be modified to return an
> estimated minimum amount of memory needed, and some "desired" amount of
> memory (i.e. readahead) to fsck the filesystem, using some parameter like
> "fsck.{fstype} --report-memory-needed /dev/XXX". If this does not
> return the output in the expected format, or returns an error then fsck
> will assume some amount of memory based on the device size and continue
> as it does today.

And while fsck is running, some other program runs that uses memory and
blows your carefully calculated parameters to smithereens?

I think there is a clear need for applications to be able to register
a callback from the kernel to indicate that the machine as a whole is
running out of memory and that the application should trim its caches
to reduce memory utilisation.

Perhaps instead of swapping immediately, a SIGLOWMEM could be sent
to processes that aren't masking the signal, followed by a short
grace period to allow the processes to free up some memory before
swapping out pages from that process?

With this sort of feedback, the fsck process can scale back its
readahead and remove cached info that is not critical to what it
is currently doing, and thereby prevent readahead thrashing as
memory usage of the fsck process itself grows.

Another example where this could be useful is to tell browsers to
release some of their cache rather than having the VM swap it out.

IMO, a scheme like this will be far more reliable than trying to
guess what the optimal settings are going to be over the whole
lifetime of a process.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
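To make the proposal concrete, here is a rough user-space sketch of how a process might react to such a notification. SIGLOWMEM does not exist, so SIGUSR1 stands in for it, and struct demo_cache with its helpers is made up for illustration. The handler only sets a flag; the trimming happens later in normal context.

```c
#include <signal.h>
#include <stdlib.h>

#define DEMO_LOWMEM_SIGNAL SIGUSR1	/* stand-in: there is no real SIGLOWMEM */

static volatile sig_atomic_t lowmem_pending;

/* Signal handler: only record the event, do the real work elsewhere. */
static void on_lowmem(int signo)
{
	(void)signo;
	lowmem_pending = 1;
}

/* Hypothetical readahead cache: a small array of malloc'ed blocks. */
struct demo_cache {
	void *blocks[8];
	int nr;
};

/* Returns 0 on success. Real code would likely prefer sigaction(). */
static int demo_install_lowmem_handler(void)
{
	return signal(DEMO_LOWMEM_SIGNAL, on_lowmem) == SIG_ERR ? -1 : 0;
}

/* Called from the main loop: drop half the cache if a notification arrived. */
static int demo_poll_lowmem(struct demo_cache *c)
{
	int freed = 0;

	if (!lowmem_pending)
		return 0;
	lowmem_pending = 0;
	for (int keep = c->nr / 2; c->nr > keep; freed++)
		free(c->blocks[--c->nr]);
	return freed;
}
```

Dropping half of the cache is an arbitrary policy chosen for the sketch; the mail leaves open how much an application should release during the grace period.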
Re: W1: w1_slave units, standardize 1C or .001C? Break API
H. Peter Anvin wrote:
David Fries wrote:

The ds18b20 one wire temperature sensor conversion routine is
returning the units in degrees C while the ds1820 (ds18s20) is
returning it in .001 degrees C. 20C vs 20312C. Once you know the
units I'm liking the latter as it gives a higher precision. Time to
break user applications so the driver can give the temperature in the
same units for both sensors.

I only have the ds18b20 sensor model. Here is the current output from
the sys file for this sensor.

/sys/devices/w1_bus_master1/28-000e84a2/w1_slave
45 01 4b 46 7f ff 0b 10 84 : crc=84 YES
45 01 4b 46 7f ff 0b 10 84 t=20

I ran the example data from the specification for the ds1820 through
its conversion routine and found that t= was 1000 times the value.

What should the displayed units be? This is the same ds18b20
conversion *1000. Is everyone ok or is anyone objecting to .001 degrees
C for the units? Patch will follow. The .001 C does truncate one bit
of precision from the ds18b20 by the way.

Millikelvins would have the nice property of never being negative. :)

Alternatively, centikelvins would fit nicely in 16 bits if anyone
cares...

655.35 K = 382.20 °C = 719.96 °F

-hpa
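The arithmetic in this thread is easy to check. The sketch below assumes the ds18b20 is at its default 12-bit resolution (1/16 degree C per count) and uses made-up function names; it is not the w1 driver's code.

```c
#include <stdint.h>

/*
 * Convert the two temperature bytes of the scratchpad (LSB first, as in
 * the dump "45 01 ...") to millidegrees Celsius. Assumes 12-bit
 * resolution, i.e. one count is 1/16 degree C.
 */
static int ds18b20_raw_to_millicelsius(uint8_t lsb, uint8_t msb)
{
	int16_t raw = (int16_t)(uint16_t)(((unsigned)msb << 8) | lsb);

	return raw * 1000 / 16;	/* integer division drops the last digit of .0625 steps */
}

/* The centikelvin idea: 0.01 K units, offset by 273.15 K. */
static int millicelsius_to_centikelvin(int mc)
{
	return (mc + 273150) / 10;
}
```

With the bytes from the dump, 0x0145 is 325 counts, or 20.3125 degrees C, which comes out as 20312 in millidegrees; that matches the figure quoted above and shows the truncation that was mentioned. A value of 382200 millidegrees maps to 65535 centikelvin, the largest value an unsigned 16-bit field can hold.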
Re: [PATCH] Use separate sections for __dev/__cpu/__mem code/data
On Tue, Jan 22, 2008 at 09:56:57AM +0900, Paul Mundt wrote:
> On Mon, Jan 21, 2008 at 01:06:41PM +0100, Sam Ravnborg wrote:
> > On Mon, Jan 21, 2008 at 07:52:57PM +0900, Paul Mundt wrote:
> > > On Mon, Jan 21, 2008 at 11:47:45AM +0100, Sam Ravnborg wrote:
> > > > On Mon, Jan 21, 2008 at 11:45:06AM +0100, Sam Ravnborg wrote:
> > > > > On Mon, Jan 21, 2008 at 11:29:52AM +0100, Andreas Schwab wrote:
> > > > > > Sam Ravnborg <[EMAIL PROTECTED]> writes:
> > > > > >
> > > > > > > On Mon, Jan 21, 2008 at 04:33:41PM +0900, Paul Mundt wrote:
> > > > > > >> so the ## is being taken directly rather than acting as a
> > > > > > >> concatenation.
> > > > > > >
> > > > > > > Strange...
> > > > > > > I can reproduce with gcc 3.4.5 here - will fix.
> > > > > >
> > > > > > The ## operator does not work with -traditional.
> > > > >
> > > > > Crap - then it breaks at the following architectures:
> > > > > sh64, s390, m68k, m32r
> > > > >
> > > > > Thanks Andreas.
> > > >
> > > > OK - I was too quick it seems.
> > > > sh has:
> > > > arch/sh/Makefile:CPPFLAGS_vmlinux.lds := -traditional
> > > >
> > > > So this needs to be ripped out as it is not needed.
> > > >
> > > Yes, that can be killed. If this is aimed at 2.6.25, I'll just kill it
> > > off in my tree. Otherwise, feel free to roll this in to your patch set.
> >
> > It is aimed for 2.6.25 and it looks reasonable to reach that goal.
> > So please take this patch in your tree.
> >
> Done.

Thanks Paul, and thanks for the prompt testing!

	Sam
[PATCH 32/49] jbd2: jbd2 stats through procfs
From: Johann Lombardi <[EMAIL PROTECTED]> The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size. Sometimes, investigating performance problems, i find useful to have stats from jbd about transaction's lifetime, size, etc. here is a patch for review and inclusion probably. for example, stats after creation of 3M files in htree directory: [EMAIL PROTECTED] ~]# cat /proc/fs/jbd/sda/history R/C tid wait run lock flush log hndls block inlog ctime write drop close R261 8260 2720 0 0 750 9892 8170 8187 C259750 0 4885 1 R262 202200 100 770 9836 8170 8187 R263 302200 100 3070 9812 8170 8187 R264 0 5000 100 1340 0 0 0 C2618240 3212 4957 0 R265 8260 1470 0 0 4640 9854 8170 8187 R266 0 5000 100 1460 0 0 0 C2628210 2989 4868 0 R267 8230 1490 100 4440 9875 8171 8188 R268 0 5000 100 1260 0 0 0 C2637710 2937 4908 0 R269 7730 1470 100 3330 9841 8170 8187 R270 0 5000 100 830 0 0 0 C2658140 3234 4898 0 C267720 0 4849 1 R271 8630 2740 200 740 9819 8170 8187 C269800 0 4214 1 R272 402170 100 830 9716 8170 8187 R273 402280 0 0 3530 9799 8170 8187 R274 0 5000 100 990 0 0 0 where, R - line for transaction's life from T_RUNNING to T_FINISHED C - line for transaction's checkpointing tid - transaction's id wait - for how long we were waiting for new transaction to start (the longest period journal_start() took in this transaction) run - real transaction's lifetime (from T_RUNNING to T_LOCKED lock - how long we were waiting for all handles to close (time the transaction was in T_LOCKED) flush - how long it took to flush all data (data=ordered) log - how long it took to write the transaction to the log hndls - how many handles got to the transaction block - how many blocks got to the transaction inlog - how many blocks are written to the log (block + descriptors) ctime - how long it took to checkpoint 
the transaction write - how many blocks have been written during checkpointing drop - how many blocks have been dropped during checkpointing close - how many running transactions have been closed to checkpoint this one all times are in msec. [EMAIL PROTECTED] ~]# cat /proc/fs/jbd/sda/info 280 transaction, each upto 8192 blocks average: 1633ms waiting for transaction 3616ms running transaction 5ms transaction was being locked 1ms flushing data (in ordered mode) 1799ms logging transaction 11781 handles per transaction 5629 blocks per transaction 5641 logged blocks per transaction Signed-off-by: Johann Lombardi <[EMAIL PROTECTED]> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]> --- fs/jbd2/checkpoint.c | 10 +- fs/jbd2/commit.c | 49 +++ fs/jbd2/journal.c | 338 + fs/jbd2/transaction.c |9 ++ include/linux/jbd2.h | 77 +++ 5 files changed, 481 insertions(+), 2 deletions(-) diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 7e958c8..1b7f282 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -232,7 +232,8 @@ __flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count) * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it */ static int __process_buffer(journal_t *journal, struct journal_head *jh, - struct buffer_head **bhs, int *batch_count) + struct buffer_head **bhs, int *batch_count, + transaction_t *transaction) { struct buffer_head *bh = jh2bh(jh); int ret = 0; @@ -250,6 +251,7 @@ static int __process_buffer(journal_t *journal, struct journal_head *jh, transaction_t *t = jh->b_transaction; tid_t tid = t->t_tid; + transaction->t_chp_stats.cs_forced_to_close++; spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); jbd2_log_start_commit(journal, tid); @@ -279,6 +281,7 @@ static int __process_buffer(journal_t *j
[PATCH 06/49] ext4: fixes block group number being set to a negative value
From: Avantika Mathur <[EMAIL PROTECTED]> This patch fixes various places where the group number is set to a negative value. Signed-off-by: Avantika Mathur <[EMAIL PROTECTED]> Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]> --- fs/ext4/ialloc.c | 101 - 1 files changed, 53 insertions(+), 48 deletions(-) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 64dea86..7b5cfa6 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -260,12 +260,14 @@ error_return: * For other inodes, search forward from the parent directory\'s block * group to find a free inode. */ -static ext4_group_t find_group_dir(struct super_block *sb, struct inode *parent) +static int find_group_dir(struct super_block *sb, struct inode *parent, + ext4_group_t *best_group) { ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count; unsigned int freei, avefreei; struct ext4_group_desc *desc, *best_desc = NULL; - ext4_group_t group, best_group = -1; + ext4_group_t group; + int ret = -1; freei = percpu_counter_read_positive(&EXT4_SB(sb)->s_freeinodes_counter); avefreei = freei / ngroups; @@ -279,11 +281,12 @@ static ext4_group_t find_group_dir(struct super_block *sb, struct inode *parent) if (!best_desc || (le16_to_cpu(desc->bg_free_blocks_count) > le16_to_cpu(best_desc->bg_free_blocks_count))) { - best_group = group; + *best_group = group; best_desc = desc; + ret = 0; } } - return best_group; + return ret; } /* @@ -314,8 +317,8 @@ static ext4_group_t find_group_dir(struct super_block *sb, struct inode *parent) #define INODE_COST 64 #define BLOCK_COST 256 -static ext4_group_t find_group_orlov(struct super_block *sb, - struct inode *parent) +static int find_group_orlov(struct super_block *sb, struct inode *parent, + ext4_group_t *group) { ext4_group_t parent_group = EXT4_I(parent)->i_block_group; struct ext4_sb_info *sbi = EXT4_SB(sb); @@ -328,7 +331,7 @@ static ext4_group_t find_group_orlov(struct super_block *sb, unsigned int ndirs; int max_debt, max_dirs, min_inodes; ext4_grpblk_t min_blocks; - 
ext4_group_t group = -1, i; + ext4_group_t i; struct ext4_group_desc *desc; freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter); @@ -341,13 +344,14 @@ static ext4_group_t find_group_orlov(struct super_block *sb, if ((parent == sb->s_root->d_inode) || (EXT4_I(parent)->i_flags & EXT4_TOPDIR_FL)) { int best_ndir = inodes_per_group; - ext4_group_t best_group = -1; + ext4_group_t grp; + int ret = -1; - get_random_bytes(&group, sizeof(group)); - parent_group = (unsigned)group % ngroups; + get_random_bytes(&grp, sizeof(grp)); + parent_group = (unsigned)grp % ngroups; for (i = 0; i < ngroups; i++) { - group = (parent_group + i) % ngroups; - desc = ext4_get_group_desc (sb, group, NULL); + grp = (parent_group + i) % ngroups; + desc = ext4_get_group_desc(sb, grp, NULL); if (!desc || !desc->bg_free_inodes_count) continue; if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir) @@ -356,11 +360,12 @@ static ext4_group_t find_group_orlov(struct super_block *sb, continue; if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb) continue; - best_group = group; + *group = grp; + ret = 0; best_ndir = le16_to_cpu(desc->bg_used_dirs_count); } - if (best_group >= 0) - return best_group; + if (ret == 0) + return ret; goto fallback; } @@ -381,8 +386,8 @@ static ext4_group_t find_group_orlov(struct super_block *sb, max_debt = 1; for (i = 0; i < ngroups; i++) { - group = (parent_group + i) % ngroups; - desc = ext4_get_group_desc (sb, group, NULL); + *group = (parent_group + i) % ngroups; + desc = ext4_get_group_desc(sb, *group, NULL); if (!desc || !desc->bg_free_inodes_count) continue; if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs) @@ -391,17 +396,16 @@ static ext4_group_t find_group_orlov(struct super_block *sb, conti
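The pattern this patch moves to, an int status plus an out-parameter instead of a -1 sentinel stored in an unsigned group number, can be shown in isolation. The sketch assumes the group type is unsigned; demo_group_t and demo_find_group are made-up names, and the selection rule is simplified to "most free inodes".

```c
/* Stand-in for ext4_group_t; assumed to be an unsigned type. */
typedef unsigned long demo_group_t;

/*
 * Pick the group with the most free inodes.
 * Returns 0 and stores the group in *best_group on success,
 * or -1 (leaving *best_group untouched) if every group is full.
 *
 * With the old style, "demo_group_t best = -1" wraps to the largest
 * unsigned value, so a later "best >= 0" test can never fail.
 */
static int demo_find_group(const unsigned *free_inodes, demo_group_t ngroups,
			   demo_group_t *best_group)
{
	unsigned best_free = 0;
	int ret = -1;

	for (demo_group_t g = 0; g < ngroups; g++) {
		if (free_inodes[g] > best_free) {
			best_free = free_inodes[g];
			*best_group = g;
			ret = 0;
		}
	}
	return ret;
}
```

Keeping the error indication in a signed int means the group number itself never has to carry a negative value.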
[PATCH 03/49] ext4: Introduce ext4_lblk_t
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> This patch adds a new data type ext4_lblk_t to represent the logical file blocks. This is the preparatory patch to support large files in ext4 The follow up patch with convert the ext4_inode i_blocks to represent the number of blocks in file system block size. This changes makes it possible to have a block number 2**32 -1 which will result in overflow if the block number is represented by signed long. This patch convert all the block number to type ext4_lblk_t which is typedef to __u32 Also remove dead code ext4_ext_walk_space Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]> --- fs/ext4/dir.c |2 +- fs/ext4/extents.c | 218 --- fs/ext4/inode.c | 34 --- fs/ext4/namei.c | 54 ++- fs/ext4/super.c |4 +- include/linux/ext4_fs.h | 29 -- include/linux/ext4_fs_extents.h | 19 +--- include/linux/ext4_fs_i.h |9 +- 8 files changed, 143 insertions(+), 226 deletions(-) diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index 145a9c0..33888bb 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -124,7 +124,7 @@ static int ext4_readdir(struct file * filp, offset = filp->f_pos & (sb->s_blocksize - 1); while (!error && !stored && filp->f_pos < inode->i_size) { - unsigned long blk = filp->f_pos >> EXT4_BLOCK_SIZE_BITS(sb); + ext4_lblk_t blk = filp->f_pos >> EXT4_BLOCK_SIZE_BITS(sb); struct buffer_head map_bh; struct buffer_head *bh = NULL; diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 8528774..19d8059 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -144,7 +144,7 @@ static int ext4_ext_dirty(handle_t *handle, struct inode *inode, static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode, struct ext4_ext_path *path, - ext4_fsblk_t block) + ext4_lblk_t block) { struct ext4_inode_info *ei = EXT4_I(inode); ext4_fsblk_t bg_start; @@ -367,13 +367,14 @@ static void ext4_ext_drop_refs(struct ext4_ext_path *path) * the header must be checked 
before calling this */ static void -ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int block) +ext4_ext_binsearch_idx(struct inode *inode, + struct ext4_ext_path *path, ext4_lblk_t block) { struct ext4_extent_header *eh = path->p_hdr; struct ext4_extent_idx *r, *l, *m; - ext_debug("binsearch for %d(idx): ", block); + ext_debug("binsearch for %lu(idx): ", (unsigned long)block); l = EXT_FIRST_INDEX(eh) + 1; r = EXT_LAST_INDEX(eh); @@ -425,7 +426,8 @@ ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int bloc * the header must be checked before calling this */ static void -ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block) +ext4_ext_binsearch(struct inode *inode, + struct ext4_ext_path *path, ext4_lblk_t block) { struct ext4_extent_header *eh = path->p_hdr; struct ext4_extent *r, *l, *m; @@ -438,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block) return; } - ext_debug("binsearch for %d: ", block); + ext_debug("binsearch for %lu: ", (unsigned long)block); l = EXT_FIRST_EXTENT(eh) + 1; r = EXT_LAST_EXTENT(eh); @@ -494,7 +496,8 @@ int ext4_ext_tree_init(handle_t *handle, struct inode *inode) } struct ext4_ext_path * -ext4_ext_find_extent(struct inode *inode, int block, struct ext4_ext_path *path) +ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block, + struct ext4_ext_path *path) { struct ext4_extent_header *eh; struct buffer_head *bh; @@ -979,8 +982,8 @@ repeat: /* refill path */ ext4_ext_drop_refs(path); path = ext4_ext_find_extent(inode, - le32_to_cpu(newext->ee_block), - path); + (ext4_lblk_t)le32_to_cpu(newext->ee_block), + path); if (IS_ERR(path)) err = PTR_ERR(path); } else { @@ -992,8 +995,8 @@ repeat: /* refill path */ ext4_ext_drop_refs(path); path = ext4_ext_find_extent(inode, - le32_to_cpu(newext->ee_block), - path); + (ext4_lblk_t)le32_to_cpu(newext->ee_block), + path); if (IS
[PATCH 35/49] ext4: Add inode version support in ext4
From: Jean Noel Cordenner <[EMAIL PROTECTED]> This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]> Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Signed-off-by: Jean Noel Cordenner <[EMAIL PROTECTED]> --- fs/ext4/inode.c | 18 +- fs/ext4/super.c | 10 -- fs/inode.c | 17 - include/linux/ext4_fs.h |6 +- include/linux/fs.h | 16 +++- 5 files changed, 45 insertions(+), 22 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index ee0bc3a..3c013e5 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2780,6 +2780,13 @@ void ext4_read_inode(struct inode * inode) EXT4_INODE_GET_XTIME(i_atime, inode, raw_inode); EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode); + inode->i_version = le32_to_cpu(raw_inode->i_disk_version); + if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) { + if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi)) + inode->i_version |= + (__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32; + } + if (S_ISREG(inode->i_mode)) { inode->i_op = &ext4_file_inode_operations; inode->i_fop = &ext4_file_operations; @@ -2962,8 +2969,14 @@ static int ext4_do_update_inode(handle_t *handle, } else for (block = 0; block < EXT4_N_BLOCKS; block++) raw_inode->i_block[block] = ei->i_data[block]; - if (ei->i_extra_isize) + raw_inode->i_disk_version = cpu_to_le32(inode->i_version); + if (ei->i_extra_isize) { + if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi)) + raw_inode->i_version_hi = + cpu_to_le32(inode->i_version >> 32); raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize); + } + BUFFER_TRACE(bh, "call 
ext4_journal_dirty_metadata"); rc = ext4_journal_dirty_metadata(handle, bh); @@ -3190,6 +3203,9 @@ int ext4_mark_iloc_dirty(handle_t *handle, { int err = 0; + if (test_opt(inode->i_sb, I_VERSION)) + inode_inc_iversion(inode); + /* the do_update_inode consumes one bh->b_count */ get_bh(iloc->bh); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index f7479d3..aa22acd 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -732,6 +732,8 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs) seq_puts(seq, ",nobh"); if (!test_opt(sb, EXTENTS)) seq_puts(seq, ",noextents"); + if (test_opt(sb, I_VERSION)) + seq_puts(seq, ",i_version"); if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) seq_puts(seq, ",data=journal"); @@ -874,7 +876,7 @@ enum { Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota, Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota, Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota, - Opt_grpquota, Opt_extents, Opt_noextents, + Opt_grpquota, Opt_extents, Opt_noextents, Opt_i_version, }; static match_table_t tokens = { @@ -928,6 +930,7 @@ static match_table_t tokens = { {Opt_barrier, "barrier=%u"}, {Opt_extents, "extents"}, {Opt_noextents, "noextents"}, + {Opt_i_version, "i_version"}, {Opt_err, NULL}, {Opt_resize, "resize"}, }; @@ -1273,6 +1276,10 @@ clear_qf_name: case Opt_noextents: clear_opt (sbi->s_mount_opt, EXTENTS); break; + case Opt_i_version: + set_opt(sbi->s_mount_opt, I_VERSION); + sb->s_flags |= MS_I_VERSION; + break; default: printk (KERN_ERR "EXT4-fs: Unrecognized mount option \"%s\" " @@ -3197,7 +3204,6 @@ out: i_size_write(inode, off+len-towrite); EXT4_I(inode)->i_disksize = inode->i_size; } - inode->i_version++; inode->i_mtime = inode->i_ctime = CURRENT_TIME; ext4_mark_inode_dirty(handle, inode); mutex_unlock(&inode->i_mutex); diff --git a/fs/inode.c b/fs/inode.c index b48324a..276ffd6 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1243,23 +1243,6 @@ void touch_atime(struct vfsmount *mnt, 
struct dentry *dentry) EXPORT_SYMBOL(touch_atime); /** - * inode_inc_iversion - increments i_version - * @i
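The split of the 64-bit version across two 32-bit on-disk fields can be sketched on its own. The struct and function names below are made up, the has_hi flag stands in for the "inode is large enough" test, and the little-endian conversions the patch performs are left out.

```c
#include <stdint.h>

/* Hypothetical on-disk layout: low half always present, high half optional. */
struct demo_raw_inode {
	uint32_t version_lo;
	uint32_t version_hi;
};

/* Store a 64-bit version; the high half is written only if there is room. */
static void demo_store_version(struct demo_raw_inode *raw, uint64_t v, int has_hi)
{
	raw->version_lo = (uint32_t)v;
	if (has_hi)
		raw->version_hi = (uint32_t)(v >> 32);
}

/* Load it back; without the high field only the low 32 bits survive. */
static uint64_t demo_load_version(const struct demo_raw_inode *raw, int has_hi)
{
	uint64_t v = raw->version_lo;

	if (has_hi)
		v |= (uint64_t)raw->version_hi << 32;
	return v;
}
```

The cast to a 64-bit type before the shift matters: shifting a 32-bit value left by 32 would not produce the intended result.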
[PATCH 38/49] ext4: fix up EXT4FS_DEBUG builds
From: Eric Sandeen <[EMAIL PROTECTED]> Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/ext4/balloc.c |6 +++--- fs/ext4/ialloc.c |2 +- fs/ext4/resize.c | 16 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 925e063..54d3da7 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -1630,7 +1630,7 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode, sbi = EXT4_SB(sb); es = EXT4_SB(sb)->s_es; - ext4_debug("goal=%lu.\n", goal); + ext4_debug("goal=%llu.\n", goal); /* * Allocate a block from reservation only when * filesystem is mounted with reservation(default,-o reservation), and @@ -1740,7 +1740,7 @@ retry_alloc: allocated: - ext4_debug("using block group %d(%d)\n", + ext4_debug("using block group %lu(%d)\n", group_no, gdp->bg_free_blocks_count); BUFFER_TRACE(gdp_bh, "get_write_access"); @@ -1898,7 +1898,7 @@ ext4_fsblk_t ext4_count_free_blocks(struct super_block *sb) brelse(bitmap_bh); printk("ext4_count_free_blocks: stored = %llu" ", computed = %llu, %llu\n", - EXT4_FREE_BLOCKS_COUNT(es), + ext4_free_blocks_count(es), desc_count, bitmap_count); return bitmap_count; #else diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 17b5df1..575b521 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -857,7 +857,7 @@ unsigned long ext4_count_free_inodes (struct super_block * sb) continue; x = ext4_count_free(bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8); - printk("group %d: stored = %d, counted = %lu\n", + printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n", i, le16_to_cpu(gdp->bg_free_inodes_count), x); bitmap_count += x; } diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 7090c2d..4fbba60 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -206,7 +206,7 @@ static int setup_new_group_blocks(struct super_block 
*sb, } if (ext4_bg_has_super(sb, input->group)) { - ext4_debug("mark backup superblock %#04lx (+0)\n", start); + ext4_debug("mark backup superblock %#04llx (+0)\n", start); ext4_set_bit(0, bh->b_data); } @@ -215,7 +215,7 @@ static int setup_new_group_blocks(struct super_block *sb, i < gdblocks; i++, block++, bit++) { struct buffer_head *gdb; - ext4_debug("update backup group %#04lx (+%d)\n", block, bit); + ext4_debug("update backup group %#04llx (+%d)\n", block, bit); if ((err = extend_or_restart_transaction(handle, 1, bh))) goto exit_bh; @@ -243,7 +243,7 @@ static int setup_new_group_blocks(struct super_block *sb, i < reserved_gdb; i++, block++, bit++) { struct buffer_head *gdb; - ext4_debug("clear reserved block %#04lx (+%d)\n", block, bit); + ext4_debug("clear reserved block %#04llx (+%d)\n", block, bit); if ((err = extend_or_restart_transaction(handle, 1, bh))) goto exit_bh; @@ -256,10 +256,10 @@ static int setup_new_group_blocks(struct super_block *sb, ext4_set_bit(bit, bh->b_data); brelse(gdb); } - ext4_debug("mark block bitmap %#04x (+%ld)\n", input->block_bitmap, + ext4_debug("mark block bitmap %#04llx (+%llu)\n", input->block_bitmap, input->block_bitmap - start); ext4_set_bit(input->block_bitmap - start, bh->b_data); - ext4_debug("mark inode bitmap %#04x (+%ld)\n", input->inode_bitmap, + ext4_debug("mark inode bitmap %#04llx (+%llu)\n", input->inode_bitmap, input->inode_bitmap - start); ext4_set_bit(input->inode_bitmap - start, bh->b_data); @@ -268,7 +268,7 @@ static int setup_new_group_blocks(struct super_block *sb, i < sbi->s_itb_per_group; i++, bit++, block++) { struct buffer_head *it; - ext4_debug("clear inode block %#04lx (+%d)\n", block, bit); + ext4_debug("clear inode block %#04llx (+%d)\n", block, bit); if ((err = extend_or_restart_transaction(handle, 1, bh))) goto exit_bh; @@ -291,7 +291,7 @@ static int setup_new_group_blocks(struct super_block *sb, brelse(bh); /* Mark unused entries in inode bitmap used */ - ext4_debug("clear inode bitmap %#04x 
(+%ld)\n", + ext4_debug("clear inode bitmap %#04llx (+%llu)\n", input->inode_bitmap, input->inode_bitmap - start);
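The format fixes above all follow one rule: a 64-bit block number needs %llu (or %llx), because %lu is only correct where long happens to be 64 bits wide. A minimal userspace sketch of that rule, assuming a hypothetical fsblk_t typedef standing in for ext4_fsblk_t and a hypothetical format_goal() helper that is not part of the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's ext4_fsblk_t (a 64-bit block number). */
typedef unsigned long long fsblk_t;

/*
 * Format a "goal=<block>." debug message into buf.
 * %llu matches unsigned long long on both 32-bit and 64-bit builds;
 * %lu would be wrong wherever long is only 32 bits wide.
 * Returns the number of characters snprintf() wanted to write.
 */
static int format_goal(char *buf, size_t len, fsblk_t goal)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "goal=%llu.", goal);
}
```

When the type of the argument is not guaranteed to be unsigned long long (as with i_version in the vfs patch of this series), an explicit (unsigned long long) cast at the call site keeps the format string valid.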
[PATCH 34/49] vfs: Add 64 bit i_version support
From: Jean Noel Cordenner <[EMAIL PROTECTED]> The i_version field of the inode is changed to be a 64-bit counter that is set on every inode creation and that is incremented every time the inode data is modified (similarly to the "ctime" time-stamp). The aim is to fulfill a NFSv4 requirement for rfc3530. This first part concerns the vfs, it converts the 32-bit i_version in the generic inode to a 64-bit, a flag is added in the super block in order to check if the feature is enabled and the i_version is incremented in the vfs. Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Signed-off-by: Jean Noel Cordenner <[EMAIL PROTECTED]> Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]> --- fs/afs/dir.c |9 + fs/afs/inode.c |3 ++- fs/inode.c | 22 ++ include/linux/fs.h |5 - 4 files changed, 33 insertions(+), 6 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 33fe39a..0cc3597 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -546,11 +546,11 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry, dentry->d_op = &afs_fs_dentry_operations; d_add(dentry, inode); - _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%lu }", + _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%llu }", fid.vnode, fid.unique, dentry->d_inode->i_ino, - dentry->d_inode->i_version); + (unsigned long long)dentry->d_inode->i_version); return NULL; } @@ -630,9 +630,10 @@ static int afs_d_revalidate(struct dentry *dentry, struct nameidata *nd) * been deleted and replaced, and the original vnode ID has * been reused */ if (fid.unique != vnode->fid.unique) { - _debug("%s: file deleted (uq %u -> %u I:%lu)", + _debug("%s: file deleted (uq %u -> %u I:%llu)", dentry->d_name.name, fid.unique, - vnode->fid.unique, dentry->d_inode->i_version); + vnode->fid.unique, + (unsigned long long)dentry->d_inode->i_version); spin_lock(&vnode->lock); set_bit(AFS_VNODE_DELETED, &vnode->flags); spin_unlock(&vnode->lock); diff --git a/fs/afs/inode.c b/fs/afs/inode.c index d196840..84750c8 100644 --- a/fs/afs/inode.c +++ 
b/fs/afs/inode.c @@ -301,7 +301,8 @@ int afs_getattr(struct vfsmount *mnt, struct dentry *dentry, inode = dentry->d_inode; - _enter("{ ino=%lu v=%lu }", inode->i_ino, inode->i_version); + _enter("{ ino=%lu v=%llu }", inode->i_ino, + (unsigned long long)inode->i_version); generic_fillattr(inode, stat); return 0; diff --git a/fs/inode.c b/fs/inode.c index ed35383..b48324a 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1243,6 +1243,23 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry) EXPORT_SYMBOL(touch_atime); /** + * inode_inc_iversion - increments i_version + * @inode: inode that need to be updated + * + * Every time the inode is modified, the i_version field + * will be incremented. + * The filesystem has to be mounted with i_version flag + * + */ + +void inode_inc_iversion(struct inode *inode) +{ + spin_lock(&inode->i_lock); + inode->i_version++; + spin_unlock(&inode->i_lock); +} + +/** * file_update_time- update mtime and ctime time * @file: file accessed * @@ -1276,6 +1293,11 @@ void file_update_time(struct file *file) sync_it = 1; } + if (IS_I_VERSION(inode)) { + inode_inc_iversion(inode); + sync_it = 1; + } + if (sync_it) mark_inode_dirty_sync(inode); } diff --git a/include/linux/fs.h b/include/linux/fs.h index b3ec4a4..94cf5d8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -124,6 +124,7 @@ extern int dir_notify_enable; #define MS_SHARED (1<<20) /* change to shared */ #define MS_RELATIME(1<<21) /* Update atime relative to mtime/ctime. 
*/ #define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */ +#define MS_I_VERSION (1<<23) /* Update inode I_version field */ #define MS_ACTIVE (1<<30) #define MS_NOUSER (1<<31) @@ -173,6 +174,7 @@ extern int dir_notify_enable; ((inode)->i_flags & (S_SYNC|S_DIRSYNC))) #define IS_MANDLOCK(inode) __IS_FLG(inode, MS_MANDLOCK) #define IS_NOATIME(inode) __IS_FLG(inode, MS_RDONLY|MS_NOATIME) +#define IS_I_VERSION(inode) __IS_FLG(inode, MS_I_VERSION) #define IS_NOQUOTA(inode) ((inode)->i_flags & S_NOQUOTA) #define IS_APPEND(inode) ((inode)->i_flags & S_APPEND) @@ -599,7 +601,7 @@ struct inode { uid_t i_uid; gid_t i_gid; dev_t i_rdev; -
[PATCH 18/49] ext4: sync up block group descriptor with e2fsprogs.
From: Coly Li <[EMAIL PROTECTED]> This patch extends bg_itable_unused of ext4 group descriptor from 16bit into 32bit. In order to add bg_itable_unused_hi into struct ext4_group_desc, some extra fields which are already introduced into e2fsprogs are also added in for consistency. Signed-off-by: Coly Li <[EMAIL PROTECTED]> Cc: Andreas Dilger <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- include/linux/ext4_fs.h |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h index 6ae91f4..55a376e 100644 --- a/include/linux/ext4_fs.h +++ b/include/linux/ext4_fs.h @@ -118,6 +118,11 @@ struct ext4_group_desc __le32 bg_block_bitmap_hi; /* Blocks bitmap block MSB */ __le32 bg_inode_bitmap_hi; /* Inodes bitmap block MSB */ __le32 bg_inode_table_hi; /* Inodes table block MSB */ + __le16 bg_free_blocks_count_hi;/* Free blocks count MSB */ + __le16 bg_free_inodes_count_hi;/* Free inodes count MSB */ + __le16 bg_used_dirs_count_hi; /* Directories count MSB */ + __le16 bg_itable_unused_hi;/* Unused inodes count MSB */ + __u32 bg_reserved2[3]; }; #define EXT4_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not in use */ -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 33/49] ext4: Add the journal checksum feature
From: Girish Shilamkar <[EMAIL PROTECTED]> The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> Signed-off-by: Girish Shilamkar <[EMAIL PROTECTED]> Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- Documentation/filesystems/ext4.txt | 10 ++ fs/Kconfig |1 + fs/ext4/super.c| 25 + fs/jbd2/commit.c | 196 +++- fs/jbd2/journal.c | 28 + fs/jbd2/recovery.c | 149 ++-- include/linux/ext4_fs.h|3 +- include/linux/jbd2.h | 36 ++- 8 files changed, 388 insertions(+), 60 deletions(-) diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 6a4adca..4f329af 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -89,6 +89,16 @@ When mounting an ext4 filesystem, the following option are accepted: extentsext4 will use extents to address file data. The file system will no longer be mountable by ext3. +journal_checksum Enable checksumming of the journal transactions. + This will allow the recovery code in e2fsck and the + kernel to detect corruption in the kernel. It is a + compatible change and will be ignored by older kernels. 
+ +journal_async_commit Commit block can be written to disk without waiting + for descriptor blocks. If enabled older kernels cannot + mount the device. This will enable 'journal_checksum' + internally. + journal=update Update the ext4 file system's journal to the current format. diff --git a/fs/Kconfig b/fs/Kconfig index 487236c..bb0b72c 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -236,6 +236,7 @@ config JBD_DEBUG config JBD2 tristate + select CRC32 help This is a generic journaling layer for block devices that support both 32-bit and 64-bit block numbers. It is currently used by diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c730544..f7479d3 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -869,6 +869,7 @@ enum { Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl, Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_bh, Opt_commit, Opt_journal_update, Opt_journal_inum, Opt_journal_dev, + Opt_journal_checksum, Opt_journal_async_commit, Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback, Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota, Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota, @@ -908,6 +909,8 @@ static match_table_t tokens = { {Opt_journal_update, "journal=update"}, {Opt_journal_inum, "journal=%u"}, {Opt_journal_dev, "journal_dev=%u"}, + {Opt_journal_checksum, "journal_checksum"}, + {Opt_journal_async_commit, "journal_async_commit"}, {Opt_abort, "abort"}, {Opt_data_journal, "data=journal"}, {Opt_data_ordered, "data=ordered"}, @@ -1095,6 +1098,13 @@ static int parse_options (char *options, struct super_block *sb, return 0; *journal_devnum = option; break; + case Opt_journal_checksum: + set_opt(sbi->s_mount_opt, JOURNAL_CHECKSUM); + break; + case Opt_journal_async_commit: + set_opt(sbi->s_mount_opt, JOURNAL_ASYNC_COMMIT); + set_opt(sbi->s_mount_opt, JOURNAL_CHECKSUM); + break; case Opt_noload: set_opt (sbi->s_mount_opt, NOLOAD); break; @@ -2114,6 +2124,21 @@ static int ext4_fill_super (struct 
super_block *sb, void *data, int silent) goto failed_mount4; } + if (test_opt(sb, JOURNAL_ASYNC_
[PATCH 02/49] ext4: Avoid rec_len overflow with 64KB block size
From: Jan Kara <[EMAIL PROTECTED]> With 64KB blocksize, a directory entry can have size 64KB, which does not fit into the 16 bits we have for the entry length. So we store 0x instead and convert the value when it is read from / written to disk. The patch also converts some places to use ext4_next_entry() since we are changing them anyway. Signed-off-by: Jan Kara <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/ext4/dir.c | 12 fs/ext4/namei.c | 77 ++ include/linux/ext4_fs.h | 20 3 files changed, 63 insertions(+), 46 deletions(-) diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index f612bef..145a9c0 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -67,7 +67,7 @@ int ext4_check_dir_entry (const char * function, struct inode * dir, unsigned long offset) { const char * error_msg = NULL; - const int rlen = le16_to_cpu(de->rec_len); + const int rlen = ext4_rec_len_from_disk(de->rec_len); if (rlen < EXT4_DIR_REC_LEN(1)) error_msg = "rec_len is smaller than minimal"; @@ -172,10 +172,10 @@ revalidate: * least that it is non-zero. A * failure will be detected in the * dirent test below. 
*/ - if (le16_to_cpu(de->rec_len) < - EXT4_DIR_REC_LEN(1)) + if (ext4_rec_len_from_disk(de->rec_len) + < EXT4_DIR_REC_LEN(1)) break; - i += le16_to_cpu(de->rec_len); + i += ext4_rec_len_from_disk(de->rec_len); } offset = i; filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1)) @@ -197,7 +197,7 @@ revalidate: ret = stored; goto out; } - offset += le16_to_cpu(de->rec_len); + offset += ext4_rec_len_from_disk(de->rec_len); if (le32_to_cpu(de->inode)) { /* We might block in the next section * if the data destination is @@ -219,7 +219,7 @@ revalidate: goto revalidate; stored ++; } - filp->f_pos += le16_to_cpu(de->rec_len); + filp->f_pos += ext4_rec_len_from_disk(de->rec_len); } offset = 0; brelse (bh); diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 94ee6f3..d9a3a2f 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -280,7 +280,7 @@ static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext4_dir_ent space += EXT4_DIR_REC_LEN(de->name_len); names++; } - de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len)); + de = ext4_next_entry(de); } printk("(%i)\n", names); return (struct stats) { names, space, 1 }; @@ -551,7 +551,8 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash, */ static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 *p) { - return (struct ext4_dir_entry_2 *)((char*)p + le16_to_cpu(p->rec_len)); + return (struct ext4_dir_entry_2 *)((char *)p + + ext4_rec_len_from_disk(p->rec_len)); } /* @@ -720,7 +721,7 @@ static int dx_make_map (struct ext4_dir_entry_2 *de, int size, cond_resched(); } /* XXX: do we need to check rec_len == 0 case? 
-Chris */ - de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len)); + de = ext4_next_entry(de); } return count; } @@ -820,7 +821,7 @@ static inline int search_dirblock(struct buffer_head * bh, return 1; } /* prevent looping on a bad block */ - de_len = le16_to_cpu(de->rec_len); + de_len = ext4_rec_len_from_disk(de->rec_len); if (de_len <= 0) return -1; offset += de_len; @@ -1128,7 +1129,7 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count) rec_len = EXT4_DIR_REC_LEN(de->name_len); memcpy (to, de, rec_len); ((struct ext4_dir_entry_2 *) to)->rec_len = - cpu_to_le16(rec_len); + ext4_rec_len_to_disk(rec_len); de->inode = 0; map++; to += rec_len; @@ -11
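The header hunk that defines ext4_rec_len_from_disk() and ext4_rec_len_to_disk() is cut off in this archive, so the following is only a sketch of the conversion the commit message describes. It assumes the reserved on-disk value is 0xffff (the largest 16-bit value, which cannot be a real length if record lengths are multiples of 4) and ignores the little-endian conversion the kernel helpers would also perform; all names are local to the sketch:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_REC_LEN 0xffffu      /* assumed sentinel stored on disk */
#define SKETCH_64K         (1u << 16)   /* the length that does not fit in 16 bits */

/* Convert a 16-bit on-disk record length to the in-memory length. */
static unsigned rec_len_from_disk(uint16_t dlen)
{
	if (dlen == SKETCH_MAX_REC_LEN)
		return SKETCH_64K;
	return dlen;
}

/* Convert an in-memory record length (at most 64KB) to its on-disk form. */
static uint16_t rec_len_to_disk(unsigned len)
{
	assert(len <= SKETCH_64K);  /* longer entries cannot be represented */
	if (len == SKETCH_64K)
		return SKETCH_MAX_REC_LEN;
	return (uint16_t)len;
}
```

Every length below 64KB passes through unchanged, which is why the patch can substitute the helper for le16_to_cpu() at each call site without affecting filesystems with smaller blocks.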
[PATCH 43/49] ext4: Check for return value from sb_set_blocksize
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> sb_set_blocksize validates whether the specified block size can be used by the file system. Make sure we fail mounting the file system if the specified blocksize cannot be used. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/ext4/super.c | 15 +-- 1 files changed, 5 insertions(+), 10 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 91a11ec..a91e17e 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1809,7 +1809,6 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent) unsigned long def_mount_opts; struct inode *root; int blocksize; - int hblock; int db_count; int i; int needs_recovery; @@ -1966,20 +1965,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent) goto failed_mount; } - hblock = bdev_hardsect_size(sb->s_bdev); if (sb->s_blocksize != blocksize) { - /* -* Make sure the blocksize for the filesystem is larger -* than the hardware sectorsize for the machine. -*/ - if (blocksize < hblock) { - printk(KERN_ERR "EXT4-fs: blocksize %d too small for " - "device blocksize %d.\n", blocksize, hblock); + + /* Validate the filesystem blocksize */ + if (!sb_set_blocksize(sb, blocksize)) { + printk(KERN_ERR "EXT4-fs: bad block size %d.\n", + blocksize); goto failed_mount; } brelse (bh); - sb_set_blocksize(sb, blocksize); logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE; offset = do_div(logical_sb_block, blocksize); bh = sb_bread(sb, logical_sb_block); -- 1.5.4.rc3.31.g1271-dirty
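The change above relies on sb_set_blocksize() returning 0 when the requested size is unusable, which covers more cases than the old "smaller than the hardware sector" test. A rough userspace sketch of the kind of checks involved, assuming a hypothetical set_blocksize_sketch() that takes the device sector size and page size as plain parameters (the real function reads them from the block device and the kernel configuration):

```c
#include <assert.h>

/*
 * Return size if it is usable as a filesystem block size, 0 otherwise.
 * A size is accepted here when it is a power of two, at least 512 bytes,
 * no larger than a page, and no smaller than the device sector size.
 */
static int set_blocksize_sketch(int size, int hard_sector_size, int page_size)
{
	assert(hard_sector_size > 0 && page_size > 0);
	if (size < 512 || size > page_size)
		return 0;
	if (size & (size - 1))          /* not a power of two */
		return 0;
	if (size < hard_sector_size)    /* smaller than one device sector */
		return 0;
	return size;
}
```

A caller following the pattern in the patch treats a zero return as a mount failure instead of carrying on with a block size the device cannot serve.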
[PATCH 05/49] ext4: add ext4_group_t, and change all group variables to this type.
From: Avantika Mathur <[EMAIL PROTECTED]> In many places variables for block group are of type int, which limits the maximum number of block groups to 2^31. Each block group can have up to 2^15 blocks, with a 4K block size, and the max filesystem size is limited to 2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB This patch introduces a new type ext4_group_t, of type unsigned long, to represent block group numbers in ext4. All occurrences of block group variables are converted to type ext4_group_t. Signed-off-by: Avantika Mathur <[EMAIL PROTECTED]> --- fs/ext4/balloc.c | 69 +--- fs/ext4/group.h|8 +++-- fs/ext4/ialloc.c | 46 +++-- fs/ext4/inode.c|5 ++- fs/ext4/resize.c | 12 fs/ext4/super.c| 20 ++--- include/linux/ext4_fs.h| 11 --- include/linux/ext4_fs_i.h |5 ++- include/linux/ext4_fs_sb.h |2 +- 9 files changed, 91 insertions(+), 87 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 71ee95e..9568a57 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -29,7 +29,7 @@ * Calculate the block group number and offset, given a block number */ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr, - unsigned long *blockgrpp, ext4_grpblk_t *offsetp) + ext4_group_t *blockgrpp, ext4_grpblk_t *offsetp) { struct ext4_super_block *es = EXT4_SB(sb)->s_es; ext4_grpblk_t offset; @@ -46,7 +46,7 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr, /* Initializes an uninitialized block bitmap if given, and returns the * number of blocks free in the group. */ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh, - int block_group, struct ext4_group_desc *gdp) +ext4_group_t block_group, struct ext4_group_desc *gdp) { unsigned long start; int bit, bit_max; @@ -60,7 +60,7 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh, * essentially implementing a per-group read-only flag. 
*/ if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) { ext4_error(sb, __FUNCTION__, - "Checksum bad for group %u\n", block_group); + "Checksum bad for group %lu\n", block_group); gdp->bg_free_blocks_count = 0; gdp->bg_free_inodes_count = 0; gdp->bg_itable_unused = 0; @@ -153,7 +153,7 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh, * group descriptor */ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, -unsigned int block_group, +ext4_group_t block_group, struct buffer_head ** bh) { unsigned long group_desc; @@ -164,7 +164,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, if (block_group >= sbi->s_groups_count) { ext4_error (sb, "ext4_get_group_desc", "block_group >= groups_count - " - "block_group = %d, groups_count = %lu", + "block_group = %lu, groups_count = %lu", block_group, sbi->s_groups_count); return NULL; @@ -176,7 +176,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, if (!sbi->s_group_desc[group_desc]) { ext4_error (sb, "ext4_get_group_desc", "Group descriptor not loaded - " - "block_group = %d, group_desc = %lu, desc = %lu", + "block_group = %lu, group_desc = %lu, desc = %lu", block_group, group_desc, offset); return NULL; } @@ -200,7 +200,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, * Return buffer_head on success or NULL in case of failure. */ struct buffer_head * -read_block_bitmap(struct super_block *sb, unsigned int block_group) +read_block_bitmap(struct super_block *sb, ext4_group_t block_group) { struct ext4_group_desc * desc; struct buffer_head * bh = NULL; @@ -227,7 +227,7 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group) if (!bh) ext4_error (sb, __FUNCTION__, "Cannot read block bitmap - " - "block_group = %d, block_bitmap = %llu", + "block_group = %lu, block_bitmap = %llu", block_group, bitmap_blk); return bh; } @@ -320,7 +3
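The size limit quoted in the commit message is easiest to check by adding exponents: 31 bits of group number, 15 bits of blocks per group and 12 bits of block size give 2^58 bytes. A small worked sketch of that arithmetic; the helper names are hypothetical, and the result is expressed in binary units (2^50 bytes each), which is what the message's "256 PB" corresponds to:

```c
#include <assert.h>
#include <stdint.h>

/* log2 of the maximum filesystem size in bytes: the exponents simply add. */
static unsigned max_fs_size_log2(unsigned group_bits,
				 unsigned blocks_per_group_bits,
				 unsigned block_size_bits)
{
	return group_bits + blocks_per_group_bits + block_size_bits;
}

/* Convert a log2 byte count into units of 2^50 bytes. */
static uint64_t size_in_pib(unsigned size_log2)
{
	assert(size_log2 >= 50 && size_log2 - 50 < 64);
	return (uint64_t)1 << (size_log2 - 50);
}
```

With a signed int the group number has only 31 usable bits, which is where the first term comes from; widening the type to unsigned long removes that term as the bottleneck on 64-bit builds.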
[PATCH 22/49] ext4: Change the default behaviour on error
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> By default, the ext4 file system ignored errors and continued. This is not a good default, as continuing after an error could lead to file system corruption. Change the default to mark the file system read-only. Debian and Ubuntu already do this as the default in their fstab. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Acked-by: Eric Sandeen <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/ext4/super.c | 16 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 32e3ecb..effd375 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -688,16 +688,16 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs) le16_to_cpu(es->s_def_resgid) != EXT4_DEF_RESGID) { seq_printf(seq, ",resgid=%u", sbi->s_resgid); } - if (test_opt(sb, ERRORS_CONT)) { + if (test_opt(sb, ERRORS_RO)) { int def_errors = le16_to_cpu(es->s_errors); if (def_errors == EXT4_ERRORS_PANIC || - def_errors == EXT4_ERRORS_RO) { - seq_puts(seq, ",errors=continue"); + def_errors == EXT4_ERRORS_CONTINUE) { + seq_puts(seq, ",errors=remount-ro"); } } - if (test_opt(sb, ERRORS_RO)) - seq_puts(seq, ",errors=remount-ro"); + if (test_opt(sb, ERRORS_CONT)) + seq_puts(seq, ",errors=continue"); if (test_opt(sb, ERRORS_PANIC)) seq_puts(seq, ",errors=panic"); if (test_opt(sb, NO_UID32)) @@ -1819,10 +1819,10 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent) if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_PANIC) set_opt(sbi->s_mount_opt, ERRORS_PANIC); - else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_RO) - set_opt(sbi->s_mount_opt, ERRORS_RO); - else + else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_CONTINUE) set_opt(sbi->s_mount_opt, ERRORS_CONT); + else + set_opt(sbi->s_mount_opt, ERRORS_RO); sbi->s_resuid = le16_to_cpu(es->s_def_resuid); sbi->s_resgid = le16_to_cpu(es->s_def_resgid); -- 1.5.4.rc3.31.g1271-dirty
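The ext4_fill_super() hunk above changes which branch is the fallback: an unset or unrecognised s_errors value now selects remount-ro instead of continue. A minimal sketch of that selection, using local stand-ins for the EXT4_ERRORS_* constants and a hypothetical enum for the resulting policy:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the on-disk s_errors values used in the patch. */
#define SKETCH_ERRORS_CONTINUE 1
#define SKETCH_ERRORS_RO       2
#define SKETCH_ERRORS_PANIC    3

enum errors_policy { POLICY_CONTINUE, POLICY_REMOUNT_RO, POLICY_PANIC };

/* Pick the error policy from the superblock field; anything that is not
 * explicitly "panic" or "continue" falls through to remount-ro. */
static enum errors_policy default_policy(uint16_t s_errors)
{
	if (s_errors == SKETCH_ERRORS_PANIC)
		return POLICY_PANIC;
	else if (s_errors == SKETCH_ERRORS_CONTINUE)
		return POLICY_CONTINUE;
	else
		return POLICY_REMOUNT_RO;
}

/* Mount-option string that corresponds to each policy. */
static const char *policy_option(enum errors_policy p)
{
	switch (p) {
	case POLICY_CONTINUE:   return "errors=continue";
	case POLICY_REMOUNT_RO: return "errors=remount-ro";
	case POLICY_PANIC:      return "errors=panic";
	}
	assert(0 && "unknown policy");
	return "";
}
```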
[PATCH 36/49] ext4: Add EXT4_IOC_MIGRATE ioctl
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> The patch below adds an ioctl for migrating an ext3 indirect-block-mapped inode to an ext4 extent-mapped inode. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/Makefile|2 +- fs/ext4/ioctl.c |3 + fs/ext4/migrate.c | 634 +++ include/linux/ext4_fs.h |4 + 4 files changed, 642 insertions(+), 1 deletions(-) create mode 100644 fs/ext4/migrate.c diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile index ae6e7e5..d5fd80b 100644 --- a/fs/ext4/Makefile +++ b/fs/ext4/Makefile @@ -6,7 +6,7 @@ obj-$(CONFIG_EXT4DEV_FS) += ext4dev.o ext4dev-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \ ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \ - ext4_jbd2.o + ext4_jbd2.o migrate.o ext4dev-$(CONFIG_EXT4DEV_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o ext4dev-$(CONFIG_EXT4DEV_FS_POSIX_ACL) += acl.o diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index c0e5b8c..2ed7c37 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -254,6 +254,9 @@ flags_err: return err; } + case EXT4_IOC_MIGRATE: + return ext4_ext_migrate(inode, filp, cmd, arg); + default: return -ENOTTY; } diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c new file mode 100644 index 000..7203d3d --- /dev/null +++ b/fs/ext4/migrate.c @@ -0,0 +1,634 @@ +/* + * Copyright IBM Corporation, 2007 + * Author Aneesh Kumar K.V <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+ * + */ + +#include +#include +#include + +struct list_blocks_struct { + ext4_lblk_t first_block, last_block; + ext4_fsblk_t first_pblock, last_pblock; +}; + +/* will go away */ +static void ext4_ext_store_pblock(struct ext4_extent *ex, ext4_fsblk_t pb) +{ + ex->ee_start_lo = cpu_to_le32((unsigned long) (pb & 0x)); + ex->ee_start_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) + & 0x); +} + +static int finish_range(handle_t *handle, struct inode *inode, + struct list_blocks_struct *lb) + +{ + int retval = 0, needed; + struct ext4_extent newext; + struct ext4_ext_path *path; + if (lb->first_pblock == 0) + return 0; + + /* Add the extent to temp inode*/ + newext.ee_block = cpu_to_le32(lb->first_block); + newext.ee_len = cpu_to_le16(lb->last_block - lb->first_block + 1); + ext4_ext_store_pblock(&newext, lb->first_pblock); + path = ext4_ext_find_extent(inode, lb->first_block, NULL); + + if (IS_ERR(path)) { + retval = PTR_ERR(path); + goto err_out; + } + + /* +* Calculate the credit needed to inserting this extent +* Since we are doing this in loop we may accumalate extra +* credit. But below we try to not accumalate too much +* of them by restarting the journal. 
+*/ + needed = ext4_ext_calc_credits_for_insert(inode, path); + + /* +* Make sure the credit we accumalated is not really high +*/ + + if (needed && handle->h_buffer_credits >= EXT4_RESERVE_TRANS_BLOCKS) { + + retval = ext4_journal_restart(handle, needed); + if (retval) + goto err_out; + + } + + if (needed) { + retval = ext4_journal_extend(handle, needed); + if (retval != 0) { + /* +* IF not able to extend the journal restart the journal +*/ + retval = ext4_journal_restart(handle, needed); + if (retval) + goto err_out; + } + } + + retval = ext4_ext_insert_extent(handle, inode, path, &newext); + +err_out: + lb->first_pblock = 0; + return retval; +} +static int update_extent_range(handle_t *handle, struct inode *inode, + ext4_fsblk_t pblock, ext4_lblk_t blk_num, + struct list_blocks_struct *lb) +{ + int retval; + + /* +* See if we can add on to the existing range (if it exists) +*/ + if (lb->first_pblock && + (lb->last_pblock+1 == pblock) && + (lb->last_block+1 == blk_num)) { + lb->last_pblock = pblock; + lb->last_block = blk_num; + retur
[PATCH 13/49] ext4: different maxbytes functions for bitmap & extent files
From: Eric Sandeen <[EMAIL PROTECTED]> use 2 different maxbytes functions for bitmapped & extent-based files. Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]> --- fs/ext4/super.c | 45 ++--- 1 files changed, 42 insertions(+), 3 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 64067de..c79e46b 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1600,19 +1600,58 @@ static void ext4_orphan_cleanup (struct super_block * sb, #endif sb->s_flags = s_flags; /* Restore MS_RDONLY status */ } +/* + * Maximal extent format file size. + * Resulting logical blkno at s_maxbytes must fit in our on-disk + * extent format containers, within a sector_t, and within i_blocks + * in the vfs. ext4 inode has 48 bits of i_block in fsblock units, + * so that won't be a limiting factor. + * + * Note, this does *not* consider any metadata overhead for vfs i_blocks. + */ +static loff_t ext4_max_size(int blkbits) +{ + loff_t res; + loff_t upper_limit = MAX_LFS_FILESIZE; + + /* small i_blocks in vfs inode? */ + if (sizeof(blkcnt_t) < sizeof(u64)) { + /* +* CONFIG_LSF is not enabled implies the inode +* i_block represent total blocks in 512 bytes +* 32 == size of vfs inode i_blocks * 8 +*/ + upper_limit = (1LL << 32) - 1; + + /* total blocks in file system block size */ + upper_limit >>= (blkbits - 9); + upper_limit <<= blkbits; + } + + /* 32-bit extent-start container, ee_block */ + res = 1LL << 32; + res <<= blkbits; + res -= 1; + + /* Sanity check against vm- & vfs- imposed limits */ + if (res > upper_limit) + res = upper_limit; + + return res; +} /* - * Maximal file size. There is a direct, and {,double-,triple-}indirect + * Maximal bitmap file size. There is a direct, and {,double-,triple-}indirect * block limit, and also a limit of (2^48 - 1) 512-byte sectors in i_blocks. * We need to be 1 filesystem block less than the 2^48 sector limit. 
*/ -static loff_t ext4_max_size(int bits) +static loff_t ext4_max_bitmap_size(int bits) { loff_t res = EXT4_NDIR_BLOCKS; int meta_blocks; loff_t upper_limit; /* This is calculated to be the largest file size for a -* dense, file such that the total number of +* dense, bitmapped file such that the total number of * sectors in the file, including data and all indirect blocks, * does not exceed 2^48 -1 * __u32 i_blocks_lo and _u16 i_blocks_high representing the -- 1.5.4.rc3.31.g1271-dirty
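For extent-mapped files the limit comes from the 32-bit logical block number in each extent: 2^32 blocks of 2^blkbits bytes, minus one byte, clamped to whatever the VFS allows. A userspace sketch of that calculation, assuming a hypothetical extent_max_size() that takes the VFS limit as a parameter and leaves out the small-blkcnt_t branch shown in the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte offset addressable by a 32-bit logical block number,
 * clamped to upper_limit (standing in for MAX_LFS_FILESIZE).
 */
static int64_t extent_max_size(int blkbits, int64_t upper_limit)
{
	int64_t res;

	assert(blkbits >= 10 && blkbits <= 16);  /* 1KB .. 64KB blocks */

	res = (int64_t)1 << 32;   /* 32-bit extent-start container */
	res <<= blkbits;          /* blocks -> bytes */
	res -= 1;

	if (res > upper_limit)
		res = upper_limit;
	return res;
}
```

For 4KB blocks (blkbits = 12) this gives 2^44 - 1 bytes, one byte short of 16 TiB, provided the VFS limit is not smaller.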
[PATCH 30/49] ext4: Convert truncate_mutex to read write semaphore.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> We are currently taking the truncate_mutex for every read. This would have performance impact on large CPU configuration. Convert the lock to read write semaphore and take read lock when we are trying to read the file. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/balloc.c |2 +- fs/ext4/extents.c | 13 +++-- fs/ext4/file.c|4 ++-- fs/ext4/inode.c | 39 --- fs/ext4/ioctl.c |4 ++-- fs/ext4/super.c |2 +- include/linux/ext4_fs.h | 25 - include/linux/ext4_fs_i.h |6 +++--- 8 files changed, 52 insertions(+), 43 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index a9140ea..925e063 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -526,7 +526,7 @@ static inline int rsv_is_empty(struct ext4_reserve_window *rsv) * when setting the reservation window size through ioctl before the file * is open for write (needs block allocation). * - * Needs truncate_mutex protection prior to call this function. + * Needs down_write(i_data_sem) protection prior to call this function. */ void ext4_init_block_alloc_info(struct inode *inode) { diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index ec5019f..03d1bbb 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1565,7 +1565,7 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode, * This routine returns max. credits that the extent tree can consume. * It should be OK for low-performance paths like ->writepage() * To allow many writing processes to fit into a single transaction, - * the caller should calculate credits under truncate_mutex and + * the caller should calculate credits under i_data_sem and * pass the actual path. */ int ext4_ext_calc_credits_for_insert(struct inode *inode, @@ -2131,7 +2131,8 @@ out: /* * Need to be called with - * mutex_lock(&EXT4_I(inode)->truncate_mutex); + * down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block + * (ie, create is zero). 
Otherwise down_write(&EXT4_I(inode)->i_data_sem) */ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, ext4_lblk_t iblock, @@ -2350,7 +2351,7 @@ void ext4_ext_truncate(struct inode * inode, struct page *page) if (page) ext4_block_truncate_page(handle, page, mapping, inode->i_size); - mutex_lock(&EXT4_I(inode)->truncate_mutex); + down_write(&EXT4_I(inode)->i_data_sem); ext4_ext_invalidate_cache(inode); /* @@ -2386,7 +2387,7 @@ out_stop: if (inode->i_nlink) ext4_orphan_del(handle, inode); - mutex_unlock(&EXT4_I(inode)->truncate_mutex); + up_write(&EXT4_I(inode)->i_data_sem); ext4_journal_stop(handle); } @@ -2450,7 +2451,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len) * modify 1 super block, 1 block bitmap and 1 group descriptor. */ credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3; - mutex_lock(&EXT4_I(inode)->truncate_mutex) + down_write((&EXT4_I(inode)->i_data_sem)); retry: while (ret >= 0 && ret < max_blocks) { block = block + ret; @@ -2507,7 +2508,7 @@ retry: if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) goto retry; - mutex_unlock(&EXT4_I(inode)->truncate_mutex) + up_write((&EXT4_I(inode)->i_data_sem)); /* * Time to update the file size. * Update only when preallocation was requested beyond the file size. 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index a6b2aa1..ac35ec5 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -37,9 +37,9 @@ static int ext4_release_file (struct inode * inode, struct file * filp) if ((filp->f_mode & FMODE_WRITE) && (atomic_read(&inode->i_writecount) == 1)) { - mutex_lock(&EXT4_I(inode)->truncate_mutex); + down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_reservation(inode); - mutex_unlock(&EXT4_I(inode)->truncate_mutex); + up_write(&EXT4_I(inode)->i_data_sem); } if (is_dx(inode) && filp->private_data) ext4_htree_free_dir_info(filp->private_data); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 71c7ad0..596b3ab 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -308,7 +308,7 @@ static int ext4_block_to_path(struct inode *inode, final = ptrs; } else { ext4_warning(inode->i_sb, "ext4_block_to_path", - "block %u > max", + "block %lu > max", i_block + direct_blocks + indirect_blocks + double_blocks); } @@ -345,7 +345,7 @@ static int ext4
[PATCH 10/49] ext4: Rename i_dir_acl to i_size_high
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> Rename ext4_inode.i_dir_acl to i_size_high drop ext4_inode_info.i_dir_acl as it is not used Rename ext4_inode.i_size to ext4_inode.i_size_lo Add helper function for accessing the ext4_inode combined i_size. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/ialloc.c |1 - fs/ext4/inode.c | 55 ++--- include/linux/ext4_fs.h | 15 +-- include/linux/ext4_fs_i.h |1 - 4 files changed, 34 insertions(+), 38 deletions(-) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 00b152b..17b5df1 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -709,7 +709,6 @@ got: if (!S_ISDIR(mode)) ei->i_flags &= ~EXT4_DIRSYNC_FL; ei->i_file_acl = 0; - ei->i_dir_acl = 0; ei->i_dtime = 0; ei->i_block_alloc_info = NULL; ei->i_block_group = group; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 7bcec18..e663455 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2694,7 +2694,6 @@ void ext4_read_inode(struct inode * inode) inode->i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16; } inode->i_nlink = le16_to_cpu(raw_inode->i_links_count); - inode->i_size = le32_to_cpu(raw_inode->i_size); ei->i_state = 0; ei->i_dir_start_lookup = 0; @@ -2720,15 +2719,11 @@ void ext4_read_inode(struct inode * inode) ei->i_flags = le32_to_cpu(raw_inode->i_flags); ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo); if (EXT4_SB(inode->i_sb)->s_es->s_creator_os != - cpu_to_le32(EXT4_OS_HURD)) + cpu_to_le32(EXT4_OS_HURD)) { ei->i_file_acl |= ((__u64)le16_to_cpu(raw_inode->i_file_acl_high)) << 32; - if (!S_ISREG(inode->i_mode)) { - ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl); - } else { - inode->i_size |= - ((__u64)le32_to_cpu(raw_inode->i_size_high)) << 32; } + inode->i_size = ext4_isize(raw_inode); ei->i_disksize = inode->i_size; inode->i_generation = le32_to_cpu(raw_inode->i_generation); ei->i_block_group = iloc.block_group; @@ -2852,7 +2847,6 @@ static int ext4_do_update_inode(handle_t *handle, raw_inode->i_gid_high = 0; } 
raw_inode->i_links_count = cpu_to_le16(inode->i_nlink); - raw_inode->i_size = cpu_to_le32(ei->i_disksize); EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode); EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode); @@ -2867,32 +2861,27 @@ static int ext4_do_update_inode(handle_t *handle, raw_inode->i_file_acl_high = cpu_to_le16(ei->i_file_acl >> 32); raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl); - if (!S_ISREG(inode->i_mode)) { - raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl); - } else { - raw_inode->i_size_high = - cpu_to_le32(ei->i_disksize >> 32); - if (ei->i_disksize > 0x7fffffffULL) { - struct super_block *sb = inode->i_sb; - if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, - EXT4_FEATURE_RO_COMPAT_LARGE_FILE) || - EXT4_SB(sb)->s_es->s_rev_level == - cpu_to_le32(EXT4_GOOD_OLD_REV)) { - /* If this is the first large file - * created, add a flag to the superblock. - */ - err = ext4_journal_get_write_access(handle, - EXT4_SB(sb)->s_sbh); - if (err) - goto out_brelse; - ext4_update_dynamic_rev(sb); - EXT4_SET_RO_COMPAT_FEATURE(sb, + ext4_isize_set(raw_inode, ei->i_disksize); + if (ei->i_disksize > 0x7fffffffULL) { + struct super_block *sb = inode->i_sb; + if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, + EXT4_FEATURE_RO_COMPAT_LARGE_FILE) || + EXT4_SB(sb)->s_es->s_rev_level == + cpu_to_le32(EXT4_GOOD_OLD_REV)) { + /* If this is the first large file + * created, add a flag to the superblock. + */ + err = ext4_journal_get_write_access(handle, + EXT4_SB(sb)->s_sbh); + if (err) + goto out_brelse; + ext4_update_dynamic_rev(sb); + EXT4_SET_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_LARGE_FILE); -
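For readers following the rename, the combined-size helper described in the commit message can be pictured with a small userspace sketch. This is not the code from the patch: the bodies of ext4_isize() and ext4_isize_set() are not visible in this excerpt, so the struct and the sketch_* names below are made up for illustration, and the fields are host-endian, whereas the on-disk ext4_inode stores little-endian values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two on-disk size fields (host-endian). */
struct raw_inode_sketch {
	uint32_t i_size_lo;   /* low 32 bits of the file size */
	uint32_t i_size_high; /* high 32 bits, the field formerly named i_dir_acl */
};

/* Combine the two halves into one 64-bit size. */
uint64_t sketch_isize(const struct raw_inode_sketch *raw)
{
	assert(raw != NULL);
	return ((uint64_t)raw->i_size_high << 32) | raw->i_size_lo;
}

/* Split a 64-bit size back into the two on-disk halves. */
void sketch_isize_set(struct raw_inode_sketch *raw, uint64_t size)
{
	assert(raw != NULL);
	raw->i_size_lo = (uint32_t)size;
	raw->i_size_high = (uint32_t)(size >> 32);
}
```

Keeping the split and the join in one pair of helpers means callers never touch the two halves separately, which is presumably the point of adding the accessor.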
[PATCH 29/49] ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> When doing a migrate from ext3 to ext4 inode we need to make sure the test for inode type and walking inode data happens inside lock. To make this happen move truncate_mutex early before checking the i_flags. This actually should enable us to remove the verify_chain(). Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/extents.c |9 -- fs/ext4/inode.c | 69 +- include/linux/ext4_fs.h |2 + 3 files changed, 16 insertions(+), 64 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 8593e59..ec5019f 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2129,6 +2129,10 @@ out: return err ? err : allocated; } +/* + * Need to be called with + * mutex_lock(&EXT4_I(inode)->truncate_mutex); + */ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, ext4_lblk_t iblock, unsigned long max_blocks, struct buffer_head *bh_result, @@ -2144,7 +2148,6 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, __clear_bit(BH_New, &bh_result->b_state); ext_debug("blocks %u/%lu requested for inode %u\n", iblock, max_blocks, inode->i_ino); - mutex_lock(&EXT4_I(inode)->truncate_mutex); /* check in cache */ goal = ext4_ext_in_cache(inode, iblock, &newex); @@ -2318,8 +2321,6 @@ out2: ext4_ext_drop_refs(path); kfree(path); } - mutex_unlock(&EXT4_I(inode)->truncate_mutex); - return err ? err : allocated; } @@ -2449,6 +2450,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len) * modify 1 super block, 1 block bitmap and 1 group descriptor. */ credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3; + mutex_lock(&EXT4_I(inode)->truncate_mutex) retry: while (ret >= 0 && ret < max_blocks) { block = block + ret; @@ -2505,6 +2507,7 @@ retry: if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) goto retry; + mutex_unlock(&EXT4_I(inode)->truncate_mutex) /* * Time to update the file size. * Update only when preallocation was requested beyond the file size. 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index eaace13..71c7ad0 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -243,13 +243,6 @@ static inline void add_chain(Indirect *p, struct buffer_head *bh, __le32 *v) p->bh = bh; } -static int verify_chain(Indirect *from, Indirect *to) -{ - while (from <= to && from->key == *from->p) - from++; - return (from > to); -} - /** * ext4_block_to_path - parse the block number into array of offsets * @inode: inode in question (we are only interested in its superblock) @@ -348,10 +341,11 @@ static int ext4_block_to_path(struct inode *inode, * (pointer to last triple returned, *@err == 0) * or when it gets an IO error reading an indirect block * (ditto, *@err == -EIO) - * or when it notices that chain had been changed while it was reading - * (ditto, *@err == -EAGAIN) * or when it reads all @depth-1 indirect blocks successfully and finds * the whole chain, all way to the data (returns %NULL, *err == 0). + * + * Need to be called with + * mutex_lock(&EXT4_I(inode)->truncate_mutex) */ static Indirect *ext4_get_branch(struct inode *inode, int depth, ext4_lblk_t *offsets, @@ -370,9 +364,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth, bh = sb_bread(sb, le32_to_cpu(p->key)); if (!bh) goto failure; - /* Reader: pointers */ - if (!verify_chain(chain, p)) - goto changed; add_chain(++p, bh, (__le32*)bh->b_data + *++offsets); /* Reader: end */ if (!p->key) @@ -380,10 +371,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth, } return NULL; -changed: - brelse(bh); - *err = -EAGAIN; - goto no_block; failure: *err = -EIO; no_block: @@ -787,6 +774,10 @@ err_out: * return > 0, # of blocks mapped or allocated. * return = 0, if plain lookup failed. * return < 0, error case.
+ * + * + * Need to be called with + * mutex_lock(&EXT4_I(inode)->truncate_mutex) */ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode, ext4_lblk_t iblock, unsigned long maxblocks, @@ -825,18 +816,6 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode, while (count < maxblocks && count <= blocks_to_boundary) { ext4_fsblk_t blk; - if (!verify_chain(chai
[PATCH 11/49] ext4: Add support for 48 bit inode i_blocks.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode to represent the higher 16 bits of i_blocks. With this change the maximum file size becomes (2**48 - 1) * 512 bytes. We add a RO_COMPAT feature to the super block to indicate that inodes have i_blocks represented as a split 48 bit value. A super block with this feature set cannot be mounted read write on a kernel with CONFIG_LSF disabled. Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/inode.c | 58 ++- fs/ext4/super.c | 62 ++ include/linux/ext4_fs.h | 10 +-- 3 files changed, 119 insertions(+), 11 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e663455..bb89fe7 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2667,6 +2667,22 @@ void ext4_get_inode_flags(struct ext4_inode_info *ei) if (flags & S_DIRSYNC) ei->i_flags |= EXT4_DIRSYNC_FL; } +static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, + struct ext4_inode_info *ei) +{ + blkcnt_t i_blocks ; + struct super_block *sb = ei->vfs_inode.i_sb; + + if (EXT4_HAS_RO_COMPAT_FEATURE(sb, + EXT4_FEATURE_RO_COMPAT_HUGE_FILE)) { + /* we are using combined 48 bit field */ + i_blocks = ((u64)le16_to_cpu(raw_inode->i_blocks_high)) << 32 | + le32_to_cpu(raw_inode->i_blocks_lo); + return i_blocks; + } else { + return le32_to_cpu(raw_inode->i_blocks_lo); + } +} void ext4_read_inode(struct inode * inode) { @@ -2715,8 +2731,8 @@ void ext4_read_inode(struct inode * inode) * recovery code: that's fine, we're about to complete * the process of deleting those.
*/ } - inode->i_blocks = le32_to_cpu(raw_inode->i_blocks); ei->i_flags = le32_to_cpu(raw_inode->i_flags); + inode->i_blocks = ext4_inode_blocks(raw_inode, ei); ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo); if (EXT4_SB(inode->i_sb)->s_es->s_creator_os != cpu_to_le32(EXT4_OS_HURD)) { @@ -2799,6 +2815,43 @@ bad_inode: return; } +static int ext4_inode_blocks_set(handle_t *handle, + struct ext4_inode *raw_inode, + struct ext4_inode_info *ei) +{ + struct inode *inode = &(ei->vfs_inode); + u64 i_blocks = inode->i_blocks; + struct super_block *sb = inode->i_sb; + int err = 0; + + if (i_blocks <= ~0U) { + /* + * i_blocks can be represnted in a 32 bit variable + * as multiple of 512 bytes + */ + raw_inode->i_blocks_lo = cpu_to_le32((u32)i_blocks); + raw_inode->i_blocks_high = 0; + } else if (i_blocks <= 0xffffffffffffULL) { + /* + * i_blocks can be represented in a 48 bit variable + * as multiple of 512 bytes + */ + err = ext4_update_rocompat_feature(handle, sb, + EXT4_FEATURE_RO_COMPAT_HUGE_FILE); + if (err) + goto err_out; + /* i_block is stored in the split 48 bit fields */ + raw_inode->i_blocks_lo = cpu_to_le32((u32)i_blocks); + raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32); + } else { + ext4_error(sb, __FUNCTION__, + "Wrong inode i_blocks count %llu\n", + (unsigned long long)inode->i_blocks); + } +err_out: + return err; +} + /* * Post the struct inode info into an on-disk inode location in the * buffer-cache.
This gobbles the caller's reference to the @@ -2853,7 +2906,8 @@ static int ext4_do_update_inode(handle_t *handle, EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode); EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode); - raw_inode->i_blocks = cpu_to_le32(inode->i_blocks); + if (ext4_inode_blocks_set(handle, raw_inode, ei)) + goto out_brelse; raw_inode->i_dtime = cpu_to_le32(ei->i_dtime); raw_inode->i_flags = cpu_to_le32(ei->i_flags); if (EXT4_SB(inode->i_sb)->s_es->s_creator_os != diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 7be27db..2b9dc96 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1603,17 +1603,50 @@ static void ext4_orphan_cleanup (struct super_block * sb, /* * Maximal file size. There is a direct, and {,double-,triple-}indirect - * block limit, and also a limit of (2^32 - 1) 512-byte sectors in i_blocks. - * We need to be 1 filesystem block less than the 2^32 sector limit. + * block limit, and also a limit of (2^48 - 1
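The split 48 bit i_blocks described in this patch can be illustrated outside the kernel. The following is only a sketch under stated assumptions: the struct and the sketch_* names are hypothetical, the fields are host-endian (the patch uses cpu_to_le32/cpu_to_le16 on the real fields), and the feature-flag update done by ext4_inode_blocks_set() is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_48BIT 0xffffffffffffULL /* largest value that fits in 48 bits */

/* Hypothetical host-endian stand-in for i_blocks_lo / i_blocks_high. */
struct raw_blocks_sketch {
	uint32_t i_blocks_lo;   /* low 32 bits of the sector count */
	uint16_t i_blocks_high; /* high 16 bits of the sector count */
};

/* Split a count of 512-byte sectors into the 32 bit and 16 bit halves. */
void sketch_blocks_set(struct raw_blocks_sketch *raw, uint64_t i_blocks)
{
	assert(i_blocks <= SKETCH_MAX_48BIT); /* larger counts do not fit */
	raw->i_blocks_lo = (uint32_t)i_blocks;
	raw->i_blocks_high = (uint16_t)(i_blocks >> 32);
}

/* Recombine the two halves into one 48 bit count. */
uint64_t sketch_blocks_get(const struct raw_blocks_sketch *raw)
{
	return ((uint64_t)raw->i_blocks_high << 32) | raw->i_blocks_lo;
}
```

With 512-byte units, the largest representable count of 2**48 - 1 sectors corresponds to just under 2**57 bytes, i.e. a little less than 128 PiB, which matches the limit quoted in the commit message.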
[PATCH 24/49] ext4: add block bitmap validation
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/balloc.c | 99 -- 1 files changed, 81 insertions(+), 18 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index ff3428e..a9140ea 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -189,13 +189,65 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb, return desc; } +static int ext4_valid_block_bitmap(struct super_block *sb, + struct ext4_group_desc *desc, + unsigned int block_group, + struct buffer_head *bh) +{ + ext4_grpblk_t offset; + ext4_grpblk_t next_zero_bit; + ext4_fsblk_t bitmap_blk; + ext4_fsblk_t group_first_block; + + if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) { + /* with FLEX_BG, the inode/block bitmaps and itable +* blocks may not be in the group at all +* so the bitmap validation will be skipped for those groups +* or it has to also read the block group where the bitmaps +* are located to verify they are set. 
+*/ + return 1; + } + group_first_block = ext4_group_first_block_no(sb, block_group); + + /* check whether block bitmap block number is set */ + bitmap_blk = ext4_block_bitmap(sb, desc); + offset = bitmap_blk - group_first_block; + if (!ext4_test_bit(offset, bh->b_data)) + /* bad block bitmap */ + goto err_out; + + /* check whether the inode bitmap block number is set */ + bitmap_blk = ext4_inode_bitmap(sb, desc); + offset = bitmap_blk - group_first_block; + if (!ext4_test_bit(offset, bh->b_data)) + /* bad block bitmap */ + goto err_out; + + /* check whether the inode table block number is set */ + bitmap_blk = ext4_inode_table(sb, desc); + offset = bitmap_blk - group_first_block; + next_zero_bit = ext4_find_next_zero_bit(bh->b_data, + offset + EXT4_SB(sb)->s_itb_per_group, + offset); + if (next_zero_bit >= offset + EXT4_SB(sb)->s_itb_per_group) + /* good bitmap for inode tables */ + return 1; + +err_out: + ext4_error(sb, __FUNCTION__, + "Invalid block bitmap - " + "block_group = %d, block = %llu", + block_group, bitmap_blk); + return 0; +} /** * read_block_bitmap() * @sb:super block * @block_group: given block group * - * Read the bitmap for a given block_group, reading into the specified - * slot in the superblock's bitmap cache. + * Read the bitmap for a given block_group,and validate the + * bits for block/inode/inode tables are set in the bitmaps * * Return buffer_head on success or NULL in case of failure. 
*/ @@ -210,25 +262,36 @@ read_block_bitmap(struct super_block *sb, ext4_group_t block_group) if (!desc) return NULL; bitmap_blk = ext4_block_bitmap(sb, desc); + bh = sb_getblk(sb, bitmap_blk); + if (unlikely(!bh)) { + ext4_error(sb, __FUNCTION__, + "Cannot read block bitmap - " + "block_group = %d, block_bitmap = %llu", + (int)block_group, (unsigned long long)bitmap_blk); + return NULL; + } + if (bh_uptodate_or_lock(bh)) + return bh; + if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { - bh = sb_getblk(sb, bitmap_blk); - if (!buffer_uptodate(bh)) { - lock_buffer(bh); - if (!buffer_uptodate(bh)) { - ext4_init_block_bitmap(sb, bh, block_group, - desc); - set_buffer_uptodate(bh); - } - unlock_buffer(bh); - } - } else { - bh = sb_bread(sb, bitmap_blk); + ext4_init_block_bitmap(sb, bh, block_group, desc); + set_buffer_uptodate(bh); + unlock_buffer(bh); + return bh; } - if (!bh) - ext4_error (sb, __FUNCTION__, + if (bh_submit_read(bh) < 0) { + brelse(bh); + ext4_error(sb, __FUNCTION__, "Cannot read block bitmap - " -
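The validation idea in this patch (the bits covering the block bitmap, the inode bitmap and the inode table must all be set) can be tried in userspace. This is a rough sketch, not the kernel code: the sketch_* helpers are hypothetical replacements for ext4_test_bit() and ext4_find_next_zero_bit(), the bit numbering (bit 0 is the least significant bit of byte 0) is an assumption, and the FLEX_BG early return is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Return nonzero if bit nr is set (bit 0 is the LSB of byte 0). */
int sketch_test_bit(size_t nr, const unsigned char *bitmap)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/* Return the index of the first clear bit in [start, end), or end if none. */
size_t sketch_find_next_zero_bit(const unsigned char *bitmap,
				 size_t end, size_t start)
{
	size_t i;

	for (i = start; i < end; i++)
		if (!sketch_test_bit(i, bitmap))
			return i;
	return end;
}

/*
 * Return 1 if the bits for the block bitmap, the inode bitmap and all
 * inode table blocks are set, 0 otherwise. Offsets are relative to the
 * first block of the group.
 */
int sketch_valid_block_bitmap(const unsigned char *bitmap,
			      size_t block_bitmap_off,
			      size_t inode_bitmap_off,
			      size_t inode_table_off,
			      size_t itb_per_group)
{
	size_t end = inode_table_off + itb_per_group;

	assert(bitmap != NULL);
	if (!sketch_test_bit(block_bitmap_off, bitmap))
		return 0;
	if (!sketch_test_bit(inode_bitmap_off, bitmap))
		return 0;
	/* The inode table is valid only if no bit in its range is clear. */
	return sketch_find_next_zero_bit(bitmap, end, inode_table_off) >= end;
}
```

As in the patch, the inode table check is a single scan for the first zero bit in the table's range rather than one test per block.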
[PATCH 46/49] jbd2: add lockdep support
From: Mingming Cao <[EMAIL PROTECTED]>

Ported from similar patch for the jbd layer.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/jbd2/transaction.c |   11 +++++++++++
 include/linux/jbd2.h  |    4 ++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index f30802a..70b3199 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -241,6 +241,8 @@ out:
 	return ret;
 }
 
+static struct lock_class_key jbd2_handle_key;
+
 /* Allocate a new handle.  This should probably be in a slab... */
 static handle_t *new_handle(int nblocks)
 {
@@ -251,6 +253,9 @@ static handle_t *new_handle(int nblocks)
 	handle->h_buffer_credits = nblocks;
 	handle->h_ref = 1;
 
+	lockdep_init_map(&handle->h_lockdep_map, "jbd2_handle",
+						&jbd2_handle_key, 0);
+
 	return handle;
 }
 
@@ -293,7 +298,11 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
 		jbd2_free_handle(handle);
 		current->journal_info = NULL;
 		handle = ERR_PTR(err);
+		goto out;
 	}
+
+	lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+out:
 	return handle;
 }
 
@@ -1419,6 +1428,8 @@ int jbd2_journal_stop(handle_t *handle)
 		spin_unlock(&journal->j_state_lock);
 	}
 
+	lock_release(&handle->h_lockdep_map, 1, _THIS_IP_);
+
 	jbd2_free_handle(handle);
 	return err;
 }
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index a2645c2..f982d38 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -418,6 +418,10 @@ struct handle_s
 	unsigned int	h_sync:		1;	/* sync-on-close */
 	unsigned int	h_jdata:	1;	/* force data journaling */
 	unsigned int	h_aborted:	1;	/* fatal error on handle */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lockdep_map	h_lockdep_map;
+#endif
 };
 
 
-- 
1.5.4.rc3.31.g1271-dirty
[PATCH 12/49] ext4: Support large files
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> This patch converts ext4_inode i_blocks to represent total blocks occupied by the inode in file system block size. Earlier the variable used to represent this in 512 byte block size. This actually limited the total size of the file. The feature is enabled transparently when we write an inode whose i_blocks cannot be represented as 512 byte units in a 48 bit variable. inode flag EXT4_HUGE_FILE_FL Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/inode.c | 32 +--- fs/ext4/super.c |9 ++--- include/linux/ext4_fs.h |3 ++- 3 files changed, 33 insertions(+), 11 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bb89fe7..9cf8572 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2671,14 +2671,20 @@ static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode, struct ext4_inode_info *ei) { blkcnt_t i_blocks ; - struct super_block *sb = ei->vfs_inode.i_sb; + struct inode *inode = &(ei->vfs_inode); + struct super_block *sb = inode->i_sb; if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_HUGE_FILE)) { /* we are using combined 48 bit field */ i_blocks = ((u64)le16_to_cpu(raw_inode->i_blocks_high)) << 32 | le32_to_cpu(raw_inode->i_blocks_lo); - return i_blocks; + if (ei->i_flags & EXT4_HUGE_FILE_FL) { + /* i_blocks represent file system block size */ + return i_blocks << (inode->i_blkbits - 9); + } else { + return i_blocks; + } } else { return le32_to_cpu(raw_inode->i_blocks_lo); } @@ -2829,8 +2835,9 @@ static int ext4_inode_blocks_set(handle_t *handle, * i_blocks can be represnted in a 32 bit variable * as multiple of 512 bytes */ - raw_inode->i_blocks_lo = cpu_to_le32((u32)i_blocks); + raw_inode->i_blocks_lo = cpu_to_le32(i_blocks); raw_inode->i_blocks_high = 0; + ei->i_flags &= ~EXT4_HUGE_FILE_FL; } else if (i_blocks <= 0xffffffffffffULL) { /* * i_blocks can be represented in a 48 bit variable @@ -2841,12 +2848,23 @@ static int ext4_inode_blocks_set(handle_t *handle, if (err) goto err_out; /* i_block is
stored in the split 48 bit fields */ - raw_inode->i_blocks_lo = cpu_to_le32((u32)i_blocks); + raw_inode->i_blocks_lo = cpu_to_le32(i_blocks); raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32); + ei->i_flags &= ~EXT4_HUGE_FILE_FL; } else { - ext4_error(sb, __FUNCTION__, - "Wrong inode i_blocks count %llu\n", - (unsigned long long)inode->i_blocks); + /* +* i_blocks should be represented in a 48 bit variable +* as multiple of file system block size +*/ + err = ext4_update_rocompat_feature(handle, sb, + EXT4_FEATURE_RO_COMPAT_HUGE_FILE); + if (err) + goto err_out; + ei->i_flags |= EXT4_HUGE_FILE_FL; + /* i_block is stored in file system block size */ + i_blocks = i_blocks >> (inode->i_blkbits - 9); + raw_inode->i_blocks_lo = cpu_to_le32(i_blocks); + raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32); } err_out: return err; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 2b9dc96..64067de 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1631,11 +1631,14 @@ static loff_t ext4_max_size(int bits) upper_limit >>= (bits - 9); } else { - /* We use 48 bit ext4_inode i_blocks */ + /* +* We use 48 bit ext4_inode i_blocks +* With EXT4_HUGE_FILE_FL set the i_blocks +* represent total number of blocks in +* file system block size +*/ upper_limit = (1LL << 48) - 1; - /* total blocks in file system block size */ - upper_limit >>= (bits - 9); } /* indirect blocks */ diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h index be25eca..6ae91f4 100644 --- a/include/linux/ext4_fs.h +++ b/include/linux/ext4_fs.h @@ -178,8 +178,9 @@ struct ext4_group_desc #define EXT4_NOTAIL_FL 0x8000 /* file tail should not be merged */ #define EXT4_DIRSYNC_FL0x0001 /* dirsync behaviour (directories only) */ #defin
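The unit change in this patch comes down to a shift by (i_blkbits - 9), since 512 is 2**9. A minimal userspace sketch of that conversion follows; the function names are hypothetical and the EXT4_HUGE_FILE_FL bookkeeping from the patch is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a count of 512-byte sectors into file system blocks.
 * blkbits is log2 of the file system block size (12 for 4 KiB blocks).
 * Any partial block is rounded down.
 */
uint64_t sketch_sectors_to_fsblocks(uint64_t sectors, unsigned int blkbits)
{
	assert(blkbits >= 9 && blkbits < 64);
	return sectors >> (blkbits - 9);
}

/* Convert a count of file system blocks back into 512-byte sectors. */
uint64_t sketch_fsblocks_to_sectors(uint64_t fsblocks, unsigned int blkbits)
{
	assert(blkbits >= 9 && blkbits < 64);
	return fsblocks << (blkbits - 9);
}
```

For example, with 4 KiB blocks a count of 2**48 sectors, which no longer fits in 48 bits, becomes 2**45 file system blocks, which does.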
[PATCH 37/49] ext4: Fix ext4_show_options to show the correct mount options.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

We need to look at the default value and make sure the mount options
are not set via default value before showing them via ext4_show_options

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index aa22acd..64fc7f1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -665,18 +665,20 @@ static inline void ext4_show_quota_options(struct seq_file *seq, struct super_bl
  */
 static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 {
+	int def_errors;
+	unsigned long def_mount_opts;
 	struct super_block *sb = vfs->mnt_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_super_block *es = sbi->s_es;
-	unsigned long def_mount_opts;
 
 	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
+	def_errors = le16_to_cpu(es->s_errors);
 
 	if (sbi->s_sb_block != 1)
 		seq_printf(seq, ",sb=%llu", sbi->s_sb_block);
 	if (test_opt(sb, MINIX_DF))
 		seq_puts(seq, ",minixdf");
-	if (test_opt(sb, GRPID))
+	if (test_opt(sb, GRPID) && !(def_mount_opts & EXT4_DEFM_BSDGROUPS))
 		seq_puts(seq, ",grpid");
 	if (!test_opt(sb, GRPID) && (def_mount_opts & EXT4_DEFM_BSDGROUPS))
 		seq_puts(seq, ",nogrpid");
@@ -689,25 +691,24 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_printf(seq, ",resgid=%u", sbi->s_resgid);
 	}
 	if (test_opt(sb, ERRORS_RO)) {
-		int def_errors = le16_to_cpu(es->s_errors);
-
 		if (def_errors == EXT4_ERRORS_PANIC ||
 		    def_errors == EXT4_ERRORS_CONTINUE) {
 			seq_puts(seq, ",errors=remount-ro");
 		}
 	}
-	if (test_opt(sb, ERRORS_CONT))
+	if (test_opt(sb, ERRORS_CONT) && def_errors != EXT4_ERRORS_CONTINUE)
 		seq_puts(seq, ",errors=continue");
-	if (test_opt(sb, ERRORS_PANIC))
+	if (test_opt(sb, ERRORS_PANIC) && def_errors != EXT4_ERRORS_PANIC)
 		seq_puts(seq, ",errors=panic");
-	if (test_opt(sb, NO_UID32))
+	if (test_opt(sb, NO_UID32) && !(def_mount_opts & EXT4_DEFM_UID16))
 		seq_puts(seq, ",nouid32");
-	if (test_opt(sb, DEBUG))
+	if (test_opt(sb, DEBUG) && !(def_mount_opts & EXT4_DEFM_DEBUG))
 		seq_puts(seq, ",debug");
 	if (test_opt(sb, OLDALLOC))
 		seq_puts(seq, ",oldalloc");
 #ifdef CONFIG_EXT4DEV_FS_XATTR
-	if (test_opt(sb, XATTR_USER))
+	if (test_opt(sb, XATTR_USER) &&
+		!(def_mount_opts & EXT4_DEFM_XATTR_USER))
 		seq_puts(seq, ",user_xattr");
 	if (!test_opt(sb, XATTR_USER) &&
 	    (def_mount_opts & EXT4_DEFM_XATTR_USER)) {
@@ -715,7 +716,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	}
 #endif
 #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
-	if (test_opt(sb, POSIX_ACL))
+	if (test_opt(sb, POSIX_ACL) && !(def_mount_opts & EXT4_DEFM_ACL))
 		seq_puts(seq, ",acl");
 	if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT4_DEFM_ACL))
 		seq_puts(seq, ",noacl");
@@ -735,6 +736,10 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	if (test_opt(sb, I_VERSION))
 		seq_puts(seq, ",i_version");
 
+	/*
+	 * journal mode get enabled in different ways
+	 * So just print the value even if we didn't specify it
+	 */
 	if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
 		seq_puts(seq, ",data=journal");
 	else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA)
@@ -743,7 +748,6 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",data=writeback");
 
 	ext4_show_quota_options(seq, sb);
-
 	return 0;
 }
 
-- 
1.5.4.rc3.31.g1271-dirty
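The rule this patch applies (print an option only when the mounted state differs from the superblock default) can be shown with a small userspace sketch. Everything here is illustrative: the flag values, the buffer-based output and the sketch_* names are assumptions, and only two options are modelled instead of the full list handled by ext4_show_options().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_OPT_GRPID 0x1u /* hypothetical flag bits */
#define SKETCH_OPT_DEBUG 0x2u

/* Append opt to buf if it fits, leaving buf unchanged otherwise. */
static void sketch_append(char *buf, size_t len, const char *opt)
{
	if (strlen(buf) + strlen(opt) < len)
		strcat(buf, opt);
}

/*
 * Write into buf only the options whose mounted state differs from the
 * default: an option that is on by default is silent when on and shown
 * as "no<option>" when it was turned off.
 */
void sketch_show_options(char *buf, size_t len,
			 unsigned int mounted, unsigned int defaults)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if ((mounted & SKETCH_OPT_GRPID) && !(defaults & SKETCH_OPT_GRPID))
		sketch_append(buf, len, ",grpid");
	if (!(mounted & SKETCH_OPT_GRPID) && (defaults & SKETCH_OPT_GRPID))
		sketch_append(buf, len, ",nogrpid");
	if ((mounted & SKETCH_OPT_DEBUG) && !(defaults & SKETCH_OPT_DEBUG))
		sketch_append(buf, len, ",debug");
}
```

With this rule an unmodified mount produces an empty option string, so whatever is printed reflects a deliberate choice at mount time rather than a default.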
[PATCH 07/49] ext4: Introduce ext4_update_*_feature
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> Introduce ext4_update_*_feature and use them instead of opencoding. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/ialloc.c| 11 +++- fs/ext4/super.c | 60 +++ include/linux/ext4_fs.h |6 3 files changed, 70 insertions(+), 7 deletions(-) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 7b5cfa6..00b152b 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -748,13 +748,10 @@ got: if (test_opt(sb, EXTENTS)) { EXT4_I(inode)->i_flags |= EXT4_EXTENTS_FL; ext4_ext_tree_init(handle, inode); - if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) { - err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh); - if (err) goto fail; - EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS); - BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "call ext4_journal_dirty_metadata"); - err = ext4_journal_dirty_metadata(handle, EXT4_SB(sb)->s_sbh); - } + err = ext4_update_incompat_feature(handle, sb, + EXT4_FEATURE_INCOMPAT_EXTENTS); + if (err) + goto fail; } ext4_debug("allocating inode %lu\n", inode->i_ino); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index df8842b..4d7f33f 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -373,6 +373,66 @@ void ext4_update_dynamic_rev(struct super_block *sb) */ } +int ext4_update_compat_feature(handle_t *handle, + struct super_block *sb, __u32 compat) +{ + int err = 0; + if (!EXT4_HAS_COMPAT_FEATURE(sb, compat)) { + err = ext4_journal_get_write_access(handle, + EXT4_SB(sb)->s_sbh); + if (err) + return err; + EXT4_SET_COMPAT_FEATURE(sb, compat); + sb->s_dirt = 1; + handle->h_sync = 1; + BUFFER_TRACE(EXT4_SB(sb)->s_sbh, + "call ext4_journal_dirty_met adata"); + err = ext4_journal_dirty_metadata(handle, + EXT4_SB(sb)->s_sbh); + } + return err; +} + +int ext4_update_rocompat_feature(handle_t *handle, + struct super_block *sb, __u32 rocompat) +{ + int err = 0; + if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, rocompat)) { + err = ext4_journal_get_write_access(handle, + 
EXT4_SB(sb)->s_sbh); + if (err) + return err; + EXT4_SET_RO_COMPAT_FEATURE(sb, rocompat); + sb->s_dirt = 1; + handle->h_sync = 1; + BUFFER_TRACE(EXT4_SB(sb)->s_sbh, + "call ext4_journal_dirty_met adata"); + err = ext4_journal_dirty_metadata(handle, + EXT4_SB(sb)->s_sbh); + } + return err; +} + +int ext4_update_incompat_feature(handle_t *handle, + struct super_block *sb, __u32 incompat) +{ + int err = 0; + if (!EXT4_HAS_INCOMPAT_FEATURE(sb, incompat)) { + err = ext4_journal_get_write_access(handle, + EXT4_SB(sb)->s_sbh); + if (err) + return err; + EXT4_SET_INCOMPAT_FEATURE(sb, incompat); + sb->s_dirt = 1; + handle->h_sync = 1; + BUFFER_TRACE(EXT4_SB(sb)->s_sbh, + "call ext4_journal_dirty_met adata"); + err = ext4_journal_dirty_metadata(handle, + EXT4_SB(sb)->s_sbh); + } + return err; +} + /* * Open the external journal device */ diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h index e1103c2..429dbfc 100644 --- a/include/linux/ext4_fs.h +++ b/include/linux/ext4_fs.h @@ -989,6 +989,12 @@ extern void ext4_abort (struct super_block *, const char *, const char *, ...) extern void ext4_warning (struct super_block *, const char *, const char *, ...) __attribute__ ((format (printf, 3, 4))); extern void ext4_update_dynamic_rev (struct super_block *sb); +extern int ext4_update_compat_feature(handle_t *handle, struct super_block *sb, + __u32 compat); +extern int ext4_update_rocompat_feature(handle_t *handle, + struct super_block *sb, __u32 rocompat); +extern int ext4_update_incompat_feature(handle_t *handle, + struct super_block *sb, __u32 incompat); extern ext4_fsblk_t ext4_block_bitmap(struct super_block *sb,
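The three helpers in this patch share one pattern: do nothing if the feature bit is already present, otherwise set it and mark the superblock dirty. A minimal userspace sketch of that pattern is below. The struct and function are hypothetical, and the journalling calls (ext4_journal_get_write_access, ext4_journal_dirty_metadata) are reduced to a single dirty flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the superblock state. */
struct sketch_sb {
	uint32_t features; /* feature bitmask */
	int dirty;         /* set when the superblock needs writing back */
};

/*
 * Set a feature flag only if it is missing, and mark the superblock
 * dirty only when something actually changed. Returns 0 on success,
 * following the error convention of the helpers in the patch.
 */
int sketch_update_feature(struct sketch_sb *sb, uint32_t feature)
{
	assert(sb != NULL);
	if (!(sb->features & feature)) {
		sb->features |= feature;
		sb->dirty = 1;
	}
	return 0;
}
```

Because the check and the update live in one place, callers such as the inode allocation path in the patch can call the helper unconditionally and only test the return value.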
[PATCH 08/49] ext4: Fix sparse warnings.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> Fix sparse warnings related to static functions and local variables. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/extents.c |6 +++--- fs/ext4/inode.c | 18 +++--- fs/ext4/super.c |3 +++ include/linux/ext4_fs.h |2 ++ 4 files changed, 19 insertions(+), 10 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 6853722..754c0d3 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1088,7 +1088,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode, * then we have to correct all indexes above. * TODO: do we need to correct tree in all cases? */ -int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode, +static int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode, struct ext4_ext_path *path) { struct ext4_extent_header *eh; @@ -1535,7 +1535,7 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block, * It's used in truncate case only, thus all requests are for * last index in the block only. 
*/ -int ext4_ext_rm_idx(handle_t *handle, struct inode *inode, +static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode, struct ext4_ext_path *path) { struct buffer_head *bh; @@ -1806,7 +1806,7 @@ ext4_ext_more_to_rm(struct ext4_ext_path *path) return 1; } -int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start) +static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start) { struct super_block *sb = inode->i_sb; int depth = ext_depth(inode); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 1ee19c9..76ceba2 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2052,11 +2052,11 @@ static void ext4_clear_blocks(handle_t *handle, struct inode *inode, for (p = first; p < last; p++) { u32 nr = le32_to_cpu(*p); if (nr) { - struct buffer_head *bh; + struct buffer_head *tbh; *p = 0; - bh = sb_find_get_block(inode->i_sb, nr); - ext4_forget(handle, 0, inode, bh, nr); + tbh = sb_find_get_block(inode->i_sb, nr); + ext4_forget(handle, 0, inode, tbh, nr); } } @@ -2324,8 +2324,10 @@ void ext4_truncate(struct inode *inode) return; } - if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) - return ext4_ext_truncate(inode, page); + if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) { + ext4_ext_truncate(inode, page); + return; + } handle = start_transaction(inode); if (IS_ERR(handle)) { @@ -3163,8 +3165,10 @@ ext4_reserve_inode_write(handle_t *handle, struct inode *inode, * Expand an inode by new_extra_isize bytes. * Returns 0 on success or negative error number on failure. 
*/ -int ext4_expand_extra_isize(struct inode *inode, unsigned int new_extra_isize, - struct ext4_iloc iloc, handle_t *handle) +static int ext4_expand_extra_isize(struct inode *inode, + unsigned int new_extra_isize, + struct ext4_iloc iloc, + handle_t *handle) { struct ext4_inode *raw_inode; struct ext4_xattr_ibody_header *header; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 4d7f33f..7be27db 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1644,6 +1644,9 @@ static ext4_fsblk_t descriptor_loc(struct super_block *sb, static int ext4_fill_super (struct super_block *sb, void *data, int silent) + __releases(kernel_sem) + __acquires(kernel_sem) + { struct buffer_head * bh; struct ext4_super_block *es = NULL; diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h index 429dbfc..1a27433 100644 --- a/include/linux/ext4_fs.h +++ b/include/linux/ext4_fs.h @@ -893,6 +893,8 @@ extern ext4_fsblk_t ext4_new_block (handle_t *handle, struct inode *inode, ext4_fsblk_t goal, int *errp); extern ext4_fsblk_t ext4_new_blocks (handle_t *handle, struct inode *inode, ext4_fsblk_t goal, unsigned long *count, int *errp); +extern ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode, + ext4_fsblk_t goal, unsigned long *count, int *errp); extern void ext4_free_blocks (handle_t *handle, struct inode *inode, ext4_fsblk_t block, unsigned long count); extern void ext4_free_blocks_sb (handle_t *handle, struct super_block *sb, -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.ke
[PATCH 21/49] ext4: fix oops on corrupted ext4 mount
From: Eric Sandeen <[EMAIL PROTECTED]>

When mounting an ext4 filesystem with corrupted s_first_data_block, things
can go very wrong and oops.

Because blocks_count in ext4_fill_super is a u64, and we must use do_div,
the calculation of db_count is done differently than on ext3. If
first_data_block is corrupted such that it is larger than ext4_blocks_count,
for example, then the intermediate blocks_count value may go negative, but
sign-extend to a very large value:

	blocks_count = (ext4_blocks_count(es) -
			le32_to_cpu(es->s_first_data_block) +
			EXT4_BLOCKS_PER_GROUP(sb) - 1);

This is then assigned to s_groups_count which is an unsigned long:

	sbi->s_groups_count = blocks_count;

This may result in a value of 0x which is then used to compute db_count:

	db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
		   EXT4_DESC_PER_BLOCK(sb);

and in this case db_count will wind up as 0 because the addition overflows
32 bits. This in turn causes the kmalloc for group_desc to be of 0 size:

	sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *),
				    GFP_KERNEL);

and eventually in ext4_check_descriptors, dereferencing
sbi->s_group_desc[desc_block] will result in a NULL pointer dereference.

The simplest test seems to be to sanity check s_first_data_block,
EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure their
combination won't result in a bad intermediate value for blocks_count. We
could just check for db_count == 0, but catching it at the root cause seems
like it provides more info.
Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]> Reviewed-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/ext4/super.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 1484a08..32e3ecb 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1997,6 +1997,17 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent) if (EXT4_BLOCKS_PER_GROUP(sb) == 0) goto cantfind_ext4; + + /* ensure blocks_count calculation below doesn't sign-extend */ + if (ext4_blocks_count(es) + EXT4_BLOCKS_PER_GROUP(sb) < + le32_to_cpu(es->s_first_data_block) + 1) { + printk(KERN_WARNING "EXT4-fs: bad geometry: block count %llu, " + "first data block %u, blocks per group %lu\n", + ext4_blocks_count(es), + le32_to_cpu(es->s_first_data_block), + EXT4_BLOCKS_PER_GROUP(sb)); + goto failed_mount; + } blocks_count = (ext4_blocks_count(es) - le32_to_cpu(es->s_first_data_block) + EXT4_BLOCKS_PER_GROUP(sb) - 1); -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
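For anyone who wants to poke at the arithmetic outside the kernel, here is a
minimal userspace sketch of the check in this patch. The function and
parameter names are hypothetical stand-ins for ext4_blocks_count(es),
es->s_first_data_block and EXT4_BLOCKS_PER_GROUP(sb); it only models the
comparison and the wraparound, not the mount path.

```c
#include <stdint.h>

/*
 * Userspace model of the geometry check. All names here are hypothetical;
 * they stand in for the superblock fields used in ext4_fill_super().
 */
static int geometry_is_sane(uint64_t blocks, uint32_t first_data_block,
			    uint64_t blocks_per_group)
{
	/* Same comparison as the patch: reject if the subtraction would wrap. */
	return !(blocks + blocks_per_group < (uint64_t)first_data_block + 1);
}

/* The intermediate value the kernel computes right after the check. */
static uint64_t intermediate_blocks_count(uint64_t blocks,
					  uint32_t first_data_block,
					  uint64_t blocks_per_group)
{
	/* Unsigned arithmetic: a "negative" result wraps to a huge value. */
	return blocks - first_data_block + blocks_per_group - 1;
}
```

With a first data block far beyond the block count, the second function
wraps to a value above 32 bits, which is exactly the case the first function
is meant to reject before it gets that far.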
[PATCH 26/49] jbd2: Fix assertion failure in fs/jbd2/checkpoint.c
From: Jan Kara <[EMAIL PROTECTED]> Before we start committing a transaction, we call __journal_clean_checkpoint_list() to cleanup transaction's written-back buffers. If this call happens to remove all of them (and there were already some buffers), __journal_remove_checkpoint() will decide to free the transaction because it isn't (yet) a committing transaction and soon we fail some assertion - the transaction really isn't ready to be freed :). We change the check in __journal_remove_checkpoint() to free only a transaction in T_FINISHED state. The locking there is subtle though (as everywhere in JBD ;(). We use j_list_lock to protect the check and a subsequent call to __journal_drop_transaction() and do the same in the end of journal_commit_transaction() which is the only place where a transaction can get to T_FINISHED state. Probably I'm too paranoid here and such locking is not really necessary - checkpoint lists are processed only from log_do_checkpoint() where a transaction must be already committed to be processed or from __journal_clean_checkpoint_list() where kjournald itself calls it and thus transaction cannot change state either. Better be safe if something changes in future... Signed-off-by: Jan Kara <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- fs/jbd2/checkpoint.c | 12 ++-- fs/jbd2/commit.c |8 include/linux/jbd2.h |2 ++ 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 3fccde7..7e958c8 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -602,15 +602,15 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh) /* * There is one special case to worry about: if we have just pulled the -* buffer off a committing transaction's forget list, then even if the -* checkpoint list is empty, the transaction obviously cannot be -* dropped! 
+* buffer off a running or committing transaction's checkpoing list, +* then even if the checkpoint list is empty, the transaction obviously +* cannot be dropped! * -* The locking here around j_committing_transaction is a bit sleazy. +* The locking here around t_state is a bit sleazy. * See the comment at the end of jbd2_journal_commit_transaction(). */ - if (transaction == journal->j_committing_transaction) { - JBUFFER_TRACE(jh, "belongs to committing transaction"); + if (transaction->t_state != T_FINISHED) { + JBUFFER_TRACE(jh, "belongs to running/committing transaction"); goto out; } diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c index 6986f33..39b5cee 100644 --- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -867,10 +867,10 @@ restart_loop: } spin_unlock(&journal->j_list_lock); /* -* This is a bit sleazy. We borrow j_list_lock to protect -* journal->j_committing_transaction in __jbd2_journal_remove_checkpoint. -* Really, __jbd2_journal_remove_checkpoint should be using j_state_lock but -* it's a bit hassle to hold that across __jbd2_journal_remove_checkpoint +* This is a bit sleazy. We use j_list_lock to protect transition +* of a transaction into T_FINISHED state and calling +* __jbd2_journal_drop_transaction(). Otherwise we could race with +* other checkpointing code processing the transaction... 
*/ spin_lock(&journal->j_state_lock); spin_lock(&journal->j_list_lock); diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index d5f7cff..d861ffd 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -442,6 +442,8 @@ struct transaction_s /* * Transaction's current state * [no locking - only kjournald2 alters this] +* [j_list_lock] guards transition of a transaction into T_FINISHED +* state and subsequent call of __jbd2_journal_drop_transaction() * FIXME: needs barriers * KLUDGE: [use j_state_lock] */ -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 23/49] Add buffer head related helper functions
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> Add buffer head related helper function bh_uptodate_or_lock and bh_submit_read which can be used by file system Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/buffer.c | 41 + include/linux/buffer_head.h |2 ++ 2 files changed, 43 insertions(+), 0 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 7249e01..7593ff3 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3213,6 +3213,47 @@ static int buffer_cpu_notify(struct notifier_block *self, return NOTIFY_OK; } +/** + * bh_uptodate_or_lock: Test whether the buffer is uptodate + * @bh: struct buffer_head + * + * Return true if the buffer is up-to-date and false, + * with the buffer locked, if not. + */ +int bh_uptodate_or_lock(struct buffer_head *bh) +{ + if (!buffer_uptodate(bh)) { + lock_buffer(bh); + if (!buffer_uptodate(bh)) + return 0; + unlock_buffer(bh); + } + return 1; +} +EXPORT_SYMBOL(bh_uptodate_or_lock); +/** + * bh_submit_read: Submit a locked buffer for reading + * @bh: struct buffer_head + * + * Returns a negative error + */ +int bh_submit_read(struct buffer_head *bh) +{ + if (!buffer_locked(bh)) + lock_buffer(bh); + + if (buffer_uptodate(bh)) + return 0; + + get_bh(bh); + bh->b_end_io = end_buffer_read_sync; + submit_bh(READ, bh); + wait_on_buffer(bh); + if (buffer_uptodate(bh)) + return 0; + return -EIO; +} +EXPORT_SYMBOL(bh_submit_read); void __init buffer_init(void) { int nrpages; diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index da0d83f..e98801f 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -192,6 +192,8 @@ int sync_dirty_buffer(struct buffer_head *bh); int submit_bh(int, struct buffer_head *); void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); +int bh_uptodate_or_lock(struct buffer_head *bh); +int bh_submit_read(struct buffer_head *bh); extern int buffer_heads_over_limit; -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the 
line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
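A rough userspace model of how a caller might pair the two helpers from the
buffer head patch above. This is only a sketch: struct fake_bh and every
function below are made up for illustration, the I/O is simulated by a flag,
and the unlock on completion is an assumption about what the end_io handler
does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: just the two state bits. */
struct fake_bh {
	bool uptodate;
	bool locked;
};

/* Mirrors bh_uptodate_or_lock(): 1 if up to date, else 0 with the lock held. */
static int fake_uptodate_or_lock(struct fake_bh *bh)
{
	if (!bh->uptodate) {
		bh->locked = true;		/* stands in for lock_buffer() */
		if (!bh->uptodate)
			return 0;		/* caller must now read the block */
		bh->locked = false;		/* stands in for unlock_buffer() */
	}
	return 1;
}

/* Mirrors the shape of bh_submit_read(); io_ok simulates the read result. */
static int fake_submit_read(struct fake_bh *bh, bool io_ok)
{
	if (!bh->locked)
		bh->locked = true;
	if (bh->uptodate)
		return 0;
	bh->uptodate = io_ok;		/* simulated I/O completion */
	bh->locked = false;		/* assumption: completion unlocks the buffer */
	return bh->uptodate ? 0 : -5;	/* -5 plays the role of -EIO */
}

/* Typical caller: skip the read when the buffer is already valid. */
static int fake_read_block(struct fake_bh *bh, bool io_ok)
{
	if (fake_uptodate_or_lock(bh))
		return 0;
	return fake_submit_read(bh, io_ok);
}
```

The point of the split is that the second uptodate test happens under the
lock, so a caller that gets 0 back knows nobody else completed the read in
the meantime.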
[PATCH 31/49] ext4: Take read lock during overwrite case.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> When we are overwriting a file and not actually allocating new file system blocks we need to take only the read lock on i_data_sem. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/inode.c | 32 1 files changed, 24 insertions(+), 8 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 596b3ab..ee0bc3a 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -901,11 +901,31 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block, int create, int extend_disksize) { int retval; - if (create) { - down_write((&EXT4_I(inode)->i_data_sem)); + /* +* Try to see if we can get the block without requesting +* for new file system block. +*/ + down_read((&EXT4_I(inode)->i_data_sem)); + if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) { + retval = ext4_ext_get_blocks(handle, inode, block, max_blocks, + bh, 0, 0); } else { - down_read((&EXT4_I(inode)->i_data_sem)); + retval = ext4_get_blocks_handle(handle, + inode, block, max_blocks, bh, 0, 0); } + up_read((&EXT4_I(inode)->i_data_sem)); + if (!create || (retval > 0)) + return retval; + + /* +* We need to allocate new blocks which will result +* in i_data update +*/ + down_write((&EXT4_I(inode)->i_data_sem)); + /* +* We need to check for EXT4 here because migrate +* could have changed the inode type in between +*/ if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) { retval = ext4_ext_get_blocks(handle, inode, block, max_blocks, bh, create, extend_disksize); @@ -913,11 +933,7 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block, retval = ext4_get_blocks_handle(handle, inode, block, max_blocks, bh, create, extend_disksize); } - if (create) { - up_write((&EXT4_I(inode)->i_data_sem)); - } else { - up_read((&EXT4_I(inode)->i_data_sem)); - } + up_write((&EXT4_I(inode)->i_data_sem)); return retval; } static int ext4_get_block(struct inode *inode, sector_t iblock, -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: 
send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
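The control flow of the overwrite patch above can be sketched in userspace
as follows. Nothing here is kernel code: struct fake_inode, the lock_mode
bookkeeping and the block number 42 are invented for illustration, and the
"lock" is only a record of the strongest mode that was requested.

```c
#include <stdbool.h>

enum lock_mode { LOCK_NONE, LOCK_READ, LOCK_WRITE };

/* Hypothetical inode model: one block slot and the strongest lock taken. */
struct fake_inode {
	long mapped_block;		/* 0 means a hole, nothing allocated */
	enum lock_mode strongest;
};

static void take(struct fake_inode *inode, enum lock_mode mode)
{
	if (mode > inode->strongest)
		inode->strongest = mode;
}

/* Lookup only: returns 1 if a block is mapped, 0 for a hole. */
static int lookup_block(const struct fake_inode *inode)
{
	return inode->mapped_block != 0;
}

static int get_blocks_wrap(struct fake_inode *inode, bool create)
{
	int retval;

	take(inode, LOCK_READ);		/* plays the role of down_read() */
	retval = lookup_block(inode);
	/* up_read() would go here */
	if (!create || retval > 0)
		return retval;		/* overwrite or plain lookup: done */

	take(inode, LOCK_WRITE);	/* plays the role of down_write() */
	if (!lookup_block(inode))	/* re-check, state may have changed */
		inode->mapped_block = 42;	/* pretend allocation */
	/* up_write() would go here */
	return 1;
}
```

An overwrite of an already mapped block never reaches the write lock in this
model, which is the behaviour the patch is after.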
[PATCH 14/49] ext4: export iov_shorten from kernel for ext4's use
From: Eric Sandeen <[EMAIL PROTECTED]>

Export iov_shorten() from kernel so that ext4 can
truncate too-large writes to bitmapped files.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
---
 fs/read_write.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index ea1f94c..dfaee3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -450,6 +450,7 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
 	}
 	return seg;
 }
+EXPORT_SYMBOL(iov_shorten);
 
 ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
 		unsigned long nr_segs, size_t len, loff_t *ppos, iov_fn_t fn)
-- 
1.5.4.rc3.31.g1271-dirty

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH 16/49] ext2: Fix the max file size for ext2 file system.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> The max file size for ext2 file system is now calculated with hardcoded 4K block size. The patch fixes it to be calculated with the right block size. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext2/super.c | 32 1 files changed, 28 insertions(+), 4 deletions(-) diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 154e25f..6abaf75 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -680,11 +680,31 @@ static int ext2_check_descriptors (struct super_block * sb) static loff_t ext2_max_size(int bits) { loff_t res = EXT2_NDIR_BLOCKS; - /* This constant is calculated to be the largest file size for a -* dense, 4k-blocksize file such that the total number of + int meta_blocks; + loff_t upper_limit; + + /* This is calculated to be the largest file size for a +* dense, file such that the total number of * sectors in the file, including data and all indirect blocks, -* does not exceed 2^32. */ - const loff_t upper_limit = 0x1ff7fffd000LL; +* does not exceed 2^32 -1 +* __u32 i_blocks representing the total number of +* 512 bytes blocks of the file +*/ + upper_limit = (1LL << 32) - 1; + + /* total blocks in file system block size */ + upper_limit >>= (bits - 9); + + + /* indirect blocks */ + meta_blocks = 1; + /* double indirect blocks */ + meta_blocks += 1 + (1LL << (bits-2)); + /* tripple indirect blocks */ + meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2))); + + upper_limit -= meta_blocks; + upper_limit <<= bits; res += 1LL << (bits-2); res += 1LL << (2*(bits-2)); @@ -692,6 +712,10 @@ static loff_t ext2_max_size(int bits) res <<= bits; if (res > upper_limit) res = upper_limit; + + if (res > MAX_LFS_FILESIZE) + res = MAX_LFS_FILESIZE; + return res; } -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at 
http://www.tux.org/lkml/
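To see what the new ext2 limit works out to for different block sizes, the
arithmetic from the patch above can be lifted into a small userspace
function. This is a sketch under a few assumptions: NDIR_BLOCKS is taken to
be 12, the triple-indirect term sits between the two hunks and is filled in
from context, and the final MAX_LFS_FILESIZE clamp is left out because its
value depends on the architecture.

```c
#include <stdint.h>

#define NDIR_BLOCKS 12	/* assumed value of EXT2_NDIR_BLOCKS */

/* bits is log2 of the block size, e.g. 10 for 1K and 12 for 4K blocks. */
static int64_t max_size_for_bits(int bits)
{
	int64_t res = NDIR_BLOCKS;
	int64_t meta_blocks;
	int64_t upper_limit;

	/* i_blocks is a __u32 count of 512-byte sectors */
	upper_limit = (1LL << 32) - 1;
	/* convert sectors to file system blocks */
	upper_limit >>= (bits - 9);

	/* indirect block */
	meta_blocks = 1;
	/* double indirect blocks */
	meta_blocks += 1 + (1LL << (bits - 2));
	/* triple indirect blocks */
	meta_blocks += 1 + (1LL << (bits - 2)) + (1LL << (2 * (bits - 2)));

	upper_limit -= meta_blocks;
	upper_limit <<= bits;

	/* blocks addressable through the direct and indirect pointers */
	res += 1LL << (bits - 2);
	res += 1LL << (2 * (bits - 2));
	res += 1LL << (3 * (bits - 2));	/* assumed: this line is outside the hunks */
	res <<= bits;
	if (res > upper_limit)
		res = upper_limit;
	return res;
}
```

With these assumptions, 1K blocks are limited by the pointer tree (the
i_blocks bound is not reached), while 4K blocks hit the i_blocks bound. The
same calculation applies to the ext3 version of the patch.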
[PATCH 17/49] ext3: Fix the max file size for ext3 file system.
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> The max file size for ext3 file system is now calculated with hardcoded 4K block size. The patch fixes it to be calculated with the right block size. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext3/super.c | 32 1 files changed, 28 insertions(+), 4 deletions(-) diff --git a/fs/ext3/super.c b/fs/ext3/super.c index cb14de1..f3675cc 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -1436,11 +1436,31 @@ static void ext3_orphan_cleanup (struct super_block * sb, static loff_t ext3_max_size(int bits) { loff_t res = EXT3_NDIR_BLOCKS; - /* This constant is calculated to be the largest file size for a -* dense, 4k-blocksize file such that the total number of + int meta_blocks; + loff_t upper_limit; + + /* This is calculated to be the largest file size for a +* dense, file such that the total number of * sectors in the file, including data and all indirect blocks, -* does not exceed 2^32. */ - const loff_t upper_limit = 0x1ff7fffd000LL; +* does not exceed 2^32 -1 +* __u32 i_blocks representing the total number of +* 512 bytes blocks of the file +*/ + upper_limit = (1LL << 32) - 1; + + /* total blocks in file system block size */ + upper_limit >>= (bits - 9); + + + /* indirect blocks */ + meta_blocks = 1; + /* double indirect blocks */ + meta_blocks += 1 + (1LL << (bits-2)); + /* tripple indirect blocks */ + meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2))); + + upper_limit -= meta_blocks; + upper_limit <<= bits; res += 1LL << (bits-2); res += 1LL << (2*(bits-2)); @@ -1448,6 +1468,10 @@ static loff_t ext3_max_size(int bits) res <<= bits; if (res > upper_limit) res = upper_limit; + + if (res > MAX_LFS_FILESIZE) + res = MAX_LFS_FILESIZE; + return res; } -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at 
http://www.tux.org/lkml/
[PATCH 27/49] ext4: Check for the correct error return from
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> ext4_ext_get_blocks returns negative values on error. We should check for <= 0 Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/extents.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 754c0d3..8593e59 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2462,12 +2462,12 @@ retry: ret = ext4_ext_get_blocks(handle, inode, block, max_blocks, &map_bh, EXT4_CREATE_UNINITIALIZED_EXT, 0); - WARN_ON(!ret); - if (!ret) { + WARN_ON(ret <= 0); + if (ret <= 0) { ext4_error(inode->i_sb, "ext4_fallocate", - "ext4_ext_get_blocks returned 0! inode#%lu" - ", block=%u, max_blocks=%lu", - inode->i_ino, block, max_blocks); + "ext4_ext_get_blocks returned error: " + "inode#%lu, block=%u, max_blocks=%lu", + inode->i_ino, block, max_blocks); ret = -EIO; ext4_mark_inode_dirty(handle, inode); ret2 = ext4_journal_stop(handle); -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 19/49] ext4: Return after ext4_error in case of failures
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> This fixes some instances where we were continuing after calling ext4_error. ext4_error calls panic only if the errors=panic mount option is set, so we need to make sure we return correctly after the ext4_error call. Reported by: Adrian Bunk <[EMAIL PROTECTED]> Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/balloc.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 9568a57..ff3428e 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -587,11 +587,13 @@ do_more: in_range(ext4_inode_bitmap(sb, desc), block, count) || in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) || in_range(block + count - 1, ext4_inode_table(sb, desc), -sbi->s_itb_per_group)) +sbi->s_itb_per_group)) { ext4_error (sb, "ext4_free_blocks", "Freeing blocks in system zones - " "Block = %llu, count = %lu", block, count); + goto error_return; + } /* * We are about to start releasing blocks in the bitmap, @@ -1690,11 +1692,13 @@ allocated: in_range(ret_block, ext4_inode_table(sb, gdp), EXT4_SB(sb)->s_itb_per_group) || in_range(ret_block + num - 1, ext4_inode_table(sb, gdp), -EXT4_SB(sb)->s_itb_per_group)) +EXT4_SB(sb)->s_itb_per_group)) { ext4_error(sb, "ext4_new_block", "Allocating block in system zone - " "blocks from %llu, length %lu", ret_block, num); + goto out; + } performed_allocation = 1; -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 42/49] ext4: Enable the multiblock allocator by default
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> Enable the multiblock allocator by default. Fix ext4_show_options() so that if it is not enabled, the nomballoc option is included in /proc/mounts. Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Acked-by: Eric Sandeen <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]> --- fs/ext4/super.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 136d095..91a11ec 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -736,6 +736,8 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs) seq_puts(seq, ",nobh"); if (!test_opt(sb, EXTENTS)) seq_puts(seq, ",noextents"); + if (!test_opt(sb, MBALLOC)) + seq_puts(seq, ",nomballoc"); if (test_opt(sb, I_VERSION)) seq_puts(seq, ",i_version"); @@ -1902,6 +1904,11 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent) * User -o noextents to turn it off */ set_opt(sbi->s_mount_opt, EXTENTS); + /* +* turn on mballoc feature by default in ext4 filesystem +* User -o nomballoc to turn it off +*/ + set_opt(sbi->s_mount_opt, MBALLOC); if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum, NULL, 0)) -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 28/49] ext4: remove unused code from ext4_find_entry()
From: Mariusz Kozlowski <[EMAIL PROTECTED]>

The unused code found in ext3_find_entry() is also present (and still
unused) in the ext4_find_entry() code. This patch removes it.

Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/namei.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index fb673b1..67b6d8a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -861,14 +861,10 @@ static struct buffer_head * ext4_find_entry (struct dentry *dentry,
 	int i, err;
 	struct inode *dir = dentry->d_parent->d_inode;
 	int namelen;
-	const u8 *name;
-	unsigned blocksize;
 
 	*res_dir = NULL;
 	sb = dir->i_sb;
-	blocksize = sb->s_blocksize;
 	namelen = dentry->d_name.len;
-	name = dentry->d_name.name;
 	if (namelen > EXT4_NAME_LEN)
 		return NULL;
 	if (is_dx(dir)) {
-- 
1.5.4.rc3.31.g1271-dirty

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH 45/49] ext4: Use the ext4_ext_actual_len() helper function
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> ext4 uses the high bit of the extent length to encode whether the extent is intialized or not. The helper function ext4_ext_get_actual_len should be used to get the actual length of the extent. This addresses the kernel bug documented here: http://bugzilla.kernel.org/show_bug.cgi?id=9732 kernel BUG at fs/ext4/extents.c:1056! Call Trace: [] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1 [] lock_release_holdtime+0x27/0x49 [] _spin_unlock+0x17/0x20 [] :jbd2:start_this_handle+0x4e0/0x4fe [] :ext4dev:ext4_fallocate+0x175/0x39a [] lock_release_holdtime+0x27/0x49 [] __lock_acquire+0x4e7/0xc4d [] lock_release_holdtime+0x27/0x49 [] sys_fallocate+0xe4/0x10d [] tracesys+0xd5/0xda Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]> --- fs/ext4/extents.c | 24 +--- 1 files changed, 13 insertions(+), 11 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 13e3e8c..b6b9ec7 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1029,7 +1029,7 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path, { struct ext4_extent_idx *ix; struct ext4_extent *ex; - int depth; + int depth, ee_len; BUG_ON(path == NULL); depth = path->p_depth; @@ -1043,6 +1043,7 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path, * first one in the file */ ex = path[depth].p_ext; + ee_len = ext4_ext_get_actual_len(ex); if (*logical < le32_to_cpu(ex->ee_block)) { BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex); while (--depth >= 0) { @@ -1052,10 +1053,10 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path, return 0; } - BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len)); + BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len)); - *logical = le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - 1; - *phys = ext_pblock(ex) + le16_to_cpu(ex->ee_len) - 1; + *logical = le32_to_cpu(ex->ee_block) + ee_len - 1; + *phys = ext_pblock(ex) 
+ ee_len - 1; return 0; } @@ -1075,7 +1076,7 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path, struct ext4_extent_idx *ix; struct ext4_extent *ex; ext4_fsblk_t block; - int depth; + int depth, ee_len; BUG_ON(path == NULL); depth = path->p_depth; @@ -1089,6 +1090,7 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path, * first one in the file */ ex = path[depth].p_ext; + ee_len = ext4_ext_get_actual_len(ex); if (*logical < le32_to_cpu(ex->ee_block)) { BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex); while (--depth >= 0) { @@ -1100,7 +1102,7 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path, return 0; } - BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len)); + BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len)); if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) { /* next allocated block in this leaf */ @@ -1316,7 +1318,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1, if (ext1_ee_len + ext2_ee_len > max_len) return 0; #ifdef AGGRESSIVE_TEST - if (le16_to_cpu(ex1->ee_len) >= 4) + if (ext1_ee_len >= 4) return 0; #endif @@ -2313,7 +2315,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, - le32_to_cpu(newex.ee_block) + ext_pblock(&newex); /* number of remaining blocks in the extent */ - allocated = le16_to_cpu(newex.ee_len) - + allocated = ext4_ext_get_actual_len(&newex) - (iblock - le32_to_cpu(newex.ee_block)); goto out; } else { @@ -2429,7 +2431,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, newex.ee_len = cpu_to_le16(max_blocks); err = ext4_ext_check_overlap(inode, &newex, path); if (err) - allocated = le16_to_cpu(newex.ee_len); + allocated = ext4_ext_get_actual_len(&newex); else allocated = max_blocks; @@ -2461,7 +2463,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode, * but otherwise we'd need to call it every free() */ ext4_mb_discard_inode_preallocations(inode); ext4_free_blocks(handle, inode, 
ext_pblock(&newex), - le16_to_cpu(newex.ee_len), 0); + ext4_ext_get_actual_len(&newex), 0); g
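As a simplified illustration of why the raw ee_len must not be used as a
length, here is a userspace sketch of the encoding described in the commit
message above. It assumes the flag is bit 15 of the 16-bit field and that
masking it off gives the length; the kernel's real helper may handle edge
cases differently, and the names below are hypothetical.

```c
#include <stdint.h>

#define UNINIT_FLAG 0x8000u	/* assumption: high bit marks "uninitialized" */

/* Nonzero if the (already CPU-endian) length field carries the flag. */
static int ext_is_uninitialized(uint16_t ee_len)
{
	return (ee_len & UNINIT_FLAG) != 0;
}

/* Length in blocks with the flag bit stripped off. */
static unsigned int ext_actual_len(uint16_t ee_len)
{
	return (unsigned int)(ee_len & 0x7fffu);
}
```

For an uninitialized extent of 100 blocks the raw field is far larger than
100, so arithmetic such as ee_block + ee_len on the raw value lands well
past the real end of the extent.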
Re: [PATCH][RESEND] sh: termios ioctl definitions
On Sat, 19 Jan 2008 16:05:06 + Alan Cox <[EMAIL PROTECTED]> wrote: > These ports are holding up progress and now have been for months. Do the > job for them. Never understood the dependencies here. Do these two patches depend on something else which is only-in-mm? Also, I've been uncertainly sitting on http://userweb.kernel.org/~akpm/mmotm/broken-out/tty-fix-tty-network-driver-interactions-with-tcget-tcset-calls-x86-fix.patch for some time. Is it ready to go into git-x86? Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 09/49] ext4: Rename i_file_acl to i_file_acl_lo
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> Rename i_file_acl to i_file_acl_lo. This helps in finding bugs where we use i_file_acl instead of the combined i_file_acl_lo and i_file_acl_high Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> --- fs/ext4/inode.c |4 ++-- include/linux/ext4_fs.h |2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 76ceba2..7bcec18 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2718,7 +2718,7 @@ void ext4_read_inode(struct inode * inode) } inode->i_blocks = le32_to_cpu(raw_inode->i_blocks); ei->i_flags = le32_to_cpu(raw_inode->i_flags); - ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl); + ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo); if (EXT4_SB(inode->i_sb)->s_es->s_creator_os != cpu_to_le32(EXT4_OS_HURD)) ei->i_file_acl |= @@ -2866,7 +2866,7 @@ static int ext4_do_update_inode(handle_t *handle, cpu_to_le32(EXT4_OS_HURD)) raw_inode->i_file_acl_high = cpu_to_le16(ei->i_file_acl >> 32); - raw_inode->i_file_acl = cpu_to_le32(ei->i_file_acl); + raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl); if (!S_ISREG(inode->i_mode)) { raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl); } else { diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h index 1a27433..6894f36 100644 --- a/include/linux/ext4_fs.h +++ b/include/linux/ext4_fs.h @@ -297,7 +297,7 @@ struct ext4_inode { } osd1; /* OS dependent 1 */ __le32 i_block[EXT4_N_BLOCKS];/* Pointers to blocks */ __le32 i_generation; /* File version (for NFS) */ - __le32 i_file_acl; /* File ACL */ + __le32 i_file_acl_lo; /* File ACL */ __le32 i_dir_acl; /* Directory ACL */ __le32 i_obso_faddr; /* Obsoleted fragment address */ union { -- 1.5.4.rc3.31.g1271-dirty -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
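The split that the i_file_acl_lo rename is guarding can be sketched in
userspace like this. The helper names are hypothetical and the endianness
conversions (le32_to_cpu and friends) are left out, so this only shows how
the 32-bit low part and 16-bit high part make up one block number.

```c
#include <stdint.h>

/* Combine the on-disk halves into the in-memory 64-bit value. */
static uint64_t file_acl_combine(uint32_t lo, uint16_t high)
{
	return (uint64_t)lo | ((uint64_t)high << 32);
}

/* Low 32 bits, as stored in i_file_acl_lo. */
static uint32_t file_acl_lo(uint64_t acl)
{
	return (uint32_t)acl;
}

/* High 16 bits, as stored in i_file_acl_high. */
static uint16_t file_acl_high(uint64_t acl)
{
	return (uint16_t)(acl >> 32);
}
```

Code that reads only the low half silently loses the upper bits on large
file systems, which is the kind of bug the new field name makes easier to
spot at compile time.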
[PATCH 04/49] ext4 extents: remove unneeded casts
From: Eric Sandeen <[EMAIL PROTECTED]> There are many casts in extents.c which are not needed, as the variables are already the type of the cast, or are being promoted for no particular reason in printk's. Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]> --- fs/ext4/extents.c | 49 ++--- 1 files changed, 22 insertions(+), 27 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 19d8059..6853722 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -374,7 +374,7 @@ ext4_ext_binsearch_idx(struct inode *inode, struct ext4_extent_idx *r, *l, *m; - ext_debug("binsearch for %lu(idx): ", (unsigned long)block); + ext_debug("binsearch for %u(idx): ", block); l = EXT_FIRST_INDEX(eh) + 1; r = EXT_LAST_INDEX(eh); @@ -440,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, return; } - ext_debug("binsearch for %lu: ", (unsigned long)block); + ext_debug("binsearch for %u: ", block); l = EXT_FIRST_EXTENT(eh) + 1; r = EXT_LAST_EXTENT(eh); @@ -766,7 +766,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode, while (k--) { oldblock = newblock; newblock = ablocks[--a]; - bh = sb_getblk(inode->i_sb, (ext4_fsblk_t)newblock); + bh = sb_getblk(inode->i_sb, newblock); if (!bh) { err = -EIO; goto cleanup; @@ -786,9 +786,8 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode, fidx->ei_block = border; ext4_idx_store_pblock(fidx, oldblock); - ext_debug("int.index at %d (block %llu): %lu -> %llu\n", i, - newblock, (unsigned long) le32_to_cpu(border), - oldblock); + ext_debug("int.index at %d (block %llu): %u -> %llu\n", + i, newblock, le32_to_cpu(border), oldblock); /* copy indexes */ m = 0; path[i].p_idx++; @@ -1476,10 +1475,10 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path, } else if (block < le32_to_cpu(ex->ee_block)) { lblock = block; len = le32_to_cpu(ex->ee_block) - block; - ext_debug("cache gap(before): %lu [%lu:%lu]", - (unsigned long) block, - (unsigned long) 
le32_to_cpu(ex->ee_block), - (unsigned long) ext4_ext_get_actual_len(ex)); + ext_debug("cache gap(before): %u [%u:%u]", + block, + le32_to_cpu(ex->ee_block), +ext4_ext_get_actual_len(ex)); } else if (block >= le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex)) { ext4_lblk_t next; @@ -1487,10 +1486,10 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path, + ext4_ext_get_actual_len(ex); next = ext4_ext_next_allocated_block(path); - ext_debug("cache gap(after): [%lu:%lu] %lu", - (unsigned long) le32_to_cpu(ex->ee_block), - (unsigned long) ext4_ext_get_actual_len(ex), - (unsigned long) block); + ext_debug("cache gap(after): [%u:%u] %u", + le32_to_cpu(ex->ee_block), + ext4_ext_get_actual_len(ex), + block); BUG_ON(next == lblock); len = next - lblock; } else { @@ -1498,7 +1497,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path, BUG(); } - ext_debug(" -> %lu:%lu\n", (unsigned long) lblock, len); + ext_debug(" -> %u:%lu\n", lblock, len); ext4_ext_put_in_cache(inode, lblock, len, 0, EXT4_EXT_CACHE_GAP); } @@ -1520,11 +1519,9 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block, ex->ee_block = cpu_to_le32(cex->ec_block); ext4_ext_store_pblock(ex, cex->ec_start); ex->ee_len = cpu_to_le16(cex->ec_len); - ext_debug("%lu cached by %lu:%lu:%llu\n", - (unsigned long) block, - (unsigned long) cex->ec_block, - (unsigned long) cex->ec_len, - cex->ec_start); + ext_debug("%u cached by %u:%u:%llu\n", + block, + cex->ec_block, cex->ec_len, cex->ec_start); return cex->ec_type; } @@ -2145,9 +2142,8 @@ int ext4_ext_get_blocks(handl
[PATCH 47/49] jbd2: Mark jbd2 slabs as SLAB_TEMPORARY
From: Mingming Cao <[EMAIL PROTECTED]>

This patch marks slab allocations by jbd2 as short-lived in support of
Mel Gorman's "Group short-lived and reclaimable kernel allocations"
patch. (Ported from similar changes made to fs/jbd/journal.c and
fs/jbd/revoke.c in Mel's patch.)

Cc: Mel Gorman <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/jbd2/journal.c |    4 ++--
 fs/jbd2/revoke.c  |    6 ++++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index f8b0f8c..8301e8d 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1975,7 +1975,7 @@ static int journal_init_jbd2_journal_head_cache(void)
 	jbd2_journal_head_cache = kmem_cache_create("jbd2_journal_head",
 				sizeof(struct journal_head),
 				0,		/* offset */
-				0,		/* flags */
+				SLAB_TEMPORARY,	/* flags */
 				NULL);		/* ctor */
 	retval = 0;
 	if (jbd2_journal_head_cache == 0) {
@@ -2271,7 +2271,7 @@ static int __init journal_init_handle_cache(void)
 	jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
 				sizeof(handle_t),
 				0,		/* offset */
-				0,		/* flags */
+				SLAB_TEMPORARY,	/* flags */
 				NULL);		/* ctor */
 	if (jbd2_handle_cache == NULL) {
 		printk(KERN_EMERG "JBD: failed to create handle cache\n");
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 3595fd4..df36f42 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -171,13 +171,15 @@ int __init jbd2_journal_init_revoke_caches(void)
 {
 	jbd2_revoke_record_cache = kmem_cache_create("jbd2_revoke_record",
 					   sizeof(struct jbd2_revoke_record_s),
-					   0, SLAB_HWCACHE_ALIGN, NULL);
+					   0,
+					   SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
+					   NULL);
 	if (jbd2_revoke_record_cache == 0)
 		return -ENOMEM;
 
 	jbd2_revoke_table_cache = kmem_cache_create("jbd2_revoke_table",
 					   sizeof(struct jbd2_revoke_table_s),
-					   0, 0, NULL);
+					   0, SLAB_TEMPORARY, NULL);
 	if (jbd2_revoke_table_cache == 0) {
 		kmem_cache_destroy(jbd2_revoke_record_cache);
 		jbd2_revoke_record_cache = NULL;
--
1.5.4.rc3.31.g1271-dirty
[PATCH 25/49] jbd2: Remove printk from J_ASSERT to preserve registers during BUG
From: Chris Snook <[EMAIL PROTECTED]>

Signed-off-by: Chris Snook <[EMAIL PROTECTED]>
Cc: "Stephen C. Tweedie" <[EMAIL PROTECTED]>
Cc: Theodore Ts'o <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 include/linux/jbd2.h |   16 +---------------
 1 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 06ef114..d5f7cff 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -256,17 +256,7 @@ typedef struct journal_superblock_s
 #include
 #include
 
-#define JBD2_ASSERTIONS
-#ifdef JBD2_ASSERTIONS
-#define J_ASSERT(assert)						\
-do {									\
-	if (!(assert)) {						\
-		printk (KERN_EMERG					\
-			"Assertion failure in %s() at %s:%d: \"%s\"\n",	\
-			__FUNCTION__, __FILE__, __LINE__, # assert);	\
-		BUG();							\
-	}								\
-} while (0)
+#define J_ASSERT(assert)	BUG_ON(!(assert))
 
 #if defined(CONFIG_BUFFER_DEBUG)
 void buffer_assertion_failure(struct buffer_head *bh);
@@ -282,10 +272,6 @@ void buffer_assertion_failure(struct buffer_head *bh);
 #define J_ASSERT_JH(jh, expr)	J_ASSERT(expr)
 #endif
 
-#else
-#define J_ASSERT(assert)	do { } while (0)
-#endif /* JBD2_ASSERTIONS */
-
 #if defined(JBD2_PARANOID_IOFAIL)
 #define J_EXPECT(expr, why...)		J_ASSERT(expr)
 #define J_EXPECT_BH(bh, expr, why...)	J_ASSERT_BH(bh, expr)
--
1.5.4.rc3.31.g1271-dirty
[PATCH 39/49] ext4: Add ext4_find_next_bit()
From: Aneesh Kumar K.V <[EMAIL PROTECTED]> This function is used by the ext4 multi block allocator patches. Also add generic_find_next_le_bit Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- include/asm-arm/bitops.h |2 + include/asm-generic/bitops/ext2-non-atomic.h |2 + include/asm-generic/bitops/le.h |4 ++ include/asm-m68k/bitops.h|2 + include/asm-m68knommu/bitops.h |2 + include/asm-powerpc/bitops.h |4 ++ include/asm-s390/bitops.h|2 + include/linux/ext4_fs.h |1 + lib/find_next_bit.c | 43 ++ 9 files changed, 62 insertions(+), 0 deletions(-) diff --git a/include/asm-arm/bitops.h b/include/asm-arm/bitops.h index 47a6b08..5c60bfc 100644 --- a/include/asm-arm/bitops.h +++ b/include/asm-arm/bitops.h @@ -310,6 +310,8 @@ static inline int constant_fls(int x) _find_first_zero_bit_le(p,sz) #define ext2_find_next_zero_bit(p,sz,off) \ _find_next_zero_bit_le(p,sz,off) +#define ext2_find_next_bit(p, sz, off) \ + _find_next_bit_le(p, sz, off) /* * Minix is defined to use little-endian byte ordering. 
diff --git a/include/asm-generic/bitops/ext2-non-atomic.h b/include/asm-generic/bitops/ext2-non-atomic.h index 1697404..63cf822 100644 --- a/include/asm-generic/bitops/ext2-non-atomic.h +++ b/include/asm-generic/bitops/ext2-non-atomic.h @@ -14,5 +14,7 @@ generic_find_first_zero_le_bit((unsigned long *)(addr), (size)) #define ext2_find_next_zero_bit(addr, size, off) \ generic_find_next_zero_le_bit((unsigned long *)(addr), (size), (off)) +#define ext2_find_next_bit(addr, size, off) \ + generic_find_next_le_bit((unsigned long *)(addr), (size), (off)) #endif /* _ASM_GENERIC_BITOPS_EXT2_NON_ATOMIC_H_ */ diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h index b9c7e5d..80e3bf1 100644 --- a/include/asm-generic/bitops/le.h +++ b/include/asm-generic/bitops/le.h @@ -20,6 +20,8 @@ #define generic___test_and_clear_le_bit(nr, addr) __test_and_clear_bit(nr, addr) #define generic_find_next_zero_le_bit(addr, size, offset) find_next_zero_bit(addr, size, offset) +#define generic_find_next_le_bit(addr, size, offset) \ + find_next_bit(addr, size, offset) #elif defined(__BIG_ENDIAN) @@ -42,6 +44,8 @@ extern unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset); +extern unsigned long generic_find_next_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); #else #error "Please fix " diff --git a/include/asm-m68k/bitops.h b/include/asm-m68k/bitops.h index 2976b5d..83d1f28 100644 --- a/include/asm-m68k/bitops.h +++ b/include/asm-m68k/bitops.h @@ -410,6 +410,8 @@ static inline int ext2_find_next_zero_bit(const void *vaddr, unsigned size, res = ext2_find_first_zero_bit (p, size - 32 * (p - addr)); return (p - addr) * 32 + res; } +#define ext2_find_next_bit(addr, size, off) \ + generic_find_next_le_bit((unsigned long *)(addr), (size), (off)) #endif /* __KERNEL__ */ diff --git a/include/asm-m68knommu/bitops.h b/include/asm-m68knommu/bitops.h index f8dfb7b..f43afe1 100644 --- 
a/include/asm-m68knommu/bitops.h +++ b/include/asm-m68knommu/bitops.h @@ -294,6 +294,8 @@ found_middle: return result + ffz(__swab32(tmp)); } +#define ext2_find_next_bit(addr, size, off) \ + generic_find_next_le_bit((unsigned long *)(addr), (size), (off)) #include #endif /* __KERNEL__ */ diff --git a/include/asm-powerpc/bitops.h b/include/asm-powerpc/bitops.h index 733b4af..220d9a7 100644 --- a/include/asm-powerpc/bitops.h +++ b/include/asm-powerpc/bitops.h @@ -359,6 +359,8 @@ static __inline__ int test_le_bit(unsigned long nr, unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, unsigned long size, unsigned long offset); +unsigned long generic_find_next_le_bit(const unsigned long *addr, + unsigned long size, unsigned long offset); /* Bitmap functions for the ext2 filesystem */ #define ext2_set_bit(nr,addr) \ @@ -378,6 +380,8 @@ unsigned long generic_find_next_zero_le_bit(const unsigned long *addr, #define ext2_find_next_zero_bit(addr, size, off) \ generic_find_next_zero_le_bit((unsigned long*)addr, size, off) +#define ext2_find_next_bit(addr, size, off) \ + generic_find_next_le_bit((unsigned long *)addr, size, off) /* Bitmap functions for the minix filesystem. */ #define minix_test_and_set_bit(nr,addr) \ diff --git a/include/asm-s390/bitops.h b/include/asm-s390/bitops.h index 34d9a63..dba6fec 100644 --- a/include/asm-s390/bitops.h +++ b/include/asm-s390/bitops.
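As a rough illustration of what a little-endian "find next set bit" has to do, independent of the host's word size or byte order: bit n of the bitmap lives in byte n/8 at bit position n%8. The sketch below is a plain byte-at-a-time userspace version under that assumption. The name find_next_le_bit_bytes is made up for this example, and it is not the word-at-a-time implementation the patch adds to lib/find_next_bit.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first set bit at or after 'offset' in a
 * little-endian bitmap of 'size' bits, or 'size' if there is none.
 * Bit n is stored in byte n / 8 at bit position n % 8.
 */
static size_t find_next_le_bit_bytes(const unsigned char *addr,
				     size_t size, size_t offset)
{
	size_t n;

	for (n = offset; n < size; n++)
		if (addr[n / 8] & (1u << (n % 8)))
			return n;
	return size;
}
```

Returning 'size' when nothing is found follows the convention of the existing find_next_zero_bit() family, so callers can test the result against the bitmap length.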
[PATCH 20/49] ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*)
From: Adrian Bunk <[EMAIL PROTECTED]>

Based on a report by Robert P. J. Day.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0931831..1484a08 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -706,7 +706,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",debug");
 	if (test_opt(sb, OLDALLOC))
 		seq_puts(seq, ",oldalloc");
-#ifdef CONFIG_EXT4_FS_XATTR
+#ifdef CONFIG_EXT4DEV_FS_XATTR
 	if (test_opt(sb, XATTR_USER))
 		seq_puts(seq, ",user_xattr");
 	if (!test_opt(sb, XATTR_USER) &&
@@ -714,7 +714,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",nouser_xattr");
 	}
 #endif
-#ifdef CONFIG_EXT4_FS_POSIX_ACL
+#ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
 	if (test_opt(sb, POSIX_ACL))
 		seq_puts(seq, ",acl");
 	if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT4_DEFM_ACL))
--
1.5.4.rc3.31.g1271-dirty
[PATCH 44/49] ext4: fix uniniatilized extent splitting error
From: Dmitry Monakhov <[EMAIL PROTECTED]>

Fix a bug, reported by Dmitry Monakhov, caused by a lost error code.

Testcase:
blksize = 0x1000;
fd = open(argv[1], O_RDWR|O_CREAT, 0700);
unsigned long long sz = 0x1000UL;
/* allocating big blocks chunk */
syscall(__NR_fallocate, fd, 0, 0UL, sz);
/* grab all other available filesystem space */
tfd = open("tmp", O_RDWR|O_CREAT|O_DIRECT, 0700);
while (write(tfd, buf, 4096) > 0)
	; /* loop until ENOSPC */
fsync(fd); /* just in case */
while (pos < sz) {
	/* each seek + write operation splits the uninitialized
	 * extent into three extents. Splitting may require a new
	 * extent allocation, which will probably fail because of
	 * ENOSPC */
	lseek(fd, blksize*2 - 1, SEEK_CUR);
	if ((ret = write(fd, "a", 1)) != 1)
		exit(1);
	pos += blksize * 2;
}

Signed-off-by: Dmitry Monakhov <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8cf5545..13e3e8c 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2373,9 +2373,10 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 			ret = ext4_ext_convert_to_initialized(handle, inode,
 								path, iblock,
 								max_blocks);
-			if (ret <= 0)
+			if (ret <= 0) {
+				err = ret;
 				goto out2;
-			else
+			} else
 				allocated = ret;
 			goto outnew;
 		}
--
1.5.4.rc3.31.g1271-dirty
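To make the "lost error code" concrete, here is a small userspace model of the control flow before and after the change. It assumes the exit path returns "err ? err : allocated"; that line is not part of the hunk above, so treat it as an assumption. convert_stub() is a hypothetical stand-in for ext4_ext_convert_to_initialized().

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for ext4_ext_convert_to_initialized():
 * returns a positive block count on success, or <= 0 on failure.
 */
static int convert_stub(int simulated_result)
{
	return simulated_result;
}

/* Before the fix: on failure we jump out with err still 0. */
static int get_blocks_before(int simulated_result)
{
	int err = 0;
	int allocated = 0;
	int ret = convert_stub(simulated_result);

	if (ret <= 0)
		goto out;
	allocated = ret;
out:
	return err ? err : allocated;
}

/* After the fix: the failure is recorded in err before leaving. */
static int get_blocks_after(int simulated_result)
{
	int err = 0;
	int allocated = 0;
	int ret = convert_stub(simulated_result);

	if (ret <= 0) {
		err = ret;
		goto out;
	}
	allocated = ret;
out:
	return err ? err : allocated;
}
```

In this model, get_blocks_before(-ENOSPC) returns 0, so the caller sees "no blocks, no error", while get_blocks_after(-ENOSPC) reports -ENOSPC as intended.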
[PATCH 48/49] jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup
From: Mingming Cao <[EMAIL PROTECTED]>

While "every 5 seconds" doesn't sound like a problem, there can be many
of these timers (and they do add up over the whole kernel). The
"5 second" wakeup isn't really timing sensitive; in addition, even with
rounding it'll still happen every 5 seconds (with the exception of the
very first time, which is likely to be rounded up to somewhere closer
to 6 seconds).

(Ported from a similar JBD patch made by Arjan van de Ven to
fs/jbd/transaction.c)

Cc: Arjan van de Ven <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/jbd2/transaction.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 70b3199..0c8adab 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -54,7 +54,7 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
 	spin_lock_init(&transaction->t_handle_lock);
 
 	/* Set up the commit timer for the new transaction. */
-	journal->j_commit_timer.expires = transaction->t_expires;
+	journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
 	add_timer(&journal->j_commit_timer);
 
 	J_ASSERT(journal->j_running_transaction == NULL);
--
1.5.4.rc3.31.g1271-dirty
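The idea behind the rounding can be sketched in userspace as follows. This is a simplified model only: round_ticks_to_second() is a made-up name, the quarter-second threshold is an assumption about the policy, and the real round_jiffies() does more (as far as I recall, it also skews by CPU and will not hand back a time that has already passed).

```c
#include <assert.h>

/*
 * Simplified model: round an absolute tick count to a whole-second
 * boundary so that many timers end up expiring on the same tick.
 * 'hz' is the number of ticks per second.
 */
static unsigned long round_ticks_to_second(unsigned long j, unsigned long hz)
{
	unsigned long rem;

	assert(hz > 0);
	rem = j % hz;
	if (rem < hz / 4)
		return j - rem;      /* just past a boundary: round down */
	return j - rem + hz;         /* otherwise round up to the next one */
}
```

With hz = 1000, an expiry at tick 5300 moves to 6000, which matches the "closer to 6 seconds" remark above; once the timer is aligned to a boundary, later 5 second periods stay aligned.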
[PATCH 01/49] ext4: Support large blocksize up to PAGESIZE
From: Takashi Sato <[EMAIL PROTECTED]>

This patch set supports large block sizes (>4k, <=64k) in ext4 by
enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext4 without some changes to the directory handling code.
The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem. The proposed solution is to represent a 64k rec_len
with an otherwise impossible value like rec_len = 0x to handle this.

The patch set consists of the following 2 patches.
  [1/2] ext4: enlarge blocksize
        - Allow blocksize up to pagesize
  [2/2] ext4: fix rec_len overflow
        - prevent rec_len from overflow with 64KB blocksize

With this patch set, a ppc64 box with 64k pages can create a 64k block
size ext4dev filesystem and handle an empty directory block.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/super.c         |    5 +++++
 include/linux/ext4_fs.h |    4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1ca0f54..ab7010d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1624,6 +1624,11 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 		goto out_fail;
 	}
 
+	if (!sb_set_blocksize(sb, blocksize)) {
+		printk(KERN_ERR "EXT4-fs: bad blocksize %d.\n", blocksize);
+		goto out_fail;
+	}
+
 	/*
 	 * The ext4 superblock will not be buffer aligned for other than 1kB
 	 * block sizes.  We need to calculate the offset from buffer start.
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 97dd409..dfe4487 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -73,8 +73,8 @@
  * Macro-instructions used to manage several block sizes
  */
 #define EXT4_MIN_BLOCK_SIZE		1024
-#define	EXT4_MAX_BLOCK_SIZE		4096
-#define EXT4_MIN_BLOCK_LOG_SIZE		  10
+#define	EXT4_MAX_BLOCK_SIZE		65536
+#define EXT4_MIN_BLOCK_LOG_SIZE		10
 #ifdef __KERNEL__
 # define EXT4_BLOCK_SIZE(s)		((s)->s_blocksize)
 #else
--
1.5.4.rc3.31.g1271-dirty
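The rec_len overflow mentioned in the description is plain 16-bit truncation and is easy to reproduce in userspace. In the sketch below, store_rec_len() is a made-up helper that only models storing a length into a 16-bit on-disk field; it is not the ext4 code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of storing a directory record length into a 16-bit on-disk
 * field: only the low 16 bits survive the conversion.
 */
static uint16_t store_rec_len(uint32_t len)
{
	return (uint16_t)len;
}
```

A record spanning a whole 4096 byte block stores fine, but one spanning a whole 65536 byte block comes back as 0, which is indistinguishable from a corrupted entry. That is why the second patch of the set has to special-case the 64k length.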
[PATCH 40/49] ext4: Add new functions for searching extent tree
From: Alex Tomas <[EMAIL PROTECTED]> Add the functions ext4_ext_search_left() and ext4_ext_search_right(), which are used by mballoc during ext4_ext_get_blocks to decided whether to merge extent information. Signed-off-by: Alex Tomas <[EMAIL PROTECTED]> Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]> Signed-off-by: Johann Lombardi <[EMAIL PROTECTED]> Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]> Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]> --- fs/ext4/extents.c | 142 +++ include/linux/ext4_fs_extents.h |4 + 2 files changed, 146 insertions(+), 0 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 03d1bbb..a60227c 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1017,6 +1017,148 @@ out: } /* + * search the closest allocated block to the left for *logical + * and returns it at @logical + it's physical address at @phys + * if *logical is the smallest allocated block, the function + * returns 0 at @phys + * return value contains 0 (success) or error code + */ +int +ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path, + ext4_lblk_t *logical, ext4_fsblk_t *phys) +{ + struct ext4_extent_idx *ix; + struct ext4_extent *ex; + int depth; + + BUG_ON(path == NULL); + depth = path->p_depth; + *phys = 0; + + if (depth == 0 && path->p_ext == NULL) + return 0; + + /* usually extent in the path covers blocks smaller +* then *logical, but it can be that extent is the +* first one in the file */ + + ex = path[depth].p_ext; + if (*logical < le32_to_cpu(ex->ee_block)) { + BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex); + while (--depth >= 0) { + ix = path[depth].p_idx; + BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr)); + } + return 0; + } + + BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len)); + + *logical = le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - 1; + *phys = ext_pblock(ex) + le16_to_cpu(ex->ee_len) - 1; + return 0; +} + +/* + * search the closest allocated block to the right for *logical + 
* and returns it at @logical + it's physical address at @phys + * if *logical is the smallest allocated block, the function + * returns 0 at @phys + * return value contains 0 (success) or error code + */ +int +ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path, + ext4_lblk_t *logical, ext4_fsblk_t *phys) +{ + struct buffer_head *bh = NULL; + struct ext4_extent_header *eh; + struct ext4_extent_idx *ix; + struct ext4_extent *ex; + ext4_fsblk_t block; + int depth; + + BUG_ON(path == NULL); + depth = path->p_depth; + *phys = 0; + + if (depth == 0 && path->p_ext == NULL) + return 0; + + /* usually extent in the path covers blocks smaller +* then *logical, but it can be that extent is the +* first one in the file */ + + ex = path[depth].p_ext; + if (*logical < le32_to_cpu(ex->ee_block)) { + BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex); + while (--depth >= 0) { + ix = path[depth].p_idx; + BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr)); + } + *logical = le32_to_cpu(ex->ee_block); + *phys = ext_pblock(ex); + return 0; + } + + BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len)); + + if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) { + /* next allocated block in this leaf */ + ex++; + *logical = le32_to_cpu(ex->ee_block); + *phys = ext_pblock(ex); + return 0; + } + + /* go up and search for index to the right */ + while (--depth >= 0) { + ix = path[depth].p_idx; + if (ix != EXT_LAST_INDEX(path[depth].p_hdr)) + break; + } + + if (depth < 0) { + /* we've gone up to the root and +* found no index to the right */ + return 0; + } + + /* we've found index to the right, let's +* follow it and find the closest allocated +* block to the right */ + ix++; + block = idx_pblock(ix); + while (++depth < path->p_depth) { + bh = sb_bread(inode->i_sb, block); + if (bh == NULL) + return -EIO; + eh = ext_block_hdr(bh); + if (ext4_ext_check_header(inode, eh, depth)) { + brelse(bh); + return -EIO; + } + ix = EXT_FIRST_INDEX(eh); + block = idx_pblock(ix); +
[PATCH 15/49] ext4: store maxbytes for bitmapped files and return EFBIG as appropriate
From: Eric Sandeen <[EMAIL PROTECTED]> Calculate & store the max offset for bitmapped files, and catch too-large seeks, truncates, and writes in ext4, shortening or rejecting as appropriate. Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]> --- fs/ext4/file.c | 19 ++- fs/ext4/inode.c| 16 +++- fs/ext4/super.c|1 + include/linux/ext4_fs_sb.h |1 + 4 files changed, 35 insertions(+), 2 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 1a81cd6..a6b2aa1 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -56,8 +56,25 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov, ssize_t ret; int err; - ret = generic_file_aio_write(iocb, iov, nr_segs, pos); + /* +* If we have encountered a bitmap-format file, the size limit +* is smaller than s_maxbytes, which is for extent-mapped files. +*/ + + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) { + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + size_t length = iov_length(iov, nr_segs); + if (pos > sbi->s_bitmap_maxbytes) + return -EFBIG; + + if (pos + length > sbi->s_bitmap_maxbytes) { + nr_segs = iov_shorten((struct iovec *)iov, nr_segs, + sbi->s_bitmap_maxbytes - pos); + } + } + + ret = generic_file_aio_write(iocb, iov, nr_segs, pos); /* * Skip flushing if there was an error, or if nothing was written. 
*/ diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9cf8572..eaace13 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -314,7 +314,10 @@ static int ext4_block_to_path(struct inode *inode, offsets[n++] = i_block & (ptrs - 1); final = ptrs; } else { - ext4_warning(inode->i_sb, "ext4_block_to_path", "block > big"); + ext4_warning(inode->i_sb, "ext4_block_to_path", + "block %u > max", + i_block + direct_blocks + + indirect_blocks + double_blocks); } if (boundary) *boundary = final - 1 - (i_block & (ptrs - 1)); @@ -3092,6 +3095,17 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) ext4_journal_stop(handle); } + if (attr->ia_valid & ATTR_SIZE) { + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) { + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + + if (attr->ia_size > sbi->s_bitmap_maxbytes) { + error = -EFBIG; + goto err_out; + } + } + } + if (S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) { handle_t *handle; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c79e46b..0931831 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1922,6 +1922,7 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent) } } + sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(sb->s_blocksize_bits); sb->s_maxbytes = ext4_max_size(sb->s_blocksize_bits); if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV) { diff --git a/include/linux/ext4_fs_sb.h b/include/linux/ext4_fs_sb.h index f15821c..38a47ec 100644 --- a/include/linux/ext4_fs_sb.h +++ b/include/linux/ext4_fs_sb.h @@ -38,6 +38,7 @@ struct ext4_sb_info { ext4_group_t s_groups_count;/* Number of groups in the fs */ unsigned long s_overhead_last; /* Last calculated overhead */ unsigned long s_blocks_last;/* Last seen block count */ + loff_t s_bitmap_maxbytes; /* max bytes for bitmap files */ struct buffer_head * s_sbh; /* Buffer containing the super block */ struct ext4_super_block * s_es; /* Pointer to the super block in the buffer */ struct buffer_head ** 
s_group_desc; -- 1.5.4.rc3.31.g1271-dirty
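The write-path check in this patch boils down to: reject a write that starts beyond the bitmap-file limit, and shorten one that would cross it. A userspace sketch of that decision follows. clamp_write_len() is a hypothetical helper, not a kernel function, and it returns the permitted byte count instead of trimming an iovec the way iov_shorten() does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide how much of a write of 'length' bytes at offset 'pos' may
 * proceed for a file limited to 'max_bytes'. Returns -EFBIG if the
 * write starts past the limit, otherwise the (possibly shortened)
 * number of bytes that fit.
 */
static int64_t clamp_write_len(int64_t pos, size_t length, int64_t max_bytes)
{
	assert(pos >= 0 && max_bytes >= 0);
	if (pos > max_bytes)
		return -EFBIG;
	if ((uint64_t)length > (uint64_t)(max_bytes - pos))
		return max_bytes - pos;
	return (int64_t)length;
}
```

Comparing 'length' against 'max_bytes - pos' rather than computing 'pos + length' avoids overflow in the sketch when the requested length is very large.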
ext4 merge plans for 2.6.25
The following patches have been in the -mm tree for a while, and I plan
to push them to Linus when the 2.6.25 merge window opens. With this
patch series, it is expected that the ext4 format should be settling
down. We still have delayed allocation and online defrag which aren't
quite ready to merge, but those shouldn't affect the on-disk format.

I don't expect any other on-disk format changes to show up after this
point, but I've been wrong before. Any such changes would have to have
a Really Good Reason, though. (No, Abhishek Rai's changes wouldn't
count as an on-disk change, since they change layout choices, but not
anything that e2fsck would actually care about. We may try merging
those into ext4 and see how they play out in the -mm tree; we'll see.)

						- Ted

P.S. Yes, the currently released e2fsprogs won't support all of these
format changes yet; again, ext4 shouldn't be deployed to production
systems yet, although we do salute those who are willing to be guinea
pigs and play with this code! Never fear, I'll be working to get
e2fsprogs caught up Real Soon Now.

Adrian Bunk (1):
      ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*)

Alex Tomas (2):
      ext4: Add new functions for searching extent tree
      ext4: Add multi block allocator for ext4

Aneesh Kumar K.V (23):
      ext4: Introduce ext4_lblk_t
      ext4: Introduce ext4_update_*_feature
      ext4: Fix sparse warnings.
      ext4: Rename i_file_acl to i_file_acl_lo
      ext4: Rename i_dir_acl to i_size_high
      ext4: Add support for 48 bit inode i_blocks.
      ext4: Support large files
      ext2: Fix the max file size for ext2 file system.
      ext3: Fix the max file size for ext3 file system.
      ext4: Return after ext4_error in case of failures
      ext4: Change the default behaviour on error
      Add buffer head related helper functions
      ext4: add block bitmap validation
      ext4: Check for the correct error return from
      ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.
      ext4: Convert truncate_mutex to read write semaphore.
      ext4: Take read lock during overwrite case.
      ext4: Add EXT4_IOC_MIGRATE ioctl
      ext4: Fix ext4_show_options to show the correct mount options.
      ext4: Add ext4_find_next_bit()
      ext4: Enable the multiblock allocator by default
      ext4: Check for return value from sb_set_blocksize
      ext4: Use the ext4_ext_actual_len() helper function

Avantika Mathur (2):
      ext4: add ext4_group_t, and change all group variables to this type.
      ext4: fixes block group number being set to a negative value

Chris Snook (1):
      jbd2: Remove printk from J_ASSERT to preserve registers during BUG

Coly Li (1):
      ext4: sync up block group descriptor with e2fsprogs.

Dmitry Monakhov (1):
      ext4: fix uniniatilized extent splitting error

Eric Sandeen (6):
      ext4 extents: remove unneeded casts
      ext4: different maxbytes functions for bitmap & extentfiles
      ext4: export iov_shorten from kernel for ext4's use
      ext4: store maxbytes for bitmapped files and return EFBIG as appropriate
      ext4: fix oops on corrupted ext4 mount
      ext4: fix up EXT4FS_DEBUG builds

Girish Shilamkar (1):
      ext4: Add the journal checksum feature

Jan Kara (2):
      ext4: Avoid rec_len overflow with 64KB block size
      jbd2: Fix assertion failure in fs/jbd2/checkpoint.c

Jean Noel Cordenner (2):
      vfs: Add 64 bit i_version support
      ext4: Add inode version support in ext4

Johann Lombardi (1):
      jbd2: jbd2 stats through procfs

Mariusz Kozlowski (1):
      ext4: remove unused code from ext4_find_entry()

Mingming Cao (4):
      jbd2: add lockdep support
      jbd2: Mark jbd2 slabs as SLAB_TEMPORARY
      jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup
      jbd2: sparse pointer use of zero as null

Takashi Sato (1):
      ext4: Support large blocksize up to PAGESIZE
[RFC/PATCH] dma: dma_{un}map_{single|sg}_attrs() interface
Here's a new interface for passing attributes to the dma mapping and
unmapping routines. (I have patches that make use of the interface as
well, but let's discuss this piece first.)

For ia64, new machvec entries replace the dma map/unmap interface, and
the old interface is implemented in terms of the new. (All
implementations other than ia64/sn2 ignore the new attributes.)

For architectures other than ia64, the new interface is implemented in
terms of the old (attributes are always ignored).

Tested on hpzx1 and ia64/sn2 (IA64_GENERIC kernels) and on x86_64.

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

--

 arch/ia64/hp/common/hwsw_iommu.c         | 60 +-
 arch/ia64/hp/common/sba_iommu.c          | 62 +++
 arch/ia64/sn/pci/pci_dma.c               | 71 +++
 include/asm-ia64/dma-mapping.h           | 28 ++--
 include/asm-ia64/machvec.h               | 52 +-
 include/asm-ia64/machvec_hpzx1.h         | 16 +++---
 include/asm-ia64/machvec_hpzx1_swiotlb.h | 16 +++---
 include/asm-ia64/machvec_sn2.h           | 16 +++---
 include/linux/dma-attrs.h                | 33 ++
 include/linux/dma-mapping.h              | 33 ++
 lib/swiotlb.c                            | 50 ++---
 11 files changed, 301 insertions(+), 136 deletions(-)

diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index 94e5710..8cedd6c 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.c
@@ -20,10 +20,10 @@ extern int swiotlb_late_init_with_default_size (size_t size);
 extern ia64_mv_dma_alloc_coherent	swiotlb_alloc_coherent;
 extern ia64_mv_dma_free_coherent	swiotlb_free_coherent;
-extern ia64_mv_dma_map_single		swiotlb_map_single;
-extern ia64_mv_dma_unmap_single	swiotlb_unmap_single;
-extern ia64_mv_dma_map_sg		swiotlb_map_sg;
-extern ia64_mv_dma_unmap_sg		swiotlb_unmap_sg;
+extern ia64_mv_dma_map_single_attrs	swiotlb_map_single_attrs;
+extern ia64_mv_dma_unmap_single_attrs	swiotlb_unmap_single_attrs;
+extern ia64_mv_dma_map_sg_attrs	swiotlb_map_sg_attrs;
+extern ia64_mv_dma_unmap_sg_attrs	swiotlb_unmap_sg_attrs;
 extern ia64_mv_dma_supported		swiotlb_dma_supported;
 extern ia64_mv_dma_mapping_error	swiotlb_dma_mapping_error;
@@ -31,19 +31,19 @@ extern ia64_mv_dma_mapping_error swiotlb_dma_mapping_error;
 extern ia64_mv_dma_alloc_coherent	sba_alloc_coherent;
 extern ia64_mv_dma_free_coherent	sba_free_coherent;
-extern ia64_mv_dma_map_single		sba_map_single;
-extern ia64_mv_dma_unmap_single	sba_unmap_single;
-extern ia64_mv_dma_map_sg		sba_map_sg;
-extern ia64_mv_dma_unmap_sg		sba_unmap_sg;
+extern ia64_mv_dma_map_single_attrs	sba_map_single_attrs;
+extern ia64_mv_dma_unmap_single_attrs	sba_unmap_single_attrs;
+extern ia64_mv_dma_map_sg_attrs	sba_map_sg_attrs;
+extern ia64_mv_dma_unmap_sg_attrs	sba_unmap_sg_attrs;
 extern ia64_mv_dma_supported		sba_dma_supported;
 extern ia64_mv_dma_mapping_error	sba_dma_mapping_error;
 
 #define hwiommu_alloc_coherent		sba_alloc_coherent
 #define hwiommu_free_coherent		sba_free_coherent
-#define hwiommu_map_single		sba_map_single
-#define hwiommu_unmap_single		sba_unmap_single
-#define hwiommu_map_sg			sba_map_sg
-#define hwiommu_unmap_sg		sba_unmap_sg
+#define hwiommu_map_single_attrs	sba_map_single_attrs
+#define hwiommu_unmap_single_attrs	sba_unmap_single_attrs
+#define hwiommu_map_sg_attrs		sba_map_sg_attrs
+#define hwiommu_unmap_sg_attrs		sba_unmap_sg_attrs
 #define hwiommu_dma_supported		sba_dma_supported
 #define hwiommu_dma_mapping_error	sba_dma_mapping_error
 #define hwiommu_sync_single_for_cpu	machvec_dma_sync_single
@@ -98,40 +98,44 @@ hwsw_free_coherent (struct device *dev, size_t size, void *vaddr, dma_addr_t dma
 }
 
 dma_addr_t
-hwsw_map_single (struct device *dev, void *addr, size_t size, int dir)
+hwsw_map_single_attrs (struct device *dev, void *addr, size_t size, int dir,
+		       struct dma_attrs *attrs)
 {
 	if (use_swiotlb(dev))
-		return swiotlb_map_single(dev, addr, size, dir);
+		return swiotlb_map_single_attrs(dev, addr, size, dir, attrs);
 	else
-		return hwiommu_map_single(dev, addr, size, dir);
+		return hwiommu_map_single_attrs(dev, addr, size, dir, attrs);
 }
 
 void
-hwsw_unmap_single (struct device *dev, dma_addr_t iova, size_t size, int dir)
+hwsw_unmap_single_attrs (struct device *dev, dma_addr_t iova, size_t size,
+			 int dir, struct dma_attrs *attrs)
 {
 	if (use_swiotlb(dev))
-		return swiotlb_unmap_single(dev, iova, size
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
2008/1/22, Linus Torvalds <[EMAIL PROTECTED]>:
>
>
> On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
> >
> >  /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > +	unsigned long addr;
> > +
> > +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > +		spinlock_t *ptl;
> > +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > +		pud_t *pud = pud_offset(pgd, addr);
> > +		pmd_t *pmd = pmd_offset(pud, addr);
> > +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>
> This is extremely expensive over bigger areas, especially sparsely mapped
> ones (it does all the lookups for all four levels over and over and over
> again for each page).
>
> I think Peter Zijlstra posted a version that uses the regular kind of
> nested loop (with inline functions to keep the thing nice and clean),
> which gets rid of that.

Thanks for your feedback, Linus!

I will use Peter Zijlstra's version of such an operation in my next
patch series.

>
> [ The sad/funny part is that this is all how we *used* to do msync(), back
>   in the days: we're literally going back to the "pre-cleanup" logic. See
>   commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup"
>   for details ]
>
> Quite frankly, I really think you might be better off just doing a
>
>	git revert 204ec841fbea3e5138168edbc3a76d46747cc987
>
> and working from there! I just checked, and it still reverts cleanly, and
> you'd end up with a nice code-base that (a) has gotten years of testing
> and (b) already has the looping-over-the-pagetables code.
>
>		Linus
>
WARNING: at kernel/mutex.c:134
While trying to trigger bug 9778, I pressed ctrl+alt+sysrq+w and the
following warnings appeared:

usb0: unregister 'cdc_ether' usb-:00:1d.7-1.3, CDC Ethernet Device
unregister_netdevice: waiting for usb0 to become free. Usage count = 1
SysRq : Show Blocked State
  task PC stack pid father
Sched Debug Version: v0.07, 2.6.24-rc8-mm1 #5
now at 2660467.699238 msecs
  .sysctl_sched_latency                    : 40.00
  .sysctl_sched_min_granularity            : 8.00
  .sysctl_sched_wakeup_granularity         : 20.00
  .sysctl_sched_batch_wakeup_granularity   : 20.00
  .sysctl_sched_child_runs_first           : 0.01
  .sysctl_sched_features                   : 39

cpu#0, 2793.192 MHz
  .nr_running                    : 2
  .load                          : 4096
  .nr_switches                   : 1627754
  .nr_load_updates               : 241563
  .nr_uninterruptible            : 4294967012
  .jiffies                       : 708140
  .next_balance                  : 0.708327
  .curr->pid                     : 0
  .clock                         : 2656959.965268
  .idle_clock                    : 0.00
  .prev_clock_raw                : 2674890.031768
  .clock_warps                   : 0
  .clock_overflows               : 5805
  .clock_underflows              : 127079
  .clock_deep_idle_events        : 2
  .clock_max_delta               : 671.628710
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 6
  .cpu_load[3]                   : 48
  .cpu_load[4]                   : 85

cfs_rq
  .exec_clock                    : 0.00
  .MIN_vruntime                  : 180379.753201
  .min_vruntime                  : 180379.753201
  .max_vruntime                  : 180379.753201
  .spread                        : 0.00
  .spread0                       : 0.00
  .nr_running                    : 1
  .load                          : 4096
  .nr_spread_over                : 0
[ cut here ]
WARNING: at kernel/mutex.c:134 mutex_lock_nested+0x277/0x290()
Modules linked in: cdc_ether usbnet snd_seq_dummy snd_seq_oss
 snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
 eeprom e100 psmouse snd_hda_intel snd_pcm intel_agp btusb snd_timer
 rtc_cmos bluetooth sg rtc_core 3c59x evdev agpgart snd serio_raw button
 thermal processor soundcore rtc_lib snd_page_alloc i2c_i801 pcspkr dcdbas
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #5
 [] ? printk+0x0/0x20
 [] warn_on_slowpath+0x54/0x80
 [] ? _spin_unlock_irqrestore+0x5e/0x70
 [] ? release_console_sem+0xd1/0xe0
 [] ? vprintk+0x308/0x320
 [] ? put_lock_stats+0x21/0x30
 [] ? lock_release_holdtime+0x60/0x80
 [] ? print_cfs_rq+0x117/0x500
 [] ? __lock_release+0x47/0x70
 [] ? print_cfs_rq+0x117/0x500
 [] ? printk+0x18/0x20
 [] mutex_lock_nested+0x277/0x290
 [] ? vprintk+0x308/0x320
 [] ? print_cfs_stats+0x30/0xb0
 [] print_cfs_stats+0x30/0xb0
 [] print_cpu+0x81c/0x830
 [] sched_debug_show+0x22a/0x430
 [] sysrq_sched_debug_show+0xc/0x10
 [] show_state_filter+0x86/0xb0
 [] sysrq_handle_showstate_blocked+0xd/0x10
 [] __handle_sysrq+0x89/0x120
 [] handle_sysrq+0x33/0x40
 [] kbd_keycode+0x39e/0x480
 [] ? __mod_timer+0xa0/0xb0
 [] kbd_event+0xea/0x100
 [] input_pass_event+0xec/0x100
 [] ? input_pass_event+0x0/0x100
 [] ? mod_timer+0x26/0x40
 [] input_handle_event+0xb2/0x2b0
 [] input_event+0x5f/0x80
 [] hidinput_hid_event+0xef/0x390
 [] ? hid_input_field+0x40/0x340
 [] hid_process_event+0x63/0x90
 [] hid_input_field+0x2c5/0x340
 [] hid_input_report+0x106/0x260
 [] ? put_lock_stats+0xd/0x30
 [] ? lock_release_holdtime+0x60/0x80
 [] hid_irq_in+0x181/0x190
 [] ? uhci_giveback_urb+0x8a/0x160
 [] usb_hcd_giveback_urb+0x41/0xa0
 [] uhci_giveback_urb+0x97/0x160
 [] uhci_scan_qh+0x70/0x1c0
 [] uhci_scan_schedule+0x8b/0x130
 [] uhci_irq+0xb5/0x150
 [] ? __lock_release+0x47/0x70
 [] usb_hcd_irq+0x24/0x60
 [] handle_IRQ_event+0x28/0x60
 [] handle_fasteoi_irq+0x6e/0xd0
 [] do_IRQ+0x3c/0x80
 [] ? tick_nohz_stop_sched_tick+0x25c/0x350
 [] common_interrupt+0x2e/0x34
 [] ? mwait_idle_with_hints+0x40/0x50
 [] ? mwait_idle+0x0/0x20
 [] mwait_idle+0x12/0x20
 [] cpu_idle+0x61/0x110
 [] rest_init+0x5d/0x60
 [] start_kernel+0x1fa/0x260
 [] ? unknown_bootoption+0x0/0x130
 ===
---[ end trace bc131943b9b4ac4c ]---
[PATCH] CRAMFS: Uncompressed files support
Hi,

This patch enables uncompressed file support in cramfs.

The term 'uncompressed file' comes from linear cramfs (aka Application
XIP). In linear cramfs, it is used to support XIP on NOR. However, it is
also helpful on OneNAND: it makes the filesystem faster by removing
compression overhead. In XIP mode it executes in place; in non-XIP mode
it copies the data to RAM and runs from there.

In my simple test, copying busybox (compressed vs. uncompressed), the
time dropped from 0.40s to 0.19s, a saving of about 50%.

Yes, it increases the file system size, but nowadays flash has big
capacity. It's a trade-off between size and performance.

Also, this patch uses the page cache directly. The previous
implementation used a local buffer. Why change that? The data is already
uncompressed and fits the page size, so the page is used directly to
avoid a useless memory copy.

It's compatible with existing linear cramfs images and with the original
format.

Any comments are welcome.

Thank you,
Kyungmin Park

Signed-off-by: Kyungmin Park <[EMAIL PROTECTED]>
---
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 3d194a2..edba28f 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -40,6 +40,7 @@ static DEFINE_MUTEX(read_mutex);
 #define CRAMINO(x)	(((x)->offset && (x)->size)?(x)->offset<<2:1)
 #define OFFSET(x)	((x)->i_ino)
+#define CRAMFS_INODE_IS_XIP(x)	((x)->i_mode & S_ISVTX)
 
 static int cramfs_iget5_test(struct inode *inode, void *opaque)
 {
@@ -143,8 +144,9 @@ static int next_buffer;
 /*
  * Returns a pointer to a buffer containing at least LEN bytes of
  * filesystem starting at byte offset OFFSET into the filesystem.
+ * If the @pg has the page, it returns the page buffer address
  */
-static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
+static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len, struct page **pg)
 {
 	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
 	struct page *pages[BLKS_PER_BUF];
@@ -174,6 +176,22 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i
 	devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT;
 
+	/*
+	 * Use page directly either
+	 * - uncompressed page or
+	 * - compressed page which has all required data
+	 */
+	if (pg && offset + len <= PAGE_CACHE_SIZE) {
+		struct page *page = NULL;
+		page = read_mapping_page(mapping, blocknr, NULL);
+		if (!IS_ERR(page)) {
+			*pg = page;
+			data = kmap(page);
+			data += offset;
+			return data;
+		}
+	}
+
 	/* Ok, read in BLKS_PER_BUF pages completely first. */
 	unread = 0;
 	for (i = 0; i < BLKS_PER_BUF; i++) {
@@ -253,14 +271,14 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
 		buffer_blocknr[i] = -1;
 
 	/* Read the first block and get the superblock from it */
-	memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
+	memcpy(&super, cramfs_read(sb, 0, sizeof(super), NULL), sizeof(super));
 	mutex_unlock(&read_mutex);
 
 	/* Do sanity checks on the superblock */
 	if (super.magic != CRAMFS_MAGIC) {
 		/* check at 512 byte offset */
 		mutex_lock(&read_mutex);
-		memcpy(&super, cramfs_read(sb, 512, sizeof(super)), sizeof(super));
+		memcpy(&super, cramfs_read(sb, 512, sizeof(super), NULL), sizeof(super));
 		mutex_unlock(&read_mutex);
 		if (super.magic != CRAMFS_MAGIC) {
 			if (!silent)
@@ -367,7 +385,7 @@ static int cramfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
 		int namelen, error;
 
 		mutex_lock(&read_mutex);
-		de = cramfs_read(sb, OFFSET(inode) + offset, sizeof(*de)+256);
+		de = cramfs_read(sb, OFFSET(inode) + offset, sizeof(*de)+256, NULL);
 		name = (char *)(de+1);
 
 		/*
@@ -417,7 +435,7 @@ static struct dentry * cramfs_lookup(struct inode *dir, struct dentry *dentry, s
 		char *name;
 		int namelen, retval;
 
-		de = cramfs_read(dir->i_sb, OFFSET(dir) + offset, sizeof(*de)+256);
+		de = cramfs_read(dir->i_sb, OFFSET(dir) + offset, sizeof(*de)+256, NULL);
 		name = (char *)(de+1);
 
 		/* Try to take advantage of sorted directories */
@@ -463,21 +481,44 @@ static struct dentry * cramfs_lookup(struct inode *dir, struct dentry *dentry, s
 static int cramfs_readpage(struct file *file, struct page * page)
 {
 	struct inode *inode = page->mapping->host;
+	struct super_block *sb = inode->i_sb;
 	u32 maxblock, bytes_filled;
+	struct page *pg = NULL;
 	void *pgdata;
 
 	maxblock = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
Re: [PATCH] mmu notifiers #v3
On Mon, 21 Jan 2008 13:52:04 +0100
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

Reviewed-by: Rik van Riel <[EMAIL PROTECTED]>

--
All rights reversed.
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > > Some very pedantic nitpicking below;
> > > >
> > > > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > ...
> > > > > +	if (file && (vma->vm_flags & VM_SHARED)) {
> > > > > +		if (flags & MS_ASYNC)
> > > > > +			vma_wrprotect(vma);
> > > > > +		if (flags & MS_SYNC) {
> > > >
> > > > "else if" ??
> > >
> > > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > > did not use the "else-if" here. Moreover, this function itself checks
> > > that they never come together.
> > >
> >
> > I would say that them being mutually exclusive would be a reason *for*
> > using "else-if" here.
>
> This check is performed by the sys_msync() function itself in its very
> beginning.
>
> We don't need to check it later.
>

Sure, it's just that, to me, using 'else-if' makes it explicit that
the two are mutually exclusive. Using "if (...), if (...)" doesn't.
Maybe it's just me, but I feel that 'else-if' here better shows the
intention... No big deal.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
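
[ Editor's illustration, not part of the thread. A minimal userspace
sketch of the two points made above, assuming the up-front check behaves
as Anton describes: the combination of MS_ASYNC and MS_SYNC is rejected
once at entry, and "else if" then documents the exclusivity where the
flags are used. The names check_msync_flags(), pick_action() and
enum sync_action are hypothetical and do not appear in the patch. ]

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical outcome labels, used only for this illustration. */
enum sync_action { SYNC_NONE, SYNC_WRPROTECT, SYNC_WRITEBACK };

/* Reject the invalid combination once, before any work is done. */
static int check_msync_flags(int flags)
{
	if ((flags & MS_ASYNC) && (flags & MS_SYNC))
		return -EINVAL;
	return 0;
}

/*
 * The caller is expected to have validated the flags already; the
 * "else if" makes the mutual exclusion visible at the point of use.
 */
static enum sync_action pick_action(int flags)
{
	enum sync_action action = SYNC_NONE;

	assert(check_msync_flags(flags) == 0);

	if (flags & MS_ASYNC)
		action = SYNC_WRPROTECT;
	else if (flags & MS_SYNC)
		action = SYNC_WRITEBACK;

	return action;
}
```

Both spellings behave the same once the flags have been validated; the
difference is only in what the code tells the reader.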
Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files
On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
>
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> +	unsigned long addr;
> +
> +	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> +		spinlock_t *ptl;
> +		pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> +		pud_t *pud = pud_offset(pgd, addr);
> +		pmd_t *pmd = pmd_offset(pud, addr);
> +		pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This is extremely expensive over bigger areas, especially sparsely mapped
ones (it does all the lookups for all four levels over and over and over
again for each page).

I think Peter Zijlstra posted a version that uses the regular kind of
nested loop (with inline functions to keep the thing nice and clean),
which gets rid of that.

[ The sad/funny part is that this is all how we *used* to do msync(), back
  in the days: we're literally going back to the "pre-cleanup" logic. See
  commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup"
  for details ]

Quite frankly, I really think you might be better off just doing a

	git revert 204ec841fbea3e5138168edbc3a76d46747cc987

and working from there! I just checked, and it still reverts cleanly, and
you'd end up with a nice code-base that (a) has gotten years of testing
and (b) already has the looping-over-the-pagetables code.

		Linus
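
[ Editor's illustration, not part of the thread and not Peter Zijlstra's
code. A toy userspace model of the cost difference described above,
assuming a two-level table instead of the kernel's four levels. Every
name here (struct top, struct leaf, wrprotect_flat, wrprotect_nested,
top_lookups) is hypothetical. The flat walk repeats the upper-level
lookup for every page; the nested walk does it once per leaf table and
skips unmapped ranges in a single step. ]

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_BITS 4			/* 16 entries per leaf table */
#define TOP_BITS  4			/* 16 leaf tables per top table */
#define LEAF_SIZE (1u << LEAF_BITS)
#define TOP_SIZE  (1u << TOP_BITS)

/* A NULL slot means nothing is mapped in that range. */
struct leaf { unsigned char writable[LEAF_SIZE]; };
struct top  { struct leaf *slot[TOP_SIZE]; };

static unsigned long top_lookups;	/* counts upper-level lookups */

static struct leaf *top_lookup(struct top *t, unsigned idx)
{
	assert(idx < TOP_SIZE);
	top_lookups++;
	return t->slot[idx];
}

/* Per-page walk: restarts from the top for every single index. */
static unsigned wrprotect_flat(struct top *t, unsigned start, unsigned end)
{
	unsigned changed = 0;

	for (unsigned i = start; i < end; i++) {
		struct leaf *l = top_lookup(t, i >> LEAF_BITS);

		if (l && l->writable[i & (LEAF_SIZE - 1)]) {
			l->writable[i & (LEAF_SIZE - 1)] = 0;
			changed++;
		}
	}
	return changed;
}

/* Inner loop of the nested walk: clears entries lo..hi-1 of one leaf. */
static unsigned wrprotect_leaf(struct leaf *l, unsigned lo, unsigned hi)
{
	unsigned changed = 0;

	assert(lo <= hi && hi <= LEAF_SIZE);
	for (unsigned i = lo; i < hi; i++) {
		if (l->writable[i]) {
			l->writable[i] = 0;
			changed++;
		}
	}
	return changed;
}

/* Nested walk: one upper-level lookup per leaf table. */
static unsigned wrprotect_nested(struct top *t, unsigned start, unsigned end)
{
	unsigned changed = 0;
	unsigned i = start;

	while (i < end) {
		/* First index that belongs to the next leaf table. */
		unsigned next = (i | (LEAF_SIZE - 1)) + 1;
		struct leaf *l;

		if (next > end)
			next = end;
		l = top_lookup(t, i >> LEAF_BITS);
		if (l)
			changed += wrprotect_leaf(l, i & (LEAF_SIZE - 1),
					((next - 1) & (LEAF_SIZE - 1)) + 1);
		i = next;
	}
	return changed;
}
```

In the kernel the same shape would have one loop per level (pgd, pud,
pmd, pte), so the saving compounds with each level that is not
re-traversed.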