Re: [PATCH 6/10] ACPI: register ACPI Video LCD as generic thermal cooling device

2008-01-21 Thread Zhang Rui
Hi, Matthew,

On Fri, 2008-01-18 at 09:42 +0800, Matthew Garrett wrote:
> On Fri, Jan 18, 2008 at 09:31:40AM +0800, Zhang Rui wrote:
> 
> > Just like I don't think lcd should be used for ACPI thermal management
> > before I saw it is listed in _TZD and intel_menlow requires to throttle
> > it when overheating, why not let the individual drivers implement the
> > callbacks if there is clearly a request to do this.
> > And we can add this to the generic acpi_device struct then if this is a
> > common feature for all ACPI devices.
> 
> It'll probably never be common for all ACPI devices,
I agree.
>  but it's already
> required for three types. I think that's a strong argument for making
> it generic.
I don't think it's worth doing this, as it's only a common feature for
three ACPI devices.
> > Well, you're right.
> > But in order to throttle the lcd, this is reasonable, right?
> 
> Moving the common code into its own routine and then calling that from
> each of the others would probably work.
Yes.
I can send a follow-up patch on top if the patch "Rationalise ACPI backlight
implementation" is applied.

Thanks,
Rui

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/6] spi : use class iteration api

2008-01-21 Thread David Brownell
On Monday 21 January 2008, Dave Young wrote:
> Convert to use the class iteration api.
> 
> Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

Ack.


> ---
>  drivers/spi/spi.c |   24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff -upr linux/drivers/spi/spi.c linux.new/drivers/spi/spi.c
> --- linux/drivers/spi/spi.c   2008-01-22 15:09:49.0 +0800
> +++ linux.new/drivers/spi/spi.c   2008-01-22 15:09:49.0 +0800
> @@ -485,6 +485,15 @@ void spi_unregister_master(struct spi_ma
>  }
>  EXPORT_SYMBOL_GPL(spi_unregister_master);
>  
> +static int __spi_master_match(struct device *dev, void *data)
> +{
> + struct spi_master *m;
> + u16 *bus_num = data;
> +
> + m = container_of(dev, struct spi_master, dev);
> + return m->bus_num == *bus_num;
> +}
> +
>  /**
>   * spi_busnum_to_master - look up master associated with bus_num
>   * @bus_num: the master's bus number
> @@ -499,17 +508,12 @@ struct spi_master *spi_busnum_to_master(
>  {
>   struct device   *dev;
>   struct spi_master   *master = NULL;
> - struct spi_master   *m;
>  
> - down(&spi_master_class.sem);
> - list_for_each_entry(dev, &spi_master_class.children, node) {
> - m = container_of(dev, struct spi_master, dev);
> - if (m->bus_num == bus_num) {
> - master = spi_master_get(m);
> - break;
> - }
> - }
> - up(&spi_master_class.sem);
> + dev = class_find_device(&spi_master_class, &bus_num,
> + __spi_master_match);
> + if (dev)
> + master = container_of(dev, struct spi_master, dev);
> + /* reference got in class_find_device */
>   return master;
>  }
>  EXPORT_SYMBOL_GPL(spi_busnum_to_master);
> 




Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-21 Thread Jarek Poplawski
On 22-01-2008 01:55, Dave Young wrote:
...
> Hi, thanks for your effort. Now I think we should stop this thread and
> wait for class_device to go away :)

Sure! But, if you change your mind I'm interested in this subject.

Thanks,
Jarek P.


Re: [PATCH 1/6] driver-core : add class iteration api

2008-01-21 Thread Dave Young
On Mon, Jan 21, 2008 at 10:24:17PM -0800, David Brownell wrote:
> On Monday 21 January 2008, Dave Young wrote:
> >  
> > +/**
> > + * class_for_each_device - device iterator
> > + * @class: the class we're iterating
> > + * @data: data for the callback
> > + * @fn: function to be called for each device
> > + *
> > + * Iterate over @class's list of devices, and call @fn for each,
> > + * passing it @data.
> > + *
> > + * We check the return of @fn each time. If it returns anything
> > + * other than 0, we break out and return that value.
> 
> I have a suggestion for better documentation, which
> applies to all these utilities:
> 
> 
> > + */
> > +int class_for_each_device(struct class *class, void *data,
> > +  int (*fn)(struct device *, void *))
> > +{
> > +   struct device *dev;
> > +   int error = 0;
> > +
> > +   if (!class)
> > +   return -EINVAL;
> > +   down(&class->sem);
> > +   list_for_each_entry(dev, &class->devices, node) {
> > +   dev = get_device(dev);
> > +   if (dev) {
> > +   error = fn(dev, data);
> 
> This is called with class->sem held.  So fn() has a
> constraint to not re-acquire that ... else it'd be
> self-deadlocking.  I'd like to see docs at least
> mention that; calls to add or remove class members
> would be verboten, for example, which isn't an issue
> with most other driver model iterators.
> 
> 
> > +   put_device(dev);
> > +   } else
> > +   error = -ENODEV;
> > +   if (error)
> > +   break;
> > +   }
> > +   up(&class->sem);
> > +
> > +   return error;
> > +}
> > +EXPORT_SYMBOL_GPL(class_for_each_device);

Update kerneldoc per David Brownell's suggestion.
Is it right for me to add Cornelia Huck's ack after this change?
---

Add the following class iteration functions for driver use:
class_for_each_device
class_find_device
class_for_each_child
class_find_child

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 
Acked-by: Cornelia Huck <[EMAIL PROTECTED]>

---
 drivers/base/class.c   |  175 +
 include/linux/device.h |   11 ++-
 2 files changed, 184 insertions(+), 2 deletions(-)

diff -upr linux/drivers/base/class.c linux.new/drivers/base/class.c
--- linux/drivers/base/class.c  2008-01-22 15:06:55.0 +0800
+++ linux.new/drivers/base/class.c  2008-01-22 15:06:55.0 +0800
@@ -798,6 +798,181 @@ void class_device_put(struct class_devic
kobject_put(&class_dev->kobj);
 }
 
+/**
+ * class_for_each_device - device iterator
+ * @class: the class we're iterating
+ * @data: data for the callback
+ * @fn: function to be called for each device
+ *
+ * Iterate over @class's list of devices, and call @fn for each,
+ * passing it @data.
+ *
+ * We check the return of @fn each time. If it returns anything
+ * other than 0, we break out and return that value.
+ *
+ * Note, we hold class->sem in this function, so it cannot be
+ * re-acquired in @fn; otherwise it will self-deadlock. For
+ * example, calls to add or remove class members would be verboten.
+ */
+int class_for_each_device(struct class *class, void *data,
+  int (*fn)(struct device *, void *))
+{
+   struct device *dev;
+   int error = 0;
+
+   if (!class)
+   return -EINVAL;
+   down(&class->sem);
+   list_for_each_entry(dev, &class->devices, node) {
+   dev = get_device(dev);
+   if (dev) {
+   error = fn(dev, data);
+   put_device(dev);
+   } else
+   error = -ENODEV;
+   if (error)
+   break;
+   }
+   up(&class->sem);
+
+   return error;
+}
+EXPORT_SYMBOL_GPL(class_for_each_device);
+
+/**
+ * class_find_device - device iterator for locating a particular device
+ * @class: the class we're iterating
+ * @data: data for the match function
+ * @match: function to check device
+ *
+ * This is similar to the class_for_each_device() function above, but it
+ * returns a reference to a device that is 'found' for later use, as
+ * determined by the @match callback.
+ *
+ * The callback should return 0 if the device doesn't match and non-zero
+ * if it does.  If the callback returns non-zero, this function will
+ * return to the caller and not iterate over any more devices.
+ *
+ * Note, you will need to drop the reference with put_device() after use.
+ *
+ * We hold class->sem in this function, so it cannot be
+ * re-acquired in @match; otherwise it will self-deadlock. For
+ * example, calls to add or remove class members would be verboten.
+ */
+struct device *class_find_device(struct class *class, void *data,
+  int (*match)(struct device *, void *))
+{
+   struct device *dev;
+   int found = 0;
+
+   if (!class)
+   return NULL;
+
+   down(&class->sem);
+   list_for_each_entry(dev, &c

Re: [PATCH 0/6] RFC: Typesafe callbacks

2008-01-21 Thread Rusty Russell
On Tuesday 22 January 2008 10:57:03 Linus Torvalds wrote:
> On Tue, 22 Jan 2008, Rusty Russell wrote:
> > Attempt to create callbacks which take unsigned long as well as
> > correct pointer types.
>
> I bow down before you.
>
> I thought I had done some rather horrible things with gcc built-ins and
> macros, but I hereby hand over my crown to you.
>
> As my daughter would say: that patch fell out of the ugly tree, and hit
> every branch on the way down. Very impressive.
>
> All hail Rusty, undisputed ruler of Ugly-land.

Err, thanks.  I read some old SCSI drivers and felt inspired...

> Side note: can you verify that __builtin_choose_expr() exists in gcc-3? I
> don't think we've relied on it before except on arm, and that one has
> always had its own compiler version dependencies..

Hmm, looks like not in 3.0.4, is in 3.1.1.  I'll make it appropriately  
#ifdef'ed (which as a bonus will make things that little bit uglier still...)

If we can stomach it the effect is nice, but the version which simply 
allows pointer correctness (rather than trying to do unsigned long too) is 
less bletcherous.

Thanks,
Rusty.
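
[Archive note] The compile-time selection discussed above can be sketched
as follows. This is only a minimal illustration, not the code from the RFC
patch: the names HAVE_CHOOSE_EXPR, arg_kind, kind_of_ulong_arg and
kind_of_ptr_arg are made up for this note, and the 3.1 version threshold
simply follows the report in the message above. It shows
__builtin_choose_expr() picking one of two expressions depending on
whether an argument has type unsigned long, with an #ifdef-style fallback
for compilers that lack the builtin.

```c
/* Assumption: the builtins are usable from GCC 3.1 on, per the thread. */
#if defined(__GNUC__) && \
	(__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
#define HAVE_CHOOSE_EXPR 1
#else
#define HAVE_CHOOSE_EXPR 0
#endif

#if HAVE_CHOOSE_EXPR
/*
 * Evaluates to one of two string literals, chosen at compile time from
 * the static type of x. The argument x itself is never evaluated.
 */
#define arg_kind(x)							\
	__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(x), unsigned long), \
		"unsigned long", "other")
#else
/* Fallback for compilers without the builtin: no type information. */
#define arg_kind(x) "unknown"
#endif

/* Hypothetical helpers that make the selection observable at run time. */
const char *kind_of_ulong_arg(unsigned long data)
{
	return arg_kind(data);
}

const char *kind_of_ptr_arg(int *data)
{
	return arg_kind(data);
}
```

With a compiler that has the builtin, kind_of_ulong_arg() yields the
"unsigned long" branch and kind_of_ptr_arg() the "other" branch; the
choice costs nothing at run time because the unselected expression is
discarded during compilation.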


Re: [PATCH 6/6] spi : use class iteration api

2008-01-21 Thread Dave Young
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
 drivers/spi/spi.c |   24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff -upr linux/drivers/spi/spi.c linux.new/drivers/spi/spi.c
--- linux/drivers/spi/spi.c 2008-01-22 15:09:49.0 +0800
+++ linux.new/drivers/spi/spi.c 2008-01-22 15:09:49.0 +0800
@@ -485,6 +485,15 @@ void spi_unregister_master(struct spi_ma
 }
 EXPORT_SYMBOL_GPL(spi_unregister_master);
 
+static int __spi_master_match(struct device *dev, void *data)
+{
+   struct spi_master *m;
+   u16 *bus_num = data;
+
+   m = container_of(dev, struct spi_master, dev);
+   return m->bus_num == *bus_num;
+}
+
 /**
  * spi_busnum_to_master - look up master associated with bus_num
  * @bus_num: the master's bus number
@@ -499,17 +508,12 @@ struct spi_master *spi_busnum_to_master(
 {
struct device   *dev;
struct spi_master   *master = NULL;
-   struct spi_master   *m;
 
-   down(&spi_master_class.sem);
-   list_for_each_entry(dev, &spi_master_class.children, node) {
-   m = container_of(dev, struct spi_master, dev);
-   if (m->bus_num == bus_num) {
-   master = spi_master_get(m);
-   break;
-   }
-   }
-   up(&spi_master_class.sem);
+   dev = class_find_device(&spi_master_class, &bus_num,
+   __spi_master_match);
+   if (dev)
+   master = container_of(dev, struct spi_master, dev);
+   /* reference got in class_find_device */
return master;
 }
 EXPORT_SYMBOL_GPL(spi_busnum_to_master);
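
[Archive note] The "reference got in class_find_device" comment in the
patch above relies on the lookup helper returning a counted reference that
the caller must later drop. A minimal userspace sketch of that contract
follows. Everything in it (struct node, node_get, node_put, list_find,
match_bus_num) is a hypothetical stand-in rather than kernel API, and the
plain integer count and missing locking are simplifications.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device on a class list. */
struct node {
	int refcount;		/* plain int: this sketch is single-threaded */
	unsigned short bus_num;
	struct node *next;
};

/* Take a reference; returns the node for call chaining. */
struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Drop a reference taken with node_get() or list_find(). */
void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/*
 * Walk the list and return the first node for which match() returns
 * non-zero, with an extra reference held. Returns NULL if nothing matches.
 */
struct node *list_find(struct node *head, void *data,
		       int (*match)(struct node *, void *))
{
	struct node *n;

	for (n = head; n; n = n->next)
		if (match(n, data))
			return node_get(n);
	return NULL;
}

/* Match callback in the style of __spi_master_match(). */
int match_bus_num(struct node *n, void *data)
{
	unsigned short *bus_num = data;

	return n->bus_num == *bus_num;
}
```

A caller looks a node up with list_find(head, &num, match_bus_num), uses
it, and then calls node_put() exactly once, which mirrors the
put_device() requirement in the class_find_device() kerneldoc.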


Re: [RFC] Parallelize IO for e2fsck

2008-01-21 Thread Andreas Dilger
On Jan 22, 2008  14:38 +1100, David Chinner wrote:
> On Mon, Jan 21, 2008 at 04:00:41PM -0700, Andreas Dilger wrote:
> > I discussed this with Ted at one point also.  This is a generic problem,
> > not just for readahead, because "fsck" can run multiple e2fsck in parallel
> > and in case of many large filesystems on a single node this can cause
> > memory usage problems also.
> > 
> > What I was proposing is that "fsck.{fstype}" be modified to return an
> > estimated minimum amount of memory needed, and some "desired" amount of
> > memory (i.e. readahead) to fsck the filesystem, using some parameter like
> > "fsck.{fstype} --report-memory-needed /dev/XXX".  If this does not
> > return the output in the expected format, or returns an error then fsck
> > will assume some amount of memory based on the device size and continue
> > as it does today.
> 
> And while fsck is running, some other program runs that uses
> memory and blows your carefully calculated parameters to smithereens?

Well, fsck has a rather restricted working environment, because it is
run before most other processes start (i.e. single-user mode).  For fsck
initiated by an admin in other runlevels the admin would need to specify
the upper limit of memory usage.  My proposal was only for the single-user
fsck at boot time.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Re: [RFC] Parallelize IO for e2fsck

2008-01-21 Thread Andreas Dilger
On Jan 21, 2008  23:17 -0500, [EMAIL PROTECTED] wrote:
> On Tue, 22 Jan 2008 14:38:30 +1100, David Chinner said:
> > Perhaps instead of swapping immediately, a SIGLOWMEM could be sent
> > to processes that aren't masking the signal, followed by a short
> > grace period to allow the processes to free up some memory before
> > swapping out pages from that process?
> 
> AIX had SIGDANGER some 15 years ago.  Admittedly, that was sent when
> the system was about to hit OOM, not when it was about to start swapping.

I'd tried to advocate SIGDANGER some years ago as well, but none of
the kernel maintainers were interested.  It definitely makes sense
to have some sort of mechanism like this.  At the time I first brought
it up it was in conjunction with Netscape using too much cache on some
system, but it would be just as useful for all kinds of other memory-
hungry applications.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
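
[Archive note] The mechanism discussed above (the system signals
memory-hungry applications, which then release cache voluntarily) can be
sketched from the application side. Linux has no SIGDANGER or SIGLOWMEM,
so this sketch assumes the caller picks a stand-in such as SIGUSR1. The
names struct cache, on_low_memory, install_low_memory_handler, cache_fill
and cache_shrink_if_asked are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for sigaction() under strict -std= modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Set from the signal handler, consumed by the main program. */
static volatile sig_atomic_t low_memory;

/* Hypothetical application cache that can be dropped on request. */
struct cache {
	void *buf;
	size_t size;
};

/* Async-signal-safe handler: only records that the signal arrived. */
void on_low_memory(int signo)
{
	(void)signo;
	low_memory = 1;
}

/* Install the handler for the chosen stand-in signal; 0 on success. */
int install_low_memory_handler(int signo)
{
	struct sigaction sa;

	sa.sa_handler = on_low_memory;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	return sigaction(signo, &sa, NULL);
}

/* Allocate the cache; 0 on success, -1 on allocation failure. */
int cache_fill(struct cache *c, size_t size)
{
	c->buf = malloc(size);
	c->size = c->buf ? size : 0;
	return c->buf ? 0 : -1;
}

/*
 * Called from the application's main loop. Frees the cache if a
 * low-memory signal has arrived since the last call and returns the
 * number of bytes released (0 if nothing was requested).
 */
size_t cache_shrink_if_asked(struct cache *c)
{
	size_t freed = 0;

	if (low_memory) {
		low_memory = 0;
		free(c->buf);
		c->buf = NULL;
		freed = c->size;
		c->size = 0;
	}
	return freed;
}
```

The handler does no freeing itself because free() is not
async-signal-safe; the real work is deferred to the main loop, which is
one reason a grace period between the signal and swapping would be needed.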



Re: [PATCH 6/6] spi : use class iteration api

2008-01-21 Thread Dave Young
On Jan 22, 2008 2:56 PM, David Brownell <[EMAIL PROTECTED]> wrote:
> On Monday 21 January 2008, Dave Young wrote:
> > +static int __spi_master_match(struct device *dev, void *data)
> > +{
> > +struct spi_master *m;
> > +u16 *bus_num = (u16 *)data;
>
> That's "void *data" so "u16 *bus_num = data" is preferred.
>
Fine.


Re: [PATCH 6/6] spi : use class iteration api

2008-01-21 Thread David Brownell
On Monday 21 January 2008, Dave Young wrote:
> +static int __spi_master_match(struct device *dev, void *data)
> +{
> +   struct spi_master *m;
> +   u16 *bus_num = (u16 *)data;

That's "void *data" so "u16 *bus_num = data" is preferred.
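
[Archive note] The review comment rests on a C rule: a void pointer
converts implicitly to any object pointer type, so the cast adds nothing
(C++ is different and does require it). A minimal sketch, with the
hypothetical helper bus_matches standing in for the match callback:

```c
#include <stdint.h>

/*
 * Compare a bus number against the value that an opaque callback
 * argument points to. No cast is needed on the assignment below.
 */
int bus_matches(uint16_t have, void *data)
{
	uint16_t *bus_num = data;	/* implicit void * conversion */

	return have == *bus_num;
}
```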



Re: [PATCH] bluetooth : move children of connection device to NULL before connection down

2008-01-21 Thread Marcel Holtmann
Hi Dave,

> > Add people missed in cc-list.
> 
> Thanks Dave for your continued efforts on Bluetooth bugs like this.
> 
> Marcel, are you going to review/ACK/integrate/push-upstream/whatever
> any of these Bluetooth patches?
> 
> It hasn't been getting much love from you as of late, you are one of
> the listed maintainers, and I don't want to lose any of Dave's
> valuable bug fixing work.

I will be fully back in business next week. Just got stuck in a project
that needed 200% of my time to get it going.

> Or should I just handle it all directly?

I followed the list only a little bit, but from what I have seen, Dave is
doing a great job of tracking all issues down to the real cause.

I had a look at his last patch and after review, I agree that this is a
possible solution. I only have two nitpicks about the coding style. So
in del_conn the struct device declaration should be made after the
struct hci_conn assignment from the container and I would put an extra
empty line before the device_del, put_device block. Nitpicks only.

Right now I can't think of any side effects by this patch. Actually I
only see an improvement with this patch. So please take it directly and
starting with next week, I gonna make sure that they are handled again
properly by me.

Regards

Marcel




Re: [PATCH 1/6] driver-core : add class iteration api

2008-01-21 Thread Dave Young
On Jan 22, 2008 2:24 PM, David Brownell <[EMAIL PROTECTED]> wrote:
> On Monday 21 January 2008, Dave Young wrote:
> >
> > +/**
> > + *   class_for_each_device - device iterator
> > + *   @class: the class we're iterating
> > + *   @data: data for the callback
> > + *   @fn: function to be called for each device
> > + *
> > + *   Iterate over @class's list of devices, and call @fn for each,
> > + *   passing it @data.
> > + *
> > + *   We check the return of @fn each time. If it returns anything
> > + *   other than 0, we break out and return that value.
>
> I have a suggestion for better documentation, which
> applies to all these utilities:
>
>
> > + */
> > +int class_for_each_device(struct class *class, void *data,
> > +int (*fn)(struct device *, void *))
> > +{
> > + struct device *dev;
> > + int error = 0;
> > +
> > + if (!class)
> > + return -EINVAL;
> > + down(&class->sem);
> > + list_for_each_entry(dev, &class->devices, node) {
> > + dev = get_device(dev);
> > + if (dev) {
> > + error = fn(dev, data);
>
> This is called with class->sem held.  So fn() has a
> constraint to not re-acquire that ... else it'd be
> self-deadlocking.  I'd like to see docs at least
> mention that; calls to add or remove class members
> would be verboten, for example, which isn't an issue
> with most other driver model iterators.

Very good comment, thanks David.  I will send an update shortly.

>
>
>
> > + put_device(dev);
> > + } else
> > + error = -ENODEV;
> > + if (error)
> > + break;
> > + }
> > + up(&class->sem);
> > +
> > + return error;
> > +}
> > +EXPORT_SYMBOL_GPL(class_for_each_device);
>


Re: [PATCH] bluetooth : move children of connection device to NULL before connection down

2008-01-21 Thread David Miller
From: Marcel Holtmann <[EMAIL PROTECTED]>
Date: Tue, 22 Jan 2008 07:18:16 +0100

> Right now I can't think of any side effects by this patch. Actually I
> only see an improvement with this patch. So please take it directly and
> starting with next week, I gonna make sure that they are handled again
> properly by me.

Excellent, I'll do that.

Thanks for the feedback.


Re: [PATCH 1/6] driver-core : add class iteration api

2008-01-21 Thread David Brownell
On Monday 21 January 2008, Dave Young wrote:
>  
> +/**
> + *   class_for_each_device - device iterator
> + *   @class: the class we're iterating
> + *   @data: data for the callback
> + *   @fn: function to be called for each device
> + *
> + *   Iterate over @class's list of devices, and call @fn for each,
> + *   passing it @data.
> + *
> + *   We check the return of @fn each time. If it returns anything
> + *   other than 0, we break out and return that value.

I have a suggestion for better documentation, which
applies to all these utilities:


> + */
> +int class_for_each_device(struct class *class, void *data,
> +int (*fn)(struct device *, void *))
> +{
> + struct device *dev;
> + int error = 0;
> +
> + if (!class)
> + return -EINVAL;
> + down(&class->sem);
> + list_for_each_entry(dev, &class->devices, node) {
> + dev = get_device(dev);
> + if (dev) {
> + error = fn(dev, data);

This is called with class->sem held.  So fn() has a
constraint to not re-acquire that ... else it'd be
self-deadlocking.  I'd like to see docs at least
mention that; calls to add or remove class members
would be verboten, for example, which isn't an issue
with most other driver model iterators.


> + put_device(dev);
> + } else
> + error = -ENODEV;
> + if (error)
> + break;
> + }
> + up(&class->sem);
> +
> + return error;
> +}
> +EXPORT_SYMBOL_GPL(class_for_each_device);
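
[Archive note] The constraint described above (the callback runs with the
iterator's lock held, so it must not take that lock again) can be
demonstrated in userspace. This sketch is not the kernel code: struct
registry, registry_for_each and the callbacks are hypothetical, and a
POSIX error-checking mutex stands in for class->sem so that a second lock
attempt reports EDEADLK instead of hanging the way a kernel semaphore
would.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct item {
	int id;
	struct item *next;
};

struct registry {
	pthread_mutex_t lock;	/* plays the role of class->sem */
	struct item *head;
};

/* Set up an empty registry guarded by an error-checking mutex. */
int registry_init(struct registry *r)
{
	pthread_mutexattr_t attr;
	int err;

	r->head = NULL;
	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&r->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Unlocked on purpose: the sketch assumes setup precedes any iteration. */
void registry_add(struct registry *r, struct item *it)
{
	it->next = r->head;
	r->head = it;
}

/* Call fn for each item with the lock held; stop at the first non-zero. */
int registry_for_each(struct registry *r, void *data,
		      int (*fn)(struct item *, void *))
{
	struct item *it;
	int error = 0;

	pthread_mutex_lock(&r->lock);
	for (it = r->head; it; it = it->next) {
		error = fn(it, data);
		if (error)
			break;
	}
	pthread_mutex_unlock(&r->lock);
	return error;
}

/* Well-behaved callback: counts items and never touches the lock. */
int count_cb(struct item *it, void *data)
{
	(void)it;
	++*(int *)data;
	return 0;
}

/* Misbehaving callback: tries to take the lock the iterator holds. */
int relock_cb(struct item *it, void *data)
{
	struct registry *r = data;
	int err = pthread_mutex_lock(&r->lock);

	(void)it;
	if (err == 0)
		pthread_mutex_unlock(&r->lock);
	return err;	/* EDEADLK with an error-checking mutex */
}
```

Running count_cb over two items returns 0 and counts both, while
relock_cb makes the iteration stop with EDEADLK on the first item. Adding
or removing members from inside the callback would need the same lock,
which is why the kerneldoc ought to rule it out.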


[PATCH] pci-skeleton: Misc fixes to build neatly

2008-01-21 Thread Jike Song
Hello Jeff,

pci-skeleton.c has several compilation problems, such as missing arguments
when calling synchronize_irq(). Fix them.

Signed-off-by: Jike Song <[EMAIL PROTECTED]>
---
 drivers/net/pci-skeleton.c |   49 ++-
 1 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/net/pci-skeleton.c b/drivers/net/pci-skeleton.c
index ed402e0..fffc49b 100644
--- a/drivers/net/pci-skeleton.c
+++ b/drivers/net/pci-skeleton.c
@@ -541,7 +541,7 @@ static void netdrv_hw_start (struct net_device *dev);
 #define NETDRV_W32_F(reg, val32)   do { writel ((val32), ioaddr + (reg)); readl (ioaddr + (reg)); } while (0)


-#if MMIO_FLUSH_AUDIT_COMPLETE
+#ifdef MMIO_FLUSH_AUDIT_COMPLETE

 /* write MMIO register */
 #define NETDRV_W8(reg, val8)   writeb ((val8), ioaddr + (reg))
@@ -603,7 +603,7 @@ static int __devinit netdrv_init_board (struct pci_dev *pdev,
return -ENOMEM;
}
SET_NETDEV_DEV(dev, &pdev->dev);
-   tp = dev->priv;
+   tp = netdev_priv(dev);

/* enable device (incl. PCI PM wakeup), and bus-mastering */
rc = pci_enable_device (pdev);
@@ -759,7 +759,7 @@ static int __devinit netdrv_init_one (struct pci_dev *pdev,
return i;
}

-   tp = dev->priv;
+   tp = netdev_priv(dev);

assert (ioaddr != NULL);
assert (dev != NULL);
@@ -783,7 +783,7 @@ static int __devinit netdrv_init_one (struct pci_dev *pdev,
dev->base_addr = (unsigned long) ioaddr;

/* dev->priv/tp zeroed and aligned in alloc_etherdev */
-   tp = dev->priv;
+   tp = netdev_priv(dev);

/* note: tp->chipset set in netdrv_init_board */
tp->drv_flags = PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
@@ -841,7 +841,7 @@ static void __devexit netdrv_remove_one (struct pci_dev *pdev)

assert (dev != NULL);

-   np = dev->priv;
+   np = netdev_priv(dev);
assert (np != NULL);

unregister_netdev (dev);
@@ -974,7 +974,7 @@ static void mdio_sync (void *mdio_addr)

 static int mdio_read (struct net_device *dev, int phy_id, int location)
 {
-   struct netdrv_private *tp = dev->priv;
+   struct netdrv_private *tp = netdev_priv(dev);
void *mdio_addr = tp->mmio_addr + Config4;
int mii_cmd = (0xf6 << 10) | (phy_id << 5) | location;
int retval = 0;
@@ -1017,7 +1017,7 @@ static int mdio_read (struct net_device *dev, int phy_id, int location)
 static void mdio_write (struct net_device *dev, int phy_id, int location,
int value)
 {
-   struct netdrv_private *tp = dev->priv;
+   struct netdrv_private *tp = netdev_priv(dev);
void *mdio_addr = tp->mmio_addr + Config4;
int mii_cmd =
(0x5002 << 16) | (phy_id << 23) | (location << 18) | value;
@@ -1060,7 +1060,7 @@ static void mdio_write (struct net_device *dev, int phy_id, int location,

 static int netdrv_open (struct net_device *dev)
 {
-   struct netdrv_private *tp = dev->priv;
+   struct netdrv_private *tp = netdev_priv(dev);
int retval;
 #ifdef NETDRV_DEBUG
void *ioaddr = tp->mmio_addr;
@@ -1121,7 +1121,7 @@ static int netdrv_open (struct net_device *dev)
 /* Start the hardware at open or resume. */
 static void netdrv_hw_start (struct net_device *dev)
 {
-   struct netdrv_private *tp = dev->priv;
+   struct netdrv_private *tp = netdev_priv(dev);
void *ioaddr = tp->mmio_addr;
u32 i;

@@ -1191,7 +1191,7 @@ static void netdrv_hw_start (struct net_device *dev)
 /* Initialize the Rx and Tx rings, along with various 'dev' bits. */
 static void netdrv_init_ring (struct net_device *dev)
 {
-   struct netdrv_private *tp = dev->priv;
+   struct netdrv_private *tp = netdev_priv(dev);
int i;

DPRINTK ("ENTER\n");
@@ -1213,7 +1213,7 @@ static void netdrv_init_ring (struct net_device *dev)
 static void netdrv_timer (unsigned long data)
 {
struct net_device *dev = (struct net_device *) data;
-   struct netdrv_private *tp = dev->priv;
+   struct netdrv_private *tp = netdev_priv(dev);
void *ioaddr = tp->mmio_addr;
int next_tick = 60 * HZ;
int mii_lpa;
@@ -1252,9 +1252,10 @@ static void netdrv_timer (unsigned long data)
 }


-static void netdrv_tx_clear (struct netdrv_private *tp)
+static void netdrv_tx_clear (struct net_device *dev)
 {
int i;
+   struct netdrv_private *tp = netdev_priv(dev);

atomic_set (&tp->cur_tx, 0);
atomic_set (&tp->dirty_tx, 0);
@@ -1278,7 +1279,7 @@ static void netdrv_tx_clear (struct netdrv_private *tp)

 static void netdrv_tx_timeout (struct net_device *dev)
 {
-   struct netdrv_private *tp = dev->priv;
+   struct netdrv_private *tp = netdev_priv(dev);
void *ioaddr = tp->mmio_addr;
int i;
u8 tmp8;
@@ -1311,7 +1312,7 @@ static void netdrv_tx_timeout (struct net_device *dev)
/* Stop a shared interrupt from scavenging while we are. */
   

Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-21 Thread Dave Young
> >
> > Hope the iteration patches 1-6/7 could be applied.
>
> Can you resend them again, and CC: me on all of them, with the latest
> updates, so I know what I should be reviewing this time around?

Hi, sent.

>
> thanks,
>
> greg k-h
>


Re: [PATCH 1/6] driver-core : add class iteration api

2008-01-21 Thread Dave Young
On Jan 22, 2008 1:54 PM, Dave Young <[EMAIL PROTECTED]> wrote:
>
> Add the following class iteration functions for driver use:
> class_for_each_device
> class_find_device
> class_for_each_child
> class_find_child
>
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>
>

Acked-by: Cornelia Huck <[EMAIL PROTECTED]>


[PATCH 6/6] spi : use class iteration api

2008-01-21 Thread Dave Young
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
 drivers/spi/spi.c |   24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff -upr linux/drivers/spi/spi.c linux.new/drivers/spi/spi.c
--- linux/drivers/spi/spi.c 2008-01-16 08:43:35.0 +0800
+++ linux.new/drivers/spi/spi.c 2008-01-16 08:43:35.0 +0800
@@ -485,6 +485,15 @@ void spi_unregister_master(struct spi_ma
 }
 EXPORT_SYMBOL_GPL(spi_unregister_master);
 
+static int __spi_master_match(struct device *dev, void *data)
+{
+   struct spi_master *m;
+   u16 *bus_num = (u16 *)data;
+
+   m = container_of(dev, struct spi_master, dev);
+   return m->bus_num == *bus_num;
+}
+
 /**
  * spi_busnum_to_master - look up master associated with bus_num
  * @bus_num: the master's bus number
@@ -499,17 +508,12 @@ struct spi_master *spi_busnum_to_master(
 {
struct device   *dev;
struct spi_master   *master = NULL;
-   struct spi_master   *m;
 
-   down(&spi_master_class.sem);
-   list_for_each_entry(dev, &spi_master_class.children, node) {
-   m = container_of(dev, struct spi_master, dev);
-   if (m->bus_num == bus_num) {
-   master = spi_master_get(m);
-   break;
-   }
-   }
-   up(&spi_master_class.sem);
+   dev = class_find_device(&spi_master_class, &bus_num,
+   __spi_master_match);
+   if (dev)
+   master = container_of(dev, struct spi_master, dev);
+   /* reference got in class_find_device */
return master;
 }
 EXPORT_SYMBOL_GPL(spi_busnum_to_master);


[PATCH 5/6] scsi : use class iteration api

2008-01-21 Thread Dave Young
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
 drivers/scsi/hosts.c |   24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff -upr linux/drivers/scsi/hosts.c linux.new/drivers/scsi/hosts.c
--- linux/drivers/scsi/hosts.c  2008-01-16 08:43:35.0 +0800
+++ linux.new/drivers/scsi/hosts.c  2008-01-16 08:43:35.0 +0800
@@ -429,6 +429,15 @@ void scsi_unregister(struct Scsi_Host *s
 }
 EXPORT_SYMBOL(scsi_unregister);
 
+static int __scsi_host_match(struct class_device *cdev, void *data)
+{
+   struct Scsi_Host *p;
+   unsigned short *hostnum = (unsigned short *)data;
+
+   p = class_to_shost(cdev);
+   return p->host_no == *hostnum;
+}
+
 /**
  * scsi_host_lookup - get a reference to a Scsi_Host by host no
  *
@@ -439,19 +448,12 @@ EXPORT_SYMBOL(scsi_unregister);
  **/
 struct Scsi_Host *scsi_host_lookup(unsigned short hostnum)
 {
-   struct class *class = &shost_class;
struct class_device *cdev;
-   struct Scsi_Host *shost = ERR_PTR(-ENXIO), *p;
+   struct Scsi_Host *shost = ERR_PTR(-ENXIO);
 
-   down(&class->sem);
-   list_for_each_entry(cdev, &class->children, node) {
-   p = class_to_shost(cdev);
-   if (p->host_no == hostnum) {
-   shost = scsi_host_get(p);
-   break;
-   }
-   }
-   up(&class->sem);
+   cdev = class_find_child(&shost_class, &hostnum, __scsi_host_match);
+   if (cdev)
+   shost = scsi_host_get(class_to_shost(cdev));
 
return shost;
 }


[PATCH 4/6] rtc : use class iteration api

2008-01-21 Thread Dave Young
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
 drivers/rtc/interface.c |   22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff -upr linux/drivers/rtc/interface.c linux.new/drivers/rtc/interface.c
--- linux/drivers/rtc/interface.c   2008-01-11 18:06:38.0 +0800
+++ linux.new/drivers/rtc/interface.c   2008-01-11 18:06:38.0 +0800
@@ -251,20 +251,23 @@ void rtc_update_irq(struct rtc_device *r
 }
 EXPORT_SYMBOL_GPL(rtc_update_irq);
 
+static int __rtc_match(struct device *dev, void *data)
+{
+   char *name = (char *)data;
+
+   if (strncmp(dev->bus_id, name, BUS_ID_SIZE) == 0)
+       return 1;
+   return 0;
+}
+
 struct rtc_device *rtc_class_open(char *name)
 {
    struct device *dev;
    struct rtc_device *rtc = NULL;
 
-   down(&rtc_class->sem);
-   list_for_each_entry(dev, &rtc_class->devices, node) {
-       if (strncmp(dev->bus_id, name, BUS_ID_SIZE) == 0) {
-           dev = get_device(dev);
-           if (dev)
-               rtc = to_rtc_device(dev);
-           break;
-       }
-   }
+   dev = class_find_device(rtc_class, name, __rtc_match);
+   if (dev)
+       rtc = to_rtc_device(dev);
 
    if (rtc) {
        if (!try_module_get(rtc->owner)) {
@@ -272,7 +275,6 @@ struct rtc_device *rtc_class_open(char *
            rtc = NULL;
        }
    }
-   up(&rtc_class->sem);
 
    return rtc;
 }


[PATCH 3/6] power supply : use class iteration api

2008-01-21 Thread Dave Young
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
 drivers/power/apm_power.c |  116 ++
 drivers/power/power_supply_core.c |   72 ---
 2 files changed, 106 insertions(+), 82 deletions(-)

diff -upr linux/drivers/power/apm_power.c linux.new/drivers/power/apm_power.c
--- linux/drivers/power/apm_power.c 2008-01-11 18:06:38.0 +0800
+++ linux.new/drivers/power/apm_power.c 2008-01-11 18:06:38.0 +0800
@@ -13,6 +13,7 @@
 #include 
 #include 
 
+static DEFINE_MUTEX(apm_mutex);
 #define PSY_PROP(psy, prop, val) psy->get_property(psy, \
 POWER_SUPPLY_PROP_##prop, val)
 
@@ -23,67 +24,86 @@
 
 static struct power_supply *main_battery;
 
-static void find_main_battery(void)
-{
-   struct device *dev;
-   struct power_supply *bat = NULL;
-   struct power_supply *max_charge_bat = NULL;
-   struct power_supply *max_energy_bat = NULL;
+struct find_bat_param {
+   struct power_supply *main;
+   struct power_supply *bat;
+   struct power_supply *max_charge_bat;
+   struct power_supply *max_energy_bat;
union power_supply_propval full;
-   int max_charge = 0;
-   int max_energy = 0;
+   int max_charge;
+   int max_energy;
+};
 
-   main_battery = NULL;
+static int __find_main_battery(struct device *dev, void *data)
+{
+   struct find_bat_param *bp = (struct find_bat_param *)data;
 
-   list_for_each_entry(dev, &power_supply_class->devices, node) {
-   bat = dev_get_drvdata(dev);
+   bp->bat = dev_get_drvdata(dev);
 
-   if (bat->use_for_apm) {
-   /* nice, we explicitly asked to report this battery. */
-   main_battery = bat;
-   return;
-   }
+   if (bp->bat->use_for_apm) {
+   /* nice, we explicitly asked to report this battery. */
+   bp->main = bp->bat;
+   return 1;
+   }
 
-   if (!PSY_PROP(bat, CHARGE_FULL_DESIGN, &full) ||
-   !PSY_PROP(bat, CHARGE_FULL, &full)) {
-   if (full.intval > max_charge) {
-   max_charge_bat = bat;
-   max_charge = full.intval;
-   }
-   } else if (!PSY_PROP(bat, ENERGY_FULL_DESIGN, &full) ||
-   !PSY_PROP(bat, ENERGY_FULL, &full)) {
-   if (full.intval > max_energy) {
-   max_energy_bat = bat;
-   max_energy = full.intval;
-   }
+   if (!PSY_PROP(bp->bat, CHARGE_FULL_DESIGN, &bp->full) ||
+   !PSY_PROP(bp->bat, CHARGE_FULL, &bp->full)) {
+   if (bp->full.intval > bp->max_charge) {
+   bp->max_charge_bat = bp->bat;
+   bp->max_charge = bp->full.intval;
+   }
+   } else if (!PSY_PROP(bp->bat, ENERGY_FULL_DESIGN, &bp->full) ||
+   !PSY_PROP(bp->bat, ENERGY_FULL, &bp->full)) {
+   if (bp->full.intval > bp->max_energy) {
+   bp->max_energy_bat = bp->bat;
+   bp->max_energy = bp->full.intval;
}
}
+   return 0;
+}
+
+static void find_main_battery(void)
+{
+   struct find_bat_param bp;
+   int error;
+
+   memset(&bp, 0, sizeof(struct find_bat_param));
+   main_battery = NULL;
+   bp.main = main_battery;
+
+   error = class_for_each_device(power_supply_class, &bp,
+ __find_main_battery);
+   if (error) {
+   main_battery = bp.main;
+   return;
+   }
 
-   if ((max_energy_bat && max_charge_bat) &&
-   (max_energy_bat != max_charge_bat)) {
+   if ((bp.max_energy_bat && bp.max_charge_bat) &&
+   (bp.max_energy_bat != bp.max_charge_bat)) {
/* try guess battery with more capacity */
-   if (!PSY_PROP(max_charge_bat, VOLTAGE_MAX_DESIGN, &full)) {
-   if (max_energy > max_charge * full.intval)
-   main_battery = max_energy_bat;
+   if (!PSY_PROP(bp.max_charge_bat, VOLTAGE_MAX_DESIGN,
+ &bp.full)) {
+   if (bp.max_energy > bp.max_charge * bp.full.intval)
+   main_battery = bp.max_energy_bat;
else
-   main_battery = max_charge_bat;
-   } else if (!PSY_PROP(max_energy_bat, VOLTAGE_MAX_DESIGN,
- &full)) {
-   if (max_charge > max_energy / full.intval)
-   main_battery = max_charge_bat;
+   main_battery

[PATCH 2/6] ieee1394 : use class iteration api

2008-01-21 Thread Dave Young
Convert to use the class iteration api.

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
 drivers/ieee1394/nodemgr.c |  312 +
 1 file changed, 175 insertions(+), 137 deletions(-)

diff -upr linux/drivers/ieee1394/nodemgr.c linux.new/drivers/ieee1394/nodemgr.c
--- linux/drivers/ieee1394/nodemgr.c    2008-01-16 08:43:35.0 +0800
+++ linux.new/drivers/ieee1394/nodemgr.c    2008-01-16 08:43:35.0 +0800
@@ -727,33 +727,31 @@ static int nodemgr_bus_match(struct devi
 
 static DEFINE_MUTEX(nodemgr_serialize_remove_uds);
 
+static int __match_ne(struct device *dev, void *data)
+{
+   struct unit_directory *ud;
+   struct node_entry *ne = (struct node_entry *)data;
+
+   ud = container_of(dev, struct unit_directory, unit_dev);
+   return ud->ne == ne;
+}
+
 static void nodemgr_remove_uds(struct node_entry *ne)
 {
struct device *dev;
-   struct unit_directory *tmp, *ud;
+   struct unit_directory *ud;
 
-   /* Iteration over nodemgr_ud_class.devices has to be protected by
-* nodemgr_ud_class.sem, but device_unregister() will eventually
-* take nodemgr_ud_class.sem too. Therefore pick out one ud at a time,
-* release the semaphore, and then unregister the ud. Since this code
-* may be called from other contexts besides the knodemgrds, protect the
-* gap after release of the semaphore by nodemgr_serialize_remove_uds.
+   /* Use class_find_device() to iterate the devices. Since this code
+* may be called from other contexts besides the knodemgrds,
+* protect it by nodemgr_serialize_remove_uds.
 */
mutex_lock(&nodemgr_serialize_remove_uds);
for (;;) {
-   ud = NULL;
-   down(&nodemgr_ud_class.sem);
-   list_for_each_entry(dev, &nodemgr_ud_class.devices, node) {
-   tmp = container_of(dev, struct unit_directory,
-  unit_dev);
-   if (tmp->ne == ne) {
-   ud = tmp;
-   break;
-   }
-   }
-   up(&nodemgr_ud_class.sem);
-   if (ud == NULL)
+   dev = class_find_device(&nodemgr_ud_class, ne, __match_ne);
+   if (!dev)
break;
+   ud = container_of(dev, struct unit_directory, unit_dev);
+   put_device(dev);
device_unregister(&ud->unit_dev);
device_unregister(&ud->device);
}
@@ -882,45 +880,66 @@ fail_alloc:
return NULL;
 }
 
+static int __match_ne_guid(struct device *dev, void *data)
+{
+   struct node_entry *ne;
+   u64 *guid = (u64 *)data;
+
+   ne = container_of(dev, struct node_entry, node_dev);
+   return ne->guid == *guid;
+}
 
 static struct node_entry *find_entry_by_guid(u64 guid)
 {
struct device *dev;
-   struct node_entry *ne, *ret_ne = NULL;
-
-   down(&nodemgr_ne_class.sem);
-   list_for_each_entry(dev, &nodemgr_ne_class.devices, node) {
-   ne = container_of(dev, struct node_entry, node_dev);
+   struct node_entry *ne;
 
-   if (ne->guid == guid) {
-   ret_ne = ne;
-   break;
-   }
-   }
-   up(&nodemgr_ne_class.sem);
+   dev = class_find_device(&nodemgr_ne_class, &guid, __match_ne_guid);
+   if (!dev)
+   return NULL;
+   ne = container_of(dev, struct node_entry, node_dev);
+   put_device(dev);
 
-   return ret_ne;
+   return ne;
 }
 
+struct match_nodeid_param {
+   struct hpsb_host *host;
+   nodeid_t nodeid;
+};
+
+static int __match_ne_nodeid(struct device *dev, void *data)
+{
+   int found = 0;
+   struct node_entry *ne;
+   struct match_nodeid_param *param = (struct match_nodeid_param *)data;
+
+   if (!dev)
+   goto ret;
+   ne = container_of(dev, struct node_entry, node_dev);
+   if (ne->host == param->host && ne->nodeid == param->nodeid)
+   found = 1;
+ret:
+   return found;
+}
 
 static struct node_entry *find_entry_by_nodeid(struct hpsb_host *host,
   nodeid_t nodeid)
 {
struct device *dev;
-   struct node_entry *ne, *ret_ne = NULL;
+   struct node_entry *ne;
+   struct match_nodeid_param param;
 
-   down(&nodemgr_ne_class.sem);
-   list_for_each_entry(dev, &nodemgr_ne_class.devices, node) {
-   ne = container_of(dev, struct node_entry, node_dev);
+   param.host = host;
+   param.nodeid = nodeid;
 
-   if (ne->host == host && ne->nodeid == nodeid) {
-   ret_ne = ne;
-   break;
-   }
-   }
-   up(&nodemgr_ne_class.sem);
+   dev = class_find_device(&nodemgr_ne_class, &param, __match_ne_nodeid);
+   i

[PATCH 1/6] driver-core : add class iteration api

2008-01-21 Thread Dave Young

Add the following class iteration functions for driver use:
class_for_each_device
class_find_device
class_for_each_child
class_find_child

Signed-off-by: Dave Young <[EMAIL PROTECTED]> 

---
 drivers/base/class.c   |  159 +
 include/linux/device.h |   11 ++-
 2 files changed, 168 insertions(+), 2 deletions(-)

diff -upr linux/drivers/base/class.c linux.new/drivers/base/class.c
--- linux/drivers/base/class.c  2008-01-15 11:12:29.0 +0800
+++ linux.new/drivers/base/class.c  2008-01-15 11:12:29.0 +0800
@@ -798,6 +798,165 @@ void class_device_put(struct class_devic
kobject_put(&class_dev->kobj);
 }
 
+/**
+ * class_for_each_device - device iterator
+ * @class: the class we're iterating
+ * @data: data for the callback
+ * @fn: function to be called for each device
+ *
+ * Iterate over @class's list of devices, and call @fn for each,
+ * passing it @data.
+ *
+ * We check the return of @fn each time. If it returns anything
+ * other than 0, we break out and return that value.
+ */
+int class_for_each_device(struct class *class, void *data,
+  int (*fn)(struct device *, void *))
+{
+   struct device *dev;
+   int error = 0;
+
+   if (!class)
+   return -EINVAL;
+   down(&class->sem);
+   list_for_each_entry(dev, &class->devices, node) {
+   dev = get_device(dev);
+   if (dev) {
+   error = fn(dev, data);
+   put_device(dev);
+   } else
+   error = -ENODEV;
+   if (error)
+   break;
+   }
+   up(&class->sem);
+
+   return error;
+}
+EXPORT_SYMBOL_GPL(class_for_each_device);
+
+/**
+ * class_find_device - device iterator for locating a particular device
+ * @class: the class we're iterating
+ * @data: data for the match function
+ * @match: function to check device
+ *
+ * This is similar to the class_for_each_device() function above, but it
+ * returns a reference to a device that is 'found' for later use, as
+ * determined by the @match callback.
+ *
+ * The callback should return 0 if the device doesn't match and non-zero
+ * if it does.  If the callback returns non-zero, this function will
+ * return to the caller and not iterate over any more devices.
+ *
+ * Note, you will need to drop the reference with put_device() after use.
+ */
+struct device *class_find_device(struct class *class, void *data,
+  int (*match)(struct device *, void *))
+{
+   struct device *dev;
+   int found = 0;
+
+   if (!class)
+   return NULL;
+
+   down(&class->sem);
+   list_for_each_entry(dev, &class->devices, node) {
+   dev = get_device(dev);
+   if (dev) {
+   if (match(dev, data)) {
+   found = 1;
+   break;
+   } else
+   put_device(dev);
+   } else
+   break;
+   }
+   up(&class->sem);
+
+   return found ? dev : NULL;
+}
+EXPORT_SYMBOL_GPL(class_find_device);
+
+/**
+ * class_for_each_child - class child iterator
+ * @class: the class we're iterating
+ * @data: data for the callback
+ * @fn: function to be called for each child of the class
+ *
+ * Iterate over @class's list of children, and call @fn for each,
+ * passing it @data.
+ *
+ * We check the return of @fn each time. If it returns anything
+ * other than 0, we break out and return that value.
+ */
+int class_for_each_child(struct class *class, void *data,
+  int (*fn)(struct class_device *, void *))
+{
+   struct class_device *dev;
+   int error = 0;
+
+   if (!class)
+   return -EINVAL;
+   down(&class->sem);
+   list_for_each_entry(dev, &class->children, node) {
+   dev = class_device_get(dev);
+   if (dev) {
+   error = fn(dev, data);
+   class_device_put(dev);
+   } else
+   error = -ENODEV;
+   if (error)
+   break;
+   }
+   up(&class->sem);
+
+   return error;
+}
+EXPORT_SYMBOL_GPL(class_for_each_child);
+
+/**
+ * class_find_child - device iterator for locating a particular class_device
+ * @class: the class we're iterating
+ * @data: data for the match function
+ * @match: function to check class_device
+ *
+ * This is similar to the class_for_each_child() function above, but it
+ * returns a reference to a class_device that is 'found' for later use, as
+ * determined by the @match callback.
+ *
+ * The callback should return 0 if the class_device doesn't match and non-zero
+ * if it does.  If the callback returns non-zero, this function will
+ * return to the caller and not iter

Re: questions on NAPI processing latency and dropped network packets

2008-01-21 Thread Eric Dumazet

Chris Friesen wrote:
> Eric Dumazet wrote:
> > Chris Friesen wrote:
> >
> > > I've done some further digging, and it appears that one of the
> > > problems we may be facing is very high instantaneous traffic rates.
> > >
> > > Instrumentation showed up to 222K packets/sec for short periods (at
> > > least 1.1 ms, possibly longer), although the long-term average is
> > > down around 14-16K packets/sec.
> >
> > Instrumentation done where exactly?
>
> I added some code to e1000_clean_rx_irq() to track rx_fifo drops, total
> packets received, and an accurate timestamp.
>
> If rx_fifo errors changed, it would dump the information.
>
> > > Is there anything else we can do to minimize the latency of network
> > > packet processing and avoid having to crank the rx ring size up so high?
> >
> > You have some tasks that disable softirqs too long. Sometimes, bumping
> > RX ring size is OK (but you will still have delays), sometimes it is
> > not an option, since 4096 is the limit on current hardware.
>
> I added some instrumentation to take timestamps in __do_softirq() as
> well.  Based on these timestamps, I can see the following code sequence:
>
> 2374604616 usec, start processing softirqs in __do_softirq()
> 2374610337 usec, log values in e1000_clean_rx_irq()
> 2374611411 usec, log values in e1000_clean_rx_irq()
>
> In between the successive calls to e1000_clean_rx_irq() the rx_fifo
> counts went up.
>
> Does anyone have any patchsets to track down what softirqs are taking a
> long time, and/or who's disabling softirqs?

Not for linux-2.6.10 unfortunately.

Check net/ipv4/route.c, where many improvements can be done, especially if you
have a large rt cache


grep . /proc/sys/net/ipv4/route/*
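
As a rough cross-check of the timestamps quoted above: the gap between the
__do_softirq() entry and the first e1000_clean_rx_irq() log is
2374610337 - 2374604616 = 5721 usec. The sketch below only does that
arithmetic. It assumes, purely for illustration, that the 222K packets/sec
burst rate holds for the whole gap, which the measurements do not establish.
The helper names are made up and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: microseconds elapsed between two timestamps. */
static uint64_t gap_usec(uint64_t start_usec, uint64_t end_usec)
{
	assert(end_usec >= start_usec);
	return end_usec - start_usec;
}

/*
 * Hypothetical helper: packets arriving while rx processing is stalled,
 * rounded down, assuming a constant arrival rate in packets per second.
 */
static uint64_t packets_during_stall(uint64_t rate_pps, uint64_t stall_usec)
{
	return rate_pps * stall_usec / 1000000ULL;
}
```

Under that assumption, packets_during_stall(222000, 5721) gives 1270 packets
to absorb during the stall, against 91 at a 16K packets/sec average rate.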



Re: 2.6.24 regression: pan hanging unkilleable and un-straceable

2008-01-21 Thread Mike Galbraith

On Tue, 2008-01-22 at 16:25 +1100, Nick Piggin wrote:
> On Tuesday 22 January 2008 16:03, Mike Galbraith wrote:

> > I've hit same twice recently (not pan, and not repeatable).
> 
> Nasty. The attached patch is something really simple that can sometimes help.
> sysrq+p is also an option, if you're on a UP system.

SMP (P4/HT imitating real cores)

> Any luck getting traces?

We'll see.  Armed.

-Mike



Re: [PATCH] rcu: fix section mismatch

2008-01-21 Thread Sam Ravnborg
On Mon, Jan 21, 2008 at 03:34:09PM -0800, Randy Dunlap wrote:
> On Mon, 21 Jan 2008 11:38:38 +1100 Rusty Russell wrote:
> 
> > On Sunday 20 January 2008 08:25:49 Sam Ravnborg wrote:
> > > On Sat, Jan 19, 2008 at 11:56:43AM -0800, Randy Dunlap wrote:
> > > > rcu_online_cpu() should be __cpuinit instead of __devinit.
> > >
> > > So if we have:
> > > CONFIG_HOTPLUG=n
> > > CONFIG_HOTPLUG_CPU=y
> > >
> > > > then this is an oops candidate.
> > 
> > > At first glance, this can't happen because all CONFIG_HOTPLUG_CPU depends on
> > > CONFIG_HOTPLUG or selects it, for all archs.
> 
> Mostly, but arch/mips/ seems to be different (neither depends nor selects)
> unless it has changed very recently (I looked at 2.6.24-rc8).

mips has
default n

So they at least try to turn off this feature.

Sam


[PATCHSET] driver core : add class iteration api

2008-01-21 Thread Dave Young
Repost for review.

[PATCH 1/6] Add some class iteration functions in driver core
[PATCH 2-6/6] Convert the drivers that iterate over class devices to use the class iteration api

toc:
---
1-driver-core-add-class-iteration-api.patch
2-ieee1394-use-class-iteration-api.patch
3-power_supply-use-class-iteration-api.patch
4-rtc-use-class-iteration-api.patch
5-scsi-use-class-iteration-api.patch
6-spi-use-class-iteration-api.patch

Summary diffstat:
---
 drivers/base/class.c  |  159 +++
 drivers/ieee1394/nodemgr.c|  312 +-
 drivers/power/apm_power.c |  116 --
 drivers/power/power_supply_core.c |   72 
 drivers/rtc/interface.c   |   22 +-
 drivers/scsi/hosts.c  |   24 +-
 drivers/spi/spi.c |   24 +-
 include/linux/device.h|   11 +
 8 files changed, 488 insertions(+), 252 deletions(-)

Regards
dave
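
The calling convention of the iterators in this series can be modelled in
userspace roughly as follows. This is a minimal sketch, not the kernel code:
the toy_* names are invented, the class semaphore is not modelled at all, and
a plain integer stands in for the device reference count. It only illustrates
two points from the kernel-doc comments in patch 1/6: the for_each variant
stops on the first non-zero callback return, and the find variant hands back
a reference that the caller has to drop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct device: a name plus a reference count. */
struct toy_device {
	const char *name;
	int refcount;
	struct toy_device *next;
};

typedef int (*toy_fn)(struct toy_device *dev, void *data);

/* Call fn on each device; stop at, and return, the first non-zero result. */
static int toy_for_each_device(struct toy_device *head, void *data, toy_fn fn)
{
	int error = 0;

	assert(fn != NULL);
	for (struct toy_device *dev = head; dev && !error; dev = dev->next) {
		dev->refcount++;	/* hold the device across the callback */
		error = fn(dev, data);
		dev->refcount--;
	}
	return error;
}

/* Return the first device that match accepts, with a reference still held. */
static struct toy_device *toy_find_device(struct toy_device *head, void *data,
					  toy_fn match)
{
	assert(match != NULL);
	for (struct toy_device *dev = head; dev; dev = dev->next) {
		dev->refcount++;
		if (match(dev, data))
			return dev;	/* caller must drop this reference */
		dev->refcount--;
	}
	return NULL;
}

/* Match callback: non-zero when the device name equals the string in data. */
static int toy_match_name(struct toy_device *dev, void *data)
{
	return strcmp(dev->name, (const char *)data) == 0;
}

/* Counting callback: always returns 0 so the iteration continues. */
static int toy_count(struct toy_device *dev, void *data)
{
	(void)dev;
	++*(int *)data;
	return 0;
}
```

A caller of toy_find_device() is expected to decrement refcount when done,
mirroring the put_device() note in the class_find_device() comment.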


Re: sysfs network namespace support - was this patch set forgotten ?

2008-01-21 Thread Greg KH
On Sun, Jan 20, 2008 at 09:08:43AM +0200, Ian Brown wrote:
> Hello,
> 
> I saw some posts (from about a month ago) about network namespace
> support patches; I wonder: what
> is the status of this patch set ? was it somehow forgotten ?
> (I don't see it in v2.6.24-rc8 mm tree).

It wasn't "forgotten", but rather, not applied as Al Viro started to
have some serious questions about these changes...  I'll wait for his
review before applying them.

thanks,

greg k-h


Re: [PATCH 0/12] ide-floppy redux v2.5

2008-01-21 Thread Borislav Petkov
On Mon, Jan 21, 2008 at 11:45:35PM +0100, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi Borislav,
> 
> On Sunday 20 January 2008, Borislav Petkov wrote:
> > On Mon, Jan 14, 2008 at 10:38:17PM +0100, Bartlomiej Zolnierkiewicz wrote:
> > > > By the way, I have an Iomega ZIP 100 drive somewhere in my hardware pile and
> > > > will do some testing with the "new" :) driver just in case.
> > > 
> > > This would be great. :)
> > Hi Bart,
> > 
> > i just whipped rc8 along with your pata-2.6 tree on top and had several test
> > runs of the ide-floppy driver (raw reads, software floppy disk eject, etc) and
> > everything seems to work fine. I will keep this hardware setup here so that we
> > could at least test ide-floppy occasionally. We should probably acknowledge this
> 
> Big thanks for all great ide-floppy work!  I hope that you'll continue with
> putting IDE device drivers in shape. :)

Sure, no problem :). I'm on ide-tape right now and probably will have most of it
ready for submission on the weekend so keep your fingers crossed... :)

-- 
Regards/Gruß,
Boris.


Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN

2008-01-21 Thread Mike Snitzer
cc'ing Tanaka-san given his recent raid1 BUG report:
http://lkml.org/lkml/2008/1/14/515

On Jan 21, 2008 6:04 PM, Mike Snitzer <[EMAIL PROTECTED]> wrote:
> Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to
> an aacraid controller) that was acting as the local raid1 member of
> /dev/md30.
>
> Linux MD didn't see an /dev/sdac1 error until I tried forcing the issue by
> doing a read (with dd) from /dev/md30:
>
> Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key :
> Hardware Error [current]
> Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0
> Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense:
> Internal target failure
> Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 71
> Jan 21 17:08:07 lab17-233 kernel: printk: 3 messages suppressed.
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 8
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 16
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 24
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 32
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 40
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 48
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 56
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 64
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 72
> Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 80
> Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key :
> Hardware Error [current]
> Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0
> Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense:
> Internal target failure
> Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 343
> Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key :
> Hardware Error [current]
> Jan 21 17:08:08 lab17-233 kernel: Info fld=0x0
> ...
> Jan 21 17:08:12 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense:
> Internal target failure
> Jan 21 17:08:12 lab17-233 kernel: end_request: I/O error, dev sdac, sector 3399
> Jan 21 17:08:12 lab17-233 kernel: printk: 765 messages suppressed.
> Jan 21 17:08:12 lab17-233 kernel: raid1: sdac1: rescheduling sector 3336
>
> However, the MD layer still hasn't marked the sdac1 member faulty:
>
> md30 : active raid1 nbd2[1](W) sdac1[0]
>   4016204 blocks super 1.0 [2/2] [UU]
>   bitmap: 1/8 pages [4KB], 256KB chunk
>
> The dd I used to read from /dev/md30 is blocked on IO:
>
> Jan 21 17:13:55 lab17-233 kernel: ddD 0afa9cf5c346
> 0 12337   7702 (NOTLB)
> Jan 21 17:13:55 lab17-233 kernel:  81010c449868 0082
>  80268f14
> Jan 21 17:13:55 lab17-233 kernel:  81015da6f320 81015de532c0
> 0008 81012d9d7780
> Jan 21 17:13:55 lab17-233 kernel:  81015fae2880 4926
> 81012d9d7970 0001802879a0
> Jan 21 17:13:55 lab17-233 kernel: Call Trace:
> Jan 21 17:13:55 lab17-233 kernel:  [] 
> mempool_alloc+0x24/0xda
> Jan 21 17:13:55 lab17-233 kernel:  []
> :raid1:wait_barrier+0x84/0xc2
> Jan 21 17:13:55 lab17-233 kernel:  []
> default_wake_function+0x0/0xe
> Jan 21 17:13:55 lab17-233 kernel:  []
> :raid1:make_request+0x83/0x5c0
> Jan 21 17:13:55 lab17-233 kernel:  []
> __make_request+0x57f/0x668
> Jan 21 17:13:55 lab17-233 kernel:  []
> generic_make_request+0x26e/0x2a9
> Jan 21 17:13:55 lab17-233 kernel:  [] 
> mempool_alloc+0x24/0xda
> Jan 21 17:13:55 lab17-233 kernel:  [] __next_cpu+0x19/0x28
> Jan 21 17:13:55 lab17-233 kernel:  [] submit_bio+0xb6/0xbd
> Jan 21 17:13:55 lab17-233 kernel:  [] submit_bh+0xdf/0xff
> Jan 21 17:13:55 lab17-233 kernel:  []
> block_read_full_page+0x271/0x28e
> Jan 21 17:13:55 lab17-233 kernel:  []
> blkdev_get_block+0x0/0x46
> Jan 21 17:13:55 lab17-233 kernel:  []
> radix_tree_insert+0xcb/0x18c
> Jan 21 17:13:55 lab17-233 kernel:  []
> __do_page_cache_readahead+0x16d/0x1df
> Jan 21 17:13:55 lab17-233 kernel:  [] 
> getnstimeofday+0x32/0x8d
> Jan 21 17:13:55 lab17-233 kernel:  [] ktime_get_ts+0x1a/0x4e
> Jan 21 17:13:55 lab17-233 kernel:  [] 
> delayacct_end+0x7d/0x88
> Jan 21 17:13:55 lab17-233 kernel:  []
> blockable_page_cache_readahead+0x53/0xb2
> Jan 21 17:13:55 lab17-233 kernel:  []
> make_ahead_window+0x82/0x9e
> Jan 21 17:13:55 lab17-233 kernel:  []
> page_cache_readahead+0x18a/0x1c1
> Jan 21 17:13:55 lab17-233 kernel:  []
> do_generic_mapping_read+0x135/0x3fc
> Jan 21 17:13:55 lab17-233 kernel:  []
> file_read_actor+0x0/0x170
> Jan 21 17:13:55 lab17-233 kernel: 

Re: The SMP alternatives code breaks exception fixup?

2008-01-21 Thread Andi Kleen
Chuck Ebbert <[EMAIL PROTECTED]> writes:
> 
> There is a fixup, so this should never happen. But the lock instruction
> was replaced with a nop by the altinstruction code, and that makes the fixup
> address wrong. AFAICT we don't fix up the exception table when we replace
> a lock with a nop, which makes the fixup table point to the nop instead
> of the cmpxchg instruction and causes us to miss the fixup.

Indeed.  Nasty issue.

A quick fix would be to add another fixup to handle both cases.
I checked the other LOCK_PREFIX users and they look ok.

Does this fix it?

-Andi

(untested) 
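
The failure mode described above can be sketched as a toy model. It assumes
the exception table is searched for an exact match on the faulting
instruction address, and that replacing the one-byte lock prefix with a
one-byte nop makes the cmpxchg start one byte after label 2. All names and
addresses below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy exception table entry: faulting instruction address -> fixup address. */
struct toy_ex_entry {
	unsigned long insn;
	unsigned long fixup;
};

/* Exact-match lookup; returns the fixup address, or 0 if none is registered. */
static unsigned long toy_search_extable(const struct toy_ex_entry *tbl,
					size_t n, unsigned long fault_addr)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == fault_addr)
			return tbl[i].fixup;
	return 0;
}

/*
 * Address reported for a fault in the cmpxchg. With the lock prefix in
 * place the instruction starts at the prefix byte (label 2). Once the
 * prefix is replaced by a nop, the nop is its own instruction and the
 * cmpxchg starts one byte later (label 5 in the patch).
 */
static unsigned long toy_fault_addr(unsigned long label2, int lock_is_nop)
{
	return lock_is_nop ? label2 + 1 : label2;
}
```

With entries only for labels 1 and 2 the lookup misses once the prefix has
been patched out; adding an entry for the byte after label 2 covers both
cases, which is what the extra "5b,4b" pair in the patch is meant to do.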

---

Add exception handlers for both the LOCK and no LOCK prefix
case in futex.

Hopefully fixes https://bugzilla.redhat.com/show_bug.cgi?id=429412

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux/include/asm-x86/futex.h
===
--- linux.orig/include/asm-x86/futex.h
+++ linux/include/asm-x86/futex.h
@@ -30,7 +30,7 @@
 "1: movl %2, %0\n\
     movl %0, %3\n"   \
     insn "\n"   \
-"2:" LOCK_PREFIX "cmpxchgl %3, %2\n\
+"2:" LOCK_PREFIX "\n5: cmpxchgl %3, %2\n   \
jnz 1b\n\
 3: .section .fixup,\"ax\"\n\
 4: mov %5, %1\n\
@@ -38,7 +38,7 @@
.previous\n \
.section __ex_table,\"a\"\n \
.align  8\n"\
-   _ASM_PTR "1b,4b,2b,4b\n \
+   _ASM_PTR "1b,4b,2b,4b,5b,4b\n   \
.previous"  \
: "=&a" (oldval), "=&r" (ret), "+m" (*uaddr),   \
  "=&r" (tem)   \
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24 regression: pan hanging unkilleable and un-straceable

2008-01-21 Thread Nick Piggin
On Tuesday 22 January 2008 16:03, Mike Galbraith wrote:
> On Tue, 2008-01-22 at 11:05 +1100, Nick Piggin wrote:
> > On Tuesday 22 January 2008 07:58, Frederik Himpe wrote:
> > > With Linux 2.6.24-rc8 I often have the problem that the pan usenet
> > > reader starts using 100% of CPU time after some time. When this
> > > happens, kill -9 does not work, and strace just hangs when trying to
> > > attach to the process. The same with gdb. ps shows the process as
> > > being in the R state.
> > >
> > > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan:
> > > > Jan 21 21:45:01 Anastacia kernel: pan   R  running task   0
> >
> > Well I've twice tried to submit a patch to print stacks for running
> > tasks as well, but nobody seems interested. It would at least give a
> > chance to see something.
>
> I've hit same twice recently (not pan, and not repeatable).

Nasty. The attached patch is something really simple that can sometimes help.
sysrq+p is also an option, if you're on a UP system.

Any luck getting traces?

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4920,8 +4920,7 @@ static void show_task(struct task_struct
 	printk(KERN_CONT "%5lu %5d %6d\n", free,
 		task_pid_nr(p), task_pid_nr(p->real_parent));
 
-	if (state != TASK_RUNNING)
-		show_stack(p, NULL);
+	show_stack(p, NULL);
 }
 
 void show_state_filter(unsigned long state_filter)


Re: [PATCH 7/7] driver-core : convert semaphore to mutex in struct class

2008-01-21 Thread Greg KH
On Tue, Jan 22, 2008 at 08:55:05AM +0800, Dave Young wrote:
> On Jan 22, 2008 5:16 AM, Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> > Dave Young wrote, On 01/21/2008 09:44 AM:
> > ...
> > > > I applied it in my kernel, built and run without warnings, but it needs
> > > > more testing.
> > > > I will be very glad to see the test result about this if you could, thanks.
> >
> > Bad news. (Alas I won't be able to check this today.)
> 
> Hi, thanks for your effort. Now I think we should stop this thread and
> wait for the class_device to go away :)
> 
> Hope the iteration patches 1-6/7 could be applied.

Can you resend them again, and CC: me on all of them, with the latest
updates, so I know what I should be reviewing this time around?

thanks,

greg k-h


Re: 2.6.24 regression: pan hanging unkilleable and un-straceable

2008-01-21 Thread Mike Galbraith

On Tue, 2008-01-22 at 11:05 +1100, Nick Piggin wrote:
> On Tuesday 22 January 2008 07:58, Frederik Himpe wrote:
> > With Linux 2.6.24-rc8 I often have the problem that the pan usenet
> > reader starts using 100% of CPU time after some time. When this happens,
> > kill -9 does not work, and strace just hangs when trying to attach to
> > the process. The same with gdb. ps shows the process as being in the R
> > state.
> >
> > I pressed Ctrl-Alt-SysRq-T, and this was shown for pan:
> > Jan 21 21:45:01 Anastacia kernel: pan   R  running task0 
> 
> Well I've twice tried to submit a patch to print stacks for running
> tasks as well, but nobody seems interested. It would at least give a
> chance to see something.

I've hit same twice recently (not pan, and not repeatable).



Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Andi Kleen
Anton Salikhmetov <[EMAIL PROTECTED]> writes:

You should probably put your design document somewhere in Documentation
with a patch.

> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> + unsigned long addr;
> +
> + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> + spinlock_t *ptl;
> + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> + pud_t *pud = pud_offset(pgd, addr);
> + pmd_t *pmd = pmd_offset(pud, addr);
> + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This means on i386 with highmem ptes you will map/flush tlb/unmap each
PTE individually. You will do 512 times as much work as really needed
per PTE leaf page.

The performance critical address space walkers use a different design
pattern that avoids this.

> + if (pte_dirty(*pte) && pte_write(*pte)) {
> + pte_t entry = ptep_clear_flush(vma, addr, pte);

Flushing TLBs unbatched can also be very expensive because if the MM is
shared by several CPUs you'll have an inter-processor interrupt for 
each iteration. They are quite costly even on smaller systems.

It would be better if you did a single flush_tlb_range() at the end.
This means on x86 this will currently always do a full flush, but that's
still better than really slowing down in the heavily multithreaded case.
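
As a rough illustration of the two costs described above, the following
userspace C model counts how often a PTE leaf page would be mapped and how
many flush requests would be issued, first for a per-address walk and then
for a walk that handles each leaf page under one mapping and flushes once at
the end. This is only a sketch, not kernel code: struct walk_cost and both
functions are hypothetical names, every entry is assumed to need a flush,
and the 512 entries per leaf page are taken from the figure quoted above.

```c
#include <assert.h>

#define PTRS_PER_PTE 512UL  /* entries per PTE leaf page (figure from the review) */

struct walk_cost {
    unsigned long maps;     /* times a PTE leaf page was mapped */
    unsigned long flushes;  /* flush requests issued */
};

/* Per-address walk: one map and one flush for every page in the range. */
static struct walk_cost walk_per_address(unsigned long first, unsigned long npages)
{
    struct walk_cost c = { 0, 0 };

    assert(first + npages >= first);    /* range must not wrap */
    for (unsigned long idx = first; idx < first + npages; idx++) {
        c.maps++;
        c.flushes++;
    }
    return c;
}

/* Per-leaf walk: map each leaf page once, flush the whole range once. */
static struct walk_cost walk_per_leaf(unsigned long first, unsigned long npages)
{
    struct walk_cost c = { 0, 0 };
    unsigned long idx = first, end = first + npages;

    assert(end >= first);               /* range must not wrap */
    while (idx < end) {
        /* End of the leaf page containing idx, clamped to the range. */
        unsigned long leaf_end = (idx / PTRS_PER_PTE + 1) * PTRS_PER_PTE;

        if (leaf_end > end)
            leaf_end = end;
        c.maps++;                       /* one mapping covers idx..leaf_end */
        idx = leaf_end;
    }
    if (npages)
        c.flushes = 1;                  /* single range flush at the end */
    return c;
}
```

For a range of 1024 pages starting on a leaf boundary, the first walk counts
1024 maps and 1024 flushes, the second 2 maps and 1 flush.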

-Andi


Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings

2008-01-21 Thread Dave Young
On Jan 22, 2008 5:14 AM, Ilpo Järvinen <[EMAIL PROTECTED]> wrote:
>
> On Mon, 21 Jan 2008, Dave Young wrote:
>
> > Please see the kernel messages following (triggered while using some qemu
> > session)
> > BTW, seems there's some e100 error message as well.
> >
> > PCI: Setting latency timer of device :00:1b.0 to 64
> > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > e100: Copyright(c) 1999-2006 Intel Corporation
> > ACPI: PCI Interrupt :03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > modprobe:2331 conflicting cache attribute efaff000-efb0 
> > uncached<->default
> > e100: :03:08.0: e100_probe: Cannot map device registers, aborting.
> > ACPI: PCI interrupt for device :03:08.0 disabled
> > e100: probe of :03:08.0 failed with error -12
> > eth0:  setting full-duplex.
> > [ cut here ]
> > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
> > snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel 
> > snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw 
> > intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore 
> > dcdbas 3c59x pcspkr snd_page_alloc
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> >  [] ? printk+0x0/0x20
> >  [] warn_on_slowpath+0x54/0x80
> >  [] ? ip_finish_output+0x128/0x2e0
> >  [] ? ip_output+0xe7/0x100
> >  [] ? ip_local_out+0x18/0x20
> >  [] ? ip_queue_xmit+0x3dc/0x470
> >  [] ? _spin_unlock_irqrestore+0x5e/0x70
> >  [] ? check_pad_bytes+0x61/0x80
> >  [] tcp_mark_head_lost+0x121/0x150
> >  [] tcp_update_scoreboard+0x4c/0x170
> >  [] tcp_fastretrans_alert+0x48a/0x6b0
> >  [] tcp_ack+0x1b3/0x3a0
> >  [] tcp_rcv_established+0x3eb/0x710
> >  [] tcp_v4_do_rcv+0xe5/0x100
> >  [] tcp_v4_rcv+0x5db/0x660
>
> Doh, once more these S+L things..., the rest are symptom of the first
> problem.

What is the S+L thing? Could you explain a bit?

>
> What is strange is that it doesn't show up until now, the last TCP
> changes that could have some significance are from early Dec/Nov. Is
> there some reason why you haven't seen this before this (e.g., not
> tested with similar cfg or so)?

Hmm, don't know how to answer ...

> I'm a bit worried about its
> reproducibility if it takes this far to see it...
>
>
> --
>  i.
>


Re: [PATCH 0/6] RFC: Typesafe callbacks

2008-01-21 Thread Andi Kleen
Rusty Russell <[EMAIL PROTECTED]> writes:
> ===
> Attempt to create callbacks which take unsigned long as well as
> correct pointer types.

FWIW I had something similar using the gcc union extension at some
point for ioctls because I was tired of all the ugly casts from
unsigned long arg to void * in ioctl handlers.

But I decided to not push it because sparse would have likely choked 
on it, and sparse actually finds a lot of bugs so it's more important than
having a few more casts.
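
The message does not say which union extension was used. One candidate is
GCC's transparent_union attribute, and the sketch below assumes that reading.
The type handler_arg_t and both functions are hypothetical, and the example
relies on unsigned long and void * having the same size, as they do on the
usual Linux ABIs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical argument type. With the transparent_union attribute a
 * caller may pass either member type directly, without a cast.
 */
typedef union {
    unsigned long ul;
    void *ptr;
} handler_arg_t __attribute__((transparent_union));

/* Handler that wants the argument as a pointer to int. */
static int read_int(handler_arg_t arg)
{
    const int *p = arg.ptr;

    assert(p != NULL);
    return *p;
}

/* Handler that wants the argument as a plain number. */
static unsigned long read_ulong(handler_arg_t arg)
{
    return arg.ul;
}
```

A caller can then write read_int(&v) for an int v, or read_ulong(42UL), with
no cast at the call site; the casts move into the member access inside the
handler.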

-Andi


Re: [RFC] Parallelize IO for e2fsck

2008-01-21 Thread Valdis . Kletnieks
On Tue, 22 Jan 2008 14:38:30 +1100, David Chinner said:

> Perhaps instead of swapping immediately, a SIGLOWMEM could be sent
> to processes that aren't masking the signal, followed by a short
> grace period to allow the processes to free up some memory before
> swapping out pages from that process?

AIX had SIGDANGER some 15 years ago.  Admittedly, that was sent when
the system was about to hit OOM, not when it was about to start swapping.

I suspect both approaches have their merits...




[PATCH] ARM: Ignore memory tags with invalid data

2008-01-21 Thread Corey Minyard
From: Corey Minyard <[EMAIL PROTECTED]>

The DNS-323 system has several bogus memory entries in the tag table,
and it caused the system to crash at startup.  Ignore tag entries that
are obviously bogus.
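
The sanity test this describes can be sketched in plain userspace C as
follows. It is only an illustration: mem_tag_is_sane is a hypothetical name,
a 4 KB page size is assumed, and the separate check against the size of the
meminfo table is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u                 /* assumed page size */
#define PAGE_MASK (~(PAGE_SIZE - 1))    /* clears the in-page offset bits */

/* Returns true when a memory tag describes a usable bank. */
static bool mem_tag_is_sane(uint32_t start, uint32_t size)
{
    /* The mask arithmetic below only works for power-of-two page sizes. */
    assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0);

    if (size == 0)
        return false;                   /* empty bank */
    if (size & ~PAGE_MASK)
        return false;                   /* size is not a whole number of pages */
    if (start & ~PAGE_MASK)
        return false;                   /* start is not page aligned */
    return true;
}
```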

Signed-off-by: Corey Minyard <[EMAIL PROTECTED]>
---
 arch/arm/kernel/setup.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index bf56eb3..dfdb469 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -630,7 +630,12 @@ __tagtable(ATAG_CORE, parse_tag_core);
 
 static int __init parse_tag_mem32(const struct tag *tag)
 {
-   if (meminfo.nr_banks >= NR_BANKS) {
+   /*
+* Make sure that the memory size is non-zero, page aligned,
+* and that it doesn't overflow the meminfo table.
+*/
+   if (meminfo.nr_banks >= NR_BANKS || tag->u.mem.size & ~PAGE_MASK ||
+   tag->u.mem.size == 0 || tag->u.mem.start & ~PAGE_MASK) {
printk(KERN_WARNING
   "Ignoring memory bank 0x%08x size %dKB\n",
tag->u.mem.start, tag->u.mem.size / 1024);


[patch] mm: fix PageUptodate data race

2008-01-21 Thread Nick Piggin

After running SetPageUptodate, preceding stores to the page contents to
actually bring it uptodate may not be ordered with the store to set the page
uptodate.

Therefore, another CPU which checks PageUptodate is true, then reads the
page contents can get stale data.

Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after
PageUptodate.
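
The pairing can be sketched in userspace with C11 atomics. This is an
analogy under stated assumptions, not the kernel code: struct fake_page and
both functions are hypothetical, and the release and acquire fences are the
closest portable stand-ins for smp_wmb and smp_rmb (they are somewhat
stronger than the kernel barriers).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct page: a payload plus an "uptodate" flag. */
struct fake_page {
    char data[64];
    atomic_bool uptodate;
};

/*
 * Writer: fill the contents, then publish. The release fence plays the
 * role of smp_wmb(): the data stores are ordered before the flag store.
 */
static void fake_page_fill(struct fake_page *p, const char *src)
{
    assert(p != NULL && src != NULL);
    strncpy(p->data, src, sizeof(p->data) - 1);
    p->data[sizeof(p->data) - 1] = '\0';
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->uptodate, true, memory_order_relaxed);
}

/*
 * Reader: test the flag and, only if it is set, issue the acquire fence
 * (the smp_rmb() analogue) before the caller reads the contents.
 */
static bool fake_page_uptodate(struct fake_page *p)
{
    bool ret = atomic_load_explicit(&p->uptodate, memory_order_relaxed);

    if (ret)
        atomic_thread_fence(memory_order_acquire);
    return ret;
}
```

As in the patch, the reader skips the barrier when the flag is clear, since
it will not read the contents in that case.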

Many places that test PageUptodate, do so with the page locked, and this
would be enough to ensure memory ordering in those places if SetPageUptodate
were only called while the page is locked. Unfortunately that is not always
the case for some filesystems, but it could be an idea for the future.

Also bring the handling of anonymous page uptodateness in line with that of
file backed page management, by marking anon pages as uptodate when they _are_
uptodate, rather than when our implementation requires that they be marked as
such. Doing so allows us to get rid of the smp_wmb's in the page copying
functions, which were especially added for anonymous pages for an analogous
memory ordering problem. Both file and anonymous pages are handled with the
same barriers.

FAQ:
Q. Why not do this in flush_dcache_page?
A. Firstly, flush_dcache_page handles only one side (the smp_wmb side) of the
ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away
memory barriers in a completely unrelated function is nasty; at least in the
PageUptodate macros, they are located together with (half) the operations
involved in the ordering. Thirdly, the smp_wmb is only required when first
bringing the page uptodate, whereas flush_dcache_page should be called each time
it is written to through the kernel mapping. It is logically the wrong place to
put it.

Q. Why does this increase my text size / reduce my performance / etc.
A. Because it is adding the necessary instructions to eliminate the data-race.

Q. Can it be improved?
A. Yes, eg. if you were to create a rule that all SetPageUptodate operations
run under the page lock, we could avoid the smp_rmb places where PageUptodate
is queried under the page lock. Requires audit of all filesystems and at least
some would need reworking. That's great you're interested, I'm eagerly awaiting
your patches.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
---
Index: linux-2.6/include/linux/highmem.h
===
--- linux-2.6.orig/include/linux/highmem.h
+++ linux-2.6/include/linux/highmem.h
@@ -68,8 +68,6 @@ static inline void clear_user_highpage(s
void *addr = kmap_atomic(page, KM_USER0);
clear_user_page(addr, vaddr, page);
kunmap_atomic(addr, KM_USER0);
-   /* Make sure this page is cleared on other CPU's too before using it */
-   smp_wmb();
 }
 
 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
@@ -160,8 +158,6 @@ static inline void copy_user_highpage(st
copy_user_page(vto, vfrom, vaddr, to);
kunmap_atomic(vfrom, KM_USER0);
kunmap_atomic(vto, KM_USER1);
-   /* Make sure this page is cleared on other CPU's too before using it */
-   smp_wmb();
 }
 
 #endif
Index: linux-2.6/include/linux/page-flags.h
===
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -131,16 +131,52 @@
 #define ClearPageReferenced(page)  clear_bit(PG_referenced, &(page)->flags)
 #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, 
&(page)->flags)
 
-#define PageUptodate(page) test_bit(PG_uptodate, &(page)->flags)
+static inline int PageUptodate(struct page *page)
+{
+   int ret = test_bit(PG_uptodate, &(page)->flags);
+
+   /*
+* Must ensure that the data we read out of the page is loaded
+* _after_ we've loaded page->flags to check for PageUptodate.
+* We can skip the barrier if the page is not uptodate, because
+* we wouldn't be reading anything from it.
+*
+* See SetPageUptodate() for the other side of the story.
+*/
+   if (ret)
+   smp_rmb();
+
+   return ret;
+}
+
+static inline void __SetPageUptodate(struct page *page)
+{
+   smp_wmb();
+   __set_bit(PG_uptodate, &(page)->flags);
 #ifdef CONFIG_S390
+   page_clear_dirty(page);
+#endif
+}
+
 static inline void SetPageUptodate(struct page *page)
 {
+#ifdef CONFIG_S390
if (!test_and_set_bit(PG_uptodate, &page->flags))
page_clear_dirty(page);
-}
 #else
-#define SetPageUptodate(page)  set_bit(PG_uptodate, &(page)->flags)
+   /*
+* Memory barrier must be issued before setting the PG_uptodate bit,
+* so that all previous stores issued in order to bring the page
+* uptodate are actually visible before PageUptodate becomes true.
+*
+* s390 doesn't need an explicit smp_wmb here because the test and
+* set bit already provides full barriers.
+*/
+   smp_wmb();
+   set_bit(PG_uptodate, &(p

[PATCH 49/49] jbd2: sparse pointer use of zero as null

2008-01-21 Thread Theodore Ts'o
From: Mingming Cao <[EMAIL PROTECTED]>

Get rid of sparse related warnings from places that use integer as NULL
pointer.  (Ported from upstream ext3/jbd changes.)

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/jbd2/transaction.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 0c8adab..b9b0b6f 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1182,7 +1182,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct 
buffer_head *bh)
}
 
/* That test should have eliminated the following case: */
-   J_ASSERT_JH(jh, jh->b_frozen_data == 0);
+   J_ASSERT_JH(jh, jh->b_frozen_data == NULL);
 
JBUFFER_TRACE(jh, "file as BJ_Metadata");
spin_lock(&journal->j_list_lock);
@@ -1532,7 +1532,7 @@ void __jbd2_journal_temp_unlink_buffer(struct 
journal_head *jh)
 
J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
if (jh->b_jlist != BJ_None)
-   J_ASSERT_JH(jh, transaction != 0);
+   J_ASSERT_JH(jh, transaction != NULL);
 
switch (jh->b_jlist) {
case BJ_None:
@@ -1601,11 +1601,11 @@ __journal_try_to_free_buffer(journal_t *journal, struct 
buffer_head *bh)
if (buffer_locked(bh) || buffer_dirty(bh))
goto out;
 
-   if (jh->b_next_transaction != 0)
+   if (jh->b_next_transaction != NULL)
goto out;
 
spin_lock(&journal->j_list_lock);
-   if (jh->b_transaction != 0 && jh->b_cp_transaction == 0) {
+   if (jh->b_transaction != NULL && jh->b_cp_transaction == NULL) {
if (jh->b_jlist == BJ_SyncData || jh->b_jlist == BJ_Locked) {
/* A written-back ordered data buffer */
JBUFFER_TRACE(jh, "release data");
@@ -1613,7 +1613,7 @@ __journal_try_to_free_buffer(journal_t *journal, struct 
buffer_head *bh)
jbd2_journal_remove_journal_head(bh);
__brelse(bh);
}
-   } else if (jh->b_cp_transaction != 0 && jh->b_transaction == 0) {
+   } else if (jh->b_cp_transaction != NULL && jh->b_transaction == NULL) {
/* written-back checkpointed metadata buffer */
if (jh->b_jlist == BJ_None) {
JBUFFER_TRACE(jh, "remove from checkpoint list");
@@ -1973,7 +1973,7 @@ void __jbd2_journal_file_buffer(struct journal_head *jh,
 
J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
J_ASSERT_JH(jh, jh->b_transaction == transaction ||
-   jh->b_transaction == 0);
+   jh->b_transaction == NULL);
 
if (jh->b_transaction && jh->b_jlist == jlist)
return;
-- 
1.5.4.rc3.31.g1271-dirty



Re: [RFC] Parallelize IO for e2fsck

2008-01-21 Thread David Chinner
On Mon, Jan 21, 2008 at 04:00:41PM -0700, Andreas Dilger wrote:
> On Jan 16, 2008  13:30 -0800, Valerie Henson wrote:
> > I have a partial solution that sort of blindly manages the buffer
> > cache.  First, the user passes e2fsck a parameter saying how much
> > memory is available as buffer cache.  The readahead thread reads
> > things in and immediately throws them away so they are only in buffer
> > cache (no double-caching).  Then readahead and e2fsck work together so
> > that readahead only reads in new blocks when the main thread is done
> > with earlier blocks.  The already-used blocks get kicked out of buffer
> > cache to make room for the new ones.
> >
> > What would be nice is to take into account the current total memory
> > usage of the whole fsck process and factor that in.  I don't think it
> > would be hard to add to the existing cache management framework.
> > Thoughts?
> 
> I discussed this with Ted at one point also.  This is a generic problem,
> not just for readahead, because "fsck" can run multiple e2fsck in parallel
> and in case of many large filesystems on a single node this can cause
> memory usage problems also.
> 
> What I was proposing is that "fsck.{fstype}" be modified to return an
> estimated minimum amount of memory needed, and some "desired" amount of
> memory (i.e. readahead) to fsck the filesystem, using some parameter like
> "fsck.{fstype} --report-memory-needed /dev/XXX".  If this does not
> return the output in the expected format, or returns an error then fsck
> will assume some amount of memory based on the device size and continue
> as it does today.

And while fsck is running, some other program runs that uses
memory and blows your carefully calculated parameters to smithereens?

I think there is a clear need for applications to be able to
register a callback from the kernel to indicate that the machine as
a whole is running out of memory and that the application should
trim its caches to reduce memory utilisation.

Perhaps instead of swapping immediately, a SIGLOWMEM could be sent
to processes that aren't masking the signal, followed by a short
grace period to allow the processes to free up some memory before
swapping out pages from that process?

With this sort of feedback, the fsck process can scale back its
readahead and remove cached info that is not critical to what it
is currently doing and thereby prevent readahead thrashing as
memory usage of the fsck process itself grows.
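
From the application side, the feedback loop might look roughly like the
sketch below. SIGLOWMEM is only a proposal in this thread, so SIGUSR1 is used
as a stand-in; the function names and the policy of halving the cache budget
are hypothetical and chosen just for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define SIG_LOWMEM_STANDIN SIGUSR1  /* SIGLOWMEM does not exist; stand-in only */

static volatile sig_atomic_t lowmem_pending;

/* Handler does the minimum: it only records that the signal arrived. */
static void on_lowmem(int signo)
{
    (void)signo;
    lowmem_pending = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int lowmem_watch(void)
{
    return signal(SIG_LOWMEM_STANDIN, on_lowmem) == SIG_ERR ? -1 : 0;
}

/*
 * Called from the main loop. If a low-memory notification is pending,
 * halve the cache budget (never going below min_blocks) and clear the
 * flag. Returns the budget to use from now on, in blocks.
 */
static size_t lowmem_adjust(size_t cache_blocks, size_t min_blocks)
{
    size_t reduced;

    assert(min_blocks <= cache_blocks);
    if (!lowmem_pending)
        return cache_blocks;
    lowmem_pending = 0;
    reduced = cache_blocks / 2;
    return reduced < min_blocks ? min_blocks : reduced;
}
```

The real trimming work stays in the main loop rather than in the handler, so
the handler remains safe to run at any point.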

Another example where this could be useful is to tell browsers to
release some of their cache rather than having the VM swap it out.

IMO, a scheme like this will be far more reliable than trying to
guess what the optimal settings are going to be over the whole
lifetime of a process.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: W1: w1_slave units, standardize 1C or .001C? Break API

2008-01-21 Thread H. Peter Anvin

H. Peter Anvin wrote:

David Fries wrote:

The ds18b20 one wire temperature sensor conversion routine is
returning the units in degrees C while the ds1820 (ds18s20) is
returning it in .001 degrees C.  20C vs 20312C.  Once you know the
units I'm liking the latter as it gives a higher precision.  Time to
break user applications so the driver can give the temperature in the
same units for both sensors.

I only have the ds18b20 sensor model.  Here is the current output from
the sys file for this sensor.
/sys/devices/w1_bus_master1/28-000e84a2/w1_slave
45 01 4b 46 7f ff 0b 10 84 : crc=84 YES
45 01 4b 46 7f ff 0b 10 84 t=20

I ran the example data from the specification for the ds1820 through
its conversion routine and found that t= was 1000 times the value.
What should the displayed units be?
This is the same as the ds18b20 conversion *1000.  Is everyone OK with, or is
anyone objecting to, .001 degrees C for the units?  Patch will follow.  The
.001 C does truncate one bit of precision from the ds18b20 by the way.
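
For reference, the proposed .001 degree C output for the ds18b20 could be
computed as in the sketch below. The function names are hypothetical and this
is not the driver's code; it assumes the sensor's default 12-bit resolution,
where the first two scratchpad bytes form a signed count of 1/16 degree C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the first two scratchpad bytes (LSB, MSB) to millidegrees
 * Celsius. The integer division drops anything below 0.001 degree,
 * which is the lost bit of precision mentioned above.
 */
static int ds18b20_millicelsius(uint8_t lsb, uint8_t msb)
{
    int raw = ((int)msb << 8) | lsb;

    if (raw & 0x8000)
        raw -= 0x10000;         /* sign-extend the 16-bit reading */
    return raw * 1000 / 16;
}

/* Self-check using the sample bytes "45 01" from the sysfs output above. */
static void ds18b20_selfcheck(void)
{
    assert(ds18b20_millicelsius(0x45, 0x01) == 20312);
}
```

The raw value 0x0145 is 325 sixteenths, i.e. 20.3125 degrees C, which comes
out as t=20312.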



Millikelvins would have the nice property of never being negative.  :)



Alternatively, centikelvins would fit nicely in 16 bits if anyone cares...

655.35 K = 382.20 °C = 719.96 °F

-hpa


Re: [PATCH] Use separate sections for __dev/__cpu/__mem code/data

2008-01-21 Thread Sam Ravnborg
On Tue, Jan 22, 2008 at 09:56:57AM +0900, Paul Mundt wrote:
> On Mon, Jan 21, 2008 at 01:06:41PM +0100, Sam Ravnborg wrote:
> > On Mon, Jan 21, 2008 at 07:52:57PM +0900, Paul Mundt wrote:
> > > On Mon, Jan 21, 2008 at 11:47:45AM +0100, Sam Ravnborg wrote:
> > > > On Mon, Jan 21, 2008 at 11:45:06AM +0100, Sam Ravnborg wrote:
> > > > > On Mon, Jan 21, 2008 at 11:29:52AM +0100, Andreas Schwab wrote:
> > > > > > Sam Ravnborg <[EMAIL PROTECTED]> writes:
> > > > > > 
> > > > > > > On Mon, Jan 21, 2008 at 04:33:41PM +0900, Paul Mundt wrote:
> > > > > > >> so the ## is being taken directly rather than acting as a 
> > > > > > >> concatenation.
> > > > > > >
> > > > > > > Strange...
> > > > > > > I can reproduce with gcc 3.4.5 here - will fix.
> > > > > > 
> > > > > > The ## operator does not work with -traditional.
> > > > > 
> > > > > Crap - then it breaks at the following architectures:
> > > > > sh64, s390, m68k, m32r
> > > > > 
> > > > > Thanks Andreas.
> > > > 
> > > > OK - I was too quick it seem.
> > > > sh has:
> > > > arch/sh/Makefile:CPPFLAGS_vmlinux.lds := -traditional
> > > > 
> > > > So this needs to be ripped out as it is not needed.
> > > > 
> > > Yes, that can be killed. If this is aimed at 2.6.25, I'll just kill it
> > > off in my tree. Otherwise, feel free to roll this in to your patch set.
> > 
> > It is aimed for 2.6.25 and it looks reasonable to reach that goal.
> > So please take this patch in your tree.
> > 
> Done.
Thanks Paul,
and thanks for the prompt testing!

Sam


[PATCH 32/49] jbd2: jbd2 stats through procfs

2008-01-21 Thread Theodore Ts'o
From: Johann Lombardi <[EMAIL PROTECTED]>

The patch below updates the jbd stats patch to 2.6.20/jbd2.
The initial patch was posted by Alex Tomas in December 2005
(http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
It provides statistics via procfs such as transaction lifetime and size.

Sometimes, when investigating performance problems, I find it useful to have
stats from jbd about a transaction's lifetime, size, etc. Here is a
patch for review and, possibly, inclusion.

For example, stats after creation of 3M files in an htree directory:

[EMAIL PROTECTED] ~]# cat /proc/fs/jbd/sda/history
R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
R261   8260  2720  0 0 750   9892   8170  8187
C259750   0 4885  1
R262   202200  100 770   9836   8170  8187
R263   302200  100 3070  9812   8170  8187
R264   0 5000  100 1340  0  0 0
C2618240  3212  4957  0
R265   8260  1470  0 0 4640  9854   8170  8187
R266   0 5000  100 1460  0  0 0
C2628210  2989  4868  0
R267   8230  1490  100 4440  9875   8171  8188
R268   0 5000  100 1260  0  0 0
C2637710  2937  4908  0
R269   7730  1470  100 3330  9841   8170  8187
R270   0 5000  100 830   0  0 0
C2658140  3234  4898  0
C267720   0 4849  1
R271   8630  2740  200 740   9819   8170  8187
C269800   0 4214  1
R272   402170  100 830   9716   8170  8187
R273   402280  0 0 3530  9799   8170  8187
R274   0 5000  100 990   0  0 0


where,

R - line for transaction's life from T_RUNNING to T_FINISHED
C - line for transaction's checkpointing
tid   - transaction's id
wait  - for how long we were waiting for new transaction to start
 (the longest period journal_start() took in this transaction)
run   - real transaction's lifetime (from T_RUNNING to T_LOCKED)
lock  - how long we were waiting for all handles to close
 (time the transaction was in T_LOCKED)
flush - how long it took to flush all data (data=ordered)
log   - how long it took to write the transaction to the log
hndls - how many handles got to the transaction
block - how many blocks got to the transaction
inlog - how many blocks are written to the log (block + descriptors)
ctime - how long it took to checkpoint the transaction
write - how many blocks have been written during checkpointing
drop  - how many blocks have been dropped during checkpointing
close - how many running transactions have been closed to checkpoint this one

all times are in msec.


[EMAIL PROTECTED] ~]# cat /proc/fs/jbd/sda/info
280 transaction, each upto 8192 blocks
average:
  1633ms waiting for transaction
  3616ms running transaction
  5ms transaction was being locked
  1ms flushing data (in ordered mode)
  1799ms logging transaction
  11781 handles per transaction
  5629 blocks per transaction
  5641 logged blocks per transaction

Signed-off-by: Johann Lombardi <[EMAIL PROTECTED]>
Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
---
 fs/jbd2/checkpoint.c  |   10 +-
 fs/jbd2/commit.c  |   49 +++
 fs/jbd2/journal.c |  338 +
 fs/jbd2/transaction.c |9 ++
 include/linux/jbd2.h  |   77 +++
 5 files changed, 481 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 7e958c8..1b7f282 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -232,7 +232,8 @@ __flush_batch(journal_t *journal, struct buffer_head **bhs, 
int *batch_count)
  * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
  */
 static int __process_buffer(journal_t *journal, struct journal_head *jh,
-   struct buffer_head **bhs, int *batch_count)
+   struct buffer_head **bhs, int *batch_count,
+   transaction_t *transaction)
 {
struct buffer_head *bh = jh2bh(jh);
int ret = 0;
@@ -250,6 +251,7 @@ static int __process_buffer(journal_t *journal, struct 
journal_head *jh,
transaction_t *t = jh->b_transaction;
tid_t tid = t->t_tid;
 
+   transaction->t_chp_stats.cs_forced_to_close++;
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
jbd2_log_start_commit(journal, tid);
@@ -279,6 +281,7 @@ static int __process_buffer(journal_t *j

[PATCH 06/49] ext4: fixes block group number being set to a negative value

2008-01-21 Thread Theodore Ts'o
From: Avantika Mathur <[EMAIL PROTECTED]>

This patch fixes various places where the group number is set to a negative
value.
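
The shape of the fix is to return a status code and hand the group number
back through a pointer, so that -1 is never stored in the group variable
itself. A minimal userspace sketch follows. It assumes ext4_group_t is an
unsigned type (as the removed "best_group >= 0" test suggests), and group_t
and find_best_group are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int group_t;   /* stand-in for the unsigned ext4_group_t */

/*
 * Pick the group with the most free blocks. Returns 0 and stores the
 * index in *best on success; returns -1 and leaves *best untouched if
 * no group has any free blocks.
 */
static int find_best_group(const unsigned int *free_blocks, group_t ngroups,
                           group_t *best)
{
    int ret = -1;
    unsigned int best_free = 0;

    assert(free_blocks != NULL && best != NULL);
    for (group_t g = 0; g < ngroups; g++) {
        if (free_blocks[g] > best_free) {
            best_free = free_blocks[g];
            *best = g;
            ret = 0;
        }
    }
    return ret;
}
```

With an unsigned group type, a sentinel of -1 becomes the largest
representable value and a ">= 0" test is always true, which is why the
status has to travel separately.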

Signed-off-by: Avantika Mathur <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/ialloc.c |  101 -
 1 files changed, 53 insertions(+), 48 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 64dea86..7b5cfa6 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -260,12 +260,14 @@ error_return:
  * For other inodes, search forward from the parent directory\'s block
  * group to find a free inode.
  */
-static ext4_group_t find_group_dir(struct super_block *sb, struct inode 
*parent)
+static int find_group_dir(struct super_block *sb, struct inode *parent,
+   ext4_group_t *best_group)
 {
ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count;
unsigned int freei, avefreei;
struct ext4_group_desc *desc, *best_desc = NULL;
-   ext4_group_t group, best_group = -1;
+   ext4_group_t group;
+   int ret = -1;
 
freei = 
percpu_counter_read_positive(&EXT4_SB(sb)->s_freeinodes_counter);
avefreei = freei / ngroups;
@@ -279,11 +281,12 @@ static ext4_group_t find_group_dir(struct super_block 
*sb, struct inode *parent)
if (!best_desc ||
(le16_to_cpu(desc->bg_free_blocks_count) >
 le16_to_cpu(best_desc->bg_free_blocks_count))) {
-   best_group = group;
+   *best_group = group;
best_desc = desc;
+   ret = 0;
}
}
-   return best_group;
+   return ret;
 }
 
 /*
@@ -314,8 +317,8 @@ static ext4_group_t find_group_dir(struct super_block *sb, 
struct inode *parent)
 #define INODE_COST 64
 #define BLOCK_COST 256
 
-static ext4_group_t find_group_orlov(struct super_block *sb,
- struct inode *parent)
+static int find_group_orlov(struct super_block *sb, struct inode *parent,
+   ext4_group_t *group)
 {
ext4_group_t parent_group = EXT4_I(parent)->i_block_group;
struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -328,7 +331,7 @@ static ext4_group_t find_group_orlov(struct super_block *sb,
unsigned int ndirs;
int max_debt, max_dirs, min_inodes;
ext4_grpblk_t min_blocks;
-   ext4_group_t group = -1, i;
+   ext4_group_t i;
struct ext4_group_desc *desc;
 
freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
@@ -341,13 +344,14 @@ static ext4_group_t find_group_orlov(struct super_block 
*sb,
if ((parent == sb->s_root->d_inode) ||
(EXT4_I(parent)->i_flags & EXT4_TOPDIR_FL)) {
int best_ndir = inodes_per_group;
-   ext4_group_t best_group = -1;
+   ext4_group_t grp;
+   int ret = -1;
 
-   get_random_bytes(&group, sizeof(group));
-   parent_group = (unsigned)group % ngroups;
+   get_random_bytes(&grp, sizeof(grp));
+   parent_group = (unsigned)grp % ngroups;
for (i = 0; i < ngroups; i++) {
-   group = (parent_group + i) % ngroups;
-   desc = ext4_get_group_desc (sb, group, NULL);
+   grp = (parent_group + i) % ngroups;
+   desc = ext4_get_group_desc(sb, grp, NULL);
if (!desc || !desc->bg_free_inodes_count)
continue;
if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
@@ -356,11 +360,12 @@ static ext4_group_t find_group_orlov(struct super_block 
*sb,
continue;
if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb)
continue;
-   best_group = group;
+   *group = grp;
+   ret = 0;
best_ndir = le16_to_cpu(desc->bg_used_dirs_count);
}
-   if (best_group >= 0)
-   return best_group;
+   if (ret == 0)
+   return ret;
goto fallback;
}
 
@@ -381,8 +386,8 @@ static ext4_group_t find_group_orlov(struct super_block *sb,
max_debt = 1;
 
for (i = 0; i < ngroups; i++) {
-   group = (parent_group + i) % ngroups;
-   desc = ext4_get_group_desc (sb, group, NULL);
+   *group = (parent_group + i) % ngroups;
+   desc = ext4_get_group_desc(sb, *group, NULL);
if (!desc || !desc->bg_free_inodes_count)
continue;
if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs)
@@ -391,17 +396,16 @@ static ext4_group_t find_group_orlov(struct super_block 
*sb,
conti

[PATCH 03/49] ext4: Introduce ext4_lblk_t

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

This patch adds a new data type ext4_lblk_t to represent
the logical file blocks.

This is the preparatory patch to support large files in ext4.
The follow-up patch will convert the ext4_inode i_blocks to
represent the number of blocks in file system block size. This
change makes it possible to have a block number of 2**32 - 1, which
will result in overflow if the block number is represented by
a signed long. This patch converts all the block numbers to type
ext4_lblk_t, which is a typedef for __u32.

Also remove the dead code ext4_ext_walk_space.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
---
 fs/ext4/dir.c   |2 +-
 fs/ext4/extents.c   |  218 ---
 fs/ext4/inode.c |   34 ---
 fs/ext4/namei.c |   54 ++-
 fs/ext4/super.c |4 +-
 include/linux/ext4_fs.h |   29 --
 include/linux/ext4_fs_extents.h |   19 +---
 include/linux/ext4_fs_i.h   |9 +-
 8 files changed, 143 insertions(+), 226 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 145a9c0..33888bb 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -124,7 +124,7 @@ static int ext4_readdir(struct file * filp,
offset = filp->f_pos & (sb->s_blocksize - 1);
 
while (!error && !stored && filp->f_pos < inode->i_size) {
-   unsigned long blk = filp->f_pos >> EXT4_BLOCK_SIZE_BITS(sb);
+   ext4_lblk_t blk = filp->f_pos >> EXT4_BLOCK_SIZE_BITS(sb);
struct buffer_head map_bh;
struct buffer_head *bh = NULL;
 
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8528774..19d8059 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -144,7 +144,7 @@ static int ext4_ext_dirty(handle_t *handle, struct inode 
*inode,
 
 static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
  struct ext4_ext_path *path,
- ext4_fsblk_t block)
+ ext4_lblk_t block)
 {
struct ext4_inode_info *ei = EXT4_I(inode);
ext4_fsblk_t bg_start;
@@ -367,13 +367,14 @@ static void ext4_ext_drop_refs(struct ext4_ext_path *path)
  * the header must be checked before calling this
  */
 static void
-ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int 
block)
+ext4_ext_binsearch_idx(struct inode *inode,
+   struct ext4_ext_path *path, ext4_lblk_t block)
 {
struct ext4_extent_header *eh = path->p_hdr;
struct ext4_extent_idx *r, *l, *m;
 
 
-   ext_debug("binsearch for %d(idx):  ", block);
+   ext_debug("binsearch for %lu(idx):  ", (unsigned long)block);
 
l = EXT_FIRST_INDEX(eh) + 1;
r = EXT_LAST_INDEX(eh);
@@ -425,7 +426,8 @@ ext4_ext_binsearch_idx(struct inode *inode, struct 
ext4_ext_path *path, int bloc
  * the header must be checked before calling this
  */
 static void
-ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int block)
+ext4_ext_binsearch(struct inode *inode,
+   struct ext4_ext_path *path, ext4_lblk_t block)
 {
struct ext4_extent_header *eh = path->p_hdr;
struct ext4_extent *r, *l, *m;
@@ -438,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, struct 
ext4_ext_path *path, int block)
return;
}
 
-   ext_debug("binsearch for %d:  ", block);
+   ext_debug("binsearch for %lu:  ", (unsigned long)block);
 
l = EXT_FIRST_EXTENT(eh) + 1;
r = EXT_LAST_EXTENT(eh);
@@ -494,7 +496,8 @@ int ext4_ext_tree_init(handle_t *handle, struct inode 
*inode)
 }
 
 struct ext4_ext_path *
-ext4_ext_find_extent(struct inode *inode, int block, struct ext4_ext_path 
*path)
+ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
+   struct ext4_ext_path *path)
 {
struct ext4_extent_header *eh;
struct buffer_head *bh;
@@ -979,8 +982,8 @@ repeat:
/* refill path */
ext4_ext_drop_refs(path);
path = ext4_ext_find_extent(inode,
-   le32_to_cpu(newext->ee_block),
-   path);
+   (ext4_lblk_t)le32_to_cpu(newext->ee_block),
+   path);
if (IS_ERR(path))
err = PTR_ERR(path);
} else {
@@ -992,8 +995,8 @@ repeat:
/* refill path */
ext4_ext_drop_refs(path);
path = ext4_ext_find_extent(inode,
-   le32_to_cpu(newext->ee_block),
-   path);
+  (ext4_lblk_t)le32_to_cpu(newext->ee_block),
+   path);
if (IS

[PATCH 35/49] ext4: Add inode version support in ext4

2008-01-21 Thread Theodore Ts'o
From: Jean Noel Cordenner <[EMAIL PROTECTED]>

This patch adds 64-bit inode version support to ext4. The lower 32 bits
are stored in the osd1.linux1.l_i_version field, while the high 32 bits
are stored in the i_version_hi field newly added to the ext4_inode.
The high field is only used when the ext4_inode is large enough to hold
it. An i_version mount option has been added to enable the feature.
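The split into a low and a high 32-bit half can be sketched in user-space C as follows. This is only an illustration: the struct and the toy_* function names are hypothetical, byte-order conversion is left out, and the has_hi argument stands in for the "inode is large enough" check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two on-disk fields (byte order ignored). */
struct toy_disk_version {
	uint32_t lo;	/* plays the role of osd1.linux1.l_i_version */
	uint32_t hi;	/* plays the role of i_version_hi */
};

/* Rebuild the 64-bit counter; has_hi models "the inode is large enough". */
static uint64_t toy_version_from_disk(const struct toy_disk_version *d,
				      int has_hi)
{
	uint64_t v;

	assert(d != NULL);
	v = d->lo;
	if (has_hi)
		v |= (uint64_t)d->hi << 32;
	return v;
}

/* Store the counter; the high half is dropped when there is no room. */
static void toy_version_to_disk(uint64_t v, struct toy_disk_version *d,
				int has_hi)
{
	assert(d != NULL);
	d->lo = (uint32_t)v;
	if (has_hi)
		d->hi = (uint32_t)(v >> 32);
}
```

With a small inode only the low 32 bits survive a round trip, so the counter effectively wraps at 2^32 in that case.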

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>
Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Signed-off-by: Jean Noel Cordenner <[EMAIL PROTECTED]>
---
 fs/ext4/inode.c |   18 +-
 fs/ext4/super.c |   10 --
 fs/inode.c  |   17 -
 include/linux/ext4_fs.h |6 +-
 include/linux/fs.h  |   16 +++-
 5 files changed, 45 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ee0bc3a..3c013e5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2780,6 +2780,13 @@ void ext4_read_inode(struct inode * inode)
EXT4_INODE_GET_XTIME(i_atime, inode, raw_inode);
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
 
+   inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+   if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+   if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+   inode->i_version |=
+   (__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
+   }
+
if (S_ISREG(inode->i_mode)) {
inode->i_op = &ext4_file_inode_operations;
inode->i_fop = &ext4_file_operations;
@@ -2962,8 +2969,14 @@ static int ext4_do_update_inode(handle_t *handle,
} else for (block = 0; block < EXT4_N_BLOCKS; block++)
raw_inode->i_block[block] = ei->i_data[block];
 
-   if (ei->i_extra_isize)
+   raw_inode->i_disk_version = cpu_to_le32(inode->i_version);
+   if (ei->i_extra_isize) {
+   if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+   raw_inode->i_version_hi =
+   cpu_to_le32(inode->i_version >> 32);
raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
+   }
+
 
BUFFER_TRACE(bh, "call ext4_journal_dirty_metadata");
rc = ext4_journal_dirty_metadata(handle, bh);
@@ -3190,6 +3203,9 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 {
int err = 0;
 
+   if (test_opt(inode->i_sb, I_VERSION))
+   inode_inc_iversion(inode);
+
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f7479d3..aa22acd 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -732,6 +732,8 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",nobh");
if (!test_opt(sb, EXTENTS))
seq_puts(seq, ",noextents");
+   if (test_opt(sb, I_VERSION))
+   seq_puts(seq, ",i_version");
 
if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
seq_puts(seq, ",data=journal");
@@ -874,7 +876,7 @@ enum {
Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-   Opt_grpquota, Opt_extents, Opt_noextents,
+   Opt_grpquota, Opt_extents, Opt_noextents, Opt_i_version,
 };
 
 static match_table_t tokens = {
@@ -928,6 +930,7 @@ static match_table_t tokens = {
{Opt_barrier, "barrier=%u"},
{Opt_extents, "extents"},
{Opt_noextents, "noextents"},
+   {Opt_i_version, "i_version"},
{Opt_err, NULL},
{Opt_resize, "resize"},
 };
@@ -1273,6 +1276,10 @@ clear_qf_name:
case Opt_noextents:
clear_opt (sbi->s_mount_opt, EXTENTS);
break;
+   case Opt_i_version:
+   set_opt(sbi->s_mount_opt, I_VERSION);
+   sb->s_flags |= MS_I_VERSION;
+   break;
default:
printk (KERN_ERR
"EXT4-fs: Unrecognized mount option \"%s\" "
@@ -3197,7 +3204,6 @@ out:
i_size_write(inode, off+len-towrite);
EXT4_I(inode)->i_disksize = inode->i_size;
}
-   inode->i_version++;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
ext4_mark_inode_dirty(handle, inode);
mutex_unlock(&inode->i_mutex);
diff --git a/fs/inode.c b/fs/inode.c
index b48324a..276ffd6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1243,23 +1243,6 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry)
 EXPORT_SYMBOL(touch_atime);
 
 /**
- * inode_inc_iversion  -   increments i_version
- * @i

[PATCH 38/49] ext4: fix up EXT4FS_DEBUG builds

2008-01-21 Thread Theodore Ts'o
From: Eric Sandeen <[EMAIL PROTECTED]>

Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail
without these changes.  Clean up some format warnings too.
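The format fixes in this patch follow a common pattern: a 64-bit block number is printed with %llu and, where the underlying type may vary by architecture, cast to unsigned long long. A minimal user-space sketch, assuming a hypothetical toy_fsblk_t typedef in place of ext4_fsblk_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t toy_fsblk_t;	/* hypothetical stand-in for ext4_fsblk_t */

/* Format a block number; the cast keeps %llu correct on every ABI. */
static int toy_format_goal(char *buf, size_t len, toy_fsblk_t goal)
{
	assert(buf != NULL);
	return snprintf(buf, len, "goal=%llu.", (unsigned long long)goal);
}
```

Using %lu here would truncate or misread the argument wherever unsigned long is 32 bits wide, which is the kind of warning the patch cleans up.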

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/balloc.c |6 +++---
 fs/ext4/ialloc.c |2 +-
 fs/ext4/resize.c |   16 
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 925e063..54d3da7 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -1630,7 +1630,7 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 
sbi = EXT4_SB(sb);
es = EXT4_SB(sb)->s_es;
-   ext4_debug("goal=%lu.\n", goal);
+   ext4_debug("goal=%llu.\n", goal);
/*
 * Allocate a block from reservation only when
 * filesystem is mounted with reservation(default,-o reservation), and
@@ -1740,7 +1740,7 @@ retry_alloc:
 
 allocated:
 
-   ext4_debug("using block group %d(%d)\n",
+   ext4_debug("using block group %lu(%d)\n",
group_no, gdp->bg_free_blocks_count);
 
BUFFER_TRACE(gdp_bh, "get_write_access");
@@ -1898,7 +1898,7 @@ ext4_fsblk_t ext4_count_free_blocks(struct super_block *sb)
brelse(bitmap_bh);
printk("ext4_count_free_blocks: stored = %llu"
", computed = %llu, %llu\n",
-  EXT4_FREE_BLOCKS_COUNT(es),
+   ext4_free_blocks_count(es),
desc_count, bitmap_count);
return bitmap_count;
 #else
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 17b5df1..575b521 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -857,7 +857,7 @@ unsigned long ext4_count_free_inodes (struct super_block * sb)
continue;
 
x = ext4_count_free(bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8);
-   printk("group %d: stored = %d, counted = %lu\n",
+   printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n",
i, le16_to_cpu(gdp->bg_free_inodes_count), x);
bitmap_count += x;
}
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 7090c2d..4fbba60 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -206,7 +206,7 @@ static int setup_new_group_blocks(struct super_block *sb,
}
 
if (ext4_bg_has_super(sb, input->group)) {
-   ext4_debug("mark backup superblock %#04lx (+0)\n", start);
+   ext4_debug("mark backup superblock %#04llx (+0)\n", start);
ext4_set_bit(0, bh->b_data);
}
 
@@ -215,7 +215,7 @@ static int setup_new_group_blocks(struct super_block *sb,
 i < gdblocks; i++, block++, bit++) {
struct buffer_head *gdb;
 
-   ext4_debug("update backup group %#04lx (+%d)\n", block, bit);
+   ext4_debug("update backup group %#04llx (+%d)\n", block, bit);
 
if ((err = extend_or_restart_transaction(handle, 1, bh)))
goto exit_bh;
@@ -243,7 +243,7 @@ static int setup_new_group_blocks(struct super_block *sb,
 i < reserved_gdb; i++, block++, bit++) {
struct buffer_head *gdb;
 
-   ext4_debug("clear reserved block %#04lx (+%d)\n", block, bit);
+   ext4_debug("clear reserved block %#04llx (+%d)\n", block, bit);
 
if ((err = extend_or_restart_transaction(handle, 1, bh)))
goto exit_bh;
@@ -256,10 +256,10 @@ static int setup_new_group_blocks(struct super_block *sb,
ext4_set_bit(bit, bh->b_data);
brelse(gdb);
}
-   ext4_debug("mark block bitmap %#04x (+%ld)\n", input->block_bitmap,
+   ext4_debug("mark block bitmap %#04llx (+%llu)\n", input->block_bitmap,
   input->block_bitmap - start);
ext4_set_bit(input->block_bitmap - start, bh->b_data);
-   ext4_debug("mark inode bitmap %#04x (+%ld)\n", input->inode_bitmap,
+   ext4_debug("mark inode bitmap %#04llx (+%llu)\n", input->inode_bitmap,
   input->inode_bitmap - start);
ext4_set_bit(input->inode_bitmap - start, bh->b_data);
 
@@ -268,7 +268,7 @@ static int setup_new_group_blocks(struct super_block *sb,
 i < sbi->s_itb_per_group; i++, bit++, block++) {
struct buffer_head *it;
 
-   ext4_debug("clear inode block %#04lx (+%d)\n", block, bit);
+   ext4_debug("clear inode block %#04llx (+%d)\n", block, bit);
 
if ((err = extend_or_restart_transaction(handle, 1, bh)))
goto exit_bh;
@@ -291,7 +291,7 @@ static int setup_new_group_blocks(struct super_block *sb,
brelse(bh);
 
/* Mark unused entries in inode bitmap used */
-   ext4_debug("clear inode bitmap %#04x (+%ld)\n",
+   ext4_debug("clear inode bitmap %#04llx (+%llu)\n",
   input->inode_bitmap, input->inode_bitmap - start);

[PATCH 34/49] vfs: Add 64 bit i_version support

2008-01-21 Thread Theodore Ts'o
From: Jean Noel Cordenner <[EMAIL PROTECTED]>

The i_version field of the inode is changed to be a 64-bit counter that
is set on every inode creation and that is incremented every time the
inode data is modified (similarly to the "ctime" time-stamp).
The aim is to fulfill an NFSv4 requirement from RFC 3530.
This first part concerns the VFS: it converts the 32-bit i_version in
the generic inode to 64 bits, adds a flag in the super block to check
whether the feature is enabled, and increments i_version in the VFS.
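The gating described above (increment only when the mount flag is set, and then mark the inode for syncing) can be sketched in user space roughly like this. The toy_* names and the struct layout are hypothetical; the flag value mirrors the MS_I_VERSION define in this patch, and the locking done by the kernel is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MS_I_VERSION (1UL << 23)	/* same bit as MS_I_VERSION in the patch */

/* Hypothetical, heavily reduced inode: just the counter and mount flags. */
struct toy_inode {
	uint64_t i_version;
	unsigned long sb_flags;
};

/*
 * Returns 1 when the inode must be written back, 0 otherwise.
 * The kernel version takes inode->i_lock around the increment.
 */
static int toy_update_version(struct toy_inode *inode)
{
	assert(inode != NULL);
	if (inode->sb_flags & TOY_MS_I_VERSION) {
		inode->i_version++;
		return 1;
	}
	return 0;
}
```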

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: Jean Noel Cordenner <[EMAIL PROTECTED]>
Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>
---
 fs/afs/dir.c   |9 +
 fs/afs/inode.c |3 ++-
 fs/inode.c |   22 ++
 include/linux/fs.h |5 -
 4 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 33fe39a..0cc3597 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -546,11 +546,11 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
dentry->d_op = &afs_fs_dentry_operations;
 
d_add(dentry, inode);
-   _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%lu }",
+   _leave(" = 0 { vn=%u u=%u } -> { ino=%lu v=%llu }",
   fid.vnode,
   fid.unique,
   dentry->d_inode->i_ino,
-  dentry->d_inode->i_version);
+  (unsigned long long)dentry->d_inode->i_version);
 
return NULL;
 }
@@ -630,9 +630,10 @@ static int afs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
 * been deleted and replaced, and the original vnode ID has
 * been reused */
if (fid.unique != vnode->fid.unique) {
-   _debug("%s: file deleted (uq %u -> %u I:%lu)",
+   _debug("%s: file deleted (uq %u -> %u I:%llu)",
   dentry->d_name.name, fid.unique,
-  vnode->fid.unique, dentry->d_inode->i_version);
+  vnode->fid.unique,
+  (unsigned long long)dentry->d_inode->i_version);
spin_lock(&vnode->lock);
set_bit(AFS_VNODE_DELETED, &vnode->flags);
spin_unlock(&vnode->lock);
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index d196840..84750c8 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -301,7 +301,8 @@ int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
 
inode = dentry->d_inode;
 
-   _enter("{ ino=%lu v=%lu }", inode->i_ino, inode->i_version);
+   _enter("{ ino=%lu v=%llu }", inode->i_ino,
+   (unsigned long long)inode->i_version);
 
generic_fillattr(inode, stat);
return 0;
diff --git a/fs/inode.c b/fs/inode.c
index ed35383..b48324a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1243,6 +1243,23 @@ void touch_atime(struct vfsmount *mnt, struct dentry *dentry)
 EXPORT_SYMBOL(touch_atime);
 
 /**
+ * inode_inc_iversion  -   increments i_version
+ * @inode: inode that need to be updated
+ *
+ * Every time the inode is modified, the i_version field
+ * will be incremented.
+ * The filesystem has to be mounted with i_version flag
+ *
+ */
+
+void inode_inc_iversion(struct inode *inode)
+{
+   spin_lock(&inode->i_lock);
+   inode->i_version++;
+   spin_unlock(&inode->i_lock);
+}
+
+/**
  * file_update_time-   update mtime and ctime time
  * @file: file accessed
  *
@@ -1276,6 +1293,11 @@ void file_update_time(struct file *file)
sync_it = 1;
}
 
+   if (IS_I_VERSION(inode)) {
+   inode_inc_iversion(inode);
+   sync_it = 1;
+   }
+
if (sync_it)
mark_inode_dirty_sync(inode);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b3ec4a4..94cf5d8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -124,6 +124,7 @@ extern int dir_notify_enable;
 #define MS_SHARED  (1<<20) /* change to shared */
 #define MS_RELATIME(1<<21) /* Update atime relative to mtime/ctime. */
 #define MS_KERNMOUNT   (1<<22) /* this is a kern_mount call */
+#define MS_I_VERSION   (1<<23) /* Update inode I_version field */
 #define MS_ACTIVE  (1<<30)
 #define MS_NOUSER  (1<<31)
 
@@ -173,6 +174,7 @@ extern int dir_notify_enable;
((inode)->i_flags & (S_SYNC|S_DIRSYNC)))
 #define IS_MANDLOCK(inode) __IS_FLG(inode, MS_MANDLOCK)
 #define IS_NOATIME(inode)   __IS_FLG(inode, MS_RDONLY|MS_NOATIME)
+#define IS_I_VERSION(inode)   __IS_FLG(inode, MS_I_VERSION)
 
 #define IS_NOQUOTA(inode)  ((inode)->i_flags & S_NOQUOTA)
 #define IS_APPEND(inode)   ((inode)->i_flags & S_APPEND)
@@ -599,7 +601,7 @@ struct inode {
uid_t   i_uid;
gid_t   i_gid;
dev_t   i_rdev;
- 

[PATCH 18/49] ext4: sync up block group descriptor with e2fsprogs.

2008-01-21 Thread Theodore Ts'o
From: Coly Li <[EMAIL PROTECTED]>

This patch extends bg_itable_unused of the ext4 group descriptor
from 16 bits to 32 bits. In order to add bg_itable_unused_hi to
struct ext4_group_desc, some extra fields that have already been introduced
in e2fsprogs are also added for consistency.
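How a 16-bit field and its new _hi companion would form one 32-bit count can be sketched as below. The helper names are hypothetical and byte-order conversion is omitted; the patch itself only adds the fields and does not yet show code that reads them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the original 16-bit field with its _hi extension. */
static uint32_t toy_count_from_halves(uint16_t lo, uint16_t hi)
{
	return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Split a 32-bit count back into the two on-disk halves. */
static void toy_count_to_halves(uint32_t count, uint16_t *lo, uint16_t *hi)
{
	assert(lo != NULL && hi != NULL);
	*lo = (uint16_t)(count & 0xffffu);
	*hi = (uint16_t)(count >> 16);
}
```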

Signed-off-by: Coly Li <[EMAIL PROTECTED]>
Cc: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 include/linux/ext4_fs.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 6ae91f4..55a376e 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -118,6 +118,11 @@ struct ext4_group_desc
__le32  bg_block_bitmap_hi; /* Blocks bitmap block MSB */
__le32  bg_inode_bitmap_hi; /* Inodes bitmap block MSB */
__le32  bg_inode_table_hi;  /* Inodes table block MSB */
+   __le16  bg_free_blocks_count_hi;/* Free blocks count MSB */
+   __le16  bg_free_inodes_count_hi;/* Free inodes count MSB */
+   __le16  bg_used_dirs_count_hi;  /* Directories count MSB */
+   __le16  bg_itable_unused_hi;/* Unused inodes count MSB */
+   __u32   bg_reserved2[3];
 };
 
 #define EXT4_BG_INODE_UNINIT   0x0001 /* Inode table/bitmap not in use */
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 33/49] ext4: Add the journal checksum feature

2008-01-21 Thread Theodore Ts'o
From: Girish Shilamkar <[EMAIL PROTECTED]>

The journal checksum feature adds two new flags, i.e.
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.

The JBD2_FEATURE_COMPAT_CHECKSUM flag indicates that the commit block
contains the checksum for the blocks described by the descriptor blocks.
Thanks to the checksums, writing the commit record no longer needs to be
synchronous: the commit record can be sent to disk without waiting for
the descriptor blocks to be written to disk. This behavior is controlled
by the JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT flag. Older kernels/e2fsck
should not be able to recover a journal that uses _ASYNC_COMMIT, hence
the flag is made incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.

During the scan pass of recovery, checksums are verified to ensure the
sanity and completeness (in the case of _ASYNC_COMMIT) of every transaction.
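The verification idea can be sketched in user-space C: a running checksum is computed over all blocks of a transaction and compared with the value stored in the commit record. Everything here is an assumption made for illustration: the toy_* names, the 16-byte block size, and the bitwise CRC-32 variant (the patch selects the kernel's CRC32 library and may use a different variant and seed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 16	/* hypothetical, tiny for the example */

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320); chainable via crc. */
static uint32_t toy_crc32(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;
	int k;

	assert(data != NULL);
	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}

/* Checksum every block of a transaction, in order. */
static uint32_t toy_txn_checksum(unsigned char (*blocks)[TOY_BLOCK_SIZE],
				 size_t nr)
{
	uint32_t crc = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		crc = toy_crc32(crc, blocks[i], TOY_BLOCK_SIZE);
	return crc;
}

/* Recovery-side check: does the stored checksum match the blocks found? */
static int toy_txn_valid(unsigned char (*blocks)[TOY_BLOCK_SIZE], size_t nr,
			 uint32_t stored)
{
	return toy_txn_checksum(blocks, nr) == stored;
}
```

With an asynchronous commit, a commit record may reach the disk before some of its blocks do; in this sketch that case shows up as a mismatch, so the transaction would be treated as incomplete.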

Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Girish Shilamkar <[EMAIL PROTECTED]>
Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 Documentation/filesystems/ext4.txt |   10 ++
 fs/Kconfig |1 +
 fs/ext4/super.c|   25 +
 fs/jbd2/commit.c   |  196 +++-
 fs/jbd2/journal.c  |   28 +
 fs/jbd2/recovery.c |  149 ++--
 include/linux/ext4_fs.h|3 +-
 include/linux/jbd2.h   |   36 ++-
 8 files changed, 388 insertions(+), 60 deletions(-)

diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 6a4adca..4f329af 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -89,6 +89,16 @@ When mounting an ext4 filesystem, the following option are accepted:
 extentsext4 will use extents to address file data.  The
file system will no longer be mountable by ext3.
 
+journal_checksum   Enable checksumming of the journal transactions.
+   This will allow the recovery code in e2fsck and the
+   kernel to detect corruption in the kernel.  It is a
+   compatible change and will be ignored by older kernels.
+
+journal_async_commit   Commit block can be written to disk without waiting
+   for descriptor blocks. If enabled older kernels cannot
+   mount the device. This will enable 'journal_checksum'
+   internally.
+
 journal=update Update the ext4 file system's journal to the current
format.
 
diff --git a/fs/Kconfig b/fs/Kconfig
index 487236c..bb0b72c 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -236,6 +236,7 @@ config JBD_DEBUG
 
 config JBD2
tristate
+   select CRC32
help
  This is a generic journaling layer for block devices that support
  both 32-bit and 64-bit block numbers.  It is currently used by
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c730544..f7479d3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -869,6 +869,7 @@ enum {
Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_bh,
Opt_commit, Opt_journal_update, Opt_journal_inum, Opt_journal_dev,
+   Opt_journal_checksum, Opt_journal_async_commit,
Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
@@ -908,6 +909,8 @@ static match_table_t tokens = {
{Opt_journal_update, "journal=update"},
{Opt_journal_inum, "journal=%u"},
{Opt_journal_dev, "journal_dev=%u"},
+   {Opt_journal_checksum, "journal_checksum"},
+   {Opt_journal_async_commit, "journal_async_commit"},
{Opt_abort, "abort"},
{Opt_data_journal, "data=journal"},
{Opt_data_ordered, "data=ordered"},
@@ -1095,6 +1098,13 @@ static int parse_options (char *options, struct super_block *sb,
return 0;
*journal_devnum = option;
break;
+   case Opt_journal_checksum:
+   set_opt(sbi->s_mount_opt, JOURNAL_CHECKSUM);
+   break;
+   case Opt_journal_async_commit:
+   set_opt(sbi->s_mount_opt, JOURNAL_ASYNC_COMMIT);
+   set_opt(sbi->s_mount_opt, JOURNAL_CHECKSUM);
+   break;
case Opt_noload:
set_opt (sbi->s_mount_opt, NOLOAD);
break;
@@ -2114,6 +2124,21 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
goto failed_mount4;
}
 
+   if (test_opt(sb, JOURNAL_ASYNC_

[PATCH 02/49] ext4: Avoid rec_len overflow with 64KB block size

2008-01-21 Thread Theodore Ts'o
From: Jan Kara <[EMAIL PROTECTED]>

With a 64KB blocksize, a directory entry can have a size of 64KB, which does
not fit into the 16 bits we have for the entry length. So we store 0xffff
instead and convert the value when it is read from / written to disk. The patch
also converts some places to use ext4_next_entry() since we are changing them anyway.
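A minimal sketch of such a conversion pair is shown below. It assumes the on-disk sentinel is 0xffff (the largest value a 16-bit field can hold) and ignores byte order; the toy_* names are hypothetical, not the helpers added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_REC_LEN 0xffffu	/* assumed sentinel meaning "64KB" */

/* Disk -> memory: the sentinel expands to the real length, 65536. */
static unsigned int toy_rec_len_from_disk(uint16_t dlen)
{
	if (dlen == TOY_MAX_REC_LEN)
		return 1u << 16;
	return dlen;
}

/* Memory -> disk: 65536 is stored as the sentinel; larger is a bug. */
static uint16_t toy_rec_len_to_disk(unsigned int len)
{
	assert(len <= (1u << 16));
	if (len == (1u << 16))
		return TOY_MAX_REC_LEN;
	return (uint16_t)len;
}
```

The trade-off in this scheme is that a record length of exactly 65535 can no longer be represented. That is harmless as long as record lengths are multiples of 4, which is an assumption of this sketch.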

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/dir.c   |   12 
 fs/ext4/namei.c |   77 ++
 include/linux/ext4_fs.h |   20 
 3 files changed, 63 insertions(+), 46 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index f612bef..145a9c0 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -67,7 +67,7 @@ int ext4_check_dir_entry (const char * function, struct inode * dir,
  unsigned long offset)
 {
const char * error_msg = NULL;
-   const int rlen = le16_to_cpu(de->rec_len);
+   const int rlen = ext4_rec_len_from_disk(de->rec_len);
 
if (rlen < EXT4_DIR_REC_LEN(1))
error_msg = "rec_len is smaller than minimal";
@@ -172,10 +172,10 @@ revalidate:
 * least that it is non-zero.  A
 * failure will be detected in the
 * dirent test below. */
-   if (le16_to_cpu(de->rec_len) <
-   EXT4_DIR_REC_LEN(1))
+   if (ext4_rec_len_from_disk(de->rec_len)
+   < EXT4_DIR_REC_LEN(1))
break;
-   i += le16_to_cpu(de->rec_len);
+   i += ext4_rec_len_from_disk(de->rec_len);
}
offset = i;
filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -197,7 +197,7 @@ revalidate:
ret = stored;
goto out;
}
-   offset += le16_to_cpu(de->rec_len);
+   offset += ext4_rec_len_from_disk(de->rec_len);
if (le32_to_cpu(de->inode)) {
/* We might block in the next section
 * if the data destination is
@@ -219,7 +219,7 @@ revalidate:
goto revalidate;
stored ++;
}
-   filp->f_pos += le16_to_cpu(de->rec_len);
+   filp->f_pos += ext4_rec_len_from_disk(de->rec_len);
}
offset = 0;
brelse (bh);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 94ee6f3..d9a3a2f 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -280,7 +280,7 @@ static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext4_dir_ent
space += EXT4_DIR_REC_LEN(de->name_len);
names++;
}
-   de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+   de = ext4_next_entry(de);
}
printk("(%i)\n", names);
return (struct stats) { names, space, 1 };
@@ -551,7 +551,8 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash,
  */
 static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 *p)
 {
-   return (struct ext4_dir_entry_2 *)((char*)p + le16_to_cpu(p->rec_len));
+   return (struct ext4_dir_entry_2 *)((char *)p +
+   ext4_rec_len_from_disk(p->rec_len));
 }
 
 /*
@@ -720,7 +721,7 @@ static int dx_make_map (struct ext4_dir_entry_2 *de, int size,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
-   de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+   de = ext4_next_entry(de);
}
return count;
 }
@@ -820,7 +821,7 @@ static inline int search_dirblock(struct buffer_head * bh,
return 1;
}
/* prevent looping on a bad block */
-   de_len = le16_to_cpu(de->rec_len);
+   de_len = ext4_rec_len_from_disk(de->rec_len);
if (de_len <= 0)
return -1;
offset += de_len;
@@ -1128,7 +1129,7 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count)
rec_len = EXT4_DIR_REC_LEN(de->name_len);
memcpy (to, de, rec_len);
((struct ext4_dir_entry_2 *) to)->rec_len =
-   cpu_to_le16(rec_len);
+   ext4_rec_len_to_disk(rec_len);
de->inode = 0;
map++;
to += rec_len;
@@ -11

[PATCH 43/49] ext4: Check for return value from sb_set_blocksize

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

sb_set_blocksize validates whether the specified block size can be used by
the file system. Make sure we fail mounting the file system if the
blocksize specified cannot be used.
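The kind of validation being relied on can be sketched as follows. The patch does not show what sb_set_blocksize checks internally, so the conditions below (power of two, at least 512 bytes, no larger than a page, no smaller than the device sector size) are assumptions for illustration, and toy_blocksize_ok is a hypothetical name.

```c
#include <assert.h>

/* Returns 1 if the block size looks usable, 0 otherwise (assumed rules). */
static int toy_blocksize_ok(unsigned int size, unsigned int hw_sector,
			    unsigned int page_size)
{
	assert(hw_sector != 0 && page_size != 0);
	if (size < 512 || size > page_size)
		return 0;		/* outside the supported range */
	if (size & (size - 1))
		return 0;		/* not a power of two */
	if (size < hw_sector)
		return 0;		/* smaller than a device sector */
	return 1;
}
```

Checking the return value in one place, as the patch does, also covers the old explicit "blocksize < hblock" test, assuming the helper performs that comparison itself.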

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |   15 +--
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 91a11ec..a91e17e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1809,7 +1809,6 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
unsigned long def_mount_opts;
struct inode *root;
int blocksize;
-   int hblock;
int db_count;
int i;
int needs_recovery;
@@ -1966,20 +1965,16 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
goto failed_mount;
}
 
-   hblock = bdev_hardsect_size(sb->s_bdev);
if (sb->s_blocksize != blocksize) {
-   /*
-* Make sure the blocksize for the filesystem is larger
-* than the hardware sectorsize for the machine.
-*/
-   if (blocksize < hblock) {
-   printk(KERN_ERR "EXT4-fs: blocksize %d too small for "
-  "device blocksize %d.\n", blocksize, hblock);
+
+   /* Validate the filesystem blocksize */
+   if (!sb_set_blocksize(sb, blocksize)) {
+   printk(KERN_ERR "EXT4-fs: bad block size %d.\n",
+   blocksize);
goto failed_mount;
}
 
brelse (bh);
-   sb_set_blocksize(sb, blocksize);
logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
offset = do_div(logical_sb_block, blocksize);
bh = sb_bread(sb, logical_sb_block);
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 05/49] ext4: add ext4_group_t, and change all group variables to this type.

2008-01-21 Thread Theodore Ts'o
From: Avantika Mathur <[EMAIL PROTECTED]>

In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31.  Each block group can have up to
2^15 blocks; with a 4K (2^12 byte) block size, the max filesystem size is
therefore limited to 2^31 * (2^15 * 2^12) = 2^58 bytes, or 256 PB.

This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
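The size limit quoted above is plain exponent arithmetic, which a few lines of C can confirm. The function name is hypothetical, and "PB" is read here as 2^50 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Max filesystem size in bytes, given each quantity as a power of two. */
static uint64_t toy_max_fs_bytes(unsigned int group_bits,
				 unsigned int blocks_per_group_bits,
				 unsigned int block_size_bits)
{
	unsigned int total = group_bits + blocks_per_group_bits +
			     block_size_bits;

	assert(total < 64);	/* result must fit in 64 bits */
	return 1ULL << total;
}
```

For example, toy_max_fs_bytes(31, 15, 12) is 2^58, and shifting that right by 50 gives 256. With 32 usable group bits the same formula yields 2^59 bytes.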

Signed-off-by: Avantika Mathur <[EMAIL PROTECTED]>
---
 fs/ext4/balloc.c   |   69 +---
 fs/ext4/group.h|8 +++--
 fs/ext4/ialloc.c   |   46 +++--
 fs/ext4/inode.c|5 ++-
 fs/ext4/resize.c   |   12 
 fs/ext4/super.c|   20 ++---
 include/linux/ext4_fs.h|   11 ---
 include/linux/ext4_fs_i.h  |5 ++-
 include/linux/ext4_fs_sb.h |2 +-
 9 files changed, 91 insertions(+), 87 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 71ee95e..9568a57 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -29,7 +29,7 @@
  * Calculate the block group number and offset, given a block number
  */
 void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
-   unsigned long *blockgrpp, ext4_grpblk_t *offsetp)
+   ext4_group_t *blockgrpp, ext4_grpblk_t *offsetp)
 {
struct ext4_super_block *es = EXT4_SB(sb)->s_es;
ext4_grpblk_t offset;
@@ -46,7 +46,7 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
 /* Initializes an uninitialized block bitmap if given, and returns the
  * number of blocks free in the group. */
 unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
-   int block_group, struct ext4_group_desc *gdp)
+ext4_group_t block_group, struct ext4_group_desc *gdp)
 {
unsigned long start;
int bit, bit_max;
@@ -60,7 +60,7 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
 * essentially implementing a per-group read-only flag. */
if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
ext4_error(sb, __FUNCTION__,
-  "Checksum bad for group %u\n", block_group);
+ "Checksum bad for group %lu\n", block_group);
gdp->bg_free_blocks_count = 0;
gdp->bg_free_inodes_count = 0;
gdp->bg_itable_unused = 0;
@@ -153,7 +153,7 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
  * group descriptor
  */
 struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
-unsigned int block_group,
+ext4_group_t block_group,
 struct buffer_head ** bh)
 {
unsigned long group_desc;
@@ -164,7 +164,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
if (block_group >= sbi->s_groups_count) {
ext4_error (sb, "ext4_get_group_desc",
"block_group >= groups_count - "
-   "block_group = %d, groups_count = %lu",
+   "block_group = %lu, groups_count = %lu",
block_group, sbi->s_groups_count);
 
return NULL;
@@ -176,7 +176,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
if (!sbi->s_group_desc[group_desc]) {
ext4_error (sb, "ext4_get_group_desc",
"Group descriptor not loaded - "
-   "block_group = %d, group_desc = %lu, desc = %lu",
+   "block_group = %lu, group_desc = %lu, desc = %lu",
 block_group, group_desc, offset);
return NULL;
}
@@ -200,7 +200,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
  * Return buffer_head on success or NULL in case of failure.
  */
 struct buffer_head *
-read_block_bitmap(struct super_block *sb, unsigned int block_group)
+read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
 {
struct ext4_group_desc * desc;
struct buffer_head * bh = NULL;
@@ -227,7 +227,7 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
if (!bh)
ext4_error (sb, __FUNCTION__,
"Cannot read block bitmap - "
-   "block_group = %d, block_bitmap = %llu",
+   "block_group = %lu, block_bitmap = %llu",
block_group, bitmap_blk);
return bh;
 }
@@ -320,7 +3

[PATCH 22/49] ext4: Change the default behaviour on error

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

The ext4 file system was by default ignoring errors and continuing. This
is not a good default, as continuing on error could lead to file system
corruption. Change the default to mark the file system
read-only. Debian and Ubuntu already do this as the default in their
fstab.
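The new selection logic amounts to "panic and continue must be requested explicitly; anything else means remount read-only". A user-space sketch follows; the enum names are hypothetical, and the numeric values chosen for the on-disk s_errors settings (1, 2, 3) are assumptions that the patch does not show.

```c
#include <assert.h>

/* Assumed on-disk s_errors values (not shown in the patch). */
enum { TOY_ERRORS_CONTINUE = 1, TOY_ERRORS_RO = 2, TOY_ERRORS_PANIC = 3 };

enum toy_policy { TOY_POLICY_CONT, TOY_POLICY_RO, TOY_POLICY_PANIC };

/* Map the superblock setting to a mount-time policy, defaulting to RO. */
static enum toy_policy toy_default_policy(unsigned int s_errors)
{
	assert(s_errors <= 0xffff);	/* s_errors is a 16-bit field */
	if (s_errors == TOY_ERRORS_PANIC)
		return TOY_POLICY_PANIC;
	if (s_errors == TOY_ERRORS_CONTINUE)
		return TOY_POLICY_CONT;
	return TOY_POLICY_RO;		/* unset or unknown: play it safe */
}
```

A superblock with the field left at 0 would previously have fallen through to "continue"; in this sketch it falls through to read-only.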

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Acked-by: Eric Sandeen <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 32e3ecb..effd375 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -688,16 +688,16 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
le16_to_cpu(es->s_def_resgid) != EXT4_DEF_RESGID) {
seq_printf(seq, ",resgid=%u", sbi->s_resgid);
}
-   if (test_opt(sb, ERRORS_CONT)) {
+   if (test_opt(sb, ERRORS_RO)) {
int def_errors = le16_to_cpu(es->s_errors);
 
if (def_errors == EXT4_ERRORS_PANIC ||
-   def_errors == EXT4_ERRORS_RO) {
-   seq_puts(seq, ",errors=continue");
+   def_errors == EXT4_ERRORS_CONTINUE) {
+   seq_puts(seq, ",errors=remount-ro");
}
}
-   if (test_opt(sb, ERRORS_RO))
-   seq_puts(seq, ",errors=remount-ro");
+   if (test_opt(sb, ERRORS_CONT))
+   seq_puts(seq, ",errors=continue");
if (test_opt(sb, ERRORS_PANIC))
seq_puts(seq, ",errors=panic");
if (test_opt(sb, NO_UID32))
@@ -1819,10 +1819,10 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 
if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_PANIC)
set_opt(sbi->s_mount_opt, ERRORS_PANIC);
-   else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_RO)
-   set_opt(sbi->s_mount_opt, ERRORS_RO);
-   else
+   else if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_CONTINUE)
set_opt(sbi->s_mount_opt, ERRORS_CONT);
+   else
+   set_opt(sbi->s_mount_opt, ERRORS_RO);
 
sbi->s_resuid = le16_to_cpu(es->s_def_resuid);
sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 36/49] ext4: Add EXT4_IOC_MIGRATE ioctl

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

The patch below adds an ioctl for migrating an ext3 indirect-block-mapped
inode to an ext4 extent-mapped inode.
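Converting a block-mapped file to extents largely means coalescing runs of blocks that are contiguous both logically and physically. The sketch below shows one way such a run could be accumulated. The struct mirrors list_blocks_struct from this patch, but toy_range_add and its return convention are hypothetical, and treating first_pblock == 0 as "no range yet" is an assumption borrowed from the check in finish_range().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as list_blocks_struct in the patch, with plain integer types. */
struct toy_range {
	uint32_t first_block, last_block;	/* logical blocks */
	uint64_t first_pblock, last_pblock;	/* physical blocks */
};

/*
 * Feed one logical->physical mapping into the current range.
 * Returns 1 if the range was extended, 0 if a new range was started
 * (the caller would emit the previous range as an extent first).
 */
static int toy_range_add(struct toy_range *r, uint32_t lblk, uint64_t pblk)
{
	assert(r != NULL);
	if (r->first_pblock != 0 &&
	    lblk == r->last_block + 1 && pblk == r->last_pblock + 1) {
		r->last_block = lblk;
		r->last_pblock = pblk;
		return 1;
	}
	r->first_block = r->last_block = lblk;
	r->first_pblock = r->last_pblock = pblk;
	return 0;
}
```

Each completed range then maps to one extent whose length is last_block - first_block + 1, which matches how finish_range() fills in ee_len.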

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/Makefile|2 +-
 fs/ext4/ioctl.c |3 +
 fs/ext4/migrate.c   |  634 +++
 include/linux/ext4_fs.h |4 +
 4 files changed, 642 insertions(+), 1 deletions(-)
 create mode 100644 fs/ext4/migrate.c

diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index ae6e7e5..d5fd80b 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_EXT4DEV_FS) += ext4dev.o
 
 ext4dev-y  := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
   ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
-  ext4_jbd2.o
+  ext4_jbd2.o migrate.o
 
 ext4dev-$(CONFIG_EXT4DEV_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
 ext4dev-$(CONFIG_EXT4DEV_FS_POSIX_ACL) += acl.o
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index c0e5b8c..2ed7c37 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -254,6 +254,9 @@ flags_err:
return err;
}
 
+   case EXT4_IOC_MIGRATE:
+   return ext4_ext_migrate(inode, filp, cmd, arg);
+
default:
return -ENOTTY;
}
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
new file mode 100644
index 0000000..7203d3d
--- /dev/null
+++ b/fs/ext4/migrate.c
@@ -0,0 +1,634 @@
+/*
+ * Copyright IBM Corporation, 2007
+ * Author Aneesh Kumar K.V <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include 
+#include 
+#include 
+
+struct list_blocks_struct {
+   ext4_lblk_t first_block, last_block;
+   ext4_fsblk_t first_pblock, last_pblock;
+};
+
+/* will go away */
+static void ext4_ext_store_pblock(struct ext4_extent *ex, ext4_fsblk_t pb)
+{
+   ex->ee_start_lo = cpu_to_le32((unsigned long) (pb & 0xffffffff));
+   ex->ee_start_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1)
+   & 0xffff);
+}
+
+static int finish_range(handle_t *handle, struct inode *inode,
+   struct list_blocks_struct *lb)
+
+{
+   int retval = 0, needed;
+   struct ext4_extent newext;
+   struct ext4_ext_path *path;
+   if (lb->first_pblock == 0)
+   return 0;
+
+   /* Add the extent to temp inode*/
+   newext.ee_block = cpu_to_le32(lb->first_block);
+   newext.ee_len   = cpu_to_le16(lb->last_block - lb->first_block + 1);
+   ext4_ext_store_pblock(&newext, lb->first_pblock);
+   path = ext4_ext_find_extent(inode, lb->first_block, NULL);
+
+   if (IS_ERR(path)) {
+   retval = PTR_ERR(path);
+   goto err_out;
+   }
+
+   /*
+* Calculate the credits needed to insert this extent.
+* Since we are doing this in a loop we may accumulate extra
+* credits. But below we try not to accumulate too many
+* of them by restarting the journal.
+*/
+   needed = ext4_ext_calc_credits_for_insert(inode, path);
+
+   /*
+* Make sure the credits we accumulated are not too high
+*/
+
+   if (needed && handle->h_buffer_credits >= EXT4_RESERVE_TRANS_BLOCKS) {
+
+   retval = ext4_journal_restart(handle, needed);
+   if (retval)
+   goto err_out;
+
+   }
+
+   if (needed) {
+   retval = ext4_journal_extend(handle, needed);
+   if (retval != 0) {
+   /*
+* IF not able to extend the journal restart the journal
+*/
+   retval = ext4_journal_restart(handle, needed);
+   if (retval)
+   goto err_out;
+   }
+   }
+
+   retval = ext4_ext_insert_extent(handle, inode, path, &newext);
+
+err_out:
+   lb->first_pblock = 0;
+   return retval;
+}
+static int update_extent_range(handle_t *handle, struct inode *inode,
+   ext4_fsblk_t pblock, ext4_lblk_t blk_num,
+   struct list_blocks_struct *lb)
+{
+   int retval;
+
+   /*
+* See if we can add on to the existing range (if it exists)
+*/
+   if (lb->first_pblock &&
+   (lb->last_pblock+1 == pblock) &&
+   (lb->last_block+1 == blk_num)) {
+   lb->last_pblock = pblock;
+   lb->last_block = blk_num;
+   retur

[PATCH 13/49] ext4: different maxbytes functions for bitmap & extent files

2008-01-21 Thread Theodore Ts'o
From: Eric Sandeen <[EMAIL PROTECTED]>

Use two different maxbytes functions for bitmapped and extent-based
files.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
---
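Illustration only, not part of the patch: the extent-format limit follows
from the 32-bit logical block number in ee_block, so the largest byte
offset is (2^32 << blkbits) - 1, clamped to whatever the VFS allows. The
sketch below reproduces just that arithmetic in userspace.
`extent_max_size()` is a hypothetical name, and the upper limit is passed
in by the caller instead of being derived from sizeof(blkcnt_t) as the
patch does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest byte offset addressable with a 32-bit logical block number,
 * clamped to a caller-supplied upper limit.
 * blkbits is log2 of the block size (10 for 1 KiB up to 16 for 64 KiB).
 */
static int64_t extent_max_size(int blkbits, int64_t upper_limit)
{
	int64_t res;

	assert(blkbits >= 10 && blkbits <= 16);
	res = (INT64_C(1) << 32) << blkbits;  /* 2^32 blocks, in bytes */
	res -= 1;
	return res > upper_limit ? upper_limit : res;
}
```

With 4 KiB blocks (blkbits = 12) and no tighter VFS limit this gives
2^44 - 1 bytes, one byte short of 16 TiB.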
 fs/ext4/super.c |   45 ++---
 1 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 64067de..c79e46b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1600,19 +1600,58 @@ static void ext4_orphan_cleanup (struct super_block * sb,
 #endif
sb->s_flags = s_flags; /* Restore MS_RDONLY status */
 }
+/*
+ * Maximal extent format file size.
+ * Resulting logical blkno at s_maxbytes must fit in our on-disk
+ * extent format containers, within a sector_t, and within i_blocks
+ * in the vfs.  ext4 inode has 48 bits of i_block in fsblock units,
+ * so that won't be a limiting factor.
+ *
+ * Note, this does *not* consider any metadata overhead for vfs i_blocks.
+ */
+static loff_t ext4_max_size(int blkbits)
+{
+   loff_t res;
+   loff_t upper_limit = MAX_LFS_FILESIZE;
+
+   /* small i_blocks in vfs inode? */
+   if (sizeof(blkcnt_t) < sizeof(u64)) {
+   /*
+* CONFIG_LSF is not enabled implies the inode
+* i_block represent total blocks in 512 bytes
+* 32 == size of vfs inode i_blocks * 8
+*/
+   upper_limit = (1LL << 32) - 1;
+
+   /* total blocks in file system block size */
+   upper_limit >>= (blkbits - 9);
+   upper_limit <<= blkbits;
+   }
+
+   /* 32-bit extent-start container, ee_block */
+   res = 1LL << 32;
+   res <<= blkbits;
+   res -= 1;
+
+   /* Sanity check against vm- & vfs- imposed limits */
+   if (res > upper_limit)
+   res = upper_limit;
+
+   return res;
+}
 
 /*
- * Maximal file size.  There is a direct, and {,double-,triple-}indirect
+ * Maximal bitmap file size.  There is a direct, and {,double-,triple-}indirect
  * block limit, and also a limit of (2^48 - 1) 512-byte sectors in i_blocks.
  * We need to be 1 filesystem block less than the 2^48 sector limit.
  */
-static loff_t ext4_max_size(int bits)
+static loff_t ext4_max_bitmap_size(int bits)
 {
loff_t res = EXT4_NDIR_BLOCKS;
int meta_blocks;
loff_t upper_limit;
/* This is calculated to be the largest file size for a
-* dense, file such that the total number of
+* dense, bitmapped file such that the total number of
 * sectors in the file, including data and all indirect blocks,
 * does not exceed 2^48 -1
 * __u32 i_blocks_lo and _u16 i_blocks_high representing the
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 30/49] ext4: Convert truncate_mutex to read write semaphore.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

We currently take the truncate_mutex for every read, which hurts
performance on systems with many CPUs. Convert the lock to a read-write
semaphore and take only the read lock when reading the file.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
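Illustration only, not part of the patch: the benefit of the conversion is
that any number of readers may hold the lock together, while a writer
(block allocation, truncate) still gets exclusive access. The toy model
below captures only those admission rules. It is single-threaded, it is
not a real lock, and `struct toy_rwsem` and the `toy_*` helpers are
hypothetical names that have nothing to do with the kernel's rw_semaphore
implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of reader/writer exclusion; not a real lock. */
struct toy_rwsem {
	int readers;   /* number of active readers */
	bool writer;   /* true while a writer holds the lock */
};

/* A reader is admitted unless a writer is active. */
static bool toy_try_read(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

/* A writer is admitted only when nobody else holds the lock. */
static bool toy_try_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static void toy_up_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}
```

In the patch's terms, a lookup with create == 0 corresponds to the read
side and an allocating call to the write side.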
 fs/ext4/balloc.c  |2 +-
 fs/ext4/extents.c |   13 +++--
 fs/ext4/file.c|4 ++--
 fs/ext4/inode.c   |   39 ---
 fs/ext4/ioctl.c   |4 ++--
 fs/ext4/super.c   |2 +-
 include/linux/ext4_fs.h   |   25 -
 include/linux/ext4_fs_i.h |6 +++---
 8 files changed, 52 insertions(+), 43 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index a9140ea..925e063 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -526,7 +526,7 @@ static inline int rsv_is_empty(struct ext4_reserve_window *rsv)
  * when setting the reservation window size through ioctl before the file
  * is open for write (needs block allocation).
  *
- * Needs truncate_mutex protection prior to call this function.
+ * Needs down_write(i_data_sem) protection prior to call this function.
  */
 void ext4_init_block_alloc_info(struct inode *inode)
 {
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ec5019f..03d1bbb 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1565,7 +1565,7 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
  * This routine returns max. credits that the extent tree can consume.
  * It should be OK for low-performance paths like ->writepage()
  * To allow many writing processes to fit into a single transaction,
- * the caller should calculate credits under truncate_mutex and
+ * the caller should calculate credits under i_data_sem and
  * pass the actual path.
  */
 int ext4_ext_calc_credits_for_insert(struct inode *inode,
@@ -2131,7 +2131,8 @@ out:
 
 /*
  * Need to be called with
- * mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ * down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
+ * (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
  */
 int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ext4_lblk_t iblock,
@@ -2350,7 +2351,7 @@ void ext4_ext_truncate(struct inode * inode, struct page *page)
if (page)
ext4_block_truncate_page(handle, page, mapping, inode->i_size);
 
-   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   down_write(&EXT4_I(inode)->i_data_sem);
ext4_ext_invalidate_cache(inode);
 
/*
@@ -2386,7 +2387,7 @@ out_stop:
if (inode->i_nlink)
ext4_orphan_del(handle, inode);
 
-   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   up_write(&EXT4_I(inode)->i_data_sem);
ext4_journal_stop(handle);
 }
 
@@ -2450,7 +2451,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
 * modify 1 super block, 1 block bitmap and 1 group descriptor.
 */
credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3;
-   mutex_lock(&EXT4_I(inode)->truncate_mutex)
+   down_write((&EXT4_I(inode)->i_data_sem));
 retry:
while (ret >= 0 && ret < max_blocks) {
block = block + ret;
@@ -2507,7 +2508,7 @@ retry:
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
 
-   mutex_unlock(&EXT4_I(inode)->truncate_mutex)
+   up_write((&EXT4_I(inode)->i_data_sem));
/*
 * Time to update the file size.
 * Update only when preallocation was requested beyond the file size.
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index a6b2aa1..ac35ec5 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -37,9 +37,9 @@ static int ext4_release_file (struct inode * inode, struct file * filp)
if ((filp->f_mode & FMODE_WRITE) &&
(atomic_read(&inode->i_writecount) == 1))
{
-   mutex_lock(&EXT4_I(inode)->truncate_mutex);
+   down_write(&EXT4_I(inode)->i_data_sem);
ext4_discard_reservation(inode);
-   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+   up_write(&EXT4_I(inode)->i_data_sem);
}
if (is_dx(inode) && filp->private_data)
ext4_htree_free_dir_info(filp->private_data);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 71c7ad0..596b3ab 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -308,7 +308,7 @@ static int ext4_block_to_path(struct inode *inode,
final = ptrs;
} else {
ext4_warning(inode->i_sb, "ext4_block_to_path",
-   "block %u > max",
+   "block %lu > max",
i_block + direct_blocks +
indirect_blocks + double_blocks);
}
@@ -345,7 +345,7 @@ static int ext4

[PATCH 10/49] ext4: Rename i_dir_acl to i_size_high

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

Rename ext4_inode.i_dir_acl to i_size_high.
Drop ext4_inode_info.i_dir_acl, as it is not used.
Rename ext4_inode.i_size to ext4_inode.i_size_lo.
Add helper functions for accessing the combined ext4_inode i_size.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
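Illustration only, not part of the patch: the helpers named in the diff,
ext4_isize() and ext4_isize_set(), treat the two 32-bit on-disk fields as
one 64-bit size. The userspace sketch below shows that combine/split
logic. `struct raw_size`, `isize_get()` and `isize_set()` are hypothetical
stand-ins, and the fields are host-endian here, whereas the real on-disk
fields are little-endian and need le32_to_cpu()/cpu_to_le32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, host-endian stand-in for the two on-disk size fields. */
struct raw_size {
	uint32_t i_size_lo;
	uint32_t i_size_high;
};

/* Combine the two halves into a 64-bit size. */
static uint64_t isize_get(const struct raw_size *raw)
{
	assert(raw != NULL);
	return ((uint64_t)raw->i_size_high << 32) | raw->i_size_lo;
}

/* Split a 64-bit size into its low and high halves. */
static void isize_set(struct raw_size *raw, uint64_t size)
{
	assert(raw != NULL);
	raw->i_size_lo = (uint32_t)size;
	raw->i_size_high = (uint32_t)(size >> 32);
}
```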
 fs/ext4/ialloc.c  |1 -
 fs/ext4/inode.c   |   55 ++---
 include/linux/ext4_fs.h   |   15 +--
 include/linux/ext4_fs_i.h |1 -
 4 files changed, 34 insertions(+), 38 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 00b152b..17b5df1 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -709,7 +709,6 @@ got:
if (!S_ISDIR(mode))
ei->i_flags &= ~EXT4_DIRSYNC_FL;
ei->i_file_acl = 0;
-   ei->i_dir_acl = 0;
ei->i_dtime = 0;
ei->i_block_alloc_info = NULL;
ei->i_block_group = group;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7bcec18..e663455 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2694,7 +2694,6 @@ void ext4_read_inode(struct inode * inode)
inode->i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
}
inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);
-   inode->i_size = le32_to_cpu(raw_inode->i_size);
 
ei->i_state = 0;
ei->i_dir_start_lookup = 0;
@@ -2720,15 +2719,11 @@ void ext4_read_inode(struct inode * inode)
ei->i_flags = le32_to_cpu(raw_inode->i_flags);
ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo);
if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
-   cpu_to_le32(EXT4_OS_HURD))
+   cpu_to_le32(EXT4_OS_HURD)) {
ei->i_file_acl |=
((__u64)le16_to_cpu(raw_inode->i_file_acl_high)) << 32;
-   if (!S_ISREG(inode->i_mode)) {
-   ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl);
-   } else {
-   inode->i_size |=
-   ((__u64)le32_to_cpu(raw_inode->i_size_high)) << 32;
}
+   inode->i_size = ext4_isize(raw_inode);
ei->i_disksize = inode->i_size;
inode->i_generation = le32_to_cpu(raw_inode->i_generation);
ei->i_block_group = iloc.block_group;
@@ -2852,7 +2847,6 @@ static int ext4_do_update_inode(handle_t *handle,
raw_inode->i_gid_high = 0;
}
raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
-   raw_inode->i_size = cpu_to_le32(ei->i_disksize);
 
EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
@@ -2867,32 +2861,27 @@ static int ext4_do_update_inode(handle_t *handle,
raw_inode->i_file_acl_high =
cpu_to_le16(ei->i_file_acl >> 32);
raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
-   if (!S_ISREG(inode->i_mode)) {
-   raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
-   } else {
-   raw_inode->i_size_high =
-   cpu_to_le32(ei->i_disksize >> 32);
-   if (ei->i_disksize > 0x7fffffffULL) {
-   struct super_block *sb = inode->i_sb;
-   if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
-   EXT4_FEATURE_RO_COMPAT_LARGE_FILE) ||
-   EXT4_SB(sb)->s_es->s_rev_level ==
-   cpu_to_le32(EXT4_GOOD_OLD_REV)) {
-  /* If this is the first large file
-   * created, add a flag to the superblock.
-   */
-   err = ext4_journal_get_write_access(handle,
-   EXT4_SB(sb)->s_sbh);
-   if (err)
-   goto out_brelse;
-   ext4_update_dynamic_rev(sb);
-   EXT4_SET_RO_COMPAT_FEATURE(sb,
+   ext4_isize_set(raw_inode, ei->i_disksize);
+   if (ei->i_disksize > 0x7fffffffULL) {
+   struct super_block *sb = inode->i_sb;
+   if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
+   EXT4_FEATURE_RO_COMPAT_LARGE_FILE) ||
+   EXT4_SB(sb)->s_es->s_rev_level ==
+   cpu_to_le32(EXT4_GOOD_OLD_REV)) {
+   /* If this is the first large file
+* created, add a flag to the superblock.
+*/
+   err = ext4_journal_get_write_access(handle,
+   EXT4_SB(sb)->s_sbh);
+   if (err)
+   goto out_brelse;
+   ext4_update_dynamic_rev(sb);
+   EXT4_SET_RO_COMPAT_FEATURE(sb,
EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
-   

[PATCH 29/49] ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

When migrating an inode from ext3 to ext4 format, we need to make sure that
the test for the inode type and the walk over the inode data happen under
the lock. To achieve this, take truncate_mutex earlier, before checking
i_flags.

This should in turn let us remove verify_chain().

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c   |9 --
 fs/ext4/inode.c |   69 +-
 include/linux/ext4_fs.h |2 +
 3 files changed, 16 insertions(+), 64 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8593e59..ec5019f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2129,6 +2129,10 @@ out:
return err ? err : allocated;
 }
 
+/*
+ * Need to be called with
+ * mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ */
 int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ext4_lblk_t iblock,
unsigned long max_blocks, struct buffer_head *bh_result,
@@ -2144,7 +2148,6 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
__clear_bit(BH_New, &bh_result->b_state);
ext_debug("blocks %u/%lu requested for inode %u\n",
iblock, max_blocks, inode->i_ino);
-   mutex_lock(&EXT4_I(inode)->truncate_mutex);
 
/* check in cache */
goal = ext4_ext_in_cache(inode, iblock, &newex);
@@ -2318,8 +2321,6 @@ out2:
ext4_ext_drop_refs(path);
kfree(path);
}
-   mutex_unlock(&EXT4_I(inode)->truncate_mutex);
-
return err ? err : allocated;
 }
 
@@ -2449,6 +2450,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
 * modify 1 super block, 1 block bitmap and 1 group descriptor.
 */
credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3;
+   mutex_lock(&EXT4_I(inode)->truncate_mutex)
 retry:
while (ret >= 0 && ret < max_blocks) {
block = block + ret;
@@ -2505,6 +2507,7 @@ retry:
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
 
+   mutex_unlock(&EXT4_I(inode)->truncate_mutex)
/*
 * Time to update the file size.
 * Update only when preallocation was requested beyond the file size.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eaace13..71c7ad0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -243,13 +243,6 @@ static inline void add_chain(Indirect *p, struct buffer_head *bh, __le32 *v)
p->bh = bh;
 }
 
-static int verify_chain(Indirect *from, Indirect *to)
-{
-   while (from <= to && from->key == *from->p)
-   from++;
-   return (from > to);
-}
-
 /**
  * ext4_block_to_path - parse the block number into array of offsets
  * @inode: inode in question (we are only interested in its superblock)
@@ -348,10 +341,11 @@ static int ext4_block_to_path(struct inode *inode,
  * (pointer to last triple returned, *@err == 0)
  * or when it gets an IO error reading an indirect block
  * (ditto, *@err == -EIO)
- * or when it notices that chain had been changed while it was reading
- * (ditto, *@err == -EAGAIN)
  * or when it reads all @depth-1 indirect blocks successfully and finds
  * the whole chain, all way to the data (returns %NULL, *err == 0).
+ *
+ *  Need to be called with
+ *  mutex_lock(&EXT4_I(inode)->truncate_mutex)
  */
 static Indirect *ext4_get_branch(struct inode *inode, int depth,
 ext4_lblk_t  *offsets,
@@ -370,9 +364,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
bh = sb_bread(sb, le32_to_cpu(p->key));
if (!bh)
goto failure;
-   /* Reader: pointers */
-   if (!verify_chain(chain, p))
-   goto changed;
add_chain(++p, bh, (__le32*)bh->b_data + *++offsets);
/* Reader: end */
if (!p->key)
@@ -380,10 +371,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
}
return NULL;
 
-changed:
-   brelse(bh);
-   *err = -EAGAIN;
-   goto no_block;
 failure:
*err = -EIO;
 no_block:
@@ -787,6 +774,10 @@ err_out:
  * return > 0, # of blocks mapped or allocated.
  * return = 0, if plain lookup failed.
  * return < 0, error case.
+ *
+ *
+ * Need to be called with
+ * mutex_lock(&EXT4_I(inode)->truncate_mutex)
  */
 int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
ext4_lblk_t iblock, unsigned long maxblocks,
@@ -825,18 +816,6 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
while (count < maxblocks && count <= blocks_to_boundary) {
ext4_fsblk_t blk;
 
-   if (!verify_chain(chai

[PATCH 11/49] ext4: Add support for 48 bit inode i_blocks.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
to represent the upper 16 bits of i_blocks. With this change the maximum
file size becomes (2**48 - 1) * 512 bytes.

We add a RO_COMPAT feature to the super block to indicate that inodes
have i_blocks represented as a split 48-bit value. A super block with this
feature set cannot be mounted read-write on a kernel with CONFIG_LSF
disabled.

Super block flag: EXT4_FEATURE_RO_COMPAT_HUGE_FILE

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
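Illustration only, not part of the patch: the 48-bit count is stored as a
32-bit low half plus a 16-bit high half, and anything larger cannot be
represented in 512-byte units. The userspace sketch below shows the
split, the join, and the resulting (2^48 - 1) * 512 byte ceiling.
`blocks_split()`, `blocks_join()` and `max_bytes_48bit()` are hypothetical
names, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9                          /* 512-byte units */
#define MAX_48BIT   ((UINT64_C(1) << 48) - 1)

/* Split a sector count into 32 low and 16 high bits; -1 if it needs more. */
static int blocks_split(uint64_t sectors, uint32_t *lo, uint16_t *hi)
{
	assert(lo && hi);
	if (sectors > MAX_48BIT)
		return -1;
	*lo = (uint32_t)sectors;
	*hi = (uint16_t)(sectors >> 32);
	return 0;
}

/* Rebuild the sector count from its two halves. */
static uint64_t blocks_join(uint32_t lo, uint16_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Upper bound on bytes covered: (2^48 - 1) * 512. */
static uint64_t max_bytes_48bit(void)
{
	return MAX_48BIT << SECTOR_BITS;
}
```

The ceiling works out to 2^57 - 512 bytes, just under 128 PiB.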
 fs/ext4/inode.c |   58 ++-
 fs/ext4/super.c |   62 ++
 include/linux/ext4_fs.h |   10 +--
 3 files changed, 119 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e663455..bb89fe7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2667,6 +2667,22 @@ void ext4_get_inode_flags(struct ext4_inode_info *ei)
if (flags & S_DIRSYNC)
ei->i_flags |= EXT4_DIRSYNC_FL;
 }
+static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
+   struct ext4_inode_info *ei)
+{
+   blkcnt_t i_blocks ;
+   struct super_block *sb = ei->vfs_inode.i_sb;
+
+   if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+   EXT4_FEATURE_RO_COMPAT_HUGE_FILE)) {
+   /* we are using combined 48 bit field */
+   i_blocks = ((u64)le16_to_cpu(raw_inode->i_blocks_high)) << 32 |
+   le32_to_cpu(raw_inode->i_blocks_lo);
+   return i_blocks;
+   } else {
+   return le32_to_cpu(raw_inode->i_blocks_lo);
+   }
+}
 
 void ext4_read_inode(struct inode * inode)
 {
@@ -2715,8 +2731,8 @@ void ext4_read_inode(struct inode * inode)
 * recovery code: that's fine, we're about to complete
 * the process of deleting those. */
}
-   inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
ei->i_flags = le32_to_cpu(raw_inode->i_flags);
+   inode->i_blocks = ext4_inode_blocks(raw_inode, ei);
ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo);
if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
cpu_to_le32(EXT4_OS_HURD)) {
@@ -2799,6 +2815,43 @@ bad_inode:
return;
 }
 
+static int ext4_inode_blocks_set(handle_t *handle,
+   struct ext4_inode *raw_inode,
+   struct ext4_inode_info *ei)
+{
+   struct inode *inode = &(ei->vfs_inode);
+   u64 i_blocks = inode->i_blocks;
+   struct super_block *sb = inode->i_sb;
+   int err = 0;
+
+   if (i_blocks <= ~0U) {
+   /*
+* i_blocks can be represnted in a 32 bit variable
+* as multiple of 512 bytes
+*/
+   raw_inode->i_blocks_lo   = cpu_to_le32((u32)i_blocks);
+   raw_inode->i_blocks_high = 0;
+   } else if (i_blocks <= 0xffffffffffffULL) {
+   /*
+* i_blocks can be represented in a 48 bit variable
+* as multiple of 512 bytes
+*/
+   err = ext4_update_rocompat_feature(handle, sb,
+   EXT4_FEATURE_RO_COMPAT_HUGE_FILE);
+   if (err)
+   goto  err_out;
+   /* i_block is stored in the split  48 bit fields */
+   raw_inode->i_blocks_lo   = cpu_to_le32((u32)i_blocks);
+   raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+   } else {
+   ext4_error(sb, __FUNCTION__,
+   "Wrong inode i_blocks count  %llu\n",
+   (unsigned long long)inode->i_blocks);
+   }
+err_out:
+   return err;
+}
+
 /*
  * Post the struct inode info into an on-disk inode location in the
  * buffer-cache.  This gobbles the caller's reference to the
@@ -2853,7 +2906,8 @@ static int ext4_do_update_inode(handle_t *handle,
EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
 
-   raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
+   if (ext4_inode_blocks_set(handle, raw_inode, ei))
+   goto out_brelse;
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
raw_inode->i_flags = cpu_to_le32(ei->i_flags);
if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7be27db..2b9dc96 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1603,17 +1603,50 @@ static void ext4_orphan_cleanup (struct super_block * sb,
 
 /*
  * Maximal file size.  There is a direct, and {,double-,triple-}indirect
- * block limit, and also a limit of (2^32 - 1) 512-byte sectors in i_blocks.
- * We need to be 1 filesystem block less than the 2^32 sector limit.
+ * block limit, and also a limit of (2^48 - 1

[PATCH 24/49] ext4: add block bitmap validation

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

When a new block bitmap is read from disk in read_block_bitmap(),
a few bits should ALWAYS be set: in particular, those for the blocks
holding the block bitmap, the inode bitmap, and the inode tables.
Validate the block bitmap against these blocks.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
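Illustration only, not part of the patch: the validation boils down to
three checks on the freshly read bitmap, namely that the bit for the block
bitmap block, the bit for the inode bitmap block, and every bit in the
inode-table range are set. The userspace sketch below performs those
checks on a plain byte array. `bit_is_set()` and `bitmap_looks_valid()`
are hypothetical names, the bit order (bit 0 is the least significant bit
of byte 0) is an assumption of this sketch, and the FLEX_BG early return
is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test one bit; bit 0 is the least significant bit of map[0]. */
static bool bit_is_set(const uint8_t *map, size_t bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Hypothetical check: the bits for the block bitmap, the inode bitmap and
 * every inode-table block must be set. All arguments are offsets relative
 * to the first block of the group.
 */
static bool bitmap_looks_valid(const uint8_t *map, size_t block_bitmap_off,
			       size_t inode_bitmap_off, size_t itable_off,
			       size_t itable_len)
{
	assert(map != NULL);
	if (!bit_is_set(map, block_bitmap_off))
		return false;
	if (!bit_is_set(map, inode_bitmap_off))
		return false;
	for (size_t i = 0; i < itable_len; i++)
		if (!bit_is_set(map, itable_off + i))
			return false;
	return true;
}
```

The patch uses ext4_find_next_zero_bit() over the inode-table range where
this sketch uses a simple loop; the outcome is the same for that range.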
 fs/ext4/balloc.c |   99 --
 1 files changed, 81 insertions(+), 18 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index ff3428e..a9140ea 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -189,13 +189,65 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
return desc;
 }
 
+static int ext4_valid_block_bitmap(struct super_block *sb,
+   struct ext4_group_desc *desc,
+   unsigned int block_group,
+   struct buffer_head *bh)
+{
+   ext4_grpblk_t offset;
+   ext4_grpblk_t next_zero_bit;
+   ext4_fsblk_t bitmap_blk;
+   ext4_fsblk_t group_first_block;
+
+   if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
+   /* with FLEX_BG, the inode/block bitmaps and itable
+* blocks may not be in the group at all
+* so the bitmap validation will be skipped for those groups
+* or it has to also read the block group where the bitmaps
+* are located to verify they are set.
+*/
+   return 1;
+   }
+   group_first_block = ext4_group_first_block_no(sb, block_group);
+
+   /* check whether block bitmap block number is set */
+   bitmap_blk = ext4_block_bitmap(sb, desc);
+   offset = bitmap_blk - group_first_block;
+   if (!ext4_test_bit(offset, bh->b_data))
+   /* bad block bitmap */
+   goto err_out;
+
+   /* check whether the inode bitmap block number is set */
+   bitmap_blk = ext4_inode_bitmap(sb, desc);
+   offset = bitmap_blk - group_first_block;
+   if (!ext4_test_bit(offset, bh->b_data))
+   /* bad block bitmap */
+   goto err_out;
+
+   /* check whether the inode table block number is set */
+   bitmap_blk = ext4_inode_table(sb, desc);
+   offset = bitmap_blk - group_first_block;
+   next_zero_bit = ext4_find_next_zero_bit(bh->b_data,
+   offset + EXT4_SB(sb)->s_itb_per_group,
+   offset);
+   if (next_zero_bit >= offset + EXT4_SB(sb)->s_itb_per_group)
+   /* good bitmap for inode tables */
+   return 1;
+
+err_out:
+   ext4_error(sb, __FUNCTION__,
+   "Invalid block bitmap - "
+   "block_group = %d, block = %llu",
+   block_group, bitmap_blk);
+   return 0;
+}
 /**
  * read_block_bitmap()
  * @sb:super block
  * @block_group:   given block group
  *
- * Read the bitmap for a given block_group, reading into the specified
- * slot in the superblock's bitmap cache.
+ * Read the bitmap for a given block_group,and validate the
+ * bits for block/inode/inode tables are set in the bitmaps
  *
  * Return buffer_head on success or NULL in case of failure.
  */
@@ -210,25 +262,36 @@ read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
if (!desc)
return NULL;
bitmap_blk = ext4_block_bitmap(sb, desc);
+   bh = sb_getblk(sb, bitmap_blk);
+   if (unlikely(!bh)) {
+   ext4_error(sb, __FUNCTION__,
+   "Cannot read block bitmap - "
+   "block_group = %d, block_bitmap = %llu",
+   (int)block_group, (unsigned long long)bitmap_blk);
+   return NULL;
+   }
+   if (bh_uptodate_or_lock(bh))
+   return bh;
+
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
-   bh = sb_getblk(sb, bitmap_blk);
-   if (!buffer_uptodate(bh)) {
-   lock_buffer(bh);
-   if (!buffer_uptodate(bh)) {
-   ext4_init_block_bitmap(sb, bh, block_group,
-  desc);
-   set_buffer_uptodate(bh);
-   }
-   unlock_buffer(bh);
-   }
-   } else {
-   bh = sb_bread(sb, bitmap_blk);
+   ext4_init_block_bitmap(sb, bh, block_group, desc);
+   set_buffer_uptodate(bh);
+   unlock_buffer(bh);
+   return bh;
}
-   if (!bh)
-   ext4_error (sb, __FUNCTION__,
+   if (bh_submit_read(bh) < 0) {
+   brelse(bh);
+   ext4_error(sb, __FUNCTION__,
"Cannot read block bitmap - "
- 

[PATCH 46/49] jbd2: add lockdep support

2008-01-21 Thread Theodore Ts'o
From: Mingming Cao <[EMAIL PROTECTED]>

Ported from similar patch for the jbd layer.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/jbd2/transaction.c |   11 +++
 include/linux/jbd2.h  |4 
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index f30802a..70b3199 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -241,6 +241,8 @@ out:
return ret;
 }
 
+static struct lock_class_key jbd2_handle_key;
+
 /* Allocate a new handle.  This should probably be in a slab... */
 static handle_t *new_handle(int nblocks)
 {
@@ -251,6 +253,9 @@ static handle_t *new_handle(int nblocks)
handle->h_buffer_credits = nblocks;
handle->h_ref = 1;
 
+   lockdep_init_map(&handle->h_lockdep_map, "jbd2_handle",
+   &jbd2_handle_key, 0);
+
return handle;
 }
 
@@ -293,7 +298,11 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
jbd2_free_handle(handle);
current->journal_info = NULL;
handle = ERR_PTR(err);
+   goto out;
}
+
+   lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+out:
return handle;
 }
 
@@ -1419,6 +1428,8 @@ int jbd2_journal_stop(handle_t *handle)
spin_unlock(&journal->j_state_lock);
}
 
+   lock_release(&handle->h_lockdep_map, 1, _THIS_IP_);
+
jbd2_free_handle(handle);
return err;
 }
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index a2645c2..f982d38 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -418,6 +418,10 @@ struct handle_s
unsigned int h_sync: 1;  /* sync-on-close */
unsigned int h_jdata: 1;  /* force data journaling */
unsigned int h_aborted: 1;  /* fatal error on handle */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+   struct lockdep_map  h_lockdep_map;
+#endif
 };
 
 
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 12/49] ext4: Support large files

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

This patch converts ext4_inode i_blocks to represent the total number of
blocks occupied by the inode in units of the file system block size.
Previously the field was always counted in 512-byte units, which
limited the maximum size of a file.

The feature is enabled transparently when we write an inode
whose i_blocks cannot be represented in 512-byte units in a
48-bit variable.

Inode flag: EXT4_HUGE_FILE_FL

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
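Illustration only, not part of the patch: the unit switch can be pictured
as follows. A count that fits in 48 bits is stored in 512-byte units as
before; a larger one is shifted right by (blkbits - 9) and stored in file
system blocks, with a flag recording which unit is in use. In the
userspace sketch below, `struct stored_blocks`, `blocks_encode()` and
`blocks_decode()` are hypothetical names, and the `huge` member stands in
for EXT4_HUGE_FILE_FL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_48BIT ((UINT64_C(1) << 48) - 1)

/* Hypothetical result of encoding a sector count for the on-disk field. */
struct stored_blocks {
	uint64_t value;   /* what would go into the 48-bit field */
	bool huge;        /* true: value is in file system blocks */
};

/* sectors is in 512-byte units; blkbits is log2 of the fs block size. */
static struct stored_blocks blocks_encode(uint64_t sectors, int blkbits)
{
	struct stored_blocks s;

	assert(blkbits >= 9);
	s.huge = sectors > MAX_48BIT;
	s.value = s.huge ? sectors >> (blkbits - 9) : sectors;
	return s;
}

/* Convert the stored value back to 512-byte units. */
static uint64_t blocks_decode(struct stored_blocks s, int blkbits)
{
	return s.huge ? s.value << (blkbits - 9) : s.value;
}
```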
 fs/ext4/inode.c |   32 +---
 fs/ext4/super.c |9 ++---
 include/linux/ext4_fs.h |3 ++-
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bb89fe7..9cf8572 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2671,14 +2671,20 @@ static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
struct ext4_inode_info *ei)
 {
blkcnt_t i_blocks ;
-   struct super_block *sb = ei->vfs_inode.i_sb;
+   struct inode *inode = &(ei->vfs_inode);
+   struct super_block *sb = inode->i_sb;
 
if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
EXT4_FEATURE_RO_COMPAT_HUGE_FILE)) {
/* we are using combined 48 bit field */
i_blocks = ((u64)le16_to_cpu(raw_inode->i_blocks_high)) << 32 |
le32_to_cpu(raw_inode->i_blocks_lo);
-   return i_blocks;
+   if (ei->i_flags & EXT4_HUGE_FILE_FL) {
+   /* i_blocks represent file system block size */
+   return i_blocks  << (inode->i_blkbits - 9);
+   } else {
+   return i_blocks;
+   }
} else {
return le32_to_cpu(raw_inode->i_blocks_lo);
}
@@ -2829,8 +2835,9 @@ static int ext4_inode_blocks_set(handle_t *handle,
 * i_blocks can be represnted in a 32 bit variable
 * as multiple of 512 bytes
 */
-   raw_inode->i_blocks_lo   = cpu_to_le32((u32)i_blocks);
+   raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
raw_inode->i_blocks_high = 0;
+   ei->i_flags &= ~EXT4_HUGE_FILE_FL;
} else if (i_blocks <= 0xffffffffffffULL) {
/*
 * i_blocks can be represented in a 48 bit variable
@@ -2841,12 +2848,23 @@ static int ext4_inode_blocks_set(handle_t *handle,
if (err)
goto  err_out;
/* i_block is stored in the split  48 bit fields */
-   raw_inode->i_blocks_lo   = cpu_to_le32((u32)i_blocks);
+   raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
+   ei->i_flags &= ~EXT4_HUGE_FILE_FL;
} else {
-   ext4_error(sb, __FUNCTION__,
-   "Wrong inode i_blocks count  %llu\n",
-   (unsigned long long)inode->i_blocks);
+   /*
+* i_blocks should be represented in a 48 bit variable
+* as multiple of  file system block size
+*/
+   err = ext4_update_rocompat_feature(handle, sb,
+   EXT4_FEATURE_RO_COMPAT_HUGE_FILE);
+   if (err)
+   goto  err_out;
+   ei->i_flags |= EXT4_HUGE_FILE_FL;
+   /* i_block is stored in file system block size */
+   i_blocks = i_blocks >> (inode->i_blkbits - 9);
+   raw_inode->i_blocks_lo   = cpu_to_le32(i_blocks);
+   raw_inode->i_blocks_high = cpu_to_le16(i_blocks >> 32);
}
 err_out:
return err;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2b9dc96..64067de 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1631,11 +1631,14 @@ static loff_t ext4_max_size(int bits)
upper_limit >>= (bits - 9);
 
} else {
-   /* We use 48 bit ext4_inode i_blocks */
+   /*
+* We use 48 bit ext4_inode i_blocks
+* With EXT4_HUGE_FILE_FL set the i_blocks
+* represent total number of blocks in
+* file system block size
+*/
upper_limit = (1LL << 48) - 1;
 
-   /* total blocks in file system block size */
-   upper_limit >>= (bits - 9);
}
 
/* indirect blocks */
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index be25eca..6ae91f4 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -178,8 +178,9 @@ struct ext4_group_desc
 #define EXT4_NOTAIL_FL 0x8000 /* file tail should not be 
merged */
 #define EXT4_DIRSYNC_FL0x0001 /* dirsync behaviour 
(directories only) */
 #defin

[PATCH 37/49] ext4: Fix ext4_show_options to show the correct mount options.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

We need to look at the default values and make sure a mount
option was not set via its default value before showing it
in ext4_show_options().

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |   26 +++---
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index aa22acd..64fc7f1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -665,18 +665,20 @@ static inline void ext4_show_quota_options(struct 
seq_file *seq, struct super_bl
  */
 static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 {
+   int def_errors;
+   unsigned long def_mount_opts;
struct super_block *sb = vfs->mnt_sb;
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_super_block *es = sbi->s_es;
-   unsigned long def_mount_opts;
 
def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
+   def_errors = le16_to_cpu(es->s_errors);
 
if (sbi->s_sb_block != 1)
seq_printf(seq, ",sb=%llu", sbi->s_sb_block);
if (test_opt(sb, MINIX_DF))
seq_puts(seq, ",minixdf");
-   if (test_opt(sb, GRPID))
+   if (test_opt(sb, GRPID) && !(def_mount_opts & EXT4_DEFM_BSDGROUPS))
seq_puts(seq, ",grpid");
if (!test_opt(sb, GRPID) && (def_mount_opts & EXT4_DEFM_BSDGROUPS))
seq_puts(seq, ",nogrpid");
@@ -689,25 +691,24 @@ static int ext4_show_options(struct seq_file *seq, struct 
vfsmount *vfs)
seq_printf(seq, ",resgid=%u", sbi->s_resgid);
}
if (test_opt(sb, ERRORS_RO)) {
-   int def_errors = le16_to_cpu(es->s_errors);
-
if (def_errors == EXT4_ERRORS_PANIC ||
def_errors == EXT4_ERRORS_CONTINUE) {
seq_puts(seq, ",errors=remount-ro");
}
}
-   if (test_opt(sb, ERRORS_CONT))
+   if (test_opt(sb, ERRORS_CONT) && def_errors != EXT4_ERRORS_CONTINUE)
seq_puts(seq, ",errors=continue");
-   if (test_opt(sb, ERRORS_PANIC))
+   if (test_opt(sb, ERRORS_PANIC) && def_errors != EXT4_ERRORS_PANIC)
seq_puts(seq, ",errors=panic");
-   if (test_opt(sb, NO_UID32))
+   if (test_opt(sb, NO_UID32) && !(def_mount_opts & EXT4_DEFM_UID16))
seq_puts(seq, ",nouid32");
-   if (test_opt(sb, DEBUG))
+   if (test_opt(sb, DEBUG) && !(def_mount_opts & EXT4_DEFM_DEBUG))
seq_puts(seq, ",debug");
if (test_opt(sb, OLDALLOC))
seq_puts(seq, ",oldalloc");
 #ifdef CONFIG_EXT4DEV_FS_XATTR
-   if (test_opt(sb, XATTR_USER))
+   if (test_opt(sb, XATTR_USER) &&
+   !(def_mount_opts & EXT4_DEFM_XATTR_USER))
seq_puts(seq, ",user_xattr");
if (!test_opt(sb, XATTR_USER) &&
(def_mount_opts & EXT4_DEFM_XATTR_USER)) {
@@ -715,7 +716,7 @@ static int ext4_show_options(struct seq_file *seq, struct 
vfsmount *vfs)
}
 #endif
 #ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
-   if (test_opt(sb, POSIX_ACL))
+   if (test_opt(sb, POSIX_ACL) && !(def_mount_opts & EXT4_DEFM_ACL))
seq_puts(seq, ",acl");
if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT4_DEFM_ACL))
seq_puts(seq, ",noacl");
@@ -735,6 +736,10 @@ static int ext4_show_options(struct seq_file *seq, struct 
vfsmount *vfs)
if (test_opt(sb, I_VERSION))
seq_puts(seq, ",i_version");
 
+   /*
+* journal mode get enabled in different ways
+* So just print the value even if we didn't specify it
+*/
if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
seq_puts(seq, ",data=journal");
else if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_ORDERED_DATA)
@@ -743,7 +748,6 @@ static int ext4_show_options(struct seq_file *seq, struct 
vfsmount *vfs)
seq_puts(seq, ",data=writeback");
 
ext4_show_quota_options(seq, sb);
-
return 0;
 }
 
-- 
1.5.4.rc3.31.g1271-dirty
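
The rule this patch applies (print an option only when its current state
differs from the superblock default) can be sketched in userspace as
follows. This is only an illustration, not the kernel code: the SKETCH_*
flag values and both helper functions are hypothetical names made up for
the example, and a plain character buffer stands in for the seq_file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values for this sketch; not the kernel's constants. */
#define SKETCH_OPT_GRPID 0x1u
#define SKETCH_OPT_DEBUG 0x2u

/*
 * Append ",<name>" when the option is active but not a default, and
 * ",<neg_name>" when it is a default but currently inactive.
 * neg_name may be NULL for options that have no negative form.
 */
static void sketch_show_option(char *buf, size_t len, unsigned int active,
                               unsigned int defaults, unsigned int flag,
                               const char *name, const char *neg_name)
{
    size_t used = strlen(buf);
    int on = (active & flag) != 0;
    int def = (defaults & flag) != 0;

    if (on && !def)
        snprintf(buf + used, len - used, ",%s", name);
    else if (!on && def && neg_name)
        snprintf(buf + used, len - used, ",%s", neg_name);
}

/* Build the option string for the two example options. */
static void sketch_show_options(char *buf, size_t len, unsigned int active,
                                unsigned int defaults)
{
    assert(len > 0);
    buf[0] = '\0';
    sketch_show_option(buf, len, active, defaults, SKETCH_OPT_GRPID,
                       "grpid", "nogrpid");
    sketch_show_option(buf, len, active, defaults, SKETCH_OPT_DEBUG,
                       "debug", NULL);
}
```

With grpid both active and a default, nothing is printed for it. With
grpid a default but switched off at mount time, the sketch prints
",nogrpid", which matches the existing nogrpid branch in the diff above.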



[PATCH 07/49] ext4: Introduce ext4_update_*_feature

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

Introduce ext4_update_*_feature and use them instead
of opencoding.


Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/ialloc.c|   11 +++-
 fs/ext4/super.c |   60 +++
 include/linux/ext4_fs.h |6 
 3 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 7b5cfa6..00b152b 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -748,13 +748,10 @@ got:
if (test_opt(sb, EXTENTS)) {
EXT4_I(inode)->i_flags |= EXT4_EXTENTS_FL;
ext4_ext_tree_init(handle, inode);
-   if (!EXT4_HAS_INCOMPAT_FEATURE(sb, 
EXT4_FEATURE_INCOMPAT_EXTENTS)) {
-   err = ext4_journal_get_write_access(handle, 
EXT4_SB(sb)->s_sbh);
-   if (err) goto fail;
-   EXT4_SET_INCOMPAT_FEATURE(sb, 
EXT4_FEATURE_INCOMPAT_EXTENTS);
-   BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "call 
ext4_journal_dirty_metadata");
-   err = ext4_journal_dirty_metadata(handle, 
EXT4_SB(sb)->s_sbh);
-   }
+   err = ext4_update_incompat_feature(handle, sb,
+   EXT4_FEATURE_INCOMPAT_EXTENTS);
+   if (err)
+   goto fail;
}
 
ext4_debug("allocating inode %lu\n", inode->i_ino);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index df8842b..4d7f33f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -373,6 +373,66 @@ void ext4_update_dynamic_rev(struct super_block *sb)
 */
 }
 
+int ext4_update_compat_feature(handle_t *handle,
+   struct super_block *sb, __u32 compat)
+{
+   int err = 0;
+   if (!EXT4_HAS_COMPAT_FEATURE(sb, compat)) {
+   err = ext4_journal_get_write_access(handle,
+   EXT4_SB(sb)->s_sbh);
+   if (err)
+   return err;
+   EXT4_SET_COMPAT_FEATURE(sb, compat);
+   sb->s_dirt = 1;
+   handle->h_sync = 1;
+   BUFFER_TRACE(EXT4_SB(sb)->s_sbh,
+   "call ext4_journal_dirty_metadata");
+   err = ext4_journal_dirty_metadata(handle,
+   EXT4_SB(sb)->s_sbh);
+   }
+   return err;
+}
+
+int ext4_update_rocompat_feature(handle_t *handle,
+   struct super_block *sb, __u32 rocompat)
+{
+   int err = 0;
+   if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, rocompat)) {
+   err = ext4_journal_get_write_access(handle,
+   EXT4_SB(sb)->s_sbh);
+   if (err)
+   return err;
+   EXT4_SET_RO_COMPAT_FEATURE(sb, rocompat);
+   sb->s_dirt = 1;
+   handle->h_sync = 1;
+   BUFFER_TRACE(EXT4_SB(sb)->s_sbh,
+   "call ext4_journal_dirty_metadata");
+   err = ext4_journal_dirty_metadata(handle,
+   EXT4_SB(sb)->s_sbh);
+   }
+   return err;
+}
+
+int ext4_update_incompat_feature(handle_t *handle,
+   struct super_block *sb, __u32 incompat)
+{
+   int err = 0;
+   if (!EXT4_HAS_INCOMPAT_FEATURE(sb, incompat)) {
+   err = ext4_journal_get_write_access(handle,
+   EXT4_SB(sb)->s_sbh);
+   if (err)
+   return err;
+   EXT4_SET_INCOMPAT_FEATURE(sb, incompat);
+   sb->s_dirt = 1;
+   handle->h_sync = 1;
+   BUFFER_TRACE(EXT4_SB(sb)->s_sbh,
+   "call ext4_journal_dirty_metadata");
+   err = ext4_journal_dirty_metadata(handle,
+   EXT4_SB(sb)->s_sbh);
+   }
+   return err;
+}
+
 /*
  * Open the external journal device
  */
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index e1103c2..429dbfc 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -989,6 +989,12 @@ extern void ext4_abort (struct super_block *, const char 
*, const char *, ...)
 extern void ext4_warning (struct super_block *, const char *, const char *, 
...)
__attribute__ ((format (printf, 3, 4)));
 extern void ext4_update_dynamic_rev (struct super_block *sb);
+extern int ext4_update_compat_feature(handle_t *handle, struct super_block *sb,
+   __u32 compat);
+extern int ext4_update_rocompat_feature(handle_t *handle,
+   struct super_block *sb, __u32 rocompat);
+extern int ext4_update_incompat_feature(handle_t *handle,
+   struct super_block *sb, __u32 incompat);
 extern ext4_fsblk_t ext4_block_bitmap(struct super_block *sb,
 

[PATCH 08/49] ext4: Fix sparse warnings.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

Fix sparse warnings related to static functions
and local variables.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c   |6 +++---
 fs/ext4/inode.c |   18 +++---
 fs/ext4/super.c |3 +++
 include/linux/ext4_fs.h |2 ++
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 6853722..754c0d3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1088,7 +1088,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode 
*inode,
  * then we have to correct all indexes above.
  * TODO: do we need to correct tree in all cases?
  */
-int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,
+static int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path)
 {
struct ext4_extent_header *eh;
@@ -1535,7 +1535,7 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block,
  * It's used in truncate case only, thus all requests are for
  * last index in the block only.
  */
-int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
+static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path)
 {
struct buffer_head *bh;
@@ -1806,7 +1806,7 @@ ext4_ext_more_to_rm(struct ext4_ext_path *path)
return 1;
 }
 
-int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start)
+static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start)
 {
struct super_block *sb = inode->i_sb;
int depth = ext_depth(inode);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1ee19c9..76ceba2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2052,11 +2052,11 @@ static void ext4_clear_blocks(handle_t *handle, struct 
inode *inode,
for (p = first; p < last; p++) {
u32 nr = le32_to_cpu(*p);
if (nr) {
-   struct buffer_head *bh;
+   struct buffer_head *tbh;
 
*p = 0;
-   bh = sb_find_get_block(inode->i_sb, nr);
-   ext4_forget(handle, 0, inode, bh, nr);
+   tbh = sb_find_get_block(inode->i_sb, nr);
+   ext4_forget(handle, 0, inode, tbh, nr);
}
}
 
@@ -2324,8 +2324,10 @@ void ext4_truncate(struct inode *inode)
return;
}
 
-   if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
-   return ext4_ext_truncate(inode, page);
+   if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+   ext4_ext_truncate(inode, page);
+   return;
+   }
 
handle = start_transaction(inode);
if (IS_ERR(handle)) {
@@ -3163,8 +3165,10 @@ ext4_reserve_inode_write(handle_t *handle, struct inode 
*inode,
  * Expand an inode by new_extra_isize bytes.
  * Returns 0 on success or negative error number on failure.
  */
-int ext4_expand_extra_isize(struct inode *inode, unsigned int new_extra_isize,
-   struct ext4_iloc iloc, handle_t *handle)
+static int ext4_expand_extra_isize(struct inode *inode,
+  unsigned int new_extra_isize,
+  struct ext4_iloc iloc,
+  handle_t *handle)
 {
struct ext4_inode *raw_inode;
struct ext4_xattr_ibody_header *header;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4d7f33f..7be27db 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1644,6 +1644,9 @@ static ext4_fsblk_t descriptor_loc(struct super_block *sb,
 
 
 static int ext4_fill_super (struct super_block *sb, void *data, int silent)
+   __releases(kernel_sem)
+   __acquires(kernel_sem)
+
 {
struct buffer_head * bh;
struct ext4_super_block *es = NULL;
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 429dbfc..1a27433 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -893,6 +893,8 @@ extern ext4_fsblk_t ext4_new_block (handle_t *handle, 
struct inode *inode,
ext4_fsblk_t goal, int *errp);
 extern ext4_fsblk_t ext4_new_blocks (handle_t *handle, struct inode *inode,
ext4_fsblk_t goal, unsigned long *count, int *errp);
+extern ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
+   ext4_fsblk_t goal, unsigned long *count, int *errp);
 extern void ext4_free_blocks (handle_t *handle, struct inode *inode,
ext4_fsblk_t block, unsigned long count);
 extern void ext4_free_blocks_sb (handle_t *handle, struct super_block *sb,
-- 
1.5.4.rc3.31.g1271-dirty


[PATCH 21/49] ext4: fix oops on corrupted ext4 mount

2008-01-21 Thread Theodore Ts'o
From: Eric Sandeen <[EMAIL PROTECTED]>

When mounting an ext4 filesystem with corrupted s_first_data_block, things
can go very wrong and oops.

Because blocks_count in ext4_fill_super is a u64, and we must use do_div,
the calculation of db_count is done differently than on ext3.  If
first_data_block is corrupted such that it is larger than ext4_blocks_count,
for example, then the intermediate blocks_count value may go negative,
but sign-extend to a very large value:

blocks_count = (ext4_blocks_count(es) -
le32_to_cpu(es->s_first_data_block) +
EXT4_BLOCKS_PER_GROUP(sb) - 1);

This is then assigned to s_groups_count which is an unsigned long:

sbi->s_groups_count = blocks_count;

This may result in a value of 0x which is then used to compute
db_count:

db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
   EXT4_DESC_PER_BLOCK(sb);

and in this case db_count will wind up as 0 because the addition overflows
32 bits.  This in turn causes the kmalloc for group_desc to be of 0 size:

sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *),
GFP_KERNEL);

and eventually in ext4_check_descriptors, dereferencing
sbi->s_group_desc[desc_block] will result in a NULL pointer dereference.

The simplest test seems to be to sanity check s_first_data_block,
EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure
their combination won't result in a bad intermediate value for
blocks_count.  We could just check for db_count == 0, but
catching it at the root cause seems like it provides more info.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
Reviewed-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1484a08..32e3ecb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1997,6 +1997,17 @@ static int ext4_fill_super (struct super_block *sb, void 
*data, int silent)
 
if (EXT4_BLOCKS_PER_GROUP(sb) == 0)
goto cantfind_ext4;
+
+   /* ensure blocks_count calculation below doesn't sign-extend */
+   if (ext4_blocks_count(es) + EXT4_BLOCKS_PER_GROUP(sb) <
+   le32_to_cpu(es->s_first_data_block) + 1) {
+   printk(KERN_WARNING "EXT4-fs: bad geometry: block count %llu, "
+  "first data block %u, blocks per group %lu\n",
+   ext4_blocks_count(es),
+   le32_to_cpu(es->s_first_data_block),
+   EXT4_BLOCKS_PER_GROUP(sb));
+   goto failed_mount;
+   }
blocks_count = (ext4_blocks_count(es) -
le32_to_cpu(es->s_first_data_block) +
EXT4_BLOCKS_PER_GROUP(sb) - 1);
-- 
1.5.4.rc3.31.g1271-dirty
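
The wrap-around described in the changelog can be reproduced in userspace
with a small sketch. It is a sketch under assumptions, not the kernel code:
both function names are made up for the example, s_groups_count is modelled
as a 32-bit value (as unsigned long is on 32-bit systems), and plain
division stands in for do_div().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of the db_count computation from the changelog. The groups
 * count is held in 32 bits, which is an assumption about the platform.
 */
static uint32_t sketch_db_count(uint64_t total_blocks,
                                uint32_t first_data_block,
                                uint32_t blocks_per_group,
                                uint32_t desc_per_block)
{
    uint64_t blocks_count;
    uint32_t groups_count;

    assert(blocks_per_group != 0 && desc_per_block != 0);

    /* Wraps to a huge value when first_data_block is too large. */
    blocks_count = total_blocks - first_data_block + blocks_per_group - 1;
    blocks_count /= blocks_per_group;       /* do_div() in the kernel */
    groups_count = (uint32_t)blocks_count;  /* truncating assignment */

    /* 32-bit addition: this can overflow and produce 0. */
    return (groups_count + desc_per_block - 1) / desc_per_block;
}

/*
 * The sanity check added by the patch, done here in 64-bit arithmetic.
 * Returns nonzero when the geometry would make blocks_count wrap.
 */
static int sketch_bad_geometry(uint64_t total_blocks,
                               uint32_t first_data_block,
                               uint32_t blocks_per_group)
{
    return total_blocks + blocks_per_group <
           (uint64_t)first_data_block + 1;
}
```

For example, 1000 total blocks, a corrupted first data block of 40000,
32768 blocks per group and 128 descriptors per block give a db_count of 0,
and the sanity check rejects that geometry before the computation is
reached.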



[PATCH 26/49] jbd2: Fix assertion failure in fs/jbd2/checkpoint.c

2008-01-21 Thread Theodore Ts'o
From: Jan Kara <[EMAIL PROTECTED]>

Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.

If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).

We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state.  The locking there is subtle though (as
everywhere in JBD ;().  We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.

Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either.  Better be safe if something
changes in future...

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 fs/jbd2/checkpoint.c |   12 ++--
 fs/jbd2/commit.c |8 
 include/linux/jbd2.h |2 ++
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 3fccde7..7e958c8 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -602,15 +602,15 @@ int __jbd2_journal_remove_checkpoint(struct journal_head 
*jh)
 
/*
 * There is one special case to worry about: if we have just pulled the
-* buffer off a committing transaction's forget list, then even if the
-* checkpoint list is empty, the transaction obviously cannot be
-* dropped!
+* buffer off a running or committing transaction's checkpoint list,
+* then even if the checkpoint list is empty, the transaction obviously
+* cannot be dropped!
 *
-* The locking here around j_committing_transaction is a bit sleazy.
+* The locking here around t_state is a bit sleazy.
 * See the comment at the end of jbd2_journal_commit_transaction().
 */
-   if (transaction == journal->j_committing_transaction) {
-   JBUFFER_TRACE(jh, "belongs to committing transaction");
+   if (transaction->t_state != T_FINISHED) {
+   JBUFFER_TRACE(jh, "belongs to running/committing transaction");
goto out;
}
 
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6986f33..39b5cee 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -867,10 +867,10 @@ restart_loop:
}
spin_unlock(&journal->j_list_lock);
/*
-* This is a bit sleazy.  We borrow j_list_lock to protect
-* journal->j_committing_transaction in 
__jbd2_journal_remove_checkpoint.
-* Really, __jbd2_journal_remove_checkpoint should be using 
j_state_lock but
-* it's a bit hassle to hold that across 
__jbd2_journal_remove_checkpoint
+* This is a bit sleazy.  We use j_list_lock to protect transition
+* of a transaction into T_FINISHED state and calling
+* __jbd2_journal_drop_transaction(). Otherwise we could race with
+* other checkpointing code processing the transaction...
 */
spin_lock(&journal->j_state_lock);
spin_lock(&journal->j_list_lock);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index d5f7cff..d861ffd 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -442,6 +442,8 @@ struct transaction_s
/*
 * Transaction's current state
 * [no locking - only kjournald2 alters this]
+* [j_list_lock] guards transition of a transaction into T_FINISHED
+* state and subsequent call of __jbd2_journal_drop_transaction()
 * FIXME: needs barriers
 * KLUDGE: [use j_state_lock]
 */
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 23/49] Add buffer head related helper functions

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

Add the buffer head related helper functions bh_uptodate_or_lock and
bh_submit_read, which can be used by file systems.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/buffer.c |   41 +
 include/linux/buffer_head.h |2 ++
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 7249e01..7593ff3 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3213,6 +3213,47 @@ static int buffer_cpu_notify(struct notifier_block *self,
return NOTIFY_OK;
 }
 
+/**
+ * bh_uptodate_or_lock: Test whether the buffer is uptodate
+ * @bh: struct buffer_head
+ *
+ * Return true if the buffer is up-to-date and false,
+ * with the buffer locked, if not.
+ */
+int bh_uptodate_or_lock(struct buffer_head *bh)
+{
+   if (!buffer_uptodate(bh)) {
+   lock_buffer(bh);
+   if (!buffer_uptodate(bh))
+   return 0;
+   unlock_buffer(bh);
+   }
+   return 1;
+}
+EXPORT_SYMBOL(bh_uptodate_or_lock);
+/**
+ * bh_submit_read: Submit a locked buffer for reading
+ * @bh: struct buffer_head
+ *
+ * Returns zero on success and -EIO on error.
+ */
+int bh_submit_read(struct buffer_head *bh)
+{
+   if (!buffer_locked(bh))
+   lock_buffer(bh);
+
+   if (buffer_uptodate(bh))
+   return 0;
+
+   get_bh(bh);
+   bh->b_end_io = end_buffer_read_sync;
+   submit_bh(READ, bh);
+   wait_on_buffer(bh);
+   if (buffer_uptodate(bh))
+   return 0;
+   return -EIO;
+}
+EXPORT_SYMBOL(bh_submit_read);
 void __init buffer_init(void)
 {
int nrpages;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index da0d83f..e98801f 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -192,6 +192,8 @@ int sync_dirty_buffer(struct buffer_head *bh);
 int submit_bh(int, struct buffer_head *);
 void write_boundary_block(struct block_device *bdev,
sector_t bblock, unsigned blocksize);
+int bh_uptodate_or_lock(struct buffer_head *bh);
+int bh_submit_read(struct buffer_head *bh);
 
 extern int buffer_heads_over_limit;
 
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 31/49] ext4: Take read lock during overwrite case.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

When we are overwriting a file and not actually allocating new file system
blocks we need to take only the read lock on i_data_sem.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/inode.c |   32 
 1 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 596b3ab..ee0bc3a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -901,11 +901,31 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode 
*inode, sector_t block,
int create, int extend_disksize)
 {
int retval;
-   if (create) {
-   down_write((&EXT4_I(inode)->i_data_sem));
+   /*
+* Try to see if we can get  the block without requesting
+* for new file system block.
+*/
+   down_read((&EXT4_I(inode)->i_data_sem));
+   if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+   retval =  ext4_ext_get_blocks(handle, inode, block, max_blocks,
+   bh, 0, 0);
} else {
-   down_read((&EXT4_I(inode)->i_data_sem));
+   retval = ext4_get_blocks_handle(handle,
+   inode, block, max_blocks, bh, 0, 0);
}
+   up_read((&EXT4_I(inode)->i_data_sem));
+   if (!create || (retval > 0))
+   return retval;
+
+   /*
+* We need to allocate new blocks which will result
+* in i_data update
+*/
+   down_write((&EXT4_I(inode)->i_data_sem));
+   /*
+* We need to check for EXT4 here because migrate
+* could have changed the inode type in between
+*/
if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
retval =  ext4_ext_get_blocks(handle, inode, block, max_blocks,
bh, create, extend_disksize);
@@ -913,11 +933,7 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode 
*inode, sector_t block,
retval = ext4_get_blocks_handle(handle, inode, block,
max_blocks, bh, create, extend_disksize);
}
-   if (create) {
-   up_write((&EXT4_I(inode)->i_data_sem));
-   } else {
-   up_read((&EXT4_I(inode)->i_data_sem));
-   }
+   up_write((&EXT4_I(inode)->i_data_sem));
return retval;
 }
 static int ext4_get_block(struct inode *inode, sector_t iblock,
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 14/49] ext4: export iov_shorten from kernel for ext4's use

2008-01-21 Thread Theodore Ts'o
From: Eric Sandeen <[EMAIL PROTECTED]>

Export iov_shorten() from kernel so that ext4 can
truncate too-large writes to bitmapped files.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
---
 fs/read_write.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index ea1f94c..dfaee3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -450,6 +450,7 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long 
nr_segs, size_t to)
}
return seg;
 }
+EXPORT_SYMBOL(iov_shorten);
 
 ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
unsigned long nr_segs, size_t len, loff_t *ppos, iov_fn_t fn)
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 16/49] ext2: Fix the max file size for ext2 file system.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

The max file size for the ext2 file system is currently calculated
with a hardcoded 4K block size. The patch fixes it to be
calculated with the right block size.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext2/super.c |   32 
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 154e25f..6abaf75 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -680,11 +680,31 @@ static int ext2_check_descriptors (struct super_block * 
sb)
 static loff_t ext2_max_size(int bits)
 {
loff_t res = EXT2_NDIR_BLOCKS;
-   /* This constant is calculated to be the largest file size for a
-* dense, 4k-blocksize file such that the total number of
+   int meta_blocks;
+   loff_t upper_limit;
+
+   /* This is calculated to be the largest file size for a
+* dense, file such that the total number of
 * sectors in the file, including data and all indirect blocks,
-* does not exceed 2^32. */
-   const loff_t upper_limit = 0x1ff7fffd000LL;
+* does not exceed 2^32 -1
+* __u32 i_blocks representing the total number of
+* 512 bytes blocks of the file
+*/
+   upper_limit = (1LL << 32) - 1;
+
+   /* total blocks in file system block size */
+   upper_limit >>= (bits - 9);
+
+
+   /* indirect blocks */
+   meta_blocks = 1;
+   /* double indirect blocks */
+   meta_blocks += 1 + (1LL << (bits-2));
+   /* triple indirect blocks */
+   meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2)));
+
+   upper_limit -= meta_blocks;
+   upper_limit <<= bits;
 
res += 1LL << (bits-2);
res += 1LL << (2*(bits-2));
@@ -692,6 +712,10 @@ static loff_t ext2_max_size(int bits)
res <<= bits;
if (res > upper_limit)
res = upper_limit;
+
+   if (res > MAX_LFS_FILESIZE)
+   res = MAX_LFS_FILESIZE;
+
return res;
 }
 
-- 
1.5.4.rc3.31.g1271-dirty
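
As a worked example of the new calculation, the arithmetic can be run in
userspace. This is a sketch under assumptions: the function name and the
SKETCH_* constants are made up for the example, MAX_LFS_FILESIZE is taken
to be the 64-bit value, and the triple-indirect term added to res comes
from context that the hunks above do not show.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NDIR_BLOCKS 12              /* direct block pointers per inode */
#define SKETCH_MAX_LFS_FILESIZE INT64_MAX  /* assumption: 64-bit kernel value */

/* bits is log2 of the block size: 10 for 1K, 11 for 2K, 12 for 4K. */
static int64_t sketch_ext2_max_size(int bits)
{
    int64_t res = SKETCH_NDIR_BLOCKS;
    int64_t meta_blocks;
    int64_t upper_limit;

    assert(bits >= 10 && bits <= 12);

    /* i_blocks is a __u32 count of 512-byte sectors. */
    upper_limit = (1LL << 32) - 1;
    /* Convert the sector count to file system blocks. */
    upper_limit >>= (bits - 9);

    /* Metadata blocks: indirect, double indirect, triple indirect. */
    meta_blocks = 1;
    meta_blocks += 1 + (1LL << (bits - 2));
    meta_blocks += 1 + (1LL << (bits - 2)) + (1LL << (2 * (bits - 2)));

    upper_limit -= meta_blocks;
    upper_limit <<= bits;

    /* Blocks addressable through the block map. */
    res += 1LL << (bits - 2);
    res += 1LL << (2 * (bits - 2));
    res += 1LL << (3 * (bits - 2));
    res <<= bits;

    if (res > upper_limit)
        res = upper_limit;
    if (res > SKETCH_MAX_LFS_FILESIZE)
        res = SKETCH_MAX_LFS_FILESIZE;
    return res;
}
```

With 1K blocks the block map is the limiting factor and the result is
17247252480 bytes (about 16 GiB). With 4K blocks the 32-bit i_blocks
field is the limiting factor and the result is 2194719883264 bytes (just
under 2 TiB).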



[PATCH 17/49] ext3: Fix the max file size for ext3 file system.

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

The max file size for the ext3 file system is currently calculated
with a hardcoded 4K block size. The patch fixes it to be
calculated with the right block size.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext3/super.c |   32 
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index cb14de1..f3675cc 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1436,11 +1436,31 @@ static void ext3_orphan_cleanup (struct super_block * 
sb,
 static loff_t ext3_max_size(int bits)
 {
loff_t res = EXT3_NDIR_BLOCKS;
-   /* This constant is calculated to be the largest file size for a
-* dense, 4k-blocksize file such that the total number of
+   int meta_blocks;
+   loff_t upper_limit;
+
+   /* This is calculated to be the largest file size for a
+* dense, file such that the total number of
 * sectors in the file, including data and all indirect blocks,
-* does not exceed 2^32. */
-   const loff_t upper_limit = 0x1ff7fffd000LL;
+* does not exceed 2^32 -1
+* __u32 i_blocks representing the total number of
+* 512 bytes blocks of the file
+*/
+   upper_limit = (1LL << 32) - 1;
+
+   /* total blocks in file system block size */
+   upper_limit >>= (bits - 9);
+
+
+   /* indirect blocks */
+   meta_blocks = 1;
+   /* double indirect blocks */
+   meta_blocks += 1 + (1LL << (bits-2));
+   /* triple indirect blocks */
+   meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2)));
+
+   upper_limit -= meta_blocks;
+   upper_limit <<= bits;
 
res += 1LL << (bits-2);
res += 1LL << (2*(bits-2));
@@ -1448,6 +1468,10 @@ static loff_t ext3_max_size(int bits)
res <<= bits;
if (res > upper_limit)
res = upper_limit;
+
+   if (res > MAX_LFS_FILESIZE)
+   res = MAX_LFS_FILESIZE;
+
return res;
 }
 
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 27/49] ext4: Check for the correct error return from

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

ext4_ext_get_blocks returns negative values on error. We should
check for ret <= 0 rather than only for ret == 0.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 754c0d3..8593e59 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2462,12 +2462,12 @@ retry:
ret = ext4_ext_get_blocks(handle, inode, block,
  max_blocks, &map_bh,
  EXT4_CREATE_UNINITIALIZED_EXT, 0);
-   WARN_ON(!ret);
-   if (!ret) {
+   WARN_ON(ret <= 0);
+   if (ret <= 0) {
ext4_error(inode->i_sb, "ext4_fallocate",
-  "ext4_ext_get_blocks returned 0! inode#%lu"
-  ", block=%u, max_blocks=%lu",
-  inode->i_ino, block, max_blocks);
+   "ext4_ext_get_blocks returned error: "
+   "inode#%lu, block=%u, max_blocks=%lu",
+   inode->i_ino, block, max_blocks);
ret = -EIO;
ext4_mark_inode_dirty(handle, inode);
ret2 = ext4_journal_stop(handle);
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 19/49] ext4: Return after ext4_error in case of failures

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

This fixes some instances where we were continuing after calling
ext4_error. ext4_error panics only if the errors=panic mount option is
set, so we need to make sure we return correctly after the ext4_error call.

Reported by: Adrian Bunk <[EMAIL PROTECTED]>

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/balloc.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 9568a57..ff3428e 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -587,11 +587,13 @@ do_more:
in_range(ext4_inode_bitmap(sb, desc), block, count) ||
in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
in_range(block + count - 1, ext4_inode_table(sb, desc),
-sbi->s_itb_per_group))
+sbi->s_itb_per_group)) {
ext4_error (sb, "ext4_free_blocks",
"Freeing blocks in system zones - "
"Block = %llu, count = %lu",
block, count);
+   goto error_return;
+   }
 
/*
 * We are about to start releasing blocks in the bitmap,
@@ -1690,11 +1692,13 @@ allocated:
in_range(ret_block, ext4_inode_table(sb, gdp),
 EXT4_SB(sb)->s_itb_per_group) ||
in_range(ret_block + num - 1, ext4_inode_table(sb, gdp),
-EXT4_SB(sb)->s_itb_per_group))
+EXT4_SB(sb)->s_itb_per_group)) {
ext4_error(sb, "ext4_new_block",
"Allocating block in system zone - "
"blocks from %llu, length %lu",
 ret_block, num);
+   goto out;
+   }
 
performed_allocation = 1;
 
-- 
1.5.4.rc3.31.g1271-dirty
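
The pattern the commit message describes can be sketched in user space.
This is only an illustration under stated assumptions: report_error(),
free_blocks_sketch() and the err_policy enum are hypothetical names, not
kernel interfaces. The point is that an error reporter which may return
forces every caller to bail out explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the errors= mount option. */
enum err_policy { ERRORS_CONTINUE, ERRORS_PANIC };

/*
 * Hypothetical stand-in for ext4_error(): it logs the problem and only
 * aborts when the policy asks for it, so it can return to the caller.
 */
static void report_error(enum err_policy policy, const char *msg)
{
	fprintf(stderr, "error: %s\n", msg);
	if (policy == ERRORS_PANIC)
		abort();
}

/* Nonzero when [block, block + count) overlaps [zone, zone + zone_len). */
static int overlaps(unsigned long block, unsigned long count,
		    unsigned long zone, unsigned long zone_len)
{
	return block < zone + zone_len && zone < block + count;
}

/*
 * Free `count` blocks starting at `block`, refusing to touch the
 * reserved zone. Returns the number of blocks freed, or -1 on error.
 */
static long free_blocks_sketch(enum err_policy policy, unsigned long block,
			       unsigned long count, unsigned long zone,
			       unsigned long zone_len)
{
	assert(count > 0);
	if (overlaps(block, count, zone, zone_len)) {
		report_error(policy, "freeing blocks in system zone");
		return -1;	/* without this, the bad range would be freed */
	}
	return (long)count;
}
```

With ERRORS_CONTINUE the reporter returns, so the early return is the
only thing that keeps the overlapping range from being released.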



[PATCH 42/49] ext4: Enable the multiblock allocator by default

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

Enable the multiblock allocator by default.

Fix ext4_show_options() so that if it is not enabled, the nomballoc
option is included in /proc/mounts.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Acked-by: Eric Sandeen <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 136d095..91a11ec 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -736,6 +736,8 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",nobh");
if (!test_opt(sb, EXTENTS))
seq_puts(seq, ",noextents");
+   if (!test_opt(sb, MBALLOC))
+   seq_puts(seq, ",nomballoc");
if (test_opt(sb, I_VERSION))
seq_puts(seq, ",i_version");
 
@@ -1902,6 +1904,11 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 * User -o noextents to turn it off
 */
set_opt(sbi->s_mount_opt, EXTENTS);
+   /*
+* turn on mballoc feature by default in ext4 filesystem
+* User -o nomballoc to turn it off
+*/
+   set_opt(sbi->s_mount_opt, MBALLOC);
 
if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
NULL, 0))
-- 
1.5.4.rc3.31.g1271-dirty
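
A rough user-space sketch of the "default on, report only when turned
off" convention this patch follows. The option bits and function names
below are hypothetical and merely mirror the patch; they are not the
kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option bits; the names only mirror the patch. */
#define OPT_EXTENTS 0x1u
#define OPT_MBALLOC 0x2u

/* Defaults are set first, so a later "no..." option can clear them. */
static unsigned int default_opts(void)
{
	return OPT_EXTENTS | OPT_MBALLOC;
}

/*
 * Write only the options that differ from the default into buf.
 * Returns the number of characters written (excluding the terminator).
 */
static int show_options_sketch(unsigned int opts, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	return snprintf(buf, len, "%s%s",
			(opts & OPT_EXTENTS) ? "" : ",noextents",
			(opts & OPT_MBALLOC) ? "" : ",nomballoc");
}
```

Because the default is set before the option string is parsed, the
listing stays empty unless the user explicitly disabled a feature.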



[PATCH 28/49] ext4: remove unused code from ext4_find_entry()

2008-01-21 Thread Theodore Ts'o
From: Mariusz Kozlowski <[EMAIL PROTECTED]>

The unused code found in ext3_find_entry() is also present (and still
unused) in the ext4_find_entry() code. This patch removes it.

Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/namei.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index fb673b1..67b6d8a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -861,14 +861,10 @@ static struct buffer_head * ext4_find_entry (struct dentry *dentry,
int i, err;
struct inode *dir = dentry->d_parent->d_inode;
int namelen;
-   const u8 *name;
-   unsigned blocksize;
 
*res_dir = NULL;
sb = dir->i_sb;
-   blocksize = sb->s_blocksize;
namelen = dentry->d_name.len;
-   name = dentry->d_name.name;
if (namelen > EXT4_NAME_LEN)
return NULL;
if (is_dx(dir)) {
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 45/49] ext4: Use the ext4_ext_actual_len() helper function

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

ext4 uses the high bit of the extent length to encode whether the extent
is initialized or not. The helper function ext4_ext_get_actual_len should
be used to get the actual length of the extent.

This addresses the kernel bug documented here:
 http://bugzilla.kernel.org/show_bug.cgi?id=9732

kernel BUG at fs/ext4/extents.c:1056!

Call Trace:
[] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1
[] lock_release_holdtime+0x27/0x49
[] _spin_unlock+0x17/0x20
[] :jbd2:start_this_handle+0x4e0/0x4fe
[] :ext4dev:ext4_fallocate+0x175/0x39a
[] lock_release_holdtime+0x27/0x49
[] __lock_acquire+0x4e7/0xc4d
[] lock_release_holdtime+0x27/0x49
[] sys_fallocate+0xe4/0x10d
[] tracesys+0xd5/0xda

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c |   24 +---
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 13e3e8c..b6b9ec7 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1029,7 +1029,7 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
 {
struct ext4_extent_idx *ix;
struct ext4_extent *ex;
-   int depth;
+   int depth, ee_len;
 
BUG_ON(path == NULL);
depth = path->p_depth;
@@ -1043,6 +1043,7 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
 * first one in the file */
 
ex = path[depth].p_ext;
+   ee_len = ext4_ext_get_actual_len(ex);
if (*logical < le32_to_cpu(ex->ee_block)) {
BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
while (--depth >= 0) {
@@ -1052,10 +1053,10 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
return 0;
}
 
-   BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len));
+   BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
 
-   *logical = le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - 1;
-   *phys = ext_pblock(ex) + le16_to_cpu(ex->ee_len) - 1;
+   *logical = le32_to_cpu(ex->ee_block) + ee_len - 1;
+   *phys = ext_pblock(ex) + ee_len - 1;
return 0;
 }
 
@@ -1075,7 +1076,7 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
struct ext4_extent_idx *ix;
struct ext4_extent *ex;
ext4_fsblk_t block;
-   int depth;
+   int depth, ee_len;
 
BUG_ON(path == NULL);
depth = path->p_depth;
@@ -1089,6 +1090,7 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
 * first one in the file */
 
ex = path[depth].p_ext;
+   ee_len = ext4_ext_get_actual_len(ex);
if (*logical < le32_to_cpu(ex->ee_block)) {
BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
while (--depth >= 0) {
@@ -1100,7 +1102,7 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
return 0;
}
 
-   BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len));
+   BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
 
if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) {
/* next allocated block in this leaf */
@@ -1316,7 +1318,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
if (ext1_ee_len + ext2_ee_len > max_len)
return 0;
 #ifdef AGGRESSIVE_TEST
-   if (le16_to_cpu(ex1->ee_len) >= 4)
+   if (ext1_ee_len >= 4)
return 0;
 #endif
 
@@ -2313,7 +2315,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
   - le32_to_cpu(newex.ee_block)
   + ext_pblock(&newex);
/* number of remaining blocks in the extent */
-   allocated = le16_to_cpu(newex.ee_len) -
+   allocated = ext4_ext_get_actual_len(&newex) -
(iblock - le32_to_cpu(newex.ee_block));
goto out;
} else {
@@ -2429,7 +2431,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
newex.ee_len = cpu_to_le16(max_blocks);
err = ext4_ext_check_overlap(inode, &newex, path);
if (err)
-   allocated = le16_to_cpu(newex.ee_len);
+   allocated = ext4_ext_get_actual_len(&newex);
else
allocated = max_blocks;
 
@@ -2461,7 +2463,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 * but otherwise we'd need to call it every free() */
ext4_mb_discard_inode_preallocations(inode);
ext4_free_blocks(handle, inode, ext_pblock(&newex),
-   le16_to_cpu(newex.ee_len), 0);
+   ext4_ext_get_actual_len(&newex), 0);
g
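
To illustrate why the raw ee_len must not be used in arithmetic, here is
a minimal user-space sketch of the encoding the commit message
describes. The names INIT_MAX_LEN, is_uninitialized(), actual_len() and
last_block() are hypothetical, and the exact boundary (raw values above
0x8000 meaning "uninitialized") is an assumption about the convention,
not something this thread spells out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed convention: raw lengths up to INIT_MAX_LEN describe
 * initialized extents; larger raw values have the high bit set and mark
 * the extent as uninitialized. The names here are illustrative only.
 */
#define INIT_MAX_LEN 0x8000u

static int is_uninitialized(uint16_t raw_len)
{
	return raw_len > INIT_MAX_LEN;
}

/* Strip the flag so the result is a usable block count. */
static unsigned int actual_len(uint16_t raw_len)
{
	return is_uninitialized(raw_len) ? raw_len - INIT_MAX_LEN : raw_len;
}

/* Last logical block covered by an extent starting at `start`. */
static uint32_t last_block(uint32_t start, uint16_t raw_len)
{
	assert(actual_len(raw_len) > 0);
	return start + actual_len(raw_len) - 1;
}
```

For an uninitialized 5-block extent starting at block 100, last_block()
yields 104, whereas adding the raw length would overshoot by 0x8000
blocks, which is the kind of miscalculation the BUG_ON checks in the
patch would trip over.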

Re: [PATCH][RESEND] sh: termios ioctl definitions

2008-01-21 Thread Andrew Morton
On Sat, 19 Jan 2008 16:05:06 + Alan Cox <[EMAIL PROTECTED]> wrote:

> These ports are holding up progress and now have been for months. Do the
> job for them.

Never understood the dependencies here.  Do these two patches depend on
something else which is only-in-mm?

Also, I've been uncertainly sitting on
http://userweb.kernel.org/~akpm/mmotm/broken-out/tty-fix-tty-network-driver-interactions-with-tcget-tcset-calls-x86-fix.patch
for some time.  Is it ready to go into git-x86?

Thanks.


[PATCH 09/49] ext4: Rename i_file_acl to i_file_acl_lo

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

Rename i_file_acl to i_file_acl_lo. This helps
in finding bugs where we use i_file_acl instead
of the combined i_file_acl_lo and i_file_acl_high.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
 fs/ext4/inode.c |4 ++--
 include/linux/ext4_fs.h |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 76ceba2..7bcec18 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2718,7 +2718,7 @@ void ext4_read_inode(struct inode * inode)
}
inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
ei->i_flags = le32_to_cpu(raw_inode->i_flags);
-   ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl);
+   ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl_lo);
if (EXT4_SB(inode->i_sb)->s_es->s_creator_os !=
cpu_to_le32(EXT4_OS_HURD))
ei->i_file_acl |=
@@ -2866,7 +2866,7 @@ static int ext4_do_update_inode(handle_t *handle,
cpu_to_le32(EXT4_OS_HURD))
raw_inode->i_file_acl_high =
cpu_to_le16(ei->i_file_acl >> 32);
-   raw_inode->i_file_acl = cpu_to_le32(ei->i_file_acl);
+   raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
if (!S_ISREG(inode->i_mode)) {
raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
} else {
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 1a27433..6894f36 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -297,7 +297,7 @@ struct ext4_inode {
} osd1; /* OS dependent 1 */
__le32  i_block[EXT4_N_BLOCKS];/* Pointers to blocks */
__le32  i_generation;   /* File version (for NFS) */
-   __le32  i_file_acl; /* File ACL */
+   __le32  i_file_acl_lo;  /* File ACL */
__le32  i_dir_acl;  /* Directory ACL */
__le32  i_obso_faddr;   /* Obsoleted fragment address */
union {
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 04/49] ext4 extents: remove unneeded casts

2008-01-21 Thread Theodore Ts'o
From: Eric Sandeen <[EMAIL PROTECTED]>

There are many casts in extents.c which are not needed,
as the variables are already the type of the cast, or
are being promoted for no particular reason in printk's.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c |   49 ++---
 1 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 19d8059..6853722 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -374,7 +374,7 @@ ext4_ext_binsearch_idx(struct inode *inode,
struct ext4_extent_idx *r, *l, *m;
 
 
-   ext_debug("binsearch for %lu(idx):  ", (unsigned long)block);
+   ext_debug("binsearch for %u(idx):  ", block);
 
l = EXT_FIRST_INDEX(eh) + 1;
r = EXT_LAST_INDEX(eh);
@@ -440,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode,
return;
}
 
-   ext_debug("binsearch for %lu:  ", (unsigned long)block);
+   ext_debug("binsearch for %u:  ", block);
 
l = EXT_FIRST_EXTENT(eh) + 1;
r = EXT_LAST_EXTENT(eh);
@@ -766,7 +766,7 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
while (k--) {
oldblock = newblock;
newblock = ablocks[--a];
-   bh = sb_getblk(inode->i_sb, (ext4_fsblk_t)newblock);
+   bh = sb_getblk(inode->i_sb, newblock);
if (!bh) {
err = -EIO;
goto cleanup;
@@ -786,9 +786,8 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
fidx->ei_block = border;
ext4_idx_store_pblock(fidx, oldblock);
 
-   ext_debug("int.index at %d (block %llu): %lu -> %llu\n", i,
-   newblock, (unsigned long) le32_to_cpu(border),
-   oldblock);
+   ext_debug("int.index at %d (block %llu): %u -> %llu\n",
+   i, newblock, le32_to_cpu(border), oldblock);
/* copy indexes */
m = 0;
path[i].p_idx++;
@@ -1476,10 +1475,10 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
} else if (block < le32_to_cpu(ex->ee_block)) {
lblock = block;
len = le32_to_cpu(ex->ee_block) - block;
-   ext_debug("cache gap(before): %lu [%lu:%lu]",
-   (unsigned long) block,
-   (unsigned long) le32_to_cpu(ex->ee_block),
-   (unsigned long) ext4_ext_get_actual_len(ex));
+   ext_debug("cache gap(before): %u [%u:%u]",
+   block,
+   le32_to_cpu(ex->ee_block),
+ext4_ext_get_actual_len(ex));
} else if (block >= le32_to_cpu(ex->ee_block)
+ ext4_ext_get_actual_len(ex)) {
ext4_lblk_t next;
@@ -1487,10 +1486,10 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
+ ext4_ext_get_actual_len(ex);
 
next = ext4_ext_next_allocated_block(path);
-   ext_debug("cache gap(after): [%lu:%lu] %lu",
-   (unsigned long) le32_to_cpu(ex->ee_block),
-   (unsigned long) ext4_ext_get_actual_len(ex),
-   (unsigned long) block);
+   ext_debug("cache gap(after): [%u:%u] %u",
+   le32_to_cpu(ex->ee_block),
+   ext4_ext_get_actual_len(ex),
+   block);
BUG_ON(next == lblock);
len = next - lblock;
} else {
@@ -1498,7 +1497,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
BUG();
}
 
-   ext_debug(" -> %lu:%lu\n", (unsigned long) lblock, len);
+   ext_debug(" -> %u:%lu\n", lblock, len);
ext4_ext_put_in_cache(inode, lblock, len, 0, EXT4_EXT_CACHE_GAP);
 }
 
@@ -1520,11 +1519,9 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block,
ex->ee_block = cpu_to_le32(cex->ec_block);
ext4_ext_store_pblock(ex, cex->ec_start);
ex->ee_len = cpu_to_le16(cex->ec_len);
-   ext_debug("%lu cached by %lu:%lu:%llu\n",
-   (unsigned long) block,
-   (unsigned long) cex->ec_block,
-   (unsigned long) cex->ec_len,
-   cex->ec_start);
+   ext_debug("%u cached by %u:%u:%llu\n",
+   block,
+   cex->ec_block, cex->ec_len, cex->ec_start);
return cex->ec_type;
}
 
@@ -2145,9 +2142,8 @@ int ext4_ext_get_blocks(handl

[PATCH 47/49] jbd2: Mark jbd2 slabs as SLAB_TEMPORARY

2008-01-21 Thread Theodore Ts'o
From: Mingming Cao <[EMAIL PROTECTED]>

This patch marks slab allocations by jbd2 as short-lived in support of
Mel Gorman's "Group short-lived and reclaimable kernel allocations"
patch.  (Ported from similar changes made to fs/jbd/journal.c and
fs/jbd/revoke.c in Mel's patch.)

Cc: Mel Gorman <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/jbd2/journal.c |4 ++--
 fs/jbd2/revoke.c  |6 --
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index f8b0f8c..8301e8d 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1975,7 +1975,7 @@ static int journal_init_jbd2_journal_head_cache(void)
jbd2_journal_head_cache = kmem_cache_create("jbd2_journal_head",
sizeof(struct journal_head),
0,  /* offset */
-   0,  /* flags */
+   SLAB_TEMPORARY, /* flags */
NULL);  /* ctor */
retval = 0;
if (jbd2_journal_head_cache == 0) {
@@ -2271,7 +2271,7 @@ static int __init journal_init_handle_cache(void)
jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
sizeof(handle_t),
0,  /* offset */
-   0,  /* flags */
+   SLAB_TEMPORARY, /* flags */
NULL);  /* ctor */
if (jbd2_handle_cache == NULL) {
printk(KERN_EMERG "JBD: failed to create handle cache\n");
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 3595fd4..df36f42 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -171,13 +171,15 @@ int __init jbd2_journal_init_revoke_caches(void)
 {
jbd2_revoke_record_cache = kmem_cache_create("jbd2_revoke_record",
   sizeof(struct jbd2_revoke_record_s),
-  0, SLAB_HWCACHE_ALIGN, NULL);
+  0,
+  SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
+  NULL);
if (jbd2_revoke_record_cache == 0)
return -ENOMEM;
 
jbd2_revoke_table_cache = kmem_cache_create("jbd2_revoke_table",
   sizeof(struct jbd2_revoke_table_s),
-  0, 0, NULL);
+  0, SLAB_TEMPORARY, NULL);
if (jbd2_revoke_table_cache == 0) {
kmem_cache_destroy(jbd2_revoke_record_cache);
jbd2_revoke_record_cache = NULL;
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 25/49] jbd2: Remove printk from J_ASSERT to preserve registers during BUG

2008-01-21 Thread Theodore Ts'o
From: Chris Snook <[EMAIL PROTECTED]>

Signed-off-by: Chris Snook <[EMAIL PROTECTED]>
Cc: "Stephen C. Tweedie" <[EMAIL PROTECTED]>
Cc: Theodore Ts'o <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 include/linux/jbd2.h |   16 +---
 1 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 06ef114..d5f7cff 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -256,17 +256,7 @@ typedef struct journal_superblock_s
 #include 
 #include 
 
-#define JBD2_ASSERTIONS
-#ifdef JBD2_ASSERTIONS
-#define J_ASSERT(assert)   \
-do {   \
-   if (!(assert)) {\
-   printk (KERN_EMERG  \
-   "Assertion failure in %s() at %s:%d: \"%s\"\n", \
-   __FUNCTION__, __FILE__, __LINE__, # assert);\
-   BUG();  \
-   }   \
-} while (0)
+#define J_ASSERT(assert)   BUG_ON(!(assert))
 
 #if defined(CONFIG_BUFFER_DEBUG)
 void buffer_assertion_failure(struct buffer_head *bh);
@@ -282,10 +272,6 @@ void buffer_assertion_failure(struct buffer_head *bh);
 #define J_ASSERT_JH(jh, expr)  J_ASSERT(expr)
 #endif
 
-#else
-#define J_ASSERT(assert)   do { } while (0)
-#endif /* JBD2_ASSERTIONS */
-
 #if defined(JBD2_PARANOID_IOFAIL)
 #define J_EXPECT(expr, why...) J_ASSERT(expr)
 #define J_EXPECT_BH(bh, expr, why...)  J_ASSERT_BH(bh, expr)
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 39/49] ext4: Add ext4_find_next_bit()

2008-01-21 Thread Theodore Ts'o
From: Aneesh Kumar K.V <[EMAIL PROTECTED]>

This function is used by the ext4 multi block allocator patches.

Also add generic_find_next_le_bit

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---
 include/asm-arm/bitops.h |2 +
 include/asm-generic/bitops/ext2-non-atomic.h |2 +
 include/asm-generic/bitops/le.h  |4 ++
 include/asm-m68k/bitops.h|2 +
 include/asm-m68knommu/bitops.h   |2 +
 include/asm-powerpc/bitops.h |4 ++
 include/asm-s390/bitops.h|2 +
 include/linux/ext4_fs.h  |1 +
 lib/find_next_bit.c  |   43 ++
 9 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/include/asm-arm/bitops.h b/include/asm-arm/bitops.h
index 47a6b08..5c60bfc 100644
--- a/include/asm-arm/bitops.h
+++ b/include/asm-arm/bitops.h
@@ -310,6 +310,8 @@ static inline int constant_fls(int x)
_find_first_zero_bit_le(p,sz)
 #define ext2_find_next_zero_bit(p,sz,off)  \
_find_next_zero_bit_le(p,sz,off)
+#define ext2_find_next_bit(p, sz, off) \
+   _find_next_bit_le(p, sz, off)
 
 /*
  * Minix is defined to use little-endian byte ordering.
diff --git a/include/asm-generic/bitops/ext2-non-atomic.h b/include/asm-generic/bitops/ext2-non-atomic.h
index 1697404..63cf822 100644
--- a/include/asm-generic/bitops/ext2-non-atomic.h
+++ b/include/asm-generic/bitops/ext2-non-atomic.h
@@ -14,5 +14,7 @@
generic_find_first_zero_le_bit((unsigned long *)(addr), (size))
 #define ext2_find_next_zero_bit(addr, size, off) \
generic_find_next_zero_le_bit((unsigned long *)(addr), (size), (off))
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 
 #endif /* _ASM_GENERIC_BITOPS_EXT2_NON_ATOMIC_H_ */
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index b9c7e5d..80e3bf1 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -20,6 +20,8 @@
 #define generic___test_and_clear_le_bit(nr, addr) __test_and_clear_bit(nr, addr)
 
 #define generic_find_next_zero_le_bit(addr, size, offset) find_next_zero_bit(addr, size, offset)
+#define generic_find_next_le_bit(addr, size, offset) \
+   find_next_bit(addr, size, offset)
 
 #elif defined(__BIG_ENDIAN)
 
@@ -42,6 +44,8 @@
 
 extern unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset);
+extern unsigned long generic_find_next_le_bit(const unsigned long *addr,
+   unsigned long size, unsigned long offset);
 
 #else
 #error "Please fix "
diff --git a/include/asm-m68k/bitops.h b/include/asm-m68k/bitops.h
index 2976b5d..83d1f28 100644
--- a/include/asm-m68k/bitops.h
+++ b/include/asm-m68k/bitops.h
@@ -410,6 +410,8 @@ static inline int ext2_find_next_zero_bit(const void *vaddr, unsigned size,
res = ext2_find_first_zero_bit (p, size - 32 * (p - addr));
return (p - addr) * 32 + res;
 }
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 
 #endif /* __KERNEL__ */
 
diff --git a/include/asm-m68knommu/bitops.h b/include/asm-m68knommu/bitops.h
index f8dfb7b..f43afe1 100644
--- a/include/asm-m68knommu/bitops.h
+++ b/include/asm-m68knommu/bitops.h
@@ -294,6 +294,8 @@ found_middle:
return result + ffz(__swab32(tmp));
 }
 
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)(addr), (size), (off))
 #include 
 
 #endif /* __KERNEL__ */
diff --git a/include/asm-powerpc/bitops.h b/include/asm-powerpc/bitops.h
index 733b4af..220d9a7 100644
--- a/include/asm-powerpc/bitops.h
+++ b/include/asm-powerpc/bitops.h
@@ -359,6 +359,8 @@ static __inline__ int test_le_bit(unsigned long nr,
 unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
unsigned long size, unsigned long offset);
 
+unsigned long generic_find_next_le_bit(const unsigned long *addr,
+   unsigned long size, unsigned long offset);
 /* Bitmap functions for the ext2 filesystem */
 
 #define ext2_set_bit(nr,addr) \
@@ -378,6 +380,8 @@ unsigned long generic_find_next_zero_le_bit(const unsigned long *addr,
 #define ext2_find_next_zero_bit(addr, size, off) \
generic_find_next_zero_le_bit((unsigned long*)addr, size, off)
 
+#define ext2_find_next_bit(addr, size, off) \
+   generic_find_next_le_bit((unsigned long *)addr, size, off)
 /* Bitmap functions for the minix filesystem.  */
 
 #define minix_test_and_set_bit(nr,addr) \
diff --git a/include/asm-s390/bitops.h b/include/asm-s390/bitops.h
index 34d9a63..dba6fec 100644
--- a/include/asm-s390/bitops.h
+++ b/include/asm-s390/bitops.
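
For reference, the behaviour expected of a little-endian "find next set
bit" helper can be modelled byte by byte. This is a deliberately naive
sketch, not the implementation added to lib/find_next_bit.c by this
patch; find_next_le_bit_sketch() is a hypothetical name. It assumes bit
n lives in byte n / 8 at position n % 8, and that `size` is returned
when no set bit is found.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first set bit at or after `offset`, or `size`
 * if there is none. Indexing by byte makes the result independent of
 * the host's word size and byte order.
 */
static unsigned long find_next_le_bit_sketch(const unsigned char *map,
					     unsigned long size,
					     unsigned long offset)
{
	unsigned long n;

	assert(map != NULL);
	for (n = offset; n < size; n++)
		if (map[n >> 3] & (1u << (n & 7)))
			return n;
	return size;
}
```

A word-at-a-time version is faster, but on big-endian hosts it has to
byte-swap each word first, which is presumably why the patch adds a
dedicated generic_find_next_le_bit() for that case.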

[PATCH 20/49] ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*)

2008-01-21 Thread Theodore Ts'o
From: Adrian Bunk <[EMAIL PROTECTED]>

Based on a report by Robert P. J. Day.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0931831..1484a08 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -706,7 +706,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",debug");
if (test_opt(sb, OLDALLOC))
seq_puts(seq, ",oldalloc");
-#ifdef CONFIG_EXT4_FS_XATTR
+#ifdef CONFIG_EXT4DEV_FS_XATTR
if (test_opt(sb, XATTR_USER))
seq_puts(seq, ",user_xattr");
if (!test_opt(sb, XATTR_USER) &&
@@ -714,7 +714,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_puts(seq, ",nouser_xattr");
}
 #endif
-#ifdef CONFIG_EXT4_FS_POSIX_ACL
+#ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
if (test_opt(sb, POSIX_ACL))
seq_puts(seq, ",acl");
if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT4_DEFM_ACL))
-- 
1.5.4.rc3.31.g1271-dirty



[PATCH 44/49] ext4: fix uninitialized extent splitting error

2008-01-21 Thread Theodore Ts'o
From: Dmitry Monakhov <[EMAIL PROTECTED]>

Fix bug reported by Dmitry Monakhov caused by lost error code

Testcase:

blksize = 0x1000;
fd = open(argv[1], O_RDWR|O_CREAT, 0700);
unsigned long long sz = 0x1000UL;
/* allocate a big chunk of blocks */
syscall(__NR_fallocate, fd, 0, 0UL, sz);

/* grab all other available filesystem space */
tfd = open("tmp", O_RDWR|O_CREAT|O_DIRECT, 0700);
while (write(tfd, buf, 4096) > 0)
	; /* loop until ENOSPC */
fsync(fd); /* just in case */
while (pos < sz) {
/* each seek + write operation splits the uninitialized extent
into three extents. Splitting may require a new extent allocation,
which will probably fail because of ENOSPC */

lseek(fd, blksize*2 - 1, SEEK_CUR);
if ((ret = write(fd, "a", 1)) != 1)
exit(1);
pos += blksize * 2;
}

Signed-off-by: Dmitry Monakhov <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8cf5545..13e3e8c 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2373,9 +2373,10 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ret = ext4_ext_convert_to_initialized(handle, inode,
path, iblock,
max_blocks);
-   if (ret <= 0)
+   if (ret <= 0) {
+   err = ret;
goto out2;
-   else
+   } else
allocated = ret;
goto outnew;
}
-- 
1.5.4.rc3.31.g1271-dirty
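
The "lost error code" problem can be reproduced in miniature. The sketch
below is a user-space model with hypothetical names (convert_sketch(),
get_blocks_sketch()), and it assumes the function ends by returning the
error if one was recorded and the block count otherwise; that return
convention is not visible in the hunk above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical conversion step: fails when there is not enough space. */
static int convert_sketch(int wanted, int free_blocks)
{
	return free_blocks >= wanted ? wanted : -ENOSPC;
}

/* Returns the number of blocks mapped, or a negative error code. */
static int get_blocks_sketch(int wanted, int free_blocks)
{
	int err = 0;
	int allocated = 0;
	int ret;

	assert(wanted > 0);
	ret = convert_sketch(wanted, free_blocks);
	if (ret <= 0) {
		err = ret;	/* the fix: remember why we are bailing out */
		goto out;
	}
	allocated = ret;
out:
	return err ? err : allocated;
}
```

If the `err = ret` assignment is dropped, the failing path returns 0
blocks with no error, so the caller cannot tell ENOSPC from "nothing to
do".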



[PATCH 48/49] jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup

2008-01-21 Thread Theodore Ts'o
From: Mingming Cao <[EMAIL PROTECTED]>

While "every 5 seconds" doesn't sound like a problem, there can be many
of these (and these timers do add up over all the kernel).  The "5
second" wakeup isn't really timing sensitive; in addition, even with
rounding it'll still happen every 5 seconds (with the exception of the
very first time, which is likely to be rounded up to somewhere closer
to 6 seconds).

(Ported from similar JBD patch made by Arjan van de Ven to
fs/jbd/transaction.c)

Cc: Arjan van de Ven <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/jbd2/transaction.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 70b3199..0c8adab 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -54,7 +54,7 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
spin_lock_init(&transaction->t_handle_lock);
 
/* Set up the commit timer for the new transaction. */
-   journal->j_commit_timer.expires = transaction->t_expires;
+   journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
add_timer(&journal->j_commit_timer);
 
J_ASSERT(journal->j_running_transaction == NULL);
-- 
1.5.4.rc3.31.g1271-dirty
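
The effect described in the commit message can be modelled with plain
integer arithmetic. This is a simplified illustration only: it rounds a
tick count up to the next whole second and does not reproduce the
kernel's round_jiffies() algorithm. round_up_to_second() and the `hz`
parameter are hypothetical.

```c
#include <assert.h>

/*
 * Simplified model: round a tick count up to the next whole second.
 * `hz` is the number of ticks per second. It shows why a rounded expiry
 * can land up to one second later than requested.
 */
static unsigned long round_up_to_second(unsigned long ticks,
					unsigned long hz)
{
	unsigned long rem;

	assert(hz > 0);
	rem = ticks % hz;
	return rem ? ticks + (hz - rem) : ticks;
}
```

With hz = 1000, an expiry of 5123 becomes 6000, matching the "closer to
6 seconds" remark for the first wakeup. Timers that share a second
boundary can then be serviced by a single wakeup.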



[PATCH 01/49] ext4: Support large blocksize up to PAGESIZE

2008-01-21 Thread Theodore Ts'o
From: Takashi Sato <[EMAIL PROTECTED]>

This patch set supports large block sizes (>4k, <=64k) in ext4 by
just enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext4 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is to represent a 64k rec_len
with an impossible value like rec_len = 0x to handle this.

The Patch-set consists of the following 2 patches.
  [1/2]  ext4: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext4: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

Now a ppc64 box with 64k pages running this patch set can create a 64k
block size ext4dev filesystem and handle an empty directory block.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/super.c |5 +
 include/linux/ext4_fs.h |4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1ca0f54..ab7010d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1624,6 +1624,11 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
goto out_fail;
}
 
+   if (!sb_set_blocksize(sb, blocksize)) {
+   printk(KERN_ERR "EXT4-fs: bad blocksize %d.\n", blocksize);
+   goto out_fail;
+   }
+
/*
 * The ext4 superblock will not be buffer aligned for other than 1kB
 * block sizes.  We need to calculate the offset from buffer start.
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 97dd409..dfe4487 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -73,8 +73,8 @@
  * Macro-instructions used to manage several block sizes
  */
 #define EXT4_MIN_BLOCK_SIZE 1024
-#define EXT4_MAX_BLOCK_SIZE 4096
-#define EXT4_MIN_BLOCK_LOG_SIZE  10
+#define EXT4_MAX_BLOCK_SIZE 65536
+#define EXT4_MIN_BLOCK_LOG_SIZE 10
 #ifdef __KERNEL__
 # define EXT4_BLOCK_SIZE(s) ((s)->s_blocksize)
 #else
-- 
1.5.4.rc3.31.g1271-dirty
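
The rec_len overflow mentioned in the description is easy to demonstrate
in isolation. The helper below is a hypothetical user-space sketch; it
only shows the 16-bit truncation and does not implement the special
encoding the second patch of the set proposes.

```c
#include <assert.h>
#include <stdint.h>

/* Store a directory record length in a 16-bit field. */
static uint16_t store_rec_len(uint32_t len)
{
	assert(len <= 65536u);
	return (uint16_t)len;	/* 65536 wraps to 0 */
}
```

A record spanning a whole 64kB block therefore reads back as length 0,
which directory-walking code would treat as corruption.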



[PATCH 40/49] ext4: Add new functions for searching extent tree

2008-01-21 Thread Theodore Ts'o
From: Alex Tomas <[EMAIL PROTECTED]>

Add the functions ext4_ext_search_left() and ext4_ext_search_right(),
which are used by mballoc during ext4_ext_get_blocks to decide whether
to merge extent information.

Signed-off-by: Alex Tomas <[EMAIL PROTECTED]>
Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Johann Lombardi <[EMAIL PROTECTED]>
Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
Signed-off-by: "Theodore Ts'o" <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c   |  142 +++
 include/linux/ext4_fs_extents.h |4 +
 2 files changed, 146 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 03d1bbb..a60227c 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1017,6 +1017,148 @@ out:
 }
 
 /*
+ * search the closest allocated block to the left for *logical
+ * and returns it at @logical + it's physical address at @phys
+ * if *logical is the smallest allocated block, the function
+ * returns 0 at @phys
+ * return value contains 0 (success) or error code
+ */
+int
+ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
+   ext4_lblk_t *logical, ext4_fsblk_t *phys)
+{
+   struct ext4_extent_idx *ix;
+   struct ext4_extent *ex;
+   int depth;
+
+   BUG_ON(path == NULL);
+   depth = path->p_depth;
+   *phys = 0;
+
+   if (depth == 0 && path->p_ext == NULL)
+   return 0;
+
+   /* usually extent in the path covers blocks smaller
+* than *logical, but it can be that extent is the
+* first one in the file */
+
+   ex = path[depth].p_ext;
+   if (*logical < le32_to_cpu(ex->ee_block)) {
+   BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+   while (--depth >= 0) {
+   ix = path[depth].p_idx;
+   BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+   }
+   return 0;
+   }
+
+   BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len));
+
+   *logical = le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - 1;
+   *phys = ext_pblock(ex) + le16_to_cpu(ex->ee_len) - 1;
+   return 0;
+}
+
+/*
+ * search the closest allocated block to the right for *logical
+ * and returns it at @logical + its physical address at @phys
+ * if *logical is the largest allocated block, the function
+ * returns 0 at @phys
+ * return value contains 0 (success) or error code
+ */
+int
+ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
+   ext4_lblk_t *logical, ext4_fsblk_t *phys)
+{
+   struct buffer_head *bh = NULL;
+   struct ext4_extent_header *eh;
+   struct ext4_extent_idx *ix;
+   struct ext4_extent *ex;
+   ext4_fsblk_t block;
+   int depth;
+
+   BUG_ON(path == NULL);
+   depth = path->p_depth;
+   *phys = 0;
+
+   if (depth == 0 && path->p_ext == NULL)
+   return 0;
+
+   /* usually extent in the path covers blocks smaller
+* than *logical, but it can be that extent is the
+* first one in the file */
+
+   ex = path[depth].p_ext;
+   if (*logical < le32_to_cpu(ex->ee_block)) {
+   BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+   while (--depth >= 0) {
+   ix = path[depth].p_idx;
+   BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+   }
+   *logical = le32_to_cpu(ex->ee_block);
+   *phys = ext_pblock(ex);
+   return 0;
+   }
+
+   BUG_ON(*logical < le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len));
+
+   if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) {
+   /* next allocated block in this leaf */
+   ex++;
+   *logical = le32_to_cpu(ex->ee_block);
+   *phys = ext_pblock(ex);
+   return 0;
+   }
+
+   /* go up and search for index to the right */
+   while (--depth >= 0) {
+   ix = path[depth].p_idx;
+   if (ix != EXT_LAST_INDEX(path[depth].p_hdr))
+   break;
+   }
+
+   if (depth < 0) {
+   /* we've gone up to the root and
+* found no index to the right */
+   return 0;
+   }
+
+   /* we've found index to the right, let's
+* follow it and find the closest allocated
+* block to the right */
+   ix++;
+   block = idx_pblock(ix);
+   while (++depth < path->p_depth) {
+   bh = sb_bread(inode->i_sb, block);
+   if (bh == NULL)
+   return -EIO;
+   eh = ext_block_hdr(bh);
+   if (ext4_ext_check_header(inode, eh, depth)) {
+   brelse(bh);
+   return -EIO;
+   }
+   ix = EXT_FIRST_INDEX(eh);
+   block = idx_pblock(ix);
+  

[PATCH 15/49] ext4: store maxbytes for bitmapped files and return EFBIG as appropriate

2008-01-21 Thread Theodore Ts'o
From: Eric Sandeen <[EMAIL PROTECTED]>

Calculate & store the max offset for bitmapped files, and
catch too-large seeks, truncates, and writes in ext4, shortening
or rejecting as appropriate.
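
The "shortening or rejecting" rule for writes can be sketched in userspace as follows. This is a minimal sketch under stated assumptions: `clamp_write` is a hypothetical helper, not a function from the patch, and it compares against the remaining room rather than computing `pos + len`, which is a choice made here to sidestep overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide how many bytes of a write may proceed against a size limit.
 * Returns -EFBIG if the write starts beyond the limit, otherwise the
 * (possibly shortened) byte count.
 */
static int64_t clamp_write(int64_t pos, size_t len, int64_t maxbytes)
{
	assert(pos >= 0 && maxbytes >= 0);

	/* Starting past the limit: reject outright. */
	if (pos > maxbytes)
		return -EFBIG;

	/* Crossing the limit: shorten to the room that is left. */
	if ((uint64_t)len > (uint64_t)(maxbytes - pos))
		return maxbytes - pos;

	return (int64_t)len;
}
```

A write that starts exactly at the limit is shortened to zero bytes rather than rejected, mirroring the `pos > s_bitmap_maxbytes` test in the diff below.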

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
---
 fs/ext4/file.c |   19 ++-
 fs/ext4/inode.c|   16 +++-
 fs/ext4/super.c|1 +
 include/linux/ext4_fs_sb.h |1 +
 4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 1a81cd6..a6b2aa1 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -56,8 +56,25 @@ ext4_file_write(struct kiocb *iocb, const struct iovec *iov,
ssize_t ret;
int err;
 
-   ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
+   /*
+* If we have encountered a bitmap-format file, the size limit
+* is smaller than s_maxbytes, which is for extent-mapped files.
+*/
+
+   if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) {
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   size_t length = iov_length(iov, nr_segs);
 
+   if (pos > sbi->s_bitmap_maxbytes)
+   return -EFBIG;
+
+   if (pos + length > sbi->s_bitmap_maxbytes) {
+   nr_segs = iov_shorten((struct iovec *)iov, nr_segs,
+ sbi->s_bitmap_maxbytes - pos);
+   }
+   }
+
+   ret = generic_file_aio_write(iocb, iov, nr_segs, pos);
/*
 * Skip flushing if there was an error, or if nothing was written.
 */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9cf8572..eaace13 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -314,7 +314,10 @@ static int ext4_block_to_path(struct inode *inode,
offsets[n++] = i_block & (ptrs - 1);
final = ptrs;
} else {
-   ext4_warning(inode->i_sb, "ext4_block_to_path", "block > big");
+   ext4_warning(inode->i_sb, "ext4_block_to_path",
+   "block %u > max",
+   i_block + direct_blocks +
+   indirect_blocks + double_blocks);
}
if (boundary)
*boundary = final - 1 - (i_block & (ptrs - 1));
@@ -3092,6 +3095,17 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
ext4_journal_stop(handle);
}
 
+   if (attr->ia_valid & ATTR_SIZE) {
+   if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) {
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+
+   if (attr->ia_size > sbi->s_bitmap_maxbytes) {
+   error = -EFBIG;
+   goto err_out;
+   }
+   }
+   }
+
if (S_ISREG(inode->i_mode) &&
attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
handle_t *handle;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c79e46b..0931831 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1922,6 +1922,7 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
}
}
 
+   sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(sb->s_blocksize_bits);
sb->s_maxbytes = ext4_max_size(sb->s_blocksize_bits);
 
if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV) {
diff --git a/include/linux/ext4_fs_sb.h b/include/linux/ext4_fs_sb.h
index f15821c..38a47ec 100644
--- a/include/linux/ext4_fs_sb.h
+++ b/include/linux/ext4_fs_sb.h
@@ -38,6 +38,7 @@ struct ext4_sb_info {
ext4_group_t s_groups_count;/* Number of groups in the fs */
unsigned long s_overhead_last;  /* Last calculated overhead */
unsigned long s_blocks_last;/* Last seen block count */
+   loff_t s_bitmap_maxbytes;   /* max bytes for bitmap files */
struct buffer_head * s_sbh; /* Buffer containing the super block */
struct ext4_super_block * s_es; /* Pointer to the super block in the buffer */
struct buffer_head ** s_group_desc;
-- 
1.5.4.rc3.31.g1271-dirty



ext4 merge plans for 2.6.25

2008-01-21 Thread Theodore Ts'o
The following patches have been in the -mm tree for a while, and I
plan to push them to Linus when the 2.6.25 merge window opens.  With
this patch series, it is expected that ext4 format should be settling
down.  We still have delayed allocation and online defrag which aren't
quite ready to merge, but those shouldn't affect the on-disk format.

I don't expect any other on-disk format changes to show up after this
point, but I've been wrong before.  Any such changes would have to
have a Really Good Reason, though.  (No, Abhishek Rai's changes
wouldn't count as an on-disk change, since they change layout choices,
but not anything that e2fsck would actually care about.  We may try
merging those into ext4 and see how they play out in the -mm tree;
we'll see.)

- Ted

P.S.  Yes, the currently released e2fsprogs won't support all of these
format changes yet; again, ext4 shouldn't be deployed to production
systems yet, although we do salute those who are willing to be guinea
pigs and play with this code!  Never fear, I'll be working to get
e2fsprogs caught up Real Soon Now.

Adrian Bunk (1):
  ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*)

Alex Tomas (2):
  ext4: Add new functions for searching extent tree
  ext4: Add multi block allocator for ext4

Aneesh Kumar K.V (23):
  ext4: Introduce ext4_lblk_t
  ext4: Introduce ext4_update_*_feature
  ext4:  Fix sparse warnings.
  ext4: Rename i_file_acl to i_file_acl_lo
  ext4: Rename i_dir_acl to i_size_high
  ext4: Add support for 48 bit inode i_blocks.
  ext4: Support large files
  ext2: Fix the max file size for ext2 file system.
  ext3: Fix the max file size for ext3 file system.
  ext4: Return after ext4_error in case of failures
  ext4: Change the default behaviour on error
  Add buffer head related helper functions
  ext4: add block bitmap validation
  ext4: Check for the correct error return from
  ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.
  ext4: Convert truncate_mutex to read write semaphore.
  ext4: Take read lock during overwrite case.
  ext4: Add EXT4_IOC_MIGRATE ioctl
  ext4: Fix ext4_show_options to show the correct mount options.
  ext4: Add ext4_find_next_bit()
  ext4: Enable the multiblock allocator by default
  ext4: Check for return value from sb_set_blocksize
  ext4: Use the ext4_ext_actual_len() helper function

Avantika Mathur (2):
  ext4: add ext4_group_t, and change all group variables to this type.
  ext4: fixes block group number being set to a negative value

Chris Snook (1):
  jbd2: Remove printk from J_ASSERT to preserve registers during BUG

Coly Li (1):
  ext4: sync up block group descriptor with e2fsprogs.

Dmitry Monakhov (1):
  ext4: fix uniniatilized extent splitting error

Eric Sandeen (6):
  ext4 extents: remove unneeded casts
  ext4: different maxbytes functions for bitmap & extentfiles
  ext4: export iov_shorten from kernel for ext4's use
  ext4: store maxbytes for bitmapped  files and return EFBIG as appropriate
  ext4: fix oops on corrupted ext4 mount
  ext4: fix up EXT4FS_DEBUG builds

Girish Shilamkar (1):
  ext4: Add the journal checksum feature

Jan Kara (2):
  ext4: Avoid rec_len overflow with 64KB block size
  jbd2: Fix assertion failure in fs/jbd2/checkpoint.c

Jean Noel Cordenner (2):
  vfs: Add 64 bit i_version support
  ext4: Add inode version support in ext4

Johann Lombardi (1):
  jbd2: jbd2 stats through procfs

Mariusz Kozlowski (1):
  ext4: remove unused code from ext4_find_entry()

Mingming Cao (4):
  jbd2: add lockdep support
  jbd2: Mark jbd2 slabs as SLAB_TEMPORARY
  jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup
  jbd2: sparse pointer use of zero as null

Takashi Sato (1):
  ext4:  Support large blocksize up to PAGESIZE




[RFC/PATCH] dma: dma_{un}map_{single|sg}_attrs() interface

2008-01-21 Thread akepner

Here's a new interface for passing attributes to the dma mapping 
and unmapping routines. (I have patches that make use of the 
interface as well, but let's discuss this piece first.)

For ia64, new machvec entries replace the dma map/unmap interface, 
and the old interface is implemented in terms of the new. (All 
implementations other than ia64/sn2 ignore the new attributes.) 

For architectures other than ia64, the new interface is implemented 
in terms of the old (attributes are always ignored).
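
The "new in terms of the old" arrangement can be pictured with a small userspace sketch. Everything here is a stand-in: the type definitions are placeholders, `dma_map_single` is stubbed as an identity mapping, and the wrapper's argument order is assumed from the hwsw_* functions in the diff below rather than taken from the headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder types for this sketch; the real kernel types differ. */
typedef uint64_t dma_addr_t;
struct device { int id; };
struct dma_attrs { unsigned long flags; };	/* hypothetical layout */

/* Stub for the old interface: pretend bus address == CPU address. */
static dma_addr_t dma_map_single(struct device *dev, void *addr,
				 size_t size, int dir)
{
	assert(dev != NULL);
	(void)size;
	(void)dir;
	return (dma_addr_t)(uintptr_t)addr;
}

/*
 * New interface layered on the old one: the attributes are accepted
 * but deliberately ignored.
 */
static dma_addr_t dma_map_single_attrs(struct device *dev, void *addr,
				       size_t size, int dir,
				       struct dma_attrs *attrs)
{
	(void)attrs;
	return dma_map_single(dev, addr, size, dir);
}
```

With this layering, callers can switch to the attrs variant without any behaviour change on architectures that have no use for the attributes.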

Tested on hpzx1 and ia64/sn2 (IA64_GENERIC kernels) and on x86_64. 

Signed-off-by: Arthur Kepner <[EMAIL PROTECTED]>

-- 

 arch/ia64/hp/common/hwsw_iommu.c |   60 +-
 arch/ia64/hp/common/sba_iommu.c  |   62 +++
 arch/ia64/sn/pci/pci_dma.c   |   71 +++
 include/asm-ia64/dma-mapping.h   |   28 ++--
 include/asm-ia64/machvec.h   |   52 +-
 include/asm-ia64/machvec_hpzx1.h |   16 +++---
 include/asm-ia64/machvec_hpzx1_swiotlb.h |   16 +++---
 include/asm-ia64/machvec_sn2.h   |   16 +++---
 include/linux/dma-attrs.h|   33 ++
 include/linux/dma-mapping.h  |   33 ++
 lib/swiotlb.c|   50 ++---
 11 files changed, 301 insertions(+), 136 deletions(-)

diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index 94e5710..8cedd6c 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.c
@@ -20,10 +20,10 @@
 extern int swiotlb_late_init_with_default_size (size_t size);
 extern ia64_mv_dma_alloc_coherent  swiotlb_alloc_coherent;
 extern ia64_mv_dma_free_coherent   swiotlb_free_coherent;
-extern ia64_mv_dma_map_single  swiotlb_map_single;
-extern ia64_mv_dma_unmap_singleswiotlb_unmap_single;
-extern ia64_mv_dma_map_sg  swiotlb_map_sg;
-extern ia64_mv_dma_unmap_sgswiotlb_unmap_sg;
+extern ia64_mv_dma_map_single_attrsswiotlb_map_single_attrs;
+extern ia64_mv_dma_unmap_single_attrs  swiotlb_unmap_single_attrs;
+extern ia64_mv_dma_map_sg_attrsswiotlb_map_sg_attrs;
+extern ia64_mv_dma_unmap_sg_attrs  swiotlb_unmap_sg_attrs;
 extern ia64_mv_dma_supported   swiotlb_dma_supported;
 extern ia64_mv_dma_mapping_error   swiotlb_dma_mapping_error;
 
@@ -31,19 +31,19 @@ extern ia64_mv_dma_mapping_error swiotlb_dma_mapping_error;
 
 extern ia64_mv_dma_alloc_coherent  sba_alloc_coherent;
 extern ia64_mv_dma_free_coherent   sba_free_coherent;
-extern ia64_mv_dma_map_single  sba_map_single;
-extern ia64_mv_dma_unmap_singlesba_unmap_single;
-extern ia64_mv_dma_map_sg  sba_map_sg;
-extern ia64_mv_dma_unmap_sgsba_unmap_sg;
+extern ia64_mv_dma_map_single_attrssba_map_single_attrs;
+extern ia64_mv_dma_unmap_single_attrs  sba_unmap_single_attrs;
+extern ia64_mv_dma_map_sg_attrssba_map_sg_attrs;
+extern ia64_mv_dma_unmap_sg_attrs  sba_unmap_sg_attrs;
 extern ia64_mv_dma_supported   sba_dma_supported;
 extern ia64_mv_dma_mapping_error   sba_dma_mapping_error;
 
 #define hwiommu_alloc_coherent sba_alloc_coherent
 #define hwiommu_free_coherent  sba_free_coherent
-#define hwiommu_map_single sba_map_single
-#define hwiommu_unmap_single   sba_unmap_single
-#define hwiommu_map_sg sba_map_sg
-#define hwiommu_unmap_sg   sba_unmap_sg
+#define hwiommu_map_single_attrs   sba_map_single_attrs
+#define hwiommu_unmap_single_attrs sba_unmap_single_attrs
+#define hwiommu_map_sg_attrs   sba_map_sg_attrs
+#define hwiommu_unmap_sg_attrs sba_unmap_sg_attrs
 #define hwiommu_dma_supported  sba_dma_supported
 #define hwiommu_dma_mapping_error  sba_dma_mapping_error
 #define hwiommu_sync_single_for_cpumachvec_dma_sync_single
@@ -98,40 +98,44 @@ hwsw_free_coherent (struct device *dev, size_t size, void *vaddr, dma_addr_t dma
 }
 
 dma_addr_t
-hwsw_map_single (struct device *dev, void *addr, size_t size, int dir)
+hwsw_map_single_attrs (struct device *dev, void *addr, size_t size, int dir, 
+  struct dma_attrs *attrs)
 {
if (use_swiotlb(dev))
-   return swiotlb_map_single(dev, addr, size, dir);
+   return swiotlb_map_single_attrs(dev, addr, size, dir, attrs);
else
-   return hwiommu_map_single(dev, addr, size, dir);
+   return hwiommu_map_single_attrs(dev, addr, size, dir, attrs);
 }
 
 void
-hwsw_unmap_single (struct device *dev, dma_addr_t iova, size_t size, int dir)
+hwsw_unmap_single_attrs (struct device *dev, dma_addr_t iova, size_t size, 
+int dir, struct dma_attrs *attrs)
 {
if (use_swiotlb(dev))
-   return swiotlb_unmap_single(dev, iova, size

Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Anton Salikhmetov
2008/1/22, Linus Torvalds <[EMAIL PROTECTED]>:
>
>
> On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
> >
> >  /*
> > + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> > + * It will force a pagefault on the next write access.
> > + */
> > +static void vma_wrprotect(struct vm_area_struct *vma)
> > +{
> > + unsigned long addr;
> > +
> > + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> > + spinlock_t *ptl;
> > + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> > + pud_t *pud = pud_offset(pgd, addr);
> > + pmd_t *pmd = pmd_offset(pud, addr);
> > + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>
> This is extremely expensive over bigger areas, especially sparsely mapped
> ones (it does all the lookups for all four levels over and over and over
> again for each page).
>
> I think Peter Zijlstra posted a version that uses the regular kind of
> nested loop (with inline functions to keep the thing nice and clean),
> which gets rid of that.

Thanks for your feedback, Linus!

I will use Peter Zijlstra's version of such an operation in my next
patch series.

>
> [ The sad/funny part is that this is all how we *used* to do msync(), back
>   in the days: we're literally going back to the "pre-cleanup" logic. See
>   commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup"
>   for details ]
>
> Quite frankly, I really think you might be better off just doing a
>
> git revert 204ec841fbea3e5138168edbc3a76d46747cc987
>
> and working from there! I just checked, and it still reverts cleanly, and
> you'd end up with a nice code-base that (a) has gotten years of testing
> and (b) already has the looping-over-the-pagetables code.
>
> Linus
>


WARNING: at kernel/mutex.c:134

2008-01-21 Thread Dave Young
While trying to trigger bug 9778, I pressed ctrl+alt+sysrq+w and the
following warnings appeared:

usb0: unregister 'cdc_ether' usb-:00:1d.7-1.3, CDC Ethernet Device
unregister_netdevice: waiting for usb0 to become free. Usage count = 1
SysRq : Show Blocked State
  taskPC stack   pid father
Sched Debug Version: v0.07, 2.6.24-rc8-mm1 #5
now at 2660467.699238 msecs
  .sysctl_sched_latency: 40.00
  .sysctl_sched_min_granularity: 8.00
  .sysctl_sched_wakeup_granularity : 20.00
  .sysctl_sched_batch_wakeup_granularity   : 20.00
  .sysctl_sched_child_runs_first   : 0.01
  .sysctl_sched_features   : 39

cpu#0, 2793.192 MHz
  .nr_running: 2
  .load  : 4096
  .nr_switches   : 1627754
  .nr_load_updates   : 241563
  .nr_uninterruptible: 4294967012
  .jiffies   : 708140
  .next_balance  : 0.708327
  .curr->pid : 0
  .clock : 2656959.965268
  .idle_clock: 0.00
  .prev_clock_raw: 2674890.031768
  .clock_warps   : 0
  .clock_overflows   : 5805
  .clock_underflows  : 127079
  .clock_deep_idle_events: 2
  .clock_max_delta   : 671.628710
  .cpu_load[0]   : 0
  .cpu_load[1]   : 0
  .cpu_load[2]   : 6
  .cpu_load[3]   : 48
  .cpu_load[4]   : 85

cfs_rq
  .exec_clock: 0.00
  .MIN_vruntime  : 180379.753201
  .min_vruntime  : 180379.753201
  .max_vruntime  : 180379.753201
  .spread: 0.00
  .spread0   : 0.00
  .nr_running: 1
  .load  : 4096
  .nr_spread_over: 0
[ cut here ]
WARNING: at kernel/mutex.c:134 mutex_lock_nested+0x277/0x290()
Modules linked in: cdc_ether usbnet snd_seq_dummy snd_seq_oss 
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 
psmouse snd_hda_intel snd_pcm intel_agp btusb snd_timer rtc_cmos bluetooth sg 
rtc_core 3c59x evdev agpgart snd serio_raw button thermal processor soundcore 
rtc_lib snd_page_alloc i2c_i801 pcspkr dcdbas
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #5
 [] ? printk+0x0/0x20
 [] warn_on_slowpath+0x54/0x80
 [] ? _spin_unlock_irqrestore+0x5e/0x70
 [] ? release_console_sem+0xd1/0xe0
 [] ? vprintk+0x308/0x320
 [] ? put_lock_stats+0x21/0x30
 [] ? lock_release_holdtime+0x60/0x80
 [] ? print_cfs_rq+0x117/0x500
 [] ? __lock_release+0x47/0x70
 [] ? print_cfs_rq+0x117/0x500
 [] ? printk+0x18/0x20
 [] mutex_lock_nested+0x277/0x290
 [] ? vprintk+0x308/0x320
 [] ? print_cfs_stats+0x30/0xb0
 [] print_cfs_stats+0x30/0xb0
 [] print_cpu+0x81c/0x830
 [] sched_debug_show+0x22a/0x430
 [] sysrq_sched_debug_show+0xc/0x10
 [] show_state_filter+0x86/0xb0
 [] sysrq_handle_showstate_blocked+0xd/0x10
 [] __handle_sysrq+0x89/0x120
 [] handle_sysrq+0x33/0x40
 [] kbd_keycode+0x39e/0x480
 [] ? __mod_timer+0xa0/0xb0
 [] kbd_event+0xea/0x100
 [] input_pass_event+0xec/0x100
 [] ? input_pass_event+0x0/0x100
 [] ? mod_timer+0x26/0x40
 [] input_handle_event+0xb2/0x2b0
 [] input_event+0x5f/0x80
 [] hidinput_hid_event+0xef/0x390
 [] ? hid_input_field+0x40/0x340
 [] hid_process_event+0x63/0x90
 [] hid_input_field+0x2c5/0x340
 [] hid_input_report+0x106/0x260
 [] ? put_lock_stats+0xd/0x30
 [] ? lock_release_holdtime+0x60/0x80
 [] hid_irq_in+0x181/0x190
 [] ? uhci_giveback_urb+0x8a/0x160
 [] usb_hcd_giveback_urb+0x41/0xa0
 [] uhci_giveback_urb+0x97/0x160
 [] uhci_scan_qh+0x70/0x1c0
 [] uhci_scan_schedule+0x8b/0x130
 [] uhci_irq+0xb5/0x150
 [] ? __lock_release+0x47/0x70
 [] usb_hcd_irq+0x24/0x60
 [] handle_IRQ_event+0x28/0x60
 [] handle_fasteoi_irq+0x6e/0xd0
 [] do_IRQ+0x3c/0x80
 [] ? tick_nohz_stop_sched_tick+0x25c/0x350
 [] common_interrupt+0x2e/0x34
 [] ? mwait_idle_with_hints+0x40/0x50
 [] ? mwait_idle+0x0/0x20
 [] mwait_idle+0x12/0x20
 [] cpu_idle+0x61/0x110
 [] rest_init+0x5d/0x60
 [] start_kernel+0x1fa/0x260
 [] ? unknown_bootoption+0x0/0x130
 ===
---[ end trace bc131943b9b4ac4c ]---


[PATCH] CRAMFS: Uncompressed files support

2008-01-21 Thread Kyungmin Park
Hi,

This patch enables uncompressed file support in cramfs.

The term 'uncompressed file' comes from linear cramfs (aka Application XIP).
In linear cramfs it is used to support XIP on NOR, but it is also helpful
on OneNAND: it makes the filesystem faster by removing compression overhead.
In XIP mode the file is executed in place; in non-XIP mode the data is copied
to RAM and run from there.

In my simple test, copying busybox (compressed vs. uncompressed),
the time dropped by about 50%, from 0.40s to 0.19s.
Yes, it increases the file system size, but nowadays flash has a large capacity.
It's a trade-off between size and performance.

This patch also uses the page cache directly.
The previous implementation used a local buffer, which is not needed here:
the data is already uncompressed and fits the page size, so the page is used
directly, which removes a useless memory copy.

It is compatible with both existing linear cramfs images and original ones.

Any comments are welcome.

Thank you,
Kyungmin Park

Signed-off-by: Kyungmin Park <[EMAIL PROTECTED]>
---
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 3d194a2..edba28f 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -40,6 +40,7 @@ static DEFINE_MUTEX(read_mutex);
 #define CRAMINO(x) (((x)->offset && (x)->size)?(x)->offset<<2:1)
 #define OFFSET(x)  ((x)->i_ino)
 
+#define CRAMFS_INODE_IS_XIP(x) ((x)->i_mode & S_ISVTX)
 
 static int cramfs_iget5_test(struct inode *inode, void *opaque)
 {
@@ -143,8 +144,9 @@ static int next_buffer;
 /*
  * Returns a pointer to a buffer containing at least LEN bytes of
  * filesystem starting at byte offset OFFSET into the filesystem.
+ * If the @pg has the page, it returns the page buffer address
  */
-static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
+static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len, struct page **pg)
 {
struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
struct page *pages[BLKS_PER_BUF];
@@ -174,6 +176,22 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i
 
devsize = mapping->host->i_size >> PAGE_CACHE_SHIFT;
 
+   /*
+* Use page directly either 
+* - uncompressed page or
+* - compressed page which has all required data
+*/
+   if (pg && offset + len <= PAGE_CACHE_SIZE) {
+   struct page *page = NULL;
+   page = read_mapping_page(mapping, blocknr, NULL);
+   if (!IS_ERR(page)) {
+   *pg = page;
+   data = kmap(page);
+   data += offset;
+   return data;
+   }
+   }
+
/* Ok, read in BLKS_PER_BUF pages completely first. */
unread = 0;
for (i = 0; i < BLKS_PER_BUF; i++) {
@@ -253,14 +271,14 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
buffer_blocknr[i] = -1;
 
/* Read the first block and get the superblock from it */
-   memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
+   memcpy(&super, cramfs_read(sb, 0, sizeof(super), NULL), sizeof(super));
mutex_unlock(&read_mutex);
 
/* Do sanity checks on the superblock */
if (super.magic != CRAMFS_MAGIC) {
/* check at 512 byte offset */
mutex_lock(&read_mutex);
-   memcpy(&super, cramfs_read(sb, 512, sizeof(super)), sizeof(super));
+   memcpy(&super, cramfs_read(sb, 512, sizeof(super), NULL), sizeof(super));
mutex_unlock(&read_mutex);
if (super.magic != CRAMFS_MAGIC) {
if (!silent)
@@ -367,7 +385,7 @@ static int cramfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
int namelen, error;
 
mutex_lock(&read_mutex);
-   de = cramfs_read(sb, OFFSET(inode) + offset, sizeof(*de)+256);
+   de = cramfs_read(sb, OFFSET(inode) + offset, sizeof(*de)+256, NULL);
name = (char *)(de+1);
 
/*
@@ -417,7 +435,7 @@ static struct dentry * cramfs_lookup(struct inode *dir, struct dentry *dentry, s
char *name;
int namelen, retval;
 
-   de = cramfs_read(dir->i_sb, OFFSET(dir) + offset, sizeof(*de)+256);
+   de = cramfs_read(dir->i_sb, OFFSET(dir) + offset, sizeof(*de)+256, NULL);
name = (char *)(de+1);
 
/* Try to take advantage of sorted directories */
@@ -463,21 +481,44 @@ static struct dentry * cramfs_lookup(struct inode *dir, struct dentry *dentry, s
 static int cramfs_readpage(struct file *file, struct page * page)
 {
struct inode *inode = page->mapping->host;
+   struct super_block *sb = inode->i_sb;
u32 maxblock, bytes_filled;
+   struct page *pg = NULL;
void *pgdata;
 
maxblock = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
  

Re: [PATCH] mmu notifiers #v3

2008-01-21 Thread Rik van Riel
On Mon, 21 Jan 2008 13:52:04 +0100
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

Reviewed-by: Rik van Riel <[EMAIL PROTECTED]>

-- 
All rights reversed.


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Jesper Juhl
On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > > 2008/1/22, Jesper Juhl <[EMAIL PROTECTED]>:
> > > > Some very pedantic nitpicking below;
> > > >
> > > > On 22/01/2008, Anton Salikhmetov <[EMAIL PROTECTED]> wrote:
> > ...
> > > > > +   if (file && (vma->vm_flags & VM_SHARED)) {
> > > > > +   if (flags & MS_ASYNC)
> > > > > +   vma_wrprotect(vma);
> > > > > +   if (flags & MS_SYNC) {
> > > >
> > > > "else if" ??
> > >
> > > The MS_ASYNC and MS_SYNC flags are mutually exclusive, that is why I
> > > did not use the "else-if" here. Moreover, this function itself checks
> > > that they never come together.
> > >
> >
> > I would say that them being mutually exclusive would be a reason *for*
> > using "else-if" here.
>
> This check is performed by the sys_msync() function itself in its very
> beginning.
>
> We don't need to check it later.
>

Sure, it's just that, to me, using 'else-if' makes it explicit that
the two are mutually exclusive. Using "if (...), if (...)" doesn't.
Maybe it's just me, but I feel that 'else-if' here better shows the
intention...  No big deal.
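
For what it's worth, the 'else-if' form being argued for looks roughly like this in isolation. This is only a sketch: `FLAG_ASYNC`, `FLAG_SYNC`, `enum action` and `pick_action` are hypothetical names, and the flag values are not taken from any header.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch only. */
#define FLAG_ASYNC 0x1
#define FLAG_SYNC  0x2

enum action { ACT_NONE, ACT_WRPROTECT, ACT_SYNC };

static enum action pick_action(int flags)
{
	enum action act = ACT_NONE;

	/* The caller is assumed to have rejected the combination already. */
	assert((flags & (FLAG_ASYNC | FLAG_SYNC)) != (FLAG_ASYNC | FLAG_SYNC));

	if (flags & FLAG_ASYNC)
		act = ACT_WRPROTECT;
	else if (flags & FLAG_SYNC)	/* not evaluated when FLAG_ASYNC is set */
		act = ACT_SYNC;

	return act;
}
```

The behaviour is the same as two independent tests once the combination has been rejected up front; the only difference is that the exclusivity is visible at the branch itself.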

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


Re: [PATCH -v7 2/2] Update ctime and mtime for memory-mapped files

2008-01-21 Thread Linus Torvalds


On Tue, 22 Jan 2008, Anton Salikhmetov wrote:
>  
>  /*
> + * Scan the PTEs for pages belonging to the VMA and mark them read-only.
> + * It will force a pagefault on the next write access.
> + */
> +static void vma_wrprotect(struct vm_area_struct *vma)
> +{
> + unsigned long addr;
> +
> + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> + spinlock_t *ptl;
> + pgd_t *pgd = pgd_offset(vma->vm_mm, addr);
> + pud_t *pud = pud_offset(pgd, addr);
> + pmd_t *pmd = pmd_offset(pud, addr);
> + pte_t *pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

This is extremely expensive over bigger areas, especially sparsely mapped 
ones (it does all the lookups for all four levels over and over and over 
again for each page).

I think Peter Zijlstra posted a version that uses the regular kind of 
nested loop (with inline functions to keep the thing nice and clean), 
which gets rid of that.
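
To make the cost difference concrete, here is a minimal userspace sketch with a toy two-level table. The real walk has four levels and uses the pgd/pud/pmd/pte helpers, none of which are modelled here; `struct top`, `struct leaf` and both functions are made-up names, and the returned counter only tallies upper-level lookups.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES 4	/* entries per level, tiny for illustration */

struct leaf {
	int present[ENTRIES];
	int writable[ENTRIES];
};

struct top {
	struct leaf *child[ENTRIES];	/* NULL means nothing mapped below */
};

/* Per-page walk: restarts from the top level for every single page. */
static unsigned wrprotect_per_page(struct top *t)
{
	unsigned lookups = 0;

	assert(t != NULL);
	for (size_t page = 0; page < ENTRIES * ENTRIES; page++) {
		struct leaf *l = t->child[page / ENTRIES];

		lookups++;
		if (l && l->present[page % ENTRIES])
			l->writable[page % ENTRIES] = 0;
	}
	return lookups;
}

/* Nested walk: one upper-level lookup per leaf table, holes skipped. */
static unsigned wrprotect_nested(struct top *t)
{
	unsigned lookups = 0;

	assert(t != NULL);
	for (size_t i = 0; i < ENTRIES; i++) {
		struct leaf *l = t->child[i];

		lookups++;
		if (!l)
			continue;	/* skip the whole unmapped range at once */
		for (size_t j = 0; j < ENTRIES; j++)
			if (l->present[j])
				l->writable[j] = 0;
	}
	return lookups;
}
```

With four entries per level the per-page variant does 16 upper-level lookups against 4 for the nested one, and the gap widens with every extra level and with sparser mappings.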

[ The sad/funny part is that this is all how we *used* to do msync(), back 
  in the days: we're literally going back to the "pre-cleanup" logic. See 
  commit 204ec841fbea3e5138168edbc3a76d46747cc987: "mm: msync() cleanup" 
  for details ]

Quite frankly, I really think you might be better off just doing a

git revert 204ec841fbea3e5138168edbc3a76d46747cc987

and working from there! I just checked, and it still reverts cleanly, and 
you'd end up with a nice code-base that (a) has gotten years of testing 
and (b) already has the looping-over-the-pagetables code.

Linus

