Re: Add the infamous Huawei E220 to option.c

2007-11-28 Thread Pete Zaitcev
On Thu, 29 Nov 2007 08:38:59 +0100, Oliver Neukum <[EMAIL PROTECTED]> wrote:

> On Thursday, 29 November 2007 01:13:05, Pete Zaitcev wrote:
> > The problem stems from the fact that both option and usb-storage can bind
> > to the modem when in storage mode: the former binds because of the storage
> > class, the latter binds because of VID/PID match. The modprobe loads both,
> 
> Isn't it possible to fix this in option's module table?

At first glance it would require adding a field to struct usb_serial to save
the driver_info from the ID table in usb_serial_probe. That is something I'd
like to discuss, actually. I dislike fields that carry information this
way: filled in at one place, used in another... For the sake of cleaner
code I would rather add another method for usb_serial_probe to call.
But I'm not really sure.

-- Pete
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4, v3] Physical PCI slot objects

2007-11-28 Thread Kenji Kaneshige
> Hi Gary, Kenji-san, et. al,
> 
> * Gary Hade <[EMAIL PROTECTED]>:
>> Alex, What I was trying to suggest is a boot-time kernel
>> option, not a kernel configuration option.  The basic idea is
>> to give the user (with a single binary kernel) the ability to
>> include your ACPI-PCI slot driver feature changes only when
>> they are really needed.  In addition to reducing the number of
>> system/PCI hotplug driver combinations where your changes would
>> need to be validated, I believe it would also help alleviate other
>> worries (e.g. Andi Kleen's memory consumption concern).  I
>> believe this goal could also be achieved with the kernel config
>> option by making the pci_slot module runtime loadable with the
>> PCI hotplug drivers only visiting your new code when the
>> pci_slot driver is loaded, although I think this would be more
>> difficult to implement.
> 
> I have modified my patch series so that the final patch that
> introduces my ACPI-PCI slot driver is a full-fledged module, that
> has a tristate Kconfig option.
> 

Thank you for the good work.

I tested shpchp and pciehp both with and without the pci_slot module. There
seems to be no regression from shpchp's and pciehp's point of view.
(I had a little concern about the hotplug slots' names, which vary depending
on whether pci_slot functionality is enabled or disabled. But now that we
can build the pci_slot driver as a kernel module, I don't think it is a big
problem.)

The only problem is that I got Call Traces with the following error
messages when the pci_slot driver was loaded, and one strange slot named
'1023' was registered (the other slots are fine). This is the same problem
I reported before.

sysfs: duplicate filename '1023' can not be created
WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()

kobject_add failed for 1023 with -EEXIST, don't try to register
things with the same name in the same directory.

On my system, hotplug slots themselves can be added, removed and replaced
with the other type of I/O box. The ACPI firmware tells the OS about the
presence of those slots using the _STA method (that is, it doesn't use the
'LoadTable()' AML operator). On the other hand, the current pci_slot driver
doesn't check _STA. As a result, the pci_slot driver tried to register the
invalid (non-existing) slots. The ACPI firmware of my system returns '1023'
if the invalid slot's _SUN is evaluated. This is the cause of the Call
Traces mentioned above. To fix this problem, the pci_slot driver needs to
check _STA when scanning the ACPI Namespace.
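
The effect of such a presence check can be illustrated with a small
user-space sketch. It is not the attached patch (which calls
acpi_bus_get_device()); struct fake_slot_object, STA_PRESENT and
collect_present_slots() are hypothetical names made up for this example.
The only outside fact assumed is that bit 0 of the _STA result means
"device is present".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STA_PRESENT 0x01u /* _STA bit 0: device is present */

/* Hypothetical model of one slot object found while walking the namespace. */
struct fake_slot_object {
    uint32_t sta;      /* value _STA would return for this object */
    unsigned long sun; /* value _SUN would return for this object */
};

/*
 * Copy the slot numbers of all present objects into out[] and return how
 * many were copied. Objects whose _STA says "not present" are skipped, so
 * a bogus _SUN such as 1023 is never registered.
 */
size_t collect_present_slots(const struct fake_slot_object *objs, size_t n,
                             unsigned long *out)
{
    size_t count = 0;

    assert(objs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!(objs[i].sta & STA_PRESENT))
            continue; /* skip non-existing device object */
        out[count++] = objs[i].sun;
    }
    return count;
}
```

With two absent objects that both report _SUN 1023, only the present
slots are collected, so the duplicate name never reaches sysfs.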

I'm sorry for reporting this so late. I'm attaching the patch to fix the
problem. This is against 2.6.24-rc3 with your patches applied. Could you
try it?

BTW, acpiphp also seems to have the same problem...

Thanks,
Kenji Kaneshige

---
 drivers/acpi/pci_slot.c |   13 +
 1 file changed, 13 insertions(+)

Index: linux-2.6.24-rc3/drivers/acpi/pci_slot.c
===
--- linux-2.6.24-rc3.orig/drivers/acpi/pci_slot.c
+++ linux-2.6.24-rc3/drivers/acpi/pci_slot.c
@@ -113,10 +113,17 @@ register_slot(acpi_handle handle, u32 lv
int device;
unsigned long sun;
char name[KOBJ_NAME_LEN];
+   acpi_status status;
+   struct acpi_device *dummy_device;
 
struct pci_slot *pci_slot;
struct pci_bus *pci_bus = context;
 
+   /* Skip non-existing device object. */
+   status = acpi_bus_get_device(handle, &dummy_device);
+   if (ACPI_FAILURE(status))
+   return AE_OK;
+
if (check_slot(handle, &device, &sun))
return AE_OK;
 
@@ -150,12 +157,18 @@ walk_p2p_bridge(acpi_handle handle, u32 
acpi_status status;
acpi_handle dummy_handle;
acpi_walk_callback user_function;
+   struct acpi_device *dummy_device;
 
struct pci_dev *dev;
struct pci_bus *pci_bus;
struct p2p_bridge_context child_context;
struct p2p_bridge_context *parent_context = context;
 
+   /* Skip non-existing device object. */
+   status = acpi_bus_get_device(handle, &dummy_device);
+   if (ACPI_FAILURE(status))
+   return AE_OK;
+
pci_bus = parent_context->pci_bus;
user_function = parent_context->user_function;
 



Re: Question regarding mutex locking

2007-11-28 Thread Jarek Poplawski
On 29-11-2007 03:34, David Schwartz wrote:
>> Thanks for the help. Someday, I hope to understand this stuff.
>>
>> Larry
> 
> Any code either deals with an object or it doesn't. If it doesn't deal with
> that object, it should not be acquiring locks on that object. If it does
> deal with that object, it must know the internal details of that object,
> including when and whether locks are held, or it cannot deal with that
> object sanely.
...

Maybe this complicates things unnecessarily, but since you stress
the need to know the object: sometimes locking is done only to
synchronize something in time, i.e. to ensure that only one action
runs at a time, that a few actions run in the proper order, and/or
that they are not interrupted by other actions in the meantime (so
there is no need to deal with any common data).

But, of course, we can say an action could be a kind of object too.

Regards,
Jarek P.


Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-28 Thread Kay Sievers
On Wed, 2007-11-28 at 22:08 -0800, Greg KH wrote:
> On Wed, Nov 28, 2007 at 06:00:27PM +0100, Kay Sievers wrote:
> > On Wed, 2007-11-28 at 17:51 +0100, Cornelia Huck wrote:
> > > On Wed, 28 Nov 2007 17:36:29 +0100, Kay Sievers <[EMAIL PROTECTED]> wrote:
> > > > On Wed, 2007-11-28 at 17:12 +0100, Cornelia Huck wrote:
> > > > > On Wed, 28 Nov 2007 16:57:48 +0100, Kay Sievers <[EMAIL PROTECTED]> wrote:
> > > > > > On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > > > > > > On Wed, 28 Nov 2007 13:23:02 +0100, Kay Sievers <[EMAIL PROTECTED]> wrote:
> > > > > > > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > > > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[EMAIL PROTECTED]> wrote:
> > > > > > 
> > > > > > > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > > > > > > userspace to allow more environment variables to be added to the uevent.
> > > > > > > > >
> > > > > > > > > It may be helpful to mention which uevents are by default created by
> > > > > > > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > > > > > > >
> > > > > > > > I think, we should remove all these default events from the kobject
> > > > > > > > core. We will not be able to manage the timing issues and "raw" kobject
> > > > > > > > users should request the events on their own, when they are finished
> > > > > > > > adding stuff to the kobject. I see currently no way to solve the
> > > > > > > > "attributes created after the event" problem. The new
> > > > > > > > *_create_and_register functions do not allow default attributes to be
> > > > > > > > created, which will just lead to serious trouble when someone wants to
> > > > > > > > use udev to set defaults and such things. We may just want to require an
> > > > > > > > explicit call to send the event?
> > > > > > >
> > > > > > > There will always be attributes that will show up later (for example,
> > > > > > > after a device is activated). Probably the best approach is to keep the
> > > > > > > default uevents, but have the attribute-adder send another uevent when
> > > > > > > they are done?
> > > > > >
> > > > > > Uh, that's more an exception where we can't give guarantees because of
> > > > > > very specific hardware setups, and it would be an additional "change"
> > > > > > event. There are valid cases for this, but only a _very_ few.
> > > > > >
> > > > > > There is absolutely no reason not to do it right with the "add" event,
> > > > > > just because we are too lazy to solve it proper the current code. It's
> > > > > > just so broken by design, what we are doing today. :)
> > > > > 
> > > > > I'm worrying a bit about changes that impact the whole code tree in
> > > > > lots of places. I'd be fine with the device layer doing its uevent
> > > > > manually in device_add() at the very end, though. (This would allow
> > > > > drivers to add attributes in their probe function before the uevent,
> > > > > for example.)
> > > 
> > > 
> > 
> > I think I still remember what I did 2.5 years ago :)
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e57cd73e2e844a3da25cc6b420674c81bbe1b387
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=18c3d5271b472c096adfc856e107c79f6fd30d7d
> > 
> > > > The driver core does use the split already in most places, I did that
> > > > long ago. There are not too many (~20) users of kobject_register(), and
> > > > it's a pretty straight-forward change to change that to _init, _add,
> > > > _uevent, and get rid of that totally useless "convenience api".
> > > > 
> > > > I think there is no longer any excuse to keep that broken code around,
> > > > and even require to document that it's broken. The whole purpose of the
> > > > uevent is userspace consumption, which just doesn't work correctly with
> > > > the code we offer. The fix is trivial, and should be done now, and we no
> > > > longer need to fiddle around timing issues, just because we are too
> > > > lazy.
> > > > 
> > > > I propose the removal of _all_ functions that have *register* in their
> > > > name, and always require the following sequence:
> > > >   _init()
> > > >   _add()
> > > >   _uevent(_ADD)
> > > > 
> > > >   _uevent(_REMOVE)
> > > >   _del()
> > > >   _put()
> > > > 
> > > > The _create_and_register() functions would become _create_and_add()
> > > > and will need an additional _uevent() call after they populated the
> > > > object.
> > > 
> > > I'm absolutely fine with doing that at the kobject level (after all,
> > > it's a quite contained change, and the uevent function explicitly
> > > works on a kobject).
> > > 
> > > For the other 
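
The "attributes created after the event" problem and the proposed
_init() / _add() / _uevent(_ADD) split can be shown with a toy model in
user space. Everything here is a hypothetical stand-in (struct fake_kobj
and the fake_*() helpers are not kernel API); the sketch only records how
many attributes exist at the moment the "add" event fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kobject; none of this is kernel API. */
struct fake_kobj {
    bool added;              /* object is visible to userspace */
    int  n_attrs;            /* attributes created so far */
    int  attrs_at_add_event; /* attributes a listener saw on "add", -1 if not sent */
};

void fake_init(struct fake_kobj *k)
{
    assert(k != NULL);
    k->added = false;
    k->n_attrs = 0;
    k->attrs_at_add_event = -1;
}

void fake_add(struct fake_kobj *k)
{
    k->added = true;
}

void fake_create_attr(struct fake_kobj *k)
{
    assert(k->added); /* attributes can only hang off an added object */
    k->n_attrs++;
}

/* The explicit event: a listener sees whatever exists right now. */
void fake_uevent_add(struct fake_kobj *k)
{
    k->attrs_at_add_event = k->n_attrs;
}

/* Old-style convenience call: the event fires before any attribute exists. */
void fake_register(struct fake_kobj *k)
{
    fake_init(k);
    fake_add(k);
    fake_uevent_add(k);
}
```

With fake_register() followed by two fake_create_attr() calls the
listener saw zero attributes; with fake_init(), fake_add(), two
fake_create_attr() calls and then fake_uevent_add() it saw both.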

Re: Add the infamous Huawei E220 to option.c

2007-11-28 Thread Oliver Neukum
On Thursday, 29 November 2007 07:33:03, Johann Wilhelm wrote:
> But in my opinion the module load order should be forced by udev...
> this should work and we only have 1 position to keep in mind if we e.g.
> get a new E220, support for the E270 or something else...

No, udev cannot help here because either of the two modules may already
be loaded when you plug in your device. You also need to get the kernel space
probing corrected. Basically you have three options.

1. Make both drivers handle the issue. That means code duplication.
2. Make the option driver fail gracefully in probe().
3. Make sure usbcore doesn't probe the devices in the wrong mode with the
option driver.

Regards
Oliver
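
Option 2 could look roughly like the following user-space sketch: the
probe routine declines an interface that still presents itself as mass
storage, so the other driver can claim it. This is only an illustration
under assumptions. struct fake_usb_interface and sketch_option_probe()
are hypothetical, the Huawei vendor ID is assumed to be 0x12d1, and the
real option driver of this thread has no such check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define USB_CLASS_MASS_STORAGE 0x08   /* USB class code for mass storage */
#define HUAWEI_VENDOR_ID       0x12d1 /* assumed value, for illustration */

/* Hypothetical, simplified view of what a probe() routine would inspect. */
struct fake_usb_interface {
    uint16_t id_vendor;
    uint16_t id_product;
    uint8_t  interface_class;
};

/*
 * Return 0 to accept the interface, -ENODEV to decline it.
 * A Huawei device still in storage mode is declined instead of being
 * bound in a state where the serial ports cannot work.
 */
int sketch_option_probe(const struct fake_usb_interface *intf)
{
    assert(intf != NULL);
    if (intf->id_vendor == HUAWEI_VENDOR_ID &&
        intf->interface_class == USB_CLASS_MASS_STORAGE)
        return -ENODEV;
    return 0;
}
```

Because the check looks at the interface class and not at a single PID,
it would not need a new entry for every E220 variant.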



Re: Add the infamous Huawei E220 to option.c

2007-11-28 Thread Johann Wilhelm

Hi there,

Well your code basically looks nice... but keep in mind that there are
several different E220 devices (in fact I know of 2 different PIDs...
and I would be really surprised if they only use 2 of them...). So you
should check all possible PIDs...


But in my opinion the module load order should be forced by udev...
this should work and we only have 1 position to keep in mind if we e.g.
get a new E220, support for the E270 or something else...


73

Quoting Pete Zaitcev <[EMAIL PROTECTED]>:


Hi, All:

It looks like the Huawei E220 saga is not over yet. A colleague of mine,
David Russll, reported that the modem does not work reliably on Fedora 8,
which does have the initializer in usb-storage.

The problem stems from the fact that both option and usb-storage can bind
to the modem when in storage mode: the former binds because of the storage
class, the latter binds because of VID/PID match. The modprobe loads both,
it's random which wins. If usb-storage wins, everything is fine. If option
wins, it binds to modem still in storage mode and does not work.

I propose we add the same initializer that usb-storage has to option.
This way, no matter which driver wins, the modem gets initialized. The
patch is tested on David's modem, but I would like someone to give it more
testing.

I dunno, do we want some kind of code sharing between storage and option?
They both could use the normal usb_control_msg, I think.

Also, from archives it looks like Johann may need PID 0x1004 added.

Since we're on topic, David's modem has exactly the same IDs as Norbert's,
but works fine with the length of 1. Although it's possible that the
firmware is different without different firmware reported in USB
descriptors. Does anyone know a magic AT command? ATI or something?
Norbert, please try my patch, maybe it'll work this time.

And finally, please stop using that script from the Polish website and
above all quit using the generic serial subdriver. The option must
work now with the patch. Please let me know if it fails.

Thanks in advance,
-- Pete

diff -urp -X dontdiff linux-2.6.23.1-42.fc8/drivers/usb/serial/option.c linux-2.6.23.1-42.fc8.e220.1/drivers/usb/serial/option.c
--- linux-2.6.23.1-42.fc8/drivers/usb/serial/option.c	2007-10-09 13:31:38.0 -0700
+++ linux-2.6.23.1-42.fc8.e220.1/drivers/usb/serial/option.c	2007-11-27 21:36:11.0 -0800

@@ -448,7 +448,7 @@ static void option_indat_callback(struct
err = usb_submit_urb(urb, GFP_ATOMIC);
if (err)
printk(KERN_ERR "%s: resubmit read urb failed. "
-   "(%d)", __FUNCTION__, err);
+   "(%d)\n", __FUNCTION__, err);
}
}
return;
@@ -728,6 +728,35 @@ static int option_send_setup(struct usb_
return 0;
 }

+static void option_start_huawei(struct usb_serial *serial)
+{
+   struct usb_device *dev = serial->dev;
+   char *buf;
+   int rc;
+
+   if (!(le16_to_cpu(dev->descriptor.idVendor) == HUAWEI_VENDOR_ID &&
+   le16_to_cpu(dev->descriptor.idProduct) == HUAWEI_PRODUCT_E220))
+   return;
+
+   if ((buf = kmalloc(1, GFP_KERNEL)) == 0)
+   goto err_buf;
+
+   buf[0] = 0x1;
+   rc = usb_control_msg(dev, usb_sndctrlpipe(dev, 0),
+   USB_REQ_SET_FEATURE, USB_TYPE_STANDARD | USB_RECIP_DEVICE,
+   0x01, 0x0, buf, 1, 1000);
+   if (rc) {
+   printk(KERN_ERR "%s: HUAWEI E220 setup failed (%d)\n",
+   __FUNCTION__, rc);
+   }
+
+   kfree(buf);
+   return;
+
+err_buf:
+   ;
+}
+
 static int option_startup(struct usb_serial *serial)
 {
int i, err;
@@ -736,6 +765,8 @@ static int option_startup(struct usb_ser

dbg("%s", __FUNCTION__);

+   option_start_huawei(serial);
+
/* Now setup per port private data */
for (i = 0; i < serial->num_ports; i++) {
port = serial->port[i];






Re: Add the infamous Huawei E220 to option.c

2007-11-28 Thread Oliver Neukum
On Thursday, 29 November 2007 01:13:05, Pete Zaitcev wrote:
> The problem stems from the fact that both option and usb-storage can bind
> to the modem when in storage mode: the former binds because of the storage
> class, the latter binds because of VID/PID match. The modprobe loads both,

Isn't it possible to fix this in option's module table?

Regards
Oliver



[PATCH 3/6] timekeeping: rename timekeeping_is_continuous to timekeeping_valid_for_hres

2007-11-28 Thread Li Zefan

Function timekeeping_is_continuous() no longer checks flag
CLOCK_IS_CONTINUOUS, and it checks CLOCK_SOURCE_VALID_FOR_HRES
now. So rename the function accordingly.

Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

---
 include/linux/time.h  |2 +-
 kernel/time/tick-sched.c  |2 +-
 kernel/time/timekeeping.c |4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index b04136d..fa21fe5 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -120,7 +120,7 @@ extern void getboottime(struct timespec *ts);
 extern void monotonic_to_bootbased(struct timespec *ts);
 
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
-extern int timekeeping_is_continuous(void);
+extern int timekeeping_valid_for_hres(void);
 extern void update_wall_time(void);
 
 /**
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 27a2338..fb69787 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -654,7 +654,7 @@ int tick_check_oneshot_change(int allow_nohz)
if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
return 0;
 
-   if (!timekeeping_is_continuous() || !tick_is_oneshot_available())
+   if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
return 0;
 
if (!allow_nohz)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e5e466b..e112dc4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -211,9 +211,9 @@ static inline s64 __get_nsec_offset(void) { return 0; }
 #endif
 
 /**
- * timekeeping_is_continuous - check to see if timekeeping is free running
+ * timekeeping_valid_for_hres - Check if timekeeping is suitable for hres
  */
-int timekeeping_is_continuous(void)
+int timekeeping_valid_for_hres(void)
 {
unsigned long seq;
int ret;
-- 
1.5.3.rc7



Re: void* arithmnetic

2007-11-28 Thread Benny Halevy
On Nov. 29, 2007, 3:19 +0200, "Ming Lei" <[EMAIL PROTECTED]> wrote:
> 2007/11/29, Jan Engelhardt <[EMAIL PROTECTED]>:
>> On Nov 29 2007 01:05, J.A. Magallón wrote:
>>> Since begin of the ages the build of the nvidia driver says things like
>>> this:
>>>
>> Explicitly adding -Wpointer-arith to ones own Makefile is like
>> admitting the code might be problematic. :->
>>
>>
>> I think sizeof(void *) == 1 is taken as granted as sizeof(int) >= 4
>> these days. Sigh.
> sizeof(void *) == 4, sizeof(void)==1, :)
well, sizeof(void *) == sizeof(unsigned long) maybe :)
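
For reference, GNU C accepts arithmetic on void pointers as an extension
by treating sizeof(void) as 1, and -Wpointer-arith warns when that
extension is used. Code that wants byte offsets without relying on the
extension can go through a char pointer, as in this small sketch
(advance_bytes() is a made-up helper name, not something from the nvidia
driver or the kernel).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Advance a generic pointer by nbytes bytes.
 * Casting to char * makes the unit of the addition explicit and is valid
 * in standard C, so -Wpointer-arith has nothing to complain about.
 */
void *advance_bytes(void *base, size_t nbytes)
{
    assert(base != NULL);
    return (char *)base + nbytes;
}
```

For an int array a, advance_bytes(a, 2 * sizeof(int)) yields the address
of a[2].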



[PATCH 4/6] time: fix typo in comments

2007-11-28 Thread Li Zefan

Fix typo in comments.

BTW: I have to fix coding style in arch/ia64/kernel/time.c also,
otherwise checkpatch.pl will be complaining.

Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

---
 arch/ia64/kernel/time.c   |   14 +++---
 arch/x86/kernel/time_64.c |2 +-
 include/linux/hrtimer.h   |2 +-
 include/linux/jiffies.h   |6 +++---
 kernel/time.c |4 ++--
 kernel/time/clockevents.c |2 +-
 kernel/time/timekeeping.c |2 +-
 7 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 2bb8421..5fc8c89 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -49,13 +49,13 @@ EXPORT_SYMBOL(last_cli_ip);
 #endif
 
 static struct clocksource clocksource_itc = {
-.name   = "itc",
-.rating = 350,
-.read   = itc_get_cycles,
-.mask   = CLOCKSOURCE_MASK(64),
-.mult   = 0, /*to be caluclated*/
-.shift  = 16,
-.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+   .name   = "itc",
+   .rating = 350,
+   .read   = itc_get_cycles,
+   .mask   = CLOCKSOURCE_MASK(64),
+   .mult   = 0, /*to be calculated*/
+   .shift  = 16,
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
 };
 static struct clocksource *itc_clocksource;
 
diff --git a/arch/x86/kernel/time_64.c b/arch/x86/kernel/time_64.c
index 368b194..2cc7570 100644
--- a/arch/x86/kernel/time_64.c
+++ b/arch/x86/kernel/time_64.c
@@ -235,7 +235,7 @@ static unsigned int __init tsc_calibrate_cpu_khz(void)
reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
}
local_irq_save(flags);
-   /* start meauring cycles, incrementing from 0 */
+   /* start measuring cycles, incrementing from 0 */
wrmsrl(MSR_K7_PERFCTR0 + i, 0);
wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
rdtscl(tsc_start);
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 80c7e98..d42c6be 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -78,7 +78,7 @@ enum hrtimer_cb_mode {
  * as otherwise the timer could be removed before the softirq code finishes the
  * the handling of the timer.
  *
- * The HRTIMER_STATE_ENQUEUE bit is always or'ed to the current state to
+ * The HRTIMER_STATE_ENQUEUED bit is always or'ed to the current state to
  * preserve the HRTIMER_STATE_CALLBACK bit in the above scenario.
  *
  * All state transitions are protected by cpu_base->lock.
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 8b08002..b071f46 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -36,7 +36,7 @@
 /* LATCH is used in the interval timer and ftape setup. */
 #define LATCH  ((CLOCK_TICK_RATE + HZ/2) / HZ) /* For divider */
 
-/* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, the we can
+/* Suppose we want to devide two numbers NOM and DEN: NOM/DEN, then we can
  * improve accuracy by shifting LSH bits, hence calculating:
  * (NOM << LSH) / DEN
  * This however means trouble for large NOM, because (NOM << LSH) may no
@@ -154,7 +154,7 @@ extern unsigned long preset_lpj;
  * We want to do realistic conversions of time so we need to use the same
  * values the update wall clock code uses as the jiffies size.  This value
  * is: TICK_NSEC (which is defined in timex.h).  This
- * is a constant and is in nanoseconds.  We will used scaled math
+ * is a constant and is in nanoseconds.  We will use scaled math
  * with a set of scales defined here as SEC_JIFFIE_SC,  USEC_JIFFIE_SC and
  * NSEC_JIFFIE_SC.  Note that these defines contain nothing but
  * constants and so are computed at compile time.  SHIFT_HZ (computed in
@@ -198,7 +198,7 @@ extern unsigned long preset_lpj;
  * operator if the result is a long long AND at least one of the
  * operands is cast to long long (usually just prior to the "*" so as
  * not to confuse it into thinking it really has a 64-bit operand,
- * which, buy the way, it can do, but it take more code and at least 2
+ * which, buy the way, it can do, but it takes more code and at least 2
  * mpys).
 
  * We also need to be aware that one second in nanoseconds is only a
diff --git a/kernel/time.c b/kernel/time.c
index 09d3c45..c25f472 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -266,7 +266,7 @@ EXPORT_SYMBOL(jiffies_to_usecs);
  *
  * This function should be only used for timestamps returned by
  * current_kernel_time() or CURRENT_TIME, not with do_gettimeofday() because
- * it doesn't handle the better resolution of the later.
+ * it doesn't handle the better resolution of the latter.
  */
 struct timespec timespec_trunc(struct timespec t, unsigned gran)
 {
@@ -314,7 +314,7 @@ EXPORT_SYMBOL_GPL(getnstimeofday);
  * This algorithm was first published by Gauss (I think).
  *
  * WARNING: this function will overflow on 2106-02-07 06:28:16 on
- * machines were long is 

[PATCH 5/6] time: delete comments that refer to noexistent symbols

2007-11-28 Thread Li Zefan

Function do_timer_interrupt_hook() doesn't take argument regs,
and structure hrtimer_sleeper doesn't have member cb_pending.
So delete comments referring to these symbols.

Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

---
 include/asm-x86/mach-voyager/do_timer.h |1 -
 include/linux/hrtimer.h |1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/include/asm-x86/mach-voyager/do_timer.h b/include/asm-x86/mach-voyager/do_timer.h
index bc2b589..9e5a459 100644
--- a/include/asm-x86/mach-voyager/do_timer.h
+++ b/include/asm-x86/mach-voyager/do_timer.h
@@ -6,7 +6,6 @@
 
 /**
  * do_timer_interrupt_hook - hook into timer tick
- * @regs: standard registers from interrupt
  *
  * Call the pit clock event handler. see asm/i8253.h
  **/
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 0a23302..d42c6be 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -149,7 +149,6 @@ struct hrtimer_sleeper {
  * @get_time:  function to retrieve the current time of the clock
  * @get_softirq_time:  function to retrieve the current time from the softirq
  * @softirq_time:  the time when running the hrtimer queue in the softirq
- * @cb_pending:list of timers where the callback is pending
  * @offset:offset of this clock to the monotonic base
  * @reprogram: function to reprogram the timer event
  */
-- 
1.5.3.rc7



[PATCH 6/6] tick: add a missing dot in prink

2007-11-28 Thread Li Zefan

Add a missing '.' in a printk message.

Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

---
 kernel/time/tick-oneshot.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-oneshot.c b/kernel/time/tick-oneshot.c
index 0258d31..0b5e513 100644
--- a/kernel/time/tick-oneshot.c
+++ b/kernel/time/tick-oneshot.c
@@ -78,7 +78,7 @@ int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *))
printk(KERN_INFO "Clockevents: "
   "could not switch to one-shot mode:");
if (!dev) {
-   printk(" no tick device\n");
+   printk(" no tick device.\n");
} else {
if (!tick_device_is_functional(dev))
printk(" %s is not functional.\n", dev->name);
-- 
1.5.3.rc7



[PATCH 0/6] time: small fixes and code cleanups

2007-11-28 Thread Li Zefan

These patches make some small fixes and code cleanups.
No actual bug is fixed, though.


Li Zefan



[PATCH 1/6] clocksource: remove redundant code

2007-11-28 Thread Li Zefan

Flag CLOCK_SOURCE_WATCHDOG is cleared twice. Note clocksource_change_rating()
won't do anything with the cs flag.

Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

---
 kernel/time/clocksource.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c8a9d13..0ba9fa8 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -91,7 +91,6 @@ static void clocksource_ratewd(struct clocksource *cs, int64_t delta)
   cs->name, delta);
cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
clocksource_change_rating(cs, 0);
-   cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
list_del(&cs->wd_list);
 }
 
-- 
1.5.3.rc7



[PATCH 2/6] clockevent: simplify list operations

2007-11-28 Thread Li Zefan

list_for_each_safe() suffices here.

Signed-off-by: Li Zefan <[EMAIL PROTECTED]>

---
 kernel/time/clockevents.c |   11 ---
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 822beeb..68fbe73 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -200,6 +200,8 @@ void clockevents_exchange_device(struct clock_event_device *old,
  */
 void clockevents_notify(unsigned long reason, void *arg)
 {
+   struct list_head *node, *tmp;
+
spin_lock(&clockevents_lock);
clockevents_do_notify(reason, arg);
 
@@ -209,13 +211,8 @@ void clockevents_notify(unsigned long reason, void *arg)
 * Unregister the clock event devices which were
 * released from the users in the notify chain.
 */
-   while (!list_empty(&clockevents_released)) {
-   struct clock_event_device *dev;
-
-   dev = list_entry(clockevents_released.next,
-struct clock_event_device, list);
-   list_del(&dev->list);
-   }
+   list_for_each_safe(node, tmp, &clockevents_released)
+   list_del(node);
break;
default:
break;
-- 
1.5.3.rc7
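
Why a "safe" iterator is enough to empty a list can be seen in a small
user-space model. The types and helpers below (struct list_node,
for_each_node_safe(), drain_list() and friends) are hypothetical
re-implementations written for this sketch, not the kernel's
<linux/list.h>; the point is only that the successor is saved before the
current node is unlinked.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on the kernel idea. */
struct list_node {
    struct list_node *next, *prev;
};

void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

void list_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL; /* poison so stale use is easy to spot */
}

int list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Remember the successor in tmp before the loop body may unlink pos. */
#define for_each_node_safe(pos, tmp, head)                 \
    for ((pos) = (head)->next, (tmp) = (pos)->next;        \
         (pos) != (head);                                  \
         (pos) = (tmp), (tmp) = (pos)->next)

/* Unlink every node and return how many were removed. */
size_t drain_list(struct list_node *head)
{
    struct list_node *node, *tmp;
    size_t removed = 0;

    assert(head != NULL);
    for_each_node_safe(node, tmp, head) {
        list_unlink(node);
        removed++;
    }
    return removed;
}
```

A plain (non-safe) iterator would read node->next after the node had
been unlinked, which is why the saved tmp pointer matters here.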



Re: [BUG] jiffies counter leaps in 2.6.24-rc3

2007-11-28 Thread Stefano Brivio
On Sat, 24 Nov 2007 20:31:25 +0100
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Saturday, 24 of November 2007, Stefano Brivio wrote:
> > On Sat, 24 Nov 2007 19:48:58 +0100
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > 
> > > NO_HZ?  Highres timers?
> > 
> > CONFIG_HZ_1000=y
> > # CONFIG_HIGH_RES_TIMERS is not set
> >  
> > > I understand that the previous kernels behave correctly.  All of them?
> > 
> > 2.6.21 behaved correctly. Sorry but git-bisect would take a lot of time
> > (I can't reliably reproduce the jiffies jump), so I would avoid that if
> > not strictly needed.
> 
> Well, it would be good to know if 2.6.23 behaves correctly, at least.

Weird, it looks like I can't boot with 2.6.23.9 because of some issues with
dm-crypt (my root filesystem is encrypted). I double-checked the
configuration (which I just took from my current one), well, no way. Any
other test I can do?

In the meantime, I noticed another thing: sometimes when I become
root the jiffies counter jumps ahead. Then, when I close any root
session, the jiffies counter jumps back to the correct value.

Please remember that this isn't just an aesthetic issue, as some drivers
(e.g. b43 and b43legacy, but I guess a lot more) rely on jiffies. Do I need
to file a bug to bugzilla.kernel.org? Thank you.


--
Ciao
Stefano


Re: [PATCH] [libata] Set proper ATA UDMA mode for bf548 according to system clock.

2007-11-28 Thread Sonic Zhang
Any comment?

Thanks

Sonic

On Nov 27, 2007 12:47 PM, sonic zhang <[EMAIL PROTECTED]> wrote:
> UDMA Mode - Frequency compatibility
>
> UDMA5 - 100 MB/s   - SCLK  = 133 MHz
> UDMA4 - 66 MB/s    - SCLK >=  80 MHz
> UDMA3 - 44.4 MB/s  - SCLK >=  50 MHz
> UDMA2 - 33 MB/s    - SCLK >=  40 MHz
>
>
> Signed-off-by: Sonic Zhang <[EMAIL PROTECTED]>
> ---
>  drivers/ata/pata_bf54x.c |7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/ata/pata_bf54x.c b/drivers/ata/pata_bf54x.c
> index 81db405..088a41f 100644
> --- a/drivers/ata/pata_bf54x.c
> +++ b/drivers/ata/pata_bf54x.c
> @@ -1489,6 +1489,8 @@ static int __devinit bfin_atapi_probe(st
> int board_idx = 0;
> struct resource *res;
> struct ata_host *host;
> +   unsigned int fsclk = get_sclk();
> +   int udma_mode = 5;
> const struct ata_port_info *ppi[] =
> { &bfin_port_info[board_idx], NULL };
>
> @@ -1507,6 +1509,11 @@ static int __devinit bfin_atapi_probe(st
> if (res == NULL)
> return -EINVAL;
>
> +   while (bfin_port_info[board_idx].udma_mask>0 && udma_fsclk[udma_mode] > fsclk) {
> +   udma_mode--;
> +   bfin_port_info[board_idx].udma_mask >>= 1;
> +   }
> +
> /*
>  * Now that that's out of the way, wire up the port..
>  */
> --
> 1.4.3.4
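
The clamping loop in the quoted patch can be tried out in userspace with a
sketch like the following. It is a model under stated assumptions: the
thresholds for UDMA2 to UDMA5 come from the table in the patch description,
the entries for modes 0 and 1 are placeholders (the driver's real
udma_fsclk[] table is not shown in this thread), and the function name
clamp_udma_mask() is made up. Each mask bit stands for one supported UDMA
mode, so shifting right drops the fastest remaining mode.

```c
#include <stdint.h>

/*
 * Minimum system clock (Hz) assumed for each UDMA mode. Entries 2..5 follow
 * the table in the patch description; entries 0..1 are placeholders.
 */
static const uint32_t model_udma_fsclk[6] = {
	0, 0, 40000000, 50000000, 80000000, 133000000
};

/*
 * Drop the fastest modes from udma_mask until the remaining top mode is
 * allowed at sclk_hz. The extra "mode > 0" guard (not in the patch) keeps
 * the table index in range.
 */
static unsigned int clamp_udma_mask(unsigned int udma_mask, uint32_t sclk_hz)
{
	int mode = 5;

	while (udma_mask > 0 && mode > 0 && model_udma_fsclk[mode] > sclk_hz) {
		mode--;
		udma_mask >>= 1;
	}
	return udma_mask;
}
```

With a starting mask of 0x3f (modes 0 to 5), a 100 MHz clock leaves 0x1f
(up to UDMA4) and a 45 MHz clock leaves 0x07 (up to UDMA2).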
>
>


Re: [BUG] 2.6.24-rc3-git2 softlockup detected

2007-11-28 Thread Kamalesh Babulal
Andrew Morton wrote:
> On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
>> Andrew Morton wrote:
>>> On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>> (cc linux-scsi, for sym53c8xx)
>>>
>>>> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
>>> I assume this is a post-2.6.23 regression?
>>>
>>>> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
>>>> NIP: c002f02c LR: d01414fc CTR: c002f018
>>>> REGS: c0077cbef0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git2-autotest)
>>>> MSR: 80009032   CR: 24022088  XER:
>>>> TASK = c0077cbd8000[375] 'insmod' THREAD: c0077cbec000 CPU: 1
>>>> GPR00: d01414fc c0077cbef330 c052b930 d80080002014
>>>> GPR04: d8008000202c  c0077ca1cb00 d014ce54
>>>> GPR08: c0077ca1c63c  002a c002f018
>>>> GPR12: d0143610 c0473d00
>>>> NIP [c002f02c] .ioread8+0x14/0x60
>>>> LR [d01414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>> Call Trace:
>>>> [c0077cbef330] [c0077cbef3c0] 0xc0077cbef3c0 (unreliable)
>>>> [c0077cbef3a0] [d01414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>> [c0077cbef470] [d01395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>>>> [c0077cbef710] [c01bc118] .pci_device_probe+0x124/0x1b0
>>>> [c0077cbef7b0] [c0221138] .driver_probe_device+0x144/0x20c
>>>> [c0077cbef850] [c0221450] .__driver_attach+0xcc/0x154
>>>> [c0077cbef8e0] [c021ff94] .bus_for_each_dev+0x7c/0xd4
>>>> [c0077cbef9a0] [c0220e9c] .driver_attach+0x28/0x40
>>>> [c0077cbefa20] [c02204d8] .bus_add_driver+0x90/0x228
>>>> [c0077cbefac0] [c0221858] .driver_register+0x94/0xb0
>>>> [c0077cbefb40] [c01bc430] .__pci_register_driver+0x6c/0xcc
>>>> [c0077cbefbe0] [d0143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>>>> [c0077cbefc80] [c008ce80] .sys_init_module+0x17c4/0x1958
>>>> [c0077cbefe30] [c000872c] syscall_exit+0x0/0x40
>>>> Instruction dump:
>>>> 6000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
>>>> f8010010 f821ff91 7c0004ac 8923 <0c09> 4c00012c 79290620 2f8900ff
>>> I see no obvious lockup sites near the end of sym_hcb_attach().  Maybe it's
>>> being called lots of times from a higher level..  Do the traces all look
>>> the same?
>> Hi Andrew,
>>
>> I see this call trace twice and both look similar, and on another reboot
>> the following trace is seen twice on a different CPU
>>
>> BUG: soft lockup detected on CPU#3!
>> Call Trace:
>> [C0003FEDEDA0] [C0010220] .show_stack+0x68/0x1b0 (unreliable)
>> [C0003FEDEE40] [C00A061C] .softlockup_tick+0xf0/0x13c
>> [C0003FEDEEF0] [C0072E2C] .run_local_timers+0x1c/0x30
>> [C0003FEDEF70] [C0022FA0] .timer_interrupt+0xa8/0x488
>> [C0003FEDF050] [C00034EC] decrementer_common+0xec/0x100
>> --- Exception: 901 at .ioread8+0x14/0x60
>> LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>> [C0003FEDF340] [D02B3BC0] 0xd02b3bc0 (unreliable)
>> [C0003FEDF3B0] [D029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>> [C0003FEDF480] [D0291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
>> [C0003FEDF710] [C01B65A4] .pci_device_probe+0x13c/0x1dc
>> [C0003FEDF7D0] [C0219A0C] .driver_probe_device+0xa0/0x15c
>> [C0003FEDF870] [C0219C64] .__driver_attach+0xb4/0x138
>> [C0003FEDF900] [C021913C] .bus_for_each_dev+0x7c/0xd4
>> [C0003FEDF9C0] [C02198B0] .driver_attach+0x28/0x40
>> [C0003FEDFA40] [C0218BA4] .bus_add_driver+0x98/0x18c
>> [C0003FEDFAE0] [C021A064] .driver_register+0xa8/0xc4
>> [C0003FEDFB60] [C01B68AC] .__pci_register_driver+0x5c/0xa4
>> [C0003FEDFBF0] [D029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
>> [C0003FEDFC90] [C008D1F4] .sys_init_module+0x1764/0x1998
>> [C0003FEDFE30] [C000869C] syscall_exit+0x0/0x40
>>
> 
> hm, odd.
> 
> Can you look up sym_hcb_attach+0x1194/0x1384 in gdb?  Something like
> 
Hi Andrew,

I tried with 2.6.24-rc3-git3 and got the following trace

BUG: soft lockup - CPU#2 stuck for 11s! [insmod:375]
NIP: c002f02c LR: d01414fc CTR: c002f018
REGS: c0077ca3b0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git3-autokern1)
MSR: 80009032   CR: 24022088  XER: 
TASK = c0077cc58000[375] 'insmod' THREAD: c0077ca38000 CPU: 2
GPR00: d01414fc c0077ca3b330 c052b880 d80080002014 
GPR04: d8008000202c  c0077c82eb00 d014ce54 
GPR08: c0077c82e63c  002a c002f018 
GPR12: d0143610 c0473f80 
NIP [c002f02c] 

Re: [patch 1/1] Writeback fix for concurrent large and small file writes

2007-11-28 Thread Michael Rubin
Thank you. Integrated the fixes in my patch.

On Nov 28, 2007 6:13 PM, Frans Pop <[EMAIL PROTECTED]> wrote:
> Two typos in comments.
>
> Cheers,
> FJP
>
> Michael Rubin wrote:
> > + * The flush tree organizes the dirtied_when keys with the rb_tree. Any
> > + * inodes with a duplicate dirtied_when value are link listed together. This
> > + * link list is sorted by the inode's i_flushed_when. When both the
> > + * dirited_when and the i_flushed_when are indentical the order in the
> > + * linked list determines the order we flush the inodes.
>
> s/dirited_when/dirtied_when/
>
> > + * Here is where we interate to find the next inode to process. The
> > + * strategy is to first look for any other inodes with the same dirtied_when
> > + * value. If we have already processed that node then we need to find
> > + * the next highest dirtied_when value in the tree.
>
> s/interate/iterate/
>
>


get block_device from name

2007-11-28 Thread Tim Barbour
Please CC: me on any replies, as I am not subscribed to the list.

I want to do some bio IO on a block device (*no* filesystem involved).

First I need to get hold of the struct block_device. What is the current
recommended way to get from the name of a device (e.g. "/dev/sda1") to the
corresponding struct block_device ? What is the current canonical representation
of a device - dev_t ? Did kdev_t go away ?

Given the struct block_device, can I immediately use it in a bio, or do I need
to prepare the device first ?

I think I understand how to submit a bio, once I have the block_device ready.

I have read that the handling of block_devices is not as straightforward as it
could be... ( http://lwn.net/Articles/247072/ )

Any suggestions would be appreciated.

Again, please CC: me on any replies.

Tim


Re: Question regarding mutex locking

2007-11-28 Thread Jarek Poplawski
On Wed, Nov 28, 2007 at 03:33:12PM -0800, Stephen Hemminger wrote:
...
> WTF are you teaching a lesson on how NOT to do locking?
> 
> Any code which has this kind of convoluted dependency on conditional
> locking is fundamentally broken.
> 

As a matter of fact, I was thinking about sending one more Re: to myself
to point out that all this is a good example of how problematic such a
solution would be, but I decided it's rather apparent. IMHO learning needs
bad examples too, to better understand why they should be avoided.

On the other hand, I've seen quite a lot of fundamentally right but
practically broken code, so I'm not sure which is better. And, btw., I
guess this 'fundamentally broken' type of locking could be found in
the kernel too, but I'd prefer not to go looking for it now.

Thanks,
Jarek P.


Re: [RFC PATCH] LTTng instrumentation mm (using page_to_pfn)

2007-11-28 Thread Dave Hansen
On Wed, 2007-11-28 at 21:34 -0500, Mathieu Desnoyers wrote:
> Before I start digging deeper in checking whether it is already
> instrumented by the fs instrumentation (and would therefore be
> redundant), is there a particular data structure from mm/ that you
> suggest taking the swap file number and location in swap from ?

page_private() at this point stores a swp_entry_t.  There are swp_type()
and swp_offset() helpers to decode the two bits you need after you've
turned page_private() into a swp_entry_t.  See how get_swap_bio()
creates a temporary swp_entry_t from the page_private() passed into it,
then uses swp_type/offset() on it?

I don't know if there is some history behind it, but it doesn't make a
whole ton of sense to me to be passing page_private(page) into
get_swap_bio() (which happens from its only two call sites).  It just
kinda obfuscates where 'index' came from.

I think we probably could just be doing

swp_entry_t entry = { .val = page_private(page), };

in get_swap_bio() and not passing page_private().  We have the page in
there already, so we don't need to pass a derived value like
page_private().  At the least, it'll save some clutter in the function
declaration.  

Or, make a helper:

static swp_entry_t page_swp_entry(struct page *page)
{
swp_entry_t entry;
VM_BUG_ON(!PageSwapCache(page));
entry.val = page_private(page);
return entry;
}

I see at least 4 call sites that could use this.  The try_to_unmap_one()
caller would trip over the debug check, so you'd have to move the call
inside of the if(PageSwapCache(page)) statement.
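
To make the decode step concrete, here is a userspace sketch of the helper
idea. Everything named model_* is a stand-in invented for this example: the
page structure is reduced to the two fields that matter here, the assert()
plays the role of VM_BUG_ON(), and the bit layout (type in the top 5 bits,
offset in the rest) is an assumption for illustration rather than a
statement about any particular architecture.

```c
#include <assert.h>

/* Same shape as the kernel type: an opaque wrapper around one word. */
typedef struct {
	unsigned long val;
} swp_entry_t;

/* Assumed layout for this sketch: top 5 bits = type, the rest = offset. */
#define MODEL_SWAPFILES_SHIFT 5
#define MODEL_TYPE_SHIFT (sizeof(unsigned long) * 8 - MODEL_SWAPFILES_SHIFT)
#define MODEL_OFFSET_MASK ((1UL << MODEL_TYPE_SHIFT) - 1)

static swp_entry_t model_swp_entry(unsigned long type, unsigned long offset)
{
	swp_entry_t entry;

	entry.val = (type << MODEL_TYPE_SHIFT) | (offset & MODEL_OFFSET_MASK);
	return entry;
}

static unsigned int model_swp_type(swp_entry_t entry)
{
	return (unsigned int)(entry.val >> MODEL_TYPE_SHIFT);
}

static unsigned long model_swp_offset(swp_entry_t entry)
{
	return entry.val & MODEL_OFFSET_MASK;
}

/* Reduced stand-in for struct page: a swap-cache flag and the private word. */
struct model_page {
	int in_swap_cache;
	unsigned long private_word;
};

/* Wrap the private word in a swp_entry_t, refusing non-swap-cache pages. */
static swp_entry_t model_page_swp_entry(const struct model_page *page)
{
	swp_entry_t entry;

	assert(page->in_swap_cache);
	entry.val = page->private_word;
	return entry;
}
```

A caller that may see pages outside the swap cache has to test the flag
before calling the helper, which mirrors the try_to_unmap_one() remark
above.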

-- Dave



Re: nommu: Add new vmalloc_user() and remap_vmalloc_range() interfaces.

2007-11-28 Thread Greg Ungerer



Paul Mundt wrote:

This builds on top of the earlier vmalloc_32_user() work introduced by
b50731732f926d6c49fd0724616a7344c31cd5cf, as we now have places in the
nommu allmodconfig that hit up against these missing APIs.

As vmalloc_32_user() is already implemented, this is moved over to
vmalloc_user() and simply made a wrapper. As all current nommu platforms
are 32-bit addressable, there's no special casing we have to do for
ZONE_DMA and things of that nature as per GFP_VMALLOC32.

remap_vmalloc_range() needs to check VM_USERMAP in order to figure out
whether we permit the remap or not, which means that we also have to
rework the vmalloc_user() code to grovel for the VMA and set the flag.

Signed-off-by: Paul Mundt <[EMAIL PROTECTED]>


Acked-by: Greg Ungerer <[EMAIL PROTECTED]>



 mm/nommu.c |   45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index 35622c5..c4768d0 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -10,6 +10,7 @@
  *  Copyright (c) 2000-2003 David McCullough <[EMAIL PROTECTED]>
  *  Copyright (c) 2000-2001 D Jeff Dionne <[EMAIL PROTECTED]>
  *  Copyright (c) 2002  Greg Ungerer <[EMAIL PROTECTED]>
+ *  Copyright (c) 2007  Paul Mundt <[EMAIL PROTECTED]>
  */
 
 #include 

@@ -183,6 +184,26 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *vmalloc_user(unsigned long size)
+{
+	void *ret;
+
+	ret = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
+			PAGE_KERNEL);
+	if (ret) {
+		struct vm_area_struct *vma;
+
+		down_write(&current->mm->mmap_sem);
+		vma = find_vma(current->mm, (unsigned long)ret);
+		if (vma)
+			vma->vm_flags |= VM_USERMAP;
+		up_write(&current->mm->mmap_sem);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(vmalloc_user);
+
 struct page * vmalloc_to_page(void *addr)
 {
return virt_to_page(addr);
@@ -253,10 +274,17 @@ EXPORT_SYMBOL(vmalloc_32);
  *
  * The resulting memory area is 32bit addressable and zeroed so it can be
  * mapped to userspace without leaking data.
+ *
+ * VM_USERMAP is set on the corresponding VMA so that subsequent calls to
+ * remap_vmalloc_range() are permissible.
  */
 void *vmalloc_32_user(unsigned long size)
 {
-   return __vmalloc(size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
+   /*
+* We'll have to sort out the ZONE_DMA bits for 64-bit,
+* but for now this can simply use vmalloc_user() directly.
+*/
+   return vmalloc_user(size);
 }
 EXPORT_SYMBOL(vmalloc_32_user);
 
@@ -1213,6 +1241,21 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long from,
 }
 EXPORT_SYMBOL(remap_pfn_range);
 
+int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
+			unsigned long pgoff)
+{
+   unsigned int size = vma->vm_end - vma->vm_start;
+
+   if (!(vma->vm_flags & VM_USERMAP))
+   return -EINVAL;
+
+   vma->vm_start = (unsigned long)(addr + (pgoff << PAGE_SHIFT));
+   vma->vm_end = vma->vm_start + size;
+
+   return 0;
+}
+EXPORT_SYMBOL(remap_vmalloc_range);
+
 void swap_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
 }
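
The address arithmetic in remap_vmalloc_range() above can be checked with a
small userspace model. This is a sketch, not the nommu code: struct
model_vma, the flag value, the error constant and the 4 KiB page size are
all assumptions picked for the example. It shows the two behaviours the
patch description calls out: the remap is refused unless the user-map flag
is set, and otherwise the VMA is rebased onto addr plus pgoff pages while
keeping its length.

```c
/* Assumed values for this sketch only. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_VM_USERMAP 0x1UL
#define MODEL_EINVAL     22

/* Reduced stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
};

/*
 * Rebase the VMA onto addr + pgoff pages, preserving its size.
 * Returns 0 on success or -MODEL_EINVAL if the flag is missing.
 */
static int model_remap_vmalloc_range(struct model_vma *vma,
				     unsigned long addr,
				     unsigned long pgoff)
{
	unsigned long size = vma->vm_end - vma->vm_start;

	if (!(vma->vm_flags & MODEL_VM_USERMAP))
		return -MODEL_EINVAL;

	vma->vm_start = addr + (pgoff << MODEL_PAGE_SHIFT);
	vma->vm_end = vma->vm_start + size;
	return 0;
}
```

For a two-page VMA and pgoff = 2, a buffer at 0x100000 yields the range
0x102000 to 0x104000.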



--

Greg Ungerer  --  Chief Software Dude   EMAIL: [EMAIL PROTECTED]
Secure Computing CorporationPHONE:   +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com


Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 02:03:28PM -0500, Alan Stern wrote:
> On Tue, 27 Nov 2007, Greg KH wrote:
> 
> > Part of the difficulty in understanding the driver model - and the kobject
> > abstraction upon which it is built - is that there is no obvious starting
> > place. Dealing with kobjects requires understanding a few different types,
> > all of which make reference to each other. In an attempt to make things
> > easier, we'll take a multi-pass approach, starting with vague terms and
> > adding detail as we go. To that end, here are some quick definitions of
> > some terms we will be working with.
> > 
> >  - A kobject is an object of type struct kobject.  Kobjects have a name
> >and a reference count.  A kobject also has a parent pointer (allowing
> >objects to be arranged into hierarchies), a specific type, and,
> >usually, a representation in the sysfs virtual filesystem.
> 
> As Cornelia said, it would be worthwhile mentioning krefs in this
> document as well.  They are simple enough to explain, after all.

Now added, thanks.

> > Initialization of kobjects
> > 
> > Code which creates a kobject must, of course, initialize that object. Some
> > of the internal fields are setup with a (mandatory) call to kobject_init():
> 
> kobject_init() isn't mandatory if you use kobject_register().  But then 
> Kay wants to do away with kobject_register()...
> 
> > The other kobject fields which should be set, directly or indirectly, by
> > the creator are its ktype, kset, and parent. We will get to those shortly,
> > however please note that the ktype and kset must be set before the
> > kobject_init() function is called.
> 
> In fact kset, ktype, and parent are optional, right?  You might mention
> at this point that not all those fields are needed, and explain later
> which combinations are legal.

They are optional, but if you want to do anything, you need to set them :)

> > When a reference is released, the call to kobject_put() will decrement the
> > reference count and, possibly, free the object. Note that kobject_init()
> > sets the reference count to one, so the code which sets up the kobject will
> > need to do a kobject_put() eventually to release that reference.
> 
> It's worth mentioning here (and perhaps elsewhere too) that all of the
> function calls described here can sleep and hence must be made in
> process context, with the exception of the *_get() routines.  It's
> possible to call *_put() in atomic context; the SCSI core does this
> (with device_put, not kobject_put) and has to jump through hoops to run
> the corresponding release routine in a waitqueue task.  In general,
> though, it isn't safe.

Is this really needed?  If anyone calls them from non-process context,
they will get a nasty run-time warning, right?

> > Because kobjects are dynamic, they must not be declared statically or on
> > the stack, but instead, always from the heap.  Future versions of the
> > kernel will contain a run-time check for kobjects that are created
> > statically and will warn the developer of this improper usage.
> 
> Why not?  What's wrong with static kobjects?  I've never understood this.

They are reference counted.  Other portions of the kernel can grab them
and think they are safe to use.  If you do this with a static object,
what happens when the code goes away?

Most of the nasty race conditions that require this are now cleaned up
with Tejun's great sysfs work, so you will probably not see problems if
you do this, but in general, it's not a good thing to do.
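
A userspace sketch may help picture the lifetime rule being discussed. It
is a deliberately simplified model, not kobject code: the model_* names are
invented, the counter is a plain int rather than an atomic kref, and the
release hook simply frees the memory. What it illustrates is that the
creator cannot know which put will be the last one, so the object has to
live on the heap and be freed from its release function.

```c
#include <stdlib.h>

/* Simplified stand-in for a reference-counted object (not thread-safe). */
struct model_obj {
	int refcount;
	void (*release)(struct model_obj *obj);
};

/* Counts release calls so the behaviour can be observed from outside. */
static int model_release_calls;

/* Runs only when the last reference is dropped. */
static void model_obj_release(struct model_obj *obj)
{
	model_release_calls++;
	free(obj);
}

/* Allocate on the heap with one reference held by the creator. */
static struct model_obj *model_obj_create(void)
{
	struct model_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;
	obj->release = model_obj_release;
	return obj;
}

static struct model_obj *model_obj_get(struct model_obj *obj)
{
	obj->refcount++;
	return obj;
}

/* Returns 1 if this call dropped the last reference and released obj. */
static int model_obj_put(struct model_obj *obj)
{
	if (--obj->refcount == 0) {
		obj->release(obj);
		return 1;
	}
	return 0;
}
```

If the object were a static variable in a module, another holder could
still have a reference after the module text and data are gone, and there
would be nothing sensible for the release function to do.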

> > ktypes and release methods
> > 
> > One important thing still missing from the discussion is what happens to a
> > kobject when its reference count reaches zero. The code which created the
> > kobject generally does not know when that will happen; if it did, there
> > would be little point in using a kobject in the first place. Even
> > predicatable object lifecycles become more complicated when sysfs is
> 
> predictable

thanks.

> > One important point cannot be overstated: every kobject must have a
> > release() method, and the kobject must persist (in a consistent state)
> > until that method is called. If these constraints are not met, the code is
> > flawed.  Note that the kernel will warn you if you forget to provide a
> > release() method.  Do not try to get rid of this warning by providing an
> > "empty" release function; you will be mocked mercilessly by the kobject
> > maintainer if you attempt this.
> 
> Not to mention that doing this will leak memory.  Unless the kobject
> is static...

heh.

I think your other questions are already answered, right?

thanks,

greg k-h


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
Second portion. Add a new seg_offset macro to calculate the offset. This 
can be avoided if the linker relocates the per cpu area to zero. Includes 
a patch to read trickle count via both methods to verify that it actually 
works. Both patches on top of the per cpu cleanup patches that I sent 
today too.


x86_64: Make the x86_32 percpu operations usable on x86_64

Calculate the offset relative to gs in order to be able to address
per cpu data using the x86_64 per cpu macros.

Subtracting __per_cpu_start makes the offset relative to the
beginning of the per cpu area, which is where gs points.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 drivers/char/random.c|2 +-
 include/asm-x86/percpu.h |   29 ++---
 init/main.c  |5 +
 3 files changed, 24 insertions(+), 12 deletions(-)

Index: linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/percpu.h  2007-11-28 17:50:01.861182410 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/percpu.h   2007-11-28 21:22:50.845872906 -0800
@@ -16,7 +16,13 @@
 #define __my_cpu_offset read_pda(data_offset)
 
 #define per_cpu_offset(x) (__per_cpu_offset(x))
+#define __percpu_seg "%%gs:"
+/* Calculate the offset to use with the segment register */
+#define seg_offset(name)   (*SHIFT_PTR(&per_cpu_var(name), - (unsigned long)__per_cpu_start))
 
+#else
+#define __percpu_seg ""
+#define seg_offset(name)   per_cpu_var(name)
 #endif
 #include 
 
@@ -64,16 +70,11 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
  *PER_CPU(cpu_gdt_descr, %ebx)
  */
 #ifdef CONFIG_SMP
-
 #define __my_cpu_offset x86_read_percpu(this_cpu_off)
-
 /* fs segment starts at (positive) offset == __per_cpu_offset[cpu] */
 #define __percpu_seg "%%fs:"
-
 #else  /* !SMP */
-
 #define __percpu_seg ""
-
 #endif /* SMP */
 
 #include 
@@ -81,6 +82,13 @@ DECLARE_PER_CPU(struct x8664_pda, pda);
 /* We can use this directly for local CPU (faster). */
 DECLARE_PER_CPU(unsigned long, this_cpu_off);
 
+#define seg_offset(name)   per_cpu_var(name)
+
+#endif /* __ASSEMBLY__ */
+#endif /* !CONFIG_X86_64 */
+
+#ifndef __ASSEMBLY__
+
 /* For arch-specific code, we can use direct single-insn ops (they
  * don't give an lvalue though). */
 extern void __bad_percpu_size(void);
@@ -132,11 +140,10 @@ extern void __bad_percpu_size(void);
}   \
ret__; })
 
-#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
-#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
-#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
-#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
-#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)
+#define x86_read_percpu(var) percpu_from_op("mov", seg_offset(var))
+#define x86_write_percpu(var,val) percpu_to_op("mov", seg_offset(var), val)
+#define x86_add_percpu(var,val) percpu_to_op("add", seg_offset(var), val)
+#define x86_sub_percpu(var,val) percpu_to_op("sub", seg_offset(var), val)
+#define x86_or_percpu(var,val) percpu_to_op("or", seg_offset(var), val)
 #endif /* !__ASSEMBLY__ */
-#endif /* !CONFIG_X86_64 */
 #endif /* _ASM_X86_PERCPU_H_ */
Index: linux-2.6.24-rc3-mm2/drivers/char/random.c
===
--- linux-2.6.24-rc3-mm2.orig/drivers/char/random.c 2007-11-28 21:20:58.225804398 -0800
+++ linux-2.6.24-rc3-mm2/drivers/char/random.c  2007-11-28 21:28:38.967363573 -0800
@@ -272,7 +272,7 @@ static int random_write_wakeup_thresh = 
 
 static int trickle_thresh __read_mostly = INPUT_POOL_WORDS * 28;
 
-static DEFINE_PER_CPU(int, trickle_count) = 0;
+DEFINE_PER_CPU(int, trickle_count) = 55;
 
 /*
  * A pool of size .poolwords is stirred with a primitive polynomial
Index: linux-2.6.24-rc3-mm2/init/main.c
===
--- linux-2.6.24-rc3-mm2.orig/init/main.c   2007-11-28 21:10:54.245804225 -0800
+++ linux-2.6.24-rc3-mm2/init/main.c2007-11-28 21:22:17.769053628 -0800
@@ -504,6 +504,8 @@ void __init __attribute__((weak)) smp_se
 {
 }
 
+DECLARE_PER_CPU(int, trickle_count);
+
 asmlinkage void __init start_kernel(void)
 {
char * command_line;
@@ -645,6 +647,9 @@ asmlinkage void __init start_kernel(void
 
acpi_early_init(); /* before LAPIC and SMP init */
 
+   printk("Reading trickle count =%d. Is %d\n",
+   x86_read_percpu(trickle_count),
+   __raw_get_cpu_var(trickle_count));
/* Do the rest non-__init'ed, we're now alive */
rest_init();
 }
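
The offset calculation behind seg_offset() can be pictured with a userspace
model. This is a sketch under assumptions, not the x86 implementation: a
static template structure stands in for the linked per cpu section, its
address plays the role of __per_cpu_start, an array of copies stands in for
the per-CPU areas that the segment register would point at, and all model_*
names are invented. The idea shown is the one from the patch description:
subtract the start of the template from the variable's address to get an
offset, then apply that offset to the current CPU's base.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_NR_CPUS 2

/* Stand-in for the linked per cpu section; its address acts as the start. */
static struct {
	long pad;
	int trickle_count;
} percpu_template = { 0, 55 };

/* One private copy of the template per modelled CPU. */
static unsigned char percpu_copies[MODEL_NR_CPUS][sizeof(percpu_template)];

/* Give every modelled CPU its own copy of the initial values. */
static void model_setup_percpu(void)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		memcpy(percpu_copies[cpu], &percpu_template,
		       sizeof(percpu_template));
}

/* Offset of a template variable from the start of the template. */
static size_t model_seg_offset(const void *var)
{
	return (size_t)((const unsigned char *)var -
			(const unsigned char *)&percpu_template);
}

/* Read an int through base-of-cpu plus offset, like a segment access. */
static int model_read_int(int cpu, const void *var)
{
	int val;

	memcpy(&val, percpu_copies[cpu] + model_seg_offset(var), sizeof(val));
	return val;
}

/* Write an int into one CPU's copy without touching the others. */
static void model_write_int(int cpu, const void *var, int val)
{
	memcpy(percpu_copies[cpu] + model_seg_offset(var), &val, sizeof(val));
}
```

If the linker placed the per cpu section at address zero, the variable's
address would already be the offset and the subtraction would disappear,
which is the alternative mentioned at the top of this mail.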



Re: [RFC] Sample kset/ktype/kobject implementation

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 05:35:32PM +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:04:06 -0800,
> Greg KH <[EMAIL PROTECTED]> wrote:
> 
> > static struct foo_obj *create_foo_obj(const char *name)
> > {
> > struct foo_obj *foo;
> > int retval;
> > 
> > /* allocate the memory for the whole object */
> > foo = kzalloc(sizeof(*foo), GFP_KERNEL);
> > if (!foo)
> > return NULL;
> > 
> > /* initialize the kobject portion of the object properly */
> > kobject_set_name(&foo->kobj, "%s", name);
> 
> Returncode not checked :)

good catch.  Hm, I don't think anyone checks that function :)

> > foo->kobj.kset = example_kset;
> > foo->kobj.ktype = &foo_ktype;
> > 
> > /*
> >  * Register the kobject with the kernel, all the default files will
> >  * be created here and the uevent will be sent out.  If we were to
> >  * call kobject_init() and then kobject_add() we would be
> >  * responsible for sending out the initial KOBJ_ADD uevent.
> >  */
> > retval = kobject_register(&foo->kobj);
> > if (retval) {
> > kfree(foo);
> 
> kobject_put(foo) is needed since it gets you through kobject_cleanup()
> where the name can be freed.

No, kobject_register() should have handled that for us, right?

thanks,

greg k-h


Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 12:45:45PM +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:02:52 -0800,
> Greg KH <[EMAIL PROTECTED]> wrote:
> >  - A kset can provide a set of default attributes that all kobjects that
> >belong to it automatically inherit and have created whenever a kobject
> >is registered belonging to the kset.
> 
> Hm, the default attributes are provided by the ktype?

Yes, now fixed.

> > The uevent function will be called when the uevent is about to be sent to
> > userspace to allow more environment variables to be added to the uevent.
> 
> It may be helpful to mention which uevents are by default created by
> the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).

Is this really needed?

> >  - refcount is the kobject's reference count; it is initialized by 
> > kobject_init()
> 
> There is no field called "refcount"; the embedded struct kref kref is
> initialized by kobject_init().

now removed, thanks.

> > Often, much of the initialization of a kobject is handled by the layer that
> > manages the containing kset.  See the sample/kobject/kset-example.c for how
> > this is usually handled.
> 
> Do we also want to mention kobject_rename() and kobject_move(), or are
> those functions so esoteric that most people don't want to know about
> them?

They can be found in the kerneldoc api reference if they are needed :)

thanks,

greg k-h


Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 06:00:27PM +0100, Kay Sievers wrote:
> On Wed, 2007-11-28 at 17:51 +0100, Cornelia Huck wrote:
> > On Wed, 28 Nov 2007 17:36:29 +0100,
> > Kay Sievers <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > On Wed, 2007-11-28 at 17:12 +0100, Cornelia Huck wrote:
> > > > On Wed, 28 Nov 2007 16:57:48 +0100,
> > > > Kay Sievers <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > On Wed, 2007-11-28 at 16:48 +0100, Cornelia Huck wrote:
> > > > > > On Wed, 28 Nov 2007 13:23:02 +0100,
> > > > > > Kay Sievers <[EMAIL PROTECTED]> wrote:
> > > > > > > On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > > > > > > > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[EMAIL PROTECTED]> wrote:
> > > > > 
> > > > > > > > > The uevent function will be called when the uevent is about to be sent to
> > > > > > > > > userspace to allow more environment variables to be added to the uevent.
> > > > > > > > 
> > > > > > > > It may be helpful to mention which uevents are by default created by
> > > > > > > > the kobject core (KOBJ_ADD, KOBJ_DEL, KOBJ_MOVE).
> > > > > > > 
> > > > > > > I think, we should remove all these default events from the kobject
> > > > > > > core. We will not be able to manage the timing issues and "raw" kobject
> > > > > > > users should request the events on their own, when they are finished
> > > > > > > adding stuff to the kobject. I see currently no way to solve the
> > > > > > > "attributes created after the event" problem. The new
> > > > > > > *_create_and_register functions do not allow default attributes to be
> > > > > > > created, which will just lead to serious trouble when someone wants to
> > > > > > > use udev to set defaults and such things. We may just want to require an
> > > > > > > explicit call to send the event?
> > > > > > 
> > > > > > There will always be attributes that will show up later (for example,
> > > > > > after a device is activated). Probably the best approach is to keep the
> > > > > > default uevents, but have the attribute-adder send another uevent when
> > > > > > they are done?
> > > > > 
> > > > > Uh, that's more an exception where we can't give guarantees because of
> > > > > very specific hardware setups, and it would be an additional "change"
> > > > > event. There are valid cases for this, but only a _very_ few.
> > > > > 
> > > > > There is absolutely no reason not to do it right with the "add" event,
> > > > > just because we are too lazy to solve it properly in the current code.
> > > > > What we are doing today is just so broken by design. :)
> > > > 
> > > > I'm worrying a bit about changes that impact the whole code tree in
> > > > lots of places. I'd be fine with the device layer doing its uevent
> > > > manually in device_add() at the very end, though. (This would allow
> > > > drivers to add attributes in their probe function before the uevent,
> > > > for example.)
> > 
> > 
> 
> I think I still remember what I did 2.5 years ago :)
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e57cd73e2e844a3da25cc6b420674c81bbe1b387
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=18c3d5271b472c096adfc856e107c79f6fd30d7d
> 
>  
> > > The driver core does use the split already in most places, I did that
> > > long ago. There are not too many (~20) users of kobject_register(), and
> > > it's a pretty straight-forward change to change that to _init, _add,
> > > _uevent, and get rid of that totally useless "convenience api".
> > > 
> > > I think there is no longer any excuse to keep that broken code around,
> > > and even require to document that it's broken. The whole purpose of the
> > > uevent is userspace consumption, which just doesn't work correctly with
> > > the code we offer. The fix is trivial, and should be done now, and we no
> > > longer need to fiddle around timing issues, just because we are too
> > > lazy.
> > > 
> > > > I propose the removal of _all_ functions that have *register* in their
> > > name, and always require the following sequence:
> > >   _init()
> > >   _add()
> > >   _uevent(_ADD)
> > > 
> > >   _uevent(_REMOVE)
> > >   _del()
> > >   _put()
> > > 
> > > > The _create_and_register() functions would become _create_and_add()
> > > and will need an additional _uevent() call after they populated the
> > > object.
> > 
> > I'm absolutely fine with doing that at the kobject level (after all,
> > > it's a quite contained change, and the uevent function explicitly
> > works on a kobject).
> > 
> > For the other _register()/_unregister() functions, it's a different
> > piece of cake. They are:
> > - distributed through lot of different code
> > - at a higher level than kobjects, and kobject_uevent() acts on the
> > kobject
> > - usually encapsulating a sequence that wants to be used by 

Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 01:23:02PM +0100, Kay Sievers wrote:
> On Wed, 2007-11-28 at 12:45 +0100, Cornelia Huck wrote:
> > On Tue, 27 Nov 2007 15:02:52 -0800, Greg KH <[EMAIL PROTECTED]> wrote:
> 
> > > A kset serves these functions:
> > > 
> > >  - It serves as a bag containing a group of objects. A kset can be used by
> > >the kernel to track "all block devices" or "all PCI device drivers."
> > > 
> > >  - A kset is also a subdirectory in sysfs, where the associated kobjects
> > >with the kset can show up.  
> > 
> > Perhaps better wording:
> > 
> > A kset is also represented via a subdirectory in sysfs, under which the
> > kobjects associated with the kset can show up.
> 
> This draws a misleading picture. A member of a kset shows up where the
> "parent" pointer points to. Like /sys/block is a kset, the kset contains
> disks and partitions, but partitions do not live at the kset, and tons
> of other kset directories where this is the case.
> 
> "If the kobject belonging to a kset has no parent kobject set, it will
> be added to the kset's directory. Not all members of a kset do
> necessarily live in the kset directory. If an explicit parent kobject is
> assigned before the kobject is added, the kobject is registered with the
> kset, but added below the parent kobject."

Nice, thanks, I've added this :)
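
A rough user-space model of the rule Kay describes, for illustration only.
The struct layouts and the helper names (toy_kobj, toy_kset, toy_sysfs_dir)
are made up for this sketch and are not the kernel's definitions; the point
is just that kset membership and sysfs placement are decided separately.

```c
#include <stddef.h>

/* Toy stand-ins for struct kobject / struct kset; NOT the kernel's types. */
struct toy_kset;

struct toy_kobj {
	const char *name;
	struct toy_kobj *parent;   /* explicit parent, or NULL */
	struct toy_kset *kset;     /* kset this object is a member of, or NULL */
};

struct toy_kset {
	struct toy_kobj kobj;      /* the kset's own directory */
};

/*
 * Return the object whose directory will contain kobj:
 * an explicit parent wins; otherwise fall back to the kset's directory.
 */
static struct toy_kobj *toy_sysfs_dir(struct toy_kobj *kobj)
{
	if (kobj->parent)
		return kobj->parent;
	if (kobj->kset)
		return &kobj->kset->kobj;
	return NULL;               /* no parent and no kset: top level */
}
```

With a "block" kset, a disk with no parent lands in the kset's directory,
while a partition whose parent is the disk lands below the disk, even though
both are members of the same kset.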

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
Here is the first of two patches for x86_64 that move the pda into the per 
cpu area and then make the x86 percpu macros work for x86_64. This needs 
to be generalized for other arches. The __per_cpu_start offsets can be 
taken care of by the linker. We can also tell the linker to completely 
relocate the percpu area to 0.
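
To make the addressing idea concrete, here is a small user-space model, not
kernel code. The template stands in for the linker's per cpu section (its
start plays the role of __per_cpu_start), each simulated CPU gets a copy, and
a variable is reached by adding its offset within the section to that CPU's
base, which is what a gs-relative access would do in hardware. All names here
(toy_template, toy_area, toy_percpu_ptr) are invented for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NR_CPUS 4

/* Layout of the "per cpu section"; the pda sits first, as in the patch. */
struct toy_percpu {
	int pda_cpunumber;   /* stand-in for the pda at offset 0 */
	long counter;        /* some other per cpu variable */
};

/* Initial image, playing the role of __per_cpu_start..__per_cpu_end. */
static const struct toy_percpu toy_template = { -1, 100 };

/* One private copy per simulated CPU. */
static struct toy_percpu toy_area[TOY_NR_CPUS];

/* Copy the template into every CPU's area, then fix up the "pda". */
static void toy_setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		memcpy(&toy_area[cpu], &toy_template, sizeof(toy_template));
		toy_area[cpu].pda_cpunumber = cpu;
	}
}

/*
 * Address of a per cpu variable for a given CPU:
 * base of that CPU's area + offset of the variable within the section.
 */
static void *toy_percpu_ptr(int cpu, size_t offset)
{
	return (char *)&toy_area[cpu] + offset;
}
```

Because the offset is small and the base is implicit in the segment register,
the real thing can be a single instruction; the model only shows the
arithmetic.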



X86_64: Declare pda as per cpu data thereby moving it into the cpu area

Declare the pda as a per cpu variable. This will have the effect of moving
the pda data into the cpu area managed by cpu alloc.

The boot_pdas are only needed in head64.c so move the declaration
over there and make it static.

Remove the code that allocates special pda data structures.

The pda is moved to the beginning of the per cpu area. gs is pointing to the
pda. And therefore gs: is now pointing to the per cpu area of the current
processor. A per cpu variable can then be reached at

%gs:[_cpu_ - __per_cpu_start]

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86/kernel/head64.c  |6 ++
 arch/x86/kernel/setup64.c |   13 ++---
 arch/x86/kernel/smpboot_64.c  |   16 
 include/asm-generic/vmlinux.lds.h |1 +
 include/asm-x86/pda.h |1 -
 include/linux/percpu.h|4 
 6 files changed, 21 insertions(+), 20 deletions(-)

Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/setup64.c 2007-11-28 20:59:13.124188194 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/setup64.c  2007-11-28 21:08:50.473347382 -0800
@@ -30,7 +30,9 @@ cpumask_t cpu_initialized __cpuinitdata 
 
 struct x8664_pda *_cpu_pda[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(_cpu_pda);
-struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
+DEFINE_PER_CPU_FIRST(struct x8664_pda, pda);
+EXPORT_PER_CPU_SYMBOL(pda);
 
 struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 
@@ -109,10 +111,15 @@ void __init setup_per_cpu_areas(void)
}
if (!ptr)
panic("Cannot allocate cpu data for CPU %d\n", i);
-   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+   /* Relocate the pda */
+   memcpy(ptr, cpu_pda(i), sizeof(struct x8664_pda));
+   cpu_pda(i) = (struct x8664_pda *)ptr;
+   cpu_pda(i)->data_offset = ptr - __per_cpu_start;
}
-} 
+   /* Fix up pda for this processor  */
+   pda_init(0);
+}
 
 void pda_init(int cpu)
 { 
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/smpboot_64.c  2007-11-28 20:59:13.136188167 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/smpboot_64.c   2007-11-28 20:59:35.399937395 -0800
@@ -556,22 +556,6 @@ static int __cpuinit do_boot_cpu(int cpu
return -1;
}
 
-   /* Allocate node local memory for AP pdas */
-   if (cpu_pda(cpu) == _cpu_pda[cpu]) {
-   struct x8664_pda *newpda, *pda;
-   int node = cpu_to_node(cpu);
-   pda = cpu_pda(cpu);
-   newpda = kmalloc_node(sizeof (struct x8664_pda), GFP_ATOMIC,
- node);
-   if (newpda) {
-   memcpy(newpda, pda, sizeof (struct x8664_pda));
-   cpu_pda(cpu) = newpda;
-   } else
-   printk(KERN_ERR
-   "Could not allocate node local PDA for CPU %d on node %d\n",
-   cpu, node);
-   }
-
alternatives_smp_switch(1);
 
c_idle.idle = get_idle_for_cpu(cpu);
Index: linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c
===
--- linux-2.6.24-rc3-mm2.orig/arch/x86/kernel/head64.c  2007-11-28 20:59:13.152187359 -0800
+++ linux-2.6.24-rc3-mm2/arch/x86/kernel/head64.c   2007-11-28 20:59:35.403937534 -0800
@@ -22,6 +22,12 @@
 #include 
 #include 
 
+/*
+ * Only used before the per cpu areas are setup. The use for the non possible
+ * cpus continues after boot
+ */
+static struct x8664_pda boot_cpu_pda[NR_CPUS] __cacheline_aligned;
+
 static void __init zap_identity_mappings(void)
 {
pgd_t *pgd = pgd_offset_k(0UL);
Index: linux-2.6.24-rc3-mm2/include/asm-x86/pda.h
===
--- linux-2.6.24-rc3-mm2.orig/include/asm-x86/pda.h 2007-11-28 20:59:13.164187921 -0800
+++ linux-2.6.24-rc3-mm2/include/asm-x86/pda.h  2007-11-28 20:59:35.403937534 -0800
@@ -39,7 +39,6 @@ struct x8664_pda {
 } cacheline_aligned_in_smp;
 
 extern struct x8664_pda *_cpu_pda[];
-extern struct x8664_pda boot_cpu_pda[];
 extern void pda_init(int);
 
 #define cpu_pda(i) 

Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 10:01:08AM +0100, Cornelia Huck wrote:
> On Tue, 27 Nov 2007 15:02:52 -0800,
> Greg KH <[EMAIL PROTECTED]> wrote:
> > So, for example, UIO code has a structure that defines the memory region
> > associated with a uio device:
> > 
> > struct uio_mem {
> > struct kobject kobj;
> > unsigned long addr;
> > unsigned long size;
> > int memtype;
> > void __iomem *internal_addr;
> > };
> > 
> > If you have a struct uio_mem structure, finding its embedded kobject is
> > just a matter of using the kobj pointer.
> 
> Pointer may be a confusing term, how about "structure member"?

thanks, now fixed.

> > Code that works with kobjects will often
> > have the opposite problem, however: given a struct kobject pointer, what is
> > the pointer to the containing structure?  You must avoid tricks (such as
> > assuming that the kobject is at the beginning of the structure) and,
> > instead, use the container_of() macro, found in :
> > 
> > container_of(pointer, type, member)
> > 
> > where pointer is the pointer to the embedded kobject, type is the type of
> > the containing structure, and member is the name of the structure field to
> > which pointer points.  The return value from container_of() is a pointer to
> > the given type. So, for example, a pointer to a struct kobject embedded
> > within a struct cdev called "kp" could be converted to a pointer to the
> 
> "struct uio_mem", I guess.

yes, now fixed.
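
For readers who want to try the back-cast outside the kernel, here is a
user-space sketch. The container_of() definition below is a simplified
offsetof-based version written for this example (the kernel's macro adds a
type check), and the toy struct kobject and trimmed struct uio_mem are
stand-ins, not the real definitions. The kobject is deliberately placed in
the middle of the structure to show why assuming it comes first is a trick to
avoid.

```c
#include <stddef.h>

/* Simplified container_of for illustration; the kernel version also type-checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy stand-in for the kernel's struct kobject. */
struct kobject {
	const char *name;
};

/* Trimmed copy of the example structure; kobj is deliberately not first. */
struct uio_mem {
	unsigned long addr;
	unsigned long size;
	struct kobject kobj;
	int memtype;
};

/* The "back-casting" helper programmers often wrap around container_of(). */
static struct uio_mem *to_uio_mem(struct kobject *kp)
{
	return container_of(kp, struct uio_mem, kobj);
}
```

Given a struct kobject pointer kp that points at the kobj member of some
struct uio_mem, to_uio_mem(kp) returns the address of that enclosing
structure regardless of where the member sits.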

> > containing structure with:
> > 
> > struct uio_mem *u_mem = container_of(kp, struct uio_mem, kobj);
> > 
> > Programmers will often define a simple macro for "back-casting" kobject
> > pointers to the containing type.
> > 
> > 
> > Initialization of kobjects
> > 
> > Code which creates a kobject must, of course, initialize that object. Some
> > of the internal fields are setup with a (mandatory) call to kobject_init():
> > 
> > void kobject_init(struct kobject *kobj);
> > 
> > Among other things, kobject_init() sets the kobject's reference count to
> > one.  Calling kobject_init() is not sufficient, however. Kobject users
> > must, at a minimum, set the name of the kobject; this is the name that will
> > be used in sysfs entries. 
> 
> Unless they don't register their kobject. (But they should always set a
> name anyway to avoid funny debug messages, so it is probably a good
> idea to call this a "must").

Yeah, I'll leave this in.

> > To set the name of a kobject properly, do not
> > attempt to manipulate the internal name field, but instead use:
> > 
> > int kobject_set_name(struct kobject *kobj, const char *format, ...);
> > 
> > This function takes a printk-style variable argument list. Believe it or
> > not, it is actually possible for this operation to fail; conscientious code
> > should check the return value and react accordingly.
> > 
> > The other kobject fields which should be set, directly or indirectly, by
> > the creator are its ktype, kset, and parent. We will get to those shortly,
> > however please note that the ktype and kset must be set before the
> > kobject_init() function is called.
> > 
> > 
> > 
> > Reference counts
> > 
> > One of the key functions of a kobject is to serve as a reference counter
> > for the object in which it is embedded. 
> 
> Hm, I thought that was the purpose of struct kref?

Yes, I'll add a reference to kref now.

> > As long as references to the object
> > exist, the object (and the code which supports it) must continue to exist.
> > The low-level functions for manipulating a kobject's reference counts are:
> > 
> > struct kobject *kobject_get(struct kobject *kobj);
> > void kobject_put(struct kobject *kobj);
> > 
> > A successful call to kobject_get() will increment the kobject's reference
> > counter and return the pointer to the kobject. If, however, the kobject is
> > already in the process of being destroyed, the operation will fail and
> > kobject_get() will return NULL. 
> 
> Eh, no. We'll always return !NULL if the kobject is !NULL to start
> with. If the reference count is already 0, the code will moan, but the
> caller will still get a pointer.

Good point, this was the way things used to work a long time ago, I'll remove
this.
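
The behaviour Cornelia describes can be modelled in a few lines of user-space
C. This is a toy, not the kobject implementation: toy_obj, toy_get() and
toy_put() are invented names, and the real code uses struct kref with atomic
operations. It shows that get hands back the same pointer whenever the
argument is non-NULL, and that the final put calls the release function.

```c
#include <stddef.h>

struct toy_obj {
	int refcount;
	void (*release)(struct toy_obj *obj);  /* called when the count hits zero */
};

/* Like kobject_init(): start with one reference held by the creator. */
static void toy_init(struct toy_obj *obj, void (*release)(struct toy_obj *))
{
	obj->refcount = 1;
	obj->release = release;
}

/* Take a reference; returns NULL only if obj itself is NULL. */
static struct toy_obj *toy_get(struct toy_obj *obj)
{
	if (obj)
		obj->refcount++;
	return obj;
}

/* Drop a reference; the last put invokes release(). */
static void toy_put(struct toy_obj *obj)
{
	if (obj && --obj->refcount == 0 && obj->release)
		obj->release(obj);
}

/* Example release hook that just records that it ran. */
static int toy_released;

static void toy_note_release(struct toy_obj *obj)
{
	(void)obj;
	toy_released = 1;
}
```

Note that the creator's initial reference counts like any other: after one
toy_get(), two toy_put() calls are needed before the release hook runs.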

> > This return value must always be tested, or
> > no end of unpleasant race conditions could result.
> > 
> > When a reference is released, the call to kobject_put() will decrement the
> > reference count and, possibly, free the object. Note that kobject_init()
> > sets the reference count to one, so the code which sets up the kobject will
> > need to do a kobject_put() eventually to release that reference.
> > 
> > Because kobjects are dynamic, they must not be declared statically or on
> > the stack, but instead, always from the heap.  Future versions of the
> > kernel will contain a run-time check for kobjects that are created
> > statically and will warn the developer of this improper usage.
> > 
> > 
> > Hooking into sysfs
> > 
> > An initialized 

Re: [RFC] New kobject/kset/ktype documentation and example code

2007-11-28 Thread Greg KH
On Tue, Nov 27, 2007 at 08:50:14PM -0700, Jonathan Corbet wrote:
> Greg KH <[EMAIL PROTECTED]> wrote:
> 
> > Jonathan, I used your old lwn.net article about kobjects as the basis
> > for this document, I hope you don't mind
> 
> Certainly I have no objections, I'm glad it was useful.

Thanks, it was a great framework to work with.

> > It is rare (even unknown) for kernel code to create a standalone kobject;
> > with one major exception explained below.
> 
> You don't keep this promise - bet you thought we wouldn't notice...
> Actually I guess you do, in the "creating simple kobjects" section.
> When you get to that point, you should mention that this is a situation
> where standalone kobjects make sense.

Sorry, yes, that is where I tried to explain it.  I'll flesh it out some
more.

> Given that there are quite a few standalone kobjects created by this
> patch set (kernel_kobj, security_kobj, s390_kobj, etc.), the "(even
> unknown)" should probably come out.

Ok.

> > So, for example, UIO code has a structure that defines the memory region
> > associated with a uio device:
> 
> *The* UIO code, presumably.

fixed.

> > the given type. So, for example, a pointer to a struct kobject embedded
> > within a struct cdev called "kp" could be converted to a pointer to the
> > containing structure with:
> 
> That should be "struct uio_mem", I think.

fixed.

> > one.  Calling kobject_init() is not sufficient, however. Kobject users
> > must, at a minimum, set the name of the kobject; this is the name that will
> > be used in sysfs entries.
> 
> Is setting the name mandatory now, or are there still places where
> kobjects (which do not appear in sysfs) do have - and do not need - a
> name?

Any kobject that is registered needs to have a name.  If someone tries
to call kobject_register() or kobject_add() without a name set they will
find out that it is not allowed :)

And yes, there are a few places in the kernel with kobjects that are
never registered.  I'm working on trying to get rid of them...

> > Because kobjects are dynamic, they must not be declared statically or on
> > the stack, but instead, always from the heap.  Future versions of the
> 
> "always be allocated from the heap"?

thanks.

> > "empty" release function, you will be mocked merciously by the kobject
> > maintainer if you attempt this.
> 
> So just how severely should we mock kobject maintainers who can't
> spell "mercilessly"?  :)

Heh, turns out that a lot of people sent me this privately :)

> >  - A kset can provide a set of default attributes that all kobjects that
> >belong to it automatically inherit and have created whenever a kobject
> >is registered belonging to the kset.
> 
> Can we try that one again?
> 
>  - A kset can provide a set of default attributes for all kobjects which
>belong to it.

No, it's the ktype that does this, I'll go fix that up...

> > There is currently
> > no other way to add a kobject to a kset without directly messing with the
> > list pointers.
> 
> Presumably the latter way is not recommended; I would either say so or
> not mention this possibility at all.

Ah, yes, now removed.

Thanks for the review, I really appreciate it.

greg k-h


Re: [PATCH] PPC: CELLEB - fix potential NULL pointer dereference

2007-11-28 Thread Cyrill Gorcunov
On 11/29/07, Ishizaki Kou <[EMAIL PROTECTED]> wrote:
[...snip...]
>
> There is no problem to use Michael's part, and I also prefer simple
> one like this.
>
> Cyrill, would you please update your patch?
>
> Best regards,
> Kou Ishizaki
>

Please see the updated patch enclosed. (I can't do it inline because I'm
at work right now, where I have no Linux machine.)

Cyrill
---
From: Cyrill Gorcunov <[EMAIL PROTECTED]>
Subject: [PATCH] PPC: CELLEB - fix possible NULL pointer dereference

This patch adds checking for NULL returned value to
prevent possible NULL pointer dereference.

Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>
---

 arch/powerpc/platforms/celleb/pci.c |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/celleb/pci.c b/arch/powerpc/platforms/celleb/pci.c
index 6bc32fd..13ec4a6 100644
--- a/arch/powerpc/platforms/celleb/pci.c
+++ b/arch/powerpc/platforms/celleb/pci.c
@@ -138,8 +138,6 @@ static void celleb_config_read_fake(unsigned char *config, int where,
*val = celleb_fake_config_readl(p);
break;
}
-
-   return;
 }
 
 static void celleb_config_write_fake(unsigned char *config, int where,
@@ -158,7 +156,6 @@ static void celleb_config_write_fake(unsigned char *config, int where,
celleb_fake_config_writel(val, p);
break;
}
-   return;
 }
 
 static int celleb_fake_pci_read_config(struct pci_bus *bus,
@@ -351,6 +348,10 @@ static int __init celleb_setup_fake_pci_device(struct device_node *node,
wi1 = of_get_property(node, "vendor-id", NULL);
wi2 = of_get_property(node, "class-code", NULL);
wi3 = of_get_property(node, "revision-id", NULL);
+   if (!wi0 || !wi1 || !wi2 || !wi3) {
+   printk(KERN_ERR "PCI: Missing device tree properties.\n");
+   goto error;
+   }
 
celleb_config_write_fake(*config, PCI_DEVICE_ID, 2, wi0[0] & 0x);
celleb_config_write_fake(*config, PCI_VENDOR_ID, 2, wi1[0] & 0x);
@@ -372,6 +373,10 @@ static int __init celleb_setup_fake_pci_device(struct device_node *node,
celleb_setup_pci_base_addrs(hose, devno, fn, num_base_addr);
 
li = of_get_property(node, "interrupts", );
+   if (!li) {
+   printk(KERN_ERR "PCI: interrupts not found.\n");
+   goto error;
+   }
val = li[0];
celleb_config_write_fake(*config, PCI_INTERRUPT_PIN, 1, 1);
celleb_config_write_fake(*config, PCI_INTERRUPT_LINE, 1, val);


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> x86_64 can use a 32 bit offset instead of a 64 bit address because it uses 
> the small model. A load of a 64 bit address would require much more 
> expensive instructions. A load of a 64 bit address is currently avoided 
> through the use of the pda that contains the full 64 bit address in the
> data_offset field. Operations on per cpu data on x86_64 must therefore 
> first load data_offset via gs and then add the per cpu address to this
> offset. Then the per cpu operation is performed on that address.
>   

Hm.  Certainly a non-one-instruction access would be considerably less
useful than one that is, because of preemption issues.

(In general you need to pin yourself to a cpu if you're using percpu
data, but sometimes it doesn't matter.  In particular, the reason I'm
interested in this at all is because Xen puts its interrupt mask flag in
per-cpu data, and a single instruction means that masking interrupts
[=disable preemption] can be done in one instruction with no scope for
preemption in the middle doing something unexpected.)

> In order to avoid this situation through one instruction we need a small 
> 32 bit offset relative to gs. Otherwise we cannot get away from the PDA 
> and the use of data_offset.
>   

Hm, yes, I see.  Dratted large address space.  What's wrong with 4G
anyway? ;)

Anyway, I can see the problem with my thinking about this so far.

J


Re: git guidance

2007-11-28 Thread Al Boldi
Jakub Narebski wrote:
> Al Boldi wrote:
> > Johannes Schindelin wrote:
> >> By that definition, no SCM, not even CVS, is transparent.  Nothing
> >> short of unpacked directories of all versions (wasting a lot of disk
> >> space) would.
> >
> > Who said anything about unpacking?
> >
> > I'm talking about GIT transparently serving a Virtual Version Control
> > dir to be mounted on the client.
>
> Are you talking about something like (in alpha IIRC) gitfs?
>
>   http://www.sfgoth.com/~mitch/linux/gitfs/

This looks like a good start.

> Besides, you can always use "git show :". For example
> gitweb (and I think other web interfaces) can show any version of a file
> or a directory, accessing only repository.

Sure, browsing is the easy part, but Version Control starts when things 
become writable.


Thanks for the link!

--
Al



Re: 2.6.24-rc3-mm2 (bugfix for memory cgroup per-zone-struct allocation.)

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 12:23:29 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> I noticed CONFIG_NUMA + CONFIG_CGROUP_MEM_CONT + CONFIG_SLUB cannot boot 
> because of my patch.
> (SLAB is ok.)
> I'll post workaround soon.
> 
==
This is a fix. Tested on my ia64/NUMA box with both SLAB and SLUB.
This patch fixes the case where kmalloc_node() is called against a node
without memory.

It would be better to add a memory hotplug callback to support possible nodes
(memory hotplug), but for now this just uses kmalloc().

Should be revisited later.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

 mm/memcontrol.c |   14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

Index: linux-2.6.24-rc3-mm2/mm/memcontrol.c
===
--- linux-2.6.24-rc3-mm2.orig/mm/memcontrol.c
+++ linux-2.6.24-rc3-mm2/mm/memcontrol.c
@@ -1117,8 +1117,18 @@ static int alloc_mem_cgroup_per_zone_inf
struct mem_cgroup_per_node *pn;
struct mem_cgroup_per_zone *mz;
int zone;
-
-   pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+   /*
+* This routine is called against possible nodes.
+* But it's BUG to call kmalloc() against offline node.
+*
+* TODO: this routine can waste much memory for nodes which will
+*   never be onlined. It's better to use memory hotplug callback
+*   function.
+*/
+   if (node_state(node, N_HIGH_MEMORY))
+   pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+   else
+   pn = kmalloc(sizeof(*pn), GFP_KERNEL);
if (!pn)
return 1;
 



Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)

2007-11-28 Thread Richard Knutsson

Vegard Nossum wrote:
> Hi,
>
> On Nov 28, 2007 7:51 AM, Richard Knutsson <[EMAIL PROTECTED]> wrote:
> > Vegard Nossum wrote:
> > > +static int
> >
> > Not 'static bool'?
> >
> > > +page_is_tracked(struct page *page)
> >
> > Why not returning 'false' and 'true'?
>
> Sorry, I am not used to using bool in C :-) I will change this if bool
> is preferred in kernel code.

Well, why not use them since we have them (C99 standard and over a year
in the kernel). ;)
What is "preferred" in a group of a few thousand is hard to say, but I
believe it is the way to go. The only "resistance" to it I know of is "it
is not a C idiom". A quite illogical statement, at best. However, the
0/1 vs false/true is just a preference. (I like false/true, since I also
say "true AND false = false" for example... (NOT true = false makes
sense to me; NOT 1 = 0 seems strange: why can't it be 2, or -1 ;) ))

> > > +static unsigned int
> > > +opcode_get_size(const uint8_t *opcode)
> >
> > Are we not using 'u8' in the kernel?
>
> Actually, I don't see any reason to use u8 when uint8_t is already
> standard and used in other places in the kernel.

I believe I have heard they can be a problem in some situations. It also
has the benefit of making the kernel code uniform.


cu
Richard Knutsson
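
As a small illustration of the style being suggested, here is a predicate
written with C99 bool. The struct and the flag value are made up for the
example (this is not kmemcheck's page test or the kernel's struct page); only
the int-versus-bool point carries over.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_TRACKED 0x1u   /* hypothetical flag bit */

/* Toy page descriptor, invented for this example. */
struct toy_page {
	uint8_t flags;
};

/* Returns true or false instead of 1 or 0. */
static bool toy_page_is_tracked(const struct toy_page *page)
{
	return (page->flags & TOY_PAGE_TRACKED) != 0;
}
```

The explicit comparison makes the conversion to bool obvious to the reader,
although assigning any non-zero value to a bool would also yield true.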



Re: [PATCH] base/class.c: prevent ooops due to insert/remove race

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 11:00:36PM -0500, Mark Lord wrote:
> While doing insert/remove (quickly) tests on USB, I managed to trigger
> an Oops on 2.6.23.1 on the call to strlen() in make_class_name().
>
> This patch prevents this oops.
>
> There is still the larger problem of the overall race
> that caused this in the first place, but much of the rest
> of the code in class.c appears to also do NULL checks to
> avoid Oops'ing, so this continues the tradition.
>
> Signed-off-by:  Mark Lord <[EMAIL PROTECTED]>

As this is a bandage over the real problem, I'd prefer to not apply this
one right now until we find the root cause.

thanks,

greg k-h
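
For context, the kind of guard in Mark's patch looks like this in a
user-space sketch. toy_make_class_name() is a stand-in written for
illustration with malloc() instead of kmalloc(); the "name:object" output
format and the extra NULL check on the second argument are assumptions of
this example, not taken from class.c. A check like this only avoids the
crash; it does nothing about the race that produced the NULL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<name>:<obj_name>" in freshly allocated memory.
 * Returns NULL on a missing input or allocation failure; caller frees.
 */
static char *toy_make_class_name(const char *name, const char *obj_name)
{
	char *class_name;
	size_t size;

	if (!name || !obj_name)        /* the guard: never strlen(NULL) */
		return NULL;

	size = strlen(name) + strlen(obj_name) + 2;   /* ':' and '\0' */
	class_name = malloc(size);
	if (!class_name)
		return NULL;

	snprintf(class_name, size, "%s:%s", name, obj_name);
	return class_name;
}
```

Callers then have to cope with a NULL return, which is where a band-aid like
this tends to push the problem rather than solve it.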


Re: [PATCH -mm] printk trivial optimizations fix

2007-11-28 Thread Denys Vlasenko
On Wednesday 28 November 2007 11:02, Hugh Dickins wrote:
> mm's printk has been showing "%p" in abominable upper case recently:
> its trivial optimizations have changed the default from lower to upper,
> so the 'p' case needs to enforce lower explicitly.
>
> Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
> ---
>
>  lib/vsprintf.c |1 +
>  1 file changed, 1 insertion(+)
>
> --- 2.6.24-rc3-mm2/lib/vsprintf.c 2007-11-28 12:42:26.0 +
> +++ linux/lib/vsprintf.c  2007-11-28 17:01:20.0 +
> @@ -525,6 +525,7 @@ int vsnprintf(char *buf, size_t size, co
>   continue;
>
>   case 'p':
> + flags |= SMALL;
>   if (field_width == -1) {
>   field_width = 2*sizeof(void *);
>   flags |= ZEROPAD;

Thanks Hugh for catching this. My fault :(
--
vda
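
A stripped-down user-space sketch of the idea behind the fix: the digit table
is picked by a flag, and the pointer case sets that flag before formatting.
TOY_SMALL and toy_hex() are invented for this example and only mimic the role
of the SMALL flag in lib/vsprintf.c; the real number formatting code handles
far more.

```c
#include <stddef.h>

#define TOY_SMALL 0x20   /* hypothetical flag: use lower-case hex digits */

/*
 * Write value as zero-padded hex of exactly `width` digits into buf
 * (buf must hold width + 1 bytes). Upper case unless TOY_SMALL is set.
 */
static void toy_hex(char *buf, size_t width, unsigned long value, int flags)
{
	const char *digits = (flags & TOY_SMALL) ? "0123456789abcdef"
						 : "0123456789ABCDEF";

	buf[width] = '\0';
	while (width > 0) {
		buf[--width] = digits[value & 0xf];
		value >>= 4;
	}
}
```

If the default flips from lower to upper case, every caller that relied on
the old default needs to pass the flag explicitly, which is what the one-line
patch does for 'p'.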


Re: [PATCH] kconfig: Make KCONFIG_ALLCONFIG work with randconfig.

2007-11-28 Thread Paul Mundt
On Wed, Nov 28, 2007 at 06:08:16PM +0100, Roman Zippel wrote:
> On Wed, 28 Nov 2007, Paul Mundt wrote:
> > While allyes/mod/noconfigs do seem to work fine with KCONFIG_ALLCONFIG
> > provisions, randconfig tramples all over the provided values at perhaps
> > not surprisingly, random.
> 
> Please be careful with such broad statements, there is only an issue with 
> choice values.
> 
Ok, I'll rephrase, '100% of the provided values I tested with were being
randomly clobbered'. Is that better? Broken is broken, whether it applies
to a small subset of symbols or not.

> > Debugging this a bit, there seemed to be two issues:
> > 
> > - SYMBOL_DEF and SYMBOL_DEF_USER overlap, which made
> >   def_sym->flags the same regardless of whether we came from an
> >   KCONFIG_ALLCONFIG path or not.
> 
> Look at how SYMBOL_DEF is used in confdata.c.
> 
Ah, ok. I was just trying to find something I could test that would be
different for the KCONFIG_ALLCONFIG path, but it seems like is_new is a
much cleaner solution for this, thanks for pointing it out!
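
Put differently, the random pick should only happen for choices that have no
preset value. A toy version of that rule, with invented names
(toy_pick_choice is not a kconfig function) and rand() standing in for the
random() call:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Return the 1-based index of the selected choice entry.
 * A preset value (is_new == false) is kept; only new symbols are randomized.
 */
static int toy_pick_choice(int preset, int cnt, bool is_new)
{
	if (is_new)
		return (rand() % cnt) + 1;
	return preset;
}
```

How "new" is determined is the part that matters in the real code; here it is
simply passed in as a parameter.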

Updated patch follows.

Signed-off-by: Paul Mundt <[EMAIL PROTECTED]>

---

 scripts/kconfig/conf.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/kconfig/conf.c b/scripts/kconfig/conf.c
index a38787a..8d6f174 100644
--- a/scripts/kconfig/conf.c
+++ b/scripts/kconfig/conf.c
@@ -374,7 +374,8 @@ static int conf_choice(struct menu *menu)
continue;
break;
case set_random:
-   def = (random() % cnt) + 1;
+   if (is_new)
+   def = (random() % cnt) + 1;
case set_default:
case set_yes:
case set_mod:


[PATCH] base/class.c: prevent ooops due to insert/remove race

2007-11-28 Thread Mark Lord

While doing insert/remove (quickly) tests on USB, I managed to trigger
an Oops on 2.6.23.1 on the call to strlen() in make_class_name().

This patch prevents this oops.

There is still the larger problem of the overall race
that caused this in the first place, but much of the rest
of the code in class.c appears to also do NULL checks to
avoid Oops'ing, so this continues the tradition.

Signed-off-by:  Mark Lord <[EMAIL PROTECTED]>
---

Patch applies to both 2.6.24 and 2.6.23.

--- old/drivers/base/class.c2007-11-28 22:54:59.0 -0500
+++ linux/drivers/base/class.c  2007-11-28 22:54:48.0 -0500
@@ -354,6 +354,8 @@
char *class_name;
int size;

+   if (!name)
+   return NULL;
size = strlen(name) + strlen(kobject_name(kobj)) + 2;

class_name = kmalloc(size, GFP_KERNEL);


Re: [PATCH] IB/ehca: Fix static rate if path faster than link

2007-11-28 Thread Roland Dreier
thanks, applied


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 12:33:28 +0900 (JST)
[EMAIL PROTECTED] (YAMAMOTO Takashi) wrote:

> > +static inline struct mem_cgroup_per_zone *
> > +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > +{
> > +   if (!mem->info.nodeinfo[nid])
> 
> can this be true?
> 
> YAMAMOTO Takashi

When I set early_init=1, I added that check.
BUG_ON() is better ?

Thanks,
-Kame



Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread YAMAMOTO Takashi
> +static inline struct mem_cgroup_per_zone *
> +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> +{
> + if (!mem->info.nodeinfo[nid])

can this be true?

YAMAMOTO Takashi

> + return NULL;
> + return &mem->info.nodeinfo[nid]->zoneinfo[zid];
> +}
> +


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread Christoph Lameter
On Thu, 29 Nov 2007, KAMEZAWA Hiroyuki wrote:

> ok, just use N_HIGH_MEMORY here and add a comment that hotplugging support is
> not there yet.
> 
> Christoph-san, Lee-san, could you confirm following ?
> 
> - when SLAB is used, kmalloc_node() against offline node will succeed.
> - when SLUB is used, kmalloc_node() against offline node will panic.
> 
> Then, the caller should take care that node is online before kmalloc().

H... An offline node implies that the per node structure does not
exist. SLAB should fail too. If there is something wrong with the allocs
then it's likely a difference in the way hotplug was put into SLAB and
SLUB.




Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 12:19:37 +0900 (JST)
[EMAIL PROTECTED] (YAMAMOTO Takashi) wrote:

> > @@ -651,10 +758,11 @@
> > /* Avoid race with charge */
> > atomic_set(&pc->ref_cnt, 0);
> > if (clear_page_cgroup(page, pc) == pc) {
> > +   int active;
> > css_put(&mem->css);
> > +   active = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
> > res_counter_uncharge(&mem->res, PAGE_SIZE);
> > -   list_del_init(&pc->lru);
> > -   mem_cgroup_charge_statistics(mem, pc->flags, false);
> > +   __mem_cgroup_remove_list(pc);
> > kfree(pc);
> > } else  /* being uncharged ? ...do relax */
> > break;
> 
> 'active' seems unused.
> 
OK, I will post a clean-up against -mm2.

Thanks,
-Kame



Re: [PATCH] PPC: CELLEB - fix potential NULL pointer dereference

2007-11-28 Thread Ishizaki Kou
Cyrill Gorcunov <[EMAIL PROTECTED]> wrote:
> On 11/28/07, Cyrill Gorcunov <[EMAIL PROTECTED]> wrote:
> > On 11/28/07, Michael Ellerman <[EMAIL PROTECTED]> wrote:
> > > On Mon, 2007-11-26 at 10:46 +0300, Cyrill Gorcunov wrote:
> > > > This patch adds checking for NULL value returned to prevent possible
> > > > NULL pointer dereference.
> > > > Also two unneeded 'return' are removed.
> > > >
> > > > Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>
> > > > ---
> > > > Any comments are welcome.
> > >
> > > I guess it's good to be paranoid, but this is a little verbose:
> > >
> > >wi0 = of_get_property(node, "device-id", NULL);
> > > +   if (unlikely((!wi0))) {
> > > +   printk(KERN_ERR "PCI: device-id not found.\n");
> > > +   goto error;
> > > +   }
> > >wi1 = of_get_property(node, "vendor-id", NULL);
> > > +   if (unlikely((!wi1))) {
> > > +   printk(KERN_ERR "PCI: vendor-id not found.\n");
> > > +   goto error;
> > > +   }
> > >wi2 = of_get_property(node, "class-code", NULL);
> > > +   if (unlikely((!wi2))) {
> > > +   printk(KERN_ERR "PCI: class-code not found.\n");
> > > +   goto error;
> > > +   }
> > >wi3 = of_get_property(node, "revision-id", NULL);
> > > +   if (unlikely((!wi3))) {
> > > +   printk(KERN_ERR "PCI: revision-id not found.\n");
> > > +   goto error;
> > > +   }
> > >
> > > Perhaps instead:
> > >
> > >wi0 = of_get_property(node, "device-id", NULL);
> > >wi1 = of_get_property(node, "vendor-id", NULL);
> > >wi2 = of_get_property(node, "class-code", NULL);
> > >wi3 = of_get_property(node, "revision-id", NULL);
> > >
> > >   if (!wi0 || !wi1 || !wi2 || !wi3) {
> > >   printk(KERN_ERR "PCI: Missing device tree properties.\n");
> > >   goto error;
> > >   }
> >
> > Hi Michael, yes that is much better (actually I was doubt about what form of
> > which the checking style to use - your form is much compact but mine does
> > show where *exactly* the problem appeared). So 'case that is the fake driver
> > your form is preferred ;) Ishizaki, could you use Michael's part then?
> >
> > >
> > >
> > > cheers
> > >
> > > --
> > > Michael Ellerman
> > > OzLabs, IBM Australia Development Lab
> > >
> > > wwweb: http://michael.ellerman.id.au
> > > phone: +61 2 6212 1183 (tie line 70 21183)
> > >
> > > We do not inherit the earth from our ancestors,
> > > we borrow it from our children. - S.M.A.R.T Person
> > >
> > >
> >
> > Cyrill
> >
> Ishizaki I can update the patch if you needed. Should I?
> 
> Cyrill

There is no problem with using Michael's version, and I also prefer a
simple one like this.

Cyrill, would you please update your patch?

Best regards,
Kou Ishizaki


Re: 2.6.24-rc3-mm2

2007-11-28 Thread KAMEZAWA Hiroyuki

> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-add-scan_global_lru-macro.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-nid-zid-helper-function-for-cgroup.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-per-zone-active-inactive-counter.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-calculate-mapper_ratio-per-cgroup.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-calculate-active-inactive-imbalance-per-cgroup.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-remember-reclaim-priority-in-memory-cgroup.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-remember-reclaim-priority-in-memory-cgroup-fix.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-remember-reclaim-priority-in-memory-cgroup-fix-2.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-calculate-the-number-of-pages-to-be-scanned-per-cgroup.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-modifies-vmscanc-for-isolate-globa-cgroup-lru-activity.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-modifies-vmscanc-for-isolate-globa-cgroup-lru-activity-fix.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-per-zone-lru-for-cgroup.patch
> +per-zone-and-reclaim-enhancements-for-memory-controller-take-3-per-zone-lock-for-cgroup.patch
> 
>  cgroup memeory controller updates
> 
I noticed that CONFIG_NUMA + CONFIG_CGROUP_MEM_CONT + CONFIG_SLUB cannot boot
because of my patch.
(SLAB is OK.)
I'll post a workaround soon.

Sorry,
-Kame




Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread YAMAMOTO Takashi
> @@ -651,10 +758,11 @@
>   /* Avoid race with charge */
>   atomic_set(&pc->ref_cnt, 0);
>   if (clear_page_cgroup(page, pc) == pc) {
> + int active;
>   css_put(&mem->css);
> + active = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
>   res_counter_uncharge(&mem->res, PAGE_SIZE);
> - list_del_init(&pc->lru);
> - mem_cgroup_charge_statistics(mem, pc->flags, false);
> + __mem_cgroup_remove_list(pc);
>   kfree(pc);
>   } else  /* being uncharged ? ...do relax */
>   break;

'active' seems unused.

YAMAMOTO Takashi


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 11:24:06 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Thu, 29 Nov 2007 10:37:02 +0900
> KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> 
> > Maybe zonelists of NODE_DATA() is not initialized. you are right.
> > I think N_HIGH_MEMORY will be suitable here...(I'll consider node-hotplug 
> > case later.)
> > 
> > Thank you for test!
> > 
> Could you try this ? 
> 
Sorry, this can be a workaround, but I noticed I missed something.

OK, I will just use N_HIGH_MEMORY here and add a comment that hotplug
support is not implemented yet.

Christoph-san, Lee-san, could you confirm the following?

- When SLAB is used, kmalloc_node() against an offline node will succeed.
- When SLUB is used, kmalloc_node() against an offline node will panic.

Then the caller should make sure that the node is online before kmalloc().

Regards,
-Kame 



Re: [PATCH] (2.4.26-rc3-mm2) -mm Update CAP_LAST_CAP to reflect CAP_MAC_ADMIN

2007-11-28 Thread Serge E. Hallyn
Quoting Casey Schaufler ([EMAIL PROTECTED]):
> From: Casey Schaufler <[EMAIL PROTECTED]>
> 
> Bump the value of CAP_LAST_CAP to reflect the current last cap value.
> It appears that the patch that introduced CAP_LAST_CAP and the patch
> that introduced CAP_MAC_ADMIN came in more or less at the same time.
> 
> Signed-off-by: Casey Schaufler <[EMAIL PROTECTED]>

Signed-off-by: Serge Hallyn <[EMAIL PROTECTED]>


> 
> ---
> 
>  include/linux/capability.h |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> 
> diff -uprN -X linux-2.6.24-rc3-mm2-base/Documentation/dontdiff linux-2.6.24-rc3-mm2-base/include/linux/capability.h linux-2.6.24-rc3-mm2-lastcap/include/linux/capability.h
> --- linux-2.6.24-rc3-mm2-base/include/linux/capability.h  2007-11-27 16:47:02.0 -0800
> +++ linux-2.6.24-rc3-mm2-lastcap/include/linux/capability.h   2007-11-28 14:04:57.0 -0800
> @@ -315,10 +315,6 @@ typedef struct kernel_cap_struct {
> 
>  #define CAP_SETFCAP   31
> 
> -#define CAP_LAST_CAP CAP_SETFCAP
> -
> -#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
> -
>  /* Override MAC access.
> The base kernel enforces no MAC policy.
> An LSM may enforce a MAC policy, and if it does and it chooses
> @@ -336,6 +332,10 @@ typedef struct kernel_cap_struct {
> 
>  #define CAP_MAC_ADMIN 33
> 
> +#define CAP_LAST_CAP CAP_MAC_ADMIN
> +
> +#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
> +
>  /*
>   * Bit location of each capability (used by user-space library and kernel)
>   */
> 


Re: [Patch](Resend) mm/sparse.c: Improve the error handling for sparse_add_one_section()

2007-11-28 Thread Yasunori Goto

Looks good to me.

Thanks.

Acked-by: Yasunori Goto <[EMAIL PROTECTED]>



> On Tue, Nov 27, 2007 at 10:53:45AM -0800, Dave Hansen wrote:
> >On Tue, 2007-11-27 at 10:26 +0800, WANG Cong wrote:
> >> 
> >> @@ -414,7 +418,7 @@ int sparse_add_one_section(struct zone *
> >>  out:
> >> pgdat_resize_unlock(pgdat, &flags);
> >> if (ret <= 0)
> >> -   __kfree_section_memmap(memmap, nr_pages);
> >> +   kfree(usemap);
> >> return ret;
> >>  }
> >>  #endif 
> >
> >Why did you get rid of the memmap free here?  A bad return from
> >sparse_init_one_section() indicates that we didn't use the memmap, so it
> >will leak otherwise.
> 
> Sorry, I was confused by the recursion. This one should be OK.
> 
> Thanks.
> 
> 
> 
> Improve the error handling for mm/sparse.c::sparse_add_one_section().  And I
> see no reason to check 'usemap' until holding the 'pgdat_resize_lock'.
> 
> Cc: Christoph Lameter <[EMAIL PROTECTED]>
> Cc: Dave Hansen <[EMAIL PROTECTED]>
> Cc: Rik van Riel <[EMAIL PROTECTED]>
> Cc: Yasunori Goto <[EMAIL PROTECTED]>
> Cc: Andy Whitcroft <[EMAIL PROTECTED]>
> Signed-off-by: WANG Cong <[EMAIL PROTECTED]>
> 
> ---
> Index: linux-2.6/mm/sparse.c
> ===
> --- linux-2.6.orig/mm/sparse.c
> +++ linux-2.6/mm/sparse.c
> @@ -391,9 +391,17 @@ int sparse_add_one_section(struct zone *
>* no locking for this, because it does its own
>* plus, it does a kmalloc
>*/
> - sparse_index_init(section_nr, pgdat->node_id);
> + ret = sparse_index_init(section_nr, pgdat->node_id);
> + if (ret < 0)
> + return ret;
>   memmap = kmalloc_section_memmap(section_nr, pgdat->node_id, nr_pages);
> + if (!memmap)
> + return -ENOMEM;
>   usemap = __kmalloc_section_usemap();
> + if (!usemap) {
> + __kfree_section_memmap(memmap, nr_pages);
> + return -ENOMEM;
> + }
>  
>   pgdat_resize_lock(pgdat, &flags);
>  
> @@ -403,18 +411,16 @@ int sparse_add_one_section(struct zone *
>   goto out;
>   }
>  
> - if (!usemap) {
> - ret = -ENOMEM;
> - goto out;
> - }
>   ms->section_mem_map |= SECTION_MARKED_PRESENT;
>  
>   ret = sparse_init_one_section(ms, section_nr, memmap, usemap);
>  
>  out:
>   pgdat_resize_unlock(pgdat, &flags);
> - if (ret <= 0)
> + if (ret <= 0) {
> + kfree(usemap);
>   __kfree_section_memmap(memmap, nr_pages);
> + }
>   return ret;
>  }
>  #endif

-- 
Yasunori Goto 




Re: [PATCH] fix plip 1

2007-11-28 Thread Mikulas Patocka


On Mon, 26 Nov 2007, Linus Torvalds wrote:

> 
> 
> On Thu, 22 Nov 2007, Mikulas Patocka wrote:
> > 
> > netif_rx is meant to be called from interrupts because it doesn't wake up 
> > ksoftirqd. For calling from outside interrupts, netif_rx_ni exists.
> 
> Argh. Can you _please_ use more useful subject lines than "fix plip 1/2"?
> 
> Those subject lines are what becomes the single-line description of the 
> problem, used by visualizers like gitk and gitweb. So "fix plip 1" is a 
> singularly bad such line!

OK, I see

Mikulas


RE: Question regarding mutex locking

2007-11-28 Thread David Schwartz

> Thanks for the help. Someday, I hope to understand this stuff.
>
> Larry

Any code either deals with an object or it doesn't. If it doesn't deal with
that object, it should not be acquiring locks on that object. If it does
deal with that object, it must know the internal details of that object,
including when and whether locks are held, or it cannot deal with that
object sanely.

So your question starts out broken, it says, "I need to lock an object, but
I have no clue what's going on with that very same object." If you don't
know what's going on with the object, you don't know enough about the object
to lock it. If you do, you should know whether you hold the lock or not.

Either architect so this function doesn't deal with that object and so
doesn't need to lock it or architect it so that this function knows what's
going on with that object and so knows whether it holds the lock or not.

If you don't follow this rule, a lot of things can go horribly wrong. The
two biggest issues are:

1) You don't know the semantic effect of locking and unlocking the mutex. So
any code placed before the mutex is acquired or after it's released may not
do what's expected. For example, you cannot unlock the mutex and yield,
because you might not actually wind up unlocking the mutex.

2) A function that acquires a lock normally expects the object it locks to
be in a consistent state when it acquires the lock. However, since your code
may or may not acquire the mutex, it is not assured that its lock gets the
object in a consistent state. Requiring the caller to know this and call the
function with the object in a consistent state creates brokenness of varying
kinds. (If the object may change, why not just release the lock before
calling? If the object may not change, why is the sub-function releasing the
lock?)
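
A minimal sketch of the second design (the function knows whether it holds
the lock), assuming POSIX threads. The names struct counter,
counter_add_locked(), counter_add() and counter_add_twice() are hypothetical
and only illustrate the rule; they are not taken from any code in this thread.

```c
#include <pthread.h>

/* Hypothetical object: 'lock' protects 'value'. */
struct counter {
    pthread_mutex_t lock;
    long value;
};

/* Caller must already hold c->lock; this function never touches the lock. */
static void counter_add_locked(struct counter *c, long n)
{
    c->value += n;
}

/* Caller must NOT hold c->lock; this function takes and releases it. */
static void counter_add(struct counter *c, long n)
{
    pthread_mutex_lock(&c->lock);
    counter_add_locked(c, n);
    pthread_mutex_unlock(&c->lock);
}

/* A compound update takes the lock once and uses only the _locked helper,
 * so the object is in a consistent state whenever the lock is dropped. */
static long counter_add_twice(struct counter *c, long n)
{
    long result;

    pthread_mutex_lock(&c->lock);
    counter_add_locked(c, n);
    counter_add_locked(c, n);
    result = c->value;
    pthread_mutex_unlock(&c->lock);
    return result;
}
```

Each function documents which side of the lock it lives on, so no caller
ever has to guess whether the mutex is held.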

DS




Re: [RFC PATCH] LTTng instrumentation mm (using page_to_pfn)

2007-11-28 Thread Mathieu Desnoyers
I am adding the rest. Two questions are left:

* Dave Hansen ([EMAIL PROTECTED]) wrote:
 
> > 
> > Index: linux-2.6-lttng/mm/memory.c
> > ===
> > --- linux-2.6-lttng.orig/mm/memory.c 2007-11-28 08:42:09.0 -0500
> > +++ linux-2.6-lttng/mm/memory.c 2007-11-28 09:02:57.0 -0500
> > @@ -2072,6 +2072,7 @@ static int do_swap_page(struct mm_struct
> > delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> > page = lookup_swap_cache(entry);
> > if (!page) {
> > +   trace_mark(mm_swap_in, "pfn %lu", page_to_pfn(page));
> > grab_swap_token(); /* Contend for token _before_ read-in */
> > swapin_readahead(entry, address, vma);
> > page = read_swap_cache_async(entry, vma, address);
> 
> How about putting the swap file number and the offset as well?
> 
[...]
> > Index: linux-2.6-lttng/mm/page_io.c
> > ===
> > --- linux-2.6-lttng.orig/mm/page_io.c 2007-11-28 08:38:47.0 -0500
> > +++ linux-2.6-lttng/mm/page_io.c 2007-11-28 08:52:14.0 -0500
> > @@ -114,6 +114,7 @@ int swap_writepage(struct page *page, st
> > rw |= (1 << BIO_RW_SYNC);
> > count_vm_event(PSWPOUT);
> > set_page_writeback(page);
> > +   trace_mark(mm_swap_out, "pfn %lu", page_to_pfn(page));
> > unlock_page(page);
> > submit_bio(rw, bio);
> 
> I'd also like to see the swap file number and the location in swap for
> this one.  
> 

Before I start digging deeper in checking whether it is already
instrumented by the fs instrumentation (and would therefore be
redundant), is there a particular data structure from mm/ that you
suggest taking the swap file number and location in swap from ?

Mathieu

> -- Dave
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 10:37:02 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> Maybe zonelists of NODE_DATA() is not initialized. you are right.
> I think N_HIGH_MEMORY will be suitable here...(I'll consider node-hotplug 
> case later.)
> 
> Thank you for test!
> 
Could you try this ? 

Thanks,
-Kame
==

Don't call kmalloc() against possible but offline node.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

 mm/memcontrol.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: test-2.6.24-rc3-mm1/mm/memcontrol.c
===
--- test-2.6.24-rc3-mm1.orig/mm/memcontrol.c
+++ test-2.6.24-rc3-mm1/mm/memcontrol.c
@@ -1117,8 +1117,14 @@ static int alloc_mem_cgroup_per_zone_inf
struct mem_cgroup_per_node *pn;
struct mem_cgroup_per_zone *mz;
int zone;
-
-   pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+   /*
+* This routine is called against possible nodes.
+* But it's BUG to call kmalloc() against offline node.
+*/
+   if (node_state(node, N_ONLINE))
+   pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+   else
+   pn = kmalloc(sizeof(*pn), GFP_KERNEL);
if (!pn)
return 1;
 



Re: [patch 1/1] Writeback fix for concurrent large and small file writes

2007-11-28 Thread Frans Pop
Two typos in comments.

Cheers,
FJP

Michael Rubin wrote:
> + * The flush tree organizes the dirtied_when keys with the rb_tree. Any
> + * inodes with a duplicate dirtied_when value are link listed together. This
> + * link list is sorted by the inode's i_flushed_when. When both the
> + * dirited_when and the i_flushed_when are indentical the order in the
> + * linked list determines the order we flush the inodes.

s/dirited_when/dirtied_when/

> + * Here is where we interate to find the next inode to process. The
> + * strategy is to first look for any other inodes with the same dirtied_when
> + * value. If we have already processed that node then we need to find
> + * the next highest dirtied_when value in the tree.

s/interate/iterate/



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> > percpu references are quite frequent already (vm statistics) and will be 
> > more frequent after we have converted the per cpu arrays to per cpu 
> > allocations.
> >   
> 
> Well, I think the point is moot, because x86 will always use 32-bit
> offsets.  Each reference will only be 1 byte bigger than a normal
> variable reference.

Just because i386 is not able to use it does not mean that other arches
are not. F.e. IA64 can embed offsets in the actual instruction (but of
course not 64bit).

x86_64 can use a 32 bit offset instead of a 64 bit address because it uses
the small model. A load of a 64 bit address would require much more 
expensive instructions. A load of a 64 bit address is currently avoided 
through the use of the pda that contains the full 64 bit address in the
data_offset field. Operations on per cpu data on x86_64 must therefore 
first load data_offset via gs and then add the per cpu address to this
offset. Then the per cpu operation is performed on that address.

In order to avoid this situation through one instruction we need a small 
32 bit offset relative to gs. Otherwise we cannot get away from the PDA 
and the use of data_offset.
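
The two addressing schemes described above can be modelled in plain
user-space C. This is only a sketch under stated assumptions: struct
pda_model, percpu_area, via_data_offset() and via_base() are hypothetical
names, an array index stands in for the gs segment base, and none of this
is the kernel's actual implementation.

```c
#include <stdint.h>

#define NCPUS     4
#define AREA_SIZE 64

/* Hypothetical per-cpu areas, one block of storage per cpu. */
static unsigned char percpu_area[NCPUS][AREA_SIZE];

/* Model of the PDA: the structure gs points at, holding a full-width
 * value that locates this cpu's per-cpu area. */
struct pda_model {
    uintptr_t data_offset;
};
static struct pda_model pda[NCPUS];

static void setup(void)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)
        pda[cpu].data_offset = (uintptr_t)percpu_area[cpu];
}

/* Current scheme: first load data_offset from the PDA, then add the
 * variable's offset, then operate on the result (two dependent steps). */
static unsigned char *via_data_offset(int cpu, uint32_t var_off)
{
    uintptr_t base = pda[cpu].data_offset;      /* step 1: load from PDA */
    return (unsigned char *)(base + var_off);   /* step 2: add offset    */
}

/* Proposed scheme: the base register points at the per-cpu area itself,
 * so a small 32 bit offset relative to it is all that is needed. */
static unsigned char *via_base(int cpu, uint32_t var_off)
{
    return &percpu_area[cpu][var_off];
}
```

Both functions reach the same byte; the difference is only the extra load
of data_offset in the first one.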

 


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> The percpu areas need to be allocated in a NUMA aware fashion. Otherwise 
> you use distant memory for the most performance sensitive areas. The NUMA 
> subsystem must be so far up that these allocations can be performed in the 
> right way. And this means at least you need to know on which node each 
> processor is located. That is what the PDA is currently used for and i386 
> has no other way of doing that. I think we could use an array [NR_CPUS] 
> for this one but we want to avoid these arrays because NR_CPUS may get 
> very big.
>   

Oh, you mean there needs to be some percpu data mechanism operating in
order to do numa-aware allocations, which would be necessary to allocate
the percpu memory itself?

I can see how that would be awkward.

J



[PATCH] net/e1000: fix memcpy in e1000_get_strings

2007-11-28 Thread Roel Kluin
drivers/net/e1000/e1000_ethtool.c:113:
#define E1000_TEST_LEN sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN

drivers/net/e1000e/ethtool.c:106:
#define E1000_TEST_LEN sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN

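E1000_TEST_LEN*ETH_GSTRING_LEN expands textually to
sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN*ETH_GSTRING_LEN

Since / and * have equal precedence and associate left to right, this is
evaluated as (sizeof(e1000_gstrings_test) / ETH_GSTRING_LEN) * ETH_GSTRING_LEN,
not as sizeof(e1000_gstrings_test) / (ETH_GSTRING_LEN * ETH_GSTRING_LEN).
The copied length is therefore correct here, but the unparenthesized define
would misgroup as soon as it is used as a divisor or as the operand of %.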
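
How the expression is actually grouped can be checked with a small
user-space program. This is a sketch under assumptions: ETH_GSTRING_LEN is
taken to be a plain integer constant (32 is assumed here), gstrings_test is
a stand-in table, and TEST_LEN_BARE / TEST_LEN_PAREN are hypothetical names
for the unparenthesized and parenthesized forms of the define. In C, / and *
have equal precedence and associate left to right.

```c
#include <stddef.h>

#define ETH_GSTRING_LEN 32  /* assumed value, for illustration only */

/* Stand-in for e1000_gstrings_test: five fixed-width strings (160 bytes). */
static const char gstrings_test[][ETH_GSTRING_LEN] = {
    "test-a", "test-b", "test-c", "test-d", "test-e",
};

/* Same shape as the define in the driver: no outer parentheses. */
#define TEST_LEN_BARE  sizeof(gstrings_test) / ETH_GSTRING_LEN
/* Parenthesized form, safe inside any surrounding expression. */
#define TEST_LEN_PAREN (sizeof(gstrings_test) / ETH_GSTRING_LEN)

/* Groups as (size / 32) * 32, which equals size for a multiple of 32. */
static size_t bare_times_len(void)
{
    return TEST_LEN_BARE * ETH_GSTRING_LEN;
}

/* As a divisor the bare form misgroups: (160 / size) / 32. */
static size_t bare_as_divisor(void)
{
    return 160 / TEST_LEN_BARE;
}

/* The parenthesized form divides by the element count: 160 / 5. */
static size_t paren_as_divisor(void)
{
    return 160 / TEST_LEN_PAREN;
}
```

So the multiplication in the memcpy() call still yields the full table
size, while the divisor case shows where the missing parentheses would bite.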

Please confirm that the change is as wanted.
--
A lack of parentheses around defines can cause unexpected results due to
operator precedence.

Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 667f18b..b83ccce 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1923,7 +1923,7 @@ e1000_get_strings(struct net_device *netdev, uint32_t stringset, uint8_t *data)
switch (stringset) {
case ETH_SS_TEST:
memcpy(data, *e1000_gstrings_test,
-   E1000_TEST_LEN*ETH_GSTRING_LEN);
+   sizeof(e1000_gstrings_test));
break;
case ETH_SS_STATS:
for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) {
diff --git a/drivers/net/e1000e/ethtool.c b/drivers/net/e1000e/ethtool.c
index 6a39784..338c49d 100644
--- a/drivers/net/e1000e/ethtool.c
+++ b/drivers/net/e1000e/ethtool.c
@@ -1739,7 +1739,7 @@ static void e1000_get_strings(struct net_device *netdev, u32 stringset,
switch (stringset) {
case ETH_SS_TEST:
memcpy(data, *e1000_gstrings_test,
-   E1000_TEST_LEN*ETH_GSTRING_LEN);
+   sizeof(e1000_gstrings_test));
break;
case ETH_SS_STATS:
for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) {


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> Don't think it matters either way.  Before percpu is allocated, NUMA
> issues don't matter.  Once they are - by whatever mechanism - you can
> set the segment bases up appropriately.  The fact that you chose to put
> percpu data at address X doesn't affect the percpu mechanism one way or
> the other.

The percpu areas need to be allocated in a NUMA aware fashion. Otherwise 
you use distant memory for the most performance sensitive areas. The NUMA 
subsystem must be so far up that these allocations can be performed in the 
right way. And this means at least you need to know on which node each 
processor is located. That is what the PDA is currently used for and i386 
has no other way of doing that. I think we could use an array [NR_CPUS] 
for this one but we want to avoid these arrays because NR_CPUS may get 
very big.



RE: [PATCH 2/2] [net/wireless/iwlwifi] : iwlwifi 4965 Fix race conditional panic.

2007-11-28 Thread Joonwoo Park
The cancel_delayed_work_sync() call has been moved into iwl_cancel_deferred_work().
Thanks Zhu Yi.

[net/wireless/iwlwifi] : iwlwifi 4965 Fix race conditional panic.

Signed-off-by: Joonwoo Park <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/wireless/iwlwifi/iwl4965-base.c b/drivers/net/wireless/iwlwifi/iwl4965-base.c
index 9918780..2474eba 100644
--- a/drivers/net/wireless/iwlwifi/iwl4965-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl4965-base.c
@@ -8864,6 +8864,7 @@ static void iwl_cancel_deferred_work(struct iwl_priv *priv)
 {
iwl_hw_cancel_deferred_work(priv);
 
+   cancel_delayed_work_sync(&priv->init_alive_start);
cancel_delayed_work(&priv->scan_check);
cancel_delayed_work(&priv->alive_start);
cancel_delayed_work(&priv->post_associate);
---



Re: Out of tree module using LSM

2007-11-28 Thread Casey Schaufler

--- Jan Engelhardt <[EMAIL PROTECTED]> wrote:

> 
> On Nov 28 2007 18:22, [EMAIL PROTECTED] wrote:
> >
> >Talpa is modular itself being composed of a set of kernel modules of which 
> >not all are loaded simultaneously. Where possible LSM can be used and _no_ 
> >messing with syscall table will take place. Unfortunately where another 
> >LSM user is present that won't work
> 
> SELinux supports chaining, so if talpa is loaded as a secondary to selinux,
> where is the problem? For those LSMs which do not support chaining (*cough*
> apparmor *cough* be one, mtadm another), fix them.

Um, cough cough (I really do have a nasty cold) SELinux supports
a very limited bit of chaining. I don't think you're going to be
chaining security_secid_to_secctx() or security_secctx_to_secid()
with the current SELinux code, but you could prove me wrong there.

Chaining is a red herring. If you want talpa it seems that you
have a use case that isn't going to require the presence of
another LSM. You may have other issues, but at this point I say
throw caution to the wind, clean it up based on the suggestions
you've seen here, and put the patch up as an RFC on the LSM list.

What's the worst that could happen?


Casey Schaufler
[EMAIL PROTECTED]


RE: [PATCH 1/2] [net/wireless/iwlwifi] : iwlwifi 3945 Fix raceconditional panic.

2007-11-28 Thread Joonwoo Park
2007/11/29, Zhu Yi <[EMAIL PROTECTED]>:
> 
> Good catch. But it will be better if you add it into
> iwl_cancel_deferred_work().
> 

Thanks.
I agree with you. 
Actually, I considered it, but I was afraid of side effects.
Anyway, I'm attaching a new one.

Thanks.
Joonwoo

[net/wireless/iwlwifi] : iwlwifi 3945 Fix race conditional panic.

Signed-off-by: Joonwoo Park <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 465da4f..e51e872 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -8270,6 +8270,7 @@ static void iwl_cancel_deferred_work(struct iwl_priv *priv)
 {
iwl_hw_cancel_deferred_work(priv);
 
+   cancel_delayed_work_sync(&priv->init_alive_start);
cancel_delayed_work(&priv->scan_check);
cancel_delayed_work(&priv->alive_start);
cancel_delayed_work(&priv->post_associate);
---



Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>   
>> I don't see the problem.  The way i386 does it inherently supports
>> per-cpu data very early on (it uses the prototype percpu section until
>> the real percpu values are set up).
>> 
>
> Ok so we could do that for x86_64 as well? There is more complicated 
> bootstrap since i386 does not support NUMA aware placement of per cpu 
> areas.
>   

Don't think it matters either way.  Before percpu is allocated, NUMA
issues don't matter.  Once they are - by whatever mechanism - you can
set the segment bases up appropriately.  The fact that you chose to put
percpu data at address X doesn't affect the percpu mechanism one way or
the other.

> percpu references are quite frequent already (vm statistics) and will be 
> more frequent after we have converted the per cpu arrays to per cpu 
> allocations.
>   

Well, I think the point is moot, because x86 will always use 32-bit
offsets.  Each reference will only be 1 byte bigger than a normal
variable reference.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:

> I don't see the problem.  The way i386 does it inherently supports
> per-cpu data very early on (it uses the prototype percpu section until
> the real percpu values are set up).

Ok so we could do that for x86_64 as well? There is more complicated 
bootstrap since i386 does not support NUMA aware placement of per cpu 
areas.

> > The i386 way of referring to per cpu data is not optimal because it is 
> > always offset by __per_cpu_start. per cpu data offsets need to be relative 
> > to the beginning of the per cpu area. per cpu data is less than 64k so 2 
> > byte offsets would be enough.
> >   
> 
> I don't see that's terribly important.  percpu references aren't all
> that common overall, and - at least on x86 - using a 16-bit offset
> (assuming it's possible) would require a prefix anyway, so it would only
> save 1 byte per reference.  But I can't convince gas to generate a
> 16-bit offset anyway.

percpu references are quite frequent already (vm statistics) and will be 
more frequent after we have converted the per cpu arrays to per cpu 
allocations.


> > That way the __per_cpu_offset array and the registers that are used on 
> > various platforms are pointing to the actual data and can be loaded
> > directly into a register and then a load with a small offset to that 
> > register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
> > on ia64 a fixed address stands in for the register.
> 
> The asm used to generate these references is inherently arch-specific
> anyway, so the type and size of offset needed from the per-cpu base
> register to the data itself can be arch-dependent without loss of
> generality.  

Well yes, that is already the case and was made explicit by the percpu cleanup 
done so far. An offset from a base is used by multiple architectures.




Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 16:19:59 -0500
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> As soon as this loop hits the first non-existent node on my platform, I
> get a NULL pointer deref down in __alloc_pages.  Stack trace below.
> 
> Perhaps N_POSSIBLE should be N_HIGH_MEMORY?  That would require handling
> of memory/node hotplug for each memory control group, right?  But, I'm
> going to try N_HIGH_MEMORY as a work around.
> 
Hmm, ok. (>_<


> Call Trace:
>  [] show_stack+0x80/0xa0
> sp=a001008e39c0 bsp=a001008dd1b0
>  [] show_regs+0x870/0x8a0
> sp=a001008e3b90 bsp=a001008dd158
>  [] die+0x190/0x300
> sp=a001008e3b90 bsp=a001008dd110
>  [] ia64_do_page_fault+0x8e0/0xa20
> sp=a001008e3b90 bsp=a001008dd0b8
>  [] ia64_leave_kernel+0x0/0x270
> sp=a001008e3c20 bsp=a001008dd0b8
>  [] __alloc_pages+0x30/0x6e0
> sp=a001008e3df0 bsp=a001008dcfe0
>  [] new_slab+0x610/0x6c0
> sp=a001008e3e00 bsp=a001008dcf80
>  [] get_new_slab+0x50/0x200
> sp=a001008e3e00 bsp=a001008dcf48
>  [] __slab_alloc+0x2e0/0x4e0
> sp=a001008e3e00 bsp=a001008dcf00
>  [] kmem_cache_alloc_node+0x180/0x200
> sp=a001008e3e10 bsp=a001008dcec0
>  [] mem_cgroup_create+0x160/0x400
> sp=a001008e3e10 bsp=a001008dce78
>  [] cgroup_init_subsys+0xa0/0x400
> sp=a001008e3e20 bsp=a001008dce28
>  [] cgroup_init+0x90/0x160
> sp=a001008e3e20 bsp=a001008dce00
>  [] start_kernel+0x700/0x820
> sp=a001008e3e20 bsp=a001008dcd80
> 
Maybe the zonelists of NODE_DATA() are not initialized. You are right.
I think N_HIGH_MEMORY will be suitable here (I'll consider the node-hotplug
case later).
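
A minimal userspace sketch of why the choice of node mask matters here. Everything in it is a hypothetical stand-in (the mask values, `node_data[]`, `count_uninitialized()`); it only illustrates that a walk over "possible" nodes can reach nodes with no backing data, while a walk over nodes that have memory does not.

```c
#include <stddef.h>

#define MAX_NUMNODES 8

/* Stand-ins for the kernel's node state masks (names borrowed, values made up). */
enum node_states { N_POSSIBLE, N_HIGH_MEMORY, NR_NODE_STATES };

struct node_data { int nr_zones; };

static struct node_data node0 = { 3 }, node2 = { 2 };

/* Only nodes 0 and 2 are populated; every other slot stays NULL. */
static struct node_data *node_data[MAX_NUMNODES] = { &node0, NULL, &node2 };

static const unsigned int node_mask[NR_NODE_STATES] = {
	[N_POSSIBLE]    = 0x0f,	/* nodes 0-3 could exist */
	[N_HIGH_MEMORY] = 0x05,	/* nodes 0 and 2 have memory */
};

int node_state(int nid, enum node_states state)
{
	return (node_mask[state] >> nid) & 1;
}

/* Count nodes in the given mask that have no per-node data to allocate from. */
int count_uninitialized(enum node_states state)
{
	int nid, missing = 0;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		if (node_state(nid, state) && node_data[nid] == NULL)
			missing++;
	return missing;
}
```

In this toy setup the N_POSSIBLE walk visits two nodes without data, which is the situation that led to the NULL dereference reported above; the N_HIGH_MEMORY walk visits none.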

Thank you for testing!

Regards,
-Kame





Re: [PATCH] asm-arm/{arch-omap,arch-ixp23xx}: parentheses around NR_IRQS definition

2007-11-28 Thread Roel Kluin
Roel Kluin wrote:
> Add parentheses to prevent operator precedence errors
> 
> Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>

For the arch-ixp23xx part I should have added:
Acked-by: Lennert Buytenhek <[EMAIL PROTECTED]>


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Andi Kleen

> * drop support for stack-protector (does it really help? do people
>   use it?)

AFAIK we have only ever had a single classical stack buffer overflow in the kernel.
It certainly doesn't seem to be solving a common security problem.

-Andi


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Jeremy Fitzhardinge
Christoph Lameter wrote:
> On Wed, 28 Nov 2007, Jeremy Fitzhardinge wrote:
>
>> Yes, I would like to convert x86_64 to match i386's percpu, and drop the
>> pda altogether.  The only thing preventing this is the stack canary, and
>> I'm wondering how much value there is in keeping it, given the
>> disadvantages of having this divergence between 32 and 64 bit.
>> 
>
> I think most of the PDA could be gotten rid of. The problems are
>
> 1. The stack canary
>   

Yes, this is a biggie.  It needs one of:

* fix gcc
* post-process the .s file
* drop support for stack-protector (does it really help? do people
  use it?)


> 2. The PDA is used to store per cpu data before the per cpu areas
>are setup.
>   

I don't see the problem.  The way i386 does it inherently supports
per-cpu data very early on (it uses the prototype percpu section until
the real percpu values are set up).

> The i386 way of referring to per cpu data is not optimal because it is 
> always offset by __per_cpu_start. per cpu data offsets need to be relative 
> to the beginning of the per cpu area. per cpu data is less than 64k so 2 
> byte offsets would be enough.
>   

I don't see that's terribly important.  percpu references aren't all
that common overall, and - at least on x86 - using a 16-bit offset
(assuming it's possible) would require a prefix anyway, so it would only
save 1 byte per reference.  But I can't convince gas to generate a
16-bit offset anyway.

> That way the __per_cpu_offset array and the registers that are used on 
> various platforms are pointing to the actual data and can be loaded
> directly into a register and then a load with a small offset to that 
> register can be performed. On x86_64 this is gs, on i386 fs, on sparc g5, 
> on ia64 a fixed address stands in for the register.

The asm used to generate these references is inherently arch-specific
anyway, so the type and size of offset needed from the per-cpu base
register to the data itself can be arch-dependent without loss of
generality.  

I definitely see that small offsets might be useful for other
architectures, but for x86 it doesn't help and makes things more
complex.  The only difference between 32- and 64-bit is whether we
generate an offset from %fs, %gs or nothing (for the UP case).
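
A rough userspace sketch of the "base plus offset" scheme discussed above, including the early-boot fallback to a prototype area. It is only an illustration under stated assumptions: `struct percpu_area`, `percpu_base[]`, `percpu_ptr()` and `setup_percpu_areas()` are invented names, and a plain array stands in for the %fs/%gs segment base.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4

/* Stand-in for the percpu section: one instance per CPU after setup. */
struct percpu_area {
	long vm_events;
	int cpu_number;
};

/* Prototype copy, usable before the real areas exist. */
static struct percpu_area prototype = { 0, -1 };

/* Per-CPU base (plays the role of the segment base); NULL until set up. */
static struct percpu_area *percpu_base[NR_CPUS];

/* Address of a percpu variable = base for that CPU + offset of the variable. */
void *percpu_ptr(int cpu, size_t offset)
{
	struct percpu_area *base = percpu_base[cpu] ? percpu_base[cpu] : &prototype;

	return (char *)base + offset;
}

long *vm_events_ptr(int cpu)
{
	return percpu_ptr(cpu, offsetof(struct percpu_area, vm_events));
}

/* Allocate the real areas; where they are placed does not change the lookup. */
int setup_percpu_areas(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		struct percpu_area *area = malloc(sizeof(*area));

		if (!area)
			return -1;
		memcpy(area, &prototype, sizeof(*area));	/* carry over early values */
		area->cpu_number = cpu;
		percpu_base[cpu] = area;
	}
	return 0;
}
```

The lookup code is the same before and after `setup_percpu_areas()`; only the base changes, so NUMA-aware placement would only affect how the areas are allocated.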


>  In loops over all per 
> cpu variables this will also simplify the code.
>   

Why's that?

> And ultimately we can get rid of the ugly RELOC_HIDE macro. It simply 
> becomes the adding of the base address in a register to a per cpu offset.
>   

I was never quite sure what that was for.

J


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Christoph Lameter
On Thu, 29 Nov 2007, Andi Kleen wrote:

> On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> > 1. The stack canary
> 
> You would need to change gcc with a new option and only allow the stack
> checking when the compiler supports the new option. However the problem
> is still how to get a reasonable fixed offset. Or perhaps just change
> gcc to use a linker symbol relative to %gs that could be set to anything?

I still think we should leave the canary as is.


Re: void* arithmnetic

2007-11-28 Thread Ming Lei
2007/11/29, Jan Engelhardt <[EMAIL PROTECTED]>:
>
> On Nov 29 2007 01:05, J.A. Magallón wrote:
> >
> >Since begin of the ages the build of the nvidia driver says things like
> >this:
> >
>
> Explicitly adding -Wpointer-arith to ones own Makefile is like
> admitting the code might be problematic. :->
>
>
> I think sizeof(void *) == 1 is taken as granted as sizeof(int) >= 4
> these days. Sigh.
sizeof(void *) == 4, sizeof(void)==1, :)
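
For context, sizeof(void) == 1 is a GNU C extension rather than ISO C, and -Wpointer-arith warns when code relies on it. A small sketch of the portable spelling follows; `advance_bytes()` is a made-up helper name, not something from the driver under discussion.

```c
#include <stddef.h>

/*
 * Step a void pointer forward by a byte count. ISO C gives void no size,
 * so the arithmetic is done on unsigned char instead. GNU C would also
 * accept "p + n" directly, treating sizeof(void) as 1.
 */
void *advance_bytes(void *p, size_t n)
{
	return (unsigned char *)p + n;
}
```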


Re: [patch 05/14] percpu: Use a Kconfig variable to configure arch specific percpu setup

2007-11-28 Thread Andi Kleen
On Wed, Nov 28, 2007 at 04:11:37PM -0800, Christoph Lameter wrote:
> 1. The stack canary

You would need to change gcc with a new option and only allow the stack
checking when the compiler supports the new option. However the problem
is still how to get a reasonable fixed offset. Or perhaps just change
gcc to use a linker symbol relative to %gs that could be set to anything?

-Andi


Re: [PATCH 2/3] net/bonding: Return nothing for not applicable values

2007-11-28 Thread Jay Vosburgh

>The previous code returned '\n' (that is, a single empty line)
>from most files, with one exception (xmit_hash_policy), where
>it returned 'NA\n'.  This patch consolidates each file to return
>nothing at all if not applicable, not even a '\n'.
>
>I find this behaviour more usual, more useful, more efficient
>and shorter to code from both sides.
[...]
>+  if ((bond->params.mode == BOND_MODE_XOR) ||
>+  (bond->params.mode == BOND_MODE_8023AD)) {
>   count = sprintf(buf, "%s %d\n",
>   xmit_hashtype_tbl[bond->params.xmit_policy].modename,
>   bond->params.xmit_policy);

Rather than this (returning nothing if not in xor or 802.3ad
mode), I'd prefer to see this always return whatever the xmit policy is
(regardless of the mode), and remove the mode test from
bonding_store_xmit_hash().

This would be consistent with the way the arp_ip_target option
is treated: the actual value is always displayed, even if it is not
used, and it is legal to change the value, regardless of the mode.

Other than this, I'm fine with the changes.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


Re: [PATCH 1/2] [net/wireless/iwlwifi] : iwlwifi 3945 Fix race conditional panic.

2007-11-28 Thread Zhu Yi

On Wed, 2007-11-28 at 19:41 +0900, Joonwoo Park wrote:
> [net/wireless/iwlwifi] : iwlwifi 3945 Fix race conditional panic.
> 
> Signed-off-by: Joonwoo Park <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> index 465da4f..ac6c4a9 100644
> --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> @@ -8570,6 +8570,7 @@ static void iwl_pci_remove(struct pci_dev *pdev)
> IWL_DEBUG_INFO("*** UNLOAD DRIVER ***\n");
>  
> mutex_lock(&priv->mutex);
> +   cancel_delayed_work_sync(&priv->init_alive_start);
> set_bit(STATUS_EXIT_PENDING, &priv->status);
> __iwl_down(priv);
> mutex_unlock(&priv->mutex);

Good catch. But it will be better if you add it into
iwl_cancel_deferred_work().

Thanks,
-yi


Re: [PATCH 0/4, v3] Physical PCI slot objects

2007-11-28 Thread Gary Hade
On Wed, Nov 28, 2007 at 04:02:38PM -0800, Kristen Carlson Accardi wrote:
> On Wed, 28 Nov 2007 13:31:47 -0800
> Gary Hade <[EMAIL PROTECTED]> wrote:
> 
> > FYI, the node contains 2 hotpluggable PCIe slots and 5
> > non-hotpluggable PCIe slots but 'pci_slot' only exposed
> > the 2 hotpluggable slots.  This does not appear to be due
> > to a 'pci_slot' driver problem since I looked at the DSDT
> > and SSDT and found that there are currently no _SUN methods
> > for the non-hotpluggable slots.
> 
> Thanks for testing Gary.  I would think this situation would be the
> common case, since I doubt most firmware writers would bother to
> implement _SUN for non-hotpluggable slots -- at least on other DSDT
> I've seen this has been the case as well.  

Yeah, I was not surprised either, although features such as the one
Alex is working on may provide some motivation to change that.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
[EMAIL PROTECTED]
http://www.ibm.com/linux/ltc



Re: Error returns not handled correctly by sysfs.c:subsys_attr_store()

2007-11-28 Thread Tejun Heo
Andrew Patterson wrote:
> I tried with clean 2.6.24-rc3 and get the same bad behavior.  This is on
> an ia64 box, so maybe that is an issue. I can try on an x86 box as well.
> Oh, one other thing.  I tried a "uname -r" to make sure I had the
> correct kernel booted and got:
> 
> # uname -r
> 2.6.24-rc3
> x
> y
> z
> #

Yeah, please try it on another machine from clean tree.  sysfs code is
definitely not endian dependent and is 64 bit clean.  Heck, all my test
machines run 64 bit these days.  I would be surprised if it's something
architecture dependent but please try on a different machine with
different userland with kernel built from fresh source tree.

Thanks.

-- 
tejun


[PATCH] asm-arm/{arch-omap,arch-ixp23xx}: parentheses around NR_IRQS definition

2007-11-28 Thread Roel Kluin
in include/asm-arm/arch-omap/board-innovator.h:40:
#define NR_IRQS IH_BOARD_BASE + NR_FPGA_IRQS

in include/asm-arm/arch-ixp23xx/irqs.h:156:
#define NR_IRQS  NR_IXP23XX_IRQS + NR_IXP23XX_MACH_IRQS

This could lead to problems when this definition is used in:

arch/ia64/sn/kernel/irq.c:516:
sn_irq_lh = kmalloc(sizeof(struct list_head *) * NR_IRQS, GFP_KERNEL);
arch/x86/kernel/io_apic_32.c:693:
irq_cpu_data[i].irq_delta = kmalloc(sizeof(unsigned long) * NR_IRQS, GFP_KERNEL);
694:
irq_cpu_data[i].last_irq = kmalloc(sizeof(unsigned long) * NR_IRQS, GFP_KERNEL);
699:
memset(irq_cpu_data[i].irq_delta,0,sizeof(unsigned long) * NR_IRQS);
700:
memset(irq_cpu_data[i].last_irq,0,sizeof(unsigned long) * NR_IRQS);
fs/proc/proc_misc.c:464:
per_irq_sum = kzalloc(sizeof(unsigned int)*NR_IRQS, GFP_KERNEL);

I am not sure whether this definition is actually used in any of these files.
Am I being paranoid? Anyway, adding parentheses should be safe.
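
A minimal sketch of the precedence problem, using made-up constants. BASE_IRQS, MACH_IRQS and both NR_IRQS_* names are illustrative only and are not the values from the ARM headers.

```c
#include <stddef.h>

/* Stand-in values; the real constants live in the ARM headers. */
#define BASE_IRQS 64
#define MACH_IRQS 32

#define NR_IRQS_UNSAFE BASE_IRQS + MACH_IRQS
#define NR_IRQS_SAFE   (BASE_IRQS + MACH_IRQS)

/* Expands to sizeof(long) * 64 + 32: only the first term is scaled. */
size_t table_bytes_unsafe(void)
{
	return sizeof(long) * NR_IRQS_UNSAFE;
}

/* Expands to sizeof(long) * (64 + 32), which is what the caller meant. */
size_t table_bytes_safe(void)
{
	return sizeof(long) * NR_IRQS_SAFE;
}
```

With the unparenthesized form, an allocation sized this way would come out too small, so later indexing by IRQ number could run past the end of the buffer.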
--
Add parentheses to prevent operator precedence errors

Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>
---
diff --git a/include/asm-arm/arch-ixp23xx/irqs.h b/include/asm-arm/arch-ixp23xx/irqs.h
index e696395..27c5808 100644
--- a/include/asm-arm/arch-ixp23xx/irqs.h
+++ b/include/asm-arm/arch-ixp23xx/irqs.h
@@ -153,7 +153,7 @@
  */
 #define NR_IXP23XX_MACH_IRQS   32
 
-#define NR_IRQS	NR_IXP23XX_IRQS + NR_IXP23XX_MACH_IRQS
+#define NR_IRQS	(NR_IXP23XX_IRQS + NR_IXP23XX_MACH_IRQS)
 
 #define IXP23XX_MACH_IRQ(irq)  (NR_IXP23XX_IRQ + (irq))
 
diff --git a/include/asm-arm/arch-omap/board-innovator.h b/include/asm-arm/arch-omap/board-innovator.h
index b3cf334..56d2c98 100644
--- a/include/asm-arm/arch-omap/board-innovator.h
+++ b/include/asm-arm/arch-omap/board-innovator.h
@@ -37,7 +37,7 @@
 #define OMAP1510P1_EMIFF_PRI_VALUE 0x00
 
 #define NR_FPGA_IRQS   24
-#define NR_IRQS IH_BOARD_BASE + NR_FPGA_IRQS
+#define NR_IRQS (IH_BOARD_BASE + NR_FPGA_IRQS)
 
 #ifndef __ASSEMBLY__
 void fpga_write(unsigned char val, int reg);


Re: Out of tree module using LSM

2007-11-28 Thread Greg KH
On Thu, Nov 29, 2007 at 01:53:46AM +0100, Jan Engelhardt wrote:
> 
> On Nov 28 2007 16:38, Greg KH wrote:
> >> 
> >> And if we are talking about the situation when files are written to
> >> in controlled way (i.e. we are not concerned with malware running on
> >> the box in question and just want to stop it from passing through
> >> mailsewer, etc.), then there's no damn need to play with LSM - just
> >> have e.g. coda with its commit-on-close and run the scanner on
> >> commit.  End of story.  Mind you, in such setups one would be much
> >> better off just having the mail server run the tests explicitly in
> >> the userland, along with the rest of anti-spam, etc. filters.
> >
> >I've repeated the above statements so many times to a number of the
> >anti-virus companies, and other people that really should know better,
> >that I'm really sick of it.  For some reason, they keep trying to do
> >things like this in the kernel, despite it being trivial to do in
> >userspace properly.
> >
> Do you mean something along the lines of FUSE?

That is one way, but not the simplest or nicest (people don't want to
run their whole fs on FUSE just yet).

The easiest way is as Al described above, just have the userspace
program that wrote the file to disk, check it then.

There are some nice SAMBA plugins that do just that already out there...

thanks,

greg k-h


Re: [PATCH 4/3] net/bonding: Adhere to coding style: break line after the if condition

2007-11-28 Thread Randy Dunlap

Ferenc Wágner wrote:

Signed-off-by: Ferenc Wágner <[EMAIL PROTECTED]>


Acked-by: Randy Dunlap <[EMAIL PROTECTED]>

Thanks.


---
Randy Dunlap <[EMAIL PROTECTED]> writes:

 drivers/net/bonding/bond_sysfs.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 5c31f5c..9de2c52 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -91,7 +91,8 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buf)
}
res += sprintf(buf + res, "%s ", bond->dev->name);
}
-   if (res) buf[res-1] = '\n'; /* eat the leftover space */
+   if (res)
+   buf[res-1] = '\n'; /* eat the leftover space */
up_read(&(bonding_rwsem));
return res;
 }
@@ -239,7 +240,8 @@ static ssize_t bonding_show_slaves(struct device *d,
res += sprintf(buf + res, "%s ", slave->dev->name);
}
read_unlock(&bond->lock);
-   if (res) buf[res-1] = '\n'; /* eat the leftover space */
+   if (res)
+   buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 
@@ -705,7 +707,8 @@ static ssize_t bonding_show_arp_targets(struct device *d,

res += sprintf(buf + res, "%u.%u.%u.%u ",
   NIPQUAD(bond->params.arp_targets[i]));
}
-   if (res) buf[res-1] = '\n'; /* eat the leftover space */
+   if (res)
+   buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 



--
~Randy


Re: Out of tree module using LSM

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 12:42:52PM +0000, Tvrtko A. Ursulin wrote:
> 
> Hi Linus, all,
> 
> During one recent LKML discussion 
> (http://marc.info/?l=linux-kernel&m=119267398722085&w=2) about LSM going 
> static, you called for LSM users to speak up.
> 
> We here at Sophos (the fourth largest endpoint security vendor in the world) 
> have such a module called Talpa which is a part of our main endpoint security 
> product for Linux  that protects from viruses and malware hosted on Linux, 
> including those targeting Windows or other connected devices,  
> (http://www.sophos.com/products/enterprise/endpoint/security-and-control/linux/index.html)
>  
> which is GPL code and has been in the field for almost three years now. Its 
> source code has been shipping with the product from the start.  We also have 
> a SourceForge project at http://sourceforge.net/projects/talpa/ to host it.
> 
> In essence, what our module does is it intercepts file accesses and allows 
> userspace daemons to vet them. One of the means we implemented that is 
> through LSM and although it is not a perfect match for such use we prefer to 
> use an official interface. Unfortunately, with time it became impossible to 
> use LSM on some distributions (SELinux) so we had to implement other 
> intercept methods which are significantly less nice, and which may also 
> become unworkable over time.

Do you have a patch that shows the type of interface you would like to
see?  Like James stated, if you do not participate in the development
process, we have no way of knowing what you even want from the kernel.

What has kept you from submitting your code for inclusion in the main
kernel source tree?

Right now, your customers void their support warranties if they run your
software, as it can not be supported by the distros as an out-of-tree
kernel module.  I'm sure your customers would like to not have this
problem.

thanks,

greg k-h


Re: 2.6.24-rc3-mm2 - Build Failure on powerpc timerfd() undeclared

2007-11-28 Thread Arnd Bergmann
On Wednesday 28 November 2007 19:43:45 Andrew Morton wrote:
> > I guess all architectures except x86 are currently broken because they
> > reference the old sys_timerfd function.
>
> None of them were broken in my testing and I'm unsure why powerpc broke
> here.

PowerPC is unique in that it actually relies on the declarations
in include/{linux,asm}/syscalls.h to be present, because the
spu_syscall_table is generated from C code, not from assembly.
One reason why I did this was to be sure to find this exact
type of problem at compile-time, not at link time.
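
A toy illustration of that compile-time property. The function names and the table are hypothetical stand-ins and have nothing to do with the real spu_syscall_table; the point is only that a table written in C can name nothing that lacks a visible declaration.

```c
#include <stddef.h>

/* Hypothetical declarations, standing in for include/linux/syscalls.h. */
long sys_getpid_demo(void);
long sys_timerfd_demo(void);

typedef long (*syscall_fn)(void);

/*
 * Because the table is ordinary C, every entry must name a declared
 * function. If a declaration is removed or renamed, this file stops
 * compiling, instead of failing later at link time.
 */
static const syscall_fn demo_syscall_table[] = {
	sys_getpid_demo,
	sys_timerfd_demo,
};

long sys_getpid_demo(void) { return 42; }
long sys_timerfd_demo(void) { return 3; }

long call_demo_syscall(size_t nr)
{
	if (nr >= sizeof(demo_syscall_table) / sizeof(demo_syscall_table[0]))
		return -1;	/* stands in for -ENOSYS */
	return demo_syscall_table[nr]();
}
```

A table assembled from assembly directives would only carry symbol names, so a stale name would go unnoticed until the linker tried to resolve it.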

Arnd <><


Re: Out of tree module using LSM

2007-11-28 Thread Jan Engelhardt

On Nov 28 2007 16:38, Greg KH wrote:
>> 
>> And if we are talking about the situation when files are written to
>> in controlled way (i.e. we are not concerned with malware running on
>> the box in question and just want to stop it from passing through
>> mailsewer, etc.), then there's no damn need to play with LSM - just
>> have e.g. coda with its commit-on-close and run the scanner on
>> commit.  End of story.  Mind you, in such setups one would be much
>> better off just having the mail server run the tests explicitly in
>> the userland, along with the rest of anti-spam, etc. filters.
>
>I've repeated the above statements so many times to a number of the
>anti-virus companies, and other people that really should know better,
>that I'm really sick of it.  For some reason, they keep trying to do
>things like this in the kernel, despite it being trivial to do in
>userspace properly.
>
Do you mean something along the lines of FUSE?




Re: Out of tree module using LSM

2007-11-28 Thread Jan Engelhardt

On Nov 28 2007 18:22, [EMAIL PROTECTED] wrote:
>
>Talpa is modular itself being composed of a set of kernel modules of which 
>not all are loaded simultaneously. Where possible LSM can be used and _no_ 
>messing with syscall table will take place. Unfortunately where another 
>LSM user is present that won't work

SELinux supports chaining, so if talpa is loaded as a secondary to selinux,
where is the problem? For those LSMs which do not support chaining (*cough*
apparmor *cough* be one, mtadm another), fix them.



2.6.23.9-rt12: BUGs

2007-11-28 Thread Fernando Lopez-Lezcano
> I'll try rt12...
> 
> Same problems in rt12, getting lots of "delay of xxx usecs exceeds
> estimated spare time of ; restart" in jackd (on my T61 Lenovo laptop
> running fc7). Does not happen with 2.6.22.10 + rt9. This is both with
> the internal snd-hda-intel card and a pcmcia rme hdsp multiface. 

While trying out 2.6.23.9-rt12 I got the three attached bugs. 
Also attached is the output of dmesg for a clean boot on the machine. 

Jack displays timing problems, similar to when there were timing 
issues with dual processor machines. Still investigating as time 
permits. 

-- Fernando

--- apparently while suspending ---

Nov 27 20:06:01 localhost kernel: Stopping tasks ... done.
Nov 27 20:06:01 localhost kernel: Suspending console(s)
Nov 27 20:06:01 localhost kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Nov 27 20:06:01 localhost kernel: sd 0:0:0:0: [sda] Stopping disk
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :15:00.2 disabled
Nov 27 20:06:01 localhost kernel: eth%d: Going into suspend...
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :03:00.0 disabled
Nov 27 20:06:01 localhost pcscd: winscard_msg_srv.c:238:SHMProcessEventsContext() select returns with failure: Interrupted system call
Nov 27 20:06:01 localhost pcscd: winscard_svc.c:222:ContextThread() Error in SHMProcessEventsContext
Nov 27 20:06:01 localhost pcscd: winscard_msg_srv.c:238:SHMProcessEventsContext() select returns with failure: Interrupted system call
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1f.2 disabled
Nov 27 20:06:01 localhost pcscd: winscard_svc.c:222:ContextThread() Error in SHMProcessEventsContext
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.7 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.2 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.1 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.0 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1b.0 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1a.7 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1a.1 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1a.0 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:19.0 disabled
Nov 27 20:06:01 localhost kernel: Disabling non-boot CPUs ...
Nov 27 20:06:01 localhost kernel: Breaking affinity for irq 218
Nov 27 20:06:01 localhost kernel: CPU 1 is now offline
Nov 27 20:06:01 localhost kernel: SMP alternatives: switching to UP code
Nov 27 20:06:01 localhost kernel: BUG: sleeping function called from invalid context pm-suspend(3740) at kernel/rtmutex.c:637
Nov 27 20:06:01 localhost gnome-power-manager: (nando) DBUS timed out, but recovering
Nov 27 20:06:01 localhost kernel: in_atomic():0 [], irqs_disabled():1
Nov 27 20:06:01 localhost kernel:  [] __rt_spin_lock+0x21/0x3d
Nov 27 20:06:01 localhost kernel:  [] free_pages_bulk+0x28/0x188
Nov 27 20:06:01 localhost kernel:  [] __drain_pages+0x48/0x69
Nov 27 20:06:01 localhost kernel:  [] page_alloc_cpu_notify+0x1e/0x3d
Nov 27 20:06:01 localhost kernel:  [] notifier_call_chain+0x2a/0x47
Nov 27 20:06:01 localhost kernel:  [] raw_notifier_call_chain+0x17/0x1a
Nov 27 20:06:01 localhost kernel:  [] _cpu_down+0x184/0x242
Nov 27 20:06:01 localhost kernel:  [] disable_nonboot_cpus+0x4e/0xd2
Nov 27 20:06:01 localhost kernel:  [] acpi_sleep_prepare+0x41/0x48
Nov 27 20:06:01 localhost kernel:  [] suspend_devices_and_enter+0x64/0x96
Nov 27 20:06:01 localhost kernel:  [] enter_state+0x11b/0x193
Nov 27 20:06:01 localhost kernel:  [] state_store+0x8e/0xa2
Nov 27 20:06:01 localhost kernel:  [] state_store+0x0/0xa2
Nov 27 20:06:01 localhost kernel:  [] subsys_attr_store+0x27/0x2b
Nov 27 20:06:01 localhost kernel:  [] sysfs_write_file+0xa6/0xd9
Nov 27 20:06:01 localhost kernel:  [] sysfs_write_file+0x0/0xd9
Nov 27 20:06:01 localhost kernel:  [] vfs_write+0xa8/0x15a
Nov 27 20:06:01 localhost gnome-power-manager: (nando) Resuming computer
Nov 27 20:06:01 localhost kernel:  [] sys_write+0x41/0x67
Nov 27 20:06:01 localhost kernel:  [] syscall_call+0x7/0xb
Nov 27 20:06:01 localhost kernel:  [] xfrm_send_policy_notify+0x44f/0x4f4
Nov 27 20:06:01 localhost NetworkManager:   Waking up from sleep. 
Nov 27 20:06:01 localhost kernel:  ===
Nov 27 20:06:01 localhost NetworkManager:   Deactivating device eth1. 
Nov 27 20:06:01 localhost kernel: CPU1 is down
Nov 27 20:06:01 localhost NetworkManager:   eth1: Device is fully-supported using driver 'e1000'. 
Nov 27 20:06:01 localhost kernel: Intel machine check architecture supported.
Nov 27 20:06:01 localhost NetworkManager:   nm_device_init(): waiting for device's worker thread to start 
Nov 27 20:06:01 localhost kernel: Intel machine check reporting enabled on CPU#0.

[PATCH x86/mm 6/6] x86-64 ia32 ptrace get/putreg32 current task

2007-11-28 Thread Roland McGrath

This generalizes the getreg32 and putreg32 functions so they can be used on
the current task, as well as on a task stopped in TASK_TRACED and switched
off.  This lays the groundwork to share this code for all kinds of
user-mode machine state access, not just ptrace.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/ia32/ptrace32.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/ia32/ptrace32.c b/arch/x86/ia32/ptrace32.c
index c52d066..d5663e2 100644
--- a/arch/x86/ia32/ptrace32.c
+++ b/arch/x86/ia32/ptrace32.c
@@ -48,19 +48,27 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
if (val && (val & 3) != 3)
return -EIO;
child->thread.fsindex = val & 0x;
+   if (child == current)
+   loadsegment(fs, child->thread.fsindex);
break;
case offsetof(struct user32, regs.gs):
if (val && (val & 3) != 3)
return -EIO;
child->thread.gsindex = val & 0x;
+   if (child == current)
+   load_gs_index(child->thread.gsindex);
break;
case offsetof(struct user32, regs.ds):
if (val && (val & 3) != 3)
return -EIO;
child->thread.ds = val & 0x;
+   if (child == current)
+   loadsegment(ds, child->thread.ds);
break;
case offsetof(struct user32, regs.es):
child->thread.es = val & 0x;
+   if (child == current)
+   loadsegment(es, child->thread.es);
break;
case offsetof(struct user32, regs.ss):
if ((val & 3) != 3)
@@ -129,15 +137,23 @@ static int getreg32(struct task_struct *child, unsigned regno, u32 *val)
switch (regno) {
case offsetof(struct user32, regs.fs):
*val = child->thread.fsindex;
+   if (child == current)
+   asm("movl %%fs,%0" : "=r" (*val));
break;
case offsetof(struct user32, regs.gs):
*val = child->thread.gsindex;
+   if (child == current)
+   asm("movl %%gs,%0" : "=r" (*val));
break;
case offsetof(struct user32, regs.ds):
*val = child->thread.ds;
+   if (child == current)
+   asm("movl %%ds,%0" : "=r" (*val));
break;
case offsetof(struct user32, regs.es):
*val = child->thread.es;
+   if (child == current)
+   asm("movl %%es,%0" : "=r" (*val));
break;
 
R32(cs, cs);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH x86/mm 5/6] x86-32 ptrace get/putreg current task

2007-11-28 Thread Roland McGrath

This generalizes the getreg and putreg functions so they can be used on the
current task, as well as on a task stopped in TASK_TRACED and switched off.
This lays the groundwork to share this code for all kinds of user-mode
machine state access, not just ptrace.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_32.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/ptrace_32.c b/arch/x86/kernel/ptrace_32.c
index 5aca84e..2607130 100644
--- a/arch/x86/kernel/ptrace_32.c
+++ b/arch/x86/kernel/ptrace_32.c
@@ -55,6 +55,12 @@ static int putreg(struct task_struct *child,
if (value && (value & 3) != 3)
return -EIO;
child->thread.gs = value;
+   if (child == current)
+   /*
+* The user-mode %gs is not affected by
+* kernel entry, so we must update the CPU.
+*/
+   loadsegment(gs, value);
return 0;
case DS:
case ES:
@@ -104,6 +110,8 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
break;
case GS:
retval = child->thread.gs;
+   if (child == current)
+   savesegment(gs, retval);
break;
case DS:
case ES:
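
For readers following the selector checks in putreg(), here is a minimal userspace sketch of the validation rule visible in the context lines: a data-segment selector may be null or must carry RPL 3 in its two low bits, while %cs and %ss may never be null. The helper name check_user_selector() and its allow_null parameter are made up for illustration and do not exist in the kernel.

```c
#include <errno.h>

/*
 * Hypothetical helper mirroring the checks in putreg():
 *   data segments:  if (value && (value & 3) != 3) return -EIO;
 *   %cs and %ss:    if ((value & 3) != 3)          return -EIO;
 * allow_null selects between the two forms.
 */
static int check_user_selector(unsigned long value, int allow_null)
{
	if (allow_null && value == 0)
		return 0;		/* null selector is fine for ds/es/fs/gs */
	if ((value & 3) != 3)
		return -EIO;		/* requested privilege level is not 3 */
	return 0;
}
```

Only after a check like this passes does the patch store the value and, for the current task, load it into the CPU.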


[PATCH x86/mm 4/6] x86-64 ptrace get/putreg current task

2007-11-28 Thread Roland McGrath

This generalizes the getreg and putreg functions so they can be used on the
current task, as well as on a task stopped in TASK_TRACED and switched off.
This lays the groundwork to share this code for all kinds of user-mode
machine state access, not just ptrace.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_64.c |   36 ++--
 1 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ptrace_64.c b/arch/x86/kernel/ptrace_64.c
index 2427548..5979dbe 100644
--- a/arch/x86/kernel/ptrace_64.c
+++ b/arch/x86/kernel/ptrace_64.c
@@ -67,21 +67,29 @@ static int putreg(struct task_struct *child,
if (value && (value & 3) != 3)
return -EIO;
child->thread.fsindex = value & 0x;
+   if (child == current)
+   loadsegment(fs, child->thread.fsindex);
return 0;
case offsetof(struct user_regs_struct,gs):
if (value && (value & 3) != 3)
return -EIO;
child->thread.gsindex = value & 0x;
+   if (child == current)
+   load_gs_index(child->thread.gsindex);
return 0;
case offsetof(struct user_regs_struct,ds):
if (value && (value & 3) != 3)
return -EIO;
child->thread.ds = value & 0x;
+   if (child == current)
+   loadsegment(ds, child->thread.ds);
return 0;
case offsetof(struct user_regs_struct,es):
if (value && (value & 3) != 3)
return -EIO;
child->thread.es = value & 0x;
+   if (child == current)
+   loadsegment(es, child->thread.es);
return 0;
case offsetof(struct user_regs_struct,ss):
if ((value & 3) != 3)
@@ -135,14 +143,32 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
 {
struct pt_regs *regs = task_pt_regs(child);
unsigned long val;
+   unsigned int seg;
switch (regno) {
case offsetof(struct user_regs_struct, fs):
+   if (child == current) {
+   /* Older gas can't assemble movq %?s,%r?? */
+   asm("movl %%fs,%0" : "=r" (seg));
+   return seg;
+   }
return child->thread.fsindex;
case offsetof(struct user_regs_struct, gs):
+   if (child == current) {
+   asm("movl %%gs,%0" : "=r" (seg));
+   return seg;
+   }
return child->thread.gsindex;
case offsetof(struct user_regs_struct, ds):
+   if (child == current) {
+   asm("movl %%ds,%0" : "=r" (seg));
+   return seg;
+   }
return child->thread.ds;
case offsetof(struct user_regs_struct, es):
+   if (child == current) {
+   asm("movl %%es,%0" : "=r" (seg));
+   return seg;
+   }
return child->thread.es;
case offsetof(struct user_regs_struct, fs_base):
/*
@@ -152,7 +178,10 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
 */
if (child->thread.fs != 0)
return child->thread.fs;
-   if (child->thread.fsindex != FS_TLS_SEL)
+   seg = child->thread.fsindex;
+   if (child == current)
+   asm("movl %%fs,%0" : "=r" (seg));
+   if (seg != FS_TLS_SEL)
return 0;
return get_desc_base(&child->thread.tls_array[FS_TLS]);
case offsetof(struct user_regs_struct, gs_base):
@@ -161,7 +190,10 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
 */
if (child->thread.gs != 0)
return child->thread.gs;
-   if (child->thread.gsindex != GS_TLS_SEL)
+   seg = child->thread.gsindex;
+   if (child == current)
+   asm("movl %%gs,%0" : "=r" (seg));
+   if (seg != GS_TLS_SEL)
return 0;
return get_desc_base(&child->thread.tls_array[GS_TLS]);
case offsetof(struct user_regs_struct, flags):
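
The "child == current" branches above read the live segment register instead of the saved thread field. As a rough userspace illustration of the same inline-asm form (not kernel code), the sketch below reads %cs and %fs from a running process. The function names are invented here, and the non-x86 branch is only a placeholder so the file still builds elsewhere.

```c
#if defined(__x86_64__) || defined(__i386__)
/* Read the live %cs selector with the 32-bit mov form used in the patch. */
static unsigned int read_cs(void)
{
	unsigned int seg;

	__asm__ volatile("movl %%cs,%0" : "=r" (seg));
	return seg & 0xffff;	/* a selector is 16 bits wide */
}

/* Same for %fs; in user space this is often 0 on x86-64. */
static unsigned int read_fs(void)
{
	unsigned int seg;

	__asm__ volatile("movl %%fs,%0" : "=r" (seg));
	return seg & 0xffff;
}
#else
/* Placeholder for other architectures: returns fake values, not real state. */
static unsigned int read_cs(void) { return 3; }
static unsigned int read_fs(void) { return 0; }
#endif
```

In a user-mode process the low two bits of %cs are 3, which is the same property the putreg() checks enforce on values written by a debugger.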


[PATCH x86/mm 3/6] x86-32 ptrace whitespace

2007-11-28 Thread Roland McGrath

This canonicalizes the indentation in the getreg and putreg functions.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_32.c |  110 +-
 1 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kernel/ptrace_32.c b/arch/x86/kernel/ptrace_32.c
index f81e2f1..5aca84e 100644
--- a/arch/x86/kernel/ptrace_32.c
+++ b/arch/x86/kernel/ptrace_32.c
@@ -51,37 +51,37 @@ static int putreg(struct task_struct *child,
struct pt_regs *regs = task_pt_regs(child);
regno >>= 2;
switch (regno) {
-   case GS:
-   if (value && (value & 3) != 3)
-   return -EIO;
-   child->thread.gs = value;
-   return 0;
-   case DS:
-   case ES:
-   case FS:
-   if (value && (value & 3) != 3)
-   return -EIO;
-   value &= 0x;
-   break;
-   case SS:
-   case CS:
-   if ((value & 3) != 3)
-   return -EIO;
-   value &= 0x;
-   break;
-   case EFL:
-   value &= FLAG_MASK;
-   /*
-* If the user value contains TF, mark that
-* it was not "us" (the debugger) that set it.
-* If not, make sure it stays set if we had.
-*/
-   if (value & X86_EFLAGS_TF)
-   clear_tsk_thread_flag(child, TIF_FORCED_TF);
-   else if (test_tsk_thread_flag(child, TIF_FORCED_TF))
-   value |= X86_EFLAGS_TF;
-   value |= regs->flags & ~FLAG_MASK;
-   break;
+   case GS:
+   if (value && (value & 3) != 3)
+   return -EIO;
+   child->thread.gs = value;
+   return 0;
+   case DS:
+   case ES:
+   case FS:
+   if (value && (value & 3) != 3)
+   return -EIO;
+   value &= 0x;
+   break;
+   case SS:
+   case CS:
+   if ((value & 3) != 3)
+   return -EIO;
+   value &= 0x;
+   break;
+   case EFL:
+   value &= FLAG_MASK;
+   /*
+* If the user value contains TF, mark that
+* it was not "us" (the debugger) that set it.
+* If not, make sure it stays set if we had.
+*/
+   if (value & X86_EFLAGS_TF)
+   clear_tsk_thread_flag(child, TIF_FORCED_TF);
+   else if (test_tsk_thread_flag(child, TIF_FORCED_TF))
+   value |= X86_EFLAGS_TF;
+   value |= regs->flags & ~FLAG_MASK;
+   break;
}
*pt_regs_access(regs, regno) = value;
return 0;
@@ -94,26 +94,26 @@ static unsigned long getreg(struct task_struct *child, unsigned long regno)
 
regno >>= 2;
switch (regno) {
-   case EFL:
-   /*
-* If the debugger set TF, hide it from the readout.
-*/
-   retval = regs->flags;
-   if (test_tsk_thread_flag(child, TIF_FORCED_TF))
-   retval &= ~X86_EFLAGS_TF;
-   break;
-   case GS:
-   retval = child->thread.gs;
-   break;
-   case DS:
-   case ES:
-   case FS:
-   case SS:
-   case CS:
-   retval = 0x;
-   /* fall through */
-   default:
-   retval &= *pt_regs_access(regs, regno);
+   case EFL:
+   /*
+* If the debugger set TF, hide it from the readout.
+*/
+   retval = regs->flags;
+   if (test_tsk_thread_flag(child, TIF_FORCED_TF))
+   retval &= ~X86_EFLAGS_TF;
+   break;
+   case GS:
+   retval = child->thread.gs;
+   break;
+   case DS:
+   case ES:
+   case FS:
+   case SS:
+   case CS:
+   retval = 0x;
+   /* fall through */
+   default:
+   retval &= *pt_regs_access(regs, regno);
}
return retval;
 }
@@ -190,7 +190,7 @@ static int ptrace_set_debugreg(struct task_struct *child,
  * Make sure the single step bit is not set.
  */
 void ptrace_disable(struct task_struct *child)
-{ 
+{
user_disable_single_step(child);
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
 }
@@ -203,7 +203,7 @@ long 

[PATCH x86/mm 2/6] x86-64 ptrace whitespace

2007-11-28 Thread Roland McGrath

This canonicalizes the indentation in the getreg and putreg functions.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/kernel/ptrace_64.c |  224 +-
 1 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/arch/x86/kernel/ptrace_64.c b/arch/x86/kernel/ptrace_64.c
index 56b31cd..2427548 100644
--- a/arch/x86/kernel/ptrace_64.c
+++ b/arch/x86/kernel/ptrace_64.c
@@ -2,7 +2,7 @@
 /*
  * Pentium III FXSR, SSE support
  * Gareth Hughes <[EMAIL PROTECTED]>, May 2000
- * 
+ *
  * x86-64 port 2000-2002 Andi Kleen
  */
 
@@ -48,7 +48,7 @@
  * Make sure the single step bit is not set.
  */
 void ptrace_disable(struct task_struct *child)
-{ 
+{
user_disable_single_step(child);
 }
 
@@ -63,69 +63,69 @@ static int putreg(struct task_struct *child,
 {
struct pt_regs *regs = task_pt_regs(child);
switch (regno) {
-   case offsetof(struct user_regs_struct,fs):
-   if (value && (value & 3) != 3)
-   return -EIO;
-   child->thread.fsindex = value & 0x; 
-   return 0;
-   case offsetof(struct user_regs_struct,gs):
-   if (value && (value & 3) != 3)
-   return -EIO;
-   child->thread.gsindex = value & 0x;
-   return 0;
-   case offsetof(struct user_regs_struct,ds):
-   if (value && (value & 3) != 3)
-   return -EIO;
-   child->thread.ds = value & 0x;
-   return 0;
-   case offsetof(struct user_regs_struct,es): 
-   if (value && (value & 3) != 3)
-   return -EIO;
-   child->thread.es = value & 0x;
-   return 0;
-   case offsetof(struct user_regs_struct,ss):
-   if ((value & 3) != 3)
-   return -EIO;
-   value &= 0x;
-   return 0;
-   case offsetof(struct user_regs_struct,fs_base):
-   if (value >= TASK_SIZE_OF(child))
-   return -EIO;
-   /*
-* When changing the segment base, use do_arch_prctl
-* to set either thread.fs or thread.fsindex and the
-* corresponding GDT slot.
-*/
-   if (child->thread.fs != value)
-   return do_arch_prctl(child, ARCH_SET_FS, value);
-   return 0;
-   case offsetof(struct user_regs_struct,gs_base):
-   /*
-* Exactly the same here as the %fs handling above.
-*/
-   if (value >= TASK_SIZE_OF(child))
-   return -EIO;
-   if (child->thread.gs != value)
-   return do_arch_prctl(child, ARCH_SET_GS, value);
-   return 0;
-   case offsetof(struct user_regs_struct,flags):
-   value &= FLAG_MASK;
-   /*
-* If the user value contains TF, mark that
-* it was not "us" (the debugger) that set it.
-* If not, make sure it stays set if we had.
-*/
-   if (value & X86_EFLAGS_TF)
-   clear_tsk_thread_flag(child, TIF_FORCED_TF);
-   else if (test_tsk_thread_flag(child, TIF_FORCED_TF))
-   value |= X86_EFLAGS_TF;
-   value |= regs->flags & ~FLAG_MASK;
-   break;
-   case offsetof(struct user_regs_struct,cs): 
-   if ((value & 3) != 3)
-   return -EIO;
-   value &= 0x;
-   break;
+   case offsetof(struct user_regs_struct,fs):
+   if (value && (value & 3) != 3)
+   return -EIO;
+   child->thread.fsindex = value & 0x;
+   return 0;
+   case offsetof(struct user_regs_struct,gs):
+   if (value && (value & 3) != 3)
+   return -EIO;
+   child->thread.gsindex = value & 0x;
+   return 0;
+   case offsetof(struct user_regs_struct,ds):
+   if (value && (value & 3) != 3)
+   return -EIO;
+   child->thread.ds = value & 0x;
+   return 0;
+   case offsetof(struct user_regs_struct,es):
+   if (value && (value & 3) != 3)
+   return -EIO;
+   child->thread.es = value & 0x;
+ 

Re: void* arithmetic

2007-11-28 Thread Jan Engelhardt

On Nov 29 2007 01:05, J.A. Magallón wrote:
>
>Since begin of the ages the build of the nvidia driver says things like
>this:
>

Explicitly adding -Wpointer-arith to ones own Makefile is like
admitting the code might be problematic. :->


I think sizeof(void) == 1 is taken for granted as much as sizeof(int) >= 4
these days. Sigh.



Re: Out of tree module using LSM

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 06:30:40PM +, Al Viro wrote:
> On Wed, Nov 28, 2007 at 01:15:05PM -0500, [EMAIL PROTECTED] wrote:
> > (Note that the concept has interesting implications in the other direction
> > as well - rather than stopping you from reading a file that has malware,
> > you could in theory write an anti-export package that would let you write
> > onto external memory or outbound e-mail, but prevent the write if it was
> > corporate-sensitive data, or whatever.
> 
> You _can_ _not_ do that.  If shared mapping gets dirtied, you have no way to
> intercept that.  At all.  Especially since the page stays mapped while it is
> written out, so the next modification can come when hardware had already
> started outbound DMA and there's no way to abort it, no matter what your
> external scanner would do.
> 
> Folks, really, that doesn't work.  At all.  You can intercept all system
> calls you want and it will not be enough to prevent the "bad" contents
> from hitting the disk.
> 
> And if we are talking about the situation when files are written to in
> controlled way (i.e. we are not concerned with malware running on the box
> in question and just want to stop it from passing through mailsewer, etc.),
> then there's no damn need to play with LSM - just have e.g. coda with its
> commit-on-close and run the scanner on commit.  End of story.  Mind you,
> in such setups one would be much better off just having the mail server run
> the tests explicitly in the userland, along with the rest of anti-spam, etc.
> filters.

I've repeated the above statements so many times to a number of the
anti-virus companies, and other people that really should know better,
that I'm really sick of it.  For some reason, they keep trying to do
things like this in the kernel, despite it being trivial to do in
userspace properly.

In the end, I even got one company to agree that it should be done in
userspace (McAfee), but they ignored this and went off to update their
kernel code again :(

Just because other operating systems require you to do things like this
within the kernel, doesn't mean that you have to do the same thing on
Linux...
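
To make Al's point about shared mappings concrete, here is a small userspace sketch, assuming a Linux system and an existing file of at least len bytes. The helper name dirty_via_shared_mapping() is invented for this example. Once mmap() has returned, the file contents change through an ordinary memory store, so there is no write(2) call for a syscall-level hook to inspect.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Overwrite the first len bytes of a file through a MAP_SHARED mapping.
 * Returns 0 on success, -1 on failure.
 */
static int dirty_via_shared_mapping(const char *path, const char *msg,
				    size_t len)
{
	char *map;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	memcpy(map, msg, len);		/* plain store: dirties the page */
	/*
	 * msync() only forces writeback for this demonstration; the kernel
	 * would write the dirty page back on its own eventually.
	 */
	msync(map, len, MS_SYNC);
	munmap(map, len);
	close(fd);
	return 0;
}
```

A scanner that only watches write() and close() never sees the memcpy() above, which is why the check has to live somewhere else, such as a commit-on-close filesystem or the mail server itself.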

so sad,

greg k-h


[PATCH x86/mm 1/6] x86-64 ia32 ptrace pt_regs cleanup

2007-11-28 Thread Roland McGrath

This cleans up the getreg32/putreg32 functions to use struct pt_regs in a
straightforward fashion, instead of equivalent ugly pointer arithmetic.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 arch/x86/ia32/ptrace32.c |   21 +
 1 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/ia32/ptrace32.c b/arch/x86/ia32/ptrace32.c
index 1e382e3..c52d066 100644
--- a/arch/x86/ia32/ptrace32.c
+++ b/arch/x86/ia32/ptrace32.c
@@ -37,11 +37,11 @@
 
 #define R32(l,q)   \
case offsetof(struct user32, regs.l):   \
-   stack[offsetof(struct pt_regs, q) / 8] = val; break
+   regs->q = val; break;
 
 static int putreg32(struct task_struct *child, unsigned regno, u32 val)
 {
-   __u64 *stack = (__u64 *)task_pt_regs(child);
+   struct pt_regs *regs = task_pt_regs(child);
 
switch (regno) {
case offsetof(struct user32, regs.fs):
@@ -65,12 +65,12 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
case offsetof(struct user32, regs.ss):
if ((val & 3) != 3)
return -EIO;
-   stack[offsetof(struct pt_regs, ss)/8] = val & 0x;
+   regs->ss = val & 0x;
break;
case offsetof(struct user32, regs.cs):
if ((val & 3) != 3)
return -EIO;
-   stack[offsetof(struct pt_regs, cs)/8] = val & 0x;
+   regs->cs = val & 0x;
break;
 
R32(ebx, bx);
@@ -84,9 +84,7 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
R32(eip, ip);
R32(esp, sp);
 
-   case offsetof(struct user32, regs.eflags): {
-   __u64 *flags = &stack[offsetof(struct pt_regs, flags)/8];
-
+   case offsetof(struct user32, regs.eflags):
val &= FLAG_MASK;
/*
 * If the user value contains TF, mark that
@@ -97,9 +95,8 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
clear_tsk_thread_flag(child, TIF_FORCED_TF);
else if (test_tsk_thread_flag(child, TIF_FORCED_TF))
val |= X86_EFLAGS_TF;
-   *flags = val | (*flags & ~FLAG_MASK);
+   regs->flags = val | (regs->flags & ~FLAG_MASK);
break;
-   }
 
case offsetof(struct user32, u_debugreg[0]) ...
offsetof(struct user32, u_debugreg[7]):
@@ -123,11 +120,11 @@ static int putreg32(struct task_struct *child, unsigned regno, u32 val)
 
 #define R32(l,q)   \
case offsetof(struct user32, regs.l):   \
-   *val = stack[offsetof(struct pt_regs, q)/8]; break
+   *val = regs->q; break
 
 static int getreg32(struct task_struct *child, unsigned regno, u32 *val)
 {
-   __u64 *stack = (__u64 *)task_pt_regs(child);
+   struct pt_regs *regs = task_pt_regs(child);
 
switch (regno) {
case offsetof(struct user32, regs.fs):
@@ -160,7 +157,7 @@ static int getreg32(struct task_struct *child, unsigned regno, u32 *val)
/*
 * If the debugger set TF, hide it from the readout.
 */
-   *val = stack[offsetof(struct pt_regs, flags)/8];
+   *val = regs->flags;
if (test_tsk_thread_flag(child, TIF_FORCED_TF))
*val &= ~X86_EFLAGS_TF;
break;
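
The cleanup above swaps indexed access through a __u64 array for direct struct member access, keyed by offsetof() case labels. A cut-down userspace sketch of both styles follows; the *_sketch types are hypothetical stand-ins with only three registers, not the real kernel layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for struct pt_regs and struct user32. */
struct pt_regs_sketch { uint64_t bx, cx, flags; };
struct user32_regs_sketch { uint32_t ebx, ecx, eflags; };
struct user32_sketch { struct user32_regs_sketch regs; };

/* New style: the case label is the user-visible offset, the body a field. */
#define R32(l, q) \
	case offsetof(struct user32_sketch, regs.l): \
		regs->q = val; break

static int putreg32_sketch(struct pt_regs_sketch *regs, unsigned regno,
			   uint32_t val)
{
	switch (regno) {
	R32(ebx, bx);
	R32(ecx, cx);
	R32(eflags, flags);
	default:
		return -1;	/* not a register offset we know about */
	}
	return 0;
}

/* Old style: treat the register block as an array of 64-bit slots. */
static uint64_t read_old_style(struct pt_regs_sketch *regs, size_t off)
{
	uint64_t *stack = (uint64_t *)regs;

	return stack[off / 8];
}
```

Both styles address the same storage; the struct form simply lets the compiler do the offset arithmetic and keeps the field name visible at the point of use.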


Re: [BUG] Oops in USB / dev code plugging/unplugging multi flash reader

2007-11-28 Thread Greg KH
On Wed, Nov 28, 2007 at 06:17:39PM -0500, Mark Lord wrote:
> Greg KH wrote:
>> On Wed, Nov 28, 2007 at 03:02:35PM -0500, Mark Lord wrote:
>>> While testing a new USB reader/cable today,
>>> I was plugging/unplugging the USB multi-flash reader (22 in 1),
>>> and produced this weird oops.
>>>
>>> There's a locking problem in there somewhere, Greg.
>>>
>>> 2.6.23.8
>> Can you duplicate this without the closed source ATI graphics driver 
>> loaded?
> ..
>
> I don't know if I can reproduce it easily regardless.
> But that fglrx module has ZERO users, so it was completely benign here
> (I've now deleted it from my system).
>
> The tracebacks clearly show USB/dev error.

I'm not disagreeing, but I've seen some very strange crap over the years
come from those closed source video drivers so I do not trust them at
all.

If you can reproduce this without it loaded, please send the new oops
message to the linux-usb mailing list and the developers there will be
glad to work with you to track this down.

Oh, can you also reproduce it with 2.6.24-rc3?

thanks,

greg k-h


Re: [patch 1/1] Writeback fix for concurrent large and small file writes

2007-11-28 Thread Fengguang Wu
On Wed, Nov 28, 2007 at 11:29:57AM -0800, Michael Rubin wrote:
> >From [EMAIL PROTECTED] Wed Nov 28 11:10:06 2007
> Message-Id: <[EMAIL PROTECTED]>
> Date: Wed, 28 Nov 2007 11:01:21 -0800
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: [patch 1/1] Writeback fix for concurrent large and small file writes.
> 
> From: Michael Rubin <[EMAIL PROTECTED]>
> 
> Fixing a bug where writing to large files while concurrently writing to
> smaller ones creates a situation where writeback cannot keep up with the

Could you demonstrate the situation? Or if I guess it right, could it
be fixed by the following patch? (not a nack: If so, your patch could
also be considered as a general purpose improvement, instead of a bug
fix.)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 0fca820..62e62e2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -301,7 +301,7 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
 * Someone redirtied the inode while were writing back
 * the pages.
 */
-   redirty_tail(inode);
+   requeue_io(inode);
} else if (atomic_read(&inode->i_count)) {
/*
 * The inode is clean, inuse

Thank you,
Fengguang

> traffic and memory balloons until we hit the threshold watermark. This
> can result in surprising latency spikes when syncing. This latency
> can take minutes on large memory systems. Upon request I can provide
> a test to reproduce this situation. The flush tree fixes this issue and
> fixes several other minor issues with fairness also.
> 
> 1) Adding a data structure to guarantee fairness when writing inodes
> to disk.  The flush_tree is based on an rbtree. The only difference is
> how duplicate keys are chained off the same rb_node.
> 
> 2) Added a FS flag to mark file systems that are not disk backed so we
> don't have to flush them. Not sure I marked all of them. But just marking
> these improves writeback performance.
> 
> 3) Added an inode flag to allow inodes to be marked so that they are
> never written back to disk. See get_pipe_inode.
> 
> Under autotest this patch has passed: fsx, bonnie, and iozone. I am
> currently writing more writeback focused tests (which so far have been
> passed) to add into autotest.
> 
> Signed-off-by: Michael Rubin <[EMAIL PROTECTED]>
> ---



Re: How to map user space's virtual memory into kernel logical address space

2007-11-28 Thread Robert Hancock

Maitre Bart wrote:

> A given app is allocating a large amount of memory (~10M) with
> malloc().
> It passes this pointer to the kernel (device driver) via a custom
> ioctl.
> I would like the driver to work on that memory with a pointer (as if
> it was allocated with vmalloc) as well as the user space too (upon
> return of the syscall).
> Is there a way to map a user space's virtual memory range into the
> kernel logical address space?
>
> As far as I learned from my readings, using the user-space pointer
> directly in kernel space will not work.
>
> Of course, copy_from_user() is out of question for efficiency
> purposes.
>
> ioremap() is pretty close to what I wish to do except that it accepts
> a physical address and I don't know how to get it from a user space
> pointer. And since a physical address is required, I assume the range
> is considered contiguous, which is not really the case for malloc().
>
> mmap()/remap_pfn_range() are interesting but I don't know how to get a
> kernel pointer out of them.
>
> kmap() does the job for a single page (and anyway, I wouldn't know how
> to feed it with a struct page from the userland pointer).
>
> get_user_pages() looks promising but it seems I have to call kmap() on
> each page, so it looks like I cannot operate on the buffer with a
> single pointer.
>
> Does anyone know if it is possible? And if so, how can I do it?


10MB is an awfully big mapping to put into kernel virtual memory space. 
I suspect it might be easier to allocate the memory in the kernel and 
map it in from userspace, but then you have the same problem (and 10MB 
is awfully big for vmalloc).


Is there a good reason why you have to be able to do this? There's 
likely a better way.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



[PATCH] (2.6.24-rc3-mm2) -mm Update CAP_LAST_CAP to reflect CAP_MAC_ADMIN

2007-11-28 Thread Casey Schaufler
From: Casey Schaufler <[EMAIL PROTECTED]>

Bump the value of CAP_LAST_CAP to reflect the current last cap value.
It appears that the patch that introduced CAP_LAST_CAP and the patch
that introduced CAP_MAC_ADMIN came in more or less at the same time.

Signed-off-by: Casey Schaufler <[EMAIL PROTECTED]>

---

 include/linux/capability.h |8 
 1 file changed, 4 insertions(+), 4 deletions(-)


diff -uprN -X linux-2.6.24-rc3-mm2-base/Documentation/dontdiff linux-2.6.24-rc3-mm2-base/include/linux/capability.h linux-2.6.24-rc3-mm2-lastcap/include/linux/capability.h
--- linux-2.6.24-rc3-mm2-base/include/linux/capability.h 2007-11-27 16:47:02.0 -0800
+++ linux-2.6.24-rc3-mm2-lastcap/include/linux/capability.h 2007-11-28 14:04:57.0 -0800
@@ -315,10 +315,6 @@ typedef struct kernel_cap_struct {
 
 #define CAP_SETFCAP 31
 
-#define CAP_LAST_CAP CAP_SETFCAP
-
-#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
-
 /* Override MAC access.
The base kernel enforces no MAC policy.
An LSM may enforce a MAC policy, and if it does and it chooses
@@ -336,6 +332,10 @@ typedef struct kernel_cap_struct {
 
 #define CAP_MAC_ADMIN 33
 
+#define CAP_LAST_CAP CAP_MAC_ADMIN
+
+#define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
+
 /*
  * Bit location of each capability (used by user-space library and kernel)
  */
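
A quick way to see what the move changes: with CAP_LAST_CAP still pointing at CAP_SETFCAP, cap_valid() rejects CAP_MAC_ADMIN. The sketch below reuses the two values from the diff; cap_valid_with() and the before/after wrappers are invented here so both states can be compared side by side.

```c
/* Capability numbers as they appear in the diff above. */
#define CAP_SETFCAP	31
#define CAP_MAC_ADMIN	33

/* Same range test as cap_valid(), with the upper bound made explicit. */
#define cap_valid_with(last, x) ((x) >= 0 && (x) <= (last))

/* Before the patch: CAP_LAST_CAP is CAP_SETFCAP. */
static int cap_valid_before(int cap)
{
	return cap_valid_with(CAP_SETFCAP, cap);
}

/* After the patch: CAP_LAST_CAP is CAP_MAC_ADMIN. */
static int cap_valid_after(int cap)
{
	return cap_valid_with(CAP_MAC_ADMIN, cap);
}
```

Any caller that gates on cap_valid() would therefore treat the newest capabilities as out of range until the define is bumped.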



Re: void* arithmetic

2007-11-28 Thread Arnaldo Carvalho de Melo
Em Thu, Nov 29, 2007 at 01:05:31AM +0100, J.A. Magallón escreveu:
> Hi all...
> 
> Since begin of the ages the build of the nvidia driver says things like
> this:
> 
> include/asm/compat.h:210: warning: pointer of type 'void *' used in arithmetic
> 
> There are several of this warnings. The code in question for this example
> is:
> 
> static __inline__ void __user *compat_alloc_user_space(long len)
> {
> struct pt_regs *regs = task_pt_regs(current);
> return (void __user *)regs->rsp - len;
> }
> 
> As this is dealing with mem blocks, I suppose it's counting in bytes, so
> we could do something like:
> 
>return (void __user *)((u8*)regs->rsp - len);
> 
> so the arithmetic knows how to inc/dec for each unity...
> I think the warning is correct and that void* arithmetic is undefined in C,
> isn't it ?

Yes, but not in gcc, the language the kernel is written in 8)

It is allowed as a GNU extension, and the size of a void is treated as 1.
-Wpointer-arith warns about this.

[EMAIL PROTECTED] ~]$ cat voidptr.c
#include <stdio.h>

int main(int argc, char *argv[])
{
void *ptr = argv[argc - 1];

puts(ptr + 4);
return 0;
}
[EMAIL PROTECTED] ~]$ gcc -Wall voidptr.c -o voidptr
[EMAIL PROTECTED] ~]$ ./a Magallón
llón
[EMAIL PROTECTED] ~]$ gcc -Wall -Wpointer-arith voidptr.c -o voidptr
voidptr.c: In function ‘main’:
voidptr.c:7: warning: pointer of type ‘void *’ used in arithmetic
[EMAIL PROTECTED] ~]$ ./a Magallón
llón
[EMAIL PROTECTED] ~]$
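
For code that has to build cleanly with -Wpointer-arith or with a non-GNU compiler, the usual portable form casts to char * before adding, along the lines of the u8 cast suggested in the quoted mail. A minimal sketch; byte_offset() is a made-up helper name:

```c
#include <stddef.h>

/* Advance a generic pointer by n bytes using only standard C. */
static void *byte_offset(void *ptr, size_t n)
{
	return (char *)ptr + n;	/* sizeof(char) is 1 by definition */
}
```

With gcc this yields the same address as ptr + n on a void pointer, just without relying on the extension.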

- Arnaldo


[PATCH 4/3] net/bonding: Adhere to coding style: break line after the if condition

2007-11-28 Thread Ferenc Wágner
Signed-off-by: Ferenc Wágner <[EMAIL PROTECTED]>
---
Randy Dunlap <[EMAIL PROTECTED]> writes:

> Wagner Ferenc wrote:
>> Randy Dunlap <[EMAIL PROTECTED]> writes:
>>
>>> Patches 1 & 3 use
>>>
>>> if (res) statement;
>>>
>>> but the preferred form is
>>>
>>> if (res)
>>> statement;
>>>
>>> Even if this style was already used in the source file, it should
>>> be cleaned up.
>>
>> No principal problem.  So that I learn something useful: how should I
>> go about this?  I created the patches with git-format-patch, and they
>> depend on each other, so I'd rather not git-reset, if possible...
>>
>> Can I just create a follow-up patch which fixes this stylistic issue?
>
> That's OK with me.  I can't say how it might be done with git.

 drivers/net/bonding/bond_sysfs.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 5c31f5c..9de2c52 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -91,7 +91,8 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buf)
}
res += sprintf(buf + res, "%s ", bond->dev->name);
}
-   if (res) buf[res-1] = '\n'; /* eat the leftover space */
+   if (res)
+   buf[res-1] = '\n'; /* eat the leftover space */
up_read(&(bonding_rwsem));
return res;
 }
@@ -239,7 +240,8 @@ static ssize_t bonding_show_slaves(struct device *d,
res += sprintf(buf + res, "%s ", slave->dev->name);
}
read_unlock(&bond->lock);
-   if (res) buf[res-1] = '\n'; /* eat the leftover space */
+   if (res)
+   buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 
@@ -705,7 +707,8 @@ static ssize_t bonding_show_arp_targets(struct device *d,
res += sprintf(buf + res, "%u.%u.%u.%u ",
   NIPQUAD(bond->params.arp_targets[i]));
}
-   if (res) buf[res-1] = '\n'; /* eat the leftover space */
+   if (res)
+   buf[res-1] = '\n'; /* eat the leftover space */
return res;
 }
 
-- 
1.4.4.4
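
The three hunks above all touch the same idiom: append each name followed by a space, then turn the final space into a newline. A standalone sketch of that pattern, assuming the caller supplies a large enough buffer; format_names() is a hypothetical name, not a function in bond_sysfs.c:

```c
#include <stdio.h>

/*
 * Write count names into buf separated by spaces and terminated by a
 * newline. Returns the number of characters written (0 if count is 0).
 */
static int format_names(char *buf, const char *const *names, int count)
{
	int res = 0;
	int i;

	for (i = 0; i < count; i++)
		res += sprintf(buf + res, "%s ", names[i]);
	if (res)
		buf[res - 1] = '\n';	/* eat the leftover space */
	return res;
}
```

The if (res) guard matters: with an empty list res is 0, and buf[res - 1] would write one byte before the start of the buffer.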



Re: named + capset = EPERM [Was: 2.6.24-rc3-mm2]

2007-11-28 Thread Serge E. Hallyn
Quoting Serge E. Hallyn ([EMAIL PROTECTED]):
> Quoting Serge E. Hallyn ([EMAIL PROTECTED]):
> > Quoting Casey Schaufler ([EMAIL PROTECTED]):
> > > 
> > > --- Jiri Slaby <[EMAIL PROTECTED]> wrote:
> > > 
> > > > On 11/28/2007 12:41 PM, Andrew Morton wrote:
> > > > >
> > > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm2/
> > > > [...]
> > > > > +capabilities-introduce-per-process-capability-bounding-set.patch
> > > > 
> > > > A regression against -mm1. This patch breaks bind (9.5.0-18.a7.fc8):
> > > > capset(0x19980330, 0,
> > > > {CAP_DAC_READ_SEARCH|CAP_SETGID|CAP_SETUID|CAP_NET_BIND_SERVICE|CAP_SYS_CHROOT|CAP_SYS_RESOURCE,
> > > > CAP_DAC_READ_SEARCH|CAP_SETGID|CAP_SETUID|CAP_NET_BIND_SERVICE|CAP_SYS_CHROOT|CAP_SYS_RESOURCE,
> > > > 0}) = -1 EPERM (Operation not permitted)
> > > > 
> > > > $ grep SEC .config
> > > > CONFIG_SECCOMP=y
> > > > # CONFIG_NETWORK_SECMARK is not set
> > > > CONFIG_RPCSEC_GSS_KRB5=m
> > > > # CONFIG_RPCSEC_GSS_SPKM3 is not set
> > > > # CONFIG_SECURITY is not set
> > > > # CONFIG_SECURITY_FILE_CAPABILITIES is not set
> > > > 
> > > > probably this hunk?:
> > > > @@ -133,6 +119,12 @@ int cap_capset_check (struct task_struct
> > > > /* incapable of using this inheritable set */
> > > > return -EPERM;
> > > > }
> > > > +   if (!!cap_issubset(*inheritable,
> > > > +  cap_combine(target->cap_inheritable,
> > > > +  current->cap_bset))) {
> > > > +   /* no new pI capabilities outside bounding set */
> > > > +   return -EPERM;
> > > > +   }
> > 
> > That shouldn't be it, since you can't lower cap_bset since
> > CONFIG_SECURITY_FILE_CAPABILITIES=n.
> 
> Hmm, but sure enough that appears to be it.
> 
> Still trying to figure out why.

No.  Seriously.  You're kidding me.

Patch attached  :(

Thanks for spotting this, Jiri.  I don't know where I introduced this
since I thought all my tests had passed...

thanks,
-serge

From 70d5da610fdbd66a36886c01e27b7fb11d2de044 Mon Sep 17 00:00:00 2001
From: [EMAIL PROTECTED] <[EMAIL PROTECTED](none)>
Date: Wed, 28 Nov 2007 16:16:23 -0800
Subject: [PATCH 1/1] capabilities: correct logic at capset_check

Fix typo at capset_check introduced with capability bounding set
patch.

Signed-off-by: [EMAIL PROTECTED] <[EMAIL PROTECTED](none)>
---
 security/commoncap.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index c25ad09..503e958 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -119,7 +119,7 @@ int cap_capset_check (struct task_struct *target, kernel_cap_t *effective,
 		/* incapable of using this inheritable set */
 		return -EPERM;
 	}
-	if (!!cap_issubset(*inheritable,
+	if (!cap_issubset(*inheritable,
 			   cap_combine(target->cap_inheritable,
 				       current->cap_bset))) {
 		/* no new pI capabilities outside bounding set */
-- 
1.5.1


