date:20070309

Re: the usage of DEBUG_DRIVER seems ambiguous

2007-03-09 Thread Robert P. J. Day

On Fri, 9 Mar 2007, Artem Bityutskiy wrote:

> Randy Dunlap wrote:
> > >   it's clearly a configuration variable, but it's also being used by
> > > itself in a few drivers/net/ source files.  is that deliberate?
> >
> > The ones in drivers/net/ are just local driver debug controls.
> > They happen to have the same name as a (likely newer) kconfig symbol.
> >
> > Is there a real problem that needs to be fixed?
>
> Renaming them just for the sake of being less confusing makes sense.

that's kind of what i had in mind.  i have a script that peruses the
source tree, checking for apparent typos in preprocessor directives
when someone forgets the leading "CONFIG_", and as long as that macro
name is the way it is, that example is going to be flagged every time
for no good reason.

if someone wants to make a suggestion, i can submit a simple renaming
patch.

rday

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
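
The script mentioned above is not shown in the thread. As a rough sketch of the kind of check it describes, assuming a hypothetical helper name and a caller-supplied list of Kconfig symbol names (both invented here for illustration), a macro is flagged when it lacks the "CONFIG_" prefix but matches a known Kconfig symbol:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the thread: returns true when
 * `macro` (as seen in an #ifdef) has no "CONFIG_" prefix but matches
 * the name of a known Kconfig symbol.
 */
bool missing_config_prefix(const char *macro,
                           const char *const *kconfig_syms, size_t n)
{
    size_t i;

    /* A properly prefixed macro is never flagged. */
    if (strncmp(macro, "CONFIG_", 7) == 0)
        return false;

    /* An unprefixed macro is suspicious only if Kconfig knows the name. */
    for (i = 0; i < n; i++)
        if (strcmp(macro, kconfig_syms[i]) == 0)
            return true;
    return false;
}
```

Under this heuristic a driver-local DEBUG_DRIVER macro is flagged for as long as a Kconfig symbol of the same name exists, which is the false positive described in the message.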

Re: [patch 1/3] Input: psmouse - create PS/2 protocol options for Kconfig

2007-03-09 Thread Dmitry Torokhov


On 3/9/07, Andres Salomon <[EMAIL PROTECTED]> wrote:
> I haven't seen patches in your tree; are you waiting for me to do the
> cleanups and resend?

Still in my private tree; will try to push out over the weekend.

--
Dmitry

Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]

2007-03-09 Thread Jiri Kosina

On Fri, 9 Mar 2007, Dmitry Torokhov wrote:

> > > > > (II) evdev brain: Rescanning devices (12).
> > > > > (II) evdev brain: Rescanning devices (13).
> > > > > (II) evdev brain: Rescanning devices (14).
> > > > > in this kernel, but I don't know if this is relevant.
> > > > > After booting back to .20-mm2 everything is OK.
> > > Thanks.  Cc's added.
> > Remains unsolved in 2.6.21-rc3-mm2.
> Does a PS/2 keyboard behave for you?
> Nowadays I forward all USB HID related issues to Jiri Kosina ;) (CCed).

Hi,

more importantly, does 2.6.21-rc3 work for you? There are not that many
USB HID/hidinput specific patches in -mm, so it would show clearly whether
it's a problem in USB HID/hidinput, or somewhere else.

What keyboard is that, please? (vendor/product IDs)

Also, if it turns out to be a HID problem - could you please send output of
both working and non-working kernels with hid/usbhid debugging enabled?

If this is present also in vanilla and not only in -mm, could you please 
try reverting commits 4237081e573b99a48991aa71364b0682c444651c and 
d4ae650a904612ffb7edd3f28b69b022988d2466 and let me know if the situation 
gets any better?

Thanks,

-- 
Jiri Kosina

Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-09 Thread Bill Davidsen


Linus Torvalds wrote:
> On Thu, 8 Mar 2007, Bill Davidsen wrote:
> > Please, could you now rethink pluggable scheduler as well? Even if one had to
> > be chosen at boot time and couldn't be changed thereafter, it would still allow
> > a few new thoughts to be included.
>
> No. Really.
>
> I absolutely *detest* pluggable schedulers. They have a huge downside:
> they allow people to think that it's ok to make special-case schedulers.

But it IS okay for people to make special-case schedulers. Because it's
MY machine, and how it behaves under mixed load is not a technical
issue, it's a POLICY issue, and therefore the only way you can allow the
admin to implement that policy is to either provide several schedulers
or to provide all sorts of tunable knobs. And by having a few schedulers
which have been heavily tested and reviewed, you can define the policy
the scheduler implements and document it. Instead of people writing
their own, or hacking the code, they could have a few well-tested
choices, with known policy goals.

> And I simply very fundamentally disagree.
>
> If you want to play with a scheduler of your own, go wild. It's easy
> (well, you'll find out that getting good results isn't, but that's a
> different thing). But actual pluggable schedulers just cause people to
> think that "oh, the scheduler performs badly under circumstance X, so
> let's tell people to use special scheduler Y for that case".

And has that been a problem with io schedulers? I don't see any vast
proliferation of them, I don't see contentious exchanges on LKML, or
people asking how to get yet another into mainline. In fact, I would say
that the io scheduler situation is as right as anything can be, choices
for special cases, lack of requests for something else.

> And CPU scheduling really isn't that complicated. It's *way* simpler than
> IO scheduling. There simply is *no*excuse* for not trying to do it well
> enough for all cases, or for having special-case stuff.

This supposes that the desired behavior, the policy, is the same on all
machines or that there is currently a way to set the target. If I want
interactive response with no consideration to batch (and can't trust
users to use nice), I want one policy. If I want a compromise, the
current scheduler or RSDL are candidates, but they do different things.

> But even IO scheduling actually ends up being largely the same. Yes, we
> have pluggable schedulers, and we even allow switching them, but in the
> end, we don't want people to actually do it. It's much better to have a
> scheduler that is "good enough" than it is to have five that are "perfect"
> for five particular cases.

We not only have multiple io schedulers, we have many tunable io
parameters, all of which allow people to make their system behave the
way they think is best. It isn't causing complaint, confusion, or
instability. We have many people requesting a different scheduler, so
obviously what we have isn't "good enough" and I doubt any one scheduler
can be, given that the target behavior is driven by non-technical choices.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


Re: Sleeping thread not receive signal until it wakes up

2007-03-09 Thread Sergey Vlasov

On Thu, 8 Mar 2007 14:52:07 -0800 Luong Ngo wrote:

[...]
> static irqreturn board_isr(int irq, void *dev_id, struct pt_regs* regs)
> {
>  spin_lock(&dev->lock);
>if (dev->irqMask & (1 << irqBit)) {
> // Set the interrupt event mask
> dev->irqEvent |= (1 << irqBit);
> 
> // Disable this irq, it will be reenabled after processed by board task
> disable_irq(irq);

I assume that your device does not support shared interrupts?  If it
does (and a PCI device is required to support them), you cannot use
disable_irq() here (and you need to check a register in the device to
find out if it really did generate an IRQ)...

> // Wake up Board thread that calling IOCTL
> wake_up(&(dev->boardIRQWaitQueue));
>   }
>   spin_unlock(&dev->lock);
> 
>   return IRQ_HANDLED;

...and return IRQ_NONE here if the IRQ is not from your device.

> 
> }
> 
> static int ats89_ioctl(struct inode *inode, struct file *file, u_int
> cmd, u_long arg)
> {
> 
>   switch(cmd){
>case GET_IRQ_CMD: {
> u32  regMask32;
> 
>spin_lock_irq(dev->lock);
>while ((dev->irqMask & dev->irqEvent) == 0) {
>  // Sleep until board interrupt happens
>  spin_unlock_irq(dev->lock);
>  interruptible_sleep_on(&(dev->boardIRQWaitQueue));
>  if (uncond_wakeup) {
>  /* don't go back to loop */
>  break;
>  }
>  spin_lock_irq(dev->lock);
>  }
> 
> uncond_wakeup = 0;
> 
>  // Board interrupt happened
> regMask32 = dev->irqMask & dev->irqEvent;
>  if(copy_to_user(&(((ATS89_IOCTL_S *)arg)->mask32),
> &regMask32, sizeof(u32))) {
>  spin_unlock_irq(dev->lock);
>  return -EAGAIN;
>  }
> 
>  // Clear the event mask
>  dev->irqEvent = 0;
>  spin_unlock_irq(dev->lock);
> }
> break;
> 
> 
>}
> }

And this code is full of bugs:

 1) As you have been told already, interruptible_sleep_on() and
sleep_on() functions are broken and should not be used (they are
left in the kernel only to support some obsolete code).  Either
use wait_event_interruptible() or work with wait queues directly
(prepare_to_wait(), finish_wait(), ...).

 2) The code to handle pending signals is missing - you need to have
this after wait_event_interruptible():

if (signal_pending(current))
return -ERESTARTSYS;

(but be careful - you might need to clean up something before
returning).

This is what causes your problem - interruptible_sleep_on()
returns if a signal is pending, but your code does not check for
signals and therefore invokes interruptible_sleep_on() again; but
if a signal is pending, interruptible_sleep_on() returns
immediately, causing your driver to eat 100% CPU looping in kernel
mode until some device event finally happens.

 3) If uncond_wakeup is set, you break out of the loop with dev->lock
unlocked; however, if dev->irqEvent gets set, you exit the loop
with dev->lock locked.  The subsequent code always unlocks
dev->lock, so in the uncond_wakeup case you have double unlock.

 4) You are doing copy_to_user() while holding a spinlock - this is
prohibited (as any other form of sleep inside a spinlock).

 5) The return code for the copy_to_user() failure is wrong - it
should be -EFAULT (this is not a fatal bug, but an annoyance for
users of your driver, who might get such nonstandard error codes
while debugging their programs and wonder what is going on).
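
The locking discipline behind points 3 to 5 can be sketched outside the kernel. The following is a userspace analogue only, not the driver fix itself: a pthread mutex and condition variable stand in for the spinlock and wait queue, memcpy() stands in for copy_to_user(), and every name in it is hypothetical. The signal handling from point 2 has no counterpart here.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the driver's per-device state. */
struct board {
    pthread_mutex_t lock;
    pthread_cond_t  irq_wait;      /* plays the role of boardIRQWaitQueue */
    uint32_t        irq_mask;
    uint32_t        irq_event;
    int             uncond_wakeup;
};

void board_init(struct board *dev, uint32_t irq_mask)
{
    pthread_mutex_init(&dev->lock, NULL);
    pthread_cond_init(&dev->irq_wait, NULL);
    dev->irq_mask = irq_mask;
    dev->irq_event = 0;
    dev->uncond_wakeup = 0;
}

/* Producer: stands in for the interrupt handler recording an event. */
void board_post_event(struct board *dev, unsigned int bit)
{
    pthread_mutex_lock(&dev->lock);
    dev->irq_event |= (uint32_t)1 << bit;   /* bit must be below 32 */
    pthread_cond_broadcast(&dev->irq_wait);
    pthread_mutex_unlock(&dev->lock);
}

/* Consumer: stands in for the GET_IRQ_CMD ioctl. Returns 0 or -EFAULT. */
int board_get_events(struct board *dev, uint32_t *user_mask)
{
    uint32_t snapshot;

    pthread_mutex_lock(&dev->lock);
    /* The condition is rechecked with the lock held after every wakeup. */
    while ((dev->irq_mask & dev->irq_event) == 0 && !dev->uncond_wakeup)
        pthread_cond_wait(&dev->irq_wait, &dev->lock);

    /* Single exit path from the loop: the lock is always held here (point 3). */
    dev->uncond_wakeup = 0;
    snapshot = dev->irq_mask & dev->irq_event;
    dev->irq_event = 0;
    pthread_mutex_unlock(&dev->lock);

    /* Copy out only after dropping the lock (point 4). */
    if (user_mask == NULL)
        return -EFAULT;            /* bad destination, not -EAGAIN (point 5) */
    memcpy(user_mask, &snapshot, sizeof(snapshot));
    return 0;
}
```

One design choice to note: the sketch clears the pending events before the copy, so a failed copy loses them. The quoted code clears them after the copy; doing that without copying under the lock means retaking the lock after the copy.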

Re: sys_write() racy for multi-threaded append?

2007-03-09 Thread Benjamin LaHaise

On Fri, Mar 09, 2007 at 04:19:55AM -0800, Michael K. Edwards wrote:
> On 3/8/07, Benjamin LaHaise <[EMAIL PROTECTED]> wrote:
> >Any number of things can cause a short write to occur, and rewinding the
> >file position after the fact is just as bad.  A sane app has to either
> >serialise the writes itself or use a thread safe API like pwrite().
> 
> Not on a pipe/FIFO.  Short writes there are flat out verboten by
> 1003.1 unless O_NONBLOCK is set.  (Not that f_pos is interesting on a
> pipe except as a "bytes sent" indicator  -- and in the multi-threaded
> scenario, if you do the speculative update that I'm suggesting, you
> can't 100% trust it unless you ensure that you are not in
> mid-read/write in some other thread at the moment you sample f_pos.
> But that doesn't make it useless.)

Writes to a pipe/FIFO are atomic so long as they are no larger than
PIPE_BUF, while f_pos on a pipe is undefined -- what exactly is the issue here?
The semantics you're assuming are not defined by POSIX.  Heck, even the
man page for one of the *BSDs states "Some devices are incapable of
seeking.  The value of the pointer associated with such a device is
undefined."  What part of undefined is problematic?

> As to what a "sane app" has to do: it's just not that unusual to write
> application code that treats a short read/write as a catastrophic
> error, especially when the fd is of a type that is known never to
> produce a short read/write unless something is drastically wrong.  For
> instance, I bomb on short write in audio applications where the driver
> is known to block until enough bytes have been read/written, period.
> When switching from reading a stream of audio frames from thread A to
> reading them from thread B, I may be willing to omit app
> serialization, because I can tolerate an imperfect hand-off in which
> thread A steals one last frame after thread B has started reading --
> as long as the fd doesn't get screwed up.  There is no reason for the
> generic sys_read code to leave a race open in which the same frame is
> read by both threads and a hardware buffer overrun results later.

I hope I don't have to run any of your software.  Short writes can and do
happen for a variety of reasons: signals, memory allocation failures,
quota being exceeded, and so on.  These are all error conditions the kernel
has to provide well-defined semantics for, as well-behaved applications will
try to handle them gracefully.

> In short, I'm not proposing that the kernel perfectly serialize
> concurrent reads and writes to arbitrary fd types.  I'm proposing that
> it not do something blatantly stupid and easily avoided in generic
> code that makes it impossible for any fd type to guarantee that, after
> 10 successful pipelined 100-byte reads or writes, f_pos will have
> advanced by 1000.

The semantics you're looking for are defined for regular files with 
O_APPEND.  Anything else is asking for synchronization that other 
applications do not require and do not desire.

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.
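
For readers who want to see what "handle short writes gracefully" with pwrite() can look like, here is a minimal userspace sketch. The helper name pwrite_all is made up for illustration, and treating a zero-byte result as an error is an assumption of this sketch, not something stated in the thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all `len` bytes of `buf` at explicit offset `off`, retrying
 * after short writes and EINTR. Returns 0 on success, -1 with errno
 * set on failure. The shared file position (f_pos) is never used, so
 * concurrent callers writing to disjoint ranges do not race on it.
 */
int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;             /* real error, e.g. ENOSPC or EDQUOT */
        }
        if (n == 0) {              /* no progress: bail out (sketch's choice) */
            errno = EIO;
            return -1;
        }
        p   += n;                  /* short write: advance and continue */
        len -= (size_t)n;
        off += n;
    }
    return 0;
}
```

Because each call carries its own offset, two threads can write their own regions of a regular file without serialising on the file position. This does not apply to pipes, which are not seekable.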

Re: [SLUB 0/3] SLUB: The unqueued slab allocator V4

2007-03-09 Thread Mel Gorman




Note that I am amazed that the kernbench even worked.


The results without slub_debug were not good except for IA64. x86_64 and 
ppc64 both blew up for a variety of reasons. The IA64 results were


KernBench Comparison

                  2.6.21-rc2-mm2-clean   2.6.21-rc2-mm2-slub    %diff
User   CPU time        1084.64                1032.93           4.77%
System CPU time          73.38                  63.14          13.95%
Total  CPU time        1158.02                1096.07           5.35%
Elapsed    time         307.00                 285.62           6.96%

AIM9 Comparison
---
                2.6.21-rc2-mm2-clean  2.6.21-rc2-mm2-slub
 1 creat-clo       425460.75   438809.64    13348.89   3.14% File Creations and Closes/second
 2 page_test      2097119.26  3398259.27  1301140.01  62.04% System Allocations & Pages/second
 3 brk_test       7008395.33  6728755.72  -279639.61  -3.99% System Memory Allocations/second
 4 jmp_test      12226295.31 12254966.21    28670.90   0.23% Non-local gotos/second
 5 signal_test    1271126.28  1235510.96   -35615.32  -2.80% Signal Traps/second
 6 exec_test          395.54      381.18      -14.36  -3.63% Program Loads/second
 7 fork_test        13218.23    13211.41       -6.82  -0.05% Task Creations/second
 8 link_test        64776.04     7488.13   -57287.91 -88.44% Link/Unlink Pairs/second

An example console log from x86_64 is below. It's not particularly clear why
it went blamo and I haven't had a chance all day to kick it around for a
bit due to a variety of other hilarity floating around.


Linux version 2.6.21-rc2-mm2-autokern1 ([EMAIL PROTECTED]) (gcc version 4.1.1 20060525 (Red Hat 4.1.1-1)) #1 SMP Thu Mar 8 12:13:27 CST 2007
Command line: ro root=/dev/VolGroup00/LogVol00 rhgb console=tty0 console=ttyS1,19200 selinux=no autobench_args: root=30726124 ABAT:1173378546 loglevel=8
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009d400 (usable)
 BIOS-e820: 0009d400 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 3ffcddc0 (usable)
 BIOS-e820: 3ffcddc0 - 3ffd (ACPI data)
 BIOS-e820: 3ffd - 4000 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
Entering add_active_range(0, 0, 157) 0 entries of 3200 used
Entering add_active_range(0, 256, 262093) 1 entries of 3200 used
end_pfn_map = 1048576
DMI 2.3 present.
ACPI: RSDP 000FDFC0, 0014 (r0 IBM   )
ACPI: RSDT 3FFCFF80, 0034 (r1 IBMSERBLADE 1000 IBM  45444F43)
ACPI: FACP 3FFCFEC0, 0084 (r2 IBMSERBLADE 1000 IBM  45444F43)
ACPI: DSDT 3FFCDDC0, 1EA6 (r1 IBMSERBLADE 1000 INTL  2002025)
ACPI: FACS 3FFCFCC0, 0040
ACPI: APIC 3FFCFE00, 009C (r1 IBMSERBLADE 1000 IBM  45444F43)
ACPI: SRAT 3FFCFD40, 0098 (r1 IBMSERBLADE 1000 IBM  45444F43)
ACPI: HPET 3FFCFD00, 0038 (r1 IBMSERBLADE 1000 IBM  45444F43)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 0 PXM 0 0-4000
Entering add_active_range(0, 0, 157) 0 entries of 3200 used
Entering add_active_range(0, 256, 262093) 1 entries of 3200 used
NUMA: Using 63 for the hash shift.
Bootmem setup node 0 -3ffcd000
Node 0 memmap at 0x81003efcd000 size 16773952 first pfn 0x81003efcd000
sizeof(struct page) = 64
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      157
    0:      256 ->   262093
On node 0 totalpages: 261994
  DMA zone: 64 pages used for memmap
  DMA zone: 2017 pages reserved
  DMA zone: 1916 pages, LIFO batch:0
  DMA32 zone: 4031 pages used for memmap
  DMA32 zone: 253966 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ACPI: PM-Timer IO Port: 0x2208
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Processor #2
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 14, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x0d] address[0xfec1] gsi_base[24])
IOAPIC[1]: apic_id 13, address 0xfe

[PATCH 0/4 TRY#3] improve alternative instruction code and optimize get_cycles_sync

2007-03-09 Thread Joerg Roedel

This series of patches extends the alternative instructions framework on
the i386 and x86_64 architectures to support two alternative instruction
replacements. This code is used together with the introduction of the
X86_FEATURE_SYNC_RDTSC flag on i386 to simplify and optimize the
get_cycles_sync() function. The optimization changes this function to
use RDTSCP instead of CPUID;RDTSC if this instruction is available.
Avoiding CPUID there is really important if the kernel runs as a KVM
guest, because this instruction is intercepted and causes an expensive
VMEXIT.

Changes since the previous submission:
 * rebased to current linus git tree
 * replaced RDTSCP usage in get_cycles_sync with the opcode to
   make it compile with older binutils
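
The selection rule the series introduces (use the first replacement if its feature bit is set, otherwise the second if one is defined and its feature bit is set, otherwise leave the original) can be tried out in userspace. What follows is a simplified sketch under stated assumptions, not the kernel code: the struct is a cut-down stand-in for struct alt_instr, features are passed as a plain 32-bit mask, and padding uses single-byte NOPs (0x90) rather than the kernel's nop_out().

```c
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct alt_instr. */
struct alt_entry {
    uint8_t       *instr;            /* original instruction bytes */
    const uint8_t *replacement;      /* first choice */
    const uint8_t *replacement2;     /* second choice, optional */
    uint8_t        feature;          /* bit index for replacement */
    uint8_t        feature2;         /* bit index for replacement2 */
    uint8_t        instrlen;
    uint8_t        replacementlen;
    uint8_t        replacementlen2;  /* 0 means "no second replacement" */
};

/*
 * Patch one entry in place. `features` is a bitmask standing in for
 * boot_cpu_has(); bit indices must be below 32 in this sketch.
 * Returns 0 if the original was kept, 1 or 2 for the replacement
 * applied, and -1 if the chosen replacement is too long (the kernel
 * patch uses BUG_ON for that case).
 */
int apply_alternative(const struct alt_entry *a, uint32_t features)
{
    const uint8_t *rep;
    uint8_t len;
    int which;

    if (features & ((uint32_t)1 << a->feature)) {
        rep = a->replacement;
        len = a->replacementlen;
        which = 1;
    } else if (a->replacementlen2 > 0 &&
               (features & ((uint32_t)1 << a->feature2))) {
        rep = a->replacement2;
        len = a->replacementlen2;
        which = 2;
    } else {
        return 0;                    /* neither feature present */
    }

    if (len > a->instrlen)
        return -1;

    memcpy(a->instr, rep, len);
    /* Pad the remainder of the original slot with one-byte NOPs. */
    memset(a->instr + len, 0x90, (size_t)(a->instrlen - len));
    return which;
}
```

For get_cycles_sync() the original would be CPUID;RDTSC (0f a2 0f 31), the first replacement a bare RDTSC (0f 31) and the second the RDTSCP opcode bytes (0f 01 f9). When both feature bits are set the first replacement wins, which matches the order of the tests in the patch.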

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG



Re: [PATCH 7/7] revoke: wire up s390 system calls

2007-03-09 Thread Arnd Bergmann

On Friday 09 March 2007, Pekka J Enberg wrote:
> 
> From: Serge E. Hallyn <[EMAIL PROTECTED]>
> 
> Make revokeat and frevoke system calls available to user-space on s390.
> 
> Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>
> Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>

Looks good to me, but you really should go through Martin, since he
has an overview of what syscall numbers may already be assigned
by some other patch he has queued up.

Arnd <><

[PATCH 1/4 TRY#3] i386: extend alternative instructions framework

2007-03-09 Thread Joerg Roedel

From: Joerg Roedel <[EMAIL PROTECTED]>

This patch extends the alternative instructions framework to support 2
alternative instructions.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG
diff --git a/arch/i386/kernel/alternative.c b/arch/i386/kernel/alternative.c
index 9eca21b..59f1770 100644
--- a/arch/i386/kernel/alternative.c
+++ b/arch/i386/kernel/alternative.c
@@ -153,14 +153,23 @@ extern u8 __smp_alt_begin[], __smp_alt_end[];
 void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
 {
struct alt_instr *a;
-   u8 *instr;
+   u8 *instr, *replacement;
+   u8 replacementlen;
int diff;
 
DPRINTK("%s: alt table %p -> %p\n", __FUNCTION__, start, end);
for (a = start; a < end; a++) {
-   BUG_ON(a->replacementlen > a->instrlen);
-   if (!boot_cpu_has(a->cpuid))
+   if (boot_cpu_has(a->cpuid)) {
+   replacement = a->replacement;
+   replacementlen = a->replacementlen;
+   } else if ((a->replacementlen2 > 0) &&
+  (boot_cpu_has(a->cpuid2))) {
+   replacement = a->replacement2;
+   replacementlen = a->replacementlen2;
+   } else
continue;
+
+   BUG_ON(replacementlen > a->instrlen);
instr = a->instr;
 #ifdef CONFIG_X86_64
/* vsyscall code is not mapped yet. resolve it manually. */
@@ -170,9 +179,9 @@ void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
__FUNCTION__, a->instr, instr);
}
 #endif
-   memcpy(instr, a->replacement, a->replacementlen);
-   diff = a->instrlen - a->replacementlen;
-   nop_out(instr + a->replacementlen, diff);
+   memcpy(instr, replacement, replacementlen);
+   diff = a->instrlen - replacementlen;
+   nop_out(instr + replacementlen, diff);
}
 }
 
diff --git a/include/asm-i386/alternative.h b/include/asm-i386/alternative.h
index b8fa955..4a77e93 100644
--- a/include/asm-i386/alternative.h
+++ b/include/asm-i386/alternative.h
@@ -10,11 +10,14 @@
 struct alt_instr {
u8 *instr;  /* original instruction */
u8 *replacement;
+   u8 *replacement2;
u8  cpuid;  /* cpuid bit set for replacement */
+   u8  cpuid2; /* cpuid bit set for replacement2 */
u8  instrlen;   /* length of original instruction */
u8  replacementlen; /* length of new instruction, <= instrlen */
-   u8  pad;
-};
+   u8  replacementlen2;
+   u8  pad[3];
+}  __attribute__ ((packed));
 
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 
@@ -36,6 +39,12 @@ static inline void alternatives_smp_switch(int smp) {}
 #endif
 
 /*
+ * use this macro(s) if you need more than one output parameter
+ * in alternative_io_*
+ */
+#define ASM_OUTPUT2(a, b) a, b
+
+/*
  * Alternative instructions for different CPU types or capabilities.
  *
  * This allows to use optimized instructions even on generic binary
@@ -53,9 +62,12 @@ static inline void alternatives_smp_switch(int smp) {}
  "  .align 4\n"\
  "  .long 661b\n"/* label */   \
  "  .long 663f\n"/* new instruction */ \
+ "  .long 0x00\n"  \
  "  .byte %c0\n" /* feature bit */ \
+ "  .byte 0x00\n"  \
  "  .byte 662b-661b\n"   /* sourcelen */   \
  "  .byte 664f-663f\n"   /* replacementlen */  \
+ "  .byte 0x00\n"  \
  ".previous\n" \
  ".section .altinstr_replacement,\"ax\"\n" \
  "663:\n\t" newinstr "\n664:\n"   /* replacement */\
@@ -77,14 +89,38 @@ static inline void alternatives_smp_switch(int smp) {}
  "  .align 4\n"\
  "  .long 661b\n"/* label */   \
  "  .long 663f\n"/* new instruction */ \
+ "  .long 0x00\n"  \
  "  .byte %c0\n" /* feature bit */ \
+ "  .byte 0x00\n"  \
  "  .byte 662b-661b\n"   /* sourcelen */   \
  "  .byte 664f-663f\n"   /* replacementlen */  \
+ "  .byte 0x00\n"  \
  ".previ

[PATCH 2/4 TRY#3] x86_64: changes to x86_64 architecture for alternative instruction improvements

2007-03-09 Thread Joerg Roedel

From: Joerg Roedel <[EMAIL PROTECTED]>

This patch updates the x86_64 architecture to work with the changes
to alternative instructions in i386.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG
diff --git a/arch/x86_64/lib/clear_page.S b/arch/x86_64/lib/clear_page.S
index 9a10a78..ab525ee 100644
--- a/arch/x86_64/lib/clear_page.S
+++ b/arch/x86_64/lib/clear_page.S
@@ -53,7 +53,10 @@ ENDPROC(clear_page)
.align 8
.quad clear_page
.quad 1b
+   .quad 0
.byte X86_FEATURE_REP_GOOD
+   .byte 0
.byte .Lclear_page_end - clear_page
.byte 2b - 1b
+   .byte 0
.previous
diff --git a/arch/x86_64/lib/copy_page.S b/arch/x86_64/lib/copy_page.S
index 727a5d4..b4d0329 100644
--- a/arch/x86_64/lib/copy_page.S
+++ b/arch/x86_64/lib/copy_page.S
@@ -113,7 +113,10 @@ ENDPROC(copy_page)
.align 8
.quad copy_page
.quad 1b
+   .quad 0
.byte X86_FEATURE_REP_GOOD
+   .byte 0
.byte .Lcopy_page_end - copy_page
.byte 2b - 1b
+   .byte 0
.previous
diff --git a/arch/x86_64/lib/copy_user.S b/arch/x86_64/lib/copy_user.S
index 70bebd3..d505df3 100644
--- a/arch/x86_64/lib/copy_user.S
+++ b/arch/x86_64/lib/copy_user.S
@@ -27,9 +27,12 @@
.align 8
.quad  0b
.quad  2b
+   .quad  0
.byte  \feature  /* when feature is set */
+   .byte  0
.byte  5
.byte  5
+   .byte  0
.previous
.endm
 
diff --git a/arch/x86_64/lib/memcpy.S b/arch/x86_64/lib/memcpy.S
index 0ea0ddc..b1e1686 100644
--- a/arch/x86_64/lib/memcpy.S
+++ b/arch/x86_64/lib/memcpy.S
@@ -123,7 +123,10 @@ ENDPROC(__memcpy)
.align 8
.quad memcpy
.quad 1b
+   .quad 0
.byte X86_FEATURE_REP_GOOD
+   .byte 0
.byte .Lfinal - memcpy
.byte 2b - 1b
+   .byte 0
.previous
diff --git a/arch/x86_64/lib/memset.S b/arch/x86_64/lib/memset.S
index 2c59481..566e179 100644
--- a/arch/x86_64/lib/memset.S
+++ b/arch/x86_64/lib/memset.S
@@ -127,7 +127,10 @@ ENDPROC(__memset)
.align 8
.quad memset
.quad 1b
+   .quad 0
.byte X86_FEATURE_REP_GOOD
+   .byte 0
.byte .Lfinal - memset
.byte 2b - 1b
+   .byte 0
.previous
diff --git a/include/asm-x86_64/alternative.h b/include/asm-x86_64/alternative.h
index a6657b4..63cd8e5 100644
--- a/include/asm-x86_64/alternative.h
+++ b/include/asm-x86_64/alternative.h
@@ -10,11 +10,14 @@
 struct alt_instr {
u8 *instr;  /* original instruction */
u8 *replacement;
+   u8 *replacement2;
u8  cpuid;  /* cpuid bit set for replacement */
+   u8  cpuid2; /* cpuid bit set for replacement2 */
u8  instrlen;   /* length of original instruction */
u8  replacementlen; /* length of new instruction, <= instrlen */
-   u8  pad[5];
-};
+   u8  replacementlen2;
+   u8  pad[3];
+} __attribute__ ((packed));
 
 extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 
@@ -36,6 +39,12 @@ static inline void alternatives_smp_switch(int smp) {}
 
 #endif
 
+/*
+ * use this macro(s) if you need more than one output parameter
+ * in alternative_io_*
+ */
+#define ASM_OUTPUT2(a, b) a, b
+
 /*
  * Alternative instructions for different CPU types or capabilities.
  *
@@ -54,9 +63,12 @@ static inline void alternatives_smp_switch(int smp) {}
  "  .align 8\n"   \
  "  .quad 661b\n"/* label */  \
  "  .quad 663f\n"/* new instruction */ \
+ "  .quad 0x00\n"  \
  "  .byte %c0\n" /* feature bit */\
+ "  .byte 0x00\n"  \
  "  .byte 662b-661b\n"   /* sourcelen */  \
  "  .byte 664f-663f\n"   /* replacementlen */ \
+ "  .byte 0x00\n"  \
  ".previous\n" \
  ".section .altinstr_replacement,\"ax\"\n" \
  "663:\n\t" newinstr "\n664:\n"   /* replacement */ \
@@ -78,9 +90,12 @@ static inline void alternatives_smp_switch(int smp) {}
  "  .align 8\n"\
  "  .quad 661b\n"/* label */   \
  "  .quad 663f\n"/* new instruction */ \
+ "  .quad 0x00\n"  \
  "  .byte %c0\n" /* feature bit */ \
+ "  .byte 0x00\n"  \
  "  .byte 662b-661b\n"

[PATCH 3/4 TRY#3] i386: add the X86_FEATURE_SYNC_RDTSC flag

2007-03-09 Thread Joerg Roedel

From: Joerg Roedel <[EMAIL PROTECTED]>

This patch adds the X86_FEATURE_SYNC_RDTSC flag to the i386 architecture.
This is very helpful for simplifying the get_cycles_sync() function and
removing the #ifdefs from it.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG
diff --git a/arch/i386/kernel/cpu/amd.c b/arch/i386/kernel/cpu/amd.c
index 41cfea5..11f5730 100644
--- a/arch/i386/kernel/cpu/amd.c
+++ b/arch/i386/kernel/cpu/amd.c
@@ -241,6 +241,8 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 
if (cpuid_eax(0x80000000) >= 0x80000006)
num_cache_leaves = 3;
+
+   clear_bit(X86_FEATURE_SYNC_RDTSC, c->x86_capability);
 }
 
 static unsigned int __cpuinit amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
diff --git a/arch/i386/kernel/cpu/intel.c b/arch/i386/kernel/cpu/intel.c
index 56fe265..403a495 100644
--- a/arch/i386/kernel/cpu/intel.c
+++ b/arch/i386/kernel/cpu/intel.c
@@ -188,8 +188,11 @@ static void __cpuinit init_intel(struct cpuinfo_x86 *c)
}
 #endif
 
-   if (c->x86 == 15)
+   if (c->x86 == 15) {
set_bit(X86_FEATURE_P4, c->x86_capability);
+   set_bit(X86_FEATURE_SYNC_RDTSC, c->x86_capability);
+   } else
+   clear_bit(X86_FEATURE_SYNC_RDTSC, c->x86_capability);
if (c->x86 == 6) 
set_bit(X86_FEATURE_P3, c->x86_capability);
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
diff --git a/include/asm-i386/cpufeature.h b/include/asm-i386/cpufeature.h
index 3f92b94..a9f1f01 100644
--- a/include/asm-i386/cpufeature.h
+++ b/include/asm-i386/cpufeature.h
@@ -75,6 +76,7 @@
 #define X86_FEATURE_ARCH_PERFMON (3*32+11) /* Intel Architectural PerfMon */
 #define X86_FEATURE_PEBS   (3*32+12)  /* Precise-Event Based Sampling */
 #define X86_FEATURE_BTS(3*32+13)  /* Branch Trace Store */
+#define X86_FEATURE_SYNC_RDTSC  (3*32+14)  /* RDTSC is serializing */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (ecx), word 4 */
 #define X86_FEATURE_XMM3   (4*32+ 0) /* Streaming SIMD Extensions-3 */

[PATCH 4/4 TRY#3] optimize and simplify get_cycles_sync()

2007-03-09 Thread Joerg Roedel

From: Joerg Roedel <[EMAIL PROTECTED]>

This patch simplifies the get_cycles_sync() function by removing
the #ifdefs from it. It also introduces an optimization for AMD
processors: there the RDTSCP instruction is used instead of CPUID;RDTSC,
which is helpful if the kernel runs as a KVM guest. Running as a guest
makes CPUID very expensive because it causes an intercept of the guest.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG
diff --git a/include/asm-i386/cpufeature.h b/include/asm-i386/cpufeature.h
index 3f92b94..a9f1f01 100644
--- a/include/asm-i386/cpufeature.h
+++ b/include/asm-i386/cpufeature.h
@@ -49,6 +49,7 @@
 #define X86_FEATURE_MP (1*32+19) /* MP Capable. */
 #define X86_FEATURE_NX (1*32+20) /* Execute Disable */
 #define X86_FEATURE_MMXEXT (1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_RDTSCP  (1*32+27) /* RDTSCP */
 #define X86_FEATURE_LM (1*32+29) /* Long Mode (x86-64) */
 #define X86_FEATURE_3DNOWEXT   (1*32+30) /* AMD 3DNow! extensions */
 #define X86_FEATURE_3DNOW  (1*32+31) /* 3DNow! */
diff --git a/include/asm-i386/tsc.h b/include/asm-i386/tsc.h
index 84016ff..0b769ad 100644
--- a/include/asm-i386/tsc.h
+++ b/include/asm-i386/tsc.h
@@ -7,6 +7,7 @@
 #define _ASM_i386_TSC_H
 
 #include 
+#include 
 
 /*
  * Standard way to access the cycle counter.
@@ -34,22 +35,16 @@ static inline cycles_t get_cycles(void)
 /* Like get_cycles, but make sure the CPU is synchronized. */
 static __always_inline cycles_t get_cycles_sync(void)
 {
-   unsigned long long ret;
-#ifdef X86_FEATURE_SYNC_RDTSC
-   unsigned eax;
+   unsigned int a, d;
 
-   /*
-* Don't do an additional sync on CPUs where we know
-* RDTSC is already synchronous:
-*/
-   alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
- "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
-#else
-   sync_core();
-#endif
-   rdtscll(ret);
+#define RDTSCP ".byte 0x0f, 0x01, 0xf9"
+   alternative_io_two("cpuid\nrdtsc",
+  "rdtsc", X86_FEATURE_SYNC_RDTSC,
+  ".byte 0x0f, 0x01, 0xf9", X86_FEATURE_RDTSCP,
+  ASM_OUTPUT2("=a" (a), "=d" (d)),
+  "0" (1) : "ecx", "memory");
 
-   return ret;
+   return ((unsigned long long)a) | (((unsigned long long)d)<<32);
 }
 
 extern void tsc_init(void);

dev_printk and new-style class devices

2007-03-09 Thread Jean Delvare

Hi Greg, all,

As the new-style class devices (as opposed to old-style struct
class_device) are becoming more widely used, I noticed that the
dev_printk-based functions are not working properly with these.
New-style class devices have no driver nor bus, almost by definition,
and as a result dev_driver_string(), which is used as the first
parameter of dev_printk, resolves to an empty string. This causes
entries like the following to show in my logs:

 i2c-2: adapter [SMBus stub driver] registered

Notice the unaesthetical leading whitespace. In order to fix this
problem, I suggest that we extend dev_driver_string to deal with
new-style class devices:

Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
---
 drivers/base/core.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.21-rc3.orig/drivers/base/core.c	2007-02-28 09:48:19.0 +0100
+++ linux-2.6.21-rc3/drivers/base/core.c	2007-03-09 16:01:07.0 +0100
@@ -57,7 +57,8 @@ bool is_lanana_major(unsigned int major)
 const char *dev_driver_string(struct device *dev)
 {
 	return dev->driver ? dev->driver->name :
-		(dev->bus ? dev->bus->name : "");
+		(dev->bus ? dev->bus->name :
+		(dev->class ? dev->class->name : ""));
 }
 EXPORT_SYMBOL(dev_driver_string);

In the case above, the message in the logs now looks like:

i2c-adapter i2c-2: adapter [SMBus stub driver] registered

Which is much better IMHO. Greg, what do you think?
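
For illustration, the fallback chain proposed above can be modelled in plain userspace C. This is only a sketch: the struct and function names ending in _sketch are made up for this example and are not kernel types, and the "%s %s: " prefix format is an assumption about how dev_printk joins the driver string and the device name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel objects; only the name matters here. */
struct named_sketch { const char *name; };

struct device_sketch {
	const struct named_sketch *driver;
	const struct named_sketch *bus;
	const struct named_sketch *cls;	/* the "class" of a new-style class device */
};

/* Same fallback order as the patched function: driver, then bus, then class. */
static const char *driver_string_sketch(const struct device_sketch *dev)
{
	assert(dev != NULL);
	if (dev->driver)
		return dev->driver->name;
	if (dev->bus)
		return dev->bus->name;
	if (dev->cls)
		return dev->cls->name;
	return "";
}

/* Builds the log prefix, assuming a "%s %s: " layout. */
static void format_prefix_sketch(char *buf, size_t len,
				 const struct device_sketch *dev,
				 const char *dev_name)
{
	snprintf(buf, len, "%s %s: ", driver_string_sketch(dev), dev_name);
}
```

With no driver, bus or class set, the prefix comes out as " i2c-2: " (note the leading space); with only the class set to "i2c-adapter" it becomes "i2c-adapter i2c-2: ".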

Thanks,
-- 
Jean Delvare

Re: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO/KD_GRAPHICS mode

2007-03-09 Thread Matthew Garrett

On Fri, Mar 09, 2007 at 10:08:05AM +0100, Pavel Machek wrote:

> So... if current console is graphical, we leave X accessing the
> console... That's bad, because video state is not going to be
> restored...?

A graphical console is not necessarily X. Is there any requirement for 
there to be a single VT that isn't in text mode? The vt switching is 
a hack, we shouldn't make life difficult for people who have their own 
userspace code that's entirely capable of restoring video state on its 
own.

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: [PATCH 7/7] revoke: wire up s390 system calls

2007-03-09 Thread Pekka Enberg


Hi Martin,

Martin Schwidefsky wrote:
> Yes, please put me or Heiko on CC if you add system calls to s390.

Ok, sorry about that. I would expect akpm to send it to you guys though 
whenever revoke graduates from -mm and not merge it to mainline.

Pekka

Re: [PATCH 7/7] revoke: wire up s390 system calls

2007-03-09 Thread Martin Schwidefsky

On Fri, 2007-03-09 at 16:11 +0100, Arnd Bergmann wrote:
> > Make revokeat and frevoke system calls available to user-space on s390.
> > 
> > Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>
> > Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
> 
> Looks good to me, but you really should go through Martin, since he
> has an overview of what syscall numbers may already be assigned
> by some other patch he has queued up.

Yes, please put me or Heiko on CC if you add system calls to s390.

-- 
blue skies,              IBM Deutschland Entwicklung GmbH
   Martin                Vorsitzender des Aufsichtsrats: Johann Weihen
                         Geschäftsführung: Herbert Kircher
Martin Schwidefsky       Sitz der Gesellschaft: Böblingen
Linux on zSeries         Registergericht: Amtsgericht Stuttgart,
   Development           HRB 243294

"Reality continues to ruin my life." - Calvin.



Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-09 Thread Andi Kleen

Rusty Russell <[EMAIL PROTECTED]> writes:

> __builtin_types_compatible_p() has been around since gcc 2.95, and we
> don't use it anywhere.  This patch quietly fixes that.

Using BUILD_BUG_ON_ZERO() would have been somewhat cleaner.

-Andi

Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]

2007-03-09 Thread Jiri Kosina

On Fri, 9 Mar 2007, Jiri Kosina wrote:

> If this is present also in vanilla and not only in -mm, could you please 
> try reverting commits 4237081e573b99a48991aa71364b0682c444651c and 
> d4ae650a904612ffb7edd3f28b69b022988d2466 and let me know if the 
> situation gets any better?

Hi Jiri,

or even better, does the patch below (against 2.6.21-rc3) fix the problem 
with your keyboard? I can see possibilities of report fields unaligned to 
the byte boundary, which might be causing problems.

(the original patch author added to cc)
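
One reading of the concern about fields that are not aligned to a byte boundary can be sketched in userspace C. Everything here is an assumption made for illustration: put_bits_sketch is a simplified stand-in for the kernel's implement(), the field layout is invented, and only the "zero the last byte" expression is taken from the lines the patch removes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store the low `size` bits of `value` at bit `offset`. */
static void put_bits_sketch(uint8_t *data, unsigned offset, unsigned size,
			    uint32_t value)
{
	assert(size <= 32);
	for (unsigned i = 0; i < size; i++) {
		unsigned bit = offset + i;
		uint8_t mask = (uint8_t)(1u << (bit % 8));

		if (value & (1u << i))
			data[bit / 8] |= mask;
		else
			data[bit / 8] &= (uint8_t)~mask;
	}
}

/* Writes one field; optionally clears the field's last byte first. */
static void write_field_sketch(uint8_t *data, unsigned offset, unsigned count,
			       unsigned size, const uint32_t *values,
			       int zero_last_byte)
{
	if (zero_last_byte && count > 0 && size > 0)
		data[(offset + count * size - 1) / 8] = 0;

	for (unsigned n = 0; n < count; n++)
		put_bits_sketch(data, offset + n * size, size, values[n]);
}

/* Field A: 3 bits at bit 0, value 5. Field B: 5 bits at bit 3, value 1. */
static uint8_t shared_byte_demo(int zero_last_byte)
{
	uint8_t report[1] = { 0 };
	uint32_t a = 5, b = 1;

	write_field_sketch(report, 0, 1, 3, &a, zero_last_byte);
	write_field_sketch(report, 3, 1, 5, &b, zero_last_byte);
	return report[0];
}
```

In this made-up layout both fields share byte 0. Without the zeroing step the byte ends up as 0x0d (both fields present); with it, writing field B first clears the whole byte, so field A is lost and the result is 0x08.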

Thanks.

diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index f4ee1af..f571513 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -873,10 +873,6 @@ static void hid_output_field(struct hid_field *field, __u8 *data)
unsigned size = field->report_size;
unsigned n;
 
-   /* make sure the unused bits in the last byte are zeros */
-   if (count > 0 && size > 0)
-   data[(offset+count*size-1)/8] = 0;
-
for (n = 0; n < count; n++) {
if (field->logical_minimum < 0) /* signed values */
implement(data, offset + n * size, size, s32ton(field->value[n], size));

RE: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO/KD_GRAPHICS mode

2007-03-09 Thread Andrew Johnson

Matthew Garrett wrote: 
> On Fri, Mar 09, 2007 at 10:08:05AM +0100, Pavel Machek wrote:
> 
> > So... if current console is graphical, we leave X accessing the
> > console... That's bad, because video state is not going to be
> > restored...?
> 
> A graphical console is not necessarily X. Is there any requirement for
> there to be a single VT that isn't in text mode? The vt switching is
> a hack, we shouldn't make life difficult for people who have their own
> userspace code that's entirely capable of restoring video state on its
> own.

The problem actually comes about when using Qtopia Phone Edition (QPE)
on a PXA270. QPE puts the console into VT_AUTO+KD_GRAPHICS mode and
writes directly to the framebuffer from then on.  In this mode the
kernel correctly disallows a console change, as QPE is not getting
notification of a console change and thus does not know when to repaint
the screen. 

AFAIK, X uses VT_PROCESS+KD_GRAPHICS mode, so it gets notification of a
change to and from the X console, thus it knows when to repaint the
screen.

I think you can test this by changing the mode of a text console to
KD_GRAPHICS using the KDSETMODE ioctl, then attempting to change to
another text console using chvt.
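
A rough C sketch of that test follows. It assumes a Linux system with <linux/kd.h> and <linux/vt.h>, sufficient privileges to open the console device, and that requesting the switch with VT_ACTIVATE is a fair stand-in for what chvt does. The function name is made up for this example, and whether the switch actually happens still has to be observed on the machine itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kd.h>
#include <linux/vt.h>

/*
 * Put the console behind tty_path into KD_GRAPHICS mode, ask for a switch
 * to target_vt, then restore KD_TEXT. Returns 0 if both requests were
 * accepted by the kernel, -1 on any error. A return value of 0 does not
 * prove that the console really changed.
 */
static int try_switch_from_graphics(const char *tty_path, int target_vt)
{
	int fd, ret = -1;

	assert(tty_path != NULL);

	fd = open(tty_path, O_RDWR);
	if (fd < 0)
		return -1;

	if (ioctl(fd, KDSETMODE, KD_GRAPHICS) == 0) {
		if (ioctl(fd, VT_ACTIVATE, target_vt) == 0)
			ret = 0;
		/* Always go back to text mode so the console stays usable. */
		ioctl(fd, KDSETMODE, KD_TEXT);
	}

	close(fd);
	return ret;
}
```

A call such as try_switch_from_graphics("/dev/tty1", 2) would be run from a text console as root; the device path and VT number are placeholders.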

-- Andrew

Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Benjamin LaHaise

On Fri, Mar 09, 2007 at 12:13:35PM +0100, Eric Dumazet wrote:
> Then just drop the fget_light() 'optimisation' and always take a reference 
> (atomic on f_count) regardless of single-thread or not. Instead of dirtying 
> f_light, just do the straightforward thing and be with it.
> 
> (that is : fget_light() = fget() = no more keeping fput_needed everywhere, 
> and 
> convoluted things in some dark sides of the kernel.

And it makes things rather slower for a lot of single threaded applications 
on modern systems.  Yes, fget_light can be done much more cleanly, but please 
don't go around ripping out optimizations just because.
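
The optimization under discussion can be sketched in userspace C roughly as follows. This is a simplified model, not the kernel code: the _sketch types and the fixed-size descriptor table are invented for illustration, and the real function also has to deal with RCU and with files that are concurrently being released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NFILES_SKETCH 4

struct file_sketch {
	atomic_long f_count;		/* reference count of the open file */
};

struct files_sketch {
	atomic_int count;		/* number of tasks sharing this table */
	struct file_sketch *fd[NFILES_SKETCH];
};

/*
 * If the descriptor table is not shared, nobody else can close the file
 * behind our back, so the atomic increment is skipped. Otherwise take a
 * real reference and tell the caller that it must be dropped again.
 */
static struct file_sketch *fget_light_sketch(struct files_sketch *files,
					     unsigned int fd, int *fput_needed)
{
	struct file_sketch *file;

	assert(files != NULL && fput_needed != NULL);
	*fput_needed = 0;

	if (fd >= NFILES_SKETCH)
		return NULL;
	file = files->fd[fd];
	if (!file)
		return NULL;

	if (atomic_load(&files->count) != 1) {
		atomic_fetch_add(&file->f_count, 1);
		*fput_needed = 1;
	}
	return file;
}

static void fput_light_sketch(struct file_sketch *file, int fput_needed)
{
	if (fput_needed)
		atomic_fetch_sub(&file->f_count, 1);
}
```

In the single-threaded case the lookup touches no shared counter at all, which is the saving being defended here; the price is the fput_needed flag that every caller has to carry around.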

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.

Re: [PATCH 7/7] revoke: wire up s390 system calls

2007-03-09 Thread Martin Schwidefsky

On Fri, 2007-03-09 at 17:41 +0200, Pekka Enberg wrote:
> Martin Schwidefsky wrote:
> > Yes, please put me or Heiko on CC if you add system calls to s390.
> 
> Ok, sorry about that. I would expect akpm to send it to you guys though 
> whenever revoke graduates from -mm and not merge it to mainline.

Yes, but nobody is perfect. Even Andrew sometimes forgets to add people
to CC who should know about "stuff". It would be nice if the CC-line is
added from the start.

-- 
blue skies,              IBM Deutschland Entwicklung GmbH
   Martin                Vorsitzender des Aufsichtsrats: Johann Weihen
                         Geschäftsführung: Herbert Kircher
Martin Schwidefsky       Sitz der Gesellschaft: Böblingen
Linux on zSeries         Registergericht: Amtsgericht Stuttgart,
   Development           HRB 243294

"Reality continues to ruin my life." - Calvin.



Re: Possible "struct pid" leak from tty_io.c

2007-03-09 Thread Eric W. Biederman

"Catalin Marinas" <[EMAIL PROTECTED]> writes:

> On 08/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> "Catalin Marinas" <[EMAIL PROTECTED]> writes:
>
> I think it's only the pid_chain and rcu member that could be placed in
> a list and kmemleak scans the memory for these two offsets as well.
> I'll check those lists anyway but I doubt it's a more fundamental
> problem with how kmemleak handles struct pid as I should've probably
> got more reports.

Right.  I was pointing out the possibilities because we do
some tricky things.  Mostly I was wondering about the hlist for
the list of tasks.  Now if a task is on that list we should have
a struct pid_link pointing at our struct pid, so it shouldn't fool
kmemleak, but I'm still a little curious whether all of those hlist_heads
are NULL pointers.

>> In most any other layer we cache pids indefinitely and a situation
>> where we have a pointer to a struct pid with a ref count of 1 long
>> after the process goes away is expected.
>
> Yes, indeed, but what kmemleak reports is that the pid structure
> wasn't freed yet and there is no way to determine its pointer directly
> or via container_of on members (by scanning the memory), hence it is
> considered a leak.

Yes that sounds like a leak.

>> I don't understand your situation enough to guess what is going wrong
>> yet.  Hopefully I have given you enough information to get started.
>
> Yes, many thanks. I'll dig further and let you know.

Thanks

Eric

"No handler for vector" patches don't work on some systems

2007-03-09 Thread Chuck Ebbert

So far I've tried the simple "survive having no handler
for a vector" patch and the preliminary 3-patch series
that was in -mm for a while, and neither works on the
Dell PowerEdge 29xx and 19xx systems. These servers
have the Intel 5000X chipset with the 6700PXH PCI Hub
with dual independent PCI-X busses, each with its own
I/OxAPIC with 24 interrupts. The fixes do work on
"simple" systems but not on these high-end ones.


Re: [PATCH] chaostables

2007-03-09 Thread Petr Tesařík

On Fri, 09. 03. 2007 at 13:37 +0530, jimmy wrote:
> Alan Cox wrote:
> >> Also note that the word 'chaostables' does not even appear in the patch, 
> >> though xt_CHAOS does. Since we know that {xt,ipt}_[A-Z]+ are targets, we 
> >> can safely assume that CHAOS does what it says - make fun of nmap.
> > 
> > "entropy" ?
> > "randomness"
> 
> fuzztables?

confuztables!

Petr



Re: the usage of DEBUG_DRIVER seems ambiguous

2007-03-09 Thread Stefan Richter

Robert P. J. Day wrote:
> On Fri, 9 Mar 2007, Artem Bityutskiy wrote:
>> Randy Dunlap wrote:
>> > The ones in drivers/net/ are just local driver debug controls.
>> > They happen to have the same name as a (likely newer) kconfig symbol.
>> >
>> > Is there a real problem that needs to be fixed?
>>
>> Renaming them just for the sake of being less confusing makes sense.
...
> if someone wants to make a suggestion, i can submit a simple renaming
> patch.

If a driver or subsystem already uses a prefix to have an own namespace
for macros, functions, structs and so on, a local DEBUG_DRIVER could
become something like LOCALPREFIX_DEBUG. If there is a narrow usage of
the macro, e.g. to indicate a debug level, it could become something
speaking like LOCALPREFIX_DEBUG_LEVEL. --- However, after looking at the
actual occurrences of DEBUG_DRIVER, I see that this recommendation
doesn't really apply that well.
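
For illustration only, the prefix-based naming described above might look like this in a driver that does keep its debug code. The macro and function names are hypothetical and printing goes through fprintf so the sketch builds in userspace; a real driver would use printk.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical driver-local switch. The driver prefix keeps it out of the
 * way of CONFIG_DEBUG_DRIVER and of scripts looking for a forgotten
 * "CONFIG_". Set it to 0 to compile the debug output away.
 */
#define A2065_DEBUG 1

#if A2065_DEBUG
#define a2065_dbg(...) fprintf(stderr, "a2065: " __VA_ARGS__)
#else
#define a2065_dbg(...) do { } while (0)
#endif

/* Made-up caller, modelled on the old sunlance debug message. */
static int lance_restart_sketch(int status)
{
	a2065_dbg("Lance restart=%d\n", status);
	return status;
}
```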

From your initial post:
| $ grep -rw DEBUG_DRIVER *
| drivers/net/sunlance.c:#undef DEBUG_DRIVER

This is a forgotten leftover of earlier debug code. See here for evidence:
http://lxr.linux.no/source/drivers/net/sunlance.c?v=2.2.26#L791
791 #ifdef DEBUG_DRIVER
792 printk (KERN_DEBUG "Lance restart=%d\n", status);
793 #endif

This usage of DEBUG_DRIVER isn't there anymore. Therefore simply delete
the remaining occurrence:
-#undef DEBUG_DRIVER

| drivers/net/a2065.c:#ifdef DEBUG_DRIVER
| drivers/net/a2065.c:#ifdef DEBUG_DRIVER

Rename to A2065_DEBUG or LANCE_DEBUG. Two more alternatives:

-#ifdef DEBUG_DRIVER
+#if 0 /* debug */

or
-#ifdef DEBUG_DRIVER
-   years_old_debug_cruft_nobody_enables_anymore();
-#endif

Needless to say, the maintainer certainly wants to ACK/NAK this.

| drivers/net/7990.c:#ifdef DEBUG_DRIVER
| drivers/net/7990.c:#ifdef DEBUG_DRIVER

Exactly like with a2065.c.

| drivers/base/Kconfig:config DEBUG_DRIVER

According to where it is defined, CONFIG_DEBUG_DRIVER should only occur
in drivers/base/* (and some defconfigs outside of drivers/base/).

| ...

More hits from LXR:

drivers/isdn/hardware/eicon/dbgioctl.h

Where is this header file used anyway? Can the entire file be deleted?

drivers/isdn/gigaset/gigaset.h  (definition as enum item)
drivers/isdn/gigaset/common.c   (multiple uses)

If the overlap with CONFIG_DEBUG_DRIVER bothers you, rename it to
DEBUG_DRIVER_STRUCT or whatever.
-- 
Stefan Richter
-=-=-=== --== --===
http://arcgraph.de/sr/

Re: [PATCH 7/7] revoke: wire up s390 system calls

2007-03-09 Thread Serge E. Hallyn

Quoting Martin Schwidefsky ([EMAIL PROTECTED]):
> On Fri, 2007-03-09 at 17:41 +0200, Pekka Enberg wrote:
> > Martin Schwidefsky wrote:
> > > Yes, please put me or Heiko on CC if you add system calls to s390.
> > 
> > Ok, sorry about that. I would expect akpm to send it to you guys though 
> > whenever revoke graduates from -mm and not merge it to mainline.
> 
> Yes, but nobody is perfect. Even Andrew sometimes forgets to add people
> to CC who should know about "stuff". It would be nice if the CC-line is
> added from the start.

Sorry, I should have cc:d you when I sent my testing patch to Pekka.

-serge

Re: passing function pointers through platform devices?

2007-03-09 Thread NZG

On Wednesday 07 March 2007 11:55 am, David Brownell wrote:
> > I'm developing an SPI- bus >MMC/SD block driver translation layer.
>
> Another one?  There's already been significant work in that area.  See for
> example
>
>   http://marc.theaimsgroup.com/?l=linux-kernel&m=117000652529003&w=2
Nice, I'll build on that. My previous work ignored the SPI/MMC layers (because 
they didn't exist at the time) and just built a stacked driver on a character 
SPI driver.  This gives me some direction as to how to proceed.

> Which admittedly didn't behave when I just put it onto my test rig,
> but seems nonetheless to be a significant step forward.  It's not like
> everyone has hardware that can use such a driver after all!
True, but it's fairly common right now, at least until every microcontroller 
gets a hardware SD controller (which seems to be the trend).

> That's how it's done in that patch.  The model being what the PXA MMC/SD
> card driver does, since that's the most generic model I found ... handling
> for example systems which need to poll for card detect, as well as ones
> that can use real gpio based IRQs.  The mmc_spi driver doesn't need to know
> which kind of platform it's got.
Sounds good, thanks

NZG


Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Alan Stern

On Fri, 9 Mar 2007, Oliver Neukum wrote:

> On Thursday, 8 March 2007 17:02, Alan Stern wrote:
> > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > 
> > > Hi,
> > > 
> > > after a lightning bolt from high above I've been looking into refcounting
> > > the data structures drivers use to provide the data used to refill sysfs
> > > buffers. I've come to the following conclusion.
> > > 
> > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > pointer
> > > 2. drivers must be able to pass these pointers through an extended
> > > device_create_file()
> > > 3. Drivers must use refcounting if they want to use attributes
> > > 4. read/write/poll must do refcounting
> > > 
> > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > like the obvious choice. Comments?
> > 
> > Can you explain the reasoning that led to these conclusions?  And what 
> > exactly was your lightning bolt?
> 
> The old race between disconnect and IO to attribute via sysfs again.
> If I cannot disassociate the drivers from the buffers in the buffers, drivers
> must not deallocate the data necessary to answer sysfs callbacks while
> a buffer exists.

Why wouldn't you be able to dissociate a driver from a buffer?  That was 
the whole point of adding .orphan to sysfs_buffer and creating 
sysfs_buffer_collection -- it was supposed to solve exactly this race.

Alan Stern


Re: [RFC][PATCH 1/7] Resource counters

2007-03-09 Thread Herbert Poetzl

On Wed, Mar 07, 2007 at 10:19:05AM +0300, Pavel Emelianov wrote:
> Balbir Singh wrote:
> > Pavel Emelianov wrote:
> >> Introduce generic structures and routines for
> >> resource accounting.
> >>
> >> Each resource accounting container is supposed to
> >> aggregate it, container_subsystem_state and its
> >> resource-specific members within.
> >>
> >>
> >> 
> >>
> >> diff -upr linux-2.6.20.orig/include/linux/res_counter.h
> >> linux-2.6.20-0/include/linux/res_counter.h
> >> --- linux-2.6.20.orig/include/linux/res_counter.h2007-03-06
> >> 13:39:17.0 +0300
> >> +++ linux-2.6.20-0/include/linux/res_counter.h2007-03-06
> >> 13:33:28.0 +0300
> >> @@ -0,0 +1,83 @@
> >> +#ifndef __RES_COUNTER_H__
> >> +#define __RES_COUNTER_H__
> >> +/*
> >> + * resource counters
> >> + *
> >> + * Copyright 2007 OpenVZ SWsoft Inc
> >> + *
> >> + * Author: Pavel Emelianov <[EMAIL PROTECTED]>
> >> + *
> >> + */
> >> +
> >> +#include 
> >> +
> >> +struct res_counter {
> >> +unsigned long usage;
> >> +unsigned long limit;
> >> +unsigned long failcnt;
> >> +spinlock_t lock;
> >> +};
> >> +
> >> +enum {
> >> +RES_USAGE,
> >> +RES_LIMIT,
> >> +RES_FAILCNT,
> >> +};
> >> +
> >> +ssize_t res_counter_read(struct res_counter *cnt, int member,
> >> +const char __user *buf, size_t nbytes, loff_t *pos);
> >> +ssize_t res_counter_write(struct res_counter *cnt, int member,
> >> +const char __user *buf, size_t nbytes, loff_t *pos);
> >> +
> >> +static inline void res_counter_init(struct res_counter *cnt)
> >> +{
> >> +spin_lock_init(&cnt->lock);
> >> +cnt->limit = (unsigned long)LONG_MAX;
> >> +}
> >> +
> > 
> > Is there any way to indicate that there are no limits on this container.
> 
> Yes - LONG_MAX is essentially a "no limit" value as no
> container will ever have such many files :)

-1 or ~0 is a viable choice for userspace to
communicate 'infinite' or 'unlimited'

> > LONG_MAX is quite huge, but still when the administrator wants to
> > configure a container to *un-limited usage*, it becomes hard for
> > the administrator.
> > 
> >> +static inline int res_counter_charge_locked(struct res_counter *cnt,
> >> +unsigned long val)
> >> +{
> >> +if (cnt->usage <= cnt->limit - val) {
> >> +cnt->usage += val;
> >> +return 0;
> >> +}
> >> +
> >> +cnt->failcnt++;
> >> +return -ENOMEM;
> >> +}
> >> +
> >> +static inline int res_counter_charge(struct res_counter *cnt,
> >> +unsigned long val)
> >> +{
> >> +int ret;
> >> +unsigned long flags;
> >> +
> >> +spin_lock_irqsave(&cnt->lock, flags);
> >> +ret = res_counter_charge_locked(cnt, val);
> >> +spin_unlock_irqrestore(&cnt->lock, flags);
> >> +return ret;
> >> +}
> >> +
> > 
> > Will atomic counters help here.
> 
> I'm afraid no. We have to atomically check for limit and alter
> one of usage or failcnt depending on the checking result. Making
> this with atomic_xxx ops will require at least two ops.

Linux-VServer does the accounting with atomic counters,
so that works quite fine, just do the checks at the
beginning of whatever resource allocation and the
accounting once the resource is acquired ...

> If we'll remove failcnt this would look like
>while (atomic_cmpxchg(...))
> which is also not that good.
> 
> Moreover - in RSS accounting patches I perform page list
> manipulations under this lock, so this also saves one atomic op.

it still hasn't been shown that this kind of RSS limit
doesn't add big time overhead to normal operations
(inside and outside of such a resource container)

note that the 'usual' memory accounting is much more
lightweight and serves similar purposes ...
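
For reference, the cmpxchg-style alternative mentioned in the quoted text could look roughly like the following in userspace C11. This is a sketch under assumptions: the _sketch names are invented, <stdatomic.h> stands in for the kernel's atomic ops, and it is not what either the quoted patch or Linux-VServer actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct res_counter_sketch {
	atomic_ulong usage;
	atomic_ulong failcnt;
	unsigned long limit;	/* assumed not to change while charging */
};

/*
 * Lock-free charge: retry until usage is updated atomically or the limit
 * check fails. The check is written so that it cannot wrap around when
 * val is larger than limit.
 */
static int res_counter_charge_sketch(struct res_counter_sketch *cnt,
				     unsigned long val)
{
	unsigned long old;

	assert(cnt != NULL);
	old = atomic_load(&cnt->usage);

	for (;;) {
		if (val > cnt->limit || old > cnt->limit - val) {
			atomic_fetch_add(&cnt->failcnt, 1);
			return -ENOMEM;
		}
		/* On failure, old is reloaded with the current usage. */
		if (atomic_compare_exchange_weak(&cnt->usage, &old, old + val))
			return 0;
	}
}
```

The failure path needs its own atomic operation for failcnt, which matches the "at least two ops" remark above, and any list manipulation that must happen together with the charge would still need a lock.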

best,
Herbert

> >> +static inline void res_counter_uncharge_locked(struct res_counter *cnt,
> >> +unsigned long val)
> >> +{
> >> +if (unlikely(cnt->usage < val)) {
> >> +WARN_ON(1);
> >> +val = cnt->usage;
> >> +}
> >> +
> >> +cnt->usage -= val;
> >> +}
> >> +
> >> +static inline void res_counter_uncharge(struct res_counter *cnt,
> >> +unsigned long val)
> >> +{
> >> +unsigned long flags;
> >> +
> >> +spin_lock_irqsave(&cnt->lock, flags);
> >> +res_counter_uncharge_locked(cnt, val);
> >> +spin_unlock_irqrestore(&cnt->lock, flags);
> >> +}
> >> +
> >> +#endif
> >> diff -upr linux-2.6.20.orig/init/Kconfig linux-2.6.20-0/init/Kconfig
> >> --- linux-2.6.20.orig/init/Kconfig2007-03-06 13:33:28.0 +0300
> >> +++ linux-2.6.20-0/init/Kconfig2007-03-06 13:33:28.0 +0300
> >> @@ -265,6 +265,10 @@ config CPUSETS
> >>
> >>Say N if unsure.
> >>
> >> +config RESOURCE_COUNTERS
> >> +bool
> >> +select CONTAINERS
> >> +
> >>  config SYSFS_DEPRECATED
> >>  bool "Create deprecated sysfs files"
> >>  default y
> >> diff -upr linux-2.6.20.orig/kernel/Makefile
> >> linux-2.6.20-0/kernel/Makefile
> >> --- linux-2.6.20.or

Re: [SLUB 0/3] SLUB: The unqueued slab allocator V4

2007-03-09 Thread Christoph Lameter

On Fri, 9 Mar 2007, Mel Gorman wrote:

> I'm not sure what you mean by per-order queues. The buddy allocator already
> has per-order lists.

Somehow they do not seem to work right. SLAB (and now SLUB too) can avoid 
(or defer) fragmentation by keeping its own queues.

Re: [PATCH 1/2] rcfs core patch

2007-03-09 Thread Kirill Korotaev

>  nobody actually cares about a precise accounting and
>  calculating shares or partitions of whatever resource,
>  all that matters is that you have a way to prevent a
>  potential hostile environment from sucking up all your
>  resources (or even a single one) resulting in a DoS
This is not true. People care. Reasons:
  - resource planning
  - fairness
  - guarantees
  What you are talking about is security only, not the issues above.
  So good precision is required. If there is no precision at all,
  security sucks as well and can be exploited, e.g. for CPU
  schedulers doing an accounting based on 
  jiffies accounting in scheduler_tick() it is easy to build
  an application consuming 90% of CPU, but ~0% from scheduler POV.
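
The effect can be shown with a toy simulation in C. The model is an assumption made for illustration: a tick every 1000 us, a scheduler that charges a whole tick to whichever task is running at the tick instant, and a task that runs in one burst per tick period. The names are made up and this is not how any particular kernel does its accounting.

```c
#include <assert.h>
#include <stdio.h>

#define TICK_US 1000u	/* assumed tick period in microseconds */

struct usage_sketch {
	unsigned long real_us;		/* CPU time actually consumed */
	unsigned long charged_us;	/* CPU time seen by tick sampling */
};

/* The tick fires at offset 0 of every period. */
static int running_at_tick(unsigned int start_us, unsigned int run_us)
{
	return run_us > 0 && (start_us == 0 || start_us + run_us > TICK_US);
}

/*
 * The task runs for run_us in every period, starting start_us after the
 * tick. Sampling charges a full tick only if the task is on the CPU at
 * the moment the tick fires.
 */
static struct usage_sketch simulate_ticks(unsigned int start_us,
					  unsigned int run_us,
					  unsigned int ticks)
{
	struct usage_sketch u = { 0, 0 };

	assert(run_us <= TICK_US && start_us < TICK_US);
	for (unsigned int t = 0; t < ticks; t++) {
		u.real_us += run_us;
		if (running_at_tick(start_us, run_us))
			u.charged_us += TICK_US;
	}
	return u;
}
```

With simulate_ticks(50, 900, 1000) the task really burns 900000 us out of one second but is charged 0 us, because it always sleeps across the tick. Started at offset 0 instead, the same 900 us burst is charged the full 1000000 us.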

Kirill

Re: [PATCH 1/2] rcfs core patch

2007-03-09 Thread Kirill Korotaev

>>There have been various projects attempting to provide resource
>>management support in Linux, including CKRM/Resource Groups and UBC.
> 
> 
> let me note here, once again, that you forgot Linux-VServer
> which does quite non-intrusive resource management ...
Herbert, would you care to send patches rather than asking others to do
something that works for you?

Looks like your main argument is non-intrusiveness...
Are "working", "secure", "flexible" no longer required by people? :/

>> Each had its own task-grouping mechanism. 
> 
> 
> the basic 'context' (pid space) is the grouping mechanism
> we use for resource management too
> 
> 
>>Paul Menage observed [1] that cpusets in the kernel already has a
>>grouping mechanism which was working well for cpusets. He went ahead
>>and generalized the grouping code in cpusets so that it could be used
>>for overall resource management purpose. 
> 
> 
>>With his patches, it is possible to even create multiple hierarchies
>>of groups (see [2] on why multiple hierarchies) as follows:
> 
> 
> do we need or even want that? IMHO the hierarchical
> concept CKRM was designed with, was also the reason
> for it being slow, unuseable and complicated
1. cpusets are hierarchical already. So hierarchy is required.
2. As was discussed on the call, controllers which are flat
   can just prohibit creation of a hierarchy on the filesystem,
   i.e. allow only a depth of 1 and continue being fast.

>>mount -t container -o cpuset none /dev/cpuset <- cpuset hierarchy
>>mount -t container -o mem,cpu none /dev/mem   <- memory/cpu hierarchy
>>mount -t container -o disk none /dev/disk <- disk hierarchy
>>
>>In each hierarchy, you can create task groups and manipulate the
>>resource parameters of each group. You can also move tasks between
>>groups at run-time (see [3] on why this is required). 
> 
> 
>>Each hierarchy is also manipulated independent of the other.  
> 
> 
>>Paul's patches also introduced a 'struct container' in the kernel,
>>which serves these key purposes:
>>
>>- Task-grouping
>>  'struct container' represents a task-group created in each hierarchy.
>>  So every directory created under /dev/cpuset or /dev/mem above will
>>  have a corresponding 'struct container' inside the kernel. All tasks
>>  pointing to the same 'struct container' are considered to be part of
>>  a group
>>
>>  The 'struct container' in turn has pointers to resource objects which
>>  store actual resource parameters for that group. In above example,
>>  'struct container' created under /dev/cpuset will have a pointer to
>>  'struct cpuset' while 'struct container' created under /dev/disk will
>>  have pointer to 'struct disk_quota_or_whatever'.
>>
>>- Maintain hierarchical information
>>  The 'struct container' also keeps track of hierarchical relationship
>>  between groups.
>>
>>The filesystem interface in the patches essentially serves these
>>purposes:
>>
>>  - Provide an interface to manipulate task-groups. This includes
>>creating/deleting groups, listing tasks present in a group and 
>>moving tasks across groups
>>
>>  - Provdes an interface to manipulate the resource objects
>>(limits etc) pointed to by 'struct container'.
>>
>>As you know, the introduction of 'struct container' was objected
>>to and was felt redundant as a means to group tasks. Thats where I
>>took a shot at converting over Paul Menage's patch to avoid 'struct
>>container' abstraction and insead work with 'struct nsproxy'.
> 
> 
> which IMHO isn't a step in the right direction, as
> you will need to handle different nsproxies within
> the same 'resource container' (see previous email)
I tend to agree.
Looks like Paul's original patch was on the right track.

[...]
>>A separate filesystem would give us more flexibility like the
>>implementing multi-hierarchy support described above.
> 
> 
> why is the filesystem approach so favored for this
> kind of manipulations?
> 
> IMHO it is one of the worst interfaces I can imagine
> (to move tasks between spaces and/or assign resources)
> but yes, I'm aware that filesystems are 'in' nowadays
I also hate the filesystem approach being used everywhere nowadays.
But it looks like there are still reasons for it:
1. cpusets already use the fs interface.
2. each controller can easily export a bit of specific information/controls.

Can you suggest any other extensible/flexible interface for these?

Thanks,
Kirill


Re: [PATCH 1/2] rcfs core patch

2007-03-09 Thread Paul Jackson

Kirill, responding to Herbert:
> > do we need or even want that? IMHO the hierarchical
> > concept CKRM was designed with, was also the reason
> > for it being slow, unusable and complicated
> 1. cpusets are hierarchical already. So hierarchy is required.

I think that CKRM has a harder time doing a hierarchy than cpusets.

CKRM is trying to account for and control how much of an amorphous
resource is used, whereas cpusets is trying to control whether a
specifically identifiable resource is used, or not used, not how
much of it is used.

A child cpuset gets configured to allow certain CPUs and Nodes, and
then does not need to dynamically pass back any information about
what is actually used - it's a one-way control with no feedback.
That's a relatively easier problem.

CKRM (as I recall it, from long ago ...) has to track the amount
of usage dynamically, across parent and child groups (whatever they
were called.)  That's a harder problem.

So, yes, as Kirill observes, we need the hierarchy because cpusets
has it, cpuset users make good use of the hierarchy, and the hierarchy
works fine in that case, even if a hierarchy is more difficult for CKRM.
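
To make the "one-way control" point concrete, here is a minimal user-space
sketch. It is not the kernel's cpuset code: the names toy_cpuset and
toy_cpuset_valid are made up for illustration, and a single 64-bit mask
stands in for the real CPU and Node masks. The only rule modeled is that a
child may be allowed nothing its parent is not allowed, which can be checked
without any usage information flowing back up the tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cpuset: one bit per allowed CPU. */
struct toy_cpuset {
	const struct toy_cpuset *parent;	/* NULL for the root */
	uint64_t cpus_allowed;
};

/*
 * Walk from a cpuset up to the root and verify that each set is a
 * subset of its parent.  Nothing about actual usage is consulted,
 * which is what makes this case easier than usage accounting.
 */
static bool toy_cpuset_valid(const struct toy_cpuset *cs)
{
	for (; cs->parent != NULL; cs = cs->parent) {
		if (cs->cpus_allowed & ~cs->parent->cpus_allowed)
			return false;	/* child asks for a CPU the parent lacks */
	}
	return true;
}
```

A CKRM-style controller would additionally have to charge usage to every
ancestor on each allocation, which this sketch deliberately leaves out.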

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: [RFC] [Patch 1/1] IBAC Patch

2007-03-09 Thread Mimi Zohar

On Thu, 2007-03-08 at 15:08 -0800, Randy Dunlap wrote:
> On Thu, 08 Mar 2007 17:58:16 -0500 Mimi Zohar wrote:
> 
> > This is a request for comments for a new Integrity Based Access
> > Control(IBAC) LSM module which bases access control decisions
> > on the new integrity framework services. 
> > 
> > (Hopefully this will help clarify the interaction between an LSM 
> > module and LIM module.)
> > 
> > Index: linux-2.6.21-rc3-mm2/security/ibac/Kconfig
> > ===
> > --- /dev/null
> > +++ linux-2.6.21-rc3-mm2/security/ibac/Kconfig
> > @@ -0,0 +1,36 @@
> > +config SECURITY_IBAC
> > +   boolean "IBAC support"
> > +   depends on SECURITY && SECURITY_NETWORK && INTEGRITY
> > +   help
> > + Integrity Based Access Control(IBAC) implements integrity
> > + based access control.
> 
> Please make the help text do more than repeat the words I B A C...
> Put a short explanation or say something like:
> See Documentation/security/foobar.txt for more information.
> (and add that file)

Agreed.  Perhaps something like:

Integrity Based Access Control(IBAC) uses the Linux Integrity
Module(LIM) API calls to verify an executable's metadata and 
data's integrity.  Based on the results, execution permission 
is permitted/denied.  Integrity providers may implement the 
LIM hooks differently.  For more information on integrity
verification refer to the specific integrity provider 
documentation. 

> > +config SECURITY_IBAC_BOOTPARAM
> > +   bool "IBAC boot parameter"
> > +   depends on SECURITY_IBAC
> > +   default y
> > +   help
> > + This option adds a kernel parameter 'ibac', which allows IBAC
> > + to be disabled at boot.  If this option is selected, IBAC
> > + functionality can be disabled with ibac=0 on the kernel
> > + command line.  The purpose of this option is to allow a
> > + single kernel image to be distributed with IBAC built in,
> > + but not necessarily enabled.
> > +
> > + If you are unsure how to answer this question, answer N.
> 
> What's the downside to having this always builtin instead of
> yet another config option?

The ability of changing LSM modules at runtime might be perceived
as problematic.

> > +static struct security_operations ibac_security_ops = {
> > +   .bprm_check_security = ibac_bprm_check_security
> > +};
> > +
> > +static int __init init_ibac(void)
> > +{
> > +   int rc;
> > +
> > +   if (!ibac_enabled)
> > +   return 0;
> > +
> > +   rc = register_security(&ibac_security_ops);
> > +   if (rc != 0)
> > +   panic("IBAC: Unable to register with kernel\n");
> 
> Normally we would not want to see a panic() from a register_xyz()
> failure, but I guess you are arguing that an ibac register_security()
> failure needs to halt everything??

Yes, as this implies that another LSM module registered the hooks first,
preventing IBAC from registering itself. 

Thank you for your other comments.  They'll be addressed in the next
ibac patch release.

Mimi Zohar


Re: should RTS init in serial core be tied to CRTSCTS

2007-03-09 Thread Oleksiy Kebkal


2007/3/8, Russell King <[EMAIL PROTECTED]>:

... which occurs /after/ userspace is up and running, when sysfs is
available.  So putting it in sysfs is reasonable.


Is it right place for serial settings?
/sys/class/tty/ttySN/

How far is it reasonable to split termios settings to the attributes?
1)
/sys/class/tty/ttyS0/termios
2)
/sys/class/tty/ttyS0/c_iflag
/sys/class/tty/ttyS0/c_oflag
/sys/class/tty/ttyS0/c_cflag
/sys/class/tty/ttyS0/c_lflag
/sys/class/tty/ttyS0/c_cc
3)
/sys/class/tty/ttyS0/speed
/sys/class/tty/ttyS0/eof
/sys/class/tty/ttyS0/eon
/sys/class/tty/ttyS0/erase
 and so on


-Oleksiy

Re: [RFC] [Patch 1/1] IBAC Patch

2007-03-09 Thread Serge E. Hallyn

Quoting [EMAIL PROTECTED] ([EMAIL PROTECTED]):
> On Thu, 08 Mar 2007 17:58:16 EST, Mimi Zohar said:
> > This is a request for comments for a new Integrity Based Access
> > Control(IBAC) LSM module which bases access control decisions
> > on the new integrity framework services. 
> > 
> > (Hopefully this will help clarify the interaction between an LSM 
> > module and LIM module.)
> 
> OK, between this and the additional LIM hooks I didn't notice in an earlier
> patch, we're starting to see the API.   The only problem is that although
> it may be the right API for *your* code, I suspect it's a non-starter without
> a discussion about whether it's the right *generic* API for an LIM (which will
> require at least one dramatic bun fight about what "Integrity" means).

Casey's earlier message suggested this too.  'Integrity' here in
particular does not mean online integrity guarantees through, i.e.,
information flow control.  So perhaps instead of 'integrity' we should
make sure to always say 'integrity measurement'.  Of course then there
is already the 'integrity measurement architecture' which is only one
implementation of a LIM module, right?   So it would need to be renamed
to TIMA (TPM-enabled IMA) or something I guess.

-serge

Re: [PATCH] Complain about missing system calls.

2007-03-09 Thread Andi Kleen

David Woodhouse <[EMAIL PROTECTED]> writes:

> Most system calls seem to get added to i386 first. This patch
> automatically generates a warning for any new system call which is
> implemented on i386 but not the architecture currently being compiled.
> On PowerPC at the moment, for example, it results in these warnings:
> init/missing_syscalls.h:935:3: warning: #warning syscall sync_file_range not implemented
> init/missing_syscalls.h:947:3: warning: #warning syscall getcpu not implemented
> init/missing_syscalls.h:950:3: warning: #warning syscall epoll_pwait not implemented

I think a better solution would be to finally switch to auto generated
system call tables for newer system calls. The original reason why the
architectures have different system call numbers -- compatibility with
another "native" Unix -- is completely obsolete now. This leaves only
minor differences of compat stub vs non compat stub and a few
architecture specific calls.

Of course the existing syscall numbers can't be changed, but all new
calls could simply be added automatically for everybody.

A global table with two entries (compat and non compat) and a per arch 
override table should be sufficient.
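
Roughly, the layout could look like the following user-space sketch. All of
the names (toy_syscall_entry, toy_arch_override, toy_lookup, toy_resolve and
the four sample handlers) are invented for illustration and do not exist in
the kernel; the linear search over the override table is only there to keep
the example short.

```c
#include <stddef.h>

typedef long (*toy_syscall_fn)(void);

/* One row of the shared table: a native entry plus a compat entry. */
struct toy_syscall_entry {
	toy_syscall_fn native;
	toy_syscall_fn compat;	/* NULL means "same as native" */
};

/* Optional per-architecture replacement for a single slot. */
struct toy_arch_override {
	unsigned int nr;
	struct toy_syscall_entry entry;
};

/* Sample handlers so the tables have something to point at. */
static long toy_generic_a(void) { return 1; }
static long toy_generic_b(void) { return 2; }
static long toy_compat_b(void)  { return 3; }
static long toy_arch_a(void)    { return 4; }

/* The override table wins; otherwise fall back to the global table. */
static const struct toy_syscall_entry *
toy_lookup(const struct toy_syscall_entry *global, size_t nr_global,
	   const struct toy_arch_override *ovr, size_t nr_ovr,
	   unsigned int nr)
{
	size_t i;

	for (i = 0; i < nr_ovr; i++)
		if (ovr[i].nr == nr)
			return &ovr[i].entry;
	return nr < nr_global ? &global[nr] : NULL;
}

/* Pick the compat stub for compat callers when one is provided. */
static toy_syscall_fn toy_resolve(const struct toy_syscall_entry *e,
				  int is_compat)
{
	if (e == NULL)
		return NULL;
	if (is_compat && e->compat != NULL)
		return e->compat;
	return e->native;
}
```

A real implementation would presumably generate flat per-arch tables at
build time rather than searching at run time.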

Comments?

-Andi

Re: [PATCH] Complain about missing system calls.

2007-03-09 Thread Jan-Benedict Glaw

On Fri, 2007-03-09 17:11:10 +0100, Andi Kleen <[EMAIL PROTECTED]> wrote:
> David Woodhouse <[EMAIL PROTECTED]> writes:
> > Most system calls seem to get added to i386 first. This patch
> > automatically generates a warning for any new system call which is
> > implemented on i386 but not the architecture currently being compiled.
> > On PowerPC at the moment, for example, it results in these warnings:
> > init/missing_syscalls.h:935:3: warning: #warning syscall sync_file_range not implemented
> > init/missing_syscalls.h:947:3: warning: #warning syscall getcpu not implemented
> > init/missing_syscalls.h:950:3: warning: #warning syscall epoll_pwait not implemented
> 
> I think a better solution would be to finally switch to auto generated
> system call tables for newer system calls. The original reason why the
> architectures have different system call numbers -- compatibility with
> another "native" Unix -- is completely obsolete now. This leaves only
> minor differences of compat stub vs non compat stub and a few
> architecture specific calls.
> 
> Of course the existing syscall numbers can't be changed, but for all new 
> calls one could just add automatically for everybody.
> 
> A global table with two entries (compat and non compat) and a per arch 
> override table should be sufficient.

Not everybody has a simple indexed list of pointers :)  For example,
for vax-linux, we use a struct per syscall with the expected number of
on-stack longwords for the call.

So if something "new" is coming up, please keep in mind that it should
be flexible enough to represent that. :)
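
As an illustration of what "a struct per syscall" means, here is a small
user-space sketch. It is not the actual vax-linux layout: the names
toy_vax_syscall, toy_sum2 and toy_dispatch are hypothetical, and the only
property modeled is that each entry carries the number of on-stack longwords
it expects, so the dispatcher can reject a call with the wrong count.

```c
#include <stddef.h>

/* Hypothetical per-syscall descriptor: handler plus expected arg count. */
struct toy_vax_syscall {
	long (*handler)(const long *args);
	unsigned int nr_longwords;	/* expected on-stack longwords */
};

/* Sample two-argument handler used to exercise the dispatcher. */
static long toy_sum2(const long *args)
{
	return args[0] + args[1];
}

/*
 * Returns 0 and stores the handler's result in *result on success,
 * or -1 if the syscall number is out of range or the caller pushed
 * a different number of longwords than the entry expects.
 */
static int toy_dispatch(const struct toy_vax_syscall *tbl, size_t tbl_len,
			unsigned int nr, const long *args,
			unsigned int nargs, long *result)
{
	if (nr >= tbl_len || tbl[nr].handler == NULL)
		return -1;
	if (tbl[nr].nr_longwords != nargs)
		return -1;
	*result = tbl[nr].handler(args);
	return 0;
}
```

A generated table would need a way to emit the extra field for
architectures like this one, not just a list of function pointers.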

MfG, JBG

-- 
  Jan-Benedict Glaw  [EMAIL PROTECTED]  +49-172-7608481
Signature of:  "really soon now": an unspecified period of time, likely to
the second  :  be greater than any reasonable definition of "soon".



Re: Possible "struct pid" leak from tty_io.c

2007-03-09 Thread Catalin Marinas

Eric,

For a longer explanation, see the second part of this e-mail. In
short, the patch below seems to fix this particular leak. I'm not sure
that's the correct/complete fix as I seem to still get a 2nd report.
Any info is welcomed.

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index e453268..4e33dc2 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -1375,6 +1375,9 @@ static void do_tty_hangup(struct work_struct *work)
 	}
 	read_unlock(&tasklist_lock);

+	put_pid(tty->session);
+	put_pid(tty->pgrp);
+
 	tty->flags = 0;
 	tty->session = NULL;
 	tty->pgrp = NULL;

On 08/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

"Catalin Marinas" <[EMAIL PROTECTED]> writes:
> The /sbin/init application calls sys_clone() a few times but only one
> leak is reported (see below). Looking at the reported pid object (at
> 0xc7c14500), count is 2 and nr is 296 but no process with pid 296
> exists any more.

[...]

> unreferenced object 0xc7c14500 (size 36):
>  comm "init", pid 245, jiffies 4294939289
>  backtrace:
>[] kmem_cache_alloc
>[] alloc_pid
>[] do_fork
>[] sys_clone
>[] ret_fast_syscall

I think this is the path that all pid structures come from so
unfortunately that doesn't help tracing this problem down.

No, indeed, but that's the only thing kmemleak can report. Anyway, I
got some more information now, after adding several printk's:

The difference from other pid objects is that this one (with nr 296)
is passed as a parameter to proc_set_tty(). The __proc_set_tty()
function increments the pid->count twice via get_pid(), and, with two
other get_pid calls, the pid->count for this object gets to 5 (1 being
the initial value). The prints below are function name, struct pid
address (different from the runs yesterday though), pid->nr and
pid->count (after get_pid incrementing). It also shows the return
address and symbol (the calling function):

 alloc_pid: c7c149d8, 296, 1
 get_pid: c7c149d8, 296, 2
   return: c0122d64 (proc_set_tty+0x34/0x54)
 get_pid: c7c149d8, 296, 3
   return: c0122d64 (proc_set_tty+0x34/0x54)
 get_pid: c7c149d8, 296, 4
   return: c002b328 (do_exit+0x2e4/0x7f8) - this is actually the get_pid
 in disassociate_ctty but it is reported like this because of get_pid
 inlining
 get_pid: c7c149d8, 296, 5
   return: c0124a0c (tty_vhangup+0x14/0x18)

On the exit path (see below), however, put_pid is called twice before
free_pid and once via release_task -> detach_pid -> free_pid -> ... ->
__rcu_process_callbacks -> delayed_put_pid -> put_pid. Note that
free_pid is called with pid->count == 3 and the last put_pid gets called
with count == 3 as well (but it decrements it to 2 and that's what I find
at that memory location). In the trace below, the pid->count is
printed before put_pid modifies it:

 put_pid: c7c149d8, 296, 5
   return: c0124b5c (disassociate_ctty+0x14c/0x230)
 put_pid: c7c149d8, 296, 4
   return: c0124ba8 (disassociate_ctty+0x198/0x230)
 detach_pid: c7c149d8, 296, 3
   return: c002a230 (release_task+0x1c0/0x358)
 detach_pid: c7c149d8, 296, 3
   return: c002a248 (release_task+0x1d8/0x358)
 detach_pid: c7c149d8, 296, 3
   return: c002a254 (release_task+0x1e4/0x358)
 free_pid: c7c149d8, 296, 3
   return: c003a990 (detach_pid+0xac/0xc8)
 ...
 delayed_put_pid: c7c149d8, 296, 3
   return: c003af68 (__rcu_process_callbacks+0x19c/0x25c)
 put_pid: c7c149d8, 296, 3
   return: c003a8cc (delayed_put_pid+0x54/0x6c)

In the above disassociate_ctty() function the code below (line 1542)
doesn't seem to get called:

	tty = get_current_tty();
	if (tty) {
		put_pid(tty->session);
		put_pid(tty->pgrp);
		tty->session = NULL;
		tty->pgrp = NULL;
	} else {

and I get the following error if TTY_DEBUG_HANGUP is defined - "error
attempted to write to tty [0x] = NULL".

It looks like the tty_vhangup() call in disassociate_ctty() sets
current->signal->tty to NULL in the do_each_pid_task loop in
do_tty_hangup (p->signal->tty = NULL). The second call to
get_current_tty() in disassociate_ctty() then returns NULL, and therefore
there is no put_pid on tty->session and tty->pgrp (which are also set to
NULL in the previous function).
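
To double-check the arithmetic, the traces above can be replayed in a tiny
user-space model. This is only a counting sketch, not kernel code: toy_pid,
toy_get_pid, toy_put_pid and toy_replay are made-up names, the real counter
is atomic, and the two extra puts from the patch are simply appended at the
end instead of being placed where do_tty_hangup runs.

```c
/* Toy stand-in for struct pid: only the reference count is modeled. */
struct toy_pid {
	int count;
};

static void toy_get_pid(struct toy_pid *p) { p->count++; }
static void toy_put_pid(struct toy_pid *p) { p->count--; }

/*
 * Replay the get/put calls seen in the traces and return the count
 * that is left over.  with_fix adds the two put_pid() calls that the
 * patch inserts into do_tty_hangup().
 */
static int toy_replay(int with_fix)
{
	struct toy_pid pid = { 1 };	/* alloc_pid */

	toy_get_pid(&pid);		/* proc_set_tty */
	toy_get_pid(&pid);		/* proc_set_tty */
	toy_get_pid(&pid);		/* disassociate_ctty (shown as do_exit) */
	toy_get_pid(&pid);		/* tty_vhangup */

	toy_put_pid(&pid);		/* disassociate_ctty */
	toy_put_pid(&pid);		/* disassociate_ctty */
	toy_put_pid(&pid);		/* delayed_put_pid */

	if (with_fix) {
		toy_put_pid(&pid);	/* put_pid(tty->session) */
		toy_put_pid(&pid);	/* put_pid(tty->pgrp) */
	}
	return pid.count;
}
```

Without the patch the model ends at 2, matching the count found in the
leaked object; with the two extra puts it reaches 0.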

Regards.

--
Catalin

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Oliver Neukum

Am Freitag, 9. März 2007 17:32 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
> 
> > Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > > 
> > > > Hi,
> > > > 
> > > > after a lightning bolt from high above I've been looking into refcounting
> > > > the data structures drivers use to provide the data used to refill sysfs
> > > > buffers. I've come to the following conclusion.
> > > > 
> > > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > > pointer
> > > > 2. drivers must be able to pass these pointers through an extended
> > > > device_create_file()
> > > > 3. Drivers must use refcounting if they want to use attributes
> > > > 4. read/write/poll must do refcounting
> > > > 
> > > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > > like the obvious choice. Comments?
> > > 
> > > Can you explain the reasoning that led to these conclusions?  And what 
> > > exactly was your lightning bolt?
> > 
> > The old race between disconnect and IO to attribute via sysfs again.
> > If I cannot disassociate the drivers from the buffers in the buffers, drivers
> > must not deallocate the data necessary to answer sysfs callbacks while
> > a buffer exists.
> 
> Why wouldn't you be able to dissociate a driver from a buffer?  That was 
> the whole point of adding .orphan to sysfs_buffer and creating 
> sysfs_buffer_collection -- it was supposed to solve exactly this race.

It did solve the race but deadlocked when unbinding devices through sysfs.
Linus therefore asked for the patch to be reverted and wants the issue solved
with refcounting.

Regards
Oliver

Re: "No handler for vector" patches don't work on some systems

2007-03-09 Thread Eric W. Biederman

Chuck Ebbert <[EMAIL PROTECTED]> writes:

> [sorry for the dup: this time to the right recipient]
>
> So far I've tried the simple "survive having no handler
> for a vector" patch and the preliminary 3-patch series
> that was in -mm for a while, and neither work on the
> Dell PowerEdge 29xx and 19xx systems. These servers
> have the Intel 5000X chipset with the 6700PXH PCI Hub
> with dual independent PCI-X busses, each with its own
> I/OxAPIC with 24 interrupts. The fixes do work on
> "simple" systems but not on these high-end ones.

Ok thanks for the report.  It sounds like there is another cause
for the problem in the Dell case.

The simple patch drops the interrupt handler but acknowledges the
hardware so if the driver can survive missing an interrupt we
should be ok.  With level triggered interrupts this should pretty
much be guaranteed as after the acknowledgement the unhandled
interrupt will be refired.

One of my internal test systems had a 6700PXH PCI hub (at least I
think that was the part) with the E7520 chipset.  So I don't think it is
just a matter of the hardware.  Although I do recall Intel having an
erratum out on that class of hardware for occasionally reordering
interrupt messages, with the end of interrupt coming before the
interrupt message itself, causing various things to get confused.
It would not surprise me if we were tickling some erratum like that.

I would very much like to know if what I merged into Linus's tree helps.
It is a little more conservative than my earlier patches.  I need
a way to reproduce this or to work closely with someone who can, because
this sounds like it has a different cause and I need to start with
that assumption.

Eric

Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]

2007-03-09 Thread Jiri Slaby


On 3/9/07, Jiri Kosina <[EMAIL PROTECTED]> wrote:

On Fri, 9 Mar 2007, Jiri Kosina wrote:

> If this is present also in vanilla and not only in -mm, could you please
> try reverting commits 4237081e573b99a48991aa71364b0682c444651c and
> d4ae650a904612ffb7edd3f28b69b022988d2466 and let me know if the
> situation gets any better?

Hi Jiri,


Hi.


or even better, does the patch below (against 2.6.21-rc3) fix the problem
with your keyboard? I can see possibilities of report fields unaligned to
the byte boundary, which might be causing problems.


I'll try it all.

I don't know if this is related, but my notebook keyboard doesn't emit
numbers with numlock (not even directly Fn+blue number) anymore with
-rc3 (note that LED is flashing when numlock is on). I think -rc2
worked fine (I'm going to check this too). It's Asus M6R, similar
(except wi-fi) to for example yenya's model here:
http://www.fi.muni.cz/~kas/m6r/

thanks,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: [RFC][PATCH 2/7] RSS controller core

2007-03-09 Thread Herbert Poetzl

On Tue, Mar 06, 2007 at 02:00:36PM -0800, Andrew Morton wrote:
> On Tue, 06 Mar 2007 17:55:29 +0300
> Pavel Emelianov <[EMAIL PROTECTED]> wrote:
> 
> > +struct rss_container {
> > +   struct res_counter res;
> > +   struct list_head page_list;
> > +   struct container_subsys_state css;
> > +};
> > +
> > +struct page_container {
> > +   struct page *page;
> > +   struct rss_container *cnt;
> > +   struct list_head list;
> > +};
> 
> ah. This looks good. I'll find a hunk of time to go through this work
> and through Paul's patches. It'd be good to get both patchsets lined
> up in -mm within a couple of weeks. But..

doesn't look so good to me, mainly because of the
additional per-page data and per-page processing

on 4GB of memory, with 100 guests, 50% shared for each
guest, this basically means ~1 million pages, 500k shared
and 1500k x sizeof(page_container) entries, which
roughly boils down to ~25MB of wasted memory ...

increase the amount of shared pages and it starts
getting worse, but maybe I'm missing something here
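
for reference, the estimate can be reproduced with the sketch below. it
assumes 4-byte pointers and no padding, so that a page_container (two
pointers plus a two-pointer list_head) is 16 bytes; the 1500k entry count is
taken from the figures above as-is, and the helper names are made up for
this example.

```c
#include <stddef.h>

/*
 * Assumed size of one page_container: struct page *, struct
 * rss_container * and a struct list_head (two pointers), i.e. four
 * pointers with no padding.
 */
static size_t toy_page_container_size(size_t pointer_size)
{
	return 4 * pointer_size;
}

/* Total bytes spent on tracking entries, given the entry count. */
static size_t toy_tracking_overhead(size_t entries, size_t pointer_size)
{
	return entries * toy_page_container_size(pointer_size);
}
```

with 1500000 entries and 4-byte pointers this gives 24000000 bytes, in line
with the ~25MB above; with 8-byte pointers it doubles to 48000000 bytes.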

> We need to decide whether we want to do per-container memory
> limitation via these data structures, or whether we do it via a
> physical scan of some software zone, possibly based on Mel's patches.

why not do simple page accounting (as done currently
in Linux) and use that for the limits, without
keeping the reference from container to page?

best,
Herbert


Re: [PATCH] z85230: Fix FIFO handling

2007-03-09 Thread Jeff Garzik


Alan Cox wrote:

We must exit immediately on a FIFO fill not take the end of packet path
otherwise each underrun in PIO transmit mode causes a runt packet and the
data is lost.

Signed-off-by: Alan Cox <[EMAIL PROTECTED]>


applied



Re: Possible "struct pid" leak from tty_io.c

2007-03-09 Thread Catalin Marinas

On 09/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

"Catalin Marinas" <[EMAIL PROTECTED]> writes:

> On 08/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> "Catalin Marinas" <[EMAIL PROTECTED]> writes:
>
> I think it's only the pid_chain and rcu member that could be placed in
> a list and kmemleak scans the memory for these two offsets as well.
> I'll check those lists anyway but I doubt it's a more fundamental
> problem with how kmemleak handles struct pid as I should've probably
> got more reports.

Right.  I was pointing out the possibilities because we do
some tricky things.  Mostly I was wondering about the hlist for
the list of tasks.  Now if a task is on that list we should have
a struct pid_link pointing at our struct pid, so it shouldn't fool
kmemleak but I'm still a little curious if all of those hlist_heads are
NULL pointers.

Yes, all the 3 hlist_head tasks are NULL pointers on the reported object.

--
Catalin

Re: [PATCH] i2c-core: i2c bitbang gpio structure

2007-03-09 Thread Jean Delvare

Hi Bryan,

On Fri, 09 Mar 2007 18:13:21 +0800, Wu, Bryan wrote:
> Hi folks,
> 
> A new structure is added to i2c-core for GPIO-based I2C interface
> adapter. My latest GPIO based I2C adapter driver for Blackfin system
> will use this stuff. And also IXP4XX GPIO based I2C driver can also be
> moved to this.
>
> Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
> ---
>  include/linux/i2c.h |   20 
>  1 file changed, 20 insertions(+)
> 
> Index: include/linux/i2c.h
> ===
> --- include/linux/i2c.h   (revision 2813)
> +++ include/linux/i2c.h   (working copy)
> @@ -201,6 +201,26 @@ struct i2c_algorithm {
>  };
>  
>  /*
> + * Some chips do not have an I2C unit, so GPIO lines are just used to 
> + * Used as platform_data to provide GPIO pin information to this kind GPIO 
> + * based I2C driver.
> + */
> +struct i2c_bitbang_gpio {
> + int sda;
> + int scl;
> +};

Why would this be included in the generic i2c.h header file? As far as
I can see this structure only makes sense for bit-banged I2C busses, so
this structure should be declared in i2c-algo-bit.h.

Also, this structure alone isn't very useful. I'm waiting to see
drivers actually making use of it before I will consider merging this
patch at all.

> +
> +static inline int i2c_bitbang_gpio_sda(struct i2c_bitbang_gpio *gpio)
> +{
> + return (gpio->sda);
> +}
> +
> +static inline int i2c_bitbang_gpio_scl(struct i2c_bitbang_gpio *gpio)
> +{
> + return (gpio->scl);
> +}

What's the point of these?

-- 
Jean Delvare

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Dmitry Torokhov


On 3/9/07, Oliver Neukum <[EMAIL PROTECTED]> wrote:

Am Freitag, 9. März 2007 17:32 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
>
> > Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > >
> > > > Hi,
> > > >
> > > > after a lightning bolt from high above I've been looking into refcounting
> > > > the data structures drivers use to provide the data used to refill sysfs
> > > > buffers. I've come to the following conclusion.
> > > >
> > > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > > pointer
> > > > 2. drivers must be able to pass these pointers through an extended
> > > > device_create_file()
> > > > 3. Drivers must use refcounting if they want to use attributes
> > > > 4. read/write/poll must do refcounting
> > > >
> > > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > > like the obvious choice. Comments?
> > >
> > > Can you explain the reasoning that led to these conclusions?  And what
> > > exactly was your lightning bolt?
> >
> > The old race between disconnect and IO to attribute via sysfs again.
> > If I cannot disassociate the drivers from the buffers in the buffers, drivers
> > must not deallocate the data necessary to answer sysfs callbacks while
> > a buffer exists.
>
> Why wouldn't you be able to dissociate a driver from a buffer?  That was
> the whole point of adding .orphan to sysfs_buffer and creating
> sysfs_buffer_collection -- it was supposed to solve exactly this race.

It did solve the race but deadlocked when unbinding devices through sysfs.
Linus therefore asked for the patch to be reverted and wants the issue solved
with refcounting.



I think we already have all refcounting that is needed. What is
missing is subsystem-provided ->release() hooks for drivers to release
driver-specific resources when a device finally goes away.
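
A rough user-space sketch of the idea follows. It only imitates the shape of
a refcount with a release callback; toy_ref, toy_drvdata and the helper
functions are invented names, not an existing kernel API, and the counter is
a plain int where the kernel would use an atomic one.

```c
#include <stddef.h>

/* Minimal refcount with a release hook, loosely in the spirit of kref. */
struct toy_ref {
	int count;
	void (*release)(struct toy_ref *ref);
};

static void toy_ref_init(struct toy_ref *ref,
			 void (*release)(struct toy_ref *ref))
{
	ref->count = 1;
	ref->release = release;
}

static void toy_ref_get(struct toy_ref *ref)
{
	ref->count++;
}

/* Drops one reference; returns 1 if this call triggered the release. */
static int toy_ref_put(struct toy_ref *ref)
{
	if (--ref->count == 0) {
		ref->release(ref);
		return 1;
	}
	return 0;
}

/* Hypothetical driver-private data that must outlive open attributes. */
struct toy_drvdata {
	struct toy_ref ref;
	int released;	/* set by the release hook, for demonstration */
};

static void toy_drvdata_release(struct toy_ref *ref)
{
	/* Recover the containing structure from the embedded member. */
	struct toy_drvdata *d = (struct toy_drvdata *)
		((char *)ref - offsetof(struct toy_drvdata, ref));

	d->released = 1;	/* a real driver would free its resources here */
}
```

With this shape, disconnect only drops the driver's own reference, and the
data is released when the last user of an attribute goes away.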

--
Dmitry

Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Pekka Enberg


On Fri, Mar 09, 2007 at 12:13:35PM +0100, Eric Dumazet wrote:

> Then just drop the fget_light() 'optimisation' and always take a reference
> (atomic on f_count) regardless of single-thread or not. Instead of dirtying
> f_light, just do the straightforward thing and be done with it.
>
> (that is : fget_light() = fget() = no more keeping fput_needed everywhere, and
> convoluted things in some dark sides of the kernel.)


On 3/9/07, Benjamin LaHaise <[EMAIL PROTECTED]> wrote:

And it makes things rather slower for a lot of single threaded applications
on modern systems.  Yes, fget_light can be done much more cleanly, but please
don't go around ripping out optimizations just because.


Don't worry, the fget_light() bits are no longer needed:
http://lkml.org/lkml/2007/3/9/151
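
For anyone following along, the optimisation under discussion boils down to
the pattern in this user-space sketch. It is a simplified model with made-up
names (toy_file, toy_files, toy_fget_light, toy_fput_light), plain ints
where the kernel uses atomics, and no locking or RCU.

```c
#include <stddef.h>

#define TOY_MAX_FDS 8

struct toy_file {
	int f_count;		/* reference count on the open file */
};

struct toy_files {
	int users;		/* tasks sharing this descriptor table */
	struct toy_file *fd[TOY_MAX_FDS];
};

/*
 * If the descriptor table is not shared, nobody else can close the
 * file underneath us, so the reference count is left untouched and
 * *fput_needed is 0.  Otherwise a reference is taken and the caller
 * must drop it again.
 */
static struct toy_file *toy_fget_light(struct toy_files *files,
				       unsigned int fd, int *fput_needed)
{
	struct toy_file *file;

	*fput_needed = 0;
	if (fd >= TOY_MAX_FDS)
		return NULL;
	file = files->fd[fd];
	if (file == NULL)
		return NULL;
	if (files->users > 1) {
		file->f_count++;
		*fput_needed = 1;
	}
	return file;
}

/* Drop the reference only if toy_fget_light() actually took one. */
static void toy_fput_light(struct toy_file *file, int fput_needed)
{
	if (fput_needed)
		file->f_count--;
}
```

The fput_needed flag is the bookkeeping that has to be carried around by
every caller, which is the cost being weighed against the saved atomic
operation.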

Re: Possible "struct pid" leak from tty_io.c

2007-03-09 Thread Eric W. Biederman

"Catalin Marinas" <[EMAIL PROTECTED]> writes:

> Eric,
>
> For a longer explanation, see the second part of this e-mail. In
> short, the patch below seems to fix this particular leak. I'm not sure
> that's the correct/complete fix as I seem to still get a 2nd report.
> Any info is welcomed.

Sure.  I was starting to suspect that location myself.

> diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
> index e453268..4e33dc2 100644
> --- a/drivers/char/tty_io.c
> +++ b/drivers/char/tty_io.c
> @@ -1375,6 +1375,9 @@ static void do_tty_hangup(struct work_struct *work)
>   }
>   read_unlock(&tasklist_lock);
>
> + put_pid(tty->session);
> + put_pid(tty->pgrp);
> +
>   tty->flags = 0;
>   tty->session = NULL;
>   tty->pgrp = NULL;
>
> On 08/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> "Catalin Marinas" <[EMAIL PROTECTED]> writes:
>> > The /sbin/init application calls sys_clone() a few times but only one
>> > leak is reported (see below). Looking at the reported pid object (at
>> > 0xc7c14500), count is 2 and nr is 296 but no process with pid 296
>> > exists any more.
> [...]
>> > unreferenced object 0xc7c14500 (size 36):
>> >  comm "init", pid 245, jiffies 4294939289
>> >  backtrace:
>> >[] kmem_cache_alloc
>> >[] alloc_pid
>> >[] do_fork
>> >[] sys_clone
>> >[] ret_fast_syscall
>>
>> I think this is the path that all pid structures come from so
>> unfortunately that doesn't help tracing this problem down.
>
> No, indeed, but that's the only thing kmemleak can report. Anyway, I
> got some more information now, after adding several printk's:
>
> The difference from other pid objects is that this one (with nr 296)
> is passed as a parameter to proc_set_tty(). The __proc_set_tty()
> function increments the pid->count twice via get_pid(), and, with two
> other get_pid calls, the pid->count for this object gets to 5 (1 being
> the initial value). The prints below are function name, struct pid
> address (different from the runs yesterday though), pid->nr and
> pid->count (after get_pid incrementing). It also shows the return
> address and symbol (the calling function):
>
>  alloc_pid: c7c149d8, 296, 1
>  get_pid: c7c149d8, 296, 2
>return: c0122d64 (proc_set_tty+0x34/0x54)
>  get_pid: c7c149d8, 296, 3
>return: c0122d64 (proc_set_tty+0x34/0x54)
>  get_pid: c7c149d8, 296, 4
>return: c002b328 (do_exit+0x2e4/0x7f8) - this is actually the get_pid
>  in disassociate_ctty but it is reported like this because of get_pid
>  inlining
>  get_pid: c7c149d8, 296, 5
>return: c0124a0c (tty_vhangup+0x14/0x18)
>
> On the exit path (see below), however, put_pid is called twice before
> free_pid and once via release_task -> detach_pid -> free_pid -> ... ->
> __rcu_process_callbacks -> delayed_put_pid -> put_pid. Note that
> free_pid is called with pid->nr == 3 and the last put_pid gets called
> with nr == 3 as well (but it decrements it to 2 and that's what I find
> at that memory location). In the trace below, the pid->count is
> printed before put_pid modifies it:
>
>  put_pid: c7c149d8, 296, 5
>return: c0124b5c (disassociate_ctty+0x14c/0x230)
>  put_pid: c7c149d8, 296, 4
>return: c0124ba8 (disassociate_ctty+0x198/0x230)
>  detach_pid: c7c149d8, 296, 3
>return: c002a230 (release_task+0x1c0/0x358)
>  detach_pid: c7c149d8, 296, 3
>return: c002a248 (release_task+0x1d8/0x358)
>  detach_pid: c7c149d8, 296, 3
>return: c002a254 (release_task+0x1e4/0x358)
>  free_pid: c7c149d8, 296, 3
>return: c003a990 (detach_pid+0xac/0xc8)
>  ...
>  delayed_put_pid: c7c149d8, 296, 3
>return: c003af68 (__rcu_process_callbacks+0x19c/0x25c)
>  put_pid: c7c149d8, 296, 3
>return: c003a8cc (delayed_put_pid+0x54/0x6c)
>
> In the above disassociate_ctty() function the code below (line 1542)
> doesn't seem to get called:
>
>   tty = get_current_tty();
>   if (tty) {
>   put_pid(tty->session);
>   put_pid(tty->pgrp);
>   tty->session = NULL;
>   tty->pgrp = NULL;
>   } else {
>
> and I get the following error if TTY_DEBUG_HANGUP is defined - "error
> attempted to write to tty [0x] = NULL".
>
> It looks like the tty_vhangup() call in in disassociate_ctty() sets
> current->signal->tty to NULL in the do_each_pid_task loop in
> do_tty_hangup (p->signal->tty = NULL). The second call to
> get_current_tty() in disassociate_ctty() return NULL and therefore no
> put_pid on tty->session and tty->pgrp (which are also set to NULL in
> the previous function).

Thanks.

If I can manage to focus on this, it looks like the information I need to 
start fixing this.

Adding the reference counting when we didn't have any before is always
interesting.
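
A minimal userspace sketch of the imbalance described above, assuming a heavily simplified model: pid_sketch and tty_sketch are hypothetical stand-ins for struct pid and struct tty_struct, and the counts are illustrative rather than a reproduction of the trace. It only shows that clearing tty->session and tty->pgrp without a matching put leaves references behind, and that dropping them before the pointers are set to NULL balances the count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct pid and struct tty_struct. */
struct pid_sketch { int count; int nr; };
struct tty_sketch { struct pid_sketch *session; struct pid_sketch *pgrp; };

static struct pid_sketch *get_pid_sketch(struct pid_sketch *pid)
{
	if (pid)
		pid->count++;
	return pid;
}

static void put_pid_sketch(struct pid_sketch *pid)
{
	if (!pid)
		return;
	assert(pid->count > 0);
	pid->count--;	/* the real put_pid() frees the object when this hits zero */
}

/* Takes two references, like the two get_pid() calls seen from proc_set_tty. */
static void set_tty_sketch(struct tty_sketch *tty, struct pid_sketch *pid)
{
	tty->session = get_pid_sketch(pid);
	tty->pgrp = get_pid_sketch(pid);
}

/* Hangup clears the pointers; drop_refs selects the behaviour of the proposed fix. */
static void hangup_sketch(struct tty_sketch *tty, int drop_refs)
{
	if (drop_refs) {
		put_pid_sketch(tty->session);
		put_pid_sketch(tty->pgrp);
	}
	tty->session = NULL;
	tty->pgrp = NULL;
}

/* Returns the count left after a hangup; anything above the owner's 1 is leaked. */
static int refs_after_hangup_sketch(int drop_refs)
{
	struct pid_sketch pid = { 1, 296 };
	struct tty_sketch tty = { NULL, NULL };

	set_tty_sketch(&tty, &pid);
	hangup_sketch(&tty, drop_refs);
	return pid.count;
}
```

Calling refs_after_hangup_sketch(0) models a hangup that only clears the pointers, and refs_after_hangup_sketch(1) models one that drops the references first.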

Eric

Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]

2007-03-09 Thread Jiri Slaby


On 3/9/07, Jiri Slaby <[EMAIL PROTECTED]> wrote:
> I don't know if this is related, but my notebook keyboard doesn't emit
> numbers with numlock (not even directly Fn+blue number) anymore with
> -rc3 (note that LED is flashing when numlock is on). I think -rc2
> worked fine (I'm going to check this too). It's Asus M6R, similar
> (except wi-fi) to for example yenya's model here:
> http://www.fi.muni.cz/~kas/m6r/

Ignore this, it's deus ex machina, it works now.

regards,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: [PATCH] fs/jffs2/scan.c: Fix error-path leak

2007-03-09 Thread Artem Bityutskiy


Please do not forget to look at MAINTAINERS and CC the maintainer. David is CCed.

Amit Choudhary wrote:

Description: Fix error-path leak in function jffs2_scan_medium(), in file 
fs/jffs2/scan.c

Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>

diff --git a/fs/jffs2/scan.c b/fs/jffs2/scan.c
index e241346..cd9ed6e 100644
--- a/fs/jffs2/scan.c
+++ b/fs/jffs2/scan.c
@@ -130,6 +130,8 @@ #endif
if (jffs2_sum_active()) {
s = kmalloc(sizeof(struct jffs2_summary), GFP_KERNEL);
if (!s) {
+   free(flashbuf);
+   flashbuf = NULL;
JFFS2_WARNING("Can't allocate memory for summary\n");
return -ENOMEM;
}



--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Oliver Neukum

On Friday, 9 March 2007 18:02, Dmitry Torokhov wrote:

> I think we already have all refcounting that is needed. What is
> missing is subsystem-provided ->release() hooks for drivers to release
> driver-specific resources when a device finally goes away.

This is an interesting idea. Is it nice to pass through release()
but not open() ?

Regards
Oliver

Re: [SLUB 0/3] SLUB: The unqueued slab allocator V4

2007-03-09 Thread Christoph Lameter

On Fri, 9 Mar 2007, Mel Gorman wrote:

> The results without slub_debug were not good except for IA64. x86_64 and ppc64
> both blew up for a variety of reasons. The IA64 results were

Yuck that is the dst issue that Adrian is also looking at. Likely an issue 
with slab merging and RCU frees.

> KernBench Comparison
> 
>                    2.6.21-rc2-mm2-clean   2.6.21-rc2-mm2-slub    %diff
> User   CPU time                 1084.64               1032.93    4.77%
> System CPU time                   73.38                 63.14   13.95%
> Total  CPU time                 1158.02               1096.07    5.35%
> Elapsed    time                  307.00                285.62    6.96%

Wow! The first indication that we are on the right track with this.

> AIM9 Comparison
>  2 page_test   2097119.26 3398259.27 1301140.01  62.04% System Allocations & Pages/second

Wow! Must have all stayed within slab boundaries.

>  8 link_test     64776.04    7488.13  -57287.91 -88.44% Link/Unlink Pairs/second

Crap. Maybe we straddled a slab boundary here?

Re: Trouble using some (fast) compact flash as ide device on an embedded system

2007-03-09 Thread Marco Lazzarotto

Hallo! :-)

Bartlomiej Zolnierkiewicz wrote:
> Czesc!
> 
> On Tuesday 06 March 2007, Marco Lazzarotto wrote:
> 
>>Ciao!
>>
>>Bartlomiej Zolnierkiewicz wrote:
>>
>>>On Friday 02 March 2007, Pavel Machek wrote:
>>>
>>>
Hi!

>As I reported in bug 8036 in bugzilla.kernel.org,
>
>Hardware Environment:
>
>- Use a compact flash SanDisk SDCFB-128 Firmware revision HDX 2.15
>  (we used other compact flashes with the same hw ad sw for years
>   with  no trouble)
>
>It happens on both etx boards:
>- VIA SOM-ETX (4475)
>- Gene-4312
>>
>>Correction: Gene-4312 is not an ETX board ;-) but a PC/104
> 
> 
> What IDE hardware / host driver is used by this system?

NB: I'm using the VIA SOM-ETX (4475) for debugging

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot :00:07.1
PCI: Calling quirk c01dc1e8 for :00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci:00:07.1
ide1: BM-DMA at 0xe408-0xe40f, BIOS settings: hdc:DMA, hdd:pio

(I disabled DMA in the BIOS; why does it say it is enabled?)

>Doing the command
>sfdisk -R /dev/hdc
>
>gives:
>
>* * *
>ide1: start_request: current=0xc6ebe754 (rq->sect=0,block 0)
>hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
>ide: failed opcode was: unknown
>hdc: drive not ready for command
>ide1: start_request: current=0xc6ebe754 (rq->sect=0,block 0)
>hdc: do_special: 0x02
>hdc: do_special: recalibrate
>ide1: start_request: current=0xc6ebe754 (rq->sect=0,block 0)
>hdc: reading: block=0 sectors=8, buffer = 0xc6cd4
>ide1: end_request: current=0xc6ebe754
>* * *
>
>the 'bad bit' in status error is DataRequest
> 
> 
> Seems like the device wants data from/to host and I have no idea why this
> is happening.  It might be that this particular CF has problems with one
> of the commands that IDE driver issues during device initialization.
> 
> I assume that device is recognized properly by the driver during probe, right?
> If so probably adding some debugging printks (i.e. dumping status register)
> to ide-disk.c:idedisk_setup() would shed some more light at the problem...

The device seems to be recognized properly.
Here's (part of) the dmesg output:

 * * *

ide1: BM-DMA at 0xe408-0xe40f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide1...
probing for hdc: present=0, media=32, probetype=ATA
hdc: SanDisk SDCFB-128, CFA DISK drive ()
Before ide_disk_init_chs() and ide_disk_init_mult_count()
IDE_STATUS_REG=0x50
After ide_disk_init_chs() and ide_disk_init_mult_count() IDE_STATUS_REG=0x50
probing for hdd: present=0, media=32, probetype=ATA
probing for hdd: present=0, media=32, probetype=ATAPI
ide_init_queue()
ide1 at 0x170-0x177,0x376 on irq 15
Probing IDE interface ide0...
probing for hda: present=0, media=32, probetype=ATA
probing for hda: present=0, media=32, probetype=ATAPI
probing for hdb: present=0, media=32, probetype=ATA
probing for hdb: present=0, media=32, probetype=ATAPI
Probing IDE interface ide2...
probing for hde: present=0, media=32, probetype=ATA
probing for hde: present=0, media=32, probetype=ATAPI
probing for hdf: present=0, media=32, probetype=ATA
probing for hdf: present=0, media=32, probetype=ATAPI
Probing IDE interface ide3...
probing for hdg: present=0, media=32, probetype=ATA
probing for hdg: present=0, media=32, probetype=ATAPI
probing for hdh: present=0, media=32, probetype=ATA
probing for hdh: present=0, media=32, probetype=ATAPI
hdc: max request size: 128KiB
After init_idedisk_capacity() IDE_STATUS_REG=0x50
After idedisk_capacity() IDE_STATUS_REG=0x50
hdc: 250880 sectors (128 MB) w/1KiB Cache (buf_size=2), CHS=980/8/32
After write_cache(drive,1) IDE_STATUS_REG=0x50
 hdc:
ide1: start_request: current=0xc1190804 (rq->sect=0,block 0, SECTOR_SIZE=512
hdc: do_special: 0x03
hdc: do_special: set_geometry
ide1: start_request: current=0xc1190804 (rq->sect=0,block 0, SECTOR_SIZE=512
hdc: do_special: 0x02
hdc: do_special: recalibrate
hdc : recal_intr() IDE_STATUS_REG=50
ide1: start_request: current=0xc1190804 (rq->sect=0,block 0, SECTOR_SIZE=512
hdc: reading: block=0, sectors=8, buffer=0xc6c2d000
ide1: end_request:   current=0xc1190804
 hdc1

 * * *

I dump IDE_STATUS_REG with e.g. 'printk("%s : recal_intr()
IDE_STATUS_REG=%02x\n",drive->name,stat)'
where stat was assigned as 'u8 stat=hwif->INB(IDE_STATUS_REG)'

It seems to me that the status is good until it tries to read the
partition table...

In fact, after I do

sfdisk -R /dev/hdc

every other read from the compact flash (if it doesn't hit a 'lost
interrupt') generates the message
 hdc: status error: status=0x58 {...}

>
>doing
>sfdisk -l /dev/hdc
>
>gives:
>
>* * *
>ide1: start_request: curre

Re: "No handler for vector" patches don't work on some systems

2007-03-09 Thread Chuck Ebbert

Eric W. Biederman wrote:
> Chuck Ebbert <[EMAIL PROTECTED]> writes:
>>
>> So far I've tried the simple "survive having no handler
>> for a vector" patch and the preliminary 3-patch series
>> that was in -mm for a while, and neither work on the
>> Dell PowerEdge 29xx and 19xx systems. These servers
>> have the Intel 5000X chipset with the 6700PXH PCI Hub
>> with dual independent PCI-X busses, each with its own
>> I/OxAPIC with 24 interrupts. The fixes do work on
>> "simple" systems but not on these high-end ones.
> 
> 
> I would very much like to know if what I merged linus's tree helps.
> It is a little more conservative, than my earlier patches.  I need
> a way to reproduce this or to work closely with someone who is, because
> this sounds like it has a different cause and I need to start with
> that assumption.

Was that merged or is it still in -mm? The last thing I see in
arch/x86_64/irq.c is:

[PATCH] x86-64: survive having no irq mapping for a vector

And we tried that one.

Re: [PATCH] spi subsystem: destroy the spi_bitbang workqueue only after the spi master is unregistered

2007-03-09 Thread Chris Lesiak

From: Chris Lesiak <[EMAIL PROTECTED]>

This patch fixes a bug in the cleanup of an spi_bitbang bus.  The
workqueue associated with the bus was destroyed before the call to
spi_unregister_master.  That meant that spi devices on that bus would be
unable to do IO in their remove method.  The shutdown flag should have
been able to prevent a segfault, but was never getting set.  By waiting
to destroy the workqueue until after the master is unregistered, devices
are able to do IO in their remove methods.  An added benefit is that
neither the shutdown flag nor a wait for the queue of messages to empty
is needed.

Signed-off-by: Chris Lesiak <[EMAIL PROTECTED]>

---

diff -uprN -X linux-2.6.20-vanilla/Documentation/dontdiff linux-2.6.20-vanilla/drivers/spi/spi_bitbang.c linux-2.6.20/drivers/spi/spi_bitbang.c
--- linux-2.6.20-vanilla/drivers/spi/spi_bitbang.c  2007-02-04 12:44:54.0 -0600
+++ linux-2.6.20/drivers/spi/spi_bitbang.c  2007-03-09 11:23:42.0 -0600
@@ -302,10 +302,6 @@ static void bitbang_work(struct work_str
setup_transfer = NULL;
 
list_for_each_entry (t, &m->transfers, transfer_list) {
-   if (bitbang->shutdown) {
-   status = -ESHUTDOWN;
-   break;
-   }
 
/* override or restore speed and wordsize */
if (t->speed_hz || t->bits_per_word) {
@@ -410,8 +406,6 @@ int spi_bitbang_transfer(struct spi_devi
m->status = -EINPROGRESS;
 
bitbang = spi_master_get_devdata(spi->master);
-   if (bitbang->shutdown)
-   return -ESHUTDOWN;
 
spin_lock_irqsave(&bitbang->lock, flags);
if (!spi->max_speed_hz)
@@ -506,28 +500,12 @@ EXPORT_SYMBOL_GPL(spi_bitbang_start);
  */
 int spi_bitbang_stop(struct spi_bitbang *bitbang)
 {
-   unsigned limit = 500;
-
-   spin_lock_irq(&bitbang->lock);
-   bitbang->shutdown = 0;
-   while (!list_empty(&bitbang->queue) && limit--) {
-   spin_unlock_irq(&bitbang->lock);
+   spi_unregister_master(bitbang->master);
 
-   dev_dbg(bitbang->master->cdev.dev, "wait for queue\n");
-   msleep(10);
-
-   spin_lock_irq(&bitbang->lock);
-   }
-   spin_unlock_irq(&bitbang->lock);
-   if (!list_empty(&bitbang->queue)) {
-   dev_err(bitbang->master->cdev.dev, "queue didn't empty\n");
-   return -EBUSY;
-   }
+   WARN_ON(!list_empty(&bitbang->queue));
 
destroy_workqueue(bitbang->workqueue);
 
-   spi_unregister_master(bitbang->master);
-
return 0;
 }
 EXPORT_SYMBOL_GPL(spi_bitbang_stop);
diff -uprN -X linux-2.6.20-vanilla/Documentation/dontdiff linux-2.6.20-vanilla/include/linux/spi/spi_bitbang.h linux-2.6.20/include/linux/spi/spi_bitbang.h
--- linux-2.6.20-vanilla/include/linux/spi/spi_bitbang.h    2007-02-04 12:44:54.0 -0600
+++ linux-2.6.20/include/linux/spi/spi_bitbang.h    2007-03-09 11:23:42.0 -0600
@@ -25,7 +25,6 @@ struct spi_bitbang {
spinlock_t  lock;
 struct list_head    queue;
u8  busy;
-   u8  shutdown;
u8  use_dma;
 
struct spi_master   *master;
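
The ordering argument in the patch description can be illustrated with a small userspace model. This is only a sketch under stated assumptions: bus_sketch and every helper below are hypothetical names, not part of the spi_bitbang API, and the "workqueue" is reduced to a flag. The point it shows is that a device remove method can only do IO if the queue is still alive when the master is unregistered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a bitbang bus; not the real struct spi_bitbang. */
struct bus_sketch {
	bool queue_alive;	/* stands in for bitbang->workqueue */
	bool master_registered;
	int io_done;		/* transfers completed by remove methods */
	int io_failed;		/* transfers that had no queue to run on */
};

/* A device's remove method wants to do one last transfer. */
static void device_remove_sketch(struct bus_sketch *b)
{
	if (b->queue_alive)
		b->io_done++;
	else
		b->io_failed++;
}

/* Unregistering the master is what runs the children's remove methods. */
static void unregister_master_sketch(struct bus_sketch *b)
{
	assert(b->master_registered);
	device_remove_sketch(b);
	b->master_registered = false;
}

static void destroy_queue_sketch(struct bus_sketch *b)
{
	b->queue_alive = false;
}

/* Order before the patch: queue destroyed first. Returns failed transfers. */
static int stop_old_order_sketch(void)
{
	struct bus_sketch b = { true, true, 0, 0 };

	destroy_queue_sketch(&b);
	unregister_master_sketch(&b);
	return b.io_failed;
}

/* Order after the patch: master unregistered first. Returns failed transfers. */
static int stop_new_order_sketch(void)
{
	struct bus_sketch b = { true, true, 0, 0 };

	unregister_master_sketch(&b);
	destroy_queue_sketch(&b);
	return b.io_failed;
}
```

With the old order the remove-time transfer fails; with the new order it completes, which is also why no shutdown flag is needed in the model.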





Re: [PATCH] chaostables

2007-03-09 Thread Jan Engelhardt

Hello,

On Mar 9 2007 11:54, Amin Azez wrote:
>> Adding a member to the ip_conntrack/nf_conntrack and sk_buff struct
>> would increase the struct sizes, and that would penalize users who do
>> not intend to use xt_portscan.
>
>I understand what you say but it sounds a bit like saying: "but we didn't
>make it very good because so few people would use it anyway" which of
>course makes it even less attractive. I realise you have your own
>interpretation but this is how it reads to me.

I just gave the reason why I designed it the way it is now. If you 
really feel it needs to be changed, well, I don't really object to that. 
chaostables has only seen like.. 1 1/2 version announcements (urls to 
tarballs, no patches) to mailing lists, and except for the few users who 
definitely tried it (based on questions I received), there have not been 
any suggestions for changes yet, which either tells me that nobody is 
interested or everything is fine.

>> I do not see why the packet/connection marks should not be used to record
>> additional information
>...
>> Almost never I required connection marking myself 
>I guessed as much. I use it heavily, with my xml rule generators.
>> except for this
>> portscanning automaton and perhaps a little MARK here and there for
>> finely-tuned SNAT. Again, things might look different on your side(s).
>
>There's too many things fighting over the same few bits of the mark, and
>in your case you are using it to track internal state of a connection
>that has no relevance to the rest of the iptables/ebtables rules.
>
>I'm suggesting that some of the people who would want to use the chaos
>match, won't because of the mark issue.
>
>This is not a new problem.
>
>http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/16217

"""netfilter marks are the solution of last resort. This is
becoming very painful for those of us who produce general
Netfilter configuration tools.""" -Tom Eastep

I see. Thank you for the link. I think you are on the way to have me 
convinced.

Jan
-- 

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Dmitry Torokhov

On 3/9/07, Oliver Neukum <[EMAIL PROTECTED]> wrote:
> On Friday, 9 March 2007 18:02, Dmitry Torokhov wrote:
>
> > I think we already have all refcounting that is needed. What is
> > missing is subsystem-provided ->release() hooks for drivers to release
> > driver-specific resources when a device finally goes away.
>
> This is an interesting idea. Is it nice to pass through release()
> but not open() ?

Not sure if I follow... Generally speaking, open is not a mandatory
operation; however, every object in the driver model has a release method.
What I am saying is that certain drivers need to have their disconnect
method split into 2 parts: one that shuts down the device, and a second
that releases resources that might be accessed through sysfs (and other
kernel parts). That second part will have to be called from the
subsystem's core ->release() method, so we need a release() hook.
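
A rough userspace sketch of that split, assuming a plain integer refcount and a hypothetical release callback (dev_sketch and all helpers are made-up names, not driver-model API): disconnect only quiesces the hardware and drops the driver's reference, while the resources a sysfs reader may still touch are freed from the release hook when the last reference goes away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device object with a subsystem-provided release hook. */
struct dev_sketch {
	int refcount;
	int hw_active;
	int resources_freed;
	void (*release)(struct dev_sketch *dev);
};

static void dev_get_sketch(struct dev_sketch *d)
{
	d->refcount++;
}

static void dev_put_sketch(struct dev_sketch *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0 && d->release)
		d->release(d);	/* last user gone: run part 2 */
}

/* Part 1 of disconnect: shut the device down, drop the driver's reference. */
static void driver_disconnect_sketch(struct dev_sketch *d)
{
	d->hw_active = 0;
	dev_put_sketch(d);
}

/* Part 2: free what sysfs (or other kernel parts) might still access. */
static void driver_release_sketch(struct dev_sketch *d)
{
	d->resources_freed = 1;
}

/*
 * Scenario: a sysfs reader holds a reference across disconnect.
 * Returns 1 if the resources survived until the reader was done.
 */
static int release_deferred_sketch(void)
{
	struct dev_sketch d = { 1, 1, 0, driver_release_sketch };
	int freed_while_open;

	dev_get_sketch(&d);		/* sysfs attribute opened */
	driver_disconnect_sketch(&d);	/* device unplugged */
	freed_while_open = d.resources_freed;
	dev_put_sketch(&d);		/* sysfs attribute closed */

	return !freed_while_open && d.resources_freed && !d.hw_active;
}
```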

--
Dmitry

Re: [PATCH 1/2] ibmebus: dynamic addiiton/removal of adapters, uevent, root device based on struct device

2007-03-09 Thread Joachim Fenkes

John Rose <[EMAIL PROTECTED]> wrote on 06.03.2007 22:51:42:

> We are seeing several build errors when attempting to apply this to
> 2.6.21-rc2:

Hot Damn! I did my test compiles with gcc 3.3, and you obviously compiled 
with gcc 4.1 - I only got a warning where you got an error, and that 
warning escaped me. Sorry about that.

I fixed the error and all warnings and will post a fresh set of patches 
right after this reply. If you could give the new patch another go and ack 
it if it works, I would be delighted! =)

Thanks for pointing this out!
  Joachim

---
Joachim Fenkes  --  eHCA Linux Driver Developer and Hardware Tamer
IBM Deutschland Entwicklung GmbH  --  Dept. 3627 (I/O Firmware Dev. 2)
Schoenaicher Strasse 220  --  71032 Boeblingen  --  Germany
eMail: [EMAIL PROTECTED]  --  Phone: +49 7031 16 1239 


2.6.20-rc3: Clocksource tsc unstable

2007-03-09 Thread Jiri Slaby


Hi.

I got this message after suspend;resume on my notebook
Clocksource tsc unstable (delta = -154983451 ns)

What other info should I post, who should I Cc?

regards,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Re: [PATCH] i2c-core: i2c bitbang gpio structure

2007-03-09 Thread David Brownell

On Friday 09 March 2007 8:55 am, Jean Delvare wrote:

> > +struct i2c_bitbang_gpio {
> > +   int sda;
> > +   int scl;
> > +};
> 
> ...
> 
> Also, this structure alone isn't very useful. I'm waiting to see
> drivers actually making use of it before I will consider merging this
> patch at all.

The notion would be that we could have one i2c bitbanger using
the CONFIG_GENERIC_GPIO  interfaces that could work
on most platforms, using that struct for platform_data and the
usual convention for platform device naming.

I'd expect that struct would be merged as part of such a generic
GPIO bitbang driver, and would only be used by that one driver.

SPI could use such a generic bitbanger too.  Until 2.6.21 it's
been missing that last step:  it's needed platform-specific
GPIO calls, so the bitbangers were generic except for those
lowest-level hooks.
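
To make the idea concrete, here is a hedged userspace sketch of what a generic bitbanger could do with that struct. Only struct i2c_bitbang_gpio comes from the patch; the GPIO array, gpio_set_sketch() and the event counters are hypothetical test scaffolding standing in for real GPIO calls, and real code would also need bus timing delays. It relies on the standard I2C rule that an SDA edge while SCL is high is a START (falling) or STOP (rising) condition.

```c
#include <assert.h>

struct i2c_bitbang_gpio {	/* from the patch under discussion */
	int sda;
	int scl;
};

#define NGPIO_SKETCH 32
static int gpio_level_sketch[NGPIO_SKETCH];	/* fake GPIO lines */
static int start_seen_sketch, stop_seen_sketch;	/* bus events observed */

/* Hypothetical GPIO setter that also watches for START/STOP conditions. */
static void gpio_set_sketch(const struct i2c_bitbang_gpio *p, int gpio, int value)
{
	assert(gpio >= 0 && gpio < NGPIO_SKETCH);
	if (gpio == p->sda && gpio_level_sketch[p->scl] &&
	    gpio_level_sketch[gpio] != value) {
		if (value)
			stop_seen_sketch++;	/* SDA rises while SCL high */
		else
			start_seen_sketch++;	/* SDA falls while SCL high */
	}
	gpio_level_sketch[gpio] = value;
}

/* Put the bus in its idle state: both lines high, no events recorded. */
static void i2c_idle_sketch(const struct i2c_bitbang_gpio *p)
{
	gpio_level_sketch[p->sda] = 1;
	gpio_level_sketch[p->scl] = 1;
}

static void i2c_start_sketch(const struct i2c_bitbang_gpio *p)
{
	gpio_set_sketch(p, p->sda, 1);
	gpio_set_sketch(p, p->scl, 1);
	gpio_set_sketch(p, p->sda, 0);	/* START */
	gpio_set_sketch(p, p->scl, 0);
}

static void i2c_stop_sketch(const struct i2c_bitbang_gpio *p)
{
	gpio_set_sketch(p, p->sda, 0);
	gpio_set_sketch(p, p->scl, 1);
	gpio_set_sketch(p, p->sda, 1);	/* STOP */
}
```

Nothing in the start/stop helpers depends on the platform; only the setter would change, which is the portability argument made above.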

- Dave

Re: "No handler for vector" patches don't work on some systems

2007-03-09 Thread Eric W. Biederman

Chuck Ebbert <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> Chuck Ebbert <[EMAIL PROTECTED]> writes:
>>>
>>> So far I've tried the simple "survive having no handler
>>> for a vector" patch and the preliminary 3-patch series
>>> that was in -mm for a while, and neither work on the
>>> Dell PowerEdge 29xx and 19xx systems. These servers
>>> have the Intel 5000X chipset with the 6700PXH PCI Hub
>>> with dual independent PCI-X busses, each with its own
>>> I/OxAPIC with 24 interrupts. The fixes do work on
>>> "simple" systems but not on these high-end ones.
>> 
>> 
>> I would very much like to know if what I merged linus's tree helps.
>> It is a little more conservative, than my earlier patches.  I need
>> a way to reproduce this or to work closely with someone who is, because
>> this sounds like it has a different cause and I need to start with
>> that assumption.
>
> Was that merged or is it still in -mm? The last thing I see in
> arch/x86_64/irq.c is:
>
>   [PATCH] x86-64: survive having no irq mapping for a vector
>
> And we tried that one.

Look in arch/x86_64/io_apic.c. That is where most of the work happened.
If you can extract that patch series for a backport more power to you.

Eric

commit 610142927b5bc149da92b03c7ab08b8b5f205b74
Author: Eric W. Biederman <[EMAIL PROTECTED]>
Date:   Fri Feb 23 04:40:58 2007 -0700

[PATCH] x86_64 irq: Safely cleanup an irq after moving it.

The problem:  After moving an interrupt when is it safe to teardown
the data structures for receiving the interrupt at the old location?

With a normal pci device it is possible to issue a read to a device
to flush all posted writes.  This does not work for the oldest ioapics
because they are on a 3-wire apic bus which is a completely different
data path.  For some more modern ioapics when everything is using
front side bus delivery you can flush interrupts by simply issuing a
read to the ioapic.  For other modern ioapics empirical testing has
shown that this does not work.

So it appears the only reliable way to know the last of the irqs from an
ioapic have been received from before the ioapic was reprogrammed is to
receive the first irq from the ioapic from after it was reprogrammed.

Once we know the last irq message has been received from an ioapic
into a local apic we then need to know that irq message has been
processed through the local apics.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>


Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Eric Dumazet

On Friday 09 March 2007 17:11, Benjamin LaHaise wrote:
> On Fri, Mar 09, 2007 at 12:13:35PM +0100, Eric Dumazet wrote:
> > Then just drop the fget_light() 'optimisation' and always take a
> > reference (atomic on f_count) regardless of single-thread or not. Instead
> > of dirtying f_light, just do the straightforward thing and be with it.
> >
> > (that is : fget_light() = fget() = no more keeping fput_needed
> > everywhere, and convoluted things in some dark sides of the kernel.
>
> And it makes things rather slower for a lot of single threaded applications
> on modern systems.  Yes, fget_light can be done much more cleanly, but
> please don't go around ripping out optimizations just because.

Sure. But I apparently was the only guy to react to the f_light horror story.

And it seems a solution was found, after some mail exchanges.
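
For anyone following along, the optimisation under discussion can be sketched in userspace roughly as below. This is an assumption-laden model, not the kernel code: file_sketch, files_sketch and the helpers are hypothetical, the counters are plain ints where the kernel uses atomics, and the table holds a single file. It shows the idea only: when the file table has one user, the lookup skips the refcount write and tells the caller no fput is needed.

```c
#include <assert.h>
#include <stddef.h>

struct file_sketch { int f_count; };
struct files_sketch {
	int users;			/* tasks sharing this file table */
	struct file_sketch *fd0;	/* the only slot in this toy table */
};

/* Light lookup: only take a reference if the table is shared. */
static struct file_sketch *fget_light_sketch(struct files_sketch *files,
					     int *fput_needed)
{
	struct file_sketch *f = files->fd0;

	*fput_needed = 0;
	if (f && files->users > 1) {
		f->f_count++;		/* atomic in the real kernel */
		*fput_needed = 1;
	}
	return f;
}

static void fput_light_sketch(struct file_sketch *f, int fput_needed)
{
	if (fput_needed) {
		assert(f->f_count > 0);
		f->f_count--;
	}
}

/* Returns how many refcount writes one lookup/release pair cost. */
static int refcount_writes_sketch(int users)
{
	struct file_sketch f = { 1 };
	struct files_sketch files = { users, &f };
	int needed;
	struct file_sketch *got = fget_light_sketch(&files, &needed);

	fput_light_sketch(got, needed);
	assert(f.f_count == 1);		/* balanced either way */
	return needed ? 2 : 0;
}
```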

In French we have this expression : "Precher le faux pour savoir le vrai"

You could translate to "make false statements in order to discover the truth" 
or "to tell a lie in order to get at the truth" or maybe "playing the devil's 
advocate", but really the French one is better :)


Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-09 Thread Linus Torvalds

On Fri, 9 Mar 2007, Christoph Hellwig wrote:
> 
> It was only put in under the premise that they'll fix whatever breaks,
> we're not going to put any maintaince border on us to hack around
> broken propritary compilers.

Well, since Rusty's macro was hoddible *anyway*, I don't think I'd apply 
it as-is. Breaking icc for something that ugly and not-very-important 
simply makes no sense.

There are better ways to do this. 

For one, you could (and should!) abstract these kinds of things out, 
rather than put them in another macro that really does something totally 
different. Then, the macro could have become

#define ARRAY_SIZE (sizeof_expression + 0*error_if_not_array)

which would already be a hell of a lot more readable. But more 
importantly, it's also now suddenly much easier to abstract out for 
different compilers.

We *already* support different compilers through , and 
there just isn't any reason for bad code just for bad codes sake!
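
A minimal sketch of that kind of abstraction, assuming GCC-style builtins are available and using made-up names throughout (BUILD_BUG_ON_ZERO_SKETCH, MUST_BE_ARRAY_SKETCH and ARRAY_SIZE_SKETCH are illustrative, not existing kernel macros). The compiler-specific check lives in its own macro that evaluates to zero, so a compiler without the builtin can simply define it as 0 and the size expression itself stays readable.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates to 0, or fails to compile (negative array size) if cond is true. */
#define BUILD_BUG_ON_ZERO_SKETCH(cond) \
	(sizeof(char[1 - 2 * !!(cond)]) - 1)

#if defined(__GNUC__)
/* An array's type differs from a pointer to its first element; a pointer's does not. */
#define MUST_BE_ARRAY_SKETCH(a) \
	BUILD_BUG_ON_ZERO_SKETCH(__builtin_types_compatible_p( \
		__typeof__(a), __typeof__(&(a)[0])))
#else
/* Other compilers: no check, but the size expression still works. */
#define MUST_BE_ARRAY_SKETCH(a) 0
#endif

#define ARRAY_SIZE_SKETCH(arr) \
	(sizeof(arr) / sizeof((arr)[0]) + MUST_BE_ARRAY_SKETCH(arr))

static const int table_sketch[7] = { 0 };
static const struct { int a; char b; } pairs_sketch[3] = { { 0, 0 } };

/* Small self-check: the added term must not change the result for real arrays. */
static size_t checked_count_sketch(void)
{
	size_t n = ARRAY_SIZE_SKETCH(table_sketch);

	assert(n == sizeof(table_sketch) / sizeof(table_sketch[0]));
	return n;
}
```

Passing a pointer instead of an array to ARRAY_SIZE_SKETCH() is rejected at compile time on compilers that take the first branch.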

Linus


[PATCH 1/3] ibmebus: whitespace fixes

2007-03-09 Thread Joachim Fenkes

This fixes a lot of whitespace in ibmebus.[ch]


Signed-off-by: Joachim Fenkes <[EMAIL PROTECTED]>
---


This patchset applies on top of a vanilla 2.6.20 kernel.
No dependencies on other patches except for part 3/3.
This is a repost of my earlier patchset and fixes a stupid
compile error.


 arch/powerpc/kernel/ibmebus.c |  126 +-
 include/asm-powerpc/ibmebus.h |   42 +++---
 2 files changed, 84 insertions(+), 84 deletions(-)


diff -Nurp 01.original/arch/powerpc/kernel/ibmebus.c 02.whitespace-fixes/arch/powerpc/kernel/ibmebus.c
--- 01.original/arch/powerpc/kernel/ibmebus.c   2007-02-22 05:26:24.0 +0100
+++ 02.whitespace-fixes/arch/powerpc/kernel/ibmebus.c   2007-02-22 06:57:18.0 +0100
@@ -3,35 +3,35 @@
  *
  * Copyright (c) 2005 IBM Corporation
  *  Heiko J Schick <[EMAIL PROTECTED]>
- *
+ *
  * All rights reserved.
  *
- * This source code is distributed under a dual license of GPL v2.0 and OpenIB 
- * BSD. 
+ * This source code is distributed under a dual license of GPL v2.0 and OpenIB
+ * BSD.
  *
  * OpenIB BSD License
  *
- * Redistribution and use in source and binary forms, with or without 
- * modification, are permitted provided that the following conditions are met: 
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
  *
- * Redistributions of source code must retain the above copyright notice, this 
- * list of conditions and the following disclaimer. 
+ * Redistributions of source code must retain the above copyright notice, this
+ * list of conditions and the following disclaimer.
  *
- * Redistributions in binary form must reproduce the above copyright notice, 
- * this list of conditions and the following disclaimer in the documentation 
+ * Redistributions in binary form must reproduce the above copyright notice,
+ * this list of conditions and the following disclaimer in the documentation
  * and/or other materials
- * provided with the distribution. 
+ * provided with the distribution.
  *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
- * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
  * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
- * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
@@ -55,7 +55,7 @@ static void *ibmebus_alloc_coherent(stru
gfp_t flag)
 {
void *mem;
-   
+
mem = kmalloc(size, flag);
*dma_handle = (dma_addr_t)mem;
 
@@ -63,7 +63,7 @@ static void *ibmebus_alloc_coherent(stru
 }
 
 static void ibmebus_free_coherent(struct device *dev,
- size_t size, void *vaddr, 
+ size_t size, void *vaddr,
  dma_addr_t dma_handle)
 {
kfree(vaddr);
@@ -79,7 +79,7 @@ static dma_addr_t ibmebus_map_single(str
 
 static void ibmebus_unmap_single(struct device *dev,
 dma_addr_t dma_addr,
-size_t size, 
+size_t size,
 enum dma_data_direction direction)
 {
return;
@@ -90,13 +90,13 @@ static int ibmebus_map_sg(struct device 
  int nents, enum dma_data_direction direction)
 {
int i;
-   
+
for (i = 0; i < nents; i++) {
-   sg[i].dma_address = (dma_addr_t)page_address(sg[i].page) 
+   sg[i].dma_address = (dma_addr_t)page_address(sg[i].page)
+ sg[i].offset;
sg[i].dma_length = sg[i].length;
}
-   
+
return nents;
 }
 
@@ -128,15 +128,15 @@ static int ibmebus_bus_probe(struct devi
struct ibmeb

[PATCH 2/3] ibmebus: dynamic addition/removal of adapters, some code cleanup

2007-03-09 Thread Joachim Fenkes

This adds two sysfs attributes to /sys/bus/ibmebus which can be used to
notify the ebus driver of added / removed ebus devices in the OF device
tree.

Echoing the device's location code (as found in the OFDT "ibm,loc-code"
property) into the "probe" attribute will notify ebus of addition of the
device and cause the appropriate device driver's probe function to be called
on the device.

Likewise, echoing the location code into the "remove" attribute will cause
the device to be removed from the system.

The writes will block until the respective operation has finished and return
an error code if the operation failed.
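
From user space, the interface described above comes down to one blocking
write() whose result says whether the operation worked. A minimal sketch in C,
assuming the attributes appear as /sys/bus/ibmebus/probe and
/sys/bus/ibmebus/remove as described; the helper name ebus_notify() is made up
for illustration and is not part of this patch:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a location code into an ibmebus bus attribute, for example
 * "/sys/bus/ibmebus/probe" or "/sys/bus/ibmebus/remove".
 * Returns 0 on success or a negative errno value on failure.
 * ebus_notify() is a hypothetical helper, not something the patch provides.
 */
static int ebus_notify(const char *attr_path, const char *loc_code)
{
	size_t len = strlen(loc_code);
	ssize_t written;
	int fd, err = 0;

	fd = open(attr_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* Per the description, this blocks until probe/remove has finished. */
	written = write(fd, loc_code, len);
	if (written < 0)
		err = -errno;		/* error code reported by the driver */
	else if ((size_t)written != len)
		err = -EIO;		/* short write, treat as failure */

	close(fd);
	return err;
}
```

A caller would pass the attribute path and the "ibm,loc-code" value of the
device it wants added or removed, and check the return value for zero.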

In addition, two minor tidbits are fixed:

- The fake root device used to provide a common parent for all ebus devices
  is now based on device instead of of_device - it had no associated devtree
  node. This saves several checks throughout the ebus driver.

- The sysfs attributes are now generated automagically by device_register()
  instead of by the ibmebus code, which saves a few compiler warnings about
  unused return codes.


Signed-off-by: Joachim Fenkes <[EMAIL PROTECTED]>
---


This is a repost of my earlier patch, fixing a stupid compile error and some
warnings.


 arch/powerpc/kernel/ibmebus.c |  167 +-
 include/asm-powerpc/ibmebus.h |2
 2 files changed, 134 insertions(+), 35 deletions(-)


diff -Nurp 02.whitespace-fixes/arch/powerpc/kernel/ibmebus.c 03.almost-all/arch/powerpc/kernel/ibmebus.c
--- 02.whitespace-fixes/arch/powerpc/kernel/ibmebus.c   2007-02-22 06:57:18.0 +0100
+++ 03.almost-all/arch/powerpc/kernel/ibmebus.c 2007-03-09 17:37:08.309979440 +0100
@@ -2,6 +2,7 @@
  * IBM PowerPC IBM eBus Infrastructure Support.
  *
  * Copyright (c) 2005 IBM Corporation
+ *  Joachim Fenkes <[EMAIL PROTECTED]>
  *  Heiko J Schick <[EMAIL PROTECTED]>
  *
  * All rights reserved.
@@ -43,12 +44,14 @@
 #include 
 #include 
 
-static struct ibmebus_dev ibmebus_bus_device = { /* fake "parent" device */
-   .name = ibmebus_bus_device.ofdev.dev.bus_id,
-   .ofdev.dev.bus_id = "ibmebus",
-   .ofdev.dev.bus= &ibmebus_bus_type,
+#define MAX_LOC_CODE_LENGTH 80
+
+static struct device ibmebus_bus_device = { /* fake "parent" device */
+   .bus_id = "ibmebus",
 };
 
+struct bus_type ibmebus_bus_type;
+
 static void *ibmebus_alloc_coherent(struct device *dev,
size_t size,
dma_addr_t *dma_handle,
@@ -158,21 +161,12 @@ static void __devinit ibmebus_dev_releas
kfree(to_ibmebus_dev(dev));
 }
 
-static ssize_t ibmebusdev_show_name(struct device *dev,
-   struct device_attribute *attr, char *buf)
-{
-   return sprintf(buf, "%s\n", to_ibmebus_dev(dev)->name);
-}
-static DEVICE_ATTR(name, S_IRUSR | S_IRGRP | S_IROTH, ibmebusdev_show_name,
-  NULL);
-
-static struct ibmebus_dev* __devinit ibmebus_register_device_common(
+static int __devinit ibmebus_register_device_common(
struct ibmebus_dev *dev, const char *name)
 {
int err = 0;
 
-   dev->name = name;
-   dev->ofdev.dev.parent  = &ibmebus_bus_device.ofdev.dev;
+   dev->ofdev.dev.parent  = &ibmebus_bus_device;
dev->ofdev.dev.bus = &ibmebus_bus_type;
dev->ofdev.dev.release = ibmebus_dev_release;
 
@@ -186,12 +180,10 @@ static struct ibmebus_dev* __devinit ibm
if ((err = of_device_register(&dev->ofdev)) != 0) {
printk(KERN_ERR "%s: failed to register device (%d).\n",
   __FUNCTION__, err);
-   return NULL;
+   return -ENODEV;
}
 
-   device_create_file(&dev->ofdev.dev, &dev_attr_name);
-
-   return dev;
+   return 0;
 }
 
 static struct ibmebus_dev* __devinit ibmebus_register_device_node(
@@ -205,18 +197,18 @@ static struct ibmebus_dev* __devinit ibm
if (!loc_code) {
 printk(KERN_WARNING "%s: node %s missing 'ibm,loc-code'\n",
   __FUNCTION__, dn->name ? dn->name : "");
-   return NULL;
+   return ERR_PTR(-EINVAL);
 }
 
if (strlen(loc_code) == 0) {
printk(KERN_WARNING "%s: 'ibm,loc-code' is invalid\n",
   __FUNCTION__);
-   return NULL;
+   return ERR_PTR(-EINVAL);
}
 
dev = kzalloc(sizeof(struct ibmebus_dev), GFP_KERNEL);
if (!dev) {
-   return NULL;
+   return ERR_PTR(-ENOMEM);
}
 
dev->ofdev.node = of_node_get(dn);
@@ -227,9 +219,9 @@ static struct ibmebus_dev* __devinit ibm
min(length, BUS_ID_SIZE - 1));
 
/* Register with generic device framework. */
-   if (ibmebus_register_device_common(dev, dn->name) == NULL) {
+   if (ibmebus_register_device_common(dev, dn->name) != 0) {
kfree(dev);
-   return NULL;
+   return ERR_PTR(-ENODEV);
}

ABI coupling to hypervisors via CONFIG_PARAVIRT

2007-03-09 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > Sure, that's clean, From that perspective the apic is a bunch of 
> > registers backed by a state machine or something.
> 
> I think you could do much worse than just decide to pick the 
> IO-APIC/lapic as your "virtual interrupt controller model". So I do 
> *not* think that APICRead/APICWrite are in any way horrible interfaces 
> for a virtual interrupt controller. In many ways, you then have a 
> tested and known interface to work with.

yes - but we already support the raw hardware ABI, in the native kernel.

paravirt_ops is not 'just another PC sub-arch'. It is not 'just another 
hardware driver'. It is not 'just another x86 CPU'. paravirt_ops is much 
wider than that, it hooks everywhere and has effect on everything!

Lets take a look at the raw numbers. Here's a typical distro kernel 
vmlinux, with and without CONFIG_PARAVIRT [with no paravirt backend 
enabled]:

     text    data     bss     dec     hex filename
   139863   49010   57672  246545   3c311 x86-kernel-built-in.o.noparavirt
   148865   49310   57672  255847   3e767 x86-kernel-built-in.o.paravirt

     text    data     bss     dec     hex filename
  5154975  586932  221184 5963091  5afd53 vmlinux.noparavirt
  5189197  587504  221184 5997885  5b853d vmlinux.paravirt

why did code size increase by +6.4% in arch/i386/ (+0.7% in the 
vmlinux)? It is purely because CONFIG_PARAVIRT adds more than _1400_ 
function call hooks to the x86 arch:

 c05c8e60 D paravirt_ops

 c0102602:   ff 15 9c 8e 5c c0   call   *0xc05c8e9c
 c0102d37:   ff 15 94 8e 5c c0   call   *0xc05c8e94
 c0102d45:   ff 15 94 8e 5c c0   call   *0xc05c8e94
 c0102d53:   ff 15 94 8e 5c c0   call   *0xc05c8e94
 c0102d61:   ff 15 94 8e 5c c0   call   *0xc05c8e94
 c0102d6f:   ff 15 94 8e 5c c0   call   *0xc05c8e94
 [...]

 $ objdump -d vmlinux | grep c05c8e | wc -l
 1463

_1463_ hooks, spread out all around the x86 arch.
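
The +6.4% and +0.7% figures quoted earlier follow directly from the text
columns of the two size listings. A small check in C, where pct_increase() and
print_text_growth() are helper names invented here:

```c
#include <stdio.h>

/* Percentage growth from 'before' to 'after'. */
static double pct_increase(unsigned long before, unsigned long after)
{
	return 100.0 * ((double)after - (double)before) / (double)before;
}

/* Print the text-segment growth for both objects in the listing. */
static void print_text_growth(void)
{
	/* arch/i386 built-in.o: 139863 -> 148865 bytes of text */
	printf("built-in.o: +%.1f%%\n", pct_increase(139863UL, 148865UL));
	/* whole vmlinux: 5154975 -> 5189197 bytes of text */
	printf("vmlinux:    +%.1f%%\n", pct_increase(5154975UL, 5189197UL));
}
```

That is 9002 extra bytes of text in arch/i386 and 34222 extra bytes in the
whole image.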

Are these only trivial hooks a'ka alternatives.h? Not at all, these are 
full-blown function hooks freely modifiable by a paravirt_ops 
implementation, spread throughout the architecture in a finegrained way. 
(see my arguments and specific demonstration about the bad effects of 
this, four paragraphs below.)

As a comparison: people argued about CONFIG_SECURITY hooks and flamed 
about them no end. The reality is, there's only _269_ calls to 
security_ops in this same kernel, and i've got CONFIG_SECURITY + SELINUX 
enabled. And the only functional modification that security_ops does to 
native behavior is "deny the syscall". Not 'full control over 
behavior'... In terms of coupling, CONFIG_SECURITY hooks are a walk in 
the park, relative to CONFIG_PARAVIRT.

we dont even give /real silicon/ that many hooks! If an x86 CPU came 
along that required the addition of 1400+ function hooks then we'd say: 
'you must be joking, that's not an x86 CPU! Make it more compatible!'.

please dont get me wrong - 1463 hooks spread out might be fine in the 
end, but _if and only if_ there are safeguards in place to make sure 
they are just a trivial variation of the hardware ABI - a'ka 
asm/alternatives.h. But there is _no_ such safeguard in place today and 
we are seeing the bad effects of that _already_, with just a _single_ 
hypervisor and a _single_ abstraction topic (time), so i'm very strongly 
convinced that it's a serious issue that cannot just be glossed over 
with "relax, it will work out fine". If there's one thing we learned in 
the past 15 years, it is that ABI issues will haunt us forever.

Let me demonstrate some of the bad effects, and how far we've _already_ 
deviated from the 'hardware ABI'. An example: one assumes that 
paravirt_ops.safe_halt() is a trivial variation of the 'halt 
instruction', right? But vmi.c and vmitimer.c do much more than that. 
Take a look at vmi_safe_halt() which calls vmi_stop_hz_timer(): it hacks 
back a jiffies assumption into its code via paravirt_ops.safe_halt() - 
purely via changes local to vmitimer.c, by using next_timer_interrupt()! 
Thus it has created a _dual layer_ of dynticks that we specifically 
objected against. It does so in spite of our warning about why that is 
bad, it does so in spite of Xen having implemented a clockevents driver 
in 2 hours, and it does so under the cover of 'oh, this is only a 
vmitimer.c local change'. It circumvents the native dynticks framework 
and in essence brings in the bad NO_IDLE_HZ technique that we worked so 
hard for 2 years not to ever enable for the i386 arch!

so one of my very real problems with paravirt_ops is that due to its 
sheer hook-based impact it allows the modification of the hardware ABI 
on a _very_ wide scale: both unintentionally and intentionally. 

Furthermore, it allows the introduction of hard-to-remove hardwired 
quirks that bind one particular paravirt_ops method to the hypervisor 
ABI - quirks that are not present in any real silicon! Quirks 
_guaranteed by Linux_, by virtu

[PATCH 3/3] ibmebus: uevent support

2007-03-09 Thread Joachim Fenkes

This adds uevent support to ibmebus using the generic of_device_uevent()
function.


Signed-off-by: Joachim Fenkes <[EMAIL PROTECTED]>
---


I split this change into a separate patch because it depends on another
patch against 2.6.20, submitted by Sylvain Munaut:
http://patchwork.ozlabs.org/linuxppc/patch?id=9558


 ibmebus.c |1 +
 1 file changed, 1 insertion(+)


diff -Nurp 03.almost-all/arch/powerpc/kernel/ibmebus.c 04.uevent/arch/powerpc/kernel/ibmebus.c
--- 03.almost-all/arch/powerpc/kernel/ibmebus.c 2007-03-09 17:37:08.309979440 +0100
+++ 04.uevent/arch/powerpc/kernel/ibmebus.c 2007-03-07 19:07:53.0 +0100
@@ -460,6 +460,7 @@ static struct bus_attribute ibmebus_bus_
 
 struct bus_type ibmebus_bus_type = {
.name  = "ibmebus",
+   .uevent= of_device_uevent,
.match = ibmebus_bus_match,
.dev_attrs = ibmebus_dev_attrs,
.bus_attrs = ibmebus_bus_attrs

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.21-rc3-mm1 RSDL results

2007-03-09 Thread Mark Lord


Mmm.. when it's good, it's *really* good.
My desktop feels snappier and all of that.

No noticeable jerkiness of windows/scrolling,
which I *do* observe with the stock scheduler.

But when it's bad, it stinks.
Like when a "make -j2" kernel rebuild is happening in a background window

This is on a Pentium-M 760 single-core, w/2GB SDRAM (notebook).

JADP (Just Another Data Point).

Cheers

Mark

Re: [PATCH 4/4 TRY#3] optimize and simplify get_cycles_sync()

2007-03-09 Thread Avi Kivity


Joerg Roedel wrote:

From: Joerg Roedel <[EMAIL PROTECTED]>

This patch simplifies the get_cycles_sync() function by removing
the #ifdefs from it. Further it introduces an optimization for AMD
processors. There the RDTSCP instruction is used instead of CPUID;RDTSC
which is helpful if the kernel runs as a KVM guest. Running as a guest
makes CPUID very expensive because it causes an intercept of the guest.

  
+#define RDTSCP ".byte 0x0f, 0x01, 0xf9"

+   alternative_io_two("cpuid\nrdtsc",
+  "rdtsc", X86_FEATURE_SYNC_RDTSC,
+  ".byte 0x0f, 0x01, 0xf9", X86_FEATURE_RDTSCP,
  


why not use the RDTSCP macro here?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-09 Thread Linus Torvalds

On Fri, 9 Mar 2007, Bill Davidsen wrote:
>
> But it IS okay for people to make special-case schedulers. Because it's MY
> machine,

Sure.

Go wild. It's what open-source is all about.

I'm not stopping you.

I'm just not merging code that makes the scheduler unreadable, even hard 
to understand, and slows things down. I'm also not merging code that sets 
some scheduler policy limits by having specific "pluggable scheduler 
interfaces".

Different schedulers tend to need different data structures in some *very* 
core data, like the per-cpu run-queues, in "struct task_struct", in 
"struct thread_struct" etc etc. Those are some of *the* most low-level 
structures in the kernel. And those are things that get set up to have as 
little cache footprint as possible etc.

IO schedulers have basically none of those issues. Once you need to do IO, 
you'll happily use a few indirect pointers, it's not going to show up 
anywhere. But in the scheduler, 10 cycles here and there will be a big 
deal.

And hey, you can try to prove me wrong. Code talks. So far, nobody has 
really ever come close.

So go and code it up, and show the end result. So far, nobody who actually 
*does* CPU schedulers has really wanted to do it, because they all want 
to muck around with their own private versions of the data structures.

Linus

Re: [PATCH, take2] VFS : Delay the dentry name generation on sockets and pipes.

2007-03-09 Thread Linus Torvalds



On Fri, 9 Mar 2007, Eric Dumazet wrote:
> 
>   CAUTION : d_path() logic is quite  tricky. 
>   The correct way to return for example "Hello" is to put it
>   at the end of the buffer, and returns a pointer to the first char.

Yeah, it's subtle, since it wants to use a single buffer, and not copy 
things around too much.

But can I ask you to do a take3, and simply have a helper function like

	char *dynamic_dname(struct dentry *dentry, char *buffer, int len,
			const char *fmt, ...)
	{
		va_list args;
		char tmp[64];
		int i;

		va_start(args, fmt);
		i = vsnprintf(tmp, sizeof(tmp), fmt, args) + 1;
		va_end(args);

		if (i > len)
			return ERR_PTR(-ENAMETOOLONG);

		buffer += len - i;
		memcpy(buffer, tmp, i);
		return buffer;
	}

and just require that everybody use that function.

Then the pipe code would just become

	static char *pipefs_dname(struct dentry *dentry, char *buffer, int buflen)
	{
		return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]",
					dentry->d_inode->i_ino);
	}

and you're done, and you have only *one* place in the VFS layer 
(preferably right next to d_path() itself) that cares about the
subtle issues that we have.
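
The tail-of-buffer convention can be tried out in user space. What follows is
a sketch under stated assumptions, not the kernel helper: it drops the dentry
argument, returns NULL where the kernel version would return
ERR_PTR(-ENAMETOOLONG), and the name tail_printf() is made up here. It also
rejects output that was cut off in the 64-byte temporary buffer, a case the
quoted sketch does not check for.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a short name and place it at the END of 'buffer' (of size 'len'),
 * returning a pointer to its first character, which is what d_path()
 * callers expect. Returns NULL if the name plus its NUL does not fit.
 * tail_printf() is a hypothetical user-space stand-in for dynamic_dname().
 */
static char *tail_printf(char *buffer, int len, const char *fmt, ...)
{
	va_list args;
	char tmp[64];
	int n;

	va_start(args, fmt);
	n = vsnprintf(tmp, sizeof(tmp), fmt, args);
	va_end(args);

	/* vsnprintf returns the untruncated length; refuse truncated output. */
	if (n < 0 || n >= (int)sizeof(tmp))
		return NULL;

	n += 1;				/* count the terminating NUL */
	if (n > len)
		return NULL;

	buffer += len - n;		/* start so the NUL lands on the last byte */
	memcpy(buffer, tmp, n);
	return buffer;
}
```

With a 32-byte buffer and the format "pipe:[%lu]", the returned pointer sits
at buffer + 32 - (strlen(name) + 1), so the string ends exactly at the end of
the buffer.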

Linus

Re: 2.6.21-rc3-mm1 RSDL results

2007-03-09 Thread Jeffrey Hundstad



Mark Lord wrote:

Mmm.. when it's good, it's *really* good.
My desktop feels snappier and all of that.

No noticeable jerkiness of windows/scrolling,
which I *do* observe with the stock scheduler.

But when it's bad, it stinks.
Like when a "make -j2" kernel rebuild is happening in a background window



Would you please do that same "make -j2" niced.  Tell us how that feels.

--
Jeffrey Hundstad


2.6.21-rc3-rt0

2007-03-09 Thread Michal Piotrowski

Hi,

I get a lot of
"NOHZ: local_softirq_pending 02"

and I have noticed some swsuspend problems.

Disabling non-boot CPUs ...
CPU1 playing dead
 [] dump_trace+0x7f/0x229
 [] show_trace_log_lvl+0x35/0x54
 [] show_trace+0x2c/0x2e
 [] dump_stack+0x29/0x2b
 [] cpu_idle+0x91/0x126
 [] start_secondary+0x30d/0x315
 ===
---
| preempt count: 0001 ]
| 1-level deep critical section nesting:

.. []  cpu_idle+0x11b/0x126
.[] ..   ( <= start_secondary+0x30d/0x315)

l *0xc01021c4
0xc01021c4 is in cpu_idle (arch/i386/kernel/process.c:204).
199 tick_nohz_restart_sched_tick();
200 local_irq_disable();
201 __preempt_enable_no_resched();
202 __schedule();
203 preempt_disable();
204 local_irq_enable();
205 }
206 }
207
208 void cpu_idle_wait(void)

 l *0xc01156ec
0xc01156ec is in start_secondary (arch/i386/kernel/smpboot.c:432).
427 /* We can take interrupts now: we're officially "up". */
428 local_irq_enable();
429
430 wmb();
431 cpu_idle();
432 }
433
434 /*
435  * Everything has been set up for the secondary
436  * CPUs - they just need to reload everything


CPU 1 is now offline
lockdep: not fixing up alternatives.
stopped custom tracer.

=
[ INFO: possible recursive locking detected ]
[ 2.6.21-rc3-rt0 #1
-
swsusp_shutdown/3406 is trying to acquire lock:
 ((raw_spinlock_t *)(&lock->wait_lock)){--..}, at: [] 
migrate_timers+0x8b/0x16e

but task is already holding lock:
 ((raw_spinlock_t *)(&lock->wait_lock)){--..}, at: [] 
migrate_timers+0x77/0x16e

l *0xc01329cd
0xc01329cd is in migrate_timers (kernel/timer.c:1854).
1849
1850local_irq_disable_nort();
1851double_spin_lock(&new_base->lock, &old_base->lock,
1852 smp_processor_id() < cpu);
1853
1854BUG_ON(old_base->running_timer);
1855
1856for (i = 0; i < TVR_SIZE; i++)
1857migrate_timer_list(new_base, old_base->tv1.vec + i);
1858for (i = 0; i < TVN_SIZE; i++) {

l *0xc01329b9
0xc01329b9 is in migrate_timers (include/linux/spinlock.h:711).
706 __acquires(l1)
707 __acquires(l2)
708 {
709 if (l1_first) {
710 spin_lock(l1);
711 spin_lock(l2);
712 } else {
713 spin_lock(l2);
714 spin_lock(l1);
715 }

other info that might help us debug this:
5 locks held by swsusp_shutdown/3406:
 #0:  (pm_mutex){--..}, at: [] enter_state+0x40/0xbf
 #1:  (cpu_add_remove_lock){--..}, at: [] 
disable_nonboot_cpus+0x1a/0x11c
 #2:  (cache_chain_mutex){--..}, at: [] cpuup_callback+0x214/0x3c1
 #3:  (workqueue_mutex){--..}, at: [] 
workqueue_cpu_callback+0x11f/0x1a6
 #4:  ((raw_spinlock_t *)(&lock->wait_lock)){--..}, at: [] 
migrate_timers+0x77/0x16e

l *0xc0155c98
0xc0155c98 is in enter_state (kernel/power/main.c:197).
192 int error;
193
194 if (!valid_state(state))
195 return -ENODEV;
196 if (!mutex_trylock(&pm_mutex))
197 return -EBUSY;
198
199 if (state == PM_SUSPEND_DISK) {
200 error = pm_suspend_disk();
201 goto Unlock;

l *0xc014f781
0xc014f781 is in disable_nonboot_cpus (kernel/cpu.c:264).
259 int disable_nonboot_cpus(void)
260 {
261 int cpu, first_cpu, error = 0;
262
263 mutex_lock(&cpu_add_remove_lock);
264 first_cpu = first_cpu(cpu_present_map);
265 if (!cpu_online(first_cpu)) {
266 error = _cpu_up(first_cpu);
267 if (error) {
268 printk(KERN_ERR "Could not bring CPU%d up.\n",

l *0xc01872b7
0xc01872b7 is in cpuup_callback (mm/slab.c:1342).
1337start_cpu_timer(cpu);
1338break;
1339#ifdef CONFIG_HOTPLUG_CPU
1340case CPU_DOWN_PREPARE:
1341mutex_lock(&cache_chain_mutex);
1342break;
1343case CPU_DOWN_FAILED:
1344mutex_unlock(&cache_chain_mutex);
1345break;
1346case CPU_DEAD:

l *0xc0139fb0
0xc0139fb0 is in workqueue_cpu_callback (kernel/workqueue.c:883).
878 mutex_unlock(&workqueue_mutex);
879 break;
880
881 case CPU_DOWN_PREPARE:
882 mutex_lock(&workqueue_mutex);
883 break;
884
885 case CPU_DOWN_FAILED:
886 mutex_unlock(&workqueue_mutex);
887 break;


stack backtrace:
 [] dump_trace+0x7f/0x229
 [] show_trace_log_lvl+0x35/0x5

Re: ABI coupling to hypervisors via CONFIG_PARAVIRT

2007-03-09 Thread Linus Torvalds

On Fri, 9 Mar 2007, Ingo Molnar wrote:
> 
> yes - but we already support the raw hardware ABI, in the native kernel.

Why do you continue to call paravirt an ABI?

We got over that. It's not. It's an API.

VMI is an ABI.

As long as you try to confuse the two, there's no point to the discussion.

Yeah, paravirt is ugly. Yeah, the calls should be moved higher in the 
stack. But you don't help by confusing the issue by mixing the different 
parts up and calling something an ABI that simply *isn't*.

Paravirt already acts on a higher level than the ioapic. It does do the 
"irq_disable()" kind of "highlevel" callbacks. Yeah, the "apic_write()" 
ones should go away, and they're just hacky, but there's nothing there 
that is an ABI.

So just *fix* it or tell others to fix it, instead of just confusing the 
issue.

And trust me, if "apic_write" causes bugs because it interacts with real 
APIC usage, we don't care ONE WHIT. That paravirt_ops entry goes out the 
window so fast you can't say "Whaa?!??". 

Linus

Re: 2.6.21-rc3-mm1 RSDL results

2007-03-09 Thread Matt Mackall

On Fri, Mar 09, 2007 at 07:39:05PM +1100, Con Kolivas wrote:
> On Friday 09 March 2007 19:20, Matt Mackall wrote:
> > And I've just rebooted with NO_HZ and things are greatly improved. At
> > idle, Beryl effects are silky smooth (possibly better than stock) and
> > shows less load. Under 'make', Beryl is still responsive as is Galeon.
> > No sign of lagging mouse or typing.
> >
> > Under make -j 5, things are intermittent. Galeon scrolling is
> > sometimes still responsive, but Beryl, terminals and mouse still drag
> > quite a bit.
> 
> I just replied before you sent this one out I think our messages passed each 
> other across the ocean somewhere. I don't quite get what combination of 
> factors you're saying here caused great improvement. Was it enabling NO_HZ on 
> mainline cpu scheduler or disabling NO_HZ or on RSDL?

Turning on NO_HZ on RSDL greatly improved it. I have not tried NO_HZ
on mainline. The first test was with NO_HZ=n, the second was with
NO_HZ=y.

My baseline test was with mainline NO_HZ=y.

As an aside, we should not name config options NO_* or DISABLE_*
because of the potential for double negation.

-- 
Mathematics is the supreme nostalgia of our time.

Re: ABI coupling to hypervisors via CONFIG_PARAVIRT

2007-03-09 Thread Andi Kleen

On Friday 09 March 2007 19:02, Ingo Molnar wrote:

> _1463_ hooks, spread out all around the x86 arch.

They are not all different hooks though, just many call sites of the same hook.
Also most of them are well defined to just match what the instructions
do.

paravirt_ops has under a hundred entries right now and i intend to not
expand it much further after the Xen bits are in.
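
The difference between hooks and call sites can be shown with a toy ops table.
This is only an illustration: struct demo_pv_ops, the "native" functions and
the update_*() callers are invented names, and the real paravirt_ops layout is
not reproduced here.

```c
/* The ops table: each member is one hook. */
struct demo_pv_ops {
	void (*irq_disable)(void);
	void (*irq_enable)(void);
};

static int irqs_disabled_flag;	/* stand-in for the CPU interrupt flag */
static int hook_calls;		/* how often any hook was invoked */

static void native_irq_disable(void) { irqs_disabled_flag = 1; hook_calls++; }
static void native_irq_enable(void)  { irqs_disabled_flag = 0; hook_calls++; }

/* A backend fills in the table once; "native" is the only backend here. */
static struct demo_pv_ops pv = {
	.irq_disable = native_irq_disable,
	.irq_enable  = native_irq_enable,
};

static int stat_a, stat_b, stat_c;

/* Six call sites below, all going through the same two hooks. */
static void update_a(void) { pv.irq_disable(); stat_a++; pv.irq_enable(); }
static void update_b(void) { pv.irq_disable(); stat_b++; pv.irq_enable(); }
static void update_c(void) { pv.irq_disable(); stat_c++; pv.irq_enable(); }
```

Each pv.xxx() line compiles to an indirect call, so a disassembly grep counts
six of them here even though the table has only two entries.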

> Let me demonstrate some of the bad effects, and how far we've _already_ 
> deviated from the 'hardware ABI'. An example: one assumes that 
> paravirt_ops.safe_halt() 

The vmi maintainers already agreed to fix that.

> i claim that when the 'API cut' is done at the right level 

Can you make a proposal?  Would you be willing to write code for that?

> then no more  
> than say 100 hooks would be needed 

Well we have less than 100 hooks right now, just with many call sites @)

> - with virtually zero kernel size  
> increase. 

I'll believe that when I see it.

> We've got all the right highlevel abstractions: genirq, gtod,  
> clockevents. Whatever is missing at the moment from the framework (say 
> smp_send_reschedule()) we can abstract away. 

smp_send_reschedule() is just an IPI instance which is already
abstracted with genapic. Xen has a genapic_xen.

VMI is still where ->apic_read/->apic_write and their relatively
harmless timer interrupt change make sense -- if they needed
more changes they would just need to bite the bullet and 
provide a custom genapic vmware apic driver.

> Unfortunately, with the current paravirt_ops policy we might end up 
> seeing none of that unification. 

I am open to concrete incremental proposals for improvements.

> 
> And that is why the "paravirt_ops is just virtual hardware" argument is 
> totally wrong. _Nothing_ limits hypervisors from adding arbitrary ABI 
> bindings to Linux. For example, VMI does this already and none of the 
> following are hardware ABIs:
> 
>  #define VMI_CALL_SetAlarm   68
>  #define VMI_CALL_CancelAlarm69
>  #define VMI_CALL_GetWallclockTime   70
>  #define VMI_CALL_WallclockUpdated   71

That's VMI internal, not exposed above paravirt ops.

> Firstly, i think this has been over-rushed. After years of being happy 
> with forks of the Linux kernel, 

The code has been posted for a long time, open for review for everybody.

> Secondly, i'd like to see a paravirt approach that has /implicit/ 
> safeguards against the following type of crap:

I don't think you can use an API to force the underlying implementation
in a practical way. If code wants to do something wrong, no API 
in the world will stop it. That is why we have code review instead.

>it has a hardwired assumption that 'cycles' makes sense as a way to 
>communicate time units:
> 
> vmi_timer_ops.set_alarm(
>   VMI_ALARM_WIRED_LVTT | VMI_ALARM_IS_PERIODIC | 
> VMI_CYCLES_AVAILABLE,
>   per_cpu(process_times_cycles_accounted_cpu, cpu) + 
> cycles_per_alarm,
>   cycles_per_alarm);

That's because VMI is defined this way? If paravirt chose to not pass cycles
anymore they would just add a simple conversion function. SMOP.
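
A conversion of that kind might look roughly like the sketch below. How VMI
reports its cycle frequency is not shown in this thread, so the frequency is
just a parameter here, and cycles_to_ns() is an invented name.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Convert a cycle count to nanoseconds for a counter running at
 * 'cycles_per_sec'. Whole seconds and the remainder are converted
 * separately so the multiplication does not overflow 64 bits.
 * Assumes 0 < cycles_per_sec < about 1.8e10. Hypothetical helper.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t cycles_per_sec)
{
	uint64_t secs = cycles / cycles_per_sec;
	uint64_t rem  = cycles % cycles_per_sec;

	return secs * NSEC_PER_SEC + rem * NSEC_PER_SEC / cycles_per_sec;
}
```

The result is rounded down, so a single cycle of a 3 GHz counter converts to
0 ns.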

>it has a hardwired assumption that Linux keeps time in units of 
>'jiffies':

Well a lot of drivers have that, but it can be all fixed.

> Granted, some of these are just harmless quirks that are fixable in 
> Linux only,

All of them.

> but some of these are stifling because they bind Linux to  
> the hypervisor ABI.

I haven't seen a concrete example of that yet to be honest and I don't
really believe it.

-Andi

Re: [PATCH] Fix building kernel under Solaris 11_snv

2007-03-09 Thread Sam Ravnborg

On Thu, Mar 08, 2007 at 11:01:57PM +0100, Jan Engelhardt wrote:
> 
> On Mar 8 2007 22:25, Sam Ravnborg wrote:
> >Subject: Re: [PATCH] Fix building kernel under Solaris
> 
> Since Solaris seems to be on the run, I did myself try compile it. 
> However, unlike the original poster who said he did so on SunOS 4.8, I 
> did it on 5.11_snv39, yielding a bigger changeset. I thought I just 
> share the diff that piled up so far. It needs a lot of hacks on the 
> Solaris side - prioritizing GNU names, then, second, gnu ld has a 
> glitch, then, gcc has a missing file... it's fun fun fun!

Can I please have a signed-off version of this patch.

Thanks,
Sam

> 
> Well, I will iterate the key problem with the missing file:
> 
>   *  include/linux/kernel.h (and many others) include 
>  BUT - since we are using -nostdinc, /usr/include/stdarg.h is not
>  considered. And gcc's stdarg.h (which lives at
>  /usr/lib/gcc/i586-suse-linux/4.1.2/include/stdarg.h in Linux land)
>  is missing in Solaris' GCC (which is version 3.4.3).
> 
> Hack #1:
>   ln -s \
>   
> /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/install-tools/include/stdarg.h \
>   /usr/sfw/lib/gcc/i386-pc-solaris2.11/3.4.3/include/stdarg.h
> 
> Hack #2: GNU programs...
> 
>   mkdir -p ~/gnulink;
>   for i in addr2line ar as egrep grep ld make nm objcopy objdump \
> ranlib readelf size strings strip tar; do
>   ln -s "/usr/sfw/bin/$i" "$HOME/gnulink/$i";
>   done;
> 
>   for i in cat chgrp chmod chown chroot cksum cmp cp cut date \
> dd df diff du echo env expand expr false fgrep find fold \
> getopt groups head hostid install join ln locate ls mkdir \
> mkfifo mknod mv nice nohup od pwd rm rmdir sed seq shred \
> sleep sort split stty tac tail tee touch tr true uname \
> uniq uptime wc who whoami xargs yes gawk; do
>   ln -s "/opt/csw/bin/$i" "$HOME/gnulink/$i";
>   done;
> 
> Hack #3: Diff file...
> 
> Hack #4: GNU ld glitch workaround (GNU ld looks in the current dir...)
> 
>   cd linux-2.6.21-rc3
>   ln -s /usr/sfw/i386-sun-solaris2.11/lib/ldscripts ldscripts
> 
> Fun #1:
> 
>   export PATH="$HOME/gnulink:$PATH";
>   make ARCH=i386
> 
> Oddity #1:
> 
>   ARCH=i386 required because the Makefiles seem to use `uname -m`
>   (which returns "i86pc") rather than `uname -p`. I think we are
>   at odds here though...
> 
>           uname -m    uname -p
>   SOL     i86pc       i386
>   LINUX   i686        athlon
> 
> 
> Expect compiler failures, especially with assembler code.
> 
> 
> Jan
> 
> <<< PATCH BELOW <<<
> 
> Index: linux-2.6.21-rc3/include/linux/input.h
> ===
> --- linux-2.6.21-rc3.orig/include/linux/input.h   2007-03-07 
> 05:41:20.0 +0100
> +++ linux-2.6.21-rc3/include/linux/input.h2007-03-07 23:40:39.417339000 
> +0100
> @@ -16,7 +16,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#ifndef __sun__
> +#include 
> +#endif
>  #endif
>  
>  /*
> Index: linux-2.6.21-rc3/scripts/genksyms/genksyms.c
> ===
> --- linux-2.6.21-rc3.orig/scripts/genksyms/genksyms.c 2007-03-07 
> 05:41:20.0 +0100
> +++ linux-2.6.21-rc3/scripts/genksyms/genksyms.c  2007-03-07 
> 23:28:35.659555000 +0100
> @@ -21,6 +21,7 @@
> along with this program; if not, write to the Free Software Foundation,
> Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> Index: linux-2.6.21-rc3/scripts/kallsyms.c
> ===
> --- linux-2.6.21-rc3.orig/scripts/kallsyms.c  2007-03-07 05:41:20.0 
> +0100
> +++ linux-2.6.21-rc3/scripts/kallsyms.c   2007-03-07 23:46:46.249005000 
> +0100
> @@ -378,6 +378,40 @@
>   table_cnt = pos;
>  }
>  
> +#ifdef __sun__
> +/* Return the first occurrence of NEEDLE in HAYSTACK.  */
> +void *
> +memmem (haystack, haystack_len, needle, needle_len)
> + const void *haystack;
> + size_t haystack_len;
> + const void *needle;
> + size_t needle_len;
> +{
> +  const char *begin;
> +  const char *const last_possible
> += (const char *) haystack + haystack_len - needle_len;
> +
> +  if (needle_len == 0)
> +/* The first occurrence of the empty string is deemed to occur at
> +   the beginning of the string.  */
> +return (void *) haystack;
> +
> +  /* Sanity check, otherwise the loop might search through the whole
> + memory.  */
> +  if (__builtin_expect (haystack_len < needle_len, 0))
> +return NULL;
> +
> +  for (begin = (const char *) haystack; begin <= last_possible; ++begin)
> +if (begin[0] == ((const char *) needle)[0] &&
> +!memcmp ((const void *) &begin[1],
> + (const void *) ((const char *) needle + 1),
> + needle_l
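
For reference, since the quoted patch is cut off above: a minimal sketch of the same fallback written with an ANSI C prototype, assuming nothing beyond <string.h>. The name compat_memmem is hypothetical (picked so it cannot clash with a libc that already has memmem), and this is not the code from the patch itself.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical portable stand-in for memmem(): return a pointer to the
 * first occurrence of needle in haystack, or NULL if there is none.
 */
static void *compat_memmem(const void *haystack, size_t haystack_len,
                           const void *needle, size_t needle_len)
{
    const char *h = haystack;
    const char *n = needle;
    size_t i;

    /* An empty needle is deemed to match at the start of the haystack. */
    if (needle_len == 0)
        return (void *)h;

    /* A needle longer than the haystack can never match. */
    if (haystack_len < needle_len)
        return NULL;

    /* Try every start position that leaves room for the whole needle. */
    for (i = 0; i <= haystack_len - needle_len; i++) {
        if (h[i] == n[0] && memcmp(h + i, n, needle_len) == 0)
            return (void *)(h + i);
    }
    return NULL;
}
```

Indexing with a size_t offset instead of computing a "last possible" pointer up front keeps the length check ahead of any pointer arithmetic.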

Re: ABI coupling to hypervisors via CONFIG_PARAVIRT

2007-03-09 Thread Chris Wright

* Ingo Molnar ([EMAIL PROTECTED]) wrote:
> i claim that when the 'API cut' is done at the right level then no more 
> than say 100 hooks would be needed - with virtually zero kernel size 
> increase. We've got all the right highlevel abstractions: genirq, gtod, 
> clockevents. Whatever is missing at the moment from the framework (say 
> smp_send_reschedule()) we can abstract away. The bonus? It would be 
> almost directly applicable to other architectures as well. It would also 
> work with /any/ hypervisor.

Oddly enough, that's really what we are trying to achieve.  There is
definitely some tension between the VMI model which is modeled very
directly on hardware and something like the Xen model which prefers
higher level interfaces.

I don't really agree with your metrics w.r.t hooks.  My point is you
take callsites == hooks to arrive at 1463 hooks, but then above say 100
hooks is sufficient.  But we have on the order of 100 hooks (I believe
it's ~75 in Linus' tree).  Put it another way.  Do you believe that
something like irq_{en,dis}able() is appropriate to hook (as that's >
1400 callsites already)?
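
To make the hooks-versus-callsites distinction concrete, here is a small user-space sketch. All names in it (irq_ops, native_*, hv_*, the two counter helpers) are hypothetical and only model the idea; it is not the paravirt_ops layout.

```c
/* Hypothetical ops table: two hooks, no matter how often they are called. */
struct irq_ops {
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

static int irqs_on = 1;   /* modelled interrupt flag */
static int hv_calls;      /* how many times the "hypervisor" backend ran */

/* Native backend. */
static void native_irq_disable(void) { irqs_on = 0; }
static void native_irq_enable(void)  { irqs_on = 1; }

/* Alternative backend standing in for a hypervisor implementation. */
static void hv_irq_disable(void) { hv_calls++; irqs_on = 0; }
static void hv_irq_enable(void)  { hv_calls++; irqs_on = 1; }

/* The table is filled in once; every call site goes through it. */
static struct irq_ops ops = {
    .irq_disable = native_irq_disable,
    .irq_enable  = native_irq_enable,
};

/* Call site 1: modify a counter with "interrupts" off. */
static int bump_counter(int *counter)
{
    ops.irq_disable();
    ++*counter;
    ops.irq_enable();
    return *counter;
}

/* Call site 2: read the counter with "interrupts" off. */
static int read_counter(const int *counter)
{
    int value;

    ops.irq_disable();
    value = *counter;
    ops.irq_enable();
    return value;
}
```

Two hooks, four call sites here. Swapping the backend means changing the two pointers in the table, not touching the call sites, which is why counting call sites overstates the size of the interface.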

> Firstly, i think this has been over-rushed. After years of being happy 
> with forks of the Linux kernel, all the hypervisors woke up at once and 
> want to have their stuff upstream /now/. This rush created a hodgepodge 
> of APIs/ABIs that we now in the end promise to support /all/. (if we 
> take CONFIG_VMI i can see little ethical reason to not take Xen's 
> paravirt_ops, lguest's paravirt_ops, KVM's paravirt_ops and i'm sure 
> Microsoft/Novell will have something nice and different for us too.)

It would be eminently helpful if you helped with some specific ideas
on where the paravirt_ops interface needs to be adjusted.

> Secondly, i'd like to see a paravirt approach that has /implicit/ 
> safeguards against the following type of crap:

How would you propose doing that?  Typically that's done with code
review and patches.

thanks,
-chris

Re: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO/KD_GRAPHICS mode

2007-03-09 Thread Andrew Johnson

On Fri, 2007-09-03 at 15:34 +, Matthew Garrett wrote:
> On Fri, Mar 09, 2007 at 10:08:05AM +0100, Pavel Machek wrote:
> 
> > So... if current console is graphical, we leave X accessing the
> > console... That's bad, because video state is not going to be
> > restored...?
> 
> A graphical console is not necessarily X. Is there any requirement for 
> there to be a single VT that isn't in text mode? The vt switching is 
> a hack, we shouldn't make life difficult for people who have their own 
> userspace code that's entirely capable of restoring video state on its 
> own.

I realised that the previous patch would disallow a console switch while
running X.  Attached is an updated patch with this scenario fixed.

Another approach might be to fail in vt_waitactive() if a console switch
is not going to occur.

-- Andrew

Signed-off-by: Andrew Johnson <[EMAIL PROTECTED]>
---
diff -rup linux-2.6.20.1/drivers/char/vt.c linux/drivers/char/vt.c
--- linux-2.6.20.1/drivers/char/vt.c2007-02-19 22:34:32.0 -0800
+++ linux/drivers/char/vt.c 2007-03-09 10:53:32.0 -0800
@@ -2188,10 +2188,22 @@ static void console_callback(struct work
release_console_sem();
 }
 
-void set_console(int nr)
+extern char vt_dont_switch;
+
+int set_console(int nr)
 {
+   struct vc_data *vc = vc_cons[fg_console].d;
+
+   if(!vc_cons_allocated(nr) || vt_dont_switch || 
+   (vc->vt_mode.mode != VT_PROCESS && vc->vc_mode == KD_GRAPHICS)) 
{
+
+   return -EINVAL;
+   }
+
want_console = nr;
schedule_console_callback();
+
+   return 0;
 }
 
 struct tty_driver *console_driver;
diff -rup linux-2.6.20.1/drivers/char/vt_ioctl.c
linux/drivers/char/vt_ioctl.c
--- linux-2.6.20.1/drivers/char/vt_ioctl.c  2007-02-19 22:34:32.0
-0800
+++ linux/drivers/char/vt_ioctl.c   2007-03-08 14:15:41.0 -0800
@@ -34,7 +34,7 @@
 #include 
 #include 
 
-static char vt_dont_switch;
+char vt_dont_switch;
 extern struct tty_driver *console_driver;
 
 #define VT_IS_IN_USE(i)(console_driver->ttys[i] &&
console_driver->ttys[i]->count)
diff -rup linux-2.6.20.1/include/linux/kbd_kern.h
linux/include/linux/kbd_kern.h
--- linux-2.6.20.1/include/linux/kbd_kern.h 2007-02-19
22:34:32.0 -0800
+++ linux/include/linux/kbd_kern.h  2007-03-08 14:15:41.0 -0800
@@ -75,7 +75,7 @@ extern int do_poke_blanked_console;
 
 extern void (*kbd_ledfunc)(unsigned int led);
 
-extern void set_console(int nr);
+extern int set_console(int nr);
 extern void schedule_console_callback(void);
 
 static inline void set_leds(void)
diff -rup linux-2.6.20.1/kernel/power/console.c
linux/kernel/power/console.c
--- linux-2.6.20.1/kernel/power/console.c   2007-02-19 22:34:32.0
-0800
+++ linux/kernel/power/console.c2007-03-08 14:15:41.0 -0800
@@ -27,7 +27,11 @@ int pm_prepare_console(void)
return 1;
}
 
-   set_console(SUSPEND_CONSOLE);
+   if (set_console(SUSPEND_CONSOLE)) {
+   /* Unable to change to the new console */
+   release_console_sem();
+   return 1;
+   }
release_console_sem();
 
if (vt_waitactive(SUSPEND_CONSOLE)) {



Linux 2.6.20.2

2007-03-09 Thread Greg KH

We (the -stable team) are announcing the release of the 2.6.20.2 kernel.
It contains a metric buttload of bugfixes and security updates, so all
2.6.20 users are recommended to upgrade.

The diffstat and short summary of the fixes are below.

I'll also be replying to this message with a copy of the patch between
2.6.20.1 and 2.6.20.2.

The updated 2.6.20.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.20.y.git
and can be browsed at the normal kernel.org git web browser:
www.kernel.org/git/

thanks,

greg k-h


 Makefile   |2 
 arch/i386/kernel/cpu/mtrr/if.c |   33 +++-
 arch/i386/kernel/signal.c  |6 +-
 arch/i386/kernel/sysenter.c|2 
 arch/ia64/Kconfig  |1 
 arch/ia64/kernel/crash.c   |   11 ++--
 arch/ia64/kernel/machine_kexec.c   |2 
 arch/m32r/kernel/process.c |2 
 arch/m32r/kernel/signal.c  |   26 +
 arch/powerpc/kernel/head_64.S  |2 
 arch/ppc/kernel/ppc_ksyms.c|2 
 arch/sparc64/kernel/of_device.c|   40 ++-
 arch/um/os-Linux/sigio.c   |   38 +++---
 arch/x86_64/ia32/ia32_signal.c |7 ++
 arch/x86_64/ia32/ptrace32.c|1 
 arch/x86_64/kernel/irq.c   |   12 +++-
 block/ll_rw_blk.c  |2 
 drivers/Makefile   |2 
 drivers/ata/ahci.c |   14 +
 drivers/ata/ata_generic.c  |4 +
 drivers/ata/ata_piix.c |4 +
 drivers/ata/pata_ali.c |6 ++
 drivers/ata/pata_amd.c |   10 +++
 drivers/ata/pata_atiixp.c  |4 +
 drivers/ata/pata_cmd64x.c  |6 ++
 drivers/ata/pata_cs5520.c  |7 ++
 drivers/ata/pata_cs5530.c  |6 ++
 drivers/ata/pata_cs5535.c  |4 +
 drivers/ata/pata_cypress.c |4 +
 drivers/ata/pata_efar.c|4 +
 drivers/ata/pata_hpt366.c  |7 ++
 drivers/ata/pata_hpt3x3.c  |6 ++
 drivers/ata/pata_it821x.c  |6 ++
 drivers/ata/pata_jmicron.c |8 +++
 drivers/ata/pata_marvell.c |4 +
 drivers/ata/pata_mpiix.c   |4 +
 drivers/ata/pata_netcell.c |4 +
 drivers/ata/pata_ns87410.c |4 +
 drivers/ata/pata_oldpiix.c |4 +
 drivers/ata/pata_opti.c|4 +
 drivers/ata/pata_optidma.c |4 +
 drivers/ata/pata_pdc202xx_old.c|4 +
 drivers/ata/pata_radisys.c |4 +
 drivers/ata/pata_rz1000.c  |6 ++
 drivers/ata/pata_sc1200.c  |4 +
 drivers/ata/pata_serverworks.c |6 ++
 drivers/ata/pata_sil680.c  |8 +++
 drivers/ata/pata_sis.c |4 +
 drivers/ata/pata_triflex.c |4 +
 drivers/ata/pata_via.c |6 ++
 drivers/ata/sata_sil.c |   10 +++
 drivers/ata/sata_sil24.c   |2 
 drivers/block/pktcdvd.c|2 
 drivers/char/agp/intel-agp.c   |   14 +++--
 drivers/char/pcmcia/cm4040_cs.c|3 -
 drivers/char/specialix.c   |2 
 drivers/char/tty_io.c  |   14 +
 drivers/hid/hid-core.c |5 -
 drivers/ide/ide-iops.c |2 
 drivers/ieee1394/nodemgr.c |   24 ++---
 drivers/ieee1394/video1394.c   |8 +++
 drivers/input/mouse/psmouse-base.c |   28 ++
 drivers/input/mouse/psmouse.h  |1 
 drivers/input/mouse/synaptics.c|1 
 drivers/kvm/kvm.h  |2 
 drivers/macintosh/Kconfig  |2 
 drivers/md/bitmap.c|   22 +++-
 drivers/md/raid10.c|   38 +++---
 drivers/md/raid5.c |   42 ++-
 drivers/media/dvb/dvb-core/dvbdev.c|   13 
 drivers/media/dvb/dvb-usb/cxusb.c  |4 -
 drivers/media/dvb/dvb-usb/digitv.c |2 
 drivers/media/video/cx25840/cx25840-core.c |4 -
 drivers/media/video/cx25840/cx25840-firmware.c |2 
 drivers/media/video/cx88/cx88-blackbird.c  |   14 +++--
 drivers/media/vid

[PATCH] Bitbanging i2c bus driver using the GPIO API

2007-03-09 Thread Haavard Skinnemoen

This is a very simple bitbanging i2c bus driver utilizing the new
arch-neutral GPIO API. Useful for chips that don't have a built-in
i2c controller, additional i2c busses, or testing purposes.

To use, include something similar to the following in the
board-specific setup code:

  #include 

  static struct i2c_gpio_platform_data i2c_gpio_data = {
.sda_pin= GPIO_PIN_FOO,
.scl_pin= GPIO_PIN_BAR,
  };
  static struct platform_device i2c_gpio_device = {
.name   = "i2c-gpio",
.id = 0,
.dev= {
.platform_data  = &i2c_gpio_data,
},
  };

Register this platform_device, set up the i2c pins as GPIO if
required and you're ready to go.

Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]>
---
I wrote this driver for testing purposes a couple of weeks ago.
Figured I might as well post it since it looks like something like
this is needed.

This driver hasn't yet been updated for the latest change to the GPIO
API. I'll update the patch when the GPIO change makes it into
mainline.

Haavard

 drivers/i2c/busses/Kconfig|8 ++
 drivers/i2c/busses/Makefile   |1 +
 drivers/i2c/busses/i2c-gpio.c |  164 +
 include/linux/i2c-gpio.h  |   18 +
 include/linux/i2c-id.h|1 +
 5 files changed, 192 insertions(+), 0 deletions(-)

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index fb19dbb..52f79d1 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -102,6 +102,14 @@ config I2C_ELEKTOR
  This support is also available as a module.  If so, the module 
  will be called i2c-elektor.
 
+config I2C_GPIO
+   tristate "GPIO-based bitbanging i2c driver"
+   depends on I2C && GENERIC_GPIO
+   select I2C_ALGOBIT
+   help
+ This is a very simple bitbanging i2c driver utilizing the
+ arch-neutral GPIO API to control the SCL and SDA lines.
+
 config I2C_HYDRA
tristate "CHRP Apple Hydra Mac I/O I2C interface"
depends on I2C && PCI && PPC_CHRP && EXPERIMENTAL
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 290b540..68f2b05 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_I2C_AMD8111) += i2c-amd8111.o
 obj-$(CONFIG_I2C_AT91) += i2c-at91.o
 obj-$(CONFIG_I2C_AU1550)   += i2c-au1550.o
 obj-$(CONFIG_I2C_ELEKTOR)  += i2c-elektor.o
+obj-$(CONFIG_I2C_GPIO) += i2c-gpio.o
 obj-$(CONFIG_I2C_HYDRA)+= i2c-hydra.o
 obj-$(CONFIG_I2C_I801) += i2c-i801.o
 obj-$(CONFIG_I2C_I810) += i2c-i810.o
diff --git a/drivers/i2c/busses/i2c-gpio.c b/drivers/i2c/busses/i2c-gpio.c
new file mode 100644
index 000..f5ed64e
--- /dev/null
+++ b/drivers/i2c/busses/i2c-gpio.c
@@ -0,0 +1,164 @@
+/*
+ * Bitbanging i2c bus driver using the GPIO API
+ *
+ * Copyright (C) 2006 Atmel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+void i2c_gpio_setsda(void *data, int state)
+{
+   struct i2c_gpio_platform_data *pdata = data;
+
+   if (state)
+   gpio_direction_input(pdata->sda_pin);
+   else
+   gpio_direction_output(pdata->sda_pin);
+}
+
+void i2c_gpio_setscl(void *data, int state)
+{
+   struct i2c_gpio_platform_data *pdata = data;
+
+   if (state)
+   gpio_direction_input(pdata->scl_pin);
+   else
+   gpio_direction_output(pdata->scl_pin);
+}
+
+int i2c_gpio_getsda(void *data)
+{
+   struct i2c_gpio_platform_data *pdata = data;
+
+   return gpio_get_value(pdata->sda_pin);
+}
+
+int i2c_gpio_getscl(void *data)
+{
+   struct i2c_gpio_platform_data *pdata = data;
+
+   return gpio_get_value(pdata->scl_pin);
+}
+
+static int __init i2c_gpio_probe(struct platform_device *pdev)
+{
+   struct i2c_gpio_platform_data *pdata;
+   struct i2c_algo_bit_data *bit_data;
+   struct i2c_adapter *adap;
+   int ret;
+
+   pdata = pdev->dev.platform_data;
+   if (!pdata)
+   return -ENXIO;
+
+   ret = -ENOMEM;
+   adap = kzalloc(sizeof(struct i2c_adapter), GFP_KERNEL);
+   if (!adap)
+   goto err_alloc_adap;
+   bit_data = kzalloc(sizeof(struct i2c_algo_bit_data), GFP_KERNEL);
+   if (!bit_data)
+   goto err_alloc_bit_data;
+
+   ret = gpio_request(pdata->sda_pin, "sda");
+   if (ret)
+   goto err_request_sda;
+   ret = gpio_request(pdata->scl_pin, "scl");
+   if (ret)
+   goto err_request_scl;
+
+   gpio_direction_input(pdata->sda_pin);
+   gpio_direction_input(pdata->scl_pin);
+   gpio_set_value(pdata->sda_pin, 0);
+

Re: [4/6] 2.6.21-rc2: known regressions

2007-03-09 Thread Andrew

On Thu, March 8, 2007 11:28 pm, Len Brown wrote:
> On Monday 05 March 2007 05:35, Antonino A. Daplas wrote:
>
>
> Looks like I got fooled by the negative logic for the nvidia_bugs().
> Please test this patch -- it should fix it,
> as well as simplify the code a bit.
>
> thanks, -Len
>

Yep. You can knock this one off the regression
list :)

Thanks,
  Andrew



Re: ABI coupling to hypervisors via CONFIG_PARAVIRT

2007-03-09 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Fri, 9 Mar 2007, Ingo Molnar wrote:
> > 
> > yes - but we already support the raw hardware ABI, in the native 
> > kernel.
> 
> Why do you continue to call paravirt an ABI?
> 
> We got over that. It's not. It's an API.
> 
> VMI is an ABI.

Unfortunately i still dont see where i'm wrong, and i'm really trying to 
understand your argument. Is your argument that as long as an ABI (VMI) 
is never directly used but only used via wrapper functions 
(paravirt_ops), it has no effects whatsoever on the flexibility of the 
rest of the software and ceases to have any negative ABI effects? In my 
opinion that is an absurd (and incorrect) point so i guess you must mean 
something else, but i really cannot think what that is.

I never said paravirt_ops is an ABI. I say that the ABI(s) _behind_ 
paravirt_ops [in the backend] /does/ limit Linux, even if wrapped, 
inevitably, and that i'm simply worried about having 4-5 independent 
ABIs behind each paravirt_ops variant each creating a web of design 
constraints on the rest of the kernel. To quote a past email of mine:

|| 'paravirt ops can take care of it' - but that is just blatantly 
|| _FALSE_: the ABI 'behind' the paravirt_ops 'shines through' via 
|| functional coupling

it doesnt matter in how big letters the wrapper functions have 'freedom' 
written on them, the _real_ constraint is the user's expectation to have 
the hypervisor work with Linux that worked with that particular VMI ABI 
in v2.6.21. So the user wants to have its hypervisor 1.12 work with 
Linux v2.6.22 - without having to update the hypervisor. And Linux 
v2.6.23. Etc. /That/ is the 'ABI effect' i'm worried about. It is a 
"compatibility web" that gets more and more entangled with every new 
paravirt_ops implementation added.

In practice, when a problem comes up during code rewrite, 90% of the 
time we can probably find a way around it via paravirt_ops and the 
backend, but i'm simply worried about the remaining 10%. And that 10% is 
not hypothetical at all, should i cite specific examples of problems 
that i think cannot be solved via Linux-only modifications?

I'm also worried about the sheer QA inertia of having an additional 4-5 
hypervisor-ABI constraints on the correctness of the kernel, in addition 
to the 2 main CPU variants we have at the moment.

If we said "paravirt_ops must behave like real hardware" then we'd 
probably remove some of that risk (although enforcement is still an 
issue). But we _specifically_ say that no, it doesnt have to behave like 
real hardware. We allow shortcuts, we allow modifications of behavior - 
and that's good in quite many cases. But we allow really weird hacks 
like the .safe_halt() thing. Our only present requirement it appears is 
that "it works with today's hypervisor" - and that requirement 
automatically transforms itself into: "all future kernels will work with 
all past versions of the hypervisor".

Ingo

Re: [PATCH] Bitbanging i2c bus driver using the GPIO API

2007-03-09 Thread David Brownell

On Friday 09 March 2007 10:48 am, Haavard Skinnemoen wrote:
> This is a very simple bitbanging i2c bus driver utilizing the new
> arch-neutral GPIO API. Useful for chips that don't have a built-in
> i2c controller, additional i2c busses, or testing purposes.

That's the right idea!  But remember that not all GPIOs support
reading back the actual value on SCL (it's an OUT pin, so lacking
multidrive capability the values "should" be what you wrote), so
getscl() support should depend on a flag in platform data.  In
the same vein, if SCL is an output-only pin, you won't be able
to change its direction ... but then, I'm not sure why you were
changing its direction in setscl() rather than just its value.
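
One plausible reason for toggling direction rather than value is open-drain emulation: the pin is released (input) for "high" and driven low (output) for "low", so the pull-up sets the high level and another device can still hold the line down. The sketch below is a user-space model of that behaviour under that assumption; sim_pin, line_level and od_set are hypothetical names, not part of the GPIO API or the posted driver.

```c
/* Hypothetical model of one GPIO pin attached to a pulled-up bus line. */
struct sim_pin {
    int is_output;   /* 1 = driving the line, 0 = high impedance (input) */
    int out_value;   /* level driven while is_output is set */
};

/* With a pull-up, the line is low only if some pin actively drives it low. */
static int line_level(const struct sim_pin *pins, int npins)
{
    int i;

    for (i = 0; i < npins; i++)
        if (pins[i].is_output && pins[i].out_value == 0)
            return 0;
    return 1;
}

/* Open-drain style set: release the line for 1, pull it low for 0. */
static void od_set(struct sim_pin *pin, int state)
{
    if (state) {
        pin->is_output = 0;   /* let the pull-up raise the line */
    } else {
        pin->out_value = 0;
        pin->is_output = 1;   /* actively pull the line low */
    }
}
```

In this model a slave stretching the clock is just a second pin holding the line at 0 after the master has released it, which is also why reading SCL back matters for clock stretching.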

I2C has another interesting special case.  at91_set_multi_drive()
would be appropriate (yes?) for ARCH_AT91 to use on SCL, to best
support both clock stretching and multi-master configurations.

> + gpio_direction_input(pdata->sda_pin);
> + gpio_direction_input(pdata->scl_pin);
> + gpio_set_value(pdata->sda_pin, 0);
> + gpio_set_value(pdata->scl_pin, 0);

Surely you mean "output" in both cases.  So you can set the
value.  Setting the value on an input pin is undefined.  ;)

> + printk(KERN_INFO "i2c-gpio: using pins 0x%x (sda) 0x%x (scl)\n",
> +pdata->sda_pin, pdata->scl_pin);

Please, no hex there.  I think dev_info() would be better; and it
might be nice to report whether clock stretching is supported.

> --- a/include/linux/i2c-id.h
> +++ b/include/linux/i2c-id.h
> @@ -194,6 +194,7 @@
>  #define I2C_HW_B_EM28XX  0x01001f /* em28xx video capture cards 
> */
>  #define I2C_HW_B_CX2341X 0x010020 /* Conexant CX2341X MPEG encoder cards 
> */
>  #define I2C_HW_B_INTELFB 0x010021 /* intel framebuffer driver */
> +#define I2C_HW_B_GPIO0x010022 /* Generic GPIO-based driver */

It'd be nice to completely abolish those IDs, starting by not
adding new ones.  Especially, not adding unused ones!

- Dave

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Alan Stern

On Fri, 9 Mar 2007, Dmitry Torokhov wrote:

> On 3/9/07, Oliver Neukum <[EMAIL PROTECTED]> wrote:
> > Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
> >
> > > I think we already have all refcounting that is needed. What is
> > > missing is subsystem-provided ->release() hooks for drivers to release
> > > driver-specific resources when a device finally goes away.
> >
> > This is an interesting idea. Is it nice to pass through release()
> > but not open() ?
> >
> 
> Not sure if I follow... Generally speaking open is not a mandatory
> operation; however every object in driver model has a release method.
> What I am saying is that certain drivers need to have their disconnect
> method split in 2 parts - one that shuts down the device and a second that
> releases resources that might be accessed through sysfs (and other
> kernel parts). That second part will have to be called from
> subsystem's core ->release() method so we need a release() hook.

Dmitry, you're not viewing this correctly.

Adding a new release() callback would solve the problem by creating 
another.  Drivers need to release their data as soon as possible after
they unbind from a device, not when the device itself goes away.  Think
about what would happen if you tried to rmmod a driver.  The rmmod process 
would block until the device was unregistered.

Oliver, your idea won't work either.  Think about what would happen if 
someone did

rmmod driver_module

Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-09 Thread Rafael J. Wysocki

On Friday, 9 March 2007 13:29, Heiko Carstens wrote:
> On Wed, Mar 07, 2007 at 09:07:17PM +, Pavel Machek wrote:
> > Hi!
> > 
> > > Prevent the WARN_ON() in 
> > > arch/x86_64/kernel/acpi/sleep.c:init_low_mapping()
> > > from triggering by disabling nonboot CPUs before we finally enter the 
> > > platform
> > > suspend.
> > > 
> > > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > > ---
> > >  kernel/power/disk.c |1 +
> > >  kernel/power/user.c |2 +-
> > >  2 files changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > Index: linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > ===
> > > --- linux-2.6.21-rc2-mm2.orig/kernel/power/disk.c
> > > +++ linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > @@ -61,6 +61,7 @@ static void power_down(suspend_disk_meth
> > >   switch(mode) {
> > >   case PM_DISK_PLATFORM:
> > >   if (pm_ops && pm_ops->enter) {
> > > + disable_nonboot_cpus();
> > >   kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > >   pm_ops->enter(PM_SUSPEND_DISK);
> > >   break;
> > 
> > ...so, if pm_ops is non-null, power_down does nonboot cpu disabling,
> > otherwise we proceed with cpus enabled?
> > 
> > That looks ugly.
> > 
> > Is the warning bogus? Or maybe we should *always* disable nonboot cpus
> > in powerdown path?
> 
> Is disable_nonboot_cpus() assuming that first_cpu(cpu_present_map) is
> the boot cpu? Just wondering why disable_nonboot_cpus() isn't using just
> any_online_cpu(cpu_online_map)...

Is your question related to the code in kernel/cpu.c?

Rafael

[PATCH 1/1] hotplug cpu: migrate a task within its cpuset

2007-03-09 Thread Cliff Wickman


From: Cliff Wickman <[EMAIL PROTECTED]>

(this is a second submission -- the first was from a work area back
 porting to an older release)

When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
that have been running on that cpu.

Currently, such a task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
and among that task's cpus_allowed
 2) to any cpu which is both online and among that task's cpus_allowed

But the task's cpus_allowed may have been a single cpu.

This patch would insert a preference to migrate such a task to a cpu within
its cpuset (and set its cpus_allowed to its cpuset).

With this patch, the task is migrated:
 1) to any cpu on the same node as the disabled cpu, which is both online
and among that task's cpus_allowed
 2) to any online cpu within the task's cpuset
 3) to any cpu which is both online and among that task's cpus_allowed


Diffed against 2.6.21-rc3 (Andrew's current top of tree)

Signed-off-by: Cliff Wickman <[EMAIL PROTECTED]>

---
 kernel/sched.c |6 ++
 1 file changed, 6 insertions(+)

Index: morton.070123/kernel/sched.c
===
--- morton.070123.orig/kernel/sched.c
+++ morton.070123/kernel/sched.c
@@ -5170,6 +5170,12 @@ restart:
if (dest_cpu == NR_CPUS)
dest_cpu = any_online_cpu(p->cpus_allowed);
 
+   /* try to stay on the same cpuset */
+   if (dest_cpu == NR_CPUS) {
+   p->cpus_allowed = cpuset_cpus_allowed(p);
+   dest_cpu = any_online_cpu(p->cpus_allowed);
+   }
+
/* No more Mr. Nice Guy. */
if (dest_cpu == NR_CPUS) {
rq = task_rq_lock(p, &flags);
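
The fallback chain in the hunk above can be modelled in user space with plain bitmasks. This is only a sketch under assumptions: NO_CPU stands in for the NR_CPUS sentinel, first_cpu_in() for any_online_cpu(), and the masks are passed in rather than looked up; the steps are taken in the order the hunk applies them (cpus_allowed first, then the cpuset).

```c
#include <stdint.h>

#define NO_CPU (-1)   /* hypothetical stand-in for the NR_CPUS sentinel */

/* Index of the lowest set bit in mask, or NO_CPU if the mask is empty. */
static int first_cpu_in(uint64_t mask)
{
    int cpu;

    for (cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return NO_CPU;
}

/*
 * Pick a destination cpu for a task leaving a dead cpu.
 * node_cpus:    cpus on the dead cpu's node
 * online:       cpus currently online
 * cpus_allowed: the task's affinity mask (may be widened, as in the patch)
 * cpuset_cpus:  cpus of the task's cpuset
 */
static int pick_dest_cpu(uint64_t node_cpus, uint64_t online,
                         uint64_t *cpus_allowed, uint64_t cpuset_cpus)
{
    int dest;

    /* An online, allowed cpu on the same node as the dead cpu. */
    dest = first_cpu_in(node_cpus & online & *cpus_allowed);

    /* Otherwise any online cpu the task is allowed on. */
    if (dest == NO_CPU)
        dest = first_cpu_in(online & *cpus_allowed);

    /* Otherwise widen the affinity to the cpuset and try again. */
    if (dest == NO_CPU) {
        *cpus_allowed = cpuset_cpus;
        dest = first_cpu_in(online & *cpus_allowed);
    }

    return dest;   /* NO_CPU here corresponds to the "No more Mr. Nice Guy" path */
}
```

For a task pinned to a single cpu that has just gone offline, the first two lookups come back empty and the third lands on an online cpu of the cpuset, which is the case the patch description is about.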

Re: Sleeping thread not receive signal until it wakes up

2007-03-09 Thread Luong Ngo

On 3/9/07, Sergey Vlasov <[EMAIL PROTECTED]> wrote:

On Thu, 8 Mar 2007 14:52:07 -0800 Luong Ngo wrote:

[...]
> static irqreturn board_isr(int irq, void *dev_id, struct pt_regs* regs)
> {
>  spin_lock(&dev->lock);
>if (dev->irqMask & (1 << irqBit)) {
> // Set the interrupt event mask
> dev->irqEvent |= (1 << irqBit);
>
> // Disable this irq, it will be reenabled after processed by board task
> disable_irq(irq);

I assume that your device does not support shared interrupts?  If it
does (and a PCI device is required to support them), you cannot use
disable_irq() here (and you need to check a register in the device to
find out if it really did generate an IRQ)...

 Yes, the device does not share interrupt.

> static int ats89_ioctl(struct inode *inode, struct file *file, u_int
> cmd, u_long arg)
> {
>
>   switch(cmd){
>case GET_IRQ_CMD: {
> u32  regMask32;
>
>spin_lock_irq(dev->lock);
>while ((dev->irqMask & dev->irqEvent) == 0) {
>  // Sleep until board interrupt happens
>  spin_unlock_irq(dev->lock);
>  interruptible_sleep_on(&(dev->boardIRQWaitQueue));
>  if (uncond_wakeup) {
>  /* don't go back to loop */
>  break;
>  }
>  spin_lock_irq(dev->lock);
>  }
>
> uncond_wakeup = 0;
>
>  // Board interrupt happened
> regMask32 = dev->irqMask & dev->irqEvent;
>  if(copy_to_user(&(((ATS89_IOCTL_S *)arg)->mask32),
> &regMask32, sizeof(u32))) {
>  spin_unlock_irq(dev->lock);
>  return -EAGAIN;
>  }
>
>  // Clear the event mask
>  dev->irqEvent = 0;
>  spin_unlock_irq(dev->lock);
> }
> break;
>
>
>}
> }

And this code is full of bugs:

 1) As you have been told already, interruptible_sleep_on() and
   sleep_on() functions are broken and should not be used (they are
   left in the kernel only to support some obsolete code).  Either
   use wait_event_interruptible() or work with wait queues directly
   (prepare_to_wait(), finish_wait(), ...).

I agree, but as I said, our hardware will keep raising interrupts
until it's serviced, so a missed wakeup call would be repeated too,
and this should still wake up the sleep_on call. But we will
definitely change it.

 2) The code to handle pending signals is missing - you need to have
   this after wait_event_interruptible():

   if (signal_pending(current))
   return -ERESTARTSYS;

   (but be careful - you might need to clean up something before
   returning).

   This is what causes your problem - interruptible_sleep_on()
   returns if a signal is pending, but your code does not check for
   signals and therefore invokes interruptible_sleep_on() again; but
   if a signal is pending, interruptible_sleep_on() returns
   immediately, causing your driver to eat 100% CPU looping in kernel
   mode until some device event finally happens.

As pointed out by Robert, I added the checking
  if(signal_pending(current))
  return -ERESTARTSYS;
right after the line interruptible_sleep_on , but I don't see any
difference yet.

 3) If uncond_wakeup is set, you break out of the loop with dev->lock
   unlocked; however, if dev->irqEvent gets set, you exit the loop
   with dev->lock locked.  The subsequent code always unlocks
   dev->lock, so in the uncond_wakeup case you have double unlock.

 Thanks for catching it

 4) You are doing copy_to_user() while holding a spinlock - this is
   prohibited (as any other form of sleep inside a spinlock).

Thanks again. But may I ask if it is prohibited, how come it has
been running without any error?

 5) The return code for the copy_to_user() failure is wrong - it
   should be -EFAULT (this is not a fatal bug, but an annoyance for
   users of your driver, who might get such nonstandard error codes
   while debugging their programs and wonder what is going on).

   changed.

Thank you for your input.

-LNgo
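
Points 2 to 5 of the review above can be put together in a user-space model of the post-wakeup path. This is a sketch under assumptions, not the driver: model_dev and the model_* helpers are hypothetical, a depth counter stands in for the spinlock, a NULL destination models a faulting user pointer, and -EINTR is used because -ERESTARTSYS is a kernel-internal code.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical user-space model of the driver state. */
struct model_dev {
    int lock_depth;          /* stands in for dev->lock; must end at 0 */
    unsigned int irq_mask;
    unsigned int irq_event;
};

static void model_lock(struct model_dev *d)   { d->lock_depth++; }
static void model_unlock(struct model_dev *d) { d->lock_depth--; }

/*
 * Model of the GET_IRQ_CMD path after the sleep returns.
 * signal_pending: nonzero if the sleep was interrupted by a signal.
 * out: destination for the mask; NULL models a faulting user pointer.
 */
static int model_get_irq(struct model_dev *d, int signal_pending,
                         unsigned int *out)
{
    unsigned int snapshot;

    /* Point 2: return on a pending signal instead of sleeping again. */
    if (signal_pending)
        return -EINTR;   /* a kernel driver would return -ERESTARTSYS */

    /* Points 3 and 4: one lock/unlock pair, nothing slow inside it. */
    model_lock(d);
    snapshot = d->irq_mask & d->irq_event;
    d->irq_event = 0;
    model_unlock(d);

    /* Point 5: the copy happens unlocked and fails with -EFAULT. */
    if (out == NULL)
        return -EFAULT;
    memcpy(out, &snapshot, sizeof(snapshot));
    return 0;
}
```

One trade-off in this arrangement: the events are cleared before the copy, so a failed copy loses them. Whether that is acceptable depends on the hardware; with interrupts that keep firing until serviced it may well be.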

Re: ABI coupling to hypervisors via CONFIG_PARAVIRT

2007-03-09 Thread Linus Torvalds

On Fri, 9 Mar 2007, Ingo Molnar wrote:
> 
> Unfortunately i still dont see where i'm wrong, and i'm really trying to 
> understand your argument. Is your argument that as long as an ABI (VMI) 
> is never directly used but only used via wrapper functions 
> (paravirt_ops)

No.

My argument is utterly and *purely* that you've been confusing the 
discussion by using the wrong terms, and as a result, you've been 
discussing things that aren't *relevant*.

You haven't been saying anything constructive.

For example, here's a *constructive* thing you could have said, and never 
actually did:

 - paravirt_op->write_apic should not exist

   anything that needs to write to the apic should either

(a) have been caught much earlier in the paravirt stack, ie it's a 
"disable interrupt" kind of operation, and should never even have 
gotten to the APIC write in the first place, but been handled by 
the paravirtualized handler.
(b) just be emulated as an APIC write (and if the emulation isn't good 
enough, screw it)

In other words, to be *constructive*, you need to point out particular 
and practical problem spots, instead of just ranting about any 
"paravirtualized ABI".

Of *course* there is an ABI at some point behind any API. Why do you harp 
on that? It's irrelevant. Any API will always end up being instantiated 
into a binary thing at some point, and that binary thing will have to work 
with some particular version of a hardware/infrastructure combination, but 
that has *nothing* to do with anything.

The x86 instruction set is an ABI. Our API's eventually tend to be 
compiled to something like that ABI, and yes, some ABI's may not be able 
to do certain things. For example, on the 32-bit x86 ABI, there are no 
interfaces for address space identifiers, and the wrappers become a no-op 
that just don't do anything, and if you want fast context switches between 
two contexts, you're screwed.

Similarly, maybe the VMI ABI doesn't allow for something that the kernel 
wants to do efficiently. Big deal. What relevance does that have to do 
with anything, except the fact that if true, the VMWare people are 
screwed? It's *their* problem.

So please

 - point out things that are badly done. I agree that apic_write() simply 
   shouldn't be an ABI point at all. But do so *directly* without some 
   ranting about other things that aren't relevant. Your "1400 hooks" rant 
   was pointless - there aren't 1400 hooks at all. There are 1400 
   call-sites, but that's like saying that the "mov" operation is a bad 
   instruction, because there are 5 million mov instructions in the 
   kernel.

 - Realize that if VMI has problems, it's not *your* problem, or even the
   kernel's problem. It's purely a VMI problem. I don't understand why you 
   care, or why you think we should care.

 - and I guess we can also stop cc'ing me in the first place. I don't even 
   think virtualization is very interesting. I'd much rather flame people 
   about bad taste in more important areas ;)

Thanks,

Linus

Re: [patch 2.6.20-1] radeonfb: Add support for Radeon xpress 200m

2007-03-09 Thread johan henriksson


Benjamin Herrenschmidt wrote:

-   radeonfb_pm_init(rinfo, rinfo->is_mobility ? 1 : -1, 
ignore_devlist, force_sleep);
+   radeonfb_pm_init(rinfo, rinfo->is_mobility && rinfo->family != 
CHIP_FAMILY_RS480 ? 1 : -1, ignore_devlist, force_sleep);


I'd rather you add a check for RS480 inside radeonfb_pm_*

Ben.




Something like this?


---
diff -upr linux-2.6.20.1-vanilla/drivers/video/aty/ati_ids.h 
linux-2.6.20.1/drivers/video/aty/ati_ids.h
--- linux-2.6.20.1-vanilla/drivers/video/aty/ati_ids.h  Tue Feb 20 07:34:32 2007
+++ linux-2.6.20.1/drivers/video/aty/ati_ids.h  Fri Mar  9 20:30:09 2007
@@ -209,4 +209,4 @@
#define PCI_CHIP_R423_5D57  0x5D57
#define PCI_CHIP_RS350_7834 0x7834
#define PCI_CHIP_RS350_7835 0x7835
-
+#define PCI_CHIP_RS480_5955 0x5955
diff -upr linux-2.6.20.1-vanilla/drivers/video/aty/radeon_base.c 
linux-2.6.20.1/drivers/video/aty/radeon_base.c
--- linux-2.6.20.1-vanilla/drivers/video/aty/radeon_base.c  Tue Feb 20 
07:34:32 2007
+++ linux-2.6.20.1/drivers/video/aty/radeon_base.c  Fri Mar  9 20:42:31 2007
@@ -100,6 +100,8 @@
{ PCI_VENDOR_ID_ATI, id, PCI_ANY_ID, PCI_ANY_ID, 0, 0, (flags) | 
(CHIP_FAMILY_##family) }

static struct pci_device_id radeonfb_pci_table[] = {
+/* Radeon Xpress 200m */
+   CHIP_DEF(PCI_CHIP_RS480_5955,   RS480,  CHIP_HAS_CRTC2 | CHIP_IS_IGP | 
CHIP_IS_MOBILITY),
/* Mobility M6 */
CHIP_DEF(PCI_CHIP_RADEON_LY,RV100,  CHIP_HAS_CRTC2 | 
CHIP_IS_MOBILITY),
CHIP_DEF(PCI_CHIP_RADEON_LZ,RV100,  CHIP_HAS_CRTC2 | 
CHIP_IS_MOBILITY),
@@ -1990,7 +1992,8 @@ static void radeon_identify_vram(struct 
	/* framebuffer size */

if ((rinfo->family == CHIP_FAMILY_RS100) ||
(rinfo->family == CHIP_FAMILY_RS200) ||
-(rinfo->family == CHIP_FAMILY_RS300)) {
+(rinfo->family == CHIP_FAMILY_RS300) ||
+   (rinfo->family == CHIP_FAMILY_RS480) ) {
  u32 tom = INREG(NB_TOM);
  tmp = ((((tom >> 16) - (tom & 0xffff) + 1) << 6) * 1024);

diff -upr linux-2.6.20.1-vanilla/drivers/video/aty/radeon_pm.c 
linux-2.6.20.1/drivers/video/aty/radeon_pm.c
--- linux-2.6.20.1-vanilla/drivers/video/aty/radeon_pm.c  Tue Feb 20 
07:34:32 2007
+++ linux-2.6.20.1/drivers/video/aty/radeon_pm.c  Fri Mar  9 20:39:54 2007
@@ -2826,11 +2826,15 @@ void radeonfb_pm_init(struct radeonfb_in
rinfo->pm_reg = pci_find_capability(rinfo->pdev, PCI_CAP_ID_PM);

/* Enable/Disable dynamic clocks: TODO add sysfs access */
-   rinfo->dynclk = dynclk;
-   if (dynclk == 1) {
+   if (rinfo->family == CHIP_FAMILY_RS480)
+   rinfo->dynclk = -1;
+   else
+		rinfo->dynclk = dynclk; 
+	

+   if (rinfo->dynclk == 1) {
radeon_pm_enable_dynamic_mode(rinfo);
printk("radeonfb: Dynamic Clock Power Management enabled\n");
-   } else if (dynclk == 0) {
+   } else if (rinfo->dynclk == 0) {
radeon_pm_disable_dynamic_mode(rinfo);
printk("radeonfb: Dynamic Clock Power Management disabled\n");
}
diff -upr linux-2.6.20.1-vanilla/drivers/video/aty/radeonfb.h 
linux-2.6.20.1/drivers/video/aty/radeonfb.h
--- linux-2.6.20.1-vanilla/drivers/video/aty/radeonfb.h Tue Feb 20 07:34:32 2007
+++ linux-2.6.20.1/drivers/video/aty/radeonfb.h Fri Mar  9 20:30:09 2007
@@ -48,6 +48,7 @@ enum radeon_family {
CHIP_FAMILY_RV350,
CHIP_FAMILY_RV380,/* RV370/RV380/M22/M24 */
CHIP_FAMILY_R420, /* R420/R423/M18 */
+   CHIP_FAMILY_RS480,
CHIP_FAMILY_LAST,
};

@@ -64,7 +65,8 @@ enum radeon_family {
((rinfo)->family == CHIP_FAMILY_RV350) || \
((rinfo)->family == CHIP_FAMILY_R350)  || \
((rinfo)->family == CHIP_FAMILY_RV380) || \
-   ((rinfo)->family == CHIP_FAMILY_R420))
+   ((rinfo)->family == CHIP_FAMILY_R420)  || \
+   ((rinfo)->family == CHIP_FAMILY_RS480) )

/*
 * Chip flags

Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-09 Thread Rafael J. Wysocki

Hi,

On Friday, 9 March 2007 09:54, Pavel Machek wrote:
> Hi!
> 
> > > > Index: linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > > ===
> > > > --- linux-2.6.21-rc2-mm2.orig/kernel/power/disk.c
> > > > +++ linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > > @@ -61,6 +61,7 @@ static void power_down(suspend_disk_meth
> > > > switch(mode) {
> > > > case PM_DISK_PLATFORM:
> > > > if (pm_ops && pm_ops->enter) {
> > > > +   disable_nonboot_cpus();
> > > > kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > > > pm_ops->enter(PM_SUSPEND_DISK);
> > > > break;
> > > 
> > > ...so, if pm_ops is non-null, power_down does nonboot cpu disabling,
> > > otherwise we proceed with cpus enabled?
> > > 
> > > That looks ugly.
> > > 
> > > Is the warning bogus?
> > 
> > Well, maybe.  I'm not sure.
> > 
> > > Or maybe we should *always* disable nonboot cpus in powerdown path?
> > 
> > I think we should do that.
> 
> That would be acceptable.
> 
> > > > Index: linux-2.6.21-rc2-mm2/kernel/power/user.c
> > > > ===
> > > > --- linux-2.6.21-rc2-mm2.orig/kernel/power/user.c
> > > > +++ linux-2.6.21-rc2-mm2/kernel/power/user.c
> > > > @@ -398,9 +398,9 @@ static int snapshot_ioctl(struct inode *
> > > >  
> > > > case PMOPS_ENTER:
> > > > if (data->platform_suspend) {
> > > > +   disable_nonboot_cpus();
> > > > 
> > > > kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > > > error = pm_ops->enter(PM_SUSPEND_DISK);
> > > > -   error = 0;
> > > > }
> > > > break;
> > > 
> > > > For a userland application, disabling cpus during pmops_enter is at
> > > least surprising...
> > 
> > > Yes, but this is not a usual ioctl().  OTOH, we can call
> > > enable_nonboot_cpus() if pm_ops->enter(PM_SUSPEND_DISK) returns an error
> > > (otherwise it shouldn't return at all, no?).
> 
> Ok.

Well, does the appended patch look better?

Rafael


---
 kernel/power/disk.c |1 +
 kernel/power/user.c |3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6.21-rc3/kernel/power/disk.c
===
--- linux-2.6.21-rc3.orig/kernel/power/disk.c
+++ linux-2.6.21-rc3/kernel/power/disk.c
@@ -58,6 +58,7 @@ static inline int platform_prepare(void)
 
 static void power_down(suspend_disk_method_t mode)
 {
+   disable_nonboot_cpus();
switch(mode) {
case PM_DISK_PLATFORM:
if (pm_ops && pm_ops->enter) {
Index: linux-2.6.21-rc3/kernel/power/user.c
===
--- linux-2.6.21-rc3.orig/kernel/power/user.c
+++ linux-2.6.21-rc3/kernel/power/user.c
@@ -402,9 +402,10 @@ static int snapshot_ioctl(struct inode *
 
case PMOPS_ENTER:
if (data->platform_suspend) {
+   disable_nonboot_cpus();
kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
error = pm_ops->enter(PM_SUSPEND_DISK);
-   error = 0;
+   enable_nonboot_cpus();
}
break;
 

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Oliver Neukum

Am Freitag, 9. März 2007 20:32 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Dmitry Torokhov wrote:
> 
> > On 3/9/07, Oliver Neukum <[EMAIL PROTECTED]> wrote:
> > > Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
> > >
> > > > I think we already have all refcounting that is needed. What is
> > > > missing is subsystem-provided ->release() hooks for drivers to release
> > > > driver-specific resources when a device finally goes away.
> > >
> > > This is an interesting idea. Is it nice to pass through release()
> > > but not open() ?
> > >
> > 
> > Not sure if I follow... Generally speaking open is not a mandatory
> > operation; however every object in driver model has a release method.
> > > What I am saying is that certain drivers need to have their disconnect
> > > method split in 2 parts - one that shuts down the device and a second that
> > > releases resources that might be accessed through sysfs (and other
> > > kernel parts). That second part will have to be called from the
> > > subsystem's core ->release() method, so we need a release() hook.
> 
> Dmitry, you're not viewing this correctly.
> 
> Adding a new release() callback would solve the problem by creating 
> another.  Drivers need to release their data as soon as possible after
> they unbind from a device, not when the device itself goes away.  Think

Wait, the callback from closing the file in sysfs is the earliest we can safely
free the data structure. How do you want to free earlier?

> about what would happen if you tried to rmmod a driver.  The rmmod process 
> would block until the device was unregistered.
> 
> Oliver, your idea won't work either.  Think about what would happen if 
> someone did
> 
>   rmmod driver_module  
> The rmmod process would never actually read the attribute, so until it 
> exited the private data structure would have a positive refcount.  But 
> rmmod can't exit until the driver has been unloaded from memory, and it 
> can't be unloaded while its data structure is still allocated.  Thus we 
> would end up with deadlock; rmmod would hang forever.
> 
> It might be better to keep your earlier patch and fix the deadlock you
> mentioned earlier, the one that occurs when unbinding a driver through
> sysfs.  How exactly does that deadlock work?

http://lkml.org/lkml/2007/3/6/364
http://lkml.org/lkml/2007/3/6/528

Regards
Oliver

Re: refcounting drivers' data structures used in sysfs buffers

2007-03-09 Thread Alan Stern

On Fri, 9 Mar 2007, Alan Stern wrote:

> Oliver, your idea won't work either.  Think about what would happen if 
> someone did
> 
>   rmmod driver_module  
> The rmmod process would never actually read the attribute, so until it 
> exited the private data structure would have a positive refcount.  But 
> rmmod can't exit until the driver has been unloaded from memory, and it 
> can't be unloaded while its data structure is still allocated.  Thus we 
> would end up with deadlock; rmmod would hang forever.

I take this back.  Redirecting stdin to the attribute file would increase 
the module's refcount and cause rmmod to exit immediately with an error.

After some more thought, I basically agree with what Oliver wrote
originally.  sysfs_dirent is indeed the logical place to store the kref
pointer.  However it needs to be used during open and release, not during
read, write, and poll.  Another point, which Oliver didn't think of, is
that the kref pointer needs to be passed to the driver as an argument in
the show() and store() method calls.
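
A rough userspace sketch of that lifetime rule follows. It only illustrates the idea and is not the sysfs implementation: struct refcount stands in for struct kref, and attr_open(), attr_release(), drv_show() and struct attr_file are hypothetical names. The reference is taken at open, dropped at release, and the refcount pointer is what gets handed to show().

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's struct kref (userspace, hypothetical). */
struct refcount {
	int count;
	void (*release)(struct refcount *ref);
};

/* Driver-private data; the refcount is the first member so the pointer
 * handed to show() can be converted back to the containing structure. */
struct drv_data {
	struct refcount ref;
	int value;
};

/* Stand-in for an open sysfs attribute file. */
struct attr_file {
	struct refcount *ref;
};

static int released;	/* counts release callbacks, for demonstration only */

static void ref_get(struct refcount *ref)
{
	assert(ref->count > 0);
	ref->count++;
}

static void ref_put(struct refcount *ref)
{
	assert(ref->count > 0);
	if (--ref->count == 0)
		ref->release(ref);
}

static void drv_data_release(struct refcount *ref)
{
	released++;
	free((struct drv_data *)ref);
}

static struct drv_data *drv_data_alloc(int value)
{
	struct drv_data *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->ref.count = 1;	/* reference held by the bound driver */
	d->ref.release = drv_data_release;
	d->value = value;
	return d;
}

/* open(): pin the private data for the lifetime of the open file. */
static void attr_open(struct attr_file *f, struct refcount *ref)
{
	ref_get(ref);
	f->ref = ref;
}

/* show(): receives the refcount pointer as an extra argument. */
static int drv_show(struct refcount *ref)
{
	return ((struct drv_data *)ref)->value;
}

/* release(): drop the reference taken at open(). */
static void attr_release(struct attr_file *f)
{
	ref_put(f->ref);
	f->ref = NULL;
}
```

In this model the driver can drop its own reference at unbind time while a file is still open; the data is freed only when the last open file is released.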

Implementing this will be difficult.  One possibility is to change the 
definition of sysfs_ops, adding the new struct kref * argument to the 
prototypes.  This will involve changing _lots_ of source files, adding an 
unused argument to many functions, which isn't attractive.

The other possibility is to test at runtime whether the kref pointer is 
NULL, and if it is, don't pass it.  This would work, but it isn't 
type-safe.

Finally, there's added complexity in each driver which wants to use the 
new facility.  The module_exit routine will need to be smart enough to 
block until all the private data structures have been released.  
usb-storage does something like that now; it's kind of ugly (although it 
could be improved if appropriate support were added to the core kernel).

Alan Stern


Re: [PATCH] Bitbanging i2c bus driver using the GPIO API

2007-03-09 Thread Russell King

On Fri, Mar 09, 2007 at 11:30:12AM -0800, David Brownell wrote:
> On Friday 09 March 2007 10:48 am, Haavard Skinnemoen wrote:
> > This is a very simple bitbanging i2c bus driver utilizing the new
> > arch-neutral GPIO API. Useful for chips that don't have a built-in
> > i2c controller, additional i2c busses, or testing purposes.
> 
> That's the right idea!  But remember that not all GPIOs support
> reading back the actual value on SCL (it's an OUT pin, so lacking
> multidrive capability the values "should" be what you wrote), so
> getscl() support should depend on a flag in platform data.  In
> the same vein, if SCL is an output-only pin, you won't be able
> to change its direction ... but then, I'm not sure why you were
> changing its direction in setscl() rather than just its value.

That's a more correct I2C implementation.  If you read the specs, the
SDA and SCL signals are supposed to be driven by open-collector or
open-drain drivers, such that devices only pull the bus low.  Pull-up
resistors pull the signals high when undriven.

This avoids the possibility of damage caused when one device drives
a signal low and another device tries to drive it high.

Therefore, the correct I2C GPIO implementation is one where you drive
both SDA and SCL low by using a combination of the data direction
register and the output level register, but avoid driving the output
high.
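
As a sketch of what that means in code, the model below emulates an open-drain output with a direction bit and a level bit. It deliberately does not use the kernel GPIO API: struct gpio_line, line_set(), line_get() and line_drives_high() are hypothetical names, and the pull-up and the second bus device are simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one GPIO line on a bus with a pull-up resistor. */
struct gpio_line {
	bool is_output;		/* data direction register bit */
	bool out_level;		/* output level register bit */
	bool other_pulls_low;	/* another device is holding the line low */
};

/* "High" means release the line (input) and let the pull-up raise it;
 * "low" means output with the level register at 0. */
static void line_set(struct gpio_line *l, bool high)
{
	assert(l != NULL);
	if (high) {
		l->is_output = false;
	} else {
		l->out_level = false;	/* set the level before enabling the driver */
		l->is_output = true;
	}
}

/* Level seen on the wire: low if anyone pulls it low, else pulled up. */
static bool line_get(const struct gpio_line *l)
{
	assert(l != NULL);
	if (l->is_output && !l->out_level)
		return false;
	if (l->other_pulls_low)
		return false;
	return true;
}

/* True only if this side actively drives the line high, which the
 * scheme above never does. */
static bool line_drives_high(const struct gpio_line *l)
{
	assert(l != NULL);
	return l->is_output && l->out_level;
}
```

Because a released line can still read back low, the same model covers a slave holding SCL low: the master sets SCL "high" and then sees it stay low without any contention on the wire.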

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: ABI coupling to hypervisors via CONFIG_PARAVIRT

2007-03-09 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Similarly, maybe the VMI ABI doesn't allow for something that the 
> kernel wants to do efficiently. Big deal. What relevance does that 
> have to do with anything, except the fact that if true, the VMWare 
> people are screwed? It's *their* problem.

i won't hold you up for long, but i think this is the key difference, and 
if i understand your point correctly i think you are really wrong here.

This is 'enterprise Linux compatibility and ABI 101', really - i dare to 
bet blindly that you won't see anyone here from distros arguing against 
this simple point.

Once this thing is released upstream, it creates a new compatibility 
rule:

  _new kernel must not break on an older hypervisor_

due to a new paravirt_ops design. Ever. It's really that simple. (I 
think i never said this explicitly because this requirement of backwards 
compatibility was so obvious to me.)

And it doesn't matter whether we think that it was VMWare who messed up. 
Users/customers _will_ blame us: "v2.6.25 regresses, it won't run under 
ESX v1.12 anymore". Distros will yield and will undo whatever change 
breaks backwards compatibility with older hypervisors. (most likely it 
will be undone upstream already) Backwards compatibility acts as a very 
heavy barrier against certain types of paravirt_ops design changes.

Once v2.6.21 is released, and a bigger distro releases a kernel with 
CONFIG_PARAVIRT+CONFIG_VMI enabled: backwards compatibility in future 
versions becomes mainly /that/ distro's problem (and upstream's 
problem), _NOT_ VMWare's problem.

That's why i mentioned CONFIG_COMPAT_VDSO as an example. One major 
distro (SuSE 9.0) came out with that particular glibc version that had a 
bug that depended on a particular and totally unintentional ABI detail 
in the vDSO. As a result we had to do several iterations of 
CONFIG_COMPAT_VDSO to keep backwards compatibility. And glibc is perhaps 
_the_ most kernel-friendly external software project in existence. 
Still, the ABI dependency was there, and we cannot break users who run 
old userspace. The same rule holds here: we cannot break users who run 
an old hypervisor.

Ingo

Re: 2.6.21-rc3-mm1 RSDL results

2007-03-09 Thread Con Kolivas

On Saturday 10 March 2007 05:27, Matt Mackall wrote:
> On Fri, Mar 09, 2007 at 07:39:05PM +1100, Con Kolivas wrote:
> > On Friday 09 March 2007 19:20, Matt Mackall wrote:
> > > And I've just rebooted with NO_HZ and things are greatly improved. At
> > > idle, Beryl effects are silky smooth (possibly better than stock) and
> > > shows less load. Under 'make', Beryl is still responsive as is Galeon.
> > > No sign of lagging mouse or typing.
> > >
> > > Under make -j 5, things are intermittent. Galeon scrolling is
> > > sometimes still responsive, but Beryl, terminals and mouse still drag
> > > quite a bit.
> >
> > I just replied before you sent this one out; I think our messages passed
> > each other across the ocean somewhere. I don't quite get what combination
> > of factors you're saying here caused great improvement. Was it enabling
> > NO_HZ on mainline cpu scheduler or disabling NO_HZ or on RSDL?
>
> Turning on NO_HZ on RSDL greatly improved it. I have not tried NO_HZ
> on mainline. The first test was with NO_HZ=n, the second was with
> NO_HZ=y.

How odd. I would have thought that if an interaction was to occur it would 
have been without the new feature. Clearly what you describe without NO_HZ is 
not the expected behaviour with RSDL. I wonder what went wrong. Are you on 
100HZ on that laptop? While I expect 100HZ should be ok, it might just not 
be... My laptop is about the same performance and works fine with 100HZ under 
load of all sorts BUT I don't have Beryl (which I would have thought swayed 
things in the opposite direction also).

> As an aside, we should not name config options NO_* or DISABLE_*
> because of the potential for double negation.

Case in point,  I couldn't figure out what you were saying :)

-- 
-ck

Re: [PATCH] Fix building kernel under Solaris 11_snv

2007-03-09 Thread Jan Engelhardt


On Mar 9 2007 20:00, Sam Ravnborg wrote:
>On Thu, Mar 08, 2007 at 11:01:57PM +0100, Jan Engelhardt wrote:
>> 
>> Since Solaris seems to be on the run, I tried to compile it myself. 
>> However, unlike the original poster who said he did so on SunOS 4.8, I 
>> did it on 5.11_snv39, yielding a bigger changeset. I thought I'd just 
>> share the diff that piled up so far. It needs a lot of hacks on the 
>> Solaris side - prioritizing GNU names, then, second, gnu ld has a 
>> glitch, then, gcc has a missing file... it's fun fun fun!
>
>Can I please have a signed-off version of this patch.

_Are you sure_ you want all these hacks without further
review from other people? Also note the patch is incomplete,
for example I could not compile the acpi pieces because
acsolaris.h -- which is referenced in the acpi includes --
does not exist. (Yet another piece of software that has
cross-platform compatibility stuff, like XFS.)

>> --- linux-2.6.21-rc3.orig/include/linux/input.h  2007-03-07 
>> 05:41:20.0 +0100
>> +++ linux-2.6.21-rc3/include/linux/input.h   2007-03-07 23:40:39.417339000 
>> +0100
>> @@ -16,7 +16,9 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>> +#ifndef __sun__
>> +#   include 
>> +#endif
>>  #endif

This is not a proper fix for sure. The problem lies in
file2alias.c, see (your own) http://lkml.org/lkml/2007/3/8/339

>> Index: linux-2.6.21-rc3/scripts/genksyms/genksyms.c
>> ===
>> --- linux-2.6.21-rc3.orig/scripts/genksyms/genksyms.c2007-03-07 
>> 05:41:20.0 +0100
>> +++ linux-2.6.21-rc3/scripts/genksyms/genksyms.c 2007-03-07 
>> 23:28:35.659555000 +0100
>> @@ -21,6 +21,7 @@
>> along with this program; if not, write to the Free Software Foundation,
>> Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
>>  
>> +#include 
>>  #include 
>>  #include 
>>  #include 

This one, however, is valid. Can I give sign-offs for single hunks?

>> Index: linux-2.6.21-rc3/scripts/kallsyms.c
>> ===
>> --- linux-2.6.21-rc3.orig/scripts/kallsyms.c 2007-03-07 05:41:20.0 
>> +0100
>> +++ linux-2.6.21-rc3/scripts/kallsyms.c  2007-03-07 23:46:46.249005000 
>> +0100
>> @@ -378,6 +378,40 @@
>>  table_cnt = pos;
>>  }
>>  
>> +#ifdef __sun__
>> +/* Return the first occurrence of NEEDLE in HAYSTACK.  */
>> +void *
>> +memmem (haystack, haystack_len, needle, needle_len)
>> + const void *haystack;
>> + size_t haystack_len;
>> + const void *needle;
>> + size_t needle_len;
>> +{
>> +  const char *begin;
>> +  const char *const last_possible
>> += (const char *) haystack + haystack_len - needle_len;
>> +
>> +  if (needle_len == 0)
>> +/* The first occurrence of the empty string is deemed to occur at
>> +   the beginning of the string.  */
>> +return (void *) haystack;
>> +
>> +  /* Sanity check, otherwise the loop might search through the whole
>> + memory.  */
>> +  if (__builtin_expect (haystack_len < needle_len, 0))
>> +return NULL;
>> +
>> +  for (begin = (const char *) haystack; begin <= last_possible; ++begin)
>> +if (begin[0] == ((const char *) needle)[0] &&
>> +!memcmp ((const void *) &begin[1],
>> + (const void *) ((const char *) needle + 1),
>> + needle_len - 1))
>> +  return (void *) begin;
>> +
>> +  return NULL;
>> +}
>> +#endif
>> +
>>  /* replace a given token in all the valid symbols. Use the sampled symbols
>>   * to update the counts */
>>  static void compress_symbols(unsigned char *str, int idx)

This one, I am just waiting for someone to object to the extra #if-#endif.
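
For reference, the same fallback can be written with an ANSI prototype. This is only a sketch: the name fallback_memmem is hypothetical (chosen so it cannot clash with a libc memmem where one exists), and it indexes into the buffer instead of computing a "last possible" pointer before the length check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the first occurrence of needle in haystack, or NULL if absent.
 * Hypothetical stand-in for memmem() on systems that lack it. */
static void *fallback_memmem(const void *haystack, size_t haystack_len,
			     const void *needle, size_t needle_len)
{
	const unsigned char *h = haystack;
	const unsigned char *n = needle;
	size_t i;

	assert(needle != NULL || needle_len == 0);

	/* The empty needle is deemed to occur at the start of the haystack. */
	if (needle_len == 0)
		return (void *)haystack;

	/* Length check first, so the loop bound below cannot wrap around. */
	if (haystack_len < needle_len)
		return NULL;

	for (i = 0; i <= haystack_len - needle_len; i++)
		if (h[i] == n[0] &&
		    memcmp(h + i + 1, n + 1, needle_len - 1) == 0)
			return (void *)(h + i);

	return NULL;
}
```

Whether this belongs under an #ifdef __sun__ or a more general configure-style check is a separate question from the function body itself.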

>> Index: linux-2.6.21-rc3/scripts/kconfig/Makefile
>> ===
>> --- linux-2.6.21-rc3.orig/scripts/kconfig/Makefile   2007-03-07 
>> 05:41:20.0 +0100
>> +++ linux-2.6.21-rc3/scripts/kconfig/Makefile2007-03-07 
>> 23:21:19.730679000 +0100
>> @@ -88,7 +88,7 @@
>>  HOST_EXTRACFLAGS = $(shell $(CONFIG_SHELL) $(check-lxdialog) -ccflags)
>>  HOST_LOADLIBES   = $(shell $(CONFIG_SHELL) $(check-lxdialog) -ldflags 
>> $(HOSTCC))
>>  
>> -HOST_EXTRACFLAGS += -DLOCALE
>> +HOST_EXTRACFLAGS += -DLOCALE -std=c99 -D__EXTENSIONS__
>>  
>>  PHONY += $(obj)/dochecklxdialog
>>  $(obj)/dochecklxdialog:

The error message for this one was:  only valid in C99 mode.
Linux GCC 4.1.2 does not print that, Solaris GCC 3.4.3 does. I do not
know offhand who is right.

>> Index: linux-2.6.21-rc3/scripts/kconfig/lxdialog/dialog.h
>> ===
>> --- linux-2.6.21-rc3.orig/scripts/kconfig/lxdialog/dialog.h  2007-03-07 
>> 05:41:20.0 +0100
>> +++ linux-2.6.21-rc3/scripts/kconfig/lxdialog/dialog.h   2007-03-07 
>> 23:14:48.462956000 +0100
>> @@ -222,3 +222,7 @@
>>   *   -- uppercase chars are used to invoke the button (M_EVENT + 'O')
>>   */
>>  #de
