Re: [PATCH 0/2] unify DMA_..BIT_MASK definitions: v1

2007-09-17 Thread Muli Ben-Yehuda
On Tue, Sep 18, 2007 at 06:29:19AM +0200, Borislav Petkov wrote:
> These patches remove redundant DMA_..BIT_MASK definitions across two drivers.
> In this version of the patches, the computation of the bitmasks is done by
> the compiler.
> 
> Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
> Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> 
> --
> Index: 23-rc6/include/linux/dma-mapping.h
> ===
> --- 23-rc6/include/linux/dma-mapping.h.orig   2007-09-17 17:48:20.0 +0200
> +++ 23-rc6/include/linux/dma-mapping.h        2007-09-18 06:12:33.0 +0200
> @@ -13,16 +13,19 @@
>   DMA_NONE = 3,
>  };
>  
> -#define DMA_64BIT_MASK   0xffffffffffffffffULL
> -#define DMA_48BIT_MASK   0x0000ffffffffffffULL
> -#define DMA_40BIT_MASK   0x000000ffffffffffULL
> -#define DMA_39BIT_MASK   0x0000007fffffffffULL
> -#define DMA_32BIT_MASK   0x00000000ffffffffULL
> -#define DMA_31BIT_MASK   0x000000007fffffffULL
> -#define DMA_30BIT_MASK   0x000000003fffffffULL
> -#define DMA_29BIT_MASK   0x000000001fffffffULL
> -#define DMA_28BIT_MASK   0x000000000fffffffULL
> -#define DMA_24BIT_MASK   0x0000000000ffffffULL
> +#define DMA_BIT_MASK(n)  ((1ULL<<(n))-1)
> +
> +#define DMA_64BIT_MASK   DMA_BIT_MASK(64)

This one does not do what you mean. You need an explicit mask or a
~0ULL here.
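
To illustrate (a minimal sketch, not part of the posted patch): shifting
a 64-bit value by 64 is undefined in C, so (1ULL<<64)-1 cannot be relied
on to produce all ones. One possible shape special-cases n == 64. The
names DMA_BIT_MASK_SAFE and dma_bit_mask below are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: (1ULL << 64) shifts by the full
 * width of the type, which C leaves undefined, so n == 64 is special-cased.
 */
#define DMA_BIT_MASK_SAFE(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Function form of the same idea, for when n is only known at run time. */
static inline uint64_t dma_bit_mask(unsigned int n)
{
	return (n >= 64) ? ~(uint64_t)0 : (((uint64_t)1 << n) - 1);
}
```

With this shape the 64-bit case yields all ones and every narrower width
still goes through the shift.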

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-09-17 Thread Daniel Phillips
(Reposted for completeness.  Previously rejected by vger due to 
accidental send as html mail.  CC's except for Mike and vger deleted)

On Monday 17 September 2007 20:27, Mike Snitzer wrote:
> To give you context for where I'm coming from; I'm looking to get NBD
> to survive the mke2fs hell I described here:
> http://marc.info/?l=linux-mm=118981112030719=2

The dread blk_congestion_wait is biting you hard.  We're very familiar
with the feeling.  Congestion_wait is basically the traffic cop that
implements the dirty page limit.  I believe it was conceived as a
method of fixing writeout deadlocks, but in our experience it does not
help; in fact, it introduces a new kind of deadlock
(blk_congestion_wait) that is much easier to trigger.  One of the
things we do to get ddsnap running reliably is disable congestion_wait
via the PF_LESS_THROTTLE hack that was introduced to stop local NFS
clients from deadlocking.  NBD will need a similar treatment.

Actually, I hope to show quite soon that dirty page limiting is not
needed at all in order to prevent writeout deadlock.  In which case we
can just get rid of the dirty limits and go back to being able to use
all of non-reserve memory as a write cache, the way things used to be
in the days of yore.

It has been pointed out to me that congestion_wait not only enforces
the dirty limit, it controls the balancing of memory resources between
slow and fast block devices.  The Peterz/Phillips approach to deadlock
prevention does not provide any such balancing and so it seems to me
that congestion_wait is ideally situated in the kernel to provide that
missing functionality.  As I see it, blk_congestion_wait can easily be
modified to balance the _rate_ at which cache memory is dirtied for
various block devices of different speeds.  This should turn out to
be less finicky than balancing the absolute ratios; after all, you can
make a lot of mistakes in rate limiting and still not deadlock so long
as dirty rate doesn't drop to zero and stay there for any block
device.  Gotta be easy, hmm?

Please note: this plan is firmly in the category of speculation until
we have actually tried it and have patches to show, but I thought that
now is about the right time to say something about where we think
this storage robustness work is headed.

> >   - Statically prove bounded memory use of all code in the writeout
> > path.
> >
> >   - Implement any special measures required to be able to make such
> > a proof.
>
> Once the memory requirements of a userspace daemon (e.g. nbd-server)
> are known; should one mlockall() the memory similar to how is done in
> heartbeat daemon's realtime library?

Yes, and also inspect the code to ensure it doesn't violate mlockall
by execing programs (no shell scripts!), dynamically loading
libraries, etc.
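
As a rough sketch of the locking step (assuming a POSIX system; the
helper name lock_daemon_memory is made up for illustration and is not
taken from nbd-server or heartbeat):

```c
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical start-up helper for a daemon such as nbd-server: pin all
 * current and future pages so the writeout path never waits on a page fault.
 * Returns 0 on success, -1 on failure (for example RLIMIT_MEMLOCK too low).
 */
static int lock_daemon_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		perror("mlockall");
		return -1;
	}
	return 0;
}
```

Memory locks are dropped on execve() and are not inherited across
fork(), which is why execing helpers defeats the purpose.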

> Bigger question for me is what kind of hell am I (or others) in for
> to try to cap nbd-server's memory usage?  All those glib-gone-wild
> changes over the recent past feel problematic but I'll look to work
> with Wouter to see if we can get things bounded.

Avoiding glib is a good start.  Look at your library dependencies and
prune them mercilessly.  Just don't use any libraries that you can
code up yourself in a few hundred bytes of program text for the
functionality you need.

> >   - All allocations performed by the block driver must have access
> > to dedicated memory resources.
> >
> >   - Disable the congestion_wait mechanism for our code as much as
> > possible, at least enough to obtain the maximum memory
> > resources that can be used on the writeout path.
>
> Would peter's per bdi dirty page accounting patchset provide this?
> If not, what steps are you taking to disable this mechanism?  I've
> found that nbd-server is frequently locked with 'blk_congestion_wait'
> in its call trace when I hit the deadlock.

See PF_LESS_THROTTLE.   Also notice that this mechanism is somewhat
less than general.  In mainline it only has one user, NFS, and it only
can have one user before you have to fiddle that code to create things
like PF_EVEN_LESS_THROTTLE.

As far as I can see, not having any dirty page limit for normal
allocations is the way to go, it avoids this mess nicely.  Now we just
need to prove that this works ;-)

> > The specific measure we implement in order to prove a bound is:
> >
> >   - Throttle IO on our block device to a known amount of traffic
> > for which we are sure that the MEMALLOC reserve will always be
> > adequate.
>
> I've embraced Evgeniy's bio throttle patch on a 2.6.22.6 kernel
> http://thread.gmane.org/gmane.linux.network/68021/focus=68552
>
> But are you referring to that (as you did below) or is this more a
> reference to peterz's bdi dirty accounting patchset?

No, it's a patch I wrote based on Evgeniy's original, that appeared
quietly later in the thread.  At the time we hadn't tested it and now
we have.  It works fine, it's short, general, efficient and easy to
understand.  So it will get a post of its own 

Re: CFS patch (v6) -- dynamic RT priorities?

2007-09-17 Thread Willy Tarreau
On Mon, Sep 17, 2007 at 02:33:28PM -0600, Chris Rigg wrote:
> Hello,
> 
> I have a system with 2.6.20.7 patched with the v6 CFS patch. I am having 
> issues (I believe) with fairness in regards to my real-time tasks. 
> First, let me describe my setup:

Chris,

CFSv6 is *very* old. It was not that bad, but looking back, it has
improved a lot since. You should upgrade to v20.x first, and it's
likely that most of your problems will vanish.

Regards,
willy



Re: Scheduler benchmarks - a follow-up

2007-09-17 Thread Rob Hussey
On 9/18/07, Willy Tarreau <[EMAIL PROTECTED]> wrote:
> Hi Rob,
>
> On Tue, Sep 18, 2007 at 12:30:05AM -0400, Rob Hussey wrote:
> > I should have pointed out before that I don't really have a dual-core
> > system, just a P4 with Hyper-Threading (I loosely used core to refer
> > to processor).
>
> Just for reference, we call them "siblings", not "cores" on HT. I believe
> that a line "Sibling:" appears in /proc/cpuinfo BTW.

Thanks, I was searching for the right word but couldn't come up with it.


Re: Scheduler benchmarks - a follow-up

2007-09-17 Thread Willy Tarreau
Hi Rob,

On Tue, Sep 18, 2007 at 12:30:05AM -0400, Rob Hussey wrote:
> I should have pointed out before that I don't really have a dual-core
> system, just a P4 with Hyper-Threading (I loosely used core to refer
> to processor).

Just for reference, we call them "siblings", not "cores" on HT. I believe
that a line "Sibling:" appears in /proc/cpuinfo BTW.
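
For anyone who wants to check this programmatically, here is a small
sketch. It assumes the x86 field names "siblings" and "cpu cores"; the
helper names are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/cpuinfo line of the form "key<whitespace>: value" for an
 * integer field such as "siblings" or "cpu cores". Returns 1 and stores the
 * value on a match, 0 otherwise.
 */
static int cpuinfo_int_field(const char *line, const char *key, int *out)
{
	size_t klen = strlen(key);
	const char *p;

	if (strncmp(line, key, klen) != 0)
		return 0;
	p = line + klen;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p != ':')
		return 0;
	return sscanf(p + 1, "%d", out) == 1;
}

/* HT is in use when a package exposes more logical CPUs than cores. */
static int has_ht_siblings(int siblings, int cpu_cores)
{
	return siblings > cpu_cores;
}
```

Feeding it each line read from /proc/cpuinfo and comparing the two
values gives a quick HT check.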

Regards,
Willy



Re: [patch 2/7] Use extended crashkernel command line on i386

2007-09-17 Thread Vivek Goyal
On Thu, Sep 13, 2007 at 06:14:30PM +0200, Bernhard Walle wrote:
> -
>  void arch_crash_save_vmcoreinfo(void)
>  {
>  #ifdef CONFIG_ARCH_DISCONTIGMEM_ENABLE
> --- a/arch/i386/kernel/setup.c
> +++ b/arch/i386/kernel/setup.c
> @@ -381,6 +381,33 @@ extern unsigned long __init setup_memory
>  extern void zone_sizes_init(void);
>  #endif /* !CONFIG_NEED_MULTIPLE_NODES */
> 
> +#ifdef CONFIG_KEXEC
> +static void reserve_crashkernel(void)
> +{
> + unsigned long long  free_mem;
> + unsigned long long  crash_size, crash_base;
> + int ret;
> +
> + free_mem = (max_low_pfn + highend_pfn - highstart_pfn) << PAGE_SHIFT;
> +
> + ret = parse_crashkernel(boot_command_line, free_mem,
> + &crash_size, &crash_base);
> + if (ret == 0 && crash_size > 0 && crash_base > 0) {
> + printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
> + "for crashkernel (System RAM: %ldMB)\n",
> + (unsigned long)(crash_size >> 20),
> + (unsigned long)(crash_base >> 20),
> + (unsigned long)(free_mem >> 20));
> + crashk_res.start = crash_base;
> + crashk_res.end   = crash_base + crash_size - 1;
> + reserve_bootmem(crash_base, crash_size);

Hi Bernhard,

I think we might need to do more here. Because [offset] is optional, one
would assume that things will work even if offset is not specified. But
in this patchset, that's not the case for i386 and x86_64. It will silently
fail if a user does not specify the offset. No memory will be reserved for
capture kernel.

I think we either need to make offset mandatory or put in additional
intelligence here to choose the offset automatically based on the memory
available.
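
To make the failure mode concrete, here is a simplified user-space model
of "crashkernel=size[@offset]". It is only a sketch: the function names
are hypothetical and the real parse_crashkernel() in the patchset handles
more syntax. The last helper mirrors the condition in the quoted hunk.

```c
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix; *end is set past the suffix. */
static unsigned long long parse_mem(const char *s, const char **end)
{
	char *e;
	unsigned long long v = strtoull(s, &e, 0);

	if (*e == 'K' || *e == 'k') { v <<= 10; e++; }
	else if (*e == 'M' || *e == 'm') { v <<= 20; e++; }
	else if (*e == 'G' || *e == 'g') { v <<= 30; e++; }
	*end = e;
	return v;
}

/*
 * Simplified model (hypothetical): *base stays 0 when no "@offset" is given.
 * Returns 0 on success, -1 on a malformed string.
 */
static int parse_crashkernel_simple(const char *arg,
				    unsigned long long *size,
				    unsigned long long *base)
{
	const char *p;

	*size = parse_mem(arg, &p);
	*base = 0;
	if (p == arg)
		return -1;
	if (*p == '@') {
		const char *q;

		*base = parse_mem(p + 1, &q);
		if (q == p + 1)
			return -1;
		p = q;
	}
	return (*p == '\0') ? 0 : -1;
}

/* Same test as the quoted hunk: memory is reserved only if base > 0. */
static int would_reserve(int ret, unsigned long long size,
			 unsigned long long base)
{
	return ret == 0 && size > 0 && base > 0;
}
```

With "64M@16M" the reservation goes ahead; with plain "64M" the parse
succeeds but base is 0, so nothing is reserved and nothing is reported.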

Thanks
Vivek


[PATCH 2/2] unify DMA_..BIT_MASK definitions v1: cleanup drivers/scsi/gdth.c

2007-09-17 Thread Borislav Petkov
Move dma bitmask definitions into the dma-mappings header.

Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

--
Index: 23-rc6/drivers/scsi/gdth.c
===
--- 23-rc6/drivers/scsi/gdth.c.orig 2007-09-17 17:53:26.0 +0200
+++ 23-rc6/drivers/scsi/gdth.c  2007-09-17 17:53:49.0 +0200
@@ -392,12 +392,7 @@
 #include 
 #include 
 #include 
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,6)
 #include 
-#else
-#define DMA_32BIT_MASK 0x00000000ffffffffULL
-#define DMA_64BIT_MASK 0xffffffffffffffffULL
-#endif
 
 #ifdef GDTH_RTC
 #include 

-- 
Regards/Gruß,
Boris.


[PATCH 1/2] unify DMA_..BIT_MASK definitions v1: netxen local defs

2007-09-17 Thread Borislav Petkov
Move dma bitmask definitions into the dma-mappings header.

Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

--
Index: 23-rc6/drivers/net/netxen/netxen_nic_main.c
===
--- 23-rc6/drivers/net/netxen/netxen_nic_main.c.orig    2007-09-17 17:51:46.0 +0200
+++ 23-rc6/drivers/net/netxen/netxen_nic_main.c 2007-09-17 17:51:59.0 +0200
@@ -54,9 +54,6 @@
 #define NETXEN_ADAPTER_UP_MAGIC 777
 #define NETXEN_NIC_PEG_TUNE 0
 
-#define DMA_32BIT_MASK 0x00000000ffffffffULL
-#define DMA_35BIT_MASK 0x00000007ffffffffULL
-
 /* Local functions to NetXen NIC driver */
 static int __devinit netxen_nic_probe(struct pci_dev *pdev,
  const struct pci_device_id *ent);

-- 
Regards/Gruß,
Boris.


Re: Scheduler benchmarks - a follow-up

2007-09-17 Thread Rob Hussey
On 9/17/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > i've meanwhile tested hackbench 90 and the performance difference
> > between -ck and -cfs-devel seems to be mostly down to the more precise
> > (but slower) sched_clock() introduced in v2.6.23 and to the startup
> > penalty of freshly created tasks.
>
> Rob, another thing i just noticed in your .configs: you have
> CONFIG_PREEMPT=y enabled. Would it be possible to get a testrun with
> that disabled? That gives the best throughput and context-switch latency
> numbers. (CONFIG_PREEMPT might also have preemption artifacts - there's
> one report of it having _worse_ desktop latencies on certain hardware
> than !CONFIG_PREEMPT.)

I reverted the patch from before since it didn't seem to help. Do you
think it may have to do with my system having Hyper-Threading enabled?
I should have pointed out before that I don't really have a dual-core
system, just a P4 with Hyper-Threading (I loosely used core to refer
to processor).

Some new numbers for 2.6.23-rc6-cfs-devel (!CONFIG_PREEMPT and bound
to single processor)

lat_ctx:

15  2.73
16  2.74
17  2.81
18  2.74
19  2.74
20  2.73
21  2.60
22  2.74
23  2.72
24  2.74
25  2.74

hackbench:

80 11.578
81 11.991
82 11.914
83 12.026
84 12.226
85 12.347
86 12.552
87 12.655
88 13.011
89 12.941
90 13.237

pipe-test:

1  9.58
2  9.58
3  9.58
4  9.58
5  9.58
6  9.58
7  9.58
8  9.58
9  9.58
10 9.58

The obligatory graphs:
http://www.healthcarelinen.com/misc/benchmarks/BOUND_NOPREEMPT_lat_ctx_benchmark.png
http://www.healthcarelinen.com/misc/benchmarks/BOUND_NOPREEMPT_hackbench_benchmark.png
http://www.healthcarelinen.com/misc/benchmarks/BOUND_NOPREEMPT_pipe-test_benchmark.png

A cursory glance suggests that performance wrt lat_ctx and hackbench
has increased (lower numbers), but degraded quite a lot for pipe-test.
The numbers for pipe-test are extremely stable though, while the
numbers for hackbench are more erratic (which isn't saying much since
the original numbers gave nearly a straight line). I'm still willing
to try out any more ideas.

Regards,
Rob


[PATCH 0/2] unify DMA_..BIT_MASK definitions: v1

2007-09-17 Thread Borislav Petkov
These patches remove redundant DMA_..BIT_MASK definitions across two drivers.
In this version of the patches, the computation of the bitmasks is done by
the compiler.

Signed-off-by: Borislav Petkov <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

--
Index: 23-rc6/include/linux/dma-mapping.h
===
--- 23-rc6/include/linux/dma-mapping.h.orig 2007-09-17 17:48:20.0 +0200
+++ 23-rc6/include/linux/dma-mapping.h  2007-09-18 06:12:33.0 +0200
@@ -13,16 +13,19 @@
DMA_NONE = 3,
 };
 
-#define DMA_64BIT_MASK 0xffffffffffffffffULL
-#define DMA_48BIT_MASK 0x0000ffffffffffffULL
-#define DMA_40BIT_MASK 0x000000ffffffffffULL
-#define DMA_39BIT_MASK 0x0000007fffffffffULL
-#define DMA_32BIT_MASK 0x00000000ffffffffULL
-#define DMA_31BIT_MASK 0x000000007fffffffULL
-#define DMA_30BIT_MASK 0x000000003fffffffULL
-#define DMA_29BIT_MASK 0x000000001fffffffULL
-#define DMA_28BIT_MASK 0x000000000fffffffULL
-#define DMA_24BIT_MASK 0x0000000000ffffffULL
+#define DMA_BIT_MASK(n) ((1ULL<<(n))-1)
+
+#define DMA_64BIT_MASK DMA_BIT_MASK(64)
+#define DMA_48BIT_MASK DMA_BIT_MASK(48)
+#define DMA_40BIT_MASK DMA_BIT_MASK(40)
+#define DMA_39BIT_MASK DMA_BIT_MASK(39)
+#define DMA_35BIT_MASK DMA_BIT_MASK(35)
+#define DMA_32BIT_MASK DMA_BIT_MASK(32)
+#define DMA_31BIT_MASK DMA_BIT_MASK(31)
+#define DMA_30BIT_MASK DMA_BIT_MASK(30)
+#define DMA_29BIT_MASK DMA_BIT_MASK(29)
+#define DMA_28BIT_MASK DMA_BIT_MASK(28)
+#define DMA_24BIT_MASK DMA_BIT_MASK(24)
 
 static inline int valid_dma_direction(int dma_direction)
 {

-- 
Regards/Gruß,
Boris.


Re: My position on general ``RAS'' tool support infrastructure

2007-09-17 Thread Vivek Goyal
On Mon, Sep 17, 2007 at 06:38:53PM -0700, Randy Dunlap wrote:
> On Thu, 13 Sep 2007 07:21:10 -0600 Eric W. Biederman wrote:
> 
> > Pete/Piet Delaney <[EMAIL PROTECTED]> writes:
> > 
> > > Jason, Eric:
> > >
> > > Did you read Keith Owens suggestion on RAS tools from:
> 
> 
> Yes.  and I re-read it.
> 
> There are several things in Keith's email that make sense:
> 
> a.  all RAS tools should use a common interface
> b.  it's not the kernel's job to decide which RAS tool runs first
> 
> 
> Eric makes some good points too.  I'm mostly similar to Eric:
> paranoid about trusting software/hardware after a panic (or oops).
> 
> So if someone wants to use multiple RAS tools on a panic event,
> enabling an admin to set priorities is OK with me, but I'll only
> trust the first one that is used, and even that one may have
> problems.  IOW, I don't see a big need to support multiple RAS
> tools at one time.  (speaking for myself)
> 

It would be nice to have a kernel debugger co-exist with crash dumping.

I like Eric's idea of debugger putting a break point on panic(). This
would mean that rest of the post panic() actions have to be performed
by second kernel which can perform those actions much more reliably.

But this also brings in the additional requirement of passing all the
required context to second kernel. For example, in the past somebody wanted
to send a message to a remote node that the system crashed so that a standby
can take over.  If the same job has to be done in second kernel, it requires
all the relevant information like remote host IP, port etc. passed to the
second kernel, which I think makes the job a little harder. Maybe one can
pre-configure these parameters in user space and let the job be done either
from initrd or user space scripts in second kernel.

Thanks
Vivek


Re: [PATCH mm] fix swapoff breakage; however...

2007-09-17 Thread Balbir Singh
Hugh Dickins wrote:
> On Tue, 18 Sep 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>> What would make sense is (what I meant when I said swap counted
>>> along with RSS) not to count pages out and back in as they are
>>> go out to swap and back in, just keep count of instantiated pages
>>>
>> I am not sure how you define instantiated pages. I suspect that
>> you mean RSS + pages swapped out (swap_pte)?
> 
> That's it.  (Whereas file pages counted out when paged out,
> then counted back in when paged back in.)
> 
>> If a swapoff is going to push a container over it's limit, then
>> we break the container and the isolation it provides.
> 
> Is it just my traditional bias, that makes me prefer you break
> your container than my swapoff?  I'm not sure.
>


:-) Please see my response below

>> Upon swapoff
>> failure, may be we could get the container to print a nice
>> little warning so that anyone else with CAP_SYS_ADMIN can fix the
>> container limit and retry swapoff.
> 
> And then they hit the next one... rather like trying to work out
> the dependencies of packages for oneself: a very tedious process.
> 

Yes, but here's the overall picture of what is happening

1. The system administrator set up a memory container to contain
   a group of applications.
2. The administrator tried to swapoff one/a group of swap files/
   devices
3. Operation 2 failed due to a container being above its limit.
   Which implies that at some point a container went over its
   limit and some of its pages were swapped out

During swapoff, we try to account for pages coming back into the
container. Our charging routine does try to reclaim pages,
which in turn implies it will use another swap device or
reclaim page cache; if both fail, we return -ENOMEM.
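
A toy model of that charging path, as a sketch only: the struct, the
function names and the order of the two reclaim attempts are assumptions
made for illustration, not the actual container code.

```c
#include <errno.h>

/* Hypothetical, simplified container state; all counts are in pages. */
struct mem_container {
	unsigned long usage;      /* pages currently charged */
	unsigned long limit;      /* configured limit */
	unsigned long swappable;  /* pages that could go to another swap device */
	unsigned long page_cache; /* reclaimable page cache pages */
};

/* Try to free one page: first push a page to other swap, then drop cache. */
static int reclaim_one(struct mem_container *c)
{
	if (c->usage == 0)
		return 0;
	if (c->swappable > 0) {
		c->swappable--;
		c->usage--;
		return 1;
	}
	if (c->page_cache > 0) {
		c->page_cache--;
		c->usage--;
		return 1;
	}
	return 0;
}

/* Charge one page coming back in; -ENOMEM if nothing can be reclaimed. */
static int charge_page(struct mem_container *c)
{
	while (c->usage >= c->limit) {
		if (!reclaim_one(c))
			return -ENOMEM;
	}
	c->usage++;
	return 0;
}
```

In this model swapoff only fails for a container once it is at its limit
and has neither other swap nor page cache left to give up.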

Given that the system administrator has set up the container and
the swap devices, I feel that he is in better control of what
to do with the system when swapoff fails.

In the future we plan to implement per container swap (a feature
desired by several people), assuming that administrators use
per container swap in the future, failing on limit sounds
like the right way to go forward.

> If the swapoff succeeds, that does mean there was actually room
> in memory (+ other swap) for everyone, even if some have gone over
> their nominal limits.  (But if the swapoff runs out of memory in
> the middle, yes, it might well have assigned the memory unfairly.)
> 

Yes, precisely my point: the administrator is the best person
to decide how to assign memory to containers. Would it help
to add a container tunable that says it's ok to go overlimit
with this container during a swapoff?

> The appropriate answer may depend on what you do when a container
> tries to fault in one more page than its limit.  Apparently just
> fail it (no attempt to page out another page from that container).
> 

The problem with that approach is that applications will fail
in the middle of their task. They will never get a chance
to run at all, they will always get killed in the middle.
We want to be able to reclaim pages from the container and
let the application continue.

> So, if the whole system is under memory pressure, kswapd will
> be keeping the RSS of all tasks low, and they won't reach their
> limits; whereas if the system is not under memory pressure,
> tasks will easily approach their limits and so fail.
> 

Tasks failing on limit does not sound good unless we are out
of all backup memory (slow storage). We still let the application
run, although slowly.


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: PROBLEM: Network sky2 Module

2007-09-17 Thread ben soo
i'm experiencing this problem myself.  i have 2 servers, one using 
X86_64 kernel version 2.6.23-rc5 on a 100Mbit network and one with 
i386 kernel version 2.6.23-rc6 on a 1Gbit network.


They both have this issue with the sky2 network device driver 
whereby the device would stop working and need to be brought down 
and back up.


On the X86_64 kernel on a 100Mbit network, this is a very 
occasional thing, while on the i386 kernel on a 1Gbit network the 
device only works for a few minutes at a time.  If i set the MTU to 
7200 then the device seems to stay functional, but then i see long 
delays when it's talking to 100Mbit devices with standard 1500 MTU 
that are outside of its LAN segment.


This last might be an artifact caused by the firewall, i dunno.

b

Werner Meurer wrote:
[1.] One line summary of the problem:   
hw csum failure appears in syslog


[2.] Full description of the problem/report:

hw csum failure appears in syslog and sometimes, under heavy network 
utilization, with NFS-Daemon the Network Device totally fails. Then no 
Network Access is possible. Reboot is not required but i must restart 
the Network with the following commands: "ifdown eth0" and "rmmod sky2", 
then "insmod sky2" and "ifup eth0".


[3.] Keywords (i.e., modules, networking, kernel):

sky2 (Marvell Yukon Onboard Ethernet), networking, checksum

[4.] Kernel version (from /proc/version):

Linux version 2.6.18.8 ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 
(prerelease) (SUSE Linux)) #6 SMP Sun Aug 5 15:09:57 CEST 2007


[5.] Output of Oops.. message (if applicable) with symbolic information 
resolved (see Documentation/oops-tracing.txt)


endeavour:~ # dmesg -c
: hw csum failure.

Call Trace:
[] __skb_checksum_complete+0x4a/0x62
[] udp_poll+0x67/0xf3
[] do_select+0x285/0x46d
[] __pollwait+0x0/0xe0
[] default_wake_function+0x0/0xe
[] _spin_lock_bh+0x9/0x14
[] release_sock+0x13/0xaa
[] udp_sendmsg+0x480/0x563
[] sock_sendmsg+0xf3/0x110
[] sock_recvmsg+0x101/0x120
[] autoremove_wake_function+0x0/0x2e
[] sock_aio_write+0x51/0x60
[] try_to_wake_up+0x3e2/0x3f3
[] sys_select+0x26f/0x3d6
[] sys_sendto+0x119/0x14c
[] system_call+0x7e/0x83


[6.] A small shell script or example program which triggers the
problem (if possible)

-

[7.] Environment

Homebrew Server (Asus P5WDG2-WS Motherboard).

[7.1.] Software (add the output of the ver_linux script here)

Linux endeavour 2.6.18.8 #6 SMP Sun Aug 5 15:09:57 CEST 2007 x86_64 x86_64 x86_64 GNU/Linux


Gnu C                  4.1.2
Gnu make               3.81
binutils               2.17.50.0.5
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
jfsutils               1.1.11
reiserfsprogs          3.6.19
xfsprogs               2.8.11
PPP                    2.4.4
Linux C Library        > libc.2.5
Dynamic linker (ldd)   2.5
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.4
udev                   103
Modules Loaded         capability commoncap sky2 nfs lockd nfs_acl sunrpc
iptable_filter ip_tables x_tables nls_utf8 af_packet w83627ehf hwmon
i2c_isa eeprom i2c_dev joydev st sd_mod ide_cd truecrypt nls_iso8859_1
nls_cp437 vfat fat vmnet vmmon ipv6 button battery ac sg sr_mod cdrom
loop usb_storage scsi_mod dm_mod i2c_i801 shpchp pci_hotplug ehci_hcd
uhci_hcd i2c_core usbcore floppy parport_pc lp parport ext3 mbcache jbd
edd fan piix thermal processor ide_disk ide_core



[7.2.] Processor information (from /proc/cpuinfo):

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 6
model name  :   Intel(R) Pentium(R) D CPU 3.60GHz
stepping: 4
cpu MHz : 3612.869
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 6
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall 
lm constant_tsc pni monitor ds_cpl vmx est cid cx16 xtpr lahf_lm

bogomips: 7231.81
clflush size: 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual

[7.3.] Module information (from /proc/modules):

capability 22536 0 - Live 0x88435000
commoncap 25472 1 capability, Live 0x8842d000
sky2 61444 0 - Live 0x8811b000
nfs 275768 0 - Live 0x883e8000
lockd 86800 1 nfs, Live 0x883d1000
nfs_acl 20608 1 nfs, Live 0x883ca000
sunrpc 193224 3 nfs,lockd,nfs_acl, Live 0x88399000
iptable_filter 20096 0 - Live 0x88393000
ip_tables 39656 1 

Re: Fwd: Intel DQ35JOE Mainboard 82566DM-2 Gigabit Network

2007-09-17 Thread Kok, Auke
John Duthie wrote:
> I'm having a few Problems with a NEW PC
> 
> Spec is:
> Intel DQ35JOE Mainboard
> 
> The Integrated NIC is not found by kernel 2.6.23-rc6 or  2.6.22.1
> Am I missing an option in there ??

support for that nic has not yet been released as a -rc or stable kernel release

> The Intel Drivers (e1000-7.6.5)  don't compile against 2.6.23-rc6 or 2.6.22.1
> /usr/src/intel/e1000- 7.6.5/src/e1000_ethtool.c:2109: error:
> 'ethtool_op_get_perm_addr' undeclared here (not in a function)
> ( I know, wrong place to report this .. )

deleting that 1 line should fix the compile issue

> ( also SATA dvd writer does not seem to write yet )
> 
> If anyone has Patches to try I'm Currently able and willing to test
> them on this hardware config!
> 
> see attached stuff
> mail me for more info if required !

A new driver will be available in 2 weeks, or you can use any of the git trees
that have the "e1000e" driver which will support this device (davem/net-2.6.24
or jgarzik/[EMAIL PROTECTED])

Cheers,

Auke


Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-09-17 Thread Mike Snitzer
On 9/17/07, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Friday 07 September 2007 22:12, Mike Snitzer wrote:
> > Can you be specific about which changes to existing mainline code
> > were needed to make recursive reclaim "work" in your tests (albeit
> > less ideally than peterz's patchset in your view)?
>
> Sorry, I was incommunicado out on the high seas all last week.  OK, the
> measures that actually prevent our ddsnap driver from deadlocking are:

Hope you enjoyed yourself.  First off, as always thanks for the
extremely insightful reply.

To give you context for where I'm coming from; I'm looking to get NBD
to survive the mke2fs hell I described here:
http://marc.info/?l=linux-mm=118981112030719=2

>   - Statically prove bounded memory use of all code in the writeout
> path.
>
>   - Implement any special measures required to be able to make such a
> proof.

Once the memory requirements of a userspace daemon (e.g. nbd-server)
are known; should one mlockall() the memory similar to how is done in
heartbeat daemon's realtime library?

Bigger question for me is what kind of hell am I (or others) in for to
try to cap nbd-server's memory usage?  All those glib-gone-wild
changes over the recent past feel problematic but I'll look to work
with Wouter to see if we can get things bounded.

>   - All allocations performed by the block driver must have access
> to dedicated memory resources.
>
>   - Disable the congestion_wait mechanism for our code as much as
> possible, at least enough to obtain the maximum memory resources
> that can be used on the writeout path.

Would peter's per bdi dirty page accounting patchset provide this?  If
not, what steps are you taking to disable this mechanism?  I've found
that nbd-server is frequently locked with 'blk_congestion_wait' in its
call trace when I hit the deadlock.

> The specific measure we implement in order to prove a bound is:
>
>   - Throttle IO on our block device to a known amount of traffic for
> which we are sure that the MEMALLOC reserve will always be
> adequate.

I've embraced Evgeniy's bio throttle patch on a 2.6.22.6 kernel
http://thread.gmane.org/gmane.linux.network/68021/focus=68552

But are you referring to that (as you did below) or is this more a
reference to peterz's bdi dirty accounting patchset?

> Note that the boundedness proof we use is somewhat loose at the moment.
> It goes something like "we only need at most X kilobytes of reserve and
> there are X megabytes available".  Much of Peter's patch set is aimed
> at getting more precise about this, but to be sure, handwaving just
> like this has been part of core kernel since day one without too many
> ill effects.
>
> The way we provide guaranteed access to memory resources is:
>
>   - Run critical daemons in PF_MEMALLOC mode, including
> any userspace daemons that must execute in the block IO path
>(cluster coders take note!)

I've been using Avi Kivity's patch from some time ago:
http://lkml.org/lkml/2004/7/26/68

to get nbd-server to run in PF_MEMALLOC mode (could've just used
the _POSIX_PRIORITY_SCHEDULING hack instead right?)... it didn't help
on its own; I likely didn't have enough of the stars aligned to see my
MD+NBD mke2fs test not deadlock.

> Right now, all writeout submitted to ddsnap gets handed off to a daemon
> running in PF_MEMALLOC mode.  This is a needless inefficiency that we
> want to remove in future, and handle as many of those submissions as
> possible entirely in the context of the submitter.  To do this, further
> measures are needed:
>
>   - Network writes performed by the block driver must have access to
> dedicated memory resources.

I assume peterz's network deadlock avoidance patchset (or some subset
of it) has you covered here?

> We have not yet managed to trigger network read memory deadlock, but it
> is just a matter of time, additional fancy virtual block devices, and
> enough stress.  So:
>
>   - Network reads need some fancy extra support because dedicated
> memory resources must be consumed before knowing whether the
> network traffic belongs to a block device or not.
>
> Now, the interesting thing about this whole discussion is, none of the
> measures that we are actually using at the moment are implemented in
> either Peter's or Christoph's patch set.  In other words, at present we
> do not require either patch set in order to run under heavy load
> without deadlocking.  But in order to generalize our solution to a
> wider range of virtual block devices and other problematic systems such
> as userspace filesystems, we need to incorporate a number of elements
> of Peter's patch set.
>
> As far as Christoph's proposal goes, it is not required to prevent
> deadlocks.   Whether or not it is a good optimization is an open
> question.

OK, yes I've included Christoph's recursive reclaim patch and didn't
have any luck either.  Good to know that patch isn't _really_ going to
help me.

> Of all the patches posted 

Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet

2007-09-17 Thread lepton
Hi,
  Sorry for my error.
  The problem is that the current icmp_reply and ip_send_reply will
send out packets with the wrong destination address, not the wrong
source address.
  My point is that we should always use the source address of the packets
we received as the destination address of our reply packets.

On Mon, Sep 17, 2007 at 08:14:56PM -0700, [EMAIL PROTECTED] wrote:
> On Tue, 18 Sep 2007, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> 
> >In article <[EMAIL PROTECTED]> (at Mon, 17 Sep 
> >2007 19:20:44 -0700 (PDT)), David Miller <[EMAIL PROTECTED]> says:
> >
> >>From: lepton <[EMAIL PROTECTED]>
> >>Date: Tue, 18 Sep 2007 10:16:17 +0800
> >>
> >>>Hi,
> >>>  In some situation, icmp_reply and ip_send_reply will send
> >>>  out packet with the wrong source addr, the following patch
> >>>  will fix this.
> >>>
> >>>  I don't understand why we must use rt->rt_src in the current
> >>>  code, if this is a wrong fix, please correct me.
> >>>
> >>>Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
> >>
> >>That the address is wrong is your opinion only :-)
> >>
> >>Source address selection is a rather complex topic, and
> >>here we are definitely purposefully using the source
> >>address selected by the routing lookup for the reply.
> >
> >And, if you do think something is "wrong", you need to describe it
> >in detail, at least.
> 
> I missed the beginning of the discussion, so apologies if I'm way off 
> base.
> 
> it sounds like the question is, when a packet hits the box that causes a 
> icmp_reply (or other packet) to be generated, which IP address should be 
> used as the source
> 
> 1. the destination address of the packet that generated the message
> 
> or.
> 
> 2. the IP address that the machine would use by default if the machine 
> were to generate a new connection to the destination.
> 
> I understand that in many cases the historical approach has been #2, but 
> as more machines get multiple IP addresses on each interface, I believe 
> that it's less of a surprise to other systems if the default is #1. most 
> of the time the other systems don't care (and usually don't want to 
> know) if the service they are contacting is on a dedicated machine or is 
> just one IP among many sharing a box.
> 
> it gets especially bad when you have load balancing going on and the 
> results could come from multiple boxes.
> 
> yes, sysadmins deal with this today, but it's a pain to do so and is a 
> continuing dribble of surprises when things don't quite work the way you 
> expect them to as you consolidate things onto more powerful systems (or 
> distribute them among multiple systems).
> 
> if the packet got to the machine and the machine is accepting it, replying 
> back from the destination IP of that packet should be legitimate (it's 
> what you would do if there was a full connection after all) and greatly 
> reduces the cases where things change.
> 
> David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet

2007-09-17 Thread david

On Tue, 18 Sep 2007, YOSHIFUJI Hideaki / 吉藤英明 wrote:

> In article <[EMAIL PROTECTED]> (at Mon, 17 Sep 2007 19:20:44 -0700 (PDT)),
> David Miller <[EMAIL PROTECTED]> says:
>
>> From: lepton <[EMAIL PROTECTED]>
>> Date: Tue, 18 Sep 2007 10:16:17 +0800
>>
>>> Hi,
>>>   In some situation, icmp_reply and ip_send_reply will send
>>>   out packet with the wrong source addr, the following patch
>>>   will fix this.
>>>
>>>   I don't understand why we must use rt->rt_src in the current
>>>   code, if this is a wrong fix, please correct me.
>>>
>>> Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
>>
>> That the address is wrong is your opinion only :-)
>>
>> Source address selection is a rather complex topic, and
>> here we are definitely purposefully using the source
>> address selected by the routing lookup for the reply.
>
> And, if you do think something is "wrong", you need to describe it
> in detail, at least.

I missed the beginning of the discussion, so apologies if I'm way off 
base.


it sounds like the question is, when a packet hits the box that causes a 
icmp_reply (or other packet) to be generated, which IP address should be 
used as the source


1. the destination address of the packet that generated the message

or.

2. the IP address that the machine would use by default if the machine 
were to generate a new connection to the destination.


I understand that in many cases the historical approach has been #2, but 
as more machines get multiple IP addresses on each interface, I believe 
that it's less of a surprise to other systems if the default is #1. most 
of the time the other systems don't care (and usually don't want to 
know) if the service they are contacting is on a dedicated machine or is 
just one IP among many sharing a box.


it gets especially bad when you have load balancing going on and the 
results could come from multiple boxes.


yes, sysadmins deal with this today, but it's a pain to do so and is a 
continuing dribble of surprises when things don't quite work the way you 
expect them to as you consolidate things onto more powerful systems (or 
distribute them among multiple systems).


if the packet got to the machine and the machine is accepting it, replying 
back from the destination IP of that packet should be legitimate (it's 
what you would do if there was a full connection after all) and greatly 
reduces the cases where things change.


David Lang


Re: [PATCH] CONFIG_ZONE_MOVABLE [2/2] config zone movable

2007-09-17 Thread KAMEZAWA Hiroyuki
On Mon, 17 Sep 2007 19:47:48 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Fri, 31 Aug 2007 19:14:15 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> 
> wrote:
> 
> > Makes ZONE_MOVABLE as configurable
> > 
> > Based on "zone_ifdef_cleanup_by_renumbering.patch"
> > 
> 
> This patch causes my old dual-pIII machine to instantly reboot: 0.01 
> seconds
> uptime.
> 
> http://userweb.kernel.org/~akpm/config-vmm.txt

Ok, will find problem.

Thanks,
-Kame



Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet

2007-09-17 Thread lepton
Hi,
  Sorry for my previous email.
  What I mean is that icmp_reply and ip_send_reply
will in some situations send out packets with the wrong
DESTINATION address.  The source address is always
correct.


Re: [RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-17 Thread Huang, Ying
On Mon, 2007-09-17 at 18:48 -0700, H. Peter Anvin wrote:
> Huang, Ying wrote:
> > 
> > OK, I will check the actual structure, and change the document
> > accordingly.
> > 
> 
> The best would probably be to fix zero-page.txt (and probably rename it
> something saner.)

Does the patch appended to this mail seem better?

If desired, I can move the zero page description into
zero-page.txt, and refer to it in the 32-bit boot protocol description.

I deleted hd0_info and hd1_info from the zero page. If that is
undesired, I will put them back.

The fields in the zero page are fairly complex (such as struct edd_info).
Do you think it is necessary to document every field inside the first-level
fields, down to the primitive data types? Or should we just provide the
C struct name?

Best Regards,
Huang Ying

---

Index: linux-2.6.23-rc4/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc4.orig/Documentation/i386/boot.txt   2007-09-18 10:40:34.0 +0800
+++ linux-2.6.23-rc4/Documentation/i386/boot.txt        2007-09-18 10:46:13.0 +0800
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin <[EMAIL PROTECTED]>
-   Last update 2007-05-23
+   Last update 2007-09-14
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
the boot command line
 
+Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical
+   pointer to single linked list of struct setup_data.
+   Added 32-bit boot protocol.
 
  MEMORY LAYOUT
 
@@ -168,6 +171,9 @@
 0234/1 2.05+   relocatable_kernel Whether kernel is relocatable or not
 0235/3 N/A pad2Unused
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
+023c/4 N/A pad3Unused
+0240/8 2.07+   setup_data  64-bit physical pointer to linked list
+   of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -480,6 +486,36 @@
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:setup_data
+Type:  write (obligatory)
+Offset/size:   0x240/8
+Protocol:  2.07+
+
+  The 64-bit physical pointer to NULL terminated single linked list of
+  struct setup_data. This is used to define a more extensible boot
+  parameters passing mechanism. The definition of struct setup_data is
+  as follow:
+
+  struct setup_data {
+ u64 next;
+ u32 type;
+ u32 len;
+ u8  data[0];
+  } __attribute__((packed));
+
+  Where, the next is a 64-bit physical pointer to the next node of
+  linked list, the next field of the last node is 0; the type is used
+  to identify the contents of data; the len is the length of data
+  field; the data holds the real payload.
+
+  With this field, to add a new boot parameter written by bootloader,
+  it is not needed to add a new field to real mode header, just add a
+  new setup_data type is sufficient. But to add a new boot parameter
+  read by bootloader, it is still needed to add a new field.
+
+  TODO: Where is the safe place to place the linked list of struct
+   setup_data?
+
 
  THE KERNEL COMMAND LINE
 
@@ -753,3 +789,57 @@
After completing your hook, you should jump to the address
that was in this field before your boot loader overwrote it
(relocated, if appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machine with some new BIOS other than legacy BIOS, such as EFI,
+LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in kernel
+based on legacy BIOS can not be used, so a 32-bit boot protocol need
+to be defined.
+
+In 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. But, it is not necessary to load all
+real-mode code, just first 4K bytes traditionally known as "zero page"
+is needed.
+
+In addition to read/modify/write kernel header of the zero page as
+that of 16-bit boot protocol, the boot loader should fill the
+following additional fields of the zero page too.
+
+Offset   Proto   Name            Meaning
+/Size
+
+000/040  2.07+   screen_info     Text mode or frame buffer information
+                                 (struct screen_info)
+040/014  2.07+   apm_bios_info   APM BIOS information (struct apm_bios_info)
+060/010  2.07+   ist_info        Intel SpeedStep (IST) BIOS support information
+                                 (struct ist_info)
+0A0/010  2.07+   sys_desc_table  System description table (struct sys_desc_table)
+140/080  2.07+   edid_info
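
To make the setup_data description in the patch above concrete, here is a
small sketch of how a consumer might walk the list looking for one type.
The struct layout follows the proposed boot.txt text (with data[] in place
of data[0]); setup_data_find is a hypothetical helper, and the sketch
assumes the 64-bit "physical pointers" can be dereferenced directly, which
only holds under an identity mapping or in a userspace model like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as given in the proposed boot.txt text. */
struct setup_data {
	uint64_t next;   /* address of the next node, 0 terminates the list */
	uint32_t type;   /* identifies the contents of data */
	uint32_t len;    /* length of data in bytes */
	uint8_t  data[]; /* payload */
} __attribute__((packed));

/*
 * Hypothetical helper: return the first node of the given type, or NULL.
 * Assumption: addresses in the list are directly dereferenceable.
 */
static const struct setup_data *setup_data_find(uint64_t head, uint32_t type)
{
	uint64_t pa = head;

	while (pa) {
		const struct setup_data *sd =
			(const struct setup_data *)(uintptr_t)pa;

		if (sd->type == type)
			return sd;
		pa = sd->next;
	}
	return NULL;
}
```

A new boot parameter then only needs a new type value; the walk itself
does not change.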

Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet

2007-09-17 Thread lepton
Hi,
  Sorry for the lack of details.
  Let's think about ip_send_reply. It is only called
by tcp_v4_send_ack and tcp_v4_reset. I don't know why
we need a source address different from ip_hdr(skb)->saddr.
  icmp_reply is only called by icmp_echo and icmp_timestamp.
Is there a situation where we need to use a source address different
from ip_hdr(skb)->saddr?

  My situation is:
  I DNAT some TCP packets to my box. Sometimes the box will
reply with a reset or ack packet via tcp_v4_send_ack and tcp_v4_reset.
When this happens, it will use rt->rt_src instead of
ip_hdr(skb)->saddr, and the packet is sent out without its source
address being changed, because netfilter doesn't know these packets
belong to the DNATed connection.

  Another person's situation is (quoted from an email to me):
 
 While conducting a research about networking, I discovered
 improper handling of ICMP echo reply messages in Linux 2.4.26.  I
 looked into the code and noticed that the icmp_reply function sets the
 destination address in the reply packet to rt->rt_src.  This produces
 strange results in some cases as can be easily shown with hping and
 tcpdump.  Here is an example (NOTE: eth0 address is set to
 10.10.10.1/24):

  # tcpdump -n -i any icmp &
 
  [1] 16842
  tcpdump: WARNING: Promiscuous mode not supported on the "any" device
  tcpdump: verbose output suppressed, use -v or -vv for full protocol
  decode
  listening on any, link-type LINUX_SLL (Linux cooked), capture size 96
  bytes
 
  # hping2 --icmp --spoof 10.10.10.3 10.10.10.1
 
  HPING 10.10.10.1 (eth0 10.10.10.1): icmp mode set, 28 headers + 0
  data bytes
  02:16:53.206016 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq
  0
  02:16:53.206082 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 0
  02:16:54.202123 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq
 
  If ICMP echo requests with a spoofed source address are sent to the
  address of our eth0 interface (which of course happens through the
  loopback interface), the code of icmp_reply sets the destination
  address in the reply to 10.10.10.1 instead of simply reversing the
  source and destination addresses as required by the RFC.
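
The difference between the two behaviours can be modelled in a few lines.
This is only a sketch with hypothetical names (struct addr_pair,
reply_reversed, reply_routed), not kernel code: reply_reversed swaps the
addresses of the received packet, while reply_routed takes its destination
from a separately supplied value standing in for rt->rt_src. For
simplicity the routed variant keeps the swapped source address, which is
an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the address fields of an IPv4 header. */
struct addr_pair {
	uint32_t saddr;
	uint32_t daddr;
};

/* Build a host-order IPv4 address from its dotted-quad parts. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Reply by swapping the addresses of the received packet. */
static struct addr_pair reply_reversed(struct addr_pair req)
{
	struct addr_pair rep = { .saddr = req.daddr, .daddr = req.saddr };
	return rep;
}

/* Reply to an address chosen elsewhere (standing in for rt->rt_src). */
static struct addr_pair reply_routed(struct addr_pair req, uint32_t rt_src)
{
	struct addr_pair rep = { .saddr = req.daddr, .daddr = rt_src };
	return rep;
}
```

For the hping example, the request is 10.10.10.3 > 10.10.10.1. The
reversed reply goes to 10.10.10.3, whereas a routed value of 10.10.10.1
gives the 10.10.10.1 > 10.10.10.1 reply seen in the capture.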

On Tue, Sep 18, 2007 at 11:26:44AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article <[EMAIL PROTECTED]> (at Mon, 17 Sep 2007 19:20:44 -0700 (PDT)), 
> David Miller <[EMAIL PROTECTED]> says:
> 
> > From: lepton <[EMAIL PROTECTED]>
> > Date: Tue, 18 Sep 2007 10:16:17 +0800
> > 
> > > Hi,
> > >   In some situation, icmp_reply and ip_send_reply will send
> > >   out packet with the wrong source addr, the following patch
> > >   will fix this.
> > > 
> > >   I don't understand why we must use rt->rt_src in the current
> > >   code, if this is a wrong fix, please correct me.
> > > 
> > > Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
> > 
> > That the address is wrong is your opinion only :-)
> > 
> > Source address selection is a rather complex topic, and
> > here we are definitely purposefully using the source
> > address selected by the routing lookup for the reply.
> 
> And, if you do think something is "wrong", you need to describe it
> in detail, at least.
> 
> --yoshfuji


Re: [PATCH] CONFIG_ZONE_MOVABLE [2/2] config zone movable

2007-09-17 Thread Andrew Morton
On Fri, 31 Aug 2007 19:14:15 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> Makes ZONE_MOVABLE as configurable
> 
> Based on "zone_ifdef_cleanup_by_renumbering.patch"
> 

This patch causes my old dual-pIII machine to instantly reboot: 0.01 seconds
uptime.

http://userweb.kernel.org/~akpm/config-vmm.txt


Re: [PATCH] powerpc: Avoid pointless WARN_ON(irqs_disabled()) from panic codepath

2007-09-17 Thread Josh Boyer
On Mon, 17 Sep 2007 18:37:49 -0700
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> On Tue, 18 Sep 2007 05:13:40 +0530 (IST) Satyam Sharma wrote:
> 
> > Untested (not even compile-tested) patch.
> > Could someone point me to ppc32/64 cross-compilers for i386?
> 
> OSDL had some, but those are gone now.
> I downloaded all of them and still use them, although it would
> be good to have some more recent versions of them.
> 
> I put the power* compiler tarballs here:
> 
> http://userweb.kernel.org/~rdunlap/cross-compilers/

Crosstool is widely used.  It'll build several combinations of
gcc/binutils/glibc for you.  

http://www.kegel.com/crosstool/

There's also the ELDK from Denx:  

http://www.denx.de/en/view/Software/WebHome#Embedded_Linux_Development_Kit

josh


Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet

2007-09-17 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Mon, 17 Sep 2007 19:20:44 -0700 (PDT)), 
David Miller <[EMAIL PROTECTED]> says:

> From: lepton <[EMAIL PROTECTED]>
> Date: Tue, 18 Sep 2007 10:16:17 +0800
> 
> > Hi,
> >   In some situation, icmp_reply and ip_send_reply will send
> >   out packet with the wrong source addr, the following patch
> >   will fix this.
> > 
> >   I don't understand why we must use rt->rt_src in the current
> >   code, if this is a wrong fix, please correct me.
> > 
> > Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>
> 
> That the address is wrong is your opinion only :-)
> 
> Source address selection is a rather complex topic, and
> here we are definitely purposefully using the source
> address selected by the routing lookup for the reply.

And, if you do think something is "wrong", you need to describe it
in detail, at least.

--yoshfuji


Re: [PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet

2007-09-17 Thread David Miller
From: lepton <[EMAIL PROTECTED]>
Date: Tue, 18 Sep 2007 10:16:17 +0800

> Hi,
>   In some situation, icmp_reply and ip_send_reply will send
>   out packet with the wrong source addr, the following patch
>   will fix this.
> 
>   I don't understand why we must use rt->rt_src in the current
>   code, if this is a wrong fix, please correct me.
> 
> Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

That the address is wrong is your opinion only :-)

Source address selection is a rather complex topic, and
here we are definitely purposefully using the source
address selected by the routing lookup for the reply.


[PATCH] 2.6.22.6 NETWORKING [IPV4]: Always use source addr in skb to reply packet

2007-09-17 Thread lepton
Hi,
  In some situation, icmp_reply and ip_send_reply will send
  out packet with the wrong source addr, the following patch
  will fix this.

  I don't understand why we must use rt->rt_src in the current
  code, if this is a wrong fix, please correct me.

Signed-off-by: Lepton Wu <[EMAIL PROTECTED]>

diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/icmp.c linux-2.6.22.6-lepton/net/ipv4/icmp.c
--- linux-2.6.22.6/net/ipv4/icmp.c  2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/icmp.c   2007-09-18 09:57:30.0 +0800
@@ -382,6 +382,7 @@ static void icmp_reply(struct icmp_bxm *
struct ipcm_cookie ipc;
struct rtable *rt = (struct rtable *)skb->dst;
__be32 daddr;
+   struct iphdr *ip = ip_hdr(skb);
 
if (ip_options_echo(&icmp_param->replyopts, skb))
return;
@@ -393,7 +394,7 @@ static void icmp_reply(struct icmp_bxm *
icmp_out_count(icmp_param->data.icmph.type);
 
inet->tos = ip_hdr(skb)->tos;
-   daddr = ipc.addr = rt->rt_src;
+   daddr = ipc.addr = ip->saddr;
ipc.opt = NULL;
if (icmp_param->replyopts.optlen) {
ipc.opt = &icmp_param->replyopts;
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/ip_output.c linux-2.6.22.6-lepton/net/ipv4/ip_output.c
--- linux-2.6.22.6/net/ipv4/ip_output.c 2007-09-14 17:41:18.0 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/ip_output.c  2007-09-18 09:57:13.0 +0800
@@ -1337,11 +1337,12 @@ void ip_send_reply(struct sock *sk, stru
struct ipcm_cookie ipc;
__be32 daddr;
struct rtable *rt = (struct rtable*)skb->dst;
+   struct iphdr *ip = ip_hdr(skb);
 
if (ip_options_echo(&replyopts.opt, skb))
return;
 
-   daddr = ipc.addr = rt->rt_src;
+   daddr = ipc.addr = ip->saddr;
ipc.opt = NULL;
 
if (replyopts.opt.optlen) {


Re: [BUG:] forcedeth: MCP55 not allowing DHCP

2007-09-17 Thread Casey Dahlin

Casey Dahlin wrote:
> I have an Asus Striker Extreme motherboard with two built in MCP55
> GigE interfaces. When I build with the original Fedora 7 release
> kernel (
> ftp://ftp.belnet.be/linux/fedora/linux/releases/7/Fedora/i386/os/Fedora/kernel-2.6.21-1.3194.fc7.i686.rpm
> ) everything works fine. However, when I boot with any updated kernels
> or any other kernel (have tried building from several points in the
> linus git tree between 2.6.20 and .23-rc3, and 2.6.21.2 in -stable) I
> cannot get an IP address via dhcp. There is no error in dmesg. The
> card shows a link and otherwise appears to be working, but it is as if
> the dhcp server has been removed from the network.
>
> On a running system there is no indication that this is a kernel bug
> at all, however by varying only the kernel the bug appears and
> disappears. I've run all these tests repeatedly with no intervening
> updates of any other packages.
>
> As I said I attempted to build 2.6.21.2 ( the point of divergence
> between the Fedora kernel in question and -stable ) and still the card
> did not work. I will next attempt to manually build the rpm for the
> release kernel. If this works I will try experimenting with the
> included patches to narrow it down, but at this point I'm at a
> complete loss.
>
> -Casey Dahlin

Is there any feedback to be had on this? I've gotten no reply whatsoever 
from several sources now.



Re: [RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-17 Thread H. Peter Anvin
Huang, Ying wrote:
> 
> OK, I will check the actual structure, and change the document
> accordingly.
> 

The best would probably be to fix zero-page.txt (and probably rename it
something saner.)

-hpa


Re: 2.6.20 (XFS? related) crash after uptime of > 180 days during apt-get dist-upgrade on Debian Testing

2007-09-17 Thread David Chinner
On Mon, Sep 17, 2007 at 01:20:17PM -0400, Justin Piszcz wrote:
> Including the XFS mailing list in here too because it may be an XFS bug 
> looking at the call trace.
> 
> System: Debian Testing
> Kernel: 2.6.20
> Config: Attached
> 
> I was running apt-get dist-upgrade as I always do to get the latest 
> packages upgraded and the kernel OOPS'd when it was upgrading 'tzdata' and 
> the process went into D-state and I had to reboot.
> 
> The config file is from 2.6.20 but it had been moved to a 2.6.22 directory 
> for an upgrade, but all of the options have been left unchanged.
> 
> Here is the *OOPS I captured via dmesg before I rebooted:
> 
> [16201055.214559] nfsd: last server has exited
> [16201055.214566] nfsd: unexporting all filesystems
> [17341583.697472] BUG: unable to handle kernel paging request at virtual 
> address 99e00750
> [17341583.697480]  printing eip:
> [17341583.697482] c01531b0
> [17341583.697484] *pde = 
> [17341583.697488] Oops:  [#1]
> [17341583.697491] CPU:0
> [17341583.697493] EIP:0060:[]Not tainted VLI
> [17341583.697494] EFLAGS: 00210286   (2.6.20 #3)
> [17341583.697502] EIP is at __d_lookup+0x5d/0xd6
> [17341583.697505] eax: c8d7c17e   ebx: 99e00750   ecx: 0011   edx: 
> c17f9200
> [17341583.697508] esi: 99e00750   edi: d2a10016   ebp: c7fe2304   esp: 
> dba35d98
> [17341583.697511] ds: 007b   es: 007b   ss: 0068
> [17341583.697514] Process kdm_greet (pid: 22119, ti=dba34000 task=f52d4a70 
> task.ti=dba34000)
> [17341583.697516] Stack: c8d7c17e  dba35e10 f705d478 dba35db8 
> 002c d2a10016 d2a10042 [17341583.697522]dba35e10 dba35f30 
> dba35e10 c014ab6d dba35e1c c18c5240 dba35f04 c021877e [17341583.697528] 
> d2a10042 dba35e10 c8d7c17e dba35f30 c014c38f d2a10016 0101 dba35e48 
> [17341583.697534] Call Trace:
> [17341583.697537]  [] do_lookup+0x1c/0x168
> [17341583.697540]  [] xfs_vn_lookup+0x53/0x77
> [17341583.697547]  [] __link_path_walk+0x6e8/0xb1b
> [17341583.697551]  [] dput+0x18/0x121
> [17341583.697554]  [] link_path_walk+0x43/0xb8
> [17341583.697558]  [] do_path_lookup+0x75/0x181
> [17341583.697561]  [] get_empty_filp+0x2f/0xe5
> [17341583.697566]  [] __path_lookup_intent_open+0x45/0x80
> [17341583.697570]  [] path_lookup_open+0x20/0x25
> [17341583.697573]  [] open_namei+0x66/0x58a
> [17341583.697576]  [] do_filp_open+0x25/0x40
> [17341583.697580]  [] do_sys_open+0x3e/0xc7
> [17341583.697584]  [] sys_open+0x1c/0x20
> [17341583.697587]  [] syscall_call+0x7/0xb
> [17341583.697591]  ===
> [17341583.697593] Code: 81 f2 01 00 37 9e 8b 0d 18 3f 44 c0 d3 ea 31 d0 23 
> 05 14 3f 44 c0 8b 15 1c 3f 44 c0 8b 34 82 85 f6 75 08 eb 4d 89 de 85 db 74 
> 47 <8b> 1e 0f 18 03 90 8d 6e f4 8b 04 24 3b 45 18 75 e9 8b 44 24 0c 
> [17341583.697621] EIP: [] __d_lookup+0x5d/0xd6 SS:ESP 
> 0068:dba35d98
> [17341583.697626]  <1>BUG: unable to handle kernel paging request at 
> virtual address 99e00750
> [17341648.066740]  printing eip:
> [17341648.066786] c01531b0
> [17341648.066868] *pde = 
> [17341648.066916] Oops:  [#2]
> [17341648.066965] CPU:0
> [17341648.066966] EIP:0060:[]Not tainted VLI
> [17341648.066967] EFLAGS: 00010286   (2.6.20 #3)
> [17341648.067115] EIP is at __d_lookup+0x5d/0xd6
> [17341648.067165] eax: 1efcce0e   ebx: 99e00750   ecx: 0011   edx: 
> c17f9200
> [17341648.067219] esi: 99e00750   edi: cc87901a   ebp: c7fe2304   esp: 
> f7755f04
> [17341648.067271] ds: 007b   es: 007b   ss: 0068
> [17341648.067320] Process dpkg (pid: 24684, ti=f7754000 task=d9846a70 
> task.ti=f7754000)
> [17341648.067371] Stack: 1efcce0e 46dd3a20 f7755f5c e489fe28  
> 0010 cc87901a  [17341648.067715]e489fe28 0001 
> f7755f54 c014b7cb f7755f5c ef0d4098 ffd9 cc879000 [17341648.068056] 
> 0001 f7755f54 c014cf84 f7755f54 e489fe28 c18c5240 1efcce0e 0010 
> [17341648.068397] Call Trace:
> [17341648.068482]  [] __lookup_hash+0x4a/0xef
> [17341648.068563]  [] do_rmdir+0x69/0xbb
> [17341648.068642]  [] syscall_call+0x7/0xb
> [17341648.068724]  ===
> [17341648.068770] Code: 81 f2 01 00 37 9e 8b 0d 18 3f 44 c0 d3 ea 31 d0 23 
> 05 14 3f 44 c0 8b 15 1c 3f 44 c0 8b 34 82 85 f6 75 08 eb 4d 89 de 85 db 74 
> 47 <8b> 1e 0f 18 03 90 8d 6e f4 8b 04 24 3b 45 18 75 e9 8b 44 24 0c 
> [17341648.070874] EIP: [] __d_lookup+0x5d/0xd6 SS:ESP 
> 0068:f7755f04
> [17341648.070988]
> 
> I doubt I can reproduce it as it has happened after 180 days or so, and I 
> am upgrading to 2.6.22.6 but I was wondering what exactly happened here?

No idea - it looks like dpkg was trying to remove a directory on the
same path the lookup was on, and both have gone splat in __d_lookup on
the same dentry. Something happened in those 180 days that left a
landmine that was tripped over here, I think. I can't see any way of
tracking it down from this, but thanks for reporting it anyway,
Justin.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: Scheduler benchmarks - a follow-up

2007-09-17 Thread Rob Hussey
On 9/17/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Rob Hussey <[EMAIL PROTECTED]> wrote:
>
> > http://www.healthcarelinen.com/misc/benchmarks/BOUND_hackbench_benchmark2.png
>
> heh - am i the only one impressed by the consistency of the blue line in
> this graph? :-) [ and the green line looks a bit like a .. staircase? ]
>
> i've meanwhile tested hackbench 90 and the performance difference
> between -ck and -cfs-devel seems to be mostly down to the more precise
> (but slower) sched_clock() introduced in v2.6.23 and to the startup
> penalty of freshly created tasks.
>
> Putting back the 2.6.22 version and tweaking the startup penalty gives
> this:
>
>  [hackbench 90, smaller is better]
>
> sched-devel.git  sched-devel.git+lowres-sched-clock+dsp
> ---  --
>   5.555  5.149
>   5.641  5.149
>   5.572  5.171
>   5.583  5.155
>   5.532  5.111
>   5.540  5.138
>   5.617  5.176
>   5.542  5.119
>   5.587  5.159
>   5.553  5.177
> --
>  avg: 5.572 avg: 5.150 (-8.1%)
>
> ('lowres-sched-clock' is the patch i sent in the previous mail. 'dsp' is
> a disable-startup-penalty patch that is in the latest sched-devel.git)
>
> i have used your .config to conduct this test.
>
> can you reproduce this with the (very-) latest sched-devel git tree:
>
>   git-pull 
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git
>
> plus with the low-res-sched-clock patch (re-) attached below?
>
> Ingo
> ---
>  arch/i386/kernel/tsc.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux/arch/i386/kernel/tsc.c
> ===
> --- linux.orig/arch/i386/kernel/tsc.c
> +++ linux/arch/i386/kernel/tsc.c
> @@ -110,9 +110,9 @@ unsigned long long native_sched_clock(vo
>  *   very important for it to be as fast as the platform
>  *   can achive it. )
>  */
> -   if (unlikely(!tsc_enabled && !tsc_unstable))
> +   if (1 || unlikely(!tsc_enabled && !tsc_unstable))
> /* No locking but a rare wrong value is not a big deal: */
> -   return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
> +   return jiffies_64 * (1000000000 / HZ);
>
> /* read the Time Stamp Counter: */
> rdtscll(this_offset);
> -
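
For anyone following along, here is a minimal user-space sketch of what the patch above does to the clock. It assumes the multiplier is nanoseconds per jiffy and HZ=250 (an assumption; the thread does not say which HZ the test kernel used). The name lowres_sched_clock_ns() is hypothetical and is not the kernel function.

```c
#include <stdint.h>

#define SKETCH_HZ 250ULL                  /* assumption: timer ticks per second */
#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical stand-in for the jiffies-based branch that the patch forces:
 * convert a tick count into nanoseconds. The result only changes once per
 * tick, so the resolution is SKETCH_NSEC_PER_SEC / SKETCH_HZ nanoseconds.
 */
static uint64_t lowres_sched_clock_ns(uint64_t jiffies_64)
{
    return jiffies_64 * (SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}
```

With these assumed values one tick is 4 ms, so any two reads within the same tick return the same timestamp. That is the trade the patch makes: a cheaper but much coarser clock.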

Sorry it took so long for me to get back.

OK, to start, the dmesg output for 2.6.22-ck1 is attached. The relevant
lines seem to be:
[   27.691348] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
[   27.995427] Time: tsc clocksource has been installed.

I've updated to the latest sched-devel git, and applied the patch
above. I ran it through the same tests, but this time only while bound
to a single core. Some selected numbers:

lat_ctx -s 0 $i (the leftmost number is $i):

15  3.09
16  3.09
17  3.11
18  3.07
19  2.99
20  3.09
21  3.05
22  3.11
23  3.05
24  3.08
25  3.06

hackbench $i:

80 11.720
81 11.698
82 11.888
83 12.094
84 12.232
85 12.351
86 12.512
87 12.680
88 12.736
89 12.861
90 13.103

pipe-test (the leftmost number is the run #):

1  8.85
2  8.80
3  8.84
4  8.82
5  8.82
6  8.80
7  8.82
8  8.82
9  8.85
10 8.83

Once again, graphs:
http://www.healthcarelinen.com/misc/benchmarks/BOUND_PATCHED_lat_ctx_benchmark.png
http://www.healthcarelinen.com/misc/benchmarks/BOUND_PATCHED_hackbench_benchmark.png
http://www.healthcarelinen.com/misc/benchmarks/BOUND_PATCHED_pipe-test_benchmark.png

I saw in your other email that you'd like for me to try with
CONFIG_PREEMPT disabled. I should have a chance to try that very soon.

Regards,
Rob


dmesg-2.6.22-ck1.bz2
Description: BZip2 compressed data

data_files2.tar.bz2
Description: BZip2 compressed data


Re: My position on general ``RAS'' tool support infrastructure

2007-09-17 Thread Randy Dunlap
On Thu, 13 Sep 2007 07:21:10 -0600 Eric W. Biederman wrote:

> Pete/Piet Delaney <[EMAIL PROTECTED]> writes:
> 
> > Jason, Eric:
> >
> > Did you read Keith Owens suggestion on RAS tools from:


Yes.  and I re-read it.

There are several things in Keith's email that make sense:

a.  all RAS tools should use a common interface
b.  it's not the kernel's job to decide which RAS tool runs first


Eric makes some good points too.  I'm mostly similar to Eric:
paranoid about trusting software/hardware after a panic (or oops).

So if someone wants to use multiple RAS tools on a panic event,
enabling an admin to set priorities is OK with me, but I'll only
trust the first one that is used, and even that one may have
problems.  IOW, I don't see a big need to support multiple RAS
tools at one time.  (speaking for myself)


> So if someone who is suggesting an implementation can absorb 
> and understand the requirements of the different groups and come
> up with solutions that meet the requirements of the different projects
> I think progress can be made.  That as far as I know takes talent.

Ack that.

---
~Randy


Re: [PATCH] powerpc: Avoid pointless WARN_ON(irqs_disabled()) from panic codepath

2007-09-17 Thread Randy Dunlap
On Tue, 18 Sep 2007 05:13:40 +0530 (IST) Satyam Sharma wrote:

> Untested (not even compile-tested) patch.
> Could someone point me to ppc32/64 cross-compilers for i386?

OSDL had some, but those are gone now.
I downloaded all of them and still use them, although it would
be good to have some more recent versions of them.

I put the power* compiler tarballs here:

http://userweb.kernel.org/~rdunlap/cross-compilers/

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Fwd: Intel DQ35JOE Mainboard 82566DM-2 Gigabit Network

2007-09-17 Thread John Duthie
I'm having a few Problems with a NEW PC

Spec is:
Intel DQ35JOE Mainboard
Intel Q6600 Quad core CPU
4GB ram
3 SATA HDDs
1 SATA DVD-RW

The Integrated NIC is not found by kernel 2.6.23-rc6 or  2.6.22.1
Am I missing an option in there ??

The Intel Drivers (e1000-7.6.5)  don't compile against 2.6.23-rc6 or 2.6.22.1
/usr/src/intel/e1000-7.6.5/src/e1000_ethtool.c:2109: error:
'ethtool_op_get_perm_addr' undeclared here (not in a function)
( I know, wrong place to report this .. )

( also SATA dvd writer does not seem to write yet )

If anyone has Patches to try I'm Currently able and willing to test
them on this hardware config!

see attached stuff
mail me for more info if required !

TIA


dmesg.gz
Description: GNU Zip compressed data


lspciv.gz
Description: GNU Zip compressed data


dotconfig.gz
Description: GNU Zip compressed data


Re: [2.6.22.6] nfsd: fh_verify() `malloc failure' with lots of free memory leads to NFS hang

2007-09-17 Thread J. Bruce Fields
On Tue, Sep 18, 2007 at 12:54:07AM +0100, Nix wrote:
> The code which calls new_do_write() looks like this:
> 
> ,[ libio/fileops.c:_IO_new_file_xsputn() ]
> |  if (do_write)
> |{
> |  count = new_do_write (f, s, do_write);
> |  to_do -= count;
> |  if (count < do_write)
> |return n - to_do;
> |}
> `
> 
> This code handles partial writes followed by errors by returning a
> suitable nonzero value, and immediate errors by returning -1.
> 
> In either case the buffer will have been filled as much as possible by
> that point, and will still be filled when (vf)printf() is next called.

OK, I'm a little lost at this point (what's n?  What's to_do?), but I'll
take your word for it.

I'd be kinda curious when exactly the behavior changed and why.

Also I suppose we should check which version of nfs-utils that fix is in
and make sure distributions are getting the fixed nfs-utils before they
get the new libc, or we're going to see this bug a lot.

> This behaviour is, IIRC, mandated by the C Standard: I can find no
> reference in the Standard to streams being flushed on error, only
> on fclose(), fflush(), or program termination.

OK!

Let me know if the problem's fixed.

--b.


Re: [RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-17 Thread Huang, Ying
On Mon, 2007-09-17 at 08:29 -0700, H. Peter Anvin wrote:
> Huang, Ying wrote:
> > This patch defines a 32-bit boot protocol and adds corresponding
> > document.
> > +
> > +In addition to read/modify/write kernel header of the zero page as
> > +that of 16-bit boot protocol, the boot loader should fill the
> > +following additional fields of the zero page too.
> > +
> > +Offset         Type            Description
> > +------         ----            -----------
> > +    0          32 bytes        struct screen_info, SCREEN_INFO
> > +                               ATTENTION, overlaps the following !!!
> > +    2          unsigned short  EXT_MEM_K, extended memory size in Kb (from int 0x15)
> > + 0x20          unsigned short  CL_MAGIC, commandline magic number (=0xA33F)
> > + 0x22          unsigned short  CL_OFFSET, commandline offset
> > +                               Address of commandline is calculated:
> > +                                 0x90000 + contents of CL_OFFSET
> > +                               (only taken, when CL_MAGIC = 0xA33F)
> > + 0x40          20 bytes        struct apm_bios_info, APM_BIOS_INFO
> > + 0x60          16 bytes        Intel SpeedStep (IST) BIOS support information
> > + 0x80          16 bytes        hd0-disk-parameter from intvector 0x41
> > + 0x90          16 bytes        hd1-disk-parameter from intvector 0x46
> > +
> > + 0xa0          16 bytes        System description table truncated to 16 bytes.
> > +                               ( struct sys_desc_table_struct )
> > + 0xb0 - 0x13f                  Free. Add more parameters here if you really need them.
> > + 0x140- 0x1be                  EDID_INFO Video mode setup
> > +
> > +0x1c4          unsigned long   EFI system table pointer
> > +0x1c8          unsigned long   EFI memory descriptor size
> > +0x1cc          unsigned long   EFI memory descriptor version
> > +0x1d0          unsigned long   EFI memory descriptor map pointer
> > +0x1d4          unsigned long   EFI memory descriptor map size
> > +0x1e0          unsigned long   ALT_MEM_K, alternative mem check, in Kb
> > +0x1e4          unsigned long   Scratch field for the kernel setup code
> > +0x1e8          char            number of entries in E820MAP (below)
> > +0x1e9          unsigned char   number of entries in EDDBUF (below)
> > +0x1ea          unsigned char   number of entries in EDD_MBR_SIG_BUFFER (below)
> > +0x290 - 0x2cf                  EDD_MBR_SIG_BUFFER (edd.S)
> > +0x2d0 - 0xd00                  E820MAP
> > +0xd00 - 0xeff                  EDDBUF (edd.S) for disk signature read sector
> > +0xd00 - 0xeeb                  EDDBUF (edd.S) for edd data
> > +
> > +After loading and setuping the zero page, the boot loader can load the
> > +32/64-bit kernel in the same way as that of 16-bit boot protocol.
> > +
> > +In 32-bit boot protocol, the kernel is started by jumping to the
> > +32-bit kernel entry point, which is the start address of loaded
> > +32/64-bit kernel.
> > +
> > +At entry, the CPU must be in 32-bit protected mode with paging
> > +disabled; the CS and DS must be 4G flat segments; %esi holds the base
> > +address of the "zero page"; %esp, %ebp, %edi should be zero.
> 
> This is just replicating the "zero-page.txt" document, which can best be
> described as a "total lie" -- compare with the actual structure.

OK, I will check the actual structure, and change the document
accordingly.

Best Regards,
Huang Ying


Re: [RFC -mm 1/2] i386/x86_64 boot: setup data

2007-09-17 Thread Huang, Ying
On Mon, 2007-09-17 at 08:30 -0700, H. Peter Anvin wrote:
> Huang, Ying wrote:
> > This patch add a field of 64-bit physical pointer to NULL terminated
> > single linked list of struct setup_data to real-mode kernel
> > header. This is used to define a more extensible boot parameters
> > passing mechanism.
> 
> You MUST NOT add a field like this without changing the version number,
> and, since you expect to enter the kernel at the PM entrypoint, you
> better *CHECK* that version number before ever descending down the chain.
> 

I forgot to change the version number in boot/head.S. I will add it. And
I will add version number checking before descending down the chain.
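
A minimal sketch of the check being asked for, as a user-space illustration only. The struct layout, the name count_setup_data(), and the threshold SKETCH_MIN_VERSION are assumptions made for this example; they are not taken from the patch under review.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; the real struct setup_data may differ. */
struct setup_data_sketch {
    struct setup_data_sketch *next;   /* NULL terminates the list */
    uint32_t type;
    uint32_t len;
};

/* Assumption: first header version that defines the new pointer field. */
#define SKETCH_MIN_VERSION 0x0209

/*
 * Count the entries in the chain, but only if the header version says the
 * pointer field exists. On an older header the field is undefined, so the
 * chain is never touched.
 */
static size_t count_setup_data(uint16_t version,
                               const struct setup_data_sketch *head)
{
    size_t n = 0;

    if (version < SKETCH_MIN_VERSION)
        return 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

The point is the ordering: the version test comes before the first dereference, so a boot loader that predates the field cannot send the kernel down a garbage pointer.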

Best Regards,
Huang Ying


Re: Wasting our Freedom

2007-09-17 Thread Al Viro
On Mon, Sep 17, 2007 at 05:03:55PM -0700, David Schwartz wrote:
> 
> > "David Schwartz" <[EMAIL PROTECTED]> writes:
> 
> > > My point is that you *cannot* prevent a recipient of a
> > > derivative work from
> > > receiving any rights under either the GPL or the BSD to any protectable
> > > elements in that work.
> >
> > Of course you can.
> 
> No you can't.

Gentlemen, please remove your wanking selves back to the gutter you've
crawled from.  This is not slashdot[1].  This is not gnu.misc.discuss.
This is not alt.sex.cartooney.sue.sue.sue.  This is a technical maillist
and that dungpile doesn't belong here.  If you insist on hitting vger,
ask davem to create a new maillist ([EMAIL PROTECTED] would fit that kind
of traffic nicely) and for pity sake, do fuck off already.  Enough is
enough.

[1] the spews from nerds, the spews that splatter...


Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-09-17 Thread Daniel Phillips
On Friday 07 September 2007 22:12, Mike Snitzer wrote:
> Can you be specific about which changes to existing mainline code
> were needed to make recursive reclaim "work" in your tests (albeit
> less ideally than peterz's patchset in your view)?

Sorry, I was incommunicado out on the high seas all last week.  OK, the
measures that actually prevent our ddsnap driver from deadlocking are:

  - Statically prove bounded memory use of all code in the writeout
path.

  - Implement any special measures required to be able to make such a
proof.

  - All allocations performed by the block driver must have access
to dedicated memory resources.

  - Disable the congestion_wait mechanism for our code as much as
possible, at least enough to obtain the maximum memory resources
that can be used on the writeout path.

The specific measure we implement in order to prove a bound is:

  - Throttle IO on our block device to a known amount of traffic for
which we are sure that the MEMALLOC reserve will always be
adequate.
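
A minimal sketch of that kind of throttle, assuming a single-threaded caller and a limit chosen so that the reserve always covers the in-flight traffic. The names io_throttle, io_throttle_try_get() and io_throttle_put() are hypothetical; this is not the ddsnap code or the bio throttling patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct io_throttle {
    size_t limit;      /* most units in flight the reserve is assumed to cover */
    size_t in_flight;  /* units currently submitted and not yet completed */
};

/* Admit `units` of IO only if the total in flight stays within the limit. */
static bool io_throttle_try_get(struct io_throttle *t, size_t units)
{
    if (units > t->limit - t->in_flight)
        return false;            /* caller must wait or retry later */
    t->in_flight += units;
    return true;
}

/* Release units on IO completion; must match an earlier successful get. */
static void io_throttle_put(struct io_throttle *t, size_t units)
{
    t->in_flight -= units;
}
```

A real driver would sleep instead of returning false, and would need locking or atomics around the counter. The sketch only shows the accounting that makes the memory bound provable.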

Note that the boundedness proof we use is somewhat loose at the moment. 
It goes something like "we only need at most X kilobytes of reserve and 
there are X megabytes available".  Much of Peter's patch set is aimed 
at getting more precise about this, but to be sure, handwaving just 
like this has been part of core kernel since day one without too many 
ill effects.

The way we provide guaranteed access to memory resources is:

  - Run critical daemons in PF_MEMALLOC mode, including
any userspace daemons that must execute in the block IO path
   (cluster coders take note!)

Right now, all writeout submitted to ddsnap gets handed off to a daemon
running in PF_MEMALLOC mode.  This is a needless inefficiency that we 
want to remove in future by handling as many of those submissions as 
possible entirely in the context of the submitter.  To do this, further 
measures are needed:

  - Network writes performed by the block driver must have access to
dedicated memory resources.

We have not yet managed to trigger network read memory deadlock, but it 
is just a matter of time, additional fancy virtual block devices, and 
enough stress.  So:

  - Network reads need some fancy extra support because dedicated
memory resources must be consumed before knowing whether the
network traffic belongs to a block device or not.

Now, the interesting thing about this whole discussion is, none of the 
measures that we are actually using at the moment are implemented in 
either Peter's or Christoph's patch set.  In other words, at present we 
do not require either patch set in order to run under heavy load 
without deadlocking.  But in order to generalize our solution to a 
wider range of virtual block devices and other problematic systems such 
as userspace filesystems, we need to incorporate a number of elements 
of Peter's patch set.

As far as Christoph's proposal goes, it is not required to prevent 
deadlocks.   Whether or not it is a good optimization is an open 
question.

Of all the patches posted so far related to this work, the only 
indispensable one is the bio throttling patch developed by Evgeniy and 
me in a parallel thread.  The other essential pieces are all implemented 
in our block driver for now.  Some of those can be generalized and 
moved at least partially into core, and some cannot.

I do need to write some sort of primer on this, because there is no 
fire-and-forget magic core kernel solution.  There are helpful things 
we can do in core, but some of it can only be implemented in the 
drivers themselves.

Regards,

Daniel


Re: [PATCH 2/3] Consolidate host virtualization support under Virtualization menu

2007-09-17 Thread Jeremy Fitzhardinge
Charles N Wyble wrote:
>
>
> Zachary Amsden wrote:
> >
> > Virtualization is completely different, and probably needs separate
> > server (kvm, lguest) and client (kvm, lguest, xen, vmware) sections in
> > it's menu.
>
>
> So what is the differentiation between client and server above? Just
> curious what makes kvm and lguest server and client.

"Host" and "guest" are better terms, I think.  Kvm is all host, since
guests need no modification.  lguest turns the kernel into both host and
guest.  Xen Linux kernels are all guest, since the Xen hypervisor is the
host.

J


Re: [PATCH] ext34: ensure do_split leaves enough free space in both blocks

2007-09-17 Thread hooanon05

Andreas Dilger:
> > So this looks like 2.6.22 and 2.6.23 material, but the timing is getting
> > pretty squeezy.  Could people please give this change an extra-close
> > review, let me know?
> 
> I already discussed it at length with Eric and inspected the patch, so we
> could add:
> Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
> 
> Haven't actually tested the code myself.

I've just tested the patch on linux-2.6.23-rc6 (i386) with the test
program I posted a few months ago, and found it solved the problem.
Thank you very much Eric Sandeen, Andreas Dilger and all in ML.

Junjiro Okajima


Re: [PATCH] powerpc: Avoid pointless WARN_ON(irqs_disabled()) from panic codepath

2007-09-17 Thread Satyam Sharma


On Tue, 18 Sep 2007, Satyam Sharma wrote:
> 
> > [ cut here ]
> > Badness at arch/powerpc/kernel/smp.c:202
> 
> comes when smp_call_function_map() has been called with irqs disabled,
> which is illegal. However, there is a special case, the panic() codepath,
> when we do not want to warn about this -- warning at that time is pointless
> anyway, and only serves to scroll away the *real* cause of the panic and
> distracts from the real bug.

BTW arch/ppc/ has same issue, but that's gonna be removed by next year
anyways, so I think there's no point making a patch for that (?)
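
The rule described above can be sketched as a plain predicate. This is a user-space illustration; the function name and the panic_in_progress flag are hypothetical and are not claimed to be what the patch actually tests.

```c
#include <stdbool.h>

/*
 * Warn about a cross-CPU call made with interrupts disabled, except when
 * the caller is already on the panic path, where the warning would only
 * scroll the real panic message off the screen.
 */
static bool should_warn_irqs_disabled(bool irqs_disabled,
                                      bool panic_in_progress)
{
    return irqs_disabled && !panic_in_progress;
}
```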


Re: [PATCH 2/3] Consolidate host virtualization support under Virtualization menu

2007-09-17 Thread Charles N Wyble



Zachary Amsden wrote:
>
> Virtualization is completely different, and probably needs separate
> server (kvm, lguest) and client (kvm, lguest, xen, vmware) sections in
> it's menu.


So what is the differentiation between client and server above? Just
curious what makes kvm and lguest server and client.

>
> Zach


RE: Wasting our Freedom

2007-09-17 Thread David Schwartz

> "David Schwartz" <[EMAIL PROTECTED]> writes:

> > My point is that you *cannot* prevent a recipient of a
> > derivative work from
> > receiving any rights under either the GPL or the BSD to any protectable
> > elements in that work.
>
> Of course you can.

No you can't.

> What rights do you have to BSD-licenced works, made available
> (under BSD) to MS exclusively? You only get the binary object...

You are equating what rights I have with my ability to exercise those
rights. They are not the same thing. For example, I once bought the rights
to publicly display the movie "Monty Python and the Holy Grail". To my
surprise, the rights to public display did not include an actual copy of the
film.

In any event, I never claimed that anyone has rights to a protectable
element that they do not possess a lawful copy of. That's a completely
separate issue and one that has nothing to do with what's being discussed
here because these are all cases where you have the work.

> You know, this is quite common practice - instead of assigning
> copyright, you can grant a BSD-style licence (for some fee,
> something like "do what you want but I will do what I want with
> my code").

Sure, *you* can grant a BSD-style license to any protectable elements *you*
authored. But unless your recipients can obtain a BSD-style license to all
protectable elements in the work from their respective authors, they cannot
modify or distribute it.

*You* cannot grant any rights to protectable elements authored by someone
else, unless you have a relicensing agreement. Neither the GPL nor the BSD
is one of those.

> >> If A sold a BSD licence to B only and this B sold a proprietary
> >> licence (for a derived work) to C, C (without that clause) wouldn't
> >> have a BSD licence to the original work. This is BTW common scenario.
> >
> > C most certainly would have a BSD license, should he choose to
> > comply with
> > terms, to every protectable element that is in both the
> > original work and
> > the work he received.

> But he may have received only binary program image - or the source
> under NDA.
> Sure, NDA doesn't cover public information, but BSD doesn't mean public.
> Now what?

What the hell does that have to do with anything? Are you just trying to be
deliberately dense or waste time? Is it not totally obvious how the
principles I explain apply to a case like that?

Only someone who signs an NDA must comply with it. If you signed an NDA, you
must comply with it. An NDA can definitely subtract rights. It's a complex
question whether an NDA can subtract GPL rights, but again, that has nothing
to do with what we're talking about here.

Sure, you can have the right from me to do X and still not be allowed to do
X because you agreed with someone else not to do it. So what?

> > C has no right to license any protectable element he did not author to
> > anyone else. He cannot set the license terms for those elements to C.

> Sure, the licence covers the >>>entire work<<<, not some "elements".

This is a misleading statement. The phrase "entire work" has two senses. The
license definitely does not cover the "entire work" in the sense of every
protectable element in the work unless each individual author of those
elements chose to offer that element under that license.

If by "entire work", you mean any compilation or derivative work copyright
the "final" author has, then yes, that's available under whatever license
the "final" author places it under. But that license does not actually
permit you to distribute the work.

This is really complicated and I wish I had a clear way to explain it.
Suppose I write a work and then you modify it. Assume your modification
includes adding new protectable elements to that work. When someone
distributes that new derivative work, they are distributing protectable
elements authored by both you and me.

Absent a relicensing agreement, they must obtain some rights from you and
some rights from me to do that. You cannot license the protectable elements
that I authored that are still in the resulting derivative work.

> > Neither the BSD nor the GPL ever give you the right to change the actual
> > license a work is offered under by the original author.
>
> Of course, that's a very distant thing.

Exactly. Every protectable element in the final work is licensed by the
original author to every recipient who takes advantage of the license offer.

> >> BTW: a work by multiple authors is a different thing than a work
> >> derived from another.
> >
> > In practice it doesn't matter.
>
> Of course it does. Only author of a (derived) work can licence
> it, in this case he/she could change the licence back to BSD,
> or sell it to MS (if not based on GPL etc).

Only the author of any protectable element can license it, whether it's in a
derivative work or by itself.

You are seriously confused if you think that just because you create a
derivative work that includes my protectable elements you can then license
the 

Re: Wasting our Freedom

2007-09-17 Thread Ingo Schwarze
[EMAIL PROTECTED] wrote on Sun, Sep 16, 2007 at 04:40:38PM -0700:
> On Sun, 16 Sep 2007, Jacob Meuser wrote:

>> so the linux community is morally equivilent to a corporation?
>> that's what it sounds like you are all legally satisfied with.
>
> if it's legal it's legal. it's not a matter of the Linux community being 
> satisfied with it, it's a matter of the BSD people desiring it based on 
> their selection of license (and the repeated statements that this feature 
> of the BSD license being an advantage compared to the GPL makes it clear 
> that this isn't an unknown side effect, it's an explicit desire).

Indeed, that argument is often paraphrased in a way that makes it
hard to understand.  What i heard people say is not "If people make
derivative works based on BSD code, they should make them less free
instead of fully free", but it is: "If people caring nothing about
free software in the first place are building their own commercial
systems anyway, they should rather reuse BSD code than hacking up
their own bricolage of bug-ridden insecure stuff."

Granted, that's a different approach than taken by the GPL, which
essentially says "... anyway, they deserve to be on their own."

> so the Linux community is following the desires of the BSD community
> by following their license but the BSD community is unhappy, why?

Be careful not to confuse "desires" with "legal requirements"...  :-(

Given BSD code, BSD-licensed substantial improvements
make happier than restrictively licensed substantial improvements
make happier than derived non-free closed-source software
make happier than license violations.

Besides, the Linux communities neither qualify as "caring nothing
about free software" nor as "hacking up their own bricolage of
bug-ridden insecure stuff" (hopefully ;-).  So that argument
simply doesn't apply to you.  Probably, that's why Jacob talked
about "morally equivalent to a corporation".

> you claim that it's unethical for the linux community to use the
> code, but brag about NetApp useing the code.  what makes NetApp ok
> and Linux evil?  many people honestly don't understand the logic
> behind this.  please explain it.

Several people have already explained this nicely; the degree
of happiness may also depend on the level of cooperation and
understanding you expect from the people building on the code,
given their own intentions and goals.  I may well be thankful
towards an enemy just for not killing me, but at the same time
sad about a friend leaving me out in the rain.

( This just being stated in general; i'm not sure what the state
  of discussions in the various Linux communities is just now. )


Re: [2.6.22.6] nfsd: fh_verify() `malloc failure' with lots of free memory leads to NFS hang

2007-09-17 Thread Nix
On 17 Sep 2007, J. Bruce Fields stated:
> On Mon, Sep 17, 2007 at 11:23:46PM +0100, Nix wrote:
>> A while later we start seeing runs of malloc failures, which I think
>> correlated with the unexplained pauses in NFS response:
>
> Actually, they're nothing to do with malloc failures--the message
> printed here is misleading, and isn't even an error; it gets printed
> whenever an upcall to mountd is made.

Indeed, with more debugging, all the failures I see come from the call
to exp_find(), which is digging out exports...

>The problem is almost certainly a
> problem with kernel<->mountd communication--the kernel depends on mountd
> to answer questions about exported filesystems as part of the fh_verify
> code.

Ah! I keep forgetting that mountd isn't just used at mount time: damned
misleading names, grumble.

Restarting mountd clears the problem up temporarily, so you are
definitely right.

> commit dd087896285da9e160e13ee9f7d75381b67895e3
> Author: J. Bruce Fields <[EMAIL PROTECTED]>
> Date:   Thu Jul 26 16:30:46 2007 -0400

Aha! I'm on 3b55934b9baefecee17aefc3ea139e261a4b03b8, over a month older.

> On a recent Debian/Sid machine, I saw libc retrying stdio writes that
> returned write errors.

Debian Sid recently upgraded to glibc 2.6.x, as did I... earlier
versions of glibc will have had this behaviour too, but it may have been
less frequent.

> I don't know whether this libc behavior is correct or expected, but it
> seems safest to add the __fpurge() (suggested by Neil) to ensure data is
> thrown away.

It is expected, judging from my reading of the
code. stdio-common/vfprintf.c emits single chars using the outchar()
macro, and strings using the outstring() macro, using functions in libio
to do the job. The string output routine then calls _IO_file_xsputn(),
which, tracing through libio's jump tables and symbol aliases, ends up
calling _IO_new_file_xsputn() in libio/fileops.c. (I've only just
started to understand libio. It's basically undocumented as far as I can
tell, but it's deeply nifty. Think of stdio, only made entirely out of
hookable components. :) )

(Actual writing then thunks down through _IO_new_do_write() and
new_do_write() in the same file, which finally calls __write().  If
there's any kind of error this returns EOF after some opaque messing
about with a _cur_column value, which is as far as I can tell never
used!)

The code which calls new_do_write() looks like this:

,[ libio/fileops.c:_IO_new_file_xsputn() ]
|  if (do_write)
|{
|  count = new_do_write (f, s, do_write);
|  to_do -= count;
|  if (count < do_write)
|return n - to_do;
|}
`

This code handles partial writes followed by errors by returning a
suitable nonzero value, and immediate errors by returning -1.

In either case the buffer will have been filled as much as possible by
that point, and will still be filled when (vf)printf() is next called.


This behaviour is, IIRC, mandated by the C Standard: I can find no
reference in the Standard to streams being flushed on error, only
on fclose(), fflush(), or program termination.
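
A minimal sketch of the workaround discussed in this thread: after a failed write, throw away whatever is still sitting in the stdio buffer so that it is not retried on the next call. It assumes glibc, since __fpurge() comes from <stdio_ext.h> and is not part of ISO C. The helper name put_line_or_discard() is hypothetical and this is not the nfs-utils code.

```c
#include <stdio.h>
#include <stdio_ext.h>   /* glibc-specific: __fpurge(), __fpending() */

/*
 * Write one line and flush it. If either step fails, discard the bytes
 * still held in the stream buffer and clear the error flag, so that a
 * later write starts from an empty buffer.
 * Returns 0 on success, -1 if the data was dropped.
 */
static int put_line_or_discard(FILE *f, const char *line)
{
    if (fputs(line, f) == EOF || fflush(f) == EOF) {
        __fpurge(f);
        clearerr(f);
        return -1;
    }
    return 0;
}
```

Without the purge, the unwritten bytes stay buffered after the error, as described above, and go out together with the next message.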


I'm upgrading now: thank you!

-- 
`Some people don't think performance issues are "real bugs", and I think 
such people shouldn't be allowed to program.' --- Linus Torvalds


Re: 2.6.23-rc4-mm1 OOPS in forcedeth?

2007-09-17 Thread Satyam Sharma


On Mon, 17 Sep 2007, Denis V. Lunev wrote:
> Dhaval Giani wrote:
> > On Thu, Sep 13, 2007 at 11:51:33PM -0400, Andrew James Wade wrote:

> >> EIP: [] tcp_rto_min+0xb/0x15 SS:ESP 0068:c0596dec

As Vlad Yasevich mentioned, this one is already fixed in 23-rc6.

The forcedeth oops is unrelated, but multiple people have reported that
same oops now -- adding Manfred Spraul to CC. [ original thread is at:
http://lkml.org/lkml/2007/9/1/115 ]


Re: Wasting our Freedom

2007-09-17 Thread Theodore Tso
On Mon, Sep 17, 2007 at 03:06:37PM -0700, Can E. Acar wrote:
> The only remaining issue is whether Nick & Jiri have enough
> original contributions to the code to be added to the Copyright.
> 
> I believe this needs to be resolved between Reyk and Nick and Jiri.
> 
> The main reason of Theo's message, linked earlier, was the
> lack of response on this issue. It seems that the SFLC is
> dismissing this issue, and thus stalling its resolution by the
> developers.

OK, so all of this flaming, and digging up of "licenses ripped off",
and chaff thrown up in the air, and moaning and bewailing about
"theft", is now down to these two lines regarding Nick and Jiri:

> * Copyright (c) 2004-2007 Reyk Floeter <[EMAIL PROTECTED]>
> * Copyright (c) 2006-2007 Nick Kossifidis <[EMAIL PROTECTED]>
> * Copyright (c) 2007 Jiri Slaby <[EMAIL PROTECTED]>
> [snip rest of BSD license]

It's under a BSD license; what material difference do those two
lines make, for goodness sake?  It's under a BSD license, so it's not
like anything won't be "given back".  Whether or not they have made
enough changes is really a question for the lawyers, and may
differ from one jurisdiction to another --- but whether or not they
have now, or maybe will not make until later --- does it really make a
difference?  Who gets hurt if they get a bit more credit
than they deserve?  Certainly the most important thing is that Reyk is
given proper credit, right?

- Ted


Re: [ofa-general] [PATCH] [WORKAROUND] CONFIG_PREEMPT_RT and ib_umad_close() issue

2007-09-17 Thread John Blackwood

Roland Dreier wrote:

Thanks for the explanation...

 > But basically, with CONFIG_PREEMPT_RT enabled, the lock points, such as
 > acquiring a spinlock, potentially become places where the current task
 > may be context switched out / preempted.
 > 
 > Therefore, when a call is made to lock a spinlock for example, the
 > caller should not currently have irqs disabled, or preemption disabled,
 > since a context switch may occur.

this doesn't seem relevant here...


Hi Roland,

right.  just some background info.


 > void fastcall rt_downgrade_write(struct rw_semaphore *rwsem)
 > {
 > BUG();
 > }

this seems to be the problem... the -rt patch turns downgrade_write()
into a BUG().

I need to look at the locking in user_mad.c again, but I think it may
be possible to replace both places that do downgrade_write() with
up_write() followed by down_read().

 - R.



that sounds like it would be a good solution for both preempt rt and 
non-preempt rt kernels.


thanks again for looking at this for us.


[PATCH] powerpc: Avoid pointless WARN_ON(irqs_disabled()) from panic codepath

2007-09-17 Thread Satyam Sharma

> [ cut here ]
> Badness at arch/powerpc/kernel/smp.c:202

comes when smp_call_function_map() has been called with irqs disabled,
which is illegal. However, there is a special case, the panic() codepath,
when we do not want to warn about this -- warning at that time is pointless
anyway, and only serves to scroll away the *real* cause of the panic and
distracts from the real bug.

* So let's extract the WARN_ON() from smp_call_function_map() into all its
  callers -- smp_call_function() and smp_call_function_single()

* Also, introduce another caller of smp_call_function_map(), namely
  __smp_call_function() (and make smp_call_function() a wrapper over this)
  which does *not* warn about disabled irqs

* Use this __smp_call_function() from the panic codepath's smp_send_stop()

We also end up having to move the code of smp_send_stop() below the
definition of __smp_call_function().
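
The shape of the change, as a standalone userspace sketch.  Everything
here (the irqs_off flag, the warnings counter, the function names) is a
stand-in invented for illustration; only the layering mirrors the patch:
the check lives in the public wrapper, and the panic path calls the
unchecked inner function directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state, not the real powerpc code. */
static bool irqs_off;
static int warnings;

static bool irqs_disabled(void)
{
	return irqs_off;
}

#define WARN_ON(cond)							\
	do {								\
		if (cond) {						\
			warnings++;					\
			fprintf(stderr, "Badness at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
	} while (0)

/* Core worker: does the call, no longer warns by itself. */
static int call_function_map(void (*func)(void *info), void *info)
{
	func(info);
	return 0;
}

/* Unchecked variant, for callers that legitimately run with irqs off. */
static int call_function_nowarn(void (*func)(void *info), void *info)
{
	return call_function_map(func, info);
}

/* Public entry point: keeps the warning for ordinary callers. */
int call_function(void (*func)(void *info), void *info)
{
	/* Can deadlock when called with interrupts disabled */
	WARN_ON(irqs_disabled());

	return call_function_nowarn(func, info);
}

static void stop_this_cpu(void *dummy)
{
	(void)dummy;
}

/* Panic path: must not bury the real panic message under a warning. */
void send_stop(void)
{
	call_function_nowarn(stop_this_cpu, NULL);
}
```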

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

Untested (not even compile-tested) patch.
Could someone point me to ppc32/64 cross-compilers for i386?

 arch/powerpc/kernel/smp.c |   27 ++-
 1 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 1ea4316..b24dcba 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -152,11 +152,6 @@ static void stop_this_cpu(void *dummy)
;
 }
 
-void smp_send_stop(void)
-{
-   smp_call_function(stop_this_cpu, NULL, 1, 0);
-}
-
 /*
  * Structure and data for smp_call_function(). This is designed to minimise
  * static memory requirements. It also looks cleaner.
@@ -198,9 +193,6 @@ int smp_call_function_map(void (*func) (void *info), void *info, int nonatomic,
int cpu;
u64 timeout;
 
-   /* Can deadlock when called with interrupts disabled */
-   WARN_ON(irqs_disabled());
-
if (unlikely(smp_ops == NULL))
return ret;
 
@@ -270,10 +262,19 @@ int smp_call_function_map(void (*func) (void *info), void *info, int nonatomic,
return ret;
 }
 
+static int __smp_call_function(void (*func)(void *info), void *info,
+  int nonatomic, int wait)
+{
+   return smp_call_function_map(func,info,nonatomic,wait,cpu_online_map);
+}
+
 int smp_call_function(void (*func) (void *info), void *info, int nonatomic,
int wait)
 {
-   return smp_call_function_map(func,info,nonatomic,wait,cpu_online_map);
+   /* Can deadlock when called with interrupts disabled */
+   WARN_ON(irqs_disabled());
+
+   return __smp_call_function(func, info, nonatomic, wait);
 }
 EXPORT_SYMBOL(smp_call_function);
 
@@ -283,6 +284,9 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info, int
cpumask_t map = CPU_MASK_NONE;
int ret = 0;
 
+   /* Can deadlock when called with interrupts disabled */
+   WARN_ON(irqs_disabled());
+
if (!cpu_online(cpu))
return -EINVAL;
 
@@ -299,6 +303,11 @@ int smp_call_function_single(int cpu, void (*func) (void *info), void *info, int
 }
 EXPORT_SYMBOL(smp_call_function_single);
 
+void smp_send_stop(void)
+{
+   __smp_call_function(stop_this_cpu, NULL, 1, 0);
+}
+
 void smp_call_function_interrupt(void)
 {
void (*func) (void *info);


Re: Wasting our Freedom

2007-09-17 Thread Krzysztof Halasa
"David Schwartz" <[EMAIL PROTECTED]> writes:

> My point is that you *cannot* prevent a recipient of a derivative work from
> receiving any rights under either the GPL or the BSD to any protectable
> elements in that work.

Of course you can.
What rights do you have to BSD-licenced works, made available
(under BSD) to MS exclusively? You only get the binary object...

You know, this is quite common practice - instead of assigning
copyright, you can grant a BSD-style licence (for some fee,
something like "do what you want but I will do what I want with
my code").

>> If A sold a BSD licence to B only and this B sold a proprietary
>> licence (for a derived work) to C, C (without that clause) wouldn't
>> have a BSD licence to the original work. This is BTW common scenario.
>
> C most certainly would have a BSD license, should he choose to comply with
> terms, to every protectable element that is in both the original work and
> the work he received.

But he may have received only binary program image - or the source
under NDA.
Sure, NDA doesn't cover public information, but BSD doesn't mean public.
Now what?

> C has no right to license any protectable element he did not author to
> anyone else. He cannot set the license terms for those elements to C.

Sure, the licence covers the >>>entire work<<<, not some "elements".

> Neither the BSD nor the GPL ever give you the right to change the actual
> license a work is offered under by the original author.

Of course, that's a very distant thing.

>> BTW: a work by multiple authors is a different thing than a work
>> derived from another.
>
> In practice it doesn't matter.

Of course it does. Only author of a (derived) work can licence
it, in this case he/she could change the licence back to BSD,
or sell it to MS (if not based on GPL etc).

> Would you argue that I can license Disney's "The Lion King" movie to you if
> I promise not to sue you over any (no) rights that I possess to it?

Sure you can :-) that doesn't mean it would protect me from Disney,
but you can.

> You are confusing licenses of two very different types. The BSD and GPL
> licenses only cover modification and distribution, two rights you do not get
> to MS Windows at all. *Use* is not restricted under copyright.

I'm told in the USA use = copying from disk to RAM = distribution,
isn't it true? :-)
It doesn't matter of course.

> There is simply nothing remotely comparable to the BSD or GPL license in the
> case of MS Windows. There is no grant of additional rights beyond those you
> get automatically with lawful possession (such as use).

I don't compare them (though you can). You don't get a licence for
"original elements" in MS-Windows, do you?

> If MS wished to grant someone the right to modify or redistribute Windows,
> that person would also need to obtain the right to modify or distribute
> protectable elements not authored by Microsoft. The only way they could
> obtain those rights, assuming Microsoft didn't have written relicensing
> agreements, is from the original author under the original licenses.

Yes, but it isn't automatic. Imagine you have received something
from MS, under more permissive licence (I think such things did
happen). How do you, for example, recognise the boundaries of the
elements, IOW what additional rights do you have to each line in
the code or pixel in the font?

The file itself only states:
(C) MS
portions (C) e.g. Bitstream
licenced under their special agreement

What extra rights do you receive from Bitstream? Perhaps you should
ask them if they have given you some licence? :-)

Or another example, redistributable runtime libraries. What extra
rights do you have?

What you write is true for GPL, but it doesn't mean it's true
every time. It's just that clause in the GPL.
-- 
Krzysztof Halasa


Re: [PATCH] modpost: detect unterminated device id lists

2007-09-17 Thread Mauro Carvalho Chehab
Hi Andrew,

Em Seg, 2007-09-17 às 14:50 -0700, Andrew Morton escreveu:
> On Tue, 18 Sep 2007 03:15:14 +0530 (IST)
> Satyam Sharma <[EMAIL PROTECTED]> wrote:
> 
> > 
> > 
> > On Sun, 16 Sep 2007, Andrew Morton wrote:
> > 
> > > On Mon, 17 Sep 2007 05:54:45 +0530 "Satyam Sharma" <[EMAIL PROTECTED]> 
> > > wrote:
> > > 
> > > > On 9/17/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > I'm getting this:
> > > > >
> > > > > rusb2/pvrusb2: struct usb_device_id is 20 bytes.  The last of 3 is:
> > > > > 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> > > > > 0x00
> > > > > 0x00 0x00 0x00 0x00 0x00
> > > > > FATAL: drivers/media/video/pvrusb2/pvrusb2: struct usb_device_id is 
> > > > > not terminated
> > > > > with a NULL entry!
> > > > >
> > > > > ("rusb2/pvrusb2" ??)
> > > > 
> > > > Hmm? Are you sure you didn't see any "drivers/media/video/pv" before the
> > > > "rusb2/pvrusb2" bit?
> > > 
> > > Fairly.  I looked twice.
> > 
> > "drivers/media/video/pvrusb2/pvrusb2" comes out correctly here ...
> > 
> > 
> > > > Looking at Kees' patch (and the existing code), I've no
> > > > clue how/why this should happen ... will try to reproduce here ...
> > > > 
> > > > 
> > > > > but:
> > > > >
> > > > > struct usb_device_id pvr2_device_table[] = {
> > > > > [PVR2_HDW_TYPE_29XXX] = { USB_DEVICE(0x2040, 0x2900) },
> > > > > [PVR2_HDW_TYPE_24XXX] = { USB_DEVICE(0x2040, 0x2400) },
> > > > > { USB_DEVICE(0, 0) },
> > > > > };
> > > > >
> > > > > looks OK?
> > > > >
> > > > > Using plain old "{ }" shut the warning up.
> > > > 
> > > > USB_DEVICE(0, 0) is not empty termination, actually, and this looks like
> > > > a genuine bug caught by the patch. As that dump shows, USB_DEVICE(0, 0)
> > > > assigns "0x03 0x00" (in little endian) to usb_device_id.match_flags. And
> > > > I don't think the USB code treats such an entry as an empty entry (?)
> > > > 
> > > > Interestingly, the "USB_DEVICE(0, 0)" thing is absent from latest -git
> > > > tree and also in my copy of 23-rc4-mm1 -- so this looks like something
> > > > you must've merged recently.
> > > 
> > > git-dvb very carefully does
> > > 
> > > --- a/drivers/media/video/pvrusb2/pvrusb2-hdw.c~git-dvb
> > > +++ a/drivers/media/video/pvrusb2/pvrusb2-hdw.c
> > > @@ -44,7 +44,7 @@
> > >  struct usb_device_id pvr2_device_table[] = {
> > >   [PVR2_HDW_TYPE_29XXX] = { USB_DEVICE(0x2040, 0x2900) },
> > >   [PVR2_HDW_TYPE_24XXX] = { USB_DEVICE(0x2040, 0x2400) },
> > > -   { }
> > > +   { USB_DEVICE(0, 0) },
> > > };
> > >
> > > MODULE_DEVICE_TABLE(usb, pvr2_device_table);
> > 
> > Ok, this is a false positive indeed, the core USB code does in fact
> > treat such an entry as an empty entry (usb_match_id() tests only the
> > .idVendor, .bDeviceClass, .bInterfaceClass and .driver_info members
> > for non-zero and not the .match_flags member).
> > 
> > However, a quick-grep-and-glance tells us that none of the other 2213
> > occurrences of USB_DEVICE() in the tree ever do this "(0,0)" thing,
> > so it does make sense to change this one to a simple "{ }" as well --
> > that's clearer style anyway, and the "standard" way to empty-terminate
> > in the rest of the tree, if nothing else.
> > 
> 
> yeah, I think so.  Mauro, could you please drop that change?

Patch dropped from my tree.
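
For the archive, a small sketch of why the two checks disagreed.  The
struct below is a cut-down, hypothetical stand-in for usb_device_id, and
the two helper functions are illustrations of the behaviour described in
the thread, not the actual modpost or usb_match_id() code.

```c
#include <stdint.h>

/* Simplified stand-in for struct usb_device_id (illustrative only). */
struct dev_id {
	uint16_t match_flags;
	uint16_t idVendor;
	uint16_t idProduct;
	uint8_t bDeviceClass;
	uint8_t bInterfaceClass;
	unsigned long driver_info;
};

#define MATCH_VENDOR	0x0001
#define MATCH_PRODUCT	0x0002

/* Like USB_DEVICE(): sets the match flags even when both ids are 0. */
#define DEV(vend, prod)					\
	.match_flags = MATCH_VENDOR | MATCH_PRODUCT,	\
	.idVendor = (vend),				\
	.idProduct = (prod)

/* Strict check, in the spirit of the modpost test: every field is zero. */
static int strict_terminator(const struct dev_id *id)
{
	return !id->match_flags && !id->idVendor && !id->idProduct &&
	       !id->bDeviceClass && !id->bInterfaceClass && !id->driver_info;
}

/* Lenient check, as described for the USB core: match_flags is ignored. */
static int lenient_terminator(const struct dev_id *id)
{
	return !id->idVendor && !id->bDeviceClass &&
	       !id->bInterfaceClass && !id->driver_info;
}
```

An entry built with DEV(0, 0) carries match_flags 0x03, so it ends the
table for the lenient check but not for the strict one; a plain "{ }"
satisfies both.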

Cheers,
Mauro.



Re: [2.6.23-rc4-mm1][Bug] kernel BUG at include/linux/netdevice.h:339!

2007-09-17 Thread David Miller
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Mon, 17 Sep 2007 14:16:22 -0700

> On Mon, 17 Sep 2007 17:46:38 +0530
> Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
> > Kernel Bug is hit with 2.6.23-rc4-mm1 kernel on ppc64 machine.
> > 
> > kernel BUG at include/linux/netdevice.h:339!
> 
> (please cc [EMAIL PROTECTED] on networking-related matters)
> 
> You died here:
> 
> static inline void napi_complete(struct napi_struct *n)
> {
> BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> 
> The NAPI changes have had a few problems and hopefully things have
> been fixed up since then.  I'll try to get rc6-mm1 out this evening,
> so please retest that?

And if you trigger this still it is absolutely critical that
you tell us what networking device driver you are using at
the time.


Re: [PATCH 048/104] KVM: Add and use pr_unimpl for standard formatting of unimplemented features

2007-09-17 Thread Rusty Russell
On Mon, 2007-09-17 at 09:16 -0700, Joe Perches wrote:
> On Mon, 2007-09-17 at 10:31 +0200, Avi Kivity wrote:
> > diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
> > index cfda3ab..6d25826 100644
> > --- a/drivers/kvm/kvm.h
> > +++ b/drivers/kvm/kvm.h
> > @@ -474,6 +474,14 @@ struct kvm_arch_ops {
> >  
> >  extern struct kvm_arch_ops *kvm_arch_ops;
> >  
> > +/* The guest did something we don't support. */
> > +#define pr_unimpl(vcpu, fmt, ...)  \
> > + do {                                                    \
> > +   if (printk_ratelimit()) \
> > +   printk(KERN_ERR "kvm: %i: cpu%i " fmt,  \
> > +  current->tgid, (vcpu)->vcpu_id , ## __VA_ARGS__); \
> > + } while(0)
> > +
> >  #define kvm_printf(kvm, fmt ...) printk(KERN_DEBUG fmt)
> >  #define vcpu_printf(vcpu, fmt...) kvm_printf(vcpu->kvm, fmt)
> >  
> 
> This converts all KERN_ uses to KERN_ERR.
> It seems better to add a level argument to kvm_printf.
> pr_unimpl is perhaps a poor name choice.
> perhaps vcpu_printk_ratelimit(vcpu, level, fmt, ...)

Possibly, but remember that printk() is an admission of failure.  It's
only useful to developers, and the only reason for printk over
pr_debug() is for users to report to developers when guests crash.

pr_unimpl() means exactly what it says: the guest asked for something we
don't support.  If that turns out to be the last thing in the logs
before a crash, it's a clue.  The rest of the printks should probably
move to pr_debug().
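
To make the pattern concrete, here is a userspace sketch of a
rate-limited "unimplemented" message macro.  It is only an analogue:
ratelimit_ok() is a hypothetical fixed-budget stand-in for
printk_ratelimit(), the emitted counter exists purely so the behaviour
can be observed, and fprintf() replaces printk().  The ", ##__VA_ARGS__"
form is the same GNU extension the patch relies on.

```c
#include <stdio.h>

/* Hypothetical stand-in for printk_ratelimit(): a fixed message budget. */
static int budget = 3;
static int emitted;

static int ratelimit_ok(void)
{
	if (budget <= 0)
		return 0;
	budget--;
	return 1;
}

/* The guest did something we don't support: report it, but not forever. */
#define pr_unimpl(vcpu_id, fmt, ...)					\
	do {								\
		if (ratelimit_ok()) {					\
			emitted++;					\
			fprintf(stderr, "kvm: cpu%i " fmt,		\
				(vcpu_id), ##__VA_ARGS__);		\
		}							\
	} while (0)
```

A guest hammering on the same unsupported feature then produces a
handful of log lines instead of flooding the log, and the last one
before a crash is still there as the clue.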

Hope that helps,
Rusty.



Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Nick Piggin wrote:

> > > So if you argue that vmap is a downside, then please tell me how you
> > > consider the -ENOMEM of your approach to be better?
> >
> > That is again pretty undifferentiated. Are we talking about low page
> 
> In general.

There is no -ENOMEM approach. Lower order page allocation (< 
PAGE_ALLOC_COSTLY_ORDER) will reclaim and in the worst case the OOM killer 
will be activated. That is the nature of the failures that we saw early in 
the year when this was first merged into mm.

> > With the ZONE_MOVABLE you can remove the unmovable objects into a defined
> > pool then higher order success rates become reasonable.
> 
> OK, if you rely on reserve pools, then it is not 1st class support and hence
> it is a non-solution to VM and IO scalability problems.

ZONE_MOVABLE creates two memory pools in a machine. One of them is for 
movable and one for unmovable. That is in 2.6.23. So 2.6.23 has no first 
class support for order 0 pages?

> > > If, by special software layer, you mean the vmap/vunmap support in
> > > fsblock, let's see... that's probably all of a hundred or two lines.
> > > Contrast that with anti-fragmentation, lumpy reclaim, higher order
> > > pagecache and its new special mmap layer... Hmm, seems like a no
> > > brainer to me. You really still want to pursue the "extra layer"
> > > argument as a point against fsblock here?
> >
> > Yes sure. Your code could not live without these approaches. Without the
> 
> Actually: your code is the one that relies on higher order allocations. Now
> you're trying to turn that into an argument against fsblock?

fsblock also needs contiguous pages in order to have a beneficial 
effect that we seem to be looking for.

> > antifragmentation measures your fsblock code would not be very successful
> > in getting the larger contiguous segments you need to improve performance.
> 
> Completely wrong. *I* don't need to do any of that to improve performance.
> Actually the VM is well tuned for order-0 pages, and so seeing as I have
> sane hardware, 4K pagecache works beautifully for me.

Sure the system works fine as is. Not sure why we would need fsblock then.

> > (There is no new mmap layer, the higher order pagecache is simply the old
> > API with set_blocksize expanded).
> 
> Yes you add another layer in the userspace mapping code to handle higher
> order pagecache.

That would imply a new API or something? I do not see it.

> > Why: It is the same approach that you use.
> 
> Again, rubbish.

Ok the logical conclusion from the above is that you think your approach 
is rubbish. Is there some way you could cool down a bit?


Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao
On Mon, 2007-09-17 at 15:01 -0700, Badari Pulavarty wrote:
> On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
> > On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> > > jbd/jbd2: Replace slab allocations with page cache allocations
> > > 
> > > From: Christoph Lameter <[EMAIL PROTECTED]>
> > > 
> > > JBD should not pass slab pages down to the block layer.
> > > Use page allocator pages instead. This will also prepare
> > > JBD for the large blocksize patchset.
> > > 
> > 
> > Currently memory allocation for committed_data(and frozen_buffer) for
> > bufferhead is done through jbd slab management, as Christoph Hellwig
> > pointed out that this is broken as jbd should not pass slab pages down
> > to IO layer. and suggested to use get_free_pages() directly.
> > 
> > The problem with this patch, as Andreas Dilger pointed out today in the ext4
> > interlock call, is that for 1k,2k block size ext2/3/4, get_free_pages() wastes
> > 1/3-1/2 page space. 
> > 
> > What was the original intention in setting up slabs for committed_data (and
> > frozen_buffer) in JBD? Why not use kmalloc?
> > 
> > Mingming
> 
> Looks good. Small suggestion is to get rid of all kmalloc() usages and
> consistently use jbd_kmalloc() or jbd2_kmalloc().
> 
> Thanks,
> Badari
> 

Here is the incremental small cleanup patch. 

Remove kmalloc usages in jbd/jbd2 and consistently use jbd_kmalloc/jbd2_kmalloc.


Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/journal.c  |8 +---
 fs/jbd/revoke.c   |   12 ++--
 fs/jbd2/journal.c |8 +---
 fs/jbd2/revoke.c  |   12 ++--
 4 files changed, 22 insertions(+), 18 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-17 14:32:16.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-17 14:33:59.0 -0700
@@ -723,7 +723,8 @@ journal_t * journal_init_dev(struct bloc
journal->j_blocksize = blocksize;
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -777,7 +778,8 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -1157,7 +1159,7 @@ void journal_destroy(journal_t *journal)
iput(journal->j_inode);
if (journal->j_revoke)
journal_destroy_revoke(journal);
-   kfree(journal->j_wbuf);
+   jbd_kfree(journal->j_wbuf);
jbd_kfree(journal);
 }
 
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-17 14:32:22.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/revoke.c2007-09-17 14:35:13.0 -0700
@@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal->j_revoke->hash_table) {
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
journal->j_revoke = NULL;
@@ -231,7 +231,7 @@ int journal_init_revoke(journal_t *journ
 
journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
if (!journal->j_revoke_table[1]) {
-   kfree(journal->j_revoke_table[0]->hash_table);
+   jbd_kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
return -ENOMEM;
}
@@ -246,9 +246,9 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal->j_revoke->hash_table) {
-   kfree(journal->j_revoke_table[0]->hash_table);
+   jbd_kfree(journal->j_revoke_table[0]->hash_table);
   

Re: [patch 1/8] Immediate Values - Global Modules List and Module Mutex

2007-09-17 Thread Rusty Russell
On Fri, 2007-09-14 at 11:32 -0400, Mathieu Desnoyers wrote:
> * Rusty Russell ([EMAIL PROTECTED]) wrote:
> > Alternatively, if you called it "immediate_init" then the semantics
> > change slightly, but are more obvious (ie. only use this when the value
> > isn't being accessed yet).  But it can't be __init then anyway.
> > 
> 
> I think your idea is good. immediate_init() could be used to update the
> immediate values at boot time _and_ at module load time, and we could
> use an architecture specific arch_immediate_update_init() to support it.

Right.

> As for "when" to use this, it should be used at boot time when
> interrupts are still disabled, still running in UP. It can also be used
> at module load time before any of the module code is executed, as long
> as the module code pages are writable (which they always are, for
> now..). Therefore, the flag seems inappropriate for module load
> arch_immediate_update_init. It cannot be put in __init section either
> though if we use it like this.

I think from a user's POV it would be nice to have a 1:1 mapping with
normal initialization semantics (ie. it will work as long as you don't
access this value until initialized).  And I think this would be the
case.  eg:

int foo_func(void)
{
	if (immediate_read(&some_immediate))
		return 0;
	...
}

int some_init(void)
{
	immediate_init(some_immediate, 0);
	register_foo(foo_func);
	...
}


> > On an unrelated note, did you consider simply IPI-ing and doing the
> > substitution with all CPUs stopped?  If you only updated the immediate
> > references to this particular var, it should be fast enough not to upset
> > the RT guys, even.
> > 
> 
> Yes, I thought about this, but since I use immediate values in the
> kernel markers, which can be put in exception handlers (including nmi,
> mce handler), which cannot be disabled without important side-effects, I
> don't think trying to stop the CPUs is a workable solution.

OK, but can you justify the use of immediates within the nmi or mce
handlers?  They don't strike me as useful candidates for optimization.

> > Well, you can do that in asm without gcc support.  It's a little nasty:
> > since gcc will know nothing about the function call, it can't have side
> > effects which are visible in this function, and you'll have to save and
> > restore *all* regs if you decide to do the function call.  But it's
> > possible (a 5-byte nop gets changed to a call, the call does the pushes
> > and sets the args regs, calls the function, then pops everything and
> > rets).
> 
> GCC support is required if we want to embed inline functions inside
> unlikely branches depending on immediate values (no function call
> there). It also permits passing local variables as arguments to the
> function call (stack setup), which would be tricky, instrumentation site
> specific and non portable if done in assembly.

Well if this is the slow path, you don't want inline anyway.  But it
would be horribly, horribly arch-specific, yes.

Rusty.



Re: [2.6.22.6] nfsd: fh_verify() `malloc failure' with lots of free memory leads to NFS hang

2007-09-17 Thread J. Bruce Fields
On Mon, Sep 17, 2007 at 11:23:46PM +0100, Nix wrote:
> Sep 17 22:57:55 loki warning: kernel: nfsd_dispatch: vers 3 proc 4
> Sep 17 22:57:55 loki warning: kernel: nfsd: ACCESS(3)   36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab 0x1f
> Sep 17 22:57:55 loki warning: kernel: nfsd: fh_verify(36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab)
> Sep 17 22:57:55 loki warning: kernel: nfsd: Dropping request due to malloc failure!
> Sep 17 22:58:50 hades notice: kernel: nfs: server loki not responding, still trying
> Sep 17 22:58:50 hades notice: kernel: nfs: server loki not responding, still trying
> Sep 17 22:58:55 hades notice: kernel: nfs: server loki not responding, still trying
> Sep 17 22:59:40 hades notice: kernel: nfs: server loki not responding, still trying
> 
> 
> From then on, *every* fh_verify() request fails the same way, and
> obviously if you can't verify any fds you can't do much with NFS.
> 
> Looking back in the log I see intermittent malloc failures starting
> almost as soon as I've booted (allowing a couple of minutes for me to
> turn debugging on):
> 
> Sep 17 22:25:50 hades notice: kernel: nfs: server loki OK
> [...]
> Sep 17 22:28:09 loki warning: kernel: nfsd_dispatch: vers 3 proc 19
> Sep 17 22:28:09 loki warning: kernel: nfsd: FSINFO(3)   28: 00070001 000fb001  d32ff38f 404811a6 a88d96ab
> Sep 17 22:28:09 loki warning: kernel: nfsd: fh_verify(28: 00070001 000fb001  d32ff38f 404811a6 a88d96ab)
> Sep 17 22:28:09 loki warning: kernel: nfsd: Dropping request due to malloc failure!
> A while later we start seeing runs of malloc failures, which I think
> correlated with the unexplained pauses in NFS response:

Actually, they're nothing to do with malloc failures--the message
printed here is misleading, and isn't even an error; it gets printed
whenever an upcall to mountd is made.  The problem is almost certainly a
problem with kernel<->mountd communication--the kernel depends on mountd
to answer questions about exported filesystems as part of the fh_verify
code.

It's just a shot in the dark, but you might try the latest nfs-utils
(get the latest out of git://linux-nfs.org/nfs-utils if you're already
on the most recent your distro will give you).  Or just apply the
following--which did fix a problem whose symptoms varied depending on
libc behavior.

If that doesn't work, I'd try

strace -s0 -p `pidof rpc.mountd`

and also look at the contents of /proc/net/rpc/nfsd.fh/content.

--b.

commit dd087896285da9e160e13ee9f7d75381b67895e3
Author: J. Bruce Fields <[EMAIL PROTECTED]>
Date:   Thu Jul 26 16:30:46 2007 -0400

Use __fpurge to ensure single-line writes to cache files

On a recent Debian/Sid machine, I saw libc retrying stdio writes that
returned write errors.  The result is that if an export downcall returns
an error (which it can in normal operation, since it currently
(incorrectly) returns -ENOENT on any negative downcall), then subsequent
downcalls will write multiple lines (including the original line that
received the error).

The result is that the server fails to respond to any rpc call that
refers to an unexported mount point (such as a readdir of a directory
containing such a mountpoint), so client commands hang.

I don't know whether this libc behavior is correct or expected, but it
seems safest to add the __fpurge() (suggested by Neil) to ensure data is
thrown away.

Signed-off-by: "J. Bruce Fields" <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

diff --git a/support/nfs/cacheio.c b/support/nfs/cacheio.c
index a76915b..9d271cd 100644
--- a/support/nfs/cacheio.c
+++ b/support/nfs/cacheio.c
@@ -17,6 +17,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -111,7 +112,18 @@ void qword_printint(FILE *f, int num)
 
 int qword_eol(FILE *f)
 {
+   int err;
+
fprintf(f,"\n");
+   err = fflush(f);
+   /*
+* We must send one line (and one line only) in a single write
+* call.  In case of a write error, libc may accumulate the
+* unwritten data and try to write it again later, resulting in a
+* multi-line write.  So we must explicitly ask it to throw away
+* any such cached data:
+*/
+   __fpurge(f);
return fflush(f);
 }
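
For anyone who wants to try the idea outside of mountd, here is a minimal
standalone sketch of the same pattern.  It is not the nfs-utils code:
write_one_line() is a made-up name, and it assumes glibc, since __fpurge()
is a glibc extension declared in <stdio_ext.h>.

```c
#include <stdio.h>
#include <stdio_ext.h>	/* glibc extension: declares __fpurge() */

/*
 * Hypothetical helper, not part of nfs-utils: write exactly one line,
 * then discard whatever stdio could not push out, so that a failed line
 * is never prepended to the next one.  Returns 0 on success, EOF on error.
 */
static int write_one_line(FILE *f, const char *line)
{
	int err;

	fprintf(f, "%s\n", line);
	err = fflush(f);	/* try to send the line in a single write() */
	__fpurge(f);		/* drop any data left over after a failure */
	return err;
}
```

On a healthy stream the purge has nothing to discard; it only matters after
a failed flush, when the buffer may still hold the rejected line.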
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH mm] fix swapoff breakage; however...

2007-09-17 Thread Hugh Dickins
On Tue, 18 Sep 2007, Balbir Singh wrote:
> Hugh Dickins wrote:
> > 
> > What would make sense is (what I meant when I said swap counted
> > along with RSS) not to count pages out and back in as they
> > go out to swap and back in, just keep count of instantiated pages
> > 
> 
> I am not sure how you define instantiated pages. I suspect that
> you mean RSS + pages swapped out (swap_pte)?

That's it.  (Whereas file pages are counted out when paged out,
then counted back in when paged back in.)

> If a swapoff is going to push a container over its limit, then
> we break the container and the isolation it provides.

Is it just my traditional bias that makes me prefer you break
your container rather than my swapoff?  I'm not sure.

> Upon swapoff
> failure, maybe we could get the container to print a nice
> little warning so that anyone else with CAP_SYS_ADMIN can fix the
> container limit and retry swapoff.

And then they hit the next one... rather like trying to work out
the dependencies of packages for oneself: a very tedious process.

If the swapoff succeeds, that does mean there was actually room
in memory (+ other swap) for everyone, even if some have gone over
their nominal limits.  (But if the swapoff runs out of memory in
the middle, yes, it might well have assigned the memory unfairly.)

The appropriate answer may depend on what you do when a container
tries to fault in one more page than its limit.  Apparently just
fail it (no attempt to page out another page from that container).

So, if the whole system is under memory pressure, kswapd will
be keeping the RSS of all tasks low, and they won't reach their
limits; whereas if the system is not under memory pressure,
tasks will easily approach their limits and so fail.

Please tell me my understanding is wrong!

Hugh


Re: [PATCH] 2.6.23-rc6: Fix NUMA Memory Policy Reference Counting

2007-09-17 Thread Andi Kleen

> Handling policy ref counts for hugepages is a bit trickier.
> huge_zonelist() returns a zone list that might come from a 
> shared or vma 'BIND policy.  In this case, we should hold the
> reference until after the huge page allocation in 
> dequeue_hugepage().  The patch modifies huge_zonelist() to
> return a pointer to the mempolicy if it needs to be unref'd
> after allocation.

Acked-by: Andi Kleen <[EMAIL PROTECTED]>

Andrew, can you please queue that for .23?

-Andi


Re: [PATCH] 2.6.23-rc6: Fix NUMA Memory Policy Reference Counting

2007-09-17 Thread Andi Kleen

> The patch does require concurrent increments and decrements in the main 
> fault path. The potential is to create another bouncing cacheline for 
> concurrent faults. This looks like it would cause a performance issue.

While that may be true, correctness is always more important than performance.
So I think this is the right thing for .23. Any performance improvements
if needed can come later.

-Andi


[2.6.22.6] nfsd: fh_verify() `malloc failure' with lots of free memory leads to NFS hang

2007-09-17 Thread Nix
Back in early 2006 I reported persistent hangs on the NFS server,
whereby all of a sudden about ten minutes after boot my primary NFS
server would cease responding to NFS requests until it was rebooted.


That time, the problem vanished when I switched to NFS-over-TCP:


Well, I just rebooted --- post-glibc-upgrade from 2.5 to 2.6.1, no
kernel upgrade or anything, so this bug has been latent during at
least my last three weeks of uptime. And it's back. (I've been
seeing strange long pauses doing things like ls, and I suspect
they are related: see below.)

/proc/fs/nfsd/exports on the freezing server:

# Version 1.1
# Path Client(Flags) # IPs
/usr/packages.bin/non-free  hades.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,uuid=90a98d8a:a8be4806:aea3a4e1:fe3437a0)
/home/.loki.wkstn.nix  esperi.srvr.nix(rw,root_squash,async,wdelay,no_subtree_check,uuid=8ff32fd3:a6114840:ab968da8:25b41721)
/home/.loki.wkstn.nix  hades.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,uuid=8ff32fd3:a6114840:ab968da8:25b41721)
/usr/lib/X11/fonts  hades.wkstn.nix(ro,root_squash,async,wdelay,uuid=87553c5b:d84740fc:b7d9f7e8:4f749689)
/usr/share/xplanet  hades.wkstn.nix(ro,root_squash,async,wdelay,uuid=87553c5b:d84740fc:b7d9f7e8:4f749689)
/usr/share/xemacs  hades.wkstn.nix(rw,no_root_squash,async,wdelay,uuid=87553c5b:d84740fc:b7d9f7e8:4f749689)
/usr/packages  hades.wkstn.nix(rw,no_root_squash,async,wdelay,no_subtree_check,uuid=2a35f82a:cca144df:a1123587:23527f53)

I turned on ALL-class nfsd debugging and here's what I see as it freezes
up:

Sep 17 22:57:00 loki warning: kernel: nfsd_dispatch: vers 3 proc 1
Sep 17 22:57:00 loki warning: kernel: nfsd: GETATTR(3)  36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab
Sep 17 22:57:00 loki warning: kernel: nfsd: fh_verify(36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab)
Sep 17 22:57:44 loki warning: kernel: nfsd_dispatch: vers 3 proc 4
Sep 17 22:57:44 loki warning: kernel: nfsd: ACCESS(3)   36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab 0x1f
Sep 17 22:57:44 loki warning: kernel: nfsd: fh_verify(36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab)
Sep 17 22:57:45 loki warning: kernel: nfsd: Dropping request due to malloc failure!
Sep 17 22:57:52 loki warning: kernel: nfsd_dispatch: vers 3 proc 4
Sep 17 22:57:52 loki warning: kernel: nfsd: ACCESS(3)   36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab 0x1f
Sep 17 22:57:52 loki warning: kernel: nfsd: fh_verify(36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab)
Sep 17 22:57:52 loki warning: kernel: nfsd: Dropping request due to malloc failure!
Sep 17 22:57:55 loki warning: kernel: nfsd_dispatch: vers 3 proc 4
Sep 17 22:57:55 loki warning: kernel: nfsd: ACCESS(3)   36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab 0x1f
Sep 17 22:57:55 loki warning: kernel: nfsd: fh_verify(36: 01070001 000fb001  d32ff38f 404811a6 a88d96ab)
Sep 17 22:57:55 loki warning: kernel: nfsd: Dropping request due to malloc failure!
Sep 17 22:58:50 hades notice: kernel: nfs: server loki not responding, still trying
Sep 17 22:58:50 hades notice: kernel: nfs: server loki not responding, still trying
Sep 17 22:58:55 hades notice: kernel: nfs: server loki not responding, still trying
Sep 17 22:59:40 hades notice: kernel: nfs: server loki not responding, still trying


From then on, *every* fh_verify() request fails the same way, and
obviously if you can't verify any fds you can't do much with NFS.

Looking back in the log I see intermittent malloc failures starting
almost as soon as I've booted (allowing a couple of minutes for me to
turn debugging on):

Sep 17 22:25:50 hades notice: kernel: nfs: server loki OK
[...]
Sep 17 22:28:09 loki warning: kernel: nfsd_dispatch: vers 3 proc 19
Sep 17 22:28:09 loki warning: kernel: nfsd: FSINFO(3)   28: 00070001 000fb001  d32ff38f 404811a6 a88d96ab
Sep 17 22:28:09 loki warning: kernel: nfsd: fh_verify(28: 00070001 000fb001  d32ff38f 404811a6 a88d96ab)
Sep 17 22:28:09 loki warning: kernel: nfsd: Dropping request due to malloc failure!

A while later we start seeing runs of malloc failures, which I think
correlated with the unexplained pauses in NFS response:

Sep 17 22:33:59 loki warning: kernel: nfsd_dispatch: vers 3 proc 6
Sep 17 22:33:59 loki warning: kernel: nfsd: READ(3) 44: 02070001 0001ce75  5b3c5587 fc4047d8 e8f7d9b7 20480 bytes at 4096
Sep 17 22:33:59 loki warning: kernel: nfsd: fh_verify(44: 02070001 0001ce75  5b3c5587 fc4047d8 e8f7d9b7)
Sep 17 22:33:59 loki warning: kernel: nfsd: Dropping request due to malloc failure!
Sep 17 22:33:59 loki warning: kernel: nfsd_dispatch: vers 3 proc 6
Sep 17 22:33:59 loki warning: kernel: nfsd: READ(3) 44: 02070001 0001ce75  5b3c5587 fc4047d8 e8f7d9b7 28672 bytes at 53248
Sep 17 22:33:59 loki 

Add all thread stats for TASKSTATS_CMD_ATTR_TGID (v5)

2007-09-17 Thread Guillaume Chazarain
TASKSTATS_CMD_ATTR_TGID used to return only the delay accounting stats, not
the basic and extended accounting.  With this patch,
TASKSTATS_CMD_ATTR_TGID also aggregates the accounting info for all threads
of a thread group.  This makes TASKSTATS_CMD_ATTR_TGID usable in a similar
fashion to TASKSTATS_CMD_ATTR_PID, for commands like iotop -P
(http://guichaz.free.fr/misc/iotop.py).

Changelog since V4 (http://lkml.org/lkml/2007/9/15/171):
- Revert gratuitous user interface change (returning exit_code >> 8 instead of
exit_code). Thanks Oleg Nesterov.
- Revert useless heavyweight locking (lock_task_sighand() in fill_tgid_exit).
Thanks Oleg.
- Correctly fill the TGID in taskstats_exit(). Thanks Oleg.

Changelog since V3 (http://lkml.org/lkml/2007/8/31/121):
- Removed userspace example, either it gets accepted in util-linux-ng or I'll
maintain it elsewhere.
- Added kerneldoc for fill_threadgroup() and add_tsk().
- Removed useless {get,put}_task_struct(leader) as spotted by Andrew Morton
and Oleg Nesterov.
- Use lock_task_sighand() instead of spin_lock_irqsave(>sighand->siglock)
for consistency with the locking of task->signal->stats in fill_tgid().
- Removed useless check for a NULL taskstats in fill_tgid_exit(). Thanks Oleg.
- Documented double accounting race seen by Oleg.
- Rephrased the fill_tgid_exit() comment as per Oleg's recommendation.
- Documented the special case for the AFORK ac_flag.
- Use the exit status (code >> 8) instead of the exit code as documented in
Documentation/accounting/taskstats-struct.txt.
- Use signal->group_exit_code if set for stats->ac_exitcode on a TGID as
suggested by Oleg.

Changelog since V2 (http://lkml.org/lkml/2007/8/19/96):
- Added a testcase
- Added an indirection between the stats producer and consumer:
add_task() & fill_threadgroup()
- TGID stats are either summed from all the threads or taken from the leader

Changelog since V1 (http://lkml.org/lkml/2007/8/2/185):
- Update combined stats of exited threads in fill_tgid_exit() as
suggested by Balbir Singh.
- Very light cleanup of fill_tgid_exit() by the way.
- bacct fields are also combined for all threads.
- Instead of assuming memory stats are identical for all threads, we
take the max of all threads.
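
As a rough illustration of the two combining rules in the last two items
(sum what is additive, take the maximum of what is shared), here is a
self-contained sketch.  struct thread_sample and aggregate_group() are
invented for the example; they are not the kernel's struct taskstats or the
functions touched by this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for per-thread accounting. */
struct thread_sample {
	uint64_t cpu_time_ns;		/* additive across threads */
	uint64_t hiwater_rss_kb;	/* shared address space: not additive */
};

/*
 * Fold every thread of a group into one total: sum the additive counter,
 * keep the maximum of the memory high-water mark.
 */
static struct thread_sample aggregate_group(const struct thread_sample *t,
					    size_t n)
{
	struct thread_sample total = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		total.cpu_time_ns += t[i].cpu_time_ns;
		if (t[i].hiwater_rss_kb > total.hiwater_rss_kb)
			total.hiwater_rss_kb = t[i].hiwater_rss_kb;
	}
	return total;
}
```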

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Cc: Balbir Singh <[EMAIL PROTECTED]>
Cc: Jay Lan <[EMAIL PROTECTED]>
Cc: Jonathan Lim <[EMAIL PROTECTED]>
Cc: Oleg Nesterov <[EMAIL PROTECTED]>
---

 include/linux/tsacct_kern.h |   12 ++-
 kernel/taskstats.c  |  135 +-
 kernel/tsacct.c |  113 
 3 files changed, 159 insertions(+), 101 deletions(-)

diff -r 2908770b8fc2 include/linux/tsacct_kern.h
--- a/include/linux/tsacct_kern.h   Sun Sep 16 22:24:49 2007 -0700
+++ b/include/linux/tsacct_kern.h   Tue Aug 28 20:35:27 2007 +0200
@@ -10,17 +10,23 @@
 #include 
 
 #ifdef CONFIG_TASKSTATS
-extern void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk);
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
 #else
-static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
+static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
+static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *task)
 {}
 #endif /* CONFIG_TASKSTATS */
 
 #ifdef CONFIG_TASK_XACCT
-extern void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
+void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
 extern void acct_update_integrals(struct task_struct *tsk);
 extern void acct_clear_integrals(struct task_struct *tsk);
 #else
+static inline void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
 static inline void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 {}
 static inline void acct_update_integrals(struct task_struct *tsk)
diff -r 2908770b8fc2 kernel/taskstats.c
--- a/kernel/taskstats.cSun Sep 16 22:24:49 2007 -0700
+++ b/kernel/taskstats.cMon Sep 17 22:55:04 2007 +0200
@@ -168,6 +168,68 @@ static void send_cpu_listeners(struct sk
up_write(&listeners->sem);
 }
 
+/**
+ * fill_threadgroup - initialize some common stats for the thread group
+ * @stats: the taskstats to write into
+ * @task: the thread representing the whole group
+ *
+ * There are two types of taskstats fields when considering a thread group:
+ * - those that can be aggregated from each thread in the group (like CPU
+ * times),
+ * - those that cannot be aggregated (like UID) or are identical (like
+ * memory usage), so are taken from the group leader.
+ * XXX_add_tsk() methods deal with the first type while XXX_fill_threadgroup()
+ * with the second.
+ */
+static void fill_threadgroup(struct taskstats *stats, struct 

Re: [PATCH 1/3] IB/ehca: Fix large page HW cap defines

2007-09-17 Thread Roland Dreier
obviously OK...applied.


Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-17 Thread Roland Dreier
 > The IGMP enabling patch posted by me on September 2nd isn't on your list
 > http://lists.openfabrics.org/pipermail/general/2007-September/040250.html
 > can you add it?

Yes, I lost that somehow.  I will add it to my list of things to take
a look at (no opinion yet).

 - R.


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Mon, 17 Sep 2007, Bernd Schmidt wrote:

> Christoph Lameter wrote:
> > True. That is why we want to limit the number of unmovable allocations and
> > that is why ZONE_MOVABLE exists to limit those. However, unmovable
> > allocations are already rare today. The overwhelming majority of allocations
> > are movable and reclaimable. You can see that f.e. by looking at
> > /proc/meminfo and see how high SUnreclaim: is (does not catch everything but
> > it's a good indicator).
> 
> Just to inject another factor into the discussion, please remember that Linux
> also runs on nommu systems, where things like user space allocations are
> neither movable nor reclaimable.

Hmmm However, sorting of the allocations would result in avoiding 
defragmentation to some degree?
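
In case it helps with checking the SUnreclaim figure mentioned in the quoted
text, here is a small sketch that pulls one value out of /proc/meminfo-style
text.  meminfo_kb() is a hypothetical helper, and it assumes the usual
"Key:   value kB" line layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "key" at the start of a line in
 * /proc/meminfo-style text and return its value in kB, or -1 if the
 * key is not present.
 */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *p = text;

	while (p && *p) {
		long val;

		if (strncmp(p, key, klen) == 0 &&
		    sscanf(p + klen, " %ld", &val) == 1)
			return val;
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	return -1;
}
```

Read /proc/meminfo into a buffer and call meminfo_kb(buf, "SUnreclaim:");
comparing the result with "Slab:" gives a rough feel for how much slab
memory cannot be reclaimed.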



Re: Wasting our Freedom

2007-09-17 Thread Ingo Schwarze
Adrian Bunk wrote on Mon, Sep 17, 2007 at 02:57:14PM +0200:

> But stating in your licence that no one has to give back but then 
> complaining to some people on ethical grounds that they should give
> back is simply dishonest.
> 
> Is your intention to allow people to include your code into GPL'ed code 
> and never give back, or is your intention that this shouldn't happen?
> 
> And whatever your intention is should be stated in your licence.

As this is a recurring argument in the present discussion, let's
address it, even though it lies somewhat beside the main topic.
What i wish and what i try to enforce by legal contracts are two
completely different things.  In particular, it is _not_ a smart
idea to try to enforce all one's wishes by legal means.

For example, i wish that as much as possible of the code i write be
freely available such that others can use it, too, and i wish that
others write useful code and make it free such that i can use it.
When i publish code, i wish bugfixes to be fed back to me, and i
hope that others might free their derivative works, too.  Besides,
i might hope that people at large behave in human and rational ways
and refrain from doing harm to others.  In particular i might wish
the fruits of my work not to be abused to harm or oppress people.
Quite probably, lots of software developers share similar wishes,
whatever licenses they happen to be employing.

But this doesn't imply i should be putting any of the above into
the license for my code.  Once people attach additional conditions
to their licences, sooner or later i get stuck when trying to
combine different code covered by different licences.  However well
intentioned, in practice, those additional conditions habitually
turn out to be incompatible - even when, regarded separately, all
of them might appear to make some sense.

Now doubtless, the two main additional conditions imposed by the GPL -
derivative works may only be distributed if they are made as open and
as free as the original - are among those making the most sense of all
the additional conditions you might imagine, in the sense that nearly
any developer of free software will wish that anybody holding the
copyright on a derivative work would make that free.  Still, when
trying to combine code with different licences, even the GPL at times
turns out to be a bother.  This does not only apply to the case of
non-free closed-source commercial code, but also to cases where
authors intended to make their code free, but, be it by inexperience
or because they failed to restrain themselves, unfortunately added
some uncommon condition to the license.  Combining such code with ISC
or BSD code is hardly ever a problem, combining such code with GPL code
may well be.

Thus, even when wishing derivative works to be free in their turn,
i still see a strong theoretical and a strong practical argument to
choose the ISC license over the GPL: Theoretically, it's just the
categorical imperative: If everybody would be adding her or his
favorite condition to her or his license, we would not end up in
free software, but in chaos.  Practically, i'm quite fed up with
GPL license incompatibility issues always popping up at the most
inconvenient places, and still more, with all those license
compatibility discussions.  With the ISC license, there are no
incompatibility issues and no incompatibility discussions, it just
works.  Of course, i lose the option to sue people to open up
derivative works, but i keep the hope that some people (especially
those engaged in free software themselves) understand and keep up
the spirit, and above all, i avoid lots of legalese worries.
Ultimately, it's kind of a trade-off.

To summarize, there are valid reasons to wish that people would make
derivative works free, but to not require it in the license.  Just
like there are valid reasons to wish that people should not use the
code for waging war, but to not require that in the license.


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Nick Piggin wrote:

> > > fsblock doesn't need any of those hacks, of course.
> >
> > Nor does mine for the low orders that we are considering. For order >
> > MAX_ORDER this is unavoidable since the page allocator cannot manage such
> > large pages. It can be used for lower order if there are issues (that I
> > have not seen yet).
> 
> Or we can just avoid all doubt (and not have arbitrary limitations
> according to what you think might be reasonable or how well the
> system actually behaves).

We can avoid all doubt in this patchset as well by adding support for 
fallback to a vmalloced compound page.



Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Jörn Engel wrote:

> I bet!  My (false) assumption was the same as Goswin's.  If non-movable
> pages are clearly separated from movable ones and will evict movable
> ones before polluting further mixed superpages, Nick's scenario would be
> nearly infinitely impossible.
> 
> Assumption doesn't reflect current code.  Enforcing this assumption
> would cost extra overhead.  The amount of effort to make Christoph's
> approach work reliably seems substantial and I have no idea whether it
> would be worth it.

My approach is based on Mel's code and is already working the way you 
describe. Page cache allocs are marked __GFP_MOVABLE by Mel's work.



Re: Wasting our Freedom

2007-09-17 Thread Can E. Acar
Theodore Tso wrote:
> On Mon, Sep 17, 2007 at 09:23:41PM +0200, Claudio Jeker wrote:
>> Because they put their copyright plus license on code that they barely
>> modified. If they would have added substantial work into the OpenHAL code
>> and by doing that creating something new I would not say much.
> 
> Number 1, some of the Linux wireless developers screwed up earlier
> versions.  No denying that, the problems were pointed out during the
> patch review process, AND THEY WERE FIXED.

Not all, see below:

> Number 2, if you take a look at their latest set of changes (which
> have still not been accepted), the HAL code is under a pure BSD
> license (ath5k_hw.c).  Other portions are dual licensed, but not the
> HAL --- if people would only take a look at
> 
> http://git.kernel.org/?p=linux/kernel/git/linville/wireless-dev.git;a=tree;f=drivers/net/wireless;h=2d6caeba0924c34b9539960b9ab568ab3d193fc8;hb=everything
> 

from latest ath5k_hw.c:

* Copyright (c) 2004-2007 Reyk Floeter <[EMAIL PROTECTED]>
* Copyright (c) 2006-2007 Nick Kossifidis <[EMAIL PROTECTED]>
* Copyright (c) 2007 Jiri Slaby <[EMAIL PROTECTED]>
[snip rest of BSD license]

The only remaining issue is whether Nick & Jiri have enough
original contributions to the code to be added to the Copyright.

I believe this needs to be resolved between Reyk and Nick and Jiri.

The main reason of Theo's message, linked earlier, was the
lack of response on this issue. It seems that the SFLC is
dismissing this issue, and thus stalling its resolution by the
developers.

The rest is, as you say, history.

Can

-- 
In theory, there is no difference between theory and practice.
But, in practice, there is.


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Christoph Lameter
On Sun, 16 Sep 2007, Nick Piggin wrote:

> I don't know how it would prevent fragmentation from building up
> anyway. It's commonly the case that potentially unmovable objects
> are allowed to fill up all of ram (dentries, inodes, etc).

Not in 2.6.23 with ZONE_MOVABLE. Unmovable objects are not allocated from 
ZONE_MOVABLE and thus the memory that can be allocated for them is 
limited.





Re: [PATCH] Userspace tuner

2007-09-17 Thread Bill Davidsen

Dâniel Fraga wrote:

> Well, I'd like to see Linus' opinion about this, because while
> programmers keep discussing this, users are waiting forever... so if
> Markus has a concrete and better solution, why not use it?
> 
> And as far as I know, Markus is the programmer who is most
> interested in this code. I didn't see anybody else in the world doing
> his work...
> 
> And I always had an impression that if most things could be
> done in user space, then it would be better (for example, devfs -> udev).
> Why do everything in kernel space? Let's put *less* code in the kernel,
> not more code. And besides that, code in user space can be changed
> easily. Code in kernel has to wait a long time for Linus to accept (*if*
> he accepts).

The problem with user space drivers is that they encourage binary-only 
drivers, drivers which work only for a limited set of hardware, and 
other means to reduce choice for the user. There's a reason why binary 
modules make the kernel tainted; I have to feel that this is more and 
worse of the same.


Linus will have an opinion, no doubt.

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Badari Pulavarty
On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
> On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> > jbd/jbd2: Replace slab allocations with page cache allocations
> > 
> > From: Christoph Lameter <[EMAIL PROTECTED]>
> > 
> > JBD should not pass slab pages down to the block layer.
> > Use page allocator pages instead. This will also prepare
> > JBD for the large blocksize patchset.
> > 
> 
> Currently memory allocation for committed_data (and frozen_buffer) for
> bufferhead is done through jbd slab management. Christoph Hellwig
> pointed out that this is broken, as jbd should not pass slab pages down
> to the IO layer, and suggested using get_free_pages() directly.
> 
> The problem with this patch, as Andreas Dilger pointed out today in the
> ext4 interlock call, is that for 1k and 2k block size ext2/3/4,
> get_free_pages() wastes 1/3-1/2 page space. 
> 
> What was the original intention of setting up slabs for committed_data
> (and frozen_buffer) in JBD? Why not use kmalloc?
> 
> Mingming

Looks good. Small suggestion is to get rid of all kmalloc() usages and
consistently use jbd_kmalloc() or jbd2_kmalloc().

Thanks,
Badari



Re: [PATCH] selinux: Improving SELinux read/write performance

2007-09-17 Thread James Morris
On Mon, 17 Sep 2007, Stephen Smalley wrote:

> > It reduces the selinux overhead on read/write by only revalidating
> > permissions in selinux_file_permission if the task or inode labels have
> > changed or the policy has changed since the open-time check.  A new LSM
> > hook, security_dentry_open, is added to capture the necessary state at
> > open time to allow this optimization.
> > 
> > Signed-off-by: Yuichi Nakamura<[EMAIL PROTECTED]>
> 
> Thanks, looks good.
> 
> Acked-by:  Stephen Smalley <[EMAIL PROTECTED]>

Applied to 
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-akpm


-- 
James Morris
<[EMAIL PROTECTED]>


RE: [kvm-devel] [PATCH 2/3] Refactor hypercall infrastructure (v3)

2007-09-17 Thread Nakajima, Jun
Anthony Liguori wrote:
> This patch refactors the current hypercall infrastructure to better support
> live migration and SMP.  It eliminates the hypercall page by trapping the UD
> exception that would occur if you used the wrong hypercall instruction for the
> underlying architecture and replacing it with the right one lazily.
> 
> It also introduces the infrastructure to probe for hypercall available via
> CPUID leaves 0x40000000.  CPUID leaf 0x40000001 should be filled out by
> userspace.
> 
> A fall-out of this patch is that the unhandled hypercalls no longer trap to
> userspace.  There is very little reason though to use a hypercall to
> communicate with userspace as PIO or MMIO can be used.  There is no code in
> tree that uses userspace hypercalls.
> 
> Signed-off-by: Anthony Liguori <[EMAIL PROTECTED]>
> 
> diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
> index 3b29256..cc5dfb4 100644
> --- a/include/linux/kvm_para.h
> +++ b/include/linux/kvm_para.h
> @@ -1,73 +1,110 @@
>  #ifndef __LINUX_KVM_PARA_H
>  #define __LINUX_KVM_PARA_H
> 
> -/*
> - * Guest OS interface for KVM paravirtualization
> - *
> - * Note: this interface is totally experimental, and is certain to change
> - *   as we make progress.
> +/* This CPUID returns the signature 'KVMKVMKVM' in ebx, ecx, and edx.  It
> + * should be used to determine that a VM is running under KVM.



> +
> +static inline int kvm_para_available(void)
> +{
> + unsigned int eax, ebx, ecx, edx;
> + char signature[13];
> +
> + cpuid(KVM_CPUID_SIGNATURE, &eax, &ebx, &ecx, &edx);
> + memcpy(signature + 0, &ebx, 4);
> + memcpy(signature + 4, &ecx, 4);
> + memcpy(signature + 8, &edx, 4);
> + signature[12] = 0;
> +
> + if (strcmp(signature, "KVMKVMKVM") == 0)

> + return 1;
> +
> + return 0;
> +}

I think we should compare 12 characters (not just 9, as far as my eyes
tell), and can we use some cute one, like "FantasticKVM"? ;-)
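
To make the 12-versus-9 point concrete, here is a userspace sketch of a
comparison that covers all 12 register bytes.  It assumes the signature is
"KVMKVMKVM" padded with NULs to 12 bytes, which is an assumption based on
the quoted comment rather than something the patch states;
kvm_signature_matches() is a made-up name, and the registers are passed in
instead of being read with CPUID.

```c
#include <string.h>

/*
 * Hypothetical stand-in for the check under review.  Assumes unsigned int
 * is 4 bytes wide, as the quoted memcpy() calls do.
 */
static int kvm_signature_matches(unsigned int ebx, unsigned int ecx,
				 unsigned int edx)
{
	/* 9 characters; the remaining 3 bytes are zero-filled by C. */
	static const char expected[12] = "KVMKVMKVM";
	char signature[12];

	memcpy(signature + 0, &ebx, 4);
	memcpy(signature + 4, &ecx, 4);
	memcpy(signature + 8, &edx, 4);

	/* Compare all 12 register bytes, not just up to the first NUL. */
	return memcmp(signature, expected, sizeof(signature)) == 0;
}
```

With strcmp(), anything after the first NUL would go unnoticed; memcmp()
over the full 12 bytes rejects it.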

Jun
---
Intel Open Source Technology Center


[PATCH 11/11] eCryptfs: Replace magic numbers

2007-09-17 Thread Michael Halcrow
Replace some magic numbers with sizeof() equivalents.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/crypto.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 3b3cf27..425a144 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1426,10 +1426,10 @@ static int parse_header_metadata(struct ecryptfs_crypt_stat *crypt_stat,
u32 header_extent_size;
u16 num_header_extents_at_front;
 
-   memcpy(&header_extent_size, virt, 4);
+   memcpy(&header_extent_size, virt, sizeof(u32));
header_extent_size = be32_to_cpu(header_extent_size);
-   virt += 4;
-   memcpy(&num_header_extents_at_front, virt, 2);
+   virt += sizeof(u32);
+   memcpy(&num_header_extents_at_front, virt, sizeof(u16));
num_header_extents_at_front = be16_to_cpu(num_header_extents_at_front);
crypt_stat->num_header_extents_at_front =
(int)num_header_extents_at_front;
-- 
1.5.1.6
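
For readers outside the kernel tree, the same parsing step can be sketched
in userspace.  This is only an analogue: parse_header() is a hypothetical
name, uint32_t/uint16_t stand in for u32/u16, and ntohl()/ntohs() stand in
for be32_to_cpu()/be16_to_cpu().

```c
#include <arpa/inet.h>	/* ntohl(), ntohs(): big-endian to host order */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical analogue of the hunk above: pull a big-endian 32-bit value
 * and a big-endian 16-bit value out of a byte buffer, letting sizeof()
 * drive both the copy length and the cursor advance.
 */
static void parse_header(const unsigned char *virt,
			 uint32_t *extent_size, uint16_t *num_extents)
{
	uint32_t be_size;
	uint16_t be_num;

	memcpy(&be_size, virt, sizeof(be_size));
	virt += sizeof(be_size);
	memcpy(&be_num, virt, sizeof(be_num));

	*extent_size = ntohl(be_size);
	*num_extents = ntohs(be_num);
}
```

Using sizeof() on the destination variable, rather than on a type name,
keeps the copy length right even if the variable's type changes later.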



[PATCH 10/11] eCryptfs: Remove unused functions and kmem_cache

2007-09-17 Thread Michael Halcrow
The switch to read_write.c routines and the persistent file make a
number of functions unnecessary. This patch removes them.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/crypto.c  |  150 --
 fs/ecryptfs/ecryptfs_kernel.h |   21 +---
 fs/ecryptfs/file.c|   28 
 fs/ecryptfs/main.c|5 -
 fs/ecryptfs/mmap.c|  336 -
 5 files changed, 1 insertions(+), 539 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index b3014d7..3b3cf27 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -353,119 +353,6 @@ out:
return rc;
 }
 
-static void
-ecryptfs_extent_to_lwr_pg_idx_and_offset(unsigned long *lower_page_idx,
-int *byte_offset,
-struct ecryptfs_crypt_stat *crypt_stat,
-unsigned long extent_num)
-{
-   unsigned long lower_extent_num;
-   int extents_occupied_by_headers_at_front;
-   int bytes_occupied_by_headers_at_front;
-   int extent_offset;
-   int extents_per_page;
-
-   bytes_occupied_by_headers_at_front =
-   (crypt_stat->extent_size
-* crypt_stat->num_header_extents_at_front);
-   extents_occupied_by_headers_at_front =
-   ( bytes_occupied_by_headers_at_front
- / crypt_stat->extent_size );
-   lower_extent_num = extents_occupied_by_headers_at_front + extent_num;
-   extents_per_page = PAGE_CACHE_SIZE / crypt_stat->extent_size;
-   (*lower_page_idx) = lower_extent_num / extents_per_page;
-   extent_offset = lower_extent_num % extents_per_page;
-   (*byte_offset) = extent_offset * crypt_stat->extent_size;
-   ecryptfs_printk(KERN_DEBUG, " * crypt_stat->extent_size = "
-   "[%d]\n", crypt_stat->extent_size);
-   ecryptfs_printk(KERN_DEBUG, " * crypt_stat->"
-   "num_header_extents_at_front = [%d]\n",
-   crypt_stat->num_header_extents_at_front);
-   ecryptfs_printk(KERN_DEBUG, " * extents_occupied_by_headers_at_"
-   "front = [%d]\n", extents_occupied_by_headers_at_front);
-   ecryptfs_printk(KERN_DEBUG, " * lower_extent_num = [0x%.16x]\n",
-   lower_extent_num);
-   ecryptfs_printk(KERN_DEBUG, " * extents_per_page = [%d]\n",
-   extents_per_page);
-   ecryptfs_printk(KERN_DEBUG, " * (*lower_page_idx) = [0x%.16x]\n",
-   (*lower_page_idx));
-   ecryptfs_printk(KERN_DEBUG, " * extent_offset = [%d]\n",
-   extent_offset);
-   ecryptfs_printk(KERN_DEBUG, " * (*byte_offset) = [%d]\n",
-   (*byte_offset));
-}
-
-static int ecryptfs_write_out_page(struct ecryptfs_page_crypt_context *ctx,
-  struct page *lower_page,
-  struct inode *lower_inode,
-  int byte_offset_in_page, int bytes_to_write)
-{
-   int rc = 0;
-
-   if (ctx->mode == ECRYPTFS_PREPARE_COMMIT_MODE) {
-   rc = ecryptfs_commit_lower_page(lower_page, lower_inode,
-   ctx->param.lower_file,
-   byte_offset_in_page,
-   bytes_to_write);
-   if (rc) {
-   ecryptfs_printk(KERN_ERR, "Error calling lower "
-   "commit; rc = [%d]\n", rc);
-   goto out;
-   }
-   } else {
-   rc = ecryptfs_writepage_and_release_lower_page(lower_page,
-  lower_inode,
-  ctx->param.wbc);
-   if (rc) {
-   ecryptfs_printk(KERN_ERR, "Error calling lower "
-   "writepage(); rc = [%d]\n", rc);
-   goto out;
-   }
-   }
-out:
-   return rc;
-}
-
-static int ecryptfs_read_in_page(struct ecryptfs_page_crypt_context *ctx,
-struct page **lower_page,
-struct inode *lower_inode,
-unsigned long lower_page_idx,
-int byte_offset_in_page)
-{
-   int rc = 0;
-
-   if (ctx->mode == ECRYPTFS_PREPARE_COMMIT_MODE) {
-   /* TODO: Limit this to only the data extents that are
-* needed */
-   rc = ecryptfs_get_lower_page(lower_page, lower_inode,
-ctx->param.lower_file,
-lower_page_idx,
-byte_offset_in_page,
-   

[PATCH 9/11] eCryptfs: Initialize persistent lower file on inode create

2007-09-17 Thread Michael Halcrow
Initialize persistent lower file on inode create.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/super.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/ecryptfs/super.c b/fs/ecryptfs/super.c
index b97e210..f8cdab2 100644
--- a/fs/ecryptfs/super.c
+++ b/fs/ecryptfs/super.c
@@ -47,15 +47,16 @@ struct kmem_cache *ecryptfs_inode_info_cache;
  */
 static struct inode *ecryptfs_alloc_inode(struct super_block *sb)
 {
-   struct ecryptfs_inode_info *ecryptfs_inode;
+   struct ecryptfs_inode_info *inode_info;
struct inode *inode = NULL;
 
-   ecryptfs_inode = kmem_cache_alloc(ecryptfs_inode_info_cache,
- GFP_KERNEL);
-   if (unlikely(!ecryptfs_inode))
+   inode_info = kmem_cache_alloc(ecryptfs_inode_info_cache, GFP_KERNEL);
+   if (unlikely(!inode_info))
goto out;
-   ecryptfs_init_crypt_stat(&ecryptfs_inode->crypt_stat);
-   inode = &ecryptfs_inode->vfs_inode;
+   ecryptfs_init_crypt_stat(&inode_info->crypt_stat);
+   mutex_init(&inode_info->lower_file_mutex);
+   inode_info->lower_file = NULL;
+   inode = &inode_info->vfs_inode;
 out:
return inode;
 }
-- 
1.5.1.6



[PATCH 8/11] eCryptfs: Convert mmap functions to use persistent file

2007-09-17 Thread Michael Halcrow
Convert readpage, prepare_write, and commit_write to use read_write.c
routines. Remove sync_page; I cannot think of a good reason for
implementing that in eCryptfs.
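
The offset arithmetic in the new ecryptfs_copy_up_encrypted_with_header() can be sketched on its own. This is a userspace illustration with an assumed 4 KiB page size; struct lower_segment and view_extent_to_lower() are hypothetical names, not functions from the patch.

```c
#include <stdint.h>

/* Assumed page geometry for the sketch (4 KiB pages); the kernel code
 * uses PAGE_CACHE_SHIFT and PAGE_CACHE_MASK instead. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Hypothetical result type: where a view extent lives in the lower file. */
struct lower_segment {
	uint64_t page_index;
	uint64_t offset_in_page;
};

/*
 * Returns 1 if view_extent_num is a header extent (built in memory
 * rather than read from the lower file data), leaving *seg untouched.
 * Returns 0 after filling *seg for an encrypted data extent.
 */
static int view_extent_to_lower(struct lower_segment *seg,
				uint64_t view_extent_num,
				uint64_t num_header_extents_at_front,
				uint64_t extent_size)
{
	uint64_t lower_offset;

	if (view_extent_num < num_header_extents_at_front)
		return 1;

	/* Skip the header extents, which exist only in the view. */
	lower_offset = (view_extent_num - num_header_extents_at_front)
		       * extent_size;
	seg->page_index = lower_offset >> SKETCH_PAGE_SHIFT;
	seg->offset_in_page = lower_offset & (SKETCH_PAGE_SIZE - 1);
	return 0;
}
```

For example, with one header extent and 1024-byte extents, view extent 6 corresponds to lower byte offset 5120, which is page 1 at offset 1024.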

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/mmap.c |  199 +++-
 1 files changed, 103 insertions(+), 96 deletions(-)

diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 60e635e..dd68dd3 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -267,9 +267,78 @@ static void set_header_info(char *page_virt,
 }
 
 /**
+ * ecryptfs_copy_up_encrypted_with_header
+ * @page: Sort of a ``virtual'' representation of the encrypted lower
+ *file. The actual lower file does not have the metadata in
+ *the header.
+ * @crypt_stat: The eCryptfs inode's cryptographic context
+ *
+ * The ``view'' is the version of the file that userspace winds up
+ * seeing, with the header information inserted.
+ */
+static int
+ecryptfs_copy_up_encrypted_with_header(struct page *page,
+  struct ecryptfs_crypt_stat *crypt_stat)
+{
+   loff_t extent_num_in_page = 0;
+   loff_t num_extents_per_page = (PAGE_CACHE_SIZE
+  / crypt_stat->extent_size);
+   int rc = 0;
+
+   while (extent_num_in_page < num_extents_per_page) {
+   loff_t view_extent_num = ((page->index * num_extents_per_page)
+ + extent_num_in_page);
+
+   if (view_extent_num < crypt_stat->num_header_extents_at_front) {
+   /* This is a header extent */
+   char *page_virt;
+
+   page_virt = kmap_atomic(page, KM_USER0);
+   memset(page_virt, 0, PAGE_CACHE_SIZE);
+   /* TODO: Support more than one header extent */
+   if (view_extent_num == 0) {
+   rc = ecryptfs_read_xattr_region(
+   page_virt, page->mapping->host);
+   set_header_info(page_virt, crypt_stat);
+   }
+   kunmap_atomic(page_virt, KM_USER0);
+   flush_dcache_page(page);
+   if (rc) {
+   ClearPageUptodate(page);
+   printk(KERN_ERR "%s: Error reading xattr "
+  "region; rc = [%d]\n", __FUNCTION__, rc);
+   goto out;
+   }
+   SetPageUptodate(page);
+   } else {
+   /* This is an encrypted data extent */
+   loff_t lower_offset =
+   ((view_extent_num -
+ crypt_stat->num_header_extents_at_front)
+* crypt_stat->extent_size);
+
+   rc = ecryptfs_read_lower_page_segment(
+   page, (lower_offset >> PAGE_CACHE_SHIFT),
+   (lower_offset & ~PAGE_CACHE_MASK),
+   crypt_stat->extent_size, page->mapping->host);
+   if (rc) {
+   printk(KERN_ERR "%s: Error attempting to read "
+  "extent at offset [%lld] in the lower "
+  "file; rc = [%d]\n", __FUNCTION__,
+  lower_offset, rc);
+   goto out;
+   }
+   }
+   extent_num_in_page++;
+   }
+out:
+   return rc;
+}
+
+/**
  * ecryptfs_readpage
- * @file: This is an ecryptfs file
- * @page: ecryptfs associated page to stick the read data into
+ * @file: An eCryptfs file
+ * @page: Page from eCryptfs inode mapping into which to stick the read data
  *
  * Read in a page, decrypting if necessary.
  *
@@ -277,59 +346,35 @@ static void set_header_info(char *page_virt,
  */
 static int ecryptfs_readpage(struct file *file, struct page *page)
 {
+   struct ecryptfs_crypt_stat *crypt_stat =
+   &ecryptfs_inode_to_private(file->f_path.dentry->d_inode)->crypt_stat;
int rc = 0;
-   struct ecryptfs_crypt_stat *crypt_stat;
 
-   BUG_ON(!(file && file->f_path.dentry && file->f_path.dentry->d_inode));
-   crypt_stat = &ecryptfs_inode_to_private(file->f_path.dentry->d_inode)
-   ->crypt_stat;
if (!crypt_stat
|| !(crypt_stat->flags & ECRYPTFS_ENCRYPTED)
|| (crypt_stat->flags & ECRYPTFS_NEW_FILE)) {
ecryptfs_printk(KERN_DEBUG,
"Passing through unencrypted page\n");
-   rc = ecryptfs_do_readpage(file, page, page->index);
-   if (rc) {
-   ecryptfs_printk(KERN_ERR, "Error reading page; rc = "
-   

[PATCH 7/11] eCryptfs: Make open, truncate, and setattr use persistent file

2007-09-17 Thread Michael Halcrow
Rather than open a new lower file for every eCryptfs file that is
opened, truncated, or setattr'd, instead use the existing lower
persistent file for the eCryptfs inode. Change truncate to use
read_write.c functions. Change ecryptfs_getxattr() to use the common
ecryptfs_getxattr_lower() function.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/crypto.c |2 +-
 fs/ecryptfs/file.c   |   50 --
 fs/ecryptfs/inode.c  |  113 +++---
 3 files changed, 44 insertions(+), 121 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 6b4d310..b3014d7 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1674,7 +1674,7 @@ out:
 /**
  * ecryptfs_read_xattr_region
  * @page_virt: The vitual address into which to read the xattr data
- * @ecryptfs_dentry: The eCryptfs dentry
+ * @ecryptfs_inode: The eCryptfs inode
  *
  * Attempts to read the crypto metadata from the extended attribute
  * region of the lower file.
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index df70bfa..95be9a9 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -187,11 +187,7 @@ static int ecryptfs_open(struct inode *inode, struct file *file)
/* Private value of ecryptfs_dentry allocated in
 * ecryptfs_lookup() */
struct dentry *lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry);
-   struct inode *lower_inode = NULL;
-   struct file *lower_file = NULL;
-   struct vfsmount *lower_mnt;
struct ecryptfs_file_info *file_info;
-   int lower_flags;
 
mount_crypt_stat = &ecryptfs_superblock_to_private(
ecryptfs_dentry->d_sb)->mount_crypt_stat;
@@ -219,26 +215,12 @@ static int ecryptfs_open(struct inode *inode, struct file *file)
if (!(crypt_stat->flags & ECRYPTFS_POLICY_APPLIED)) {
ecryptfs_printk(KERN_DEBUG, "Setting flags for stat...\n");
/* Policy code enabled in future release */
-   crypt_stat->flags |= ECRYPTFS_POLICY_APPLIED;
-   crypt_stat->flags |= ECRYPTFS_ENCRYPTED;
+   crypt_stat->flags |= (ECRYPTFS_POLICY_APPLIED
+ | ECRYPTFS_ENCRYPTED);
}
mutex_unlock(&crypt_stat->cs_mutex);
-   lower_flags = file->f_flags;
-   if ((lower_flags & O_ACCMODE) == O_WRONLY)
-   lower_flags = (lower_flags & O_ACCMODE) | O_RDWR;
-   if (file->f_flags & O_APPEND)
-   lower_flags &= ~O_APPEND;
-   lower_mnt = ecryptfs_dentry_to_lower_mnt(ecryptfs_dentry);
-   /* Corresponding fput() in ecryptfs_release() */
-   rc = ecryptfs_open_lower_file(&lower_file, lower_dentry, lower_mnt,
- lower_flags);
-   if (rc) {
-   ecryptfs_printk(KERN_ERR, "Error opening lower file\n");
-   goto out_puts;
-   }
-   ecryptfs_set_file_lower(file, lower_file);
-   /* Isn't this check the same as the one in lookup? */
-   lower_inode = lower_dentry->d_inode;
+   ecryptfs_set_file_lower(
+   file, ecryptfs_inode_to_private(inode)->lower_file);
if (S_ISDIR(ecryptfs_dentry->d_inode->i_mode)) {
ecryptfs_printk(KERN_DEBUG, "This is a directory\n");
crypt_stat->flags &= ~(ECRYPTFS_ENCRYPTED);
@@ -260,7 +242,7 @@ static int ecryptfs_open(struct inode *inode, struct file *file)
   "and plaintext passthrough mode is not "
   "enabled; returning -EIO\n");
mutex_unlock(&crypt_stat->cs_mutex);
-   goto out_puts;
+   goto out_free;
}
rc = 0;
crypt_stat->flags &= ~(ECRYPTFS_ENCRYPTED);
@@ -272,11 +254,8 @@ static int ecryptfs_open(struct inode *inode, struct file *file)
ecryptfs_printk(KERN_DEBUG, "inode w/ addr = [0x%p], i_ino = [0x%.16x] "
"size: [0x%.16x]\n", inode, inode->i_ino,
i_size_read(inode));
-   ecryptfs_set_file_lower(file, lower_file);
goto out;
-out_puts:
-   mntput(lower_mnt);
-   dput(lower_dentry);
+out_free:
kmem_cache_free(ecryptfs_file_info_cache,
ecryptfs_file_to_private(file));
 out:
@@ -296,20 +275,9 @@ static int ecryptfs_flush(struct file *file, fl_owner_t td)
 
 static int ecryptfs_release(struct inode *inode, struct file *file)
 {
-   struct file *lower_file = ecryptfs_file_to_lower(file);
-   struct ecryptfs_file_info *file_info = ecryptfs_file_to_private(file);
-   struct inode *lower_inode = ecryptfs_inode_to_lower(inode);
-   int rc;
-
-   rc = ecryptfs_close_lower_file(lower_file);
-   if (rc) {
-   printk(KERN_ERR "Error closing lower_file\n");
-   goto out;
-   }
-   inode->i_blocks = 

Re: [PATCH] add consts where appropriate in sound/pci/hda/*

2007-09-17 Thread Denys Vlasenko
On Monday 17 September 2007 11:01, Takashi Iwai wrote:
> > There is a lot of data structures in that code,
> > and most of them seems to be read-only.
> > 
> > I added const modifiers to most of such places:
> > 
> >    text    data     bss     dec     hex filename
> >  106315  179564      36  285915   45cdb snd-hda-intel.o
> >  283051    2624      36  285711   45c0f snd-hda-intel_patched.o
> > 
> > Patch is attached.
> > 
> > It moves "static struct hda_codec_preset *hda_preset_tables[]"
> > from hda_patch.h to hda_codec.c, and then adds
> > #include "hda_patch.h"
> > in a few .c files so that definitions of e.g.
> > const struct hda_codec_preset snd_hda_preset_analog[]
> > are checked to match declarations in hda_patch.h
> > 
> > The rest of the patch (bulk of it) adds "const"
> > in many places.
> > 
> > Patch is compile tested. Please apply.
> > 
> > Signed-off-by: Denys Vlasenko <[EMAIL PROTECTED]>
> 
> Sorry for the late reply.
> 
> First, thanks for your patch.  Although I have also a similar patch
> pending on my tree, but it wasn't applied, because I'd like to mark
> these functions/data rather as __devinit*.  And, sadly, init and const
> don't like with each other.

Unless we will go to the pains of implementing __devrodata,
which doesn't sound encouraging.

> So, my plan is to apply __devinit but 
> without const.

Yes, I see. const as it stands is not very useful in kernel anyway
(only a small code reduction sometimes).
ro or rw, the data is still taking space.

Well, maybe someday ld will be sooo clever that it will actually
merge rodata which is identical, but so far it is not implemented.
--
vda


[PATCH 6/11] eCryptfs: Update metadata read/write functions

2007-09-17 Thread Michael Halcrow
Update the metadata read/write functions and grow_file() to use the
read_write.c routines. Do not open another lower file; use the
persistent lower file instead. Provide a separate function for
crypto.c::ecryptfs_read_xattr_region() to get to the lower xattr
without having to go through the eCryptfs getxattr.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/crypto.c  |  126 +++--
 fs/ecryptfs/ecryptfs_kernel.h |   15 +++--
 fs/ecryptfs/file.c|2 +-
 fs/ecryptfs/inode.c   |  101 +++--
 fs/ecryptfs/mmap.c|2 +-
 5 files changed, 113 insertions(+), 133 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index d6a0680..6b4d310 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1344,21 +1344,28 @@ out:
return rc;
 }
 
-int ecryptfs_read_and_validate_header_region(char *data, struct dentry *dentry,
-struct vfsmount *mnt)
+int ecryptfs_read_and_validate_header_region(char *data,
+struct inode *ecryptfs_inode)
 {
+   struct ecryptfs_crypt_stat *crypt_stat =
+   &(ecryptfs_inode_to_private(ecryptfs_inode)->crypt_stat);
int rc;
 
-   rc = ecryptfs_read_header_region(data, dentry, mnt);
-   if (rc)
+   rc = ecryptfs_read_lower(data, 0, crypt_stat->extent_size,
+ecryptfs_inode);
+   if (rc) {
+   printk(KERN_ERR "%s: Error reading header region; rc = [%d]\n",
+  __FUNCTION__, rc);
goto out;
-   if (!contains_ecryptfs_marker(data + ECRYPTFS_FILE_SIZE_BYTES))
+   }
+   if (!contains_ecryptfs_marker(data + ECRYPTFS_FILE_SIZE_BYTES)) {
rc = -EINVAL;
+   ecryptfs_printk(KERN_DEBUG, "Valid marker not found\n");
+   }
 out:
return rc;
 }
 
-
 void
 ecryptfs_write_header_metadata(char *virt,
   struct ecryptfs_crypt_stat *crypt_stat,
@@ -1443,24 +1450,18 @@ static int ecryptfs_write_headers_virt(char *page_virt, size_t *size,
 
 static int
 ecryptfs_write_metadata_to_contents(struct ecryptfs_crypt_stat *crypt_stat,
-   struct file *lower_file, char *page_virt)
+   struct dentry *ecryptfs_dentry,
+   char *page_virt)
 {
-   mm_segment_t oldfs;
int current_header_page;
int header_pages;
-   ssize_t size;
-   int rc = 0;
+   int rc;
 
-   lower_file->f_pos = 0;
-   oldfs = get_fs();
-   set_fs(get_ds());
-   size = vfs_write(lower_file, (char __user *)page_virt, PAGE_CACHE_SIZE,
-&lower_file->f_pos);
-   if (size < 0) {
-   rc = (int)size;
-   printk(KERN_ERR "Error attempting to write lower page; "
-  "rc = [%d]\n", rc);
-   set_fs(oldfs);
+   if ((rc = ecryptfs_write_lower(ecryptfs_dentry->d_inode, page_virt,
+  0, PAGE_CACHE_SIZE))) {
+   printk(KERN_ERR "%s: Error attempting to write header "
+  "information to lower file; rc = [%d]\n", __FUNCTION__,
+  rc);
goto out;
}
header_pages = ((crypt_stat->extent_size
@@ -1469,18 +1470,19 @@ ecryptfs_write_metadata_to_contents(struct ecryptfs_crypt_stat *crypt_stat,
memset(page_virt, 0, PAGE_CACHE_SIZE);
current_header_page = 1;
while (current_header_page < header_pages) {
-   size = vfs_write(lower_file, (char __user *)page_virt,
-PAGE_CACHE_SIZE, &lower_file->f_pos);
-   if (size < 0) {
-   rc = (int)size;
-   printk(KERN_ERR "Error attempting to write lower page; "
-  "rc = [%d]\n", rc);
-   set_fs(oldfs);
+   loff_t offset;
+
+   offset = (current_header_page << PAGE_CACHE_SHIFT);
+   if ((rc = ecryptfs_write_lower(ecryptfs_dentry->d_inode,
+  page_virt, offset,
+  PAGE_CACHE_SIZE))) {
+   printk(KERN_ERR "%s: Error attempting to write header "
+  "information to lower file; rc = [%d]\n",
+  __FUNCTION__, rc);
goto out;
}
current_header_page++;
}
-   set_fs(oldfs);
 out:
return rc;
 }
@@ -1500,7 +1502,6 @@ ecryptfs_write_metadata_to_xattr(struct dentry *ecryptfs_dentry,
 /**
  * ecryptfs_write_metadata
  * @ecryptfs_dentry: The eCryptfs dentry
- * @lower_file: The lower file struct, which was returned from dentry_open
  *
  * Write the file headers out.  

[PATCH 5/11] eCryptfs: Set up and destroy persistent lower file

2007-09-17 Thread Michael Halcrow
This patch sets up and destroys the persistent lower file for each
eCryptfs inode.
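
The doc comment for ecryptfs_init_persistent_file() in this patch describes the pattern: open the lower file once per inode under a mutex, prefer read/write, fall back to read-only, and do nothing if a file is already attached. A rough userspace analogue, assuming POSIX open() and a pthread mutex in place of dentry_open() and the kernel mutex, could look like this. All names are hypothetical, and the dget()/mntget() reference counting and O_LARGEFILE are not modelled.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical userspace analogue of the per-inode state. */
struct inode_info {
	pthread_mutex_t lower_file_mutex;
	int lower_fd;		/* -1 until the persistent file is open */
};

static void inode_info_init(struct inode_info *info)
{
	pthread_mutex_init(&info->lower_file_mutex, NULL);
	info->lower_fd = -1;
}

/* Returns 0 on success, -1 if the file cannot be opened at all. */
static int init_persistent_file(struct inode_info *info, const char *path)
{
	int rc = 0;

	pthread_mutex_lock(&info->lower_file_mutex);
	if (info->lower_fd < 0) {
		/* Prefer read/write; fall back to read-only. */
		info->lower_fd = open(path, O_RDWR);
		if (info->lower_fd < 0)
			info->lower_fd = open(path, O_RDONLY);
		if (info->lower_fd < 0)
			rc = -1;
	}
	pthread_mutex_unlock(&info->lower_file_mutex);
	return rc;
}

/* Counterpart run when the owning object is destroyed. */
static void destroy_persistent_file(struct inode_info *info)
{
	pthread_mutex_lock(&info->lower_file_mutex);
	if (info->lower_fd >= 0) {
		close(info->lower_fd);
		info->lower_fd = -1;
	}
	pthread_mutex_unlock(&info->lower_file_mutex);
	pthread_mutex_destroy(&info->lower_file_mutex);
}
```

Calling init_persistent_file() a second time on the same object leaves the existing descriptor in place, which mirrors the "does nothing if a lower persistent file is already associated" behaviour described in the comment.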

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/inode.c |   23 +++---
 fs/ecryptfs/main.c  |   65 +++
 fs/ecryptfs/super.c |   22 +++--
 3 files changed, 103 insertions(+), 7 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 7192a81..c746b5d 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -119,10 +119,23 @@ ecryptfs_do_create(struct inode *directory_inode,
}
rc = ecryptfs_create_underlying_file(lower_dir_dentry->d_inode,
 ecryptfs_dentry, mode, nd);
-   if (unlikely(rc)) {
-   ecryptfs_printk(KERN_ERR,
-   "Failure to create underlying file\n");
-   goto out_lock;
+   if (rc) {
+   struct inode *ecryptfs_inode = ecryptfs_dentry->d_inode;
+   struct ecryptfs_inode_info *inode_info =
+   ecryptfs_inode_to_private(ecryptfs_inode);
+
+   printk(KERN_WARNING "%s: Error creating underlying file; "
+  "rc = [%d]; checking for existing\n", __FUNCTION__, rc);
+   if (inode_info) {
+   mutex_lock(&inode_info->lower_file_mutex);
+   if (!inode_info->lower_file) {
+   mutex_unlock(&inode_info->lower_file_mutex);
+   printk(KERN_ERR "%s: Failure to set underlying "
+  "file; rc = [%d]\n", __FUNCTION__, rc);
+   goto out_lock;
+   }
+   mutex_unlock(&inode_info->lower_file_mutex);
+   }
}
rc = ecryptfs_interpose(lower_dentry, ecryptfs_dentry,
directory_inode->i_sb, 0);
@@ -252,6 +265,8 @@ ecryptfs_create(struct inode *directory_inode, struct dentry *ecryptfs_dentry,
 {
int rc;
 
+   /* ecryptfs_do_create() calls ecryptfs_interpose(), which opens
+* the crypt_stat->lower_file (persistent file) */
rc = ecryptfs_do_create(directory_inode, ecryptfs_dentry, mode, nd);
if (unlikely(rc)) {
ecryptfs_printk(KERN_WARNING, "Failed to create file in"
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index 967bad0..3e324f8 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -98,6 +98,64 @@ void __ecryptfs_printk(const char *fmt, ...)
 }
 
 /**
+ * ecryptfs_init_persistent_file
+ * @ecryptfs_dentry: Fully initialized eCryptfs dentry object, with
+ *   the lower dentry and the lower mount set
+ *
+ * eCryptfs only ever keeps a single open file for every lower
+ * inode. All I/O operations to the lower inode occur through that
+ * file. When the first eCryptfs dentry that interposes with the first
+ * lower dentry for that inode is created, this function creates the
+ * persistent file struct and associates it with the eCryptfs
+ * inode. When the eCryptfs inode is destroyed, the file is closed.
+ *
+ * The persistent file will be opened with read/write permissions, if
+ * possible. Otherwise, it is opened read-only.
+ *
+ * This function does nothing if a lower persistent file is already
+ * associated with the eCryptfs inode.
+ *
+ * Returns zero on success; non-zero otherwise
+ */
+int ecryptfs_init_persistent_file(struct dentry *ecryptfs_dentry)
+{
+   struct ecryptfs_inode_info *inode_info =
+   ecryptfs_inode_to_private(ecryptfs_dentry->d_inode);
+   int rc = 0;
+
+   mutex_lock(&inode_info->lower_file_mutex);
+   if (!inode_info->lower_file) {
+   struct dentry *lower_dentry;
+   struct vfsmount *lower_mnt =
+   ecryptfs_dentry_to_lower_mnt(ecryptfs_dentry);
+
+   lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry);
+   /* Corresponding dput() and mntput() are done when the
+* persistent file is fput() when the eCryptfs inode
+* is destroyed. */
+   dget(lower_dentry);
+   mntget(lower_mnt);
+   inode_info->lower_file = dentry_open(lower_dentry,
+lower_mnt,
+(O_RDWR | O_LARGEFILE));
+   if (IS_ERR(inode_info->lower_file))
+   inode_info->lower_file = dentry_open(lower_dentry,
+lower_mnt,
+(O_RDONLY
+ | O_LARGEFILE));
+   if (IS_ERR(inode_info->lower_file)) {
+   printk(KERN_ERR "Error opening lower persistent file "
+  "for lower_dentry [0x%p] and lower_mnt [0x%p]\n",

[PATCH 4/11] eCryptfs: Replace encrypt, decrypt, and inode size write

2007-09-17 Thread Michael Halcrow
Replace page encryption and decryption routines and inode size write
routine with versions that utilize the read_write.c functions.
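
As a worked illustration of the new ecryptfs_lower_offset_for_extent() helper, here is a small userspace sketch of the same arithmetic. The struct below is a hypothetical two-field subset of ecryptfs_crypt_stat, and the function returns the offset instead of writing it through a pointer.

```c
#include <stdint.h>

/* Hypothetical subset of the crypt_stat fields used by the helper. */
struct crypt_stat_sketch {
	uint64_t extent_size;
	uint64_t num_header_extents_at_front;
};

/*
 * Byte offset in the lower file of data extent extent_num: the header
 * extents come first, followed by the data extents in order.
 */
static uint64_t lower_offset_for_extent(uint64_t extent_num,
					const struct crypt_stat_sketch *cs)
{
	return cs->extent_size * cs->num_header_extents_at_front
	       + cs->extent_size * extent_num;
}
```

With 4096-byte extents and one header extent, data extent 0 starts at byte 4096 and data extent 3 at byte 16384. Note that the argument is an extent number, although the doc comment in the patch speaks of a page index; the two coincide only when the extent size equals the page size.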

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/crypto.c  |  427 ++--
 fs/ecryptfs/ecryptfs_kernel.h |   14 +-
 fs/ecryptfs/inode.c   |   12 +-
 fs/ecryptfs/mmap.c|  131 -
 fs/ecryptfs/read_write.c  |   12 +-
 5 files changed, 290 insertions(+), 306 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 5d8a553..b829d3c 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -467,8 +467,91 @@ out:
 }
 
 /**
+ * ecryptfs_lower_offset_for_extent
+ *
+ * Convert an eCryptfs page index into a lower byte offset
+ */
+void ecryptfs_lower_offset_for_extent(loff_t *offset, loff_t extent_num,
+ struct ecryptfs_crypt_stat *crypt_stat)
+{
+   (*offset) = ((crypt_stat->extent_size
+ * crypt_stat->num_header_extents_at_front)
++ (crypt_stat->extent_size * extent_num));
+}
+
+/**
+ * ecryptfs_encrypt_extent
+ * @enc_extent_page: Allocated page into which to encrypt the data in
+ *   @page
+ * @crypt_stat: crypt_stat containing cryptographic context for the
+ *  encryption operation
+ * @page: Page containing plaintext data extent to encrypt
+ * @extent_offset: Page extent offset for use in generating IV
+ *
+ * Encrypts one extent of data.
+ *
+ * Return zero on success; non-zero otherwise
+ */
+static int ecryptfs_encrypt_extent(struct page *enc_extent_page,
+  struct ecryptfs_crypt_stat *crypt_stat,
+  struct page *page,
+  unsigned long extent_offset)
+{
+   unsigned long extent_base;
+   char extent_iv[ECRYPTFS_MAX_IV_BYTES];
+   int rc;
+
+   extent_base = (page->index
+  * (PAGE_CACHE_SIZE / crypt_stat->extent_size));
+   rc = ecryptfs_derive_iv(extent_iv, crypt_stat,
+   (extent_base + extent_offset));
+   if (rc) {
+   ecryptfs_printk(KERN_ERR, "Error attempting to "
+   "derive IV for extent [0x%.16x]; "
+   "rc = [%d]\n", (extent_base + extent_offset),
+   rc);
+   goto out;
+   }
+   if (unlikely(ecryptfs_verbosity > 0)) {
+   ecryptfs_printk(KERN_DEBUG, "Encrypting extent "
+   "with iv:\n");
+   ecryptfs_dump_hex(extent_iv, crypt_stat->iv_bytes);
+   ecryptfs_printk(KERN_DEBUG, "First 8 bytes before "
+   "encryption:\n");
+   ecryptfs_dump_hex((char *)
+ (page_address(page)
+  + (extent_offset * crypt_stat->extent_size)),
+ 8);
+   }
+   rc = ecryptfs_encrypt_page_offset(crypt_stat, enc_extent_page, 0,
+ page, (extent_offset
+* crypt_stat->extent_size),
+ crypt_stat->extent_size, extent_iv);
+   if (rc < 0) {
+   printk(KERN_ERR "%s: Error attempting to encrypt page with "
+  "page->index = [%ld], extent_offset = [%ld]; "
+  "rc = [%d]\n", __FUNCTION__, page->index, extent_offset,
+  rc);
+   goto out;
+   }
+   rc = 0;
+   if (unlikely(ecryptfs_verbosity > 0)) {
+   ecryptfs_printk(KERN_DEBUG, "Encrypt extent [0x%.16x]; "
+   "rc = [%d]\n", (extent_base + extent_offset),
+   rc);
+   ecryptfs_printk(KERN_DEBUG, "First 8 bytes after "
+   "encryption:\n");
+   ecryptfs_dump_hex((char *)(page_address(enc_extent_page)), 8);
+   }
+out:
+   return rc;
+}
+
+/**
  * ecryptfs_encrypt_page
- * @ctx: The context of the page
+ * @page: Page mapped from the eCryptfs inode for the file; contains
+ *decrypted content that needs to be encrypted (to a temporary
+ *page; not in place) and written out to the lower file
  *
  * Encrypt an eCryptfs page. This is done on a per-extent basis. Note
  * that eCryptfs pages may straddle the lower pages -- for instance,
@@ -478,128 +561,121 @@ out:
  * file, 24K of page 0 of the lower file will be read and decrypted,
  * and then 8K of page 1 of the lower file will be read and decrypted.
  *
- * The actual operations performed on each page depends on the
- * contents of the ecryptfs_page_crypt_context struct.
- *
  * Returns zero on success; negative on error
  */
-int ecryptfs_encrypt_page(struct ecryptfs_page_crypt_context *ctx)
+int ecryptfs_encrypt_page(struct 

[PATCH 3/11] eCryptfs: read_write.c routines

2007-09-17 Thread Michael Halcrow
Add a set of functions through which all I/O to lower files is
consolidated. This patch adds a new inode_info reference to a
persistent lower file for each eCryptfs inode; another patch later in
this series will set that up. This persistent lower file is what the
read_write.c functions use to call vfs_read() and vfs_write() on the
lower filesystem, so even when reads and writes come in through
aops->readpage and aops->writepage, we can satisfy them without
resorting to direct access to the lower inode's address space.
Several function declarations are going to be changing with this
patchset. For now, in order to keep from breaking the build, I am
putting dummy parameters in for those functions.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/Makefile  |2 +-
 fs/ecryptfs/ecryptfs_kernel.h |   18 ++
 fs/ecryptfs/mmap.c|2 +-
 fs/ecryptfs/read_write.c  |  359 +
 4 files changed, 379 insertions(+), 2 deletions(-)
 create mode 100644 fs/ecryptfs/read_write.c

diff --git a/fs/ecryptfs/Makefile b/fs/ecryptfs/Makefile
index 1f11072..7688570 100644
--- a/fs/ecryptfs/Makefile
+++ b/fs/ecryptfs/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_ECRYPT_FS) += ecryptfs.o
 
-ecryptfs-objs := dentry.o file.o inode.o main.o super.o mmap.o crypto.o keystore.o messaging.o netlink.o debug.o
+ecryptfs-objs := dentry.o file.o inode.o main.o super.o mmap.o read_write.o crypto.o keystore.o messaging.o netlink.o debug.o
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index a618ab7..e6a68a8 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -260,6 +260,8 @@ struct ecryptfs_crypt_stat {
 struct ecryptfs_inode_info {
struct inode vfs_inode;
struct inode *wii_inode;
+   struct file *lower_file;
+   struct mutex lower_file_mutex;
struct ecryptfs_crypt_stat crypt_stat;
 };
 
@@ -653,5 +655,21 @@ int ecryptfs_keyring_auth_tok_for_sig(struct key **auth_tok_key,
  char *sig);
 int ecryptfs_write_zeros(struct file *file, pgoff_t index, int start,
 int num_zeros);
+int ecryptfs_write_lower(struct inode *ecryptfs_inode, char *data,
+loff_t offset, size_t size);
+int ecryptfs_write_lower_page_segment(struct inode *ecryptfs_inode,
+ struct page *page_for_lower,
+ size_t offset_in_page, size_t size);
+int ecryptfs_write(struct file *ecryptfs_file, char *data, loff_t offset,
+  size_t size);
+int ecryptfs_read_lower(char *data, loff_t offset, size_t size,
+   struct inode *ecryptfs_inode);
+int ecryptfs_read_lower_page_segment(struct page *page_for_ecryptfs,
+pgoff_t page_index,
+size_t offset_in_page, size_t size,
+struct inode *ecryptfs_inode);
+int ecryptfs_read(char *data, loff_t offset, size_t size,
+ struct file *ecryptfs_file);
+struct page *ecryptfs_get1page(struct file *file, loff_t index);
 
 #endif /* #ifndef ECRYPTFS_KERNEL_H */
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 307f7ee..0c53320 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -44,7 +44,7 @@ struct kmem_cache *ecryptfs_lower_page_cache;
  * Returns unlocked and up-to-date page (if ok), with increased
  * refcnt.
  */
-static struct page *ecryptfs_get1page(struct file *file, int index)
+struct page *ecryptfs_get1page(struct file *file, loff_t index)
 {
struct dentry *dentry;
struct inode *inode;
diff --git a/fs/ecryptfs/read_write.c b/fs/ecryptfs/read_write.c
new file mode 100644
index 000..e59c94a
--- /dev/null
+++ b/fs/ecryptfs/read_write.c
@@ -0,0 +1,359 @@
+/**
+ * eCryptfs: Linux filesystem encryption layer
+ *
+ * Copyright (C) 2007 International Business Machines Corp.
+ *   Author(s): Michael A. Halcrow <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+ * 02111-1307, USA.
+ */
+
+#include 
+#include 
+#include "ecryptfs_kernel.h"
+
+/**
+ * ecryptfs_write_lower
+ * @ecryptfs_inode: The eCryptfs inode
+ * @data: Data to write
+ * @offset: Byte offset in the lower file 

Re: [PATCH] modpost: detect unterminated device id lists

2007-09-17 Thread Andrew Morton
On Tue, 18 Sep 2007 03:15:14 +0530 (IST)
Satyam Sharma <[EMAIL PROTECTED]> wrote:

> 
> 
> On Sun, 16 Sep 2007, Andrew Morton wrote:
> 
> > On Mon, 17 Sep 2007 05:54:45 +0530 "Satyam Sharma" <[EMAIL PROTECTED]> 
> > wrote:
> > 
> > > On 9/17/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I'm getting this:
> > > >
> > > > rusb2/pvrusb2: struct usb_device_id is 20 bytes.  The last of 3 is:
> > > > 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
> > > > 0x00
> > > > 0x00 0x00 0x00 0x00 0x00
> > > > FATAL: drivers/media/video/pvrusb2/pvrusb2: struct usb_device_id is not 
> > > > terminated
> > > > with a NULL entry!
> > > >
> > > > ("rusb2/pvrusb2" ??)
> > > 
> > > Hmm? Are you sure you didn't see any "drivers/media/video/pv" before the
> > > "rusb2/pvrusb2" bit?
> > 
> > Fairly.  I looked twice.
> 
> "drivers/media/video/pvrusb2/pvrusb2" comes out correctly here ...
> 
> 
> > > Looking at Kees' patch (and the existing code), I've no
> > > clue how/why this should happen ... will try to reproduce here ...
> > > 
> > > 
> > > > but:
> > > >
> > > > struct usb_device_id pvr2_device_table[] = {
> > > > [PVR2_HDW_TYPE_29XXX] = { USB_DEVICE(0x2040, 0x2900) },
> > > > [PVR2_HDW_TYPE_24XXX] = { USB_DEVICE(0x2040, 0x2400) },
> > > > { USB_DEVICE(0, 0) },
> > > > };
> > > >
> > > > looks OK?
> > > >
> > > > Using plain old "{ }" shut the warning up.
> > > 
> > > USB_DEVICE(0, 0) is not empty termination, actually, and this looks like
> > > a genuine bug caught by the patch. As that dump shows, USB_DEVICE(0, 0)
> > > assigns "0x03 0x00" (in little endian) to usb_device_id.match_flags. And
> > > I don't think the USB code treats such an entry as an empty entry (?)
> > > 
> > > Interestingly, the "USB_DEVICE(0, 0)" thing is absent from latest -git
> > > tree and also in my copy of 23-rc4-mm1 -- so this looks like something
> > > you must've merged recently.
> > 
> > git-dvb very carefully does
> > 
> > --- a/drivers/media/video/pvrusb2/pvrusb2-hdw.c~git-dvb
> > +++ a/drivers/media/video/pvrusb2/pvrusb2-hdw.c
> > @@ -44,7 +44,7 @@
> >  struct usb_device_id pvr2_device_table[] = {
> > [PVR2_HDW_TYPE_29XXX] = { USB_DEVICE(0x2040, 0x2900) },
> > [PVR2_HDW_TYPE_24XXX] = { USB_DEVICE(0x2040, 0x2400) },
> > -   { }
> > +   { USB_DEVICE(0, 0) },
> > };
> >  
> > MODULE_DEVICE_TABLE(usb, pvr2_device_table);
> 
> Ok, this is a false positive indeed, the core USB code does in fact
> treat such an entry as an empty entry (usb_match_id() tests only the
> .idVendor, .bDeviceClass, .bInterfaceClass and .driver_info members
> for non-zero and not the .match_flags member).
> 
> However, a quick-grep-and-glance tells us that none of the other 2213
> occurrences of USB_DEVICE() in the tree ever do this "(0,0)" thing,
> so it does make sense to change this one to a simple "{ }" as well --
> that's clearer style anyway, and the "standard" way to empty-terminate
> in the rest of the tree, if nothing else.
> 

yeah, I think so.  Mauro, could you please drop that change?


[PATCH 2/11] eCryptfs: Remove assignments in if-statements

2007-09-17 Thread Michael Halcrow
Remove assignments in if-statements.

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/crypto.c|   17 --
 fs/ecryptfs/file.c  |8 --
 fs/ecryptfs/inode.c |   35 ++
 fs/ecryptfs/keystore.c  |   55 +-
 fs/ecryptfs/main.c  |   28 ++-
 fs/ecryptfs/messaging.c |5 ++-
 fs/ecryptfs/mmap.c  |5 ++-
 7 files changed, 89 insertions(+), 64 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 3dbb21a..5d8a553 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1277,8 +1277,8 @@ static int ecryptfs_read_header_region(char *data, struct 
dentry *dentry,
mm_segment_t oldfs;
int rc;
 
-   if ((rc = ecryptfs_open_lower_file(&lower_file, dentry, mnt,
-  O_RDONLY))) {
+   rc = ecryptfs_open_lower_file(&lower_file, dentry, mnt, O_RDONLY);
+   if (rc) {
printk(KERN_ERR
   "Error opening lower_file to read header region\n");
goto out;
@@ -1289,7 +1289,8 @@ static int ecryptfs_read_header_region(char *data, struct 
dentry *dentry,
rc = lower_file->f_op->read(lower_file, (char __user *)data,
  ECRYPTFS_DEFAULT_EXTENT_SIZE, &lower_file->f_pos);
set_fs(oldfs);
-   if ((rc = ecryptfs_close_lower_file(lower_file))) {
+   rc = ecryptfs_close_lower_file(lower_file);
+   if (rc) {
printk(KERN_ERR "Error closing lower_file\n");
goto out;
}
@@ -1951,9 +1952,10 @@ ecryptfs_add_new_key_tfm(struct ecryptfs_key_tfm 
**key_tfm, char *cipher_name,
strncpy(tmp_tfm->cipher_name, cipher_name,
ECRYPTFS_MAX_CIPHER_NAME_SIZE);
tmp_tfm->key_size = key_size;
-   if ((rc = ecryptfs_process_key_cipher(&tmp_tfm->key_tfm,
- tmp_tfm->cipher_name,
- &tmp_tfm->key_size))) {
+   rc = ecryptfs_process_key_cipher(&tmp_tfm->key_tfm,
+tmp_tfm->cipher_name,
+&tmp_tfm->key_size);
+   if (rc) {
printk(KERN_ERR "Error attempting to initialize key TFM "
   "cipher with name = [%s]; rc = [%d]\n",
   tmp_tfm->cipher_name, rc);
@@ -1988,7 +1990,8 @@ int ecryptfs_get_tfm_and_mutex_for_cipher_name(struct 
crypto_blkcipher **tfm,
}
}
mutex_unlock(&key_tfm_list_mutex);
-   if ((rc = ecryptfs_add_new_key_tfm(&key_tfm, cipher_name, 0))) {
+   rc = ecryptfs_add_new_key_tfm(&key_tfm, cipher_name, 0);
+   if (rc) {
printk(KERN_ERR "Error adding new key_tfm to list; rc = [%d]\n",
   rc);
goto out;
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index 12ba7e3..59c846d 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -230,8 +230,9 @@ static int ecryptfs_open(struct inode *inode, struct file 
*file)
lower_flags &= ~O_APPEND;
lower_mnt = ecryptfs_dentry_to_lower_mnt(ecryptfs_dentry);
/* Corresponding fput() in ecryptfs_release() */
-   if ((rc = ecryptfs_open_lower_file(&lower_file, lower_dentry, lower_mnt,
-  lower_flags))) {
+   rc = ecryptfs_open_lower_file(&lower_file, lower_dentry, lower_mnt,
+ lower_flags);
+   if (rc) {
ecryptfs_printk(KERN_ERR, "Error opening lower file\n");
goto out_puts;
}
@@ -300,7 +301,8 @@ static int ecryptfs_release(struct inode *inode, struct 
file *file)
struct inode *lower_inode = ecryptfs_inode_to_lower(inode);
int rc;
 
-   if ((rc = ecryptfs_close_lower_file(lower_file))) {
+   rc = ecryptfs_close_lower_file(lower_file);
+   if (rc) {
printk(KERN_ERR "Error closing lower_file\n");
goto out;
}
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index abac91c..d70f599 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -202,8 +202,9 @@ static int ecryptfs_initialize_file(struct dentry 
*ecryptfs_dentry)
lower_flags = ((O_CREAT | O_TRUNC) & O_ACCMODE) | O_RDWR;
lower_mnt = ecryptfs_dentry_to_lower_mnt(ecryptfs_dentry);
/* Corresponding fput() at end of this function */
-   if ((rc = ecryptfs_open_lower_file(&lower_file, lower_dentry, lower_mnt,
-  lower_flags))) {
+   rc = ecryptfs_open_lower_file(&lower_file, lower_dentry, lower_mnt,
+ lower_flags);
+   if (rc) {
ecryptfs_printk(KERN_ERR,
"Error opening dentry; rc = [%i]\n", rc);
goto out;
@@ -229,7 +230,8 @@ static int ecryptfs_initialize_file(struct dentry 
*ecryptfs_dentry)

[PATCH 1/11] eCryptfs: Remove header_extent_size

2007-09-17 Thread Michael Halcrow
There is no point in keeping a separate header_extent_size and an
extent_size. The total size of the header can always be represented as
some multiple of the regular data extent size.
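
The arithmetic behind that statement can be sketched in userspace C. This is only an illustration of the branch that the patch adds to ecryptfs_set_default_sizes(); the constant values and the example_* names are assumptions made for the sketch, not the definitions from ecryptfs_kernel.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; the real constants live in ecryptfs_kernel.h. */
#define EXAMPLE_EXTENT_SIZE      4096u
#define EXAMPLE_MIN_HEADER_SIZE  8192u

/*
 * Number of extent-sized units reserved for the header at the front of
 * the lower file, following the same two branches as the patch: use the
 * minimum header size when the page is no larger than it, otherwise use
 * one full page.
 */
static size_t example_num_header_extents(size_t page_size, size_t extent_size)
{
	assert(extent_size != 0);
	if (page_size <= EXAMPLE_MIN_HEADER_SIZE)
		return EXAMPLE_MIN_HEADER_SIZE / extent_size;
	return page_size / extent_size;
}

/* Total header size, expressed purely as a multiple of the extent size. */
static size_t example_header_bytes(size_t page_size, size_t extent_size)
{
	return extent_size * example_num_header_extents(page_size, extent_size);
}
```

With the assumed 4096-byte extent, a 4096-byte page gives two header extents (8192 bytes) and a 16384-byte page gives four (16384 bytes), so no separate header extent size is needed to describe either layout.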

Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
---
 fs/ecryptfs/crypto.c  |   40 
 fs/ecryptfs/ecryptfs_kernel.h |   39 +++
 fs/ecryptfs/inode.c   |7 ---
 fs/ecryptfs/mmap.c|2 +-
 4 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 8e9b36d..3dbb21a 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -366,8 +366,8 @@ ecryptfs_extent_to_lwr_pg_idx_and_offset(unsigned long 
*lower_page_idx,
int extents_per_page;
 
bytes_occupied_by_headers_at_front =
-   ( crypt_stat->header_extent_size
- * crypt_stat->num_header_extents_at_front );
+   (crypt_stat->extent_size
+* crypt_stat->num_header_extents_at_front);
extents_occupied_by_headers_at_front =
( bytes_occupied_by_headers_at_front
  / crypt_stat->extent_size );
@@ -376,8 +376,8 @@ ecryptfs_extent_to_lwr_pg_idx_and_offset(unsigned long 
*lower_page_idx,
(*lower_page_idx) = lower_extent_num / extents_per_page;
extent_offset = lower_extent_num % extents_per_page;
(*byte_offset) = extent_offset * crypt_stat->extent_size;
-   ecryptfs_printk(KERN_DEBUG, " * crypt_stat->header_extent_size = "
-   "[%d]\n", crypt_stat->header_extent_size);
+   ecryptfs_printk(KERN_DEBUG, " * crypt_stat->extent_size = "
+   "[%d]\n", crypt_stat->extent_size);
ecryptfs_printk(KERN_DEBUG, " * crypt_stat->"
"num_header_extents_at_front = [%d]\n",
crypt_stat->num_header_extents_at_front);
@@ -899,15 +899,17 @@ void ecryptfs_set_default_sizes(struct 
ecryptfs_crypt_stat *crypt_stat)
crypt_stat->extent_size = ECRYPTFS_DEFAULT_EXTENT_SIZE;
set_extent_mask_and_shift(crypt_stat);
crypt_stat->iv_bytes = ECRYPTFS_DEFAULT_IV_BYTES;
-   if (PAGE_CACHE_SIZE <= ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE) {
-   crypt_stat->header_extent_size =
-   ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE;
-   } else
-   crypt_stat->header_extent_size = PAGE_CACHE_SIZE;
if (crypt_stat->flags & ECRYPTFS_METADATA_IN_XATTR)
crypt_stat->num_header_extents_at_front = 0;
-   else
-   crypt_stat->num_header_extents_at_front = 1;
+   else {
+   if (PAGE_CACHE_SIZE <= ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE)
+   crypt_stat->num_header_extents_at_front =
+   (ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE
+/ crypt_stat->extent_size);
+   else
+   crypt_stat->num_header_extents_at_front =
+   (PAGE_CACHE_SIZE / crypt_stat->extent_size);
+   }
 }
 
 /**
@@ -1319,7 +1321,7 @@ ecryptfs_write_header_metadata(char *virt,
u32 header_extent_size;
u16 num_header_extents_at_front;
 
-   header_extent_size = (u32)crypt_stat->header_extent_size;
+   header_extent_size = (u32)crypt_stat->extent_size;
num_header_extents_at_front =
(u16)crypt_stat->num_header_extents_at_front;
header_extent_size = cpu_to_be32(header_extent_size);
@@ -1415,7 +1417,7 @@ ecryptfs_write_metadata_to_contents(struct 
ecryptfs_crypt_stat *crypt_stat,
set_fs(oldfs);
goto out;
}
-   header_pages = ((crypt_stat->header_extent_size
+   header_pages = ((crypt_stat->extent_size
 * crypt_stat->num_header_extents_at_front)
/ PAGE_CACHE_SIZE);
memset(page_virt, 0, PAGE_CACHE_SIZE);
@@ -1532,17 +1534,16 @@ static int parse_header_metadata(struct 
ecryptfs_crypt_stat *crypt_stat,
virt += 4;
memcpy(&num_header_extents_at_front, virt, 2);
num_header_extents_at_front = be16_to_cpu(num_header_extents_at_front);
-   crypt_stat->header_extent_size = (int)header_extent_size;
crypt_stat->num_header_extents_at_front =
(int)num_header_extents_at_front;
-   (*bytes_read) = 6;
+   (*bytes_read) = (sizeof(u32) + sizeof(u16));
if ((validate_header_size == ECRYPTFS_VALIDATE_HEADER_SIZE)
-   && ((crypt_stat->header_extent_size
+   && ((crypt_stat->extent_size
 * crypt_stat->num_header_extents_at_front)
< ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE)) {
rc = -EINVAL;
-   ecryptfs_printk(KERN_WARNING, "Invalid header extent size: "
-   "[%d]\n", crypt_stat->header_extent_size);
+   printk(KERN_WARNING "Invalid number 

[PATCH 0/11] eCryptfs: Introduce persistent lower files for each eCryptfs inode

2007-09-17 Thread Michael Halcrow
Currently, eCryptfs directly accesses the lower inode address space,
doing things like grab_cache_page() on lower_inode->i_mapping. It
really should not do that. The main point of this patch set is to make
all I/O with the lower files go through vfs_read() and vfs_write()
instead.

In order to accomplish this, eCryptfs needs a way to call vfs_read()
and vfs_write() on the lower file when ecryptfs_aops->readpage() and
ecryptfs_aops->writepage() are called. I propose keeping a persistent
lower file around for each eCryptfs inode. This is the only lower file
that eCryptfs will open for any given eCryptfs inode; multiple
eCryptfs files may map to this one persistent lower file. When the
eCryptfs inode is destroyed, this persistent lower file is closed.

Consolidating all reads and writes to the lower file to a single
execution path simplifies the code. This should also make it easier to
port eCryptfs to use the asynchronous crypto API functions. Note that
this patch set also removes all direct calls to lower prepare_write()
and commit_write(), fixing an oops when mounted on NFS.

Mike


Re: InfiniBand/RDMA merge plans for 2.6.24

2007-09-17 Thread Roland Dreier
 > > IPoIB CM handles this properly by gathering together single pages in
 > > skbs' fragment lists.

 > Then can we reuse IPoIB CM code here?

Yes, if possible, refactoring things so that the rx skb allocation
code becomes common between CM and non-CM would definitely make sense.


Re: [PATCH] revert ath5k ioread32()/iowrite32() usage - use readl()/writel(), we're MMIO-only

2007-09-17 Thread Jiri Slaby
On 09/17/2007 10:59 PM, Jeff Garzik wrote:
> Jiri Slaby wrote:
>> NACK, this is wrong. iomap returns platform-dependent return value,
>> which may or
> 
> Incorrect.  readl() and writel() work just fine on all existing
> platforms where Atheros may be used.

Ok, this is what Alan Cox wrote about that, and since you didn't reply to it, I
thought he was right. Anyway, I wouldn't rely on iomap never changing, even on
x86. What is the (performance) impact of using ioread instead of readl? How
much data is transferred this way?

http://lkml.org/lkml/2007/8/25/50
On Sat, 25 Aug 2007 04:56:19 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> If the driver knows it's MMIO, using readX/writeX after pci_iomap() is
> just fine, for all current implementations, and it makes sense that way.

There is nothing that guarantees this is permitted, any more than there
is anything saying not to use outb/outl. Some of the implementations do
quite strange things. It may happen to work, but it's not in the
documentation or the comments.

If you want to change this then you need to check the existing usages and
update all the docs if it's safe, oh and tell the sparc64 pcmcia people to
take a hike, which is probably not a big problem.


Please, can anybody clarify it?

thanks,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: 2.6.23 alpha unistd.h changes

2007-09-17 Thread Adrian Bunk
On Mon, Sep 17, 2007 at 10:33:07PM +0200, Oliver Falk wrote:
> Hi!

Hi Oliver!

>...
> As these additions are quite new to upstream kernel, but at Alphacore we
> have patched it since a while now (I don't know about other Alpha ports;
> Debian folks may speak up now!), I would suggest to use the same
> 'ordering' of the syscalls upstream and add the new syscalls that we had
> not in place, but are now upstream to the end of our 'old' list.
>...

I just checked:

It seems Debian didn't patch them into the kernel at all, and for the past
two months Debian unstable has shipped kernel 2.6.22 with the upstream
syscall numbers.

> Best,
>  Oliver

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH] modpost: detect unterminated device id lists

2007-09-17 Thread Satyam Sharma


On Sun, 16 Sep 2007, Andrew Morton wrote:

> On Mon, 17 Sep 2007 05:54:45 +0530 "Satyam Sharma" <[EMAIL PROTECTED]> wrote:
> 
> > On 9/17/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > >
> > > I'm getting this:
> > >
> > > rusb2/pvrusb2: struct usb_device_id is 20 bytes.  The last of 3 is:
> > > 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> > > 0x00 0x00 0x00 0x00 0x00
> > > FATAL: drivers/media/video/pvrusb2/pvrusb2: struct usb_device_id is not 
> > > terminated
> > > with a NULL entry!
> > >
> > > ("rusb2/pvrusb2" ??)
> > 
> > Hmm? Are you sure you didn't see any "drivers/media/video/pv" before the
> > "rusb2/pvrusb2" bit?
> 
> Fairly.  I looked twice.

"drivers/media/video/pvrusb2/pvrusb2" comes out correctly here ...


> > Looking at Kees' patch (and the existing code), I've no
> > clue how/why this should happen ... will try to reproduce here ...
> > 
> > 
> > > but:
> > >
> > > struct usb_device_id pvr2_device_table[] = {
> > > [PVR2_HDW_TYPE_29XXX] = { USB_DEVICE(0x2040, 0x2900) },
> > > [PVR2_HDW_TYPE_24XXX] = { USB_DEVICE(0x2040, 0x2400) },
> > > { USB_DEVICE(0, 0) },
> > > };
> > >
> > > looks OK?
> > >
> > > Using plain old "{ }" shut the warning up.
> > 
> > USB_DEVICE(0, 0) is not empty termination, actually, and this looks like
> > a genuine bug caught by the patch. As that dump shows, USB_DEVICE(0, 0)
> > assigns "0x03 0x00" (in little endian) to usb_device_id.match_flags. And
> > I don't think the USB code treats such an entry as an empty entry (?)
> > 
> > Interestingly, the "USB_DEVICE(0, 0)" thing is absent from latest -git
> > tree and also in my copy of 23-rc4-mm1 -- so this looks like something
> > you must've merged recently.
> 
> git-dvb very carefully does
> 
> --- a/drivers/media/video/pvrusb2/pvrusb2-hdw.c~git-dvb
> +++ a/drivers/media/video/pvrusb2/pvrusb2-hdw.c
> @@ -44,7 +44,7 @@
>  struct usb_device_id pvr2_device_table[] = {
>   [PVR2_HDW_TYPE_29XXX] = { USB_DEVICE(0x2040, 0x2900) },
>   [PVR2_HDW_TYPE_24XXX] = { USB_DEVICE(0x2040, 0x2400) },
> -   { }
> +   { USB_DEVICE(0, 0) },
> };
>
> MODULE_DEVICE_TABLE(usb, pvr2_device_table);

Ok, this is a false positive indeed, the core USB code does in fact
treat such an entry as an empty entry (usb_match_id() tests only the
.idVendor, .bDeviceClass, .bInterfaceClass and .driver_info members
for non-zero and not the .match_flags member).

However, a quick-grep-and-glance tells us that none of the other 2213
occurrences of USB_DEVICE() in the tree ever do this "(0,0)" thing,
so it does make sense to change this one to a simple "{ }" as well --
that's clearer style anyway, and the "standard" way to empty-terminate
in the rest of the tree, if nothing else.
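
To make the distinction concrete, here is a small userspace sketch, not the kernel code. The struct is a cut-down stand-in for struct usb_device_id, and the ex_* names are hypothetical. It assumes, as described above, that USB_DEVICE() sets the vendor and product match bits (hence the 0x03 in the dump), that a byte-wise checker wants every field zero, and that the core's end-of-table test looks only at the four members listed.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MATCH_VENDOR   0x0001
#define EX_MATCH_PRODUCT  0x0002

/* Cut-down stand-in for struct usb_device_id (illustration only). */
struct ex_usb_device_id {
	uint16_t match_flags;
	uint16_t idVendor;
	uint16_t idProduct;
	uint8_t bDeviceClass;
	uint8_t bInterfaceClass;
	unsigned long driver_info;
};

/* Mirrors what USB_DEVICE(vend, prod) does: it also sets match_flags. */
#define EX_USB_DEVICE(vend, prod) \
	.match_flags = EX_MATCH_VENDOR | EX_MATCH_PRODUCT, \
	.idVendor = (vend), .idProduct = (prod)

static struct ex_usb_device_id ex_macro_terminator(void)
{
	struct ex_usb_device_id id = { EX_USB_DEVICE(0, 0) };
	return id;
}

static struct ex_usb_device_id ex_empty_terminator(void)
{
	struct ex_usb_device_id id = { 0 };
	return id;
}

/* What a strict "all zero" terminator check sees. */
static int ex_entry_is_all_zero(struct ex_usb_device_id id)
{
	return !id.match_flags && !id.idVendor && !id.idProduct &&
	       !id.bDeviceClass && !id.bInterfaceClass && !id.driver_info;
}

/* What the table walk described above sees: match_flags is ignored. */
static int ex_core_treats_as_end(struct ex_usb_device_id id)
{
	return !(id.idVendor || id.bDeviceClass ||
		 id.bInterfaceClass || id.driver_info);
}

/* Usage example: the macro terminator ends the table but is not all zero. */
static void ex_check_terminators(void)
{
	assert(ex_core_treats_as_end(ex_macro_terminator()));
	assert(!ex_entry_is_all_zero(ex_macro_terminator()));
	assert(ex_entry_is_all_zero(ex_empty_terminator()));
}
```

Both terminators stop the table walk, but only the empty one passes the strict check, which is why switching to "{ }" silences the warning without changing behaviour.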


Satyam


Re: [ofa-general] [PATCH] [WORKAROUND] CONFIG_PREEMPT_RT and ib_umad_close() issue

2007-09-17 Thread Roland Dreier
Thanks for the explanation...

 > But basically, with CONFIG_PREEMPT_RT enabled, the lock points, such as
 > acquiring a spinlock, potentially become places where the current task
 > may be context switched out / preempted.
 > 
 > Therefore, when a call is made to lock a spinlock for example, the
 > caller should not currently have irqs disabled, or preemption disabled,
 > since a context switch may occur.

this doesn't seem relevant here...

 > void fastcall rt_downgrade_write(struct rw_semaphore *rwsem)
 > {
 > BUG();
 > }

this seems to be the problem... the -rt patch turns downgrade_write()
into a BUG().

I need to look at the locking in user_mad.c again, but I think it may
be possible to replace both places that do downgrade_write() with
up_write() followed by down_read().

 - R.


[PATCH 18/33] containers implement namespace tracking subsystem

2007-09-17 Thread Paul Menage
From: "Serge E. Hallyn" <[EMAIL PROTECTED]>
(container->cgroup renaming by Paul Menage <[EMAIL PROTECTED]>)

When a task enters a new namespace via a clone() or unshare(), a new cgroup
is created and the task moves into it.

This version names cgroups which are automatically created using
cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or
cloned process.  (Thanks Pavel for the idea) This is safe because if the
process unshares again, it will create

/cgroups/(...)/node_<pid>/node_<pid>

The only possibilities (AFAICT) for a -EEXIST on unshare are

1. pid wraparound
2. a process fails an unshare, then tries again.

Case 1 is unlikely enough that I ignore it (at least for now).  In case 2, the
node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare()
succeed.
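
The naming rule can be illustrated with a few lines of userspace C. This is a sketch, not the code in cgroup_clone(): the ex_* helpers and the buffer sizes are hypothetical, and it assumes the name is the literal prefix "node_" followed by the decimal pid, with a second unshare by the same process nesting a directory of the same name one level down.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "node_<pid>" into buf; returns 0 on success, -1 if it does not fit. */
static int ex_node_name(char *buf, size_t len, int pid)
{
	int n = snprintf(buf, len, "node_%d", pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Relative path after the same process unshares a second time. */
static int ex_nested_path(char *buf, size_t len, int pid)
{
	int n = snprintf(buf, len, "node_%d/node_%d", pid, pid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Usage example for pid 1234. */
static void ex_check_names(void)
{
	char buf[64];

	assert(ex_node_name(buf, sizeof(buf), 1234) == 0);
	assert(strcmp(buf, "node_1234") == 0);
	assert(ex_nested_path(buf, sizeof(buf), 1234) == 0);
	assert(strcmp(buf, "node_1234/node_1234") == 0);
}
```

Because the second directory is created inside the first, the repeated name does not collide, which leaves only the two -EEXIST cases listed above.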

Changelog:
Name cloned cgroups as "node_<pid>".

Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>
Signed-off-by: Paul Menage <[EMAIL PROTECTED]>

---

 include/linux/cgroup_subsys.h |6 +
 include/linux/nsproxy.h  |7 ++
 init/Kconfig |9 ++
 kernel/Makefile  |1 
 kernel/ns_cgroup.c|  100 +
 kernel/nsproxy.c |   17 
 6 files changed, 139 insertions(+), 1 deletion(-)

diff -puN 
include/linux/cgroup_subsys.h~cgroups-implement-namespace-tracking-subsystem 
include/linux/cgroup_subsys.h
--- 
a/include/linux/cgroup_subsys.h~cgroups-implement-namespace-tracking-subsystem
+++ a/include/linux/cgroup_subsys.h
@@ -24,3 +24,9 @@ SUBSYS(debug)
 #endif
 
 /* */
+
+#ifdef CONFIG_CGROUP_NS
+SUBSYS(ns)
+#endif
+
+/* */
diff -puN 
include/linux/nsproxy.h~cgroups-implement-namespace-tracking-subsystem 
include/linux/nsproxy.h
--- a/include/linux/nsproxy.h~cgroups-implement-namespace-tracking-subsystem
+++ a/include/linux/nsproxy.h
@@ -55,4 +55,11 @@ static inline void exit_task_namespaces(
put_nsproxy(ns);
}
 }
+
+#ifdef CONFIG_CGROUP_NS
+int ns_cgroup_clone(struct task_struct *tsk);
+#else
+static inline int ns_cgroup_clone(struct task_struct *tsk) { return 0; }
+#endif
+
 #endif
diff -puN init/Kconfig~cgroups-implement-namespace-tracking-subsystem 
init/Kconfig
--- a/init/Kconfig~cgroups-implement-namespace-tracking-subsystem
+++ a/init/Kconfig
@@ -323,6 +323,15 @@ config SYSFS_DEPRECATED
  If you are using a distro that was released in 2006 or later,
  it should be safe to say N here.
 
+config CGROUP_NS
+bool "Namespace cgroup subsystem"
+select CGROUPS
+help
+  Provides a simple namespace cgroup subsystem to
+  provide hierarchical naming of sets of namespaces,
+  for instance virtual servers and checkpoint/restart
+  jobs.
+
 config PROC_PID_CPUSET
bool "Include legacy /proc/<pid>/cpuset file"
depends on CPUSETS
diff -puN kernel/Makefile~cgroups-implement-namespace-tracking-subsystem 
kernel/Makefile
--- a/kernel/Makefile~cgroups-implement-namespace-tracking-subsystem
+++ a/kernel/Makefile
@@ -42,6 +42,7 @@ obj-$(CONFIG_CGROUPS) += cgroup.o
 obj-$(CONFIG_CGROUP_DEBUG) += cgroup_debug.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpu_acct.o
+obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
 obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
 obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
diff -puN /dev/null kernel/ns_cgroup.c
--- /dev/null
+++ a/kernel/ns_cgroup.c
@@ -0,0 +1,100 @@
+/*
+ * ns_cgroup.c - namespace cgroup subsystem
+ *
+ * Copyright 2006, 2007 IBM Corp
+ */
+
+#include 
+#include 
+#include 
+
+struct ns_cgroup {
+   struct cgroup_subsys_state css;
+   spinlock_t lock;
+};
+
+struct cgroup_subsys ns_subsys;
+
+static inline struct ns_cgroup *cgroup_to_ns(
+   struct cgroup *cgroup)
+{
+   return container_of(cgroup_subsys_state(cgroup, ns_subsys_id),
+   struct ns_cgroup, css);
+}
+
+int ns_cgroup_clone(struct task_struct *task)
+{
+   return cgroup_clone(task, &ns_subsys);
+}
+
+/*
+ * Rules:
+ *   1. you can only enter a cgroup which is a child of your current
+ * cgroup
+ *   2. you can only place another process into a cgroup if
+ * a. you have CAP_SYS_ADMIN
+ * b. your cgroup is an ancestor of task's destination cgroup
+ *   (hence either you are in the same cgroup as task, or in an
+ *ancestor cgroup thereof)
+ */
+static int ns_can_attach(struct cgroup_subsys *ss,
+   struct cgroup *new_cgroup, struct task_struct *task)
+{
+   struct cgroup *orig;
+
+   if (current != task) {
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (!cgroup_is_descendant(new_cgroup))
+   return -EPERM;
+   }
+
+   if (atomic_read(&new_cgroup->count) != 0)
+   return -EPERM;
+
+   orig = task_cgroup(task, ns_subsys_id);
+   if (orig && orig != new_cgroup->parent)
+   

Re: [PATCH] 2.6.23-rc6: Fix NUMA Memory Policy Reference Counting

2007-09-17 Thread Christoph Lameter
On Mon, 17 Sep 2007, Lee Schermerhorn wrote:

> Only for vma policy, right?  show_numa_maps() isn't a performance path,
> and shared policies are already reference counted--just not unref'd!

Right.

> I do have some ideas for enhancements to memtoy to test vma policies in
> a multi-threaded task.  I have the basic multi-threading infrastructure
> that binds threads to cpus, allocates node local stacks, thread state
> structs, ... in my mmtrace tool that I can probably hack for use in
> memtoy to provoke cacheline bouncing of the mem policy.  But, if pft
> does the trick, I won't rush the memtoy enhancements...

Well pft is old and limited in what it can do. I'd be glad if you could 
put it into memtoy. Then it may perhaps be useful in the future.
 
> Meanwhile, we do have a mem policy ref counting bug in the mainline.

But we have had this ref counting issue forever with no ill effect. Memory 
policies were designed to have almost no overhead for the default 
allocation paths. Incrementing and decrementing refcounters makes that 
design no longer light weight as it was intended to be.




Re: [PATCH] Configurable reclaim batch size

2007-09-17 Thread Christoph Lameter
On Tue, 18 Sep 2007, Balbir Singh wrote:

> Please do let me know if someone finds a good standard test for it or a
> way to stress reclaim. I've heard AIM7 come up often, but never been
> able to push it much. I should retry.

AIM7 does small computing loads reflecting an earlier time. I wish there 
was something better reflecting large computing loads of today. The tests 
that I know of require MPI and other libraries and are not that suitable 
for kernel hackers.



[PATCH 00/33] Rename "Task Containers" to "Control Groups"

2007-09-17 Thread Paul Menage
--


This patchset renames "Task Containers" to "Control Groups", in line with
recent discussions.

These patches are drop-in replacements for those of the same names in
the current -mm tree, as listed at the bottom of this email. I have
tried to keep the patch names (based on the description lines of each
patch) the same even where such patch names refer to "containers"
rather than "cgroups" - hopefully after these have been imported into
-mm, Andrew can rename the patches in his quilt stack.

Note that the Signed-off-by lines in these patches (other than mine)
refer to the original versions of these patches, before the rename. I
believe that all such original authors were Cc'd on the review email
last week and no-one has objected to the rename.

The new patches are constructed by running the following perl script
over the original versions of the patches in 2.6.23-rc4-mm1 (and others seen on 
mm-commits since then):

perl -pi -e 's/subcontainer/child cgroup/g; 
s/(\b|_)container(\b|_(?!of)|fs|s)/$1cgroup$2/g; s/CONTAINER(_|S)/CGROUP$1/g; 
s/Container/Control Group/g; s/css_group/css_set/g;' patches/*.patch

Owners of other control-group related patches may want to run this
script or some variation of it on their own patches.

Replaced patches


task-containersv11-basic-task-container-framework.patch
task-containersv11-basic-task-container-framework-fix.patch
task-containersv11-add-tasks-file-interface.patch
task-containersv11-add-fork-exit-hooks.patch
task-containersv11-add-container_clone-interface.patch
task-containersv11-add-procfs-interface.patch
task-containersv11-shared-container-subsystem-group-arrays.patch
task-containersv11-shared-container-subsystem-group-arrays-avoid-lockdep-warning.patch
task-containersv11-shared-container-subsystem-group-arrays-include-fix.patch
task-containersv11-automatic-userspace-notification-of-idle-containers.patch
task-containersv11-make-cpusets-a-client-of-containers.patch
task-containersv11-example-cpu-accounting-subsystem.patch
task-containersv11-simple-task-container-debug-info-subsystem.patch

task-containersv11-basic-task-container-framework-containers-fix-refcount-bug.patch
task-containersv11-add-container_clone-interface-containers-fix-refcount-bug.patch

add-containerstats-v3.patch
add-containerstats-v3-fix.patch

containers-implement-namespace-tracking-subsystem.patch
containers-implement-namespace-tracking-subsystem-fix-order-of-container-subsystems-in-init-kconfig.patch

memory-controller-add-documentation.patch
memory-controller-resource-counters-v7.patch
memory-controller-resource-counters-v7-fix.patch
memory-controller-containers-setup-v7.patch
memory-controller-accounting-setup-v7.patch
memory-controller-memory-accounting-v7.patch
memory-controller-task-migration-v7.patch
memory-controller-add-per-container-lru-and-reclaim-v7.patch
memory-controller-add-per-container-lru-and-reclaim-v7-fix.patch
memory-controller-oom-handling-v7.patch
memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7.patch
memory-controller-make-page_referenced-container-aware-v7.patch
memory-controller-improve-user-interface.patch
memory-controller-make-charging-gfp-mask-aware.patch



Re: 2.6.23 alpha unistd.h changes

2007-09-17 Thread Adrian Bunk
On Mon, Sep 17, 2007 at 10:33:07PM +0200, Oliver Falk wrote:
> Hi!

Hi Oliver!

> At Alphacore we used to patch the kernel headers for a while now; We
> added syscalls __NR_openat (447) until __NR_tee (466).

Why did your numbers differ from the numbers that were used in the 
upstream kernel?

The Alpha maintainers (Cc's added) might know better what happened here.

> However, since 2.6.23 these syscalls were added upstream, but with
> different syscall numbers; What happens is the following:
>...

These syscalls were added in 2.6.22, not 2.6.23, and have therefore been
in an officially released kernel for more than two months.

Changing a userspace ABI that has already been part of an officially
released kernel because someone patched other syscall numbers into his
private kernel doesn't sound like a good solution.

> Best,
>  Oliver

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [2.6.23-rc4-mm1][Bug] kernel BUG at include/linux/netdevice.h:339!

2007-09-17 Thread Andrew Morton
On Mon, 17 Sep 2007 17:46:38 +0530
Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> Kernel Bug is hit with 2.6.23-rc4-mm1 kernel on ppc64 machine.
> 
> kernel BUG at include/linux/netdevice.h:339!

(please cc [EMAIL PROTECTED] on networking-related matters)

You died here:

static inline void napi_complete(struct napi_struct *n)
{
	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));

The NAPI changes have had a few problems and hopefully things have
been fixed up since then.  I'll try to get rc6-mm1 out this evening,
so please retest that?
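
The invariant behind that BUG_ON can be sketched in plain userspace C.
This is a simplified model, not the kernel code: every toy_* name is
hypothetical, the bit operations are not atomic as the real ones are,
and toy_complete() returns false where the kernel would BUG. It only
shows that completing a poll is legal solely while the "scheduled" bit
is set:

```c
#include <stdbool.h>

/* Hypothetical bit number standing in for NAPI_STATE_SCHED. */
enum { TOY_STATE_SCHED = 0 };

/* Hypothetical stand-in for struct napi_struct; only the state word. */
struct toy_napi {
	unsigned long state;
};

/* Non-atomic model of test_bit(): is bit `nr` set in *addr? */
static bool toy_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Mark the instance as scheduled; returns false if it already was. */
static bool toy_schedule_prep(struct toy_napi *n)
{
	if (toy_test_bit(TOY_STATE_SCHED, &n->state))
		return false;
	n->state |= 1UL << TOY_STATE_SCHED;
	return true;
}

/*
 * Model of the check quoted above: completion requires the scheduled
 * bit.  Returns false in the case where the kernel would hit BUG_ON,
 * otherwise clears the bit and returns true.
 */
static bool toy_complete(struct toy_napi *n)
{
	if (!toy_test_bit(TOY_STATE_SCHED, &n->state))
		return false;
	n->state &= ~(1UL << TOY_STATE_SCHED);
	return true;
}
```

In this model a second toy_complete() without an intervening
toy_schedule_prep() fails, which is the kind of double completion (or
completion without scheduling) that the BUG_ON is there to catch.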


[PATCH] mac_hid: fix build error if MAC_EMUMOUSEBTN && !INPUT

2007-09-17 Thread Andreas Herrmann
Hi,

With current git an invalid kernel configuration is
selectable which leads to kernel build errors for mac_hid.
To prevent this selection I suggest the following patch.

Regards,

Andreas
-- 
Build error if MAC_EMUMOUSEBTN && !INPUT:

  LD  .tmp_vmlinux1
drivers/built-in.o: In function `input_report_key':
include/linux/input.h:1158: undefined reference to `input_event'
...

Auto-select INPUT for MAC_EMUMOUSEBTN option.

Signed-off-by: Andreas Herrmann <[EMAIL PROTECTED]>
---
 drivers/macintosh/Kconfig |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig
index 56cd899..77f50b6 100644
--- a/drivers/macintosh/Kconfig
+++ b/drivers/macintosh/Kconfig
@@ -172,6 +172,7 @@ config INPUT_ADBHID
 
 config MAC_EMUMOUSEBTN
bool "Support for mouse button 2+3 emulation"
+   select INPUT
help
  This provides generic support for emulating the 2nd and 3rd mouse
  button with keypresses.  If you say Y here, the emulation is still
-- 
1.5.3



