Re: [SLUB 0/3] SLUB: The unqueued slab allocator V4

2007-03-09 Thread Mel Gorman

On Thu, 8 Mar 2007, Christoph Lameter wrote:


> Note that I am amazed that the kernbench even worked. On small machine


How small? The machines I am testing on aren't "big" but they aren't 
miserable either.



> I seem to be getting into trouble with order 1 allocations.


That in itself is pretty incredible. From what I see, allocations up to 
order 3 generally work unless they are atomic, even with the vanilla kernel. 
That said, it could be because slab is holding onto the high order pages for 
itself.



> SLAB seems to be
> able to avoid the situation by keeping higher order pages on a freelist
> and reducing the alloc/frees of higher order pages that the page allocator
> has to deal with. Maybe we need per order queues in the page allocator?



I'm not sure what you mean by per-order queues. The buddy allocator 
already has per-order lists.



> There must be something fundamentally wrong in the page allocator if the
> SLAB queues fix this issue. I was able to fix the issue in V5 by forcing
> SLUB to keep a minimum number of objects around regardless of the fit to
> a page order page. Pass through is deadly since the crappy page allocator
> cannot handle it.
>
> Higher order page allocation failures can be avoided by using kmalloc.
> Yuck! Hopefully your patches fix that fundamental problem.



One way to find out for sure.

--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Pluggable Schedulers (was: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler)

2007-03-09 Thread William Lee Irwin III
William Lee Irwin III wrote:
>> I consider policy issues to be hopeless political quagmires and
>> therefore stick to mechanism. So even though I may have started the
>> code in question, I have little or nothing to say about that sort of
>> use for it.
>> There's my longwinded excuse for having originated that tidbit of code.

On Fri, Mar 09, 2007 at 04:25:55PM +0300, Al Boldi wrote:
> I've no idea what both of you are talking about.

The short translation of my message for you is "Linus, please don't
LART me too hard."


On Fri, Mar 09, 2007 at 04:25:55PM +0300, Al Boldi wrote:
> How can giving people the freedom of choice be in any way counter-productive?

This sort of concern is too subjective for me to have an opinion on it.
My preferred sphere of operation is the Manichean domain of faster vs.
slower, functionality vs. non-functionality, and the like. For me, such
design concerns are like the need for a kernel to format pagetables so
the x86 MMU decodes what was intended, or for a compiler to emit valid
assembly instructions, or for a programmer to write C the compiler
won't reject with parse errors. If Linus, akpm, et al object to the
design, then invalid output was produced. Please refer to Linus, akpm,
et al for these sorts of design concerns.


-- wli


Re: bugs in kernel 2.6.21 (both two release candidates) and kernel 2.6.20

2007-03-09 Thread Uwe Bugla

Original Message
Date: Tue, 6 Mar 2007 09:57:34 -0800 (PST)
From: Linus Torvalds <[EMAIL PROTECTED]>
To: Uwe Bugla <[EMAIL PROTECTED]>
CC: linux-kernel@vger.kernel.org, [EMAIL PROTECTED]
Subject: Re: bugs in kernel 2.6.21 (both two release candidates) and kernel 2.6.20

> 
Hello Linus,
> 
> [ Whitespace in your email fixed so that it's easier to read again ;^]
> 
Thanks.
> On Tue, 6 Mar 2007, Uwe Bugla wrote:
> >
> > But please do not compare me with maintainers, especially never in this 
> > life with the suboptimal one from linuxtv.org being in question.
> >
> >  a. I ain't no maintainer
> 
> Sure. The problem is that it's actually really really hard (read: almost 
> totally impossible) to find a person that everybody will agree is a good 
> maintainer. It's just a really hard job, and we all screw up.
> 

I know.

> Not just from a technical angle too - quite often, the biggest problem for
> a maintainer is just the people skills.

Exactly! Although I had a lot of positive experiences in connection with 
mm-tree bug fixing. Ask Andrew if you do not believe me. 

> There's a fair number of people 
> that are good at technology but have no people skills what-so-ever, and 
> those are usually *worse* maintainers than people who may not be 100% up 
> on all the technical issues, but have no problem working with people who 
> do.

You are definitely right.

> 
> Finding somebody who is both technically top-notch *and* can work with 
> people is so rare as to be something you shouldn't even look for. That's 
> especially true since quite often, maintainership doesn't even come with a
> lot of glory - just a lot of work, and the expectation that you always be 
> there.

Yes. But I am the last one to avoid a "thank you" or "well done" if I notice 
that I have met someone doing a pretty good job! Positive motivation, OK?

> 
> So when you call maintainers suboptimal, please realize that:
> 
>  - We're *all* suboptimal.

Of course. But the one in question @linuxtv.org is extreme as far as this is 
concerned.

> 
>    I personally, of course, am totally perfect, and never ever make any 
>    mistakes (did I already mention that I'm also good-looking?) but even 
>    despite my obviously superior features, some people have the temerity 
>    to point out that they think I make mistakes and argue way too much.

a. I am good-looking too. If you doubt, ask my women, (yes plural!) they will 
agree.

b. Apart from the sarcastic background of that statement, your biggest fault 
is that you have been publishing horrible kernels since December 2006. The 
speed (1 release candidate almost every week) is too high, completely insane.

The result is crap code as hell in official release candidates.
To get those regressions fixed you feel like a hamster in a running wheel.
It simply feels inhuman, like a robot.

I remember Michael Krufky posting something like: "Linus, this code needs more 
testing". And your answer was: "Too late, already being pushed!"
See, Linus, you cannot go on like this, so please slow down now, man!

I can very well understand Andrew asking me to test the mm-tree, which acts 
as something like a filter system for buggy code. But when should I do this if 
I need a machine running for 10 hours, waiting for the "hopefully" last 
regression in kernel 2.6.20 to be identified? And hopefully fixed by Bart or 
Alan?


And if I cannot even trust that an official rc is sane, how should 
I or anybody else draw conclusions about which module or which hunks are faulty 
in some mm-tree? And I still do not know how to do git bisecting. I have never 
done that.
You cannot test the quality of the roof of a house if its foundation is faulty 
and buggy like hell, can you?? SLOW DOWN MAN!

> 
>    So imagine that if people can find fault in the absolute perfection 
>    that is Linus "almost Godlike" Torvalds, what about some poor sap who 
>    maintains a piece of hardware with crappy documentation and a difficult
>    user base? And he doesn't even get the recognition that I do, so he's 
>    just left with tons of abuse and may not be paid to do what he wants to
>    do, so he has to do other work *too*.

I guess I can see through the problem.
> 
>  - if you want to change something, it's fine to not be entirely polite 
>    all the time: Al Viro and Christoph "is my hair blue this week?" 
>    Hellwig are *famous* for being blunt bastards that are negative as hell
>    (and hey, so am I), but they are also well-known for getting things 
>    done and mostly being right.
> 
>    And building that up takes time.


Who are Al Viro and Christoph? The two examples of the reproduction of your 
personal genetic code I suppose?


> 
> >  b. As far as my virtues are concerned in comparison to his the 
> > comparison is completely displaced.
> 
> Well, not entirely. The thing is, we all have "mental filters". Like it or
> not, people get associated with what they do, and even if they then do 
> 

Re: [SLUB 0/3] SLUB: The unqueued slab allocator V4

2007-03-09 Thread Mel Gorman

On Thu, 8 Mar 2007, Christoph Lameter wrote:


> On Thu, 8 Mar 2007, Mel Gorman wrote:
>
> > > Note that the 16kb page size has a major
> > > impact on SLUB performance. On IA64 slub will use only 1/4th the locking
> > > overhead as on 4kb platforms.
> >
> > It'll be interesting to see the kernbench tests then with debugging
> > disabled.
>
> You can get a similar effect on 4kb platforms by specifying slub_min_order=2 on 
> bootup.
> This means that we have to rely on your patches to allow higher order
> allocs to work reliably though.


It should work out, because the way buddy always selects the minimum 
page size will tend to cluster the slab allocations together whether they 
are reclaimable or not. It's something I can investigate when slub has 
stabilised a bit.


However, in general, high order kernel allocations remain a bad idea. 
Depending on high order allocations that do not group could potentially 
lead to a situation where the movable areas are used more and more by 
kernel allocations. I cannot think of a workload that would actually break 
everything, but it's a possibility.



> The higher the order of slub the less
> locking overhead. So the better your patches deal with fragmentation the
> more we can reduce locking overhead in slub.



I can certainly kick it around a lot and see what happens. It's best that 
slub_min_order=2 remain an optional performance enhancing switch though.
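
For anyone following the numbers: an order-n allocation is 2^n physically
contiguous pages, so slub_min_order=2 on a 4kb platform gives 16kb slabs, the
same size as a single IA64 page. A rough userspace sketch of that arithmetic
(the helper names are made up for illustration and are not kernel functions;
per-slab metadata and alignment are ignored):

```c
#include <stddef.h>

/* Bytes covered by one order-`order` allocation: 2^order contiguous pages. */
size_t order_to_bytes(size_t page_size, unsigned int order)
{
	return page_size << order;
}

/*
 * How many objects of `obj_size` bytes fit in one slab of that order.
 * Assumption: no per-slab metadata and no alignment padding.
 */
size_t objects_per_slab(size_t page_size, unsigned int order, size_t obj_size)
{
	return order_to_bytes(page_size, order) / obj_size;
}
```

With 4096-byte pages, an order 2 slab holds four times as many objects as an
order 0 slab, so slabs should need to be fetched from or handed back to the
page allocator about a quarter as often. That is presumably where the 1/4th
figure comes from.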


--
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: sys_write() racy for multi-threaded append?

2007-03-09 Thread Eric Dumazet
On Friday 09 March 2007 13:19, Michael K. Edwards wrote:
> On 3/8/07, Benjamin LaHaise <[EMAIL PROTECTED]> wrote:
> > Any number of things can cause a short write to occur, and rewinding the
> > file position after the fact is just as bad.  A sane app has to either
> > serialise the writes itself or use a thread safe API like pwrite().
>
> Not on a pipe/FIFO.  Short writes there are flat out verboten by
> 1003.1 unless O_NONBLOCK is set.  (Not that f_pos is interesting on a
> pipe except as a "bytes sent" indicator  -- and in the multi-threaded
> scenario, if you do the speculative update that I'm suggesting, you
> can't 100% trust it unless you ensure that you are not in
> mid-read/write in some other thread at the moment you sample f_pos.
> But that doesn't make it useless.)
Hello Michael

When was the last time you checked the standards? Please read them again, and 
stop misinforming people.

http://www.opengroup.org/onlinepubs/007908775/xsh/write.html

"On a file not capable of seeking, writing always takes place starting at the
 current position. The value of a file offset associated with such a device
 is undefined."

A pipe/FIFO is not capable of seeking.

I will let you draw the conclusion from these two points.

A conformant kernel is free not to touch f_pos for files that are not capable 
of seeking (pipes, sockets, ...), or to put any value in it.

The current code does that not because of lazy programmers, but because it's 
generic, and adding special cases (tests + conditional branches) would just 
slow down the code and make it larger.

>
> As to what a "sane app" has to do: it's just not that unusual to write
> application code that treats a short read/write as a catastrophic
> error, especially when the fd is of a type that is known never to
> produce a short read/write unless something is drastically wrong.  For
> instance, I bomb on short write in audio applications where the driver
> is known to block until enough bytes have been read/written, period.
> When switching from reading a stream of audio frames from thread A to
> reading them from thread B, I may be willing to omit app
> serialization, because I can tolerate an imperfect hand-off in which
> thread A steals one last frame after thread B has started reading --
> as long as the fd doesn't get screwed up.  There is no reason for the
> generic sys_read code to leave a race open in which the same frame is
> read by both threads and a hardware buffer overrun results later.

Don't assume your app is sane while the kernel is not. It's not very fair :

Show us the source code so that we can agree with you or disagree.

Also, I've seen some Unixes (namely IBM AIX) that could return a partial write 
even on a regular file on a regular file system. An easy way to trigger this 
was to launch a debugger/syscall_tracer on the live process while it was 
doing a big write(). Most 'sane apps' either ignored the partial return or 
just threw an exception.

Even on 'cleaner Unixes', a write() near the ulimit -f may return a partial 
count on a regular file.
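
The partial-write cases above are why portable code usually loops rather than
treating a short count as fatal. A minimal sketch, assuming a blocking fd;
write_all is a made-up helper name, not something from this thread or from
libc:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all `len` bytes of `buf` to `fd`, retrying after short writes
 * and EINTR. Returns 0 on success, -1 on error with errno set.
 * This does nothing about the shared file position: concurrent callers
 * on the same fd still need their own serialisation, or pwrite().
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data was written */
			return -1;
		}
		if (n == 0) {
			errno = EIO;	/* no progress: bail out instead of spinning */
			return -1;
		}
		p += n;			/* short write: skip what was accepted */
		len -= (size_t)n;
	}
	return 0;
}
```

Treating a zero-byte return as an error is a design choice made here to keep
the loop from spinning; other code bases pick a different errno for it.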

>
> In short, I'm not proposing that the kernel perfectly serialize
> concurrent reads and writes to arbitrary fd types.  I'm proposing that
> it not do something blatantly stupid and easily avoided in generic
> code that makes it impossible for any fd type to guarantee that, after
> 10 successful pipelined 100-byte reads or writes, f_pos will have
> advanced by 1000.
>

Before saying current linux code is "blatantly stupid and easily avoided", 
just post your patches so that we can check them and possibly say : 

Oh yes, Michael was right and {we|they} were "stupid" all these years

Thank you


Re: [PATCH 2/2] e1000: Implement the new kernel API for multiqueue TX support.

2007-03-09 Thread Thomas Graf
* Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> 
> From: Peter Waskiewicz Jr. <[EMAIL PROTECTED]>
> 
> Several newer e1000 chipsets support multiple RX and TX queues. Most
> commonly, 82571's and ESB2LAN support 2 rx and 2 tx queues.
> 
> Signed-off-by: Peter Waskiewicz Jr. <[EMAIL PROTECTED]>
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
> ---
> 
>  drivers/net/Kconfig   |   13 ++
>  drivers/net/e1000/e1000.h |   23 +++
>  drivers/net/e1000/e1000_ethtool.c |   43 ++
>  drivers/net/e1000/e1000_main.c|  269 +++--
>  4 files changed, 304 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index ad92b6a..2d758ab 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -1988,6 +1988,19 @@ config E1000_DISABLE_PACKET_SPLIT
>  
> If in doubt, say N.
>  
> +config E1000_MQ
> + bool "Enable Tx/Rx Multiqueue Support (EXPERIMENTAL)"
> + depends on E1000 && NET_MULTI_QUEUE_DEVICE && EXPERIMENTAL

Would be better to just select NET_MULTI_QUEUE_DEVICE instead of
depending on it.


Re: [PATCH 1/2] NET: Multiple queue network device support

2007-03-09 Thread Thomas Graf
* Kok, Auke <[EMAIL PROTECTED]> 2007-02-08 16:09
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 455d589..42b635c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1477,6 +1477,49 @@ gso:
>   skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
>  #endif
>   if (q->enqueue) {
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> + int queue_index;
> + /* If we're a multi-queue device, get a queue index to lock */
> + if (netif_is_multiqueue(dev))
> + {
> + /* Get the queue index and lock it. */
> + if (likely(q->ops->map_queue)) {
> + queue_index = q->ops->map_queue(skb, q);
> + spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> + rc = q->enqueue(skb, q);
> + /*
> +  * Unlock because the underlying qdisc
> +  * may queue and send a packet from a
> +  * different queue.
> +  */
> + spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> + qdisc_run(dev);

I really dislike this asymmetric locking logic: while enqueue() is called
with queue_lock held, dequeue() is responsible for acquiring the lock for
qdisc_restart().

> + rc = rc == NET_XMIT_BYPASS
> +? NET_XMIT_SUCCESS : rc;
> + goto out;
> + } else {
> + printk(KERN_CRIT "Device %s is "
> + "multiqueue, but map_queue is "
> + "undefined in the qdisc!!\n",
> + dev->name);
> + kfree_skb(skb);

Move this check to tc_modify_qdisc(), it's useless to check this for
every packet being delivered.

> + }
> + } else {
> + /* We're not a multi-queue device. */
> + spin_lock(&dev->queue_lock);
> + q = dev->qdisc;
> + if (q->enqueue) {
> + rc = q->enqueue(skb, q);
> + qdisc_run(dev);
> + spin_unlock(&dev->queue_lock);
> + rc = rc == NET_XMIT_BYPASS
> +? NET_XMIT_SUCCESS : rc;
> + goto out;
> + }
> + spin_unlock(&dev->queue_lock);

Please don't duplicate already existing code.

> @@ -130,6 +140,72 @@ static inline int qdisc_restart(struct net_device *dev)
>   }
>   
>   {
> +#ifdef CONFIG_NET_MULTI_QUEUE_DEVICE
> + if (netif_is_multiqueue(dev)) {
> + if (likely(q->ops->map_queue)) {
> + queue_index = q->ops->map_queue(skb, q);
> + } else {
> + printk(KERN_CRIT "Device %s is "
> + "multiqueue, but map_queue is "
> + "undefined in the qdisc!!\n",
> + dev->name);
> + goto requeue;
> + }

Yet another condition completely useless for every transmission.

> + spin_unlock(&dev->egress_subqueue[queue_index].queue_lock);
> + /* Check top level device, and any sub-device */
> + if ((!netif_queue_stopped(dev)) &&
> +   (!netif_subqueue_stopped(dev, queue_index))) {
> + int ret;
> + ret = dev->hard_start_subqueue_xmit(skb, dev, queue_index);
> + if (ret == NETDEV_TX_OK) {
> + if (!nolock) {
> + netif_tx_unlock(dev);
> + }
> + return -1;
> + }
> + if (ret == NETDEV_TX_LOCKED && nolock) {
> + spin_lock(&dev->egress_subqueue[queue_index].queue_lock);
> + goto collision;
> + }
> + }
> + /* NETDEV_TX_BUSY - we need to requeue */
> + /* Release the driver */
> + if (!nolock) {
> + netif_tx_unlock(dev);
> +  

Re: [PATCH 1/2] rcfs core patch

2007-03-09 Thread Herbert Poetzl
On Fri, Mar 09, 2007 at 12:07:27PM +0300, Kirill Korotaev wrote:
>>  nobody actually cares about a precise accounting and
>>  calculating shares or partitions of whatever resource,
>>  all that matters is that you have a way to prevent a
>>  potential hostile environment from sucking up all your
>>  resources (or even a single one) resulting in a DoS

> This is not true. People care. Reasons:
>   - resource planning
>   - fairness
>   - guarantees

let me make that a little more clear ...

_nobody_ cares whether a shared memory page is
accounted as full page or as fraction of a page
(depending on the number of guests sharing it)
as long as the accounted amount is subtracted
correctly when the page is disposed 

so there _is_ a difference between _false_
accounting (which seems to be what you are referring
to in the next paragraph) and imprecise, but
consistent accounting (which is what I was 
talking about)

best,
Herbert

>   What you talk is about security only. Not the above issues.
>   So good precision is required. If there is no precision at all,
>   security sucks as well and can be exploited, e.g. for CPU
>   schedulers doing an accounting based on jiffies accounting in
>   scheduler_tick() it is easy to build an application consuming
>   90% of CPU, but ~0% from scheduler POV.

> Kirill


Pluggable Schedulers (was: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler)

2007-03-09 Thread Al Boldi
William Lee Irwin III wrote:
> On Thu, Mar 08, 2007 at 10:31:48PM -0800, Linus Torvalds wrote:
> > No. Really.
> > I absolutely *detest* pluggable schedulers. They have a huge downside:
> > they allow people to think that it's ok to make special-case schedulers.
> > And I simply very fundamentally disagree.
> > If you want to play with a scheduler of your own, go wild. It's easy
> > (well, you'll find out that getting good results isn't, but that's a
> > different thing). But actual pluggable schedulers just cause people to
> > think that "oh, the scheduler performs badly under circumstance X, so
> > let's tell people to use special scheduler Y for that case".
> > And CPU scheduling really isn't that complicated. It's *way* simpler
> > than IO scheduling. There simply is *no*excuse* for not trying to do it
> > well enough for all cases, or for having special-case stuff.
> > But even IO scheduling actually ends up being largely the same. Yes, we
> > have pluggable schedulers, and we even allow switching them, but in the
> > end, we don't want people to actually do it. It's much better to have a
> > scheduler that is "good enough" than it is to have five that are
> > "perfect" for five particular cases.
>
> For the most part I was trying to assist development, but ran out of
> patience and interest before getting much of anywhere. The basic idea
> was to be able to fork over a kernel to a benchmark team and have them
> run head-to-head comparisons, switching schedulers on the fly,
> particularly on machines that took a very long time to boot. The
> concept ideally involved making observations and loading fresh
> schedulers based on them as kernel modules on the fly. I was more
> interested in rapid incremental changes than total rewrites, though I
> considered total rewrites to be tests of adequacy, since somewhere in
> the back of my mind I had thoughts about experimenting with gang
> scheduling policies on those machines taking very long times to boot.
>
> What actually got written, the result of it being picked up by others,
> and how it's getting used are all rather far from what I had in mind,
> not that I'm offended in the least by any of it. I also had little or
> no interest in mainline for it. The intention was more on the order of
> an elaborate instrumentation patch for systems where the time required
> to reboot is prohibitive and the duration of access strictly limited.
> (In fact, downward-revised estimates of the likelihood of such access
> also factored into the abandonment of the codebase.)
>
> I consider policy issues to be hopeless political quagmires and
> therefore stick to mechanism. So even though I may have started the
> code in question, I have little or nothing to say about that sort of
> use for it.
>
> There's my longwinded excuse for having originated that tidbit of code.

I've no idea what both of you are talking about.

How can giving people the freedom of choice be in any way counter-productive?


Thanks!

--
Al



Re: Experimental driver for Ricoh Bay1Controller SD Card readers

2007-03-09 Thread James

On 2/16/07, Ivan Babkin <[EMAIL PROTECTED]> wrote:

> Thanks for the job you've done!
> Your driver works with 1 Gb sd-card (x86_64 suse's 2.6.18.2 kernel).
> Read rate for me was around 250 Kb/s, write - 28 Kb/s (using dd utility).
> BTW, I get continuous flow of "sdricoh_cs: timeout waiting for data"
> messages in dmesg.


Works for me too. Using a 512mb SD card, and a 256miniSD in an adaptor.

Managed to mount both SD cards and play mp3's off each with mpg321.
Literally music to my ears. I used the 0.1 release on the homepage, so
it still oopsed on removal when I hadn't unmounted.

Laptop is an Acer Travelmate 370 for the record.

James

--
iphitus // Arch Developer // kernel26beyond // iphitus.loudas.com


Re: [PATCH 1/2] rcfs core patch

2007-03-09 Thread Herbert Poetzl
On Fri, Mar 09, 2007 at 12:23:55PM +0300, Kirill Korotaev wrote:
>>> There have been various projects attempting to provide
>>> resource management support in Linux, including 
>>> CKRM/Resource Groups and UBC.
>> 
>> let me note here, once again, that you forgot Linux-VServer
>> which does quite non-intrusive resource management ...

> Herbert, do you care to send patches rather than asking 
> others to do something that works for you?

sorry, I'm not in the lucky position that I get paid
for sending patches to LKML, so I have to think twice
before I invest time in coding up extra patches ...

i.e. you will have to live with my comments for now

> Looks like your main argument is non-intrusive...
> "working", "secure", "flexible" are not required by 
> people any more? :/

well, Linux-VServer is "working", "secure", "flexible"
_and_ non-intrusive ... it is quite natural that less
won't work for me ... and regarding patches, there
will be a 2.2 release soon, with all the patches ...

>>> Each had its own task-grouping mechanism. 

>> the basic 'context' (pid space) is the grouping mechanism
>> we use for resource management too

>>> Paul Menage observed [1] that cpusets in the kernel already has a
>>> grouping mechanism which was working well for cpusets. He went ahead
>>> and generalized the grouping code in cpusets so that it could be
>>> used for overall resource management purpose.

>>> With his patches, it is possible to even create multiple hierarchies
>>> of groups (see [2] on why multiple hierarchies) as follows:

>> do we need or even want that? IMHO the hierarchical
>> concept CKRM was designed with, was also the reason
>> for it being slow, unusable and complicated

> 1. cpusets are hierarchical already. So hierarchy is required.
> 2. As it was discussed on the call, controllers which are flat
>    can just prohibit creation of hierarchy on the filesystem.
>    i.e. allow only 1 depth and continue being fast.
> 
>>> mount -t container -o cpuset none /dev/cpuset <- cpuset hierarchy
>>> mount -t container -o mem,cpu none /dev/mem <- memory/cpu hierarchy
>>> mount -t container -o disk none /dev/disk   <- disk hierarchy
>>> 
>>> In each hierarchy, you can create task groups and manipulate the
>>> resource parameters of each group. You can also move tasks between
>>> groups at run-time (see [3] on why this is required). 

>>> Each hierarchy is also manipulated independent of the other.  

>>> Paul's patches also introduced a 'struct container' in the kernel,
>>> which serves these key purposes:
>>> 
>>> - Task-grouping
>>>   'struct container' represents a task-group created in each hierarchy.
>>>   So every directory created under /dev/cpuset or /dev/mem above will
>>>   have a corresponding 'struct container' inside the kernel. All tasks
>>>   pointing to the same 'struct container' are considered to be part of
>>>   a group
>>> 
>>>   The 'struct container' in turn has pointers to resource objects which
>>>   store actual resource parameters for that group. In above example,
>>>   'struct container' created under /dev/cpuset will have a pointer to
>>>   'struct cpuset' while 'struct container' created under /dev/disk will
>>>   have pointer to 'struct disk_quota_or_whatever'.
>>> 
>>> - Maintain hierarchical information
>>>   The 'struct container' also keeps track of hierarchical relationship
>>>   between groups.
>>> 
>>> The filesystem interface in the patches essentially serves these
>>> purposes:
>>> 
>>> - Provide an interface to manipulate task-groups. This includes
>>>   creating/deleting groups, listing tasks present in a group and 
>>>   moving tasks across groups
>>> 
>>> - Provides an interface to manipulate the resource objects
>>>   (limits etc) pointed to by 'struct container'.
>>> 
>>> As you know, the introduction of 'struct container' was objected
>>> to and was felt to be redundant as a means to group tasks. That's where I
>>> took a shot at converting over Paul Menage's patch to avoid 'struct
>>> container' abstraction and instead work with 'struct nsproxy'.

>> which IMHO isn't a step in the right direction, as
>> you will need to handle different nsproxies within
>> the same 'resource container' (see previous email)

> tend to agree.
> Looks like Paul's original patch was in the right way.

> [...]

>>> A separate filesystem would give us more flexibility like the
>>> implementing multi-hierarchy support described above.

>> why is the filesystem approach so favored for this
>> kind of manipulations?

>> IMHO it is one of the worst interfaces I can imagine
>> (to move tasks between spaces and/or assign resources)
>> but yes, I'm aware that filesystems are 'in' nowadays

> I also hate the filesystem approach being used everywhere nowadays.
> But, looks like there are reasons still:
> 1. cpusets already use fs interface.
> 2. each controller can have a bit of specific 
>    information/controls exported easily.

yes, but there are certain drawbacks too, like:

 - 

Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Pekka J Enberg
At some point in time, I wrote:
> > Btw, what we can do is delay closing the actual revoked file until the
> > task terminates. This has the unfortunate side-effect that a task has
> > no way of freeing the resources now. But, I am beginning to think it's
> > not a big problem because the inode mapping will be zapped immediately
> > upon revoke anyway...
 
On Fri, 9 Mar 2007, Alan Cox wrote:
> Actually you can't entirely do this. The revoke() definition states
> explicitly that the driver close occurs at the point of revoke() not
> later.
> 
> That can however be pushed into the device revoke method for the cases
> where it might matter (eg tty).

Yeah, you just make f_ops->revoke close the driver and f_ops->flush a 
no-op if the driver has already been closed.
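
A toy userspace model of that idea, just to make the ordering concrete. The
struct and function names are hypothetical stand-ins, not the revoke patch's
code and not real file_operations signatures:

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-file driver state; not a kernel structure. */
struct drv_file {
	bool driver_closed;	/* set once the driver close has run */
	int close_calls;	/* how many times the driver was really closed */
};

/* Close the driver at most once. */
void drv_close(struct drv_file *f)
{
	if (f->driver_closed)
		return;
	f->driver_closed = true;
	f->close_calls++;
}

/* Models f_ops->revoke: the driver close happens at the point of revoke(). */
void drv_revoke(struct drv_file *f)
{
	drv_close(f);
}

/* Models f_ops->flush: does nothing if revoke already closed the driver. */
void drv_flush(struct drv_file *f)
{
	drv_close(f);
}
```

Whichever of the two runs first does the real close, and the later one falls
through, so the task can still close its fd whenever it likes.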

Pekka


Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Alan Cox
> Btw, what we can do is delay closing the actual revoked file until the
> task terminates. This has the unfortunate side-effect that a task has
> no way of freeing the resources now. But, I am beginning to think it's
> not a big problem because the inode mapping will be zapped immediately
> upon revoke anyway...

Actually you can't entirely do this. The revoke() definition states
explicitly that the driver close occurs at the point of revoke() not
later.

That can however be pushed into the device revoke method for the cases
where it might matter (eg tty).

Alan


Re: [rfc][patch] futex: restartable futex_wait?

2007-03-09 Thread Thomas Gleixner
On Fri, 2007-03-09 at 13:24 +0100, Nick Piggin wrote:
> > > > 'time' here is relative, so the restarted syscall will do a /full/ wait 
> > > > again.
> > > 
> > > But it has been modified by schedule_timeout?
> > 
> > But this does not change the syscall registers, so it is restarted in
> > the same way. We need a new futex OP for this, which takes absolute time
> > like the PI futex op does.
> 
> Forgive me if I'm missing something here, but I'm using the restart block
> and saving the updated value of time in ->arg2, and using that as the new
> time parameter passed into futex_wait from futex_wait_restart.

Oops. I went into confusion mode. You are right, the restart block keeps
that.
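
For readers following along, the idea Nick describes can be sketched in
user space. This is only an illustration under stated assumptions, not the
futex code: struct restart_record, sketch_wait() and sketch_wait_restart()
are hypothetical names, and the "signal" is simulated by a parameter. What
it shows is that the restart path re-enters the wait with the saved
remainder rather than the original relative timeout.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's restart block: it records what the
 * restarted call needs, here only the time still left to wait. */
struct restart_record {
    uint64_t remaining_ticks;
    int pending;              /* nonzero when a restart is required */
};

#define WAIT_DONE     0
#define WAIT_RESTART  (-1)

/* Wait for up to `timeout` ticks. `interrupted_after` simulates a signal
 * arriving after that many ticks (pass a value >= timeout for no signal). */
static int sketch_wait(uint64_t timeout, uint64_t interrupted_after,
                       struct restart_record *rb)
{
    if (interrupted_after < timeout) {
        /* Save the *remaining* time, not the original relative timeout. */
        rb->remaining_ticks = timeout - interrupted_after;
        rb->pending = 1;
        return WAIT_RESTART;
    }
    rb->pending = 0;
    return WAIT_DONE;
}

/* The restart path re-enters the wait with the saved remainder. */
static int sketch_wait_restart(uint64_t interrupted_after,
                               struct restart_record *rb)
{
    return sketch_wait(rb->remaining_ticks, interrupted_after, rb);
}
```

With a 100-tick wait interrupted after 30 ticks, the record holds 70, and
the restarted call waits only for those 70 ticks.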

tglx




Re: [patch 2.6.20-1] radeonfb: Add support for Radeon xpress 200m

2007-03-09 Thread Benjamin Herrenschmidt

> - radeonfb_pm_init(rinfo, rinfo->is_mobility ? 1 : -1, ignore_devlist, force_sleep);
> + radeonfb_pm_init(rinfo, rinfo->is_mobility && rinfo->family != CHIP_FAMILY_RS480 ? 1 : -1, ignore_devlist, force_sleep);

I'd rather you add a check for RS480 inside radeonfb_pm_*

Ben.




Re: [PATCH] Fix atomicity of TIF update in flush_thread() for powerpc

2007-03-09 Thread Benjamin Herrenschmidt
 .../...

> Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

Acked-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>

Nice catch !

> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -476,8 +476,13 @@ void flush_thread(void)
>  #ifdef CONFIG_PPC64
>  	struct thread_info *t = current_thread_info();
>  
> -	if (t->flags & _TIF_ABI_PENDING)
> -		t->flags ^= (_TIF_ABI_PENDING | _TIF_32BIT);
> +	if (test_tsk_thread_flag(tsk, TIF_ABI_PENDING)) {
> +		clear_tsk_thread_flag(tsk, TIF_ABI_PENDING);
> +		if (test_tsk_thread_flag(tsk, TIF_32BIT))
> +			clear_tsk_thread_flag(tsk, TIF_32BIT);
> +		else
> +			set_tsk_thread_flag(tsk, TIF_32BIT);
> +	}
>  #endif
>  
>  	discard_lazy_cpu_state();
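
The point of the patch can be illustrated outside the kernel. A plain
"flags ^= mask" is a separate load, modify and store, so a bit set by
another context in between can be lost. The sketch below is a user-space
approximation using C11 atomics in place of the kernel's per-bit
thread-flag helpers; the SK_ names and bit positions are hypothetical, and
sketch_flush_flags() only mirrors the shape of the new code.

```c
#include <stdatomic.h>

#define SK_ABI_PENDING (1UL << 0)  /* hypothetical bit positions */
#define SK_32BIT       (1UL << 1)

/* Clear the pending bit and toggle the 32-bit bit, touching each bit with
 * its own atomic read-modify-write so unrelated bits are never rewritten
 * from a stale copy of the whole word. */
static void sketch_flush_flags(atomic_ulong *flags)
{
    if (atomic_load(flags) & SK_ABI_PENDING) {
        atomic_fetch_and(flags, ~SK_ABI_PENDING);
        if (atomic_load(flags) & SK_32BIT)
            atomic_fetch_and(flags, ~SK_32BIT);
        else
            atomic_fetch_or(flags, SK_32BIT);
    }
}
```

As in the patch, the test-then-modify sequence as a whole is not one atomic
step; what is atomic is each individual bit update, which is enough to keep
concurrent updates to other bits in the same word from being overwritten.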



Re: sys_write() racy for multi-threaded append?

2007-03-09 Thread Alan Cox
> 1003.1 unless O_NONBLOCK is set.  (Not that f_pos is interesting on a
> pipe except as a "bytes sent" indicator  -- and in the multi-threaded

f_pos is undefined on a FIFO or similar object.

> As to what a "sane app" has to do: it's just not that unusual to write
> application code that treats a short read/write as a catastrophic
> error, especially when the fd is of a type that is known never to
> produce a short read/write unless something is drastically wrong.  For

If you are working in a strictly POSIX environment then a signal can
interrupt almost any I/O as a short write, even disk I/O. In the sane
world the file I/O cases don't do this.
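
An application that wants to tolerate short writes instead of treating them
as fatal usually wraps write(2) in a loop. The following is a minimal
sketch, assuming a blocking descriptor; write_all() is a hypothetical
helper name, not an existing libc call.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling write() until all `len` bytes are out or a real error
 * occurs. Returns len on success, -1 on error (errno set by write()). */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;
        }
        p += n;             /* short write: advance and go around again */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

This does nothing for two threads writing through the same descriptor at
the same time; it only covers the single-writer short-write case.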

> as long as the fd doesn't get screwed up.  There is no reason for the
> generic sys_read code to leave a race open in which the same frame is
> read by both threads and a hardware buffer overrun results later.

Audio devices are not seekable anyway.

> concurrent reads and writes to arbitrary fd types.  I'm proposing that
> it not do something blatantly stupid and easily avoided in generic
> code that makes it impossible for any fd type to guarantee that, after
> 10 successful pipelined 100-byte reads or writes, f_pos will have
> advanced by 1000.

You might want to read up on the Unix design philosophy. Things like
record based I/O are left to user space to avoid kernel complexity and also so
that the overhead of these things is paid only by those who need them
(it's kind of RISC for OS design).

Alan


Re: [RFC] [Patch 1/1] IBAC Patch

2007-03-09 Thread Mimi Zohar
On Thu, 2007-03-08 at 15:08 -0800, Randy Dunlap wrote:
> On Thu, 08 Mar 2007 17:58:16 -0500 Mimi Zohar wrote:
> 
> > This is a request for comments for a new Integrity Based Access
> > Control(IBAC) LSM module which bases access control decisions
> > on the new integrity framework services. 
> > 
> > (Hopefully this will help clarify the interaction between an LSM 
> > module and LIM module.)
> > 
> > Index: linux-2.6.21-rc3-mm2/security/ibac/Kconfig
> > ===
> > --- /dev/null
> > +++ linux-2.6.21-rc3-mm2/security/ibac/Kconfig
> > @@ -0,0 +1,36 @@
> > +config SECURITY_IBAC
> > +   boolean "IBAC support"
> > +   depends on SECURITY && SECURITY_NETWORK && INTEGRITY
> > +   help
> > + Integrity Based Access Control(IBAC) implements integrity
> > + based access control.
> 
> Please make the help text do more than repeat the words I B A C...
> Put a short explanation or say something like:
> See Documentation/security/foobar.txt for more information.
> (and add that file)

Agreed.  Perhaps something like:

Integrity Based Access Control (IBAC) uses the Linux Integrity
Module (LIM) API calls to verify the integrity of an executable's
metadata and data.  Based on the results, execution is permitted
or denied.  Integrity providers may implement the LIM hooks
differently.  For more information on integrity verification,
refer to the specific integrity provider documentation.

> > +config SECURITY_IBAC_BOOTPARAM
> > +   bool "IBAC boot parameter"
> > +   depends on SECURITY_IBAC
> > +   default y
> > +   help
> > + This option adds a kernel parameter 'ibac', which allows IBAC
> > + to be disabled at boot.  If this option is selected, IBAC
> > + functionality can be disabled with ibac=0 on the kernel
> > + command line.  The purpose of this option is to allow a
> > + single kernel image to be distributed with IBAC built in,
> > + but not necessarily enabled.
> > +
> > + If you are unsure how to answer this question, answer N.
> 
> What's the downside to having this always builtin instead of
> yet another config option?

The ability to change LSM modules at runtime might be perceived
as problematic.

> > +static struct security_operations ibac_security_ops = {
> > +   .bprm_check_security = ibac_bprm_check_security
> > +};
> > +
> > +static int __init init_ibac(void)
> > +{
> > +	int rc;
> > +
> > +	if (!ibac_enabled)
> > +		return 0;
> > +
> > +	rc = register_security(&ibac_security_ops);
> > +	if (rc != 0)
> > +		panic("IBAC: Unable to register with kernel\n");
> 
> Normally we would not want to see a panic() from a register_xyz()
> failure, but I guess you are arguing that an ibac register_security()
> failure needs to halt everything??

Yes, as this implies that another LSM module registered the hooks first,
preventing IBAC from registering itself. 

Thank you for your other comments.  They'll be addressed in the next
ibac patch release.

Mimi Zohar



Re: [PATCH] pata_cmd640: Multiple updates

2007-03-09 Thread Jeff Garzik

Alan Cox wrote:

Add suspend/resume support
Write 0x5B to 0 not 0x5C

The former is important as we must kill the FIFO on a resume

Signed-off-by: Alan Cox <[EMAIL PROTECTED]>


applied




Re: [BUG] ATAPI command TEST_UNIT_READY never succeeds!

2007-03-09 Thread Tejun Heo
Conke Hu wrote:
> On 3/7/07, Tejun Heo <[EMAIL PROTECTED]> wrote:
>> (snip)
>> I've read your last posting about this, but forgot to follow up.  TUR is
>> supposed to fail if ATAPI device doesn't have media loaded.  TUR fails
>> and sense data returns device not ready - media not present.  That's the
>> normal operation.  Does TUR fail even with media loaded or is sense data
>> not properly fetched?
>>
> 
> Thank you, Tejun! I did forget to load any media :(
> When I load media and retry, TUR succeeds but there is still a
> problem: when using ahci driver, TUR will not succeed unless it runs
> twice, and the following loop always runs till retries==2
> ---- code in sr.c, get_capabilities() ----
> 	retries = 0;
> 	do {
> 		memset((void *)cmd, 0, MAX_COMMAND_SIZE);
> 		cmd[0] = TEST_UNIT_READY;
> 
> 		the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
> 					       0, &sshdr, SR_TIMEOUT,
> 					       MAX_RETRIES);
> 
> 		retries++;
> 	} while (retries < 5 &&
> 		 (!scsi_status_is_good(the_result) ||
> 		  (scsi_sense_valid(&sshdr) &&
> 		   sshdr.sense_key == UNIT_ATTENTION)));
> ---- end code ----
> 
> this issue only occurs in ahci driver,  and libata + pata driver is OK.

After media is loaded, it takes some time for the device to get ready,
and it's supposed to raise UNIT_ATTENTION to tell the driver that media
has changed.  TUR fails with UA and sense data reports media exchanged
and the next TUR succeeds.  In SATA, you might also see unit reset UA
due to the probing sequence.  Those are why TUR is tried several times
in the first place.

So, there's nothing wrong with failing TURs, it's perfectly normal that
they fail the first one or two tries.
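
To make the retry behaviour concrete, here is a small user-space model of
the loop. It is a sketch under assumptions, not driver code: the sk_ types,
sk_wait_ready() and the fake device are all hypothetical, and the device
simply reports UNIT ATTENTION a fixed number of times before becoming
ready.

```c
#include <stdbool.h>

enum sk_sense { SK_NO_SENSE, SK_NOT_READY, SK_UNIT_ATTENTION };

struct sk_result {
    bool good;              /* command completed with good status */
    enum sk_sense sense;    /* sense key reported with the result */
};

typedef struct sk_result (*sk_tur_fn)(void *dev);

/* Issue TUR up to max_tries times. Returns the number of attempts it took
 * to get a good result without UNIT ATTENTION, or -1 if that never
 * happened. */
static int sk_wait_ready(sk_tur_fn tur, void *dev, int max_tries)
{
    for (int tries = 1; tries <= max_tries; tries++) {
        struct sk_result r = tur(dev);
        if (r.good && r.sense != SK_UNIT_ATTENTION)
            return tries;
    }
    return -1;
}

/* Fake device: reports UNIT ATTENTION pending_ua times, then ready. */
struct sk_fake_dev { int pending_ua; };

static struct sk_result sk_fake_tur(void *dev)
{
    struct sk_fake_dev *d = dev;

    if (d->pending_ua > 0) {
        d->pending_ua--;
        return (struct sk_result){ false, SK_UNIT_ATTENTION };
    }
    return (struct sk_result){ true, SK_NO_SENSE };
}
```

A device with one pending unit attention (media changed) is ready on the
second attempt, which matches the "runs till retries==2" observation.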

-- 
tejun


Re: 2.6.20-rc4-mm1: PCI=n: drivers/net/3c59x.c compile error

2007-03-09 Thread Tejun Heo
Hello,

Randy Dunlap wrote:
>> Erm, before I do that, could somebody explain what
>>
>> #define HAVE_PCI_REQ_REGIONS 2
>>
>> accompanying their declaration is for? I haven't found any references to it
>> in the source. Should I duplicate it for CONFIG_PCI=n case (I guess not)?
> 
> I wouldn't since it's not used anywhere, but maybe Tejun could comment
> on it...

This is the first time I see that macro.  There is no user in the whole
source.  I think the best way is to just kill it.

Thanks.

-- 
tejun


Re: any thoughts yet on a "generic" ioctl.h?

2007-03-09 Thread Robert P. J. Day
On Fri, 9 Mar 2007, Stefan Richter wrote:

> Robert P. J. Day wrote:
> > if someone can't immediately see what i'm trying
> > to do given the previously-posted patch, then they shouldn't be
> > commenting on it one way or the other.
>
> I'm not sure if you are addressing me too.  Just to clarify:  I
> wasn't commenting on the patch, I only commented on what I quoted in
> my reply.

sorry, i worded that *really* badly.  i didn't mean to imply that
*you* were incapable of understanding what the patch represented --
i've seen enough of your posts to appreciate your technical expertise.

all i want to know is if the proposed patch making
include/asm-generic/ioctl.h more flexible is even *theoretically* a
feasible thing to do, or whether anyone on this list would have any
howling objections to it.

not only that, but i would *prefer* to submit just that file as a
first patch all by itself since, again theoretically, it shouldn't
break anything and i'd like to verify that first before trying to
simplify any of the arch-specific ioctl.h files one at a time.

as it is, the arch-specific ioctl.h files that could potentially be
made *much* shorter are those for the arches:

  mips
  parisc
  alpha
  sparc
  sparc64
  powerpc

all the rest simply include asm-generic/ioctl.h directly.

rday

p.s.  for those who may have come in late, the proposed new
asm-generic/ioctl.h file would allow an includer to override any or
all of:

  _IOC_SIZEBITS
  _IOC_DIRBITS
  _IOC_NONE
  _IOC_WRITE
  _IOC_READ

AFAICT, those are the only differences across the entire spectrum of
ioctl.h files.  if there's something i've missed, feel free to let me
know.
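
The override scheme described above can be sketched with plain #ifndef
guards. This is an illustration only, not the posted patch: the SK_IOC_
names stand in for the real _IOC_ macros so the fragment compiles on its
own, and the "arch" values are chosen for illustration.

```c
/* "arch" part: define only what differs, before the generic defaults. */
#define SK_IOC_SIZEBITS 13
#define SK_IOC_DIRBITS  3

/* "generic" part: each default applies only if nothing overrode it. */
#define SK_IOC_NRBITS   8
#define SK_IOC_TYPEBITS 8

#ifndef SK_IOC_SIZEBITS
# define SK_IOC_SIZEBITS 14
#endif
#ifndef SK_IOC_DIRBITS
# define SK_IOC_DIRBITS 2
#endif
#ifndef SK_IOC_NONE
# define SK_IOC_NONE 0U
#endif
#ifndef SK_IOC_WRITE
# define SK_IOC_WRITE 1U
#endif
#ifndef SK_IOC_READ
# define SK_IOC_READ 2U
#endif

/* Everything below is derived, so it follows the overrides automatically. */
#define SK_IOC_NRSHIFT   0
#define SK_IOC_TYPESHIFT (SK_IOC_NRSHIFT + SK_IOC_NRBITS)
#define SK_IOC_SIZESHIFT (SK_IOC_TYPESHIFT + SK_IOC_TYPEBITS)
#define SK_IOC_DIRSHIFT  (SK_IOC_SIZESHIFT + SK_IOC_SIZEBITS)

#define SK_IOC(dir, type, nr, size)                  \
    (((unsigned int)(dir)  << SK_IOC_DIRSHIFT)  |    \
     ((unsigned int)(type) << SK_IOC_TYPESHIFT) |    \
     ((unsigned int)(nr)   << SK_IOC_NRSHIFT)   |    \
     ((unsigned int)(size) << SK_IOC_SIZESHIFT))
```

With the two overrides in place the direction field starts at bit 29; drop
them and the generic 14/2 split puts it at bit 30, with no other line
changing.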

--

Robert P. J. Day Linux Consulting, Training and Annoying Kernel
Pedantry Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page



Re: [patch 2/3] fs: introduce perform_write aop

2007-03-09 Thread Nick Piggin
Hi Christoph,

On Fri, Mar 09, 2007 at 10:39:13AM +, Christoph Hellwig wrote:
> Hi Nick,
> 
> sorry for my late reply, this has been on my to-answer list for the last
> month and I only managed to get back to it now.

No worries, I haven't had much time to work on it since then anyway.
Thanks for taking a look.

> On Thu, Feb 08, 2007 at 02:07:36PM +0100, Nick Piggin wrote:
> > as a single call to copy a given amount of userdata at the given offset.
> > This is more flexible, because the implementation can determine how to
> > best handle errors, or multi-page ranges (eg. it may use a gang lookup),
> > and only requires one call into the fs.
> 
> I really like this idea, especially for avoiding calls into the allocator
> for every block.  Have you asked the reiser4 folks whether this would
> supersede their batch_write op completely?

I haven't yet, although that's been on my todo list when I get the API
into a more final state.

batch_write seems quite similar, however theirs is still page based, and
a bit crufty, IMO. I found it to be really clean to just pass down offsets,
but that may be a matter for debate.

What they _do_ have is a write actor function that will do the data copy.
This could be one possible way to get rid of ->prepare_write and
->commit_write, but I haven't tried that yet, because I don't like adding
more redirection and complexity if possible...

> > One problem with this interface is that it cannot be used to write into the
> > filesystem by any means other than already-initialised buffers via iovecs.
> > So prepare/commit have to stay around for non-user data...
> 
> Actually I think that's a good thing to a certain extent.  It reminds
> us that all other users are horrible abuse of the interface.  I'd even
> go so far as to make batch_write a callback that the filesystem passes
> to generic_file_aio_write to make clear it's not a generic thing but
> a helper.  (It's not a generic thing because it's the upper layer writing
> into the pagecache, not a pagecache to fs below operation).

OK, if you think that's reasonable, then that is one hurdle out of the way ;)

> That still leaves open how to get rid of ->prepare_write and ->commit_write
> completely, and for that we'll probably need ->kernel_read and ->kernel_write
> file operations.  But that's a step you shouldn't consider yet when doing
> this work.

I had a couple of possibilities for that. First is passing in a write actor
(eg. defaulting to the normal iovec usercopy), but as I said I consider this
more like fixing the problem with brute force (ie. just making the interface
more complex). Maybe as a last resort, though.

Another thing that would be much nicer from _my_ point of view would be to
just make all kernel users set up their data in an iovec, and use the normal
call with KERNEL_DS. Unfortunately, this is not the expected way for a lot
of code to work, and it might require extra copying of the data.


> > Another thing is that it seems to be less able to be implemented in generic,
> > reusable code. It should be possible to introduce a new 2-op interface (or
> > maybe just a new error handler op) which can be used correctly in generic
> > code.
> 
> We should be able to find a nice abstraction for this, see my next mails.
> 
> > +   /*
> > +* perform_write replaces prepare and commit_write callbacks.
> > +*/
> 
> This is a rather useless comment :)  Better remove it and add proper
> descriptions to Documentation/filesystems/vfs.txt and
> Documentation/filesystems/Locking

Will do. Thanks!



Re: [PATCH] sata_nv: revert use of notifiers for now

2007-03-09 Thread Jeff Garzik

Robert Hancock wrote:

Commit 721449bf0d51213fe3abf0ac3e3561ef9ea7827a added support for using the
ADMA notifier bits to determine which commands to check for completion.
However there have been reports that this causes command timeouts in certain
cases. This is still being investigated. In addition, apparently the notifiers
won't work if ADMA is disabled on the other port as a result of an ATAPI device
being connected, and we don't handle this case properly.

For now, just restore the previous behavior of checking all active commands
to see if they are complete, without relying on the notifiers.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>


applied to #upstream-fixes




Re: [PATCH] devres: release resources on device_del()

2007-03-09 Thread Jeff Garzik

Tejun Heo wrote:

Some platform devices are driven without driver attached, so managed
resources can be acquired without driver attached.  Make sure such
resources are released by calling devres_release_all() in
device_del().

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
This one fixes oops on pata_platform and pata_legacy unload.  libata
being the only user of devres at the moment.  I think this can go
through libata-dev#upstream.


applied to #upstream-fixes




Re: [PATA] Failed to set xfermode on LITE-ON LTR-48246S

2007-03-09 Thread Tejun Heo
Philipp Matthias Hahn wrote:
> Hello Tejun!
> 
> On Tue, Mar 06, 2007 at 12:46:07AM +0900, Tejun Heo wrote:
>> Philipp Matthias Hahn wrote:
>>> On Mon, Mar 05, 2007 at 01:10:10PM +0900, Tejun Heo wrote:
>>>> * Does applying the attached patch over unpatched 2.6.20.1 fix the problem?
>>> Yes, it seems to fix it. (tested on top of the first patch mentioned
>>> above, but without the second one.)
>>> If I should test it as the sole patch, just mail me again please.
>> Yeap, please test it on top of vanilla kernel to verify that the patch
>> properly fixes the problem.
> 
> Yes, applying only this patch does fix the problem.
> Thank you for your support.

Okay, it seems we'll have to default to polling SETXFER.  I'll forward
it upstream soon.  Thanks.

-- 
tejun


Re: [git patches] libata fixes

2007-03-09 Thread Tejun Heo
Paul Rolland wrote:
> Hello Tejun,
> 
> I've boot-tested this yesterday, with no real luck... 
> 
> 1 - Tested on top of 2.6.21-rc2 (hope it's fine for you),
> 2 - Collected a full dmesg before and after
> 
> Extract is :
> ata7: PATA max UDMA/100 cmd 0x00019c00 ctl 0x00019882 bmdma 0x00019400 irq 16
> ata8: PATA max UDMA/100 cmd 0x00019800 ctl 0x00019482 bmdma 0x00019408 irq 16
> scsi6 : pata_jmicron
> ata7.00: ATAPI, max UDMA/66
> ata7.00: configured for UDMA/66
> scsi7 : pata_jmicron
> ATA: abnormal status 0x7F on port 0x00019807
> ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata7.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x12 data 36 in
>  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

It seems like IRQ is not getting through.  The first IRQ driven command
is failing for you.

* Does giving 'acpi=off' or 'irqpoll' make any difference?

* Can you connect a harddisk to the channel and see whether that works?

Thanks.

-- 
tejun


Re: any thoughts yet on a "generic" ioctl.h?

2007-03-09 Thread Stefan Richter
Robert P. J. Day wrote:
> if someone can't immediately see what i'm trying
> to do given the previously-posted patch, then they shouldn't be
> commenting on it one way or the other.

I'm not sure if you are addressing me too.  Just to clarify:  I wasn't
commenting on the patch, I only commented on what I quoted in my reply.
-- 
Stefan Richter
-=-=-=== --== --===
http://arcgraph.de/sr/


Re: [PATCH] revoke: delayed file closing

2007-03-09 Thread Pekka Enberg

On 3/9/07, Pekka J Enberg <[EMAIL PROTECTED]> wrote:

To fix this, change sys_revoke() not to close the actual revoked file
immediately. Instead, we do filp_close() when the user does close(2)
on the revoked file descriptor.


Btw, this is safe because a filesystem implementing f_ops->revoke must
ensure there are no pending writes after it has completed so
f_ops->flush will not do any actual flushing when it is invoked by a
delayed close.


Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-09 Thread Heiko Carstens
On Wed, Mar 07, 2007 at 09:07:17PM +, Pavel Machek wrote:
> Hi!
> 
> > > Prevent the WARN_ON() in arch/x86_64/kernel/acpi/sleep.c:init_low_mapping()
> > > from triggering by disabling nonboot CPUs before we finally enter the
> > > platform suspend.
> > 
> > Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > ---
> >  kernel/power/disk.c |1 +
> >  kernel/power/user.c |2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > ===
> > --- linux-2.6.21-rc2-mm2.orig/kernel/power/disk.c
> > +++ linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > @@ -61,6 +61,7 @@ static void power_down(suspend_disk_meth
> > >  	switch(mode) {
> > >  	case PM_DISK_PLATFORM:
> > >  		if (pm_ops && pm_ops->enter) {
> > > +			disable_nonboot_cpus();
> > >  			kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > >  			pm_ops->enter(PM_SUSPEND_DISK);
> > >  			break;
> 
> ...so, if pm_ops is non-null, power_down does nonboot cpu disabling,
> otherwise we proceed with cpus enabled?
> 
> That looks ugly.
> 
> Is the warning bogus? Or maybe we should *always* disable nonboot cpus
> in powerdown path?

Is disable_nonboot_cpus() assuming that first_cpu(cpu_present_map) is
the boot cpu? Just wondering why disable_nonboot_cpus() isn't using just
any_online_cpu(cpu_online_map)...


[PATCH] revoke: delayed file closing

2007-03-09 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

As explained by Eric Dumazet, one of the interests of fget_light() is
to avoid dirtying struct file which is broken by the newly added
file->f_light. In addition, fget_light() currently has a race window
between fcheck_files() and set_f_light().

To fix this, change sys_revoke() not to close the actual revoked file
immediately. Instead, we do filp_close() when the user does close(2)
on the revoked file descriptor.

Cc: Eric Dumazet <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 fs/file_table.c  |1 
 fs/revoke.c  |   53 +++
 fs/revoked_inode.c   |3 +-
 include/linux/file.h |   13 --
 include/linux/fs.h   |2 -
 include/linux/revoked_fs_i.h |   20 
 6 files changed, 26 insertions(+), 66 deletions(-)

Index: uml-2.6/fs/file_table.c
===
--- uml-2.6.orig/fs/file_table.c2007-03-09 14:06:48.0 +0200
+++ uml-2.6/fs/file_table.c 2007-03-09 14:06:53.0 +0200
@@ -219,7 +219,6 @@
 	*fput_needed = 0;
 	if (likely((atomic_read(&files->count) == 1))) {
 		file = fcheck_files(files, fd);
-		set_f_light(file);
 	} else {
 		rcu_read_lock();
 		file = fcheck_files(files, fd);
Index: uml-2.6/fs/revoke.c
===
--- uml-2.6.orig/fs/revoke.c2007-03-09 14:00:02.0 +0200
+++ uml-2.6/fs/revoke.c 2007-03-09 14:18:15.0 +0200
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * This is used for pre-allocating an array of file pointers so that we don't
@@ -33,20 +34,6 @@
  */
 static struct vfsmount *revokefs_mnt;
 
-struct revokefs_inode_info {
-   struct task_struct *owner;
-   struct file *file;
-   unsigned int fd;
-   struct inode vfs_inode;
-};
-
-static inline struct revokefs_inode_info *revokefs_i(struct inode *inode)
-{
-   return container_of(inode, struct revokefs_inode_info, vfs_inode);
-}
-
-extern void make_revoked_inode(struct inode *, int);
-
 static struct file *get_revoked_file(void)
 {
struct dentry *dentry;
@@ -235,24 +222,6 @@
return err;
 }
 
-static int task_filp_close(struct task_struct *task, struct file *filp)
-{
-	struct files_struct *files;
-	int err = 0;
-
-	files = get_files_struct(task);
-	if (files) {
-		/*
-		 * Wait until sys_read and sys_write are done.
-		 */
-		while (filp->f_light)
-			schedule();
-		err = filp_close(filp, files);
-		put_files_struct(files);
-	}
-	return err;
-}
-
-
 static void restore_file(struct revokefs_inode_info *info)
 {
struct files_struct *files;
@@ -293,19 +262,6 @@
}
 }
 
-static int revoke_file(struct task_struct *task, struct file *filp)
-{
-	int err;
-
-	err = filp->f_op->revoke(filp);
-	if (err)
-		goto out;
-
-	err = task_filp_close(task, filp);
-  out:
-	return err;
-}
-
-
 static int revoke_files(struct revoke_table *table)
 {
unsigned long i;
@@ -313,7 +269,7 @@
 
 	for (i = 0; i < table->end; i++) {
 		struct revokefs_inode_info *info;
-		struct file *this;
+		struct file *this, *filp;
 
 		this = table->files[i];
 		info = revokefs_i(this->f_dentry->d_inode);
@@ -323,7 +279,8 @@
 		 * an partially closed file can no longer be restored.
 		 */
 		table->restore_start++;
-		err = revoke_file(info->owner, info->file);
+		filp = info->file;
+		err = filp->f_op->revoke(filp);
 		put_task_struct(info->owner);
 		info->owner = NULL; /* To avoid restoring closed file. */
 		if (err)
@@ -565,8 +522,6 @@
kmem_cache_free(revokefs_inode_cache, revokefs_i(inode));
 }
 
-#define REVOKEFS_MAGIC 0x5245564B  /* REVK */
-
 static struct super_operations revokefs_super_ops = {
.alloc_inode = revokefs_alloc_inode,
.destroy_inode = revokefs_destroy_inode,
Index: uml-2.6/fs/revoked_inode.c
===
--- uml-2.6.orig/fs/revoked_inode.c 2007-03-09 14:03:58.0 +0200
+++ uml-2.6/fs/revoked_inode.c  2007-03-09 14:05:21.0 +0200
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static loff_t revoked_file_llseek(struct file *file, loff_t offset, int origin)
 {
@@ -96,7 +97,7 @@
 
 static int revoked_file_flush(struct file *file, fl_owner_t id)
 {
-   return 0;
+   return filp_close(file, id);
 }
 
 static int revoked_file_release(struct inode *inode, struct file *filp)
Index: uml-2.6/include/linux/file.h

Re: [rfc][patch] futex: restartable futex_wait?

2007-03-09 Thread Nick Piggin
On Fri, Mar 09, 2007 at 10:38:35AM +0100, Thomas Gleixner wrote:
> On Fri, 2007-03-09 at 06:10 +0100, Nick Piggin wrote:
> > > i think that's quite right. I'm wondering why this never came up before? 
> > > But your fix is not complete i think:
> > > 
> > > > + restart->arg2 = time;
> > > > + return -ERESTART_RESTARTBLOCK;
> > > > + }
> > > 
> > > 'time' here is relative, so the restarted syscall will do a /full/
> > > wait again.
> > 
> > But it has been modified by schedule_timeout?
> 
> But this does not change the syscall registers, so it is restarted in
> the same way. We need a new futex OP for this, which takes absolute time
> like the PI futex op does.

Forgive me if I'm missing something here, but I'm using the restart block
and saving the updated value of time in ->arg2, and using that as the new
time parameter passed into futex_wait from futex_wait_restart.



Re: [BUG] ATAPI command TEST_UNIT_READY never succeeds!

2007-03-09 Thread Conke Hu

On 3/7/07, Tejun Heo <[EMAIL PROTECTED]> wrote:

(snip)
I've read your last posting about this, but forgot to follow up.  TUR is
supposed to fail if ATAPI device doesn't have media loaded.  TUR fails
and sense data returns device not ready - media not present.  That's the
normal operation.  Does TUR fail even with media loaded or is sense data
not properly fetched?



Thank you, Tejun! I did forget to load any media :(
When I load media and retry, TUR succeeds but there is still a
problem: when using ahci driver, TUR will not succeed unless it runs
twice, and the following loop always runs till retries==2
---- code in sr.c, get_capabilities() ----
	retries = 0;
	do {
		memset((void *)cmd, 0, MAX_COMMAND_SIZE);
		cmd[0] = TEST_UNIT_READY;

		the_result = scsi_execute_req (cd->device, cmd, DMA_NONE, NULL,
					       0, &sshdr, SR_TIMEOUT,
					       MAX_RETRIES);

		retries++;
	} while (retries < 5 &&
		 (!scsi_status_is_good(the_result) ||
		  (scsi_sense_valid(&sshdr) &&
		   sshdr.sense_key == UNIT_ATTENTION)));
---- end code ----

this issue only occurs in ahci driver,  and libata + pata driver is OK.

Conke


Re: sys_write() racy for multi-threaded append?

2007-03-09 Thread Michael K. Edwards

On 3/8/07, Benjamin LaHaise <[EMAIL PROTECTED]> wrote:

Any number of things can cause a short write to occur, and rewinding the
file position after the fact is just as bad.  A sane app has to either
serialise the writes itself or use a thread safe API like pwrite().


Not on a pipe/FIFO.  Short writes there are flat out verboten by
1003.1 unless O_NONBLOCK is set.  (Not that f_pos is interesting on a
pipe except as a "bytes sent" indicator  -- and in the multi-threaded
scenario, if you do the speculative update that I'm suggesting, you
can't 100% trust it unless you ensure that you are not in
mid-read/write in some other thread at the moment you sample f_pos.
But that doesn't make it useless.)

As to what a "sane app" has to do: it's just not that unusual to write
application code that treats a short read/write as a catastrophic
error, especially when the fd is of a type that is known never to
produce a short read/write unless something is drastically wrong.  For
instance, I bomb on short write in audio applications where the driver
is known to block until enough bytes have been read/written, period.
When switching from reading a stream of audio frames from thread A to
reading them from thread B, I may be willing to omit app
serialization, because I can tolerate an imperfect hand-off in which
thread A steals one last frame after thread B has started reading --
as long as the fd doesn't get screwed up.  There is no reason for the
generic sys_read code to leave a race open in which the same frame is
read by both threads and a hardware buffer overrun results later.

In short, I'm not proposing that the kernel perfectly serialize
concurrent reads and writes to arbitrary fd types.  I'm proposing that
it not do something blatantly stupid and easily avoided in generic
code that makes it impossible for any fd type to guarantee that, after
10 successful pipelined 100-byte reads or writes, f_pos will have
advanced by 1000.
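
For contrast, here is a rough userspace sketch of the pwrite() route mentioned in the quote above. Each writer names its own offset, so the shared f_pos is never read or updated. The helper names (write_frame_at, demo_pwrite_total) and the 10 x 100-byte layout are made up for illustration; this only works on seekable files, not on the pipe/FIFO and audio-device cases discussed here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write frame number idx (frame_len bytes) at offset idx * frame_len.
 * Short writes are retried at the remaining offset.
 * Returns 0 on success, -1 on error.
 */
static int write_frame_at(int fd, size_t idx, const char *frame, size_t frame_len)
{
    off_t base = (off_t)(idx * frame_len);
    size_t done = 0;

    assert(frame_len > 0);
    while (done < frame_len) {
        ssize_t n = pwrite(fd, frame + done, frame_len - done,
                           base + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/*
 * Write ten 100-byte frames to a scratch file and return the resulting
 * file size, or -1 on error.  Also checks that the file offset is still
 * 0 afterwards, since pwrite() does not move it.
 */
static long demo_pwrite_total(void)
{
    char frame[100];
    struct stat st;
    FILE *fp = tmpfile();
    long result = -1;
    size_t i;
    int fd;

    if (!fp)
        return -1;
    fd = fileno(fp);
    memset(frame, 'x', sizeof(frame));
    for (i = 0; i < 10; i++)
        if (write_frame_at(fd, i, frame, sizeof(frame)) != 0)
            goto out;
    if (lseek(fd, 0, SEEK_CUR) == 0 && fstat(fd, &st) == 0)
        result = (long)st.st_size;
out:
    fclose(fp);
    return result;
}
```

The calls to write_frame_at() could come from any thread in any order; the result does not depend on f_pos at all, which is exactly why it cannot be used where position is implicit.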

Cheers,
- Michael


Re: [PATCH] revoke: consolidate revoked file operations

2007-03-09 Thread Alan Cox
On Fri, Mar 09, 2007 at 01:55:06PM +0200, Pekka J Enberg wrote:
> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> Return EBADF for all revoked file operations except for read(2) which
> returns zero for special files as the BSDs do and close(2) which is 
> always zero.
> 
> Cc: Alan Cox <[EMAIL PROTECTED]>
> Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>

Acked-by: Alan Cox <[EMAIL PROTECTED]>


Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Pekka Enberg

On 3/9/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:

> Then just drop the fget_light() 'optimisation' and always take a reference
> (atomic on f_count) regardless of single-thread or not. Instead of dirtying
> f_light, just do the straightforward thing and be done with it.


On 3/9/07, Pekka Enberg <[EMAIL PROTECTED]> wrote:

That's what I did first but akpm thought it was "unfortunate." Hmm.. ;-)


Btw, what we can do is delay closing the actual revoked file until the
task terminates. This has the unfortunate side-effect that a task has
no way of freeing the resources now. But, I am beginning to think it's
not a big problem because the inode mapping will be zapped immediately
upon revoke anyway...


Re: [PATCH 2/2] xfs: stop using kmalloc in xfs_buf_get_noaddr

2007-03-09 Thread Christoph Hellwig
Ed Cashin found a bug in the error handling code for the case where
a page allocation fails.  Here's the updated version:

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c   2007-03-08 19:08:38.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c        2007-03-09 08:59:15.0 +0100
@@ -314,7 +314,7 @@
 
ASSERT(list_empty(&bp->b_hash_list));
 
-   if (bp->b_flags & _XBF_PAGE_CACHE) {
+   if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
uint i;
 
if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
@@ -323,18 +323,11 @@
for (i = 0; i < bp->b_page_count; i++) {
struct page *page = bp->b_pages[i];
 
-   ASSERT(!PagePrivate(page));
+   if (bp->b_flags & _XBF_PAGE_CACHE)
+   ASSERT(!PagePrivate(page));
page_cache_release(page);
}
_xfs_buf_free_pages(bp);
-   } else if (bp->b_flags & _XBF_KMEM_ALLOC) {
-/*
- * XXX(hch): bp->b_count_desired might be incorrect (see
- * xfs_buf_associate_memory for details), but fortunately
- * the Linux version of kmem_free ignores the len argument..
- */
-   kmem_free(bp->b_addr, bp->b_count_desired);
-   _xfs_buf_free_pages(bp);
}
 
xfs_buf_deallocate(bp);
@@ -764,41 +757,41 @@
size_t  len,
xfs_buftarg_t   *target)
 {
-   size_t  malloc_len = len;
+   unsigned long   page_count = PAGE_ALIGN(len) >> PAGE_SHIFT;
+   int error, i;
xfs_buf_t   *bp;
-   void    *data;
-   int error;
 
bp = xfs_buf_allocate(0);
if (unlikely(bp == NULL))
goto fail;
_xfs_buf_initialize(bp, target, 0, len, 0);
 
- try_again:
-   data = kmem_alloc(malloc_len, KM_SLEEP | KM_MAYFAIL | KM_LARGE);
-   if (unlikely(data == NULL))
+   error = _xfs_buf_get_pages(bp, page_count, 0);
+   if (error)
goto fail_free_buf;
 
-   /* check whether alignment matches.. */
-   if ((__psunsigned_t)data !=
-   ((__psunsigned_t)data & ~target->bt_smask)) {
-   /* .. else double the size and try again */
-   kmem_free(data, malloc_len);
-   malloc_len <<= 1;
-   goto try_again;
-   }
-
-   error = xfs_buf_associate_memory(bp, data, len);
-   if (error)
+   for (i = 0; i < page_count; i++) {
+   bp->b_pages[i] = alloc_page(GFP_KERNEL);
+   if (!bp->b_pages[i])
+   goto fail_free_mem;
+   }
+   bp->b_flags |= _XBF_PAGES;
+
+   error = _xfs_buf_map_pages(bp, XBF_MAPPED);
+   if (unlikely(error)) {
+   printk(KERN_WARNING "%s: failed to map pages\n",
+   __FUNCTION__);
goto fail_free_mem;
-   bp->b_flags |= _XBF_KMEM_ALLOC;
+   }
 
xfs_buf_unlock(bp);
 
XB_TRACE(bp, "no_daddr", data);
return bp;
+
  fail_free_mem:
-   kmem_free(data, malloc_len);
+   for ( ; i >= 0; i--)
+   __free_page(bp->b_pages[i]);
  fail_free_buf:
xfs_buf_free(bp);
  fail:
Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.h
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.h   2007-03-08 19:08:38.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.h        2007-03-09 08:58:50.0 +0100
@@ -63,7 +63,7 @@
 
/* flags used only internally */
_XBF_PAGE_CACHE = (1 << 17),/* backed by pagecache */
-   _XBF_KMEM_ALLOC = (1 << 18),/* backed by kmem_alloc()  */
+   _XBF_PAGES = (1 << 18), /* backed by refcounted pages  */
_XBF_RUN_QUEUES = (1 << 19),/* run block device task queue */
_XBF_DELWRI_Q = (1 << 21),   /* buffer on delwri queue */
 } xfs_buf_flags_t;
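
The error path in a loop like the one above has to release only the pages that were actually allocated. Below is a small userspace sketch of that unwind pattern, with malloc()/free() standing in for alloc_page()/__free_page(). It is an illustration only, not the code from the patch; fill_slots, flaky_alloc and fail_after are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocator hook, so that a failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

/* Number of allocations flaky_alloc() still allows before failing. */
static int fail_after;

/* Succeeds fail_after times, then returns NULL. */
static void *flaky_alloc(size_t size)
{
    if (fail_after-- <= 0)
        return NULL;
    return malloc(size);
}

/*
 * Fill slots[0..count-1] with freshly allocated buffers.
 * On failure, free whatever was allocated so far and return -1,
 * leaving every touched slot set to NULL.
 */
static int fill_slots(void **slots, int count, size_t size, alloc_fn alloc)
{
    int i;

    assert(count >= 0);
    for (i = 0; i < count; i++) {
        slots[i] = alloc(size);
        if (!slots[i])
            goto fail;
    }
    return 0;

fail:
    /* slots[i] is NULL here, so start the unwind at i - 1. */
    while (--i >= 0) {
        free(slots[i]);
        slots[i] = NULL;
    }
    return -1;
}
```

Starting the unwind at i - 1 means the slot whose allocation failed is never handed to the release function.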


Re: [PATCH] chaostables

2007-03-09 Thread Amin Azez
* Jan Engelhardt wrote, On 09/03/07 10:19:
> Hello,
>
> On Mar 9 2007 09:35, Amin Azez wrote:
>   
>> * Jan Engelhardt wrote, On 08/03/07 20:26:
>> 
>>> xt_portscan needs to keep track of what packets the machine has already 
>>> seen. So on the first SYN, the connection is marked with "1". (Then we 
>>> send our SYN-ACK... and the connection turns ESTABLISHED.) The next 
>>> packet that is received will be an ACK or an RST. But it must come 
>>> _exactly after_ the SYN, so just using --tcp-flags ACK will not work. A 
>>> state which can be remembered is required. For that, an automaton is 
>>> used, whose state is saved in the connection mark.
>>>   
>> There would be more point in having this as a new match if it didn't
>> trample on the connection mark, but used its own slot or flag-bit.
>> 
For the record, I support inclusion of this extension in general.
It is true to say "but a netfilter guru could craft together a sequence
of mark-consuming rules to do something somewhat similar", but the same
is also somewhat true for connlimit (packet limits) and so on. The point
of this match is that people don't have to.
> Adding a member to the ip_conntrack/nf_conntrack and sk_buff struct would
> increase the struct sizes, and that would penalize users who do not intend
> to use xt_portscan.
>   
I understand what you say but it sounds a bit like saying: "but we
didn't make it very good because so few people would use it anyway"
which of course makes it even less attractive. I realise you have your
own interpretation but this is how it reads to me.
> I do not see why the packet/connection marks should not be used to record
> additional information
...
> Almost never I required connection marking myself 
I guessed as much. I use it heavily, with my xml rule generators.
> except for this
> portscanning automaton and perhaps a little MARK here and there for
> finely-tuned SNAT. Again, things might look different on your side(s).
>   
There are too many things fighting over the same few bits of the mark, and
in your case you are using it to track internal state of a connection
that has no relevance to the rest of the iptables/ebtables rules.

I'm suggesting that some of the people who would want to use the chaos
match, won't because of the mark issue.

This is not a new problem.

http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/16217
> From: Tom Eastep  shorewall.net>
> Subject: Re: RFC: Disable defered bridge hooks by default


>
> Once again, netfilter marks are the solution of last resort. This is
> becoming very painful for those of us who produce general Netfilter
> configuration tools. The situation is exacerbated by the fact that
> ebtables doesn't support modifying the mark value via logical AND/OR and
> the other fwmark consumers (tc, ip) don't allow a mask when testing the
> fwmark value.
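
To make the AND/OR and mask point concrete, here is a small sketch of how several consumers could share one 32-bit mark if every update and every test went through a mask. The helper names and the bit layout (low 8 bits for one consumer) are assumptions made up for this example; they are not taken from netfilter, ebtables, tc or ip.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace only the bits selected by mask, leaving the rest of the
 * mark untouched.  The caller's value must fit inside its own mask.
 */
static uint32_t mark_set_masked(uint32_t mark, uint32_t value, uint32_t mask)
{
    assert((value & ~mask) == 0);
    return (mark & ~mask) | (value & mask);
}

/* Compare only the bits selected by mask. */
static int mark_match_masked(uint32_t mark, uint32_t value, uint32_t mask)
{
    return (mark & mask) == (value & mask);
}
```

With a layout like this, a match that keeps its automaton state in mask 0x000000ff would not disturb a configuration tool that uses the upper 24 bits, but only if every producer and consumer of the mark supports the mask, which is the complaint above.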

I suggested one solution
http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/16244

and Patrick McHardy has suggested using ct_extend.

I've not looked into this further because I'm too busy doing xml
versions of iptables, ebtables, iproute and tc.

[There's an ip route<->xml at:
http://mailman.ds9a.nl/pipermail/lartc/2007q1/020376.html
iptables now has xml<-> convertor ]

Sam


[PATCH] revoke: consolidate revoked file operations

2007-03-09 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

Return EBADF for all revoked file operations except for read(2) which
returns zero for special files as the BSDs do and close(2) which is 
always zero.

Cc: Alan Cox <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 fs/revoked_inode.c |  367 +
 1 file changed, 38 insertions(+), 329 deletions(-)

Index: uml-2.6/fs/revoked_inode.c
===
--- uml-2.6.orig/fs/revoked_inode.c 2007-03-09 13:26:30.0 +0200
+++ uml-2.6/fs/revoked_inode.c  2007-03-09 13:47:54.0 +0200
@@ -29,6 +29,12 @@
return -EBADF;
 }
 
+static ssize_t revoked_special_file_read(struct file *filp, char __user * buf,
+size_t size, loff_t * ppos)
+{
+   return 0;
+}
+
 static ssize_t revoked_file_write(struct file *filp, const char __user * buf,
  size_t siz, loff_t * ppos)
 {
@@ -200,6 +206,35 @@
.splice_read = revoked_file_splice_read,
 };
 
+static const struct file_operations revoked_special_file_ops = {
+   .llseek = revoked_file_llseek,
+   .read = revoked_special_file_read,
+   .write = revoked_file_write,
+   .aio_read = revoked_file_aio_read,
+   .aio_write = revoked_file_aio_write,
+   .readdir = revoked_file_readdir,
+   .poll = revoked_file_poll,
+   .ioctl = revoked_file_ioctl,
+   .unlocked_ioctl = revoked_file_unlocked_ioctl,
+   .compat_ioctl = revoked_file_compat_ioctl,
+   .mmap = revoked_file_mmap,
+   .open = revoked_file_open,
+   .flush = revoked_file_flush,
+   .release = revoked_file_release,
+   .fsync = revoked_file_fsync,
+   .aio_fsync = revoked_file_aio_fsync,
+   .fasync = revoked_file_fasync,
+   .lock = revoked_file_lock,
+   .sendfile = revoked_file_sendfile,
+   .sendpage = revoked_file_sendpage,
+   .get_unmapped_area = revoked_file_get_unmapped_area,
+   .check_flags = revoked_file_check_flags,
+   .dir_notify = revoked_file_dir_notify,
+   .flock = revoked_file_flock,
+   .splice_write = revoked_file_splice_write,
+   .splice_read = revoked_file_splice_read,
+};
+
 static int revoked_inode_create(struct inode *dir, struct dentry *dentry,
int mode, struct nameidata *nd)
 {
@@ -326,330 +361,6 @@
/* truncate_range returns void */
 };
 
-static loff_t revoked_special_file_llseek(struct file *file, loff_t offset,
- int origin)
-{
-   return -ENXIO;
-}
-
-static ssize_t revoked_special_file_read(struct file *filp, char __user * buf,
-size_t size, loff_t * ppos)
-{
-   return -ENXIO;
-}
-
-static ssize_t revoked_special_file_write(struct file *filp,
- const char __user * buf, size_t siz,
- loff_t * ppos)
-{
-   return -ENXIO;
-}
-
-static ssize_t revoked_special_file_aio_read(struct kiocb *iocb,
-const struct iovec *iov,
-unsigned long nr_segs, loff_t pos)
-{
-   return -ENXIO;
-}
-
-static ssize_t revoked_special_file_aio_write(struct kiocb *iocb,
- const struct iovec *iov,
- unsigned long nr_segs, loff_t pos)
-{
-   return -ENXIO;
-}
-
-static int revoked_special_file_readdir(struct file *filp, void *dirent,
-   filldir_t filldir)
-{
-   return -ENXIO;
-}
-
-static unsigned int revoked_special_file_poll(struct file *filp,
- poll_table * wait)
-{
-   return POLLERR;
-}
-
-static int revoked_special_file_ioctl(struct inode *inode, struct file *filp,
- unsigned int cmd, unsigned long arg)
-{
-   return -ENXIO;
-}
-
-static long revoked_special_file_unlocked_ioctl(struct file *file, unsigned cmd,
-   unsigned long arg)
-{
-   return -ENXIO;
-}
-
-static long revoked_special_file_compat_ioctl(struct file *file,
- unsigned int cmd,
- unsigned long arg)
-{
-   return -ENXIO;
-}
-
-static int revoked_special_file_mmap(struct file *file,
-struct vm_area_struct *vma)
-{
-   return -ENXIO;
-}
-
-static int revoked_special_file_open(struct inode *inode, struct file *filp)
-{
-   return -ENXIO;
-}
-
-static int revoked_special_file_flush(struct file *file, fl_owner_t id)
-{
-   return 0;
-}
-
-static int revoked_special_file_release(struct inode *inode, struct file *filp)
-{
-   return -ENXIO;
-}
-
-static int revoked_special_file_fsync(struct 

Re: sys_write() racy for multi-threaded append?

2007-03-09 Thread Michael K. Edwards

On 3/8/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:

Don't even try, you *cannot* do that without breaking the standards, or
without a performance drop.

The only safe way would be to lock the file during the whole read()/write()
syscall, and we don't want this (this would be more expensive than current).
Don't forget 'file' may be some sockets/tty/whatever, not a regular file.


I'm not trying to provide full serialization of concurrent
multi-threaded read()/write() in all exceptional scenarios.  I am
trying to think through the semantics of pipelined I/O operations on
struct file.  In the _absence_ of an exception, something sane should
happen -- and when you start at f_pos == 1000 and write 100 bytes each
from two threads (successfully), leaving f_pos at 1100 is not sane.


Standards are saying :

If an error occurs, file pointer remains unchanged.



From 1003.1, 2004 edition:



This volume of IEEE Std 1003.1-2001 does not specify the value of the
file offset after an error is returned; there are too many cases. For
programming errors, such as [EBADF], the concept is meaningless since
no file is involved. For errors that are detected immediately, such as
[EAGAIN], clearly the pointer should not change. After an interrupt or
hardware error, however, an updated value would be very useful and is
the behavior of many implementations.

This volume of IEEE Std 1003.1-2001 does not specify behavior of
concurrent writes to a file from multiple processes. Applications
should use some form of concurrency control.


The effect on f_pos of an error during concurrent writes is therefore
doubly unconstrained.  In the absence of concurrent writes, it is
quite harmless for f_pos to have transiently contained, at some point
during the execution of write(), an overestimate of the file position
after the write().  In the presence of concurrent writes (let us say
two 100-byte writes to a file whose f_pos starts at 1000), it is
conceivable that the second write will succeed at f_pos == 1100 but
the first will be short (let us say only 50 bytes are written),
leaving f_pos at 1150 and no bytes written in the range 1050 to 1099.
That would suck -- but the standard does not oblige you to avoid it
unless the destination is a pipe or FIFO with O_NONBLOCK clear, in
which case partial writes are flat out verboten.


You cannot know for sure how many bytes will be written, since write() can
return a count that is different from buflen.


Of course (except in the pipe/FIFO case).  But if it does, and you're
writing concurrently to the fd, you're not obliged to do anything sane
at all.  If you're not writing concurrently, the fact that you
overshot and then fixed it up after vfs_write() returned is _totally_
invisible.  f_pos is local to the struct file.


So you cannot update fpos before calling vfs_write()


You can speculatively update it, and in the common case you don't have
to touch it again.  That's a win.
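
A rough userspace model of what I mean by a speculative update, using C11 atomics. The struct and function names are invented for illustration (pos_counter stands in for f_pos), and nothing here is kernel code: the position is advanced by the full request before the I/O, and only a short transfer needs a second touch to hand back the excess.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the f_pos field of a struct file. */
struct pos_counter {
    atomic_long pos;
};

/* Before the I/O: claim [start, start + len) and return start. */
static long pos_reserve(struct pos_counter *c, long len)
{
    assert(len >= 0);
    return atomic_fetch_add(&c->pos, len);
}

/*
 * After the I/O: done bytes were actually transferred.  In the common
 * case (done == len) the counter is not touched again; on a short
 * transfer the unused part of the reservation is given back.
 */
static void pos_settle(struct pos_counter *c, long len, long done)
{
    assert(done >= 0 && done <= len);
    if (done < len)
        atomic_fetch_sub(&c->pos, len - done);
}

/* Sample the current position. */
static long pos_read(struct pos_counter *c)
{
    return atomic_load(&c->pos);
}
```

Starting at 1000, two 100-byte reservations hand out 1000 and 1100; if the first then settles with only 50 bytes done, the counter ends at 1150, which is the hole scenario described above and is still permitted by the standard outside the pipe/FIFO case.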


About your L1 'miss', don't forget that multi-threaded apps are going to
atomic_dec_and_test(&file->f_count) anyway when fput() is done at the end of
syscall. And you were concerned about multi-threaded apps, weren't you?


That does indeed interfere with the optimization for multi-threaded
apps.  Which doesn't mean it isn't worth having for single-threaded
apps.  And if we get to the point that that atomic_dec_and_test is the
only thing left (in the common case) that touches the struct file
after a VFS operation completes, then we can evaluate whether f_count
ought to be split out of the struct file and put somewhere else.  In
fact, if I understand the calls inside vfs_write() correctly, f_count
is (usually?) the only member of struct file that is written to during
a call to sys_pwrite64(); so moving it out of struct file and into
some place where it has to be kept cache-coherent anyway would also
improve the performance on SMP of distributed pwrite() calls to a
process-global fd.

Cheers,
- Michael


Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Pekka Enberg

On 3/9/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:

Then just drop the fget_light() 'optimisation' and always take a reference
(atomic on f_count) regardless of single-thread or not. Instead of dirtying
f_light, just do the straightforward thing and be done with it.


That's what I did first but akpm thought it was "unfortunate." Hmm.. ;-)


Re: PCI failures during boot

2007-03-09 Thread Jim van Wel
Hi there,

Please first try a newer FC6 kernel.

2.6.19-1.2911.6.5.fc6

That's 2.6.19.7.

That's the newest one.

Greetings,
Jim,

> Hi,
>
> I just bought a brand new notebook and wanted to install FC6 on it.
> Unfortunately there are some issues reported by the console during boot.
> Here are some reported errors, I attached the full dmesg and other
> information just in case.
>
> ACPI: bus type pci registered
> PCI: Using MMCONFIG at e000
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (:00)
> PCI: Probing PCI hardware (bus 00)
> PCI: Ignoring BAR0-3 of IDE controller :00:1f.2
> Boot video device is :01:00.0
> PCI: Transparent bridge - :00:1e.0
> PCI: Bus #07 (-#0a) is hidden behind transparent bridge #06 (-#07) (try 'pci=assign-busses')
> Please report the result to linux-kernel to fix this permanently
>
> and later:
>
> PCI: Using ACPI for IRQ routing
> PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
> PCI: Cannot allocate resource region 7 of bridge :00:1c.0
> PCI: Cannot allocate resource region 8 of bridge :00:1c.0
> NetLabel: Initializing
> NetLabel:  domain hash size = 128
>
> I currently can't update my kernel version and I'm running a
> 2.6.18-1.2798.fc6 kernel.
>
> Could anybody give me some advice, please?
> --
> Francis
>





Re: [PATCH -mm] utrace: nommu fixup support utrace

2007-03-09 Thread David Howells

Roland McGrath <[EMAIL PROTECTED]> wrote:

> What I meant to suggest is that I would start from a safety point of view
> with get_user_pages/access_process_vm refusing to do force& to
> MAP_PRIVATE pages that are in fact being shared (ETXTBSY or something).

That's a good idea.  The other possibility I've thought of is maintaining a
list of the changes made to such a region and deapplying them / reapplying
them as the processes get scheduled.  That's probably fine as long as it's
just a few breakpoints and it's a single-CPU system.

But this is irrelevant as it doesn't address the sharing-prevention issue.

> (When it's not being shared, it should do whatever is necessary to make sure
> that page is known dirty and not hand it out for later mappings.)

NOMMU doesn't deal with pages at this level, but deals with regions of memory
instead.  A mapping may be part of a page, a whole page, or several pages.
NOMMU private file mmap() allocates using kmalloc(), so if you allocate a
1-byte buffer, that's all you're guaranteed to get.

As it happens, when the code sees PT_PTRACED, the VMA is marked as being
unshareable by the simple expedient of turning off VM_MAYSHARE, meaning that it
neither shares with already existing mappings, nor will it be shareable by
mappings that have yet to be made - even within the same process.

> Then you can go about trying to make the safe (no sharing) case come about
> when you want it.

Which brings us back to the if-statement you objected to.  Its presence is
still required so as to prevent sharing of the executable and loader, and this
seems as good a way to do it as any as far as I can see.  Remember that it has
to be controlled by something that can be set before the binfmt load_binary() op
runs.

David


Re: [PATCH 3/7] revoke: core code

2007-03-09 Thread Pekka J Enberg
On Fri, 9 Mar 2007 10:15:15 +0200 (EET)
Pekka J Enberg <[EMAIL PROTECTED]> wrote:
> > +static ssize_t revoked_file_aio_read(struct kiocb *iocb,
> > +const struct iovec *iov,
> > +unsigned long nr_segs, loff_t pos)
> > +{
> > +   return -EBADF;
> > +}
 
On Fri, 9 Mar 2007, Alan Cox wrote:
> Do we need both -EBADF and -ENXIO versions? It is hard to tell from
> existing OSes, as they don't support revoke of files, just special files.

No, we don't. We should always do EBADF except for close(2) which is zero 
always and make read(2) zero for special files.

On Fri, 9 Mar 2007 10:15:15 +0200 (EET)
Pekka J Enberg <[EMAIL PROTECTED]> wrote:
> > +static ssize_t revoked_special_file_read(struct file *filp, char __user * buf,
> > +size_t size, loff_t * ppos)
> > +{
> > +   return -ENXIO;
> > +}

On Fri, 9 Mar 2007, Alan Cox wrote:
> Bezerkly Unix returns 0 for the special file read case

Aah, I'll fix that up. Thanks.

Pekka


Re: 2.6.21-rc3-mm1

2007-03-09 Thread Frederik Deweerdt
On Wed, Mar 07, 2007 at 08:18:39PM -0800, Andrew Morton wrote:
> - The wireless changes in here need a lot of testers, please.  It is major
>   rework.
[...]
>   I was able to get ipw2200 working after some fumbling,
Any details on the symptoms? I'm unable to boot rc3-mm2, and it hangs
right after printing the ipw2200 driver message. I'll investigate that
this week-end.

Regards,
Frederik


Re: any thoughts yet on a "generic" ioctl.h?

2007-03-09 Thread Robert P. J. Day
On Fri, 9 Mar 2007, Stefan Richter wrote:

> Robert P. J. Day wrote:
> > each simplification could be submitted as
> > a separate arch-specific patch, as many things are.
> >
> > i was more asking about the *philosophy* of that patch,
>
> The justification of this initial patch is more obvious if followed
> up by those subsequent patches which make use of the initial one.

  no, it's not.  i should be able to ask about the *feasibility* of a
possible simplifying patch without having to provide an actual example
of its application. if someone can't immediately see what i'm trying
to do given the previously-posted patch, then they shouldn't be
commenting on it one way or the other.

  i'm not going to go to the trouble of creating and submitting all
possible follow-up patches, only to have someone higher up the food
chain "NAK" the whole idea on philosophical grounds.  either you can
see what i'm talking about or you can't.

rday

p.s.  there is already ample precedent for what i'm asking here.  one
can submit a patch to add, say, a simplifying macro to kernel.h
without simultaneously submitting patches for everywhere it possibly
might be used.

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page



Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-09 Thread Theodore Ts'o
On Fri, Mar 09, 2007 at 04:56:32PM +1100, Rusty Russell wrote:
> __builtin_types_compatible_p() has been around since gcc 2.95, and we
> don't use it anywhere.  This patch quietly fixes that.
> 
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

Is your clock set correctly?  Looks like this mail was sent 23 days
early.  :-)
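
For anyone who has not met the builtin being joked about: a small sketch of what __builtin_types_compatible_p() can be used for. SAME_TYPE and IS_ARRAY are names invented for this example, not macros from the patch, and the code assumes gcc or a compatible compiler since it relies on GNU extensions.

```c
#include <assert.h>

/* 1 if the two expressions have compatible types, 0 otherwise. */
#define SAME_TYPE(a, b) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/*
 * 1 if arr is a real array, 0 if it is a pointer: an array type is
 * never compatible with the type of a pointer to its first element.
 */
#define IS_ARRAY(arr) (!SAME_TYPE((arr), &(arr)[0]))

static int demo_int_vs_long(void)
{
    int i = 0;
    long l = 0;

    (void)i;
    (void)l;
    return SAME_TYPE(i, l);
}

static int demo_array(void)
{
    int a[4] = { 0 };

    (void)a;
    return IS_ARRAY(a);
}

static int demo_pointer(void)
{
    int a[4] = { 0 };
    int *p = a;

    (void)p;
    return IS_ARRAY(p);
}
```

The check is resolved at compile time, so a macro built on it can reject a pointer where an array was expected without any runtime cost.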

- Ted



Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-03-09 Thread Pavel Machek
Hi!

> >>Can you take this as a wishlist item?
> >>
> >>It would be nice if next version of acpi specs supported table
> >>
> >>'AML / SMM BIOS will access these ports'
> >>
> >>...so we can get it correct with acpi4 or something..?
> >>
> >
> >I can only second Pavel's wish here. This would be highly convenient
> >for OS developers to at least know which resources are accessed by AML
> >and SMM. Without this information, we can never be sure that OS-level
> >code won't conflict with ACPI or SMM.
> >
> >  
> BIOS vendors are not required to support latest and greatest ACPI spec.
> So even if some future spec version will include this ports description,
> we will still have majority of hardware not exporting it...

That's okay... vendors are not required to support _ACPI_, but they
mostly do. Can we get the "ports used by BIOS" table to the spec?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [2/6] 2.6.21-rc2: known regressions

2007-03-09 Thread Pavel Machek
Hi!

> > disabling the following radeonfb options in the .config made resume work 
> > again:
> 
> In general, don't even *try* to use radeonfb for suspend/resume.
> 
> I don't think it has ever worked, except on some very rare laptops 
> (largely PPC Macs) where people had enough information to set up the 
> PLL's.

It worked ok on thinkpad x32. BIOS did the setup in resume case (with
acpi_sleep=..., anyway), and radeonfb could pick the card up from there.

> I don't think the other framebuffer drivers are much better.
> 
> You're better off using the VGA console, and letting X re-initialize the 
> graphics device. That generally at least has a reasonably good chance of 
> working.

See suspend.sf.net; s2ram there has a long list of tricks. If you invent
a new one, please add it there.

> Re-initializing graphics modes really is very hard. You can try with the 
> BIOS video hack (I forget the kernel command line to turn it on), but we 
> really do end up depending on X doing it better.

...or you can try vbetool; it is a similar hack to acpi_sleep=... , but
it works for more people.

> Some day we may have modesetting support in the kernel for some graphics 
> hw, right now it's pretty damn spotty.

Yep, that's the way to go.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] Fix sparc TIF_USEDFPU flag atomicity

2007-03-09 Thread William Lee Irwin III
On Thu, 8 Mar 2007 22:12:27 -0500 Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
>> Fix sparc TIF_USEDFPU flag atomicity
>> Non atomic update of TIF can be very dangerous, except at thread structure
>> creation time. Here I standardize the TIF_USEDFPU usage of the sparc arch.
>> Applies on 2.6.20.
>> Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

On Thu, Mar 08, 2007 at 09:25:23PM -0800, David Miller wrote:
> Also applied, thanks a lot.

Thanks again for doing just about everything here while I get my act
back together (never mind how preposterously long it's taking).

Acked-by: William Irwin <[EMAIL PROTECTED]>


-- wli


Re: any thoughts yet on a "generic" ioctl.h?

2007-03-09 Thread Stefan Richter
Robert P. J. Day wrote:
> each simplification could be submitted as
> a separate arch-specific patch, as many things are.
> 
> i was more asking about the *philosophy* of that patch,

The justification of this initial patch is more obvious if followed up
by those subsequent patches which make use of the initial one.
-- 
Stefan Richter
-=-=-=== --== --===
http://arcgraph.de/sr/


Re: [PATCH] drivers/char/vt.c: check kmalloc() return value.

2007-03-09 Thread Alan Cox
On Thu, 8 Mar 2007 23:27:13 -0800
Amit Choudhary <[EMAIL PROTECTED]> wrote:

> Description: Check the return value of kmalloc() in function con_init(), in 
> file drivers/char/vt.c.

NAK; This occurs at boot and if it fails we are wasting our time
recovering.

Alan


Re: [PATCH 6/7] revoke: wire up i386 system calls

2007-03-09 Thread Alan Cox
On Fri, 9 Mar 2007 10:16:30 +0200 (EET)
Pekka J Enberg <[EMAIL PROTECTED]> wrote:

> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> Make revokeat and frevoke system calls available to user-space on i386.

Acked-by: Alan Cox <[EMAIL PROTECTED]>


Re: [PATCH 3/7] revoke: core code

2007-03-09 Thread Alan Cox
On Fri, 9 Mar 2007 10:15:15 +0200 (EET)
Pekka J Enberg <[EMAIL PROTECTED]> wrote:

> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> The revokeat(2) and frevoke(2) system calls invalidate open file
> descriptors and shared mappings of an inode. After a successful
> revocation, operations on file descriptors fail with the EBADF or
> ENXIO error code for regular and device files,
> respectively. Attempting to read from or write to a revoked mapping
> causes SIGBUS.

Acked-by: Alan Cox <[EMAIL PROTECTED]>


> +static ssize_t revoked_file_aio_read(struct kiocb *iocb,
> +  const struct iovec *iov,
> +  unsigned long nr_segs, loff_t pos)
> +{
> + return -EBADF;
> +}

Do we need both -EBADF and -ENXIO versions? It is hard to tell from
existing OSes, as they only support revoke of special files, not regular files.

> +static ssize_t revoked_special_file_read(struct file *filp, char __user * 
> buf,
> +  size_t size, loff_t * ppos)
> +{
> + return -ENXIO;
> +}

Bezerkly Unix returns 0 for the special file read case
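
As a rough illustration of the two error paths being questioned here, the following user-space sketch swaps a file's operations table on revocation. It is not the patch's code: `struct toy_file`, `struct toy_file_ops` and `toy_revoke()` are hypothetical names invented for this example, and only the -EBADF / -ENXIO split is taken from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct toy_file;

/* Hypothetical, heavily reduced operations table. */
struct toy_file_ops {
	ssize_t (*read)(struct toy_file *f, char *buf, size_t size);
};

struct toy_file {
	const struct toy_file_ops *ops;
	int is_special;		/* non-zero for device (special) files */
};

/* Revoked regular file: reads fail with -EBADF. */
ssize_t toy_revoked_read(struct toy_file *f, char *buf, size_t size)
{
	return -EBADF;
}

/* Revoked special file: reads fail with -ENXIO. */
ssize_t toy_revoked_special_read(struct toy_file *f, char *buf, size_t size)
{
	return -ENXIO;
}

static const struct toy_file_ops toy_revoked_ops = {
	.read = toy_revoked_read,
};

static const struct toy_file_ops toy_revoked_special_ops = {
	.read = toy_revoked_special_read,
};

/* Point the file at the table matching its type. */
void toy_revoke(struct toy_file *f)
{
	assert(f != NULL);
	f->ops = f->is_special ? &toy_revoked_special_ops : &toy_revoked_ops;
}
```

If only one error code were kept, the two tables would collapse into one, which is the simplification the question above hints at.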




Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Eric Dumazet
On Friday 09 March 2007 11:43, Pekka Enberg wrote:
> On 3/9/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> > Cannot we use a flag in 'struct files_struct', set to one when the task
> > is mono-thread (at task creation in fact), and set to 0 when it creates a
> > new thread  (or when someone remotely access to this "struct
> > files_struct" in /proc/pid/fd/... )
>
> How does that work? fget_light() has a built-in assumption that as
> long as you don't share files_struct, it's okay not to take an extra
> reference as current is only one doing close(2) and revoke(2) changes
> that. So it's not really about being single-threaded or not.

I just threw out one (silly?) idea and expected you to do the hard work :)

Then just drop the fget_light() 'optimisation' and always take a reference 
(atomic on f_count) regardless of single-thread or not. Instead of dirtying 
f_light, just do the straightforward thing and be done with it.

(That is: fget_light() = fget(), so no more carrying fput_needed everywhere and 
no more convoluted things in some dark corners of the kernel.)

It will save some conditional branches and complexity, and you don't need this 
f_light thing.
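
The trade-off can be sketched in user space. This is a simplified model under stated assumptions, not kernel code: `struct toy_file`, `struct toy_files`, `toy_fget()` and `toy_fget_light()` are hypothetical stand-ins, and locking/RCU is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

/* Toy stand-ins; not the kernel's struct file / struct files_struct. */
struct toy_file {
	atomic_long f_count;
};

struct toy_files {
	atomic_int users;			/* tasks sharing this table */
	struct toy_file *fd[TOY_MAX_FDS];
};

/* fget()-style lookup: always take a reference. */
struct toy_file *toy_fget(struct toy_files *files, unsigned int fd)
{
	struct toy_file *f;

	assert(files != NULL);
	if (fd >= TOY_MAX_FDS)
		return NULL;
	f = files->fd[fd];
	if (f)
		atomic_fetch_add(&f->f_count, 1);
	return f;
}

/* fget_light()-style lookup: skip the atomic when the table is unshared. */
struct toy_file *toy_fget_light(struct toy_files *files, unsigned int fd,
				int *fput_needed)
{
	struct toy_file *f;

	assert(files != NULL && fput_needed != NULL);
	*fput_needed = 0;
	if (atomic_load(&files->users) == 1)
		return fd < TOY_MAX_FDS ? files->fd[fd] : NULL;

	f = toy_fget(files, fd);
	if (f)
		*fput_needed = 1;	/* caller must drop the reference */
	return f;
}
```

The "light" variant saves one atomic increment in the unshared case but makes every caller carry `fput_needed`; once something other than the owning task (such as revoke) can tear the file down, the unreferenced path is no longer safe, which is the objection raised in this thread.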

>
> On 3/9/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> > Also, the thing is racy.
>
> Aah, fget_light() indeed has a race window between fcheck_files() and
> set_f_light().

Yes, you see how hard it is to get this right.


Re: [PATCH 5/7] revoke: add documentation

2007-03-09 Thread Alan Cox
On Fri, 9 Mar 2007 10:16:09 +0200 (EET)
Pekka J Enberg <[EMAIL PROTECTED]> wrote:

> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> This documents revoke file operation in Documentation/filesystems/vfs.txt.
> 
> Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>

Acked-by: Alan Cox <[EMAIL PROTECTED]>


Re: more than 65535 outbound connections

2007-03-09 Thread Matti Aarnio
On Fri, Mar 09, 2007 at 04:13:00PM +0530, Niklaus wrote:
> yes now lets take 2 dest machines , source ip is fixed , source port (2^16 
> - 1)
> destip is fixed (a.a.a.a and b.b.b.b) ,dest port(2^16 -1) each ,
> 
> for a connection we have one port used , say connection 1 is
> 
> source ip,port 1 , a.a.a.a port 1
> source ip,port 2 , a.a.a.a port 2
> .
> .
> .
> source ip,port 65535 , a.a.a.a port 65535

You do have some sort of fixation on having the same port numbers at both ends.
In some rare applications that is done (e.g. with NTP server-server connections
using UDP), but it is very rare and never done with TCP.

Now if you have 65535 server ports at a.a.a.a, you can have very nearly
4000 million TCP streams in between them.

> so total of 65535 connections (assume traffic is still going on, a
> movie on a slow line dialup or 1kbps )
> 
> now if i try to open another connection (assume lots of file
> descriptors are present) to a.a.a.a what happens
> 
> to b.b.b.b what happens
> 
> i think both will not get established as the OS doesn't have any free
> source ports or am i wrong

  you are wrong.
 
> >David Lang

/Matti Aarnio


Re: [PATCH 1/7] revoke: special mmap handling

2007-03-09 Thread Alan Cox
On Fri, 9 Mar 2007 10:14:09 +0200 (EET)
Pekka J Enberg <[EMAIL PROTECTED]> wrote:

> From: Pekka Enberg <[EMAIL PROTECTED]>
> 
> This adds special handling for revoked memory mappings.  We want to
> raise SIGBUS when accessing revoked mappings and return ENODEV when
> trying to remap with mmap(2).
> 
> Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>

Acked-by: Alan Cox <[EMAIL PROTECTED]>


Re: more than 65535 outbound connections

2007-03-09 Thread Matti Aarnio
On Fri, Mar 09, 2007 at 01:49:34PM +0530, Niklaus wrote:
> Hi,
> 
> I could be wrong in the below description or might have misunderstood
> many of the concepts , please correct appropriately.
> 
> 65535 ports can allowed . So on a  machine namely C you can have max
> 65535 outbound connections

IP connections are quads (four-tuples), machine A and B IP addresses,
plus 16 bit port numbers at both ends.

You can have about  64 k * 3 G = 192 T  connections out from a machine
to any single port number out there to all existing IP addresses.

If  A.ip, B.ip, and B.port  stay the same, A can setup up to some
10 - 50 thousand parallel connections.  (Depending on allowed dynamic
source IP port number space at machine A.)

If either B.ip or B.port changes, A can reuse a port that is actively
connected to something. Resulting four-tuple is different -> connection
is different.

Does Linux reuse port numbers in this way ?
It most likely does, but I didn't verify.
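
To make the four-tuple rule concrete, here is a minimal sketch in C. It is an illustration only, not how any TCP stack stores its connections: `struct conn_tuple` and `conn_tuple_equal()` are hypothetical names, and IPv4 addresses are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a TCP connection identity (IPv4 only). */
struct conn_tuple {
	uint32_t src_ip;
	uint16_t src_port;
	uint32_t dst_ip;
	uint16_t dst_port;
};

/* Two connections collide only if all four fields match. */
bool conn_tuple_equal(const struct conn_tuple *a, const struct conn_tuple *b)
{
	assert(a != NULL && b != NULL);
	return a->src_ip == b->src_ip && a->src_port == b->src_port &&
	       a->dst_ip == b->dst_ip && a->dst_port == b->dst_port;
}
```

Two tuples that share the source address and source port but differ in destination address compare as different, so in this model they are two separate connections.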

> What i was thinking was to send to another machines A and B from the
> same port [X] and then when we get data from it to [X] we can the send
> it to the correct application using stateful mapping or storing some
> information . The machines A and B are unaware of this mapping from
> the C  machine.

You want to make a "L4 switch" -- a "load balancer" ?
That thing is a NAT-box, and is really not making buffered TCP flows,
but rather mapping IP/TCP header rewriters to divert the flows to new
destinations.

> Can we increase it by any means in the kernel? Do we have patches for the
> above?
> 
> i read on the web that terry lambert has got 1.6 million simultaneous
> connection ? how is the way it is done.
> 
> http://kerneltrap.org/node/277

With 50 thousand connections per single ( A.ip / B.ip / B.port ) set,
one needs only 32  B.ports or A.ip:s or B.ip:s to do that 1.6 million
parallel TCP streams.
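
The arithmetic behind that figure is a ceiling division. A small sketch, where `sets_needed()` is a hypothetical helper and the 50 thousand per-set figure is simply the estimate used above:

```c
#include <assert.h>
#include <stdint.h>

/*
 * How many distinct (A.ip, B.ip, B.port) sets are needed to reach
 * target_conns, if each set can carry at most conns_per_set connections.
 */
uint64_t sets_needed(uint64_t target_conns, uint64_t conns_per_set)
{
	assert(conns_per_set > 0);
	return (target_conns + conns_per_set - 1) / conns_per_set;
}
```

With a target of 1,600,000 connections and 50,000 per set this gives 32, matching the number quoted above.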

That does eat up lots and lots of kernel memory...

> Regs
> Nik

/Matti Aarnio



Re: [patch 1/3] Input: psmouse - create PS/2 protocol options for Kconfig

2007-03-09 Thread Andres Salomon
Dmitry Torokhov wrote:
> On 2/16/07, Andres Salomon <[EMAIL PROTECTED]> wrote:
>> Dmitry Torokhov wrote:
>> > On Thursday 15 February 2007 20:30, Andrew Morton wrote:
>> >> On Thu, 15 Feb 2007 19:55:29 -0500
>> >> Andres Salomon <[EMAIL PROTECTED]> wrote:
>> [...]
>> >> Perhaps a nicer implementation would be to have a separate .c file
>> for each
>> >> variant.
>> >>
>> >
>> > Having completely separate sub-drivers is very hard because of very
>> delicate
>> > PS/2 protocol probing
>> >
>> > What do you think about patch below? It somewhat reduces #ifdef
>> clutter in main
>> > module moving it in .h files...
>> >
>>
>> Normally, I'm a fan of that sort of thing.  However, in this case, I
>> think it makes sense to have the #ifdefs right in the probe function; at
>> least for me, it makes it easier to understand what's going on.  The
>> synaptics stuff is especially tricky; with a cursory glance over the
>> code, one might assume that all the synaptics functions disappear when
>> CONFIG_MOUSE_PS2_SYNAPTICS is unset.  However, if the #ifdef's are in
>> the probe function, it's pretty clear that some synaptics functions
>> still get called even when CONFIG_MOUSE_PS2_SYNAPTICS is unset.
>>
> 
> This is a valid point but #ifdef maze in the middle of already messy
> psmouse-extensions() is too much for me. I guess I will just add a
> comment explaining that synaptics probing is really special.
> 

I haven't seen patches in your tree; are you waiting for me to do the
cleanups and resend?




Re: [2/6] 2.6.21-rc2: known regressions

2007-03-09 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > disabling the following radeonfb options in the .config made resume 
> > work again:
> 
> In general, don't even *try* to use radeonfb for suspend/resume.
> 
> I don't think it has ever worked, except on some very rare laptops 
> (largely PPC Macs) where people had enough information to set up the 
> PLL's.
> 
> I don't think the other framebuffer drivers are much better.
> 
> You're better off using the VGA console, and letting X re-initialize 
> the graphics device. That generally at least has a reasonably good 
> chance of working.
> 
> Re-initializing graphics modes really is very hard. You can try with 
> the BIOS video hack (I forget the kernel command line to turn it on), 
> but we really do end up depending on X doing it better.

it's the s3_sleep boot option and /proc/sys/kernel/acpi_video_mode, but 
that didn't make a difference.

> Some day we may have modesetting support in the kernel for some 
> graphics hw, right now it's pretty damn spotty.

having no video is what i'd have expected - but getting a /hang/ is not 
what i'd have expected.

Ingo


Re: [PATCH] devres: release resources on device_del()

2007-03-09 Thread Tejun Heo
Tejun Heo wrote:
> Some platform devices are driven without driver attached, so managed
> resources can be acquired without driver attached.  Make sure such
> resources are released by calling devres_release_all() in
> device_del().
> 
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> ---
> This one fixes oops on pata_platform and pata_legacy unload.  libata
> being the only user of devres at the moment.  I think this can go
> through libata-dev#upstream.

I meant libata-dev#upstream-fixes.  This needs to be fixed in 2.6.21.

-- 
tejun


Re: [PATCH 4/4]: [SPARC64]: Add clocksource/clockevents support.

2007-03-09 Thread Ingo Molnar

* David Miller <[EMAIL PROTECTED]> wrote:

> >From 1171ef62b18d7eef093ecf961dd09b11339d53d9 Mon Sep 17 00:00:00 2001
> From: David S. Miller <[EMAIL PROTECTED]>
> Date: Mon, 5 Mar 2007 15:28:37 -0800
> Subject: [PATCH] [SPARC64]: Add clocksource/clockevents support.
> 
> I'd like to thank John Stultz and others for helping me along the way.
> 
> A lot of cleanups fell out of this.  For example, the get_compare() 
> tick_op was totally unused, so was deleted.  And the most often used 
> tick_op members were grouped together for cache-friendliness.

cool stuff! It's really gratifying to see that you were able to add two new 
major features on Sparc64 (CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS) via:

>  7 files changed, 25 insertions(+), 172 deletions(-)
>  4 files changed, 238 insertions(+), 230 deletions(-)

so it's in fact a net removal of code! I think this really demonstrates 
the power of unified frameworks.

Ingo


Re: [ANNOUNCE] RSDL completely fair starvation free interactive cpu scheduler

2007-03-09 Thread William Lee Irwin III
On Thu, Mar 08, 2007 at 10:31:48PM -0800, Linus Torvalds wrote:
> No. Really.
> I absolutely *detest* pluggable schedulers. They have a huge downside: 
> they allow people to think that it's ok to make special-case schedulers. 
> And I simply very fundamentally disagree.
> If you want to play with a scheduler of your own, go wild. It's easy 
> (well, you'll find out that getting good results isn't, but that's a 
> different thing). But actual pluggable schedulers just cause people to 
> think that "oh, the scheduler performs badly under circumstance X, so 
> let's tell people to use special scheduler Y for that case".
> And CPU scheduling really isn't that complicated. It's *way* simpler than 
> IO scheduling. There simply is *no*excuse* for not trying to do it well 
> enough for all cases, or for having special-case stuff.
> But even IO scheduling actually ends up being largely the same. Yes, we 
> have pluggable schedulers, and we even allow switching them, but in the 
> end, we don't want people to actually do it. It's much better to have a 
> scheduler that is "good enough" than it is to have five that are "perfect" 
> for five particular cases.

For the most part I was trying to assist development, but ran out of
patience and interest before getting much of anywhere. The basic idea
was to be able to fork over a kernel to a benchmark team and have them
run head-to-head comparisons, switching schedulers on the fly,
particularly on machines that took a very long time to boot. The
concept ideally involved making observations and loading fresh
schedulers based on them as kernel modules on the fly. I was more
interested in rapid incremental changes than total rewrites, though I
considered total rewrites to be tests of adequacy, since somewhere in
the back of my mind I had thoughts about experimenting with gang
scheduling policies on those machines taking very long times to boot.

What actually got written, the result of it being picked up by others,
and how it's getting used are all rather far from what I had in mind,
not that I'm offended in the least by any of it. I also had little or
no interest in mainline for it. The intention was more on the order of
an elaborate instrumentation patch for systems where the time required
to reboot is prohibitive and the duration of access strictly limited.
(In fact, downward-revised estimates of the likelihood of such access
also factored into the abandonment of the codebase.)

I consider policy issues to be hopeless political quagmires and
therefore stick to mechanism. So even though I may have started the
code in question, I have little or nothing to say about that sort of
use for it.

There's my longwinded excuse for having originated that tidbit of code.


-- wli


Re: Possible "struct pid" leak from tty_io.c

2007-03-09 Thread Catalin Marinas

On 08/03/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

"Catalin Marinas" <[EMAIL PROTECTED]> writes:
> I'm trying to track down a kmemleak report (on an ARM platform) which
> seems to have appeared with commit
> ab521dc0f8e117fd808d3e425216864d60390500. As I'm not familiar with the
> TTY layer at all, is it possible that the above commit missed a
> put_pid() call on some path?

I won't arbitrarily rule a missing put_pid out.  I have been known to
goof up upon occasion.


I'm not entirely sure it's this part of the code, I would have to do
some more investigations (I didn't get this leak before). An
"unscientific" test shows that if I define get_pid/put_pid in the
tty_io.c file so that pid->count is not affected, the leak disappears.
This doesn't necessarily prove that the fault is here though.


I just did a quick look to see what kmemleak is.  A conservative
tracing leak detector sounds interesting.  Except for all of the list
heads which lead to container_of calls I don't know of anything in the
struct pid implementation that would be difficult for it to work with.
Well that and there is some rcu access protection which can delay the
free by a bit.


Kmemleak can cope with list heads and rcu delayed freeing as it also
checks for pointer aliases (those accessible via container_of).
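
For readers unfamiliar with the idiom, the sketch below shows why an interior pointer can be the only remaining reference to an object. It is a user-space approximation: `toy_container_of()`, `struct toy_list_head` and `struct toy_pid` are hypothetical names, and the macro omits the type checking that the kernel's container_of() performs.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(), minus its type checking. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Hypothetical object that is linked into a list via an embedded node. */
struct toy_pid {
	int count;
	int nr;
	struct toy_list_head chain;
};

/* Recover the enclosing object from a pointer to its embedded node. */
struct toy_pid *toy_pid_from_chain(struct toy_list_head *node)
{
	assert(node != NULL);
	return toy_container_of(node, struct toy_pid, chain);
}
```

A list that links such objects stores only the address of `chain`, which differs from the address of the object itself, so a scanner that looked solely for the start address would miss a perfectly valid reference.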


> The /sbin/init application calls sys_clone() a few times but only one
> leak is reported (see below). Looking at the reported pid object (at
> 0xc7c14500), count is 2 and nr is 296 but no process with pid 296
> exists any more.

It could still be a valid session or a process group id.
If you examine the struct pid you can test for this by examining all
of the list heads it keeps.  If there is something on any of the
lists that would account for a count of 1.  How we have a count of 2
I don't have enough information to guess.


I think it's only the pid_chain and rcu member that could be placed in
a list and kmemleak scans the memory for these two offsets as well.
I'll check those lists anyway but I doubt it's a more fundamental
problem with how kmemleak handles struct pid as I should've probably
got more reports.


In most any other layer we cache pids indefinitely and a situation
where we have a pointer to a struct pid with a ref count of 1 long
after the process goes away is expected.


Yes, indeed, but what kmemleak reports is that the pid structure
wasn't freed yet and there is no way to determine its pointer directly
or via container_of on members (by scanning the memory), hence it is
considered a leak.


I don't understand your situation enough to guess what is going wrong
yet.  Hopefully I have given you enough information to get started.


Yes, many thanks. I'll dig further and let you know.

--
Catalin


[PATCH] iomap: implement pcim_iounmap_regions()

2007-03-09 Thread Tejun Heo
Implement pcim_iounmap_regions() - the opposite of
pcim_iomap_regions().

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
This one is used by libata's new init model and generally useful for
driver midlayers.  Please push it through libata-dev#upstream.

Thanks.

 include/linux/pci.h |    1 +
 lib/devres.c        |   26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c4b684..8d3b221 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -844,6 +844,7 @@ void __iomem * pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);
 void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);
 void __iomem * const * pcim_iomap_table(struct pci_dev *pdev);
 int pcim_iomap_regions(struct pci_dev *pdev, u16 mask, const char *name);
+void pcim_iounmap_regions(struct pci_dev *pdev, u16 mask);
 
 extern int pci_pci_problems;
 #define PCIPCI_FAIL		1	/* No PCI PCI DMA */
diff --git a/lib/devres.c b/lib/devres.c
index eb38849..b1d336c 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -296,5 +296,31 @@ int pcim_iomap_regions(struct pci_dev *pdev, u16 mask, const char *name)
 	return rc;
 }
 EXPORT_SYMBOL(pcim_iomap_regions);
+
+/**
+ * pcim_iounmap_regions - Unmap and release PCI BARs
+ * @pdev: PCI device to map IO resources for
+ * @mask: Mask of BARs to unmap and release
+ *
+ * Unmap and release regions specified by @mask.
+ */
+void pcim_iounmap_regions(struct pci_dev *pdev, u16 mask)
+{
+	void __iomem * const *iomap;
+	int i;
+
+	iomap = pcim_iomap_table(pdev);
+	if (!iomap)
+		return;
+
+	for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+		if (!(mask & (1 << i)))
+			continue;
+
+		pcim_iounmap(pdev, iomap[i]);
+		pci_release_region(pdev, i);
+	}
+}
+EXPORT_SYMBOL(pcim_iounmap_regions);
 #endif
 #endif
-- 
1.5.0.1
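
The mask convention used by the patch (bit i selects BAR i) can be tried out in user space with a sketch like the one below. `bars_in_mask()` and `TOY_MAX_BARS` are hypothetical; in particular the limit of 6 is an assumption covering only the standard BARs, whereas the patch loops up to DEVICE_COUNT_RESOURCE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BARS 6	/* assumption: standard PCI BARs 0-5 only */

/* Store the indices of the bits set in mask into out[]; return how many. */
size_t bars_in_mask(uint16_t mask, int out[TOY_MAX_BARS])
{
	size_t n = 0;
	int i;

	assert(out != NULL);
	for (i = 0; i < TOY_MAX_BARS; i++) {
		if (!(mask & (1 << i)))
			continue;	/* BAR i not selected */
		out[n++] = i;
	}
	return n;
}
```

A mask of 0x5 selects BARs 0 and 2, so a caller that mapped exactly those two with pcim_iomap_regions() would pass the same mask to the unmap side.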



Re: more than 65535 outbound connections

2007-03-09 Thread Niklaus

On 3/9/07, David Lang <[EMAIL PROTECTED]> wrote:

On Fri, 9 Mar 2007, Florian Weimer wrote:

>> i read on the web that terry lambert has got 1.6 million simultaneous
>> connection ? how is the way it is done.
>
> Multiple IP addresses, I guess.

what must be unique is the four-parts of a connection
source IP, source port, destination IP, destination port

as long as the set is unique any element can be re-used (a big webserver has one
IP and port on the server side, but many IPs and ports on the client side)

when you make a connection you have the option of not specifying the source IP
and port (letting the OS/library pick ones for you). some libraries will not
re-use the same source port for multiple connections, others will (with the
appropriate options)

if you want to have your program assign the source IPs and port itself you can
do so (you may have to also give the library/os a flag that tells it you know
what you're doing, and it's ok to let you re-use ports)
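
A minimal sketch of the "assign the source port yourself" case, using the standard sockets API. `bind_source_port()` is a hypothetical helper written for this example; SO_REUSEADDR is assumed to be the kind of flag meant above, and whether two sockets may actually share a local port afterwards depends on the operating system and on the state of the sockets involved.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Bind a TCP socket to a caller-chosen source port before connect().
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int bind_source_port(int fd, uint16_t port)
{
	struct sockaddr_in local;
	int one = 1;

	assert(fd >= 0);

	/* Tell the stack that reusing a local address/port is intended. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
		return -1;

	memset(&local, 0, sizeof(local));
	local.sin_family = AF_INET;
	local.sin_addr.s_addr = htonl(INADDR_ANY);
	local.sin_port = htons(port);	/* 0 lets the OS pick a port */

	return bind(fd, (struct sockaddr *)&local, sizeof(local));
}
```

After a successful bind the program would call connect() as usual; the connection is then identified by the chosen source port together with the destination address and port.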



yes now lets take 2 dest machines , source ip is fixed , source port (2^16 - 1)
destip is fixed (a.a.a.a and b.b.b.b) ,dest port(2^16 -1) each ,

for a connection we have one port used , say connection 1 is

source ip,port 1 , a.a.a.a port 1
source ip,port 2 , a.a.a.a port 2
.
.
.
source ip,port 65535 , a.a.a.a port 65535

so total of 65535 connections (assume traffic is still going on, a
movie on a slow line dialup or 1kbps )

now if i try to open another connection (assume lots of file
descriptors are present) to a.a.a.a what happens

to b.b.b.b what happens

i think both will not get established as the OS doesn't have any free
source ports or am i wrong


David Lang




Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Pekka Enberg

On 3/9/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:

Cannot we use a flag in 'struct files_struct', set to one when the task is
mono-thread (at task creation in fact), and set to 0 when it creates a new
thread  (or when someone remotely access to this "struct files_struct"
in /proc/pid/fd/... )


How does that work? fget_light() has a built-in assumption that as
long as you don't share files_struct, it's okay not to take an extra
reference as current is only one doing close(2) and revoke(2) changes
that. So it's not really about being single-threaded or not.

On 3/9/07, Eric Dumazet <[EMAIL PROTECTED]> wrote:

Also, the thing is racy.


Aah, fget_light() indeed has a race window between fcheck_files() and
set_f_light().


Re: [patch 1/3] fs: add an iovec iterator

2007-03-09 Thread Christoph Hellwig
On Thu, Feb 08, 2007 at 06:03:50PM -0800, Nate Diller wrote:
> i had a patch integrating the iodesc idea, but after some thought, had
> decided to call it struct file_io.  That name reflects the fact that
> it's doing I/O in arbitrary lengths with byte offsets, and struct
> file_io *fio contrasts well with struct bio (block_io).  I also had
> used the field ->nbytes instead of ->count, to clarify the difference
> between segment iterators, segment offsets, and absolute bytecount.

struct file_io sounds rather ugly to me, I don't know why.  And it's
really user I/O so we could call it struct uio (historical pun intended) :)



Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-03-09 Thread Alexey Starikovskiy

Jean Delvare wrote:

On Fri, 9 Mar 2007 07:18:56 +, Pavel Machek wrote:
  

Port (and memory) addresses can be dynamically generated by the AML code
and thus, there is no way that the ACPI subsystem can statically predict
any addresses that will be accessed by the AML.
  

Can you take this as a wishlist item?

It would be nice if next version of acpi specs supported table

'AML / SMM BIOS will access these ports'

...so we can get it correct with acpi4 or something..?



I can only second Pavel's wish here. This would be highly convenient
for OS developers to at least know which resources are accessed by AML
and SMM. Without this information, we can never be sure that OS-level
code won't conflict with ACPI or SMM.

  
BIOS vendors are not required to support the latest and greatest ACPI spec.
So even if some future spec version includes this port description, we will
still have the majority of hardware not exporting it...


Regards,
   Alex.


Re: [patch 2/3] fs: introduce perform_write aop

2007-03-09 Thread Christoph Hellwig
Hi Nick,

sorry for my late reply, this has been on my to answer list for the last
month and I only managed to get back to it now.

On Thu, Feb 08, 2007 at 02:07:36PM +0100, Nick Piggin wrote:
> Add a new "perform_write" aop, which replaces prepare_write and commit_write
> as a single call to copy a given amount of userdata at the given offset. This
> is more flexible, because the implementation can determine how to best handle
> errors, or multi-page ranges (eg. it may use a gang lookup), and only requires
> one call into the fs.

I really like this idea, especially since it avoids calling into the allocator
for every block.  Have you asked the reiser4 folks whether this would
supersede their batch_write op completely?

> One problem with this interface is that it cannot be used to write into the
> filesystem by any means other than already-initialised buffers via iovecs. So
> prepare/commit have to stay around for non-user data... 

Actually I think that's a good thing to a certain extent.  It reminds
us that all other users are a horrible abuse of the interface.  I'd even
go so far as to make batch_write a callback that the filesystem passes
to generic_file_aio_write to make clear it's not a generic thing but
a helper.  (It's not a generic thing because it's the upper layer writing
into the pagecache, not a pagecache to fs below operation).

That still leaves open how to get rid of ->prepare_write and ->commit_write
completely, and for that we'll probably need ->kernel_read and ->kernel_write
file operations.  But that's a step you shouldn't consider yet when doing
this work.

> Another thing is that it seems to be less able to be implemented in generic,
> reusable code. It should be possible to introduce a new 2-op interface (or
> maybe just a new error handler op) which can be used correctly in generic 
> code.

We should be able to find a nice abstraction for this, see my next mails.

> + /*
> +  * perform_write replaces prepare and commit_write callbacks.
> +  */

This is a rather useless comment :)  Better to remove it and add proper
descriptions to Documentation/filesystems/vfs.txt and
Documentation/filesystems/Locking



Re: 2.6.21-rc3-mm1 RSDL results

2007-03-09 Thread Serge Belyshev
William Lee Irwin III <[EMAIL PROTECTED]> writes:

> On Fri, Mar 09, 2007 at 12:07:06PM +0300, Serge Belyshev wrote:
>> If you see sched_yield() when stracing any 3d program, I suggest you
>> to try this bruteforce workaround, which works fine for me,
>> disable sched_yield():
>
> May I suggest LD_PRELOAD of a library consisting of only a nopped
> sched_yield() function in userspace?
>

Sure. This is definitely a clearer way to do it. You just need to put
export LD_PRELOAD=/path/to/your/lib.so somewhere early enough.

cat > yield.c << EOF
int sched_yield(void)
{
	return 0;
}
EOF
gcc yield.c -o yield.so -shared -O2 -fPIC -g


[PATCH] devres: release resources on device_del()

2007-03-09 Thread Tejun Heo
Some platform devices are driven without driver attached, so managed
resources can be acquired without driver attached.  Make sure such
resources are released by calling devres_release_all() in
device_del().

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
This one fixes oops on pata_platform and pata_legacy unload.  libata
being the only user of devres at the moment.  I think this can go
through libata-dev#upstream.

 drivers/base/core.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index cf2a398..89ebe36 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -787,6 +787,13 @@ void device_del(struct device * dev)
 	device_remove_attrs(dev);
 	bus_remove_device(dev);
 
+	/*
+	 * Some platform devices are driven without driver attached
+	 * and managed resources may have been acquired.  Make sure
+	 * all resources are released.
+	 */
+	devres_release_all(dev);
+
 	/* Notify the platform of the removal, in case they
 	 * need to do anything...
 	 */
-- 
1.5.0.1
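
The idea of the fix can be modelled in user space as follows. This is a toy sketch and not the kernel's devres implementation: every `toy_` name is hypothetical, and the release order (most recently acquired first) is an assumption of the model.

```c
#include <assert.h>
#include <stdlib.h>

/* One managed resource: a release callback plus its argument. */
struct toy_devres {
	struct toy_devres *next;
	void (*release)(void *data);
	void *data;
};

struct toy_device {
	struct toy_devres *res;	/* most recently acquired first */
};

/* Example release callback: counts how often it was invoked. */
void toy_count_release(void *data)
{
	(*(int *)data)++;
}

/* Record a resource so that it is released automatically later. */
int toy_devres_add(struct toy_device *dev, void (*release)(void *), void *data)
{
	struct toy_devres *dr;

	assert(dev != NULL && release != NULL);
	dr = malloc(sizeof(*dr));
	if (!dr)
		return -1;
	dr->release = release;
	dr->data = data;
	dr->next = dev->res;
	dev->res = dr;
	return 0;
}

/* Release everything still recorded; returns the number released. */
int toy_devres_release_all(struct toy_device *dev)
{
	int n = 0;

	assert(dev != NULL);
	while (dev->res) {
		struct toy_devres *dr = dev->res;

		dev->res = dr->next;
		dr->release(dr->data);
		free(dr);
		n++;
	}
	return n;
}

/* Device removal releases resources even if no driver was ever bound. */
int toy_device_del(struct toy_device *dev)
{
	return toy_devres_release_all(dev);
}
```

Because the release step empties the list, calling it again from a later teardown path finds nothing left to do, which is what makes it safe to add the extra call on device removal in this model.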



Re: OOPS with 2.6.21rc2-git (ata: conflict with ide0/1)

2007-03-09 Thread Tejun Heo
Jeff Garzik wrote:
> Which patch?
> 
> Since this affects libata directly, and since devres came in via libata,
> I would rather that libata bugs not get /too/ blocked by patches in
> other trees.

This one.

  http://article.gmane.org/gmane.linux.kernel/495515

-- 
tejun


Re: Keyboard stops working after *lock [Was: 2.6.21-rc2-mm1]

2007-03-09 Thread Jiri Slaby

Andrew Morton wrote:
> On Sat, 03 Mar 2007 16:54:45 +0100 Jiri Slaby <[EMAIL PROTECTED]> wrote:
>
>> Jiri Slaby wrote:
>>> Andrew Morton wrote:
>>>> Temporarily at
>>>>
>>>>   http://userweb.kernel.org/~akpm/2.6.21-rc2-mm1/
>>> Weird behaviour of numlock and capslock on USB keyboard in X. After
>>
>> Hmm, it's not X related. Console behaves similarly.
>>
>>> pressing
>>
>> Or actually if some script tries to change LEDs (logout).
>>
>>> those keys, keyboard "hangs" -- no sysrq, no lock leds are flashing.
>>>
>>> After plug; unplug of the keyboard, it works unless I press the keys
>>> again.
>>>
>>> There is nothing in dmesg. X log says
>>> (II) evdev brain: Rescanning devices (3).
>>> (II) evdev brain: Rescanning devices (4).
>>> (II) evdev brain: Rescanning devices (5).
>>> (II) evdev brain: Rescanning devices (6).
>>> (II) evdev brain: Rescanning devices (7).
>>> (II) evdev brain: Rescanning devices (8).
>>> (II) evdev brain: Rescanning devices (9).
>>> (II) evdev brain: Rescanning devices (10).
>>> (II) evdev brain: Rescanning devices (11).
>>> (II) evdev brain: Rescanning devices (12).
>>> (II) evdev brain: Rescanning devices (13).
>>> (II) evdev brain: Rescanning devices (14).
>>> in this kernel, but I don't know if this is relevant.
>>>
>>> After booting back to .20-mm2 everything is OK.
>
> Thanks.  Cc's added.

Remains unsolved in 2.6.21-rc3-mm2.

regards,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E

Hnus <[EMAIL PROTECTED]> is an alias for /dev/null


Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-03-09 Thread Jean Delvare
On Fri, 9 Mar 2007 07:18:56 +, Pavel Machek wrote:
> > Port (and memory) addresses can be dynamically generated by the AML code
> > and thus, there is no way that the ACPI subsystem can statically predict
> > any addresses that will be accessed by the AML.
> 
> Can you take this as a wishlist item?
> 
> It would be nice if next version of acpi specs supported table
> 
> 'AML / SMM BIOS will access these ports'
> 
> ...so we can get it correct with acpi4 or something..?

I can only second Pavel's wish here. It would be highly convenient
for OS developers to at least know which resources are accessed by AML
and SMM. Without this information, we can never be sure that OS-level
code won't conflict with ACPI or SMM.

-- 
Jean Delvare


Re: [PATCH] chaostables

2007-03-09 Thread Jan Engelhardt
Hello,

On Mar 9 2007 09:35, Amin Azez wrote:
>* Jan Engelhardt wrote, On 08/03/07 20:26:
>> xt_portscan needs to keep track of what packets the machine has already 
>> seen. So on the first SYN, the connection is marked with "1". (Then we 
>> send our SYN-ACK... and the connection turns ESTABLISHED.) The next 
>> packet that is received will be an ACK or an RST. But it must come 
>> _exactly after_ the SYN, so just using --tcp-flags ACK will not work. A 
>> state which can be remembered is required. For that, an automaton is 
>> used, whose state is saved in the connection mark.
>
>There would be more point in having this as a new match if it didn't
>trample on the connection mark, but used its own slot or flag-bit.

Adding a member to the ip_conntrack/nf_conntrack and sk_buff struct would
increase the struct sizes, and that would penalize users who do not intend
to use xt_portscan.

I do not see why the packet/connection marks should not be used to record
additional information. After all, that is what users use marks for.
xt_portscan can be configured precisely in how it sets its marks (a mask is
supported), so that it only takes away 4 bits of the 32-bit connection
mark and 1 bit of the 32-bit packet mark.
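
As an illustration of the masking described above, here is a minimal sketch
in plain C of how a small automaton state can share a 32-bit mark with other
users. The mask value, the shift and the helper names are assumptions made up
for this example; they are not taken from xt_portscan.

```c
#include <stdint.h>

/* Hypothetical layout: the automaton state lives in the low 4 bits. */
#define STATE_MASK  0x0000000fu
#define STATE_SHIFT 0

/* Read the automaton state without looking at the other mark bits. */
static inline uint32_t mark_get_state(uint32_t mark)
{
	return (mark & STATE_MASK) >> STATE_SHIFT;
}

/* Store a new state while leaving the bits outside the mask untouched. */
static inline uint32_t mark_set_state(uint32_t mark, uint32_t state)
{
	return (mark & ~STATE_MASK) | ((state << STATE_SHIFT) & STATE_MASK);
}
```

With a different mask and shift, the same two helpers would place the state
in any other 4-bit slot of the mark.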

I have almost never needed connection marking myself, except for this
port-scanning automaton and perhaps a little MARK here and there for
finely-tuned SNAT. Again, things might look different on your side(s).
QoS, while primarily using CLASSIFY, is one point where MARK can be used.
I would assume that those who use xt_portscan should be fine with the
remaining 24 bits.


Thank you,

Jan
-- 


[PATCH, take2] VFS : Delay the dentry name generation on sockets and pipes.

2007-03-09 Thread Eric Dumazet
Hi Andrew

Please find a new version of this patch: I realized that d_path() has very
unusual semantics (it seems nobody caught the point in previous patches), and
I had to change the documentation and pipefs_dname() / sockfs_dname()
accordingly.

Now, readlink("/proc/pid/fd/xx", buffer, 4096) returns the exact number of 
bytes, not the whole 4095 bytes :)

Extract of new Documentation:

CAUTION : d_path() logic is quite tricky.
The correct way to return, for example, "Hello" is to put it
at the end of the buffer, and return a pointer to the first char.

   Example :

static char *somefs_dname(struct dentry *dent, char *buffer, int buflen)
{
	char *string = "Hello";
	int sz = strlen(string) + 1;
	if (sz > buflen)
		return ERR_PTR(-ENAMETOOLONG);
	buffer += (buflen - sz);
	return memcpy(buffer, string, sz);
}
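
For a name that is only known at run time, the same tail-of-buffer convention
might look like the following userspace sketch. The helper name tail_name(),
the "prefix:[ino]" format and the NULL return (standing in for
ERR_PTR(-ENAMETOOLONG)) are assumptions for illustration, not code from the
patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only: build "prefix:[ino]" and place it at the
 * END of the caller's buffer, returning a pointer to its first character.
 * Returns NULL where kernel code would return ERR_PTR(-ENAMETOOLONG).
 */
static char *tail_name(char *buffer, int buflen,
		       const char *prefix, unsigned long ino)
{
	char tmp[64];
	int sz;

	sz = snprintf(tmp, sizeof(tmp), "%s:[%lu]", prefix, ino);
	if (sz < 0 || sz >= (int)sizeof(tmp))
		return NULL;		/* did not fit in the scratch buffer */
	sz += 1;			/* account for the trailing NUL */
	if (sz > buflen)
		return NULL;		/* caller's buffer is too small */

	buffer += buflen - sz;		/* skip the unused front part */
	return memcpy(buffer, tmp, sz);
}
```

The caller must use the returned pointer, not the start of the buffer it
passed in; the front of the buffer is left untouched.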



Thank you

[PATCH] VFS : Delay the dentry name generation on sockets and pipes.

1) Introduces a new method in 'struct dentry_operations'. This method,
d_dname(), may be called from d_path() to build a pathname
for special filesystems. It is called without locks.

Future patches (if we succeed in having one common dentry for all
pipes/sockets) may need to change the prototype of this method, but for now we use:
char *d_dname(struct dentry *dentry, char *buffer, int buflen);


2) Use this new method for sockets: no more sprintf() at socket creation.
This is delayed until someone accesses /proc/pid/fd/...

3) Use this new method for pipes: no more sprintf() at pipe creation. This is
delayed until someone accesses /proc/pid/fd/...

A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 GHz :

3.090 s instead of 3.450 s

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
Acked-by: Christoph Hellwig <[EMAIL PROTECTED]>
Acked-by: Linus Torvalds <[EMAIL PROTECTED]>

 Documentation/filesystems/Locking |2 ++
 Documentation/filesystems/vfs.txt |   26 +-
 fs/dcache.c   |   10 ++
 fs/pipe.c |   23 +--
 include/linux/dcache.h|1 +
 net/socket.c  |   25 ++---
 6 files changed, 73 insertions(+), 14 deletions(-)

--- linux-2.6.21-rc3/include/linux/dcache.h 2007-03-07 17:23:55.0 +0100
+++ linux-2.6.21-rc3-ed/include/linux/dcache.h  2007-03-08 11:57:41.0 +0100
@@ -133,6 +133,7 @@ struct dentry_operations {
 	int (*d_delete)(struct dentry *);
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
+	char *(*d_dname)(struct dentry *, char *, int);
 };
 
 /* the dentry parameter passed to d_hash and d_compare is the parent
--- linux-2.6.21-rc3/Documentation/filesystems/vfs.txt  2007-03-08 10:14:38.0 +0100
+++ linux-2.6.21-rc3-ed/Documentation/filesystems/vfs.txt   2007-03-09 10:25:44.0 +0100
@@ -827,7 +827,7 @@ This describes how a filesystem can over
 operations. Dentries and the dcache are the domain of the VFS and the
 individual filesystem implementations. Device drivers have no business
 here. These methods may be set to NULL, as they are either optional or
-the VFS uses a default. As of kernel 2.6.13, the following members are
+the VFS uses a default. As of kernel 2.6.22, the following members are
 defined:
 
 struct dentry_operations {
@@ -837,6 +837,7 @@ struct dentry_operations {
 	int (*d_delete)(struct dentry *);
 	void (*d_release)(struct dentry *);
 	void (*d_iput)(struct dentry *, struct inode *);
+	char *(*d_dname)(struct dentry *, char *, int);
 };
 
   d_revalidate: called when the VFS needs to revalidate a dentry. This
@@ -859,6 +860,29 @@ struct dentry_operations {
VFS calls iput(). If you define this method, you must call
iput() yourself
 
+  d_dname: called when the pathname of a dentry should be generated.
+	Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay
+	pathname generation. (Instead of doing it when the dentry is created,
+	it's done only when the path is needed.) Real filesystems probably
+	don't want to use it, because their dentries are present in the global
+	dcache hash, so their hash should be an invariant. As no lock is
+	held, d_dname() should not try to modify the dentry itself, unless
+	appropriate SMP safety is used. CAUTION : d_path() logic is quite
+	tricky. The correct way to return, for example, "Hello" is to put it
+	at the end of the buffer, and return a pointer to the first char.
+
+	Example :
+
+static char *somefs_dname(struct dentry *dent, char *buffer, int buflen)
+{
+	char *string = "Hello";
+	int sz = strlen(string) + 1;
+	if (sz > buflen)
+		return ERR_PTR(-ENAMETOOLONG);
+	buffer += (buflen - sz);
+   

[PATCH] i2c-core: i2c bitbang gpio structure

2007-03-09 Thread Wu, Bryan
Hi folks,

A new structure is added to i2c-core for GPIO-based I2C interface
adapters. My latest GPIO-based I2C adapter driver for Blackfin systems
will use it, and the IXP4XX GPIO-based I2C driver can also be
moved over to it.

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
---
 include/linux/i2c.h |   20 
 1 file changed, 20 insertions(+)

Index: include/linux/i2c.h
===
--- include/linux/i2c.h (revision 2813)
+++ include/linux/i2c.h (working copy)
@@ -201,6 +201,26 @@ struct i2c_algorithm {
 };
 
 /*
+ * Some chips do not have an I2C unit, so GPIO lines are used to bit-bang
+ * the bus. Used as platform_data to provide GPIO pin information to this
+ * kind of GPIO-based I2C driver.
+ */
+struct i2c_bitbang_gpio {
+	int sda;
+	int scl;
+};
+
+static inline int i2c_bitbang_gpio_sda(struct i2c_bitbang_gpio *gpio)
+{
+	return (gpio->sda);
+}
+
+static inline int i2c_bitbang_gpio_scl(struct i2c_bitbang_gpio *gpio)
+{
+	return (gpio->scl);
+}
+}
+
+/*
  * i2c_adapter is the structure used to identify a physical i2c bus along
  * with the access algorithms necessary to access it.
  */
_

Thanks,
-Bryan Wu





Re: [PATCH 2/7] revoke: add f_light flag for struct file

2007-03-09 Thread Eric Dumazet
On Friday 09 March 2007 09:14, Pekka J Enberg wrote:
> From: Pekka Enberg <[EMAIL PROTECTED]>
>
> This adds a f_light flag to struct file to indicate that the file was
> looked up with fget_light().  Needed by revoke to ensure we don't
> close a file pointer while someone is using it without actually
> holding a reference.
>
> These bits were taken from the forced unmount patches by Tigran
> Aivazian.

Well, I disagree very much with this patch.

One of the benefits of fget_light() is that it does not dirty the file
structure (it avoids atomic changes to f_count).

You add a 'flag' (4 bytes !) at the end of the file structure (so in a
different cache line than the parts that are usually accessed in a fd-related
syscall) and dirty this part at syscall entry and exit. That's really a heavy
price for supporting an unlikely revoke() syscall.

Also, the thing is racy.

( BTW, the whole revoke() concept is evil, especially if we want to avoid 
using inodes/dentries for some kind of pseudo files like sockets / pipes)

Can't we use a flag in 'struct files_struct', set to 1 while the task is
single-threaded (at task creation, in fact), and set to 0 when it creates a new
thread (or when someone remotely accesses this "struct files_struct"
through /proc/pid/fd/... )?

There is no need to set this flag back to 1 when the task reverts to being
single-threaded, since this case is probably unlikely. This way we can be non-racy.
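
The one-way flag suggested above could be modeled roughly as follows. This is
a single-threaded userspace sketch only: the structure and function names are
hypothetical stand-ins, and real kernel code would additionally need proper
memory ordering around the flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of 'struct files_struct'. */
struct files_model {
	bool single_user;	/* true only while one task can see the table */
};

/* At task creation the table is private, so the flag starts set. */
static void files_model_init(struct files_model *f)
{
	f->single_user = true;
}

/*
 * Called when a second thread starts sharing the table, or when another
 * task reaches it through /proc/pid/fd. The transition is one-way: the
 * flag is never set again, even if the task becomes single-threaded later.
 */
static void files_model_share(struct files_model *f)
{
	f->single_user = false;
}

/* A light lookup (no reference count change) is only allowed while set. */
static bool files_model_light_ok(const struct files_model *f)
{
	return f->single_user;
}
```

Because the flag only ever goes from set to clear, a reader can never see it
flip back to "private" while another user still holds the table.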



Re: [PATCH 4/4] optimize and simplify get_cycles_sync()

2007-03-09 Thread Joerg Roedel
On Tue, Mar 06, 2007 at 04:25:41PM -0800, Andrew Morton wrote:
> On Wed, 28 Feb 2007 15:25:54 +0100
> "Joerg Roedel" <[EMAIL PROTECTED]> wrote:
> 
> > From: Joerg Roedel <[EMAIL PROTECTED]>
> > 
> > This patch simplifies the get_cycles_sync() function by removing the
> > #ifdefs from it. Further it introduces an optimization for AMD
> > processors. There the RDTSCP instruction is used instead of CPUID;RDTSC
> > which is helpfull if the kernel runs as a KVM guest. Running as a guest
> > makes CPUID very expensive because it causes an intercept of the guest.
> 
> Problem:
> 
> http://test.kernel.org/functional/index.html
> 
> lots of builds fell over because their binutils versions don't understand
> rdtscp.
> 
> I don't know how recently rdtscp support was added to binutils, but
> this is likely to be a problem.  Perhaps we can change this patch to
> embed the hex code for that opcode instead?

Ok, I will submit an updated version soon. Thanks for pointing this out.

Joerg

-- 
Joerg Roedel
Operating System Research Center
AMD Saxony LLC & Co. KG




Re: 2.6.21-rc3: /proc broken

2007-03-09 Thread Con Kolivas
On Friday 09 March 2007 19:53, Russell King wrote:
> On Fri, Mar 09, 2007 at 08:56:44AM +1100, Con Kolivas wrote:
> > I did make oldconfig from http://userweb.kernel.org/~akpm/ck/config.txt
> > and chose all the defaults. Then building your fat config with -rc3, 'ps'
> > hangs on qemu for almost 30 seconds and then at last produces a broken
> > output
>
> Let me guess - you have either a serial console or something like that
> and you're running these commands over said serial console?
>
> Or you have console directed to both a serial port and the VT and you're
> capturing this off the VT using gpm.
>
> Either way, "serial8250: too much work for irq4" is a printk which will
> be displayed by the kernel when it's unable to clear down work for the
> serial port within 256 loops or so of the interrupt handler; it's a
> protection against the box locking up.
>
> It not actually contained in any of the files.

Thank you very much for taking the time to explain it to me, and I apologise
for the false positive. It is indeed caused by running qemu with everything
directed to the serial console, just as you say. Letting qemu use its
graphical output makes the error go away. Unfortunately that also makes akpm's
oops go away, so I can't really reproduce it now. Perhaps the bug occurs due to
interrupts being disabled for an extended time; it gives me something to look
at now.
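
For readers unfamiliar with the protection Russell describes, a bounded
service loop can be sketched roughly as follows. This is a userspace model,
not the 8250 driver: the structure, the function names and the exact limit of
256 are assumptions for illustration.

```c
#include <stdbool.h>

#define PASS_LIMIT 256	/* assumed bound, following "256 loops or so" */

/* Hypothetical device model: 'pending' counts work items still queued. */
struct fake_port {
	unsigned int pending;
};

/* Handle one unit of work; returns true if more work remains. */
static bool fake_port_service(struct fake_port *p)
{
	if (p->pending)
		p->pending--;
	return p->pending != 0;
}

/*
 * Service the port, but give up after PASS_LIMIT passes so a device that
 * never goes quiet cannot lock up the machine. Returns true when the limit
 * was hit, which is the point where the kernel would print its warning.
 */
static bool fake_irq_handler(struct fake_port *p)
{
	int pass = 0;

	while (fake_port_service(p)) {
		if (++pass >= PASS_LIMIT)
			return true;
	}
	return false;
}
```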

Thanks!

-- 
-ck


Re: any thoughts yet on a "generic" ioctl.h?

2007-03-09 Thread Robert P. J. Day
On Fri, 9 Mar 2007, Stefan Richter wrote:

> Robert P. J. Day wrote:
> >   i asked about this a while back, but i still haven't heard a
> > definitive response as to whether it's acceptable.
>
> Maybe you get response if you post a complete patch.

that *was* the complete patch -- its purpose was simply to make
asm-generic/ioctl.h general enough to allow arch-specific ioctl.h
files to *subsequently* be simplified.  there was no need to do
*everything* in one step -- each simplification could be submitted as
a separate arch-specific patch, as many things are.

i was more asking about the *philosophy* of that patch, and whether
there were any obvious objections.

rday

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page



Re: 2.6.21-rc3-mm1 RSDL results

2007-03-09 Thread William Lee Irwin III
On Fri, Mar 09, 2007 at 12:07:06PM +0300, Serge Belyshev wrote:
> If you see sched_yield() when stracing any 3d program, I suggest you
> to try this bruteforce workaround, which works fine for me,
> disable sched_yield():

May I suggest LD_PRELOAD of a library consisting of only a nopped
sched_yield() function in userspace?


-- wli


Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-09 Thread Jean Delvare
On Fri, 9 Mar 2007 12:04:59 +0800, Sonic Zhang wrote:
> On 3/8/07, Jean Delvare <[EMAIL PROTECTED]> wrote:
> > i2c-core can emulate SMBus transactions using master_xfer, so in
> > general when you have a complete master_xfer implementation you do not
> > need to define a separate smbus_xfer function. This would save a lot of
> > code.
> 
> Actually the i2c-core can't emulate SMBus transactions using the
> master_xfer function, because the blackfin TWI controller provide
> hardware support to the SMBus transactions and the combination of
> master_xfer operations can't generate proper signal for SMBus.

Did you try? I can't think of any valid reason why it wouldn't work.
All SMBus transactions are compatible with I2C by definition.

Now, performance might be better with dedicated code if the hardware
accelerates SMBus transactions, but that's a different issue. If you prefer
to keep the smbus_xfer method for performance reasons, that's OK with
me. After all you're the one who will have to maintain the driver ;)
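
To illustrate why an SMBus transaction can be expressed as plain I2C
transfers, here is a small sketch that describes an SMBus "read byte data" as
two I2C messages. The struct and flag below are local stand-ins modeled on the
kernel's message descriptor, and the function name is hypothetical.

```c
#include <stdint.h>

#define MSG_READ 0x0001	/* local flag for this sketch, modeled on I2C_M_RD */

/* Minimal local model of an I2C message; not the kernel's struct i2c_msg. */
struct msg_model {
	uint16_t addr;	/* 7-bit slave address */
	uint16_t flags;	/* 0 for write, MSG_READ for read */
	uint16_t len;	/* number of bytes in buf */
	uint8_t *buf;
};

/*
 * Describe an SMBus "read byte data" transaction as two I2C messages:
 * write the command byte, then (after a repeated start) read one byte.
 * Returns the number of messages filled in.
 */
static int smbus_read_byte_data_msgs(uint16_t addr, uint8_t *command,
				     uint8_t *result,
				     struct msg_model msgs[2])
{
	msgs[0].addr  = addr;
	msgs[0].flags = 0;
	msgs[0].len   = 1;
	msgs[0].buf   = command;

	msgs[1].addr  = addr;
	msgs[1].flags = MSG_READ;
	msgs[1].len   = 1;
	msgs[1].buf   = result;

	return 2;
}
```

A master_xfer implementation that can send these two messages back to back,
with a repeated start between them, has everything this transaction needs.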

-- 
Jean Delvare


Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-09 Thread Andre Noll
On 12:05, Mingming Cao wrote:
> > > BTW: Are ext3 filesystem sizes greater than 8T now officially
> > > supported?
> > 
> > I think so, but I don't know how much 16TB testing developers and
> > distros are doing - perhaps the linux-ext4 denizens can tell us?
> > -
> 
> IBM has done some testing (dbench, fsstress, fsx, tiobench, iozone etc)
> on 10TB ext3, I think RedHat and BULL have done similar test on >8TB
> ext3 too.

Thanks. I'm asking because some days ago I tried to create a 10T ext3
filesystem on a linear software raid over two hardware raids, and it
failed horribly. mke2fs from e2fsprogs-1.39 refused to create such a
large filesystem but did it with -F, and I could mount it afterwards.
But writing data immediately produced zillions of errors and only
power-cycling the box helped.

We're now using a 7.9T filesystem on the same hardware. That seems
to work fine on 2.6.21-rc2, so I think this is an ext3 problem. I
cannot completely rule out other reasons though as the underlying
qla2xxx driver also had some problems on earlier kernels.

We'd much rather have a 10T filesystem if possible. So if you have
time to look into the issue I would be willing to recreate the 10T
filesystem and send details.

Regards
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe




Re: [PATCH 1/2] eCryptfs: convert lookup_one_len() to lookup_one_len_nd()

2007-03-09 Thread Christoph Hellwig
On Sat, Feb 17, 2007 at 03:56:55AM -0500, Josef 'Jeff' Sipek wrote:
> From: Michael Halcrow <[EMAIL PROTECTED]>
> 
> Call the new lookup_one_len_nd() rather than lookup_one_len().  This fixes an
> oops when stacked on NFS.
> 
> Note that there are still some issues with eCryptfs on NFS having to do with
> directory deletion (I'm not getting an oops, just an -EBUSY).

Big NACK here.  This is just working around the broken lookup intents
code.  lookup_one_len still is a hack for some network filesystems that
unfortunately grew a few too many users.

There is a valid case for in-kernel lookups from an arbitrary point,
but lookup_one_len* is the wrong API for this.  The right API for that
is a variant of path_lookup that takes a vfsmount + dentry pair.
Implementing this might be a good idea anyway to clean up the mess
that do_path_lookup currently is.

> 
> Signed-off-by: Michael Halcrow <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Josef 'Jeff' Sipek <[EMAIL PROTECTED]>
> ---
>  fs/ecryptfs/inode.c |   10 --
>  1 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 11f5e50..4c3d786 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -283,7 +283,9 @@ static struct dentry *ecryptfs_lookup(struct inode *dir, struct dentry *dentry,
>   int rc = 0;
>   struct dentry *lower_dir_dentry;
>   struct dentry *lower_dentry;
> + struct dentry *dentry_save;
>   struct vfsmount *lower_mnt;
> + struct vfsmount *mnt_save;
>   char *encoded_name;
>   unsigned int encoded_namelen;
>   struct ecryptfs_crypt_stat *crypt_stat = NULL;
> @@ -310,9 +312,13 @@ static struct dentry *ecryptfs_lookup(struct inode *dir, struct dentry *dentry,
>   }
>   ecryptfs_printk(KERN_DEBUG, "encoded_name = [%s]; encoded_namelen "
>   "= [%d]\n", encoded_name, encoded_namelen);
> - lower_dentry = lookup_one_len(encoded_name, lower_dir_dentry,
> -   encoded_namelen - 1);
> + dentry_save = nd->dentry;
> + mnt_save = nd->mnt;
> + lower_dentry = lookup_one_len_nd(encoded_name, lower_dir_dentry,
> +  (encoded_namelen - 1), nd);
>   kfree(encoded_name);
> + nd->mnt = mnt_save;
> + nd->dentry = dentry_save;
>   if (IS_ERR(lower_dentry)) {
>   ecryptfs_printk(KERN_ERR, "ERR from lower_dentry\n");
>   rc = PTR_ERR(lower_dentry);
> -- 
> 1.5.0.19.gddff
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---


Re: [rfc][patch] futex: restartable futex_wait?

2007-03-09 Thread Thomas Gleixner
On Fri, 2007-03-09 at 06:10 +0100, Nick Piggin wrote:
> > i think that's quite right. I'm wondering why this never came up before? 
> > But your fix is not complete i think:
> > 
> > > + restart->arg2 = time;
> > > + return -ERESTART_RESTARTBLOCK;
> > > + }
> > 
> > 'time' here is relative, so the restarted syscall will do a /full/ wait 
> > again.
> 
> But it has been modified by schedule_timeout?

But this does not change the syscall registers, so it is restarted in
the same way. We need a new futex OP for this, which takes absolute time
like the PI futex op does.
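
The difference between restarting with a relative and an absolute timeout can
be sketched in a few lines of C. The helper names are hypothetical and this is
not the futex code; it only shows the arithmetic.

```c
#include <stdint.h>

/*
 * Convert a relative timeout into an absolute deadline once, at the first
 * entry. 'now_ns' is whatever monotonic clock reading the caller has.
 */
static uint64_t deadline_from_relative(uint64_t now_ns, uint64_t rel_ns)
{
	return now_ns + rel_ns;
}

/*
 * On every (re)start, derive the time still left from the saved deadline.
 * A restart therefore waits only for the remainder, not the full period.
 */
static uint64_t remaining_until(uint64_t deadline_ns, uint64_t now_ns)
{
	return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}
```

If only the relative value were kept, every restart would begin a full wait
again, which is the problem pointed out in the quoted text.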

tglx




Re: [PATCH 1/2] rcfs core patch

2007-03-09 Thread Paul Jackson
Kirill, responding to Herbert:
> > do we need or even want that? IMHO the hierarchical
> > concept CKRM was designed with, was also the reason
> > for it being slow, unuseable and complicated
> 1. cpusets are hierarchical already. So hierarchy is required.

I think that CKRM has a harder time doing a hierarchy than cpusets.

CKRM is trying to account for and control how much of an amorphous
resource is used, whereas cpusets is trying to control whether a
specifically identifiable resource is used, or not used, not how
much of it is used.

A child cpuset gets configured to allow certain CPUs and Nodes, and
then does not need to dynamically pass back any information about
what is actually used - it's a one-way control with no feedback.
That's a relatively easier problem.
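
A rough sketch of the one-way control described above, in plain C: a child's
configuration is validated against its parent once, and afterwards the only
question asked is whether a given CPU is allowed. The structure, the 64-CPU
limit and the function names are assumptions for illustration, not the cpuset
implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
struct cpuset_model {
	uint64_t cpus_allowed;
};

/* A child may only be given CPUs that its parent is allowed to use. */
static bool child_config_ok(const struct cpuset_model *parent,
			    uint64_t requested)
{
	return (requested & ~parent->cpus_allowed) == 0;
}

/* Pure membership test: is this CPU usable or not? No usage accounting. */
static bool cpu_allowed(const struct cpuset_model *cs, unsigned int cpu)
{
	return cpu < 64 && ((cs->cpus_allowed >> cpu) & 1u);
}
```

Nothing here flows back from child to parent, which is what makes the
hierarchy cheap compared with tracking how much of a resource each group uses.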

CKRM (as I recall it, from long ago ...) has to track the amount
of usage dynamically, across parent and child groups (whatever they
were called.)  That's a harder problem.

So, yes, as Kirill observes, we need the hierarchy because cpusets
has it, cpuset users make good use of the hierarchy, and the hierarchy
works fine in that case, even if a hierarchy is more difficult for CKRM.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [PATCH] chaostables

2007-03-09 Thread Amin Azez
* Jan Engelhardt wrote, On 08/03/07 20:26:
> xt_portscan needs to keep track of what packets the machine has already 
> seen. So on the first SYN, the connection is marked with "1". (Then we 
> send our SYN-ACK... and the connection turns ESTABLISHED.) The next 
> packet that is received will be an ACK or an RST. But it must come 
> _exactly after_ the SYN, so just using --tcp-flags ACK will not work. A 
> state which can be remembered is required. For that, an automaton is 
> used, whose state is saved in the connection mark.

There would be more point in having this as a new match if it didn't
trample on the connection mark, but used its own slot or flag-bit.

Sam



Re: Linux 2.6.19.2: maybe a bug inside the r8169 network driver (was Re: Linux 2.6.19.2: Freeze with CIFS mount)

2007-03-09 Thread Eric Lacombe
Just to alert potential readers: the bug is now being discussed here:
http://bugzilla.kernel.org/show_bug.cgi?id=8143

Eric Lacombe


[PATCH] revoke: fix VM_REVOKED mask

2007-03-09 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

Fix VM_REVOKED mask which overlaps with VM_ALWAYSDUMP.

Cc: Peter Zijlstra <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 include/linux/mm.h |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: uml-2.6/include/linux/mm.h
===
--- uml-2.6.orig/include/linux/mm.h 2007-03-09 11:14:06.0 +0200
+++ uml-2.6/include/linux/mm.h  2007-03-09 11:14:19.0 +0200
@@ -169,8 +169,7 @@ #define VM_NONLINEAR0x0080  /* Is no
 #define VM_MAPPED_COPY 0x0100  /* T if mapped copy of data (nommu mmap) */
 #define VM_INSERTPAGE  0x0200  /* The vma has had "vm_insert_page()" done on it */
 #define VM_ALWAYSDUMP  0x0400  /* Always include in core dumps */
-
-#define VM_REVOKED 0x0400  /* Mapping has been revoked */
+#define VM_REVOKED 0x0800  /* Mapping has been revoked */
 
 #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS


Re: [PATCH 1/2] rcfs core patch

2007-03-09 Thread Kirill Korotaev
>>There have been various projects attempting to provide resource
>>management support in Linux, including CKRM/Resource Groups and UBC.
> 
> 
> let me note here, once again, that you forgot Linux-VServer
> which does quite non-intrusive resource management ...
Herbert, would you care to send patches, rather than asking others to do
something that works for you?

It looks like your main argument is "non-intrusive"...
Are "working", "secure" and "flexible" no longer required by people? :/

>> Each had its own task-grouping mechanism. 
> 
> 
> the basic 'context' (pid space) is the grouping mechanism
> we use for resource management too
> 
> 
>>Paul Menage observed [1] that cpusets in the kernel already has a
>>grouping mechanism which was working well for cpusets. He went ahead
>>and generalized the grouping code in cpusets so that it could be used
>>for overall resource management purpose. 
> 
> 
>>With his patches, it is possible to even create multiple hierarchies
>>of groups (see [2] on why multiple hierarchies) as follows:
> 
> 
> do we need or even want that? IMHO the hierarchical
> concept CKRM was designed with, was also the reason
> for it being slow, unuseable and complicated
1. cpusets are hierarchical already. So hierarchy is required.
2. As was discussed on the call, controllers which are flat
   can simply prohibit creation of a hierarchy on the filesystem,
   i.e. allow only a depth of 1 and continue being fast.

>>mount -t container -o cpuset none /dev/cpuset <- cpuset hierarchy
>>mount -t container -o mem,cpu none /dev/mem   <- memory/cpu hierarchy
>>mount -t container -o disk none /dev/disk <- disk hierarchy
>>
>>In each hierarchy, you can create task groups and manipulate the
>>resource parameters of each group. You can also move tasks between
>>groups at run-time (see [3] on why this is required). 
> 
> 
>>Each hierarchy is also manipulated independent of the other.  
> 
> 
>>Paul's patches also introduced a 'struct container' in the kernel,
>>which serves these key purposes:
>>
>>- Task-grouping
>>  'struct container' represents a task-group created in each hierarchy.
>>  So every directory created under /dev/cpuset or /dev/mem above will
>>  have a corresponding 'struct container' inside the kernel. All tasks
>>  pointing to the same 'struct container' are considered to be part of
>>  a group
>>
>>  The 'struct container' in turn has pointers to resource objects which
>>  store actual resource parameters for that group. In above example,
>>  'struct container' created under /dev/cpuset will have a pointer to
>>  'struct cpuset' while 'struct container' created under /dev/disk will
>>  have pointer to 'struct disk_quota_or_whatever'.
>>
>>- Maintain hierarchical information
>>  The 'struct container' also keeps track of hierarchical relationship
>>  between groups.
>>
>>The filesystem interface in the patches essentially serves these
>>purposes:
>>
>>  - Provide an interface to manipulate task-groups. This includes
>>creating/deleting groups, listing tasks present in a group and 
>>moving tasks across groups
>>
>>  - Provides an interface to manipulate the resource objects
>>(limits etc) pointed to by 'struct container'.
>>
>>As you know, the introduction of 'struct container' was objected
>>to and was felt redundant as a means to group tasks. That's where I
>>took a shot at converting over Paul Menage's patch to avoid the 'struct
>>container' abstraction and instead work with 'struct nsproxy'.
> 
> 
> which IMHO isn't a step in the right direction, as
> you will need to handle different nsproxies within
> the same 'resource container' (see previous email)
I tend to agree.
It looks like Paul's original patch was on the right track.
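
For reference, the grouping described in the quoted text can be pictured
roughly as below. This is only a user-space sketch of a single hierarchy:
container_sketch, task_sketch, MAX_SUBSYS and both helpers are made-up
names, and the real 'struct container' in Paul's patches may look quite
different.

```c
#include <stddef.h>

#define MAX_SUBSYS 4   /* assumed upper bound, for illustration only */

/* Hypothetical model of one group in one hierarchy. */
struct container_sketch {
    struct container_sketch *parent;  /* hierarchical relationship */
    void *subsys_state[MAX_SUBSYS];   /* e.g. a cpuset or a disk quota object */
};

/* Hypothetical model of a task: it only records which group it is in. */
struct task_sketch {
    struct container_sketch *group;
};

/* Tasks pointing to the same container are considered one group. */
static int same_group(const struct task_sketch *a, const struct task_sketch *b)
{
    return a->group == b->group;
}

/* Moving a task between groups at run-time is just a pointer update here. */
static void move_task(struct task_sketch *t, struct container_sketch *dst)
{
    t->group = dst;
}
```

With multiple hierarchies a task would need one such pointer per mounted
hierarchy, which the sketch leaves out.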

[...]
>>A separate filesystem would give us more flexibility like the
>>implementing multi-hierarchy support described above.
> 
> 
> why is the filesystem approach so favored for this
> kind of manipulations?
> 
> IMHO it is one of the worst interfaces I can imagine
> (to move tasks between spaces and/or assign resources)
> but yes, I'm aware that filesystems are 'in' nowadays
I also hate the filesystem approach being used everywhere nowadays.
But it looks like there are still reasons for it:
1. cpusets already use an fs interface.
2. each controller can easily export a bit of its own specific
   information/controls.

Can you suggest any other extensible/flexible interface for these?

Thanks,
Kirill



Re: 2.6.21-rc3-mm1 RSDL results

2007-03-09 Thread Serge Belyshev
Con Kolivas <[EMAIL PROTECTED]> writes:

> On Friday 09 March 2007 18:53, Matt Mackall wrote:
...
>>
>> With a single non-parallel make running (all in cache, mind you), the
>> system kicks up into just about 100% CPU usage at full speed. Desktop
>> spinning becomes between 10x to 100x slower (from ~30fps to < 1fps).
>> Galeon scrolling pauses for as much as a second. Mouse movement pauses
>> for as much as a second. Typing in terminals lags noticeably.
>>
>> This is not the expected behavior of a fair, low-latency scheduler.
>
> No indeed it does not sound right at all to me either. Last time I
> encountered something like this we traced it and hit sched_yield calls
> somewhere in the graphic pipeline. So first question is, how does mainline
> perform with the same testcase, and second question is umm whatever it is
> that is slow is there a way to trace it to see if it yields?

Matt, some 3d drivers are known to call sched_yield() behind the user's back,

(notably dri radeon ones, grep for sched_yield:
http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r200/r200_ioctl.c?revision=1.37=markup
http://webcvs.freedesktop.org/mesa/Mesa/src/mesa/drivers/dri/r300/radeon_ioctl.c?revision=1.14=markup)

thus absolutely killing any desktop interactivity whatsoever.
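
For illustration only, this is the general shape of a polling loop that
shows up as a stream of sched_yield() calls under strace. It is a
hypothetical user-space sketch, not code taken from the Mesa drivers linked
above; wait_with_yield() and its parameters are made up.

```c
#include <sched.h>

/*
 * Hypothetical busy-wait: poll a completion flag and give up the CPU
 * between polls. Returns the number of sched_yield() calls made, capped
 * at max_spins so the loop always terminates.
 */
static unsigned long wait_with_yield(volatile int *done, unsigned long max_spins)
{
    unsigned long spins = 0;

    while (!*done && spins < max_spins) {
        sched_yield();   /* each iteration is one syscall visible in strace */
        spins++;
    }
    return spins;
}
```

How badly such a loop behaves depends on what the scheduler does with a
yielding task, which is the point under discussion in this thread.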

If you see sched_yield() when stracing any 3d program, I suggest you
try this brute-force workaround, which works fine for me: turn the
sched_yield() syscall into a no-op:


 kernel/sched.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4285,7 +4285,7 @@ asmlinkage long sys_sched_getaffinity(pi
  * This function yields the current CPU by dropping the priority of current
  * to the lowest priority.
  */
-asmlinkage long sys_sched_yield(void)
+static long sys_sched_yield1(void)
 {
struct rq *rq = this_rq_lock();
struct task_struct *p = current;
@@ -4312,6 +4312,11 @@ asmlinkage long sys_sched_yield(void)
return 0;
 }
 
+asmlinkage long sys_sched_yield(void)
+{
+   return 0;
+}
+
 static void __cond_resched(void)
 {
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
@@ -4395,7 +4400,7 @@ EXPORT_SYMBOL(cond_resched_softirq);
 void __sched yield(void)
 {
set_current_state(TASK_RUNNING);
-   sys_sched_yield();
+   sys_sched_yield1();
 }
 EXPORT_SYMBOL(yield);
 


Re: [PATCH] Software Suspend: Fix suspend when console is in VT_AUTO/KD_GRAPHICS mode

2007-03-09 Thread Pavel Machek
Hi!

> When the console is in VT_AUTO/KD_GRAPHICS mode, switching to the
> SUSPEND_CONSOLE fails, resulting in vt_waitactive() waiting indefinitely
> or until the task is interrupted.  The following patch tests if a
> console switch can occur in set_console() and returns early if a console
> switch is not possible.

Your mailer wraps lines...

How do I reproduce this? Suspending from X works for me...?

> Signed-off-by: Andrew Johnson <[EMAIL PROTECTED]>
> 
> diff -rup linux-2.6.20.1/drivers/char/vt.c linux/drivers/char/vt.c
> --- linux-2.6.20.1/drivers/char/vt.c  2007-02-19 22:34:32.0 -0800
> +++ linux/drivers/char/vt.c   2007-03-08 14:15:41.0 -0800
> @@ -2188,10 +2188,20 @@ static void console_callback(struct work
>   release_console_sem();
>  }
>  
> -void set_console(int nr)
> +extern char vt_dont_switch;
> +
> +int set_console(int nr)
>  {
> + struct vc_data *vc = vc_cons[fg_console].d;
> +
> + if(!vc_cons_allocated(nr) || vt_dont_switch || vc->vc_mode ==
> KD_GRAPHICS) {
> + return -EINVAL;
> + }
> +

So... if current console is graphical, we leave X accessing the
console... That's bad, because video state is not going to be
restored...?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-09 Thread Christoph Hellwig
On Fri, Mar 09, 2007 at 12:02:19PM +0300, Andrey Panin wrote:
> Kernel compilation with the Intel compiler is (was?) supported.
> This patch will break it.

It was only put in under the premise that they'll fix whatever breaks;
we're not going to take on any maintenance burden to hack around
broken proprietary compilers.



Re: [PATCH] drivers/media/video/videocodec.c: check kmalloc() return value.

2007-03-09 Thread Andrey Panin
On 067, 03 08, 2007 at 11:14:01PM -0800, Amit Choudhary wrote:
> Description: Check the return value of kmalloc() in function 
> videocodec_build_table(), in file drivers/media/video/videocodec.c.
> 
> Signed-off-by: Amit Choudhary <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/media/video/videocodec.c 
> b/drivers/media/video/videocodec.c
> index 2ae3fb2..16fc1dd 100644
> --- a/drivers/media/video/videocodec.c
> +++ b/drivers/media/video/videocodec.c
> @@ -348,6 +348,8 @@ #define LINESIZE 100
>   kfree(videocodec_buf);
>   videocodec_buf = (char *) kmalloc(size, GFP_KERNEL);
>  
> + if (!videocodec_buf)
> + return 0;
>   i = 0;
>   i += scnprintf(videocodec_buf + i, size - 1,
> "lave or attached aster name  type flagsmagic   
>  ");

Can you also remove the useless (char *) cast above?
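
In C a void * converts implicitly to any object pointer type, so the cast
adds nothing. A user-space sketch of the same pattern, with malloc()
standing in for kmalloc() and a made-up function name
(build_table_sketch() is not the real videocodec_build_table()):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the table builder: allocate a buffer, bail out
 * with 0 if the allocation fails, otherwise fill it and return the length.
 * The caller owns *out and must free() it.
 */
static int build_table_sketch(char **out, size_t size)
{
    char *buf;

    if (size == 0)
        return 0;

    buf = malloc(size);          /* no (char *) cast needed in C */
    if (!buf)
        return 0;                /* same early return as in the patch */

    snprintf(buf, size, "name type flags magic\n");
    *out = buf;
    return (int)strlen(buf);
}
```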

-- 
Andrey Panin| Linux and UNIX system administrator
[EMAIL PROTECTED]   | PGP key: wwwkeys.pgp.net




Re: [PATCH] Use more gcc extensions in the Linux headers

2007-03-09 Thread Andrey Panin
On 068, 03 09, 2007 at 07:53:08AM +, Christoph Hellwig wrote:
> On Fri, Mar 09, 2007 at 09:50:56AM +0300, Andrey Panin wrote:
> > On 068, 03 09, 2007 at 04:56:32PM +1100, Rusty Russell wrote:
> > > __builtin_types_compatible_p() has been around since gcc 2.95,
> > 
> > but it's not available in Intel C compiler IIRC :(
> 
> So what?

Kernel compilation with the Intel compiler is (was?) supported.
This patch will break it.
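
For readers who have not met the builtin in question, a minimal sketch of
what it does under gcc. The same_type() macro and the two demo functions
are illustrative names only and are not taken from Rusty's patch;
__typeof__ is itself a GNU extension.

```c
/*
 * __builtin_types_compatible_p(t1, t2) is a compile-time constant:
 * 1 if the two types are compatible (ignoring top-level qualifiers),
 * 0 otherwise.
 */
#define same_type(a, b) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/* The literal 1 and the literal 2 are both int. */
static int demo_int_vs_int(void)
{
    return same_type(1, 2);
}

/* 2L is long, which is not compatible with int even if the sizes match. */
static int demo_int_vs_long(void)
{
    return same_type(1, 2L);
}
```

A compiler without this builtin fails to build any header that uses it
unconditionally, which is the breakage I mean.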

-- 
Andrey Panin| Linux and UNIX system administrator
[EMAIL PROTECTED]   | PGP key: wwwkeys.pgp.net




Re: should RTS init in serial core be tied to CRTSCTS

2007-03-09 Thread Russell King
On Thu, Mar 08, 2007 at 06:32:29PM -0500, Robin Getz wrote:
> On Thu 8 Mar 2007 15:40, Russell King pondered:
> > On Thu, Mar 08, 2007 at 03:23:39PM -0500, Robin Getz wrote:
> > > Right - We both agree - And setting console=/dev/null in the bootargs
> > > still does not help.
> >
> > Ok, good.
> >
> > > When the kernel initializes the UART Port, it asserts RTS - which
> > > confuses the host it is attached to (in this case, the Linux system
> > > is the serial peripheral).
> >
> > ... which occurs /after/ userspace is up and running, when sysfs is
> > available.  So putting it in sysfs is reasonable.
> 
> Hmm - maybe I don't understand things then.
> 
> Today - RTS gets asserted when serial_core calls uart_startup(), which is 
> pretty early in the boot process (unless it is loaded as a module - which I'm 
> not doing).

uart_startup() is called when something opens a serial port.  There are
two points at which that happens:

1. when you have serial console enabled, and the kernel opens /dev/console
   before starting userspace.

2. when userspace opens a serial port.

If you're not using a serial console (and you don't have utterly broken
/dev nodes, i.e. an incorrect /dev/console entry), then (1) doesn't apply.
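
To make case (2) concrete, here is a user-space sketch that opens a port
and then clears RTS with the TIOCMBIC modem-control ioctl. The helper name
open_port_drop_rts() is made up. Since the open() itself is what ends up
calling uart_startup(), RTS may still be asserted for a short time before
the ioctl runs, so this does not by itself solve the problem described.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Open a serial port and deassert RTS.
 * Returns the open file descriptor, or -1 on any error.
 */
static int open_port_drop_rts(const char *path)
{
    int bits = TIOCM_RTS;
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);

    if (fd < 0)
        return -1;

    /* TIOCMBIC clears the modem-control bits given in 'bits'. */
    if (ioctl(fd, TIOCMBIC, &bits) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```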

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


Re: [PATCH] swsusp: Disable nonboot CPUs before entering platform suspend

2007-03-09 Thread Pavel Machek
Hi!

> > > Index: linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > ===
> > > --- linux-2.6.21-rc2-mm2.orig/kernel/power/disk.c
> > > +++ linux-2.6.21-rc2-mm2/kernel/power/disk.c
> > > @@ -61,6 +61,7 @@ static void power_down(suspend_disk_meth
> > >   switch(mode) {
> > >   case PM_DISK_PLATFORM:
> > >   if (pm_ops && pm_ops->enter) {
> > > + disable_nonboot_cpus();
> > >   kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > >   pm_ops->enter(PM_SUSPEND_DISK);
> > >   break;
> > 
> > ...so, if pm_ops is non-null, power_down does nonboot cpu disabling,
> > otherwise we proceed with cpus enabled?
> > 
> > That looks ugly.
> > 
> > Is the warning bogus?
> 
> Well, maybe.  I'm not sure.
> 
> > Or maybe we should *always* disable nonboot cpus in powerdown path?
> 
> I think we should do that.

That would be acceptable.

> > > Index: linux-2.6.21-rc2-mm2/kernel/power/user.c
> > > ===
> > > --- linux-2.6.21-rc2-mm2.orig/kernel/power/user.c
> > > +++ linux-2.6.21-rc2-mm2/kernel/power/user.c
> > > @@ -398,9 +398,9 @@ static int snapshot_ioctl(struct inode *
> > >  
> > >   case PMOPS_ENTER:
> > >   if (data->platform_suspend) {
> > > + disable_nonboot_cpus();
> > >   kernel_shutdown_prepare(SYSTEM_SUSPEND_DISK);
> > >   error = pm_ops->enter(PM_SUSPEND_DISK);
> > > - error = 0;
> > >   }
> > >   break;
> > 
> > For a userland application, disabling cpus during pmops_enter is at
> > least surprising...
> 
> Yes, but this is not a usual ioctl().  OTOH, we can call enable_nonboot_cpus()
> if pm_ops->enter(PM_SUSPEND_DISK) returns an error (otherwise it shouldn't
> return at all, no?).

Ok.
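
The control flow being agreed on would look roughly like this. It is a
plain C model with stubs, not kernel code: the *_stub and *_sketch names
are made up, and the function pointer stands in for
pm_ops->enter(PM_SUSPEND_DISK).

```c
/* Stub state: 1 while the non-boot CPUs are (pretend) offline. */
static int cpus_disabled;

static void disable_nonboot_cpus_stub(void) { cpus_disabled = 1; }
static void enable_nonboot_cpus_stub(void)  { cpus_disabled = 0; }

/*
 * Model of the PMOPS_ENTER path: take the non-boot CPUs down, try to enter
 * the platform suspend state, and bring the CPUs back only if that failed.
 * On success the real call is not expected to return at all.
 */
static int platform_enter_sketch(int (*enter)(void))
{
    int error;

    disable_nonboot_cpus_stub();
    error = enter();
    if (error)
        enable_nonboot_cpus_stub();
    return error;
}

/* Test doubles for the platform hook. */
static int enter_fails(void) { return -5; }
static int enter_ok(void)    { return 0; }
```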
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

