date:20070708

Re: [Hdaps-devel] [PATCH] hdaps - switch to using input-polldev

2007-07-08 Thread Dmitry Torokhov

On Monday 09 July 2007 01:29, Shem Multinymous wrote:
> On 7/9/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> > > > input-polldev uses a separate workqueue, not keventd, and so should not
> > > > suffer from other workqueue users loading keventd. But if entire box
> > > > is under stress then workqueue vs timer context does not matter much -
> > > > your daemon which is in userspace may not get to run in a timely manner
> > > > anyway.
> > >
> > > The daemon itself typically runs with a higher priority (and sleeps a
> > > lot so it gets further dumped). More importantly, the daemon depends
> > > not only on the latest measurement, but also on recent measurements
> > > having been obtained from the hardware in a regular fashion and with
> > > reasonably accurate timestamps. And *this* depends solely on the hdaps
> > > driver.
> > >
> >
> > Every input event carries a timestamp so even if there are irregularities
> > in taking the samples you should be able to account for it.
> 
> The issue is how good are the input event timestamps. The way it works
> is that the EC samples the analog sensor at some fixed rate and makes
> them available over the LPC bus. If the hdaps driver consumes these
> samples at the same rate then the timestamps will be accurate up to a
> small phase difference, which is mostly inconsequential. But if the
> hdaps driver gets scheduled irregularly, its timestamps will be offset
> by varying amounts, which will completely throw off computation (e.g.,
> think of estimating the angular velocity).
>

Timers do not guarantee you that they will be fired at the exact time.
If system is under load and there are hard IRQs they will also be
delayed.

> 
> > > > However I am open to bumping up priority of ipolldevd a little.
> > >
> > > Will this result in scheduling that's as reliable as rearming timers
> > > from softirq? I saw claims to the contrary, but if it's true then I
> > > withdraw the first objection.
> >
> > Probably not. But I still think that if system is so busy that it can't
> > get around to schedule one of workqueues it will not be able to part
> > the driver fast enough anyway.
> 
> A delay of 20ms in invoking the userspace daemon is negligible, but a
> timing variance of 20ms in the driver's measurements is devastating.
>

Even if you know that there is such variance?
 
> 
> > > > I am curious why you can't use the current device, since the calibration
> > > > done in hdaps does not alter the scale but merely moves '0' point 
> > > > around.
> > > > And fuzz should only remove small jitters, not rapidly changing data
> > > > that you should get when your box is falling.
> > >
> > > Recent versions of the hdapsd daemons do much more than a simple
> > > threshold check: they gather some 2nd-order and decaying averages
> > > statistics to catch subtle abnormal movement (e.g., sliding off a
> > > surface) that's indicative of potential shock. As pointed out in IBM's
> > > HDAPS whitepaper, by the time the box is actually in free fall, it's
> > > too late to start parking the heads. Now, that kind of movement is not
> > > very far from the noise floor, so hdapsd needs all the accuracy it can
> > > get -- hence fuzzing is very disruptive. Calibration is currently
> > > harmless, but I can certainly imagine more advanced hdapsd that uses
> > > heuristics based, e.g., on the absolute orientation of the laptop, so
> > > let's not ruin this data.
> >
> > If hdaps is the main consumer for the data it may be a good idea to
> > just remove the fuzz setting from input device. I don't have the hardware,
> > how bad is it without fuzz?
> 
> People are using the existing input device as a joystick input for
> things like Neverball. Current fuzz is 4 and the observed std dev is
> roughly 2, so eliminating fuzz will certainly affect the values. The
> implications are app-specific, but I guess some apps do care about
> such noise, otherwise we wouldn't have fuzz built into the input
> infrastructure.
>

OK.
 
> 
> > > You could have one input device open, or the other, or both. How would you
> > > set up input-polldev to handle this?
> >
> > Have 2nd input device's ->open() method call input_open_device() for
> > the first one.
> 
> Won't that create an overhead by the redundant, unused notifications?
> 

They won't leave input core so nothing really noticeable.


>   Shem
> 

-- 
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Documentation of kernel messages (Summary)

2007-07-08 Thread H. Peter Anvin

Kunai, Takashi wrote:
> (1) Your kernel development proposal will be greatly supported by
> Japanese vendor community. At the same time, it needs support from the
> kernel communities, as well.

There is a very strong reason for the kernel community to NOT support
this: it makes it much harder to deal with bug reports.

Like it or not, English is a lingua franca[1] of the technology
industry.  When I deal with the rest of the kernel community, I use
English, instead of my native language (Swedish).

For things that are targeted toward end users, that's a different
matter, of course, but kernel messages are inherently directed toward
developers and technically sophisticated users.

-hpa


Re: Moving MD/LVM from PPC to x86

2007-07-08 Thread Neil Brown

On Sunday July 8, [EMAIL PROTECTED] wrote:
> Quoting Neil Brown <[EMAIL PROTECTED]>:
> 
> > Version 0.90 MD superblocks (still the default) use host-endian
> > values so you cannot move between architectures directly.  However
> > it isn't too hard to make it work.
> > Firstly, use
> >    mdadm --examine --metadata=0.swap /dev/DEVICE
> 
> - s n i p -
> ppc:~# mdadm --examine --metadata=0.swap /dev/md0
> mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got 
> )
> ppc:~# mdadm --examine --metadata=0.swap /dev/md1
> mdadm: No super block found on /dev/md1 (Expected magic a92b4efc, got 
> )
> ppc:~# mdadm --examine --metadata=0.swap /dev/md2
> mdadm: No super block found on /dev/md2 (Expected magic a92b4efc, got 
> )

You --examine component devices of an array, not the whole array.

> ppc:~# mdadm --examine --metadata=0.swap /dev/sda1
> mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 
> fc4e2ba9)

This suggests that the superblock is currently the correct byteorder
for the current host.  You use
   mdadm --examine --metadata=0.swap /dev/sda1

when you have moved devices from one host to a different host with the
opposite endian-ness (e.g. big-endian to little-endian).

> 
> > to check that you have the right devices.
> > Then
> >
> >   mdadm --assemble /dev/md0 --update=byteorder /dev/DEV0 /dev/DEV1  
> >
> > That should assemble the array and update the superblocks so that they
> > are in the right byteorder and will assemble easily in future.
> 
> Is this safe, changing the byteorder on all the physical devices (that are
> part of my MD's)?
> Will it still work on the PPC?

I cannot remember where you were moving 'from' or 'to', but what you
have to do is:

  1/ move the devices to the new computer.
  2/ use "mdadm --examine" to make sure they are where you expect them
  to be.  Use "--metadata=0.swap" if the byteorder is different on
  the new machine.
  3/ Once you are satisfied that things look right, use
   mdadm --assemble /dev/md0 --update=byteorder 
to assemble the array.  This will change the byteorder in the
superblocks.  After you have done this, the array will assemble
normally on the new machine, but will not if you move it back to the
old machine. If you want to use the old machine again, you need to
use --update=byteorder again.
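
To see why the swapped read of /dev/sda1 reported fc4e2ba9: that value is
the 0.90 magic a92b4efc with its four bytes reversed. A minimal Python
sketch of the byte-order effect follows. It is an illustration only, not
mdadm code, and the helper name read_magic is made up for this example.

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC  # magic value quoted in the mdadm messages above


def read_magic(raw: bytes, byteorder: str) -> int:
    """Interpret 4 raw bytes as an unsigned 32-bit integer.

    byteorder is '<' for little-endian or '>' for big-endian.
    """
    return struct.unpack(byteorder + "I", raw)[0]


# A big-endian host (e.g. PPC) stores the magic in its own byte order.
on_disk = struct.pack(">I", MD_SB_MAGIC)

# Read back with the same byte order: the expected magic.
print(hex(read_magic(on_disk, ">")))  # 0xa92b4efc

# Read with the opposite byte order: the byte-reversed value.
print(hex(read_magic(on_disk, "<")))  # 0xfc4e2ba9
```

So seeing the reversed value under one interpretation means the other
interpretation is the one that matches the superblock.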

Does that make it clear?

NeilBrown

Re: Hibernation Redesign

2007-07-08 Thread Nick Piggin


Nick Piggin wrote:
> Jeremy Maitin-Shepard wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:
>>
>>> Yes, I have a rough idea about how page reclaim works. But I just
>>> mean it would not be trivial to load the new kernel into physically
>>> discontiguous memory. Possible of course, but I don't think kexec or
>>> the setup code could quite cope ATM.
>>
>> It would indeed be a pain for the new kernel to be loaded and have to
>> use discontiguous memory.  The trick is, though, that this is not
>> necessary.  Immediately before jumping to the new kernel, the first X
>> bytes (where X is the amount of memory the new kernel will get,
>> typically 16MB or 64MB) of physical memory are backed up into the
>> arbitrary discontiguous pages that are made available.  This will not
>> take very long, because copying even 64MB of memory is extremely fast.
>> Then the new kernel is free to use the first X bytes of contiguous
>> physical memory.  Problem solved.
>
> Ah, that sounds like it would be the right way to go. Good thinking.


Hmm, considering it is not a crash situation, it might even be
better again to simply reuse the existing kernel text and kernel
page table and memory map information if possible. That would
probably only require a meg or two to reinit drivers, load a
suspend-to-disk-init, and do the IO.

OTOH it would likely be more complex than just freeing up a bit
of memory and relocating it.

--
SUSE Labs, Novell Inc.

Re: [Hdaps-devel] [PATCH] hdaps - switch to using input-polldev

2007-07-08 Thread Shem Multinymous

On 7/9/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:

> > > input-polldev uses a separate workqueue, not keventd, and so should not
> > > suffer from other workqueue users loading keventd. But if entire box
> > > is under stress then workqueue vs timer context does not matter much -
> > > your daemon which is in userspace may not get to run in a timely manner
> > > anyway.
> >
> > The daemon itself typically runs with a higher priority (and sleeps a
> > lot so it gets further dumped). More importantly, the daemon depends
> > not only on the latest measurement, but also on recent measurements
> > having been obtained from the hardware in a regular fashion and with
> > reasonably accurate timestamps. And *this* depends solely on the hdaps
> > driver.
> >
>
> Every input event carries a timestamp so even if there are irregularities
> in taking the samples you should be able to account for it.

The issue is how good are the input event timestamps. The way it works
is that the EC samples the analog sensor at some fixed rate and makes
them available over the LPC bus. If the hdaps driver consumes these
samples at the same rate then the timestamps will be accurate up to a
small phase difference, which is mostly inconsequential. But if the
hdaps driver gets scheduled irregularly, its timestamps will be offset
by varying amounts, which will completely throw off computation (e.g.,
think of estimating the angular velocity).
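
A minimal Python sketch of the effect described above, under made-up
assumptions: the EC latches a new sample every 20 ms, the signal is a
simple ramp with a true rate of 100 units/s, and the driver stamps each
reading with the time it happened to poll. None of these numbers come
from the real hardware; the function names are hypothetical.

```python
import math


def latest_ec_sample(t, period, signal):
    """Value most recently latched by the EC at or before time t."""
    k = math.floor(t / period + 1e-9)  # small epsilon guards against float rounding
    return signal(k * period)


def estimated_rates(poll_times, period, signal):
    """Rate of change computed from consecutive (poll timestamp, value) pairs."""
    readings = [(t, latest_ec_sample(t, period, signal)) for t in poll_times]
    return [(v1 - v0) / (t1 - t0)
            for (t0, v0), (t1, v1) in zip(readings, readings[1:])]


PERIOD = 0.02  # assumed EC sampling period (illustrative only)


def ramp(t):
    """Test signal with a constant true rate of 100 units per second."""
    return 100.0 * t


# Driver polls at the EC rate with a constant 5 ms phase offset.
regular = [k * PERIOD + 0.005 for k in range(1, 6)]
# Driver still gets one new sample per poll, but is scheduled irregularly.
jittered = [0.025, 0.041, 0.079, 0.085, 0.119]

print([round(r) for r in estimated_rates(regular, PERIOD, ramp)])
print([round(r) for r in estimated_rates(jittered, PERIOD, ramp)])
```

With regular polling the constant phase offset cancels out of every
difference, so the estimates match the true rate. With jittered polling
the values are the same but the timestamps are not, and the estimates
scatter widely around the true rate.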

> > > However I am open to bumping up priority of ipolldevd a little.
> >
> > Will this result in scheduling that's as reliable as rearming timers
> > from softirq? I saw claims to the contrary, but if it's true then I
> > withdraw the first objection.
>
> Probably not. But I still think that if system is so busy that it can't
> get around to schedule one of workqueues it will not be able to part
> the driver fast enough anyway.

A delay of 20ms in invoking the userspace daemon is negligible, but a
timing variance of 20ms in the driver's measurements is devastating.

> > > I am curious why you can't use the current device, since the calibration
> > > done in hdaps does not alter the scale but merely moves '0' point around.
> > > And fuzz should only remove small jitters, not rapidly changing data
> > > that you should get when your box is falling.
> >
> > Recent versions of the hdapsd daemons do much more than a simple
> > threshold check: they gather some 2nd-order and decaying averages
> > statistics to catch subtle abnormal movement (e.g., sliding off a
> > surface) that's indicative of potential shock. As pointed out in IBM's
> > HDAPS whitepaper, by the time the box is actually in free fall, it's
> > too late to start parking the heads. Now, that kind of movement is not
> > very far from the noise floor, so hdapsd needs all the accuracy it can
> > get -- hence fuzzing is very disruptive. Calibration is currently
> > harmless, but I can certainly imagine more advanced hdapsd that uses
> > heuristics based, e.g., on the absolute orientation of the laptop, so
> > let's not ruin this data.
>
> If hdaps is the main consumer for the data it may be a good idea to
> just remove the fuzz setting from input device. I don't have the hardware,
> how bad is it without fuzz?

People are using the existing input device as a joystick input for
things like Neverball. Current fuzz is 4 and the observed std dev is
roughly 2, so eliminating fuzz will certainly affect the values. The
implications are app-specific, but I guess some apps do care about
such noise, otherwise we wouldn't have fuzz built into the input
infrastructure.
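
For readers unfamiliar with fuzz, here is a rough Python sketch of the
kind of filtering it performs. It is modeled on my recollection of the
input core's defuzz logic (hold the old value for tiny changes, blend for
slightly larger ones, pass large changes through); treat the exact
thresholds and weights as an assumption rather than a quote of the kernel
source. The function name defuzz is chosen for this example.

```python
def defuzz(value: int, old: int, fuzz: int) -> int:
    """Filter a new absolute-axis reading against the last reported one.

    Assumed behaviour, for illustration only:
      - change smaller than fuzz/2: keep the old value
      - change smaller than fuzz:   weighted blend, 3 parts old to 1 part new
      - change smaller than 2*fuzz: plain average of old and new
      - anything larger:            report the new value unchanged
    """
    if fuzz:
        if old - fuzz // 2 < value < old + fuzz // 2:
            return old
        if old - fuzz < value < old + fuzz:
            return int((old * 3 + value) / 4)  # int() truncates like C division
        if old - fuzz * 2 < value < old + fuzz * 2:
            return int((old + value) / 2)
    return value


# Small wobble around 100 is flattened; a large jump passes straight through.
reported = 100
for raw in [101, 99, 102, 98, 130]:
    reported = defuzz(raw, reported, fuzz=4)
    print(raw, "->", reported)
```

With fuzz at 4 and a standard deviation of about 2, most readings fall in
the "hold" or "blend" bands, which is the information a statistics-based
daemon would lose.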

> > You could have one input device open, or the other, or both. How would you
> > set up input-polldev to handle this?
>
> Have 2nd input device's ->open() method call input_open_device() for
> the first one.

Won't that create an overhead by the redundant, unused notifications?

 Shem

2.6.22 -- net/core/dev.c:1515: error: 'NETIF_F_IPV6_CSUM' undeclared (first use in this function)

2007-07-08 Thread Miles Lane


CC  net/core/dev.o
net/core/dev.c: In function 'dev_queue_xmit':
net/core/dev.c:1515: error: 'NETIF_F_IPV6_CSUM' undeclared (first use in this function)
net/core/dev.c:1515: error: (Each undeclared identifier is reported only once
net/core/dev.c:1515: error: for each function it appears in.)
make[2]: *** [net/core/dev.o] Error 1
make[1]: *** [net/core] Error 2

CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
CONFIG_UNIX=y
CONFIG_XFRM=y
# CONFIG_XFRM_USER is not set
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IP_VS is not set
CONFIG_IPV6=m
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
# CONFIG_INET6_XFRM_MODE_BEET is not set
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=m
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set

Re: Documentation of kernel messages (Summary)

2007-07-08 Thread Kunai, Takashi

Hi Michael,

This is a late response to your summary below.

First of all, there is a lot of interest in this issue in the Japanese
vendor community. I recently set up a Japanese mailing list to discuss
it, and I'm posting these comments as a result of that discussion.

(1) Your kernel development proposal will be greatly supported by
Japanese vendor community. At the same time, it needs support from the
kernel communities, as well.
(2) As for message IDs, you proposed three ways: (a) adding message numbers,
(b) printk hashes and (c) format strings. (a) seems unlikely to get
support from kernel developers, and (c) does not make each message
unique. Though (b) also lacks uniqueness, adding a component ID and/or
file name (file names could be hashed as well) could achieve a
practical level of uniqueness. To avoid an ugly message format, a system
parameter could be introduced to suppress these hash IDs.
Thus, our preference is (b).
(3) As for a documentation file, keeping it outside of the kernel source
tree would be fine.
(4) When this kernel message scheme becomes feasible, we will need
to talk about standardized contents for each description
(meanings/actions/etc).
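
As a rough illustration of option (b) with the component and file name
mixed in, here is a minimal Python sketch. The choice of SHA-1, the
8-digit truncation, the field separator and the example component and
file names are all assumptions made for this example; they are not part
of any proposal in this thread.

```python
import hashlib


def message_id(component: str, filename: str, fmt: str, digits: int = 8) -> str:
    """Derive a short, stable ID from component, source file and format string.

    The NUL separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    """
    key = "\0".join((component, filename, fmt)).encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:digits]


fmt = "device %s: I/O error %d\n"

# The same format string used in two different (hypothetical) files
# yields two different IDs, which the bare format string cannot do.
id_a = message_id("s390", "drivers/example/a.c", fmt)
id_b = message_id("s390", "drivers/example/b.c", fmt)
print(id_a, id_b)
```

A truncated hash can still collide in principle, so this gives a
practical level of uniqueness rather than a guarantee.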

Takashi Kunai
-- 
Kunai, Takashi
The Linux Foundation Japan (New since '07/2/5)
Shibuya Mark City West 22nd Floor
1-12-1 Dogenzaka,Shibuya-ku,Tokyo,Japan
zip150-0043 tel+81-3-4360-5493

Michael Holzheu wrote:
> On Mon, 2007-06-25 at 11:44 -0400, Rob Landley wrote:
>> On Monday 25 June 2007 09:48:41 Michael Holzheu wrote:
>>> Hi all,
>>>
>>> Any idea, how to proceed with this topic? Do you think that any of the
>>> suggested solutions for documentation / translation of kernel messages
>>> will have a chance to be included in the kernel?
>> Personally?  No to the second question, which renders the first "do it 
>> yourself outside of the tree".
> 
> If that is the opinion of the majority here, fine. If there is no hard
> rule on how to define printk macros, one option for us would be to
> define some new s390 specific printk macros for our device drivers.
> Similar to hundreds of other driver specific printk macros in the
> kernel.
> 
>> Just a guess, and I don't speak for anyone else here, but I think most of us 
>> are waiting to see how long it takes you to lose interest.
> 
> :-) Work is not always fun. Sometimes it is just a duty.
> 
> Michael
> 
> 
> 
> 


Re: partially mounted cifs filesystem

2007-07-08 Thread Albert Cahalan


On 7/7/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> On 7/7/07, Albert Cahalan <[EMAIL PROTECTED]> wrote:
>
> > I had one share mounted, from XP to Linux, and wanted another.
> > At first I had an incorrect setting on the XP box, almost
> > certainly related to permissions. The mount failed of course.
> > Running "mount" showed that the filesystem was not mounted,
> > but apparently it didn't remain fully unmounted either.
> > There was also nothing under the mount point, and the "ls -l"
> > data (directory size and link count) looked like ext3.
>
> That means nothing was mounted there ...
>
> > I changed settings on the XP box numerous times. After many
> > frustrating attempts, I ran "umount" on the mount point and
> > then successfully mounted the filesystem.
>
> ... but still umount succeeded? Didn't it complain about nothing
> being mounted there in the first place? Surprising that it actually
> resolved the problem ...

It complained, and it resolved the problem.

> > I'll guess that the kernel returned an error for my early
> > attempts at mounting, but left open a CIFS connection.
> >
> > I suppose the cifs error handling is buggy.
>
> Yes, that could be the case. Could you please:
>
> 1. Tell us which kernel version was it? .config?
> 2. Was there some dmesg output from the failed mount(2) attempt?
> 3. What was the mount command line / options?


Server: Windows XP service pack 2, recently updated
Client: Fedora kernel 2.6.20-1.3094.fc7, mount.cifs version 1.10

My xterm still had the commands in the scrollback buffer.
I added a few, grepping dmesg and /etc/fstab, and chopped
out the unrelated stuff. Note that the number in my command
prompt is the exit code of the previous command; these are
all correct despite editing out the unrelated commands.

There are some interesting error messages, plus a lock order
warning that mentions cifs. Note that I have numerous cifs
shares mounted, so not every log message relates to this one.


> Then:
>
> 1. Rebuild kernel with CIFS_DEBUG2.
> 2. Revert back (on the XP share export side) to the buggy / incorrect
> settings -- so that you can try and reproduce the problem.
> 3. Let us know if you could reproduce, if so, any debug output / etc?


I probably spent a week messing with Windows settings. I switched
back and forth between simple file sharing and not, adjusted many
registry settings related to anonymous/guest treatment, redid the
ACLs more times than I care to think about... There really isn't
any hope I could get back to the original settings. My best guess
would be something related to an ACL for guest, everybody, SYSTEM,
or anonymous, or something related to the checkboxes for client
permissions in the file sharing dialog. At one time I had a deny ACL.

Here you go. The fstab lines will be word wrapped in this email,
but are not word wrapped in the file.

--
proc 0 # mount /mnt/vm/sc
Password:
mount error 11 = Resource temporarily unavailable
Refer to the mount.cifs(8) manual page (e.g.man mount.cifs)
proc 255 # smbclient -L //192.168.1.141
Password:
Domain=[ALBERTXP] OS=[Windows 5.1] Server=[Windows 2000 LAN Manager]

   Sharename       Type      Comment
   ---------       ----      -------
   IPC$            IPC       Remote IPC
   sourcecode      Disk
   ADMIN$          Disk      Remote Admin
   C$              Disk      Default share
   homedir         Disk
session request to 192.168.1.141 failed (Called name not present)
session request to 192 failed (Called name not present)
Domain=[ALBERTXP] OS=[Windows 5.1] Server=[Windows 2000 LAN Manager]

   Server               Comment
   ---------            -------

   Workgroup            Master
   ---------            -------
proc 0 # smbclient  //192.168.1.141/sourcecode
Password:
Domain=[ALBERTXP] OS=[Windows 5.1] Server=[Windows 2000 LAN Manager]
smb: \> ls
  .                    D        0  Wed Dec  6 18:12:30 2006
  ..                   D        0  Wed Dec  6 18:12:30 2006
  development          D        0  Mon Jul  2 15:10:15 2007
  legacy               D        0  Wed Dec  6 22:29:42 2006
  libraries            D        0  Mon Jul  2 16:03:25 2007
  mmm                  D        0  Mon Jul  2 16:53:27 2007
  re                   D        0  Mon Jul  2 17:39:34 2007
  s                    D        0  Mon Jul  2 17:46:23 2007
  thirdparty           D        0  Mon Jul  2 18:05:05 2007

   40931 blocks of size 524288. 18955 blocks available
smb: \> q
proc 0 # mount /mnt/vm/sc
Password:
mount error 11 = Resource temporarily unavailable
Refer to the mount.cifs(8) manual page (e.g.man mount.cifs)
proc 255 # ls -l /mnt/vm/sc
total 0
proc 0 # ls -l /mnt/vm
total 2
drwxr-xr-x 1 root root    0 2007-07-03 17:43 homedir
drwxr-xr-x 2 root root 1024 2007-07-03 13:30 sc
proc 0 # ls -al /mnt/vm/sc
total 4
drwxr-xr-x 2 root root 1024 2007-07-03 13:30 .
drwxr-xr-x 4 root root 1024 2007-07-03

Re: Linux 2.6.22 released

2007-07-08 Thread Willy Tarreau

On Sun, Jul 08, 2007 at 04:52:52PM -0700, Linus Torvalds wrote:
> Anybody? Should I make just the shortlogs available instead (I don't save 
> those, but I post those for the later -rc's - usually the -rc1 and -rc2's 
> are too big for the mailing list, but they are still a lot smaller and 
> more readable than the *full* logs are)?
> 
> Or do people really want the full logs, and don't use git?

The changelogs would be more useful if they were indexed by google, but
it seems they aren't (maybe too big, since 2.4 and 2.6.16 changelogs are
indexed?). At least having the shortlog available would be a minimum,
provided that we try to keep the subjects as descriptive as possible.
This also means being more transparent about security fixes.

Willy


Re: Hibernation Redesign

2007-07-08 Thread Nick Piggin


Jeremy Maitin-Shepard wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
>
>> Yes, I have a rough idea about how page reclaim works. But I just
>> mean it would not be trivial to load the new kernel into physically
>> discontiguous memory. Possible of course, but I don't think kexec or
>> the setup code could quite cope ATM.
>
> It would indeed be a pain for the new kernel to be loaded and have to
> use discontiguous memory.  The trick is, though, that this is not
> necessary.  Immediately before jumping to the new kernel, the first X
> bytes (where X is the amount of memory the new kernel will get,
> typically 16MB or 64MB) of physical memory are backed up into the
> arbitrary discontiguous pages that are made available.  This will not
> take very long, because copying even 64MB of memory is extremely fast.
> Then the new kernel is free to use the first X bytes of contiguous
> physical memory.  Problem solved.

Ah, that sounds like it would be the right way to go. Good thinking.

--
SUSE Labs, Novell Inc.

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:

>>> This is the Morton method, isn't it? :) I remember it sounding like a
>>> very good idea when he brought it up, but I can't remember the details
>>> of why it was rejected or what the problems were.
>> 
>> 
>> Perhaps he did bring it up before I did.  Please forward me a link to
>> the thread or other reference if you can find it, as I'd be interested
>> in reading it.

> Sent in the next mail.

Thanks.  I've started reading over the thread.

>>> I suspect that freeing memory on the fly for the new kernel
>>> would be non-trivial (but possible), however simply having a reserve
>>> RAM region for the new kernel would be fine for a first step.
>> 
>> 
>> Freeing memory on the fly should be extremely easy for the kernel (this
>> is precisely what it does when it needs to satisfy an allocation).  Note
>> that the memory allocated need not be contiguous.

> Yes, I have a rough idea about how page reclaim works. But I just
> mean it would not be trivial to load the new kernel into physically
> discontiguous memory. Possible of course, but I don't think kexec or
> the setup code could quite cope ATM.

It would indeed be a pain for the new kernel to be loaded and have to
use discontiguous memory.  The trick is, though, that this is not
necessary.  Immediately before jumping to the new kernel, the first X
bytes (where X is the amount of memory the new kernel will get,
typically 16MB or 64MB) of physical memory are backed up into the
arbitrary discontiguous pages that are made available.  This will not
take very long, because copying even 64MB of memory is extremely fast.
Then the new kernel is free to use the first X bytes of contiguous
physical memory.  Problem solved.
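
A toy Python model of that backup step, to make the idea concrete. Here
"physical memory" is just a bytearray and "free pages" is a list of page
numbers; the page size, the function names and the error handling are
assumptions for this sketch, not kexec or hibernation code.

```python
PAGE_SIZE = 4096  # assumed page size for the toy model


def backup_low_memory(memory, nbytes, free_pages):
    """Copy the first nbytes of memory into scattered free pages.

    Returns the list of page numbers used, in order, so the copy can be undone.
    """
    needed = -(-nbytes // PAGE_SIZE)  # ceiling division
    if len(free_pages) < needed:
        raise MemoryError("not enough free pages for the backup")
    used = free_pages[:needed]
    for i, page in enumerate(used):
        if page * PAGE_SIZE < nbytes:
            raise ValueError("free page overlaps the region being backed up")
        src = i * PAGE_SIZE
        chunk = memory[src:min(src + PAGE_SIZE, nbytes)]
        dst = page * PAGE_SIZE
        memory[dst:dst + len(chunk)] = chunk
    return used


def restore_low_memory(memory, nbytes, used):
    """Copy the backed-up chunks from their scattered pages back to the start."""
    for i, page in enumerate(used):
        dst = i * PAGE_SIZE
        length = min(PAGE_SIZE, nbytes - dst)
        src = page * PAGE_SIZE
        memory[dst:dst + length] = memory[src:src + length]


# 16 pages of "RAM"; the low 3 pages hold data the old kernel still needs.
ram = bytearray(16 * PAGE_SIZE)
low = 3 * PAGE_SIZE
ram[:low] = bytes(i % 251 for i in range(low))
original = bytes(ram[:low])

used = backup_low_memory(ram, low, free_pages=[9, 5, 12, 14])
ram[:low] = bytes(low)               # the "new kernel" scribbles over low memory
restore_low_memory(ram, low, used)
print(bytes(ram[:low]) == original)  # True
```

The new kernel only ever sees the contiguous low region; the scattered
pages matter only to whoever restores the old contents afterwards.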

-- 
Jeremy Maitin-Shepard

Re: KVM-AMD OOPS

2007-07-08 Thread Jeremy Fitzhardinge


Sasa Ostrouska wrote:

[EMAIL PROTECTED]:~# modprobe kvm-amd
int3:  [1] PREEMPT SMP
CPU 1
Modules linked in: kvm_amd snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
nls_iso8859_1 ntfs nls_base usb_storage libusual capability commoncap
lp psmouse snd_hda_intel snd_hda_codec snd_pcm snd_timer ohci_hcd
ehci_hcd 8139too rtc_cmos snd soundcore snd_page_alloc usbcore k8temp
mii rtc_core rtc_lib i2c_nforce2 parport_pc parport
Pid: 2898, comm: modprobe Tainted: P   2.6.21.5 #1
RIP: 0010:[]  []
register_cpu_notifier+0x1/0x31
RSP: :81006e34df40  EFLAGS: 0246
RAX:  RBX:  RCX: c0010117
RDX:  RSI: 81006e219640 RDI: 80536510
RBP: 880d8840 R08:  R09: 0006b5f4
R10:  R11:  R12: 005296b0
R13: 7fff51fe55c0 R14:  R15: 
FS:  2b4e58e18b00() GS:810002e794c0() 
knlGS:

CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0050df64 CR3: 7a0f3000 CR4: 06e0
Process modprobe (pid: 2898, threadinfo 81006e34c000, task 
81007bc48400)
Stack:  8039e024 81006e34c000 880d8840 
5e19

8024537c  7fff51fe50c0 004142d0
8020967e 0206 005230e0 0052f4c9
Call Trace:
[] kvm_init_arch+0x90/0x145
[] sys_init_module+0xad/0x168
[] system_call+0x7e/0x83


Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc


This is the init section poison pattern.  Looks like an init function 
was used after the code was freed.
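
For reference, 0xcc is the x86 int3 opcode, which is why jumping into
freed, poisoned init code shows up as an "int3" trap. A small Python
sketch of checking an oops "Code:" line for that fill pattern; the helper
names are made up for this example, and stripping <..> markers assumes
the usual oops convention of bracketing the faulting byte.

```python
def parse_code_bytes(line: str) -> bytes:
    """Parse the hex bytes from an oops 'Code:' line.

    Angle brackets around a byte, if present, are stripped.
    """
    _, _, rest = line.partition("Code:")
    return bytes(int(tok.strip("<>"), 16) for tok in rest.split())


def looks_like_int3_fill(code: bytes) -> bool:
    """True if the dump is non-empty and consists only of 0xcc bytes."""
    return len(code) > 0 and all(b == 0xCC for b in code)


line = "Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc"
print(looks_like_int3_fill(parse_code_bytes(line)))  # True
```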


   J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Hdaps-devel] [PATCH] hdaps - switch to using input-polldev

2007-07-08 Thread Dmitry Torokhov

On Monday 09 July 2007 00:31, Shem Multinymous wrote:
> Hi Dmitry,
> 
> On 7/8/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> > > First, the hdaps driver regularly polls the embedded controller, which
> > > in turn regularly polls the hardware. If the two polling rates differ
> > > or fluctuate, we lose events.
> >
> > That was the case with the original driver as well, but instead of
> > a rearming workqueue it was using a rearming timer.
> 
> Right. Doesn't the latter result in more regular scheduling?
>

Probably.
 
> 
> > > AFAICT, the delayed workqueues used by
> > > input-polldev can get very laggy under load. That's very bad for
> > > sensitive clients like hdapsd (the hard disk shock protection daemon).
> > >
> >
> > input-polldev uses a separate workqueue, not keventd, and so should not
> > suffer from other workqueue users loading keventd. But if entire box
> > is under stress then workqueue vs timer context does not matter much -
> > your daemon which is in userspace may not get to run in a timely manner
> > anyway.
> 
> The daemon itself typically runs with a higher priority (and sleeps a
> lot so it gets further dumped). More importantly, the daemon depends
> not only on the latest measurement, but also on recent measurements
> having been obtained from the hardware in a regular fashion and with
> reasonably accurate timestamps. And *this* depends solely on the hdaps
> driver.
>

Every input event carries a timestamp so even if there are irregularities
in taking the samples you should be able to account for it. 

> 
> > However I am open to bumping up priority of ipolldevd a little.
> 
> Will this result in scheduling that's as reliable as rearming timers
> from softirq? I saw claims to the contrary, but if it's true then I
> withdraw the first objection.

Probably not. But I still think that if system is so busy that it can't
get around to schedule one of workqueues it will not be able to part
the driver fast enough anyway.

> 
> > > Second, this is incompatible with the much-needed addition of a 2nd
> > > input device relying on the same data. The existing hdaps input device
> > > does "joystick emulation", i.e., reports values after calibration and
> > > fuzzing. Userspace programs that need the raw data, like hdapsd,
> > > currently have to poll the sysfs attribute, which is inefficient,
> > > lag-prone and induces unnecessary interrupts on tickless systems. To
> > > solve this we'll have to add a 2nd input device to hdaps, for
> > > reporting the raw accelerometer data. (Michael Riepe and me are now
> > > working on such a patch.) But these two input devices need to share
> > > their polling of the underlying EC hardware, and this is impossible
> > > using input-polldev.
> >
> > I am curious why you can't use the current device, since the calibration
> > done in hdaps does not alter the scale but merely moves '0' point around.
> > And fuzz should only remove small jitters, not rapidly changing data
> > that you should get when your box is falling.
> 
> Recent versions of the hdapsd daemons do much more than a simple
> threshold check: they gather some 2nd-order and decaying averages
> statistics to catch subtle abnormal movement (e.g., sliding off a
> surface) that's indicative of potential shock. As pointed out in IBM's
> HDAPS whitepaper, by the time the box is actually in free fall, it's
> too late to start parking the heads. Now, that kind of movement is not
> very far from the noise floor, so hdapsd needs all the accuracy it can
> get -- hence fuzzing is very disruptive. Calibration is currently
> harmless, but I can certainly imagine more advanced hdapsd that uses
> heuristics based, e.g., on the absolute orientation of the laptop, so
> let's not ruin this data.

If hdapsd is the main consumer for the data it may be a good idea to
just remove the fuzz setting from the input device. I don't have the
hardware; how bad is it without fuzz?
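
To make the fuzz question concrete, here is a rough Python model of the defuzz step as I understand the input core to apply it to absolute axes. Treat it as an approximation rather than the kernel's code: `defuzz` and `defuzz_stream` are hypothetical names, and the fuzz value of 4 is an assumption about the hdaps driver's setting.

```python
def _cdiv(a, b):
    """Integer division truncating toward zero, as C does."""
    return int(a / b)


def defuzz(value, old, fuzz):
    """Return the value that would be reported, given the last reported one."""
    if fuzz:
        # Very small change: keep reporting the old value.
        if old - fuzz // 2 < value < old + fuzz // 2:
            return old
        # Small change: move a quarter of the way toward the new value.
        if old - fuzz < value < old + fuzz:
            return _cdiv(old * 3 + value, 4)
        # Moderate change: move halfway toward the new value.
        if old - fuzz * 2 < value < old + fuzz * 2:
            return _cdiv(old + value, 2)
    # Large change (or fuzz disabled): pass the value through unchanged.
    return value


def defuzz_stream(readings, fuzz):
    """Apply defuzz to a sequence, starting from the first reading."""
    old = readings[0]
    reported = [old]
    for value in readings[1:]:
        old = defuzz(value, old, fuzz)
        reported.append(old)
    return reported


FUZZ = 4  # assumed setting, for illustration only

jitter = defuzz_stream([100, 101, 99, 101, 100, 99], FUZZ)
drift = defuzz_stream(list(range(100, 111)), FUZZ)
jump = defuzz_stream([100, 120], FUZZ)
print(jitter, drift, jump)
```

In this model small jitter is swallowed entirely and a large jump passes straight through, which matches the argument that a fall is not hidden by fuzz. A slow drift of one count per sample, however, is reported late and in coarse steps, which is the kind of near-noise-floor movement the daemon is said to rely on.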

> 
> 
> > However nothing stops you from generating events for the 2nd input
> > device from the same polling function that generates events for the
> > first device.
> 
> You could have one input device open, or the other, or both. How would you
> set up input-polldev to handle this?
> 

Have 2nd input device's ->open() method call input_open_device() for
the first one.
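
The arrangement suggested here can be pictured with a small conceptual model. This is not kernel code and does not use the input API: `PolledDevice`, `RawDevice` and `poll_once` are hypothetical names, and the calibration function is invented for the example. The idea is only that the primary device keeps a user count, the second device's open forwards to it, and one poll function feeds both.

```python
class PolledDevice:
    """Stands in for the existing joystick-style device that owns the poller."""

    def __init__(self):
        self.users = 0
        self.polling = False
        self.events = []

    def open(self):
        self.users += 1
        if self.users == 1:
            self.polling = True   # first user starts the polling loop

    def close(self):
        self.users -= 1
        if self.users == 0:
            self.polling = False  # last user stops it


class RawDevice:
    """Second device: opening it simply opens the primary one."""

    def __init__(self, primary):
        self.primary = primary
        self.events = []

    def open(self):
        self.primary.open()

    def close(self):
        self.primary.close()


def poll_once(primary, raw, sample, calibrate):
    """One pass of the shared poll function; reports to both devices."""
    if not primary.polling:
        return False
    primary.events.append(calibrate(sample))  # calibrated value
    raw.events.append(sample)                 # untouched value
    return True


primary = PolledDevice()
raw = RawDevice(primary)

raw.open()                                    # only the raw device is in use
polled = poll_once(primary, raw, 512, lambda s: s - 500)
primary.open()                                # now both are in use
raw.close()
still_polling = primary.polling
primary.close()
print(polled, raw.events, primary.events, still_polling, primary.polling)
```

Polling runs whenever either device is open and stops when the last user goes away, so the two devices share a single poll of the hardware. Events sent to a device nobody is listening on are simply unused.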

> 
> > > As for the mutex in atomic context issue, isn't it best addressed by
> > > making mutex_trylock() do the sensible thing in softirq?
> 
> BTW, I think that's worth fixing in any case.
> 
>   Shem
> 

-- 
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Hibernation Redesign

2007-07-08 Thread Nick Piggin


Jeremy Maitin-Shepard wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
>
>> This is the Morton method, isn't it? :) I remember it sounding like a
>> very good idea when he brought it up, but I can't remember the details
>> of why it was rejected or what the problems were.
>
> Perhaps he did bring it up before I did.  Please forward me a link to
> the thread or other reference if you can find it, as I'd be interested
> in reading it.

Sent in the next mail.

>> I suspect that freeing memory on the fly for the new kernel
>> would be non-trivial (but possible), however simply having a reserve
>> RAM region for the new kernel would be fine for a first step.
>
> Freeing memory on the fly should be extremely easy for the kernel (this
> is precisely what it does when it needs to satisfy an allocation).  Note
> that the memory allocated need not be contiguous.

Yes, I have a rough idea about how page reclaim works. But I just
mean it would not be trivial to load the new kernel into physically
discontiguous memory. Possible of course, but I don't think kexec or
the setup code could quite cope ATM.

--
SUSE Labs, Novell Inc.

Re: Hibernation Redesign

2007-07-08 Thread Nick Piggin


Nick Piggin wrote:
> Jeremy Maitin-Shepard wrote:
>> Al Boldi <[EMAIL PROTECTED]> writes:
>>
>>> Pavel Machek wrote:
>>>> We are stuck with refrigerator for now, and at least for hibernation,
>>>> I don't see any feasible alternative.
>>>
>>> Feasible alternative?
>>
>> I posted such an alternative to the list a short time ago: hibernating
>> from a *new* kernel space/user space that is created by loading a new
>> kernel in a manner similar to what is done for kexec crashdumps.  Unlike
>> kexec crashdumps, however, it would not require reserving any memory at
>> boot, because the necessary memory (maybe 16MB or 64MB) can be freed
>> just before hibernating, and device drivers can be properly stopped so
>> that DMAs don't stomp over certain memory.
>
> This is the Morton method, isn't it? :) I remember it sounding like a
> very good idea when he brought it up, but I can't remember the details
> of why it was rejected or what the problems were.

Hmm, and it seems like I won't get to know without reliving what
looks like an epic flamewar starting here:

  http://thread.gmane.org/gmane.linux.kernel/374889

However from a quick look it seems like the only reason is the RAM
overhead of a reserve area. It seems unfortunate that it was
dismissed so quickly because of that problem (which could be
improved).

--
SUSE Labs, Novell Inc.

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Nick Piggin <[EMAIL PROTECTED]> writes:

> Jeremy Maitin-Shepard wrote:
>> Al Boldi <[EMAIL PROTECTED]> writes:
>> 
>> 
>>> Pavel Machek wrote:
>>> 
>>>> We are stuck with refrigerator for now, and at least for hibernation,
>>>> I don't see any feasible alternative.
>> 
>> 
>>> Feasible alternative?
>> 
>> 
>> I posted such an alternative to the list a short time ago: hibernating
>> from a *new* kernel space/user space that is created by loading a new
>> kernel in a manner similar to what is done for kexec crashdumps.  Unlike
>> kexec crashdumps, however, it would not require reserving any memory at
>> boot, because the necessary memory (maybe 16MB or 64MB) can be freed
>> just before hibernating, and device drivers can be properly stopped so
>> that DMAs don't stomp over certain memory.

> This is the Morton method, isn't it? :) I remember it sounding like a
> very good idea when he brought it up, but I can't remember the details
> of why it was rejected or what the problems were.

Perhaps he did bring it up before I did.  Please forward me a link to
the thread or other reference if you can find it, as I'd be interested
in reading it.


>> This approach eliminates the need for the freezer, as it would make
>> hibernate look a lot like suspend to ram from the perspective of
>> the "old" kernel (the kernel being hibernated), as the hibernate
>> operation itself would be completely atomic from the perspective of the
>> "old" kernel.  That is not to say, of course, that any code paths would
>> actually be shared, or that the drivers would do the same things
>> (because they probably would not).

> Well it basically is suspend to RAM with the additional step that a
> new kernel gets booted and writes out the data from RAM to disk then
> shuts down.

There is the key difference, though, that the drivers should do rather
different things.  In particular, rather than place the hardware in a
low-power mode, it should place it in some state such that the new
kernel being loaded can handle it.

> I suspect that freeing memory on the fly for the new kernel
> would be non-trivial (but possible), however simply having a reserve
> RAM region for the new kernel would be fine for a first step.

Freeing memory on the fly should be extremely easy for the kernel (this
is precisely what it does when it needs to satisfy an allocation).  Note
that the memory allocated need not be contiguous.

-- 
Jeremy Maitin-Shepard

Re: [Hdaps-devel] [PATCH] hdaps - switch to using input-polldev

2007-07-08 Thread Shem Multinymous

Hi Dmitry,

On 7/8/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> > First, the hdaps driver regularly polls the embedded controller, which
> > in turns regularly polls the hardware. If the two polling rates differ
> > or fluctuate, we lose events.
>
> That was the case with the original driver as well, but instead of a
> rearming workqueue it was using a rearming timer.

Right. Doesn't the latter result in more regular scheduling?

> > AFAICT, the delayed workqueues used by
> > input-polldev can get very laggy under load. That's very bad for
> > sensitive clients like hdapsd (the hard disk shock protection daemon).
> >
>
> input-polldev uses a separate workqueue, not keventd, and so should not
> suffer from other workqueue users loading keventd. But if entire box
> is under stress then workqueue vs timer context does not matter much -
> your daemon which is in userspace may not get to run in a timely manner
> anyway.

The daemon itself typically runs with a higher priority (and sleeps a
lot so it gets further dumped). More importantly, the daemon depends
not only on the latest measurement, but also on recent measurements
have been obtained from the hardware in a regular fashion and with
reasonably accurate timestamps. And *this* depends solely on the hdaps
driver.

> However I am open to bumping up priority of ipolldevd a little.

Will this result in scheduling that's as reliable as rearming timers
from softirq? I saw claims to the contrary, but if it's true then I
withdraw the first objection.

> > Second, this is incompatible with the much-needed addition of a 2nd
> > input device relying on the same data. The existing hdaps input device
> > does "joystick emulation", i.e., reports values after calibration and
> > fuzzing. Userspace programs that need the raw data, like hdapsd,
> > currently have to poll the sysfs attribute, which is inefficient,
> > lag-prone and induces unnecessary interrupts on tickless systems. To
> > solve this we'll have to add a 2nd input device to hdaps, for
> > reporting the raw accelerometer data. (Michael Riepe and me are now
> > working on such a patch.) But these two input devices need to share
> > their polling of the underlying EC hardware, and this is impossible
> > using input-polldev.
>
> I am curious why you can't use the current device, since the calibration
> done in hdaps does not alter the scale but merely moves '0' point around.
> And fuzz should only remove small jitters, not rapidly changing data
> that you should get when your box is falling.

Recent versions of the hdapsd daemons do much more than a simple
threshold check: they gather some 2nd-order and decaying averages
statistics to catch subtle abnormal movement (e.g., sliding off a
surface) that's indicative of potential shock. As pointed out in IBM's
HDAPS whitepaper, by the time the box is actually in free fall, it's
too late to start parking the heads. Now, that kind of movement is not
very far from the noise floor, so hdapsd needs all the accuracy it can
get -- hence fuzzing is very disruptive. Calibration is currently
harmless, but I can certainly imagine more advanced hdapsd that uses
heuristics based, e.g., on the absolute orientation of the laptop, so
let's not ruin this data.

> However nothing stops you from generating events for the 2nd input
> device from the same polling function that generates events for the
> first device.

You could have one input device open, or the other, or both. How would you
set up input-polldev to handle this?

> > As for the mutex in atomic context issue, isn't it best addressed by
> > making mutex_trylock() do the sensible thing in softirq?

BTW, I think that's worth fixing in any case.

 Shem

Re: Hibernation Redesign

2007-07-08 Thread Nick Piggin


Jeremy Maitin-Shepard wrote:
> Al Boldi <[EMAIL PROTECTED]> writes:
>
>> Pavel Machek wrote:
>>> We are stuck with refrigerator for now, and at least for hibernation,
>>> I don't see any feasible alternative.
>>
>> Feasible alternative?
>
> I posted such an alternative to the list a short time ago: hibernating
> from a *new* kernel space/user space that is created by loading a new
> kernel in a manner similar to what is done for kexec crashdumps.  Unlike
> kexec crashdumps, however, it would not require reserving any memory at
> boot, because the necessary memory (maybe 16MB or 64MB) can be freed
> just before hibernating, and device drivers can be properly stopped so
> that DMAs don't stomp over certain memory.

This is the Morton method, isn't it? :) I remember it sounding like a
very good idea when he brought it up, but I can't remember the details
of why it was rejected or what the problems were.

> This approach eliminates the need for the freezer, as it would make
> hibernate look a lot like suspend to ram from the perspective of
> the "old" kernel (the kernel being hibernated), as the hibernate
> operation itself would be completely atomic from the perspective of the
> "old" kernel.  That is not to say, of course, that any code paths would
> actually be shared, or that the drivers would do the same things
> (because they probably would not).

Well it basically is suspend to RAM with the additional step that a
new kernel gets booted and writes out the data from RAM to disk then
shuts down. I suspect that freeing memory on the fly for the new kernel
would be non-trivial (but possible), however simply having a reserve
RAM region for the new kernel would be fine for a first step.

--
SUSE Labs, Novell Inc.

inotify_rm_watch() EINVAL race condition for deleted files

2007-07-08 Thread Timo Sirainen

1. Add IN_DELETE_SELF watch to a file
2. unlink() the file
3. Try to remove the watch -> EINVAL

I understand this happens because there's an IN_IGNORED event in the
inotify queue, but couldn't inotify_rm_watch() be silent about it if the
event hasn't been read()? Otherwise there's a race condition between the
read() and inotify_rm_watch(), so I'll just always have to ignore EINVAL
errors. I don't like ignoring errors.

(2.6.21.3 + ext3 if it matters)
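
For illustration, the workaround described above (tolerating EINVAL on removal) might look like the sketch below. `remove_watch` is a hypothetical helper, and `rm_watch` is assumed to be some callable that wraps inotify_rm_watch(2) and raises OSError on failure; neither comes from the report. Note the cost the report objects to: EINVAL is also what an invalid watch descriptor produces, so this cannot tell a genuine bug from the race.

```python
import errno


def remove_watch(rm_watch, fd, wd):
    """Remove a watch, tolerating the case where it is already gone.

    Returns True if the watch was removed by this call and False if the
    call failed with EINVAL (for example because the watched file was
    deleted and the watch was dropped automatically). Any other error
    is propagated to the caller.
    """
    try:
        rm_watch(fd, wd)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise
    return True


# Stand-ins for the real system call, used only to exercise the helper.
def fake_ok(fd, wd):
    return 0


def fake_already_gone(fd, wd):
    raise OSError(errno.EINVAL, "Invalid argument")


def fake_bad_fd(fd, wd):
    raise OSError(errno.EBADF, "Bad file descriptor")


print(remove_watch(fake_ok, 3, 1), remove_watch(fake_already_gone, 3, 1))
```

Errors other than EINVAL still surface, so only the one ambiguous case is swallowed.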




Re: [RFC][PATCH -mm] Freezer: Handle uninterruptible tasks

2007-07-08 Thread Jeremy Maitin-Shepard

Pavel Machek <[EMAIL PROTECTED]> writes:

[snip]

> I don't know how to do that mechanism... but if we knew where to trap
> filesystem writes, we could simply freeze at that point, and at that
> point only, no?

Any operation at all that has an external effect must not occur after
the snapshot is made; otherwise, there will be random hard-to-find
corruptions and other problems occurring as a result.  Thus, for
example, any writes (either directly or indirectly through e.g. a
filesystem) to non-volatile storage, any network traffic, any
communication with hardware like a printer must be prevented after the
snapshot.  It seems, though, that in general the kernel will have no way
to know which operations are safe, and which are not safe.

(This is why the whole "proper filesystem snapshot support is the
solution" argument is bogus.)

-- 
Jeremy Maitin-Shepard

Re: Hibernation Redesign

2007-07-08 Thread Jeremy Maitin-Shepard

Al Boldi <[EMAIL PROTECTED]> writes:

> Pavel Machek wrote:
>> We are stuck with refrigerator for now, and at least for hibernation,
>> I don't see any feasible alternative.

> Feasible alternative?

I posted such an alternative to the list a short time ago: hibernating
from a *new* kernel space/user space that is created by loading a new
kernel in a manner similar to what is done for kexec crashdumps.  Unlike
kexec crashdumps, however, it would not require reserving any memory at
boot, because the necessary memory (maybe 16MB or 64MB) can be freed
just before hibernating, and device drivers can be properly stopped so
that DMAs don't stomp over certain memory.

This approach eliminates the need for the freezer, as it would make
hibernate look a lot like suspend to ram from the perspective of
the "old" kernel (the kernel being hibernated), as the hibernate
operation itself would be completely atomic from the perspective of the
"old" kernel.  That is not to say, of course, that any code paths would
actually be shared, or that the drivers would do the same things
(because they probably would not).

[snip]

-- 
Jeremy Maitin-Shepard

Re: v2.6.21.5-rt19

2007-07-08 Thread Fernando Pablo Lopez-Lezcano


On Mon, 9 Jul 2007, Gabriel C wrote:

Fernando Lopez-Lezcano wrote:

On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:

* Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:

Changes since 2.6.21.5-rt18:
- Fixed a nasty and hard to track down slowness / boot problem on SMP
machines with CONFIG_NOHZ enabled. The problem was caused by the timer
wheel base lock held during the get_next_timer_interrupt() call in the
idle path, which eventually led to a bogus PI boosting of the idle task
and in consequence a stale wrong scheduler selection for the affected idle
task.

Kudos to Carsten Emde, who patiently and meticulously isolated the
problem and provided the traces, which allowed to identify the root cause.


Problem solution: Prevent idle task boosting

Maybe someone remembers me whining about troubles with 2.6.21-rt2..18 on
my Core2 T7200 laptop (fujitsu-siemens amilo i1520).


Although I still have my fingers crossed, the good news is that
2.6.21.5-rt19 (and -rt20) behaves far better now on the
very same box.



Yes, it works much better indeed...

Ingo: is there a place where I can read about the changes in different 
rtxx releases? What is new/better/fixed in rt20? (I see scheduler stuff 
in a diff from rt19 to rt20 but I don't really know what it means).


and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when CFS 
was merged.


i _think_ Rui might have seen two separate problems. Perhaps by the time 
we fixed the first problem (which Rui saw since -rt2) we introduced the 
other one via -rt11 - which then got fixed in -rt19.


Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
really trying to provide a "stable" rt kernel for audio usage and
including another subsystem into rt is - IMHO - not going to help.
What's the chance of splitting things?

btw., we'd love to get more feedback regarding CFS. CFS is a completely 
new scheduler for Linux. 


Then I'd rather have it separate from rt.

It has a design centered around keeping application latencies down, so it 
is ultimately real-time friendly, and it should also make things work 
better for desktop-ish and audio-ish stuff as well. (even under 
SCHED_OTHER)




Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
list):

On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:


Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
on my desktop but on my laptop it makes Firefox and Tomboy to crash.
On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
problem.


I managed to completely hang firefox (fc7) with flash 9 installed
(unkillable even with -9).


Firefox with Flash 9 does not work well; there are a lot of bugs reported
about it (just google) and it hangs on vanilla or whatever other kernels
as well. Not only Firefox but also Swiftfox, Opera, Epiphany etc.


Most of the time Firefox dies when you use Flash 9 and close a window or a
tab.


More tests...

The problem is the rt kernel AFAICT, this goes beyond Flash 9, way 
beyond:


_OpenOffice_ hangs with 2.6.21.5-rt20, works fine with stock Fedora 7 
kernel. Flash 9 hangs with 2.6.21.5-rt20, works fine with the stock Fedora 
7 kernel. Same machine booting different kernels, I'd say it is the 
kernel.


The only way out for a hung app is a reboot.

Ingo: what would be a good way to trace this? It makes the rt kernels not 
very usable at least on this hardware (more tests tomorrow in the CCRMA 
machines).


Same on 2.6.21.5-rt18 with CONFIG_NO_HZ not set.

-- Fernando

removing flush_tlb_mm as a generic hook ?

2007-07-08 Thread Benjamin Herrenschmidt

Hi folks !

While toying around with various MM callbacks, I found out that
flush_tlb_mm() as a generic hook provided by the archs has been mostly
obsoleted by the mmu_gather stuff.

(I'm not talking about archs internally wanting to implement it and use
it as a tlb_flush(), I'm talking about possibly making that optional :-)

I see two remaining users:

 - fs/proc/task_mmu.c, which I easily converted to use the mmu_gather
(I'll send a patch if people agree it's worth doing)

 - kernel/fork.c uses it to flush the "old" mm. That's the "meat".

I wonder if it's worth pursuing, that is converting copy_page_range to
use an mmu_gather on the source instead of using flush_tlb_mm. It might
allow some archs that can't just "flush all" easily but have to go
through every PTE individually to improve things a bit on fork, and it
would allow them to remove the flush_tlb_mm() logic.

There is one reason why it's not a trivial conversion, though:
copy_page_range() calls copy_hugetlb_page_range() for huge pages, and
I'm not sure about mixing up the hugetlb stuff with the mmu_gather
stuff, I need to do a bit more code auditing to figure out whether
that's an ok thing to do.

Nothing very urgent or important, it's just that one less hook seems
like a good idea ;-)

Cheers,
Ben.



Re: cdparanoia not setting count and/or reply_len properly

2007-07-08 Thread Douglas Gilbert

Stefan Richter wrote:
> DervishD wrote at lkml:
>> Hi all :)
>>
>> I know, this has been treated on the list before (year 2005) but
>> without any real solution I'm aware of.
>>
>> I'm running kernel 2.6.20.14, and I have an ATAPI DVD writer that I
>> use with an IDE-to-USB adapter, so it appears as an SCSI drive to the
>> kernel.
>>
>> Anytime I rip anything with it, the log fills with the same message:
>> some numbers about a certain number of bytes and the old friend message
>> that I've put in the subject.
>>
>> I assume that the warning makes sense, but the fact is that my log
>> is full with the same message, the ripping is correct (so cdparanoia is
>> working OK WRT ripping) and if it weren't for the printk_ratelimit, the
>> system would freeze.
>>
>> I don't know if cdparanoia should be fixed, but certainly the
>> warning could be issued only if CONFIG_SCSI_VERBOSE is set. This way you
>> will have the message if something goes wrong and you want more info,
>> but in cases where the warning is harmless your log will be clean...
>>
>> Anyway, this message is not for making suggestions, but for asking for
>> information: why is this warning happening? naughty cdparanoia? naughty
>> kernel? I'm a bit confused and I want to use my external DVD drive for
>> ripping from time to time, to "exercise" it...
>>
>> Thanks a lot in advance :)
>>
>> Raúl Núñez de Arenas Coronado
>>
> 
> This question is better asked at lsml.  (Therefore I'm quoting in full.)

In Fedora 7 I see this:

# cdparanoia --version
cdparanoia III release 9.8 (March 23, 2001)
(C) 2001 Monty <[EMAIL PROTECTED]> and Xiphophorus

Report bugs to [EMAIL PROTECTED]
http://www.xiph.org/paranoia/

So, given that date, lk 2.4.2 was out but it was probably
a bit early to start using the sg version 3 interface
which first appeared in lk 2.4.1 . So that "lets annoy
the user" message was added by someone who got burnt by
the old sg version 2 interface and decided people needed
to be warned. The warning comes from this code in sg.c:

/*
 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
 * but it is possible that the app intended SG_DXFER_TO_DEV, because there
 * is a non-zero input_size, so emit a warning.
 */
if (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV)
        if (printk_ratelimit())
                printk(KERN_WARNING
                       "sg_write: data in/out %d/%d bytes for SCSI command 0x%x--"
                       "guessing data in;\n" KERN_WARNING "   "
                       "program %s not setting count and/or reply_len properly\n",
                       old_hdr.reply_len - (int)SZ_SG_HEADER,
                       input_size, (unsigned int) cmnd[0],
                       current->comm);

That code wasn't written by me and I would gladly remove it.
For anyone who has read the sg driver documentation,
SG_DXFER_TO_FROM_DEV implies a _read_ from the device. The
reason SG_DXFER_TO_FROM_DEV exists is for backward
compatibility to the sg version 1 interface. It was a hack to
get around the fact that the SCSI subsystem didn't report short
reads (what folks should use 'resid' for) back in those days.

It is probably about time that cdparanoia was updated ...

Doug Gilbert


[PATCH] [2.6.22] Remove unneeded pointer idev from addrconf_cleanup() in net/ipv6/addrconf.c

2007-07-08 Thread Micah Gruber


This trivial patch removes the unneeded pointer idev returned from
__in6_dev_get(), which is never used. The check for NULL can be simply
done by if (__in6_dev_get(dev) == NULL).

Signed-off-by: Micah Gruber < [EMAIL PROTECTED]>

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4240,7 +4240,6 @@
void __exit addrconf_cleanup(void)
{
   struct net_device *dev;
-   struct inet6_dev *idev;
   struct inet6_ifaddr *ifa;
   int i;

@@ -4258,7 +4257,7 @@
*/

   for_each_netdev(dev) {
-   if ((idev = __in6_dev_get(dev)) == NULL)
+   if (__in6_dev_get(dev) == NULL)
   continue;
   addrconf_ifdown(dev, 1);
   }

[PATCH] [2.6.22] Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c

2007-07-08 Thread Micah Gruber


This patch fixes a potential null dereference bug where we dereference
nic before a null check. This patch simply moves the dereferencing
after the null check.

Signed-off-by: Micah Gruber < [EMAIL PROTECTED]>

--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -789,12 +789,14 @@
   struct mac_info *mac_control;
   struct config_param *config;
   int lst_size, lst_per_page;
-   struct net_device *dev = nic->dev;
+   struct net_device *dev;
   int page_num = 0;

   if (!nic)
   return;

+   dev = nic->dev;
+
   mac_control = &nic->mac_control;
   config = &nic->config;

Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c

2007-07-08 Thread Jeff Garzik


Micah Gruber wrote:
> This patch fixes a potential null dereference bug where we dereference
> nic before a null check. This patch simply moves the dereferencing after
> the null check.
>
> Signed-off-by: Micah Gruber < [EMAIL PROTECTED]>


any chance you can resend in an email format other than format=flowed?

Jeff




Re: [Hdaps-devel] [PATCH] hdaps - switch to using input-polldev

2007-07-08 Thread Dmitry Torokhov

On Sunday 08 July 2007 21:00, Shem Multinymous wrote:
> On 5/25/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> > HWMON: hdaps - convert to use input-polldev.
> >
> > Switch to using input-polldev skeleton instead of implementing
> > polling loop by itself. This also fixes problem with trylock
> > on a mutex in atomic context.
> >
> > Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
> 
> There's a couple of inherent problems with this patch (now in -mm).
> 
> First, the hdaps driver regularly polls the embedded controller, which
> in turns regularly polls the hardware. If the two polling rates differ
> or fluctuate, we lose events.

That was the case with the original driver as well, but instead of a
rearming workqueue it was using a rearming timer.

> AFAICT, the delayed workqueues used by 
> input-polldev can get very laggy under load. That's very bad for
> sensitive clients like hdapsd (the hard disk shock protection daemon).
>

input-polldev uses a separate workqueue, not keventd, and so should not
suffer from other workqueue users loading keventd. But if entire box
is under stress then workqueue vs timer context does not matter much -
your daemon which is in userspace may not get to run in a timely manner
anyway.

However I am open to bumping up priority of ipolldevd a little.

> Second, this is incompatible with the much-needed addition of a 2nd
> input device relying on the same data. The existing hdaps input device
> does "joystick emulation", i.e., reports values after calibration and
> fuzzing. Userspace programs that need the raw data, like hdapsd,
> currently have to poll the sysfs attribute, which is inefficient,
> lag-prone and induces unnecessary interrupts on tickless systems. To
> solve this we'll have to add a 2nd input device to hdaps, for
> reporting the raw accelerometer data. (Michael Riepe and me are now
> working on such a patch.) But these two input devices need to share
> their polling of the underlying EC hardware, and this is impossible
> using input-polldev.

I am curious why you can't use the current device, since the calibration
done in hdaps does not alter the scale but merely moves '0' point around.
And fuzz should only remove small jitters, not rapidly changing data
that you should get when your box is falling.

However nothing stops you from generating events for the 2nd input
device from the same polling function that generates events for the
first device.

> 
> Since this patch will degrade accuracy and will eventually be
> reverted anyway, I suggest retracting it.

I have not seen anything in your mail that would warrant reverting
the patch.

> 
> As for the mutex in atomic context issue, isn't it best addressed by
> making mutex_trylock() do the sensible thing in softirq?
> 
>   Shem
> 

-- 
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 11/11] security: unexport mmap_min_addr

2007-07-08 Thread James Morris

From: Adrian Bunk <[EMAIL PROTECTED]>

Remove unneeded export.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/security.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/security/security.c b/security/security.c
index 024484f..27e5863 100644
--- a/security/security.c
+++ b/security/security.c
@@ -177,5 +177,4 @@ EXPORT_SYMBOL_GPL(register_security);
 EXPORT_SYMBOL_GPL(unregister_security);
 EXPORT_SYMBOL_GPL(mod_reg_security);
 EXPORT_SYMBOL_GPL(mod_unreg_security);
-EXPORT_SYMBOL_GPL(mmap_min_addr);
 EXPORT_SYMBOL(security_ops);
-- 
1.5.0.6


[PATCH 07/11] SELinux: allow preemption between transition permission checks

2007-07-08 Thread James Morris

From: Stephen Smalley <[EMAIL PROTECTED]>

In security_get_user_sids, move the transition permission checks
outside of the section holding the policy rdlock, and use the AVC to
perform the checks, calling cond_resched after each one.  These
changes should allow preemption between the individual checks and
enable caching of the results.  It may however increase the overall
time spent in the function in some cases, particularly in the cache
miss case.

The long term fix will be to take much of this logic to userspace by
exporting additional state via selinuxfs, and ultimately deprecating
and eliminating this interface from the kernel.

Tested-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by:  Stephen Smalley <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/avc.c |   10 +---
 security/selinux/hooks.c   |9 ---
 security/selinux/include/avc.h |6 +++-
 security/selinux/ss/services.c |   49 ---
 4 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index e4396a8..cc5fcef 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -832,6 +832,7 @@ int avc_ss_reset(u32 seqno)
  * @tsid: target security identifier
  * @tclass: target security class
  * @requested: requested permissions, interpreted based on @tclass
+ * @flags:  AVC_STRICT or 0
  * @avd: access vector decisions
  *
  * Check the AVC to determine whether the @requested permissions are granted
@@ -846,8 +847,9 @@ int avc_ss_reset(u32 seqno)
  * should be released for the auditing.
  */
 int avc_has_perm_noaudit(u32 ssid, u32 tsid,
- u16 tclass, u32 requested,
- struct av_decision *avd)
+u16 tclass, u32 requested,
+unsigned flags,
+struct av_decision *avd)
 {
struct avc_node *node;
struct avc_entry entry, *p_ae;
@@ -874,7 +876,7 @@ int avc_has_perm_noaudit(u32 ssid, u32 tsid,
denied = requested & ~(p_ae->avd.allowed);
 
if (!requested || denied) {
-   if (selinux_enforcing)
+   if (selinux_enforcing || (flags & AVC_STRICT))
rc = -EACCES;
else
if (node)
@@ -909,7 +911,7 @@ int avc_has_perm(u32 ssid, u32 tsid, u16 tclass,
struct av_decision avd;
int rc;
 
-   rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, &avd);
+   rc = avc_has_perm_noaudit(ssid, tsid, tclass, requested, 0, &avd);
avc_audit(ssid, tsid, tclass, requested, &avd, rc, auditdata);
return rc;
 }
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ad8dd4e..b29059e 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1592,9 +1592,10 @@ static int selinux_vm_enough_memory(long pages)
rc = secondary_ops->capable(current, CAP_SYS_ADMIN);
if (rc == 0)
rc = avc_has_perm_noaudit(tsec->sid, tsec->sid,
-   SECCLASS_CAPABILITY,
-   CAP_TO_MASK(CAP_SYS_ADMIN),
-   NULL);
+ SECCLASS_CAPABILITY,
+ CAP_TO_MASK(CAP_SYS_ADMIN),
+ 0,
+ NULL);
 
if (rc == 0)
cap_sys_admin = 1;
@@ -4626,7 +4627,7 @@ static int selinux_setprocattr(struct task_struct *p,
if (p->ptrace & PT_PTRACED) {
error = avc_has_perm_noaudit(tsec->ptrace_sid, sid,
 SECCLASS_PROCESS,
-PROCESS__PTRACE, );
+PROCESS__PTRACE, 0, );
if (!error)
tsec->sid = sid;
task_unlock(p);
diff --git a/security/selinux/include/avc.h b/security/selinux/include/avc.h
index 6ed10c3..e145f6e 100644
--- a/security/selinux/include/avc.h
+++ b/security/selinux/include/avc.h
@@ -102,9 +102,11 @@ void avc_audit(u32 ssid, u32 tsid,
u16 tclass, u32 requested,
struct av_decision *avd, int result, struct avc_audit_data 
*auditdata);
 
+#define AVC_STRICT 1 /* Ignore permissive mode. */
 int avc_has_perm_noaudit(u32 ssid, u32 tsid,
- u16 tclass, u32 requested,
- struct av_decision *avd);
+u16 tclass, u32 requested,
+unsigned flags,
+struct av_decision *avd);
 
 int avc_has_perm(u32 ssid, u32 tsid,
  u16 tclass, u32 requested,
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index e4249ad..b5f017f 100644
---

[PATCH 09/11] security: Protection for exploiting null dereference using mmap

2007-07-08 Thread James Morris

From: Eric Paris <[EMAIL PROTECTED]>

Add a new security check on mmap operations to see if the user is attempting
to mmap to the low area of the address space.  The amount of space protected is
indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
0, preserving existing behavior.
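
As a rough illustration of the address check only, here is a userspace
model under stated assumptions: check_mmap_addr() and its min_addr
parameter are hypothetical names, the -EACCES return value is assumed, and
the SELinux permission check described in the next paragraph is not
modelled at all.

```c
#include <errno.h>

/*
 * Model of the low-address check: refuse any mapping that would start
 * below the configured minimum.  min_addr plays the role of the
 * /proc/sys/vm/mmap_min_addr tunable; a value of 0 disables the check.
 */
static int check_mmap_addr(unsigned long addr, unsigned long min_addr)
{
    if (addr < min_addr)
        return -EACCES;   /* assumed error code for this sketch */
    return 0;
}
```

With min_addr set to 0, every address passes, which matches the stated
default of preserving existing behavior; with 64k, a request at address
4096 is refused while one at 65536 is allowed.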

This patch uses a new SELinux security class "memprotect."  Policy already
contains a number of allow rules like a_t self:process * (unconfined_t being
one of them) which mean that putting this check in the process class (its
best current fit) would make it useless as all user processes, which we also
want to protect against, would be allowed. By taking the memprotect name of
the new class it will also make it possible for us to move some of the other
memory protect permissions out of 'process' and into the new class next time
we bump the policy version number (which I also think is a good future idea)

Acked-by: Stephen Smalley <[EMAIL PROTECTED]>
Signed-off-by: Eric Paris <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 Documentation/sysctl/vm.txt  |   15 +++
 include/linux/security.h |   17 -
 kernel/sysctl.c  |   10 ++
 mm/mmap.c|4 ++--
 mm/mremap.c  |   13 +++--
 mm/nommu.c   |2 +-
 security/dummy.c |6 +-
 security/security.c  |2 ++
 security/selinux/hooks.c |   12 
 security/selinux/include/av_perm_to_string.h |1 +
 security/selinux/include/av_permissions.h|1 +
 security/selinux/include/class_to_string.h   |1 +
 security/selinux/include/flask.h |1 +
 13 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 1d19256..8cfca17 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/vm:
 - min_unmapped_ratio
 - min_slab_ratio
 - panic_on_oom
+- mmap_min_address
 
 ==
 
@@ -216,3 +217,17 @@ above-mentioned.
 The default value is 0.
 1 and 2 are for failover of clustering. Please select either
 according to your policy of failover.
+
+==
+
+mmap_min_addr
+
+This file indicates the amount of address space  which a user process will
+be restricted from mmaping.  Since kernel null dereference bugs could
+accidentally operate based on the information in the first couple of pages
+of memory userspace processes should not be allowed to write to them.  By
+default this value is set to 0 and no protections will be enforced by the
+security module.  Setting this value to something like 64k will allow the
+vast majority of applications to work correctly and provide defense in depth
+against future potential kernel bugs.
+
diff --git a/include/linux/security.h b/include/linux/security.h
index 9eb9e0f..c11dc8a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -71,6 +71,7 @@ struct xfrm_user_sec_ctx;
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
 extern int cap_netlink_recv(struct sk_buff *skb, int cap);
 
+extern unsigned long mmap_min_addr;
 /*
  * Values used in the task_security_ops calls
  */
@@ -1241,8 +1242,9 @@ struct security_operations {
int (*file_ioctl) (struct file * file, unsigned int cmd,
   unsigned long arg);
int (*file_mmap) (struct file * file,
- unsigned long reqprot,
- unsigned long prot, unsigned long flags);
+ unsigned long reqprot, unsigned long prot,
+ unsigned long flags, unsigned long addr,
+ unsigned long addr_only);
int (*file_mprotect) (struct vm_area_struct * vma,
  unsigned long reqprot,
  unsigned long prot);
@@ -1814,9 +1816,12 @@ static inline int security_file_ioctl (struct file 
*file, unsigned int cmd,
 
 static inline int security_file_mmap (struct file *file, unsigned long reqprot,
  unsigned long prot,
- unsigned long flags)
+ unsigned long flags,
+ unsigned long addr,
+ unsigned long addr_only)
 {
-   return security_ops->file_mmap (file, reqprot, prot, flags);
+   return security_ops->file_mmap (file, reqprot, prot, flags, addr,
+   addr_only);
 }
 
 static inline int security_file_mprotect (struct vm_area_struct *vma,
@@ -2489,7 +2494,9 @@ static inline int

[PATCH 10/11] SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel

2007-07-08 Thread James Morris

From: Paul Moore <[EMAIL PROTECTED]>

These changes will make NetLabel behave like labeled IPsec where there is an
access check for both labeled and unlabeled packets as well as providing the
ability to restrict domains to receiving only labeled packets when NetLabel
is in use.  The changes to the policy are straightforward with the
following necessary to receive labeled traffic (with SECINITSID_NETMSG
defined as "netlabel_peer_t"):

 allow mydom_t netlabel_peer_t:{ tcp_socket udp_socket rawip_socket } recvfrom;

The policy for unlabeled traffic would be:

 allow mydom_t unlabeled_t:{ tcp_socket udp_socket rawip_socket } recvfrom;

These policy changes, as well as more general NetLabel support, are included
in the SELinux Reference Policy SVN tree, r2352 or later.  Users who enable
NetLabel support in the kernel are strongly encouraged to upgrade their
policy to avoid network problems.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/hooks.c|   21 +++--
 security/selinux/netlabel.c |   34 +-
 2 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 78c3f98..aff8f46 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3129,17 +3129,19 @@ static int selinux_parse_skb(struct sk_buff *skb, 
struct avc_audit_data *ad,
 /**
  * selinux_skb_extlbl_sid - Determine the external label of a packet
  * @skb: the packet
- * @base_sid: the SELinux SID to use as a context for MLS only external labels
  * @sid: the packet's SID
  *
  * Description:
  * Check the various different forms of external packet labeling and determine
- * the external SID for the packet.
+ * the external SID for the packet.  If only one form of external labeling is
+ * present then it is used, if both labeled IPsec and NetLabel labels are
+ * present then the SELinux type information is taken from the labeled IPsec
+ * SA and the MLS sensitivity label information is taken from the NetLabel
+ * security attributes.  This bit of "magic" is done in the call to
+ * selinux_netlbl_skbuff_getsid().
  *
  */
-static void selinux_skb_extlbl_sid(struct sk_buff *skb,
-  u32 base_sid,
-  u32 *sid)
+static void selinux_skb_extlbl_sid(struct sk_buff *skb, u32 *sid)
 {
u32 xfrm_sid;
u32 nlbl_sid;
@@ -3147,10 +3149,9 @@ static void selinux_skb_extlbl_sid(struct sk_buff *skb,
selinux_skb_xfrm_sid(skb, &xfrm_sid);
if (selinux_netlbl_skbuff_getsid(skb,
 (xfrm_sid == SECSID_NULL ?
- base_sid : xfrm_sid),
+ SECINITSID_NETMSG : xfrm_sid),
 &nlbl_sid) != 0)
nlbl_sid = SECSID_NULL;
-
*sid = (nlbl_sid == SECSID_NULL ? xfrm_sid : nlbl_sid);
 }
 
@@ -3695,7 +3696,7 @@ static int selinux_socket_getpeersec_dgram(struct socket 
*sock, struct sk_buff *
if (sock && sock->sk->sk_family == PF_UNIX)
selinux_get_inode_sid(SOCK_INODE(sock), &peer_secid);
else if (skb)
-   selinux_skb_extlbl_sid(skb, SECINITSID_UNLABELED, &peer_secid);
+   selinux_skb_extlbl_sid(skb, &peer_secid);
 
if (peer_secid == SECSID_NULL)
err = -EINVAL;
@@ -3756,7 +3757,7 @@ static int selinux_inet_conn_request(struct sock *sk, 
struct sk_buff *skb,
u32 newsid;
u32 peersid;
 
-   selinux_skb_extlbl_sid(skb, SECINITSID_UNLABELED, &peersid);
+   selinux_skb_extlbl_sid(skb, &peersid);
if (peersid == SECSID_NULL) {
req->secid = sksec->sid;
req->peer_secid = SECSID_NULL;
@@ -3794,7 +3795,7 @@ static void selinux_inet_conn_established(struct sock *sk,
 {
struct sk_security_struct *sksec = sk->sk_security;
 
-   selinux_skb_extlbl_sid(skb, SECINITSID_UNLABELED, &sksec->peer_sid);
+   selinux_skb_extlbl_sid(skb, &sksec->peer_sid);
 }
 
 static void selinux_req_classify_flow(const struct request_sock *req,
diff --git a/security/selinux/netlabel.c b/security/selinux/netlabel.c
index e64eca2..8192e8b 100644
--- a/security/selinux/netlabel.c
+++ b/security/selinux/netlabel.c
@@ -158,9 +158,7 @@ int selinux_netlbl_skbuff_getsid(struct sk_buff *skb, u32 
base_sid, u32 *sid)
netlbl_secattr_init(&secattr);
rc = netlbl_skbuff_getattr(skb, &secattr);
if (rc == 0 && secattr.flags != NETLBL_SECATTR_NONE)
-   rc = security_netlbl_secattr_to_sid(&secattr,
-   base_sid,
-   sid);
+   rc = security_netlbl_secattr_to_sid(&secattr, base_sid, sid);
else
*sid = SECSID_NULL;
netlbl_secattr_destroy(&secattr);
@@ -198,7 +196,7 @@ void selinux_netlbl_sock_graft(struct sock *sk, struct 
socket *sock)
if

[PATCH 06/11] selinux: introduce schedule points in policydb_destroy()

2007-07-08 Thread James Morris

From: Eric Paris <[EMAIL PROTECTED]>

During the LSPP testing we found that it was possible for
policydb_destroy() to take 10+ seconds of kernel time to complete.
Basically all policydb_destroy() does is walk some (possibly long) lists
and free the memory it finds.  Turning off slab debugging config options
made the problem go away since the actual functions which took most of
the time were (as seen by oprofile)

> 121202   23.9879  .check_poison_obj
> 78247    15.4864  .check_slabp

were caused by that.  So I decided to also add some voluntary schedule
points in that code so config voluntary preempt would be enough to solve
the problem.  Something similar was done in places like
shmem_free_pages() when we have to walk a list of memory and free it.
This was tested by the LSPP group on the hardware which could reproduce
the problem just loading a new policy and was found to not trigger the
softlock detector.  It takes just as much processing time, but the
kernel doesn't spend all that time stuck doing one thing and never
scheduling.
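
The pattern can be sketched in userspace as follows. This is only an
analogy under stated assumptions: sched_yield() stands in for the kernel's
cond_resched(), and struct node and destroy_list() are hypothetical names
that do not appear in policydb.c.

```c
#include <sched.h>
#include <stdlib.h>

/* Minimal singly linked list node (hypothetical). */
struct node { struct node *next; };

/*
 * Free every node, offering the CPU to other runnable tasks once per
 * iteration.  The total work is unchanged; it is just no longer done
 * in one uninterrupted stretch.  Returns the number of nodes freed.
 */
static size_t destroy_list(struct node *head)
{
    size_t freed = 0;

    while (head) {
        struct node *next = head->next;

        sched_yield();   /* userspace stand-in for cond_resched() */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```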

Someday a better way to handle memory might make the time needed in this
function a lot less, but this fixes the current issue as it stands
today.

Signed-off-by: Eric Paris <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/ss/policydb.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index 0ac1021..f05f97a 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -21,6 +21,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -598,6 +599,7 @@ void policydb_destroy(struct policydb *p)
struct range_trans *rt, *lrt = NULL;
 
for (i = 0; i < SYM_NUM; i++) {
+   cond_resched();
hashtab_map(p->symtab[i].table, destroy_f[i], NULL);
hashtab_destroy(p->symtab[i].table);
}
@@ -612,6 +614,7 @@ void policydb_destroy(struct policydb *p)
avtab_destroy(&p->te_avtab);
 
for (i = 0; i < OCON_NUM; i++) {
+   cond_resched();
c = p->ocontexts[i];
while (c) {
ctmp = c;
@@ -623,6 +626,7 @@ void policydb_destroy(struct policydb *p)
 
g = p->genfs;
while (g) {
+   cond_resched();
kfree(g->fstype);
c = g->head;
while (c) {
@@ -639,18 +643,21 @@ void policydb_destroy(struct policydb *p)
cond_policydb_destroy(p);
 
for (tr = p->role_tr; tr; tr = tr->next) {
+   cond_resched();
kfree(ltr);
ltr = tr;
}
kfree(ltr);
 
for (ra = p->role_allow; ra; ra = ra -> next) {
+   cond_resched();
kfree(lra);
lra = ra;
}
kfree(lra);
 
for (rt = p->range_tr; rt; rt = rt -> next) {
+   cond_resched();
if (lrt) {
ebitmap_destroy(&lrt->target_range.level[0].cat);
ebitmap_destroy(&lrt->target_range.level[1].cat);
-- 
1.5.0.6


[PATCH 08/11] SELinux: Use %lu for inode->i_no when printing avc

2007-07-08 Thread James Morris

From: Tobias Oed <[EMAIL PROTECTED]>

Inode numbers are unsigned long and so need %lu as the printf format string.
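
A small userspace sketch of the difference, assuming a hypothetical helper
format_ino() (the patch itself changes a call to audit_log_format()): with
%lu a value above LONG_MAX is printed as a large positive number, whereas
%ld would typically show it as negative.

```c
#include <stdio.h>

/*
 * Format an inode-like number with the conversion that matches its type.
 * Returns the number of characters written, as snprintf() does.
 */
static int format_ino(char *buf, size_t len, unsigned long ino)
{
    return snprintf(buf, len, "ino=%lu", ino);
}
```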

Signed-off-by: Tobias Oed <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/avc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index cc5fcef..78c408f 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -586,7 +586,7 @@ void avc_audit(u32 ssid, u32 tsid,
}
}
if (inode)
-   audit_log_format(ab, " dev=%s ino=%ld",
+   audit_log_format(ab, " dev=%s ino=%lu",
 inode->i_sb->s_id,
 inode->i_ino);
break;
-- 
1.5.0.6


[PATCH 04/11] selinux: add selinuxfs structure for object class discovery

2007-07-08 Thread James Morris

From: Christopher J. PeBenito <[EMAIL PROTECTED]>

The structure is as follows (relative to selinuxfs root):

/class/file/index
/class/file/perms/read
/class/file/perms/write
...

Each class is allocated 33 inodes, 1 for the class index and 32 for
permissions.  Relative to SEL_CLASS_INO_OFFSET, the inode of the index file
DIV 33 is the class number.  The inode of the permission file % 33 is the
index of the permission for that class.
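
The numbering scheme can be sketched in plain C as below. The helper names
are hypothetical, and the two constants are assumptions: the values of
SEL_CLASS_INO_OFFSET and SEL_INO_MASK appear truncated in this archive
copy of the diff, so 0x04000000 and 0x00ffffff are used here purely for
illustration.

```c
#include <assert.h>

#define VEC_MAX        32u           /* SEL_VEC_MAX in the patch */
#define CLASS_INO_BASE 0x04000000ul  /* assumed offset value */
#define INO_MASK       0x00fffffful  /* assumed mask value */

/* Inode of a class's index file: slot 0 of its 33-inode block. */
static unsigned long class_to_ino(unsigned class_id)
{
    return (class_id * (VEC_MAX + 1)) | CLASS_INO_BASE;
}

/* Inode of a permission file: slots 1..32 of the class's block. */
static unsigned long perm_to_ino(unsigned class_id, unsigned perm)
{
    assert(perm >= 1 && perm <= VEC_MAX);
    return (class_id * (VEC_MAX + 1) + perm) | CLASS_INO_BASE;
}

/* Integer division by 33 recovers the class number. */
static unsigned ino_to_class(unsigned long ino)
{
    return (ino & INO_MASK) / (VEC_MAX + 1);
}

/* The remainder modulo 33 recovers the permission index (0 = index file). */
static unsigned ino_to_perm(unsigned long ino)
{
    return (ino & INO_MASK) % (VEC_MAX + 1);
}
```

For example, class 3 and permission 5 give slot 3 * 33 + 5 = 104 within
the class range, and 104 / 33 = 3 with remainder 5 recovers both values.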

Signed-off-by: Christopher J. PeBenito <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/include/security.h |1 +
 security/selinux/selinuxfs.c|  249 +++
 2 files changed, 250 insertions(+), 0 deletions(-)

diff --git a/security/selinux/include/security.h 
b/security/selinux/include/security.h
index 731a173..83bdd4d 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -41,6 +41,7 @@ extern int selinux_mls_enabled;
 
 int security_load_policy(void * data, size_t len);
 
+#define SEL_VEC_MAX 32
 struct av_decision {
u32 allowed;
u32 decided;
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index cf1acde..c9e92da 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -67,6 +67,10 @@ static struct dentry *bool_dir = NULL;
 static int bool_num = 0;
 static int *bool_pending_values = NULL;
 
+/* global data for classes */
+static struct dentry *class_dir = NULL;
+static unsigned long last_class_ino;
+
 extern void selnl_notify_setenforce(int val);
 
 /* Check whether a task is allowed to use a security operation. */
@@ -106,6 +110,7 @@ static unsigned long sel_last_ino = SEL_INO_NEXT - 1;
 
 #define SEL_INITCON_INO_OFFSET 0x0100
 #define SEL_BOOL_INO_OFFSET0x0200
+#define SEL_CLASS_INO_OFFSET   0x0400
 #define SEL_INO_MASK   0x00ff
 
 #define TMPBUFLEN  12
@@ -237,6 +242,11 @@ static const struct file_operations sel_policyvers_ops = {
 
 /* declaration for sel_write_load */
 static int sel_make_bools(void);
+static int sel_make_classes(void);
+
+/* declaration for sel_make_class_dirs */
+static int sel_make_dir(struct inode *dir, struct dentry *dentry,
+   unsigned long *ino);
 
 static ssize_t sel_read_mls(struct file *filp, char __user *buf,
size_t count, loff_t *ppos)
@@ -287,10 +297,18 @@ static ssize_t sel_write_load(struct file * file, const 
char __user * buf,
goto out;
 
ret = sel_make_bools();
+   if (ret) {
+   length = ret;
+   goto out1;
+   }
+
+   ret = sel_make_classes();
if (ret)
length = ret;
else
length = count;
+
+out1:
audit_log(current->audit_context, GFP_KERNEL, AUDIT_MAC_POLICY_LOAD,
"policy loaded auid=%u",
audit_get_loginuid(current->audit_context));
@@ -1293,6 +1311,225 @@ out:
return ret;
 }
 
+static inline unsigned int sel_div(unsigned long a, unsigned long b)
+{
+   return a / b - (a % b < 0);
+}
+
+static inline unsigned long sel_class_to_ino(u16 class)
+{
+   return (class * (SEL_VEC_MAX + 1)) | SEL_CLASS_INO_OFFSET;
+}
+
+static inline u16 sel_ino_to_class(unsigned long ino)
+{
+   return sel_div(ino & SEL_INO_MASK, SEL_VEC_MAX + 1);
+}
+
+static inline unsigned long sel_perm_to_ino(u16 class, u32 perm)
+{
+   return (class * (SEL_VEC_MAX + 1) + perm) | SEL_CLASS_INO_OFFSET;
+}
+
+static inline u32 sel_ino_to_perm(unsigned long ino)
+{
+   return (ino & SEL_INO_MASK) % (SEL_VEC_MAX + 1);
+}
+
+static ssize_t sel_read_class(struct file * file, char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   ssize_t rc, len;
+   char *page;
+   unsigned long ino = file->f_path.dentry->d_inode->i_ino;
+
+   page = (char *)__get_free_page(GFP_KERNEL);
+   if (!page) {
+   rc = -ENOMEM;
+   goto out;
+   }
+
+   len = snprintf(page, PAGE_SIZE, "%d", sel_ino_to_class(ino));
+   rc = simple_read_from_buffer(buf, count, ppos, page, len);
+   free_page((unsigned long)page);
+out:
+   return rc;
+}
+
+static const struct file_operations sel_class_ops = {
+   .read   = sel_read_class,
+};
+
+static ssize_t sel_read_perm(struct file * file, char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   ssize_t rc, len;
+   char *page;
+   unsigned long ino = file->f_path.dentry->d_inode->i_ino;
+
+   page = (char *)__get_free_page(GFP_KERNEL);
+   if (!page) {
+   rc = -ENOMEM;
+   goto out;
+   }
+
+   len = snprintf(page, PAGE_SIZE,"%d", sel_ino_to_perm(ino));
+   rc = simple_read_from_buffer(buf, count, ppos, page, len);
+   free_page((unsigned long)page);
+out:
+   return rc;
+}
+
+static const struct file_operations sel_perm_ops

[PATCH 05/11] security: revalidate rw permissions for sys_splice and sys_vmsplice

2007-07-08 Thread James Morris

Revalidate read/write permissions for splice(2) and vmsplice(2), in case 
security policy has changed since the files were opened.

Signed-off-by: James Morris <[EMAIL PROTECTED]>
Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>
Acked-by: Stephen Smalley <[EMAIL PROTECTED]>
---
 fs/splice.c |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index e7d7080..98025ec 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct partial_page {
unsigned int offset;
@@ -931,6 +932,10 @@ static long do_splice_from(struct pipe_inode_info *pipe, 
struct file *out,
if (unlikely(ret < 0))
return ret;
 
+   ret = security_file_permission(out, MAY_WRITE);
+   if (unlikely(ret < 0))
+   return ret;
+
return out->f_op->splice_write(pipe, out, ppos, len, flags);
 }
 
@@ -953,6 +958,10 @@ static long do_splice_to(struct file *in, loff_t *ppos,
if (unlikely(ret < 0))
return ret;
 
+   ret = security_file_permission(in, MAY_READ);
+   if (unlikely(ret < 0))
+   return ret;
+
return in->f_op->splice_read(in, ppos, pipe, len, flags);
 }
 
@@ -1271,6 +1280,7 @@ static int get_iovec_page_array(const struct iovec __user 
*iov,
 static long do_vmsplice(struct file *file, const struct iovec __user *iov,
unsigned long nr_segs, unsigned int flags)
 {
+   long err;
struct pipe_inode_info *pipe;
struct page *pages[PIPE_BUFFERS];
struct partial_page partial[PIPE_BUFFERS];
@@ -1289,6 +1299,10 @@ static long do_vmsplice(struct file *file, const struct 
iovec __user *iov,
else if (unlikely(!nr_segs))
return 0;
 
+   err = security_file_permission(file, MAY_WRITE);
+   if (unlikely(err < 0))
+   return err;
+
spd.nr_pages = get_iovec_page_array(iov, nr_segs, pages, partial,
flags & SPLICE_F_GIFT);
if (spd.nr_pages <= 0)
-- 
1.5.0.6


Re: [patch 09/10] Remove the SLOB allocator for 2.6.23

2007-07-08 Thread Nick Piggin


Andrew Morton wrote:
> On Sun, 8 Jul 2007 09:51:19 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > A year ago the -rt kernel defaulted to the SLOB for a few releases, and
> > barring some initial scalability issues (which were solved in -rt) it
> > worked pretty well on generic PCs, so i dont buy the 'it doesnt work'
> > argument either.
>
> I don't think a saving of a few k of text would justify slob's retention.

Probably not.

> A reason for retaining slob would be that it has some O(n) memory saving
> due to better packing, etc.  Indeed that was the reason for merging it in
> the first place.  If slob no longer retains that advantage (wrt slub) then
> we no longer need it.

SLOB contains several significant O(1) and also O(n) memory savings that
are so far impossible-by-design for SLUB. They are: slab external
fragmentation is significantly reduced; kmalloc internal fragmentation is
significantly reduced; order of magnitude smaller kmem_cache data type;
order of magnitude less code...

Actually with an unscientific test boot of a semi-stripped down kernel and
mem=16MB, SLOB (the version in -mm) uses 400K less than SLUB (or a full 50%
more RAM free after booting into bash as the init).

Now it's not for me to say that this is significant enough to make SLOB
worth keeping, or what sort of results it yields in the field, so I cc'ed
Denis who is the busybox maintainer, and Erik who is the uClibc maintainer,
in case they have anything to add.

> Guys, look at this the other way.  Suppose we only had slub, and someone
> came along and said "here's a whole new allocator which saves 4.5k of
> text", would we merge it on that basis?  Hell no, it's not worth it.  What
> we might do is to get motivated to see if we can make slub less porky under
> appropriate config settings.

In light of Denis's recent statement I saw "In busybox project people
can kill for 1.7K", there might be a mass killing of kernel developers
in Cambridge this year if SLOB gets removed ;)

Joking aside, the last time this came up, I thought we concluded that
removal of SLOB would be a severe regression for a significant userbase.

> Let's not get sentimental about these things: in general, if there's any
> reasonable way in which we can rid ourselves of any code at all, we should
> do so, no?

Definitely. And this is exactly what we said last time as well. If the
small memory embedded guys are happy for SLOB to go, then I'm happy too.
So, the relevant question is -- are most/all current SLOB users now
happy to switch over to SLUB, in light of the recent advances to both
allocators?

--
SUSE Labs, Novell Inc.

[PATCH 03/11] selinux: change sel_make_dir() to specify inode counter

2007-07-08 Thread James Morris

From: Christopher J. PeBenito <[EMAIL PROTECTED]>

Specify the inode counter explicitly in sel_make_dir(), rather than always
using sel_last_ino.
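
The change in calling convention can be illustrated with a small sketch,
assuming a hypothetical helper next_ino(): the caller passes a pointer to
whichever counter should be advanced, so independent counters can coexist
instead of everything sharing one global.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Advance the caller-supplied counter and return the new value,
 * mirroring the "++(*ino)" form used in the diff below.
 */
static unsigned long next_ino(unsigned long *counter)
{
    assert(counter != NULL);
    return ++(*counter);
}
```

Two counters advanced through the same helper stay independent, which is
what lets a later patch in the series number class directories separately.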

Signed-off-by: Christopher J. PeBenito <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/selinuxfs.c |   11 ++-
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index e955246..cf1acde 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1293,7 +1293,8 @@ out:
return ret;
 }
 
-static int sel_make_dir(struct inode *dir, struct dentry *dentry)
+static int sel_make_dir(struct inode *dir, struct dentry *dentry,
+   unsigned long *ino)
 {
int ret = 0;
struct inode *inode;
@@ -1305,7 +1306,7 @@ static int sel_make_dir(struct inode *dir, struct dentry 
*dentry)
}
inode->i_op = &simple_dir_inode_operations;
inode->i_fop = &simple_dir_operations;
-   inode->i_ino = ++sel_last_ino;
+   inode->i_ino = ++(*ino);
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inc_nlink(inode);
d_add(dentry, inode);
@@ -1351,7 +1352,7 @@ static int sel_fill_super(struct super_block * sb, void * 
data, int silent)
goto err;
}
 
-   ret = sel_make_dir(root_inode, dentry);
+   ret = sel_make_dir(root_inode, dentry, &sel_last_ino);
if (ret)
goto err;
 
@@ -1384,7 +1385,7 @@ static int sel_fill_super(struct super_block * sb, void * 
data, int silent)
goto err;
}
 
-   ret = sel_make_dir(root_inode, dentry);
+   ret = sel_make_dir(root_inode, dentry, &sel_last_ino);
if (ret)
goto err;
 
@@ -1398,7 +1399,7 @@ static int sel_fill_super(struct super_block * sb, void * 
data, int silent)
goto err;
}
 
-   ret = sel_make_dir(root_inode, dentry);
+   ret = sel_make_dir(root_inode, dentry, &sel_last_ino);
if (ret)
goto err;
 
-- 
1.5.0.6


[PATCH 01/11] selinux: add support for querying object classes and permissions from the running policy

2007-07-08 Thread James Morris

From: Christopher J. PeBenito <[EMAIL PROTECTED]>

Add support to the SELinux security server for obtaining a list of classes,
and for obtaining a list of permissions for a specified class.

Signed-off-by: Christopher J. PeBenito <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/include/security.h |3 +
 security/selinux/ss/services.c  |   95 +++
 2 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/security/selinux/include/security.h 
b/security/selinux/include/security.h
index b94378a..731a173 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -87,6 +87,9 @@ int security_validate_transition(u32 oldsid, u32 newsid, u32 
tasksid,
 
 int security_sid_mls_copy(u32 sid, u32 mls_sid, u32 *new_sid);
 
+int security_get_classes(char ***classes, int *nclasses);
+int security_get_permissions(char *class, char ***perms, int *nperms);
+
 #define SECURITY_FS_USE_XATTR  1 /* use xattr */
 #define SECURITY_FS_USE_TRANS  2 /* use transition SIDs, e.g. 
devpts/tmpfs */
 #define SECURITY_FS_USE_TASK   3 /* use task SIDs, e.g. pipefs/sockfs 
*/
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 40660ff..e4249ad 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -1996,6 +1996,101 @@ out:
return rc;
 }
 
+static int get_classes_callback(void *k, void *d, void *args)
+{
+   struct class_datum *datum = d;
+   char *name = k, **classes = args;
+   int value = datum->value - 1;
+
+   classes[value] = kstrdup(name, GFP_ATOMIC);
+   if (!classes[value])
+   return -ENOMEM;
+
+   return 0;
+}
+
+int security_get_classes(char ***classes, int *nclasses)
+{
+   int rc = -ENOMEM;
+
+   POLICY_RDLOCK;
+
+   *nclasses = policydb.p_classes.nprim;
+   *classes = kcalloc(*nclasses, sizeof(*classes), GFP_ATOMIC);
+   if (!*classes)
+   goto out;
+
+   rc = hashtab_map(policydb.p_classes.table, get_classes_callback,
+   *classes);
+   if (rc < 0) {
+   int i;
+   for (i = 0; i < *nclasses; i++)
+   kfree((*classes)[i]);
+   kfree(*classes);
+   }
+
+out:
+   POLICY_RDUNLOCK;
+   return rc;
+}
+
+static int get_permissions_callback(void *k, void *d, void *args)
+{
+   struct perm_datum *datum = d;
+   char *name = k, **perms = args;
+   int value = datum->value - 1;
+
+   perms[value] = kstrdup(name, GFP_ATOMIC);
+   if (!perms[value])
+       return -ENOMEM;
+
+   return 0;
+}
+
+int security_get_permissions(char *class, char ***perms, int *nperms)
+{
+   int rc = -ENOMEM, i;
+   struct class_datum *match;
+
+   POLICY_RDLOCK;
+
+   match = hashtab_search(policydb.p_classes.table, class);
+   if (!match) {
+       printk(KERN_ERR "%s:  unrecognized class %s\n",
+           __FUNCTION__, class);
+       rc = -EINVAL;
+       goto out;
+   }
+
+   *nperms = match->permissions.nprim;
+   *perms = kcalloc(*nperms, sizeof(*perms), GFP_ATOMIC);
+   if (!*perms)
+       goto out;
+
+   if (match->comdatum) {
+       rc = hashtab_map(match->comdatum->permissions.table,
+               get_permissions_callback, *perms);
+       if (rc < 0)
+           goto err;
+   }
+
+   rc = hashtab_map(match->permissions.table, get_permissions_callback,
+           *perms);
+   if (rc < 0)
+       goto err;
+
+out:
+   POLICY_RDUNLOCK;
+   return rc;
+
+err:
+   POLICY_RDUNLOCK;
+   for (i = 0; i < *nperms; i++)
+       kfree((*perms)[i]);
+   kfree(*perms);
+   return rc;
+}
+
 struct selinux_audit_rule {
u32 au_seqno;
struct context au_ctxt;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 02/11] selinux: rename sel_remove_bools() for more general usage.

2007-07-08 Thread James Morris

From: Christopher J. PeBenito <[EMAIL PROTECTED]>

sel_remove_bools() will also be used by the object class discovery, rename
it for more general use.

Signed-off-by: Christopher J. PeBenito <[EMAIL PROTECTED]>
Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 security/selinux/selinuxfs.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index aca099a..e955246 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -940,9 +940,8 @@ static const struct file_operations sel_commit_bools_ops = {
.write  = sel_commit_bools_write,
 };
 
-/* delete booleans - partial revoke() from
- * fs/proc/generic.c proc_kill_inodes */
-static void sel_remove_bools(struct dentry *de)
+/* partial revoke() from fs/proc/generic.c proc_kill_inodes */
+static void sel_remove_entries(struct dentry *de)
 {
struct list_head *p, *node;
struct super_block *sb = de->d_sb;
@@ -998,7 +997,7 @@ static int sel_make_bools(void)
kfree(bool_pending_values);
bool_pending_values = NULL;
 
-   sel_remove_bools(dir);
+   sel_remove_entries(dir);
 
if (!(page = (char*)get_zeroed_page(GFP_KERNEL)))
return -ENOMEM;
@@ -1048,7 +1047,7 @@ out:
return ret;
 err:
kfree(values);
-   sel_remove_bools(dir);
+   sel_remove_entries(dir);
ret = -ENOMEM;
goto out;
 }
-- 
1.5.0.6


[PATCH 0/11] SELinux patches for 2.6.23

2007-07-08 Thread James Morris

The following changes since commit 7dcca30a32aadb0520417521b0c44f42d09fe05c:
  Linus Torvalds (1):
Linux 2.6.22

are found in the git repository at:

  
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-linus

Adrian Bunk (1):
  security: unexport mmap_min_addr

Christopher J. PeBenito (4):
  selinux: add support for querying object classes and permissions from the running policy
  selinux: rename sel_remove_bools() for more general usage.
  selinux: change sel_make_dir() to specify inode counter.
  selinux: add selinuxfs structure for object class discovery

Eric Paris (2):
  selinux: introduce schedule points in policydb_destroy()
  security: Protection for exploiting null dereference using mmap

James Morris (1):
  security: revalidate rw permissions for sys_splice and sys_vmsplice

Paul Moore (1):
  SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel

Stephen Smalley (1):
  SELinux: allow preemption between transition permission checks

Tobias Oed (1):
  SELinux: Use %lu for inode->i_no when printing avc

 Documentation/sysctl/vm.txt  |   15 ++
 fs/splice.c  |   14 ++
 include/linux/security.h |   17 ++-
 kernel/sysctl.c  |   10 +
 mm/mmap.c|4 +-
 mm/mremap.c  |   13 +-
 mm/nommu.c   |2 +-
 security/dummy.c |6 +-
 security/security.c  |1 +
 security/selinux/avc.c   |   12 +-
 security/selinux/hooks.c |   42 +++--
 security/selinux/include/av_perm_to_string.h |1 +
 security/selinux/include/av_permissions.h|1 +
 security/selinux/include/avc.h   |6 +-
 security/selinux/include/class_to_string.h   |1 +
 security/selinux/include/flask.h |1 +
 security/selinux/include/security.h  |4 +
 security/selinux/netlabel.c  |   34 ++--
 security/selinux/selinuxfs.c |  269 +-
 security/selinux/ss/policydb.c   |7 +
 security/selinux/ss/services.c   |  144 --
 21 files changed, 518 insertions(+), 86 deletions(-)



-- 
James Morris
<[EMAIL PROTECTED]>

RE: [PATCH] ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-07-08 Thread Peer Chen

It's a rule for all our drivers, not just for Linux. Also, if we put the Maxtor 
drives on the libata blacklist, NCQ would be disabled for them on controllers 
from all other vendors, but we don't have strong evidence that those 
Maxtor HDDs are also broken on other controllers.


BRs
Peer Chen

-----Original Message-----
From: Zoltan Boszormenyi [mailto:[EMAIL PROTECTED] 
Sent: Saturday, July 07, 2007 3:33 PM
To: kuan luo
Cc: linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL 
PROTECTED]; [EMAIL PROTECTED]; Peer Chen; Kuan Luo
Subject: Re: [PATCH] ata: Add the SW NCQ support to sata_nv for 
MCP51/MCP55/MCP61

kuan luo wrote:
> From: Kuan Luo <[EMAIL PROTECTED]>
>
> Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA 
> controller.  NCQ function is disable by default, you can enable it 
> with 'swncq=1'. NCQ will be turned off if the drive is Maxtor on MCP51 
> or
> MCP55 rev 0xa2  platform.
>
> Signed-off-by: Kuan Luo <[EMAIL PROTECTED]>
> Signed-off-by: Peer Chen <[EMAIL PROTECTED]>
> ---

Thanks, I am using it on 2.6.22-rc7-git5, have run a stress test yesterday 
night.
It seems to be as stable as the previous version. I guess it's safe to turn it 
on by default when it gets into Linus' kernels.

I have a question though. Why is the blanket needed for Maxtor drives?
Can't it be narrowed down to certain models? Maybe those models should be put 
into libata blacklist...
Not that my current machine is affected, I am just curious.

Best regards,
Zoltán Böszörményi



Re: Please revert 21564fd2a3deb48200b595332f9ed4c9f311f2a7

2007-07-08 Thread Jeremy Fitzhardinge


Adrian Bunk wrote:
>>> Reverting is safe since it simply re-establishes the 2.6.21 status quo.
>>
>> Well, not really.  It breaks any non-GPL module when CONFIG_PARAVIRT is
>> enabled, even though the same module would work fine otherwise.  That's
>> a pretty large regression.
>> ...
>
> The 2.6.21 status quo can by definition not be a regression compared
> to 2.6.21.

2.6.21's behaviour was a bug.  CONFIG_PARAVIRT is not supposed to cause 
any behavioural changes.


   J

Re: [PATCH] fixup binutils printing from scripts/ver_linux

2007-07-08 Thread Gabriel C


Alexey Dobriyan wrote:
> On Mon, Jul 09, 2007 at 01:47:58AM +0200, Jesper Juhl wrote:
> > I just found that scripts/ver_linux does not print the binutils version 
> > properly on my Slackware 12.0 system.
>
> > --- a/scripts/ver_linux
> > +++ b/scripts/ver_linux
> > @@ -21,9 +21,8 @@ gcc --version 2>&1| grep gcc | awk \
> >  make --version 2>&1 | awk -F, '{print $1}' | awk \
> >        '/GNU Make/{print "Gnu make  ",$NF}'
> >  
> > -ld -v | awk -F\) '{print $1}' | awk \
> > -'/BFD/{print "binutils  ",$NF} \
> > -/^GNU/{print "binutils  ",$4}'
> > +echo "binutils   $(ld -v | awk -F \) \
> > +{'print $2'} | tr -d ' ')"
>
> NAK. It starts reporting here empty binutils field.
> FWIW,
>
> $ ld -v
> GNU ld version 2.16.1



Well the format changed so now we have :

$ ld -v
GNU ld (Linux/GNU Binutils) 2.17.50.0.16.20070511

Maybe something like this may work :

echo "binutils $(ld -v | tr -d [:alpha:] | sed 's/.*)/\//;s/\///g' | tr -d ' ')"



Regards,

Gabriel C
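
Both `ld -v` formats quoted in this thread put the version string in the last
whitespace-separated field, so one possible approach is simply to take the
final token of the line. The following is only a sketch of that idea, written
in C rather than shell to keep it easy to test in isolation; last_token() is
a hypothetical helper and is not part of scripts/ver_linux, and it assumes
future binutils releases keep the version as the last field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the last whitespace-separated token of `line` into `out`.
 * Returns 0 on success, -1 if the line holds no token or the token
 * does not fit in `out`.
 * Hypothetical helper for illustration; not part of scripts/ver_linux.
 */
int last_token(const char *line, char *out, size_t outlen)
{
	const char *ws = " \t\r\n";
	size_t end = strlen(line);
	size_t start;

	assert(out != NULL && outlen > 0);

	/* Trim trailing whitespace, including the newline `ld -v` prints. */
	while (end > 0 && strchr(ws, line[end - 1]))
		end--;
	if (end == 0)
		return -1;

	/* Walk back to the first character of the final token. */
	start = end;
	while (start > 0 && !strchr(ws, line[start - 1]))
		start--;

	if (end - start >= outlen)
		return -1;
	memcpy(out, line + start, end - start);
	out[end - start] = '\0';
	return 0;
}
```

Fed the two sample lines from this thread, the helper yields "2.16.1" for
the old format and "2.17.50.0.16.20070511" for the new one.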





Re: [PATCH] Fix use-after-free oops in Bluetooth HID.

2007-07-08 Thread Dmitry Torokhov

On Saturday 07 July 2007 14:58, David Woodhouse wrote:
> When cleaning up HIDP sessions, we currently close the ACL connection
> before deregistering the input device. Closing the ACL connection
> schedules a workqueue to remove the associated objects from sysfs, but
> the input device still refers to them -- and if the workqueue happens to
> run before the input device removal, the kernel will oops when trying to
> look up PHYSDEVPATH for the removed input device.
> 
> Fix this by deregistering the input device before closing the
> connections.

I think it will work ok for 2.6.22 but I don't think this is a final
solution: input_unregister_device might not free the device right away.
If there is a process that hangs on to one of the input interfaces
(evdev or mousedev) then freeing of the device structure may be delayed
and we may still run into the case when session is wiped out before
device is freed.

-- 
Dmitry
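
The lifetime concern above (unregistering does not necessarily free the
device while a reader still holds it) can be modelled with a plain reference
count. This is a minimal userspace sketch under stated assumptions: struct
fake_dev and its helpers are hypothetical stand-ins, not the kernel's
input_dev API, and the "free" is only recorded in a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted device; not the kernel's input_dev. */
struct fake_dev {
	int refs;        /* outstanding references */
	bool registered; /* still visible to new openers? */
	bool freed;      /* set once the release step has run */
};

void fake_dev_init(struct fake_dev *d)
{
	d->refs = 1;          /* the registration itself holds one reference */
	d->registered = true;
	d->freed = false;
}

/* A reader (think: an open evdev or mousedev handle) pins the device. */
bool fake_dev_open(struct fake_dev *d)
{
	if (!d->registered)
		return false;
	d->refs++;
	return true;
}

/* Drop one reference; the release step runs only when the count hits zero. */
void fake_dev_put(struct fake_dev *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->freed = true;
}

/* Unregister hides the device and drops only the registration's reference. */
void fake_dev_unregister(struct fake_dev *d)
{
	assert(d->registered);
	d->registered = false;
	fake_dev_put(d);
}
```

In this model, unregistering while a handle is open leaves `freed` false
until the last fake_dev_put(), which is the window in which anything the
device still points to (here, the session) must stay alive.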

Re: SMART problems in 2.6.22

2007-07-08 Thread Jeff Garzik

On the base point, libata has never enabled SMART on its own.  That's 
always up to the BIOS, etc.


It's possible that the recent addition of ACPI support will cause disks 
to be in different modes than previously expected.  ACPI supplies ATA 
taskfiles to be pushed to the disk, and who knows what's in there...


Jeff




Re: [PATCH] fixup binutils printing from scripts/ver_linux

2007-07-08 Thread Alexey Dobriyan

On Mon, Jul 09, 2007 at 01:47:58AM +0200, Jesper Juhl wrote:
> I just found that scripts/ver_linux does not print the binutils version 
> properly on my Slackware 12.0 system.

> --- a/scripts/ver_linux
> +++ b/scripts/ver_linux
> @@ -21,9 +21,8 @@ gcc --version 2>&1| grep gcc | awk \
>  make --version 2>&1 | awk -F, '{print $1}' | awk \
>'/GNU Make/{print "Gnu make  ",$NF}'
>  
> -ld -v | awk -F\) '{print $1}' | awk \
> -'/BFD/{print "binutils  ",$NF} \
> -/^GNU/{print "binutils  ",$4}'
> +echo "binutils   $(ld -v | awk -F \) \
> +{'print $2'} | tr -d ' ')"

NAK. It starts reporting here empty binutils field.
FWIW,

$ ld -v
GNU ld version 2.16.1


Re: panics with 16port Promise Supertrack EX Controller

2007-07-08 Thread Nick Piggin


Michal Piotrowski wrote:

Hi Flavio,

On 09/07/07, Flavio Curti <[EMAIL PROTECTED]> wrote:


Hello

I have a problem with a server running 2.6.22rc4.



Jul  8 00:19:13 dorade.cyberlink.ch EFLAGS: 00210046   
(2.6.22-rc7-dorade #1)


Is this a regression?


The machine panics
after some days of running fine; the machine isn't heavily loaded.

The Controller detects as stex device:

scsi0 : stex
scsi 0:0:0:0: Direct-Access Promise 1X2 Mirror  1.10 PQ: 0 ANSI: 3
scsi 0:0:2:0: Direct-Access Promise  12+2 Disk RAID6  1.10 PQ: 0 ANSI: 3
scsi 0:0:16:0: Processor Promise RAID Console 1.00 PQ: 0 ANSI: 3
sd 0:0:0:0: [sda] 976642048 512-byte hardware sectors (500041 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 0:0:0:0: [sda] 976642048 512-byte hardware sectors (500041 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:2:0: [sdb] Very big device.  Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdb] 11719704576 512-byte hardware sectors (6000489 MB)
sd 0:0:2:0: [sdb] Write Protect is off
sd 0:0:2:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 0:0:2:0: [sdb] Very big device.  Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdb] 11719704576 512-byte hardware sectors (6000489 MB)
sd 0:0:2:0: [sdb] Write Protect is off
sd 0:0:2:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

 sdb: sdb1
sd 0:0:2:0: [sdb] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:2:0: Attached scsi generic sg1 type 0
scsi 0:0:16:0: Attached scsi generic sg2 type 3

I'm not sure where the problem is (controller/lvm/ext3), so if anyone has
an idea, I'm happy to try it out...



kernel BUG at block/as-iosched.c:1084!

BUG_ON(RB_EMPTY_ROOT(&ad->sort_list[REQ_ASYNC]));


Could be a bug in the driver that just happens to be caught by AS checks.
If you could test another scheduler (boot with elevator=deadline or 
elevator=cfq),
it might help give us an idea.

--
SUSE Labs, Novell Inc.

[PATCH] Remove unneeded lock_kernel() in drivers/block/loop.c

2007-07-08 Thread Diego Woitasen


Signed-off-by: Diego Woitasen <[EMAIL PROTECTED]>
---
 drivers/block/loop.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ed5470..1cc004e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1286,7 +1286,6 @@ static long lo_compat_ioctl(struct file *file, unsigned int cmd, unsigned long a
struct loop_device *lo = inode->i_bdev->bd_disk->private_data;
int err;
 
-   lock_kernel();
switch(cmd) {
case LOOP_SET_STATUS:
mutex_lock(&lo->lo_ctl_mutex);
@@ -1312,7 +1311,6 @@ static long lo_compat_ioctl(struct file *file, unsigned int cmd, unsigned long a
err = -ENOIOCTLCMD;
break;
}
-   unlock_kernel();
return err;
 }
 #endif
-- 
1.5.2.3


[PATCH] Use the tsk argument in init_new_context()

2007-07-08 Thread Diego Woitasen

Signed-off-by: Diego Woitasen <[EMAIL PROTECTED]>
---
 arch/i386/kernel/ldt.c   |2 +-
 arch/x86_64/kernel/ldt.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/i386/kernel/ldt.c b/arch/i386/kernel/ldt.c
index e0b2d17..c2eb4fb 100644
--- a/arch/i386/kernel/ldt.c
+++ b/arch/i386/kernel/ldt.c
@@ -96,7 +96,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 
init_MUTEX(&mm->context.sem);
mm->context.size = 0;
-   old_mm = current->mm;
+   old_mm = tsk->mm;
if (old_mm && old_mm->context.size > 0) {
down(&old_mm->context.sem);
retval = copy_ldt(&mm->context, &old_mm->context);
diff --git a/arch/x86_64/kernel/ldt.c b/arch/x86_64/kernel/ldt.c
index bc9ffd5..99a92ed 100644
--- a/arch/x86_64/kernel/ldt.c
+++ b/arch/x86_64/kernel/ldt.c
@@ -100,7 +100,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 
init_MUTEX(&mm->context.sem);
mm->context.size = 0;
-   old_mm = current->mm;
+   old_mm = tsk->mm;
if (old_mm && old_mm->context.size > 0) {
down(&old_mm->context.sem);
retval = copy_ldt(&mm->context, &old_mm->context);
-- 
1.5.2.3

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt


> You could do this with a straight mutex except for the  
> dev_suspend_trylock/unlock bit in uninterruptable contexts, but I  
> seem to recall somebody saying that could be made to work if there  
> was a real need for it.  Alternatively you could just drive the  
> "Generic Mutex" guys up the wall by inventing your own pseudo-mutex  
> with a spinlock, a boolean value, and a waitqueue.

Most of the time, the interrupt/atomic contexts are part of what I call
the "main path" of the driver, which doesn't need most of this stuff.

Depending on what is "upstream" from the driver, you typically set a
condition stopping the flow of "interrupt" activity and wait for it to
drain in suspend. For example, a block driver would instruct the driver
to stop processing the request queue and wait for pending in flight
requests to drain, or a network driver would stop the tx queue and wait
for pending/concurrent xmit() callbacks to complete (basically sync. bh
and sync. with NAPI poll).

In both examples, helpers can be provided in the block layer or network
stack to make that easier.

In many cases, the driver may just turn off something in the chip in
"suspend" (using a low level spinlock that it already has if it has an
irq path to sync. HW access) and set some flag/state to instruct the irq
handler to not restart, then synchronize_irq().

All those techniques are fairly simple and well known. The main source
of problems are the "sideband" channels, such as sysfs accesses to
driver tunables, ioctl's, etc... for which I believe mutexes can solve
90% of them.

Ben.
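
The "stop the flow, then wait for in-flight work to drain" pattern described
above can be modelled in a few lines. This is only a single-threaded
userspace sketch: struct fake_queue and its helpers are hypothetical names,
not block-layer or network-stack API, and a real driver would need proper
locking and an actual wait (for example synchronize_irq(), as mentioned
above) instead of polling a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver's main path; not a kernel structure. */
struct fake_queue {
	bool stopped;   /* set by the suspend path to refuse new work */
	int in_flight;  /* requests submitted but not yet completed */
};

void fake_queue_init(struct fake_queue *q)
{
	q->stopped = false;
	q->in_flight = 0;
}

/* Main path: new work is refused once the queue has been stopped. */
bool fake_submit(struct fake_queue *q)
{
	if (q->stopped)
		return false;
	q->in_flight++;
	return true;
}

/* Completion path: one pending request finishes. */
void fake_complete(struct fake_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
}

/* Suspend step 1: stop the flow of new requests. */
void fake_stop(struct fake_queue *q)
{
	q->stopped = true;
}

/* Suspend step 2: the hardware may sleep only once nothing is pending. */
bool fake_drained(const struct fake_queue *q)
{
	return q->stopped && q->in_flight == 0;
}
```

The point of the ordering is that the stop flag is set first, so the
in-flight count can only go down while the suspend path waits for it to
reach zero.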



[BUG] IT8212 libata driver still hard-freezes system on boot on 2.6.22 final.

2007-07-08 Thread Rodney Gordon II

Have been quite busy, and I regret I haven't tried a rc kernel in
awhile, but, this bug has gone un-fixed still.. Thought I'd throw in
another "heads up"..

2.6.22 hang: http://spherevision.org/sync/visual/itelock.jpg
lspci -vvv: http://spherevision.org/sync/tmp/lspci.out

Pentium-D 830 3.0GHz Dualcore on a ASUS P5LD2 with latest bios rev.
1804.

SysRq does not work when this locks up.

Patches, info, things to try, anything is welcome, I am available for
awhile now and will try anything to get this working.

Cheers,
-r


Re: SMART problems in 2.6.22

2007-07-08 Thread Bruce Allen


Here is another similar report:

http://article.gmane.org/gmane.linux.utilities.smartmontools/4704/match=diamondmax

Again, this indicates that SMART is enabled.  But it's not clear what the 
kernel version here is.  The report indicates that the problem started 
with an FC7 kernel upgrade


Bruce

On Sun, 8 Jul 2007, Bruce Allen wrote:

> Mark, David, Doug, Tejun, Alan, Jeff, LKML,
>
> I'm afraid that there may be some problem with SMART + libata in the 2.6.22 
> kernel.  An hour ago I discovered that I missed a month of correspondence 
> (some LKML, some private) about this problem which Alan, Tejun, Jeff, Mark 
> and others copied to me -- it was automatically shoved into one of my 
> mailboxes by my mail client.  Sorry about that.  So I am trying to catch up 
> to see if there is some real problem or not.
>
> Here is a typical bug report that worries me:
> http://article.gmane.org/gmane.linux.utilities.smartmontools/4712
>
> Here is another similar report:
> http://thread.gmane.org/gmane.linux.utilities.smartmontools/4713
>
> And another report:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg358354.html
>
> From some of the earlier threads that I missed (below) I have the impression 
> that the problem may be a very simple one, namely that starting with 2.6.22 
> one needs to run a command to enable SMART when a box is first booted -- the 
> kernel no longer does this as part of the init/setup of the disks. But that 
> is NOT consistent with the first two reports above, which show 'SMART 
> ENABLED'.
>
> Here are some of the earlier threads that I completely missed:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg164863.html
>
> Before I go off half-cocked, could anyone shed some light on this?  Is there 
> a real problem here or just something dumb?
>
> Cheers,
> Bruce



Re: Linux 2.6.22 released

2007-07-08 Thread Phil Oester

On Sun, Jul 08, 2007 at 04:52:52PM -0700, Linus Torvalds wrote:
> Anybody? Should I make just the shortlogs available instead (I don't save 
> those, but I post those for the later -rc's - usually the -rc1 and -rc2's 
> are too big for the mailing list, but they are still a lot smaller and 
> more readable than the *full* logs are)?
> 
> Or do people really want the full logs, and don't use git?

I don't use git, and sometimes find it useful to view the changelogs to look
for when a particular change occurred.  Doing so via:

http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.X

where X is decremented from current rev is handy.  Please keep them around.

Phil

Re: [PATCH] retrieve VBE EDID/DDC info independent of used video mode

2007-07-08 Thread Daniel Drake


Andrew Morton wrote:
>>> Well drat.  I didn't merge the patch because it conflicts with
>>> git-newsetup, and Peter believes that git-newsetup already contains an
>>> equivalent fix.  Testing 2.6.22-rc6-mm1 would confirm that.  Please.
>>
>> 2.6.22-rc6-mm1 has the same problem (it is not fixed there).
>
> Well damn, we've let this slide for too long.
>
> Guys, 2.6.22 is days away.  Do we think that the below is safe to merge
> now?
>
> Add Daniel, this does fix things for you, doesn't it?


I don't have the issue, I'm just proxying for a bug reporter on the 
Gentoo bugzilla. According to him, Jan's patch does solve the problem:


https://bugs.gentoo.org/show_bug.cgi?id=181067#c8

Thanks,
Daniel


Re: [Hdaps-devel] [PATCH] hdaps - switch to using input-polldev

2007-07-08 Thread Shem Multinymous


On 5/25/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> HWMON: hdaps - convert to use input-polldev.
>
> Switch to using input-polldev skeleton instead of implementing
> polling loop by itself. This also fixes problem with trylock
> on a mutex in atomic context.
>
> Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>


There's a couple of inherent problems with this patch (now in -mm).

First, the hdaps driver regularly polls the embedded controller, which
in turn regularly polls the hardware. If the two polling rates differ
or fluctuate, we lose events. AFAICT, the delayed workqueues used by
input-polldev can get very laggy under load. That's very bad for
sensitive clients like hdapsd (the hard disk shock protection daemon).

Second, this is incompatible with the much-needed addition of a 2nd
input device relying on the same data. The existing hdaps input device
does "joystick emulation", i.e., reports values after calibration and
fuzzing. Userspace programs that need the raw data, like hdapsd,
currently have to poll the sysfs attribute, which is inefficient,
lag-prone and induces unnecessary interrupts on tickless systems. To
solve this we'll have to add a 2nd input device to hdaps, for
reporting the raw accelerometer data. (Michael Riepe and I are now
working on such a patch.) But these two input devices need to share
their polling of the underlying EC hardware, and this is impossible
using input-polldev.

Since this patch will degrade accuracy and will eventually be
reverted anyway, I suggest retracting it.

As for the mutex in atomic context issue, isn't it best addressed by
making mutex_trylock() do the sensible thing in softirq?

 Shem

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Kyle Moffett


On Jul 08, 2007, at 20:33:37, Benjamin Herrenschmidt wrote:
>> drivers_sysfs_write()
>>     while(!suspend_trylock())
>>         try_to_icebox() --> or even try_to_freeze(), what's the difference?
>>
>>     hit hardware
>>     unlock_suspend()
>>
>> where the PM core must wait for the suspend lock to get released  
>> (say with a timeout)?
>
> Somewhat. What I'd like is to have a construct of that sort on  
> a per-driver basis though :-) Now the question is, in what way  
> would the above be different from a simple suspend mutex ? I've  
> been using mutexes in the past for a couple of drivers (iirc,  
> that's how I did it for dmasound_pmac, though that driver is long  
> past obsolescence now).


I agree completely.  It's a bit trickier if you want to do work in  
uninterruptable contexts:


driver_suspend_callback(...)
{
    dev_suspend_lock(dev);
    put_hardware_to_sleep(dev);
}

driver_resume_callback(...)
{
    wake_hardware_up(dev);
    dev_suspend_unlock(dev);
}

Then for sleep-capable contexts:
    dev_suspend_lock(dev);
    dev_suspend_unlock(dev);

And for no-sleep contexts like interrupts etc:
    if (!dev_suspend_trylock(dev))
        return postpone_work_for_later(dev, ...);
    do_stuff_with(dev);
    dev_suspend_unlock(dev);

You could do this with a straight mutex except for the  
dev_suspend_trylock/unlock bit in uninterruptable contexts, but I  
seem to recall somebody saying that could be made to work if there  
was a real need for it.  Alternatively you could just drive the  
"Generic Mutex" guys up the wall by inventing your own pseudo-mutex  
with a spinlock, a boolean value, and a waitqueue.


Then when you put your driver to sleep, things trying to do IO will  
automatically "freeze" themselves exactly until the device is woken  
again.


Assuming the driver model and subsystem get the ordering right for  
which devices to suspend/resume, then it's impossible to deadlock or  
cause hardware confusion.  And even further, if you manage to make  
the automagic mutex-debugging code work with the noninterruptable  
trylock it will yell at you when the driver model does nasty deadlock- 
y things.


On the other hand, if the driver model *doesn't* get the ordering  
right then it's fundamentally impossible to reliably suspend and resume.


Cheers,
Kyle Moffett
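
The no-sleep path in the pseudocode above can be tried out in userspace with
a C11 atomic flag standing in for the suspend lock. This is a rough sketch
under stated assumptions: dev_suspend_trylock() and dev_suspend_unlock() are
the names used in the pseudocode (they are not an existing kernel API),
struct fake_device and irq_path() are hypothetical, and the blocking
dev_suspend_lock() for sleep-capable contexts is deliberately left out, so
the suspend callback here just asserts that the lock was free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model; only the locking behaviour is of interest. */
struct fake_device {
	atomic_flag suspend_lock; /* set while the device is suspended or busy */
	int postponed;            /* work items deferred because of the lock */
	int done;                 /* work items that reached the "hardware" */
};

void fake_device_init(struct fake_device *d)
{
	atomic_flag_clear(&d->suspend_lock);
	d->postponed = 0;
	d->done = 0;
}

/* Never waits: test_and_set returns the old value, false means we won. */
bool dev_suspend_trylock(struct fake_device *d)
{
	return !atomic_flag_test_and_set(&d->suspend_lock);
}

void dev_suspend_unlock(struct fake_device *d)
{
	atomic_flag_clear(&d->suspend_lock);
}

/* Suspend takes the lock and keeps it until resume. */
void driver_suspend_callback(struct fake_device *d)
{
	bool got = dev_suspend_trylock(d); /* real code would block here */
	assert(got);
	(void)got;
}

void driver_resume_callback(struct fake_device *d)
{
	dev_suspend_unlock(d);
}

/* Interrupt-like path: postpone the work instead of sleeping on the lock. */
bool irq_path(struct fake_device *d)
{
	if (!dev_suspend_trylock(d)) {
		d->postponed++;
		return false;
	}
	d->done++;                /* stands in for do_stuff_with(dev) */
	dev_suspend_unlock(d);
	return true;
}
```

Between the suspend and resume callbacks every call to irq_path() is
counted as postponed; outside that window the work runs immediately.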


Re: [PATCH] retrieve VBE EDID/DDC info independent of used video mode

2007-07-08 Thread H. Peter Anvin

Daniel Drake wrote:
> 
> 2.6.22-rc6-mm1 has the same problem (it is not fixed there).
> 

I believe this patch should fix it for 2.6.22-rc6-mm1.  I will check
this into the newsetup tree.

-hpa
diff --git a/arch/i386/boot/video-vesa.c b/arch/i386/boot/video-vesa.c
index 3c21bd7..e6aa9eb 100644
--- a/arch/i386/boot/video-vesa.c
+++ b/arch/i386/boot/video-vesa.c
@@ -28,7 +28,7 @@ static void vesa_store_mode_params_graphics(void);
 
 static int vesa_probe(void)
 {
-#ifdef CONFIG_VIDEO_VESA
+#if defined(CONFIG_VIDEO_VESA) || defined(CONFIG_FIRMWARE_EDID)
u16 ax;
u16 mode;
addr_t mode_ptr;
@@ -47,7 +47,8 @@ static int vesa_probe(void)
vginfo.signature != VESA_MAGIC ||
vginfo.version < 0x0102)
return 0;   /* Not present */
-
+#endif /* CONFIG_VIDEO_VESA || CONFIG_FIRMWARE_EDID */
+#ifdef CONFIG_VIDEO_VESA
set_fs(vginfo.video_mode_ptr.seg);
mode_ptr = vginfo.video_mode_ptr.off;
 
@@ -96,7 +97,7 @@ static int vesa_probe(void)
return nmodes;
 #else
return 0;
-#endif
+#endif /* CONFIG_VIDEO_VESA */
 }
 
 static int vesa_set_mode(struct mode_info *mode)

SMART problems in 2.6.22

2007-07-08 Thread Bruce Allen


Mark, David, Doug, Tejun, Alan, Jeff, LKML,

I'm afraid that there may be some problem with SMART + libata in the 
2.6.22 kernel.  An hour ago I discovered that I missed a month of 
correspondence (some LKML, some private) about this problem which Alan, 
Tejun, Jeff, Mark and others copied to me -- it was automatically shoved 
into one of my mailboxes by my mail client.  Sorry about that.  So I am 
trying to catch up to see if there is some real problem or not.


Here is a typical bug report that worries me:
http://article.gmane.org/gmane.linux.utilities.smartmontools/4712

Here is another similar report:
http://thread.gmane.org/gmane.linux.utilities.smartmontools/4713

And another report:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg358354.html

From some of the earlier threads that I missed (below) I have the 
impression that the problem may be a very simple one, namely that starting 
with 2.6.22 one needs to run a command to enable SMART when a box is first 
booted -- the kernel no longer does this as part of the init/setup of the 
disks. But that is NOT consistent with the first two reports above, which 
show 'SMART ENABLED'.


Here are some of the earlier threads that I completely missed:

http://www.ussg.iu.edu/hypermail/linux/kernel/0706.1/0849.html
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg164863.html

Before I go off half-cocked, could anyone shed some light on this?  Is 
there a real problem here or just something dumb?


Cheers,
Bruce

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt


>   drivers_sysfs_write()
>   while(!suspend_trylock())
>   try_to_icebox() --> or even try_to_freeze(), what's the 
> difference?
> 
>   hit hardware
>   unlock_suspend()
> 
> where the PM core must wait for the suspend lock to get released (say with a
> timeout)?

Somewhat. What I'd like is to have a construct of that sort on a
per-driver basis though :-) Now the question is, in what way would the
above be different from a simple suspend mutex ? I've been using mutexes
in the past for a couple of drivers (iirc, that's how I did it for
dmasound_pmac, though that driver is long past obsolescence now).

But yes, that's the idea, something under control of drivers, instead of
a 3rd party freezer that tries to hit at processes. 

> Hmm, perhaps we can use a special workqueue that's created before the suspend
> (say by the PM core) and starts processing jobs after the resume?

Yeah, whatever the implementation is, though, the interface should be
transparent. But then, I see little use of that. Mostly things like this
sysfs write to unbind Alan was talking about. Driver workqueues probably
need to be functioning all the way until the driver itself is suspended,
in which case, the driver may just flush its workqueues in its suspend path
at the right time to make sure it doesn't exit with something still
pending.

Ben.



Re: Linux 2.6.22 released

2007-07-08 Thread Jesper Juhl


On 09/07/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
[snip]

The shortlog (appended) is fairly self-explanatory and the diffstat (at
the very end) also gives a fairly good picture of where the changes are.


I've always felt that these "shortlog since last -rc for a full
release" were a bit pointless.
Whoever is reading the release notes for a final 2.6.x release is not
going to be really interested in what changed since the last -rc,
they want to know what changed since the last 2.6.(x-1) kernel. And
the people who are interested in the changes since last -rc have
obviously been keeping track of what happened between the previous
-rc's (or else reading the last one doesn't make much sense).



The full changelog since 2.6.21 also got uploaded, but quite frankly, I
wonder if anybody uses those things? I've been uploading them for non-git
users, but I have a suspicion that any people who want that kind of
detail have long since learnt to use git, or are following the commit
mailing lists or equivalent.


I believe you are right, but it's still a nice thing to have for
archeology purposes. To be able to download kernel 2.6.(some old x)
along with its Changelog makes for a nice combined package where
people can see where this kernel is different from the previous one...
it's just a nice thing to have in the FTP archives.
I won't cry much if it dies, but on the other hand I think it's a good
thing to have archived in a plain-text form for posterity.


So this is also a heads-up that I'm considering skipping the ChangeLog
files in the future - the full release ones are so big as to not be very
easily readable (the full ChangeLog from 2.6.21 is over a hundred thousand
lines, and weighs in at 3.8MB for example), and you really can get much
better per-subsystem logs from git.

Anybody? Should I make just the shortlogs available instead (I don't save
those, but I post those for the later -rc's - usually the -rc1 and -rc2's
are too big for the mailing list, but they are still a lot smaller and
more readable than the *full* logs are)?



This is just my opinion. The shortlogs are nice and readable,
archiving the shortlog-between-rc's would be nice, as for the full
logs see above.

I think what would be even better than posting full-/short-logs after
each -rc/full release, would be to post a list of who was involved in
that specific kernel release. I think that you are right that the
people who want the full details know how to use git (or else they can
get to parse the full Changelog), but what's really more interesting
is giving credit where it is due.
I'd suggest that whenever you release a new kernel you should upload
the full Changelog since last version, just so it's available. But, in
your release notes you should just list who contributed to that new
release, something like this :

[EMAIL PROTECTED]:~/kernel/linux-2.6$ git log v2.6.21..v2.6.22 | egrep "^Author: " | sort | uniq -c | sort -n -r
   283 Author: Linus Torvalds <[EMAIL PROTECTED]>
   174 Author: David S. Miller <[EMAIL PROTECTED]>
   106 Author: Kristian Høgsberg <[EMAIL PROTECTED]>
    88 Author: Stephen Hemminger <[EMAIL PROTECTED]>
    84 Author: Christoph Lameter <[EMAIL PROTECTED]>
    82 Author: Stefan Richter <[EMAIL PROTECTED]>
    79 Author: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
    78 Author: Tejun Heo <[EMAIL PROTECTED]>
    78 Author: Patrick McHardy <[EMAIL PROTECTED]>
    78 Author: Dmitry Torokhov <[EMAIL PROTECTED]>
...

The details are in git / the Changelog, but this lets the world easily
see who contributed - I suspect that's a lot more interesting to the
people reading your release-mails. I could be wrong (and I probably
am) ;-)
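
The per-author tally could be wrapped up along these lines. This is only a
sketch: count_authors is a hypothetical helper name, and it assumes the
default `git log` output format, where each commit header carries one
unindented "Author: " line.

```shell
# Tally commits per author from `git log` output read on stdin.
# count_authors is a hypothetical helper name, not part of git.
count_authors() {
    grep '^Author: ' |          # keep only the commit header lines
        sed 's/^Author: //' |   # drop the fixed prefix
        sort | uniq -c |        # count identical name/address pairs
        sort -k1,1nr -k2        # most commits first, ties by name
}

# Usage (inside a kernel git tree):
#   git log v2.6.21..v2.6.22 | count_authors | head
# git can also produce a similar summary directly:
#   git shortlog -s -n v2.6.21..v2.6.22
```

Note that `git shortlog` groups by author name only, so its counts can
differ slightly from a tally keyed on name plus address.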



Or do people really want the full logs, and don't use git?


git rules. It's a fantastic tool - anyone wanting the full details
should use it.


Let me know how you feel. And test the actual release out too, of course!


Running 2.6.22-rc7-g4e99325b atm :)


--
Jesper Juhl <[EMAIL PROTECTED]>

Linux 2.6.22 released

2007-07-08 Thread Linus Torvalds


It's out there now (or at least in the process of mirroring out - if you 
don't see everything, give it a bit of time).

Not a whole lot of changes since -rc7: some small architecture changes 
(ppc, mips, blackfin), and most of those are defconfig updates. Various 
driver fixes: new PCI ID's along with some ide, ata and networking fixes 
(for example - the magic wireless libertas ioctl's got removed, they may 
be re-added later, hopefully in a more generic form, but in the meantime 
this doesn't make a release with new interfaces that aren't universally 
liked).

And various random fixes for regressions and other buglets. Mostly really 
small "few-liners"..

The shortlog (appended) is fairly self-explanatory and the diffstat (at 
the very end) also gives a fairly good picture of where the changes are.

The full changelog since 2.6.21 also got uploaded, but quite frankly, I 
wonder if anybody uses those things? I've been uploading them for non-git 
users, but I have a suspicion that any people who want that kind of 
detail have long since learnt to use git, or are following the commit 
mailing lists or equivalent. 

So this is also a heads-up that I'm considering skipping the ChangeLog 
files in the future - the full release ones are so big as to not be very 
easily readable (the full ChangeLog from 2.6.21 is over a hundred thousand 
lines, and weighs in at 3.8MB for example), and you really can get much 
better per-subsystem logs from git.

Anybody? Should I make just the shortlogs available instead (I don't save 
those, but I post those for the later -rc's - usually the -rc1 and -rc2's 
are too big for the mailing list, but they are still a lot smaller and 
more readable than the *full* logs are)?

Or do people really want the full logs, and don't use git?

Let me know how you feel. And test the actual release out too, of course!

Linus

---
Adrian Bunk (4):
  drivers/net/ns83820.c: fix a check-after-use
  [NET]: net/core/netevent.c should #include 
  include/linux/kallsyms.h must #include 
  DLM must depend on SYSFS

Alan Cox (4):
  ata_generic: Check the right register for the DMA enabled flags
  pata_pdc202xx_old: Correct cable detect logic
  pata_pcmcia: Switch to ata_sff_port_start
  ide: Fix a theoretical Ooops case

Albert Lee (3):
  libata: pata_pdc2027x PLL input clock fix
  libata: remove reading alt_status from ata_hsm_qc_complete()
  ide: pdc202xx_new PLL input clock fix

Alexander Graf (1):
  fix logic error in ipc compat semctl()

Andi Kleen (2):
  Revert HPET resource reservation
  Revert perfctr reservation to 2.6.21 state

Andres Salomon (1):
  GEODE: reboot fixup for geode machines with CS5536 boards

Andrew Morton (1):
  ide: ide_scan_pcibus(): check __pci_register_driver return value

Andrew Sharp (1):
  [MIPS] 64-bit TO_PHYS_MASK macro for RM9000 processors

Andrzej Zaborowski (1):
  [ARM] 4454/1: Use word accesses in Versatile PCI config reads

Atsushi Nemoto (1):
  [MIPS] Add whitelists for checksyscalls.sh

Bartlomiej Zolnierkiewicz (3):
  amd74xx: resume fix
  it821x: fix incorrect SWDMA mask
  qd65xx: fix PIO mode selection

Bjorn Helgaas (1):
  PNP SMCf010 quirk: work around Toshiba Portege 4000 ACPI issues

Chris Dearman (1):
  [MIPS] Fix timer/performance interrupt detection

Christian Krafft (1):
  [POWERPC] Fix PMI breakage in cbe_cbufreq driver

Christoph Lameter (2):
  SLUB: Make lockdep happy by not calling add_partial with interrupts 
enabled during bootstrap
  slub: remove useless EXPORT_SYMBOL

Chuck Ebbert (1):
  pata_ali: fix UDMA settings

Dan Williams (4):
  libertas: style fixes
  libertas: kill wlan_scan_process_results
  libertas: fix WPA associations by handling ENABLE_RSN correctly
  libertas: remove private ioctls

Dave Jones (1):
  Clean up E7520/7320/7525 quirk printk.

David Brownell (1):
  net/usb/cdc_ether minor sparse cleanup

David Gibson (1):
  [POWERPC] Disable old EMAC driver in arch/powerpc

David Woodhouse (4):
  [JFFS2] Fix readinode failure when read_dnode() detects CRC failure.
  Fix slab redzone alignment
  x86_64: fix headers_install
  Fix use-after-free oops in Bluetooth HID.

Dhananjay Phadke (1):
  RESEND [PATCH 3/3] NetXen: Graceful teardown of interface and hardware 
upon module unload

Dmitry Torokhov (4):
  Input: i8042 - add HP Pavilion ZT1000 to the MUX blacklist
  Input: atkbd - throttle LED switching
  Input: serio - take drv_mutex in serio_cleanup()
  Input: document some of keycodes

Florian Attenberger (1):
  sata_mv: PCI-ID for Adaptec 1430SA SATA Controller

Hartmut Birr (1):
  V4L/DVB (5822): Fix the return value in ttpci_budget_init()

Henrique de Moraes Holschuh (1):
  Input: add a new EV_SW SW_RADIO event, for radio switches on laptops

Jack Morgenstein (1):
  mlx4_core: Add new Mellanox device IDs

Jarek Poplawski

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread david


On Mon, 9 Jul 2007, Pavel Machek wrote:


On Sun 2007-07-08 16:20:46, [EMAIL PROTECTED] wrote:

On Mon, 9 Jul 2007, Pavel Machek wrote:


Actually, I'm perfectly fine with that, as long as each task blocked by
the driver due to suspend has PF_FROZEN (or something similar) set.  Then,
at least theoretically, we'll be able to drop the freezer from the suspend
code path and move it after device_suspend() (or the hibernation-specific
equivalent) for hibernation (in that case there shouldn't be a problem
with any task waiting on I/O while the freezer is running ;-)).


I don't see the need for a freezer for snapshot but that's a different
issue. (stop_machine looks good enough to me).


Freezer is not needed for snapshot -- it is needed so that we can
write out the snapshot to disk without the need for special
drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
write to disk from userland code in uswsusp).


instead of trying to freeze most of the system, could you do something
like start a virtual machine sandbox to write the data out, and not let
any userspace other than the sandbox operate?

you would need to throw away disk buffers so that you don't mix current
pending I/O with I/O from the sandbox, and this would be a visible change
for how suspend is setup, but wouldn't this work?


It feels kind of expensive, but yes, we could use another kernel for
doing the dump. Kdump people are using that. We could use hypervisor
for doing the dump. Xen people are doing that. (But I do not think any
of those solutions is suitable for "lets hibernate my notebook" case).


expensive and reliable beats efficient and unreliable.

why do you say that neither would work for the "lets hibernate my
notebook" case?


Both would work. One would eat 8-64MB of your RAM, permanently; second
would eat 5-15% of your cpu, permanently. Not very suitable.


how much overlap is there between the two approaches? are they close 
enough to be able to give the user the choice of which to use depending on 
their machine (new machines with the hardware virtualization support may 
want the hypervisor, other hardware may want to sacrifice 8M of RAM)?



Who says current solution is unreliable?


users report problems, suspend* developers repeatedly state that the 
problems are that the rest of the kernel needs to be fixed to work 
properly with the existing approach.


I think it's safe to say that it doesn't work in the general case, even 
though it does work in some specific cases.


David Lang

[PATCH] fixup binutils printing from scripts/ver_linux

2007-07-08 Thread Jesper Juhl

I just found that scripts/ver_linux does not print the binutils version 
properly on my Slackware 12.0 system. 
The following patch fixes things up.

Pre this patch:
[EMAIL PROTECTED]:~/kernel/linux-2.6$ scripts/ver_linux
...
binutils   Binutils
...

Post this patch:
[EMAIL PROTECTED]:~/kernel/linux-2.6$ scripts/ver_linux
...
binutils   2.17.50.0.17.20070615
...


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 scripts/ver_linux |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/scripts/ver_linux b/scripts/ver_linux
index 72876df..2f96a1b 100755
--- a/scripts/ver_linux
+++ b/scripts/ver_linux
@@ -21,9 +21,8 @@ gcc --version 2>&1| grep gcc | awk \
 make --version 2>&1 | awk -F, '{print $1}' | awk \
   '/GNU Make/{print "Gnu make  ",$NF}'
 
-ld -v | awk -F\) '{print $1}' | awk \
-'/BFD/{print "binutils  ",$NF} \
-/^GNU/{print "binutils  ",$4}'
+echo "binutils   $(ld -v | awk -F \) \
+{'print $2'} | tr -d ' ')"
 
 echo -n "util-linux "
 fdformat --version | awk '{print $NF}' | sed -e s/^util-linux-// -e s/\)$//
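
For comparison, a sketch of a parser that does not depend on where the
closing parenthesis falls. It is not part of the patch above: ld_version is a
hypothetical helper name, and the sketch assumes `ld -v` prints a single line
whose version is the last whitespace-separated field starting with a digit.

```shell
# Print the version number found in one line of `ld -v` output on stdin.
# ld_version is a hypothetical helper, not part of scripts/ver_linux.
ld_version() {
    # Scan fields right to left and print the first one starting with a digit.
    awk '{ for (i = NF; i > 0; i--) if ($i ~ /^[0-9]/) { print $i; exit } }'
}

# Usage:
#   echo "binutils   $(ld -v | ld_version)"
```

Scanning from the right keeps the result independent of whatever vendor
text appears between "GNU ld" and the version number.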





-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html



Re: v2.6.21.5-rt19

2007-07-08 Thread Gabriel C


Fernando Lopez-Lezcano wrote:

On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
  

* Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:


Changes since 2.6.21.5-rt18:

- Fixed a nasty and hard to track down slowness / boot problem on SMP
machines with CONFIG_NOHZ enabled. The problem was caused by the timer
wheel base lock held during the get_next_timer_interrupt() call in the
idle path, which eventually led to a bogus PI boosting of the idle task
and in consequence a stale wrong scheduler selection for the affected idle
task.

Kudos to Carsten Emde, who patiently and meticulously isolated the
problem and provided the traces, which allowed to identify the root cause.

Problem solution: Prevent idle task boosting
  
Maybe someone remembers me whining about troubles with 2.6.21-rt2..18 
on my Core2 T7200 laptop (fujitsu-siemens amilo i1520).


Although I still have my fingers crossed, the good news is that 
2.6.21.5-rt19 (and -rt20) behaves far better now on the very same box.


Yes, it works much better indeed...

Ingo: is there a place where I can read about the changes in different 
rtxx releases? What is new/better/fixed in rt20? (I see scheduler 
stuff in a diff from rt19 to rt20 but I don't really know what it 
means).
  
and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when 
CFS was merged.


i _think_ Rui might have seen two separate problems. Perhaps by the time 
we fixed the first problem (which Rui saw since -rt2) we introduced the 
other one via -rt11 - which then got fixed in -rt19.



Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
really trying to provide a "stable" rt kernel for audio usage and
including another subsystem into rt is - IMHO - not going to help.
What's the chance of splitting things?

  
btw., we'd love to get more feedback regarding CFS. CFS is a completely 
new scheduler for Linux. 



Then I'd rather have it separate from rt. 

  
It has a design centered around keeping 
application latencies down, so it is ultimately real-time friendly, and 
it should also make things work better for desktop-ish and audio-ish 
stuff as well. (even under SCHED_OTHER)



Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
list):

On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:
  

Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
on my desktop but on my laptop it makes Firefox and Tomboy crash.
On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
problem.

Cheers,

Hector


On 7/7/07, Hector Centeno <[EMAIL PROTECTED]> wrote:
Hi Fernando,

I do have Flash installed but for me Firefox crashes when trying to
access gmail (which AFAIK doesn't use Flash, does it?). Right now
Firefox is frozen and I'm typing this email using Konqueror (in
Gnome). This is ps' output:

hector    3595  1.1  2.2 194352 46336 ?        D    16:25   0:03 /usr/lib/firefox-2.0.0.4/firefox-bin

I think the problem is not present in my Desktop but I have to double
check. In the same laptop using the stock fedora kernel both Tomboy
and Firefox work fine. My laptop has a centrino duo processor, 2 gigs
of ram and the Intel GMA950 graphics chip.

Hector



I managed to completely hang firefox (fc7) with flash 9 installed
(unkillable even with -9).


Firefox with flash 9 does not work well; there are a lot of bugs reported 
about it (just google) and it hangs on vanilla or any other kernel as 
well. Not only Firefox but also Swiftfox, Opera, Epiphany etc.


Most of the time Firefox dies when you use flash 9 and close a window or a tab.


 Does not seem to happen with flash 7.


Yes flash 7 is fine.


 Have
not tried yet with gmail and flash uninstalled. I'll try to strace it to
see when/why it hangs. 

  
-- Fernando



  
So it would be nice if you could keep an extra eye on any scheduling 
artifacts or regressions, and make sure your favorite workload is still 
handled by the Linux scheduler in the utmost best way. I'd like to hear 
about any sort of "scheduling behavior / interactivity" regression you 
might see, relative to the vanilla kernel. Or if you can see no such 
problems then a line of "it works as well as the previous scheduler" is 
important info to us too. Thanks!




  



Regards,

Gabriel C

Re: [kvm-devel] [PATCH][RFC] kvm-scheduler integration

2007-07-08 Thread Rusty Russell

On Sun, 2007-07-08 at 15:48 +0200, Ingo Molnar wrote:
> * Avi Kivity <[EMAIL PROTECTED]> wrote:
> 
> > >>+#ifdef CONFIG_SCHED_KVM
> > >>+static __read_mostly struct sched_kvm_hooks kvm_hooks;
> > >>+#endif
> > >
> > >please just add a current->put_vcpu() function pointer instead of 
> > >this hooks thing.
> > 
> > Won't that increase task_struct (16 bytes on 64-bit) unnecessarily?  
> > The function pointers are common to all virtual machines.
> 
> well, this function pointer could then be reused by other virtual 
> machines as well, couldn't it? If the task struct overhead is a problem 
> (it really isn't, and it's dependent on CONFIG_KVM) then we could switch 
> it around to a notifier-alike mechanism.

OK, this patch is *ugly*.  Not that there's anything wrong with a patch
which says "I'm going to preempt you", but making it kvm-specific is
ugly.  ISTR times past where I wanted such a hook, although none spring
immediately into my pre-coffee brain.

I think a "struct preempt_ops *" and a "void *preempt_ops_data" inside
every task struct is a better idea.  Call the config option
PREEMPT_SCHED_HOOKS and now there's nothing kvm-specific about it...

Cheers,
Rusty.


Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

On Sun 2007-07-08 16:20:46, [EMAIL PROTECTED] wrote:
> On Mon, 9 Jul 2007, Pavel Machek wrote:
> 
> >Actually, I'm perfectly fine with that, as long as each task blocked by
> >the driver due to suspend has PF_FROZEN (or something similar) set.  Then,
> >at least theoretically, we'll be able to drop the freezer from the suspend
> >code path and move it after device_suspend() (or the hibernation-specific
> >equivalent) for hibernation (in that case there shouldn't be a problem
> >with any task waiting on I/O while the freezer is running ;-)).
> 
> I don't see the need for a freezer for snapshot but that's a different
> issue. (stop_machine looks good enough to me).
> >>>
> >>>Freezer is not needed for snapshot -- it is needed so that we can
> >>>write out the snapshot to disk without the need for special
> >>>drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> >>>write to disk from userland code in uswsusp).
> >>
> >>instead of trying to freeze most of the system, could you do something
> >>like start a virtual machine sandbox to write the data out, and not let
> >>any userspace other than the sandbox operate?
> >>
> >>you would need to throw away disk buffers so that you don't mix current
> >>pending I/O with I/O from the sandbox, and this would be a visible change
> >>for how suspend is setup, but wouldn't this work?
> >
> >It feels kind of expensive, but yes, we could use another kernel for
> >doing the dump. Kdump people are using that. We could use hypervisor
> >for doing the dump. Xen people are doing that. (But I do not think any
> >of those solutions is suitable for "lets hibernate my notebook" case).
> 
> expensive and reliable beats efficient and unreliable.
> 
> why do you say that neither would work for the "lets hibernate my 
> notebook" case?

Both would work. One would eat 8-64MB of your RAM, permanently; second
would eat 5-15% of your cpu, permanently. Not very suitable.

Who says current solution is unreliable?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread david


On Mon, 9 Jul 2007, Pavel Machek wrote:


Actually, I'm perfectly fine with that, as long as each task blocked by
the driver due to suspend has PF_FROZEN (or something similar) set.  Then,
at least theoretically, we'll be able to drop the freezer from the suspend
code path and move it after device_suspend() (or the hibernation-specific
equivalent) for hibernation (in that case there shouldn't be a problem
with any task waiting on I/O while the freezer is running ;-)).


I don't see the need for a freezer for snapshot but that's a different
issue. (stop_machine looks good enough to me).


Freezer is not needed for snapshot -- it is needed so that we can
write out the snapshot to disk without the need for special
drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
write to disk from userland code in uswsusp).


instead of trying to freeze most of the system, could you do something
like start a virtual machine sandbox to write the data out, and not let
any userspace other than the sandbox operate?

you would need to throw away disk buffers so that you don't mix current
pending I/O with I/O from the sandbox, and this would be a visible change
for how suspend is setup, but wouldn't this work?


It feels kind of expensive, but yes, we could use another kernel for
doing the dump. Kdump people are using that. We could use hypervisor
for doing the dump. Xen people are doing that. (But I do not think any
of those solutions is suitable for "lets hibernate my notebook" case).


expensive and reliable beats efficient and unreliable.

why do you say that neither would work for the "lets hibernate my 
notebook" case?


David Lang

Re: RFC: CONFIG_PAGE_SHIFT (aka software PAGE_SIZE)

2007-07-08 Thread David Chinner

On Sat, Jul 07, 2007 at 12:26:51AM +0200, Andrea Arcangeli wrote:
> The xfs developers for example want to enlarge their filesystem
> blocksize (the filesystem blocksize has a tradeoff similar to the
> PAGE_SIZE, the larger the faster the filesystem but more disk space is
> potentially wasted),

I think you've misunderstood why large block sizes are important to
XFS.  The major benefits to XFS of larger block size have almost
nothing to do with data layout or in memory indexing - it comes from
metadata btree's getting much broader and so we can search much
larger spaces using the same number of seeks. It's metadata
scalability that I'm concerned about here, not file data.

IOWs, larger pages in the page cache are not directly related to
improving data I/O performance of the filesystem, but to allow us
to greatly improve metadata scalability of the filesystem by
allowing us to increase the fundamental block size of the filesystem.
This, in turn, improves the data I/O scalability of the filesystem.

And given that XFS has different metadata block sizes (even on 4k
block size filesystems), it would be really handy to be able to
allocate different sized large pages to match all those different
block sizes so we could avoid having to play vmap() games.

> they also want to use the "normal" writeback
> pagecache efficient behavior when using a writable fs on top of a
> dvd-ram with an hardblocksize of 64k.

In this case "they" != "XFS developers" - you're lumping several
different groups of ppl that want large pages for I/O into one
group.

This is where simply increasing the page size falls down - if you
want to use large block size on your DVD drive (i.e. every desktop
machine out there) you need to use (say) a 64k page size which is
less than ideal for caching the kernel trees that you are currently
compiling.

e.g. I was recently asked what the downsides of moving from a 16k
page to a 64k page size would be - the back-of-the-envelope
calculations I did for a cached kernel tree showed its footprint
increased from about 300MB to ~1.2GB of RAM because 80% of the files
in the kernel tree I looked at were smaller than 16k and all that
happened is we wasted much more memory on those files.  That's not
what you want for your desktop, yet we would like 32-64k pages for
the DVD drives.
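
The arithmetic behind that estimate can be sketched as follows. A file
smaller than 16k occupies exactly one page at either page size, so its
cached footprint grows fourfold when the page grows from 16k to 64k; if
nearly every file is that small, roughly 300MB becomes roughly 1.2GB. The
helper below is only an illustration: cache_footprint is a hypothetical
name, and the model (each file rounded up to whole pages, nothing else
counted) is an assumption, not how the figures above were measured.

```shell
# Sum the cached footprint, in bytes, of files whose sizes (in bytes)
# arrive one per line on stdin, with each file rounded up to whole pages.
# cache_footprint is a hypothetical helper; $1 is the page size in bytes.
cache_footprint() {
    awk -v page="$1" '
        { total += int(($1 + page - 1) / page) * page }
        END { printf "%.0f\n", total }'
}

# Usage (GNU find assumed for -printf; linux-2.6 is a placeholder path):
#   find linux-2.6 -type f -printf "%s\n" | cache_footprint 16384
#   find linux-2.6 -type f -printf "%s\n" | cache_footprint 65536
```

Comparing the two totals for the same tree shows how much of the increase
comes purely from rounding small files up to a larger page.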

The point that seems to be ignored is that this is not a "one size
fits all" type of problem.  This is why the variable page cache may
be a better solution if the fragmentation issues can be solved.
They've been solved before, so I don't see why they can't be solved
again.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: panics with 16port Promise Supertrack EX Controller

2007-07-08 Thread Michal Piotrowski


Hi Flavio,

On 09/07/07, Flavio Curti <[EMAIL PROTECTED]> wrote:

Hello

I have a problem with a server running 2.6.22rc4.


Jul  8 00:19:13 dorade.cyberlink.ch EFLAGS: 00210046   (2.6.22-rc7-dorade #1)

Is this a regression?


The machine panics
after some days of running fine, the machine isn't heavily loaded.

The Controller detects as stex device:

scsi0 : stex
scsi 0:0:0:0: Direct-Access Promise 1X2 Mirror  1.10 PQ: 0 ANSI: 3
scsi 0:0:2:0: Direct-Access Promise  12+2 Disk RAID6  1.10 PQ: 0 ANSI: 3
scsi 0:0:16:0: Processor Promise RAID Console 1.00 PQ: 0 ANSI: 3
sd 0:0:0:0: [sda] 976642048 512-byte hardware sectors (500041 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 0:0:0:0: [sda] 976642048 512-byte hardware sectors (500041 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:2:0: [sdb] Very big device.  Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdb] 11719704576 512-byte hardware sectors (6000489 MB)
sd 0:0:2:0: [sdb] Write Protect is off
sd 0:0:2:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 0:0:2:0: [sdb] Very big device.  Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdb] 11719704576 512-byte hardware sectors (6000489 MB)
sd 0:0:2:0: [sdb] Write Protect is off
sd 0:0:2:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sdb: sdb1
sd 0:0:2:0: [sdb] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:2:0: Attached scsi generic sg1 type 0
scsi 0:0:16:0: Attached scsi generic sg2 type 3

I'm not sure where the problem is (controller/lvm/ext3), so if anyone has
an idea, I'm happy to try it out...


kernel BUG at block/as-iosched.c:1084!

BUG_ON(RB_EMPTY_ROOT(&ad->sort_list[REQ_ASYNC]));



Thank you

Flavio Curti

--
http://no-way.org/~fcu/




Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

Re: Please revert 21564fd2a3deb48200b595332f9ed4c9f311f2a7

2007-07-08 Thread Adrian Bunk

On Mon, Jul 09, 2007 at 01:02:21AM +0200, Andi Kleen wrote:
> On Monday 09 July 2007 00:44:44 Adrian Bunk wrote:
> > On Sun, Jun 17, 2007 at 09:43:51PM -0700, Jeremy Fitzhardinge wrote:
> > > Adrian Bunk wrote:
> > >...
> > > > Andi forwarded it although the following people had already NAK'ed it:
> > > > - Christoph Hellwig [1]
> > > > - Peter Zijlstra [2]
> > > > - Alan Cox [3]
> > > >
> > > > Considering that Andi forwarded it 2 days after he himself said a 
> > > > different solution was pending [4] I assume he mistakenly sent it for 
> > > > inclusion in your tree.
> > > >   
> > > 
> > > We played with some ideas, but they all turned out way too ugly to live. 
> > 
> > Andi got some NAK's, said himself it will be solved differently, and
> > two days later he submits the NAK'ed patch into Linus' tree.
> 
> It will be solved differently longer term, but short term the fix
> was still needed. There are limits on what can be done late 
> in the release cycle so simple patches win.

Your patch got into Linus' tree in the middle of the merge window...

> Besides none of the "NAK"s were particularly inspired in my opinion;
> there were no clear technical objections brought forward.

Why didn't you say this?

Your answer in [1] didn't sound as if you would submit this patch, and 
even less that you would submit it just two days later.

> -Andi

cu
Adrian

[1] http://lkml.org/lkml/2007/4/30/335

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


[RFC PATCH resend] ieee1394: remove old isochronous ABI

2007-07-08 Thread Stefan Richter

Date: Sun, 24 Jun 2007 15:31:54 +0200 (CEST)
From: Stefan Richter <[EMAIL PROTECTED]>
Subject: [RFC PATCH] ieee1394: remove old isochronous ABI

Based on patch "the scheduled removal of RAW1394_REQ_ISO_{SEND,LISTEN}"
from Adrian Bunk, November 20 2006.

This patch also removes the underlying facilities in ohci1394 and
disables them in pcilynx.  That is, hpsb_host_driver.devctl() and
hpsb_host_driver.transmit_packet() are no longer used for iso reception
and transmission.

Since video1394 and dv1394 only work with ohci1394 and raw1394's rawiso
interface has never been implemented in pcilynx, pcilynx is now no
longer useful for isochronous applications.

raw1394 will still handle the request types but will complete the
requests with errors that indicate API version conflicts.

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---


This patch has been sent to linux1394-devel and libdc1394-devel and
didn't receive a comment.
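
For readers who want to picture the behaviour described above (the two
request types are still recognised but are completed with an error), here
is a minimal userspace sketch. It is not the code from this patch: the
struct, the function name and the error value SKETCH_ERROR_REMOVED_ABI are
hypothetical and only the three request type numbers are taken from
raw1394.h.

```c
/* Request type numbers as listed in raw1394.h. */
#define RAW1394_REQ_ASYNC_WRITE 101
#define RAW1394_REQ_ISO_SEND    104
#define RAW1394_REQ_ISO_LISTEN  200

/* Hypothetical error value for this sketch, not taken from the patch. */
#define SKETCH_ERROR_REMOVED_ABI (-1)

/* Hypothetical, stripped-down request: only the fields the sketch needs. */
struct sketch_request {
	int type;
	int error;
};

/*
 * If the request uses one of the removed legacy iso types, complete it
 * with an error and return 1. Otherwise leave it untouched and return 0
 * so the caller can dispatch it normally.
 */
int sketch_complete_if_removed(struct sketch_request *req)
{
	switch (req->type) {
	case RAW1394_REQ_ISO_SEND:
	case RAW1394_REQ_ISO_LISTEN:
		req->error = SKETCH_ERROR_REMOVED_ABI;
		return 1;
	default:
		return 0;
	}
}
```

The point of the design is that old binaries get a clear failure for the
removed request types instead of having them silently ignored.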


 Documentation/ABI/removed/raw1394_legacy_isochronous |   16 
 Documentation/feature-removal-schedule.txt   |   10 
 drivers/ieee1394/highlevel.c |   45 --
 drivers/ieee1394/highlevel.h |   16 
 drivers/ieee1394/hosts.h |8 
 drivers/ieee1394/ieee1394_core.c |8 
 drivers/ieee1394/ieee1394_core.h |5 
 drivers/ieee1394/ieee1394_transactions.c |   30 -
 drivers/ieee1394/ieee1394_transactions.h |2 
 drivers/ieee1394/ohci1394.c  |  221 ---
 drivers/ieee1394/ohci1394.h  |   14 
 drivers/ieee1394/pcilynx.c   |   16 
 drivers/ieee1394/raw1394-private.h   |5 
 drivers/ieee1394/raw1394.c   |  176 
 drivers/ieee1394/raw1394.h   |4 
 15 files changed, 48 insertions(+), 528 deletions(-)

Index: linux/Documentation/ABI/removed/raw1394_legacy_isochronous
===
--- /dev/null
+++ linux/Documentation/ABI/removed/raw1394_legacy_isochronous
@@ -0,0 +1,16 @@
+What:  legacy isochronous ABI of raw1394 (1st generation iso ABI)
+Date:  June 2007 (scheduled), removed in kernel v2.6.23
+Contact:   [EMAIL PROTECTED]
+Description:
+   The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have
+   been deprecated for quite some time.  They are very inefficient as they
+   come with high interrupt load and several layers of callbacks for each
+   packet.  Because of these deficiencies, the video1394 and dv1394 drivers
+   and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created.
+
+Users:
+   libraw1394 users via the long deprecated API raw1394_iso_write,
+   raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv
+
+   libdc1394, which optionally uses these old libraw1394 calls
+   alternatively to the more efficient video1394 ABI
Index: linux/Documentation/feature-removal-schedule.txt
===
--- linux.orig/Documentation/feature-removal-schedule.txt
+++ linux/Documentation/feature-removal-schedule.txt
@@ -49,16 +49,6 @@ Who: Adrian Bunk <[EMAIL PROTECTED]>
 
 ---
 
-What:  raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
-When:  June 2007
-Why:   Deprecated in favour of the more efficient and robust rawiso interface.
-   Affected are applications which use the deprecated part of libraw1394
-   (raw1394_iso_write, raw1394_start_iso_write, raw1394_start_iso_rcv,
-   raw1394_stop_iso_rcv) or bypass libraw1394.
-Who:   Dan Dennedy <[EMAIL PROTECTED]>, Stefan Richter <[EMAIL PROTECTED]>
-

-
 What:  old NCR53C9x driver
 When:  October 2007
 Why:   Replaced by the much better esp_scsi driver.  Actual low-level
Index: linux/drivers/ieee1394/raw1394.h
===
--- linux.orig/drivers/ieee1394/raw1394.h
+++ linux/drivers/ieee1394/raw1394.h
@@ -17,11 +17,11 @@
 #define RAW1394_REQ_ASYNC_WRITE 101
 #define RAW1394_REQ_LOCK102
 #define RAW1394_REQ_LOCK64  103
-#define RAW1394_REQ_ISO_SEND104
+#define RAW1394_REQ_ISO_SEND104 /* removed ABI, now a no-op */
 #define RAW1394_REQ_ASYNC_SEND  105
 #define RAW1394_REQ_ASYNC_STREAM106
 
-#define RAW1394_REQ_ISO_LISTEN  200
+#define RAW1394_REQ_ISO_LISTEN  200 /* removed ABI, now a no-op */
 #define RAW1394_REQ_FCP_LISTEN  201
 #define RAW1394_REQ_RESET_BUS   202
 #define RAW1394_REQ_GET_ROM 203
Index: linux/drivers/ieee1394/raw1394-private.h
===
--- linux.orig/drivers/ieee1394/raw1394-private.h
+++ linux/drivers/ieee1394/raw1394-private.h
@@ -36,11 +36,6 @@ struct file_info {

Re: Please revert 21564fd2a3deb48200b595332f9ed4c9f311f2a7

2007-07-08 Thread Andi Kleen

On Monday 09 July 2007 00:44:44 Adrian Bunk wrote:
> On Sun, Jun 17, 2007 at 09:43:51PM -0700, Jeremy Fitzhardinge wrote:
> > Adrian Bunk wrote:
> >...
> > > Andi forwarded it although the following people had already NAK'ed it:
> > > - Christoph Hellwig [1]
> > > - Peter Zijlstra [2]
> > > - Alan Cox [3]
> > >
> > > Considering that Andi forwarded it 2 days after he himself said a 
> > > different solution was pending [4] I assume he mistakenly sent it for 
> > > inclusion in your tree.
> > >   
> > 
> > We played with some ideas, but they all turned out way too ugly to live. 
> 
> Andi got some NAK's, said himself it will be solved differently, and
> two days later he submits the NAK'ed patch into Linus' tree.

It will be solved differently longer term, but short term the fix
was still needed. There are limits on what can be done late 
in the release cycle so simple patches win.

Besides none of the "NAK"s were particularly inspired in my opinion;
there were no clear technical objections brought forward.

-Andi
 

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

Hi!

> > Freezer is not needed for snapshot -- it is needed so that we can
> > write out the snapshot to disk without the need for special
> > drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> > write to disk from userland code in uswsusp).
> 
> Yes.
> 
> BTW, this patch:
> 
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch
> 
> that's queued up in -mm contains a freezer documentation update, in which the
> reasons of using it, as well as its limitations, are described.
> 
> To summarize what was previously said in this thread:
> 
> * Apparently, we agree that the freezer is _generally_ not needed for suspend
>   (ie. any transition to a system sleep state other than hibernation), but some
>   of us (eg. me) think that it wouldn't be reasonable to drop the freezer from
>   the suspend code path _right_ _now_ .
> 
> * Some of us, including you, Nigel and me, think that the freezer is needed
>   for hibernation (please see the document in the patch above for details).
>   In the (very) long run this might be avoided too, but (IMO) certainly not at
>   this point.
> 
> * We seem to agree that in order to remove the freezer from the suspend code
>   path some work needs to be done on device drivers, driver midlayers and the
>   PM core.  We also need to do some work on the PM core in order to introduce
>   a separate hibernation framework and IMO it would be reasonable to
>   synchronize these efforts.
> 
> * We are now to decide what to do so that the freezer can be safely removed
>   from the suspend code path and how to integrate that change with the
>   hibernation code path (if possible and reasonable).

Nice summary, thanks.

> * The freezer vs FUSE issue that started this thread remains unresolved, so
>   it would be desirable to provide a short-term fix (need not be very nice).

Actually there are _2_ freezer vs FUSE issues, and one of them should
be simple to solve, once we have sysrq-t of the deadlock. (Or did I
miss it somewhere with discussion going on 10 lists in parallel?)
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

panics with 16port Promise Supertrack EX Controller

2007-07-08 Thread Flavio Curti

Hello

I have a problem with a server running 2.6.22rc4. The machine panics
after some days of running fine; the machine isn't heavily loaded.

The controller is detected as an stex device:

scsi0 : stex
scsi 0:0:0:0: Direct-Access Promise 1X2 Mirror  1.10 PQ: 0 ANSI: 3
scsi 0:0:2:0: Direct-Access Promise  12+2 Disk RAID6  1.10 PQ: 0 ANSI: 3
scsi 0:0:16:0: Processor Promise RAID Console 1.00 PQ: 0 ANSI: 3
sd 0:0:0:0: [sda] 976642048 512-byte hardware sectors (500041 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 0:0:0:0: [sda] 976642048 512-byte hardware sectors (500041 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:2:0: [sdb] Very big device.  Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdb] 11719704576 512-byte hardware sectors (6000489 MB)
sd 0:0:2:0: [sdb] Write Protect is off
sd 0:0:2:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
sd 0:0:2:0: [sdb] Very big device.  Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdb] 11719704576 512-byte hardware sectors (6000489 MB)
sd 0:0:2:0: [sdb] Write Protect is off
sd 0:0:2:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
 sdb: sdb1
sd 0:0:2:0: [sdb] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:2:0: Attached scsi generic sg1 type 0
scsi 0:0:16:0: Attached scsi generic sg2 type 3
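
As a sanity check on the sizes in the log above, the arithmetic can be
sketched as follows. This is an illustration only and not the sd driver's
code: the function names are made up, the MB figure is assumed to be
decimal (10^6 bytes) rounded to the nearest MB, and the READ CAPACITY(16)
test assumes the usual rule that the 10-byte command can only report a
32-bit last block address.

```c
#include <stdint.h>

/* Sector count to decimal megabytes (10^6 bytes), rounded to nearest. */
uint64_t sectors_to_mb(uint64_t sectors, uint32_t sector_size)
{
	uint64_t bytes = sectors * sector_size; /* fits in 64 bits for these sizes */

	return (bytes + 500000) / 1000000;
}

/*
 * READ CAPACITY(10) returns the last block address in a 32-bit field;
 * 0xFFFFFFFF means "too big, ask again with READ CAPACITY(16)".
 */
int needs_read_capacity_16(uint64_t sectors)
{
	uint64_t last_lba = sectors - 1;

	return last_lba >= 0xFFFFFFFFULL;
}
```

With the values from the log, 11719704576 sectors of 512 bytes come out as
6000489 MB and need the 16-byte command, while 976642048 sectors come out
as 500041 MB and do not.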

I'm not sure where the problem is (controller/lvm/ext3), so if anyone has
an idea, I'm happy to try it out...

Thank you

Flavio Curti

--
http://no-way.org/~fcu/
Jul  8 00:19:13 dorade.cyberlink.ch [ cut here ] 
Jul  8 00:19:13 dorade.cyberlink.ch kernel BUG at block/as-iosched.c:1084! 
Jul  8 00:19:13 dorade.cyberlink.ch invalid opcode:  [#1] 
Jul  8 00:19:13 dorade.cyberlink.ch SMP 
Jul  8 00:19:13 dorade.cyberlink.ch  
Jul  8 00:19:13 dorade.cyberlink.ch Modules linked in:
Jul  8 00:19:13 dorade.cyberlink.ch  i2c_i801
Jul  8 00:19:13 dorade.cyberlink.ch  i2c_core
Jul  8 00:19:13 dorade.cyberlink.ch  
Jul  8 00:19:13 dorade.cyberlink.ch CPU:2 
Jul  8 00:19:13 dorade.cyberlink.ch EIP:0060:[]Not tainted 
VLI 
Jul  8 00:19:13 dorade.cyberlink.ch EFLAGS: 00210046   (2.6.22-rc7-dorade #1) 
Jul  8 00:19:13 dorade.cyberlink.ch EIP is at as_dispatch_request+0x387/0x390 
Jul  8 00:19:13 dorade.cyberlink.ch eax:    ebx: c5ee10c0   ecx: 
c5ee10d4   edx:  
Jul  8 00:19:13 dorade.cyberlink.ch esi:    edi: 0001   ebp: 
   esp: cd90bbd4 
Jul  8 00:19:13 dorade.cyberlink.ch ds: 007b   es: 007b   fs: 00d8  gs: 0033  
ss: 0068 
Jul  8 00:19:13 dorade.cyberlink.ch Process vsftpd (pid: 5332, ti=cd90a000 
task=f780d030 task.ti=cd90a000)
Jul  8 00:19:13 dorade.cyberlink.ch  
Jul  8 00:19:13 dorade.cyberlink.ch Stack: 
Jul  8 00:19:13 dorade.cyberlink.ch c012abe7 
Jul  8 00:19:13 dorade.cyberlink.ch f784c5cc 
Jul  8 00:19:13 dorade.cyberlink.ch c5f11000 
Jul  8 00:19:13 dorade.cyberlink.ch f784c5cc 
Jul  8 00:19:13 dorade.cyberlink.ch 00200286 
Jul  8 00:19:13 dorade.cyberlink.ch c5ee8be4 
Jul  8 00:19:13 dorade.cyberlink.ch c5f11000 
Jul  8 00:19:13 dorade.cyberlink.ch f7a16000 
Jul  8 00:19:13 dorade.cyberlink.ch  
Jul  8 00:19:13 dorade.cyberlink.ch
Jul  8 00:19:13 dorade.cyberlink.ch c5ee8be4 
Jul  8 00:19:13 dorade.cyberlink.ch c02a0810 
Jul  8 00:19:13 dorade.cyberlink.ch f7a16000 
Jul  8 00:19:13 dorade.cyberlink.ch 00200287 
Jul  8 00:19:13 dorade.cyberlink.ch c0370076 
Jul  8 00:19:13 dorade.cyberlink.ch f784c580 
Jul  8 00:19:13 dorade.cyberlink.ch  
Jul  8 00:19:13 dorade.cyberlink.ch c5f11000 
Jul  8 00:19:13 dorade.cyberlink.ch c5ee8be4 
Jul  8 00:19:13 dorade.cyberlink.ch  
Jul  8 00:19:13 dorade.cyberlink.ch Call Trace: 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch lock_timer_base+0x27/0x60 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch elv_next_request+0x20/0x130 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch scsi_dispatch_cmd+0x146/0x230 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch scsi_request_fn+0x185/0x2c0 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch __generic_unplug_device+0x25/0x30 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch generic_unplug_device+0x15/0x30 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch dm_table_unplug_all+0x22/0x30 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch dm_unplug_all+0x17/0x30 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch blk_backing_dev_unplug+0xc/0x10 
Jul  8 00:19:13 dorade.cyberlink.ch  [] 
Jul  8 00:19:13 dorade.cyberlink.ch sync_buffer+0x2b/0x40 
Jul  8 00:19:13

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

Hi!

> >>>Actually, I'm perfectly fine with that, as long as each task blocked by the
> >>>driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> >>>least theoretically, we'll be able to drop the freezer from the suspend code
> >>>path and move it after device_suspend() (or the hibernation-specific
> >>>equivalent) for hibernation (in that case there shouldn't be a problem with
> >>>any task waiting on I/O while the freezer is running ;-)).
> >>
> >>I don't see the need for a freezer for snapshot but that's a different
> >>issue. (stop_machine looks good enough to me).
> >
> >Freezer is not needed for snapshot -- it is needed so that we can
> >write out the snapshot to disk without the need for special
> >drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> >write to disk from userland code in uswsusp).
> 
> instead of trying to freeze most of the system, could you do something 
> like start a virtual machine sandbox to write the data out, and not let 
> any userspace other then the sandbox operate?
> 
> you would need to throw away disk buffers so that you don't mix current 
> pending I/O with I/O from the sandbox, and this would be a visable change 
> for how suspend is setup, but wouldn't this work?

It feels kind of expensive, but yes, we could use another kernel for
doing the dump. Kdump people are using that. We could use hypervisor
for doing the dump. Xen people are doing that. (But I do not think any
of those solutions is suitable for "lets hibernate my notebook" case).

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Rafael J. Wysocki

Hi,

On Monday, 9 July 2007 00:13, Pavel Machek wrote:
> Hi!
> 
> > > Actually, I'm perfectly fine with that, as long as each task blocked by the
> > > driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> > > least theoretically, we'll be able to drop the freezer from the suspend code
> > > path and move it after device_suspend() (or the hibernation-specific
> > > equivalent) for hibernation (in that case there shouldn't be a problem with
> > > any task waiting on I/O while the freezer is running ;-)).
> > 
> > I don't see the need for a freezer for snapshot but that's a different
> > issue. (stop_machine looks good enough to me).
> 
> Freezer is not needed for snapshot -- it is needed so that we can
> write out the snapshot to disk without the need for special
> drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> write to disk from userland code in uswsusp).

Yes.

BTW, this patch:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch

that's queued up in -mm contains a freezer documentation update, in which the
reasons of using it, as well as its limitations, are described.

To summarize what was previously said in this thread:

* Apparently, we agree that the freezer is _generally_ not needed for suspend
  (ie. any transition to a system sleep state other than hibernation), but some
  of us (eg. me) think that it wouldn't be reasonable to drop the freezer from
  the suspend code path _right_ _now_ .

* Some of us, including you, Nigel and me, think that the freezer is needed
  for hibernation (please see the document in the patch above for details).
  In the (very) long run this might be avoided too, but (IMO) certainly not at
  this point.

* We seem to agree that in order to remove the freezer from the suspend code
  path some work needs to be done on device drivers, driver midlayers and the
  PM core.  We also need to do some work on the PM core in order to introduce
  a separate hibernation framework and IMO it would be reasonable to
  synchronize these efforts.

* We are now to decide what to do so that the freezer can be safely removed
  from the suspend code path and how to integrate that change with the
  hibernation code path (if possible and reasonable).

* The freezer vs FUSE issue that started this thread remains unresolved, so
  it would be desirable to provide a short-term fix (need not be very nice).

Greetings,
Rafael

-- 
"Premature optimization is the root of all evil." - Donald Knuth

Re: v2.6.21.5-rt19

2007-07-08 Thread Fernando Lopez-Lezcano

On Sun, 2007-07-08 at 15:36 -0700, Fernando Lopez-Lezcano wrote:
> On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
> > * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> > > > > Changes since 2.6.21.5-rt18:
> > > > >
> > > > > - Fixed a nasty and hard to track down slowness / boot problem on SMP
> > > > > machines with CONFIG_NOHZ enabled. The problem was caused by the timer
> > > > > wheel base lock held during the get_next_timer_interrupt() call in the
> > > > > idle path, which eventually led to a bogus PI boosting of the idle task
> > > > > and in consequence a stale wrong scheduler selection for the affected
> > > > > idle task.
> > > > >
> > > > > Kudos to Carsten Emde, who patiently and meticulously isolated the
> > > > > problem and provided the traces, which allowed to identify the root cause.
> > > > >
> > > > > Problem solution: Prevent idle task boosting
> > 
> > > > Maybe someone remember me whining about troubles with 2.6.21-rt2..18 
> > > > on my Core2 T7200 laptop (fujitsu-siemens amilo i1520).
> > > > 
> > > > Althought I'm still with my fingers crossed, I can tell the good 
> > > > news are that 2.6.21.5-rt19 (and -rt20) does behave far better now 
> > > > on the very same box.
> > > 
> > > Yes, it works much better indeed...
> > > 
> > > Ingo: is there a place where I can read about the changes in different 
> > > rtxx releases? What is new/better/fixed in rt20? (I see scheduler 
> > > stuff in a diff from rt19 to rt20 but I don't really know what it 
> > > means).
> > 
> > and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when 
> > CFS was merged.
> > 
> > i _think_ Rui might have seen two separate problems. Perhaps by the time 
> > we fixed the first problem (which Rui saw since -rt2) we introduced the 
> > other one via -rt11 - which then got fixed in -rt19.
> 
> Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
> really trying to provide a "stable" rt kernel for audio usage and
> including another subsystem into rt is - IMHO - not going to help.
> What's the chance of splitting things?
> 
> > btw., we'd love to get more feedback regarding CFS. CFS is a completely 
> > new scheduler for Linux. 
> 
> Then I'd rather have it separate from rt. 

Please?

I would like to provide the least amount of new functionality that is
really necessary in my audio kernels. Audio related requirements include
the rt patch but not a new scheduler. 

> > It has a design centered around keeping 
> > application latencies down, so it is ultimately real-time friendly, and 
> > it should also make things work better for desktop-ish and audio-ish 
> > stuff as well. (even under SCHED_OTHER)
> 
> Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
> list):
> 
> On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:
> > Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
> > on my desktop but on my laptop it makes Firefox and Tomboy to crash.
> > On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
> > problem.

It looks to my untrained eye like it is CFS related, I'm attaching the
last part of the strace of firefox while it tries to load a flash site.
The firefox process is left in an unkillable (not even by -9) state.
What else could I provide to debug the problem? (this is in a T61 laptop
with the Intel 7700 processor). 

-- Fernando



Re: Please revert 21564fd2a3deb48200b595332f9ed4c9f311f2a7

2007-07-08 Thread Adrian Bunk

On Sun, Jun 17, 2007 at 09:43:51PM -0700, Jeremy Fitzhardinge wrote:
> Adrian Bunk wrote:
>...
> > Andi forwarded it although the following people had already NAK'ed it:
> > - Christoph Hellwig [1]
> > - Peter Zijlstra [2]
> > - Alan Cox [3]
> >
> > Considering that Andi forwarded it 2 days after he himself said a 
> > different solution was pending [4] I assume he mistakenly sent it for 
> > inclusion in your tree.
> >   
> 
> We played with some ideas, but they all turned out way too ugly to live. 

Andi got some NAK's, said himself it will be solved differently, and
two days later he submits the NAK'ed patch into Linus' tree.

Was this a mistake that should be reverted at least for now because of 
this, or is silently doing the opposite of what you said you'd do how
Linux development is expected to work today?

> > Reverting is safe since it simply re-establishes the 2.6.21 status quo.
> 
> Well, not really.  It breaks any non-GPL module when CONFIG_PARAVIRT is
> enabled, even though the same module would work fine otherwise.  That's
> a pretty large regression.
>...

The 2.6.21 status quo can by definition not be a regression compared
to 2.6.21.

> J

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: v2.6.21.5-rt19

2007-07-08 Thread Fernando Lopez-Lezcano

On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> > > > Changes since 2.6.21.5-rt18:
> > > >
> > > > - Fixed a nasty and hard to track down slowness / boot problem on SMP
> > > > machines with CONFIG_NOHZ enabled. The problem was caused by the timer
> > > > wheel base lock held during the get_next_timer_interrupt() call in the
> > > > idle path, which eventually led to a bogus PI boosting of the idle task
> > > > and in consequence a stale wrong scheduler selection for the affected
> > > > idle task.
> > > >
> > > > Kudos to Carsten Emde, who patiently and meticulously isolated the
> > > > problem and provided the traces, which allowed to identify the root cause.
> > > >
> > > > Problem solution: Prevent idle task boosting
> 
> > > Maybe someone remember me whining about troubles with 2.6.21-rt2..18 
> > > on my Core2 T7200 laptop (fujitsu-siemens amilo i1520).
> > > 
> > > Althought I'm still with my fingers crossed, I can tell the good 
> > > news are that 2.6.21.5-rt19 (and -rt20) does behave far better now 
> > > on the very same box.
> > 
> > Yes, it works much better indeed...
> > 
> > Ingo: is there a place where I can read about the changes in different 
> > rtxx releases? What is new/better/fixed in rt20? (I see scheduler 
> > stuff in a diff from rt19 to rt20 but I don't really know what it 
> > means).
> 
> and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when 
> CFS was merged.
> 
> i _think_ Rui might have seen two separate problems. Perhaps by the time 
> we fixed the first problem (which Rui saw since -rt2) we introduced the 
> other one via -rt11 - which then got fixed in -rt19.

Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
really trying to provide a "stable" rt kernel for audio usage and
including another subsystem into rt is - IMHO - not going to help.
What's the chance of splitting things?

> btw., we'd love to get more feedback regarding CFS. CFS is a completely 
> new scheduler for Linux. 

Then I'd rather have it separate from rt. 

> It has a design centered around keeping 
> application latencies down, so it is ultimately real-time friendly, and 
> it should also make things work better for desktop-ish and audio-ish 
> stuff as well. (even under SCHED_OTHER)

Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
list):

On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:
> Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
> on my desktop but on my laptop it makes Firefox and Tomboy to crash.
> On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
> problem.
> 
> Cheers,
> 
> Hector
> 
> 
> On 7/7/07, Hector Centeno <[EMAIL PROTECTED]> wrote:
> Hi Fernando,
> 
> I do have Flash installed but for me Firefox crashes when trying to
> access gmail (which AFAIK doesn't use Flash, does it?). Right now
> Firefox is frozen and I'm typing this email using Konqueror (in
> Gnome). This is ps' output:
> 
> hector    3595  1.1  2.2 194352 46336 ?        D    16:25   0:03 /usr/lib/firefox-2.0.0.4/firefox-bin
> 
> I think the problem is not present in my Desktop but I have to double
> check. In the same laptop using the stock fedora kernel both Tomboy
> and Firefox work fine. My laptop has a centrino duo processor, 2 gigs
> of ram and the Intel GMA950 graphics chip.
> 
> Hector

I managed to completely hang firefox (fc7) with flash 9 installed
(unkillable even with -9). Does not seem to happen with flash 7. Have
not tried yet with gmail and flash uninstalled. I'll try to strace it to
see when/why it hangs. 

-- Fernando


> So it would be nice if you could keep an extra eye on any scheduling 
> artifacts or regressions, and make sure your favorite workload is still 
> handled by the Linux scheduler in the utmost best way. I'd like to hear 
> about any sort of "scheduling behavior / interactivity" regression you 
> might see, relative to the vanilla kernel. Or if you can see no such 
> problems then a line of "it works as well as the previous scheduler" is 
> important info to us too. Thanks!



Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread david


On Mon, 9 Jul 2007, Pavel Machek wrote:


>>>Actually, I'm perfectly fine with that, as long as each task blocked by the
>>>driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
>>>least theoretically, we'll be able to drop the freezer from the suspend code
>>>path and move it after device_suspend() (or the hibernation-specific
>>>equivalent) for hibernation (in that case there shouldn't be a problem with
>>>any task waiting on I/O while the freezer is running ;-)).
>>
>>I don't see the need for a freezer for snapshot but that's a different
>>issue. (stop_machine looks good enough to me).
>
>Freezer is not needed for snapshot -- it is needed so that we can
>write out the snapshot to disk without the need for special
>drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
>write to disk from userland code in uswsusp).

instead of trying to freeze most of the system, could you do something 
like start a virtual machine sandbox to write the data out, and not let 
any userspace other than the sandbox operate?

you would need to throw away disk buffers so that you don't mix current 
pending I/O with I/O from the sandbox, and this would be a visible change 
for how suspend is set up, but wouldn't this work?


David Lang

hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

Hi!

> > Actaully, I'm perfectly fine with that, as long as each task blocked by the
> > driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> > least theoretically, we'll be able to drop the freezer from the suspend code
> > path and move it after device_suspend() (or the hibernation-specific
> > equivalent) for hibernation (in that case there shouldn't be a problem with
> > any task waiting on I/O while the freezer is running ;-)).
> 
> I don't see the need for a freezer for snapshot but that's a different
> issue. (stop_machine looks good enough to me).

Freezer is not needed for snapshot -- it is needed so that we can
write out the snapshot to disk without the need for special
drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
write to disk from userland code in uswsusp).
Pavel 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 23:20, Benjamin Herrenschmidt wrote:
> 
> > But I'm not sure it's a good idea in the long run.  Think of a printer 
> > daemon, for example.  It shouldn't have to experience unexpected I/O 
> > problems merely because someone has decided to put the system to sleep.
> 
> Why not ? Printer is offline when machine is asleep... trying to print
> errors out, I don't see the problem there. At one point, we'll need a
> cleaner way to also notify userland in which case our daemon could
> become more intelligent and stop servicing things before sleep and
> resume afterward :-)
>  
> > This will be up to the people responsible for the subsystems.  I can 
> > take care of USB.
> 
> USB is not that much of a problem in the sense that for most "leaf"
> drivers, USB is a provider (ie, the bus they sit on), not the client
> (like the network stack is to network drivers).
> 
> In most cases, that "helper" thing would sit on the client subsystem,
> since it's the one feeding drivers with requests. The main ones I see at
> hand are block, alsa, net, fb/drm... Some of them already have
> infrastructure to do it, some may need some more work.
> 
> > > I think it's a fairly significant change from the current freezer and I
> > > also think it's a very good idea. The more I think about it, the more I
> > > like it, in the sense that it's a simple drop-in that you could put in a
> > > lot of the ioctl path of drivers to just block tasks that are trying to
> > > call in while suspending, and could be used selectively by things like
> > > the USB hub threads.
> > 
> > That's what I had in mind.  Rafael, can we add an "icebox" routine?  
> > Like Ben says, it doesn't need to be much more than a waitqueue
> > that the current task puts itself on if a suspend is in progress.  
> > Callers arriving at a time when the icebox isn't activated should
> > simply return without blocking.  Basically the icebox should be active 
> > at the same times as the existing freezer.
> 
> There is still the race of:
> 
>   drivers_sysfs_write()
>   try_to_icebox()
>   < 
>   hit hardware
> 
> Those are akin, in some ways, to the freezer races.

I'm not sure what races you're referring to, but never mind. :-)

This is the reason why the freezer waits for tasks to freeze, actually.

> Some kind of RCU might take care of them if we enable the icebox,
> then wait for all tasks to hit an explicit schedule point once (or return to
> userland).

This is what the freezer does, isn't it? ;-)

> That would mean that drivers need to try_to_icebox() again if they do
> something that may schedule (such as __get_user). So it's not a magic
> solution, it has issues, but it can handle a lot of the simplest cases. 

Well, can't we do:

  drivers_sysfs_write()
    while (!suspend_trylock())
      try_to_icebox() --> or even try_to_freeze(), what's the difference?

    hit hardware
    unlock_suspend()

where the PM core must wait for the suspend lock to get released (say with a
timeout)?
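
[Editorial sketch] The suspend_trylock() scheme above can be modelled in user
space roughly as follows. This is only an illustration under stated
assumptions: struct suspend_lock, pm_mark_suspending(), pm_may_proceed() and
pm_resume() are hypothetical names that do not exist in the kernel, C11
atomics stand in for whatever primitive the PM core would really use, and
where real code would sleep in try_to_icebox() and retry, the model simply
returns an error.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Top bit: suspend in progress. Remaining bits: number of current holders. */
#define SUSPENDING_BIT (1u << 31)

/* Hypothetical "suspend lock"; not a real kernel structure. */
struct suspend_lock {
    atomic_uint state;
};

/* Driver side: succeeds only while no suspend is in progress. */
static bool suspend_trylock(struct suspend_lock *l)
{
    unsigned int old = atomic_load(&l->state);

    do {
        if (old & SUSPENDING_BIT)
            return false;   /* caller should go to the icebox */
    } while (!atomic_compare_exchange_weak(&l->state, &old, old + 1));
    return true;
}

/* Driver side: drop the hold taken by suspend_trylock(). */
static void unlock_suspend(struct suspend_lock *l)
{
    unsigned int old = atomic_fetch_sub(&l->state, 1);

    assert((old & ~SUSPENDING_BIT) > 0);   /* must have been held */
}

/* PM core side: from now on every suspend_trylock() fails. */
static void pm_mark_suspending(struct suspend_lock *l)
{
    atomic_fetch_or(&l->state, SUSPENDING_BIT);
}

/* PM core side: true once every earlier holder has unlocked. */
static bool pm_may_proceed(struct suspend_lock *l)
{
    return atomic_load(&l->state) == SUSPENDING_BIT;
}

/* PM core side: let drivers take the lock again after resume. */
static void pm_resume(struct suspend_lock *l)
{
    atomic_fetch_and(&l->state, ~SUSPENDING_BIT);
}

/*
 * Model of drivers_sysfs_write(): returns 0 after "hitting hardware",
 * or -1 where real code would block in try_to_icebox() and retry.
 */
static int driver_sysfs_write(struct suspend_lock *l, int *hw_writes)
{
    if (!suspend_trylock(l))
        return -1;
    (*hw_writes)++;         /* hit hardware */
    unlock_suspend(l);
    return 0;
}
```

In this model the PM core would call pm_mark_suspending() and then poll
pm_may_proceed(), giving up after a timeout, which corresponds to "wait for
the suspend lock to get released" above. Because the hardware access sits
between the trylock and the unlock, it cannot overlap with a suspend that has
already been allowed to proceed.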
 
> > Here's a wacky idea which just might work:
> > 
> > In order to prevent binding and unbinding, while suspending devices all
> > the PM core has to do is avoid dropping the device semaphores!  It can
> > release the semaphores as it resumes the devices.
> > 
> > Of course, for this to work it's necessary to avoid changes to the 
> > device list during the suspend.  However I believe the iteration can be 
> > made safe against unregistration, so we only have to prevent device 
> > registration.  (And anyway, it won't be possible to unregister a device 
> > while the PM core is holding its semaphore.)
> > 
> > If we are willing to be somewhat non-transparent, this is easy to
> > accomplish.  After the notifier chain has been alerted about the
> > upcoming suspend, we tell the driver core to disallow adding new
> > devices.  Maybe use SRCU to synchronize with registration calls that
> > are in progress.  Thus, until the suspend is over device_add() will
> > immediately return an error.  We could even add a new ESUSPENDING code
> > to errno.h; it would come in handy in a few places.
> > 
> > Drivers are already prepared for device registration to fail (or they
> > ought to be), so this change shouldn't knock the bottom out of things.  
> > device_add() isn't on a hot path, so adding an extra check and
> > srcu_read_lock() won't hurt.
> 
> True. Also, bus drivers could just flag the port with something saying
> "try registering again later". Don't underestimate the power of "try
> later" constructs :-)
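
[Editorial sketch] The two ideas quoted above, device_add() failing while a
suspend is in progress and bus drivers flagging a port as "try registering
again later", can be modelled together as below. Everything here is an
assumption made for illustration: struct driver_core, device_add_model(),
bus_register_port() and pm_resume_registration() are hypothetical, the
ESUSPENDING value is invented (no such code exists in errno.h), and the SRCU
synchronisation with in-flight registrations is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ESUSPENDING  1000   /* invented value, for illustration only */
#define MAX_DEFERRED 8

/* Hypothetical stand-in for the driver core's registration state. */
struct driver_core {
    bool   registration_blocked;    /* set once the notifiers have run */
    int    registered;              /* devices successfully added */
    int    deferred[MAX_DEFERRED];  /* ports to retry after resume */
    size_t ndeferred;
};

/* Model of device_add(): fails immediately while suspend is in progress. */
static int device_add_model(struct driver_core *c, int port)
{
    (void)port;
    if (c->registration_blocked)
        return -ESUSPENDING;
    c->registered++;
    return 0;
}

/* Bus driver side: on -ESUSPENDING, remember the port for a later retry. */
static int bus_register_port(struct driver_core *c, int port)
{
    int err = device_add_model(c, port);

    if (err == -ESUSPENDING && c->ndeferred < MAX_DEFERRED)
        c->deferred[c->ndeferred++] = port;
    return err;
}

/* After resume: allow registration again and retry the deferred ports. */
static void pm_resume_registration(struct driver_core *c)
{
    size_t i, n = c->ndeferred;

    assert(n <= MAX_DEFERRED);
    c->registration_blocked = false;
    c->ndeferred = 0;
    for (i = 0; i < n; i++)
        bus_register_port(c, c->deferred[i]);
}
```

The point of the model is only that a registration refused during suspend is
not lost: the bus driver records it and the same call succeeds once the
suspend is over.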
> 
> > I have had the same thought, that unbinding and unregistration would be 
> > easier to handle than binding and registration.  As it happens, holding 
> > the device semaphore will block all three -- which makes life 
> > simpler.
> 
> Yup. Fair and simple.
> 
> > There are other possibilities too.  For example, instead of using
> > keventd these

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt

On Sun, 2007-07-08 at 23:45 +0200, Rafael J. Wysocki wrote:
> 
> Workqueues are kernel threads and the creator decides if they are going to
> freeze.  There are only two freezable workqueues in the entire tree right now.

That and keventd workqueues... my point is you may well end up with
something in a workqueue doing things that should have been blocked
after the freeze.

> Actually, I'm perfectly fine with that, as long as each task blocked by the
> driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> least theoretically, we'll be able to drop the freezer from the suspend code
> path and move it after device_suspend() (or the hibernation-specific
> equivalent) for hibernation (in that case there shouldn't be a problem with
> any task waiting on I/O while the freezer is running ;-)).

I don't see the need for a freezer for snapshot but that's a different
issue. (stop_machine looks good enough to me).

Ben


ERROR: "ROOT_DEV" [drivers/mtd/maps/nettel.ko] undefined!

2007-07-08 Thread Jesper Juhl

While building randconfig kernels I ran into this today : 

...
Kernel: arch/i386/boot/bzImage is ready  (#1)
  Building modules, stage 2.
  MODPOST 239 modules
ERROR: "ROOT_DEV" [drivers/mtd/maps/nettel.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

The config that caused this is attached.


(PS. when replying from the linux-mtd list, please keep me on Cc since
I'm not subscribed)

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc7
# Sun Jul  8 23:29:43 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
CONFIG_UTS_NS=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_EMBEDDED=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
# CONFIG_ELF_CORE is not set
# CONFIG_BASE_FULL is not set
# CONFIG_FUTEX is not set
# CONFIG_ANON_INODES is not set
# CONFIG_SHMEM is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_TINY_SHMEM=y
CONFIG_BASE_SMALL=1

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
CONFIG_LBD=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_LSF=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
CONFIG_DEFAULT_NOOP=y
CONFIG_DEFAULT_IOSCHED="noop"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
# CONFIG_X86_PC is not set
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
CONFIG_X86_VISWS=y
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
CONFIG_MWINCHIPC6=y
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
CONFIG_NR_CPUS=8
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_VISWS_APIC=y
# CONFIG_X86_MCE is not set
# CONFIG_VM86 is not set
CONFIG_TOSHIBA=y
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
# CONFIG_X86_CPUID is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
CONFIG_DELL_RBU=y
CONFIG_DCDBAS=m
# CONFIG_NOHIGHMEM is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_PAGE_OFFSET=0xC000
CONFIG_HIGHMEM=y
CONFIG_X86_PAE=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
#

Re: [2.6.22-rc7] khubd NULL deref oops...

2007-07-08 Thread Michal Piotrowski


Hi Daniel,

On 08/07/07, Daniel J Blueman <[EMAIL PROTECTED]> wrote:

> When plugging in a USB 2 mass-storage device which I've been seeing
> problems with, I caught a khubd oops [1]. Kernel is 2.6.22-rc7

Is this a regression?

> on ia32
> built with Ubuntu's 2.6.22 .config.

Please send this config.

> Let me know if you need more details,
>   Daniel

--- [1]

[ 4764.112000] usb 5-3: new high speed USB device using ehci_hcd and address 8
[ 4764.244000] PM: Adding info for usb:5-3
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8_ep00
[ 4764.244000] usb 5-3: configuration #1 chosen from 1 choice
[ 4764.244000] PM: Adding info for usb:5-3:1.0
[ 4764.244000] scsi6 : SCSI emulation for USB Mass Storage devices
[ 4764.244000] PM: Adding info for No Bus:host6
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8_ep81
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8_ep02
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8
[ 4764.244000] usb-storage: device found at 8
[ 4764.244000] usb-storage: waiting for device to settle before scanning
[ 4769.244000] usb-storage: device scan complete
[ 4769.244000] PM: Adding info for No Bus:target6:0:0
[ 4769.244000] scsi 6:0:0:0: Direct-Access Maxtor 6 Y120L0
  0811 PQ: 0 ANSI: 0
[ 4769.244000] PM: Adding info for No Bus:target6:0:1
[ 4769.244000] PM: Removing info for No Bus:target6:0:1
[ 4769.244000] PM: Adding info for No Bus:target6:0:2
[ 4769.244000] PM: Removing info for No Bus:target6:0:2
[ 4769.244000] PM: Adding info for No Bus:target6:0:3
[ 4769.244000] PM: Removing info for No Bus:target6:0:3
[ 4769.244000] PM: Adding info for No Bus:target6:0:4
[ 4769.244000] PM: Removing info for No Bus:target6:0:4
[ 4769.244000] PM: Adding info for No Bus:target6:0:5
[ 4769.244000] PM: Removing info for No Bus:target6:0:5
[ 4769.244000] PM: Adding info for No Bus:target6:0:6
[ 4769.244000] PM: Removing info for No Bus:target6:0:6
[ 4769.244000] PM: Adding info for No Bus:target6:0:7
[ 4769.244000] PM: Removing info for No Bus:target6:0:7
[ 4769.244000] PM: Adding info for scsi:6:0:0:0
[ 4769.248000] sd 6:0:0:0: [sdb] 240121728 512-byte hardware sectors (122942 MB)
[ 4769.248000] sd 6:0:0:0: [sdb] Test WP failed, assume Write Enabled
[ 4769.248000] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[ 4769.248000] sd 6:0:0:0: [sdb] 240121728 512-byte hardware sectors (122942 MB)
[ 4769.252000] sd 6:0:0:0: [sdb] Test WP failed, assume Write Enabled
[ 4769.252000] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[ 4769.252000]  sdb: sdb1 < sdb5 sdb6<6>usb 5-3: reset high speed USB
device using ehci_hcd and address 8
[ 4769.544000] usb 5-3: device descriptor read/64, error -71
[ 4769.760000] usb 5-3: device descriptor read/64, error -71
[ 4769.976000] usb 5-3: reset high speed USB device using ehci_hcd and address 8
[ 4770.088000] usb 5-3: device descriptor read/64, error -71
[ 4770.304000] usb 5-3: device descriptor read/64, error -71
[ 4770.520000] usb 5-3: reset high speed USB device using ehci_hcd and address 8
[ 4770.928000] usb 5-3: device not accepting address 8, error -71
[ 4771.040000] usb 5-3: reset high speed USB device using ehci_hcd and address 8
[ 4771.448000] usb 5-3: device not accepting address 8, error -71
[ 4771.448000] usb 5-3: USB disconnect, address 8
[ 4771.448000] PM: Removing info for No Bus:usbdev5.8_ep81
[ 4771.448000] PM: Removing info for No Bus:usbdev5.8_ep02
[ 4771.448000] BUG: unable to handle kernel NULL pointer dereference
at virtual address 
[ 4771.448000]  printing eip:
[ 4771.448000] c0255cc7
[ 4771.448000] *pde = 
[ 4771.448000] Oops:  [#1]
[ 4771.448000] SMP
[ 4771.448000] Modules linked in: usb_storage ide_core libusual
binfmt_misc nfs lockd sunrpc ipv6 sonypi ppdev acpi_cpufreq
cpufreq_stats cpufreq_powersave cpufreq_conservative cpufreq_userspace
cpufreq_ondemand freq_table button ac video dock container sbs battery
af_packet sbp2 parport_pc lp parport fuse snd_hda_intel snd_pcm_oss
snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss joydev snd_seq_midi
snd_rawmidi snd_seq_midi_event tifm_7xx1 pcmcia snd_seq snd_timer
snd_seq_device sky2 serio_raw tifm_core snd soundcore yenta_socket
rsrc_nonstatic pcmcia_core psmouse iTCO_wdt iTCO_vendor_support
snd_page_alloc shpchp pci_hotplug intel_agp agpgart evdev ext3 jbd
mbcache sg sd_mod sr_mod cdrom ata_generic ata_piix ohci1394 ieee1394
libata scsi_mod ehci_hcd uhci_hcd usbcore thermal processor fan
[ 4771.448000] CPU:0
[ 4771.448000] EIP:0060:[]Not tainted VLI
[ 4771.448000] EFLAGS: 00010202   (2.6.22-rc7-50 #1)
[ 4771.448000] EIP is at make_class_name+0x27/0x80
[ 4771.448000] eax:    ebx:    ecx:    edx: 000b
[ 4771.448000] esi: f89ffca6   edi:    ebp:    esp: f7d05e5c
[ 4771.448000] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[ 4771.448000] Process khubd (pid: 2188, ti=f7d04000 task=c190f480
task.ti=f7d04000)
[ 4771.448000] Stack: dfc9a608 dfc9a494 dfc9a600 dfc9a608 f8a12ac0
c0255e59  f8a12b88
[

[2.6.22-rc7] khubd NULL deref oops...

2007-07-08 Thread Daniel J Blueman


When plugging in a USB 2 mass-storage device which I've been seeing
problems with, I caught a khubd oops [1]. Kernel is 2.6.22-rc7 on ia32
built with Ubuntu's 2.6.22 .config.

Let me know if you need more details,
 Daniel

--- [1]

[ 4764.112000] usb 5-3: new high speed USB device using ehci_hcd and address 8
[ 4764.244000] PM: Adding info for usb:5-3
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8_ep00
[ 4764.244000] usb 5-3: configuration #1 chosen from 1 choice
[ 4764.244000] PM: Adding info for usb:5-3:1.0
[ 4764.244000] scsi6 : SCSI emulation for USB Mass Storage devices
[ 4764.244000] PM: Adding info for No Bus:host6
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8_ep81
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8_ep02
[ 4764.244000] PM: Adding info for No Bus:usbdev5.8
[ 4764.244000] usb-storage: device found at 8
[ 4764.244000] usb-storage: waiting for device to settle before scanning
[ 4769.244000] usb-storage: device scan complete
[ 4769.244000] PM: Adding info for No Bus:target6:0:0
[ 4769.244000] scsi 6:0:0:0: Direct-Access Maxtor 6 Y120L0
 0811 PQ: 0 ANSI: 0
[ 4769.244000] PM: Adding info for No Bus:target6:0:1
[ 4769.244000] PM: Removing info for No Bus:target6:0:1
[ 4769.244000] PM: Adding info for No Bus:target6:0:2
[ 4769.244000] PM: Removing info for No Bus:target6:0:2
[ 4769.244000] PM: Adding info for No Bus:target6:0:3
[ 4769.244000] PM: Removing info for No Bus:target6:0:3
[ 4769.244000] PM: Adding info for No Bus:target6:0:4
[ 4769.244000] PM: Removing info for No Bus:target6:0:4
[ 4769.244000] PM: Adding info for No Bus:target6:0:5
[ 4769.244000] PM: Removing info for No Bus:target6:0:5
[ 4769.244000] PM: Adding info for No Bus:target6:0:6
[ 4769.244000] PM: Removing info for No Bus:target6:0:6
[ 4769.244000] PM: Adding info for No Bus:target6:0:7
[ 4769.244000] PM: Removing info for No Bus:target6:0:7
[ 4769.244000] PM: Adding info for scsi:6:0:0:0
[ 4769.248000] sd 6:0:0:0: [sdb] 240121728 512-byte hardware sectors (122942 MB)
[ 4769.248000] sd 6:0:0:0: [sdb] Test WP failed, assume Write Enabled
[ 4769.248000] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[ 4769.248000] sd 6:0:0:0: [sdb] 240121728 512-byte hardware sectors (122942 MB)
[ 4769.252000] sd 6:0:0:0: [sdb] Test WP failed, assume Write Enabled
[ 4769.252000] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[ 4769.252000]  sdb: sdb1 < sdb5 sdb6<6>usb 5-3: reset high speed USB
device using ehci_hcd and address 8
[ 4769.544000] usb 5-3: device descriptor read/64, error -71
[ 4769.760000] usb 5-3: device descriptor read/64, error -71
[ 4769.976000] usb 5-3: reset high speed USB device using ehci_hcd and address 8
[ 4770.088000] usb 5-3: device descriptor read/64, error -71
[ 4770.304000] usb 5-3: device descriptor read/64, error -71
[ 4770.520000] usb 5-3: reset high speed USB device using ehci_hcd and address 8
[ 4770.928000] usb 5-3: device not accepting address 8, error -71
[ 4771.040000] usb 5-3: reset high speed USB device using ehci_hcd and address 8
[ 4771.448000] usb 5-3: device not accepting address 8, error -71
[ 4771.448000] usb 5-3: USB disconnect, address 8
[ 4771.448000] PM: Removing info for No Bus:usbdev5.8_ep81
[ 4771.448000] PM: Removing info for No Bus:usbdev5.8_ep02
[ 4771.448000] BUG: unable to handle kernel NULL pointer dereference
at virtual address 
[ 4771.448000]  printing eip:
[ 4771.448000] c0255cc7
[ 4771.448000] *pde = 
[ 4771.448000] Oops:  [#1]
[ 4771.448000] SMP
[ 4771.448000] Modules linked in: usb_storage ide_core libusual
binfmt_misc nfs lockd sunrpc ipv6 sonypi ppdev acpi_cpufreq
cpufreq_stats cpufreq_powersave cpufreq_conservative cpufreq_userspace
cpufreq_ondemand freq_table button ac video dock container sbs battery
af_packet sbp2 parport_pc lp parport fuse snd_hda_intel snd_pcm_oss
snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss joydev snd_seq_midi
snd_rawmidi snd_seq_midi_event tifm_7xx1 pcmcia snd_seq snd_timer
snd_seq_device sky2 serio_raw tifm_core snd soundcore yenta_socket
rsrc_nonstatic pcmcia_core psmouse iTCO_wdt iTCO_vendor_support
snd_page_alloc shpchp pci_hotplug intel_agp agpgart evdev ext3 jbd
mbcache sg sd_mod sr_mod cdrom ata_generic ata_piix ohci1394 ieee1394
libata scsi_mod ehci_hcd uhci_hcd usbcore thermal processor fan
[ 4771.448000] CPU:0
[ 4771.448000] EIP:0060:[]Not tainted VLI
[ 4771.448000] EFLAGS: 00010202   (2.6.22-rc7-50 #1)
[ 4771.448000] EIP is at make_class_name+0x27/0x80
[ 4771.448000] eax:    ebx:    ecx:    edx: 000b
[ 4771.448000] esi: f89ffca6   edi:    ebp:    esp: f7d05e5c
[ 4771.448000] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[ 4771.448000] Process khubd (pid: 2188, ti=f7d04000 task=c190f480
task.ti=f7d04000)
[ 4771.448000] Stack: dfc9a608 dfc9a494 dfc9a600 dfc9a608 f8a12ac0
c0255e59  f8a12b88
[ 4771.448000]dfc9a600 dfc9a494 0246  c0255ee8
dfc9a400 f89f9b56 dfc9a400
[ 4771.448000]efc26800

[2.6 patch] the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI

2007-07-08 Thread Adrian Bunk

On Mon, Jul 02, 2007 at 09:13:54PM -0400, Dave Jones wrote:
> On Tue, Jul 03, 2007 at 03:06:11AM +0200, Adrian Bunk wrote:
> 
>  > >  > @@ -65,1 +60,1 @@
>  > >  > - depends on X86_ACPI_CPUFREQ || X86_SPEEDSTEP_CENTRINO_ACPI || 
> X86_POWERNOW_K8_ACPI
>  > >  > + depends on X86_ACPI_CPUFREQ || X86_POWERNOW_K8_ACPI
>  > >  > --- 
> linux-2.6.20-mm1/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c.old
> 2007-02-17 23:29:53.0 +0100
>  > >  > +++ 
> linux-2.6.20-mm1/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
> 2007-02-17 23:30:44.0 +0100
>  > > 
>  > > This won't apply either.
>  > 
>  >   git-apply --unidiff-zero
> 
> git-apply isn't the same thing as git-applymbox. (which doesn't know that 
> option)
> The former involves a lot more hassle to get the changelog & comments
> pasted in, along with it getting attribution correct automatically.

Next try...

>   Dave

cu
Adrian


<--  snip  -->


This patch contains the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
Acked-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
Acked-by: Dave Jones <[EMAIL PROTECTED]>

---

 Documentation/feature-removal-schedule.txt|   22 -
 arch/i386/kernel/cpu/cpufreq/Kconfig  |   18 
 arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c |  280 --
 arch/x86_64/kernel/cpufreq/Kconfig|6 
 4 files changed, 21 insertions(+), 305 deletions(-)

--- linux-2.6.19-rc6-mm2/arch/i386/kernel/cpu/cpufreq/Kconfig.old   
2006-12-01 07:23:38.0 +0100
+++ linux-2.6.19-rc6-mm2/arch/i386/kernel/cpu/cpufreq/Kconfig   2006-12-01 
07:24:02.0 +0100
@@ -109,7 +109,7 @@
 config X86_SPEEDSTEP_CENTRINO
tristate "Intel Enhanced SpeedStep"
select CPU_FREQ_TABLE
-   select X86_SPEEDSTEP_CENTRINO_TABLE if (!X86_SPEEDSTEP_CENTRINO_ACPI)
+   select X86_SPEEDSTEP_CENTRINO_TABLE
help
  This adds the CPUFreq driver for Enhanced SpeedStep enabled
  mobile CPUs.  This means Intel Pentium M (Centrino) CPUs. However,
@@ -121,20 +121,6 @@
 
  If in doubt, say N.
 
-config X86_SPEEDSTEP_CENTRINO_ACPI
-   bool "Use ACPI tables to decode valid frequency/voltage (deprecated)"
-   depends on X86_SPEEDSTEP_CENTRINO && ACPI_PROCESSOR
-   depends on !(X86_SPEEDSTEP_CENTRINO = y && ACPI_PROCESSOR = m)
-   help
- This is deprecated and this functionality is now merged into
- acpi_cpufreq (X86_ACPI_CPUFREQ). Use that driver instead of
- speedstep_centrino.
- Use primarily the information provided in the BIOS ACPI tables
- to determine valid CPU frequency and voltage pairings. It is
- required for the driver to work on non-Banias CPUs.
-
- If in doubt, say Y.
-
 config X86_SPEEDSTEP_CENTRINO_TABLE
bool "Built-in tables for Banias CPUs"
depends on X86_SPEEDSTEP_CENTRINO
@@ -222,7 +207,7 @@
 config X86_ACPI_CPUFREQ_PROC_INTF
bool "/proc/acpi/processor/../performance interface (deprecated)"
depends on PROC_FS
-   depends on X86_ACPI_CPUFREQ || X86_SPEEDSTEP_CENTRINO_ACPI || 
X86_POWERNOW_K7_ACPI || X86_POWERNOW_K8_ACPI
+   depends on X86_ACPI_CPUFREQ || X86_POWERNOW_K7_ACPI || 
X86_POWERNOW_K8_ACPI
help
  This enables the deprecated /proc/acpi/processor/../performance
  interface. While it is helpful for debugging, the generic,
--- linux-2.6.20-mm1/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c.old  
2007-02-17 23:29:53.0 +0100
+++ linux-2.6.20-mm1/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c  
2007-02-17 23:30:44.0 +0100
@@ -21,12 +21,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI
-#include 
-#include 
-#include 
-#endif
-
 #include 
 #include 
 #include 
@@ -257,9 +251,7 @@
/* Matched a non-match */
dprintk("no table support for CPU model \"%s\"\n",
   cpu->x86_model_id);
-#ifndef CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI
-   dprintk("try compiling with CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI 
enabled\n");
-#endif
+   dprintk("try using the acpi-cpufreq driver\n");
return -ENOENT;
}
 
@@ -346,213 +338,6 @@
 }
 
 
-#ifdef CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI
-
-static struct acpi_processor_performance *acpi_perf_data[NR_CPUS];
-
-/*
- * centrino_cpu_early_init_acpi - Do the preregistering with ACPI P-States
- * library
- *
- * Before doing the actual init, we need to do _PSD related setup whenever
- * supported by the BIOS. These are handled by this early_init routine.
- */
-static int centrino_cpu_early_init_acpi(void)
-{
-   unsigned int    i, j;
-   struct acpi_processor_performance   *data;
-
-   for_each_possible_cpu(i) {
-   data = kzalloc(sizeof(struct acpi_processor_performance), 
-   GFP_KERNEL);
-   if (!data) {
-

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 23:03, Benjamin Herrenschmidt wrote:
> On Sun, 2007-07-08 at 21:15 +0200, Rafael J. Wysocki wrote:
> > On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote:
> > [--snip--]
> > > 
> > > I just think that the freezer approach, as it is, is backward. We can't
> > > have a 3rd party try to discriminate what to freeze and what not, it
> > > will always get something wrong, and in some cases with the wrong timing
> > > or ordering.
> > 
> > Nice discussion, except for one thing: the freezer doesn't decide what to
> > freeze.  For example, even right now kernel threads decide if they want to
> > be frozen.
> 
> Somewhat... userspace doesn't and workqueues are a gray area.

Workqueues are kernel threads and the creator decides if they are going to
freeze.  There are only two freezable workqueues in the entire tree right now.

> Also, I've been thinking this "icebox" idea a bit more and it seems in
> fact a bit racy in some areas, at least for use by things like drivers,
> unless we end up doing something akin to an RCU on suspend, waiting for
> all tasks to reach userland once, but that has the same annoyances as
> the current freezer.
> 
> Thus I'm tempted to go back to saying that driver can handle things
> locally :-)

Actually, I'm perfectly fine with that, as long as each task blocked by the
driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
least theoretically, we'll be able to drop the freezer from the suspend code
path and move it after device_suspend() (or the hibernation-specific
equivalent) for hibernation (in that case there shouldn't be a problem with
any task waiting on I/O while the freezer is running ;-)).

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

cfi_interleave undefined build error

2007-07-08 Thread Jesper Juhl


For your information;

I was building a few randconfig kernels today and ran into this little
problem :

...
 CC [M]  drivers/mtd/chips/cfi_probe.o
In file included from drivers/mtd/chips/cfi_probe.c:19:
include/linux/mtd/cfi.h: In function 'cfi_build_cmd':
include/linux/mtd/cfi.h:293: warning: implicit declaration of function
'cfi_interleave'
 CC [M]  drivers/mtd/chips/cfi_util.o
In file included from drivers/mtd/chips/cfi_util.c:27:
include/linux/mtd/cfi.h: In function 'cfi_build_cmd':
include/linux/mtd/cfi.h:293: warning: implicit declaration of function
'cfi_interleave'
 CC [M]  drivers/mtd/chips/cfi_cmdset_0020.o
In file included from drivers/mtd/chips/cfi_cmdset_0020.c:36:
include/linux/mtd/cfi.h: In function 'cfi_build_cmd':
include/linux/mtd/cfi.h:293: warning: implicit declaration of function
'cfi_interleave'
 CC [M]  drivers/mtd/chips/cfi_cmdset_0002.o
In file included from drivers/mtd/chips/cfi_cmdset_0002.c:39:
include/linux/mtd/cfi.h: In function 'cfi_build_cmd':
include/linux/mtd/cfi.h:293: warning: implicit declaration of function
'cfi_interleave'
 CC [M]  drivers/mtd/chips/gen_probe.o
In file included from drivers/mtd/chips/gen_probe.c:13:
include/linux/mtd/cfi.h: In function 'cfi_build_cmd':
include/linux/mtd/cfi.h:293: warning: implicit declaration of function
'cfi_interleave'
 CC [M]  drivers/mtd/chips/jedec_probe.o
In file included from drivers/mtd/chips/jedec_probe.c:24:
include/linux/mtd/cfi.h: In function 'cfi_build_cmd':
include/linux/mtd/cfi.h:293: warning: implicit declaration of function
'cfi_interleave'
...
Kernel: arch/i386/boot/bzImage is ready  (#1)
 Building modules, stage 2.
 MODPOST 102 modules
ERROR: "cfi_interleave" [drivers/mtd/chips/jedec_probe.ko] undefined!
ERROR: "cfi_interleave" [drivers/mtd/chips/cfi_util.ko] undefined!
ERROR: "cfi_interleave" [drivers/mtd/chips/cfi_probe.ko] undefined!
ERROR: "cfi_interleave" [drivers/mtd/chips/cfi_cmdset_0020.ko] undefined!
ERROR: "cfi_interleave" [drivers/mtd/chips/cfi_cmdset_0002.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

The config file that resulted in this is attached.

(PS. when replying from the linux-mtd list, please keep me on Cc since
I'm not subscribed)

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html


config.10
Description: Binary data

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 22:16, Alan Stern wrote:
> I'll make this reply short by agreeing up front with most of what you 
> say.
> 
> On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote:
> 
> > But that's only the "main" path. Aside from that, almost all drivers also
> > have sideband "request" input and some drivers don't actually live behind
> > a subsystem. That ranges from ioctl, to direct read/write on a char dev
> > from userland.
> 
> Yes, these are the problem cases.
> 
> > I think many of those cases can fairly well deal with just taking a PM
> > semaphore, that's how I did for a couple of things in the past, provided
> > that the request path isn't deadlocking with the semaphore held because
> > of the system suspending of course.  
> 
> That's what USB does as well (for the drivers which have runtime PM
> support -- at the moment only a few of them).
> 
> > But in a whole lot of cases, it's, I believe, perfectly kosher to just
> > return an error. You're trying to capture a frame from your camera while
> > the machine is suspended ? error. At worst, your capture app will be
> > unhappy when you wakeup, nothing terrible and totally fixable in
> > userland if it's a problem.
> 
> We can try falling back on this approach for now.  If the drivers are
> smart enough to fail cleanly when the device is already suspended, it
> should work.

And are they?

> But I'm not sure it's a good idea in the long run.  Think of a printer 
> daemon, for example.  It shouldn't have to experience unexpected I/O 
> problems merely because someone has decided to put the system to sleep.

Agreed.

> > In some cases, we could use a little bit more help from the subsystem.
> > Network for example, could have some explicit knowledge of the suspend
> > state, and in addition to stopping the queue would also stop calling
> > into things like change_mtu or set_multicast, provided it's agreed that
> > the driver will account for those changes on resume (the actual MTU
> > values or multicast lists are still updated in the netdev).
> 
> This will be up to the people responsible for the subsystems.  I can 
> take care of USB.
> 
> > There are two things I believe. There's a generic issue with usermode
> > helpers that make no sense to call between pre-suspend and
> > post-resume, and there's the specific issue of adding/removing
> > devices.
> > 
> > I believe that "bus" drivers such as USB should indeed get a first
> > round of notifications to tell them to stop performing bus
> > plug/unplug operations (it's debatable whether we want to keep unplug
> > going provided we can stack up the usermode events and re-send them
> > later though, but let's say no for the sake of simplicity).
> 
> Yes.  Rafael, how close is your new notifier chain to mainline?  Can it 
> at least be added to Greg KH's development tree so that I can start 
> using it?

It's in -mm.  The patches queued up in -mm are also in my patchset at
http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/

> > > So instead, why not have the PM core take care of all this?  There
> > > could be a block_task_until_suspend_is_over() routine available for all
> > > drivers to use.  Its effect would be exactly the same as sending the
> > > current task into the freezer, but it wouldn't be the freezer that
> > > exists now.  It would just be some routine that blocks until the system 
> > > suspend is over.  We could call it "the icebox" instead of "the 
> > > freezer".  :-)
> > 
> > I'm not totally sure about that. I like some of it, but I think it's
> > fairly different conceptually from the freezer (and the implementation
> > could be as trivial as a single system wide wait queue). 
> 
> Exactly.
> 
> > Basically it has a very big difference to the current freezer, and I
> > like that, which is that we don't have some 3rd party trying to find out
> > what to freeze and what not (the freezer), but instead, we have
> > explicitly drivers or kernel threads sending -themselves- to the
> > "icebox" when they think it's a good idea. Think of it as lazy freezing
> > -> you only freeze lazy tasks that are trying to do something that
> > cannot be done because of suspend.
> > 
> > > Does that make you happier?
> > 
> > I think it's a fairly significant change from the current freezer and I
> > also think it's a very good idea. The more I think about it, the more I
> > like it, in the sense that it's a simple drop-in that you could put in a
> > lot of the ioctl path of drivers to just block tasks that are trying to
> > call in while suspending, and could be used selectively by things like
> > the USB hub threads.
> 
> That's what I had in mind.  Rafael, can we add an "icebox" routine?  

Yes, I think so.

> Like Ben says, it doesn't need to be much more than a waitqueue
> that the current task puts itself on if a suspend is in progress.  
> Callers arriving at a time when the icebox isn't activated should
> simply return without blocking.  Basically the icebox should be

Re: Understanding I/O behaviour

2007-07-08 Thread Jesper Juhl

On 05/07/07, Jesper Juhl <[EMAIL PROTECTED]> wrote:

On 05/07/07, Martin Knoblauch <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  for a customer we are operating a rackful of HP/DL380/G4 boxes that
> have given us some problems with system responsiveness under [I/O
> triggered] system load.
>
>  The systems in question have the following HW:
>
> 2x Intel/EM64T CPUs
> 8GB memory
> CCISS Raid controller with 4x72GB SCSI disks as RAID5
> 2x BCM5704 NIC (using tg3)
>
>  The distribution is RHEL4. We have tested several kernels including
> the original 2.6.9, 2.6.19.2, 2.6.22-rc7 and 2.6.22-rc7+cfs-v18.
>
>  One part of the workload is when several processes try to write 5 GB
> each to the local filesystem (ext2->LVM->CCISS). When this happens, the
> load goes up to 12 and responsiveness goes down. This means from one
> moment to the next things like opening a ssh connection to the host in
> question, or doing "df" take forever (minutes). Especially bad with the
> vendor kernel, better (but not perfect) with 2.6.19 and 2.6.22-rc7.
>
>  The load basically comes from the writing processes and up to 12
> "pdflush" threads all being in "D" state.
>
>  So, what I would like to understand is how we can maximize the
> responsiveness of the system, while keeping disk throughput at maximum.
>

I'd suspect you can't get both at 100%.

I'd guess you are probably using a 100Hz no-preempt kernel.  Have you
tried a 1000Hz + preempt kernel?   Sure, you'll get a bit lower
overall throughput, but interactive responsiveness should be better -
if it is, then you could experiment with various combinations of
CONFIG_PREEMPT, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT_NONE and
CONFIG_HZ_1000, CONFIG_HZ_300, CONFIG_HZ_250, CONFIG_HZ_100 to see
what gives you the best balance between throughput and interactive
responsiveness (you could also throw CONFIG_PREEMPT_BKL and/or
CONFIG_NO_HZ, but I don't think the impact will be as significant as
with the other options, so to keep things simple I'd leave those out
at first).

I'd guess that something like CONFIG_PREEMPT_VOLUNTARY + CONFIG_HZ_300
would probably be a good compromise for you, but just to see if
there's any effect at all, start out with CONFIG_PREEMPT +
CONFIG_HZ_1000.
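
For a sense of scale, the tick length behind each CONFIG_HZ choice is plain arithmetic: one second divided by HZ. A minimal sketch in userspace C; the helper name tick_period_us is made up for illustration and is not a kernel function:

```c
#include <assert.h>

/* Length of one timer tick in microseconds for a given CONFIG_HZ value.
 * Hypothetical helper for illustration only. */
unsigned int tick_period_us(unsigned int hz)
{
    assert(hz > 0);
    return 1000000u / hz;   /* integer division, so HZ=300 rounds down */
}
```

With this, tick_period_us(100) is 10000 (10 ms per tick) and tick_period_us(1000) is 1000 (1 ms per tick). A higher HZ gives the scheduler more frequent chances to switch tasks, at the cost of more timer interrupts.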

I'm curious, did you ever try playing around with CONFIG_PREEMPT* and
CONFIG_HZ* to see if that had any noticeable impact on interactive
performance and stuff like logging into the box via ssh etc...?

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt

On Sun, 2007-07-08 at 16:22 -0400, Alan Stern wrote:
> On Sun, 8 Jul 2007, Rafael J. Wysocki wrote:
> 
> > I'm all for changing this infrastructure, but in an organized way (ie. we
> > discuss what to do next, we do that and then we go to the next step) and in
> > the order that everyone will be comfortable with.
> > 
> > So, let's finish this thread and start over from discussing what needs to be
> > done, how (ie. in what order etc.) we are going to do that and who is going to
> > do what.  Shall we?
> 
> IMO we should start by using the new notifier chain and by implementing 
> a central "icebox" routine.  Then we can forbid device registration 
> during suspend.
> 
> It might also be a good idea to add a freezable keventd-like workqueue 
> specifically intended for things that need to block during a suspend.  
> Although maybe this will end up being unnecessary; it's too soon to 
> tell.

I think looking at the kmalloc/gfp issue I mentioned earlier should be
done asap. I can try to spare some time for it this week but I can't
promise, I'm actually swamped by something totally unrelated at the
moment.

Another issue that's been a problem forever with suspend is the
synchronous request_firmware interface. Lots of drivers do that in
resume() which will generally not work.

Ben.



Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt


> But I'm not sure it's a good idea in the long run.  Think of a printer 
> daemon, for example.  It shouldn't have to experience unexpected I/O 
> problems merely because someone has decided to put the system to sleep.

Why not ? Printer is offline when machine is asleep... trying to print
errors out, I don't see the problem there. At one point, we'll need a
cleaner way to also notify userland in which case our daemon could
become more intelligent and stop servicing things before sleep and
resume afterward :-)
 
> This will be up to the people responsible for the subsystems.  I can 
> take care of USB.

USB is not that much of a problem in the sense that for most "leaf"
drivers, USB is a provider (ie, the bus they sit on), not the client
(like the network stack is to network drivers).

In most cases, that "helper" thing would sit on the client subsystem,
since it's the one feeding drivers with requests. The main ones I see at
hand are block, alsa, net, fb/drm... Some of them already have
infrastructure to do it, some may need some more work.

> > I think it's a fairly significant change from the current freezer and I
> > also think it's a very good idea. The more I think about it, the more I
> > like it, in the sense that it's a simple drop-in that you could put in a
> > lot of the ioctl path of drivers to just block tasks that are trying to
> > call in while suspending, and could be used selectively by things like
> > the USB hub threads.
> 
> That's what I had in mind.  Rafael, can we add an "icebox" routine?  
> Like Ben says, it doesn't need to be much more than a waitqueue
> that the current task puts itself on if a suspend is in progress.  
> Callers arriving at a time when the icebox isn't activated should
> simply return without blocking.  Basically the icebox should be active 
> at the same times as the existing freezer.

There is still the race of:

drivers_sysfs_write()
try_to_icebox()
< 
hit hardware

Those are akin, in some ways, to the freezer races. Some kind of RCU
might take care of them if we enable the icebox, then wait for all tasks
to hit an explicit schedule point once (or return to userland). That
would mean that drivers need to try_to_icebox() again if they do
something that may schedule (such as __get_user). So it's not a magic
solution, it has issues, but it can handle a lot of the simplest cases. 
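
To make the idea concrete, here is a rough userspace model of the icebox as a single wait queue, using pthreads in place of kernel wait queues. This is only a sketch under that assumption: try_to_icebox() is the name used in the race above, while icebox_activate(), icebox_release() and struct icebox are invented for the example and are not existing kernel interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the proposed "icebox": one system-wide wait queue. */
struct icebox {
    pthread_mutex_t lock;
    pthread_cond_t  thawed;
    bool            active;   /* true while a suspend is in progress */
};

#define ICEBOX_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Called by the (modelled) PM core before it starts suspending devices. */
void icebox_activate(struct icebox *box)
{
    pthread_mutex_lock(&box->lock);
    box->active = true;
    pthread_mutex_unlock(&box->lock);
}

/* Called after resume: wake every task that parked itself in the icebox. */
void icebox_release(struct icebox *box)
{
    pthread_mutex_lock(&box->lock);
    box->active = false;
    pthread_cond_broadcast(&box->thawed);
    pthread_mutex_unlock(&box->lock);
}

/*
 * Block the caller for as long as a suspend is in progress.
 * Returns true if the caller had to wait, false if it passed straight through.
 */
bool try_to_icebox(struct icebox *box)
{
    bool waited = false;

    assert(box != NULL);
    pthread_mutex_lock(&box->lock);
    while (box->active) {
        waited = true;
        pthread_cond_wait(&box->thawed, &box->lock);
    }
    pthread_mutex_unlock(&box->lock);
    return waited;
}
```

The model has the same window as the race above: once try_to_icebox() has returned, nothing stops icebox_activate() from running before the caller touches the hardware. Closing that window needs something extra, such as the RCU-like wait mentioned above or a lock held across the hardware access.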

> Here's a wacky idea which just might work:
> 
> In order to prevent binding and unbinding, while suspending devices all
> the PM core has to do is avoid dropping the device semaphores!  It can
> release the semaphores as it resumes the devices.
> 
> Of course, for this to work it's necessary to avoid changes to the 
> device list during the suspend.  However I believe the iteration can be 
> made safe against unregistration, so we only have to prevent device 
> registration.  (And anyway, it won't be possible to unregister a device 
> while the PM core is holding its semaphore.)
> 
> If we are willing to be somewhat non-transparent, this is easy to
> accomplish.  After the notifier chain has been alerted about the
> upcoming suspend, we tell the driver core to disallow adding new
> devices.  Maybe use SRCU to synchronize with registration calls that
> are in progress.  Thus, until the suspend is over device_add() will
> immediately return an error.  We could even add a new ESUSPENDING code
> to errno.h; it would come in handy in a few places.
> 
> Drivers are already prepared for device registration to fail (or they
> ought to be), so this change shouldn't knock the bottom out of things.  
> device_add() isn't on a hot path, so adding an extra check and
> srcu_read_lock() won't hurt.

True. Also, bus drivers could just flag the port with something saying
"try registering again later". Don't underestimate the power of "try
later" constructs :-)
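
A toy model of that "refuse now, try later" flow, in plain userspace C. Everything here is illustrative: ESUSPENDING is only a proposed name in this thread, so its value is invented, and model_device_add(), model_retry_ports(), struct pm_state and struct port are hypothetical stand-ins, not driver-core interfaces. The SRCU synchronization mentioned above is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ESUSPENDING 1000   /* hypothetical value; only the name was proposed */

struct pm_state { bool suspending; };   /* set once the notifier chain has run */
struct port     { bool registered; bool retry_later; };

/* Model of device_add(): refuse new devices while a suspend is in progress. */
int model_device_add(const struct pm_state *pm, struct port *p)
{
    assert(pm != NULL && p != NULL);
    if (pm->suspending) {
        p->retry_later = true;   /* bus driver flags the port for later */
        return -ESUSPENDING;
    }
    p->registered = true;
    p->retry_later = false;
    return 0;
}

/* Model of the resume path: retry every port that was refused earlier.
 * Returns the number of ports registered by the retry. */
int model_retry_ports(struct pm_state *pm, struct port *ports, int n)
{
    int added = 0;

    pm->suspending = false;
    for (int i = 0; i < n; i++)
        if (ports[i].retry_later && model_device_add(pm, &ports[i]) == 0)
            added++;
    return added;
}
```

A port refused during suspend keeps retry_later set and is registered by model_retry_ports() once the suspend is over, so the caller of the failed add does not have to do anything beyond handling the error.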

> I have had the same thought, that unbinding and unregistration would be 
> easier to handle than binding and registration.  As it happens, holding 
> the device semaphore will block all three -- which makes life
> simpler.

Yup. Fair and simple.

> There are other possibilities too.  For example, instead of using
> keventd these attributes could use a separate workqueue which would put
> itself in the icebox during a suspend.  Or maybe sysfs can be reworked 
> so that they don't need to use a workqueue at all.

I think having a facility for a given workqueue entry to requeue itself
for after resume might be of use for drivers too.
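
Such a facility could look roughly like the following, again as a single-threaded userspace model rather than real workqueue code. All names (struct resume_queue, run_or_park(), resume_and_flush(), bump_counter()) are invented for the sketch, and the fixed-size array is an arbitrary simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARKED 8   /* arbitrary limit for the sketch */

typedef void (*work_fn)(void *data);

struct work { work_fn fn; void *data; };

struct resume_queue {
    bool        suspending;           /* true while a suspend is in progress */
    size_t      count;                /* number of parked work items */
    struct work parked[MAX_PARKED];
};

/* Run the work now, or park it until resume.
 * Returns 1 if run immediately, 0 if parked, -1 if the park list is full. */
int run_or_park(struct resume_queue *q, work_fn fn, void *data)
{
    assert(q != NULL && fn != NULL);
    if (!q->suspending) {
        fn(data);
        return 1;
    }
    if (q->count == MAX_PARKED)
        return -1;
    q->parked[q->count++] = (struct work){ fn, data };
    return 0;
}

/* Resume path: run parked work in arrival order; returns how many ran. */
size_t resume_and_flush(struct resume_queue *q)
{
    size_t ran = q->count;

    q->suspending = false;
    for (size_t i = 0; i < q->count; i++)
        q->parked[i].fn(q->parked[i].data);
    q->count = 0;
    return ran;
}

/* Sample work item for experimenting: increments an int counter. */
void bump_counter(void *data)
{
    ++*(int *)data;
}
```

Work submitted while suspending is set is held back and runs only from resume_and_flush(), which is the "requeue itself for after resume" behaviour in miniature.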

> > Don't get me wrong, I never said we don't need generic infrastructure
> > and utilities, such as your proposed icebox scheme, or some of those
> > workqueue bits, helpers in subsystems, etc...
> 
> I hope that everyone will agree that now is a good time to get started 
> on them.  There shouldn't be any problem about having them present 
> along with the freezer, and then it will be all that much easier to 
> remove the freezer later on if

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Pavel Machek

Hi!

> > I'm all for changing this infrastructure, but in an organized way (ie. we
> > discuss what to do next, we do that and then we go to the next step) and in
> > the order that everyone will be comfortable with.
> > 
> > So, let's finish this thread and start over from discussing what needs to be
> > done, how (ie. in what order etc.) we are going to do that and who is going to
> > do what.  Shall we?
> 
> IMO we should start by using the new notifier chain and by implementing 
> a central "icebox" routine.  Then we can forbid device registration 
> during suspend.
> 
> It might also be a good idea to add a freezable keventd-like workqueue 
> specifically intended for things that need to block during a suspend.  
> Although maybe this will end up being unnecessary; it's too soon to 
> tell.

We actually had patches for freezable workqueues at one point.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Pavel Machek

Hi!

> > > I just think that the freezer approach, as it is, is backward. We can't
> > > have a 3rd party try to discriminate what to freeze and what not, it
> > > will always get something wrong, and in some cases with the wrong timing
> > > or ordering.
> > 
> > Nice discussion, except for one thing: the freezer doesn't decide what to
> > freeze.  For example, even right now kernel threads decide if they want to be
> > frozen.
> 
> Somewhat... userspace doesn't and workqueues are a gray area.

But userspace must not be necessary for kernel functioning, so that's
quite okay. And we do need to solve the workqueues.

> Also, I've been thinking about this "icebox" idea a bit more and it seems in
> fact a bit racy in some areas, at least for use by things like drivers,
> unless we end up doing something akin to an RCU on suspend, waiting for
> all tasks to reach userland once, but that has the same annoyances as
> the current freezer.
> 
> Thus I'm tempted to go back to saying that driver can handle things
> locally :-)

:-). Or perhaps freezer is not _that_ evil after all?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: 2.6.22-rc6 bad page error

2007-07-08 Thread Bob Tracy

Hugh Dickins wrote:
> (...) I'm expecting this to be a regression we introduced in
> 2.6.15, rather than recently in 2.6.22 (now, that's better isn't it ;-?)

I was reasonably certain this wasn't a recent regression.  I share a set
of speakers between two machines, and I hadn't used sound on the Alpha
in quite a while.

> Thanks for reporting: please let us all know whether this
> patch does fix your problem: I may be guessing wrong.

ACK.  I'll build the kernel today, and fire up the test fixture
tomorrow.

-- 
---
Bob Tracy   | "Eagles may soar, but weasels don't get
[EMAIL PROTECTED]|  sucked into jet engines."   --Anon
---

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt

On Sun, 2007-07-08 at 21:15 +0200, Rafael J. Wysocki wrote:
> On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote:
> [--snip--]
> > 
> > I just think that the freezer approach, as it is, is backward. We can't
> > have a 3rd party try to discriminate what to freeze and what not, it
> > will always get something wrong, and in some cases with the wrong timing
> > or ordering.
> 
> Nice discussion, except for one thing: the freezer doesn't decide what to
> freeze.  For example, even right now kernel threads decide if they want to be
> frozen.

Somewhat... userspace doesn't and workqueues are a gray area.

Also, I've been thinking about this "icebox" idea a bit more and it seems in
fact a bit racy in some areas, at least for use by things like drivers,
unless we end up doing something akin to an RCU on suspend, waiting for
all tasks to reach userland once, but that has the same annoyances as
the current freezer.

Thus I'm tempted to go back to saying that driver can handle things
locally :-)

Ben.


Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Pavel Machek

Hi!

> > There are two things I believe. There's a generic issue with usermode
> > helpers that make no sense to call between pre-suspend and
> > post-resume, and there's the specific issue of adding/removing
> > devices.
> > 
> > I believe that "bus" drivers such as USB should indeed get a first
> > round of notifications to tell them to stop performing bus
> > plug/unplug operations (it's debatable whether we want to keep unplug
> > going provided we can stack up the usermode events and re-send them
> > later though, but let's say no for the sake of simplicity).
> 
> Yes.  Rafael, how close is your new notifier chain to mainline?  Can it 
> at least be added to Greg KH's development tree so that I can start 
> using it?

It should be in -mm, IIRC.

> > I think it's a fairly significant change from the current freezer and I
> > also think it's a very good idea. The more I think about it, the more I
> > like it, in the sense that it's a simple drop-in that you could put in a
> > lot of the ioctl path of drivers to just block tasks that are trying to
> > call in while suspending, and could be used selectively by things like
> > the USB hub threads.
> 
> That's what I had in mind.  Rafael, can we add an "icebox" routine?  
> Like Ben says, it doesn't need to be much more than a waitqueue
> that the current task puts itself on if a suspend is in progress.  
> Callers arriving at a time when the icebox isn't activated should
> simply return without blocking.  Basically the icebox should be active 
> at the same times as the existing freezer.

You could use try_to_freeze(), but you want it to be separate routine
so it can be grepped for... Can you #define icebox_me try_to_freeze
for testing?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 21:50, Miklos Szeredi wrote:
> > > > Well, fix userspace filesystems and maybe NFS. If they react to
> > > > sigstop in timely manner, they will work with suspend properly, too.
> > > 
> > > Which is pretty much impossible, given the unix filesystem API.  To be
> > > able to react to sigstop, the operations in question need to be
> > > restartable.  Which they are not, so they can't react to sigstop.  End
> > > of story.
> > 
> > Or not.  That depends on your willingness to cooperate, I'd say. :-)
> 
> Do you actually understand what I'm talking about?  Because it sure
> doesn't depend on my cooperation.
>
> Maybe I'm stupid, and I'm missing something obvious.  In that case
> please explain how you propose to make filesystem operations, like
> rename() restartable.

I'm not proposing that, sorry.  I'm not a filesystem expert, so don't think
I can propose a working solution here, right now.  Still, maybe something
like in fork.c, line 1409-1411 might work (I know that it won't solve the
"other tasks may be waiting on VFS mutexes" issue, but at least it may
decrease the probability of freezer failure).

> > > You may not like the fact that one process can cause another to go
> > > into uninterruptible sleep, but in fact there's nothing wrong with
> > > that.
> > 
> > Well, this introduces interdependencies between processes that do not exist
> > otherwise.  Even if that isn't wrong per se, it's something that needs
> > consideration in any case.
> > 
> > IMO, FUSE breaks one of the assumptions that the freezer is based on and
> > saying that the freezer is broken because of that is unfair.
> 
> The freezer is not broken because of that, it's broken anyway.  What
> we are seeing is a _symptom_ of it being broken.
> 
> And by broken, I don't mean it's buggy or that it was badly designed.
> I just mean, that it's simply not what suspend should depend on, to
> protect drivers.

Please have a look at the documentation update at the bottom of this patch:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch

It says what the freezer is for in the first place. :-)
 
> > > So the fact that the freezer can't handle this is unfortunate, but
> > > it's just a symptom of the brokenness of it, not something that fuse
> > > introduced.  Not being able to suspend with NFS (or other network
> > > filesystems) when the network is lost shows that this is a deeper
> > > problem.
> > 
> > Well, the system that cannot access its filesystems is not in a consistent
> > state, so it generally is not reasonable to suspend or hibernate it.
> 
> Saying the system must be in a "consistent" state, and defining
> consistent as "every process is stopped", is just an arbitrary
> limitation that fits what the freezer does now.  Yes the _hardware_
> state must be consistent, but that has nothing to do with either fuse
> or NFS.  Can't you see that?

Well, I can agree as far as FUSE is concerned.  Still, imagine that you have
an NFS share mounted, which is unavailable at the moment and one of the
tasks waits for an I/O on it to complete (it might hold some locks needed
for suspending, but say it doesn't).  Then you suspend, take the box somewhere
else and attach to a different network with a different NFS server.  Now, you
resume and you have a mess to recover from.  Yes, it _should_ be recoverable,
but refusing to suspend in such a case is not overkill, IMHO.

> > > As stated otherwise in the thread, suspend2 in fact allowed processes
> > > to be in uninterruptible sleep instead, without negative side effects.
> > 
> > And yet, Nigel thinks that the freezer is necessary for the hibernation.
> > Strange, no?
> 
> I'm totally ignorant about why the freezer is necessary for hibernate.
> Please explain.

See above. :-)

> Yes, we need to make sure, that nothing is scheduled during (and
> possibly after) taking the snapshot.  But AFAICS that could be
> achieved by unplugging all but one CPU.

Actually, we want things to get scheduled, because we need some of them to
save the image (to make things more difficult, we don't know what will be
needed to save the image in advance ;-)).

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth