Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-10-24 Thread Rafael J. Wysocki
On Thursday, 11 October 2007 22:54, Pavel Machek wrote:
> Hi!
> 
> > > That's certainly possible. We already pass a very small amount of data
> > > between the boot and resuming kernels at the moment, and it's done quite
> > > simply - by putting the variables we want to 'transfer' in a nosave
> > > page/section.
> > 
> > Well, if the boot and image kernels are different, which is now possible on
> > x86_64 with some recent patches (currently in -mm), the nosave trick won't
> > work.
> 
> I guess we should remove the nosave at least from x86-64. If
> someone tries to use it, he'll get a nasty surprise.

Agreed.

I'll try to prepare a patch for that when I have a bit of time.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-10-24 Thread Pavel Machek
Hi!

> > That's certainly possible. We already pass a very small amount of data
> > between the boot and resuming kernels at the moment, and it's done quite
> > simply - by putting the variables we want to 'transfer' in a nosave
> > page/section.
> 
> Well, if the boot and image kernels are different, which is now possible on
> x86_64 with some recent patches (currently in -mm), the nosave trick won't
> work.

I guess we should remove the nosave at least from x86-64. If
someone tries to use it, he'll get a nasty surprise.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-27 Thread Nigel Cunningham
Hi.

On Thursday 27 September 2007 16:33:54 Huang, Ying wrote:
> On Wed, 2007-09-26 at 16:30 -0400, Joseph Fannin wrote:
> > But, in my ignorance, I'm not sure even fixing the ext3 bug will
> > guarantee you consistent metadata so that you can handle a
> > swap/hibernate file.  You can do a sync(), but how do you make that
> > not race against running processes without the freezer, or blkdev
> > snapshots?
> > 
> > I guess uswsusp and the-patch-previously-known-as-suspend2 handle
> > this somehow, though.
> 
> The image-writing kernel of kexec-based hibernation runs in a controlled
> way. It is not used by normal users, so only the processes that are really
> necessary need to run. For example, it is possible that there is only one
> user process -- the image-writing process -- running in the image-writing
> kernel. So, no freezer or blkdev snapshot is needed.

You're thinking of the wrong kernel - we were talking about prior to switching 
to the kexec'd kernel while suspending.

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-27 Thread Huang, Ying
On Wed, 2007-09-26 at 16:30 -0400, Joseph Fannin wrote:
> But, in my ignorance, I'm not sure even fixing the ext3 bug will
> guarantee you consistent metadata so that you can handle a
> swap/hibernate file.  You can do a sync(), but how do you make that
> not race against running processes without the freezer, or blkdev
> snapshots?
> 
> I guess uswsusp and the-patch-previously-known-as-suspend2 handle
> this somehow, though.

The image-writing kernel of kexec-based hibernation runs in a controlled
way. It is not used by normal users, so only the processes that are really
necessary need to run. For example, it is possible that there is only one
user process -- the image-writing process -- running in the image-writing
kernel. So, no freezer or blkdev snapshot is needed.

Best Regards,
Huang Ying


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-26 Thread Nigel Cunningham
Hi.

On Thursday 27 September 2007 06:30:36 Joseph Fannin wrote:
> On Fri, Sep 21, 2007 at 11:45:12AM +0200, Pavel Machek wrote:
> > Hi!
> > > >
> > > > Sounds doable, as long as you can cope with long command lines (which
> > > > shouldn't be a biggie). (If you've got a swapfile or parts of a swap
> > > > partition already in use, it can be quite fragmented).
> > >
> > > Hmm.  This is an interesting problem.  Sharing a swap file or a swap
> > > partition with the actual swap of user space pages does seem to be
> > > a limitation of this approach.
> > >
> > > Although the fact that it is simple to write to a separate file may
> > > be a reasonable compensation.
> >
> > I'm not sure how you'd write it to a separate file. Notice that kjump
> > kernel may not mount journalling filesystems, not even
> > read-only. (Ext3 replays journal in that case). You could pass block
> > numbers from the original kernel...
> 
> The ext3 thing is a bug, the case for which I don't think has been
> adequately explained to the ext[34] folks.  There should be at least a
> no_replay mount flag available, or something.  It has ramifications
> for more than just hibernation.
> 
> And yeah, I'm gonna bring up the swap files thing again.  If you
> can hibernate to a swap file, you can hibernate to a dedicated
> hibernation file, and vice versa.
> 
> If you can't hibernate to a swap file, then swap files are
> effectively unsupported for any system you might want to hibernate.
> <handwave>I wonder what embedded folks would think about that
> </handwave>.
> 
> But, in my ignorance, I'm not sure even fixing the ext3 bug will
> guarantee you consistent metadata so that you can handle a
> swap/hibernate file.  You can do a sync(), but how do you make that
> not race against running processes without the freezer, or blkdev
> snapshots?
> 
> I guess uswsusp and the-patch-previously-known-as-suspend2 handle
> this somehow, though.
> 
>    (It's that same ignorance that has me waiting for someone with
> established credit with kernel people to make that argument for the
> ext3 bug, so I can hang my own reasons for thinking that it's bad off
> of theirs).

I haven't looked at swsusp support, but TuxOnIce handles all storage (swap 
partitions, swap files and ordinary files) by first allocating swap (if we're 
using swap), then bmapping the storage we're going to use. After that, we can 
freeze filesystems and processes with impunity. The allocated storage is then 
viewed as just a collection of bdevs, each with an ordered chain of extents 
defining which blocks we're going to read/write - a series of tapes if you 
like. In the image header, we store dev_ts and the block chains, together 
with the configuration information. As long as the same bdevs are configured 
at boot time prior to the echo > /sys/power/resume, we're in business. 
Filesystems don't need to be mounted because we don't use filesystem code 
anyway. (LVM etc does though in so far as it's needed to make the dev_t match 
the device again).

This matches with what you said above about hibernating to swap files and 
dedicated hibernation files - TuxOnIce uses exactly the same code to do the 
i/o to both; the variation is in the code to recognise the image header and 
allocate/free/bmap storage.
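
[Illustration, not part of the original message and not TuxOnIce code: a
minimal Python sketch of the "series of tapes" idea described above, assuming
the block numbers for a file have already been obtained (for example via
bmap). The names Extent, BdevChain, build_extents and tape are hypothetical.]

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Extent:
    """A run of consecutive blocks, both ends inclusive."""
    start: int
    end: int


@dataclass
class BdevChain:
    """One block device plus the ordered extents we may read or write."""
    dev_t: Tuple[int, int]  # (major, minor), stored in the image header
    extents: List[Extent]

    def blocks(self) -> Iterator[int]:
        # Walk the extents in order, like reading a tape front to back.
        for extent in self.extents:
            yield from range(extent.start, extent.end + 1)


def build_extents(block_numbers: Iterable[int]) -> List[Extent]:
    """Collapse block numbers, given in file order, into an extent chain."""
    extents: List[Extent] = []
    for block in block_numbers:
        if extents and extents[-1].end + 1 == block:
            # Block continues the current run: extend the last extent.
            extents[-1] = Extent(extents[-1].start, block)
        else:
            # Gap or backwards jump: start a new extent.
            extents.append(Extent(block, block))
    return extents


def tape(chains: Iterable[BdevChain]) -> Iterator[Tuple[Tuple[int, int], int]]:
    """Yield (dev_t, block) pairs across all devices in chain order."""
    for chain in chains:
        for block in chain.blocks():
            yield chain.dev_t, block
```

[Once the chains are built, no filesystem code is needed to walk them; a
fragmented swap file simply produces more, shorter extents.]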

<not a filesystem expert> Personally, I don't think ext[34] is broken. If 
there's data being left in the journal that will need replaying, then 
mounting without replaying the journal sounds wrong. Perhaps you should 
instead be arguing that nothing should be left in the journal after a 
filesystem freeze. But, of course, current code isn't doing a filesystem 
freeze (just a process freeze) and the kexec guys want to take even that 
away. </not a filesystem expert>

In short, I agree. AFAICS, you need both the process freezer and filesystem 
freezing to make this thing fly properly.

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-26 Thread Joseph Fannin
FWIW, on all the hardware I have, Windows is able to deal with:

(1) hibernate Windows
(2) run $(OTHER_OS)
(3) resume Windows

... which seems to me to say that Linux is doing it wrong if it can't
handle other ACPI users between hibernate and resume.  But maybe
that's just my hardware.

--
Joseph Fannin
[EMAIL PROTECTED]


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-26 Thread Joseph Fannin
On Fri, Sep 21, 2007 at 11:45:12AM +0200, Pavel Machek wrote:
> Hi!
> > >
> > > Sounds doable, as long as you can cope with long command lines (which
> > > shouldn't be a biggie). (If you've got a swapfile or parts of a swap
> > > partition already in use, it can be quite fragmented).
> >
> > Hmm.  This is an interesting problem.  Sharing a swap file or a swap
> > partition with the actual swap of user space pages does seem to be
> > a limitation of this approach.
> >
> > Although the fact that it is simple to write to a separate file may
> > be a reasonable compensation.
>
> I'm not sure how you'd write it to a separate file. Notice that kjump
> kernel may not mount journalling filesystems, not even
> read-only. (Ext3 replays journal in that case). You could pass block
> numbers from the original kernel...

The ext3 thing is a bug, the case for which I don't think has been
adequately explained to the ext[34] folks.  There should be at least a
no_replay mount flag available, or something.  It has ramifications
for more than just hibernation.

And yeah, I'm gonna bring up the swap files thing again.  If you
can hibernate to a swap file, you can hibernate to a dedicated
hibernation file, and vice versa.

If you can't hibernate to a swap file, then swap files are
effectively unsupported for any system you might want to hibernate.
<handwave>I wonder what embedded folks would think about that
</handwave>.

But, in my ignorance, I'm not sure even fixing the ext3 bug will
guarantee you consistent metadata so that you can handle a
swap/hibernate file.  You can do a sync(), but how do you make that
not race against running processes without the freezer, or blkdev
snapshots?

I guess uswsusp and the-patch-previously-known-as-suspend2 handle
this somehow, though.

   (It's that same ignorance that has me waiting for someone with
established credit with kernel people to make that argument for the
ext3 bug, so I can hang my own reasons for thinking that it's bad off
of theirs).

--
Joseph Fannin
[EMAIL PROTECTED]



Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-24 Thread Thomas Meyer
Andrew Morton wrote:
> On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
>
>   
>> Hi Andrew.
>>
>> On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
>> 
>>> Seems like good enough for -mm to me.
>>>
>>> Pavel
>>>   
>> Andrew, if I recall correctly, you said a while ago that you didn't want 
>> another hibernation implementation in the vanilla kernel. If you're going to 
>> consider merging this kexec code, will you also please consider merging 
>> TuxOnIce?
>>
>> 
>
> The theory is that kexec-based hibernation will mainly use preexisting
> kexec code and will permit us to delete the existing hibernation
> implementation.
>
> That's different from replacing it
Before replacing existing hibernation implementations, someone should
fix kexec for i386 (maybe others?) EFI systems...




Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-22 Thread Alon Bar-Lev
On 9/21/07, Huang, Ying <[EMAIL PROTECTED]> wrote:
> This is fairly simple in fact. For example, you can specify the
> bdev/sectors in kernel command line when do kexec load "kexec -l <...>
> --append='...'", then the image writing system can get it through
> "cat /proc/cmdline".

I hope you take into account encrypted swap configuration.
Currently all three suspend implementations support using encrypted
swap in order to suspend/resume.
A configuration which forces the user to remap encryption on the kexec
kernel during suspend is not valid.

Best Regards,
Alon Bar-Lev.
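
[Illustration, not part of the original message: a minimal Python sketch of
the quoted idea of handing the image location to the image-writing system on
the kernel command line and reading it back from /proc/cmdline. The parameter
name "hib_extents" and its "device:start+count,..." syntax are assumptions
made up for this example; they are not an existing kernel or kexec option.
It covers only the plain bdev/sector case, not the encrypted-swap setup
raised above.]

```python
from typing import List, Optional, Tuple


def parse_image_location(
    cmdline: str, key: str = "hib_extents"
) -> Optional[Tuple[str, List[Tuple[int, int]]]]:
    """Return (device, [(start_sector, sector_count), ...]) or None.

    Expects a hypothetical parameter such as
    hib_extents=/dev/sda2:2048+512,8192+1024
    """
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name != key or not sep:
            continue
        # Split "device:ranges" at the first colon.
        device, _, ranges = value.partition(":")
        extents = []
        for item in ranges.split(","):
            start, _, count = item.partition("+")
            extents.append((int(start), int(count)))
        return device, extents
    return None


def read_image_location(path: str = "/proc/cmdline"):
    """Read the running kernel's command line and parse it."""
    with open(path) as f:
        return parse_image_location(f.read())
```

[A fragmented swap file would need many start+count pairs, which is one way
such a command line can grow long.]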


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-22 Thread Rafael J. Wysocki
On Saturday, 22 September 2007 20:00, Kyle Moffett wrote:
> On Sep 22, 2007, at 06:34:17, Rafael J. Wysocki wrote:
> > On Saturday, 22 September 2007 01:19, Kyle Moffett wrote:
> >> On Sep 21, 2007, at 17:16:59, Jeremy Maitin-Shepard wrote:
> >>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> >>>> The ACPI platform firmware is allowed to preserve information
> >>>> across the hibernation-resume cycle, so this need not be the same.
> >>>
> >>> All of my comments related to the case where S4 is not being used  
> >>> (instead the system is just powered off normally), and a boot  
> >>> kernel that does not initialize ACPI is used.  In that case, the  
> >>> ACPI platform firmware should not be able to distinguish a normal  
> >>> boot from a resume from hibernation.
> >>
> >> I think that in order for this to work, there would need to be  
> >> some ABI whereby the resume-ing kernel can pass its entire ACPI  
> >> state and a bunch of other ACPI-related device details to the  
> >> resume-ed kernel, which I believe it does not do at the moment.
> >
> > In fact we don't need to do this.
> >
> > The solution is not to touch ACPI in the boot kernel (ie. the one  
> > that loads the image) and pass control to the image kernel.  This  
> > is how it's supposed to work according to the spec, more or less  
> > (well, there are some ugly details  that need handling, like the  
> > restoration of the NVS area).
> 
> First of all, we will need to make the resumed kernel throw away  
> *ALL* of its ACPI state on S5 and completely reinitialize ACPI as  
> though it was booting for the first time on resume.

Yes, if we entered S5 in the last step of the hibernation sequence, the right
thing to do would be to make the resumed kernel reinitialize ACPI from
scratch.

> From what I can tell, we "throw away" all the ACPI state in the boot kernel
> and reinitialize it there, but then the reinitialized state is  
> overwritten with the resumed kernel's state and the two don't always  
> happen to be the same.  (Like if a battery got replaced or AC status  
> changed).

Usually it goes like that.  Still, you can pass "acpi=off" to the boot kernel,
in which case it won't reinitialize ACPI.

> Umm, I don't see how that can possibly work properly.  For a laptop,  
> for example, the restore kernel will need to access the disk, the LCD  
> display, and possibly the AC/battery and current CPU frequency.  From  
> what I understand of ACPI, both of the former may need ACPI code to  
> operate properly (Isn't there an ATA taskfile object of some kind?)  
> and the latter two almost definitely need ACPI.

Well, this is not the case on any systems that I have access to, including
two quite modern notebooks.  Apparently, everything works without ACPI on
these machines.

Besides, in theory, it's possible to use an "intelligent" boot loader to read
the hibernation image and that doesn't need ACPI for anything.

> Ergo the boot kernel may need to initialize and use ACPI just to run an ATA
> taskfile so it can read from the HDD efficiently.

It is possible, but I haven't seen that yet.
 
> >> I believe that what causes problems is the ACPI state data that  
> >> the kernel stores is *different* between identical sequential  
> >> boots, especially when you add/remove/replace batteries, AC, etc.
> >
> > Rather the ACPI state data that the platform firmware stores may be  
> > different, depending on whether you enter S4 or S5 during "power  
> > off" and that determines the interactions between the kernel and  
> > the firmware after the next boot.
> 
> That's not what he was talking about.  The problem discussed was:
>    (A) You hibernate your box, entering S5 (IE: power off)
>    (B) You resume the box and the boot kernel inits all the ACPI stuff.
>    (C) The boot kernel's ACPI state is completely replaced by the
>        resumed kernel's state.
>    (D) Hardware stops working mysteriously because of ACPI problems.
> 
> The only possible conclusion is that the state between the boot  
> kernel and the resume kernel was *different* and so the device failed  
> because the ACPI state in the resume kernel doesn't match the actual  
> state of the hardware.

I think it's even more complicated.  The ACPI state of the resumed kernel
has to match whatever is preserved by the platform.

Well, my impression is that our current ACPI resume code actually expects
the platform to preserve something and if that's missing the devices in
question are not handled properly.  If that really is the case, there is the
question whether we can do something about it in a reasonable way and I can't
answer it right now.

Besides, I really think that we should use the ACPI S4 state, because machines
generally support that.

Greetings,
Rafael

Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-22 Thread Kyle Moffett

On Sep 22, 2007, at 06:34:17, Rafael J. Wysocki wrote:
> On Saturday, 22 September 2007 01:19, Kyle Moffett wrote:
>> On Sep 21, 2007, at 17:16:59, Jeremy Maitin-Shepard wrote:
>>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>>>> The ACPI platform firmware is allowed to preserve information
>>>> across the hibernation-resume cycle, so this need not be the same.
>>>
>>> All of my comments related to the case where S4 is not being used
>>> (instead the system is just powered off normally), and a boot
>>> kernel that does not initialize ACPI is used.  In that case, the
>>> ACPI platform firmware should not be able to distinguish a normal
>>> boot from a resume from hibernation.
>>
>> I think that in order for this to work, there would need to be
>> some ABI whereby the resume-ing kernel can pass its entire ACPI
>> state and a bunch of other ACPI-related device details to the
>> resume-ed kernel, which I believe it does not do at the moment.
>
> In fact we don't need to do this.
>
> The solution is not to touch ACPI in the boot kernel (ie. the one
> that loads the image) and pass control to the image kernel.  This
> is how it's supposed to work according to the spec, more or less
> (well, there are some ugly details that need handling, like the
> restoration of the NVS area).

First of all, we will need to make the resumed kernel throw away
*ALL* of its ACPI state on S5 and completely reinitialize ACPI as
though it was booting for the first time on resume.  From what I can
tell, we "throw away" all the ACPI state in the boot kernel and
reinitialize it there, but then the reinitialized state is
overwritten with the resumed kernel's state and the two don't always
happen to be the same.  (Like if a battery got replaced or AC status
changed).

Umm, I don't see how that can possibly work properly.  For a laptop,
for example, the restore kernel will need to access the disk, the LCD
display, and possibly the AC/battery and current CPU frequency.  From
what I understand of ACPI, both of the former may need ACPI code to
operate properly (Isn't there an ATA taskfile object of some kind?)
and the latter two almost definitely need ACPI.  Ergo the boot kernel
may need to initialize and use ACPI just to run an ATA taskfile so it
can read from the HDD efficiently.

>> I believe that what causes problems is the ACPI state data that
>> the kernel stores is *different* between identical sequential
>> boots, especially when you add/remove/replace batteries, AC, etc.
>
> Rather the ACPI state data that the platform firmware stores may be
> different, depending on whether you enter S4 or S5 during "power
> off" and that determines the interactions between the kernel and
> the firmware after the next boot.

That's not what he was talking about.  The problem discussed was:
  (A) You hibernate your box, entering S5 (IE: power off)
  (B) You resume the box and the boot kernel inits all the ACPI stuff.
  (C) The boot kernel's ACPI state is completely replaced by the
      resumed kernel's state.
  (D) Hardware stops working mysteriously because of ACPI problems.

The only possible conclusion is that the state between the boot
kernel and the resume kernel was *different* and so the device failed
because the ACPI state in the resume kernel doesn't match the actual
state of the hardware.


Cheers,
Kyle Moffett


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-22 Thread Rafael J. Wysocki
On Saturday, 22 September 2007 01:47, Nigel Cunningham wrote:
> Hi.
> 
> On Saturday 22 September 2007 09:19:18 Kyle Moffett wrote:
> > I think that in order for this to work, there would need to be some  
> > ABI whereby the resume-ing kernel can pass its entire ACPI state and  
> > a bunch of other ACPI-related device details to the resume-ed kernel,  
> > which I believe it does not do at the moment.  I believe that what  
> > causes problems is the ACPI state data that the kernel stores is  
> > *different* between identical sequential boots, especially when you  
> > add/remove/replace batteries, AC, etc.
> 
> That's certainly possible. We already pass a very small amount of data
> between the boot and resuming kernels at the moment, and it's done quite
> simply - by putting the variables we want to 'transfer' in a nosave
> page/section.

Well, if the boot and image kernels are different, which is now possible on
x86_64 with some recent patches (currently in -mm), the nosave trick won't
work.

Still, I don't think we need to pass anything from the boot to the image
kernel.  Moreover, we shouldn't do that, IMO (arguably, the boot kernel
could be replaced with a resume-aware boot loader).

> I could  conceive of a scheme wherein this was extended for driver data.
> Since the memory needed would depend on the drivers loaded, it would
> probably require that the space be allocated when hibernating, and the
> locations of structures be stored in the image header and then drivers
> notified of the locations to use when preparing to resume, but it could
> work... 
>  
> > Since we currently throw away most of that in-kernel ACPI interpreter  
> > state data when we load the to-be-resumed image and replace it with  
> > the state from the previous boot it looks to the ACPI code and  
> > firmware like our system's hardware magically changed behind its  
> > back.  The result is that the ACPI and firmware code is justifiably  
> > confused (although probably it should be more idempotent to begin  
> > with).  There's 2 potential solutions:
> >   1) Formalize and copy a *lot* of ACPI state from the resume-ing
> >      kernel to the resume-ed kernel.
> >   2) Properly call the ACPI S4 methods in the proper order
> 
> ... that said, I don't think the above should be necessary in most cases. I 
> believe we're already calling the ACPI S4 methods in the proper order. If I 
> understood correctly, Rafael put a lot of effort into learning what that was, 
> and into ensuring it does get done.

Yes, I did, but I can be wrong nevertheless. ;-)

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-22 Thread Rafael J. Wysocki
On Saturday, 22 September 2007 01:19, Kyle Moffett wrote:
> On Sep 21, 2007, at 17:16:59, Jeremy Maitin-Shepard wrote:
> > "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> >> The ACPI platform firmware is allowed to preserve information  
> >> across the hibernation-resume cycle, so this need not be the same.
> >
> > All of my comments related to the case where S4 is not being used  
> > (instead the system is just powered off normally), and a boot  
> > kernel that does not initialize ACPI is used.  In that case, the  
> > ACPI platform firmware should not be able to distinguish a normal  
> > boot from a resume from hibernation.
> 
> I think that in order for this to work, there would need to be some  
> ABI whereby the resume-ing kernel can pass its entire ACPI state and  
> a bunch of other ACPI-related device details to the resume-ed kernel,  
> which I believe it does not do at the moment.

In fact we don't need to do this.

The solution is not to touch ACPI in the boot kernel (ie. the one that loads
the image) and pass control to the image kernel.  This is how it's supposed
to work according to the spec, more or less (well, there are some ugly details
that need handling, like the restoration of the NVS area).

> I believe that what causes problems is the ACPI state data that the kernel
> stores is *different* between identical sequential boots, especially when
> you add/remove/replace batteries, AC, etc.

Rather the ACPI state data that the platform firmware stores may be different,
depending on whether you enter S4 or S5 during "power off" and that determines
the interactions between the kernel and the firmware after the next boot.

Greetings,
Rafael


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-22 Thread Alon Bar-Lev
On 9/21/07, Huang, Ying <[EMAIL PROTECTED]> wrote:
> This is fairly simple in fact. For example, you can specify the
> bdev/sectors in kernel command line when do kexec load kexec -l ...
> --append='...', then the image writing system can get it through
> cat /proc/cmdline.
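
To illustrate the command-line approach quoted above, a user-space image
writer could pull the target device and sector range out of the kernel
command line. The sketch below rests on assumptions: the parameter names
`hib_dev` and `hib_sectors` and the `START+COUNT` syntax are hypothetical
and are not defined by the kexec patches; only /proc/cmdline itself is a
real interface.

```python
def parse_hibernation_target(cmdline):
    """Extract a hypothetical hib_dev= / hib_sectors=START+COUNT pair
    from a kernel command line string. Returns None if either is absent
    or the sector range is malformed."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value

    dev = params.get("hib_dev")
    sectors = params.get("hib_sectors")
    if dev is None or sectors is None:
        return None

    start, sep, count = sectors.partition("+")
    if not sep or not start.isdigit() or not count.isdigit():
        return None
    return dev, int(start), int(count)


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

The kexec'd kernel would then be loaded with the same hypothetical
parameters passed via --append, and the writer would call
parse_hibernation_target(read_cmdline()). Note that this says nothing about
how an encrypted swap mapping would be set up in that kernel.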

I hope you take into account encrypted swap configuration.
Currently all three suspend implementations support using encrypted
swap in order to suspend/resume.
A configuration which forces the user to remap encryption on the kexec
kernel during suspend is not valid.

Best Regards,
Alon Bar-Lev.


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Nigel Cunningham
Hi.

On Saturday 22 September 2007 09:19:18 Kyle Moffett wrote:
> I think that in order for this to work, there would need to be some  
> ABI whereby the resume-ing kernel can pass its entire ACPI state and  
> a bunch of other ACPI-related device details to the resume-ed kernel,  
> which I believe it does not do at the moment.  I believe that what  
> causes problems is the ACPI state data that the kernel stores is  
> *different* between identical sequential boots, especially when you  
> add/remove/replace batteries, AC, etc.

That's certainly possible. We already pass a very small amount of data between 
the boot and resuming kernels at the moment, and it's done quite simply - by 
putting the variables we want to 'transfer' in a nosave page/section. I could 
conceive of a scheme wherein this was extended for driver data. Since the 
memory needed would depend on the drivers loaded, it would probably require 
that the space be allocated when hibernating, and the locations of structures 
be stored in the image header and then drivers notified of the locations to 
use when preparing to resume, but it could work...
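
A rough sketch of what storing "the locations of structures" in the image
header might look like is given here. It is an illustration only: the
record layout, field widths and driver names are invented for the example
and do not correspond to the actual hibernation image header.

```python
import struct

# Hypothetical layout (NOT the real image header): a 4-byte entry count,
# followed by one fixed-size record per driver.
COUNT_FMT = "<I"
ENTRY_FMT = "<16sQI"  # driver name (NUL-padded), location, length in bytes


def pack_driver_table(entries):
    """Serialize (name, location, length) tuples at hibernation time."""
    blob = struct.pack(COUNT_FMT, len(entries))
    for name, location, length in entries:
        blob += struct.pack(ENTRY_FMT, name.encode("ascii"), location, length)
    return blob


def unpack_driver_table(blob):
    """Recover the table on resume so each driver can be told where
    its preserved data lives."""
    (count,) = struct.unpack_from(COUNT_FMT, blob, 0)
    offset = struct.calcsize(COUNT_FMT)
    entries = []
    for _ in range(count):
        raw_name, location, length = struct.unpack_from(ENTRY_FMT, blob, offset)
        entries.append((raw_name.rstrip(b"\0").decode("ascii"), location, length))
        offset += struct.calcsize(ENTRY_FMT)
    return entries
```

Because the number of records depends on which drivers are loaded, the
table size is only known when hibernating, which matches the point above
that the space would have to be allocated at that time.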
 
> Since we currently throw away most of that in-kernel ACPI interpreter  
> state data when we load the to-be-resumed image and replace it with  
> the state from the previous boot it looks to the ACPI code and  
> firmware like our system's hardware magically changed behind its  
> back.  The result is that the ACPI and firmware code is justifiably  
> confused (although probably it should be more idempotent to begin  
> with).  There's 2 potential solutions:
>   1) Formalize and copy a *lot* of ACPI state from the resume-ing
>      kernel to the resume-ed kernel.
>   2) Properly call the ACPI S4 methods in the proper order

... that said, I don't think the above should be necessary in most cases. I 
believe we're already calling the ACPI S4 methods in the proper order. If I 
understood correctly, Rafael put a lot of effort into learning what that was, 
and into ensuring it does get done.
 
> Neither one is particularly easy or particularly pleasant, especially  
> given all the vendor bugs in this general area.  Theoretically we  
> should be able to do both, since one will be more reliable than the  
> other on different systems depending on what kinds of firmware bugs  
> they have.

Regards,

Nigel


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Kyle Moffett

On Sep 21, 2007, at 17:16:59, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> The ACPI platform firmware is allowed to preserve information
>> across the hibernation-resume cycle, so this need not be the same.
>
> All of my comments related to the case where S4 is not being used
> (instead the system is just powered off normally), and a boot
> kernel that does not initialize ACPI is used.  In that case, the
> ACPI platform firmware should not be able to distinguish a normal
> boot from a resume from hibernation.

I think that in order for this to work, there would need to be some
ABI whereby the resume-ing kernel can pass its entire ACPI state and
a bunch of other ACPI-related device details to the resume-ed kernel,
which I believe it does not do at the moment.  I believe that what
causes problems is the ACPI state data that the kernel stores is
*different* between identical sequential boots, especially when you
add/remove/replace batteries, AC, etc.

Since we currently throw away most of that in-kernel ACPI interpreter
state data when we load the to-be-resumed image and replace it with
the state from the previous boot it looks to the ACPI code and
firmware like our system's hardware magically changed behind its
back.  The result is that the ACPI and firmware code is justifiably
confused (although probably it should be more idempotent to begin
with).  There's 2 potential solutions:
  1) Formalize and copy a *lot* of ACPI state from the resume-ing
     kernel to the resume-ed kernel.
  2) Properly call the ACPI S4 methods in the proper order

Neither one is particularly easy or particularly pleasant, especially  
given all the vendor bugs in this general area.  Theoretically we  
should be able to do both, since one will be more reliable than the  
other on different systems depending on what kinds of firmware bugs  
they have.


Cheers,
Kyle Moffett



Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 23:08, Jeremy Maitin-Shepard wrote:
>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> 
>> > On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
>> >> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> >> 
>> >> [snip]
>> >> 
>> >> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
>> >> > On x86_64 we don't save any memory areas marked as reserved and yet the
>> >> > above happens.
>> >> 
>> >> I think you have mentioned before, though, that ACPI is first
>> >> initialized by the boot kernel, before it is later initialized by
>> >> resuming kernel.  This could well be the source of the problem.
>> 
>> > No, it's not.  I have tested that too with an ACPI-less boot kernel.
>> 
>> Well, it seems that there just must be some other bug.  I would define
>> anything that differs between the post-resume initialization of ACPI

> I'm not sure what you mean.

>> from the normal boot initialization of ACPI as a bug.  If the interaction
>> with the hardware is the same, then the behavior will be the same.

> The ACPI platform firmware is allowed to preserve information across the
> hibernation-resume cycle, so this need not be the same.

All of my comments related to the case where S4 is not being used
(instead the system is just powered off normally), and a boot kernel
that does not initialize ACPI is used.  In that case, the ACPI platform
firmware should not be able to distinguish a normal boot from a resume
from hibernation.

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 23:08, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> 
> > On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
> >> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> >> 
> >> [snip]
> >> 
> >> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
> >> > On x86_64 we don't save any memory areas marked as reserved and yet the
> >> > above happens.
> >> 
> >> I think you have mentioned before, though, that ACPI is first
> >> initialized by the boot kernel, before it is later initialized by
> >> resuming kernel.  This could well be the source of the problem.
> 
> > No, it's not.  I have tested that too with an ACPI-less boot kernel.
> 
> Well, it seems that there just must be some other bug.  I would define
> anything that differs between the post-resume initialization of ACPI

I'm not sure what you mean.

> from the normal boot initialization of ACPI as a bug.  If the interaction
> with the hardware is the same, then the behavior will be the same.

The ACPI platform firmware is allowed to preserve information across the
hibernation-resume cycle, so this need not be the same.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
>> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
>> 
>> [snip]
>> 
>> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
>> > On x86_64 we don't save any memory areas marked as reserved and yet the
>> > above happens.
>> 
>> I think you have mentioned before, though, that ACPI is first
>> initialized by the boot kernel, before it is later initialized by
>> resuming kernel.  This could well be the source of the problem.

> No, it's not.  I have tested that too with an ACPI-less boot kernel.

Well, it seems that there just must be some other bug.  I would define
anything that differs between the post-resume initialization of ACPI from
the normal boot initialization of ACPI as a bug.  If the interaction
with the hardware is the same, then the behavior will be the same.

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> 
> [snip]
> 
> > The ACPI NVS area is explicitly marked as reserved and we don't save it.
> > On x86_64 we don't save any memory areas marked as reserved and yet the
> > above happens.
> 
> I think you have mentioned before, though, that ACPI is first
> initialized by the boot kernel, before it is later initialized by
> resuming kernel.  This could well be the source of the problem.

No, it's not.  I have tested that too with an ACPI-less boot kernel.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

[snip]

> The ACPI NVS area is explicitly marked as reserved and we don't save it.
> On x86_64 we don't save any memory areas marked as reserved and yet the above
> happens.

I think you have mentioned before, though, that ACPI is first
initialized by the boot kernel, before it is later initialized by
resuming kernel.  This could well be the source of the problem.

In particular, isn't it the case that you also switch the devices to low
power mode before resuming?

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 21:45, Alan Stern wrote:
> On Fri, 21 Sep 2007, Rafael J. Wysocki wrote:
> 
> > > > > Well, the problem is that apparently some systems (eg. my HP nx6325)
> > > > > expect us to execute the _PTS ACPI global control method before
> > > > > creating the image _and_ to execute
> > > > > acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
> > > > > system into the sleep state.  In particular, on nx6325, if we don't do
> > > > > that, then after the restore the status of the AC power will not be
> > > > > reported correctly (and if you replace the battery while in the sleep
> > > > > state, the battery status will not be updated correctly after the
> > > > > restore).  Similar issues have been reported for other machines.
> > > 
> > > Suppose that instead of using ACPI S4 state at all, you instead just
> > > power off.  Yes, you'll lose wakeup event functionality, and flashy
> > > LEDs, but doesn't this take care of the problem?
> > 
> > Nope.
> > 
> > > The firmware shouldn't see the hibernate as anything other than a shutdown
> > > and reboot.
> > 
> > Actually, this assumption is apparently wrong.
> 
> One gets the impression that the hibernation image includes a memory 
> area used by the firmware.  That could explain why devices need to be 
> in a low-power state when the image is created -- so that when the 
> image is restored, the firmware doesn't get confused about the device 
> states.
> 
> It would also explain why the firmware sees
> resume-from-power-off-hibernation as different from a regular reboot:
> because its data area gets overwritten as part of the resume.
> 
> In reality it's probably more complicated than this, with weird 
> interactions between the firmware and the various ACPI methods.  
> Nevertheless, the main idea seems valid.

I guess so, but I'm not sure.

The ACPI NVS area is explicitly marked as reserved and we don't save it.
On x86_64 we don't save any memory areas marked as reserved and yet the above
happens.
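
As a rough illustration of the point above, the following user-space sketch
shows the kind of filter being described: only addresses in usable RAM go into
the image, while reserved and ACPI NVS ranges are skipped.  The region type
values follow the x86 E820 numbering; the struct, the helper name
addr_is_saveable() and the example map are made up for this sketch and are not
the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Region types, numbered as in the x86 E820 firmware memory map. */
enum region_type {
	REGION_RAM      = 1,	/* usable RAM */
	REGION_RESERVED = 2,	/* reserved by the firmware */
	REGION_ACPI     = 3,	/* ACPI tables */
	REGION_NVS      = 4,	/* ACPI non-volatile storage */
};

struct region {
	uint64_t start;		/* first byte of the region */
	uint64_t size;		/* length in bytes */
	enum region_type type;
};

/*
 * Hypothetical helper: an address is saved only if it lies inside a region
 * of usable RAM.  Reserved, ACPI and NVS regions are skipped, and so is
 * anything the map does not describe at all (an assumption of this sketch).
 */
static bool addr_is_saveable(const struct region *map, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= map[i].start && addr - map[i].start < map[i].size)
			return map[i].type == REGION_RAM;
	}
	return false;
}

/* Made-up memory map, for illustration only. */
static const struct region example_map[] = {
	{ 0x00000000, 0x0009f000, REGION_RAM },
	{ 0x0009f000, 0x00001000, REGION_RESERVED },
	{ 0x00100000, 0x3fe00000, REGION_RAM },
	{ 0x3ff00000, 0x00010000, REGION_NVS },
};

#define EXAMPLE_MAP_LEN (sizeof(example_map) / sizeof(example_map[0]))
```

With such a filter the NVS range never reaches the image, which is why an
image that overwrites the firmware's data area cannot be the whole story.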

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Alan Stern
On Fri, 21 Sep 2007, Rafael J. Wysocki wrote:

> > > Well, the problem is that apparently some systems (eg. my HP nx6325)
> > > expect us to execute the _PTS ACPI global control method before creating
> > > the image _and_ to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order
> > > to finally put the system into the sleep state.  In particular, on nx6325,
> > > if we don't do that, then after the restore the status of the AC power
> > > will not be reported correctly (and if you replace the battery while in
> > > the sleep state, the battery status will not be updated correctly after
> > > the restore).  Similar issues have been reported for other machines.
> > 
> > Suppose that instead of using ACPI S4 state at all, you instead just
> > power off.  Yes, you'll lose wakeup event functionality, and flashy
> > LEDs, but doesn't this take care of the problem?
> 
> Nope.
> 
> > The firmware shouldn't see the hibernate as anything other than a shutdown
> > and reboot.
> 
> Actually, this assumption is apparently wrong.

One gets the impression that the hibernation image includes a memory 
area used by the firmware.  That could explain why devices need to be 
in a low-power state when the image is created -- so that when the 
image is restored, the firmware doesn't get confused about the device 
states.

It would also explain why the firmware sees
resume-from-power-off-hibernation as different from a regular reboot:
because its data area gets overwritten as part of the resume.

In reality it's probably more complicated than this, with weird 
interactions between the firmware and the various ACPI methods.  
Nevertheless, the main idea seems valid.

Alan Stern



Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> On Friday, 21 September 2007 15:14, huang ying wrote:
>> On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>> > On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
>> > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> [--snip--]
>> > >
>> > > No one has yet attacked the hard problem of coming up with separate
>> > > hibernate methods for drivers.
>> >
>> > Well, I've been playing a bit with that for some time, but it's not easy by
>> > any means.
>> >
>> > In short, I'm seeing some problems related to the handling of ACPI that
>> > seem to shatter the entire idea of having separate hibernate methods, at
>> > least as far as ACPI systems are concerned.
>> 
>> So sadly to hear this. Can you details it a little? Or a link?

> Well, the problem is that apparently some systems (eg. my HP nx6325) expect us
> to execute the _PTS ACPI global control method before creating the image _and_
> to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
> system into the sleep state.  In particular, on nx6325, if we don't do that,
> then after the restore the status of the AC power will not be reported
> correctly (and if you replace the battery while in the sleep state, the
> battery status will not be updated correctly after the restore).  Similar
> issues have been reported for other machines.

Suppose that instead of using ACPI S4 state at all, you instead just
power off.  Yes, you'll lose wakeup event functionality, and flashy
LEDs, but doesn't this take care of the problem?  The firmware shouldn't
see the hibernate as anything other than a shutdown and reboot.  ACPI
should be initialized normally when resuming, which should take care of
getting AC power status reported properly.

This should be the behavior, anyway, on the many systems that do not
support S4.

> Now, the ACPI specification requires us to put devices into low power states
> before executing _PTS and that's exactly what we're doing before a suspend to
> RAM.  Thus, it seems that in general we need to do the same for hibernation on
> ACPI systems.

It seems that if ACPI S4 is going to be used, switching to low power
state is something that should be done only immediately before entering
that state (i.e. after the image has already been saved).  In
particular, it should not be done just before the atomic copy.  It is
true that (during resume) after the atomic copy snapshot is restored,
drivers will need to be prepared (i.e. have saved whatever information
is necessary) to _resume_ devices from the low power state, but that
does not mean they have to actually be put into that low power state
before the copy is made.

I agree that for the kexec implementation there may be additional
issues.  For swsusp, uswsusp, and tuxonice, though, I don't see why
there should be a problem.  I think that, as was recognized before, all
of the issues are resolved by properly considering exactly what each
callback should do and when it should be called.  The problems stem from
ambiguous specifications, or trying to use the same callback for two
different purposes or in two different cases.

Let me know if I'm mistaken.

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 20:11, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> 
> > On Friday, 21 September 2007 15:14, huang ying wrote:
> >> On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> >> > On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
> >> > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> > [--snip--]
> >> > >
> >> > > No one has yet attacked the hard problem of coming up with separate
> >> > > hibernate methods for drivers.
> >> >
> >> > Well, I've been playing a bit with that for some time, but it's not easy
> >> > by any means.
> >> >
> >> > In short, I'm seeing some problems related to the handling of ACPI that
> >> > seem to shatter the entire idea of having separate hibernate methods, at
> >> > least as far as ACPI systems are concerned.
> >> 
> >> So sadly to hear this. Can you details it a little? Or a link?
> 
> > > Well, the problem is that apparently some systems (eg. my HP nx6325)
> > > expect us to execute the _PTS ACPI global control method before creating
> > > the image _and_ to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order
> > > to finally put the system into the sleep state.  In particular, on nx6325,
> > > if we don't do that, then after the restore the status of the AC power
> > > will not be reported correctly (and if you replace the battery while in
> > > the sleep state, the battery status will not be updated correctly after
> > > the restore).  Similar issues have been reported for other machines.
> 
> Suppose that instead of using ACPI S4 state at all, you instead just
> power off.  Yes, you'll lose wakeup event functionality, and flashy
> LEDs, but doesn't this take care of the problem?

Nope.

> The firmware shouldn't see the hibernate as anything other than a shutdown
> and reboot.

Actually, this assumption is apparently wrong.

> ACPI should be initialized normally when resuming, which should take care of
> getting AC power status reported properly.

Well, that doesn't work.  I've tested it, really. :-)

> This should be the behavior, anyway, on the many systems that do not
> support S4.
> 
> > > Now, the ACPI specification requires us to put devices into low power
> > > states before executing _PTS and that's exactly what we're doing before a
> > > suspend to RAM.  Thus, it seems that in general we need to do the same for
> > > hibernation on ACPI systems.
> 
> It seems that if ACPI S4 is going to be used, switching to low power
> state is something that should be done only immediately before entering
> that state (i.e. after the image has already been saved).

Doesn't.  Work.

> In particular, it should not be done just before the atomic copy.  It is
> true that (during resume) after the atomic copy snapshot is restored,
> drivers will need to be prepared (i.e. have saved whatever information
> is necessary) to _resume_ devices from the low power state, but that
> does not mean they have to actually be put into that low power state
> before the copy is made.
> 
> I agree that for the kexec implementation there may be additional
> issues.  For swsusp, uswsusp, and tuxonice, though, I don't see why
> there should be a problem.  I think that, as was recognized before, all
> of the issues are resolved by properly considering exactly what each
> callback should do and when it should be called.  The problems stem from
> ambiguous specifications, or trying to use the same callback for two
> different purposes or in two different cases.
> 
> Let me know if I'm mistaken.

See above. :-)

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 17:02, huang ying wrote:
> On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > On Friday, 21 September 2007 15:14, huang ying wrote:
> > > On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > > On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
> > > > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> > [--snip--]
> > > > >
> > > > > No one has yet attacked the hard problem of coming up with separate
> > > > > hibernate methods for drivers.
> > > >
> > > > > Well, I've been playing a bit with that for some time, but it's not
> > > > > easy by any means.
> > > > >
> > > > > In short, I'm seeing some problems related to the handling of ACPI
> > > > > that seem to shatter the entire idea of having separate hibernate
> > > > > methods, at least as far as ACPI systems are concerned.
> > >
> > > So sadly to hear this. Can you details it a little? Or a link?
> >
> > > Well, the problem is that apparently some systems (eg. my HP nx6325)
> > > expect us to execute the _PTS ACPI global control method before creating
> > > the image _and_ to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order
> > > to finally put the system into the sleep state.  In particular, on nx6325,
> > > if we don't do that, then after the restore the status of the AC power
> > > will not be reported correctly (and if you replace the battery while in
> > > the sleep state, the battery status will not be updated correctly after
> > > the restore).  Similar issues have been reported for other machines.
> > >
> > > Now, the ACPI specification requires us to put devices into low power
> > > states before executing _PTS and that's exactly what we're doing before a
> > > suspend to RAM.  Thus, it seems that in general we need to do the same for
> > > hibernation on ACPI systems.
> 
> Then, is it possible to separate device quiesce from device suspend.

It surely is possible, but I'm not sure if it's going to be useful.

I mean, if we need to do exactly the same thing before a suspend to RAM and
before a hibernation (ie. to put devices into low power states), why would we
want to use different methods for that in both cases?

> Perhaps not for swsusp, but for kexec based hibernation?

Frankly, I don't know.

Generally, changing the way in which device drivers handle suspend (to RAM)
and hibernation is a huge task.  After considering this issue for some time
I think that we really should start from hardening suspend (to RAM) so that it
doesn't need the freezer any more, because _that_ would require us to change
the suspend-related drivers' callbacks anyway.

When we are sure how we are going to eliminate the freezer from suspend
(to RAM), we'll know how that affects hibernation and what to do about it.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread huang ying
On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> On Friday, 21 September 2007 15:14, huang ying wrote:
> > On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
> > > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> [--snip--]
> > > >
> > > > No one has yet attacked the hard problem of coming up with separate
> > > > hibernate methods for drivers.
> > >
> > > > Well, I've been playing a bit with that for some time, but it's not easy
> > > > by any means.
> > > >
> > > > In short, I'm seeing some problems related to the handling of ACPI that
> > > > seem to shatter the entire idea of having separate hibernate methods, at
> > > > least as far as ACPI systems are concerned.
> >
> > So sadly to hear this. Can you details it a little? Or a link?
>
> Well, the problem is that apparently some systems (eg. my HP nx6325) expect us
> to execute the _PTS ACPI global control method before creating the image _and_
> to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
> system into the sleep state.  In particular, on nx6325, if we don't do that,
> then after the restore the status of the AC power will not be reported
> correctly (and if you replace the battery while in the sleep state, the
> battery status will not be updated correctly after the restore).  Similar
> issues have been reported for other machines.
>
> Now, the ACPI specification requires us to put devices into low power states
> before executing _PTS and that's exactly what we're doing before a suspend to
> RAM.  Thus, it seems that in general we need to do the same for hibernation on
> ACPI systems.

Then, is it possible to separate device quiesce from device suspend?
Perhaps not for swsusp, but for kexec based hibernation?

Best Regards,
Huang Ying


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 15:14, huang ying wrote:
> On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
> > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
[--snip--]
> > >
> > > No one has yet attacked the hard problem of coming up with separate
> > > hibernate methods for drivers.
> >
> > > Well, I've been playing a bit with that for some time, but it's not easy
> > > by any means.
> > >
> > > In short, I'm seeing some problems related to the handling of ACPI that
> > > seem to shatter the entire idea of having separate hibernate methods, at
> > > least as far as ACPI systems are concerned.
> 
> So sadly to hear this. Can you details it a little? Or a link?

Well, the problem is that apparently some systems (eg. my HP nx6325) expect us
to execute the _PTS ACPI global control method before creating the image _and_
to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
system into the sleep state.  In particular, on nx6325, if we don't do that,
then after the restore the status of the AC power will not be reported
correctly (and if you replace the battery while in the sleep state, the
battery status will not be updated correctly after the restore).  Similar
issues have been reported for other machines.

Now, the ACPI specification requires us to put devices into low power states
before executing _PTS and that's exactly what we're doing before a suspend to
RAM.  Thus, it seems that in general we need to do the same for hibernation on
ACPI systems.
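
To make the ordering constraint concrete, here is a minimal user-space sketch
of the sequence described above.  It only records the steps and checks their
order; the step names, the trace structure and both functions are hypothetical
stand-ins, not kernel interfaces, and writing the image to storage (which
happens between creating the image and entering S4) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* The four steps discussed above, in the order the platform expects. */
enum hib_step {
	HIB_DEVICES_LOW_POWER,	/* put devices into low power states */
	HIB_RUN_PTS,		/* execute the _PTS control method */
	HIB_CREATE_IMAGE,	/* create the hibernation image */
	HIB_ENTER_S4,		/* acpi_enter_sleep_state(ACPI_STATE_S4) */
	HIB_STEP_COUNT
};

#define MAX_STEPS 8

struct hib_trace {
	enum hib_step steps[MAX_STEPS];
	size_t count;
};

/* Append one step to the trace (extra steps beyond MAX_STEPS are dropped). */
static void trace_step(struct hib_trace *t, enum hib_step s)
{
	if (t->count < MAX_STEPS)
		t->steps[t->count++] = s;
}

/* Stand-in for the hibernation path: it only records the order of steps. */
static void hibernate_acpi_order(struct hib_trace *t)
{
	trace_step(t, HIB_DEVICES_LOW_POWER);
	trace_step(t, HIB_RUN_PTS);
	trace_step(t, HIB_CREATE_IMAGE);
	trace_step(t, HIB_ENTER_S4);
}

/* Return 1 if every step is present and first occurs in the expected order. */
static int order_is_valid(const struct hib_trace *t)
{
	long pos[HIB_STEP_COUNT];

	for (int s = 0; s < HIB_STEP_COUNT; s++)
		pos[s] = -1;
	for (size_t i = 0; i < t->count; i++) {
		if (pos[t->steps[i]] < 0)
			pos[t->steps[i]] = (long)i;
	}
	for (int s = 0; s < HIB_STEP_COUNT; s++) {
		if (pos[s] < 0)
			return 0;
		if (s > 0 && pos[s - 1] >= pos[s])
			return 0;
	}
	return 1;
}
```

A trace that creates the image before the devices are in low power states is
rejected by order_is_valid(), which is the case that misbehaves on the nx6325.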

Greetings,
Rafael


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread huang ying
On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> Hi Andrew,
>
> On Friday, 21 September 2007 03:41, Andrew Morton wrote:
> > On Fri, 21 Sep 2007 11:19:59 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> >
> > > Hi.
> > >
> > > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi Andrew.
> > > > >
> > > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > > Seems like good enough for -mm to me.
> > > > > >
> > > > > > 
> > > > > >   Pavel
> > > > >
> > > > > > Andrew, if I recall correctly, you said a while ago that you didn't
> > > > > > want another hibernation implementation in the vanilla kernel. If
> > > > > > you're going to consider merging this kexec code, will you also
> > > > > > please consider merging TuxOnIce?
> > > > >
> > > >
> > > > The theory is that kexec-based hibernation will mainly use preexisting
> > > > kexec code and will permit us to delete the existing hibernation
> > > > implementation.
> > > >
> > > > That's different from replacing it.
> > >
> > > TuxOnIce doesn't remove the existing implementation either. It can
> > > transparently replace it, but you can enable/disable that at compile time.
> >
> > Right.  So we end up with two implementations in-tree.  Whereas
> > kexec-based-hibernation leads us to having zero implementations in-tree.
>
> Well, I don't quite agree.
>
> For now, the kexec-based approach is missing the handling of devices, AFAICS.
> Namely, it's quite easy to snapshot memory with the help of kexec, but the
> state of devices gets trashed in the process, so you need some additional code
> saving the state of devices for you, executed before the kexec.
>
> Moreover, on ACPI systems the transition to the S4 sleep state and back to S0
> (working state) is more complicated than a system checkpointing, because we
> are supposed to take the platform firmware into consideration in that case.
> The more I think about this, the more it seems to me that it just can't be
> done on top of kexec in a reasonable fashion.  Of course, we could avoid
> handling the ACPI S4, but that would leave some people (including me ;-))
> with semi-working hardware after the "restore".  I don't think that's
> generally acceptable in the long run.
>
> IMHO, for ACPI systems the way to go is to harden suspend to RAM (with s2ram
> in place and the graphics adapters specifications from Intel and AMD released
> we are in a good position to do that) and build the S4 transition mechanism
> on top of that.  It can be done easily by adapting the current hibernation
> code, but not on top of kexec (I'm afraid).

Yes. ACPI is the biggest issue of kexec based hibernation now. I will
try to work on that. At least I can prove whether kexec based
hibernation is possible with ACPI.

> [Besides, the current hibernation userland interface is used by default by
> openSUSE and it's also used by quite some Debian users, so we can't drop
> it overnight and it can't be implemented in a compatible way on top of the
> kexec-based solution.]

Best Regards,
Huang Ying


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread huang ying
On 9/21/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
> > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> > >
> > > That's not true. Kexec will itself be an implementation, otherwise you'd
> > > end up with people screaming about no hibernation support.
> >
> > There needs to be an implementation of hibernation based on kexec with
> > return yes.
> >
> > > And it won't result in
> > > the complete removal of the existing hibernation code from the kernel. At
> > > the very least, it's going to want the kernel being hibernated to have an
> > > interface by which it can find out which pages need to be saved.
> >
> > That interface should be running kernel -> user space -> target kernel.
> > Not direct kernel to kernel.
> >
> > > I wouldn't
> > > be surprised if it also ends up with an interface in which the kernel
> > > being hibernated tells it what bdev/sectors in which to save the image as
> > > well (otherwise you're going to need a dedicated, otherwise untouched
> > > partition exclusively for the kexec'd kernel to use), or what network
> > > settings to use if it wants to try to save the image to a network storage
> > > device.
> >
> > initramfs.  We already seem to have that interface.  And distros
> > seem to do a pretty decent job of using it to configure systems.
> >
> > > On top of
> > > that, there are all the issues related to device reinitialisation and so
> > > on,
> >
> > Yes.
> >
> > > and it looks like there's greatly increased pain for users wanting to
> > > configure this new implementation.
> >
> > Not to be callous but that really is a user space and distro issue.
> >
> > > Kexec is by no means proven to be the panacea for all the issues.
> >
> > I agree.  I'm still not quite convinced it will do a satisfactory job.
> > But I think it does make sense to implement a general kexec with
> > return and see if that can reasonably be used for handling hibernation
> > issues.  If done cleanly and with care the implementation won't be
> > hibernation specific.
>
> Yes, and that's worth doing anyway, IMO.
>
> > Frankly this looks like the best way I can see to implement a general
> > mechanism for calling silly firmware/BIOS/EFI services after we
> > have a kernel up and running.  It's a little bit like allowing
> > X to call iopl(3) and do inb/outb directly.
> >
> > The configuration issues you raise pretty much exist for kexec on
> > panic, and they seem to be being resolved for that case in a
> > reasonable way.  I do agree that the current kexec+return effort seems
> > to be one of those unfortunate cases where we give every mechanism in
> > the kernel to do something in user space and then no one actually
> > implements the user space.  That doesn't do any one any good.
> >
> > For hibernation we don't have the absolute need to step outside of the
> > current kernel that we do in the kexec on panic approach.  However we
> > have this practical fight about mechanism and policy, and kexec with
> > return has this seductive allure that it appears to be the minimal
> > necessary mechanism in the kernel.
> >
> > No one has yet attacked the hard problem of coming up with separate
> > hibernate methods for drivers.
>
> Well, I've been playing a bit with that for some time, but it's not easy by
> any means.
>
> In short, I'm seeing some problems related to the handling of ACPI that seem
> to shatter the entire idea of having separate hibernate methods, at least as
> far as ACPI systems are concerned.

Sorry to hear this. Can you give some details? Or a link?

Best Regards,
Huang Ying


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 22:18:19 Rafael J. Wysocki wrote:
> On Friday, 21 September 2007 13:58, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Friday 21 September 2007 21:56:29 Rafael J. Wysocki wrote:
> > > [Besides, the current hibernation userland interface is used by default by
> > > openSUSE and it's also used by quite some Debian users, so we can't drop
> > > it overnight and it can't be implemented in a compatible way on top of the
> > > kexec-based solution.]
> > 
> > Could it be fudged by giving userland a null image and having (say) the first
> > ioctl be one that triggers all the real work (with other ioctls being noops
> > or such like, as appropriate)?
> 
> Well, the "suspend" part is probably doable, but I'm afraid of the "resume"
> one.

'k. I've occasionally thought about trying it, but haven't ever gotten around 
to actually doing it yet. (I'd like to make TuxOnIce transparently replace 
both swsusp and uswsusp if I could).

Regards,

Nigel


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 13:58, Nigel Cunningham wrote:
> Hi.
> 
> On Friday 21 September 2007 21:56:29 Rafael J. Wysocki wrote:
> > [Besides, the current hibernation userland interface is used by default by
> > openSUSE and it's also used by quite some Debian users, so we can't drop
> > it overnight and it can't be implemented in a compatible way on top of the
> > kexec-based solution.]
> 
> Could it be fudged by giving userland a null image and having (say) the first 
> ioctl be one that triggers all the real work (with other ioctls being noops 
> or such like, as appropriate)?

Well, the "suspend" part is probably doable, but I'm afraid of the "resume"
one.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 11:49, Pavel Machek wrote:
> Hi!
> 
> > > Seems like good enough for -mm to me.
> 
> (For the record, I do not think this is going to be a
> hibernation replacement any time soon. But it is functionality useful
> for other stuff -- dump memory and continue -- and yes it may be able
> to do hibernation in the long term.
> 
> It really comes from the other side of reliability:
> 
> * swsusp is "if your kernel is perfectly healthy, it will work"
> 
> while this, coming from kdump is
> 
> * "if your kernel is not completely trashed, it should work"
> 
> ...which is why we can't use swsusp to do dump memory and continue -- you
> want to do dumps on "slightly broken" systems. And yes, as a
> side effect it may be able to do hibernation... why not, let's see how
> it works out).

I generally agree. :-)

Greetings,
Rafael


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 21:56:29 Rafael J. Wysocki wrote:
> [Besides, the current hibernation userland interface is used by default by
> openSUSE and it's also used by quite some Debian users, so we can't drop
> it overnight and it can't be implemented in a compatible way on top of the
> kexec-based solution.]

Could it be fudged by giving userland a null image and having (say) the first 
ioctl be one that triggers all the real work (with other ioctls being noops 
or such like, as appropriate)?

Regards,

Nigel


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
> Nigel Cunningham <[EMAIL PROTECTED]> writes:
> >
> > That's not true. Kexec will itself be an implementation, otherwise you'd end
> > up with people screaming about no hibernation support. 
> 
> There needs to be an implementation of hibernation based on kexec with
> return yes.
> 
> > And it won't result in 
> > the complete removal of the existing hibernation code from the kernel. At the
> > very least, it's going to want the kernel being hibernated to have an 
> > interface by which it can find out which pages need to be saved.
> 
> That interface should be running kernel -> user space -> target kernel.
> Not direct kernel to kernel.
> 
> > I wouldn't 
> > be surprised if it also ends up with an interface in which the kernel being 
> > hibernated tells it what bdev/sectors in which to save the image as well 
> > (otherwise you're going to need a dedicated, otherwise untouched partition 
> > exclusively for the kexec'd kernel to use), or what network settings to use 
> > if it wants to try to save the image to a network storage device. 
> 
> initramfs.  We already seem to have that interface.  And distros
> seem to do a pretty decent job of using it to configure systems.
> 
> > On top of 
> > that, there are all the issues related to device reinitialisation and so on, 
> 
> Yes.
> 
> > and it looks like there's greatly increased pain for users wanting to 
> > configure this new implementation. 
> 
> Not to be callous but that really is a user space and distro issue.
> 
> > Kexec is by no means proven to be the panacea for all the issues.
> 
> I agree.  I'm still not quite convinced it will do a satisfactory job.
> But I think it does make sense to implement a general kexec with
> return and see if that can reasonably be used for handling hibernation
> issues.  If done cleanly and with care the implementation won't be
> hibernation specific.

Yes, and that's worth doing anyway, IMO.

> Frankly this looks like the best way I can see to implement a general
> mechanism for calling silly firmware/BIOS/EFI services after we
> have a kernel up and running.  It's a little bit like allowing
> X to call iopl(3) and do inb/outb directly.
> 
> The configuration issues you raise pretty much exist for kexec on
> panic, and they seem to be being resolved for that case in a
> reasonable way.  I do agree that the current kexec+return effort seems
> to be one of those unfortunate cases where we give every mechanism in
> the kernel to do something in user space and then no one actually
> implements the user space.  That doesn't do any one any good.
> 
> For hibernation we don't have the absolute need to step outside of the
> current kernel that we do in the kexec on panic approach.  However we
> have this practical fight about mechanism and policy, and kexec with
> return has this seductive allure that it appears to be the minimal
> necessary mechanism in the kernel.
> 
> No one has yet attacked the hard problem of coming up with separate
> hibernate methods for drivers.

Well, I've been playing a bit with that for some time, but it's not easy by any
means.

In short, I'm seeing some problems related to the handling of ACPI that seem to
shatter the entire idea of having separate hibernate methods, at least as far
as ACPI systems are concerned.

Greetings,
Rafael


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
Hi Andrew,

On Friday, 21 September 2007 03:41, Andrew Morton wrote:
> On Fri, 21 Sep 2007 11:19:59 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> 
> > Hi.
> > 
> > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Hi Andrew.
> > > > 
> > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > Seems like good enough for -mm to me.
> > > > > 
> > > > >   
> > > > > Pavel
> > > > 
> > > > > Andrew, if I recall correctly, you said a while ago that you didn't want 
> > > > > another hibernation implementation in the vanilla kernel. If you're going to 
> > > > > consider merging this kexec code, will you also please consider merging 
> > > > > TuxOnIce?
> > > > 
> > > 
> > > The theory is that kexec-based hibernation will mainly use preexisting
> > > kexec code and will permit us to delete the existing hibernation
> > > implementation.
> > > 
> > > That's different from replacing it.
> > 
> > TuxOnIce doesn't remove the existing implementation either. It can 
> > transparently replace it, but you can enable/disable that at compile time.
> 
> Right.  So we end up with two implementations in-tree.  Whereas
> kexec-based-hibernation leads us to having zero implementations in-tree.

Well, I don't quite agree.

For now, the kexec-based approach is missing the handling of devices, AFAICS.
Namely, it's quite easy to snapshot memory with the help of kexec, but the
state of devices gets trashed in the process, so you need some additional code
saving the state of devices for you, executed before the kexec.

Moreover, on ACPI systems the transition to the S4 sleep state and back to S0
(working state) is more complicated than system checkpointing, because we
are supposed to take the platform firmware into consideration in that case.
The more I think about this, the more it seems to me that it just can't be done
on top of kexec in a reasonable fashion.  Of course, we could avoid handling
the ACPI S4, but that would leave some people (including me ;-)) with
semi-working hardware after the "restore".  I don't think that's generally
acceptable in the long run.

IMHO, for ACPI systems the way to go is to harden suspend to RAM (with s2ram
in place and the graphics adapters specifications from Intel and AMD released
we are in a good position to do that) and build the S4 transition mechanism
on top of that.  It can be done easily by adapting the current hibernation
code, but not on top of kexec (I'm afraid).

[Besides, the current hibernation userland interface is used by default by
openSUSE and it's also used by quite some Debian users, so we can't drop
it overnight and it can't be implemented in a compatible way on top of the
kexec-based solution.]


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Pavel Machek
Hi!

> > Seems like good enough for -mm to me.

(For the record, I do not think this is going to be a
hibernation replacement any time soon. But it is functionality useful
for other stuff -- dump memory and continue -- and yes it may be able
to do hibernation in the long term.

It really comes from the other side of reliability:

* swsusp is "if your kernel is perfectly healthy, it will work"

while this, coming from kdump is

* "if your kernel is not completely trashed, it should work"

...which is why we can't use swsusp to do dump memory and continue -- you
want to do dumps on "slightly broken" systems. And yes, as a
side effect it may be able to do hibernation... why not, let's see how
it works out).

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Pavel Machek
Hi!
> >
> > Sounds doable, as long as you can cope with long command lines (which 
> > shouldn't be a biggie). (If you've got a swapfile or parts of a swap 
> > partition already in use, it can be quite fragmented).
> 
> Hmm.  This is an interesting problem.  Sharing a swap file or a swap
> partition with the actual swap of user space pages does seem to be
> a limitation of this approach.
> 
> Although the fact that it is simple to write to a separate file may
> be a reasonable compensation.

I'm not sure how you'd write it to a separate file. Notice that kjump
kernel may not mount journalling filesystems, not even
read-only. (Ext3 replays journal in that case). You could pass block
numbers from the original kernel...
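
To give a feel for what passing block numbers could look like, here is a small user-space sketch that packs a list of extents into a "start+count,start+count" string of the kind that might be put on a command line. The format, struct extent_sketch and format_extents() are invented for illustration; nothing in this thread defines such a parameter. It also shows why a fragmented swap area matters: every extra extent makes the string longer, and the function gives up when the buffer is too small.

```c
#include <stddef.h>
#include <stdio.h>

/* One run of contiguous blocks (hypothetical representation). */
struct extent_sketch {
	unsigned long long start;	/* first block of the run */
	unsigned long long count;	/* number of blocks in the run */
};

/*
 * Write "start+count,start+count,..." into buf.
 * Returns the string length on success, or -1 if buf is too small.
 */
static int format_extents(char *buf, size_t size,
			  const struct extent_sketch *ext, size_t n)
{
	size_t used = 0;

	if (size == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int len = snprintf(buf + used, size - used, "%s%llu+%llu",
				   i ? "," : "", ext[i].start, ext[i].count);

		/* snprintf reports the length it wanted; reject truncation. */
		if (len < 0 || (size_t)len >= size - used)
			return -1;
		used += (size_t)len;
	}
	return (int)used;
}
```

With two extents {2048, 16} and {8192, 4} this produces "2048+16,8192+4"; a heavily fragmented swap file would need one such entry per fragment.
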
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Huang, Ying
On Thu, 2007-09-20 at 22:01 -0600, Eric W. Biederman wrote:
> "Huang, Ying" <[EMAIL PROTECTED]> writes:
> 
> > Index: linux-2.6.23-rc6/include/linux/kexec.h
> > ===
> > > --- linux-2.6.23-rc6.orig/include/linux/kexec.h 2007-09-20 11:24:25.0 +0800
> > > +++ linux-2.6.23-rc6/include/linux/kexec.h 2007-09-20 11:26:03.0 +0800
> > > @@ -83,6 +83,7 @@
> > >  
> > >     unsigned long start;
> > >     struct page *control_code_page;
> > > +   struct page *swap_page;
> > >  
> > >     unsigned long nr_segments;
> > >     struct kexec_segment segment[KEXEC_SEGMENT_MAX];
> > @@ -194,4 +195,12 @@
> >  static inline void crash_kexec(struct pt_regs *regs) { }
> >  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
> >  #endif /* CONFIG_KEXEC */
> > +
> > +#ifdef CONFIG_KEXEC_JUMP
> > +extern int machine_kexec_jump(struct kimage *image);
> > +extern unsigned long kexec_jump_back_entry;
> > +extern int kexec_jump(void);
> > +#else /* !CONFIG_KEXEC_JUMP */
> > +static inline int kexec_jump(void) { return 0; }
> > +#endif /* CONFIG_KEXEC_JUMP */
> >  #endif /* LINUX_KEXEC_H */
> 
> > Please let the kexec_jump code just be triggered off of a flag in
> struct kimage.  We just need to define an extra flag to sys_kexec_load
> say KEXEC_RETURNS.  Ideally in the long term we would not have to
> do anything except to accept the flag.  Adding a flag makes
> a nice feature test if you want to see if your kernel supports
> the extended version of kexec.
> 
> Until we get the hibernation methods sorted out storing the flag in
> struct kimage and making the methods that we call conditional feels
> like a more maintainable interface.  Especially since we have to
> know at kexec image load time what we are going to do with the
> kexec image.

You mean we pass KEXEC_RETURNS to sys_kexec_load, then use the ordinary
reboot command LINUX_REBOOT_CMD_KEXEC, which calls kexec_jump conditionally,
based on KEXEC_RETURNS? This is reasonable. I will change it.
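
As a rough illustration of that flow (a user-space sketch, not the patch): the flag is recorded in the image at load time, and the ordinary reboot command picks the path from it. KEXEC_RETURNS_SKETCH, struct kimage_sketch and both functions are made-up stand-ins; neither the real flag value nor the layout of the kernel's struct kimage is assumed here.

```c
#include <stddef.h>

/* Hypothetical value; the thread only proposes the name KEXEC_RETURNS. */
#define KEXEC_RETURNS_SKETCH 0x1UL

/* Stand-in for struct kimage with only the fields this sketch needs. */
struct kimage_sketch {
	unsigned long flags;	/* recorded at load time */
	unsigned long start;	/* entry point of the loaded image */
};

enum kexec_path { PATH_NONE, PATH_ONE_WAY, PATH_JUMP_AND_RETURN };

/* Load time: remember what the caller asked for. */
static void sketch_kexec_load(struct kimage_sketch *image,
			      unsigned long start, unsigned long flags)
{
	image->start = start;
	image->flags = flags;
}

/* Reboot time: one command, the path is chosen from the stored flag. */
static enum kexec_path sketch_reboot_cmd_kexec(const struct kimage_sketch *image)
{
	if (image == NULL)
		return PATH_NONE;		/* nothing loaded */
	if (image->flags & KEXEC_RETURNS_SKETCH)
		return PATH_JUMP_AND_RETURN;	/* would call kexec_jump() */
	return PATH_ONE_WAY;			/* plain kexec, never returns */
}
```

Because the decision is made from the image itself, user space needs no second reboot command, and a kernel that rejects the flag at load time tells the caller the feature is missing.
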

> > +#ifdef CONFIG_KEXEC_JUMP
> > +unsigned long kexec_jump_back_entry;
> > +
> > +int kexec_jump(void)
> > +{
> > +   int error;
> > +
> > +   if (!kexec_image)
> > +   return -EINVAL;
> 
> I understand where you are coming from with this implementation of
> kexec_jump but it looks like this is one of the big parts of this
> patch that have not reached their final form.
> 
> The line above is racy with sys_kexec_load.

Yes. I should use xchg(&kexec_image, NULL), as the other kexec-related
functions do.
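
A minimal user-space sketch of that idea, using C11 atomic_exchange in place of the kernel's xchg: whoever swaps the pointer out owns the image, so a concurrent load cannot replace it underneath the jump. The names loaded_image, sketch_load, sketch_claim and sketch_jump are hypothetical and only mirror the roles discussed above.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for struct kimage; only an id so the sketch can be observed. */
struct image_sketch {
	int id;
};

/* Shared slot, playing the role of the global kexec_image pointer. */
static _Atomic(struct image_sketch *) loaded_image;

/* Loader side: publish a new image and return the one it displaced. */
static struct image_sketch *sketch_load(struct image_sketch *image)
{
	return atomic_exchange(&loaded_image, image);
}

/* Take exclusive ownership of the current image, leaving the slot empty. */
static struct image_sketch *sketch_claim(void)
{
	return atomic_exchange(&loaded_image, NULL);
}

/*
 * Jump side: instead of only peeking at the pointer and returning
 * -EINVAL when it is NULL, claim it atomically before using it.
 */
static int sketch_jump(void)
{
	struct image_sketch *image = sketch_claim();
	int id;

	if (!image)
		return -EINVAL;
	id = image->id;		/* the real code would jump here */
	sketch_load(image);	/* put the image back for a later jump */
	return id;
}
```

The sketch glosses over what should happen if a new image is loaded while the old one is claimed; putting the old image back unconditionally is a simplification.
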

> > +   pm_prepare_console();
> > +   suspend_console();
> > +   error = device_suspend(PMSG_FREEZE);
> > +   if (error)
> > +   goto Resume_console;
> 
> This as everyone knows needs to be device_shutdown or a better hibernation
> replacement.

Yes.

> > +   error = disable_nonboot_cpus();
> > +   if (error)
> > +   goto Resume_devices;
> 
> > Can't we just catch the nonboot CPUs in a mutex?
> > disable_nonboot_cpus is actually impossible to implement 100% reliably
> > with current hardware.  But something like smp_call_function, so we trap
> > them at a specific location, and then the equivalent when we come back should
> > be simple.  I guess the tricky part is bringing the cpus back up again.
> 
> Using the broken by design version of cpu hotplug really annoys me here.

I think this is not very simple, given that we may jump back from a
kernel with SMP turned off, or from the bootloader directly. But CPU hotplug
is another topic; I think it should be solved in another patch.

> > +   local_irq_disable();
> > +   /* At this point, device_suspend() has been called, but *not*
> > +* device_power_down(). We *must* device_power_down() now.
> > +* Otherwise, drivers for some devices (e.g. interrupt controllers)
> > +* become desynchronized with the actual state of the hardware
> > +* at resume time, and evil weirdness ensues.
> > +*/
> > +   error = device_power_down(PMSG_FREEZE);
> > +   if (error)
> > +   goto Enable_irqs;
> 
> > This of course should go away when we have the proper methods.

Yes.

> > > +   save_processor_state();
> > 
> > This line might even be reasonable.
> > 
> > > +   error = machine_kexec_jump(kexec_image);
> > > +   restore_processor_state();
> > >
> > > +   /* NOTE:  device_power_up() is just a resume() for devices
> > > +    * that suspended with irqs off ... no overall powerup.
> > > +    */
> > > +   device_power_up();
> > 
> > Yep this can go away.

Yes.

> > + Enable_irqs:
> > +   local_irq_enable();
> > +   enable_nonboot_cpus();
> 
> I haven't looked at the cpu start up code yet to see if it
> is generally implementable.  I would think so, but I guess
> we need to be careful with our data structures.
> 
> > > + Resume_devices:
> > > +   device_resume();
> > 
> > This of course should change.

Yes.

> > + Resume_console:
> > +   resume_console();
> > +   pm_restore_console();
> 
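> > Odd.  I'm a little surprised that the console is the last
> > thing we restore.  But it does make sense to treat it specially.
> > 
> > > +   return error;
> > > +}
> > > +#endif /* CONFIG_KEXEC_JUMP */

Best Regards,
Huang Ying

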
Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Huang, Ying
On Thu, 2007-09-20 at 20:55 -0600, Eric W. Biederman wrote:
> "Huang, Ying" <[EMAIL PROTECTED]> writes:
> 
> > This patch implements the functionality of jumping between the kexeced
> > kernel and the original kernel.
> >
> > A new reboot command named LINUX_REBOOT_CMD_KJUMP is defined to
> > trigger the jumping to (executing) the new kernel and jumping back to
> > the original kernel.
> >
> > To support jumping between two kernels, before jumping to (executing)
> > the new kernel and jumping back to the original kernel, the devices
> > are put into quiescent state (to be fully implemented),
> 
> > Well, this we have an implementation of (it's called shutdown), or does
> > that method not do enough to meet the requirements of hibernation?

I think device_shutdown is not enough for hibernation, because the
current implementation of the device shutdown method does not consider
recovery. For example, for hibernation, a request the device is currently
executing should be delayed or finished before shutdown, and may need to be
re-executed after recovery. So I think another pair of callbacks may
be needed for the purpose of hibernation.
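
To make the difference from shutdown concrete, here is a toy user-space model of such a pair of callbacks. Everything in it (struct dev_sketch, sketch_quiesce, sketch_recover) is hypothetical and only models the bookkeeping: the first callback stops the device and holds back unfinished requests instead of dropping them, and the second re-issues them.

```c
#include <stddef.h>

/* Toy device state; counters stand in for real request queues. */
struct dev_sketch {
	int inflight;	/* requests handed to the hardware, not yet completed */
	int deferred;	/* requests held back while the device is quiesced */
	int quiesced;	/* nonzero between quiesce and recover */
};

/* Submit one request; returns 1 if it went to the hardware, 0 if deferred. */
static int sketch_submit(struct dev_sketch *dev)
{
	if (dev->quiesced) {
		dev->deferred++;
		return 0;
	}
	dev->inflight++;
	return 1;
}

/* The hardware finished one request. */
static void sketch_complete(struct dev_sketch *dev)
{
	if (dev->inflight > 0)
		dev->inflight--;
}

/* First callback: stop the device but remember unfinished work. */
static void sketch_quiesce(struct dev_sketch *dev)
{
	dev->quiesced = 1;
	dev->deferred += dev->inflight;	/* delay what has not completed */
	dev->inflight = 0;
}

/* Second callback: re-issue everything that was delayed. */
static int sketch_recover(struct dev_sketch *dev)
{
	int reissued = dev->deferred;

	dev->quiesced = 0;
	dev->inflight += reissued;
	dev->deferred = 0;
	return reissued;
}
```

A plain shutdown method would have no counterpart to sketch_recover(), which is the gap described above.
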

> If at all possible I would like to keep reboot, kexec and kexec+return
> all using the same device driver methods.

I totally agree!

> > and the state of devices and CPU is saved. 
> 
> Makes a reasonable amount of sense.  We do need to save whatever
> > state we cannot recover just by reprogramming the hardware.
> As long as the drivers are built so this is a good place for a
> hot remove to happen we should be in good shape.
> 
> > After jumping back from kexeced kernel
> > and jumping to the new kernel, the state of devices and CPU are
> > restored accordingly. The devices/CPU state save/restore code of
> > software suspend is called to implement corresponding function.
> 
> At least for now that sounds like a reasonable work around.
> 
> I don't think we want to merge this code until we have agreed upon
> how the new device_detach and device_reattach (or whatever we call the
> device methods for hibernate) are to be implemented.

There is a thread on LKML about this:
http://lkml.org/lkml/2007/4/27/129

Do you agree with the conclusion there?

> > To support jumping without preserving memory. One shadow backup page
> > is allocated for each page used by new (kexeced) kernel. 
> 
> That does not sound correct.  The current implementation of kexec_load
> does allocate a source page and give it a destination page and usually
> those two pages are different.  But if our memory allocations happen
> to return a destination page there we use it directly, making no
> copy necessary.
> 
> I think we are talking about the same thing but I'm not certain
> you have thought about the case where your shadow backup page happens
> to be the same as current page.

My description here was imprecise. If the source page (shadow page)
is the same as the target page, there is no copy or swap. I have thought
about that, and the current implementation works in this situation too. In
the original kernel it is a page allocated for kexec, so it will not be used
for any other purpose; in the kexeced kernel, it can be used freely.

> > When do
> > kexec_load, the image of new kernel is loaded into shadow pages, 
> 
> > Ok.  This sounds like the existing implementation.  Except that,
> > depending on your destination, it may force the address.

Yes. This is the existing implementation, just used slightly differently.
I load every memory area used by the kexeced kernel, in addition to the kernel
image. This is done in kexec-tools. So a shadow page is allocated for
every page used by the kexeced kernel.

> > and
> > before executing, the original pages and the shadow pages are swapped,
> > > so the contents of original pages are backed up.
> 
> Yes.  Unless we happen to have everything allocated on the same page.
> Does your code handle that case?  I know the generic kexec code will
> pass lists like that in the proper circumstances.  Especially for
> the kexec on panic case.

My code can handle that case. If everything is allocated on the same page,
it simply does not swap, or swaps the page with itself. The same lists the
generic kexec code uses are used for the swap too.
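
The swapping described here can be sketched in user space as follows. This is only a model under assumptions: struct page_pair, swap_page_contents() and swap_all() are invented names, pages are plain byte buffers, and the bounce buffer plays the role I assume the patch's swap_page has. Running the same pass twice restores the original contents, which is what makes the jump back possible.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* One entry of the (hypothetical) source/destination page list. */
struct page_pair {
	unsigned char *dest;	/* page the new kernel will run from */
	unsigned char *shadow;	/* page currently holding the new kernel's data */
};

/* Exchange the contents of two distinct pages through a bounce page. */
static void swap_page_contents(unsigned char *a, unsigned char *b,
			       unsigned char *bounce)
{
	memcpy(bounce, a, SKETCH_PAGE_SIZE);
	memcpy(a, b, SKETCH_PAGE_SIZE);
	memcpy(b, bounce, SKETCH_PAGE_SIZE);
}

/*
 * Walk the list and swap every pair. A pair whose shadow page is the
 * destination page itself needs no copy and is skipped.
 * Returns the number of pages actually swapped.
 */
static size_t swap_all(struct page_pair *pairs, size_t n,
		       unsigned char *bounce)
{
	size_t swapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (pairs[i].dest == pairs[i].shadow)
			continue;
		swap_page_contents(pairs[i].dest, pairs[i].shadow, bounce);
		swapped++;
	}
	return swapped;
}
```
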

> > Before jumping to the
> > new (kexeced) kernel and after jumping back to the original kernel,
> > the original pages and the shadow pages are swapped too.
> 
> Yes.   That sounds right.
> 
> > A jump back protocol is defined and documented.
> 
> Bleh.  We do need to document the requirements but we don't need a
> versioned monster.  And we don't need to be exposing implementation
> details in that documentation.
> 
> In the kexec world /sbin/kexec or another user space caller is
> responsible for passing information to our callers.
> 
> To be polite we need to document more but the jump back protocol
> really should be as if the entry point kexec handed control to did
> a subroutine return.

This protocol is mainly for loading the hibernation image from the
bootloader directly, not for kexec. An external protocol should be
defined for the bootloader, because they are external code.

> > > Known issues
> > >
> > > - A field is added to Linux


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Pavel Machek
Hi!
 
  Sounds doable, as long as you can cope with long command lines (which 
  shouldn't be a biggie). (If you've got a swapfile or parts of a swap 
  partition already in use, it can be quite fragmented).
 
 Hmm.  This is an interesting problem.  Sharing a swap file or a swap
 partition with the actual swap of user space pages does seem to be
 a limitation of this approach.
 
 Although the fact that it is simple to write to a separate file may
 be a reasonable compensation.

I'm not sure how you'd write it to a separate file. Note that the kjump
kernel must not mount journalling filesystems, not even read-only
(ext3 replays the journal in that case). You could pass block
numbers from the original kernel...
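
A minimal sketch of what passing block numbers could look like, assuming the
original kernel hands the kjump kernel a list of block extents on the command
line. The parameter name "kjump_blocks=" and the "start+count" syntax are made
up for illustration; nothing in the patch defines them. A fragmented swapfile
means many extents, which is where the long command lines come from.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A contiguous run of blocks on the backing device. */
struct extent {
	unsigned long long start;	/* first block number */
	unsigned long long count;	/* number of blocks in the run */
};

/*
 * Format extents as "kjump_blocks=start+count,start+count,...".
 * The parameter name and syntax are hypothetical.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_extents(char *buf, size_t len,
			  const struct extent *ext, size_t n)
{
	size_t used;
	int ret;

	assert(buf != NULL);

	ret = snprintf(buf, len, "kjump_blocks=");
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	used = (size_t)ret;

	for (size_t i = 0; i < n; i++) {
		/* Separate extents with commas, none before the first. */
		ret = snprintf(buf + used, len - used, "%s%llu+%llu",
			       i ? "," : "", ext[i].start, ext[i].count);
		if (ret < 0 || (size_t)ret >= len - used)
			return -1;
		used += (size_t)ret;
	}
	return (int)used;
}
```

The -1 return on overflow matters here: a truncated extent list would make the
kjump kernel write to the wrong blocks, so it is better to fail outright.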
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Pavel Machek
Hi!

  Seems like good enough for -mm to me.

(For the record, I do not think this is going to be a
hibernation replacement any time soon. But it is functionality useful
for other stuff -- dump memory and continue -- and yes, it may be able
to do hibernation in the long term.

It really comes from the other side of reliability:

* swsusp: if your kernel is perfectly healthy, it will work

while this, coming from kdump, is:

* if your kernel is not completely trashed, it should work

...which is why you can't use swsusp to do dump memory and continue -- you
want to do dumps on slightly broken systems. And yes, as a
side effect it may be able to do hibernation... why not, let's see how
it works out.)

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
 Nigel Cunningham [EMAIL PROTECTED] writes:
 
  That's not true. Kexec will itself be an implementation, otherwise you'd end
  up with people screaming about no hibernation support.
 
 There needs to be an implementation of hibernation based on kexec with
 return yes.
 
  And it won't result in the complete removal of the existing hibernation code
  from the kernel. At the very least, it's going to want the kernel being
  hibernated to have an interface by which it can find out which pages need to
  be saved.
 
 That interface should be running kernel - user space - target kernel.
 Not direct kernel to kernel.
 
  I wouldn't be surprised if it also ends up with an interface in which the
  kernel being hibernated tells it what bdev/sectors in which to save the image
  as well (otherwise you're going to need a dedicated, otherwise untouched
  partition exclusively for the kexec'd kernel to use), or what network
  settings to use if it wants to try to save the image to a network storage
  device.
 
 initramfs.  We already seem to have that interface.  And distros
 seems to do a pretty decent job of using it to configure systems.
 
  On top of that, there are all the issues related to device reinitialisation
  and so on,
 
 Yes.
 
  and it looks like there's greatly increased pain for users wanting to 
  configure this new implementation. 
 
 Not to be callous but that really is a user space and distro issue.
 
  Kexec is by no means proven to be the panacea for all the issues.
 
 I agree.  I'm still not quite convinced it will do a satisfactory job.
 But I think it does make sense to implement a general kexec with
 return and see if that can reasonably be used for handling hibernation
 issues.  If done cleanly and with care the implementation won't be
 hibernation specific.

Yes, and that's worth doing anyway, IMO.

 Frankly this looks like the best way I can see to implement a general
 mechanism for calling silly firmware/BIOS/EFI services after we
 have a kernel up and running.  It's a little bit like allowing
 X to call iopl(3) and do inb/outb directly.
 
 The configuration issues you raise pretty much exist for kexec on
 panic, and they seem to be being resolved for that case in a
 reasonable way.  I do agree that the current kexec+return effort seems
 to be one of those unfortunate cases where we give every mechanism in
 the kernel to do something in user space and then no one actually
 implements the user space.  That doesn't do any one any good.
 
 For hibernation we don't have the absolute need to step outside of the
 current kernel that we do in the kexec on panic approach.  However we
 have this practical fight about mechanism and policy, and kexec with
 return has this seductive allure that it appears to be the minimal
 necessary mechanism in the kernel.
 
 No one has yet attacked the hard problem of coming up with separate
 hibernate methods for drivers.

Well, I've been playing a bit with that for some time, but it's not easy by any
means.

In short, I'm seeing some problems related to the handling of ACPI that seem to
shatter the entire idea of having separate hibernate methods, at least as far
as ACPI systems are concerned.

Greetings,
Rafael


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
Hi Andrew,

On Friday, 21 September 2007 03:41, Andrew Morton wrote:
 On Fri, 21 Sep 2007 11:19:59 +1000 Nigel Cunningham [EMAIL PROTECTED] wrote:
 
  Hi.
  
  On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
   On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham [EMAIL PROTECTED] wrote:
   
    Hi Andrew.
    
    On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
     Seems like good enough for -mm to me.
     
     Pavel
    
    Andrew, if I recall correctly, you said a while ago that you didn't want
    another hibernation implementation in the vanilla kernel. If you're going
    to consider merging this kexec code, will you also please consider merging
    TuxOnIce?

   
   The theory is that kexec-based hibernation will mainly use preexisting
   kexec code and will permit us to delete the existing hibernation
   implementation.
   
   That's different from replacing it.
  
  TuxOnIce doesn't remove the existing implementation either. It can 
  transparently replace it, but you can enable/disable that at compile time.
 
 Right.  So we end up with two implementations in-tree.  Whereas
 kexec-based-hibernation leads us to having zero implementations in-tree.

Well, I don't quite agree.

For now, the kexec-based approach is missing the handling of devices, AFAICS.
Namely, it's quite easy to snapshot memory with the help of kexec, but the
state of devices gets trashed in the process, so you need some additional code
that saves the state of devices for you and is executed before the kexec.

Moreover, on ACPI systems the transition to the S4 sleep state and back to S0
(working state) is more complicated than plain system checkpointing, because we
are supposed to take the platform firmware into consideration in that case.
The more I think about this, the more it seems to me that it just can't be done
on top of kexec in a reasonable fashion.  Of course, we could avoid handling
ACPI S4, but that would leave some people (including me ;-)) with
semi-working hardware after the restore.  I don't think that's generally
acceptable in the long run.

IMHO, for ACPI systems the way to go is to harden suspend to RAM (with s2ram
in place and the graphics adapter specifications from Intel and AMD released,
we are in a good position to do that) and build the S4 transition mechanism
on top of that.  It can be done easily by adapting the current hibernation
code, but not on top of kexec (I'm afraid).

[Besides, the current hibernation userland interface is used by default by
openSUSE and it's also used by quite a few Debian users, so we can't drop
it overnight and it can't be implemented in a compatible way on top of the
kexec-based solution.]


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 11:49, Pavel Machek wrote:
 Hi!
 
   Seems like good enough for -mm to me.
 
 (For the record, I do not think this is going to be
 hibernation-replacement any time soon. But it is functionality useful
 for other stuff -- dump memory and continue -- and yes it may be able
 to do hibernation in the long term.
 
 It really comes from the other side of reliability:
 
 * swsusp is if your kernel is perfectly healthy, it will work
 
 while this, coming from kdump is
 
 * if your kernel is not completely trashed, it should work
 
 ...which is why can't use swsusp to do dump memory and continue -- you
 want to do dumps on slightly broken systems. And yes, as a
 sideeffect it may be able to do hibernation... why not, lets see how
 it works out).

I generally agree. :-)

Greetings,
Rafael


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 21:56:29 Rafael J. Wysocki wrote:
 [Besides, the current hibernation userland interface is used by default by
 openSUSE and it's also used by quite some Debian users, so we can't drop
 it overnight and it can't be implemented in a compatible way on top of the
 kexec-based solution.]

Could it be fudged by giving userland a null image and having (say) the first 
ioctl be one that triggers all the real work (with other ioctls being noops 
or such like, as appropriate)?
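
A rough sketch of that idea, covering the suspend direction only. The command
names below are hypothetical placeholders and are NOT the real snapshot device
ioctls; do_all_real_work() is a stand-in for whatever the replacement
implementation would actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command numbers, not the real snapshot ioctls. */
enum shim_cmd {
	SHIM_FREEZE,
	SHIM_SNAPSHOT,
	SHIM_GET_IMAGE_SIZE,
	SHIM_POWER_OFF,
};

struct shim_state {
	bool work_done;		/* set once the real hibernation has run */
	int work_calls;		/* how many times the real work was triggered */
};

/* Stand-in for the new implementation doing the whole hibernation. */
static void do_all_real_work(struct shim_state *st)
{
	st->work_done = true;
	st->work_calls++;
}

/*
 * The first ioctl, whichever it is, triggers all the real work.
 * Every later call succeeds without doing anything, and the image
 * size is reported as 0 so userland sees a null image.
 */
static long shim_ioctl(struct shim_state *st, enum shim_cmd cmd)
{
	assert(st != NULL);

	if (!st->work_done)
		do_all_real_work(st);

	switch (cmd) {
	case SHIM_GET_IMAGE_SIZE:
		return 0;	/* null image: nothing for userland to write */
	case SHIM_FREEZE:
	case SHIM_SNAPSHOT:
	case SHIM_POWER_OFF:
		return 0;	/* no-ops */
	}
	return -1;		/* unknown command */
}
```

The sketch says nothing about resume, where userland is the one feeding the
image back to the kernel.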

Regards,

Nigel


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 13:58, Nigel Cunningham wrote:
 Hi.
 
 On Friday 21 September 2007 21:56:29 Rafael J. Wysocki wrote:
  [Besides, the current hibernation userland interface is used by default by
  openSUSE and it's also used by quite some Debian users, so we can't drop
  it overnight and it can't be implemented in a compatible way on top of the
  kexec-based solution.]
 
 Could it be fudged by giving userland a null image and having (say) the first 
 ioctl be one that triggers all the real work (with other ioctls being noops 
 or such like, as appropriate)?

Well, the suspend part is probably doable, but I'm afraid of the resume
one.

Greetings,
Rafael


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 22:18:19 Rafael J. Wysocki wrote:
 On Friday, 21 September 2007 13:58, Nigel Cunningham wrote:
  Hi.
  
  On Friday 21 September 2007 21:56:29 Rafael J. Wysocki wrote:
   [Besides, the current hibernation userland interface is used by default by
   openSUSE and it's also used by quite some Debian users, so we can't drop
   it overnight and it can't be implemented in a compatible way on top of the
   kexec-based solution.]
  
  Could it be fudged by giving userland a null image and having (say) the first
  ioctl be one that triggers all the real work (with other ioctls being noops
  or such like, as appropriate)?
 
 Well, the suspend part is probably doable, but I'm afraid of the resume
 one.

'k. I've occasionally thought about trying it, but haven't ever gotten around 
to actually doing it yet. (I'd like to make TuxOnIce transparently replace 
both swsusp and uswsusp if I could).

Regards,

Nigel


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread huang ying
On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
  Nigel Cunningham [EMAIL PROTECTED] writes:
  
   That's not true. Kexec will itself be an implementation, otherwise you'd 
   end
   up with people screaming about no hibernation support.
 
  There needs to be an implementation of hibernation based on kexec with
  return yes.
 
   And it won't result in
   the complete removal of the existing hibernation code from the kernel. At 
   the
   very least, it's going to want the kernel being hibernated to have an
   interface by which it can find out which pages need to be saved.
 
  That interface should be running kernel - user space - target kernel.
  Not direct kernel to kernel.
 
   I wouldn't
   be surprised if it also ends up with an interface in which the kernel 
   being
   hibernated tells it what bdev/sectors in which to save the image as well
   (otherwise you're going to need a dedicated, otherwise untouched partition
   exclusively for the kexec'd kernel to use), or what network settings to 
   use
   if it wants to try to save the image to a network storage device.
 
  initramfs.  We already seem to have that interface.  And distros
  seems to do a pretty decent job of using it to configure systems.
 
   On top of
   that, there are all the issues related to device reinitialisation and so 
   on,
 
  Yes.
 
   and it looks like there's greatly increased pain for users wanting to
   configure this new implementation.
 
  Not to be callous but that really is a user space and distro issue.
 
   Kexec is by no means proven to be the panacea for all the issues.
 
  I agree.  I'm still not quite convinced it will do a satisfactory job.
  But I think it does make sense to implement a general kexec with
  return and see if that can reasonably be used for handling hibernation
  issues.  If done cleanly and with care the implementation won't be
  hibernation specific.

 Yes, and that's worth doing anyway, IMO.

  Frankly this looks like the best way I can see to implement a general
  mechanism for calling silly firmware/BIOS/EFI services after we
  have a kernel up and running.  It's a little bit like allowing
  X to call iopl(3) and do inb/outb directly.
 
  The configuration issues you raise pretty much exist for kexec on
  panic, and they seem to be being resolved for that case in a
  reasonable way.  I do agree that the current kexec+return effort seems
  to be one of those unfortunate cases where we give every mechanism in
  the kernel to do something in user space and then no one actually
  implements the user space.  That doesn't do any one any good.
 
  For hibernation we don't have the absolute need to step outside of the
  current kernel that we do in the kexec on panic approach.  However we
  have this practical fight about mechanism and policy, and kexec with
  return has this seductive allure that it appears to be the minimal
  necessary mechanism in the kernel.
 
  No one has yet attacked the hard problem of coming up with separate
  hibernate methods for drivers.

 Well, I've been playing a bit with that for some time, but it's not easy by 
 any
 means.

 In short, I'm seeing some problems related to the handling of ACPI that seem 
 to
 shatter the entire idea of having separate hibernate methods, at least as far
 as ACPI systems are concerned.

Sad to hear this. Can you give a little more detail? Or a link?

Best Regards,
Huang Ying


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread huang ying
On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 Hi Andrew,

 On Friday, 21 September 2007 03:41, Andrew Morton wrote:
  On Fri, 21 Sep 2007 11:19:59 +1000 Nigel Cunningham [EMAIL PROTECTED] 
  wrote:
 
   Hi.
  
   On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
    On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham [EMAIL PROTECTED] wrote:
    
     Hi Andrew.
     
     On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
      Seems like good enough for -mm to me.
      
      Pavel
     
     Andrew, if I recall correctly, you said a while ago that you didn't want
     another hibernation implementation in the vanilla kernel. If you're going
     to consider merging this kexec code, will you also please consider merging
     TuxOnIce?
    
    The theory is that kexec-based hibernation will mainly use preexisting
    kexec code and will permit us to delete the existing hibernation
    implementation.
    
    That's different from replacing it.
  
   TuxOnIce doesn't remove the existing implementation either. It can
   transparently replace it, but you can enable/disable that at compile time.
 
  Right.  So we end up with two implementations in-tree.  Whereas
  kexec-based-hibernation leads us to having zero implementations in-tree.

 Well, I don't quite agree.

 For now, the kexec-based approach is missing the handling of devices, AFAICS.
 Namely, it's quite easy to snapshot memory with the help of kexec, but the
 state of devices gets trashed in the process, so you need some additional code
 saving the state of devices for you, executed before the kexec.

 Moreover, on ACPI systems the transition to the S4 sleep state and back to S0
 (working state) is more complicated than a system checkpointing, because we
 are supposed to take the platform firmware into consideration in that case.
 The more I think about this, the more it seems to me that it just can't be 
 done
 on top of kexec in a reasonable fashion.  Of course, we could avoid handling
 the ACPI S4, but that would leave some people (including me ;-)) with
 semi-working hardware after the restore.  I don't think that's generally
 acceptable in the long run.

 IMHO, for ACPI systems the way to go is to harden suspend to RAM (with s2ram
 in place and the graphics adapters specifications from Intel and AMD released
 we are in a good position to do that) and build the S4 transition mechanism
 on top of that.  It can be done easily by adapting the current hibernation
 code, but not on top of kexec (I'm afraid).

Yes. ACPI is the biggest issue for kexec-based hibernation now. I will
try to work on that. At least I can find out whether kexec-based
hibernation is possible with ACPI.

 [Besides, the current hibernation userland interface is used by default by
 openSUSE and it's also used by quite some Debian users, so we can't drop
 it overnight and it can't be implemented in a compatible way on top of the
 kexec-based solution.]

Best Regards,
Huang Ying


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 15:14, huang ying wrote:
 On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
  On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
   Nigel Cunningham [EMAIL PROTECTED] writes:
[--snip--]
  
   No one has yet attacked the hard problem of coming up with separate
   hibernate methods for drivers.
 
  Well, I've been playing a bit with that for some time, but it's not easy by 
  any
  means.
 
  In short, I'm seeing some problems related to the handling of ACPI that 
  seem to
  shatter the entire idea of having separate hibernate methods, at least as 
  far
  as ACPI systems are concerned.
 
 Sad to hear this. Can you give a little more detail? Or a link?

Well, the problem is that apparently some systems (e.g. my HP nx6325) expect us
to execute the _PTS ACPI global control method before creating the image _and_
to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
system into the sleep state.  In particular, on nx6325, if we don't do that,
then after the restore the status of the AC power will not be reported
correctly (and if you replace the battery while in the sleep state, the
battery status will not be updated correctly after the restore).  Similar
issues have been reported for other machines.

Now, the ACPI specification requires us to put devices into low power states
before executing _PTS and that's exactly what we're doing before a suspend to
RAM.  Thus, it seems that in general we need to do the same for hibernation on
ACPI systems.
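
The ordering described above can be sketched as follows. This is only an
illustration with made-up step names that records the sequence; it calls no
kernel or ACPICA functions, and the "write image" step is a simplifying
assumption about where saving the image fits in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step names; none of these are kernel APIs. */
enum step {
	STEP_DEVICES_LOW_POWER,	/* put devices into low power states */
	STEP_PTS,		/* execute the _PTS control method */
	STEP_CREATE_IMAGE,	/* create the hibernation image */
	STEP_WRITE_IMAGE,	/* save the image (assumed, details omitted) */
	STEP_ENTER_S4,		/* acpi_enter_sleep_state(ACPI_STATE_S4) */
};

#define MAX_STEPS 8

static enum step step_log[MAX_STEPS];
static size_t step_count;

/* Append one step to the log. */
static void record(enum step s)
{
	assert(step_count < MAX_STEPS);
	step_log[step_count++] = s;
}

/*
 * Devices go to low power before _PTS, _PTS runs before the image
 * is created, and entering S4 comes last.
 * Returns the number of steps recorded.
 */
static size_t hibernate_acpi_order(void)
{
	step_count = 0;
	record(STEP_DEVICES_LOW_POWER);
	record(STEP_PTS);
	record(STEP_CREATE_IMAGE);
	record(STEP_WRITE_IMAGE);
	record(STEP_ENTER_S4);
	return step_count;
}
```

The first two steps are the same ones taken before a suspend to RAM, which is
why the device handling looks shareable between the two cases.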

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread huang ying
On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 On Friday, 21 September 2007 15:14, huang ying wrote:
  On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
   On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
Nigel Cunningham [EMAIL PROTECTED] writes:
 [--snip--]
   
No one has yet attacked the hard problem of coming up with separate
hibernate methods for drivers.
  
   Well, I've been playing a bit with that for some time, but it's not easy 
   by any
   means.
  
   In short, I'm seeing some problems related to the handling of ACPI that 
   seem to
   shatter the entire idea of having separate hibernate methods, at least as 
   far
   as ACPI systems are concerned.
 
  Sad to hear this. Can you give a little more detail? Or a link?

 Well, the problem is that apparently some systems (eg. my HP nx6325) expect us
 to execute the _PTS ACPI global control method before creating the image _and_
 to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
 system into the sleep state.  In particular, on nx6325, if we don't do that,
 then after the restore the status of the AC power will not be reported
 correctly (and if you replace the battery while in the sleep state, the
 battery status will not be updated correctly after the restore).  Similar
 issues have been reported for other machines.

 Now, the ACPI specification requires us to put devices into low power states
 before executing _PTS and that's exactly what we're doing before a suspend to
 RAM.  Thus, it seems that in general we need to do the same for hibernation on
 ACPI systems.

Then, is it possible to separate device quiesce from device suspend?
Perhaps not for swsusp, but for kexec-based hibernation?

Best Regards,
Huang Ying


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 17:02, huang ying wrote:
 On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
  On Friday, 21 September 2007 15:14, huang ying wrote:
   On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
 Nigel Cunningham [EMAIL PROTECTED] writes:
  [--snip--]

 No one has yet attacked the hard problem of coming up with separate
 hibernate methods for drivers.
   
Well, I've been playing a bit with that for some time, but it's not 
easy by any
means.
   
In short, I'm seeing some problems related to the handling of ACPI that 
seem to
shatter the entire idea of having separate hibernate methods, at least 
as far
as ACPI systems are concerned.
  
   Sad to hear this. Can you give a little more detail? Or a link?
 
  Well, the problem is that apparently some systems (eg. my HP nx6325) expect 
  us
  to execute the _PTS ACPI global control method before creating the image 
  _and_
  to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
  system into the sleep state.  In particular, on nx6325, if we don't do that,
  then after the restore the status of the AC power will not be reported
  correctly (and if you replace the battery while in the sleep state, the
  battery status will not be updated correctly after the restore).  Similar
  issues have been reported for other machines.
 
  Now, the ACPI specification requires us to put devices into low power states
  before executing _PTS and that's exactly what we're doing before a suspend 
  to
  RAM.  Thus, it seems that in general we need to do the same for hibernation 
  on
  ACPI systems.
 
 Then, is it possible to separate device quiesce from device suspend.

It surely is possible, but I'm not sure if it's going to be useful.

I mean, if we need to do exactly the same thing before a suspend to RAM and
before a hibernation (i.e. to put devices into low power states), why would we
want to use different methods for that in both cases?

 Perhaps not for swsusp, but for kexec based hibernation?

Frankly, I don't know.

Generally, changing the way in which device drivers handle suspend (to RAM)
and hibernation is a huge task.  After considering this issue for some time,
I think that we really should start by hardening suspend (to RAM) so that it
doesn't need the freezer any more, because _that_ would require us to change
the suspend-related drivers' callbacks anyway.

When we are sure how we are going to eliminate the freezer from suspend
(to RAM), we'll know how that affects hibernation and what to do about it.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
Rafael J. Wysocki [EMAIL PROTECTED] writes:

 On Friday, 21 September 2007 15:14, huang ying wrote:
 On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
  On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
   Nigel Cunningham [EMAIL PROTECTED] writes:
 [--snip--]
  
   No one has yet attacked the hard problem of coming up with separate
   hibernate methods for drivers.
 
  Well, I've been playing a bit with that for some time, but it's not easy by
 any
  means.
 
  In short, I'm seeing some problems related to the handling of ACPI that 
  seem
 to
  shatter the entire idea of having separate hibernate methods, at least as
 far
  as ACPI systems are concerned.
 
 Sad to hear this. Can you give a little more detail? Or a link?

 Well, the problem is that apparently some systems (eg. my HP nx6325) expect us
 to execute the _PTS ACPI global control method before creating the image _and_
 to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
 system into the sleep state.  In particular, on nx6325, if we don't do that,
 then after the restore the status of the AC power will not be reported
 correctly (and if you replace the battery while in the sleep state, the
 battery status will not be updated correctly after the restore).  Similar
 issues have been reported for other machines.

Suppose that instead of using ACPI S4 state at all, you instead just
power off.  Yes, you'll lose wakeup event functionality, and flashy
LEDs, but doesn't this take care of the problem?  The firmware shouldn't
see the hibernate as anything other than a shutdown and reboot.  ACPI
should be initialized normally when resuming, which should take care of
getting AC power status reported properly.

This should be the behavior, anyway, on the many systems that do not
support S4.

 Now, the ACPI specification requires us to put devices into low power states
 before executing _PTS and that's exactly what we're doing before a suspend to
 RAM.  Thus, it seems that in general we need to do the same for hibernation on
 ACPI systems.

It seems that if ACPI S4 is going to be used, switching to a low power
state is something that should be done only immediately before entering
that state (i.e. after the image has already been saved).  In
particular, it should not be done just before the atomic copy.  It is
true that (during resume) after the atomic copy snapshot is restored,
drivers will need to be prepared (i.e. have saved whatever information
is necessary) to _resume_ devices from the low power state, but that
does not mean they have to actually be put into that low power state
before the copy is made.

I agree that for the kexec implementation there may be additional
issues.  For swsusp, uswsusp, and tuxonice, though, I don't see why
there should be a problem.  I think that, as was recognized before, all
of the issues are resolved by properly considering exactly what each
callback should do and when it should be called.  The problems stem from
ambiguous specifications, or trying to use the same callback for two
different purposes or in two different cases.

Let me know if I'm mistaken.

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 20:11, Jeremy Maitin-Shepard wrote:
 Rafael J. Wysocki [EMAIL PROTECTED] writes:
 
  On Friday, 21 September 2007 15:14, huang ying wrote:
  On 9/21/07, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
   On Friday, 21 September 2007 05:33, Eric W. Biederman wrote:
Nigel Cunningham [EMAIL PROTECTED] writes:
  [--snip--]
   
No one has yet attacked the hard problem of coming up with separate
hibernate methods for drivers.
  
   Well, I've been playing a bit with that for some time, but it's not easy 
   by
  any
   means.
  
   In short, I'm seeing some problems related to the handling of ACPI that 
   seem
  to
   shatter the entire idea of having separate hibernate methods, at least as
  far
   as ACPI systems are concerned.
  
  Sad to hear this. Can you give a little more detail? Or a link?
 
  Well, the problem is that apparently some systems (eg. my HP nx6325) expect 
  us
  to execute the _PTS ACPI global control method before creating the image 
  _and_
  to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order to finally put the
  system into the sleep state.  In particular, on nx6325, if we don't do that,
  then after the restore the status of the AC power will not be reported
  correctly (and if you replace the battery while in the sleep state, the
  battery status will not be updated correctly after the restore).  Similar
  issues have been reported for other machines.
 
 Suppose that instead of using ACPI S4 state at all, you instead just
 power off.  Yes, you'll lose wakeup event functionality, and flashy
 LEDs, but doesn't this take care of the problem?

Nope.

 The firmware shouldn't see the hibernate as anything other than a shutdown
 and reboot.

Actually, this assumption is apparently wrong.

 ACPI should be initialized normally when resuming, which should take care of
 getting AC power status reported properly.

Well, that doesn't work.  I've tested it, really. :-)

 This should be the behavior, anyway, on the many systems that do not
 support S4.
 
  Now, the ACPI specification requires us to put devices into low power states
  before executing _PTS and that's exactly what we're doing before a suspend
  to RAM.  Thus, it seems that in general we need to do the same for
  hibernation on ACPI systems.
 
 It seems that if ACPI S4 is going to be used, switching to low power
 state is something that should be done only immediately before entering
 that state (i.e. after the image has already been saved).

Doesn't.  Work.

 In particular, it should not be done just before the atomic copy.  It is
 true that (during resume) after the atomic copy snapshot is restored,
 drivers will need to be prepared (i.e. have saved whatever information
 is necessary) to _resume_ devices from the low power state, but that
 does not mean they have to actually be put into that low power state
 before the copy is made.
 
 I agree that for the kexec implementation there may be additional
 issues.  For swsusp, uswsusp, and tuxonice, though, I don't see why
 there should be a problem.  I think that, as was recognized before, all
 of the issues are resolved by properly considering exactly what each
 callback should do and when it should be called.  The problems stem from
 ambiguous specifications, or trying to use the same callback for two
 different purposes or in two different cases.
 
 Let me know if I'm mistaken.

See above. :-)

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Alan Stern
On Fri, 21 Sep 2007, Rafael J. Wysocki wrote:

   Well, the problem is that apparently some systems (e.g. my HP nx6325)
   expect us to execute the _PTS ACPI global control method before creating
   the image _and_ to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order
   to finally put the system into the sleep state.  In particular, on nx6325,
   if we don't do that, then after the restore the status of the AC power
   will not be reported correctly (and if you replace the battery while in
   the sleep state, the battery status will not be updated correctly after
   the restore).  Similar issues have been reported for other machines.
  
  Suppose that instead of using ACPI S4 state at all, you instead just
  power off.  Yes, you'll lose wakeup event functionality, and flashy
  LEDs, but doesn't this take care of the problem?
 
 Nope.
 
  The firmware shouldn't see the hibernate as anything other than a shutdown
  and reboot.
 
 Actually, this assumption is apparently wrong.

One gets the impression that the hibernation image includes a memory 
area used by the firmware.  That could explain why devices need to be 
in a low-power state when the image is created -- so that when the 
image is restored, the firmware doesn't get confused about the device 
states.

It would also explain why the firmware sees
resume-from-power-off-hibernation as different from a regular reboot:
because its data area gets overwritten as part of the resume.

In reality it's probably more complicated than this, with weird 
interactions between the firmware and the various ACPI methods.  
Nevertheless, the main idea seems valid.

Alan Stern



Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 21:45, Alan Stern wrote:
 On Fri, 21 Sep 2007, Rafael J. Wysocki wrote:
 
    Well, the problem is that apparently some systems (e.g. my HP nx6325)
    expect us to execute the _PTS ACPI global control method before creating
    the image _and_ to execute acpi_enter_sleep_state(ACPI_STATE_S4) in order
    to finally put the system into the sleep state.  In particular, on nx6325,
    if we don't do that, then after the restore the status of the AC power
    will not be reported correctly (and if you replace the battery while in
    the sleep state, the battery status will not be updated correctly after
    the restore).  Similar issues have been reported for other machines.
   
   Suppose that instead of using ACPI S4 state at all, you instead just
   power off.  Yes, you'll lose wakeup event functionality, and flashy
   LEDs, but doesn't this take care of the problem?
  
  Nope.
  
   The firmware shouldn't see the hibernate as anything other than a shutdown
   and reboot.
  
  Actually, this assumption is apparently wrong.
 
 One gets the impression that the hibernation image includes a memory 
 area used by the firmware.  That could explain why devices need to be 
 in a low-power state when the image is created -- so that when the 
 image is restored, the firmware doesn't get confused about the device 
 states.
 
 It would also explain why the firmware sees
 resume-from-power-off-hibernation as different from a regular reboot:
 because its data area gets overwritten as part of the resume.
 
 In reality it's probably more complicated than this, with weird 
 interactions between the firmware and the various ACPI methods.  
 Nevertheless, the main idea seems valid.

I guess so, but I'm not sure.

The ACPI NVS area is explicitly marked as reserved and we don't save it.
On x86_64 we don't save any memory areas marked as reserved and yet the above
happens.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
Rafael J. Wysocki [EMAIL PROTECTED] writes:

[snip]

 The ACPI NVS area is explicitly marked as reserved and we don't save it.
 On x86_64 we don't save any memory areas marked as reserved and yet the above
 happens.

I think you have mentioned before, though, that ACPI is first
initialized by the boot kernel, before it is later initialized by
resuming kernel.  This could well be the source of the problem.

In particular, isn't it the case that you also switch the devices to low
power mode before resuming?

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
 Rafael J. Wysocki [EMAIL PROTECTED] writes:
 
 [snip]
 
  The ACPI NVS area is explicitly marked as reserved and we don't save it.
  On x86_64 we don't save any memory areas marked as reserved and yet the
  above happens.
 
 I think you have mentioned before, though, that ACPI is first
 initialized by the boot kernel, before it is later initialized by
 resuming kernel.  This could well be the source of the problem.

No, it's not.  I have tested that too with an ACPI-less boot kernel.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
Rafael J. Wysocki [EMAIL PROTECTED] writes:

 On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
 Rafael J. Wysocki [EMAIL PROTECTED] writes:
 
 [snip]
 
  The ACPI NVS area is explicitly marked as reserved and we don't save it.
  On x86_64 we don't save any memory areas marked as reserved and yet the
  above happens.
 
 I think you have mentioned before, though, that ACPI is first
 initialized by the boot kernel, before it is later initialized by
 resuming kernel.  This could well be the source of the problem.

 No, it's not.  I have tested that too with an ACPI-less boot kernel.

Well, it seems that there just must be some other bug.  I would define
anything that differs between the post-resume initialization of ACPI from
the normal boot initialization of ACPI as a bug.  If the interaction
with the hardware is the same, then the behavior will be the same.

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Rafael J. Wysocki
On Friday, 21 September 2007 23:08, Jeremy Maitin-Shepard wrote:
 Rafael J. Wysocki [EMAIL PROTECTED] writes:
 
  On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
  Rafael J. Wysocki [EMAIL PROTECTED] writes:
  
  [snip]
  
   The ACPI NVS area is explicitly marked as reserved and we don't save it.
   On x86_64 we don't save any memory areas marked as reserved and yet the
   above happens.
  
  I think you have mentioned before, though, that ACPI is first
  initialized by the boot kernel, before it is later initialized by
  resuming kernel.  This could well be the source of the problem.
 
  No, it's not.  I have tested that too with an ACPI-less boot kernel.
 
 Well, it seems that there just must be some other bug.  I would define
 anything that differs between the post-resume initialization of ACPI

I'm not sure what you mean.

 from the normal boot initialization of ACPI as a bug.  If the interaction
 with the hardware is the same, then the behavior will be the same.

The ACPI platform firmware is allowed to preserve information across the
hibernation-resume cycle, so this need not be the same.

Greetings,
Rafael


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Jeremy Maitin-Shepard
Rafael J. Wysocki [EMAIL PROTECTED] writes:

 On Friday, 21 September 2007 23:08, Jeremy Maitin-Shepard wrote:
 Rafael J. Wysocki [EMAIL PROTECTED] writes:
 
  On Friday, 21 September 2007 22:26, Jeremy Maitin-Shepard wrote:
  Rafael J. Wysocki [EMAIL PROTECTED] writes:
  
  [snip]
  
   The ACPI NVS area is explicitly marked as reserved and we don't save it.
   On x86_64 we don't save any memory areas marked as reserved and yet the
   above happens.
  
  I think you have mentioned before, though, that ACPI is first
  initialized by the boot kernel, before it is later initialized by
  resuming kernel.  This could well be the source of the problem.
 
  No, it's not.  I have tested that too with an ACPI-less boot kernel.
 
 Well, it seems that there just must be some other bug.  I would define
 anything that differs between the post-resume initialization of ACPI

 I'm not sure what you mean.

 from the normal boot initialization of ACPI as a bug.  If the interaction
 with the hardware is the same, then the behavior will be the same.

 The ACPI platform firmware is allowed to preserve information across the
 hibernation-resume cycle, so this need not be the same.

All of my comments related to the case where S4 is not being used
(instead the system is just powered off normally), and a boot kernel
that does not initialize ACPI is used.  In that case, the ACPI platform
firmware should not be able to distinguish a normal boot from a resume
from hibernation.

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Kyle Moffett

On Sep 21, 2007, at 17:16:59, Jeremy Maitin-Shepard wrote:

> Rafael J. Wysocki [EMAIL PROTECTED] writes:
> > The ACPI platform firmware is allowed to preserve information
> > across the hibernation-resume cycle, so this need not be the same.
>
> All of my comments related to the case where S4 is not being used
> (instead the system is just powered off normally), and a boot
> kernel that does not initialize ACPI is used.  In that case, the
> ACPI platform firmware should not be able to distinguish a normal
> boot from a resume from hibernation.


I think that in order for this to work, there would need to be some  
ABI whereby the resume-ing kernel can pass its entire ACPI state and  
a bunch of other ACPI-related device details to the resume-ed kernel,  
which I believe it does not do at the moment.  I believe that what
causes problems is that the ACPI state data the kernel stores is
*different* between identical sequential boots, especially when you
add/remove/replace batteries, AC, etc.


Since we currently throw away most of that in-kernel ACPI interpreter  
state data when we load the to-be-resumed image and replace it with  
the state from the previous boot, it looks to the ACPI code and
firmware like our system's hardware magically changed behind its  
back.  The result is that the ACPI and firmware code is justifiably  
confused (although probably it should be more idempotent to begin  
with).  There are 2 potential solutions:
  1) Formalize and copy a *lot* of ACPI state from the resume-ing
     kernel to the resume-ed kernel.

  2) Properly call the ACPI S4 methods in the proper order.

Neither one is particularly easy or particularly pleasant, especially  
given all the vendor bugs in this general area.  Theoretically we  
should be able to do both, since one will be more reliable than the  
other on different systems depending on what kinds of firmware bugs  
they have.
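
To make option 2 concrete, here is a minimal user-space model of the ordering described earlier in the thread: devices go to low power before _PTS, _PTS runs before the image is created, and the S4 entry comes last. It is only a sketch; every function and enum name below is a placeholder invented for illustration, not a kernel or ACPICA interface, and placing the image write before the S4 entry is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Steps named after the thread's description; all names are placeholders. */
enum step {
	DEVICES_LOW_POWER,  /* put devices into low power states */
	ACPI_PTS,           /* execute the _PTS control method */
	CREATE_IMAGE,       /* atomic copy of memory */
	WRITE_IMAGE,        /* save the image to storage (assumed position) */
	ENTER_S4,           /* stands in for acpi_enter_sleep_state(ACPI_STATE_S4) */
	NR_STEPS
};

static enum step step_log[NR_STEPS];
static size_t step_count;

/* Append one step to the log. */
static void record(enum step s)
{
	assert(step_count < NR_STEPS);
	step_log[step_count++] = s;
}

/* Run the steps in the order the thread says some firmware expects. */
static void hibernate_via_s4_model(void)
{
	step_count = 0;
	record(DEVICES_LOW_POWER);
	record(ACPI_PTS);
	record(CREATE_IMAGE);
	record(WRITE_IMAGE);
	record(ENTER_S4);
}

/* Return 1 if both steps were recorded and a came before b. */
static int happened_before(enum step a, enum step b)
{
	size_t i, ia = NR_STEPS, ib = NR_STEPS;

	for (i = 0; i < step_count; i++) {
		if (step_log[i] == a && ia == NR_STEPS)
			ia = i;
		if (step_log[i] == b && ib == NR_STEPS)
			ib = i;
	}
	return ia < NR_STEPS && ib < NR_STEPS && ia < ib;
}
```

After calling hibernate_via_s4_model(), happened_before(ACPI_PTS, CREATE_IMAGE) holds, which is the constraint that makes separate hibernate methods awkward on ACPI systems.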


Cheers,
Kyle Moffett



Re: [linux-pm] Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-21 Thread Nigel Cunningham
Hi.

On Saturday 22 September 2007 09:19:18 Kyle Moffett wrote:
 I think that in order for this to work, there would need to be some  
 ABI whereby the resume-ing kernel can pass its entire ACPI state and  
 a bunch of other ACPI-related device details to the resume-ed kernel,  
 which I believe it does not do at the moment.  I believe that what  
 causes problems is the ACPI state data that the kernel stores is  
 *different* between identical sequential boots, especially when you  
 add/remove/replace batteries, AC, etc.

That's certainly possible. We already pass a very small amount of data between 
the boot and resuming kernels at the moment, and it's done quite simply - by 
putting the variables we want to 'transfer' in a nosave page/section. I could 
conceive of a scheme wherein this was extended for driver data. Since the 
memory needed would depend on the drivers loaded, it would probably require 
that the space be allocated when hibernating, and the locations of structures 
be stored in the image header and then drivers notified of the locations to 
use when preparing to resume, but it could work...
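
A rough sketch of what such a scheme could look like, as a user-space model: the hibernating side reserves space per driver and records the location in a header, and the resuming side looks the location up by name. The structure and function names are hypothetical and do not correspond to any existing image header format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8
#define NAME_LEN    16

/* One driver's saved state: where it lives inside the image. */
struct drv_entry {
	char   name[NAME_LEN];
	size_t offset;
	size_t len;
};

/* Hypothetical header extension; not an existing kernel structure. */
struct image_header_model {
	size_t nr_entries;
	size_t next_free;                    /* next unused offset in the area */
	struct drv_entry entry[MAX_ENTRIES];
};

/* Hibernate side: reserve len bytes for a driver and record the location. */
static int hdr_reserve(struct image_header_model *h, const char *name,
		       size_t len)
{
	struct drv_entry *e;

	assert(h && name);
	if (h->nr_entries == MAX_ENTRIES || strlen(name) >= NAME_LEN)
		return -1;
	e = &h->entry[h->nr_entries++];
	strcpy(e->name, name);               /* length checked above */
	e->offset = h->next_free;
	e->len = len;
	h->next_free += len;
	return 0;
}

/* Resume side: find where a driver's data was stored, or NULL. */
static const struct drv_entry *hdr_lookup(const struct image_header_model *h,
					  const char *name)
{
	size_t i;

	for (i = 0; i < h->nr_entries; i++)
		if (strcmp(h->entry[i].name, name) == 0)
			return &h->entry[i];
	return NULL;
}
```

Because the space is sized at hibernate time, the set of entries can follow whatever drivers happen to be loaded.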
 
 Since we currently throw away most of that in-kernel ACPI interpreter  
 state data when we load the to-be-resumed image and replace it with  
 the state from the previous boot it looks to the ACPI code and  
 firmware like our system's hardware magically changed behind its  
 back.  The result is that the ACPI and firmware code is justifiably  
 confused (although probably it should be more idempotent to begin  
 with).  There's 2 potential solutions:
1) Formalize and copy a *lot* of ACPI state from the resume-ing  
 kernel to the resume-ed kernel.
2) Properly call the ACPI S4 methods in the proper order

... that said, I don't think the above should be necessary in most cases. I 
believe we're already calling the ACPI S4 methods in the proper order. If I 
understood correctly, Rafael put a lot of effort into learning what that was, 
and into ensuring it does get done.
 
 Neither one is particularly easy or particularly pleasant, especially  
 given all the vendor bugs in this general area.  Theoretically we  
 should be able to do both, since one will be more reliable than the  
 other on different systems depending on what kinds of firmware bugs  
 they have.

Regards,

Nigel


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman
Nigel Cunningham <[EMAIL PROTECTED]> writes:
>
> Sounds doable, as long as you can cope with long command lines (which 
> shouldn't be a biggie). (If you've got a swapfile or parts of a swap 
> partition already in use, it can be quite fragmented).

Hmm.  This is an interesting problem.  Sharing a swap file or a swap
partition with the actual swap of user space pages does seem to be
a limitation of this approach.

Although the fact that it is simple to write to a separate file may
be a reasonable compensation.

> Andrew, you're seeing that it really doesn't mean the removal of all 
> hibernation code from the kernel being suspended, aren't you? (And if the 
> kexec'd kernel is the same binary, then there's more code again).

More binary size, yes, but not more code to maintain.

As for the rest, the current implementation is small enough, and allows
for enough beyond hibernation, that I think it makes sense to eventually
merge it, assuming a good clean implementation can be achieved.

Eric


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Andrew Morton
On Fri, 21 Sep 2007 11:57:26 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:

> Hi.
> 
> On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham
> > > > <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Hi Andrew.
> > > > > 
> > > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > > Seems like good enough for -mm to me.
> > > > > > 
> > > > > > 
> > > > > > Pavel
> > > > > 
> > > > > Andrew, if I recall correctly, you said a while ago that you didn't
> > > > > want another hibernation implementation in the vanilla kernel. If
> > > > > you're going to consider merging this kexec code, will you also
> > > > > please consider merging TuxOnIce?
> > > > > 
> > > > 
> > > > The theory is that kexec-based hibernation will mainly use preexisting
> > > > kexec code and will permit us to delete the existing hibernation
> > > > implementation.
> > > > 
> > > > That's different from replacing it.
> > > 
> > > TuxOnIce doesn't remove the existing implementation either. It can 
> > > transparently replace it, but you can enable/disable that at compile time.
> > 
> > Right.  So we end up with two implementations in-tree.  Whereas
> > kexec-based-hibernation leads us to having zero implementations in-tree.
> > 
> > See, it's different.
> 
> That's not true. Kexec will itself be an implementation, otherwise you'd end 
> up with people screaming about no hibernation support. And it won't result in 
> the complete removal of the existing hibernation code from the kernel. At the 
> very least, it's going to want the kernel being hibernated to have an 
> interface by which it can find out which pages need to be saved. I wouldn't 
> be surprised if it also ends up with an interface in which the kernel being 
> hibernated tells it what bdev/sectors in which to save the image as well 
> (otherwise you're going to need a dedicated, otherwise untouched partition 
> exclusively for the kexec'd kernel to use), or what network settings to use 
> if it wants to try to save the image to a network storage device. On top of 
> that, there are all the issues related to device reinitialisation and so on, 
> and it looks like there's greatly increased pain for users wanting to 
> configure this new implementation. Kexec is by no means proven to be the 
> panacea for all the issues.
> 

Maybe, maybe not, dunno.  That's why we haven't merged it yet.  If it ends
up being no good, we won't merge it!


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman
"Huang, Ying" <[EMAIL PROTECTED]> writes:

> Index: linux-2.6.23-rc6/include/linux/kexec.h
> ===
> --- linux-2.6.23-rc6.orig/include/linux/kexec.h 2007-09-20 11:24:25.0 +0800
> +++ linux-2.6.23-rc6/include/linux/kexec.h 2007-09-20 11:26:03.0 +0800
> @@ -83,6 +83,7 @@
>  
>   unsigned long start;
>   struct page *control_code_page;
> + struct page *swap_page;
>  
>   unsigned long nr_segments;
>   struct kexec_segment segment[KEXEC_SEGMENT_MAX];
> @@ -194,4 +195,12 @@
>  static inline void crash_kexec(struct pt_regs *regs) { }
>  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
>  #endif /* CONFIG_KEXEC */
> +
> +#ifdef CONFIG_KEXEC_JUMP
> +extern int machine_kexec_jump(struct kimage *image);
> +extern unsigned long kexec_jump_back_entry;
> +extern int kexec_jump(void);
> +#else /* !CONFIG_KEXEC_JUMP */
> +static inline int kexec_jump(void) { return 0; }
> +#endif /* CONFIG_KEXEC_JUMP */
>  #endif /* LINUX_KEXEC_H */

Please let the kexec_jump code just be triggered off of a flag in
struct kimage.  We just need to define an extra flag to sys_kexec_load,
say KEXEC_RETURNS.  Ideally in the long term we would not have to
do anything except to accept the flag.  Adding a flag makes
a nice feature test if you want to see if your kernel supports
the extended version of kexec.

Until we get the hibernation methods sorted out, storing the flag in
struct kimage and making the methods that we call conditional feels
like a more maintainable interface.  Especially since we have to
know at kexec image load time what we are going to do with the
kexec image.
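
As a sketch of the flag idea, here is a small user-space model of load-time flag validation. KEXEC_ON_CRASH is the existing flag; KEXEC_RETURNS uses the name proposed above with an assumed value, and model_kexec_load() and struct image_model are stand-ins invented for illustration, not the real syscall or struct kimage.

```c
#include <assert.h>
#include <errno.h>

/* Existing flag from the kexec_load() ABI. */
#define KEXEC_ON_CRASH  0x00000001UL
/* Proposed flag; the name comes from the text, the value is an assumption. */
#define KEXEC_RETURNS   0x00000002UL

/* Per-image state; a stand-in for the relevant part of struct kimage. */
struct image_model {
	unsigned long flags;
};

/*
 * Model of load-time validation: any bit outside the set this "kernel"
 * understands is rejected with -EINVAL, which is what makes the flag
 * usable as a feature test.
 */
static int model_kexec_load(struct image_model *img, unsigned long flags,
			    unsigned long supported)
{
	assert(img);
	if (flags & ~supported)
		return -EINVAL;
	img->flags = flags;      /* decided once, at image load time */
	return 0;
}

/* Later code makes the return path conditional on the stored flag. */
static int image_returns(const struct image_model *img)
{
	return (img->flags & KEXEC_RETURNS) != 0;
}
```

A caller that gets -EINVAL when passing the new flag knows the running kernel only implements plain kexec.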

> +#ifdef CONFIG_KEXEC_JUMP
> +unsigned long kexec_jump_back_entry;
> +
> +int kexec_jump(void)
> +{
> + int error;
> +
> + if (!kexec_image)
> + return -EINVAL;

I understand where you are coming from with this implementation of
kexec_jump but it looks like this is one of the big parts of this
patch that have not reached their final form.

The line above is racy with sys_kexec_load.
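
One way to picture a fix, as a user-space model with C11 atomics: both the loader and the jump path take the same try-lock, and the pointer is tested and used inside one critical section. The names are hypothetical, and whether a try-lock is the right primitive for the real code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct image_model { int id; };

static atomic_int image_lock;             /* 0 = free, 1 = held */
static struct image_model *loaded_image;  /* only touched with the lock held */

static int image_trylock(void)
{
	/* atomic_exchange returns the previous value: nonzero means busy. */
	return atomic_exchange(&image_lock, 1) ? -EBUSY : 0;
}

static void image_unlock(void)
{
	assert(atomic_load(&image_lock) == 1);
	atomic_store(&image_lock, 0);
}

/* Loader side: replace the image under the lock. */
static int model_load(struct image_model *img)
{
	int ret = image_trylock();

	if (ret)
		return ret;
	loaded_image = img;
	image_unlock();
	return 0;
}

/* Jump side: check and use the pointer in the same critical section. */
static int model_jump(void)
{
	int ret = image_trylock();

	if (ret)
		return ret;
	if (!loaded_image)
		ret = -EINVAL;
	else
		ret = loaded_image->id;   /* stand-in for the actual jump */
	image_unlock();
	return ret;
}
```

With this shape the image cannot be swapped out between the NULL check and the jump.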

> + pm_prepare_console();
> + suspend_console();
> + error = device_suspend(PMSG_FREEZE);
> + if (error)
> + goto Resume_console;

This, as everyone knows, needs to be device_shutdown or a better hibernation
replacement.

> + error = disable_nonboot_cpus();
> + if (error)
> + goto Resume_devices;

Can't we just catch the nonboot CPUs in a mutex?
disable_nonboot_cpus is actually impossible to implement 100% reliably
with current hardware.  But something like smp_call_function, so we trap
them at a specific location, and then the equivalent when we come back,
should be simple.  I guess the tricky part is bringing the CPUs back up again.

Using the broken by design version of cpu hotplug really annoys me here.
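
A user-space analogy of the "trap them at a specific location" idea, using threads in place of CPUs. This only illustrates the rendezvous pattern; it is not how smp_call_function is used in the kernel, and all names here are made up for the sketch. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SECONDARY 3

static atomic_int park_request;  /* set by the "boot CPU" to trap the others */
static atomic_int parked;        /* how many workers are currently trapped */
static atomic_int stop;          /* tells workers to exit */

/* Each thread stands in for one non-boot CPU doing ordinary work. */
static void *secondary(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop)) {
		if (atomic_load(&park_request)) {
			atomic_fetch_add(&parked, 1);
			while (atomic_load(&park_request))
				sched_yield();   /* trapped at a known location */
			atomic_fetch_sub(&parked, 1);
		}
		sched_yield();
	}
	return NULL;
}

static int count_parked(void)
{
	return atomic_load(&parked);
}

/* Trap all workers, run fn() while they are parked, then release them. */
static int run_with_others_parked(int (*fn)(void))
{
	pthread_t tid[NR_SECONDARY];
	int i, ret;

	atomic_store(&stop, 0);
	for (i = 0; i < NR_SECONDARY; i++) {
		ret = pthread_create(&tid[i], NULL, secondary, NULL);
		assert(ret == 0);
	}

	atomic_store(&park_request, 1);
	while (atomic_load(&parked) != NR_SECONDARY)
		sched_yield();
	ret = fn();                      /* every worker is parked here */
	atomic_store(&park_request, 0);  /* the "equivalent when we come back" */
	while (atomic_load(&parked) != 0)
		sched_yield();

	atomic_store(&stop, 1);
	for (i = 0; i < NR_SECONDARY; i++)
		pthread_join(tid[i], NULL);
	return ret;
}
```

Calling run_with_others_parked(count_parked) reports all three workers trapped while the callback runs. The hard part noted above, bringing real CPUs back up with consistent state, has no counterpart in this model.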

> + local_irq_disable();
> + /* At this point, device_suspend() has been called, but *not*
> +  * device_power_down(). We *must* device_power_down() now.
> +  * Otherwise, drivers for some devices (e.g. interrupt controllers)
> +  * become desynchronized with the actual state of the hardware
> +  * at resume time, and evil weirdness ensues.
> +  */
> + error = device_power_down(PMSG_FREEZE);
> + if (error)
> + goto Enable_irqs;

This of course should go away when we have the proper methods.

> + save_processor_state();
This line might even be reasonable.
> + error = machine_kexec_jump(kexec_image);
> + restore_processor_state();
>
> + /* NOTE:  device_power_up() is just a resume() for devices
> +  * that suspended with irqs off ... no overall powerup.
> +  */
> + device_power_up();
Yep this can go away.
> + Enable_irqs:
> + local_irq_enable();
> + enable_nonboot_cpus();

I haven't looked at the cpu start up code yet to see if it
is generally implementable.  I would think so, but I guess
we need to be careful with our data structures.

> + Resume_devices:
> + device_resume();
This of course should change.
> + Resume_console:
> + resume_console();
> + pm_restore_console();

Odd.  I'm a little surprised that the console is the last
thing we restore.  But it does make sense to treat it specially.

> + return error;
> +}
> +#endif /* CONFIG_KEXEC_JUMP */

Eric


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman
Nigel Cunningham <[EMAIL PROTECTED]> writes:
>
> That's not true. Kexec will itself be an implementation, otherwise you'd end 
> up with people screaming about no hibernation support. 

There needs to be an implementation of hibernation based on kexec with
return, yes.

> And it won't result in 
> the complete removal of the existing hibernation code from the kernel. At the 
> very least, it's going to want the kernel being hibernated to have an 
> interface by which it can find out which pages need to be saved.

That interface should be running kernel -> user space -> target kernel.
Not direct kernel to kernel.

> I wouldn't 
> be surprised if it also ends up with an interface in which the kernel being 
> hibernated tells it what bdev/sectors in which to save the image as well 
> (otherwise you're going to need a dedicated, otherwise untouched partition 
> exclusively for the kexec'd kernel to use), or what network settings to use 
> if it wants to try to save the image to a network storage device. 

initramfs.  We already seem to have that interface.  And distros
seem to do a pretty decent job of using it to configure systems.

> On top of 
> that, there are all the issues related to device reinitialisation and so on, 

Yes.

> and it looks like there's greatly increased pain for users wanting to 
> configure this new implementation. 

Not to be callous, but that really is a user space and distro issue.

> Kexec is by no means proven to be the panacea for all the issues.

I agree.  I'm still not quite convinced it will do a satisfactory job.
But I think it does make sense to implement a general kexec with
return and see if that can reasonably be used for handling hibernation
issues.  If done cleanly and with care the implementation won't be
hibernation specific.

Frankly this looks like the best way I can see to implement a general
mechanism for calling silly firmware/BIOS/EFI services after we
have a kernel up and running.  It's a little bit like allowing
X to call iopl(3) and do inb/outb directly.

The configuration issues you raise pretty much exist for kexec on
panic, and they seem to be being resolved for that case in a
reasonable way.  I do agree that the current kexec+return effort seems
to be one of those unfortunate cases where we give every mechanism in
the kernel to do something in user space and then no one actually
implements the user space.  That doesn't do anyone any good.

For hibernation we don't have the absolute need to step outside of the
current kernel that we do in the kexec on panic approach.  However we
have this practical fight about mechanism and policy, and kexec with
return has this seductive allure that it appears to be the minimal
necessary mechanism in the kernel.

No one has yet attacked the hard problem of coming up with separate
hibernate methods for drivers.  This should be the hard part of the
puzzle, and the recurring work from a kernel maintenance point of
view.  There is some reason to hope that maintenance
will be a little simpler because you can get at all of the distinct
pieces of the puzzle.

Currently kexec with return appears to require the minimal amount of
mechanism in the kernel and leaves the policy to someplace else, plus
the code is not hibernation specific.  We could use it to make runtime
EFI calls, or to implement cooperative multitasking between kernels.

My current opinion is that the patches are starting to get close
enough that it isn't a waste of my time reviewing them.  But there
is still a fair amount to be done before this code is in shape for
us to merge it into the kernel.

At 500 or so lines I don't feel bad about pushing back until all of
the core user interface issues are resolved, and we have the code
calling the proper driver methods.

Eric


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 12:45:57 Huang, Ying wrote:
> On Fri, 2007-09-21 at 12:25 +1000, Nigel Cunningham wrote:
> > Hi.
> > 
> > On Friday 21 September 2007 12:18:57 Huang, Ying wrote:
> > > > That's not true. Kexec will itself be an implementation, otherwise
> > > > you'd end up with people screaming about no hibernation support. And
> > > > it won't result in the complete removal of the existing hibernation
> > > > code from the kernel. At the very least, it's going to want the kernel
> > > > being hibernated to have an interface by which it can find out which
> > > > pages need to be saved. I wouldn't
> > > 
> > > This has been done by kexec/kdump guys. There is a makedumpfile utility
> > > and vmcoreinfo kernel mechanism to implement this. We can just reuse the
> > > work of kexec/kdump.
> > 
> > You've already said that you are currently saving all pages. How are you
> > going to avoid saving free pages if you don't get the information from
> > the kernel being saved? This will require more than just code reuse.
> 
> I have not tried "makedumpfile". The "makedumpfile" avoids saving free
> pages through checking the "mem_map" of the original kernel. I think
> there is nothing preventing it from being used for kexec based hibernation
> image writing.
> 
> This is an example of duplicated effort between kexec/kdump and original
> hibernation implementation. Both kexec/kdump and hibernation need to
> save memory image without saving the free pages. This can be done once
> instead of twice.

Ok.

> > > > be surprised if it also ends up with an interface in which the kernel being
> > > > hibernated tells it what bdev/sectors in which to save the image as well
> > > > (otherwise you're going to need a dedicated, otherwise untouched partition
> > > > exclusively for the kexec'd kernel to use), or what network settings to use
> > > > if it wants to try to save the image to a network storage device. On top of
> > > 
> > > These can be done in user space. The image writing will be done in user
> > > space for kexec base hibernation.
> > 
> > That only complicates things more. Now you need to get the information on
> > where to save the image from the kernel being saved, then transfer it to
> > userspace after switching to the kexec kernel. That's more kernel code, not
> > less.
> 
> This is fairly simple in fact. For example, you can specify the
> bdev/sectors in kernel command line when do kexec load "kexec -l <...>
> --append='...'", then the image writing system can get it through
> "cat /proc/cmdline".

Sounds doable, as long as you can cope with long command lines (which 
shouldn't be a biggie). (If you've got a swapfile or parts of a swap 
partition already in use, it can be quite fragmented).

Andrew, you're seeing that it really doesn't mean the removal of all 
hibernation code from the kernel being suspended, aren't you? (And if the 
kexec'd kernel is the same binary, then there's more code again).

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Eric W. Biederman
"Huang, Ying" <[EMAIL PROTECTED]> writes:

> This patch implements the functionality of jumping between the kexeced
> kernel and the original kernel.
>
> A new reboot command named LINUX_REBOOT_CMD_KJUMP is defined to
> trigger the jumping to (executing) the new kernel and jumping back to
> the original kernel.
>
> To support jumping between two kernels, before jumping to (executing)
> the new kernel and jumping back to the original kernel, the devices
> are put into quiescent state (to be fully implemented),

Well, this we have an implementation of (it's called shutdown). Or does
that method not do enough to meet the requirements of hibernation?
If at all possible I would like to keep reboot, kexec and kexec+return
all using the same device driver methods.

> and the state of devices and CPU is saved. 

Makes a reasonable amount of sense.  We do need to save whatever
state we cannot recover just by reprogramming the hardware.
As long as the drivers are built so this is a good place for a
hot remove to happen we should be in good shape.

> After jumping back from kexeced kernel
> and jumping to the new kernel, the state of devices and CPU are
> restored accordingly. The devices/CPU state save/restore code of
> software suspend is called to implement corresponding function.

At least for now that sounds like a reasonable work around.

I don't think we want to merge this code until we have agreed upon
how the new device_detach and device_reattach (or whatever we call the
device methods for hibernate) are to be implemented.

> To support jumping without preserving memory. One shadow backup page
> is allocated for each page used by new (kexeced) kernel. 

That does not sound correct.  The current implementation of kexec_load
does allocate a source page and give it a destination page and usually
those two pages are different.  But if our memory allocations happen
to return a destination page there we use it directly, making no
copy necessary.

I think we are talking about the same thing but I'm not certain
you have thought about the case where your shadow backup page happens
to be the same as current page.

> When do
> kexec_load, the image of new kernel is loaded into shadow pages, 

Ok.  This sounds like the existing implementation, except that,
depending on your destination, it may force the address.

> and
> before executing, the original pages and the shadow pages are swapped,
> so the contents of original pages are backuped.

Yes.  Unless we happen to have everything allocated on the same page.
Does your code handle that case?  I know the generic kexec code will
pass lists like that in the proper circumstances.  Especially for
the kexec on panic case.

> Before jumping to the
> new (kexeced) kernel and after jumping back to the original kernel,
> the original pages and the shadow pages are swapped too.

Yes.   That sounds right.
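
To make the swap and the same-page case concrete, here is a minimal
user-space sketch. It is not the patch's code: the page size, the
struct and the function names are all hypothetical, and it only
illustrates skipping a pair whose shadow page is the destination page
itself.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, illustration only */

/* Hypothetical pairing of a destination page with its shadow backup. */
struct sketch_page_pair {
	unsigned char *orig;	/* page the new kernel will occupy */
	unsigned char *shadow;	/* backup page holding the other image */
};

/* Exchange the contents of two distinct pages through a bounce buffer. */
static void sketch_swap_page(unsigned char *a, unsigned char *b)
{
	unsigned char tmp[SKETCH_PAGE_SIZE];

	memcpy(tmp, a, SKETCH_PAGE_SIZE);
	memcpy(a, b, SKETCH_PAGE_SIZE);
	memcpy(b, tmp, SKETCH_PAGE_SIZE);
}

/*
 * Swap every pair, skipping pairs where the allocation happened to
 * return the destination page itself (nothing to copy in that case).
 * Returns the number of pages actually exchanged.
 */
static size_t sketch_swap_all(struct sketch_page_pair *pairs, size_t n)
{
	size_t i, swapped = 0;

	for (i = 0; i < n; i++) {
		if (pairs[i].orig == pairs[i].shadow)
			continue;
		sketch_swap_page(pairs[i].orig, pairs[i].shadow);
		swapped++;
	}
	return swapped;
}
```

Calling sketch_swap_all() a second time on the same list restores the
original contents, which is the property the jump and the jump back
both rely on.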

> A jump back protocol is defined and documented.

Bleh.  We do need to document the requirements but we don't need a
versioned monster.  And we don't need to be exposing implementation
details in that documentation.

In the kexec world /sbin/kexec or another user space caller is
responsible for passing information to our callers.

To be polite we need to document more but the jump back protocol
really should be as if the entry point kexec handed control to did
a subroutine return.

> Known issues
>
> - A field is added to Linux kernel real-mode header. This is
>   temporary, and should be replaced after the 32-bit boot protocol and
>   setup data patches are accepted.

It shouldn't be needed.

> - The suspend method of device is used to put device in quiescent
>   state. But if the ACPI is enabled this will also put devices into
>   low power state, which prevent the new kernel from booting. So, the
>   ACPI must be disabled both in original kernel and kexeced
>   kernel. This is planed to be resolved after the suspend method and
>   hibernate method is separated for device as proposed earlier in the
>   LKML.

Reasonable.

> - The NX (none executable) bit should be turned off for the control
>   page if available.

Why don't we have a problem with this in the normal kexec case?


More comments below.

> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
>
> ---
>
>  Documentation/i386/jump_back_protocol.txt |   81
>  arch/i386/Kconfig                         |    7 +
>  arch/i386/boot/header.S                   |    2
>  arch/i386/kernel/machine_kexec.c          |   77 +---
>  arch/i386/kernel/relocate_kernel.S        |  187 ++
>  arch/i386/kernel/setup.c                  |    3
>  include/asm-i386/bootparam.h              |    3
>  include/asm-i386/kexec.h                  |   48 ++-
>  include/linux/kexec.h                     |    9 +
>  include/linux/reboot.h                    |    2
>  kernel/kexec.c                            |   59 +
>  kernel/ksysfs.c                           |   17 ++
>  kernel/power/Kconfig 

Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Huang, Ying
On Fri, 2007-09-21 at 12:25 +1000, Nigel Cunningham wrote:
> Hi.
> 
> On Friday 21 September 2007 12:18:57 Huang, Ying wrote:
> > > That's not true. Kexec will itself be an implementation, otherwise you'd end
> > > up with people screaming about no hibernation support. And it won't result in
> > > the complete removal of the existing hibernation code from the kernel. At the
> > > very least, it's going to want the kernel being hibernated to have an
> > > interface by which it can find out which pages need to be saved. I wouldn't
> > 
> > This has been done by kexec/kdump guys. There is a makedumpfile utility
> > and vmcoreinfo kernel mechanism to implement this. We can just reuse the
> > work of kexec/kdump.
> 
> You've already said that you are currently saving all pages. How are you going
> to avoid saving free pages if you don't get the information from the kernel
> being saved? This will require more than just code reuse.

I have not tried "makedumpfile". It avoids saving free pages by
checking the "mem_map" of the original kernel. I think there is
nothing to prevent it from being used for kexec-based hibernation
image writing.

This is an example of duplicated effort between kexec/kdump and the
original hibernation implementation. Both kexec/kdump and hibernation
need to save a memory image without saving the free pages. This can be
done once instead of twice.
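
As a rough illustration of the shared step (this is not how
makedumpfile is implemented; the one-bit-per-page "free" bitmap and
all names here are hypothetical stand-ins for whatever is derived from
mem_map), the image writer only needs a list of pages that are not
free:

```c
#include <stddef.h>
#include <stdint.h>

/* Test whether page number pfn is marked free in the assumed bitmap. */
static int sketch_page_is_free(const uint8_t *free_bitmap, size_t pfn)
{
	return (free_bitmap[pfn / 8] >> (pfn % 8)) & 1;
}

/*
 * Collect the page numbers that must be saved, i.e. those not marked
 * free.  Stores at most out_cap entries and returns how many were
 * stored.
 */
static size_t sketch_pages_to_save(const uint8_t *free_bitmap,
				   size_t nr_pages,
				   size_t *out, size_t out_cap)
{
	size_t pfn, n = 0;

	for (pfn = 0; pfn < nr_pages && n < out_cap; pfn++)
		if (!sketch_page_is_free(free_bitmap, pfn))
			out[n++] = pfn;
	return n;
}
```

Either a crash dump writer or a hibernation image writer could
consume such a list, which is why the work only needs doing once.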

> > > be surprised if it also ends up with an interface in which the kernel being
> > > hibernated tells it what bdev/sectors in which to save the image as well
> > > (otherwise you're going to need a dedicated, otherwise untouched partition
> > > exclusively for the kexec'd kernel to use), or what network settings to use
> > > if it wants to try to save the image to a network storage device. On top of
> > 
> > These can be done in user space. The image writing will be done in user
> > space for kexec base hibernation.
> 
> That only complicates things more. Now you need to get the information on 
> where to save the image from the kernel being saved, then transfer it to 
> userspace after switching to the kexec kernel. That's more kernel code, not 
> less.

This is fairly simple in fact. For example, you can specify the
bdev/sectors on the kernel command line when doing the kexec load
("kexec -l <...> --append='...'"); the image writing system can then
get it through "cat /proc/cmdline".

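A minimal sketch of the reading side, assuming a hypothetical
parameter name such as "hib_image=" (no such parameter is defined by
this patch) and a simple space-separated command line without quoted
values:

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "key=value" in a kernel command line string and copy the value
 * into out.  Returns 0 on success, -1 if the key is absent or the
 * value does not fit.  Quoted values containing spaces are not handled.
 */
static int sketch_cmdline_get(const char *cmdline, const char *key,
			      char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = cmdline;

	while (*p) {
		const char *end;

		/* Skip separators between parameters. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		end = p;
		while (*end && *end != ' ' && *end != '\t' && *end != '\n')
			end++;
		if ((size_t)(end - p) > klen &&
		    strncmp(p, key, klen) == 0 && p[klen] == '=') {
			size_t vlen = (size_t)(end - p) - klen - 1;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p = end;
	}
	return -1;
}

/* Read /proc/cmdline of the running (kexec'd) kernel and look up key. */
static int sketch_read_proc_cmdline(const char *key, char *out,
				    size_t outlen)
{
	char buf[4096];
	FILE *f = fopen("/proc/cmdline", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return sketch_cmdline_get(buf, key, out, outlen);
}
```

The value format (device plus sector ranges) is left to the image
writing tool; the kernel only passes the string through.
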
> > > that, there are all the issues related to device reinitialisation and so on,
> > Yes. Device reinitialisation is needed. But all in all, kexec based
> > hibernation can be much simpler on the kernel side.
> 
> Sorry, but I'm yet to be convinced. I'm not unwilling, I'm just not there yet.
>  
> > > and it looks like there's greatly increased pain for users wanting to 
> > > configure this new implementation. Kexec is by no means proven to be the 
> > > panacea for all the issues.
> > 
> > Configuration is a problem, we will work on it.
> > 
> > But, because it is based on kexec/kdump instead of starting from
> > scratch, the duplicated part between hibernation and kexec/kdump can be
> > eliminated.
> 

Best Regards,
Huang Ying


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 12:18:57 Huang, Ying wrote:
> > That's not true. Kexec will itself be an implementation, otherwise you'd end
> > up with people screaming about no hibernation support. And it won't result in
> > the complete removal of the existing hibernation code from the kernel. At the
> > very least, it's going to want the kernel being hibernated to have an
> > interface by which it can find out which pages need to be saved. I wouldn't
> 
> This has been done by kexec/kdump guys. There is a makedumpfile utility
> and vmcoreinfo kernel mechanism to implement this. We can just reuse the
> work of kexec/kdump.

You've already said that you are currently saving all pages. How are you going 
to avoid saving free pages if you don't get the information from the kernel 
being saved? This will require more than just code reuse.

> > be surprised if it also ends up with an interface in which the kernel being
> > hibernated tells it what bdev/sectors in which to save the image as well
> > (otherwise you're going to need a dedicated, otherwise untouched partition
> > exclusively for the kexec'd kernel to use), or what network settings to use
> > if it wants to try to save the image to a network storage device. On top of
> 
> These can be done in user space. The image writing will be done in user
> space for kexec base hibernation.

That only complicates things more. Now you need to get the information on 
where to save the image from the kernel being saved, then transfer it to 
userspace after switching to the kexec kernel. That's more kernel code, not 
less.

> > that, there are all the issues related to device reinitialisation and so on,
> 
> Yes. Device reinitialisation is needed. But all in all, kexec based
> hibernation can be much simpler on the kernel side.

Sorry, but I'm yet to be convinced. I'm not unwilling, I'm just not there yet.
 
> > and it looks like there's greatly increased pain for users wanting to 
> > configure this new implementation. Kexec is by no means proven to be the 
> > panacea for all the issues.
> 
> Configuration is a problem, we will work on it.
> 
> But, because it is based on kexec/kdump instead of starting from
> scratch, the duplicated part between hibernation and kexec/kdump can be
> eliminated.

Regards,

Nigel
-- 
Nigel, Michelle and Alisdair Cunningham
5 Mitchell Street
Cobden 3266
Victoria, Australia


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Huang, Ying
On Fri, 2007-09-21 at 11:57 +1000, Nigel Cunningham wrote:
> Hi.
> 
> On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Hi Andrew.
> > > > > 
> > > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > > Seems like good enough for -mm to me.
> > > > > > 
> > > > > > 
> > > > > > Pavel
> > > > > 
> > > > > Andrew, if I recall correctly, you said a while ago that you didn't want
> > > > > another hibernation implementation in the vanilla kernel. If you're going to
> > > > > consider merging this kexec code, will you also please consider merging
> > > > > TuxOnIce?
> > > > > 
> > > > 
> > > > The theory is that kexec-based hibernation will mainly use preexisting
> > > > kexec code and will permit us to delete the existing hibernation
> > > > implementation.
> > > > 
> > > > That's different from replacing it.
> > > 
> > > TuxOnIce doesn't remove the existing implementation either. It can 
> > > transparently replace it, but you can enable/disable that at compile time.
> > 
> > Right.  So we end up with two implementations in-tree.  Whereas
> > kexec-based-hibernation leads us to having zero implementations in-tree.
> > 
> > See, it's different.
> 
> That's not true. Kexec will itself be an implementation, otherwise you'd end 
> up with people screaming about no hibernation support. And it won't result in 
> the complete removal of the existing hibernation code from the kernel. At the 
> very least, it's going to want the kernel being hibernated to have an 
> interface by which it can find out which pages need to be saved. I wouldn't 

This has been done by kexec/kdump guys. There is a makedumpfile utility
and vmcoreinfo kernel mechanism to implement this. We can just reuse the
work of kexec/kdump.

> be surprised if it also ends up with an interface in which the kernel being 
> hibernated tells it what bdev/sectors in which to save the image as well 
> (otherwise you're going to need a dedicated, otherwise untouched partition 
> exclusively for the kexec'd kernel to use), or what network settings to use 
> if it wants to try to save the image to a network storage device. On top of

These can be done in user space. The image writing will be done in user
space for kexec-based hibernation.

> that, there are all the issues related to device reinitialisation and so on, 

Yes. Device reinitialisation is needed. But all in all, kexec based
hibernation can be much simpler on the kernel side.

> and it looks like there's greatly increased pain for users wanting to 
> configure this new implementation. Kexec is by no means proven to be the 
> panacea for all the issues.

Configuration is a problem, we will work on it.

But, because it is based on kexec/kdump instead of starting from
scratch, the duplicated part between hibernation and kexec/kdump can be
eliminated.

Best Regards,
Huang Ying


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Hi Andrew.
> > > > 
> > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > Seems like good enough for -mm to me.
> > > > > 
> > > > >   
> > > > > Pavel
> > > > 
> > > > Andrew, if I recall correctly, you said a while ago that you didn't want
> > > > another hibernation implementation in the vanilla kernel. If you're going to
> > > > consider merging this kexec code, will you also please consider merging
> > > > TuxOnIce?
> > > > 
> > > 
> > > The theory is that kexec-based hibernation will mainly use preexisting
> > > kexec code and will permit us to delete the existing hibernation
> > > implementation.
> > > 
> > > That's different from replacing it.
> > 
> > TuxOnIce doesn't remove the existing implementation either. It can 
> > transparently replace it, but you can enable/disable that at compile time.
> 
> Right.  So we end up with two implementations in-tree.  Whereas
> kexec-based-hibernation leads us to having zero implementations in-tree.
> 
> See, it's different.

That's not true. Kexec will itself be an implementation, otherwise you'd end 
up with people screaming about no hibernation support. And it won't result in 
the complete removal of the existing hibernation code from the kernel. At the 
very least, it's going to want the kernel being hibernated to have an 
interface by which it can find out which pages need to be saved. I wouldn't 
be surprised if it also ends up with an interface in which the kernel being 
hibernated tells it what bdev/sectors in which to save the image as well 
(otherwise you're going to need a dedicated, otherwise untouched partition 
exclusively for the kexec'd kernel to use), or what network settings to use 
if it wants to try to save the image to a network storage device. On top of 
that, there are all the issues related to device reinitialisation and so on, 
and it looks like there's greatly increased pain for users wanting to 
configure this new implementation. Kexec is by no means proven to be the 
panacea for all the issues.

Regards,

Nigel
-- 
Nigel Cunningham
Pastor
Christian Reformed Church of Cobden
Victoria, Australia
+61 3 5595 1185


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Andrew Morton
On Fri, 21 Sep 2007 11:19:59 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:

> Hi.
> 
> On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi Andrew.
> > > 
> > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > Seems like good enough for -mm to me.
> > > > 
> > > > 
> > > > Pavel
> > > 
> > > Andrew, if I recall correctly, you said a while ago that you didn't want
> > > another hibernation implementation in the vanilla kernel. If you're going to
> > > consider merging this kexec code, will you also please consider merging
> > > TuxOnIce?
> > > 
> > 
> > The theory is that kexec-based hibernation will mainly use preexisting
> > kexec code and will permit us to delete the existing hibernation
> > implementation.
> > 
> > That's different from replacing it.
> 
> TuxOnIce doesn't remove the existing implementation either. It can 
> transparently replace it, but you can enable/disable that at compile time.

Right.  So we end up with two implementations in-tree.  Whereas
kexec-based-hibernation leads us to having zero implementations in-tree.

See, it's different.


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> 
> > Hi Andrew.
> > 
> > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > Seems like good enough for -mm to me.
> > > 
> > >   Pavel
> > 
> > Andrew, if I recall correctly, you said a while ago that you didn't want
> > another hibernation implementation in the vanilla kernel. If you're going to
> > consider merging this kexec code, will you also please consider merging
> > TuxOnIce?
> > 
> 
> The theory is that kexec-based hibernation will mainly use preexisting
> kexec code and will permit us to delete the existing hibernation
> implementation.
> 
> That's different from replacing it.

TuxOnIce doesn't remove the existing implementation either. It can 
transparently replace it, but you can enable/disable that at compile time.

Regards,

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Cobden
103 Curdie Street, Cobden 3266, Victoria, Australia
Ph. +61 3 5595 1185 / +61 417 100 574
Communal Worship: 11 am Sunday.


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Andrew Morton
On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham <[EMAIL PROTECTED]> wrote:

> Hi Andrew.
> 
> On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > Seems like good enough for -mm to me.
> > 
> > Pavel
> 
> Andrew, if I recall correctly, you said a while ago that you didn't want 
> another hibernation implementation in the vanilla kernel. If you're going to 
> consider merging this kexec code, will you also please consider merging 
> TuxOnIce?
> 

The theory is that kexec-based hibernation will mainly use preexisting
kexec code and will permit us to delete the existing hibernation
implementation.

That's different from replacing it.


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham
Hi Andrew.

On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> Seems like good enough for -mm to me.
> 
>   Pavel

Andrew, if I recall correctly, you said a while ago that you didn't want 
another hibernation implementation in the vanilla kernel. If you're going to 
consider merging this kexec code, will you also please consider merging 
TuxOnIce?

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Pavel Machek
Hi!

> This patch implements the functionality of jumping between the kexeced
> kernel and the original kernel.
> 
> A new reboot command named LINUX_REBOOT_CMD_KJUMP is defined to
> trigger the jumping to (executing) the new kernel and jumping back to
> the original kernel.
> 
> To support jumping between two kernels, before jumping to (executing)
> the new kernel and jumping back to the original kernel, the devices
> are put into quiescent state (to be fully implemented), and the state
> of devices and CPU is saved. After jumping back from kexeced kernel
> and jumping to the new kernel, the state of devices and CPU are
> restored accordingly. The devices/CPU state save/restore code of
> software suspend is called to implement corresponding function.
> 
> To support jumping without preserving memory. One shadow backup page
> is allocated for each page used by new (kexeced) kernel. When do
> kexec_load, the image of new kernel is loaded into shadow pages, and
> before executing, the original pages and the shadow pages are swapped,
> so the contents of original pages are backuped. Before jumping to the
> new (kexeced) kernel and after jumping back to the original kernel,
> the original pages and the shadow pages are swapped too.
> 
> A jump back protocol is defined and documented.


> Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

Seems like good enough for -mm to me.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Nigel Cunningham
Hi.

On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham [EMAIL PROTECTED] wrote:
> > >
> > > > Hi Andrew.
> > > >
> > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > Seems like good enough for -mm to me.
> > > > >
> > > > > Pavel
> > > >
> > > > Andrew, if I recall correctly, you said a while ago that you didn't want
> > > > another hibernation implementation in the vanilla kernel. If you're going to
> > > > consider merging this kexec code, will you also please consider merging
> > > > TuxOnIce?
> > >
> > > The theory is that kexec-based hibernation will mainly use preexisting
> > > kexec code and will permit us to delete the existing hibernation
> > > implementation.
> > >
> > > That's different from replacing it.
> >
> > TuxOnIce doesn't remove the existing implementation either. It can
> > transparently replace it, but you can enable/disable that at compile time.
>
> Right.  So we end up with two implementations in-tree.  Whereas
> kexec-based-hibernation leads us to having zero implementations in-tree.
>
> See, it's different.

That's not true. Kexec will itself be an implementation, otherwise you'd end 
up with people screaming about no hibernation support. And it won't result in 
the complete removal of the existing hibernation code from the kernel. At the 
very least, it's going to want the kernel being hibernated to have an 
interface by which it can find out which pages need to be saved. I wouldn't 
be surprised if it also ends up with an interface in which the kernel being 
hibernated tells it what bdev/sectors in which to save the image as well 
(otherwise you're going to need a dedicated, otherwise untouched partition 
exclusively for the kexec'd kernel to use), or what network settings to use 
if it wants to try to save the image to a network storage device. On top of 
that, there are all the issues related to device reinitialisation and so on, 
and it looks like there's greatly increased pain for users wanting to 
configure this new implementation. Kexec is by no means proven to be the 
panacea for all the issues.
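
To make the first of those interfaces concrete, here is a minimal user-space sketch of how a kernel being hibernated might tell an image writer which pages need to be saved: one bit per page frame, set by the hibernating side and walked by the writer. This is only an illustration under stated assumptions. The names (struct save_map, save_map_mark, save_map_next and so on) are hypothetical and are not taken from the kexec, swsusp or TuxOnIce code, and a real interface would also have to get the map across the kexec boundary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: one bit per page frame; a set bit means "save this page". */
struct save_map {
    size_t nr_pages;        /* number of page frames covered */
    unsigned long *bits;    /* nr_pages bits, rounded up to whole words */
};

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Allocate a map with every page initially marked "do not save". */
int save_map_init(struct save_map *m, size_t nr_pages)
{
    size_t words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    m->bits = calloc(words ? words : 1, sizeof(unsigned long));
    if (!m->bits)
        return -1;
    m->nr_pages = nr_pages;
    return 0;
}

/* Release the bitmap and reset the map to an empty state. */
void save_map_free(struct save_map *m)
{
    free(m->bits);
    m->bits = NULL;
    m->nr_pages = 0;
}

/* Hibernating side: mark page frame pfn as needing to be saved. */
void save_map_mark(struct save_map *m, size_t pfn)
{
    assert(pfn < m->nr_pages);
    m->bits[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Return 1 if page frame pfn is marked for saving, 0 otherwise. */
int save_map_test(const struct save_map *m, size_t pfn)
{
    assert(pfn < m->nr_pages);
    return (int)((m->bits[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}

/*
 * Image-writer side: return the first marked pfn >= start,
 * or nr_pages if there is none.
 */
size_t save_map_next(const struct save_map *m, size_t start)
{
    size_t pfn;

    for (pfn = start; pfn < m->nr_pages; pfn++)
        if (save_map_test(m, pfn))
            return pfn;
    return m->nr_pages;
}
```

A writer would walk the map with something like
for (pfn = save_map_next(&m, 0); pfn < m.nr_pages; pfn = save_map_next(&m, pfn + 1)).
The linear scan keeps the sketch short; it says nothing about where the image is stored, which is the second interface mentioned above.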

Regards,

Nigel
-- 
Nigel Cunningham
Pastor
Christian Reformed Church of Cobden
Victoria, Australia
+61 3 5595 1185


Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-20 Thread Huang, Ying
On Fri, 2007-09-21 at 11:57 +1000, Nigel Cunningham wrote:
> Hi.
>
> On Friday 21 September 2007 11:41:06 Andrew Morton wrote:
> > > On Friday 21 September 2007 11:06:23 Andrew Morton wrote:
> > > > On Fri, 21 Sep 2007 10:24:34 +1000 Nigel Cunningham [EMAIL PROTECTED] wrote:
> > > >
> > > > > Hi Andrew.
> > > > >
> > > > > On Thursday 20 September 2007 20:09:41 Pavel Machek wrote:
> > > > > > Seems like good enough for -mm to me.
> > > > > >
> > > > > > Pavel
> > > > >
> > > > > Andrew, if I recall correctly, you said a while ago that you didn't want
> > > > > another hibernation implementation in the vanilla kernel. If you're going to
> > > > > consider merging this kexec code, will you also please consider merging
> > > > > TuxOnIce?
> > > >
> > > > The theory is that kexec-based hibernation will mainly use preexisting
> > > > kexec code and will permit us to delete the existing hibernation
> > > > implementation.
> > > >
> > > > That's different from replacing it.
> > >
> > > TuxOnIce doesn't remove the existing implementation either. It can
> > > transparently replace it, but you can enable/disable that at compile time.
> >
> > Right.  So we end up with two implementations in-tree.  Whereas
> > kexec-based-hibernation leads us to having zero implementations in-tree.
> >
> > See, it's different.
>
> That's not true. Kexec will itself be an implementation, otherwise you'd end
> up with people screaming about no hibernation support. And it won't result in
> the complete removal of the existing hibernation code from the kernel. At the
> very least, it's going to want the kernel being hibernated to have an
> interface by which it can find out which pages need to be saved. I wouldn't

This has already been done by the kexec/kdump guys. There is a makedumpfile
utility and a vmcoreinfo kernel mechanism that implement this. We can just
reuse the work of kexec/kdump.

> be surprised if it also ends up with an interface in which the kernel being
> hibernated tells it what bdev/sectors in which to save the image as well
> (otherwise you're going to need a dedicated, otherwise untouched partition
> exclusively for the kexec'd kernel to use), or what network settings to use
> if it wants to try to save the image to a network storage device. On top of

These can be done in user space. The image writing will be done in user
space for kexec-based hibernation.

> that, there are all the issues related to device reinitialisation and so on,

Yes, device reinitialisation is needed. But all in all, kexec-based
hibernation can be much simpler on the kernel side.

> and it looks like there's greatly increased pain for users wanting to
> configure this new implementation. Kexec is by no means proven to be the
> panacea for all the issues.

Configuration is a problem; we will work on it.

But because it is based on kexec/kdump instead of starting from
scratch, the code duplicated between hibernation and kexec/kdump can be
eliminated.

Best Regards,
Huang Ying

