Bug#499070: dpkg leaves system in unusable state after running out of diskspace

2008-11-09 Thread Guillem Jover
Hi,

On Wed, 2008-10-22 at 09:51:05 +0300, Guillem Jover wrote:
   http://git.hadrons.org/?p=debian/dpkg.git

 The correct approach is the one in ood-abort, implementing what I
 guess was meant to be done with onerr_abort. What's missing is
 untangling onerr_abort being activated by the archives/packages
 processing loop when too many errors happen.

I've done that now and updated pu/ood-abort, and overall, the relevant
changes from the branch do not amount to much code:

 6 files changed, 30 insertions(+), 7 deletions(-)

of which 10 of the inserted lines are comments or blank lines. The
main issue is the new string for translation. This code though should
be way more robust against unrecoverable error conditions. I still need
to review once more all onerr_abort uses, and test when onerr_abort is
being activated due to too many processing errors.

 The ood-unwind branch might imply less behaviour changes, but it's
 not the correct solution long term, and some side effects might be
 still there as the execution continues when it should not.

I guess this branch can be reduced to just the cb1bdc7d commit (even
dropping the fseek call as well), as the rest just paper over the
real problems and will not cover all situations anyway. On unrecoverable
errors it becomes a mess, as the code tries to continue running even
though it is not expecting to, but that's the behaviour that has been
present since forever and people have been living with it, so it would
be a slight improvement, which should at least fix the reported problem
of the bogus update file.

regards,
guillem



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#499070: dpkg leaves system in unusable state after running out of diskspace

2008-10-22 Thread Guillem Jover
Hi,

Hmm all this time passed already...

On Tue, 2008-10-07 at 17:47:52 +0200, Raphael Hertzog wrote:
 On Tue, 07 Oct 2008, Guillem Jover wrote:
  Yeah got the code the day after that mail, but I've not found the time
  to test it. I guess the easiest is to change one of the function
  return values to the output of rand() or similar and see from there.
  I'll try to get to it this week.

Worked on and tested that last week, but didn't get to clean things up
a bit until today. I've pushed two branches, pu/ood-abort and
pu/ood-unwind, with different approaches for the fix, to:

  http://git.hadrons.org/?p=debian/dpkg.git

 Just share it, others might have the time to test (using quota on an OpenVZ
 virtual environment for example).

The problem is triggering the exact conditions for this bug: being out
of disk space (OOD) is not enough, and it would probably need a lot of
iterations to maybe be able to reproduce it. That's why I wanted to add
the targeted random().
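
As a rough illustration of that kind of targeted fault injection (this
is not the code from either branch; fault_rate and fflush_maybe_fail()
are hypothetical names made up for this sketch), one call site can be
wrapped so that it randomly reports an out-of-space failure:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical knob: fail roughly once every fault_rate calls.
 * A value of 0 disables the injection entirely. */
static unsigned int fault_rate = 0;

/* Hypothetical wrapper, not a dpkg function: behaves like fflush(),
 * but sometimes pretends that the disk is full. */
static int
fflush_maybe_fail(FILE *fp)
{
    if (fault_rate != 0 && (unsigned int)rand() % fault_rate == 0) {
        errno = ENOSPC;
        return EOF;
    }
    return fflush(fp);
}
```

Setting fault_rate to 1 makes every call fail, which is handy for
forcing a single error path; larger values spread the failures over many
runs.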


The initial patch (f0efc5cc in ood-unwind) I mentioned in the previous
mail only fixed part of the problem: in some conditions, during error
unwinding, some of the cu_ functions from src/cleanup.c get called, and
those in turn call modstatdb_note, which triggers this bug.

So the root problem here is that the onerr_abort logic is broken: the
code assumes that once onerr_abort has been flagged the program will
terminate, but that does not happen, and some code unexpectedly runs
again. At the same time the code is trying to deal with onerr_abort in
random places.
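
A minimal sketch of that difference, assuming a simplified processing
loop (only the name onerr_abort comes from dpkg; note_status(),
process_item() and run_queue() are hypothetical stand-ins, not the real
functions):

```c
#include <stdbool.h>

static int onerr_abort;        /* set once an unrecoverable error happens */
static int notes_after_abort;  /* status updates attempted after that point */

/* Stand-in for a status database update. */
static void
note_status(void)
{
    if (onerr_abort)
        notes_after_abort++;
}

/* Stand-in for processing one archive; item bad_item hits a fatal error. */
static void
process_item(int item, int bad_item)
{
    if (item == bad_item) {
        onerr_abort++;
        return;
    }
    note_status();
}

/* Returns how many status updates ran after onerr_abort was flagged. */
static int
run_queue(int nitems, int bad_item, bool stop_on_abort)
{
    onerr_abort = 0;
    notes_after_abort = 0;
    for (int i = 0; i < nitems; i++) {
        process_item(i, bad_item);
        if (stop_on_abort && onerr_abort)
            break;
    }
    return notes_after_abort;
}
```

With stop_on_abort set to false the loop keeps touching the status
database after the flag is raised, which is the unexpected behaviour
described above; with it set to true nothing runs after the failure.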

The correct approach is the one in ood-abort, implementing what I
guess was meant to be done with onerr_abort. What's missing is
untangling onerr_abort being activated by the archives/packages
processing loop when too many errors happen.

The ood-unwind branch might imply less behaviour changes, but it's
not the correct solution long term, and some side effects might be
still there as the execution continues when it should not.

I'll finish the ood-abort code, and do some testing, if I feel
comfortable with that one I'll push it, otherwise I might consider
ood-unwind for lenny, and switch to ood-abort for squeeze. Let's see.

regards,
guillem






Bug#499070: dpkg leaves system in unusable state after running out of diskspace

2008-10-07 Thread Raphael Hertzog
On Thu, 18 Sep 2008, Guillem Jover wrote:
  The other is that when onerr_abort is signaled dpkg should not
  continue processing anything anymore, it should just do whatever cleanup
  is required and exit. But that can wait probably post-lenny.
 
 So this is the proper fix, and it should not be that big, probably less
 than 10 lines? Will cook something today or tomorrow...

Any progress ?

Cheers,
-- 
Raphaël Hertzog

The French best-seller updated for Debian Etch:
http://www.ouaza.com/livre/admin-debian/






Bug#499070: dpkg leaves system in unusable state after running out of diskspace

2008-10-07 Thread Guillem Jover
Hey!

On Tue, 2008-10-07 at 09:02:54 +0200, Raphael Hertzog wrote:
 On Thu, 18 Sep 2008, Guillem Jover wrote:
   The other is that when onerr_abort is signaled dpkg should not
   continue processing anything anymore, it should just do whatever cleanup
   is required and exit. But that can wait probably post-lenny.
  
  So this is the proper fix, and it should not be that big, probably less
  than 10 lines? Will cook something today or tomorrow...
 
 Any progress ?

Yeah got the code the day after that mail, but I've not found the time
to test it. I guess the easiest is to change one of the function
return values to the output of rand() or similar and see from there.
I'll try to get to it this week.

And it ended up being a 7-line patch, although I think I'll be
changing part of that error recovery logic for squeeze.

regards,
guillem






Bug#499070: dpkg leaves system in unusable state after running out of diskspace

2008-10-07 Thread Raphael Hertzog
On Tue, 07 Oct 2008, Guillem Jover wrote:
  Any progress ?
 
 Yeah got the code the day after that mail, but I've not found the time
 to test it. I guess the easiest is to change one of the function
 return values to the output of rand() or similar and see from there.
 I'll try to get to it this week.

Just share it, others might have the time to test (using quota on an OpenVZ
virtual environment for example).

Cheers,
-- 
Raphaël Hertzog

The French best-seller updated for Debian Etch:
http://www.ouaza.com/livre/admin-debian/






Bug#499070: dpkg leaves system in unusable state after running out of diskspace

2008-09-18 Thread Guillem Jover
Hey,

On Tue, 2008-09-16 at 23:14:04 +0300, Guillem Jover wrote:
 So I think there's two things to fix here, one is that the fseek() in
 createimptmp() should be done just before the fwrite() in
 modstatdb_note_core() to guarantee that we are going to be always at the
 beginning, and we make proper use of the reserved space allocated
 previously with those '#padding' lines to avoid the out of space
 condition. That's a 4-liner patch, which should be fine for lenny, and
 prevents this bogus condition where the user most probably is going to
 remove that file to be able to continue, which might produce an
 inconsistent state in the dpkg db. And the real problem is that there
 might not be an easy manual fix by the users if the status data could
 not be completely written to the update file.

I take this back, the fseek() needs to be there to guarantee that the
data has been written. So moving it would only help partially, as we
would always be writing at the beginning but the fwrite() might not
have enough space from the reserved padding, and the data might get
truncated.

 The other is that when onerr_abort is signaled dpkg should not
 continue processing anything anymore, it should just do whatever cleanup
 is required and exit. But that can wait probably post-lenny.

So this is the proper fix, and it should not be that big, probably less
than 10 lines? Will cook something today or tomorrow...

regards,
guillem






Bug#499070: dpkg leaves system in unusable state after running out of diskspace

2008-09-16 Thread Guillem Jover
forcemerge 499070 497041
thanks

On Tue, 2008-09-16 at 01:06:05 +0200, Alexander Prinsier wrote:
 Package: dpkg
 Version: 1.14.22
 Severity: serious
 
 I was installing php5 while my system ran out of disk space. dpkg
 breaks, and leaves the system in a state where I can no longer use dpkg.
 To find out the version number of dpkg I used /var/log/dpkg.log, as I
 couldn't use dpkg to find its own version number... Hope it's the
 correct one.

 This is what happened:
 
 [...]
 Selecting previously deselected package libjpeg62.
 Unpacking libjpeg62 (from .../libjpeg62_6b-14_i386.deb) ...
 dpkg: error processing /var/cache/apt/archives/libjpeg62_6b-14_i386.deb
 (--unpack):
  failed in buffer_write(fd) (10, ret=-1): backend dpkg-deb during
 `./usr/share/doc/libjpeg62/copyright': Disk quota exceeded
 Selecting previously deselected package libdjvulibre21.
 Unpacking libdjvulibre21 (from .../libdjvulibre21_3.5.20-8_i386.deb) ...
 dpkg: error processing
 /var/cache/apt/archives/libdjvulibre21_3.5.20-8_i386.deb (--unpack):
  failed in buffer_write(fd) (10, ret=-1): backend dpkg-deb during
 `./usr/lib/libdjvulibre.so.21.0.0': Disk quota exceeded
 dpkg-deb: subprocess paste killed by signal (Broken pipe)
 Selecting previously deselected package libxpm4.
 Unpacking libxpm4 (from .../libxpm4_1%3a3.5.7-1_i386.deb) ...
 Selecting previously deselected package libgd2-xpm.
 Unpacking libgd2-xpm (from .../libgd2-xpm_2.0.36~rc1~dfsg-3_i386.deb) ...
 dpkg: error processing
 /var/cache/apt/archives/libgd2-xpm_2.0.36~rc1~dfsg-3_i386.deb (--unpack):
  unable to flush /var/lib/dpkg/updates/tmp.i after padding: Disk quota 
 exceeded

Here's where the problem starts. That «tmp.i» file is the one that has
been filled with '#padding' lines; the fflush() in createimptmp() fails,
and it calls ohshite() just before it would have been able to rewind the
file to the beginning.
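
A simplified model of that sequence, as a sketch only (reserve_padding()
and its flush_fails switch are hypothetical; this is not the actual
createimptmp() code): the padding gets written, the flush fails, and the
error path is taken before the rewind, so the stream position stays at
the end of the padding.

```c
#include <stdio.h>

#define PADDING_LINE "#padding\n"

/* Hypothetical helper modelled on the description above.
 * flush_fails simulates the fflush() error seen in the log. */
static int
reserve_padding(FILE *fp, int nlines, int flush_fails)
{
    for (int i = 0; i < nlines; i++)
        if (fputs(PADDING_LINE, fp) == EOF)
            return -1;

    if (flush_fails || fflush(fp) != 0)
        return -1;  /* error path: the rewind below never happens */

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;

    return 0;
}
```

After a failed call, ftell() on the stream reports the size of the
padding rather than 0, so any later write lands after the padding lines.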

 Processing triggers for man-db ...
 /usr/bin/mandb: can't write to /var/cache/man/19933: Disk quota exceeded
 gdbm fatal: read error
 dpkg: failed to write status record about `libcairo2' to
 `/var/lib/dpkg/status': Disk quota exceeded
 E: Sub-process /usr/bin/dpkg returned an error code (2)
 A package failed to install.  Trying to recover:
 dpkg: parse error, in file `/var/lib/dpkg/updates/0119' near line 1:
  newline in field name `#padding'
 Press return to continue.

The second problem is that even if the ohshite() called from
createimptmp() is protected inside an onerr_abort section, the execution
continues in archivefiles(), out of process_archive() but into
process_queue(), which at some point calls a modstatdb_note(), moving
the «tmp.i» file into the «0119» one, then another ohshite() is called
and the onerr_abort is sensed again, making process_queue() terminate
its loop.

 # dpkg --configure -a
 dpkg: parse error, in file `/var/lib/dpkg/updates/0119' near line 1:
  newline in field name `#padding'
 #

But at the point that modstatdb_note() was called the function wrote at
the end of the file, preserving all the '#padding' lines, and making
dpkg barf subsequently.
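
To make the effect concrete, here is a small sketch (write_record() and
starts_with_padding() are hypothetical helpers, not dpkg functions) that
writes a record into a padded file with and without rewinding first, and
then checks what a parser would see on line 1:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a record at the current position, or at
 * the start of the file when rewind_first is non-zero. */
static int
write_record(FILE *fp, const char *record, int rewind_first)
{
    if (rewind_first && fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    if (fputs(record, fp) == EOF)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}

/* Returns 1 if the first line of the file is a padding line,
 * 0 if it is something else, and -1 on I/O error. */
static int
starts_with_padding(FILE *fp)
{
    char line[64];

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    if (fgets(line, sizeof(line), fp) == NULL)
        return -1;
    return strcmp(line, "#padding\n") == 0;
}
```

Without the rewind the record ends up after the padding and the file
still begins with '#padding', which matches the parse error reported
near line 1 of the update file.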


So I think there's two things to fix here, one is that the fseek() in
createimptmp() should be done just before the fwrite() in
modstatdb_note_core() to guarantee that we are going to be always at the
beginning, and we make proper use of the reserved space allocated
previously with those '#padding' lines to avoid the out of space
condition. That's a 4-liner patch, which should be fine for lenny, and
prevents this bogus condition where the user most probably is going to
remove that file to be able to continue, which might produce an
inconsistent state in the dpkg db. And the real problem is that there
might not be an easy manual fix by the users if the status data could
not be completely written to the update file.

The other is that when onerr_abort is signaled dpkg should not
continue processing anything anymore, it should just do whatever cleanup
is required and exit. But that can wait probably post-lenny.

regards,
guillem


