Bug#644470: git-daemon-run: git-run-daemon should use --reuseaddr to ensure clean restart during upgrade etc.

2011-10-06 Thread Neil Brown
Package: git-daemon-run
Version: 1:1.7.6.3-1
Severity: normal

Dear Maintainer,
  sometimes an incoming connection to the git-daemon will abort in such
  a way that the daemon doesn't notice and keeps the connection open.

  When git is upgraded the daemon will be killed and restarted.  When it
  tries to bind to the address to listen to it will fail because there
  are some open connections still using that address.  They will be in
  TIME_WAIT for a couple of minutes after the old daemon is killed.
  So the daemon starts up not listening at all (or listening only on IPv6).
  This does not make for a clean upgrade.

  If git-daemon is started with --reuseaddr, then the fact that there are
  old connections around will not stop it from binding to the correct
  address.   So --reuseaddr makes a clean upgrade much more likely.

  In my opinion --reuseaddr should not be an option - it should always
  be used.  This may be debatable.  However when a daemon can be restarted
  automatically it is certain that --reuseaddr will make the restart
  cleaner.

  So please consider including --reuseaddr in /etc/sv/git-daemon/run

Thanks,
NeilBrown


-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/1 CPU core)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages git-daemon-run depends on:
ii  adduser  3.113  
ii  git  1:1.7.6.3-1
ii  runit2.1.1-6.2  

git-daemon-run recommends no packages.

git-daemon-run suggests no packages.

-- Configuration Files:
/etc/sv/git-daemon/run changed:
exec 2>&1
echo 'git-daemon starting.'
exec chpst -ugitdaemon \
  $(git --exec-path)/git-daemon --reuseaddr --verbose \
  --base-path=/var/cache/git /var/cache/git


-- no debconf information



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-28 Thread Neil Brown
On Sun, 28 Nov 2010 04:18:25 + Ben Hutchings b...@debian.org wrote:

 On Sun, 2010-11-28 at 08:28 +1100, Neil Brown wrote:
   The fix I would recommend for 2.6.26 is to add
  
 if (q->merge_bvec_fn)
         rs->max_phys_segments = 1;
  
  to dm_set_device_limits.  Though the redhat one is probably adequate.
  
  If you really need an upstream fix, you will need to chase upstream to apply
  one :-(
 
 I won't do that myself - as you can see, I don't really understand the
 issue fully.  Is that fix also valid (modulo renaming of
 max_phys_segments) for later versions?
 

Yes.
For current mainline it would look like replacing


if (q->merge_bvec_fn && !ti->type->merge)
        limits->max_sectors =
                min_not_zero(limits->max_sectors,
                             (unsigned int) (PAGE_SIZE >> 9));

with

if (q->merge_bvec_fn && !ti->type->merge)
        limits->max_segments = 1;

(the test on ->type->merge is important and applies to 2.6.26 as well).

NeilBrown



signature.asc
Description: PGP signature


Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-28 Thread Neil Brown
On Mon, 29 Nov 2010 00:08:47 + Ben Hutchings b...@debian.org wrote:

  
  if (q->merge_bvec_fn && !ti->type->merge)
          limits->max_segments = 1;
  
  (the test on ->type->merge is important and applies to 2.6.26 as well).
 
 Why is it not necessary to set seg_boundary_mask to PAGE_CACHE_SIZE - 1,
 as for md devices?
 

Sorry.  It is necessary of course.  I guess I was being a bit hasty and
forgetting all the details.

 if (q->merge_bvec_fn && !ti->type->merge) {
  limits->max_segments = 1;   /* Make sure only one segment in each bio */
  limits->seg_boundary_mask = PAGE_CACHE_SIZE-1; /* make sure that
segment is in just one page */
 }

NeilBrown





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-27 Thread Neil Brown
On Sat, 27 Nov 2010 19:53:54 + Ben Hutchings b...@debian.org wrote:

 Neil, would you mind looking at this:
 
 On Sat, 2010-11-27 at 10:49 +0100, Wouter D'Haeseleer wrote:
  Ben,
  
  I'm running 4 days now without any disk errors anymore.
  As stated in my previous message this is with the RedHat patch applied.
  
  If I compare the patches I see that the patch you grabbed upstream does not 
  deal with t->limits.max_sectors
 
 I've tried backporting your commit
 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 to Linux 2.6.26 in Debian
 stable but it doesn't seem to fix the problem there.  My version is
 http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=57;filename=0002-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457
 and RH's very different patch for RHEL 5 is
 https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff&context=patch&collapsed=&headers=1&format=raw
 
 Our bug log is at http://bugs.debian.org/604457.
 
 Ben.
 

Hi Ben,

 You probably know most of this, but:

The problem is that with stacked devices, if the lower device has a
merge_bvec_fn, and the upper device never bothers to call it, then the
upper device must make sure that it never sends a bio with more than one
page in the bi_iovec.  This is a property of the block device interface.

The patch you back-ported fixes md so when it is the upper device it
behaves correctly.

However in the original problem, the md/raid10 is the lower device, and
dm is the upper device.  So dm needs to be fixed.

Despite the fact that I learned about setting blk_queue_max_segments on
the dm mailing list (if I remember correctly), dm still doesn't include
this fix in mainline.

 The fix I would recommend for 2.6.26 is to add

   if (q->merge_bvec_fn)
           rs->max_phys_segments = 1;

to dm_set_device_limits.  Though the redhat one is probably adequate.

If you really need an upstream fix, you will need to chase upstream to apply
one :-(

NeilBrown





Bug#598721: mdadm: internal bitmap uses way too large chunks

2010-10-01 Thread Neil Brown
On Fri, 01 Oct 2010 13:53:13 +0200
Paul Slootman p...@debian.org wrote:

 Package: mdadm
 Version: 3.1.4-1+8efb9d1
 Severity: important
 
 I noticed that when adding internal bitmaps to md devices,
 that the chunks used were far too large:

That depends on what you mean by 'too large'.
I find that the ideal chunk size relates to how much of the array can be
resynced in about a second or a bit less.  I could do an IO test to see how
fast the devices are, but that is messy and error prone.
So I simply choose a default of 64M as that seems to be the right ball-park
for modern hardware.

Smaller chunk sizes increase write overhead for little appreciable gain.

 
 r...@corky:/home/paul# cat /proc/mdstat 
 Personalities : [raid1] 
 md4 : active raid1 sdb5[0] sdc5[1]
   587202560 blocks [2/2] [UU]
   bitmap: 2/5 pages [8KB], 65536KB chunk
 
 md3 : active raid1 sdc3[2](W) sda3[0] sdb3[1](W)
   10485696 blocks [3/3] [UUU]
   bitmap: 1/1 pages [4KB], 65536KB chunk
 
 md2 : active raid1 sdb2[0] sdc2[1]
   4194240 blocks [2/2] [UU]
   bitmap: 0/1 pages [0KB], 65536KB chunk
 
 md1 : active raid1 sdc1[2](W) sda1[0] sdb1[1](W)
   213120 blocks [3/3] [UUU]
   bitmap: 1/1 pages [4KB], 65536KB chunk
 
 unused devices: <none>
 
 
 I mean, using 64M chunks on an md device that's just 208MB is silly.

It might also be said that having a bitmap on a 208MB array is a bit silly as
it would only take about 5 seconds to resync it without a bitmap.
In reality neither are 'silly', but may be non-optimal.


 
 After downgrading to mdadm 3.0.3:

You don't need to downgrade.  If you don't like the default that mdadm
chooses for you, you are free to choose your own and specify it with the
--bitmap-chunk option.

NeilBrown






Bug#597518: gnome-panel: workspace switch now ignores 'rows' setting

2010-09-20 Thread Neil Brown
Package: gnome-panel
Version: 2.30.2-2
Severity: normal


The recent gnome-panel update to 2.30.2-2 caused the workspace
switcher to misbehave.

My configuration is for 9 workspaces in 3 rows - so 3 by 3.
This works fine in 2.30.2-1 (which I have reverted back to).
In 2.30.2-2, the '3 rows' setting is ignored, and the switcher
consistently displays 1 row of 9 workspaces.

NeilBrown


-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages gnome-panel depends on:
ii  gnome-about2.30.2-1  The GNOME about box
ii  gnome-control-center   1:2.30.1-2utilities to configure the GNOME d
ii  gnome-desktop-data 2.30.2-1  Common files for GNOME desktop app
ii  gnome-menus2.30.3-1  an implementation of the freedeskt
ii  gnome-panel-data   2.30.2-2  common files for the GNOME Panel
ii  libatk1.0-01.30.0-1  The ATK accessibility toolkit
ii  libbonobo2-0   2.24.3-1  Bonobo CORBA interfaces library
ii  libbonoboui2-0 2.24.3-1  The Bonobo UI library
ii  libc6  2.11.2-6  Embedded GNU C Library: Shared lib
ii  libcairo2  1.8.10-6  The Cairo 2D vector graphics libra
ii  libcanberra-gtk0   0.24-1Gtk+ helper for playing widget eve
ii  libcanberra0   0.24-1a simple abstract interface for pl
ii  libdbus-1-31.2.24-3  simple interprocess messaging syst
ii  libdbus-glib-1-2   0.88-2simple interprocess messaging syst
ii  libecal1.2-7   2.30.3-1  Client library for evolution calen
ii  libedataserver1.2-13   2.30.3-1  Utility library for evolution data
ii  libedataserverui1.2-8  2.30.3-1  GUI utility library for evolution 
ii  libgconf2-42.28.1-4  GNOME configuration database syste
ii  libglib2.0-0   2.24.2-1  The GLib library of C routines
ii  libgnome-desktop-2-17  2.30.2-1  Utility library for loading .deskt
ii  libgnome-menu2 2.30.3-1  an implementation of the freedeskt
ii  libgtk2.0-02.20.1-1+b1   The GTK+ graphical user interface 
ii  libgweather1   2.30.2-1  GWeather shared library
ii  libical0   0.44-3iCalendar library implementation i
ii  libice62:1.0.6-1 X11 Inter-Client Exchange library
ii  liborbit2  1:2.14.18-0.1 libraries for ORBit2 - a CORBA ORB
ii  libpanel-applet2-0 2.30.2-2  library for GNOME Panel applets
ii  libpango1.0-0  1.28.1-1  Layout and rendering of internatio
ii  libpolkit-gobject-1-0  0.96-3PolicyKit Authorization API
ii  librsvg2-2 2.26.3-1  SAX-based renderer library for SVG
ii  libsm6 2:1.1.1-1 X11 Session Management library
ii  libwnck22  2.30.4-1  Window Navigator Construction Kit 
ii  libx11-6   2:1.3.3-3 X11 client-side library
ii  libxrandr2 2:1.3.0-3 X11 RandR extension library
ii  menu-xdg   0.5   freedesktop.org menu compliant win
ii  policykit-1-gnome  0.96-2GNOME authentication agent for Pol
ii  python 2.6.6-1   interactive high-level object-orie
ii  python-gconf   2.28.1-1  Python bindings for the GConf conf
ii  python-gnome2  2.28.1-1  Python bindings for the GNOME desk

Versions of packages gnome-panel recommends:
ii  alacarte  0.13.1-1   easy GNOME menu editing tool
ii  evolution-data-server 2.30.3-1   evolution database backend server
ii  gnome-applets 2.30.0-3   Various applets for the GNOME pane
ii  gnome-icon-theme  2.30.3-1   GNOME Desktop icon theme
ii  gnome-session 2.30.2-2   The GNOME Session Manager - GNOME 
ii  gvfs  1.6.3-2userspace virtual filesystem - ser

Versions of packages gnome-panel suggests:
ii  epiphany-browser 2.30.6-1Intuitive GNOME web browser
pn  evolutionnone  (no description available)
ii  gnome-terminal [x-termin 2.30.2-1The GNOME terminal emulator applic
ii  gnome-user-guide [gnome2 2.30.1-1GNOME user's guide
ii  konsole [x-terminal-emul 4:4.4.5-1   X terminal emulator
ii  nautilus 2.30.1-2file manager and graphical shell f
ii  xterm [x-terminal-emulat 261-1   X terminal emulator
ii  yelp 2.30.1+webkit-1 Help browser for GNOME

-- no debconf information




Bug#594418: confirming that 3.0.3-2 is not affected by #594418

2010-08-25 Thread Neil Brown

I'm fairly sure this is a known bug which is fixed in 3.1.3 which has just
been released in Debian (I think).

The problem is that mdadm tries to use /lib/init/rw which doesn't exist
during the initramfs stage.

If you repeat your experiment with init=/bin/sh, and
   mkdir -p /lib/init/rw
before starting udev, I think you will find that it works better.
If you could either confirm that, or confirm that 3.1.3 works, that would be
great.

3.1.3 uses /dev/.mdadm rather than /lib/init/rw.

NeilBrown






Bug#590319: udevadm settle timeout with mdadm 3.1.2

2010-07-28 Thread Neil Brown
On Mon, 26 Jul 2010 21:37:47 +0200
Jan Echternach j...@goneko.de wrote:

 On Mon, Jul 26, 2010 at 06:44:25PM +1000, Neil Brown wrote:
  If you are up to compiling and installing your own mdadm, could you try
  reverting
  
  http://neil.brown.name/git?p=mdadm;a=commitdiff;h=319767b85c2b16d80c235195329470c46d4547b3
  
  from 3.1.2 and see if that makes the difference?
 
 It doesn't make a difference.
 

Thanks.  Someone else reported a similar problem and used 'git bisect' to
identify that patch, and there certainly are problems with that patch.
So I'm still in the dark.

Would you be able to try 'git bisect' from
   git://neil.brown.name/mdadm/ master
starting with
   git bisect start mdadm-3.1.2 mdadm-3.1.1

and see where it gets you?  You would need to 
 make ; make install ; mkinitramfs ; reboot

each time.

Thanks,
NeilBrown






Bug#590319: udevadm settle timeout with mdadm 3.1.2

2010-07-26 Thread Neil Brown
On Sun, 25 Jul 2010 22:08:12 +0200
Jan Echternach j...@goneko.de wrote:

 Package: mdadm
 Version: 3.1.2-2
 Severity: normal
 
 I get an udevadm settle timeout during boot after upgrading mdadm from
 version 3.1.1-1 to 3.1.2-2:

If you are up to compiling and installing your own mdadm, could you try
reverting

http://neil.brown.name/git?p=mdadm;a=commitdiff;h=319767b85c2b16d80c235195329470c46d4547b3

from 3.1.2 and see if that makes the difference?

Thanks

NeilBrown






Bug#589413: mdadm: Segmentation fault when converting to RAID6 from RAID5

2010-07-22 Thread Neil Brown
On Sat, 17 Jul 2010 15:49:22 +0200
Björn Påhlsson bel...@fukt.bsnet.se wrote:

 # mdadm --verbose --grow /dev/md0 --level=raid6 --raid-devices=4
 
 The output of that last command is:
 
 mdadm level of /dev/md0 changed to raid6
 Segmentation fault
 

When I try that (thanks for providing precise commands!) it works:

While the array is still performing recovery I get:

# mdadm --verbose --grow /dev/md0 --level=raid6 --raid-devices=4
mdadm: /dev/md0 is performing resync/recovery and cannot be reshaped

and once recovery completes I get:

# mdadm --verbose --grow /dev/md0 --level=raid6 --raid-devices=4
mdadm level of /dev/md0 changed to raid6
mdadm: /dev/md0: Cannot grow - need backup-file
mdadm: aborting level change

This is with 
Version: 3.1.2-2

on Debian, though I'm still running 2.6.32-5-amd64.  I doubt that would
make a difference, but it might.

Can you run the --grow command under 'strace' and post the output.
e.g.
   strace -o /tmp/trace mdadm --verbose --grow /dev/md0 --level=raid6 \
          --raid-devices=4

 and post /tmp/trace.

NeilBrown






Bug#589413: mdadm: Segmentation fault when converting to RAID6 from RAID5

2010-07-22 Thread Neil Brown
On Thu, 22 Jul 2010 16:15:32 +1000
Neil Brown ne...@suse.de wrote:


 on Debian, though I'm still running 2.6.32-5-amd64.  I doubt that would
 make a difference, but it might.

And in fact it does.  I found a 2.6.34 kernel and hit the same bug.

The mdadm bug is fixed by upstream commit
c03ef02d92e4b2a7397f7247ea5a25d932a1a889

It is triggered by a kernel bug that is fixed in upstream commit
a64c876fd357906a1f7193723866562ad290654c
(and a few nearby commits) which will be in 2.6.35, and is in
the latest 2.6.34-stable: 2.6.34.1.

NeilBrown






Bug#585015: kernel 2.6.34 fails to boot normally with mdadm 3.1.1-1

2010-07-22 Thread Neil Brown
On Tue, 08 Jun 2010 13:25:13 +0200
Jean-Luc Coulon (f5ibh) jean-luc.cou...@wanadoo.fr wrote:


 I've a system with lvm2 over raid1 and some filesystems encrypted.
 When I updated mdadm from 3.1.1-1 to 3.1.2-1 the system failed to boot kernel
 2.6.34 (from experimental).
 I tried 3.1.2-2 when it was released, I got the same problem.
 
 The system boot from grub and the wait for the password.
 
 When I enter the password, there is a huge / endless disk activity but the
 boot process seems to be frozen.
 
 The system is still living: if I plug/unplug a USB disk, it is reported on
 the console.
 
 I've then rebooted with 2.6.33 without any problem.
 
 Reverting to 3.1.1-1 solved also the problem.

I wonder if it could be
   commit b179246f4f519082158279b2f45e5fd51842cc42
causing this.

Can you report the contents of /etc/mdadm/mdadm.conf ??

Can you boot off a live CDROM and see if
  mdadm -As

works, or spins or does something else bad?

I'd love to know the cause of this before I release 3.1.3, but
there is so little hard information to go on, it is hard to make progress.

Thanks,
NeilBrown






Bug#585015: kernel 2.6.34 fails to boot normally with mdadm 3.1.1-1

2010-07-22 Thread Neil Brown
On Thu, 22 Jul 2010 09:30:17 +0200
Jean-Luc Coulon jean-luc.cou...@wanadoo.fr wrote:

 
 On 22/07/2010 09:04, Neil Brown wrote:
  
 
  
  I wonder if it could be
 commit b179246f4f519082158279b2f45e5fd51842cc42
  causing this.
  
  Can you report the contents of /etc/mdadm/mdadm.conf ??
 
 Please find it attached

Thanks.  Unfortunately it didn't help.

 
  
  Can you boot off a live CDROM and see if
mdadm -As
  
  works, or spins or does something else bad?
 
 Well: the latest version of mdadm works on kernel versions older than
 2.6.34… and I've not found a live CD with a newer kernel (2.6.34 or
 2.6.35-rc)…
 Is the version on the live CD of importance?

Yes, you would need a kernel which causes problems - scratch that idea.

 
 BTW, I tried leaving the system for a couple of hours waiting for something
 to happen. Finally, I got a udev message: no space left on device. The
 device was not indicated and I have no device (disk) short of room...

Sounds like udev creating lots of things in a ramdisk...

How helpful are you feeling???

It would be really great if you could use 'git bisect' to isolate which
change causes the problem.

Debian doesn't add any significant patches to mainline mdadm, so you could 

  git clone git://neil.brown.name/mdadm
  cd mdadm
Then

  git bisect start mdadm-3.1.2 mdadm-3.1.1

  make ; make install ; mkinitramfs ; reboot
  git bisect good  OR git bisect bad

and see where you end up.

There are only 104 commits, so it should only take 7 iterations.

Thanks,
NeilBrown






Bug#585015: kernel 2.6.34 fails to boot normally with mdadm 3.1.1-1

2010-07-22 Thread Neil Brown
On Thu, 22 Jul 2010 12:36:47 +0200
Jean-Luc Coulon jean-luc.cou...@wanadoo.fr wrote:

 
 On 22/07/2010 10:16, Neil Brown wrote:
  
  On Thu, 22 Jul 2010 09:30:17 +0200
  Jean-Luc Coulon jean-luc.cou...@wanadoo.fr wrote:
 
 [ ... ]
 
  Debian doesn't add any significant patches to mainline mdadm, so you could 
  
git clone git://neil.brown.name/mdadm
cd mdadm
  Then
  
git bisect start mdadm-3.1.2 mdadm-3.1.1
  
make ; make install ; mkinitramfs ; reboot
git bisect good  OR git bisect bad
  
  and see where you end up.
  
  There are only 104 commits, so it should only take 7 iterations.
 
 Ok, done, attached my log.

Great, thanks for doing that.

Had you run one more 'git bisect good' it would have said:
319767b85c2b16d80c235195329470c46d4547b3 is the first bad commit
commit 319767b85c2b16d80c235195329470c46d4547b3
Author: NeilBrown ne...@suse.de
Date:   Mon Feb 8 14:33:31 2010 +1100

mapfile: use ALT_RUN as alternate place to store mapfile

This gives better consistency and fewer hidden '.' files.

Signed-off-by: NeilBrown ne...@suse.de

:100644 100644 552df2914f3fc0b0fc128512d92e818e6627 
366ebe332299c92fceaa4d3c9fa2e8c644b27801 M  mapfile.c

to make it explicit.
That patch changed the location where the mapfile is stored during initramfs
time from /dev/.mdadm.map to /lib/init/rw/map.
Which:
  a/ isn't exactly what I wanted (I wanted /lib/init/rw/mdadm/map) and
  b/ doesn't exist - damn.
I thought that /lib/init/rw/map existed during early boot, but it seems not.
I guess I'm going to have to leave it in /dev - which I don't like at all but
there doesn't seem to be an option (OK Doug, you can say I told you so now).

Why that would cause infinite loops I'm not sure.  It would stop udev from
creating a symlink from /dev/md/whatever to /dev/mdXX - maybe that is enough
to upset some part of the boot process.

I wonder how that related to the kernel... you say it only breaks with 2.6.34.

I'll try experimenting with 2.6.34 and see if I can break it ... but not
today.
Meanwhile I'll revert that change for mdadm-3.1.3.

Thanks for your help.
NeilBrown






Bug#587550: mdadm: --manage --remove faulty does not remove all faulty disks

2010-06-30 Thread Neil Brown
On Tue, 29 Jun 2010 14:24:07 -0400
Jim Paris j...@jtan.com wrote:

 My guess is that in Manage.c:Manage_subdevs, the loops like
 for (; j < array.raid_disks + array.nr_disks ; j++) {
 are missing disks because the disk numbers are changing as they are
 removed, but I didn't have the time to follow the code in detail.
 


Thanks for the report.
Good guess.

This is the patch that I have just checked in to mdadm.

Thanks,
NeilBrown


commit b3b4e8a7a229a915421329a5319f996b0842
Author: NeilBrown ne...@suse.de
Date:   Wed Jun 30 17:20:38 2010 +1000

Avoid skipping devices where removing all faulty/detached devices.

When using 0.90 metadata, devices can be renumbered when
earlier devices are removed.
So when iterating all devices looking for 'failed' or 'detached'
devices, we need to re-check the same slot we checked last time
to see if maybe it has a different device now.

Reported-by: Jim Paris j...@jtan.com
Resolves-Debian-Bug: 587550
Signed-off-by: NeilBrown ne...@suse.de

diff --git a/Manage.c b/Manage.c
index 6bc5d0a..edf41e9 100644
--- a/Manage.c
+++ b/Manage.c
@@ -376,6 +376,7 @@ int Manage_subdevs(char *devname, int fd,
return 1;
}
 
+   stb.st_rdev = 0;
for (dv = devlist, j=0 ; dv; dv = next, j = jnext) {
unsigned long long ldsize;
char dvname[20];
@@ -394,6 +395,7 @@ int Manage_subdevs(char *devname, int fd,
return 1;
}
for (; j < array.raid_disks + array.nr_disks ; j++) {
+   int dev;
disc.number = j;
if (ioctl(fd, GET_DISK_INFO, disc))
continue;
@@ -401,9 +403,15 @@ int Manage_subdevs(char *devname, int fd,
continue;
if ((disc.state & 1) == 0) /* faulty */
continue;
-   stb.st_rdev = makedev(disc.major, disc.minor);
+   dev = makedev(disc.major, disc.minor);
+   if (stb.st_rdev == dev)
+   /* already did that one */
+   continue;
+   stb.st_rdev = dev;
next = dv;
-   jnext = j+1;
+   /* same slot again next time - things might
+* have reshuffled */
+   jnext = j;
sprintf(dvname,%d:%d, disc.major, disc.minor);
dnprintable = dvname;
break;
@@ -419,6 +427,7 @@ int Manage_subdevs(char *devname, int fd,
}
for (; j < array.raid_disks + array.nr_disks; j++) {
int sfd;
+   int dev;
disc.number = j;
if (ioctl(fd, GET_DISK_INFO, disc))
continue;
@@ -435,9 +444,15 @@ int Manage_subdevs(char *devname, int fd,
continue;
if (errno != ENXIO)
continue;
-   stb.st_rdev = makedev(disc.major, disc.minor);
+   dev = makedev(disc.major, disc.minor);
+   if (stb.st_rdev == dev)
+   /* already did that one */
+   continue;
+   stb.st_rdev = dev;
next = dv;
-   jnext = j+1;
+   /* same slot again next time - things might
+* have reshuffled */
+   jnext = j;
dnprintable = dvname;
break;
}






Bug#569359: mdadm -Ds insists on /dev/md/X device presence

2010-06-08 Thread Neil Brown
On Tue, 8 Jun 2010 12:37:38 +0200
martin f krafft madd...@debian.org wrote:

 also sprach Martin Michlmayr t...@cyrius.com [2010.06.08.1124 +0200]:
   It would be safer to use
  mdadm -As
   to ensure all arrays are assembled, then
  mdadm -Ds
   to create mdadm.conf
   
   ... but why do you even want to create mdadm.conf ???
  
  I'll let madduck answer this.
 
 I have not yet found a way to ensure stable device names without an
 mdadm.conf, so I have not yet made it optional. The current push to
 UUID-based device access in combination with incremental assembly
 might be the key.
 

That is fair enough.
So if you goal is to create an mdadm.conf which will ensure that the current
arrays will continue to have the names that they have now, then it would be
best to use mdadm -Ds to create that mdadm.conf as it will use the names
that the arrays have now.  mdadm -E will use the names that the arrays
would have if they were auto-assembled.

NeilBrown






Bug#569359: mdadm -Ds insists on /dev/md/X device presence

2010-06-07 Thread Neil Brown
On Mon, 7 Jun 2010 16:25:47 +0100
Martin Michlmayr t...@cyrius.com wrote:

 * martin f krafft madd...@debian.org [2010-05-26 10:31]:
  also sprach Neil Brown ne...@suse.de [2010.05.26.1016 +0200]:
   The most likely explanation for this is that /var/run/mdadm/map existed
   and contained an entry for 'md0' declaring that it could be found at
   /dev/md/0.
   
   Normally mdadm will write such a name there when auto-assembling an array,
   then udev will notice that the array has appeared, will use mdadm to find 
   the
   path name in /var/run/mdadm/map and will create the node in /dev.
   
   So it would seem that udev is not running, but /dev/.udev exists.
   If udev was running, it would have created /dev/md/0.
   If /dev/.udev didn't exist, then mdadm would have created it when 
   assembling
   the array.
   
   If this is correct, udev is not running but /dev/.udev exists, then you 
   can
   force mdadm to still create the device with
 export MDADM_NO_UDEV=1
  
  Martin, could you please investigate this further? I don't have much
  experience with d-i, nor really the time to attain that right now.
 
 Thanks for your explanation, Neil.  So here's a description of what's
 going on:
 
 We boot into Debian installer.  The installer creates a RAID1 device with:
 mdadm --create /dev/md0 --auto=yes [...]
 udev is running.  /dev/md0 exists.  /dev/md/0 does not exist.
 
 There's a file /var/run/map (not /var/run/mdadm/map) which contains
 this:
 md0 1.2 dcfdb5af:3b385a1e:def86b24:09a2527f /dev/md0

When I compile mdadm from source, it creates /var/run/mdadm/map.  When I use
Debian's mdadm, it creates /var/run/map as you say. Something is broken
there.
Otherwise everything seems OK so far.

 
 When I run:
 mdadm --examine --scan --config=partitions
 I get:
 ARRAY /dev/md/0 metadata=1.2 UUID=afb5fddc:1e5a383b:246bf8de:7f52a209 
 name=debian:0
  i.e. the /dev/md/0 form.  (Also note that the UUID in the map
  file is different to that reported by --examine --scan; not sure
  why.)

You use --examine like this?  You know the array you have just created.  You
know the name you want to call it (/dev/md0).  Why not
   mdadm --detail --brief /dev/md0
??
It gives a name like /dev/md/foo because it is using 1.x (1.2 in this case)
metadata.  1.x metadata stores an array name, not a number.  In that case
the name just happens to be numerical.
If the name was actually 'foo', it would be wrong to report /dev/mdfoo, but
correct to report /dev/md/foo.
With 1.x metadata you get /dev/md/name
With 0.90 metadata you get /dev/mdNUM


 
  After creating the RAID1 and formatting it, Debian installer will then
 install Debian to disk (in a chroot).  At some point, mdadm is
 installed in the chroot.  /dev from the real system is bind mounted
 in the chroot.
 
 Debian's mdadm sees that there's no mdadm config file and tries to
 generate one.  It runs:
 mdadm --examine --scan --config=partitions
 and writes this to /etc/mdadm/mdadm.conf.  The output is:
 ARRAY /dev/md/0 metadata=1.2 UUID=afb5fddc:1e5a383b:246bf8de:7f52a209 
 name=debian:0
 There's no map file in /var/run at this point.
 
 /dev/md0 exists but /dev/md/0 does not.
 
 Debian then generates a ramdisk.  It looks at /etc/mdadm/mdadm.conf
 and sees that the RAID device is /dev/md/0 and then does:
 mdadm --detail /dev/md/0
 to determine the RAID level.  This fails because /dev/md/0 does not
 exist.  As a consequence, the ramdisk won't contain the RAID modules
 and will fail to boot.

Maybe we need to get mdadm --assemble, if it finds that the array is
already assembled, to create the device named in mdadm.conf anyway.
i.e. if the array mentioned in mdadm.conf already exists in 'map' with
a different name, mdadm --assemble just creates the new name.  There is
another context that came up recently where that would be helpful.

Alternately, you could just
  mdadm -Ir

this rebuilds the 'map' file using the same name that mdadm -E would use, and
then tickles udev so that it creates the right names in /dev...
Almost.
I'm not sure it uses exactly the same names as -E.

It would be safer to use
   mdadm -As
to ensure all arrays are assembled, then
   mdadm -Ds
to create mdadm.conf

... but why do you even want to create mdadm.conf ???

Maybe a deeper rethink is needed here.
 
 
 So my questions (the first one primarily for Neil, the second one
 for madduck):
 
  - Why does mdadm --examine --scan output the /dev/md/X form rather
than /dev/mdX when no config and map file exists.  Is /dev/md/X
prefered over /dev/mdX?  If so, maybe Debian installer should
use
mdadm --create /dev/md/X
instead of
mdadm --create /dev/mdX
?

  I think I have explained above. It is a metadata version difference.


 
  - Why is /dev/md/X not created?  Well, I guess it's not generated
because we use /dev/mdX and not even the map file mentions
/dev/md/X but given that mdadm --examine --scan prefers /dev/md/X
maybe we should

Bug#583495: super-intel.c:700: error: format '%llu' expects type 'long long unsigned int', but argument 3 has type '__u64'

2010-05-27 Thread Neil Brown
On Thu, 27 May 2010 21:51:47 +0200
martin f krafft madd...@debian.org wrote:

 Package: mdadm
 Version: 3.1.2-1
 Severity: serious
 Tags: upstream
 Justification: fails to build from source on ia64 (at least)
 
 Another one of these, which upstream will hopefully fix:
 
 gcc -Wall -Werror -Wstrict-prototypes -ggdb -fomit-frame-pointer -Os 
 -DSendmail=\"/usr/sbin/sendmail -t\" -DCONFFILE=\"/tmp/mdadm.conf\" 
 -DCONFFILE2=\"/etc/mdadm.conf\" -DALT_RUN=\"/lib/init/rw\" 
 -DVAR_RUN=\"/var/run\" -DDEBIAN  -c -o super-intel.o super-intel.c
 cc1: warnings being treated as errors
 super-intel.c: In function 'print_imsm_dev':
 super-intel.c:700: error: format '%llu' expects type 'long long unsigned 
 int', but argument 3 has type '__u64'
 (https://buildd.debian.org/fetch.cgi?pkg=mdadm&arch=ia64&ver=3.1.2-1&stamp=1274986951&file=log)
 


Already queued.

http://neil.brown.name/git?p=mdadm;a=commitdiff;h=94fcb80a8e0c6311636b2ee689a6ac5b7125afe6

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#271766: wiggle: Wiggle adds strange bogus conflict to merged file (Debian bug)

2010-03-23 Thread Neil Brown
On Fri, 19 Mar 2010 06:45:12 +0200
Jari Aalto jari.aa...@cante.net wrote:

 Would you have any insight on this bug:
 
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=271766
 
 Unfortunately there is no test cases to try, so it's a long shot.

Fortunately there is enough of a test-case.  I found a suitable xfaces.c
at

  http://www.opensource.apple.com/source/emacs/emacs-60/emacs/src/xfaces.c?txt

I have fixed the problem and the fix can be found at 

  
http://neil.brown.name/git?p=wiggle;a=commitdiff;h=8420e684cf3736de120db8f7828408d06240566c

Running wiggle -d on just the .rej file is quite telling.  The diff that is
generated has no end context.  This confuses wiggle's merge routine.  It
probably shouldn't but it does.  I fixed the diff fixup code to restore the
end context.

The important part of the patch is below.

Thanks for the report.

(pity it has taken over 5 years to find out about this bug and fix it ... I
really should spend more time on wiggle!)

NeilBrown


diff --git a/diff.c b/diff.c
index 49f87d1..9e3e7cc 100644
--- a/diff.c
+++ b/diff.c
@@ -243,45 +243,58 @@ static struct csl *lcsl(struct file *a, int alo, int ahi,
 	return csl;
 }
 
-/* if two common sequences are separated by only an add or remove,
+/* If two common sequences are separated by only an add or remove,
  * and the first common ends the same as the middle text,
  * extend the second and contract the first in the hope that the
  * first might become empty.  This ameliorates against the greedyness
  * of the 'diff' algorithm.
+ * We treat the final zero-length 'csl' as a common sequence which
+ * can be extended so we must make sure to add a new zero-length csl
+ * to the end.
  * Once this is done, repeat the process but extend the first
- * in favour of the second.  The acknowledges that semantic units
- * more often end with common text ("return 0;\n}\n", "\n") than
- * start with it.
+ * in favour of the second but only up to the last newline.  This
+ * acknowledges that semantic units more often end with common
+ * text ("return 0;\n}\n", "\n") than start with it.
  */
 static void fixup(struct file *a, struct file *b, struct csl *list)
 {
 	struct csl *list1, *orig;
 	int lasteol = -1;
+	int found_end = 0;
 	if (!list) return;
 	orig = list;
 	list1 = list+1;
-	while (list->len && list1->len) {
+	while (list->len) {
+		if (list1->len == 0)
+			found_end = 1;
+
 		if ((list->a+list->len == list1->a &&
+		     list->b+list->len != list1->b &&
 		     /* text at b inserted */
 		     match(&b->list[list->b+list->len-1],
 			   &b->list[list1->b-1])
-		    )
+		     )
 		    ||
 		    (list->b+list->len == list1->b &&
+		     list->a+list->len != list1->a &&
 		     /* text at a deleted */
 		     match(&a->list[list->a+list->len-1],
 			   &a->list[list1->a-1])
 		    )
 			) {
-/*			printword(a->list[list1->a-1]);
+#if 0
+			printword(stderr, a->list[list1->a-1]);
 			printf("fixup %d,%d %d : %d,%d %d\n",
 			       list->a,list->b,list->len,
 			       list1->a,list1->b,list1->len);
-*/			if (ends_line(a->list[list->a+list->len-1])
+#endif
+			if (ends_line(a->list[list->a+list->len-1])
 			    && a->list[list->a+list->len-1].len==1
 			    && lasteol == -1
 				) {
-/*				printf("E\n");*/
+#if 0
+				printf("E\n");
+#endif
 				lasteol = list1->a-1;
 			}
 			list1->a--;
@@ -290,7 +303,12 @@ static void fixup(struct file *a, struct file *b, struct csl *list)
 			list->len--;
 			if (list->len == 0) {
 				lasteol = -1;
-				if (list > orig)
+				if (found_end) {
+					*list = *list1;
+					list1->a += list1->len;
+					list1->b += list1->len;
+					list1->len = 0;
+				} else if (list > orig)
 					list--;
 				else {
 					*list = *list1++;
@@ -300,7 +318,8 @@ static void fixup(struct file *a, struct file *b, struct csl *list)
 		} else {
 			if (lasteol >= 0) {
 /*				printf("seek %d\n", lasteol);*/
-				while (list1->a >= lasteol && list1->len > 1) {
+				while (list1->a >= lasteol &&

Bug#567468: md homehost

2010-02-24 Thread Neil Brown
On Wed, 24 Feb 2010 18:52:57 +0100
Mario 'BitKoenig' Holbe mario.ho...@tu-ilmenau.de wrote:

 On Wed, Feb 24, 2010 at 02:13:53PM +0100, Goswin von Brederlow wrote:
  grub.cfg (grub2) uses UUID for grub itself. But the kernel can be bootet
  with root=/dev/md0. But in that case where does it get the homehost from
  and since when does kernel raid autoconfig have a homehost?
 
 The homehost attribute does only exist with v1 superblocks. And there is
 no in-kernel auto-assembly for v1 superblocks.
 v0.9 superblocks (for which in-kernel auto-assembly is deprecated but
 still provided) have no homehost.


Not entirely correct.  The 'homehost' is encoded in the uuid of v0.90
metadata, so it does affect them too.

in-kernel autodetect does not make use of 'homehost' and so does not protect
you from the potential confusions that homehost tries to protect you from.

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#567468: md homehost

2010-02-24 Thread Neil Brown
On Wed, 24 Feb 2010 14:41:16 +0100
Goswin von Brederlow goswin-...@web.de wrote:

 Neil Brown ne...@suse.de writes:
 
  On Tue, 23 Feb 2010 07:27:00 +0100
  martin f krafft madd...@madduck.net wrote:
  The only issue homehost protects against, I think, is machines that
  use /dev/md0 directly from grub.conf or fstab.
 
  That is exactly correct.  If no code or config file depends on a name like
  /dev/mdX or /dev/md/foo, then you don't need to be concerned about the whole
  homehost thing.
  You can either mount by fs-uuid, or mount e.g.
 /dev/disk/by-id/md-uuid-8fd0af3f:4fbb94ea:12cc2127:f9855db5 
 
 What if you have two raids (one local, one from the other hosts that
 broke down) and both have LVM on it with /dev/vg/root?
 
 Shouldn't it only assemble the local raid (as md0 or whatever) and then
 only start the local volume group? If it assembles the remote raid as
 /dev/md127 as well then lvm will have problems and the boot will likely
 (even randomly) go wrong since only one VG can be activated.
 
 I think it is pretty common for admins to configure LVM to the same
 volume group name on different systems. So if you consider raids being
 pluged into other systems please keep this in mind.

You are entirely correct.  However lvm problems are not my problems.

It has always been my position that the best way to configure md is to
explicitly list your arrays in mdadm.conf.  But people seem to not like this
and want it to all be 'automatic'.  So I do my best to make it as automatic
as possible but still remove as many of the possible confusions that this can
cause as possible.  But I cannot remove them all.

If you move disks around and boot and lvm gets confused because there are two
things called /dev/vg/root, then I'm sorry but there is nothing I can do
about that.  If you had an mdadm.conf which listed your md arrays, and had
   auto -all
then you can be sure that mdadm would not be contributing to this problem.
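
For concreteness, a minimal sketch of such an mdadm.conf (the UUID is the
example md uuid used elsewhere in this thread; the device name is illustrative):

```
# /etc/mdadm/mdadm.conf -- illustrative sketch, values are placeholders
DEVICE partitions
ARRAY /dev/md0 UUID=8fd0af3f:4fbb94ea:12cc2127:f9855db5
AUTO -all
```

With the local arrays listed explicitly and AUTO -all, mdadm assembles only
what is named above and leaves foreign arrays alone.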

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#567468: md homehost

2010-02-24 Thread Neil Brown
On Thu, 25 Feb 2010 08:16:14 +0100
Goswin von Brederlow goswin-...@web.de wrote:

 Neil Brown ne...@suse.de writes:
 
  On Wed, 24 Feb 2010 14:41:16 +0100
  Goswin von Brederlow goswin-...@web.de wrote:
 
  Neil Brown ne...@suse.de writes:
  
   On Tue, 23 Feb 2010 07:27:00 +0100
   martin f krafft madd...@madduck.net wrote:
   The only issue homehost protects against, I think, is machines that
   use /dev/md0 directly from grub.conf or fstab.
  
   That is exactly correct.  If no code or config file depends on a name 
   like
   /dev/mdX or /dev/md/foo, then you don't need to be concerned about the 
   whole
   homehost thing.
   You can either mount by fs-uuid, or mount e.g.
  /dev/disk/by-id/md-uuid-8fd0af3f:4fbb94ea:12cc2127:f9855db5 
  
  What if you have two raids (one local, one from the other hosts that
  broke down) and both have LVM on it with /dev/vg/root?
  
  Shouldn't it only assemble the local raid (as md0 or whatever) and then
  only start the local volume group? If it assembles the remote raid as
  /dev/md127 as well then lvm will have problems and the boot will likely
  (even randomly) go wrong since only one VG can be activated.
  
  I think it is pretty common for admins to configure LVM to the same
  volume group name on different systems. So if you consider raids being
  pluged into other systems please keep this in mind.
 
  You are entirely correct.  However lvm problems are not my problems.
 
  It has always been my position that the best way to configure md is to
  explicitly list your arrays in mdadm.conf.  But people seem to not like this
  and want it to all be 'automatic'.  So I do my best to make it as automatic
  as possible but still remove as many of the possible confusion that this can
  cause as possible.  But I cannot remove them all.
 
  If you move disks around and boot and lvm gets confused because there are 
  two
  things call /dev/vg/root, then I'm sorry but there is nothing I can do
  about that.  If you had an mdadm.conf which listed you md arrays, and had
 auto -all
  then you can be sure that mdadm would not be contributing to this problem.
 
  NeilBrown
 
 Yes you can do something about it: Only start the raid arrays with the
 correct homehost.

This is what 'homehost' originally did, but I got a lot of push-back on that.
I added the auto line in mdadm.conf so that the admin could choose what
happens.
If the particular metadata type is enabled on the auto line, then the array is
assembled with a random name.  If it is disabled, it is not assembled at all
(unless explicitly listed in mdadm.conf).
I'm not sure exactly how 'auto' interacts with 'homehost'.  The documentation
I wrote only talks about arrays listed in mdadm.conf or on the command line,
not arrays with a valid homehost.  I guess I should check. I think I want
auto -all to still assemble arrays with a valid homehost.  I'll confirm
that before I release 3.1.2.

 
 If the homehost is only used to decide whether the preferred minor in the
 metadata is used for the device name then I feel the feature is entirely
 useless. It would only help in stupid configurations, i.e. when you
 use the device name directly.

Yes.

 
 Another scenario where starting a raid with the wrong homehost would be
 bad is when the raid is degraded and you have a global spare. You
 probably wouldn't want the global spare of one host to be used to repair
 a raid of another host.

I only support global spares that are explicitly listed in mdadm.conf, so
currently this couldn't happen.  One day someone is going to ask for
auto-configure global spares.  Then I'll have to worry about this (or just
say no).

 
 MfG
 Goswin
 
 PS: If a raid is not listed in mdadm.conf doesn't it currently start too
 but the name can be random?

It depends on the auto line in mdadm.conf

Thanks,
NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#567468: md homehost (was: Bug#567468: (boot time consequences of) Linux mdadm superblock) question.

2010-02-23 Thread Neil Brown
On Tue, 23 Feb 2010 07:27:00 +0100
martin f krafft madd...@madduck.net wrote:

 also sprach Neil Brown ne...@suse.de [2010.02.23.0330 +0100]:
  The problem to protect against is any consequence of rearranging
  devices while the host is off, including attaching devices that
  previously were attached to a different computer.
 
 How often does this happen, and how grave/dangerous are the effects?

a/ no idea.
b/ it all depends...
  It is the sort of thing that happens when something has just gone
  drastically wrong and you need to stitch things back together again as
  quickly as you can.  You aren't exactly panicking, but you are probably
  hasty and don't want anything else to go wrong.

  If the array from the 'other' machine with the same name has very different
  content, then things could go wrong in various different ways if we
  depended on that name.
  It is true that the admin would have to be physically present and could
  presumably get a console and 'fix' things.  But it would be best if they
  didn't have to.  They may not even know clearly what to do to 'fix' things
  - because it always worked perfectly before, but this time when in a
  particular hurry, something strange goes wrong.  I've been there, I
  don't want to inflict it on others.

 
  But if '/' is mounted by a name in /dev/md/, I want to be sure
  mdadm puts the correct array at that name no matter what other
  arrays might be visible.
 
 Of course it would be nice if this happened, but wouldn't it be
 acceptable to assume that if someone swaps drives between machines
 that they ought to know how to deal with the consequences, or at
 least be ready to take additional steps to make sure the system still
 boots as desired?

No.  We cannot assume that an average sys-admin will have a deep knowledge of
md and mdadm.  Many do, many don't.  But in either case the behaviour must be
predictable.
After all, Debian is for when you have better things to do than fixing
systems

 
 Even if the wrong array appeared as /dev/md0 and was mounted as root
 device, is there any actual problem, other than inconvenience?
 Remember that the person who has previously swapped the drives is
 physically in front of (or behind ;)) the machine.
 
 I am unconvinced. I think we should definitely switch to using
 filesystem-UUIDs over device names, and that is the only real
 solution to the problem, no?
 

What exactly are you unconvinced of?
I agree completely that mounting filesystems by UUID is the right way to go.
(I also happen to think that assembling md arrays by UUID is the right way to
go too, but while people seem happy to put fs uuids in /etc/fstab, they seem
less happy to put md uuids in /etc/mdadm.conf).

As you say in another email:

 The only issue homehost protects against, I think, is machines that
 use /dev/md0 directly from grub.conf or fstab.

That is exactly correct.  If no code or config file depends on a name like
/dev/mdX or /dev/md/foo, then you don't need to be concerned about the whole
homehost thing.
You can either mount by fs-uuid, or mount e.g.
   /dev/disk/by-id/md-uuid-8fd0af3f:4fbb94ea:12cc2127:f9855db5 
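
As a hedged sketch of those two options in /etc/fstab (the fs UUID is a
placeholder; the md uuid is the example value above, and the filesystem type
and mount options are illustrative):

```
# mount root by filesystem UUID (placeholder value):
# UUID=<fs-uuid>  /  ext3  errors=remount-ro  0  1

# or by md array UUID via the persistent by-id path:
# /dev/disk/by-id/md-uuid-8fd0af3f:4fbb94ea:12cc2127:f9855db5  /  ext3  defaults  0  1
```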


NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#567468: md homehost (was: Bug#567468: (boot time consequences of) Linux mdadm superblock) question.

2010-02-22 Thread Neil Brown
On Mon, 22 Feb 2010 10:16:32 +0100
martin f krafft madd...@madduck.net wrote:

 also sprach Piergiorgio Sartor piergiorgio.sar...@nexgo.de [2010.02.21.2113 
 +0100]:
  I do not see how the homehost plays a role, here.
 
 Neil,
 
 Could you please put forth the argument for why the homehost must
 match, and why unconditional auto-assembly is not desirable?
 Realistically, what problems are we protecting against?
 

The problem to protect against is any consequence of rearranging devices
while the host is off, including attaching devices that previously were
attached to a different computer.

mdadm will currently assemble any array that it finds, but will not give a
predictable name to anything that looks like it might be imported from a
different host.
So if you have 'md0' on each of two computers, one computer dies and you move
the devices from that computer to the other, then as long as the bios boots
off the right drive, mdadm will assemble the local array as 'md0' and the
other array as 'something else'.

There are two ways that mdadm determines that an array is 'local':
1/ is the uuid listed against an array in mdadm.conf
2/ is the 'homehost' encoded in the metadata.

If either of those is true, the array is local and gets a predictable name.
If neither, the name gets an _%d suffix.

This is only an issue if you use device name (.e.g /dev/md0 or /dev/md/root)
to mount the root filesystem.
If you use mount-by-uuid then it clearly doesn't matter what name mdadm
assembles the array under.  In that case, the fs UUID (stored on the
initramfs or similar) will assure the necessary uniqueness and mdadm need not
worry about homehost.

But if '/' is mounted by a name in /dev/md/, I want to be sure mdadm puts the
correct array at that name no matter what other arrays might be visible.

Does that clarify things enough?

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#511417: Raid1 resync does not run in the background

2010-02-19 Thread Neil Brown
On Fri, 19 Feb 2010 16:37:29 +0100
Joachim Zobel jz-2...@heute-morgen.de wrote:


 Do the upstream kernel developers know about this? A box unnecessarily
 taking 1h+ for a reboot is a rather serious problem.

I do now.
I'm quite surprised though.  Any attempt by other programs to access the
devices should cause the resync to slow down to the min value, which is
really quite slow.
So my guess is that something is going wrong with the detection of other
programs accessing the device.

Can you tell me a bit about the configuration of your storage - what devices,
what connections, any LVM in use, what raid levels, partitions, etc.

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#517731: Please add raid1 - raid5 reshaping support

2010-01-31 Thread Neil Brown
On Sunday March 1, goswin-...@web.de wrote:
 Package: mdadm
 Version: 2.6.8-12-gb47dff6-2
 Severity: wishlist
 
 Hi,
 
 just recently Ingo Juergensmann brought up a question on irc about
 converting a raid1 into a raid5. Currently --grow does not support
 that but one can achieve it manually with a little trick.

mdadm now supports this with mdadm-3.1.1 and Linux-2.6.32

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#396570: mdadm: Incorrect error message for insufficient permission

2010-01-31 Thread Neil Brown
On Fri, 29 Jan 2010 15:44:27 -0700
Rob Sims debbug...@robsims.com wrote:
 
 Still not fixed:
 $  mdadm --remove /dev/md0 /dev/sdd1
 mdadm: /dev/md0 does not appear to be an md device
 $ mdadm --version
 mdadm - v3.1.1 - 19th November 2009

Thanks.
I have committed the following fix upstream which should resolve this issue.

NeilBrown

commit ac5678dd9b67995a84bf2348d82e641d7895415e
Author: NeilBrown ne...@suse.de
Date:   Mon Feb 1 10:22:38 2010 +1100

Add test for are we running as root.

Most operations require root access.  Rather than ensure we generate
the right error message when something fails because we aren't root,
check early.
Note that --examine does not necessarily require root, so test
for that first.

Resolves-Debian-bug: 396570
Signed-off-by: NeilBrown ne...@suse.de

diff --git a/mdadm.c b/mdadm.c
index be4fbf6..eb124d5 100644
--- a/mdadm.c
+++ b/mdadm.c
@@ -1046,6 +1046,12 @@ int main(int argc, char *argv[])
}
}
 
+	if ((mode != MISC || devmode != 'E') &&
+	    geteuid() != 0) {
+		fprintf(stderr, Name ": must be super-user to perform this action\n");
+		exit(1);
+	}
+
ident.autof = autof;
 
rv = 0;




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#567167: format '%llu' expects type 'long long unsigned int', but argument 2 has type '__u64'

2010-01-27 Thread Neil Brown
On Thu, 28 Jan 2010 08:13:46 +1300
martin f krafft madd...@debian.org wrote:

 Package: mdadm
 Version: 3.1.1-1
 Severity: serious
 Justification: no longer builds from source on certain architectures
 Tags: upstream confirmed
 Forwarded: ne...@suse.de
 
 -Wall -Werror -Wstrict-prototypes -ggdb -fomit-frame-pointer -Os 
 -DSendmail=\"/usr/sbin/sendmail -t\" -DCONFFILE=\"/tmp/mdadm.conf\" 
 -DCONFFILE2=\"/etc/mdadm.conf\" -DDEBIAN  -c -o Grow.o Grow.c
 cc1: warnings being treated as errors
 Grow.c: In function 'validate':
 Grow.c:1443: error: format '%llu' expects type 'long long unsigned int', but 
 argument 2 has type '__u64'
 make[2]: *** [Grow.o] Error 1
 
 This was reported bu ia64 and alpha autobuilders thus far.
 
 Neil, this is definitely outside of my domain. I think I would try
 to solve this with an explicit cast to llui, but that might not be
 what we want at all. Hence I am leaving it to you.
 

Thanks.
Those prints are really just for debugging, so the best thing to do is remove
them.  Were we to keep them, a cast to (unsigned long long) would be the right
way to go.

I have committed and pushed a patch to fix this.

Thanks,
NeilBrown




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#534470: Using mdadm 2.6.7.2 to assemble a raid array created with mdadm 1.9.0 will corrupt it making mdadm 1.9.0 to crash when trying to reassemble

2010-01-27 Thread Neil Brown
On Wed, 27 Jan 2010 14:13:36 +1300
martin f krafft madd...@debian.org wrote:

 also sprach RUSSOTTO François-Xavier 200103 francois-xavier.russo...@cea.fr 
 [2009.12.02.0407 +1300]:
  As I suggested to Neil: prior to auto-remount a raid array, the
  raid tool should perform a version checking so that, at least,
  user is warned that the raid array might be corrupt performing
  such operation.  
 
 Neil, does this sound like a feasible solution to
 http://bugs.debian.org/534470?

I don't think so.
We really want old arrays to work smoothly on new kernels.
To avoid a repetition of the original problem we would need not just a
message, but an option not to assemble the array and that would be awkward for
people normally upgrading their system.

So I don't think there is a solution for this problem that does not introduce
other problems.
It does not affect x86 architectures, and is only a problem if you move to a
new kernel, then back to an old kernel.  So hopefully it will be very rare.

Sorry.

NeilBrown



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#554461: udev: partitions on sd* belonging to a software raid are not populated

2009-12-07 Thread Neil Brown
On Mon, 7 Dec 2009 18:55:24 +0100
m...@linux.it (Marco d'Itri) wrote:

 reassign 554461 mdadm
 thanks
 
 On Dec 07, Andrea Palazzi palazziand...@yahoo.it wrote:
 
  At boot time the system (usually) complains about /dev/sdc and /dev/sdc1
  having a very similar superblock, and suggests to clear the fake one;
  however mdadm --zero-superblock does not solve.
 Looks like an mdadm bug to me.
 
  Tehy say it can be a race condition, and it seems to me coherent with
  the behaviour I'm seeing; however I couldn't find on my system the
  related udef file, so I couldn't use the proposed workaround.
 Even if this were true, it would still be an mdadm bug.
 

This is not really an mdadm bug.  It is a bug with the 0.90 metadata format.

If a partition starts at an offset from the start of the device which is a
multiple of 64K, and extends to the end of the device, then a 0.90 superblock
on the partition is indistinguishable from a similar superblock on the whole
device.
So mdadm cannot know which one to use.
I guess the error message which mdadm is giving doesn't make this very clear,
so maybe that is a bug in mdadm.  But even if that were fixed you would still
have a problem with your current setup.
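
The arithmetic behind that ambiguity can be sketched in shell.  The sizes
below are made up; the only rule used is that a 0.90 superblock sits one
64K block back from the device size rounded down to a 64K boundary:

```shell
# Byte offset of a 0.90 superblock on a device of the given size:
# round the size down to a 64 KiB multiple, then step back one 64 KiB block.
sb_offset() {
    echo $(( $1 / 65536 * 65536 - 65536 ))
}

disk_size=1073741824                      # hypothetical 1 GiB whole disk
part_start=65536                          # partition starts at a 64K multiple...
part_size=$(( disk_size - part_start ))   # ...and runs to the end of the disk

whole_sb=$(sb_offset $disk_size)                     # superblock written for /dev/sdX
part_sb=$(( part_start + $(sb_offset $part_size) ))  # superblock written for /dev/sdX1

# Both land on the same byte of the same disk, so the two superblocks
# are indistinguishable:
echo "$whole_sb $part_sb"
```

Any partition whose start is a 64K multiple and which extends to the end of
the disk produces the same coincidence, which is exactly the case described
above.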

The easiest way to deal with this situation is to tell mdadm whether you want
to use partitions or whole devices.  You do this with the DEVICE line in
/etc/mdadm/mdadm.conf

To use partitions, make the line e.g.

   DEVICE /dev/sd[a-z][0-9]

For whole devices, something like

   DEVICE /dev/sd[a-z]

However if you have more than 26 devices, you might need more possibilities
listed.

We are slowly moving towards making the v1.x metadata the default.  This
metadata does not suffer from this problem.

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#434565: vm: 8bit characters are not escapes in In-reply-to field

2009-11-30 Thread Neil Brown
On Mon, 30 Nov 2009 12:14:02 -0600
Manoj Srivastava sriva...@acm.org wrote:

 Uday Reddy  wrote:
  I agree that this is a bug. But, if you leave the variable
  vm-in-reply-to-format with its default value %i then you won't
  have this problem. 
 
 manoj

Thanks!

I had completely forgotten that I had set this variable.
I'll put it back the way it was.  Feel free to close the bug.

Thanks,
NeilBrown




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#549083: Not working with 3.0.3-2

2009-11-08 Thread Neil Brown
On Sunday November 8, berbe...@fmi.uni-passau.de wrote:
 hello,
 
 Am Sonntag, den 08. November schrieb M G Berberich:
  still not working with mdadm 3.0.3-2. During boot it fails with:
  
  Begin: Assemble all MD arrays ... Failure: failed to assemble all arrays
  
  and then can't mount the root-fs. Downgrading to mdadm 2.6.7.2-3 makes
 it work again, but breaks a lot of dependencies in dpkg/apt.
 
 I found out that the problem was that the "mdadm --assemble ..."-call
 failed because of the names of the arrays. For root:
 
    sda2 has wrong name
 
 So I tried to remove the "name='hermione':2" (and all the others) from
 /etc/mdadm/mdadm.conf and now it boots.
 
 Result: This does not boot:
 
   ARRAY /dev/md/2 level=raid1 metadata=1.0 num-devices=2 
 UUID=edc699ab:86dacff1:4b646c0c:73e9f59e name='hermione':2
 
 this does:
 
   ARRAY /dev/md/2 level=raid1 metadata=1.0 num-devices=2 
 UUID=edc699ab:86dacff1:4b646c0c:73e9f59e
 
 calling "mdadm --detail --scan" still gives the same names:
 
   ...
   ARRAY /dev/md/2 metadata=1.00 name='hermione':2 
 UUID=edc699ab:86dacff1:4b646c0c:73e9f59e
   ...
 

I wonder where those ' characters are coming from.
Presumably it should be
   ... name=hermione:2 ...

Looking around the initrd someone else posted I find in
   scripts/local-top/mdadm
the code:

  [ -n "${MD_HOMEHOST:-}" ] && extra_args="--homehost='$MD_HOMEHOST'"
  if $MDADM --assemble --scan --run --auto=yes $extra_args; then

That is wrong.
It will cause the --homehost string given to mdadm to have single
quotes around it.  It should be

  [ -n "${MD_HOMEHOST:-}" ] && extra_args="--homehost=$MD_HOMEHOST"
  if $MDADM --assemble --scan --run --auto=yes ${extra_args:+"$MD_HOMEHOST"}; then

so that spaces in MD_HOMEHOST will be protected, but quotes won't
be passed through (isn't Bourne-Shell a wonderful programming
language!!).

I cannot see how that bug would cause the current problem, but it
should still be fixed.

Actually, having spaces in the hostname would break lots of things, so
all that extra quoting really is pointless.  So this would be
adequate.

  [ -n "${MD_HOMEHOST:-}" ] && extra_args="--homehost=$MD_HOMEHOST"
  if $MDADM --assemble --scan --run --auto=yes $extra_args ; then
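
The effect of the two quoting styles can be demonstrated without mdadm at all
(the hostname value is hypothetical):

```shell
MD_HOMEHOST=hermione   # hypothetical value

# Buggy form: the embedded single quotes survive into the argument itself.
extra_args="--homehost='$MD_HOMEHOST'"
set -- $extra_args
buggy=$1

# Fixed form: the variable is expanded once when assigned; no quotes leak through.
extra_args="--homehost=$MD_HOMEHOST"
set -- "$extra_args"
fixed=$1

echo "$buggy"   # --homehost='hermione'
echo "$fixed"   # --homehost=hermione
```

The buggy form is exactly what produces a name='hermione':2 entry, quotes and
all, instead of name=hermione:2.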


NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#485989: mdadm metadata issue

2009-09-15 Thread Neil Brown
On Monday August 31, robe...@debath.co.uk wrote:
 On Sun, 30 Aug 2009, martin f krafft wrote:
 
  Why specify it in the first place? I suggest to remove all metadata=
  stuff from mdadm.conf. Inspect the /usr/share/mdadm/mkconf output.
 
 I didn't.
 
 It got added automatically...  Bug?

Bug somewhere.  No idea where though, mdadm definitely wouldn't emit
metadata=0.9.

metadata=1 should work since 2.6.4, and possibly even before then.
metadata=0.9 would never work.  It is a version number, not a decimal
   number.  metadata=0.90 is correct and totally different from
   metadata=0.9

Do you have any idea what upgrade script put metadata=0.9 in
there?

NeilBrown




-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#534470: Using mdadm 2.6.7.2 to assemble a raid array created with mdadm 1.9.0 will corrupt it making mdadm 1.9.0 to crash when trying to reassemble

2009-06-25 Thread Neil Brown
On Wednesday June 24, francois-xavier.russo...@cea.fr wrote:
 Package: mdadm
 Version: 2.6.7.2
 Severity: critical
 
 After booting a Debian 5.0.1 - Lenny install cdrom in rescue mode
 (debian-501-ia64-netinst.iso) on an Itanium 2 server with Debian 3.1
 - Sarge (ia64) installed on a software raid 5 root partition,
 opening a console in the root partition mounted from the raid array
 (auto-assembled) corrupts the raid array, leading to a kernel panic
 at server reboot, and preventing from manual reassembly using mdadm
 1.9.0 (Sarge). 
 
 The raid 5 array containing the root partition is made of 3
 partitions on 3 scsi disks (sda2, sdb2, sdc2) which ran fluently for
 years. Here is the output at server reboot: 
 
   md: invalid superblock checksum on sdb2

 
 Apart from fixing this bug, I would be grateful that you suggest me
 a safe way to make the server bootable again. I was thinking about
 booting on a Sarge install cdrom and try to re-create the raid array
 with option --assume-clean or, if that fails, re-create the array
 and restore content from a tar backup. 

This problem is due to the fact that the superblock checksumming
routine was changed since 2.6.8.
It was changed because it used code that was different on different
architectures, and was basically unmaintainable.
New kernels can accept most old checksums, but old kernels cannot
necessarily accept the new ones, and mdadm does not know enough about
kernel internals to always create the correct old one.

So it isn't really fixable.

If you boot with the Sarge install cdrom and recreate the array with
--assume-clean as you suggest it should work fine.
Check the --examine values for chunksize and layout, and the order
of the drives, and make sure you preserve all of those.

NeilBrown



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#510261: mdadm: After install if sync is not done when rebooting, it starts over

2009-02-04 Thread Neil Brown
On Wednesday February 4, madd...@debian.org wrote:
 tags 510261 confirmed upstream
 forwarded 510261 ne...@suse.de
 thanks
 
 also sprach John McMonagle jo...@advocap.org [2008.12.30.2214 +0100]:
  In both cases I rebooted before the sync was done.
  In both cases arrays re-synced from the beginning.
  After it finished syncing a reboot does not cause a re-sync.
 
 This is something Neil would need to fix in the kernel. I think
 I remember there being some talk about that, but I may misremember.
 There is nothing mdadm can do against that, I think.

No, this has nothing to do with mdadm.

But it should not behave like this, and in my experience it doesn't.

There was a bug related to this that was fixed in 2.6.25, but I'm
fairly sure lenny has a more recent kernel than that (??).

How exactly did you reboot?  Just "shutdown now" or /sbin/reboot or
turning the power off or ...?

Exactly what kernel is used during the install?

NeilBrown






Bug#509167: mdadm: check asprintf() return codes

2009-02-01 Thread Neil Brown
On Thursday January 8, kirkl...@canonical.com wrote:
 Neil-
 
 One more follow-up on this patch...
 
 I actually think that the code in the inline function should :
 
 -   ret = asprintf(strp, fmt, ap);
 +   ret = vasprintf(strp, fmt, ap);
 
 Otherwise, we might fill ap into the format string, and have bad things
 happen...
 
 Cheers,
 :-Dustin

Yes, of course.  Thanks.

I've applied this now.

NeilBrown






Bug#511164: It'd be nice if mdadm -G --help mentioned max argument to -z switch

2009-02-01 Thread Neil Brown
On Wednesday January 7, chr...@debian.org wrote:
 Package: mdadm
 Version: 2.6.7-1
 Severity: wishlist
 
 It'd be nice if the help printed out when running mdadm -G --help also 
 mentioned the max parameter that can be passed to the -z option. Just 
 a one-line mention would be fine. (I know it's already in the manpage... 
 I found it trying to come up by hand with the right block size to pass.) 
 Because otherwise reading the help text gives the impression that you 
 *have* to specify the maximum size manually, which isn't the case with 
 the max argument. Also, I suspect said max argument is passed to -z 
 a fair bit more often than a numerical value. (It's certainly much less 
 error-prone).

Thanks for the suggestion.
I have added the following patch in 'upstream'.

NeilBrown

From 0083584d5e8162f684112ed32da1931d3190 Mon Sep 17 00:00:00 2001
From: NeilBrown ne...@suse.de
Date: Mon, 2 Feb 2009 10:58:08 +1100
Subject: [PATCH] Document 'max' option to --grow --size in --help output.

Suggestion from Christian Hudon chr...@debian.org

Signed-off-by: NeilBrown ne...@suse.de
---
 ReadMe.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/ReadMe.c b/ReadMe.c
index 88c4433..3164322 100644
--- a/ReadMe.c
+++ b/ReadMe.c
@@ -517,7 +517,8 @@ char Help_grow[] =
 "  --layout=  -p   : For a FAULTY array, set/change the error mode.\n"
 "  --size=    -z   : Change the active size of devices in an array.\n"
 "                  : This is useful if all devices have been replaced\n"
-"                  : with larger devices.\n"
+"                  : with larger devices.   Value is in Kilobytes, or\n"
+"                  : the special word 'max' meaning 'as large as possible'.\n"
 "  --raid-devices= -n  : Change the number of active devices in an array.\n"
 "  --bitmap=  -b   : Add or remove a write-intent bitmap.\n"
 "  --backup-file= file : A file on a differt device to store data for a\n"
-- 
1.5.6.5







Bug#509167: mdadm: check asprintf() return codes

2009-01-07 Thread Neil Brown
On Monday January 5, kirkl...@canonical.com wrote:
 On Fri, 2008-12-19 at 16:02 +1100, Neil Brown wrote:
  I'm not really keen on taking this sort of patch.
  It isn't clear that an abort (caused by the assert) is really much
  better than just segfaulting normally ... though you do get a message
  I guess.
  But it makes the code rather ugly.
  
  Maybe if you defined a asprintf_nofail (possibly an inline in mdadm.h)
  and called that it would be acceptable.
 
 Hi Neil, et al-
 
 I have an updated patch attached.
 
 I actually called the function xasprintf(), as that seems to be used
 elsewhere.
 
 I have verified that this code builds, but I have not functionally
 tested it.
 
 Perhaps you're more willing to accept something like this?

Yes, that looks much better, thanks.

I have committed it to my git tree.

Thanks,
NeilBrown

 
 -- 
 :-Dustin
 
 Dustin Kirkland
 Ubuntu Server Developer
 Canonical, LTD
 kirkl...@canonical.com
 GPG: 1024D/83A61194
 diff -uprN mdadm-2.6.7.1.orig/Assemble.c mdadm-2.6.7.1/Assemble.c
 --- mdadm-2.6.7.1.orig/Assemble.c 2008-10-15 00:29:37.0 -0500
 +++ mdadm-2.6.7.1/Assemble.c  2009-01-05 18:35:02.021045097 -0600
 @@ -386,9 +386,9 @@ int Assemble(struct supertype *st, char
 	if (c) c++; else c= info.name;
 	if (isdigit(*c) && ((ident->autof & 7)==4 || (ident->autof & 7)==6))
 		/* /dev/md/d0 style for partitionable */
 -		asprintf(&mddev, "/dev/md/d%s", c);
 +		xasprintf(&mddev, "/dev/md/d%s", c);
 	else
 -		asprintf(&mddev, "/dev/md/%s", c);
 +		xasprintf(&mddev, "/dev/md/%s", c);
 	mdfd = open_mddev(mddev, ident->autof);
 	if (mdfd < 0) {
 		st->ss->free_super(st);
 diff -uprN mdadm-2.6.7.1.orig/config.c mdadm-2.6.7.1/config.c
 --- mdadm-2.6.7.1.orig/config.c   2008-10-12 21:46:39.0 -0500
 +++ mdadm-2.6.7.1/config.c2009-01-05 18:35:17.477104526 -0600
 @@ -559,7 +559,7 @@ void mailfromline(char *line)
 		alert_mail_from = strdup(w);
 	else {
 		char *t = NULL;
 -		asprintf(&t, "%s %s", alert_mail_from, w);
 +		xasprintf(&t, "%s %s", alert_mail_from, w);
 		free(alert_mail_from);
 		alert_mail_from = t;
 	}
 diff -uprN mdadm-2.6.7.1.orig/mdadm.h mdadm-2.6.7.1/mdadm.h
 --- mdadm-2.6.7.1.orig/mdadm.h2008-10-15 00:29:37.0 -0500
 +++ mdadm-2.6.7.1/mdadm.h 2009-01-05 18:49:34.061044664 -0600
 @@ -527,6 +527,17 @@ extern int open_mddev(char *dev, int aut
  extern int open_mddev_devnum(char *devname, int devnum, char *name,
 			      char *chosen_name, int parts);
  
 +#include <assert.h>
 +#include <stdarg.h>
 +static inline int xasprintf(char **strp, const char *fmt, ...) {
 +	va_list ap;
 +	int ret;
 +	va_start(ap, fmt);
 +	ret = asprintf(strp, fmt, ap);
 +	va_end(ap);
 +	assert(ret >= 0);
 +	return ret;
 +}
  
  #define  LEVEL_MULTIPATH (-4)
  #define  LEVEL_LINEAR(-1)






Bug#509167: mdadm: check asprintf() return codes

2008-12-18 Thread Neil Brown

I'm not really keen on taking this sort of patch.
It isn't clear that an abort (caused by the assert) is really much
better than just segfaulting normally ... though you do get a message
I guess.
But it makes the code rather ugly.

Maybe if you defined a asprintf_nofail (possibly an inline in mdadm.h)
and called that it would be acceptable.

Thanks,
NeilBrown


On Thursday December 18, kirkl...@canonical.com wrote:
 Package: mdadm
 Version: 2.6.7.1-1
 Severity: normal
 Tags: patch
 
 Hello,
 
 I have attached a minor patch that we're carrying in Ubuntu, against the
 mdadm-2.6.7.1-1 source.
 
 This rather trivial patch checks the return codes of 3 asprintf() memory
 allocations, to prevent segfaults on memory-challenged systems.
 
 Thanks,
 -- 
 :-Dustin
 
 Dustin Kirkland
 Ubuntu Server Developer
 Canonical, LTD
 kirkl...@canonical.com
 GPG: 1024D/83A61194
 --- mdadm-2.6.7.1.orig/Assemble.c
 +++ mdadm-2.6.7.1/Assemble.c
 @@ -29,6 +29,7 @@
  
  #include	"mdadm.h"
  #include	<ctype.h>
 +#include	<assert.h>
  
  static int name_matches(char *found, char *required, char *homehost)
  {
 @@ -384,11 +385,15 @@
 	st->ss->getinfo_super(st, &info);
 	c = strchr(info.name, ':');
 	if (c) c++; else c= info.name;
 -	if (isdigit(*c) && ((ident->autof & 7)==4 || (ident->autof & 7)==6))
 +	if (isdigit(*c) && ((ident->autof & 7)==4 || (ident->autof & 7)==6)) {
 		/* /dev/md/d0 style for partitionable */
 -		asprintf(&mddev, "/dev/md/d%s", c);
 -	else
 -		asprintf(&mddev, "/dev/md/%s", c);
 +		int ret = asprintf(&mddev, "/dev/md/d%s", c);
 +		assert(ret >= 0);
 +	}
 +	else {
 +		int ret = asprintf(&mddev, "/dev/md/%s", c);
 +		assert(ret >= 0);
 +	}
 	mdfd = open_mddev(mddev, ident->autof);
 	if (mdfd < 0) {
 		st->ss->free_super(st);
 only in patch2:
 unchanged:
 --- mdadm-2.6.7.1.orig/config.c
 +++ mdadm-2.6.7.1/config.c
 @@ -35,6 +35,7 @@
  #include	<ctype.h>
  #include	<pwd.h>
  #include	<grp.h>
 +#include	<assert.h>
  
  /*
   * Read the config file
 @@ -559,7 +560,8 @@
 		alert_mail_from = strdup(w);
 	else {
 		char *t = NULL;
 -		asprintf(&t, "%s %s", alert_mail_from, w);
 +		int ret = asprintf(&t, "%s %s", alert_mail_from, w);
 +		assert(ret >= 0);
 		free(alert_mail_from);
 		alert_mail_from = t;
 	}
 ___
 pkg-mdadm-devel mailing list
 pkg-mdadm-de...@lists.alioth.debian.org
 http://lists.alioth.debian.org/mailman/listinfo/pkg-mdadm-devel






Bug#489608: mdadm: similar kernel call traces

2008-11-09 Thread Neil Brown

This bug is a kernel bug rather than an mdadm bug.

It is almost certainly the bug fixed by 
  commit 9744197c3d7b329590c2be33ad7b17409bd798fe
which went into 2.6.27.

Nothing is actually going wrong.  It is just an annoying message.
If it bothers you, run the command
   echo 0 > /proc/sys/kernel/hung_task_timeout_secs
as the message suggests.

NeilBrown






Bug#498505: mdadm: Bug still not fixed

2008-11-05 Thread Neil Brown
On Monday November 3, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.6.7.1-1
 Followup-For: Bug #498505
 
 
 Problem still not fixed in 2.6.7.1-1. I start to reshape a RAID5 array and
 reboot the machine. After that, mdadm --assemble causes a segmentation fault.
 While the machine was powered off, some drives were moved to other SATA
 connectors and the device names in Linux changed accordingly (/dev/sdg moved
 to /dev/sdd).

This bug is fixed by commit 56f8add211a840faaed325bd16483b55da544e93
which is scheduled to be in 2.6.8, but was not included in 2.6.7.1.

I include it below.

NeilBrown

From 56f8add211a840faaed325bd16483b55da544e93 Mon Sep 17 00:00:00 2001
From: Neil Brown [EMAIL PROTECTED]
Date: Thu, 19 Jun 2008 16:30:36 +1000
Subject: [PATCH] Fix an error when assembling arrays that are in the middle of 
a reshape.

It is important that dup_super always returns an 'st' with the same
-ss and -minor_version as the st that was passed.
This wasn't happening for 0.91 metadata (i.e. in the middle of a reshape).
---
 super0.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/super0.c b/super0.c
index 7e81482..8e4c568 100644
--- a/super0.c
+++ b/super0.c
@@ -849,12 +849,15 @@ static struct supertype *match_metadata_desc0(char *arg)
 	st->sb = NULL;
 	if (strcmp(arg, "0") == 0 ||
 	    strcmp(arg, "0.90") == 0 ||
-	    strcmp(arg, "0.91") == 0 ||
 	    strcmp(arg, "default") == 0 ||
 	    strcmp(arg, "") == 0 /* no metadata */
 		)
 		return st;
 
+	st->minor_version = 91; /* reshape in progress */
+	if (strcmp(arg, "0.91") == 0) /* For dup_super support */
+		return st;
+
 	st->minor_version = 9; /* flag for 'byte-swapped' */
 	if (strcmp(arg, "0.swap")==0 ||
 	    strcmp(arg, "0.9") == 0) /* For dup_super support */
-- 
1.5.6.5







Bug#500309: mdadm thinks disk is not large enough to be added, but it is (v1 superblock)

2008-10-13 Thread Neil Brown
On Saturday September 27, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.6.7-3.1
 Severity: important
 Tags: upstream
 
 md0 : active raid1 hda1[1]
   64220 blocks super 1.0 [2/1] [_U]
 
 wall:/# sfdisk -s /dev/hdc1
 64228
 
 wall:/# mdadm --add /dev/md0 /dev/hdc1
 mdadm: /dev/hdc1 not large enough to join array


This should be fixed by the following patch which I have just
committed as 

 
http://neil.brown.name/git?p=mdadm;a=commitdiff;h=2a528478c75b6659188fc2ce0d9543124992fe6c


NeilBrown
---
From: NeilBrown [EMAIL PROTECTED]
Date: Mon, 13 Oct 2008 05:15:16 + (+1100)
Subject: Manage: allow adding device that is just large enough to v1.x array.
X-Git-Url: 
http://neil.brown.name/git?p=mdadm;a=commitdiff_plain;h=2a528478c75b6659188fc2ce0d9543124992fe6c

Manage: allow adding device that is just large enough to v1.x array.

When adding a device to an array, we check that it is large enough.

Currently the check makes sure there is also room for a reasonably
sized bitmap.  But if the array doesn't have a bitmap, then this test
might be too restrictive.
So when adding, only insist there is enough space for the current
bitmap.
When Creating, still require room for the standard sized bitmap.

This resolved Debian Bug 500309
---

diff --git a/Manage.c b/Manage.c
index 8297708..7b3fabe 100644
--- a/Manage.c
+++ b/Manage.c
@@ -349,14 +349,6 @@ int Manage_subdevs(char *devname, int fd,
 
 		if (array.not_persistent == 0) {
 
-			/* Make sure device is large enough */
-			if (tst->ss->avail_size(tst, ldsize/512) <
-			    array_size) {
-				fprintf(stderr, Name ": %s not large enough to join array\n",
-					dv->devname);
-				return 1;
-			}
-
 			/* need to find a sample superblock to copy, and
 			 * a spare slot to use
 			 */
@@ -386,6 +378,15 @@ int Manage_subdevs(char *devname, int fd,
 				fprintf(stderr, Name ": cannot find valid superblock in this array - HELP\n");
 				return 1;
 			}
+
+			/* Make sure device is large enough */
+			if (tst->ss->avail_size(tst, ldsize/512) <
+			    array_size) {
+				fprintf(stderr, Name ": %s not large enough to join array\n",
+					dv->devname);
+				return 1;
+			}
+
 			/* Possibly this device was recently part of the array
 			 * and was temporarily removed, and is now being re-added.
 			 * If so, we can simply re-add it.
diff --git a/bitmap.c b/bitmap.c
index fdf8884..b647939 100644
--- a/bitmap.c
+++ b/bitmap.c
@@ -115,6 +115,15 @@ unsigned long long bitmap_bits(unsigned long long array_size,
 	return (array_size * 512 + chunksize - 1) / chunksize;
 }
 
+unsigned long bitmap_sectors(struct bitmap_super_s *bsb)
+{
+	unsigned long long bits = bitmap_bits(__le64_to_cpu(bsb->sync_size),
+					      __le32_to_cpu(bsb->chunksize));
+	int bits_per_sector = 8*512;
+	return (bits + bits_per_sector - 1) / bits_per_sector;
+}
+
+
 bitmap_info_t *bitmap_fd_read(int fd, int brief)
 {
/* Note: fd might be open O_DIRECT, so we must be
diff --git a/mdadm.h b/mdadm.h
index 5c18d15..ce140e5 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -474,6 +474,7 @@ extern int CreateBitmap(char *filename, int force, char uuid[16],
 			int major);
 extern int ExamineBitmap(char *filename, int brief, struct supertype *st);
 extern int bitmap_update_uuid(int fd, int *uuid, int swap);
+extern unsigned long bitmap_sectors(struct bitmap_super_s *bsb);
 
 extern int md_get_version(int fd);
 extern int get_linux_version(void);
diff --git a/super1.c b/super1.c
index fe915f8..e1d0219 100644
--- a/super1.c
+++ b/super1.c
@@ -1214,10 +1214,21 @@ static struct supertype *match_metadata_desc1(char *arg)
  */
 static __u64 avail_size1(struct supertype *st, __u64 devsize)
 {
+	struct mdp_superblock_1 *super = st->sb;
 	if (devsize < 24)
 		return 0;
 
-	devsize -= choose_bm_space(devsize);
+	if (super == NULL)
+		/* creating:  allow suitable space for bitmap */
+		devsize -= choose_bm_space(devsize);
+#ifndef MDASSEMBLE
+	else if (__le32_to_cpu(super->feature_map) & MD_FEATURE_BITMAP_OFFSET) {
+		/* hot-add. allow for actual size of bitmap */
+		struct bitmap_super_s *bsb;
+   

Bug#496334: mdadm segfault on --assemble --force with raid10

2008-10-12 Thread Neil Brown

I believe this bug is fixed by

 
http://neil.brown.name/git?p=mdadm;a=commitdiff;h=60b435db5a7b085ad1204168879037bf14ebd6d1

see below.

NeilBrown


From: Chris Webb [EMAIL PROTECTED]
Date: Thu, 19 Jun 2008 06:30:39 + (+1000)
Subject: Fix bug in forced assemble.
X-Git-Tag: mdadm-3.0-devel1~76^2~12
X-Git-Url: 
http://neil.brown.name/git?p=mdadm;a=commitdiff_plain;h=60b435db5a7b085ad1204168879037bf14ebd6d1

Fix bug in forced assemble.

From: Chris Webb [EMAIL PROTECTED]

We are loading into the already-loaded 'st' instead of the
newly created 'tst', which is clearly wrong.
---

diff --git a/Assemble.c b/Assemble.c
index 36b2304..79f0912 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -656,7 +656,7 @@ int Assemble(struct supertype *st, char *mddev, int mdfd,
 			continue;
 		}
 		tst = dup_super(st);
-		if (tst->ss->load_super(st,fd, NULL)) {
+		if (tst->ss->load_super(tst,fd, NULL)) {
 			close(fd);
 			fprintf(stderr, Name ": RAID superblock disappeared from %s - not updating.\n",
 				devices[chosen_drive].devname);






Bug#495580: mdadm: 4 disk raid10 with 1 active and 3 spare possible

2008-10-12 Thread Neil Brown
On Wednesday August 20, [EMAIL PROTECTED] wrote:
 
 The current Debian sid kernel 2.6.26-2 does sync the disks if I add
 them.
 I tried now even the newer 2.6.27-rc3-git6 out, because it has a few MD
 changes.
 That one adds the disks only as `spare' but does not start a resync.

I cannot reproduce this.

This is surprising though:

 fz-vm:~# cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
 [multipath] 
 md0 : active raid10 sdc1[4](S) sdf1[3] sde1[2] sdd1[1]
   16771584 blocks 64K chunks 2 near-copies [4/3] [_UUU]
   
  unused devices: <none>

Here there is one failed device.  There is one spare but the array
isn't resyncing.

And...

 fz-vm:~# mdadm -Q --detail /dev/md0
 /dev/md0:
 Version : 00.90
   Creation Time : Wed Aug 20 09:03:27 2008
  Raid Level : raid10
  Array Size : 16771584 (15.99 GiB 17.17 GB)
   Used Dev Size : 8385792 (8.00 GiB 8.59 GB)
Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 0
 Persistence : Superblock is persistent
 
 Update Time : Wed Aug 20 09:07:32 2008
   State : clean, degraded
  Active Devices : 3
 Working Devices : 4
  Failed Devices : 0
   Spare Devices : 1
 
  Layout : near=2, far=1
  Chunk Size : 64K
 
UUID : 04e72c15:980e57a0:89ccbef7:ff5abfb0 (local to host fz-vm)
  Events : 0.10
 
 Number   Major   Minor   RaidDevice State
0   000  removed
1   8   491  active sync   /dev/sdd1
2   8   652  active sync   /dev/sde1
3   8   813  active sync   /dev/sdf1
 
4   8   33-  spare   /dev/sdc1
 fz-vm:~# cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
 [multipath] 
 md0 : active raid10 sdd1[4](S) sdc1[5](S) sdf1[3] sde1[2]
   16771584 blocks 64K chunks 2 near-copies [4/2] [__UU]

Here there are two failed devices.  What happened?

Do the kernel logs show anything between these two cat /proc/mdstats?


NeilBrown






Bug#492970: (was: nfs-utils-1.1.3 released)

2008-08-03 Thread Neil Brown
On Saturday August 2, [EMAIL PROTECTED] wrote:
 On Fri, Aug 01, 2008 at 11:15:33PM +1000, Aníbal Monsalve Salazar wrote:
  On Mon, Jul 28, 2008 at 03:13:19AM -0400, Steve Dickson wrote:
  I just cut the 1.1.3 nfs-utils release. Unfortunately I'm having
  issues accessing my kernel.org account so for the moment the 
  tar ball is only available on SourceForge:
  
   http://sourceforge.net/projects/nfs
  [...] 
  
  1.1.3 clients don't work with a 1.0.10 server anymore.
 
 Very weird--it might make sense if upgrading nfs-utils broke the mount
 itself, but here it seems the mount is succeeding and subsequent file
 access (which I'd expect to only involve the in-kernel client code) is
 failing.  Maybe there's some difference in the mount options?  What does
 /proc/self/mounts say?  I assume these are all v2 or v3 mounts?

I'm guessing v4 and that idmapd is causing problems.  I cannot see any
other possible cause (not that this one seems likely).

If we can get a tcpdump -s0 port 2049 of the traffic it might help.

NeilBrown






Bug#489257: mdadm.conf manpage does not explain HOMEHOST option

2008-07-06 Thread Neil Brown
On Friday July 4, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.6.7-2
 Severity: minor
 
 Hi,
 
 the mdadm.conf man page does not explain the HOMEHOST setting.

True.
A future release will have this text:

   HOMEHOST
	  The homehost line gives a default value for the --homehost=
	  option to mdadm.  There should be exactly one other word on the
	  line.  It should be either exactly <system> or a host name.  If
	  <system> is given, then the gethostname(2) system call is used
	  to get the host name.  When arrays are created, this host name
	  will be stored in the metadata.  When arrays are assembled using
	  auto-assembly, only arrays with this host name stored in the
	  metadata will be considered.



Thanks.

NeilBrown






Bug#484460: hal-info: Video Quirks wrong for Dell Latitude D820

2008-06-04 Thread Neil Brown
Package: hal-info
Version: 20080317+git20080318-1
Severity: normal


My Dell Latitude D820 doesn't have working video after
a suspend/resume cycle.

It seems that the quirk in 20-video-quirk-pm-dell.fdi is wrong,
at least for my notebook.
It contains
  <!-- the Dell D820 is also reported to work with vbe_post+vbemode_restore
       and may need dpms_on -->
  <merge key="power_management.quirk.vbestate_restore" type="bool">true</merge>

I need
  <merge key="power_management.quirk.vbe_post" type="bool">true</merge>
  <merge key="power_management.quirk.vbemode_restore" type="bool">true</merge>
  <merge key="power_management.quirk.vbestate_restore" type="bool">false</merge>

to get suspend/resume to work.  So I support the comment that it works
with vbe_post+vbemode_restore, and add that it does NOT work with 
vbestate_restore.

This is with
 <match key="system.firmware.version" contains="A04">

I have created an information/20thirdparty/ file to fix it locally myself,
but getting this correction upstream would be nice.

NeilBrown

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

-- no debconf information






Bug#474548: mdadm: error hot-adding drive to raid1 array after fail

2008-04-29 Thread Neil Brown
On Friday April 11, [EMAIL PROTECTED] wrote:
 tags 474548 upstream
 thanks
 
 also sprach Alex Samad [EMAIL PROTECTED] [2008.04.06.1525 +0200]:
  seems like there is an error in 2.6.x stream of mdadm, when trying to
  hot add a disk to a raid1 array after 1 disk failed.
  
 I tried to do it and I kept getting errors. I downgraded to 2.5.x and it
  worked fine.
  
  Found this thread (with patch)
 
 Any chance of getting this patch[0] into upstream?
 
 0. http://www.mail-archive.com/[EMAIL PROTECTED]/msg10248.html

OK, it will appear in my .git shortly.

NeilBrown






Bug#463769: Fails on files (ie disk images)

2008-04-29 Thread Neil Brown
On Friday April 11, [EMAIL PROTECTED] wrote:
 also sprach Wakko Warner [EMAIL PROTECTED] [2008.04.11.1802 +0200]:
  That wasn't the point.  The point was I wasn't able to examine
  the file. There's absolutely no reason that one has to attach it
  to a loopback device just to examine the md superblock.
 
 md is multi-device. A file is not a device. Of course, mdadm
 could work on files, it could also implicitly gunzip them and do
 character translation and all the like. But it doesn't, and it
 won't.
 
 Now, at least this is my interpretation. I CC'd upstream and if he
 disagrees, I'll take it all back. Neil, the bug report is at
 http://bugs.debian.org/463769.

I don't entirely disagree with Martin, but then I don't entirely
disagree with the patch either... conceptually at least.

The O_LARGEFILE changes are not needed as the _FILE_OFFSET_BITS
change makes them unnecessary.  And there is some bad indenting...

That fixed patch will appear in my .git shortly.

NeilBrown







Bug#462154: mdadm: It's not only --monitor --scan

2008-04-28 Thread Neil Brown
On Friday April 11, [EMAIL PROTECTED] wrote:
  I have attached said patch, maybe it should be included in the
  Debian package?
 
 Jonathan, does the patch fix things for you?
 
 Neil, the patch on
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462154 is not
 upstream yet. Should I add it to Debian? Or would you consider
 including it and releasing 2.6.5?

It fixed it differently and with

 
http://neil.brown.name/git?p=mdadm;a=commitdiff;h=2cdb64897d4fe33a11af13c6356dcd338c561e77

which I recently added to my .git.

Yes, I need to release 2.6.5 sometime soon!

NeilBrown


 
 -- 
  .''`.   martin f. krafft [EMAIL PROTECTED]
 : :'  :  proud Debian developer, author, administrator, and user
 `. `'`   http://people.debian.org/~madduck - http://debiansystem.info
   `-  Debian - when you have better things to do than fixing systems






Bug#462154: mdadm: It's not only --monitor --scan

2008-03-05 Thread Neil Brown
On Tuesday March 4, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.6.4-1
 Followup-For: Bug #462154
 
 
 I've had it after a few creates (raid benchmarking atm) as well, and 
 assembles.
 
 It seems to crash *after* what it's supposed to do, though, i. e.:
 
 segfault at  rip 00412d2c rsp 7fffc0bf9eb0 error 4
 segfault at  rip 00412d2c rsp 7fff9f31b5d0 error 4
 segfault at  rip 00412d2c rsp 7fffd6f371f0 error 4
 segfault at  rip 00412d2c rsp 7fffa08b5b70 error 4
 segfault at  rip 00412d2c rsp 7fff78944c00 error 4
 segfault at  rip 00412d2c rsp 7fffc0bf9eb0 error 4

If this is at all repeatable, then it would be great if you can run it
under 'gdb' and get a stack trace.
i.e
   gdb `which mdadm`
   run ...mdadm options
  wait for segfault
   where

Thanks,
NeilBrown






Bug#458327: closed by martin f krafft [EMAIL PROTECTED] (Re: Bug#458327: on reboot mdadm fails to mount all drives in a raid5 array)

2008-01-03 Thread Neil Brown

It looks like you are relying on in-kernel autodetection, and sdd2
and sdd1 don't have the RAID Autodetect partition type.

I would change all the partition types to something else myself, but
it would work to change those two to be Autodetect.

NeilBrown






Bug#446323: mdadm: recovery in infinite loop

2007-10-15 Thread Neil Brown
On Monday October 15, [EMAIL PROTECTED] wrote:
 
 seems as the size is the same?

seems.
I was hoping for 
   cat /proc/partitions
and maybe even
   fdisk -l /dev/hda /dev/hdb

I should have been more specific.

NeilBrown






Bug#446323: mdadm: recovery in infinite loop

2007-10-15 Thread Neil Brown

As you say, the devices are exactly the same size, thanks.

On Monday October 15, [EMAIL PROTECTED] wrote:
 
 how do I undo?  mdadm /dev/md2 -f /dev/hda2
 So I could try the sync in init 1
 Lucas

Well, you could:
  mdadm /dev/md2 -f /dev/hda2
  mdadm /dev/md2 -r /dev/hda2

then when you are ready to try again

  mdadm /dev/md2 -a /dev/hda2

I think there must be something odd happening with the drive or
controller.  I notice that the two devices are on the same IDE
channel, which is sometimes a source of problems, though it shouldn't
behave like this.

If you feel up to patching the kernel, recompiling, and experimenting,
I can send you a patch which should provide more detailed information
on what is happening.  Let me know what kernel version you will be
working with.

NeilBrown






Bug#443591: [NFS] Bug#443591: nfs-kernel-server: Unexporting directories no longer working

2007-10-14 Thread Neil Brown
On Monday September 24, [EMAIL PROTECTED] wrote:
 
 Hi Neil,
 
 Thanks for looking.
 
 Neil Brown wrote:
  On Sunday September 23, [EMAIL PROTECTED] wrote:

  On Sat, Sep 22, 2007 at 10:42:31AM -0700, David Liontooth wrote:
  
  Package: nfs-kernel-server
  Version: 1:1.1.0-13
  Severity: normal
 
 
  The command to unexport a directory appears to no longer have any effect.
 
  I issue exportfs -u :/tv01 and exportfs shows /tv01 still exported; 
  consequently, I cannot unmount it.
  In contrast, removing /tv01 from /etc/exports and then running exportfs 
  -ra successfully removes the export.
 
  This used to work fine.

  Sending this on to upstream, as I cannot see any good reason offhand why it
  should not work.
  
 
  Some simple testing and code review suggests that this works as
  expected.  However it is possible that I am expecting something
  different to you, or testing something different.
 
  You say you:
  exportfs -u :/tv01
 
  What exactly is in your /etc/exports that this is expected to revert?

 
 /tv01 \
 134.32.443.30(ro,no_subtree_check,async) \
 134.32.443.32(ro,no_subtree_check,async) \
 134.32.443.33(ro,no_subtree_check,async) \
 134.32.443.34(ro,no_subtree_check,async) \
 134.32.443.35(ro,no_subtree_check,async) \
 134.32.443.36(ro,no_subtree_check,async) \
 134.32.443.37(ro,no_subtree_check,async)

To unexport /tv01, you would need to individually unexport each of
those exports.  Or edit /etc/exports to remove those lines and
   exportfs -r

 
 Several other drives have similar entries.
  The obvious answer would be
 
  /tv01  (some,flags,here)
 
  however exportfs will complain about that, so I suspect not.
 
  Maybe you have:
 
  /tv01 somehost(someflags)  otherhost(otherflags)
 
  and you expect
  exportfs -u :/tv01
 
  to unexport /tv01 to all hosts?  I would agree that doesn't work.  Did
  it ever?  What version?

 
 I see. So that would unexport only the first one?

No, it would not unexport anything, as that is asking to stop exporting
it to the wildcard host (matches anything) and it is not currently
exported to the wildcard host.

 
 Can I unexport only /tv01 to all hosts?
 (If it's just a matter of my being uninformed, let's close the bug --
 but I'd appreciate an answer!)

No.  You cannot currently unexport a filesystem.  You can only unexport
an 'export' which is a host:filesystem combination.

 
  As an aside, you can always:
 exportfs -f
  and then unmount filesystems.  They will be free to be unmounted until
  the next NFS access request arrives.  Maybe that will serve your
  needs?

 I see -- that may be helpful -- but what if someone is accessing one of
 the drives right then?

If you 
  exportfs -f ; umount /tv01
there is a chance that a request will arrive between the two, so the
umount will fail.
You could instead
   umount -l /tv01 ; exportfs -f

which will avoid the race and be just as effective.


 I would prefer to have individual control; I export a dozen other drives
 to several different machines, and they should not be unexported.

exportfs -f will not exactly unexport them.  It just removes cached
information from the kernel so that it has to ask mountd again.
So the most you will notice is a slight pause, and you probably won't
notice that unless the system is very busy and there are lots of
mounts - or hostname/netgroup lookup is very slow.

NeilBrown






Bug#445573: auto-ro raids have 'recover' for sync-action

2007-10-14 Thread Neil Brown
On Monday October 8, [EMAIL PROTECTED] wrote:
 reassign 445573 linux-image-2.6.18-5-686
 tags 445573 patch confirmed
 severity 445573 minor
 thanks
 
 also sprach Neil Brown [EMAIL PROTECTED] [2007.10.08.0231 +0100]:
  Yep, this is a kernel bug.
  This should fix it.
 
 Can you provide an ETA as to when this will be upstream? 2.6.23?
 .24?
 
 I am forwarding the bug to our kernel team and they'll be glad to
 have this information.

I've just submitted it to Andrew and it should be in 2.6.24.  I don't
think it warrants a patch to 2.6.23.y.

NeilBrown






Bug#446323: mdadm: recovery in infinite loop

2007-10-12 Thread Neil Brown
On Friday October 12, [EMAIL PROTECTED] wrote:
 yes I have.
 Then I tried booting in knoppix to see if maybe there are hdd problems
 so I run e2fsck on both drives to see if there are any issues.
 reboot and it still does the same thing.

Ok, it must be caused by some persistent state.
Could you send
  mdadm -E /dev/hda2
  mdadm -E /dev/hdb2

and the exact partition layouts of /dev/hda and /dev/hdb

 
 Is mdadm able to sync in init 1?

The array will start resync as soon as it is assembled.  So I think
the answer to your question is 'yes'.

 Is there a debug mode so I could get more logs for the mdadm?

It isn't 'mdadm', it is the kernel.  And no, there is no debug mode.

 If this seems as a kernel problem, how could I find out that is the
 case? Where would I start?

I can't tell what you mean by this question.

My current guess is that hda2 might be slightly smaller than hdb2 -
though that is really just clutching at straws.

I suggest you 
 mdadm /dev/md2 -f /dev/hda2
so that it doesn't keep trying to rebuild all the time.

NeilBrown






Bug#446323: mdadm: recovery in infinite loop

2007-10-11 Thread Neil Brown
On Thursday October 11, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.5.6-9
 Severity: normal
 
 Hello,
 I am trying to setup raid1 on my pc.
 I have 2 identical drives. 270+gb. each with 3 partitions. 
 30gb hdb1   - md0
 250gb hdb2   -md2
 4gb swap hdb5  -md4
 
 Initially my raid had only one drive. I have added the second one with 
 mdadm --add /dev/md2 /dev/hda1 then 2 then 4
 
 It started doing recovery for drives. IT finished for md0,md4 but for
 md2 it is in infinite loop. IT goes to 15 % and starts again

Very weird

This is a kernel problem rather than an 'mdadm' problem, but anyway...

The recovery process thinks that it is either getting an error or being
interrupted.

If it was getting a write error to the new drive, that drive would be
marked faulty, so that isn't happening.
If it was getting a read error from the good drive, it would say
   raid1: md2: unrecoverable I/O read error for block XX
and that isn't happening.  So it cannot be an IO error.

There are three ways to interrupt the resync.

 1/ Send a signal to the thread.  That would result in the message
md: md_do_sync() got signal ... exiting
 2/ Stop the array (mdadm -S).  If you did that, the array would stop.
 3/ Write 'idle' to /sys/block/md2/md/sync_action

Given the logs that you posted, the last one is the only option I can
think of.  But why would you be doing that?
It could be done with
/usr/share/mdadm/checkarray -x all
but I cannot see that being done either.

So I am confused.
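For what it's worth, option 3 amounts to a single write.  A minimal sketch of the interface, run here against a mock path rather than the real /sys/block/md2/md/sync_action (which would need root and a live array):

```shell
# Mock stand-in for /sys/block/md2/md/sync_action (hypothetical path).
mock=$(mktemp -d)
sync_action="$mock/sync_action"
echo resync > "$sync_action"   # pretend a resync is in progress
echo idle   > "$sync_action"   # this single write is what interrupts it
cat "$sync_action"
```

On a real array the kernel tracks the state, not the file contents; the mock only illustrates the write that checkarray -x performs.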

Have you tried rebooting to see if that helps?

NeilBrown






Bug#435522: latex2html: Does not correctly handle framebox in picture environments

2007-10-07 Thread Neil Brown
On Wednesday October 3, [EMAIL PROTECTED] wrote:
 Hi,
 
 thanks for your report and the patch and sorry for the delay.
 
 Neil Brown wrote:
  latex2html seems to expect framebox to take two optional (square bracket)
  arguments before the main argument.
  In fact it takes a (), a [] and then the {}.
 
 I checked the LaTeX user guide (Latex2e for authors), and it defines
 framebox as [][]{}.

There seem to be 2 framebox commands.

Looking at my LaTeX User's Guide and Reference Manual (for 2.09)

On page 194 in C.12.3 Boxes we have

   \framebox[width][pos]{text}

while on page 197 in C.13.2 Picture Objects we have

   \framebox(x_dimen,y_dimen)[pos]{text}

 
 Therefore, I'm closing this bug for now since the problem looks like
 intended behaviour. But feel free to reopen it if you still have good
 reasons.

So I think I have good reason, but I suspect my patch is not
sufficient.  It needs to either detect if we are in a 'picture'
environment and adjust the expected arguments accordingly, or simply
allow either pattern in all contexts.

Thanks,
NeilBrown






Bug#445573: auto-ro raids have 'recover' for sync-action

2007-10-07 Thread Neil Brown
On Sunday October 7, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.6.2-1
 Severity: normal
 Tags: upstream
 
 sync_action cannot deal with auto-read-only, it seems. Neil, can you
 please confirm that you saw this?

Yep, this is a kernel bug.

This should fix it.

NeilBrown


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c   2007-08-20 11:36:09.0 +1000
+++ ./drivers/md/md.c   2007-10-08 11:20:00.0 +1000
@@ -2723,7 +2723,7 @@ action_show(mddev_t *mddev, char *page)
 {
	char *type = "idle";
	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-	    test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) {
+	    (!mddev->ro && test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))) {
		if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
			type = "reshape";
		else if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {







Bug#443591: [NFS] Bug#443591: nfs-kernel-server: Unexporting directories no longer working

2007-09-24 Thread Neil Brown
On Sunday September 23, [EMAIL PROTECTED] wrote:
 On Sat, Sep 22, 2007 at 10:42:31AM -0700, David Liontooth wrote:
  Package: nfs-kernel-server
  Version: 1:1.1.0-13
  Severity: normal
  
  
  The command to unexport a directory appears to no longer have any effect.
  
  I issue exportfs -u :/tv01 and exportfs shows /tv01 still exported; 
  consequently, I cannot unmount it.
  In contrast, removing /tv01 from /etc/exports and then running exportfs -ra 
  successfully removes the export.
  
  This used to work fine.
 
 Sending this on to upstream, as I cannot see any good reason offhand why it
 should not work.

Some simple testing and code review suggests that this works as
expected.  However it is possible that I am expecting something
different to you, or testing something different.

You say you:
exportfs -u :/tv01

What exactly is in your /etc/exports that this is expected to revert?
The obvious answer would be

/tv01  (some,flags,here)

however exportfs will complain about that, so I suspect not.

Maybe you have:

/tv01 somehost(someflags)  otherhost(otherflags)

and you expect
exportfs -u :/tv01

to unexport /tv01 to all hosts?  I would agree that doesn't work.  Did
it ever?  What version?

As an aside, you can always:
   exportfs -f
and then unmount filesystems.  They will be free to be unmounted until
the next NFS access request arrives.  Maybe that will serve your
needs?

NeilBrown






Bug#442874: mdadm: --write-mostly does nothing if device is re-added using --add

2007-09-23 Thread Neil Brown

Thanks for the bug report.

This is now fixed in the upstream .git by the following patch.

It highlights the fact that while you can turn on the write-mostly
bit, you cannot easily turn it off.  I wonder if that is a problem.

NeilBrown


### Diffstat output
 ./Manage.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff .prev/Manage.c ./Manage.c
--- .prev/Manage.c  2007-08-10 13:36:18.0 +1000
+++ ./Manage.c  2007-09-24 13:09:09.0 +1000
@@ -395,6 +395,8 @@ int Manage_subdevs(char *devname, int fd
 				disc.number = mdi.disk.number;
 				disc.raid_disk = mdi.disk.raid_disk;
 				disc.state = mdi.disk.state;
+				if (dv->writemostly)
+					disc.state |= 1 << MD_DISK_WRITEMOSTLY;
 				if (ioctl(fd, ADD_NEW_DISK, &disc) == 0) {
 					if (verbose >= 0)
 						fprintf(stderr, Name ": re-added %s\n", dv->devname);






Bug#435522: latex2html: Does not correctly handle framebox in picture environments

2007-08-01 Thread Neil Brown
Package: latex2html
Version: 2002-2-1-20050114-5
Severity: normal
Tags: patch


latex2html seems to expect framebox to take two optional (square bracket)
arguments before the main argument.
In fact it takes a (), a [] and then the {}.

This patch fixes it for me.

--- /usr/bin/latex2html.orig2006-04-17 18:28:39.0 +1000
+++ /usr/bin/latex2html 2007-08-01 20:38:03.0 +1000
@@ -16137,7 +16137,7 @@
 process_commands_in_tex (_RAW_ARG_CMDS_);
 psfig # {} # \$args =~ s/ //g;
 usebox # {}
-framebox # [] # [] # {}
+framebox # () # [] # {}
 _RAW_ARG_CMDS_
 
 # ... but these are set in a box to measure height/depth 


Thanks.
NeilBrown


-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.22-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages latex2html depends on:
ii  gs-esp [gs]  8.15.3.dfsg.1-1 The Ghostscript PostScript interpr
ii  netpbm   2:10.0-11   Graphics conversion tools
ii  perl 5.8.8-7 Larry Wall's Practical Extraction 
ii  perl-doc 5.8.8-7 Perl documentation
ii  tetex-bin2007-10 TeX Live: teTeX transitional packa
ii  tetex-extra  2007-10 TeX Live: teTeX transitional packa
ii  texlive-base-bin 2007-12 TeX Live: Essential binaries
ii  texlive-fonts-recommende 2007-10 TeX Live: Recommended fonts
ii  texlive-latex-recommende 2007-10 TeX Live: LaTeX recommended packag

latex2html recommends no packages.

-- no debconf information





Bug#434565: vm: 8bit characters are not escaped in In-reply-to field

2007-07-24 Thread Neil Brown
Package: vm
Version: 7.19-14
Severity: normal

When I reply to an email from someone with an 8-bit character in their name, 
such as 
  =?iso-8859-1?Q?V=EDctor_Paesa?= [EMAIL PROTECTED]

The In-reply-to field gets the un-escaped name:
  In-Reply-To: message from Víctor Paesa on Sunday July 15

And a number of mail systems reject the email, as 8-bit characters in
headers are illegal.


-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.21-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages vm depends on:
ii  emacs21   21.4a+1-5  The GNU Emacs editor
ii  ucf   3.001  Update Configuration File: preserv

Versions of packages vm recommends:
ii  make  3.81-3 The GNU version of the make util

-- debconf-show failed





Bug#416512: removed disk md-device

2007-05-11 Thread Neil Brown
On Thursday May 10, [EMAIL PROTECTED] wrote:
 
 No, I haven't, but it is getting near the top of my list.

I have just committed a change to the mdadm .git so that
   mdadm /dev/md4 --fail detached

will fail any components of /dev/md4 that appear to be detached (open
returns -ENXIO). and
   mdadm /dev/md4 --remove detached
will remove any such devices (that are failed or spare).
so

   mdadm /dev/md4 --fail detached --remove detached

will get rid of any detached devices completely, as will

   mdadm /dev/md4 --fail detached --remove failed

though that will also remove any failed devices that don't happen to
be detached.

NeilBrown





Bug#416512: removed disk md-device

2007-05-10 Thread Neil Brown
On Wednesday May 9, [EMAIL PROTECTED] wrote:
 
 Neil Brown [EMAIL PROTECTED] [2007.04.02.0953 +0200]:
 Hmmm... this is somewhat awkward.  You could argue that udev should be
 taught to remove the device from the array before removing the device
 from /dev.  But I'm not convinced that you always want to 'fail' the
 device.   It is possible in this case that the array is quiescent and
 you might like to shut it down without registering a device failure...
 
 Hmm, the kernel advised hotplug to remove the device from /dev, but you
 don't want to remove it from md? Do you have an example for that case?

Until there is known to be an inconsistency among the devices in an
array, you don't want to record that there is.

Suppose I have two USB drives with a mounted but quiescent filesystem
on a raid1 across them.
I pull them both out, one after the other, to take them to my friends
place.

I plug them both in and find that the array is degraded, because as
soon as I unplugged one, the other was told that it was now the only
one.
Not good.  Best to wait for an IO request that actually returns an
error.

 
 Maybe an mdadm command that will do that for a given device, or for
 all components of a given array if the 'dev' link is 'broken', or even
 for all devices for all array.
 
mdadm --fail-unplugged --scan
 or
mdadm --fail-unplugged /dev/md3
 
 Ok, so one could run this as cron script. Neil, may I ask if you already 
 started to work on this? Since we have the problem on a customer system, we 
 should fix it ASAP, but at least within the next 2 or 3 weeks. If you didn't 
 start work on it yet, I will do...

No, I haven't, but it is getting near the top of my list.
If you want a script that does this automatically for every array,
something like:

  for a in /sys/block/md*/md/dev-*
  do
    if [ -f $a/block/dev ]
    then : still there
    else
      echo faulty > $a/state
      echo remove > $a/state
    fi
  done

should do what you want. (I haven't tested it though).
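As a sanity check of the loop's logic without touching a live array, the same test can be dry-run against a mock directory tree (every path below is a stand-in for the real /sys/block layout; md0, sda1 and sdb1 are hypothetical names):

```shell
mock=$(mktemp -d)
mkdir -p "$mock/md0/md/dev-sda1/block" "$mock/md0/md/dev-sdb1"
touch "$mock/md0/md/dev-sda1/block/dev"   # sda1's backing device still present
# dev-sdb1 has no block/dev entry: its device has been unplugged
for a in "$mock"/md*/md/dev-*
do
  if [ -f "$a/block/dev" ]
  then : still there
  else
    echo faulty  > "$a/state"   # on a real system the md driver acts on these
    echo remove >> "$a/state"
  fi
done
cat "$mock/md0/md/dev-sdb1/state"
```

Only the unplugged member gets "faulty" and "remove" written to its state file; the still-present member is left alone.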

NeilBrown





Bug#416512: removed disk md-device

2007-05-10 Thread Neil Brown
On Thursday May 10, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  On Wednesday May 9, [EMAIL PROTECTED] wrote:
  Neil Brown [EMAIL PROTECTED] [2007.04.02.0953 +0200]:
  Hmmm... this is somewhat awkward.  You could argue that udev should be
  taught to remove the device from the array before removing the device
  from /dev.  But I'm not convinced that you always want to 'fail' the
  device.   It is possible in this case that the array is quiescent and
  you might like to shut it down without registering a device failure...
  Hmm, the kernel advised hotplug to remove the device from /dev, but
  you don't want to remove it from md? Do you have an example for that case?
  
  Until there is known to be an inconsistency among the devices in an
  array, you don't want to record that there is.
  
  Suppose I have two USB drives with a mounted but quiescent filesystem
  on a raid1 across them.
  I pull them both out, one after the other, to take them to my friends
  place.
  
  I plug them both in and find that the array is degraded, because as
  soon as I unplugged one, the other was told that it was now the only
  one.
 And, in truth, so it was.

So what was?
It is true that now one drive is the only one plugged in, but is
that relevant?
Is it true that the one drive is the only drive in the array??
That depends on what you mean by the array.  If I am moving the
array to another computer, then the one drive still plugged into the
first computer is not the only drive in the array from my
perspective.

If there is a write request, and it can only be written to one drive
(because the other is unplugged), then it becomes appropriate to tell
the still-present drive that it is the only drive in the array.

 
 Who updated the event count though?

Sorry, not enough words.  I don't know what you are asking.

 
  Not good.  Best to wait for an IO request that actually returns an
  errors. 
 Ah, now would that be a good time to update the event count?

Yes.  Of course.  It is an event (IO failed).  That makes it a good
time to update the event count.. am I missing something here?

 
 
 Maybe you should allow drives to be removed even if they aren't faulty or 
 spare?
 A write to a removed device would mark it faulty in the other devices without
 waiting for a timeout.

Maybe, but I'm not sure what the real gain would be.

 
 But joggling a usb stick (similar to your use case) would probably be OK since
 it would be hot-removed and then hot-added.

This still needs user-space interaction.
If the USB layer detects a removal and a re-insert, sdb may well come
back as something different (sdp?) - though I'm not completely familiar
with how USB storage works.

In any case, it should really be a user-space decision what happens
then.  A hot re-add may well be appropriate, but I wouldn't want to
have the kernel make that decision.

NeilBrown






Bug#422554: After reboot, drives marked faulty/removed get reassembled as the mirrored array instead of the healthy ones

2007-05-06 Thread Neil Brown
On Sunday May 6, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.5.6-9
 
 I don't know whether this is a bug, but here's what happened.
 I had two mirrors operational /dev/hd[ac]1 and /dev/hd[ac]3 on md0 and md1, 
 respectively.
 On Apr 23, /dev/hdc started to complain about bad sectors. I first marked it 
 as faulty, 
 and then removed it from the array with mdadm. I didn't have physical access 
 to the machine, 
 so I didn't remove /dev/hdc. On Apr 27, the owner of the machine rebooted the 
 machine with ssh - reboot

You marked it faulty ... so it didn't get marked faulty itself?

Do you have kernel logs from when it got marked faulty?  That might
help make it clearer what happened.

NeilBrown





Bug#415441: s2disk and raid

2007-04-03 Thread Neil Brown
On Tuesday April 3, [EMAIL PROTECTED] wrote:
 Hi,
 
 I've got a bugreport [0] from a user trying to use raid and uswsusp. He's
 using initramfs-tools available in debian. I'll describe the problem
 and my analysis, maybe you can comment on what you think. A warning: I only
 have a casual understanding of raid, never looked at any code related to it.
 
 This is a setup where root maybe on raid, but swap isn't. Swap on raid
 will be very difficult to support, I think.

Nah... shouldn't be a problem.  Well, maybe raid5.

 
 When s2disk is started, nothing special is done to the array. It may be
 in an unclean state (just like filesystems). Image is written to disk.
 
 After the power cycle the kernel boots, devices are discovered, among
 which the ones holding raid. Then we try to find the device that holds
 swap in case of resume and / in case of a normal boot.
 
 Now comes a crucial point. The script that finds the raid array, finds
 the array in an unclean state and starts syncing.

Uhm, so you are finding the device for the root filesystem before you
have decided which case it will be (resume or normal boot).  Can that
be delayed until after the decision?  It's probably not important but
it seems neater.
Or do you need the root device even when resuming (I guess if swap is
in a file on the root filesystem)

The trick is to use the 'start_ro' module parameter.
  echo 1 > /sys/module/md_mod/parameters/start_ro

Then md will start arrays assuming read-only.  No resync will be
started, no superblock will be written.  They stay this way until the
first write at which point they become normal read-write and any
required resync starts.

So you can start arrays 'readonly', and resume off a raid1 without any
risk of the resync starting when it shouldn't.

It is probably best to echo 0 back into that parameter once you have
committed to a normal boot, but it isn't really critical.
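Put together, the suggested boot-time ordering looks roughly like this.  A sketch only: the parameter path is the one given above, and $SYSFS stands in for /sys so the sequence can be dry-run against a mock tree (on a real system it would simply be /sys, written as root from the initramfs):

```shell
SYSFS=$(mktemp -d)                    # stand-in for /sys; mock tree for a dry run
mkdir -p "$SYSFS/module/md_mod/parameters"
start_ro="$SYSFS/module/md_mod/parameters/start_ro"

echo 1 > "$start_ro"   # before assembly: arrays come up read-only, no resync
# ... assemble arrays here, then attempt the resume-from-disk ...
echo 0 > "$start_ro"   # committed to a normal boot: writes and resync allowed
cat "$start_ro"
```

The key point is only the ordering: set start_ro before any array is assembled, and clear it once resume has been ruled out.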

 
 The debian-maintainer of mdadm thinks that the suspend process should
 have left the array in a clean state, but this is IMHO impossible.

It probably would be best if suspend left the array in a clean
state.  It shouldn't be too hard, but it needs to be done in the
kernel.
However it isn't critical to all of this working well.

I mentioned above that if swap in on raid5 it might be awkward.  This
is because raid5 caches some data that is on disk.  If you snapshot
the raid5 memory, then resume raid5 so it can write to disk, when you
come back from suspend you could have old data in the cache.  It
should be possible to fix this, but it is currently a potential
problem that might be worth warning people against.

NeilBrown





Bug#413450: libc6: svc_getreqset doesn't work on 64bit hosts

2007-03-04 Thread Neil Brown
Package: libc6
Version: 2.3.6.ds1-13
Severity: normal
Tags: patch

On 64bit machines, svc_getreqset ignores file descriptors 32-63 and others.
This has been fixed in glibc 2.4 but needs to be
fixed in 2.3.6.debian while we keep using it. 
The consequence of this bug is that if mountd (or any other rpc server) 
gets 26 or more concurrent tcp connections, it goes into a spin constantly 
using CPU, and not servicing requests on those connections.

Following patch fixes it.

Thanks,
NeilBrown


### Diffstat output
 ./sunrpc/svc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/sunrpc/svc.c ./sunrpc/svc.c
--- .prev/sunrpc/svc.c  2007-03-05 14:50:06.0 +1100
+++ ./sunrpc/svc.c  2007-03-05 14:50:44.0 +1100
@@ -372,7 +372,7 @@ svc_getreqset (fd_set *readfds)
     setsize = FD_SETSIZE;
   maskp = readfds->fds_bits;
   for (sock = 0; sock < setsize; sock += NFDBITS)
-    for (mask = *maskp++; (bit = ffs (mask)); mask ^= (1 << (bit - 1)))
+    for (mask = *maskp++; (bit = ffsl (mask)); mask ^= (1L << (bit - 1)))
       INTUSE(svc_getreq_common) (sock + bit - 1);
 }
 INTDEF (svc_getreqset)




-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-amd64
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)

Versions of packages libc6 depends on:
ii  tzdata2007b-1Time Zone and Daylight Saving Time

libc6 recommends no packages.

-- no debconf information





Bug#406181: fix submitted yet?

2007-01-24 Thread Neil Brown
On Thursday January 11, [EMAIL PROTECTED] wrote:
 Thanks for the report, and the patch. Has the raid1 recovery fix been
 submitted upstream yet?

I've sent it to Andrew Morton.  I suspect it will be in 2.6.20
I'll be sending it to -stable later today.
NeilBrown





Bug#398058: [NFS] nhfsstone license

2006-12-18 Thread Neil Brown
On Thursday December 14, [EMAIL PROTECTED] wrote:
 On Mon, Nov 20, 2006 at 05:20:34PM -0500, J. Bruce Fields wrote:
  No modification required if we remove it from upstream too.  Do people
  use it?
 
 It seems like nobody does, judging from the response. Is there any chance of
 a relatively quick 1.0.11 without nhfsstone? We have frozen in Debian at the
 moment, but fixing such a license bug would probably be allowed past the
 freeze.

I've removed nhfstone from the git repository, but I'm not going to
roll up an 1.0.11 release just now - sorry.
Maybe in January.

NeilBrown





Bug#402416: Important data loss with Debian/GNU 3.1 (Sarge), kernel 2.6.18 and RAID1 software

2006-12-16 Thread Neil Brown
On Saturday December 16, [EMAIL PROTECTED] wrote:
 
 Don't use the Debian/GNU 3.1 with software RAID1 and kernel 2.6.18.
 I tested with SCSI and IDE drives with different controllers and PCs
 and IMPORTANT DATA LOSS can occur. It seems to be a problem with mdadm,
 maybe a new version of mdadm is required.

If you could report specific details - preferably to
linux-raid@vger.kernel.org - I would love to heard about it.
I am not aware of any significant problems in 2.6.18 that aren't fixed
in a later -stable release (e.g. 2.6.18.5).
Data loss is very unlikely to be a problem with mdadm.

Thanks,
NeilBrown





Bug#396582: Some additional info

2006-11-06 Thread Neil Brown
On Thursday November 2, [EMAIL PROTECTED] wrote:
 
 Neil, can we apply the patch contributed to fix this:
 
   
 http://bugs.debian.org/cgi-bin/bugreport.cgi/mdadm-fix-infinite-loop.diff?bug=396582;msg=5;att=1
 
 or do I remember that you previously replaced devlist with NULL to
 fix another bug?

Replacing the NULL with devlist would stop stacked devices from being
auto-assembled properly.

This patch seems to fix the problem, and I am happy with it.

Thanks for the report.

NeilBrown



Fixed problems that could cause an infinite loop with auto-assembly.

If an auto-assembly attempt fails because the array cannot be
opened or because the array has already been created, then we
get into an infinite loop.

Reported-by: Dan Pascu [EMAIL PROTECTED]
Fixes-debian-bug: 396582


### Diffstat output
 ./Assemble.c |   22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff .prev/Assemble.c ./Assemble.c
--- .prev/Assemble.c	2006-10-13 08:46:15.0 +1000
+++ ./Assemble.c	2006-11-06 12:48:20.0 +1100
@@ -185,6 +185,8 @@ int Assemble(struct supertype *st, char
 	else if (mdfd >= 0)
 		inargv = 1;
 
+ try_again:
+
 	tmpdev = devlist; num_devs = 0;
 	while (tmpdev) {
 		if (tmpdev->used)
@@ -383,14 +385,28 @@ int Assemble(struct supertype *st, char
 		else
 			asprintf(&mddev, "/dev/md/%s", c);
 		mdfd = open_mddev(mddev, ident->autof);
-		if (mdfd < 0)
-			return mdfd;
+		if (mdfd < 0) {
+			free(first_super);
+			free(devices);
+			first_super = NULL;
+			goto try_again;
+		}
 		vers = md_get_version(mdfd);
 		if (ioctl(mdfd, GET_ARRAY_INFO, &inf)==0) {
+			for (tmpdev = devlist ;
+			     tmpdev && tmpdev->used != 1;
+			     tmpdev = tmpdev->next)
+				;
 			fprintf(stderr, Name ": %s already active, cannot restart it!\n", mddev);
+			if (tmpdev)
+				fprintf(stderr, Name ":   %s needed for %s...\n",
+					mddev, tmpdev->devname);
 			close(mdfd);
+			mdfd = -1;
 			free(first_super);
-			return 1;
+			free(devices);
+			first_super = NULL;
+			goto try_again;
 		}
 		must_close = 1;
 	}





Bug#394193: initramfs-tools: root file system fails to mount when mounting by a label, on top of an md device.

2006-10-22 Thread Neil Brown
On Friday October 20, [EMAIL PROTECTED] wrote:
 On Oct 20, maximilian attems [EMAIL PROTECTED] wrote:
 
  On Fri, 20 Oct 2006, Chris Andrews wrote:
   At this point there are no /dev/disk/by-label entries for the md devices,
   and running a udevtrigger will populate it. 
 No, this is wrong. DO NOT run udevtrigger for no good reason.
 If you need to trigger a specific event then tickle $DEVPATH/uevent,
 e.g:
 
 echo add > /sys/block/md0/uevent
 
 But still, I can't see why this should help: if /sys/block/md0/ exists
 then the MD array has already been activated, so its uevent should have
 been generated too.

No.  /sys/block/md0 gets created *before* the array is fully
functional.
First /sys/block/md0 is created
then the array is assembled.
then the array is started and becomes functional.

I realise this is confusing for udev and I have been looking into the
problem, but it is not clear what the best solution is.

One possibility is the following patch.  It sends online/offline
events at the appropriate times.  Would that make udev sufficiently
happy?

NeilBrown



---
Send online/offline uevents when an md array starts/stops.

This allows udev to do something intelligent when an
array becomes available.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |2 ++
 1 file changed, 2 insertions(+)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-09-29 11:04:13.0 +1000
+++ ./drivers/md/md.c	2006-09-28 14:50:00.0 +1000
@@ -3194,6 +3194,7 @@ static int do_md_run(mddev_t * mddev)
 
 	mddev->changed = 1;
 	md_new_event(mddev);
+	kobject_uevent(&mddev->gendisk->kobj, KOBJ_ONLINE);
 	return 0;
 }
 
@@ -3355,6 +3356,7 @@ static int do_md_stop(mddev_t * mddev, i
 		if (disk)
 			set_capacity(disk, 0);
 		mddev->changed = 1;
+		kobject_uevent(&mddev->gendisk->kobj, KOBJ_OFFLINE);
 	} else if (mddev->pers)
 		printk(KERN_INFO "md: %s switched to read-only mode.\n",
 			mdname(mddev));





Bug#390954: [Pkg-openldap-devel] Bug#390954: slapd is compiled withou SLAPI support

2006-10-12 Thread Neil Brown
On Thursday October 12, [EMAIL PROTECTED] wrote:
 
 That's for sure :) Do you know if it could impact stability of the
 packages? I do not really want to risk anything as the packages 
 should get ready for the release soon...

I haven't done a thorough review of the code, but from what I have
seen and what I understand of how it works there should be zero impact
on stability.
It just makes some extra functionality available via the keywords
plugin and pluginlog in slapd.conf.
If you don't compile in SLAPI support, those keywords are quietly
ignored.  If you do, then those keywords give access to the extra
functionality. 
Certainly *using* this functionality _could_ upset stability.  But
just having it compiled in shouldn't.

Thanks,
NeilBrown





Bug#390954: slapd is compiled withou SLAPI support

2006-10-03 Thread Neil Brown
Package: slapd
Version: 2.3.27-1
Severity: normal

Compiling slapd with SLAPI support would make it a lot easier to
develop, test, and use SLAPI plugins :-)

Thanks
NeilBrown


-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages slapd depends on:
ii  adduser 3.97 Add and remove users and groups
ii  coreutils   5.97-5   The GNU core utilities
ii  debconf [debconf-2.0]   1.5.5Debian configuration management sy
ii  libc6   2.3.6.ds1-5  GNU C Library: Shared libraries
ii  libdb4.24.2.52+dfsg-1Berkeley v4.2 Database Libraries [
ii  libiodbc2   3.52.4-3 iODBC Driver Manager
ii  libldap-2.3-0   2.3.27-1 OpenLDAP libraries
ii  libltdl31.5.22-4 A system independent dlopen wrappe
ii  libperl5.8  5.8.8-6.1Shared Perl library
ii  libsasl22.1.19.dfsg1-0.5 Authentication abstraction library
ii  libslp1 1.2.1-6  OpenSLP libraries
ii  libssl0.9.8 0.9.8c-2 SSL shared libraries
ii  libwrap07.6.dbs-11   Wietse Venema's TCP wrappers libra
ii  perl [libmime-base64-pe 5.8.8-6.1Larry Wall's Practical Extraction 
ii  psmisc  22.3-1   Utilities that use the proc filesy

Versions of packages slapd recommends:
pn  db4.2-util  none   (no description available)
ii  libsasl2-modules2.1.19.dfsg1-0.5 Pluggable Authentication Modules f

-- debconf information:
  slapd/fix_directory: true
  shared/organization:
  slapd/upgrade_slapcat_failure:
  slapd/backend: BDB
* slapd/allow_ldap_v2: false
* slapd/no_configuration: true
  slapd/move_old_database: true
  slapd/suffix_change: false
  slapd/slave_databases_require_updateref:
  slapd/dump_database_destdir: /var/backups/slapd-VERSION
  slapd/autoconf_modules: true
  slapd/domain:
  slapd/password_mismatch:
  slapd/invalid_config: true
  slapd/upgrade_slapadd_failure:
  slapd/dump_database: when needed
  slapd/migrate_ldbm_to_bdb: false
  slapd/purge_database: false





Bug#382326: missing /dev/mdX cause assembly to fail *and* to generate missing nodes

2006-08-11 Thread Neil Brown
On Thursday August 10, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.5.3-1~unreleased.3
 Severity: normal
 
 Dear Neil,
 
 Here's an issue I've stumbled over today: say /etc/mdadm/mdadm.conf
 refers to /dev/md1, but that does not exist during boot and udev
 does *not* create it. Now, check this out:

Does mdadm.conf only have md1?  or does it have md5 and md6 as well?

 
   mdadm:~# rm /dev/md?
   mdadm:~# mdadm -As
   mdadm: No arrays found in config file or automatically

Looking at the 'strace' of that I see e.g.

stat64("/dev/md5", 0xaf81cf00)  = -1 ENOENT (No such file or directory)
mknod("/dev/md5", S_IFBLK|0600, makedev(9, 5)) = 0
chown32("/dev/md5", 0, 6)   = 0
chmod("/dev/md5", 0660) = 0
stat64("/dev/md5", {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 5), ...}) = 0
open("/dev/md5", O_RDWR)= 3

It didn't exist, so it made it, and then:

fstat64(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 5), ...}) = 0
ioctl(3, 0x800c0910, 0xaf81cf58)= 0
That was RAID_VERSION
ioctl(3, 0x80480911, 0xaf81d370)= 0
and that was GET_ARRAY_INFO
it didn't return -ENODEV, so the array must be at least partly
assembled.
My guess would be that it got assembled as /dev/md/5  or some other name.
What did 'cat /proc/mdstat' show beforehand?
What about ls -l /dev/md/ ??


   mdadm:~# ls /dev/md?
   /dev/md1  /dev/md5  /dev/md6
   mdadm:~# mdadm -As
   mdadm: /dev/md5 is already active.
   mdadm: /dev/md1 is already active.
   mdadm: /dev/md6 is already active.

Hmm.. that's not what I would have wanted. mdadm -As was meant to be
silent about already-active arrays.  It was the first time.
But this is coming from 'open_mddev' which has been asked to create
a device, but found that it already existed and was active.
I'll have to think about that.

 
 So it seems that mdadm creates the device nodes even though it does
 not find any. On second try then, of course, it works.

I'm not sure what "creates the device nodes even though it does not
find any" means.
In both cases, mdadm discovered that the arrays listed in mdadm.conf
were already assembled, and so didn't assemble them.  The first time
it had to create the device nodes first.

NeilBrown





Bug#375879: fails to prepare initramfs if root is on LVM on RAID

2006-06-28 Thread Neil Brown
On Wednesday June 28, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Version: 2.5.2-1
 Severity: important
 
 If rootfs is on LVM, which in turn is on RAID, the mdadm hook
 determines that RAID is not needed:
 
   ++ df /
   ++ sed -ne 's,^\(/dev/[^[:space:]]\+\).*,\1,p'
   + ROOTRAIDDEV=/dev/mapper/r1vg-root
   + '[' -n /dev/mapper/r1vg-root ']'
   + touch /tmp/mkinitramfs_yc5768/conf/mdadm.conf
   + echo ROOTRAIDDEV=/dev/mapper/r1vg-root
   ++ mdadm --detail /dev/mapper/r1vg-root
   ++ sed -ne 's,[[:space:]]*UUID : ,,p'
   + ROOTUUID=
   + case $ROOTUUID in
   + logger -sp syslog.notice -- 'I: mdadm: rootfs not on RAID, not including
 RAID stuff'
 
 This means we have to get rid of the optimisation in the initramfs
 and install all kernel modules, as before, and then tell mdadm to
 assemble *all* arrays from the initramfs. This sucks because it
 basically makes /etc/mdadm/mdadm.conf obsolete, unless the user will
 start and stop arrays on the running system.

mdadm.conf is not completely obsolete.  It can still contain useful
guidance for --monitor (such as expected number of spares).

Maybe you just need to be more subtle in tracing the structure of the
root device, though I agree that might be fairly complicated.

NeilBrown





Bug#373802: Fwd: Bug#373802: FTBFS with GCC 4.2: cast from pointer to integer of different size

2006-06-15 Thread Neil Brown
On Thursday June 15, [EMAIL PROTECTED] wrote:
 tags 373802 confirmed upstream
 thanks
 
 Neil, this one's for you.. future compatibility and all that jazz.
 :)
 
 - Forwarded message from Martin Michlmayr [EMAIL PROTECTED] -
 
 Now it fails with the following error with GCC 4.2.  Fortunately, this
 is the only remaining problem.
 
  Automatic build of mdadm_2.4.1-6 on juist by sbuild/alpha 0.44
 ...
 gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm/mdadm.conf\" 
 -DCONFFILE2=\"/etc/mdadm.conf\" -ggdb -fno-strict-aliasing -Os 
 -DSendmail=\"/usr/sbin/sendmail -t\"   -c -o super1.o super1.c
  cc1: warnings being treated as errors
  super1.c: In function 'calc_sb_1_csum':
  super1.c:118: warning: cast from pointer to integer of different size
  super1.c:119: warning: cast from pointer to integer of different size
  make[1]: *** [super1.o] Error 1
 

Yes, fixed in 2.5.1 - thanks.


-
Fix offsetof macro for 64bit hosts

### Diffstat output
 ./super1.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/super1.c ./super1.c
--- .prev/super1.c  2006-06-16 10:52:10.0 +1000
+++ ./super1.c  2006-06-16 10:53:41.0 +1000
@@ -104,7 +104,7 @@ struct mdp_superblock_1 {
 #define MD_FEATURE_ALL	(1|2|4)
 
 #ifndef offsetof
-#define offsetof(t,f) ((int)&(((t*)0)->f))
+#define offsetof(t,f) ((size_t)&(((t*)0)->f))
 #endif
 static unsigned int calc_sb_1_csum(struct mdp_superblock_1 * sb)
 {





Bug#372696: cupsys: cupsd crashes mysteriously

2006-06-11 Thread Neil Brown
Package: cupsys
Version: 1.2.1-2
Severity: grave
Justification: renders package unusable


Cupsd crashes occasionally, particularly when browsing the web interface.

I compiled from src and ran 'gdb' on the version with all the symbols
still intact, and the problem is near line 2541 in scheduler/dirsvc.c.

The code there reads:

  httpAssembleURIf(HTTP_URI_CODING_ALL, uri, sizeof(uri), "ipp", NULL,
                   iface->hostname, iface->port,
                   (p->type & CUPS_PRINTER_CLASS) ? "/classes/%s%s" :
                       "/printers/%s",
                   p->name);

The two '%s' in the /classes/ branch are the problem.  When trying to
format the second, an invalid address is dereferenced, and BANG.

NeilBrown


-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.16
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages cupsys depends on:
ii  adduser  3.87Add and remove users and groups
ii  debconf [debconf-2.0]1.5.1   Debian configuration management sy
ii  gs-esp   8.15.1.dfsg.1-2 The Ghostscript PostScript interpr
ii  libacl1  2.2.37-1Access control list shared library
ii  libc62.3.6-15GNU C Library: Shared libraries
ii  libcupsimage21.2.1-2 Common UNIX Printing System(tm) - 
ii  libcupsys2   1.2.1-2 Common UNIX Printing System(tm) - 
ii  libdbus-1-2  0.61-6  simple interprocess messaging syst
ii  libgnutls13  1.3.5-1.1   the GNU TLS library - runtime libr
ii  libldap2 2.1.30-13   OpenLDAP libraries
ii  libpam0g 0.79-3.1Pluggable Authentication Modules l
ii  libpaper11.1.18  Library for handling paper charact
ii  libslp1  1.2.1-5 OpenSLP libraries
ii  lsb-base 3.1-10  Linux Standard Base 3.1 init scrip
ii  patch2.5.9-4 Apply a diff file to an original
ii  perl-modules 5.8.8-5 Core Perl modules
ii  procps   1:3.2.6-2.2 /proc file system utilities
ii  xpdf-utils [poppler-util 3.01-8  Portable Document Format (PDF) sui
ii  zlib1g   1:1.2.3-11  compression library - runtime

Versions of packages cupsys recommends:
ii  cupsys-client   1.2.1-2  Common UNIX Printing System(tm) - 
ii  foomatic-filters3.0.2-20060530-1 linuxprinting.org printer support 
ii  smbclient   3.0.22-1 a LanManager-like simple client fo

-- debconf information:
  cupsys/raw-print: true
  cupsys/ports: 631
  cupsys/backend: ipp, lpd, parallel, socket, usb
  cupsys/portserror:
  cupsys/browse: true





Bug#372618: Fwd: Bug#372618: mdadm --monitor consumes much memory and idle cpu

2006-06-11 Thread Neil Brown
On Saturday June 10, [EMAIL PROTECTED] wrote:
 
 Neil, this one should be of interest to you. Please reply to
 [EMAIL PROTECTED] (reply-to set).
 
 - Forwarded message from Elimar Riesebieter [EMAIL PROTECTED] -
 
...
 
 The memory and cpu consumption grows up to around 50% running 
 /sbin/mdadm --monitor --pid-file /var/run/mdadm.pid --mail root
 --daemonise --scan
 

Yeah, thanks...

See patch.

NeilBrown



Fix memory leak in monitor mode

When rescanning /dev, we didn't free the old list.
Also, don't search for a device with a number of 0:0.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./util.c |   13 +
 1 file changed, 13 insertions(+)

diff ./util.c~current~ ./util.c
--- ./util.c~current~   2006-06-11 19:48:02.0 +1000
+++ ./util.c2006-06-11 19:39:27.0 +1000
@@ -416,10 +416,23 @@ char *map_dev(int major, int minor, int create)
 	struct devmap *p;
 	char *std = NULL, *nonstd=NULL;
 	int did_check = 0;
+
+	if (major == 0 && minor == 0) {
+		if (!create)
+			return NULL;
+		else
+			return "0:0";
+	}
  retry:
 	if (!devlist_ready) {
 		char *dev = "/dev";
 		struct stat stb;
+		while(devlist) {
+			struct devmap *d = devlist;
+			devlist = d->next;
+			free(d->name);
+			free(d);
+		}
 		if (lstat(dev, &stb)==0 &&
 		    S_ISLNK(stb.st_mode))
 			dev = "/dev/.";





Bug#344617: closed by martin f krafft [EMAIL PROTECTED] (Re: Fwd: Re: userspace incorrectly detects RAID (d-i))

2006-06-02 Thread Neil Brown
On Thursday June 1, [EMAIL PROTECTED] wrote:
  Unfortunately, this means that you have to recreate the arrays. At
  least I do not know of a way to migrate version-0 to version-1.
 
   Magic : a92b4efc
 Version : 00.90.00
UUID : 881b45c3:b1f47d8a:7b9b8401:167b0d02
 
 Sigh, I take it 00.90.00 isn't sufficient? On the face of it writing such
 a conversion utility shouldn't be /too/ difficult; is there that much to
 change?

No, 0.90.0 isn't sufficient.
I suspect one day
  mdadm --assemble --metadata=1 --update=metadata .
will do what you want, but not today :-(

 
 Having looked at it, since the devices have an indicator for where they
 are in the array (dev 0-nnn), that is also a good sanity test. In the
 case of a multiple detection of an array member never add the drive by
 default.

That would be fairly easy to implement.  It should be safe as long
as we only reject multiple devices if they have identical superblocks
(rather than just identical positions in the array, with one older
than the other).  Maybe this will be in 2.5.1.

NeilBrown



 
   So the correct fix if you are concerned about this it to use version-1
   metadata. 
  --metadata=1
  
  ... when creating new arrays.
  
  Hopefully this will be the default soon.
 
 The hardware transition might be a good time...
 
 





Bug#369779: strict aliasing bug in mdadm

2006-06-01 Thread Neil Brown
On Thursday June 1, [EMAIL PROTECTED] wrote:
 tags 369779 + confirmed upstream pending
 thanks
 
 I can confirm this bug. For now, it surely suffices to add
 -fno-strict-aliasing in Debian if you, Neil, still use the flag for
 development.
 
 I am surely not touching dlink.h, even though that seems to be the
 source of the problem. Maybe this could be fixed sometime in the
 future? Maybe there's even another implementation out there?

I already received a patch to fix this problem.  It will be in 2.5.1

NeilBrown

From: Luca Berra [EMAIL PROTECTED]

diff ./dlink.h~current~ ./dlink.h
--- ./dlink.h~current~  2005-12-05 16:52:22.0 +1100
+++ ./dlink.h   2006-05-29 11:46:19.0 +1000
@@ -4,16 +4,16 @@
 
 struct __dl_head
 {
-    struct __dl_head * dh_prev;
-    struct __dl_head * dh_next;
+    void * dh_prev;
+    void * dh_next;
 };
 
 #define	dl_alloc(size)	((void*)(((char*)calloc(1,(size)+sizeof(struct __dl_head)))+sizeof(struct __dl_head)))
 #define	dl_new(t)	((t*)dl_alloc(sizeof(t)))
 #define	dl_newv(t,n)	((t*)dl_alloc(sizeof(t)*n))
 
-#define dl_next(p) *((void**)(((struct __dl_head*)(p))[-1].dh_next))
-#define dl_prev(p) *((void**)(((struct __dl_head*)(p))[-1].dh_prev))
+#define dl_next(p) *((void**)&(((struct __dl_head*)(p))[-1].dh_next))
+#define dl_prev(p) *((void**)&(((struct __dl_head*)(p))[-1].dh_prev))
 
 void *dl_head(void);
 char *dl_strdup(char *);





Bug#367901: mdadm --examine returns non-zero exit code even though it appears to have been successful

2006-05-18 Thread Neil Brown
On Thursday May 18, [EMAIL PROTECTED] wrote:
 Package: mdadm
 Severity: normal
 Version: 2.4.1-1
 
 mdadm:~# mdadm --examine --scan -c partitions ; echo $?
 ARRAY /dev/md0 level=raid5 num-devices=3 
 UUID=7a2c0483:a2788d6f:be006945:f0cbf15a
 1
 
 
 I see no reason why mdadm shouldn't exit with code 0...

Neither can I.  It'll be fixed in the next release.  Thanks.

NeilBrown


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./Examine.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff ./Examine.c~current~ ./Examine.c
--- ./Examine.c~current~2006-03-27 15:18:48.0 +1100
+++ ./Examine.c 2006-05-19 10:00:49.0 +1000
@@ -72,10 +72,11 @@ int Examine(mddev_dev_t devlist, int bri
 
 	fd = dev_open(devlist->devname, O_RDONLY);
 	if (fd < 0) {
-		if (!scan)
+		if (!scan) {
 			fprintf(stderr, Name ": cannot open %s: %s\n",
 				devlist->devname, strerror(errno));
-		err = 1;
+			err = 1;
+		}
 	}
 	else {
 		if (!st)





Bug#356630: Make always recompiles everything in the Linux Kernel.

2006-03-12 Thread Neil Brown

Package: make
Version: 3.80+3.81.rc1-1
Severity: important


While doing kernel development (in the latest -mm tree, if it makes a
difference), make always wants to recompile everything since upgrading
to this version.  Other packages with simpler makefiles work ok.

I've just downgraded to 3.80+3.81.b4-1 and it works much better.

Thanks,
NeilBrown





Bug#274859: [help needed] RAID and /dev advice needed

2005-05-23 Thread Neil Brown
On Monday May 23, [EMAIL PROTECTED] wrote:
 
 tags 274859 +patch
 thanks
 
 [martin f krafft]
  I checked out the source and opening is not hard... but mdadm also
  creates device nodes and uses S_ISBLK all over the place, so I don't
  really know whether adding a || S_ISLNK will fix it.
 
 I didn't actually test this, but I honestly don't see why lstat() is
 used here, instead of stat() which is used everywhere else.
 
 Neil?  Is there a good reason for lstat here?  It apparently breaks on
 devfs.  (Ref. http://bugs.debian.org/274859)

Maybe it was there deliberately to break devfs ??? ;-)

No, it is a bug.  It should be 'stat', not 'lstat'.

Thanks,
NeilBrown

 
 Peter
 
 --- mdadm-1.9.0/mdopen.c~	2005-02-03 18:45:23.0 -0600
 +++ mdadm-1.9.0/mdopen.c	2005-05-23 19:34:12.0 -0500
 @@ -97,7 +97,7 @@
 		return -1;
 	}
 	stb.st_mode = 0;
 -	if (lstat(dev, &stb)==0 && ! S_ISBLK(stb.st_mode)) {
 +	if (stat(dev, &stb)==0 && ! S_ISBLK(stb.st_mode)) {
 		fprintf(stderr, Name ": %s is not a block device.\n",
 			dev);
 		return -1;

