Bug#621786: One Thought/Thing to Try

2011-07-15 Thread Arnaud Desmier

Neil,

Bad news: increasing the alignment from 512 to 4096 didn't solve the
problem. The segmentation fault still occurs randomly when rebooting,
but less often than before.


Below are some excerpts from the boot logs; I hope they give you some other ideas:

===
Begin: Loading[5.181890] md: raid1 personality registered for level 1
Success: loaded module raid1.
done.
Begin: Assembling all MD arrays ... [5.253192] md: md1 stopped.
[5.289986] md: bind
[5.297419] md: bind
[5.323848] raid1: raid set md1 active with 2 out of 2 mirrors
[5.330025] md1: detected capacity change from 0 to 511692800
mdadm: /dev/md/debian:1 has been started with 2 drives.
[5.345109]  md1: unknown partition table
[5.583525] md: md0 stopped.
mdadm: cannot re-read metadata from /dev/mtdblock5 - aborting
[5.668071] md: md2 stopped.
[5.701962] md: bind
[5.707337] md: bind
[5.738602] raid1: raid set md2 active with 2 out of 2 mirrors
[5.744644] md2: detected capacity change from 0 to 989689110528
mdadm: /dev/md2 has been started with 2 drives.
[5.762720]  md2: unknown partition table
Segmentation fault
Failure: failed to assemble all arrays.
done.
done.

===
Begin: Loading[5.105979] md: raid1 personality registered for level 1
Success: loaded module raid1.
done.
Begin: Assembling all MD arrays ... [5.177587] md: md1 stopped.
[5.215192] md: bind
[5.222784] md: bind
[5.249114] raid1: raid set md1 active with 2 out of 2 mirrors
[5.255129] md1: detected capacity change from 0 to 511692800
mdadm: /dev/md/debian:1 has been started with 2 drives.
[5.270464]  md1: unknown partition table
Segmentation fault
Failure: failed to assemble all arrays.
done.
done.
Begin: Waiting for root file system ... done.
Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/disk/by-uuid/695192e7-148e-4d02-8588-4d2fc3f8c978 does not 
exist.  Dropping to a shell!


===


Arnaud


On 15/07/2011 05:34, Arnaud Desmier wrote:

Hi Neil,

I've applied your patch and it is working well: no more crashes with
DEVICE=partitions in mdadm.conf.


Thanks,

Arnaud

On 15/07/2011 04:25, NeilBrown wrote:

On Thu, 14 Jul 2011 20:24:52 +0200 Arnaud Desmier wrote:


Hi Scott,

Thanks for your answer. I've changed /etc/mdadm/mdadm.conf as you
requested, and "/sbin/mdadm --assemble --scan --auto=yes --symlink=no"
didn't crash but exited with code 2; I don't know what that means.

Reboot is now going fine and each array is correctly mounted.

Arnaud

On 10/07/2011 14:28, Scott Schaefer wrote:

...

If it "succeeds", then I think there is reasonable chance this is
related to the mtdblock device(s).  Update the initramfs, reboot, and
see if the devices are normal on startup.

So it seems to be directly related to the mtdblock devices... I wonder how.

If you have time/interest to experiment and compile some code I would be very
interested to know if changing:

if (posix_memalign((void**)&super, 512,

on line 1291 of super1.c (in the function load_super1) to

if (posix_memalign((void**)&super, 4096,

made any difference.
That is: get the source, make this change, compile and install. Then
revert the change to mdadm.conf and see whether it fails or works.

Thanks,
NeilBrown
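For readers unfamiliar with the call Neil refers to: posix_memalign() allocates a buffer whose start address is a multiple of the requested alignment, so the proposed edit only raises that guarantee from 512 to 4096 bytes. The sketch below is illustrative only and is not mdadm code; `alloc_aligned` and `is_aligned` are hypothetical helpers, and any buffer size you pass them is your own assumption.

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p is a multiple of align bytes. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: allocate `size` zeroed bytes starting on an
 * `align`-byte boundary, or return NULL.  posix_memalign() requires
 * `align` to be a power of two and a multiple of sizeof(void *); it
 * reports failure through its return value (EINVAL or ENOMEM) rather
 * than through errno.  Release the buffer with free(). */
static void *alloc_aligned(size_t align, size_t size)
{
    void *buf = NULL;

    if (posix_memalign(&buf, align, size) != 0)
        return NULL;
    assert(is_aligned(buf, align)); /* guaranteed by posix_memalign() */
    memset(buf, 0, size);
    return buf;
}
```

In these terms, Neil's suggestion amounts to calling alloc_aligned(4096, size) where the code previously used an alignment of 512; the allocation size and everything else stay the same.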



Bug#621786: One Thought/Thing to Try

2011-07-14 Thread Arnaud Desmier

Hi Neil,

I've applied your patch and it is working well: no more crashes with
DEVICE=partitions in mdadm.conf.


Thanks,

Arnaud


Bug#621786: One Thought/Thing to Try

2011-07-14 Thread Arnaud Desmier

Hi Scott,

Thanks for your answer. I've changed /etc/mdadm/mdadm.conf as you
requested, and "/sbin/mdadm --assemble --scan --auto=yes --symlink=no"
didn't crash but exited with code 2; I don't know what that means.


Reboot is now going fine and each array is correctly mounted.

Arnaud

On 10/07/2011 14:28, Scott Schaefer wrote:
I have spent some time attempting to duplicate this problem w/o much 
success.


I have duplicated your setup as closely as I can, which, unfortunately,
is not very close:


1) I am running under QEMU, and
2) I have a smaller drive; unfortunately, I don't have a spare TB drive
lying around :-((


I read the thread on the QNAP site you linked to as well.

I have one thing that should be simple to try:

Change your /etc/mdadm/mdadm.conf DEVICE line to read:

DEVICE /dev/sd*

Then, retry mdadm --assemble ...

If it still fails, running under strace again may help.

If it "succeeds", then I think there is reasonable chance this is 
related to the mtdblock device(s).  Update the initramfs, reboot, and 
see if the devices are normal on startup.
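A "DEVICE /dev/sd*" line limits which device nodes mdadm examines, so /dev/mtdblock* nodes such as the /dev/mtdblock5 seen in the boot log are no longer considered. The sketch below illustrates that filtering with POSIX fnmatch(). It assumes shell-style glob semantics and is not mdadm's own code: mdadm may expand DEVICE patterns differently (for instance against the files that actually exist), and `device_selected` and `count_selected` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `devname` matches the shell-style
 * pattern (e.g. "/dev/sd*" from a DEVICE line), 0 otherwise.
 * FNM_PATHNAME keeps '*' from matching a '/' separator. */
static int device_selected(const char *pattern, const char *devname)
{
    assert(pattern != NULL && devname != NULL);
    return fnmatch(pattern, devname, FNM_PATHNAME) == 0;
}

/* Hypothetical helper: count how many of the n names are selected. */
static size_t count_selected(const char *pattern,
                             const char *const *names, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        hits += (size_t)device_selected(pattern, names[i]);
    return hits;
}
```

For example, "/dev/sd*" selects a disk partition such as /dev/sda1 (an assumed name) but not /dev/mtdblock5.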






Bug#621786: mdadm: invalid pointer or memory corruption on armel system

2011-04-09 Thread Arnaud Desmier
It seems that "mdadm --assemble --scan --auto=yes --symlink=no" is the 
faulty invocation.


I ran gdb on the packaged mdadm, but it couldn't find any debug
symbols... So I ran the following command, as displayed by the Makefile when
compiling the package:
gcc  -o mdadm mdadm.o config.o mdstat.o  ReadMe.o util.o Manage.o \
Assemble.o Build.o Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o \
Kill.o Query.o Incremental.o mdopen.o super0.o super1.o super-ddf.o \
super-intel.o bitmap.o restripe.o sysfs.o sha1.o mapfile.o crc32.o \
sg_io.o msg.o platform-intel.o probe_roms.o


and now I get the following:

# gdb --args mdadm --assemble --scan --auto=yes --symlink=no
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/src/archive/mdadm-3.1.4/mdadm...done.
(gdb) run
Starting program: /usr/src/archive/mdadm-3.1.4/mdadm --assemble --scan 
--auto=yes --symlink=no
*** glibc detected *** /usr/src/archive/mdadm-3.1.4/mdadm: 
munmap_chunk(): invalid pointer: 0x000a9800 ***
*** glibc detected *** /usr/src/archive/mdadm-3.1.4/mdadm: malloc(): 
memory corruption: 0x000a9660 ***


Program received signal SIGABRT, Aborted.
0x40065c88 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x40065c88 in raise () from /lib/libc.so.6
#1  0x40069c24 in abort () from /lib/libc.so.6
#2  0x40069c24 in abort () from /lib/libc.so.6
#3  0x40069c24 in abort () from /lib/libc.so.6
#4  0x40069c24 in abort () from /lib/libc.so.6
#5  0x40069c24 in abort () from /lib/libc.so.6
#6  0x40069c24 in abort () from /lib/libc.so.6
#7  0x40069c24 in abort () from /lib/libc.so.6
#8  0x40069c24 in abort () from /lib/libc.so.6
#9  0x40069c24 in abort () from /lib/libc.so.6
#10 0x40069c24 in abort () from /lib/libc.so.6


Is this useful?

Thanks,

Arnaud



On 09/04/2011 09:30, martin f krafft wrote:

Arnaud Desmier wrote [2011.04.09.0923 +0200]:

How can I get the stack trace as you requested?

Find the exact invocation of mdadm that produces the problem and run
it under gdb until it crashes. You can then get the stack trace with
'bt'.



Bug#621786: mdadm: invalid pointer or memory corruption on armel system

2011-04-09 Thread Arnaud Desmier

Hi Martin,

I made the following changes before compiling the Debian package
version (not the one built from the git repository):

debian/rules:
 - change the declaration of CXFLAGS to CXFLAGS = -ggdb -DDEBUG -g

Before building the package:
 - export DEB_BUILD_OPTIONS="noopt"

Starting the package build:
 - dpkg-buildpackage

Installing the package (3.1.4.1 is my current version, built from git and
used as a workaround):

# dpkg -i ../mdadm_3.1.4-1+8efb9d1_armel.deb
dpkg: warning: downgrading mdadm from 3.1.4.1-0 to 3.1.4-1+8efb9d1.
(Reading database ... 35477 files and directories currently 
installed.)
Preparing to replace mdadm 3.1.4.1-0 (using 
.../mdadm_3.1.4-1+8efb9d1_armel.deb) ...

Stopping MD monitoring service: mdadm --monitor.
Unpacking replacement mdadm ...
Setting up mdadm (3.1.4-1+8efb9d1) ...
Generating array device nodes... done.
update-initramfs: deferring update (trigger activated)
Starting MD monitoring service: mdadm --monitor.
*** glibc detected *** /sbin/mdadm: munmap_chunk(): invalid 
pointer: 0x000a9800 ***
*** glibc detected *** /sbin/mdadm: malloc(): memory corruption: 
0x000a9660 ***

Aborted
Generating udev events for MD arrays...done.
Processing triggers for man-db ...
Processing triggers for initramfs-tools ...
update-initramfs: Generating /boot/initrd.img-2.6.32-5-orion5x
Generating kernel u-boot image... done.
Flashing kernel... done.
Flashing initramfs... done.


/etc/init.d/mdadm-raid restart
*** glibc detected *** /sbin/mdadm: munmap_chunk(): invalid 
pointer: 0x000a9800 ***
*** glibc detected *** /sbin/mdadm: malloc(): memory corruption: 
0x000a9660 ***

Aborted
Generating udev events for MD arrays...done.

/etc/init.d/mdadm restart
Stopping MD monitoring service: mdadm --monitor.
Starting MD monitoring service: mdadm --monitor.

How can I get the stack trace as you requested?

Thanks,

Arnaud

On 09/04/2011 08:32, martin f krafft wrote:

tags 621786 moreinfo
thanks


/etc/init.d/mdadm-raid restart
*** glibc detected *** /sbin/mdadm: munmap_chunk(): invalid pointer: 0x00084800 
***
*** glibc detected *** /sbin/mdadm: malloc(): memory corruption: 0x00084660 ***
Aborted

Please compile mdadm with debug symbols and get a stack trace of the
alleged double-free.