Bug#621786: One Thought/Thing to Try
Neil,

Bad news: increasing the alignment from 512 to 4096 didn't solve the problem. The segmentation fault still occurs randomly when rebooting, but less often than before. Below are some extracts from the boot logs; I hope they will give you other ideas:

===
Begin: Loading[5.181890] md: raid1 personality registered for level 1
Success: loaded module raid1.
done.
Begin: Assembling all MD arrays ... [5.253192] md: md1 stopped.
[5.289986] md: bind
[5.297419] md: bind
[5.323848] raid1: raid set md1 active with 2 out of 2 mirrors
[5.330025] md1: detected capacity change from 0 to 511692800
mdadm: /dev/md/debian:1 has been started with 2 drives.
[5.345109] md1: unknown partition table
[5.583525] md: md0 stopped.
mdadm: cannot re-read metadata from /dev/mtdblock5 - aborting
[5.668071] md: md2 stopped.
[5.701962] md: bind
[5.707337] md: bind
[5.738602] raid1: raid set md2 active with 2 out of 2 mirrors
[5.744644] md2: detected capacity change from 0 to 989689110528
mdadm: /dev/md2 has been started with 2 drives.
[5.762720] md2: unknown partition table
Segmentation fault
Failure: failed to assemble all arrays.
done.
done.
===
Begin: Loading[5.105979] md: raid1 personality registered for level 1
Success: loaded module raid1.
done.
Begin: Assembling all MD arrays ... [5.177587] md: md1 stopped.
[5.215192] md: bind
[5.222784] md: bind
[5.249114] raid1: raid set md1 active with 2 out of 2 mirrors
[5.255129] md1: detected capacity change from 0 to 511692800
mdadm: /dev/md/debian:1 has been started with 2 drives.
[5.270464] md1: unknown partition table
Segmentation fault
Failure: failed to assemble all arrays.
done.
done.
Begin: Waiting for root file system ... done.
Gave up waiting for root device. Common problems:
- Boot args (cat /proc/cmdline)
- Check rootdelay= (did the system wait long enough?)
- Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/695192e7-148e-4d02-8588-4d2fc3f8c978 does not exist. Dropping to a shell!
===

Arnaud

On 15/07/2011 05:34, Arnaud Desmier wrote:

Hi Neil,

I've applied your patch and it is working well: no more crashes with DEVICE=partitions in mdadm.conf.

Thanks,
Arnaud

On 15/07/2011 04:25, NeilBrown wrote:

On Thu, 14 Jul 2011 20:24:52 +0200 Arnaud Desmier wrote:

Hi Scott,

Thanks for your answer. I've changed /etc/mdadm/mdadm.conf as you requested, and "/sbin/mdadm --assemble --scan --auto=yes --symlink=no" didn't crash but exited with code 2; I don't know what that means. Reboot now goes fine and each array is correctly mounted.

Arnaud

On 10/07/2011 14:28, Scott Schaefer wrote:

... If it "succeeds", then I think there is a reasonable chance this is related to the mtdblock device(s). Update the initramfs, reboot, and see if the devices are normal on startup.

So it seems to be directly related to the mtdblock devices... I wonder how.

If you have time/interest to experiment and compile some code, I would be very interested to know if changing:

if (posix_memalign((void**)&super, 512,

on line 1291 of super1.c (in the function load_super1) to

if (posix_memalign((void**)&super, 4096,

made any difference. I.e. get the source, make this change, compile and install. Then revert the change to mdadm.conf and see if it then fails or works.

Thanks,
NeilBrown
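
For readers who want to see what the 512 versus 4096 experiment changes, here is a minimal standalone sketch in C. It is not mdadm's code: `alloc_aligned` and `is_aligned` are hypothetical helpers written for this illustration, and only the `posix_memalign` call and the two alignment values come from the thread. It shows that the second argument only changes the address boundary of the returned buffer, not its size, and that the buffer is later released with an ordinary `free()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request the posix_memalign declaration */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of mdadm): allocate a zeroed buffer of
 * `size` bytes whose address is a multiple of `align`.
 * `align` must be a power of two and a multiple of sizeof(void *).
 * Returns NULL if the allocation fails or `align` is invalid. */
static void *alloc_aligned(size_t align, size_t size)
{
    void *buf = NULL;

    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&buf, align, size) != 0)
        return NULL;

    memset(buf, 0, size);
    return buf;
}

/* Hypothetical helper: returns 1 if `p` sits on an `align`-byte boundary. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0);
    return ((uintptr_t)p % align) == 0;
}
```

A caller would request, for example, `alloc_aligned(4096, 1024)`, check the result for NULL, and pass the pointer to `free()` when done. The buffer size of 1024 here is an arbitrary value chosen for the example, not the size mdadm uses.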
Bug#621786: One Thought/Thing to Try
Hi Neil,

I've applied your patch and it is working well: no more crashes with DEVICE=partitions in mdadm.conf.

Thanks,
Arnaud

On 15/07/2011 04:25, NeilBrown wrote:

On Thu, 14 Jul 2011 20:24:52 +0200 Arnaud Desmier wrote:

Hi Scott,

Thanks for your answer. I've changed /etc/mdadm/mdadm.conf as you requested, and "/sbin/mdadm --assemble --scan --auto=yes --symlink=no" didn't crash but exited with code 2; I don't know what that means. Reboot now goes fine and each array is correctly mounted.

Arnaud

On 10/07/2011 14:28, Scott Schaefer wrote:

... If it "succeeds", then I think there is a reasonable chance this is related to the mtdblock device(s). Update the initramfs, reboot, and see if the devices are normal on startup.

So it seems to be directly related to the mtdblock devices... I wonder how.

If you have time/interest to experiment and compile some code, I would be very interested to know if changing:

if (posix_memalign((void**)&super, 512,

on line 1291 of super1.c (in the function load_super1) to

if (posix_memalign((void**)&super, 4096,

made any difference. I.e. get the source, make this change, compile and install. Then revert the change to mdadm.conf and see if it then fails or works.

Thanks,
NeilBrown
Bug#621786: One Thought/Thing to Try
Hi Scott,

Thanks for your answer. I've changed /etc/mdadm/mdadm.conf as you requested, and "/sbin/mdadm --assemble --scan --auto=yes --symlink=no" didn't crash but exited with code 2; I don't know what that means. Reboot now goes fine and each array is correctly mounted.

Arnaud

On 10/07/2011 14:28, Scott Schaefer wrote:

I have spent some time attempting to duplicate this problem without much success. I have duplicated your setup as closely as I can, which unfortunately is not very close: 1) I am running under QEMU, and 2) I have a smaller drive; unfortunately, I don't have a spare TB drive lying around :-((

I read the thread on the QNAP site you linked to as well.

I have one thing that should be simple to try. Change your /etc/mdadm/mdadm.conf DEVICE line to read:

DEVICE /dev/sd*

Then retry mdadm --assemble ...

If it still fails, running under strace again may help. If it "succeeds", then I think there is a reasonable chance this is related to the mtdblock device(s). Update the initramfs, reboot, and see if the devices are normal on startup.
Bug#621786: mdadm: invalid pointer or memory corruption on armel system
It seems that "mdadm --assemble --scan --auto=yes --symlink=no" is the faulty invocation.

I ran gdb against the packaged mdadm, but it couldn't find any debug symbols... So I ran the following command, as the Makefile displays it when compiling the package:

gcc -o mdadm mdadm.o config.o mdstat.o ReadMe.o util.o Manage.o Assemble.o Build.o Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o Incremental.o mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o platform-intel.o probe_roms.o

and now I get the following:

# gdb --args mdadm --assemble --scan --auto=yes --symlink=no
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/src/archive/mdadm-3.1.4/mdadm...done.
(gdb) run
Starting program: /usr/src/archive/mdadm-3.1.4/mdadm --assemble --scan --auto=yes --symlink=no
*** glibc detected *** /usr/src/archive/mdadm-3.1.4/mdadm: munmap_chunk(): invalid pointer: 0x000a9800 ***
*** glibc detected *** /usr/src/archive/mdadm-3.1.4/mdadm: malloc(): memory corruption: 0x000a9660 ***
Program received signal SIGABRT, Aborted.
0x40065c88 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x40065c88 in raise () from /lib/libc.so.6
#1 0x40069c24 in abort () from /lib/libc.so.6
#2 0x40069c24 in abort () from /lib/libc.so.6
#3 0x40069c24 in abort () from /lib/libc.so.6
#4 0x40069c24 in abort () from /lib/libc.so.6
#5 0x40069c24 in abort () from /lib/libc.so.6
#6 0x40069c24 in abort () from /lib/libc.so.6
#7 0x40069c24 in abort () from /lib/libc.so.6
#8 0x40069c24 in abort () from /lib/libc.so.6
#9 0x40069c24 in abort () from /lib/libc.so.6
#10 0x40069c24 in abort () from /lib/libc.so.6

Is this of any use?

Thanks,
Arnaud

On 09/04/2011 09:30, martin f krafft wrote:

Arnaud Desmier wrote [2011.04.09.0923 +0200]:

How can I get the stack trace as you requested?

Find the exact invocation of mdadm that produces the problem and run it under gdb until it crashes. You can then get the stack trace with 'bt'.
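
Since the backtrace above only shows abort() frames, one generic way to narrow down heap corruption of this kind is to place guard bytes after a buffer and check them after each suspect operation. The sketch below is a hypothetical debugging aid in C, not code from mdadm and not the fix discussed in this bug: `guarded_alloc`, `guard_intact`, `GUARD_LEN` and `GUARD_BYTE` are names invented for this illustration, and the assumption that the glibc messages stem from a write past the end of a heap buffer is not confirmed anywhere in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  16     /* number of guard bytes placed after the buffer */
#define GUARD_BYTE 0xA5   /* arbitrary, recognisable fill pattern */

/* Hypothetical debugging helper (not from mdadm): allocate `size` usable
 * bytes, zeroed, followed by GUARD_LEN bytes filled with GUARD_BYTE.
 * Returns NULL if the allocation fails. */
static unsigned char *guarded_alloc(size_t size)
{
    unsigned char *p = malloc(size + GUARD_LEN);

    if (p == NULL)
        return NULL;

    memset(p, 0, size);
    memset(p + size, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Returns 1 if the guard bytes after the first `size` bytes are untouched,
 * 0 if something wrote past the end of the usable area. */
static int guard_intact(const unsigned char *p, size_t size)
{
    assert(p != NULL);

    for (size_t i = 0; i < GUARD_LEN; i++)
        if (p[size + i] != GUARD_BYTE)
            return 0;
    return 1;
}
```

Calling `guard_intact()` after each read into the buffer would show which operation first writes beyond the requested size, before the allocator's own metadata is damaged. The buffer is released with a plain `free()`.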
Bug#621786: mdadm: invalid pointer or memory corruption on armel system
Hi Martin,

I made the following changes before compiling the Debian package version (not the one built from the git repository):

debian/rules:
- change declaration of CXFLAGS to CXFLAGS = -ggdb -DDEBUG -g

Before building package:
- export DEB_BUILD_OPTIONS="noopt"

Starting build of package:
- dpkg-buildpackage

Installing package (3.1.4.1 is my current version from git, used as a workaround):

# dpkg -i ../mdadm_3.1.4-1+8efb9d1_armel.deb
dpkg: warning: downgrading mdadm from 3.1.4.1-0 to 3.1.4-1+8efb9d1.
(Reading database ... 35477 files and directories currently installed.)
Preparing to replace mdadm 3.1.4.1-0 (using .../mdadm_3.1.4-1+8efb9d1_armel.deb) ...
Stopping MD monitoring service: mdadm --monitor.
Unpacking replacement mdadm ...
Setting up mdadm (3.1.4-1+8efb9d1) ...
Generating array device nodes... done.
update-initramfs: deferring update (trigger activated)
Starting MD monitoring service: mdadm --monitor.
*** glibc detected *** /sbin/mdadm: munmap_chunk(): invalid pointer: 0x000a9800 ***
*** glibc detected *** /sbin/mdadm: malloc(): memory corruption: 0x000a9660 ***
Aborted
Generating udev events for MD arrays...done.
Processing triggers for man-db ...
Processing triggers for initramfs-tools ...
update-initramfs: Generating /boot/initrd.img-2.6.32-5-orion5x
Generating kernel u-boot image... done.
Flashing kernel... done.
Flashing initramfs... done.

/etc/init.d/mdadm-raid restart
*** glibc detected *** /sbin/mdadm: munmap_chunk(): invalid pointer: 0x000a9800 ***
*** glibc detected *** /sbin/mdadm: malloc(): memory corruption: 0x000a9660 ***
Aborted
Generating udev events for MD arrays...done.

/etc/init.d/mdadm restart
Stopping MD monitoring service: mdadm --monitor.
Starting MD monitoring service: mdadm --monitor.

How can I get the stack trace as you requested?

Thanks,
Arnaud

On 09/04/2011 08:32, martin f krafft wrote:

tags 621786 moreinfo
thanks

/etc/init.d/mdadm-raid restart
*** glibc detected *** /sbin/mdadm: munmap_chunk(): invalid pointer: 0x00084800 ***
*** glibc detected *** /sbin/mdadm: malloc(): memory corruption: 0x00084660 ***
Aborted

Please compile mdadm with debug symbols and get a stack trace of the alleged double-free.