Re: Debian machine not booting
Hi,

Got the problem solved. I couldn't solve it by using the rescue disk, as it wouldn't let me stop the RAID array. What I did was drop into maintenance mode and run:

  mdadm --assemble /dev/md1 /dev/sdd1 /dev/sde1    # this should recreate the array
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # then edit the file to remove duplicates
  dpkg-reconfigure linux-image-$(uname -r)

then reboot. A huge amount of thanks goes to Bob Proulx for all the help along the way,

James

On 7 July 2013 21:53, James Allsopp jamesaalls...@googlemail.com wrote:
[...]
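The "edit the file to remove duplicates" step above can be made mechanical. A sketch, assuming a scratch copy of the file (the sample contents below are illustrative, built from UUIDs posted in this thread; on the real system this would be /etc/mdadm/mdadm.conf):

```shell
# Build a sample mdadm.conf-style file with one duplicated UUID.
cat > /tmp/mdadm.conf.sample <<'EOF'
ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04
ARRAY /dev/md1 UUID=a544829f:33778728:79870439:241c5c51
ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04
EOF
# Any UUID printed here appears on more than one ARRAY line and needs
# the extra line(s) merged away by hand.
grep -o 'UUID=[0-9a-f:]*' /tmp/mdadm.conf.sample | sort | uniq -d
```

Each UUID printed marks a duplicate ARRAY definition that should be reduced to a single line before rebuilding the initramfs.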
Re: Debian machine not booting
Hello,

I've been too nervous to reboot, so I've left it in rescue mode at the point where I assembled the RAID arrays and chose to boot from the / partition. I tried to run:

  mdadm --stop /dev/md127

but got:

  mdadm: failed to stop array /dev/md127: Device or resource busy.
  Perhaps a running process, mounted filesystem or active volume group?

I tried unmounting /home, which stretches onto this disk via LVM, but this made no difference. Any idea how I should proceed?

Thanks,
James

On 5 July 2013 01:10, Bob Proulx b...@proulx.com wrote:
[...]
Re: Debian machine not booting
Thanks Bob, like I say, very much appreciated, and I'll let you know how it goes! I'd like to hear about the optimisations, but I think I'll wait till I get the system rebuilt!

James

On 4 July 2013 00:47, Bob Proulx b...@proulx.com wrote:
[...]
Re: Debian machine not booting
James Allsopp wrote:
> I'd like to hear about the optimisations, but I think I'll wait till
> I get the system rebuilt!

Basically I had expected you to use either rescue mode of the d-i or a livecd or other to assemble the arrays. You did. But neither array came up completely correctly. One came up with one disk degraded. The split-brain clone came up on md127 instead of md0. The other one came up on md126. So you should fix those using the instructions already discussed.

I was thinking you would do that from the same system boot that you had posted that information from. But your recent mail implies that you shut the system down and went away for a while. So now it appears you need to rescue the system again by the same method you used to get the information you posted. All of that is fine. Except we already know the information you posted, and so we know how those arrays are supposed to go together.

But that is okay. You can go through rescue mode and assemble the arrays exactly as you did before, then --stop the arrays and assemble them correctly. But since we now know how they are supposed to be assembled, you could skip the first assembly in rescue mode (or livecd mode, or whatever you used) and simply assemble the arrays correctly the first time. Basically I think you are going to do:

  * rescue
  * assemble arrays
  * stop arrays
  * assemble arrays correctly

Which is perfectly acceptable. The result will be fine. But now that we know what we need to do you could simply do this:

  * rescue
  * assemble arrays correctly

But I don't want to distract you with complications like this! :-)

And then after you get everything working you should revisit the partitioning on that second array. Your partitioning starts at sector 1, but that won't be aligned. It will cause all writes to be a read-modify-write and performance will suffer.

     Device Boot      Start       End      Blocks   Id  System
  /dev/sdd1               1      243201  1953512001  fd  Linux raid autodetect
  Partition 1 does not start on physical sector boundary.

     Device Boot      Start       End      Blocks   Id  System
  /dev/sde1               1      243201  1953512001  fd  Linux raid autodetect
  Partition 1 does not start on physical sector boundary.

Instead of using the entire disk starting at sector 1, it would be much better to start at sector 2048, as is the new standard for Advanced Format 4k-sector drives. I would expect that to be a large performance lever on your system. But fix that after you have your data up and available.

Bob
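The alignment point above can be checked with simple arithmetic: on a drive with 4096-byte physical and 512-byte logical sectors, a partition's start LBA is aligned exactly when it is divisible by 8 (8 * 512 = 4096). A quick sketch using the start sectors mentioned in this thread:

```shell
# Check 4K alignment of partition start sectors (values from this thread):
# 1 is the problem start here, 63 was the old d-i default, 2048 is the new one.
for start in 1 63 2048; do
    if [ $((start % 8)) -eq 0 ]; then
        echo "start sector $start: aligned"
    else
        echo "start sector $start: NOT aligned (read-modify-write penalty)"
    fi
done
```

This is why sector 1 and the old 63-sector offset both trigger the "does not start on physical sector boundary" warning, while 2048 does not.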
Re: Debian machine not booting
Thanks Bob, really can't thank you enough. Just to be clear about this, I'd run these commands from the rescue disk after I have assembled the arrays and gone to the bash shell?

Much appreciated,
James

On 2 July 2013 22:44, Bob Proulx b...@proulx.com wrote:
[...]
Re: Debian machine not booting
James Allsopp wrote:
> Thanks Bob, really can't thank you enough. Just to be clear about
> this, I'd do these commands from the rescue disk after I have
> assembled the arrays and gone to the bash shell?

Short answer: Yes. Go for it!

Longer answer: There are all kinds of things I want to say here. And I already said a lot! There are some optimizations that could be made. But if you do what is outlined it should work. But I don't want to make things more confusing by talking about minor things.

I have my fingers crossed for you! :-)

Bob
Re: Debian machine not booting
Thanks Bob, I'll get back to you after I've followed your instructions. I think I'm going to have to learn to type with crossed fingers! I think I initially sorted out all my partitions manually, rather than using the installer to do it automatically.

Really appreciated,
James

On 2 July 2013 00:46, Bob Proulx b...@proulx.com wrote:

James Allsopp wrote:
> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
> md126 : active raid1 sdb3[0] sdc3[1]
>       972550912 blocks [2/2] [UU]

So sdb3 and sdc3 are assembled into /dev/md126. That seems good. One full array is assembled. Is /dev/md126 your preferred name for that array? I would guess not. Usually it is /dev/md0 or some such. But when that name is not available because it is already in use, mdadm will rotate up to a later name like /dev/md126. You can fix this by using mdadm with --update=super-minor to force it back to the desired name. Something like this, using your devices:

  mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3

But that can only be done at assembly time. If it is already assembled then you would need to stop the array first and then assemble it again.

> md127 : active raid1 sdd1[0]
>       1953510841 blocks super 1.2 [2/1] [U_]
> md1 : active raid1 sde1[1]
>       1953510841 blocks super 1.2 [2/1] [_U]

I think this array now has a split-brain problem. At this point the original single mirrored array has had both halves of the mirror assembled and both are running. So now you have two clones of each other and both are active, meaning that each thinks it is newer than the other. Is that right? In which case you will eventually need to pick one and call it the master. I think sde1 is the natural master since it is assembled on /dev/md1.

> cat /etc/mdadm/mdadm.conf
> ...
> # definitions of existing MD arrays
> ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04

Only one array is specified. That is definitely part of your problem. You should have at least two arrays specified there.

> mdadm --detail --scan:
> ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04

That mdadm --scan only found one array is odd.

> fdisk -l
> Disk /dev/sda: 120.0 GB, 120033041920 bytes
> 255 heads, 63 sectors/track, 14593 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0002ae52
>
>    Device Boot      Start       End      Blocks   Id  System
> /dev/sda1               1       14593   117218241  83  Linux

I take it that this is your boot disk? Your boot disk is not RAID? I don't like that the first used sector is 1. That would have been 63 using the previous debian-installer, to leave space for the MBR and other things. But that is a different issue.

> Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes

That is an Advanced Format 4k-sector drive, meaning that the partitions should start on a 4k sector alignment. The debian-installer would do this automatically.

> Disk identifier: 0xe044b9be
>
>    Device Boot      Start       End      Blocks   Id  System
> /dev/sdd1               1      243201  1953512001  fd  Linux raid autodetect
>                         ^
> /dev/sde1               1      243201  1953512001  fd  Linux raid autodetect
>                         ^
> Partition 1 does not start on physical sector boundary.

I don't recall if the first sector is 0 or 1, but I think the first sector is 0 for the partition table, meaning that sector 1 is not going to be 4k aligned. (Can someone double check me on this?) Meaning that this will require a lot of read-modify-write, causing performance problems for those drives. The new standard for sector alignment would start at 2048 to leave space for the partition table and other things and still be aligned properly.

> I don't know if this helps or where to go from here, but I think I
> need to get the mdadm up and running properly before I do anything.

Probably a good idea.

> If there's any commands you need me to run, please ask,

How are you booted now? Are you root on the system through something like the debian-installer rescue boot? Or did you use a live cd or something?

Please run:

  # mdadm --detail /dev/sdd1
  # mdadm --detail /dev/sde1

Those are what look to be the split brain of the second array. They will list something at the bottom that will look like:

        Number   Major   Minor   RaidDevice State
  this     1       8       17        1      active sync   /dev/sdb1
     0     0       8        1        0      active sync   /dev/sda1
     1     1
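As an aside, the Major/Minor columns in tables like the one above map directly to device names: block major 8 is the sd driver, and each disk owns a run of 16 minors (the whole disk plus up to 15 partitions). A sketch of the decoding (the decode helper is made up for illustration and assumes a partition minor, i.e. minor % 16 != 0):

```shell
# Decode mdadm's Major/Minor columns into sd* partition names.
decode() {
    minor=$2
    disk=$((minor / 16)) part=$((minor % 16))
    # 97 is ASCII 'a': disk index 0 -> sda, 1 -> sdb, ...
    letter=$(printf "\\$(printf '%03o' $((97 + disk)))")
    echo "major $1 minor $minor -> /dev/sd${letter}${part}"
}
decode 8 49    # the sdd1 entry seen later in this thread
decode 8 65    # the sde1 entry seen later in this thread
```

So a row reading "8 49" is /dev/sdd1 and "8 65" is /dev/sde1, which matches the device names mdadm prints in the State column.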
Re: Debian machine not booting
One other point: sda isn't the boot hard drive, that's the partitions sdb1 and sdc1, but these should be the same (I thought I'd mirrored them, to be honest). I tried mdadm --detail /dev/sdd1 but it didn't work. I have these results if they help?

/dev/md1:
        Version : 1.2
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
     Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Jul  2 13:49:55 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : Hawaiian:1  (local to host Hawaiian)
           UUID : a544829f:33778728:79870439:241c5c51
         Events : 112

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       65        1      active sync   /dev/sde1

/dev/md127:
        Version : 1.2
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
     Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Jul  2 13:49:29 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : Hawaiian:1  (local to host Hawaiian)
           UUID : a544829f:33778728:79870439:241c5c51
         Events : 106

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/sdd1
       1       0        0        1      removed

How should I proceed from here?

James

On 2 July 2013 09:50, James Allsopp jamesaalls...@googlemail.com wrote:
[...]
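One way to make the "which half is newer" judgment mechanical is to compare the two Update Time stamps in the --detail output above. A sketch, assuming GNU date (the timestamps are copied from this thread):

```shell
# Decide which half of the split mirror is fresher by comparing Update Times.
t_md1=$(date -d 'Tue Jul 2 13:49:55 2013' +%s)      # /dev/md1   (sde1)
t_md127=$(date -d 'Tue Jul 2 13:49:29 2013' +%s)    # /dev/md127 (sdd1)
if [ "$t_md1" -gt "$t_md127" ]; then
    echo "md1 (sde1) is newer"
else
    echo "md127 (sdd1) is newer"
fi
```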
Re: Debian machine not booting
For further information:

/dev/sdb3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : a529cd1b:c055887e:bfe78010:bc810f04
  Creation Time : Fri Nov 20 09:37:34 2009
     Raid Level : raid1
  Used Dev Size : 972550912 (927.50 GiB 995.89 GB)
     Array Size : 972550912 (927.50 GiB 995.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Tue Jul  2 13:49:18 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 6203fa40 - correct
         Events : 1036616

        Number   Major   Minor   RaidDevice State
  this     0       8       19        0      active sync   /dev/sdb3
     0     0       8       19        0      active sync   /dev/sdb3
     1     1       8       35        1      active sync   /dev/sdc3

/dev/sdc3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : a529cd1b:c055887e:bfe78010:bc810f04
  Creation Time : Fri Nov 20 09:37:34 2009
     Raid Level : raid1
  Used Dev Size : 972550912 (927.50 GiB 995.89 GB)
     Array Size : 972550912 (927.50 GiB 995.89 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Tue Jul  2 13:49:18 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 6203fa52 - correct
         Events : 1036616

        Number   Major   Minor   RaidDevice State
  this     1       8       35        1      active sync   /dev/sdc3
     0     0       8       19        0      active sync   /dev/sdb3
     1     1       8       35        1      active sync   /dev/sdc3

/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a544829f:33778728:79870439:241c5c51
           Name : Hawaiian:1  (local to host Hawaiian)
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3907021954 (1863.01 GiB 2000.40 GB)
     Array Size : 3907021682 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 3907021682 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1e0de6be:bbcc874e:e00e2caa:593de9b1

    Update Time : Tue Jul  2 13:51:19 2013
       Checksum : a8cf720f - correct
         Events : 108

    Device Role : Active device 0
    Array State : A. ('A' == active, '.' == missing)

/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : a544829f:33778728:79870439:241c5c51
           Name : Hawaiian:1  (local to host Hawaiian)
  Creation Time : Thu Jan 31 22:43:49 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 3907021954 (1863.01 GiB 2000.40 GB)
     Array Size : 3907021682 (1863.01 GiB 2000.40 GB)
  Used Dev Size : 3907021682 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 926788c3:9dfbf62b:26934208:5a72d05d

    Update Time : Tue Jul  2 13:51:05 2013
       Checksum : 94e2b4a1 - correct
         Events : 114

    Device Role : Active device 1
    Array State : .A ('A' == active, '.' == missing)

Thanks,
James

On 2 July 2013 13:52, James Allsopp jamesaalls...@googlemail.com wrote:
[...]
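The Events counters in the two 1.2-superblock --examine dumps above point the same way as the Update Time stamps: the half with the higher counter has seen more array activity and is the one to keep. A sketch with the values copied from this thread:

```shell
# Compare the Events counters from mdadm --examine (values from this thread).
sdd1_events=108
sde1_events=114
if [ "$sde1_events" -gt "$sdd1_events" ]; then
    echo "keep /dev/sde1, re-add /dev/sdd1"
else
    echo "keep /dev/sdd1, re-add /dev/sde1"
fi
```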
Re: Debian machine not booting
James Allsopp wrote:
> One other point: sda isn't the boot hard drive, that's the partitions
> sdb1 and sdc1, but these should be the same (I thought I'd mirrored
> them, to be honest).

I don't see sda anywhere. It might be a dual-booting Windows disk? Or other. But the BIOS will boot the first disk from the BIOS boot order. BIOS boot order may be different from OS disk order. It can be confusing. I might assume that BIOS sata0 is the same as the OS disk sda but actually it is often different. Let's ignore this for now.

You have sdd1 and sde1 mirrored together (currently split across md1 and md127). I can see that because the UUID is identical.

> /dev/md1:
>         Version : 1.2
>   Creation Time : Thu Jan 31 22:43:49 2013
>      Raid Level : raid1
>      Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
>   Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)
>    Raid Devices : 2
>   Total Devices : 1
>     Persistence : Superblock is persistent
>     Update Time : Tue Jul  2 13:49:55 2013
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 0
>   Spare Devices : 0
>            Name : Hawaiian:1  (local to host Hawaiian)
>            UUID : a544829f:33778728:79870439:241c5c51
>          Events : 112
>
>     Number   Major   Minor   RaidDevice State
>        0       0        0        0      removed
>        1       8       65        1      active sync   /dev/sde1

That info is the same as:

> /dev/md127:
>         Version : 1.2
>   Creation Time : Thu Jan 31 22:43:49 2013
>      Raid Level : raid1
>      Array Size : 1953510841 (1863.01 GiB 2000.40 GB)
>   Used Dev Size : 1953510841 (1863.01 GiB 2000.40 GB)
>    Raid Devices : 2
>   Total Devices : 1
>     Persistence : Superblock is persistent
>     Update Time : Tue Jul  2 13:49:29 2013
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 0
>   Spare Devices : 0
>            Name : Hawaiian:1  (local to host Hawaiian)
>            UUID : a544829f:33778728:79870439:241c5c51
>          Events : 106
>
>     Number   Major   Minor   RaidDevice State
>        0       8       49        0      active sync   /dev/sdd1
>        1       0        0        1      removed

The UUIDs are identical. Therefore those two disks are mirrors of each other. And note:

  /dev/md1:   (/dev/sde1)  Update Time : Tue Jul  2 13:49:55 2013
  /dev/md127: (/dev/sdd1)  Update Time : Tue Jul  2 13:49:29 2013

sde1 is newer than sdd1. This seems consistent with it being the best copy to keep. If it were the other way around I would think about using the other one. But selecting the right master is important since it is a component of the LVM.

> How should I proceed from here?

I would proceed as previously suggested. I would do this:

  mdadm --stop /dev/md127
  mdadm --manage /dev/md1 --add /dev/sdd1
  watch cat /proc/mdstat

That will discard the older, stale copy of the mirror on sdd1 and use sdd1 as a mirror of sde1. After doing the add, the mirror will sync and you can watch the progress using 'watch cat /proc/mdstat'. Use control-c to interrupt it when you want to stop watching.

> For further information:
> /dev/sdb3:
> Preferred Minor : 126
> ...
> /dev/sdc3:
> Preferred Minor : 126
> ...

That further information looked _okay_ to me. But I would still change the md126 back to md0.

  mdadm --stop /dev/md126
  mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3
  cat /proc/mdstat

Since it is clean now, it will be stopped cleanly and reassembled cleanly and no sync will be needed. The --update=super-minor will reset the superblock with the updated md0 minor device number. Then update /etc/mdadm/mdadm.conf and rebuild the initrd.

Bob
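While watching /proc/mdstat as suggested above, degraded arrays are the ones whose status field still contains an underscore. A sketch that picks them out of a saved snapshot (the sample below is abridged from the mdstat output posted in this thread; on a live system you would read /proc/mdstat itself):

```shell
# Save an abridged /proc/mdstat snapshot (content from this thread).
cat > /tmp/mdstat.sample <<'EOF'
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md126 : active raid1 sdb3[0] sdc3[1]
      972550912 blocks [2/2] [UU]

md127 : active raid1 sdd1[0]
      1953510841 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sde1[1]
      1953510841 blocks super 1.2 [2/1] [_U]

unused devices: <none>
EOF
# An underscore in the [UU] field means a missing mirror half.
awk '/^md/ {name=$1} /blocks/ && /_/ {print name}' /tmp/mdstat.sample
```

Here it prints md127 and md1 (the split halves) and not md126, which shows [UU] and is healthy.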
Re: Debian machine not booting
Thanks Bob for your e-mail, it was really helpful. I think you've identified the nub of the problem: not updating mdadm.conf and the initramfs. However, things are a bit unusual on the other side. I'm not sure if the rescue disk or I screwed something up, but the second RAID, which /home extends onto, has divided into two raid arrays. Here's a summary.

cat /proc/mdstat:

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md126 : active raid1 sdb3[0] sdc3[1]
      972550912 blocks [2/2] [UU]

md127 : active raid1 sdd1[0]
      1953510841 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sde1[1]
      1953510841 blocks super 1.2 [2/1] [_U]

unused devices: <none>

cat /etc/mdadm/mdadm.conf:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04

# This file was auto-generated on Mon, 11 Jan 2010 22:18:22 +
# by mkconf 3.0.3-2

mdadm --detail --scan:

ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04

ls -l /dev/disk/by-uuid:

total 0
lrwxrwxrwx 1 root root 10 Jun 30 23:25 5e39b4bc-3b24-4df3-978d-1b3d3dca97da -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jun 30 23:25 93a8d1f1-96f2-4169-852a-b37100b3e497 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jun 30 23:25 a5c8d2c0-e454-4288-9987-ea7712242858 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jun 30 23:25 ba9f44ad-d43e-4863-801d-2de96d80ca08 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Jun 30 23:25 ea2afa32-26b3-42af-83a3-57efc3ae3dce -> ../../sdb2

fdisk -l:

Disk /dev/sdb: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xf229fe3e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          37      297171   83  Linux
/dev/sdb2              38         524     3911827+  82  Linux swap / Solaris
/dev/sdb3             525      121601   972551002+  fd  Linux raid autodetect

Disk /dev/sda: 120.0 GB, 120033041920 bytes
255 heads, 63 sectors/track, 14593 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0002ae52

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       14593   117218241   83  Linux

Disk /dev/sdc: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00049c5c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1          37      297171   83  Linux
/dev/sdc2              38         524     3911827+  82  Linux swap / Solaris
/dev/sdc3             525      121601   972551002+  fd  Linux raid autodetect

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xe044b9be

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      243201  1953512001   fd  Linux raid autodetect
Partition 1 does not start on physical sector boundary.

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcfa9d090

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      243201  1953512001   fd  Linux raid autodetect
Partition 1 does not start on physical sector boundary.

Disk /dev/md1: 2000.4 GB, 2000395101184 bytes
2 heads, 4 sectors/track, 488377710 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Alignment offset: 512 bytes
Disk identifier: 0x

Disk /dev/md127: 2000.4 GB, 2000395101184 bytes
2 heads, 4 sectors/track, 488377710 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Alignment
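The tell-tale in the mdstat output above is the [2/1] status on md127 and md1: each half of the mirror came up as its own one-disk array. A small sketch (POSIX sh and awk assumed, run against a saved copy of the output; the file names here are made up for illustration) that flags any array whose active-disk count is short of its total:

```shell
# Flag degraded md arrays: any "[total/active]" status where active < total.
# Sketch only -- parses a saved sample of /proc/mdstat, not the live file.
cat > mdstat.sample <<'EOF'
md126 : active raid1 sdb3[0] sdc3[1]
      972550912 blocks [2/2] [UU]
md127 : active raid1 sdd1[0]
      1953510841 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sde1[1]
      1953510841 blocks super 1.2 [2/1] [_U]
EOF

awk '
/^md[0-9]+ :/ { dev = $1 }               # remember the current array name
{
  if (match($0, /\[[0-9]+\/[0-9]+\]/)) { # find a [total/active] marker
    s = substr($0, RSTART + 1, RLENGTH - 2)
    split(s, p, "/")
    if (p[2] + 0 < p[1] + 0)
      print dev " degraded [" s "]"
  }
}' mdstat.sample > degraded.txt
cat degraded.txt
```

On the sample above this reports md127 and md1 as degraded and stays silent about the healthy md126.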
Re: Debian machine not booting
James Allsopp wrote:
> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
> md126 : active raid1 sdb3[0] sdc3[1]
>       972550912 blocks [2/2] [UU]

So sdb3 and sdc3 are assembled into /dev/md126. That seems good. One full array is assembled.

Is /dev/md126 your preferred name for that array? I would guess not. Usually it is /dev/md0 or some such. But when that name is not available because it is already in use, mdadm will rotate up to a later name like /dev/md126. You can fix this by using mdadm with --update=super-minor to force it back to the desired name. Something like this, using your devices:

  mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3

But that can only be done at assembly time. If the array is already assembled then you would need to stop it first and then assemble it again.

> md127 : active raid1 sdd1[0]
>       1953510841 blocks super 1.2 [2/1] [U_]
>
> md1 : active raid1 sde1[1]
>       1953510841 blocks super 1.2 [2/1] [_U]

I think this array now has a split-brain problem. At this point the original single mirrored array has had both halves of the mirror assembled, and both are running. So now you have two clones of each other and both are active, meaning that each thinks it is newer than the other. Is that right? In which case you will eventually need to pick one and call it the master. I think sde1 is the natural master since it is assembled on /dev/md1.

> cat /etc/mdadm/mdadm.conf
> ...
> # definitions of existing MD arrays
> ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04

Only one array is specified. That is definitely part of your problem. You should have at least two arrays specified there.

> mdadm --detail --scan:
> ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04

That mdadm --scan found only one array is odd.
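Spelled out, the stop-then-reassemble sequence for renaming md126 might look like the following. This is a dry-run sketch: it only echoes the commands through a tiny helper instead of executing them, both because running them against the wrong devices is destructive and because (as seen later in this thread) mdadm --stop refuses with "Device or resource busy" while anything is mounted or an LVM volume group is active on the array. The unmount step is an assumption, not something run on this machine:

```shell
# Dry-run sketch: rename md126 back to md0 at assembly time.
# The run() helper echoes each command instead of executing it.
run() { echo "would run: $*" >> cmdlog.txt; }
: > cmdlog.txt

run umount /dev/md126            # assumption: nothing may still be using the array
run mdadm --stop /dev/md126      # --update only takes effect at assembly time
run mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3

cat cmdlog.txt
```

To actually perform the rename one would delete the run prefix from each line; the echo-only form makes the ordering reviewable without touching any array.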
> fdisk -l
>
> Disk /dev/sda: 120.0 GB, 120033041920 bytes
> 255 heads, 63 sectors/track, 14593 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0002ae52
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1       14593   117218241   83  Linux

I take it that this is your boot disk? Your boot disk is not RAID? I don't like that the first used sector is 1. That would have been 63 using the previous debian-installer, to leave space for the MBR and other things. But that is a different issue.

> Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes

That is an Advanced Format 4k-sector drive, meaning that the partitions should start on a 4k sector alignment. The debian-installer would do this automatically.

> Disk identifier: 0xe044b9be
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdd1               1      243201  1953512001   fd  Linux raid autodetect
> /dev/sde1               1      243201  1953512001   fd  Linux raid autodetect
> Partition 1 does not start on physical sector boundary.

Note the Start value of 1 on both of those partitions. I don't recall if the first sector is 0 or 1, but I think the first sector is 0 for the partition table, meaning that sector 1 is not going to be 4k aligned. (Can someone double check me on this?) Meaning that this will require a lot of read-modify-write, causing performance problems for those drives. The new standard for sector alignment starts at 2048, to leave space for the partition table and other things and still be aligned properly.

> I don't know if this helps or where to go from here, but I think I
> need to get mdadm up and running properly before I do anything.

Probably a good idea.

> If there are any commands you need me to run, please ask,

How are you booted now?
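To make the "at least two arrays" point concrete: the fixed /etc/mdadm/mdadm.conf should end with one ARRAY line per array. The md0 line below is the one from James's file; the md1 line is a sketch with a placeholder UUID, to be replaced with whatever mdadm actually reports for the sdd1/sde1 pair:

```
# definitions of existing MD arrays -- one ARRAY line per array
ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04
# placeholder UUID -- substitute the real value reported by mdadm
ARRAY /dev/md1 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
```

After editing the file, the initramfs has to be rebuilt so the boot-time copy of mdadm.conf matches.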
Are you root on the system through something like the debian-installer rescue boot? Or did you use a live CD or something?

Please run:

  # mdadm --detail /dev/sdd1
  # mdadm --detail /dev/sde1

Those are what look to be the split brain of the second array. They will list something at the bottom that will look like:

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1

Except in your case each will list one drive and will probably have the other drive listed as removed. But importantly it will list the UUID of the array in the listing.

          Magic : a914bfec
        Version : 0.90.00
           UUID : b8eb34b1:bcd37664:2d9e4c59:117ab348
  Creation Time : Fri Apr 30 17:21:12 2010
     Raid Level : raid1
  Used Dev Size :
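Once that listing is in hand, the UUID line can be pulled out mechanically rather than retyped. A sketch against the sample output shown above (saved to a file; the UUID is Bob's example value, not this machine's, and the generated ARRAY line is just the shape such a line takes):

```shell
# Extract the array UUID from saved mdadm superblock output and turn it
# into a mdadm.conf ARRAY line.  Sample values are the ones shown above.
cat > examine.sample <<'EOF'
          Magic : a914bfec
        Version : 0.90.00
           UUID : b8eb34b1:bcd37664:2d9e4c59:117ab348
  Creation Time : Fri Apr 30 17:21:12 2010
     Raid Level : raid1
EOF

# Field separator " : " splits the label from the value.
uuid=$(awk -F' : ' '/^ *UUID/ { print $2 }' examine.sample)
echo "ARRAY /dev/md1 UUID=$uuid" > array-line.txt
cat array-line.txt
```

In practice `mdadm --detail --scan` emits ready-made ARRAY lines, so this is only useful when working from a single drive's superblock dump.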
Debian machine not booting
Hi,

I have a Debian machine which was on for a long time (~months). Just moved house and rebooted, and now it doesn't boot. My 4 hard drives are organised in pairs of RAID 1 (mirrored) with LVM spanning them. Originally there was just one pair, but then I got two new hard drives and added them. I then increased the space of VolGroup-LogVol03 to cover these new drives and increased the space of /home (/ was on one of the other logical volumes). This all worked fine for ages.

When I boot, all four drives are detected in the BIOS and I've checked all the connections. It gets to "3 logical volumes in volume group VolGroup now active", which sounds good. Then:

  Activating lvm and md swap... done
  Checking file systems... fsck from util-linux-ng 2.17.2
  /dev/sde1: clean
  /dev/sda1: clean
  /dev/mapper/VolGroup-LogVol01: clean
  /dev/mapper/VolGroup-LogVol02: clean

Then here's the error:

  fsck.ext4: No such file or directory while trying to open /dev/mapper/VolGroup-LogVol03
  /dev/mapper/VolGroup-LogVol03:
  The superblock could not be read or does not describe a correct ext2 filesystem.

NB. All partitions are ext4, from memory.

It then drops to a maintenance shell and says to check a log (/var/log/fsck/checkfs), but I don't even have a log directory at this point in the boot process. I'm wondering if some of the drive IDs have been switched. Apologies for the quoting; I'm not using the computer in question.

Any help would be really appreciated. I'm worried I've lost all my data on /home.

Thanks,
James
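The "No such file or directory" part of that error points at the device-mapper node itself being absent (the LV never activated) rather than at a damaged filesystem, and that distinction decides whether to go after mdadm/LVM or after the superblock. A sketch of that check, with check_lv as a hypothetical helper name and the path taken from the boot message above:

```shell
# Sketch: distinguish "device node missing" from "possible bad superblock".
# check_lv is a made-up helper name for illustration.
check_lv() {
  if [ -e "$1" ]; then
    echo "$1 exists: LV is activated; inspect the filesystem next"
  else
    echo "$1 missing: LV never activated; fix mdadm/LVM before touching fsck"
  fi
}

check_lv /dev/mapper/VolGroup-LogVol03 > check.txt

# Positive control on a path that does exist:
tmpfile=$(mktemp)
check_lv "$tmpfile" >> check.txt
rm -f "$tmpfile"
cat check.txt
```

On this machine the node was missing because the second RAID pair was absent from mdadm.conf, so the volume group could only partially activate.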
Re: Debian machine not booting
James Allsopp wrote:
> I have a Debian machine which was on for a long time (~months). Just
> moved house and rebooted and now it doesn't boot.

Bummer.

> My 4 hard drives are organised in pairs of RAID 1 (mirrored) with LVM
> spanning them. Originally there was just one pair, but then I got two
> new hard drives and added them. I then increased the space of
> VolGroup-LogVol03 to cover these new drives and increased the space of
> /home (/ was on one of the other logical volumes). This all worked
> fine for ages.

Sounds fine. Assuming that it booted after those changes.

> When I boot all four drives are detected in BIOS and I've checked all
> the connections.

Good.

> It gets to "3 logical volumes in volume group VolGroup now active"
> which sounds good.

That does sound good.

> Then here's the error:
> fsck.ext4: No such file or directory while trying to open /dev/mapper/VolGroup-LogVol03
> /dev/mapper/VolGroup-LogVol03:
> The superblock could not be read or does not describe a correct ext2 filesystem.

Hmm... I am not familiar with that error. But searching the web found several stories about it. Most concerned recent changes to the system that prevented it from booting.

> Originally there was just one pair, but then I got two new hard drives
> and added them. I then increased the space of VolGroup-LogVol03 to
> cover these new drives and increased the space of /home. This all
> worked fine for ages.

And you rebooted in that time period? Otherwise these changes, if not done completely correctly, seem primed to have triggered your current problem independent of any other action. You say it was on for a long time. If you had not rebooted in that long time then this may have been a hang-fire problem for all of that time.
> I'm wondering if some of the drive id's have been switched.

If you mean the drive UUIDs, then no, those would not have changed.

> Any help would be really appreciated. I'm worried I've lost all my
> data on /home.

First, do not despair. You should be able to get your system working again. You are probably simply missing the configuration for the extra raid pair.

I strongly recommend using the debian-installer rescue mode to gain control of your system again. It works well and is readily available. Use a standard Debian installation disk. Usually we recommend the netinst disk because it is the smallest image, but any of the netinst, CD#1, or DVD#1 images will work fine for rescue mode, since at that point it is not actually installing but booting your system, so the difference between them does not matter. You have a disk? Go fish it out and boot it.

Here is the official documentation for it:

  http://www.debian.org/releases/stable/i386/ch08s07.html.en

But that is fairly terse. Let me say that the rescue mode looks just like the install mode initially. It will ask your keyboard and locale questions and you might wonder if you are rescuing or installing! But it will have "Rescue" in the upper left corner so that you can tell that you are not in install mode and be assured. Get the tool set up with keyboard, locale, timezone, and similar, and eventually it will give you a menu with a list of actions. Here is a quick run-through:

  Advanced options...
    Rescue mode
  keyboard dialog
  ...starts networking...
  hostname dialog
  domainname dialog
  ...apt update release files...
  ...loading additional components, Retrieving udebs...
  ...detecting disks...

Then eventually it will get to a menu "Enter rescue mode" that will ask what device to use as a root file system. It will list the partitions that it has automatically detected. If you have used RAID then one of the menu entries near the bottom will be "Assemble RAID array" and you should assemble the raid at that point.
That will bring up the next dialog menu asking for partitions to assemble. Select the ones appropriate for your system, then continue. Since you have two RAID configurations I think you will need to do this twice, once for each. I believe that you won't be able to use the "automatically select partitions" option, but I am not sure. In any case, get both raid arrays up and online at this step before proceeding.

At that point it presents a menu "Execute a shell in /dev/..." That should get you a shell on your system with the root partition mounted. It is a /bin/sh shell. I usually at that point start bash so as to have bash command line recall and editing. Then mount all of the additional disks:

  # /bin/bash
  root@hostname:~# mount -a

At that point you have a root superuser shell on the system and can make system changes. After doing what needs doing you can reboot to the system. Remove the Debian install media, boot to the normal system, and see if the changes were able
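For the record, the repair James eventually reported back in this thread boils down to a short sequence from the maintenance shell. The following is a dry-run sketch that only echoes each step (the mdadm.conf merge between steps is a manual edit, and update-initramfs -u is a standard alternative to the dpkg-reconfigure of the kernel package that he actually ran):

```shell
# Dry-run outline of the repair eventually reported in this thread.
# run() echoes each command instead of executing it.
run() { echo "would run: $*" >> steps.txt; }
: > steps.txt

run mdadm --assemble /dev/md1 /dev/sdd1 /dev/sde1  # recreate the split array
run mdadm --detail --scan                          # emits ARRAY lines for mdadm.conf
# (manually merge those ARRAY lines into /etc/mdadm/mdadm.conf,
#  removing any duplicates)
run update-initramfs -u   # thread used: dpkg-reconfigure linux-image-$(uname -r)
run reboot

cat steps.txt
```

The initramfs rebuild is the step that was originally skipped; without it the boot-time copy of mdadm.conf still only knows about one array.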