Re: HELP! Re: How to fix I/O errors? (SOLVED)
On Sun, Feb 12, 2017 at 09:36:16PM -0500, Bob Weber wrote:
> I use a program called ossec. It watches logs of all my linux boxes so I get
> email messages about disk problems. I also do periodic self tests on all my
> drives controlled by smartd from the smartmontools package. I also use a
> package called logwatch which summarizes my logs. The messages from mdadm and
> smartd are seen by ossec. When I mess with an array to make it larger and add
> a disk for backup I get the messages in my mailbox about a degraded array. As
> I'm reading them I am startled until I remember ...Oh I did that! I have a
> daily cron job that emails the output of "smartctl -a /dev/sdx" for each
> drive on each machine so I can keep a history of the parameters for each
> drive.

$ apt-file search ossec
sagan-rules: /etc/sagan-rules/ossec.rules

Seems like the only reference to ossec in Jessie is this rules file in the
Sagan package. Looking at the description for sagan-rules, it seems to be
along the right lines. But the sagan package itself is in wheezy and in
stretch/sid, yet not in jessie. Any idea what's up with that? And was ossec
packaged, or did you build it from source?

Cheers
Mark
Re: HELP! Re: How to fix I/O errors? (SOLVED)
On 02/12/2017 06:36 PM, Bob Weber wrote:
> After writing this I wonder if I am overdoing this. I just don't want to
> lose data from a failing drive. I lived through 3.5 inch floppies which
> seemed to always fail. And tape drives that were painfully slow. Not to
> mention back in the mid 70s saving Z80 programs and data to audio cassette
> tapes at 1200 baud! I was so glad to get my first 8 inch floppies working.
>
> ...Bob

I, too, remember the cassette tapes for saving files and programs on my
TRS-80 Model III. I think I still have a few of those tapes (10-minute tapes
for a single program) lying around. The Radio Shack cassette player has long
since died, however.

Marc
Re: HELP! Re: How to fix I/O errors? (SOLVED)
On 02/12/2017 01:59 PM, Marc Shapiro wrote:
> On 02/12/2017 08:30 AM, Marc Auslander wrote:
>> I do not use LVM over raid 1. I think it can be made to work,
>> although IIRC booting from an LVM over RAID partition has caused issues.
> My boot partitions are separate. They are not under LVM.
>> LVM is useful when space requirements are changing over time and the
>> ability to add additional disks and grow logical partitions is needed.
>> In my case, that isn't an issue. I have only a small number of
>> partitions - 3 because of history but starting from scratch, I'd only
>> have two - root (including boot) and /home.
> I started using LVM when I had a much smaller disk (40GB). With the current
> 1TB disk, even with three accounts on the box, and expanding several
> partitions when moving to the new disk, I have still partitioned less than
> half the disk and that is less than 1/3 used. So, no, LVM is probably not
> an issue any more.
>
> BTW, what is your third partition, and why would you not separate it now if
> starting from scratch?
>> I converted to mdadm raid as follows, IIRC.
>>
>> Install the second disk, and partition it the way I wanted.
>> Create a one disk raid 1 partition in each of the new partitions.
>> Take down my system, boot a live system from CD, and use a reliable
>> copy program like rsync to copy each of the partitions' contents to the
>> equivalent raid partition.
>> Run grub to set the new disk as bootable. This is by far the
>> trickiest part.
>> Boot the new system and verify it's happy.
>> Repartition the now spare disk to match the new one if necessary.
>> You may need to zero the front of each partition with dd if=/dev/zero
>> to avoid mdadm error checks.
>> Add the partitions from that disk to the mdadm partitions and let mdadm
>> do its thing.
>
> On 02/12/2017 07:08 AM, Bob Weber wrote:
>> I use raid 1 also for the redundancy it provides.
>> If I need a backup I just connect a disk, grow each array and add it to
>> the array (I have 3 arrays for /, /home and swap). It syncs up in a couple
>> hours (depending on size of the array). If you have grub install itself on
>> the added disk you have a bootable copy of your system (mdadm will
>> complain about a degraded array). I then remove the drive and place it in
>> another outbuilding in case of fire. You can even use an external USB disk
>> housing for the drive to keep from shutting down the system. The sync is
>> MUCH slower ... just come back the next day and you will have your backup.
>> You then grow each array back to the number of disks you had before and
>> all is happy again. Note that this single disk backup will only work with
>> raid 1.
>
> So, how do you do a complete restore from backup? Boot from just the single
> backup drive and add additional drives as Marc Auslander describes, above?

Yes, that is what you would need to do if there was a complete failure in
your machine and maybe you had to start over with a new motherboard and power
supply.

> One other question. If using raid, how do you know when a disk is starting
> to have trouble, as mine did? Since the whole purpose of raid is to keep
> the system up and running I wouldn't expect errors to pop up like I was
> getting. Do you have to keep an eye on log files? Which ones? Or is there
> some other way that mdadm provides notification of errors? I've got to
> admit, even though I have been using Debian for 18 or 19 years (since Bo),
> log files have never been my favorite thing. I generally only look at them
> when I have a problem and someone on this list tells me what to look for
> and where.
>
> Marc

I use a program called ossec. It watches logs of all my linux boxes so I get
email messages about disk problems. I also do periodic self tests on all my
drives controlled by smartd from the smartmontools package. I also use a
package called logwatch which summarizes my logs.
The messages from mdadm and smartd are seen by ossec. When I mess with an
array to make it larger and add a disk for backup I get the messages in my
mailbox about a degraded array. As I'm reading them I am startled until I
remember ...Oh I did that! I have a daily cron job that emails the output of
"smartctl -a /dev/sdx" for each drive on each machine so I can keep a history
of the parameters for each drive.

I also use backuppc on a dedicated server to back up all my boxes. That way I
can get back files I deleted by mistake, or modified and had to go back to a
previous version. I now have all my machines on raid 1. My wife just recently
gave up on Win 10 with all those updates that just took over her machine when
Windows wanted to! So now she is running Debian/KDE.

After writing this I wonder if I am overdoing this. I just don't want to lose
data from a failing drive. I lived through 3.5 inch floppies which seemed to
always fail. And tape drives that were painfully slow. Not to mention back in
the mid 70s saving Z80 programs and data to audio cassette tapes at 1200 baud!

...Bob
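[For anyone wanting to copy the daily SMART-history cron job Bob describes,
here is a minimal sketch. The drive list, recipient, and file path are
assumptions, not Bob's actual script; adjust for your own machine.]

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/smart-report: mail the full SMART attribute
# dump for each drive, so a history of the parameters accumulates in a
# mailbox. Drive names and MAILTO are placeholders.
MAILTO=root
for dev in /dev/sda /dev/sdb; do
    smartctl -a "$dev" | mail -s "SMART report: $dev on $(hostname)" "$MAILTO"
done
```

Dropping the script into /etc/cron.daily (executable, no extension) is enough
on a stock Debian install; no crontab entry is needed.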
Re: HELP! Re: How to fix I/O errors? (SOLVED)
Marc Shapiro writes:
> BTW, what is your third partition, and why would you not separate it
> now if starting from scratch?

My third partition is for backups which I make to protect against software or
operator error. At one point it was on a separate disk since disks were
small, and without LVM it had to be a different partition/file system.

> One other question. If using raid, how do you know when a disk is
> starting to have trouble, as mine did? Since the whole purpose of
...
> Marc

Ok - I'm pretty paranoid about that. smart is checking. mdadm will notice if
a disk is bad and turn it off, so to speak. Again in the logs.

I run a cron job to check for smart errors based on:

smartctl -l error -q errorsonly "device"
smartctl -H -q errorsonly "device"

But I've always checked all my disks once a week. A root cron job reads the
whole disk with dd into /dev/null. Any errors get logged, of course.
Separately, a cron job scans syslog and syslog.1 grepping for "I/O error" and
informs me by email if any new errors are found. This catches errors in the
dd check but also actual errors in operation.
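[Marc Auslander's three checks above could be collected into one script along
these lines. This is a sketch, not his actual cron job; the device names, log
paths, and mail recipient are assumptions. The functions are defined but not
run at top level, since the disk read must be invoked deliberately as root.]

```shell
#!/bin/sh
# Sketch of the weekly disk checks described above.

# Read a whole disk into /dev/null; a bad sector surfaces as a kernel
# "I/O error" line in syslog. Run from root's crontab, e.g. weekly.
check_disk() {
    dd if="$1" of=/dev/null bs=1M
}

# SMART-side checks, as in the message: silent unless something is wrong.
smart_check() {
    smartctl -l error -q errorsonly "$1"
    smartctl -H -q errorsonly "$1"
}

# Scan log files for I/O errors; this catches both the errors provoked by
# the dd pass and any that happened in normal operation.
scan_logs() {
    grep -h "I/O error" "$@" 2>/dev/null
}

# Typical cron usage (not executed here):
#   for dev in /dev/sda /dev/sdb; do check_disk "$dev"; smart_check "$dev"; done
#   scan_logs /var/log/syslog /var/log/syslog.1 | mail -s "disk I/O errors" root
```

A refinement in the spirit of "informs me if any NEW errors are found" would
be to diff scan_logs output against the previous run before mailing.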
Re: HELP! Re: How to fix I/O errors? (SOLVED)
On 02/12/2017 08:30 AM, Marc Auslander wrote:
> I do not use LVM over raid 1. I think it can be made to work, although IIRC
> booting from an LVM over RAID partition has caused issues.

My boot partitions are separate. They are not under LVM.

> LVM is useful when space requirements are changing over time and the
> ability to add additional disks and grow logical partitions is needed. In
> my case, that isn't an issue. I have only a small number of partitions - 3
> because of history but starting from scratch, I'd only have two - root
> (including boot) and /home.

I started using LVM when I had a much smaller disk (40GB). With the current
1TB disk, even with three accounts on the box, and expanding several
partitions when moving to the new disk, I have still partitioned less than
half the disk and that is less than 1/3 used. So, no, LVM is probably not an
issue any more.

BTW, what is your third partition, and why would you not separate it now if
starting from scratch?

> I converted to mdadm raid as follows, IIRC.
>
> Install the second disk, and partition it the way I wanted.
> Create a one disk raid 1 partition in each of the new partitions.
> Take down my system, boot a live system from CD, and use a reliable copy
> program like rsync to copy each of the partitions' contents to the
> equivalent raid partition.
> Run grub to set the new disk as bootable. This is by far the trickiest part.
> Boot the new system and verify it's happy.
> Repartition the now spare disk to match the new one if necessary.
> You may need to zero the front of each partition with dd if=/dev/zero to
> avoid mdadm error checks.
> Add the partitions from that disk to the mdadm partitions and let mdadm do
> its thing.

On 02/12/2017 07:08 AM, Bob Weber wrote:
> I use raid 1 also for the redundancy it provides. If I need a backup I just
> connect a disk, grow each array and add it to the array (I have 3 arrays
> for /, /home and swap). It syncs up in a couple hours (depending on size of
> the array).
> If you have grub install itself on the added disk you have a bootable copy
> of your system (mdadm will complain about a degraded array). I then remove
> the drive and place it in another outbuilding in case of fire. You can even
> use an external USB disk housing for the drive to keep from shutting down
> the system. The sync is MUCH slower ... just come back the next day and you
> will have your backup. You then grow each array back to the number of disks
> you had before and all is happy again. Note that this single disk backup
> will only work with raid 1.

So, how do you do a complete restore from backup? Boot from just the single
backup drive and add additional drives as Marc Auslander describes, above?

One other question. If using raid, how do you know when a disk is starting to
have trouble, as mine did? Since the whole purpose of raid is to keep the
system up and running I wouldn't expect errors to pop up like I was getting.
Do you have to keep an eye on log files? Which ones? Or is there some other
way that mdadm provides notification of errors? I've got to admit, even
though I have been using Debian for 18 or 19 years (since Bo), log files have
never been my favorite thing. I generally only look at them when I have a
problem and someone on this list tells me what to look for and where.

Marc
Re: HELP! Re: How to fix I/O errors? (SOLVED)
Marc Shapiro writes:
> the past couple of weeks. AIUI you can use LVM over raid. Is there
> any actual advantage to this? I was trying to determine the
> advantages of using straight raid, straight LVM, or LVM over raid. If
> I decide, later, to use raid, how difficult is it to add to a currently
> running system (with, or without LVM)?
>
> Marc

I do not use LVM over raid 1. I think it can be made to work, although IIRC
booting from an LVM over RAID partition has caused issues.

LVM is useful when space requirements are changing over time and the ability
to add additional disks and grow logical partitions is needed. In my case,
that isn't an issue. I have only a small number of partitions - 3 because of
history but starting from scratch, I'd only have two - root (including boot)
and /home.

I converted to mdadm raid as follows, IIRC.

Install the second disk, and partition it the way I wanted.
Create a one disk raid 1 partition in each of the new partitions.
Take down my system, boot a live system from CD, and use a reliable copy
program like rsync to copy each of the partitions' contents to the equivalent
raid partition.
Run grub to set the new disk as bootable. This is by far the trickiest part.
Boot the new system and verify it's happy.
Repartition the now spare disk to match the new one if necessary.
You may need to zero the front of each partition with dd if=/dev/zero to
avoid mdadm error checks.
Add the partitions from that disk to the mdadm partitions and let mdadm do
its thing.
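[The steps above can be sketched as commands. This is an illustration only,
not Marc Auslander's exact procedure: the device names (/dev/sda old,
/dev/sdb new), array names, and filesystems are all assumptions, and creating
a degraded mirror with a "missing" slot is one common way to get his
"one disk raid 1 partition". Do not run any of this against disks you care
about without adapting it first.]

```shell
# Create one-disk (degraded) RAID 1 arrays on the new disk's partitions.
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/sdb2
mkfs.ext4 /dev/md0
mkfs.ext4 /dev/md1

# From a live CD: copy each filesystem onto its raid counterpart.
mount /dev/md0 /mnt/newroot
rsync -aHAX --numeric-ids /oldroot/ /mnt/newroot/

# Make the new disk bootable (the trickiest part, as noted above).
grub-install --boot-directory=/mnt/newroot/boot /dev/sdb

# Later, after booting the new system: wipe the front of each old partition
# so mdadm accepts it without complaint, then let the mirror rebuild.
dd if=/dev/zero of=/dev/sda1 bs=1M count=8
mdadm --add /dev/md0 /dev/sda1
```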
Re: HELP! Re: How to fix I/O errors? (SOLVED)
I use raid 1 also for the redundancy it provides. If I need a backup I just
connect a disk, grow each array and add it to the array (I have 3 arrays for
/, /home and swap). It syncs up in a couple hours (depending on size of the
array). If you have grub install itself on the added disk you have a bootable
copy of your system (mdadm will complain about a degraded array). I then
remove the drive and place it in another outbuilding in case of fire. You can
even use an external USB disk housing for the drive to keep from shutting
down the system. The sync is MUCH slower ... just come back the next day and
you will have your backup. You then grow each array back to the number of
disks you had before and all is happy again. Note that this single disk
backup will only work with raid 1.

...Bob

On 02/11/2017 10:42 PM, Marc Shapiro wrote:
> On 02/11/2017 05:22 PM, Marc Auslander wrote:
>> You didn't ask for advice so take it or ignore it.
>>
>> IMHO, in this day and age, there is no reason not to run raid 1. Two
>> disks, identically partitioned, each partition set up as a raid 1
>> partition with two copies.
>>
>> When a disk dies, you remove it from all the raid partitions, pop in a
>> new disk, partition it, add the new partitions back into the raid
>> partitions and raid rebuilds the copies.
>>
>> Except for taking the system down to replace the disk (assuming you
>> don't have a third installed as a spare) you just keep running as if
>> nothing has happened.
>
> I had been considering using raid 1 and I have not yet ruled it out
> entirely. I have never used raid and have been reading up on it over the
> past couple of weeks. AIUI you can use LVM over raid. Is there any actual
> advantage to this? I was trying to determine the advantages of using
> straight raid, straight LVM, or LVM over raid. If I decide, later, to use
> raid, how difficult is it to add to a currently running system (with, or
> without LVM)?
>
> Marc
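[Bob's grow-then-shrink backup trick translates to mdadm commands roughly as
follows. This is a sketch, not his actual commands: /dev/md0 and /dev/sdc1
(a partition on the temporarily attached backup disk) are assumed names, and
the same sequence would be repeated for each of the three arrays.]

```shell
# Grow the 2-disk mirror to three members and add the backup disk; md then
# syncs the full contents onto it (slower over USB, as noted above).
mdadm --grow /dev/md0 --raid-devices=3
mdadm --add /dev/md0 /dev/sdc1

# Watch the resync finish; afterwards, grub-install /dev/sdc makes the
# third disk a standalone bootable copy.
cat /proc/mdstat

# Detach the backup disk and shrink the array back to two members so the
# array is no longer considered degraded.
mdadm --fail /dev/md0 /dev/sdc1
mdadm --remove /dev/md0 /dev/sdc1
mdadm --grow /dev/md0 --raid-devices=2
```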
Re: HELP! Re: How to fix I/O errors? (SOLVED)
On 02/11/2017 05:22 PM, Marc Auslander wrote:
> You didn't ask for advice so take it or ignore it.
>
> IMHO, in this day and age, there is no reason not to run raid 1. Two disks,
> identically partitioned, each partition set up as a raid 1 partition with
> two copies.
>
> When a disk dies, you remove it from all the raid partitions, pop in a new
> disk, partition it, add the new partitions back into the raid partitions
> and raid rebuilds the copies.
>
> Except for taking the system down to replace the disk (assuming you don't
> have a third installed as a spare) you just keep running as if nothing has
> happened.

I had been considering using raid 1 and I have not yet ruled it out entirely.
I have never used raid and have been reading up on it over the past couple of
weeks. AIUI you can use LVM over raid. Is there any actual advantage to this?
I was trying to determine the advantages of using straight raid, straight
LVM, or LVM over raid. If I decide, later, to use raid, how difficult is it
to add to a currently running system (with, or without LVM)?

Marc
Re: HELP! Re: How to fix I/O errors? (SOLVED)
Marc Auslander composed on 2017-02-11 20:22 (UTC-0500):
> IMHO, in this day and age, there is no reason not to run raid 1.

Are you sure? Laptops have been outselling desktops for years.
--
"The wise are known for their understanding, and pleasant words are
persuasive." Proverbs 16:21 (New Living Translation)

Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata *** http://fm.no-ip.com/
Re: HELP! Re: How to fix I/O errors? (SOLVED)
You didn't ask for advice so take it or ignore it.

IMHO, in this day and age, there is no reason not to run raid 1. Two disks,
identically partitioned, each partition set up as a raid 1 partition with two
copies.

When a disk dies, you remove it from all the raid partitions, pop in a new
disk, partition it, add the new partitions back into the raid partitions and
raid rebuilds the copies.

Except for taking the system down to replace the disk (assuming you don't
have a third installed as a spare) you just keep running as if nothing has
happened.
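[The replacement procedure above looks roughly like this in mdadm terms. A
sketch only: it assumes the dying disk is /dev/sdb, the survivor /dev/sda,
one array /dev/md0, and an MBR partition table; the same fail/remove/add
sequence is repeated for each array.]

```shell
# Mark the failing disk's member faulty and pull it from the array.
mdadm --fail /dev/md0 /dev/sdb1
mdadm --remove /dev/md0 /dev/sdb1

# After swapping in the new disk: copy the partition layout from the
# surviving disk (sfdisk for MBR; sgdisk has an equivalent for GPT).
sfdisk -d /dev/sda | sfdisk /dev/sdb

# Add the fresh partition back; md rebuilds the mirror in the background.
mdadm --add /dev/md0 /dev/sdb1
cat /proc/mdstat   # watch the rebuild progress
```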
Re: HELP! Re: How to fix I/O errors? (SOLVED)
On 02/10/17 23:39, Marc Shapiro wrote:
> On 02/08/2017 05:32 PM, David Christensen wrote:
>> On 02/08/17 15:59, Marc Shapiro wrote:
>>> So how do I lay down a low level format on [the new 1 TB] drive?
>> I would use the SeaTools bootable CD to fill the drive with zeroes:
>> On 02/03/17 23:13, David Christensen wrote:
>>> Sometimes you get lucky and the tool is a live CD:
>>> www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO
> I didn't feel like burning a CD and it has been a long time since I had a
> box with a 3.5" floppy (although I do have one or two drives in a box
> somewhere and quite a few of the floppies, themselves, as well)

3.5" floppy? The link above is for a live CD.

> so I just used dd to write zeros to the disk. It took a while, but it did
> the job.

For a HDD, the effect should be the same.

> I partitioned the new disk with 3 physical partitions of 2GB each for
> root/boot partitions. ... The 4th partition was set up for LVM and was set
> as a Physical Volume (PV) to be added to the volume group along with my old
> drive.

The problem with putting everything on one big disk is that it becomes
impractical to clone the system image. I'm still climbing the disk imaging
learning curve, but it's a useful technique that has saved me countless
hours.

> In the end, I picked yet another method for moving to the new disk. ...

Congratulations on your success battling through it all, especially LVM.

David
Re: HELP! Re: How to fix I/O errors? (SOLVED)
On 02/08/2017 05:32 PM, David Christensen wrote:
> On 02/08/17 15:59, Marc Shapiro wrote:
>> So how do I lay down a low level format on [the new 1 TB] drive?
> I would use the SeaTools bootable CD to fill the drive with zeroes:
> On 02/03/17 23:13, David Christensen wrote:
>> Sometimes you get lucky and the tool is a live CD:
>>
>> www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO
>
> David

I didn't feel like burning a CD and it has been a long time since I had a box
with a 3.5" floppy (although I do have one or two drives in a box somewhere
and quite a few of the floppies, themselves, as well) so I just used dd to
write zeros to the disk. It took a while, but it did the job.

In the end, I picked yet another method for moving to the new disk. As
mentioned in my first post, I am using LVM and I have unused space in the VG.
I was debating with myself whether I wanted to continue to use LVM, or just
use raw disk partitions. I almost went with raw disk partitions before I came
across 'pvmove', which does exactly what I needed. So...

I partitioned the new disk with 3 physical partitions of 2GB each for
root/boot partitions. The 4th partition was set up for LVM and was set as a
Physical Volume (PV) to be added to the volume group along with my old drive.
Before adding the new disk, I created a new Logical Volume (LV) and manually
copied my home partition (one user tree at a time) to the new partition. This
spat out errors whenever it hit an unreadable sector, and I redirected those
errors to a file for later use. I then added the LVM partition from the new
disk to the Volume Group (VG) and did a 'pvmove' for each LV from the old PV
to the new PV. I included the original LV for /home, along with the newly
copied LV. I expected it to spit out errors and fail, but it didn't. I could
hear it struggle a bit when it hit the bad spots, but then it kept going.
This was actually a good thing.
I had the list of affected files from when I did the manual copy of the /home
partition, so I knew what to check after the move. Several of the files were
videos. Using the original files before copying, Xine would play up to the
first I/O error and then freeze, even though it continued to read the file
and advance the timeline until the file ended. Using the manually copied
file, which truncated at the first error, I also only got the beginning of
the video and then it ended. Using the file from the original LV which I
moved to the new disk with pvmove, however, gave better results. There is a
bit of flicker when it hits a sector that had been unreadable before moving,
but it continues on so the rest of the video can be viewed. A few of the
other files I did delete (Libre Office document files do not survive well,
but I have a PDF of that file if I ever need it again).

Then I just had to copy over the root/boot partitions, which I did from a
shell after booting my clonezilla CD (it came in handy after all), and run
lilo on them to make the new disk bootable.

Everything seems good, now. I ran the full test from SeaTools again, today,
just to verify that all was still good. It was. I now have an empty PV in my
LVM volume group that I will need to remove before I add any new Logical
Volumes (LVs), but I can do that any time. Since there are no LVs on it,
nothing will attempt to read from it, or write to it.

I'll keep an eye on the disk for a while, but this should fix the problem. If
I ever have a failing disk again I hope that I will remember this method,
because the LVM pvmove command really did make moving to another disk easy.
The hard part was dealing with the root/boot partitions and getting the new
disk bootable. Hopefully this thread will help someone else who has a similar
problem in the future.

Marc
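[For the archives, the pvmove migration described above boils down to a few
LVM commands. A sketch with assumed names (vg0 for the volume group, /dev/sda4
for the old PV, /dev/sdb4 for the new disk's LVM partition), not Marc's exact
invocation; the last two commands are the "remove the empty PV" step he
mentions deferring.]

```shell
# Make the new disk's big partition an LVM PV and add it to the group.
pvcreate /dev/sdb4
vgextend vg0 /dev/sdb4

# Move every allocated extent off the old (failing) PV onto the new one;
# the LVs stay online while this runs.
pvmove /dev/sda4 /dev/sdb4

# Once the old PV is empty, drop it from the group and retire the disk.
vgreduce vg0 /dev/sda4
pvremove /dev/sda4
```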