Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-13 Thread Mark Fletcher
On Sun, Feb 12, 2017 at 09:36:16PM -0500, Bob Weber wrote:
> I use a program called ossec.  It watches the logs of all my linux boxes, so I
> get email messages about disk problems.  I also do periodic self tests on all
> my drives, controlled by smartd from the smartmontools package.  I also use a
> package called logwatch, which summarizes my logs.  The messages from mdadm and
> smartd are seen by ossec.  When I mess with an array to make it larger and add
> a disk for backup, I get messages in my mailbox about a degraded array.  As I'm
> reading them I am startled until I remember ...Oh, I did that!  I have a daily
> cron job that emails the output of "smartctl -a /dev/sdx" for each drive on
> each machine so I can keep a history of the parameters for each drive.
> 

$ apt-file search ossec

sagan-rules: /etc/sagan-rules/ossec.rules

Seems like the only reference to ossec in jessie is this rules file in
the sagan package.  Looking at the description for sagan-rules, it seems
to be along the right lines.  But the sagan package itself seems to be
missing from jessie: it's in wheezy and in stretch/sid, but not in
jessie.  Any idea what's up with that?

And was ossec packaged, or did you build it from source?

Cheers

Mark



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Shapiro

On 02/12/2017 06:36 PM, Bob Weber wrote:


After writing this I wonder if I am overdoing it.  I just don't want to lose
data from a failing drive.  I lived through 3.5 inch floppies, which seemed to
always fail.  And tape drives that were painfully slow.  Not to mention, back in
the mid 70s, saving Z80 programs and data to audio cassette tapes at 1200 baud!
I was so glad to get my first 8 inch floppies working.

...Bob

I, too, remember the cassette tapes for saving files and programs on my
TRS-80 Model III.  I think I still have a few of those tapes (10-minute
tapes, each holding a single program) lying around.  The Radio Shack
cassette player has long since died, however.



Marc



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Bob Weber
On 02/12/2017 01:59 PM, Marc Shapiro wrote:
> On 02/12/2017 08:30 AM, Marc Auslander wrote:
>> I do not use LVM over raid 1.  I think it can be made to work,
>> although IIRC booting from an LVM over RAID partition has caused issues.
> my boot partitions are separate.  They are not under LVM.
>> LVM is useful when space requirements are changing over time and the
>> ability to add additional disks and grow logical partitions is needed.
>> In my case, that isn't an issue.  I have only a small number of
>> partitions - 3 because of history, but starting from scratch, I'd only
>> have two - root (including boot) and /home.
> I started using LVM when I had a much smaller disk (40GB).  With the current
> 1TB disk, even with three accounts on the box, and expanding several
> partitions when moving to the new disk, I have still partitioned less than
> half the disk and that is less than 1/3 used. So, no, LVM is probably not an
> issue any more.
>
> BTW, what is your third partition, and why would you not separate it now if
> starting from scratch?
>> I converted to mdadm raid as follows, IIRC.
>>
>> Install the second disk, and partition it the way I wanted.
>> Create a one-disk raid 1 partition in each of the new partitions.
>> Take down my system, boot a live system from CD, and use a reliable
>> copy program like rsync to copy each partition's contents to the
>> equivalent raid partition.
>> Run grub to set the new disk as bootable.  This is by far the
>> trickiest part.
>> Boot the new system and verify it's happy.
>> Repartition the now-spare disk to match the new one if necessary.
>> You may need to zero the front of each partition with dd if=/dev/zero
>> to avoid mdadm error checks.
>> Add the partitions from that disk to the mdadm partitions and let mdadm
>> do its thing.
>>
> On 02/12/2017 07:08 AM, Bob Weber wrote:
>>
>> I use raid 1 also for the redundancy it provides.  If I need a backup I just
>> connect a disk, grow each array and add it to the array (I have 3 arrays for
>> /, /home and swap).  It syncs up in a couple hours (depending on size of the
>> array).  If you have grub install itself on the added disk you have a
>> bootable copy of your system (mdadm will complain about a degraded array).  I
>> then remove the drive and place it in another outbuilding in case of fire. 
>> You can even use an external USB disk housing for the drive to keep from
>> shutting down the system.  The sync is MUCH slower ... just come back the
>> next day and you will have your backup.  You then grow each array back to the
>> number of disks you had before and all is happy again.  Note that this single
>> disk backup will only work with raid 1.
>>
> So, how do you do a complete restore from backup?  Boot from just the single
> backup drive and add additional drives as Marc Auslander describes, above?

Yes, that is what you would need to do if there were a complete failure in your
machine and you had to start over with, say, a new motherboard and power supply.

>
>
> One other question.  If using raid, how do you know when a disk is starting to
> have trouble, as mine did?  Since the whole purpose of raid is to keep the
> system up and running I wouldn't expect errors to pop up like I was getting. 
> Do you have to keep an eye on log files?  Which ones?  Or is there some other
> way that mdadm provides notification of errors?  I've got to admit, even
> though I have been using Debian for 18 or 19 years (since Bo), log files have
> never been my favorite thing.  I generally only look at them when I have a
> problem and someone on this list tells me what to look for and where.
>
> Marc
>
>
I use a program called ossec.  It watches the logs of all my linux boxes, so I get
email messages about disk problems.  I also do periodic self tests on all my
drives, controlled by smartd from the smartmontools package.  I also use a
package called logwatch, which summarizes my logs.  The messages from mdadm and
smartd are seen by ossec.  When I mess with an array to make it larger and add a
disk for backup, I get messages in my mailbox about a degraded array.  As I'm
reading them I am startled until I remember ...Oh, I did that!  I have a daily
cron job that emails the output of "smartctl -a /dev/sdx" for each drive on each
machine so I can keep a history of the parameters for each drive.
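A daily cron job along those lines might look something like this; it is only a
sketch, the drive names are placeholders, and it assumes smartmontools plus a
working local mail setup (the mail command) on each machine:

#!/bin/sh
# /etc/cron.daily/smart-report (name is arbitrary): mail the full SMART
# output for each drive so a history of the parameters is kept.
# /dev/sda and /dev/sdb are placeholders; list the real drives here.
for dev in /dev/sda /dev/sdb; do
    smartctl -a "$dev" | mail -s "SMART report: $(hostname) $dev" root
done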

I also use backuppc on a dedicated server to back up all my boxes.  That way I
can get back files I deleted by mistake, or go back to a previous version of a
file I modified.  I now have all my machines on raid 1.  My wife just recently
gave up on Win 10, with all those updates that just took over her machine when
Windows wanted to!  So now she is running Debian/KDE.

After writing this I wonder if I am overdoing it.  I just don't want to lose
data from a failing drive.  I lived through 3.5 inch floppies, which seemed to
always fail.  And tape drives that were painfully slow.  Not to mention, back in
the mid 70s, saving Z80 programs and data to audio cassette tapes

Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Auslander
Marc Shapiro  writes:

> BTW, what is your third partition, and why would you not separate it
> now if starting from scratch?
My third partition is for backups, which I make to protect against
software or operator error.  At one point it was on a separate disk,
since disks were small and without LVM it had to be a different
partition/file system.
>
>
> One other question.  If using raid, how do you know when a disk is
> starting to have trouble, as mine did?  Since the whole purpose of
...
> Marc

OK - I'm pretty paranoid about that.  SMART is checking.
mdadm will notice if a disk is bad and turn
it off, so to speak; again, that shows up in the logs.
I run a cron job to check for SMART errors based on:

smartctl -l error -q errorsonly "device"
smartctl -H -q errorsonly "device"

But I've always checked all my disks once a week.  A root cron job
reads the whole disk with dd into /dev/null.  Any errors get logged, of
course.  Separately, a cron job scans syslog and syslog.1, grepping for
"IO Error", and informs me by email if any new errors are found.  This
catches errors from the dd check but also actual errors in operation.
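Put together, that weekly check might look roughly like the script below.  It is
a sketch only: the device names are placeholders, the two cron jobs are combined
into one script for brevity, and it does not track which errors are new.

#!/bin/sh
# Weekly disk checks; /dev/sda and /dev/sdb are placeholders.
for dev in /dev/sda /dev/sdb; do
    # SMART error log and overall health, silent unless something is wrong.
    smartctl -l error -q errorsonly "$dev"
    smartctl -H -q errorsonly "$dev"
    # Read the whole disk; any bad sector shows up in the kernel log.
    dd if="$dev" of=/dev/null bs=1M 2>/dev/null
done

# Scan the current and rotated syslog for I/O errors and mail any hits.
hits=$(grep -iE "I/?O error" /var/log/syslog /var/log/syslog.1 2>/dev/null)
[ -n "$hits" ] && echo "$hits" | mail -s "disk I/O errors on $(hostname)" root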



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Shapiro

On 02/12/2017 08:30 AM, Marc Auslander wrote:

I do not use LVM over raid 1.  I think it can be made to work,
although IIRC booting from an LVM over RAID partition has caused issues.

my boot partitions are separate.  They are not under LVM.

LVM is useful when space requirements are changing over time and the
ability to add additional disks and grow logical partitions is needed.
In my case, that isn't an issue.  I have only a small number of
partitions - 3 because of history, but starting from scratch, I'd only
have two - root (including boot) and /home.
I started using LVM when I had a much smaller disk (40GB).  With the 
current 1TB disk, even with three accounts on the box, and expanding 
several partitions when moving to the new disk, I have still partitioned 
less than half the disk and that is less than 1/3 used. So, no, LVM is 
probably not an issue any more.


BTW, what is your third partition, and why would you not separate it now 
if starting from scratch?

I converted to mdadm raid as follows, IIRC.

Install the second disk, and partition it the way I wanted.
Create a one-disk raid 1 partition in each of the new partitions.
Take down my system, boot a live system from CD, and use a reliable
copy program like rsync to copy each partition's contents to the
equivalent raid partition.
Run grub to set the new disk as bootable.  This is by far the
trickiest part.
Boot the new system and verify it's happy.
Repartition the now-spare disk to match the new one if necessary.
You may need to zero the front of each partition with dd if=/dev/zero
to avoid mdadm error checks.
Add the partitions from that disk to the mdadm partitions and let mdadm
do its thing.


On 02/12/2017 07:08 AM, Bob Weber wrote:


I use raid 1 also for the redundancy it provides.  If I need a backup 
I just connect a disk, grow each array and add it to the array (I have 
3 arrays for /, /home and swap).  It syncs up in a couple hours 
(depending on size of the array).  If you have grub install itself on 
the added disk you have a bootable copy of your system (mdadm will 
complain about a degraded array).  I then remove the drive and place 
it in another outbuilding in case of fire.  You can even use an
external USB disk housing for the drive to keep from shutting down the
system.  The sync is MUCH slower ... just come back the next day and
you will have your backup.  You then grow each array back to the 
number of disks you had before and all is happy again.  Note that this 
single disk backup will only work with raid 1.


So, how do you do a complete restore from backup?  Boot from just the 
single backup drive and add additional drives as Marc Auslander 
describes, above?



One other question.  If using raid, how do you know when a disk is 
starting to have trouble, as mine did?  Since the whole purpose of raid 
is to keep the system up and running I wouldn't expect errors to pop up 
like I was getting.  Do you have to keep an eye on log files?  Which 
ones?  Or is there some other way that mdadm provides notification of 
errors?  I've got to admit, even though I have been using Debian for 18 
or 19 years (since Bo), log files have never been my favorite thing.  I 
generally only look at them when I have a problem and someone on this 
list tells me what to look for and where.


Marc



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Marc Auslander
Marc Shapiro  writes:

> the past couple of weeks.  AIUI you can use LVM over raid.  Is there
> any actual advantage to this?  I was trying to determine the
> advantages of using straight raid, straight LVM, or LVM over raid.  If
> I decide, later, to use raid, how difficult is it to add to a currently
> running system (with, or without LVM)?
>
>
> Marc
I do not use LVM over raid 1.  I think it can be made to work,
although IIRC booting from an LVM over RAID partition has caused issues.

LVM is useful when space requirements are changing over time and the
ability to add additional disks and grow logical partitions is needed.
In my case, that isn't an issue.  I have only a small number of
partitions - 3 because of history, but starting from scratch, I'd only
have two - root (including boot) and /home.

I converted to mdadm raid as follows, IIRC.

Install the second disk, and partition it the way I wanted.
Create a one-disk raid 1 partition in each of the new partitions.
Take down my system, boot a live system from CD, and use a reliable
copy program like rsync to copy each partition's contents to the
equivalent raid partition.
Run grub to set the new disk as bootable.  This is by far the
trickiest part.
Boot the new system and verify it's happy.
Repartition the now-spare disk to match the new one if necessary.
You may need to zero the front of each partition with dd if=/dev/zero
to avoid mdadm error checks.
Add the partitions from that disk to the mdadm partitions and let mdadm
do its thing.
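In commands, that sequence is roughly the following.  This is only a sketch:
/dev/sda is the old disk, /dev/sdb the new one, and both (along with the mount
points and filesystem type) are placeholders; the grub step in particular
depends heavily on the setup.

# Create one-disk (degraded) RAID 1 arrays on the new disk's partitions.
mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/sdb2

# From the live CD: make filesystems and copy each partition's contents.
mkfs.ext4 /dev/md0
mount /dev/sda1 /mnt/old && mount /dev/md0 /mnt/new
rsync -aAXH /mnt/old/ /mnt/new/

# Make the new disk bootable (the tricky part; details vary).
grub-install --boot-directory=/mnt/new/boot /dev/sdb

# Later: zero the start of each old partition and add it to its array.
dd if=/dev/zero of=/dev/sda1 bs=1M count=16
mdadm --add /dev/md0 /dev/sda1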



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-12 Thread Bob Weber
I use raid 1 also for the redundancy it provides.  If I need a backup I just
connect a disk, grow each array and add it to the array (I have 3 arrays for /,
/home and swap).  It syncs up in a couple hours (depending on size of the
array).  If you have grub install itself on the added disk you have a bootable
copy of your system (mdadm will complain about a degraded array).  I then remove
the drive and place it in another outbuilding in case of fire.  You can even use
an external USB disk housing for the drive to keep from shutting down the
system.  The sync is MUCH slower ... just come back the next day and you will
have your backup.  You then grow each array back to the number of disks you had
before and all is happy again.  Note that this single disk backup will only work
with raid 1.
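For one array, that sequence is roughly the following; /dev/md0 and the backup
disk's partition /dev/sdc1 are placeholders:

# Grow the mirror to three devices (mdadm will now report it degraded)
# and add the backup disk's partition so it syncs up.
mdadm --grow /dev/md0 --raid-devices=3
mdadm --add /dev/md0 /dev/sdc1
cat /proc/mdstat                # watch the resync

# Once synced, make the backup disk bootable, then detach its partition.
grub-install /dev/sdc
mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1

# Shrink the array back to two devices so it is no longer degraded.
mdadm --grow /dev/md0 --raid-devices=2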


...Bob
On 02/11/2017 10:42 PM, Marc Shapiro wrote:
> On 02/11/2017 05:22 PM, Marc Auslander wrote:
>> You didn't ask for advice so take it or ignore it.
>>
>> IMHO, in this day and age, there is no reason not to run raid 1.  Two
>> disks, identically partitioned, each partition set up as a raid 1
>> partition with two copies.
>>
>> When a disk dies, you remove it from all the raid partitions, pop in a
>> new disk, partition it,  add the new partitions back into the raid
>> partitions and raid rebuilds the copies.
>>
>> Except for taking the system down to replace the disk (assuming you
>> don't have a third installed as a spare) you just keep running as if
>> nothing has happened.
>>
> I had been considering using raid 1 and I have not yet ruled it out entirely. 
> I have never used raid and have been reading up on it over the past couple of
> weeks.  AIUI you can use LVM over raid.  Is there any actual advantage to
> this?  I was trying to determine the advantages of using straight raid,
> straight LVM, or LVM over raid.  If I decide, later, to use raid, how difficult
> is it to add to a currently running system (with, or without LVM)?
>
>
> Marc
>
>



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread Marc Shapiro

On 02/11/2017 05:22 PM, Marc Auslander wrote:

You didn't ask for advice so take it or ignore it.

IMHO, in this day and age, there is no reason not to run raid 1.  Two
disks, identically partitioned, each partition set up as a raid 1
partition with two copies.

When a disk dies, you remove it from all the raid partitions, pop in a
new disk, partition it,  add the new partitions back into the raid
partitions and raid rebuilds the copies.

Except for taking the system down to replace the disk (assuming you
don't have a third installed as a spare) you just keep running as if
nothing has happened.

I had been considering using raid 1 and I have not yet ruled it out 
entirely.  I have never used raid and have been reading up on it over 
the past couple of weeks.  AIUI you can use LVM over raid.  Is there any 
actual advantage to this?  I was trying to determine the advantages of 
using straight raid, straight LVM, or LVM over raid.  If I decide, 
later, to use raid, how difficult is it to add to a currently running
system (with, or without LVM)?



Marc



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread Felix Miata

Marc Auslander composed on 2017-02-11 20:22 (UTC-0500):


IMHO, in this day and age, there is no reason not to run raid 1.

Are you sure? Laptops have been outselling desktops for years.
--
"The wise are known for their understanding, and pleasant
words are persuasive." Proverbs 16:21 (New Living Translation)

 Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!

Felix Miata  ***  http://fm.no-ip.com/



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread Marc Auslander
You didn't ask for advice so take it or ignore it.

IMHO, in this day and age, there is no reason not to run raid 1.  Two
disks, identically partitioned, each partition set up as a raid 1
partition with two copies.

When a disk dies, you remove it from all the raid partitions, pop in a
new disk, partition it,  add the new partitions back into the raid
partitions and raid rebuilds the copies.

Except for taking the system down to replace the disk (assuming you
don't have a third installed as a spare) you just keep running as if
nothing has happened.
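In mdadm terms the replacement goes roughly like this, per array; the device
names are placeholders, and sfdisk assumes an MBR partition table (sgdisk would
be the GPT equivalent):

# Drop the dying disk's partition from the array.
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# After swapping in the new disk, copy the partition layout over from the
# surviving disk, then add the new partition back and let it rebuild.
sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm --add /dev/md0 /dev/sdb1
cat /proc/mdstat                # rebuild progress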



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-11 Thread David Christensen

On 02/10/17 23:39, Marc Shapiro wrote:

On 02/08/2017 05:32 PM, David Christensen wrote:

On 02/08/17 15:59, Marc Shapiro wrote:

So how do I lay down a low level format on [the new 1 TB] drive?

I would use the SeaTools bootable CD to fill the drive with zeroes:
On 02/03/17 23:13, David Christensen wrote:

Sometimes you get lucky and the tool is a live CD:

www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO

I didn't feel like burning a CD and it has been a long time since I had
a box with a 3.5" floppy (although I do have one or two drives in a box
somewhere and quite a few of the floppies, themselves, as well)


3.5" floppy?  The link above is for a live CD.


so I just used dd to write zeros to the disk.  It took a while, but it
did the job.

For a HDD, the effect should be the same.
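For reference, zero-filling a whole drive from a shell is a single (destructive)
dd invocation; /dev/sdX below is a placeholder and must be double-checked first:

# Overwrite the entire drive with zeroes.  This destroys everything on it,
# so be absolutely certain /dev/sdX is the right device.
dd if=/dev/zero of=/dev/sdX bs=1M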



I partitioned the new disk with 3 physical partitions of 2GB each for
root/boot partitions.  ...
The 4th partition was set up for LVM and was set as a Physical Volume
(PV) to be added to the volume group along with my old drive.


The problem with putting everything on one big disk is that it becomes 
impractical to clone the system image.  I'm still climbing the disk 
imaging learning curve, but it's a useful technique that has saved me 
countless hours.




In the end, I picked yet another method for moving to the new disk. ...


Congratulations on your success battling through it all, especially LVM.


David



Re: HELP! Re: How to fix I/O errors? (SOLVED)

2017-02-10 Thread Marc Shapiro

On 02/08/2017 05:32 PM, David Christensen wrote:

On 02/08/17 15:59, Marc Shapiro wrote:

So how do I lay down a low level format on [the new 1 TB] drive?


I would use the SeaTools bootable CD to fill the drive with zeroes:

On 02/03/17 23:13, David Christensen wrote:
> Sometimes you get lucky and the tool is a live CD:
>
> www.seagate.com/files/www-content/support-content/downloads/seatools/_shared/downloads/SeaToolsDOS223ALL.ISO



David

I didn't feel like burning a CD and it has been a long time since I had
a box with a 3.5" floppy (although I do have one or two drives in a box
somewhere and quite a few of the floppies, themselves, as well) so I
just used dd to write zeros to the disk.  It took a while, but it did
the job.  In the end, I picked yet another method for moving to the new
disk.  As mentioned in my first post, I am using LVM and I have unused
space in the VG.  I was debating with myself whether I wanted to continue
to use LVM, or just use raw disk partitions.  I almost went with raw
disk partitions before I came across 'pvmove', which does exactly what I
needed.  So...


I partitioned the new disk with 3 physical partitions of 2GB each for 
root/boot partitions.


The 4th partition was set up for LVM and was set as a Physical Volume 
(PV) to be added to the volume group along with my old drive.


Before adding the new disk, I created a new Logical Volume (LV) and 
manually copied my home partition (one user tree at a time) to the new 
partition.  This spat out errors whenever it hit an unreadable sector 
and I redirected those errors to a file for later use.


I then added the LVM partition from the new disk to the Volume Group
(VG) and did a 'pvmove' for each LV from the old PV to the new PV.
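In LVM commands, that step looks roughly like this; the volume group name vg0,
the old PV /dev/sda4, the new PV /dev/sdb4 and the LV names are all placeholders:

# Turn the new disk's LVM partition into a PV and add it to the volume group.
pvcreate /dev/sdb4
vgextend vg0 /dev/sdb4

# Move each logical volume's extents from the old PV to the new one.
pvmove -n root /dev/sda4 /dev/sdb4
pvmove -n home /dev/sda4 /dev/sdb4
pvmove -n swap /dev/sda4 /dev/sdb4

# Later, once the old PV is empty, it can be dropped from the VG entirely.
vgreduce vg0 /dev/sda4
pvremove /dev/sda4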


I included the original LV for /home, along with the newly copied LV.  I 
expected it to spit out errors and fail, but it didn't.  I could hear it 
struggle a bit when it hit the bad spots, but then it kept going.  This 
was actually a good thing.  I had the list of affected files from when I 
did the manual copy of the /home partition, so I knew what to check 
after the move.  Several of the files were videos.  Using the original 
files before copying, Xine would play up to the first I/O Error and then 
freeze, even though it continued to read the file and advance the 
timeline until the file ended.  Using the manually copied file, which 
truncated at the first error, I also only got the beginning of the video 
and then it ended.  Using the file from the original LV which I moved to 
the new disk with pvmove, however, gave better results.  There is a bit 
of flicker when it hits a sector that had been unreadable before moving, 
but it continues on, so the rest of the video can be viewed.  A few of
the other files I did delete (LibreOffice document files do not survive
well, but I have a PDF of that file if I ever need it again).


Then I just had to copy over the root/boot partitions which I did from a 
shell after booting my clonezilla CD (it came in handy after all) and 
run lilo on them to make the new disk bootable. Everything seems good, 
now.  I ran the full test from SeaTools again, today, just to
verify that all was still good.  It was.  I now have an empty PV in my 
LVM volume group that I will need to remove before I add any new Logical 
Volumes (LVs), but I can do that any time.  Since there are no LVs on it 
nothing will attempt to read from it, or write to it.
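The boot-disk step was the fiddly part; from a live-CD shell it comes down to
something roughly like this, where every device and mount-point name is a
placeholder and /etc/lilo.conf on the copied system is assumed to already refer
to the new layout:

# Copy one root/boot partition's contents to its new partition.
mkfs.ext4 /dev/sdb1
mount /dev/sda1 /mnt/old && mount /dev/sdb1 /mnt/new
rsync -aAXH /mnt/old/ /mnt/new/

# Run lilo from inside the copied system, pointing it at the new disk's MBR.
mount --bind /dev /mnt/new/dev
mount --bind /proc /mnt/new/proc
chroot /mnt/new lilo -b /dev/sdb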


I'll keep an eye on the disk for a while, but this should fix the 
problem.  If I ever have a failing disk again I hope that I will 
remember this method because the LVM pvmove command really did make 
moving to another disk easy.  The hard part was dealing with the 
root/boot partitions and getting the new disk bootable.


Hopefully this thread will help someone else who has a similar problem 
in the future.



Marc