Re: A Very Bad umount

2018-09-25 Thread Rick Thomas


On Sep 11, 2018, at 12:28 PM, Martin McCormick  wrote:

> /bin/rm: cannot remove
> '/var/cache/rsnapshot/halfday.1/wb5agz/home/usr/lib/i386-linux-gnu':
> Transport endpoint is not connected

In your rsnapshot.conf file, is “use_lazy_deletes” set to 1?  If so, the final 
delete part of “rsnapshot hourly” may still be going on when you do the unmount.
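
A quick way to check, as a minimal sketch only: the function below greps a
config file for that setting.  The default path /etc/rsnapshot.conf and the
function name lazy_deletes_enabled are assumptions made for illustration.

```shell
#!/bin/sh
# Succeeds (exit status 0) if use_lazy_deletes is set to 1 in the given
# rsnapshot config file.  Commented-out lines do not match because the
# pattern is anchored at the start of the line.
lazy_deletes_enabled() {
    conf="${1:-/etc/rsnapshot.conf}"   # assumed default location
    grep -Eq '^use_lazy_deletes[[:space:]]+1[[:space:]]*$' "$conf"
}

# Usage:
# if lazy_deletes_enabled; then echo "lazy deletes are on"; fi
```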

Just a thought…
Rick 


Re: A Very Bad umount

2018-09-25 Thread Martin McCormick
Étienne Mollier writes:
> 
> Good Day,
> 
> Not sure if that is the kind of answer you would wish to
> expect, but have you considered doing umounts sequentially?
> (optionally after synchronizing file systems)
> 
> sync
> umount /var/cache/rsnapshot
> umount /rsnapshot2
> umount /rsnapshot1
> 
> Each call is blocking, so it will help perhaps...

It has been exactly 2 weeks since I tried your suggestion
and there has not been one backup failure.  It may be too early
to celebrate but the sequential approach appears to have solved
the problem.

Thank you for your help.

Martin McCormick



Re: A Very Bad umount

2018-09-12 Thread Martin McCormick
Étienne Mollier writes:
> Good Day Gene,
> 
> Gene Heskett  2018-09-12T03:14 +0200 :

> 
> Should a badly placed “rm” command occur on the system, the
> system and both of its backup disks would be wiped clean.  I
> don't believe the risk mentioned above was related to disk
> decay.  It was more about minimizing the time frame when this
> catastrophe could happen.

Precisely. I don't want to leave them mounted since we
might get a power hit that would corrupt the drives causing all
the backups to go poof!

Bad stuff can even happen with good UPS's in line.

When I was a systems engineer at Oklahoma State
University, we had a weird chain of events one cold and bright
winter day just before Christmas in the mid-nineties.

A pigeon wandered into a power substation and pecked at
the wrong thing and received about 100 thousand volts through his
body, ruining his day for sure.  The power on the campus went
dark and our UPS's held until an auxiliary generator came on line
a few minutes later.  The problem was that it was running at the
wrong throttle setting, sending AC at near 50 Hz instead of 60 Hz
into our building.  The UPS's were older ferroresonant devices
and the frequency was far enough off that the UPS's stayed
switched to battery mode, something we were unaware of.

About half an hour later, the batteries ran all the way
down and those computers connected to them suddenly lost power
without a proper shut-down.

I think they came back okay but we were just lucky.

Nothing sounds more odd than hearing all the electric
motors and fans in the building revving up and slowing down as
the generator throttle was adjusted to 60 Hz.

The USB drives I am using for backups are SSD devices so
there are no moving parts.

Martin

> I wouldn't do both backups at the same time personally.  If
> something very wrong occurs to the system at backup time, I'd
> still have the secondary backup available for restore.
> 
> Things are a bit different when centralizing backup policies
> with tools like Amanda.
> 
> 
> > IMO the power savings from spinning down when not in active
> > use, do not compensate for the increased failure rate you'll
> > get under stop and start conditions.
> 
> Interesting opinion, it could be worth verifying.  Keeping a
> machine running for BOINC, I only had a disk issue once since
> the beginning of the decade.  Building disks has energy costs
> too indeed.
> 
> 
> Kind Regards,
> --
> Étienne Mollier 
> 
> 



Re: A Very Bad umount

2018-09-12 Thread Gene Heskett
On Wednesday 12 September 2018 13:12:43 Étienne Mollier wrote:

> Good Day Gene,
>
> Gene Heskett  2018-09-12T03:14 +0200 :
> > On Tuesday 11 September 2018 15:28:30 Martin McCormick wrote:
> >
> > [...]
> >
> > >Any constructive ideas are appreciated.  If I left
> > > the drives mounted all the time, there would be no spew but
> > > since these are backup drives, having them mounted all the
> > > time is quite risky.
> > >
> > > Martin McCormick WB5AGZ
> >
> > Why should you call that risky? I have been using amanda for
> > my backups with quite a menagerie of media since 1998. On 4
> > different boxes as I built newer, faster ones over the years.
>
> Should a badly placed “rm” command occur on the system, the
> system and both of its backup disks would be wiped clean.  I
> don't believe the risk mentioned above was related to disk
> decay.  It was more about minimizing the time frame when this
> catastrophe could happen.
>
> I wouldn't do both backups at the same time personally.  If
> something very wrong occurs to the system at backup time, I'd
> still have the secondary backup available for restore.
>
> Things are a bit different when centralizing backup policies
> with tools like Amanda.
>
> > IMO the power savings from spinning down when not in active
> > use, do not compensate for the increased failure rate you'll
> > get under stop and start conditions.
>
> Interesting opinion, it could be worth verifying.
> 
True, but one would actually have to have 2 identical systems, doing the 
same things, to prove it.  So it's difficult at best.

OTOH, someone like Google that runs thousands of machines 24/7 is in the 
opposite camp. They have machines that are only down for disk 
replacements, which they do in pallet quantities, at least for those 
machines facing the public. But just guessing, based on my own 
experience, I'd say they have records going back to the beginning of 
their search engine that would confirm to a high degree of certainty 
that letting drives spin 24/7 till they die is the most important factor 
in their longevity.

That drive I just pulled out at 77,000+ spinning hours had just under 
50 powerdowns while it was in this machine, which I built in 2007. My 
UPS used to shut things off at about 7 or 8 minutes, but it has now been 
3+ years since I had a 20 kW Generac with autostart and autotransfer put 
in (the missus has end-stage COPD and a prolonged failure would probably 
finish her). As far as this UPS is concerned, there has been only one 
powerdown since, as the power failures have been in 6-second territory: 
the startup and transfer delays. That leaves hardware failures, of which 
there have been none except for bad SATA cables, and the semi-annual 
shutdown, when the machine is wheeled out to the front deck for a dusting 
and cleaning with an 80 lb air hose.

The rest of the 1T drives in this machine except for the 2T I just 
installed, have 40,000+ hours of spin time on them.

> Keeping a 
> machine running for BOINC, I only had a disk issue once since
> the beginning of the decade.  Building disks has energy costs
> too indeed.

True, but that's hidden in what you pay for them at the gitten place. ;-)

> Kind Regards,

Back to you, Étienne Mollier.

-- 
Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)



Re: A Very Bad umount

2018-09-12 Thread Étienne Mollier
Good Day Gene,

Gene Heskett  2018-09-12T03:14 +0200 :
> On Tuesday 11 September 2018 15:28:30 Martin McCormick wrote:
>
> [...]
>
> >Any constructive ideas are appreciated.  If I left
> > the drives mounted all the time, there would be no spew but
> > since these are backup drives, having them mounted all the
> > time is quite risky.
> >
> > Martin McCormick WB5AGZ
>
> Why should you call that risky? I have been using amanda for
> my backups with quite a menagerie of media since 1998. On 4
> different boxes as I built newer, faster ones over the years.

Should a badly placed “rm” command occur on the system, the
system and both of its backup disks would be wiped clean.  I
don't believe the risk mentioned above was related to disk
decay.  It was more about minimizing the time frame when this
catastrophe could happen.

I wouldn't do both backups at the same time personally.  If
something very wrong occurs to the system at backup time, I'd
still have the secondary backup available for restore.

Things are a bit different when centralizing backup policies
with tools like Amanda.


> IMO the power savings from spinning down when not in active
> use, do not compensate for the increased failure rate you'll
> get under stop and start conditions.

Interesting opinion, it could be worth verifying.  Keeping a
machine running for BOINC, I only had a disk issue once since
the beginning of the decade.  Building disks has energy costs
too indeed.


Kind Regards,
-- 
Étienne Mollier 



Re: A Very Bad umount

2018-09-11 Thread Gene Heskett
On Tuesday 11 September 2018 15:28:30 Martin McCormick wrote:

[...]

>    Any constructive ideas are appreciated.  If I left the
> drives mounted all the time, there would be no spew but since
> these are backup drives, having them mounted all the time is
> quite risky.
>
> Martin McCormick WB5AGZ

Why should you call that risky? I have been using amanda for my backups 
with quite a menagerie of media since 1998. On 4 different boxes as I 
built newer, faster ones over the years.

Most recently like 2 weeks back, I retired the drive I had been using for 
virtual tapes for the last 77 thousand spinning hours in favor of a new 
one twice the size. It had 25 re-allocated sectors the first time I 
checked it, at about 3k spinning hours. It was still showing that same 
25 re-allocated sectors the day I pulled it out as it was not big 
enough, hovering at around 90% capacity for the last year.

Not once in all those mounted and spinning hours has it had even a hint 
of a problem, because it's mounted 24/7/365 minus a few seconds for my 
backup generator to start, which averages around 4x a year.

I used to worry about it, but HD's that aren't ever unmounted and spun 
down suffer far less damage from parking the heads on a still moving 
platter when the cushioning air film collapses as they stop, and 
dragging on that same platter for at least a turn during the spin up.

IMO the power savings from spinning down when not in active use, do not 
compensate for the increased failure rate you'll get under stop and 
start conditions.

-- 
Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)



Re: A Very Bad umount

2018-09-11 Thread Martin McCormick
Étienne Mollier writes:
> 
> Good Day,
> 
> Not sure if that is the kind of answer you would wish to
> expect, but have you considered doing umounts sequentially?
> (optionally after synchronizing file systems)
> 
> sync
> umount /var/cache/rsnapshot
> umount /rsnapshot2
> umount /rsnapshot1
> 
> Each call is blocking, so it will help perhaps...
> 
> Kind Regards,
> --
> Étienne Mollier 

Thank you.  I was thinking of trying that very thing next.

If things are quiet for a month or so, I'll think it worked.

Martin



Re: A Very Bad umount

2018-09-11 Thread Étienne Mollier



On 9/11/18 9:28 PM, Martin McCormick wrote:
> #Combine 2 256-GB drives in to 1 512 GB drive.
> 
> mount /rsnapshot1
> mount /rsnapshot2
> mhddfs /rsnapshot1,/rsnapshot2 /var/cache/rsnapshot -o mlimit=100M 
>
[...]
>   I have actually tried 
> 
> umount /rsnapshot2 /rsnapshot1 /var/cache/rsnapshot
> 
> as well as
> 
> umount /var/cache/rsnapshot /rsnapshot2 /rsnapshot1 .  I was
> thinking that the order might make a difference but have gotten
> as many good runs with either order.

Good Day,

Not sure if that is the kind of answer you would wish to
expect, but have you considered doing umounts sequentially?
(optionally after synchronizing file systems)

sync
umount /var/cache/rsnapshot
umount /rsnapshot2
umount /rsnapshot1

Each call is blocking, so it will help perhaps...
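
The same sequence could be wrapped in a small script, as a minimal sketch
only.  It stops at the first failure so that a member drive is not
unmounted while the mhddfs union on top of it is still mounted.  The
function name unmount_in_order and the UMOUNT and SYNC override variables
are made up for illustration (they allow a dry run); the mount points are
the ones quoted above.

```shell
#!/bin/sh
# UMOUNT and SYNC are hypothetical override hooks: set UMOUNT=echo to see
# what would be unmounted without touching anything.
UMOUNT="${UMOUNT:-umount}"
SYNC="${SYNC:-sync}"

unmount_in_order() {
    # Flush pending writes before touching any mount.
    $SYNC || return 1
    for mp in "$@"; do
        # Each umount call blocks until it returns; bail out at the first
        # failure instead of carrying on to the remaining mount points.
        if ! $UMOUNT "$mp"; then
            echo "umount failed for $mp" >&2
            return 1
        fi
    done
}

# Usage (as root), union first, then the member drives:
# unmount_in_order /var/cache/rsnapshot /rsnapshot2 /rsnapshot1
```

A non-zero return status means at least one file system is still mounted,
which the calling backup script can log or retry.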

Kind Regards,
-- 
Étienne Mollier