Quoting "Steven R. Gerber" <open...@gerber-systems.com>:

> On 3/24/2011 4:33 PM, richardtoo...@paradise.net.nz wrote:
> > Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> > 
> >> On 3/24/2011 2:36 PM, richardtoo...@paradise.net.nz wrote:
> >>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> >>>
> >>>> -------- Original Message --------
> >>>> Subject: Re: rdist times out but will not die
> >>>> Date: Thu, 24 Mar 2011 21:49:01 +1300
> >>>> From: Richard Toohey <richardtoo...@paradise.net.nz>
> >>>> To: Steven R. Gerber <sger...@gerber-systems.com>
> >>>> CC: t...@openbsd.org
> >>>>
> >>>> On 24/03/2011, at 4:06 PM, Steven R. Gerber wrote:
> >>>>
> >>>>> On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
> >>>>>> I want to do local/remote mirror/backup (or should that be
> >>>>>> local-mirror / offsite-backup).
> >>>>>> So a two-part question:
> >>>>>> 1.     Even if there is a timeout, shouldn't the job/process exit?
> >>>>>>
> >>>>>> ************************************************************
> >>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from rdist:operator to cdripper:operator
> >>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown from rdist:operator to root:operator
> >>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5: updating
> >>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso: installing
> >>>>>> rdist@thedump: LOCAL ERROR: Response time out
> >>>>>> rdist@thedump: updating of rdist@thedump finished
> >>>>>> $ ps -ax|grep rdist
> >>>>>> 26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
> >>>>>> 11059 ?? I 0:00.01 rdist -f /etc/Distfile
> >>>>>> 28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
> >>>>>> 7795 ?? I 1:10.32 ssh -l rdist thedump r
> >>>>>> 13045 p0 S+ 0:00.00 grep rdist
> >>>>>> ************************************************************
> >>>>>> 2.     I know that they happen from time to time. How can I
> >>>>>> avoid/prevent timeouts? The default is 900 sec AKA 15 min? How can
> >>>>>> this happen between two local machines?
> >>>>
> >>>> How big is the file?
> >>>
> >>> So, how big is the file that it times out on?
> >>>
> >>> More than 2Gb? Guess so if a movie file?
> >>>
> >>> I might be barking up the wrong tree, but it will take you two seconds
> >>> to see if there's anything in this > 2Gb idea and if I'm wrong, move on.
> >>>
> >>> Regardless of that, yes, put more debugging on - might give you some
> >>> more clues.
> >>>
> >>> OpenBSD helps those who help themselves.
> >> Richard,
> >> Thanks for the help.
> >> I had already read the IBM note 'LOCAL ERROR: response time out' (from
> >> 2006). (Google is not my enemy?)
> >> I had already checked: the file is >2GB (4.4GB).
> >> I ASSUMED that I can't be the only one who has tried to push large files
> >> with rdist. I searched the OpenBSD list archives (mine go back to 2006)
> >> and found nothing significant/useful. Maybe I missed something?
> >> I immediately moved to the misc list per your suggestion.
> >> I did a (manual) run of rdist with "-D" and got similar results -- I am
> >> still analyzing those messages.
> >> I usually do not compile OpenBSD, so it will take a while to review the
> >> rdist source code (client.c?).
> > 
> > Thanks ... never assume anything, eh? 8-)
> > 
> > If your files are > 2Gb, then that IBM link seems to be spot on, and answers
> > (maybe) number 2 on your list - why would you get a timeout on a local transfer
> > (if hardware related, you'd expect sftp to fail, or there to be other noticeable
> > issues)?
> > 
> > I've not used rdist before, but I don't mind having a look now that I know your
> > files are > 2Gb. But going to be a quiet (ha!) evening project, so no promises
> > (and maybe someone else will blow the theory out of the water and provide a
> > different answer/fix.)
> > 
> > The IBM note suggests that both client & server need to be amended, IF I am on
> > the right track.
> > 
> > This is all purely speculative on my part, but it does SEEM to match what you
> > are seeing, doesn't it?
> > 
> > Thanks.
> [SNIP]
> 
> You are right on it! Thanks!
> Not to be greedy, but ...
> What do you think of the other issue that rdist logs a "finished"
> message but does not exit?
> 
> Thanks.
> 
>  
More guessing (I'm already out on a limb, and the branch is about to break):
is "something" left unhappy because of the timeout?

What messages are in the debug output - do you see "finish() called" as per the
code in common.c below?  What's the rest of the message(s)?

What happens if you move all the > 2 GB files out of the way temporarily and
re-run (obviously I don't know how practical this is)?  Does it finish normally?
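
To make the > 2 GB hunch concrete: a signed 32-bit integer tops out at
2147483647 bytes, so a 4.4 GB size cannot be held in one. The sketch below only
shows that arithmetic. The helper names are made up by me, and I have NOT
confirmed that rdist stores or sends sizes this way.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from rdist): parse a decimal byte count
 * into a type that is at least 64 bits wide. */
long long parse_size_64(const char *s)
{
    return strtoll(s, NULL, 10);
}

/* Hypothetical helper (not from rdist): returns 1 if the size could be
 * stored in a signed 32-bit integer, 0 if it could not. */
int fits_in_int32(long long size)
{
    return size >= 0 && size <= INT32_MAX;
}
```

A 4400000000-byte file fails the check; anything up to 2147483647 bytes passes.
If sizes are handled in a 32-bit type anywhere on either side, the big movie
files are exactly the ones that would trip it.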

Or if that doesn't suit, how about creating a test directory containing 20
files (each < 2 GB), running rdist against it, then dropping one big file
(> 2 GB) in and re-running?  If it fails, then I'd say I'm on to something (I
don't know anything about rdist, so I do not know how to set up this test
environment).  Then remove the big file, or truncate it down to < 2 GB, and
re-run.  If that works, I get a cookie.
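
One cheap way to get test files of a chosen size, sketched under my own
assumptions: ftruncate() can extend an empty file to any length without
writing the data. make_test_file() and file_size() are names I invented for
illustration. The file reads back as zeros, and as far as I know a transfer
still sees the full length, which is all this test needs.

```c
#define _FILE_OFFSET_BITS 64  /* harmless where off_t is already 64-bit */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create (or reset) `path` so that it is exactly
 * `size` bytes long.  Returns 0 on success, -1 on failure. */
int make_test_file(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: size of `path` in bytes, or -1 if stat() fails. */
off_t file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return st.st_size;
}
```

For the test above, something like make_test_file("big.iso", (off_t)3 << 30)
would give a 3 GiB file, and calling it again on the same path with a size
under 2 GB shrinks it for the final re-run.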

common.c

    void
    finish(void)
    {
            extern jmp_buf finish_jmpbuf;

            debugmsg(DM_CALL,
                     "finish() called: do_fork = %d amchild = %d isserver = %d",
                     do_fork, amchild, isserver);
            cleanup(0);

            /*
             * There's no valid finish_jmpbuf for the rdist master parent.
             */
            if (!do_fork || amchild || isserver) {

                    if (!setjmp_ok) {
    #ifdef DEBUG_SETJMP
                            error("attemping longjmp() without target");
                            abort();
    #else
                            exit(1);
    #endif
                    }

                    longjmp(finish_jmpbuf, 1);
                    /*NOTREACHED*/
                    error("Unexpected failure of longjmp() in finish()");
                    exit(2);
            } else
                    exit(1);
    }
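
For anyone not used to setjmp()/longjmp(), here is how I read that pattern, as
a stripped-down sketch. Every name in it (finish_buf, target_ok, demo_finish,
run_with_finish_target and the two work functions) is a stand-in I made up; it
is not rdist code. The only point is that in a child or server, finish() does
not exit directly but hands control back to wherever setjmp() was called.

```c
#include <setjmp.h>

jmp_buf finish_buf;     /* stand-in for finish_jmpbuf */
int     target_ok = 0;  /* stand-in for setjmp_ok */

/* Stand-in for finish(): jump back to the saved point if there is one. */
void demo_finish(void)
{
    if (!target_ok)
        return;         /* the real finish() calls exit(1) here instead */
    longjmp(finish_buf, 1);
}

/* Save a jump target, then run `work`.  Returns 1 if control came back
 * through longjmp(), 0 if `work` simply returned. */
int run_with_finish_target(void (*work)(void))
{
    if (setjmp(finish_buf) != 0) {
        target_ok = 0;  /* arrived here via demo_finish() */
        return 1;
    }
    target_ok = 1;
    work();
    target_ok = 0;
    return 0;
}

/* Two example work functions for trying it out. */
void work_that_finishes(void) { demo_finish(); }
void work_that_returns(void)  { }
```

run_with_finish_target(work_that_finishes) comes back with 1 and
run_with_finish_target(work_that_returns) with 0. So whether the process
really exits after finish() depends on what the code at the setjmp() site does
next, which is where I would look in the debug output.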

Thanks.
