Quoting "Steven R. Gerber" <open...@gerber-systems.com>:

> On 3/24/2011 5:00 PM, richardtoo...@paradise.net.nz wrote:
> > Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> > 
> >> On 3/24/2011 4:33 PM, richardtoo...@paradise.net.nz wrote:
> >>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> >>>
> >>>> On 3/24/2011 2:36 PM, richardtoo...@paradise.net.nz wrote:
> >>>>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> >>>>>
> >>>>>> -------- Original Message --------
> >>>>>> Subject: Re: rdist times out but will not die
> >>>>>> Date: Thu, 24 Mar 2011 21:49:01 +1300
> >>>>>> From: Richard Toohey <richardtoo...@paradise.net.nz>
> >>>>>> To: Steven R. Gerber <sger...@gerber-systems.com>
> >>>>>> CC: t...@openbsd.org
> >>>>>>
> >>>>>> On 24/03/2011, at 4:06 PM, Steven R. Gerber wrote:
> >>>>>>
> >>>>>>> On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
> >>>>>>>> I want to do local/remote mirror/backup (or should that be
> >>>>>>>> local-mirror / offsite-backup).
> >>>>>>>> So a two-part question:
> >>>>>>>> 1.   Even if there is a timeout, shouldn't the job/process exit?
> >>>>>>>>
> >>>>>>>> ******************************************************************************
> >>>>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from rdist:operator to cdripper:operator
> >>>>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown from rdist:operator to root:operator
> >>>>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5: updating
> >>>>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso: installing
> >>>>>>>> rdist@thedump: LOCAL ERROR: Response time out
> >>>>>>>> rdist@thedump: updating of rdist@thedump finished
> >>>>>>>> $ ps -ax|grep rdist
> >>>>>>>> 26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
> >>>>>>>> 11059 ?? I 0:00.01 rdist -f /etc/Distfile
> >>>>>>>> 28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
> >>>>>>>> 7795 ?? I 1:10.32 ssh -l rdist thedump r
> >>>>>>>> 13045 p0 S+ 0:00.00 grep rdist
> >>>>>>>>
> >>>>>>>> ******************************************************************************
> >>>>>>>> 2.   I know that they happen from time to time. How can I
> >>>>>>>> avoid/prevent timeouts? The default is 900 sec AKA 15 min? How can
> >>>>>>>> this happen between two local machines?
> >>>>>>
> >>>>>> How big is the file?
> >>>>>
> >>>>> So, how big is the file that it times out on?
> >>>>>
> >>>>> More than 2Gb? Guess so if a movie file?
> >>>>>
> >>>>> I might be barking up the wrong tree, but it will take you two seconds
> >>>>> to see if there's anything in this > 2Gb idea and if I'm wrong, move on.
> >>>>>
> >>>>> Regardless of that, yes, put more debugging on - might give you some
> >>>>> more clues.
> >>>>>
> >>>>> OpenBSD helps those who help themselves.
> >>>> Richard,
> >>>> Thanks for the help.
> >>>> I had already read the IBM note 'LOCAL ERROR: response time out' (from
> >>>> 2006). (Google is not my enemy?)
> >>>> I had already checked: the file is >2GB (4.4GB).
> >>>> I ASSUMED that I can't be the only one who has tried to push large
> >>>> files with rdist. I searched the OpenBSD list archives (mine go back
> >>>> to 2006) and found nothing significant/useful. Maybe I missed something?
> >>>> I immediately moved to the misc list per your suggestion.
> >>>> I did a (manual) run of rdist with "-D" and got similar results -- I am
> >>>> still analyzing those messages.
> >>>> I usually do not compile OpenBSD, so it will take a while to review the
> >>>> rdist source code (client.c?).
> >>>
> >>> Thanks ... never assume anything, eh? 8-)
> >>>
> >>> If your files are > 2Gb, then that IBM link seems to be spot on, and
> >>> answers (maybe) number 2 on your list - why would you get a timeout on a
> >>> local transfer (if hardware related, you'd expect sftp to fail, or there
> >>> to be other noticeable issues)?
> >>>
> >>> I've not used rdist before, but I don't mind having a look now that I
> >>> know your files are > 2Gb. But going to be a quiet (ha!) evening project,
> >>> so no promises (and maybe someone else will blow the theory out of the
> >>> water and provide a different answer/fix.)
> >>>
> >>> The IBM note suggests that both client & server need to be amended, IF I
> >>> am on the right track.
> >>>
> >>> This is all purely speculative on my part, but it does SEEM to match what
> >>> you are seeing, doesn't it?
> >>>
> >>> Thanks.
> >> [SNIP]
> >>
> >> You are right on it! Thanks!
> >> Not to be greedy, but ...
> >> What do you think of the other issue that rdist logs a "finished"
> >> message but does not exit?
> >>
> >> Thanks.
> >>
> >> 
> > More guessing (I'm already out on a limb ... the branch is about to
> > break) ...
> > "something" is unhappy because of the time out?
> >
> > What messages are in the debug output - do you see "finish() called" as
> > per the code in common.c below? What's the rest of the message(s)?
> >
> > What happens if you move all the > 2Gb files out the way temporarily and
> > re-run (obviously I don't know how practical this is)? Does it finish
> > normally?
> >
> > Or if that doesn't suit, how about creating a test directory with 20 (<2 Gb
> > each) files in, run it, then drop a big file (>2 Gb) in, re-run. If it
> > fails, then I'd say I was on to something (I don't know anything about
> > rdist, so I do not know how to set up this test environment.) Remove the
> > big file, or truncate it down to < 2Gb and re-run. If that works, I get a
> > cookie.
> > 
> > common.c
> > 
> > void
> > finish(void)
> > {
> >         extern jmp_buf finish_jmpbuf;
> >
> >         debugmsg(DM_CALL,
> >                  "finish() called: do_fork = %d amchild = %d isserver = %d",
> >                  do_fork, amchild, isserver);
> >         cleanup(0);
> >
> >         /*
> >          * There's no valid finish_jmpbuf for the rdist master parent.
> >          */
> >         if (!do_fork || amchild || isserver) {
> >
> >                 if (!setjmp_ok) {
> > #ifdef DEBUG_SETJMP
> >                         error("attemping longjmp() without target");
> >                         abort();
> > #else
> >                         exit(1);
> > #endif
> >                 }
> >
> >                 longjmp(finish_jmpbuf, 1);
> >                 /*NOTREACHED*/
> >                 error("Unexpected failure of longjmp() in finish()");
> >                 exit(2);
> >         } else
> >                 exit(1);
> > }
> > 
> > Thanks.
> > 
> > 
> > 
> 
> I am getting the "finish() called" etc.
> I now have a theory (your "something" unhappy guess): rdist times out,
> but the child process does not and is still trying to get the
> end-of-file. The child is basically in an infinite loop: it does not
> time out because the dump does respond but it keeps retrieving from the
> first part of file -- it never reaches past the miscalculated size.
> 
>  

My diffs will no doubt get mangled by my webmail and I don't know enough about
rdist (or the rdist protocol) to know if these are correct.

Hopefully they are a step in the right direction.

Basic idea from https://www-304.ibm.com/support/docview.wss?uid=isg1IY85396

(I was going to look at FreeBSD's version for inspiration but looks like they
ditched rdist in 2003.)

Basically strtol to strtoll, %ld to %lld, and (int)/(long) to (off_t) to cope
with files bigger than 2GB.

Works for me on i386: without these patches I see the reported behaviour; with
the patches the 4GB test file is transferred.
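
To make the 2GB limit concrete, here is a minimal sketch (not rdist code) of
what a 32-bit long does to the size of my test file. The helper names are made
up for illustration, and int32_t stands in for long on i386 so the result is
the same on a 64-bit machine.

```c
#include <stdint.h>
#include <stdlib.h>

/* Size of zerofile.tst: dd bs=1k count=4700000 -> 4700000 * 1024 bytes. */
#define TEST_FILE_SIZE INT64_C(4812800000)

/*
 * Hypothetical helper: emulate "(long) stb->st_size" where long is
 * 32 bits wide. Only the low 32 bits of the size survive.
 */
int32_t
size_as_long32(int64_t size)
{
	uint32_t low = (uint32_t)((uint64_t)size & UINT32_C(0xffffffff));

	/* Reinterpret the low word as a two's-complement signed value. */
	if (low > (uint32_t)INT32_MAX)
		return (int32_t)(low - UINT32_C(0x80000000)) - INT32_MAX - 1;
	return (int32_t)low;
}

/*
 * Hypothetical helper: emulate strtol() with a 32-bit long. strtol()
 * does not wrap on overflow, it clamps to LONG_MAX or LONG_MIN.
 */
int32_t
parse_as_long32(const char *s)
{
	long long v = strtoll(s, NULL, 10);

	if (v > INT32_MAX)
		return INT32_MAX;
	if (v < INT32_MIN)
		return INT32_MIN;
	return (int32_t)v;
}
```

So size_as_long32(TEST_FILE_SIZE) comes out as 517832704, and
parse_as_long32("4812800000") as 2147483647. Either way the two ends would
presumably disagree about how many bytes to expect, which fits the time out.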

With patches - it works:

$ cat rdist.conf                                                         
HOSTS = (172.16.1.111)
FILES = (/home/richard.toohey/rdist-test)
${FILES} -> ${HOSTS}

$ rdist -f rdist.conf  
172.16.1.111: updating host 172.16.1.111
richard.toohey@172.16.1.111's password: 
172.16.1.111: /home/richard.toohey/rdist-test/zerofile.tst: installing
172.16.1.111: updating of 172.16.1.111 finished

zerofile.tst created with:

dd if=/dev/zero of=zerofile.tst bs=1k count=4700000

HTH.

/usr/src/usr.bin/rdist/client.c
===============================

# diff -uw /home/richard.toohey/obsd-src/usr.bin/rdist/client.c client.c 
--- /home/richard.toohey/obsd-src/usr.bin/rdist/client.c        Thu Oct 29 17:34:06 2009
+++ client.c    Fri Mar 25 14:54:32 2011
@@ -399,8 +399,8 @@
         */
        ENCODE(ername, rname);
 
-       (void) sendcmd(C_RECVREG, "%o %04o %ld %ld %ld %s %s %s", 
-                      opts, stb->st_mode & 07777, (long) stb->st_size, 
+       (void) sendcmd(C_RECVREG, "%o %04o %lld %ld %ld %s %s %s", 
+                      opts, stb->st_mode & 07777, (off_t) stb->st_size, 
                       stb->st_mtime, stb->st_atime,
                       user, group, ername);
        if (response() < 0) {
@@ -409,8 +409,8 @@
        }
 
 
-       debugmsg(DM_MISC, "Send file '%s' %ld bytes\n", rname,
-                (long) stb->st_size);
+       debugmsg(DM_MISC, "Send file '%s' %lld bytes\n", rname,
+                (off_t) stb->st_size);
 
        /*
         * Set remote time out alarm handler.
@@ -666,8 +666,8 @@
         * Gather and send basic link info
         */
        ENCODE(ername, rname);
-       (void) sendcmd(C_RECVSYMLINK, "%o %04o %ld %ld %ld %s %s %s", 
-                      opts, stb->st_mode & 07777, (long) stb->st_size, 
+       (void) sendcmd(C_RECVSYMLINK, "%o %04o %lld %ld %ld %s %s %s", 
+                      opts, stb->st_mode & 07777, (off_t) stb->st_size, 
                       stb->st_mtime, stb->st_atime,
                       user, group, ername);
        if (response() < 0)
@@ -682,7 +682,7 @@
                error("%s: readlink failed", target);
                err();
        }
-       (void) snprintf(tbuf, sizeof(tbuf), "%.*s", (int) stb->st_size, lbuf);
+       (void) snprintf(tbuf, sizeof(tbuf), "%.*s", (off_t) stb->st_size, lbuf);
        ENCODE(ername, tbuf);
        (void) sendcmd(C_NONE, "%s\n", ername);
 
@@ -869,7 +869,7 @@
        /*
         * Parse size
         */
-       size = (off_t) strtol(cp, (char **)&cp, 10);
+       size = (off_t) strtoll(cp, (char **)&cp, 10);
        if (*cp++ != ' ') {
                error("update: size not delimited");
                return(US_NOTHING);
@@ -921,8 +921,8 @@
 
        debugmsg(DM_MISC, "update(%s,) local mode %04o remote mode %04o\n", 
                 rname, lmode, rmode);
-       debugmsg(DM_MISC, "update(%s,) size %ld mtime %d owner '%s' grp '%s'\n",
-                rname, (long) size, mtime, owner, group);
+       debugmsg(DM_MISC, "update(%s,) size %lld mtime %d owner '%s' grp '%s'\n",
+                rname, (off_t) size, mtime, owner, group);
 
        if (statp->st_mtime != mtime) {
                if (statp->st_mtime < mtime && IS_ON(opts, DO_YOUNGER)) {
@@ -935,8 +935,8 @@
        }
 
        if (statp->st_size != size) {
-               debugmsg(DM_MISC, "size does not match (%ld != %ld).\n",
-                        (long) statp->st_size, (long) size);
+               debugmsg(DM_MISC, "size does not match (%lld != %lld).\n",
+                        (off_t) statp->st_size, (off_t) size);
                return(US_OUTDATE);
        } 
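
One thing I am not sure about in the client.c diff: as far as I know the '*'
precision in "%.*s" is read as an int, so the cast in the readlink hunk
probably wants to stay (int), and %lld strictly wants a long long. A small
sketch of both conversions, separate from rdist (the function names are mine,
purely for illustration):

```c
/* Ask for a 64-bit off_t on 32-bit glibc; harmless where off_t is already 64-bit. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format a file size as decimal text. */
int
format_size(char *buf, size_t buflen, off_t size)
{
	/* %lld expects long long, so cast explicitly instead of relying on off_t. */
	return snprintf(buf, buflen, "%lld", (long long)size);
}

/* Hypothetical helper: copy at most len bytes of a link target into buf. */
int
format_link(char *buf, size_t buflen, const char *target, off_t len)
{
	/* The '*' precision is fetched as an int, so pass an int. */
	return snprintf(buf, buflen, "%.*s", (int)len, target);
}
```

A symlink target is short, so narrowing its length to int should lose nothing.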

/usr/src/usr.bin/rdistd/server.c
================================
# diff -uw /home/richard.toohey/obsd-src/usr.bin/rdistd/server.c server.c 
--- /home/richard.toohey/obsd-src/usr.bin/rdistd/server.c       Thu Oct 29 17:34:06 2009
+++ server.c    Fri Mar 25 14:49:18 2011
@@ -391,7 +391,7 @@
 #else
        /*
         * We use MT_NOTICE instead of MT_CHANGE because this function is
-        * sometimes called by other functions that are suppose to return a
+        * sometimes called by other functions that are supposed to return a
         * single ack() back to the client (rdist).  This is a kludge until
         * the Rdist protocol is re-done.  Sigh.
         */
@@ -656,8 +656,8 @@
        case S_IFIFO:
 #endif
 #endif
-               (void) sendcmd(QC_YES, "%ld %ld %o %s %s",
-                              (long) stb.st_size, stb.st_mtime,
+               (void) sendcmd(QC_YES, "%lld %ld %o %s %s",
+                              (off_t) stb.st_size, stb.st_mtime,
                               stb.st_mode & 07777,
                               getusername(stb.st_uid, target, options), 
                               getgroupname(stb.st_gid, target, options));
@@ -1420,7 +1420,7 @@
        /*
         * Get file size
         */
-       size = strtol(cp, &cp, 10);
+       size = strtoll(cp, &cp, 10);
        if (*cp++ != ' ') {
                error("recvit: size not delimited");
                return;
@@ -1523,7 +1523,7 @@
         */
        if (min_freespace || min_freefiles) {
                /* Convert file size to kilobytes */
-               long fsize = (long) (size / 1024);
+               off_t fsize = (off_t) (size / 1024);
 
                if (getfilesysinfo(target, &freespace, &freefiles) != 0)
                        return;
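
For the receiving side, a sketch of what a slightly more defensive version of
the size parsing could look like. This is not what the diff above does (it
only swaps strtol for strtoll). parse_size_field and size_in_kb are
hypothetical names, and I am assuming the field is a decimal number followed
by a single space, as the "size not delimited" check suggests.

```c
/* Ask for a 64-bit off_t on 32-bit glibc; harmless where off_t is already 64-bit. */
#define _FILE_OFFSET_BITS 64

#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: parse a decimal size terminated by a space.
 * Returns 0 and stores the size and the position after the space,
 * or -1 if the field is empty, negative, out of range or not delimited.
 */
int
parse_size_field(const char *cp, off_t *size, const char **next)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(cp, &end, 10);
	if (end == cp || errno == ERANGE || v < 0)
		return -1;
	if (*end != ' ')
		return -1;
	*size = (off_t)v;
	*next = end + 1;
	return 0;
}

/* Convert a size in bytes to kilobytes without narrowing to long. */
off_t
size_in_kb(off_t size)
{
	return size / 1024;
}
```

Checking errno catches a size that does not even fit in a long long, which
plain strtoll would otherwise clamp silently.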

Thanks.
