Re: system/6586: rdist (file larger than 2GB) times out but will not die -- Testers needed
Hi folks.

Current rdist will time out with files >2GB, log as finished, but will not die.
The bug (system/6586) was originally noted by IBM (AIX) in 2006:
https://www-304.ibm.com/support/docview.wss?uid=isg1IY85396

I have patches for the client rdist and server rdistd.
I have tested i386 and amd64, in both directions.
Please continue this. Testing on alpha would be especially welcomed.

Thanks to everyone in advance.
Steven

client.c
I did check into the comparison at line 689.
Basically, it is ASSUMED that link files (not the actual files) will be tiny.
The only attributes returned from an lstat() that refer to the symbolic link itself are the file type (S_IFLNK), size, blocks, and link count (always 1).
That code is safe FOR NOW ... IF the (meta)data in the link grows a lot THEN it could be a problem.

This should be a good state.
1. FIXED bug of filesize >2GB -- calculations and messages
2. FIXED similar in minimum freespace (and free files)
3. verified/fixed system write (and read) calls
4. TODO improve buffering

i386 -> i386 OK
i386 -> i386 OK
amd64 -> i386 OK
amd64 -> i386 OK
i386 -> amd64 OK
i386 -> amd64 OK
i386 -> macppc OK richardtoo...@paradise.net.nz
i386 -> amd64 OK richardtoo...@paradise.net.nz
macppc -> amd64 OK richardtoo...@paradise.net.nz
amd64 -> i386 OK richardtoo...@paradise.net.nz

diff -uw /usr/src/usr.bin/rdist/Makefile rdist/Makefile
--- /usr/src/usr.bin/rdist/Makefile Sun Jan 4 21:55:28 2004
+++ rdist/Makefile Mon Mar 28 22:03:24 2011
@@ -3,6 +3,7 @@
 PROG= rdist
 CFLAGS+=-I. -I${.CURDIR} -DOS_H=\"os-openbsd.h\"
+#CFLAGS+=-Wall -pedantic
 SRCS= gram.y child.c client.c common.c distopt.c docmd.c expand.c \
     isexec.c lookup.c message.c rdist.c
 CLEANFILES+=gram.c y.tab.h
diff -uw /usr/src/usr.bin/rdist/child.c rdist/child.c
--- /usr/src/usr.bin/rdist/child.c Thu Oct 29 00:34:05 2009
+++ rdist/child.c Sun Mar 27 16:36:19 2011
@@ -177,7 +177,7 @@
 readchild(CHILD *child)
 {
     char rbuf[BUFSIZ];
-    int amt;
+    ssize_t amt;
     debugmsg(DM_CALL, "[readchild(%s, %d, %d) start]",
         child->c_name, child->c_pid, child->c_readfd);
@@ -196,7 +196,7 @@
      */
     while ((amt = read(child->c_readfd, rbuf, sizeof(rbuf))) > 0) {
         /* XXX remove these debug calls */
-        debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %d bytes]",
+        debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %ld bytes]",
             child->c_name, child->c_pid, child->c_readfd, amt);
         (void) xwrite(fileno(stdout), rbuf, amt);
@@ -205,7 +205,7 @@
             child->c_name, child->c_pid, child->c_readfd);
     }
-    debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %d errno = %d\n",
+    debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %ld errno = %d\n",
         child->c_name, child->c_pid, child->c_readfd, amt, errno);
     /*
diff -uw /usr/src/usr.bin/rdist/client.c rdist/client.c
--- /usr/src/usr.bin/rdist/client.c Thu Oct 29 00:34:06 2009
+++ rdist/client.c Sun Mar 27 16:05:15 2011
@@ -399,8 +399,8 @@
      */
     ENCODE(ername, rname);
-    (void) sendcmd(C_RECVREG, "%o %04o %ld %ld %ld %s %s %s",
-        opts, stb->st_mode & 0, (long) stb->st_size,
+    (void) sendcmd(C_RECVREG, "%o %04o %lld %ld %ld %s %s %s",
+        opts, stb->st_mode & 0, stb->st_size,
         stb->st_mtime, stb->st_atime, user, group, ername);
     if (response() < 0) {
@@ -409,8 +409,8 @@
     }
-    debugmsg(DM_MISC, "Send file '%s' %ld bytes\n", rname,
-        (long) stb->st_size);
+    debugmsg(DM_MISC, "Send file '%s' %lld bytes\n", rname,
+        stb->st_size);
     /*
      * Set remote time out alarm handler.
@@ -666,8 +666,8 @@
      * Gather and send basic link info
      */
     ENCODE(ername, rname);
-    (void) sendcmd(C_RECVSYMLINK, "%o %04o %ld %ld %ld %s %s %s",
-        opts, stb->st_mode & 0, (long) stb->st_size,
+    (void) sendcmd(C_RECVSYMLINK, "%o %04o %lld %ld %ld %s %s %s",
+        opts, stb->st_mode & 0, stb->st_size,
         stb->st_mtime, stb->st_atime, user, group, ername);
     if (response() < 0)
@@ -869,7 +869,7 @@
     /*
      * Parse size
      */
-    size = (off_t) strtol(cp, (char **)&cp, 10);
+    size = (off_t) strtoll(cp, (char **)&cp, 10);
     if (*cp++ != ' ') {
         error("update: size n
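The "Parse size" hunk in the patch above swaps strtol() for strtoll(), and the sendcmd() hunks drop the (long) cast on st_size. Below is a minimal sketch of why both matter on a platform where long is 32 bits (such as i386). The helper names parse_size, wrap_to_32bit and clamp_to_32bit are hypothetical and are not part of rdist; the 32-bit behavior is simulated with int32_t so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not rdist code): parse a decimal size field that
 * must be followed by a single space. Uses strtoll() so values above
 * 2GB survive. Returns 0 on success, -1 on a malformed or
 * out-of-range field.
 */
int
parse_size(const char *s, int64_t *out, const char **rest)
{
	char *end;
	long long v;

	assert(s != NULL && out != NULL && rest != NULL);
	errno = 0;
	v = strtoll(s, &end, 10);
	if (end == s || errno == ERANGE || v < 0 || *end != ' ')
		return -1;
	*out = (int64_t)v;
	*rest = end + 1;	/* skip the delimiter */
	return 0;
}

/*
 * Simulates a "(long) size" cast where long is 32 bits: only the low
 * 32 bits are kept (the usual two's complement result).
 */
int32_t
wrap_to_32bit(int64_t v)
{
	uint32_t u = (uint32_t)v;	/* conversion to unsigned is modulo 2^32 */

	if (u <= (uint32_t)INT32_MAX)
		return (int32_t)u;
	return (int32_t)((int64_t)u - 4294967296LL);
}

/*
 * Simulates strtol() where long is 32 bits: out-of-range input is
 * clamped to LONG_MAX or LONG_MIN (the real call also sets ERANGE).
 */
int32_t
clamp_to_32bit(int64_t v)
{
	if (v > INT32_MAX)
		return INT32_MAX;
	if (v < INT32_MIN)
		return INT32_MIN;
	return (int32_t)v;
}
```

For a 3000000000-byte file, wrap_to_32bit() yields a negative size and clamp_to_32bit() yields 2147483647, so in this simulation a 32-bit long cannot carry the real size on either the sending or the receiving side.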
system/6586: rdist (file larger than 2GB) times out but will not die -- Testers needed
Hi folks.

Current rdist will time out with files >2GB, log as finished, but will not die.
The bug (system/6586) was originally noted by IBM (AIX) in 2006:
https://www-304.ibm.com/support/docview.wss?uid=isg1IY85396

I have patches for the client rdist and server rdistd.
I have tested i386 and amd64, in both directions.
Please continue this. Testing on alpha would be especially welcomed.

Thanks to everyone in advance.
Steven

client.c
I did check into the comparison at line 689.
Basically, it is ASSUMED that link files (not the actual files) will be tiny.
The only attributes returned from an lstat() that refer to the symbolic link itself are the file type (S_IFLNK), size, blocks, and link count (always 1).
That code is safe FOR NOW ... IF the (meta)data in the link grows a lot THEN it could be a problem.

This should be a good state.
1. FIXED bug of filesize >2GB -- calculations and messages
2. FIXED similar in minimum freespace (and free files)
3. verified/fixed system write (and read) calls
4. TODO improve buffering

i386 -> i386 OK
i386 -> i386 OK
amd64 -> i386 OK
amd64 -> i386 OK
i386 -> amd64 OK
i386 -> amd64 OK

diff -uw /usr/src/usr.bin/rdist/Makefile rdist/Makefile
--- /usr/src/usr.bin/rdist/Makefile Sun Jan 4 21:55:28 2004
+++ rdist/Makefile Mon Mar 28 22:03:24 2011
@@ -3,6 +3,7 @@
 PROG= rdist
 CFLAGS+=-I. -I${.CURDIR} -DOS_H=\"os-openbsd.h\"
+#CFLAGS+=-Wall -pedantic
 SRCS= gram.y child.c client.c common.c distopt.c docmd.c expand.c \
     isexec.c lookup.c message.c rdist.c
 CLEANFILES+=gram.c y.tab.h
diff -uw /usr/src/usr.bin/rdist/child.c rdist/child.c
--- /usr/src/usr.bin/rdist/child.c Thu Oct 29 00:34:05 2009
+++ rdist/child.c Sun Mar 27 16:36:19 2011
@@ -177,7 +177,7 @@
 readchild(CHILD *child)
 {
     char rbuf[BUFSIZ];
-    int amt;
+    ssize_t amt;
     debugmsg(DM_CALL, "[readchild(%s, %d, %d) start]",
         child->c_name, child->c_pid, child->c_readfd);
@@ -196,7 +196,7 @@
      */
     while ((amt = read(child->c_readfd, rbuf, sizeof(rbuf))) > 0) {
         /* XXX remove these debug calls */
-        debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %d bytes]",
+        debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %ld bytes]",
             child->c_name, child->c_pid, child->c_readfd, amt);
         (void) xwrite(fileno(stdout), rbuf, amt);
@@ -205,7 +205,7 @@
             child->c_name, child->c_pid, child->c_readfd);
     }
-    debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %d errno = %d\n",
+    debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %ld errno = %d\n",
         child->c_name, child->c_pid, child->c_readfd, amt, errno);
     /*
diff -uw /usr/src/usr.bin/rdist/client.c rdist/client.c
--- /usr/src/usr.bin/rdist/client.c Thu Oct 29 00:34:06 2009
+++ rdist/client.c Sun Mar 27 16:05:15 2011
@@ -399,8 +399,8 @@
      */
     ENCODE(ername, rname);
-    (void) sendcmd(C_RECVREG, "%o %04o %ld %ld %ld %s %s %s",
-        opts, stb->st_mode & 0, (long) stb->st_size,
+    (void) sendcmd(C_RECVREG, "%o %04o %lld %ld %ld %s %s %s",
+        opts, stb->st_mode & 0, stb->st_size,
         stb->st_mtime, stb->st_atime, user, group, ername);
     if (response() < 0) {
@@ -409,8 +409,8 @@
     }
-    debugmsg(DM_MISC, "Send file '%s' %ld bytes\n", rname,
-        (long) stb->st_size);
+    debugmsg(DM_MISC, "Send file '%s' %lld bytes\n", rname,
+        stb->st_size);
     /*
      * Set remote time out alarm handler.
@@ -666,8 +666,8 @@
      * Gather and send basic link info
      */
     ENCODE(ername, rname);
-    (void) sendcmd(C_RECVSYMLINK, "%o %04o %ld %ld %ld %s %s %s",
-        opts, stb->st_mode & 0, (long) stb->st_size,
+    (void) sendcmd(C_RECVSYMLINK, "%o %04o %lld %ld %ld %s %s %s",
+        opts, stb->st_mode & 0, stb->st_size,
         stb->st_mtime, stb->st_atime, user, group, ername);
     if (response() < 0)
@@ -869,7 +869,7 @@
     /*
      * Parse size
      */
-    size = (off_t) strtol(cp, (char **)&cp, 10);
+    size = (off_t) strtoll(cp, (char **)&cp, 10);
     if (*cp++ != ' ') {
         error("update: size not delimited");
         return(US_NOTHING);
@@ -878,7 +878,7 @@
     /*
      * Parse mtime
      */
-    mtime = strtol(cp, (char **)&cp, 10);
+    mtime = (time_t) strtol(cp, (char **)&cp, 10);
     if (*cp++ != ' ') {
         error("update: mti
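The patch above changes the format strings to %lld for st_size (an off_t) and %ld for amt (now an ssize_t), passing the values without a cast. That matches the argument types where off_t is long long and ssize_t is long. Below is a minimal sketch of a variant that does not depend on those typedefs, assuming a C99 printf family. The helper names format_send_msg and format_read_msg are hypothetical and are not rdist functions.

```c
/* Request a 64-bit off_t on glibc 32-bit targets; harmless elsewhere. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not rdist code): format a "Send file" message.
 * The explicit (long long) cast keeps the argument in step with %lld
 * whatever the platform's underlying type for off_t is.
 */
int
format_send_msg(char *buf, size_t buflen, const char *name, off_t size)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, buflen, "Send file '%s' %lld bytes",
	    name, (long long)size);
}

/*
 * Hypothetical helper: report a read(2) result. %zd is the C99
 * conversion for the signed counterpart of size_t, which is what
 * ssize_t is on POSIX systems.
 */
int
format_read_msg(char *buf, size_t buflen, ssize_t amt)
{
	assert(buf != NULL);
	return snprintf(buf, buflen, "got %zd bytes", amt);
}
```

Calling format_send_msg() with a 3000000000-byte size writes the full ten-digit value into the buffer, which is the case the old "%ld" with a (long) cast could not represent where long is 32 bits.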
Upgrade i386 to amd64
Ran the upgrade from CD. Want to be sure that packages are OK. Is "pkg_add -u" sufficient? (It looks like nothing changed.) Thanks, Steven
Re: rdist times out but will not die
On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
> I want to do local/remote mirror/backup (or should that be local-mirror
> / offsite-backup).
> So a two-part question:
> 1.Even if there is a timeout, shouldn't the job/process exit?
> **
> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from
> rdist:operator to cdripper:operator
> rdist@thedump: thedump:
> /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown
> from rdist:operator to root:operator
> rdist@thedump:
> /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5:
> updating
> rdist@thedump:
> /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso:
> installing
> rdist@thedump: LOCAL ERROR: Response time out
> rdist@thedump: updating of rdist@thedump finished
> $ ps -ax|grep rdist
> 26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
> 11059 ?? I 0:00.01 rdist -f /etc/Distfile
> 28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
> 7795 ?? I 1:10.32 ssh -l rdist thedump r
> 13045 p0 S+ 0:00.00 grep rdist
> **
> 2.I know that they happen from time to time. How can I avoid/prevent
> timeouts? The default is 900 sec AKA 15 min? How can this happen
> between two local machines?
>
> Thanks.
>

Sorry to reply to myself, but I really need help with this.
The movies always time out via rdist.
If I transfer the movies myself via sftp then there are no timeouts.
The processes continue to accumulate every day unless I manually kill them.
I know that I am missing something.
Should I edit /etc/daily to turn on debugging?

Please/Thanks.
rdist times out but will not die
I want to do local/remote mirror/backup (or should that be local-mirror / offsite-backup).
So a two-part question:

1. Even if there is a timeout, shouldn't the job/process exit?
**
rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from rdist:operator to cdripper:operator
rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown from rdist:operator to root:operator
rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5: updating
rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso: installing
rdist@thedump: LOCAL ERROR: Response time out
rdist@thedump: updating of rdist@thedump finished
$ ps -ax|grep rdist
26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
11059 ?? I 0:00.01 rdist -f /etc/Distfile
28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
7795 ?? I 1:10.32 ssh -l rdist thedump r
13045 p0 S+ 0:00.00 grep rdist
**

2. I know that they happen from time to time. How can I avoid/prevent timeouts? The default is 900 sec AKA 15 min? How can this happen between two local machines?

Thanks.
Re: OpenBSD 4.8 RAID 0+1 or 1+0 or 5
On 2/16/2011 10:50 AM, Joel Sing wrote:
> On Wednesday 16 February 2011, Steven R. Gerber wrote:
>> Sorry for cross posting?
>> I am trying to set up a decent RAID (0+1 or 1+0 or 5) and there SEEMS to
>> be no approved method. (4 disks -- I usually like stripe on top of
>> mirrors.)
>> I believe that I have done my homework.
>> What are my options?
>>
>> softraid (bioctl) cannot handle stripe on mirrors:
>> I can easily create 2 mirrors and they survive reboot.
>> I can create stripe on those mirrors (works -- can create files), but it
>> does not survive reboot.
>
> Define "does not survive reboot". I'm guessing that you probably mean "fails
> to automatically reassemble at boot", which is accurate - we do not currently
> probe volumes that we have just assembled. Things should just work if you
> manually assemble it after the mirrors are available. Note that this is not a
> supported configuration, however it does seem to work - YMMV.
>
>> Message is device not configured.
>>
>> Both ccd and RAIDframe are deprecated (FAQ 14.13):
>>> Software Options
>>> OpenBSD supports softraid(4), a framework supporting many kinds of I/O
>>
>> transformations, including RAID and encryption disciplines. Softraid(4)
>> is managed using bioctl(8).
>>
>>> OpenBSD also includes RAIDframe (raid(4), requires a custom kernel),
>>
>> and ccd(4) as historic ways of implementing RAID, but at this point
>> OpenBSD does not suggest implementing either as a RAID solution for new
>> installs or reinstalls.
>> "OpenBSD does not suggest implementing either"
>> Also, RAIDframe requires a custom kernel and we all know that GENERIC is
>> preferred.
>>
>> RAID 5 is experimental (man bioctl):
>>> CAVEATS
>>> Use of the CRYPTO & RAID 4/5 disciplines are currently considered
>>> experimental.
>>>
>>> OpenBSD 4.9 December 22, 2010
>>
>> OpenBSD 4.9
>>
>> Also, bioctl would not let me create a RAID 5 set:
>> # bioctl -iv softraid0
>> # bioctl -c 5 -l /dev/sd1a,/dev/sd2a,/dev/sd3a,/dev/sd4a softraid0
>> bioctl: BIOCCREATERAID: Invalid argument
>> # bioctl -iv softraid0
>> # dmesg|tail
>> sd11 at scsibus6 targ 0 lun 0: SCSI2 0/direct
>> fixed
>> sd11: 3815436MB, 512 bytes/sec, 7814014721 sec total
>> sd11 detached
>> scsibus6 detached
>> sd10 detached
>> scsibus5 detached
>> sd9 detached
>> scsibus4 detached
>> softraid0: not part of the same volume
>> softraid0: can't attach metadata type 0
>
> You previously had a RAID 0 volume on some or all of these partitions, hence
> the "not part of the same volume" and "can't attach metadata type 0" messages
> (softraid is refusing to make members of a RAID 0 volume into a RAID 5
> volume). Either wipe the first 1MB or so of each partition (dd if=/dev/zero
> of=/dev/rsd1a bs=1m count=1, etc) or use 'bioctl -C force ... '.
> --
>
> "Reason is not automatic. Those who deny it cannot be conquered by it.
> Do not count on them. Leave them alone." -- Ayn Rand
>

"-C force" still fails (BUG!)
I had to manually clear sd1...sd4
Now, I have an EXPERIMENTAL RAID 5 volume. Not the worst.
***
sd9 at scsibus4 targ 0 lun 0: SCSI2 0/direct fixed
sd9: 5723178MB, 512 bytes/sec, 11721070081 sec total
***

But, EXPERIMENTAL RAID 5 is dangerous (Marco Peereboom).
OpenBSD softraid fully supports only RAID 0 (stripe) and RAID 1 (mirror).
RAID 0 provides NO redundancy (not really RAID).
RAID 1 is a waste beyond 2 disks.
I want/need to use 4 (or more) disks.
A real RAID (array) requires RAID 0+1 or RAID 1+0 or RAID 5 ...
A custom kernel with RAIDframe is starting to look good.
Still waiting for the next step ...

Thanks,
Steven
Re: OpenBSD 4.8 RAID 0+1 or 1+0 or 5
On 2/15/2011 5:52 PM, Marco Peereboom wrote:
> it isn't supported so don't do it. it is in the pipeline to do stacked
> raid sets but it is all talk for now.
>
> On Tue, Feb 15, 2011 at 02:45:13PM -0500, Steven R. Gerber wrote:
>> Sorry for cross posting?
>> I am trying to set up a decent RAID (0+1 or 1+0 or 5) and there SEEMS to
>> be no approved method. (4 disks -- I usually like stripe on top of
>> mirrors.)
>> I believe that I have done my homework.
>> What are my options?
[SNIP]
>>
>> Both ccd and RAIDframe are deprecated (FAQ 14.13):
[SNIP]
>> RAID 5 is experimental (man bioctl):
[SNIP]
>> Thanks,
>> Steven

Understood.
I need/want to use 4 drives.
But, RAID 5 is still experimental, right?
Please, give me some guidance.
Should I just fall back to ccd?
Should I try to debug my setup re. softraid RAID 5?

Thanks,
Steven
OpenBSD 4.8 RAID 0+1 or 1+0 or 5
Sorry for cross posting?
I am trying to set up a decent RAID (0+1 or 1+0 or 5) and there SEEMS to be no approved method. (4 disks -- I usually like stripe on top of mirrors.)
I believe that I have done my homework.
What are my options?

softraid (bioctl) cannot handle stripe on mirrors:
I can easily create 2 mirrors and they survive reboot.
I can create stripe on those mirrors (works -- can create files), but it does not survive reboot.
Message is device not configured.

Both ccd and RAIDframe are deprecated (FAQ 14.13):
> Software Options
> OpenBSD supports softraid(4), a framework supporting many kinds of I/O transformations, including RAID and encryption disciplines. Softraid(4) is managed using bioctl(8).
>
> OpenBSD also includes RAIDframe (raid(4), requires a custom kernel), and ccd(4) as historic ways of implementing RAID, but at this point OpenBSD does not suggest implementing either as a RAID solution for new installs or reinstalls.
"OpenBSD does not suggest implementing either"
Also, RAIDframe requires a custom kernel and we all know that GENERIC is preferred.

RAID 5 is experimental (man bioctl):
> CAVEATS
> Use of the CRYPTO & RAID 4/5 disciplines are currently considered
> experimental.
>
> OpenBSD 4.9 December 22, 2010 OpenBSD 4.9
>

Also, bioctl would not let me create a RAID 5 set:
# bioctl -iv softraid0
# bioctl -c 5 -l /dev/sd1a,/dev/sd2a,/dev/sd3a,/dev/sd4a softraid0
bioctl: BIOCCREATERAID: Invalid argument
# bioctl -iv softraid0
# dmesg|tail
sd11 at scsibus6 targ 0 lun 0: SCSI2 0/direct fixed
sd11: 3815436MB, 512 bytes/sec, 7814014721 sec total
sd11 detached
scsibus6 detached
sd10 detached
scsibus5 detached
sd9 detached
scsibus4 detached
softraid0: not part of the same volume
softraid0: can't attach metadata type 0

Thanks,
Steven