Re: system/6586: rdist (file larger than 2GB) times out but will not die -- Testers needed

2011-04-09 Steven R. Gerber
Hi folks.
The current rdist times out on files larger than 2GB and logs the update as
finished, but does not die.
The bug (system/6586) was originally noted by IBM (AIX) in 2006:
https://www-304.ibm.com/support/docview.wss?uid=isg1IY85396
I have patches for the client rdist and server rdistd.
I have tested i386 and amd64 in both directions.  Please continue the testing.
Testing on alpha would be especially welcomed.
Thanks to everyone in advance.

Steven

client.c
I checked the comparison at line 689. It ASSUMES that symbolic links
themselves (not the files they point to) will be tiny.
The only attributes returned by lstat() that refer to the symbolic
link itself are the file type (S_IFLNK), size, blocks, and link count
(always 1). For a symlink, that size is the length of the target
pathname, which in practice is bounded by PATH_MAX.
That code is safe FOR NOW, but IF the (meta)data in the link grows a
lot, THEN it could become a problem.
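A quick way to see why link sizes stay small: for a symbolic link, POSIX defines the st_size reported by lstat() as the length of the pathname stored in the link, not the size of the file it points to. A minimal sketch, not rdist code; symlink_size(), demo_symlink_size() and the /tmp location are illustrative assumptions:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return st_size as reported by lstat() for the symbolic link 'path',
 * or -1 if 'path' cannot be examined or is not a symlink.  For a symlink
 * this is the length of the target pathname (no terminating NUL), not
 * the size of the file the link points to.
 */
static long long
symlink_size(const char *path)
{
	struct stat sb;

	if (lstat(path, &sb) == -1 || !S_ISLNK(sb.st_mode))
		return -1;
	return (long long)sb.st_size;
}

/*
 * Demo: create a temporary symlink to 'target' (which need not exist),
 * measure it with symlink_size(), clean up, and return the measurement
 * (or -1 on failure).
 */
static long long
demo_symlink_size(const char *target)
{
	char dir[] = "/tmp/lnkdemoXXXXXX";
	char link[64];
	long long size = -1;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(link, sizeof(link), "%s/lnk", dir);
	if (symlink(target, link) == 0) {
		size = symlink_size(link);
		(void)unlink(link);
	}
	(void)rmdir(dir);
	return size;
}
```

demo_symlink_size("/no/such/file.iso") returns 17, the length of that path, whether or not the target exists or how large it would be.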

The patch should now be in a good state:
1. FIXED the file size >2GB bug (calculations and messages)
2. FIXED similar problems in the minimum free space (and free files) handling
3. VERIFIED/FIXED the write() (and read()) system calls
4. TODO: improve buffering
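Why the boundary is 2GB: the old code passed (long) stb->st_size, and on i386 long is 32 bits, so no size of 2^31 bytes or more can survive the cast. A sketch of what such a cast typically yields. size_after_32bit_cast() is a hypothetical name, and the exact result of an out-of-range signed conversion is implementation-defined in C:

```c
#include <stdint.h>

/*
 * Illustration only: what a 64-bit file size becomes when it is squeezed
 * into a 32-bit signed integer, as with a (long) cast on i386.  The
 * conversion is implementation-defined in C; common two's-complement
 * compilers keep the low 32 bits.
 */
static int32_t
size_after_32bit_cast(int64_t size)
{
	return (int32_t)size;
}
```

On common compilers a 3GB size (3221225472 bytes) comes out as -1073741824, and exactly 2GB becomes INT32_MIN.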

i386   -> i386    OK
i386   -> i386    OK
amd64  -> i386    OK
amd64  -> i386    OK
i386   -> amd64   OK
i386   -> amd64   OK

i386   -> macppc  OK  richardtoo...@paradise.net.nz
i386   -> amd64   OK  richardtoo...@paradise.net.nz
macppc -> amd64   OK  richardtoo...@paradise.net.nz
amd64  -> i386    OK  richardtoo...@paradise.net.nz


diff -uw /usr/src/usr.bin/rdist/Makefile rdist/Makefile
--- /usr/src/usr.bin/rdist/Makefile Sun Jan  4 21:55:28 2004
+++ rdist/Makefile  Mon Mar 28 22:03:24 2011
@@ -3,6 +3,7 @@

 PROG=  rdist
 CFLAGS+=-I. -I${.CURDIR} -DOS_H=\"os-openbsd.h\"
+#CFLAGS+=-Wall -pedantic
 SRCS=  gram.y child.c client.c common.c distopt.c docmd.c expand.c \
isexec.c lookup.c message.c rdist.c
 CLEANFILES+=gram.c y.tab.h
diff -uw /usr/src/usr.bin/rdist/child.c rdist/child.c
--- /usr/src/usr.bin/rdist/child.c  Thu Oct 29 00:34:05 2009
+++ rdist/child.c   Sun Mar 27 16:36:19 2011
@@ -177,7 +177,7 @@
 readchild(CHILD *child)
 {
char rbuf[BUFSIZ];
-   int amt;
+   ssize_t amt;

debugmsg(DM_CALL, "[readchild(%s, %d, %d) start]",
 child->c_name, child->c_pid, child->c_readfd);
@@ -196,7 +196,7 @@
 */
while ((amt = read(child->c_readfd, rbuf, sizeof(rbuf))) > 0) {
/* XXX remove these debug calls */
-   debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %d bytes]",
+   debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %ld bytes]",
 child->c_name, child->c_pid, child->c_readfd, amt);

(void) xwrite(fileno(stdout), rbuf, amt);
@@ -205,7 +205,7 @@
 child->c_name, child->c_pid, child->c_readfd);
}

-   debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %d errno = %d\n",
+   debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %ld errno = %d\n",
 child->c_name, child->c_pid, child->c_readfd, amt, errno);

/*
diff -uw /usr/src/usr.bin/rdist/client.c rdist/client.c
--- /usr/src/usr.bin/rdist/client.c Thu Oct 29 00:34:06 2009
+++ rdist/client.c  Sun Mar 27 16:05:15 2011
@@ -399,8 +399,8 @@
 */
ENCODE(ername, rname);

-   (void) sendcmd(C_RECVREG, "%o %04o %ld %ld %ld %s %s %s",
-  opts, stb->st_mode & 0, (long) stb->st_size,
+   (void) sendcmd(C_RECVREG, "%o %04o %lld %ld %ld %s %s %s",
+   opts, stb->st_mode & 0, stb->st_size,
   stb->st_mtime, stb->st_atime,
   user, group, ername);
if (response() < 0) {
@@ -409,8 +409,8 @@
}


-   debugmsg(DM_MISC, "Send file '%s' %ld bytes\n", rname,
-(long) stb->st_size);
+   debugmsg(DM_MISC, "Send file '%s' %lld bytes\n", rname,
+   stb->st_size);

/*
 * Set remote time out alarm handler.
@@ -666,8 +666,8 @@
 * Gather and send basic link info
 */
ENCODE(ername, rname);
-   (void) sendcmd(C_RECVSYMLINK, "%o %04o %ld %ld %ld %s %s %s",
-  opts, stb->st_mode & 0, (long) stb->st_size,
+   (void) sendcmd(C_RECVSYMLINK, "%o %04o %lld %ld %ld %s %s %s",
+   opts, stb->st_mode & 0, stb->st_size,
   stb->st_mtime, stb->st_atime,
   user, group, ername);
if (response() < 0)
@@ -869,7 +869,7 @@
/*
 * Parse size
 */
-   size = (off_t) strtol(cp, (char **)&cp, 10);
+   size = (off_t) strtoll(cp, (char **)&cp, 10);
if (*cp++ != ' ') {
error("update: size n
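The new "%lld" formats match as long as off_t is long long, and the child.c hunks likewise rely on ssize_t being long for "%ld". A slightly more defensive spelling, offered only as a suggestion and not part of the patch, is to cast to long long explicitly and to use C99's %zd for ssize_t. format_size() and format_count() are hypothetical helpers:

```c
#define _FILE_OFFSET_BITS 64	/* 64-bit off_t on 32-bit glibc; harmless elsewhere */
#include <stdio.h>
#include <sys/types.h>

/*
 * "%lld" is only guaranteed to match an argument of type long long, so
 * the off_t is cast explicitly instead of relying on off_t happening to
 * be long long.  Returns what snprintf() returns.
 */
static int
format_size(char *buf, size_t len, off_t size)
{
	return snprintf(buf, len, "%lld", (long long)size);
}

/* C99 "%zd" prints a ssize_t, such as a read() result, on the usual platforms. */
static int
format_count(char *buf, size_t len, ssize_t amt)
{
	return snprintf(buf, len, "got %zd bytes", amt);
}
```

Round-tripping the formatted size through sscanf(), as a receiver would, gives back 3221225472 unchanged.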

system/6586: rdist (file larger than 2GB) times out but will not die -- Testers needed

2011-04-07 Steven R. Gerber
Hi folks.
The current rdist times out on files larger than 2GB and logs the update as
finished, but does not die.
The bug (system/6586) was originally noted by IBM (AIX) in 2006:
https://www-304.ibm.com/support/docview.wss?uid=isg1IY85396
I have patches for the client rdist and server rdistd.
I have tested i386 and amd64 in both directions.  Please continue the testing.
Testing on alpha would be especially welcomed.
Thanks to everyone in advance.

Steven

client.c
I checked the comparison at line 689. It ASSUMES that symbolic links
themselves (not the files they point to) will be tiny.
The only attributes returned by lstat() that refer to the symbolic
link itself are the file type (S_IFLNK), size, blocks, and link count
(always 1). For a symlink, that size is the length of the target
pathname, which in practice is bounded by PATH_MAX.
That code is safe FOR NOW, but IF the (meta)data in the link grows a
lot, THEN it could become a problem.

The patch should now be in a good state:
1. FIXED the file size >2GB bug (calculations and messages)
2. FIXED similar problems in the minimum free space (and free files) handling
3. VERIFIED/FIXED the write() (and read()) system calls
4. TODO: improve buffering

i386   -> i386    OK
i386   -> i386    OK
amd64  -> i386    OK
amd64  -> i386    OK
i386   -> amd64   OK
i386   -> amd64   OK


diff -uw /usr/src/usr.bin/rdist/Makefile rdist/Makefile
--- /usr/src/usr.bin/rdist/Makefile Sun Jan  4 21:55:28 2004
+++ rdist/Makefile  Mon Mar 28 22:03:24 2011
@@ -3,6 +3,7 @@

 PROG=  rdist
 CFLAGS+=-I. -I${.CURDIR} -DOS_H=\"os-openbsd.h\"
+#CFLAGS+=-Wall -pedantic
 SRCS=  gram.y child.c client.c common.c distopt.c docmd.c expand.c \
isexec.c lookup.c message.c rdist.c
 CLEANFILES+=gram.c y.tab.h
diff -uw /usr/src/usr.bin/rdist/child.c rdist/child.c
--- /usr/src/usr.bin/rdist/child.c  Thu Oct 29 00:34:05 2009
+++ rdist/child.c   Sun Mar 27 16:36:19 2011
@@ -177,7 +177,7 @@
 readchild(CHILD *child)
 {
char rbuf[BUFSIZ];
-   int amt;
+   ssize_t amt;

debugmsg(DM_CALL, "[readchild(%s, %d, %d) start]",
 child->c_name, child->c_pid, child->c_readfd);
@@ -196,7 +196,7 @@
 */
while ((amt = read(child->c_readfd, rbuf, sizeof(rbuf))) > 0) {
/* XXX remove these debug calls */
-   debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %d bytes]",
+   debugmsg(DM_MISC, "[readchild(%s, %d, %d) got %ld bytes]",
 child->c_name, child->c_pid, child->c_readfd, amt);

(void) xwrite(fileno(stdout), rbuf, amt);
@@ -205,7 +205,7 @@
 child->c_name, child->c_pid, child->c_readfd);
}

-   debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %d errno = %d\n",
+   debugmsg(DM_MISC, "readchild(%s, %d, %d) done: amt = %ld errno = %d\n",
 child->c_name, child->c_pid, child->c_readfd, amt, errno);

/*
diff -uw /usr/src/usr.bin/rdist/client.c rdist/client.c
--- /usr/src/usr.bin/rdist/client.c Thu Oct 29 00:34:06 2009
+++ rdist/client.c  Sun Mar 27 16:05:15 2011
@@ -399,8 +399,8 @@
 */
ENCODE(ername, rname);

-   (void) sendcmd(C_RECVREG, "%o %04o %ld %ld %ld %s %s %s",
-  opts, stb->st_mode & 0, (long) stb->st_size,
+   (void) sendcmd(C_RECVREG, "%o %04o %lld %ld %ld %s %s %s",
+   opts, stb->st_mode & 0, stb->st_size,
   stb->st_mtime, stb->st_atime,
   user, group, ername);
if (response() < 0) {
@@ -409,8 +409,8 @@
}


-   debugmsg(DM_MISC, "Send file '%s' %ld bytes\n", rname,
-(long) stb->st_size);
+   debugmsg(DM_MISC, "Send file '%s' %lld bytes\n", rname,
+   stb->st_size);

/*
 * Set remote time out alarm handler.
@@ -666,8 +666,8 @@
 * Gather and send basic link info
 */
ENCODE(ername, rname);
-   (void) sendcmd(C_RECVSYMLINK, "%o %04o %ld %ld %ld %s %s %s",
-  opts, stb->st_mode & 0, (long) stb->st_size,
+   (void) sendcmd(C_RECVSYMLINK, "%o %04o %lld %ld %ld %s %s %s",
+   opts, stb->st_mode & 0, stb->st_size,
   stb->st_mtime, stb->st_atime,
   user, group, ername);
if (response() < 0)
@@ -869,7 +869,7 @@
/*
 * Parse size
 */
-   size = (off_t) strtol(cp, (char **)&cp, 10);
+   size = (off_t) strtoll(cp, (char **)&cp, 10);
if (*cp++ != ' ') {
error("update: size not delimited");
return(US_NOTHING);
@@ -878,7 +878,7 @@
/*
 * Parse mtime
 */
-   mtime = strtol(cp, (char **)&cp, 10);
+   mtime = (time_t) strtol(cp, (char **)&cp, 10);
if (*cp++ != ' ') {
error("update: mti
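The strtol() to strtoll() change is the receiving side of the same fix: where long is 32 bits, strtol() saturates at LONG_MAX and sets errno to ERANGE for any size of 2GB or more. A minimal sketch of a stricter parser for a space-delimited size field, assuming a 64-bit off_t. parse_size() is hypothetical and does not match the signature of the code being patched:

```c
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Parse a non-negative decimal size that must be followed by a space,
 * e.g. the leading field of "3221225472 1301234567 ...".  On success,
 * store it in *sizep, advance *cpp past the space and return 0.  Return
 * -1 if the field is empty, out of range, negative or not delimited.
 * Assumes off_t is 64 bits wide, so any long long >= 0 fits.
 */
static int
parse_size(const char **cpp, off_t *sizep)
{
	const char *cp = *cpp;
	char *end;
	long long v;

	errno = 0;
	v = strtoll(cp, &end, 10);
	if (end == cp || errno == ERANGE || v < 0 || *end != ' ')
		return -1;
	*sizep = (off_t)v;
	*cpp = end + 1;
	return 0;
}
```

Checking errno and the delimiter catches both a silently saturated value and a truncated field, which a bare strtol() call does not.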

Upgrade i386 to amd64

2011-04-06 Steven R. Gerber
I ran the upgrade from CD.
I want to be sure that the packages are OK.
Is "pkg_add -u" sufficient?  (It looks like nothing changed.)

Thanks,
Steven



Re: rdist times out but will not die

2011-03-23 Steven R. Gerber
On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
> I want to do local/remote mirror/backup (or should that be local-mirror
> / offsite-backup).
> So a two-part question:
> 1. Even if there is a timeout, shouldn't the job/process exit?
> **
> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from
> rdist:operator to cdripper:operator
> rdist@thedump: thedump:
> /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown
> from rdist:operator to root:operator
> rdist@thedump:
> /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5:
> updating
> rdist@thedump:
> /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso:
> installing
> rdist@thedump: LOCAL ERROR: Response time out
> rdist@thedump: updating of rdist@thedump finished
> $ ps -ax|grep rdist
> 26025 ??  I   0:00.00 tee /var/log/rdist/2011-03-20
> 11059 ??  I   0:00.01 rdist -f /etc/Distfile
> 28446 ??  I   0:22.99 rdist: update rdist@thedump (rdist)
>  7795 ??  I   1:10.32 ssh -l rdist thedump r
> 13045 p0  S+  0:00.00 grep rdist
> **
> 2. I know that they happen from time to time.  How can I avoid/prevent
> timeouts? The default is 900 sec AKA 15 min?  How can this happen
> between two local machines?
> 
> Thanks.
> 
> 
> 

Sorry to reply to myself, but I really need help with this.
The movies always time out via rdist.  If I transfer the movies myself
via sftp, there are no timeouts.
The processes continue to accumulate every day unless I manually kill them.
I know that I am missing something.  Should I edit /etc/daily to turn on
debugging?

Please/Thanks.



rdist times out but will not die

2011-03-20 Steven R. Gerber
I want to do local/remote mirror/backup (or should that be local-mirror
/ offsite-backup).
So a two-part question:
1.  Even if there is a timeout, shouldn't the job/process exit?
**
rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from
rdist:operator to cdripper:operator
rdist@thedump: thedump:
/mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown
from rdist:operator to root:operator
rdist@thedump:
/mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5:
updating
rdist@thedump:
/mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso:
installing
rdist@thedump: LOCAL ERROR: Response time out
rdist@thedump: updating of rdist@thedump finished
$ ps -ax|grep rdist
26025 ??  I   0:00.00 tee /var/log/rdist/2011-03-20
11059 ??  I   0:00.01 rdist -f /etc/Distfile
28446 ??  I   0:22.99 rdist: update rdist@thedump (rdist)
 7795 ??  I   1:10.32 ssh -l rdist thedump r
13045 p0  S+  0:00.00 grep rdist
**
2.  I know that timeouts happen from time to time.  How can I avoid/prevent
them?  Is the default 900 sec, AKA 15 min?  How can a timeout happen
between two local machines?
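On question 1, one general way to make sure a program gives up after a timeout is to bound every wait and treat expiry as fatal. The sketch below shows that idea with poll(). It is not how rdist implements its response timeout (rdist sets an alarm handler), and read_with_timeout() and READ_TIMED_OUT are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define READ_TIMED_OUT	(-2)	/* hypothetical status code */

/*
 * Wait at most timeout_ms for 'fd' to become readable, then read from it.
 * Returns the byte count, 0 on EOF, -1 on error, or READ_TIMED_OUT.
 * An EINTR restarts the wait with the full timeout, which is acceptable
 * for a sketch.
 */
static ssize_t
read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd;
	int r;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	do {
		r = poll(&pfd, 1, timeout_ms);
	} while (r == -1 && errno == EINTR);
	if (r == -1)
		return -1;
	if (r == 0)
		return READ_TIMED_OUT;
	return read(fd, buf, len);
}
```

A caller that gets READ_TIMED_OUT would then log the error, stop any helper it started (for example the ssh transport) and exit with a failure status, rather than carry on and leave idle processes behind.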

Thanks.



Re: OpenBSD 4.8 RAID 0+1 or 1+0 or 5

2011-02-16 Steven R. Gerber
On 2/16/2011 10:50 AM, Joel Sing wrote:
> On Wednesday 16 February 2011, Steven R. Gerber wrote:
>> Sorry for cross posting?
>> I am trying to setup a decent RAID (0+1 or 1+0 or 5) and there SEEMS to
>> be no approved method.  (4 disks -- I usually like stripe on top of
>> mirrors.)
>> I believe that I have done my homework.
>> What are my options?
>>
>> softraid (bioctl) cannot handle stripe on mirrors:
>> I can easily create 2 mirrors and they survive reboot.
>> I can create stripe on those mirrors (works -- can create files), but it
>> does not survive reboot.
> 
> Define "does not survive reboot". I'm guessing that you probably mean "fails
> to automatically reassemble at boot", which is accurate - we do not currently
> probe volumes that we have just assembled. Things should just work if you
> manually assemble it after the mirrors are available. Note that this is not a
> supported configuration, however it does seem to work - YMMV.
> 
>> Message is device not configured.
>>
>> Both ccd and RAIDframe are deprecated (FAQ 14.13):
>>> Software Options
>>> OpenBSD supports softraid(4), a framework supporting many kinds of I/O
>>> transformations, including RAID and encryption disciplines. Softraid(4)
>>> is managed using bioctl(8).
>>>
>>> OpenBSD also includes RAIDframe (raid(4), requires a custom kernel),
>>> and ccd(4) as historic ways of implementing RAID, but at this point
>>> OpenBSD does not suggest implementing either as a RAID solution for new
>>> installs or reinstalls.
>> "OpenBSD does not suggest implementing either"
>> Also, RAIDframe requires a custom kernel and we all know that GENERIC is
>> preferred.
>>
>> RAID 5 is experimental (man bioctl):
>>> CAVEATS
>>>  Use of the CRYPTO & RAID 4/5 disciplines are currently considered
>>>  experimental.
>>>
>>> OpenBSD 4.9            December 22, 2010            OpenBSD 4.9
>>
>> Also, bioctl would not let me create a RAID 5 set:
>>  # bioctl -iv softraid0
>>  # bioctl -c 5 -l /dev/sd1a,/dev/sd2a,/dev/sd3a,/dev/sd4a softraid0
>> bioctl: BIOCCREATERAID: Invalid argument
>>  # bioctl -iv softraid0
>>  # dmesg|tail
>>  sd11 at scsibus6 targ 0 lun 0:  SCSI2 0/direct fixed
>>  sd11: 3815436MB, 512 bytes/sec, 7814014721 sec total
>>  sd11 detached
>>  scsibus6 detached
>>  sd10 detached
>>  scsibus5 detached
>>  sd9 detached
>>  scsibus4 detached
>>  softraid0: not part of the same volume
>>  softraid0: can't attach metadata type 0
> 
> You previously had a RAID 0 volume on some or all of these partitions, hence
> the "not part of the same volume" and "can't attach metadata type 0" messages
> (softraid is refusing to make members of a RAID 0 volume into a RAID 5
> volume). Either wipe the first 1MB or so of each partition (dd if=/dev/zero
> of=/dev/rsd1a bs=1m count=1, etc) or use 'bioctl -C force ... '.
> --
> 
> "Reason is not automatic. Those who deny it cannot be conquered by it.
>  Do not count on them. Leave them alone." -- Ayn Rand
> 
> 
> 

"-C force" still fails (BUG!)
I had to manually clear sd1...sd4
Now, I have an EXPERIMENTAL RAID 5 volume.  Not the worst.
***
sd9 at scsibus4 targ 0 lun 0:  SCSI2 0/direct fixed
sd9: 5723178MB, 512 bytes/sec, 11721070081 sec total
***
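The capacity figures in these dmesg lines follow from the sector counts, assuming 512-byte sectors and "MB" meaning 2^20 bytes with the remainder dropped. The product needs 64-bit arithmetic, since 11721070081 sectors alone does not fit in 32 bits. A small sketch; sectors_to_mb() is a hypothetical helper:

```c
#include <stdint.h>

/* Bytes = sectors * sector size; report whole mebibytes (2^20 bytes). */
static uint64_t
sectors_to_mb(uint64_t sectors, uint32_t secsize)
{
	return sectors * secsize / (UINT64_C(1) << 20);
}
```

This gives 5723178 for sd9 (11721070081 sectors) and 3815436 for sd11 (7814014721 sectors), matching the dmesg output.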
But EXPERIMENTAL RAID 5 is dangerous (Marco Peereboom).
OpenBSD softraid fully supports only RAID 0 (stripe) and RAID 1
(mirror).  RAID 0 provides NO redundancy (not really RAID).  RAID 1 is a
waste beyond 2 disks.
I want/need to use 4 (or more) disks.
A real RAID (array) requires RAID 0+1 or RAID 1+0 or RAID 5 ...
A custom kernel with RAIDframe is starting to look good.
Still waiting for the next step ...
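The trade-offs listed above can be made concrete with the usual usable-capacity rules. A sketch assuming identical disks and ignoring on-disk metadata; the enum and function names are hypothetical:

```c
#include <stdint.h>

enum raid_level { RAID0, RAID1, RAID5, RAID10 };

/*
 * Usable space of 'ndisks' identical disks of 'disk_size' units each,
 * ignoring metadata overhead.  Returns 0 when the level cannot be built
 * from that many disks.  RAID10 stands for both 1+0 and 0+1, which give
 * the same usable space.
 */
static uint64_t
usable_capacity(enum raid_level level, unsigned int ndisks, uint64_t disk_size)
{
	switch (level) {
	case RAID0:	/* striping: all the space, no redundancy */
		return ndisks >= 2 ? ndisks * disk_size : 0;
	case RAID1:	/* n-way mirror: one disk's worth, however many disks */
		return ndisks >= 2 ? disk_size : 0;
	case RAID5:	/* one disk's worth of space goes to parity */
		return ndisks >= 3 ? (ndisks - 1) * disk_size : 0;
	case RAID10:	/* striping over mirrored pairs: half the space */
		return (ndisks >= 4 && ndisks % 2 == 0) ?
		    ndisks / 2 * disk_size : 0;
	}
	return 0;
}
```

With four 2000GB disks this gives 8000 for RAID 0, 6000 for RAID 5, 4000 for RAID 1+0 or 0+1, and only 2000 for a four-way RAID 1, which is the sense in which RAID 1 is a waste beyond two disks.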

Thanks,
Steven



Re: OpenBSD 4.8 RAID 0+1 or 1+0 or 5

2011-02-15 Steven R. Gerber
On 2/15/2011 5:52 PM, Marco Peereboom wrote:
> it isn't supported so don't do it.  it is in the pipeline to do stacked
> raid sets but it is all talk for now.
> 
> On Tue, Feb 15, 2011 at 02:45:13PM -0500, Steven R. Gerber wrote:
>> Sorry for cross posting?
>> I am trying to setup a decent RAID (0+1 or 1+0 or 5) and there SEEMS to
>> be no approved method.  (4 disks -- I usually like stripe on top of
>> mirrors.)
>> I believe that I have done my homework.
>> What are my options?
[SNIP]
>>
>> Both ccd and RAIDframe are deprecated (FAQ 14.13):
[SNIP]
>> RAID 5 is experimental (man bioctl):
[SNIP]
>> Thanks,
>> Steven

Understood.  I need/want to use 4 drives.
But RAID 5 is still experimental, right?
Please give me some guidance.  Should I just fall back to ccd?  Should
I try to debug my softraid RAID 5 setup?

Thanks,
Steven



OpenBSD 4.8 RAID 0+1 or 1+0 or 5

2011-02-15 Steven R. Gerber
Sorry for cross-posting.
I am trying to set up a decent RAID (0+1 or 1+0 or 5) and there SEEMS to
be no approved method.  (4 disks -- I usually like a stripe on top of
mirrors.)
I believe that I have done my homework.
What are my options?

softraid (bioctl) cannot handle a stripe on top of mirrors:
I can easily create 2 mirrors, and they survive a reboot.
I can create a stripe on those mirrors (it works; I can create files), but it
does not survive a reboot.
The message is "device not configured".

Both ccd and RAIDframe are deprecated (FAQ 14.13):
> Software Options
> OpenBSD supports softraid(4), a framework supporting many kinds of I/O
> transformations, including RAID and encryption disciplines. Softraid(4)
> is managed using bioctl(8).
>
> OpenBSD also includes RAIDframe (raid(4), requires a custom kernel),
> and ccd(4) as historic ways of implementing RAID, but at this point
> OpenBSD does not suggest implementing either as a RAID solution for new
> installs or reinstalls.
"OpenBSD does not suggest implementing either"
Also, RAIDframe requires a custom kernel and we all know that GENERIC is
preferred.

RAID 5 is experimental (man bioctl):
> CAVEATS
>  Use of the CRYPTO & RAID 4/5 disciplines are currently considered
>  experimental.
>
> OpenBSD 4.9            December 22, 2010            OpenBSD 4.9
>
Also, bioctl would not let me create a RAID 5 set:
# bioctl -iv softraid0
# bioctl -c 5 -l /dev/sd1a,/dev/sd2a,/dev/sd3a,/dev/sd4a softraid0
bioctl: BIOCCREATERAID: Invalid argument
# bioctl -iv softraid0
# dmesg|tail
sd11 at scsibus6 targ 0 lun 0:  SCSI2 0/direct fixed
sd11: 3815436MB, 512 bytes/sec, 7814014721 sec total
sd11 detached
scsibus6 detached
sd10 detached
scsibus5 detached
sd9 detached
scsibus4 detached
softraid0: not part of the same volume
softraid0: can't attach metadata type 0

Thanks,
Steven