Puzzling stack trace

2010-03-26 Thread Peter Steele
I'm reposting this here since it's a pretty low-level discussion. Hopefully 
someone here can explain what's going on.


We had an app crash and the resulting core dump produced a very puzzling stack 
trace:



#0 0x0008011d438c in thr_kill () from /lib/libc.so.7
#1 0x0008012722bb in abort () from /lib/libc.so.7
#2 0x0008011fb70c in malloc_usable_size () from /lib/libc.so.7
#3 0x0008011fbb95 in malloc_usable_size () from /lib/libc.so.7
#4 0x0008011fdaea in _malloc_thread_cleanup () from /lib/libc.so.7
#5 0x0008011fdc86 in _malloc_thread_cleanup () from /lib/libc.so.7
#6 0x0008011fc8e9 in malloc_usable_size () from /lib/libc.so.7
#7 0x0008011fccc7 in malloc_usable_size () from /lib/libc.so.7
#8 0x0008011ffe8f in malloc () from /lib/libc.so.7
#9 0x00080127374b in memchr () from /lib/libc.so.7
#10 0x00080125e6e9 in __srget () from /lib/libc.so.7
#11 0x0008012352dd in vsscanf () from /lib/libc.so.7
#12 0x000801220087 in fscanf () from /lib/libc.so.7



This trace resulted from a call to fscanf, as follows:



char buffer[21];
fscanf(in, "%20s", buffer);



We've verified that the data being read was correct, and clearly the buffer in 
which fscanf is storing the string it reads is valid (i.e., it's not NULL). So 
what would lead this fscanf() call into calling abort()? Everything seems to be 
in order. What's more puzzling to us is that we've looked for calls to 
malloc_usable_size() in the libc sources and although the function is defined 
we can find no direct call to the function in our FBSD 8 sources:


$ grep -R 'malloc_usable_size' *|grep -v .svn
libc/stdlib/Symbol.map: malloc_usable_size;
libc/stdlib/Makefile.inc:   malloc.3 realloc.3 malloc.3 reallocf.3 malloc.3 
malloc_usable_size.3
libc/stdlib/malloc.c:malloc_usable_size(const void *ptr)

That's it. Nothing calls this function from what we can tell. Even if something 
did call it, we don't understand why it would call abort(). It has an assert:

malloc_usable_size(const void *ptr)
{
    assert(ptr != NULL);
    return (isalloc(ptr));
}

but the pointer we pass to fscanf() is clearly not NULL, so what pointer would 
this function be testing?

It's all very puzzling and we cannot reproduce this failure. We'd like to 
understand what happened though.
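
For reference, a minimal sketch of the same width-limited read with its result checked. The helper name read_token is hypothetical and not part of the application described above; it only illustrates that "%20s" stores at most 20 characters plus the terminating NUL in a 21-byte buffer, and that fscanf() reports how many conversions succeeded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original application): read one
 * whitespace-delimited token of at most 20 characters into a 21-byte
 * buffer. Returns the token length, or -1 if nothing was converted
 * (matching failure, end of file, or read error). */
long read_token(FILE *in, char buffer[21])
{
    assert(in != NULL && buffer != NULL);
    buffer[0] = '\0';
    if (fscanf(in, "%20s", buffer) != 1)
        return -1;
    return (long)strlen(buffer);
}
```

A token longer than 20 characters is not an error here: the first call returns the first 20 characters and the next call picks up the remainder.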

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


RE: Puzzling stack trace

2010-03-26 Thread Peter Steele
> Type frame 9 and see what it says.  If the bug is easily reproducible, try 
> reproducing it with a debugging version of libc (buildworld with
> DEBUG_FLAGS=-g)

This crash happened at a production customer site--we have the core and the 
matching binary and our logs for the application that crashed but that's all. 
We've never seen this particular crash before and cannot reproduce it. The 
fscanf() call that failed is repeated on a continual basis as part of a 
monitoring thread, so literally thousands of this exact same call have been 
made without incident.

The frame 9 command doesn't show anything useful:

(gdb) frame 9
#9  0x00080127374b in memchr () from /lib/libc.so.7

That's it. And yes, the stack trace appears to be wrong. Even the trace 
starting from the vsscanf call is wrong. It says that __srget() is the next 
function in the stack but vsscanf() doesn't call __srget():

int
vsscanf(const char * __restrict str, const char * __restrict fmt,
    __va_list ap)
{
    FILE f;

    f._file = -1;
    f._flags = __SRD;
    f._bf._base = f._p = (unsigned char *)str;
    f._bf._size = f._r = strlen(str);
    f._read = eofread;
    f._ub._base = NULL;
    f._lb._base = NULL;
    f._orientation = 0;
    memset(&f._mbstate, 0, sizeof(mbstate_t));
    return (__svfscanf(&f, fmt, ap));
}

So it seems our application went completely out to lunch. This is concerning.


RE: Puzzling stack trace

2010-03-26 Thread Peter Steele
> Are you absolutely sure the machine you ran gdb on has the exact same libc 
> etc. as the customer's machine?

I just connected to the customer's box and generated the stack trace directly 
on their box. It looks identical to the one I posted in my original message.

Something's not right here...


RE: Puzzling stack trace

2010-03-26 Thread Peter Steele
> Also, you should see if 
> __svfscanf() calls __srget().  The __svfscanf() call frame may not show up in 
> gdb if the compiler re-used the callframe from vsscanf for __svfscanf() as an 
> optimization.

I just checked--it does not call __srget()...



RE: Puzzling stack trace

2010-03-26 Thread Peter Steele
> As stated in an earlier message. This may help get the information you need. 
> Just more of an automated approach to compiling these.

Thanks for the script; I'll definitely archive it. Unfortunately, our window 
for investigating this problem further is over as this customer is upgrading 
their systems today and the OS is getting wiped...



RE: ntpd hangs under FBSD 8

2010-02-26 Thread Peter Steele
> So this is arguably a Python bug. Did you contact anybody who cares about 
> Python?

I did not, mainly because this link:

http://bugs.python.org/msg61870

seems to imply they are already aware of the problem. I agree it must be a 
Python bug though. It worked in 2.5.1 but not in 2.5.5 and later, so clearly 
they changed how processes are launched from threads in a way that has led to 
this problem. One should not be forced to make explicit calls to change the 
signal mask in order to launch an external app. Granted, we've only had this 
issue with ntpd--other apps launch fine--but there is clearly something wrong 
somewhere for even one app to hang when it is spawned from a thread.



RE: ntpd hangs under FBSD 8

2010-02-25 Thread Peter Steele
> I think the problem is not in ntpd, since I use ntpdate. And 50% of the time, 
> when it is run from a startup script, it hangs along with the kernel.
> Ctrl+C doesn't work and the kernel doesn't answer pings; it just freezes.
> The problem is somewhere in the kernel, maybe in the subsystems that set the 
> new time, maybe in the network (UDP) parts.
> This problem doesn't affect other programs, so I think it is in the time 
> handling code.

I think you may be describing a different problem. For one thing, we don't use 
ntpdate, we use the ntpd -g -q alternative. Secondly, for us ntpd is hanging 
100% of the time when run via a Python thread class. The exception is Python 
2.5.1; this succeeds 100% of the time.

> Peter, what platform do you use? I use MIPS BCM5354.

We have a variety of 1U and 3U boxes. They all hang the same way.



RE: ntpd hangs under FBSD 8

2010-02-25 Thread Peter Steele
> Very wild guess, check the process signal mask of the child for both methods 
> of spawning.

I'm running ntpd through Python. How do I check the process signal mask? I did 
some quick searches and it seems Python does not support sigprocmask(). 

In my searches I came across this link:

http://bugs.python.org/msg61870

I think you might be right that this is related to the signal mask. In my 
scenario the select call is hanging indefinitely, just as discussed in that 
message.



RE: ntpd hangs under FBSD 8

2010-02-25 Thread Peter Steele
> We'll likely go with this solution instead of downgrading Python and the 
> related libraries.

In fact I came up with another solution. I realized that since the problem was 
related to the process signal mask, instead of calling ntpd directly I could 
wrap it up in a C app that resets the signal mask to something that works. I 
have the following code:

   sigset_t set, oset;
   sigemptyset(&set);
   pthread_sigmask(SIG_SETMASK, &set, &oset);
   system("/usr/sbin/ntpd -g -q");
   pthread_sigmask(SIG_SETMASK, &oset, NULL);

I wrapped this up into a standalone app and call this from Python instead of 
calling ntpd directly. This solved the problem--no more hang. Thanks very much 
to Kostik Belousov for his wild guess that this was related to the process 
signal mask. His guess was dead on.



RE: ntpd hangs under FBSD 8

2010-02-24 Thread Peter Steele
> You're going to need a debug version of libc, too.  gdb won't be able to find 
> a backtrace out of a libc function without it.

What's the proper way to build a debug version of libc and the other libraries? 
I tried this:

export CFLAGS=-O0
make buildworld
make installworld DESTDIR=/mydir

and then copied libc.so.7 from /mydir/lib to the /lib dir on my target system. 
I also replaced the ntpd binary with the debug version. I can see that -O0 is 
being used in the various cc commands that are generated, but libc still 
doesn't seem to be built properly. When I attach to a hung ntpd process, I get 
this:

# gdb /usr/sbin/ntpd -p 2113
GNU gdb 6.1.1 [FreeBSD]
...
Attaching to program: /usr/sbin/ntpd, process 2113
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
...
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
...
[Switching to Thread 8012041c0 (LWP 100283)]
0x000800dbeddc in select () from /lib/libc.so.7
(gdb) bt
#0  0x000800dbeddc in select () from /lib/libc.so.7
#1  0x004335de in ntpdmain ()
#2  0x0043310b in main ()

So I'm getting some symbols from ntpd but I still can't see into select(). It 
hangs in there forever so that's where I need to drill down further. How do I 
get libc built with full debug symbols?

In other testing I've narrowed the problem down to some kind of Python issue. 
If I run the Python code at the end of this email, where "ntpd -g -q" is 
launched as part of a Python thread class, the command hangs (the code assumes 
that ntpd is not already running). If I run the same ntpd command in a normal 
function (e.g. main) no hang occurs. I've tried subprocess.Popen and os.spawnv 
to run ntpd and these calls behave exactly the same way--when called from a 
thread the ntpd process hangs but it works fine when called from outside of a 
thread. This is, of course, our larger project boiled down to a simple test 
app. In our real code we cannot so easily eliminate this thread wrapper.

The same code BTW works fine on our FreeBSD 7 boxes, the main difference being 
we are running an older version of Python on those boxes (2.5.1 instead of 
2.6.2). I tried installing the same 2.5.1 package on a FBSD 8 box and that 
solved the problem. Curiously a slightly newer FBSD 7 version of Python, 2.5.5, 
causes the same hang to occur. So only Python 2.5.1 built under FreeBSD 7 works 
to get around this issue with ntpd on FreeBSD 8. That means one potential 
solution is to downgrade to this 2.5.1, but we have other libraries targeted to 
work with Python 2.6 and we don't really want to downgrade all these associated 
libraries.

If anyone has any clues at all as to what is causing this issue, I'd appreciate 
the feedback. Here's the code that reproduces this behavior.

#! /usr/bin/env python
import os
import threading

# Runs a shell command from a worker thread.
class RunProc(threading.Thread):
    def __init__(self, cmd):
        threading.Thread.__init__(self)
        self.cmd = cmd

    def run(self):
        # ntpd hangs when launched from here, but not when launched from main()
        os.system(self.cmd)

def main():
    RunProc("/usr/sbin/ntpd -g -q").start()

if __name__ == "__main__":
    main()




RE: ntpd hangs under FBSD 8

2010-02-24 Thread Peter Steele
>> What's the proper way to build a debug version of libc and the other 
>> libraries? I tried this:
>
> You can just do this:
>
> cd /usr/src/lib/libc
> make clean
> make DEBUG_FLAGS=-g
> make install

When I tried this the make actually failed with various errors. So I decided to 
do a full "make buildworld DEBUG_FLAGS=-g", but looking at the output being 
generated I see -O2 in the cc commands and this at least should be -O0. It 
doesn't look like DEBUG_FLAGS is having any effect.



RE: ntpd hangs under FBSD 8

2010-02-24 Thread Peter Steele
>> How do I get libc built with full debug symbols?
>
> I haven't tried it by myself but think here is the way to go: put the 
> following to /etc/make.conf and recompile needed libraries / ports.
> WITH_DEBUG=yes
> DEBUG_FLAGS=-g

That didn't seem to have any effect. I still see -O2 being used instead of -O0.

> Mmm... Do other daemons (sshd, lpd, ...) also fail when started through this 
> script? Normal commands (ls, ps) seem not affected.

I tried a few other things and they all seemed to run correctly. We use this 
same general approach in the full version of this script to launch lots of 
applications. Its role in fact is a process launcher/monitor. I stripped it 
down to the bare minimum in order to isolate the cause of the problem. It seems 
that only ntpd hangs, but not if I use Python 2.5.1.




RE: ntpd hangs under FBSD 8

2010-02-24 Thread Peter Steele
> I bet ntpd doesn't call select() in all that many places.  Instead of going to 
> all this trouble to build a debugging libc, you could just
> grep for select() and place breakpoints on all occurrences.  (It might also be 
> obvious from looking at them which one is the offender.)

I just checked--there are five calls to select. I might flag each one with a 
printf or something and recompile to see which one is the culprit.

> Also, since a system call is causing the trouble, you might learn something 
> from truss or ktrace.

I'll check these out...


RE: ntpd hangs under FBSD 8

2010-02-24 Thread Peter Steele
>> make install should be done with DEBUG_FLAGS containing -g too, otherwise
>> strip(1) is called on the installed binary.
>
> Doh, yes.

I did not do this; that's likely my problem. Thanks.



RE: ntpd hangs under FBSD 8

2010-02-22 Thread Peter Steele
> Just out of curiosity, can you attach to the process via gdb and get a 
> backtrace? This smells like a locked pthread_join I hit in my own code a few 
> weeks ago

I'm not using the debug version of ntpd so the backtrace isn't too useful, but 
here's what I get:

(gdb) bt
#0  0x000800d52bfc in select () from /lib/libc.so.7
#1  0x00425273 in ?? ()
#2  0x0040540e in ?? ()
#3  0x00080058 in ?? ()
#4  0x in ?? ()

The trace continues for 700+ entries. The first entry is useful enough though. 
One of the parameters to select() is a timeout parameter. Every time I do the 
backtrace it's stuck on this select call so it seems they have an infinite 
timeout set. One of these was running all weekend in fact and it's still stuck. 
Curiously, this problem only happens when we make the call from code via a 
system() call. If I run the same command interactively, it never hangs:

# /usr/sbin/ntpd -g -q
ntpd: time set +28845.997063s

The same code that runs this command does not hang when we run it on a BSD 7 
box. 

I think I'm going to have to build the debug version of ntpd and try to debug 
it. Definitely something weird going on.
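
To illustrate the timeout parameter mentioned above: select() blocks indefinitely when its last argument is NULL and returns 0 when a supplied timeout expires. The sketch below does not reproduce ntpd's code; wait_readable is a hypothetical helper that only shows the two modes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* declares select() on some older systems */

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * A negative timeout_ms passes NULL to select(), which then blocks until
 * the descriptor is ready or a signal interrupts the call.
 * Returns >0 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv, *tvp = NULL;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }
    return select(fd + 1, &rfds, NULL, NULL, tvp);
}
```

With a NULL timeout and nothing arriving on the descriptor, only a delivered signal gets the caller out of select(), which is one reason a blocked signal mask can matter for a call like this.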



RE: ntpd hangs under FBSD 8

2010-02-22 Thread Peter Steele
> You're going to need a debug version of libc, too.  gdb won't be able to find 
> a backtrace out of a libc function without it.

Yeah, you're right. This is definitely an annoying bug...



ntpd hangs under FBSD 8

2010-02-19 Thread Peter Steele
I posted this originally on the -questions list but did not make any headway. 
We have an application where the user can change the date/time via a GUI. One 
of the options the user has is to specify that the time is to be synced using 
ntp. Our code worked fine under BSD 7 but since we've moved to BSD 8 we've 
encountered a problem where the command that we initiate from the GUI:



ntpd -g -q



to perform the initial time sync is hanging indefinitely. Logs we've captured 
do not give any clues. This is the log from a BSD 7 system produced when this 
ntpd command is run:



17 Feb 06:35:36 ntpd[3578]: logging to file /var/log/ntpd.log
17 Feb 06:35:36 ntpd[3578]: ntpd 4.2.0-a Sun Feb 24 09:12:07 UTC 2008 (1)
17 Feb 06:35:36 ntpd[3578]: precision = 1.676 usec
17 Feb 06:35:36 ntpd[3578]: kernel time sync status 2040
17 Feb 06:35:36 ntpd[3578]: frequency initialized -10.706 PPM from /var/db/ntpd.drift
17 Feb 06:35:45 ntpd[3578]: synchronized to 198.186.191.229, stratum=2
17 Feb 06:35:45 ntpd[3578]: time slew +0.003648 s



and this is the log from a BSD 8 system:



17 Feb 06:35:36 ntpd[2293]: logging to file /var/log/ntpd.log
17 Feb 06:35:36 ntpd[2293]: precision = 1.676 usec
17 Feb 06:35:36 ntpd[2293]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
17 Feb 06:35:36 ntpd[2293]: Listening on interface #1 wildcard, ::#123 Disabled
17 Feb 06:35:36 ntpd[2293]: Listening on interface #2 nic0, fe80::2a0:d1ff:fee3:53cc#123 Enabled
17 Feb 06:35:36 ntpd[2293]: Listening on interface #3 nic1, fe80::2a0:d1ff:fee3:53cd#123 Enabled
17 Feb 06:35:36 ntpd[2293]: Listening on interface #4 lo0, fe80::1#123 Enabled
17 Feb 06:35:36 ntpd[2293]: Listening on interface #5 lo0, ::1#123 Enabled
17 Feb 06:35:36 ntpd[2293]: Listening on interface #6 lo0, 127.0.0.1#123 Enabled
17 Feb 06:35:36 ntpd[2293]: Listening on interface #7 lagg0, 192.168.17.46#123 Enabled
17 Feb 06:35:36 ntpd[2293]: Listening on routing socket on fd #29 for interface updates
17 Feb 06:35:36 ntpd[2293]: kernel time sync status 2040
17 Feb 06:35:36 ntpd[2293]: frequency initialized -10.706 PPM from /var/db/ntpd.drift



It never gets past this last log line and we have to do a kill -9 on the ntpd 
process. The ntp.conf file we're using is



# General Configuration
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
server 3.us.pool.ntp.org

# Drift file
driftfile /var/db/ntpd.drift



The versions of the two ntpd binaries are different--4.2.0-a for FBSD 7 and 
4.2.4p5 for FBSD 8. Someone suggested that I try the command:



ntpq -pc rv localhost



But I'm not sure how to interpret the output. On a FBSD 7 system I get this:



 remote   refid  st t when poll reach   delay   offset  jitter

==

+169.229.70.183  169.229.128.214  3 u   40  512   377.9219.170   8.836

*208.75.88.4 192.12.19.20 2 u   43  512   37   12.0498.224   8.168

+217.160.254.116 209.51.161.238   2 u   38  512   37   55.111   -7.128  10.347

+198.247.173.220 128.206.12.130   3 u   39  512   37   47.401   -1.149   3.659

status=c624 sync_alarm, sync_ntp, 2 events, event_peer/strat_chg, version=ntpd 
4.2.0-a Sun Feb 24 09:12:07 UTC 2008 (1), processor=amd64, 
system=FreeBSD/7.0-RELEASE-p9, leap=11, stratum=16, precision=-20, 
rootdelay=0.000, rootdispersion=8.340, peer=25349, refid=INIT, 
reftime=.  Wed, Feb  6 2036 22:28:16.000, poll=4, 
clock=cf26c2d5.ea2b4541  Wed, Feb 17 2010 11:32:37.914, state=1, offset=0.000, 
frequency=-13.269, jitter=0.001, stability=0.000



and on a FBSD 8 system I get this:



 remote   refid  st t when poll reach   delay   offset  jitter

==

assID=0 status=c011 sync_alarm, sync_unspec, 1 event, event_restart, 
version=ntpd 4.2.4p5-a (1), processor=amd64, system=FreeBSD/8.0-CURRENT, 
leap=11, stratum=16, precision=-19, rootdelay=0.000, rootdispersion=0.000, 
peer=0, refid=INIT,

reftime=.  Wed, Feb  6 2036 22:28:16.000, poll=6,

clock=cf26c4d1.d21b33f1  Wed, Feb 17 2010 11:41:05.820, state=1, offset=0.000, 
frequency=-14.299, jitter=0.002, noise=0.002, stability=0.000, tai=0

169.229.70.183  .INIT.  16 u-   6400.0000.000   0.002

208.75.88.4 .INIT.  16 u-   6400.0000.000   0.002

217.160.254.116 .INIT.  16 u-   6400.0000.000   0.002

198.137.202.16  .INIT.  16 u-   6400.0000.000   0.002



In the case of the FBSD8 output, I collected this while one of these hangs was 
happening. The most obvious difference is the .INIT. entries, but there also 
appear to be several 0.0 type of entries that look like the ntpd process is 
stuck in some kind of initialization state.



Anyone have any ideas what's going on here?





RE: How can I force boot from alternate drive with boot.config?

2010-02-09 Thread Peter Steele
>> So, more precisely, if I wanted to boot from drive 1, I'd use this?
>>
>> 1:ad(1p3)/boot/loader
>
> Yes, unless there are more bugs hiding. :-) I fixed a few in August last year.

Well, I'll give it a try and let you know if I find new bugs... :-)

I just tried this and it works as advertised--thanks. One question though: Why 
does this string list the device number twice? The man page describes it as

bios_drive:interface(unit,[slice,]part)filename

where bios_drive is the drive number as recognized by the BIOS.  0 for the  
first drive, 1 for the second drive, etc., and unit is the unit number of the 
drive on the interface being used. 0 for the first drive, 1 for the second 
drive, etc.

The two fields sound like they describe nearly the same thing, and I've 
always used the same value in both fields and it's always worked. Is there a 
case where these values might be different? In the test I just did I booted 
from the fourth drive of a four drive system using

3:ad(3p4)/boot/loader

I know my hardware and knew ad10 mapped to the fourth drive and would be 
referenced as drive 3 in this context. But how would I determine this 
generically? For example, given something like /dev/adN, how do I know what 
number I'd use for this drive in boot.config?
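
For generating these strings programmatically, a small sketch of the GPT form used in this thread, bios_drive:interface(unitpN)filename. The helper name format_boot_config and its parameters are hypothetical; it only assembles the string and makes no attempt to work out which BIOS drive number a given /dev/adN maps to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a boot.config line such as
 * "3:ad(3p4)/boot/loader" for GPT partition index gpt_index.
 * Returns 0 on success, -1 if the result does not fit in out. */
int format_boot_config(char *out, size_t outsz, int bios_drive,
                       const char *iface, int unit, int gpt_index,
                       const char *path)
{
    int n;

    assert(out != NULL && iface != NULL && path != NULL);
    n = snprintf(out, outsz, "%d:%s(%dp%d)%s",
                 bios_drive, iface, unit, gpt_index, path);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Calling it with drive 3, interface "ad", unit 3, partition 4 and "/boot/loader" yields the string used in the test described above.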



How can I force boot from alternate drive with boot.config?

2010-02-08 Thread Peter Steele
I've asked this on the -questions list but haven't had any feedback. I have a 
system configured with multiple identical drives each loaded with FreeBSD. When 
I was using MBR partitioning, I could create a boot.config to force the system 
to boot from a specific drive. For example, if I wanted to boot from the second 
drive, I'd create a boot.config with:



1:ad(1,a)/boot/loader



We've switched to GPT partitioning and I can't seem to find a way to do this 
same trick. The boot loader only seems to recognize MBR partitions when it 
comes to this feature. I looked at the boot.c source code and there doesn't 
seem to be anything specifically related to GPT partitioning. I cannot for 
example say something like:



1:ad(1,p3)/boot/loader



where p3 is the root partition in my GPT partitioned drives. So I'm puzzled: If 
I have a two drive system with BSD loaded on both drives and the drives are 
configured with GPT partitions, how can I force the system to boot from the 
second drive using boot.config?



RE: How can I force boot from alternate drive with boot.config?

2010-02-08 Thread Peter Steele
> I use: ad(0p3)/boot/loader

So, more precisely, if I wanted to boot from drive 1, I'd use this?

1:ad(1p3)/boot/loader



Converting a bootable USB stick into a bootable CD-ROM

2009-11-10 Thread Peter Steele
I posted this on the -questions list but didn't get any replies. I have a 
FreeBSD image that I install on USB sticks to build new systems. When the stick 
boots it automatically clones itself on the system's hard drive, creating 
partitions and other configuration parameters that are programmed into the 
stick's cloning logic. I want to create a similar mechanism using a bootable 
CD-ROM. The biggest difference in the process of course is that the CD-ROM 
itself is read-only so clearly there needs to be an mfsroot involved in the 
process. I looked at how the FreeBSD Live CD is set up and the loader.conf file 
has these lines:



mfsroot_load="YES"
mfsroot_type="mfs_root"
mfsroot_name="/boot/mfsroot"



along with the file /boot/mfsroot.gz and no /etc/fstab. The fstab on my USB 
stick version has root mounted as /dev/da0s1a and clearly that isn't going to 
work. I changed my core BSD image accordingly, duplicating the mfsroot settings 
in my loader.conf.



I used the command below to create the iso file from the BSD image I prepared.



mkisofs -R -no-emul-boot -o /tmp/bsd.iso -b boot/cdboot  /bsd



When this iso is copied to a CD, it does boot. However, it doesn't seem to be 
picking up the mfsroot config and complains that the system is running from 
a read-only file system, which of course is what I'm trying to avoid.



I assume I simply have the boot config set up wrong. I essentially want the same 
kind of thing that's done for BSD Live. Can anyone point me to the right info 
for setting up this kind of bootable BSD CD?





How to signal a time zone change?

2009-08-07 Thread Peter Steele
We have a suite of applications with a Java GUI controlling everything.
One of the actions the user can perform is to set the time zone. We do
this through our Java application and update the /etc/localtime as
required. We also make an API call to tell the JVM that the time zone has
changed, and from the perspective of the Java app, the time zone is
changed correctly (the timestamps for example in our log files reflect
the change). Likewise, after the user performs this action, running
date on one of our systems shows that the time zone has been changed
as requested. 

 

The problem is with our C applications. They continue to operate with
the old time zone, so things like timestamps in log files are not in
sync with the timestamps in the Java app log files. If we stop and
restart the C apps they pick up the time zone change. However, we don't
want to take this extreme approach. We want the Java app to signal to
the C applications that the time zone has changed. However, I've
experimented with the various time zone related calls and I cannot
figure out what call is needed to make the C applications pick up the
time zone change. I've tried setting the environment variable TZ to the
new time zone and this doesn't seem to work, and I've tried calling
tzset() and tzsetwall(). In each case after I make these calls the
function localtime() does not return the same time base as the Java
application.

 

Based on what I've read, I would think that the following steps would do
the trick on the C side after the Java app changes time zone and updates
/etc/localtime:

 

time_t date = time(NULL);
unsetenv("TZ");
tzset();
printf("time zone is %s/%s", tzname[0], tzname[1]);
struct tm* locTime = localtime(&date);
printf("%02d:%02d:%02d", locTime->tm_hour,
locTime->tm_min, locTime->tm_sec);

 

The time printed in this example however is still based on the old time
zone. The tzname variable that is set by tzset() still shows for example
EDT even if I have just changed the time zone to PDT. If I stop and
restart the C app, the time is correct, and tzname is then PDT instead
of EDT.

 

I'm very puzzled on what I'm supposed to do to kick start the time zone
change in C. We do not want to have to restart our C apps for something
as trivial as this. I posted this originally to the questions list but
didn't get much traction. I'm hoping someone on this list can point me
in the right direction.

 



RE: How to signal a time zone change?

2009-08-07 Thread Peter Steele
> You need to signal your app in some way.. Assuming you have source for
> the app then you can monitor /etc/localtime (or /etc) for change with
> kevent.

Signaling our C apps isn't the problem. We have an IPC framework in
place and we can easily tell the C apps when the user has changed the
time zone via the GUI. The problem is I can't figure out what C calls
are needed to apply the time zone change. Based on the
documentation, I would think that tzset() would do the trick once
/etc/localtime has been updated by the Java app, but this does not work.
The only way I've discovered that works is to restart our C apps and we
want to avoid that.



RE: How to signal a time zone change?

2009-08-07 Thread Peter Steele
> What's the value of the TZ environment variable for the C apps? You may
> need to have them read the new value from somewhere, and then rerun
> tzset().

The default value of the TZ environment variable is null. I just tried
passing the explicit time zone value to the C app and setting TZ to
that value and that seemed to work. I would think that I should be
able to retrieve that value from /etc/localtime as the docs imply. Guess
not. If I have to pass the time zone to the C app, then I guess that's
what I'll do...
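
The approach that worked, passing the zone name explicitly and setting TZ before calling tzset(), might look like the sketch below. localtime_in_zone is a hypothetical helper name; the zone string would be whatever name the GUI hands over through the IPC layer.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: switch this process to time zone `tz` and convert
 * `when` to broken-down local time in that zone.
 * Returns 0 on success, -1 on failure. */
int localtime_in_zone(const char *tz, time_t when, struct tm *out)
{
    assert(tz != NULL && out != NULL);
    if (setenv("TZ", tz, 1) != 0)   /* overwrite any existing TZ value */
        return -1;
    tzset();                        /* make libc act on the new TZ value */
    return localtime_r(&when, out) != NULL ? 0 : -1;
}
```

Note that setenv() changes the zone for the whole process, so in a multi-threaded application the call would need to be coordinated with other threads that format timestamps.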

 



Number of open files per process

2009-06-22 Thread Peter Steele
Is it possible to determine the number of open files per process? We
want to monitor this via a separate process and issue an alarm if some
threshold is crossed.
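
For another process's descriptors, fstat(1) with -p pid lists them from the command line. As a complementary sketch, a process can also count its own open descriptors and report the figure to a monitor; count_open_fds and PROBE_CAP below are hypothetical names, and the probe is deliberately capped so it stays cheap on systems with a very large descriptor limit.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption for this sketch: descriptors at or above this value are
 * not examined. */
#define PROBE_CAP 65536L

/* Hypothetical helper: count the descriptors currently open in the
 * calling process by probing each candidate with fcntl(F_GETFD). */
long count_open_fds(void)
{
    long limit = sysconf(_SC_OPEN_MAX);
    long fd, count = 0;

    if (limit < 0 || limit > PROBE_CAP)
        limit = PROBE_CAP;
    for (fd = 0; fd < limit; fd++) {
        /* fcntl fails with EBADF only when fd is not an open descriptor. */
        if (fcntl((int)fd, F_GETFD) != -1 || errno != EBADF)
            count++;
    }
    assert(count <= limit);
    return count;
}
```

This only sees the calling process, so each monitored application would have to run it and report the result itself.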

 



WARNING: Expected rawoffset 0, found 63?

2009-03-25 Thread Peter Steele
I posted this on the questions list but didn't get a lot of traction. I've 
created GEOM mirrored file systems on two slices of my system's drives and 
everything seems to be working, but I get the warnings 

WARNING: Expected rawoffset 0, found 63 
WARNING: Expected rawoffset 0, found 50332464 

when the mirrors are being created. These correspond to the offsets for these 
slices in the partition table: 

# fdisk -p ad4 
# /dev/ad4 
g c484521 h16 s63 
p 1 0xa5 63 50332401 
a 1 
p 2 0xa5 50332464 16778160 
p 3 0xa5 67110624 421285536 

Partition three is not mirrored, just partitions 1 and 2. I use the following 
command to create the slice 1 mirror: 

gmirror label -v -n -b round-robin mirror-name drive-names1 

and a similar one for slice 2. Additional drives are added to this mirror after 
the data has been copied to the mirrored file systems. 

The disks are set up with the required labels, including making sure the c 
partition is reduced in size by one sector. E.g.: 

# bsdlabel ad4s1 
# /dev/ad4s1: 
8 partitions: 
# size offset fstype [fsize bsize bps/cpg] 
a: 10485760 16 4.2BSD 2048 16384 28528 
c: 50332400 0 unused 0 0 # raw part, don't edit 
d: 8388608 10485776 4.2BSD 2048 16384 28528 
e: 31457280 18874384 4.2BSD 2048 16384 28528 
bsdlabel: partition c doesn't cover the whole unit! 
bsdlabel: An incorrect partition c may cause problems for standard system 
utilities 

# bsdlabel ad4s2 
# /dev/ad4s2: 
8 partitions: 
# size offset fstype [fsize bsize bps/cpg] 
b: 16778143 16 swap 
c: 16778159 0 unused 0 0 # raw part, don't edit 
bsdlabel: partition c doesn't cover the whole unit! 
bsdlabel: An incorrect partition c may cause problems for standard system 
utilities 

So as far as I can tell I have everything configured the way it should be and 
everything appears to be working fine, but these warnings worry me. Should I be 
worried? 
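
The numbers in the warnings can at least be cross-checked against the tables above. A small sketch that reads `p <n> <type> <start> <size>` lines in the fdisk -p format and prints each slice's start sector, the sector where the next slice would begin, and the size minus one sector (the size used for the c partitions here). The function name is hypothetical, and the sketch only does the arithmetic; it does not explain what triggers the warning:

```shell
#!/bin/sh
# Summarize slice geometry from "fdisk -p" style lines read on stdin.
slice_summary() {
    awk '$1 == "p" {
        start = $4; size = $5
        printf "s%d start=%d next=%d c_size=%d\n", $2, start, start + size, size - 1
    }'
}

# Example: fdisk -p ad4 | slice_summary
```

For the ad4 table above this gives start=63 and start=50332464 for slices 1 and 2, the same values the warnings report as rawoffset, and c_size values of 50332400 and 16778159, matching the bsdlabel output.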



Re: How to tear down a geom mirror?

2009-03-06 Thread Peter Steele
 Or simply use the clear command, for example gmirror clear (also 
supported in other GEOM classes). 

Can I do a gmirror clear without first doing a gmirror load? That's what I want 
to avoid since it can hang if the mirror is in a bad state. 



Re: How to tear down a geom mirror?

2009-03-06 Thread Peter Steele
gmirror and various other geom modules store their metadata on the last 
sector(s) of the drive, so you need to wipe that too. 

In our case the systems we are using aren't mirroring the whole drive, just 
certain slices. Some systems have a single slice mirrored (plus an unmirrored 
slice), and others have two slices mirrored (plus a third unmirrored slice). I 
need a way to destroy the existing mirrors, without doing a gmirror load, and 
ultimately without making any assumptions about the number or condition of 
mirrored slices on the drives I am about to install a new OS onto. 




Re: How to tear down a geom mirror?

2009-03-06 Thread Peter Steele
Yes. The clear commands usually just zero-out the last sector of the 
underlying provider (doesn't matter if it's a drive, slice or something 
altogether different) so you don't have to do it manually. 

So, as a generic solution then I could just iterate through all slices of all 
drives and run gmirror clear on each, and run dd to clear the first sectors. 
What btw is in these first sectors? I use this command because I saw it being 
done in one of the gmirror tutorials. I understand what the gmirror clear 
command does, but what is the dd command clearing? 
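
The generic approach described here, running gmirror clear over every slice, could be scripted roughly as follows. This is a sketch only: the function name and the RUN switch are made up for illustration, it prints the commands by default rather than executing them, and whether gmirror clear is usable before gmirror load on a given release should be verified first.

```shell
#!/bin/sh
# Emit (or, when RUN=1, execute) "gmirror clear" for each provider named.
clear_mirror_metadata() {
    for dev in "$@"; do
        if [ "${RUN:-0}" = 1 ]; then
            # The command may fail on a provider that carries no gmirror
            # metadata; that is acceptable here, so carry on with the next.
            gmirror clear "$dev" || true
        else
            echo "gmirror clear $dev"
        fi
    done
}

# Dry run over every slice of every ATA disk (the device pattern is an
# assumption about how the disks are named):
#   clear_mirror_metadata /dev/ad[0-9]*s[0-9]
```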



Re: How to tear down a geom mirror?

2009-03-06 Thread Peter Steele
Okay, thanks everyone for their feedback. I think I have a workable solution 
now. 

Peter 

- Original Message - 
From: Oliver Fromme o...@lurza.secnetix.de 
To: freebsd-hackers@FreeBSD.ORG, pste...@maxiscale.com 
Sent: Friday, March 6, 2009 11:15:11 AM GMT -08:00 US/Canada Pacific 
Subject: Re: How to tear down a geom mirror? 

Peter Steele wrote: 
  Yes. The clear commands usually just zero-out the last sector of the 
  underlying provider (doesn't matter if it's a drive, slice or something 
  altogether different) so you don't have to do it manually. 
 
 So, as a generic solution then I could just iterate through all 
 slices of all drives and run gmirror clear on each, and run dd 
 to clear the first sectors. What btw is in these first sectors? I 
 use this command because I saw it being done in one of the gmirror 
 tutorials. I understand what the gmirror clear command does, but what 
 is the dd command clearing? 

It clears the MBR (slice table) and GPT or disklabel 
(partition table), if any. Depending on how many 
sectors you clear, it will also destroy the beginning 
of the file system, e.g. the first UFS superblock. 

By the way, if you cannot use gmirror clear for any 
reason, you can also easily clear the last sector on 
any devices using the information from diskinfo. 
For example: 

DEV=/dev/ad0s1a 
set -- $(diskinfo $DEV)   # fields: name, sector size, size in bytes, size in sectors 
BLOCKSIZE=$2 
SECTORS=$4 
LASTSEC=$(( $SECTORS - 1 )) 
dd if=/dev/zero of=$DEV bs=$BLOCKSIZE seek=$LASTSEC count=1 

That's pretty much what gmirror clear /dev/ad0s1a does. 

Best regards 
Oliver 

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. 
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: 
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- 
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart 

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd 

One of the main causes of the fall of the Roman Empire was that, 
lacking zero, they had no way to indicate successful termination 
of their C programs. 
-- Robert Firth 


How to tear down a geom mirror?

2009-03-05 Thread Peter Steele
I posed this question in the questions list but didn't get any traction. 
Hopefully someone here will have an answer. 

I've created a USB boot disk that is used to clone itself onto the system's hard 
drives, setting up mirrored file systems in the process. The main difficulty 
I'm having is reimaging a system with an existing OS whose drives are already 
configured in a mirror. I want of course to destroy the mirror and create a 
completely new one, but I can't find the right process to accomplish this 
reliably. I don't want to make any assumptions about what mirrors might exist 
already and I definitely don't want to do gmirror load before I get a chance 
to destroy any existing mirrors. 

What I am doing is to clean the drive using dd. For example, assume my target 
system has two drives ad1 and ad2. I issue the following commands: 

dd if=/dev/zero of=/dev/ad1 bs=512 count=79 
dd if=/dev/zero of=/dev/ad2 bs=512 count=79 

I'm assuming this is enough to destroy any existing mirrors on the target 
drives, and I do this before the geom driver is loaded. After this, I partition 
the drives as I want them, and then create the mirrored pair: 

gmirror load 
gmirror label -v -n -b round-robin gm0 ad1s1 
gmirror insert gm0 ad2s1 

This process works exactly as I want it if the system that is being reimaged 
has no existing mirrors. However, if the drives were previously participating in a 
mirror, the label command fails, reporting the following error: 

gmirror: Can't store metadata on ad1s1: Operation not permitted. 

If I make sure the existing mirrors are torn down first by doing a remove 
operation instead of using the dd method, this can solve the problem, but in 
some cases the mirror on the target system is in a suspect state and I've seen 
the gmirror load command hang indefinitely. So I don't want to do a load 
command before I destroy the old mirrors, but I can't seem to find a way to 
reliably destroy the old mirrors. Can anyone suggest a way to do this? 



RE: FreeBSD boot menu is missing

2008-11-27 Thread Peter Steele
Mirroring the entire slice is far simpler.  If you mirror individual
partitions, you have to label them *before* you newfs them.

What we're really trying to accomplish is an automated install via a PXE boot 
server. Unfortunately gmirror isn't available in mfsroot at the point the file 
systems need to be set up. So what we've ended up doing is what amounts to a 
bootstrap install on the first disk, and then after the installCommit is done, 
gmirror is available and we have a post-install script that runs gmirror on the 
other drives. Then the script copies the OS slice over to the gmirrored fs, 
reboots to this mirrored system, and finally adds the original disk to the 
mirror. It's fully automated and gives us a mirrored OS slice across four 
drives, and we even handle drives of different sizes.

I would mirror the whole drive, though

We can't do that. The data on the non-mirrored portion is different on each 
drive and we don't want it mirrored.

 - and I would use ZFS, with which
you can easily transition to larger drives (just replace them one by one
and resilver in between - you can even do it online if your disks are
hot-swappable)

FreeBSD doesn't handle hot swap very well we've discovered, not unless you are 
using a RAID based backplane and drives. We cannot use RAID in our application, 
and don't in fact want to. We're still trying to figure out how to deal with 
drive removal in a live non-RAIDed system.

We plan to move to ZFS but we are too close to a release cycle to make the move 
now (QA would have to run through weeks of testing). ZFS will happen, though, 
sooner or later.


RE: FreeBSD boot menu is missing

2008-11-27 Thread Peter Steele
So what you do, instead, is make sure there is a little space left over
at the end of the slice that you create in the first step.  Then, once
gmirror is available, you gmirror label the slice, then gmirror insert
the corresponding slice on the other disk(s), and gmirror rebuild.  No
copying involved; gmirror takes care of it all.

The key here is that 'gmirror label' is non-destructive as long as the
last sector on the provider is unused.

The problem is I was unable to get multiple slices defined in a sysinstall 
config script. I tried many variations of parameters to pump into 
diskPartitionEditor and diskLabelEditor so that we could create three slices 
during the install but I couldn't find anything that worked. So I ended up 
having to create a single full disk slice to install the OS onto, and then in a 
post commit step slice the disks up as we want them and copy the OS over. I 
couldn't find a single example how to create multiple slices in a sysinstall 
config file. If you know how to do this, I'd love to see it. 

It does, AFAIK, even on SATA, provided the controller supports it and is
configured correctly.

With the proper controller and drive, yes, FreeBSD does support hot swap, to a 
point. Let's say for example that you have a file system mounted on a drive and 
that drive dies. You can pull it and put in a new one, but FreeBSD will not let 
you unmount the file system on the original drive. Even umount -f fails. We 
have to reboot to get the old mount point released, and we haven't found any 
way around this.


RE: FreeBSD boot menu is missing

2008-11-27 Thread Peter Steele
I wouldn't use a sysinstall script.

Yeah, I should probably have done it that way but I inherited the existing 
sysinstall framework from someone else and ended up extending it to use 
gmirror. I know more about this area now and I'd like to redo the whole thing, 
avoiding sysinstall. That will have to be a future project though.

That's an entirely different matter...  that's why you use gmirror or
graid or zfs or whatever, so you can swap out the drive online.

RAID is not an option for us, at least not for this particular problem. Long 
story.




FreeBSD boot menu is missing

2008-11-26 Thread Peter Steele
I have a procedure for converting a FreeBSD box to use a mirrored slice
for the OS. Everything is working fine except that after I've made the
conversion I am no longer getting the normal boot menu, the one that
counts down 10 seconds waiting for the user to pick an option. 

 

I see a single line showing that the BTX 1.01 loader has been launched,
but from there the system simply boots directly with no menu being
displayed. I'm obviously missing a step when using gmirror to convert a
system over to use mirroring but I'm not sure what. My basic approach is
to install the OS onto the first drive, setting it to use the standard
boot manager, and then set up the second drive using gmirror and copy the
file systems over to the mirror. I then set boot.config to boot off this
drive and it comes up fine, there just isn't any boot menu. 

 

Any advice on how to solve this would be appreciated. Thanks.

 

 



RE: FreeBSD boot menu is missing

2008-11-26 Thread Peter Steele
The phrase "and copy the file systems over to the mirror" worries
me. Do you actually copy the file systems, or do you let the mirror
system do it for you? In particular, are you mirroring file systems or
the entire disk? Because the boot blocks aren't part of any file
system, so you won't have copied them over, hence you'll be getting
whatever boot software the second drive has installed.

I'm more or less using the approach described here:

http://people.freebsd.org/~rse/mirror/

This assumes you have an existing OS installed on one drive of a
multi-drive system. You then use gmirror to create mirror devices on a
second drive to match the partitions of the boot drive, transfer the
data to the newly established mirror, adjust /etc/fstab on the mirrored
root partition to mount the appropriate mirrored devices, then reboot,
telling the boot loader to boot from the mirrored drive instead of the
original boot drive (via an entry in boot.config). After it comes up,
you can then add the original boot drive to the mirror (and any other
drive if there are more than two drives that you want to mirror) using
gmirror insert. This all works fine, except I'm not getting the boot
menu. I know this isn't part of the mirroring, but it is a step I need
to perform as part of the whole process. The question is what do I need
to do to make sure the appropriate boot loader is set up?

My recommendation for gmirror is to set up one drive to boot from,
then use gmirror label to create a gmirror device on each partition
(excluding swap). Edit /etc/fstab to use the gmirror devices thus
created, and reboot to make sure it's working properly. It will
initially boot from the disk device (pretty much required until
gmirror is started), then switch to the mirrored root partition.  Now
use gmirror insert to add the matching partitions on the second disk,
and let gmirror update the bits on the second drive. You'll need to
copy the boot blocks from the first drive to the second drive by hand
if you want to boot off the second drive.

I think you are describing more or less the same process here.

FWIW, these days I use ZFS on 64 bit systems in preference to UFS and
gmirror.

We plan to switch our application over to ZFS, but not this close to a
release.

Final comment: if you didn't ask on -questions first, this would have
been more appropriate there than here.

My bad. I'm new in this arena and didn't know the appropriate
place to post. I'll use -questions in the future.




RE: FreeBSD boot menu is missing

2008-11-26 Thread Peter Steele
He had you install a stock MBR on the second disk. You never copied
the boot loader from the first disk, so that's what you're going to
use when you boot from the second disk. You need to install the boot
block you want on the second disk. Which probably means
boot0. boot0cfg will do that for you. You probably want
   boot0cfg -B -s 1 diskdevice   # The device - ad1, not the slice!

Okay, that makes sense. That's an easy change to my script.

Um, no. He reduced the size of one partition because he's overly
paranoid about gmirror failing to recognize the providers properly,
which forces him to dump and restore one partition - which leads to
doing them all to get them on one disk. If you don't need to resize
the partitions, you can just label the disk you're already using.
Once you've done that, you can gmirror insert the second drive into
the mirror, and it will resilver the second drive while providing full
access to the first one. No need to copy any data at all.

Man, I wish I'd known this. I built a whole automated framework around
this, assuming you couldn't set up the initial mirror drive with a live
file system. I'll have to try your solution; it is definitely the way to
go. We are dealing with identical size drives as well so this shouldn't
be a problem.
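
The in-place approach described in this message might look like the following command sequence. A sketch under assumptions: the mirror name gm0 and the provider names are placeholders, the function name is made up, and it only prints the commands for review. On a live system extra steps, such as loading geom_mirror at boot and pointing /etc/fstab at the /dev/mirror/ devices, are still needed.

```shell
#!/bin/sh
# Print the gmirror commands for converting a live provider into a mirror
# and adding further providers to it.
# Usage: mirror_in_place_plan <mirror-name> <live-provider> [more-providers...]
mirror_in_place_plan() {
    name=$1; first=$2; shift 2
    # Label the provider that already holds the live file systems.
    echo "gmirror label -v -b round-robin $name $first"
    # Each additional provider is synchronized by gmirror after the insert.
    for extra in "$@"; do
        echo "gmirror insert $name $extra"
    done
    echo "gmirror status $name"
}
```

Running `mirror_in_place_plan gm0 ad0s1 ad1s1` prints three commands that can be checked before they are run by hand.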

His analysis of the choices is pretty shallow as well. He lets wanting
to use different-sized disks dominate the analysis, which is great if
you're building your mirror with disks from the parts bin. I tend to
buy drives in pairs if I want to mirror them, so that's
immaterial. Once that's gone, mirroring a full disk slice just doesn't
make sense at all - either mirror the entire disk (to get the MBR), or
mirror the partitions in the slice (for extra flexibility and less
painful resilvering).

We don't want to mirror the whole drive, just the OS partitions. I
decided to go with the full slice mirroring because of what was
described in this link. If mirroring the partitions in the slice is the
better way to go, then that's fine by me. 

Better instructions for getting a full-disk mirror can be found here:
http://www.onlamp.com/pub/a/bsd/2005/11/10/FreeBSD_Basics.html

I look forward to reading this. Thanks for the help!



Hot swapping SATA drives

2008-11-25 Thread Peter Steele
I've done some searches regarding FreeBSD 7's support for the hot
swapping of SATA drives and the general consensus appears to be that it *is*
supported, but not necessarily with all drive models/brands. In our own
testing, we've discovered that our Seagate 250GB drives cannot be hot
swapped in our servers. The system appears to sense when they are
removed but not when they are reinserted, and we've had numerous panics
experimenting with them.

 

We also have some Western Digital drives, and these fare much better.
FreeBSD appears to recognize when these drives are removed and inserted.
If we have a WD configured as part of a geom mirror, the geom driver
automatically re-inserts a previously configured drive as soon as it is
plugged in. It isn't even necessary to do an atacontrol attach/detach.

 

However, even with the Western Digital drive, there are issues. In
particular, if there are any mounted file systems on a drive when it is
removed, attempting to unmount the file systems after it has been
removed usually leads to a kernel panic, not necessarily immediately but
shortly afterwards. I've tried the latest 7.0 patch level, p6, and the
panics appear to have been fixed, but there are still problems.

 

If a drive dies on us, we want to be able to close existing file handles
and allow the new drive to take over. But what we've experienced is that
even a umount -f will not umount a file system if the drive has been
pulled. And as I type this, I have a system in the lab that is
completely frozen after a drive pull test. No panic, no reboot, it's
just hung up solid.

 

Why does FreeBSD panic/freeze instead of simply issuing an I/O error,
and why is there no way to force open file handles to close when a drive
is pulled? The implication is that if a drive were to suddenly die on a
live system, even if we have gmirror configured for HA, the system will
likely panic or freeze and we'll have to reboot. We have software that
detects when a drive disappears, but if the system is going to end up
having to be rebooted, our detection code isn't going to do us much
good.

 

Is there any solution to this? Can a server be built around FreeBSD that
supports hot swappable SATA drives?

 



RE: Hot swapping SATA drives

2008-11-25 Thread Peter Steele
Use a real hot-swappable drive plane, attached to a good SATA controller
that handles hot-swap in hardware?  :)

Use ZFS, which seems to work better with drives being added/removed than
ata(4)?  :)

Sorry, the few systems we have running FreeBSD either have single IDE
drives, single SATA drives, or 12-24 SATA drives attached to a hardware
hot-swappable drive-plane connected to 3Ware 9550/9650 RAID controllers.
The single-drive systems obviously can't do swapping, and the rest work
without issues.

I should have further clarified that we are running 4-drive systems, with
drive sizes ranging from 250GB-1TB. These drives are not in a RAID
cluster and we do not want them to be. We do need the drives to be hot
swappable though. I'll contact 3Ware and go from there.

Thanks for the reply.



RE: What are proper install.cfg for configuring multiple slices?

2008-11-19 Thread Peter Steele
I want to do an automated sysinstall through an install.cfg script and
have the script partition the install disk into three slices. I've been
going through various tests trying to figure out what the proper directives
are but I haven't had much luck, and I can't find any good examples.

After a lot of experimenting, my impression is that sysinstall simply
doesn't support multiple slice installations. It works to a point, but I
get some unexpected errors, e.g.

Unable to make device node for /dev/ad0s1a in /dev

and after the partitioning is complete, there are funky entries under
/dev:

/dev/ad0c
/dev/ad0cs1
/dev/ad0cs2
/dev/ad0cs3
/dev/ad0s1
/dev/ad0s2
/dev/ad0s3

There should be entries such as /dev/ad0s1a and so on, but these do not
get created. I've been unable to find even one example of how to
formulate multiple partitions in install.cfg, but I'm pretty sure I'm
doing it right, based on the sysinstall docs. Does anyone have any
experience with this?



RE: How can I add new binaries to the mfsroot image?

2008-11-18 Thread Peter Steele
I believe you modify /usr/src/release/${ARCH}/boot_crunch.conf to do
this.

I haven't actually tried though...

I think it would be possible to have a 'GEOM' menu that you can run
prior to fdisk, label, etc that would allow you to
do some basic stuff like this.

While the sysinstall code is a bit fugly it's not that difficult to
hack on (speaking from limited experience :)

Hmmm. I hadn't planned on actually creating a custom sysinstall but I
guess that's another way we could approach this. I have some research to
do...


RE: How can I add new binaries to the mfsroot image?

2008-11-18 Thread Peter Steele
What I've done in the past is skip sysinstall altogether and just boot
off an NFS root. Then use custom scripts for the slicing/partitioning/
mirroring, copy a minimal system to disk and pkg_add the rest.
Would be nice to do all this with install.cfg though. Please let me know
when you get this working.

I thought of doing something like this as well. I'll have to investigate
this as another option to this problem.

Thanks for the feedback guys.



What are proper install.cfg for configuring multiple slices?

2008-11-18 Thread Peter Steele
I want to do an automated sysinstall through an install.cfg script and
have the script partition the install disk into three slices. I've been going
through various tests trying to figure out what the proper directives
are but I haven't had much luck, and I can't find any good examples.
Here is a snippet of my config file:

 

disk=ad0
bootManager=standard
partition=12582912
diskPartitionEditor
partition=2097152
diskPartitionEditor
partition=free
diskPartitionEditor

ad0s1-1=ufs 4194304 /
ad0s1-2=ufs 4194304 /tmp
ad0s1-3=ufs 4194304 /var
ad0s2-1=swap 2097152 none
ad0s3-1=ufs 4194304 none
ad0s3-2=ufs 4194304 none
ad0s3-3=ufs 0 none
diskLabelEditor
diskLabelCommit

 

My intent here is to create three slices: one 6GB in size, another 1GB in
size, and the third sized to consume the remaining free space. When I
run this through sysinstall, it complains that it can't find the space
for the partitions. It even complains that it can't find any free space.
Because the slices don't get created, the subsequent label assignments
fail as well. What are the proper commands for creating multiple slices
in install.cfg?
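
As a sanity check on the sizes in the snippet, the values can be converted from sectors to MiB. A small sketch, assuming sysinstall interprets these numbers as 512-byte sectors (which matches the 6GB and 1GB figures stated here); the function name is illustrative:

```shell
#!/bin/sh
# Convert a count of 512-byte sectors to MiB (2048 sectors per MiB).
sectors_to_mib() {
    echo $(( $1 / 2048 ))
}

# Example: sectors_to_mib 12582912
```

With the values above, 12582912 sectors is 6144 MiB, 2097152 is 1024 MiB, and each 4194304-sector partition is 2048 MiB.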

 

Another thing I'm having trouble with is partitioning more than one
disk. I have four disks that I'd like to partition as part of the
install.cfg script. In fact, I want to partition the four disks more or
less identically (although only one should have an active root
partition). Again though, if I try partitioning another disk after ad0,
sysinstall complains about various things and the disk does not get
partitioned. Can multiple disks be partitioned in this manner or does
the step have to be done as a post-install operation?

 



RE: How can I add new binaries to the mfsroot image?

2008-11-18 Thread Peter Steele
You wouldn't have to do so - you could just run a shell script from
sysinstall and do what you want.

That brings me back to my original problem. Yes, I can run a shell
script from sysinstall, but gmirror isn't available in mfsroot, and
adding gmirror to mfsroot isn't straightforward because it needs shared
libraries. I think the best approach to use may very well be to have a
custom boot that mounts root from an NFS disk. Then I can run whatever
commands I need without having to actually add anything to mfsroot...



RE: How can I add new binaries to the mfsroot image?

2008-11-17 Thread Peter Steele
I'm not sure, but probably the installation CD doesn't carry shared
libraries at all? All binaries in /stand are
static-linked ones.

Yeah, that is absolutely the problem--no shared libraries are available
when sysinstall is running. 

You could also try scripts from mfsbsd project:
http://people.freebsd.org/~mm/mfsbsd/

These work fine for me for building custom installation CDs.

I'll have to check this out. I'm not getting anywhere with trying to
customize mfsroot with my current approach...




RE: How can I add new binaries to the mfsroot image?

2008-11-17 Thread Peter Steele
I'll have to check this out. I'm not getting anywhere with trying to
customize mfsroot with my current approach...

The goal we are trying to achieve btw is to make gmirror available
during an install so that the file systems are mirrored right from the
get-go, so that we can avoid having to go through the process of
converting a system as a post operation. The standard slicing/partition
commands of sysinstall do not support the creation of a mirrored file system
though, so our idea was to run a script via install.cfg to take care of
fdisk/bsdlabel/gmirror phase, and then install the packages in the
normal fashion via subsequent steps in install.cfg.

Is this something that can be done via sysinstall? If not, what's the
best alternative? This whole process is targeted to be on a PXE boot
server so we can configure our systems in a completely automated
hands-off manner. We have 200+ FreeBSD systems and we definitely need an
automated process. We already have it working fine, but without
mirroring. We can upgrade dozens of systems at a time simply by making
them boot from our PXE server. We now need to tweak this process so that
we can establish the mirrored file systems as part of the automated
install.




RE: How can I add new binaries to the mfsroot image?

2008-11-17 Thread Peter Steele
I'll have to check this out. I'm not getting anywhere with trying to
customize mfsroot with my current approach...

The goal we are trying to achieve btw is to make gmirror available
during an install so that the file systems are mirrored right from the
get-go, so that we can avoid having to go through the process of
converting a system as a post operation. The standard slicing/partition
commands of sysinstall do *not* support the creation of a mirrored file
system though, so our idea was to run a script via install.cfg to take care of
fdisk/bsdlabel/gmirror phase, and then install the packages in the
normal fashion via subsequent steps in install.cfg.

Is this something that can be done via sysinstall? If not, what's the
best alternative? This whole process is targeted to be on a PXE boot
server so we can configure our systems in a completely automated
hands-off manner. We have 200+ FreeBSD systems and we definitely need an
automated process. We already have it working fine, but without
mirroring. We can upgrade dozens of systems at a time simply by making
them boot from our PXE server. We now need to tweak this process so that
we can establish the mirrored file systems as part of the automated
install.



How can I add new binaries to the mfsroot image?

2008-11-16 Thread Peter Steele
I want to make a custom FreeBSD install CD-ROM with additional commands
available in the mfsroot image. Adding the new commands to the image is
easy enough, and I've made an install.cfg file on the CD-ROM as well so
that when the CD runs, the commands in install.cfg are automatically
executed. This all works, except that none of the new binaries I add to the
mfsroot image run during the automated sysinstall session. If I
reference one of the default commands (the ones stored in /stand) they
run fine, but if I add a new FreeBSD binary to the /stand directory
(e.g. gmirror), the command fails.

What's weird is that I can open a fixit shell after the install.cfg
script fails and then run the same commands interactively and they work
fine. Why would these commands work in an interactive fixit shell
but not during the automated sysinstall session?



RE: How can I add new binaries to the mfsroot image?

2008-11-16 Thread Peter Steele
> How does it fail?

There doesn't seem to be any error generated. Or at least I tried to
capture stderr and got nothing.

> Is the binary you added statically linked?

The command I'm doing most of my testing with is gmirror. I pulled it
from one of our operational FreeBSD boxes, and it appears to
reference several shared libraries:

# strings /stand/gmirror | grep '.so.'
/libexec/ld-elf.so.1
libgeom.so.4
libsbuf.so.4
libbsdxml.so.3
libutil.so.7
libc.so.7

> Wild guess: the shared libraries are present somewhere else on the CD,
> which perhaps is either not mounted or not pointed to by LD_LIBRARY_PATH
> or similar until the fixit shell is run.

All of these shared libraries exist under /dist, which is where the
FreeBSD CD is mounted. The first one is an absolute path that, in the
fixit shell, is in fact a symbolic link that ends up pointing to a
location under /dist. LD_LIBRARY_PATH is not set in the fixit shell, so
I'm curious how the shared libraries without an explicit path are being
located under /dist.

I think you are right, though: it might be related to the shared
libraries. I'll try setting LD_LIBRARY_PATH explicitly to see if that
solves the problem.
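
As a rough aid for that experiment, a small sh helper along these lines
could check whether each library named in the strings output is actually
present in a candidate search path before LD_LIBRARY_PATH is set to it.
The function name find_libs and the directories used in the example
afterwards are assumptions made for illustration; the real layout under
/dist may differ.

```sh
#!/bin/sh
# find_libs DIRS LIB...
# DIRS is a colon-separated search path (same shape as LD_LIBRARY_PATH).
# Prints "name => path" for each library found and "name => not found"
# otherwise; returns non-zero if any library is missing.
find_libs() {
    dirs=$1
    shift
    missing=0
    for lib in "$@"; do
        found=""
        old_ifs=$IFS
        IFS=:
        # Split DIRS on colons and keep the first directory that has the file.
        for dir in $dirs; do
            if [ -z "$found" ] && [ -e "$dir/$lib" ]; then
                found="$dir/$lib"
            fi
        done
        IFS=$old_ifs
        if [ -n "$found" ]; then
            echo "$lib => $found"
        else
            echo "$lib => not found"
            missing=1
        fi
    done
    return $missing
}
```

For example, `find_libs /dist/lib:/dist/usr/lib libgeom.so.4 libsbuf.so.4
libbsdxml.so.3 libutil.so.7 libc.so.7` would show which of the five
libraries resolve in those two (assumed) directories. If they all do,
exporting LD_LIBRARY_PATH with the same value before the gmirror call in
the automated session is the obvious next thing to try.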
