Re: devfs panic w/INVARIANTS

2010-02-05 Thread Andrew Gallatin

Kostik Belousov wrote:

On Thu, Feb 04, 2010 at 03:40:28PM -0500, Andrew Gallatin wrote:

I've got a commercial driver that uses device cloning.
At unload time, the driver calls clone_cleanup(). When I unload
the driver when the kernel is built with INVARIANTS, I'll see a
panic in devfs_populate_loop().  This happens in 6-stable,
as well as 8-stable.

From what I can see the clone has been freed, but it
remains on the devfs cdevp_list.   Then the next time
devfs_populate_loop() is called, it trips over the bad
entry (cdp->cdp_dirents points to 0xdeadc0dedeadc0de).
See appended kgdb session.

If I trace the code path, it looks like clone_cleanup()
calls destroy_devl().  And destroy_devl() will eventually
call devfs_free() if the si_refcnt is zero.  But I don't
see anything which will get the cdev removed from
the cdevp_list prior to it being freed.

The only code I see which will get the cdev removed from
the cdevp_list seems to be the "GC any lingering devices"
block in devfs_populate_loop().

What am I missing?


You did not mention it, but my guess is that you create clones from
the dev_clone event handler. Please note that devfs_lookup() that fires


Yes, I do.


dev_clone event, consumes a device reference. Thus clone handlers shall
do dev_ref().

Due to races with cleanup, you should use MAKEDEV_REF flag for
make_dev_credf(9) KPI instead of doing a make_dev()/dev_ref() pair.


I need to support FreeBSD going all the way back to 6, so that's not an
option in some versions.

But, I'm talking about device removal time.  If I call clone_cleanup()
where the clones have dev->si_refcount==1, then I get the use-after-free
panic.  If I hack things to elevate the reference count (such that
dev->si_refcount==2 when clone_cleanup() is called), then I don't
get the panic.

Are you saying I should have been taking the extra reference
via my dev_clone eventhandler?   Won't having the extra reference
lead to a memory leak?   Or am I just mis-reading the code, and
this will lead to things being freed normally?


That said, do you really need clones at all ?


I need to support FreeBSD back to 6.x, and I need to support the
linux-like model of opening the same /dev/node multiple times
and getting unique handles.  So I think I need clones.

Thanks for the help!
Drew
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: devfs panic w/INVARIANTS

2010-02-05 Thread Andrew Gallatin

Kostik Belousov wrote:

On Fri, Feb 05, 2010 at 08:51:25AM -0500, Andrew Gallatin wrote:

Kostik Belousov wrote:

On Thu, Feb 04, 2010 at 03:40:28PM -0500, Andrew Gallatin wrote:

I've got a commercial driver that uses device cloning.
At unload time, the driver calls clone_cleanup(). When I unload
the driver when the kernel is built with INVARIANTS, I'll see a
panic in devfs_populate_loop().  This happens in 6-stable,
as well as 8-stable.


From what I can see the clone has been freed, but it
remains on the devfs cdevp_list.   Then the next time
devfs_populate_loop() is called, it trips over the bad
entry (cdp->cdp_dirents points to 0xdeadc0dedeadc0de).
See appended kgdb session.

If I trace the code path, it looks like clone_cleanup()
calls destroy_devl().  And destroy_devl() will eventually
call devfs_free() if the si_refcnt is zero.  But I don't
see anything which will get the cdev removed from
the cdevp_list prior to it being freed.

The only code I see which will get the cdev removed from
the cdevp_list seems to be the "GC any lingering devices"
block in devfs_populate_loop().

What am I missing?

You did not mention it, but my guess is that you create clones from
the dev_clone event handler. Please note that devfs_lookup() that fires

Yes, I do.


dev_clone event, consumes a device reference. Thus clone handlers shall
do dev_ref().

Due to races with cleanup, you should use MAKEDEV_REF flag for
make_dev_credf(9) KPI instead of doing a make_dev()/dev_ref() pair.

I need to support FreeBSD going all the way back to 6, so that's not an
option in some versions.

But, I'm talking about device removal time.  If I call clone_cleanup()
where the clones have dev->si_refcount==1, then I get the use-after-free
panic.  If I hack things to elevate the reference count (such that
dev->si_refcount==2 when clone_cleanup() is called), then I don't
get the panic.

Are you saying I should have been taking the extra reference
via my dev_clone eventhandler?   Won't having the extra reference
lead to a memory leak?   Or am I just mis-reading the code, and
this will lead to things being freed normally?

Yes, clone handler shall do dev_ref(). Either by doing race-free
make_dev_credf(MAKEDEV_REF) call, or by using dev_ref() after make_dev().


OK, cool.  The man pages are handy.  When I started this
back in the FreeBSD 5 days, the man pages didn't exist :)


That said, do you really need clones at all ?

I need to support FreeBSD back to 6.x, and I need to support the
linux-like model of opening the same /dev/node multiple times
and getting unique handles.  So I think I need clones.


Wouldn't it be cleaner to use cdevpriv for the 7/8/HEAD where it is
present ? And have special #ifdef-ed code for 6, that could be
eventually dropped.


Yes, the cdevpriv() is a much cleaner interface.  I'll probably add
support for that soon.

Thanks for the help,

Drew


devfs panic w/INVARIANTS

2010-02-04 Thread Andrew Gallatin

I've got a commercial driver that uses device cloning.
At unload time, the driver calls clone_cleanup(). When I unload
the driver when the kernel is built with INVARIANTS, I'll see a
panic in devfs_populate_loop().  This happens in 6-stable,
as well as 8-stable.

From what I can see the clone has been freed, but it
remains on the devfs cdevp_list.   Then the next time
devfs_populate_loop() is called, it trips over the bad
entry (cdp->cdp_dirents points to 0xdeadc0dedeadc0de).
See appended kgdb session.

If I trace the code path, it looks like clone_cleanup()
calls destroy_devl().  And destroy_devl() will eventually
call devfs_free() if the si_refcnt is zero.  But I don't
see anything which will get the cdev removed from
the cdevp_list prior to it being freed.

The only code I see which will get the cdev removed from
the cdevp_list seems to be the "GC any lingering devices"
block in devfs_populate_loop().

What am I missing?


Thanks,

Drew


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer     = 0x8:0x803e8780
stack pointer           = 0x10:0xade623b0
frame pointer           = 0x10:0xade62400
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 896 (ps)
Dumping 510 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 510MB (130528 pages) 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14


#0  doadump () at pcpu.h:172
172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x801b8d91 in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0x0)
    at ../../../ddb/db_command.c:493
#2  0x801b91e5 in db_command_loop () at ../../../ddb/db_command.c:408
#3  0x801bb0ed in db_trap (type=-1377427040, code=0)
    at ../../../ddb/db_main.c:222
#4  0x80468b99 in kdb_trap (type=9, code=0, tf=0xade62300)
    at ../../../kern/subr_kdb.c:473
#5  0x806c5d14 in trap_fatal (frame=0xade62300, eva=18446742974557577824)
    at ../../../amd64/amd64/trap.c:660
#6  0x806c62eb in trap (frame=
      {tf_rdi = -2136471632, tf_rsi = -2136471656,
       tf_rdx = -2401050962867404578, tf_rcx = 1, tf_r8 = -2136471624,
       tf_r9 = -1099151973792, tf_rax = 0, tf_rbx = -1099307447040,
       tf_rbp = -1377426432, tf_r10 = 0, tf_r11 = 4, tf_r12 = 0,
       tf_r13 = -1099086652928, tf_r14 = -1099307447040, tf_r15 = 86032452,
       tf_trapno = 9, tf_addr = 0, tf_flags = -2143029088, tf_err = 0,
       tf_rip = -2143385728, tf_cs = 8, tf_rflags = 66071,
       tf_rsp = -1377426496, tf_ss = 16})
    at ../../../amd64/amd64/trap.c:470
#7  0x806ad84b in calltrap () at ../../../amd64/amd64/exception.S:168
#8  0x803e8780 in devfs_populate_loop (dm=0xff000c2b8d00, cleanup=0)
    at ../../../fs/devfs/devfs_devs.c:370
#9  0x803e8beb in devfs_populate (dm=0xff000c2b8d00)
    at ../../../fs/devfs/devfs_devs.c:486
#10 0x803eafab in devfs_lookup (ap=0x0)
    at ../../../fs/devfs/devfs_vnops.c:587
#11 0x80724a2e in VOP_LOOKUP_APV (vop=0x80948600, a=0xade62630)
    at vnode_if.c:99
#12 0x804aadb2 in lookup (ndp=0xade629c0) at vnode_if.h:56
#13 0x804abb66 in namei (ndp=0xade629c0) at ../../../kern/vfs_lookup.c:216
#14 0x804c1be2 in vn_open_cred (ndp=0xade629c0, flagp=0xade6290c, cmode=0,
    cred=0xff09ac00, fdidx=3) at ../../../kern/vfs_vnops.c:183
#15 0x804b8d64 in kern_open (td=0xff00156fe260,
    path=0xmode=373490024) at ../../../kern/vfs_syscalls.c:1016
#16 0x804b9455 in open (td=0x80a807b0, uap=0xade62bc0)
    at ../../../kern/vfs_syscalls.c:971
#17 0x806c6b52 in syscall (frame=
      {tf_rdi = 4218321, tf_rsi = 0, tf_rdx = 0, tf_rcx = 0,
       tf_r8 = 140737488348272, tf_r9 = 0, tf_rax = 5, tf_rbx = 5300224,
       tf_rbp = 4218321, tf_r10 = 0, tf_r11 = 5300224, tf_r12 = 4218321,
       tf_r13 = 0, tf_r14 = 140737488348272, tf_r15 = 6, tf_trapno = 12,
       tf_addr = 5300224, tf_flags = 0, tf_err = 2, tf_rip = 34369309420,
       tf_cs = 43, tf_rflags = 514, tf_rsp = 140737488347528, tf_ss = 35})
    at ../../../amd64/amd64/trap.c:807
#18 0x806ada48 in Xfast_syscall () at ../../../amd64/amd64/exception.S:287
#19 0x000800920aec in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 7
#7  0x806ad84b in calltrap () at ../../../amd64/amd64/exception.S:168
168             calltrap
Current language:  auto; currently asm
(kgdb) up
#8  0x803e8780 in devfs_populate_loop (dm=0xff000c2b8d00, cleanup=0)
    at ../../../fs/devfs/devfs_devs.c:370
370             if ((cleanup || !(cdp->cdp_flags & CDP_ACTIVE)) &&
Current language:  auto; currently c

Re: semaphores between processes

2009-10-23 Thread Andrew Gallatin

Daniel Eischen wrote:

On Fri, 23 Oct 2009, John Baldwin wrote:


On Thursday 22 October 2009 5:17:07 pm Daniel Eischen wrote:

On Thu, 22 Oct 2009, Andrew Gallatin wrote:


Daniel Eischen wrote:

On Thu, 22 Oct 2009, Andrew Gallatin wrote:


Hi,

We're designing some software which has to lock access to
shared memory pages between several processes, and has to
run on Linux, Solaris, and FreeBSD.  We were planning to
have the lock be a pthread_mutex_t residing in the
shared memory page.  This works well on Linux and Solaris,
but FreeBSD (at least 7-stable) does not support
PTHREAD_PROCESS_SHARED mutexes.

We then moved on to posix semaphores.  Using sem_wait/sem_post
with the sem_t residing in a shared page seems to work on
all 3 platforms.  However, the FreeBSD (7-stable) man page
for sem_init(3) has this scary text regarding the pshared
value:

The sem_init() function initializes the unnamed semaphore pointed to by
sem to have the value value.  A non-zero value for pshared specifies a
shared semaphore that can be used by multiple processes, which this
implementation is not capable of.

Is this text obsolete?  Or is my test just getting lucky?


I think you're getting lucky.


Yes, after playing with the code some, I now see that. :(


Is there a recommended way to do this?


I believe the only way to do this is with SYSV semaphores
(semop, semget, semctl).  Unfortunately, these are not as
easy to use, IMHO.


Yes, they are pretty ugly, and we were hoping to avoid them.
Are there any plans to support either PTHREAD_PROCESS_SHARED
mutexes, or pshared posix semaphores in FreeBSD?


It's planned, just not (yet) being actively worked on.
It's an API change mostly, and then adding in all the
compat hooks so we don't break ABI.


There is also an alternate set of patches on threads@ to allow just shared
semaphores, I think, w/o the changes to the pthread types.  I can't recall
exactly what they did, but I think rrs@ was playing with using umtx directly
to implement some sort of process-shared primitive.


That's really not the way to go.  The structs really need
to become public.



It would be great if they were, but that discussion was 6 months
ago, and nothing seems to have happened.  Plus we need to support
at least 7.X and probably 6, so any changes here might not even
help us.

What is wrong with just using umtx directly?  It seems to do
exactly what we need.

Thanks,
Drew


Re: semaphores between processes

2009-10-23 Thread Andrew Gallatin

Daniel Eischen wrote:

On Fri, 23 Oct 2009, Andrew Gallatin wrote:


Daniel Eischen wrote:

On Fri, 23 Oct 2009, John Baldwin wrote:


On Thursday 22 October 2009 5:17:07 pm Daniel Eischen wrote:

On Thu, 22 Oct 2009, Andrew Gallatin wrote:


Daniel Eischen wrote:

On Thu, 22 Oct 2009, Andrew Gallatin wrote:


Hi,

We're designing some software which has to lock access to
shared memory pages between several processes, and has to
run on Linux, Solaris, and FreeBSD.  We were planning to
have the lock be a pthread_mutex_t residing in the
shared memory page.  This works well on Linux and Solaris,
but FreeBSD (at least 7-stable) does not support
PTHREAD_PROCESS_SHARED mutexes.

We then moved on to posix semaphores.  Using sem_wait/sem_post
with the sem_t residing in a shared page seems to work on
all 3 platforms.  However, the FreeBSD (7-stable) man page
for sem_init(3) has this scary text regarding the pshared
value:

The sem_init() function initializes the unnamed semaphore pointed to by
sem to have the value value.  A non-zero value for pshared specifies a
shared semaphore that can be used by multiple processes, which this
implementation is not capable of.

Is this text obsolete?  Or is my test just getting lucky?


I think you're getting lucky.


Yes, after playing with the code some, I now see that. :(


Is there a recommended way to do this?


I believe the only way to do this is with SYSV semaphores
(semop, semget, semctl).  Unfortunately, these are not as
easy to use, IMHO.


Yes, they are pretty ugly, and we were hoping to avoid them.
Are there any plans to support either PTHREAD_PROCESS_SHARED
mutexes, or pshared posix semaphores in FreeBSD?


It's planned, just not (yet) being actively worked on.
It's an API change mostly, and then adding in all the
compat hooks so we don't break ABI.


There is also an alternate set of patches on threads@ to allow just shared
semaphores, I think, w/o the changes to the pthread types.  I can't recall
exactly what they did, but I think rrs@ was playing with using umtx directly
to implement some sort of process-shared primitive.


That's really not the way to go.  The structs really need
to become public.



It would be great if they were, but that discussion was 6 months
ago, and nothing seems to have happened.  Plus we need to support
at least 7.X and probably 6, so any changes here might not even
help us.

What is wrong with just using umtx directly?  It seems to do
exactly what we need.


Because you can't do anything more than use umtx directly,
like check for mutex types and return appropriate error
codes.  Just look at other implementations - Solaris,
Linux, all have their pthread_*_t as public structs.


I'm not saying that having pthread_*_t public, and getting all
the features of real PTHREAD_PROCESS_SHARED, would not be far
better in general.  But in this case all we need is a lock around
a shared resource, i.e., nothing fancy.  So our choices seem to be
either:

1) use sysv semaphores (ick)
2) use a hand rolled spinlock (ick)
3) use some sort of hack built into our driver (ick, ick)
4) use umtx

Is there some bug or limitation in umtx that makes it inappropriate?
(beyond the obvious, like the potential to leave a resource locked
forever if the lock holder exits).

Thanks,

Drew


Re: semaphores between processes

2009-10-23 Thread Andrew Gallatin

Daniel Eischen wrote:



We already use umtx.  This really is a hack and I wouldn't
advocate it.  I'm not sure how you could make it work and
not break existing ability to return appropriate error
codes without slowing down the path in the non-shared
case.  You'd have to check to see if the address space
was shared or not, which would require a system call.


I'm probably missing something.  What does it matter if the
address space is shared, as long as the umtx struct is
in shared memory?

From my quick read, the umtx operations use a lock word
in userspace. For uncontested locks, they use atomic
ops to flip an id into the lock word.  The kernel takes
over for contested locks, and does sleeping, wakeup, etc.
Is this correct?  Is there something here that matters
if the address space (and not just the lock word) is
shared?
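
To make the question concrete, here is a rough sketch of the fast path
as I understand it, written without umtx at all.  It assumes C11
<stdatomic.h> and that a 32-bit atomic is lock-free on the target, so
the word can sit in a MAP_SHARED mapping and be used from several
processes.  The word_lock_* names are made up for illustration, and the
contested path just calls sched_yield() where umtx would sleep in the
kernel, so this is really closer to a hand-rolled spinlock than to the
real thing.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_FREE_ID 0u		/* lock word value meaning "unowned" */

/* The lock word; intended to live in memory every participant maps shared. */
struct word_lock {
	_Atomic uint32_t owner;
};

static void
word_lock_init(struct word_lock *l)
{
	atomic_init(&l->owner, LOCK_FREE_ID);
}

/* Uncontested path: flip our id into the word with one compare-and-swap. */
static int
word_lock_try(struct word_lock *l, uint32_t id)
{
	uint32_t expected = LOCK_FREE_ID;

	assert(id != LOCK_FREE_ID);
	return (atomic_compare_exchange_strong_explicit(&l->owner, &expected,
	    id, memory_order_acquire, memory_order_relaxed));
}

/*
 * Contested path.  Assumption: a real umtx implementation would hand off
 * to the kernel and sleep here; this stand-in only yields and retries.
 */
static void
word_lock_acquire(struct word_lock *l, uint32_t id)
{
	while (!word_lock_try(l, id))
		sched_yield();
}

/* Release only if we are the owner.  Returns 0 on success, -1 otherwise. */
static int
word_lock_release(struct word_lock *l, uint32_t id)
{
	uint32_t expected = id;

	return (atomic_compare_exchange_strong_explicit(&l->owner, &expected,
	    LOCK_FREE_ID, memory_order_release, memory_order_relaxed) ? 0 : -1);
}
```

Nothing in this sketch depends on whether the whole address space is
shared; it only needs every participant to map the page holding the
lock word.  It does nothing about an owner that exits while holding
the lock.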


All our public pthread_foo() symbols are weak.  You
can easily override them in your application code in
the #ifdef freebsd case.  What is wrong with providing
your own library that overrides them to do what you
require - this shouldn't change your application code?



For our code, I was thinking of something like:

#ifdef __FreeBSD__
#define lock(x) umtx_lock(x, getpid())
#define unlock(x) umtx_unlock(x, getpid())
#else
#define lock(x) pthread_mutex_lock(x)
#define unlock(x) pthread_mutex_unlock(x)
#endif


I should probably just shut up and try it..

Drew



semaphores between processes

2009-10-22 Thread Andrew Gallatin

Hi,

We're designing some software which has to lock access to
shared memory pages between several processes, and has to
run on Linux, Solaris, and FreeBSD.  We were planning to
have the lock be a pthread_mutex_t residing in the
shared memory page.  This works well on Linux and Solaris,
but FreeBSD (at least 7-stable) does not support
PTHREAD_PROCESS_SHARED mutexes.

We then moved on to posix semaphores.  Using sem_wait/sem_post
with the sem_t residing in a shared page seems to work on
all 3 platforms.  However, the FreeBSD (7-stable) man page
for sem_init(3) has this scary text regarding the pshared
value:

     The sem_init() function initializes the unnamed semaphore pointed to by
     sem to have the value value.  A non-zero value for pshared specifies a
     shared semaphore that can be used by multiple processes, which this
     implementation is not capable of.

Is this text obsolete?  Or is my test just getting lucky?

Is there a recommended way to do this?

Thanks,

Drew


Re: semaphores between processes

2009-10-22 Thread Andrew Gallatin

Daniel Eischen wrote:

On Thu, 22 Oct 2009, Andrew Gallatin wrote:


Hi,

We're designing some software which has to lock access to
shared memory pages between several processes, and has to
run on Linux, Solaris, and FreeBSD.  We were planning to
have the lock be a pthread_mutex_t residing in the
shared memory page.  This works well on Linux and Solaris,
but FreeBSD (at least 7-stable) does not support
PTHREAD_PROCESS_SHARED mutexes.

We then moved on to posix semaphores.  Using sem_wait/sem_post
with the sem_t residing in a shared page seems to work on
all 3 platforms.  However, the FreeBSD (7-stable) man page
for sem_init(3) has this scary text regarding the pshared
value:

The sem_init() function initializes the unnamed semaphore pointed to by
sem to have the value value.  A non-zero value for pshared specifies a
shared semaphore that can be used by multiple processes, which this
implementation is not capable of.

Is this text obsolete?  Or is my test just getting lucky?


I think you're getting lucky.


Yes, after playing with the code some, I now see that. :(


Is there a recommended way to do this?


I believe the only way to do this is with SYSV semaphores
(semop, semget, semctl).  Unfortunately, these are not as
easy to use, IMHO.


Yes, they are pretty ugly, and we were hoping to avoid them.
Are there any plans to support either PTHREAD_PROCESS_SHARED
mutexes, or pshared posix semaphores in FreeBSD?
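
For reference, a minimal sketch of what the SysV route might look like
when all that is needed is a single lock.  The shlock_* helpers are
hypothetical names, not an existing API; the sketch assumes the
semaphore id is handed to the other processes somehow (inherited across
fork(), or stored in the shared page).  SEM_UNDO asks the kernel to
back out a holder's operation if it exits, which is one thing the
uglier interface does buy you.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#if defined(__linux__) || defined(__sun)
/* These platforms leave union semun for the caller to define. */
union semun {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};
#endif

/* Create a one-semaphore set initialised to 1 (unlocked); id or -1. */
static int
shlock_create(void)
{
	union semun arg;
	int id;

	id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	if (id == -1)
		return (-1);
	arg.val = 1;
	if (semctl(id, 0, SETVAL, arg) == -1) {
		(void)semctl(id, 0, IPC_RMID);
		return (-1);
	}
	return (id);
}

/* Apply +1 or -1 to the semaphore, retrying if a signal interrupts us. */
static int
shlock_op(int id, short delta)
{
	struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
	int rc;

	assert(delta == 1 || delta == -1);
	do {
		rc = semop(id, &op, 1);
	} while (rc == -1 && errno == EINTR);
	return (rc);
}

static int
shlock_lock(int id)
{
	return (shlock_op(id, -1));	/* blocks while the value is 0 */
}

static int
shlock_unlock(int id)
{
	return (shlock_op(id, 1));
}

static int
shlock_destroy(int id)
{
	return (semctl(id, 0, IPC_RMID));
}
```

A caller would wrap each access to the shared pages in
shlock_lock(id) / shlock_unlock(id), and one process would call
shlock_destroy(id) at teardown, since SysV semaphore sets outlive the
processes that created them.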


Thanks,

Drew



Re: namei (via firmware_get(9)) from taskq in 7.x

2009-10-19 Thread Andrew Gallatin

Kostik Belousov wrote:


It seems that you want a merge of r178042,183614,184842,188057 (one of


Yes,  I finally figured this out on Fri.  I probably should
have posted a response to this thread to avoid others
wasting time on this.

Drew


namei (via firmware_get(9)) from taskq in 7.x

2009-10-15 Thread Andrew Gallatin

Hi,

I'm trying to re-initialize a NIC which uses firmware(9)
after a hardware fault.  As part of the process, I need
to re-load the firmware using firmware_get().  If the
firmware kld is not resident, then the machine will panic
like this:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0x805b05d4
stack pointer           = 0x10:0xff880460
frame pointer           = 0x10:0xff880510
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 21 (swi5: +)
[thread pid 21 tid 100021 ]
Stopped at      namei+0x174:    movq    0x20(%rbx),%rax
db> bt
Tracing pid 21 tid 100021 td 0xff00013c3ae0
namei() at namei+0x174
vn_open_cred() at vn_open_cred+0x3a4
linker_load_module() at linker_load_module+0x1f2
linker_reference_module() at linker_reference_module+0xae
firmware_get() at firmware_get+0x136
mxge_load_firmware() at mxge_load_firmware+0x2d
mxge_watchdog_task() at mxge_watchdog_task+0x2f6
taskqueue_run() at taskqueue_run+0x9d
ithread_loop() at ithread_loop+0x17d
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe

Looking at it in gdb, it seems like the problem is that namei
is trying to use ndp->ni_cnd.cn_thread->td_proc->p_fd->fd_cdir
which is null in this context.

Can somebody tell me what kernel context it is safe to
call firmware_get() (and hence namei) from?  Is there
a safe way to do it from a taskq?

FWIW, this seems to work fine (even from a callout context)
in 8 and higher.  It is only 7 and earlier where I'm having
this problem.

Thanks,
Drew


Re: Progress for 7.0 - the what's cooking page

2007-09-05 Thread Andrew Gallatin


The TSO/LRO section needs a little updating.

According to find sys/dev | xargs grep -l IFCAP_TSO, TSO is present in
at least:   bce, cxgb, em, ixgbe, msk, mxge, nfe, nxge, re

Based on grepping for IFCAP_LRO, LRO is currently available only in mxge.

Note that the LRO in mxge is currently a driver specific hack (I wrote
it, so I can say it :), intended to tide us over until Andre finishes
his more extensive LRO infrastructure.  Further, LRO is currently done
in software.  Jack Vogel was looking at porting the mxge LRO into
something that could be used by several 10GbE drivers; I'm not sure
what happened to that.

Drew


Re: IP over FireWire and Mac OSX

2005-03-25 Thread Andrew Gallatin

P.ArulChandran writes:
   By analyzing the packets from FreeBSD in firebug log, I could see that
  unfragmented packets are sent as fragmented packets, with inappropriate
  values in the packet header. Even if the packets are fragmented, the
  'lf' field is not set correctly. To comply with Section 4.2 of RFC
  2734, FreeBSD should set 'lf' to correct values to indicate, whether
  the packet is fragmented or unfragmented.

I just read the RFC and it looks like we're both at fault.  According
to the RFC:

   A RESERVED object has no defined meaning and SHALL be zeroed by its
   originator or, upon development of a future standard, set to a
   value specified by such a standard. The recipient of a RESERVED object
   SHALL NOT check its value.

Empirically it would seem that FreeBSD is not zeroing the reserved
fields like it should.  Further, since zeroing the reserved fields
fixes interoperability, it would seem that MacOSX is not ignoring them
like it should.  It is fun when different implementations collide in
the field ;)

In any case, Mac OS X should add more safety checks to prevent panics
  from corrupted packets.

Yes, and we should zero the reserved fields.  Doing so seems to fix
interoperability with unpatched versions of MacOSX.  See the attached
patch. 
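
To illustrate what "zeroing the reserved fields" means for the simplest
case, here is a small sketch that builds the header for an
unfragmented datagram.  It is not the attached patch.  The layout (lf
in the top two bits, 14 reserved bits, then a 16-bit ether_type) and
the lf codes are my reading of RFC 2734 section 4.2; the function name
is made up, and the value is in host byte order, so a sender would
still convert it with htonl() before putting it on the wire.

```c
#include <assert.h>
#include <stdint.h>

/* Link-fragment (lf) codes, as read from RFC 2734 section 4.2. */
enum {
	LF_UNFRAGMENTED = 0,
	LF_FIRST = 1,
	LF_LAST = 2,
	LF_INTERIOR = 3
};

/*
 * Build the 32-bit encapsulation header for an unfragmented datagram,
 * in host byte order.  Starting from zero guarantees the reserved bits
 * are clear rather than holding whatever was left in the buffer.
 */
static uint32_t
encap_unfragmented(uint16_t ether_type)
{
	uint32_t hdr = 0;

	hdr |= (uint32_t)LF_UNFRAGMENTED << 30;
	hdr |= ether_type;
	assert((hdr & 0x3fff0000u) == 0);	/* reserved bits stay clear */
	return (hdr);
}
```

The point is simply that the header is built up from zero instead of
having individual fields patched into an uninitialised word.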

Thanks for letting me know what was going on and making this so easy
to fix..

Drew



fwip.diff
Description: full rfc2734 compliance


mapping small parts of a pci card to conserve KVA

2005-02-15 Thread Andrew Gallatin

I maintain drivers for a PCI card which presents itself as having
16MB of address space.  Eg:

mx0: Myrinet PCIXE mem 0xf900-0xf9ff irq 20 at device 3.0 on pci1

However, most of that address space does not need to be mapped into
the host.  Really, only a little over 2MB needs to be mapped (3 regions
with length 1024 bytes, 256 bytes, and 2MB).

I've tried to re-write things so that I make multiple calls
to bus_alloc_resource() with the (hopefully) appropriate offset and
lengths.  Eg:

  rid = PCIR_MAPS;
  *res = bus_alloc_resource(is->arch.dev, SYS_RES_MEMORY, &rid,
                            (u_long)offset,
                            (u_long)(offset + len - 1), len,
                            RF_ACTIVE|PCI_RF_DENSE);

At least on 5.3R, I seem to get back the same struct resource * from
each call.  rman_get_virtual() returns a different kva for each
mapping, yet they all seem to map to the same physical address. 
Eg, I call vtophys() on the results of rman_get_virtual(),
for each segment, and they all map to 0xf900.

Is there a way to just map what I need?

Thanks,

Drew





Re: mapping small parts of a pci card to conserve KVA

2005-02-15 Thread Andrew Gallatin

Scott Long writes:
  
  You can use pmap_mapdev() to create a KVA mapping of an arbitrary
  physaddr+len.  In fact, this is exactly what newbus uses to create the
  PCI MEMIO resources when bus_alloc_resource() is called.  I'm not sure
  if the range is mapped and activated before the driver makes that call,
  Warner or John might know for sure.

Thanks..  But since this is an out of tree driver,  I want to stick
as much as I can to the normal driver APIs.   If the KVA wastage
becomes a huge problem,  I'll explore pmap_mapdev(), but for now
it's not a big deal.

Thanks again,

Drew



Re: obtaining a kernel crash dump

2004-05-20 Thread Andrew Gallatin

Nick Strebkov writes:
  May 19 16:17:00 devel /kernel: 
  May 19 16:17:00 devel /kernel: syncing disks... 60 3 2 
  
  [dd boot kernel messages]

Try disabling sync-on-panic.  It almost always causes problems for me
when trying to get dumps.  

% cat /etc/sysctl.conf 
kern.sync_on_panic=0

If you are running a newer version of FreeBSD with the DDB_TRACE
options, you want to enable DDB and DDB_TRACE.  This will get you a
stack trace on console, which is a heck of a lot better than nothing
if your crashdumps don't work.

options DDB #Enable the kernel debugger
options DDB_TRACE


Sometimes I have problems getting a dump on 5.x if I've dropped into
ddb, so I use the following to prevent the system from dropping to a
DDB prompt at panic:

options DDB_UNATTENDED



Drew


RE: em0, polling performance, P4 2.8ghz FSB 800mhz

2004-03-03 Thread Andrew Gallatin

Don Bowman writes:

  I'm not sure what effect on fxp. fxp is inherently limited
  by something internal to it, which prevents achieving 
  high packet rates. bge is the best chip, but doesn't
  have the best bsd support.
  

Just curious - why is bge the best chip?  Is it because
it exports a really nice API (separate recv ring for small messages),
or is the chip inherently faster, regardless of its API?

I'm trying to design a new ethernet API for a firmware-based nic,
and I'm trying to convince a colleague that having separate
receive rings for small and large frames is a really good thing.

Thanks,

Drew


Re: em0, polling performance, P4 2.8ghz FSB 800mhz

2004-03-03 Thread Andrew Gallatin

Luigi Rizzo writes:
  On Wed, Mar 03, 2004 at 10:03:11AM -0500, Andrew Gallatin wrote:
...
   I'm trying to design a new ethernet API for a firmware-based nic,
   and I'm trying to convince a colleague that having separate
   receive rings for small and large frames is a really good thing.
  
  i am actually not very convinced either, unless you are telling me
  that there is a way to preserve ordering. Or you'd be in trouble
  when, on your busy link, there is a mismatch between user-level and
  link-level block sizes.
  
  So, what is your design like, you want to pass the NIC buffers of
  2-3 different sizes and let the NIC choose from the most appropriate
  pool depending on the incoming frame size, but still return
  received frames in a single ring in arrival order ?

Yes, exactly.  This way you get to pass the stack small (MHLEN)
frames in mbufs, rather than clusters without doing something like
copying them in the driver's rx interrupt handler.  You can allocate
tons of mbufs so that you can absorb the occasional burst (or spike in
host latency) without being as bad of a pig as you'd be if you allocated
a huge number of clusters ;)
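
Roughly, the receive side of that design could be modelled like this.
It is a toy userland sketch, not firmware or driver code: the names,
the 128-byte small-buffer size (a stand-in for MHLEN) and the 8-slot
ring are all made up for illustration.  The NIC picks a buffer pool
from the frame length but always posts the completion to one ring, so
the host sees frames in arrival order no matter which pool they came
from.

```c
#include <assert.h>
#include <stddef.h>

enum rx_pool { RX_POOL_SMALL, RX_POOL_BIG };

#define SMALL_BUF_LEN	128	/* hypothetical stand-in for MHLEN */
#define RING_SLOTS	8

struct rx_completion {
	enum rx_pool pool;	/* which buffer pool the frame landed in */
	size_t len;		/* frame length in bytes */
};

struct rx_ring {
	struct rx_completion slot[RING_SLOTS];
	unsigned int prod;	/* total completions posted so far */
};

/* Pick the smallest buffer class that can hold the frame. */
static enum rx_pool
rx_pick_pool(size_t frame_len)
{
	return (frame_len <= SMALL_BUF_LEN ? RX_POOL_SMALL : RX_POOL_BIG);
}

/*
 * Per-frame work on the NIC side: choose a pool, then post exactly one
 * completion to the single shared ring.  Overrun handling is left out.
 */
static void
rx_receive(struct rx_ring *r, size_t frame_len)
{
	struct rx_completion *c = &r->slot[r->prod % RING_SLOTS];

	assert(frame_len > 0);
	c->pool = rx_pick_pool(frame_len);
	c->len = frame_len;
	r->prod++;
}
```

The host walks slot[] in order and learns from the pool field whether
it is holding a plain mbuf or a cluster, so no copy is needed to get
small frames into small buffers.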

You also get to set yourself up for zero-copy receive by splitting
the headers into mbufs, and the payloads into jumbo clusters
that can get page-flipped.  But that's a lot trickier and not
really in the scope of the initial implementation.

Drew


Re: Zero copy sockets question

2004-02-12 Thread Andrew Gallatin

Dung Patrick writes:
  Hi
  
  I have read http://people.freebsd.org/~ken/zero_copy/
  
  To correctly use zero copy receive, it seems it need to set the MTU to:
  have to be at least page sized, and be aligned on page boundaries.

Yes.

  So does the default Ethernet MTU of 1500 work?

No, you need to have an MTU of at least PAGE_SIZE + headers.
And a NIC which is smart enough to do the header splitting.
Currently, the Alteon Tigon2 is the only nic which fits the bill.
I keep meaning to implement header splitting in the Myricom Myrinet
firmware, and I keep not getting time for it..

Note that send-side zero-copy works on any NIC, and with a standard
MTU.


Drew


Re: Re: Zero copy sockets question

2004-02-12 Thread Andrew Gallatin

Dung Patrick writes:
  Correct me if I am wrong:
  
  To use the zero copy 'receive' on i386, you need to set the MTU to 4096 bytes(page 
  size) or 4096 multiples.

No, just larger than a page-size plus headers.  FreeBSD's tcp
automagically sets the mss to a page-sized multiple for large MTUs.

And you need a nic which can do header splitting (ie, DMA the headers
and the payload to different places in the host).
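
As a rough illustration of the arithmetic (this is not the FreeBSD TCP code),
the payload per segment has to hold at least one whole page once the headers
are taken out, and is then rounded down to a page multiple.  The 40-byte
header size (IPv4 plus TCP, no options) and the function name are assumptions.

```c
#include <assert.h>

#define PAGE_BYTES 4096   /* i386 page size, as discussed in the thread */
#define HDR_BYTES  40     /* assumed: IPv4 + TCP headers, no options */

/*
 * Return the per-segment payload rounded down to whole pages, or 0
 * when the MTU cannot carry even one page plus headers.
 */
static int
zero_copy_payload(int mtu)
{
    int payload;

    assert(mtu >= 0);
    payload = mtu - HDR_BYTES;
    if (payload < PAGE_BYTES)
        return 0;
    return payload - (payload % PAGE_BYTES);
}
```

With these assumptions a standard 1500-byte MTU yields 0 (no zero-copy
receive), while a 9000-byte jumbo MTU yields two pages per segment.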

  If it is true, until zero copy receive can do auto fitting, I think zero copy 
  receive is more useful in gigabit ethernet than in fast ethernet (I assume MTU 
  1500(or smaller) is suitable for fast ethernet/Internet.)

Fast ethernet is slow enough, it doesn't really make sense there.
These days, one could argue that it really only makes sense for 10GbE.

Drew


Re: Determining CPU features / cache organization from userland

2003-10-10 Thread Andrew Gallatin

Bruce M Simpson writes:
  I've been thinking we should definitely make the cache organization
  info available via sysctl. I am thinking we should do this to make
  the UMA_ALIGN_CACHE definition mean something...

If you do this,  it may make sense to use the same names as MacOSX.

Eg: 

g51% sysctl hw | grep cache
hw.cachelinesize: 128
hw.l1icachesize: 65536
hw.l1dcachesize: 32768
hw.l2cachesize: 524288


Drew


Re: VIA EPIA-M10000 board just works with FreeBSD 4.8

2003-09-25 Thread Andrew Gallatin

Clifton Royston writes:
For anyone who's interested, I've been running FreeBSD 4.8 on the
  EPIA-1M mini-ITX for at least a couple months now; it's available

Cool!  Have you measured the power consumption? 

I'm looking for a low power consumption, 'always on' box for my home
office, and have had bad luck with packaged appliances for things like
ipsec.  It would be great to have a real computer for not much more
power consumption than one of these appliances..

Drew






Re: PCI interrupts passing DMA

2003-09-18 Thread Andrew Gallatin

Aaro Koskinen writes:

   My question is: What the heck could the SMP kernel be doing which
   causes the DMA to complete faster?
  
  The chipset probably uses PCI bus (MSI-like mechanism) to deliver the
  interrupt from the IO APIC to the local APIC, which means that the PCI
  bridge(s) must complete the DMA transfer before the interrupt is
  delivered to preserve the write order.

AHA!  I think you hit it on the nose.  It turns out that the FreeBSD
SMP kernel sets up all IOAPIC interrupts as IOART_DELLOPRI.   But
linux doesn't set the IOART_DELLOPRI bit.  This seems to account for the
difference in behaviour between FreeBSD and linux.

The following diff seems to make SMP FreeBSD behave the same as linux,
and the same as UP FreeBSD:

Index: i386/i386/mpapic.c
===
RCS file: /home/ncvs/src/sys/i386/i386/mpapic.c,v
retrieving revision 1.63
diff -u -r1.63 mpapic.c
--- i386/i386/mpapic.c  23 Jul 2003 18:59:38 -  1.63
+++ i386/i386/mpapic.c  18 Sep 2003 14:07:38 -
@@ -134,7 +134,7 @@
((u_int32_t)\
 (IOART_INTMSET |   \
  IOART_DESTPHY |   \
- IOART_DELLOPRI))
+ IOART_DELFIXED))

 #define DEFAULT_ISA_FLAGS  \
((u_int32_t)\



  In PIC mode, the interrupt is delivered by the wire and it has no
  effect on pending writes. A common solution is that the interrupt
  handler must perform a read from the device to force the flushing of
  buffers.

Yep.  I was trying to avoid that because PIO reads are so horribly
expensive..   I guess I'll have to do it after all.  I wish MSIs had
been around from the beginning and were more widely used.
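
A toy model of why the read helps, under the assumption that the bridge
holds the DMA write as a posted write: a PIO read may not pass earlier posted
writes, so the bridge has to drain them before the read completes.  Nothing
here is real driver code; toy_dev and its fields are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* A device behind a bridge, with one DMA write possibly still in flight. */
struct toy_dev {
    uint32_t  posted_value;  /* data still buffered in the bridge */
    int       posted;        /* nonzero while the write is in flight */
    uint32_t *host_word;     /* host memory the DMA targets */
    uint32_t  status_reg;    /* device register readable by PIO */
};

/*
 * Model of a PIO read: earlier posted writes are pushed to host
 * memory before the read result comes back.
 */
static uint32_t
toy_pio_read_status(struct toy_dev *d)
{
    assert(d != 0);
    if (d->posted) {
        *d->host_word = d->posted_value;
        d->posted = 0;
    }
    return d->status_reg;
}

/*
 * Interrupt handler: read a device register first (expensive, but it
 * forces the flush), then check the marker at the end of the block.
 * Returns 1 when the expected marker is visible in host memory.
 */
static int
toy_intr(struct toy_dev *d, uint32_t expected)
{
    (void)toy_pio_read_status(d);
    return *d->host_word == expected;
}
```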

Thanks for your help,

Drew


PCI interrupts passing DMA

2003-09-17 Thread Andrew Gallatin

I was toying with a programmable PCI card and wrote some code 
which DMAs a small block of data to the host, and then interrupts the
host.   The host checks the end of the block, and sees if it gets the
value it expects.  

On an SMP P4 (hyperthreaded, with ServerWorks chipset) running FreeBSD 4.8 UP,
and on Linux 2.4.18, there is a huge delay between the interrupt being
handled, and the DMA finally completing (from the host's perspective).
Time enough for the interrupt handler to be triggered 3 or 4 times,
and to print "foo" to a serial console line each time it notices
that the DMA has not completed.

The interesting thing is that on FreeBSD 4.8SMP, and FreeBSD
5.1-current (SMP), the data has arrived by the time the interrupt
handler is called.

This would be easy to explain if the interrupt latency were vastly
different between the FreeBSD SMP kernel and the other kernels, but it
does not seem to be.  It actually seems to be about 5us faster
(interrupt to wakeup of user-level process, so some fat is in there)
than the FreeBSD UP kernel, possibly due to APIC io.  *measurement done
without console printf*

My question is: What the heck could the SMP kernel be doing which
causes the DMA to complete faster?   

Thanks,

Drew


Re: BSD make question

2003-08-09 Thread Andrew Gallatin

Ruslan Ermilov writes:
   
  Ah, didn't notice it.  Try this:
  
  .for f in $(LIB)
  $(f:.c=.o): $(f)
   gcc -DLIB -c $< -o $@
  .endfor

Thanks!  That works.

Drew


Re: BSD make question

2003-08-08 Thread Andrew Gallatin

Ruslan Ermilov writes:
  On Thu, Aug 07, 2003 at 02:42:30PM -0400, Andrew Gallatin wrote:
   
   Using BSD make, how can I apply different rules based on different
   directories while using only a single makefile?
   
  There's a .CURDIR variable that can be used to conditionalize
  parts of a makefile.
  
   Ie, the appended Makefile results in the following compilations:
   
   gcc -DLIB -c lib/foo.c -o lib/foo.o
   gcc -DLIB -c lib/bar.c -o lib/bar.o
   gcc -DMCP -c mcp/baz.c -o mcp/baz.o
   
   Is it possible to do something similar with BSD make?
   
  It just works as is with bmake.  What's your problem, Drew?  ;-)
  
  $ make -n
  cc -O -pipe -march=pentiumpro -c lib/foo.c

;)  But it's missing the -DLIB or -DMCP.

Thanks for the .CURDIR hint.

Drew


BSD make question

2003-08-07 Thread Andrew Gallatin

Using BSD make, how can I apply different rules based on different
directories while using only a single makefile?


Ie, the appended Makefile results in the following compilations:

gcc -DLIB -c lib/foo.c -o lib/foo.o
gcc -DLIB -c lib/bar.c -o lib/bar.o
gcc -DMCP -c mcp/baz.c -o mcp/baz.o

Is it possible to do something similar with BSD make?

Drew


###
.SUFFIXES:
.SUFFIXES: .o .c

LIB=\
lib/foo.c \
lib/bar.c

MCP=\
mcp/baz.c

all: $(LIB:.c=.o) $(MCP:.c=.o)

lib/%.o: lib/%.c
	gcc -DLIB -c $< -o $@

mcp/%.o: mcp/%.c
	gcc -DMCP -c $< -o $@

.PHONY: clean
clean:
	rm -f $(LIB:.c=.o) $(MCP:.c=.o)
###


Re: per-open device private data, mmap

2003-03-14 Thread Andrew Gallatin

Eric Anholt writes:
  shouldn't be too big of an issue.  The unique identifier is the big
  problem and the fileops trick should work for that.
  
  However, is this going to get easier some day?  Are there any plans to
  pass the struct file down to the drivers and have a void * in there for
  private data?
  

I think that phk is working on this for 6.x

In the meantime, I have a new driver I'm developing which uses the
fileops trick you describe, but takes it a step further and conjures up
a new vnode.  That makes it work with mmap.  I've not run into
any problems yet, but it is lightly tested.

Cheers,

Drew


/*
 * Conjure up our own vnode out of thin air.  We need the
 * vnode so that we can stash a pointer to the per-connection
 * priv struct for use in open/close/ioctl and mmap.  This is
 * tricky, because we need make it look enough like the device
 * vnode so that VOP_GETATTR() works on the slave vnode in mmap()
 */

static int
xxx_conjur_vnode(dev_t dev, struct thread *td)
{
  int error, fd;
  struct filedesc *fdp;
  struct file *fp;
  struct vnode *vn = NULL, *vd = NULL;
  struct cdev *rdev;

  fdp = td->td_proc->p_fd;
  if (fdp == NULL)
    return (0);

  if (td->td_dupfd >= 0)
    return ENODEV;

  rdev = xxx_malloc(sizeof(*rdev), M_WAITOK);

  if ((error = falloc(td, &fp, &fd)) != 0)
    goto abort_with_rdev;

  vd = SLIST_FIRST(&dev->si_hlist);

  if ((error = getnewvnode("none", vd->v_mount, vd->v_op, &vn)))
    goto abort_with_falloc;

  vn->v_type = VCHR;

  /*  really should clone v_vdata & not copy pointer */
  vn->v_data = vd->v_data;    /* for VTOI in devfs_getattr() */

  /* copy our cdev info */
  vn->v_rdev = rdev;
  bcopy(vd->v_rdev, vn->v_rdev, sizeof(*rdev));

  /* finally, save the data pointer (our softc) */
  vn->v_rdev->si_drv2 = 0;

  fp->f_data = (caddr_t)vn;
  fp->f_flag = FREAD|FWRITE;
  fp->f_ops = &xxx_fileops;
  fp->f_type = DTYPE_VNODE;   /* so that we can mmap */

  /*
   * Save the new fd as dupfd in the proc structure, then we have
   * open() return the special error code (ENXIO).  Returning with a
   * dupfd and ENXIO causes magic things to happen in kern_open().
   */
  td-td_dupfd = fd;
  return 0;

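 abort_with_falloc:
  /* undo falloc(): clear the descriptor slot and drop the file */
  FILEDESC_LOCK(fdp);
  fdp->fd_ofiles[fd] = NULL;
  FILEDESC_UNLOCK(fdp);
  fdrop(fp, td);

  /* fall through: rdev was allocated before falloc() was called */
 abort_with_rdev:
  xxx_free(rdev);

  return (error);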

}

static int
xxx_fileclose(struct file *fp, struct thread *td)
{
  int ready_to_close;
  struct vnode *vn;
  struct cdev *rdev;
  xxx_port_state_t *ps;

  vn = (struct vnode *)fp->f_data;
  rdev = vn->v_rdev;
  ps = rdev->si_drv2;
  rdev->si_drv2 = NULL;

  /* replace the vnode ops so that devfs doesn't try to reclaim
     anything */
  vn->v_op = spec_vnodeop_p;
  vn->v_type = VNON; /* don't want to freedev() in vgonel()*/
  vgone(vn);

  /* free our private rdev */
  xxx_free(rdev);

  if (ps) {
    xxx_mutex_enter(ps->sync);

    /* Close the port if there are no more mappings */
    ready_to_close = ps->ref_count == 0;
    XXX_DEBUG_PRINT (XXX_DEBUG_OPENCLOSE,
                     ("Board %d, port %d closed\n", ps->is->id, ps->port));

    xxx_mutex_exit(ps->sync);

    if (ready_to_close) {
      xxx_common_close (ps);
    } else {
      XXX_INFO (("Application closed file descriptor while "
                 "mappings still alive: port destruct delayed\n"));
    }
  }

  return (0);
}


static int
xxx_mmap(dev_t dev, vm_offset_t offset,
#if MMAP_RETURNS_PINDEX == 0
vm_offset_t *paddr,
#endif
int nprot)
{
  int status;
  xxx_port_state_t *ps;
  void *kva;
#if MMAP_RETURNS_PINDEX
  vm_offset_t phys;
  vm_offset_t *paddr = &phys;
#endif

  ps = (xxx_port_state_t *)dev->si_drv2;
...


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message


Re: Smarter kernel modules?

2003-03-06 Thread Andrew Gallatin

M. Warner Losh writes:
  In message: [EMAIL PROTECTED]
  Sean Kelly [EMAIL PROTECTED] writes:
  : Has anyone ever considered embedding some sort of identifier in kernel
  : modules to keep them from being loaded with the wrong kernel?
  
  Actually, I was talking about this with Matt Dodd this morning...

Whatever we do, let's NOT be anywhere near as fascist as linux.  If we
implement any kind of versioning, it's got to be fine-grained enough
that 3rd party binary modules will not get broken by an ABI change in
an area of the kernel which they do not care about, or there needs to
be a way for a module to opt-out.

My company ships a binary driver (ethernet network, and character
device) built on 4.1.1-R, and it has continued to work at least until
4.7-R.  I'd like to see that same level of ABI stability throughout
the 5-STABLE branch.
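
To make "fine-grained" concrete, here is a sketch under invented names
(iface_version, iface_dep, module_compatible): the kernel exports a version
per interface, a module lists only the interfaces it uses with an acceptable
range, and everything else is ignored.  This is a hypothetical design sketch,
not an existing kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Version the running kernel exports for one interface. */
struct iface_version {
    const char *name;
    int         version;
};

/* One interface a module depends on, with an acceptable range. */
struct iface_dep {
    const char *name;
    int         min_ver;
    int         max_ver;
};

/*
 * A module loads only if every interface it names is present at a
 * version inside its range.  Interfaces it does not name are never
 * checked, and a module with no deps at all has opted out.
 */
static int
module_compatible(const struct iface_version *kern, size_t nkern,
                  const struct iface_dep *deps, size_t ndeps)
{
    size_t i, j;

    assert(kern != NULL || nkern == 0);
    for (i = 0; i < ndeps; i++) {
        int ok = 0;

        for (j = 0; j < nkern; j++) {
            if (strcmp(deps[i].name, kern[j].name) == 0 &&
                kern[j].version >= deps[i].min_ver &&
                kern[j].version <= deps[i].max_ver) {
                ok = 1;
                break;
            }
        }
        if (!ok)
            return 0;
    }
    return 1;
}
```

Under this scheme a network driver that names only the network and character
device interfaces keeps loading after an unrelated filesystem ABI bump.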

Drew



Re: Mac iBook OS10 + BSD

2003-01-15 Thread Andrew Gallatin

void writes:
  
   Also, X11 feels quite slow if you're
   used to X11.  (I'm writing this from KDE running under XDarwin on a ti
   powerbook, 867MHz).
  
  Apple's new X11-for-Mac-OS-X beta software is much faster than XDarwin.
  

Much buggier too.  And it lacks full screen mode.

I've dropped back to just using ctwm and XDarwin.  Aqua is all the
eye-candy one man can stand, kde on top is just overkill.

FWIW, the only config I've found which allows cut and paste between X
and Aqua is XDarwin+{C}TWM, or Apple's X11 which uses their hideous
Aqua-like WM...  Half the reason I use X rather than a bunch of
terminals is to *avoid* the clunky, non-customizable UI that the Aqua
interface gives you..

Drew





Re: Mac iBook OS10 + BSD

2002-12-26 Thread Andrew Gallatin

Julian Elischer writes:
  
  news to me.. I run multiple terminal windows, each running tcsh.
  That's with an unaltered macosX 10.1.5.
  from the user perspective it looks a lot like FreeBSD 3.{something}


I think he means text-only syscons like vtys.  MacOSX does not have
them.   Nobody has ever been able to tell me how to make a serial
console work on my OS-X crashbox either.

  The new one is basically like FreeBSD 4.4.

All versions of OSX feel more like Nextstep than any version of
FreeBSD.

How much can BSD share things like utilities and config files with
OS10? Is there any special compatability due to the OSs being similar
in some ways?

Depends what you mean by share.  OSX uses Nextstep's netinfo database
for managing things like hosts, passwd, groups.  The config files in
/etc are just decoys there to confuse you.

It uses series of startup scripts somewhat similar to RCng.

How should I plan my BSD intallation? Any special advantage of having
BSD on a Mac with OS10, as compared to Linux Slackware?
  
  Stick with MacOS-X it's going to run better onthis hardware than
  anything else.
  

It all depends what you mean by better.  If you're talking pure unix
performance, then I say you're full of crap.  OS-X is a dog.  Linux
runs circles around it.  If you like, I'll post some LMbench numbers
showing linux kicking sand in OS-X's face on my dual 800MHz crashbox
when I return from vacation.  I'm hoping our powerpc port comes close
to doing as well as linux.  Also, X11 feels quite slow if you're
used to X11.  (I'm writing this from KDE running under XDarwin on a ti
powerbook, 867MHz).

However, if you're talking about ease of operation, then I agree with
you 100%.  Suspend always works, and my ti powerbook is up and on the
network before I have the case open.  My wife bought me a 2 button+
scrollwheel mouse for Christmas.  The mouse worked (scrollwheel
included) with no configuration at all, just as soon as I plugged it
in.  It even worked in XDarwin.  I was amazed.  iPhoto rocks.  It's
nice being able to run M$ Office natively, etc.

Fink (based on debian's dselect/apt-get) is great.  As much as I hate
to say it, I think it's better than our ports/pkgs system.  I love how
it upgrades packages + dependencies seamlessly when you upgrade one
component.

Drew




Re: core dump from ffs_write - i think

2002-10-30 Thread Andrew Gallatin

Nate Lawson writes:
  Try to figure out where it was in frames 8 and 10 (probably a module).
  

Try the gdbmods port (/usr/ports/devel/gdbmods)

Drew




Re: Ati Rage 128: Dpms suspend failes

2002-10-23 Thread Andrew Gallatin

Eric Anholt writes:
  On Tue, 2002-10-22 at 07:37, Andrew Gallatin wrote:
..
   Do I need something special in my /etc/X11/XF86Config to make this
   work?  I never had problems on my old system (an alpha with a
   3dlabs Permedia-2 based AGP card).
  
  Could you send me a
  grep -i dpms /etc/X11/XF86Config /var/log/XFree86.0.log
  ?

OK, I'm an idiot.  I did not have Option "DPMS" in the Monitor section
of my XF86Config file.  Sorry for wasting your time.

But in my own defense... should xset even let me enable DPMS if
it's turned off at a lower level?  If xset had complained and not
allowed me to enable DPMS, I would have taken a harder look at
my XF86Config file..  Talk about a POLA.

Drew




Re: Ati Rage 128: Dpms suspend failes

2002-10-22 Thread Andrew Gallatin

Eric Anholt writes:
  On Mon, 2002-10-21 at 08:16, Hanspeter Roth wrote:
   Hello,
   
   I have two hosts connected to one monitor. My idea is attach the
   display to the other host by issuing `xset dpms force suspend'.
   This works on one host with a Matrox Millenium.
   On the host with an Ati Rage 128 Pro TF it works with Netbsd, but
   it doesn't work with FreeBSD 4.7-Release.
   The screen only turns blank but the LED remains green. This is the
   same when issuing `xset s activate'.
   
   What could be the reason on FreeBSD 4.7 that dpms force suspend
   doesn't work?
   
   Installed are XFree86-Server-4.2.1_3 and XFree86-libraries-4.2.1_1.)
  
  You need XFree86-Server-4.2.1_4 or later (it's at _5 now).

I've now upgraded to XFree86-Server-4.2.1_5.  dpms still does not
work for me:

% xset dpms force off ; xset q | tail -5
  Standby: 300Suspend: 600Off: 660
  DPMS is Enabled
  Monitor is Off
Font cache:
  hi-mark (KB): 1024  low-mark (KB): 768  balance (%): 70

(and I'm looking at the monitor and it is on)

My video card is an ATI Rage 128:

none1@pci1:0:0: class=0x03 card=0x7106174b chip=0x54461002
rev=0x00 hdr=0x00
vendor   = 'ATI Technologies'
device   = 'Rage 128 Pro AGP 4x'
class= display
subclass = VGA


Do I need something special in my /etc/X11/XF86Config to make this
work?  I never had problems on my old system (an alpha with a
3dlabs Permedia-2 based AGP card).



Drew




Re: Ati Rage 128: Dpms suspend failes

2002-10-22 Thread Andrew Gallatin

Hanspeter Roth writes:
On Oct 22 at 10:37, Andrew Gallatin spoke:
  
   I've now upgraded to XFree86-Server-4.2.1_5.  dpms still does not
   work for me:
   
   % xset dpms force off ; xset q | tail -5
  
  I didn't care about off. My monitor seems to behave the similar when
  set to `off' as when set to suspend or standby. The status LED turns
  yellow and the screen turns blank and recovery takes a few seconds.

As does mine (based on experience from when I had a video card that
worked in my old machine :-( )


  My application is to switch the display to the alternate host. This
  is working now.

Lucky you!  What does pciconf -lv say about your card?

Drew




Re: Ati Rage 128: Dpms suspend failes

2002-10-21 Thread Andrew Gallatin

Eric Anholt writes:
  On Mon, 2002-10-21 at 08:16, Hanspeter Roth wrote:
   Hello,
   
   I have two hosts connected to one monitor. My idea is attach the
   display to the other host by issuing `xset dpms force suspend'.
   This works on one host with a Matrox Millenium.
   On the host with an Ati Rage 128 Pro TF it works with Netbsd, but
   it doesn't work with FreeBSD 4.7-Release.
   The screen only turns blank but the LED remains green. This is the
   same when issuing `xset s activate'.
   
   What could be the reason on FreeBSD 4.7 that dpms force suspend
   doesn't work?
   
   Installed are XFree86-Server-4.2.1_3 and XFree86-libraries-4.2.1_1.)
  
  You need XFree86-Server-4.2.1_4 or later (it's at _5 now).
  

I'm running 4.2.1_4 and dpms does not work for me.

I just grabbed some diffs from the Xfree86 cvs to bring
drivers/ati/r128_driver.c up to 1.57.2.1 and drivers/ati/r128_reg.h up
to 1.14 and rebuilt my r128_drv.o module.  I'll see if it works
the next time X crashes..  (I'm running current, so X crashes once/day
or so..)



Drew




Re: gdb support for kernel modules

2002-10-08 Thread Andrew Gallatin


Giorgos Keramidas writes:
  On 2002-10-07 17:09, Ian Dowse [EMAIL PROTECTED] wrote:
   
   This is something I have been meaning to investigate for a while: [...]
   Anyway, below is a proof-of-concept patch that does the basics, but
   among other things, its logic for locating the kernel module files
   needs a lot of work - currently it just assumes /boot/kernel/module,
  
   diff -N solib-fbsd-kld.c
   --- /dev/null  1 Jan 1970 00:00:00 -
   +++ solib-fbsd-kld.c   7 Oct 2002 10:39:48 -
  
   +  snprintf (new->so_name, SO_NAME_MAX_PATH_SIZE, "/boot/kernel/%s",
   +  new->so_original_name);
  
  I'm not really sure this would work for remote gdb sessions, but locally
  it's probably more correct to use sysctl and grab the value of
  kern.module_path or kern.bootfile instead of hardwiring `/boot/kernel/%s'.

gdbmods does an ugly thing which is incredibly useful.  It assumes
that the modules you want to debug are sitting in your kernel build
pool.  So what it does is extract the build directory from the kernel
(using strings), and runs a find rooted there for the module in
question.  But it's a shell script, so it can get away with stuff like
that ;)

Perhaps we could embed the build directory somewhere in the elf headers
of each kernel module (including the kernel) so that kgdb could find
the corresponding build file with symbols.  Then your (very cool)
solib-fbsd-kld.c could easily find the kernel and modules which match
the kernel you're debugging..

Drew





how are sysctls in klds relocated?

2002-09-26 Thread Andrew Gallatin


Can somebody explain to me how sysctls from klds are relocated?

For background, after the binutils upgrade in -stable, I'm unable to
load linux.ko on my desktop.  The faulting address is always
0x9010102464c457f (oidp->oid_parent) and the pc is in
sysctl_find_oid_name().

The crash looks like this:

acd0: CDROM CD-ROM CDU4011 at ata1-slave PIO4
Mounting root from ufs:/dev/ad2a
linker_load_file: trying to load osf1 as elf64
linker_make_file: new file, filename=osf1.ko
linker_file_register_sysctls: registering SYSCTLs for osf1.ko
linker_file_register_sysctls: SYSCTLs 0
linker_file_sysinit: calling SYSINITs for osf1.ko
linker_file_sysinit: SYSINITs 0xfe00020799a0
linker_load_file: trying to load linux as elf64
linker_make_file: new file, filename=linux.ko
linker_file_register_sysctls: registering SYSCTLs for linux.ko
linker_file_register_sysctls: SYSCTLs 0xfe00020a6d08

fatal kernel trap:

trap entry = 0x2 (memory management fault)
a0 = 0x9010102464c457f
a1 = 0x1
a2 = 0x0
pc = 0xfc3f42dc
ra = 0xfc3f436c
curproc= 0xfe001557e980
pid = 15, comm = kldload


#0  0xfc3ed460 in dumpsys () at ../../kern/kern_shutdown.c:486
#1  0xfc3ecfa8 in boot (howto=256) at
../../kern/kern_shutdown.c:316
#2  0xfc3ed870 in panic (fmt=0xfc61da1c "trap")
at ../../kern/kern_shutdown.c:595
#3  0xfc5ad4c0 in trap (a0=0x9010102464c457f,
a1=0xfe0019c49e30, a2=0, entry=2, framep=0xfe0019c49c20)
at ../../alpha/alpha/trap.c:551
#4  0xfc59f31c in XentMM ()
#5  0xfc3f3f2c in sysctl_register_oid
(oidp=0xfe00020cc000)
at ../../kern/kern_sysctl.c:102
the rest from ddb, which actually works to get a stack trace..
sysctl_find_oid_name()
sysctl_register_iod()
sysctl_register_set()
linker_file_register_sysctls()
linker_load_file()
kldload()
syscall()

(gdb) p *(struct linker_set *) 0xfe00020a6d08
$6 = {
  ls_length = 4, 
  ls_items = {0xfe000208}
}

(gdb) p/x *(struct sysctl_oid *)0xfe000208
$5 = {
  oid_parent = 0x9010102464c457f, 
  oid_link = {
sle_next = 0x0
  }, 
  oid_number = 0x90260003, 
  oid_kind = 0x1, 
  oid_arg1 = 0x8d40, 
  oid_arg2 = 0x40, 
  oid_name = 0x18140, 
  oid_handler = 0x380040, 
  oid_fmt = 0x1a001d0043, 
  oid_refcnt = 0x1


From this, it appears that the contents of this linkerset are not
getting relocated.  How is that supposed to happen?

Interestingly enough, the value of oid_parent looks a hell of a lot
like offset 0 of the kld file, and the rest of the values seem to
match further offsets in the file:

% hd /modules/linux.ko 
00000000  7f 45 4c 46 02 01 01 09  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 26 90 01 00 00 00  00 8b 00 00 00 00 00 00  |..&.............|
00000020  40 00 00 00 00 00 00 00  d8 a1 12 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  03 00 40 00 1f 00 1c 00  |....@.8...@.....|
00000040  01 00 00 00 05 00 00 00  00 00 00 00 00 00 00 00  |................|
...
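
The match can be checked by hand.  A small sketch that reads the first bytes
of the file as little-endian integers (the byte order alpha uses) and compares
them with the first few fields gdb printed; le_load and kld_head are names
invented for this example, and the bytes are copied from the hd output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble n bytes (n <= 8) as a little-endian unsigned integer. */
static uint64_t
le_load(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    size_t i;

    assert(n <= 8);
    for (i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* First 24 bytes of linux.ko: the start of its ELF header. */
static const unsigned char kld_head[24] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x09,  /* "\177ELF" ident */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x26, 0x90, 0x01, 0x00, 0x00, 0x00
};
```

Bytes 0-7 decode to 0x09010102464c457f (the bogus oid_parent), bytes 8-15 to
0 (sle_next), bytes 16-19 to 0x90260003 (oid_number) and bytes 20-23 to 1
(oid_kind), so the "sysctl_oid" really is the start of the ELF header.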

Does anybody have any idea WTF is happening here?   I'd like to figure
this out before 4.7-release..

What's *really* odd (and annoying) is that I cannot reproduce this on my
crashbox.  The same binaries work fine on it ... this only happens on
my desktop.   

Thanks,

Drew




Re: gigabit NIC of choice?

2002-09-10 Thread Andrew Gallatin


Terry Lambert writes:
  I guess the next question is Anyone know a gigabit NIC that is
  currently in production, which has hack-friendly firmware?...

I think our products are the only game in town.

http://www.myri.com/myrinet/product_list.html
http://www.myri.com/myrinet/performance/index.html

Yes, they are a little pricy, but quite hackable.  And the link speed
is twice gig ether's (ie, 2Gb/sec full duplex, rather than 1Gb/sec
full duplex).

Sorry for the shameless plug ;)

Drew




Re: gigabit NIC of choice?

2002-09-10 Thread Andrew Gallatin


Brandon D. Valentine writes:
  running it through a computer (AFAIK).  There are rumors afloat of
  Gigabit Ethernet linecards for Myrinet switch hardware on the horizon

Slightly more than rumours -- 
 http://www.myri.com/news/02512/slides/Seitz_roadmap.pdf
 http://www.myri.com/news/02512/slides/Seizovic_lanai.pdf


Cheers,

Drew





Re: More dynamic KVA_SPACE

2002-08-30 Thread Andrew Gallatin


Terry Lambert writes:
  Wilko Bulte wrote:
I knew not to recommend the Alpha because it is limited to 2G
of physical memory.
   
   ?
   
   FreeBSD is limited to using 2G of whatever you have in the Alpha.
   Which is a deficiency that has been debated a number of times,
   IIRC it needs bus space work etc. See the archives..
  
  I know... which is why I didn't recommend it.  8-).

Not bus space, busdma!

The 2GB limit is due to the lack of MI PCI device driver support for
busdma.  Especially network drivers, most scsi drivers already do
busdma.  So as soon as other platforms work with more than the size of
their direct map (whatever it happens to be), alpha will too.
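
Roughly, the question a driver without busdma cannot answer is whether a
buffer's physical address is reachable through the direct-map window at all.
A sketch of that check, assuming a 2GB window starting at physical address 0;
DIRECT_MAP_BYTES and dma_direct_ok are invented names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECT_MAP_BYTES (2ULL << 30)   /* assumed 2GB direct-map window */

/*
 * Return 1 when the whole buffer lies inside the window, so a device
 * can DMA to it directly; 0 means it needs busdma (a mapping or a
 * bounce buffer) to be reached.
 */
static int
dma_direct_ok(uint64_t paddr, uint64_t len)
{
    assert(len > 0);
    return paddr < DIRECT_MAP_BYTES && len <= DIRECT_MAP_BYTES - paddr;
}
```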


Drew




Re: remote crashdump

2002-07-23 Thread Andrew Gallatin


Jacques Fourie writes:
  I was wondering what the amount of effort involved would be to add
  support for dumping on a remote machine via tftp, for example. This
  would be extremely handy for devices with little or no hard disk space. 
  
  Does anyone know of anything with this functionality? 


http://www.cs.duke.edu/~anderson/freebsd/netdump/

This worked a few years ago when 4.0 was -current.
You might want to see how hard it would be to update it for -stable.

Drew




Re: remote crashdump

2002-07-23 Thread Andrew Gallatin


Terry Lambert writes:
  The closest anyone has come to this (to my knowledge) is
  the creation of a polled network driver and a tiny UDP
  stack to permit remote debugging over the network to a
  different machine on the same switch.  This isn't very
  close to dumping.

I think Darrell's netdump has been discussed before.  
(http://www.cs.duke.edu/~anderson/freebsd/netdump/)

It does exactly what the poster wants, but needs to be cleaned up and
brought up to date.

Drew




Re: dual booting current/stable on x86?

2002-07-01 Thread Andrew Gallatin


Cyrille Lefevre writes:
  On Sun, Jun 30, 2002 at 09:23:22PM -0400, Andrew Gallatin wrote:
   
   How do I dual boot -current and -stable from different slices on the
   same IDE disk? (and linux too.) 
   
   When I tell lilo to boot hde3, I get the -stable boot2 and
   /boot/loader from hde2 (ad4s2a).  I can then monkey around setting
   currdev and hints and unloading the -stable kernel and then boot
   -current, but I'd like to just pop right into -current on ad4s3a if I
   choose it.
   
   Is there a magic bullet?  I'd like to continue using lilo so that I
   can choose what OS to load via a serial console..
  
  what is the problem w/ the following entries ?
  
  other=/dev/hde2
  label=stable
  alias=s
  table=/dev/hde
  loader=/boot/chain.b
  other=/dev/hde3
  label=current
  alias=c
  table=/dev/hde
  loader=/boot/chain.b


Just that it behaves exactly as described above -- they both boot
-stable. 

  what is the content of /boot/loader.conf and /boot/loader.conf.local
  for each FreeBSD ?

/boot/loader.conf:

-stable:
 hw.ata.wc=1

-current:
 console=comconsole

/boot/loader.conf.local is empty both places.



  did you tryed grub which is far better than lilo :P


x86 bootloaders terrify me, so  I have not tried grub.  Does grub
understand reiserfs?

  you could also take a look at /usr/share/examples/bootforth then
  have something like :
  
  /boot/stable.conf
  currdev=disk1s2a
  rootdev=disk1s2a
  
  /boot/current.conf
  currdev=disk1s3a
  rootdev=disk1s3a
  
  hope this help ?

Thanks..  it did help.

I just discovered liloboot.  I may just hack myself together a custom
liloboot and forget about it.   That seems to be the most
straightforward solution.

Drew




Re: dual booting current/stable on x86?

2002-07-01 Thread Andrew Gallatin


Chan Tur Wei writes:
  
  I'm not sure how booting with lilo will work (never played with it).
  
  Instead, I dug around a bit previously, and I found that boot1.s reads:
  #
  # If we are on a hard drive, then load the MBR and look for the first
  # FreeBSD slice.  We use the fake partition entry below that points to
  # the MBR when we call nread.  The first pass looks for the first active
  # FreeBSD slice.  The second pass looks for the first non-active FreeBSD
  # slice if the first one fails.
  #
  
  So unless someone specifically sets the active partition, the 1st FreeBSD
  one, usually -stable, will get loaded.  Since boot1+boot2 is loaded by the
  partition boot boot0, or the standard DOS boot (or, even MS's multi boot
  selector), the above may cause the 2nd FreeBSD slice to never get loaded.
  
  Incidentally, our booteasy (boot0.s) is one such someone.  Maybe if lilo
  or liloboot does the same thing, it will work too.

Excellent.  Thanks for the pointer.  Now I at least have some
understanding of what's happening.
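
For anyone following along, the two-pass search described in that boot1.s
comment can be sketched in C roughly as below.  This is only an
illustration of the comment's wording, not the actual boot1 code; the
struct and function names are made up for the example.  With both FreeBSD
slices flagged active (as in the fdisk output from the original post), the
first pass stops at the first one, which is the -stable slice.

```c
#include <stdint.h>

#define NDOSPART     4     /* primary entries in an MBR partition table */
#define PART_FREEBSD 0xa5  /* sysid fdisk reports for FreeBSD slices */
#define FLAG_ACTIVE  0x80  /* "active" flag in a partition entry */

/* Illustrative subset of an MBR partition entry (hypothetical layout). */
struct slice_entry {
    uint8_t flag;   /* 0x80 if active, 0 otherwise */
    uint8_t sysid;  /* partition type */
};

/*
 * Return the 0-based index of the slice the described search would pick,
 * or -1 if the table holds no FreeBSD slice.
 */
int
pick_freebsd_slice(const struct slice_entry tbl[NDOSPART])
{
    int pass, i;

    for (pass = 0; pass < 2; pass++) {
        for (i = 0; i < NDOSPART; i++) {
            int active = (tbl[i].flag & FLAG_ACTIVE) != 0;

            if (tbl[i].sysid != PART_FREEBSD)
                continue;
            /* Pass 0 accepts only active slices, pass 1 only inactive ones. */
            if ((pass == 0) == active)
                return i;
        }
    }
    return -1;
}
```

Index 1 here corresponds to slice 2 (hde2), index 2 to slice 3 (hde3).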

Drew




Re: dual booting current/stable on x86?

2002-07-01 Thread Andrew Gallatin


Chan Tur Wei writes:
  So unless someone specifically sets the active partition, the 1st FreeBSD
  one, usually -stable, will get loaded.  Since boot1+boot2 is loaded by the
  partition boot boot0, or the standard DOS boot (or, even MS's multi boot
  selector), the above may cause the 2nd FreeBSD slice to never get loaded.
  
  Incidentally, our booteasy (boot0.s) is one such someone.  Maybe if lilo
  or liloboot does the same thing, it will work too.

Yep, it turns out that you can make lilo set a partition active and/or
deactivate a partition via lilo's change keyword:

other = /dev/hde2
label=stable
alias=s
table=/dev/hde
loader=/boot/chain.b
change
  partition=/dev/hde2
activate
  partition=/dev/hde3
deactivate

other = /dev/hde3
label=current
alias=c
table=/dev/hde
loader=/boot/chain.b
change
  partition=/dev/hde3
activate
  partition=/dev/hde2
deactivate

Thanks again for the pointer; I'm now booting directly to -current.

Perhaps this should be a FAQ entry..

Drew




dual booting current/stable on x86?

2002-06-30 Thread Andrew Gallatin


How do I dual boot -current and -stable from different slices on the
same IDE disk? (and linux too.) 

When I tell lilo to boot hde3, I get the -stable boot2 and
/boot/loader from hde2 (ad4s2a).  I can then monkey around setting
currdev and hints and unloading the -stable kernel & then boot
-current, but I'd like to just pop right into -current on ad4s3a if I
choose it.

Is there a magic bullet?  I'd like to continue using lilo so that I
can choose what OS to load via a serial console..

Thanks,

Drew

The data for partition 1 is:
sysid 131 (0x83),(Linux native)
start 63, size 10522512 (5137 Meg), flag 0
beg: cyl 0/ head 1/ sector 1;
end: cyl 654/ head 254/ sector 63
The data for partition 2 is:<--- STABLE
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 12578895, size 12562830 (6134 Meg), flag 80 (active)
beg: cyl 783/ head 0/ sector 1;
end: cyl 1023/ head 254/ sector 63
The data for partition 3 is:<--- CURRENT
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 25141725, size 13960485 (6816 Meg), flag 80 (active)
beg: cyl 1023/ head 255/ sector 63;
end: cyl 1023/ head 254/ sector 63








Re: Re: bge driver issue

2002-06-24 Thread Andrew Gallatin


John Polstra writes:
  On the i386, living with the misalignment is probably the best
  solution, unfortunately.  The only alternatives I can think of are:
  
  - bcopy the packet up by 2 bytes after reception to align the
payload, or
  
  - disable PCI-X mode on the bus
  

If the bge's API allows it, you could set up a receive descriptor with
a length of 14 bytes (size of ethernet header), and start the next
descriptor 2 bytes after it (at a 16 byte offset from the front of the
mbuf).  When the receive is done, just copy the 14 bytes.
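
A rough sketch of the idea in C, assuming the hardware really does accept
such a descriptor pair (which I have not verified for the bge).  The
rx_desc struct and both function names are hypothetical, made up for the
illustration; they are not the bge driver's API.

```c
#include <stdint.h>
#include <string.h>

#define ETHER_HDR_LEN 14  /* Ethernet header size in bytes */
#define PAYLOAD_OFF   16  /* payload starts 2 bytes past the header */

/* Hypothetical receive descriptor: where in the buffer to DMA, how much. */
struct rx_desc {
    uint32_t offset;
    uint32_t len;
};

/* Describe one receive buffer of buf_len bytes as two descriptors. */
void
rx_split_setup(struct rx_desc d[2], uint32_t buf_len)
{
    d[0].offset = 0;              /* header lands at the front */
    d[0].len = ETHER_HDR_LEN;
    d[1].offset = PAYLOAD_OFF;    /* aligned if the buffer itself is */
    d[1].len = buf_len - PAYLOAD_OFF;
}

/*
 * After reception, slide the 14 header bytes up by 2 so that header and
 * payload are contiguous.  Returns a pointer to the start of the frame.
 */
uint8_t *
rx_split_fixup(uint8_t *buf)
{
    /* Source and destination overlap, so memmove() rather than memcpy(). */
    memmove(buf + (PAYLOAD_OFF - ETHER_HDR_LEN), buf, ETHER_HDR_LEN);
    return buf + (PAYLOAD_OFF - ETHER_HDR_LEN);
}
```

The point is that only 14 bytes get copied per packet instead of the whole
payload.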

Drew






Re: Possible problem with rl (Realtek) ethernet card driver in 4.5-STABLE

2002-05-22 Thread Andrew Gallatin


Nigel Roberts writes:
  #10 0xc0237fbe in rl_rxeof (sc=0xc0b9d200) at ../../pci/if_rl.c:1151
  #11 0xc023827a in rl_intr (arg=0xc0b9d200) at ../../pci/if_rl.c:1342
  #12 0xc0279c7a in vec3 ()
  #13 0xc01c2196 in ether_output (ifp=0xc0ba4000, m=0xc076af00, dst=0xc0c28770, 
  rt0=0xc0c59d00) at ../../net/if_ethersubr.c:369
  #14 0xc01d4663 in ip_output (m0=0xc076af00, opt=0x0, ro=0xc02f9970, flags=1, 
  imo=0x0) at ../../netinet/ip_output.c:822

Was the realtek really at IRQ 3?

I'm NOT an x86 hacker, and I don't understand the interrupt code there
very well..  Is it possible to have an irq line which is shared
between 2 devices which use different interrupt masks?  If so, what
prevents intr_mux() from being called for a TTY interrupt, and then
calling another driver which shares the line but has a NET mask, even
when NET interrupts are masked?

Does this go away if you remove the serial line driver (sio) from your
kernel?  Can we see a (non verbose) dmesg from this box?

Drew





Re: How to dump a 4gig system on panic ?

2002-05-18 Thread Andrew Gallatin


Marc G. Fournier writes:
  
  Okay, seem to be about halfway there ... client kldload's no problem,
  server runs ... do a ctl-alt-esc to get into DDB and type panic, and it
  gives a message that it's looking for the server and it finds it on the
  right IP ... then it prints out a '1023' and finishes the panic ...
  
  On the 'dump server', a vmcore gets created, but it's zero length ...
  
  thoughts?

As I said, it hasn't been used for quite some time.  It may require
work to get it working again.

Drew




Re: How to dump a 4gig system on panic ?

2002-05-17 Thread Andrew Gallatin



There are 3 things you could do:

a) Limit your memory size in the loader

b) Use partial dumps

c) Use network dumps if you have another machine to run the dump
server on.

Both the netdump & partial dump code can be found at:

 http://www.cs.duke.edu/~anderson/freebsd/

Both may be a little out of date & require some work to get working
with a recent -stable, as they were developed in the days when 4.0 was
-current.


Drew




Re: How to dump a 4gig system on panic ?

2002-05-17 Thread Andrew Gallatin


Marc G. Fournier writes:
  
  Oh, I like the netdump one ... I have a machine sitting right beside this
  one that I can use to dump to ... has anyone thought to include this as a
  'standard' sort of thing with FreeBSD?  So that it keeps up with the
  current code?
  
  

I plan to integrate partial dumps as an option at some point, but my
only -current machines are alphas, so I need to get gdb working again
there first.

Drew




Re: How to dump a 4gig system on panic ?

2002-05-17 Thread Andrew Gallatin


Marc G. Fournier writes:
  
  Well, downloaded the files (a .tar.gz would be nice? *grin*) and the
  client built perfectly, and kldload worked fine ... is there some way
  someone can suggest of 'simulating a crash'?  Some way to test to make
  sure that it is working as expected?  I have a 4.6-PRE machine on my desk
  that I'd like to test with before I try it on the real thing, if at all
  possible?

break into ddb & do:
  ddb> call dumpsys()

Unless you're running a savecore which supports partial dumps, you
need to disable partial dumps (sysctl net.net_dump.partial=0).

And remember, you'll be spewing the contents of your ram (possibly
passwords, etc) across the network in clear text.

Drew




pushal ebp

2002-04-25 Thread Andrew Gallatin


Kenneth Culver writes:
  So, as far as I can tell, this version of glibc is doing the Right Thing,
  and the ebp register is getting messed up somewhere along the line in
  either the assembly code that handles the 0x80 trap in FreeBSD, or in
  syscall2 (I think it's probably the asm that handles the 0x80 trap)...
  
  Can anyone confirm this?

I just looked at the NetBSD code & like linux, they use a macro which
individually pushes the registers onto the stack rather than using
pushal (which I assume is the same as what intel calls PUSHAD in their
x86 instruction set ref. manual).

NetBSD stopped using pushal in 1994 in rev 1.85 of their
arch/i386/i386/locore.s in a commit helpfully documented
"Don't use pusha and popa."

Does anybody know why the other OSes push the registers individually,
rather than using pushal?  Could our using pushal be causing Kenneth's
ebp to get lost, or is this just a red herring?

Thanks,

Drew
 




Re: pushal ebp

2002-04-25 Thread Andrew Gallatin


Kenneth Culver writes:
   I just looked at the NetBSD code & like linux, they use a macro which
   individually pushes the registers onto the stack rather than using
   pushal (which I assume is the same as what intel calls PUSHAD in their
   x86 instruction set ref. manual).
  
   NetBSD stopped using pushal in 1994 in rev 1.85 of their
   arch/i386/i386/locore.s in a commit helpfully documented
   "Don't use pusha and popa."
  
   Does anybody know why the other OSes push the registers individually,
   rather than using pushal?  Could our using pushal be causing Kenneth's
   ebp to get lost, or is this just a red herring?
  
   Thanks,
  
   Drew
  
  
  
  according to the intel docs, pushad (or what I'm assuming is pushal in our
  case) pushes eax, ecx, edx, ebx then pushes some temporary value (the
  original esp I think) then pushes ebp, esi, and edi:
  
  this is from the documentation for pushad
  
  IF OperandSize = 32 (* PUSHAD instruction *)
  THEN
  Temp <- (ESP);
  Push(EAX);
  Push(ECX);
  Push(EDX);
  Push(EBX);
  Push(Temp);
  Push(EBP);
  Push(ESI);
  Push(EDI);
  
  so could this be the problem?
  
  Ken

I don't think so.  The temp it's pushing is the stack pointer.  If you
look at the layout of the trap frame, then you'll see tf_isp comes
between tf_ebp & tf_ebx.  I assume tf_isp is the stack pointer, so
that should be OK..
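
To convince yourself of the ordering, here is a small userland simulation
of the PUSHAD pseudo-code quoted above.  It is a sketch under assumptions:
the pushad_frame struct only models the eight slots PUSHAD writes, with
field names borrowed from the trapframe members mentioned in this thread,
and it is not the kernel's real struct trapframe.

```c
#include <stdint.h>

/* Register values at the moment PUSHAD executes (illustrative subset). */
struct regs {
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
};

/* The eight slots PUSHAD leaves on the stack, lowest address first. */
struct pushad_frame {
    uint32_t tf_edi, tf_esi, tf_ebp, tf_isp;
    uint32_t tf_ebx, tf_edx, tf_ecx, tf_eax;
};

/* Simulate PUSHAD on a downward-growing stack of 32-bit slots. */
struct pushad_frame
simulate_pushad(const struct regs *r)
{
    uint32_t stack[8];
    int sp = 8;                  /* index one past the top slot */
    uint32_t temp = r->esp;      /* Temp <- (ESP) */
    struct pushad_frame f;

    stack[--sp] = r->eax;
    stack[--sp] = r->ecx;
    stack[--sp] = r->edx;
    stack[--sp] = r->ebx;
    stack[--sp] = temp;
    stack[--sp] = r->ebp;
    stack[--sp] = r->esi;
    stack[--sp] = r->edi;

    /* sp is now 0: stack[0] is the lowest address and holds EDI. */
    f.tf_edi = stack[0];
    f.tf_esi = stack[1];
    f.tf_ebp = stack[2];
    f.tf_isp = stack[3];
    f.tf_ebx = stack[4];
    f.tf_edx = stack[5];
    f.tf_ecx = stack[6];
    f.tf_eax = stack[7];
    return f;
}
```

In this model the saved stack pointer lands between the ebp and ebx slots,
and ebp itself is stored unmodified.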

Drew





Re: implementing linux mmap2 syscall

2002-04-24 Thread Andrew Gallatin


Kenneth Culver writes:
  OK, I THINK I found what calls the actual kernel syscall handler, and
  sets its args first, but I'm not sure:
  
  from linux_locore.s
  
  NON_GPROF_ENTRY(linux_sigcode)
...

  Does anyone who actually knows assembly have any ideas?

This is the linux sigtramp, or signal trampoline.  It is used to wrap
a signal handler.  E.g., the kernel calls it (by returning to it) when
it delivers a signal.  It calls the app's signal handler.  When the
handler returns, it calls the linux sigreturn system call.

This has essentially nothing to do with system calls.

The system call entry point on x86 is int0x80_syscall, which is
labeled:

/*
 * Call gate entry for FreeBSD ELF and Linux/NetBSD syscall (int 0x80)
..

This then calls syscall2(), which calls the linux prepsyscall.

Maybe the argument isn't where you expect it to be, but is there.
Can you make a test program which calls mmap2 with its 6th arg as
something unique like 0xdeadbeef?  Then print out (in hex :) the trapframe
from the linux prepsyscall routine & see if you can find the deadbeef.
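
If eyeballing the hex dump gets tedious, a helper along these lines could
do the search.  This is only a sketch; find_marker is a made-up name, and
it assumes the frame can be treated as an array of 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

#define MARKER 0xdeadbeefU  /* the unique value passed as the 6th arg */

/* Return the index of the first 32-bit word equal to marker, or -1. */
int
find_marker(const uint32_t *words, size_t nwords, uint32_t marker)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        if (words[i] == marker)
            return (int)i;
    return -1;
}
```

The index it returns tells you which slot of the frame, and so which saved
register, is carrying the argument.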

Drew




Re: implementing linux mmap2 syscall

2002-04-23 Thread Andrew Gallatin


Kenneth Culver writes:
  OK, I found another problem, here it is:
  
  static void
  linux_prepsyscall(struct trapframe *tf, int *args, u_int *code,
      caddr_t *params)
  {
   args[0] = tf->tf_ebx;
   args[1] = tf->tf_ecx;
   args[2] = tf->tf_edx;
   args[3] = tf->tf_esi;
   args[4] = tf->tf_edi;
   *params = NULL; /* no copyin */
  }
  
  Basically, linux_mmap2 takes 6 args, and this looks here like only 5 args are 
  making it in... I checked this because the sixth argument to linux_mmap2() in 
  truss was showing 0x6, but when I printed out that arg from the kernel, it 
  was showing 0x0. Am I correct here?
  
  Ken

Yes.  According to http://john.fremlin.de/linux/asm/, linux used to
parse only 5 args but now it parses six.  Try adding:
  args[5] = tf->tf_ebp;
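
Spelled out as a standalone sketch, the change would look something like
this.  The trapframe_sketch struct is a stand-in holding only the
registers used here, not the kernel's struct trapframe, and whether
tf_ebp really holds the sixth argument by the time this routine runs
depends on how the trap entry code saves the registers.

```c
#include <stddef.h>

/* Stand-in for the i386 trapframe: only the registers used here. */
struct trapframe_sketch {
    int tf_ebx, tf_ecx, tf_edx, tf_esi, tf_edi, tf_ebp;
};

/* Copy up to six Linux syscall arguments out of the saved registers. */
void
prepsyscall_sketch(const struct trapframe_sketch *tf, int *args,
    char **params)
{
    args[0] = tf->tf_ebx;
    args[1] = tf->tf_ecx;
    args[2] = tf->tf_edx;
    args[3] = tf->tf_esi;
    args[4] = tf->tf_edi;
    args[5] = tf->tf_ebp;   /* the proposed addition */
    *params = NULL;         /* no copyin */
}
```

Note that the args array must have room for six entries.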

Drew




Re: implementing linux mmap2 syscall

2002-04-23 Thread Andrew Gallatin


Kenneth Culver writes:
 Basically, linux_mmap2 takes 6 args, and this looks here like only 5 args are
 making it in... I checked this because the sixth argument to linux_mmap2() in
 truss was showing 0x6, but when I printed out that arg from the kernel, it
 was showing 0x0. Am I correct here?

 Ken
  
   Yes.  According to http://john.fremlin.de/linux/asm/, linux used to
   parse only 5 args but now it parses six.  Try adding:
   args[5] = tf->tf_ebp;
  
  I don't think that arg is there:
  
  Apr 23 10:36:13 ken /kernel: tf->tf_ebp = -1077938040
  
  Ken

My guess is that we're not doing something we should be doing in
int0x80_syscall in order to get that last arg.  But I do not have
enough x86 knowledge to understand how the trapframe is constructed,
so I cannot tell what needs to be done.

Perhaps somebody with more x86 fu can help.

Sorry,

Drew




Re: SSE bcopy

2002-04-11 Thread Andrew Gallatin


Denis Serenyi writes:
  I've been looking at adding an SSE bcopy that runs at user-level to a 
  program that I'm working on. I'm using FreeBSD 4.3 currently.
  
  I wrote the routine, and when I execute it, I get an illegal instruction 
  exception when I try to execute the first SSE instruction (movups).
  
  After searching the hackers archives, I'm guessing that this is because 
  FreeBSD 4.3 does not execute the instructions at boot time to enable SSE 
  instructions to be executed, and also because FreeBSD 4.3 does not save 
  the 128-bit SIMD registers on context switches.
  
  Am I correct in this assessment?
  
  It also seems like this support has been added to FreeBSD 4.5. Is this 
  correct?
  
  Assuming yes, in what release was SSE support added to FreeBSD? Has 
  anyone done a patch that can be applied to FreeBSD 4.3, or are the 
  changes non-trivial?
  

As David says, have a look at
http://kobe1995.net/~kaz/FreeBSD/SSE.en.html  There is a patch there
for 4.3.

What are the performance implications of an SSE bcopy?  How much
faster is it than a normal bcopy?

Would you consider releasing your code under a BSD license so that
others could play with it, and possibly integrate it (or something
based on it) into FreeBSD?

Thanks,

Drew




Re: SSE bcopy

2002-04-11 Thread Andrew Gallatin


Denis Serenyi writes:
  I don't think there will be a problem with releasing my source code. 
  That is, if it works and is truly a performance win :)

Cool!

  There are some PDF docs available on Intel's web site that have sample 
  code for an SSE bcopy, and give performance results (in particular, 
  "Block Copy Using Pentium III Streaming SIMD Extensions"). It seems to 
  be about 60 - 80% faster than using MMX instructions. However, when you 
  use SSE to store data in the destination memory location, you bypass the 
  processor's caches. So, if you were to touch the data soon after the 
  bcopy, it is no win at all.

Hey, that's great!  The copies I care about are in situations where
the data is not touched until much later, so the normal copy is
typically a big loss because it blows out the cache..
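
For the curious, a cache-bypassing copy of this kind can be sketched with
the compiler's SSE intrinsics.  This is my own minimal illustration, not
Intel's sample code and not Denis's routine; stream_copy is a made-up
name, and on compilers without SSE support it simply falls back to
memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Copy len bytes from src to dst using non-temporal stores if available. */
void
stream_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE__)
    /* Streaming stores need a 16-byte aligned destination. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;

    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    while (len >= 16) {
        __m128 v = _mm_loadu_ps((const float *)s);  /* unaligned load */
        _mm_stream_ps((float *)d, v);               /* bypasses the cache */
        d += 16;
        s += 16;
        len -= 16;
    }
    _mm_sfence();   /* make the streaming stores globally visible */
#endif
    memcpy(d, s, len);  /* tail, or the whole copy without SSE */
}
```

As noted above, this only pays off when the destination is not read again
soon after the copy.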

Good luck,

Drew





performance of mbufs vs contig buffers?

2002-04-08 Thread Andrew Gallatin


After updating the firmware on our 2 gigabit NIC to allow enough
scatter entries per packet to stock the 9K (jumbo frame) receive
rings with cluster mbufs rather than contigmalloc'ed buffers(*), I
noticed a dramatic performance decrease: netperf TCP_STREAM
performance dropped from 1.6Gb/sec to 1.2Gb/sec.

(*) By contigmalloc'ed buffers, I mean a few megs of memory, carved
up into 9K chunks and managed via slists, like is done in most of the
in-tree gigabit ethernet drivers.
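
In case the scheme is unfamiliar, here is a simplified userland sketch of
such a pool.  The names are invented for the example, malloc() stands in
for contigmalloc() (which only exists in the kernel), and the list linkage
is embedded in the free chunks themselves, which is a simplification
rather than a copy of what any in-tree driver does.

```c
#include <stdlib.h>
#include <sys/queue.h>

#define JUMBO_LEN 9216  /* 9K per chunk */

/* Each free chunk stores its list linkage in its own first bytes. */
struct jumbo_chunk {
    SLIST_ENTRY(jumbo_chunk) link;
};

struct jumbo_pool {
    SLIST_HEAD(, jumbo_chunk) free_list;
    void *base;   /* start of the carved-up region */
};

/* Carve one region into nchunks 9K buffers; returns 0 on success. */
int
jumbo_pool_init(struct jumbo_pool *p, size_t nchunks)
{
    size_t i;

    p->base = malloc(nchunks * JUMBO_LEN);  /* stand-in for contigmalloc() */
    if (p->base == NULL)
        return -1;
    SLIST_INIT(&p->free_list);
    for (i = 0; i < nchunks; i++) {
        struct jumbo_chunk *c =
            (struct jumbo_chunk *)((char *)p->base + i * JUMBO_LEN);
        SLIST_INSERT_HEAD(&p->free_list, c, link);
    }
    return 0;
}

/* Take one 9K buffer off the free list, or NULL if the pool is empty. */
void *
jumbo_alloc(struct jumbo_pool *p)
{
    struct jumbo_chunk *c = SLIST_FIRST(&p->free_list);

    if (c != NULL)
        SLIST_REMOVE_HEAD(&p->free_list, link);
    return c;
}

/* Return a buffer obtained from jumbo_alloc() to the pool. */
void
jumbo_free(struct jumbo_pool *p, void *buf)
{
    SLIST_INSERT_HEAD(&p->free_list, (struct jumbo_chunk *)buf, link);
}
```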

My first thought was that the firmware and/or processor on the NIC was
somehow overwhelmed by the extra work of doing 5 2K DMAs rather than
one 9K DMA. So I rebuilt my kernel & driver using 4K cluster mbufs and
added an option to the driver so that when it stocks the receive rings
with contig buffers which are greater than a PAGE_SIZE, it breaks them
up at page (4K) boundaries.

After making these changes, I'm roughly comparing apples to apples.  Each
packet is received into 3 DMA descriptors.  However, I'm still
seeing the same performance - 1.6Gb/sec receives into contigmalloc'ed
buffers whose DMA descriptors are broken up into PAGE_SIZE'ed chunks,
and 1.2Gb/sec into 4K mbufs.
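
The page-boundary split amounts to something like the following.  This is
a sketch rather than the driver code: dma_seg and split_at_pages are
hypothetical names, and a 4K page size is assumed as stated above.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u  /* assumed page size */

/* One DMA descriptor's worth of buffer (hypothetical layout). */
struct dma_seg {
    uintptr_t addr;
    size_t len;
};

/*
 * Split [addr, addr + len) so that no segment crosses a page boundary.
 * Fills in at most max_segs entries; returns the number of segments needed.
 */
size_t
split_at_pages(uintptr_t addr, size_t len, struct dma_seg *segs,
    size_t max_segs)
{
    size_t n = 0;

    while (len > 0) {
        /* Bytes left before the next page boundary. */
        size_t room = PAGE_SIZE_SKETCH -
            (size_t)(addr & (PAGE_SIZE_SKETCH - 1));
        size_t chunk = len < room ? len : room;

        if (n < max_segs) {
            segs[n].addr = addr;
            segs[n].len = chunk;
        }
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

A page-aligned 9216 byte buffer comes out as 4096 + 4096 + 1024, i.e. the
3 descriptors per packet mentioned above.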

Is it possible that my problems are being caused by cache misses
on cluster mbufs occurring when copying out to userspace as another
packet is being DMA'ed up?  I'd thought that since the cache line size
is 32 bytes, I'd be pretty much equally screwed either way.

Also, UDP_STREAM performance goes from 1.75Gb/sec -> 1.25Gb/sec, so
it's not some weird TCP quirk.  All the UDP drops are from the
socketbuffer being full (the host is receiving data at 1.9Gb/sec into
main memory in both cases), so it's as if I have less memory bandwidth
when using normal cluster mbufs.  I've been trying to use perfmon to
compare cache misses, but I'm not sure what options I should be
using..

Does anybody have any ideas why contig malloc'ed buffers are so much
quicker?  

Thanks!

Drew

PS: Here's the dmesg from the machine in question.  Serverworks LE
3.0, 1GHz PIII (256K cache).  I've got page coloring enabled in the
kernel; it doesn't seem to make much difference.

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.5-STABLE #1: Mon Apr  8 17:33:51 EDT 2002
gallatin@ugly:/usr/src/sys/compile/PERFMON
Timecounter i8254  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (999.53-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x68a  Stepping = 10
  
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 536805376 (524224K bytes)
avail memory = 517902336 (505764K bytes)
Preloaded elf kernel kernel.perfmon at 0xc044f000.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 9 entries at 0xc00f5250
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: ServerWorks NB6635 3.0LE host to PCI bridge on motherboard
pci0: PCI bus on pcib0
atapci0: Promise ATA66 controller port 
0xdf00-0xdf3f,0xdfe0-0xdfe3,0xdfa8-0xdfaf,0xdfe4-0xdfe7,0xdff0-0xdff7 mem 
0xfc9e-0xfc9f irq 10 at device 2.0 on pci0
ata2: at 0xdff0 on atapci0
ata3: at 0xdfa8 on atapci0
fxp0: Intel Pro 10/100B/100+ Ethernet port 0xd800-0xd83f mem 
0xfc80-0xfc8f,0xfc9ce000-0xfc9cefff irq 9 at device 6.0 on pci0
fxp0: Ethernet address 00:30:48:21:e4:47
inphy0: i82555 10/100 media interface on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: ServerWorks IB6566 PCI to ISA bridge at device 15.0 on pci0
isa0: ISA bus on isab0
atapci1: ServerWorks ROSB4 ATA33 controller port 0xffa0-0xffaf at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata1: at 0x170 irq 15 on atapci1
pci0: OHCI USB controller at 15.2 irq 10
pcib1: ServerWorks NB6635 3.0LE host to PCI bridge on motherboard
pci1: PCI bus on pcib1
pci1: ATI Mach64-GO graphics accelerator at 1.0 irq 11
pci1: unknown card (vendor=0x14c1, dev=0x8043) at 2.0 irq 5
orm0: Option ROMs at iomem 0xc-0xc7fff,0xc8000-0xc97ff,0xc9800-0xca7ff on isa0
fdc0: NEC 72065B or clone at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
atkbdc0: Keyboard controller (i8042) at port 0x60,0x64 on isa0
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x100
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
ad4: 19092MB ST320414A [38792/16/63] at ata2-master UDMA66
acd0: CDROM CDU5211 at 

Re: performance of mbufs vs contig buffers?

2002-04-08 Thread Andrew Gallatin


Terry Lambert writes:
  Andrew Gallatin wrote:
   After updating the firmware on our 2 gigabit NIC to allow enough
   scatter entries per packet to stock the 9K (jumbo frame) receive
   rings with cluster mbufs rather than contigmalloc'ed buffers(*), I
   noticed a dramatic performance decrease: netperf TCP_STREAM
   performance dropped from 1.6Gb/sec to 1.2Gb/sec.
  
  [ ... ]
  
   Is it possible that my problems are being caused by cache misses
   on cluster mbufs occurring when copying out to userspace as another
   packet is being DMA'ed up?  I'd thought that since the cache line size
   is 32 bytes, I'd be pretty much equally screwed either way.
  
  [ ... ]
  
   Does anybody have any ideas why contig malloc'ed buffers are so much
   quicker?
  
  Instrument m_pullup(), and see how much it's being called in
  both cases.  Probably you are seeing the 2 byte misalignment
  of the TCP payload in the ethernet packet.

The TCP payload is aligned.  We stock the rings so that the
ethernet header is intentionally misaligned, which makes the IP
portion of the packet land aligned.  (actually, we encapsulate the
ethernet traffic behind another 16-bit header, so everything ends up
aligned without the +2/-2 stuff).

  My other guess would be that the clusters you are dealing
  with are non-contiguous.  This has both scatter/gather
  implications, and cache-line implications when using them.

Please elaborate...  What sort of scatter/gather implications?
Microbenchmarks don't show much of a difference DMA'ing to
non-contiguous vs. contiguous pages (over 400MB/sec in all cases).
Also, we get close to link speed DMA'ing to user space, and with page
coloring, that virtually guarantees that the pages are not physically
contiguous.

Based on the UDP behaviour, I think that it's cache implications.  The
bottleneck seems to be when copyout() reads the recently DMA'ed data.
The driver reads the first few dozen bytes (so as to touch up the csum
by subtracting off the extra bits the DMA engines added in).  We do
hardware csum offloading, so the entire packet is not read until
copyout() is called.
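
The csum touch-up is plain ones-complement arithmetic.  As a generic
sketch (not our driver's code; the function names are made up, and it
assumes the extra leading bytes are an even count so 16-bit word
boundaries line up):

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
static uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Ones-complement sum of len bytes, taken as big-endian 16-bit words.
 * The 32-bit accumulator is fine for jumbo-frame sized buffers.
 */
uint16_t
csum_partial(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;   /* pad odd trailing byte */
    return csum_fold(sum);
}

/* Remove the contribution "extra" from "sum" (ones-complement subtract). */
uint16_t
csum_sub(uint16_t sum, uint16_t extra)
{
    return csum_fold((uint32_t)sum + (uint16_t)~extra);
}
```

So the driver only has to sum the few bytes the DMA engine should not have
counted and subtract that from the hardware's total; the payload itself is
never read.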

  Having thought about this problem before, I think that what
  you probably need is to chunk the buffers up, and treat them
  as M_EXT type mbufs (e.g. go with contigmalloc).

I really, really hate doing this for a variety of reasons.  Mainly
that the user may not expect the NIC driver is doing this & it may
take her a while to realize that adjusting NMBCLUSTERS has no effect.
Although... Hmmm..  I could use a small amount of private buffers
while I have them & then fall back to contig buffers when I run out.

I'd still like to fully understand the problem though; sweeping it
under the rug bothers me.

  To be able to use generic mbufs for this, what's really
  needed is the ability to have variable size mbufs.  At the
  very least, I think a single mbuf should be of a size so
  that the MTU fits inside it.  Fixing this would be a large
  amount of work, and the gain is uncertain.
  
  You can get a minor idea of the available gain by looking
  at the Tigon II firmware changes to use page based buffer
  allocations, per Bill Paul & Co..

If you're thinking of what I'm thinking of (the zero copy stuff), I
wrote that code. ;)

I seem to remember you talking about seeing a 10% speedup from using
4MB pages for cluster mbufs.   How did you do that?  I'd like to see
what effect it has with this workload.

Thanks!

Drew




Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Andrew Gallatin


Bruce A. Mah writes:
  
  I was discussing this with some of my cow-orkers, as we've had a similar
  situation (cluster mbufs getting temporarily depleted on a
  4.5-RELEASE-p2 NFS server with Linux and FreeBSD clients, but no kernel
  panics).  Shouldn't the net.inet.ip.maxfragpackets sysctl variable
  (introduced in 4.4-RELEASE) limit the number of fragments on the
  reassembly queue(s)?  This value looks to be about 1/4 the number of
  cluster mbufs, by default.

That's a good point.  When I was bitten by this, I didn't have time to
mess with things & I cranked down the read/write size on the linux
clients.

The problem is that ip_maxfragpackets is:
"Maximum number of IPv4 fragment reassembly queue entries"


You (& I, & most people probably) took that number to mean the cap on
the number of mbufs sitting on reassembly queues.  However, it's really
a cap on the number of fragmented packets sitting on reassembly
queues:

    /*
     * If first fragment to arrive, create a reassembly queue.
     */
    if (fp == 0) {
        /*
         * Enforce upper bound on number of fragmented packets
         * for which we attempt reassembly;
         * If maxfrag is 0, never accept fragments.
         * If maxfrag is -1, accept all fragments without limitation.
         ...

Since the linux host is sending 16K packets, that means that each
packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
There can be as many as 10 cluster mbufs on the reassembly queue
for each packet.

Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
However, 512 * 10 mbufs = 5120 mbufs.  Oops.

I think the limit should probably be something much smaller, like
maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
implementation & name should be changed to maxfragmbufs
Drew





Re: Fatal trap 12: page fault while in kernel mode

2002-04-05 Thread Andrew Gallatin


Terry Lambert writes:
  Andrew Gallatin wrote:
   The problem is that ip_maxfragpackets is:
   "Maximum number of IPv4 fragment reassembly queue entries"
   
   You (& I, & most people probably) took that number to mean the cap on
   the number of mbufs sitting on reassembly queues.  However, it's really
   a cap on the number of fragmented packets sitting on reassembly
   queues:
  
  [ ... ]
  
   Since the linux host is sending 16K packets, that means that each
   packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu).
   There can be as many as 10 cluster mbufs on the reassembly queue
   for each packet.
   
   Let's say we have 2048 cluster mbufs.  That makes maxfragpackets 512.
   However, 512 * 10 mbufs = 5120 mbufs.  Oops.
   
   I think the limit should probably be something much smaller, like
   maybe nmbclusters / (net.inet.udp.recvspace / 1472).  Or the
   implementation & name should be changed to maxfragmbufs
  
  
  This suggests that one could fragment as large a UDP packet
  as one chooses into n fragments, and then supply only n-1
  elements of the whole packet, as an attack, in order to use
  up system resources.

Essentially what a linux NFS client is already doing.. ;-(

  I think we are better off with my suggestion, where udp packets
  above a certain size are intentionally dropped as not supported.

Depending on what the certain size is, that might be reasonable.

  Alternately, it would be a good idea to have a ip_maxpacketfrags
  instead of an ip_maxfragpackets, to put a hard limit on the
  number of mbufs that can be consumed by the fragment reassembly
  process.

I think this is the best solution.

  Of course, this also suggests that using TCP instead of UDP for
  the NFS would result in the problem just going away, for the
  original poster, which is probably all the original poster
  really cares about...

Considering that a modern linux NFS client is going to be a common
scenario, we should probably be able to interoperate with it, no
matter how broken its defaults are.  BTW, 16K UDP packets are legal
according to the NFS V3 spec, if I remember it correctly.

Drew





Re: Fatal trap 12: page fault while in kernel mode

2002-04-04 Thread Andrew Gallatin


Will Froning writes:
  I have a 4.5-RELEASE-p2 box that is my Firewall/NAT/NFS server.  As a
  NFS client I have a RH7.2 linux box.  When I do massive NFS writes to
  my FBSD (from RH7.2 box), I get a panic.  I've attached the info I got
  from my debug kernel.
  

While the fix being discussed by Peter & others will prevent panics,
the linux box will still run your server out of mbuf clusters.  This
is happening because the linux box is using a 16K write size over UDP
by default.  This is a stupid default.  If there is any lossage
between the hosts (e.g., any packets get dropped), more and more packets
will end up on the reassembly queues.  Eventually, all your cluster
mbufs will be there.

I suggest changing the mount options on the linux box to use 8k reads
and writes, or use TCP.

Another problem I've seen w/Linux NFS clients is that recent linux NFS
clients seem to spew ACCESS requests like there's no tomorrow & beat
the snot out of my NFS server.  When building large software packages
via make -j4 over NFSv3 (100Mb ethernet) on a dual PIII 1GHz system,
a FreeBSD 4.5 host issues 400-500 ACCESS calls/sec.  A Linux 2.4.18
host spews 12,000 - 14,000 ACCESS calls/sec, or roughly 30 times as
many.  Needless to say, the build finishes a whole lot quicker on
FreeBSD.  Does anybody know what I can do to make the linux client
cache ACCESS info?

Cheers,

Drew






Re: Kernel Debugging over the Ethernet?

2002-02-21 Thread Andrew Gallatin


Justin C.Walker writes:
  
  On Wednesday, February 20, 2002, at 04:52 PM, Julian Elischer wrote:
  
   yes but we might as well be protocol compatible if possible :-)
   If only to re-use what they did in gdb :-)
  
  The Darwin/Mac OS X scheme only deals with IOKit because that's where 
  the drivers live.  The protocol implementation is in the directory 
  'xnu/osfmk/kdp'.  It's in essence a UDP protocol, and is implemented 
  without using any of the system's networking scheme (except for mbufs).  
  The implementation is polling.  The implementation is pretty 
  light-weight.

Where do the Darwin gdb sources live, so we can see the gdb end of it
too?  I've looked, but have so far been unable to find them.

Thanks,

Drew




Re: Serverworks ATA controller data corruption

2002-02-20 Thread Andrew Gallatin


Søren Schmidt writes:
  
  Hmm, the problem is known, but believed to be fixed *IF* your BIOS
  sets things up the right way. I've never seen the problem on my
  ASUS CUR-DLS, but I have several reports of TYANs (forgot the model#)
  that fail all over. I have not verified if ASUS has done some HW
  trickery or if it's just a BIOS matter. However the Serverworks
  ROSB4 chip is not one I would recommend using; if you need serious
  ATA support on such a board, install a Promise TX2 or later or a
  HPT370 or later ...

I don't much care about serious ATA support on these machines --
nearly all work is done on NFS volumes exported from an alpha.  If I
can just trust PIO not to corrupt the system disk, then it will be
fine for me.

So.. Is PIO safe?  Is there any sort of CRC being done on PIO data?

Thanks,

Drew




Re: Serverworks ATA controller data corruption

2002-02-20 Thread Andrew Gallatin


Terry Lambert writes:
   So.. Is PIO safe?  Is there any sort of CRC being done on PIO data?
  
  He just said: if your chipset is programmed correctly
  by the BIOS, then there will not be a problem, but
  apparently, there is a very narrow band of correctly
  (perhaps even only a single state), and the vendor
  apparently does not default the chip into that state.

I was asking a more general question about ATA -- I know that UDMA has
some sort of CRC protection because (on other machines) I've seen
the occasional error about a bad CRC, retrying.  But what I don't know
is if PIO offers the same protection. 

Drew





Re: Serverworks ATA controller data corruption

2002-02-20 Thread Andrew Gallatin


Terry Lambert writes:
  Andrew Gallatin wrote:
   Terry Lambert writes:
  So.. Is PIO safe?  Is there any sort of CRC being done on PIO data?

 He just said: if your chipset is programmed correctly
 by the BIOS, then there will not be a problem, but
 apparently, there is a very narrow band of correctly
 (perhaps even only a single state), and the vendor
 apparently does not default the chip into that state.
   
   I was asking a more general question about ATA -- I know that UDMA has
   some sort of CRC protection because (on other machines) I've seen
   the occasional error about a bad CRC, retrying.  But what I don't know
   is if PIO offers the same protection.
  
  PIO is safe.  The problem with ATA DMA needing the CRC is
  to recover from the case where the DMA is aborted in the
  middle, which is not signalled (this was the problem with
  the CMD640B ATA chipset interface on Intel).

Or marginal cables, I'd assume.

  In fact, you might want to try enabling the CMD640B workaround
  on your system, even though it is not probing a CMD640B
  present, and see if that fixes it (the chipset in question
  might be using the same macrocell in its implementation, or
  it might just be similarly buggy).  If that worked, then you
  could leave the DMA enabled.

Ick.  No thanks.

  PIO makes the host CPU do the work... basically, it's like a
  WinModem, only for ATA interfaces, and it's documented.  8-(.
  
  Actually, now that I think about it, using the main CPU and
  doing PIO might be better anyway, given the speed difference
  between the main CPU and the DMA engine on the ATA chip; the
  overall performance may even be up to 2x better using the
  host CPU to do the work, particularly if you special case
  the transfer alignment, the way bcopy does.

Not without write combining, at least, and PIO reads suck for x86s
almost universally.  To add insult to injury, most revs of this chip
have a well known PIO corruption bug when write combining is enabled.


Drew




Re: mmap and PROT_WRITE

2002-02-14 Thread Andrew Gallatin


Jason Mawdsley writes:
  Why can't I write to memory in the first case?
  
  Is there any way I can implement writable but not readable memory?
  
  I read somewhere that there is no true write-only memory due to the
  limitations of x86.

I think you must have read correctly -- your sample code runs fine (both
cases) on FreeBSD/alpha.  The same test program dumps core on
FreeBSD/i386.

Cheers,

Drew




Re: requesting guidance for updating the RocketPort driver

2002-02-09 Thread Andrew Gallatin


John Baldwin writes:
  
  On 09-Feb-02 Julian Elischer wrote:
   The infrastructure needed for a new driver can be taken from 
   the sample driver in /usr/share/examples/drivers/make_device_driver.sh
   IN -CURRENT. (use cvsweb on the website to get it)
   
   that will at least get rid of the 'shims' stuff.
  
  There is already a newer driver in current.  I've backported it but it didn't
  help me get my RocketPort working. :-P  (I think my rocketport has other issues
  though as other people have had success with the cards on the old driver.)  The
  backport is fairly easy.  You need to basically take the src/sys/dev/rp/
..

I've done essentially that and have been using it to drive serial
consoles off my alpha for a few months now on -stable with no problems.
I needed the newer driver from -current because the old driver isn't
bus-space-ified and doesn't work on alpha.

 rp0: RocketPort PCI port 0x10180-0x101bf irq 9 at device 10.0 on pci0
 RocketPort0 (Version 3.02) 8 ports.


Try http://people.freebsd.org/~gallatin/rp.tgz.  I never bothered to
touch all the files, so that means you must build it as a module:

cd /usr/src
fetch http://people.freebsd.org/~gallatin/rp.tgz
tar zxf rp.tgz
cd sys/modules/rp
make depend && make && make install
kldload rp
(this assumes you're not already running a kernel with old driver
built in)

All the standard disclaimers apply. Don't blame me if it blows up your
computer, gives you an ulcer or gives your cat hairballs...

Anyway, let me know if it works any better for you than the old driver
in -stable.  I've been reluctant to commit it since I don't want to be
responsible for maintaining it and just running serial consoles at
9600 baud doesn't push things hard enough to find bugs.

Drew





Re: Does anyone know if the Broadcom BCM5700 has problems with HW csum?

2001-12-16 Thread Andrew Gallatin


David Greenman writes:
  David Greenman wrote:
   In any case, disabling it is what ClickArray ended up doing, as well,
   for the Tigon II, until the firmware could be fixed.
   
  We're talking about the Tigon III (bge driver for Broadcom BCM5700/BCM5701).
  
  Crap.  Thanks for the info.
  
  Have you manually calculated the checksum on a bad packet to see
  how it's off?
  
 Yes. It's typically off by 0x1051, but varies depending on the TCP/IP
  header contents.

Hmm.. Since you've already got the code for calculating the checksum
in the driver written, why not use it?  E.g., why not pass the csum up &
set CSUM_DATA_VALID iff the csum ends up being 0? Are you worried that
the firmware will yield false positives too?
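
The "ends up being 0" test can be sketched in userland as follows. This is
not the bge driver's code (which is not shown in this thread); the function
names are hypothetical, and the TCP pseudo-header that a real check also
covers is left out to keep the sketch short. It computes the standard
one's-complement Internet checksum: run over data that already carries a
correct checksum field, the complemented sum comes out as zero.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum over len bytes, returned complemented.
 * Over a buffer that includes a correct checksum field, the result is 0.
 */
static uint16_t
in_cksum_sketch(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;	/* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries */
	return (uint16_t)~sum;
}

/* Nonzero if the data verifies, i.e. the driver could mark it valid. */
static int
csum_ok(const uint8_t *buf, size_t len)
{
	return in_cksum_sketch(buf, len) == 0;
}
```

In a driver, the mbuf would get CSUM_DATA_VALID only when a check like
csum_ok() passes; otherwise the packet goes up unmarked and the stack
verifies it in software as usual.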

Drew




Re: TCP Performance Graphs

2001-11-30 Thread Andrew Gallatin


Leo Bicknell writes:

  The question that immediately comes to mind is, why not simply use
  as big a value as possible?  The problem comes down to buffering
  the data, and busy servers may have to buffer a lot of data.  Having
  a 1 meg window size may have you buffer 1 meg per connection.  Note
  that FreeBSD's current buffer management is particularly stupid in
  that it will _always_ buffer 1 Meg, need it or not.  Until we fix
  this we need an interim solution.
  

I thought that I heard a few months ago that Matt Dillon was looking
at ways to dynamically size tcp windows from within the kernel.  Maybe
I'm on crack.

Maybe we should look at the Dynamic Right-Sizing work being done at
LANL.  See "Dynamic Adjustment of TCP Window Sizes" and 
"Dynamic Right-Sizing: A Simulation Study" at
http://public.lanl.gov/radiant/publications.html 

Cheers,

Drew




Re: 64bit Ethernet Card (if_sf driver)

2001-10-04 Thread Andrew Gallatin


[EMAIL PROTECTED] writes:
  
  Anyone with experience or ideas?
  

Because of the alignment constraints of the card, the driver is copying
every single packet it receives.  This is required on alpha (and
possibly other platforms) to prevent an unaligned access.  In a
forwarding situation on an x86, it is suboptimal.

Try making the m_devget in the rcv handler conditional on !i386 (see
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/nge/if_nge.c.diff?r1=1.13.2.2&r2=1.13.2.3
for an example of how to change this)

I'd be interested to hear (quantitatively) how much your perf changes.

Drew




Re: timestamp offload [was Re: TCPIP cksum offload on FreeBSD 4.2]

2001-09-28 Thread Andrew Gallatin


Louis A. Mamakos writes:
  
  Some work I did a year or so ago measured the interrupt response time
  latency, and it was pretty impressive at how large and variable it
  could be.  
  
  louie

Yes.  Me too, but with a pamette, not a nic.

Have you read the pci pamette perf paper ("Systems Performance Measurement
on PCI Pamette" (1997), Laurent Moll & Mark Shand)?
http://citeseer.nj.nec.com/1690.html

If anybody cares, I have freebsd drivers for the pamette.

Drew





Re: TCPIP cksum offload on FreeBSD 4.2

2001-09-28 Thread Andrew Gallatin


Terry Lambert writes:
  Jonathan Lemon wrote:
   I'm trying to use the TCPIP checksum offload capability of the Netgear
   GA620 NIC from a SMP FreeBSD 4.2R system running on a typical PIII SBC.
..
  
  He didn't say his packet size, either.
  
  To the original poster: if you are sending jumbograms, the
  buffer size on these cards is limited, so the entire packet
  can't be in the card buffer at the same time, which means
  that you can not offload the send checksum for jumbograms,
  only for regular sized packets.

This is an Alteon Tigon-2 (ti driver) based card with 512K of sram on board.
It has plenty of space for offloading transmit checksums on jumbo frames.

Perhaps you're thinking of the DP83820/DP83821 (nge driver), which
cannot compute the checksum on an outgoing frame unless it fits in the
8K tx fifo.  I think NetGear sells a card with a similar name (GA622T)
based around this chip.

Drew






Re: TCPIP cksum offload on FreeBSD 4.2

2001-09-27 Thread Andrew Gallatin


Ronald G Minnich writes:

  I have a question on the checksum offloading. Has anyone measured any
  incidence of data corruption between the PCI card and memory. In other
  words, when you offload checksums the end-to-end checking becomes
  card-to-card checking, and the possibility exists that what goes in memory
  at the destination end is not what was sent at the source. Very remote
  possibility, of course, but ...

We used to see occasional data corruption at Duke with 440BX based
motherboards with non-ecc ram. We never saw it on higher-quality hosts
(alphas or serverworks based pc motherboards) with ecc memory.  It
would manifest itself as bad TCP checksums (no csum offload at the
time).

  of these types of problems (of course FreeBSD has the fastest IP over
  Myrinet anyway, so it's not like that's a huge problem).
  

Not any more.  A 2.4 linux kernel will do a bit better than FreeBSD on
an SMP box because it is able to use both processors.

Speaking of which -- who is working on making the network stack SMP
capable in -current?  Anything I can do to help?

Drew





Re: TCPIP cksum offload on FreeBSD 4.2

2001-09-27 Thread Andrew Gallatin


Louis A. Mamakos writes:
  The other type of failure you might not catch are software errors; that
  is, where a packet is produced by the network stack and then is
  subsequently stomped on by a random store from some other code.  Or
  a mis-programmed I/O card with scatter/gather capability doesn't pick 
  up what was intended, etc.  The Internet checksum is useful for
  detecting this class of error.
  

No, you're missing the point almost entirely.  The checksum is not
skipped.  It is calculated by the DMA engine based on the data that's
transferred across the I/O bus on the receiver (and / or the sender).
If the data is incorrect as seen by the receiving nic, the checksum
will be wrong and the packet will be dropped.

If the packet lands in the wrong place, you have much worse problems. 

Drew




Re: TCPIP cksum offload on FreeBSD 4.2

2001-09-27 Thread Andrew Gallatin


Ronald G Minnich writes:
  
  you still have a potential problem here with variance in chipsets, namely
  the case of broken ABORT or other unusual PCI cycle handling (missed word
  problem). I agree it's a low probability. But we've seen it, just a week
  or two ago on a brand new box.
  
  But then we tend to see things here nobody else sees due to our scale.
  
  ron

At this level, you're basically screwed.  A software checksum isn't
even an option on other PCI users, like disk controllers.  If you
don't trust your PCI chipset, what do you do about things like that? 

I'm rather curious -- what was the problematic hardware combination?

Drew






Re: TCPIP cksum offload on FreeBSD 4.2

2001-09-27 Thread Andrew Gallatin


Ronald G Minnich writes:
  On Thu, 27 Sep 2001, Andrew Gallatin wrote:
  
   At this level, you're basically screwed.  A software checksum isn't
   even an option on other PCI users, like disk controllers.  If you
   don't trust your PCI chipset, what do you do about things like that?
  
   I'm rather curious -- what was the problematic hardware combination?
  
  Can't say yet :-(
  
  But it is one of the fancy network interfaces that essentially runs an
  RTOS on the NIC so it can help you. Actually fancy $5000 network
  interfaces are in general less reliable than your average garden-variety
  $2 IDE chip. Partly because they have so much capability.
  
  So we don't worry a lot about lossage with IDE. But it's a big problem on
  expensive, high end, high performance network interfaces.

But SCSI isn't immune either.  We had some data corruption problems
with early adaptec Ultra-2 scsi controllers too, before Justin fixed
it by working around it in the driver.

Basically, anything that uses a PCI chipset harder or in different
ways than its designers expected can end up being a problem.  Low
volume hardware is sometimes worse, but not always...

Drew




Re: TCPIP cksum offload on FreeBSD 4.2

2001-09-27 Thread Andrew Gallatin


Louis A. Mamakos writes:

  I was referring to the case on the transmit side where the wrong
  data gets gathered up by the DMA engine because of software related
  errors.  You get a valid checksum, but for the wrong data.  You might
  have the wrong data because a driver screwed up setting the DMA descriptors,
  or some other I/O transfer splatted over the buffer waiting in a 
  transmit queue.

What happens if that same i/o transfer splatted over the buffer
waiting in user space prior to the copyin, or sitting in
the socket buffer prior to a software checksum being done?
Software checksums are not quite the panacea you make them out to be. 
And they're very expensive.

Geez.  All I wanted to do was pat Jonathan on the back for coming up
with what is apparently the most flexible and well thought out
mechanism out there.  

These issues have been argued to death; I don't feel like arguing with
you.  I'm satisfied that I'm not going to convince you & you're not
going to convince me. 

Drew













Re: TCPIP cksum offload on FreeBSD 4.2

2001-09-27 Thread Andrew Gallatin


Louis A. Mamakos writes:
  
  Folks ought to consider the likelihood of this class of data
  corruption, unlikely as it is, and weigh it along with the impact on
  your application, and the differences in performance and loading.
  

Agreed.  Very well said, by the way..

Drew




Re: ecc on i386

2001-09-25 Thread Andrew Gallatin


Peter Wemm writes:


Thanks for your description of how ECC is reported on PCs.  That was
very, very helpful.

  The Tyan Thunder 2510 BIOS even disables ECC -> NMI routing so you have to
  go to quite a bit of trouble to reprogram the serverworks chipset to
  actually generate NMI's so that you can find out if something got trashed.

Is that the He-Sl or the LE-3 chipset?  Is that code available?
I have some LE-3 based boxes which I'd like to be certain DTRT.

Unlike my wife's Dual Athlon, these boxes have nothing in their
BIOS pertaining to ECC error reporting. (Supermicro 370-DLE)

  Our NMI / ECC handling really really sucks in FreeBSD. Consider:
  - i686_pagezero - reads before writing in order to minimize cache snooping
  traffic in SMP systems.  However, if it gets an NMI while trying to check
  if the cache line is already zero, it will take the entire machine down
  instead of just zeroing the line.
  - NFS / VM / bio:  when they get an NMI while trying to copy data that is
  clean and backed by storage, they take the machine down instead of trying
  to recover and re-read the page.
  - userland.. If userland gets an NMI, the machine dies instead of killing
  the process (or rereading a text page etc if possible)
  - our NMI handlers are a festering pile of excrement.  They don't have
  the code to 'ack' the NMI so it isn't possible to return after recovery.
  - and so on.

Well, at least we take the machine down, which is a heck of a lot
better than ignoring the problem, which is really all that I was
hoping for. 

Thanks again,

Drew




ecc on i386

2001-09-24 Thread Andrew Gallatin


What happens on an ECC equipped PC when you have a multi-bit memory
error that hardware scrubbing can't fix?  Will there be some sort of
NMI or something that will panic the box?

I'm used to alphas (where you'll get a fatal machine check panic) and
I am just wondering if PCs are as safe.

Thanks,

Drew






Re: ecc on i386

2001-09-24 Thread Andrew Gallatin


Matt Dillon writes:
  
  :What happens on an ECC equipped PC when you have a multi-bit memory
  :error that hardware scrubbing can't fix?  Will there be some sort of
  :NMI or something that will panic the box?
  :
  :I'm used to alphas (where you'll get a fatal machine check panic) and
  :I am just wondering if PCs are as safe.
  :
  :Thanks,
  :
  :Drew
  
  ECC can typically detect and correct single bit errors and detect
  double bit errors.  Anything beyond that is problematic... it may or
  may not detect the problem or may mis-correct a multi-bit error. 
  An NMI is generated if an uncorrectable error is detected.
  
  On PC's, ECC is optional.  Desktops typically do not ship with ECC
  memory.  Branded servers typically do.  A year or two ago I would
  have been happy to use non-ECC rams (finding bad RAM through trial
  and error), but now with capacities as they are and memory prices down
  ECC is definitely the way to go.

My sentiments exactly.
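
The "correct single bit errors, detect double bit errors" behaviour Matt
describes can be illustrated with a toy code. The sketch below is an
assumption-laden miniature, not what any memory controller implements: real
ECC DIMMs protect 64 data bits with 8 check bits, while this protects 4 data
bits using a Hamming(7,4) code plus one overall parity bit. All names are
made up for the example.

```c
#include <stdint.h>

enum ecc_status { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* XOR of all eight bits: 0 if an even number of bits are set. */
static uint8_t
parity8(uint8_t x)
{
	x ^= x >> 4;
	x ^= x >> 2;
	x ^= x >> 1;
	return x & 1;
}

/*
 * Codeword layout by bit position: 0 = overall parity, 1/2/4 = Hamming
 * parity bits, 3/5/6/7 = data bits d1..d4.
 */
static uint8_t
secded_encode(uint8_t nibble)
{
	uint8_t d1 = nibble & 1, d2 = (nibble >> 1) & 1;
	uint8_t d3 = (nibble >> 2) & 1, d4 = (nibble >> 3) & 1;
	uint8_t p1 = d1 ^ d2 ^ d4;	/* covers positions 1,3,5,7 */
	uint8_t p2 = d1 ^ d3 ^ d4;	/* covers positions 2,3,6,7 */
	uint8_t p4 = d2 ^ d3 ^ d4;	/* covers positions 4,5,6,7 */
	uint8_t cw = (uint8_t)(p1 << 1 | p2 << 2 | d1 << 3 | p4 << 4 |
	    d2 << 5 | d3 << 6 | d4 << 7);

	return cw | parity8(cw);	/* make the total parity even */
}

static enum ecc_status
secded_decode(uint8_t cw, uint8_t *nibble)
{
	unsigned pos, syndrome = 0;
	uint8_t odd = parity8(cw);

	/* XOR of the positions of all set bits; 0 for a valid codeword. */
	for (pos = 1; pos <= 7; pos++)
		if ((cw >> pos) & 1)
			syndrome ^= pos;

	if (syndrome != 0 && !odd)
		return ECC_UNCORRECTABLE;	/* two bits flipped */
	if (syndrome != 0)
		cw ^= (uint8_t)(1u << syndrome);	/* fix the single bad bit */

	*nibble = (uint8_t)(((cw >> 3) & 1) | ((cw >> 5) & 1) << 1 |
	    ((cw >> 6) & 1) << 2 | ((cw >> 7) & 1) << 3);
	return (syndrome != 0 || odd) ? ECC_CORRECTED : ECC_OK;
}
```

Flipping three or more bits can make the decoder "correct" the wrong bit,
which is the mis-correction case mentioned above.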

  Bit errors can come from many sources, memory being only one.  Bit errors
  can occur inside the cpu chip, in the L1 and L2 caches, in memory, in
  controller chips... all over the place.  Many modern processors implement
  parity on their caches to try to cover the problem areas.  I'm not sure
  how Pentium III's and IV's are setup.
  
   -Matt

Hmm.. Well, it turns out that the box I'm interested in (Thunder K7)
can be set to send an SERR on multiple bit errors.  I wonder what
happens when a pc gets an SERR? (that's another machine check
on alpha)

Drew




Re: any reason to use m_devget in the dc driver ?

2001-09-21 Thread Andrew Gallatin


Luigi Rizzo writes:
  Does anyone know of specific reasons to use m_devget()
  to extract received packets from the rx buffer in the dc
  driver, as opposed to passing up the mbuf and just
  replacing it with a fresh one in the controller's queue ?
  
  Other drivers just happily do the latter, including the de
  driver, so there seems to be no problem with the chipset
  in handling this ?

I imagine that this was done to follow alignment constraints on
non-i386 platforms where having the ip header misaligned is fatal.
(the tulip is not capable of byte granularity DMA, so you can't
intentionally misalign the ethernet header & end up with an aligned IP
header)

I imagine the i386 should be made an exception. See rev 1.17 of
sys/dev/nge/if_nge.c
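
The alignment problem comes down to simple arithmetic, sketched below with
made-up helper names. The Ethernet header is 14 bytes, so if the NIC writes
the frame at the start of an aligned buffer the IP header begins at offset
14, which is not a multiple of 4. A chip that can DMA to a 2-byte offset
puts the IP header at offset 16; one that cannot leaves the driver to copy.

```c
#include <stddef.h>

/* dst MAC (6) + src MAC (6) + ethertype (2) */
#define ETHER_HDR_LEN_SKETCH	14

/*
 * Offset of the IP header when the NIC starts writing the frame
 * buf_off bytes into an aligned receive buffer.
 */
static size_t
ip_hdr_offset(size_t buf_off)
{
	return buf_off + ETHER_HDR_LEN_SKETCH;
}

/* Nonzero if the IP header lands on a 32-bit boundary. */
static int
ip_hdr_aligned(size_t buf_off)
{
	return ip_hdr_offset(buf_off) % 4 == 0;
}
```

Here ip_hdr_aligned(0) is false and ip_hdr_aligned(2) is true, which is why
strict-alignment platforms need either the 2-byte shift or a copy.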

Drew





Re: any reason to use m_devget in the dc driver ?

2001-09-21 Thread Andrew Gallatin


Terry Lambert writes:
  Andrew Gallatin wrote:
   I imagine that this was done to follow alignment constraints on
   non-i386 platforms where having the ip header misaligned is fatal.
   (the tulip is not capable of byte granularity DMA, so you can't
   intentionally misalign the ethernet header & end up with an aligned IP
   header)
  
  This is the reason: the ethernet header is 14 bytes.
  
  
   I imagine the i386 should be made an exception. See rev 1.17 of
   sys/dev/nge/if_nge.c
  
  I disagree with this code; the elements in the header
  are referenced multiple times.  If you are doing the
  checksum check, you might as well be relocating the data,
  as well.  The change I would make would be to integrate
  the checksum calculation with the m_devget(), to ensure
  a single pass, in the case that m_devget() must be used
  to get aligned packet payload, and the checksum has not
  been offloaded to hardware.

Interesting idea... However, what if you're a bridge or a router?
You've just done a whole lot of work for nothing.  I imagine it's just
this case that Luigi cares about.

If you want to integrate a checksum & a copy, it should really be done
at the copyout() stage. 

Drew




Re: Driver structures alignment

2001-09-14 Thread Andrew Gallatin


Peter Wemm writes:
  The same goes for __format_arg(n) in stdio.h.  And so on.  We've been pretty
  clean about it so far, but a few have slipped through.
  

That __format_arg, btw, breaks the Compaq CCC compiler & causes us to
have to override stdio.h because of just that one line.

Does your comment mean this has a chance of getting fixed?

Thanks,

Drew




Re: TCSH bug...

2001-08-28 Thread Andrew Gallatin


Steve Ames writes:
   We *do* know who that is.  This however is a more tcsh-specific issue,
   and raising it with the tcsh author would probably lead you to faster
   happiness.  Is there some reason you won't email him about this?
  
  Except it isn't tcsh specific really. 
  
  Our config.h in /usr/src/bin/csh defines SYSMALLOC. The port does not.
  The port works, the system version doesn't. If you comment out SYSMALLOC
  in /usr/src/bin/csh/config.h and recompile then the TCSH bug goes away.
  
  Now you could argue that perhaps the definition of SYSMALLOC just exposes
  a bug in tcsh? OTOH, since the system version in -STABLE also defines
  SYSMALLOC and still manages to work... you could also argue that this points
  to some other bug in -CURRENT... lastly it could be argued that I'm barking
  up completely the wrong tree. *shrug*

Actually, it is a tcsh bug. Try playing with the MALLOC_OPTIONS
env. variable in -stable.  Specifically, set it to 'AJ' & I bet it will
drop core in -stable.  Eg:

12:10pm thunder/gallatin:/tmp> uname -sr
FreeBSD 4.4-RC
12:10pm thunder/gallatin:/tmp> setenv MALLOC_OPTIONS 'AJ'
12:10pm thunder/gallatin:/tmp> tcsh
tcsh 6.10.00 (Astron) 2000-11-19 (alpha-digital-FreeBSD) options
8b,nls,dl,al,kan,sm,rh,color,dspm
12:10pm thunder/gallatin:/tmp> set rmstar
12:10pm thunder/gallatin:/tmp> rm *
Do you really want to delete all files? [n/y] n
Segmentation fault (core dumped)

Note that -current has malloc options 'AJ' on by default to catch just
this kind of bug. 

Drew




Re: NatSemi DP83820 gigE driver kit for 4.2 and 4.3

2001-07-16 Thread Andrew Gallatin


[EMAIL PROTECTED] writes:
  A more important question is are these 32-bit cards, and if so, do they have 
  enough internal buffer to do sustained 1GB transfers. Generally 32-bit PCI 
  is too slow for GB, as it can't do sustained 1GB transfers. Some 32-bit GB 
  cards are just a total waste.

The two cards that I have experience with are the Netgear GA622T and
SMC9462TX.  Both are 64-bit/66MHz cards.
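
As a rough sanity check on the bus question, the theoretical peak of a PCI
bus is just width times clock. The helper below is a back-of-the-envelope
sketch with a made-up name, using nominal 33 and 66 MHz clocks; sustained
throughput is lower in practice because of arbitration and other bus
overhead, which is why plain 32-bit/33MHz PCI is marginal for gigabit.

```c
/*
 * Theoretical peak PCI bandwidth in Mbit/s: one transfer of width_bits
 * per clock cycle.  Real sustained rates are lower.
 */
static unsigned
pci_peak_mbps(unsigned width_bits, unsigned clock_mhz)
{
	return width_bits * clock_mhz;
}
```

With these nominal figures, pci_peak_mbps(32, 33) gives 1056 and
pci_peak_mbps(64, 66) gives 4224, so the 64-bit/66MHz cards have roughly
four times the headroom.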

The first nge cards we tried were a pair of Netgear GA622T boards.
They leave a lot to be desired.  We put them in our Dell PowerEdge
4400 boxes (Serverworks chipset with interleaved ram and 64-bit/66MHz
PCI, 733MHz Xeon) & hooked them up through our Extreme Summit 7i
Gigabit switch (Copper).  They have a decent packets/second rate for
minimally sized packets (155,000 packets/sec or so), but they have
serious trouble filling the link with UDP packets -- even with jumbo
frames, I can't seem to push more than 450Mb/sec out of them.

At this point, we figured the NatSemi DP8382x was just a lousy
chipset, so we ordered a pair of SMC9462TX boards.  Based on comments
which used to be in the lge driver, we assumed that they used the
Level 1 LXT1001 chips.  However, we found out that the SMC9462TX
boards that we have use the NatSemi DP8382x.  (Perhaps the SMC9462SX
uses the LXT1001?)

We were pleasantly surprised to learn that the nge based SMC boards do
perform well.  Using the same hosts & switch as above, we can nearly
fill the link with 1500 byte packets (950Mb/sec, I think).  And they
can also sustain more than 155,000 minimally sized packets/sec.  They
can easily fill the link with jumbo frames, but then there's that 8k
tx fifo checksum limitation.

Hope this helps,

Drew




Re: NatSemi DP83820 gigE driver kit for 4.2 and 4.3

2001-07-16 Thread Andrew Gallatin


Bill Paul writes:
  by user programs, but these don't panic the system. In the case of
  FreeBSD/alpha, we fake it up so know about the problem but the process
  keeps running. Some OSes (e.g. Solaris) clobber the process with a
  SIGBUS. Some would argue the latter behavior is better since it makes
  it easier to find and fix what is probably a bug in the first place.

Actually, you can control this behaviour with the uac (1) command on
FreeBSD/alpha. 'uac -s' causes unaligned access errors to result in a
SIGBUS being delivered to the parent and its future descendants.
You can also enable/disable printing of errors, etc.  Really handy
when you're using a ghostscript not built w/Compaq C.
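
For code you control, the usual way to avoid tripping these unaligned
access errors in the first place is to stop dereferencing a cast pointer
and copy the bytes instead. A minimal sketch, with a hypothetical helper
name:

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value from an address that may not be 4-byte aligned.
 * Unlike *(const uint32_t *)p, memcpy makes no alignment assumption, so
 * the compiler emits an access sequence that is safe on strict-alignment
 * machines.
 */
static uint32_t
load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

On x86 this typically compiles down to a single load; on alpha it becomes
the multi-instruction unaligned load sequence rather than a trap.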

Also, Tru64 has a similar command with the same name and different syntax.

Drew




Re: How to disable software TCP checksumming?

2001-05-29 Thread Andrew Gallatin



Jesper Skriver writes:
  On Tue, May 29, 2001 at 02:41:14PM -0500, Bob Willcox wrote:
   Hi,
   
   I am working on a device driver for a GSN adapter that has hardware CRC
   checking and need to know if there is a way to disable the software CRC
   checking for TCP?  This is on a FreeBSD 4.2-stable system.
  

Eegads.  I think the original poster wanted to be able to use the
hardware CRC features of his nic, not ignore checksums altogether.

Bob -- Take a look at the /sys/pci/if_ti.c driver for an example of
how to use hardware checksum assist.

On the receive side, you want to set the m_pkthdr.csum_flags
appropriately (depending on what your device can do) on each receive,
as well as fill in the actual checksum in m_pkthdr.csum_data.

On the send side, you need to specify what your device is capable of
assisting with in the if_hwassist field of your driver's ifp struct.
Packets will come down w/o those fields filled in.  The stack will
expect your device to calculate those fields in hardware.
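
The two halves described above can be mocked up in userland roughly as
follows. Everything here is a stand-in: the structs are simplified
placeholders for the mbuf packet header and a hypothetical receive
descriptor, and the flag values are illustrative only (a real driver gets
the CSUM_* definitions from <sys/mbuf.h>). See if_ti.c for the real thing.

```c
#include <stdint.h>

/* Illustrative values only; real drivers use <sys/mbuf.h>. */
#define CSUM_IP		0x0001	/* tx: device will checksum IP header */
#define CSUM_TCP	0x0002	/* tx: device will checksum TCP */
#define CSUM_UDP	0x0004	/* tx: device will checksum UDP */
#define CSUM_IP_CHECKED	0x0100	/* rx: IP header checksum was examined */
#define CSUM_IP_VALID	0x0200	/* rx: ... and it was correct */
#define CSUM_DATA_VALID	0x0400	/* rx: csum_data holds a usable sum */

/* Stand-in for the m_pkthdr fields mentioned above. */
struct pkthdr_sketch {
	int csum_flags;
	int csum_data;
};

/* Hypothetical receive descriptor filled in by the NIC. */
struct rx_desc_sketch {
	uint16_t ip_cksum;	/* 0xffff when the IP header verified */
	uint16_t tcp_udp_cksum;	/* hardware-computed TCP/UDP sum */
};

/* Receive side: report what the hardware already checked. */
static void
rx_fill_csum(struct pkthdr_sketch *ph, const struct rx_desc_sketch *d)
{
	ph->csum_flags |= CSUM_IP_CHECKED | CSUM_DATA_VALID;
	if ((d->ip_cksum ^ 0xffff) == 0)
		ph->csum_flags |= CSUM_IP_VALID;
	ph->csum_data = d->tcp_udp_cksum;
}

/* Send side: the value a driver would store in if_hwassist. */
static int
tx_hwassist(void)
{
	return CSUM_IP | CSUM_TCP | CSUM_UDP;
}
```

A device that can only do part of this would advertise fewer bits in
if_hwassist and set fewer flags on receive.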

I believe these features appeared around 4.1, so if this is a 3rd
party driver, you may want to check __FreeBSD_version = 41.


Hope this helps,

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: _SC_NPROCESSORS_CONF

2001-05-20 Thread Andrew Gallatin


Arun Sharma writes:
  Single UNIX spec doesn't include the above sysconf(3) argument, but 
  many UNIX variants do. What's the BSD way of doing this ? 

How about the hw.ncpu sysctl?
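
A minimal sketch of reading that sysctl from C, assuming sysctlbyname(3) is
available (it is on FreeBSD). The function name cpu_count is made up, and
the fallback branch for systems without hw.ncpu is only there so the
example builds elsewhere.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/sysctl.h>
#endif

/* Number of CPUs, or -1 on error. */
static int
cpu_count(void)
{
#if defined(__FreeBSD__) || defined(__APPLE__)
	int ncpu = 0;
	size_t len = sizeof(ncpu);

	if (sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0) == -1)
		return -1;
	return ncpu;
#else
	/* No hw.ncpu here; use the sysconf(3) name the thread began with. */
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return n > 0 ? (int)n : -1;
#endif
}
```

From the shell, "sysctl hw.ncpu" prints the same value.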

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: md disks more than one from a kldload?

2001-05-18 Thread Andrew Gallatin


Jaye Mathisen writes:
  
  I kldload md.ko.  First md device comes up just peachy.
  
  however, attempts to now create an md1 fail with device not
  configured.

If you're feeling brave, I just back ported the all-singing /
all-dancing md device from -current today (I wanted a
size-configurable, non MFS malloc disk for something).  I haven't
pushed it very hard, but multiple disks appear to work from a module.

Apply the patch at http://people.freebsd.org/~gallatin/md.diff
Then grab sys/sys/mdioctl.h and sbin/mdconfig from -current.
You'll need to make the mdctl device node yourself (95, 0x00ff)

If anybody else feels like testing this, please do so.  Is there
some interest in an MFC?

Cheers,

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: vmware on freebsd for fast booting for devel.

2001-04-25 Thread Andrew Gallatin


Sven Huster writes:

  OT FYI:
  
  Check the ISP1100 from Intel if you like
  support for PIII up to 850
  2GB RAM
  2 x Intel Network onboard (includes pxe boot, possible on both)
  full serial console (even for access to bios setup)

Hmm.. We have some Dell PowerEdge 1550s that do this (nice machines,
but horrible bootstones).  But I've got a basic problem with console
redirection on PCs that we don't see on Alphas or Suns.

The problem is that I cannot figure out how in the hell to hit F2 in
my environment.  My environment is essentially telnet'ing into a
console server from an xterm.  Hitting Ctrl-A for the scsi bios
works just fine & dandy..

Anybody know how to make ansi function keys work from an xterm?

Thanks,

Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: Re[2]: vmware on freebsd for fast booting for devel.

2001-04-25 Thread Andrew Gallatin


Walter Hop writes:
  [in reply to [EMAIL PROTECTED], 25-04-2001]
  
   Interesting.  What happens if it's like the reverse where one runs
   FreeBSD under vmware from Windows2000?  Since 5-10% seems to be really
   slow.
  
  I always try out new applications in a virtual machine running FreeBSD
  on my Windows workstation, it's lovely. I/O is painfully slow, but
  in normal situations performance is 10%... (PII-350, 256MB ram)

Note that the 5-10% I was talking about is just the tertiary
bootloader (/boot/loader).  I mentioned it because the original poster
was primarily concerned about 'bootstones' -- in more normal
situations (ie, once the kernel is loaded) I'd say performance is more
like 40-80% of native.
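
(Editorial illustration, not part of the original message.)  To make
the two ranges easier to compare, the sketch below converts "runs at N%
of native speed" into "takes X times as long".  The function name is
made up for the example; the percentages are the ones quoted in this
thread.

```python
def slowdown_factor(percent_of_native):
    """Convert 'runs at N% of native speed' into 'takes X times as long'."""
    if percent_of_native <= 0:
        raise ValueError("percentage must be positive")
    return 100.0 / percent_of_native


# Ranges quoted in the thread: (label, low %, high %).
for label, low, high in [("boot loader", 5, 10), ("after kernel load", 40, 80)]:
    best = slowdown_factor(high)   # higher percentage means less slowdown
    worst = slowdown_factor(low)
    print(f"{label}: {best:.2f}x to {worst:.2f}x as long as native")
```

So a loader stage at 5-10% of native speed takes 10 to 20 times as
long, while 40-80% of native works out to 1.25 to 2.5 times as long.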

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: vmware on freebsd for fast booting for devel.

2001-04-24 Thread Andrew Gallatin


Alfred Perlstein writes:
  So I've got this really elite machinery here to test on, problem is that
  booting takes about 2 minutes each time I make a bad kernel, s...

Do you mean that vmware boots so slowly that the extra reboot cycle
required to install the next test kernel is painfully slow?  

One thing to try to speed up vmware boots would be getting rid of the
spinner in libstand -- VMware's dos-mode console i/o is painfully
slow.

The best way to cut the reboot wait time down is to network boot.
Unfortunately, VMware's AMD PCInet card doesn't support PXE.  Somebody
here has been using something called grub
(http://www.gnu.org/software/grub/).

Grub doesn't support FreeBSD very well (eg, it can't set the root
device, set hints, etc).  I think he was hacking grub to add those
features, but I don't know how far he got...BTW, grub has no spinner.

  Anyone using anything like vmware in order to have a rapid reboot/test
  cycle for low level FreeBSD kernel coding?  How fast is it to

I've actually found real hardware to be much faster than vmware in
most cases. My dream quick-reboot box has no scsi disks, can skip the
memory test, has a serial console, and loads its kernels via pxe.

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590






Re: vmware on freebsd for fast booting for devel.

2001-04-24 Thread Andrew Gallatin


Alfred Perlstein writes:
  * Andrew Gallatin [EMAIL PROTECTED] [010424 14:44] wrote:
   
   Alfred Perlstein writes:
     So I've got this really elite machinery here to test on, problem is that
     booting takes about 2 minutes each time I make a bad kernel, s...
   
   Do you mean that vmware boots so slowly that the extra reboot cycle
   required to install the next test kernel is painfully slow?  
  
  I actually haven't tried vmware yet, I was hoping to utilize the
  lists to find out others' experiences wrt using vmware like I
  wish to.

Ah.. the 2 minutes above made it sound like you were already using
it. 

If you do start to use it (running -current as a guest), make sure to
use the i386 path for atomic_cmpset_int() unconditionally -- somehow
the cmpxchgl is finding a very slow path through the emulator.
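
(Editorial illustration, not part of the original message.)  For
readers unfamiliar with the primitive: atomic_cmpset_int(dst, exp, src)
stores src into *dst only if *dst still equals exp, and reports whether
it did.  The sketch below models only those semantics; the class and
function names are made up, and the lock merely stands in for whatever
makes the real operation atomic (a cmpxchgl instruction, or the 386
fallback path mentioned above).  It is not the kernel implementation.

```python
import threading


class IntCell:
    """A shared integer with a compare-and-set operation (illustrative)."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def cmpset(self, expected, new):
        """Store new only if the cell still holds expected; return 1 or 0."""
        with self._lock:
            if self.value == expected:
                self.value = new
                return 1
            return 0


def atomic_increment(cell):
    """Typical retry loop built on compare-and-set."""
    while True:
        old = cell.value
        if cell.cmpset(old, old + 1):
            return old + 1
```

Because loops like atomic_increment() sit under many kernel
primitives, one slow compare-and-set instruction in an emulator would
plausibly be felt everywhere, which fits the advice above.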

   One thing to try to speed up vmware boots would be getting rid of the
   spinner in libstand -- VMware's dos-mode console i/o is painfully
   slow.
   
   The best way to cut the reboot wait time down is to network boot.
   Unfortunately, VMware's AMD PCInet card doesn't support PXE.  Somebody
   here has been using something called grub
   (http://www.gnu.org/software/grub/) 
   
   Grub doesn't support FreeBSD very well (eg, it can't set the root
   device, set hints, etc).  I think he was hacking grub to add those
   features, but I don't know how far he got...BTW, grub has no spinner.
   
     Anyone using anything like vmware in order to have a rapid reboot/test
     cycle for low level FreeBSD kernel coding?  How fast is it to
   
   I've actually found real hardware to be much faster than vmware in
   most cases. My dream quick-reboot box has no scsi disks, can skip the
   memory test, has a serial console, and loads its kernels via pxe.
  
  Yeah, where do i buy one?

Heh.

Most Dell i810 based Optiplexes boot quickly.  You just need to throw
an fxp in there for pxe.  I'm sure other, cheaper, boxes do just as well.
Compared to the full-price vmware, it would probably be quicker to buy
a used p6..

Drew
--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: vmware on freebsd for fast booting for devel.

2001-04-24 Thread Andrew Gallatin


Doug Ambrisko writes:
  | 
  | Grub doesn't support FreeBSD very well (eg, it can't set the root
  | device, set hints, etc).  I think he was hacking grub to add those
  | features, but I don't know how far he got...BTW, grub has no spinner.
  
  Why not just use EtherBoot?

Simple ignorance.

I'll pass that pointer along to the person here who was hacking with
VMware.

Thanks!

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: vmware on freebsd for fast booting for devel.

2001-04-24 Thread Andrew Gallatin


Vincent Poy writes:
   Speaking about vmware, how much of the performance is a vm
  supposed to give compared to the actual processor in a stand-alone
  machine?

It depends on what metric one uses to measure performance.  Boots
(loading kernel) with a graphics console are painfully slow, like
5-10% of native speed. CPU bound programs run at near-native speeds.
I/O bound jobs are much slower.

Memory is a very important factor -- 128MB or less is too little to
run VMware at a reasonable speed. And to conserve memory, it really
helps to use a plain disk rather than using a disk file.  This
entails vmware doing I/O to a raw disk partition rather than to a file
and reduces memory use by eliminating double caching of data by the
host and guest OSes.

FWIW, my old 300MHz PII (128MB ram, disk file) was nearly unusable.
My wife's 400MHz laptop (192MB ram, plain disk) is fairly decent.  My
new 1.2GHz Tbird (1GB ram, plain disk) feels quite fast.  This is for
my workload, which is typically an occasional boot into Windows.  

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: x86-64 Hammer and IA64 Itanium

2001-04-18 Thread Andrew Gallatin


Mike Silbersack writes:
  
  Once that's done, it'll probably be a matter to send a clawhammer
  system and a large box of cheese and crackers to the guys who did the
  freebsd alpha port.  If the architecture is actually so similar to x86,
  it should only take them a few weekends. :)

As one of the FreeBSD/alpha porters, I must point out that I don't
know diddly-squat about low-level x86isms.  I've never even written a
line of x86 assembly.  

What's the timeframe that they're shooting for with this beast, anyway?

Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590



