Re: devfs panic w/INVARIANTS
Kostik Belousov wrote:
> On Thu, Feb 04, 2010 at 03:40:28PM -0500, Andrew Gallatin wrote:
>> I've got a commercial driver that uses device cloning. At unload time, the driver calls clone_cleanup(). When I unload the driver on a kernel built with INVARIANTS, I see a panic in devfs_populate_loop(). This happens in 6-stable as well as 8-stable.
>>
>> From what I can see, the clone has been freed, but it remains on the devfs cdevp_list. The next time devfs_populate_loop() is called, it trips over the bad entry (cdp->cdp_dirents points to 0xdeadc0dedeadc0de). See the appended kgdb session.
>>
>> If I trace the code path, it looks like clone_cleanup() calls destroy_devl(), and destroy_devl() will eventually call devfs_free() if si_refcount is zero. But I don't see anything that removes the cdev from the cdevp_list before it is freed. The only code I see that removes a cdev from the cdevp_list seems to be the "GC any lingering devices" block in devfs_populate_loop().
>>
>> What am I missing?
>
> You did not mention it, but my guess is that you create clones from the dev_clone event handler.

Yes, I do.

> Please note that devfs_lookup(), which fires the dev_clone event, consumes a device reference. Thus clone handlers shall do dev_ref(). Due to races with cleanup, you should use the MAKEDEV_REF flag for the make_dev_credf(9) KPI instead of doing a make_dev()/dev_ref() pair.

I need to support FreeBSD going all the way back to 6, so that's not an option in some versions.

But I'm talking about device removal time. If I call clone_cleanup() where the clones have dev->si_refcount==1, then I get the use-after-free panic. If I hack things to elevate the reference count (such that dev->si_refcount==2 when clone_cleanup() is called), then I don't get the panic.

Are you saying I should have been taking the extra reference in my dev_clone event handler? Won't having the extra reference lead to a memory leak? Or am I just misreading the code, and this will lead to things being freed normally?

> That said, do you really need clones at all?

I need to support FreeBSD back to 6.x, and I need to support the Linux-like model of opening the same /dev/node multiple times and getting unique handles. So I think I need clones.

Thanks for the help!

Drew

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: devfs panic w/INVARIANTS
Kostik Belousov wrote:
> On Fri, Feb 05, 2010 at 08:51:25AM -0500, Andrew Gallatin wrote:
>> Kostik Belousov wrote:
>>> On Thu, Feb 04, 2010 at 03:40:28PM -0500, Andrew Gallatin wrote:
>>>> I've got a commercial driver that uses device cloning. At unload time, the driver calls clone_cleanup(). When I unload the driver on a kernel built with INVARIANTS, I see a panic in devfs_populate_loop(). This happens in 6-stable as well as 8-stable.
>>>>
>>>> From what I can see, the clone has been freed, but it remains on the devfs cdevp_list. The next time devfs_populate_loop() is called, it trips over the bad entry (cdp->cdp_dirents points to 0xdeadc0dedeadc0de). See the appended kgdb session.
>>>>
>>>> If I trace the code path, it looks like clone_cleanup() calls destroy_devl(), and destroy_devl() will eventually call devfs_free() if si_refcount is zero. But I don't see anything that removes the cdev from the cdevp_list before it is freed. The only code I see that removes a cdev from the cdevp_list seems to be the "GC any lingering devices" block in devfs_populate_loop().
>>>>
>>>> What am I missing?
>>>
>>> You did not mention it, but my guess is that you create clones from the dev_clone event handler.
>>
>> Yes, I do.
>>
>>> Please note that devfs_lookup(), which fires the dev_clone event, consumes a device reference. Thus clone handlers shall do dev_ref(). Due to races with cleanup, you should use the MAKEDEV_REF flag for the make_dev_credf(9) KPI instead of doing a make_dev()/dev_ref() pair.
>>
>> I need to support FreeBSD going all the way back to 6, so that's not an option in some versions.
>>
>> But I'm talking about device removal time. If I call clone_cleanup() where the clones have dev->si_refcount==1, then I get the use-after-free panic. If I hack things to elevate the reference count (such that dev->si_refcount==2 when clone_cleanup() is called), then I don't get the panic.
>>
>> Are you saying I should have been taking the extra reference in my dev_clone event handler? Won't having the extra reference lead to a memory leak? Or am I just misreading the code, and this will lead to things being freed normally?
>
> Yes, the clone handler shall do dev_ref(), either by doing a race-free make_dev_credf(MAKEDEV_REF) call, or by using dev_ref() after make_dev().

OK, cool. The man pages are handy. When I started this back in the FreeBSD 5 days, the man pages didn't exist :)

>>> That said, do you really need clones at all?
>>
>> I need to support FreeBSD back to 6.x, and I need to support the Linux-like model of opening the same /dev/node multiple times and getting unique handles. So I think I need clones.
>
> Wouldn't it be cleaner to use cdevpriv for 7/8/HEAD, where it is present? And have special #ifdef-ed code for 6 that could eventually be dropped.

Yes, cdevpriv is a much cleaner interface. I'll probably add support for that soon.

Thanks for the help,

Drew
devfs panic w/INVARIANTS
I've got a commercial driver that uses device cloning. At unload time, the driver calls clone_cleanup(). When I unload the driver on a kernel built with INVARIANTS, I see a panic in devfs_populate_loop(). This happens in 6-stable as well as 8-stable.

From what I can see, the clone has been freed, but it remains on the devfs cdevp_list. The next time devfs_populate_loop() is called, it trips over the bad entry (cdp->cdp_dirents points to 0xdeadc0dedeadc0de). See the appended kgdb session.

If I trace the code path, it looks like clone_cleanup() calls destroy_devl(), and destroy_devl() will eventually call devfs_free() if si_refcount is zero. But I don't see anything that removes the cdev from the cdevp_list before it is freed. The only code I see that removes a cdev from the cdevp_list seems to be the "GC any lingering devices" block in devfs_populate_loop().

What am I missing?

Thanks,

Drew

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x8:0x803e8780
stack pointer = 0x10:0xade623b0
frame pointer = 0x10:0xade62400
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 896 (ps)
Dumping 510 MB (2 chunks)
Dumping 510 MB (2 chunks)
Dumping 510 MB (2 chunks)
  chunk 0: 1MB (156 pages) ... ok
  chunk 1: 510MB (130528 pages) 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0 doadump () at pcpu.h:172
172 __asm __volatile(movq %%gs:0,%0 : =r (td));
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x801b8d91 in db_fncall (dummy1=0, dummy2=0, dummy3=0, dummy4=0x0)
    at ../../../ddb/db_command.c:493
#2 0x801b91e5 in db_command_loop () at ../../../ddb/db_command.c:408
#3 0x801bb0ed in db_trap (type=-1377427040, code=0)
    at ../../../ddb/db_main.c:222
#4 0x80468b99 in kdb_trap (type=9, code=0, tf=0xade62300)
    at ../../../kern/subr_kdb.c:473
#5 0x806c5d14 in trap_fatal (frame=0xade62300, eva=18446742974557577824)
    at ../../../amd64/amd64/trap.c:660
#6 0x806c62eb in trap (frame=
    {tf_rdi = -2136471632, tf_rsi = -2136471656, tf_rdx = -2401050962867404578, tf_rcx = 1, tf_r8 = -2136471624, tf_r9 = -1099151973792, tf_rax = 0, tf_rbx = -1099307447040, tf_rbp = -1377426432, tf_r10 = 0, tf_r11 = 4, tf_r12 = 0, tf_r13 = -1099086652928, tf_r14 = -1099307447040, tf_r15 = 86032452, tf_trapno = 9, tf_addr = 0, tf_flags = -2143029088, tf_err = 0, tf_rip = -2143385728, tf_cs = 8, tf_rflags = 66071, tf_rsp = -1377426496, tf_ss = 16})
    at ../../../amd64/amd64/trap.c:470
#7 0x806ad84b in calltrap () at ../../../amd64/amd64/exception.S:168
#8 0x803e8780 in devfs_populate_loop (dm=0xff000c2b8d00, cleanup=0)
    at ../../../fs/devfs/devfs_devs.c:370
#9 0x803e8beb in devfs_populate (dm=0xff000c2b8d00)
    at ../../../fs/devfs/devfs_devs.c:486
#10 0x803eafab in devfs_lookup (ap=0x0) at ../../../fs/devfs/devfs_vnops.c:587
#11 0x80724a2e in VOP_LOOKUP_APV (vop=0x80948600, a=0xade62630) at vnode_if.c:99
#12 0x804aadb2 in lookup (ndp=0xade629c0) at vnode_if.h:56
#13 0x804abb66 in namei (ndp=0xade629c0) at ../../../kern/vfs_lookup.c:216
#14 0x804c1be2 in vn_open_cred (ndp=0xade629c0, flagp=0xade6290c, cmode=0, cred=0xff09ac00, fdidx=3)
    at ../../../kern/vfs_vnops.c:183
#15 0x804b8d64 in kern_open (td=0xff00156fe260, path=0xmode=373490024)
    at ../../../kern/vfs_syscalls.c:1016
#16 0x804b9455 in open (td=0x80a807b0, uap=0xade62bc0)
    at ../../../kern/vfs_syscalls.c:971
#17 0x806c6b52 in syscall (frame=
    {tf_rdi = 4218321, tf_rsi = 0, tf_rdx = 0, tf_rcx = 0, tf_r8 = 140737488348272, tf_r9 = 0, tf_rax = 5, tf_rbx = 5300224, tf_rbp = 4218321, tf_r10 = 0, tf_r11 = 5300224, tf_r12 = 4218321, tf_r13 = 0, tf_r14 = 140737488348272, tf_r15 = 6, tf_trapno = 12, tf_addr = 5300224, tf_flags = 0, tf_err = 2, tf_rip = 34369309420, tf_cs = 43, tf_rflags = 514, tf_rsp = 140737488347528, tf_ss = 35})
    at ../../../amd64/amd64/trap.c:807
#18 0x806ada48 in Xfast_syscall () at ../../../amd64/amd64/exception.S:287
#19 0x000800920aec in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 7
#7 0x806ad84b in calltrap () at ../../../amd64/amd64/exception.S:168
168 calltrap
Current language: auto; currently asm
(kgdb) up
#8 0x803e8780 in devfs_populate_loop (dm=0xff000c2b8d00, cleanup=0)
    at ../../../fs/devfs/devfs_devs.c:370
370 if ((cleanup || !(cdp->cdp_flags & CDP_ACTIVE))
Current language: auto; currently c
Re: semaphores between processes
Daniel Eischen wrote:
> On Fri, 23 Oct 2009, John Baldwin wrote:
>> On Thursday 22 October 2009 5:17:07 pm Daniel Eischen wrote:
>>> On Thu, 22 Oct 2009, Andrew Gallatin wrote:
>>>> Daniel Eischen wrote:
>>>>> On Thu, 22 Oct 2009, Andrew Gallatin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We're designing some software which has to lock access to shared memory pages between several processes, and has to run on Linux, Solaris, and FreeBSD. We were planning to have the lock be a pthread_mutex_t residing in the shared memory page. This works well on Linux and Solaris, but FreeBSD (at least 7-stable) does not support PTHREAD_PROCESS_SHARED mutexes.
>>>>>>
>>>>>> We then moved on to POSIX semaphores. Using sem_wait/sem_post with the sem_t residing in a shared page seems to work on all 3 platforms. However, the FreeBSD (7-stable) man page for sem_init(3) has this scary text regarding the pshared value:
>>>>>>
>>>>>>   The sem_init() function initializes the unnamed semaphore pointed to by sem to have the value value. A non-zero value for pshared specifies a shared semaphore that can be used by multiple processes, which this implementation is not capable of.
>>>>>>
>>>>>> Is this text obsolete? Or is my test just getting lucky?
>>>>>
>>>>> I think you're getting lucky.
>>>>
>>>> Yes, after playing with the code some, I now see that. :(
>>>>
>>>>>> Is there a recommended way to do this?
>>>>>
>>>>> I believe the only way to do this is with SYSV semaphores (semop, semget, semctl). Unfortunately, these are not as easy to use, IMHO.
>>>>
>>>> Yes, they are pretty ugly, and we were hoping to avoid them. Are there any plans to support either PTHREAD_PROCESS_SHARED mutexes, or pshared POSIX semaphores in FreeBSD?
>>>
>>> It's planned, just not (yet) being actively worked on. It's an API change mostly, and then adding in all the compat hooks so we don't break ABI.
>>
>> There is also an alternate set of patches on threads@ to allow just shared semaphores, I think w/o the changes to the pthread types. I can't recall exactly what they did, but I think rrs@ was playing with using umtx directly to implement some sort of process-shared primitive.
>
> That's really not the way to go. The structs really need to become public.

It would be great if they were, but that discussion was 6 months ago, and nothing seems to have happened. Plus we need to support at least 7.X and probably 6, so any changes here might not even help us.

What is wrong with just using umtx directly? It seems to do exactly what we need.

Thanks,

Drew
Re: semaphores between processes
Daniel Eischen wrote:
> On Fri, 23 Oct 2009, Andrew Gallatin wrote:
>> Daniel Eischen wrote:
>>> On Fri, 23 Oct 2009, John Baldwin wrote:
>>>> On Thursday 22 October 2009 5:17:07 pm Daniel Eischen wrote:
>>>>> On Thu, 22 Oct 2009, Andrew Gallatin wrote:
>>>>>> Daniel Eischen wrote:
>>>>>>> On Thu, 22 Oct 2009, Andrew Gallatin wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We're designing some software which has to lock access to shared memory pages between several processes, and has to run on Linux, Solaris, and FreeBSD. We were planning to have the lock be a pthread_mutex_t residing in the shared memory page. This works well on Linux and Solaris, but FreeBSD (at least 7-stable) does not support PTHREAD_PROCESS_SHARED mutexes.
>>>>>>>>
>>>>>>>> We then moved on to POSIX semaphores. Using sem_wait/sem_post with the sem_t residing in a shared page seems to work on all 3 platforms. However, the FreeBSD (7-stable) man page for sem_init(3) has this scary text regarding the pshared value:
>>>>>>>>
>>>>>>>>   The sem_init() function initializes the unnamed semaphore pointed to by sem to have the value value. A non-zero value for pshared specifies a shared semaphore that can be used by multiple processes, which this implementation is not capable of.
>>>>>>>>
>>>>>>>> Is this text obsolete? Or is my test just getting lucky?
>>>>>>>
>>>>>>> I think you're getting lucky.
>>>>>>
>>>>>> Yes, after playing with the code some, I now see that. :(
>>>>>>
>>>>>>>> Is there a recommended way to do this?
>>>>>>>
>>>>>>> I believe the only way to do this is with SYSV semaphores (semop, semget, semctl). Unfortunately, these are not as easy to use, IMHO.
>>>>>>
>>>>>> Yes, they are pretty ugly, and we were hoping to avoid them. Are there any plans to support either PTHREAD_PROCESS_SHARED mutexes, or pshared POSIX semaphores in FreeBSD?
>>>>>
>>>>> It's planned, just not (yet) being actively worked on. It's an API change mostly, and then adding in all the compat hooks so we don't break ABI.
>>>>
>>>> There is also an alternate set of patches on threads@ to allow just shared semaphores, I think w/o the changes to the pthread types. I can't recall exactly what they did, but I think rrs@ was playing with using umtx directly to implement some sort of process-shared primitive.
>>>
>>> That's really not the way to go. The structs really need to become public.
>>
>> It would be great if they were, but that discussion was 6 months ago, and nothing seems to have happened. Plus we need to support at least 7.X and probably 6, so any changes here might not even help us.
>>
>> What is wrong with just using umtx directly? It seems to do exactly what we need.
>
> Because you can't do anything more than use umtx directly, like check for mutex types and return appropriate error codes. Just look at other implementations - Solaris, Linux, all have their pthread_*_t as public structs.

I'm not saying that having pthread_*_t public, and getting all the features of real PTHREAD_PROCESS_SHARED, would not be far better in general. But in this case all we need is a lock around a shared resource, i.e., nothing fancy. So our choices seem to be:

1) use sysv semaphores (ick)
2) use a hand-rolled spinlock (ick)
3) use some sort of hack built into our driver (ick, ick)
4) use umtx

Is there some bug or limitation in umtx that makes it inappropriate (beyond the obvious, like the potential to leave a resource locked forever if the lock holder exits)?

Thanks,

Drew
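For comparison, choice 2 (the hand-rolled spinlock) could look roughly like the following. This is only a minimal sketch, assuming a C11 compiler with <stdatomic.h>; the shm_spin_* names are made up for illustration, and the lock is only useful across processes if the structure itself lives in the shared page. It busy-waits, and like umtx it stays locked forever if the holder exits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type; it must be placed in the page shared by all processes. */
typedef struct {
    atomic_flag held;
} shm_spinlock;

/* Put the lock into the unlocked state (call once, before sharing it). */
void shm_spin_init(shm_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* Try once; test_and_set returns the previous value, so false means we got it. */
bool shm_spin_trylock(shm_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is acquired; burns CPU while contended. */
void shm_spin_lock(shm_spinlock *l)
{
    while (!shm_spin_trylock(l))
        ;
}

/* Release the lock; the release ordering publishes the protected writes. */
void shm_spin_unlock(shm_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire/release pairing is what makes writes done under the lock visible to the next holder; everything else (backoff, fairness, recovery from a dead holder) is deliberately left out.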
Re: semaphores between processes
Daniel Eischen wrote:
> We already use umtx. This really is a hack and I wouldn't advocate it. I'm not sure how you could make it work and not break the existing ability to return appropriate error codes without slowing down the path in the non-shared case. You'd have to check to see if the address space was shared or not, which would require a system call.

I'm probably missing something. What does it matter if the address space is shared, as long as the umtx struct is in shared memory?

From my quick read, the umtx operations use a lock word in userspace. For uncontested locks, they use atomic ops to flip an id into the lock word. The kernel takes over for contested locks, and does sleeping, wakeup, etc. Is this correct? Is there something here that matters if the address space (and not just the lock word) is shared?

> All our public pthread_foo() symbols are weak. You can easily override them in your application code in the #ifdef freebsd case. What is wrong with providing your own library that overrides them to do what you require - this shouldn't change your application code?

For our code, I was thinking of something like:

    #ifdef __FreeBSD__
    #define lock(x)   umtx_lock(x, getpid())
    #define unlock(x) umtx_unlock(x, getpid())
    #else
    #define lock(x)   pthread_mutex_lock(x)
    #define unlock(x) pthread_mutex_unlock(x)
    #endif

I should probably just shut up and try it..

Drew
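The uncontested fast path described above (atomically flipping an id into a userspace lock word) can be modelled in portable C11. This is a toy sketch, not the real umtx interface: lock_word, lock_try_acquire() and lock_release() are hypothetical names, and the contended path, where the kernel would put the caller to sleep and wake it later, is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_UNOWNED 0UL   /* hypothetical value meaning "nobody holds the lock" */

/* Hypothetical lock word; only this word needs to sit in shared memory. */
typedef struct {
    _Atomic unsigned long owner;
} lock_word;

/*
 * Fast path: swing the word from UNOWNED to our id with one compare-and-swap.
 * On failure a real implementation would ask the kernel to sleep; this model
 * just reports the failure.
 */
bool lock_try_acquire(lock_word *w, unsigned long id)
{
    unsigned long expected = LOCK_UNOWNED;

    assert(id != LOCK_UNOWNED);
    return atomic_compare_exchange_strong_explicit(&w->owner, &expected, id,
        memory_order_acquire, memory_order_relaxed);
}

/* Release succeeds only when the word still holds the caller's id. */
bool lock_release(lock_word *w, unsigned long id)
{
    unsigned long expected = id;

    return atomic_compare_exchange_strong_explicit(&w->owner, &expected,
        LOCK_UNOWNED, memory_order_release, memory_order_relaxed);
}
```

Nothing in this model depends on the rest of the address space being shared; only the lock word is touched, which is the point of the question above.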
semaphores between processes
Hi,

We're designing some software which has to lock access to shared memory pages between several processes, and has to run on Linux, Solaris, and FreeBSD. We were planning to have the lock be a pthread_mutex_t residing in the shared memory page. This works well on Linux and Solaris, but FreeBSD (at least 7-stable) does not support PTHREAD_PROCESS_SHARED mutexes.

We then moved on to POSIX semaphores. Using sem_wait/sem_post with the sem_t residing in a shared page seems to work on all 3 platforms. However, the FreeBSD (7-stable) man page for sem_init(3) has this scary text regarding the pshared value:

  The sem_init() function initializes the unnamed semaphore pointed to by sem to have the value value. A non-zero value for pshared specifies a shared semaphore that can be used by multiple processes, which this implementation is not capable of.

Is this text obsolete? Or is my test just getting lucky?

Is there a recommended way to do this?

Thanks,

Drew
Re: semaphores between processes
Daniel Eischen wrote:
> On Thu, 22 Oct 2009, Andrew Gallatin wrote:
>> Hi,
>>
>> We're designing some software which has to lock access to shared memory pages between several processes, and has to run on Linux, Solaris, and FreeBSD. We were planning to have the lock be a pthread_mutex_t residing in the shared memory page. This works well on Linux and Solaris, but FreeBSD (at least 7-stable) does not support PTHREAD_PROCESS_SHARED mutexes.
>>
>> We then moved on to POSIX semaphores. Using sem_wait/sem_post with the sem_t residing in a shared page seems to work on all 3 platforms. However, the FreeBSD (7-stable) man page for sem_init(3) has this scary text regarding the pshared value:
>>
>>   The sem_init() function initializes the unnamed semaphore pointed to by sem to have the value value. A non-zero value for pshared specifies a shared semaphore that can be used by multiple processes, which this implementation is not capable of.
>>
>> Is this text obsolete? Or is my test just getting lucky?
>
> I think you're getting lucky.

Yes, after playing with the code some, I now see that. :(

>> Is there a recommended way to do this?
>
> I believe the only way to do this is with SYSV semaphores (semop, semget, semctl). Unfortunately, these are not as easy to use, IMHO.

Yes, they are pretty ugly, and we were hoping to avoid them. Are there any plans to support either PTHREAD_PROCESS_SHARED mutexes, or pshared POSIX semaphores in FreeBSD?

Thanks,

Drew
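For reference, the SYSV route (semget, semop, semctl) used as a plain lock might look like the sketch below. It is a minimal illustration, not a recommendation: the sysv_* wrapper names are invented here, IPC_PRIVATE is assumed because it is the simplest key (it only works for processes related by fork; unrelated processes would need an agreed key, e.g. from ftok()), and the union semun guard assumes that Linux leaves the definition to the caller while the BSDs declare it in <sys/sem.h>.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#if defined(__linux__)
/* glibc leaves this definition to the caller. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};
#endif

/* Create a one-semaphore set initialised to 1 (unlocked); returns id or -1. */
int sysv_lock_create(void)
{
    union semun arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id < 0)
        return -1;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Apply one operation to semaphore 0 of the set. */
static int sysv_op(int id, short delta, short flags)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = flags };

    assert(delta == 1 || delta == -1);
    return semop(id, &op, 1);
}

/* SEM_UNDO asks the kernel to reverse the operation if the process exits. */
int sysv_lock(int id)    { return sysv_op(id, -1, SEM_UNDO); }
int sysv_trylock(int id) { return sysv_op(id, -1, SEM_UNDO | IPC_NOWAIT); }
int sysv_unlock(int id)  { return sysv_op(id, 1, SEM_UNDO); }

/* Remove the semaphore set from the system. */
int sysv_lock_destroy(int id) { return semctl(id, 0, IPC_RMID); }
```

The SEM_UNDO flag is the one thing this interface offers that a bare lock word does not: a holder that dies has its decrement undone by the kernel.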
Re: namei (via firmware_get(9)) from taskq in 7.x
Kostik Belousov wrote:
> It seems that you want a merge of r178042,183614,184842,188057 (one of

Yes, I finally figured this out on Fri. I probably should have posted a response to this thread to avoid others wasting time on this.

Drew
namei (via firmware_get(9)) from taskq in 7.x
Hi,

I'm trying to re-initialize a NIC which uses firmware(9) after a hardware fault. As part of the process, I need to re-load the firmware using firmware_get(). If the firmware kld is not resident, then the machine will panic like this:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x8:0x805b05d4
stack pointer = 0x10:0xff880460
frame pointer = 0x10:0xff880510
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 21 (swi5: +)
[thread pid 21 tid 100021 ]
Stopped at namei+0x174: movq 0x20(%rbx),%rax
db> bt
Tracing pid 21 tid 100021 td 0xff00013c3ae0
namei() at namei+0x174
vn_open_cred() at vn_open_cred+0x3a4
linker_load_module() at linker_load_module+0x1f2
linker_reference_module() at linker_reference_module+0xae
firmware_get() at firmware_get+0x136
mxge_load_firmware() at mxge_load_firmware+0x2d
mxge_watchdog_task() at mxge_watchdog_task+0x2f6
taskqueue_run() at taskqueue_run+0x9d
ithread_loop() at ithread_loop+0x17d
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe

Looking at it in gdb, it seems like the problem is that namei is trying to use ndp->ni_cnd.cn_thread->td_proc->p_fd->fd_cdir, which is null in this context.

Can somebody tell me what kernel context it is safe to call firmware_get() (and hence namei) from? Is there a safe way to do it from a taskq?

FWIW, this seems to work fine (even from a callout context) in 8 and higher. It is only 7 and earlier where I'm having this problem.

Thanks,

Drew
Re: Progress for 7.0 - the what's cooking page
The TSO/LRO section needs a little updating.

According to "find sys/dev | xargs grep -l IFCAP_TSO", TSO is present in at least: bce, cxgb, em, ixgbe, msk, mxge, nfe, nxge, re

Based on grepping for IFCAP_LRO, LRO is currently available only in mxge. Note that the LRO in mxge is currently a driver-specific hack (I wrote it, so I can say it :), intended to tide us over until Andre finishes his more extensive LRO infrastructure. Further, LRO is currently done in software. Jack Vogel was looking at porting the mxge LRO into something that could be used by several 10GbE drivers; I'm not sure what happened to that.

Drew
Re: IP over FireWire and Mac OSX
P.ArulChandran writes:
> By analyzing the packets from FreeBSD in the firebug log, I could see that unfragmented packets are sent as fragmented packets, with inappropriate values in the packet header. Even if the packets are fragmented, the 'lf' field is not set correctly.
>
> To comply with Section 4.2 of RFC 2734, FreeBSD should set 'lf' to correct values to indicate whether the packet is fragmented or unfragmented.

I just read the RFC and it looks like we're both at fault. According to the RFC:

  A RESERVED object has no defined meaning and SHALL be zeroed by its originator or, upon development of a future standard, set to a value specified by such a standard. The recipient of a RESERVED object SHALL NOT check its value.

Empirically, it would seem that FreeBSD is not zeroing the reserved fields like it should. Further, since zeroing the reserved fields fixes interoperability, it would seem that MacOSX is not ignoring them like it should. It is fun when different implementations collide in the field ;)

> In any case, Mac OS X should add more safety checks to prevent panics from corrupted packets.

Yes, and we should zero the reserved fields. Doing so seems to fix interoperability with unpatched versions of MacOSX. See the attached patch.

Thanks for letting me know what was going on and making this so easy to fix..

Drew

fwip.diff
Description: full rfc2734 compliance
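To make the "zero on send, ignore on receive" rule concrete, here is a small sketch for the unfragmented encapsulation header only. It is not the attached fwip.diff. It assumes the layout as I read RFC 2734: a single 32-bit word with a 2-bit lf field (0 for unfragmented), 14 reserved bits, and a 16-bit ether_type. The encap_* names are invented for the example, and values are in host byte order (a real sender would still convert to network order).

```c
#include <assert.h>
#include <stdint.h>

#define ENCAP_LF_UNFRAG 0u   /* lf value assumed for an unfragmented datagram */

/* Sender: build the header with every reserved bit explicitly zero. */
uint32_t encap_unfrag_header(uint16_t ether_type)
{
    uint32_t lf = ENCAP_LF_UNFRAG;
    uint32_t reserved = 0;   /* 14 reserved bits, zeroed by the originator */

    return (lf << 30) | (reserved << 16) | ether_type;
}

/* Receiver: the top two bits are lf; nothing else is consulted. */
unsigned encap_lf(uint32_t hdr)
{
    return hdr >> 30;
}

/* Receiver: take ether_type from the low 16 bits, never checking reserved bits. */
uint16_t encap_ether_type(uint32_t hdr)
{
    /* Only the unfragmented layout is modelled in this sketch. */
    assert(encap_lf(hdr) == ENCAP_LF_UNFRAG);
    return (uint16_t)(hdr & 0xffffu);
}
```

A receiver written this way accepts a header whose reserved bits contain garbage, and a sender written this way never produces such a header, so either fix alone is enough to interoperate.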
mapping small parts of a pci card to conserve KVA
I maintain drivers for a PCI card which presents itself as having 16MB of address space. E.g.:

mx0: Myrinet PCIXE mem 0xf900-0xf9ff irq 20 at device 3.0 on pci1

However, most of that address space does not need to be mapped into the host. Really, only a little over 2MB needs to be mapped (3 regions with lengths of 1024 bytes, 256 bytes, and 2MB).

I've tried to rewrite things so that I make multiple calls to bus_alloc_resource() with the (hopefully) appropriate offsets and lengths. E.g.:

    rid = PCIR_MAPS;
    *res = bus_alloc_resource(is->arch.dev, SYS_RES_MEMORY, &rid,
        (u_long)offset, (u_long)(offset + len - 1), len,
        RF_ACTIVE|PCI_RF_DENSE);

At least on 5.3R, I seem to get back the same struct resource * from each call. rman_get_virtual() returns a different kva for each mapping, yet they all seem to map to the same physical address. That is, I call vtophys() on the result of rman_get_virtual() for each segment, and they all map to 0xf900.

Is there a way to just map what I need?

Thanks,

Drew
Re: mapping small parts of a pci card to conserve KVA
Scott Long writes:
> You can use pmap_mapdev() to create a KVA mapping of an arbitrary physaddr+len. In fact, this is exactly what newbus uses to create the PCI MEMIO resources when bus_alloc_resource() is called. I'm not sure if the range is mapped and activated before the driver makes that call; Warner or John might know for sure.

Thanks.. But since this is an out-of-tree driver, I want to stick as much as I can to the normal driver APIs. If the KVA wastage becomes a huge problem, I'll explore pmap_mapdev(), but for now it's not a big deal.

Thanks again,

Drew
Re: obtaining a kernel crash dump
Nick Strebkov writes:
> May 19 16:17:00 devel /kernel:
> May 19 16:17:00 devel /kernel: syncing disks... 60 3 2
> [dd boot kernel messages]

Try disabling sync-on-panic. It almost always causes problems for me when trying to get dumps.

% cat /etc/sysctl.conf
kern.sync_on_panic=0

If you are running a newer version of FreeBSD with the DDB_TRACE options, you want to enable DDB and DDB_TRACE. This will get you a stack trace on console, which is a heck of a lot better than nothing if your crashdumps don't work.

options DDB #Enable the kernel debugger
options DDB_TRACE

Sometimes I have problems getting a dump on 5.x if I've dropped into ddb, so I use the following to prevent the system from dropping to a DDB prompt at panic:

options DDB_UNATTENDED

Drew
RE: em0, polling performance, P4 2.8ghz FSB 800mhz
Don Bowman writes:
> I'm not sure what effect on fxp. fxp is inherently limited by something internal to it, which prevents achieving high packet rates.
>
> bge is the best chip, but doesn't have the best bsd support.

Just curious - why is bge the best chip? Is it because it exports a really nice API (separate recv ring for small messages), or is the chip inherently faster, regardless of its API?

I'm trying to design a new ethernet API for a firmware-based NIC, and I'm trying to convince a colleague that having separate receive rings for small and large frames is a really good thing.

Thanks,

Drew
Re: em0, polling performance, P4 2.8ghz FSB 800mhz
Luigi Rizzo writes:
> On Wed, Mar 03, 2004 at 10:03:11AM -0500, Andrew Gallatin wrote:
> ...
>> I'm trying to design a new ethernet API for a firmware-based nic, and I'm trying to convince a colleague that having separate receive rings for small and large frames is a really good thing.
>
> i am actually not very convinced either, unless you are telling me that there is a way to preserve ordering. Or you'd be in trouble when, on your busy link, there is a mismatch between user-level and link-level block sizes.
>
> So, what is your design like, you want to pass the NIC buffers of 2-3 different sizes and let the NIC choose from the most appropriate pool depending on the incoming frame size, but still return received frames in a single ring in arrival order ?

Yes, exactly. This way you get to pass the stack small (MHLEN) frames in mbufs rather than clusters, without doing something like copying them in the driver's rx interrupt handler. You can allocate tons of mbufs so that you can absorb the occasional burst (or spike in host latency) without being as bad a pig as you'd be if you allocated a huge number of clusters ;)

You also get to set yourself up for zero-copy receive by splitting the headers into mbufs, and the payloads into jumbo clusters that can get page-flipped. But that's a lot trickier and not really in the scope of the initial implementation.

Drew
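The design agreed on above (buffers drawn from size-specific pools, completions reported through one ring in arrival order) can be sketched as a small host-side model. This is purely illustrative: the buffer sizes, ring size, and rx_* names are hypothetical stand-ins rather than anything from a real driver or from mbuf(9).

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_BUF_LEN 200u   /* hypothetical; stands in for MHLEN */
#define LARGE_BUF_LEN 2048u  /* hypothetical; stands in for a cluster */
#define RX_RING_SLOTS 8u     /* hypothetical completion ring size */

enum rx_pool { RX_POOL_SMALL, RX_POOL_LARGE };

struct rx_completion {
    enum rx_pool pool;   /* which pool the buffer was taken from */
    size_t len;          /* received frame length in bytes */
};

struct rx_ring {
    struct rx_completion slot[RX_RING_SLOTS];
    unsigned count;      /* completions written so far */
};

/* Pick the smallest buffer class that can hold the frame. */
enum rx_pool rx_pick_pool(size_t frame_len)
{
    assert(frame_len <= LARGE_BUF_LEN);
    return frame_len <= SMALL_BUF_LEN ? RX_POOL_SMALL : RX_POOL_LARGE;
}

/* Record a received frame; a single ring keeps frames in arrival order. */
void rx_complete(struct rx_ring *r, size_t frame_len)
{
    struct rx_completion *c = &r->slot[r->count % RX_RING_SLOTS];

    c->pool = rx_pick_pool(frame_len);
    c->len = frame_len;
    r->count++;
}
```

Because ordering lives in the completion ring rather than in the buffer pools, the host can walk the ring front to back and never has to merge two receive queues.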
Re: Zero copy sockets question
Dung Patrick writes:
> Hi
>
> I have read http://people.freebsd.org/~ken/zero_copy/
>
> To correctly use zero copy receive, it seems the MTU needs to be set to: "have to be at least page sized, and be aligned on page boundaries."

Yes.

> So does the default MTU of 1500 for an ethernet network card work?

No, you need to have an MTU of at least PAGE_SIZE + headers, and a NIC which is smart enough to do the header splitting. Currently, the Alteon Tigon2 is the only NIC which fits the bill. I keep meaning to implement header splitting in the Myricom Myrinet firmware, and I keep not getting time for it..

Note that send-side zero-copy works on any NIC, and with a standard MTU.

Drew
Re: Re: Zero copy sockets question
Dung Patrick writes:
> Correct me if I am wrong: To use the zero copy 'receive' on i386, you need to set the MTU to 4096 bytes (page size) or a multiple of 4096.

No, just larger than a page size plus headers. FreeBSD's TCP automagically sets the mss to a page-sized multiple for large MTUs. And you need a NIC which can do header splitting (i.e., DMA the headers and the payload to different places in the host).

> If it is true, until zero copy receive can do auto fitting, I think zero copy receive is more useful in gigabit ethernet than in fast ethernet (I assume MTU 1500 (or smaller) is suitable for fast ethernet/Internet.)

Fast ethernet is slow enough that it doesn't really make sense there. These days, one could argue that it really only makes sense for 10GbE.

Drew
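The arithmetic behind "a page size plus headers" can be worked through in a few lines. This is only an illustration of the rounding described above, not the kernel's actual mss code: the 4096-byte page matches the i386 figure in the thread, the 40-byte header figure assumes IPv4 and TCP without options, and zc_payload() is a made-up name.

```c
#include <assert.h>

#define PAGE_BYTES 4096u   /* i386 page size, as quoted in the thread */
#define HDR_BYTES  40u     /* assumed: 20-byte IP + 20-byte TCP header, no options */

/*
 * Payload bytes per segment for a given MTU: whatever is left after the
 * headers, rounded down to whole pages once at least one page fits.
 */
unsigned zc_payload(unsigned mtu)
{
    unsigned room;

    assert(mtu > HDR_BYTES);
    room = mtu - HDR_BYTES;
    if (room < PAGE_BYTES)
        return room;             /* no full page fits: nothing to page-flip */
    return room - (room % PAGE_BYTES);
}
```

With these assumptions a 1500-byte MTU leaves 1460 bytes, well short of a page, while a 9000-byte MTU leaves 8960 bytes, which rounds down to two full pages; an MTU of exactly 4096 is still too small because the headers eat into the page.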
Re: Determining CPU features / cache organization from userland
Bruce M Simpson writes:
> I've been thinking we should definitely make the cache organization
> info available via sysctl. I am thinking we should do this to make
> the UMA_ALIGN_CACHE definition mean something...

If you do this, it may make sense to use the same names as MacOSX.
Eg:

g51% sysctl hw | grep cache
hw.cachelinesize: 128
hw.l1icachesize: 65536
hw.l1dcachesize: 32768
hw.l2cachesize: 524288

Drew
Re: VIA EPIA-M10000 board just works with FreeBSD 4.8
Clifton Royston writes: For anyone who's interested, I've been running FreeBSD 4.8 on the EPIA-1M mini-ITX for at least a couple months now; it's available Cool! Have you measured the power consumption? I'm looking for a low power consumption, 'always on' box for my home office, and have had bad luck with packaged appliances for things like ipsec. It would be great to have a real computer for not much more power consumption than one of these appliances.. Drew ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: PCI interrupts passing DMA
Aaro Koskinen writes:
> > My question is: What the heck could the SMP kernel be doing which
> > causes the DMA to complete faster?
>
> The chipset probably uses PCI bus (MSI-like mechanism) to deliver the
> interrupt from the IO APIC to the local APIC, which means that the
> PCI bridge(s) must complete the DMA transfer before the interrupt is
> delivered to preserve the write order.

AHA! I think you hit it on the nose.

It turns out that the FreeBSD SMP kernel sets up all IOAPIC interrupts
as IOART_DELLOPRI. But linux doesn't set the IOART_DELLOPRI bit. This
seems to account for the difference in behaviour between FreeBSD and
linux. The following diff seems to make SMP FreeBSD behave the same as
linux, and the same as UP FreeBSD:

Index: i386/i386/mpapic.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/mpapic.c,v
retrieving revision 1.63
diff -u -r1.63 mpapic.c
--- i386/i386/mpapic.c	23 Jul 2003 18:59:38 -0000	1.63
+++ i386/i386/mpapic.c	18 Sep 2003 14:07:38 -0000
@@ -134,7 +134,7 @@
 	((u_int32_t)\
 	 (IOART_INTMSET |	\
 	  IOART_DESTPHY |	\
-	  IOART_DELLOPRI))
+	  IOART_DELFIXED))
 
 #define DEFAULT_ISA_FLAGS	\
 	((u_int32_t)\

> In PIC mode, the interrupt is delivered by the wire and it has no
> effect on pending writes. A common solution is that the interrupt
> handler must perform a read from the device to force flushing of
> buffers.

Yep. I was trying to avoid that because PIO reads are so horribly
expensive.. I guess I'll have to do it after all. I wish MSIs had
been around from the beginning and were more widely used.

Thanks for your help,

Drew
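The "read from the device to flush" workaround mentioned above can be pictured with a toy model in C. Nothing here touches real hardware. The bridge, the device, and the status register are hypothetical structures invented for the sketch, which encodes just one ordering rule: a read completion coming back from the device may not pass a write that was posted ahead of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bridge: one posted DMA write that has not reached host memory yet. */
struct toy_bridge {
    uint32_t *pending_dst;   /* host address of the posted write, or NULL */
    uint32_t  pending_val;
};

/* Toy device sitting behind the bridge. */
struct toy_device {
    struct toy_bridge *bridge;
    uint32_t status;         /* hypothetical status register */
};

/* Device side: post a DMA write. Host memory is not updated yet. */
static void toy_dma_write(struct toy_device *dev, uint32_t *dst, uint32_t val)
{
    dev->bridge->pending_dst = dst;
    dev->bridge->pending_val = val;
}

/* Drain the posted write, if any, into host memory. */
static void toy_bridge_flush(struct toy_bridge *b)
{
    if (b->pending_dst != NULL) {
        *b->pending_dst = b->pending_val;
        b->pending_dst = NULL;
    }
}

/* PIO read of a device register: posted writes drain before it returns. */
static uint32_t toy_read_status(struct toy_device *dev)
{
    toy_bridge_flush(dev->bridge);
    return dev->status;
}

/* Handler sketch: read the device first, only then trust the DMA'd data. */
static int toy_intr(struct toy_device *dev, const uint32_t *dma_buf,
                    uint32_t expected)
{
    (void)toy_read_status(dev);
    return *dma_buf == expected;
}
```

In this model a wire-delivered interrupt that skipped the register read would still see the stale buffer contents. That matches the behaviour described in the thread, and the cost of the fix is one expensive PIO read per interrupt.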
PCI interrupts passing DMA
I was toying with a programmable PCI card and wrote some code which DMAs a small block of data to the host, and then interrupts the host. The host checks the end of the block, and sees if it gets the value it expects. On an SMP P4 (hyperthreaded, with ServerWorks chipset) FreeBSD 4.8 UP, and on Linux 2.4.18, there is a huge delay between the interrupt being handled, and the DMA finally completing (from the host's perspective). Time enough for the interrupt handler to be triggered 3 or 4 times, and to print foo to a serial console line each time it notices that the DMA has not completed. The interesting thing is that on FreeBSD 4.8SMP, and FreeBSD 5.1-current (SMP), the data has arrived by the time the interrupt handler is called. This would be easy to explain if the interrupt latency were vastly different between the FreeBSD SMP kernel and the other kernels, but it does not seem to be. It actually seems to be about 5us faster (interrupt to wakeup of user-level process, so some fat is in there) than the FreeBSD UP kernel, possibly due to APIC io. *measurement done without console printf* My question is: What the heck could the SMP kernel be doing which causes the DMA to complete faster? Thanks, Drew ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: BSD make question
Ruslan Ermilov writes: Ah, didn't notice it. Try this: .for f in $(LIB) $(f:.c=.o): $(f) gcc -DLIB -c $ -o $@ .endfor Thanks! That works. Drew ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: BSD make question
Ruslan Ermilov writes: On Thu, Aug 07, 2003 at 02:42:30PM -0400, Andrew Gallatin wrote: Using BSD make, how can I apply different rules based on different directories while using only a single makefile? There's a .CURDIR variable that can be used to conditionalize parts of a makefile. Ie, the appended Makefile results in the following compilations: gcc -DLIB -c lib/foo.c -o lib/foo.o gcc -DLIB -c lib/bar.c -o lib/bar.o gcc -DMCP -c mcp/baz.c -o mcp/baz.o Is it possible to do something similar with BSD make? It just works as is with bmake. What's your problem, Drew? ;-) $ make -n cc -O -pipe -march=pentiumpro -c lib/foo.c ;) But its missing the -DLIB or -DMCP. Thanks for the .CURDIR hint. Drew ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to [EMAIL PROTECTED]
BSD make question
Using BSD make, how can I apply different rules based on different
directories while using only a single makefile?

Ie, the appended Makefile results in the following compilations:

gcc -DLIB -c lib/foo.c -o lib/foo.o
gcc -DLIB -c lib/bar.c -o lib/bar.o
gcc -DMCP -c mcp/baz.c -o mcp/baz.o

Is it possible to do something similar with BSD make?

Drew

###
.SUFFIXES:
.SUFFIXES: .o .c

LIB=\
	lib/foo.c \
	lib/bar.c

MCP=\
	mcp/baz.c

all: $(LIB:.c=.o) $(MCP:.c=.o)

lib/%.o: lib/%.c
	gcc -DLIB -c $< -o $@

mcp/%.o: mcp/%.c
	gcc -DMCP -c $< -o $@

.PHONY: clean
clean:
	rm -f $(LIB:.c=.o) $(MCP:.c=.o)
###
Re: per-open device private data, mmap
Eric Anholt writes:
> shouldn't be too big of an issue. The unique identifier is the big
> problem and the fileops trick should work for that. However, is this
> going to get easier some day? Are there any plans to pass the struct
> file down to the drivers and have a void * in there for private data?

I think that phk is working on this for 6.x

In the meantime, I have a new driver I'm developing which uses the
fileops trick you describe, but takes it a step further and conjures
up a new vnode. That makes it work with mmap.

I've not run into any problems yet, but it is lightly tested.

Cheers,

Drew

/*
 * Conjure up our own vnode out of thin air. We need the
 * vnode so that we can stash a pointer to the per-connection
 * priv struct for use in open/close/ioctl and mmap. This is
 * tricky, because we need to make it look enough like the device
 * vnode so that VOP_GETATTR() works on the slave vnode in mmap()
 */
static int
xxx_conjur_vnode(dev_t dev, struct thread *td)
{
  int error, fd;
  struct filedesc *fdp;
  struct file *fp;
  struct vnode *vn = NULL, *vd = NULL;
  struct cdev *rdev;

  fdp = td->td_proc->p_fd;
  if (fdp == NULL)
    return (0);

  if (td->td_dupfd >= 0)
    return ENODEV;

  rdev = xxx_malloc(sizeof(*rdev), M_WAITOK);

  if ((error = falloc(td, &fp, &fd)) != 0)
    goto abort_with_rdev;

  vd = SLIST_FIRST(&dev->si_hlist);

  if ((error = getnewvnode("none", vd->v_mount, vd->v_op, &vn)))
    goto abort_with_falloc;

  vn->v_type = VCHR;

  /* really should clone v_data, not copy the pointer */
  vn->v_data = vd->v_data;	/* for VTOI in devfs_getattr() */

  /* copy our cdev info */
  vn->v_rdev = rdev;
  bcopy(vd->v_rdev, vn->v_rdev, sizeof(*rdev));

  /* finally, save the data pointer (our softc) */
  vn->v_rdev->si_drv2 = 0;

  fp->f_data = (caddr_t)vn;
  fp->f_flag = FREAD|FWRITE;
  fp->f_ops = &xxx_fileops;
  fp->f_type = DTYPE_VNODE;	/* so that we can mmap */

  /*
   * Save the new fd as dupfd in the proc structure, then we have
   * open() return the special error code (ENXIO). Returning with a
   * dupfd and ENXIO causes magic things to happen in kern_open().
   */
  td->td_dupfd = fd;
  return 0;

 abort_with_falloc:
  /* undo falloc(); only reached once fp and fd are valid */
  FILEDESC_LOCK(fdp);
  fdp->fd_ofiles[fd] = NULL;
  FILEDESC_UNLOCK(fdp);
  fdrop(fp, td);

 abort_with_rdev:
  xxx_free(rdev);
  return (error);
}

static int
xxx_fileclose(struct file *fp, struct thread *td)
{
  int ready_to_close;
  struct vnode *vn;
  struct cdev *rdev;
  xxx_port_state_t *ps;

  vn = (struct vnode *)fp->f_data;
  rdev = vn->v_rdev;
  ps = rdev->si_drv2;
  rdev->si_drv2 = NULL;

  /* replace the vnode ops so that devfs doesn't try to reclaim anything */
  vn->v_op = spec_vnodeop_p;
  vn->v_type = VNON;	/* don't want to freedev() in vgonel() */
  vgone(vn);

  /* free our private rdev */
  xxx_free(rdev);

  if (ps) {
    xxx_mutex_enter(&ps->sync);

    /* Close the port if there are no more mappings */
    ready_to_close = ps->ref_count == 0;
    XXX_DEBUG_PRINT (XXX_DEBUG_OPENCLOSE,
                     ("Board %d, port %d closed\n", ps->is->id, ps->port));
    xxx_mutex_exit(&ps->sync);

    if (ready_to_close) {
      xxx_common_close (ps);
    } else {
      XXX_INFO (("Application closed file descriptor while mappings still alive: port destruct delayed\n"));
    }
  }
  return (0);
}

static int
xxx_mmap(dev_t dev, vm_offset_t offset,
#if MMAP_RETURNS_PINDEX == 0
         vm_offset_t *paddr,
#endif
         int nprot)
{
  int status;
  xxx_port_state_t *ps;
  void *kva;
#if MMAP_RETURNS_PINDEX
  vm_offset_t phys;
  vm_offset_t *paddr = &phys;
#endif

  ps = (xxx_port_state_t *)dev->si_drv2;
  ...

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Smarter kernel modules?
M. Warner Losh writes: In message: [EMAIL PROTECTED] Sean Kelly [EMAIL PROTECTED] writes: : Has anyone ever considered embedding some sort of identifier in kernel : modules to keep them from being loaded with the wrong kernel? Actually, I was talking about this with Matt Dodd this morning... Whatever we do, lets NOT be anywhere near as fascist as linux. If we implement any kind of versioning, its got to be fine-grained enough that 3rd party binary modules will not get broken by an ABI change in an area of the kernel which they do not care about, or there needs to be a way for a module to opt-out. My company ships a binary driver (ethernet network, and character device) built on 4.1.1-R, and it has continued to work at least until 4.7-R. I'd like to see that same level of ABI stability throughout the 5-STABLE branch. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Mac iBook OS10 + BSD
void writes:
> > Also, X11 feels quite slow if you're used to X11. (I'm writing this
> > from KDE running under XDarwin on a ti powerbook, 867MHz).
>
> Apple's new X11-for-Mac-OS-X beta software is much faster than
> XDarwin.

Much buggier too. And it lacks full screen mode. I've dropped back to
just using ctwm and XDarwin. Aqua is all the eye-candy one man can
stand, kde on top is just overkill.

FWIW, the only config I've found which allows cut and paste between X
and Aqua is XDarwin+{C}TWM, or Apple's X11 which uses their hideous
Aqua-like WM... Half the reason I use X rather than a bunch of
terminals is to *avoid* the clunky, non-customizable UI that the Aqua
interface gives you..

Drew
Re: Mac iBook OS10 + BSD
Julian Elischer writes: news to me.. I run multiple terminal windows, each running tcsh. That's with an unaltered macosX 10.1.5. from the user perspective it looks a lot like FreeBSD 3.{something} I think he means text-only syscons like vtys. MacOSX does not have them. Nobody has ever been able to tell me how to make a serial console work on my OS-X crashbox either. The new one is basically like FreeBSD 4.4. All versions of OSX feel more like Nextstep than any version of FreeBSD. How much can BSD share things like utilities and config files with OS10? Is there any special compatability due to the OSs being similar in some ways? Depends what you mean by share. OSX uses Nexstep's netinfo database for managing things like hosts, passwd, groups. The config files in /etc are just decoys there to confuse you. It uses series of startup scripts somewhat similar to RCng. How should I plan my BSD intallation? Any special advantage of having BSD on a Mac with OS10, as compared to Linux Slackware? Stick with MacOS-X it's going to run better onthis hardware than anything else. It all depends what you mean by better. If you're talking pure unix performance, then I say you're full of crap. OS-X is a dog. Linux runs circles around it. If you like, I'll post some LMbench numbers showing linux kicking sand in OS-X's face on my dual 800MHz crashbox when I return from vacation. I'm hoping our powerpc port comes close to doing as well as linux. Also, X11 feels quite slow if you're used to X11. (I'm writing this from KDE running under XDarwin on a ti powerbook, 867MHz). However, if you're talking about ease of operation, then I agree with you 100%. Suspend always works, the my ti powerbook is up and on the network before I have the case open. My wife bought me a 2 button+ scrollwheel mouse for Christmas. The mouse worked (scrollwheel included) with no configuration at all, just as soon as I plugged it in. It even worked in XDarwin. I was amazed. Iphoto rocks. 
It's nice being able to run M$ Office natively, etc. Fink (based on debian's dselect/apt-get) is great. As much as I hate to say it, I think it's better than our ports/pkgs system. I love how it upgrades packages + dependencies seamlessly when you upgrade one component.

Drew
Re: core dump from ffs_write - i think
Nate Lawson writes: Try to figure out where it was in frames 8 and 10 (probably a module). Try the gdbmods port (/usr/ports/devel/gdbmods) Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Ati Rage 128: Dpms suspend failes
Eric Anholt writes:
> On Tue, 2002-10-22 at 07:37, Andrew Gallatin wrote:
> ..
> > Do I need something special in my /etc/X11/XF86Config to make this
> > work? I never had problems on my old system (an alpha with a 3dlabs
> > Permedia-2 based AGP card).
>
> Could you send me a
> grep -i dpms /etc/X11/XF86Config /var/log/XFree86.0.log ?

OK, I'm an idiot. I did not have Option "DPMS" in the Monitor section
of my XF86Config file. Sorry for wasting your time.

But in my own defense... should xset even let me enable DPMS if it's
turned off at a lower level? If xset had complained and not allowed me
to enable DPMS, I would have taken a harder look at my XF86Config
file.. Talk about a POLA violation.

Drew
Re: Ati Rage 128: Dpms suspend failes
Eric Anholt writes: On Mon, 2002-10-21 at 08:16, Hanspeter Roth wrote: Hello, I have two hosts connected to one monitor. My idea is attach the display to the other host by issuing `xset dpms force suspend'. This works on one host with a Matrox Millenium. On the host with an Ati Rage 128 Pro TF it works with Netbsd, but it doesn't work with FreeBSD 4.7-Release. The screen only turns blank but the LED remains green. This is the same when issuing `xset s activate'. What could be the reason on FreeBSD 4.7 that dpms force suspend doesn't work? Installed are XFree86-Server-4.2.1_3 and XFree86-libraries-4.2.1_1.) You need XFree86-Server-4.2.1_4 or later (it's at _5 now). I've now upgraded to XFree86-Server-4.2.1_5. dpms still does not work for me: % xset dpms force off ; xset q | tail -5 Standby: 300Suspend: 600Off: 660 DPMS is Enabled Monitor is Off Font cache: hi-mark (KB): 1024 low-mark (KB): 768 balance (%): 70 (and I'm looking at the monitor and it is on) My video card is an ATI Rage 128: none1@pci1:0:0: class=0x03 card=0x7106174b chip=0x54461002 rev=0x00 hdr=0x00 vendor = 'ATI Technologies' device = 'Rage 128 Pro AGP 4x' class= display subclass = VGA Do I need something special in my /etc/X11/XF86Config to make this work? I never had problems on my old system (an alpha with a 3dlabs Permedia-2 based AGP card). Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: Ati Rage 128: Dpms suspend failes
Hanspeter Roth writes:
> On Oct 22 at 10:37, Andrew Gallatin spoke:
> > I've now upgraded to XFree86-Server-4.2.1_5. dpms still does not
> > work for me:
> >
> > % xset dpms force off ; xset q | tail -5
>
> I didn't care about off. My monitor seems to behave the same when set
> to `off' as when set to suspend or standby. The status LED turns
> yellow and the screen turns blank and recovery takes a few seconds.

As does mine (based on experience from when I had a video card that
worked in my old machine :-( )

> My application is to switch the display to the alternate host. This
> is working now.

Lucky you! What does pciconf -lv say about your card?

Drew
Re: Ati Rage 128: Dpms suspend failes
Eric Anholt writes: On Mon, 2002-10-21 at 08:16, Hanspeter Roth wrote: Hello, I have two hosts connected to one monitor. My idea is attach the display to the other host by issuing `xset dpms force suspend'. This works on one host with a Matrox Millenium. On the host with an Ati Rage 128 Pro TF it works with Netbsd, but it doesn't work with FreeBSD 4.7-Release. The screen only turns blank but the LED remains green. This is the same when issuing `xset s activate'. What could be the reason on FreeBSD 4.7 that dpms force suspend doesn't work? Installed are XFree86-Server-4.2.1_3 and XFree86-libraries-4.2.1_1.) You need XFree86-Server-4.2.1_4 or later (it's at _5 now). I'm running 4.2.1_4 and dpms does not work for me. I just grabbed some diffs from the Xfree86 cvs to bring drivers/ati/r128_driver.c up to 1.57.2.1 and drivers/ati/r128_reg.h up to 1.14 and rebuilt the my r128_drv.o module. I'll see if it works the next time X crashes.. (I'm running current, so X crashes once/day or so..) Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: gdb support for kernel modules
Giorgos Keramidas writes: On 2002-10-07 17:09, Ian Dowse [EMAIL PROTECTED] wrote: This is something I have been meaning to investigate for a while: [...] Anyway, below is a proof-of-concept patch that does the basics, but among other things, its logic for locating the kernel module files needs a lot of work - currently it just assumes /boot/kernel/module, diff -N solib-fbsd-kld.c --- /dev/null 1 Jan 1970 00:00:00 - +++ solib-fbsd-kld.c 7 Oct 2002 10:39:48 - + snprintf (new-so_name, SO_NAME_MAX_PATH_SIZE, /boot/kernel/%s, + new-so_original_name); I'm not really sure this would work for remote gdb sessions, but locally it's probably more correct to use sysctl and grab the value of kern.module_path or kern.bootfile instead of hardwiring `/boot/kernel/%s'. gdbmods does an ugly thing which is incredibly useful. It assumes that the modules you want to debug are sitting in your kernel build pool. So what it does is extract the build directory from the kernel (using strings), and runs a find rooted there for the module in question. But its a shell script, so it can get away with stuff like that ;) Perhaps we could embed the build directory somewhere the elf headers of each kernel module (including the kernel) so that kgdb could find the corresponding build file with symbols. Then your (very cool) solib-fbsd-kld.c could easily find the kernel and modules which match the kernel you're debugging.. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
how are sysctls in klds relocated?
Can somebody explain to me how sysctls from klds are relocated?

For background, after the binutils upgrade in -stable, I'm unable to
load linux.ko on my desktop. The faulting address is always
0x9010102464c457f (oidp->oid_parent) and the pc is in
sysctl_find_oid_name(). The crash looks like this:

acd0: CDROM CD-ROM CDU4011 at ata1-slave PIO4
Mounting root from ufs:/dev/ad2a
linker_load_file: trying to load osf1 as elf64
linker_make_file: new file, filename=osf1.ko
linker_file_register_sysctls: registering SYSCTLs for osf1.ko
linker_file_register_sysctls: SYSCTLs 0
linker_file_sysinit: calling SYSINITs for osf1.ko
linker_file_sysinit: SYSINITs 0xfe00020799a0
linker_load_file: trying to load linux as elf64
linker_make_file: new file, filename=linux.ko
linker_file_register_sysctls: registering SYSCTLs for linux.ko
linker_file_register_sysctls: SYSCTLs 0xfe00020a6d08

fatal kernel trap:

    trap entry = 0x2 (memory management fault)
    a0         = 0x9010102464c457f
    a1         = 0x1
    a2         = 0x0
    pc         = 0xfc3f42dc
    ra         = 0xfc3f436c
    curproc    = 0xfe001557e980
        pid = 15, comm = kldload

#0  0xfc3ed460 in dumpsys () at ../../kern/kern_shutdown.c:486
#1  0xfc3ecfa8 in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xfc3ed870 in panic (fmt=0xfc61da1c "trap")
    at ../../kern/kern_shutdown.c:595
#3  0xfc5ad4c0 in trap (a0=0x9010102464c457f, a1=0xfe0019c49e30, a2=0,
    entry=2, framep=0xfe0019c49c20) at ../../alpha/alpha/trap.c:551
#4  0xfc59f31c in XentMM ()
#5  0xfc3f3f2c in sysctl_register_oid (oidp=0xfe00020cc000)
    at ../../kern/kern_sysctl.c:102

The rest is from ddb, which actually works to get a stack trace:

sysctl_find_oid_name()
sysctl_register_oid()
sysctl_register_set()
linker_file_register_sysctls()
linker_load_file()
kldload()
syscall()

(gdb) p *(struct linker_set *) 0xfe00020a6d08
$6 = {
  ls_length = 4,
  ls_items = {0xfe000208}
}
(gdb) p/x *(struct sysctl_oid *)0xfe000208
$5 = {
  oid_parent = 0x9010102464c457f,
  oid_link = {
    sle_next = 0x0
  },
  oid_number = 0x90260003,
  oid_kind = 0x1,
  oid_arg1 = 0x8d40,
  oid_arg2 = 0x40,
  oid_name = 0x18140,
  oid_handler = 0x380040,
  oid_fmt = 0x1a001d0043,
  oid_refcnt = 0x1

From this, it appears that the contents of this linker set are not
getting relocated. How is that supposed to happen?

Interestingly enough, the value of oid_parent looks a hell of a lot
like offset 0 of the kld file, and the rest of the values seem to
match further offsets in the file:

% hd /modules/linux.ko
7f 45 4c 46 02 01 01 09 00 00 00 00 00 00 00 00 |.ELF|
0010 03 00 26 90 01 00 00 00 00 8b 00 00 00 00 00 00 |...|
0020 40 00 00 00 00 00 00 00 d8 a1 12 00 00 00 00 00 |@...|
0030 00 00 00 00 40 00 38 00 03 00 40 00 1f 00 1c 00 |@.8...@.|
0040 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 ||
...

Does anybody have any idea WTF is happening here? I'd like to figure
this out before 4.7-release..

What's *really* odd (and annoying) is that I cannot reproduce this on
my crashbox. The same binaries work fine on it ... this only happens
on my desktop.

Thanks,

Drew
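The match between the bogus oid_parent and the start of the .ko file can be checked by hand. Here is a small C sketch, assuming a little-endian machine (as the alpha is) and assuming the pointer sits at offset 0 of the structure with oid_number at offset 16. The helper name read_le is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 20 bytes of /modules/linux.ko, copied from the hd output above. */
static const unsigned char ko_head[] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x09,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x26, 0x90,
};

/* Assemble n (at most 8) bytes at p, least significant byte first. */
static uint64_t read_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    size_t i;

    assert(n <= 8);
    for (i = n; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}
```

Reading 8 bytes at offset 0 gives 0x09010102464c457f, which gdb prints without the leading zero as 0x9010102464c457f. The next 8 bytes are zero, like sle_next, and reading 4 bytes at offset 16 gives 0x90260003, the oid_number shown above. So the "sysctl_oid" being registered really is the ELF header of the file.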
Re: gigabit NIC of choice?
Terry Lambert writes: I guess the next question is Anyone know a gigabit NIC that is currently in production, which has hack-friendly firmware?... I think our products are the only game in town. http://www.myri.com/myrinet/product_list.html http://www.myri.com/myrinet/performance/index.html Yes, they are a little pricy, but quite hackable. And the link speed is twice gig ethers's (ie, 2Gb/sec full duplex, rather than 1Gb/sec full duplex). Sorry for the shameless plug ;) Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: gigabit NIC of choice?
Brandon D. Valentine writes: running it through a computer (AFAIK). There are rumors afloat of Gigabit Ethernet linecards for Myrinet switch hardware on the horizon Slightly more than rumours -- http://www.myri.com/news/02512/slides/Seitz_roadmap.pdf http://www.myri.com/news/02512/slides/Seizovic_lanai.pdf Cheers, Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: More dynamic KVA_SPACE
Terry Lambert writes: Wilko Bulte wrote: I knew not to recommend the Alpha because it is limited to 2G of physical memory. ? FreeBSD is limited to using 2G of whatever you have in the Alpha. Which is a deficiency that has been debated a number of times, IIRC it needs bus space work etc. See the archives.. I know... which is why I didn't recommend it. 8-). Not bus space, busdma! The 2GB limit is due to the lack of MI PCI device driver support for busdma. Especially network drivers, most scsi drivers already do busdma. So as soon as other platforms work with more than the size of their direct map (whatever it happens to be), alpha will too. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: remote crashdump
Jacques Fourie writes: I was wondering what the amount of effort involved would be to add support for dumping on a remote machine via tftp, for example. This would be extremely handy for devices with little or no hard disk space. Does anyone know of anything with this functionality? http://www.cs.duke.edu/~anderson/freebsd/netdump/ This worked a few years ago when 4.0 was -current. You might want to see how hard it would be to update it for -stable. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: remote crashdump
Terry Lambert writes: The closest anyone has come to this (to my knowledge) is the creation of a polled network driver and a tiny UDP stack to permit remote debugging over the network to a different machine on the same switch. This isn't very close to dumping. I think Darrell's netdump has been discussed before. (http://www.cs.duke.edu/~anderson/freebsd/netdump/) It does exactly what the poster wants, but needs to be cleaned up and brought up to date. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: dual booting current/stable on x86?
Cyrille Lefevre writes: On Sun, Jun 30, 2002 at 09:23:22PM -0400, Andrew Gallatin wrote: How do I dual boot -current and -stable from different slices on the same IDE disk? (and linux too.) When I tell lilo to boot hde3, I get the -stable boot2 and /boot/loader from hde2 (ad4s2a). I can then monkey around setting currdev and hints and unloading the -stable kernel then boot -current, but I'd like to just pop right into -current on ad4s3a if I choose it. Is there a magic bullet? I'd like to continue using lilo so that I can choose what OS to load via a serial console.. what is the problem w/ the following entries ? other=/dev/hde2 label=stable alias=s table=/dev/hde loader=/boot/chain.b other=/dev/hde3 label=current alias=c table=/dev/hde loader=/boot/chain.b Just that it behaves exactly as described above -- they both boot -stable. what is the content of /boot/loader.conf and /boot/loader.conf.local for each FreeBSD ? /boot/loader.conf: -stable: hw.ata.wc=1 -current: console=comconsole /boot/loader.conf.local is empty both places. did you tryed grub which is far better than lilo :P x86 bootloaders terrify me, so I have not tried grub. Does grub understand reiserfs? you could also take a look at /usr/share/examples/bootforth then have something like : /boot/stable.conf currdev=disk1s2a rootdev=disk1s2a /boot/current.conf currdev=disk1s3a rootdev=disk1s3a hope this help ? Thanks.. it did help. I just discovered liloboot. I may just hack myself together a custom liloboot and forget about it. That seems to be the most straightforward solution. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: dual booting current/stable on x86?
Chan Tur Wei writes: I'm not sure how booting with lilo will work (never played with it). Instead, I dug around a bit previously, and I found that boot1.s reads: # # If we are on a hard drive, then load the MBR and look for the first # FreeBSD slice. We use the fake partition entry below that points to # the MBR when we call nread. The first pass looks for the first active # FreeBSD slice. The second pass looks for the first non-active FreeBSD # slice if the first one fails. # So unless someone specifically sets the active partition, the 1st FreeBSD one, usually -stable, will get loaded. Since boot1+boot2 is loaded by the partition boot boot0, or the standard DOS boot (or, even MS's multi boot selector), the above may cause the 2nd FreeBSD slice to never get loaded. Incidentally, our booteasy (boot0.s) is one such someone. Maybe if lilo or liloboot does the same thing, it will work too. Excellent. Thanks for the pointer. Now I at least have some understanding of what's happening. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
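The two-pass search described in the boot1.s comment above can be modelled in a few lines of C. This is a sketch of the logic only, not the boot code itself. The struct and function names are hypothetical, and the sysid and flag values (0xa5 for FreeBSD, 0x80 for active) are the ones fdisk reports in the original post.

```c
#include <assert.h>
#include <stddef.h>

#define SYSID_FREEBSD 0xa5   /* 165, as fdisk reports for FreeBSD slices */
#define FLAG_ACTIVE   0x80

struct slice {
    unsigned char sysid;
    unsigned char flag;
};

/* Returns the 1-based number of the slice that would be booted, 0 if none. */
static int pick_freebsd_slice(const struct slice *tbl, size_t n)
{
    size_t i;

    assert(tbl != NULL);

    /* Pass 1: the first FreeBSD slice that is marked active. */
    for (i = 0; i < n; i++)
        if (tbl[i].sysid == SYSID_FREEBSD && (tbl[i].flag & FLAG_ACTIVE))
            return (int)i + 1;

    /* Pass 2: no active one was found, so take the first FreeBSD slice. */
    for (i = 0; i < n; i++)
        if (tbl[i].sysid == SYSID_FREEBSD)
            return (int)i + 1;

    return 0;
}
```

With the table from the original post, where slices 2 and 3 are both FreeBSD and both flagged active, this picks slice 2, which is the behaviour Drew reported. Slice 3 is only chosen once slice 2 is no longer marked active.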
Re: dual booting current/stable on x86?
Chan Tur Wei writes:
> So unless someone specifically sets the active partition, the 1st
> FreeBSD one, usually -stable, will get loaded. Since boot1+boot2 is
> loaded by the partition boot boot0, or the standard DOS boot (or,
> even MS's multi boot selector), the above may cause the 2nd FreeBSD
> slice to never get loaded.
>
> Incidentally, our booteasy (boot0.s) is one such someone. Maybe if
> lilo or liloboot does the same thing, it will work too.

Yep, it turns out that you can make lilo set a partition active and/or
deactivate a partition via lilo's change keyword:

other = /dev/hde2
	label=stable
	alias=s
	table=/dev/hde
	loader=/boot/chain.b
	change
	  partition=/dev/hde2
	    activate
	  partition=/dev/hde3
	    deactivate

other = /dev/hde3
	label=current
	alias=c
	table=/dev/hde
	loader=/boot/chain.b
	change
	  partition=/dev/hde3
	    activate
	  partition=/dev/hde2
	    deactivate

Thanks again for the pointer; I'm now booting directly to -current.
Perhaps this should be a FAQ entry..

Drew
dual booting current/stable on x86?
How do I dual boot -current and -stable from different slices on the
same IDE disk? (and linux too.)

When I tell lilo to boot hde3, I get the -stable boot2 and
/boot/loader from hde2 (ad4s2a). I can then monkey around setting
currdev and hints and unloading the -stable kernel then boot -current,
but I'd like to just pop right into -current on ad4s3a if I choose it.

Is there a magic bullet? I'd like to continue using lilo so that I
can choose what OS to load via a serial console..

Thanks,

Drew

The data for partition 1 is:
sysid 131 (0x83),(Linux native)
    start 63, size 10522512 (5137 Meg), flag 0
	beg: cyl 0/ head 1/ sector 1;
	end: cyl 654/ head 254/ sector 63
The data for partition 2 is:    <--- STABLE
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 12578895, size 12562830 (6134 Meg), flag 80 (active)
	beg: cyl 783/ head 0/ sector 1;
	end: cyl 1023/ head 254/ sector 63
The data for partition 3 is:    <--- CURRENT
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 25141725, size 13960485 (6816 Meg), flag 80 (active)
	beg: cyl 1023/ head 255/ sector 63;
	end: cyl 1023/ head 254/ sector 63
Re: Re: bge driver issue
John Polstra writes: On the i386, living with the misalignment is probably the best solution, unfortunately. The only alternatives I can think of are: - bcopy the packet up by 2 bytes after reception to align the payload, or - disable PCI-X mode on the bus If the bge's API allows it, you could setup a receive descriptor with a length of 14 bytes (size of ethernet header), and start the next descripter 2 bytes after it (at a 16 byte offset from the front of the mbuf). When the receive is done, just copy the 14 bytes. Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
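The two-descriptor idea above comes down to a 14-byte copy after each receive. A minimal C sketch follows, assuming the hardware has already written the Ethernet header at offset 0 of the buffer and the payload at offset 16. The function name and the offset macros are hypothetical, and no real bge register or descriptor layout is implied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHER_HDR_LEN    14   /* size of an Ethernet header, per the thread */
#define HDR_DESC_OFF      0   /* first descriptor: header at the buffer start */
#define PAYLOAD_OFF      16   /* second descriptor: payload 16 bytes in */
#define ALIGNED_HDR_OFF  (PAYLOAD_OFF - ETHER_HDR_LEN)   /* 2 */

/*
 * After the receive completes, slide the 14 header bytes up by 2 so that
 * they sit directly in front of the payload. Returns the start of the
 * now-contiguous frame. The payload itself is never moved.
 */
static uint8_t *fixup_rx_buffer(uint8_t *buf)
{
    assert(buf != NULL);
    /* Source and destination overlap, so memmove rather than memcpy. */
    memmove(buf + ALIGNED_HDR_OFF, buf + HDR_DESC_OFF, ETHER_HDR_LEN);
    return buf + ALIGNED_HDR_OFF;
}
```

The frame then starts 2 bytes into the buffer and the IP header lands at offset 16, which is 4-byte aligned whenever the buffer is. The per-packet cost is a 14-byte copy instead of copying the whole packet up by 2 bytes.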
Re: Possible problem with rl (Realtek) ethernet card driver in 4.5-STABLE
Nigel Roberts writes:
> #10 0xc0237fbe in rl_rxeof (sc=0xc0b9d200) at ../../pci/if_rl.c:1151
> #11 0xc023827a in rl_intr (arg=0xc0b9d200) at ../../pci/if_rl.c:1342
> #12 0xc0279c7a in vec3 ()
> #13 0xc01c2196 in ether_output (ifp=0xc0ba4000, m=0xc076af00, dst=0xc0c28770, rt0=0xc0c59d00) at ../../net/if_ethersubr.c:369
> #14 0xc01d4663 in ip_output (m0=0xc076af00, opt=0x0, ro=0xc02f9970, flags=1, imo=0x0) at ../../netinet/ip_output.c:822

Was the realtek really at IRQ 3?

I'm NOT an x86 hacker, and I don't understand the interrupt code there very well.. Is it possible to have an irq line which is shared between 2 devices which use different interrupt masks? If so, what prevents intr_mux() from being called for a TTY interrupt, and then calling another driver which shares the line but has a NET mask, even when NET interrupts are masked?

Does this go away if you remove the serial line driver (sio) from your kernel? Can we see a (non verbose) dmesg from this box?

Drew
Re: How to dump a 4gig system on panic ?
Marc G. Fournier writes:
> Okay, seem to be about halfway there ... client kldload's no problem, server runs ... do a ctl-alt-esc to get into DDB and type panic, and it gives a message that its looking for the server and it finds it on the right IP ... then it prints out a '1023' and finishes the panic ...
>
> On the 'dump server', a vmcore gets created, but its zero length ... thoughts?

As I said, it hasn't been used for quite some time. It may require work to get it working again.

Drew
Re: How to dump a 4gig system on panic ?
There are 3 things you could do:

a) Limit your memory size in the loader
b) Use partial dumps
c) Use network dumps if you have another machine to run the dump server on.

Both the netdump and partial dump code can be found at: http://www.cs.duke.edu/~anderson/freebsd/

Both may be a little out of date and require some work to get working with a recent -stable, as they were developed in the days when 4.0 was -current.

Drew
Re: How to dump a 4gig system on panic ?
Marc G. Fournier writes:
> Oh, I like the netdump one ... I have a machine sitting right beside this one that I can use to dump to ... has anyone thought to include this as a 'standard' sort of thing with FreeBSD? So that it keeps up with the current code?

I plan to integrate partial dumps as an option at some point, but my only -current machines are alphas, so I need to get gdb working again there first.

Drew
Re: How to dump a 4gig system on panic ?
Marc G. Fournier writes:
> Well, downloaded the files (a .tar.gz would be nice? *grin*) and the client built perfectly, and kldload worked fine ... is there some way someone can suggest of 'simulating a crash'? Some way to test to make sure that it is working as expected? I have a 4.6-PRE machine on my desk that I'd like to test with before I try it on the real thing, if at all possible?

Break into ddb and do:

    ddb call dumpsys()

Unless you're running a savecore which supports partial dumps, you need to disable partial dumps (sysctl net.net_dump.partial=0).

And remember, you'll be spewing the contents of your ram (possibly passwords, etc) across the network in clear text.

Drew
pushal ebp
Kenneth Culver writes:
> So, as far as I can tell, this version of glibc is doing the Right Thing, and the ebp register is getting messed up somewhere along the line in either the assembly code that handles the 0x80 trap in FreeBSD, or in syscall2 (I think it's probably the asm that handles the 0x80 trap)... Can anyone confirm this?

I just looked at the NetBSD code, and like linux, they use a macro which individually pushes the registers onto the stack rather than using pushal (which I assume is the same as what intel calls PUSHAD in their x86 instruction set ref. manual). NetBSD stopped using pushal in 1994 in rev 1.85 of their arch/i386/i386/locore.s in a commit helpfully documented "Don't use pusha and popa."

Does anybody know why the other OSes push the registers individually, rather than using pushal? Could our using pushal be causing Kenneth's ebp to get lost, or is this just a red herring?

Thanks,

Drew
Re: pushal ebp
Kenneth Culver writes:
>> I just looked at the NetBSD code, and like linux, they use a macro which individually pushes the registers onto the stack rather than using pushal (which I assume is the same as what intel calls PUSHAD in their x86 instruction set ref. manual). NetBSD stopped using pushal in 1994 in rev 1.85 of their arch/i386/i386/locore.s in a commit helpfully documented "Don't use pusha and popa."
>>
>> Does anybody know why the other OSes push the registers individually, rather than using pushal? Could our using pushal be causing Kenneth's ebp to get lost, or is this just a red herring?
>>
>> Thanks,
>>
>> Drew
>
> according to the intel docs, pushad (or what I'm assuming is pushal in our case) pushes eax, ecx, edx, ebx then pushes some temporary value (the original esp I think) then pushes ebp, esi, and edi:
>
> this is from the documentation for pushad
>
>     IF OperandSize = 32 (* PUSHAD instruction *)
>     THEN
>         Temp <- (ESP);
>         Push(EAX);
>         Push(ECX);
>         Push(EDX);
>         Push(EBX);
>         Push(Temp);
>         Push(EBP);
>         Push(ESI);
>         Push(EDI);
>
> so could this be the problem?
>
> Ken

I don't think so. The temp it's pushing is the stack pointer. If you look at the layout of the trap frame, then you'll see tf_isp comes between tf_ebp and tf_ebx. I assume tf_isp is the stack pointer, so that should be OK..

Drew
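To see why the pushed stack pointer lands between ebp and ebx in memory, it helps to turn the push order from the quoted Intel pseudocode into a memory layout. The following is a small sketch; the array and function names are made up for illustration, and it models only the eight registers PUSHAD handles, not the full trap frame.

```c
#include <string.h>

#define NREGS 8

/* Order in which PUSHAD pushes, per the pseudocode quoted in the message. */
static const char *const push_order[NREGS] = {
    "eax", "ecx", "edx", "ebx", "esp(temp)", "ebp", "esi", "edi"
};

/*
 * The stack grows downward, so the last register pushed ends up at the
 * lowest address. Return the slot of `reg` counted from the lowest
 * address upward (0 = lowest), or -1 if the name is unknown.
 */
int pushad_slot(const char *reg)
{
    for (int i = 0; i < NREGS; i++)
        if (strcmp(push_order[i], reg) == 0)
            return NREGS - 1 - i;
    return -1;
}
```

Reading upward from the lowest address the order comes out as edi, esi, ebp, saved esp, ebx, edx, ecx, eax, so the saved stack pointer sits in the slot between ebp and ebx, as the reply says of tf_isp.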
Re: implementing linux mmap2 syscall
Kenneth Culver writes:
> OK, I THINK I found what calls the actual kernel syscall handler, and sets it's args first, but I'm not sure:
>
> from linux_locore.s
>
> NON_GPROF_ENTRY(linux_sigcode)
> ...
>
> Does anyone who actually knows assembly have any ideas?

This is the linux sigtramp, or signal trampoline. It is used to wrap a signal handler. E.g., the kernel calls it (by returning to it) when it delivers a signal. It calls the app's signal handler. When the handler returns, it calls the linux sigreturn system call. This has essentially nothing to do with system calls.

The system call entry point on x86 is int0x80_syscall, which is labeled:

    /*
     * Call gate entry for FreeBSD ELF and Linux/NetBSD syscall (int 0x80)
     ..

This then calls syscall2(), which calls the linux prepsyscall.

Maybe the argument isn't where you expect it to be, but is there. Can you make a test program which calls mmap2 with its 6th arg as something unique like 0xdeadbeef? Then print out (in hex :) the trapframe from the linux prepsyscall routine and see if you can find the deadbeef.

Drew
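The "plant a unique value, then look for it" trick from the last paragraph amounts to a linear scan over the saved registers. A minimal sketch follows, treating the trapframe as a plain array of 32-bit words; find_marker is a hypothetical helper for illustration, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan `n` 32-bit words for `marker`; return the index of the first
 * match, or -1 if the marker is not present.
 */
int find_marker(const uint32_t *words, size_t n, uint32_t marker)
{
    for (size_t i = 0; i < n; i++)
        if (words[i] == marker)
            return (int)i;
    return -1;
}
```

The index returned tells you which trapframe slot the argument actually arrived in; a result of -1 suggests the value never made it into the saved registers at all.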
Re: implementing linux mmap2 syscall
Kenneth Culver writes:
> OK, I found another problem, here it is:
>
>     static void
>     linux_prepsyscall(struct trapframe *tf, int *args, u_int *code, caddr_t *params)
>     {
>         args[0] = tf->tf_ebx;
>         args[1] = tf->tf_ecx;
>         args[2] = tf->tf_edx;
>         args[3] = tf->tf_esi;
>         args[4] = tf->tf_edi;
>         *params = NULL; /* no copyin */
>     }
>
> Basically, linux_mmap2 takes 6 args, and this looks here like only 5 args are making it in... I checked this because the sixth argument to linux_mmap2() in truss was showing 0x6, but when I printed out that arg from the kernel, it was showing 0x0. Am I correct here?
>
> Ken

Yes. According to http://john.fremlin.de/linux/asm/, linux used to parse only 5 args but now it parses six. Try adding:

    args[5] = tf->tf_ebp;

Drew
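For experimenting outside the kernel, the suggested change can be mocked up in userland. This is a sketch only: mock_trapframe and mock_prepsyscall are hypothetical stand-ins, not the kernel's struct trapframe or linux_prepsyscall(), and a later message in the thread reports that tf_ebp did not hold the expected value on the kernel being tested, so the extra line is a starting point rather than a confirmed fix.

```c
/*
 * Hypothetical stand-in for the kernel's struct trapframe: only the
 * registers the Linux int 0x80 convention uses for arguments.
 */
struct mock_trapframe {
    int tf_ebx, tf_ecx, tf_edx, tf_esi, tf_edi, tf_ebp;
};

/* Copy up to six syscall arguments out of the saved registers. */
void mock_prepsyscall(const struct mock_trapframe *tf, int args[6])
{
    args[0] = tf->tf_ebx;
    args[1] = tf->tf_ecx;
    args[2] = tf->tf_edx;
    args[3] = tf->tf_esi;
    args[4] = tf->tf_edi;
    args[5] = tf->tf_ebp;   /* the sixth argument suggested in the reply */
}
```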
Re: implementing linux mmap2 syscall
Kenneth Culver writes:
>>> Basically, linux_mmap2 takes 6 args, and this looks here like only 5 args are making it in... I checked this because the sixth argument to linux_mmap2() in truss was showing 0x6, but when I printed out that arg from the kernel, it was showing 0x0. Am I correct here?
>>>
>>> Ken
>>
>> Yes. According to http://john.fremlin.de/linux/asm/, linux used to parse only 5 args but now it parses six. Try adding:
>>
>>     args[5] = tf->tf_ebp;
>
> I don't think that arg is there:
>
> Apr 23 10:36:13 ken /kernel: tf->tf_ebp = -1077938040
>
> Ken

My guess is that we're not doing something we should be doing in int0x80_syscall in order to get that last arg. But I do not have enough x86 knowledge to understand how the trapframe is constructed, so I cannot tell what needs to be done. Perhaps somebody with more x86 fu can help.

Sorry,

Drew
Re: SSE bcopy
Denis Serenyi writes:
> I've been looking at adding an SSE bcopy that runs at user-level to a program that I'm working on. I'm using FreeBSD 4.3 currently. I wrote the routine, and when I execute it, I get an illegal instruction exception when I try to execute the first SSE instruction (movups).
>
> After searching the hackers archives, I'm guessing that this is because FreeBSD 4.3 does not execute the instructions at boot time to enable SSE instructions to be executed, and also because FreeBSD 4.3 does not save the 128-bit SIMD registers on context switches. Am I correct in this assessment?
>
> It also seems like this support has been added to FreeBSD 4.5. Is this correct? Assuming yes, in what release was SSE support added to FreeBSD? Has anyone done a patch that can be applied to FreeBSD 4.3, or are the changes non-trivial?

As David says, have a look at http://kobe1995.net/~kaz/FreeBSD/SSE.en.html There is a patch there for 4.3.

What are the performance implications to an SSE bcopy? How much faster is it than a normal bcopy? Would you consider releasing your code under a BSD license so that others could play with it, and possibly integrate it (or something based on it) into FreeBSD?

Thanks,

Drew
Re: SSE bcopy
Denis Serenyi writes:
> I don't think there will be a problem with releasing my source code. That is, if it works and is truly a performance win :)

Cool!

> There are some PDF docs available on Intel's web site that have sample code for an SSE bcopy, and give performance results (in particular, "Block Copy Using Pentium III Streaming SIMD Extensions"). It seems to be about 60 - 80% faster than using MMX instructions. However, when you use SSE to store data in the destination memory location, you bypass the processor's caches. So, if you were to touch the data soon after the bcopy, it is no win at all.

Hey, that's great! The copies I care about are in situations where the data is not touched until much later, so the normal copy is typically a big loss because it blows out the cache..

Good luck,

Drew
performance of mbufs vs contig buffers?
After updating the firmware on our 2 gigabit nic to allow enough scatter entries per packet to stock the 9K (jumbo frame) receive rings with cluster mbufs rather than contigmalloc'ed buffers(*), I noticed a dramatic performance decrease: netperf TCP_STREAM performance dropped from 1.6Gb/sec to 1.2Gb/sec.

(*) By contigmalloc'ed buffers, I mean a few megs of memory, carved up into 9K chunks and managed via slists, like is done in most of the in-tree gigabit ethernet drivers.

My first thought was that the firmware and/or processor on the NIC was somehow overwhelmed by the extra work of doing 5 2K DMAs rather than one 9K DMA. So I rebuilt my kernel driver using 4K cluster mbufs and added an option to the driver so that when it stocks the receive rings with contig buffers which are greater than a PAGE_SIZE, it breaks them up at page (4K) boundaries.

After making these changes, I'm roughly comparing apples to apples. Each packet is received into 3 DMA descriptors. However, I'm still seeing the same performance: 1.6Gb/sec receives into contigmalloc'ed buffers whose DMA descriptors are broken up into PAGE_SIZE'ed chunks, and 1.2Gb/sec into 4K mbufs.

Is it possible that my problems are being caused by cache misses on cluster mbufs occurring when copying out to userspace as another packet is being DMA'ed up? I'd thought that since the cache line size is 32 bytes, I'd be pretty much equally screwed either way.

Also, UDP_STREAM performance goes from 1.75Gb/sec to 1.25Gb/sec, so it's not some weird TCP quirk. All the UDP drops are from the socketbuffer being full (the host is receiving data at 1.9Gb/sec into main memory in both cases), so it's as if I have less memory bandwidth when using normal cluster mbufs.

I've been trying to use perfmon to compare cache misses, but I'm not sure what options I should be using..

Does anybody have any ideas why contig malloc'ed buffers are so much quicker?

Thanks!

Drew

PS: Here's the dmesg from the machine in question.
Serverworks LE 3.0, 1GHz PIII (256K cache). I've got page coloring enabled in the kernel; it doesn't seem to make much difference. Copyright (c) 1992-2002 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 4.5-STABLE #1: Mon Apr 8 17:33:51 EDT 2002 gallatin@ugly:/usr/src/sys/compile/PERFMON Timecounter i8254 frequency 1193182 Hz CPU: Pentium III/Pentium III Xeon/Celeron (999.53-MHz 686-class CPU) Origin = GenuineIntel Id = 0x68a Stepping = 10 Features=0x383fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE real memory = 536805376 (524224K bytes) avail memory = 517902336 (505764K bytes) Preloaded elf kernel kernel.perfmon at 0xc044f000. Pentium Pro MTRR support enabled md0: Malloc disk Using $PIR table, 9 entries at 0xc00f5250 npx0: math processor on motherboard npx0: INT 16 interface pcib0: ServerWorks NB6635 3.0LE host to PCI bridge on motherboard pci0: PCI bus on pcib0 atapci0: Promise ATA66 controller port 0xdf00-0xdf3f,0xdfe0-0xdfe3,0xdfa8-0xdfaf,0xdfe4-0xdfe7,0xdff0-0xdff7 mem 0xfc9e-0xfc9f irq 10 at device 2.0 on pci0 ata2: at 0xdff0 on atapci0 ata3: at 0xdfa8 on atapci0 fxp0: Intel Pro 10/100B/100+ Ethernet port 0xd800-0xd83f mem 0xfc80-0xfc8f,0xfc9ce000-0xfc9cefff irq 9 at device 6.0 on pci0 fxp0: Ethernet address 00:30:48:21:e4:47 inphy0: i82555 10/100 media interface on miibus0 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto isab0: ServerWorks IB6566 PCI to ISA bridge at device 15.0 on pci0 isa0: ISA bus on isab0 atapci1: ServerWorks ROSB4 ATA33 controller port 0xffa0-0xffaf at device 15.1 on pci0 ata0: at 0x1f0 irq 14 on atapci1 ata1: at 0x170 irq 15 on atapci1 pci0: OHCI USB controller at 15.2 irq 10 pcib1: ServerWorks NB6635 3.0LE host to PCI bridge on motherboard pci1: PCI bus on pcib1 pci1: ATI Mach64-GO graphics accelerator at 1.0 irq 11 pci1: unknown card (vendor=0x14c1, dev=0x8043) 
at 2.0 irq 5 orm0: Option ROMs at iomem 0xc-0xc7fff,0xc8000-0xc97ff,0xc9800-0xca7ff on isa0 fdc0: NEC 72065B or clone at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: FIFO enabled, 8 bytes threshold fd0: 1440-KB 3.5 drive on fdc0 drive 0 atkbdc0: Keyboard controller (i8042) at port 0x60,0x64 on isa0 vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0 sc0: System console at flags 0x100 on isa0 sc0: VGA 16 virtual consoles, flags=0x100 sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0 sio0: type 16550A, console sio1 at port 0x2f8-0x2ff irq 3 on isa0 sio1: type 16550A ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0 ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/8 bytes threshold plip0: PLIP network interface on ppbus0 lpt0: Printer on ppbus0 lpt0: Interrupt-driven port ppi0: Parallel I/O on ppbus0 ad4: 19092MB ST320414A [38792/16/63] at ata2-master UDMA66 acd0: CDROM CDU5211 at
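The descriptor counts mentioned in the post above (five 2K DMAs versus three page-sized ones for a 9K frame) fall out of a simple boundary count. A small sketch, assuming the frame is exactly 9 * 1024 bytes and that no descriptor may cross a chunk boundary; dma_segments is a hypothetical helper, not driver code.

```c
#include <stddef.h>

/*
 * Number of DMA descriptors needed to cover `len` bytes starting at byte
 * offset `start`, when no descriptor may cross a `chunk`-byte boundary.
 */
size_t dma_segments(size_t start, size_t len, size_t chunk)
{
    if (len == 0)
        return 0;

    size_t first = start / chunk;              /* chunk holding the first byte */
    size_t last = (start + len - 1) / chunk;   /* chunk holding the last byte */

    return last - first + 1;
}
```

For a buffer that starts on a boundary, 9216 bytes need 5 descriptors with 2048-byte clusters and 3 with 4096-byte pages, which is the apples-to-apples comparison the post sets up.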
Re: performance of mbufs vs contig buffers?
Terry Lambert writes:
> Andrew Gallatin wrote:
>> After updating the firmware on our 2 gigabit nic to allow enough scatter entries per packet to stock the 9K (jumbo frame) receive rings with cluster mbufs rather than contigmalloc'ed buffers(*), I noticed a dramatic performance decrease: netperf TCP_STREAM performance dropped from 1.6Gb/sec to 1.2Gb/sec.
>>
>> [ ... ]
>>
>> Is it possible that my problems are being caused by cache misses on cluster mbufs occurring when copying out to userspace as another packet is being DMA'ed up? I'd thought that since the cache line size is 32 bytes, I'd be pretty much equally screwed either way.
>>
>> [ ... ]
>>
>> Does anybody have any ideas why contig malloc'ed buffers are so much quicker?
>
> Instrument m_pullup(), and see how much it's being called in both cases. Probably you are seeing the 2 byte misalignment of the TCP payload in the ethernet packet.

The TCP payload is aligned. We stock the rings so that the ethernet header is intentionally misaligned, which makes the IP portion of the packet land aligned. (actually, we encapsulate the ethernet traffic behind another 16-bit header, so everything ends up aligned without the +2/-2 stuff).

> My other guess would be that the clusters you are dealing with are non-contiguous. This has both scatter/gather implications, and cache-line implications when using them.

Please elaborate... What sort of scatter/gather implications? Microbenchmarks don't show much of a difference DMA'ing to non-contiguous vs. contiguous pages. (over 400MB/sec in all cases). Also, we get close to link speed DMA'ing to user space, and with page coloring, that virtually guarantees that the pages are not physically contiguous.

Based on the UDP behaviour, I think that it's cache implications. The bottleneck seems to be when copyout() reads the recently DMA'ed data. The driver reads the first few dozen bytes (so as to touch up the csum by subtracting off the extra bits the DMA engines added in). We do hardware csum offloading, so the entire packet is not read until copyout() is called.

> Having thought about this problem before, I think that what you probably need is to chunk the buffers up, and treat them as M_EXT type mbufs (e.g. go with contigmalloc).

I really, really hate doing this for a variety of reasons. Mainly that the user may not expect the NIC driver is doing this and it may take her a while to realize that adjusting NMBCLUSTERS has no effect.

Although... Hmmm.. I could use a small amount of private buffers while I have them then fall back to contig buffers when I run out. I'd still like to fully understand the problem though; sweeping it under the rug bothers me.

> To be able to use generic mbufs for this, what's really needed is the ability to have variable size mbufs. At the very least, I think a single mbuf should be of a size so that the MTU fits inside it. Fixing this would be a large amount of work, and the gain is uncertain. You can get a minor idea of the available gain by looking at the Tigon II firmware changes to use page based buffer allocations, per Bill Paul & Co..

If you're thinking of what I'm thinking of (the zero copy stuff), I wrote that code. ;)

I seem to remember you talking about seeing a 10% speedup from using 4MB pages for cluster mbufs. How did you do that? I'd like to see what effect it has with this workload.

Thanks!

Drew
Re: Fatal trap 12: page fault while in kernel mode
Bruce A. Mah writes:
> I was discussing this with some of my cow-orkers, as we've had a similar situation (cluster mbufs getting temporarily depleted on a 4.5-RELEASE-p2 NFS server with Linux and FreeBSD clients, but no kernel panics). Shouldn't the net.inet.ip.maxfragpackets sysctl variable (introduced in 4.4-RELEASE) limit the number of fragments on the reassembly queue(s)? This value looks to be about 1/4 the number of cluster mbufs, by default.

That's a good point. When I was bitten by this, I didn't have time to mess with things, so I cranked down the read/write size on the linux clients.

The problem is that ip_maxfragpackets is: "Maximum number of IPv4 fragment reassembly queue entries"

You (and I, and most people probably) took that number to mean the cap on the number of mbufs sitting on reassembly queues. However, it's really a cap on the number of fragmented packets sitting on reassembly queues:

    /*
     * If first fragment to arrive, create a reassembly queue.
     */
    if (fp == 0) {
        /*
         * Enforce upper bound on number of fragmented packets
         * for which we attempt reassembly;
         * If maxfrag is 0, never accept fragments.
         * If maxfrag is -1, accept all fragments without limitation.
    ...

Since the linux host is sending 16K packets, that means that each packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu). There can be as many as 10 cluster mbufs on the reassembly queue for each packet. Let's say we have 2048 cluster mbufs. That makes maxfragpackets 512. However, 512 * 10 mbufs = 5120 mbufs. Oops.

I think the limit should probably be something much smaller, like maybe nmbclusters / (net.inet.udp.recvspace / 1472). Or the implementation should be changed so that the limit is a maxfragmbufs.

Drew
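The arithmetic in the message is easy to reproduce. The sketch below uses hypothetical function names and takes the recvspace value as a parameter, since the thread does not state the default; the 1472 divisor and the 2048/512/10 figures come straight from the message.

```c
/* Worst-case clusters parked on reassembly queues under a per-packet cap. */
unsigned worst_case_clusters(unsigned maxfragpackets, unsigned held_per_packet)
{
    return maxfragpackets * held_per_packet;
}

/* The smaller cap floated in the message: nmbclusters / (recvspace / 1472). */
unsigned proposed_maxfragpackets(unsigned nmbclusters, unsigned udp_recvspace)
{
    unsigned frags = udp_recvspace / 1472;   /* fragments per full socket buffer */

    /* Guard against a recvspace below one fragment (illustrative choice). */
    return frags ? nmbclusters / frags : nmbclusters;
}
```

With 2048 clusters the default cap of 512 packets allows 512 * 10 = 5120 clusters to be held, more than exist. Plugging in an assumed recvspace of 16384 bytes, the proposed formula gives a cap of 186 packets instead.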
Re: Fatal trap 12: page fault while in kernel mode
Terry Lambert writes:
> Andrew Gallatin wrote:
>> The problem is that ip_maxfragpackets is: "Maximum number of IPv4 fragment reassembly queue entries"
>>
>> You (and I, and most people probably) took that number to mean the cap on the number of mbufs sitting on reassembly queues. However, it's really a cap on the number of fragmented packets sitting on reassembly queues:
>>
>> [ ... ]
>>
>> Since the linux host is sending 16K packets, that means that each packet is made up of 11 cluster mbufs (assuming a 1500 byte mtu). There can be as many as 10 cluster mbufs on the reassembly queue for each packet. Let's say we have 2048 cluster mbufs. That makes maxfragpackets 512. However, 512 * 10 mbufs = 5120 mbufs. Oops.
>>
>> I think the limit should probably be something much smaller, like maybe nmbclusters / (net.inet.udp.recvspace / 1472). Or the implementation should be changed so that the limit is a maxfragmbufs.
>
> This suggests that one could fragment as large a UDP packet as one chooses into n fragments, and then supply only n-1 elements of the whole packet, as an attack, in order to use up system resources.

Essentially what a linux NFS client is already doing.. ;-(

> I think we are better off with my suggestion, where udp packets above a certain size are intentionally dropped as not supported.

Depending on what the certain size is, that might be reasonable.

> Alternately, it would be a good idea to have a ip_maxpacketfrags instead of an ip_maxfragpackets, to put a hard limit on the number of mbufs that can be consumed by the fragment reassembly process.

I think this is the best solution.

> Of course, this also suggests that using TCP instead of UDP for the NFS would result in the problem just going away, for the original poster, which is probably all the original poster really cares about...

Considering that a modern linux NFS client is going to be a common scenario, we should probably be able to interoperate with it, no matter how broken its defaults are.

BTW, 16K UDP packets are legal according to the NFS V3 spec, if I remember it correctly.

Drew
Re: Fatal trap 12: page fault while in kernel mode
Will Froning writes:
> I have a 4.5-RELEASE-p2 box that is my Firewall/NAT/NFS server. As a NFS client I have a RH7.2 linux box. When I do massive NFS writes to my FBSD (from RH7.2 box), I get a panic. I've attached the info I got from my debug kernel.

While the fix being discussed by Peter & others will prevent panics, the linux box will still run your server out of mbuf clusters. This is happening because the linux box is using a 16K write size over UDP by default. This is a stupid default. If there is any lossage between the hosts (e.g., any packets get dropped), more and more packets will end up on the reassembly queues. Eventually, all your cluster mbufs will be there. I suggest changing the mount options on the linux box to use 8k reads and writes, or use TCP.

Another problem I've seen w/Linux NFS clients is that recent linux NFS clients seem to spew ACCESS requests like there's no tomorrow, which beats the snot out of my NFS server. When building large software packages via make -j4 over NFSv3 (100Mb ethernet) on a dual PIII 1GHz system, a FreeBSD 4.5 host issues 400-500 ACCESS calls/sec. A Linux 2.4.18 host spews 12,000 - 14,000 ACCESS calls/sec, or roughly 30 times as many. Needless to say, the build finishes a whole lot quicker on FreeBSD. Does anybody know what I can do to make the linux client cache ACCESS info?

Cheers,

Drew
Re: Kernel Debugging over the Ethernet?
Justin C. Walker writes:
> On Wednesday, February 20, 2002, at 04:52 PM, Julian Elischer wrote:
>> yes but we might as well be protocol compatible if possible :-) If only to re-use what they did in gdb :-)
>
> The Darwin/Mac OS X scheme only deals with IOKit because that's where the drivers live. The protocol implementation is in the directory 'xnu/osfmk/kdp'. It's in essence a UDP protocol, and is implemented without using any of the system's networking scheme (except for mbufs). The implementation is polling. The implementation is pretty light-weight.

Where do the Darwin gdb sources live, so we can see the gdb end of it too? I've looked, but have so far been unable to find them.

Thanks,

Drew
Re: Serverworks ATA controller data corruption
Søren Schmidt writes:
> Hmm, the problem is known, but believed to be fixed *IF* your BIOS sets up things the right way. I've never seen the problem on my ASUS CUR-DLS, but I have several reports of TYANs (forgot the model#) that fail all over. I have not verified if ASUS has done some HW trickery or if it's just a BIOS matter.
>
> However the Serverworks ROSB4 chip is not one I would recommend using; if you need serious ATA support on such a board, install a Promise TX2 or later or a HPT370 or later ...

I don't much care about serious ATA support on these machines -- nearly all work is done on NFS volumes exported from an alpha. If I can just trust PIO not to corrupt the system disk, then it will be fine for me.

So.. Is PIO safe? Is there any sort of CRC being done on PIO data?

Thanks,

Drew
Re: Serverworks ATA controller data corruption
Terry Lambert writes:
>> So.. Is PIO safe? Is there any sort of CRC being done on PIO data?
>
> He just said: if your chipset is programmed correctly by the BIOS, then there will not be a problem, but apparently, there is a very narrow band of "correctly" (perhaps even only a single state), and the vendor apparently does not default the chip into that state.

I was asking a more general question about ATA -- I know that UDMA has some sort of CRC protection because (on other machines) I've seen the occasional error about a bad CRC, retrying. But what I don't know is if PIO offers the same protection.

Drew
Re: Serverworks ATA controller data corruption
Terry Lambert writes:
> Andrew Gallatin wrote:
>> Terry Lambert writes:
>>>> So.. Is PIO safe? Is there any sort of CRC being done on PIO data?
>>>
>>> He just said: if your chipset is programmed correctly by the BIOS, then there will not be a problem, but apparently, there is a very narrow band of "correctly" (perhaps even only a single state), and the vendor apparently does not default the chip into that state.
>>
>> I was asking a more general question about ATA -- I know that UDMA has some sort of CRC protection because (on other machines) I've seen the occasional error about a bad CRC, retrying. But what I don't know is if PIO offers the same protection.
>
> PIO is safe. The problem with ATA DMA needing the CRC is to recover from the case where the DMA is aborted in the middle, which is not signalled (this was the problem with the CMD640B ATA chipset interface on Intel).

Or marginal cables, I'd assume.

> In fact, you might want to try enabling the CMD640B workaround on your system, even though it is not probing a CMD640B present, and see if that fixes it (the chipset in question might be using the same macrocell in its implementation, or it might just be similarly buggy). If that worked, then you could leave the DMA enabled.

Ick. No thanks.

> PIO makes the host CPU do the work... basically, it's like a WinModem, only for ATA interfaces, and it's documented. 8-(. Actually, now that I think about it, using the main CPU and doing PIO might be better anyway, given the speed difference between the main CPU and the DMA engine on the ATA chip; the overall performance may even be up to 2x better using the host CPU to do the work, particularly if you special case the transfer alignment, the way bcopy does.

Not without write combining, at least, and PIO reads suck for x86s almost universally. To add insult to injury, most revs of this chip have a well known PIO corruption bug when write combining is enabled.

Drew
Re: mmap and PROT_WRITE
Jason Mawdsley writes:
> Why can't I write to memory in the first case? Is there any way I can implement writable but not readable memory? I read somewhere that there is no true write only memory due to the limitations of x86.

I think you must have read correctly -- your sample code runs fine (both cases) on FreeBSD/alpha. The same test program dumps core on FreeBSD/i386.

Cheers,

Drew
Re: requesting guidance for updating the RocketPort driver
John Baldwin writes:
> On 09-Feb-02 Julian Elischer wrote:
>> The infrastructure needed for a new driver can be taken from the sample driver in /usr/share/examples/drivers/make_device_driver.sh IN -CURRENT. (use cvsweb on the website to get it) that will at least get rid of the 'shims' stuff.
>
> There is already a newer driver in current. I've backported it but it didn't help me get my RocketPort working. :-P (I think my rocketport has other issues though as other people have had success with the cards on the old driver.) The backport is fairly easy. You need to basically take the src/sys/dev/rp/ ..

I've done essentially that and have been using it to drive serial consoles off my alpha for a few months now on -stable with no problems. I needed the newer driver from -current because the old driver isn't bus-space-ified and doesn't work on alpha.

rp0: RocketPort PCI port 0x10180-0x101bf irq 9 at device 10.0 on pci0
RocketPort0 (Version 3.02) 8 ports.

Try http://people.freebsd.org/~gallatin/rp.tgz. I never bothered to touch all the files, so that means you must build it as a module:

    cd /usr/src
    fetch http://people.freebsd.org/~gallatin/rp.tgz
    tar zxf rp.tgz
    cd sys/modules/rp
    make depend
    make
    make install
    kldload rp

(this assumes you're not already running a kernel with the old driver built in)

All the standard disclaimers apply. Don't blame me if it blows up your computer, gives you an ulcer or gives your cat hairballs...

Anyway, let me know if it works any better for you than the old driver in -stable. I've been reluctant to commit it since I don't want to be responsible for maintaining it and just running serial consoles at 9600 baud doesn't push things hard enough to find bugs.

Drew
Re: Does anyone know if the Broadcom BCM5700 has problems with HW csum?
David Greenman writes: David Greenman wrote: In any case, disabling it is what ClickArray ended up doing, as well, for the Tigon II, until the firmware could be fixed. We're talking about the Tigon III (bge driver for Broadcom BCM5700/BCM5701). Crap. Thanks for the info. Have you manually calculated the checksum on a bad packet to see how it's off? Yes. It's typically off by 0x1051, but varies depending on the TCP/IP header contents. Hmm.. Since you've already got the code for calculating the checksum in the driver written, why not use it? Eg, why not pass the csum up and set CSUM_DATA_VALID iff the csum ends up being 0? Are you worried that the firmware will yield false positives too? Drew
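The check suggested above (recompute in software, and only report the data as valid when the result is 0) relies on a property of the Internet checksum: summing a block that already contains its own correct checksum gives all ones, so the complemented result is 0. A minimal userland sketch of that arithmetic follows. The function names are made up for illustration, and this is not the bge driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of a buffer, taken as big-endian 16-bit words
 * and folded back into 16 bits (the scheme described in RFC 1071). */
uint16_t
ones_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)                      /* odd trailing byte, zero padded */
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                 /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Value to store in the checksum field (field must be zero while summing). */
uint16_t
cksum_compute(const uint8_t *buf, size_t len)
{
    return (uint16_t)~ones_sum(buf, len);
}

/* Returns 1 when a block that includes its stored checksum verifies. */
int
cksum_ok(const uint8_t *buf, size_t len)
{
    return cksum_compute(buf, len) == 0;
}
```

A driver doing this would set its "checksum valid" flag only when cksum_ok() returns 1 and otherwise leave the packet for the stack to verify.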
Re: TCP Performance Graphs
Leo Bicknell writes: The question that immediately comes to mind is, why not simply use as big a value as possible? The problem comes down to buffering the data, and busy servers may have to buffer a lot of data. Having a 1 meg window size may have you buffer 1 meg per connection. Note that FreeBSD's current buffer management is particularly stupid in that it will _always_ buffer 1 Meg, need it or not. Until we fix this we need an interim solution. I thought that I heard a few months ago that Matt Dillon was looking at ways to dynamically size tcp windows from within the kernel. Maybe I'm on crack. Maybe we should look at the Dynamic Right-Sizing work being done at LANL. See "Dynamic Adjustment of TCP Window Sizes" and "Dynamic Right-Sizing: A Simulation Study" at http://public.lanl.gov/radiant/publications.html Cheers, Drew
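For a rough sense of the numbers in this thread: the window needed to keep a path full is about the bandwidth-delay product, and if every connection always reserves a full window, the memory cost grows linearly with the connection count. A small sketch of that arithmetic follows. The function names and the 100 Mb/s, 80 ms and 1000-connection figures are illustrative assumptions, not measurements from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes: how much data is in flight on a
 * path of the given speed and round-trip time. */
uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
    return bits_per_sec * rtt_ms / 8 / 1000;
}

/* Worst-case buffer memory if every connection pins a full window. */
uint64_t
total_buffer_bytes(uint64_t window_bytes, uint64_t nconn)
{
    assert(nconn == 0 || window_bytes <= UINT64_MAX / nconn);
    return window_bytes * nconn;
}
```

Under these assumptions a 100 Mb/s path with an 80 ms round trip needs roughly a 1 MB window, while 1000 connections that each pin a 1 MB buffer would tie up about 1 GB, which is the tension the thread describes.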
Re: 64bit Ethernet Card (if_sf driver)
[EMAIL PROTECTED] writes: Anyone with experience or ideas? Because of the alignment constraints of the card, it's copying every single packet the driver receives. This is required on alpha (and possibly other platforms) to prevent an unaligned access. In a forwarding situation on an x86, it is suboptimal. Try making the m_devget in the rcv handler conditional on !i386 (see http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/nge/if_nge.c.diff?r1=1.13.2.2&r2=1.13.2.3 for an example of how to change this) I'd be interested to hear (quantitatively) how much your perf changes. Drew
Re: timestamp offload [was Re: TCPIP cksum offload on FreeBSD 4.2]
Louis A. Mamakos writes: Some work I did a year or so ago measured the interrupt response time latency, and it was pretty impressive at how large and variable it could be. louie Yes. Me too, but with a Pamette, not a nic. Have you read the PCI Pamette perf paper (Systems Performance Measurement on PCI Pamette (1997), Laurent Moll and Mark Shand)? http://citeseer.nj.nec.com/1690.html If anybody cares, I have FreeBSD drivers for the Pamette. Drew
Re: TCPIP cksum offload on FreeBSD 4.2
Terry Lambert writes: Jonathan Lemon wrote: I'm trying to use the TCPIP checksum offload capability of the Netgear GA620 NIC from a SMP FreeBSD 4.2R system running on a typical PIII SBC. .. He didn't say his packet size, either. To the original poster: if you are sending jumbograms, the buffer size on these cards is limited, so the entire packet can't be in the card buffer at the same time, which means that you can not offload the send checksum for jumbograms, only for regular sized packets. This is an Alteon Tigon-2 (ti driver) based card with 512K of sram on board. It has plenty of space for offloading transmit checksums on jumbo frames. Perhaps you're thinking of the DP83820/DP83821 (nge driver), which cannot compute the checksum on an outgoing frame unless it fits in the 8K tx fifo. I think NetGear sells a card with a similar name (GA622T) based around this chip. Drew
Re: TCPIP cksum offload on FreeBSD 4.2
Ronald G Minnich writes: I have a question on the checksum offloading. Has anyone measured any incidence of data corruption between the PCI card and memory. In other words, when you offload checksums the end-to-end checking becomes card-to-card checking, and the possibility exists that what goes in memory at the destination end is not what was sent at the source. Very remote possibility, of course, but ... We used to see occasional data corruption at Duke with 440BX based motherboards with non-ecc ram. We never saw it on higher-quality hosts (alphas or serverworks based pc motherboards) with ecc memory. It would manifest itself as bad TCP checksums (no csum offload at the time). of these types of problems (of course FreeBSD has the fastest IP over Myrinet anyway, so it's not like that's a huge problem). Not any more. A 2.4 linux kernel will do a bit better than FreeBSD on an SMP box because it is able to use both processors. Speaking of which -- who is working on making the network stack SMP capable in -current? Anything I can do to help? Drew
Re: TCPIP cksum offload on FreeBSD 4.2
Louis A. Mamakos writes: The other type of failure you might not catch are software errors; that is, where a packet is produced by the network stack and then is subsequently stomped on by a random store from some other code. Or a mis-programmed I/O card with scatter/gather capability doesn't pick up what was intended, etc. The Internet checksum is useful for detecting this class of error. No, you're missing the point almost entirely. The checksum is not skipped. It is calculated by the DMA engine based on the data that's transferred across the I/O bus on the receiver (and / or the sender). If the data is incorrect as seen by the receiving nic, the checksum will be wrong and the packet will be dropped. If the packet lands in the wrong place, you have much worse problems. Drew
Re: TCPIP cksum offload on FreeBSD 4.2
Ronald G Minnich writes: you still have a potential problem here with variance in chipsets, namely the case of broken ABORT or other unusual PCI cycle handling (missed word problem). I agree it's a low probability. But we've seen it, just a week or two ago on a brand new box. But then we tend to see things here nobody else sees due to our scale. ron At this level, you're basically screwed. A software checksum isn't even an option on other PCI users, like disk controllers. If you don't trust your PCI chipset, what do you do about things like that? I'm rather curious -- what was the problematic hardware combination? Drew
Re: TCPIP cksum offload on FreeBSD 4.2
Ronald G Minnich writes: On Thu, 27 Sep 2001, Andrew Gallatin wrote: At this level, you're basically screwed. A software checksum isn't even an option on other PCI users, like disk controllers. If you don't trust your PCI chipset, what do you do about things like that? I'm rather curious -- what was the problematic hardware combination? Can't say yet :-( But it is one of the fancy network interfaces that essentially runs an RTOS on the NIC so it can help you. Actually fancy $5000 network interfaces are in general less reliable than your average garden-variety $2 IDE chip. Partly because they have so much capability. So we don't worry a lot about lossage with IDE. But it's a big problem on expensive, high end, high performance network interfaces. But SCSI isn't immune either. We had some data corruption problems with early Adaptec Ultra-2 scsi controllers too, before Justin fixed it by working around it in the driver. Basically, anything that uses a PCI chipset harder or in different ways than its designers expected can end up being a problem. Low volume hardware is sometimes worse, but not always... Drew
Re: TCPIP cksum offload on FreeBSD 4.2
Louis A. Mamakos writes: I was referring to the case on the transmit side where the wrong data gets gathered up by the DMA engine because of software related errors. You get a valid checksum, but for the wrong data. You might have the wrong data because a driver screwed up setting the DMA descriptors, or some other I/O transfer splatted over the buffer waiting in a transmit queue. What happens if that same i/o transfer splatted over the buffer waiting in user space prior to the copyin, or sitting in the socket buffer prior to a software checksum being done? Software checksums are not quite the panacea you make them out to be. And they're very expensive. Geez. All I wanted to do was pat Jonathan on the back for coming up with what is apparently the most flexible and well thought out mechanism out there. These issues have been argued to death; I don't feel like arguing with you. I'm satisfied that I'm not going to convince you and you're not going to convince me. Drew
Re: TCPIP cksum offload on FreeBSD 4.2
Louis A. Mamakos writes: Folks ought to consider the likelihood of this class of data corruption, unlikely as it is, and weigh it along with the impact on your application, and the differences in performance and loading. Agreed. Very well said, by the way.. Drew
Re: ecc on i386
Peter Wemm writes: Thanks for your description of how ECC is reported on PCs. That was very, very helpful. The Tyan Thunder 2510 BIOS even disables ECC - NMI routing so you have to go to quite a bit of trouble to reprogram the serverworks chipset to actually generate NMI's so that you can find out if something got trashed. Is that the HE-SL or the LE-3 chipset? Is that code available? I have some LE-3 based boxes which I'd like to be certain DTRT. Unlike my wife's Dual Athlon, these boxes have nothing in their BIOS pertaining to ECC error reporting. (Supermicro 370-DLE) Our NMI / ECC handling really really sucks in FreeBSD. Consider:

- i686_pagezero - reads before writing in order to minimize cache snooping traffic in SMP systems. However, if it gets an NMI while trying to check if the cache line is already zero, it will take the entire machine down instead of just zeroing the line.
- NFS / VM / bio: when they get an NMI while trying to copy data that is clean and backed by storage, they take the machine down instead of trying to recover and re-read the page.
- userland.. If userland gets an NMI, the machine dies instead of killing the process (or rereading a text page etc if possible)
- our NMI handlers are a festering pile of excrement. They don't have the code to 'ack' the NMI so it isn't possible to return after recovery.
- and so on.

Well, at least we take the machine down, which is a heck of a lot better than ignoring the problem, which is really all that I was hoping for. Thanks again, Drew
ecc on i386
What happens on an ECC equipped PC when you have a multi-bit memory error that hardware scrubbing can't fix? Will there be some sort of NMI or something that will panic the box? I'm used to alphas (where you'll get a fatal machine check panic) and I am just wondering if PCs are as safe. Thanks, Drew
Re: ecc on i386
Matt Dillon writes: :What happens on an ECC equipped PC when you have a multi-bit memory :error that hardware scrubbing can't fix? Will there be some sort of :NMI or something that will panic the box? : :I'm used to alphas (where you'll get a fatal machine check panic) and :I am just wondering if PCs are as safe. : :Thanks, : :Drew ECC can typically detect and correct single bit errors and detect double bit errors. Anything beyond that is problematic... it may or may not detect the problem or may mis-correct a multi-bit error. An NMI is generated if an uncorrectable error is detected. On PC's, ECC is optional. Desktops typically do not ship with ECC memory. Branded servers typically do. A year or two ago I would have been happy to use non-ECC rams (finding bad RAM through trial and error), but now with capacities as they are and memory prices down ECC is definitely the way to go. My sentiments exactly. Bit errors can come from many sources, memory being only one. Bit errors can occur inside the cpu chip, in the L1 and L2 caches, in memory, in controller chips... all over the place. Many modern processors implement parity on their caches to try to cover the problem areas. I'm not sure how Pentium III's and IV's are set up. -Matt Hmm.. Well, it turns out that the box I'm interested in (Thunder K7) can be set to send an SERR on multiple bit errors. I wonder what happens when a pc gets an SERR? (that's another machine check on alpha) Drew
Re: any reason to use m_devget in the dc driver ?
Luigi Rizzo writes: Does anyone know of specific reasons to use m_devget() to extract received packets from the rx buffer in the dc driver, as opposed to passing up the mbuf and just replacing it with a fresh one in the controller's queue ? Other drivers just happily do the latter, including the de driver, so there seems to be no problem with the chipset in handling this ? I imagine that this was done to follow alignment constraints on non-i386 platforms where having the ip header misaligned is fatal. (the tulip is not capable of byte granularity DMA, so you can't intentionally misalign the ethernet header and end up with an aligned IP header) I imagine the i386 should be made an exception. See rev 1.17 of sys/dev/nge/if_nge.c Drew
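The alignment point above is plain arithmetic: an Ethernet header is 14 bytes, so when a frame starts on a 4-byte boundary the IP header lands 2 bytes off, and starting the frame 2 bytes in puts the IP header at offset 16. A hedged userland sketch of that arithmetic and of the copy a driver falls back on when the hardware cannot DMA to an odd offset follows. The names are invented for illustration, and this is not the dc or nge driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_HDR_LEN 14   /* dst MAC (6) + src MAC (6) + ethertype (2) */

/* Is the IP header 32-bit aligned when the frame starts `pad` bytes
 * into a buffer that itself begins on a 4-byte boundary? */
int
ip_hdr_aligned(size_t pad)
{
    return (pad + SK_ETHER_HDR_LEN) % sizeof(uint32_t) == 0;
}

/* Copy a received frame into dst starting `pad` bytes in, and return a
 * pointer to where the IP header ends up (NULL if it does not fit). */
uint8_t *
copy_frame_padded(uint8_t *dst, size_t dstlen, const uint8_t *frame,
    size_t framelen, size_t pad)
{
    assert(dst != NULL && frame != NULL);
    if (framelen < SK_ETHER_HDR_LEN || pad + framelen > dstlen)
        return NULL;
    memcpy(dst + pad, frame, framelen);
    return dst + pad + SK_ETHER_HDR_LEN;
}
```

The copy is what costs on every packet. On a platform that tolerates misaligned loads it could be skipped, which is the exception being proposed for i386.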
Re: any reason to use m_devget in the dc driver ?
Terry Lambert writes: Andrew Gallatin wrote: I imagine that this was done to follow alignment constraints on non-i386 platforms where having the ip header misaligned is fatal. (the tulip is not capable of byte granularity DMA, so you can't intentionally misalign the ethernet header and end up with an aligned IP header) This is the reason: the ethernet header is 14 bytes. I imagine the i386 should be made an exception. See rev 1.17 of sys/dev/nge/if_nge.c I disagree with this code; the elements in the header are referenced multiple times. If you are doing the checksum check, you might as well be relocating the data, as well. The change I would make would be to integrate the checksum calculation with the m_devget(), to ensure a single pass, in the case that m_devget() must be used to get aligned packet payload, and the checksum has not been offloaded to hardware. Interesting idea... However, what if you're a bridge or a router? You've just done a whole lot of work for nothing. I imagine it's just this case that Luigi cares about. If you want to integrate a checksum and a copy, it should really be done at the copyout() stage. Drew
Re: Driver structures alignment
Peter Wemm writes: The same goes for __format_arg(n) in stdio.h. And so on. We've been pretty clean about it so far, but a few have slipped through. That __format_arg, btw, breaks the Compaq CCC compiler and causes us to have to override stdio.h because of just that one line. Does your comment mean this has a chance of getting fixed? Thanks, Drew
Re: TCSH bug...
Steve Ames writes: We *do* know who that is. This however is a more tcsh-specific issue, and raising it with the tcsh author would probably lead you to faster happiness. Is there some reason you won't email him about this? Except it isn't tcsh specific really. Our config.h in /usr/src/bin/csh defines SYSMALLOC. The port does not. The port works, the system version doesn't. If you comment out SYSMALLOC in /usr/src/bin/csh/config.h and recompile then the TCSH bug goes away. Now you could argue that perhaps the definition of SYSMALLOC just exposes a bug in tcsh? OTOH, since the system version in -STABLE also defines SYSMALLOC and still manages to work... you could also argue that this points to some other bug in -CURRENT... lastly it could be argued that I'm barking up completely the wrong tree. *shrug* Actually, it is a tcsh bug. Try playing with the MALLOC_OPTIONS env. variable in -stable. Specifically, set it to 'AJ'. I bet it will drop core in -stable. Eg:

  12:10pmthunder/gallatin:/tmp uname -sr
  FreeBSD 4.4-RC
  12:10pmthunder/gallatin:/tmp setenv MALLOC_OPTIONS 'AJ'
  12:10pmthunder/gallatin:/tmp tcsh
  tcsh 6.10.00 (Astron) 2000-11-19 (alpha-digital-FreeBSD) options 8b,nls,dl,al,kan,sm,rh,color,dspm
  12:10pmthunder/gallatin:/tmp set rmstar
  12:10pmthunder/gallatin:/tmp rm *
  Do you really want to delete all files? [n/y] n
  Segmentation fault (core dumped)

Note that -current has malloc options 'AJ' on by default to catch just this kind of bug. Drew
Re: NatSemi DP83820 gigE driver kit for 4.2 and 4.3
[EMAIL PROTECTED] writes: A more important question is are these 32-bit cards, and if so, do they have enough internal buffer to do sustained 1GB transfers. Generally 32-bit PCI is too slow for GB, as it can't do sustained 1GB transfers. Some 32-bit GB cards are just a total waste. The two cards that I have experience with are the Netgear GA622T and SMC9462TX. Both are 64-bit/66MHz cards. The first nge cards we tried were a pair of Netgear GA622T boards. They leave a lot to be desired. We put them in our Dell PowerEdge 4400 boxes (Serverworks chipset with interleaved ram and 64-bit/66MHz PCI, 733MHz Xeon) and hooked them up through our Extreme Summit 7i Gigabit switch (Copper). They have a decent packets/second rate for minimally sized packets (155,000 packets/sec or so), but they have serious trouble filling the link with UDP packets -- even with jumbo frames, I can't seem to push more than 450Mb/sec out of them. At this point, we figured the NatSemi DP8382x was just a lousy chipset, so we ordered a pair of SMC9462TX boards. Based on comments which used to be in the lge driver, we assumed that they used the Level 1 LXT1001 chips. However, we found out that the SMC9462TX boards that we have use the NatSemi DP8382x. (Perhaps the SMC9462SX uses the LXT1001?) We were pleasantly surprised to learn that the nge based SMC boards do perform well. Using the same hosts and switch as above, we can nearly fill the link with 1500 byte packets (950Mb/sec, I think). And they can also sustain more than 155,000 minimally sized packets/sec. They can easily fill the link with jumbo frames, but then there's that 8k tx fifo checksum limitation. Hope this helps, Drew
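To put the figures above in context, here is a back-of-the-envelope sketch of the theoretical ceilings involved: peak PCI bandwidth is bus width times clock, and the gigabit frame rate follows from the frame size plus 20 bytes of per-frame wire overhead (8 bytes of preamble and 12 of inter-frame gap). The function names are invented for illustration, and the PCI figure ignores arbitration and other bus overhead, so real sustained rates are lower.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak PCI bandwidth in Mb/s: bus width times clock rate. */
uint64_t
pci_peak_mbps(uint64_t width_bits, uint64_t clock_mhz)
{
    return width_bits * clock_mhz;
}

/* Maximum Ethernet frames per second on a link. frame_bytes counts the
 * header, payload and FCS; preamble (8) and inter-frame gap (12) are
 * added here as fixed per-frame overhead. */
uint64_t
max_frames_per_sec(uint64_t link_bps, uint64_t frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / ((frame_bytes + 20) * 8);
}
```

By this arithmetic a 32-bit/33MHz slot peaks at 1056 Mb/s before any bus overhead, which leaves essentially no headroom for gigabit, while a 64-bit/66MHz slot peaks at 4224 Mb/s. Gigabit Ethernet tops out at about 1,488,095 minimum-size (64 byte) frames per second, so the 155,000 packets/sec reported above is roughly a tenth of wire rate.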
Re: NatSemi DP83820 gigE driver kit for 4.2 and 4.3
Bill Paul writes: by user programs, but these don't panic the system. In the case of FreeBSD/alpha, we fake it up so you know about the problem but the process keeps running. Some OSes (e.g. Solaris) clobber the process with a SIGBUS. Some would argue the latter behavior is better since it makes it easier to find and fix what is probably a bug in the first place. Actually, you can control this behaviour with the uac(1) command on FreeBSD/alpha. 'uac -s' causes unaligned access errors to result in a SIGBUS being delivered to the parent and its future descendants. You can also enable/disable printing of errors, etc. Really handy when you're using a ghostscript not built w/Compaq C. Also, Tru64 has a similar command with the same name and different syntax. Drew
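For user code that trips these unaligned access traps, the usual fix at the source level is to stop dereferencing a cast pointer and copy the bytes through memcpy() instead, which is well defined at any address. A minimal sketch follows; the helper names are hypothetical and nothing here is specific to FreeBSD/alpha.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address that may not be 4-byte aligned.
 * Dereferencing (const uint32_t *)p directly is what traps on strict
 * alignment machines; memcpy() has no alignment requirement. */
uint32_t
load32_unaligned(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Store a 32-bit value to a possibly unaligned address. */
void
store32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof(v));
}
```

Compilers generally turn these into a single load or store on machines where that is safe, so the portable form usually costs nothing on x86.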
Re: How to disable software TCP checksumming?
Jesper Skriver writes: On Tue, May 29, 2001 at 02:41:14PM -0500, Bob Willcox wrote: Hi, I am working on a device driver for a GSN adapter that has hardware CRC checking and need to know if there is a way to disable the software CRC checking for TCP? This is on a FreeBSD 4.2-stable system. Eegads. I think the original poster wanted to be able to use the hardware CRC features of his nic, not ignore checksums altogether. Bob -- Take a look at the /sys/pci/if_ti.c driver for an example of how to use hardware checksum assist. On the receive side, you want to set the m_pkthdr.csum_flags appropriately (depending on what your device can do) on each receive, as well as fill in the actual checksum in m_pkthdr.csum_data. On the send side, you need to specify what your device is capable of assisting with in the if_hwassist field of your driver's ifp struct. Packets will come down w/o those fields filled in. The stack will expect your device to calculate those fields in hardware. I believe these features appeared around 4.1, so if this is a 3rd party driver, you may want to check __FreeBSD_version = 41. Hope this helps, Drew -- Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin Duke University Email: [EMAIL PROTECTED] Department of Computer Science Phone: (919) 660-6590
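The receive-side bookkeeping described above can be modelled in a few lines of userland C. This is only a sketch of the idea: struct sk_pkthdr and the SK_CSUM_* flags are stand-ins invented here with arbitrary values, not the kernel's definitions, and the real flag names and semantics should be taken from sys/mbuf.h and the if_ti.c driver mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's checksum flags. */
#define SK_CSUM_IP_CHECKED  0x1  /* hardware looked at the IP header sum */
#define SK_CSUM_IP_VALID    0x2  /* ...and it was correct */
#define SK_CSUM_DATA_VALID  0x4  /* csum_data holds a hardware TCP/UDP sum */

/* Hypothetical model of the two packet header fields named above. */
struct sk_pkthdr {
    int      csum_flags;
    uint32_t csum_data;
};

/* Record what the (imaginary) NIC reported for one received packet.
 * Anything the hardware did not check is left for the stack to verify. */
void
sk_rx_set_csum(struct sk_pkthdr *ph, int hw_checked_ip, int hw_ip_ok,
    int hw_has_l4_sum, uint16_t hw_l4_sum)
{
    assert(ph != NULL);
    ph->csum_flags = 0;
    ph->csum_data = 0;
    if (hw_checked_ip) {
        ph->csum_flags |= SK_CSUM_IP_CHECKED;
        if (hw_ip_ok)
            ph->csum_flags |= SK_CSUM_IP_VALID;
    }
    if (hw_has_l4_sum) {
        ph->csum_flags |= SK_CSUM_DATA_VALID;
        ph->csum_data = hw_l4_sum;
    }
}
```

The design point is that the driver only asserts what the hardware actually verified, so a packet with no flags set simply takes the normal software path.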
Re: _SC_NPROCESSORS_CONF
Arun Sharma writes: Single UNIX spec doesn't include the above sysconf(3) argument, but many UNIX variants do. What's the BSD way of doing this ? How about the hw.ncpu sysctl? Drew
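A minimal sketch of reading hw.ncpu from C with sysctlbyname(3) follows. The function name ncpu_sketch is made up, and the non-FreeBSD branch is an assumption added only so the example builds elsewhere: it falls back to the sysconf() extension that the question mentions other systems providing.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Return the number of CPUs, or -1 if it cannot be determined. */
int
ncpu_sketch(void)
{
#if defined(__FreeBSD__)
    int ncpu = 0;
    size_t len = sizeof(ncpu);

    /* hw.ncpu is an int; no new value is being set, hence NULL, 0. */
    if (sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0) != 0)
        return -1;
    return ncpu;
#else
    /* Assumed fallback for systems without the hw.ncpu sysctl. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? (int)n : -1;
#endif
}
```

From the shell, the same value should be available with "sysctl hw.ncpu".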
Re: md disks more than one from a kldload?
Jaye Mathisen writes: I kldload md.ko. First md device comes up just peachy. however, attempts to now create an md1 fail with device not configured. If you're feeling brave, I just back ported the all-singing / all-dancing md device from -current today (I wanted a size-configurable, non MFS malloc disk for something). I haven't pushed it very hard, but multiple disks appear to work from a module. Apply the patch at http://people.freebsd.org/~gallatin/md.diff Then grab sys/sys/mdioctl.h and sbin/mdconfig from -current. You'll need to make the mdctl device node yourself (95, 0x00ff) If anybody else feels like testing this, please do so. Is there some interest in an MFC? Cheers, Drew
Re: vmware on freebsd for fast booting for devel.
Sven Huster writes: OT FYI: Check the ISP1100 from Intel if you like: support for PIII up to 850, 2GB RAM, 2 x Intel Network onboard (includes pxe boot, possible on both), full serial console (even for access to bios setup) Hmm.. We have some Dell PowerEdge 1550s that do this (nice machines, but horrible bootstones). But I've got a basic problem with console redirection on PCs that we don't see on Alphas or Suns. The problem is that I cannot figure out how in the hell to hit F2 in my environment. My environment is essentially telnet'ing into a console server from an xterm. Hitting Ctrl-A for the scsi bios works just fine and dandy.. Anybody know how to make ansi function keys work from an xterm? Thanks, Drew
Re: Re[2]: vmware on freebsd for fast booting for devel.
Walter Hop writes: [in reply to [EMAIL PROTECTED], 25-04-2001] Interesting. What happens if it's like the reverse where one runs FreeBSD under vmware from Windows2000? Since 5-10% seems to be really slow. I always try out new applications in a virtual machine running FreeBSD on my Windows workstation, it's lovely. I/O is painfully slow, but in normal situations performance is 10%... (PII-350, 256MB ram) Note that the 5-10% I was talking about is just the tertiary bootloader (/boot/loader). I mentioned it because the original poster was primarily concerned about 'bootstones' -- in more normal situations (ie, once the kernel is loaded) I'd say performance is more like 40-80% of native. Drew
Re: vmware on freebsd for fast booting for devel.
Alfred Perlstein writes: So I've got this really elite machinery here to test on, problem is that booting takes about 2 minutes each time I make a bad kernel, s... Do you mean that vmware boots so slowly that the extra reboot cycle required to install the next test kernel is painfully slow? One thing to try to speed up vmware boots would be getting rid of the spinner in libstand -- vmware's dos-mode console i/o is painfully slow. The best way to cut the reboot wait time down is to network boot. Unfortunately, VMware's AMD PCInet card doesn't support PXE. Somebody here has been using something called grub (http://www.gnu.org/software/grub/). Grub doesn't support FreeBSD very well (eg, it can't set the root device, set hints, etc). I think he was hacking grub to add those features, but I don't know how far he got... BTW, grub has no spinner. Anyone using anything like vmware in order to have a rapid reboot/test cycle for low level FreeBSD kernel coding? How fast is it to I've actually found real hardware to be much faster than vmware in most cases. My dream quick-reboot box has no scsi disks, can skip the memory test, has a serial console and loads its kernels via pxe. Drew
Re: vmware on freebsd for fast booting for devel.
Alfred Perlstein writes: * Andrew Gallatin [EMAIL PROTECTED] [010424 14:44] wrote: Alfred Perlstein writes: So I've got this really elite machinery here to test on, problem is that booting takes about 2 minutes each time I make a bad kernel, s... Do you mean that vmware boots so slowly that the extra reboot cycle required to install the next test kernel is painfully slow? I actually haven't tried vmware yet, I was hoping to utilize the lists to find out others' experiences wrt using vmware like I wish to. Ah.. the 2 minutes above made it sound like you were already using it. If you do start to use it (running -current as a guest), make sure to use the i386 path for atomic_cmpset_int() unconditionally -- somehow the cmpxchgl is finding a very slow path through the emulator. One thing to try to speed up vmware boots would be getting rid of the spinner in libstand -- vmware's dos-mode console i/o is painfully slow. The best way to cut the reboot wait time down is to network boot. Unfortunately, VMware's AMD PCInet card doesn't support PXE. Somebody here has been using something called grub (http://www.gnu.org/software/grub/). Grub doesn't support FreeBSD very well (eg, it can't set the root device, set hints, etc). I think he was hacking grub to add those features, but I don't know how far he got... BTW, grub has no spinner. Anyone using anything like vmware in order to have a rapid reboot/test cycle for low level FreeBSD kernel coding? How fast is it to I've actually found real hardware to be much faster than vmware in most cases. My dream quick-reboot box has no scsi disks, can skip the memory test, has a serial console and loads its kernels via pxe. Yeah, where do i buy one? Heh. Most Dell i810 based Optiplexes boot quickly. You just need to throw an fxp in there for pxe. I'm sure other, cheaper, boxes do just as well. Compared to the full-price vmware, it would probably be quicker to buy a used p6..
Drew
Re: vmware on freebsd for fast booting for devel.
Doug Ambrisko writes: | | Grub doesn't support FreeBSD very well (eg, it can't set the root | device, set hints, etc). I think he was hacking grub to add those | features, but I don't know how far he got...BTW, grub has no spinner. Why not just use EtherBoot? Simple ignorance. I'll pass that pointer along to the person here who was hacking with VMware. Thanks! Drew
Re: vmware on freebsd for fast booting for devel.
Vincent Poy writes: Speaking about vmware, how much of the performance is a vm supposed to give compared to the actual processor in a stand-alone machine? It depends on what metric one uses to measure performance. Boots (loading kernel) with a graphics console are painfully slow, like 5-10% of native speed. CPU bound programs run at near-native speeds. I/O bound jobs are much slower. Memory is a very important factor -- 128MB or less is too little to run VMware at a reasonable speed. And to conserve memory, it really helps to use a plain disk rather than using a disk file. This entails vmware doing I/O to a raw disk partition rather than to a file and reduces memory use by eliminating double caching of data by the host and guest OSes. FWIW, my old 300MHz PII (128MB ram, disk file) was nearly unusable. My wife's 400MHz laptop (192MB ram, plain disk) is fairly decent. My new 1.2GHz Tbird (1GB ram, plain disk) feels quite fast. This is for my workload, which is typically an occasional boot into Windows. Drew
Re: x86-64 Hammer and IA64 Itainium
Mike Silbersack writes: Once that's done, it'll probably be a matter to send a clawhammer system and a large box of cheese and crackers to the guys who did the freebsd alpha port. If the architecture is actually so similar to x86, it should only take them a few weekends. :) As one of the FreeBSD/alpha porters, I must point out that I don't know diddly-squat about low-level x86isms. I've never even written a line of x86 assembly. What's the timeframe that they're shooting for with this beast, anyway? Drew