SMP profiling -- patch, if someone cares to bring it up to date
I sent a patch considerably improving kernel profiling support to
tech-kern while I was with Coyote Point. It is at:

http://mail-index.netbsd.org/tech-kern/2010/12/11/msg009519.html

I got a few comments about the code organization but not much else.
I simply ran out of time to work on updating and integrating the patch.

I mention it here because anyone wanting to take good kernel profiles
on SMP NetBSD systems might want to look into integrating this. I do not
have the time at present.

--
Thor Lancelot Simon t...@panix.com
All of my opinions are consistent, but I cannot present them all
at once.
    -Jean-Jacques Rousseau, On The Social Contract
Re: Question about tcp ephemeral ports
Hi,

Attached is a patch that makes my small test program work. It applies
to 5.1 and 5.1.1 only. Porting it to current would be a bit harder due to
the port randomization, as described by Eric previously.

This is just a proof of concept, and I would be happy to have some
feedback about how to write it better and what the potential issues are.

Olivier

From 61c4012c89cd088f8f6e3f16f5e1306104232b28 Mon Sep 17 00:00:00 2001
From: Olivier Matz <olivier.m...@6wind.com>
Date: Thu, 2 Feb 2012 16:51:05 +0100
Subject: tcp: allow to reuse an ephemeral port if dest addr/port is different

When a TCP client calls connect(), an implicit bind is done by the
network stack to choose an ephemeral port. Currently, there is a
limitation that prevents the TCP client from reusing an ephemeral port
even if the destination port or address is different.

The problem is described in detail here:
http://mail-index.netbsd.org/tech-kern/2012/01/30/msg012602.html

This patch duplicates the code of in_pcbbind() in a new function,
in_pcbbind_before_connect(), which is called specifically by the TCP
connect code when doing an implicit bind. The behaviour is a bit
different compared to the initial in_pcbbind():

- only the (nam == NULL) case is allowed
- the function is aware of the remote address that will be given to
  connect(). Duplication of the ephemeral port is checked with
  in_pcblookup_connect() instead of in_pcblookup_port().
- the socket state is not changed to BOUND (but the pcb is added to
  the INPCBHASH_PORT table). connect() will change the state to
  CONNECTED if it is successful.

If in_pcbconnect() fails, we need to restore the initial state:
inp->inp_lport set to 0, the pcb back in the INPCBHASH_PORT table
bucket for port 0, and the INP_ANONPORT flag removed.

Note: this patch is just a proof of concept and should probably be
cleaned up and enhanced. Currently, only IPv4 is done.
---
 netinet/in_pcb.c     |   88 ++
 netinet/in_pcb.h     |    2 +
 netinet/tcp_usrreq.c |   10 +-
 3 files changed, 99 insertions(+), 1 deletions(-)

diff --git a/netinet/in_pcb.c b/netinet/in_pcb.c
index 5d662ce..498a344 100644
--- a/netinet/in_pcb.c
+++ b/netinet/in_pcb.c
@@ -371,6 +371,94 @@ noname:
 	return (0);
 }
 
+int
+in_pcbbind_before_connect(void *v, struct in_addr raddr,
+    u_int rport, struct lwp *l)
+{
+	struct inpcb *inp = v;
+	struct socket *so = inp->inp_socket;
+	struct inpcbtable *table = inp->inp_table;
+	struct sockaddr_in *sin = NULL; /* XXXGCC */
+	u_int16_t lport = 0;
+#ifndef IPNOPRIVPORTS
+	kauth_cred_t cred = l->l_cred;
+#endif
+	int cnt;
+	u_int16_t mymin, mymax;
+	u_int16_t *lastport;
+
+	if (inp->inp_af != AF_INET)
+		return (EINVAL);
+
+	if (TAILQ_FIRST(&in_ifaddrhead) == 0)
+		return (EADDRNOTAVAIL);
+	if (inp->inp_lport || !in_nullhost(inp->inp_laddr))
+		return (EINVAL);
+
+	if (inp->inp_flags & INP_LOWPORT) {
+#ifndef IPNOPRIVPORTS
+		if (kauth_authorize_network(cred,
+		    KAUTH_NETWORK_BIND,
+		    KAUTH_REQ_NETWORK_BIND_PRIVPORT, so,
+		    sin, NULL))
+			return (EACCES);
+#endif
+		mymin = lowportmin;
+		mymax = lowportmax;
+		lastport = &table->inpt_lastlow;
+	} else {
+		mymin = anonportmin;
+		mymax = anonportmax;
+		lastport = &table->inpt_lastport;
+	}
+	if (mymin > mymax) {	/* sanity check */
+		u_int16_t swp;
+
+		swp = mymin;
+		mymin = mymax;
+		mymax = swp;
+	}
+
+	lport = *lastport - 1;
+	for (cnt = mymax - mymin + 1; cnt; cnt--, lport--) {
+		if (lport < mymin || lport > mymax)
+			lport = mymax;
+		if (!in_pcblookup_connect(table, inp->inp_laddr,
+		    htons(lport), raddr, htons(rport)))
+			goto found;
+	}
+	if (!in_nullhost(inp->inp_laddr))
+		inp->inp_laddr.s_addr = INADDR_ANY;
+	return (EAGAIN);
+
+ found:
+	inp->inp_flags |= INP_ANONPORT;
+	*lastport = lport;
+	lport = htons(lport);
+
+	inp->inp_lport = lport;
+	LIST_REMOVE(&inp->inp_head, inph_lhash);
+	LIST_INSERT_HEAD(INPCBHASH_PORT(table, inp->inp_lport), &inp->inp_head,
+	    inph_lhash);
+
+	return (0);
+}
+
+void
+in_pcbbind_revert(void *v)
+{
+	struct inpcb *inp = v;
+	struct inpcbtable *table = inp->inp_table;
+
+	/* Called from tcp_usrreq if the connect failed after an
+	 * implicit bind. This will restore the initial state */
+	inp->inp_flags &= ~INP_ANONPORT;
+	inp->inp_lport = 0;
+	LIST_REMOVE(&inp->inp_head, inph_lhash);
+	LIST_INSERT_HEAD(INPCBHASH_PORT(table, inp->inp_lport), &inp->inp_head,
+	    inph_lhash);
+}
+
 /*
  * Connect from a socket to a specified address.
  * Both address and port must be specified in argument sin.
diff --git a/netinet/in_pcb.h b/netinet/in_pcb.h
index 8e1d929..51a0a5c 100644
--- a/netinet/in_pcb.h
+++ b/netinet/in_pcb.h
@@ -125,6 +125,8 @@ struct inpcb {
 void	in_losing(struct inpcb *);
 int	in_pcballoc(struct socket *, void *);
 int	in_pcbbind(void *, struct mbuf *, struct lwp *);
+int	in_pcbbind_before_connect(void *, struct in_addr, u_int, struct lwp *);
+void	in_pcbbind_revert(void *v);
 int	in_pcbconnect(void *, struct mbuf *, struct lwp *);
 void
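
The idea behind the patch can be illustrated outside the kernel. What follows is a minimal userland sketch in C, not the patch itself: struct conn, the conns[] table, port_in_use(), tuple_in_use() and pick_port() are hypothetical names invented for this illustration, and the search simply walks down from the top of the range rather than remembering the last port used as the kernel code does. It contrasts a port-only check (a local port is busy as soon as any connection uses it) with a check on the full (local address, local port, remote address, remote port) tuple, which is what the message above proposes for the implicit bind done by connect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a pcb: one TCP 4-tuple. */
struct conn {
	uint32_t laddr, raddr;	/* local / remote IPv4 address */
	uint16_t lport, rport;	/* local / remote port */
};

#define MAXCONN 64
static struct conn conns[MAXCONN];
static size_t nconns;

/* Port-only check: busy as soon as any connection uses this local port. */
static bool
port_in_use(uint16_t lport)
{
	for (size_t i = 0; i < nconns; i++)
		if (conns[i].lport == lport)
			return true;
	return false;
}

/* 4-tuple check: only an identical connection counts as a conflict. */
static bool
tuple_in_use(const struct conn *c)
{
	for (size_t i = 0; i < nconns; i++)
		if (conns[i].laddr == c->laddr && conns[i].lport == c->lport &&
		    conns[i].raddr == c->raddr && conns[i].rport == c->rport)
			return true;
	return false;
}

/*
 * Choose a local port in [lo, hi] for a connection to raddr:rport and
 * record the connection. Returns the port, or 0 if none is available.
 */
static uint16_t
pick_port(uint32_t laddr, uint32_t raddr, uint16_t rport,
    uint16_t lo, uint16_t hi, bool check_tuple)
{
	assert(lo <= hi);
	for (int p = hi; p >= (int)lo; p--) {
		struct conn c = {
			.laddr = laddr, .raddr = raddr,
			.lport = (uint16_t)p, .rport = rport,
		};
		bool busy = check_tuple ? tuple_in_use(&c) : port_in_use(c.lport);

		if (!busy && nconns < MAXCONN) {
			conns[nconns++] = c;
			return c.lport;
		}
	}
	return 0;
}
```

With a two-port range, the port-only check is exhausted after two connections regardless of destination, while the 4-tuple check keeps handing out the same local port for different destinations and only fails once a single destination has consumed both ports.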
Re: Second stage bootloader (i386) hangs on ls command for ext2
On Sun, Dec 25, 2011 at 11:54 AM, Evgeniy Ivanov <lolkaanti...@gmail.com> wrote:
> Hi,
>
> On Sun, Dec 25, 2011 at 10:20 AM, Izumi Tsutsui <tsut...@ceres.dti.ne.jp> wrote:
>> Hi,
>>
>> Evgeniy Ivanov wrote:
>>
>>> Izumi, thank you for reviewing! New patches are attached.
>>  :
>>>> I think it's better to use a positive LIBSA_ENABLE_LS_OP option
>>>> rather than LIBSA_NO_LS_OP, and make whole (fs_ops)->ls op part
>>>> optional because
>>>> - there are many primary bootloaders (bootxx_foo) which don't need
>>>>   the ls op and have size restrictions (alpha, atari, pmax ...)
>>>> - there are few bootloaders which support command prompt mode
>>>>   where the `ls' op is actually required
>>>>   (some ports don't have even getchar())
>>>
>>> Done.
>>>
>>>> We also have to check all other non-x86 bootloaders
>>>> which refer ufs_ls(). (ews4800mips, ia64, landisk, x68k, zaurus etc)
>>>
>>> Done. I'm not able to check though, but the modification is trivial
>>> and almost the same as for i386.
>>
>> Committed all changes (with several fixes for ews4800mips and x68k)
>> http://mail-index.NetBSD.org/source-changes/2011/12/25/msg02.html
>
> Great!
>
>> Thank you for your great work!
>
> np :-)
>
>> Now it's time for someone[TM] to try PR/30866 :-)
>> http://gnats.NetBSD.org/30866
>
> Seems to be a useful feature, I'll work on this in Jan if it doesn't
> violate [TM] :P

Unfortunately I ran out of time, and I doubt I will get any time for
this soon... So anybody is welcome to work on this feature.

--
Evgeniy
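
The "positive option" approach discussed in the quoted review can be sketched in a few lines of C. This is only an illustration under stated assumptions: LIBSA_ENABLE_LS_OP is the option name from the thread, but struct fs_ops_sketch, demo_open(), demo_ls(), demo_fs and sketch_ls() are hypothetical names and do not reproduce the actual libsa structures. The point shown is that the ls member and its implementation only exist when the option is defined, so size-restricted primary bootloaders that never define it carry no extra code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Defined here to model a second-stage loader with a command prompt.
 * A size-restricted primary bootloader would simply not define it.
 */
#define LIBSA_ENABLE_LS_OP

/* Hypothetical, cut-down file system switch entry (not the libsa struct). */
struct fs_ops_sketch {
	int	(*open)(const char *path);
#if defined(LIBSA_ENABLE_LS_OP)
	void	(*ls)(const char *path);	/* present only when enabled */
#endif
};

static int ls_calls;	/* counts ls invocations, for demonstration only */

static int
demo_open(const char *path)
{
	return path != NULL ? 0 : -1;
}

#if defined(LIBSA_ENABLE_LS_OP)
static void
demo_ls(const char *path)
{
	(void)path;
	ls_calls++;
}
#endif

static const struct fs_ops_sketch demo_fs = {
	.open = demo_open,
#if defined(LIBSA_ENABLE_LS_OP)
	.ls = demo_ls,
#endif
};

/* Run the ls op if it was compiled in; return -1 when it is unavailable. */
static int
sketch_ls(const struct fs_ops_sketch *ops, const char *path)
{
	assert(ops != NULL);
#if defined(LIBSA_ENABLE_LS_OP)
	if (ops->ls != NULL) {
		ops->ls(path);
		return 0;
	}
#else
	(void)path;
#endif
	return -1;
}
```

Removing the #define at the top models a loader built without the option: the structure shrinks by one pointer, demo_ls() is not compiled, and sketch_ls() reports the operation as unavailable.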
Re: kmem change related trouble
Lars Heidieker wrote:

> I've just posted a patch ( http://www.netbsd.org/~para/fix.patch )
> - It moves uareas and buffer cache back to the kernel_map restoring
>   the previous behavior. Sizing the kmem_arena is changed accordingly
>   (Something I stepped on while checking evbmips on gxemul).
> - Code to drain pools if the kmem_arena runs out of space.

I tried your patch on sandpoint and ofppc. Unfortunately it doesn't
change anything.

Here are the last lines before the crash on ofppc (note that the
warning "no /dev/console" is wrong):

[...]
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
warning: no /dev/console
trap: kernel read DSI trap @ 0xa00011c8 by 0x3d6324 (DSISR 0x4000, err=14), lr 0x3d6310
Press a key to panic.
panic: trap

Entering ddb shows the crash happened in pool_cache_get_paddr():

kernel DSI read trap @ 0xa00011c8 by pool_cache_get_paddr+0x4c: srr1=0x9032
r1=0xa22b9aa0 cr=0x28284084 xer=0x02000 ctr=0x1642c dsisr=0x4000

The backtrace:
copyright
kmem_intr_alloc
exec_elf32_makecmds
check_exec
execve1
start_init, setfunc_trampoline

--
Frank Wille
Re: kmem change related trouble
> kernel DSI read trap @ 0xa00011c8 by pool_cache_get_paddr+0x4c: srr1=0x9032
> r1=0xa22b9aa0 cr=0x28284084 xer=0x02000 ctr=0x1642c dsisr=0x4000
>
> The backtrace:
> copyright
> kmem_intr_alloc
> exec_elf32_makecmds
> check_exec
> execve1
> start_init, setfunc_trampoline

Is that with the latest exec_elf.c? I'd like to see if the location
changes with the latest one.
rw_lock vs mutex
While digging around looking into another problem, I noticed that the
piixpm(4) driver uses an rw_lock for its ic_acquire_bus/ic_release_bus
routines. ic_acquire_bus() uses rw_enter(..., RW_WRITER) and there
doesn't appear to be any use anywhere of RW_READER for that lock.

The man page for rw_lock implies that it is a superset of a mutex. So
I'm wondering if it makes any sense to use the simpler mutex instead?

-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------
Re: rw_lock vs mutex
On Feb 2, 2012, at 5:38 PM, Paul Goyette wrote:

> While digging around looking into another problem, I noticed that the
> piixpm(4) driver uses an rw_lock for its ic_acquire_bus/ic_release_bus
> routines. ic_acquire_bus() uses rw_enter(..., RW_WRITER) and there
> doesn't appear to be any use anywhere of RW_READER for that lock.
>
> The man page for rw_lock implies that it is a superset of a mutex. So
> I'm wondering if it makes any sense to use the simpler mutex instead?

Switch to a mutex, it's much less overhead than a r/w lock
RE: rw_lock vs mutex
A rw_lock allows multiple readers, correct? If there's a non-trivial
probability of concurrent reads that would make a difference. If not,
then a mutex would be just as good especially if that is lower overhead.

	paul

-----Original Message-----
From: tech-kern-ow...@netbsd.org [mailto:tech-kern-ow...@netbsd.org] On Behalf Of Matt Thomas
Sent: Thursday, February 02, 2012 8:53 PM
To: p...@whooppee.com
Cc: tech-kern@netbsd.org
Subject: Re: rw_lock vs mutex

On Feb 2, 2012, at 5:38 PM, Paul Goyette wrote:

> While digging around looking into another problem, I noticed that the
> piixpm(4) driver uses an rw_lock for its ic_acquire_bus/ic_release_bus
> routines. ic_acquire_bus() uses rw_enter(..., RW_WRITER) and there
> doesn't appear to be any use anywhere of RW_READER for that lock.
>
> The man page for rw_lock implies that it is a superset of a mutex. So
> I'm wondering if it makes any sense to use the simpler mutex instead?

Switch to a mutex, it's much less overhead than a r/w lock
RE: rw_lock vs mutex
On Thu, 2 Feb 2012, paul_kon...@dell.com wrote:

> A rw_lock allows multiple readers, correct? If there's a non-trivial
> probability of concurrent reads that would make a difference. If not,
> then a mutex would be just as good especially if that is lower overhead.

The rwlock in question is contained within the driver's softc. The only
exported accessors are i2c_bus_acquire() (which grabs a RW_WRITER lock)
and i2c_bus_release(). A mutex makes much more sense.

-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------
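
The conclusion of this thread can be sketched as follows. In the kernel, the change would presumably amount to replacing the rw_enter(..., RW_WRITER) / rw_exit() pair with mutex_enter() / mutex_exit() from mutex(9). The version below is a userland stand-in using POSIX threads so that it can be compiled anywhere; struct bus_softc, its members, and bus_init(), bus_acquire() and bus_release() are hypothetical names and are not taken from the piixpm(4) driver.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the driver softc; only the lock matters here. */
struct bus_softc {
	pthread_mutex_t	sc_i2c_lock;	/* serializes access to the bus */
	int		sc_busy;	/* 1 while a caller owns the bus */
};

static void
bus_init(struct bus_softc *sc)
{
	pthread_mutex_init(&sc->sc_i2c_lock, NULL);
	sc->sc_busy = 0;
}

/*
 * Every caller needs exclusive access, which is exactly what a mutex
 * provides; no shared (reader) mode is ever requested.
 */
static int
bus_acquire(struct bus_softc *sc)
{
	int error = pthread_mutex_lock(&sc->sc_i2c_lock);

	if (error == 0) {
		assert(sc->sc_busy == 0);	/* nobody else may own the bus */
		sc->sc_busy = 1;
	}
	return error;
}

static void
bus_release(struct bus_softc *sc)
{
	sc->sc_busy = 0;
	pthread_mutex_unlock(&sc->sc_i2c_lock);
}
```

Because the only two accessors always take the lock exclusively, nothing is lost by dropping the reader/writer distinction, and the acquire/release pair keeps the same shape as before.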
Re: extended attributes and lsextattr/extattr_list_file
hi,

> YAMAMOTO Takashi <y...@mwd.biglobe.ne.jp> wrote:
>
>> we need to decide what to be shipped for netbsd-6.
>> (hope it isn't too late.)
>> is anyone against the removal of freebsd-style syscalls?
>
> We will need some macro to discover what API is available:
> FreeBSD-like in 5.x and Linux-like in 6.0.

i was assuming we have no releases on which the freebsd-style API is
actually usable. is it wrong?

YAMAMOTO Takashi

> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
Re: extended attributes and lsextattr/extattr_list_file
YAMAMOTO Takashi <y...@mwd.biglobe.ne.jp> wrote:

> > We will need some macro to discover what API is available:
> > FreeBSD-like in 5.x and Linux-like in 6.0.
>
> i was assuming we have no releases on which the freebsd-style API is
> actually usable. is it wrong?

Yes, you are right. I committed code in glusterfs that used it, but it
is under #ifndef HAVE_SYS_XATTR_H, so it will automatically switch to
the Linux-style API when sys/xattr.h is available. Therefore we have no
problem.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org