Current problem reports assigned to freebsd-net@FreeBSD.org
Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases.

S Tracker      Resp.  Description
o kern/175267  net    [pf] [tap] pf + tap keep state problem
o kern/175236  net    [epair] [gif] epair and gif Devices On Bridge
o kern/175182  net    [panic] kernel panic on RADIX_MPATH when deleting rout
o kern/175153  net    [tcp] will there miss a FIN when do TSO?
o kern/174959  net    [net] [patch] rnh_walktree_from visits spurious nodes
o kern/174958  net    [net] [patch] rnh_walktree_from makes unreasonable ass
o kern/174897  net    [route] Interface routes are broken
o kern/174851  net    [bxe] [patch] UDP checksum offload is wrong in bxe dri
o kern/174850  net    [bxe] [patch] bxe driver does not receive multicasts
o kern/174849  net    [bxe] [patch] bxe driver can hang kernel when reset
o kern/174822  net    [tcp] Page fault in tcp_discardcb under high traffic
o kern/174602  net    [gif] [ipsec] traceroute issue on gif tunnel with ipse
o kern/174535  net    [tcp] TCP fast retransmit feature works strange
o kern/173475  net    [tun] tun(4) stays opened by PID after process is term
o kern/173201  net    [ixgbe] [patch] Missing / broken ixgbe sysctl's and tu
o kern/173137  net    [em] em(4) unable to run at gigabit with 9.1-RC2
o kern/173002  net    [patch] data type size problem in if_spppsubr.c
o kern/172985  net    [patch] [ip6] lltable leak when adding and removing IP
o kern/172895  net    [ixgb] [ixgbe] do not properly determine link-state
o kern/172683  net    [ip6] Duplicate IPv6 Link Local Addresses
o kern/172675  net    [netinet] [patch] sysctl_tcp_hc_list (net.inet.tcp.hos
o kern/172113  net    [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4
o kern/171840  net    [ip6] IPv6 packets transmitting only on queue 0
o kern/171838  net    [oce] [patch] Possible lock reversal and duplicate loc
o kern/171739  net    [bce] [panic] bce related kernel panic
o kern/171728  net    [arp] arp issue
o kern/171711  net    [dummynet] [panic] Kernel panic in dummynet
o kern/171532  net    [ndis] ndis(4) driver includes 'pccard'-specific code,
o kern/171531  net    [ndis] undocumented dependency for ndis(4)
o kern/171524  net    [ipmi] ipmi driver crashes kernel by reboot or shutdow
s kern/171508  net    [epair] [request] Add the ability to name epair device
o kern/171228  net    [re] [patch] if_re - eeprom write issues
o kern/170701  net    [ppp] killl ppp or reboot with active ppp connection c
o kern/170267  net    [ixgbe] IXGBE_LE32_TO_CPUS is probably an unintentiona
o kern/170081  net    [fxp] pf/nat/jails not working if checksum offloading
o kern/169898  net    ifconfig(8) fails to set MTU on multiple interfaces.
o kern/169676  net    [bge] [hang] system hangs, fully or partially after re
o kern/169664  net    [bgp] Wrongful replacement of interface connected net
o kern/169620  net    [ng] [pf] ng_l2tp incoming packet bypass pf firewall
o kern/169459  net    [ppp] umodem/ppp/3g stopped working after update from
o kern/169438  net    [ipsec] ipv4-in-ipv6 tunnel mode IPsec does not work
p kern/168294  net    [ixgbe] [patch] ixgbe driver compiled in kernel has no
o kern/168246  net    [em] Multiple em(4) not working with qemu
o kern/168245  net    [arp] [regression] Permanent ARP entry not deleted on
o kern/168244  net    [arp] [regression] Unable to manually remove permanent
o kern/168183  net    [bce] bce driver hang system
o kern/167947  net    [setfib] [patch] arpresolve checks only the default FI
o kern/167603  net    [ip] IP fragment reassembly's broken: file transfer ov
o kern/167500  net    [em] [panic] Kernel panics in em driver
o kern/167325  net    [netinet] [patch] sosend sometimes return EINVAL with
o kern/167202  net    [igmp]: Sending multiple IGMP packets crashes kernel
o kern/167059  net    [tcp] [panic] System does panic in in_pcbbind() and ha
o kern/166940  net    [ipfilter] [panic] Double fault in kern 8.2
o kern/166462  net    [gre] gre(4) when using a tunnel source address from c
o kern/166372  net    [patch] ipfilter drops UDP packets with zero checksum
o kern/166285  net    [arp] FreeBSD v8.1 REL p8 arp: unknown hardware addres
o kern/166255  net    [net] [patch] It should be possible to disable promis
o kern/165963  net    [panic] [ipf] ipfilter/nat NULL pointer deference
o kern/165903  net    mbuf leak
o kern/165643  net    [net] [patch] Missing vnet restores in net/if_ethersub
o kern/165622  net    [ndis][panic][patch]
Re: ng_ether naming
On Sun, Jan 27, 2013 at 01:16:53PM +0200, Andriy Gapon wrote:
A> based on your suggestions and submissions I've produced the following patch:
A> http://people.freebsd.org/~avg/ng_ether-renaming.diff
A>
A> It's only compile-tested at the moment :)
A> but I'd like to get your opinion about the direction of the change(s).
A> I am going to really test the change very soon.

Patch is okay. If you want to close all related PRs, you also need to patch ppp(8), so that it does the same sanitation of interface names when running PPPoE. And probably drop a note to Dmitry, who maintains mpd5.

--
Totus tuus, Glebius.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: 9.1-stable crashes while copying data from a NFS mounted directory
On Monday 28 January 2013 07:35:31 YongHyeon PYUN wrote:
On Fri, Jan 25, 2013 at 06:09:50PM +0100, Christian Gusenbauer wrote:
On Friday 25 January 2013 05:50:48 YongHyeon PYUN wrote:
On Fri, Jan 25, 2013 at 01:30:43PM +0900, YongHyeon PYUN wrote:
On Thu, Jan 24, 2013 at 05:21:50PM -0500, John Baldwin wrote:
On Thursday, January 24, 2013 4:22:12 pm Konstantin Belousov wrote:
On Thu, Jan 24, 2013 at 09:50:52PM +0100, Christian Gusenbauer wrote:
On Thursday 24 January 2013 20:37:09 Konstantin Belousov wrote:
On Thu, Jan 24, 2013 at 07:50:49PM +0100, Christian Gusenbauer wrote:
On Thursday 24 January 2013 19:07:23 Konstantin Belousov wrote:
On Thu, Jan 24, 2013 at 08:03:59PM +0200, Konstantin Belousov wrote:
On Thu, Jan 24, 2013 at 06:05:57PM +0100, Christian Gusenbauer wrote:

Hi!

I'm using 9.1 stable svn revision 245605 and I get the panic below if I execute the following commands (as single user):

# swapon -a
# dumpon /dev/ada0s3b
# mount -u /
# ifconfig age0 inet 192.168.2.2 mtu 6144 up
# mount -t nfs -o rsize=32768 data:/multimedia /mnt
# cp /mnt/Movies/test/a.m2ts /tmp

then the system panics almost immediately. I'll attach the stack trace.

Note, that I'm using jumbo frames (6144 byte) on a 1Gbit network, maybe that's the cause for the panic, because the bcopy (see stack frame #15) fails.

Any clues?

I tried a similar operation with the nfs mount of rsize=32768 and mtu 6144, but the machine runs HEAD and em instead of age. I was unable to reproduce the panic on the copy of the 5GB file from nfs mount.

Hmmm, I did a quick test. If I do not change the MTU, so just configuring age0 with

# ifconfig age0 inet 192.168.2.2 up

then I can copy all files from the mounted directory without any problems, too. So it's probably age0 related?

From your backtrace and the buffer printout, I see somewhat strange thing.
The buffer data address is 0xff8171418000, while kernel faulted at the attempt to write at 0xff8171413000, which is lower than the buffer data pointer, at the attempt to bcopy to the buffer.

The other data suggests that there was no overflow of the data from the server response. So it might be that mbuf_len(mp) returned a negative number? I am not sure it is possible at all.

Try this debugging patch, please. You need to add INVARIANTS etc to the kernel config.

diff --git a/sys/fs/nfs/nfs_commonsubs.c b/sys/fs/nfs/nfs_commonsubs.c
index efc0786..9a6bda5 100644
--- a/sys/fs/nfs/nfs_commonsubs.c
+++ b/sys/fs/nfs/nfs_commonsubs.c
@@ -218,6 +218,7 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
 				}
 				mbufcp = NFSMTOD(mp, caddr_t);
 				len = mbuf_len(mp);
+				KASSERT(len > 0, ("len %d", len));
 			}
 			xfer = (left > len) ? len : left;
 #ifdef notdef
@@ -239,6 +240,8 @@ nfsm_mbufuio(struct nfsrv_descript *nd, struct uio *uiop, int siz)
 			uiop->uio_resid -= xfer;
 		}
 		if (uiop->uio_iov->iov_len <= siz) {
+			KASSERT(uiop->uio_iovcnt > 1, ("uio_iovcnt %d",
+			    uiop->uio_iovcnt));
 			uiop->uio_iovcnt--;
 			uiop->uio_iov++;
 		} else {

I thought that the server had returned too long a response, but it seems to be not the case from your data. Still, I think the patch below might be due.

diff --git a/sys/fs/nfsclient/nfs_clrpcops.c b/sys/fs/nfsclient/nfs_clrpcops.c
index be0476a..a89b907 100644
--- a/sys/fs/nfsclient/nfs_clrpcops.c
+++ b/sys/fs/nfsclient/nfs_clrpcops.c
@@ -1444,7 +1444,7 @@ nfsrpc_readrpc(vnode_t vp, struct uio *uiop, struct ucred *cred,
 			NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
 			eof = fxdr_unsigned(int, *tl);
 		}
-		NFSM_STRSIZ(retlen, rsize);
+		NFSM_STRSIZ(retlen, len);
 		error = nfsm_mbufuio(nd,
Re: Cas driver fails to load first time after boot.
On 01/25/13 17:34, Marius Strobl wrote:
On Fri, Jan 25, 2013 at 01:14:51PM -0600, Paul Keusemann wrote:
On 01/25/13 10:19, Marius Strobl wrote:
On Thu, Jan 24, 2013 at 08:48:04PM -0600, Paul Keusemann wrote:
On 01/24/13 15:50, Marius Strobl wrote:
On Thu, Jan 24, 2013 at 12:39:44PM -0600, Paul Keusemann wrote:
On 01/24/13 09:09, Marius Strobl wrote:
On Tue, Jan 22, 2013 at 02:46:48PM -0600, Paul Keusemann wrote:

Hi,

I've got a Dell R200 which I'm trying to build into a gateway with a Sun QGE (501-6738-10). The cas driver fails to load the first time I try to load it but succeeds the second time. Is this a problem with the card, the driver, my karma?

Wrong phase of the moon, apparently :) The MII setup of these chips is a bit tricky and I'm not sure whether I've hit all code paths during development of the driver. I certainly didn't test with a 501-6738, these have been reported as working before, though. It also doesn't make much sense that attaching the devices succeeds on the second attempt. Could you please use an if_cas.ko built with the attached patch and report the debug output for one of the interfaces in both the working and the non-working case?

I would love to give you output from the working and non-working case but apparently the phase of the moon has changed, I can't get it to fail now. The messages output from the working case is attached.

Thanks but unfortunately this doesn't make any sense either. In general, printf()s cause delays which can be relevant. In the locations I've put them they hardly can make such a difference though. If you haven't already done so, could you please power off the machine before doing the test with the patched module? Is the problem still gone if you revert to the original module?

OK, power-cycling makes a difference. The driver fails to attach all of the devices after power-cycling most of the time if not all of the time. The number of devices attached varies, the attached message file fragment is from my last test.
Three of the devices were attached on the first load attempt and all four of them on the second attempt.

Okay, so we now at least have a way to reproduce the problem. Unfortunately, it's still unclear what's the exact cause of it. At least the problem is not what I suspected and hoped it most likely is. Could you please test how things behave after a power-cycle with the attached patch (after reverting the previous one).

The patched driver fails to compile with the following error message: ...

I found the following definition of nitems in the iwn and usb/wlan drivers:

#define nitems(_a) (sizeof((_a)) / sizeof((_a)[0]))

so I added it to if_cas.c and rebuilt without errors.

Sorry, I didn't think of 8.3 not having nitems(), yet. Actually, this part of the patch is orthogonal to your problem and just a change I had in that tree.

This looks like it fixed the problem. I ran three tests from power-up to loading the driver and the driver loaded successfully all three times. I then added if_cas_load=YES to /boot/loader.conf and did two more successful reboots from power-up.

Great! Thanks a lot for testing!

Will this driver work on FreeBSD 9.1?

Yes, the patch should also solve the problem in 9.1. I suspect the hang you are seeing there isn't specific to cas(4) but rather a general regression that came in with the VIMAGE changes. Now, if a network device driver fails to attach during boot and tries to clean up by detaching and freeing the interface part at that stage again this causes problems. I already talked to bz@ about this and from what I remember of his reply this is an ordering issue that is at least very hard to fix.

OK. I've successfully upgraded from 8.3-Release to 9.1-Release. I stupidly powered-down the machine after the upgrade, so I had to remove the QGE card to get it to boot 9.1 and build a custom kernel. The patch applied cleanly, the kernel built without errors and boots from power-up without problems.
I've attached the most recent messages file, dmesg, kldstat and ifconfig output if you're interested. The only odd thing I noticed was that cas0 and cas3 log messages:

cannot disable RX MAC

but cas1 and cas2 do not. I haven't actually tried any of the interfaces yet but I assume they'll work as expected.

Let me know if there's any further testing you'd like me to do.

Thanks so much for your help with this, it is much appreciated.

Paul

Marius

--
Paul Keusemann    pkeu...@visi.com
4266 Joppa Court  (952) 894-7805
Savage, MN 55378

Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.1-RELEASE #4: Mon Jan 28 09:02:45 CST 2013
    toor@lucid:/usr/obj/usr/src/sys/LUCID amd64
CPU: Intel(R)
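The nitems() definition quoted in the cas thread above can be tried outside the kernel. The following is a minimal sketch, not part of any posted patch: the table example_regs and the helper sum_example_regs() are hypothetical names invented purely for illustration.

```c
#include <stddef.h>

/* Same definition as quoted in the thread; guarded in case a header already provides it. */
#ifndef nitems
#define nitems(_a) (sizeof((_a)) / sizeof((_a)[0]))
#endif

/* Hypothetical table, used only to illustrate the macro. */
static const int example_regs[] = { 0x10, 0x14, 0x18, 0x1c };

/* Sum every element, using nitems() as the loop bound. */
static int
sum_example_regs(void)
{
	int total = 0;

	for (size_t i = 0; i < nitems(example_regs); i++)
		total += example_regs[i];
	return (total);
}
```

Note that the macro only gives the element count for a true array; applied to a pointer it silently divides the pointer size by the element size.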
Re: ixgbe msi/x
just curious, is this happening under bhyve or also native, and is it always occurring or is it occasional?

Native, and it happens when the pps rate is high, even if the aggregate bandwidth is low.

I am asking because with netmap when I tried to exploit interrupt mitigation (strictly processing incoming traffic only on rx interrupts) I noticed packet drops even at relatively low rates, which made me suspect that interrupts were either lost or heavily delayed.

I am running whatever is in the driver (version 2.4.5) by default. Since msi/x isn't enabled by default, I have enabled that. The same test sustains a *much* higher pps load with legacy interrupts, so I think that the msi/x interrupt setup is missing something.

-vijay
Re: ixgbe msi/x
On Monday, January 28, 2013 1:39:53 am Vijay Singh wrote:

I am investigating an issue where the ixgbe (82599) device is hung and I think I have traced it to the driver not getting interrupts. I have MSI/X enabled, with 2 rx/tx queues. I am trying to understand this bit of code in the MSI/X setup:

	if (ixgbe_enable_msix) {
		ixgbe_configure_ivars(adapter);
		/* Set up auto-mask */		<== THIS BIT
		if (hw->mac.type == ixgbe_mac_82598EB)
			IXGBE_WRITE_REG(hw, IXGBE_EIAM, IXGBE_EICS_RTX_QUEUE);
		else {
			IXGBE_WRITE_REG(hw, IXGBE_EIAM_EX(0), 0x);
			IXGBE_WRITE_REG(hw, IXGBE_EIAM_EX(1), 0x);
		}
	}

Does this mean that ixgbe_disable_queue() is not needed in the msi/x interrupt handler - ixgbe_msix_que()?

You are really going to need the datasheet for this adapter to tell. From my recent reading of the datasheet for igb(4) (which is likely similar), it appears that MSI interrupts on that device are configured to auto-clear bits in the interrupt cause registers (ICR and EICR) when an MSI interrupt is posted so that the interrupt handler doesn't have to do a read of these registers to clear their status bits (one of the points of MSI interrupts is that you can just access in-memory descriptor rings when the handler fires without needing to do a read of the PCI device to force any posted memory writes by the device to flush). If I had to wager a guess, I'd say that ixgbe was following the same model.

--
John Baldwin
Re: ixgbe msi/x
On Mon, Jan 28, 2013 at 9:15 AM, Vijay Singh <vijju.si...@gmail.com> wrote:

just curious, is this happening under bhyve or also native, and is it always occurring or is it occasional?

Native, and it happens when the pps rate is high, even if the aggregate bandwidth is low.

that was my case too. I have not gone too far into my investigation but should note that not _all_ interrupts were lost; my symptoms were queue overflows under netmap even at a low 2 Mpps, which with 2k entries in the rx ring means that the interrupt was delayed for more than 1ms, well above the moderation delay. With these symptoms I would normally blame the os scheduler, but in this case it seems a bit hard given that the machine has 4 cores at 2.8 GHz and no other processes running.

So just to clarify, which one of these symptoms did you see
1) no rx interrupts at all at any rx rate
2) occasional missing interrupts/drops as the rx pps increase
3) complete loss of rx interrupts above some pps threshold ?

cheers
luigi
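The "more than 1ms" figure in the message above follows from dividing the ring size by the packet rate. A minimal sketch of that arithmetic, assuming "2k entries" means 2048 slots; the function name ring_fill_time_us() is hypothetical and not taken from netmap or the driver:

```c
/*
 * Time, in microseconds, for traffic arriving at `pps` packets per
 * second to fill a receive ring of `ring_entries` slots. If no
 * interrupt is serviced within this time, the ring overflows.
 */
static double
ring_fill_time_us(unsigned ring_entries, double pps)
{
	return ((double)ring_entries * 1e6 / pps);
}
```

With 2048 entries and 2e6 packets per second this comes to roughly 1024 microseconds, so an overflow implies the interrupt was serviced at least that late.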
Re: ixgbe msi/x
that was my case too. I have not gone too far into my investigation but should note that not _all_ interrupts were lost; my symptoms were queue overflows under netmap even at a low 2 Mpps, which with 2k entries in the rx ring means that the interrupt was delayed for more than 1ms, well above the moderation delay.

This would be consistent with what I am seeing. I saw that vmstat -i reported some interrupt rate for the rx rings but even a simple ping at that point would lead to input errors - queue overflows.

So just to clarify, which one of these symptoms did you see
1) no rx interrupts at all at any rx rate
2) occasional missing interrupts/drops as the rx pps increase
3) complete loss of rx interrupts above some pps threshold ?

I think it would be closest to 3. The same HW runs fine when I disable msi/x and use legacy interrupts.

-vijay
Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
On Tuesday, January 15, 2013 8:23:24 pm Luigi Rizzo wrote:

Hi, I found a couple of problems in dev/e1000/if_lem.c::lem_handle_rxtx() (compare with dev/e1000/if_em.c::em_handle_que() for better understanding):

1. in if_em.c::em_handle_que(), when em_rxeof() exceeds the rx_process_limit, the task is rescheduled so it can complete the work. Conversely, in if_lem.c::lem_handle_rxtx() the lem_rxeof() is only run once, and if there are more pending packets the only chance to drain them is to receive (many) more interrupts. This is a relatively serious problem, because the receiver has a hard time recovering. I'd like to commit a fix to this same as it is done in e1000.

This seems sensible.

2. in if_em.c::em_handle_que(), interrupts are reenabled unconditionally, whereas lem_handle_rxtx() only enables them if IFF_DRV_RUNNING is set. I cannot really tell what is the correct way here, so I'd like to put a comment there unless there is a good suggestion on what to do. Accesses to the intr register are race-prone anyways (disabled in fastintr, enabled in the rxtx task without holding any lock, and generally accessed under EM_CORE_LOCK in other places), and presumably enabling/disabling the interrupts around activations of the task is just an optimization (and on a VM, it is actually a pessimization due to the huge cost of VM exits).

Actually, this is quite important. The reason being that you don't want the interrupt handler and the task both running at the same time (ever). If you do they will both process RX packets and deliver them in different order to the stack, wreaking havoc on TCP. In the case of what lem is doing, it is not racy (except with ifconfig up/down which is handled by checking for IFF_DRV_RUNNING changing after reacquiring the RX lock) as the algorithm is that once the fast interrupt handler masks the interrupt, that code path cannot run again until either the threaded handler or the task re-enables interrupts.
--
John Baldwin
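The two points discussed in the lem_handle_rxtx() thread above (reschedule the task when the RX limit is hit, and keep the interrupt masked until the task is done) can be modelled in a few lines. This is a toy sketch only, not driver code: struct toy_nic and every toy_* function are hypothetical stand-ins, and the real driver uses taskqueues and locks that are omitted here.

```c
#include <stdbool.h>

/* Toy model of a NIC receive path; every name here is hypothetical. */
struct toy_nic {
	int  pending;       /* packets waiting in the RX ring */
	bool intr_enabled;  /* true when the device may raise an interrupt */
	bool running;       /* stands in for IFF_DRV_RUNNING */
};

/* Process at most `limit` packets; return true if more remain. */
static bool
toy_rxeof(struct toy_nic *nic, int limit)
{
	int done = (nic->pending < limit) ? nic->pending : limit;

	nic->pending -= done;
	return (nic->pending > 0);
}

/* Fast interrupt handler: mask the interrupt and defer work to the task. */
static void
toy_fast_intr(struct toy_nic *nic)
{
	nic->intr_enabled = false;
}

/*
 * Deferred task. Returns true if it must be rescheduled because the
 * limit was hit; the interrupt stays masked in that case, so the
 * handler and the task never process the ring at the same time.
 */
static bool
toy_handle_rxtx(struct toy_nic *nic, int limit)
{
	if (!nic->running)
		return (false);
	if (toy_rxeof(nic, limit))
		return (true);
	nic->intr_enabled = true;
	return (false);
}

/*
 * Simulate one interrupt followed by as many task passes as needed.
 * `limit` must be positive, otherwise the loop cannot make progress.
 * Returns the number of task passes.
 */
static int
toy_drain(struct toy_nic *nic, int limit)
{
	int passes = 1;

	toy_fast_intr(nic);
	while (toy_handle_rxtx(nic, limit))
		passes++;
	return (passes);
}
```

In this model a ring holding 250 packets with a limit of 100 is drained in three passes, and the interrupt is re-enabled only after the last one. If `running` is false the interrupt is left masked, which mirrors the lem behaviour described in point 2.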
Re: e1000 serdes link flap
Hi Jack,

On Wed, Jan 23, 2013 at 2:58 PM, Ryan Stone <ryst...@gmail.com> wrote:
On Wed, Jan 23, 2013 at 2:13 AM, Neel Natu <neeln...@gmail.com> wrote:

Hi,

I am running into a problem in head with the e1000 link state detection logic attached to a 82571EB serdes controller. The symptom is that the link state keeps flapping between up and down. After I enabled the debug output in 'e1000_check_for_serdes_link_82571()' this is what I see:

<snip>
e1000_check_for_serdes_link_82571
ctrl = 0x4c0241, status = 0x803a7, rxcw = 0x4400
FORCED_UP -> AN_PROG
em6: link state changed to DOWN
e1000_check_for_serdes_link_82571
ctrl = 0x4c0201, status = 0x803a4, rxcw = 0x4400
AN_PROG -> FORCED_UP
em6: link state changed to UP
e1000_check_for_serdes_link_82571
ctrl = 0x4c0241, status = 0x803a7, rxcw = 0x4400
FORCED_UP -> AN_PROG
em6: link state changed to DOWN
</snip>

The problem goes away if I apply the following patch to bring the link state detection logic in line with the e1000e driver in Linux:

Index: e1000_82571.c
===
--- e1000_82571.c (revision 245766)
+++ e1000_82571.c (working copy)
@@ -1712,10 +1712,8 @@
 		 * auto-negotiation in the TXCW register and disable
 		 * forced link in the Device Control register in an
 		 * attempt to auto-negotiate with our link partner.
-		 * If the partner code word is null, stop forcing
-		 * and restart auto negotiation.
 		 */
-		if ((rxcw & E1000_RXCW_C) || !(rxcw & E1000_RXCW_CW)) {
+		if ((rxcw & E1000_RXCW_C) != 0) {
 			/* Enable autoneg, and unforce link up */
 			E1000_WRITE_REG(hw, E1000_TXCW, mac->txcw);
 			E1000_WRITE_REG(hw, E1000_CTRL,

I am not sure why the !(rxcw & E1000_RXCW_CW) check was added and the e1000 SDM does not have any more information.

Jack, can you take a look at the patch and commit if it looks alright?

I have this change applied internally. I thought that it was something funny in my environment, so I never tried to push it upstream. Sorry for costing you time in having to debug this. :(

Are you planning to commit this patch?
I am happy to get you more information from my system if it helps you get to the bottom of this quicker.

best
Neel
openntp server prompts dispatch_imsg in main: pipe closed error
My system is FreeBSD 8.2. The openntp server prompts a "dispatch_imsg in main: pipe closed" error. Who can help me? Thanks!

-----
FreeBSD http://www.unixnotes.net