Re: FYI: SVN to GIT converter currently broken, github is falling behind
Hi,

I just got this error when fetching from remote; related?

[elars@laurel: ~/src] git fetch --all
Fetching origin
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Fetching upstream
remote: Counting objects: 557, done.
remote: Compressing objects: 100% (543/543), done.
remote: Total 557 (delta 213), reused 2 (delta 2), pack-reused 0
Receiving objects: 100% (557/557), 1.15 MiB | 433.00 KiB/s, done.
Resolving deltas: 100% (213/213), completed with 2 local objects.
From github.com:/freebsd/freebsd
   b4eb11a..3eb0ea4  master     -> upstream/master
   f147893..9c319c0  stable/10  -> upstream/stable/10
   e901edd..b3c9fd2  stable/8   -> upstream/stable/8
   81ab2b1..2fc7a9a  stable/9   -> upstream/stable/9
   c2c933c..cc76737  svn_head   -> upstream/svn_head
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
error: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log.
Automatic cleanup will not be performed until the file is removed.

fatal: bad object refs/remotes/origin/HEAD
error: failed to run repack
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
error: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log.
Automatic cleanup will not be performed until the file is removed.

fatal: bad object refs/remotes/origin/HEAD
error: failed to run repack

Lars
Re: env functionality of config(5) has no effect?
amd64

> On Oct 22, 2015, at 21:08, Ian Lepore <i...@freebsd.org> wrote:
>
>> On Wed, 2015-10-21 at 08:09 +, Eggert, Lars wrote:
>> Hi,
>>
>> I'm trying to include some loader tunables in the kernel, via the
>> "env" functionality described in config(5).
>>
>> When I look at the compiled kernel binary with strings(1), I see that
>> the tunables are compiled in.
>>
>> However, they don't seem to take any effect when booting the kernel,
>> and they also don't show up when running kenv(1) after boot.
>>
>> Any ideas?
>>
>> Thanks,
>> Lars
>
> I finally found a few minutes to look into this today. You didn't say
> what platform you're working with. It appears that this has only ever
> worked on i386 and a handful of old arm and mips platforms.
>
> -- Ian
env functionality of config(5) has no effect?
Hi,

I'm trying to include some loader tunables in the kernel, via the "env" functionality described in config(5).

When I look at the compiled kernel binary with strings(1), I see that the tunables are compiled in.

However, they don't seem to take any effect when booting the kernel, and they also don't show up when running kenv(1) after boot.

Any ideas?

Thanks,
Lars
Difference between pkg 1.5.2 and 1.5.4
Hi,

I'm netbooting with a read-only rootfs. Up until version 1.5.2 of pkg, that sometimes caused some errors when installing various packages, but the install continued even if some files couldn't be written.

That seems to have changed with 1.5.4. Specifically, upgrading ca_root_nss from 3.19 to 3.19.1_1 now aborts in archive_read_extract() as shown below.

This regression makes it difficult to run read-only; any chance this abort can be turned into a warning instead?

Lars

Updating FreeBSD repository catalogue...
FreeBSD repository is up-to-date.
All repositories are up-to-date.
Checking integrity... done (0 conflicting)
The following 1 package(s) will be affected (of 0 checked):

Installed packages to be UPGRADED:
        ca_root_nss: 3.19 -> 3.19.1_1

The process will require 42 B more space.

Proceed with this action? [y/N]: y
[1/1] Upgrading ca_root_nss from 3.19 to 3.19.1_1...
You may need to manually remove /usr/local/etc/ssl/cert.pem if it's no longer needed.
You may need to manually remove /usr/local/openssl/cert.pem if it's no longer needed.
pkg: unlinkat(usr/local/share/licenses/ca_root_nss-3.19/LICENSE): No such file or directory
pkg: unlinkat(usr/local/share/licenses/ca_root_nss-3.19/MPL): No such file or directory
pkg: unlinkat(usr/local/share/licenses/ca_root_nss-3.19/catalog.mk): No such file or directory
[1/1] Extracting ca_root_nss-3.19.1_1: 71%
pkg: archive_read_extract(): Can't create '/etc/ssl/cert.pem.pXkDjkwDtvyq'
[1/1] Extracting ca_root_nss-3.19.1_1: 100%
[1/1] Deleting files for ca_root_nss-3.19.1_1: 100%
Re: ixl and BOOTP
On 2015-5-20, at 17:42, Ryan Stone ryst...@gmail.com wrote:
> Oh, I bet that you have a bunch of CPUs and ixl is consuming all of your
> interrupt vectors. Does setting this tunable fix the issue?
>
> hw.ixl.max_queues=1

Yeah, this box has 40 cores, but unfortunately that tunable doesn't change things in terms of BOOTP (I do see 2 vectors assigned now instead of 41, though).

Lars
Re: ixl and BOOTP
On 2015-5-18, at 19:22, Ryan Stone ryst...@gmail.com wrote:
> Hm, I'm unable to reproduce this on the latest -CURRENT (r283059). My
> hardware is a little different from yours -- my CPU is a Haswell Xeon,
> and I have only 1 igb port and no ixgbe. Also, I was just booting
> GENERIC. I didn't have Xen or anything running.

Happens also without Xen. I will dig a bit further. Thanks for testing!

Lars
Re: ixl and BOOTP
On 2015-5-18, at 16:08, Ryan Stone ryst...@gmail.com wrote:
> This is very strange. I have successfully netbooted -CURRENT in a very
> similar environment (ixl compiled into kernel and booting over igb). I
> can't remember when the last time I did this but it was probably within
> the last couple of weeks. I routinely netboot an 8.2 derivative in this
> kind of environment and I've never seen this kind of problem.

It used to work here, too. Something recently must have broken this.

> Could it be related to the size of the kernel, and not ixl specifically?

I don't know. Why would the size of the kernel matter?

> Also, do you have any indication as to where the hang happens? Is it
> still in the BIOS, or in pxeloader, or in the kernel itself? Are you
> booting in legacy mode or EFI?

Legacy mode, and it hangs in the kernel. Without if_ixl in loader.conf, it does the usual BOOTP logic:

ses0 at ahciem0 bus 0 scbus7 target 0 lun 0
ses0: AHCI SGPIO Enclosure 1.00 0001 SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus1 target 0 lun 0
ada0: INTEL SSDSC2BW180A3F 400i ACS-2 ATA SATA 3.x device
ada0: Serial Number CVCV3102050X180EGN
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 171705MB (351651888 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x14K
ada0: Previously was known as ad4
Sending DHCP Discover packet from interface igb0 (00:25:90:9b:73:2e)
Sending DHCP Discover packet from interface igb1 (00:25:90:9b:73:2f)
Sending DHCP Discover packet from interface ix0 (90:e2:ba:77:d4:9c)
Sending DHCP Discover packet from interface ix1 (90:e2:ba:77:d4:9d)
uhub1: 2 ports with 2 removable, self powered
uhub0: 2 ports with 2 removable, self powered
ugen0.2: vendor 0x8087 at usbus0
uhub2: vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2 on usbus0
ugen1.2: vendor 0x8087 at usbus1
uhub3: vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2 on usbus1
uhub2: 6 ports with 6 removable, self powered
uhub3: 8 ports with 8 removable, self powered
ugen0.3: American Megatrends Inc. at usbus0
ukbd0: Keyboard Interface on usbus0
ums0: Mouse Interface on usbus0
ums0: 3 buttons and [Z] coordinates ID=0
igb0: link state changed to UP
Received DHCP Offer packet on igb0 from 192.168.0.2 (accepted) (no root path) (boot_file)
Received DHCP Offer packet on igb0 from 192.168.0.2 (ignored) (no root path) (boot_file)
Received DHCP Offer packet on igb0 from 192.168.0.2 (ignored) (no root path) (boot_file)
Sending DHCP Request packet from interface igb0 (00:25:90:9b:73:2e)
Received DHCP Ack packet on igb0 from 192.168.0.2 (accepted) (got root path)
DHCP timeout for interface igb1
DHCP timeout for interface ix0
DHCP timeout for interface ix1
Wired loader interface (IP 192.168.11.1) is igb0
igb0 at 192.168.11.1 server 192.168.0.2 boot file /pxe/pxelinux.0
subnet mask 255.255.0.0 router 192.168.0.2 rootfs 192.168.0.10:/home/elars/dst hostname phobos2
Adjusted interface igb0
Shutdown interface igb1
Shutdown interface ix0
Shutdown interface ix1
...

And then, later on, it mounts the rootfs correctly:

Trying to mount root from nfs: []...
NFS ROOT: 192.168.0.10:/home/elars/dst
Interface igb0 IP-Address 192.168.11.1 Broadcast 192.168.255.255
Setting hostuuid: ----0025909b732e.
Setting hostid: 0xe85d6456.
...

If I enable if_ixl in loader.conf, I don't see any "Sending DHCP Discover packet" messages in the log at all, and consequently the NFS mount fails. See the attached diff; on the left is a boot without if_ixl, and on the right, with if_ixl.
Lars

[Attached side-by-side diff of the two boot logs; left column "ok" (without if_ixl), right column "nok" (with if_ixl). The column layout and loader logo were lost in extraction and the text is truncated. Recoverable Xen console lines:]

(XEN) Xen version 4.6-unstable (r...@netapp.com) (gcc47 (FreeBSD Ports Collection) 4.7.4) debug=y Mon May 18 14:50:17 CEST 2015
(XEN) Latest ChangeSet:
(XEN) Bootloader: FreeBSD Loader
(XEN) Command line: dom0_mem=4096M dom0pvh=1 com1=115200,8n1 console=com1
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
ixl and BOOTP
Hi,

when I have the ixl driver compiled into my -CURRENT kernel (or loaded as a module via loader.conf), the boot seems to hang (or silently crash) when BOOTP starts bringing up interfaces to send out probes. (I'm not netbooting over an ixl; the boot interface is an igb.)

What works is building the kernel without the ixl driver and then loading it manually once the system is up. That way, BOOTP via igb succeeds.

Any ideas what could be causing this?

Lars
FreeBSD FUSE calls truncate() on read-only files
Hi,

this came up when trying to port tup (https://github.com/gittup/tup) to FreeBSD.

Even though we are opening the file read-only with cat, FUSE calls truncate() on it, which modifies its mtime, and this screws up tup. See https://github.com/gittup/tup/issues/198

Anyone know why FreeBSD's FUSE is doing this?

Thanks,
Lars
Re: HEADS UP: Upgraded clang, llvm and lldb to 3.5.0
On 2015-1-7, at 16:28, Garrett Cooper yaneurab...@gmail.com wrote:
> Please open a bug and cc hselasky@ on it.

Done: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196597

Lars
Re: HEADS UP: Upgraded clang, llvm and lldb to 3.5.0
Hi,

On 2014-12-31, at 21:41, Dimitry Andric d...@freebsd.org wrote:
> I just committed an upgrade of clang, llvm and lldb to 3.5.0 to head, in r276479.

there seem to be issues when building with -DWITH_OFED:

--- contrib/ofed.all__D ---
/usr/home/elars/src/contrib/ofed/usr.bin/opensm/../../management/opensm/opensm/osm_ucast_ftree.c:2996:8: error: taking the absolute value of unsigned type 'unsigned int' has no effect [-Werror,-Wabsolute-value]
        if (abs(p_sw->rank - p_remote_sw->rank) != 1) {

Lars
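The warning above concerns a difference of two unsigned values, which wraps around instead of going negative. As a rough illustration only, the following Python sketch emulates 32-bit unsigned subtraction with a mask and shows one explicit way to test whether two ranks differ by exactly one. The helper names `u32_sub` and `ranks_adjacent` are hypothetical and do not come from opensm; this is not the fix that was committed.

```python
UINT32_MASK = 0xFFFFFFFF


def u32_sub(a, b):
    """Emulate C subtraction on 32-bit unsigned ints (wraps modulo 2**32)."""
    return (a - b) & UINT32_MASK


def ranks_adjacent(a, b):
    """Return True when two non-negative ranks differ by exactly one.

    Each branch only subtracts the smaller value from the larger one,
    so no intermediate result would be negative in unsigned arithmetic.
    """
    return (a > b and a - b == 1) or (b > a and b - a == 1)


# An unsigned difference never becomes negative; it wraps to a large value.
print(u32_sub(1, 2))
print(ranks_adjacent(1, 2), ranks_adjacent(2, 1), ranks_adjacent(3, 1))
```

The condition in the flagged line corresponds to `not ranks_adjacent(...)` in this sketch.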
Re: [PATCH] nscd
Hi,

I've been seeing the same issues with nscd not caching, but unfortunately your patch doesn't seem to change things, for better or worse.

My nsswitch.conf looks as follows:

group: cache files nis
hosts: cache files dns
networks: cache files
passwd: cache files nis
shells: files
services: cache files nis
protocols: cache files
rpc: cache files

When I start nscd -n -s -t and then run top in another shell, top takes ~10 seconds to start up every time; if nscd did its thing, repeat invocations should be much faster. nscd doesn't seem to see any activity either, based on its log:

[elars@one: ~] sudo nscd -n -s -t
M1 from main: request agents registered successfully
M2 from cache: cache was successfully initialized
M2 from runtime environment: using socket /var/run/nscd
M2 from runtime environment: successfully initialized
M1 from main: working in single-threaded mode
no further output

Lars

On 2014-9-30, at 5:40, David Shane Holden dpej...@yahoo.com wrote:
> So, I've noticed nscd hasn't worked right for a while now. Since I
> upgraded to 10.0 it never seemed to cache properly but I never bothered
> to really dig into it until recently and here's what I've found.
>
> In my environment I have nsswitch set to use caching and LDAP as such:
>
> group: files cache ldap
> passwd: files cache ldap
>
> The LDAP part works fine, but caching didn't on 10.0 for some reason. On
> my 9.2 machines it works as expected though. What I've found is in
> usr.sbin/nscd/query.c
>
> struct query_state *
> init_query_state(int sockfd, size_t kevent_watermark, uid_t euid, gid_t egid)
> {
>         ...
>         memcpy(&retval->timeout, &s_configuration->query_timeout,
>             sizeof(struct timeval));
>         ...
> }
>
> s_configuration->query_timeout is an 'int' which is being memcpy'd into
> a 'struct timeval', causing it to grab other parts of the s_configuration
> struct along with the query_timeout value and polluting retval->timeout.
> In this case it appears to be grabbing s_configuration->threads_num and
> shoving that into timeout.tv_sec along with the query_timeout.
>
> This ends up confusing nscd later on (instead of being 8 it ends up
> being set to 34359738376) and breaks its ability to cache. I've attached
> a patch that sets retval->timeout properly and gets nscd working again.
> I'm guessing gcc was handling this differently from clang, which is why
> it wasn't a problem before 10.0.
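The value 34359738376 quoted above is what one gets when two adjacent 4-byte integers, both equal to 8, are read back as a single 64-bit quantity. The Python sketch below reproduces that arithmetic with the standard struct module. The layout is an assumption made for illustration (little-endian machine, 4-byte int, query_timeout immediately followed by threads_num, 64-bit tv_sec); it is not taken from nscd's headers.

```python
import struct

# Assumed values and layout (not read from nscd's source): two adjacent
# 4-byte ints, query_timeout followed by threads_num, little-endian.
query_timeout = 8
threads_num = 8

config_bytes = struct.pack("<ii", query_timeout, threads_num)

# A memcpy of sizeof(struct timeval) bytes starting at query_timeout copies
# past the int; the first 8 bytes end up in the 64-bit tv_sec field.
(tv_sec,) = struct.unpack("<q", config_bytes[:8])
print(tv_sec)  # 34359738376, i.e. 8 * 2**32 + 8

# Reading only the 4-byte int yields the intended timeout.
(intended,) = struct.unpack("<i", config_bytes[:4])
print(intended)  # 8
```

Under these assumptions, assigning the int to tv_sec directly (rather than copying a whole struct timeval from its address) gives the intended 8-second timeout.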
Re: nscd not caching
On 2014-8-18, at 20:23, John-Mark Gurney j...@funkthat.com wrote:
> Why not run a local slave on your server?

I am trying to get one set up. It requires a change request to our organization's IT, which is, ahem, not always lightning fast.

Lars
Re: nscd not caching
On 2014-8-19, at 13:54, Daniel Braniss da...@cs.huji.ac.il wrote:
> I know that this is a bit late but have you ever considered Hesiod?
> it uses DNS/txt. we have been using it since the days when BSDi had no
> NIS support and haven't seen a ypserver not responding since :-)

I don't control the master NIS infrastructure, I just want to use it (with a server that is unfortunately 25ms away). We will move to LDAP at some time, but in the meantime, a functioning nscd would be nice.

Lars
Re: nscd not caching
Hi,

On 2014-8-17, at 18:10, Adam McDougall mcdou...@egr.msu.edu wrote:
> We were using +: type entries in the local password and group tables
> and I believe we used an unmodified /etc/nsswitch.conf (excluding cache
> lines while testing nscd):

I tried that setup too, and it doesn't seem to be caching any NIS lookups either.

The current NIS server is 25ms away, which is a pain. I'm trying to get a local slave set up, which will make the need for nscd go away, but it would sure be nice if it worked in the meantime.

> At our site, we never had enough load to outright require nscd on
> FreeBSD, although there were some areas where caching had a usability
> benefit.

Load is not an issue, latency is (see above).

> top was slow to open since it would load the whole passwd table first,
> but top -u was a workaround.

Right, I see that issue too.

> As a workaround until we retired NIS, I wrote a hack of a script to
> merge NIS groups into my local /etc/group files periodically from cron.
> Aside from bugs in my script, that worked well.

I may end up doing this, too.

Given all this, maybe it's time to retire nscd?

Lars
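The merge script mentioned above is not shown in the thread. As a minimal sketch of the idea, assuming the NIS group map has been dumped to text in group(5) format (for example with ypcat group), the following Python functions merge NIS entries into the local ones, with local entries taking precedence. The function names `parse_group` and `merge_groups` are hypothetical, and writing the result back to /etc/group safely (temporary file, rename, cron scheduling) is deliberately left out.

```python
def parse_group(text):
    """Parse group(5)-style lines (name:passwd:gid:members) into a dict.

    Blank lines, comments, and NIS compat markers starting with "+" are
    skipped; the first occurrence of a group name wins.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("+"):
            continue
        fields = line.split(":")
        if len(fields) != 4:
            continue  # not a well-formed group entry
        entries.setdefault(fields[0], line)
    return entries


def merge_groups(local_text, nis_text):
    """Return merged group file text; local entries override NIS ones."""
    merged = parse_group(local_text)
    for name, line in parse_group(nis_text).items():
        merged.setdefault(name, line)
    return "\n".join(merged.values()) + "\n"


local = "wheel:*:0:root\n+:*::\n"
nis = "staff:*:20:alice,bob\nwheel:*:0:mallory\n"
print(merge_groups(local, nis), end="")
```

Keeping the local entry on a name clash is a design choice in this sketch so that a NIS map cannot silently change membership of groups such as wheel.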
Re: nscd not caching
Nobody using nscd? Really?

On 2014-8-14, at 13:26, Eggert, Lars l...@netapp.com wrote:
> [Resending to current@, since I can't get it to work on -CURRENT either.]
>
> Hi,
>
> anyone have an idea why nscd would not be caching NIS lookups? My
> nsswitch.conf looks as follows:
>
> group: cache files nis
> hosts: cache files dns
> networks: cache files
> passwd: cache files nis
> shells: files
> services: cache files nis
> protocols: cache files
> rpc: cache files
>
> nisdomain is set and ypbind is started, and I see lots of NIS traffic
> going in and out. But nothing is cached; running nscd with -t just
> prints this and then nothing, ever:
>
> M1 from main: successfully daemonized
> M1 from main: request agents registered successfully
> M2 from cache: cache was successfully initialized
> M2 from runtime environment: using socket /var/run/nscd
> M2 from runtime environment: successfully initialized
> M1 from main: thread #0 was successfully created
> M1 from main: thread #1 was successfully created
> M1 from main: thread #2 was successfully created
> M1 from main: thread #3 was successfully created
> M1 from main: thread #4 was successfully created
> M1 from main: thread #5 was successfully created
> M1 from main: thread #6 was successfully created
> M1 from main: thread #7 was successfully created
>
> Lars
nscd not caching
[Resending to current@, since I can't get it to work on -CURRENT either.]

Hi,

anyone have an idea why nscd would not be caching NIS lookups? My nsswitch.conf looks as follows:

group: cache files nis
hosts: cache files dns
networks: cache files
passwd: cache files nis
shells: files
services: cache files nis
protocols: cache files
rpc: cache files

nisdomain is set and ypbind is started, and I see lots of NIS traffic going in and out. But nothing is cached; running nscd with -t just prints this and then nothing, ever:

M1 from main: successfully daemonized
M1 from main: request agents registered successfully
M2 from cache: cache was successfully initialized
M2 from runtime environment: using socket /var/run/nscd
M2 from runtime environment: successfully initialized
M1 from main: thread #0 was successfully created
M1 from main: thread #1 was successfully created
M1 from main: thread #2 was successfully created
M1 from main: thread #3 was successfully created
M1 from main: thread #4 was successfully created
M1 from main: thread #5 was successfully created
M1 from main: thread #6 was successfully created
M1 from main: thread #7 was successfully created

Lars
Re: HEADS UP: Updated llvm/clang to 3.4 in r261991
Hi,

On 2014-2-16, at 21:06, Dimitry Andric d...@freebsd.org wrote:
> I have just upgraded our copy of llvm/clang to 3.4 release, in r261991.

I just did a git pull followed by a buildworld. Shouldn't I be having version 3.4?

root@six:~ # dmesg | grep clang
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
root@six:~ # clang -v
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
Target: x86_64-unknown-freebsd11.0
Thread model: posix
root@six:~ # uname -a
FreeBSD six 11.0-CURRENT FreeBSD 11.0-CURRENT #14 df7b691(fas3270): Tue Feb 4 13:28:37 CET 2014 el...@stanley.muccbc.hq.netapp.com:/usr/home/elars/obj/usr/home/elars/src/sys/FAS3270 amd64

Lars
Re: HEADS UP: Updated llvm/clang to 3.4 in r261991
Disregard - pilot error. (It's not Feb 4...)

Lars

On 2014-2-18, at 9:50, Eggert, Lars l...@netapp.com wrote:
> Hi,
>
> On 2014-2-16, at 21:06, Dimitry Andric d...@freebsd.org wrote:
>> I have just upgraded our copy of llvm/clang to 3.4 release, in r261991.
>
> I just did a git pull followed by a buildworld. Shouldn't I be having
> version 3.4?
>
> root@six:~ # dmesg | grep clang
> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
> root@six:~ # clang -v
> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
> Target: x86_64-unknown-freebsd11.0
> Thread model: posix
> root@six:~ # uname -a
> FreeBSD six 11.0-CURRENT FreeBSD 11.0-CURRENT #14 df7b691(fas3270): Tue Feb 4 13:28:37 CET 2014 el...@stanley.muccbc.hq.netapp.com:/usr/home/elars/obj/usr/home/elars/src/sys/FAS3270 amd64
>
> Lars
Re: using ConnectX card as Ethernet (mlxen)
Hi,

On 2014-1-20, at 21:59, John Baldwin j...@freebsd.org wrote:
> I believe this should work, yes. Getting a crashdump or the panic
> messages would be really helpful in figuring out why it isn't.

Thanks. I rebuilt the kernel, and see no crashes anymore. So that's good. But there are a bunch of other issues that maybe someone has some ideas about:

(1) Late attach

The ConnectX-3 attaches very late during the boot process, after the system is already in single-user mode. See the attached dmesg; pci17 and pci18 (there are two identical cards in this system) first show as no driver attached during the PCI bus enumeration. Only after the system is in single-user mode does the mlx4_core attach to the cards.

That means that e.g. trying to set sysctls for these cards in /etc/sysctl.conf, or configuring their IP addresses via rc.conf is not possible. At the moment, I work around this by sleeping in rc.local and then doing assignments there, but that's a hack. Any clues why these cards attach so late?

(2) Device numbers change

After booting, these cards show up in InfiniBand mode:

ib0: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d1.21
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib1: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d1.22
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib2: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib3: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d2
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Then I force one into Ethernet mode:

# sysctl sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port1: auto (ib) -> eth

and the device numbers on the ib devices change: ib1 is now ib4, and I have a new mlxen0 device.

ib2: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib3: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d2
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=d05bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
        ether f4:52:14:10:d1:21
        inet6 fe80::f652:14ff:fe10:d121%mlxen0 prefixlen 64 scopeid 0xe
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
ib4: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.4a.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d1.22
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

When I change another port into Ethernet mode

# sysctl sys.device.mlx4_core0.mlx4_port2=eth
sys.device.mlx4_core0.mlx4_port2: auto (ib) -> eth

device numbers change again. Now mlxen0 disappears and becomes mlxen1, and I have a new mlxen2 device:

ib2: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.48.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ib3: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        lladdr 80.0.0.49.fe.80.0.0.0.0.0.0.f4.52.14.3.0.10.d0.d2
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
mlxen1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=d05bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
        ether f4:52:14:10:d1:21
        inet6 fe80::f652:14ff:fe10:d121%mlxen1 prefixlen 64 scopeid 0xe
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
mlxen2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=d05bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE>
        ether f4:52:14:10:d1:22
        inet6 fe80::f652:14ff:fe10:d122%mlxen2 prefixlen 64 scopeid 0xf
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier

Changing the other two ports (on the second card) to Ethernet mode

# sysctl sys.device.mlx4_core1.mlx4_port1=eth
sys.device.mlx4_core1.mlx4_port1: auto (ib) -
Re: using ConnectX card as Ethernet (mlxen)
On 2014-1-21, at 10:04, Lars Eggert l...@netapp.com wrote:
> See the attached dmesg

which I of course forgot to attach (sigh). See below.

Lars

GDB: no debug ports present970
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-CURRENT #8 ab08c30(fas3270)-dirty: Tue Jan 21 09:07:36 CET 2014
    el...@stanley.muccbc.hq.netapp.com:/usr/home/elars/obj/usr/home/elars/src/sys/FAS3270 amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
CPU: Intel(R) Xeon(R) CPU E5240 @ 3.00GHz (3000.17-MHz K8-class CPU)
  Origin=GenuineIntel Id=0x1067a Family=0x6 Model=0x17 Stepping=10
  Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE
  Features2=0xc0ce3bdSSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,OSXSAVE
  AMD Features=0x20100800SYSCALL,NX,LM
  AMD Features2=0x1LAHF
  TSC: P-state invariant, performance statistics
real memory = 18253611008 (17408 MB)
avail memory = 16599695360 (15830 MB)
MPTable: NETAPP SB_XVI
Event timer LAPIC quality 400
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 6
 cpu3 (AP): APIC ID: 7
random device not loaded; using insecure entropy
ioapic0: Assuming intbase of 0
ioapic0 Version 2.0 irqs 0-23 on motherboard
netmap: loaded module
random: Software, Yarrow initialized
smbios0: System Management BIOS at iomem 0xf6c00-0xf6c1e on motherboard
smbios0: Version: 2.5
cryptosoft0: software crypto on motherboard
pcib0: MPTable Host-PCI bridge pcibus 0 on motherboard
pci0: PCI bus on pcib0
pcib1: MPTable PCI-PCI bridge at device 2.0 on pci0
pci1: PCI bus on pcib1
cxgbc0: Carnegie T3 onboard SR KR, 2 ports mem 0xdd001000-0xdd001fff,0xdc80-0xdcff,0xdd00-0xdd000fff irq 16 at device 0.0 on pci1
cxgbc0: AD8158 0xf=0x3 0x1=0xf
cxgbc0: using MSI-X interrupts (9 vectors)
cxgb0: Port 0 10GBASE-R on cxgbc0
cxgb0: Ethernet address: 00:a0:98:30:c2:2a
cxgb1: Port 1 10GBASE-R on cxgbc0
cxgb1: Ethernet address: 00:a0:98:30:c2:2b
cxgbc0: Firmware Version 7.11.0
pcib2: PCI-PCI bridge at device 3.0 on pci0
pci2: PCI bus on pcib2
pcib3: MPTable PCI-PCI bridge at device 4.0 on pci0
pci3: PCI bus on pcib3
pcib4: PCI-PCI bridge mem 0xdd30-0xdd31 irq 16 at device 0.0 on pci3
pci4: PCI bus on pcib4
pcib3: unable to route slot 0 INTB
pcib5: PCI-PCI bridge irq 16 at device 4.0 on pci4
pci5: PCI bus on pcib5
pcib6: MPTable PCI-PCI bridge irq 10 at device 5.0 on pci4
pci6: PCI bus on pcib6
ix0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15 mem 0xdd40-0xdd47,0xdd50-0xdd503fff irq 17 at device 0.0 on pci6
ix0: Using MSIX interrupts with 5 vectors
ix0: Ethernet address: 90:e2:ba:37:d5:b4
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
001.08 [2141] netmap_attach success for ix0
ix1: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15 mem 0xdd48-0xdd4f,0xdd504000-0xdd507fff irq 18 at device 0.1 on pci6
ix1: Using MSIX interrupts with 5 vectors
ix1: Ethernet address: 90:e2:ba:37:d5:b5
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
001.09 [2141] netmap_attach success for ix1
pcib7: PCI-PCI bridge irq 16 at device 8.0 on pci4
pci7: PCI bus on pcib7
pcib8: PCI-PCI bridge at device 0.0 on pci7
pci8: PCI bus on pcib8
pcib9: MPTable PCI-PCI bridge at device 0.0 on pci8
pci9: PCI bus on pcib9
em0: Intel(R) PRO/1000 Network Connection 7.3.8 mem 0xdd62-0xdd63,0xdd60-0xdd61 irq 16 at device 0.0 on pci9
em0: Using an MSI interrupt
em0: Ethernet address: 00:1b:21:a8:a5:34
001.10 [2141] netmap_attach success for em0
em1: Intel(R) PRO/1000 Network Connection 7.3.8 mem 0xdd66-0xdd67,0xdd64-0xdd65 irq 17 at device 0.1 on pci9
em1: Using an MSI interrupt
em1: Ethernet address: 00:1b:21:a8:a5:35
001.11 [2141] netmap_attach success for em1
pcib10: MPTable PCI-PCI bridge at device 1.0 on pci8
pci10: PCI bus on pcib10
em2: Intel(R) PRO/1000 Network Connection 7.3.8 mem 0xdd72-0xdd73,0xdd70-0xdd71 irq 17 at device 0.0 on pci10
em2: Using an MSI interrupt
em2: Ethernet address: 00:1b:21:a8:a5:36
001.12 [2141] netmap_attach success for em2
em3: Intel(R) PRO/1000 Network Connection 7.3.8 mem 0xdd76-0xdd77,0xdd74-0xdd75 irq 18 at device 0.1 on pci10
em3: Using an MSI interrupt
em3: Ethernet address: 00:1b:21:a8:a5:37
001.13 [2141] netmap_attach success for em3
pcib11: PCI-PCI bridge at device 5.0 on pci0
pci11: PCI bus on pcib11
pcib12: PCI-PCI bridge at device 6.0 on pci0
pci12: PCI bus on pcib12
pcib0: unable to
Re: using ConnectX card as Ethernet (mlxen)
Last follow-up: I just saw that there are some additional messages (errors?) on the serial console when changing the device from IB to Ethernet; maybe they mean something to someone:

root@one:~ # sysctl sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port1: auto (ib)<7>ib0: stopping interface
<7>ib0: downing ib_dev
<7>ib0: stopping multicast thread
<7>ib0: flushing multicast list
<7>qpn 0x48: invalid attribute mask specified for transition 0 to 6. qp_type 4, attr_mask 0x1\n<4>ib0: Failed to modify QP to ERROR state
<7>ib0: All sends and receives done.
<7>ib0: cleaning up ib_dev
<7>ib0: stopping multicast thread
<7>ib0: flushing multicast list
<7>ib0: Cleanup ipoib connected mode.
<7>ib1: stopping interface
<7>ib1: downing ib_dev
<7>ib1: stopping multicast thread
<7>ib1: flushing multicast list
<7>qpn 0x49: invalid attribute mask specified for transition 0 to 6. qp_type 4, attr_mask 0x1\n<4>ib1: Failed to modify QP to ERROR state
<7>ib1: All sends and receives done.
<7>ib1: cleaning up ib_dev
<7>ib1: stopping multicast thread
<7>ib1: flushing multicast list
<7>ib1: Cleanup ipoib connected mode.
<6>mlx4_en mlx4_core0: Using 5 tx rings for port:1
<6>mlx4_en mlx4_core0: Defaulting to 4 rx rings for port:1
<6>mlx4_en mlx4_core0: Activating port:1
mlxen0: Ethernet address: f4:52:14:10:d1:21
<4>mlx4_en: mlx4_core0: Port 1: Using 5 TX rings
<4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
<6>mlx4_ib: Mellanox ConnectX InfiniBand driver v1.Jan 21 09:21:31 0 (April 4, 2008) one kernel: mlx4_en: mlx4_core0: Port 1: Using 5 TX rings Jan <7>ib4: max_srq_sge=31 21 09:21:31 one <7>ib4: max_cm_mtu = 0x1, num_frags=16 kernel: mlx4_en:ib4: mlx4_core0: PorAttached to mlx4_0 port 2 t 1: Using 4 RX rings -> eth

Lars
Re: using ConnectX card as Ethernet (mlxen)
Hi,

On 2013-7-9, at 22:08, John Nielsen li...@jnielsen.net wrote:
> On Jul 9, 2013, at 9:58 AM, John Baldwin j...@freebsd.org wrote:
>> So this was just fixed (finally) in HEAD in r253048. You can now use the sysctls to change this.
>
> I saw the commit. Thanks! I'll give it a try at some point (whenever my schedule and hardware availability align).

is this supposed to work at the moment? When I try, the machine seems to crash:

root@one:~ # sysctl sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port1: auto (ib)
Write failed: Broken pipe
Shared connection to xxx closed.

Unfortunately I don't have serial console access at the moment, so I can't access any messages that may have gotten dumped.

The cards in question are:

mlx4_core0@pci0:17:0:0: class=0x028000 card=0x005015b3 chip=0x100315b3 rev=0x00 hdr=0x00
    vendor = 'Mellanox Technologies'
    device = 'MT27500 Family [ConnectX-3]'
    class = network

Lars
Re: using ConnectX card as Ethernet (mlxen)
Hi,

if I leave the mlx4ib device out of the kernel (i.e., only compile in mlxen), doing the sysctl switch to Ethernet mode works fine.

Lars

On 2014-1-20, at 13:08, Eggert, Lars l...@netapp.com wrote:
> Hi,
>
> On 2013-7-9, at 22:08, John Nielsen li...@jnielsen.net wrote:
>> On Jul 9, 2013, at 9:58 AM, John Baldwin j...@freebsd.org wrote:
>>> So this was just fixed (finally) in HEAD in r253048. You can now use the sysctls to change this.
>>
>> I saw the commit. Thanks! I'll give it a try at some point (whenever my schedule and hardware availability align).
>
> is this supposed to work at the moment? When I try, the machine seems to crash:
>
> root@one:~ # sysctl sys.device.mlx4_core0.mlx4_port1=eth
> sys.device.mlx4_core0.mlx4_port1: auto (ib)
> Write failed: Broken pipe
> Shared connection to xxx closed.
>
> Unfortunately I don't have serial console access at the moment, so I can't access any messages that may have gotten dumped.
>
> The cards in question are:
>
> mlx4_core0@pci0:17:0:0: class=0x028000 card=0x005015b3 chip=0x100315b3 rev=0x00 hdr=0x00
>     vendor = 'Mellanox Technologies'
>     device = 'MT27500 Family [ConnectX-3]'
>     class = network
>
> Lars
nfsd server cache flooded, try to increase nfsrc_floodlevel
Hi,

every few days or so, my -STABLE NFS server (v3 and v4) gets wedged with a ton of messages like this in the log:

nfsd server cache flooded, try to increase nfsrc_floodlevel

At that point, nfsstat shows TCPPeak at 16385. It requires a reboot to unwedge; restarting the server does not help.

The clients are (mostly) six -CURRENT NFSv4 boxes that netboot from the server and mount all drives from there.

I googled around and saw that others have hit this issue, but I haven't seen any resolution posted. I guess I can increase NFSRVCACHE_FLOODLEVEL in the source, but I wonder if I wouldn't simply hit the increased value after a little while longer...

Lars
Re: NFSv4 console messages (locks lost etc.)
Hi,

On Jun 29, 2013, at 2:45, Rick Macklem rmack...@uoguelph.ca wrote:
> Btw, a NFSv4 mounted root fs will not work correctly, because the client name is generated from the host uuid, which isn't set when the root fs is mounted. I'm not sure what the client would use as its client name, but this will definitely break things badly if multiple clients use the same name. (And this might explain the lease expiry problem.)

ah, now that explains a lot. Since these are diskless clients, I had set hostid_enable=NO in order to turn off the "/etc/rc: WARNING: could not store hostuuid in /etc/hostid" warning. Turning this back on seems to have fixed the issue.

(It might make sense to have the NFSv4 code throw a warning when the hostid isn't set, if it depends on it?)

Thanks,
Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
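A minimal sketch of how one might check for the condition described above before mounting anything over NFSv4. The helper name is hypothetical, and the sketch assumes that the kernel exposes the UUID through the kern.hostuuid sysctl and reports an all-zero value when no host UUID has been set; neither detail comes from this thread.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): succeeds only if the given
# string looks like a real host UUID rather than an empty or all-zero one.
# Assumption: an unset host UUID shows up as all zeros.
hostuuid_usable() {
    case "$1" in
        ""|00000000-0000-0000-0000-000000000000) return 1 ;;
        *) return 0 ;;
    esac
}

# Intended use on a diskless client (assumes the kern.hostuuid sysctl):
#   hostuuid_usable "$(sysctl -n kern.hostuuid)" || echo "hostuuid not set"
# The fix reported in this thread is to leave hostid enabled in /etc/rc.conf:
#   hostid_enable="YES"
```

If the check fails on several clients at once, they may all be presenting the same client name to the server, which is the situation Rick warns about.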
Re: NFSv4 console messages (locks lost etc.)
Hi, I should have mentioned that the server is FreeBSD -STABLE running newnfs, and the network isn't partitioned (because I access the box over SSH at the same time I see these messages.) They only appear under heavy NFS load (portmaster build of math/R in this case.) Lars On Jun 29, 2013, at 2:32, Rick Macklem rmack...@uoguelph.ca wrote: Lars Eggert wrote: Hi, on a -CURRENT client, I get quite a number of console messages under heavy NFSv4 load, such as: nfsv4 expired locks lost Means the lease expired on the NFSv4 server somehow. Lease expiry is bad news and there is no way to recover locks lost because of it. nfscl: never fnd open Usually, opens can be recovered after a lease expiry, but it might be broken. Since lease expiry should never happen during normal operation (see below), it doesn't get a lot of testing. nfscl: never fnd open nfscl: never fnd open nfsv4 expired locks lost nfscl: never fnd open nfscl: never fnd open nfsv4 expired locks lost nfsv4 expired locks lost nfsv4 expired locks lost nfsv4 expired locks lost nfsv4 expired locks lost nfscl: never fnd open Can I ignore them? Can I turn them off? Well, these should never happen during normal, correct operation. The nfsv4 expired locks lost implies lease expiry. This should only happen when the client is network partitioned from the server for more than a lease duration (chosen by the server, but typically about 1minute). The client does a Renew Op every 1/2 lease durations to avoid this. Also, any state related operation (open/lock/locku/close/etc) is supposed to renew the lease implicitly. If you are getting network partitions happening, then you really need to fix the network. If not, then if you watch network traffic with something like wireshark and see Renew Ops happening at regular intervals, then I can only suggest that the server is somehow broken for NFSv4. You should also look for NFS4ERR_EXPIRED error replies to operations related to state (open/lock/locku/close). 
That is the server reply which indicates the lease expiry. If the server is never returning this, I have no idea how the client would generate the above messages, but it does indicate a client NFSv4 bug if that is the case. Switching all mounts to NFSv3 will get rid of the above, although it is not exactly a fix;-) rick Thanks, Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: NFSv4 console messages (locks lost etc.)
Thanks Rick! I will check these Monday. Lars On Jun 29, 2013, at 2:45, Rick Macklem rmack...@uoguelph.ca wrote: Lars Eggert wrote: Hi, On Jun 28, 2013, at 16:37, Eggert, Lars l...@netapp.com wrote: On Jun 28, 2013, at 16:14, Eggert, Lars l...@netapp.com wrote: on a -CURRENT client, I get quite a number of console messages under heavy NFSv4 load, such as: nfsv4 expired locks lost nfscl: never fnd open The never fnd open message is generated by the NFSv4 client when a close can't find an extant open to close. I suspect the open was not recovered after lease expiry. Since Close Ops only matter to the NFSv4 server, this doesn't imply a problem unless the NFSv4 server thinks the client still has an Open (which would not be the case after an NFSv4 server expires a lease, since it assumes all state such as opens are lost when a lease is expired). actually, not sure if the nfscl message is from an NFSv4 mount point or not, because the box mounts root via BOOTP, so with NFSv3 (or v2?) and some other mounts with NFSv4. and another data point: the nfscl messages seem to disappear when I remove the BOOTP_NFSV3 flag from the kernel. The client hangs that made me dig into these messages seem to also disappear, fingers crossed. Hmm, weird, since NFSv3 should never generate these messages. I think that a root fs is remounted using the / entry in the /etc/fstab in the NFS mounted root fs. Did this entry specify nfsv4 by any chance? Btw, a NFSv4 mounted root fs will not work correctly, because the client name is generated from the host uuid, which isn't set when the root fs is mounted. I'm not sure what the client would use as its client name, but this will definitely break things badly if multiple clients use the same name. (And this might explain the lease expiry problem.) If the root fs is mounted NFSv3 (or NFSv2) it shouldn't generate the messages or have any effect on the NFSv4 client, so I have no idea why removing BOOTP_NFSV3 would have any effect on this? 
Oh, and if you are using a pretty up to date system, you can nfsstat -m to find out what mount options are actually in use. If nfsv4 is listed for your root fs, that is a serious problem that you need to fix. rick (I still get a bunch of nfsv4 expired locks lost messages, but no hangs.) Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
NFSv4 console messages (locks lost etc.)
Hi,

on a -CURRENT client, I get quite a number of console messages under heavy NFSv4 load, such as:

nfsv4 expired locks lost
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfsv4 expired locks lost
nfscl: never fnd open
nfscl: never fnd open
nfsv4 expired locks lost
nfsv4 expired locks lost
nfsv4 expired locks lost
nfsv4 expired locks lost
nfsv4 expired locks lost
nfscl: never fnd open

Can I ignore them? Can I turn them off?

Thanks,
Lars
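To put numbers on how often each of the two messages appears, something like the following could be used. This is only a sketch: the function name is made up, and the log path in the usage comment is an assumption about where console messages end up on a given system.

```sh
#!/bin/sh
# Hypothetical helper: reads console/log text on stdin and prints how often
# each of the two messages from this thread occurs (counted per line).
count_nfs_msgs() {
    awk '
        /nfsv4 expired locks lost/ { expired++ }
        /nfscl: never fnd open/    { never++ }
        END { printf "expired=%d never_fnd_open=%d\n", expired, never }
    '
}

# Possible use (log path is an assumption):
#   count_nfs_msgs < /var/log/messages
```

Comparing the two counts before and after a configuration change gives a rough idea of whether the change had any effect.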
Re: NFSv4 console messages (locks lost etc.)
Hi,

On Jun 28, 2013, at 16:14, Eggert, Lars l...@netapp.com wrote:
> on a -CURRENT client, I get quite a number of console messages under heavy NFSv4 load, such as:
>
> nfsv4 expired locks lost
> nfscl: never fnd open

actually, not sure if the nfscl message is from an NFSv4 mount point or not, because the box mounts root via BOOTP, so with NFSv3 (or v2?) and some other mounts with NFSv4.

Lars
Re: NFSv4 console messages (locks lost etc.)
Hi,

On Jun 28, 2013, at 16:37, Eggert, Lars l...@netapp.com wrote:
> On Jun 28, 2013, at 16:14, Eggert, Lars l...@netapp.com wrote:
>> on a -CURRENT client, I get quite a number of console messages under heavy NFSv4 load, such as:
>>
>> nfsv4 expired locks lost
>> nfscl: never fnd open
>
> actually, not sure if the nfscl message is from an NFSv4 mount point or not, because the box mounts root via BOOTP, so with NFSv3 (or v2?) and some other mounts with NFSv4.

and another data point: the nfscl messages seem to disappear when I remove the BOOTP_NFSV3 flag from the kernel. The client hangs that made me dig into these messages seem to also disappear, fingers crossed.

(I still get a bunch of "nfsv4 expired locks lost" messages, but no hangs.)

Lars
reboot hanging?
Hi,

something changed in the last 2-3 weeks on -CURRENT that causes reboots to hang after this line:

Waiting (max 60 seconds) for system process `vnlru' to stop...done

I need to manually cycle the power to reboot. Any clues?

Thanks,
Lars
Re: ccache issues during buildworld on recent -CURRENT
Hi, any further ideas? This issue still exist when building -CURRENT on -STABLE as of today. Thanks, Lars On May 23, 2013, at 12:33, Eggert, Lars l...@netapp.com wrote: Hi, On May 22, 2013, at 13:37, Dimitry Andric d...@freebsd.org wrote: Can you try to figure out which copy of clang ccache finds and runs? I enabled CCACHE_LOGFILE, and it seems that it runs /usr/bin/clang: [2013-05-23T12:25:36.810346 48913] Command line: /usr/local/libexec/ccache/clang --sysroot=/home/elars/obj/usr/home/elars/src/tmp -B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic -I. -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER -DCLANG_ENABLE_STATIC_ANALYZER -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 -DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tool s/ clang/lib/Basic/CharInfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Module.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/ObjCRuntime.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OpenMPKinds.cpp /usr/home/elars / src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OperatorPrecedence.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManer.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TargetInfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Targets.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TokenKinds.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Version.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/VersionTuple.cpp [2013-05-23T12:25:36.810373 48913] Hostname: stanley.muccbc.hq.netapp.com [2013-05-23T12:25:36.810380 48913] Working directory: (null) [2013-05-23T12:25:36.810399 48913] Failed; falling back to running the real compiler [2013-05-23T12:25:36.810405 48913] Executing /usr/bin/clang --sysroot=/home/elars/obj/usr/home/elars/src/tmp -B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic -I. 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER -DCLANG_ENABLE_STATIC_ANALYZER -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 -DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Cha rI nfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp /usr/home
Re: reboot hanging?
On Jun 17, 2013, at 15:18, Luiz Otavio O Souza lists...@gmail.com wrote:
> It was a change on alq. The fix was committed in r251838.

Awesome! I can confirm it's fixed.

Lars
Re: ccache issues during buildworld on recent -CURRENT
Hi,

On Jun 17, 2013, at 14:51, Bryan Drewery bdrew...@freebsd.org wrote:
> ccache is known to be broken with clang [1]. I would recommend not using it.

Pity! We do buildworld often enough that that's an inconvenience.

> Sometimes CCACHE_CPP2=1 in make.conf can help. CCACHE_CPP2=1 is hardcoded in version 3.1.9_2 of devel/ccache on CURRENT. There are other classes of problems that are not fixed in upstream ccache yet.

doesn't seem to help, unfortunately.

> There are a few fixes in the ccache development repository that I have been meaning to bring to our devel/ccache port.

I'd be happy to test once there is something to test!

Lars
Re: mounting root from NFS via ROOTDEVNAME
Hi,

On Jun 3, 2013, at 1:57, Rick Macklem rmack...@uoguelph.ca wrote:
> Cool. Thanks. Would you like to review and/or test the above?

it'd be great if folks would test this a bit. It certainly works for me, but I can't say that I have done very thorough testing.

> I'll be happy to commit it if Lars doesn't have a src commit bit. (I've seen his posts, but can't remember if he is a committer?)

I'm not a committer.

Lars
Re: mounting root from NFS via ROOTDEVNAME
Hi,

to conclude this thread, the patch below allows one to specify an NFS rootfs via the ROOTDEVNAME kernel option, which will be mounted when BOOTP does not return a root-path option.

Lars

diff --git a/sys/nfs/bootp_subr.c b/sys/nfs/bootp_subr.c
index 2c57a91..972fb12 100644
--- a/sys/nfs/bootp_subr.c
+++ b/sys/nfs/bootp_subr.c
@@ -45,6 +45,7 @@ __FBSDID("$FreeBSD$");
 
 #include "opt_bootp.h"
 #include "opt_nfs.h"
+#include "opt_rootdevname.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
@@ -870,8 +871,20 @@ bootpc_call(struct bootpc_globalcontext *gctx, struct thread *td)
 						rtimo = time_second +
 							BOOTP_SETTLE_DELAY;
 						printf(" (got root path)");
-					} else
+					} else {
 						printf(" (no root path)");
+#ifdef ROOTDEVNAME
+						/*
+						 * If we'll mount rootfs from
+						 * ROOTDEVNAME, we can accept
+						 * offers without root paths.
+						 */
+						gotrootpath = 1;
+						rtimo = time_second +
+							BOOTP_SETTLE_DELAY;
+						printf(" (ROOTDEVNAME)");
+#endif
+					}
 					printf("\n");
 				}
 			} /* while secs */
@@ -1440,6 +1453,16 @@ bootpc_decode_reply(struct nfsv3_diskless *nd, struct bootpc_ifcontext *ifctx,
 
 	p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen,
 		       TAG_ROOT);
+#ifdef ROOTDEVNAME
+	/*
+	 * If there was no root path in BOOTP, use the one in ROOTDEVNAME.
+	 */
+	if (p == NULL) {
+		p = strdup(ROOTDEVNAME, M_TEMP);
+		if (strcmp(strsep(&p, ":"), "nfs") != 0)
+			panic("ROOTDEVNAME is not an NFS mount point");
+	}
+#endif
 	if (p != NULL) {
 		if (gctx->setrootfs != NULL) {
 			printf("rootfs %s (ignored) ", p);
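For anyone wanting to try the patch, here is a rough sketch of the kernel configuration it is aimed at, together with a small pre-build check. The server address and export path are placeholders, the BOOTP option lines are an assumption about a typical diskless setup, and the helper function is hypothetical. It only mirrors the patch's requirement that the text before the first ':' in ROOTDEVNAME is "nfs".

```sh
#!/bin/sh
# Kernel configuration lines this is meant for (server and path are
# placeholders, not taken from the thread):
#   options BOOTP
#   options BOOTP_NFSROOT
#   options ROOTDEVNAME=\"nfs:192.0.2.1:/export/root\"

# Hypothetical pre-build check: succeeds only if the value starts with
# "nfs:". With the patch applied, any other prefix would make the kernel
# panic when BOOTP returns no root path.
rootdevname_is_nfs() {
    case "$1" in
        nfs:*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

The check is slightly stricter than the patch, which would also accept a bare "nfs" with no colon, although that value would leave no path to mount.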
port building warnings about compat.ia32.maxvmem
Hi,

when I try to build ports on -CURRENT, I've been seeing tons of these messages for the past week or so:

make: "/usr/ports/Mk/bsd.port.mk" line 1633: warning: Couldn't read shell's output for "if /sbin/sysctl -n compat.ia32.maxvmem >/dev/null 2>&1; then echo YES; fi"

The amd64 kernel I'm building this on doesn't have COMPAT_FREEBSD32 defined.

Lars
Re: ccache issues during buildworld on recent -CURRENT
Hi, On May 22, 2013, at 13:37, Dimitry Andric d...@freebsd.org wrote: Can you try to figure out which copy of clang ccache finds and runs? I enabled CCACHE_LOGFILE, and it seems that it runs /usr/bin/clang: [2013-05-23T12:25:36.810346 48913] Command line: /usr/local/libexec/ccache/clang --sysroot=/home/elars/obj/usr/home/elars/src/tmp -B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic -I. -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER -DCLANG_ENABLE_STATIC_ANALYZER -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 -DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/ clang/lib/Basic/CharInfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Module.cpp 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/ObjCRuntime.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OpenMPKinds.cpp /usr/home/elars/ src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OperatorPrecedence.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManer.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TargetInfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Targets.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TokenKinds.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Version.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/VersionTuple.cpp [2013-05-23T12:25:36.810373 48913] Hostname: stanley.muccbc.hq.netapp.com [2013-05-23T12:25:36.810380 48913] Working directory: (null) [2013-05-23T12:25:36.810399 48913] Failed; falling back to running the real compiler [2013-05-23T12:25:36.810405 48913] Executing /usr/bin/clang --sysroot=/home/elars/obj/usr/home/elars/src/tmp -B/home/elars/obj/usr/home/elars/src/tmp/usr/bin -E -M -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic -I. 
-I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER -DCLANG_ENABLE_STATIC_ANALYZER -DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-freebsd10.0 -DLLVM_HOSTTRIPLE=x86_64-unknown-freebsd10.0 -DDEFAULT_SYSROOT= /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/CharI nfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp
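To answer the "which copy of clang does ccache run" question without reading the whole log by eye, a small filter like the one below could be used. It is a sketch only: the function name is made up, the log path is an assumption, and it relies on the line layout seen in the log quoted above (timestamp, PID, then "Executing" followed by the compiler path).

```sh
#!/bin/sh
# Hypothetical helper: reads a ccache log on stdin and prints the distinct
# compiler paths from "Executing" lines, assuming this line layout:
#   [<timestamp> <pid>] Executing <compiler> <args...>
ccache_executed_compilers() {
    awk '$3 == "Executing" { print $4 }' | sort -u
}

# Possible use (log path is an assumption):
#   export CCACHE_LOGFILE=/tmp/ccache.log   # then rerun the failing build
#   ccache_executed_compilers < /tmp/ccache.log
```

If this prints /usr/bin/clang rather than the freshly built compiler under the object tree, that would match what the log above shows after the fallback.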
ccache issues during buildworld on recent -CURRENT
Hi, my buildworlds using ccache have recently begun failing with the message below. Buildworld without ccache works fine. Any ideas? CC='/usr/local/libexec/ccache/world/clang --sysroot=/home/elars/obj/usr/home/elars/src/tmp -B/home/elars/obj/usr/home/elars/src/tmp/usr/bin' mkdep -f .depend -a -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/include -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic -I. -I/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -DCLANG_ENABLE_ARCMT -DCLANG_ENABLE_REWRITER -DCLANG_ENABLE_STATIC_ANALYZER -DLLVM_DEFAULT_TARGET_TRIPLE=\x86_64-unknown-freebsd10.0\ -DLLVM_HOSTTRIPLE=\x86_64-unknown-freebsd10.0\ -DDEFAULT_SYSROOT=\\ /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Builtins.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/li b/Basic/CharInfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Diagnostic.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/DiagnosticIDs.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileManager.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/FileSystemStatCache.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/IdentifierTable.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/LangOptions.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Module.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/ObjCRuntime.cpp 
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OpenMPKinds.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/OperatorPrecedence.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TargetInfo.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Targets.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/TokenKinds.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/Version.cpp /usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/VersionTuple.cpp
/usr/home/elars/src/lib/clang/libclangbasic/../../../contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp:1100:10: fatal error: 'emmintrin.h' file not found
#include <emmintrin.h>
         ^
1 error generated.
mkdep: compile failed

Thanks,
Lars
Re: newnfs pkgng database corruption?
Hi,

On Apr 22, 2013, at 2:56, Baptiste Daroussin b...@freebsd.org wrote:
> Has anyone been able to test this patch?

I've been running with it for a few days. I've done a reinstall of all ports plus a few portmaster -a runs without pkgng database corruption.

I've not tested it for very long, but so far, things look good.

Lars
Re: newnfs pkgng database corruption?
Hi,

On Apr 12, 2013, at 1:10, Rick Macklem rmack...@uoguelph.ca wrote:
> Well, I have no idea why an NFS server would reply errno 70 if the file still exists, unless the client has somehow sent a bogus file handle to the server. (I am not aware of any client bug that might do that. I am almost suspicious that there might be a memory problem or something that corrupts bits in the network layer. Do you have TSO enabled for your network interface by any chance? If so, I'd try disabling that on the network interface. Same goes for checksum offload.)
>
> rick
> ps: If you can capture packets between the client and server at the time this error occurs, looking at them in wireshark might be useful?

I will try all of those things.

But first, a question that someone who understands pkgng will be able to answer: Is this fake-pkg process even running on the NFS mount? The WRKDIR is /tmp, which is an mfs mount.

Lars
Re: newnfs pkgng database corruption?
Hi,

On Apr 11, 2013, at 10:30, Baptiste Daroussin b...@freebsd.org wrote:
> First, I think you can recover your database.

that would be great.

> Can you try the following commands:
>
> # mv /var/db/pkg/local.sqlite /var/db/pkg/backup.sqlite
> # echo '.dump' | pkg shell /var/db/pkg/backup.sqlite | pkg shell

That step doesn't quite work:

[root@stanley /usr/home/elars/local/db]# echo '.dump' | pkg shell backup.sqlite | pkg shell
Error: near line 15927: column path is not unique
Error: near line 15928: column path is not unique
Error: near line 15929: column path is not unique
Error: near line 15930: column path is not unique
Error: near line 15931: column path is not unique
Error: near line 15932: column path is not unique
Error: near line 15933: column path is not unique
Error: near line 15934: column path is not unique
Error: near line 15935: column path is not unique
Error: near line 15936: column path is not unique
Error: near line 15937: column path is not unique
[root@stanley /usr/home/elars/local/db]# ll local.sqlite
-rw-r--r-- 1 root wheel 0 Apr 11 10:42 local.sqlite

I can send you the database off-list, if you like.

> I think the corruption you get is due to the synchronous pragma. I need to dig in that direction.

Thanks for looking into this!

Lars
newnfs pkgng database corruption?
Hi, on a diskless server, I keep the ports tree and pkgng databases on a newnfs NFSv4 mount. After a bunch of portmaster -a runs, the pkgng sqlite database appears to get corrupted. For example, when I try to update an existing port, this happens: root@five:~ # portmaster ports-mgmt/pkg ... === Registering installation for pkg-1.0.11 Installing pkg-1.0.11...pkg: sqlite: database disk image is malformed (pkgdb.c:925) pkg: sqlite: database disk image is malformed (pkgdb.c:1914) *** [fake-pkg] Error code 70 I have removed all ports and the pkgng databases and reinstalled, but the corruption seems to return after a few days or weeks of installing and deinstalling ports. On another system that has a disk, that corruption of the pkgng database has not happened over six months or so. I therefore wonder if storing the sqlite database on an NFS-mount is triggering some sort of bug, either in pkgng or in newnfs. AFAIK, pkgng is using locks on the database quite liberally, could that be where a bug is lurking? I'm happy to help debug this, but someone would need to let me know what to try. Thanks, Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: newnfs pkgng database corruption?
Hi, On Apr 10, 2013, at 10:02, Baptiste Daroussin b...@freebsd.org wrote: This can usually happen when a user do not have the nfs lock system started. Are you sure that nfs lock is correctly started? with NFSv4, the locking system is integrated with the main protocol, it's no longer separate. If that is the case, there is anyway a bug in pkgng that should catch the problem and refuse to operate in such situation, I know sqlite to provide a mechanism that allow us to be able to catch this, I'm not sure yet to use it. Not sure about that. In case anyone wonders, the corruption is quite substantial: [elars@stanley ~]$ sqlite3 local/db/local.sqlite SQLite version 3.7.14.1 2012-10-04 19:37:12 Enter .help for instructions Enter SQL statements terminated with a ; sqlite PRAGMA integrity_check; *** in database main *** On tree page 1238 cell 17: 2nd reference to page 1237 On tree page 1238 cell 17: Child page depth differs On tree page 1238 cell 18: Child page depth differs On tree page 1241 cell 6: Rowid 17518 out of order (max larger than parent max of 12550) On tree page 1242 cell 3: Rowid 17566 out of order (max larger than parent max of 12557) On tree page 1243 cell 6: Rowid 12558 out of order (min less than parent min of 17566) On tree page 2867 cell 28: 2nd reference to page 1241 On tree page 2867 cell 28: Child page depth differs On tree page 2867 cell 29: 2nd reference to page 1242 On tree page 2867 cell 30: Child page depth differs On tree page 1417 cell 66: 2nd reference to page 1239 On tree page 1417 cell 66: Child page depth differs On tree page 1417 cell 67: 2nd reference to page 1240 On tree page 1417 cell 68: Child page depth differs rowid 62 missing from index sqlite_autoindex_packages_1 wrong # of entries in index sqlite_autoindex_packages_1 rowid 96 missing from index scripts_package_id rowid 96 missing from index sqlite_autoindex_scripts_1 rowid 97 missing from index scripts_package_id rowid 97 missing from index sqlite_autoindex_scripts_1 rowid 98 
missing from index scripts_package_id rowid 98 missing from index sqlite_autoindex_scripts_1 wrong # of entries in index scripts_package_id wrong # of entries in index sqlite_autoindex_scripts_1 rowid 12509 missing from index sqlite_autoindex_files_1 rowid 12510 missing from index sqlite_autoindex_files_1 rowid 12511 missing from index sqlite_autoindex_files_1 rowid 12512 missing from index sqlite_autoindex_files_1 rowid 86 missing from index files_package_id rowid 86 missing from index sqlite_autoindex_files_1 rowid 87 missing from index files_package_id rowid 87 missing from index sqlite_autoindex_files_1 Error: database disk image is malformed Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
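The integrity check shown above can be scripted so that it runs after each batch of port updates. A minimal sketch using Python's standard sqlite3 module follows; the function names are hypothetical, and on a badly damaged file the query itself may raise sqlite3.DatabaseError rather than return a report.

```python
import sqlite3


def integrity_report(db_path):
    """Return the rows produced by PRAGMA integrity_check as a list of strings."""
    conn = sqlite3.connect(db_path)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


def is_healthy(db_path):
    # SQLite reports a single row containing "ok" when it finds no problems.
    return integrity_report(db_path) == ["ok"]
```

Running is_healthy() against a copy of local.sqlite after every portmaster run would at least narrow down when the corruption first appears.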
Re: newnfs pkgng database corruption?
Hi, On Apr 11, 2013, at 1:28, Rick Macklem rmack...@uoguelph.ca wrote: Error code 70 is ESTALE (or NFSERR_STALE, if you prefer). The server replies with that when the file no longer exists. File locking doesn't stop a file from being removed, as far as I know. but the file is still there. Lars
Re: newnfs pkgng database corruption?
Hi, On Apr 11, 2013, at 0:16, Baptiste Daroussin b...@freebsd.org wrote: Will you be able to test it? yes. (But I will be traveling for the next two weeks and so the turnaround may be a bit longer than normal.) Lars
Re: NewNFS vs. oldNFS for 10.0?
Hi, this reminds me that I ran into an issue lately with the new NFS and locking for NFSv3 mounts on a client that ran -CURRENT and a server that ran -STABLE. When I ran portmaster -a on the client, which mounted /usr/ports and /usr/local, as well as the location of the respective sqlite databases over NFSv3, the client network stack became unresponsive on all interfaces for 30 or so seconds and e.g. SSH connections broke. The serial console remained active throughout, and the system didn't crash. About a minute after the wedgie I could SSH into the box again, too. The issue went away when I killed lockd on the client, but that caused the sqlite database to become corrupted over time. The workaround for me was to move to NFSv4, which has been working fine. (One more reason to make it the default...) I'm not really sure how to debug this further, but would be willing to work with someone off-list who'd tell me what tests to run. Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: serial console not accepting input?
On Mar 4, 2013, at 20:59, Doug Ambrisko ambri...@ambrisko.com wrote: Try to do a {Ctrl}D to see if works. We've seen that the TX on reset hangs but input works fine. I'm not sure if we ran into this with uart(4) but had a problem with sio(4). No change. Lars
Re: Dtrace: Module is no longer loaded
Hi, On Feb 19, 2013, at 16:03, Andriy Gapon a...@freebsd.org wrote: Couple of thoughts: - is your kernel installed in the typical location? yup. - what does the following produce? readelf -a -W /boot/kernel/kernel | fgrep shstrtab readelf -a -W /boot/kernel/kernel | fgrep SUNW_ctf # readelf -a -W /boot/kernel/kernel | fgrep shstrtab [24] .shstrtab STRTAB 7975ee 000124 00 0 0 1 # readelf -a -W /boot/kernel/kernel | fgrep SUNW_ctf [22] .SUNW_ctf PROGBITS 74ed68 048872 00 0 0 4 And then: # dtrace -n 'syscall:::' dtrace: invalid probe specifier syscall /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
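The two readelf checks above only ask whether the kernel image carries the .shstrtab and .SUNW_ctf sections. A small sketch of that check in Python, working on captured readelf output, is shown below. The function names are hypothetical, and, as this message shows, having the CTF section present did not make the dtrace error go away here.

```python
import re

# Matches section-table lines such as "[22] .SUNW_ctf PROGBITS ...".
_SECTION_LINE = re.compile(r"\[\s*\d+\]\s+(\S+)")


def section_names(readelf_output):
    """Extract section names from 'readelf -S -W' style output.

    Note: the unnamed section 0 would be reported by its type ("NULL"),
    which does not matter for the membership test below.
    """
    names = []
    for line in readelf_output.splitlines():
        match = _SECTION_LINE.search(line)
        if match:
            names.append(match.group(1))
    return names


def has_ctf(readelf_output):
    # DTrace needs the CTF type data stored in the .SUNW_ctf section.
    return ".SUNW_ctf" in section_names(readelf_output)
```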
Re: system 20% busy at all times?
Hi, On Feb 19, 2013, at 17:58, Adrian Chadd adr...@freebsd.org wrote: Try top -HS .. to try and break down the kernel threads. ACPI is eating the cycles, according to top: 0 root 80 0K 496K - 2 1:13 27.88% kernel{acpi_task_2} 0 root 80 0K 496K - 0 1:13 25.68% kernel{acpi_task_1} 0 root 80 0K 496K CPU11 1:07 23.68% kernel{acpi_task_0} I got an off-list hint that the machine in question requires device mptable instead of relying on ACPI. I will try that. As for dtrace, a complete buildworld/installworld cycle didn't change things, I still get: # dtrace -n 'syscall:::entry { @num[execname] = count(); }' dtrace: invalid probe specifier syscall:::entry { @num[execname] = count(); }: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded Thanks for all the help! Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
system 20% busy at all times?
Hi, I have a system running -CURRENT that in top(1) is showing ~20% CPU usage for the system at all times. Any ideas what could be causing this, or how I would go about diagnosing this further? Nothing in the logs. Thanks, Lars PS: dmesg attached, in case it helps: Copyright (c) 1992-2013 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 10.0-CURRENT #11 r+2fc9b3d: Tue Feb 12 19:32:15 CET 2013 el...@stanley.muccbc.hq.netapp.com:/home/elars/obj/usr/home/elars/src/sys/FAS3270 amd64 FreeBSD clang version 3.2 (tags/RELEASE_32/final 170710) 20121221 CPU: Intel(R) Xeon(R) CPU E5240 @ 3.00GHz (3000.17-MHz K8-class CPU) Origin = GenuineIntel Id = 0x1067a Family = 0x6 Model = 0x17 Stepping = 10 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE Features2=0xc0ce3bdSSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,OSXSAVE AMD Features=0x20100800SYSCALL,NX,LM AMD Features2=0x1LAHF TSC: P-state invariant, performance statistics real memory = 18253611008 (17408 MB) avail memory = 16526143488 (15760 MB) Event timer LAPIC quality 400 ACPI APIC Table: PTLTD CARNEGIE FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 2 package(s) x 2 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 6 cpu3 (AP): APIC ID: 7 ioapic0 Version 2.0 irqs 0-23 on motherboard kbd0 at kbdmux0 ctl: CAM Target Layer loaded smbios0: System Management BIOS at iomem 0xf6c00-0xf6c1e on motherboard smbios0: Version: 2.5 cryptosoft0: software crypto on motherboard acpi0: PTLTD CARNEGIE on motherboard acpi0: Power Button (fixed) cpu0: ACPI CPU on acpi0 ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393) ACPI Error: Method parse/execution failed 
[\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560) ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._PDC] (Node 0xfe0007630c40), AE_NOT_FOUND (20130117/psparse-560) cpu1: ACPI CPU on acpi0 cpu2: ACPI CPU on acpi0 cpu3: ACPI CPU on acpi0 atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0 Event timer RTC frequency 32768 Hz quality 0 attimer0: AT timer port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter i8254 frequency 1193182 Hz quality 0 Event timer i8254 frequency 1193182 Hz quality 100 Timecounter ACPI-safe frequency 3579545 Hz quality 850 acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0 pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0 pci0: ACPI PCI bus on pcib0 pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0 pci1: ACPI PCI bus on pcib1 pci1: network, ethernet at device 0.0 (no driver attached) pcib2: PCI-PCI bridge at device 3.0 on pci0 pci2: PCI bus on pcib2 pcib3: ACPI PCI-PCI bridge at device 4.0 on pci0 pci3: ACPI PCI bus on pcib3 pcib4: ACPI PCI-PCI bridge mem 0xdeb0-0xdeb1 irq 16 at device 0.0 on pci3 pci4: ACPI PCI bus on pcib4 pcib4: no PRT entry for 4.4.INTA pcib4: no PRT entry for 4.5.INTA pcib4: no PRT entry for 4.8.INTA pcib5: PCI-PCI bridge irq 5 at device 4.0 on pci4 pci5: PCI bus on pcib5 pcib6: PCI-PCI bridge irq 10 at device 5.0 on pci4 pci6: PCI bus on pcib6 pcib4: no PRT entry for 4.5.INTA pcib4: no PRT entry for 4.5.INTB ix0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.0 mem 0xdec0-0xdec7,0xded0-0xded03fff irq 10 at device 0.0 on pci6 ix0: Using MSIX interrupts with 5 vectors ix0: Ethernet address: 90:e2:ba:2b:3b:6c ix0: PCI Express Bus: Speed 5.0Gb/s Width x8 ix1: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.0 mem 
0xdec8-0xdecf,0xded04000-0xded07fff irq 11 at device 0.1 on pci6 ix1: Using MSIX interrupts with 5 vectors ix1: Ethernet address: 90:e2:ba:2b:3b:6d ix1: PCI Express Bus: Speed 5.0Gb/s Width x8 pcib7: ACPI PCI-PCI bridge irq 5 at device 8.0 on pci4 pci7: ACPI PCI bus on pcib7 pcib8: PCI-PCI bridge at device 5.0 on pci0 pci8: PCI bus on pcib8 pcib9: PCI-PCI bridge at device 6.0 on pci0 pci9: PCI bus on pcib9 pcib10: PCI-PCI bridge mem 0xdee0-0xdee1 irq 16 at device 0.0 on pci9 pci10: PCI bus on pcib10 pcib11: PCI-PCI bridge irq 16 at device 0.0 on pci10 pci11: PCI bus on pcib11 pcib12: PCI-PCI bridge mem 0xdef0-0xdef1 irq 16 at device 0.0 on pci11 pci12: PCI bus on pcib12 pcib13: PCI-PCI bridge irq 17 at device 1.0 on pci12 pci13: PCI bus on pcib13 pcib14: PCI-PCI bridge irq 16 at device 4.0 on pci12 pci14: PCI bus on pcib14 pcib15: PCI-PCI bridge irq 17 at device 5.0 on pci12
Re: system 20% busy at all times?
Hi, On Feb 19, 2013, at 10:40, Fleuriot Damien m...@my.gd wrote: What about reviewing top(1) ? top shows the ~20% I mentioned: last pid: 3176; load averages: 0.79, 0.80, 0.84 up 0+14:49:49 09:43:51 17 processes: 1 running, 16 sleeping CPU: 0.0% user, 0.0% nice, 18.7% system, 0.0% interrupt, 81.3% idle Mem: 32M Active, 9456K Inact, 196M Wired, 19M Buf, 15G Free Swap: PID USERNAMETHR PRI NICE SIZERES STATE C TIME WCPU COMMAND 3002 root 1 200 14264K 1664K select 0 0:02 0.00% powerd 2999 root 1 200 25120K 3304K select 3 0:01 0.00% ntpd 3084 root 1 200 81420K 6120K select 0 0:00 0.00% sshd 3094 root 1 200 17180K 3956K pause 1 0:00 0.00% csh 3062 root 1 210 17180K 3900K ttyin 2 0:00 0.00% csh 2867 root 1 200 14296K 2028K select 1 0:00 0.00% syslogd 2959 root 1 520 20500K 7512K rpcsvc 3 0:00 0.00% rpc.lockd 2943 root 1 200 16376K 2064K select 2 0:00 0.00% rpcbind 3061 root 1 200 47504K 2604K wait3 0:00 0.00% login 2945 root 1 200 274M 7448K select 1 0:00 0.00% rpc.statd 3176 root 1 200 19608K 2996K CPU31 0:00 0.00% top 2676 root 1 200 9016K 4652K select 0 0:00 0.00% devd 3014 root 1 200 56152K 4964K select 2 0:00 0.00% sshd 2562 root 1 290 14416K 2224K select 1 0:00 0.00% dhclient 2629 _dhcp 1 200 14416K 2240K select 2 0:00 0.00% dhclient 3065 root 1 200 14528K 1708K select 2 0:00 0.00% netserver 2708 root 1 200 14232K 1568K select 2 0:00 0.00% rtsold or possibly ps(1) aufx # ps -aufx USER PID %CPU %MEMVSZ RSS TT STAT STARTED TIME COMMAND root10 346.8 0.0 0 64 - RL6:54PM 2862:46.43 [idle] root 0 64.1 0.0 0 496 - DLs 6:54PM 694:47.32 [kernel] root 1 0.0 0.0 9344 792 - ILs 6:54PM0:00.09 /sbin/init -- root 2 0.0 0.0 0 16 - DL6:54PM0:00.00 [crypto] root 3 0.0 0.0 0 16 - DL6:54PM0:00.00 [crypto returns] root 4 0.0 0.0 0 16 - DL6:54PM0:00.00 [ctl_thrd] root 5 0.0 0.0 0 16 - DL6:54PM0:00.00 [xpt_thrd] root 6 0.0 0.0 0 16 - DL6:54PM0:00.00 [ipmi0: kcs] root 7 0.0 0.0 0 16 - DL6:54PM0:00.04 [pagedaemon] root 8 0.0 0.0 0 16 - DL6:54PM0:00.00 [pagezero] root 9 0.0 0.0 0 16 - 
DL6:54PM0:00.15 [bufdaemon] root11 0.0 0.0 0 416 - WL6:54PM0:10.47 [intr] root12 0.0 0.0 0 48 - DL6:54PM0:00.03 [geom] root13 0.0 0.0 0 16 - DL6:54PM0:01.87 [yarrow] root14 0.0 0.0 0 256 - DL6:54PM0:00.50 [usb] root15 0.0 0.0 0 16 - DL6:54PM0:00.17 [vnlru] root16 0.0 0.0 0 16 - DL6:54PM0:00.56 [syncer] root17 0.0 0.0 0 16 - DL6:54PM0:00.21 [softdepflush] root42 0.0 0.0 0 16 - DL6:54PM0:00.03 [md0] root53 0.0 0.0 0 16 - DL6:54PM0:00.00 [md1] root 120 0.0 0.0 0 16 - DL6:54PM0:00.00 [md2] root 125 0.0 0.0 0 16 - DL6:54PM0:00.00 [md3] root 2562 0.0 0.0 14416 2224 - Is6:54PM0:00.00 dhclient: em4 [priv] (dhclient) _dhcp 2629 0.0 0.0 14416 2240 - Is6:54PM0:00.00 dhclient: em4 (dhclient) root 2676 0.0 0.0 9016 4652 - Is6:54PM0:00.01 /sbin/devd root 2708 0.0 0.0 14232 1568 - Is6:54PM0:00.00 /usr/sbin/rtsold -a root 2867 0.0 0.0 14296 2028 - Ss6:54PM0:00.04 /usr/sbin/syslogd -s root 2943 0.0 0.0 16376 2064 - Ss6:54PM0:00.03 /usr/sbin/rpcbind root 2945 0.0 0.0 280472 7448 - Ss6:54PM0:00.02 /usr/sbin/rpc.statd root 2959 0.0 0.0 20500 7512 - Ss6:54PM0:00.04 /usr/sbin/rpc.lockd root 2999 0.0 0.0 25120 3304 - Ss6:54PM0:00.85 /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift root 3002 0.0 0.0 14264 1664 - Ss6:54PM0:01.61 /usr/sbin/powerd root 3014 0.0 0.0 56152 4964 - Is6:54PM0:00.00 /usr/sbin/sshd -o PermitRootLogin=without-password root 3065 0.0 0.0 14528 1708 - Is6:54PM0:00.00 netserver root 3084 0.0 0.0 81420 6120 - Ss9:21AM0:00.09 sshd: root@pts/0 (sshd) root 3061 0.0 0.0 47504 2604 u0 Is6:54PM0:00.02 login [pam] (login) root 3062 0.0 0.0 17180 3900 u0 I+6:54PM0:00.05 -csh (csh) root 3094 0.0 0.0 17180 3956 0 Ss9:32AM0:00.05 -csh (csh) root 3177 0.0 0.0 16436 1900 0 R+9:44AM0:00.00 ps -aufx At least you
Re: system 20% busy at all times?
Hi, On Feb 19, 2013, at 10:54, Fleuriot Damien m...@my.gd wrote: And indeed we find your answer here, acpi0 firing up a lot of interrupts. Don't you get any message about that in dmesg -a or /var/log/messages ? I'd expect something like interrupt storm blabla… source throttled blabla.. nope. The only odd ACPI-related messages I see in dmesg are these: ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560) ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._PDC] (Node 0xfe0007630c40), AE_NOT_FOUND (20130117/psparse-560) Nothing in syslog. From man 4 acpi , in /boot/loader.conf : hint.acpi.0.disabled=1 Set this to 1 to disable all of ACPI. If ACPI has been disabled on your system due to a blacklist entry for your BIOS, you can set this to 0 to re-enable ACPI for testing. Any chance you could reboot the host with ACPI disabled ? If I do that, I get an early kernel crash: Loading 10.11.12.13/~elars/kernel/kernel:0x20/7634255 0xb47d50/473552 0xbbb720/890736 Entry at 0x802746f0 Closing network. 
Starting program at 0x802746f0 GDB: no debug ports present KDB: debugger backends: ddb KDB: current backend: ddb panic: running without device atpic requires a local APIC cpuid = 0 KDB: stack backtrace: kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x0 fault code = supervisor read data, page not present instruction pointer = 0x20:0x805c2973 stack pointer = 0x28:0x80c9a960 frame pointer = 0x28:0x80c9aa80 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= resume, IOPL = 0 current process = 0 () [ thread pid 0 tid 0 ] Stopped at 0x805c2973: movzbl (%rdi),%ecx If that helps your CPU load, try setting this in /boot/loader.conf : hw.acpi.verbose=1 Turn on verbose debugging information about what ACPI is doing. Done, but it doesn't really result in any additional messages: # dmesg | grep -i acpi Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE ACPI APIC Table: PTLTD CARNEGIE acpi0: PTLTD CARNEGIE on motherboard acpi0: Power Button (fixed) cpu0: ACPI CPU on acpi0 ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560) ACPI Error: [\134_SB_.PCI0.LPC0.BCMD] Namespace lookup failure, AE_NOT_FOUND (20130117/psargs-393) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._OSC] (Node 0xfe0007630c00), AE_NOT_FOUND (20130117/psparse-560) ACPI Error: Method parse/execution failed [\134_PR_.CPU0._PDC] (Node 0xfe0007630c40), AE_NOT_FOUND (20130117/psparse-560) cpu1: ACPI CPU on acpi0 cpu2: ACPI CPU on acpi0 cpu3: ACPI CPU on acpi0 atrtc0: AT realtime clock port 0x70-0x71 irq 8 on acpi0 attimer0: AT timer port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter ACPI-fast frequency 3579545 Hz 
quality 900 acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0 pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0 pci0: ACPI PCI bus on pcib0 pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0 pci1: ACPI PCI bus on pcib1 pcib3: ACPI PCI-PCI bridge at device 4.0 on pci0 pci3: ACPI PCI bus on pcib3 pcib4: ACPI PCI-PCI bridge mem 0xdeb0-0xdeb1 irq 16 at device 0.0 on pci3 pci4: ACPI PCI bus on pcib4 pcib7: ACPI PCI-PCI bridge irq 5 at device 8.0 on pci4 pci7: ACPI PCI bus on pcib7 pcib29: ACPI PCI-PCI bridge irq 16 at device 28.0 on pci0 pci29: ACPI PCI bus on pcib29 pcib30: ACPI PCI-PCI bridge irq 16 at device 28.4 on pci0 pci30: ACPI PCI bus on pcib30 pcib31: ACPI PCI-PCI bridge irq 17 at device 28.5 on pci0 pci31: ACPI PCI bus on pcib31 pcib32: ACPI PCI-PCI bridge at device 30.0 on pci0 pci32: ACPI PCI bus on pcib32 acpi_button0: Power Button on acpi0 uart0: 16550 or compatible port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart1: 16550 or compatible port 0x2f8-0x2ff irq 3 on acpi0 Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: system 20% busy at all times?
Hi, On Feb 19, 2013, at 11:21, Fleuriot Damien m...@my.gd wrote: What about a newly build kernel without the line device acpi and without the options ACPI_DEBUG ? Hoping that this kernel: 1/ won't crash on boot 2/ will make the 20% cpu load and high interrupt rates disappear I added device atpic to my kernel config and rebooted with hint.acpi.0.disabled=1 in the loader. I get further during boot, but then get a panic: No usable event timer found! Also, my is devices showed errors trying to allocate bus resources. Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: system 20% busy at all times?
Hi, thanks for looking into this! On Feb 19, 2013, at 12:14, Andriy Gapon a...@freebsd.org wrote: Please try to run the following DTrace script (dtrace -s script-file) and capture its output. I get this error: # dtrace -s x dtrace: failed to compile script x: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded (New to dtrace, so no clue what this means.) Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: system 20% busy at all times?
Hi, On Feb 19, 2013, at 13:15, Lars Engels lars.eng...@0x20.net wrote: You need to recompile your Kernel to use DTrace: https://wiki.freebsd.org/DTrace I did. But I still get that error, even with the sample from the wiki: # dtrace -n 'syscall:::entry { @num[execname] = count(); }' dtrace: invalid probe specifier syscall:::entry { @num[execname] = count(); }: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded I cross-compile the -CURRENT world and kernel under -STABLE for netbooting. Could doing that cause this issue? Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: system 20% busy at all times?
Hi, On Feb 19, 2013, at 13:37, Eggert, Lars l...@netapp.com wrote: On Feb 19, 2013, at 13:15, Lars Engels lars.eng...@0x20.net wrote: You need to recompile your Kernel to use DTrace: https://wiki.freebsd.org/DTrace I did. But I still get that error, even with the sample from the wiki: # dtrace -n 'syscall:::entry { @num[execname] = count(); }' dtrace: invalid probe specifier syscall:::entry { @num[execname] = count(); }: /usr/lib/dtrace/psinfo.d, line 90: failed to resolve type kernel`struct thread * for identifier curthread: Module is no longer loaded I cross-compile the -CURRENT world and kernel under -STABLE for netbooting. Could doing that cause this issue? FWIW, a full buildworld/installworld of the latest -CURRENT also didn't help, the error remains. Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Dtrace: Module is no longer loaded
Hi, did you ever figure this out? I'm seeing the same thing. Lars
Re: mounting root from NFS via ROOTDEVNAME
Hi, On Jan 30, 2013, at 22:43, Craig Rodrigues rodr...@crodrigues.org wrote: What you need to do is, before the FreeBSD kernel boots, your loader needs to export some environment variables. This will trigger the various behaviors in the FreeBSD mount code. the loader can export some environment variables (this is how I get the serial console working.) So as I suggested before, you should continue with: (1) Have /usr/home/elars/dst/etc/fstab with: # Options Dump Pass 10.11.12.13:/usr/home/elars/dst/ / nfs ro 0 0 Done. (2) From your loader, you need to export this environment variable, so that the kernel can get it with getenv(). You need at least: vfs.root.mountfrom=nfs:10.11.12.13:/usr/home/elars/dst Done. Now, there are some other environment variables you need to export from the loader. boot.netif.ip boot.netif.netmask boot.netif.gateway boot.nfsroot.server boot.nfsroot.path Done. I also ripped out all the BOOTP* options from the kernel. However, this still fails: Trying to mount root from nfs:10.11.12.13:/usr/home/elars/dst []... mountroot: waiting for device 10.11.12.13:/usr/home/elars/dst ... Mounting from nfs:10.11.12.13:/usr/home/elars/dst failed with error 19. Loader variables: vfs.root.mountfrom=nfs:10.11.12.13:/usr/home/elars/dst Manual root filesystem specification: <fstype>:<device> [options] Mount <device> using filesystem <fstype> and with the specified (optional) option list. eg. ufs:/dev/da0s1a zfs:tank cd9660:/dev/acd0 ro (which is equivalent to: mount -t cd9660 -o ro /dev/acd0 /) ? List valid disk boot devices . Yield 1 second (for background tasks) <empty line> Abort manual input mountroot> I did a tcpdump and no traffic shows up on the correct interface (em4). I guess I need to set yet another loader environment variable to indicate which interface I'd like to use. Looking at the source, I only see boot.netif.name, but setting that to em4 doesn't help either. Any further ideas? 
Thanks, Lars
Re: mounting root from NFS via ROOTDEVNAME
On Jan 31, 2013, at 12:45, Andreas Nilsson andrn...@gmail.com wrote: Just a shot in the dark, did you actually tell it to do the root mount ro, or try with the nfs share as rw? ro Lars
Re: mounting root from NFS via ROOTDEVNAME
On Jan 31, 2013, at 12:53, Andre Oppermann an...@freebsd.org wrote: The interface doesn't have a name during loader stage. The kernel finds the interface to use based on the MAC address. You should set boot.netif.hwaddr as well in the kernel environment. Done, no change. Here is what's in my loader environment: boot.netif.netmask 255.255.255.0 boot.netif.gateway 10.11.12.13 boot.nfsroot.server 10.11.12.13 boot.nfsroot.path /usr/home/elars/dst boot.netif.ip 10.11.12.15 boot.netif.name em4 boot.netif.hwaddr xx:xx:xx:xx:xx:xx vfs.root.mountfrom nfs:10.11.12.13:/usr/home/elars/dst And here is what I see during boot: Trying to mount root from nfs:10.11.12.13:/usr/home/elars/dst []... mountroot: waiting for device 10.11.12.13:/usr/home/elars/dst ... Mounting from nfs:10.11.12.13:/usr/home/elars/dst failed with error 19. Lars
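To keep the variable set above consistent across test boots, it can be generated from a handful of inputs. The Python sketch below is an illustration only: the function names are hypothetical, the name="value" rendering assumes loader.conf-style syntax (the custom loader in this thread may want something else), and, as the boot log shows, this exact set of variables still ended in error 19 (ENODEV), so it is not a fix.

```python
def nfs_root_env(server, path, client_ip, netmask, gateway,
                 ifname=None, hwaddr=None):
    """Build the loader environment discussed in this thread as a dict."""
    env = {
        "boot.netif.ip": client_ip,
        "boot.netif.netmask": netmask,
        "boot.netif.gateway": gateway,
        "boot.nfsroot.server": server,
        "boot.nfsroot.path": path,
        # Same "nfs:<server>:<path>" form as used in the messages above.
        "vfs.root.mountfrom": f"nfs:{server}:{path}",
    }
    # Optional interface selectors mentioned in the replies.
    if ifname:
        env["boot.netif.name"] = ifname
    if hwaddr:
        env["boot.netif.hwaddr"] = hwaddr
    return env


def render(env):
    # Assumption: one name="value" pair per line, as in loader.conf.
    return "\n".join(f'{name}="{value}"' for name, value in env.items())
```

For example, render(nfs_root_env("10.11.12.13", "/usr/home/elars/dst", "10.11.12.15", "255.255.255.0", "10.11.12.13", ifname="em4")) produces the same values as listed above, minus the hardware address.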
Re: mounting root from NFS via ROOTDEVNAME
On Jan 31, 2013, at 15:54, Daniel Braniss da...@cs.huji.ac.il wrote: a shot in the dark, but is /usr/home/elars/dst properly exported? Yep, the NFS mount works fine when I use BOOTP with a root-path option Lars
Re: mounting root from NFS via ROOTDEVNAME
Hi, On Jan 30, 2013, at 10:32, Eggert, Lars l...@netapp.com wrote: On Jan 29, 2013, at 20:22, Craig Rodrigues rodr...@crodrigues.org wrote: In src/sys/boot/common/boot.c which is part of the loader (not the kernel), if you look in the getrootmount() function, you will see that the loader will try to figure out where the root file system is by parsing /etc/fstab, and looking for the / mount. So, if your kernel is located in: /usr/home/elars/dst/boot/kernel/kernel Then create a file /usr/home/elars/dst/etc/fstab file with something like: # Device Mountpoint FSType Options Dump Pass 10.11.12.13:/usr/home/elars/dst/ / nfs ro 0 0 Thanks, will try that! doesn't work. The kernel never leaves the DHCP/BOOTP timeout for server-loop unless I hand out a root-path option via DHCP. I tried your tip above, I tried setting ROOTDEVNAME in the kernel, I created a /boot.config with -r in it on the NFS root - all to no avail. Alternatively, if you don't want to create an /etc/fstab file, then you could put something like this in your loader.conf file: vfs.root.mountfrom=nfs:10.11.12.13:/usr/home/elars/dst Will try that too, but not sure if this works with our custom loader. Doesn't seem to work either. Lars
Re: mounting root from NFS via ROOTDEVNAME
Hi, On Jan 29, 2013, at 9:34, Craig Rodrigues rodr...@crodrigues.org wrote: I recommend that you do not use ROOTDEVNAME, and instead you should follow the instructions which I wrote and contributed to the FreeBSD handbook: PXE Booting with an NFS Root File System http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-pxe-nfs.html The content of this document is the same as the text file which Rick Macklem pointed out (I wrote that too). I had read both before, and they're very useful documents. Unfortunately, they don't fully apply to my case, since I'm not PXE-booting the system; it netboots the kernel from a custom loader. So once the kernel bootstraps, I need it to obtain an IP address and then NFS-mount root. BTW, if you ever visit the Netapp campus in Sunnyvale, California, feel free to say hello, because I work around the corner from there. :) -- Craig Rodrigues rodr...@crodrigues.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: mounting root from NFS via ROOTDEVNAME
On Jan 29, 2013, at 10:13, Lars Eggert l...@netapp.com wrote:
> On Jan 29, 2013, at 9:34, Craig Rodrigues rodr...@crodrigues.org wrote:
>> I recommend that you do not use ROOTDEVNAME, and instead you should
>> follow the instructions which I wrote and contributed to the FreeBSD
>> handbook:
>>
>> PXE Booting with an NFS Root File System
>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-pxe-nfs.html
>>
>> The content of this document is the same as the text file which Rick
>> Macklem pointed out (I wrote that too).
>
> I had read both before, and they're very useful documents.
> Unfortunately, they don't fully apply to my case, since I'm not
> PXE-booting the system; it netboots the kernel from a custom loader. So
> once the kernel bootstraps, I need it to obtain an IP address and then
> NFS-mount root.

(Whoops, hit send by mistake.)

That's what I was trying to achieve with the BOOTP and BOOTP_WIRED_TO options.

Hm, I wonder if I could simply use the custom loader to netboot tftpboot, and then follow your instructions... Will try.

>> BTW, if you ever visit the Netapp campus in Sunnyvale, California, feel
>> free to say hello, because I work around the corner from there. :)

Am there about once a month, will do :-)

Lars
mounting root from NFS via ROOTDEVNAME
Hi,

I'm trying to netboot a system where the root device is specified in the kernel via ROOTDEVNAME:

    options BOOTP
    options BOOTP_NFSROOT
    options BOOTP_NFSV3
    options BOOTP_COMPAT
    options BOOTP_WIRED_TO=em4
    options ROOTDEVNAME=\"nfs:10.11.12.13:/usr/home/elars/dst\"

I was under the assumption that specifying a ROOTDEVNAME in the kernel config would override the root-path option in DHCP, or at least take effect when root-path wasn't provided via DHCP, but that doesn't seem to be the case.

The system configures its address correctly over em4, but then enters a loop:

    em4: link state changed to UP
    Received DHCP Offer packet on em4 from 0.0.0.0 (accepted) (no root path)
    Sending DHCP Request packet from interface em4 (XX:XX:XX:XX:XX:XX)
    Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
    Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
    DHCP/BOOTP timeout for server 255.255.255.255
    Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
    DHCP/BOOTP timeout for server 255.255.255.255
    ...

If I hand out a root path via DHCP, the system boots fine, but the idea here is to be able to boot different root devices without needing to diddle dhcpd.conf.

Can this be done?

Thanks,
Lars
Re: mounting root from NFS via ROOTDEVNAME
Hi,

On Jan 28, 2013, at 16:23, Ian Lepore i...@freebsd.org wrote:
> Remove the BOOTP_NFSROOT option, it tells the bootp/dhcp code to keep
> querying the server until a root path is delivered. Without it, the
> ROOTDEVNAME option should get used (and I think even override a path
> from the server, if it delivers one).

no luck:

    em4: link state changed to UP
    Received DHCP Offer packet on em4 from 0.0.0.0 (accepted) (no root path)
    Sending DHCP Request packet from interface em4 (XX:XX:XX:XX:XX:XX)
    Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
    DHCP/BOOTP timeout for server 255.255.255.255
    Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
    DHCP/BOOTP timeout for server 255.255.255.255
    ...

The only visible difference is that the first "Received DHCP Ack packet" line is now printed only once, instead of twice as in the previous log.

Lars
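The change Ian describes amounts to dropping a single line from the kernel configuration. A minimal sketch follows, assuming the options block from the original post; KERNCONF is a hypothetical scratch file, not a real kernel config path. As reported above, this change alone did not end the DHCP loop in this setup.

```shell
#!/bin/sh
# Sketch only. KERNCONF is a hypothetical scratch file holding the options
# block from the original post; it is not a real kernel config path.
KERNCONF="$(mktemp)"
cat > "$KERNCONF" <<'EOF'
options BOOTP
options BOOTP_NFSROOT
options BOOTP_NFSV3
options BOOTP_COMPAT
options BOOTP_WIRED_TO=em4
options ROOTDEVNAME=\"nfs:10.11.12.13:/usr/home/elars/dst\"
EOF

# Drop only the BOOTP_NFSROOT line; every other option stays as posted.
grep -v '^options[[:space:]]*BOOTP_NFSROOT$' "$KERNCONF" > "$KERNCONF.new"

# Print the reduced option set for review before rebuilding the kernel.
cat "$KERNCONF.new"
```

Writing to a separate .new file keeps the original block intact, so the two variants can be diffed or swapped back easily.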
Re: serial console not accepting input?
Hi,

On Jan 23, 2013, at 17:04, Dimitry Andric d...@freebsd.org wrote:
> CTS/RTS hardware flow control, maybe? E.g. add :hw to the default
> settings in /etc/gettytab, or make a specific entry with an added :hw
> setting.

Nope, I don't even get a login prompt if I do that.

> If it is a physical serial console, you could also simply have a bad
> cable. Try swapping it with one from a working system. :)

I spent the last few hours fiddling with the cabling and the various BIOS serial redirection options (it's a Dell 2950). My best guess is that the serial port on the box is physically broken.

Thanks for the help!

Lars
serial console not accepting input?
Hi,

I'm embarrassed to ask this newbie question, but I'm at my wit's end: I've configured a serial console according to the handbook. I see the boot messages and get the login prompt. But at no point during the boot process does the console seem to accept any input, including at the boot prompt.

The same serial setup works fine with other boxes. Any ideas?

Thanks,
Lars
genmodes dumping core during buildworld
Hi,

Has anyone seen something similar before?

    ===> usr.sbin/zic/zdump (depend)
    rm -f .depend
    mkdep -f .depend -a -DTM_GMTOFF=tm_gmtoff -DTM_ZONE=tm_zone -DSTD_INSPIRED -DPCTS -DHAVE_LONG_DOUBLE -DTZDIR=\"/usr/share/zoneinfo\" -Demkdir=mkdir -I/usr/src/usr.sbin/zic/zdump/.. -I/usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/stdtime -std=gnu99 /usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/zic/zdump.c /usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/zic/ialloc.c /usr/src/usr.sbin/zic/zdump/../../../contrib/tzcode/zic/scheck.c
    echo zdump: /usr/obj/usr/src/tmp/usr/lib/libc.a >> .depend
    ===> usr.sbin/zzz (depend)
    1 error
    *** [_depend] Error code 2
    1 error
    *** [buildworld] Error code 2
    1 error

This is in the log:

    Nov 15 15:56:49 server kernel: pid 83891 (genmodes), uid 0: exited on signal 10 (core dumped)
    Nov 15 15:56:49 server kernel: pid 83893 (genmodes), uid 0: exited on signal 10 (core dumped)
    Nov 15 15:56:49 server kernel: pid 83897 (genmodes), uid 0: exited on signal 10 (core dumped)

genmodes seems to be a component of gcc?

Lars