Re: Despite the documentation, "etcupdate extract" handles -D destdir (and its contribution to the default workdir)
On 4/24/21 12:22 PM, Mark Millard via freebsd-current wrote:
> # etcupdate -?
> Illegal option -?
> usage: etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball]
>                  [-A patterns] [-D destdir] [-I patterns] [-L logfile]
>                  [-M options]
>        etcupdate build [-B] [-d workdir] [-s source] [-L logfile]
>                  [-M options]
>        etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile]
>        etcupdate extract [-B] [-d workdir] [-s source | -t tarball]
>                  [-L logfile] [-M options]
>        etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile]
>        etcupdate status [-d workdir] [-D destdir]
>
> The "etcupdate extract" material does not show -D destdir as valid.

Thanks, it was a documentation oversight that I have just fixed. It is definitely supposed to work and is quite useful for cross-builds (e.g. I use it frequently to update rootfs images for RISC-V or MIPS that I run under qemu, or when updating the SD card for my RPi that I cross-build on an x86 host).

--
John Baldwin
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
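For readers following along, here is a minimal sketch of how -D destdir feeds into the default work directory mentioned in the subject line. It assumes the default documented in etcupdate(8), namely <destdir>/var/db/etcupdate. The helper name default_workdir, the image mount point and the source path are hypothetical, and the etcupdate command is only printed, not executed.

```shell
#!/bin/sh
# Sketch: compute the work directory etcupdate would default to for a
# given destdir. The /var/db/etcupdate suffix is taken from etcupdate(8);
# default_workdir is a hypothetical helper, not part of etcupdate.
default_workdir() {
    destdir=${1%/}            # strip one trailing slash, if any
    printf '%s/var/db/etcupdate\n' "$destdir"
}

# Hypothetical image mount point and source tree, for illustration only.
DESTDIR=/mnt/riscv-rootfs
SRCDIR=/usr/src

echo "workdir: $(default_workdir "$DESTDIR")"
# Printed rather than executed so the sketch is safe to run anywhere.
echo "would run: etcupdate extract -D $DESTDIR -s $SRCDIR"
```

Passing an explicit -d workdir would override this default, so the helper only matters when -d is omitted.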
Re: Suspected mbuf leak with Nginx + sendfile + TLS in 12.2-STABLE
On 2/4/21 8:08 AM, GomoR wrote:
> Dear FreeBSD community,
>
> we are encountering a DoS condition on our production machines. Our use case is an Nginx reverse proxy serving large files via HTTPS. This problem arose when switching kernel and userland from 12.1-RELEASE to 12.2-RELEASE. Ports were not upgraded (at first).
>
> Each time a user downloads a file, mbuf & mbuf_clusters rise to reach the maximum limit in a matter of seconds. Those values are asserted by 'netstat -m' as follows:
>
> Normal situation:
> mbuf: 256, 26031105, 16767,5974,428087938, 0, 0
> mbuf_cluster: 2048, 8135232, 18408,2704,101644203, 0, 0
>
> Warning situation:
> mbuf: 256, 26031105, 2981516, 151205,1109483561, 0, 0
> mbuf_cluster: 2048, 8135232, 2983155,4201,319714617, 0, 0
>
> We have seen a patch related to sendfile + KTLS + mbuf at the below link and we updated to -STABLE to apply:

None of the sendfile or KTLS changes from Netflix are in 12, they are only in 13 and later.

> Don't transmit mbufs that aren't yet ready on TOE sockets.
> This includes mbufs waiting for data from sendfile() I/O requests, or mbufs awaiting encryption for KTLS.
> https://github.com/freebsd/freebsd-src/commit/14c77f30b201bf76119d59678e72051c09c2

This patch only applies to Chelsio T5/T6 NICs when using TOE (TCP offload) and doesn't affect freeing mbufs. It just fixes a race in which the NIC could potentially send random garbage if it sends the mbuf before the scheduled disk I/O to populate it with data from disk has completed.

> NIC is: ix0:
>
> What can we do to help you find the root cause?

The first step I would take, if possible, is to bisect between the last known working version and the version that is known to be broken to determine which commit introduced the problem. One thing that could help here is to see if you can reproduce the problem using a 12.2 kernel on a 12.1 world + ports. If you can, then you can limit your bisecting to just building new kernels, which will make that process quicker.
You might also see if using a different NIC shows the same problem. If it does not, that might point to a regression in the NIC driver (or perhaps in iflib, which I believe ix uses).

--
John Baldwin
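To put the reported figures in proportion, here is a minimal sketch that turns each pasted line into a used/limit percentage. It assumes the comma-separated columns follow the `vmstat -z` layout (ITEM: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP); that reading of the report is an assumption, and zone_usage is a hypothetical helper name.

```shell
#!/bin/sh
# zone_usage: print "<zone> <used>/<limit> <percent>%" for each input line.
# Assumes the layout ITEM: SIZE, LIMIT, USED, FREE, REQ, ... (an assumption
# about the pasted figures, not something confirmed in the thread).
zone_usage() {
    awk -F'[:,]' '
        NF >= 4 && $3 + 0 > 0 {
            name = $1
            gsub(/ /, "", name)               # zone name without padding
            limit = $3 + 0                    # third field: LIMIT
            used = $4 + 0                     # fourth field: USED
            printf "%s %d/%d %.1f%%\n", name, used, limit, 100 * used / limit
        }'
}

# "Warning situation" figures copied from the report above.
zone_usage <<'EOF'
mbuf: 256, 26031105, 2981516, 151205,1109483561, 0, 0
mbuf_cluster: 2048, 8135232, 2983155,4201,319714617, 0, 0
EOF
```

Feeding it periodic samples while a download is in progress would show how quickly the USED column approaches LIMIT, which may help when comparing kernels during a bisect.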
Re: svn commit: r351246 - in stable: 11/sys/opencrypto 12/sys/opencrypto
On 8/26/19 5:25 PM, John Baldwin wrote:
> On 8/26/19 1:59 PM, mike tancsa wrote:
>> On 8/22/2019 6:51 PM, John Baldwin wrote:
>>> On 8/21/19 5:47 PM, Mike Tancsa wrote:
>>>> On 8/21/2019 6:38 PM, John Baldwin wrote:
>>>>> On 8/21/19 9:08 AM, mike tancsa wrote:
>>>>>> On 8/21/2019 12:00 PM, John Baldwin wrote:
>>>>>>> dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
>>>>>> Thanks, I am not familiar with dtrace at all. This command gives a syntax error
>>>>>>
>>>>>> 0(cage)# dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
>>>>>> dtrace: invalid probe specifier fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count(): syntax error near end of input
>>>>>> 1(cage)#
>>>>> Oops, I forgot the closing }. First, do "dtrace -l | grep _gone_in" to make sure dtrace is loaded. You should see something like this:
>>>>>
>>>>> # dtrace -l | grep _gone_in
>>>>> 87003 fbt kernel _gone_in entry
>>>>> 87004 fbt kernel _gone_in return
>>>>> 98682 fbt kernel _gone_in_dev entry
>>>>> 98683 fbt kernel _gone_in_dev return
>>>>>
>>>>> Then this should work:
>>>>>
>>>>> # dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count() }'
>>>>> dtrace: description 'fbt::_gone_in:entry ' matched 1 probe
>>>>>
>>>> Thanks!
>>>>
>>>> # dtrace -l | grep _gone_in
>>>> 15632 fbt kernel _gone_in entry
>>>> 22693 fbt kernel _gone_in_dev entry
>>>>
>>>> # dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count() }'
>>>> dtrace: description 'fbt::_gone_in:entry ' matched 1 probe
>>>>
>>>> However, it doesn't show anything after that even as I get the deprecation messages in dmesg
>>> Can you hit Ctrl-C after seeing some of the messages? This trace won't show any results until you exit dtrace.
>>
>> Hi,
>>
>> I am still having problems tracking it down via dtrace, but I am able to create the problem on demand on sshd. What's odd is that if I restrict the list of ciphers in sshd and even specify something like aes-128 on the client, I still get warnings on the server.
>>
>> e.g from a client,
>>
>> % ssh -c aes128-cbc console1 uptime
>> 4:53PM up 1:02, 3 users, load averages: 0.04, 0.08, 0.08
>>
>> The server shows
>
> Ok, I was able to reproduce this on an 11.x VM. It appears to only be something that the crypto engine in OpenSSL 1.0.x does (1.1.1 used in 12.0 and later has a rewritten /dev/crypto engine).
>
> I'll see if I can find a way to tone down the warning. Maybe if sshd is only creating sessions and not using them I can restrict it to warning the first time a session tries to perform an operation using a deprecated algorithm. (There are separate ioctls for creating sessions vs doing actual crypto ops and the warning is in the session creation currently.)

I've committed a fix to head and will MFC it in a few days. Thanks for tracking this down!

--
John Baldwin
Re: svn commit: r351246 - in stable: 11/sys/opencrypto 12/sys/opencrypto
On 8/26/19 1:59 PM, mike tancsa wrote:
> On 8/22/2019 6:51 PM, John Baldwin wrote:
>> On 8/21/19 5:47 PM, Mike Tancsa wrote:
>>> On 8/21/2019 6:38 PM, John Baldwin wrote:
>>>> On 8/21/19 9:08 AM, mike tancsa wrote:
>>>>> On 8/21/2019 12:00 PM, John Baldwin wrote:
>>>>>> dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
>>>>> Thanks, I am not familiar with dtrace at all. This command gives a syntax error
>>>>>
>>>>> 0(cage)# dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
>>>>> dtrace: invalid probe specifier fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count(): syntax error near end of input
>>>>> 1(cage)#
>>>> Oops, I forgot the closing }. First, do "dtrace -l | grep _gone_in" to make sure dtrace is loaded. You should see something like this:
>>>>
>>>> # dtrace -l | grep _gone_in
>>>> 87003 fbt kernel _gone_in entry
>>>> 87004 fbt kernel _gone_in return
>>>> 98682 fbt kernel _gone_in_dev entry
>>>> 98683 fbt kernel _gone_in_dev return
>>>>
>>>> Then this should work:
>>>>
>>>> # dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count() }'
>>>> dtrace: description 'fbt::_gone_in:entry ' matched 1 probe
>>>>
>>> Thanks!
>>>
>>> # dtrace -l | grep _gone_in
>>> 15632 fbt kernel _gone_in entry
>>> 22693 fbt kernel _gone_in_dev entry
>>>
>>> # dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count() }'
>>> dtrace: description 'fbt::_gone_in:entry ' matched 1 probe
>>>
>>> However, it doesn't show anything after that even as I get the deprecation messages in dmesg
>> Can you hit Ctrl-C after seeing some of the messages? This trace won't show any results until you exit dtrace.
>
> Hi,
>
> I am still having problems tracking it down via dtrace, but I am able to create the problem on demand on sshd. What's odd is that if I restrict the list of ciphers in sshd and even specify something like aes-128 on the client, I still get warnings on the server.
>
> e.g from a client,
>
> % ssh -c aes128-cbc console1 uptime
> 4:53PM up 1:02, 3 users, load averages: 0.04, 0.08, 0.08
>
> The server shows

Ok, I was able to reproduce this on an 11.x VM. It appears to only be something that the crypto engine in OpenSSL 1.0.x does (1.1.1 used in 12.0 and later has a rewritten /dev/crypto engine).

I'll see if I can find a way to tone down the warning. Maybe if sshd is only creating sessions and not using them I can restrict it to warning the first time a session tries to perform an operation using a deprecated algorithm. (There are separate ioctls for creating sessions vs doing actual crypto ops and the warning is in the session creation currently.)

> kern.cryptodev_warn_interval=0

I'll try to get this tracked down this week, but this should be a suitable workaround for now.

--
John Baldwin
Re: svn commit: r351246 - in stable: 11/sys/opencrypto 12/sys/opencrypto
On 8/21/19 5:47 PM, Mike Tancsa wrote:
> On 8/21/2019 6:38 PM, John Baldwin wrote:
>> On 8/21/19 9:08 AM, mike tancsa wrote:
>>> On 8/21/2019 12:00 PM, John Baldwin wrote:
>>>> dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
>>> Thanks, I am not familiar with dtrace at all. This command gives a syntax error
>>>
>>> 0(cage)# dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
>>> dtrace: invalid probe specifier fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count(): syntax error near end of input
>>> 1(cage)#
>> Oops, I forgot the closing }. First, do "dtrace -l | grep _gone_in" to make sure dtrace is loaded. You should see something like this:
>>
>> # dtrace -l | grep _gone_in
>> 87003 fbt kernel _gone_in entry
>> 87004 fbt kernel _gone_in return
>> 98682 fbt kernel _gone_in_dev entry
>> 98683 fbt kernel _gone_in_dev return
>>
>> Then this should work:
>>
>> # dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count() }'
>> dtrace: description 'fbt::_gone_in:entry ' matched 1 probe
>>
> Thanks!
>
> # dtrace -l | grep _gone_in
> 15632 fbt kernel _gone_in entry
> 22693 fbt kernel _gone_in_dev entry
>
> # dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count() }'
> dtrace: description 'fbt::_gone_in:entry ' matched 1 probe
>
> However, It doesnt show anything after that even as I get the deprecation messages in dmesg

Can you hit Ctrl-C after seeing some of the messages? This trace won't show any results until you exit dtrace.

--
John Baldwin
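The Ctrl-C requirement comes from the fact that DTrace prints aggregations when the consumer exits. As a minimal sketch (not something posted in the thread), the same one-liner can be given a timed exit so the counts appear on their own. The 30-second window, the tick-30s clause and the wrapper script are illustrative assumptions; the probe and aggregation key are the ones quoted above.

```shell
#!/bin/sh
# D program: count _gone_in() calls per process name, then exit after
# 30 seconds so the aggregation is printed without pressing Ctrl-C.
# The 30 s window is an arbitrary choice made for this sketch.
D_PROG='fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count(); } tick-30s { exit(0); }'

# Only run dtrace where it exists and we are root; otherwise just show
# the command that would be run.
if command -v dtrace >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    dtrace -n "$D_PROG"
else
    echo "dtrace unavailable or not root; would run: dtrace -n '$D_PROG'"
fi
```

Keeping the program in a shell variable also makes it easier to spot an unbalanced brace before it reaches dtrace.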
Re: svn commit: r351246 - in stable: 11/sys/opencrypto 12/sys/opencrypto
On 8/21/19 9:08 AM, mike tancsa wrote:
> On 8/21/2019 12:00 PM, John Baldwin wrote:
>> dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
>
> Thanks, I am not familiar with dtrace at all. This command gives a syntax error
>
> 0(cage)# dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'
> dtrace: invalid probe specifier fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count(): syntax error near end of input
> 1(cage)#

Oops, I forgot the closing }. First, do "dtrace -l | grep _gone_in" to make sure dtrace is loaded. You should see something like this:

# dtrace -l | grep _gone_in
87003 fbt kernel _gone_in entry
87004 fbt kernel _gone_in return
98682 fbt kernel _gone_in_dev entry
98683 fbt kernel _gone_in_dev return

Then this should work:

# dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count() }'
dtrace: description 'fbt::_gone_in:entry ' matched 1 probe

--
John Baldwin
Re: svn commit: r351246 - in stable: 11/sys/opencrypto 12/sys/opencrypto
On 8/21/19 8:21 AM, mike tancsa wrote:
> On a busy server, I am getting a lot of these spewing to dmesg

I have a change staged for MFC that lets you adjust the warning intervals so you can tone down the spam.

> Deprecated code (to be removed in FreeBSD 13): ARC4 cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): DES cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): 3DES cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): Blowfish cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): CAST128 cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): ARC4 cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): DES cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): 3DES cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): Blowfish cipher via /dev/crypto
> Deprecated code (to be removed in FreeBSD 13): CAST128 cipher via /dev/crypto
>
>
> What is the best way to try and track down what apps are triggering that ?

One might be to use 'procstat -af' to see which processes have crypto file descriptors open (file descriptor type 'c'). The other approach would be to use dtrace with the fbt::_gone_in:entry trace maybe building a count of process names or some such, something like:

dtrace -n 'fbt::_gone_in:entry { @counts[curthread->td_proc->p_comm] = count()'

Let that run and then Ctrl-C after you see some warnings.

> ---Mike
>
> On 8/19/2019 9:30 PM, John Baldwin wrote:
>> Author: jhb
>> Date: Tue Aug 20 01:30:35 2019
>> New Revision: 351246
>> URL: https://svnweb.freebsd.org/changeset/base/351246
>>
>> Log:
>>   MFC 348876: Add warnings to /dev/crypto for deprecated algorithms.
>>
>>   These algorithms are deprecated algorithms that will have no in-kernel
>>   consumers in FreeBSD 13.  Specifically, deprecate the following
>>   algorithms:
>>   - ARC4
>>   - Blowfish
>>   - CAST128
>>   - DES
>>   - 3DES
>>   - MD5-HMAC
>>   - Skipjack
>>
>>   Relnotes: yes
>>
>> Modified:
>>   stable/11/sys/opencrypto/cryptodev.c
>> Directory Properties:
>>   stable/11/ (props changed)
>>
>> Changes in other areas also in this revision:
>> Modified:
>>   stable/12/sys/opencrypto/cryptodev.c
>> Directory Properties:
>>   stable/12/ (props changed)
>>
>> Modified: stable/11/sys/opencrypto/cryptodev.c
>> ==
>> --- stable/11/sys/opencrypto/cryptodev.c Tue Aug 20 01:26:02 2019 (r351245)
>> +++ stable/11/sys/opencrypto/cryptodev.c Tue Aug 20 01:30:35 2019 (r351246)
>> @@ -388,6 +388,9 @@ cryptof_ioctl(
>>      struct crypt_op copc;
>>      struct crypt_kop kopc;
>>  #endif
>> +    static struct timeval arc4warn, blfwarn, castwarn, deswarn, md5warn;
>> +    static struct timeval skipwarn, tdeswarn;
>> +    static struct timeval warninterval = { .tv_sec = 60, .tv_usec = 0 };
>>
>>      switch (cmd) {
>>      case CIOCGSESSION:
>> @@ -408,18 +411,28 @@ cryptof_ioctl(
>>          case 0:
>>              break;
>>          case CRYPTO_DES_CBC:
>> +            if (ratecheck(&deswarn, &warninterval))
>> +                gone_in(13, "DES cipher via /dev/crypto");
>>              txform = &enc_xform_des;
>>              break;
>>          case CRYPTO_3DES_CBC:
>> +            if (ratecheck(&tdeswarn, &warninterval))
>> +                gone_in(13, "3DES cipher via /dev/crypto");
>>              txform = &enc_xform_3des;
>>              break;
>>          case CRYPTO_BLF_CBC:
>> +            if (ratecheck(&blfwarn, &warninterval))
>> +                gone_in(13, "Blowfish cipher via /dev/crypto");
>>              txform = &enc_xform_blf;
>>              break;
>>          case CRYPTO_CAST_CBC:
>> +            if (ratecheck(&castwarn, &warninterval))
>> +                gone_in(13, "CAST128 cipher via /dev/crypto");
>>              txform = &enc_xform_cast5;
>>              break;
>>          case CRYPTO_SKIPJACK_CBC:
>> +            if (ratecheck(&skipwarn, &warninterval))
>> +                gone_in(13, "Skipjack cipher via /dev/crypto");
>>              txform = &enc_xform_skipjack;
>>              break;
>>          case CRYPTO_AES_CBC:
>> @@ -432,6 +445,8 @@ cryptof_ioctl(
>>
Re: /dev/crypto not being used in 12-STABLE
On 12/6/18 4:19 PM, Konstantin Belousov wrote:
> On Thu, Dec 06, 2018 at 04:48:35PM -0700, John Nielsen wrote:
>> Is aesni(4) even required if all you want is userland acceleration?
>>
> No, it is not. Same for rdrand_rng(4), if an application uses hw random
> source directly.

To elaborate further, aesni(4) is only useful to accelerate in-kernel crypto use (e.g. IPsec or GELI). The fact that /dev/crypto tries to use it by default is a bug (IMO) that I'm planning on addressing.

--
John Baldwin
Re: /dev/crypto not being used in 12-STABLE
On 12/6/18 3:24 PM, John Nielsen wrote:
>> On Dec 6, 2018, at 4:04 PM, Xin LI wrote:
>>
>> On Thu, Dec 6, 2018 at 11:37 AM John Nielsen wrote:
>>>
>>> I have upgraded two physical machines from 11-STABLE to 12-STABLE recently (one is 12.0-PRERELEASE r341380 and the other is 12.0-PRERELEASE r341391). I noticed today that neither machine seems to be utilizing /dev/crypto. Typically I see at least ssh/sshd have the device open plus some programs from ports. But 'fuser' doesn't list any processes on either machine:
>>>
>>> # fuser /dev/crypto
>>> /dev/crypto:
>>>
>>> Both machines are running custom kernels that include "device crypto" and "device cryptodev". One of them additionally has "device aesni".
>>>
>>> Is anyone else seeing this? Any idea what would cause it?
>>
>> Your average OpenSSL applications should not use /dev/crypto, if your goal is to utilize AES-NI (which does not require /dev/crypto). On capable systems, AES-NI would be used automatically (and it's faster this way).
>
> Thanks for the response. Is there a way to verify that AES-NI is being used for e.g. ssh? I'm also curious why/when/how the change to not use (or support?) /dev/crypto from base openssl was made.

I suspect it was something we just didn't test in the flurry of other work during the OpenSSL upgrade. However, it is much faster to use the AES-NI instructions in userland than to use a system call that copies the data into a kernel buffer, uses the same AES-NI instructions, then copies the data back out again, along with the overhead of a pair of user <--> kernel transitions. If you have an actual crypto offload device (as in a PCI-e card or something), then you might be interested in /dev/crypto (and we should fix that eventually), but AES-NI is just faster software crypto and is best done directly in userland.

--
John Baldwin
Re: Panic on 11-STABLE with Xen guest
On 11/22/18 12:39 PM, Joe Clarke wrote:
> I believe after the commit 340016 for the dynamic IRQ layout, my Xen VM started to panic. I just upgraded the kernel today and saw this:
>
> xen: unable to map IRQ#2
> panic: Unable to register interrupt override
> cpuid = 0
> KDB: stack backtrace:
> #0 0x8060a4e7 at kdb_backtrace+0x67
> #1 0x805c3787 at vpanic+0x177
> #2 0x805c3603 at panic+0x43
> #3 0x8093a766 at madt_parse_ints+0x96
> #4 0x803353f9 at acpi_walk_subtables+0x29
> #5 0x8093a5e6 at xenpv_register_pirqs+0x56
> #6 0x80928296 at intr_init_sources+0x116
> #7 0x8055eba8 at mi_startup+0x118
> #8 0x8029902c at btext+0x2c
>
> The following kernel works:
>
> @(#)FreeBSD 11.2-STABLE #4: Thu Nov 1 02:24:07 EDT 2018
> FreeBSD 11.2-STABLE #4: Thu Nov 1 02:24:07 EDT 2018
> root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
>
> The following kernel produces the panic above immediately on boot:
>
> @(#)FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
> FreeBSD 11.2-STABLE #5: Wed Nov 21 11:08:38 EST 2018
> root@creme-brulee:/usr/obj/usr/src/sys/CREME-BRULEE
>
> Attached is a screen grab of the console of the panic.

Hmm, I don't see any obvious candidates of Xen changes that weren't included in the MFC. I've added royger@ (who maintains Xen in FreeBSD) to the cc to see if he has an idea.

Roger, the main changes that aren't MFC'd to 11 from 12/head seem to be some refcounting on event channels and PVHv2 vs PVHv1?

--
John Baldwin
Re: Problem with USB <---> UPS management connection
On Thursday, March 08, 2018 01:16:46 AM Glen Barber wrote: > On Wed, Mar 07, 2018 at 08:04:47PM -0500, Mark Saad wrote: > > > On Mar 7, 2018, at 6:55 AM, wishmaster <artem...@ukr.net> wrote: > > > > > > Hi, colleagues! > > > > > > Something strange happens with a server. I am attempting to connect > > > management interface of UPS with server via USB. > > > In console I see a lot of errors: > > > > > > Mar 7 13:42:04 xxx kernel: ugen2.2: > > > at usbus2 > > > Mar 7 13:42:05 xxx kernel: uhid0 on uhub6 > > > Mar 7 13:42:05 xxx kernel: uhid0: > > class 0/0, rev 1.10/0.02, addr 2> on usbus2 > > > Mar 7 13:42:08 xxx kernel: ugen2.2: > > > at usbus2 (disconnected) > > > Mar 7 13:42:08 xxx kernel: uhid0: at uhub6, port 3, addr 2 (disconnected) > > > Mar 7 13:42:08 xxx kernel: uhid0: detached > > > Mar 7 13:42:12 xxx kernel: ugen2.2: > > > at usbus2 > > > Mar 7 13:42:12 xxx kernel: uhid0 on uhub6 > > > Mar 7 13:42:12 xxx kernel: uhid0: > > class 0/0, rev 1.10/0.02, addr 2> on usbus2 > > > Mar 7 13:42:16 xxx kernel: ugen2.2: > > > at usbus2 (disconnected) > > > Mar 7 13:42:16 xxx kernel: uhid0: at uhub6, port 3, addr 2 (disconnected) > > > Mar 7 13:42:16 xxx kernel: uhid0: detached > > > > > > I have changed USB-cables, USB port on the server - without success. > > > On another server this problem is absent. > > > > > > FreeBSD version: FreeBSD 11.1-STABLE #1 r329364M: > > > > > > Any ideas? > > > > > All > > I lost power at home and noticed that nut didn’t work right . I > > had a similar dmesg . My box is running 11.1-stable amd64 built > > from svn 7-8 days ago . When I get power back I’ll post details . > > > > This seems suspiciously similar to an issue I am seeing with a USB mouse > on both stable/11 a patched build of releng/11.1. In my case, the dmesg > shows: > > ugen1.3: at usbus1 (disconnected) > ugen1.3: at usbus1 > ugen1.3: at usbus1 (disconnected) > ugen1.3: at usbus1 > > What struck me as "suspiciously similar" is the 'ugen' reference. 
> Unfortunately, I do not have more information yet, but have been
> pounding my head on my desk throughout the day. Then, I saw this
> thread.
>
> Anyone else seeing at least USB mouse-related issues? It could entirely
> be a red herring.

I am definitely seeing issues with an APC UPS (connected via USB) that I have on my desktop. I have used this desktop + APC combination for at least 5 years now, and the issues only appeared after my most recent upgrade to 11.1-STABLE at r326909. I did not have issues on the previous 11.1-STABLE kernel at r321399, so it does seem like it could be a regression.

--
John Baldwin
Re: DDD hangs on start on 11.1-R
On Monday, March 05, 2018 08:19:24 AM Daniel Eischen wrote:
> On Mon, 5 Mar 2018, Trond Endrestøl wrote:
>
> > On Sat, 3 Mar 2018 18:09+0100, Holm Tiffe wrote:
> >
> >> can anyone get ddd get to work in 11.1-R or stable?
> >
> > I've more or less given up on devel/ddd, since it relies on the old
> > pty subsystem, now replaced by the new pts subsystem, to communicate
> > with gdb.
> >
> > I build custom kernels containing "device pty", but I'm not sure if
> > that directive is being honoured these days.
> >
> > It's a shame, 'cos ddd is very good at visualizing data structures.
> > Maybe it's possible to patch ddd to use pts instead of pty.
>
> I used to like ddd also. You might try devel/gps. It's more
> than just a debugger, but you can use it just for debugging.
> Note, it's been a while since I've used it, but worked similarly
> to ddd.

I patched ddd to use pts (was a short patch) but it still hangs for me with both old and new gdb. I think it is unfortunately abandonware. :(

--
John Baldwin
Re: post ino64: lockd no runs?
On Sunday, June 11, 2017 11:12:25 AM David Wolfskill wrote:
> On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
> > It seems that {rpc.}lockd no longer runs after the ino64 changes on any
> > of my systems after a full rebuild of src and ports. No log entries
> > offer any insight as to why :-(
> >
> > imb
>
> I don't tend to use NFS on my systems that are running head, so I haven't had occasion to test this as stated.
>
> However, I just completed my weekly update of the "production" systems here at home, running stable/11. And I find that lockd seems to be ... claiming that all is well, but declining to run (for long).
>
> To the best of my knowledge, that was not the case until this last update, which was from:
>
> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316 r319566M/319569:1100514: Sun Jun 4 03:54:41 PDT 2017 r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT amd64
>
> to
>
> FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322 r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017 r...@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT amd64
>
> The "glaringly obvious" symptom in my case is that I am now unable to (directly) save an email message from within mutt(1) by appending it to an NFS-resident file. (Saving it to a local file, then using cat(1) to append that to the NFS-resident file & removing the local copy works)
>
> After a few variations on a theme of:
>
> albert(11.1)[5] sudo service lockd restart
> lockd not running?
> Starting lockd.
> albert(11.1)[6] echo $?
> 0
> albert(11.1)[7] service lockd status
> lockd is not running.
>
> I finally(!) thought to ask ktrace what's going on (as tailing /var/log/messages was completely unproductive, even after enabling rc_debug).
>
> So I tried: "sudo ktrace -di service lockd restart"; upon examination of the output of kdump(1), I see that the trace ends with:
>
> ...
> 2811 rpc.lockd NAMI "/var/run/logpriv"
> 2786 sh CALL read(0xa,0x627fc0,0x400)
> 2786 sh GIO fd 10 read 0 bytes
>        ""
> 2811 rpc.lockd RET connect 0
> 2786 sh RET read 0
> 2811 rpc.lockd CALL sendto(0x3,0x7fffe2c0,0x27,0,0,0)
> 2786 sh CALL exit(0)
> 2811 rpc.lockd GIO fd 3 wrote 39 bytes
>        "<30>Jun 11 15:43:10 rpc.lockd: Starting"
> 2811 rpc.lockd RET sendto 39/0x27
> 2811 rpc.lockd CALL sigaction(SIGALRM,0x7fffec20,0)
> 2811 rpc.lockd RET sigaction 0
> 2811 rpc.lockd CALL nlm_syscall(0,0x1e,0x4,0x801015040)
> 2811 rpc.lockd RET nlm_syscall -1 errno 14 Bad address

This is a really good clue. nlm_syscall is dying with EFAULT. The last argument is a pointer to an array of char * pointers, and the only way I can see it dying is if it fails to copyin() one of the strings pointed to by those pointers. You could try running rpc.lockd under gdb from ports and setting a breakpoint on 'nlm_syscall' and then printing out 'addr_count' and 'p addrs@(addr_count * 2)'. Unfortunately I'm not able to reproduce the failure on a test machine I have running head post-ino64.

--
John Baldwin
Re: if_cxgbev build error on -stable
On Sunday, December 04, 2016 03:53:23 PM Konstantin Belousov wrote:
> On Sun, Dec 04, 2016 at 04:23:00PM +0300, Andrey Chernov wrote:
> > It seems counter.h is included before systm.h where critical_* are declared.
> It is more weird, since sys/counter.h was added in the stable/10
> merge, but the header is not used in the HEAD sources. It is indeed
> needed for the stable/10 driver. The critical_enter() prerequisite for
> counter.h only exists on i386, which probably explains why John's build
> test did not catch it.
>
> I am preparing another MFC, so I committed the fix in r309529.

Thanks for fixing this. I had indeed only tested it on amd64.

--
John Baldwin
Re: stable/11 -r307797 on BPi-M3 (cortex-a7): truss gets segmentation fault for handling unknown system call
On Tuesday, October 25, 2016 11:40:38 AM Mark Millard wrote:
> [The following has been reported in:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213778 .]
>
> In trying to build lang/gcc6 xgcc's cc1 got some SIGSYS examples. In trying to track things down I ran into truss getting a SIGSEGV when it tries to handle the situation. . .
>
> In truss's enter_syscall there is (from a live gdb on truss, after the segmentation fault):
>
> 380     t->cs.name = sysdecode_syscallname(t->proc->abi->abi, t->cs.number);
> 381     if (t->cs.name == NULL)
> (gdb)
> 382             fprintf(info->outfile, "-- UNKNOWN %s SYSCALL %d --\n",
> 383                 t->proc->abi->type, t->cs.number);
> 384
> 385     sc = get_syscall(t->cs.name, narg);
> 386     t->cs.nargs = sc->nargs;
> 387     assert(sc->nargs <= nitems(t->cs.s_args));
> 388
> 389     t->cs.sc = sc;
>
> (gdb) print *t
> $2 = {entries = {le_next = 0x0, le_prev = 0x20617070}, proc = 0x20617060, tid = 100150, in_syscall = 1, cs = {sc = 0x0, name = 0x0, number = 580828064, args = 0x2061b0c0, nargs = 0, s_args = 0x2061b0ec}, before = {tv_sec = 1477418265, tv_nsec = 492342263}, after = {tv_sec = 1477418265, tv_nsec = 492496630}}
>
> (gdb) print sc
> $3 = (struct syscall *) 0x0
>
> So line 386 listed above gets a segmentation fault for sc->nargs when t->cs.name is a NULL pointer: sc ends up NULL.
>
> Looking at the two things that the fprintf on lines 382 and 383 would report:
>
> (gdb) print t->proc->abi->type
> $4 = 0x10166 "FreeBSD ELF32"
>
> (gdb) print t->cs.number
> $5 = 580828064
>
> (gdb) print narg
> $6 = 0
>
> (that last is for context for the get_syscall arguments).
>
> FYI: 580828064 = 0x229EBBA0

I have a patchset in a git branch, which I have tested somewhat, that I believe fixes handling of unknown system calls.
Please try this:

https://github.com/freebsd/freebsd/compare/master...bsdjhb:truss_unknown

(Append .diff to the URL to get a diff you can apply with patch.)

--
John Baldwin
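The decimal-to-hex conversion quoted in the report is easy to double-check from a shell. A minimal sketch, where to_hex is a hypothetical helper name and nothing here is part of truss or the proposed patch:

```shell
#!/bin/sh
# to_hex: print a decimal integer as 0x-prefixed upper-case hexadecimal.
to_hex() {
    printf '0x%X\n' "$1"
}

to_hex 580828064    # the system call number gdb showed in t->cs.number
```

Seeing the value in hex makes it easier to compare against the pointer values in the same gdb dump.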
Re: nginx and FreeBSD11
On Sunday, September 18, 2016 07:22:41 PM Slawa Olhovchenkov wrote: > On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote: > > > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote: > > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote: > > > > > > > I am have strange issuse with nginx on FreeBSD11. > > > > I am have FreeBSD11 instaled over STABLE-10. > > > > nginx build for FreeBSD10 and run w/o recompile work fine. > > > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node > > > > totaly craped. > > > > > > > > I am see next potential cause: > > > > > > > > 1) clang 3.8 code generation issuse > > > > 2) system library issuse > > > > > > > > may be i am miss something? > > > > > > > > How to find real cause? > > > > > > I find real cause and this like show-stopper for RELEASE. > > > I am use nginx with AIO and AIO from one nginx process corrupt memory > > > from other nginx process. Yes, this is cross-process memory > > > corruption. > > > > > > Last case, core dumped proccess with pid 1060 at 15:45:14. > > > Corruped memory at 0x860697000. > > > I am know about good memory at 0x86067f800. > > > Dumping (form core) this region to file and analyze by hexdump I am > > > found start of corrupt region -- offset c8c0 from 0x86067f800. > > > 0x86067f800+0xc8c0 = 0x86068c0c0 > > > > > > I am preliminary enabled debuggin of AIO started operation to nginx > > > error log (memory address, file name, offset and size of transfer). > > > > > > grep -i 86068c0c0 error.log near 15:45:14 give target file. 
> > > grep ce949665cbcd.hls error.log near 15:45:14 give next result: > > > > > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 > > > start 00086068C0C0 561b0 2646736 ce949665cbcd.hls > > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 > > > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls > > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 > > > start 00086472B7C0 7ff70 2999424 ce949665cbcd.hls > > > > Does nginx only use AIO for regular files or does it also use it with > > sockets? > > > > You can try using this patch as a diagnostic (you will need to > > run with INVARIANTS enabled, or at least enabled for vfs_aio.c): > > > > Index: vfs_aio.c > > === > > --- vfs_aio.c (revision 305811) > > +++ vfs_aio.c (working copy) > > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job) > > * aio_aqueue() acquires a reference to the file that is > > * released in aio_free_entry(). > > */ > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace, > > + ("%s: vmspace mismatch", __func__)); > > if (cb->aio_lio_opcode == LIO_READ) { > > auio.uio_rw = UIO_READ; > > if (auio.uio_resid == 0) > > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job) > > { > > > > vmspace_switch_aio(job->userproc->p_vmspace); > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace, > > + ("%s: vmspace mismatch", __func__)); > > } > > > > If this panics, then vmspace_switch_aio() is not working for > > some reason. 
>
> I am try using next DTrace script:
>
> #pragma D option dynvarsize=64m
>
> int req[struct vmspace *, void *];
> self int trace;
>
> syscall:freebsd:aio_read:entry
> {
>         this->aio = *(struct aiocb *)copyin(arg0, sizeof(struct aiocb));
>         req[curthread->td_proc->p_vmspace, this->aio.aio_buf] =
> curthread->td_proc->p_pid;
> }
>
> fbt:kernel:aio_process_rw:entry
> {
>         self->job = args[0];
>         self->trace = 1;
> }
>
> fbt:kernel:aio_process_rw:return
> /self->trace/
> {
>         req[self->job->userproc->p_vmspace, self->job->uaiocb.aio_buf] = 0;
>         self->job = 0;
>         self->trace = 0;
> }
>
> fbt:kernel:vn_io_fault:entry
> /self->trace && !req[curthread->td_proc->p_vmspace,
> args[1]->uio_iov[0].iov_base]/
> {
>         this->buf = args[1]->uio_iov[0].iov_base;
>         printf("%Y vn_io_fault %p:%p pid %d\n", walltimestamp,
> curthread->td_proc->p_vmspace, this->buf, req[curthread->td_proc->p_vmspace,
> this->buf]);
> }
> ===
>
> And don't got any messages near nginx core dump.
> What I can check next?
> May be check context/address space switch for kernel process?

Which CPU are you using?

Perhaps try disabling PCID support (I think vm.pmap.pcid_enabled=0 from
loader prompt or loader.conf)?  (Wondering if pmap_activate() is somehow not
switching)

-- 
John Baldwin
Re: nginx and FreeBSD11
On Thursday, September 15, 2016 10:09:48 PM Slawa Olhovchenkov wrote:
> On Thu, Sep 15, 2016 at 11:54:12AM -0700, John Baldwin wrote:
>
> > > > Index: vfs_aio.c
> > > > ===
> > > > --- vfs_aio.c (revision 305811)
> > > > +++ vfs_aio.c (working copy)
> > > > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> > > >    * aio_aqueue() acquires a reference to the file that is
> > > >    * released in aio_free_entry().
> > > >    */
> > > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > > +     ("%s: vmspace mismatch", __func__));
> > > >   if (cb->aio_lio_opcode == LIO_READ) {
> > > >           auio.uio_rw = UIO_READ;
> > > >           if (auio.uio_resid == 0)
> > > > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> > > >  {
> > > >
> > > >   vmspace_switch_aio(job->userproc->p_vmspace);
> > > > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > > > +     ("%s: vmspace mismatch", __func__));
> > > >  }
> > > >
> > > > If this panics, then vmspace_switch_aio() is not working for
> > > > some reason.
> > >
> > > This issuse caused rare, this panic produced with issuse or on any aio
> > > request? (this is production server)
> >
> > It would panic in the case that we are going to write into the wrong
> > process (so about as rare as your issue).
>
> Can I configure automatic reboot (not halted) in this case?

FreeBSD in a stable branch should already reboot (after writing out a dump)
by default unless you have configured it otherwise.

-- 
John Baldwin
Re: nginx and FreeBSD11
On Thursday, September 15, 2016 08:49:48 PM Slawa Olhovchenkov wrote: > On Thu, Sep 15, 2016 at 10:28:11AM -0700, John Baldwin wrote: > > > On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote: > > > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote: > > > > > > > I am have strange issuse with nginx on FreeBSD11. > > > > I am have FreeBSD11 instaled over STABLE-10. > > > > nginx build for FreeBSD10 and run w/o recompile work fine. > > > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node > > > > totaly craped. > > > > > > > > I am see next potential cause: > > > > > > > > 1) clang 3.8 code generation issuse > > > > 2) system library issuse > > > > > > > > may be i am miss something? > > > > > > > > How to find real cause? > > > > > > I find real cause and this like show-stopper for RELEASE. > > > I am use nginx with AIO and AIO from one nginx process corrupt memory > > > from other nginx process. Yes, this is cross-process memory > > > corruption. > > > > > > Last case, core dumped proccess with pid 1060 at 15:45:14. > > > Corruped memory at 0x860697000. > > > I am know about good memory at 0x86067f800. > > > Dumping (form core) this region to file and analyze by hexdump I am > > > found start of corrupt region -- offset c8c0 from 0x86067f800. > > > 0x86067f800+0xc8c0 = 0x86068c0c0 > > > > > > I am preliminary enabled debuggin of AIO started operation to nginx > > > error log (memory address, file name, offset and size of transfer). > > > > > > grep -i 86068c0c0 error.log near 15:45:14 give target file. 
> > > grep ce949665cbcd.hls error.log near 15:45:14 give next result:
> > >
> > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60
> > > start 00086068C0C0 561b0 2646736 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60
> > > start 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls
> > > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60
> > > start 00086472B7C0 7ff70 2999424 ce949665cbcd.hls
> >
> > Does nginx only use AIO for regular files or does it also use it with
> > sockets?
>
> Only for regular files.
>
> > You can try using this patch as a diagnostic (you will need to
> > run with INVARIANTS enabled,
>
> How much debugs produced?
> I am have about 5-10K aio's per second.
>
> > or at least enabled for vfs_aio.c):
>
> How I can do this (enable INVARIANTS for vfs_aio.c)?

Include INVARIANT_SUPPORT in your kernel and add a line with:

#define INVARIANTS

at the top of sys/kern/vfs_aio.c.

>
> > Index: vfs_aio.c
> > ===
> > --- vfs_aio.c (revision 305811)
> > +++ vfs_aio.c (working copy)
> > @@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
> >    * aio_aqueue() acquires a reference to the file that is
> >    * released in aio_free_entry().
> >    */
> > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +     ("%s: vmspace mismatch", __func__));
> >   if (cb->aio_lio_opcode == LIO_READ) {
> >           auio.uio_rw = UIO_READ;
> >           if (auio.uio_resid == 0)
> > @@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
> >  {
> >
> >   vmspace_switch_aio(job->userproc->p_vmspace);
> > + KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
> > +     ("%s: vmspace mismatch", __func__));
> >  }
> >
> > If this panics, then vmspace_switch_aio() is not working for
> > some reason.
>
> This issuse caused rare, this panic produced with issuse or on any aio
> request? (this is production server)

It would panic in the case that we are going to write into the wrong
process (so about as rare as your issue).

-- 
John Baldwin
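To illustrate why a per-file "#define INVARIANTS" is enough to turn the check on, here is a rough user-space analogue of a compile-time gated assertion. MY_INVARIANTS, MY_ASSERT, struct job and process_job() are hypothetical names made up for this sketch. The kernel's KASSERT calls panic(9) rather than abort(3), and per the reply above the kernel also needs INVARIANT_SUPPORT configured.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical user-space analogue of a kernel assertion macro.
 * MY_INVARIANTS and MY_ASSERT are illustrative names, not kernel APIs.
 */
#define MY_INVARIANTS	/* remove this line to compile the checks out */

#ifdef MY_INVARIANTS
#define MY_ASSERT(cond, msg) do {					\
	if (!(cond)) {							\
		fprintf(stderr, "panic: %s: %s\n", __func__, (msg));	\
		abort();						\
	}								\
} while (0)
#else
#define MY_ASSERT(cond, msg) do { } while (0)
#endif

/* Hypothetical job record: remembers which address space queued it. */
struct job {
	int owner_space;
};

/* Returns the address-space id the job was processed in. */
int
process_job(const struct job *job, int current_space)
{
	/* Same shape as the diagnostic: stop before touching foreign memory. */
	MY_ASSERT(current_space == job->owner_space, "vmspace mismatch");
	return (current_space);
}
```

Because the comparison only fires on a mismatch, the cost on a server doing thousands of AIO requests per second is one pointer comparison per request, and nothing is logged unless the bug actually occurs.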
Re: nginx and FreeBSD11
On Thursday, September 15, 2016 05:41:03 PM Slawa Olhovchenkov wrote: > On Wed, Sep 07, 2016 at 10:13:48PM +0300, Slawa Olhovchenkov wrote: > > > I am have strange issuse with nginx on FreeBSD11. > > I am have FreeBSD11 instaled over STABLE-10. > > nginx build for FreeBSD10 and run w/o recompile work fine. > > nginx build for FreeBSD11 crushed inside rbtree lookups: next node > > totaly craped. > > > > I am see next potential cause: > > > > 1) clang 3.8 code generation issuse > > 2) system library issuse > > > > may be i am miss something? > > > > How to find real cause? > > I find real cause and this like show-stopper for RELEASE. > I am use nginx with AIO and AIO from one nginx process corrupt memory > from other nginx process. Yes, this is cross-process memory > corruption. > > Last case, core dumped proccess with pid 1060 at 15:45:14. > Corruped memory at 0x860697000. > I am know about good memory at 0x86067f800. > Dumping (form core) this region to file and analyze by hexdump I am > found start of corrupt region -- offset c8c0 from 0x86067f800. > 0x86067f800+0xc8c0 = 0x86068c0c0 > > I am preliminary enabled debuggin of AIO started operation to nginx > error log (memory address, file name, offset and size of transfer). > > grep -i 86068c0c0 error.log near 15:45:14 give target file. > grep ce949665cbcd.hls error.log near 15:45:14 give next result: > > 2016/09/15 15:45:13 [notice] 1055#0: *11659936 AIO_RD 00082065DB60 start > 00086068C0C0 561b0 2646736 ce949665cbcd.hls > 2016/09/15 15:45:14 [notice] 1060#0: *10998125 AIO_RD 00081F1FFB60 start > 00086FF2C0C0 6cdf0 140016832 ce949665cbcd.hls > 2016/09/15 15:45:14 [notice] 1055#0: *11659936 AIO_RD 0008216B6B60 start > 00086472B7C0 7ff70 2999424 ce949665cbcd.hls Does nginx only use AIO for regular files or does it also use it with sockets? 

You can try using this patch as a diagnostic (you will need to
run with INVARIANTS enabled, or at least enabled for vfs_aio.c):

Index: vfs_aio.c
===================================================================
--- vfs_aio.c	(revision 305811)
+++ vfs_aio.c	(working copy)
@@ -787,6 +787,8 @@ aio_process_rw(struct kaiocb *job)
 	 * aio_aqueue() acquires a reference to the file that is
 	 * released in aio_free_entry().
 	 */
+	KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
+	    ("%s: vmspace mismatch", __func__));
 	if (cb->aio_lio_opcode == LIO_READ) {
 		auio.uio_rw = UIO_READ;
 		if (auio.uio_resid == 0)
@@ -1054,6 +1056,8 @@ aio_switch_vmspace(struct kaiocb *job)
 {
 
 	vmspace_switch_aio(job->userproc->p_vmspace);
+	KASSERT(curproc->p_vmspace == job->userproc->p_vmspace,
+	    ("%s: vmspace mismatch", __func__));
 }

If this panics, then vmspace_switch_aio() is not working for
some reason.

-- 
John Baldwin
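The address arithmetic in the quoted report can be checked mechanically. The sketch below assumes that the "start" field of each AIO_RD log line is the buffer address and that the hex field after it (561b0, 6cdf0) is the transfer size. That reading fits the log, since 2646736 + 0x561b0 = 2999424, which is the file offset of the next read from pid 1055. addr_in_buffer() and corrupt_start() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr lies inside the half-open range [start, start + len). */
bool
addr_in_buffer(uint64_t addr, uint64_t start, uint64_t len)
{
	return (addr >= start && addr - start < len);
}

/* Start of the corrupt region: known-good base plus the hexdump offset. */
uint64_t
corrupt_start(void)
{
	return (UINT64_C(0x86067f800) + UINT64_C(0xc8c0));
}
```

corrupt_start() evaluates to 0x86068c0c0, the buffer address logged by pid 1055, and the reported corrupted address 0x860697000 falls inside that buffer (0x86068c0c0 plus 0x561b0 bytes) but not inside the buffer pid 1060 itself submitted at 0x86FF2C0C0. That is consistent with the claim that a read queued by one process landed in another process's memory.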
Re: 11.0-RELEASE status update
On Thursday, September 01, 2016 02:22:04 PM Bryan Drewery wrote:
> On 9/1/2016 2:13 PM, Slawa Olhovchenkov wrote:
> > On Thu, Sep 01, 2016 at 09:10:00PM +, Glen Barber wrote:
> >
> >> As some of you may be aware, a few last-minute showstoppers appeared
> >> since 11.0-RC1 (and before RC1).
> >>
> >> One of the showstoppers has been fixed in 12-CURRENT, and merged to
> >> stable/11 and releng/11.0 that affected booting from large volumes:
> >>
> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212139
> >>
> >> There is one issue that is still being investigated, which we are
> >> classifying as an EN candidate, given the manifestations of the issue
> >> and reproducibility:
> >>
> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212168
> >>
> >> There is one blocker before 11.0-RELEASE, that affects libarchive, which
> >> we are waiting for feedback. Once feedback is received, the schedule
> >> for 11.0-RELEASE will be updated on the website to reflect reality.
> >>
> >> There are a few post-release EN items on our watch list as well, so if
> >> something was not mentioned here, that does not mean it will not be
> >> fixed in 11.0-RELEASE.
> >>
> >> Apologies for the delay, and as always, thank you for your patience.
> >>
> >> Glen
> >> On behalf of: re@
> >>
> >
> >
> > Do you planed to fix issuse with missied and delete libmap32.conf?
> >
>
> This was done intentionally quite a while ago:
> https://svnweb.freebsd.org/base?view=revision=282421
>
> Though it was later removed from ObsoleteFiles so 'make delete-old'
> would not remove it from users' systems in r282423.
>
> etcupdate removing it is the problem really being reported here.

Mmm, etcupdate should not remove a modified file.  However, etcupdate
assumes that a file removed from the stock /etc in the source tree is
supposed to be removed from the installed system as well.  If your
libmap32.conf is unmodified then it truly is pointless, since
/usr/lib32/private doesn't exist anymore in 11.

-- 
John Baldwin
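The rule described in the reply above can be modelled as a small decision function. This is only a simplified sketch of that rule, not etcupdate's actual implementation. The enum, the function name and the idea of comparing whole file contents as strings are assumptions made for illustration.

```c
#include <string.h>

enum removal_action { REMOVE_FILE, KEEP_MODIFIED };

/*
 * Simplified model of the rule for a file that has been dropped from the
 * new stock tree.  old_stock is the content that was previously installed
 * from the stock tree; local is what is currently in /etc.  Hypothetical
 * helper for illustration only.
 */
enum removal_action
removed_upstream_action(const char *old_stock, const char *local)
{
	if (strcmp(old_stock, local) == 0)
		return (REMOVE_FILE);	/* unmodified: treated as obsolete */
	return (KEEP_MODIFIED);		/* locally edited: left alone */
}
```

Under this model an untouched libmap32.conf is removed on upgrade, while a copy with local edits would survive, which matches the distinction drawn in the reply.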
Re: ahci-timeout regression in beta3
e ahci(4) fails when using MSI with ahcichX timeout!
> nooptions       RACCT                   # Resource accounting framework
> nooptions       RACCT_DEFAULT_TO_DISABLED # Set
> kern.racct.enable=0 by default
> nooptions       RCTL                    # Resource limits
>
> Perhpas it's related?!
> https://lists.freebsd.org/pipermail/freebsd-stable/2015-July/082706.html

I think it's related in the sense that there is a timing race in ahci and
that the /dev/random and RACCT changes alter the timing enough to trigger
the race simply by changing the relative order of SYSINIT's during boot
(and/or the amount of time between the ahci driver doing its initial probe
and the second probe that is run for the interrupt config hooks that
actually probes the attached SATA devices).

-- 
John Baldwin
Re: ahci-timeout regression in beta3
On Monday, February 29, 2016 07:29:03 PM Harry Schmalzbauer wrote:
> Regarding Harry Schmalzbauer's message of 28.02.2016 20:55 (localtime):
> > Hello,
> >
> > I have a remote machine with a probably defective ODD, but until r294989
> > (from Jan 28th) I could boot with just these warnings:
> > (cd1:ahcich1:0:0:0): READ(10). CDB: 28 00 00 38 85 e0 00 00 01 00
> > (cd1:ahcich1:0:0:0): CAM status: SCSI Status Error
> > (cd1:ahcich1:0:0:0): SCSI status: Check Condition
> > (cd1:ahcich1:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read
> > error)
> > (cd1:ahcich1:0:0:0): Error 5, Unretryable error
> > (cd1:ahcich1:0:0:0): cddone: got error 0x5 back
> > …
> >
> > beta3 doesn't boot anymore, it's hanging with ahci-timeouts:
> > ahcich2: Timeout on slot 11 port 0
> > ahcich2: is 0008 cs ss rs 0800 tfd 40 derr
> > cmd 0004cb17
> > (ada1:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 01 ae a3 50 40 5d 01 00
> > 00 00 00
> > ...
> > (aprobe0:ahcich2:0:0:0) ATA_IDENTIFY. ACB eec 00 00 00 00 40 00 00 00 00
> > 00 00
> > (aprobe0:ahcich2:0:0:0) CAM status: Command timeout
> > (aprobe0:ahcich2:0:0:0) Error 5, Retry was blocked
> > ada1 detached
> > ...
> > The numbers (first ACB) and also the channel varies from time to time
>
> I could narrow it down to r295480
> (https://svnweb.freebsd.org/base?view=revision=295480)
>
> Reverting that lets the machine boot again.
>
> I captured verbose boot messages, finding out that problem relaxes with
> verbose-booting, since ahci seems to recover:
> …
> TSC timecounter discards lower 1 bit(s)
> Timecounter "TSC-low" frequency 1746033500 Hz quality -100
> ahcich2: Timeout on slot 12 port 0
> ahcich2: is 0008 cs ss rs 1000 tfd 40 serr
> cmd 0004cc17
> ahcich2: AHCI reset...
> (ada1:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 04 71 a3 50 40 5d 01 00
> 00 00 00
> (ada1:ahcich2:0:0:0): CAM status: Command timeout
> (ada1:ahcich2:0:0:0): Retrying command
> ahcich2: SATA connect time=100us status=0123
> ahcich2: AHCI reset: device found
> ahcich2: AHCI reset: device ready after 100ms
> ahcich1: SNTF 0x0001
> ahcich1: SNTF 0x0001
> …
>
> I have checked twice that r295480 introduces boot failure here.
>
> I have absolutely no idea where/how/why/what race happens...
>
> Thanks for any hints,

That is most bizarre.  Does HEAD boot fine on this machine?  The change in
question probably alters the timing of startup a bit since the random
kthread is placed on the run queue later, which might affect the relative
order of kthreads as they start executing, but that would just mean it is
exposing a race in some other part of the system.

-- 
John Baldwin
Re: ia64 10-stable about r292594: rescue crunchide *.lo unknown executable format
On Wednesday, January 27, 2016 12:32:39 AM Anton Shterenlikht wrote:
> I asked about this already in stable@ and ia64@.
> Got no reply. Perhaps ia64 has been abandoned in
> 10-stable too? If so, I'd like to know.
>
> If not, I'm getting:
>
> # pwd
> /usr/obj/usr/src/rescue/rescue
> # crunchide -k _crunched_chio_stub cat.lo
> cat.lo: unknown executable format
> # crunchide -k _crunched_chio_stub chflags.lo
> chflags.lo: unknown executable format
> # crunchide -k _crunched_chio_stub chio.lo
> chio.lo: unknown executable format
> # file *lo
> cat.lo:     ELF 64-bit LSB relocatable, IA-64, version 1 (FreeBSD), not
> stripped
> chflags.lo: ELF 64-bit LSB relocatable, IA-64, version 1 (FreeBSD), not
> stripped
> chio.lo:    ELF 64-bit LSB relocatable, IA-64, version 1 (FreeBSD), not
> stripped
> # file /usr/bin/crunchide
> /usr/bin/crunchide: ELF 64-bit LSB executable, IA-64, version 1 (FreeBSD),
> dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 10.2
> (1002504), stripped
> #
>
> This is on 10.2-STABLE #20 r292594.
>
> I tried to buildworld up to r294823,
> and back to r291000, all with the same
> error. I cannot even build the same
> revision as the one I'm running now.
>
> I've deleted /usr/obj completely,- still
> the same.
>
> Please advise
>
> Thanks

While ia64 is mostly abandoned, this build failure was fixed a few weeks
ago:

r292885 | emaste | 2015-12-29 12:36:11 -0800 (Tue, 29 Dec 2015) | 4 lines

crunchide: Restore IA-64 support accidentally lost in r292421 mismerge

Reported by:	ngie

-- 
John Baldwin
Re: smbfs crashes since approx. 10.1-RELEASE
On Wednesday, October 07, 2015 08:52:30 AM Christian Kratzer wrote: > Hi, > > On Tue, 6 Oct 2015, John Baldwin wrote: > > >> This crash is occurring when doing an mtx_unlock(). Unfortunately, > >> I'm not > >> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight. > >> If you don't get any responses, I'd suggest reposting to freebsd-current@ > >> with > >> "crashes in mtx_unlock()" in the subject line. > >> > >> Btw John, the code does tsleep() in a loop before the mtx_unlock(). > >> I do > >> remember that was once allowed, but am not sure if it still is (ie a > >> tsleep() call > >> while holding Giant)? > >> > >> Hopefully someone who knows what is special about Giant that might cause > >> this will > >> respond. > >> > >> Good luck with it, rick > > > > tsleep() with Giant is still allowed. However, this sort of panic usually > > means > > you unlocked a mutex you didn't hold (but without INVARIANTS enabled or > > you'd get > > an assertion failure earlier). > > > > I don't see anything obviously wrong in smb_iod_thread() however. > > > > If you have the crashdump, can you please run this in kgdb: > > > > frame 9 > > p (struct mtx *)c > > p *(struct mtx *)c > > yes I have. 
Here we go: > > --snipp-- > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x20 > fault code = supervisor read data, page not present > instruction pointer = 0x20:0x80996c7c > stack pointer = 0x28:0xfe004e79bac0 > frame pointer = 0x28:0xfe004e79baf0 > code segment= base 0x0, limit 0xf, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags= resume, IOPL = 0 > current process = 12235 (smbiod172) > trap number = 12 > panic: page fault > cpuid = 0 > KDB: stack backtrace: > #0 0x80984e30 at kdb_backtrace+0x60 > #1 0x809489e6 at vpanic+0x126 > #2 0x809488b3 at panic+0x43 > #3 0x80d4aadb at trap_fatal+0x36b > #4 0x80d4addd at trap_pfault+0x2ed > #5 0x80d4a47a at trap+0x47a > #6 0x80d307f2 at calltrap+0x8 > #7 0x8092ebe0 at __mtx_unlock_sleep+0x60 > #8 0x8092eb69 at __mtx_unlock_flags+0x69 > #9 0x81a1b724 at smb_iod_thread+0xb4 > #10 0x8091244a at fork_exit+0x9a > #11 0x80d30d2e at fork_trampoline+0xe > Uptime: 1d18h34m4s > Dumping 161 out of 999 MB:..10%..20%..30%..40%..50%..60%..70%..80%..90%..100% > > Reading symbols from /boot/kernel/smbfs.ko.symbols...done. > Loaded symbols for /boot/kernel/smbfs.ko.symbols > Reading symbols from /boot/kernel/libiconv.ko.symbols...done. > Loaded symbols for /boot/kernel/libiconv.ko.symbols > Reading symbols from /boot/kernel/libmchain.ko.symbols...done. > Loaded symbols for /boot/kernel/libmchain.ko.symbols > #0 doadump (textdump=) at pcpu.h:219 > 219 pcpu.h: No such file or directory. > in pcpu.h > (kgdb) frame 9 > #9 0x8092ebe0 in __mtx_unlock_sleep (c=0xf8002f531790, > opts=, > file=0x81a25801 "%s: Can't handle disordered parameters > %d:%d\n", line=1) at /usr/src/sys/kern/kern_mutex.c:791 > 791 /usr/src/sys/kern/kern_mutex.c: No such file or directory. 
> in /usr/src/sys/kern/kern_mutex.c
> Current language: auto; currently minimal
> (kgdb) p (struct mtx *)c
> $1 = (struct mtx *) 0xf8002f531790
> (kgdb) p *(struct mtx *)c
> $2 = {lock_object = {lo_name = 0x6 , lo_flags = 0,
> lo_data = 0, lo_witness = 0xf8002f531798},
>    mtx_lock = 1444181401}

Ok, so that is a destroyed mutex.  This means it is probably not Giant, and
it might be some mutex in smb_iod_main() that shows up in smb_iod_thread()
due to inlining.  Actually, we know this from your earlier mail:

        if (evp->ev_type & SMBIOD_EV_SYNC) {
                SMB_IOD_EVLOCK(iod);
                wakeup(evp);
                SMB_IOD_EVUNLOCK(iod);

Line 624 is that SMB_IOD_EVUNLOCK().  Hmm, does 'p *evp' work at frame 10?

If not, can you try building the devel/gdb port from a recent ports tree
with the 'KGDB' option enabled and use 'kgdb710' instead of 'kgdb' to see if
you can print out '*evp'?

> (kgdb)
> --snipp--
>
> I can build a GENERIC kernel with INVARIANTS enabled on the box to see if we
> get a better assertions next time this happens.

That would be great, but please keep the existing core and kernel.  We might
be able to figure this out from that stil
Re: smbfs crashes since approx. 10.1-RELEASE
On Monday, October 05, 2015 06:16:54 PM Rick Macklem wrote: > Christian Kratzer wrote: > > Hi, > > > > I run a regular rsync job that runs from cron and copies stuff that gets > > created on a Windows smbfs share. > > > > Starting about 10.1-RELEASE the VM has become unstable and started panicing. > > > > I have narrowed the issue down to the aforementioned rsync job. > > > > When I move the job to a different VM the the other VM starts crashing and > > the VM without the job becomes stable agin. > > > > I have panics and crashinfos stored in /var/crash if anybody is interested: > > > > root@noc2:/var/crash # uname -a > > FreeBSD noc2.cksoft.de 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r28: > > Wed > > Aug 12 15:26:37 UTC 2015 > > r...@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 > > root@noc2:/var/crash # freebsd-version -u > > 10.2-RELEASE-p5 > > root@noc2:/var/crash # freebsd-version -k > > 10.2-RELEASE > > root@noc2:/var/crash # > > > > This is what I have in /var/crash/core.txt.0 > > > > Fatal trap 12: page fault while in kernel mode > > cpuid = 0; apic id = 00 > > fault virtual address = 0x20 > > fault code = supervisor read data, page not present > > instruction pointer = 0x20:0x80996c7c > > stack pointer = 0x28:0xfe003d6c0ac0 > > frame pointer = 0x28:0xfe003d6c0af0 > > code segment= base 0x0, limit 0xf, type 0x1b > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > processor eflags= resume, IOPL = 0 > > current process = 1349 (smbiod10) > > trap number = 12 > > panic: page fault > > cpuid = 0 > > KDB: stack backtrace: > > #0 0x80984e30 at kdb_backtrace+0x60 > > #1 0x809489e6 at vpanic+0x126 > > #2 0x809488b3 at panic+0x43 > > #3 0x80d4aadb at trap_fatal+0x36b > > #4 0x80d4addd at trap_pfault+0x2ed > > #5 0x80d4a47a at trap+0x47a > > #6 0x80d307f2 at calltrap+0x8 > > #7 0x8092ebe0 at __mtx_unlock_sleep+0x60 > > #8 0x8092eb69 at __mtx_unlock_flags+0x69 > > #9 0x81a1b724 at smb_iod_thread+0xb4 > > #10 0x8091244a at fork_exit+0x9a > > #11 0x80d30d2e at 
fork_trampoline+0xe
> > Uptime: 2h43m55s
> > Dumping 103 out of 999 MB: (CTRL-C to abort)
> > ..16%..31%..47%..62%..78%..93%
> >
> This crash is occurring when doing an mtx_unlock(). Unfortunately, I'm
> not
> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
> "crashes in mtx_unlock()" in the subject line.
>
> Btw John, the code does tsleep() in a loop before the mtx_unlock(). I do
> remember that was once allowed, but am not sure if it still is (ie a tsleep()
> call
> while holding Giant)?
>
> Hopefully someone who knows what is special about Giant that might cause this
> will
> respond.
>
> Good luck with it, rick

tsleep() with Giant is still allowed.  However, this sort of panic usually
means you unlocked a mutex you didn't hold (but without INVARIANTS enabled,
or you'd get an assertion failure earlier).

I don't see anything obviously wrong in smb_iod_thread() however.

If you have the crashdump, can you please run this in kgdb:

frame 9
p (struct mtx *)c
p *(struct mtx *)c

-- 
John Baldwin
Re: suspend/resume regression
On Saturday, July 25, 2015 03:54:40 PM Kevin Oberman wrote:
> John,
>
> I'm concerned that two issues may be getting conflated. The issue I thought
> we were looking at was the failure of some systems (T520, X220, T430) to
> resume after a number of PCI enhancements were MFCed. This is completely
> unrelated to the USB issue I was experiencing when trying to test the
> problem on HEAD. The more I think about it, the more I think that the USB
> issue is just how things need to work.

Well, the USB thing could be smarter, but it's a bit of a PITA.  What if you
take the USB stick out, mess with it on another system, then plug it back in
before resume?  All the cached file data in the RAM of the resumed system
would need to be invalidated, etc.

However, I ended up copying a HEAD kernel onto my USB stick and seeing that
I at least got the console back before it panic'd.  This was sufficient to
let me test the reversion patch via the USB stick (and would be sufficient
for seeing if we can merge it again for 10.3).

> The real issue is just resuming the system after r281874 was MFCed as a
> part of 284034. No USB connected file systems are involved. I'm happy to
> see that it has been reverted for 10.2, but clearly, these changes are
> needed down the line and I hope the issue can be resolved well before 11.0.
> (This assumes a 10.3 before 11.0 happens next year.)

So it works fine in 11.0 on my x220, and as other folks reported in the PR,
so 11.0 is fine.  It is also needed for PCI-e hotplug to work after resume
(using out-of-tree patches for PCI-e hotplug that jmg@ has).  If I merge it
to 10.3 it won't be until I've verified that whatever I merge works on my
x220 as well as the T440.

-- 
John Baldwin
Re: suspend/resume regression
On Saturday, July 18, 2015 10:22:33 PM Kevin Oberman wrote:
> I just confirmed that my system resumes on HEAD of July 16 but fails on
> 10.2-BETA2. So the problem limited to 10. I'm guessing that some other
> change made to pci that has not been MFCed is the cause, but it is only
> causing a problem on some hardware. I have seen no reports about systems
> other than Lenovo systems.

So my x220 does fail with a USB disk on 10, but I also get a weird behavior
where it seems to wake up (disk lights up) and then goes back to sleep and
never resumes again.  I'm not sure if this is due to using a USB disk or
not.  I get the same result when I disable power management during suspend
which was reported to fix other laptops IIRC.

Please try this:

Index: sys/dev/acpica/acpi.c
===================================================================
--- sys/dev/acpica/acpi.c	(revision 285761)
+++ sys/dev/acpica/acpi.c	(working copy)
@@ -691,7 +691,7 @@
 static void
 acpi_set_power_children(device_t dev, int state)
 {
-	device_t child, parent;
+	device_t child;
 	device_t *devlist;
 	struct pci_devinfo *dinfo;
 	int dstate, i, numdevs;
@@ -703,13 +703,12 @@
 	 * Retrieve and set D-state for the sleep state if _SxD is present.
 	 * Skip children who aren't attached since they are handled separately.
 	 */
-	parent = device_get_parent(dev);
 	for (i = 0; i < numdevs; i++) {
 		child = devlist[i];
 		dinfo = device_get_ivars(child);
 		dstate = state;
 		if (device_is_attached(child) &&
-		    acpi_device_pwr_for_sleep(parent, dev, &dstate) == 0)
+		    acpi_device_pwr_for_sleep(dev, child, &dstate) == 0)
 			acpi_set_powerstate(child, dstate);
 	}
 	free(devlist, M_TEMP);
Index: sys/dev/pci/pci.c
===================================================================
--- sys/dev/pci/pci.c	(revision 285761)
+++ sys/dev/pci/pci.c	(working copy)
@@ -3671,7 +3671,7 @@
 		child = devlist[i];
 		dstate = state;
 		if (device_is_attached(child) &&
-		    PCIB_POWER_FOR_SLEEP(pcib, dev, &dstate) == 0)
+		    PCIB_POWER_FOR_SLEEP(pcib, child, &dstate) == 0)
 			pci_set_powerstate(child, dstate);
 	}
 }
Index: .
===================================================================
--- .	(revision 285761)
+++ .	(working copy)

Property changes on: .
___________________________________________________________________
Modified: svn:mergeinfo
   Merged /head:r274386,274397

-- 
John Baldwin
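What the patch above changes is which device the sleep-state lookup is made for. The following user-space sketch models that difference under stated assumptions. struct node, power_for_sleep() and both set_power_children_*() functions are hypothetical and greatly simplified, and they do not reproduce the ACPI or PCI bus methods.

```c
#include <stddef.h>

/* Hypothetical, simplified device node; not FreeBSD's device_t. */
struct node {
	int sleep_dstate;	/* D-state wanted while the system sleeps */
	int powerstate;		/* D-state most recently applied */
};

/* Hypothetical stand-in for a "power state for sleep" lookup on dev. */
static int
power_for_sleep(const struct node *bus, const struct node *dev, int *dstate)
{
	(void)bus;
	*dstate = dev->sleep_dstate;
	return (0);
}

/* Pre-patch shape: the lookup is made for the bus, not for each child. */
void
set_power_children_old(const struct node *parent, const struct node *bus,
    struct node *kids, size_t n, int state)
{
	size_t i;
	int dstate;

	for (i = 0; i < n; i++) {
		dstate = state;
		if (power_for_sleep(parent, bus, &dstate) == 0)
			kids[i].powerstate = dstate;
	}
}

/* Post-patch shape: each child is looked up on the bus it hangs off. */
void
set_power_children_new(const struct node *bus, struct node *kids, size_t n,
    int state)
{
	size_t i;
	int dstate;

	for (i = 0; i < n; i++) {
		dstate = state;
		if (power_for_sleep(bus, &kids[i], &dstate) == 0)
			kids[i].powerstate = dstate;
	}
}
```

In this model the old loop gives every child the bus's own sleep state, whereas the new loop gives each child the state looked up for that child. Whether that is exactly how the resume failure arose on the affected ThinkPads is not established in this thread.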
Re: suspend/resume regression
On Tuesday, July 14, 2015 03:10:59 PM Brandon J. Wandersee wrote:
> Please forgive me if this seems impudent, but has there been any progress
> on this? The status of the bug report hasn't changed since it was opened. I
> don't mean to be rude, and I certainly appreciate the effort that's gone
> into this already (especially Kevin's detective work), but support for
> suspend-to-RAM and my laptop's hotkeys were essentially the only reasons I
> started tracking 10-STABLE to begin with. Since both features were resolved
> many months ago, I was hoping to switch from -STABLE to 10.2-RELEASE when
> it came out, but I'm starting to get the feeling that won't happen because
> of a single errant commit. Having to continue following -STABLE would not
> be terrible, but it would be disappointing.

As noted previously, I have been moving house and generally offline since
mid-June (and I'm not really fully online yet).  My last request was if
Kevin (or someone else with an affected laptop) could test HEAD to see if
there is a missing bugfix on HEAD that needs to be merged.  This specific
change was tested on HEAD on both a T440 and X220 and on 10 to test the MFC
on the T440.

-- 
John Baldwin
Re: suspend/resume regression
I'm traveling and AFK for a week or so more, but I did test this MFC including suspend/resume with CardBus, etc. on a T440 before committing it. It would be good to know if HEAD works for you. If it does then there's likely another fix from HEAD that you need merged. -- John Baldwin On Jun 29, 2015, at 00:54, Kevin Oberman rkober...@gmail.com wrote: On Sun, Jun 28, 2015 at 11:07 PM, Adrian Chadd adrian.ch...@gmail.com wrote: Ok, so which subset of changes is the culprit? (sorry, I'm tired.. :( ) The merge of 281874 broke it. Unfortunately, this is a fairly large and important change that touches five files, mainly dev/pci/pci.c and dev/pci/pci_pci.c with a less significant update to dev/pccbb/pccbb_pci.c. Get some rest. This is an annoying regression, but not disastrous. Systems still run and it sounds like many still resume. Unfortunately my T520 and some contemporary ThinkPads don't. I now have enough data to open a fairly coherent ticket. I'll try to open it tomorrow. (I'm tired, too.) -- Kevin Oberman, Network Engineer, Retired E-mail: rkober...@gmail.com PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683 -a On 28 June 2015 at 22:45, Kevin Oberman rkober...@gmail.com wrote: On Sun, Jun 28, 2015 at 4:54 PM, Kevin Oberman rkober...@gmail.com wrote: On Sun, Jun 28, 2015 at 10:38 AM, Joseph Mingrone j...@ftfl.ca wrote: Adrian Chadd adrian.ch...@gmail.com writes: ok. I've updated my x230 to the latest -head and it is okay at suspend/resume. No problem with -head on the X220 as well. I can go acquire an x220 (now that they're cheap) to have as another reference laptop. You might ping Allan Jude. If I'm not mistaken he had at least two X220s at BSDCan. Maybe he'd be willing to part with one. I have now merged all of the parts of 284034 except for 281874 and resume works correctly. 
As I suspected, something in that rather large commit is the problem and it is probably something that is tied to some other change in HEAD as Adrian has reported that it works fine in HEAD. I'll have to admit that I have no idea how to approach figuring this out. I'm not sure how I can even revert a part of the commit to get 10.2-PRERELEASE working for me. I really wish that a commit as large as this one had been MFCed separately. :-(

So far there has been only a single commit to pci and none to pccbb since 284034, so I built stable with the files modified in 281874 manually reverted. I now have r284916M running and it seems to be working fine. All of 284034 committed except for the MFC from 281874. That left three files conflicting with STABLE:
/usr/src/sys/dev/pci/pci.c
/usr/src/sys/dev/pci/pci_pci.c
/usr/src/sys/dev/pccbb/pccbb_pci.c
-- Kevin Oberman, Network Engineer, Retired E-mail: rkober...@gmail.com PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
Re: Build failed in Jenkins: FreeBSD_stable_9 #729
On Tuesday, March 31, 2015 05:02:04 PM jenkins-ad...@freebsd.org wrote: See https://jenkins.freebsd.org/job/FreeBSD_stable_9/729/changes Changes: [jhb] MFC 278760: Add two new counters for vnode life cycle events: - vfs.recycles counts the number of vnodes forcefully recycled to avoid exceeding kern.maxvnodes. - vfs.vnodes_created counts the number of vnodes created by successful calls to getnewvnode(). The actual error is unrelated (and also not in this really long e-mail). It appears to be: mv -f dtparserparse.h dtparser.y.h mv: rename dtparserparse.h to dtparser.y.h: No such file or directory *** [dtparser.y.h] Error code 1 I suspect this is some sort of race with -j? -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
On 3/21/15 12:31 PM, Adrian Chadd wrote: On 21 March 2015 at 08:52, John Baldwin j...@freebsd.org wrote: On 3/20/15 8:46 PM, Mike Tancsa wrote: On 3/20/2015 8:15 PM, Konstantin Belousov wrote: For the purpose of devfs, does it make sense to bump timestamps like normal filesystems for each read/write operation? Looks like Mac OS X will bump timestamps for each operation but Debian don't. First question is, what timecounter hardware is used. I would accept some slowdown from hardware like HPET, but it is indeed surprising if caused by TSC. David Wolfskill suggested trying the problem commit with vfs.timestamp_precision=0 and it does indeed restore performance to what it was. The raw dtrace files are available and FlameGraphs can all be found at http://tancsa.com/time/ Do you know why you are using the HPET instead of TSC for timestamping? Using the TSC can make a non-trivial performance difference since userland can calculate timestamps without using system calls when it is used. (That is not related to this case, but switching to the TSC in general is preferable.) There are a few generations of Intel CPUs where you can't mix deeper sleep states with the TSC as timecounter, but those CPUs are getting to be a bit older at this point. What about various VMs? It depends on the hypervisor. bryanv@ is working on bits to allow us to use very cheap timecounters under KVM for example (if that isn't already in the tree). I think bhyve permits guests to use the TSC already. I think when we talked about this on arch@ before the change was made folks felt that even many embedded systems would have some sort of relatively cheap cycle counter, especially going forward. It may be that we end up picking a different default for guests as we do for 'hz' (though that has its downsides. Luigi has noted that one of the things he has to do to fix network performance in VMs is undo that and raise hz back to 1000). 
However, for bare metal I'd like to figure out why folks aren't using the TSC and fix those if possible. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: RELENG_10 performance regression (was Re: 35-40% performance drop releng9 vs releng10 openvpn
On 3/20/15 8:46 PM, Mike Tancsa wrote: On 3/20/2015 8:15 PM, Konstantin Belousov wrote: For the purpose of devfs, does it make sense to bump timestamps like normal filesystems for each read/write operation? Looks like Mac OS X will bump timestamps for each operation but Debian don't. First question is, what timecounter hardware is used. I would accept some slowdown from hardware like HPET, but it is indeed surprising if caused by TSC. David Wolfskill suggested trying the problem commit with vfs.timestamp_precision=0 and it does indeed restore performance to what it was. The raw dtrace files are available and FlameGraphs can all be found at http://tancsa.com/time/ Do you know why you are using the HPET instead of TSC for timestamping? Using the TSC can make a non-trivial performance difference since userland can calculate timestamps without using system calls when it is used. (That is not related to this case, but switching to the TSC in general is preferable.) There are a few generations of Intel CPUs where you can't mix deeper sleep states with the TSC as timecounter, but those CPUs are getting to be a bit older at this point. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
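For anyone following along who wants to check which timecounter their machine is using, here is a minimal shell sketch. It assumes the FreeBSD sysctl names kern.timecounter.hardware and kern.timecounter.choice and a choice list of the form NAME(quality); the helper names tc_quality and report_timecounter are invented for illustration. vfs.timestamp_precision is the knob mentioned in the thread.

```shell
#!/bin/sh
# Sketch only: report the active timecounter and whether a TSC-based one is offered.
# Assumes kern.timecounter.choice looks like: "TSC(1000) HPET(950) i8254(0)".

# Print the quality of counter $1 found in choice string $2; fail if absent.
tc_quality() {
    name=$1
    choices=$2
    for entry in $choices; do
        case $entry in
            "$name("*")")
                q=${entry#"$name("}
                printf '%s\n' "${q%")"}"
                return 0 ;;
        esac
    done
    return 1
}

report_timecounter() {
    current=$(sysctl -n kern.timecounter.hardware) || return 1
    choices=$(sysctl -n kern.timecounter.choice) || return 1
    echo "active timecounter: $current"
    for tsc in TSC TSC-low; do
        if quality=$(tc_quality "$tsc" "$choices"); then
            echo "$tsc is available with quality $quality"
        fi
    done
    echo "vfs.timestamp_precision: $(sysctl -n vfs.timestamp_precision)"
}

# Only run the report where these sysctls exist (i.e. on FreeBSD).
if sysctl -n kern.timecounter.hardware >/dev/null 2>&1; then
    report_timecounter
fi
```

If the report shows HPET active while a TSC entry has a non-negative quality, switching is a matter of setting kern.timecounter.hardware to that name; whether that is safe depends on the CPU generation caveat described above.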
Re: savecore problem
On Monday, March 16, 2015 10:17:54 AM Brandon Allbery wrote: On Mon, Mar 16, 2015 at 9:40 AM, Michael BlackHeart amdm...@gmail.com wrote:

Hello there. I've got a problem. Recently my personal server issued a kernel panic. Then there's a dump and so on. But there's no dump information after reboot. I do not know what was really the panic cause but assume that savecore failed because of RAID. Problem - minidump was done (I saw it was) but was not recovered by savecore after reboot into /var/crash (...)
/dev/ufs/varfs /var ufs rw,noatime 2 2

Last I checked, savecore had to happen very early --- before filesystems other than / are mounted.

No, it can happen after that. What really has to happen is that you don't use swap (if you are dumping to your swap partition) before savecore runs.

-- John Baldwin
Re: HP EliteBook EFI boot failure
On Sunday, March 15, 2015 02:28:41 PM Oliver Pinter wrote: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194063 I am curious if the redzone fix I committed to the EFI loader last week might help. It was noticed because gzipped kernels were corrupted when loaded from disk, but it might generate other random corruption even in the non-gzip case. I think the chance that it helps is low, but it isn't quite zero. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: savecore problem
On Monday, March 16, 2015 11:54:52 AM Michael Jung wrote: On 2015-03-16 11:23, John Baldwin wrote: On Monday, March 16, 2015 10:17:54 AM Brandon Allbery wrote: On Mon, Mar 16, 2015 at 9:40 AM, Michael BlackHeart amdm...@gmail.com wrote:

Hello there. I've got a problem. Recently my personal server issued a kernel panic. Then there's a dump and so on. But there's no dump information after reboot. I do not know what was really the panic cause but assume that savecore failed because of RAID. Problem - minidump was done (I saw it was) but was not recovered by savecore after reboot into /var/crash (...)
/dev/ufs/varfs /var ufs rw,noatime 2 2

Last I checked, savecore had to happen very early --- before filesystems other than / are mounted.

No, it can happen after that. What really has to happen is that you don't use swap (if you are dumping to your swap partition) before savecore runs.

Can someone elaborate on not using swap as a dump device a little more? I have had instances in the past where I had issues with getting a core dump and resorted to a dedicated dump device but didn't investigate further nor have I read this as a requirement.

Typically the first swap partition is used as the dump partition. If the system writes anything out to swap before savecore runs, then it can potentially overwrite part of the core. (Note that the running kernel doesn't know that there is a core on the swap partition to try to preserve, it just sees that there is an available swap partition.) To try to minimize the chances of this happening, the dump is written at the end of the swap partition instead of the start, but that is not foolproof. Usually you don't run too many things during early boot before savecore that would cause swapping, though a fsck of a large filesystem might use quite a bit of RAM which could result in swapping.

A second question - Can USB devices be used reliably as a dump device for ZFS on boot systems?
I'm not sure if USB devices will work as a dump device or not. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: On-going laptop brightness issues
On Thursday, March 12, 2015 06:19:27 PM Kevin Oberman wrote: On Thu, Mar 12, 2015 at 12:40 PM, Adrian Chadd adr...@freebsd.org wrote: I thought jhb already mfc'ed it? -a Adrian, jhb asked that I ask you to MFC. I did so and, since you declined (as is your right and I understand that an MFC takes a fair bit of time to do correctly), I am hoping someone who has a commit bit and some spare time will take this one. I was going to merge it, but there is another bug report where these changes broke a different system, see the followups here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=190186 Hmm, looking at the bug report about the hang itself: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193500 it seems that there is a workaround at least, but not yet a fix. However, since there is a viable workaround the gain from merging this probably outweighs the downside. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: There has to be a better way of merging /etc during a major freebsd-update
On Wednesday, March 11, 2015 10:19:33 AM Peter Olsson wrote: On Tue, Mar 10, 2015 at 10:06:37PM +, Ben Morrow wrote: Quoth Peter Olsson list-freebsd-sta...@jyborn.se:

(But I will try running freebsd-update without merging /etc, and use mergemaster -F instead. Should solve my problem.)

I'm fairly sure this won't do what you want, and in fact won't work at all, unless your /etc is identical to the stock /etc installed from the ISO. (Which it isn't, of course.) installworld specifically avoids installing the files in /etc; then, when you run mergemaster, it installs the new versions of those files into a temporary directory and merges them with the existing /etc. freebsd-update works a little differently: because it doesn't have a source tree available, it has to fetch the stock versions of the files in /etc for the release you're upgrading from, so that it can patch them to the new release and then merge the changes into your current /etc. If you tell freebsd-update to install /etc without merging it will blindly update files you haven't changed (which is probably what you want) but (I think) will fail to update the files that you have changed, because it uses binary patches which won't apply to your modified versions.

If you want a rather hackish solution, you could try something like this:
- Rename /etc to /oldetc.
- Find yourself a copy of the stock /etc for the version you are upgrading from. (tar -xpf base.txz --include /etc)
- Run freebsd-update with /etc removed from the merge list. This will (should?) give you a stock /etc for the version you are upgrading to.
- Rename /etc -> /tmp/etc, /oldetc -> /etc and run mergemaster with -t /tmp.

Obviously I would script this if I was doing more than one or two machines

Ben

I'm not really clear on what will happen if I remove /etc/ from MergeChanges in freebsd-update.conf. Will my /etc then be ignored by freebsd-update, or will my /etc be completely overwritten by freebsd-update? Anyway, your hack could be useful to me.
There are no more than about ten files I usually change in /etc, so saving the current /etc, installing a stock /etc, running freebsd-update and then running diff -r to sort out my changes could work. But I'm a little worried about removing my /etc changes from a running server.

BTW, this is kind of how etcupdate works (except that it does a full 3-way merge unlike mergemaster since it keeps the previous /etc around to compare with the new /etc and apply the diffs to the real /etc). It even has a mode to allow it to generate tarballs on the build machine that can then be used in place of having a source tree during upgrades so that freebsd-update could be changed to ship the updated bundle on each update. However, I haven't had time to look at what it would take to update freebsd-update to do this (and freebsd-update would have to include building the tarballs in its upstream build process as well).

OTOH, if you ask freebsd-update to update your source tree after each update, you can use etcupdate to manage /etc instead of using freebsd-update. (Note that starting with 10.1 and 9.3 etcupdate is in base now and new releases ship with an initial etcupdate database that matches the release ISOs.)

The (completely untested) process might go something like this:

Before your next freebsd-update run, ensure etcupdate is set up:

1) See if etcupdate already works by running 'etcupdate diff' and seeing if you get a sane diff. If you get a nice diff (without lots of noise like $FreeBSD$ changes), skip to step 3.

2) Ensure you have an up-to-date source tree with your current world. Run 'etcupdate extract'. 'etcupdate diff' should now give you a reasonable diff of your changes to /etc files. (Note that it does not show new files like /etc/fstab, just changes to files installed by a clean install.)

3) Review the output of 'etcupdate diff'. If there are local changes that are not correct, you can edit the files in /etc to reduce the diffs. If you want to restore a file to its original state, you can use 'cp /var/db/etcupdate/current/etc/foo /etc/foo' (I will someday add an 'etcupdate revert' command for this).

4) Ensure that freebsd-update is set to update your source tree on each update and to not do any /etc merges.

After your next run of 'freebsd-update', run 'etcupdate' to merge in any changes to '/etc'. It can generally cope with simple merges similar to 'svn up'. If it encounters a conflict, it saves off a copy of the file with conflict markers for you to resolve via 'etcupdate resolve' but leaves the old file in /etc untouched until you resolve the conflict.

-- John Baldwin
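The etcupdate workflow described in this message can be sketched as a few shell helpers. This is only an illustration built from the commands named in the message itself; the function names and the RUN dry-run switch are hypothetical, and nothing here has been tested against a real upgrade. Step 4 (the freebsd-update configuration change) is a manual edit and is not shown.

```shell
#!/bin/sh
# Sketch of the etcupdate workflow from the message above.
# RUN is a hypothetical dry-run switch: by default commands are only printed.
# Set RUN="" in the environment to really execute them on a FreeBSD host.
RUN=${RUN-echo}

# Step 1: see whether etcupdate already produces a sane diff.
etc_check() {
    $RUN etcupdate diff
}

# Step 2: seed the etcupdate database from an up-to-date source tree,
# then look at the diff again.
etc_seed() {
    $RUN etcupdate extract
    $RUN etcupdate diff
}

# Step 3: restore a single file (path relative to /etc) to its stock state.
etc_restore() {
    $RUN cp "/var/db/etcupdate/current/etc/$1" "/etc/$1"
}

# After freebsd-update has run: merge changes, then resolve any conflicts.
etc_merge() {
    $RUN etcupdate
    $RUN etcupdate resolve
}
```

With the default dry-run setting, calling `etc_restore rc.conf` just prints the cp command it would run, which makes it easy to review the sequence before trying it on a live /etc.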
Re: Suspected libkvm infinite loop
On Wednesday, March 11, 2015 02:00:41 PM Nick Frampton wrote: On 11/03/15 07:59, Mark Johnston wrote: On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:

Often loops using libkvm are due to programs trying to read kernel data structures while they are changing. However, if you use sysctls to fetch this data instead, you should be able to get a stable snapshot of the system state without getting stuck in a possible loop. I believe for libkvm to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and /dev/null for the core image.

In our code, we're invoking kvm_openfiles as you suggest:
kd = kvm_openfiles (NULL, _PATH_DEVNULL, NULL, O_RDONLY, errbuf)

It sounds like this issue might be the one fixed in r272566: if the KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an sbuf error return value could bubble up and be treated as ERESTART, resulting in a loop. This can be confirmed with something like
dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
If the output consists solely of __sysctl, this bug is likely the culprit.

Unfortunately, I accidentally killed fstat this morning before I could do any further debug. I ran truss -p on it yesterday and it was spinning solely on __sysctl. I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the problem in a reasonable time frame so it could be days or weeks before we see it happen again.

The truss output is consistent with Mark's suggestion, so I would try his suggested fix of 272566.

-- John Baldwin
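The DTrace check suggested in this message can be wrapped in a small helper so the pid is validated before dtrace is started. The D script is the one quoted in the thread; the function name syscall_histogram is made up for this sketch.

```shell
#!/bin/sh
# Sketch: count system calls made by one process over three seconds.
# syscall_histogram is a hypothetical helper name; the D script comes from the thread.
syscall_histogram() {
    pid=$1
    case $pid in
        ''|*[!0-9]*)
            # Reject anything that is not a plain decimal pid.
            echo "usage: syscall_histogram <pid>" >&2
            return 2 ;;
    esac
    dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p "$pid"
}
```

If the only probe function in the resulting histogram is __sysctl, that matches the symptom described for the bug fixed in r272566.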
Re: Suspected libkvm infinite loop
On Thursday, March 12, 2015 12:40:23 PM Konstantin Belousov wrote: On Wed, Mar 11, 2015 at 09:34:07PM -0700, Mark Johnston wrote: On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote: On 12/03/15 00:38, John Baldwin wrote:

It sounds like this issue might be the one fixed in r272566: if the KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an sbuf error return value could bubble up and be treated as ERESTART, resulting in a loop. This can be confirmed with something like
dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
If the output consists solely of __sysctl, this bug is likely the culprit.

Unfortunately, I accidentally killed fstat this morning before I could do any further debug. I ran truss -p on it yesterday and it was spinning solely on __sysctl. I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the problem in a reasonable time frame so it could be days or weeks before we see it happen again.

The truss output is consistent with Mark's suggestion, so I would try his suggested fix of 272566.

I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely to be MFCed back to 10-stable?

I can't see any reason it shouldn't be, and there was an MFC reminder in the commit log entry for that revision. I've cc'ed kib@, who might have a reason.

The mentioned commit depends on r271976, in fact it depends on the series of commits, including r271486 and r271489. I did not merge r271976 with manual resolution of the conflicts, since it means that the work done for HEAD needs to be redone for stable/10 to ensure that all cases are covered. Later, when the mentioned series is merged, the work should be redone once more. And to note, r271489 is not trivially mergeable as well, just checked.

You could merge r272566 and just fixup the sbuf_bcat() in export_fd_to_sb() in kern_descrip.c instead.
I hadn't really considered fo_fill_kinfo to be something that was mergeable to 10. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Suspected libkvm infinite loop
On Tuesday, March 10, 2015 10:17:07 AM Nick Frampton wrote:

Hi, For the past several months, we have had an intermittent problem where a process calling kvm_openfiles(3) or kvm_getprocs(3) (not sure which) gets stuck in an infinite loop and goes to 100% cpu. We have just observed fstat -m do the same thing and suspect it may be the same problem. Our environment is a 10.1-RELEASE-p6 amd64 guest running in VirtualBox, with ufs root and zfs /home. Has anyone else experienced this? Is there anything we can do to investigate the problem further?

Often loops using libkvm are due to programs trying to read kernel data structures while they are changing. However, if you use sysctls to fetch this data instead, you should be able to get a stable snapshot of the system state without getting stuck in a possible loop. I believe for libkvm to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and /dev/null for the core image.

fstat -m should be doing that by default however, so if it is not that, can you ktrace fstat when it is spinning to see if it is spinning in userland or in the kernel? If you see no activity via ktrace, then it is spinning in one of the two places without making any system calls, etc. You can attach to it with gdb to pause it, then see where gdb thinks it is. If gdb hangs attaching to it, then it is stuck in the kernel. If gdb attaches to it ok, then it is spinning in userland.

Unfortunately, for gdb to be useful, you really need debug symbols. We don't currently provide those for release binaries or binaries provided via freebsd-update (though that is being worked on for 11.0). If you build from source, then the simplest way to get this is to add 'WITH_DEBUG_FILES=yes' to /etc/src.conf and rebuild your world without NO_CLEAN. If you are building from source and are able to reproduce with those binaries, then after attaching to the process with gdb, use 'bt' to see where it is hung and reply with that.
If it is hanging in the kernel, then you will need to use the kernel debugger to see where it is hanging. The simplest way to do this is probably to force a crash via the debug.kdb.panic sysctl (set it to a non-zero value). You will then need to fire up kgdb on the crash dump after it reboots, switch to the fstat process via the 'proc pid' command and get a backtrace via 'bt'. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
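The debugging sequence laid out in this message (ktrace first, then gdb for userland, then a forced panic and kgdb for the kernel) could be scripted roughly as follows. This is a hedged sketch: the function names and the RUN dry-run switch are hypothetical, the ktrace/kdump invocation and the five-second sampling window are my assumptions about a reasonable way to capture activity, and the paths passed in are the caller's responsibility. The 'proc <pid>' and 'bt' commands are typed interactively inside kgdb or gdb and are not automated here.

```shell
#!/bin/sh
# Sketch of the triage steps from the message above.
# RUN is a hypothetical dry-run switch: by default commands are only printed.
# Set RUN="" in the environment to really execute them.
RUN=${RUN-echo}

# Trace system calls of pid $1 for a few seconds, then dump the trace.
# No output from kdump suggests the process is spinning without syscalls.
trace_syscalls() {
    $RUN ktrace -p "$1"
    $RUN sleep 5
    $RUN ktrace -c -p "$1"
    $RUN kdump
}

# Attach gdb to a running process: $1 = path to binary, $2 = pid.
# If the attach hangs, the process is stuck in the kernel; otherwise use 'bt'.
attach_userland() {
    $RUN gdb "$1" "$2"
}

# Force a crash dump so the kernel side can be inspected after reboot.
force_panic() {
    $RUN sysctl debug.kdb.panic=1
}

# Open a saved dump: $1 = path to vmcore. Then use 'proc <pid>' and 'bt'.
open_dump() {
    $RUN kgdb /boot/kernel/kernel "$1"
}
```

Leaving RUN at its default prints each command instead of running it, which is the safer way to read through the sequence given that force_panic really does take the machine down.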
Re: 9.2-PRERELEASE #0 r254557 amd64: core dump on shutdown
On Thursday, September 12, 2013 1:29:40 am Marko Cupać wrote: On Wed, 11 Sep 2013 11:11:24 -0400 John Baldwin j...@freebsd.org wrote: Is this reproducible? It happened a few times before (maybe 3-4 times this year), but I can't reproduce it intentionally. Hmm, I'm tempted to chalk this up to a hardware failure then. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: 9.2-PRERELEASE #0 r254557 amd64: core dump on shutdown
On Tuesday, September 10, 2013 10:50:55 am Marko Cupać wrote:

My 9.2-PRERELEASE #0 r254557 amd64 just dumped core on shutdown. I updated src to Last Changed Rev: 255395 two days ago but did not get to rebuild world/kernel. Also I did not rebuild any ports since. Virtualbox was not running.

pacija@kaa:/var/crash % uname -a
FreeBSD kaa.mimar.rs 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0 r254557: Sun Aug 25 22:44:52 CEST 2013 pac...@kaa.mimar.rs:/usr/obj/usr/src/sys/KAAGEN amd64
pacija@kaa:/var/crash % sudo cat core.txt.2
kaa.mimar.rs dumped core - see /var/crash/vmcore.2
Tue Sep 10 16:41:45 CEST 2013
FreeBSD kaa.mimar.rs 9.2-PRERELEASE FreeBSD 9.2-PRERELEASE #0 r254557: Sun Aug 25 22:44:52 CEST 2013 pac...@kaa.mimar.rs:/usr/obj/usr/src/sys/KAAGEN amd64
panic: page fault

Is this reproducible?

#6 0x80cdc843 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#7 0x80b71085 in swapoff_one (sp=0xfe0006296600, cred=0xfe00037a0e00) at /usr/src/sys/vm/swap_pager.c:1753

Relevant line is:
1753    for (swap = swhash[i]; swap != NULL; swap = swap->swb_hnext) {

-- John Baldwin
Re: unexpected idprio 31 behavior on 9.2-BETA2 and 9.2-RC1
On Thursday, August 08, 2013 10:41:12 am Eric van Gyzen wrote: On 08/08/2013 09:19, Eric van Gyzen wrote: On 08/06/2013 14:23, J David wrote: On Tue, Aug 6, 2013 at 1:59 PM, Eric van Gyzen e...@vangyzen.net wrote:

on an otherwise idle amd64 system with 4 CPUs. The first command in the build.log file: rm -rf /usr/obj/home/freebsd/tmp took over three minutes. It should have taken about three /seconds/. uptime reported a load average of around 1.00. top showed no threads (user or kernel) using CPU. iostat showed an average of less than 20 tps on ada0. rm was usually in the RUN state.

We are looking at something similar. Would you be able to try to reproduce it using a kernel with:
nooptions SCHED_ULE
options SCHED_4BSD
to see if it makes a difference? It seems to, but the problem is inconsistent enough that I can't be sure.

The 4BSD scheduler does //not// exhibit this problem. I tested with the latest releng/9.2 (r254054) and an otherwise GENERIC config.

To be thorough, I built a GENERIC kernel at the same rev, and it still exhibits the problem.

Please try this change:

Index: sched_ule.c
===================================================================
--- sched_ule.c (revision 255020)
+++ sched_ule.c (working copy)
@@ -243,7 +243,7 @@ struct tdq {
 	int	tdq_transferable;	/* Transferable thread count. */
 	short	tdq_switchcnt;		/* Switches this tick. */
 	short	tdq_oldswitchcnt;	/* Switches last tick. */
-	u_char	tdq_lowpri;		/* Lowest priority thread. */
+	u_short	tdq_lowpri;		/* Lowest priority thread. */
 	u_char	tdq_ipipending;		/* IPI pending. */
 	u_char	tdq_idx;		/* Current insert index. */
 	u_char	tdq_ridx;		/* Current removal index. */
@@ -2323,7 +2323,7 @@ sched_choose(void)
 		tdq->tdq_lowpri = td->td_priority;
 		return (td);
 	}
-	tdq->tdq_lowpri = PRI_MAX_IDLE;
+	tdq->tdq_lowpri = PRI_MAX_IDLE + 1;
 	return (PCPU_GET(idlethread));
 }

-- John Baldwin
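One plausible reading of why the patch in this message widens tdq_lowpri from u_char to u_short: if PRI_MAX_IDLE is already the largest value an 8-bit field can hold, then PRI_MAX_IDLE + 1 would wrap to zero. The thread does not state the value of PRI_MAX_IDLE; the 255 used below is an assumption, and the helper names are made up. The arithmetic itself can be checked in plain shell:

```shell
#!/bin/sh
# Sketch: what happens to PRI_MAX_IDLE + 1 in an 8-bit versus a 16-bit field.
# PRI_MAX_IDLE=255 is an ASSUMPTION for illustration, not taken from the thread.
PRI_MAX_IDLE=255

# Truncate a value the way storing it in an unsigned 8-bit field would.
as_u_char() {
    echo $(( $1 & 0xff ))
}

# Truncate a value the way storing it in an unsigned 16-bit field would.
as_u_short() {
    echo $(( $1 & 0xffff ))
}

echo "u_char  holds: $(as_u_char  $(( PRI_MAX_IDLE + 1 )))"
echo "u_short holds: $(as_u_short $(( PRI_MAX_IDLE + 1 )))"
```

Under that assumption the 8-bit field would read back as the numerically lowest value rather than as something just past the idle range, which is why both hunks of the patch would have to go in together.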
Re: Why are cardbus drivers cbb(4) and pccard(4) still included in GENERIC?
On Thursday, August 29, 2013 6:56:53 am Adrian Chadd wrote: Hm! Are they dynamically loaded if you insert the cards? (Ie, has devd been taught about them as appropriate?) These are drivers for the bridges, not for cards you plug into the bridges. If you autoloaded them at all you would load them during boot when you saw an appropriate PCI device. Currently we don't autoload any PCI drivers, so I don't think that should be a blocker for taking these out of GENERIC. Warner is probably the best person to ask. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: [REGRESSION] Root zpool mounting broken between 06/30/2013 and 07/21/2013 when PS/2 support compiled into the kernel
On Monday, July 22, 2013 10:30:32 am Garrett Cooper wrote: I have a KERNCONF that previously had PS/2 support compiled into the kernel. If I comment out the following lines like so: # atkbdc0 controls both the keyboard and the PS/2 mouse #device atkbdc # AT keyboard controller #device atkbd # AT keyboard then I'm able to mount root again (it was failing with ENOXDEV). The working kernel was as follows: $ strings /boot/kernel.WORKING/kernel | grep -B 2 -A 2 BAYONETTA @(#)FreeBSD 9.1-STABLE #7 r+0304216: Sun Jun 30 15:22:55 PDT 2013 FreeBSD 9.1-STABLE #7 r+0304216: Sun Jun 30 15:22:55 PDT 2013 gcooper@bayonetta.local:/usr/obj/scratch/git/github/yaneurabeya-freebsd-stable-9/sys/BAYONETTA gcc version 4.2.1 20070831 patched [FreeBSD] FreeBSD 9.1-STABLE BAYONETTA $ cd /usr/src; git log 0304216 commit 03042167f73c213732b44218a24d8e1bbea00f8c Merge: 2edcad2 974abfb Author: Garrett Cooper yaneg...@gmail.com Date: Mon Jun 24 19:00:45 2013 -0700 Merge remote-tracking branch 'upstream/stable/9' into stable/9 The working kernel [with atkbdc] was as follows: FreeBSD bayonetta.local 9.2-BETA1 FreeBSD 9.2-BETA1 #12 r+c178034: Sun Jul 21 20:19:38 PDT 2013 root@bayonetta.local:/usr/obj/scratch/git/github/yaneurabeya-freebsd-stable-9/sys/BAYONETTA amd64 $ git log c178034 commit c17803445f4ffb97e1a46a1be5f7ea04692793f0 Author: avg a...@freebsd.org Date: Tue Jul 9 08:30:31 2013 + zfsboottest.sh: remove checks for things that are not strictly required MFC after: 10 days (Yes, I had to backport some things because they are busted on stable/9 due to other incomplete/missing MFCs). I can test out patches, but I don't have time to bisect the actual commit that caused the failure. That being said my intuition says it's this commit should be looked at first: commit 28f961058b0667841d7e9d8639bfd02ed8689faa Author: jhb j...@freebsd.org Date: Wed Jul 17 14:04:18 2013 + MFC 252576: Don't perform the acpi_DeviceIsPresent() check for PCI-PCI bridges. 
If we are probing a PCI-PCI bridge it is because we found one by enumerating the devices on a PCI bus, so the bridge is definitely present. A few BIOSes report incorrect status (_STA) for some bridges that claimed they were not present when in fact they were. While here, move this check earlier for Host-PCI bridges so attach fails before doing any work that needs to be torn down. PR: kern/91594 Approved by:re (marius) I strongly doubt that this is related. It would be most helpful if you could obtain a dmesg from the new kernel however (perhaps via a serial console) to rule it out. All you would need to see is if the new kernel sees more pcib devices than the old one to see if this change even has an effect on your system. -- John Baldwin ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: syncer causing latency spikes
On Wednesday, July 17, 2013 3:18:52 pm Konstantin Belousov wrote: On Wed, Jul 17, 2013 at 02:07:55PM -0400, Mark Johnston wrote:

During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are happening up to 8 or 10 times a second. When this happens, a bunch of postgres threads become blocked in vn_write() waiting for the vnode lock to become free. It looks like the write-clustering code is limited to using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf to anything greater than 256.

Syncer is probably just a victim of profiling. Had postgres called fsync(2), you would then blame the fsync code for the pauses. Just add a tunable to allow the user to manually tune the nswbuf, regardless of the buffer cache sizing. And yes, nswbuf default max probably should be bumped to something like 1024, at least on 64bit architectures which do not starve for kernel memory.

Also, if you are seeing I/O stalls with mfi(4), then you might need a firmware update for your mfi(4) controller. cc'ing smh@ who knows more about that particular issue (IIRC).

-- John Baldwin
Re: locks under printf(9) and WITNESS = panic?
On Saturday, June 29, 2013 9:19:24 pm Steven Hartland wrote:

when booting stable/9 under a debug kernel with WITNESS enabled and verbose I get the following panic.. It seems very much like the discussion from a year back on current: http://lists.freebsd.org/pipermail/freebsd-current/2012-January/031375.html Any ideas?

Yeah, that lock needs to be MTX_RECURSE (the cnputs_mtx). However, it only recurses under witness. *sigh*

-- John Baldwin
Re: USB ports on Lenovo T400 do not work after a suspend/resume
On Sunday, June 30, 2013 10:22:09 am Ian Smith wrote: On Sat, 29 Jun 2013, Adrian Chadd wrote: On 27 June 2013 04:58, Ian Smith smi...@nimnet.asn.au wrote: We don't yet know if this is a bus, ACPI /or USB issue. Home yet? :) Yup: http://people.freebsd.org/~adrian/usb/ dmesg.boot = dmesg at startup 1 - after powerup, usb device in 2 - after acpiconf -s3 suspend/resume, w/ a USB device plugged in 3 - after acpiconf -s3 suspend/resume, with a USB device removed before suspend/resume After removing [numbers] (for WITNESS?), diff started making sense. The below is between the first and second suspend/resume cycles in dmesg-3.txt, encompassing the others. Nothing of note that I can see, if that usb hub-to-bus remapping is normal. As you said, 'CPU0: local APIC error 0x40' looks maybe sus. Maybe someone who knows might comment on that?
From sys/amd64/include/apicreg.h:
/* fields in ESR */
#define	APIC_ESR_SEND_CS_ERROR		0x0001
#define	APIC_ESR_RECEIVE_CS_ERROR	0x0002
#define	APIC_ESR_SEND_ACCEPT		0x0004
#define	APIC_ESR_RECEIVE_ACCEPT		0x0008
#define	APIC_ESR_SEND_ILLEGAL_VECTOR	0x0020
#define	APIC_ESR_RECEIVE_ILLEGAL_VECTOR	0x0040
#define	APIC_ESR_ILLEGAL_REGISTER	0x0080
Receive illegal vector (if you look in Intel's SDM manuals) means it got an interrupt vector < 32 (probably zero). Perhaps it asserted an interrupt in an I/O APIC before the I/O APIC was properly reset? Are you using MSI at all? -- John Baldwin
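For anyone wanting to decode other 'local APIC error' values against the ESR fields quoted above, here is a minimal sketch. The bit values are the ones from the apicreg.h excerpt; the table and the esr_decode() helper are hypothetical names made up for illustration and are not part of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ESR bit values as quoted from apicreg.h in the message above. */
#define APIC_ESR_SEND_CS_ERROR          0x0001
#define APIC_ESR_RECEIVE_CS_ERROR       0x0002
#define APIC_ESR_SEND_ACCEPT            0x0004
#define APIC_ESR_RECEIVE_ACCEPT         0x0008
#define APIC_ESR_SEND_ILLEGAL_VECTOR    0x0020
#define APIC_ESR_RECEIVE_ILLEGAL_VECTOR 0x0040
#define APIC_ESR_ILLEGAL_REGISTER       0x0080

/* Hypothetical table: pairs each mask with a printable name. */
static const struct {
	uint32_t mask;
	const char *name;
} esr_bits[] = {
	{ APIC_ESR_SEND_CS_ERROR,          "SEND_CS_ERROR" },
	{ APIC_ESR_RECEIVE_CS_ERROR,       "RECEIVE_CS_ERROR" },
	{ APIC_ESR_SEND_ACCEPT,            "SEND_ACCEPT" },
	{ APIC_ESR_RECEIVE_ACCEPT,         "RECEIVE_ACCEPT" },
	{ APIC_ESR_SEND_ILLEGAL_VECTOR,    "SEND_ILLEGAL_VECTOR" },
	{ APIC_ESR_RECEIVE_ILLEGAL_VECTOR, "RECEIVE_ILLEGAL_VECTOR" },
	{ APIC_ESR_ILLEGAL_REGISTER,       "ILLEGAL_REGISTER" },
};

/*
 * Hypothetical helper (not a kernel function): write the names of the
 * bits set in 'esr' into 'buf', comma-separated, and return how many
 * names were written.  A name that does not fit is dropped.
 */
int
esr_decode(uint32_t esr, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(esr_bits) / sizeof(esr_bits[0]); i++) {
		if ((esr & esr_bits[i].mask) == 0)
			continue;
		int w = snprintf(buf + used, len - used, "%s%s",
		    count > 0 ? "," : "", esr_bits[i].name);
		if (w < 0 || (size_t)w >= len - used) {
			buf[used] = '\0';	/* discard the partial name */
			break;
		}
		used += (size_t)w;
		count++;
	}
	return (count);
}
```

Calling esr_decode(0x40, buf, sizeof(buf)) fills buf with RECEIVE_ILLEGAL_VECTOR, which is the bit discussed for 'local APIC error 0x40'.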
Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
On Sunday, June 16, 2013 2:39:42 am Andre Albsmeier wrote: On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote: On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote: Each day at 5:15 we are generating snapshots on various machines. This used to work perfectly under 7-STABLE for years but since we started to use 9.1-STABLE the machine reboots in about 10% of all cases. After rebooting we find a new snapshot file which is a bit smaller than the good ones and with different permissions It does not succeed a fsck. In this example it is the one whose name is beginning with s3: -r--r- 1 root operator snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04 -r 1 root operator snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03 -r--r- 1 root operator snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44 -r--r- 1 root operator snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03 -r--r- 1 root operator snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03 After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel I see the following LORs (mksnap_ffs starts exactly at 5:15): May 29 05:15:00 kern.crit palveli kernel: lock order reversal: May 29 05:15:00 kern.crit palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240 May 29 05:15:00 kern.crit palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414 May 29 05:15:04 kern.crit palveli kernel: lock order reversal: May 29 05:15:04 kern.crit palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976 May 29 05:15:04 kern.crit palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626 Unfortunatley no corefiles are being generated ;-(. I have checked and even rebuilt the (UFS1) fs in question from scratch. I have also seen this happen on an UFS2 on another machine and on a third one when running dump -L on a root fs. Any hints of how to proceed? 
Would it be possible to set up a serial console that is logged on this machine to see if it is panic'ing but failing to write out a crashdump? Couldn't attach the serial console yet ;-(. But I had people attach a KVMoverIP switch and enabled the various KDB options in the kernel. Now we can see a bit more (see below) -- no crashdump is being generated though. :( Unfortunately these LORs don't really help with discerning the cause of the reboot. If you have remote power access (and still wanted to test this) one option would be to change KDB to drop into the debugger on a panic. Then you could connect over the KVM and take images of the original panic along with a stack trace. -- John Baldwin
Re: ACPI Warning, then hang
On Monday, June 10, 2013 10:18:47 pm Bryce Edwards wrote: Verbose boot: https://www.dropbox.com/s/obm8rtavro68ea8/acpi-verbose.jpg That is odd. I had expected it to output some other messages. Hmm, the line two lines up shows your RSDP (list of ACPI tables) seems to be garbage as well. I think the BIOS is just broken I'm afraid. :( -- John Baldwin
Re: zpool labelclear destroys GPT data
On Friday, June 14, 2013 4:21:08 am Daniel O'Connor wrote: On 14/06/2013, at 17:48, Alban Hertroys haram...@gmail.com wrote: IMHO it would be helpful to verify what's there first and warn the user about it if such an operation will overwrite a different type of label than what is about to get written there. Perhaps it should even refuse to write (by issuing an error stating that there is already a label there - and preferably also what type) until the label that's already there gets explicitly cleared by the user or until the command gets forced. Does that make sense? The problem with this is that then each label tool needs to know about every other label format you want to detect for.. If a label format has a checksum then you could ignore a request to nuke the label if there is no valid checksum (with a flag to force). No idea how many have checksums though.. Well, you could have zpool check if there is a valid ZFS label and prompt/warn if it doesn't find one on whatever device it's about to wipe. That doesn't fix the gmirror/gpt case, but it might make zpool more intuitive to use. -- John Baldwin
Re: Reproducable Infiniband panic
On Monday, June 10, 2013 8:04:12 am Julian Stecklina wrote: On 06/07/2013 06:06 PM, John Baldwin wrote: On Friday, June 07, 2013 5:07:34 am Julian Stecklina wrote: On 06/06/2013 08:57 PM, John Baldwin wrote: On Thursday, June 06, 2013 9:54:35 am Andriy Gapon wrote: [...] The problem seems to be in incorrect interaction between devfs_close_f and linux_file_dtor. The latter expects curthread->td_fpop to have a valid reasonable value. But the former sets curthread->td_fpop to fp only around vnops.fo_close() call and then restores it back to some (what?) previous value before calling devfs_fpdrop->devfs_destroy_cdevpriv. In this case the previous value is NULL. It is normally NULL in this case. Why does linux_file_dtor even look at td_fpop? Ah. I think it should not do that and make the data it uses in the dtor more self-contained: [...] Seems to fix my panic. Thanks! Can you please retest this updated version? I had thought that I didn't need a reference count on the vnode, but devfs drops its reference count before the cdevpriv destructor is called.
Index: sys/ofed/include/linux/fs.h
===================================================================
--- sys/ofed/include/linux/fs.h	(revision 251604)
+++ sys/ofed/include/linux/fs.h	(working copy)
@@ -73,6 +73,7 @@
 	struct dentry	f_dentry_store;
 	struct selinfo	f_selinfo;
 	struct sigio	*f_sigio;
+	struct vnode	*f_vnode;
 };
 
 #define	file	linux_file
Index: sys/ofed/include/linux/linux_compat.c
===================================================================
--- sys/ofed/include/linux/linux_compat.c	(revision 251604)
+++ sys/ofed/include/linux/linux_compat.c	(working copy)
@@ -212,7 +212,8 @@
 	struct linux_file *filp;
 
 	filp = cdp;
-	filp->f_op->release(curthread->td_fpop->f_vnode, filp);
+	filp->f_op->release(filp->f_vnode, filp);
+	vdrop(filp->f_vnode);
 	kfree(filp);
 }
 
@@ -232,6 +233,8 @@
 	filp->f_dentry = &filp->f_dentry_store;
 	filp->f_op = ldev->ops;
 	filp->f_flags = file->f_flag;
+	vhold(file->f_vnode);
+	filp->f_vnode = file->f_vnode;
 	if (filp->f_op->open) {
 		error = -filp->f_op->open(file->f_vnode, filp);
 		if (error) {
-- John Baldwin
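For readers following the reasoning rather than the diff, a rough userland model of the pattern may help: the open path takes its own hold on the vnode and stores the pointer in the per-open structure, so the destructor no longer depends on any per-thread state. Everything here (toy_vnode, toy_file, toy_vhold, toy_vdrop, toy_open, toy_dtor) is a hypothetical stand-in, not the kernel's vhold(9)/vdrop(9) or the OFED code itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vnode: only a hold count. */
struct toy_vnode {
	int holdcnt;
};

/* Hypothetical stand-in for struct linux_file. */
struct toy_file {
	struct toy_vnode *f_vnode;	/* held for the life of the open */
};

static void
toy_vhold(struct toy_vnode *vp)
{
	vp->holdcnt++;
}

static void
toy_vdrop(struct toy_vnode *vp)
{
	assert(vp->holdcnt > 0);
	vp->holdcnt--;
}

/* Open path: take our own hold before caching the pointer. */
struct toy_file *
toy_open(struct toy_vnode *vp)
{
	struct toy_file *fp;

	fp = malloc(sizeof(*fp));
	if (fp == NULL)
		return (NULL);
	toy_vhold(vp);
	fp->f_vnode = vp;
	return (fp);
}

/*
 * Destructor: uses only the pointer cached at open time, so it still
 * works after the opener has dropped its own reference.
 */
void
toy_dtor(struct toy_file *fp)
{
	toy_vdrop(fp->f_vnode);
	free(fp);
}
```

In this model the caller can drop its own hold between toy_open() and toy_dtor() and the count never goes negative, which mirrors why the updated patch adds the vhold()/vdrop() pair.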
Re: ACPI Warning, then hang
On Monday, June 10, 2013 10:35:07 am Jeremy Chadwick wrote: On Mon, Jun 10, 2013 at 09:18:14AM -0500, Bryce Edwards wrote: I'm getting the following warning, and then the system locks: ACPI Warning: Incorrect checksum in table [(bunch of spaces)] - 0x29, should be 0x48 Here's a pic: http://db.tt/O6dxONzI System is on a SuperMicro C7X58 motherboard that I just upgraded to BIOS 2.0a, which I would like to stay on if possible. I tried adjusting all the ACPI related BIOS settings without success. The message in question refers to hard-coded data in one of the many ACPI tables (see acpidump(8) for the list -- there are many). ACPI tables are stored within the BIOS -- the motherboard/BIOS vendor has full control over all of them and is fully 100% responsible for their content. It looks to me like they severely botched their BIOS, or somehow it got flashed wrong. You need to contact Supermicro Technical Support and tell them of the problem. They need to either fix their BIOS, or help figure out what's become corrupted. You can point them to this thread if you'd like. I should note that the corruption/issue is major enough that you are missing very key/important lines from your dmesg (after avail memory but before kdbX at kdbmuxX, which come from pure reliance upon ACPI. Lines such as: Event timer LAPIC quality 400 ACPI APIC Table: PTLTDAPIC FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 ioapic0 Version 2.0 irqs 0-23 on motherboard ioapic1 Version 2.0 irqs 24-47 on motherboard In the meantime, you can try booting without ACPI support (there should be a boot-up menu option for that) and pray that works. If it doesn't, then your workaround is to roll back to an older BIOS version and/or put pressure on Supermicro. You will find their Technical Support folks are quite helpful/responsive to technical issues. 
Good luck and keep us posted on what transpires. Actually, that message is mostly harmless. All sorts of vendors ship tables with busted checksums that are in fact fine. :( However, the table name looks very odd which is more worrying. Booting without ACPI enabled would be a good first step. Trying a verbose boot to capture the last message before the hang would also be useful. -- John Baldwin
Re: Reproducable Infiniband panic
On Friday, June 07, 2013 5:07:34 am Julian Stecklina wrote: On 06/06/2013 08:57 PM, John Baldwin wrote: On Thursday, June 06, 2013 9:54:35 am Andriy Gapon wrote: [...] The problem seems to be in incorrect interaction between devfs_close_f and linux_file_dtor. The latter expects curthread->td_fpop to have a valid reasonable value. But the former sets curthread->td_fpop to fp only around vnops.fo_close() call and then restores it back to some (what?) previous value before calling devfs_fpdrop->devfs_destroy_cdevpriv. In this case the previous value is NULL. It is normally NULL in this case. Why does linux_file_dtor even look at td_fpop? Ah. I think it should not do that and make the data it uses in the dtor more self-contained:
Index: sys/ofed/include/linux/linux_compat.c
===================================================================
--- linux_compat.c	(revision 251465)
+++ linux_compat.c	(working copy)
@@ -212,7 +212,7 @@ linux_file_dtor(void *cdp)
 	struct linux_file *filp;
 
 	filp = cdp;
-	filp->f_op->release(curthread->td_fpop->f_vnode, filp);
+	filp->f_op->release(filp->f_vnode, filp);
 	kfree(filp);
 }
 
@@ -232,6 +232,7 @@ linux_dev_open(struct cdev *dev, int oflags, int d
 	filp->f_dentry = &filp->f_dentry_store;
 	filp->f_op = ldev->ops;
 	filp->f_flags = file->f_flag;
+	filp->f_vnode = file->f_vnode;
 	if (filp->f_op->open) {
 		error = -filp->f_op->open(file->f_vnode, filp);
 		if (error) {
Doesn't compile for me. Did you forget to add the f_vnode member to struct linux_file?
sys/ofed/include/linux/linux_compat.c: In function 'linux_file_dtor':
sys/ofed/include/linux/linux_compat.c:214: error: 'struct linux_file' has no member named 'f_vnode'
sys/ofed/include/linux/linux_compat.c: In function 'linux_dev_open':
sys/ofed/include/linux/linux_compat.c:234: error: 'struct linux_file' has no member named 'f_vnode'
Oof it's in another header:
Index: sys/ofed/include/linux/fs.h
===================================================================
--- fs.h	(revision 251494)
+++ fs.h	(working copy)
@@ -73,6 +73,7 @@ struct linux_file {
 	struct dentry	f_dentry_store;
 	struct selinfo	f_selinfo;
 	struct sigio	*f_sigio;
+	struct vnode	*f_vnode;
 };
 
 #define	file	linux_file
-- John Baldwin
Re: Reproducable Infiniband panic
On Thursday, June 06, 2013 9:54:35 am Andriy Gapon wrote: on 06/06/2013 14:48 Julian Stecklina said the following:
#7 0x807a3d83 in linux_file_dtor (cdp=0xfe000aeabb80) at /usr/home/julian/src/freebsd/sys/ofed/include/linux/linux_compat.c:214
	filp = (struct linux_file *) 0xfe000aeabb80
#8 0x80513c39 in devfs_destroy_cdevpriv (p=0xfe0005772980) at /usr/home/julian/src/freebsd/sys/fs/devfs/devfs_vnops.c:159
	No locals.
#9 0x80513e47 in devfs_close_f (fp=0xfe000b0e9aa0, td=<value optimized out>) at /usr/home/julian/src/freebsd/sys/fs/devfs/devfs_vnops.c:619
	error = 0
	fpop = (struct file *) 0x0
The problem seems to be in incorrect interaction between devfs_close_f and linux_file_dtor. The latter expects curthread->td_fpop to have a valid reasonable value. But the former sets curthread->td_fpop to fp only around vnops.fo_close() call and then restores it back to some (what?) previous value before calling devfs_fpdrop->devfs_destroy_cdevpriv. In this case the previous value is NULL. It is normally NULL in this case. Why does linux_file_dtor even look at td_fpop? Ah. I think it should not do that and make the data it uses in the dtor more self-contained:
Index: sys/ofed/include/linux/linux_compat.c
===================================================================
--- linux_compat.c	(revision 251465)
+++ linux_compat.c	(working copy)
@@ -212,7 +212,7 @@ linux_file_dtor(void *cdp)
 	struct linux_file *filp;
 
 	filp = cdp;
-	filp->f_op->release(curthread->td_fpop->f_vnode, filp);
+	filp->f_op->release(filp->f_vnode, filp);
 	kfree(filp);
 }
 
@@ -232,6 +232,7 @@ linux_dev_open(struct cdev *dev, int oflags, int d
 	filp->f_dentry = &filp->f_dentry_store;
 	filp->f_op = ldev->ops;
 	filp->f_flags = file->f_flag;
+	filp->f_vnode = file->f_vnode;
 	if (filp->f_op->open) {
 		error = -filp->f_op->open(file->f_vnode, filp);
 		if (error) {
-- John Baldwin
Re: FreeBSD-9.1: machine reboots during snapshot creation, LORs found
On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote: Each day at 5:15 we are generating snapshots on various machines. This used to work perfectly under 7-STABLE for years but since we started to use 9.1-STABLE the machine reboots in about 10% of all cases. After rebooting we find a new snapshot file which is a bit smaller than the good ones and with different permissions It does not succeed a fsck. In this example it is the one whose name is beginning with s3: -r--r- 1 root operator snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04 -r 1 root operator snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03 -r--r- 1 root operator snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44 -r--r- 1 root operator snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03 -r--r- 1 root operator snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03 After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel I see the following LORs (mksnap_ffs starts exactly at 5:15): May 29 05:15:00 kern.crit palveli kernel: lock order reversal: May 29 05:15:00 kern.crit palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240 May 29 05:15:00 kern.crit palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414 May 29 05:15:04 kern.crit palveli kernel: lock order reversal: May 29 05:15:04 kern.crit palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976 May 29 05:15:04 kern.crit palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626 Unfortunatley no corefiles are being generated ;-(. I have checked and even rebuilt the (UFS1) fs in question from scratch. I have also seen this happen on an UFS2 on another machine and on a third one when running dump -L on a root fs. Any hints of how to proceed? Would it be possible to setup a serial console that is logged on this machine to see if it is panic'ing but failing to write out a crashdump? 
-- John Baldwin
Re: SunFire X2200 ilo's bge1 DOWN/UP
On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
Index: sys/dev/bge/if_bge.c
===================================================================
--- sys/dev/bge/if_bge.c	(revision 251021)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
 
 	BGE_LOCK(sc);
 
+	if ((ifp->if_flags & IFF_UP) == 0) {
+		BGE_UNLOCK(sc);
+		return;
+	}
 	if (sc->bge_flags & BGE_FLAG_TBI) {
 		ifmr->ifm_status = IFM_AVALID;
 		ifmr->ifm_active = IFM_ETHER;
after 18hs, the logs are empty! it seems the patch fixes the problem. now maybe it's time to hunt for who is randomly calling for bge_ifmedia_sts ... It could be any number of daemons that query interface state such as an SNMP server, ladvd, etc. If you wanted help you could modify the patch so that it does something like this:
	if (/* test for IFF_UP */) {
		BGE_UNLOCK(sc);
		if_printf(ifp, "state queried on down interface by pid %d (%s)",
		    curthread->td_proc->p_pid, curthread->td_proc->p_comm);
		return;
	}
-- John Baldwin
Re: System doesn't dump
On Wednesday, May 29, 2013 2:41:38 am Dominic Fandrey wrote: I have a number of actions that reliably panic the system, such as performing shutdown -p (yes I'm booting into an inconsistent file system every time). Both with my notebook and my workstation. However I cannot get the system to dump. dumpdir=/var/crash and I've tried ada0s2b, /dev/ada0s2b, label/5swap, /dev/label/5swap and AUTO for dumpdev to no avail. The swap partition is 16g, the machines have 8g RAM and there's plenty of hard disk space available for /var/crash. I'm looking for that secret, undocumented trigger, that makes the system dump if a panic occurs. Once upon a time dumping just worked if the swap partition was large enough. I miss those olden days. Does /dev/dumpdev exist and point to your swap partition after booting? -- John Baldwin
Re: 9.1-REL Supermicro H8DCL-iF kernel panic
On Monday, April 01, 2013 12:29:46 pm Xin Li wrote: Yes, this is a bandaid and the right fix should be refactor the code a little bit to make sure that no interrupt handler is installed before the driver have done other initializations but I don't have hardware that can reproduce this issue handy to validate changes like that. It is not that easy. I instrumented the crap out of the igb driver on the one machine where I could reliably reproduce this and kept clearing the interrupt cause register during attach multiple times and still got a spurious interrupt. I believe this is a chip bug of some sort, but I've no idea whose fault it is. It has only been reported on SuperMicro *8* boards to date. -- John Baldwin
Re: [patch] IPMI KCS can drop the lock while servicing a request
On Saturday, March 23, 2013 11:11:20 pm Eric van Gyzen wrote: At work, we discovered that our application's IPMI thread would often use a lot of CPU time. The KCS thread uses DELAY to wait for the BMC, so it can run without sleeping for a long time with a slow BMC. It also holds the ipmi_softc.ipmi_lock during this time. When using adaptive mutexes, an application thread that wants to operate on the ipmi_pending_requests list will also spin during this same time. We see no reason that the KCS thread needs to hold the lock while servicing a request. We've been running with the attached patch for a few months, with no ill effects. The lock protects against concurrent access to the registers themselves (though the thread sort of does this already). However, even with a slow BMC it shouldn't be waiting but so long. I had some other comments about this patch in my reply to when it was committed. -- John Baldwin
Re: gptzfsboot: error 4 lba 30
On Monday, March 25, 2013 7:52:04 am Kai Gallasch wrote: Hi. On one of my fresh installed servers I am seeing the following output during boot: gptzfsboot: error 4 lba 30 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 30 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 gptzfsboot: error 4 lba 31 Humm, do you have disks that the BIOS sees that are small? An error code of 4 means 'sector not found' or 'read error'. It would be interesting to see the output of 'lsdev -v' from the loader prompt. -- John Baldwin
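As a reading aid for the error numbers above, here is a small lookup sketch. It assumes the number printed by gptzfsboot is the status byte returned by the BIOS INT 13h disk services; only the meaning of code 4 comes from the reply above, the other entries are the conventional INT 13h status values, and int13_strerror() is a hypothetical helper name, not something in the boot code.

```c
/*
 * Hypothetical helper: map a BIOS INT 13h status byte to a short
 * description.  Only a handful of common codes are listed; anything
 * else is reported as unknown.
 */
const char *
int13_strerror(unsigned int status)
{
	switch (status & 0xff) {
	case 0x00:
		return ("no error");
	case 0x01:
		return ("invalid function or parameter");
	case 0x02:
		return ("address mark not found");
	case 0x03:
		return ("disk write-protected");
	case 0x04:
		return ("sector not found/read error");
	default:
		return ("unknown status");
	}
}
```

Under that assumption, int13_strerror(4) gives the 'sector not found/read error' text, which fits a BIOS that believes the disk is smaller than the LBA being read.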
Re: Core Dump / panic sleeping thread
On Wednesday, March 20, 2013 9:22:22 am Konstantin Belousov wrote: On Wed, Mar 20, 2013 at 12:13:05PM +0100, Michael Landin Hostbaek wrote: On Mar 20, 2013, at 10:49 AM, Konstantin Belousov kostik...@gmail.com wrote: I do not like it. As I said in the previous response to Andrey, I think that moving the vnode_pager_setsize() after the unlock is better, since it reduces races with other thread seeing half-done attribute update or making attribute change simultaneously. OK - so should I wait for another patch - or? I think the following is what I mean. As an additional note, why nfs client does not trim the buffers when server reported node size change ? Will changing the size always result in an mtime change forcing the client to throw away the data on the next read or fault anyway (or does it only affect ctime)? -- John Baldwin
Re: svn - but smaller?
On Wednesday, March 13, 2013 10:11:28 pm John Mehr wrote: And svnup(1) really should mention that any files in the target tree not in the repository will be deleted, which was (explicitly) not the case with c{,v}sup. I only lost a few acpi patches that I think have likely made it to stable/9 anyway, and it's a test system, but I was surprised. I always thought csup did delete files. I was looking at csup's man page for things to put on the to-do list and there's a csup command line parameter ( -d ) that puts a limit on the number of files that can be deleted in a given run. Adding this feature is already on my to-do list, and I've just added another item to let the user choose whether svnup should delete extra files in the local source tree. csup deletes files that are deleted upstream (so if an svn commit were to remove a file from the source tree). It did not delete files that were locally added (like work/ directories for port builds, or kernel config files) that were never in the repository in the first place. I think that is the approach you probably want to take by default. That is also how the stock svn client acts. -- John Baldwin
Re: mfi timeouts
On Wednesday, February 27, 2013 12:58:11 am rihad wrote: Now about this part taken from here http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html By issuing a dummy read operation (thus forcing a flush of data buffers), this issue is largely averted. Does this mean that battery-backed cache (BBU) is effectively rendered useless, as all write operations are forced on to the disk platters on every interrupt? No, this is a very different level. This is forcing pending PCI DMA transactions on the PCI bus to flush by doing a read, not forcing I/O buffers to be flushed to disk. -- John Baldwin
Re: IPMI serial console
On Thursday, February 21, 2013 5:55:01 pm Glen Barber wrote: On Thu, Feb 21, 2013 at 05:23:14PM -0500, John Baldwin wrote: On Thursday, February 21, 2013 4:56:02 pm Daniel O'Connor wrote: On 22/02/2013, at 2:19, John Baldwin j...@freebsd.org wrote: Does anyone have any hints? Rather than using all these hints, just use these three in loader.conf:
console="comconsole vidconsole"
console_speed="115200"
console_port="0xblah"
(where blah is the correct I/O port for COM3, 0x3e8 maybe?) No dice :( I also tried booting with '-D -h -S 115200' but nothing either. Sorry, those should be 'comconsole_speed' and 'comconsole_port'. Also, you should be able to get the loader prompt working if you enter those by hand using an IPMI KVM or some such. John, this sounds very similar to a question I posed to you a few weeks ago. I guess it's not just me with these weird SuperMicro BMCs. :( I am using exactly this on many SuperMicro X8 and X9 boards. -- John Baldwin
Re: IPMI serial console
On Thursday, February 21, 2013 5:42:08 pm Daniel O'Connor wrote: On 22/02/2013, at 8:53, John Baldwin j...@freebsd.org wrote: I also tried booting with '-D -h -S 115200' but nothing either. Sorry, those should be 'comconsole_speed' and 'comconsole_port'. Also, you should be able to get the loader prompt working if you enter those by hand using an IPMI KVM or some such. No luck with that either :( The IPMI serial console works for the BIOS loader so I guess the comconsole parts work, however the kernel doesn't seem to use it even with '-D -h'. The uart(4) flags are correct (I believe)
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart2: <16550 or compatible> port 0x3e8-0x3ef irq 5 flags 0x30 on acpi0
The way this works with the kernel is that the loader has to be setting a hw.uart.console hint based on comconsole_port. The hint.uart.X.flags settings are completely ignored for this. Also, for 9.1, you must set the speed before you set the port (so the order of lines in loader.conf matters), or hw.uart.console will tell the kernel to use 9600 instead of 115200. -- John Baldwin
Re: FreeBSD-9.1 would not boot on pentium3 laptop
On Tuesday, February 26, 2013 1:10:37 am Mikhail T. wrote: 15.02.2013 08:49, John Baldwin wrote: Were you able to test this patch? Yes, with the patch my laptop boots -- even after I removed the work-around (hint.ichss.0.disabled=1 from device.hints). powerd is also able to regulate the frequency -- I'm not sure, how else to test the functionality. Thank you. Yours, Perfect, thanks for testing! -- John Baldwin
Re: mfi timeouts
On Tuesday, February 26, 2013 1:31:44 pm rihad wrote: On 28/10/2011 04:14, Jan Mikkelsen wrote: Hi, There is a patch linked to from this PR, which seems very similar: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/140416 http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html The problem is also consistent with running mfiutil clearing the problem. I'm about to deploy mfi controllers in a similar configuration, so I'd be very curious about whether the patch fixes the problem for you. This looks promising, I'll give a try when I get a moment. Hi, Did the patch help? We're having the same issues; running 'mfiutil show volumes' every minute doesn't make the freezes go away. Will this small patch be ok on 8.2-RELEASE-p4? Thanks. You can use the patch on 8.2. -- John Baldwin
Re: IPMI serial console
On Thursday, February 21, 2013 5:45:13 am Daniel O'Connor wrote: Hi all, A recent thread inspired me to try getting a proper serial console working on a Supermicro X9SCL motherboard with IPMI. However I find that while I see loader messages and the getty I enabled after boot I don't get any kernel messages which does somewhat limit the utility.. The BMC creates COM3 (/dev/cuau2) which works with getty. I modified /boot/loader.conf like so..
boot_multicons="yes"
boot_serial="YES"
console="comconsole vidconsole"
comconsole_speed="115200"
# Disable console flags on these 2 ports
hint.uart.0.flags="0x00"
hint.uart.1.flags="0x00"
# Set console flag
hint.uart.2.flags="0x10"
Does anyone have any hints? Rather than using all these hints, just use these three in loader.conf:
console="comconsole vidconsole"
console_speed="115200"
console_port="0xblah"
(where blah is the correct I/O port for COM3, 0x3e8 maybe?) -- John Baldwin
Re: IPMI serial console
On Thursday, February 21, 2013 4:56:02 pm Daniel O'Connor wrote: On 22/02/2013, at 2:19, John Baldwin j...@freebsd.org wrote: Does anyone have any hints? Rather than using all these hints, just use these three in loader.conf:
console="comconsole vidconsole"
console_speed="115200"
console_port="0xblah"
(where blah is the correct I/O port for COM3, 0x3e8 maybe?) No dice :( I also tried booting with '-D -h -S 115200' but nothing either. Sorry, those should be 'comconsole_speed' and 'comconsole_port'. Also, you should be able to get the loader prompt working if you enter those by hand using an IPMI KVM or some such. -- John Baldwin
Re: 9-STABLE - NFS - NetAPP:
On Friday, February 15, 2013 11:31:11 pm Marc Fournier wrote:
Trying the patch now … but what do you mean by using 'SIGSTOP'? I generally do a 'kill -HUP' then when that doesn't work 'kill -9' … should I use -STOP instead of 9?
No. This patch only helps if you are using kill -STOP to pause processes and later resume them. If you aren't doing that, then the suspension could be due to a different cause. Please try this patch instead and let me know if you see any of the 'Deferring' messages on the console:

Index: kern_thread.c
===================================================================
--- kern_thread.c (revision 246122)
+++ kern_thread.c (working copy)
@@ -794,7 +794,30 @@ thread_suspend_check(int return_instead)
                     (p->p_flag & P_SINGLE_BOUNDARY) && return_instead)
                         return (ERESTART);
 
+#if 0
                 /*
+                 * Ignore suspend requests for stop signals if they
+                 * are deferred.
+                 */
+                if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
+                    td->td_flags & TDF_SBDRY) {
+                        KASSERT(return_instead,
+                            ("TDF_SBDRY set for unsafe thread_suspend_check"));
+                        return (0);
+                }
+#else
+                /* Ignore suspend requests if stops are deferred. */
+                if (td->td_flags & TDF_SBDRY) {
+                        if (!return_instead)
+                                panic("TDF_SBDRY set, but return_instead not");
+                        if (P_SHOULDSTOP(p) != P_STOPPED_SIG)
+                                printf("Deferring non-STOP suspension: SHOULDSTOP: %x p_flag %x\n",
+                                    P_SHOULDSTOP(p), p->p_flag);
+                        return (0);
+                }
+#endif
+
+                /*
                  * If the process is waiting for us to exit,
                  * this thread should just suicide.
                  * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.

-- John Baldwin
Re: 9-STABLE - NFS - NetAPP:
On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote:
Marc Fournier wrote:
On 2013-02-13, at 3:54 PM, Rick Macklem rmack...@uoguelph.ca wrote:
The pid that is in T state for the ps auxlH.
Different server, last kernel update on Jan 22nd, https process this time instead of du last time. I've attached:
  ps auxlH
  ps auxlH of just the processes that are in TJ state (6 httpd servers)
  procstat output for each of the 6 processes
They are included as attachments … if these don't make it through, let me know, just figured I'd try and keep it compact ...
Well, I've looked at this call path a little closer:
  16693 104135 httpd - mi_switch+0x186 thread_suspend_check+0x19f sleepq_catch_signals+0x1c5 sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763 clnt_reconnect_call+0xfb newnfs_request+0xadb nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56 nfs_access+0x306 vn_open_cred+0x5a8 kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
I am probably way off, since I am not familiar with this stuff, but it seems to me that thread_suspend_check() should just return 0 for the case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag set) instead of sitting in the loop and doing a mi_switch(). I'm not even sure if it should call thread_suspend_check() for this case, but there are cases in thread_suspend_check() that I don't understand. Although I don't really understand thread_suspend_check(), I've attached a simple patch that might be a starting point for fixing this? I wouldn't recommend trying the patch until kib and/or jhb weigh in on whether it makes any sense.
I think this is the right idea, but in HEAD with the sigdeferstop() changes it should just check for TDF_SBDRY instead of adding a new parameter. I think checking for TDF_SBDRY will work even in 9 (and will make the patch smaller). Also, I think this is only needed for stop signals. Other suspend requests will eventually resume the thread; it is only stop signals that can cause the thread to get stuck indefinitely (since it depends on the user sending SIGCONT). Marc, are you using SIGSTOP?

Index: kern_thread.c
===================================================================
--- kern_thread.c (revision 246122)
+++ kern_thread.c (working copy)
@@ -795,6 +795,17 @@ thread_suspend_check(int return_instead)
                         return (ERESTART);
 
                 /*
+                 * Ignore suspend requests for stop signals if they
+                 * are deferred.
+                 */
+                if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
+                    td->td_flags & TDF_SBDRY) {
+                        KASSERT(return_instead,
+                            ("TDF_SBDRY set for unsafe thread_suspend_check"));
+                        return (0);
+                }
+
+                /*
                  * If the process is waiting for us to exit,
                  * this thread should just suicide.
                  * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.

-- John Baldwin
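The condition discussed in this message (skip the suspend only when the request comes from a stop signal and the sleeping thread has asked for stops to be deferred) can be modelled in userland as a pure predicate. This is only a sketch: should_ignore_suspend() is a hypothetical name, and the constants are placeholders that stand in for P_STOPPED_SIG, P_STOPPED_SINGLE and TDF_SBDRY; they are not the kernel's values.

```c
#include <stdbool.h>

/* Placeholder values for illustration; not the kernel's definitions. */
#define STOP_NONE	0	/* no suspend request pending */
#define STOP_SIG	1	/* stands in for P_STOPPED_SIG */
#define STOP_SINGLE	2	/* stands in for P_STOPPED_SINGLE */
#define FLAG_SBDRY	0x1	/* stands in for TDF_SBDRY */

/*
 * Hypothetical helper: return true when a sleeping thread should skip
 * suspending, i.e. the request comes from a stop signal and the thread
 * has asked for stops to be deferred.
 */
static bool
should_ignore_suspend(int shouldstop, int td_flags)
{
	return (shouldstop == STOP_SIG && (td_flags & FLAG_SBDRY) != 0);
}
```

Other suspend reasons (the single-threading case here) still suspend the thread, which matches the remark that those requests eventually resume on their own.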
Re: FreeBSD-9.1 would not boot on pentium3 laptop
On Thursday, February 07, 2013 2:25:17 pm John Baldwin wrote:
On Thursday, February 07, 2013 1:28:30 pm Mikhail T. wrote:
On 07.02.2013 13:16, John Baldwin wrote:
Can you get pciconf -lc output?
Here:
  hostb0@pci0:0:0:0: class=0x06 card=0x chip=0x11308086 rev=0x02 hdr=0x00
  cap 09[88] = vendor (length 4) Intel cap 15 version 1
  cap 02[a0] = AGP 4x 2x 1x SBA disabled
Looks like you have one of the systems the comment mentions. Try this patch to see if ichss is disabled automatically for you:
Were you able to test this patch?

Index: ichss.c
===================================================================
--- ichss.c (revision 246122)
+++ ichss.c (working copy)
@@ -67,7 +67,7 @@ struct ichss_softc {
 #define PCI_DEV_82801BA         0x244c /* ICH2M */
 #define PCI_DEV_82801CA         0x248c /* ICH3M */
 #define PCI_DEV_82801DB         0x24cc /* ICH4M */
-#define PCI_DEV_82815BA         0x1130 /* Unsupported/buggy part */
+#define PCI_DEV_82815_MC        0x1130 /* Unsupported/buggy part */
 
 /* PCI config registers for finding PMBASE and enabling SpeedStep. */
 #define ICHSS_PMBASE_OFFSET     0x40
@@ -155,9 +155,6 @@ ichss_identify(driver_t *driver, device_t parent)
          * E.g. see Section 6.1 "PCI Devices and Functions" and table 6.1 of
          * Intel(r) 82801BA I/O Controller Hub 2 (ICH2) and Intel(r) 82801BAM
          * I/O Controller Hub 2 Mobile (ICH2-M).
-         *
-         * TODO: add a quirk to disable if we see the 82815_MC along
-         * with the 82801BA and revision < 5.
          */
         ich_device = pci_find_bsf(0, 0x1f, 0);
         if (ich_device == NULL ||
@@ -167,6 +164,22 @@ ichss_identify(driver_t *driver, device_t parent)
             pci_get_device(ich_device) != PCI_DEV_82801DB))
                 return;
 
+        /*
+         * Certain systems with ICH2 and an Intel 82815_MC host bridge
+         * where the host bridge's revision is < 5 lockup if SpeedStep
+         * is used.
+         */
+        if (pci_get_device(ich_device) == PCI_DEV_82801BA) {
+                device_t hostb;
+
+                hostb = pci_find_bsf(0, 0, 0);
+                if (hostb != NULL &&
+                    pci_get_vendor(hostb) == PCI_VENDOR_INTEL &&
+                    pci_get_device(hostb) == PCI_DEV_82815_MC &&
+                    pci_get_revid(hostb) < 5)
+                        return;
+        }
+
         /* Find the PMBASE register from our PCI config header. */
         pmbase = pci_read_config(ich_device, ICHSS_PMBASE_OFFSET,
             sizeof(pmbase));

-- John Baldwin
Re: 9-STABLE - NFS - NetAPP:
On Friday, February 15, 2013 10:21:11 am Rick Macklem wrote:
Konstantin Belousov wrote:
On Fri, Feb 15, 2013 at 08:44:43AM -0500, John Baldwin wrote:
On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote:
Marc Fournier wrote:
On 2013-02-13, at 3:54 PM, Rick Macklem rmack...@uoguelph.ca wrote:
The pid that is in T state for the ps auxlH.
Different server, last kernel update on Jan 22nd, https process this time instead of du last time. I've attached:
  ps auxlH
  ps auxlH of just the processes that are in TJ state (6 httpd servers)
  procstat output for each of the 6 processes
They are included as attachments … if these don't make it through, let me know, just figured I'd try and keep it compact ...
Well, I've looked at this call path a little closer:
  16693 104135 httpd - mi_switch+0x186 thread_suspend_check+0x19f sleepq_catch_signals+0x1c5 sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763 clnt_reconnect_call+0xfb newnfs_request+0xadb nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56 nfs_access+0x306 vn_open_cred+0x5a8 kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
I am probably way off, since I am not familiar with this stuff, but it seems to me that thread_suspend_check() should just return 0 for the case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag set) instead of sitting in the loop and doing a mi_switch(). I'm not even sure if it should call thread_suspend_check() for this case, but there are cases in thread_suspend_check() that I don't understand. Although I don't really understand thread_suspend_check(), I've attached a simple patch that might be a starting point for fixing this? I wouldn't recommend trying the patch until kib and/or jhb weigh in on whether it makes any sense.
I think this is the right idea, but in HEAD with the sigdeferstop() changes it should just check for TDF_SBDRY instead of adding a new parameter. I think checking for TDF_SBDRY will work even in 9 (and will make the patch smaller). Also, I think this is only needed for stop signals. Other suspend requests will eventually resume the thread; it is only stop signals that can cause the thread to get stuck indefinitely (since it depends on the user sending SIGCONT). Marc, are you using SIGSTOP?

Index: kern_thread.c
===================================================================
--- kern_thread.c (revision 246122)
+++ kern_thread.c (working copy)
@@ -795,6 +795,17 @@ thread_suspend_check(int return_instead)
                         return (ERESTART);
 
                 /*
+                 * Ignore suspend requests for stop signals if they
+                 * are deferred.
+                 */
+                if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
+                    td->td_flags & TDF_SBDRY) {
+                        KASSERT(return_instead,
+                            ("TDF_SBDRY set for unsafe thread_suspend_check"));
+                        return (0);
+                }
+
+                /*
                  * If the process is waiting for us to exit,
                  * this thread should just suicide.
                  * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.

This looks correct.
Righto. Thanks jhb and kib for looking at this. Btw John, PBDRY still gets set for sleeps in the sys/rpc code. However, as far as I can tell, it just sets TDF_SBDRY when it is already set and seems harmless. (Since this code is supposed to be generic and not specific to NFS, maybe it should stay that way?)
In HEAD PBDRY is now a nop and the existing sigdeferstop() stuff should cover the calls in sys/rpc.
Also, since PBDRY on the sleeps sets TDF_SBDRY, I think the above patch is ok for stable/9 without your recent head patch.
Yep, exactly.
Thanks everyone for your help, rick
Thanks for your debugging!
-- John Baldwin
Re: 9.1-RELEASE AMD64 crash under VBox 4.2.6 when IO APIC is disabled
On Wednesday, February 13, 2013 6:56:06 pm CeDeROM wrote:
On Wed, Feb 13, 2013 at 4:48 PM, John Baldwin j...@freebsd.org wrote:
The simple answer that I have deduced is that APIC is MANDATORY for AMD64 machines and they won't run otherwise? This is why generic AMD64 install fails when no APIC is enabled in the VBox?
No, it is not quite like that. x86 machines have two entirely different sets of interrupt controllers. (...)
Hello John :-) Things now are more clear to me, thank you for your extensive explanation!! :-) I am wondering in that case if it wouldn't be a good idea to put atpic (old x86 IRQ handler) in the GENERIC configuration, or at least in the default installer kernel, so it is a safe fallback for AMD64 machines with no APIC support, as for example VBox with APIC disabled..? Is atpic removed on purpose so it enforces use of new APIC and so better performance?
Real hardware should always use device apic on amd64. Even for a VM you should prefer apic. That is, I think you should just enable APIC when using VBox.
-- John Baldwin
Re: 9.1-RELEASE AMD64 crash under VBox 4.2.6 when IO APIC is disabled
On Monday, February 11, 2013 4:34:37 pm CeDeROM wrote:
On Mon, Feb 11, 2013 at 10:06 PM, John Baldwin j...@freebsd.org wrote:
On Sunday, February 10, 2013 1:16:16 pm CeDeROM wrote:
Hey :-) I have just noticed that booting installation media for FreeBSD 9.1-RELEASE AMD64 from ISO bootonly under VirtualBox 4.2.6 results in a kernel panic both when ACPI is enabled and disabled in
You will need to add 'device atpic' to your kernel config and build a custom kernel. All real amd64-capable hardware has APICs.
Hello John :-) Thank you for your reply, still I need some more information to understand why this happens :-) The simple answer that I have deduced is that APIC is MANDATORY for AMD64 machines and they won't run otherwise? This is why generic AMD64 install fails when no APIC is enabled in the VBox?
No, it is not quite like that. x86 machines have two entirely different sets of interrupt controllers. Old i386 machines only had a pair of 8259A controllers (this is what 'device atpic' manages), and i386 kernels assume they are always present (see sys/i386/conf/DEFAULTS). When Intel added SMP support to i386 machines starting with the 486 and Pentium they added a new set of interrupt controllers called APICs (both I/O APICs to manage device interrupts ala the 8259As and on-CPU APICs on Pentium and later called local APICs). 'device apic' enables use of APICs.
The code to manage these is actually shared between i386 and amd64 and any x86 kernel can use one or the other of these _if_ the relevant driver is compiled in. On i386 'device atpic' is enabled by default (via DEFAULTS) and 'device apic' is enabled in GENERIC, so i386 kernels will work with both out of the box. On amd64, 'device atpic' is not enabled by default (not in GENERIC), but 'device apic' is mandated to be on (it's not even an option, just always compiled in). So GENERIC on amd64 only supports 'device apic' by default.
You can use 'device atpic' on amd64 if you really want to, but APICs are more efficient and required for using multiple CPUs, so unless you are working around a specific hardware bug (or writing a hypervisor where you haven't implemented APIC emulation yet), you should prefer APIC.
-- John Baldwin
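The driver situation described above can be summarised as a small bitmask model. This is a deliberately simplified sketch, not kernel code: kernel_can_take_interrupts() and the IC_* names are hypothetical, and the only facts encoded are the ones from the message (i386 GENERIC carries both drivers, amd64 GENERIC carries only 'device apic').

```c
#include <stdbool.h>

#define IC_ATPIC	0x1	/* pair of 8259A controllers ('device atpic') */
#define IC_APIC		0x2	/* I/O and local APICs ('device apic') */

/* Driver sets as described in the message above. */
#define I386_GENERIC_DRIVERS	(IC_ATPIC | IC_APIC)
#define AMD64_GENERIC_DRIVERS	(IC_APIC)

/*
 * Hypothetical helper: a kernel can only take device interrupts if it
 * has a driver for at least one controller the machine exposes.
 */
static bool
kernel_can_take_interrupts(unsigned kernel_drivers,
    unsigned machine_controllers)
{
	return ((kernel_drivers & machine_controllers) != 0);
}
```

Under this model a VM that exposes only the 8259A pair (IO APIC disabled) is not covered by amd64 GENERIC, while a custom amd64 kernel with 'device atpic' added is.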
Re: 9.1-RELEASE AMD64 crash under VBox 4.2.6 when IO APIC is disabled
On Sunday, February 10, 2013 1:16:16 pm CeDeROM wrote:
Hey :-) I have just noticed that booting installation media for FreeBSD 9.1-RELEASE AMD64 from ISO bootonly under VirtualBox 4.2.6 results in a kernel panic both when ACPI is enabled and disabled in the boot dialog screen (seems different cause of crash), when IO APIC is disabled in VBox (which is a default). I thought AMD64 is not related to APIC..? Best regards :-) Tomek
You will need to add 'device atpic' to your kernel config and build a custom kernel. All real amd64-capable hardware has APICs.
-- John Baldwin
Re: FreeBSD-9.1 would not boot on pentium3 laptop
On Wednesday, February 06, 2013 1:24:57 am Mikhail T. wrote:
On 05.02.2013 23:38, Mikhail T. wrote:
What happened between 6.x and 7.x?
Ok, what happened is that device cpufreq is now in GENERIC and the ichss0 along with it. Setting set hint.ichss.0.disabled=1 on the loader prompt allows me to boot -- both my own kernel as well as the 9.1-RELEASE from CD. Solved... Annoying beyond belief, but solved.
I wonder if your system falls into this:

        /*
         * ICH2/3/4-M I/O Controller Hub is at bus 0, slot 1F, function 0.
         * E.g. see Section 6.1 "PCI Devices and Functions" and table 6.1 of
         * Intel(r) 82801BA I/O Controller Hub 2 (ICH2) and Intel(r) 82801BAM
         * I/O Controller Hub 2 Mobile (ICH2-M).
         *
         * TODO: add a quirk to disable if we see the 82815_MC along
         * with the 82801BA and revision < 5.
         */
        ich_device = pci_find_bsf(0, 0x1f, 0);
        if (ich_device == NULL ||
            pci_get_vendor(ich_device) != PCI_VENDOR_INTEL ||
            (pci_get_device(ich_device) != PCI_DEV_82801BA &&
            pci_get_device(ich_device) != PCI_DEV_82801CA &&
            pci_get_device(ich_device) != PCI_DEV_82801DB))
                return;

Can you get pciconf -lc output?
-- John Baldwin
Re: FreeBSD-9.1 would not boot on pentium3 laptop
On Thursday, February 07, 2013 1:28:30 pm Mikhail T. wrote:
On 07.02.2013 13:16, John Baldwin wrote:
Can you get pciconf -lc output?
Here:
  hostb0@pci0:0:0:0: class=0x06 card=0x chip=0x11308086 rev=0x02 hdr=0x00
  cap 09[88] = vendor (length 4) Intel cap 15 version 1
  cap 02[a0] = AGP 4x 2x 1x SBA disabled
Looks like you have one of the systems the comment mentions. Try this patch to see if ichss is disabled automatically for you:

Index: ichss.c
===================================================================
--- ichss.c (revision 246122)
+++ ichss.c (working copy)
@@ -67,7 +67,7 @@ struct ichss_softc {
 #define PCI_DEV_82801BA         0x244c /* ICH2M */
 #define PCI_DEV_82801CA         0x248c /* ICH3M */
 #define PCI_DEV_82801DB         0x24cc /* ICH4M */
-#define PCI_DEV_82815BA         0x1130 /* Unsupported/buggy part */
+#define PCI_DEV_82815_MC        0x1130 /* Unsupported/buggy part */
 
 /* PCI config registers for finding PMBASE and enabling SpeedStep. */
 #define ICHSS_PMBASE_OFFSET     0x40
@@ -155,9 +155,6 @@ ichss_identify(driver_t *driver, device_t parent)
          * E.g. see Section 6.1 "PCI Devices and Functions" and table 6.1 of
          * Intel(r) 82801BA I/O Controller Hub 2 (ICH2) and Intel(r) 82801BAM
          * I/O Controller Hub 2 Mobile (ICH2-M).
-         *
-         * TODO: add a quirk to disable if we see the 82815_MC along
-         * with the 82801BA and revision < 5.
          */
         ich_device = pci_find_bsf(0, 0x1f, 0);
         if (ich_device == NULL ||
@@ -167,6 +164,22 @@ ichss_identify(driver_t *driver, device_t parent)
             pci_get_device(ich_device) != PCI_DEV_82801DB))
                 return;
 
+        /*
+         * Certain systems with ICH2 and an Intel 82815_MC host bridge
+         * where the host bridge's revision is < 5 lockup if SpeedStep
+         * is used.
+         */
+        if (pci_get_device(ich_device) == PCI_DEV_82801BA) {
+                device_t hostb;
+
+                hostb = pci_find_bsf(0, 0, 0);
+                if (hostb != NULL &&
+                    pci_get_vendor(hostb) == PCI_VENDOR_INTEL &&
+                    pci_get_device(hostb) == PCI_DEV_82815_MC &&
+                    pci_get_revid(hostb) < 5)
+                        return;
+        }
+
         /* Find the PMBASE register from our PCI config header. */
         pmbase = pci_read_config(ich_device, ICHSS_PMBASE_OFFSET,
             sizeof(pmbase));

-- John Baldwin
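To see why the patch should skip this laptop, the new check can be restated as a standalone predicate and fed the IDs from the pciconf output (chip=0x11308086 is device 0x1130, vendor 0x8086; rev=0x02). This is a sketch only: ichss_quirk_applies() is a hypothetical helper, and it assumes the comparison in the patch is "revision less than 5", which is what makes the rev 0x02 bridge above one of the affected systems.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_INTEL	0x8086
#define PCI_DEV_82801BA		0x244c	/* ICH2M, value from the patch */
#define PCI_DEV_82815_MC	0x1130	/* host bridge, value from the patch */

/*
 * Hypothetical helper mirroring the check added to ichss_identify():
 * SpeedStep is left alone when an ICH2-M is paired with an Intel
 * 82815_MC host bridge whose revision is below 5 (assumed comparison).
 */
static bool
ichss_quirk_applies(uint16_t ich_dev, uint16_t hostb_vendor,
    uint16_t hostb_dev, uint8_t hostb_rev)
{
	return (ich_dev == PCI_DEV_82801BA &&
	    hostb_vendor == PCI_VENDOR_INTEL &&
	    hostb_dev == PCI_DEV_82815_MC &&
	    hostb_rev < 5);
}
```

The pciconf excerpt only shows the host bridge, so whether the ICH in that machine is really an 82801BA is itself an assumption here.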
Re: problems with the mfi
On Tuesday, February 05, 2013 3:48:28 am Daniel Braniss wrote:
after rebooting I get very often:
  ...
  mfi0: COMMAND 0xff800132d990 TIMEOUT AFTER 659 SECONDS
  mfi0: COMMAND 0xff800132d990 TIMEOUT AFTER 689 SECONDS
  mfi0: COMMAND 0xff800132d990 TIMEOUT AFTER 719 SECONDS
  ...
another reboot usually fixes this.
Does it have the latest firmware?
-- John Baldwin
Re: bge numbering
On Friday, January 25, 2013 3:46:10 am Daniel Braniss wrote:
Hi, this server, a Dell R720 has 4 bge on board,
  Broadcom NetXtreme Gigabit Ethernet, ASIC rev. 0x572
  bge0: APE FW version: NCSI v1.1.7.0
  bge0: CHIP ID 0x0572; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
  miibus0: MII bus on bge0
  ...
I have connected the ethernet to port labeled 0, but it appears as bge2, how can this be corrected?
It can't really. The order of PCI devices is determined by the layout of the PCI device hierarchy which is generally determined by the physical traces on your motherboard.
-- John Baldwin
Re: 9-STABLE - NFS - NetAPP:
On Sunday, January 20, 2013 01:10:29 AM Hub- Marketing wrote:
On 2013-01-19, at 4:57 AM, John Baldwin j...@freebsd.org wrote:
On Tuesday, December 18, 2012 11:58:36 PM Hub- Marketing wrote:
I'm running a few servers sitting on top of a NetAPP file server … everything runs great, but periodically I'm getting:
  nfs_getpages: error 13
  vm_fault: pager read error, pid 11355 (https)
Are you using interruptible mounts (intr mount option)?
  192.168.1.253:/vol/vol1 /vm nfs rw,intr,soft,nolockd 0 0
I just added the 'soft' option to the mix … nolockd is enabled since I know for a fact that it's not possible for two processes to access the same file on both mounts at the same time …
Ah, ok. I just fixed a bug with interruptible mounts in HEAD where having a signal interrupt an NFS request returns EACCES (13) rather than EINTR. You should retest with that fix applied.
-- John Baldwin
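For readers decoding the "error 13" in the nfs_getpages message: the number is an errno value, and the reply identifies it as the permission-denied code rather than the interrupted-call code one would expect after a signal. A minimal sketch, assuming the usual errno numbering from <errno.h> (nfs_errno_name() is a hypothetical helper, not part of any NFS code):

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: give a symbolic name to the two errno values
 * mentioned in the reply, and fall back to strerror() for the rest.
 */
static const char *
nfs_errno_name(int error)
{
	switch (error) {
	case EACCES:
		return ("EACCES");	/* permission denied */
	case EINTR:
		return ("EINTR");	/* interrupted system call */
	default:
		return (strerror(error));
	}
}
```

Calling nfs_errno_name(13) makes it clear why the console message looked like a permissions problem even though the request had only been interrupted.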
Re: Failed to attach P_CNT - FreeBSD 9.1 RC3
On Sunday, November 04, 2012 05:56:33 AM Shiv. Nath wrote:
Dear FreeBSD Community Friends, It is FreeBSD 9.1 RC3, and I get the following warning in the message log file. I need assistance to understand the meaning of this error; how serious is it?
  acpi_throttle23: failed to attach P_CNT
On newer CPUs that use est you don't want to use acpi_throttle anyway, so you can ignore the errors. (est gives you power savings when it lowers your CPU speed; acpi_throttle generally does not, it only helps with lowering the temperature.)
-- John Baldwin
Re: Startup lapic messages
On Tuesday, December 18, 2012 06:28:25 AM S.N.Grigoriev wrote:
Hi list, I've installed FreeBSD 9.1R amd64 on a new Intel server. The following lapic messages appear during system startup:
  lapic18: Forcing LINT1 to edge trigger
  SMP: AP CPU #2 Launched!
  lapic50: Forcing LINT1 to edge trigger
  SMP: AP CPU #6 Launched!
  lapic20: Forcing LINT1 to edge trigger
  SMP: AP CPU #3 Launched!
  lapic32: Forcing LINT1 to edge trigger
  SMP: AP CPU #4 Launched!
  lapic2: Forcing LINT1 to edge trigger
  SMP: AP CPU #1 Launched!
  lapic34: Forcing LINT1 to edge trigger
  SMP: AP CPU #5 Launched!
  lapic52: Forcing LINT1 to edge trigger
  SMP: AP CPU #7 Launched!
I've never seen such messages in past. Does it mean I have some hardware problem/misconfiguration?
Your BIOS is slightly buggy, but in a harmless way. You can ignore these.
-- John Baldwin
Re: 9-STABLE - NFS - NetAPP:
On Tuesday, December 18, 2012 11:58:36 PM Hub- Marketing wrote:
I'm running a few servers sitting on top of a NetAPP file server … everything runs great, but periodically I'm getting:
  nfs_getpages: error 13
  vm_fault: pager read error, pid 11355 (https)
Are you using interruptible mounts (intr mount option)? Also, can you get ps output that includes the 'l' flag to show what the processes are stuck on?
-- John Baldwin
Re: Failsafe on kernel panic
On Wednesday, January 16, 2013 4:27:53 pm Sami Halabi wrote:
Thank you for your response, very helpful. one question - how do i configure auto-reboot once kernel panic occurs?
Unless you've added DDB and KDB to your kernel it will reboot by default on a panic. Stable kernel configs also include the unattended option so that even with the debugger present they reboot by default on a panic.
-- John Baldwin
Re: Failsafe on kernel panic
On Wednesday, January 16, 2013 2:25:33 pm Sami Halabi wrote:
Hi everyone, I have a production box on which I want to install a new kernel without any remote KVM. My problem is that it's 2 hours away, and if a kernel panic occurs I have a problem. I wonder if I can set a failsafe script to load the old kernel in case of panic.
man nextboot (if you are using UFS)
-- John Baldwin
Re: Possible to reset PCI device at boot? [Was: Re: msi-x enabled igb works only if module loaded twice]
On Tuesday, October 23, 2012 8:40:44 am Harald Schmalzbauer wrote:
Harald Schmalzbauer wrote on 23.10.2012 11:49 (localtime):
Harald Schmalzbauer wrote on 22.10.2012 21:48 (localtime):
Harald Schmalzbauer wrote on 22.10.2012 21:33 (localtime):
Hello, when using igb as module, no packet is received. If I send out anything, I see the packet with tcpdump, also the switch learns the MAC address, but nothing comes back in - total silence, no broadcasts, nothing. If I unload the module and load it again, everything works as expected! No matter if I load it by 4th loader, or later, I always have to unload first then load it again. It's late here, I'll see tomorrow if things change when compiled into kernel.
It doesn't matter if igb is loaded as module or compiled into kernel. Maybe somebody has an idea what the source of the problem could be. Please find attached some info, the OS is 9-RC2-amd64 on ESXi5.1 and nics are pci-passthrough.
I found one possibly relevant difference:
  Non-Working state: dev.igb.0.link_irq: 0
  Working state: dev.igb.0.link_irq: 2
This is only true with msi-x!!! If I disable msi-x, the problem itself vanishes. igb just works fine from the initial loading (with dev.igb.0.link_irq=0!). So dev.igb.0.link_irq is only relevant with msi-x. But what makes me curious is why it also works with msi-x enabled after the second kldload!?!
I think I found the root cause: When ESXi powers up the guest, the passthru-devices are initialized with:
  VMKPCIPassthru: 2565: BDF = 02:00.1 intrType = 2 numVectors: 1
intrType=2 seems to mean MSI. I guess, IOMMUIntel is instructed to remap one irq-vector for the device. But igb uses MSI-X and wants 3 vectors. If I unload if_igb and reload again, the ESXi-log shows the following:
  VMKPCIPassthru: 2565: BDF = 02:00.1 intrType = 4 numVectors: 3
intrType=4 seems to mean MSI-X. After that initialization, if_igb works fine and saves 25kIRQ/s! I haven't found a way to change the power-up behaviour for the guest with ESXi. Is it possible to re-init a pci device from userland?
The problem is you want the igb driver to retry MSI-X even after a re-init and that basically requires a full detach/attach, so your existing workaround is actually the best way to do this. :( Alternatively, you could try forcing igb to not use MSI, only use either MSI-X or INTx.
-- John Baldwin
Re: ${CTFCONVERT_CMD} expands to empty string
On Sunday, October 21, 2012 8:25:32 pm Andrey Chernov wrote:
Those lines cause this error:
  .if ${MK_CTF} != no
  CTFCONVERT_CMD= ${CTFCONVERT} ${CTFFLAGS} ${.TARGET}
  .elif ${MAKE_VERSION} >= 520300
  CTFCONVERT_CMD=
  .else
  CTFCONVERT_CMD= @:
  .endif
My make version is 9201206140. So, either the check for >= 520300 is incorrect or the change for empty make variables expansion is not merged into stable-9.
I can't reproduce this doing a buildworld of a stable/9 checkout on a 9.0-stable machine btw. What exact contents of /etc/src.conf and commands are you using to reproduce this? I also can't find the string "empty string" in the output of my stable/9 'make universe' build before I committed this.
-- John Baldwin
Re: ${CTFCONVERT_CMD} expands to empty string
On Monday, October 22, 2012 1:01:53 pm Andrey Chernov wrote:
All that happens because this commit is not merged into stable-9. Do you plan to merge it by yourself?
  r228157 | fjoe | 2011-11-30 22:07:38 +0400 (Wed, 30 Nov 2011) | 10 lines
  - Fix segmentation fault when running "+command" when run with -jX -n due to Compat_RunCommand() being called with `cmd' that is not on the node->commands list
  - Make ellipsis ("..." command) handling consistent: check for "..." command in job make after variables expansion to match compat make behavior
  - Fix empty command handling (after variables expansion and @+- modifiers are processed): now empty commands are ignored in compat make and are not printed in job make case
  - Bump MAKE_VERSION to 5-2011-11-30-0
As soon as I can reproduce something that tests it, sure (I want to have a test case I can reproduce so that I can also check for 8). Your test Makefile does break on 8 and 9, want to do some more tests.
On 22.10.2012 20:45, Andrey Chernov wrote:
And simple test case proving that make v9201206140 dislikes empty commands. Makefile:
  CTFCONVERT_CMD=
  all:
          echo ${MAKE_VERSION}
          ${CTFCONVERT_CMD}
          echo b
make
  echo 9201206140
  9201206140
  ${CTFCONVERT_CMD} expands to empty string
  echo b
  b
-- John Baldwin
Re: ${CTFCONVERT_CMD} expands to empty string
On Friday, October 19, 2012 09:06:55 PM Andrey Chernov wrote:
On recent -stable I got lots of (see subj) now due to CTF changes in *.mk files. I have WITHOUT_CDDL=yes in my /etc/src.conf, and WITHOUT_CDDL has a wider scope than the suggested WITHOUT_CTF, but WITHOUT_CDDL is not checked in recent CTF changes. Please fix this thing.
Which stable?
-- John Baldwin
Re: FreeBSD 9.1-RC2 Available...
On Friday, October 19, 2012 2:26:45 pm Alex de Joode wrote:
https://sabotage.org/FBSD/FBSD-9.1RC2.jpg Screen shot. Basically the only diff between the two r210 are the disks, one has 2x2TB (works) and the one that has 2x1TB fails with the above error. Both are sw/ mirrored. No hw/ raid and AHCI sata settings.
Hummm, somehow we are executing data, not code:
  8c 39 00 00 01 82 44 45 4c 4c 20 20 50 45 5f 53 |.9DELL PE_S|
That isn't a valid instruction. :( Also, your eip value is not anything that would be normal. Actually, your eip value looks like a pointer into the BIOS (0xf000:bf6a). I bet something in your BIOS had a buffer overrun and trashed the stack or some such. Or it overran an I/O buffer which trashed the return stack of the userland process somehow.
-- John Baldwin
Re: mpt irq timeout problem after reboot - only if non-verbose booting !?!
On Wednesday, October 17, 2012 3:14:52 pm Harald Schmalzbauer (mobil) wrote:
-----Original Message-----
From: John Baldwin j...@freebsd.org
To: freebsd-stable@freebsd.org
Cc: h.schmalzba...@omnilan.de
Sent: 17.10.'12, 20:46
On Tuesday, October 16, 2012 5:24:44 am Harald Schmalzbauer wrote:
Hello, I have 9.1-RC2 running in an ESXi 5.1 guest. I use 'lsisas' as virtual SCSI-Controller and mpt attaches and finds 1068E. Everything is working fine until the first 'shutdown -r now': The second boot pauses for ~2 minutes after probing disks and continues with this error:
  mpt0: Timedout requests already complete. Interrupts may not be functioning.
To be clear, you only see this at the end of reboot, and the hardware is fine once the machine is back up?
Thanks for your attention! The timeout occurs after the first 'shutdown -r' while device probing during second boot process. Perhaps this is amd64 specific. Today I had a new i386 setup which doesn't exhibit this timeout. But it's on different hardware and hv-host was 5.0 instead of 5.1. So not really representative...
Hmmm, ok. In that case my patch is not relevant. It would only fix that message occurring during the shutdown.
-- John Baldwin
Re: mpt irq timeout problem after reboot - only if non-verbose booting !?!
, mpt_vol, "mpt_verify_mwce: Get request failed!\n");
@@ -965,7 +966,7 @@ static void
 	rv = mpt_issue_raid_req(mpt, mpt_vol, /*disk*/NULL, req,
 				MPI_RAID_ACTION_CHANGE_VOLUME_SETTINGS,
 				data, /*addr*/0, /*len*/0,
-				/*write*/FALSE, /*wait*/TRUE);
+				/*write*/FALSE, /*wait*/TRUE, sleep_ok);
 	if (rv == ETIMEDOUT) {
 		mpt_vol_prt(mpt, mpt_vol,
 		    "mpt_verify_mwce: Write Cache Enable Timed-out\n");
@@ -1018,7 +1019,8 @@ mpt_verify_resync_rate(struct mpt_softc *mpt, stru
 		rv = mpt_issue_raid_req(mpt, mpt_vol, /*disk*/NULL, req,
 					MPI_RAID_ACTION_SET_RESYNC_RATE,
 					mpt->raid_resync_rate, /*addr*/0,
-					/*len*/0, /*write*/FALSE, /*wait*/TRUE);
+					/*len*/0, /*write*/FALSE, /*wait*/TRUE,
+					/*sleep_ok*/TRUE);
 		if (rv == ETIMEDOUT) {
 			mpt_vol_prt(mpt, mpt_vol,
 			    "mpt_refresh_raid_data: Resync Rate Setting Timed-out\n");
@@ -1054,7 +1056,8 @@ mpt_verify_resync_rate(struct mpt_softc *mpt, stru
 		rv = mpt_issue_raid_req(mpt, mpt_vol, /*disk*/NULL, req,
 					MPI_RAID_ACTION_CHANGE_VOLUME_SETTINGS,
 					data, /*addr*/0, /*len*/0,
-					/*write*/FALSE, /*wait*/TRUE);
+					/*write*/FALSE, /*wait*/TRUE,
+					/*sleep_ok*/TRUE);
 		if (rv == ETIMEDOUT) {
 			mpt_vol_prt(mpt, mpt_vol,
 			    "mpt_refresh_raid_data: Resync Rate Setting Timed-out\n");
@@ -1314,7 +1317,7 @@ mpt_refresh_raid_vol(struct mpt_softc *mpt, struct
 		return;
 	}
 	rv = mpt_issue_raid_req(mpt, mpt_vol, NULL, req,
-	    MPI_RAID_ACTION_INDICATOR_STRUCT, 0, 0, 0, FALSE, TRUE);
+	    MPI_RAID_ACTION_INDICATOR_STRUCT, 0, 0, 0, FALSE, TRUE, TRUE);
 	if (rv == ETIMEDOUT) {
 		mpt_vol_prt(mpt, mpt_vol,
 		    "mpt_refresh_raid_vol: Progress Indicator fetch timeout\n");
@@ -1474,7 +1477,7 @@ mpt_refresh_raid_data(struct mpt_softc *mpt)
 		mpt_vol->flags |= MPT_RVF_UP2DATE;
 		mpt_vol_prt(mpt, mpt_vol, "%s - %s\n",
 			    mpt_vol_type(mpt_vol), mpt_vol_state(mpt_vol));
-		mpt_verify_mwce(mpt, mpt_vol);
+		mpt_verify_mwce(mpt, mpt_vol, TRUE);
 
 		if (vol_pg->VolumeStatus.Flags == 0) {
 			continue;
@@ -1752,7 +1755,7 @@ mpt_raid_set_vol_mwce(struct mpt_softc *mpt, mpt_r
 			mpt_vol_prt(mpt, mpt_vol,
 			    "WARNING - Unsafe shutdown detected. Suggest full resync.\n");
 		}
-		mpt_verify_mwce(mpt, mpt_vol);
+		mpt_verify_mwce(mpt, mpt_vol, TRUE);
 	}
 	mpt->raid_mwce_set = 1;
 	MPT_UNLOCK(mpt);

-- 
John Baldwin
Re: FreeBSD 9.1-RC2 Available...
On Thursday, October 11, 2012 3:49:51 am Sami Halabi wrote:
> Hi,
> there's a patch in the list you mentioned. It should go into RC3, I
> guess.

No, that patch would break all other interrupt config hooks, like the
probes for SATA, SCSI, and USB disks. Some driver's config hook is not
finishing. Each driver's hook is responsible for deregistering itself
once it has finished its interrupt probing, which is why it is not
obvious how the list becomes empty.

-- 
John Baldwin
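The self-deregistration rule described above can be modelled in userland as a toy. This is only a sketch: every name here (toy_hook, toy_hook_establish, toy_boot_wait, and so on) is hypothetical and merely loosely mirrors the kernel's config_intrhook_establish()/config_intrhook_disestablish() pair. The poll counter stands in for "interrupt probing has finished".

```c
#include <stddef.h>

/* Toy stand-in for one driver's interrupt config hook (hypothetical). */
struct toy_hook {
	struct toy_hook *next;
	int polls_needed;	/* polls until probing is done; <= 0 means never */
};

struct toy_hook *pending;	/* hooks that have not deregistered yet */

void
toy_hook_establish(struct toy_hook *h)
{
	h->next = pending;
	pending = h;
}

void
toy_hook_disestablish(struct toy_hook *h)
{
	struct toy_hook **pp;

	for (pp = &pending; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			h->next = NULL;
			return;
		}
	}
}

/* One pass: each hook makes progress and removes itself when finished. */
void
toy_poll_hooks(void)
{
	struct toy_hook *h, *next;

	for (h = pending; h != NULL; h = next) {
		next = h->next;	/* saved first: h may unlink itself below */
		if (h->polls_needed > 0 && --h->polls_needed == 0)
			toy_hook_disestablish(h);
	}
}

/* Returns 1 once every hook has deregistered, 0 if the wait gives up. */
int
toy_boot_wait(int max_polls)
{
	while (pending != NULL && max_polls-- > 0)
		toy_poll_hooks();
	return (pending == NULL);
}
```

In this model a single hook that never finishes keeps the list non-empty and stalls the wait. Emptying the list from the outside would let the wait return, but it would also cut off every well-behaved hook that was still probing, which is the objection raised in the reply.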
Re: stable/9 panic Bad tailq NEXT(0xffffffff80e52660->tqh_last) != NULL
On Tuesday, October 02, 2012 7:12:59 pm Sean Bruno wrote:
> On Tue, 2012-10-02 at 14:06 -0700, John Baldwin wrote:
> > On Tuesday, October 02, 2012 3:05:30 pm Sean Bruno wrote:
> > > On Mon, 2012-10-01 at 05:47 -0700, John Baldwin wrote:
> > > > Can you add extra printfs to see where exactly attach is
> > > > failing? I would start with the attach routine in
> > > > sys/dev/acpica/acpi_pcib_pci.c:
> > >
> > > hrm ... interesting side effects. After adding my printfs I don't
> > > hit the panic any more. :-)
> > >
> > > I changed the return value of acpi_pcib_pci_attach() and put some
> > > instrumentation in acpi_pcib_attach(). The key finding is that
> > > acpi_DeviceIsPresent() appears to be returning FALSE in this case.
> > >
> > > patch used -- http://people.freebsd.org/~sbruno/acpi_pcib.txt
> >
> > What happens if you just comment out the acpi_DeviceIsPresent()
> > check?
>
> wow, it booted up and seems to be fine. huh ...
>
> pcib7: ACPI PCI-PCI bridge at device 28.0 on pci0
> pcib7: domain0
> pcib7: secondary bus 7
> pcib7: subordinate bus 7
> pcib7: no prefetched decode
> pci7: ACPI PCI bus on pcib7
> pci7: domain=0, physical bus=7

Is there anything on the bus?

-- 
John Baldwin
Re: panic Sleeping thread owns a non-sleepable lock via cv_timedwait_signal, was rsync over NFS
On Tuesday, October 02, 2012 11:21:06 am Norbert Aschendorff wrote:
> I'll compile a kernel with
>
> options WITNESS
> options WITNESS_KDB
>
> ok? Or should I include WITNESS_SKIPSPIN too?

Yes, you should include WITNESS_SKIPSPIN. We should probably make that
the default.

-- 
John Baldwin
Re: stable/9 panic Bad tailq NEXT(0xffffffff80e52660->tqh_last) != NULL
On Tuesday, October 02, 2012 3:05:30 pm Sean Bruno wrote:
> On Mon, 2012-10-01 at 05:47 -0700, John Baldwin wrote:
> > Can you add extra printfs to see where exactly attach is failing?
> > I would start with the attach routine in
> > sys/dev/acpica/acpi_pcib_pci.c:
>
> hrm ... interesting side effects. After adding my printfs I don't hit
> the panic any more. :-)
>
> I changed the return value of acpi_pcib_pci_attach() and put some
> instrumentation in acpi_pcib_attach(). The key finding is that
> acpi_DeviceIsPresent() appears to be returning FALSE in this case.
>
> patch used -- http://people.freebsd.org/~sbruno/acpi_pcib.txt

What happens if you just comment out the acpi_DeviceIsPresent() check?

-- 
John Baldwin
Re: panic Sleeping thread owns a non-sleepable lock via cv_timedwait_signal, was rsync over NFS
On Tuesday, October 02, 2012 2:19:35 pm Norbert Aschendorff wrote:
> Well... Here are the results for a kernel without WITNESS_SKIPSPIN
> (I'll compile one including that tomorrow, but until then...)
>
> Good news is: The kernel crashed with WITNESS activated.
> Bad news is: I have to turn the power off after the crash with WITNESS.
> The crash dump is _not_ written to disk :(
> Good news II is: It wrote something to the syslog. Actually, it wrote
> very much to the syslog, some megabytes in total. Most of it is the
> same; here are the latest messages:
>
> logfile: http://lbo.spheniscida.de/Files/nfs-crash.log (94K)
>
> It specifies the file, line and zone. Maybe it's useful...

That does help. It tells us that the lock being held is a vnode
interlock that was last acquired in vinactive(). I don't see how,
though, unless the lock was recursively acquired elsewhere. You could
try adding a different WITNESS check (using WITNESS_WARN) to see which
NFS proc returns with a lock held, so you can catch this when it first
occurs rather than much later after the fact.

Do you have the start of the log messages?

-- 
John Baldwin
Re: stable/9 panic Bad tailq NEXT(0xffffffff80e52660->tqh_last) != NULL
On Thursday, September 27, 2012 4:53:49 pm Sean Bruno wrote:
> On Thu, 2012-09-27 at 10:52 -0700, Sean Bruno wrote:
> > pcib7: ACPI PCI-PCI bridge irq 19 at device 28.7 on pci0
> > panic: Bad tailq NEXT(0x80e52660->tqh_last) != NULL
> > cpuid = 0
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > kdb_backtrace() at kdb_backtrace+0x37
> > panic() at panic+0x1d8
> > rman_init() at rman_init+0x17c
> > pcib_alloc_window() at pcib_alloc_window+0x9f
> > pcib_attach_common() at pcib_attach_common+0x457
> > acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0x1c
> > device_attach() at device_attach+0x72
> > bus_generic_attach() at bus_generic_attach+0x1a
> > acpi_pci_attach() at acpi_pci_attach+0x164
> > device_attach() at device_attach+0x72
> > bus_generic_attach() at bus_generic_attach+0x1a
> > acpi_pcib_attach() at acpi_pcib_attach+0x1a7
> > acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x1f6
> > device_attach() at device_attach+0x72
> > bus_generic_attach() at bus_generic_attach+0x1a
> > acpi_attach() at acpi_attach+0xbc1
> > device_attach() at device_attach+0x72
> > bus_generic_attach() at bus_generic_attach+0x1a
> > nexus_acpi_attach() at nexus_acpi_attach+0x69
> > device_attach() at device_attach+0x72
> > bus_generic_new_pass() at bus_generic_new_pass+0xd6
> > bus_set_pass() at bus_set_pass+0x7a
> > configure() at configure+0xa
> > mi_startup() at mi_startup+0x77
> > btext() at btext+0x2c
> > Uptime: 1s
> > Automatic reboot in 15 seconds - press a key on the console to abort
> > -- Press a key on the console to reboot,
> > -- or switch off the system now.
> >
> > -- 
> > Andriy Gapon
>
> resurrecting this thread from my sent items folder, not sure if mailman
> will thread this correctly or not
>
> Anyway, after disabling the broken pci bridge via some hackery that jhb
> and eadler had lying around, I was able to get the r620 up on the new
> BIOS and get an acpidump before and after the firmware update. I can
> poke at the machines, but I don't quite see in this nonsense where it
> breaks acpi_pcib_pci_attach(). Where should I start poking next?
>
> http://people.freebsd.org/~sbruno/acpi_112_r620.txt
> http://people.freebsd.org/~sbruno/acpi_126_r620.txt
>
> For fun, I added the pciconf output to see if there's anything
> obviously wrong with pcib7. But, as usual, I have no idea how to
> interpret this.
>
> http://people.freebsd.org/~sbruno/r620_pciconf.txt

Can you add extra printfs to see where exactly attach is failing? I
would start with the attach routine in sys/dev/acpica/acpi_pcib_pci.c:

static int
acpi_pcib_pci_attach(device_t dev)
{
    struct acpi_pcib_softc *sc;

    ACPI_FUNCTION_TRACE((char *)(uintptr_t)__func__);

    pcib_attach_common(dev);
    sc = device_get_softc(dev);
    sc->ap_handle = acpi_get_handle(dev);
    return (acpi_pcib_attach(dev, &sc->ap_prt, sc->ap_pcibsc.secbus));
}

Hmm, so that can only fail inside of acpi_pcib_attach() in
sys/dev/acpica/acpi_pcib.c. I would add printfs to annotate that.

-- 
John Baldwin
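As background for the panic string in the subject, the following is a minimal userland sketch of the invariant that message is about. In a tail queue the head's tqh_last pointer must always point at a slot that currently holds NULL. This sketch uses a hand-rolled queue with hypothetical names (tq_init, tq_tail_ok, tq_insert_tail) rather than the real sys/queue.h macros, and it returns -1 where a debug kernel would panic. It only illustrates one way the invariant can be violated, namely an element's storage being overwritten while it is still linked. It makes no claim about what actually went wrong on the r620.

```c
#include <stddef.h>

/* Hand-rolled tail queue, shaped like TAILQ but with hypothetical names. */
struct node {
	struct node *next;
	struct node **prevp;	/* address of the pointer that points at us */
};

struct tailq {
	struct node *first;
	struct node **last;	/* address of the final 'next' slot */
};

void
tq_init(struct tailq *q)
{
	q->first = NULL;
	q->last = &q->first;
}

/* The invariant behind the panic text: the slot at 'last' holds NULL. */
int
tq_tail_ok(const struct tailq *q)
{
	return (*q->last == NULL);
}

/* Returns 0 on success, -1 where a debug kernel would panic instead. */
int
tq_insert_tail(struct tailq *q, struct node *n)
{
	if (!tq_tail_ok(q))
		return (-1);
	n->next = NULL;
	n->prevp = q->last;
	*q->last = n;
	q->last = &n->next;
	return (0);
}
```

If the last element's memory is reused or scribbled on while the queue still references it, its next field stops being NULL, and the following tq_insert_tail() trips the check. That is why such a failure tends to show up at a later, unrelated insertion rather than at the point of corruption.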