Re: newbus' ivar's limitation..
Hi,

On Thu, Jul 12, 2012 at 1:20 AM, Warner Losh i...@bsdimp.com wrote:
> I'm sorry you feel that way. Honestly, though, I think you'll be more pissed when you find out that the N:1 interface that you want is being done in the wrong domain. But I've been wrong before and look forward to seeing your replacement.
>
> acpi_pcib_acpi.c, btw, implements both PCIB interfaces and ACPI interfaces.

Does it? From the definition of `acpi_pcib_acpi_methods', I can only see a single pcib(4) interface being exported. Moreover, I cannot find any clue that would lead any ACPI device to attach to acpi_pcib_acpi(4), nor can I see how acpi_get_flags() could end up in acpi_pcib_read_ivar().

Thanks,
- Arnaud
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: nvidia-driver-295.49 is highly unstable
On Tuesday, July 03, 2012 11:20:49 pm Yuri wrote:
> On 05/27/2012 13:08, Alexey Dokuchaev wrote:
> > Perhaps you can try asking on official nVidia FreeBSD forum:
> > http://www.nvnews.net/vbulletin/forumdisplay.php?f=47
>
> I reported there 05-28-12, but got no response.
> Do you know if there is a way to report a problem with NVidia? For example, is there a Bugzilla or other bug reporting system for this?
>
> In addition, I observe the system hanging for a few seconds when running glxinfo. I also observe Xorg freezing when I run nvidia-settings. So I have to run 285.05.09 from cvs instead.

If you read the README that comes with the driver there are notes on how to submit a bug report. You need to run a script that collects debug information and then submit an e-mail to the e-mail address in the README. It sometimes takes a few days for them to respond, but I've always had them respond.

--
John Baldwin
Re: newbus' ivar's limitation..
On Thursday, July 12, 2012 3:01:36 am Arnaud Lacombe wrote:
> Hi,
>
> On Thu, Jul 12, 2012 at 1:20 AM, Warner Losh i...@bsdimp.com wrote:
> > I'm sorry you feel that way. Honestly, though, I think you'll be more pissed when you find out that the N:1 interface that you want is being done in the wrong domain. But I've been wrong before and look forward to seeing your replacement.
> >
> > acpi_pcib_acpi.c, btw, implements both PCIB interfaces and ACPI interfaces.
>
> Does it? From the definition of `acpi_pcib_acpi_methods', I can only see a single pcib(4) interface being exported. Moreover, I cannot find any clue that would lead any ACPI device to attach to acpi_pcib_acpi(4), nor can I see how acpi_get_flags() could end up in acpi_pcib_read_ivar().

acpi_get_handle() is certainly supported. Relevant code snippets:

sys/dev/acpica/acpivar.h:

    /*
     * Note that the low ivar values are reserved to provide
     * interface compatibility with ISA drivers which can also
     * attach to ACPI.
     */
    #define ACPI_IVAR_HANDLE    0x100
    #define ACPI_IVAR_UNUSED    0x101   /* Unused/reserved. */
    #define ACPI_IVAR_PRIVATE   0x102
    #define ACPI_IVAR_FLAGS     0x103

    /*
     * Accessor functions for our ivars.  Default value for BUS_READ_IVAR is
     * (type) 0.  The sys/bus.h accessor functions don't check return values.
     */
    #define __ACPI_BUS_ACCESSOR(varp, var, ivarp, ivar, type)       \
                                                                    \
    static __inline type varp ## _get_ ## var(device_t dev)         \
    {                                                               \
        uintptr_t v = 0;                                            \
        BUS_READ_IVAR(device_get_parent(dev), dev,                  \
            ivarp ## _IVAR_ ## ivar, &v);                           \
        return ((type) v);                                          \
    }                                                               \
                                                                    \
    static __inline void varp ## _set_ ## var(device_t dev, type t) \
    {                                                               \
        uintptr_t v = (uintptr_t) t;                                \
        BUS_WRITE_IVAR(device_get_parent(dev), dev,                 \
            ivarp ## _IVAR_ ## ivar, v);                            \
    }

    __ACPI_BUS_ACCESSOR(acpi, handle, ACPI, HANDLE, ACPI_HANDLE)
    __ACPI_BUS_ACCESSOR(acpi, private, ACPI, PRIVATE, void *)
    __ACPI_BUS_ACCESSOR(acpi, flags, ACPI, FLAGS, int)

sys/dev/acpica/acpi_pcib_acpi.c:

    /*
     * Support for standard PCI bridge ivars.
     */
    static int
    acpi_pcib_read_ivar(device_t dev, device_t child, int which,
        uintptr_t *result)
    {
        struct acpi_hpcib_softc *sc = device_get_softc(dev);

        switch (which) {
        case PCIB_IVAR_DOMAIN:
            *result = sc->ap_segment;
            return (0);
        case PCIB_IVAR_BUS:
            *result = sc->ap_bus;
            return (0);
        case ACPI_IVAR_HANDLE:
            *result = (uintptr_t)sc->ap_handle;
            return (0);
        case ACPI_IVAR_FLAGS:
            *result = (uintptr_t)sc->ap_flags;
            return (0);
        }
        return (ENOENT);
    }

sys/dev/acpica/acpi_pcib_pci.c:

    static int
    acpi_pcib_read_ivar(device_t dev, device_t child, int which,
        uintptr_t *result)
    {
        struct acpi_pcib_softc *sc = device_get_softc(dev);

        switch (which) {
        case ACPI_IVAR_HANDLE:
            *result = (uintptr_t)sc->ap_handle;
            return (0);
        }
        return (pcib_read_ivar(dev, child, which, result));
    }

This is used by the ACPI PCI bus driver to detect buses that are enumerated via ACPI and to then provide the ACPI_IVAR_HANDLE for all such PCI devices. Note that ACPI PCI uses its own ivars structure (acpi_pci_devinfo) that extends the base PCI ivars to add ACPI handle and flags.

sys/dev/acpica/acpi_pci.c:

    struct acpi_pci_devinfo {
        struct pci_devinfo ap_dinfo;
        ACPI_HANDLE ap_handle;
        int ap_flags;
    };

    ...

    static int
    acpi_pci_read_ivar(device_t dev, device_t child, int which,
        uintptr_t *result)
    {
        struct acpi_pci_devinfo *dinfo;

        dinfo = device_get_ivars(child);
        switch (which) {
        case ACPI_IVAR_HANDLE:
            *result = (uintptr_t)dinfo->ap_handle;
            return (0);
        case ACPI_IVAR_FLAGS:
            *result = (uintptr_t)dinfo->ap_flags;
            return (0);
        }
        return (pci_read_ivar(dev, child, which, result));
    }

    ...

    static int
    acpi_pci_attach(device_t dev)
    {
        ...

        /*
         * First, PCI devices are added as in the normal PCI bus driver.
         * Afterwards, the ACPI namespace under the bridge driver is
         * walked to save ACPI handles to all the devices that appear in
         * the ACPI namespace as immediate descendants of the bridge.
         *
         * XXX: Sometimes PCI devices show up in the ACPI namespace that
         * pci_add_children() doesn't find.  We currently just ignore
         * these devices.
         */
        pci_add_children(dev, domain, busno,
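For anyone following along, the fall-through pattern in acpi_pci_read_ivar() quoted above can be modelled in a few lines of userland C. This is only a sketch under stated assumptions: every demo_* name and every constant below is made up for illustration, and none of it is the actual new-bus code. It shows a bus that answers its own ivar indices first and hands everything else to the base bus handler, with the extended per-child structure embedding the base one as its first member.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ivar indices; the ACPI-style ones live in a separate range. */
enum {
    DEMO_PCI_IVAR_VENDOR  = 0x01,
    DEMO_ACPI_IVAR_HANDLE = 0x100,
    DEMO_ACPI_IVAR_FLAGS  = 0x103
};

/* Base per-child instance variables (stand-in for the base PCI ivars). */
struct demo_pci_devinfo {
    uint16_t vendor;
};

/* Extended ivars: the base structure comes first, extra fields follow. */
struct demo_acpi_pci_devinfo {
    struct demo_pci_devinfo base;
    void *handle;
    int flags;
};

/* Base bus handler: knows only the base indices. */
static int
demo_pci_read_ivar(const struct demo_pci_devinfo *d, int which,
    uintptr_t *result)
{
    assert(d != NULL && result != NULL);
    switch (which) {
    case DEMO_PCI_IVAR_VENDOR:
        *result = d->vendor;
        return (0);
    }
    return (ENOENT);
}

/* Extended bus handler: answers its own indices, defers the rest. */
static int
demo_acpi_pci_read_ivar(const struct demo_acpi_pci_devinfo *d, int which,
    uintptr_t *result)
{
    assert(d != NULL && result != NULL);
    switch (which) {
    case DEMO_ACPI_IVAR_HANDLE:
        *result = (uintptr_t)d->handle;
        return (0);
    case DEMO_ACPI_IVAR_FLAGS:
        *result = (uintptr_t)d->flags;
        return (0);
    }
    return (demo_pci_read_ivar(&d->base, which, result));
}
```

A child that asks for the handle or flags is served by the extended handler, a request for the vendor falls through to the base handler, and an unknown index comes back as ENOENT from the base handler.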
Re: contigmalloc() breaking Xorg
[ Adding alc@ for VM stuff, Warner for arm/mips bus dma brokenness ]

On Sunday, July 08, 2012 7:05:16 am Peter Jeremy wrote:
> On 2012-Jul-03 21:17:53 +1000, Peter Jeremy pe...@server.rulingia.com wrote:
> > I have a reasonably recent 8-stable/amd64 system (r237444) with an ATI Radeon HD 2400 Pro, xorg-server-1.10.6,1 and xf86-video-ati-6.14.3_1, 8GB RAM and ZFS. I'm seeing fairly consistent problems with Xorg
> ...
> > How difficult would it be to modify bus_dmamem_alloc() [at least on x86] to handle multi-segment allocations?
>
> I think I've managed to create an amd64 bus_dmamem_alloc() that allows page-sized allocations as long as no boundary condition is specified and no more than page-sized alignment is required (porting it to other architectures would be trivial). I've given it a quick whirl inside a VBox and no smoke came out but I'd appreciate someone with a better understanding of bus_dma(9) and vm/vm_contig.c giving http://www.rulingia.com/bugs/patch-wiredmalloc a once-over. Note that this patch is against 8.x but there's only a trivial difference to head.
>
> BTW, the comment in busdma_machdep.c:bus_dmamem_alloc()
>      * XXX Use Contigmalloc until it is merged into this facility
>      *     and handles multi-seg allocations.  Nobody is doing
>      *     multi-seg allocations yet though.
>      * XXX Certain AGP hardware does.
> does not appear to be accurate. Apart from drm, quite a few drivers call bus_dma_tag_create(9) with multiple segments and also call bus_dmamem_alloc(9) [though I haven't verified that the calls share the same bus_dma_tag, so I can't be absolutely certain].

I do think that all tags currently used with bus_dmamem_alloc() only use a single segment. It's a bit of an unfortunate part of the bus_dmamem API that the size of the allocation is determined by the tag (the tag should be used for determining the features and constraints of a DMA engine, not really the amount of memory to allocate).

However, rather than add a wiredmalloc(), I think you should just have bus_dmamem_alloc() call kmem_alloc_attr() directly in this case. One of the things I've been meaning to add to bus_dma is a way to allocate other memory types (e.g. WC memory), and in that case it would be best to just call kmem_alloc_attr() directly instead.

> BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've noticed they appear to potentially allocate substantial kernel stack under some conditions as several bus_dma(9) functions include:
>     bus_dma_segment_t dm_segments[dmat->nsegments];
> What prevents this overflowing the kernel stack?

That does seem dubious. x86 stores the array in the tag instead.

--
John Baldwin
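To make the last point concrete, here is a rough userland sketch of the "store the array in the tag" idea: the segment array is sized once when the tag is created, so nothing proportional to nsegments ever lands on the stack. All demo_* names are hypothetical and this is not the busdma implementation; in particular it ignores the serialization that concurrent users of one shared array would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DMA segment descriptor. */
struct demo_segment {
    unsigned long addr;
    unsigned long len;
};

/* Hypothetical tag: owns a segment array sized at creation time. */
struct demo_dma_tag {
    int nsegments;
    struct demo_segment *segments;
};

/* Allocate the tag and its segment array together; NULL on failure. */
static struct demo_dma_tag *
demo_tag_create(int nsegments)
{
    struct demo_dma_tag *tag;

    assert(nsegments > 0);
    tag = malloc(sizeof(*tag));
    if (tag == NULL)
        return (NULL);
    /* Heap allocation: cost is paid once, not on every load call. */
    tag->segments = calloc((size_t)nsegments, sizeof(*tag->segments));
    if (tag->segments == NULL) {
        free(tag);
        return (NULL);
    }
    tag->nsegments = nsegments;
    return (tag);
}

/* Release the tag and the array it owns; accepts NULL. */
static void
demo_tag_destroy(struct demo_dma_tag *tag)
{
    if (tag != NULL) {
        free(tag->segments);
        free(tag);
    }
}
```

With this layout a load routine would fill tag->segments in place instead of declaring a variable-length array whose size depends on a caller-supplied segment count.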
Re: kqueue periodic timer confusion
On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
> On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
> > Hi,
> >
> > Sorry about this repost but I'm confused about the responses I received in my last post so I'm looking for some clarification.
> >
> > Specifically, I thought I could use the kqueue timer as essentially a drop in replacement for linuxfd_create/read, but was surprised that the accuracy of the kqueue timer is much less than what I need for my application.
> >
> > So my confusion at this point is whether this is considered to be a bug or feature?
> >
> > Here's some test code if you want to verify the problem:
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <unistd.h>
> > #include <errno.h>
> > #include <sys/types.h>
> > #include <sys/event.h>
> > #include <sys/time.h>
> >
> > int
> > main(void)
> > {
> >     int i, msec;
> >     int kq, nev;
> >     struct kevent inqueue;
> >     struct kevent outqueue;
> >     struct timeval start, end;
> >
> >     if ((kq = kqueue()) == -1) {
> >         fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
> >         exit(EXIT_FAILURE);
> >     }
> >     EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
> >
> >     gettimeofday(&start, 0);
> >     for (i = 0; i < 50; i++) {
> >         if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
> >             fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
> >             exit(EXIT_FAILURE);
> >         } else if (outqueue.flags & EV_ERROR) {
> >             fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
> >             exit(EXIT_FAILURE);
> >         }
> >     }
> >     gettimeofday(&end, 0);
> >
> >     msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
> >
> >     printf("msec = %d\n", msec);
> >
> >     close(kq);
> >     return EXIT_SUCCESS;
> > }
>
> What you are seeing is just the way FreeBSD currently works.
>
> Sleeping (in most all of its various forms, and I've just looked at the kevent code to verify this is true there) is handled by converting the amount of time to sleep (usually specified in a timeval or timespec struct) to a count of timer ticks, using an internal routine called tvtohz() in kern/kern_time.c. That routine rounds up by one tick to account for the current tick. Whether that's a good idea or not (it probably was once, and probably not anymore) it's how things currently work, and could explain the fairly consistent +1ms you're seeing.

This is all true, but mostly irrelevant for his case. EVFILT_TIMER installs a periodic callout that executes KNOTE() and then resets itself (via callout_reset()) each time it runs. This should generally be closer to regularly spaced intervals than something that does:

    for (;;) {
        usleep(20 * 1000);
    }

Which should be subject to the problem you are describing. It would be interesting to see if the callout routine itself is running at the right interval or if it is being delayed. If the latter, then that should be fixed if at all possible.

You could investigate that by adding KTR traces to the relevant callout routine (so recording the TSC timestamp each time the callout runs), and then parsing the ktrdump output to compute TSC deltas and examining that distribution to see if it is noisy or incorrect, etc.

--
John Baldwin
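As a side note on measuring the interval from userland: the elapsed-time arithmetic in the test program is easy to get wrong, so here is a small sketch of one way to compute it with an explicit borrow from the seconds field. The helper name elapsed_msec() is made up for this example, and it assumes end is not earlier than start.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical helper: whole milliseconds elapsed between two
 * gettimeofday() samples, truncating any sub-millisecond remainder.
 */
static long
elapsed_msec(const struct timeval *start, const struct timeval *end)
{
    long sec, usec;

    assert(start != NULL && end != NULL);
    sec = (long)(end->tv_sec - start->tv_sec);
    usec = (long)(end->tv_usec - start->tv_usec);
    if (usec < 0) {
        /* Borrow one second when the microsecond field wrapped. */
        sec -= 1;
        usec += 1000000L;
    }
    return (sec * 1000L + usec / 1000L);
}
```

Dividing the result by the number of timer events received gives the average period, which is the number to compare against the 20 ms that was requested.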
Re: newbus' ivar's limitation..
On Jul 12, 2012, at 6:01 AM, John Baldwin wrote: New-bus is certainly not the only way to organize a device hierarchy and is not perfect, but in your case I suggest you tone down your language until you have enough information to develop an informed opinion. It is also not the only way to represent relationships between objects, or to export services to the rest of the kernel. From earlier descriptions, it seems like some of these relationships aren't very newbus-y. From what I know about FDT, many of them are 'this device's interrupt pin is tied to GPIO 12 on controller 3' which isn't a parent/child relationship, but rather some kind of interrupt cookie you'll need to implement bus_setup_intr. Warner ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: kqueue periodic timer confusion
On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
> On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
> > On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
> > > [...]
> >
> > [...] That routine rounds up by one tick to account for the current tick. Whether that's a good idea or not (it probably was once, and probably not anymore) it's how things currently work, and could explain the fairly consistent +1ms you're seeing.
>
> This is all true, but mostly irrelevant for his case. EVFILT_TIMER installs a periodic callout that executes KNOTE() and then resets itself (via callout_reset()) each time it runs. This should generally be closer to regularly spaced intervals than something that does:

In what way is it irrelevant? That is, what did I miss? It appears to me that the next callout is scheduled by calling timertoticks() passing a count of milliseconds, that count is converted to a struct timeval and passed to tvtohz() which is where the +1 adjustment happens. If you ask for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. There is some time, likely a small number of microseconds, that you've consumed of the current tick, and that's what the +1 in tvtohz() is supposed to account for according to the comments.

The tvtohz() routine both rounds up in the usual way, (value+tick-1)/tick, and then adds one tick on top of that. That seems not quite right to me, except that it is a way to guarantee that you don't return early, and that is the one promise made by sleep routines on any OS; those magical "at least" words always appear in the docs.

Actually what I'm missing (that I know of) is how the scheduler works. Maybe the +1 adjustment to account for the fraction of the current tick you've already consumed is the right thing to do, even when that fraction is 1us or less of a 1ms tick. That would depend on scheduler behavior that I know nothing about.

-- Ian
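The arithmetic described above can be checked with a few lines of userland C. This is a model of the description in this message (round up, then add one tick), not a copy of the kernel's tvtohz(); the function names are invented for the example and hz is assumed to divide one second evenly.

```c
#include <assert.h>

/*
 * Hypothetical model of the conversion as described in the thread:
 * round the request up to whole ticks, then add one more tick to
 * cover the partially consumed current tick.
 */
static long
demo_ticks_one_shot(long usec, long hz)
{
    long tick;

    assert(usec >= 0 && hz > 0 && hz <= 1000000L);
    tick = 1000000L / hz;                   /* microseconds per tick */
    return ((usec + tick - 1) / tick + 1);
}

/* Total wall time, in ms, of `periods` back-to-back timeouts. */
static long
demo_total_msec(long periods, long usec, long hz)
{
    long tick_usec = 1000000L / hz;

    return (periods * demo_ticks_one_shot(usec, hz) * tick_usec / 1000L);
}
```

With hz = 1000, a 20 ms request becomes 21 ticks, so 50 consecutive periods add up to 1050 ms instead of 1000 ms. The extra tick is harmless for a single sleep but accumulates when the same conversion is reused for every period of a timer.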
Re: kqueue periodic timer confusion
On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
> > [...]
> > This is all true, but mostly irrelevant for his case. EVFILT_TIMER installs a periodic callout that executes KNOTE() and then resets itself (via callout_reset()) each time it runs. This should generally be closer to regularly spaced intervals than something that does:
>
> In what way is it irrelevant? That is, what did I miss? It appears to me that the next callout is scheduled by calling timertoticks() passing a count of milliseconds, that count is converted to a struct timeval and passed to tvtohz() which is where the +1 adjustment happens. If you ask for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. [...]
>
> Actually what I'm missing (that I know of) is how the scheduler works. Maybe the +1 adjustment to account for the fraction of the current tick you've already consumed is the right thing to do, even when that fraction is 1us or less of a 1ms tick. That would depend on scheduler behavior that I know nothing about.

Oh. My bad, sorry. You are correct. It is a bug to use +1 in this case. That is, the +1 makes sense when you are computing a one-time delta for things like nanosleep(). It is incorrect when computing a periodic delta such as the interval for an itimer (setitimer) or EVFILT_TIMER.

Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:

sys/kern/kern_time.c:

    /*
     * Real interval timer expired:
     * send process whose timer expired an alarm signal.
     * If time is not set up to reload, then just return.
     * Else compute next time timer should go off which is current
Re: kqueue periodic timer confusion
On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
> On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> > [...]
>
> Oh. My bad, sorry. You are correct. It is a bug to use +1 in this case. That is, the +1 makes sense when you are computing a one-time delta for things like nanosleep(). It is incorrect when computing a periodic delta such as the interval for an itimer (setitimer) or EVFILT_TIMER.
>
> Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:
>
> sys/kern/kern_time.c:
> [...]
Re: kqueue periodic timer confusion
On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote:
> On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
> > On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> > > [...]
> >
> > Oh. My bad, sorry. You are correct. It is a bug to use +1 in this case. That is, the +1 makes sense when you are computing a one-time delta for things like nanosleep(). It is incorrect when computing a periodic delta such as the interval for an itimer (setitimer) or EVFILT_TIMER.
> > [...]
Re: kqueue periodic timer confusion
On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote:
> On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
> > On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> > > [...]
> >
> > Oh. My bad, sorry. You are correct. It is a bug to use +1 in this case. That is, the +1 makes sense when you are computing a one-time delta for things like nanosleep(). It is incorrect when computing a periodic delta such as the interval for an itimer (setitimer) or EVFILT_TIMER.
> > [...]
Re: kqueue periodic timer confusion
On Thu, 2012-07-12 at 09:26 -0500, John Baldwin wrote:
On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:

Hi,

Sorry about this repost but I'm confused about the responses I received in my last post so I'm looking for some clarification. Specifically, I thought I could use the kqueue timer as essentially a drop-in replacement for Linux timerfd_create/read, but was surprised that the accuracy of the kqueue timer is much less than what I need for my application. So my confusion at this point is whether this is considered to be a bug or a feature?

Here's some test code if you want to verify the problem:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int
main(void)
{
	int i,msec;
	int kq,nev;
	struct kevent inqueue;
	struct kevent outqueue;
	struct timeval start,end;

	if ((kq = kqueue()) == -1) {
		fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
		exit(EXIT_FAILURE);
	}
	/* periodic timer, ident 1, period 20 ms */
	EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
	gettimeofday(&start, 0);
	for (i = 0; i < 50; i++) {
		if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
			fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
			exit(EXIT_FAILURE);
		} else if (outqueue.flags & EV_ERROR) {
			fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
			exit(EXIT_FAILURE);
		}
	}
	gettimeofday(&end, 0);
	/* elapsed wall-clock time in milliseconds */
	msec = ((end.tv_sec - start.tv_sec) * 1000) +
	    (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
	printf("msec = %d\n", msec);
	close(kq);
	return EXIT_SUCCESS;
}

What you are seeing is just the way FreeBSD currently works. Sleeping (in most all of its various forms, and I've just looked at the kevent code to verify this is true there) is handled by converting the amount of time to sleep (usually specified in a timeval or timespec struct) to a count of timer ticks, using an internal routine called tvtohz() in kern/kern_time.c. That routine rounds up by one tick to account for the current tick. Whether that's a good idea or not (it probably was once, and probably not anymore) it's how things currently work, and could explain the fairly consistent +1ms you're seeing.

This is all true, but mostly irrelevant for his case. EVFILT_TIMER installs a periodic callout that executes KNOTE() and then resets itself (via callout_reset()) each time it runs. This should generally be closer to regularly spaced intervals than something that does:

In what way is it irrelevant? That is, what did I miss? It appears to me that the next callout is scheduled by calling timertoticks() passing a count of milliseconds, that count is converted to a struct timeval and passed to tvtohz() which is where the +1 adjustment happens. If you ask for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. There is some time, likely a small number of microseconds, that you've consumed of the current tick, and that's what the +1 in tvtohz() is supposed to account for according to the comments.

The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick and then adds one tick on top of that. That seems not quite right to me, except that it is a way to g'tee that you don't return early, and that is the one promise made by sleep routines on any OS; those magical "at least" words always appear in the docs.

Actually what I'm missing (that I know of) is how the scheduler works. Maybe the +1 adjustment to account for the fraction of the current tick you've already consumed is the right thing to do, even when that fraction is 1uS or less of a 1mS tick. That would depend on scheduler behavior that I know nothing about.

Oh. My bad, sorry. You are correct. It is a bug to use +1 in this case. That is, the +1 makes sense when you are computing a one-time delta for things like nanosleep(). It is incorrect when computing a periodic delta such as for computing the interval for an itimer (setitimer) or EVFILT_TIMER(). Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1: sys/kern/kern_time.c: /* * Real
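The tick arithmetic discussed in this thread can be sketched in userland C. This is only an illustration under stated assumptions: hz = 1000 (1 ms ticks), as in the 20 ms example above, and the names sketch_round_up_ticks() and sketch_tvtohz() are hypothetical. The real tvtohz() in sys/kern/kern_time.c also deals with overflow and negative values, which this sketch ignores.

```c
#include <stdint.h>

/*
 * Hypothetical helper: plain round-up of a (sec, usec) interval to a
 * whole number of ticks, i.e. (value + tick - 1) / tick.
 */
static int64_t
sketch_round_up_ticks(int64_t sec, int64_t usec, int hz)
{
	int64_t tick_us = 1000000 / hz;	/* microseconds per tick */

	return (sec * hz + (usec + tick_us - 1) / tick_us);
}

/*
 * Simplified model of the behaviour described in the thread: round up,
 * then add one more tick to cover the partly consumed current tick.
 * This is NOT the kernel's tvtohz(), only a sketch of its arithmetic.
 */
static int64_t
sketch_tvtohz(int64_t sec, int64_t usec, int hz)
{
	return (sketch_round_up_ticks(sec, usec, hz) + 1);
}
```

With hz = 1000, a 20 ms request gives 20 ticks from the plain round-up and 21 ticks once the extra tick is added. If a periodic timer is rearmed with the 21-tick value on every expiry, 50 periods take roughly 1050 ms rather than 1000 ms, which would match the consistent +1 ms per period reported by the test program.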
CVE-2012-0217 Intel's sysret Kernel Privilege Escalation and FreeBSD 6.2/6.3
Good Morning! This was also posted to the FreeBSD forums: I have been researching CVE-2012-0217, and while I have patched the kernels on the 7.3/8.2 servers that I have, I would like to know if anyone can say for sure whether 6.2/6.3 are also vulnerable. I am aware from the documentation that those kernels are out of support. I have looked at the code in trap.c to see if the current patch would work with the 6.3 source, but based on what I saw it won't. I am also aware that upgrading would resolve this; unfortunately, in some of my cases that is not possible right now. Any help would be greatly appreciated, and I can of course test anything that might need it. Thanks! ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: port devel/doxygen failing to test on -CURRENT and -STABLE
On 9 July 2012 02:49, David Xu davi...@freebsd.org wrote:
On 2012/07/08 18:21, Chris Rees wrote:

Hi all / David,

doxygen has been failing for a while now on -CURRENT and apparently -STABLE too. The current fix is disabling one of the tests in the build, but obviously it points to a problem with our base system.

I've trussed [1] the failing code [2], and it looks as though it's hanging on a _umtx call. I'm gratuitously ignorant of what goes on there... but the timings of recent commits to umtx.h [3] could indicate a link (hope it's not bogus...). Any pointers on what I should do next?

Chris

[1] http://www.bayofrum.net/~crees/scratch/doxygen-truss

_umtx_op(0x8012b0280,0x16,0x0,0x0,0x0,0x1) ERR#22 'Invalid argument'

Can you execute it in gdb and print its value?

print/x *(int *)0x8012b0280
print/x *(int *)(0x8012b0280+4)

I've been having trouble debugging it since it's threaded, and so I ran a binary search over the last few days of revisions from 1/Apr to 1/May. Unfortunately I discovered to my horror today that all but the first test were useless, because the patch I committed to disable the test was of course re-added to my ports tree, so none of the tests ran :/

I'll hopefully have it narrowed down to the offending commit over the next few days.

Chris
Re: dtraceall.ko with old nfsclient
Benjamin Kaduk ka...@mit.edu wrote:
On Wed, 11 Jul 2012, Fabian Keil wrote:

I'm using the following modification of Sean's patch: This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 KMOD= dtraceall
-SRCS= dtraceall.c opt_compat.h
+SRCS= dtraceall.c opt_compat.h opt_nfs.h
 CFLAGS+= -I${.CURDIR}/../../..
diff --git a/sys/modules/dtrace/dtraceall/dtraceall.c b/sys/modules/dtrace/dtraceall/dtraceall.c
index d256489..0672556 100644
--- a/sys/modules/dtrace/dtraceall/dtraceall.c
+++ b/sys/modules/dtrace/dtraceall/dtraceall.c
@@ -33,6 +33,7 @@
 #include <sys/module.h>
 #include <sys/errno.h>
 #include "opt_compat.h"
+#include "opt_nfs.h"
 static int
 dtraceall_modevent(module_t mod __unused, int type, void *data __unused)
@@ -67,8 +68,11 @@ MODULE_DEPEND(dtraceall, opensolaris, 1, 1, 1);
 MODULE_DEPEND(dtraceall, dtrace, 1, 1, 1);
 MODULE_DEPEND(dtraceall, dtio, 1, 1, 1);
 MODULE_DEPEND(dtraceall, dtmalloc, 1, 1, 1);
+#if defined (NFSCL)
 MODULE_DEPEND(dtraceall, dtnfscl, 1, 1, 1);
+#elif defined (NFSCLIENT)
 MODULE_DEPEND(dtraceall, dtnfsclient, 1, 1, 1);
+#endif
 #if defined(__amd64__) || defined(__i386__)
 MODULE_DEPEND(dtraceall, fbt, 1, 1, 1);
 MODULE_DEPEND(dtraceall, fasttrap, 1, 1, 1);

Note that dtnfscl.ko is not loaded even though loading it manually works and I have NFSCL in the kernel.

I wasn't entirely clear here, what I meant was that the KERNCONF used when compiling the module included options NFSCL. I didn't expect run-time detection.

This is because dtraceall.c only #includes opt_compat.h, and the kernel build system only passes -include opt_global.h, so the dtraceall module build has no way of knowing about the NFSCL{IENT,} options defined in opt_nfs.h. (As you noted earlier in the thread?)

Yes.

You would still need to address Andriy's comments in order to ensure that the configuration seen by the module matches the kernel.

Sure.

Fabian
Re: kqueue periodic timer confusion
On Thu, 2012-07-12 at 10:40 -0500, Ian Lepore wrote:
On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote:
On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:

Hi,

Sorry about this repost but I'm confused about the responses I received in my last post so I'm looking for some clarification. Specifically, I thought I could use the kqueue timer as essentially a drop-in replacement for Linux timerfd_create/read, but was surprised that the accuracy of the kqueue timer is much less than what I need for my application. So my confusion at this point is whether this is considered to be a bug or a feature?

Here's some test code if you want to verify the problem:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int
main(void)
{
	int i,msec;
	int kq,nev;
	struct kevent inqueue;
	struct kevent outqueue;
	struct timeval start,end;

	if ((kq = kqueue()) == -1) {
		fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
		exit(EXIT_FAILURE);
	}
	/* periodic timer, ident 1, period 20 ms */
	EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
	gettimeofday(&start, 0);
	for (i = 0; i < 50; i++) {
		if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
			fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
			exit(EXIT_FAILURE);
		} else if (outqueue.flags & EV_ERROR) {
			fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
			exit(EXIT_FAILURE);
		}
	}
	gettimeofday(&end, 0);
	/* elapsed wall-clock time in milliseconds */
	msec = ((end.tv_sec - start.tv_sec) * 1000) +
	    (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
	printf("msec = %d\n", msec);
	close(kq);
	return EXIT_SUCCESS;
}

What you are seeing is just the way FreeBSD currently works. Sleeping (in most all of its various forms, and I've just looked at the kevent code to verify this is true there) is handled by converting the amount of time to sleep (usually specified in a timeval or timespec struct) to a count of timer ticks, using an internal routine called tvtohz() in kern/kern_time.c. That routine rounds up by one tick to account for the current tick. Whether that's a good idea or not (it probably was once, and probably not anymore) it's how things currently work, and could explain the fairly consistent +1ms you're seeing.

This is all true, but mostly irrelevant for his case. EVFILT_TIMER installs a periodic callout that executes KNOTE() and then resets itself (via callout_reset()) each time it runs. This should generally be closer to regularly spaced intervals than something that does:

In what way is it irrelevant? That is, what did I miss? It appears to me that the next callout is scheduled by calling timertoticks() passing a count of milliseconds, that count is converted to a struct timeval and passed to tvtohz() which is where the +1 adjustment happens. If you ask for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms. There is some time, likely a small number of microseconds, that you've consumed of the current tick, and that's what the +1 in tvtohz() is supposed to account for according to the comments.

The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick and then adds one tick on top of that. That seems not quite right to me, except that it is a way to g'tee that you don't return early, and that is the one promise made by sleep routines on any OS; those magical "at least" words always appear in the docs.

Actually what I'm missing (that I know of) is how the scheduler works. Maybe the +1 adjustment to account for the fraction of the current tick you've already consumed is the right thing to do, even when that fraction is 1uS or less of a 1mS tick. That would depend on scheduler behavior that I know nothing about.

Oh. My bad, sorry. You are correct. It is a bug to use +1 in this case. That is, the +1 makes sense when you are computing a one-time delta
Re: dtraceall.ko with old nfsclient
on 12/07/2012 21:17 Fabian Keil said the following:
Benjamin Kaduk ka...@mit.edu wrote:
On Wed, 11 Jul 2012, Fabian Keil wrote:

I'm using the following modification of Sean's patch: This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 KMOD= dtraceall
-SRCS= dtraceall.c opt_compat.h
+SRCS= dtraceall.c opt_compat.h opt_nfs.h
 CFLAGS+= -I${.CURDIR}/../../..

If you do "cd sys/modules/dtrace/dtraceall" followed by "make [obj depend] all", does it compile OK with the above change? I am hinting at the case where KERNBUILDDIR is not set. This is not the proper way of building modules, but traditionally we keep it working.

--
Andriy Gapon
Re: dtraceall.ko with old nfsclient
Andriy Gapon a...@freebsd.org wrote:
on 12/07/2012 21:17 Fabian Keil said the following:
Benjamin Kaduk ka...@mit.edu wrote:
On Wed, 11 Jul 2012, Fabian Keil wrote:

I'm using the following modification of Sean's patch: This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 KMOD= dtraceall
-SRCS= dtraceall.c opt_compat.h
+SRCS= dtraceall.c opt_compat.h opt_nfs.h
 CFLAGS+= -I${.CURDIR}/../../..

If you do "cd sys/modules/dtrace/dtraceall" followed by "make [obj depend] all", does it compile OK with the above change?

Depends on your expectations I guess. As neither NFS-related option gets defined, no dependency on either NFS module is registered. The compiler has no complaints, though.

Fabian
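The behaviour Fabian describes (no NFS dependency registered when neither option is defined) follows from how the preprocessor evaluates the patched #if/#elif chain. A minimal userland sketch, assuming an empty opt_nfs.h as in a standalone module build: the SKETCH_NFS_DEP macro, the #else branch and sketch_nfs_dependency() are hypothetical additions that exist only so the result can be inspected; the patch itself has no #else branch.

```c
/*
 * Stand-in for an empty opt_nfs.h: neither NFSCL nor NFSCLIENT is
 * defined at this point, as in a standalone build of the module.
 */

#if defined(NFSCL)
#define	SKETCH_NFS_DEP	"dtnfscl"	/* new NFS client provider */
#elif defined(NFSCLIENT)
#define	SKETCH_NFS_DEP	"dtnfsclient"	/* old NFS client provider */
#else
#define	SKETCH_NFS_DEP	""		/* hypothetical: no dependency */
#endif

/* Hypothetical accessor: reports which dependency the chain selected. */
static const char *
sketch_nfs_dependency(void)
{
	return (SKETCH_NFS_DEP);
}
```

Compiled as-is, the function returns an empty string, mirroring the case where no MODULE_DEPEND() on either NFS provider is emitted. Building with -DNFSCL or -DNFSCLIENT would select the corresponding name instead.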
Re: dtraceall.ko with old nfsclient
on 12/07/2012 22:36 Fabian Keil said the following:
Andriy Gapon a...@freebsd.org wrote:
on 12/07/2012 21:17 Fabian Keil said the following:
Benjamin Kaduk ka...@mit.edu wrote:
On Wed, 11 Jul 2012, Fabian Keil wrote:

I'm using the following modification of Sean's patch: This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 KMOD= dtraceall
-SRCS= dtraceall.c opt_compat.h
+SRCS= dtraceall.c opt_compat.h opt_nfs.h
 CFLAGS+= -I${.CURDIR}/../../..

If you do "cd sys/modules/dtrace/dtraceall" followed by "make [obj depend] all", does it compile OK with the above change?

Depends on your expectations I guess. As neither NFS-related option gets defined, no dependency on either NFS module is registered. The compiler has no complaints, though.

Interesting. Could you repeat after sufficient cleaning up? I am not sure where the opt_nfs.h file could come from.

--
Andriy Gapon
Re: dtraceall.ko with old nfsclient
Andriy Gapon a...@freebsd.org wrote:
on 12/07/2012 22:36 Fabian Keil said the following:
Andriy Gapon a...@freebsd.org wrote:
on 12/07/2012 21:17 Fabian Keil said the following:
Benjamin Kaduk ka...@mit.edu wrote:
On Wed, 11 Jul 2012, Fabian Keil wrote:

I'm using the following modification of Sean's patch: This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 KMOD= dtraceall
-SRCS= dtraceall.c opt_compat.h
+SRCS= dtraceall.c opt_compat.h opt_nfs.h
 CFLAGS+= -I${.CURDIR}/../../..

If you do "cd sys/modules/dtrace/dtraceall" followed by "make [obj depend] all", does it compile OK with the above change?

Depends on your expectations I guess. As neither NFS-related option gets defined, no dependency on either NFS module is registered. The compiler has no complaints, though.

Interesting. Could you repeat after sufficient cleaning up? I am not sure where the opt_nfs.h file could come from.

The Makefile seems to create an empty one:

fk@r500 /usr/src/sys/modules/dtrace/dtraceall $make clean
rm -f export_syms dtraceall.ko dtraceall.kld dtraceall.o dtraceall.ko.debug dtraceall.ko.symbols opt_compat.h opt_nfs.h
fk@r500 /usr/src/sys/modules/dtrace/dtraceall $make
echo "#define COMPAT_FREEBSD32 1" > opt_compat.h
:> opt_nfs.h
cc -O2 -pipe [...] -c /usr/src/sys/modules/dtrace/dtraceall/dtraceall.c
ld -d -warn-common -r -d -o dtraceall.ko.debug dtraceall.o
:> export_syms
awk -f /usr/src/sys/modules/dtrace/dtraceall/../../../conf/kmod_syms.awk dtraceall.ko.debug export_syms | xargs -J% objcopy % dtraceall.ko.debug
objcopy --only-keep-debug dtraceall.ko.debug dtraceall.ko.symbols
objcopy --strip-debug --add-gnu-debuglink=dtraceall.ko.symbols dtraceall.ko.debug dtraceall.ko

Fabian
Re: dtraceall.ko with old nfsclient
On Jul 12, 2012, at 3:39 PM, Andriy Gapon wrote:
on 12/07/2012 22:36 Fabian Keil said the following:
Andriy Gapon a...@freebsd.org wrote:
on 12/07/2012 21:17 Fabian Keil said the following:
Benjamin Kaduk ka...@mit.edu wrote:
On Wed, 11 Jul 2012, Fabian Keil wrote:

I'm using the following modification of Sean's patch: This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 KMOD= dtraceall
-SRCS= dtraceall.c opt_compat.h
+SRCS= dtraceall.c opt_compat.h opt_nfs.h
 CFLAGS+= -I${.CURDIR}/../../..

If you do "cd sys/modules/dtrace/dtraceall" followed by "make [obj depend] all", does it compile OK with the above change?

Depends on your expectations I guess. As neither NFS-related option gets defined, no dependency on either NFS module is registered. The compiler has no complaints, though.

Interesting. Could you repeat after sufficient cleaning up? I am not sure where the opt_nfs.h file could come from.

Maybe related: check out sys/modules/ipfw/Makefile. It makes its own option headers for INET and INET6.

-A

--
Andrew Boyer abo...@averesystems.com
Re: dtraceall.ko with old nfsclient
on 12/07/2012 22:44 Fabian Keil said the following:
Andriy Gapon a...@freebsd.org wrote:
on 12/07/2012 22:36 Fabian Keil said the following:
Andriy Gapon a...@freebsd.org wrote:
on 12/07/2012 21:17 Fabian Keil said the following:
Benjamin Kaduk ka...@mit.edu wrote:
On Wed, 11 Jul 2012, Fabian Keil wrote:

I'm using the following modification of Sean's patch: This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 KMOD= dtraceall
-SRCS= dtraceall.c opt_compat.h
+SRCS= dtraceall.c opt_compat.h opt_nfs.h
 CFLAGS+= -I${.CURDIR}/../../..

If you do "cd sys/modules/dtrace/dtraceall" followed by "make [obj depend] all", does it compile OK with the above change?

Depends on your expectations I guess. As neither NFS-related option gets defined, no dependency on either NFS module is registered. The compiler has no complaints, though.

Interesting. Could you repeat after sufficient cleaning up? I am not sure where the opt_nfs.h file could come from.

The Makefile seems to create an empty one:

fk@r500 /usr/src/sys/modules/dtrace/dtraceall $make clean
rm -f export_syms dtraceall.ko dtraceall.kld dtraceall.o dtraceall.ko.debug dtraceall.ko.symbols opt_compat.h opt_nfs.h
fk@r500 /usr/src/sys/modules/dtrace/dtraceall $make
echo "#define COMPAT_FREEBSD32 1" > opt_compat.h
:> opt_nfs.h
cc -O2 -pipe [...] -c /usr/src/sys/modules/dtrace/dtraceall/dtraceall.c
ld -d -warn-common -r -d -o dtraceall.ko.debug dtraceall.o
:> export_syms
awk -f /usr/src/sys/modules/dtrace/dtraceall/../../../conf/kmod_syms.awk dtraceall.ko.debug export_syms | xargs -J% objcopy % dtraceall.ko.debug
objcopy --only-keep-debug dtraceall.ko.debug dtraceall.ko.symbols
objcopy --strip-debug --add-gnu-debuglink=dtraceall.ko.symbols dtraceall.ko.debug dtraceall.ko

Ah, correct. I now even see the relevant snippet in kmod.mk:

.for _src in ${SRCS:Mopt_*.h}
CLEANFILES+=	${_src}
.if !target(${_src})
${_src}:
	:> ${.TARGET}
.endif
.endfor

It's only when we want those files to be non-empty (to have some options) that we define any targets for them (like COMPAT_FREEBSD32 in the dtraceall Makefile). Sorry for the noise.

--
Andriy Gapon
Re: CVE-2012-0217 Intel's sysret Kernel Privilege Escalation and FreeBSD 6.2/6.3
On 07/12/12 09:36, Bill Crisp wrote:

Good Morning! This was also posted to the FreeBSD forums: I have been researching CVE-2012-0217, and while I have patched the kernels on the 7.3/8.2 servers that I have, I would like to know if anyone can say for sure whether 6.2/6.3 are also vulnerable. I am aware from the documentation that those kernels are out of support. I have looked at the code in trap.c to see if the current patch would work with the 6.3 source, but based on what I saw it won't. I am also aware that upgrading would resolve this; unfortunately, in some of my cases that is not possible right now.

I believe that 6.x are vulnerable. You will have to backport the change, something like this against sys/amd64/amd64/trap.c, in syscall(), right after

PTRACESTOP_SC(p, td, S_PT_SCX);

Add:

+	/*
+	 * If the user-supplied value of %rip is not a canonical
+	 * address, then some CPUs will trigger a ring 0 #GP during
+	 * the sysret instruction.  However, the fault handler would
+	 * execute with the user's %gs and %rsp in ring 0 which would
+	 * not be safe.  Instead, preemptively kill the thread with a
+	 * SIGBUS.
+	 */
+	if (td->td_frame->tf_rip >= VM_MAXUSER_ADDRESS) {
+		ksiginfo_init_trap(&ksi);
+		ksi.ksi_signo = SIGBUS;
+		ksi.ksi_code = BUS_OBJERR;
+		ksi.ksi_trapno = T_PROTFLT;
+		ksi.ksi_addr = (void *)td->td_frame->tf_rip;
+		trapsignal(td, &ksi);
+	}

Right before:

WITNESS_WARN(...)

Cheers,
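The check in the patch above can be illustrated with a small userland sketch of what "canonical" means on amd64. It rests on two labelled assumptions: 48-bit virtual addresses, and 0x0000800000000000 as the first address above user space (a stand-in for VM_MAXUSER_ADDRESS, whose real value comes from the kernel headers). All sketch_* and SKETCH_* names are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48 implemented virtual address bits. */
#define	SKETCH_VA_BITS		48
/* Assumption: stand-in for VM_MAXUSER_ADDRESS on amd64. */
#define	SKETCH_MAXUSER_ADDRESS	UINT64_C(0x0000800000000000)

/*
 * An address is canonical when bits 63..47 are all zeros or all ones,
 * i.e. the upper bits are a sign extension of bit 47.
 */
static bool
sketch_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> (SKETCH_VA_BITS - 1);	/* 17 upper bits */

	return (top == 0 || top == UINT64_C(0x1ffff));
}

/*
 * Mirrors the comparison in the patch: any return %rip at or above the
 * user address limit is rejected before sysret is reached.
 */
static bool
sketch_rip_needs_sigbus(uint64_t rip)
{
	return (rip >= SKETCH_MAXUSER_ADDRESS);
}
```

Under these assumptions, 0x00007fffffffffff is the highest canonical user address, and 0x0000800000000000 is both non-canonical and at the limit, so it would be caught by the comparison. The single ">=" test is enough in the patch because every non-canonical address lies above the user limit.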
Re: /proc filesystem
On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote:

that is what i need. but still need some explanation after using it and reading manual. say:

 PID        START          END PRT  RES PRES REF SHD FL  TP PATH
1378     0x400000     0x5ac000 r-x  385  415   2   1 CN- vn /usr/local/bin/Xorg
1378     0x7ab000     0x7bc000 rw-   17    0   1   0 C-- vn /usr/local/bin/Xorg
1378     0x7bc000     0x800000 rw-   14    0   1   0 C-- df
1378  0x8007ab000  0x8007c3000 r-x   24    0  32   0 CN- vn /libexec/ld-elf.so.1
1378  0x8007c3000  0x8007f0000 rw-   43    0   1   0 C-- df
1378  0x8007f0000  0x8007f2000 rw-    1    0   4   0 --- dv
1378  0x8007f2000  0x8007f4000 rw-    2    0   4   0 --- dv
1378  0x8007f4000  0x800874000 rw-   11    0   4   0 --- dv
1378  0x800874000  0x800884000 rw-   16    0   4   0 --- dv
1378  0x800884000  0x800895000 rw-   10    0   1   0 CN- df
1378  0x8009c2000  0x8009c5000 rw-    3    0   1   0 C-- df

1) Xorg is mapped twice - IMHO first is text/rodata, second is data. But what does REF really mean here, and why is it 2 for the first and 1 for the second?

2) what does PRES (private resident) really mean? df (default) mappings are IMHO anonymous maps==private data of process. so why is RES nonzero while PRES is zero, while on shared code PRES is nonzero and large? what does it really mean?

thanks.

I'm catching up on threads I was following before I went on vacation, and it looks like there was never a response to this. I'm interested in the answers to these questions too, so today I did some spelunking in the code to see what I could figure out. I don't think I really understand things too well, but I'll just say what I think I found and hopefully the experts will correct anything I get wrong.

I think you're right about the first two mappings in that procstat output. The REF value is the reference count on the vm object (the vnode for the exe file, I presume). I think the reason the reference count is 2 is that one reference is the open file itself, and the other is the shadow object. I've always been a bit confused about the concept of shadow objects in freebsd's vm, but I think it's somehow related to the running processes that are based on that executable vnode. For example, if another copy of Xorg were running, I think REF would be 3, and SHD would be 2. I don't know why there is no shadow object for the writable data mapping and why the refcount is only 1 for that.

The PRES thing seemed simple when I first looked at the code, but the more I think about it in relation to other numbers the more confused I get. The logic in the code is if the shadow count is 1 then PRES is the resident size of the shadow object. This seems to be a measure of shared-code usage... any object which could be shared but isn't gets counted as private resident.

The part that confuses me is how PRES can be larger than RES. The value for PRES is taken from the resident_page_count field of the shadow object. The RES value is calculated by walking each page of the map entry and calling pmap_mincore() to see if it's resident. So the number of resident pages is calculated to be fewer than the resident_page_count of the object the entry maps. I don't understand.

Oh hmmm, wait a sec... could it be that read-ahead or relocation fixup or various other things caused lots of pages to be faulted in for the vnode object (so they're resident) but not all of those pages are mapped into the process because the path of execution has never referenced them and caused faults to map them into the process' vmspace?

-- Ian
RE: /proc filesystem
-----Original Message-----
From: owner-freebsd-hack...@freebsd.org [mailto:owner-freebsd-hack...@freebsd.org] On Behalf Of Ian Lepore
Sent: Thursday, July 12, 2012 6:42 PM
To: Wojciech Puchar
Cc: freebsd-hackers@freebsd.org
Subject: Re: /proc filesystem

On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote:

that is what i need. but still need some explanation after using it and reading manual. say:

 PID        START          END PRT  RES PRES REF SHD FL  TP PATH
1378     0x400000     0x5ac000 r-x  385  415   2   1 CN- vn /usr/local/bin/Xorg
1378     0x7ab000     0x7bc000 rw-   17    0   1   0 C-- vn /usr/local/bin/Xorg
1378     0x7bc000     0x800000 rw-   14    0   1   0 C-- df
1378  0x8007ab000  0x8007c3000 r-x   24    0  32   0 CN- vn /libexec/ld-elf.so.1
1378  0x8007c3000  0x8007f0000 rw-   43    0   1   0 C-- df
1378  0x8007f0000  0x8007f2000 rw-    1    0   4   0 --- dv
1378  0x8007f2000  0x8007f4000 rw-    2    0   4   0 --- dv
1378  0x8007f4000  0x800874000 rw-   11    0   4   0 --- dv
1378  0x800874000  0x800884000 rw-   16    0   4   0 --- dv
1378  0x800884000  0x800895000 rw-   10    0   1   0 CN- df
1378  0x8009c2000  0x8009c5000 rw-    3    0   1   0 C-- df

1) Xorg is mapped twice - IMHO first is text/rodata, second is data. But what does REF really mean here, and why is it 2 for the first and 1 for the second?

2) what does PRES (private resident) really mean? df (default) mappings are IMHO anonymous maps==private data of process. so why is RES nonzero while PRES is zero, while on shared code PRES is nonzero and large? what does it really mean?

thanks.

I'm catching up on threads I was following before I went on vacation, and it looks like there was never a response to this. I'm interested in the answers to these questions too, so today I did some spelunking in the code to see what I could figure out. I don't think I really understand things too well, but I'll just say what I think I found and hopefully the experts will correct anything I get wrong.

I think you're right about the first two mappings in that procstat output. The REF value is the reference count on the vm object (the vnode for the exe file, I presume). I think the reason the reference count is 2 is that one reference is the open file itself, and the other is the shadow object. I've always been a bit confused about the concept of shadow objects in freebsd's vm, but I think it's somehow related to the running processes that are based on that executable vnode. For example, if another copy of Xorg were running, I think REF would be 3, and SHD would be 2. I don't know why there is no shadow object for the writable data mapping and why the refcount is only 1 for that.

BSS that doesn't exist in the file?

...
Andrew Duane
Juniper Networks
+1 978-589-0551 (o)
+1 603-770-7088 (m)
adu...@juniper.net
Re: /proc filesystem
On Thu, Jul 12, 2012 at 04:41:58PM -0600, Ian Lepore wrote: On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote: that is what i need. but still need some explanation after using it and reading manual say: PID STARTEND PRT RES PRES REF SHD FL TP PATH 1378 0x40 0x5ac000 r-x 385 415 2 1 CN- vn /usr/local/bin/Xorg 1378 0x7ab000 0x7bc000 rw- 170 1 0 C-- vn /usr/local/bin/Xorg 1378 0x7bc000 0x80 rw- 140 1 0 C-- df 13780x8007ab0000x8007c3000 r-x 240 32 0 CN- vn /libexec/ld-elf.so.1 13780x8007c30000x8007f rw- 430 1 0 C-- df 13780x8007f0x8007f2000 rw-10 4 0 --- dv 13780x8007f20000x8007f4000 rw-20 4 0 --- dv 13780x8007f40000x800874000 rw- 110 4 0 --- dv 13780x8008740000x800884000 rw- 160 4 0 --- dv 13780x8008840000x800895000 rw- 100 1 0 CN- df 13780x8009c20000x8009c5000 rw-30 1 0 C-- df 1) Xorg is mapped twice - IMHO first is text/rodata second is data. But what REF really means here and why it is 2 once and 1 second. ref shows the reference count on the top of the shadow chain. The Xorg text is mapped read-only private and flags indicate that there were no writes to the text (e.g. from debuggers to set breakpoints), so no COW were performed, and no shadows to contain the COW pages were inserted. You see the reference count 2 because text and data mappings are separate vm map entries, and both reference the same vm object. For the Xorg data, there were writes into private writeable mapping, so you can see in flags that COW was performed, and shadow object installed over the vnode vm object. Since the shadow object has a single user, namely the data mapping in the Xorg process, the ref count is 1. 2) what really PRES (private resident) means? df (default) mappings are IMHO anonymous maps==private data of process. so why RES is nonzero while PRES is zero, while on shared code PRES is nonzero and large. what does it really means? thanks. I'm catching up on threads I was following before I went on vacation, and it looks like there was never a response to this. 
> I'm interested in the answers to these questions too, so today I did
> some spelunking in the code to see what I could figure out. I don't
> think I really understand things too well, but I'll just say what I
> think I found and hopefully the experts will correct anything I get
> wrong.
>
> I think you're right about the first two mappings in that procstat
> output. The REF value is the reference count on the vm object (the
> vnode for the exe file, I presume). I think the reason the reference
> count is 2 is that one reference is the open file itself, and the other
> is the shadow object. I've always been a bit confused about the concept

This is wrong, see above for explanation. A vnode's ownership of its vm
object does not show up as an increase in the vm object's reference
count. Instead, filesystems manually manage vm object creation and
destruction, since that fits vnode lifecycle management.

> of shadow objects in freebsd's vm, but I think it's somehow related to
> the running processes that are based on that executable vnode. For
> example, if another copy of Xorg were running, I think REF would be 3,
> and SHD would be 2.
>
> I don't know why there is no shadow object for the writable data mapping
> and why the refcount is only 1 for that.

There _is_ a shadow object, as indicated by the flags showing that the
entry no longer 'needs copy'.

> The PRES thing seemed simple when I first looked at the code, but the
> more I think about it in relation to other numbers the more confused I
> get. The logic in the code is if the shadow count is 1 then PRES is the
> resident size of the shadow object. This seems to be a measure of
> shared-code usage... any object which could be shared but isn't gets
> counted as private resident.
>
> The part that confuses me is how PRES can be larger than RES. The value
> for PRES is taken from the resident_page_count field of the shadow
> object. The RES value is calculated by walking each page of the map
> entry and calling pmap_mincore() to see if it's resident.
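
A toy model of the two counters, as they are described above, may help
show how PRES can end up larger than RES. This is a sketch in Python,
not the kernel code: res() and pres() are hypothetical helpers, the set
called pmap only stands in for what pmap_mincore() would report, and the
page numbers are invented.

```python
# Toy model (not FreeBSD code) of the RES and PRES counters.

def res(entry_pages, pmap):
    """RES: walk each page of the map entry and count those that the
    process has mapped (the role pmap_mincore() plays in the description)."""
    return sum(1 for page in entry_pages if page in pmap)


def pres(obj):
    """PRES: the object's resident page count, reported only when the
    object's shadow count is 1; otherwise 0."""
    if obj["shadow_count"] == 1:
        return obj["resident_page_count"]
    return 0


entry_pages = range(10)       # the map entry covers ten pages
# Eight pages of the object are resident in memory, for whatever reason.
obj = {"resident_page_count": 8, "shadow_count": 1}
pmap = {0, 1, 2}              # but the process has only touched three

print("RES =", res(entry_pages, pmap))
print("PRES =", pres(obj))
```

The two numbers come from different places: one counts what this
process has mapped, the other counts what the object has resident, so
nothing forces the second to be the smaller one.
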
> So the number of resident pages is calculated to be fewer than the
> resident_page_count of the object the entry maps. I don't understand.
>
> Oh hmmm, wait a sec... could it be that read-ahead or relocation fixup
> or various other things caused lots of pages to be faulted in for the
> vnode object (so they're resident) but not all of those pages are mapped
> into the process because the path of execution has never referenced them
> and caused faults to map them into the process' vmspace?

This is mostly right, except the note that established