Re: newbus' ivar's limitation..

2012-07-12 Thread Arnaud Lacombe
Hi,

On Thu, Jul 12, 2012 at 1:20 AM, Warner Losh i...@bsdimp.com wrote:
 I'm sorry you feel that way.

 Honestly, though, I think you'll be more pissed when you find out that the 
 N:1 interface that you want is being done in the wrong domain.  But I've been 
 wrong before and look forward to seeing your replacement.

 acpi_pcib_acpi.c, btw, implements both PCIB interfaces and ACPI interfaces.

Does it? From the definition of `acpi_pcib_acpi_methods', I can only
see a single pcib(4) interface being exported. Moreover, I cannot find
any clue that would lead any ACPI device to attach to
acpi_pcib_acpi(4), nor can I see how acpi_get_flags() could end up
in acpi_pcib_read_ivar().

Thks,
 - Arnaud
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: nvidia-driver-295.49 is highly unstable

2012-07-12 Thread John Baldwin
On Tuesday, July 03, 2012 11:20:49 pm Yuri wrote:
 On 05/27/2012 13:08, Alexey Dokuchaev wrote:
  Perhaps you can try asking on official nVidia FreeBSD forum:
 
  http://www.nvnews.net/vbulletin/forumdisplay.php?f=47
 
 I reported it there on 05-28-12, but got no response.
 Do you know if there is a way to report a problem to NVidia? For
 example, is there a bugzilla or other bug reporting system
 for this?
 In addition, I observe the system hang for a few seconds when running
 glxinfo. I also observe Xorg freezing when I run nvidia-settings.
 So I have to run 285.05.09 from cvs instead.

If you read the README that comes with the driver there are notes on how to 
submit a bug report.  You need to run a script that collects debug information 
and then submit an e-mail to the e-mail address in the README.  It sometimes 
takes a few days for them to respond, but I've always had them respond.

-- 
John Baldwin


Re: newbus' ivar's limitation..

2012-07-12 Thread John Baldwin
On Thursday, July 12, 2012 3:01:36 am Arnaud Lacombe wrote:
 Hi,
 
 On Thu, Jul 12, 2012 at 1:20 AM, Warner Losh i...@bsdimp.com wrote:
  I'm sorry you feel that way.
 
  Honestly, though, I think you'll be more pissed when you find out that the 
N:1 interface that you want is being done in the wrong domain.  But I've been 
wrong before and look forward to seeing your replacement.
 
  acpi_pcib_acpi.c, btw, implements both PCIB interfaces and ACPI 
interfaces.
 
 Does it? From the definition of `acpi_pcib_acpi_methods', I can only
 see a single pcib(4) interface being exported. Moreover, I cannot find
 any clue that would lead any ACPI device to attach to
 acpi_pcib_acpi(4), nor can I see how acpi_get_flags() could end up
 in acpi_pcib_read_ivar().

acpi_get_handle() is certainly supported.

Relevant code snippets:

sys/dev/acpica/acpivar.h:

/*
 * Note that the low ivar values are reserved to provide
 * interface compatibility with ISA drivers which can also
 * attach to ACPI.
 */
#define ACPI_IVAR_HANDLE    0x100
#define ACPI_IVAR_UNUSED    0x101   /* Unused/reserved. */
#define ACPI_IVAR_PRIVATE   0x102
#define ACPI_IVAR_FLAGS     0x103

/*
 * Accessor functions for our ivars.  Default value for BUS_READ_IVAR is
 * (type) 0.  The sys/bus.h accessor functions don't check return values.
 */
#define __ACPI_BUS_ACCESSOR(varp, var, ivarp, ivar, type)             \
                                                                       \
static __inline type varp ## _get_ ## var(device_t dev)                \
{                                                                      \
    uintptr_t v = 0;                                                   \
    BUS_READ_IVAR(device_get_parent(dev), dev,                         \
        ivarp ## _IVAR_ ## ivar, &v);                                  \
    return ((type) v);                                                 \
}                                                                      \
                                                                       \
static __inline void varp ## _set_ ## var(device_t dev, type t)        \
{                                                                      \
    uintptr_t v = (uintptr_t) t;                                       \
    BUS_WRITE_IVAR(device_get_parent(dev), dev,                        \
        ivarp ## _IVAR_ ## ivar, v);                                   \
}

__ACPI_BUS_ACCESSOR(acpi, handle, ACPI, HANDLE, ACPI_HANDLE)
__ACPI_BUS_ACCESSOR(acpi, private, ACPI, PRIVATE, void *)
__ACPI_BUS_ACCESSOR(acpi, flags, ACPI, FLAGS, int)

sys/dev/acpica/acpi_pcib_acpi.c:

/*
 * Support for standard PCI bridge ivars.
 */
static int
acpi_pcib_read_ivar(device_t dev, device_t child, int which, uintptr_t *result)
{
    struct acpi_hpcib_softc *sc = device_get_softc(dev);

    switch (which) {
    case PCIB_IVAR_DOMAIN:
        *result = sc->ap_segment;
        return (0);
    case PCIB_IVAR_BUS:
        *result = sc->ap_bus;
        return (0);
    case ACPI_IVAR_HANDLE:
        *result = (uintptr_t)sc->ap_handle;
        return (0);
    case ACPI_IVAR_FLAGS:
        *result = (uintptr_t)sc->ap_flags;
        return (0);
    }
    return (ENOENT);
}

sys/dev/acpica/acpi_pcib_pci.c:

static int
acpi_pcib_read_ivar(device_t dev, device_t child, int which, uintptr_t *result)
{
    struct acpi_pcib_softc *sc = device_get_softc(dev);

    switch (which) {
    case ACPI_IVAR_HANDLE:
        *result = (uintptr_t)sc->ap_handle;
        return (0);
    }
    return (pcib_read_ivar(dev, child, which, result));
}

This is used by the ACPI PCI bus driver to detect buses that are enumerated 
via ACPI and to then provide the ACPI_IVAR_HANDLE for all such PCI devices.
Note that ACPI PCI uses its own ivars structure (acpi_pci_devinfo) that 
extends the base PCI ivars to add ACPI handle and flags.

sys/dev/acpica/acpi_pci.c:

struct acpi_pci_devinfo {
    struct pci_devinfo  ap_dinfo;
    ACPI_HANDLE         ap_handle;
    int                 ap_flags;
};

...

static int
acpi_pci_read_ivar(device_t dev, device_t child, int which, uintptr_t *result)
{
    struct acpi_pci_devinfo *dinfo;

    dinfo = device_get_ivars(child);
    switch (which) {
    case ACPI_IVAR_HANDLE:
        *result = (uintptr_t)dinfo->ap_handle;
        return (0);
    case ACPI_IVAR_FLAGS:
        *result = (uintptr_t)dinfo->ap_flags;
        return (0);
    }
    return (pci_read_ivar(dev, child, which, result));
}

...

static int
acpi_pci_attach(device_t dev)
{
...
/*
 * First, PCI devices are added as in the normal PCI bus driver.
 * Afterwards, the ACPI namespace under the bridge driver is
 * walked to save ACPI handles to all the devices that appear in
 * the ACPI namespace as immediate descendants of the bridge.
 *
 * XXX: Sometimes PCI devices show up in the ACPI namespace that
 * pci_add_children() doesn't find.  We currently just ignore
 * these devices.
 */
pci_add_children(dev, domain, busno, 

Re: contigmalloc() breaking Xorg

2012-07-12 Thread John Baldwin
[ Adding alc@ for VM stuff, Warner for arm/mips bus dma brokenness ]

On Sunday, July 08, 2012 7:05:16 am Peter Jeremy wrote:
 On 2012-Jul-03 21:17:53 +1000, Peter Jeremy pe...@server.rulingia.com wrote:
 I have a reasonably recent 8-stable/amd64 system (r237444) with an ATI
 Radeon HD 2400 Pro, xorg-server-1.10.6,1, xf86-video-ati-6.14.3_1,
 8GB RAM and ZFS.  I'm seeing fairly consistent problems with Xorg
 ...
 How difficult would it be to modify bus_dmamem_alloc() [at least on
 x86] to handle multi-segment allocations?
 
 I think I've managed to create an amd64 bus_dmamem_alloc() that allows
 page-sized allocations as long as no boundary condition is specified
 and no more than page-sized alignment is required (porting it to other
 architectures would be trivial).  I've given it a quick whirl inside a
 VBox and no smoke came out but I'd appreciate someone with a better
 understanding of bus_dma(9) and vm/vm_contig.c giving
 http://www.rulingia.com/bugs/patch-wiredmalloc a once-over.  Note that
 this patch is against 8.x but there's only a trivial difference to head.
 
 BTW, the comment in busdma_machdep.c:bus_dmamem_alloc()
* XXX Use Contigmalloc until it is merged into this facility
* and handles multi-seg allocations.  Nobody is doing
* multi-seg allocations yet though.
* XXX Certain AGP hardware does.
 does not appear to be accurate.  Apart from drm, quite a few drivers
 call bus_dma_tag_create(9) with multiple segments and also call
 bus_dmamem_alloc(9) [though I haven't verified that the calls share
 the same bus_dma_tag, so I can't be absolutely certain].

I do think that all tags currently used with bus_dmamem_alloc() only use a 
single segment.  It's a bit of an unfortunate part of the bus_dmamem API
that the size of the allocation is determined by the tag (the tag should be
used for determining the features and constraints of a DMA engine, not really 
the amount of memory to allocate).

However, rather than add a wiredmalloc(), I think you should just have 
bus_dmamem_alloc() call kmem_alloc_attr() directly in this case.  One of the 
things I've been meaning to add to bus_dma is a way to allocate other memory 
types (e.g. WC memory), and in that case it would be best to just call 
kmem_alloc_attr() directly instead.

 BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've
 noticed they appear to potentially allocate substantial kernel stack
 under some conditions as several bus_dma(9) functions include:
 bus_dma_segment_t dm_segments[dmat->nsegments];
 What prevents this overflowing the kernel stack?

That does seem dubious.  x86 stores the array in the tag instead.

-- 
John Baldwin


Re: kqueue periodic timer confusion

2012-07-12 Thread John Baldwin
On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
 On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
  Hi,
  
  Sorry about this repost but I'm confused about the responses I received
  in my last post so I'm looking for some clarification.
  
  Specifically, I thought I could use the kqueue timer as essentially a
  drop-in replacement for Linux's timerfd_create/read, but was surprised that
  the accuracy of the kqueue timer is much less than what I need for my
  application.
  
  So my confusion at this point is whether this is considered to be a bug or
  a feature.
  
  Here's some test code if you want to verify the problem:
  
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <errno.h>
  #include <sys/types.h>
  #include <sys/event.h>
  #include <sys/time.h>
  
  int
  main(void)
  {
      int i, msec;
      int kq, nev;
      struct kevent inqueue;
      struct kevent outqueue;
      struct timeval start, end;
  
      if ((kq = kqueue()) == -1) {
          fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
          exit(EXIT_FAILURE);
      }
      EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
  
      gettimeofday(&start, 0);
      for (i = 0; i < 50; i++) {
          if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
              fprintf(stderr, "kevent error!? errno = %s", strerror(errno));
              exit(EXIT_FAILURE);
          } else if (outqueue.flags & EV_ERROR) {
              fprintf(stderr, "EV_ERROR: %s\n", strerror(outqueue.data));
              exit(EXIT_FAILURE);
          }
      }
      gettimeofday(&end, 0);
  
      msec = ((end.tv_sec - start.tv_sec) * 1000) +
          (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);
  
      printf("msec = %d\n", msec);
  
      close(kq);
      return EXIT_SUCCESS;
  }
  
  
 
 What you are seeing is just the way FreeBSD currently works.  
 
 Sleeping (in most all of its various forms, and I've just looked at the
 kevent code to verify this is true there) is handled by converting the
 amount of time to sleep (usually specified in a timeval or timespec
 struct) to a count of timer ticks, using an internal routine called
 tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
 account for the current tick.  Whether that's a good idea or not (it
 probably was once, and probably not anymore) it's how things currently
 work, and could explain the fairly consistent +1ms you're seeing.

This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
installs a periodic callout that executes KNOTE() and then resets itself (via 
callout_reset()) each time it runs.  This should generally be closer to
regularly spaced intervals than something that does:

for (;;) {
    usleep(20 * 1000);
}

Which should be subject to the problem you are describing.  It would be 
interesting to see if the callout routine itself is running at the right 
interval or if it is being delayed.  If the latter, then that should be fixed 
if at all possible.  You could investigate that by adding KTR traces to the
relevant callout routine (so recording the TSC timestamp each time the callout 
runs), and then parsing the ktrdump output to compute TSC deltas and examining 
that distribution to see if it is noisy or incorrect, etc.

-- 
John Baldwin


Re: newbus' ivar's limitation..

2012-07-12 Thread Warner Losh

On Jul 12, 2012, at 6:01 AM, John Baldwin wrote:
 New-bus is certainly not the only way to organize a device hierarchy and is 
 not perfect, but in your case I suggest you tone down your language until you 
 have enough information to develop an informed opinion.

It is also not the only way to represent relationships between objects, or to 
export services to the rest of the kernel.  From earlier descriptions, it seems 
like some of these relationships aren't very newbus-y.  From what I know about 
FDT, many of them are 'this device's interrupt pin is tied to GPIO 12 on 
controller 3' which isn't a parent/child relationship, but rather some kind of 
interrupt cookie you'll need to implement bus_setup_intr.

Warner



Re: kqueue periodic timer confusion

2012-07-12 Thread Ian Lepore
On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
 On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
  On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
   Hi,
   
   Sorry about this repost but I'm confused about the responses I received
   in my last post so I'm looking for some clarification.
   
   Specifically, I thought I could use the kqueue timer as essentially a
   drop-in replacement for Linux's timerfd_create/read, but was surprised that
   the accuracy of the kqueue timer is much less than what I need for my
   application.
   
   So my confusion at this point is whether this is considered to be a bug or
   a feature.
   
  What you are seeing is just the way FreeBSD currently works.  
  
  Sleeping (in most all of its various forms, and I've just looked at the
  kevent code to verify this is true there) is handled by converting the
  amount of time to sleep (usually specified in a timeval or timespec
  struct) to a count of timer ticks, using an internal routine called
  tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
  account for the current tick.  Whether that's a good idea or not (it
  probably was once, and probably not anymore) it's how things currently
  work, and could explain the fairly consistent +1ms you're seeing.
 
 This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
 installs a periodic callout that executes KNOTE() and then resets itself (via 
 callout_reset()) each time it runs.  This should generally be closer to
 regularly spaced intervals than something that does:
 

In what way is it irrelevant?  That is, what did I miss?  It appears to
me that the next callout is scheduled by calling timertoticks() passing
a count of milliseconds, that count is converted to a struct timeval and
passed to tvtohz() which is where the +1 adjustment happens.  If you ask
for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
There is some time, likely a small number of microseconds, that you've
consumed of the current tick, and that's what the +1 in tvtohz() is
supposed to account for according to the comments.  

The tvtohz() routine both rounds up in the usual way, (value+tick-1)/tick,
and then adds one tick on top of that.  That seems not quite right to
me, except that it is a way to guarantee that you don't return early, and
that is the one promise made by sleep routines on any OS; those magical
"at least" words always appear in the docs.

Actually what I'm missing (that I know of) is how the scheduler works.
Maybe the +1 adjustment to account for the fraction of the current tick
you've already consumed is the right thing to do, even when that
fraction is 1us or less of a 1ms tick.  That would depend on scheduler
behavior that I know nothing about.

-- Ian




Re: kqueue periodic timer confusion

2012-07-12 Thread John Baldwin
On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
 On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
  On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
   On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
Hi,

Sorry about this repost but I'm confused about the responses I received
in my last post so I'm looking for some clarification.

Specifically, I thought I could use the kqueue timer as essentially a
drop-in replacement for Linux's timerfd_create/read, but was surprised that
the accuracy of the kqueue timer is much less than what I need for my
application.

So my confusion at this point is whether this is considered to be a bug or
a feature.

   What you are seeing is just the way FreeBSD currently works.  
   
   Sleeping (in most all of its various forms, and I've just looked at the
   kevent code to verify this is true there) is handled by converting the
   amount of time to sleep (usually specified in a timeval or timespec
   struct) to a count of timer ticks, using an internal routine called
   tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
   account for the current tick.  Whether that's a good idea or not (it
   probably was once, and probably not anymore) it's how things currently
   work, and could explain the fairly consistent +1ms you're seeing.
  
  This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
  installs a periodic callout that executes KNOTE() and then resets itself 
  (via 
  callout_reset()) each time it runs.  This should generally be closer to
  regularly spaced intervals than something that does:
  
 
 In what way is it irrelevant?  That is, what did I miss?  It appears to
 me that the next callout is scheduled by calling timertoticks() passing
 a count of milliseconds, that count is converted to a struct timeval and
 passed to tvtohz() which is where the +1 adjustment happens.  If you ask
 for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
 There is some time, likely a small number of microseconds, that you've
 consumed of the current tick, and that's what the +1 in tvtohz() is
 supposed to account for according to the comments.
 
 The tvtohz() routine both rounds up in the usual way, (value+tick-1)/tick,
 and then adds one tick on top of that.  That seems not quite right to
 me, except that it is a way to guarantee that you don't return early, and
 that is the one promise made by sleep routines on any OS; those magical
 "at least" words always appear in the docs.
 
 Actually what I'm missing (that I know of) is how the scheduler works.
 Maybe the +1 adjustment to account for the fraction of the current tick
 you've already consumed is the right thing to do, even when that
 fraction is 1us or less of a 1ms tick.  That would depend on scheduler
 behavior that I know nothing about.

Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
case.  That is, the +1 makes sense when you are computing a one-time delta
for things like nanosleep().  It is incorrect when computing a periodic
delta such as for computing the interval for an itimer (setitimer) or
EVFILT_TIMER().

Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:

sys/kern/kern_time.c:

/*
 * Real interval timer expired:
 * send process whose timer expired an alarm signal.
 * If time is not set up to reload, then just return.
 * Else compute next time timer should go off which is  current 

Re: kqueue periodic timer confusion

2012-07-12 Thread Davide Italiano
On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
 On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
 On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
  On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
   On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
Hi,
   
Sorry about this repost but I'm confused about the responses I received
in my last post so I'm looking for some clarification.
   
Specifically, I thought I could use the kqueue timer as essentially a
drop-in replacement for Linux's timerfd_create/read, but was surprised that
the accuracy of the kqueue timer is much less than what I need for my
application.

So my confusion at this point is whether this is considered to be a bug or
a feature.
   
   What you are seeing is just the way FreeBSD currently works.
  
   Sleeping (in most all of its various forms, and I've just looked at the
   kevent code to verify this is true there) is handled by converting the
   amount of time to sleep (usually specified in a timeval or timespec
   struct) to a count of timer ticks, using an internal routine called
   tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
   account for the current tick.  Whether that's a good idea or not (it
   probably was once, and probably not anymore) it's how things currently
   work, and could explain the fairly consistent +1ms you're seeing.
 
  This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
  installs a periodic callout that executes KNOTE() and then resets itself 
  (via
  callout_reset()) each time it runs.  This should generally be closer to
  regularly spaced intervals than something that does:
 

 In what way is it irrelevant?  That is, what did I miss?  It appears to
 me that the next callout is scheduled by calling timertoticks() passing
 a count of milliseconds, that count is converted to a struct timeval and
 passed to tvtohz() which is where the +1 adjustment happens.  If you ask
 for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
 There is some time, likely a small number of microseconds, that you've
 consumed of the current tick, and that's what the +1 in tvtohz() is
 supposed to account for according to the comments.

 The tvtohz() routine both rounds up in the usual way, (value+tick-1)/tick,
 and then adds one tick on top of that.  That seems not quite right to
 me, except that it is a way to guarantee that you don't return early, and
 that is the one promise made by sleep routines on any OS; those magical
 "at least" words always appear in the docs.

 Actually what I'm missing (that I know of) is how the scheduler works.
 Maybe the +1 adjustment to account for the fraction of the current tick
 you've already consumed is the right thing to do, even when that
 fraction is 1us or less of a 1ms tick.  That would depend on scheduler
 behavior that I know nothing about.

 Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
 case.  That is, the +1 makes sense when you are computing a one-time delta
 for things like nanosleep().  It is incorrect when computing a periodic
 delta such as for computing the interval for an itimer (setitimer) or
 EVFILT_TIMER().

 Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:

 sys/kern/kern_time.c:

 /*
  * Real interval timer expired:
  * send process whose timer expired an alarm signal.
  * If time is not set up to reload, then just return.
  * 

Re: kqueue periodic timer confusion

2012-07-12 Thread John Baldwin
On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote:
 On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
  On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
  On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
   On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
 Hi,

 Sorry about this repost but I'm confused about the responses I 
 received
 in my last post so I'm looking for some clarification.

 Specifically, I thought I could use the kqueue timer as essentially a
 drop-in replacement for Linux's timerfd_create/read, but was surprised that
 the accuracy of the kqueue timer is much less than what I need for my
 application.

 So my confusion at this point is whether this is considered to be a bug or
 a feature.

What you are seeing is just the way FreeBSD currently works.
   
Sleeping (in most all of its various forms, and I've just looked at the
kevent code to verify this is true there) is handled by converting the
amount of time to sleep (usually specified in a timeval or timespec
struct) to a count of timer ticks, using an internal routine called
tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
account for the current tick.  Whether that's a good idea or not (it
probably was once, and probably not anymore) it's how things currently
work, and could explain the fairly consistent +1ms you're seeing.
  
   This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
   installs a periodic callout that executes KNOTE() and then resets itself 
   (via
   callout_reset()) each time it runs.  This should generally be closer to
   regularly spaced intervals than something that does:
  
 
  In what way is it irrelevant?  That is, what did I miss?  It appears to
  me that the next callout is scheduled by calling timertoticks() passing
  a count of milliseconds, that count is converted to a struct timeval and
  passed to tvtohz() which is where the +1 adjustment happens.  If you ask
  for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
  There is some time, likely a small number of microseconds, that you've
  consumed of the current tick, and that's what the +1 in tvtohz() is
  supposed to account for according to the comments.
 
  The tvtohz() routine both rounds up in the usual way, (value+tick-1)/tick,
  and then adds one tick on top of that.  That seems not quite right to
  me, except that it is a way to guarantee that you don't return early, and
  that is the one promise made by sleep routines on any OS; those magical
  "at least" words always appear in the docs.
 
  Actually what I'm missing (that I know of) is how the scheduler works.
  Maybe the +1 adjustment to account for the fraction of the current tick
  you've already consumed is the right thing to do, even when that
  fraction is 1us or less of a 1ms tick.  That would depend on scheduler
  behavior that I know nothing about.
 
  Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
  case.  That is, the +1 makes sense when you are computing a one-time delta
  for things like nanosleep().  It is incorrect when computing a periodic
  delta such as for computing the interval for an itimer (setitimer) or
  EVFILT_TIMER().
 
  Hah, setitimer()'s callout (realitexpire) uses 

Re: kqueue periodic timer confusion

2012-07-12 Thread Ian Lepore
On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote:
 On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
  On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
  On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
   On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
 Hi,

 Sorry about this repost but I'm confused about the responses I 
 received
 in my last post so I'm looking for some clarification.

 Specifically, I thought I could use the kqueue timer as essentially a
 drop-in replacement for Linux timerfd_create/read, but was surprised that
 the accuracy of the kqueue timer is much less than what I need for my
 application.

 So my confusion at this point is whether this is considered to be a bug or a
 feature?

 Here's some test code if you want to verify the problem:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/event.h>
 #include <sys/time.h>

 int
 main(void)
 {
 	int i, msec;
 	int kq, nev;
 	struct kevent inqueue;
 	struct kevent outqueue;
 	struct timeval start, end;

 	if ((kq = kqueue()) == -1) {
 		fprintf(stderr, "kqueue error!? errno = %s",
 		    strerror(errno));
 		exit(EXIT_FAILURE);
 	}
 	EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

 	gettimeofday(&start, 0);
 	for (i = 0; i < 50; i++) {
 		if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1,
 		    NULL)) == -1) {
 			fprintf(stderr, "kevent error!? errno = %s",
 			    strerror(errno));
 			exit(EXIT_FAILURE);
 		} else if (outqueue.flags & EV_ERROR) {
 			fprintf(stderr, "EV_ERROR: %s\n",
 			    strerror(outqueue.data));
 			exit(EXIT_FAILURE);
 		}
 	}
 	gettimeofday(&end, 0);

 	msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 +
 	    end.tv_usec - start.tv_usec) / 1000) - 1000);

 	printf("msec = %d\n", msec);

 	close(kq);
 	return EXIT_SUCCESS;
 }


   
What you are seeing is just the way FreeBSD currently works.
   
Sleeping (in most all of its various forms, and I've just looked at the
kevent code to verify this is true there) is handled by converting the
amount of time to sleep (usually specified in a timeval or timespec
struct) to a count of timer ticks, using an internal routine called
tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
account for the current tick.  Whether that's a good idea or not (it
probably was once, and probably not anymore) it's how things currently
work, and could explain the fairly consistent +1ms you're seeing.
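
To make the arithmetic concrete, here is a minimal user-space model of the conversion described above. model_tvtohz() and model_elapsed_msec() are hypothetical helpers, not kernel functions: the model takes a plain microsecond count instead of a struct timeval, skips the kernel's overflow handling, and assumes hz=1000 (a 1000 us tick).

```c
#include <assert.h>

/*
 * Hypothetical model of the conversion described above (not the kernel
 * source): round the interval up to whole ticks, then add one extra tick
 * for the partially consumed current tick.
 */
static long
model_tvtohz(long usec, long tick_usec)
{
	assert(usec >= 0 && tick_usec > 0);
	return ((usec + tick_usec - 1) / tick_usec + 1);
}

/* Elapsed milliseconds for 'periods' back-to-back intervals of 'ticks'. */
static long
model_elapsed_msec(long periods, long ticks, long tick_usec)
{
	return (periods * ticks * tick_usec / 1000);
}
```

Under those assumptions a 20 ms interval becomes 21 ticks, so 50 back-to-back periods would take about 1050 ms instead of 1000 ms, in line with the fairly consistent +1ms per period described above.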
  
   This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
   installs a periodic callout that executes KNOTE() and then resets itself
   (via callout_reset()) each time it runs.  This should generally be closer to
   regularly spaced intervals than something that does:
  
 
  In what way is it irrelevant?  That is, what did I miss?  It appears to
  me that the next callout is scheduled by calling timertoticks() passing
  a count of milliseconds, that count is converted to a struct timeval and
  passed to tvtohz() which is where the +1 adjustment happens.  If you ask
  for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
  There is some time, likely a small number of microseconds, that you've
  consumed of the current tick, and that's what the +1 in tvtohz() is
  supposed to account for according to the comments.
 
  The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
  and then adds one tick on top of that.  That seems not quite right to
  me, except that it is a way to guarantee that you don't return early, and
  that is the one promise made by sleep routines on any OS; those magical
  "at least" words always appear in the docs.
 
  Actually what I'm missing (that I know of) is how the scheduler works.
  Maybe the +1 adjustment to account for the fraction of the current tick
  you've already consumed is the right thing to do, even when that
  fraction is 1us or less of a 1ms tick.  That would depend on scheduler
  behavior that I know nothing about.
 
  Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
  case.  That is, the +1 makes sense when you are computing a one-time delta
  for things like nanosleep().  It is incorrect when computing a periodic
  delta such as for computing the interval for an itimer (setitimer) or
  EVFILT_TIMER().
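
The one-time versus periodic distinction can be sketched the same way. The helpers below are hypothetical: ticks_oneshot() repeats the round-up-plus-one model, and ticks_periodic() drops the extra tick, which corresponds to the "tvtohz - 1" that a later reply in this thread reports setitimer()'s realitexpire() callout using. The sketch assumes the interval is an exact multiple of the tick.

```c
#include <assert.h>

/* Hypothetical one-shot conversion: round up, then add one tick. */
static long
ticks_oneshot(long usec, long tick_usec)
{
	assert(usec >= 0 && tick_usec > 0);
	return ((usec + tick_usec - 1) / tick_usec + 1);
}

/*
 * Hypothetical periodic conversion: a callout that re-arms itself already
 * runs on a tick boundary, so the extra tick is not added again.  Never
 * return less than one tick.
 */
static long
ticks_periodic(long usec, long tick_usec)
{
	long t = ticks_oneshot(usec, tick_usec) - 1;

	return (t < 1 ? 1 : t);
}

/*
 * Accumulated error in ticks after n periods, assuming usec is an exact
 * multiple of tick_usec.
 */
static long
drift_ticks(long n, long usec, long tick_usec, int periodic)
{
	long per = periodic ? ticks_periodic(usec, tick_usec) :
	    ticks_oneshot(usec, tick_usec);

	return (n * (per - usec / tick_usec));
}
```

With a 20 ms period and a 1 ms tick, the one-shot conversion loses one tick per period (50 ticks after 50 periods), while the periodic conversion accumulates no error.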
 
  Hah, setitimer()'s callout (realitexpire) uses 

Re: kqueue periodic timer confusion

2012-07-12 Thread Paul Albrecht
On Thu, 2012-07-12 at 09:26 -0500, John Baldwin wrote:
 On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
  On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
   On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
 Hi,
 
 Sorry about this repost, but I'm confused about the responses I received
 in my last post, so I'm looking for some clarification.
 
 Specifically, I thought I could use the kqueue timer as essentially a
 drop-in replacement for Linux timerfd_create/read, but was surprised that
 the accuracy of the kqueue timer is much less than what I need for my
 application.
 
 So my confusion at this point is whether this is considered to be a bug or a
 feature?
 
 Here's some test code if you want to verify the problem:
 
 [test program snipped; identical to the one quoted in the previous message]
 
 

What you are seeing is just the way FreeBSD currently works.  

Sleeping (in most all of its various forms, and I've just looked at the
kevent code to verify this is true there) is handled by converting the
amount of time to sleep (usually specified in a timeval or timespec
struct) to a count of timer ticks, using an internal routine called
tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
account for the current tick.  Whether that's a good idea or not (it
probably was once, and probably not anymore) it's how things currently
work, and could explain the fairly consistent +1ms you're seeing.
   
   This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
   installs a periodic callout that executes KNOTE() and then resets itself
   (via callout_reset()) each time it runs.  This should generally be closer to
   regularly spaced intervals than something that does:
   
  
  In what way is it irrelevant?  That is, what did I miss?  It appears to
  me that the next callout is scheduled by calling timertoticks() passing
  a count of milliseconds, that count is converted to a struct timeval and
  passed to tvtohz() which is where the +1 adjustment happens.  If you ask
  for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
  There is some time, likely a small number of microseconds, that you've
  consumed of the current tick, and that's what the +1 in tvtohz() is
  supposed to account for according to the comments.
  
  The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
  and then adds one tick on top of that.  That seems not quite right to
  me, except that it is a way to guarantee that you don't return early, and
  that is the one promise made by sleep routines on any OS; those magical
  "at least" words always appear in the docs.
  
  Actually what I'm missing (that I know of) is how the scheduler works.
  Maybe the +1 adjustment to account for the fraction of the current tick
  you've already consumed is the right thing to do, even when that
  fraction is 1us or less of a 1ms tick.  That would depend on scheduler
  behavior that I know nothing about.
 
 Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
 case.  That is, the +1 makes sense when you are computing a one-time delta
 for things like nanosleep().  It is incorrect when computing a periodic
 delta such as for computing the interval for an itimer (setitimer) or
 EVFILT_TIMER().
 
 Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:
 
 sys/kern/kern_time.c:
 
 /*
  * Real 

CVE-2012-0217 Intel's sysret Kernel Privilege Escalation and FreeBSD 6.2/6.3

2012-07-12 Thread Bill Crisp
Good Morning!

This was also posted to the FreeBSD forums:

I have been researching CVE-2012-0217, and while I have patched the kernels
on the servers I have running 7.3/8.2, I would like to know whether anyone
can say for sure if 6.2/6.3 are also vulnerable. I am aware from the
documentation that those kernels are out of support. I have looked at the
code in trap.c to see if the current patch would work with the 6.3 source,
but based on what I saw, it won't. I am also aware that upgrading is an
option to resolve this; unfortunately, in some of my cases this is not
possible right now.

Any help would be greatly appreciated, and I can of course test anything
that might need it.

Thanks!


Re: port devel/doxygen failing to test on -CURRENT and -STABLE

2012-07-12 Thread Chris Rees
On 9 July 2012 02:49, David Xu davi...@freebsd.org wrote:
 On 2012/07/08 18:21, Chris Rees wrote:

 Hi all / David,

 doxygen has been failing for a while now on -CURRENT and apparently
 -STABLE too.  The current fix is disabling one of the tests in the
 build, but obviously it points to a problem with our base system.

 I've trussed [1] the failing code [2], and it looks as though it's
 hanging on a _umtx call.  I'm gratuitously ignorant of what goes on
 there... but the timings of recent commits to umtx.h [3] could
 indicate a link (hope it's not bogus...).

 Any pointers on what I should do next?

 Chris

 [1] http://www.bayofrum.net/~crees/scratch/doxygen-truss

 _umtx_op(0x8012b0280,0x16,0x0,0x0,0x0,0x1)   ERR#22 'Invalid argument'

 Can you execute it in gdb and print its value?

 print/x *(int *)0x8012b0280
 print/x *(int *)(0x8012b0280+4)

I've been having trouble debugging it since it's threaded, and so I
ran a binary search over the last few days of revisions from 1/Apr to
1/May.

Unfortunately I discovered to my horror today that all but the first
test was useless, because the patch I committed to disable the test
was of course readded to my ports tree, so none of the tests ran :/

I'll hopefully have it narrowed down to the offending commit over the
next few days.

Chris


Re: dtraceall.ko with old nfsclient

2012-07-12 Thread Fabian Keil
Benjamin Kaduk ka...@mit.edu wrote:

 On Wed, 11 Jul 2012, Fabian Keil wrote:
 
  I'm using the following modification of Sean's patch:

This way it seems to work as expected:

diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
index 456efd1..628583b 100644
--- a/sys/modules/dtrace/dtraceall/Makefile
+++ b/sys/modules/dtrace/dtraceall/Makefile
@@ -1,7 +1,7 @@
 # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 
 KMOD=  dtraceall
-SRCS=  dtraceall.c opt_compat.h
+SRCS=  dtraceall.c opt_compat.h opt_nfs.h
 
 CFLAGS+=   -I${.CURDIR}/../../..
 
diff --git a/sys/modules/dtrace/dtraceall/dtraceall.c b/sys/modules/dtrace/dtraceall/dtraceall.c
index d256489..0672556 100644
--- a/sys/modules/dtrace/dtraceall/dtraceall.c
+++ b/sys/modules/dtrace/dtraceall/dtraceall.c
@@ -33,6 +33,7 @@
 #include <sys/module.h>
 #include <sys/errno.h>
 #include "opt_compat.h"
+#include "opt_nfs.h"
 
 static int
 dtraceall_modevent(module_t mod __unused, int type, void *data __unused)
@@ -67,8 +68,11 @@ MODULE_DEPEND(dtraceall, opensolaris, 1, 1, 1);
 MODULE_DEPEND(dtraceall, dtrace, 1, 1, 1);
 MODULE_DEPEND(dtraceall, dtio, 1, 1, 1);
 MODULE_DEPEND(dtraceall, dtmalloc, 1, 1, 1);
+#if defined (NFSCL)
 MODULE_DEPEND(dtraceall, dtnfscl, 1, 1, 1);
+#elif defined (NFSCLIENT)
 MODULE_DEPEND(dtraceall, dtnfsclient, 1, 1, 1);
+#endif
 #if defined(__amd64__) || defined(__i386__)
 MODULE_DEPEND(dtraceall, fbt, 1, 1, 1);
 MODULE_DEPEND(dtraceall, fasttrap, 1, 1, 1);

  Note that dtnfscl.ko is not loaded even though loading
  it manually works and I have NFSCL in the kernel.

I wasn't entirely clear here; what I meant was that
the KERNCONF used when compiling the module included
options NFSCL. I didn't expect run-time detection.

 This is because dtraceall.c only #includes opt_compat.h, and the kernel 
 build system only passes -include opt_global.h, so the dtraceall module 
 build has no way of knowing about the NFSCL{IENT,} options defined in 
 opt_nfs.h.  (As you noted earlier in the thread?)

Yes.

 You would still need to address Andriy's comments in order to ensure that 
 the configuration seen by the module matches the kernel.

Sure.

Fabian




Re: kqueue periodic timer confusion

2012-07-12 Thread Paul Albrecht
On Thu, 2012-07-12 at 10:40 -0500, Ian Lepore wrote:
 On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote:
  On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
   On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
   On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
 On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
  Hi,
 
  Sorry about this repost, but I'm confused about the responses I received
  in my last post, so I'm looking for some clarification.
 
  Specifically, I thought I could use the kqueue timer as essentially a
  drop-in replacement for Linux timerfd_create/read, but was surprised that
  the accuracy of the kqueue timer is much less than what I need for my
  application.
 
  So my confusion at this point is whether this is considered to be a bug or a
  feature?
 
  Here's some test code if you want to verify the problem:
 
  [test program snipped; identical to the one quoted earlier in this thread]
 
 

 What you are seeing is just the way FreeBSD currently works.

 Sleeping (in most all of its various forms, and I've just looked at the
 kevent code to verify this is true there) is handled by converting the
 amount of time to sleep (usually specified in a timeval or timespec
 struct) to a count of timer ticks, using an internal routine called
 tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
 account for the current tick.  Whether that's a good idea or not (it
 probably was once, and probably not anymore) it's how things currently
 work, and could explain the fairly consistent +1ms you're seeing.
   
This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
installs a periodic callout that executes KNOTE() and then resets itself
(via callout_reset()) each time it runs.  This should generally be closer to
regularly spaced intervals than something that does:
   
  
   In what way is it irrelevant?  That is, what did I miss?  It appears to
   me that the next callout is scheduled by calling timertoticks() passing
   a count of milliseconds, that count is converted to a struct timeval and
   passed to tvtohz() which is where the +1 adjustment happens.  If you ask
   for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
   There is some time, likely a small number of microseconds, that you've
   consumed of the current tick, and that's what the +1 in tvtohz() is
   supposed to account for according to the comments.
  
   The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
   and then adds one tick on top of that.  That seems not quite right to
   me, except that it is a way to guarantee that you don't return early, and
   that is the one promise made by sleep routines on any OS; those magical
   "at least" words always appear in the docs.
  
   Actually what I'm missing (that I know of) is how the scheduler works.
   Maybe the +1 adjustment to account for the fraction of the current tick
   you've already consumed is the right thing to do, even when that
   fraction is 1us or less of a 1ms tick.  That would depend on scheduler
   behavior that I know nothing about.
  
   Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
   case.  That is, the +1 makes sense when you are computing a one-time delta
   

Re: dtraceall.ko with old nfsclient

2012-07-12 Thread Andriy Gapon
on 12/07/2012 21:17 Fabian Keil said the following:
 Benjamin Kaduk ka...@mit.edu wrote:
 
 On Wed, 11 Jul 2012, Fabian Keil wrote:
 
 I'm using the following modification of Sean's patch:
 
 This way it seems to work as expected:
 
 diff --git a/sys/modules/dtrace/dtraceall/Makefile b/sys/modules/dtrace/dtraceall/Makefile
 index 456efd1..628583b 100644
 --- a/sys/modules/dtrace/dtraceall/Makefile
 +++ b/sys/modules/dtrace/dtraceall/Makefile
 @@ -1,7 +1,7 @@
  # $FreeBSD: src/sys/modules/dtrace/dtraceall/Makefile,v 1.3 2011/04/09 09:07:31 uqs Exp $
 
  KMOD=  dtraceall
 -SRCS=  dtraceall.c opt_compat.h
 +SRCS=  dtraceall.c opt_compat.h opt_nfs.h
 
  CFLAGS+=   -I${.CURDIR}/../../..
 

If you do "cd sys/modules/dtrace/dtraceall && make [obj depend] all", does it
compile OK with the above change?
I am hinting at the case where KERNBUILDDIR is not set.
This is not the proper way of building modules, but traditionally we keep it
working.

-- 
Andriy Gapon



Re: dtraceall.ko with old nfsclient

2012-07-12 Thread Fabian Keil
Andriy Gapon a...@freebsd.org wrote:

 on 12/07/2012 21:17 Fabian Keil said the following:
  Benjamin Kaduk ka...@mit.edu wrote:
  
  On Wed, 11 Jul 2012, Fabian Keil wrote:
  
  I'm using the following modification of Sean's patch:
  
  This way it seems to work as expected:
  
  [Makefile diff snipped]
  
 
 If you do "cd sys/modules/dtrace/dtraceall && make [obj depend] all", does it
 compile OK with the above change?

Depends on your expectations I guess. As neither NFS-related
option gets defined, no dependency on either NFS module is
registered. The compiler has no complaints, though.
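
The effect of an empty opt_nfs.h on the patched #if/#elif chain can be reproduced outside the kernel. nfs_provider_dependency() is a hypothetical helper that mirrors the chain and returns the name of the provider module the build would depend on. Compiled without -DNFSCL or -DNFSCLIENT, which is what an empty generated opt_nfs.h amounts to, it returns NULL, i.e. no MODULE_DEPEND() on either NFS provider.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the #if/#elif chain in the patched
 * dtraceall.c.  NFSCL and NFSCLIENT would normally come from opt_nfs.h;
 * here they can only come from the compiler command line (-DNFSCL, ...).
 */
static const char *
nfs_provider_dependency(void)
{
#if defined(NFSCL)
	return ("dtnfscl");
#elif defined(NFSCLIENT)
	return ("dtnfsclient");
#else
	return (NULL);		/* neither option: no NFS MODULE_DEPEND */
#endif
}
```

Building the same file with -DNFSCL makes it return "dtnfscl"; with only -DNFSCLIENT it returns "dtnfsclient".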

Fabian




Re: dtraceall.ko with old nfsclient

2012-07-12 Thread Andriy Gapon
on 12/07/2012 22:36 Fabian Keil said the following:
 Andriy Gapon a...@freebsd.org wrote:
 
 on 12/07/2012 21:17 Fabian Keil said the following:
 Benjamin Kaduk ka...@mit.edu wrote:
 
 On Wed, 11 Jul 2012, Fabian Keil wrote:
 
 I'm using the following modification of Sean's patch:
 
 This way it seems to work as expected:
 
 [Makefile diff snipped]
 
 
 If you do "cd sys/modules/dtrace/dtraceall && make [obj depend] all", does
 it compile OK with the above change?
 
 Depends on your expectations I guess. As neither NFS-related option gets
 defined, no dependency on either NFS module is registered. The compiler has
 no complaints, though.

Interesting.  Could you repeat after sufficient cleaning up?
I am not sure where the opt_nfs.h file could come from.

-- 
Andriy Gapon




Re: dtraceall.ko with old nfsclient

2012-07-12 Thread Fabian Keil
Andriy Gapon a...@freebsd.org wrote:

 on 12/07/2012 22:36 Fabian Keil said the following:
  Andriy Gapon a...@freebsd.org wrote:
  
  on 12/07/2012 21:17 Fabian Keil said the following:
  Benjamin Kaduk ka...@mit.edu wrote:
  
  On Wed, 11 Jul 2012, Fabian Keil wrote:
  
  I'm using the following modification of Sean's patch:
  
  This way it seems to work as expected:
  
  [Makefile diff snipped]
  
  
  If you do "cd sys/modules/dtrace/dtraceall && make [obj depend] all", does
  it compile OK with the above change?
  
  Depends on your expectations I guess. As neither NFS-related option gets
  defined, no dependency on either NFS module is registered. The compiler has
  no complaints, though.
 
 Interesting.  Could you repeat after sufficient cleaning up?
 I am not sure where the opt_nfs.h file could come from.

The Makefile seems to create an empty one:
 
fk@r500 /usr/src/sys/modules/dtrace/dtraceall $make clean
rm -f export_syms dtraceall.ko dtraceall.kld dtraceall.o dtraceall.ko.debug dtraceall.ko.symbols opt_compat.h opt_nfs.h
fk@r500 /usr/src/sys/modules/dtrace/dtraceall $make
echo "#define COMPAT_FREEBSD32 1" > opt_compat.h
:> opt_nfs.h
cc -O2 -pipe [...] -c /usr/src/sys/modules/dtrace/dtraceall/dtraceall.c
ld  -d -warn-common -r -d -o dtraceall.ko.debug dtraceall.o
:> export_syms
awk -f /usr/src/sys/modules/dtrace/dtraceall/../../../conf/kmod_syms.awk dtraceall.ko.debug  export_syms | xargs -J% objcopy % dtraceall.ko.debug
objcopy --only-keep-debug dtraceall.ko.debug dtraceall.ko.symbols
objcopy --strip-debug --add-gnu-debuglink=dtraceall.ko.symbols dtraceall.ko.debug dtraceall.ko

Fabian




Re: dtraceall.ko with old nfsclient

2012-07-12 Thread Andrew Boyer

On Jul 12, 2012, at 3:39 PM, Andriy Gapon wrote:

 on 12/07/2012 22:36 Fabian Keil said the following:
 Andriy Gapon a...@freebsd.org wrote:
 
 on 12/07/2012 21:17 Fabian Keil said the following:
 Benjamin Kaduk ka...@mit.edu wrote:
 
 On Wed, 11 Jul 2012, Fabian Keil wrote:
 
 I'm using the following modification of Sean's patch:
 
 This way it seems to work as expected:
 
 [Makefile diff snipped]
 
 
 If you do "cd sys/modules/dtrace/dtraceall && make [obj depend] all", does
 it compile OK with the above change?
 
 Depends on your expectations I guess. As neither NFS-related option gets
 defined, no dependency on either NFS module is registered. The compiler has
 no complaints, though.
 
 Interesting.  Could you repeat after sufficient cleaning up?
 I am not sure where the opt_nfs.h file could come from.
 

Maybe related: check out sys/modules/ipfw/Makefile.  It makes its own option 
headers for INET and INET6.

-A

--
Andrew Boyer    abo...@averesystems.com






Re: dtraceall.ko with old nfsclient

2012-07-12 Thread Andriy Gapon
on 12/07/2012 22:44 Fabian Keil said the following:
 Andriy Gapon a...@freebsd.org wrote:
 
 on 12/07/2012 22:36 Fabian Keil said the following:
 Andriy Gapon a...@freebsd.org wrote:
 
 on 12/07/2012 21:17 Fabian Keil said the following:
 Benjamin Kaduk ka...@mit.edu wrote:
 
 On Wed, 11 Jul 2012, Fabian Keil wrote:
 
 I'm using the following modification of Sean's patch:
 
 This way it seems to work as expected:
 
 [Makefile diff snipped]
 
 
 If you do "cd sys/modules/dtrace/dtraceall && make [obj depend] all",
 does it compile OK with the above change?
 
 Depends on your expectations I guess. As neither NFS-related option
 gets defined, no dependency on either NFS module is registered. The
 compiler has no complaints, though.
 
 Interesting.  Could you repeat after sufficient cleaning up? I am not
 sure where the opt_nfs.h file could come from.
 
 The Makefile seems to create an empty one:
 
 [make output snipped]

Ah, correct.  I now even see the relevant snippet in kmod.mk:
.for _src in ${SRCS:Mopt_*.h}
CLEANFILES+=${_src}
.if !target(${_src})
${_src}:
	:> ${.TARGET}
.endif
.endfor

It's only when we want those files to be non-empty (i.e. to define some options)
that we define explicit targets for them (like COMPAT_FREEBSD32 in the dtraceall
Makefile).
Sorry for the noise.

-- 
Andriy Gapon




Re: CVE-2012-0217 Intel's sysret Kernel Privilege Escalation and FreeBSD 6.2/6.3

2012-07-12 Thread Xin Li

On 07/12/12 09:36, Bill Crisp wrote:

Good Morning!

This was also posted to the FreeBSD forums:

I have been researching CVE-2012-0217, and while I have patched the kernels
on the servers I have running 7.3/8.2, I would like to know whether anyone
can say for sure if 6.2/6.3 are also vulnerable. I am aware from the
documentation that those kernels are out of support. I have looked at the
code in trap.c to see if the current patch would work with the 6.3 source,
but based on what I saw, it won't. I am also aware that upgrading is an
option to resolve this; unfortunately, in some of my cases this is not
possible right now.

I believe that 6.x is vulnerable.  You will have to backport the change,
something like this against sys/amd64/amd64/trap.c.  In syscall(), right
after


PTRACESTOP_SC(p, td, S_PT_SCX);

Add:

+	/*
+	 * If the user-supplied value of %rip is not a canonical
+	 * address, then some CPUs will trigger a ring 0 #GP during
+	 * the sysret instruction.  However, the fault handler would
+	 * execute with the user's %gs and %rsp in ring 0 which would
+	 * not be safe.  Instead, preemptively kill the thread with a
+	 * SIGBUS.
+	 */
+	if (td->td_frame->tf_rip >= VM_MAXUSER_ADDRESS) {
+		ksiginfo_init_trap(&ksi);
+		ksi.ksi_signo = SIGBUS;
+		ksi.ksi_code = BUS_OBJERR;
+		ksi.ksi_trapno = T_PROTFLT;
+		ksi.ksi_addr = (void *)td->td_frame->tf_rip;
+		trapsignal(td, &ksi);
+	}

Right before:

WITNESS_WARN(...)
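
The comment in the patch talks about non-canonical %rip values, while the test itself compares against VM_MAXUSER_ADDRESS. A minimal sketch of why that comparison is sufficient, assuming amd64 with 48-bit virtual addresses and assuming VM_MAXUSER_ADDRESS is 0x0000800000000000 (hard-coded here under the made-up name SKETCH_VM_MAXUSER_ADDRESS; real kernel code should use the actual macro). An address is canonical when bits 63..47 are all equal, so every non-canonical address lies at or above the user limit and is rejected; the check also rejects canonical kernel addresses, which a user thread has no reason to return to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: the amd64 user address limit. */
#define SKETCH_VM_MAXUSER_ADDRESS	0x0000800000000000ULL

/* Canonical (48-bit VA): bits 63..47 are either all 0 or all 1. */
static bool
is_canonical48(uint64_t va)
{
	uint64_t top = va >> 47;	/* the 17 bits 63..47 */

	return (top == 0 || top == 0x1ffff);
}

/* The patch's test: any %rip at or above the user limit is refused. */
static bool
reject_before_sysret(uint64_t rip)
{
	return (rip >= SKETCH_VM_MAXUSER_ADDRESS);
}
```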


Cheers,




Re: /proc filesystem

2012-07-12 Thread Ian Lepore
On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote:
 That is what I need.
 
 But I still need some explanation after using it and reading the manual.
 
 For example:
   PID              START                END PRT  RES PRES REF SHD  FL TP PATH
  1378           0x400000           0x5ac000 r-x  385  415   2   1 CN- vn /usr/local/bin/Xorg
  1378           0x7ab000           0x7bc000 rw-   17    0   1   0 C-- vn /usr/local/bin/Xorg
  1378           0x7bc000           0x800000 rw-   14    0   1   0 C-- df
  1378        0x8007ab000        0x8007c3000 r-x   24    0  32   0 CN- vn /libexec/ld-elf.so.1
  1378        0x8007c3000        0x8007f0000 rw-   43    0   1   0 C-- df
  1378        0x8007f0000        0x8007f2000 rw-    1    0   4   0 --- dv
  1378        0x8007f2000        0x8007f4000 rw-    2    0   4   0 --- dv
  1378        0x8007f4000        0x800874000 rw-   11    0   4   0 --- dv
  1378        0x800874000        0x800884000 rw-   16    0   4   0 --- dv
  1378        0x800884000        0x800895000 rw-   10    0   1   0 CN- df
  1378        0x8009c2000        0x8009c5000 rw-    3    0   1   0 C-- df
 
 
 1) Xorg is mapped twice; IMHO the first is text/rodata and the second is
 data.  But what does REF really mean here, and why is it 2 for the first
 and 1 for the second?
 
 2) What does PRES (private resident) really mean?  df (default) mappings
 are IMHO anonymous maps, i.e. private data of the process, so why is RES
 nonzero while PRES is zero, while on shared code PRES is nonzero and
 large?  What does it really mean?
 
 Thanks.
 

I'm catching up on threads I was following before I went on vacation,
and it looks like there was never a response to this.  I'm interested in
the answers to these questions too, so today I did some spelunking in
the code to see what I could figure out.  I don't think I really
understand things too well, but I'll just say what I think I found and
hopefully the experts will correct anything I get wrong.

I think you're right about the first two mappings in that procstat
output.  The REF value is the reference count on the vm object (the
vnode for the exe file, I presume).  I think the reason the reference
count is 2 is that one reference is the open file itself, and the other
is the shadow object.  I've always been a bit confused about the concept
of shadow objects in FreeBSD's VM, but I think it's somehow related to
the running processes that are based on that executable vnode.  For
example, if another copy of Xorg were running, I think REF would be 3,
and SHD would be 2.

I don't know why there is no shadow object for the writable data mapping
and why the refcount is only 1 for that.

The PRES thing seemed simple when I first looked at the code, but the
more I think about it in relation to other numbers the more confused I
get.  The logic in the code is: if the shadow count is 1, then PRES is
the resident size of the shadow object.  This seems to be a measure of
shared-code usage... any object which could be shared but isn't gets
counted as private resident.
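That rule is small enough to state as code. Below is a userland C sketch of it, not the kernel source: `struct pres_object` and `private_resident()` are hypothetical names, and the object is reduced to the two numbers the rule looks at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VM object: only the two
 * numbers the PRES rule looks at are modelled. */
struct pres_object {
	int shadow_count;        /* objects that use this one as backing */
	int resident_page_count; /* pages of this object currently in RAM */
};

/*
 * PRES as read from the code: if the object mapped by the entry has
 * exactly one shadow, report its resident page count, otherwise 0.
 * An entry with no object reports 0 as well.
 */
static int
private_resident(const struct pres_object *obj)
{
	if (obj == NULL)
		return (0);
	assert(obj->shadow_count >= 0);
	return (obj->shadow_count == 1 ? obj->resident_page_count : 0);
}
```

With the first Xorg line (SHD 1) this returns the object's full resident count, 415 there, while any object with SHD 0 yields 0, which is consistent with the df lines showing PRES 0.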

The part that confuses me is how PRES can be larger than RES.  The value
for PRES is taken from the resident_page_count field of the shadow
object.  The RES value is calculated by walking each page of the map
entry and calling pmap_mincore() to see if it's resident.  So the number
of resident pages is calculated to be fewer than the resident_page_count
of the object the entry maps.  I don't understand.
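The RES side is a per-page walk over the map entry. A hypothetical sketch, assuming 4 KiB pages; the `is_mapped` callback stands in for the per-page pmap query and is not a real kernel interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on i386/amd64. */
#define MODEL_PAGE_SIZE 4096u

/* Number of pages covered by a map entry [start, end). */
static size_t
entry_pages(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return ((size_t)((end - start) / MODEL_PAGE_SIZE));
}

/*
 * RES: walk the entry one page at a time and count the pages the
 * predicate reports as mapped in this process.  is_mapped is a
 * hypothetical stand-in for the per-page pmap query.
 */
static size_t
entry_resident(uint64_t start, uint64_t end,
    bool (*is_mapped)(uint64_t va, void *ctx), void *ctx)
{
	size_t count = 0;

	for (uint64_t va = start; va < end; va += MODEL_PAGE_SIZE)
		if (is_mapped(va, ctx))
			count++;
	return (count);
}

/* Example predicate: one bool per page, indexed from base. */
struct page_bitmap {
	uint64_t base;
	const bool *mapped;
};

static bool
bitmap_is_mapped(uint64_t va, void *ctx)
{
	const struct page_bitmap *bm = ctx;

	return (bm->mapped[(va - bm->base) / MODEL_PAGE_SIZE]);
}
```

For example, the Xorg data entry runs from 0x7ab000 to 0x7bc000, i.e. 0x11000 bytes or 17 pages, so RES for that entry can never exceed 17, however many pages the underlying object has resident.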

Oh hmmm, wait a sec... could it be that read-ahead or relocation fixup
or various other things caused lots of pages to be faulted in for the
vnode object (so they're resident) but not all of those pages are mapped
into the process because the path of execution has never referenced them
and caused faults to map them into the process' vmspace?
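That hypothesis can be illustrated with a toy model (hypothetical types, not kernel code): the object-side count that feeds PRES includes every resident page, while the process-side count that feeds RES includes only pages this process has actually faulted into its page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state for one object mapped by one process. */
struct model_page {
	bool resident_in_object; /* in memory, e.g. brought in by read-ahead */
	bool mapped_in_process;  /* process has faulted it into its pmap */
};

/* Object-side count, analogous to resident_page_count (feeds PRES). */
static size_t
count_object_resident(const struct model_page *pages, size_t n)
{
	size_t c = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pages[i].resident_in_object)
			c++;
	return (c);
}

/* Process-side count, analogous to the per-page walk (feeds RES).
 * A page can only be mapped if it is resident. */
static size_t
count_process_mapped(const struct model_page *pages, size_t n)
{
	size_t c = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pages[i].resident_in_object && pages[i].mapped_in_process)
			c++;
	return (c);
}
```

Any page brought in but never touched by the process counts toward the first number and not the second, so PRES > RES falls out naturally in this model.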

-- Ian

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


RE: /proc filesystem

2012-07-12 Thread Andrew Duane
 -Original Message-
 From: owner-freebsd-hack...@freebsd.org [mailto:owner-freebsd-
 hack...@freebsd.org] On Behalf Of Ian Lepore
 Sent: Thursday, July 12, 2012 6:42 PM
 To: Wojciech Puchar
 Cc: freebsd-hackers@freebsd.org
 Subject: Re: /proc filesystem
 
 On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote:
  That is what I need.
 
  But I still need some explanation after using it and reading the manual.
 
  For example:
    PID       START         END PRT  RES PRES REF SHD  FL TP PATH
   1378        0x40    0x5ac000 r-x  385  415   2   1 CN- vn /usr/local/bin/Xorg
   1378    0x7ab000    0x7bc000 rw-   17    0   1   0 C-- vn /usr/local/bin/Xorg
   1378    0x7bc000        0x80 rw-   14    0   1   0 C-- df
   1378 0x8007ab000 0x8007c3000 r-x   24    0  32   0 CN- vn /libexec/ld-elf.so.1
   1378 0x8007c3000     0x8007f rw-   43    0   1   0 C-- df
   1378     0x8007f 0x8007f2000 rw-    1    0   4   0 --- dv
   1378 0x8007f2000 0x8007f4000 rw-    2    0   4   0 --- dv
   1378 0x8007f4000 0x800874000 rw-   11    0   4   0 --- dv
   1378 0x800874000 0x800884000 rw-   16    0   4   0 --- dv
   1378 0x800884000 0x800895000 rw-   10    0   1   0 CN- df
   1378 0x8009c2000 0x8009c5000 rw-    3    0   1   0 C-- df
 
  1) Xorg is mapped twice: IMHO the first is text/rodata and the second is
  data. But what does REF really mean here, and why is it 2 for the first
  and 1 for the second?
 
  2) What does PRES (private resident) really mean? df (default) mappings
  are IMHO anonymous maps, i.e. private data of the process. So why is RES
  nonzero while PRES is zero, whereas for shared code PRES is nonzero and
  large? What does it really mean?
 
  thanks.
 
 
 I'm catching up on threads I was following before I went on vacation,
 and it looks like there was never a response to this.  I'm interested in
 the answers to these questions too, so today I did some spelunking in
 the code to see what I could figure out.  I don't think I really
 understand things too well, but I'll just say what I think I found and
 hopefully the experts will correct anything I get wrong.
 
 I think you're right about the first two mappings in that procstat
 output.  The REF value is the reference count on the vm object (the
 vnode for the exe file, I presume).  I think the reason the reference
 count is 2 is that one reference is the open file itself, and the other
 is the shadow object.  I've always been a bit confused about the concept
 of shadow objects in FreeBSD's VM, but I think it's somehow related to
 the running processes that are based on that executable vnode.  For
 example, if another copy of Xorg were running, I think REF would be 3,
 and SHD would be 2.
 
 I don't know why there is no shadow object for the writable data mapping
 and why the refcount is only 1 for that.

BSS that doesn't exist in the file?

 ...
Andrew Duane
Juniper Networks
+1 978-589-0551 (o)
+1 603-770-7088 (m)
adu...@juniper.net

 





Re: /proc filesystem

2012-07-12 Thread Konstantin Belousov
On Thu, Jul 12, 2012 at 04:41:58PM -0600, Ian Lepore wrote:
 On Tue, 2012-06-19 at 06:47 +0200, Wojciech Puchar wrote:
  That is what I need.
  
  But I still need some explanation after using it and reading the manual.
  
  For example:
    PID       START         END PRT  RES PRES REF SHD  FL TP PATH
   1378        0x40    0x5ac000 r-x  385  415   2   1 CN- vn /usr/local/bin/Xorg
   1378    0x7ab000    0x7bc000 rw-   17    0   1   0 C-- vn /usr/local/bin/Xorg
   1378    0x7bc000        0x80 rw-   14    0   1   0 C-- df
   1378 0x8007ab000 0x8007c3000 r-x   24    0  32   0 CN- vn /libexec/ld-elf.so.1
   1378 0x8007c3000     0x8007f rw-   43    0   1   0 C-- df
   1378     0x8007f 0x8007f2000 rw-    1    0   4   0 --- dv
   1378 0x8007f2000 0x8007f4000 rw-    2    0   4   0 --- dv
   1378 0x8007f4000 0x800874000 rw-   11    0   4   0 --- dv
   1378 0x800874000 0x800884000 rw-   16    0   4   0 --- dv
   1378 0x800884000 0x800895000 rw-   10    0   1   0 CN- df
   1378 0x8009c2000 0x8009c5000 rw-    3    0   1   0 C-- df
  
  
  1) Xorg is mapped twice: IMHO the first is text/rodata and the second is
  data. But what does REF really mean here, and why is it 2 for the first
  and 1 for the second?
REF shows the reference count of the object at the top of the shadow chain.

The Xorg text is mapped private and read-only, and the flags indicate that
there were no writes to the text (e.g. from debuggers setting breakpoints),
so no COW was performed and no shadow object to hold COW pages was
inserted. You see a reference count of 2 because the text and data
mappings are separate VM map entries, and both reference the same VM
object.
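A minimal model of that counting, with hypothetical types: each map entry holds one reference on the object it maps, so two entries over the same vnode object give a count of 2. The model starts from 0 and counts only map entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal model: a VM object reduced to its refcount. */
struct ref_object {
	int ref_count;
};

/* Hypothetical map entry: it holds one reference on the object it maps. */
struct ref_entry {
	struct ref_object *object;
};

/* Point an entry at an object, taking a reference on the entry's behalf. */
static void
entry_map_object(struct ref_entry *e, struct ref_object *obj)
{
	assert(e != NULL && obj != NULL);
	e->object = obj;
	obj->ref_count++;
}
```

Mapping the text and the data entries of the same executable both through `entry_map_object()` leaves the shared object with a count of 2.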

For the Xorg data, there were writes into the private writable mapping, so
the flags show that COW was performed and a shadow object was installed
over the vnode VM object. Since the shadow object has a single user,
namely the data mapping in the Xorg process, its ref count is 1.
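The COW step can be modelled the same way. This is a hypothetical sketch, not vm_object_shadow() itself; it assumes that the new shadow inherits the map entry's reference on the original object, so the original's reference count is unchanged and only its shadow count grows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical simplified VM object: refcount, shadow count, backing link. */
struct cow_object {
	int ref_count;
	int shadow_count;           /* objects that use this one as backing */
	struct cow_object *backing; /* next object down the shadow chain */
};

/* Hypothetical map entry. */
struct cow_entry {
	struct cow_object *object;
	bool needs_copy;            /* the 'N' flag: no shadow installed yet */
};

/*
 * First write fault on a private mapping that still needs copy: put a
 * new shadow object in front of the mapped object.  Assumption: the
 * shadow takes over the entry's reference on the original, so the
 * original's ref_count is unchanged and its shadow_count goes up.
 * Returns false only if allocation fails.
 */
static bool
cow_install_shadow(struct cow_entry *e)
{
	struct cow_object *shadow;

	assert(e != NULL && e->object != NULL);
	if (!e->needs_copy)
		return (true);          /* already shadowed */
	shadow = calloc(1, sizeof(*shadow));
	if (shadow == NULL)
		return (false);
	shadow->ref_count = 1;      /* held by this map entry */
	shadow->backing = e->object;
	e->object->shadow_count++;
	e->object = shadow;
	e->needs_copy = false;
	return (true);
}
```

Starting from the two Xorg entries (vnode object at REF 2), shadowing the data entry leaves the vnode object at REF 2 with one shadow, and the data entry pointing at a new object with REF 1 and no shadows. That is the REF/SHD pattern of the first two Xorg lines in the procstat output.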

  
  2) What does PRES (private resident) really mean? df (default) mappings
  are IMHO anonymous maps, i.e. private data of the process. So why is RES
  nonzero while PRES is zero, whereas for shared code PRES is nonzero and
  large? What does it really mean?
  
  thanks.
  
 
 I'm catching up on threads I was following before I went on vacation,
 and it looks like there was never a response to this.  I'm interested in
 the answers to these questions too, so today I did some spelunking in
 the code to see what I could figure out.  I don't think I really
 understand things too well, but I'll just say what I think I found and
 hopefully the experts will correct anything I get wrong.
 
 I think you're right about the first two mappings in that procstat
 output.  The REF value is the reference count on the vm object (the
 vnode for the exe file, I presume).  I think the reason the reference
 count is 2 is that one reference is the open file itself, and the other
 is the shadow object.  I've always been a bit confused about the concept
This is wrong; see above for the explanation.

Vnode ownership of the VM object does not result in an increase of the VM
object's reference count. Instead, filesystems manage VM object creation
and destruction manually, since that fits with vnode lifecycle management.

 of shadow objects in FreeBSD's VM, but I think it's somehow related to
 the running processes that are based on that executable vnode.  For
 example, if another copy of Xorg were running, I think REF would be 3,
 and SHD would be 2.
 
 I don't know why there is no shadow object for the writable data mapping
 and why the refcount is only 1 for that.
There _is_ a shadow object, as indicated by the flags showing that the
entry no longer 'needs copy'.
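Reading the flags that way can be sketched as a small parser (hypothetical names). Only the first two characters of FL are interpreted, with the meanings used in this thread, C for copy-on-write and N for needs copy; other positions are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Decoded form of the first two characters of the FL column. */
struct fl_flags {
	bool cow;        /* 'C': copy-on-write mapping */
	bool needs_copy; /* 'N': COW not yet performed for this entry */
};

/*
 * Parse an FL string such as "CN-", "C--" or "---".  Returns false if
 * the string is too short to hold the two interpreted positions.
 */
static bool
parse_fl(const char *fl, struct fl_flags *out)
{
	assert(out != NULL);
	if (fl == NULL || strlen(fl) < 2)
		return (false);
	out->cow = (fl[0] == 'C');
	out->needs_copy = (fl[1] == 'N');
	return (true);
}

/* COW entry that no longer needs copy; for a vnode-backed (vn) entry,
 * per the explanation above, a shadow object has been installed. */
static bool
cow_already_copied(const struct fl_flags *f)
{
	return (f->cow && !f->needs_copy);
}
```

Applied to the Xorg lines, the text entry ("CN-") is still waiting for its first write, while the data entry ("C--") already has its shadow.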

 
 The PRES thing seemed simple when I first looked at the code, but the
 more I think about it in relation to other numbers the more confused I
 get.  The logic in the code is if the shadow count is 1 then PRES is
 the resident size of the shadow object.  This seems to be a measure of
 shared-code usage... any object which could be shared but isn't gets
 counted as private resident.
 
 The part that confuses me is how PRES can be larger than RES.  The value
 for PRES is taken from the resident_page_count field of the shadow
 object.  The RES value is calculated by walking each page of the map
 entry and calling pmap_mincore() to see if it's resident.  So the number
 of resident pages is calculated to be fewer than the resident_page_count
 of the object the entry maps.  I don't understand.
 
 Oh hmmm, wait a sec... could it be that read-ahead or relocation fixup
 or various other things caused lots of pages to be faulted in for the
 vnode object (so they're resident) but not all of those pages are mapped
 into the process because the path of execution has never referenced them
 and caused faults to map them into the process' vmspace?

This is mostly right, except the note that established