Re: COUNT_IPIS vs CPU_FOREACH

2011-05-09 Thread John Baldwin
On Saturday, May 07, 2011 5:37:26 am Andriy Gapon wrote:
 
 I believe that the following change is needed to fix COUNT_IPIS option.
 Right now it seems to be a noop.
 
 
 mp_ipi_intrcnt: CPU_FOREACH can't be used this early
 
 ... because all_cpus is not set yet.

Have you tested this?

all_cpus is set by start_all_aps() from cpu_mp_start() which runs at 
SI_SUB_CPU.   This SYSINIT runs later at SI_SUB_INTR.

So I think CPU_FOREACH() should be fine here.
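
To make the ordering argument concrete, here is a minimal userland sketch (not kernel code) of how hooks registered at different subsystem levels end up running in subsystem order. The enum values, struct init_entry and run_position() are hypothetical names made up for this illustration; only the relative order of SI_SUB_CPU before SI_SUB_INTR is taken from the message above.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins for SI_SUB_CPU and SI_SUB_INTR; only their
 * relative order matters. The real constants live in sys/kernel.h. */
enum sub_id { SUB_CPU = 1, SUB_INTR = 2 };

struct init_entry {
    enum sub_id sub;    /* subsystem level the hook is registered at */
    const char *name;   /* label for the hook */
};

/* qsort() comparator: a lower subsystem level runs first. */
static int
cmp_entry(const void *a, const void *b)
{
    const struct init_entry *x = a;
    const struct init_entry *y = b;

    return ((x->sub > y->sub) - (x->sub < y->sub));
}

/* Sort the hooks into boot order; return the slot `name` runs in, or -1. */
int
run_position(struct init_entry *tab, size_t n, const char *name)
{
    size_t i;

    qsort(tab, n, sizeof(tab[0]), cmp_entry);
    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return ((int)i);
    return (-1);
}
```

With one entry for cpu_mp_start at SUB_CPU and one for mp_ipi_intrcnt at SUB_INTR, run_position() places cpu_mp_start in slot 0 and mp_ipi_intrcnt in slot 1 regardless of registration order, which is the situation described above: all_cpus is already populated by the time the counter setup runs.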

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: dsp mmap change

2011-05-09 Thread John Baldwin
On Saturday, May 07, 2011 3:16:25 pm Kostik Belousov wrote:
 On Fri, May 06, 2011 at 04:16:40PM -0400, John Baldwin wrote:
  On Friday, May 06, 2011 10:04:28 am Kostik Belousov wrote:
   On Fri, May 06, 2011 at 04:38:00PM +0300, Andriy Gapon wrote:
on 06/05/2011 16:32 Kostik Belousov said the following:
 On Fri, May 06, 2011 at 11:55:00AM +0300, Andriy Gapon wrote:

 I would like to ask for a review and/or testing of the following 
 patch:
 http://people.freebsd.org/~avg/dev_dsp_mmap.diff

 It's supposed to fix an issue described here:
 http://lists.freebsd.org/pipermail/freebsd-multimedia/2011-
  February/011691.html

 In short, the following pseudo-code should do the right thing:
 fd = open("/dev/dsp", O_RDWR);
 mmap(PROT_READ, fd);
 mmap(PROT_WRITE, fd);

 Thank you!
 
 I think that you have to call PCM_GIANT_LEAVE() when returning
 EINVAL on the vm_pager_alloc() failure.

Yes, thank you.

 Your patch hardcodes an assumption that sndbufs are always
 contiguous. I was unable to convince myself that this is true.

I think that this should be true for the case when DMA is used?
   In the current driver, yes, but there is nothing that theoretically
   prevents scatter-gather from being used.
  
  You could fix this by creating an sglist (via sglist_build()) and an
  OBJT_SG VM object that the d_mmap_single callback returned.  I wish there
  was a cleaner way to just create a VM object and populate it with pages
  though, and then use vm_map_insert() to map it into the kernel rather
  than the more roundabout method of OBJT_SG.
 
 You cannot have one page inserted into two vm objects. Contigmalloc()
 inserts the allocated pages into kernel_object.

Yes, I would want to allocate N unmapped pages and stuff them into a VM
object that can then be mapped into the kernel and/or into user mappings.
This would be a much cleaner approach for the nvidia driver for example.

-- 
John Baldwin


Re: dsp mmap change

2011-05-09 Thread Alan Cox
On Mon, May 9, 2011 at 8:45 AM, John Baldwin j...@freebsd.org wrote:

 On Saturday, May 07, 2011 3:16:25 pm Kostik Belousov wrote:
  On Fri, May 06, 2011 at 04:16:40PM -0400, John Baldwin wrote:
   On Friday, May 06, 2011 10:04:28 am Kostik Belousov wrote:
On Fri, May 06, 2011 at 04:38:00PM +0300, Andriy Gapon wrote:
 on 06/05/2011 16:32 Kostik Belousov said the following:
  On Fri, May 06, 2011 at 11:55:00AM +0300, Andriy Gapon wrote:
 
  I would like to ask for a review and/or testing of the following
 patch:
  http://people.freebsd.org/~avg/dev_dsp_mmap.diff
 
  It's supposed to fix an issue described here:
  http://lists.freebsd.org/pipermail/freebsd-multimedia/2011-
   February/011691.html
 
  In short, the following pseudo-code should do the right thing:
  fd = open("/dev/dsp", O_RDWR);
  mmap(PROT_READ, fd);
  mmap(PROT_WRITE, fd);
 
  Thank you!
 
  I think that you have to call PCM_GIANT_LEAVE() when returning
  EINVAL on the vm_pager_alloc() failure.

 Yes, thank you.

  Your patch hardcodes an assumption that sndbufs are always
  contiguous. I was unable to convince myself that this is true.

 I think that this should be true for the case when DMA is used?
In the current driver, yes, but there is nothing that theoretically
prevents scatter-gather from being used.
  
   You could fix this by creating an sglist (via sglist_build()) and an
   OBJT_SG VM object that the d_mmap_single callback returned.  I wish
 there
   was a cleaner way to just create a VM object and populate it with pages
   though, and then use vm_map_insert() to map it into the kernel rather
   than the more roundabout method of OBJT_SG.
 
  You cannot have one page inserted into two vm objects. Contigmalloc()
  inserts the allocated pages into kernel_object.

 Yes, I would want to allocate N unmapped pages and stuff them into a VM
 object that can then be mapped into the kernel and/or into user mappings.
 This would be a much cleaner approach for the nvidia driver for example.


There is a relatively new function, vm_object_populate(), that allocates a
collection of pages, inserts them into a vm object, and validates them for
use.  This function exists in FreeBSD 7.x, 8.x, and HEAD.

Alan


Re: dsp mmap change

2011-05-09 Thread John Baldwin
On Monday, May 09, 2011 11:35:07 am Alan Cox wrote:
 On Mon, May 9, 2011 at 8:45 AM, John Baldwin j...@freebsd.org wrote:
 
  On Saturday, May 07, 2011 3:16:25 pm Kostik Belousov wrote:
   On Fri, May 06, 2011 at 04:16:40PM -0400, John Baldwin wrote:
On Friday, May 06, 2011 10:04:28 am Kostik Belousov wrote:
 On Fri, May 06, 2011 at 04:38:00PM +0300, Andriy Gapon wrote:
  on 06/05/2011 16:32 Kostik Belousov said the following:
   On Fri, May 06, 2011 at 11:55:00AM +0300, Andriy Gapon wrote:
  
   I would like to ask for a review and/or testing of the following
  patch:
   http://people.freebsd.org/~avg/dev_dsp_mmap.diff
  
   It's supposed to fix an issue described here:
   http://lists.freebsd.org/pipermail/freebsd-multimedia/2011-
February/011691.html
  
   In short, the following pseudo-code should do the right thing:
   fd = open("/dev/dsp", O_RDWR);
   mmap(PROT_READ, fd);
   mmap(PROT_WRITE, fd);
  
   Thank you!
  
   I think that you have to call PCM_GIANT_LEAVE() when returning
   EINVAL on the vm_pager_alloc() failure.
 
  Yes, thank you.
 
   Your patch hardcodes an assumption that sndbufs are always
   contiguous. I was unable to convince myself that this is true.
 
  I think that this should be true for the case when DMA is used?
 In the current driver, yes, but there is nothing that theoretically
 prevents scatter-gather from being used.
   
You could fix this by creating an sglist (via sglist_build()) and an
OBJT_SG VM object that the d_mmap_single callback returned.  I wish
  there
was a cleaner way to just create a VM object and populate it with pages
though, and then use vm_map_insert() to map it into the kernel rather
than the more roundabout method of OBJT_SG.
  
   You cannot have one page inserted into two vm objects. Contigmalloc()
   inserts the allocated pages into kernel_object.
 
  Yes, I would want to allocate N unmapped pages and stuff them into a VM
  object that can then be mapped into the kernel and/or into user mappings.
  This would be a much cleaner approach for the nvidia driver for example.
 
 
 There is a relatively new function, vm_object_populate(), that allocates a
 collection of pages, inserts them into a vm object, and validates them for
 use.  This function exists in FreeBSD 7.x, 8.x, and HEAD.

Hmm, is there a way to specify restrictions on the pages allocated similar to
what contigmalloc() supports (e.g. being able to allocate pages in the lower
4GB is something the Nvidia driver needs)?

IOW, a method about like contigmalloc() that returned a VM object holding
pages meeting the desired restrictions but didn't necessarily map the
pages.  kmem_alloc_attr() is close to this except it always inserts the
pages into the kernel_object and always maps the pages into the address space.

-- 
John Baldwin


Re: dsp mmap change

2011-05-09 Thread Kostik Belousov
On Mon, May 09, 2011 at 01:38:24PM -0400, John Baldwin wrote:
 On Monday, May 09, 2011 11:35:07 am Alan Cox wrote:
  On Mon, May 9, 2011 at 8:45 AM, John Baldwin j...@freebsd.org wrote:
  
   On Saturday, May 07, 2011 3:16:25 pm Kostik Belousov wrote:
On Fri, May 06, 2011 at 04:16:40PM -0400, John Baldwin wrote:
 On Friday, May 06, 2011 10:04:28 am Kostik Belousov wrote:
  On Fri, May 06, 2011 at 04:38:00PM +0300, Andriy Gapon wrote:
   on 06/05/2011 16:32 Kostik Belousov said the following:
On Fri, May 06, 2011 at 11:55:00AM +0300, Andriy Gapon wrote:
   
I would like to ask for a review and/or testing of the 
following
   patch:
http://people.freebsd.org/~avg/dev_dsp_mmap.diff
   
It's supposed to fix an issue described here:
http://lists.freebsd.org/pipermail/freebsd-multimedia/2011-
 February/011691.html
   
In short, the following pseudo-code should do the right thing:
fd = open("/dev/dsp", O_RDWR);
mmap(PROT_READ, fd);
mmap(PROT_WRITE, fd);
   
Thank you!
   
I think that you have to call PCM_GIANT_LEAVE() when returning
EINVAL on the vm_pager_alloc() failure.
  
   Yes, thank you.
  
Your patch hardcodes an assumption that sndbufs are always
contiguous. I was unable to convince myself that this is true.
  
   I think that this should be true for the case when DMA is used?
  In the current driver, yes, but there is nothing that theoretically
  prevents scatter-gather from being used.

 You could fix this by creating an sglist (via sglist_build()) and an
 OBJT_SG VM object that the d_mmap_single callback returned.  I wish
   there
 was a cleaner way to just create a VM object and populate it with 
 pages
 though, and then use vm_map_insert() to map it into the kernel rather
 than the more roundabout method of OBJT_SG.
   
You cannot have one page inserted into two vm objects. Contigmalloc()
inserts the allocated pages into kernel_object.
  
   Yes, I would want to allocate N unmapped pages and stuff them into a VM
   object that can then be mapped into the kernel and/or into user mappings.
   This would be a much cleaner approach for the nvidia driver for example.
  
  
  There is a relatively new function, vm_object_populate(), that allocates a
  collection of pages, inserts them into a vm object, and validates them for
  use.  This function exists in FreeBSD 7.x, 8.x, and HEAD.
 
 Hmm, is there a way to specify restrictions on the pages allocated similar to
 what contigmalloc() supports (e.g. being able to allocate pages in the lower
 4GB is something the Nvidia driver needs)?
 
 IOW, a method about like contigmalloc() that returned a VM object holding
 pages meeting the desired restrictions but didn't necessarily map the
 pages.  kmem_alloc_attr() is close to this except it always inserts the
 pages into the kernel_object and always maps the pages into the address space.
 
For GEM, which needs something very similar, I just allocate a swap object
and then call vm_page_grab() over the whole range.

In fact, I never need the full kernel mapping of the object into the
KVA using physical addresses; it is enough to do only per-page temporary
mappings with sf_buf. On the other hand, both usermode and kernelmode
need to access the pages through the aperture remap. Usermode might need
to establish several mappings for one GTT page, and this causes trouble
because pmap_remove_all() does not work on fictitious pages.
[You were on Cc:].

Other than that, and the fact that vm_page_grab() cannot satisfy
restrictions on page placement in physical memory,
a normal swap object is enough for much more complicated devices
than Azalia.




Re: [regression] unable to boot: no GEOM devices found.

2011-05-09 Thread David Naylor
On Friday 15 April 2011 23:29:55 David Naylor wrote:
 On Friday 15 April 2011 18:28:06 John Baldwin wrote:
  On Wednesday, April 13, 2011 1:07:06 pm David Naylor wrote:
   On Tuesday 12 April 2011 22:12:55 Alexander Motin wrote:
David Naylor wrote:
 On Tuesday 12 April 2011 08:17:51 Alexander Motin wrote:
 David Naylor wrote:
 I am running -current and since a few days ago (at least
 2011/04/11) I am unable to boot.
 
 The boot process stops when it looks to find a bootable device.
 The prompt (when pressing '?') does not display any device and
 yielding
  
  one
  
 second (or more) to the kernel (by pressing '.') does not improve
 the situation.
 
 A known working date is 2011/02/20.
 
 I am running amd64 on a nVidia MCP51 chipset.
 
 MCP51... again...

+ata2: reiniting channel ..
+ata2: SATA connect time=0ms status=0113
+ata2: reset tp1 mask=01 ostat0=58 ostat1=00
+ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
+ata2: reset tp2 stat0=50 stat1=00 devices=0x1
+ata2: reinit done ..
+unknown: FAILURE - ATA_IDENTIFY timed out LBA=0

Since all devices are detected but do not respond to commands, I
would suppose that there is something wrong with ATA interrupts.
There is a long history of interrupt problems in this chipset. I have
already tried to debug one case where ATA wasn't generating
interrupts at all, unfortunately without success: requests were
executing but not generating interrupts, and it didn't look like an ATA
driver problem.

As for a possible candidate for the revision triggering your problem,
I would look at this message:
+pcib0: Enabling MSI window for HyperTransport slave at pci0:0:9:0

At least it is recent (SVN revs 219737,219740 on 2011-03-18 by jhb)
and it is interrupt related.
   
   I reverted those two revs and everything works again.
  
  Hmm, can you provide a full boot verbose dmesg?  Alternatively, can you
  see if the device at pci0:0:9:0 is a PCI-PCI bridge?
 
 I can provide a verbose dmesg if the following is not enough:
 
 none17@pci0:0:9:0:  class=0x05 card=0x50011458 chip=0x027010de
 rev=0xa2 hdr=0x00
 vendor = 'NVIDIA Corporation'
 device = 'MCP51 Host Bridge'
 class  = memory
 subclass   = RAM
 
 I see two PCI-PCI bridges at pci0:0:3:0 and pci0:0:16:0.  I've attached the
 full `pciconf -lv` output.

FYI, this issue is still present on current (~24 hours old).  Reverting the  
above mentioned revisions still fixes the problem.  




Re: [regression] unable to boot: no GEOM devices found.

2011-05-09 Thread John Baldwin
On Monday, May 09, 2011 2:24:37 pm David Naylor wrote:
 On Friday 15 April 2011 23:29:55 David Naylor wrote:
  On Friday 15 April 2011 18:28:06 John Baldwin wrote:
   On Wednesday, April 13, 2011 1:07:06 pm David Naylor wrote:
On Tuesday 12 April 2011 22:12:55 Alexander Motin wrote:
 David Naylor wrote:
  On Tuesday 12 April 2011 08:17:51 Alexander Motin wrote:
  David Naylor wrote:
  I am running -current and since a few days ago (at least
  2011/04/11) I am unable to boot.
  
  The boot process stops when it looks to find a bootable device.
  The prompt (when pressing '?') does not display any device and
  yielding
   
   one
   
  second (or more) to the kernel (by pressing '.') does not 
improve
  the situation.
  
  A known working date is 2011/02/20.
  
  I am running amd64 on a nVidia MCP51 chipset.
  
  MCP51... again...
 
 +ata2: reiniting channel ..
 +ata2: SATA connect time=0ms status=0113
 +ata2: reset tp1 mask=01 ostat0=58 ostat1=00
 +ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
 +ata2: reset tp2 stat0=50 stat1=00 devices=0x1
 +ata2: reinit done ..
 +unknown: FAILURE - ATA_IDENTIFY timed out LBA=0
 
 Since all devices are detected but do not respond to commands, I
 would suppose that there is something wrong with ATA interrupts.
 There is a long history of interrupt problems in this chipset. I have
 already tried to debug one case where ATA wasn't generating
 interrupts at all, unfortunately without success: requests were
 executing but not generating interrupts, and it didn't look like an ATA
 driver problem.
 
 As for a possible candidate for the revision triggering your problem,
 I would look at this message:
 +pcib0: Enabling MSI window for HyperTransport slave at pci0:0:9:0
 
 At least it is recent (SVN revs 219737,219740 on 2011-03-18 by jhb)
 and it is interrupt related.

I reverted those two revs and everything works again.
   
   Hmm, can you provide a full boot verbose dmesg?  Alternatively, can you
   see if the device at pci0:0:9:0 is a PCI-PCI bridge?
  
  I can provide a verbose dmesg if the following is not enough:
  
  none17@pci0:0:9:0:  class=0x05 card=0x50011458 chip=0x027010de
  rev=0xa2 hdr=0x00
  vendor = 'NVIDIA Corporation'
  device = 'MCP51 Host Bridge'
  class  = memory
  subclass   = RAM
  
  I see two PCI-PCI bridges at pci0:0:3:0 and pci0:0:16:0.  I've attached 
the
  full `pciconf -lv` output.
 
 FYI, this issue is still present on current (~24 hours old).  Reverting the  
 above mentioned revisions still fixes the problem.  

Yes, I'm still chewing on how best to fix this.  The problem is that for the 
most part we should enable the MSI mapping window everywhere, but for certain 
broken Nvidia chipsets it seems that doing so breaks INTx interrupts and we 
need to not enable it (and disable MSI globally) on those chipsets.  Linux has 
some grotty code to allow PCI devices to figure out which Host Bridge device 
on PCI bus 0 is the real host bridge for each HT slave and to selectively 
enable it in the host bridge when an MSI interrupt is first enabled.

They also have a quirk to disable MSI altogether on certain nvidia chipsets if 
the MSI mapping window is not enabled by the BIOS.  I attempted to implement 
the latter, but it broke perfectly good nvidia chipsets on older ppc-based 
Macs.  I think I want to just disable MSI entirely on busted chipsets like 
yours, but I need to come up with a good way to detect your chipset (and 
similar).
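
One possible shape for the detection step, sketched as standalone userland C rather than actual kernel code: a small table of vendor/device pairs on which MSI stays disabled, matched against the chip ID. The struct msi_quirk type and the msi_disabled_for() helper are hypothetical; the single table entry is the MCP51 host bridge ID from the pciconf output quoted above (chip=0x027010de). Whether matching on that ID alone is sufficient is exactly the open question, so treat this as an illustration of the lookup only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk entry: a PCI vendor/device pair on which MSI stays off. */
struct msi_quirk {
    uint16_t vendor;
    uint16_t device;
};

/* Single illustrative entry: the MCP51 host bridge (chip=0x027010de). */
static const struct msi_quirk msi_quirks[] = {
    { 0x10de, 0x0270 },
};

/* A pciconf-style chip value holds the vendor in the low 16 bits
 * and the device in the high 16 bits. */
static uint16_t
chip_vendor(uint32_t chip)
{
    return ((uint16_t)(chip & 0xffff));
}

static uint16_t
chip_device(uint32_t chip)
{
    return ((uint16_t)(chip >> 16));
}

/* Return 1 if MSI should be disabled for this chip ID, 0 otherwise. */
int
msi_disabled_for(uint32_t chip)
{
    size_t i;

    for (i = 0; i < sizeof(msi_quirks) / sizeof(msi_quirks[0]); i++) {
        if (msi_quirks[i].vendor == chip_vendor(chip) &&
            msi_quirks[i].device == chip_device(chip))
            return (1);
    }
    return (0);
}
```

Here msi_disabled_for(0x027010de) returns 1, and any chip ID not in the table returns 0.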

-- 
John Baldwin


Re: firewire debugging

2011-05-09 Thread Sean Bruno
On Tue, 2011-05-03 at 14:35 -0700, Julian Elischer wrote:
 does anyone know if there is a limitation on firewire debugging on a
 machine with > 4GB of memory?
 
 I have 1394 {a,b} cards.  does it make a difference?
 
 also, the firewire card on one machine stops it from booting..
 
 is there a way to disable it during boot other than recompiling the 
 kernel without firewire?


Ug.  I've been looking for a machine that fails to boot with firewire in
it for a while.  What are the odds that you can bring it to BSDCan?

Sean



Re: firewire debugging

2011-05-09 Thread Julian Elischer

On 5/9/11 12:48 PM, Sean Bruno wrote:

On Tue, 2011-05-03 at 14:35 -0700, Julian Elischer wrote:

does anyone know if there is a limitation on firewire debugging on a
machine with > 4GB of memory?

I have 1394 {a,b} cards.  does it make a difference?

also, the firewire card on one machine stops it from booting..

is there a way to disable it during boot other than recompiling the
kernel without firewire?


Ug.  I've been looking for a machine that fails to boot with firewire in
it for a while.  What are the odds that you can bring it to BSDCan?


err it's in a rack at work and surprisingly enough I took the card out 
of it :-)
The machine booted with a 1394a card in a different slot, and
debugging works fine.

The card that stopped boot was a 1394b card. (PCIe)

I may be able to get the card to you though.





Sean






geom kernel proc

2011-05-09 Thread Andrew Thompson
Hi,


I have a cosmetic patch that moves the geom kprocs to kthreads

from

  PID  TT  STAT      TIME COMMAND
    2  ??  DL     2:38.03 [g_event]
    3  ??  DL    49:43.61 [g_up]
    4  ??  DL    57:10.71 [g_down]

to

% procstat -t 13
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
   13 100021 geom             g_event            1   92 sleep   -
   13 100022 geom             g_up               3   92 sleep   -
   13 100023 geom             g_down             3   92 sleep   -

Any objections?


Andrew


geom-kthread.diff
Description: Binary data

Re: firewire debugging

2011-05-09 Thread Peter Grehan

Hi Julian,


does anyone know if there is a limitation on firewire debugging on a
machine with > 4GB of memory?


 I don't know of any Firewire cards that support physical access 
*above* 4GB. They may exist.


 For instance, the (last?) Texas Instruments PCIe 1394a/b chip, the
XIO2213B, has the following text in its data sheet:


The physical upper bound register is an optional register and is
not implemented.

 .. and this is the firewire OHCI register that contains the upper 16 
bits of the architected 48-bit physical address.


 So, you can probably use it for some form of amd64 kernel debug since
kernel txt/data/bss is < 4G, but accessing anything above 4G won't work.
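
As a rough illustration of that limit, here is a minimal C sketch that checks whether a physical range lies entirely below 4GB. The FW_PHYS_LIMIT constant and the fw_phys_reachable() helper are hypothetical names; the 4GB bound is simply the limit described above for a chip without the physical upper bound register, not something read from hardware.

```c
#include <stdint.h>

/* Assumed limit for a chip without a physical upper bound register: 4GB. */
#define FW_PHYS_LIMIT (1ULL << 32)

/* Return 1 if [addr, addr + len) lies entirely below the 4GB limit. */
int
fw_phys_reachable(uint64_t addr, uint64_t len)
{
    if (len == 0 || addr >= FW_PHYS_LIMIT)
        return (0);
    /* Compare against the remaining room to avoid overflowing addr + len. */
    return (len <= FW_PHYS_LIMIT - addr);
}
```

A range that ends exactly at the 4GB boundary is accepted; one that starts at or crosses the boundary is rejected.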


later,

Peter.


Re: My problems with stability on -current

2011-05-09 Thread Alexander Motin

On 10.05.2011 02:48, Doug Barton wrote:

I would start from most obvious problems. I need to know more about
crashes. As usual: how to trigger, stack backtraces, etc.


Triggering is easy, I can start a buildworld with -j2, and a build of
ports/www/firefox with FORCE_MAKE_JOBS, and within 30 minutes the system
will reboot. I posted a panic message relative to r220282, (-current
archives, 4/4) but kib said it didn't make any sense. Usually I don't
get a panic at all.


Could you point me to the thread?


Go to http://www.FreeBSD.org/
Click 'mailing lists'
Click 'listed in the FreeBSD Handbook.'
Click freebsd-current
Click freebsd-current Archives
Click April 2011
search for r220282
Voila! :)


OK, but a URL would be fine also. :) I agree with kib@: the message
doesn't match the backtrace.



As for the time problems, I would try to collect more data:
- show `sysctl kern.eventtimer`, `sysctl kern.timecounter` and verbose
dmesg outputs;


http://people.freebsd.org/~dougb/dougb-current-r221566.txt


- what eventtimer is used now, and does it help to switch to another
one with the kern.eventtimer.timer sysctl?


When I was trying to track down the problems last summer I vaguely
remember trying RTC, but eventually we realized that the real problem
was throttling, so I stopped specifying RTC and let it go back to the
default. What do you suggest I try?


As I see, you are now using HPET (chosen automatically). I would try
switching to the LAPIC. Just make sure to disable C-states if you have
enabled them, so that the LAPIC timer won't stop.


Ok, so kern.eventtimer.timer=LAPIC in /boot/loader.conf should do
that, right?


Yes. You can also do it at run time.


I don't use C-states (in part as a result of previous investigation) but
I do use powerd as such:
powerd_flags="-a adaptive -b adaptive -n adaptive"


- does the timer run in periodic or one-shot mode, and does it help to
switch to the other one?


How could I tell, and how would I switch?


`sysctl kern.eventtimer.periodic`.


kern.eventtimer.periodic: 0


And read eventtimers(4) please.


I did that, but I don't see anything in there as to which choice is
one-shot, and how to change to periodic. I assume 0 is the default,
which I also assume is one-shot. Does setting that to 1 change to
periodic? Also, can I safely do this while the system is running, or
should it be in /boot/loader.conf as well?


Yes, a nonzero value means periodic. And yes, changing it at run time is safe.

--
Alexander Motin


Re: My problems with stability on -current

2011-05-09 Thread Jason Hellenthal

Alexander,

On Tue, May 10, 2011 at 04:29:25AM +0300, Alexander Motin wrote:
 On 10.05.2011 02:48, Doug Barton wrote:
 
  Ok, so kern.eventtimer.timer=LAPIC in /boot/loader.conf should do
  that, right?
 
 Yes. You can do it in run-time also.

I'm not absolutely sure here, but IIRC the last time I tried setting that
via loader.conf on 8-STABLE it was not being set, so I eventually added it
to sysctl.conf. Just for reference; I never looked into it further.

-- 

 Regards, (jhell)
 Jason Hellenthal





Re: My problems with stability on -current

2011-05-09 Thread Doug Barton
New symptom: today (still running r221566) I compiled a small port, and that
worked without any freezes or interactivity problems. Then I tried
compiling a larger port (java/openjdk6 if anyone cares) and still had no
interactivity problems, but I got the system wedge requiring a power
cycle that I was seeing previously and that I tracked to the one-shot
timer update.

More below.

On 05/07/2011 02:43, Alexander Motin wrote:

Doug Barton wrote:

On 05/05/2011 13:55, Alexander Motin wrote:

I see several possibly unrelated problems there:
   - crashes are always crashes. They should be debugged.
   - calcru going backwards could have the same roots as lost wall clock
time.


I think you're right about that. What usually happens when the load
maxes out is that the system visibly freezes for a minute or 2, and when
it comes back to life the log is flooded with calcru messages. If it
stays up long enough after that the wall clock drift becomes noticeable.
This is in spite of running ntpd.


These system freezes are very suspicious. Most time counters need only
a few seconds to overflow, some even less. So a freeze of a few minutes will
easily overflow most of them. So the freezes are probably the cause of the
time problems, but the question now is what the cause of the freezes is. You
should try to investigate what is going on during the freezes. Does the
system do anything, are there any interrupts working (`vmstat -i` just
before and after), are there any interrupt storms, etc?
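
The overflow argument comes down to simple arithmetic: a free-running counter of a given width wraps after 2^bits / frequency seconds. Below is a minimal C sketch of that calculation. The wrap_seconds() helper is a hypothetical name, and the two hardware examples in the note after it (a 24-bit ACPI power-management timer at 3.579545 MHz and a 32-bit HPET counter at about 14.318 MHz) are typical values assumed for illustration, not figures read from the system in this thread.

```c
#include <stdint.h>

/*
 * Seconds until a free-running counter that is `bits` wide and ticks at
 * `hz` wraps around. Returns -1.0 for inputs this sketch does not handle.
 */
double
wrap_seconds(unsigned bits, double hz)
{
    if (bits >= 64 || hz <= 0.0)
        return (-1.0);
    return ((double)(1ULL << bits) / hz);
}
```

Under those assumptions, wrap_seconds(24, 3579545.0) is a little under 5 seconds and wrap_seconds(32, 14318180.0) is a little under 300 seconds, so a freeze of a minute or two is longer than the wrap period of the narrower counter.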


Here is the output on a mostly-idle system, shortly after reboot:

vmstat -i
interrupt                          total       rate
irq1: atkbd0                        1784          0
irq9: acpi0                            1          0
irq14: ata0                       213355         89
irq15: ata1                           58          0
irq17: wpi0                        74331         31
irq20: hpet0 uhci0+               787767        331
irq22: uhci2                       21453          9
irq256: hdac0                         11          0
Total                            1098760        462

At a more opportune time I'll try crashing it again and get another result.


If there are some problems with timer interrupts, timecounters
could wrap unnoticed, which will cause random time jumps.
   - interactivity problems. I can't prove it is unrelated, but have no
real ideas now.

I would start from most obvious problems. I need to know more about
crashes. As usual: how to trigger, stack backtraces, etc.


Triggering is easy, I can start a buildworld with -j2, and a build of
ports/www/firefox with FORCE_MAKE_JOBS, and within 30 minutes the system
will reboot. I posted a panic message relative to r220282, (-current
archives, 4/4) but kib said it didn't make any sense. Usually I don't
get a panic at all.


Could you point me to the thread?


Go to http://www.FreeBSD.org/
Click 'mailing lists'
Click 'listed in the FreeBSD Handbook.'
Click freebsd-current
Click freebsd-current Archives
Click April 2011
search for r220282
Voila! :)


As for the time problems, I would try to collect more data:
   - show `sysctl kern.eventtimer`, `sysctl kern.timecounter` and verbose
dmesg outputs;


http://people.freebsd.org/~dougb/dougb-current-r221566.txt


   - what eventtimer is used now, and does it help to switch to another
one with the kern.eventtimer.timer sysctl?


When I was trying to track down the problems last summer I vaguely
remember trying RTC, but eventually we realized that the real problem
was throttling, so I stopped specifying RTC and let it go back to the
default. What do you suggest I try?


As I see, you are now using HPET (chosen automatically). I would try
switching to the LAPIC. Just make sure to disable C-states if you have
enabled them, so that the LAPIC timer won't stop.


Ok, so kern.eventtimer.timer=LAPIC in /boot/loader.conf should do
that, right?

I don't use C-states (in part as a result of previous investigation) but 
I do use powerd as such:

powerd_flags="-a adaptive -b adaptive -n adaptive"


   - does the timer run in periodic or one-shot mode, and does it help to
switch to the other one?


How could I tell, and how would I switch?


`sysctl kern.eventtimer.periodic`.


kern.eventtimer.periodic: 0


And read eventtimers(4) please.


I did that, but I don't see anything in there as to which choice is
one-shot, and how to change to periodic. I assume 0 is the default,
which I also assume is one-shot. Does setting that to 1 change to
periodic? Also, can I safely do this while the system is running, or
should it be in /boot/loader.conf as well?


   - if full CPU load makes time stop, try to track what is going on
with timer interrupts using `vmstat -i` and `systat -vm 1`. Under full
CPU load in one-shot mode you should have a stable timer interrupt rate
of about hz+stathz.


Ok, I'll do that tomorrow, tired now.


   - if timer interrupts are not working well, you can build a kernel with
options         KTR
options         ALQ
options         KTR_ALQ