date:20100721

Re: Why is intr taking up so much cpu?

2010-07-21 Thread Andriy Gapon



Doug,

could you please show your timer configuration, part of devinfo -u that
describes interrupts and top of the output of top -SPH (including the header)
when high interrupt load strikes?

P.S. I saw output of top -SH, but I have a reason to be curious about top -SPH.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-21 Thread PseudoCylon

----- Original Message ----
> From: Hans Petter Selasky hsela...@c2i.net
> To: PseudoCylon moonlightak...@yahoo.ca
> Cc: freebsd-current@freebsd.org; Sam Leffler s...@freebsd.org;
> freebsd-...@freebsd.org
> Sent: Tue, July 20, 2010 4:46:34 AM
> Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
>
> On Tuesday 20 July 2010 12:03:22 PseudoCylon wrote:
> > ----- Original Message ----
> >
> > > From: Hans Petter Selasky hsela...@c2i.net
> > > To: freebsd-current@freebsd.org
> > > Cc: PseudoCylon moonlightak...@yahoo.ca; Sam Leffler s...@freebsd.org;
> > > freebsd-...@freebsd.org
> > > Sent: Mon, July 19, 2010 1:17:04 PM
> > > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> > >
> > > Hi AK,
> > >
> > > I've committed your patches to USB P4. I've made some additional
> > > patches.
> > >
> > > Can you check and verify everything?
> > >
> > > http://p4web.freebsd.org/@@181189?ac=10
> >
> > Hi
> >
> > If we change sc->cmdq_run = RUN_CMDQ_ABORT,
> >
> > -- begin excerpt --
> >
> > @@ -4890,7 +4877,10 @@ run_stop(void *arg)
> >  ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
> >
> >  sc->ratectl_run = RUN_RATECTL_OFF;
> > -sc->cmdq_run = RUN_CMDQ_ABORT;
> > +
> > +RUN_CMDQ_LOCK(sc);
> > +sc->cmdq_run = sc->cmdq_key_set = RUN_CMDQ_ABORT;
> > +RUN_CMDQ_UNLOCK(sc);
> >
> > -- end excerpt --
> >
> > we also need to change this, otherwise key will be cleared.
>
> Ok.
>
> Try to give the second mutex a different name, and see how many warnings go
> away.
>
> --HPS


Giving the second mutex a different name makes all of the duplicate lock
warnings go away.

Here is the patch that includes all changes:

-- begin patch --

diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c
index 017e4b0..da22077 100644
--- a/dev/usb/wlan/if_run.c
+++ b/dev/usb/wlan/if_run.c
@@ -549,7 +549,7 @@ run_attach(device_t self)
 mtx_init(&sc->sc_mtx, device_get_nameunit(sc->sc_dev),
 MTX_NETWORK_LOCK, MTX_DEF);
 mtx_init(&sc->sc_cmdq_mtx, device_get_nameunit(sc->sc_dev),
-MTX_NETWORK_LOCK, MTX_DEF);
+"command queue", MTX_DEF);
 
 iface_index = RT2860_IFACE_INDEX;
 
@@ -4670,8 +4670,6 @@ run_init_locked(struct run_softc *sc)
 if(ic->ic_nrunning > 1)
 return;
 
-run_stop(sc);
-
 for (ntries = 0; ntries < 100; ntries++) {
 if (run_read(sc, RT2860_ASIC_VER_ID, &tmp) != 0)
 goto fail;

-- end patch --
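
One reading of the first hunk, offered as an assumption rather than something stated in the thread: FreeBSD's WITNESS checker classifies mutexes by the type string passed to mtx_init(), so two mutexes that share a type and are held together can draw a duplicate-lock warning. The Python toy below models only that idea. ToyWitness and count_warnings are hypothetical names, and "network driver" is assumed to be the string MTX_NETWORK_LOCK expands to.

```python
class ToyWitness:
    """Toy model (not the kernel's WITNESS): warn when a second lock of
    the same type string is acquired while the first is still held."""

    def __init__(self):
        self.held = []       # (label, lock_type) pairs, in acquisition order
        self.warnings = []

    def acquire(self, label, lock_type):
        # Compare the new lock's type against every lock already held.
        for held_label, held_type in self.held:
            if held_type == lock_type:
                self.warnings.append(
                    f"duplicate lock of type {lock_type!r}: "
                    f"{held_label} then {label}"
                )
        self.held.append((label, lock_type))

    def release(self, label):
        self.held = [h for h in self.held if h[0] != label]


def count_warnings(cmdq_type):
    """Hold sc_mtx, then take sc_cmdq_mtx with the given type string."""
    w = ToyWitness()
    w.acquire("sc_mtx", "network driver")   # assumed MTX_NETWORK_LOCK value
    w.acquire("sc_cmdq_mtx", cmdq_type)
    return len(w.warnings)


if __name__ == "__main__":
    print(count_warnings("network driver"))  # before the patch
    print(count_warnings("command queue"))   # after the patch
```

In this model the pre-patch configuration yields one warning and the patched one yields none, which matches the report that the warnings went away after the rename.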



Re: firefox is stuck in getbuf()

2010-07-21 Thread Gavin Atkinson

On Tue, 2010-07-20 at 16:29 +0300, Kostik Belousov wrote:
> On Tue, Jul 20, 2010 at 10:58:00AM +0800, David Xu wrote:
> > With newest -HEAD code, firefox is stuck in getbuf().
> >
> > top
> >
> > last pid:  1814;  load averages:  0.00,  0.05,  0.07
> > up 0+00:37:11  10:54:01
> > 135 processes: 1 running, 134 sleeping
> > CPU:  3.7% user,  0.0% nice,  0.6% system,  0.0% interrupt, 95.7% idle
> > Mem: 259M Active, 393M Inact, 151M Wired, 1484K Cache, 111M Buf, 186M Free
> > Swap: 2020M Total, 2020M Free
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> >  1427 davidxu     1  45    0   114M   101M select  0   1:24  0.29% Xorg
> >  1588 davidxu    10  44    0   279M   145M getbuf  0   2:15  0.00% firefox-bin
> >
> > procstat -k 1588
> >   PID    TID COMM         TDNAME          KSTACK
> >  1588 100200 firefox-bin  initial thread  mi_switch sleepq_switch
> >   sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata
> >   ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall
> >   Xint0x80_syscall
> >  1588 100207 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait poll
> >   syscallenter syscall Xint0x80_syscall
> >  1588 100208 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> >   syscallenter syscall Xint0x80_syscall
> >  1588 100209 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait
> >   _umtx_op syscallenter syscall Xint0x80_syscall
> >  1588 100210 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait
> >   _umtx_op syscallenter syscall Xint0x80_syscall
> >  1588 100216 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> >   syscallenter syscall Xint0x80_syscall
> >  1588 100220 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata
> >   ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall
> >   Xint0x80_syscall
> >  1588 100238 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> >   syscallenter syscall Xint0x80_syscall
> >  1588 100239 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> >   syscallenter syscall Xint0x80_syscall
> >  1588 100240 firefox-bin  -               mi_switch sleepq_switch
> >   sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> >   syscallenter syscall Xint0x80_syscall
>
> Can you, please, do the following:
> - show the backtraces for the system processes, in particular, syncer,
>   bufdaemon, softdepflush daemon, pagedaemon and vm ?
> - for the stuck firefox thread, find the address of the buffer
>   supplied as an argument to getdirtybuf, and print the *(struct buf *)addr ?
> This can be done on the live/stuck system using kgdb on /dev/mem.
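
To pick out which threads are parked in the soft-updates path, the stacks can be filtered by frame name. A minimal Python sketch, assuming the wrapped procstat -k lines have already been rejoined into one string per thread. The STACKS dictionary is hand-transcribed from four of the ten threads in the output above, and threads_with_frame is a hypothetical helper, not part of any FreeBSD tool.

```python
# Kernel stacks keyed by thread ID, transcribed from the procstat -k output.
FSYNC_STACK = (
    "mi_switch sleepq_switch sleepq_wait _sleep getdirtybuf flush_deplist "
    "softdep_sync_metadata ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync "
    "syscallenter syscall Xint0x80_syscall"
)
STACKS = {
    100200: FSYNC_STACK,
    100207: (
        "mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig "
        "_cv_wait_sig seltdwait poll syscallenter syscall Xint0x80_syscall"
    ),
    100208: (
        "mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig _sleep "
        "__umtx_op_cv_wait _umtx_op syscallenter syscall Xint0x80_syscall"
    ),
    100220: FSYNC_STACK,
}


def threads_with_frame(stacks, frame):
    """Return the sorted thread IDs whose stack contains the exact frame name."""
    return sorted(tid for tid, stack in stacks.items() if frame in stack.split())


if __name__ == "__main__":
    print(threads_with_frame(STACKS, "getdirtybuf"))  # [100200, 100220]
```

Both matching threads reach getdirtybuf through fsync, which is consistent with the request to inspect the buffer passed to getdirtybuf.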

I can recreate this relatively easily; see my thread on -current from
17 July ("Filesystem wedge, SUJ-related?"). That thread and its follow-up
emails contain additional info.  I'm currently trying to find the
commit responsible for introducing this, and have established that a
kernel from 1 June does not seem to exhibit the same issue.

Tonight, I'll revert to a current -current and try to get the info you
need.

Thanks,

Gavin

-- 
Gavin Atkinson
FreeBSD committer and bugmeister

Re: [panic] Race in IEEE802.11 layer towards device drivers

2010-07-21 Thread Hans Petter Selasky

Hi,

Please confirm that this patch is working for you:

http://p4web.freebsd.org/@@181261?ac=10

--HPS

Re: current + mpt = panic: Bad link elm 0xffffff80002d6480 next-prev != elm

2010-07-21 Thread Ståle Kristoffersen

On 2010-07-20 at 14:16, Svein Skogen (Listmail account) wrote:
> Sorry for the late response here, but what you're describing matches
> fairly well what I saw with RELENG_8 (just after 8.0 was released), but
> luckily I didn't have any disks on my MPT, just my tape autoloader.
>
> Random timeouts, and then bus resets (that made tape IO unreliable).
>
> The bad news, is that I had the exact same trouble with OpenSolaris
> (134), and something-similar with Linux (can't remember versions), at
> the time.
>
> I never did find a solution, and ended up throwing windows on the box,
> just to get reliable backups.
>
> My MPT is a 3801 LSI1068e based card running the latest bios.

Hmm, that does not sound good. Did windows work on the same hardware
without problems?

I -might- have solved my problem. It has now run for 24h without timeouts,
and with a bit of load on it. I think I might have run into the Seagate +
NCQ problem, even though Seagate's webpage told me my drives were not
affected (according to the serial numbers). I did however update the
following:

num  drives        firmware
6x   ST31000340AS  SD15
4x   ST31500341AS  SD17

to firmware SD1B (old SD17) and SD1A (old SD15), and that looks like it has
done the trick. I'll report back in a week or so if the problem has not
reappeared.

-- 
Ståle Kristoffersen

HPC on FreeBSD - proposal to EPSRC

2010-07-21 Thread Anton Shterenlikht

This post is primarily directed at people in the UK.

EPSRC UK (Engineering and Physical Sciences
Research Council) issued HPC Software
Development Call for 2010/11. From their
site:   
This call invites proposals for
High Performance Computing (HPC)
software development to enable
science and engineering.

For full details see:

http://www.epsrc.ac.uk/SiteCollectionDocuments/Calls/2010/HPCSoftwareDevelopmentCall.pdf

I intend to draft a proposal for this call.
I'm particularly interested in making
HPC on FreeBSD ia64 a reality.

Briefly, I want to propose development
of an optimising MPI/OpenMP C/C++/Fortran
compiler for a FreeBSD/ia64 cluster environment,
probably based on llvm/clang.

If you are in UK academia and want to participate,
or if you are in business and might consider
supporting a proposal of this sort
(e.g. a letter of support) please get in touch  
directly.

If you have other FreeBSD-based
ideas suitable for this call, I'd also
love to hear them.

yours
anton



-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423

Re: current + mpt = panic: Bad link elm 0xffffff80002d6480 next-prev != elm

2010-07-21 Thread Svein Skogen (Listmail account)

On 21.07.2010 18:33, Ståle Kristoffersen wrote:
> On 2010-07-20 at 14:16, Svein Skogen (Listmail account) wrote:
> > Sorry for the late response here, but what you're describing matches
> > fairly well what I saw with RELENG_8 (just after 8.0 was released), but
> > luckily I didn't have any disks on my MPT, just my tape autoloader.
> >
> > Random timeouts, and then bus resets (that made tape IO unreliable).
> >
> > The bad news, is that I had the exact same trouble with OpenSolaris
> > (134), and something-similar with Linux (can't remember versions), at
> > the time.
> >
> > I never did find a solution, and ended up throwing windows on the box,
> > just to get reliable backups.
> >
> > My MPT is a 3801 LSI1068e based card running the latest bios.
>
> Hmm, that does not sound good. Did windows work on the same hardware
> without problems?

Yup. But notice that I do _NOT_ have any disks on my MPT (I have an MFI
for that), it's just a mini-sas--mini-sas into a HP 1/8G2 LTO3 Autoloader.

> I -might- have solved my problem. It has now run for 24h without timeouts,
> and with a bit of load on it. I think I might have run into the Seagate +
> NCQ problem, even though Seagate's webpage told me my drives were not
> affected (according to the serial numbers). I did however update the
> following:
>
> num  drives        firmware
> 6x   ST31000340AS  SD15
> 4x   ST31500341AS  SD17

I have 8 of the last type (31500341AS), mine running on CC1H firmware,
connected to my MFI. Not a single glitch so far.

> to firmware SD1B (old SD17) and SD1A (old SD15), and that looks like it has
> done the trick. I'll report back in a week or so if the problem has not
> reappeared.

Hope it's fixed for you. I'm still keeping an eye on the MPT code to see
if someone changes something that CAN be affecting my timeout
issues/reset, and if I see something promising, I'm willing to dump out
the entire server to tapes, and test run (I have sufficient spare tapes
to actually test without losing data), but such a job will take me a
week to prepare, and another to test. Quite a bit of time for something
that may solve my problem... ;)

//Svein

-- 
+---+---
  /\   |Svein Skogen   | sv...@d80.iso100.no
  \ /   |Solberg Østli 9| PGP Key:  0xE5E76831
   X|2020 Skedsmokorset | sv...@jernhuset.no
  / \   |Norway | PGP Key:  0xCE96CE13
|   | sv...@stillbilde.net
 ascii  |   | PGP Key:  0x58CD33B6
 ribbon |System Admin   | svein-listm...@stillbilde.net
Campaign|stillbilde.net | PGP Key:  0x22D494A4
+---+---
|msn messenger: | Mobile Phone: +47 907 03 575
|sv...@jernhuset.no | RIPE handle:SS16503-RIPE
+---+---
 If you really are in a hurry, mail me at
   svein-mob...@stillbilde.net
 This mailbox goes directly to my cellphone and is checked
even when I'm not in front of my computer.

 Picture Gallery:
  https://gallery.stillbilde.net/v/svein/





Re: Why is intr taking up so much cpu?

2010-07-21 Thread Doug Barton


On Wed, 21 Jul 2010, Andriy Gapon wrote:

> Doug,
>
> could you please show your timer configuration,


Nothing special in /boot/loader.conf, /etc/sysctl.conf, or my kernel. 
It's basically just GENERIC minus devices I don't have, plus the 
following:


options DDB_CTF
options VESA
options GEOM_BDE
device  atapicam 
device  sound

device  snd_hda

Interestingly, I had a runaway intr thing again after watching a flash 
video, but this time it was hdac0, not swi:4.


http://people.freebsd.org/~dougb/bad-dtrace-3-hdac.txt
http://people.freebsd.org/~dougb/bad-dtrace-4-hdac.txt


> part of devinfo -u that describes interrupts


Interrupt request lines:
0 (attimer0)
1 (atkbd0)
3 (root0)
4 (uart0)
5-7 (root0)
8 (atrtc0)
9 (acpi0)
10-11 (root0)
12 (psm0)
12 (psmcpnp0)
13 (root0)
14 (ata0)
15 (ata1)
16 (root0)
17 (wpi0)
18 (cbb0)
19 (root0)
20 (ehci0)
20 (uhci0)
20 (hpet0)
21 (uhci1)
22 (uhci2)
23 (uhci3)
256 (hdac0)
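
A listing like this is easy to scan for interrupt lines claimed by more than one device. The Python sketch below is only an illustration: DEVINFO_IRQS is an abridged copy of the lines above, shared_irqs is a hypothetical helper, and treating root0 entries as unclaimed lines is an assumption. Sharing a line does not by itself explain the interrupt load.

```python
import re
from collections import defaultdict

# Abridged copy of the "Interrupt request lines" section shown above.
DEVINFO_IRQS = """\
0 (attimer0)
1 (atkbd0)
5-7 (root0)
12 (psm0)
12 (psmcpnp0)
20 (ehci0)
20 (uhci0)
20 (hpet0)
21 (uhci1)
256 (hdac0)
"""

LINE_RE = re.compile(r"\s*(\d+)(?:-(\d+))?\s+\((\w+)\)\s*")


def shared_irqs(text):
    """Map IRQ number -> device list, keeping only IRQs with 2+ devices."""
    owners = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.fullmatch(line)
        if m is None:
            continue
        first = int(m.group(1))
        last = int(m.group(2) or first)   # "5-7" style ranges
        device = m.group(3)
        if device == "root0":             # assumed to mean "unclaimed"
            continue
        for irq in range(first, last + 1):
            owners[irq].append(device)
    return {irq: devs for irq, devs in owners.items() if len(devs) > 1}


if __name__ == "__main__":
    for irq, devs in sorted(shared_irqs(DEVINFO_IRQS).items()):
        print(irq, ", ".join(devs))
```

On this sample the shared lines are IRQ 12 (psm0, psmcpnp0) and IRQ 20 (ehci0, uhci0, hpet0).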


> and top of the output of top -SPH (including the header)
> when high interrupt load strikes?


Will do next time, thanks!


Doug

--

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso


Re: Why is intr taking up so much cpu?

2010-07-21 Thread Andriy Gapon

on 21/07/2010 21:50 Doug Barton said the following:
> On Wed, 21 Jul 2010, Andriy Gapon wrote:
>
> > Doug,
> >
> > could you please show your timer configuration,
>
> Nothing special in /boot/loader.conf, /etc/sysctl.conf, or my kernel.
> It's basically just GENERIC minus devices I don't have, plus the following:

I didn't mean your manual tuning, I meant how the system is configured :-)  E.g.
the relevant sysctl tree.


-- 
Andriy Gapon

panic with clangbsd kernel in kern/vfs_bio.c

2010-07-21 Thread René Ladan

Hi,

On my Acer 7738G laptop running FreeBSD 9.0-amd64 r209980 (with the latest
clangbsd kernel), I encountered this panic (recovered from
/var/log/messages) under a moderately light load (portmaster, openoffice,
firefox, thunderbird in an xfce4 session):

Jul 21 22:29:47 acer kernel: panic: buf 0xff80526d00c0 already counted as free
Jul 21 22:29:47 acer kernel: cpuid = 1
Jul 21 22:29:47 acer kernel: KDB: enter: panic
Jul 21 22:29:47 acer kernel:
Jul 21 22:29:47 acer kernel: 0xff0005093b40: tag devfs, type VCHR
Jul 21 22:29:47 acer kernel: usecount 1, writecount 0, refcount 588 mountedhere 0xff0004c84200
Jul 21 22:29:47 acer kernel: flags ()
Jul 21 22:29:47 acer kernel: v_object 0xff00050adbd0 ref 0 pages 12590
Jul 21 22:29:47 acer kernel: lock type devfs: EXCL by thread 0xff0004c59880 (pid 17)
Jul 21 22:29:47 acer kernel: dev ad4s1f

I have the following modules loaded:
acer % kldstat
Id Refs Address    Size     Name
 1   32 0x8010     f925f0   kernel (GENERIC)
 2    1 0x81093000 1bd28    if_iwn.ko
 3    1 0x810af000 296f8    snd_hda.ko
 4    2 0x810d9000 85fe8    sound.ko
 5    1 0x8115f000 570f8    iwn5000fw.ko
 6    1 0x811b7000 dc29d0   nvidia.ko
(ports/x11/nvidia-driver, version 256.35 with patches from current@)
 7    3 0x81f7a000 423c8    linux.ko
 8    1 0x81fbd000 d38      biosfont.ko (ports/sysutils/biosfont)
 9    1 0x82012000 3a73     linprocfs.ko

acer % grep -nr "already counted as free" ~/freebsd/clangbsd/sys/* | grep -v \.svn
kern/vfs_bio.c:401: ("buf %p already counted as free", bp));
acer % ident kern/vfs_bio.c
kern/vfs_bio.c:
 $FreeBSD: projects/clangbsd/sys/kern/vfs_bio.c 209170 2010-06-14 18:45:33Z rdivacky $
acer % ident /sys/kern/vfs_bio.c
/sys/kern/vfs_bio.c:
 $FreeBSD: head/sys/kern/vfs_bio.c 209902 2010-07-11 20:11:44Z alc $

So it might be fixed already.
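
The comparison behind that guess can be made mechanical by pulling the revision number out of each ident string. A minimal Python sketch follows; parse_ident and IDENT_RE are hypothetical names, and the pattern covers only the fields visible in the two strings above. A higher revision on head shows that the file changed after the clangbsd copy, not that this particular panic was addressed.

```python
import re

# Matches "$FreeBSD: <path> <revision> <YYYY-MM-DD> ..." as seen above.
IDENT_RE = re.compile(r"\$FreeBSD: (\S+) (\d+) (\d{4}-\d{2}-\d{2}) ")

CLANGBSD = ("$FreeBSD: projects/clangbsd/sys/kern/vfs_bio.c 209170 "
            "2010-06-14 18:45:33Z rdivacky $")
HEAD = "$FreeBSD: head/sys/kern/vfs_bio.c 209902 2010-07-11 20:11:44Z alc $"


def parse_ident(ident):
    """Return (path, revision, date) from a $FreeBSD$ ident string."""
    m = IDENT_RE.search(ident)
    if m is None:
        raise ValueError(f"not a $FreeBSD$ ident string: {ident!r}")
    return m.group(1), int(m.group(2)), m.group(3)


if __name__ == "__main__":
    _, clang_rev, clang_date = parse_ident(CLANGBSD)
    _, head_rev, head_date = parse_ident(HEAD)
    print(f"clangbsd r{clang_rev} ({clang_date}), head r{head_rev} ({head_date})")
    print("head is newer" if head_rev > clang_rev else "clangbsd is up to date")
```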

Let me know if you need more information. Unfortunately I didn't get a
core dump, although dumpdev="AUTO" is set in /etc/rc.conf.  The laptop
rebooted by itself.

Regards,
Rene

Re: Why is intr taking up so much cpu?

2010-07-21 Thread Doug Barton


On Wed, 21 Jul 2010, Andriy Gapon wrote:

> I didn't mean your manual tuning, I meant how the system is configured :-)  E.g.
> the relevant sysctl tree.


Duh. :)  Sorry.

sysctl -a | grep timer
kern.eventtimer.choice: LAPIC(500) HPET(450) HPET1(440) HPET2(440) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 83223728
kern.eventtimer.et.LAPIC.quality: 500
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.timer2: HPET
kern.eventtimer.timer1: LAPIC
kern.eventtimer.singlemul: 2
net.inet.tcp.timer_race: 0
net.inet.tcp.per_cpu_timers: 0
machdep.acpi_timer_freq: 3579545
p1003_1b.timers: 200112
p1003_1b.delaytimer_max: 2147483647
p1003_1b.timer_max: 32
dev.acpi_timer.0.%desc: 24-bit timer at 3.579545MHz
dev.acpi_timer.0.%driver: acpi_timer
dev.acpi_timer.0.%location: unknown
dev.acpi_timer.0.%pnpinfo: unknown
dev.acpi_timer.0.%parent: acpi0
dev.attimer.0.%desc: AT timer
dev.attimer.0.%driver: attimer
dev.attimer.0.%location: handle=\_SB_.PCI0.ISAB.TMR_
dev.attimer.0.%pnpinfo: _HID=PNP0100 _UID=0
dev.attimer.0.%parent: acpi0
dev.pmtimer.0.%driver: pmtimer
dev.pmtimer.0.%parent: isa0
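
The kern.eventtimer.choice line lists each available event timer with its quality in parentheses. A small Python sketch for ranking them, assuming the wrapped line has been rejoined; CHOICE is copied from the output above and rank_timers is a hypothetical helper:

```python
import re

# Value of kern.eventtimer.choice from the sysctl output above.
CHOICE = "LAPIC(500) HPET(450) HPET1(440) HPET2(440) i8254(100) RTC(0)"


def rank_timers(choice):
    """Return (name, quality) pairs sorted by descending quality.

    sorted() is stable, so timers with equal quality keep their listed order.
    """
    pairs = re.findall(r"(\w+)\((-?\d+)\)", choice)
    return sorted(((name, int(q)) for name, q in pairs), key=lambda p: -p[1])


if __name__ == "__main__":
    for name, quality in rank_timers(CHOICE):
        print(f"{name:6s} {quality}")
```

The two highest-quality entries, LAPIC and HPET, are the ones reported as kern.eventtimer.timer1 and kern.eventtimer.timer2 in the same output.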


--

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

