Re: Big problem still remains with 7.2-STABLE locking up

2009-06-09 Thread NAKAJI Hiroyuki
> In <3bbf2fe10906091757w35ffd5cfr6f091fc718bc8...@mail.gmail.com> 
>   Attilio Rao  wrote:

> > Dcons session was recorded with script.
> > http://www.heimat.gr.jp/localhost/dcons.log

Just fixing my typo:
http://www.heimat.gr.jp/~nakaji/localhost/dcons.log
                        ^^^^^^^

> I'm following up privately with the user, news to come hopefully.

Thanks. I'll try.
-- 
NAKAJI Hiroyuki
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Big problem still remains with 7.2-STABLE locking up

2009-06-09 Thread Attilio Rao
2009/6/10 NAKAJI Hiroyuki :
> Thanks Attilio,
>
> I set up a dcons target/host pair. The target is 7.2-STABLE and the host
> is 6.4-STABLE.
>
> The dcons session was recorded with script(1).
> http://www.heimat.gr.jp/localhost/dcons.log

I'm following up privately with the user, news to come hopefully.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: Big problem still remains with 7.2-STABLE locking up

2009-06-09 Thread NAKAJI Hiroyuki
Thanks Attilio,

I set up a dcons target/host pair. The target is 7.2-STABLE and the host
is 6.4-STABLE.

The dcons session was recorded with script(1).
http://www.heimat.gr.jp/localhost/dcons.log

> In <3bbf2fe10906060749xbbc2f2fy4c09f67711a...@mail.gmail.com> 
>   Attilio Rao  wrote:

> 2) Once you get the deadlock, break into the DDB debugger

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
vfs_badlock() at vfs_badlock+0x95
assert_vop_elocked() at assert_vop_elocked+0x64
VOP_WRITE_APV() at VOP_WRITE_APV+0x155
vn_write() at vn_write+0x1ce
dofilewrite() at dofilewrite+0x85
kern_pwritev() at kern_pwritev+0x66
pwrite() at pwrite+0x58
syscall() at syscall+0x1ce
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (476, FreeBSD ELF64, pwrite), rip = 0x80074d17c, rsp = 0x7fffda98, rbp = 0xb4000 ---
VOP_WRITE: 0xff004a26c000 is not exclusive locked but should be
KDB: enter: lock violation
[thread pid 29756 tid 100177 ]
Stopped at      kdb_enter_why+0x3d:     movq    $0,0x626418(%rip)
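
The lock violation reported above can be cross-checked against the `show lockedvnods` output later in this message. The following is a minimal sketch, assuming the message and output formats are exactly as printed in this session; the helper names `violating_vnode` and `vnode_lock_states` are hypothetical and exist only for this illustration.

```python
import re

# Lines copied from the DDB session in this message.
violation = "VOP_WRITE: 0xff004a26c000 is not exclusive locked but should be"
lockedvnods = """\
0xff004a26c000: tag ufs, type VREG
usecount 5, writecount 2, refcount 587 mountedhere 0
flags (VI_OBJDIRTY)
v_object 0xff004a1fad80 ref 3 pages 4604
 lock type ufs: SHARED (count 1)
ino 3157042, on dev ad4s1f
"""


def violating_vnode(line):
    """Return the vnode address named in a lock violation line, or None."""
    m = re.search(r"(0x[0-9a-f]+) is not exclusive locked", line)
    return m.group(1) if m else None


def vnode_lock_states(text):
    """Map vnode address -> lock state as printed by 'show lockedvnods'."""
    states = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"(0x[0-9a-f]+): tag ", line)
        if head:
            current = head.group(1)
            continue
        lock = re.match(r"\s*lock type \w+: (\w+)", line)
        if lock and current:
            states[current] = lock.group(1)
    return states


vnode = violating_vnode(violation)
state = vnode_lock_states(lockedvnods).get(vnode)
print(vnode, state)
```

For this session the vnode named in the violation is the same one that `show lockedvnods` lists, and it is held SHARED, which matches the "not exclusive locked" complaint.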

> 3) Once you are in DDB, information that could be very useful:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads
> Note that this is a lot of printout, so you won't be able to collect
> all this information without a serial connection.

db> show allpcpu
Current CPU: 0

cpuid= 0
curthread= 0xff0114476ab0: pid 29756 "expireover"
curpcb   = 0xff80a3f7ed40
fpcurthread  = none
idlethread   = 0xff0001589720: pid 11 "idle: cpu0"
spin locks held:

db> show alllocks
Process 29784 (spamc) thread 0xff004afac720 (100170)
exclusive sx so_rcv_sx r = 0 (0xff0065383c40) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 29413 (perl) thread 0xff004afb0720 (100162)
exclusive sx so_rcv_sx r = 0 (0xff00656163d0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 29409 (perl) thread 0xff01144ea390 (100175)
exclusive sx so_rcv_sx r = 0 (0xff004a210970) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 29406 (perl) thread 0xff0065899000 (100196)
exclusive sx so_rcv_sx r = 0 (0xff00655cbc40) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1497 (perl5.8.9) thread 0xff0013dedab0 (100113)
exclusive sx so_rcv_sx r = 0 (0xff004a0b9970) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1496 (ninpaths) thread 0xff001333a720 (100082)
exclusive sx so_rcv_sx r = 0 (0xff004a33d100) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1494 (perl5.8.9) thread 0xff0013dcf720 (100107)
exclusive sx so_rcv_sx r = 0 (0xff004a33d6a0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1397 (python2.5) thread 0xff0013dd1ab0 (100098)
exclusive sx so_rcv_sx r = 0 (0xff00655cc3d0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
db> show lockedvnods
Locked vnodes

0xff004a26c000: tag ufs, type VREG
usecount 5, writecount 2, refcount 587 mountedhere 0
flags (VI_OBJDIRTY)
v_object 0xff004a1fad80 ref 3 pages 4604
 lock type ufs: SHARED (count 1)
ino 3157042, on dev ad4s1f
db> ps
  pid  ppid  pgrp   uid  state  wmesg   wchan           cmd
29803  1443  1406     8  S      nanslp  0x80b58688      sleep
29797  1534  1534     0  S      connec  0xff00654815fe  sendmail
29785  1499  1499   110  S      lockf   0xff0065a19000  perl
[snip]

db> allthreads
No such command
db> panic
panic: from debugger
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
db_panic() at db_panic+0x17
db_command() at db_command+0x1ef
db_command_loop() at db_command_loop+0x50
db_trap() at db_trap+0x89
kdb_trap() at kdb_trap+0x95
trap() at trap+0x295
calltrap() at calltrap+0x8
--- trap 0x3, rip = 0x8054694d, rsp = 0xff80a3f7e850, rbp = 0xff80a3f7e870 ---
kdb_enter_why() at kdb_enter_why+0x3d
assert_vop_elocked() at assert_vop_elocked+0x64
VOP_WRITE_APV() at VOP_WRITE_APV+0x155
vn_write() at vn_write+0x1ce
dofilewrite() at dofilewrite+0x85
kern_pwritev() at kern_pwritev+0x66
pwrite() at pwrite+0x58
syscall() at syscall+0x1ce
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (476, FreeBSD ELF64, pwrite), rip = 0x80074d17c, rsp = 0x7fffda98, rbp = 0xb4000 ---
Uptime: 3h20m7s
Physical memory: 6121 MB
Dumping 1730 MB: (CTRL-C to abort) ...
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.
Rebooting...
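
The `show alllocks` listing above is easier to read once it is grouped by lock name and acquisition site. The following is a minimal sketch, assuming each record is a "Process ..." line followed by one lock line in the format printed in this session; the regular expressions, the "shared" alternative, and the helper name `parse_alllocks` are assumptions made for illustration and are not part of DDB.

```python
import re
from collections import Counter

# Three entries from the 'show alllocks' output in this message, with each
# wrapped lock record joined back onto a single line.
alllocks = """\
Process 29784 (spamc) thread 0xff004afac720 (100170)
exclusive sx so_rcv_sx r = 0 (0xff0065383c40) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 29413 (perl) thread 0xff004afb0720 (100162)
exclusive sx so_rcv_sx r = 0 (0xff00656163d0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1397 (python2.5) thread 0xff0013dd1ab0 (100098)
exclusive sx so_rcv_sx r = 0 (0xff00655cc3d0) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
"""

# Assumed line formats, derived only from the output shown in this thread.
PROC = re.compile(r"Process (\d+) \((.+?)\) thread (0x[0-9a-f]+) \((\d+)\)")
LOCK = re.compile(
    r"(exclusive|shared) (\w+) (\S+) r = \d+ \((0x[0-9a-f]+)\) locked @ (\S+)"
)


def parse_alllocks(text):
    """Return one dict per held lock, tagged with the owning process."""
    records = []
    owner = None
    for line in text.splitlines():
        proc = PROC.match(line)
        if proc:
            owner = (int(proc.group(1)), proc.group(2))
            continue
        lock = LOCK.match(line)
        if lock and owner:
            mode, lock_class, name, _addr, where = lock.groups()
            records.append({
                "pid": owner[0],
                "cmd": owner[1],
                "mode": mode,
                "class": lock_class,
                "name": name,
                "where": where,
            })
    return records


records = parse_alllocks(alllocks)
by_site = Counter((r["name"], r["where"]) for r in records)
for (name, where), count in by_site.items():
    print(count, name, where)
```

Run over the full listing, the same grouping shows that all eight held locks are `so_rcv_sx` sx locks taken at uipc_sockbuf.c:148; none of them is a vnode lock.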

And, here is a backtrace.

(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0x80517e73 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x805182fc in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0x801c6817 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:446
#4  0x801c710f in db_command (last_cmdp=0x80b21088, 
cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_comman

Re: Big problem still remains with 7.2-STABLE locking up

2009-06-08 Thread NAKAJI Hiroyuki
I got a lockup at 3 a.m. JST, but because dcons is not set up yet, I
cannot show you the whole ddb session.

I have put up the 'bt' output from kgdb.
http://www.heimat.gr.jp/localhost/kgdbbtvmcore.0

Kernel config:

include GENERIC
ident   HEIMAT
options MSGBUF_SIZE=81920
makeoptions DEBUG=-g
options KDB
options KDB_TRACE
options KDB_UNATTENDED
options DDB
options BREAK_TO_DEBUGGER
options QUOTA
options DEVICE_POLLING
options HZ=1000
options SW_WATCHDOG
options DEBUG_VFS_LOCKS
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS

Thanks.

P.S.
"allthreads" was not a usable command in my RELENG_7's ddb.

> In <3bbf2fe10906060749xbbc2f2fy4c09f67711a...@mail.gmail.com> 
>   Attilio Rao  wrote:

> Anyway, the only way we have to debug this is with some help from the
> user.
> 1) Drop the options WITNESS_SKIPSPIN (as we would like to debug
> spinlocks too) and LOCK_PROFILING (in order to create higher
> contention and kill some barriers)
> 2) Once you get the deadlock, break into the DDB debugger
> 3) Once you are in DDB, information that could be very useful:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads

> Note that this is a lot of printout, so you won't be able to collect
> all this information without a serial connection.
> 4) Dump the content so that we can look further at the lock structure
> states once we identify something useful (ideally, keeping the machine
> up in DDB for that would be very useful, but it is often not viable)

> Let me know.
> Attilio
-- 
NAKAJI Hiroyuki


Re: Big problem still remains with 7.2-STABLE locking up

2009-06-06 Thread Jase Thew

NAKAJI Hiroyuki wrote:
> > Note that this is a lot of printout, so you won't be able to collect
> > all this information without a serial connection.
>
> The box does not have any serial port. Is there any other way? Is it
> possible to use dcons(4) for that purpose, if I add a FireWire PCI board?

http://wiki.freebsd.org/DebugWithDcons may be of use to you.

Regards,

Jase Thew.


Re: Big problem still remains with 7.2-STABLE locking up

2009-06-06 Thread NAKAJI Hiroyuki
> In <3bbf2fe10906060749xbbc2f2fy4c09f67711a...@mail.gmail.com> 
>   Attilio Rao  wrote:

> > The kernel configuration is:
> >
> > include GENERIC
> > ident   HEIMAT
> > options MSGBUF_SIZE=81920
> > makeoptions     DEBUG=-g
> > options KDB
> > options DDB
> > options BREAK_TO_DEBUGGER
> > options QUOTA

> Were you unmounting any of the QUOTA'ed filesystems?

No. The quota'ed file system is /home, which is not easily unmounted.

> Anyway, the only way we have to debug this is with some help from the
> user.
> 1) Drop the options WITNESS_SKIPSPIN (as we would like to debug
> spinlocks too) and LOCK_PROFILING (in order to create higher
> contention and kill some barriers)

Removed two lines from KERNCONF.

> 2) Once you get the deadlock, break into the DDB debugger

Hmm. That is the most difficult part: for now, the box cannot break into
the DDB debugger.

> 3) Once you are in DDB, information that could be very useful:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads

> Note that this is a lot of printout, so you won't be able to collect
> all this information without a serial connection.

The box does not have any serial port. Is there any other way? Is it
possible to use dcons(4) for that purpose, if I add a FireWire PCI board?

> 4) Dump the content so that we can further look at locks structure
> states once we identify something useful (ideally, keeping the machine
> up in DDB for that would be very useful, but often not viable)

Thank you for the instructions. I'll try.
-- 
NAKAJI Hiroyuki


Re: Big problem still remains with 7.2-STABLE locking up

2009-06-06 Thread NAKAJI Hiroyuki
> In  
>   Pete French  wrote:

> > I followed some instructions in the list thread. But unfortunately, the
> > big problem still remains. 7.2-STABLE server locks up frequently.

> Are you using the latest STABLE ? I am rolling out the one from a few
> days ago with the bce fixes, and that works fine.

Yes, the kernel was compiled at Sat Jun  6 17:59:50 JST 2009.
I have not checked the source changes, but I need to watch how it behaves.

Yesterday's kernel can easily lock up, and ichwd+watchdogd can restart
(reset?) the box.

> > The kernel configuration is:
> ...
> > options BREAK_TO_DEBUGGER

> When the box locks up, can you actually break to the debugger ? This is how
> we eventually tracked down my problem.

No. I have never seen the debugger; Ctrl+Alt+Esc does not break into it.
And because this box does not have a serial port, debugging seems difficult.
-- 
NAKAJI Hiroyuki


Re: Big problem still remains with 7.2-STABLE locking up

2009-06-06 Thread Pete French
> My story is very similar to Pete's.
> http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/047487.html

My problem, which you link to there, turned out to be due to ICMP
redirects, and is most definitely fixed in 7.2. So, your problem is
not the same as mine, but some of the tips given there may help you
debug it.

> I followed some instructions in the list thread. But unfortunately, the
> big problem still remains. 7.2-STABLE server locks up frequently.

Are you using the latest STABLE ? I am rolling out the one from a few
days ago with the bce fixes, and that works fine.

> The kernel configuration is:
...
> options BREAK_TO_DEBUGGER

When the box locks up, can you actually break to the debugger ? This is how
we eventually tracked down my problem.

-pete.


Re: Big problem still remains with 7.2-STABLE locking up

2009-06-06 Thread Attilio Rao
2009/6/6 NAKAJI Hiroyuki :
> Hi,
>
> I noticed, some months ago, frequent lockups on my RELENG_6 server with
> ECS PM800-M2, Celeron 2.6GHz (UP), 2GB ram, ATA HDDs and 3Com NIC(xl0),
> and then I gave up on this old server.
>
> Last month, I replaced this 'unstable' server with a new one running
> 7.2-RELEASE, which worked very well until I set it up as 'a server'. The
> problem began just after it started 'the services'.
>
> My story is very similar to Pete's.
> http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/047487.html
>
> I followed some instructions in the list thread. But unfortunately, the
> big problem still remains. 7.2-STABLE server locks up frequently.
>
> Help! :-(
>
> The server is NEC Express5800 S70/SD.
>
> o CPU: Intel(R) Celeron(R) CPU 440 @ 2.00GHz (2280.25-MHz K8-class CPU)
> o 6GB RAM
> o ACPI APIC Table: 
> o 80GB and 250GB SATA HDDs
> o http://www.heimat.gr.jp/~nakaji/localhost/dmesg.boot
>
> The kernel configuration is:
>
> include GENERIC
> ident   HEIMAT
> options MSGBUF_SIZE=81920
> makeoptions     DEBUG=-g
> options KDB
> options DDB
> options BREAK_TO_DEBUGGER
> options QUOTA

Were you unmounting any of the QUOTA'ed filesystems?
I'm aware of a possible deadlock between quota and unmount path which
is very difficult to trigger though.

Anyway, the only way we have to debug this is with some help from the
user.
1) Drop the options WITNESS_SKIPSPIN (as we would like to debug
spinlocks too) and LOCK_PROFILING (in order to create higher
contention and kill some barriers)
2) Once you get the deadlock, break into the DDB debugger
3) Once you are in DDB, information that could be very useful:
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> ps
db> allthreads

Note that this is a lot of printout, so you won't be able to collect
all this information without a serial connection.
4) Dump the content so that we can look further at the lock structure
states once we identify something useful (ideally, keeping the machine
up in DDB for that would be very useful, but it is often not viable)
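
Step 1 can be checked mechanically before rebuilding the kernel. The following is a minimal sketch, assuming a plain KERNCONF file with one "options NAME" entry per line; the helper names and the REQUIRED/UNWANTED sets are hypothetical and only reflect options named in this message (WITNESS_SKIPSPIN is the option's spelling in sys/conf/NOTES). It does not follow "include GENERIC", so options inherited from GENERIC are not seen.

```python
# Options taken from this message: KDB and DDB are needed to break into
# the debugger; the other two are the ones step 1 asks to drop.
REQUIRED = {"KDB", "DDB"}
UNWANTED = {"WITNESS_SKIPSPIN", "LOCK_PROFILING"}


def parse_options(kernconf_text):
    """Return the option names in a kernel config, ignoring values and comments."""
    names = set()
    for raw in kernconf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "options":
            names.add(fields[1].split("=", 1)[0])
    return names


def check(kernconf_text):
    """Return (missing required options, options that should be dropped)."""
    opts = parse_options(kernconf_text)
    return sorted(REQUIRED - opts), sorted(opts & UNWANTED)


# The config quoted above, plus one hypothetical line to show detection.
sample = """\
include GENERIC
ident   HEIMAT
options MSGBUF_SIZE=81920
makeoptions     DEBUG=-g
options KDB
options DDB
options BREAK_TO_DEBUGGER
options QUOTA
options WITNESS_SKIPSPIN   # hypothetical line, not in the quoted config
"""

missing, unwanted = check(sample)
print("missing:", missing)
print("should be dropped:", unwanted)
```

With the sample text, nothing required is missing and only the hypothetical WITNESS_SKIPSPIN line is flagged.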

Let me know.
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein