Re: debugging a repeating panic that does not produce a dump

2003-02-20 Thread Dag-Erling Smorgrav
Mike Tancsa <[EMAIL PROTECTED]> writes:
> It only happens when periodic runs, but on occasion it skips a day.
> E.g. yesterday it did not do it.  It only started happening after
> Jan 28th.  I can brutalize the server with repeated buildworlds (-j2
> through 8) and it is always successful.  It's only on periodic that it
> dies, and find is always the process running.  It's only with SMP as well,
> on this 'oldish' machine.

Hmm, it would be great to know what process was running when it
crashed.  Unfortunately, I don't know how to do that post-KSE...

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: debugging a repeating panic that does not produce a dump

2003-02-19 Thread phk
In message <[EMAIL PROTECTED]>, Mike Tancsa writes:
>
>It only happens when periodic runs, but on occasion it skips a day.  E.g.
>yesterday it did not do it.  It only started happening after Jan 28th.  I can
>brutalize the server with repeated buildworlds (-j2 through 8) and it is
>always successful.  It's only on periodic that it dies, and find is always
>the process running.  It's only with SMP as well, on this 'oldish' machine.

This sounds like the double-fault my laptop does every so often.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.




Re: debugging a repeating panic that does not produce a dump

2003-02-19 Thread Mike Tancsa

It only happens when periodic runs, but on occasion it skips a day.  E.g.
yesterday it did not do it.  It only started happening after Jan 28th.  I can
brutalize the server with repeated buildworlds (-j2 through 8) and it is
always successful.  It's only on periodic that it dies, and find is always
the process running.  It's only with SMP as well, on this 'oldish' machine.

---Mike

At 08:30 PM 19/02/2003 +0100, Dag-Erling Smorgrav wrote:
Mike Tancsa <[EMAIL PROTECTED]> writes:
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 0002; cpuid = 0; lapic.id = 0100
> fault virtual address   = 0xc6efa8e8

Hmm, different fault address this time.

> (kgdb) up 6
> #6  0xc0174830 in makedev (x=28, y=160) at /usr/src/sys/kern/kern_conf.c:207

These numbers look perfectly valid (cuaia0).  The only explanation I
can think of is some kind of race, or some kind of corruption.
Hopefully somebody more clued than myself will be able to figure it
out.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: debugging a repeating panic that does not produce a dump

2003-02-19 Thread Dag-Erling Smorgrav
Mike Tancsa <[EMAIL PROTECTED]> writes:
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 0002; cpuid = 0; lapic.id = 0100
> fault virtual address   = 0xc6efa8e8

Hmm, different fault address this time.

> (kgdb) up 6
> #6  0xc0174830 in makedev (x=28, y=160) at /usr/src/sys/kern/kern_conf.c:207

These numbers look perfectly valid (cuaia0).  The only explanation I
can think of is some kind of race, or some kind of corruption.
Hopefully somebody more clued than myself will be able to figure it
out.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: debugging a repeating panic that does not produce a dump

2003-02-19 Thread Mike Tancsa
At 08:39 PM 17/02/2003 +0100, Dag-Erling Smorgrav wrote:

Mike Tancsa <[EMAIL PROTECTED]> writes:
> ns4# nm /kernel | grep \^c0174 | sort
> [...]
> c01747d4 T makedev
> c01748f4 T freedev

This is it (makedev)

> Does this actually show the location?
> ns4# gdb -k kernel.debug
> [...]
> (kgdb) list *0xc0174830
> 0xc0174830 is in makedev (/usr/src/sys/kern/kern_conf.c:208).
> 203 if (x == umajor(NOUDEV) && y == uminor(NOUDEV))
> 204 Debugger("makedev of NOUDEV");
> 205 udev = (x << 8) | y;
> 206 hash = udev % DEVT_HASH;
> 207 LIST_FOREACH(si, &dev_hash[hash], si_hash) {
> 208 if (si->si_udev == udev)
> 209 return (si);
> 210 }
> 211 if (stashed >= DEVT_STASH) {
> 212 MALLOC(si, struct specinfo *, sizeof(*si), M_DEVT,
> (kgdb)

Yep.  Looks like si is garbage:

> fault virtual address   = 0x211e6d36

is most likely the value of si at the time of the crash.  It's nowhere
near kernel memory (which starts at 0xc0000000).

If / when you get a dump, show me the backtrace and the value of x, y
and udev (as reported by gdb operating on the recovered core)


OK, got it today.

ns4# gdb -k /usr/obj/usr/src/sys/smp/kernel.debug vmcore.0
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf

SMP 2 cpus
IdlePTD at phsyical address 0x003d4000
initial pcb at physical address 0x0032ebc0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 0002; cpuid = 0; lapic.id = 0100
fault virtual address   = 0xc6efa8e8
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc0174830
stack pointer   = 0x10:0xdef1dc4c
frame pointer   = 0x10:0xdef1dc58
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 91300 (find)
interrupt mask  = none <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 0002; cpuid = 0; lapic.id = 0100
boot() called on cpu#0

syncing disks... 2 2
done
Uptime: 1d22h58m21s

dumping to dev #twed/1, offset 524312
dump 767 766 765 764 763 762 761 760 [...] 256 255 254 253 252 251 250 249

Re: debugging a repeating panic that does not produce a dump

2003-02-17 Thread Dag-Erling Smorgrav
Mike Tancsa <[EMAIL PROTECTED]> writes:
> Thank you very much, I will do so as soon as I get the dump.  BTW,
> could the act of giving the wrong params to dumpon cause the crash?

No, it wouldn't.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: debugging a repeating panic that does not produce a dump

2003-02-17 Thread Mike Tancsa
At 08:39 PM 17/02/2003 +0100, Dag-Erling Smorgrav wrote:


If / when you get a dump, show me the backtrace and the value of x, y
and udev (as reported by gdb operating on the recovered core)


Thank you very much, I will do so as soon as I get the dump.  BTW, could
the act of giving the wrong params to dumpon cause the crash? I don't see
anything directly in periodic that would look for or trigger that, however.

---Mike


> > How do you build your kernels - 'make buildkernel' or manually?
> Always make buildkernel. I have a debug kernel built as well
> (makeoptions DEBUG=-g)

That's what I wanted to know.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]






Re: debugging a repeating panic that does not produce a dump

2003-02-17 Thread Dag-Erling Smorgrav
Mike Tancsa <[EMAIL PROTECTED]> writes:
> ns4# nm /kernel | grep \^c0174 | sort
> [...]
> c01747d4 T makedev
> c01748f4 T freedev

This is it (makedev)

> Does this actually show the location?
> ns4# gdb -k kernel.debug
> [...]
> (kgdb) list *0xc0174830
> 0xc0174830 is in makedev (/usr/src/sys/kern/kern_conf.c:208).
> 203 if (x == umajor(NOUDEV) && y == uminor(NOUDEV))
> 204 Debugger("makedev of NOUDEV");
> 205 udev = (x << 8) | y;
> 206 hash = udev % DEVT_HASH;
> 207 LIST_FOREACH(si, &dev_hash[hash], si_hash) {
> 208 if (si->si_udev == udev)
> 209 return (si);
> 210 }
> 211 if (stashed >= DEVT_STASH) {
> 212 MALLOC(si, struct specinfo *, sizeof(*si), M_DEVT,
> (kgdb)

Yep.  Looks like si is garbage:

> fault virtual address   = 0x211e6d36

is most likely the value of si at the time of the crash.  It's nowhere
near kernel memory (which starts at 0xc0000000).

If / when you get a dump, show me the backtrace and the value of x, y
and udev (as reported by gdb operating on the recovered core)

> > How do you build your kernels - 'make buildkernel' or manually?
> Always make buildkernel. I have a debug kernel built as well
> (makeoptions DEBUG=-g)

That's what I wanted to know.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]




Re: debugging a repeating panic that does not produce a dump

2003-02-17 Thread Mike Tancsa
At 07:50 PM 17/02/2003 +0100, Dag-Erling Smorgrav wrote:

Mike Tancsa <[EMAIL PROTECTED]> writes:
> I am seeing a repeatable panic with a 4.x SMP machine (not when in uni
> mode). It never produces a crash dump, but always panics when periodic
> runs.

Hmm, it doesn't even seem to *try* to dump...  are you sure you have
configured a dump device?


Arrh... There was a typo in /etc/rc.conf :-(

dumpdev="/dev/twed0s1b" # Device name to crashdump to (or NO).
dumpdir="/var/crash"    # Directory where crash dumps are to be stored
instead of the correct

/dev/twed0b

I have corrected that and did a

ns4# dumpon -v /dev/twed0b
dumpon: crash dumps to /dev/twed0b (147, 1)
ns4#



> instruction pointer = 0x8:0xc0174830

This is the address of the instruction which caused the fault.  You
can run nm(1) on your kernel to find out where in the kernel that is,
e.g.:

# nm /kernel | grep \^c0174 | sort



ns4# nm /kernel | grep \^c0174 | sort
c0174034 t switch_timecounter
c01740c4 t sync_other_counter
c0174130 t tco_forward
c0174278 t sysctl_kern_timecounter_hardware
c0174310 T pps_ioctl
c01743fc T pps_init
c0174420 T pps_event
c0174578 T devsw
c017459c T cdevsw_add
c01746c4 T cdevsw_remove
c017471c T major
c017473c T minor
c017475c T lminor
c0174788 T makebdev
c01747d4 T makedev
c01748f4 T freedev
c0174980 T dev2udev
c017499c T udev2dev
c0174a00 T uminor
c0174a0c T umajor
c0174a18 T makeudev
c0174a28 T make_dev
c0174a68 T destroy_dev
c0174a90 T devtoname
c0174b2c T getdtablesize
c0174b54 T dup2
c0174bf4 T dup
c0174c4c T fcntl
ns4#

Does this actually show the location?
ns4# gdb -k kernel.debug
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf

(kgdb) list *0xc0174830
0xc0174830 is in makedev (/usr/src/sys/kern/kern_conf.c:208).
203 if (x == umajor(NOUDEV) && y == uminor(NOUDEV))
204 Debugger("makedev of NOUDEV");
205 udev = (x << 8) | y;
206 hash = udev % DEVT_HASH;
207 LIST_FOREACH(si, &dev_hash[hash], si_hash) {
208 if (si->si_udev == udev)
209 return (si);
210 }
211 if (stashed >= DEVT_STASH) {
212 MALLOC(si, struct specinfo *, sizeof(*si), M_DEVT,
(kgdb)


this should give you a list of maybe a dozen symbols; the one you want
is the last one in the list that has a lower address than c0174830.

How do you build your kernels - 'make buildkernel' or manually?


Always make buildkernel. I have a debug kernel built as well 
(makeoptions DEBUG=-g)

Thanks for responding. Your comment above was what I needed to triple-check
my rc.conf and correct the typo :(

---Mike 




Re: debugging a repeating panic that does not produce a dump

2003-02-17 Thread Dag-Erling Smorgrav
Mike Tancsa <[EMAIL PROTECTED]> writes:
> I am seeing a repeatable panic with a 4.x SMP machine (not when in uni
> mode). It never produces a crash dump, but always panics when periodic
> runs.

Hmm, it doesn't even seem to *try* to dump...  are you sure you have
configured a dump device?

> instruction pointer = 0x8:0xc0174830

This is the address of the instruction which caused the fault.  You
can run nm(1) on your kernel to find out where in the kernel that is,
e.g.:

# nm /kernel | grep \^c0174 | sort

this should give you a list of maybe a dozen symbols; the one you want
is the last one in the list that has a lower address than c0174830.

How do you build your kernels - 'make buildkernel' or manually?

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]
