Re: debugging a repeating panic that does not produce a dump
Mike Tancsa <[EMAIL PROTECTED]> writes:
> It only happens when periodic runs, but on occasion it skips a day.
> E.g. yesterday it did not do it. It only started happening post
> Jan 28th. I can brutalize the server with repeated buildworlds (-j2
> through 8) and it is always successful. It's only on periodic that it
> dies, and find is always the process running. It's only with SMP as
> well, on this 'oldish' machine.

Hmm, it would be great to know what process was running when it
crashed. Unfortunately, I don't know how to do that post-KSE...

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message
Re: debugging a repeating panic that does not produce a dump
In message <[EMAIL PROTECTED]>, Mike Tancsa writes:
>
>It only happens when periodic runs, but on occasion it skips a day. E.g.
>yesterday it did not do it. It only started happening post Jan 28th. I can
>brutalize the server with repeated buildworlds (-j2 through 8) and it is
>always successful. It's only on periodic that it dies, and find is always
>the process running. It's only with SMP as well, on this 'oldish' machine.

This sounds like the double-fault my laptop does every so often.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED]       | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: debugging a repeating panic that does not produce a dump
It only happens when periodic runs, but on occasion it skips a day. E.g.
yesterday it did not do it. It only started happening post Jan 28th. I can
brutalize the server with repeated buildworlds (-j2 through 8) and it is
always successful. It's only on periodic that it dies, and find is always
the process running. It's only with SMP as well, on this 'oldish' machine.

        ---Mike

At 08:30 PM 19/02/2003 +0100, Dag-Erling Smorgrav wrote:
> Mike Tancsa <[EMAIL PROTECTED]> writes:
> > Fatal trap 12: page fault while in kernel mode
> > mp_lock = 0002; cpuid = 0; lapic.id = 0100
> > fault virtual address   = 0xc6efa8e8
>
> Hmm, different fault address this time.
>
> > (kgdb) up 6
> > #6  0xc0174830 in makedev (x=28, y=160) at /usr/src/sys/kern/kern_conf.c:207
>
> These numbers look perfectly valid (cuaia0). The only explanation I
> can think of is some kind of race, or some kind of corruption.
> Hopefully somebody more clued than myself will be able to figure it
> out.
>
> DES
> --
> Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: debugging a repeating panic that does not produce a dump
Mike Tancsa <[EMAIL PROTECTED]> writes:
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 0002; cpuid = 0; lapic.id = 0100
> fault virtual address   = 0xc6efa8e8

Hmm, different fault address this time.

> (kgdb) up 6
> #6  0xc0174830 in makedev (x=28, y=160) at /usr/src/sys/kern/kern_conf.c:207

These numbers look perfectly valid (cuaia0). The only explanation I
can think of is some kind of race, or some kind of corruption.
Hopefully somebody more clued than myself will be able to figure it
out.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
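For anyone following along, the device number that makedev() looks up can be recomputed by hand from the frame #6 arguments, using the `udev = (x << 8) | y` line from the kern_conf.c listing quoted elsewhere in this thread. This is only a sketch: the DEVT_HASH value below is an assumption and should be checked against the kern_conf.c in your own source tree.

```shell
#!/bin/sh
# Arguments reported by kgdb in frame #6 of the backtrace.
x=28     # major number
y=160    # minor number

# Same expression as line 205 of the quoted makedev() listing.
udev=$(( (x << 8) | y ))
printf 'udev = %d (0x%x)\n' "$udev" "$udev"

# ASSUMPTION: DEVT_HASH is taken to be 83 here; this value is not stated
# anywhere in the thread, so verify it in kern_conf.c before relying on it.
DEVT_HASH=83
hash=$(( udev % DEVT_HASH ))
printf 'hash bucket (assuming DEVT_HASH=%d) = %d\n' "$DEVT_HASH" "$hash"
```

If udev comes out as expected, the arguments themselves are sane and the suspicion shifts to the contents of the hash chain, which fits the race or corruption theory above.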
Re: debugging a repeating panic that does not produce a dump
At 08:39 PM 17/02/2003 +0100, Dag-Erling Smorgrav wrote:
> Mike Tancsa <[EMAIL PROTECTED]> writes:
> > ns4# nm /kernel | grep \^c0174 | sort
> > [...]
> > c01747d4 T makedev
> > c01748f4 T freedev
>
> This is it (makedev)
>
> > Does this actually show the location ?
> > ns4# gdb -k kernel.debug
> > [...]
> > (kgdb) list *0xc0174830
> > 0xc0174830 is in makedev (/usr/src/sys/kern/kern_conf.c:208).
> > 203             if (x == umajor(NOUDEV) && y == uminor(NOUDEV))
> > 204                     Debugger("makedev of NOUDEV");
> > 205             udev = (x << 8) | y;
> > 206             hash = udev % DEVT_HASH;
> > 207             LIST_FOREACH(si, &dev_hash[hash], si_hash) {
> > 208                     if (si->si_udev == udev)
> > 209                             return (si);
> > 210             }
> > 211             if (stashed >= DEVT_STASH) {
> > 212                     MALLOC(si, struct specinfo *, sizeof(*si), M_DEVT,
> > (kgdb)
>
> Yep. Looks like si is garbage:
>
> > fault virtual address   = 0x211e6d36
>
> is most likely the value of si at the time of the crash. It's nowhere
> near kernel memory (which starts at 0xc0000000).
>
> If / when you get a dump, show me the backtrace and the value of x, y
> and udev (as reported by gdb operating on the recovered core)

OK, got it today.

ns4# gdb -k /usr/obj/usr/src/sys/smp/kernel.debug vmcore.0
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf

SMP 2 cpus
IdlePTD at phsyical address 0x003d4000
initial pcb at physical address 0x0032ebc0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 0002; cpuid = 0; lapic.id = 0100
fault virtual address   = 0xc6efa8e8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0174830
stack pointer           = 0x10:0xdef1dc4c
frame pointer           = 0x10:0xdef1dc58
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 91300 (find)
interrupt mask          = none <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 0002; cpuid = 0; lapic.id = 0100
boot() called on cpu#0

syncing disks...
2 2
done
Uptime: 1d22h58m21s

dumping to dev #twed/1, offset 524312
dump 767 766 765 [...] 251 250 249
Re: debugging a repeating panic that does not produce a dump
Mike Tancsa <[EMAIL PROTECTED]> writes:
> Thank you very much, I will do so as soon as I get the dump. BTW,
> could the act of giving the wrong params to dumpon cause the crash ?

No, it wouldn't.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: debugging a repeating panic that does not produce a dump
At 08:39 PM 17/02/2003 +0100, Dag-Erling Smorgrav wrote:
> If / when you get a dump, show me the backtrace and the value of x, y
> and udev (as reported by gdb operating on the recovered core)

Thank you very much, I will do so as soon as I get the dump. BTW, could
the act of giving the wrong params to dumpon cause the crash ? I don't
see anything directly in periodic that would look for / trigger that,
however.

        ---Mike

> > > How do you build your kernels - 'make buildkernel' or manually?
> > Always make buildkernel. I have a debug kernel built as well
> > (makeoptions DEBUG=-g)
>
> That's what I wanted to know.
>
> DES
> --
> Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: debugging a repeating panic that does not produce a dump
Mike Tancsa <[EMAIL PROTECTED]> writes:
> ns4# nm /kernel | grep \^c0174 | sort
> [...]
> c01747d4 T makedev
> c01748f4 T freedev

This is it (makedev)

> Does this actually show the location ?
> ns4# gdb -k kernel.debug
> [...]
> (kgdb) list *0xc0174830
> 0xc0174830 is in makedev (/usr/src/sys/kern/kern_conf.c:208).
> 203             if (x == umajor(NOUDEV) && y == uminor(NOUDEV))
> 204                     Debugger("makedev of NOUDEV");
> 205             udev = (x << 8) | y;
> 206             hash = udev % DEVT_HASH;
> 207             LIST_FOREACH(si, &dev_hash[hash], si_hash) {
> 208                     if (si->si_udev == udev)
> 209                             return (si);
> 210             }
> 211             if (stashed >= DEVT_STASH) {
> 212                     MALLOC(si, struct specinfo *, sizeof(*si), M_DEVT,
> (kgdb)

Yep. Looks like si is garbage:

> fault virtual address   = 0x211e6d36

is most likely the value of si at the time of the crash. It's nowhere
near kernel memory (which starts at 0xc0000000).

If / when you get a dump, show me the backtrace and the value of x, y
and udev (as reported by gdb operating on the recovered core)

> > How do you build your kernels - 'make buildkernel' or manually?
> Always make buildkernel. I have a debug kernel built as well
> (makeoptions DEBUG=-g)

That's what I wanted to know.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
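The "nowhere near kernel memory" judgement can be made mechanical with a quick range check. This is a rough sketch under two assumptions: that the kernel is mapped from 0xc0000000 upwards (the usual i386 layout), and that the shell does its arithmetic in at least 64 bits. The helper name in_kernel is made up for this example.

```shell
#!/bin/sh
# ASSUMPTION: i386 kernel virtual addresses begin at 0xc0000000.
# ASSUMPTION: shell arithmetic is at least 64 bits wide, so the value
# below does not overflow into a negative number.
KERNBASE=$((0xc0000000))

# in_kernel ADDR: succeed if ADDR (hex digits, no 0x prefix) is at or
# above KERNBASE. Hypothetical helper, for illustration only.
in_kernel() {
    [ $((0x$1)) -ge "$KERNBASE" ]
}

# Addresses taken from the panic messages quoted in this thread.
for a in 211e6d36 c6efa8e8 c0174830; do
    if in_kernel "$a"; then
        echo "0x$a: inside kernel range"
    else
        echo "0x$a: outside kernel range"
    fi
done
```

Passing this check says nothing about whether an address is actually mapped; the later fault at 0xc6efa8e8 is inside the kernel range and still hit a page that was not present.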
Re: debugging a repeating panic that does not produce a dump
At 07:50 PM 17/02/2003 +0100, Dag-Erling Smorgrav wrote:
> Mike Tancsa <[EMAIL PROTECTED]> writes:
> > I am seeing a repeatable panic with a 4.x SMP machine (not when in uni
> > mode). It never produces a crash dump, but always panics when periodic
> > runs.
>
> Hmm, it doesn't even seem to *try* to dump... are you sure you have
> configured a dump device?

Arrh... There was a typo in /etc/rc.conf :-(

dumpdev="/dev/twed0s1b" # Device name to crashdump to (or NO).
dumpdir="/var/crash"    # Directory where crash dumps are to be stored

instead of the correct /dev/twed0b

I have corrected that and did a

ns4# dumpon -v /dev/twed0b
dumpon: crash dumps to /dev/twed0b (147, 1)
ns4#

> > instruction pointer     = 0x8:0xc0174830
>
> This is the address of the instruction which caused the fault. You
> can run nm(1) on your kernel to find out where in the kernel that is,
> e.g.:
>
> # nm /kernel | grep \^c0174 | sort

ns4# nm /kernel | grep \^c0174 | sort
c0174034 t switch_timecounter
c01740c4 t sync_other_counter
c0174130 t tco_forward
c0174278 t sysctl_kern_timecounter_hardware
c0174310 T pps_ioctl
c01743fc T pps_init
c0174420 T pps_event
c0174578 T devsw
c017459c T cdevsw_add
c01746c4 T cdevsw_remove
c017471c T major
c017473c T minor
c017475c T lminor
c0174788 T makebdev
c01747d4 T makedev
c01748f4 T freedev
c0174980 T dev2udev
c017499c T udev2dev
c0174a00 T uminor
c0174a0c T umajor
c0174a18 T makeudev
c0174a28 T make_dev
c0174a68 T destroy_dev
c0174a90 T devtoname
c0174b2c T getdtablesize
c0174b54 T dup2
c0174bf4 T dup
c0174c4c T fcntl
ns4#

Does this actually show the location ?

ns4# gdb -k kernel.debug
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf

(kgdb) list *0xc0174830
0xc0174830 is in makedev (/usr/src/sys/kern/kern_conf.c:208).
203             if (x == umajor(NOUDEV) && y == uminor(NOUDEV))
204                     Debugger("makedev of NOUDEV");
205             udev = (x << 8) | y;
206             hash = udev % DEVT_HASH;
207             LIST_FOREACH(si, &dev_hash[hash], si_hash) {
208                     if (si->si_udev == udev)
209                             return (si);
210             }
211             if (stashed >= DEVT_STASH) {
212                     MALLOC(si, struct specinfo *, sizeof(*si), M_DEVT,
(kgdb)

> this should give you a list of maybe a dozen symbols; the one you want
> is the last one in the list that has a lower address than c0174830.
>
> How do you build your kernels - 'make buildkernel' or manually?

Always make buildkernel. I have a debug kernel built as well
(makeoptions DEBUG=-g)

Thanks for responding. Your above comment was what was needed to triple
check my rc.conf and correct the typo :(

        ---Mike
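A wrong dumpdev like the one above fails silently until the machine panics, so a small sanity check before calling dumpon may be worth having. The sketch below assumes disk nodes show up as character devices under /dev, and check_dumpdev is a made-up helper name, not a FreeBSD tool. It only catches a path that does not exist as a device node, so it might not have flagged this particular typo if /dev/twed0s1b was present.

```shell
#!/bin/sh
# check_dumpdev PATH: succeed only if PATH exists and is a character
# device. Hypothetical helper, for illustration only.
# ASSUMPTION: the dump device node is a character device on this system.
check_dumpdev() {
    if [ -c "$1" ]; then
        echo "ok: $1"
    else
        echo "not a character device: $1" >&2
        return 1
    fi
}

# Possible usage with the value from /etc/rc.conf (not run here):
#   check_dumpdev /dev/twed0b && dumpon -v /dev/twed0b
```

Running `dumpon -v` by hand, as in the message above, remains the more direct confirmation, since it reports the device the kernel will actually dump to.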
Re: debugging a repeating panic that does not produce a dump
Mike Tancsa <[EMAIL PROTECTED]> writes:
> I am seeing a repeatable panic with a 4.x SMP machine (not when in uni
> mode). It never produces a crash dump, but always panics when periodic
> runs.

Hmm, it doesn't even seem to *try* to dump... are you sure you have
configured a dump device?

> instruction pointer     = 0x8:0xc0174830

This is the address of the instruction which caused the fault. You
can run nm(1) on your kernel to find out where in the kernel that is,
e.g.:

# nm /kernel | grep \^c0174 | sort

This should give you a list of maybe a dozen symbols; the one you want
is the last one in the list that has a lower address than c0174830.

How do you build your kernels - 'make buildkernel' or manually?

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
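The "last symbol with a lower address" rule can also be applied by a short script instead of by eye. The sketch below runs on a few lines copied from the nm output posted in this thread; on a live system the input would come from the nm pipeline shown above. The helper name find_symbol is invented for this example, and the loop assumes the list is already sorted by address.

```shell
#!/bin/sh
# Sample lines copied from the nm output quoted in this thread. On a real
# system this would be the output of: nm /kernel | grep '^c0174' | sort
symbols='c017473c T minor
c017475c T lminor
c0174788 T makebdev
c01747d4 T makedev
c01748f4 T freedev
c0174980 T dev2udev'

# find_symbol ADDR: print the last symbol whose address is <= ADDR
# (hex digits, no 0x prefix). Hypothetical helper; input must be sorted.
find_symbol() {
    target=$((0x$1))
    best=""
    while read -r addr type name; do
        [ -n "$addr" ] || continue
        if [ $((0x$addr)) -le "$target" ]; then
            best=$name
        fi
    done <<EOF
$symbols
EOF
    echo "$best"
}

# The faulting instruction pointer from the panic message.
find_symbol c0174830
```

With a debug kernel available, `list *0xc0174830` in gdb gives the same answer down to the source line, so this is mainly useful when only the stripped /kernel is at hand.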