On Thu, Dec 04, 2008 at 03:32:37PM +0100, Ulrich Leodolter wrote:
> On Thu, 2008-12-04 at 14:35 +0200, Pasi Kärkkäinen wrote:
> > On Thu, Dec 04, 2008 at 02:13:56PM +0200, Pasi Kärkkäinen wrote:
> >
> > > > Could you stop all daemons with a sigsegv to force a backtrace ?
> > > > killall -SEGV bacula-sd bacula-dir
> > > >
> > > > (you will find 2 kind of file, *traceback and *bactrace in working
> > > > directory)
> > > >
> > > > After, if you can put results to pastbin, it will give information
> > > > about your problem.
> > > >
> > >
> > > Ok, problems again.. here are the tracebacks:
> > >
> > > http://pasik.reaktio.net/bacula/debug/bacula-sd-traceback.txt
> > > http://pasik.reaktio.net/bacula/debug/bacula-dir-traceback.txt
> > >
> > > Here's what I did to make bacula-sd hang:
> > >
> > > 1. Rebooted the bacula server and the tape library
> > > 2. Fresh after the reboot made sure mtx and bacula mtx-changer work OK.
> > > 3. Started bacula
> > > 4. Ran a job that copies jobs from disk pool to tape pool
> > > 5. Bacula starts a bunch of jobs, but nothing happens.. bacula-sd is
> > >    stuck.
> > >
> > > Any ideas how to debug this further?
> > >
> > > Atm I'm running Bacula 2.5.20 (svn rev 8083) on CentOS 5.2 x86 32bit.
> > >
> > > I also tried applying 2.4.3-sd-deadlock.patch (from bug #1192) but it
> > > didn't seem to help.
> > >
> >
> > And how did I verify bacula-sd is stuck/hanged..
> >
> > - Checking what's happening on SCSI devices with "iostat 1" -> I don't see
> >   any disk activity.
> > - Nothing happens in bconsole
> > - Checking the status of Storage (tape pool) in bconsole makes bconsole
> >   stuck:
> >
> > http://pasik.reaktio.net/bacula/debug/bconsole-sd-hang.txt
>
> Hi,
>
> Did you notice broken "Terminated Jobs:" list in bconsole-sd-hang.txt?
>

Yeah I did actually.. didn't pay too much attention to that, because it was
more important to get the hang fixed :)

>
> Here ist my output of "status dir" (after upgrade to current svn)
>

Mine was from "status Storage".. but yeah, something wrong in the list..

-- Pasi

>
> Connecting to Director troll:9101
> 1000 OK: troll-dir Version: 2.5.22 (01 December 2008)
> Enter a period to cancel a command.
> *status dir
> troll-dir Version: 2.5.22 (01 December 2008) i686-pc-linux-gnu redhat
> Enterprise release
> Daemon started 04-Dec-08 15:18, 0 Jobs run since started.
> Heap: heap=274,432 smbytes=129,912 max_bytes=130,465 bufs=1,260
> max_bufs=1,294
>
> ...
>
> Terminated Jobs:
> JobId Level Files Bytes Status Finished Name
> ====================================================================
> 8696 Incr 366 6.058 G OK 03-Feb-27 18:17
> belix.2008-12-04_09
> 1228072365 ??? (2 1 5.275 E Other 29-Sep-21 15:51
> 008-12-04_09
> 104123763 8 (5 1,228,379,050 8.299 E Other 15-Jan-94 19:04
> 4_09
> 1228378991 1,802,725,700 3.615 E Other 28-May-00 01:50 56
> 1801675074 ??? (1 841,903,973 3.471 E Other 28-Sep-97 12:49
> 1632923476 D (1 808,268,337 3.328 E Other 01-Jan-70 01:00
> 758657072 i (8 774,975,534 232.8 G Other 01-Jan-70 01:00
> 959471412 1 (8 53 0 Other 01-Jan-70 01:00
> 775304494 9 (8 0 0 Other 01-Jan-70 01:00
>
