Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process
Hi,

Jim Leonard wrote on 2009-08-23 20:04:31 -0500 [Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process]:
> Steve wrote:
> > -
> > Aug 23 11:23:38 vaccg1 kernel: [ 1890.496217] BackupPC_dump[3888]:
> > segfault at bf23293c ip b7e249ea sp bfe92850 error 4 in
> > libz.so.1.2.3.3[b7e19000+14000]
> > --
>
> Seems pretty obvious that the problem isn't free memory but rather some
> bug in the interaction between backuppc_dump and libz.

"pretty obvious" as in "completely explains"? So, you are experiencing the same behaviour? Or, at least, know of someone else that has, ever?

http://www.catb.org/~esr/faqs/smart-questions.html#id306810

> When it dies, there is a core file left behind;

Is there? Well, there might be, and looking at the stack trace might even be a good idea (though you almost definitely will *not* find out ...

> what part of backuppc_dump you were in to cause it

... because you'll see a C backtrace - probably of stripped code -, not a Perl backtrace, and Perl code cannot "cause" a SEGV anyway, only a bug in Perl or a module (part) implemented in C can). Do that if you like, but it is certainly not mandatory.

I'd either try replacing libz.so.1.2.3.3 with an uncorrupted copy (check the md5sum of that file if you can; it may not, in fact, be a corrupted libz, but that's where I'd start looking), if the SEGV happens roughly at the same place each time, or else think about the possibility of hardware issues (memory, CPU, I/O, PSU, ...; are you observing any other unexplained problems?).

I wouldn't go so far as to say any of that was obvious, let alone the only possible explanation.

Hope that helps.

Regards,
Holger

P.S.: I wouldn't discount running out of memory as a possible cause for a SEGV. It was a good idea to check that.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
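
The md5sum check suggested in the message above could be scripted roughly as follows. This is only a minimal sketch: `file_md5` and `matches_reference` are hypothetical helper names, and both the library path and the reference checksum are assumptions that have to come from your own system and from a known-good copy of the file.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path, reference_md5):
    """Compare a file's MD5 digest with a reference value from a known-good copy."""
    return file_md5(path) == reference_md5.strip().lower()


# Hypothetical usage; the path and the reference value are placeholders:
# matches_reference("/path/to/libz.so.1.2.3.3", "<md5 of a known-good copy>")
```

A mismatch would point towards a damaged file; a match does not rule out the hardware causes mentioned above.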
Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process
Steve wrote:
> -
> Aug 23 11:23:38 vaccg1 kernel: [ 1890.496217] BackupPC_dump[3888]:
> segfault at bf23293c ip b7e249ea sp bfe92850 error 4 in
> libz.so.1.2.3.3[b7e19000+14000]
> --

Seems pretty obvious that the problem isn't free memory but rather some bug in the interaction between backuppc_dump and libz.

When it dies, there is a core file left behind; I would research analyzing core files for your particular operating system so you can see what the stack trace shows (i.e. what part of backuppc_dump you were in to cause it).

--
Jim Leonard (trix...@oldskool.org) http://www.oldskool.org/
Help our electronic games project: http://www.mobygames.com/
Or check out some trippy MindCandy at http://www.mindcandydvd.com/
A child borne of the home computer wars: http://trixter.wordpress.com/
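
Even without a core file, the kernel line quoted above can be taken apart a little. The sketch below is an illustration, not a diagnosis: `decode_segfault` is a hypothetical helper, and reading the error value as bit flags assumes the x86 page-fault error code convention (bit 0: page was present, bit 1: write access, bit 2: user mode). The offset it computes is a position inside libz, which says nothing about which Perl code was running.

```python
import re

# Matches the kernel's "segfault at ... ip ... sp ... error ... in obj[base+size]" format.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) in (?P<object>[^\s\[]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract the faulting object, the offset into it, and the fault flags."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    error = int(match["error"], 16)
    return {
        "object": match["object"],
        "offset": ip - base,               # instruction offset inside the mapped object
        "page_present": bool(error & 1),   # assumption: x86 error code bit 0
        "write": bool(error & 2),          # assumption: x86 error code bit 1
        "user_mode": bool(error & 4),      # assumption: x86 error code bit 2
    }


line = ("BackupPC_dump[3888]: segfault at bf23293c ip b7e249ea sp bfe92850 "
        "error 4 in libz.so.1.2.3.3[b7e19000+14000]")
info = decode_segfault(line)
print(info["object"], hex(info["offset"]))
```

Under those assumptions, "error 4" reads as a user-mode read of an address with no page mapped, at offset 0xb9ea into the libz mapping.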
Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process
Hello all;

I am still trying to figure this issue out. I only today noticed that a segfault is recorded in BackupPC_dump when this happens! The notice shows up in /var/log/messages, not in any log file generated by BackupPC, so I hadn't checked that before. Below is the one from this morning.

-
Aug 23 11:23:38 vaccg1 kernel: [ 1890.496217] BackupPC_dump[3888]: segfault at bf23293c ip b7e249ea sp bfe92850 error 4 in libz.so.1.2.3.3[b7e19000+14000]
--

I have been checking "free" regularly and also "top" to see memory usage, and it doesn't look anywhere close to critical; below is the output during operation, but before stall/failure:

--
top:
 8494 backuppc  20   0 89876  83m 1320 D   12  4.1  40:29.57 BackupPC_dump
 8797 backuppc  20   0  119m 114m 1176 D    8  5.7  55:18.06 BackupPC_dump

free:
             total       used       free     shared    buffers     cached
Mem:       2061228    2030492      30736          0     251300    1243688
-/+ buffers/cache:     535504    1525724
Swap:      1646620      29020    1617600
-

Again, I would really appreciate some help with this if anyone has any ideas.

Thanks,
Steve
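
The "-/+ buffers/cache" row in the `free` output above is plain arithmetic on the "Mem:" row, and it is the row that matters when judging memory pressure. A minimal sketch using the numbers from this message (values in kB; `effective_memory` is a hypothetical helper name):

```python
def effective_memory(total, used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row of `free` from the 'Mem:' row.

    Returns (used by applications, available to applications, percent used).
    """
    used_by_apps = used - buffers - cached   # buffers and page cache are reclaimable
    available = free + buffers + cached
    percent_used = 100.0 * used_by_apps / total
    return used_by_apps, available, percent_used


# Numbers taken from the "Mem:" row quoted above.
used_by_apps, available, percent_used = effective_memory(
    total=2061228, used=2030492, free=30736, buffers=251300, cached=1243688
)
print(used_by_apps, available, round(percent_used, 1))
```

This reproduces the 535504 and 1525724 figures in the posted output, i.e. roughly a quarter of RAM in use by applications at the time the snapshot was taken. It says nothing about usage at the moment of the segfault.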
Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process
> I don't know about stalling, but leaving zombie processes is something
> "normal". As far as I can tell, it happens when you dump multiple
> directories or drives from the same machine. BackupPC_dump will fork
> one process to dump the first directory, then one process to dump the
> second one, and so on, but it will "reap" the children processes all
> at once at the end. I don't know if this happens with "rsync" only
> (that's how I use it and see it here) or with the other xfer methods
> ("tar", "smb") as well...

I guess I don't care about the zombie, but since the backup never "completes", something is obviously amiss. I suspect that once "complete", maybe TrashClean or something clears away the leftover processes; there are never zombies after a successful inc or full backup, only when one hangs... Maybe the zombie isn't the problem, but rather some parent process never realizing the subtasks are complete.

thanks
steve
Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process
Hi,

On Thu, Jul 30, 2009 at 13:39, Steve wrote:
> Wondering if anything has had time to think about this question. I am
> still having the issue...

I don't know about stalling, but leaving zombie processes is something "normal". As far as I can tell, it happens when you dump multiple directories or drives from the same machine. BackupPC_dump will fork one process to dump the first directory, then one process to dump the second one, and so on, but it will "reap" the children processes all at once at the end. I don't know if this happens with "rsync" only (that's how I use it and see it here) or with the other xfer methods ("tar", "smb") as well...

HTH,
Filipe
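
The fork-then-reap-later behaviour described above is a general Unix mechanism, and it can be reproduced in a few lines. The sketch below is not BackupPC's code (BackupPC is written in Perl); it only shows that an exited child stays in state "Z" until its parent waits for it. It assumes Linux, because it reads `/proc/<pid>/stat`, and `process_state` and `zombie_demo` are hypothetical names.

```python
import os


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as handle:
            stat = handle.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name sits in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_demo():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running parent cleanup
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = process_state(pid)  # the child is a zombie at this point
    os.waitpid(pid, 0)           # reap: the process table entry goes away
    after = process_state(pid)
    return before, after


if __name__ == "__main__":
    print(zombie_demo())
```

A zombie by itself therefore only means "exited, not yet waited for"; whether the parent is stuck is a separate question.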
Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process
Wondering if anyone has had time to think about this question. I am still having the issue...

thanks,
Steve

On Thu, Jul 23, 2009 at 5:45 PM, Steve wrote:
> Hello,
>
> I am running BackupPC 3.1.0 with Ubuntu 9.04 now, and since the
> upgrade I've developed a problem with BackupPC. During full backups,
> the backup stalls and leaves a zombie. The backup just sits (as shown
> below) until I kill the ssh process or restart BackupPC. I have not
> seen this problem with smb clients, and have not seen it with inc
> backups (although the incs take a lot less time so maybe it will
> happen sometime...), just the full with rsync.
> ---
> 32167 backuppc  25   5     0    0    0 Z    0  0.0   5:21.56 BackupPC_dump
> 32321 backuppc  25   5  7168 3600 1164 S    0  0.2   0:01.16 ssh
> 32587 backuppc  25   5 79768  73m 1220 S    0  3.7   2:31.20 BackupPC_dump
>
> The status page says:
> -
> localhost   full   backuppc   7/23 12:00   BackupPC_dump localhost   32167   32321, 32587
>
> The computer doesn't crash or appear to have any problems, and if I
> watch the whole time, memory usage never goes above 10-20% total. When
> I kill the ssh process, the status page goes back to showing nothing,
> and a partial backup is left as shown below on the home page for the
> backup:
>
> --
> 848   partial   yes   0   7/23 00:12   112.8   0.5   /home/backup/pc/localhost/848
> ---
>
> If I keep trying, eventually it makes it through the whole backup, as
> shown here:
>
> ---
> 848   full   yes   0   7/23 15:04   152.4   0.1   /home/backup/pc/localhost/848
> -
>
> Since the backup eventually completes, and sometimes only has to be
> restarted once or twice while other times 6 or 7 times, I can't figure
> out where to start debugging. It is reproducible in that I haven't
> had a successful full backup without at least one restart since the
> upgrade.
>
> If some of you have some theories, I will be happy to capture
> additional info about what is going on when this happens...I am not
> sure offhand what info/logs/files would be important. It does not
> seem like a hardware problem to me. The log itself ends like this:
>
> same 764 506/500 49664 home/common/LabView/Gas_Swirl_Process_Control/flame_frf/dyn_data/WriteData2/WriteData2.opt
> Parent read EOF from child: fatal error!
> Done: 0 files, 0 bytes
> Got fatal error during xfer (Child exited prematurely)
> Backup aborted (Child exited prematurely)
> Not saving this as a partial backup since it has fewer files than the
> prior one (got 2 and 0 files versus 5)
> ---
> However, I suspect that error is from me killing the ssh, not whatever
> caused things to stall. I can't find any errors in the log when it is
> stalled but before I kill processes, as BackupPC seems to think things
> are running fine even though they have stalled out...
>
> I am wondering if there is some kind of timeout or permission or other
> default that has changed with ssh or rsync on the upgrade that is
> causing this...however, since it works for increments and the transport
> is the same for those, I don't see how it could be the case...
>
> Also, is there a difference when the process is started from the cgi
> interface vs. starting itself on schedule? When it succeeds it has
> always been after one or more restarts from the cgi...
>
> anyway thanks, looking forward to hearing ideas from you guys on where to
> look.
>
> Steve
>

--
"It turns out there is considerable overlap between the smartest bears and the dumbest tourists."
[BackupPC-users] BackupPC_dump stalls and leave zombie process
Hello,

I am running BackupPC 3.1.0 with Ubuntu 9.04 now, and since the upgrade I've developed a problem with BackupPC. During full backups, the backup stalls and leaves a zombie. The backup just sits (as shown below) until I kill the ssh process or restart BackupPC. I have not seen this problem with smb clients, and have not seen it with inc backups (although the incs take a lot less time so maybe it will happen sometime...), just the full with rsync.

---
32167 backuppc  25   5     0    0    0 Z    0  0.0   5:21.56 BackupPC_dump
32321 backuppc  25   5  7168 3600 1164 S    0  0.2   0:01.16 ssh
32587 backuppc  25   5 79768  73m 1220 S    0  3.7   2:31.20 BackupPC_dump

The status page says:
-
localhost   full   backuppc   7/23 12:00   BackupPC_dump localhost   32167   32321, 32587

The computer doesn't crash or appear to have any problems, and if I watch the whole time, memory usage never goes above 10-20% total. When I kill the ssh process, the status page goes back to showing nothing, and a partial backup is left as shown below on the home page for the backup:

--
848   partial   yes   0   7/23 00:12   112.8   0.5   /home/backup/pc/localhost/848
---

If I keep trying, eventually it makes it through the whole backup, as shown here:

---
848   full   yes   0   7/23 15:04   152.4   0.1   /home/backup/pc/localhost/848
-

Since the backup eventually completes, and sometimes only has to be restarted once or twice while other times 6 or 7 times, I can't figure out where to start debugging. It is reproducible in that I haven't had a successful full backup without at least one restart since the upgrade.

If some of you have some theories, I will be happy to capture additional info about what is going on when this happens...I am not sure offhand what info/logs/files would be important. It does not seem like a hardware problem to me. The log itself ends like this:

same 764 506/500 49664 home/common/LabView/Gas_Swirl_Process_Control/flame_frf/dyn_data/WriteData2/WriteData2.opt
Parent read EOF from child: fatal error!
Done: 0 files, 0 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)
Not saving this as a partial backup since it has fewer files than the prior one (got 2 and 0 files versus 5)
---

However, I suspect that error is from me killing the ssh, not whatever caused things to stall. I can't find any errors in the log when it is stalled but before I kill processes, as BackupPC seems to think things are running fine even though they have stalled out...

I am wondering if there is some kind of timeout or permission or other default that has changed with ssh or rsync on the upgrade that is causing this...however, since it works for increments and the transport is the same for those, I don't see how it could be the case...

Also, is there a difference when the process is started from the cgi interface vs. starting itself on schedule? When it succeeds it has always been after one or more restarts from the cgi...

anyway thanks, looking forward to hearing ideas from you guys on where to look.

Steve
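
For capturing the state of the processes while a backup is stalled, the `top` rows posted above can be read mechanically. This is a rough sketch that assumes the default `top` column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); `parse_top_rows` and `zombie_pids` are hypothetical helper names.

```python
TOP_COLUMNS = ["pid", "user", "pr", "ni", "virt", "res", "shr",
               "state", "cpu", "mem", "time", "command"]


def parse_top_rows(text):
    """Turn whitespace-separated `top` process rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        # Split into at most 12 fields so a command with spaces stays in one piece.
        fields = line.split(None, len(TOP_COLUMNS) - 1)
        if len(fields) == len(TOP_COLUMNS) and fields[0].isdigit():
            rows.append(dict(zip(TOP_COLUMNS, fields)))
    return rows


def zombie_pids(rows):
    """Return the PIDs of all rows whose state column is 'Z'."""
    return [int(row["pid"]) for row in rows if row["state"] == "Z"]


# The three rows from the message above.
snapshot = """\
32167 backuppc 25 5 0 0 0 Z 0 0.0 5:21.56 BackupPC_dump
32321 backuppc 25 5 7168 3600 1164 S 0 0.2 0:01.16 ssh
32587 backuppc 25 5 79768 73m 1220 S 0 3.7 2:31.20 BackupPC_dump
"""
print(zombie_pids(parse_top_rows(snapshot)))
```

For the snapshot in this message, that singles out PID 32167, the first PID listed on the status page, while the ssh process and the second BackupPC_dump are sleeping.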