Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process

2009-08-23 Thread Holger Parplies
Hi,

Jim Leonard wrote on 2009-08-23 20:04:31 -0500 [Re: [BackupPC-users] 
BackupPC_dump stalls and leave zombie process]:
> Steve wrote:
> > -
> > Aug 23 11:23:38 vaccg1 kernel: [ 1890.496217] BackupPC_dump[3888]:
> > segfault at bf23293c ip b7e249ea sp bfe92850 error 4 in
> > libz.so.1.2.3.3[b7e19000+14000]
> > --
> 
> Seems pretty obvious that the problem isn't free memory but rather some 
> bug in the interaction between backuppc_dump and libz.

"pretty obvious" as in "completely explains"? So, you are experiencing the
same behaviour? Or, at least, know of someone else that has, ever?

http://www.catb.org/~esr/faqs/smart-questions.html#id306810

> When it dies, there is a core file left behind;

Is there? Well, there might be, and looking at the stack trace might even be a
good idea (though you almost definitely will *not* find out ...

> what part of backuppc_dump you were in to cause it

... because you'll see a C backtrace, probably of stripped code, not a Perl
backtrace, and Perl code cannot "cause" a SEGV anyway; only a bug in Perl or in
a module (part) implemented in C can). Do that if you like, but it is certainly
not mandatory.
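
Whether a core file is written at all depends, among other things, on the
core size limit of the crashing process. A minimal sketch, assuming Python 3
on a Unix system, that checks this limit for the *current* process only (the
BackupPC daemon may run with different limits, and the kernel's core pattern
setting also plays a role); the function name is made up for illustration:

```python
import resource


def core_dumps_enabled():
    """Return True if the soft RLIMIT_CORE of this process permits a core file."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    # RLIM_INFINITY means "no size limit"; a soft limit of 0 suppresses core files.
    return soft == resource.RLIM_INFINITY or soft > 0


if __name__ == "__main__":
    print("core dumps enabled for this process:", core_dumps_enabled())
```

If this reports False in the environment BackupPC is started from, there is
probably no core file to look for.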

I'd either try replacing libz.so.1.2.3.3 with an uncorrupted copy (check the
md5sum of that file if you can; it may not, in fact, be a corrupted libz, but
that's where I'd start looking), if the SEGV happens roughly at the same place
each time, or else think about the possibility of hardware issues (memory, CPU,
I/O, PSU, ...; are you observing any other unexplained problems?). I wouldn't
go so far as to say any of that was obvious, let alone the only possible
explanation.
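
To judge whether the SEGV happens "at the same place each time", the kernel
line can be reduced to an offset inside the library mapping. The following is
only a sketch, assuming Python 3 and the x86 page-fault error-code bits
(bit 0: page present, bit 1: write access, bit 2: user mode); the function
name and the regular expression are illustrative, not part of any tool:

```python
import re

# Kernel segfault report as quoted in this thread, re-joined onto one line.
LINE = ("Aug 23 11:23:38 vaccg1 kernel: [ 1890.496217] BackupPC_dump[3888]: "
        "segfault at bf23293c ip b7e249ea sp bfe92850 error 4 in "
        "libz.so.1.2.3.3[b7e19000+14000]")

PATTERN = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract the faulting offset and access type from a kernel segfault line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip, base, size, err = (int(m[k], 16) for k in ("ip", "base", "size", "err"))
    offset = ip - base  # instruction pointer relative to the start of the mapping
    return {
        "program": m["prog"],
        "library": m["lib"],
        "offset": offset,
        "in_library": 0 <= offset < size,
        "page_present": bool(err & 1),              # assumption: x86 error-code layout
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["library"], hex(info["offset"]), info["access"])
```

For the line above this gives offset 0xb9ea inside the libz mapping, a
user-mode read of an unmapped page. If later crashes show the same offset,
that points towards the library (or the data fed to it); offsets that move
around make hardware trouble more plausible.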

Hope that helps.

Regards,
Holger

P.S.: I wouldn't discount running out of memory as a possible cause for a
  SEGV. It was a good idea to check that.

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/


Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process

2009-08-23 Thread Jim Leonard
Steve wrote:
> -
> Aug 23 11:23:38 vaccg1 kernel: [ 1890.496217] BackupPC_dump[3888]:
> segfault at bf23293c ip b7e249ea sp bfe92850 error 4 in
> libz.so.1.2.3.3[b7e19000+14000]
> --

Seems pretty obvious that the problem isn't free memory but rather some 
bug in the interaction between backuppc_dump and libz.  When it dies, 
there is a core file left behind; I would research analyzing core files 
for your particular operating system so you can see what the stack trace 
shows (i.e., what part of backuppc_dump you were in to cause it).
-- 
Jim Leonard (trix...@oldskool.org)        http://www.oldskool.org/
Help our electronic games project:   http://www.mobygames.com/
Or check out some trippy MindCandy at http://www.mindcandydvd.com/
A child borne of the home computer wars: http://trixter.wordpress.com/



Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process

2009-08-23 Thread Steve
Hello all;

I am still trying to figure this issue out.  I only today noticed that
a segfault is recorded in BackupPC_dump when this happens!  The notice
shows up in /var/log/messages, not in any log file generated by
BackupPC, so I hadn't checked there before.  Below is the one from this
morning.
-
Aug 23 11:23:38 vaccg1 kernel: [ 1890.496217] BackupPC_dump[3888]:
segfault at bf23293c ip b7e249ea sp bfe92850 error 4 in
libz.so.1.2.3.3[b7e19000+14000]
--

I have been checking "free" regularly and also "top" to see memory
usage and it doesn't look anywhere close to critical; below is the
output during operation, but before stall/failure:
--
top:
 8494 backuppc  20   0 89876  83m 1320 D   12  4.1  40:29.57 BackupPC_dump
 8797 backuppc  20   0  119m 114m 1176 D    8  5.7  55:18.06 BackupPC_dump
free:
             total       used       free     shared    buffers     cached
Mem:       2061228    2030492      30736          0     251300    1243688
-/+ buffers/cache:     535504    1525724
Swap:      1646620      29020    1617600
-
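
The "-/+ buffers/cache" row is the one that matters for the question of
memory pressure. As a rough sketch (assuming Python 3, with the KiB values
copied from the "free" output above and hypothetical function names), the
arithmetic behind that row is:

```python
def used_by_applications(used_kib, buffers_kib, cached_kib):
    """Memory in use once reclaimable buffers and page cache are subtracted."""
    return used_kib - buffers_kib - cached_kib


def effectively_free(free_kib, buffers_kib, cached_kib):
    """Memory the kernel could hand out: free plus reclaimable buffers and cache."""
    return free_kib + buffers_kib + cached_kib


if __name__ == "__main__":
    total, used, free = 2061228, 2030492, 30736
    buffers, cached = 251300, 1243688
    apps = used_by_applications(used, buffers, cached)
    avail = effectively_free(free, buffers, cached)
    print(f"applications: {apps} KiB, effectively free: {avail} KiB")
    print(f"fraction used by applications: {apps / total:.2f}")
```

So although only about 30 MB is listed as free, roughly a quarter of RAM is
actually held by applications; the rest is buffers and cache.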

Again, I would really appreciate some help with this if anyone has any ideas.

Thanks,
Steve



Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process

2009-07-30 Thread Steve
> I don't know about stalling, but leaving zombie processes is something
> "normal". As far as I can tell, it happens when you dump multiple
> directories or drives from the same machine. BackupPC_dump will fork
> one process to dump the first directory, then one process to dump the
> second one, and so on, but it will "reap" the children processes all
> at once at the end. I don't know if this happens with "rsync" only
> (that's how I use it and see it here) or with the other xfer methods
> ("tar", "smb") as well...

I guess I don't care about the zombie, but since the backup never
"completes" something is obviously amiss.  I suspect that once
"complete" maybe TrashClean or something clears away the leftover
processes; there are never zombies after a successful inc or full
backup, only when one hangs...

Maybe the zombie isn't the problem, but rather some parent process never
realizing that the subtasks are complete.

thanks
steve



Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process

2009-07-30 Thread Filipe Brandenburger
Hi,

On Thu, Jul 30, 2009 at 13:39, Steve wrote:
> Wondering if anyone has had time to think about this question.  I am
> still having the issue...

I don't know about stalling, but leaving zombie processes is something
"normal". As far as I can tell, it happens when you dump multiple
directories or drives from the same machine. BackupPC_dump will fork
one process to dump the first directory, then one process to dump the
second one, and so on, but it will "reap" the children processes all
at once at the end. I don't know if this happens with "rsync" only
(that's how I use it and see it here) or with the other xfer methods
("tar", "smb") as well...
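
The general Unix behaviour behind this can be shown in a few lines. This is
not BackupPC's code (which is Perl), just a minimal sketch assuming Python 3
on Linux, with made-up function names: children that have exited stay around
as zombies (state "Z") until the parent waits for them.

```python
import os


def proc_state(pid):
    """Return the one-letter process state from /proc (Linux only), else None."""
    try:
        with open(f"/proc/{pid}/stat") as fh:
            data = fh.read()
    except OSError:
        return None
    # Format is "pid (comm) state ..."; comm itself may contain parentheses.
    return data.rsplit(")", 1)[1].split()[0]


def fork_then_reap_at_end(n_children=2):
    """Fork children one after another, but only reap them all at the end."""
    pids = []
    for i in range(n_children):
        pid = os.fork()
        if pid == 0:
            os._exit(i)  # child: stands in for "dump share i", then exits
        pids.append(pid)

    states = []
    for pid in pids:
        # Wait until the child has exited, but leave it unreaped (WNOWAIT).
        os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
        states.append(proc_state(pid))

    exit_codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # this is the actual "reap"
        exit_codes.append(os.WEXITSTATUS(status))
    return states, exit_codes


if __name__ == "__main__":
    print(fork_then_reap_at_end(2))
```

Between a child's exit and the final waitpid() the child shows up as a zombie
in "top" or "ps", which is harmless as long as the parent eventually gets
around to reaping it.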

HTH,
Filipe



Re: [BackupPC-users] BackupPC_dump stalls and leave zombie process

2009-07-30 Thread Steve
Wondering if anyone has had time to think about this question.  I am
still having the issue...
thanks,
Steve

On Thu, Jul 23, 2009 at 5:45 PM, Steve wrote:
> Hello,
>
> I am running Backuppc 3.1.0 with ubuntu 9.04 now, and since the
> upgrade I've developed a problem with backuppc. During full backups,
> the backup stalls and leaves a zombie.  The backup just sits (as shown
> below) until I kill the ssh process or restart backuppc.  I have not
> seen this problem with smb clients, and have not seen it with inc
> backups (although the incs take a lot less time so maybe it will
> happen sometime...), just the full with rsync.
> ---
> 32167 backuppc  25   5     0    0    0 Z    0  0.0   5:21.56
> BackupPC_dump 
> 32321 backuppc  25   5  7168 3600 1164 S    0  0.2   0:01.16 ssh
> 32587 backuppc  25   5 79768  73m 1220 S    0  3.7   2:31.20 BackupPC_dump
> 
>
> The status page says:
> -
> localhost        full    backuppc        7/23 12:00      BackupPC_dump 
> localhost
>         32167           32321, 32587
> 
>
> The computer doesn't crash or appear to have any problems, and if i
> watch the whole time memory usage never goes above 10-20% total.  When
> I kill the ssh process, the status page goes back to showing nothing,
> and a partial backup is left as shown below on the home page for the
> backup:
>
> --
> 848      partial         yes     0       7/23 00:12      112.8           0.5
> /home/backup/pc/localhost/848
> ---
>
> If I keep trying, eventually it makes it through the whole backup, as
> shown here:
>
> ---
> 848      full    yes     0       7/23 15:04      152.4           0.1
> /home/backup/pc/localhost/848
> -
>
> Since the backup eventually completes, and sometimes has to only be
> restarted once or twice while other times 6 or 7 times, I can't figure
> out where to start debugging.  It is reproducible in that I haven't
> had a successful full backup without at least one restart since the
> upgrade.
>
> If some of you have some theories, I will be happy to capture
> additional info about what is going on when this happens...I am not
> sure offhand what info/logs/files would be important.  It does not
> seem like a hardware problem to me.  The log itself ends like this:
> 
> same     764   506/500       49664
> home/common/LabView/Gas_Swirl_Process_Control/flame_frf/dyn_data/WriteData2/WriteData2.opt
> Parent read EOF from child: fatal error!
> Done: 0 files, 0 bytes
> Got fatal error during xfer (Child exited prematurely)
> Backup aborted (Child exited prematurely)
> Not saving this as a partial backup since it has fewer files than the
> prior one (got 2 and 0 files versus 5)
> ---
> However i suspect that error is from me killing the ssh not whatever
> caused things to stall.  i can't find any errors in the log when it is
> stalled but before i kill processes as backuppc seems to think things
> are running fine even though they have stalled out...
>
> I am wondering if there is some kind of timeout or permission or other
> default that has changed with ssh or rsync on the upgrade that is
> causing this...however since it works for increments and the transport
> is the same for those I don't see how it could be the case...
>
> Also, is there a difference when the process is started from the cgi
> interface vs. starting itself on schedule?  When it succeeds it has
> always been after one or more restarts from the cgi...
>
> anyway thanks, looking forward to hearing ideas from you guys on where to 
> look.
>
> Steve
>



-- 
"It turns out there is considerable overlap between the smartest bears
and the dumbest tourists."



[BackupPC-users] BackupPC_dump stalls and leave zombie process

2009-07-23 Thread Steve
Hello,

I am running BackupPC 3.1.0 on Ubuntu 9.04 now, and since the
upgrade I've developed a problem with BackupPC. During full backups,
the backup stalls and leaves a zombie.  The backup just sits (as shown
below) until I kill the ssh process or restart backuppc.  I have not
seen this problem with smb clients, and have not seen it with inc
backups (although the incs take a lot less time so maybe it will
happen sometime...), just the full with rsync.
---
32167 backuppc  25   5     0    0    0 Z    0  0.0   5:21.56 BackupPC_dump
32321 backuppc  25   5  7168 3600 1164 S    0  0.2   0:01.16 ssh
32587 backuppc  25   5 79768  73m 1220 S    0  3.7   2:31.20 BackupPC_dump


The status page says:
-
localhost        full    backuppc        7/23 12:00      BackupPC_dump localhost
        32167           32321, 32587


The computer doesn't crash or appear to have any problems, and if I
watch the whole time, memory usage never goes above 10-20% total.  When
I kill the ssh process, the status page goes back to showing nothing,
and a partial backup is left as shown below on the home page for the
backup:

--
848      partial         yes     0       7/23 00:12      112.8           0.5
/home/backup/pc/localhost/848
---

If I keep trying, eventually it makes it through the whole backup, as
shown here:

---
848      full    yes     0       7/23 15:04      152.4           0.1
/home/backup/pc/localhost/848
-

Since the backup eventually completes, and sometimes has to only be
restarted once or twice while other times 6 or 7 times, I can't figure
out where to start debugging.  It is reproducible in that I haven't
had a successful full backup without at least one restart since the
upgrade.

If some of you have some theories, I will be happy to capture
additional info about what is going on when this happens...I am not
sure offhand what info/logs/files would be important.  It does not
seem like a hardware problem to me.  The log itself ends like this:

same     764   506/500       49664
home/common/LabView/Gas_Swirl_Process_Control/flame_frf/dyn_data/WriteData2/WriteData2.opt
Parent read EOF from child: fatal error!
Done: 0 files, 0 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)
Not saving this as a partial backup since it has fewer files than the
prior one (got 2 and 0 files versus 5)
---
However, I suspect that error is from me killing the ssh, not from whatever
caused things to stall.  I can't find any errors in the log when it is
stalled but before I kill processes, as BackupPC seems to think things
are running fine even though they have stalled out...

I am wondering if there is some kind of timeout or permission or other
default that has changed with ssh or rsync on the upgrade that is
causing this...however since it works for increments and the transport
is the same for those I don't see how it could be the case...

Also, is there a difference when the process is started from the cgi
interface vs. starting itself on schedule?  When it succeeds it has
always been after one or more restarts from the cgi...

anyway thanks, looking forward to hearing ideas from you guys on where to look.

Steve
