Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-21 Thread Matija Nalis
On Sun, Apr 18, 2010 at 11:46:33AM -0500, Jon Schewe wrote:
  http://wiki.bacula.org/doku.php?id=faq#my_backup_starts_but_dies_after_a_while_with_connection_reset_by_peer_error
 
  [1] It actually tries that at one point in src/lib/bsock.c if
  TCP_KEEPIDLE support is detected, but it fails to detect it
  properly because netinet/tcp.h is not included.
 
  However, even after fixing that (and the missing semicolon in the
  'int opt = heart_beat' line), it still doesn't look like it sets
  TCP_KEEPIDLE correctly on the FD-SD connection, so maybe this
  code path is not used there.
 
  Anyway, I gave up debugging there and just set the system
  defaults. But I just thought I'd mention that in case someone
  else wants to continue chasing the bug.
 

 Hmm, this sounds like a bug that should be fixed and once it is fixed
 may remove a bunch of problems with firewalls.

FYI, I've posted a patch to the bacula-devel mailing list which fixes
the current support. That support could be extended (as not all parts
of bacula use that function), but it might be enough.

If someone is willing to try it, let me (or better, the whole list)
know how it fares and whether it fixes the timeouts without the user
needing to resort to changing system defaults.

--
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-18 Thread Jon Schewe
On 04/16/2010 08:30 AM, Matija Nalis wrote:
 On Mon, Apr 12, 2010 at 03:59:49PM -0500, Jon Schewe wrote:
   
 On 4/12/10 9:40 AM, Matija Nalis wrote:
 
 It is especially a problem with bigger databases and with MySQL instead
 of PostgreSQL, see http://bugs.bacula.org/view.php?id=1472, where it can
 even take several hours! (Note that while that report talks about
 restore speed, it is also related to accurate backups, which employ
 similar SQL queries.)

   
 Must be what it is then. I've been thinking about switching to postgres,
 but haven't because the opensuse packages for bacula are only for mysql.
 This may motivate me more.
 
 You should probably switch soon, before you get to like your
 database... Exporting bacula MySQL tables for import into PostgreSQL
 can be very painful and problematic; it is much better to just drop
 the database and create a fresh one.

   
I'll keep that in mind as I go forward.

 The backup finished, so it seems that in version 3.0.3 bacula does NOT
 set the socket option SO_KEEPALIVE.
 
 Hmm, yeah, I've checked the code casually, and it indeed looks like the
 heartbeats are not setting SO_KEEPALIVE timeouts (note that it does
 set SO_KEEPALIVE on the socket, otherwise the advice above wouldn't
 work -- it just doesn't set TCP_KEEPIDLE on that[1] to specify
 user-defined timeouts and instead uses system defaults).

 The heartbeats look like they are doing other things, though
 (application-level, not socket-level), but as you saw they are not
 perfect for fixing network idleness problems - and so you also MUST
 set system defaults.

 I've updated the FAQ at:
 http://wiki.bacula.org/doku.php?id=faq#my_backup_starts_but_dies_after_a_while_with_connection_reset_by_peer_error


 [1] It actually tries that at one point in src/lib/bsock.c if
 TCP_KEEPIDLE support is detected, but it fails to detect it
 properly because netinet/tcp.h is not included.

 However, even after fixing that (and the missing semicolon in the
 'int opt = heart_beat' line), it still doesn't look like it sets
 TCP_KEEPIDLE correctly on the FD-SD connection, so maybe this
 code path is not used there.

 Anyway, I gave up debugging there and just set the system
 defaults. But I just thought I'd mention that in case someone
 else wants to continue chasing the bug.

   
Hmm, this sounds like a bug that should be fixed and once it is fixed
may remove a bunch of problems with firewalls.


--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev


Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-16 Thread Matija Nalis
On Mon, Apr 12, 2010 at 03:59:49PM -0500, Jon Schewe wrote:
 On 4/12/10 9:40 AM, Matija Nalis wrote:
  It is especially a problem with bigger databases and with MySQL instead
  of PostgreSQL, see http://bugs.bacula.org/view.php?id=1472, where it can
  even take several hours! (Note that while that report talks about
  restore speed, it is also related to accurate backups, which employ
  similar SQL queries.)
 
 Must be what it is then. I've been thinking about switching to postgres,
 but haven't because the opensuse packages for bacula are only for mysql.
 This may motivate me more.

You should probably switch soon, before you get to like your
database... Exporting bacula MySQL tables for import into PostgreSQL
can be very painful and problematic; it is much better to just drop
the database and create a fresh one.

 The backup finished, so it seems that in version 3.0.3 bacula does NOT
 set the socket option SO_KEEPALIVE.

Hmm, yeah, I've checked the code casually, and it indeed looks like the
heartbeats are not setting SO_KEEPALIVE timeouts (note that it does
set SO_KEEPALIVE on the socket, otherwise the advice above wouldn't
work -- it just doesn't set TCP_KEEPIDLE on that[1] to specify
user-defined timeouts and instead uses system defaults).

The heartbeats look like they are doing other things, though
(application-level, not socket-level), but as you saw they are not
perfect for fixing network idleness problems - and so you also MUST
set system defaults.

I've updated the FAQ at:
http://wiki.bacula.org/doku.php?id=faq#my_backup_starts_but_dies_after_a_while_with_connection_reset_by_peer_error


[1] It actually tries that at one point in src/lib/bsock.c if
TCP_KEEPIDLE support is detected, but it fails to detect it
properly because netinet/tcp.h is not included.

However, even after fixing that (and the missing semicolon in the
'int opt = heart_beat' line), it still doesn't look like it sets
TCP_KEEPIDLE correctly on the FD-SD connection, so maybe this
code path is not used there.

Anyway, I gave up debugging there and just set the system
defaults. But I just thought I'd mention that in case someone
else wants to continue chasing the bug.
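
For readers unfamiliar with the socket options discussed above, here is
a minimal sketch of per-socket keepalive tuning. It is written in Python
rather than Bacula's C++, it is NOT the code from src/lib/bsock.c, and
the function name and the 60/10/5 values are assumptions chosen purely
for illustration.

```python
import socket


def enable_keepalive(sock, idle_secs=60, interval_secs=10, probes=5):
    """Enable TCP keepalive on one socket and override the timers where possible.

    Hypothetical helper for illustration only; the default values are arbitrary.
    Returns a dict of the per-socket options that were actually applied.
    """
    # SO_KEEPALIVE alone only switches keepalive on; the timers then come
    # from the system defaults (e.g. tcp_keepalive_time on Linux).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle_secs),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_secs),  # seconds between probes
        ("TCP_KEEPCNT", probes),           # unanswered probes before giving up
    )
    for name, value in wanted:
        option = getattr(socket, name, None)
        if option is None:
            # Not exposed on this platform: the system default stays in effect.
            continue
        sock.setsockopt(socket.IPPROTO_TCP, option, value)
        applied[name] = value
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_keepalive(sock, idle_secs=60)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

If the per-socket options are skipped (or silently not compiled in, as
suspected above), only SO_KEEPALIVE is set and the kernel falls back to
its defaults.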

-- 
Matija Nalis
Odjel racunalno-informacijskih sustava i servisa
  
Hrvatska akademska i istrazivacka mreza - CARNet 
Josipa Marohnica 5, 1 Zagreb
tel. +385 1 6661 616, fax. +385 1 6661 766
www.CARNet.hr



Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Graham Keeling
On Sun, Apr 11, 2010 at 09:32:43AM -0500, Jon Schewe wrote:
 I got it to work again last night. Changing the firewall time outs
 didn't help. What fixed it was turning off Accurate backups.

Ah, so possibly bacula spent long enough stuck doing an accurate query in the
catalog that the firewall connection timed out.
Are you using mysql and bacula-5.0.1?




Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Matija Nalis
On Fri, Apr 09, 2010 at 07:30:19PM -0500, Jon Schewe wrote:
 I have heartbeat intervals set at the following:
 bacula-dir.conf:
 client {
   Heartbeat interval = 15 Seconds
 }
 storage {
   Heartbeat interval = 1 minutes
 }
 
 bacula-sd.conf
 storage {
   Heartbeat interval = 1 minute
 }
 
 bacula-fd.conf
 FileDaemon {
   Heartbeat Interval = 5 seconds
 }

Strange. Are you running a GNU/Linux system on all the machines
(FD, SD, DIR)? IIRC, it might not be supported on other systems,
and/or it may need additional tuning on them.


I've updated the docs at http://tinyurl.com/y8wapdu





Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Jon Schewe
On 04/12/2010 04:17 AM, Matija Nalis wrote:
 On Fri, Apr 09, 2010 at 07:30:19PM -0500, Jon Schewe wrote:
   
 I have heartbeat intervals set at the following:
 bacula-dir.conf:
 client {
   Heartbeat interval = 15 Seconds
 }
 storage {
   Heartbeat interval = 1 minutes
 }

 bacula-sd.conf
 storage {
   Heartbeat interval = 1 minute
 }

 bacula-fd.conf
 FileDaemon {
   Heartbeat Interval = 5 seconds
 }
 
 Strange. Are you running a GNU/Linux system on all the machines
 (FD, SD, DIR)? IIRC, it might not be supported on other systems,
 and/or it may need additional tuning on them.

   
I'm running opensuse Linux for the director and storage daemon and
Debian Linux for the file daemon.




Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Matija Nalis
On Mon, Apr 12, 2010 at 05:41:51AM -0500, Jon Schewe wrote:
  Strange. Are you running a GNU/Linux system on all the machines
  (FD, SD, DIR)? IIRC, it might not be supported on other systems,
  and/or it may need additional tuning on them.
 

 I'm running opensuse Linux for the director and storage daemon and
 Debian Linux for the file daemon.

That is strange...
Can you check what your default SO_KEEPALIVE values are with:

grep '' /proc/sys/net/ipv4/tcp_keepalive_*

and what bacula is using for running connections? Start a backup first,
then check whether keepalive is enabled (and with what timers) with:

netstat -to
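
The first check can also be scripted. This is a small Python sketch,
assuming the Linux /proc layout used by the grep command above; the
function name is hypothetical.

```python
from pathlib import Path


def read_keepalive_defaults(base="/proc/sys/net/ipv4"):
    """Return the system keepalive defaults found under `base`.

    Keys are the file name suffixes ('time', 'intvl', 'probes'); values are
    integers. Files that do not exist are simply absent from the result.
    """
    prefix = "tcp_keepalive_"
    values = {}
    for path in sorted(Path(base).glob(prefix + "*")):
        values[path.name[len(prefix):]] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    for key, value in read_keepalive_defaults().items():
        print(f"tcp_keepalive_{key}: {value}")
```

On a system without these files the function returns an empty dict
instead of failing.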



Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Jon Schewe
On 4/12/10 7:21 AM, Matija Nalis wrote:
 On Mon, Apr 12, 2010 at 05:41:51AM -0500, Jon Schewe wrote:
   
 Strange. Are you running a GNU/Linux system on all the machines
 (FD, SD, DIR)? IIRC, it might not be supported on other systems,
 and/or it may need additional tuning on them.

   
   
 I'm running opensuse Linux for the director and storage daemon and
 Debian Linux for the file daemon.
 
 That is strange...
 Can you check what your default SO_KEEPALIVE values are with:

 grep '' /proc/sys/net/ipv4/tcp_keepalive_*

   
Server:
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200

Client:
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200

bacula 3.0.3 on both systems

 and what bacula is using for running connections? Start a backup first,
 then check whether keepalive is enabled (and with what timers) with:

 netstat -to
   
Client:
tcp        0      0 client:9102         server:54043        ESTABLISHED keepalive (7196.36/0/0)
tcp        0      0 client:43628        server:9103         ESTABLISHED keepalive (7197.26/0/0)

Server (behind NAT):
tcp        0      0 192.168.42.2:9103   client:43628        ESTABLISHED keepalive (7199.10/0/0)
tcp        0      0 127.0.0.2:9103      127.0.0.2:33218     ESTABLISHED keepalive (7197.84/0/0)
tcp        0      0 127.0.0.2:36664     127.0.0.2:9101      TIME_WAIT   timewait (56.31/0/0)
tcp        0      0 192.168.42.2:54043  client:9102         ESTABLISHED keepalive (7198.18/0/0)

-- 
Jon Schewe | http://mtu.net/~jpschewe
If you see an attachment named signature.asc, this is my digital
signature. See http://www.gnupg.org for more information.




Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Matija Nalis
On Mon, Apr 12, 2010 at 07:59:53AM -0500, Jon Schewe wrote:
 /proc/sys/net/ipv4/tcp_keepalive_time:7200
  netstat -to
 Client:
 tcp        0      0 client:9102         server:54043        ESTABLISHED keepalive (7196.36/0/0)

That's strange. It should've been the timeouts you specified in
config files, not 7200 seconds (two hours) which is system default.

It looks like bacula does not use TCP_KEEPIDLE setsockopt(2) on your
system. You might want to report a bug on http://bugs.bacula.org/

IMHO, it should work there. Or if not, it should probably throw a
warning if you try to use it and it is not supported or fails.

Apart from fixing bacula, you can override the system default. For
example, do this (on both server and client):

echo 60 > /proc/sys/net/ipv4/tcp_keepalive_time

(or edit /etc/sysctl.d/* or /etc/sysctl.conf to retain the value across
reboots). Can you check what netstat -to says after you lower that
limit and rerun the backups?

If netstat -to then reports smaller timers (60 or less), then it
should fix your problem, so you can try turning accurate back to yes.

Does that help?
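
To see why the two-hour default matters, here is a rough arithmetic
sketch in Python. The 7200/75/9 values are the ones reported earlier in
this thread; the 3600-second firewall idle timeout is purely an
assumption for illustration (the real timeout of the firewall in
question was not stated), and the function names are hypothetical.

```python
def dead_peer_detected_after(keepalive_time, intvl, probes):
    """Seconds of silence before the kernel gives up on a dead peer.

    The first probe goes out after `keepalive_time` idle seconds, then one
    probe every `intvl` seconds until `probes` of them go unanswered.
    """
    return keepalive_time + intvl * probes


def survives_idle_timeout(keepalive_time, firewall_idle_timeout):
    """True if the first keepalive probe is sent before the firewall forgets the connection."""
    return keepalive_time < firewall_idle_timeout


ASSUMED_FIREWALL_TIMEOUT = 3600  # hypothetical value, not taken from the thread

default_detection = dead_peer_detected_after(7200, 75, 9)  # 7200 + 9 * 75 = 7875
default_ok = survives_idle_timeout(7200, ASSUMED_FIREWALL_TIMEOUT)
lowered_ok = survives_idle_timeout(60, ASSUMED_FIREWALL_TIMEOUT)
```

With the defaults, the first probe only goes out after two idle hours,
which is too late for any firewall that drops idle connections sooner
than that; with tcp_keepalive_time lowered to 60, probes go out every
minute of idleness.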



Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Jon Schewe
On 4/12/10 8:39 AM, Matija Nalis wrote:
 On Mon, Apr 12, 2010 at 07:59:53AM -0500, Jon Schewe wrote:
   
 /proc/sys/net/ipv4/tcp_keepalive_time:7200
 
 netstat -to
   
 Client:
 tcp        0      0 client:9102         server:54043        ESTABLISHED keepalive (7196.36/0/0)
 
 That's strange. It should've been the timeouts you specified in
 config files, not 7200 seconds (two hours) which is system default.

 It looks like bacula does not use TCP_KEEPIDLE setsockopt(2) on your
 system. You might want to report a bug on http://bugs.bacula.org/

 IMHO, it should work there. Or if not, it should probably throw a
 warning if you try to use it and it is not supported or fails.

 Apart from fixing bacula, you can override the system default. For
 example, do this (on both server and client):

 echo 60 > /proc/sys/net/ipv4/tcp_keepalive_time

 (or edit /etc/sysctl.d/* or /etc/sysctl.conf to retain the value across
 reboots). Can you check what netstat -to says after you lower that
 limit and rerun the backups?
   
Now I see the timer down where I expect it. Should I only need this on
the client?
 If netstat -to then reports smaller timers (60 or less), then it
 should fix your problem, so you can try turning accurate back to yes.

 Does that help ?
   
It's running, I'll know in a couple of hours.





Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Jon Schewe
On 4/12/10 9:00 AM, Matija Nalis wrote:
 On Mon, Apr 12, 2010 at 08:45:36AM -0500, Jon Schewe wrote:
   
 On 4/12/10 8:39 AM, Matija Nalis wrote:
 
 echo 60 > /proc/sys/net/ipv4/tcp_keepalive_time

 (or edit /etc/sysctl.d/* or /etc/sysctl.conf to retain the value across
 reboots). Can you check what netstat -to says after you lower that
 limit and rerun the backups?

   
 Now I see the timer down where I expect it. Should I only need this on
 the client?
 
 If only that client is having timeout problems, then yes (as
 I understand it, your Director and SD are on the same server, so you
 should not have timeout issues there, as no networking is involved).

 (SO_KEEPALIVE will work even with only one side of the connection
 having it enabled).

   
So I should only need the heartbeat on that client's setup as well,
right? Getting rid of extra heart beats would be nice.

 If netstat -to then reports smaller timers (60 or less), then it
 should fix your problem, so you can try turning accurate back to yes.

 Does that help ?
   
 It's running, I'll know in a couple of hours.
 
 Good, let us know how it fares.

   
It seems to be running, but I've run into a problem with bconsole. Once
I started the job, if I run bconsole and then status dir, the console
hangs. If I strace the bconsole process it's stuck in a select call.
strace -p 18452
Process 18452 attached - interrupt to quit
select(4, [3], NULL, NULL, {9, 461287}) = 0 (Timeout)
read(3, 0x655d80, 5)                    = -1 EAGAIN (Resource temporarily unavailable)
select(4, [3], NULL, NULL, {10, 0})     = 0 (Timeout)
read(3, 0x655d80, 5)                    = -1 EAGAIN (Resource temporarily unavailable)
select(4, [3], NULL, NULL, {10, 0}






Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Matija Nalis
On Mon, Apr 12, 2010 at 09:23:51AM -0500, Jon Schewe wrote:
 On 4/12/10 9:00 AM, Matija Nalis wrote:
  (SO_KEEPALIVE will work even with only one side of connection having
  it enabled).

 So I should only need the heartbeat on that client's setup as well,
 right? Getting rid of extra heart beats would be nice.

Yes, it should be enough. Note that there is no real need to get rid
of the extra heartbeats, as they are not really expensive (so the
biggest gain would be cleaner config files).

  Good, let us know how it fares.

 It seems to be running, but I've run into a problem with bconsole. Once
 I started the job, if I run bconsole and then status dir, the console
 hangs. If I strace the bconsole process it's stuck in a select call.

 strace -p 18452
 Process 18452 attached - interrupt to quit
 select(4, [3], NULL, NULL, {9, 461287}) = 0 (Timeout)
 read(3, 0x655d80, 5)                    = -1 EAGAIN (Resource temporarily unavailable)

That should not be related to SO_KEEPALIVE - it should be completely
transparent to the applications if the network is working (and even
when it is not working, it should differ only in always terminating
the connection instead of sometimes terminating the connection and
sometimes hanging indefinitely).

Anyway, there may be a few reasons for the director hanging. The most
common is that you are too eager. For example, if the SQL server is
busy, status dir will hang until the query completes.

It is especially a problem with bigger databases and with MySQL instead
of PostgreSQL, see http://bugs.bacula.org/view.php?id=1472, where it can
even take several hours! (Note that while that report talks about
restore speed, it is also related to accurate backups, which employ
similar SQL queries.)

If you are running MySQL for the database, you can check whether that
is the case with SHOW PROCESSLIST (or simply wait).

Or you might be unlucky enough to hit a real director bug in 5.0.1,
see http://bugs.bacula.org/view.php?id=1528, but that is unlikely.




Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-12 Thread Jon Schewe
On 4/12/10 9:40 AM, Matija Nalis wrote:
 On Mon, Apr 12, 2010 at 09:23:51AM -0500, Jon Schewe wrote:
   
 On 4/12/10 9:00 AM, Matija Nalis wrote:
 
 Good, let us know how it fares.
   
   
 It seems to be running, but I've run into a problem with bconsole. Once
 I started the job, if I run bconsole and then status dir, the console
 hangs. If I strace the bconsole process it's stuck in a select call.

 
 strace -p 18452
   
 Process 18452 attached - interrupt to quit
 select(4, [3], NULL, NULL, {9, 461287}) = 0 (Timeout)
 read(3, 0x655d80, 5)                    = -1 EAGAIN (Resource temporarily unavailable)
 
 That should not be related to SO_KEEPALIVE - it should be completely
 transparent to the applications if the network is working (and even
 when it is not working, it should differ only in always terminating
 the connection instead of sometimes terminating the connection and
 sometimes hanging indefinitely).

 Anyway, there may be a few reasons for the director hanging. The most
 common is that you are too eager. For example, if the SQL server is
 busy, status dir will hang until the query completes.

 It is especially a problem with bigger databases and with MySQL instead
 of PostgreSQL, see http://bugs.bacula.org/view.php?id=1472, where it can
 even take several hours! (Note that while that report talks about
 restore speed, it is also related to accurate backups, which employ
 similar SQL queries.)

   
Must be what it is then. I've been thinking about switching to postgres,
but haven't because the opensuse packages for bacula are only for mysql.
This may motivate me more.

The backup finished, so it seems that in version 3.0.3 bacula does NOT
set the socket option SO_KEEPALIVE.





Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-11 Thread Jon Schewe
I got it to work again last night. Changing the firewall time outs
didn't help. What fixed it was turning off Accurate backups.




Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-10 Thread Jon Schewe
On 04/09/2010 02:33 AM, jerry lowry wrote:
 On 4/10/2010 3:30 AM, Jon Schewe wrote:
   
 On 04/08/2010 07:04 AM, Matija Nalis wrote:

 
 On Wed, Apr 07, 2010 at 02:15:14PM +0100, Prashant Ramhit wrote:

  
   
 06-Apr 12:54 client-fd JobId 299: Fatal error: backup.c:892 Network
 send error to SD. ERR=Connection reset by peer

 Is it possible to tell me how to enable more debugging on the client
 and storage so that I can find more clues to this issue?


 
 You can use -d <number> to increase the debug level, but in your case
 it should be pretty clear -- something (usually a router or firewall)
 between the SD and FD (or even local firewalls on the machines
 themselves) is killing the TCP connection (usually because it was idle
 for too long).

 See http://tinyurl.com/y8wapdu
 to check whether adding Heartbeat Interval helps you.


  
   
 I have heartbeat intervals set at the following:
 bacula-dir.conf:
 client {
Heartbeat interval = 15 Seconds
 }
 storage {
Heartbeat interval = 1 minutes
 }

 bacula-sd.conf
 storage {
Heartbeat interval = 1 minute
 }

 bacula-fd.conf
 FileDaemon {
Heartbeat Interval = 5 seconds
 }


 
 Hi, are you backing up through a firewall? I had this same problem and
 it turned out that the firewall has a configured limit on how long a
 job can last. I reset the limit and all my backups work as planned.


   
Yes, I'm behind a firewall running dd-wrt. Do I just need to increase
the connection timeout? Why doesn't the heartbeat take care of this?




Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-10 Thread Jon Schewe
I increased the connection timeout and started another job and got this:

10-Apr 08:11 jon-dir JobId 5334: Start Backup JobId 5334,
Job=mtu.2010-04-10_08.11.11_32
10-Apr 08:11 jon-dir JobId 5334: Using Device FileStorage
10-Apr 08:11 mtu-fd JobId 5334: shell command: run ClientRunBeforeJob
/etc/bacula/before-full-backup.sh
10-Apr 08:11 jon-dir JobId 5334: Sending Accurate information.
10-Apr 10:51 jon-dir JobId 0: Error: bsock.c:379 Wrote 77 bytes to
client:127.0.0.2:36131, but only 0 accepted.
10-Apr 10:51 jon-dir JobId 0: Error: bsock.c:379 Wrote 77 bytes to
client:127.0.0.2:36131, but only 0 accepted.
10-Apr 10:51 jon-dir JobId 0: Error: openssl.c:86 TLS shutdown failure.:
ERR=error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry
10-Apr 10:51 jon-dir JobId 0: Error: openssl.c:86 TLS shutdown failure.:
ERR=error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry
10-Apr 10:53 mtu-fd JobId 5334: Fatal error: Bad response from stored to
open command
10-Apr 10:53 jon-dir JobId 5334: Error: Bacula jon-dir 3.0.3 (18Oct09):
10-Apr-2010 10:53:23




Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-09 Thread jerry lowry
On 4/10/2010 3:30 AM, Jon Schewe wrote:
 On 04/08/2010 07:04 AM, Matija Nalis wrote:

 On Wed, Apr 07, 2010 at 02:15:14PM +0100, Prashant Ramhit wrote:

  
 06-Apr 12:54 client-fd JobId 299: Fatal error: backup.c:892 Network send
 error to SD. ERR=Connection reset by peer

 Is it possible to tell me how to enable more debugging on the client
 and storage so that I can find more clues to this issue?


 You can use -d <number> to increase the debug level, but in your case
 it should be pretty clear -- something (usually a router or firewall)
 between the SD and FD (or even local firewalls on the machines
 themselves) is killing the TCP connection (usually because it was idle
 for too long).

 See http://tinyurl.com/y8wapdu
 to check whether adding Heartbeat Interval helps you.


  
 I have heartbeat intervals set at the following:
 bacula-dir.conf:
 client {
Heartbeat interval = 15 Seconds
 }
 storage {
Heartbeat interval = 1 minutes
 }

 bacula-sd.conf
 storage {
Heartbeat interval = 1 minute
 }

 bacula-fd.conf
 FileDaemon {
Heartbeat Interval = 5 seconds
 }



Hi, are you backing up through a firewall? I had this same problem and
it turned out that the firewall has a configured limit on how long a
job can last. I reset the limit and all my backups work as planned.

jerry



Re: [Bacula-users] Fatal error: backup.c:892 Network send error to SD. ERR=Connection reset by peer

2010-04-08 Thread Matija Nalis
On Wed, Apr 07, 2010 at 02:15:14PM +0100, Prashant Ramhit wrote:
 06-Apr 12:54 client-fd JobId 299: Fatal error: backup.c:892 Network send
 error to SD. ERR=Connection reset by peer

 Is it possible to tell me how to enable more debugging on the client
 and storage so that I can find more clues to this issue?

You can use -d <number> to increase the debug level, but in your case
it should be pretty clear -- something (usually a router or firewall)
between the SD and FD (or even local firewalls on the machines
themselves) is killing the TCP connection (usually because it was idle
for too long).

See http://tinyurl.com/y8wapdu
to check whether adding Heartbeat Interval helps you.
