Re: Dump failed with data read: recv error: Resource temporarily unavailable after migration of some clients

2017-09-25 Thread Jean-Louis Martineau
On 23/09/17 07:31 AM, Matthias Teege wrote:
> On Fri, Sep 22, 2017 at 01:16:27PM -0400, Jean-Louis Martineau wrote:
>
> Hi,
>
> I have looked into this more closely. Perhaps the recv error is caused
> by an update of the CentOS amanda packages. On Tuesday we updated the
> server, and that included an update from amanda.*3.3.3-17.el7 to
> 3.3.3-18.el7.
The problem is in 3.3.3-18, because that package puts the socket in non-blocking mode.
That is wrong because it creates more problems.
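
For illustration only (this is not Amanda code): a minimal C sketch, assuming a POSIX system, of why a socket in non-blocking mode can produce this error. When no data is pending, read() on such a socket fails at once with EAGAIN (or EWOULDBLOCK) instead of waiting, and on glibc strerror(EAGAIN) is the "Resource temporarily unavailable" text from the report. The helper names set_nonblocking and errno_from_empty_nonblocking_read are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of Amanda: put fd into non-blocking mode.
 * Returns 0 on success, -1 on failure (errno set by fcntl). */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical demo: read() from a connected socket with no data pending.
 * Returns the errno that read() reported, or 0 if read() did not fail. */
int errno_from_empty_nonblocking_read(void)
{
    int sv[2];
    char buf[16];
    int saved = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return errno;

    if (set_nonblocking(sv[0]) == -1) {
        saved = errno;
    } else if (read(sv[0], buf, sizeof(buf)) == -1) {
        /* Nothing was written to sv[1], so a blocking socket would wait
         * here; a non-blocking one fails immediately. */
        saved = errno;
        fprintf(stderr, "recv error: %s\n", strerror(saved));
    }

    close(sv[0]);
    close(sv[1]);
    return saved;
}
```

Calling errno_from_empty_nonblocking_read() from a small main() should return EAGAIN or EWOULDBLOCK. Code that was written for blocking sockets treats that as a fatal read error unless it checks for these values explicitly.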

Jean-Louis

>
>> Can you try the attached patch?
> I compiled 3.3.3 with your patch and ran a dump with:
>
>   su amandabackup -c '/opt/amanda/3.3.3/sbin/amdump --no-taper daily nfs /home/./c'
>
> It works without problems.
>
>> After the run, can you grep for 'first read return EAGAIN' and 'second
>> read return EAGAIN' in the dumper debug files?
> I'm surprised: there is no dumper debug file. In my amanda.conf I have:
>
>   logdir "/var/log/amanda/daily"
>   debug-planner 3
>   debug-dumper 3
>
> but ...
>
>   # find /var/log/amanda -name '*dump*23*'
>   /var/log/amanda/daily/amdump.20170923125853
>   /var/log/amanda/daily/amdump.20170923123220
>   /var/log/amanda/daily/amdump.20170923123412
>   /var/log/amanda/server/daily/amdump.20170923005801.debug
>
> Is this because of "--no-taper"?
>
> Thanks again!
> Matthias
>
This message is the property of CARBONITE, INC. and may contain confidential or 
privileged information.
If this message has been delivered to you by mistake, then do not copy or 
deliver this message to anyone.  Instead, destroy it and notify me by reply 
e-mail


Re: Dump failed with data read: recv error: Resource temporarily unavailable after migration of some clients

2017-09-24 Thread Matthias Teege
On Sat, Sep 23, 2017 at 08:16:38AM -0400, Jean-Louis Martineau wrote:

Hi,

> The debug files are in the debug directory:
>    amgetconf build.amanda_dbgdir

Ah, OK, it's /tmp/amanda.

 su amandabackup -c '/opt/amanda/3.3.3/sbin/amgetconf build.amanda_dbgdir'
 /tmp/amanda

I did a run with 6 servers, all on different spindles. It simply
works: no error messages, and no EAGAIN entries in the debug logs.

 # grep -ri EAG /tmp/amanda/
 # 

(The debug logs are there.)

I'll start a full run over all hosts tonight. It looks like it's only a
problem with the new CentOS packages.

What do you think?
Matthias



Re: Dump failed with data read: recv error: Resource temporarily unavailable after migration of some clients

2017-09-23 Thread Jean-Louis Martineau
The debug files are in the debug directory:
   amgetconf build.amanda_dbgdir

Jean-Louis



Re: Dump failed with data read: recv error: Resource temporarily unavailable after migration of some clients

2017-09-23 Thread Matthias Teege
On Fri, Sep 22, 2017 at 01:16:27PM -0400, Jean-Louis Martineau wrote:

Hi,

I have looked into this more closely. Perhaps the recv error is caused
by an update of the CentOS amanda packages. On Tuesday we updated the
server, and that included an update from amanda.*3.3.3-17.el7 to
3.3.3-18.el7.

> Can you try the attached patch?

I compiled 3.3.3 with your patch and ran a dump with:

 su amandabackup -c '/opt/amanda/3.3.3/sbin/amdump --no-taper daily nfs /home/./c'

It works without problems.

> After the run, can you grep for 'first read return EAGAIN' and 'second
> read return EAGAIN' in the dumper debug files?

I'm surprised: there is no dumper debug file. In my amanda.conf I have:

 logdir "/var/log/amanda/daily"
 debug-planner 3
 debug-dumper 3

but ...

 # find /var/log/amanda -name '*dump*23*'
 /var/log/amanda/daily/amdump.20170923125853
 /var/log/amanda/daily/amdump.20170923123220
 /var/log/amanda/daily/amdump.20170923123412
 /var/log/amanda/server/daily/amdump.20170923005801.debug

Is this because of "--no-taper"?

Thanks again!
Matthias



Re: Dump failed with data read: recv error: Resource temporarily unavailable after migration of some clients

2017-09-22 Thread Jean-Louis Martineau
Matthias,

Can you try the attached patch?
After the run, can you grep for 'first read return EAGAIN' and 'second 
read return EAGAIN' in the dumper debug files?

Jean-Louis

diff --git a/common-src/security-util.c b/common-src/security-util.c
index 43360ea..4929bbd 100644
--- a/common-src/security-util.c
+++ b/common-src/security-util.c
@@ -517,6 +517,17 @@ tcpm_recv_token(
 	rval = read(fd, ((char *)&rc->netint) + rc->size_header_read,
 		SIZEOF(rc->netint) - rc->size_header_read);
 	if (rval == -1) {
+	if (0
+#ifdef EAGAIN
+		|| errno == EAGAIN
+#endif
+#ifdef EWOULDBLOCK
+		|| errno == EWOULDBLOCK
+#endif
+		) {
+		g_debug("first read return EAGAIN");
+		return -2;
+	}
 	if (errmsg)
 		*errmsg = newvstrallocf(*errmsg, _("recv error: %s"),
 	strerror(errno));
@@ -600,6 +611,17 @@ tcpm_recv_token(
 rval = read(fd, rc->buffer + rc->size_buffer_read,
 		(size_t)*size - rc->size_buffer_read);
 if (rval == -1) {
+	if (0
+#ifdef EAGAIN
+|| errno == EAGAIN
+#endif
+#ifdef EWOULDBLOCK
+|| errno == EWOULDBLOCK
+#endif
+	) {
+	g_debug("second read return EAGAIN");
+	return -2;
+	}
 	if (errmsg)
 	*errmsg = newvstrallocf(*errmsg, _("recv error: %s"),
 strerror(errno));
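
The patch above makes tcpm_recv_token return -2 when read() fails with EAGAIN or EWOULDBLOCK, instead of reporting a recv error. How Amanda's callers react to that value is not shown in this thread. As a generic illustration only, here is a minimal C sketch, assuming a POSIX system, of one common way to treat EAGAIN as "try again later": wait with poll() until the descriptor is readable, then retry the read. The function name read_retry_on_eagain and its timeout parameter are hypothetical and do not come from the Amanda sources.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of Amanda: read up to len bytes from fd,
 * which may be in non-blocking mode. Returns the number of bytes read
 * (less than len only if EOF was reached), or -1 on error. If no data
 * arrives within timeout_ms during a wait, returns -1 with
 * errno set to ETIMEDOUT. */
ssize_t read_retry_on_eagain(int fd, void *buf, size_t len, int timeout_ms)
{
    size_t done = 0;

    assert(fd >= 0 && buf != NULL);

    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                      /* EOF: peer closed the stream */
        if (errno == EINTR)
            continue;                   /* interrupted, just retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* a real read error */

        /* No data yet: wait until fd is readable instead of failing. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (ready == -1 && errno != EINTR)
            return -1;
    }
    return (ssize_t)done;
}
```

The timeout applies to each individual wait, not to the whole transfer, so a slow sender that keeps trickling data will not trip it.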


Re: Dump failed with data read: recv error: Resource temporarily unavailable after migration of some clients

2017-09-22 Thread Matthias Teege
On Thu, Sep 21, 2017 at 09:45:04AM +0200, Matthias Teege wrote:

Hi,

> I have a running Amanda 3.3.3 Server. The backup works without
> problems. On the clients we use Amanda 3.3.0/3.3.6 and 2.6.x on some
> old systems.

I've run some more tests, and it looks like the problem is not limited
to one host. I ran amdump for a single host and a single disk and
got the same error. On the client I see:

Fri Sep 22 16:15:35 2017: thd-0x55b3cc1bd400: amandad: security_stream_seterr(0x55b3cc1e0f70, write error to: Connection reset by peer)
Fri Sep 22 16:15:35 2017: thd-0x55b3cc1bd400: amandad: sending NAK pkt:
<
ERROR write error on stream 49: write error to: Connection reset by peer

On the server I see:

driver: hdisk-state time 173.994 hdisk 0: free 1439233056 dumpers 1
driver: result time 173.994 from dumper0: FAILED 00-1 "[data read: recv error: Resource temporarily unavailable]"
driver: send-cmd time 173.994 to chunker0: FAILED 00-1

What can be the cause of the "reset"?

Thanks!
Matthias


Dump failed with data read: recv error: Resource temporarily unavailable after migration of some clients

2017-09-21 Thread Matthias Teege
Hello!

I have a running Amanda 3.3.3 Server. The backup works without
problems. On the clients we use Amanda 3.3.0/3.3.6 and 2.6.x on some
old systems.

After migrating one of the 2.6 clients to 3.3.0, the backup fails with:

 cl15 /var/lib/jenkins lev 0  FAILED [data read: recv error: Resource temporarily unavailable]
 cl15 /var/lib/jenkins lev 0  FAILED [data read: recv error: Resource temporarily unavailable]
 cl15 /var/lib/jenkins lev 0  partial taper: successfully taped a partial dump

That alone would not be a big problem, but since the migration other
systems have started to fail too.

What does the error message mean? Do I have to remove some old indices
from the server? Any hints?

Thanks!
Matthias