Unusual TSIG problem

2010-12-08 Thread Kevin Oberman
I just ran into an odd issue with a TSIG signed zone transfer.

On occasion I was logging a "clocks are unsynchronized" message during a
transfer from a customer server at a site about 30 ms away. I dropped a
note to the manager there asking that he look at his system for a
time issue. He checked and found no problems.

Today I looked at the problem more closely. I realized that the problem
was NOT a clock sync issue. They were probably within a millisecond of
one another. I found the following in the log:
Dec  8 06:26:18 ns1 named[67170]: zone XX.gov/IN: notify from 
123.234.1.1#33372: refresh in progress, refresh check queued
Dec  8 06:31:18 ns1 named[67170]: transfer of 'XX.gov/IN' from 
123.234.1.1#53: failed while receiving responses: clocks are unsynchronized
Dec  8 06:31:18 ns1 named[67170]: transfer of 'XX.gov/IN' from 
123.234.1.1#53: Transfer completed: 1 messages, 397 records, 59674 bytes, 
898.462 secs (66 bytes/sec)

The zone transfer, probably due to a hardware problem, was taking over 5
minutes, and RFC 2845 suggests that the difference between clocks should
be limited to 300 seconds (5 minutes). This really means that, should the
transfer take over 5 minutes, you get the unsynced clocks error. (See
section 4.5.2, "TIME check and error handling".)
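A minimal Python sketch of the RFC 2845 TIME check shows why this happens. It assumes, as the "1 messages" in the log suggests, that the whole zone travelled as one signed message that the slave could verify only after the message had fully arrived. `tsig_time_ok` is an illustrative name, not a BIND function. The clocks are modelled as agreeing exactly, so delivery time is the only thing that can make the check fail.

```python
FUDGE_S = 300.0  # seconds; the Fudge value RFC 2845 recommends


def tsig_time_ok(time_signed: float, now: float, fudge: float = FUDGE_S) -> bool:
    """RFC 2845 TIME check: accept only if now is within time_signed +/- fudge."""
    return abs(now - time_signed) <= fudge


# Clocks assumed perfectly synchronized: the master stamps its single
# response message at t = 0 on the same timescale the slave uses.
signed_at = 0.0

# The slave can verify the TSIG only after the whole message has arrived,
# roughly 898.462 s later (the duration in the "Transfer completed" line).
print(tsig_time_ok(signed_at, signed_at + 898.462))  # False: BADTIME

# The same message delivered in the expected 2-3 s passes.
print(tsig_time_ok(signed_at, signed_at + 2.5))      # True
```

Because the check only sees "Time Signed" against the local clock, a message that is old because it was slow to arrive is indistinguishable from one stamped by a clock that is running behind.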

Clearly, something is broken when a zone transfer takes over 5
minutes. (This one SHOULD take about 2-3 seconds.) But the message
certainly pointed in the wrong direction. Is there more appropriate
language that might indicate that it could also be an effective time-out
because the transfer took too long? Maybe "failed while receiving
responses: clocks are unsynchronized or maximum transfer time exceeded"?
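One way a receiver could choose between the existing wording and the proposed one is sketched below. This is a hypothetical helper, not BIND's code, and it assumes the receiver records on its own clock when the transfer started. A message signed more than Fudge in the future can only be clock skew. So can a message signed more than Fudge ago when the transfer began less than Fudge ago. Otherwise a slow transfer explains the failure just as well.

```python
FUDGE_S = 300.0  # seconds; RFC 2845's recommended Fudge


def badtime_reason(transfer_start: float, time_signed: float, now: float,
                   fudge: float = FUDGE_S) -> str:
    """Choose wording for a failed TSIG TIME check (hypothetical helper).

    transfer_start and now are read from the receiver's clock; time_signed
    is the Time Signed field from the signer's TSIG record.  Call this only
    after the check |now - time_signed| <= fudge has already failed.
    """
    if time_signed - now > fudge:
        # Signed "in the future": no amount of transfer delay explains this.
        return "clocks are unsynchronized (signer clock ahead)"
    if now - transfer_start > fudge:
        # The transfer itself lasted longer than the window, so the failure
        # would occur even with perfectly synchronized clocks.
        return "clocks are unsynchronized or maximum transfer time exceeded"
    # With synced clocks the message could not have been signed before the
    # transfer began, so a signature this old points at the signer's clock.
    return "clocks are unsynchronized (signer clock behind)"


# The case from the log: signed at the start, verified 898.462 s later.
print(badtime_reason(transfer_start=0.0, time_signed=0.0, now=898.462))
```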
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: ober...@es.net  Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751
___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users


Re: Unusual TSIG problem

2010-12-08 Thread Mark Andrews

In message <20101208214221.566771c...@ptavv.es.net>, "Kevin Oberman" writes:
> I just ran into an odd issue with a TSIG signed zone transfer.
> 
> On occasion I was logging a "clocks are unsynchronized" message during a
> transfer from a customer server at a site about 30 ms away. I dropped a
> note to the manager there asking that he look at his system for a
> time issue. He checked and found no problems.
> 
> Today I looked at the problem more closely. I realized that the problem
> was NOT a clock sync issue. They were probably within a millisecond of
> one another. I found the following in the log:
> Dec  8 06:26:18 ns1 named[67170]: zone XX.gov/IN: notify from
> 123.234.1.1#33372: refresh in progress, refresh check queued
> Dec  8 06:31:18 ns1 named[67170]: transfer of 'XX.gov/IN' from
> 123.234.1.1#53: failed while receiving responses: clocks are unsynchronized
> Dec  8 06:31:18 ns1 named[67170]: transfer of 'XX.gov/IN' from
> 123.234.1.1#53: Transfer completed: 1 messages, 397 records, 59674 bytes,
> 898.462 secs (66 bytes/sec)
> 
> The zone transfer, probably due to a hardware problem, was taking over 5
> minutes, and RFC 2845 suggests that the difference between clocks should
> be limited to 300 seconds (5 minutes). This really means that, should the
> transfer take over 5 minutes, you get the unsynced clocks error. (See
> section 4.5.2, "TIME check and error handling".)

59674*8/898.462 = 531 b/s; that's slower than almost all dialup lines.
If this happens regularly, use transfer-format one-answer (set on the
master), which results in smaller messages and more signatures.
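Working from the log figures (59674 bytes, 397 records, 898.462 s), here is a rough model of why more, smaller signed messages help. BIND's `transfer-format one-answer` sends one record per message. The model assumes a constant link rate equal to the logged average and that the master signs each message just as it starts sending it. If signing is not paced by the link (for instance, if every message is signed up front and then queues behind the slow path), later messages would still arrive long after their signing time and this model would be too optimistic. `sign_to_verify_gap` is an illustrative name.

```python
ZONE_BYTES = 59674      # from the "Transfer completed" log line
RECORDS = 397
DURATION_S = 898.462
FUDGE_S = 300.0         # RFC 2845's recommended Fudge

# Throughput actually achieved: bytes over seconds, not over records.
rate_bytes_per_s = ZONE_BYTES / DURATION_S
rate_bits_per_s = rate_bytes_per_s * 8


def sign_to_verify_gap(messages: int) -> float:
    """Seconds from signing a message to finishing its receipt, assuming
    equal-sized messages, each signed just as it starts onto the wire, and
    ignoring per-message header and TSIG overhead."""
    bytes_per_message = ZONE_BYTES / messages
    return bytes_per_message / rate_bytes_per_s


print(f"{rate_bits_per_s:.0f} b/s")  # 531 b/s
for label, n in (("as logged (1 message)", 1), ("one-answer", RECORDS)):
    gap = sign_to_verify_gap(n)
    print(f"{label}: {gap:.1f} s per message, within fudge: {gap <= FUDGE_S}")
```

Under these assumptions the single message sits about 898 s between signing and verification, while each one-answer message needs only a couple of seconds, comfortably inside the 300 s window.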
 
> Clearly, something is broken when a zone transfer takes over 5
> minutes. (This one SHOULD take about 2-3 seconds.) But the message
> certainly pointed in the wrong direction. Is there more appropriate
> language that might indicate that it could also be an effective time-out
> because the transfer took too long? Maybe "failed while receiving
> responses: clocks are unsynchronized or maximum transfer time exceeded"?
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org


Re: Unusual TSIG problem

2010-12-08 Thread Kevin Oberman
> From: Mark Andrews 
> Date: Thu, 09 Dec 2010 09:07:53 +1100
> 
> 
> 
> 59674*8/898.462 = 531 b/s; that's slower than almost all dialup lines.
> If this happens regularly, use transfer-format one-answer (set on the
> master), which results in smaller messages and more signatures.

I agree that this should not happen. The hardware is obviously
broken. It's just that the message resulted in a great deal of wasted
effort looking at clock issues when the problem was completely unrelated
to that. I'm just wondering if it is worthwhile to mention this
possibility in the log message. The more detailed message would have
caused me to note the time stamps on the messages immediately and
realize what was going on.
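The time-stamp check described above can be automated. Below is a sketch that scans a named log and flags transfers where a "clocks are unsynchronized" failure is followed by a completion line reporting a duration longer than the 300 s fudge. It assumes one log entry per line in the format quoted in this thread, with the mail wrapping undone. The regular expressions are fitted to those lines only, and `flag_long_transfers` is a hypothetical helper.

```python
import re

FUDGE_S = 300.0  # seconds; RFC 2845's recommended Fudge

# Patterns fitted to the two named log lines quoted in this thread.
FAILED = re.compile(r"transfer of '(?P<zone>[^']+)' from (?P<peer>\S+): "
                    r"failed while receiving responses: clocks are unsynchronized")
DONE = re.compile(r"transfer of '(?P<zone>[^']+)' from (?P<peer>\S+): "
                  r"Transfer completed: .*?, (?P<secs>[\d.]+) secs")


def flag_long_transfers(lines, fudge=FUDGE_S):
    """Yield (zone, peer, secs) when a 'clocks are unsynchronized' failure is
    followed by a completion line whose duration exceeds fudge."""
    pending = set()
    for line in lines:
        m = FAILED.search(line)
        if m:
            pending.add((m["zone"], m["peer"]))
            continue
        m = DONE.search(line)
        if m and (m["zone"], m["peer"]) in pending:
            pending.discard((m["zone"], m["peer"]))
            secs = float(m["secs"])
            if secs > fudge:
                yield m["zone"], m["peer"], secs


sample_log = [
    "Dec  8 06:31:18 ns1 named[67170]: transfer of 'XX.gov/IN' from "
    "123.234.1.1#53: failed while receiving responses: clocks are unsynchronized",
    "Dec  8 06:31:18 ns1 named[67170]: transfer of 'XX.gov/IN' from "
    "123.234.1.1#53: Transfer completed: 1 messages, 397 records, 59674 bytes, "
    "898.462 secs (66 bytes/sec)",
]
for zone, peer, secs in flag_long_transfers(sample_log):
    print(f"{zone} from {peer}: took {secs:.0f} s (> {FUDGE_S:.0f} s fudge); "
          "BADTIME may be a slow transfer, not clock skew")
```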