Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64
[I removed [EMAIL PROTECTED] from CC to avoid bounces due to not being subscribed]

On Sat May 3 2008, John Hasler wrote:
> Please test this patch to ntp_core.c on a pristine upstream 1.23:
>
> --- ../pristine/chrony-1.23/ntp_core.c 2008-05-02 22:14:21.0 -0500

It seems that these timevals do not always end up in offset_time. A

  $ grep offset_time *.c

reveals that offset_time is set in exactly one place: line [EMAIL PROTECTED], within SST_DoNewRegression. All other occurrences of offset_time are either reads or modifications through UTI_* functions.

Inspection of the surrounding code shows that the assignment depends on the condition 'regression_ok'. There is no assignment in the else block at line 489. I confirmed with the debugger that this spot is reached before any reads or modifications take place, so this would be one place to put a fix.

I have no idea what would be a good replacement value. Looking at the places that call SST_DoNewRegression and others, it seems possible that enough samples can be dropped that regression_ok becomes false after it has been true before. In that case inst->sample_times[inst->n_samples - 1] might be better than {0, 0} if n_samples > 0.

The canonical place to initialize offset_time = {0, 0} would be SST_CreateInstance.

Best regards,
Peter Pöschl
-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
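The two suggestions above (a defined value at creation, and a fallback when the regression fails) could look roughly like the following. This is only a sketch: struct stats_sketch, MAX_SAMPLES and both function names are hypothetical stand-ins, not chrony's actual definitions, and whether the last sample time is the right fallback remains an open question for upstream.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

#define MAX_SAMPLES 64 /* assumed size, for illustration only */

/* Hypothetical, cut-down stand-in for the per-source statistics record. */
struct stats_sketch {
  int n_samples;
  struct timeval sample_times[MAX_SAMPLES];
  struct timeval offset_time;
};

/* Analogue of SST_CreateInstance: offset_time gets a defined value
   as soon as the record exists. */
struct stats_sketch *stats_sketch_create(void)
{
  struct stats_sketch *inst = malloc(sizeof *inst);
  if (inst == NULL) {
    return NULL;
  }
  inst->n_samples = 0;
  inst->offset_time.tv_sec = 0;
  inst->offset_time.tv_usec = 0;
  return inst;
}

/* Analogue of the else branch taken when regression_ok is false:
   prefer the most recent sample time, otherwise fall back to {0, 0}. */
void stats_sketch_regression_failed(struct stats_sketch *inst)
{
  assert(inst != NULL);
  if (inst->n_samples > 0) {
    inst->offset_time = inst->sample_times[inst->n_samples - 1];
  } else {
    inst->offset_time.tv_sec = 0;
    inst->offset_time.tv_usec = 0;
  }
}
```

With this shape, any read of offset_time that happens before the first successful regression sees either {0, 0} or a real sample time, never leftover heap contents.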
Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64
On Sat May 3 2008, John Hasler wrote:
> Please test this patch to ntp_core.c on a pristine upstream 1.23:

I'm not quite sure what you mean by 'pristine upstream'. I applied the patch to the 1.23-3 sources from Debian unstable.

> --- ../pristine/chrony-1.23/ntp_core.c 2008-05-02 22:14:21.0 -0500
> +++ ntp_core.c 2008-05-02 22:14:56.0 -0500
> @@ -320,6 +320,8 @@
>
>    result->local_rx.tv_sec = 0;
>    result->local_rx.tv_usec = 0;
> +  result->local_tx.tv_sec = 0;
> +  result->local_tx.tv_usec = 0;
>
>    return result;

The watchpoint with sources[0]->stats.offset_time.tv_sec>0x now triggers at

  main () at main.c:304
  SCH_MainLoop () at sched.c:470
  read_from_socket () at ntp_io.c:215
  NSR_ProcessReceive () at ntp_sources.c:258
  receive_packet () at ntp_core.c:1064
  SRC_SelectSource () sources.c:695
  REF_SetReference () at reference.c:408
  LCL_AccumulateOffset () at local.c:446
  slew_sources () at sources.c:763
  SST_SlewSamples () at sourcestats.c:698
  UTI_NormaliseTimeval () at util.c:93

I had to apply this patch

--- sources.c.orig Thu May 01 10:38:40 2008 +0200
+++ sources.c Sun May 04 21:27:10 2008 +0200
@@ -136,9 +136,11 @@
   max_n_sources = 0;
   selected_source_index = INVALID_SOURCE;
   initialised = 1;
+  static volatile int dbg_is_connected = 0;
   LCL_AddParameterChangeHandler(slew_sources, NULL);
+  while (!dbg_is_connected) ;
   return;
 }

to reproduce the bug. It disappears when I start 'chronyd -d' from within the debugger.

Best regards,
Peter Pöschl
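The instrumentation patch above parks the daemon in a busy loop until a debugger flips a flag, so the process can start outside gdb (where the bug shows up) and be attached to afterwards. A slightly gentler variant is sketched below; wait_for_debugger is a hypothetical helper name, not part of chrony, and the one-second sleep is an assumption added only to avoid spinning at full CPU.

```c
#include <assert.h>
#include <unistd.h>

/* Flag that a debugger is expected to set by hand after attaching.
   volatile keeps the compiler from caching the value across iterations. */
static volatile int dbg_is_connected = 0;

/* Hypothetical helper: block until *flag becomes non-zero.
   Returns how many times it had to poll before the flag was set. */
unsigned long wait_for_debugger(volatile int *flag)
{
  unsigned long polls = 0;

  assert(flag != NULL);
  while (!*flag) {
    sleep(1); /* assumption: sleep instead of a tight spin */
    polls++;
  }
  return polls;
}
```

In use, one would call wait_for_debugger(&dbg_is_connected) at the point of interest, attach with "gdb -p <pid>", set any watchpoints, then run "set var dbg_is_connected = 1" followed by "continue".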
Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64
Please test this patch to ntp_core.c on a pristine upstream 1.23:

--- ../pristine/chrony-1.23/ntp_core.c 2008-05-02 22:14:21.0 -0500
+++ ntp_core.c 2008-05-02 22:14:56.0 -0500
@@ -320,6 +320,8 @@

   result->local_rx.tv_sec = 0;
   result->local_rx.tv_usec = 0;
+  result->local_tx.tv_sec = 0;
+  result->local_tx.tv_usec = 0;

   return result;
-- 
John Hasler
[EMAIL PROTECTED]
Elmwood, WI USA
Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64
> The ridiculously high values come from the line
>   sources = MallocArray(struct SRC_Instance_Record *, max_n_sources);
> in SRC_CreateNewInstance in sources.c.

Thank you. I changed the limits in your patch to catch values that would be unreasonable on my 32-bit system, but I couldn't get the bug to trigger after I started using gdb (I did see it a few times while I was testing the patch).

> This missing initialization was obviously harmless in the 32-bit version,
> but you should ask upstream for the implications of a huge random
> starting value. It might be that with your divider/remainder patch the
> program won't loop till the sun goes out, but nonetheless take an eternity
> until the system time converges to UTC.

Occasionally I see "Residual freq : -32768.000 ppm" in the chronyc "Tracking" display at startup (it goes away after a few minutes). It occurred to me yesterday that it might be associated with this bug. Now I'm fairly sure it's the system time converging from such a random starting value.
-- 
John Hasler
[EMAIL PROTECTED]
Elmwood, WI USA
Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64
> BTW please cc: [EMAIL PROTECTED] so that this discussion is on record.

Oh, sorry, I thought that was a bad idea after the bug was closed.

The ridiculously high values come from the line
  sources = MallocArray(struct SRC_Instance_Record *, max_n_sources);
in SRC_CreateNewInstance in sources.c.

I put a watchpoint on sources[0]->stats.offset_time.tv_sec with condition '> 0x'. It had not triggered by the time the value was used, for the first time, to calculate the result of UTI_DiffTimevalsToDouble at this call tree:

  main () at main.c:304
  SCH_MainLoop () at sched.c:470
  read_from_socket () at ntp_io.c:215
  NSR_ProcessReceive () at ntp_sources.c:258
  receive_packet () at ntp_core.c:1063
  SRC_SelectSource () sources.c:693
  REF_SetReference () at reference.c:408
  LCL_AccumulateOffset () at local.c:446
  slew_sources () at sources.c:761
      sources is defined static in this file.
  SST_SlewSamples () at sourcestats.c:698
      parameter inst = sources[1]->stats in caller
  UTI_DiffTimevalsToDouble () at util.c:161
      parameter b = inst->offset_time of caller

(Line numbers apply to the 1.23-2 sources plus the instrumentation patches I sent you off-list.)

This missing initialization was obviously harmless in the 32-bit version, but you should ask upstream about the implications of a huge random starting value. It might be that with your divider/remainder patch the program won't loop till the sun goes out, but will nonetheless take an eternity until the system time converges to UTC.
Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64
Peter Pöschl writes:
> When I single-step through main() I lose the interesting process in
> LOG_GoDaemon(). Is there a way to tell the debugger that I want to trace
> the child process after the fork?

Starting chrony with the "-d" option will prevent the fork. Chronyd will then remain attached to the terminal and send all messages there.

BTW please cc: [EMAIL PROTECTED] so that this discussion is on record.
-- 
John Hasler
[EMAIL PROTECTED]
Elmwood, WI USA