Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64

2008-05-18 Thread Peter Pöschl
[I removed [EMAIL PROTECTED] from CC to avoid bounces due to not being 
subscribed]

On Sat May 3 2008, John Hasler wrote:
> Please test this patch to ntp_core.c on a pristine upstream 1.23:
>
>
> --- ../pristine/chrony-1.23/ntp_core.c  2008-05-02 22:14:21.0 -0500

It seems that these timevals do not always end up in offset_time. A

  $ grep offset_time *.c

reveals that offset_time is set (all other occurrences of offset_time are 
either reads or modifications through UTI_* functions) in exactly one place:

  line [EMAIL PROTECTED], within SST_DoNewRegression

Inspection of the surrounding code shows that the assignment depends on 
condition 'regression_ok'.

There is no assignment in the else block at line 489. I confirmed with the 
debugger that this spot is reached before any reads/modifications take place, 
so this would be one place to put a fix. I have no idea what would be a good 
replacement value.
Looking at the places calling SST_DoNewRegression and others it seems possible 
that enough samples can be dropped that regression_ok becomes false after it 
has been true before. In that case inst->sample_times[inst->n_samples - 1] 
might be better than {0, 0} if n_samples > 0. 

The canonical place to initialize offset_time = {0, 0} would be 
SST_CreateInstance.
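
The two ideas above (zero offset_time at creation, and fall back to the newest 
sample time when the regression is not usable) could be sketched roughly as 
follows. This is only an illustration: struct stats_sketch, MAX_SAMPLES and 
both function names are hypothetical stand-ins, not the real sourcestats.c 
code, and whether the newest sample time is a sensible fallback is still an 
open question.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

#define MAX_SAMPLES 64 /* hypothetical capacity, not taken from chrony */

/* Hypothetical, heavily reduced stand-in for the statistics instance. */
struct stats_sketch {
  int n_samples;
  struct timeval sample_times[MAX_SAMPLES];
  struct timeval offset_time;
};

/* Creation-time initialisation: offset_time starts out as {0, 0}. */
void stats_sketch_init(struct stats_sketch *inst)
{
  assert(inst != NULL);
  inst->n_samples = 0;
  inst->offset_time.tv_sec = 0;
  inst->offset_time.tv_usec = 0;
}

/* Fallback for the branch taken when the regression is not usable:
   prefer the newest sample time, otherwise use {0, 0}. */
void stats_sketch_regression_failed(struct stats_sketch *inst)
{
  assert(inst != NULL);
  if (inst->n_samples > 0) {
    inst->offset_time = inst->sample_times[inst->n_samples - 1];
  } else {
    inst->offset_time.tv_sec = 0;
    inst->offset_time.tv_usec = 0;
  }
}
```

With either function in place, offset_time holds a defined value before the 
first read, whichever branch is taken.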


Best regards,

  Peter Pöschl





--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64

2008-05-04 Thread Peter Pöschl
On Sat May 3 2008, John Hasler wrote:
> Please test this patch to ntp_core.c on a pristine upstream 1.23:
I'm not quite sure what you mean by 'pristine upstream'.
I applied the patch to the 1.23-3 sources from Debian unstable.


> --- ../pristine/chrony-1.23/ntp_core.c  2008-05-02 22:14:21.0 -0500
> +++ ntp_core.c  2008-05-02 22:14:56.0 -0500
> @@ -320,6 +320,8 @@
>
>    result->local_rx.tv_sec = 0;
>    result->local_rx.tv_usec = 0;
> +  result->local_tx.tv_sec = 0;
> +  result->local_tx.tv_usec = 0;
>
>    return result;

The watchpoint with sources[0]->stats.offset_time.tv_sec>0x
now triggers at
  main () at main.c:304
  SCH_MainLoop () at sched.c:470
  read_from_socket () at ntp_io.c:215
  NSR_ProcessReceive () at ntp_sources.c:258
  receive_packet () at ntp_core.c:1064
  SRC_SelectSource () sources.c:695
  REF_SetReference () at reference.c:408
  LCL_AccumulateOffset () at local.c:446
  slew_sources () at sources.c:763
  SST_SlewSamples () at sourcestats.c:698
  UTI_NormaliseTimeval () at util.c:93


I had to apply this patch 

--- sources.c.orig Thu May 01 10:38:40 2008 +0200
+++ sources.c Sun May 04 21:27:10 2008 +0200
@@ -136,9 +136,11 @@
   max_n_sources = 0;
   selected_source_index = INVALID_SOURCE;
   initialised = 1;
+  static volatile int dbg_is_connected = 0;

   LCL_AddParameterChangeHandler(slew_sources, NULL);

+  while (!dbg_is_connected) ;
   return;
 }


to reproduce the bug. It disappears when I start 'chronyd -d' from within the 
debugger.
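
The busy-wait in the patch above is a common way to hold a process until a 
debugger has attached. A slightly gentler variant, sketched here with a 
hypothetical helper wait_for_debugger (not part of chrony), sleeps between 
polls instead of spinning. The flag is volatile so that the compiler re-reads 
it on every iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Set this from the debugger after attaching, for example:
     (gdb) set var dbg_is_connected = 1 */
volatile int dbg_is_connected = 0;

/* Blocks until *flag becomes non-zero and returns how many one-second
   polls were needed. Hypothetical helper for illustration only. */
unsigned long wait_for_debugger(volatile int *flag)
{
  unsigned long polls = 0;

  assert(flag != NULL);
  while (!*flag) {
    sleep(1); /* avoid burning a CPU while waiting */
    polls++;
  }
  return polls;
}
```

A call such as wait_for_debugger(&dbg_is_connected) would go at the point 
where the process should stop, and the debugger is then attached to the 
running process by its PID.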


Best regards,

  Peter Pöschl






Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64

2008-05-02 Thread John Hasler
Please test this patch to ntp_core.c on a pristine upstream 1.23:


--- ../pristine/chrony-1.23/ntp_core.c  2008-05-02 22:14:21.0 -0500
+++ ntp_core.c  2008-05-02 22:14:56.0 -0500
@@ -320,6 +320,8 @@
 
   result->local_rx.tv_sec = 0;
   result->local_rx.tv_usec = 0;
+  result->local_tx.tv_sec = 0;
+  result->local_tx.tv_usec = 0;
 
   return result;
 


-- 
John Hasler 
[EMAIL PROTECTED]
Elmwood, WI USA






Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64

2008-04-29 Thread John Hasler
> The ridiculously high values come from the line

>  sources = MallocArray(struct SRC_Instance_Record *, max_n_sources);

> in SRC_CreateNewInstance in sources.c.

Thank you.  I changed the limits in your patch to catch values that would
be unreasonable on my 32 bit system but I couldn't get the bug to trigger
after I started using gdb (I did see it a few times while I was testing the
patch).

> This missing initialization was obviously harmless in the 32-bit version,
> but you should ask upstream for the implications of a huge random
> starting value.  It might be that with your divider/remainder patch the
> program won't loop till the sun goes out, but nonetheless take an eternity
> until the system time converges to UTC.

Occasionally I see "Residual freq : -32768.000 ppm" in the chronyc
"Tracking" display at startup (it goes away after a few minutes).  It
occurred to me yesterday that it might be associated with this bug.  Now
I'm fairly sure it's the system time converging from such a random starting
value.
-- 
John Hasler 
[EMAIL PROTECTED]
Elmwood, WI USA






Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64

2008-04-29 Thread Peter Pöschl
> BTW please cc: [EMAIL PROTECTED] so that this discussion is on record.
Oh, sorry, I thought that was a bad idea after the bug was closed.


The ridiculously high values come from the line

  sources = MallocArray(struct SRC_Instance_Record *, max_n_sources);

in SRC_CreateNewInstance in sources.c.
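
For illustration, one generic way to avoid garbage in freshly allocated 
records is to allocate them zero-filled. The sketch below uses calloc on a 
hypothetical struct record_sketch; it is not chrony's MallocArray and not a 
proposed patch. Note that all-bits-zero is guaranteed to mean 0 for integer 
members such as tv_sec and tv_usec, but the C standard does not guarantee it 
for pointers or floating-point members.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical record holding only the field discussed in this thread. */
struct record_sketch {
  struct timeval offset_time;
};

/* Allocates n records with every byte set to zero.
   Returns NULL if the allocation fails; the caller frees the result. */
struct record_sketch *alloc_records_zeroed(size_t n)
{
  assert(n > 0); /* calloc(0, ...) is implementation-defined */
  return calloc(n, sizeof(struct record_sketch));
}
```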

I put a watchpoint on
  sources[0]->stats.offset_time.tv_sec
with condition '> 0x'.
It had never triggered by the time that, at the call tree
  main () at main.c:304
  SCH_MainLoop () at sched.c:470
  read_from_socket () at ntp_io.c:215
  NSR_ProcessReceive () at ntp_sources.c:258
  receive_packet () at ntp_core.c:1063
  SRC_SelectSource () sources.c:693
  REF_SetReference () at reference.c:408
  LCL_AccumulateOffset () at local.c:446
  slew_sources () at sources.c:761
   sources is defined static in this file.
  SST_SlewSamples () at sourcestats.c:698
   parameter inst = sources[1]->stats in caller
  UTI_DiffTimevalsToDouble () at util.c:161
   parameter b = inst->offset_time of caller

the value was, for the first time, used to calculate the result of 
UTI_DiffTimevalsToDouble (line numbers apply to 1.23-2 sources plus the 
instrumentation patches I sent you off-list).

This missing initialization was obviously harmless in the 32-bit version, but 
you should ask upstream for the implications of a huge random starting value.
It might be that with your divider/remainder patch the program won't loop till 
the sun goes out, but nonetheless take an eternity until the system time 
converges to UTC.
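
To make the divider/remainder idea concrete, here is a minimal sketch of a 
normalisation that does a fixed amount of work regardless of how large 
tv_usec is. It assumes the looping version stepped one second at a time, 
which this thread implies but does not show. normalise_timeval_sketch is a 
hypothetical name, not the actual util.c function, and the code does not make 
a garbage value meaningful; it only bounds the work done on it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Brings tv_usec into the range 0 <= tv_usec < 1000000 and carries the
   whole seconds into tv_sec, using one division and one remainder. */
void normalise_timeval_sketch(struct timeval *tv)
{
  long long usec, carry, rem;

  assert(tv != NULL);
  usec = (long long)tv->tv_usec;
  carry = usec / 1000000; /* truncates toward zero */
  rem = usec % 1000000;   /* has the sign of usec */

  if (rem < 0) {          /* borrow one second for negative remainders */
    rem += 1000000;
    carry -= 1;
  }
  tv->tv_sec += carry;
  tv->tv_usec = rem;
}
```

For example, {10, 2500000} becomes {12, 500000} and {10, -1} becomes 
{9, 999999}.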










Bug#474294: Moreinfo: Bug#474294: RFH: Chrony goes into endless loop on x86_64

2008-04-27 Thread John Hasler
Peter Pöschl writes:
> When I single-step through main() I lose the interesting process in
> LOG_GoDaemon(). Is there a way to tell the debugger that I want to trace
> the child process after the fork?

Starting chrony with the "-d" option will prevent the fork.  Chronyd will
then remain attached to the terminal and send all messages there.

BTW please cc: [EMAIL PROTECTED] so that this discussion is on record.
-- 
John Hasler 
[EMAIL PROTECTED]
Elmwood, WI USA


