Re: defects for uses of abs(u64) (was: Re: Regression: can't apply frequency offsets above 1000ppm)

2015-09-23 Thread Neil Brown
Joe Perches  writes:

> On Fri, 2015-09-04 at 18:00 -0700, John Stultz wrote:
>> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
>> > On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
>> >> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>> >>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
>> >>> > And just installing chrony from the feeds. With any kernel from 3.17
>> >>> > you'll have wrong estimates at chronyc sourcestats.
>> >>>
>> >>> Wrong estimates? Could you be more specific about what the failure
>> >>> you're seeing is here? The
>> >>>
>> >>> I installed the image above, which comes with a 4.1.6 kernel, and
>> >>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>> >>> internet fairly quickly (at least according to chronyc tracking).
>> >>
>> >> To see the bug with chronyd the initial offset shouldn't be very close
>> >> to zero, so it's forced to correct the offset by adjusting the
>> >> frequency in a larger step.
>> >>
>> >> I'm attaching a simple C program that prints the frequency offset
>> >> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>> >> adjtimex tick is set to 9000. It should show values close to -10
>> >> ppm and I suspect on the BBB it will be much smaller.
>> >
>> > So I spent some time on this late last night and this afternoon.
>> >
>> > It was a little odd because things don't seem totally broken, but
>> > something isn't quite right.
>> >
>> > Digging around it seems the iterative logarithmic approximation done in
>> > timekeeping_freqadjust() wasn't working right. Instead of making
>> > smaller order alternating positive and negative adjustments, it was
>> > doing strange growing adjustments for the same value that weren't large
>> > enough to actually correct things very quickly. This made it much
>> > slower to adapt to specified frequency values.
>> >
>> > The odd bit is, it seems to come down to:
>> > tick_error = abs(tick_error);
>> >
>> > Haven't chased down why yet, but apparently abs() isn't doing what one
>> > would think when passed an s64 value.
>> 
>> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
>> /*
>>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>>  * input types abs() returns a signed long.
>>  * abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()
>>  * for those.
>>  */
>> 
>> Ouch.
>
> Here's a little cocci script that finds more of these in:

Thanks.

Maybe we should also:

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410727cb..aa7d69afdcac 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -208,6 +208,7 @@ extern int _cond_resched(void);
  */
 #define abs(x) ({  \
long ret;   \
+   BUILD_BUG_ON(sizeof(x) > sizeof(long)); \
if (sizeof(x) == sizeof(long)) {\
long __x = (x); \
ret = (__x < 0) ? -__x : __x;   \


so that people won't make the same mistake again.
That finds bugs in
 drivers/md/raid10.c
 drivers/gpu/drm/radeon/radeon_display.c
 kernel/time/clocksource.c
 kernel/time/timekeeping.c
 fs/ext4/mballoc.c

that your cocci script missed.  All are of the form "abs(x - y)".

As sector_t can be either 32-bit or 64-bit, I wonder if an abs_sector()
would be a good idea ... probably not.

Thoughts?

NeilBrown



defects for uses of abs(u64) (was: Re: Regression: can't apply frequency offsets above 1000ppm)

2015-09-04 Thread Joe Perches
On Fri, 2015-09-04 at 18:00 -0700, John Stultz wrote:
> On Fri, Sep 4, 2015 at 5:57 PM, John Stultz  wrote:
> > On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar  wrote:
> >> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
> >>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves  wrote:
> >>> > And just installing chrony from the feeds. With any kernel from 3.17
> >>> > you'll have wrong estimates at chronyc sourcestats.
> >>>
> >>> Wrong estimates? Could you be more specific about what the failure
> >>> you're seeing is here? The
> >>>
> >>> I installed the image above, which comes with a 4.1.6 kernel, and
> >>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
> >>> internet fairly quickly (at least according to chronyc tracking).
> >>
> >> To see the bug with chronyd the initial offset shouldn't be very close
> >> to zero, so it's forced to correct the offset by adjusting the
> >> frequency in a larger step.
> >>
> >> I'm attaching a simple C program that prints the frequency offset
> >> as measured between the REALTIME and MONOTONIC_RAW clocks when the
> >> adjtimex tick is set to 9000. It should show values close to -10
> >> ppm and I suspect on the BBB it will be much smaller.
> >
> > So I spent some time on this late last night and this afternoon.
> >
> > It was a little odd because things don't seem totally broken, but
> > something isn't quite right.
> >
> > Digging around it seems the iterative logarithmic approximation done in
> > timekeeping_freqadjust() wasn't working right. Instead of making
> > smaller order alternating positive and negative adjustments, it was
> > doing strange growing adjustments for the same value that weren't large
> > enough to actually correct things very quickly. This made it much
> > slower to adapt to specified frequency values.
> >
> > The odd bit is, it seems to come down to:
> > tick_error = abs(tick_error);
> >
> > Haven't chased down why yet, but apparently abs() isn't doing what one
> > would think when passed an s64 value.
> 
> Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
> /*
>  * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
>  * input types abs() returns a signed long.
>  * abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()
>  * for those.
>  */
> 
> Ouch.

Here's a little cocci script that finds more of these in:

lib/percpu_counter.c
drivers/input/joystick/walkera0701.c
drivers/md/raid5.c
drivers/spi/spi-pxa2xx.c
fs/f2fs/debug.c

$ cat abs.cocci
@@
u64 t;
@@

*   abs(t)

@@
s64 t;
@@

*   abs(t)

@@
long long t;
@@

*   abs(t)

@@
unsigned long long t;
@@

*   abs(t)

@@
uint64_t t;
@@

*   abs(t)

@@
int64_t t;
@@

*   abs(t)

$

diff -u -p ./lib/percpu_counter.c /tmp/nothing/lib/percpu_counter.c
--- ./lib/percpu_counter.c
+++ /tmp/nothing/lib/percpu_counter.c
@@ -203,7 +203,6 @@ int __percpu_counter_compare(struct perc
 
count = percpu_counter_read(fbc);
/* Check to see if rough count will be sufficient for comparison */
-   if (abs(count - rhs) > (batch * num_online_cpus())) {
if (count > rhs)
return 1;
else

diff -u -p ./drivers/input/joystick/walkera0701.c /tmp/nothing/drivers/input/joystick/walkera0701.c
--- ./drivers/input/joystick/walkera0701.c
+++ /tmp/nothing/drivers/input/joystick/walkera0701.c
@@ -150,7 +150,6 @@ static void walkera0701_irq_handler(void
if (w->counter == 24) { /* full frame */
walkera0701_parse_frame(w);
w->counter = NO_SYNC;
-   if (abs(pulse_time - SYNC_PULSE) < RESERVE) /* new frame sync */
w->counter = 0;
} else {
if ((pulse_time > (ANALOG_MIN_PULSE - RESERVE)
@@ -161,7 +160,6 @@ static void walkera0701_irq_handler(void
} else
w->counter = NO_SYNC;
}
-   } else if (abs(pulse_time - SYNC_PULSE - BIN0_PULSE) <
RESERVE + BIN1_PULSE - BIN0_PULSE)  /* frame sync .. */
w->counter = 0;

diff -u -p ./drivers/md/raid5.c /tmp/nothing/drivers/md/raid5.c
--- ./drivers/md/raid5.c
+++ /tmp/nothing/drivers/md/raid5.c
@@ -6701,8 +6701,6 @@ static int run(struct mddev *mddev)
 * readonly mode so it can take control before
 * allowing any writes.  So just check for that.
 */
-   if (abs(min_offset_diff) >= mddev->chunk_sectors &&
-   abs(min_offset_diff) >= mddev->new_chunk_sectors)
/* not really in-place - so OK */;
else if (mddev->ro == 0) {
printk(KERN_ERR "md/raid:%s: in-place reshape "

diff -u -p ./drivers/spi/spi-pxa2xx.c 
