Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-27 Thread Martin Maechler
Dear Florent,

thank you for striving to clearly disentangle and present the
issue below.
That is a nice "role model" way of approaching such topics!

> Florent Angly 
> on Fri, 27 Jan 2017 10:24:39 +0100 writes:

> Martin, I agree with you that +0 and -0 should generally be treated as
> equal, and R does a fine job in this respect. The Wikipedia article on
> signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
> view but also highlights that +0 and -0 can be treated differently in
> particular situations, including their interpretation as mathematical
> limits (as in the 1/-0 case). Indeed, the main question here is
> whether head() and tail() represent a special case that would benefit
> from differentiating between +0 and -0.

> We can break down the discussion into two problems:
> A/ the discrepancy between the implementation of R head() and tail()
> and the documentation of these functions (where the use of zero is not
> documented and thus not permissible),

Ehm, no, in R (and many other software systems),

  "not documented" does *NOT* entail "not permissible"


> B/ the discrepancy between the implementation of R head() and tail()
> and their GNU equivalent (which allow zeros and differentiate between
> -0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

This discrepancy, as you mention later, comes from the fact that
these arguments are basically strings in the Unix tools (GNU being a
special case of Unix here) and integers in R.

Below, I'm giving my personal view of the issue:

> There are several possible solutions to address these discrepancies:

> 1/ Leave the code as-is but document its behavior with respect to zero
> (zeros allowed, with negative zeros treated like positive zeros).
> Advantages: This is the path of least resistance, and discrepancy A is fixed.
> Disadvantages: Discrepancy B remains (but is documented).

That would be my "clear" choice.


> 2/ Leave the documentation as-is but reflect this in code by not
> allowing zeros at all.
> Advantages: Discrepancy A is fixed.
> Disadvantages: Discrepancy B remains in some form (but is documented).
> Need to deprecate the usage of +0 (which was not clearly documented
> but may have been assumed by users).

2/ looks "uniformly inferior" to 1/ to me


> 3/ Update the code and documentation to differentiate between +0 and -0.
> Advantages: In my eyes, this is the ideal solution since discrepancy A
> and (most of) B are resolved.
> Disadvantages: It is unclear how to implement this solution and the
> implications it may have on backward compatibility:
> a/ Allow -0 (as double). But is it supported on all platforms used
> by R (see ?Arithmetic)? William has raised the issue that negative
> zero cannot be represented as an integer. Should head() and tail()
> then strictly check double input (while forbidding integers)?
> b/ The input could always be given as a character string. This would allow
> mirroring GNU tail even more closely (where the prefix "+" is used to
> invert the meaning of n). This probably involves a fair amount of work
> and careful handling of deprecation.

3/ involves quite a few complications, and in my view, the
   advantages you list do not come close to outweighing the drawbacks.
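
To make the complications of variant b/ concrete, here is a minimal
sketch, assuming a hypothetical helper tail_chr() that is not part of R
and that takes n as a string in the style of GNU tail. Note that in GNU
tail "-K" means the same as "K" (the last K items), whereas R's
tail(x, -K) drops the first K, so the two conventions already clash for
negative numbers:

```r
# Hypothetical helper, NOT part of R: emulates GNU tail's string argument
# on top of the existing integer-based tail().
tail_chr <- function(x, n) {
  stopifnot(is.character(n), length(n) == 1L,
            grepl("^[+-]?[0-9]+$", n))
  k <- as.integer(sub("^[+-]", "", n))   # magnitude without the sign prefix
  if (startsWith(n, "+")) {
    # GNU "tail -n +K": output starting with element K,
    # so "+0" and "+1" both return everything.
    if (k <= 1L) x else tail(x, -(k - 1L))
  } else {
    # GNU "K" and "-K": the last K elements.
    tail(x, k)
  }
}

tail_chr(1:5, "+0")   # all five elements
tail_chr(1:5, "0")    # integer(0)
tail_chr(1:5, "-2")   # 4 5, unlike tail(1:5, -2), which gives 3 4 5
```

Any real implementation would additionally have to decide what to do
with existing numeric calls, which is where the deprecation work comes in.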


> On 26 January 2017 at 16:51, William Dunlap  wrote:
>> In addition, signed zeroes only exist for floating point numbers - the
>> bit patterns for as.integer(0) and as.integer(-0) are identical.

indeed!
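
A short illustration of that point (variable names are only for this
example): the sign of zero survives in a double, but not in an integer.

```r
# Signed zero is a property of doubles only.
neg_dbl <- -0    # double: negative zero
neg_int <- -0L   # integer: plain zero, the sign is not representable

1 / neg_dbl                                # -Inf
1 / neg_int                                # Inf
identical(as.integer(0), as.integer(-0))   # TRUE
```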

>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>> 
>> 
>> On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
>>  wrote:
>>> > Florent Angly
>>> > on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>>> 
>>> > Hi all,
>>> > The documentation for head() and tail() describes the behavior of
>>> > these generic functions when n is strictly positive (n > 0) and
>>> > strictly negative (n < 0). How these functions work when given a zero
>>> > value is not defined.
>>> 
>>> > Both GNU command-line utilities head and tail behave differently with +0 and -0:
>>> > http://man7.org/linux/man-pages/man1/head.1.html
>>> > http://man7.org/linux/man-pages/man1/tail.1.html
>>> 
>>> > Since R supports signed zeros (1/+0 != 1/-0)
>>> 
>>> whoa, whoa, .. slow down --  The above is misleading!
>>> 
>>> Rather read in  ?Arithmetic (*the* reference to consult for such issues),
>>> where the 2nd part of the following section
>>> 
>>> || Implementation limits:
>>> ||
>>> ||  [..]
>>> ||
>>> ||  Another potential issue is signed zeroes: on IEC 60559 platforms
>>> ||  there are two zeroes with internal 

Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-27 Thread Florent Angly
Martin, I agree with you that +0 and -0 should generally be treated as
equal, and R does a fine job in this respect. The Wikipedia article on
signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
view but also highlights that +0 and -0 can be treated differently in
particular situations, including their interpretation as mathematical
limits (as in the 1/-0 case). Indeed, the main question here is
whether head() and tail() represent a special case that would benefit
from differentiating between +0 and -0.

We can break down the discussion into two problems:
A/ the discrepancy between the implementation of R head() and tail()
and the documentation of these functions (where the use of zero is not
documented and thus not permissible),
B/ the discrepancy between the implementation of R head() and tail()
and their GNU equivalent (which allow zeros and differentiate between
-0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

There are several possible solutions to address these discrepancies:

1/ Leave the code as-is but document its behavior with respect to zero
(zeros allowed, with negative zeros treated like positive zeros).
Advantages: This is the path of least resistance, and discrepancy A is fixed.
Disadvantages: Discrepancy B remains (but is documented).

2/ Leave the documentation as-is but reflect this in code by not
allowing zeros at all.
Advantages: Discrepancy A is fixed.
Disadvantages: Discrepancy B remains in some form (but is documented).
Need to deprecate the usage of +0 (which was not clearly documented
but may have been assumed by users).

3/ Update the code and documentation to differentiate between +0 and -0.
Advantages: In my eyes, this is the ideal solution since discrepancy A
and (most of) B are resolved.
Disadvantages: It is unclear how to implement this solution and the
implications it may have on backward compatibility:
   a/ Allow -0 (as double). But is it supported on all platforms used
by R (see ?Arithmetic)? William has raised the issue that negative
zero cannot be represented as an integer. Should head() and tail()
then strictly check double input (while forbidding integers)?
   b/ The input could always be given as a character string. This would allow
mirroring GNU tail even more closely (where the prefix "+" is used to
invert the meaning of n). This probably involves a fair amount of work
and careful handling of deprecation.
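
For illustration, variant a/ could be sketched as below. Both
is_negative_zero() and head_signed() are hypothetical helpers, not part
of R; the sketch assumes a platform with IEC 60559 doubles, where 1/-0
is -Inf.

```r
# Hypothetical helpers, NOT part of R.
# TRUE only for a single double whose value is zero with a negative sign.
is_negative_zero <- function(n) {
  is.double(n) && length(n) == 1L && !is.na(n) && n == 0 && 1 / n < 0
}

# Treat -0 as "all but the last 0 elements", i.e. everything.
head_signed <- function(x, n = 6L) {
  if (is_negative_zero(n)) x else head(x, n)
}

head_signed(1:5, -0)    # all five elements
head_signed(1:5, 0)     # integer(0)
head_signed(1:5, -0L)   # integer(0): the sign is lost for integer input
```

The last call shows the problem raised by William: as soon as n is an
integer, the distinction silently disappears.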



On 26 January 2017 at 16:51, William Dunlap  wrote:
> In addition, signed zeroes only exist for floating point numbers - the
> bit patterns for as.integer(0) and as.integer(-0) are identical.
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
>  wrote:
>>> Florent Angly 
>>> on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>>
>> > Hi all,
>> > The documentation for head() and tail() describes the behavior of
>> > these generic functions when n is strictly positive (n > 0) and
>> > strictly negative (n < 0). How these functions work when given a zero
>> > value is not defined.
>>
>> > Both GNU command-line utilities head and tail behave differently with +0 and -0:
>> > http://man7.org/linux/man-pages/man1/head.1.html
>> > http://man7.org/linux/man-pages/man1/tail.1.html
>>
>> > Since R supports signed zeros (1/+0 != 1/-0)
>>
>> whoa, whoa, .. slow down --  The above is misleading!
>>
>> Rather read in  ?Arithmetic (*the* reference to consult for such issues),
>> where the 2nd part of the following section
>>
>>  || Implementation limits:
>>  ||
>>  ||  [..]
>>  ||
>>  ||  Another potential issue is signed zeroes: on IEC 60559 platforms
>>  ||  there are two zeroes with internal representations differing by
>>  ||  sign.  Where possible R treats them as the same, but for example
>>  ||  direct output from C code often does not do so and may output
>>  ||  ‘-0.0’ (and on Windows whether it does so or not depends on the
>>  ||  version of Windows).  One place in R where the difference might be
>>  ||  seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
>>  ||  the sign of zero ‘x’.  Another place is ‘identical(0, -0, num.eq =
>>  ||  FALSE)’.
>>
>> says the *contrary* ( __Where possible R treats them as the same__ ):
>> We do _not_ want to distinguish -0 and +0,
>> but there are cases where it is unavoidable.
>>
>> And there are good reasons (mathematics !!) for this.
>>
>> I'm pretty sure that it would be quite a mistake to start
>> differentiating it here...  but of course we can continue
>> discussing here if you like.
>>
>> Martin Maechler
>> ETH Zurich and R Core
>>
>>
>> > and the R head() and tail() functions are modeled after
>> > their GNU counterparts, I would expect the R functions to
>> > distinguish between +0 and -0
>>
>> >> tail(1:5, n=0)
>> > integer(0)
>> >> tail(1:5, 

Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-26 Thread William Dunlap via R-devel
In addition, signed zeroes only exist for floating point numbers - the
bit patterns for as.integer(0) and as.integer(-0) are identical.
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
 wrote:
>> Florent Angly 
>> on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>
> > Hi all,
> > The documentation for head() and tail() describes the behavior of
> > these generic functions when n is strictly positive (n > 0) and
> > strictly negative (n < 0). How these functions work when given a zero
> > value is not defined.
>
> > Both GNU command-line utilities head and tail behave differently with +0 and -0:
> > http://man7.org/linux/man-pages/man1/head.1.html
> > http://man7.org/linux/man-pages/man1/tail.1.html
>
> > Since R supports signed zeros (1/+0 != 1/-0)
>
> whoa, whoa, .. slow down --  The above is misleading!
>
> Rather read in  ?Arithmetic (*the* reference to consult for such issues),
> where the 2nd part of the following section
>
>  || Implementation limits:
>  ||
>  ||  [..]
>  ||
>  ||  Another potential issue is signed zeroes: on IEC 60559 platforms
>  ||  there are two zeroes with internal representations differing by
>  ||  sign.  Where possible R treats them as the same, but for example
>  ||  direct output from C code often does not do so and may output
>  ||  ‘-0.0’ (and on Windows whether it does so or not depends on the
>  ||  version of Windows).  One place in R where the difference might be
>  ||  seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
>  ||  the sign of zero ‘x’.  Another place is ‘identical(0, -0, num.eq =
>  ||  FALSE)’.
>
> says the *contrary* ( __Where possible R treats them as the same__ ):
> We do _not_ want to distinguish -0 and +0,
> but there are cases where it is unavoidable.
>
> And there are good reasons (mathematics !!) for this.
>
> I'm pretty sure that it would be quite a mistake to start
> differentiating it here...  but of course we can continue
> discussing here if you like.
>
> Martin Maechler
> ETH Zurich and R Core
>
>
> > and the R head() and tail() functions are modeled after
> > their GNU counterparts, I would expect the R functions to
> > distinguish between +0 and -0
>
> >> tail(1:5, n=0)
> > integer(0)
> >> tail(1:5, n=1)
> > [1] 5
> >> tail(1:5, n=2)
> > [1] 4 5
>
> >> tail(1:5, n=-2)
> > [1] 3 4 5
> >> tail(1:5, n=-1)
> > [1] 2 3 4 5
> >> tail(1:5, n=-0)
> > integer(0)  # expected 1:5
>
> >> head(1:5, n=0)
> > integer(0)
> >> head(1:5, n=1)
> > [1] 1
> >> head(1:5, n=2)
> > [1] 1 2
>
> >> head(1:5, n=-2)
> > [1] 1 2 3
> >> head(1:5, n=-1)
> > [1] 1 2 3 4
> >> head(1:5, n=-0)
> > integer(0)  # expected 1:5
>
> > For both head() and tail(), I expected 1:5 as output but got
> > integer(0). I obtained similar results using a data.frame and a
> > function as x argument.
>
> > An easy fix would be to explicitly state in the documentation what n =
> > 0 does, and that there is no practical difference between -0 and +0.
> > However, in my eyes, the better approach would be to implement support
> > for -0 and document it. What do you think?
>
> > Best,
>
> > Florent
>
>
> > PS/ My sessionInfo() gives:
> > R version 3.3.2 (2016-10-31)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> > locale:
> > [1] LC_COLLATE=German_Switzerland.1252
> > LC_CTYPE=German_Switzerland.1252
> > LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> > LC_TIME=German_Switzerland.1252
>
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
>
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>

Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-26 Thread Martin Maechler
> Florent Angly 
> on Wed, 25 Jan 2017 16:31:45 +0100 writes:

> Hi all,
> The documentation for head() and tail() describes the behavior of
> these generic functions when n is strictly positive (n > 0) and
> strictly negative (n < 0). How these functions work when given a zero
> value is not defined.

> Both GNU command-line utilities head and tail behave differently with +0 and -0:
> http://man7.org/linux/man-pages/man1/head.1.html
> http://man7.org/linux/man-pages/man1/tail.1.html

> Since R supports signed zeros (1/+0 != 1/-0) 

whoa, whoa, .. slow down --  The above is misleading!

Rather read in  ?Arithmetic (*the* reference to consult for such issues),
where the 2nd part of the following section

 || Implementation limits:
 || 
 ||  [..]
 || 
 ||  Another potential issue is signed zeroes: on IEC 60559 platforms
 ||  there are two zeroes with internal representations differing by
 ||  sign.  Where possible R treats them as the same, but for example
 ||  direct output from C code often does not do so and may output
 ||  ‘-0.0’ (and on Windows whether it does so or not depends on the
 ||  version of Windows).  One place in R where the difference might be
 ||  seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
 ||  the sign of zero ‘x’.  Another place is ‘identical(0, -0, num.eq =
 ||  FALSE)’.

says the *contrary* ( __Where possible R treats them as the same__ ):
We do _not_ want to distinguish -0 and +0,
but there are cases where it is unavoidable.

And there are good reasons (mathematics !!) for this.

I'm pretty sure that it would be quite a mistake to start
differentiating it here...  but of course we can continue
discussing here if you like.
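
As a small illustration of the ?Arithmetic passage above (x and y are
just example names), the two zeroes compare equal almost everywhere and
differ only in the places the help page mentions:

```r
x <- 0
y <- -0

x == y                            # TRUE
identical(x, y)                   # TRUE (num.eq = TRUE is the default)
identical(x, y, num.eq = FALSE)   # FALSE: compares the bit patterns
c(1 / x, 1 / y)                   # Inf -Inf
```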

Martin Maechler
ETH Zurich and R Core


> and the R head() and tail() functions are modeled after
> their GNU counterparts, I would expect the R functions to
> distinguish between +0 and -0

>> tail(1:5, n=0)
> integer(0)
>> tail(1:5, n=1)
> [1] 5
>> tail(1:5, n=2)
> [1] 4 5

>> tail(1:5, n=-2)
> [1] 3 4 5
>> tail(1:5, n=-1)
> [1] 2 3 4 5
>> tail(1:5, n=-0)
> integer(0)  # expected 1:5

>> head(1:5, n=0)
> integer(0)
>> head(1:5, n=1)
> [1] 1
>> head(1:5, n=2)
> [1] 1 2

>> head(1:5, n=-2)
> [1] 1 2 3
>> head(1:5, n=-1)
> [1] 1 2 3 4
>> head(1:5, n=-0)
> integer(0)  # expected 1:5

> For both head() and tail(), I expected 1:5 as output but got
> integer(0). I obtained similar results using a data.frame and a
> function as x argument.

> An easy fix would be to explicitly state in the documentation what n =
> 0 does, and that there is no practical difference between -0 and +0.
> However, in my eyes, the better approach would be to implement support
> for -0 and document it. What do you think?

> Best,

> Florent


> PS/ My sessionInfo() gives:
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1

> locale:
> [1] LC_COLLATE=German_Switzerland.1252
> LC_CTYPE=German_Switzerland.1252
> LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> LC_TIME=German_Switzerland.1252

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base


[Rd] Undefined behavior of head() and tail() with n = 0

2017-01-25 Thread Florent Angly
Hi all,

The documentation for head() and tail() describes the behavior of
these generic functions when n is strictly positive (n > 0) and
strictly negative (n < 0). How these functions work when given a zero
value is not defined.

Both GNU command-line utilities head and tail behave differently with +0 and -0:
http://man7.org/linux/man-pages/man1/head.1.html
http://man7.org/linux/man-pages/man1/tail.1.html

Since R supports signed zeros (1/+0 != 1/-0) and the R head() and
tail() functions are modeled after their GNU counterparts, I would
expect the R functions to distinguish between +0 and -0:

> tail(1:5, n=0)
integer(0)
> tail(1:5, n=1)
[1] 5
> tail(1:5, n=2)
[1] 4 5

> tail(1:5, n=-2)
[1] 3 4 5
> tail(1:5, n=-1)
[1] 2 3 4 5
> tail(1:5, n=-0)
integer(0)  # expected 1:5

> head(1:5, n=0)
integer(0)
> head(1:5, n=1)
[1] 1
> head(1:5, n=2)
[1] 1 2

> head(1:5, n=-2)
[1] 1 2 3
> head(1:5, n=-1)
[1] 1 2 3 4
> head(1:5, n=-0)
integer(0)  # expected 1:5

For both head() and tail(), I expected 1:5 as output but got
integer(0). I obtained similar results using a data.frame and a
function as x argument.
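
The result can be understood from the arithmetic involved. The following
is a simplified sketch for plain vectors only; head_sketch() is a
hypothetical name, and the logic is assumed to mirror what head() does
rather than copied from the R sources. Since -0 < 0 is FALSE, a negative
zero takes the same branch as a positive zero:

```r
# Hypothetical, simplified re-implementation for plain vectors only.
head_sketch <- function(x, n = 6L) {
  len <- length(x)
  # Negative n: drop the last |n| elements; otherwise keep the first n.
  k <- if (n < 0) max(len + n, 0) else min(n, len)
  x[seq_len(k)]
}

head_sketch(1:5, -2)   # 1 2 3
head_sketch(1:5, -0)   # integer(0), same as n = 0
```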

An easy fix would be to explicitly state in the documentation what n =
0 does, and that there is no practical difference between -0 and +0.
However, in my eyes, the better approach would be to implement support
for -0 and document it. What do you think?

Best,

Florent


PS/ My sessionInfo() gives:
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Switzerland.1252
LC_CTYPE=German_Switzerland.1252
LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
 LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
