Re: [Rd] Undefined behavior of head() and tail() with n = 0
Dear Florent,
thank you for striving to clearly disentangle and present the issue below. That is a nice "role model" way of approaching such topics!

> Florent Angly on Fri, 27 Jan 2017 10:24:39 +0100 writes:

> Martin, I agree with you that +0 and -0 should generally be treated as
> equal, and R does a fine job in this respect. The Wikipedia article on
> signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
> view but also highlights that +0 and -0 can be treated differently in
> particular situations, including their interpretation as mathematical
> limits (as in the 1/-0 case). Indeed, the main question here is
> whether head() and tail() represent a special case that would benefit
> from differentiating between +0 and -0.

> We can break down the discussion into two problems:

> A/ the discrepancy between the implementation of R head() and tail()
> and the documentation of these functions (where the use of zero is not
> documented and thus not permissible),

Ehm, no, in R (and many other software systems), "not documented" does *NOT* entail "not permissible".

> B/ the discrepancy between the implementation of R head() and tail()
> and their GNU equivalent (which allow zeros and differentiate between
> -0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

This discrepancy, as you mention later, comes from the fact that basically, these arguments are strings in the Unix tools (GNU being a special case of Unix, here) and integers in R.

Below, I'm giving my personal view of the issue:

> There are several possible solutions to address these discrepancies:

> 1/ Leave the code as-is but document its behavior with respect to zero
> (zeros allowed, with negative zeros treated like positive zeros).
> Advantages: This is the path of least resistance, and discrepancy A is fixed.
> Disadvantages: Discrepancy B remains (but is documented).

That would be my "clear" choice.
> 2/ Leave the documentation as-is but reflect this in code by not
> allowing zeros at all.
> Advantages: Discrepancy A is fixed.
> Disadvantages: Discrepancy B remains in some form (but is documented).
> Need to deprecate the usage of +0 (which was not clearly documented
> but may have been assumed by users).

2/ looks "uniformly inferior" to 1/ to me.

> 3/ Update the code and documentation to differentiate between +0 and -0.
> Advantages: In my eyes, this is the ideal solution since discrepancy A
> and (most of) B are resolved.
> Disadvantages: It is unclear how to implement this solution and the
> implications it may have on backward compatibility:
> a/ Allow -0 (as double). But is it supported on all platforms used
> by R (see ?Arithmetic)? William has raised the issue that negative
> zero cannot be represented as an integer. Should head() and tail()
> then strictly check double input (while forbidding integers)?
> b/ The input could always be as character. This would allow us to
> mirror even more closely GNU tail (where the prefix "+" is used to
> invert the meaning of n). This probably involves a fair amount of work
> and careful handling of deprecation.

3/ involves quite a few complications, and in my view, your advantages do not even come close to outweighing the drawbacks.

> On 26 January 2017 at 16:51, William Dunlap wrote:
>> In addition, signed zeroes only exist for floating point numbers - the
>> bit patterns for as.integer(0) and as.integer(-0) are identical.

indeed!
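Option 1 only asks to document what the code already does. Written as executable checks, that behavior could look like the sketch below. It assumes the default `head()`/`tail()` methods from the utils package and is based on the outputs reported in this thread for R 3.3.2; the vector `x` and data frame `df` are illustrative only.

```r
# Option 1 written down as checks: n = 0 and n = -0 both select nothing.
# In R, -0 == 0 is TRUE and -0 < 0 is FALSE, so an ordinary comparison
# on n cannot tell the two zeroes apart.
x  <- 1:5
df <- data.frame(a = 1:5)

stopifnot(
  length(head(x, n = 0))  == 0L,
  length(head(x, n = -0)) == 0L,
  length(tail(x, n = 0))  == 0L,
  length(tail(x, n = -0)) == 0L,
  identical(head(x, n = 0), head(x, n = -0)),
  nrow(head(df, n = -0)) == 0L,   # data frames behave the same way
  nrow(tail(df, n = -0)) == 0L
)
```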
Re: [Rd] Undefined behavior of head() and tail() with n = 0
Martin, I agree with you that +0 and -0 should generally be treated as equal, and R does a fine job in this respect. The Wikipedia article on signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this view but also highlights that +0 and -0 can be treated differently in particular situations, including their interpretation as mathematical limits (as in the 1/-0 case). Indeed, the main question here is whether head() and tail() represent a special case that would benefit from differentiating between +0 and -0.

We can break down the discussion into two problems:

A/ the discrepancy between the implementation of R head() and tail() and the documentation of these functions (where the use of zero is not documented and thus not permissible),

B/ the discrepancy between the implementation of R head() and tail() and their GNU equivalent (which allow zeros and differentiate between -0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

There are several possible solutions to address these discrepancies:

1/ Leave the code as-is but document its behavior with respect to zero (zeros allowed, with negative zeros treated like positive zeros).
Advantages: This is the path of least resistance, and discrepancy A is fixed.
Disadvantages: Discrepancy B remains (but is documented).

2/ Leave the documentation as-is but reflect this in code by not allowing zeros at all.
Advantages: Discrepancy A is fixed.
Disadvantages: Discrepancy B remains in some form (but is documented). Need to deprecate the usage of +0 (which was not clearly documented but may have been assumed by users).

3/ Update the code and documentation to differentiate between +0 and -0.
Advantages: In my eyes, this is the ideal solution since discrepancy A and (most of) B are resolved.
Disadvantages: It is unclear how to implement this solution and the implications it may have on backward compatibility:
a/ Allow -0 (as double). But is it supported on all platforms used by R (see ?Arithmetic)? William has raised the issue that negative zero cannot be represented as an integer. Should head() and tail() then strictly check double input (while forbidding integers)?
b/ The input could always be as character. This would allow us to mirror even more closely GNU tail (where the prefix "+" is used to invert the meaning of n). This probably involves a fair amount of work and careful handling of deprecation.
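Option 3b would pass the count as a string, as the GNU tools effectively do. The sketch below is hypothetical: `parse_count()` and `head_chr()` are invented names, not part of R. Only the `head` side is shown, following the GNU `head -n -NUM` convention that a leading "-" means "all but the last k elements".

```r
# Hypothetical sketch of option 3b (not an existing R API): read the count
# as a character string so that "-0", "0" and "+0" stay distinct without
# relying on the sign bit of a floating-point zero.
parse_count <- function(n) {
  stopifnot(is.character(n), length(n) == 1L)
  # capture an optional sign and the digits
  m <- regmatches(n, regexec("^([+-]?)([0-9]+)$", n))[[1]]
  if (length(m) == 0L) stop("invalid count: ", sQuote(n))
  list(sign = m[2], k = as.integer(m[3]))  # sign is "", "+" or "-"
}

# Hypothetical head() variant: "-k" keeps all but the last k elements,
# anything else keeps the first k elements.
head_chr <- function(x, n) {
  p   <- parse_count(n)
  len <- length(x)
  keep <- if (p$sign == "-") max(len - p$k, 0L) else min(p$k, len)
  x[seq_len(keep)]
}

head_chr(1:5, "-0")  # all five elements
head_chr(1:5, "0")   # integer(0)
head_chr(1:5, "-2")  # 1 2 3
```

The sign survives here because it is kept as text rather than as a floating-point sign bit; the price is a new argument type and the deprecation work mentioned above.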
Re: [Rd] Undefined behavior of head() and tail() with n = 0
In addition, signed zeroes only exist for floating point numbers - the bit patterns for as.integer(0) and as.integer(-0) are identical.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
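Bill's point can be checked directly by looking at the stored bytes. A minimal sketch using base R's `writeBin()` with a raw vector as the connection, which makes it return the bytes instead of writing them anywhere (byte order depends on the platform, but only equality matters here):

```r
# Doubles: IEEE 754 / IEC 60559 stores a sign bit, so 0 and -0 differ.
dbl_pos <- writeBin(0, raw())
dbl_neg <- writeBin(-0, raw())

# Integers: coercing -0 gives plain 0L, so there is nothing to distinguish.
int_pos <- writeBin(0L, raw())
int_neg <- writeBin(as.integer(-0), raw())

identical(dbl_pos, dbl_neg)  # FALSE: the sign bit differs
identical(int_pos, int_neg)  # TRUE: same bit pattern
1 / as.integer(-0)           # Inf, not -Inf
```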
Re: [Rd] Undefined behavior of head() and tail() with n = 0
> Florent Angly on Wed, 25 Jan 2017 16:31:45 +0100 writes:

> Hi all,
> The documentation for head() and tail() describes the behavior of
> these generic functions when n is strictly positive (n > 0) and
> strictly negative (n < 0). How these functions work when given a zero
> value is not defined.

> Both GNU command-line utilities head and tail behave differently with +0 and -0:
> http://man7.org/linux/man-pages/man1/head.1.html
> http://man7.org/linux/man-pages/man1/tail.1.html

> Since R supports signed zeros (1/+0 != 1/-0)

whoa, whoa, .. slow down -- The above is misleading!

Rather read in ?Arithmetic (*the* reference to consult for such issues),
where the 2nd part of the following section

|| Implementation limits:
||
|| [..]
||
|| Another potential issue is signed zeroes: on IEC 60559 platforms
|| there are two zeroes with internal representations differing by
|| sign. Where possible R treats them as the same, but for example
|| direct output from C code often does not do so and may output
|| ‘-0.0’ (and on Windows whether it does so or not depends on the
|| version of Windows). One place in R where the difference might be
|| seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
|| the sign of zero ‘x’. Another place is ‘identical(0, -0, num.eq =
|| FALSE)’.

says the *contrary* ( __Where possible R treats them as the same__ ):
We do _not_ want to distinguish -0 and +0,
but there are cases where it is unavoidable.

And there are good reasons (mathematics !!) for this.

I'm pretty sure that it would be quite a mistake to start
differentiating it here... but of course we can continue
discussing here if you like.
Martin Maechler
ETH Zurich and R Core

> and the R head() and tail() functions are modeled after
> their GNU counterparts, I would expect the R functions to
> distinguish between +0 and -0

>> tail(1:5, n=0)
> integer(0)
>> tail(1:5, n=1)
> [1] 5
>> tail(1:5, n=2)
> [1] 4 5

>> tail(1:5, n=-2)
> [1] 3 4 5
>> tail(1:5, n=-1)
> [1] 2 3 4 5
>> tail(1:5, n=-0)
> integer(0) # expected 1:5

>> head(1:5, n=0)
> integer(0)
>> head(1:5, n=1)
> [1] 1
>> head(1:5, n=2)
> [1] 1 2

>> head(1:5, n=-2)
> [1] 1 2 3
>> head(1:5, n=-1)
> [1] 1 2 3 4
>> head(1:5, n=-0)
> integer(0) # expected 1:5

> For both head() and tail(), I expected 1:5 as output but got
> integer(0). I obtained similar results using a data.frame and a
> function as x argument.

> An easy fix would be to explicitly state in the documentation what n =
> 0 does, and that there is no practical difference between -0 and +0.
> However, in my eyes, the better approach would be to implement support
> for -0 and document it. What do you think?

> Best,
> Florent
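The two places named in the quoted ?Arithmetic passage, next to the comparisons where R deliberately treats both zeroes alike, can be tried at the console. A small illustration using nothing beyond base R:

```r
# Where the sign of zero shows through, as listed in ?Arithmetic:
1 / 0                              # Inf
1 / -0                             # -Inf
identical(0, -0, num.eq = FALSE)   # FALSE: bitwise comparison

# Where R treats the two zeroes as the same:
0 == -0                            # TRUE
identical(0, -0)                   # TRUE: default num.eq = TRUE uses ==
```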
[Rd] Undefined behavior of head() and tail() with n = 0
Hi all,
The documentation for head() and tail() describes the behavior of these generic functions when n is strictly positive (n > 0) and strictly negative (n < 0). How these functions work when given a zero value is not defined.

Both GNU command-line utilities head and tail behave differently with +0 and -0:
http://man7.org/linux/man-pages/man1/head.1.html
http://man7.org/linux/man-pages/man1/tail.1.html

Since R supports signed zeros (1/+0 != 1/-0) and the R head() and tail() functions are modeled after their GNU counterparts, I would expect the R functions to distinguish between +0 and -0:

> tail(1:5, n=0)
integer(0)
> tail(1:5, n=1)
[1] 5
> tail(1:5, n=2)
[1] 4 5

> tail(1:5, n=-2)
[1] 3 4 5
> tail(1:5, n=-1)
[1] 2 3 4 5
> tail(1:5, n=-0)
integer(0) # expected 1:5

> head(1:5, n=0)
integer(0)
> head(1:5, n=1)
[1] 1
> head(1:5, n=2)
[1] 1 2

> head(1:5, n=-2)
[1] 1 2 3
> head(1:5, n=-1)
[1] 1 2 3 4
> head(1:5, n=-0)
integer(0) # expected 1:5

For both head() and tail(), I expected 1:5 as output but got integer(0). I obtained similar results using a data.frame and a function as x argument.

An easy fix would be to explicitly state in the documentation what n = 0 does, and that there is no practical difference between -0 and +0. However, in my eyes, the better approach would be to implement support for -0 and document it. What do you think?

Best,
Florent

PS/ My sessionInfo() gives:
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Switzerland.1252
LC_CTYPE=German_Switzerland.1252
LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
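For comparison, a sketch of what distinguishing -0 (option 3a in the follow-up discussion) could look like as thin wrappers around the existing functions. `is_negative_zero()`, `head_signed()` and `tail_signed()` are hypothetical names, neither existing nor proposed R functions. The sign test relies on `1/n` being `-Inf` for a double negative zero, and integer input is never treated as negative because it cannot carry the sign.

```r
# Hypothetical helper (not in base R): TRUE only for a double negative zero.
# Integers cannot carry the sign (as.integer(-0) is 0L), hence is.double().
is_negative_zero <- function(n) {
  is.double(n) && length(n) == 1L && !is.na(n) && n == 0 && 1 / n < 0
}

# Hypothetical wrappers implementing the semantics expected above:
# n = -0 drops nothing, so the whole object is returned.
head_signed <- function(x, n = 6L) {
  if (is_negative_zero(n)) return(x)  # "all but the last 0 elements"
  head(x, n)                          # otherwise defer to utils::head()
}

tail_signed <- function(x, n = 6L) {
  if (is_negative_zero(n)) return(x)  # "all but the first 0 elements"
  tail(x, n)                          # otherwise defer to utils::tail()
}

head_signed(1:5, n = -0)   # all five elements
tail_signed(1:5, n = -0)   # all five elements
head_signed(1:5, n = 0)    # integer(0)
head_signed(1:5, n = -0L)  # integer(0): an integer zero has no sign
```

One consequence is easy to overlook: arithmetic such as `0 * -1` also yields a double negative zero, so a computed `n` could silently switch `head_signed()` from returning nothing to returning everything. This is one concrete form of the risk raised in the replies.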