Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-19 Thread Steve Martin
Thanks for sharing, Martin. You're right that the interface for mFUN
should be more general than I initially thought.*

Perhaps you have other cases/examples where the ina argument is
useful, in which case ignore me, but your example with the robust mFUN
doesn't use the ina argument. What about having mFUN be a function of
x only (NAs and all), with a default of \(x) max(abs(x), na.rm = TRUE)?
It's a minor difference, but it might make the mFUN argument a bit
simpler to use (no need to carry a dummy argument when NAs in x can be
handled directly).
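
To make the suggestion concrete, here is a minimal sketch of what a
one-argument interface might look like. The names zapsmall_1arg and
mF_rob1 are made up for this illustration and are not part of base R;
the body otherwise follows the definition quoted below.

```r
## Sketch only: 'zapsmall_1arg' and 'mF_rob1' are hypothetical names.
zapsmall_1arg <- function(x, digits = getOption("digits"),
                          mFUN = function(x) max(abs(x), na.rm = TRUE),
                          min.d = 0L)
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(is.na(x)))
        return(x)
    mx <- mFUN(x)  # mFUN sees all of x, NAs included
    round(x, digits = if (mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}

## one-argument robust mFUN: boxplot.stats() deals with the NAs itself
mF_rob1 <- function(x) grDevices::boxplot.stats(x, do.conf = FALSE)$stats[5]

x2 <- pi * 100^(-2:2)/10
zapsmall_1arg(c(x2, NA))                       # default mFUN
zapsmall_1arg(c(x2, NA, Inf), mFUN = mF_rob1)  # robust mFUN
```

With this shape, an mFUN that wants the NAs still gets them, and one
that does not can drop them in a single call.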

Steve

* Tangent: Does boxplot.stats() use the number of NA values? The
documentation says NAs are omitted, and a quick scan of the code and
some tests suggest boxplot.stats(x) should give the same result as
boxplot.stats(x[!is.na(x)]), although I may be missing something. But
your point is well taken, and the interface should be more general
than I initially thought.
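
The tangent can be checked on a small example. This is only a sketch
with made-up data (the vector x is not from the thread), and a single
example does not settle the general case.

```r
## Made-up data with two NAs and one outlier.
x <- c(1, 2, NA, 3, 4, 100, NA)

with_na    <- grDevices::boxplot.stats(x)
without_na <- grDevices::boxplot.stats(x[!is.na(x)])

## compares all four components: stats, n, conf, out
all.equal(with_na, without_na)
```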

On Tue, 19 Dec 2023 at 11:25, Martin Maechler wrote:
>
> > Steve Martin
> > on Mon, 18 Dec 2023 07:56:46 -0500 writes:
>
> > Does mFUN() really need to be a function of x and the NA values of x? I
> > can't think of a case where it would be used on anything but the non-NA
> > values of x.
>
> > I think it would be easier to specify a different mFUN() (and document
> > this new argument) if the function has one argument and is applied to
> > the non-NA values of x.
>
> > zapsmall <- function(x,
> >                      digits = getOption("digits"),
> >                      mFUN = function(x) max(abs(x)),
> >                      min.d = 0L) {
> >     if (length(digits) == 0L)
> >         stop("invalid 'digits'")
> >     if (all(ina <- is.na(x)))
> >         return(x)
> >     mx <- mFUN(x[!ina])
> >     round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
> >                       else digits)
> > }
>
> > Steve
>
> Thank you, Steve,
> you are right that it would look simpler to do it that way.
>
> On the other hand, in your case, mFUN() no longer sees the
> original  n observations, and would not know whether there were NAs
> and, if so, how many NAs there were in the original data.
>
> The examples I have on my version of zapsmall's help page (see below)
> use a robust mFUN, "the end of the upper whisker of a box plot":
>
>    mF_rob <- function(x, ina) boxplot.stats(x, do.conf=FALSE)$stats[5]
>
> and if you inspect boxplot.stats() you may know that indeed it
> also wants to use the full data 'x' to compute its statistics and
> then deal with NAs directly.  Your simplified mFUN interface
> would not be fully consistent with boxplot(), and I think could
> not be made so,  hence my more flexible 2-argument "design" for  mFUN().
>
>  and BTW, these examples also exemplify the use of  `min.d`
> about which  Serguei Sokol asked for an example or two.
>
> Here I repeat my definition of zapsmall, and then my current set
> of examples:
>
> zapsmall <- function(x, digits = getOption("digits"),
>                      mFUN = function(x, ina) max(abs(x[!ina])), min.d = 0L)
> {
>     if (length(digits) == 0L)
>         stop("invalid 'digits'")
>     if (all(ina <- is.na(x)))
>         return(x)
>     mx <- mFUN(x, ina)
>     round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
>                       else digits)
> }
>
>
> ##--- \examples{
> x2 <- pi * 100^(-2:2)/10
> print(  x2, digits = 4)
> zapsmall(  x2) # automatic digits
> zapsmall(  x2, digits = 4)
> zapsmall(c(x2, Inf)) # round()s to integer ..
> zapsmall(c(x2, Inf), min.d=-Inf) # everything  is small wrt  Inf
>
> (z <- exp(1i*0:4*pi/2))
> zapsmall(z)
>
> zapShow <- function(x, ...) rbind(orig = x, zapped = zapsmall(x, ...))
> zapShow(x2)
>
> ## using a *robust* mFUN
> mF_rob <- function(x, ina) boxplot.stats(x, do.conf=FALSE)$stats[5]
> ## with robust mFUN(), 'Inf' is no longer distorting the picture:
> zapShow(c(x2, Inf), mFUN = mF_rob)
> zapShow(c(x2, Inf), mFUN = mF_rob, min.d = -5) # the same
> zapShow(c(x2, 999), mFUN = mF_rob) # same *rounding* as w/ Inf
> zapShow(c(x2, 999), mFUN = mF_rob, min.d =  3) # the same
> zapShow(c(x2, 999), mFUN = mF_rob, min.d =  8) # small diff
> ##--- }
>
>
>
> > On Mon, Dec 18, 2023, 05:47 Serguei Sokol via R-devel wrote:
>
> > Le 18/12/2023 à 11:24, Martin Maechler a écrit :
> > >> Serguei Sokol via R-devel
> > >>  on Mon, 18 Dec 2023 10:29:02 +0100 writes:
> > >  > Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
> > >  >> I think what's been missed is that zapsmall works relative to the
> > >  >> absolute largest value in the vector. Hence if there's only one
> > >  >> item in the vector, it is the largest, so it's not zapped. The
> > >  >> function's raison d'etre isn't to replace absolutely small values,
> > >  >> but small values relative to the largest. Hence a vector of
> > >  >> similar tiny values doesn't get zapped.
> > >  >>
> > >  >> Maybe the line in the docs:
> > >  >>
> > > 

Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-19 Thread Martin Maechler
> Steve Martin 
> on Mon, 18 Dec 2023 07:56:46 -0500 writes:

> Does mFUN() really need to be a function of x and the NA values of x? I
> can't think of a case where it would be used on anything but the non-NA
> values of x.

> I think it would be easier to specify a different mFUN() (and document
> this new argument) if the function has one argument and is applied to
> the non-NA values of x.

> zapsmall <- function(x,
>                      digits = getOption("digits"),
>                      mFUN = function(x) max(abs(x)),
>                      min.d = 0L) {
>     if (length(digits) == 0L)
>         stop("invalid 'digits'")
>     if (all(ina <- is.na(x)))
>         return(x)
>     mx <- mFUN(x[!ina])
>     round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
>                       else digits)
> }

> Steve

Thank you, Steve,
you are right that it would look simpler to do it that way.

On the other hand, in your case, mFUN() no longer sees the
original  n observations, and would not know whether there were NAs
and, if so, how many NAs there were in the original data.

The examples I have on my version of zapsmall's help page (see below)
use a robust mFUN, "the end of the upper whisker of a box plot":

   mF_rob <- function(x, ina) boxplot.stats(x, do.conf=FALSE)$stats[5]

and if you inspect boxplot.stats() you may know that indeed it
also wants to use the full data 'x' to compute its statistics and
then deal with NAs directly.  Your simplified mFUN interface
would not be fully consistent with boxplot(), and I think could
not be made so,  hence my more flexible 2-argument "design" for  mFUN().

 and BTW, these examples also exemplify the use of  `min.d`
about which  Serguei Sokol asked for an example or two.

Here I repeat my definition of zapsmall, and then my current set
of examples:

zapsmall <- function(x, digits = getOption("digits"),
                     mFUN = function(x, ina) max(abs(x[!ina])), min.d = 0L)
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x, ina)
    round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}


##--- \examples{
x2 <- pi * 100^(-2:2)/10
print(  x2, digits = 4)
zapsmall(  x2) # automatic digits
zapsmall(  x2, digits = 4)
zapsmall(c(x2, Inf)) # round()s to integer ..
zapsmall(c(x2, Inf), min.d=-Inf) # everything  is small wrt  Inf

(z <- exp(1i*0:4*pi/2))
zapsmall(z)

zapShow <- function(x, ...) rbind(orig = x, zapped = zapsmall(x, ...))
zapShow(x2)

## using a *robust* mFUN
mF_rob <- function(x, ina) boxplot.stats(x, do.conf=FALSE)$stats[5]
## with robust mFUN(), 'Inf' is no longer distorting the picture:
zapShow(c(x2, Inf), mFUN = mF_rob)
zapShow(c(x2, Inf), mFUN = mF_rob, min.d = -5) # the same
zapShow(c(x2, 999), mFUN = mF_rob) # same *rounding* as w/ Inf
zapShow(c(x2, 999), mFUN = mF_rob, min.d =  3) # the same
zapShow(c(x2, 999), mFUN = mF_rob, min.d =  8) # small diff
##--- }



> On Mon, Dec 18, 2023, 05:47 Serguei Sokol via R-devel wrote:

> Le 18/12/2023 à 11:24, Martin Maechler a écrit :
> >> Serguei Sokol via R-devel
> >>  on Mon, 18 Dec 2023 10:29:02 +0100 writes:
> >  > Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
> >  >> I think what's been missed is that zapsmall works relative to the
> >  >> absolute largest value in the vector. Hence if there's only one
> >  >> item in the vector, it is the largest, so it's not zapped. The
> >  >> function's raison d'etre isn't to replace absolutely small values,
> >  >> but small values relative to the largest. Hence a vector of
> >  >> similar tiny values doesn't get zapped.
> >  >>
> >  >> Maybe the line in the docs:
> >  >>
> >  >> " (compared with the maximal absolute value)"
> >  >>
> >  >> needs to read:
> >  >>
> >  >> " (compared with the maximal absolute value in the vector)"
> >
> >  > I agree that this change in the doc would clarify the situation but
> >  > would not resolve proposed corner cases.
> >
> >  > I think that an additional argument 'mx' (absolute max value of
> >  > reference) would do. Consider:
> >
> >  > zapsmall2 <-
> >  > function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
> >  > {
> >  >     if (length(digits) == 0L)
> >  >         stop("invalid 'digits'")
> >  >     if (all(ina <- is.na(x)))
> >  >         return(x)
> >  >     round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx)))
> >  >                       else digits)
> >  > }
> >
> >  > then zapsmall2() without explicit 'mx' behaves identically to actual
> >  > zapsmall() and for a scalar or a vector of identical value, user can
> >  > manually fix the scale of what should be considered as small:
> >
> >  >> zapsmall2(y)
> >  > [1] 2.220446e-16
> >  >> zapsmall2(y, mx=1)
> >  > [1] 0

Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-18 Thread Steve Martin
Does mFUN() really need to be a function of x and the NA values of x? I
can't think of a case where it would be used on anything but the non-NA
values of x.

I think it would be easier to specify a different mFUN() (and document this
new argument) if the function has one argument and is applied to the non-NA
values of x.

zapsmall <- function(x,
                     digits = getOption("digits"),
                     mFUN = function(x) max(abs(x)),
                     min.d = 0L
) {
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x[!ina])
    round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}

Steve

On Mon, Dec 18, 2023, 05:47 Serguei Sokol via R-devel wrote:

> Le 18/12/2023 à 11:24, Martin Maechler a écrit :
> >> Serguei Sokol via R-devel
> >>  on Mon, 18 Dec 2023 10:29:02 +0100 writes:
> >  > Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
> >  >> I think what's been missed is that zapsmall works relative to the
> >  >> absolute largest value in the vector. Hence if there's only one
> >  >> item in the vector, it is the largest, so it's not zapped. The
> >  >> function's raison d'etre isn't to replace absolutely small values,
> >  >> but small values relative to the largest. Hence a vector of
> >  >> similar tiny values doesn't get zapped.
> >  >>
> >  >> Maybe the line in the docs:
> >  >>
> >  >> " (compared with the maximal absolute value)"
> >  >>
> >  >> needs to read:
> >  >>
> >  >> " (compared with the maximal absolute value in the vector)"
> >
> >  > I agree that this change in the doc would clarify the situation but
> >  > would not resolve proposed corner cases.
> >
> >  > I think that an additional argument 'mx' (absolute max value of
> >  > reference) would do. Consider:
> >
> >  > zapsmall2 <-
> >  > function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
> >  > {
> >  >     if (length(digits) == 0L)
> >  >         stop("invalid 'digits'")
> >  >     if (all(ina <- is.na(x)))
> >  >         return(x)
> >  >     round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx)))
> >  >                       else digits)
> >  > }
> >
> >  > then zapsmall2() without explicit 'mx' behaves identically to actual
> >  > zapsmall() and for a scalar or a vector of identical value, user can
> >  > manually fix the scale of what should be considered as small:
> >
> >  >> zapsmall2(y)
> >  > [1] 2.220446e-16
> >  >> zapsmall2(y, mx=1)
> >  > [1] 0
> >  >> zapsmall2(c(y, y), mx=1)
> >  > [1] 0 0
> >  >> zapsmall2(c(y, NA))
> >  > [1] 2.220446e-16   NA
> >  >> zapsmall2(c(y, NA), mx=1)
> >  > [1]  0 NA
> >
> >  > Obviously, the name 'zapsmall2' was chosen just for this explanation.
> >  > The original name 'zapsmall' could be reused as a full backward
> >  > compatibility is preserved.
> >
> >  > Best,
> >  > Serguei.
> >
> > Thank you, Serguei, Duncan, Barry et al.
> >
> > Generally :
> >    Yes, zapsmall was meant and is used for zapping *relatively*
> >    small numbers.  In the other cases,  directly  round()ing is
> >    what you should use.
> >
> > Specifically to Serguei's proposal of allowing the "max" value
> > to be user specified (in which case it is not really a true
> > max() anymore):
> >
> > I've spent quite a few hours on this problem in May 2022, to
> > make it even more flexible, e.g. allowing to use a 99%
> > percentile instead of the max(), or allowing to exclude +Inf
> > from the "mx"; but -- compared to your zapsmall2() --
> > to allow reproducible automatic choice :
> >
> >
> > zapsmall <- function(x, digits = getOption("digits"),
> >                      mFUN = function(x, ina) max(abs(x[!ina])),
> >                      min.d = 0L)
> > {
> >     if (length(digits) == 0L)
> >         stop("invalid 'digits'")
> >     if (all(ina <- is.na(x)))
> >         return(x)
> >     mx <- mFUN(x, ina)
> >     round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
> >                       else digits)
> > }
> >
> > with optional 'min.d' as I had (vaguely remember to have) found
> > at the time that the '0' is also not always "the only correct" choice.
> Do you have a case or two where min.d could be useful?
>
> Serguei.
>
> >
> > Somehow I never got to propose/discuss the above,
> > but it seems a good time to do so now.
> >
> > Martin
> >
> >
> >
> >  >> barry
> >  >>
> >  >>
> >  >> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch
> >  >> <murdoch.dun...@gmail.com> wrote:
> >  >>
> >  >>> This email originated outside the University. Check before
> >  >>> clicking links or attachments.
> >  >>>
> >  >>> I'm really confused.  Steve's example wasn't a scalar x, it was a
> >  >>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't
> >  >>> 

Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-18 Thread Serguei Sokol via R-devel

Le 18/12/2023 à 11:24, Martin Maechler a écrit :

Serguei Sokol via R-devel
 on Mon, 18 Dec 2023 10:29:02 +0100 writes:

 > Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
 >> I think what's been missed is that zapsmall works relative to the
 >> absolute largest value in the vector. Hence if there's only one
 >> item in the vector, it is the largest, so it's not zapped. The function's
 >> raison d'etre isn't to replace absolutely small values,
 >> but small values relative to the largest. Hence a vector of similar tiny
 >> values doesn't get zapped.
 >>
 >> Maybe the line in the docs:
 >>
 >> " (compared with the maximal absolute value)"
 >>
 >> needs to read:
 >>
 >> " (compared with the maximal absolute value in the vector)"

 > I agree that this change in the doc would clarify the situation but
 > would not resolve proposed corner cases.

 > I think that an additional argument 'mx' (absolute max value of
 > reference) would do. Consider:

 > zapsmall2 <-
 > function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
 > {
 >     if (length(digits) == 0L)
 >     stop("invalid 'digits'")
 >     if (all(ina <- is.na(x)))
 >     return(x)
 >     round(x, digits = if (mx > 0) max(0L, digits -
 > as.numeric(log10(mx))) else digits)
 > }

 > then zapsmall2() without explicit 'mx' behaves identically to actual
 > zapsmall() and for a scalar or a vector of identical value, user can
 > manually fix the scale of what should be considered as small:

 >> zapsmall2(y)
 > [1] 2.220446e-16
 >> zapsmall2(y, mx=1)
 > [1] 0
 >> zapsmall2(c(y, y), mx=1)
 > [1] 0 0
 >> zapsmall2(c(y, NA))
 > [1] 2.220446e-16   NA
 >> zapsmall2(c(y, NA), mx=1)
 > [1]  0 NA

 > Obviously, the name 'zapsmall2' was chosen just for this explanation.
 > The original name 'zapsmall' could be reused as a full backward
 > compatibility is preserved.

 > Best,
 > Serguei.

Thank you, Serguei, Duncan, Barry et al.

Generally :
   Yes, zapsmall was meant and is used for zapping *relatively*
   small numbers.  In the other cases,  directly  round()ing is
   what you should use.

Specifically to Serguei's proposal of allowing the "max" value
to be user specified (in which case it is not really a true
max() anymore):

I've spent quite a few hours on this problem in May 2022, to
make it even more flexible, e.g. allowing to use a 99%
percentile instead of the max(), or allowing to exclude +Inf
from the "mx"; but -- compared to your zapsmall2() --
to allow reproducible automatic choice :


zapsmall <- function(x, digits = getOption("digits"),
  mFUN = function(x, ina) max(abs(x[!ina])),
 min.d = 0L)
{
 if (length(digits) == 0L)
 stop("invalid 'digits'")
 if (all(ina <- is.na(x)))
 return(x)
 mx <- mFUN(x, ina)
 round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx))) 
else digits)
}

with optional 'min.d' as I had (vaguely remember to have) found
at the time that the '0' is also not always "the only correct" choice.

Do you have a case or two where min.d could be useful?

Serguei.



Somehow I never got to propose/discuss the above,
but it seems a good time to do so now.

Martin



 >> barry
 >>
 >>
 >> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch wrote:
 >>
 >>> This email originated outside the University. Check before clicking
 >>> links or attachments.
 >>>
 >>> I'm really confused.  Steve's example wasn't a scalar x, it was a
 >>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't
 >>> see why summary() would if it was using your proposal.
 >>>
 >>> Duncan Murdoch
 >>>
 >>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
  Isn’t that the correct outcome?  The user can change the number of
 >>> digits if they want to see small values…
 
  --
  Change your thoughts and you change the world.
  --Dr. Norman Vincent Peale
 
 > On Dec 17, 2023, at 12:11 AM, Steve Martin 
 >>> wrote:
 > Zapping a vector of small numbers to zero would cause problems when
 > printing the results of summary(). For example, if
 > zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
 > print(summary(2.220446e-16), digits = 7) would print
 >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 >       0       0       0       0       0       0
 >
 > The same problem can also appear when printing the results of
 > summary.glm() with show.residuals = TRUE if there's little dispersion
 > in the residuals.
 >
 > Steve
 >

On Sat, 16 Dec 2023 at 17:34, Gregory Warnes  wrote:

 >>

I was quite 

Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-18 Thread Martin Maechler
> Serguei Sokol via R-devel 
> on Mon, 18 Dec 2023 10:29:02 +0100 writes:

> Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :
>> I think what's been missed is that zapsmall works relative to the
>> absolute largest value in the vector. Hence if there's only one
>> item in the vector, it is the largest, so it's not zapped. The function's
>> raison d'etre isn't to replace absolutely small values,
>> but small values relative to the largest. Hence a vector of similar tiny
>> values doesn't get zapped.
>> 
>> Maybe the line in the docs:
>> 
>> " (compared with the maximal absolute value)"
>> 
>> needs to read:
>> 
>> " (compared with the maximal absolute value in the vector)"

> I agree that this change in the doc would clarify the situation but 
> would not resolve proposed corner cases.

> I think that an additional argument 'mx' (absolute max value of 
> reference) would do. Consider:

> zapsmall2 <-
> function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
> {
>     if (length(digits) == 0L)
>     stop("invalid 'digits'")
>     if (all(ina <- is.na(x)))
>     return(x)
>     round(x, digits = if (mx > 0) max(0L, digits - 
> as.numeric(log10(mx))) else digits)
> }

> then zapsmall2() without explicit 'mx' behaves identically to actual 
> zapsmall() and for a scalar or a vector of identical value, user can 
> manually fix the scale of what should be considered as small:

>> zapsmall2(y)
> [1] 2.220446e-16
>> zapsmall2(y, mx=1)
> [1] 0
>> zapsmall2(c(y, y), mx=1)
> [1] 0 0
>> zapsmall2(c(y, NA))
> [1] 2.220446e-16   NA
>> zapsmall2(c(y, NA), mx=1)
> [1]  0 NA

> Obviously, the name 'zapsmall2' was chosen just for this explanation. 
> The original name 'zapsmall' could be reused as a full backward 
> compatibility is preserved.

> Best,
> Serguei.

Thank you, Serguei, Duncan, Barry et al.

Generally :
  Yes, zapsmall was meant and is used for zapping *relatively*
  small numbers.  In the other cases,  directly  round()ing is
  what you should use.

Specifically to Serguei's proposal of allowing the "max" value
to be user specified (in which case it is not really a true
max() anymore):

I've spent quite a few hours on this problem in May 2022, to
make it even more flexible, e.g. allowing to use a 99%
percentile instead of the max(), or allowing to exclude +Inf
from the "mx"; but -- compared to your zapsmall2() --
to allow reproducible automatic choice :


zapsmall <- function(x, digits = getOption("digits"),
                     mFUN = function(x, ina) max(abs(x[!ina])),
                     min.d = 0L)
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x, ina)
    round(x, digits = if(mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}

with optional 'min.d' as I had (vaguely remember to have) found
at the time that the '0' is also not always "the only correct" choice.
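
As a rough illustration of the flexibility meant above, the two
variants mentioned (excluding +Inf, and using a 99% percentile) could
be written as mFUN arguments along these lines. The names zapsmall_m,
mF_fin and mF_q99 are hypothetical and only used here; zapsmall_m is a
local copy of the definition above so that base::zapsmall() is not
masked.

```r
## Local copy of the proposed zapsmall(), under a hypothetical name.
zapsmall_m <- function(x, digits = getOption("digits"),
                       mFUN = function(x, ina) max(abs(x[!ina])),
                       min.d = 0L)
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- mFUN(x, ina)
    round(x, digits = if (mx > 0) max(min.d, digits - as.numeric(log10(mx)))
                      else digits)
}

## hypothetical mFUN: largest *finite* absolute value (drops NA and +/-Inf);
## assumes x has at least one finite entry
mF_fin <- function(x, ina) max(abs(x[is.finite(x)]))

## hypothetical mFUN: 99% quantile of the finite absolute values
mF_q99 <- function(x, ina)
    stats::quantile(abs(x[is.finite(x)]), probs = 0.99, names = FALSE)

x <- c(1.23456, 1e-20, NA, Inf)
zapsmall_m(x)                 # default: Inf sets the scale
zapsmall_m(x, mFUN = mF_fin)  # scale set by the largest finite value
zapsmall_m(x, mFUN = mF_q99)  # scale set by the 99% quantile
```

Both mFUNs ignore their ina argument here because is.finite() already
excludes the NAs.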

Somehow I never got to propose/discuss the above,
but it seems a good time to do so now.

Martin



>> barry
>> 
>> 
>> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch 
>> wrote:
>> 
>>> This email originated outside the University. Check before clicking
>>> links or attachments.
>>> 
>>> I'm really confused.  Steve's example wasn't a scalar x, it was a
>>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't
>>> see why summary() would if it was using your proposal.
>>> 
>>> Duncan Murdoch
>>> 
>>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
 Isn’t that the correct outcome?  The user can change the number of
>>> digits if they want to see small values…
 
 --
 Change your thoughts and you change the world.
 --Dr. Norman Vincent Peale
 
> On Dec 17, 2023, at 12:11 AM, Steve Martin 
>>> wrote:
> Zapping a vector of small numbers to zero would cause problems when
> printing the results of summary(). For example, if
> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
> print(summary(2.220446e-16), digits = 7) would print
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>       0       0       0       0       0       0
> 
> The same problem can also appear when printing the results of
> summary.glm() with show.residuals = TRUE if there's little dispersion
> in the residuals.
> 
> Steve
> 
> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes  wrote:
>> 
> I was quite surprised to discover that applying `zapsmall` to a scalar
>>> value has no apparent effect.  For example:
>>> y <- 2.220446e-16
>>> zapsmall(y,)
> [1] 

Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-18 Thread Serguei Sokol via R-devel

Le 17/12/2023 à 18:26, Barry Rowlingson a écrit :

I think what's been missed is that zapsmall works relative to the absolute
largest value in the vector. Hence if there's only one
item in the vector, it is the largest, so it's not zapped. The function's
raison d'etre isn't to replace absolutely small values,
but small values relative to the largest. Hence a vector of similar tiny
values doesn't get zapped.

Maybe the line in the docs:

" (compared with the maximal absolute value)"

needs to read:

" (compared with the maximal absolute value in the vector)"

I agree that this change in the doc would clarify the situation but
would not resolve proposed corner cases.

I think that an additional argument 'mx' (absolute max value of
reference) would do. Consider:


zapsmall2 <-
function (x, digits = getOption("digits"), mx=max(abs(x), na.rm=TRUE))
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx)))
                      else digits)
}

Then zapsmall2() without an explicit 'mx' behaves identically to the current
zapsmall(), and for a scalar or a vector of identical values, the user can
manually fix the scale of what should be considered small:


> zapsmall2(y)
[1] 2.220446e-16
> zapsmall2(y, mx=1)
[1] 0
> zapsmall2(c(y, y), mx=1)
[1] 0 0
> zapsmall2(c(y, NA))
[1] 2.220446e-16   NA
> zapsmall2(c(y, NA), mx=1)
[1]  0 NA

Obviously, the name 'zapsmall2' was chosen just for this explanation.
The original name 'zapsmall' could be reused, since full backward
compatibility is preserved.


Best,
Serguei.



Barry





On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch wrote:


This email originated outside the University. Check before clicking links
or attachments.

I'm really confused.  Steve's example wasn't a scalar x, it was a
vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't
see why summary() would if it was using your proposal.

Duncan Murdoch

On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:

Isn’t that the correct outcome?  The user can change the number of
digits if they want to see small values…


--
Change your thoughts and you change the world.
--Dr. Norman Vincent Peale


On Dec 17, 2023, at 12:11 AM, Steve Martin wrote:

Zapping a vector of small numbers to zero would cause problems when
printing the results of summary(). For example, if
zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
print(summary(2.220446e-16), digits = 7) would print
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0       0       0       0       0       0

The same problem can also appear when printing the results of
summary.glm() with show.residuals = TRUE if there's little dispersion
in the residuals.

Steve


On Sat, 16 Dec 2023 at 17:34, Gregory Warnes  wrote:

I was quite surprised to discover that applying `zapsmall` to a scalar
value has no apparent effect.  For example:

y <- 2.220446e-16
zapsmall(y,)

[1] 2.2204e-16

I was expecting `zapsmall(x)` to act like


round(y, digits=getOption('digits'))

[1] 0

Looking at the current source code indicates that `zapsmall` is
expecting a vector:

zapsmall <-
function (x, digits = getOption("digits"))
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- max(abs(x[!ina]))
    round(x, digits = if (mx > 0) max(0L, digits - as.numeric(log10(mx)))
                      else digits)
}

If `x` is a non-zero scalar, zapsmall will never perform rounding.

The man page simply states:
zapsmall determines a digits argument dr for calling round(x, digits =
dr) such that values close to zero (compared with the maximal absolute
value) are ‘zapped’, i.e., replaced by 0.

and doesn’t provide any details about how ‘close to zero’ is defined.

Perhaps handling the special case when `x` is a scalar (or only contains a
single non-NA value) would make sense:

zapsmall <-
function (x, digits = getOption("digits"))
{
    if (length(digits) == 0L)
        stop("invalid 'digits'")
    if (all(ina <- is.na(x)))
        return(x)
    mx <- max(abs(x[!ina]))
    round(x, digits = if (mx > 0 && (length(x)-sum(ina))>1 )
                          max(0L, digits - as.numeric(log10(mx)))
                      else digits)
}

Yielding:


y <- 2.220446e-16
zapsmall(y)

[1] 0

Another edge case would be when all of the non-na values are the same:


y <- 2.220446e-16
zapsmall(c(y,y))

[1] 2.220446e-16 2.220446e-16

Thoughts?


Gregory R. Warnes, Ph.D.
g...@warnes.net
Eternity is a long time, take a friend!



 [[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-17 Thread Steve Martin
Sorry for being unclear. I was commenting on the edge case that
Gregory brought up when calling zapsmall() with a vector of small
values. I thought Gregory was asking for thoughts on that as well, but
maybe I misunderstood. IMO it would be weird for zapsmall() to make a
small scalar zero but not a vector of the identical values.

The example with summary() was meant to show that zapping a vector of
small values to 0 could change the current printing behavior for
certain objects. Duncan is right that zapping only a scalar to zero
wouldn't do anything.

>>> Isn’t that the correct outcome?  The user can change the number of digits 
>>> if they want to see small values…

I'm not sure a user would be able to change the digits without
updating other functions. If xx[finite] <- zapsmall(x[finite]) in
print.summaryDefault() makes a vector of 0s (e.g., because zapsmall(x)
works like round(x, digits = getOption("digits")) and
getOption("digits") is 7), then calling
print(summary(2.220446e-16), digits = 16) would still print a vector
of 0s. The digits argument to print() wouldn't do anything.
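
A small sketch of the point, using round() directly to stand in for a
hypothetical zapsmall() that behaves like round(x, digits = 7). It does
not call print.summaryDefault() itself; it only shows that digits
requested after the rounding cannot bring the values back.

```r
y <- 2.220446e-16

## what a round()-like zapsmall() would return for six copies of y
zapped <- round(rep(y, 6), digits = 7)

## asking for more digits afterwards does not help
format(zapped, digits = 16)

## the unzapped values, for comparison
format(rep(y, 6), digits = 16)
```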

In any case, I just wanted to point out that changes to zapsmall() in
the corner case Gregory brought up could affect the way certain
objects are printed, both changing the current behavior and perhaps
requiring changes to some other functions.

Steve

On Sun, 17 Dec 2023 at 12:26, Barry Rowlingson wrote:
>
> I think what's been missed is that zapsmall works relative to the absolute
> largest value in the vector. Hence if there's only one
> item in the vector, it is the largest, so it's not zapped. The function's
> raison d'etre isn't to replace absolutely small values,
> but small values relative to the largest. Hence a vector of similar tiny
> values doesn't get zapped.
>
> Maybe the line in the docs:
>
> " (compared with the maximal absolute value)"
>
> needs to read:
>
> " (compared with the maximal absolute value in the vector)"
>
> Barry
>
>
>
>
>
> On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch wrote:
>>
>> This email originated outside the University. Check before clicking links or 
>> attachments.
>>
>> I'm really confused.  Steve's example wasn't a scalar x, it was a
>> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't
>> see why summary() would if it was using your proposal.
>>
>> Duncan Murdoch
>>
>> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
>> > Isn’t that the correct outcome?  The user can change the number of digits 
>> > if they want to see small values…
>> >
>> >
>> > --
>> > Change your thoughts and you change the world.
>> > --Dr. Norman Vincent Peale
>> >
>> >> On Dec 17, 2023, at 12:11 AM, Steve Martin  
>> >> wrote:
>> >>
>> >> Zapping a vector of small numbers to zero would cause problems when
>> >> printing the results of summary(). For example, if
>> >> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
>> >> print(summary(2.220446e-16), digits = 7) would print
>> >>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> >>       0       0       0       0       0       0
>> >>
>> >> The same problem can also appear when printing the results of
>> >> summary.glm() with show.residuals = TRUE if there's little dispersion
>> >> in the residuals.
>> >>
>> >> Steve
>> >>
>> >>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes  wrote:
>> >>>
>> >>> I was quite surprised to discover that applying `zapsmall` to a scalar
>> >>> value has no apparent effect.  For example:
>> >>>
>> >>> y <- 2.220446e-16
>> >>> zapsmall(y,)
>> >>> [1] 2.2204e-16
>> >>>
>> >>> I was expecting `zapsmall(x)` to act like
>> >>>
>> >>> round(y, digits=getOption('digits'))
>> >>> [1] 0
>> >>>
>> >>> Looking at the current source code, indicates that `zapsmall` is 
>> >>> expecting a vector:
>> >>>
>> >>> zapsmall <-
>> >>> function (x, digits = getOption("digits"))
>> >>> {
>> >>> if (length(digits) == 0L)
>> >>> stop("invalid 'digits'")
>> >>> if (all(ina <- is.na(x)))
>> >>> return(x)
>> >>> mx <- max(abs(x[!ina]))
>> >>> round(x, digits = if (mx > 0) max(0L, digits - 
>> >>> as.numeric(log10(mx))) else digits)
>> >>> }
>> >>>
>> >>> If `x` is a non-zero scalar, zapsmall will never perform rounding.
>> >>>
>> >>> The man page simply states:
>> >>> zapsmall determines a digits argument dr for calling round(x, digits = 
>> >>> dr) such that values close to zero (compared with the maximal absolute 
>> >>> value) are ‘zapped’, i.e., replaced by 0.
>> >>>
>> >>> and doesn’t provide any details about how ‘close to zero’ is defined.
>> >>>
>> >>> Perhaps handling the special case when `x` is a scalar (or only contains a
>> >>> single non-NA value) would make sense:
>> >>>
>> >>> zapsmall <-
>> >>> function (x, digits = getOption("digits"))
>> >>> {
>> >>> if (length(digits) == 0L)
>> >>> stop("invalid 'digits'")
>> >>> if (all(ina <- is.na(x)))
>> >>> return(x)
>> >>> mx <- max(abs(x[!ina]))
>> >>> round(x, digits = if (mx > 

Re: [Rd] [External] Re: zapsmall(x) for scalar x

2023-12-17 Thread Barry Rowlingson
I think what's been missed is that zapsmall works relative to the largest
absolute value in the vector. Hence if there's only one item in the vector,
it is the largest, so it's not zapped. The function's raison d'etre isn't to
replace absolutely small values, but values that are small relative to the
largest. Hence a vector of similar tiny values doesn't get zapped.
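
A minimal sketch of that relative behaviour, assuming base R's zapsmall() as
quoted in this thread. The variable names (tiny, mixed, alone, similar) are
only illustrative, and digits is passed explicitly so the result does not
depend on the session's options:

```r
# Assumes base R's zapsmall(); digits = 7 is given explicitly rather than
# relying on getOption("digits").
tiny <- 2.220446e-16

# Next to a much larger value, the tiny value is rounded away to zero.
mixed <- zapsmall(c(1, tiny), digits = 7)

# On its own, the tiny value is the largest absolute value, so it survives.
alone <- zapsmall(tiny, digits = 7)

# A vector of similarly tiny values is not zapped either.
similar <- zapsmall(c(tiny, 2 * tiny), digits = 7)

print(mixed)
print(alone > 0)
print(all(similar > 0))
```

To round against an absolute threshold instead, round(x, digits) can be
called directly, as in the original post.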

Maybe the line in the docs:

" (compared with the maximal absolute value)"

needs to read:

" (compared with the maximal absolute value in the vector)"

Barry





On Sun, Dec 17, 2023 at 2:17 PM Duncan Murdoch wrote:

> I'm really confused.  Steve's example wasn't a scalar x, it was a
> vector.  Your zapsmall() proposal wouldn't zap it to zero, and I don't
> see why summary() would if it was using your proposal.
>
> Duncan Murdoch
>
> On 17/12/2023 8:43 a.m., Gregory R. Warnes wrote:
> > Isn’t that the correct outcome?  The user can change the number of
> > digits if they want to see small values…
> >
> >
> > --
> > Change your thoughts and you change the world.
> > --Dr. Norman Vincent Peale
> >
> >> On Dec 17, 2023, at 12:11 AM, Steve Martin wrote:
> >>
> >> Zapping a vector of small numbers to zero would cause problems when
> >> printing the results of summary(). For example, if
> >> zapsmall(c(2.220446e-16, ..., 2.220446e-16)) == c(0, ..., 0) then
> >> print(summary(2.220446e-16), digits = 7) would print
> >>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >>       0       0       0       0       0       0
> >>
> >> The same problem can also appear when printing the results of
> >> summary.glm() with show.residuals = TRUE if there's little dispersion
> >> in the residuals.
> >>
> >> Steve
> >>
> >>> On Sat, 16 Dec 2023 at 17:34, Gregory Warnes  wrote:
> >>>
> >>> I was quite surprised to discover that applying `zapsmall` to a scalar
> >>> value has no apparent effect.  For example:
> >>>
> >>> > y <- 2.220446e-16
> >>> > zapsmall(y,)
> >>> [1] 2.2204e-16
> >>>
> >>> I was expecting `zapsmall(x)` to act like
> >>>
> >>> > round(y, digits=getOption('digits'))
> >>> [1] 0
> >>>
> >>> Looking at the current source code indicates that `zapsmall` is
> >>> expecting a vector:
> >>>
> >>> zapsmall <-
> >>> function (x, digits = getOption("digits"))
> >>> {
> >>> if (length(digits) == 0L)
> >>> stop("invalid 'digits'")
> >>> if (all(ina <- is.na(x)))
> >>> return(x)
> >>> mx <- max(abs(x[!ina]))
> >>> round(x, digits = if (mx > 0) max(0L, digits -
> >>> as.numeric(log10(mx))) else digits)
> >>> }
> >>>
> >>> If `x` is a non-zero scalar, zapsmall will never perform rounding.
> >>>
> >>> The man page simply states:
> >>> zapsmall determines a digits argument dr for calling round(x, digits =
> >>> dr) such that values close to zero (compared with the maximal absolute
> >>> value) are ‘zapped’, i.e., replaced by 0.
> >>>
> >>> and doesn’t provide any details about how ‘close to zero’ is defined.
> >>>
> >>> Perhaps handling the special case when `x` is a scalar (or only contains a
> >>> single non-NA value) would make sense:
> >>>
> >>> zapsmall <-
> >>> function (x, digits = getOption("digits"))
> >>> {
> >>> if (length(digits) == 0L)
> >>> stop("invalid 'digits'")
> >>> if (all(ina <- is.na(x)))
> >>> return(x)
> >>> mx <- max(abs(x[!ina]))
> >>> round(x, digits = if (mx > 0 && (length(x)-sum(ina))>1 ) max(0L,
> >>> digits - as.numeric(log10(mx))) else digits)
> >>> }
> >>>
> >>> Yielding:
> >>>
> >>> > y <- 2.220446e-16
> >>> > zapsmall(y)
> >>> [1] 0
> >>>
> >>> Another edge case would be when all of the non-NA values are the same:
> >>>
> >>> > y <- 2.220446e-16
> >>> > zapsmall(c(y,y))
> >>> [1] 2.220446e-16 2.220446e-16
> >>>
> >>> Thoughts?
> >>>
> >>>
> >>> Gregory R. Warnes, Ph.D.
> >>> g...@warnes.net
> >>> Eternity is a long time, take a friend!
> >>>
> >>>
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> __
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel