Re: [Rd] scan(..., skip=1e11): infinite loop; cannot interrupt

2023-03-13 Thread Suharto Anggono Suharto Anggono via R-devel


With

    if (!j--) {
        R_CheckUserInterrupt();
        j = 10000;
    }

as in current R devel (r83976), j goes negative (reaching -1) and the interrupt
is checked every 10001 iterations instead of every 10000. I prefer

    if (!--j) {
        R_CheckUserInterrupt();
        j = 10000;
    }

.


In current R devel (r83976), if EOF is reached, the outer loop keeps going: i
keeps incrementing until it reaches nskip.

The outer loop could be made to also stop on EOF.

Alternatively, the nested loop can be avoided, as in the following.

    if (nskip) for (R_xlen_t i = 0, j = 10000; ; ) { /* MBCS-safe */
        c = scanchar(FALSE, &data);
        if (!j--) {
            R_CheckUserInterrupt();
            j = 10000;
        }
        if ((c == '\n' && ++i == nskip) || c == R_EOF)
            break;
    }


---
On 2/11/23 09:33, Ivan Krylov wrote:
> On Fri, 10 Feb 2023 23:38:55 -0600
> Spencer Graves  wrote:
>
>> I have a 4.54 GB file that I'm trying to read in chunks using
>> "scan(..., skip=__)".  It works as expected for small values of
>> "skip" but goes into an infinite loop for "skip=1e11" and similar
>> large values of skip:  I cannot even interrupt it;  I must kill R.
> Skipping lines is done by two nested loops. The outer loop counts the
> lines to skip; the inner loop reads characters until it encounters a
> newline or end of file. The outer loop doesn't check for EOF and keeps
> asking for more characters until the inner loop runs at least once for
> every line it wants to skip. The following patch should avoid the
> wait in such cases:
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>   attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>   {
>   SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -int c, flush, fill, blskip, multiline, escapes, skipNul;
> +int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>   R_xlen_t nmax, nlines, nskip;
>   const char *p, *encoding;
>   RCNTXT cntxt;
> @@ -952,7 +952,7 @@
>if(!data.con->canread)
>error(_("cannot read from this connection"));
>}
> - for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> + for (R_xlen_t i = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
>while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>   }
>
>
> Making it interruptible is a bit more work: we need to ensure that a
> valid context is set up and check regularly for an interrupt.
>
> --- src/main/scan.c (revision 83797)
> +++ src/main/scan.c (working copy)
> @@ -835,7 +835,7 @@
>   attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
>   {
>   SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
> -int c, flush, fill, blskip, multiline, escapes, skipNul;
> +int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
>   R_xlen_t nmax, nlines, nskip;
>   const char *p, *encoding;
>   RCNTXT cntxt;
> @@ -952,8 +952,6 @@
>if(!data.con->canread)
>error(_("cannot read from this connection"));
>}
> - for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
> - while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
>   }
>
>   ans = R_NilValue; /* -Wall */
> @@ -966,6 +964,10 @@
>   cntxt.cend = &scan_cleanup;
>   cntxt.cenddata = &data;
>
> +    if (nskip) for (R_xlen_t i = 0, j = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
> +        while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF)
> +            if (j++ % 10000 == 0) R_CheckUserInterrupt();
> +
>   switch (TYPEOF(what)) {
>   case LGLSXP:
>   case INTSXP:
>
> This way, even if you pour a Decanter of Endless Lines (e.g. mkfifo
> LINES; perl -E'print "A"x42 while 1;' > LINES) into scan(), it can
> still be interrupted, even if neither newline nor EOF ever arrives.

Thanks, I've updated the implementation of scan() in R-devel to be
interruptible while skipping lines.

I've done it slightly differently as I found there already was a memory
leak, which could be fixed by creating the context a bit earlier.

I've also avoided modulo on the fast path as I saw 13% performance
overhead on my mailbox file. Decrementing and checking against zero
didn't have measurable overhead.

Best
Tomas

[snip]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug with `[<-.POSIXlt` on specific OSes

2022-10-30 Thread Suharto Anggono Suharto Anggono via R-devel
I just pointed out that, in 
https://stat.ethz.ch/pipermail/r-devel/2022-October/082082.html ("A potential 
POSIXlt->Date bug introduced in r-devel"),
dlt <- .POSIXlt(list(sec = c(-999, 1 + c(1:10, -Inf, NA)) + pi,
                     # "out of range", non-finite, fractions
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon  = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))
doesn't work generally as an example.

When as.POSIXct(dlt)[1] is NA, it is unexpected to me that 
as.POSIXct(balancePOSIXlt(dlt))[1] is not NA.

It happens because, unlike 'dlt', 'isdst' is 0 in balancePOSIXlt(dlt). It is 
because 'isGMT' is TRUE in 'do_balancePOSIXlt' in datetime.c, as the number of 
components of 'dlt' is 9.


If the content is changed, here are possible outputs of 'balancePOSIXlt' that I would expect:

Option 1: companion of 'as.POSIXct.POSIXlt' applied to the same input, as with 
function 'mktime' in C
- The input "POSIXlt" object is like the initial struct tm whose pointer is 
presented to 'mktime'.
- The result of 'as.POSIXct.POSIXlt' is like the return value of 'mktime'.
- The result of 'balancePOSIXlt' is like the final struct tm after 'mktime' is 
applied.

Option 2: corresponding with 'format.POSIXlt' applied to the same input
'format.POSIXlt' doesn't fix 'wday' or 'yday'.

format(dlt, "%Y-%m-%d %w %j")[c(6, 9)]
# c("2016-04-06 2 341", "2016-04-07 2 341")


Side issues on 'format.POSIXlt':

- %OSn uses unnormalized 'sec', unlike %S.
format(dlt, "%S %OS3")[1]  # "24 -995.858"

-
format(dlt, "%A")[12]  # "-Inf"
It is rather strange to me to get "-Inf" from format %A. I expect to get a
weekday name; NA is acceptable.
Function 'weekdays' uses it.


The reported issue remains.

x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
Sys.setenv(TZ = "UTC")
x[1] <- NA
# Error in x[[n]][i] <- value[[n]] : replacement has length zero




---
On Saturday, 22 October 2022, 07:12:51 pm GMT+7, Martin Maechler 
 wrote:


>>>>> Martin Maechler
>>>>>on Tue, 18 Oct 2022 10:56:25 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel
>>>>>on Fri, 14 Oct 2022 16:21:14 +0000 (UTC) writes:

>> I think '[.POSIXlt' and '[<-.POSIXlt' don't need to
>> normalize out-of-range values. I think they just make
>> same length for all components, to ensure correct
>> extraction or replacement for arbitrary index.

> Yes, you are right; this is definitely correct...  and
> would be more efficient.

> At the moment, we were mostly focused on *correct*
> behaviour in the case of "ragged" and/or out-of-range
> POSIXlt objects.


>> I have a thought of adding an optional argument for
>> 'as.POSIXlt' applied to "POSIXlt" object. Possible name:
>> normalize adjust fixup

>> To allow recycling only without changing content, instead
>> of TRUE or FALSE, maybe choice, like fixup = c("none",
>> "balance", "normalize") , where "normalize" implies
>> "balance", or adjust = c("none", "length", "content",
>> "value") , where "content" and "value" are synonymous.

> Such an optional argument for as.POSIXlt() would be a
> possibility and could replace the new and for now still
> somewhat experimental balancePOSIXlt().

> +: One advantage of (one of the above proposals) would
> be that it does not take up a new function name.

> -: OTOH, it may be overdoing the semantics

>  as.POSIXlt(,  = )

>  and it may be harder to understand by
> non-sophisticated R users, because as.POSIXlt() is a
> generic with several methods, and these extra arguments
> would probably only apply to the as.POSIXlt.default()
> method and there *only* for the case where the argument
> inherits from "POSIXlt" .. and all that being somewhat
> subtle to see for Joe Average UseR

> I agree that it will make sense to get an R-level
> version, either using new arguments in as.POSIXlt() or
> (still my preference) in balancePOSIXlt() to allow to
> "only fill all components".

> HOWEVER note that the "filling" (by recycling) and no
> extra checking will of

Re: [Rd] Bug with `[<-.POSIXlt` on specific OSes

2022-10-14 Thread Suharto Anggono Suharto Anggono via R-devel
I think '[.POSIXlt' and '[<-.POSIXlt' don't need to normalize out-of-range 
values. I think they just make same length for all components, to ensure 
correct extraction or replacement for arbitrary index.

I have a thought of adding an optional argument for 'as.POSIXlt' applied to 
"POSIXlt" object. Possible name:
normalize
adjust
fixup

To allow recycling only without changing content, instead of TRUE or FALSE, 
maybe choice, like
fixup = c("none", "balance", "normalize")
, where "normalize" implies "balance", or
adjust = c("none", "length", "content", "value")
, where "content" and "value" are synonymous.

By the way, Inf in 'sec' component is out-of-range!


For 'gmtoff', NA or 0 should be put for unknown. A known 'gmtoff' may be 
positive, negative, or zero. The documentation says
‘gmtoff’ (Optional.) The offset in seconds from GMT: positive
values are East of the meridian.  Usually ‘NA’ if unknown,
but ‘0’ could mean unknown.


dlt <- .POSIXlt(list(sec = c(-999, 1 + c(1:10, -Inf, NA)) + pi,
                     # "out of range", non-finite, fractions
                     min = 45L, hour = c(21L, 3L, NA, 4L),
                     mday = 6L, mon  = c(11L, NA, 3L),
                     year = 116L, wday = 2L, yday = 340L, isdst = 1L))

as.POSIXct(dlt)[1] is NA on Linux with timezone without DST. For example, after
Sys.setenv(TZ = "EST")



> Martin Maechler
> on Wed, 12 Oct 2022 10:17:28 +0200 writes:

> Kurt Hornik
> on Tue, 11 Oct 2022 16:44:13 +0200 writes:

> Davis Vaughan writes:
>>> I've got a bit more information about this one. It seems like it
>>> (only? not sure) appears when `TZ = "UTC"`, which is why I didn't see
>>> it before on my Mac, which defaults to `TZ = ""`. I think this is at
>>> least explainable by the fact that those "optional" fields aren't
>>> technically needed when the time zone is UTC.

>> Exactly.  Debugging `[<-.POSIXlt` with

>> x <- as.POSIXlt(as.POSIXct("2013-01-31", tz = "America/Chicago"))
>> Sys.setenv(TZ = "UTC")
>> x[1] <- NA

>> shows we get into

>> value <- unclass(as.POSIXlt(value))
>> if (ici) {
>> for (n in names(x)) names(x[[n]]) <- nms
>> }
>> for (n in names(x)) x[[n]][i] <- value[[n]]

>> where

>> Browse[2]> names(value)
>> [1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"
>> Browse[2]> names(x)
>> [1] "sec""min""hour"   "mday"   "mon""year"   "wday"   "yday"
>> [9] "isdst"  "zone"   "gmtoff"

>> Without having looked at the code, the docs say

>> ‘zone’ (Optional.) The abbreviation for the time zone in force at
>> that time: ‘""’ if unknown (but ‘""’ might also be used for
>> UTC).

>> ‘gmtoff’ (Optional.) The offset in seconds from GMT: positive
>> values are East of the meridian.  Usually ‘NA’ if unknown,
>> but ‘0’ could mean unknown.

>> so perhaps we should fill with the values for the unknown case?

>> -k

> Well,

> I think you both know  I'm in the midst of dealing with these
> issues, to fix both

> [.POSIXlt  and
> [<-.POSIXlt

> Yes, one needs a way to not only "fill" the partially filled
> entries but also to *normalize* out-of-range values
> (say negative seconds, minutes > 60, etc)

> All this is available in our C code, but not on the R level,
> so yesterday, I wrote a C function to be called via .Internal(.)
> from a new R function that provides this.

> Provisionally called

> balancePOSIXlt()

> because it both balances the 9 to 11 list components of POSIXlt
> and it also puts all the numbers (sec, min, hour, mday, mon)
> into a correct range (and also computes correct wday and yday numbers),
> but I'm happy for proposals of better names.
> I had contemplated  validatePOSIXlt() as alternative, but then
> dismissed that as in some sense we now do agree that
> "imbalanced" POSIXlt's are not really invalid ..

> .. and yes, to Davis:  Even though I've spent so many hours with
> POSIXlt, POSIXct and Date during the last week, I'm still
> surprised more often than I like by the effects of timezone
> settings there.

> Martin

I have committed the new R and C code now, defining  balancePOSIXlt(),
to get feedback from the community.

I've extended the documentation in  help(DateTimeClasses),
and notably factored out the description
of  POSIXlt  mentioning the  "ragged" and "out-of-range" cases.

This needs more testing and experiments, and I have not
announced it in NEWS yet.

Planned next is to use it in  [.POSIXlt and [<-.POSIXlt
so they will work correctly.

But please share your thoughts, propositions, ...

Martin


[snip]



Re: [Rd] as.character.POSIXt in R devel

2022-10-07 Thread Suharto Anggono Suharto Anggono via R-devel
Yes, no documentation.
A "POSIXlt" object with out-of-bounds components, or whose components are not
all of the same length, may be produced internally by 'seq.POSIXt'.
Initially, 'r1' is a "POSIXlt" object all of whose components have length 1.
The 'year', 'mon', or 'mday' component of 'r1' is then modified. It may end up
with more than one element. For 'mon' or 'mday', some elements may be out of bounds.



On Monday, 3 October 2022, 11:58:53 pm GMT+7, Martin Maechler 
 wrote:


>>>>> Martin Maechler
>>>>>    on Mon, 3 Oct 2022 14:46:08 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel
>>>>>    on Sun, 2 Oct 2022 08:42:50 +0000 (UTC) writes:

    >> With r82904, 'as.character.POSIXt' in R devel is changed. The NEWS item:

    >> as.character() now behaves more in line with the
    >> methods for atomic vectors such as numbers, and is no longer
    >> influenced by options().

[..]

[snip]


    >> * Behavior with "improper" "POSIXlt" object:

    >> - "POSIXlt" object with out-of-bounds components is not normalized.

    >> Example (modified from regr.tests-1d.R):
    >> op <- options(scipen = 0) # (default setting)
    >> x <- structure(
    >> list(sec = 1, min = 59L, hour = 18L,
    >> mday = 6L, mon = 11L, year = 116L,
    >> wday = 2L, yday = 340L,
    >> isdst = 0L, zone = "CET", gmtoff = 3600L),
    >> class = c("POSIXlt", "POSIXt"), tzone = "CET")
    >> as.character(x)
    >> # "2016-12-06 18:59:1"
    >> format(x)
    >> # "2016-12-06 21:45:40"
    >> options(op)


    > Yes, we knew that  and were not too happy about it, but also not
    > too unhappy:
    > After all,            help(DateTimeClasses)
    > clearly explains how
    > POSIXlt objects should look like :

    > ---
    > Class ‘"POSIXlt"’ is a named list of vectors representing

    > ‘sec’ 0-61: seconds.
    > ‘min’ 0-59: minutes.
    > ‘hour’ 0-23: hours.
    > ‘mday’ 1-31: day of the month
    > ‘mon’ 0-11: months after the first of the year.
    > ‘year’ years since 1900.
    > ‘wday’ 0-6 day of the week, starting on Sunday.
    > ‘yday’ 0-365: day of the year (365 only in leap years).

    > ‘isdst’ Daylight Saving Time ... ... ...
    > 
    > 

    > ---

    > We have been aware that as.character() assumes the above specification,
    > even though other R functions, notably format() which uses
    > internal (C level; either system (OS) or R's own) strptime() do
    > arithmetic (modulo 60, then modulo 24, then modulo month length)
    > to compute the date "used".

    > Allowing such  "un-normalized" / out-of-bound  POSIXlt objects
    > in R has not been documented AFAICS, and has the consequence
    > that two different POSIXlt objects may correspond to the exact
    > same time.

    > This may be something worth discussing.
    > In some sense we are discussing how the "POSIXlt" class is defined
    > (even though an S3 class is never formally defined).

(nothing changed here)


    >> - With "POSIXlt" object where sec, min, hour, mday, mon,
    >> and year components are not all of the same length, recycling is not 
handled.

This is still the case... (see below).

    > Good point.  I tend to agree that this should be improved *and* also
    > documented: AFAIK, it is also not at all documented  (or is it ??)
    > that the POSIXlt components should be thought to be recycling.

    > If we decide we want that,
    > once this is documented (and all methods/functions tested with
    > such POSIXlt) it could also be used to use considerably smaller size
    > POSIXlt objects, e.g, when all parts are in the same year, or
    > when all seconds are 0, or ...

    >> Example (modified from regr.tests-1d.R):
    >> op <- options(scipen = 0) # (default setting)
    >> x <- structure(
    >> list(sec = c(1,  2), min = 59L, hour = 18L,
    >> mday = 6L, mon = 11L, year = 116L,
    >> wday = 2L, yday = 340L,
    >> isdst = 0L, zone = "CET", gmtoff = 3600L),
    >> class = c("POSIXlt", "POSIXt"), tzone = "CET")
    >> as.character(x)
    >> # c("2016-12-06 18:59:01", "NA NA:NA:02")
    >> format(x)
    >> 

Re: [Rd] as.character.POSIXt in R devel

2022-10-06 Thread Suharto Anggono Suharto Anggono via R-devel
In 'as.character.POSIXt' in R devel after r83010:
if(getOption("OutDec") != OutDec) { op <- options(OutDec = OutDec); on.exit(op) }

on.exit(op)
does nothing. It should be
on.exit(options(op))


Is it OK to output the seconds using scientific notation?
Example (modified from https://bugs.r-project.org/show_bug.cgi?id=9819):
op <- options(scipen = 0) # (default setting)
as.character(as.POSIXlt("2007-07-27 16:11:00.02"))
# "2007-07-27 16:11:02e-06"
options(op)


Example (modified from https://bugs.r-project.org/show_bug.cgi?id=14579):
op <- options(scipen = 0) # (default setting)
ct <- as.POSIXct(1302811200 - 2e-07,
origin = as.POSIXct("1970-01-01 00:00:00", tz="UTC"), tz = "UTC")
as.character(ct) # outputs "60" for seconds
# "2011-04-14 19:59:60"
options(op)


(CharleMagne.crowned <- as.POSIXlt(ISOdate(774,7,10)))
as.character(CharleMagne.crowned)

As I mentioned, they are different on OSes _other than_ Linux.

In R on Windows:
> (CharleMagne.crowned <- as.POSIXlt(ISOdate(774,7,10)))
[1] "0774-07-10 12:00:00 GMT"


---
On Monday, 3 October 2022, 11:58:53 pm GMT+7, Martin Maechler 
 wrote:


>>>>> Martin Maechler
>>>>>on Mon, 3 Oct 2022 14:46:08 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel
>>>>>on Sun, 2 Oct 2022 08:42:50 +0000 (UTC) writes:

>> With r82904, 'as.character.POSIXt' in R devel is changed. The NEWS item:

>> as.character() now behaves more in line with the
>> methods for atomic vectors such as numbers, and is no longer
>> influenced by options().

[..]

>> * Wrong:

>> The result is wrong when as.character(fs[n0]) has scientific notation.

> yes, you are right.  This is a lapsus I will fix.

>> Example (modified from https://bugs.r-project.org/show_bug.cgi?id=9819):
>> op <- options(scipen = 0, OutDec = ".") # (default setting)
>> x <- as.POSIXlt("2007-07-27 16:11:03.02")
>> as.character(x)
>> # "2007-07-27 16:11:03.983547e-06"
>> as.character(x$sec - trunc(x$sec))
>> # "1.983547e-06"
>> options(op)

>> 'as.character.POSIXt' could temporarily set option 'scipen' large enough 
to prevent scientific notation in as.character(fs[n0]) .

> Yes, something like that.

I have committed a version of datetime.R now, svn rev 83010,
which no longer depends on 'OutDec' (but gets such an argument)
and which has a new 'digits' argument which defaults
to 14 for POSIXlt and
to  6 for POSIXct  .. but the user can choose a different value.

Also, it now uses the equivalent of  as.character(round(x$sec, digits))
(in case the seconds need to be shown)  which also solves the
following  "too much precision"  problem.

>> * Too much precision:

>> In some cases with fractional seconds with seconds close to 60, the 
result has many decimal places while there is an accurate representation with 
less decimal places. It is actually OK, just unpleasant.

> I agree that is unpleasant.
> To someone else I had written that we also may need to improve
> the number of decimals shown here.
> The design has been that it should be "full precision"
> as it is for  as.character()

> Now, we know that POSIXct cannot be very precise (in its
> fractional seconds) but that is very different for POSIXlt where
> fractional seconds may have 14 digits after the decimal point.

> Ideally we could *store* with the POSIXlt object if it was
> produced from a POSIXct one, and hence have only around 6 valid digits
> (after the dec.) or not.  As we cannot currently store/save that
> info, we kept using "full" precision which may be much more than
> is sensible.

>> Example (modified from https://bugs.r-project.org/show_bug.cgi?id=14693):
>> op <- options(scipen = 0, OutDec = ".") # (default setting)
>> x <- as.POSIXlt("2011-10-01 12:34:56.3")
>> x$sec == 56.3 # TRUE

> [which may be typical, but may also be platform dependent]

>> print(x$sec, 17)
>> # [1] 56.297
>> as.character(x)
>> # "2011-10-01 12:34:56.297"
>> format(x, "%Y-%m-%d %H:%M:%OS1") # short and accurate
>> # "2011-10-01 12:34:56.3"
>> ct <- as.POSIXct(x, tz = "UTC")
>> identical(ct,
>> as.POSIXct("2011-10-01 12:34:56.3", tz = "UTC"))
>> # TRUE

[Rd] as.character.POSIXt in R devel

2022-10-02 Thread Suharto Anggono Suharto Anggono via R-devel
With r82904, 'as.character.POSIXt' in R devel is changed. The NEWS item:
as.character() now behaves more in line with the
   methods for atomic vectors such as numbers, and is no longer
   influenced by options().

Part of the code:

    s <- trunc(x$sec)
    fs <- x$sec - s
    r1 <- sprintf("%d-%02d-%02d", 1900 + x$year, x$mon+1L, x$mday)
    if(any(n0 <- time != 0)) # add time if not 0
        r1[n0] <- paste(r1[n0],
                        sprintf("%02d:%02d:%02d%s", x$hour[n0], x$min[n0], s[n0],
                                substr(as.character(fs[n0]), 2L, 32L)))


* Wrong:
The result is wrong when as.character(fs[n0]) has scientific notation.
Example (modified from https://bugs.r-project.org/show_bug.cgi?id=9819):
op <- options(scipen = 0, OutDec = ".") # (default setting)
x <- as.POSIXlt("2007-07-27 16:11:03.02")
as.character(x)
# "2007-07-27 16:11:03.983547e-06"
as.character(x$sec - trunc(x$sec))
# "1.983547e-06"
options(op)

'as.character.POSIXt' could temporarily set option 'scipen' large enough to 
prevent scientific notation in as.character(fs[n0]) .


* Too much precision:
In some cases with fractional seconds with seconds close to 60, the result has 
many decimal places while there is an accurate representation with less decimal 
places. It is actually OK, just unpleasant.
Example (modified from https://bugs.r-project.org/show_bug.cgi?id=14693):
op <- options(scipen = 0, OutDec = ".") # (default setting)
x <- as.POSIXlt("2011-10-01 12:34:56.3")
x$sec == 56.3 # TRUE
print(x$sec, 17)
# [1] 56.297
as.character(x)
# "2011-10-01 12:34:56.297"
format(x, "%Y-%m-%d %H:%M:%OS1") # short and accurate
# "2011-10-01 12:34:56.3"
ct <- as.POSIXct(x, tz = "UTC")
identical(ct,
as.POSIXct("2011-10-01 12:34:56.3", tz = "UTC"))
# TRUE
print(as.numeric(ct), 17)
# [1] 1317472496.3
lct <- as.POSIXlt(ct)
lct$sec == 56.3 # FALSE
print(lct$sec, 17)
# [1] 56.29952316284
as.character(ct)
# "2011-10-01 12:34:56.29952316284"
options(op)

The "POSIXct" case is a little different because some precision is already lost 
after conversion to "POSIXct".

In 'as.character.POSIXt', using 'as.character' on the seconds (not separating 
the fractional part) might be good enough, but a leading zero must be added as 
necessary.


* Different from 'format':

- With fractional seconds, the result is influenced by option 'OutDec'.

- From "Printing years" in ?strptime: "For years 0 to 999 most OSes pad with 
zeros or spaces to 4 characters, and Linux outputs just the number."
Because (1900 + x$year) is formatted with %d in 'as.character.POSIXt', years 0 
to 999 are output without padding. This differs from 'format' on OSes other 
than Linux.


* Behavior with "improper" "POSIXlt" object:

- "POSIXlt" object with out-of-bounds components is not normalized.
Example (modified from regr.tests-1d.R):
op <- options(scipen = 0) # (default setting)
x <- structure(
list(sec = 1, min = 59L, hour = 18L,
mday = 6L, mon = 11L, year = 116L,
wday = 2L, yday = 340L,
isdst = 0L, zone = "CET", gmtoff = 3600L),
class = c("POSIXlt", "POSIXt"), tzone = "CET")
as.character(x)
# "2016-12-06 18:59:1"
format(x)
# "2016-12-06 21:45:40"
options(op)

- With "POSIXlt" object where sec, min, hour, mday, mon, and year components 
are not all of the same length, recycling is not handled.
Example (modified from regr.tests-1d.R):
op <- options(scipen = 0) # (default setting)
x <- structure(
list(sec = c(1,  2), min = 59L, hour = 18L,
mday = 6L, mon = 11L, year = 116L,
wday = 2L, yday = 340L,
isdst = 0L, zone = "CET", gmtoff = 3600L),
class = c("POSIXlt", "POSIXt"), tzone = "CET")
as.character(x)
# c("2016-12-06 18:59:01", "NA NA:NA:02")
format(x)
# c("2016-12-06 18:59:01", "2016-12-06 18:59:02")
options(op)



Re: [Rd] gettext(msgid, domain="R") doesn't work for some 'msgid':s

2021-11-06 Thread Suharto Anggono Suharto Anggono via R-devel
This issue has come up before: 
https://stat.ethz.ch/pipermail/r-help/2013-February/346721.html ("gettext 
wierdness"), https://stat.ethz.ch/pipermail/r-devel/2007-December/047893.html 
("gettext() and messages in 'pkg' domain").

Using 'ngettext' is a workaround, like in 
https://rdrr.io/cran/svMisc/src/R/svMisc-internal.R .

It is documented: "For 'gettext', leading and trailing whitespace is ignored 
when looking for the translation."


>> Martin Maechler
> on Fri, 5 Nov 2021 17:55:24 +0100 writes:

> Tomas Kalibera
> on Fri, 5 Nov 2021 16:15:19 +0100 writes:

 >> On 11/5/21 4:12 PM, Duncan Murdoch wrote:
 >>> On 05/11/2021 10:51 a.m., Henrik Bengtsson wrote:
  I'm trying to reuse some of the translations available in base R by
  using:
 
     gettext(msgid, domain="R")
 
  This works great for most 'msgid's, e.g.
 
  $ LANGUAGE=de Rscript -e 'gettext("cannot get working directory",
  domain="R")'
  [1] "kann das Arbeitsverzeichnis nicht ermitteln"
 
  However, it does not work for all.  For instance,
 
  $ LANGUAGE=de Rscript -e 'gettext("Execution halted\n", domain="R")'
  [1] "Execution halted\n"
 
  This despite that 'msgid' existing in:
 
  $ grep -C 2 -F 'Execution halted\n' src/library/base/po/de.po
 
  #: src/main/main.c:342
  msgid "Execution halted\n"
  msgstr "Ausführung angehalten\n"
 
  It could be that the trailing newline causes problems, because the
  same happens also for:
 
  $ LANGUAGE=de Rscript --vanilla -e 'gettext("error during cleanup\n",
  domain="R")'
  [1] "error during cleanup\n"
 
  Is this meant to work, and if so, how do I get it to work, or is it a
  bug?
 >>>
 >>> I don't know the solution, but I think the cause is different than you
 >>> think, because I also have the problem with other strings not
 >>> including "\n":
 >>>
 >>> $ LANGUAGE=de Rscript -e 'gettext("malformed version string",
 >>> domain="R")'
 >>> [1] "malformed version string"

 > You need domain="R-base" for the "malformed version string"


 >> I can reproduce Henrik's report and the problem there is that the
 >> trailing \n is stripped by R before doing the lookup, in do_gettext


 >>     /* strip leading and trailing white spaces and
 >>    add back after translation */
 >>     for(p = tmp;
 >>     *p && (*p == ' ' || *p == '\t' || *p == '\n');
 >>     p++, ihead++) ;

 >> But, calling dgettext with the trailing \n does translate correctly for me.

 >> I'd leave to translation experts how this should work (e.g. whether the
 >> .po files should have trailing newlines).

 > Thanks a lot, Tomas.
 > This is "interesting" .. and I think an R bug one way or the
 > other (and I also note that Henrik's guess was also right on !).

 > We have the following:

 > - New translation *.po source files are to be made from the original *.pot 
 > files.

 > In our case it's our code that produce R.pot and R-base.pot
 > (and more for the non-base packages, and more e.g. for
 > Recommended packages 'Matrix' and 'cluster' I maintain).

 > And notably the R.pot (from all the "base" C error/warn/.. messages)
 > contains tons of msgid strings of the form "...\n"
 > i.e., ending in \n.
 > From that, automatically, the translator's *.po files should also
 > end in \n.

 > Additionally, the GNU gettext FAQ has
 > (here : https://www.gnu.org/software/gettext/FAQ.html#newline )

 > 
 > Q: What does this mean: “'msgid' and 'msgstr' entries do not both end with 
 > '\n'”

 > A: It means that when the original string ends in a newline, your 
 > translation must also end in a newline. And if the original string does not 
 > end in a newline, then your translation should likewise not have a newline 
 > at the end.
 > 

 > From all that I'd conclude that we (R base code) are the source
 > of the problem.
 > Given the above FAQ, it seems common in other projects also to
 > have such trailing \n and so we should really change the C code
 > you cite above.

 > On the other hand, this is from almost the very beginning of
 > when Brian added translation to R,
 > 
 > r32938 | ripley | 2005-01-30 20:24:04 +0100 (Sun, 30 Jan 2005) | 2 lines

 > include \n in whitespace ignored for R-level gettext
 > 

 > I think this has been because simultaneously we had started to
 > emphasize to useRs they should *not* end message/format strings
 > in stop() / warning() by a new line, but rather stop() and
 > warning() would *add* the newlines(s) themselves.

 > Still, currently we have a few such cases in R-base.pot,
 > but just these few and maybe they really are "in error", in the
 > sense we could drop the e

[Rd] formatC() doesn't keep attributes

2021-10-08 Thread Suharto Anggono Suharto Anggono via R-devel
By r80949, 'formatC' code in R devel has
    if (!(n <- length(x))) return(character())

If 'x' has length zero, the return value of 'formatC' doesn't have attributes. 
It doesn't follow the documented "Value": "A character object of the same size 
and attributes as x".

Based on my observation, the early return could be removed, leaving just
    n <- length(x)



[Rd] In function isum in summary.c, k should be R_xlen_t

2020-12-31 Thread Suharto Anggono Suharto Anggono via R-devel


In summary.c, in function 'isum', the loop is 'ITERATE_BY_REGION', which contains the 'for' loop
    for (int k = 0; k < nbatch; k++)
It has been this way since SVN revision 73445, in released R since version 3.5.0. Previously, the loop was
    for (R_xlen_t i = 0; i < n; i++)

Inside 'ITERATE_BY_REGION', the type of the index 'k' should still be 
'R_xlen_t', as previously. If 'sx' is a regular vector (not ALTREP), the data 
pointer is taken and 'nbatch' is the length of the whole vector, just as 
without 'ITERATE_BY_REGION'. With 64-bit R, the vector may be a long vector; 
in that case, correct iteration must reach indices outside the range of 'int'.

However, I haven't found an example in 64-bit R of wrong behavior of
sum(x)
for 'x' with storage mode "integer" and length 2^31 or more.



[Rd] Documentation on 'recycle0' argument of 'paste'

2020-09-18 Thread Suharto Anggono Suharto Anggono via R-devel
This is about R 4.0.2 (or 4.0.1) help on 'paste'.


End part of the first paragraph in "Details" section:
If the arguments are vectors, they are concatenated term-by-term to give a 
character vector result. Vector arguments are recycled as needed, with 
zero-length arguments being recycled to '""' only if 'recycle0' is not true 
_or_ 'collapse' is not 'NULL'.

I suggest removing " _or_ 'collapse' is not 'NULL'".
'collapse' has nothing to do with recycling.


Description of 'recycle0' in "Arguments" section:
'logical' indicating if zero-length character arguments should lead to the 
zero-length 'character(0)' after the 'sep'-phase (which turns into '""' in the 
'collapse'-phase, i.e., when 'collapse' is not 'NULL')

I suggest the explanation to include the term "recycle" or "recycling" like the 
argument name. For example,
'logical' indicating if "zero-length recycling rule" is in effect, where mix of 
zero-length and nonzero-length arguments leads to zero-length ...
or
... where, in mix of zero-length and nonzero-length arguments, zero-length wins 
and the nonzero-length arguments are truncated to zero-length
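A small interactive illustration of the two phases being described (R >= 4.0.1; results as documented for 'paste'):

```r
paste("a", character(0))                    # "a "        : "" recycling by default
paste("a", character(0), recycle0 = TRUE)   # character(0): zero-length wins in sep-phase
paste("a", character(0), recycle0 = TRUE,
      collapse = "+")                       # ""          : collapse-phase turns it into ""
```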



[Rd] R-intro: Appendix A: attach position

2019-09-10 Thread Suharto Anggono Suharto Anggono via R-devel


In "An Introduction to R", in "Appendix A  A sample session", in the part on 
Michelson data, information for
attach(mm)
is
Make the data frame visible at position 3 (the default).

In fact, the 'attach' is at position 2.
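This is easy to check interactively; the data frame below is a small stand-in for the Michelson data:

```r
mm <- data.frame(Speed = c(850, 740, 900))  # stand-in for the Michelson data
attach(mm)
search()[2]    # "mm" -- attached at position 2, the default
detach(mm)
```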



Re: [Rd] methods package: A _R_CHECK_LENGTH_1_LOGIC2_=true error

2019-07-04 Thread Suharto Anggono Suharto Anggono via R-devel
In 'conformMethod', there is another instance of
omittedSig &  .
It only affects the error message.

Original:
    if(any(is.na(match(signature[omittedSig], c("ANY", "missing"))))) {
        bad <- omittedSig & is.na(match(signature[omittedSig], c("ANY", 
"missing")))

After r76756:
    if(any(iiN <- is.na(match(signature[omittedSig], c("ANY", "missing"))))) {
        bad <- omittedSig & iiN
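The length mismatch is easy to see with hypothetical values: 'iiN' has one element per omitted slot (here 2), while 'omittedSig' has one per formal argument (here 4), so `&` recycles silently:

```r
omittedSig <- c(FALSE, TRUE, TRUE, FALSE)  # hypothetical: formals 2 and 3 omitted
iiN <- c(TRUE, FALSE)                      # one element per omitted slot
omittedSig & iiN    # FALSE FALSE TRUE FALSE -- recycling misaligns the slots
bad <- omittedSig
bad[bad] <- iiN
bad                 # FALSE TRUE FALSE FALSE -- indexed assignment stays aligned
```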


--
> Martin Maechler 
>     on Sat, 29 Jun 2019 12:05:49 +0200 writes:

> Martin Maechler 
>     on Sat, 29 Jun 2019 10:33:10 +0200 writes:

> peter dalgaard 
>     on Fri, 28 Jun 2019 16:20:03 +0200 writes:

    >>> > On 28 Jun 2019, at 16:03 , Martin Maechler  wrote:
    >>> > 
    >>> >> Henrik Bengtsson 
    >>> >>    on Thu, 27 Jun 2019 16:00:39 -0700 writes:
    >>> > 
    >>> >> Using:
    >>> >> 
    >>> >> untrace(methods::conformMethod)
    >>> >> at <- c(12,4,3,2)
    >>> >> str(body(methods::conformMethod)[[at]])
    >>> >> ## language omittedSig <- omittedSig && (signature[omittedSig] != 
"missing")
    >>> >> cc <- 0L
    >>> >> trace(methods::conformMethod, tracer = quote({
    >>> >>  cc <<- cc + 1L
    >>> >>  print(cc)
    >>> >>  if (cc == 31) {  ## manually identified
    >>> >>    untrace(methods::conformMethod)
    >>> >>    trace(methods::conformMethod, at = list(at), tracer = quote({
    >>> >>      str(list(signature = signature, mnames = mnames, fnames = 
fnames))
    >>> >>      print(ls())
    >>> >>      try(str(list(omittedSig = omittedSig, signature = signature)))
    >>> >>    }))
    >>> >>  }
    >>> >> }))
    >>> >> loadNamespace("oligo")
    >>> >> 
    >>> >> gives:
    >>> >> 
    >>> >> Untracing function "conformMethod" in package "methods"
    >>> >> Tracing function "conformMethod" in package "methods"
    >>> >> Tracing conformMethod(signature, mnames, fnames, f, fdef, definition)
    >>> >> step 12,4,3,2
    >>> >> List of 3
    >>> >> $ signature: Named chr [1:4] "TilingFeatureSet" "ANY" "ANY" "array"
    >>> >>  ..- attr(*, "names")= chr [1:4] "object" "subset" "target" "value"
    >>> >>  ..- attr(*, "package")= chr [1:4] "oligoClasses" "methods" 
"methods" "methods"
    >>> >> $ mnames   : chr [1:2] "object" "value"
    >>> >> $ fnames   : chr [1:4] "object" "subset" "target" "value"
    >>> >> [1] "f"          "fdef"       "fnames"     "fsig"       "imf"
    >>> >> [6] "method"     "mnames"     "omitted"    "omittedSig" "sig0"
    >>> >> [11] "sigNames"   "signature"
    >>> >> List of 2
    >>> >> $ omittedSig: logi [1:4] FALSE TRUE TRUE FALSE
    >>> >> $ signature : Named chr [1:4] "TilingFeatureSet" "ANY" "ANY" "array"
    >>> >>  ..- attr(*, "names")= chr [1:4] "object" "subset" "target" "value"
    >>> >>  ..- attr(*, "package")= chr [1:4] "oligoClasses" "methods" 
"methods" "methods"
    >>> >> Error in omittedSig && (signature[omittedSig] != "missing") :
    >>> >>  'length(x) = 4 > 1' in coercion to 'logical(1)'
    >>> >> Error: unable to load R code in package 'oligo'
    >>> >> 
    >>> > 
    >>> > Thank you, Henrik, nice piece of using trace() .. and the above
    >>> > is useful for solving the issue --  I can work with that.
    >>> > 
    >>> > I'm  already pretty sure the wrong code starts with
    >>> > 
    >>> >    omittedSig <- sigNames %in% fnames[omitted] # 

    >> my  "pretty sure"  statement above has proven to be wrong ..

    >>> > -
    >>> > 
    >>> 
    >>> I think the intention must have been that the two "ANY" signatures 
should change to "missing". 

    >> Definitely.

    >>> However, with the current logic that will not happen, because
    >>> 
    >>> > c(F,T,T,F) &&  c(T,T)
    >>> [1] FALSE
    >>> 
    >>> Henrik's non-fix would have resulted in
    >>> 
    >>> > c(F,T,T,F) &  c(T,T)
    >>> [1] FALSE  TRUE  TRUE FALSE
    >>> 
    >>> which is actually right, but only coincidentally due to recycling of 
c(T,T). Had it been c(F,T) then it would have been expanded to c(F,T,F,T) which 
would be the opposite of what was wanted.
    >>> 
    >>> Barring NA issues, I still think 
    >>> 
    >>> omittedSig[omittedSig] <- (signature[omittedSig] != "missing")
    >>> 
    >>> should do the trick.

    >> yes, (most probably).  I've found a version of that which should
    >> be even easier to "read and understand", in  svn commit 76753 :

    >> svn diff -c 76753 src/library/methods/R/RMethodUtils.R

    >> --- src/library/methods/R/RMethodUtils.R    (Revision 76752)
    >> +++ src/library/methods/R/RMethodUtils.R    (Revision 76753)
    >> @@ -342,8 +342,7 @@
    >> gettextf("formal arguments (%s) omitted in the method definition cannot 
be in the signature", bad2),
    >> call. = TRUE, domain = NA)
    >> }
    >> -    else if(!all(signature[omittedSig] == "missing")) {
    >> -        omittedSig <- omittedSig && (signature[omittedSig] != "missing")
    >> +    else if(any(omittedSig <- omittedSig & signature != "missing")) {


    >> BTW:  I

Re: [Rd] stopifnot

2019-05-30 Thread Suharto Anggono Suharto Anggono via R-devel
Here is a patch to function 'stopifnot' that adds 'evaluated' argument and 
makes 'exprs' argument in 'stopifnot' like 'exprs' argument in 'withAutoprint'.

--- stop.R  2019-05-30 14:01:15.282197286 +
+++ stop_new.R  2019-05-30 14:01:51.372187466 +
@@ -31,7 +31,7 @@
     .Internal(stop(call., .makeMessage(..., domain = domain)))
 }
 
-stopifnot <- function(..., exprs, local = TRUE)
+stopifnot <- function(..., exprs, evaluated = FALSE, local = TRUE)
 {
     n <- ...length()
     if(!missing(exprs)) {
@@ -41,21 +41,19 @@
         else if(isFALSE(local)) .GlobalEnv
         else if (is.environment(local)) local
         else stop("'local' must be TRUE, FALSE or an environment")
-        exprs <- substitute(exprs) # protect from evaluation
-        E1 <- if(is.call(exprs)) exprs[[1]]
+        E1 <- if(!evaluated && is.call(exprs <- substitute(exprs))) exprs[[1]]
         cl <- if(is.symbol(E1) &&
-                 (E1 == quote(`{`) || E1 == quote(expression))) {
+                 E1 == quote(`{`)) {
                  exprs[[1]] <- quote(stopifnot) ## --> stopifnot(*, *, ..., *) :
                  exprs
              }
              else
                  as.call(c(quote(stopifnot),
-                           if(is.null(E1) && is.symbol(exprs) &&
-                              is.expression(E1 <- eval(exprs))) # the *name* of an expression
-                               as.list(E1)
+                           if(is.expression(exprs))
+                               exprs
                            else
                                as.expression(exprs)
-                           )) # or fail ..
+                           ))
         names(cl) <- NULL
         return(eval(cl, envir=envir))
     }




 Subject: Re: [Rd] stopifnot
 To: "Martin Maechler" 
 Cc: r-devel@r-project.org
 Date: Monday, 15 April, 2019, 2:56 AM
 
Also, in the current definition of function 'stopifnot' in R 3.6.0 beta or R 
devel, for 'cl', if 'exprs' is specified, there is a case with the comment "the 
*name* of an expression". The intent is to allow
stopifnot(exprs = ee) ,
where variable 'ee' holds an expression object, to work on the expression 
object.

It is not quite right to use eval(exprs) . It fails when 'stopifnot' is called 
inside a function, like
f <- function(ee) stopifnot(exprs = ee)
f(expression())

But, how about local=FALSE case? Should the following work?
f <- function(ee) stopifnot(exprs = ee, local = FALSE)
f(expression())

But, why bother making it work, while it is undocumented that 'exprs' argument 
in 'stopifnot' can be an expression? Well, yes, expectation may be set from the 
name "exprs" itself or from argument 'exprs' in function 'source' or 
'withAutoprint'. Function 'withAutoprint' may be the closest match.

Function 'withAutoprint' has 'evaluated' argument that controls whether work is 
on value of  'exprs' or on 'exprs' as given. I like the approach.
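For reference, this is how 'withAutoprint' uses its 'evaluated' argument:

```r
ee <- expression(x <- 2, x^2)
withAutoprint(ee, evaluated = TRUE)   # works on the value of 'ee', printing each step
## without evaluated = TRUE, the symbol 'ee' itself would be the single expression
```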



Re: [Rd] stopifnot

2019-04-14 Thread Suharto Anggono Suharto Anggono via R-devel
In current definition of function 'stopifnot' in stop.R in R 3.6.0 beta 
(https://svn.r-project.org/R/branches/R-3-6-branch/src/library/base/R/stop.R) 
or R devel (https://svn.r-project.org/R/trunk/src/library/base/R/stop.R), if 
'exprs' is specified, cl[[1]] is quote(stopifnot) . To be more robust, 
quote(base::stopifnot) may be used instead.


Also, in the current definition of function 'stopifnot' in R 3.6.0 beta or R 
devel, for 'cl', if 'exprs' is specified, there is a case with the comment "the 
*name* of an expression". The intent is to allow
stopifnot(exprs = ee) ,
where variable 'ee' holds an expression object, to work on the expression 
object.

It is not quite right to use eval(exprs) . It fails when 'stopifnot' is called 
inside a function, like
f <- function(ee) stopifnot(exprs = ee)
f(expression())

But, how about local=FALSE case? Should the following work?
f <- function(ee) stopifnot(exprs = ee, local = FALSE)
f(expression())

But, why bother making it work, while it is undocumented that 'exprs' argument 
in 'stopifnot' can be an expression? Well, yes, expectation may be set from the 
name "exprs" itself or from argument 'exprs' in function 'source' or 
'withAutoprint'. Function 'withAutoprint' may be the closest match.

Function 'withAutoprint' has 'evaluated' argument that controls whether work is 
on value of  'exprs' or on 'exprs' as given. I like the approach.


If 'E1' is an expression object,
as.call(c(quote(stopifnot), E1))
also works, without converting 'E1' to list.


I suggest to arrange "details" section in stopifnot.Rd as follows:
This function is intended ...
Since R version 3.5.0, stopifnot(exprs = { ... }) ...
stopifnot(A, B) ... is conceptually equivalent to ...
Since R version 3.5.0, expressions are evaluated sequentially ...
Since R version 3.6.0, stopifnot no longer handles potential errors or warnings 
...  ---not including sys.call()
Since R version 3.4.0, ... all.equal ...
sys.call()

Use of sys.call() in 'stopifnot' actually happens since R 3.5.0, as the call 
included in error message produced by 'stopifnot'. In R 3.5.x, it is 
sys.call(-1) , that can be NULL . In current R 3.6.0 beta, it is 
sys.call(sys.parent(1L)) , only if sys.parent(1L) is not 0. The two may differ 
only for 'stopifnot' that is called via 'eval' or the like.

I think it is good if the documentation also includes an example of use of 
'stopifnot' inside a function, where error message from 'stopifnot' includes 
call since R 3.5.0. Such an example is in 
https://stat.ethz.ch/pipermail/r-devel/2017-May/074303.html .


On Mon, 1/4/19, Martin Maechler  wrote:

 Subject: Re: [Rd] stopifnot

 Cc: r-devel@r-project.org
 Date: Monday, 1 April, 2019, 8:12 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>    on Sun, 31 Mar 2019 15:26:13 + writes:

[.]
[ "eval() inside for()" not giving call in error message .]
[.]

    > "Details" section of 'stopifnot' documentation in current R 3.6.0 alpha
    > 
(https://svn.r-project.org/R/branches/R-3-6-branch/src/library/base/man/stopifnot.Rd)
    > has this.

    >   Since \R version 3.6.0, \code{stopifnot()} no longer handles potential
    >   errors or warnings (by \code{\link{tryCatch}()} etc) for each single
    >   expression but rather aims at using the correct
    >   \code{\link{sys.call}()} to get the most meaningful error message in
    >   case of an error.  This provides considerably less overhead.

    > I think part of the first sentence starting from "but rather" should be 
removed because it is not true.

You are right that it is not accurate... I'll modify it,
including keeping the  "considerably less overhead"
which had been one important reason for changing from 3.5.x to
the current version.

    > The next paragraph:

    >   Since \R version 3.5.0, expressions \emph{are} evaluated sequentially,
    >   and hence evaluation stops as soon as there is a \dQuote{non-TRUE}, as
    >   indicated by the above conceptual equivalence statement.
    >   Further, when such an expression signals an error or
    >   \code{\link{warning}}, its \code{\link{conditionCall}()} no longer
    >   contains the full \code{stopifnot} call, but just the erroneous
    >   expression.

    > As I said earlier 
(https://stat.ethz.ch/pipermail/r-devel/2019-February/077386.html), the last 
sentence above is not entirely true. 

You are right to some degree:  That really was true for R 3.5.x,
but is no longer entirely accurate.

It is still true currently interestingly tha

Re: [Rd] stopifnot -- eval(*) inside for()

2019-04-02 Thread Suharto Anggono Suharto Anggono via R-devel
With
f <- function(x) for (i in 1) x
fc <- cmpfun(f)
(my previous example), error message of
fc(is.numeric(y))
shows the originating call as well, while error message of
f(is.numeric(y))
doesn't. Compiled version behaves differently.

Even with
f <- function(x) for (i in 1) {x; eval(expression(i))}
fc <- cmpfun(f)
, error message of
fc(is.numeric(y))
shows the originating call in R 3.3.1.


As I see it, the error message shows only one line of the call. If the deparsed 
call spans more than one line, the rest is not shown.


In 'stopifnot' in R 3.5.x, each is wrapped in 'tryCatch' which is wrapped again 
in 'withCallingHandlers'. Just one wrapping may be enough. The 
'withCallingHandlers' construct in 'stopifnot' in R 3.5.x has no effect anyway, 
as I said before 
(https://stat.ethz.ch/pipermail/r-devel/2019-February/077386.html). Also, 
'tryCatch' (or 'withCallingHandlers' ...) can wrap the entire 'for' loop. The 
slowdown can be less than in R 3.5.x.
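A minimal sketch of that alternative (the names 'exprs' and 'envir' and the helper are assumed, not the actual stopifnot code): one handler installed once around the whole loop, so the per-iteration wrapping cost of R 3.5.x is avoided:

```r
checkAll <- function(exprs, envir = parent.frame())
    tryCatch(
        for (e in exprs)
            if (!isTRUE(eval(e, envir)))
                stop(simpleError(paste(deparse(e), "is not TRUE"))),
        error = function(cond)
            ## one handler for all expressions: re-signal without a call
            stop(simpleError(conditionMessage(cond)))
    )
checkAll(expression(1 < 2, is.numeric(pi)))   # passes silently
```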


On Mon, 1/4/19, Martin Maechler  wrote:

 Subject: Re: [Rd] stopifnot -- eval(*) inside for()

 Cc: r-devel@r-project.org
 Date: Monday, 1 April, 2019, 5:00 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>on Sun, 31 Mar 2019 15:26:13 + writes:

> Ah, with R 3.5.0 or R 3.4.2, but not with R 3.3.1, 'eval'
> inside 'for' makes compiled version behave like
> non-compiled version. 

Ah.. ... thank you for detecting that "eval() inside for()" behaves
specially in how error messages get a call or not.
Let's focus only on this issue here.

I'm adding a 0-th case to make even clearer what you are saying:

  >  options(error = expression(NULL))
  >  library(compiler)
  >  enableJIT(0)

  > f0 <- function(x) { x ; x^2 } ; f0(is.numeric(y))
  Error in f0(is.numeric(y)) (from #1) : object 'y' not found
  > (function(x) { x ; x^2 })(is.numeric(y))
  Error in (function(x) { (from #1) : object 'y' not found
  > f0c <- cmpfun(f0) ; f0c(is.numeric(y))

so by default, not only the error message but the originating
call is shown as well.

However, here's your revealing examples:

  > f <- function(x) for (i in 1) {x; eval(expression(i))}
  > f(is.numeric(y))
  > # Error: object 'y' not found
  > fc <- cmpfun(f)
  > fc(is.numeric(y))
  > # Error: object 'y' not found

I've tried more examples and did not find any difference
between simple interpreted and bytecompiled code {apart
from "keep.source=TRUE" keeping source, sometimes visible}.
So I don't understand yet why you think the byte compiler plays
a role.

Rather the crucial difference seems  the error happens inside a
loop which contains an explicit eval(.), and that eval() may
even be entirely unrelated to the statement in which the error
happens [above: The error happens when the promise 'x' is
evaluated, *before* eval() is called at all].


> Is this accidental feature going to be relied upon?

[i.e.  *in  stopifnot() R code (which in R-devel and R 3.5.x has
had an eval() inside the for()-loop)]

That is a good question.
What I really like about the R-devel case:  We do get errors
signalled that do *not* contain the full stopifnot() call.

With the newish introduction of the `exprs = { ... ... }` variant,
it is even more natural to have large `exprs` in a stopifnot() call,
and when there's one accidental error in there, it's quite
unhelpful to see the full stopifnot(..) call {many lines
of R code} obfuscating the one statement which produced the
error.

So it seems I am asking for a new feature in R, 
namely to temporarily say: Set the call to errors to NULL "in the following".

In R 3.5.x, I had used withCallingHandlers(...) to achieve that
and do even similar for warnings... but needed to that for every
expression and hence inside the for loop  and the consequence
was a relatively large slowdown of stopifnot()..  which
triggered all the changes since.

Whereas what we see here ["eval() inside for()"] is a cheap
automatic suppression of 'call' for the "internal errors", i.e.,
those we don't trigger ourselves via stop(simplError(...)).



Re: [Rd] stopifnot

2019-03-31 Thread Suharto Anggono Suharto Anggono via R-devel
Ah, with R 3.5.0 or R 3.4.2, but not with R 3.3.1, 'eval' inside 'for' makes 
compiled version behave like non-compiled version.

options(error = expression(NULL))
library(compiler)
enableJIT(0)
f <- function(x) for (i in 1) {x; eval(expression(i))}
f(is.numeric(y))
# Error: object 'y' not found
fc <- cmpfun(f)
fc(is.numeric(y))
# Error: object 'y' not found

Is this accidental feature going to be relied upon?


"Details" section of 'stopifnot' documentation in current R 3.6.0 alpha 
(https://svn.r-project.org/R/branches/R-3-6-branch/src/library/base/man/stopifnot.Rd)
 has this.

  Since \R version 3.6.0, \code{stopifnot()} no longer handles potential
  errors or warnings (by \code{\link{tryCatch}()} etc) for each single
  expression but rather aims at using the correct
  \code{\link{sys.call}()} to get the most meaningful error message in
  case of an error.  This provides considerably less overhead.

I think part of the first sentence starting from "but rather" should be removed 
because it is not true.


The next paragraph:

  Since \R version 3.5.0, expressions \emph{are} evaluated sequentially,
  and hence evaluation stops as soon as there is a \dQuote{non-TRUE}, as
  indicated by the above conceptual equivalence statement.
  Further, when such an expression signals an error or
  \code{\link{warning}}, its \code{\link{conditionCall}()} no longer
  contains the full \code{stopifnot} call, but just the erroneous
  expression.

As I said earlier 
(https://stat.ethz.ch/pipermail/r-devel/2019-February/077386.html), the last 
sentence above is not entirely true. It may say something like:
Further, when such an expression signals an error, stopifnot() in R 3.5.x makes 
its conditionCall() the erroneous expression, but no longer since R 3.6.0.


Is it OK that, for
do.call(stopifnot, list(exprs = expression())) ,
the whole expression object is taken as one?


End portion from running
example(stopifnot)
in R 3.5.0:
stpfnt> stopifnot(all.equal(pi, 3.141593),  2 < 2, all(1:10 < 12), "a" < "b")
Error in eval(ei, envir) : pi and 3.141593 are not equal:
  Mean relative difference: 1.102658e-07

To me, "in eval(*)" is rather surprising and annoying and doesn't add clarity. 
Yes, stop() gives the same. But, in this case, just "Error", like in R before 
version 3.5.0, feels better to me. If
stop(simpleError(msg, call = if(p <- sys.parent()) sys.call(p)))
were used in 'stopifnot', just "Error" would be given in this case.
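The difference that form makes can be sketched as follows ('msg' and the two functions are hypothetical):

```r
msg <- "not all TRUE"   # hypothetical message
f <- function() stop(simpleError(msg, call = if(p <- sys.parent()) sys.call(p)))
g <- function() f()
## g() -> Error in g() : not all TRUE  (the call of the frame where f() was called)
## f() -> Error : not all TRUE         (sys.parent() is 0 at top level, so call is NULL)
```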



 wrote:

 Subject: Re: [Rd] stopifnot
 To: r-devel@r-project.org
 Date: Thursday, 7 March, 2019, 3:43 PM

[...]

As far as I can see, full stopifnot(...) call can only appear from an error 
that happens during evaluation of an argument of 'stopifnot'. Because the error 
is not raised by 'stopifnot', the call in the error has nothing to do with how 
'n' is computed in sys.call(n-1) , or even with use of sys.call(n-1) itself.

if(n > 1) sys.call(n-1)
that I proposed previously was aimed to be like
sys.call(-1)
in 'stopifnot' in R 3.5.x. Negative number counts back from current frame. The 
value of 'n' is sys.nframe() or (sys.nframe()-3). In my patch, 
stopifnot(exprs=*) drives stopifnot(...) call via 'eval'. I found that frames 
were generated for
stopifnot (exprs) -> eval -> eval (.Internal) -> stopifnot (...)
From stopifnot (...) , reaching stopifnot (exprs) takes 3 steps back.


[...]

options(error = expression(NULL))
library(compiler)
enableJIT(0)
f <- function(x) for (i in 1) x
f(is.numeric(y))
# Error: object 'y' not found
fc <- cmpfun(f)
fc(is.numeric(y))
# Error in fc(is.numeric(y)) : object 'y' not found

The above illustrates what happens in current 'stopifnot' without 
'withCallingHandlers' or 'tryCatch'. For error during 'for', non-compiled and 
compiled versions are different. It surprised me.

[...]


With my revised patch, the 'else' clause for 'cl' gives
call("expression", exprs) .
For
do.call(stopifnot, list(exprs = expression())) ,
the whole expression object is taken as one.

do.call(stopifnot, list(exprs = expression(1==1, 2 < 1, stop("NOT GOOD!\n"))))
Error in do.call(stopifnot, list(exprs = expression(1 == 1, 2 < 1, stop("NOT 
GOOD!\n")))) : 
  expression(1 == 1, 2 < 1, stop("NOT GOOD!\n")) are not all TRUE

To be the same as in R 3.5.x, the 'else' can be
as.call(c(quote(expression), as.expression(exprs)))


On Wed, 6/3/19, Martin Maechler  wrote:

Subject: Re: [Rd] stopifnot

To: r-devel@r-project.org
Cc: "Martin Maechler" 
Date: Wednesday, 6 March, 2019, 3:50 PM

> Martin Maechler 
>    on Tue, 5 Mar 2019 21:04:08 +0100 writes:

> Suharto Anggono Suharto Anggono 
>    on Tue, 5 Mar 2019 17:29:20 + writes:

[...]

    >> After thinking again, I propose to use
    >>         stop(simpleError(msg, call = if(p <- sys.parent()) sys.call(p)))

    > That would of course be considerably simpler indeed,  part "2 a" of these:

    >> - It seems that the call is the call of the frame where st

[Rd] Inappropriate "Unix-alike platforms" in txtProgressBar.Rd in R devel

2019-03-18 Thread Suharto Anggono Suharto Anggono via R-devel
In current R devel (r76247), "See Also" section of help page on 
'txtProgressBar' 
(https://svn.r-project.org/R/trunk/src/library/utils/man/txtProgressBar.Rd) has 
"Unix-alike platforms" as a detail on 'tkProgressBar':
\seealso{
  \code{\link{winProgressBar}} (Windows only),
  \code{\link{tkProgressBar}} (Unix-alike platforms).
}

In fact, 'tkProgressBar' also exists in R on Windows.



Re: [Rd] stopifnot

2019-03-07 Thread Suharto Anggono Suharto Anggono via R-devel
By not using 'withCallingHandler' or 'tryCatch', the state is like 'stopifnot' 
in R 3.4.x. If 'stopifnot' becomes faster than in R 3.4.x when the expressions 
given to 'stopifnot' are all TRUE, it is because 'match.call' is not called. 
Credit is to https://github.com/HenrikBengtsson/Wishlist-for-R/issues/70 for 
the idea.

Speaking about 'match.call',
match.call()[[i+1L]]
can replace
match.call(expand.dots=FALSE)$...[[i]] .
Result of match.call() follows argument order in function definition. In 
'stopifnot', '...' comes first.
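A quick check of that equivalence (a toy function standing in for 'stopifnot'):

```r
f <- function(..., extra = 1)    # '...' first in the formals, as in stopifnot
    identical(match.call(expand.dots = FALSE)$...[[1L]], match.call()[[2L]])
f(x + y, extra = 2)              # TRUE -- neither form evaluates the argument
```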


Note that what I proposed lately was not merely
sys.call(sys.parent()) ,
but
if(p <- sys.parent()) sys.call(p) .
When sys.parent() is 0, which is the frame number of .GlobalEnv, the result is 
NULL. The result is never the current call. I believe that it is the call of 
sys.frame(sys.parent()) or parent.frame(), which is the frame where 
stopifnot(...) is evaluated, like I said before.
sys.frame(0) is .GlobalEnv, but sys.call(0) is current call, the same as 
sys.call() or sys.call(sys.nframe()). See 
https://stat.ethz.ch/pipermail/r-devel/2016-March/072511.html .
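The asymmetry at 0 can be seen directly:

```r
f <- function() list(fr = sys.frame(0), cl = sys.call(0))
res <- f()                # called at top level
environmentName(res$fr)   # "R_GlobalEnv" -- sys.frame(0) is .GlobalEnv
res$cl                    # f()           -- sys.call(0) is the current call
```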

As far as I can see, full stopifnot(...) call can only appear from an error 
that happens during evaluation of an argument of 'stopifnot'. Because the error 
is not raised by 'stopifnot', the call in the error has nothing to do with how 
'n' is computed in sys.call(n-1) , or even with use of sys.call(n-1) itself.

if(n > 1) sys.call(n-1)
that I proposed previously was aimed to be like
sys.call(-1)
in 'stopifnot' in R 3.5.x. Negative number counts back from current frame. The 
value of 'n' is sys.nframe() or (sys.nframe()-3). In my patch, 
stopifnot(exprs=*) drives stopifnot(...) call via 'eval'. I found that frames 
were generated for
stopifnot (exprs) -> eval -> eval (.Internal) -> stopifnot (...)
From stopifnot (...) , reaching stopifnot (exprs) takes 3 steps back.


Showing full call in error is not unique to 'stopifnot'. In my E-mail in 
https://stat.ethz.ch/pipermail/r-devel/2019-February/077386.html , I gave
identity(is.na(log()))
as an example. It gives
Error in identity(is.na(log())) : 
  argument "x" is missing, with no default

Expanding further,
identity(identity(is.na(log(
has the same error message, with only one call to 'identity'.

I guess that it is because 'log' and 'is.na' are primitive functions, but 
'identity' is not. I guess that a primitive function doesn't have its own 
context and doesn't generate a frame, so the innermost non-primitive function is 
taken as the context.

However,
identity(is.na(log("a")))
gives
Error in log("a") : non-numeric argument to mathematical function

I guess that some primitive functions in some cases modify call to be shown in 
error or warning message.

options(error = expression(NULL))
library(compiler)
enableJIT(0)
f <- function(x) for (i in 1) x
f(is.numeric(y))
# Error: object 'y' not found
fc <- cmpfun(f)
fc(is.numeric(y))
# Error in fc(is.numeric(y)) : object 'y' not found

The above illustrates what happens in current 'stopifnot' without 
'withCallingHandlers' or 'tryCatch'. For error during 'for', non-compiled and 
compiled versions are different. It surprised me.

'stopifnot' without 'withCallingHandlers' and 'tryCatch' is like in R 3.4.x. I 
had expected error from
stopifnot(is.numeric(y))
when 'y' doesn't exist to contain full 'stopifnot' call, as in R 3.4.x. My idea 
of calling 'stopifnot' again for stopifnot(exprs=*) was to avoid seeing 'eval' 
in error message of
stopifnot(exprs = { is.numeric(y) })
when 'y' doesn't exist, assuming that seeing
stopifnot(is.numeric(y))
in error message was OK. As an aside, calling 'eval' once is faster than calling 
'eval' multiple times.

If it is really wanted that error from
stopifnot(is.numeric(y))
when 'y' doesn't exist doesn't give full stopifnot(...) call, I think use of 
'tryCatch' is unavoidable.


A minor advantage of 'assert' with 'do.call' is smaller traceback() .


With my revised patch, the 'else' clause for 'cl' gives
call("expression", exprs) .
For
do.call(stopifnot, list(exprs = expression())) ,
the whole expression object is taken as one.

do.call(stopifnot, list(exprs = expression(1==1, 2 < 1, stop("NOT GOOD!\n"))))
Error in do.call(stopifnot, list(exprs = expression(1 == 1, 2 < 1, stop("NOT 
GOOD!\n")))) : 
  expression(1 == 1, 2 < 1, stop("NOT GOOD!\n")) are not all TRUE

To be the same as in R 3.5.x, the 'else' can be
as.call(c(quote(expression), as.expression(exprs)))


On Wed, 6/3/19, Martin Maechler  wrote:

 Subject: Re: [Rd] stopifnot

To: r-devel@r-project.org
 Cc: "Martin Maechler" 
 Date: Wednesday, 6 March, 2019, 3:50 PM

> Martin Maechler 
>    on Tue, 5 Mar 2019 21:04:08 +0100 writes:

> Suharto Anggono Suharto Anggono 
>    on Tue, 5 Mar 2019 17:29:20 + writes:

    >> Another possible shortcut definition:

    >> assert <- function(exprs)
    >> do.call("stopifnot", list(exprs = substitute(exprs), local = 
parent.frame()))

    > Thank you. 

Re: [Rd] stopifnot

2019-03-05 Thread Suharto Anggono Suharto Anggono via R-devel
Another possible shortcut definition:
assert <- function(exprs)
do.call("stopifnot", list(exprs = substitute(exprs), local = parent.frame()))


After thinking again, I propose to use
        stop(simpleError(msg, call = if(p <- sys.parent()) sys.call(p)))

- It seems that the call is the call of the frame where stopifnot(...) is 
evaluated. Because that is the correct context, I think it is good.
- It is simpler and also works for call that originally comes from 
stopifnot(exprs=*) .
- It allows shortcut ('assert') to have the same call in error message as 
stopifnot(exprs=*) .
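Putting the pieces together, the shortcut then behaves like stopifnot(exprs=*) ('assert' as defined above):

```r
assert <- function(exprs)
    do.call("stopifnot", list(exprs = substitute(exprs), local = parent.frame()))
assert({ is.numeric(pi); pi > 3 })   # passes silently
## assert({ pi > 4 })  would signal that pi > 4 is not TRUE
```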


Another thing: Is it intended that
do.call("stopifnot", list(exprs = expression()))
evaluates each element of the expression object? If so, maybe add a case for 
'cl', like
        else if(is.expression(exprs))
        as.call(c(quote(expression), exprs))


On Mon, 4/3/19, Martin Maechler  wrote:

 Subject: Re: [Rd] stopifnot

 Cc: r-devel@r-project.org
 Date: Monday, 4 March, 2019, 4:59 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>    on Sat, 2 Mar 2019 08:28:23 + writes:

    > A private reply by Martin made me realize that I was wrong about
    > stopifnot(exprs=TRUE) .
    > It actually works fine. I apologize. What I tried and was failed was

    > stopifnot(exprs=T) .
    > Error in exprs[[1]] : object of type 'symbol' is not subsettable

indeed! .. and your patch below does address that, too.

    > The shortcut
    > assert <- function(exprs) stopifnot(exprs = exprs)
    > mentioned in "Warning" section of the documentation similarly fails when 
called, for example
    > assert({})

    > About shortcut, a definition that rather works:
    > assert <- function(exprs) eval.parent(substitute(stopifnot(exprs = 
exprs)))

Interesting... thank you for the suggestion!  I plan to add it
to the help page and then use it a bit .. before considering more.

    > Looking at https://stat.ethz.ch/pipermail/r-devel/2017-May/074227.html , 
using sys.parent() may be not good. For example, in
    > f <- function() stopifnot(exprs={FALSE}, local=FALSE); f()

I'm glad you found this too.. I did have "uneasy feelings" about
using sys.parent(2) to find the correct call ..  and I'm still
not 100% sure about the smart computation of 'n' for
sys.call(n-1) ... but I agree we should move in that direction
as it is so much faster than using withCallingHandlers() + tryCatch()
for all the expressions.

In my tests your revised patch (including the simplificationn
you sent 4 hours later) seems good and indeed does have very
good timing in simple experiments.

It will lead to some error messages being changed,
but in the examples I've seen,  the few changes were acceptable
(sometimes slightly less helpful, sometimes easier to read).


Martin

    > A revised patch (also with simpler 'cl'):
    > --- stop.R    2019-02-27 16:15:45.324167577 +
    > +++ stop_new.R    2019-03-02 06:21:35.919471080 +
    > @@ -1,7 +1,7 @@
    > #  File src/library/base/R/stop.R
    > #  Part of the R package, https://www.R-project.org
    > #
    > -#  Copyright (C) 1995-2018 The R Core Team
    > +#  Copyright (C) 1995-2019 The R Core Team
    > #
    > #  This program is free software; you can redistribute it and/or modify
    > #  it under the terms of the GNU General Public License as published by
    > @@ -33,25 +33,28 @@

    > stopifnot <- function(..., exprs, local = TRUE)
    > {
    > +    n <- ...length()
    > missE <- missing(exprs)
    > -    cl <-
    > if(missE) {  ## use '...' instead of exprs
    > -        match.call(expand.dots=FALSE)$...
    > } else {
    > -        if(...length())
    > +        if(n)
    > stop("Must use 'exprs' or unnamed expressions, but not both")
    > envir <- if (isTRUE(local)) parent.frame()
    > else if(isFALSE(local)) .GlobalEnv
    > else if (is.environment(local)) local
    > else stop("'local' must be TRUE, FALSE or an environment")
    > exprs <- substitute(exprs) # protect from evaluation
    > -        E1 <- exprs[[1]]
    > +        E1 <- if(is.call(exprs)) exprs[[1]]
    > +        cl <-
    > if(identical(quote(`{`), E1)) # { ... }
    > -        do.call(expression, as.list(exprs[-1]))
    > +        exprs
    > else if(identical(quote(expression), E1))
    > -        eval(exprs, envir=envir)
    > +        exprs
    > else
    > -        as.expression(exprs) # or fail ..
    > +        call("expression", exprs) # or fail

Re: [Rd] stopifnot

2019-03-02 Thread Suharto Anggono Suharto Anggono via R-devel
Instead of
if(!is.null(names(cl))) names(cl) <- NULL ,
just
names(cl) <- NULL
looks simpler, and the memory usage and speed are not bad in my little experiment.
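A small illustration (made-up call object, just for demonstration) of why the unconditional assignment is safe either way:

```r
## Setting names to NULL on a call works whether or not names are present.
cl <- quote(expression(a = x > 0, y > 0))  # a call with one tagged argument
print(names(cl))    # c("", "a", "")
names(cl) <- NULL   # unconditional removal; cheap when names are already NULL
print(names(cl))    # NULL
```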




 Subject: Re: [Rd] stopifnot
 To: r-devel@r-project.org
 Date: Saturday, 2 March, 2019, 3:28 PM
 
[...]

A revised patch (also with simpler 'cl'):
--- stop.R2019-02-27 16:15:45.324167577 +
+++ stop_new.R2019-03-02 06:21:35.919471080 +
@@ -1,7 +1,7 @@
#  File src/library/base/R/stop.R
#  Part of the R package, https://www.R-project.org
#
-#  Copyright (C) 1995-2018 The R Core Team
+#  Copyright (C) 1995-2019 The R Core Team
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
@@ -33,25 +33,28 @@

stopifnot <- function(..., exprs, local = TRUE)
{
+n <- ...length()
missE <- missing(exprs)
-cl <-
if(missE) {  ## use '...' instead of exprs
-match.call(expand.dots=FALSE)$...
} else {
-if(...length())
+if(n)
stop("Must use 'exprs' or unnamed expressions, but not both")
envir <- if (isTRUE(local)) parent.frame()
else if(isFALSE(local)) .GlobalEnv
else if (is.environment(local)) local
else stop("'local' must be TRUE, FALSE or an environment")
exprs <- substitute(exprs) # protect from evaluation
-E1 <- exprs[[1]]
+E1 <- if(is.call(exprs)) exprs[[1]]
+cl <-
if(identical(quote(`{`), E1)) # { ... }
-do.call(expression, as.list(exprs[-1]))
+exprs
else if(identical(quote(expression), E1))
-eval(exprs, envir=envir)
+exprs
else
-as.expression(exprs) # or fail ..
+call("expression", exprs) # or fail ..
+if(!is.null(names(cl))) names(cl) <- NULL
+cl[[1]] <- sys.call()[[1]]
+return(eval(cl, envir=envir))
}
Dparse <- function(call, cutoff = 60L) {
ch <- deparse(call, width.cutoff = cutoff)
@@ -62,14 +65,10 @@
abbrev <- function(ae, n = 3L)
paste(c(head(ae, n), if(length(ae) > n) ""), collapse="\n  ")
##
-for (i in seq_along(cl)) {
-cl.i <- cl[[i]]
-## r <- eval(cl.i, ..)  # with correct warn/err messages:
-r <- withCallingHandlers(
-tryCatch(if(missE) ...elt(i) else eval(cl.i, envir=envir),
-error = function(e) { e$call <- cl.i; stop(e) }),
-warning = function(w) { w$call <- cl.i; w })
+for (i in seq_len(n)) {
+r <- ...elt(i)
if (!(is.logical(r) && !anyNA(r) && all(r))) {
+cl.i <- match.call(expand.dots=FALSE)$...[[i]]
msg <- ## special case for decently written 'all.equal(*)':
if(is.call(cl.i) && identical(cl.i[[1]], quote(all.equal)) &&
  (is.null(ni <- names(cl.i)) || length(cl.i) == 3L ||
@@ -84,7 +83,12 @@
"%s are not all TRUE"),
Dparse(cl.i))

-stop(simpleError(msg, call = sys.call(-1)))
+n <- sys.nframe()
+if((p <- n-3) > 0 &&
+  identical(sys.function(p), sys.function(n)) &&
+  eval(expression(!missE), p)) # originally stopifnot(exprs=*)
+n <- p
+stop(simpleError(msg, call = if(n > 1) sys.call(n-1)))
}
}
invisible()

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stopifnot

2019-03-02 Thread Suharto Anggono Suharto Anggono via R-devel
A private reply by Martin made me realize that I was wrong about
stopifnot(exprs=TRUE) .
It actually works fine. I apologize. What I tried, and what failed, was
stopifnot(exprs=T) .
Error in exprs[[1]] : object of type 'symbol' is not subsettable

The shortcut
assert <- function(exprs) stopifnot(exprs = exprs)
mentioned in "Warning" section of the documentation similarly fails when 
called, for example
assert({})

About shortcut, a definition that rather works:
assert <- function(exprs) eval.parent(substitute(stopifnot(exprs = exprs)))
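Exercising the working shortcut (the calls below are made-up examples): substitute() captures the caller's braced expression unevaluated, and eval.parent() runs the resulting stopifnot() call in the caller's frame, so stopifnot() sees the literal expression rather than a promise symbol.

```r
## The shortcut that works: forward the unevaluated braced expression.
assert <- function(exprs) eval.parent(substitute(stopifnot(exprs = exprs)))

assert({
    TRUE
    is.numeric(pi)
})          # passes silently
assert({})  # empty body also passes, unlike the plain wrapper
```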

Looking at https://stat.ethz.ch/pipermail/r-devel/2017-May/074227.html , using 
sys.parent() may not be good. For example, in
f <- function() stopifnot(exprs={FALSE}, local=FALSE); f()

A revised patch (also with simpler 'cl'):
--- stop.R  2019-02-27 16:15:45.324167577 +
+++ stop_new.R  2019-03-02 06:21:35.919471080 +
@@ -1,7 +1,7 @@
 #  File src/library/base/R/stop.R
 #  Part of the R package, https://www.R-project.org
 #
-#  Copyright (C) 1995-2018 The R Core Team
+#  Copyright (C) 1995-2019 The R Core Team
 #
 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
@@ -33,25 +33,28 @@
 
 stopifnot <- function(..., exprs, local = TRUE)
 {
+n <- ...length()
 missE <- missing(exprs)
-cl <-
if(missE) {  ## use '...' instead of exprs
-   match.call(expand.dots=FALSE)$...
} else {
-   if(...length())
+   if(n)
stop("Must use 'exprs' or unnamed expressions, but not both")
envir <- if (isTRUE(local)) parent.frame()
 else if(isFALSE(local)) .GlobalEnv
 else if (is.environment(local)) local
 else stop("'local' must be TRUE, FALSE or an environment")
exprs <- substitute(exprs) # protect from evaluation
-   E1 <- exprs[[1]]
+   E1 <- if(is.call(exprs)) exprs[[1]]
+   cl <-
if(identical(quote(`{`), E1)) # { ... }
-   do.call(expression, as.list(exprs[-1]))
+   exprs
else if(identical(quote(expression), E1))
-   eval(exprs, envir=envir)
+   exprs
else
-   as.expression(exprs) # or fail ..
+   call("expression", exprs) # or fail ..
+   if(!is.null(names(cl))) names(cl) <- NULL
+   cl[[1]] <- sys.call()[[1]]
+   return(eval(cl, envir=envir))
}
 Dparse <- function(call, cutoff = 60L) {
ch <- deparse(call, width.cutoff = cutoff)
@@ -62,14 +65,10 @@
 abbrev <- function(ae, n = 3L)
paste(c(head(ae, n), if(length(ae) > n) ""), collapse="\n  ")
 ##
-for (i in seq_along(cl)) {
-   cl.i <- cl[[i]]
-   ## r <- eval(cl.i, ..)   # with correct warn/err messages:
-   r <- withCallingHandlers(
-   tryCatch(if(missE) ...elt(i) else eval(cl.i, envir=envir),
-error = function(e) { e$call <- cl.i; stop(e) }),
-   warning = function(w) { w$call <- cl.i; w })
+for (i in seq_len(n)) {
+   r <- ...elt(i)
if (!(is.logical(r) && !anyNA(r) && all(r))) {
+   cl.i <- match.call(expand.dots=FALSE)$...[[i]]
msg <- ## special case for decently written 'all.equal(*)':
if(is.call(cl.i) && identical(cl.i[[1]], quote(all.equal)) &&
   (is.null(ni <- names(cl.i)) || length(cl.i) == 3L ||
@@ -84,7 +83,12 @@
 "%s are not all TRUE"),
Dparse(cl.i))
 
-   stop(simpleError(msg, call = sys.call(-1)))
+   n <- sys.nframe()
+   if((p <- n-3) > 0 &&
+  identical(sys.function(p), sys.function(n)) &&
+  eval(expression(!missE), p)) # originally stopifnot(exprs=*)
+   n <- p
+   stop(simpleError(msg, call = if(n > 1) sys.call(n-1)))
}
 }
 invisible()


On Fri, 1/3/19, Martin Maechler  wrote:

 Subject: Re: [Rd] stopifnot

 Cc: "Martin Maechler" , r-devel@r-project.org
 Date: Friday, 1 March, 2019, 6:40 PM

> Suharto Anggono Suharto Anggono 
>    on Wed, 27 Feb 2019 22:46:04 + writes:

[...]

    > Another thing: currently,
    > stopifnot(exprs=TRUE)
    > fails.

good catch - indeed!

I've started to carefully test and try the interesting nice
patch you've provided below.

[...]

Martin


    > A patch:
    > --- stop.R    2019-02-27 16:15:45.324167577 +
    > +++ stop_new.R    2019-02-27 16:22:15.936203541 +
    > @@ -1,7 +1,7 @@
    > #  File src/library/base/R/stop.R
    > #  Part of the R package, https://www.R-project.org
    > #
    > -#  Copyright (C) 1995-2018 The R Core Team
    > +#  Copyright (C) 1995-2019 The R Core Team
    > #
    > #  This program is free software; you can redistribute it and/or modify
    > #  it under the terms of the GNU G

Re: [Rd] stopifnot

2019-02-27 Thread Suharto Anggono Suharto Anggono via R-devel
My points:
- The 'withCallingHandlers' construct used in the current 'stopifnot' code 
has no effect. Without it, the warning message is the same. The overridden 
warning is not raised; the original warning stays.
- Overriding the call in errors and warnings to 'cl.i' doesn't always give a better 
outcome. The original call may be "narrower" than 'cl.i'.

I have found these examples.
identity(is.na(log()))
identity(is.na(log("a")))

Error message from the first contains the full call. Error message from the second 
doesn't.

So, how about being "natural" and not using 'withCallingHandlers' and 'tryCatch' 
in 'stopifnot'?

Another thing: currently,
stopifnot(exprs=TRUE)
fails.

A patch:
--- stop.R  2019-02-27 16:15:45.324167577 +
+++ stop_new.R  2019-02-27 16:22:15.936203541 +
@@ -1,7 +1,7 @@
 #  File src/library/base/R/stop.R
 #  Part of the R package, https://www.R-project.org
 #
-#  Copyright (C) 1995-2018 The R Core Team
+#  Copyright (C) 1995-2019 The R Core Team
 #
 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
@@ -33,25 +33,27 @@
 
 stopifnot <- function(..., exprs, local = TRUE)
 {
+n <- ...length()
 missE <- missing(exprs)
-cl <-
if(missE) {  ## use '...' instead of exprs
-   match.call(expand.dots=FALSE)$...
} else {
-   if(...length())
+   if(n)
stop("Must use 'exprs' or unnamed expressions, but not both")
envir <- if (isTRUE(local)) parent.frame()
 else if(isFALSE(local)) .GlobalEnv
 else if (is.environment(local)) local
 else stop("'local' must be TRUE, FALSE or an environment")
exprs <- substitute(exprs) # protect from evaluation
-   E1 <- exprs[[1]]
+   E1 <- if(is.call(exprs)) exprs[[1]]
+   cl <-
if(identical(quote(`{`), E1)) # { ... }
-   do.call(expression, as.list(exprs[-1]))
+   exprs[-1]
else if(identical(quote(expression), E1))
eval(exprs, envir=envir)
else
as.expression(exprs) # or fail ..
+   if(!is.null(names(cl))) names(cl) <- NULL
+   return(eval(as.call(c(sys.call()[[1]], as.list(cl))), envir=envir))
}
 Dparse <- function(call, cutoff = 60L) {
ch <- deparse(call, width.cutoff = cutoff)
@@ -62,14 +64,10 @@
 abbrev <- function(ae, n = 3L)
paste(c(head(ae, n), if(length(ae) > n) ""), collapse="\n  ")
 ##
-for (i in seq_along(cl)) {
-   cl.i <- cl[[i]]
-   ## r <- eval(cl.i, ..)   # with correct warn/err messages:
-   r <- withCallingHandlers(
-   tryCatch(if(missE) ...elt(i) else eval(cl.i, envir=envir),
-error = function(e) { e$call <- cl.i; stop(e) }),
-   warning = function(w) { w$call <- cl.i; w })
+for (i in seq_len(n)) {
+   r <- ...elt(i)
if (!(is.logical(r) && !anyNA(r) && all(r))) {
+   cl.i <- match.call(expand.dots=FALSE)$...[[i]]
msg <- ## special case for decently written 'all.equal(*)':
if(is.call(cl.i) && identical(cl.i[[1]], quote(all.equal)) &&
   (is.null(ni <- names(cl.i)) || length(cl.i) == 3L ||
@@ -84,7 +82,11 @@
 "%s are not all TRUE"),
Dparse(cl.i))
 
-   stop(simpleError(msg, call = sys.call(-1)))
+   p <- sys.parent()
+   if(p && identical(sys.function(p), stopifnot) &&
+  !eval(expression(missE), p)) # originally stopifnot(exprs=*)
+       p <- sys.parent(2)
+       stop(simpleError(msg, call = if(p) sys.call(p)))
}
 }
 invisible()


On Wed, 27/2/19, Martin Maechler  wrote:

 Subject: Re: [Rd] stopifnot

 Cc: r-devel@r-project.org
 Date: Wednesday, 27 February, 2019, 5:36 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>    on Sun, 24 Feb 2019 14:22:48 + writes:

    >> From https://github.com/HenrikBengtsson/Wishlist-for-R/issues/70 :
    > ... and follow up note from 2018-03-15: Ouch... in R-devel, stopifnot() 
has become yet 4-5 times slower;

    > ...
    > which is due to a complete rewrite using tryCatch() and 
withCallingHandlers().


    >> From https://stat.ethz.ch/pipermail/r-devel/2017-May/074256.html , it 
seems that 'tryCatch' was used to avoid the following example from giving error 
message with 'eval' call and 'with

[Rd] stopifnot

2019-02-24 Thread Suharto Anggono Suharto Anggono via R-devel
>From https://github.com/HenrikBengtsson/Wishlist-for-R/issues/70 :
... and follow up note from 2018-03-15: Ouch... in R-devel, stopifnot() has 
become yet 4-5 times slower;

...
which is due to a complete rewrite using tryCatch() and withCallingHandlers().


>From https://stat.ethz.ch/pipermail/r-devel/2017-May/074256.html , it seems 
>that 'tryCatch' was used to avoid the following example from giving error 
>message with 'eval' call and 'withCallingHandlers' was meant to handle similar 
>case for warning.
tst <- function(y) { stopifnot(is.numeric(y)); y+ 1 }
try(tst())

However,
withCallingHandlers(,
warning = function(w) { w$call <- cl.i; w })
actually has no effect. In current code of function 'stopifnot', 'eval' is used 
only in handling stopifnot(exprs=) . The warning message from
stopifnot(exprs={warning()})
has 'eval' call:
In eval(cl.i, envir = envir) : 

This may work.
withCallingHandlers(,
warning = function(w) {
w$call <- cl.i; warning(w); invokeRestart("muffleWarning") })
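Filled in with a concrete stand-in expression (the first argument was elided above; this 'cl.i' is only a stand-in for the checked expression inside 'stopifnot'), the suggested pattern would look like:

```r
cl.i <- quote(sqrt(-1))          # stand-in for the i-th checked expression
res <- withCallingHandlers(
    eval(cl.i),
    warning = function(w) {
        w$call <- cl.i                   # override the condition's call
        warning(w)                       # re-signal the modified condition
        invokeRestart("muffleWarning")   # suppress the original warning
    })
## res is NaN here; the re-signaled warning carries cl.i as its call.
```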


Current documentation says:
Since R version 3.5.0, expressions are evaluated sequentially, and hence 
evaluation stops as soon as there is a "non-TRUE", as indicated by the above 
conceptual equivalence statement. Further, when such an expression signals an 
error or warning, its conditionCall() no longer contains the full stopifnot 
call, but just the erroneous expression.

I assume that "no longer contains ..." is supposed to be the effect of the use 
of 'withCallingHandlers' and 'tryCatch' in 'stopifnot'.

Actually, "contains the full stopifnot call" is not always the case in R before 
version 3.5.0. Normally, the call is the "innermost context".

Example:
stopifnot((1:2) + (1:3) > 0)
Warning message:
In (1:2) + (1:3) :
  longer object length is not a multiple of shorter object length

Example that gives error:
stopifnot(is.na(log("a")))
R 3.5.0:
R 3.3.2:

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Extract.data.frame.Rd about $.data.frame

2019-02-18 Thread Suharto Anggono Suharto Anggono via R-devel
The statement in R devel:
  There is no \code{data.frame} method for \code{$}, so \code{x$name}
  uses the default method which treats \code{x} as a list (with no partial
  matching of column names).  The replacement method (for \code{$}) checks
  \code{value} for the correct number of rows, and replicates it if necessary.

The added "(with no partial matching of column names)" is wrong. The default 
method of '$' (for extraction) allows partial matching for lists; partial 
matching gives a warning if option warnPartialMatchDollar is TRUE.
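A small illustration of that default-method behaviour (made-up list, just for demonstration):

```r
x <- list(alpha = 1)
x$a          # 1: the default `$` partially matches "a" to "alpha"

op <- options(warnPartialMatchDollar = TRUE)
x$a          # still 1, but now with a partial-match warning
options(op)  # restore the previous option
```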


On Fri, 15/2/19, Martin Maechler  wrote:

 Subject: Re: [Rd] Extract.data.frame.Rd about $.data.frame

 Cc: r-devel@r-project.org
 Date: Friday, 15 February, 2019, 4:15 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel

>>>>>on Sun, 10 Feb 2019 16:33:25 + writes:

> In R devel, data.frame method of '$' has been removed, but this part of 
"Details" section of Extract.data.frame.Rd still implies existence of the 
method.
> The \code{data.frame} method for \code{$}, treats \code{x} as a
> list, except that (as of R-3.1.0) partial matching of \code{name} to
> the names of \code{x} will generate a warning; this may become an
> error in future versions.  The replacement method checks
> \code{value} for the correct number of rows, and replicates it if
> necessary.


> Statement from before R 3.1.0 could be used again:

> There is no \code{data.frame} method for \code{$}, so \code{x$name}
> uses the default method which treats \code{x} as a list.  There is a
> replacement method which checks \code{value} for the correct number
> of rows, and replicates it if necessary.


[[elided Yahoo spam]]
I've added a 2 x 2 words of explanation to make it easier to understand.

Now changed.
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Extract.data.frame.Rd about $.data.frame

2019-02-10 Thread Suharto Anggono Suharto Anggono via R-devel
In R devel, data.frame method of '$' has been removed, but this part of 
"Details" section of Extract.data.frame.Rd still implies existence of the 
method.

  The \code{data.frame} method for \code{$}, treats \code{x} as a
  list, except that (as of R-3.1.0) partial matching of \code{name} to
  the names of \code{x} will generate a warning; this may become an
  error in future versions.  The replacement method checks
  \code{value} for the correct number of rows, and replicates it if
  necessary.


Statement from before R 3.1.0 could be used again:

  There is no \code{data.frame} method for \code{$}, so \code{x$name}
  uses the default method which treats \code{x} as a list.  There is a
  replacement method which checks \code{value} for the correct number
  of rows, and replicates it if necessary.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Redundant code in 'split.default' in R devel

2018-10-05 Thread Suharto Anggono Suharto Anggono via R-devel
After r75387, function 'split.default' in R devel still has this part, which no 
longer has any effect.
lf <- levels(f)
y <- vector("list", length(lf))
names(y) <- lf

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Proposal: more accurate seq(from, to, length=n)

2018-09-08 Thread Suharto Anggono Suharto Anggono via R-devel
I just thought that returning a more accurate result was better and that
(1:10)/10
was an obvious way to calculate. It turned out that it was not that easy.

I found that the calculation that I proposed previously was not accurate for
seq(-5, 5, length=101).
I then thought of
from + (0:(length.out - 1))/((length.out - 1)/(to - from)) ,
that is, dividing by (1/by) instead of multiplying by 'by'. But I then found 
that 1/(1/49) didn't give 49.

So, now I am proposing dividing by (1/by) selectively, like
from + if (abs(to - from) < length.out - 1 &&
abs(to - from) >= 2^(-22)  # exact with 16 significant digits
) (0:(length.out - 1))/((length.out - 1)/(to - from)) else
(0:(length.out - 1))*((to - from)/(length.out - 1))

Not changing 'seq.default' is fine, too.
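Written out as a stand-alone sketch (hypothetical helper name `seq_prop`, not a patch against 'seq.default'), the selective division above would be:

```r
## Hypothetical helper: divide by (1/by) only when that is safe/beneficial.
seq_prop <- function(from, to, length.out) {
    d <- to - from
    from + if (abs(d) < length.out - 1 &&
               abs(d) >= 2^-22)  # exact with 16 significant digits
        (0:(length.out - 1)) / ((length.out - 1) / d)
    else
        (0:(length.out - 1)) * (d / (length.out - 1))
}

identical(seq_prop(0, 1, 11), (0:10)/10)   # TRUE: hits the division branch
seq_prop(-5, 5, 101)[51]                   # exactly 0
```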


On Sat, 8/9/18, Gabe Becker  wrote:

 Subject: Re: [Rd] Proposal: more accurate seq(from, to, length=n)

 Cc: "r-devel" 
 Date: Saturday, 8 September, 2018, 5:38 AM

 Suharto,

 My 2c inline.

 On Fri, Sep 7, 2018 at 2:34 PM, Suharto Anggono Suharto Anggono via R-devel wrote:

 In R,
 seq(0, 1, 0.1)
 gives the same result as
 (0:10)*0.1.
 It is not the same as
 c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) ,
 as 0.1 is not represented exactly. I am fine with it.

 In R,
 seq(0, 1, length=11)
 gives the same result as
 seq(0, 1, 0.1).
 However, for
 seq(0, 1, length=11),
 it is more accurate to return
 c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) .
 It can be obtained by
 (0:10)/10.

 When 'from', 'to', and 'length.out' are specified and length.out > 2, I propose for function 'seq.default' in R to use something like
 from + ((0:(length.out - 1))/(length.out - 1)) * (to - from)
 instead of something like
 from + (0:(length.out - 1)) * ((to - from)/(length.out - 1)) .

 In your example case under 3.5.0 on my system these two expressions give results which return TRUE from all.equal, which is the accepted way of comparing non-integer numerics in R for "sameness".

 > from = 0
 > to = 1
 > length.out = 11
 > all.equal(from + ((0:(length.out - 1))/(length.out - 1)) * (to - from),
 +           from + (0:(length.out - 1)) * ((to - from)/(length.out - 1)))
 [1] TRUE

 Given that, I'm wondering what the benefit you're looking for here is that would outweigh the very large set of existing code whose behavior would technically change under this change. Then again, it wouldn't change with respect to the accepted all.equal test, so I guess you could argue that either there's "no change" or the change is ok?

 I'd still like to know what practical problem you're trying to solve though. If you're looking for the ability to use == to compare non-integer sequences generated different ways, as far as I understand the answer is that you shouldn't be expecting to be able to do that.

 Best,
 ~G

 --
 Gabriel Becker, Ph.D
 Scientist
 Bioinformatics and Computational Biology
 Genentech Research

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Proposal: more accurate seq(from, to, length=n)

2018-09-07 Thread Suharto Anggono Suharto Anggono via R-devel
In R,
seq(0, 1, 0.1)
gives the same result as
(0:10)*0.1.
It is not the same as
c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) ,
as 0.1 is not represented exactly. I am fine with it.

In R,
seq(0, 1, length=11)
gives the same result as
seq(0, 1, 0.1).
However, for 
seq(0, 1, length=11),
it is more accurate to return
c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) .
It can be obtained by
(0:10)/10.

When 'from', 'to', and 'length.out' are specified and length.out > 2, I propose 
for function 'seq.default' in R to use something like
from + ((0:(length.out - 1))/(length.out - 1)) * (to - from)
instead of something like
from + (0:(length.out - 1)) * ((to - from)/(length.out - 1)) .
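The difference between the two formulas can be checked directly on IEEE doubles (the variable names below just mirror the formulas):

```r
from <- 0; to <- 1; length.out <- 11
proposed <- from + ((0:(length.out - 1))/(length.out - 1)) * (to - from)
current  <- from + (0:(length.out - 1)) * ((to - from)/(length.out - 1))

proposed[4] == 0.3   # TRUE:  3/10 is the double closest to 0.3
current[4]  == 0.3   # FALSE: 3 * 0.1 is off by one ulp
all.equal(proposed, current)   # TRUE: equal up to tolerance, though
```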

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] aic() component in GLM-family objects

2018-06-19 Thread Suharto Anggono Suharto Anggono via R-devel
In R, family objects have had an 'aic' component since version 0.62. There is no 'aic' 
component in family in R 0.61.3.

Looking at blame, 
https://github.com/wch/r-source/blame/tags/R-0-62/src/library/base/R/family.R , 
aic component in family is introduced in svn revision 640 
(https://github.com/wch/r-source/commit/ac666741679b50bb1dfb5ce631717b375119f6ab):
using aic(.) [Jim Lindsey]; use switch() rather than many if else else.. (MM)

Components of a family object have been documented since R 2.3.0.


> Ben Bolker 
> on Sun, 17 Jun 2018 11:40:38 -0400 writes:

> FWIW p. 206 of the White Book gives the following for
> names(binomial()): family, names, link, inverse, deriv,
> initialize, variance, deviance, weight.

>   So $aic wasn't there In The Beginning.  I haven't done
> any more archaeology to try to figure out when/by whom it
> was first introduced ...

Thank you Ben.

I think I was already suggesting that it was by Simon and Ross
and we cannot know who of the two.

>  Section 6.3.3, on extending families, doesn't give any
> other relevant info.

> A patch for src/library/stats/man/family.Rd below: please
> check what I've said about $aic and $mu.eta to make sure
> they're correct!  I can submit this to the r bug list if
> preferred.

I've spent quite some time checking this - to some extent.

Thank you for the patch. I will use an even slightly extended
version ((and using the correct '\eqn{\eta}{eta}' )).

Thank you indeed.
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] length of `...`

2018-05-06 Thread Suharto Anggono Suharto Anggono via R-devel
Has anyone noticed the r-devel thread "stopifnot() does not stop at first non-TRUE 
argument", starting with 
https://stat.ethz.ch/pipermail/r-devel/2017-May/074179.html ?

I have mentioned
(function(...)nargs())(...)
in https://stat.ethz.ch/pipermail/r-devel/2017-May/074294.html .

Something like ..elt(n) is switch(n, ...) . I have mentioned it in 
https://stat.ethz.ch/pipermail/r-devel/2017-May/074270.html . See also response 
in https://stat.ethz.ch/pipermail/r-devel/2017-May/074282.html .

By the way, because 'stopifnot' in R 3.5.0 contains arguments other than '...', 
it might be better to use
match.call(expand.dots=FALSE)$...
instead of
match.call()[-1L] .
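The difference matters exactly because of the non-'...' argument; a small made-up illustration (match.call() never evaluates its arguments, so 'a' and 'b' need not exist):

```r
f <- function(..., extra = 0) match.call(expand.dots = FALSE)$...
g <- function(..., extra = 0) match.call()[-1L]

length(f(a, b, extra = 1))   # 2: only the expressions that came in via '...'
length(g(a, b, extra = 1))   # 3: the named 'extra = 1' is kept as well
```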

---
> Joris Meys 
> on Fri, 4 May 2018 15:37:27 +0200 writes:

> The one difference I see, is the necessity to pass the dots to the 
function
> dotlength :

> dotlength <- function(...) nargs()

> myfun <- function(..., someArg = 1){
> n1 <- ...length()
> n2 <- dotlength()
> n3 <- dotlength(...)
> return(c(n1, n2, n3))
> }

> myfun(stop("A"), stop("B"), someArg = stop("c"))

> I don't really see immediately how one can replace the C definition with
> Hadley's solution without changing how the function has to be used.

Yes, of course:  nargs() can only be applied to the function inside
which it is used, and hence  n2 <- dotlength()  must therefore be 0.
Thank you, Joris

> Personally, I have no preference over the use, but changing it now would
> break code dependent upon ...length() imho. Unless I'm overlooking
> something of course.

Yes.  OTOH, as it's been very new, one could consider
deprecating it and advertising, say,  .length(...) instead of ...length()
[yes, in spite of the fact that the pure-R solution is slower
 than a primitive; both are fast enough for all purposes]

But such a deprecation cycle typically entails time more writing
etc, not something I've time for just these days.

Martin


> On Fri, May 4, 2018 at 3:02 PM, Martin Maechler 
> wrote:

>> > Hervé Pagès 
>> > on Thu, 3 May 2018 08:55:20 -0700 writes:
>> 
>> > Hi,
>> > It would be great if one of the experts could comment on the
>> > difference between Hadley's dotlength and ...length? The fact
>> > that someone bothered to implement a new primitive for that
>> > when there seems to be a very simple and straightforward R-only
>> > solution suggests that there might be some gotchas/pitfalls with
>> > the R-only solution.
>> 
>> Namely
>> 
>> > dotlength <- function(...) nargs()
>> 
>> > (This is subtly different from calling nargs() directly as it will
>> > only count the elements in ...)
>> 
>> > Hadley
>> 
>> 
>> Well,  I was the "someone".  In the past I had seen (and used myself)
>> 
>> length(list(...))
>> 
>> and of course that was not usable.
>> I knew of some substitute() / match.call() tricks [but I think
>> did not know Bill's cute substitute(...()) !] at the time, but
>> found them too esoteric.
>> 
>> Additionally and importantly,  ...length()  and  ..elt(n)  were
>> developed  "synchronously",  and the R-substitutes for ..elt()
>> definitely are less trivial (I did not find one at the time), as
>> Duncan's example to Bill's proposal has shown, so I had looked
>> at .Primitive() solutions of both.
>> 
>> In hindsight I should have asked here for advice,  but may at
>> the time I had been a bit frustrated by the results of some of
>> my RFCs ((nothing specific in mind !))
>> 
>> But __if__ there's really no example where current (3.5.0 and newer)
>> 
>> ...length()
>> 
>> differs from Hadley's  dotlength()
>> I'd be very happy to replace ...length()'s C-based definition by
>> Hadley's beautiful minimal solution.
>> 
>> Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Result of 'seq' doesn't use compact internal representation

2018-04-28 Thread Suharto Anggono Suharto Anggono via R-devel
> .Internal(inspect(1:10))
@300e4e8 13 INTSXP g0c0 [NAM(3)]  1 : 10 (compact)
> .Internal(inspect(seq(1,10)))
@3b6e1f8 13 INTSXP g0c4 [] (len=10, tl=0) 1,2,3,4,5,...
> system.time(1:1e7)
   user  system elapsed
  0   0   0
> system.time(seq(1,1e7))
   user  system elapsed
   0.050.000.04

It seems that the result of function 'seq' doesn't use the compact internal 
representation. However, looking at the code of function 'seq.default', 
seq(1,n) produces 1:n. What is going on?

> h <- seq.default
> environment(h) <- .GlobalEnv
> library(compiler)
> enableJIT(0)
[1] 3
> .Internal(inspect(h(1,10)))
@375ade8 13 INTSXP g0c0 [NAM(3)]  1 : 10 (compact)

A non-byte-compiled version of function 'seq.default' can produce object that 
uses compact internal representation.


> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 3

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] compiler  stats graphics  grDevices utils datasets  methods
[8] base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Part of fastpass in 'sort.list' can make sorting unstable

2018-04-06 Thread Suharto Anggono Suharto Anggono via R-devel
In the code of functions 'order' and 'sort.list' in R 3.5.0 alpha (in 
https://svn.r-project.org/R/branches/R-3-5-branch/src/library/base/R/sort.R), 
in "fastpass, take advantage of ALTREP metadata", there is "try the reverse 
since that's easy too...". If it succeeds, ties are reordered, violating 
stability of sorting.

Example:
x <- sort(c(1, 1, 3))
x  # 1 1 3
sort.list(x, decreasing=TRUE)  # should be 3 1 2

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Nice names in deparse

2018-03-29 Thread Suharto Anggono Suharto Anggono via R-devel
I am raising this again.

As
as.character(list(c(one = "1")))
is still
"1"
in R 3.5.0 alpha, could
as.character(list(c(one = 1)))
be
"1"
, too, as before?

The case here is that the list component is atomic with length 1.


On Sat, 16/12/17, Suharto Anggono Suharto Anggono  
wrote:

 Subject: Nice names in deparse
 To: r-devel@r-project.org
 Date: Saturday, 16 December, 2017, 11:09 PM

Tags (argument names) in a call to 'list' become the names of the result. It is not 
necessarily so with a call to 'c'. The default method of 'c' has 'recursive' and 
'use.names' arguments.

In R devel r73778, with
x <- 0; names(x) <- "recursive"  ,
dput(x)
or even
dput(x, control = "all")
gives
c(recursive = 0)
However, actual result of c(recursive = 0) is NULL.

Also with
x <- 0; names(x) <- "recursive"  ,
dput(x, control = c("keepNA", "keepInteger", "showAttributes"))
in R devel r73778
gives
structure(c(0), .Names = "recursive")
The 'control' is suggested by an example for output as in R < 3.5.0. However, 
the output is slightly different from
dput(x)
in R 3.3.2:
structure(0, .Names = "recursive")


Part of NEWS item related with "niceNames" control option:
as.character(list(c(one = 1))) now includes the name, as 
as.character(list(list(one = 1))) has always done.

Please reconsider.
As
as.numeric(list(c(one = 1)))
gives
1 ,
I expect that
as.character(list(c(one = "1")))
gives
"1" .
It does in R devel r73778.
Why does
as.character(list(c(one = 1)))
give
"c(one = 1)" ?

as.numeric(list(c(one = "1")))
gives
1 .

list(list(one = 1))
is not quite the same.
as.numeric(list(list(one = 1)))
gives
NA .

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Function 'factor' issues

2018-03-23 Thread Suharto Anggono Suharto Anggono via R-devel
I am trying once again.

By just changing
f <- match(xlevs[f], nlevs)
to
f <- match(xlevs, nlevs)[f]
, function 'factor' in R devel could be made more consistent and 
back-compatible. Why not pick it up?


On Sat, 25/11/17, Suharto Anggono Suharto Anggono  
wrote:

 Subject: Re: [Rd] Function 'factor' issues
 To: r-devel@r-project.org
 Date: Saturday, 25 November, 2017, 6:03 PM

>From commits to R devel, I saw attempts to speed up subsetting and 'match', 
>and to cache results of conversion of small nonnegative integers to character 
>string. That's good.

I am sorry for pushing, still.

Is the partially new behavior of function 'factor' with respect to NA really 
worthwhile?

match(xlevs, nlevs)[f]  looks nice, too.

- Using
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
for remapping
- Remapping only if length(nlevs) differs from length(xlevs)
Applying changes similar to above to function 'levels<-.factor' will not change 
'levels<-.factor' result at all. So, the corresponding part of functions 
'factor' and 'levels<-.factor' can be kept in sync.
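A small check (made-up levels and codes) that the two formulations agree on the codes they produce; the rewrite merely does one lookup per level instead of one per data element:

```r
xlevs <- c("a", "b", "c")        # old level set
nlevs <- c("b", "a", "c")        # new level set
f <- c(1L, 3L, 2L, 1L, NA)       # integer codes into xlevs

old <- match(xlevs[f], nlevs)    # one lookup per element of f
new <- match(xlevs, nlevs)[f]    # one lookup per level, then index by f
identical(old, new)              # TRUE
```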


On Sun, 22/10/17, Suharto Anggono Suharto Anggono  
wrote:

Subject: Re: [Rd] Function 'factor' issues
To: r-devel@r-project.org
Date: Sunday, 22 October, 2017, 6:43 AM

My idea (like in https://bugs.r-project.org/bugzilla/attachment.cgi?id=1540 ):
- For remapping, use
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
(I have mentioned it).
- Remap only if length(nlevs) differs from length(xlevs) .
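A quick check (a minimal sketch with made-up xlevs, nlevs and f; in 'factor' itself f comes from matching x against xlevs, and the two forms can differ when f contains NA) shows the two remappings agree, while the proposed one calls match() over only length(xlevs) values:

```r
xlevs <- c("a", "b", "c")
nlevs <- c("b", "a", "c")
f <- c(1L, 3L, 2L, 1L)  # indices into xlevs, possibly much longer than xlevs

# Proposed: match() over length(xlevs) values, then a cheap subscript.
remap_proposed <- match(xlevs, nlevs)[f]
# Current: match() over length(f) values.
remap_current <- match(xlevs[f], nlevs)

stopifnot(identical(remap_proposed, remap_current))
```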


[snip]


On Wed, 18/10/17, Martin Maechler  wrote:

Subject: Re: [Rd] Function 'factor' issues

Cc: r-devel@r-project.org
Date: Wednesday, 18 October, 2017, 11:54 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>    on Sun, 15 Oct 2017 16:03:48 + writes:


    > In R devel, function 'factor' has been changed, allowing and merging 
duplicated 'labels'.

Indeed.  That had been asked for and discussed a bit on this
list from June 14 to June 23, starting at
  https://stat.ethz.ch/pipermail/r-devel/2017-June/074451.html

    > Issue 1: Handling of specified 'labels' without duplicates is slower than 
before.
    > Example:
    > x <- rep(1:26, 4)
    > system.time(factor(x, levels=1:26, labels=letters))

    > Function 'factor' is already rather slow because of conversion to 
character. Please don't add slowdown.

Indeed, I do see a ~20% performance loss for the example
above, and I may get to look into this.
However, in R-devel there have been important internal
changes (ALTREP additions) some of which are currently giving
some performance losses in some cases (but they have the
potential to give big performance _gains_ e.g. for simple
indexing into large vectors which may apply here !).
For factor(), these C level "ALTREP" changes may not be the reason at
all for the slowdown;
I may find time to investigate further.

{{ For the ALTREP-change slowdowns I've noticed in some
  indexing/subset operations, we'll definitely have time to look into
  before R-devel is going to be released next spring... and as mentioned,
  these operations may even become considerably faster *thanks*
  to ALTREP ... }}

    > Issue 2: While default 'labels' is 'levels', not specifying 'labels' may 
be different from specifying 'labels' to be the same as 'levels'.

    > Example 1:
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
    > is different from
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA), 
exclude = NULL))

You are right.  But this is not so exceptional and part of the new feature of
'labels' allowing to "fix up" things in such cases.  While it
would be nice if this was not the case the same phenomenon
happens in other functions as well because of lazy evaluation.
I think I had noticed that already and at the time found
"not easy" to work around.
(There are many aspects about changing such important base functions:
1. not breaking back compatibility ((unless in rare
    border cases, where we are sure it's worth))
2. Keeping code relatively transparent
3. Keep the semantics "simple" to document and as intuitive as possible
)

    > File reg-tests-1d.R indicates that 'factor' behavior with NA is slightly 
changed, for the better. NA entry (because it is unmatched to 'levels' argument 
or is in 'exclude') is absorbed into NA in "levels" attribute (comes from 
'labels' argument), if any. The issue is that it happens only when 'labels

Re: [Rd] Inappropriate parens fix for Logic.Rd

2018-03-17 Thread Suharto Anggono Suharto Anggono via R-devel
Logic.Rd has been changed again in r74377. After change:
  \item{x, y}{raw or logical or \sQuote{number-like} vectors (i.e., of
    types \code{\link{double}} (class \code{\link{numeric}},
    \code{\link{integer}}) and \code{\link{complex}}), or objects for

It is still inappropriate. As I said before, integer is not double.

Right: numeric includes double and integer
Wrong: double includes numeric and integer

The text mentions "type" and "class". I believe that, in the text, originally, 
"type" refers to what is returned by typeof() and "class" refers to what is 
returned by class() in R.
When typeof(x) is "double", class(x) is "numeric".
When typeof(x) is "integer", class(x) is "integer".
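The distinction is easy to verify at the console:

```r
# typeof() reports the internal storage type; class() the (implicit) class.
stopifnot(typeof(1)  == "double",  class(1)  == "numeric",
          typeof(1L) == "integer", class(1L) == "integer")
```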



> wrote:

 Subject: Inappropriate parens fix for Logic.Rd
 To: r-devel@r-project.org
 Date: Saturday, 10 March, 2018, 8:23 AM

Logic.Rd is one of the files changed in r74363.

Before change:
  \item{x, y}{raw or logical or \sQuote{number-like} vectors (i.e., of types
    \code{\link{double}} (class \code{\link{numeric}}), \code{\link{integer}}
    and \code{\link{complex}})), or objects for

After change:
  \item{x, y}{raw or logical or \sQuote{number-like} vectors (i.e., of types
    \code{\link{double}} (class \code{\link{numeric}}, \code{\link{integer}}
    and \code{\link{complex}})), or objects for

Neither integer nor complex is double.
I think, it should be
  \item{x, y}{raw or logical or \sQuote{number-like} vectors (i.e., of types
    \code{\link{double}} (class \code{\link{numeric}}), \code{\link{integer}}
    and \code{\link{complex}}), or objects for



[Rd] Inappropriate parens fix for Logic.Rd

2018-03-09 Thread Suharto Anggono Suharto Anggono via R-devel
Logic.Rd is one of the files changed in r74363.

Before change:
  \item{x, y}{raw or logical or \sQuote{number-like} vectors (i.e., of types
\code{\link{double}} (class \code{\link{numeric}}), \code{\link{integer}}
and \code{\link{complex}})), or objects for

After change:
  \item{x, y}{raw or logical or \sQuote{number-like} vectors (i.e., of types
\code{\link{double}} (class \code{\link{numeric}}, \code{\link{integer}}
and \code{\link{complex}})), or objects for

Neither integer nor complex is double.
I think, it should be
  \item{x, y}{raw or logical or \sQuote{number-like} vectors (i.e., of types
\code{\link{double}} (class \code{\link{numeric}}), \code{\link{integer}}
and \code{\link{complex}}), or objects for



Re: [Rd] Nice names in deparse

2018-02-10 Thread Suharto Anggono Suharto Anggono via R-devel
x <- 0; names(x) <- "recursive"
I am saying more plainly: With 'x' above, deparse(x, control = "all") is wrong 
in R devel.


On Sat, 16/12/17, Suharto Anggono Suharto Anggono  
wrote:

 Subject: Nice names in deparse
 To: r-devel@r-project.org
 Date: Saturday, 16 December, 2017, 11:09 PM

 Tags (argument names) in a call to 'list'
 become the names of the result. It is not necessarily so with a
 call to 'c'. The default method of 'c' has 'recursive' and
 'use.names' arguments.

 In R devel r73778, with
 x <- 0; names(x) <- "recursive" 
 ,
 dput(x)
 or even
 dput(x, control = "all")
 gives
 c(recursive = 0)
 However, actual result of c(recursive =
 0) is NULL.

 Also with
 x <- 0; names(x) <- "recursive" 
 ,
 dput(x, control = c("keepNA",
 "keepInteger", "showAttributes"))
 in R devel r73778
 gives
 structure(c(0), .Names = "recursive")
 The 'control' is suggested by an
 example for output as in R < 3.5.0. However, the output
 is slightly different from
 dput(x)
 in R 3.3.2:
 structure(0, .Names = "recursive")


 Part of NEWS item related to the
 "niceNames" control option:
 as.character(list(c(one = 1))) now
 includes the name, as as.character(list(list(one = 1))) has
 always done.

 Please reconsider.
 As
 as.numeric(list(c(one = 1)))
 gives
 1 ,
 I expect that
 as.character(list(c(one = "1")))
 gives
 "1" .
 It does in R devel r73778.
 Why does
 as.character(list(c(one = 1)))
 give
 "c(one = 1)" ?

 as.numeric(list(c(one = "1")))
 gives
 1 .

 list(list(one = 1))
 is not quite the same.
 as.numeric(list(list(one = 1)))
 gives
 NA .



Re: [Rd] as.list method for by Objects

2018-02-03 Thread Suharto Anggono Suharto Anggono via R-devel
Maybe behavior of 'as.list' in R is not inherited from S?
- From https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=78 , in "the 
prototype" (S), 'as.list' on a data frame gave a list, not a data frame as 
given by the default 'as.list' in R. That led to introduction of 
'as.list.data.frame'.
- From https://biostat-lists.wustl.edu/sympa/arc/s-news/1999-07/msg00198.html , 
with
s <- c("a"=1, "b"=2) ,
as.list(s) doesn't have names in S-PLUS 3.4, different from in R. In S-PLUS 
5.1, as.list(s) has names, like in R.

The "Details" section of the documentation, list.Rd, says this about 'as.list':
Attributes may be dropped unless the argument already is a list or expression.  
(This is inconsistent with functions such as as.character which always drop 
attributes, and is for efficiency since lists can be expensive to copy.)

On the efficiency issue, shallow copying has been introduced. So, can the behavior 
of the default method of 'as.list' be reconsidered?

Related: The default method of 'as.vector' with mode="list" behaves like the 
default method of 'as.list'. As a consequence, 'is.vector' with mode="list" on 
its result may return FALSE. I have raised the issue in 
https://stat.ethz.ch/pipermail/r-devel/2013-May/066671.html .


> Michael Lawrence 
>     on Tue, 30 Jan 2018 15:57:42 -0800 writes:

    > I just meant that the minimal contract for as.list() appears to be that it
    > returns a VECSXP. To the user, we might say that is.list() will always
    > return TRUE.
    
Indeed. I also agree with Hervé that the user level
documentation should rather mention  is.list(.) |--> TRUE  than
VECSXP, and interestingly for the experts among us,
the  is.list() primitive gives TRUE not only for VECSXP but
also for LISTSXP (the good ole' pairlists).

    > I'm not sure we can expect consistency across methods
    > beyond that, nor is it feasible at this point to match the
    > semantics of the methods package. It deals in "class
    > space" while as.list() deals in "typeof() space".

    > Michael

Yes, and that *is* the extra complexity we have in R (inherited
from S, I'd say)  which ideally wasn't there and of course is
not there in much younger languages/systems such as julia.

And --- by the way let me preach, for the "class space" ---
do __never__ use

      if(class(obj) == "")

in your code (I see this so often, shockingly to me ...) but rather use

      if(inherits(obj, ""))

instead.
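The failure mode is easy to reproduce with a hypothetical two-class object (the class names below are purely illustrative):

```r
fit <- structure(list(), class = c("glm", "lm"))  # illustrative object

class(fit) == "lm"   # c(FALSE, TRUE): a length-2 logical, unusable in if()
inherits(fit, "lm")  # TRUE: tests membership in the whole class vector
```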

Martin



    > On Tue, Jan 30, 2018 at 3:47 PM, Hervé Pagès  
wrote:

    >> On 01/30/2018 02:50 PM, Michael Lawrence wrote:
    >> 
    >>> by() does not always return a list. In Gabe's example, it returns an
    >>> integer, thus it is coerced to a list. as.list() means that it should 
be a
    >>> VECSXP, not necessarily with "list" in the class attribute.
    >>> 
    >> 
    >> The documentation is not particularly clear about what as.list()
    >> means for list derivatives. IMO clarifications should stick to
    >> simple concepts and formulations like "is.list(x) is TRUE" or
    >> "x is a list or a list derivative" rather than "x is a VECSXP".
    >> Coercion is useful beyond the use case of implementing a .C entry
    >> point and calling as.numeric/as.list/etc... on its arguments.
    >> 
    >> This is why I was hoping that we could maybe discuss the possibility
    >> of making the as.list() contract less vague than just "as.list()
    >> must return a list or a list derivative".
    >> 
    >> Again, I think that 2 things weight quite a lot in that discussion:
    >> 1) as.list() returns an object of class "data.frame" on a
    >> data.frame (strict coercion). If all what as.list() needed to
    >> do was to return a VECSXP, then as.list.default() already does
    >> this on a data.frame so why did someone bother adding an
    >> as.list.data.frame method that does strict coercion?
    >> 2) The S4 coercion system based on as() does strict coercion by
    >> default.
    >> 
    >> H.
    >> 
    >> 
    >>> Michael
    >>> 
    >>> 
    >>> On Tue, Jan 30, 2018 at 2:41 PM, Hervé Pagès >> > wrote:
    >>> 
    >>> Hi Gabe,
    >>> 
    >>> Interestingly the behavior of as.list() on by objects seem to
    >>> depend on the object itself:
    >>> 
    >>> > b1 <- by(1:2, 1:2, identity)
    >>> > class(as.list(b1))
    >>> [1] "list"
    >>> 
    >>> > b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
    >>> > class(as.list(b2))
    >>> [1] "by"
    >>> 
    >>> This is with R 3.4.3 and R devel (2017-12-11 r73889).
    >>> 
    >>> H.
    >>> 
    >>> On 01/30/2018 02:33 PM, Gabriel Becker wrote:
    >>> 
    >>> Dario,
    >>> 
    >>> What version of R are you using. In my mildly old 3.4.0
    >>> installation and in the version of Revel I have lying around
    >>> (also mildly old...)  I don't see the behavior I think you are
    >>> describing
    >>> 
    >>> > b = by(1:2, 1:2, identity)
    >>> 
    >>> > class(as.list(b))
    >>

[Rd] Nice names in deparse

2017-12-16 Thread Suharto Anggono Suharto Anggono via R-devel
Tags (argument names) in a call to 'list' become the names of the result. It is not 
necessarily so with a call to 'c'. The default method of 'c' has 'recursive' and 
'use.names' arguments.

In R devel r73778, with
x <- 0; names(x) <- "recursive"  ,
dput(x)
or even
dput(x, control = "all")
gives
c(recursive = 0)
However, actual result of c(recursive = 0) is NULL.
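The failed round trip can be checked in any R version (only the deparse output quoted above is specific to r73778):

```r
# 'recursive' is a formal argument of the default c() method, so the tag
# is consumed as an argument rather than becoming an element name:
stopifnot(is.null(c(recursive = 0)))

# Consequently, deparsing x as "c(recursive = 0)" does not round-trip:
x <- 0; names(x) <- "recursive"
stopifnot(!identical(eval(parse(text = "c(recursive = 0)")), x))
```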

Also with
x <- 0; names(x) <- "recursive"  ,
dput(x, control = c("keepNA", "keepInteger", "showAttributes"))
in R devel r73778
gives
structure(c(0), .Names = "recursive")
The 'control' is suggested by an example for output as in R < 3.5.0. However, 
the output is slightly different from
dput(x)
in R 3.3.2:
structure(0, .Names = "recursive")


Part of NEWS item related to the "niceNames" control option:
as.character(list(c(one = 1))) now includes the name, as 
as.character(list(list(one = 1))) has always done.

Please reconsider.
As
as.numeric(list(c(one = 1)))
gives
1 ,
I expect that
as.character(list(c(one = "1")))
gives
"1" .
It does in R devel r73778.
Why does
as.character(list(c(one = 1)))
give
"c(one = 1)" ?

as.numeric(list(c(one = "1")))
gives
1 .

list(list(one = 1))
is not quite the same.
as.numeric(list(list(one = 1)))
gives
NA .


Re: [Rd] Function 'factor' issues

2017-11-25 Thread Suharto Anggono Suharto Anggono via R-devel
From commits to R devel, I saw attempts to speed up subsetting and 'match', 
and to cache results of conversion of small nonnegative integers to character 
string. That's good.

I am sorry for pushing, still.

Is the partial new behavior of function 'factor' with respect to NA really 
worth it?

match(xlevs, nlevs)[f]  looks nice, too.

- Using
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
for remapping
- Remapping only if length(nlevs) differs from length(xlevs)
Applying changes similar to above to function 'levels<-.factor' will not change 
'levels<-.factor' result at all. So, the corresponding part of functions 
'factor' and 'levels<-.factor' can be kept in sync.




 Subject: Re: [Rd] Function 'factor' issues
 To: r-devel@r-project.org
 Date: Sunday, 22 October, 2017, 6:43 AM
 
My idea (like in https://bugs.r-project.org/bugzilla/attachment.cgi?id=1540 ):
- For remapping, use
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
(I have mentioned it).
- Remap only if length(nlevs) differs from length(xlevs) .


[snip]


On Wed, 18/10/17, Martin Maechler  wrote:

Subject: Re: [Rd] Function 'factor' issues

Cc: r-devel@r-project.org
Date: Wednesday, 18 October, 2017, 11:54 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sun, 15 Oct 2017 16:03:48 + writes:

 
> In R devel, function 'factor' has been changed, allowing and merging 
duplicated 'labels'.

Indeed.  That had been asked for and discussed a bit on this
list from June 14 to June 23, starting at
   https://stat.ethz.ch/pipermail/r-devel/2017-June/074451.html

> Issue 1: Handling of specified 'labels' without duplicates is slower than 
before.
> Example:
> x <- rep(1:26, 4)
> system.time(factor(x, levels=1:26, labels=letters))

> Function 'factor' is already rather slow because of conversion to 
character. Please don't add slowdown.

Indeed, I do see a ~20% performance loss for the example
above, and I may get to look into this.
However, in R-devel there have been important internal
changes (ALTREP additions) some of which are currently giving
some performance losses in some cases (but they have the
potential to give big performance _gains_ e.g. for simple
indexing into large vectors which may apply here !).
For factor(), these C level "ALTREP" changes may not be the reason at
all for the slowdown;
I may find time to investigate further.

{{ For the ALTREP-change slowdowns I've noticed in some
   indexing/subset operations, we'll definitely have time to look into
   before R-devel is going to be released next spring... and as mentioned,
   these operations may even become considerably faster *thanks*
   to ALTREP ... }}

> Issue 2: While default 'labels' is 'levels', not specifying 'labels' may 
be different from specifying 'labels' to be the same as 'levels'.

> Example 1:
> as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
> is different from
> as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA), 
exclude = NULL))

You are right.  But this is not so exceptional and part of the new feature of
'labels' allowing to "fix up" things in such cases.  While it
would be nice if this was not the case the same phenomenon
happens in other functions as well because of lazy evaluation.
I think I had noticed that already and at the time found
"not easy" to work around.
(There are many aspects about changing such important base functions:
1. not breaking back compatibility ((unless in rare
border cases, where we are sure it's worth))
2. Keeping code relatively transparent
3. Keep the semantics "simple" to document and as intuitive as possible
)

> File reg-tests-1d.R indicates that 'factor' behavior with NA is slightly 
changed, for the better. NA entry (because it is unmatched to 'levels' argument 
or is in 'exclude') is absorbed into NA in "levels" attribute (comes from 
'labels' argument), if any. The issue is that it happens only when 'labels' is 
specified.

I'm not sure anymore, but I think I had noticed that also in
June, considered to change it and found that such a changed
factor() would be too different from what it has "always been".
So, yes, IIRC, this current behavior is on purpose, if only for back 
compatibility.


> Function 'factor' could use match(xlevs, nlevs)[f]. It doesn't match NA 
to NA level. When 'f' is long enough, longer than 'xlevs', it is faster

[Rd] ans[nas] <- NA in 'ifelse' (was: ifelse() woes ... can we agree on a ifelse2() ?)

2017-11-04 Thread Suharto Anggono Suharto Anggono via R-devel
Removal of
ans[nas] <- NA
from the code of function 'ifelse' in R is not committed (yet). Why?
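A minimal sketch of the relevant steps (a simplification of ifelse(), not its exact code) shows why the line is a no-op:

```r
test <- c(TRUE, NA, FALSE)
yes <- "y"; no <- "n"

ans <- test                   # ifelse() starts from a copy of test,
ok  <- !(nas <- is.na(test))  # so NA positions are already NA
ans[test & ok]  <- rep(yes, length.out = length(ans))[test & ok]
ans[!test & ok] <- rep(no,  length.out = length(ans))[!test & ok]

stopifnot(identical(ans, c("y", NA, "n")),
          all(is.na(ans[nas])))  # ans[nas] <- NA would change nothing
```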


On Mon, 28/11/16, Martin Maechler  wrote:

 Subject: Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

 Cc: R-devel@r-project.org, maech...@stat.math.ethz.ch
 Date: Monday, 28 November, 2016, 10:00 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sat, 26 Nov 2016 17:14:01 + writes:

...


> On current 'ifelse' code in R:

> * The part
> ans[nas] <- NA
> could be omitted because NA's are already in place.
> If the part is removed, variable 'nas' is no longer used.

I agree that this seems logical.  If I apply the change, R's own
full checks do not seem affected, and I may try to commit that
change and "wait and see".


...



Re: [Rd] Function 'factor' issues

2017-10-21 Thread Suharto Anggono Suharto Anggono via R-devel
My idea (like in https://bugs.r-project.org/bugzilla/attachment.cgi?id=1540 ):
- For remapping, use
f <- match(xlevs, nlevs)[f]
instead of
f <- match(xlevs[f], nlevs)
(I have mentioned it).
- Remap only if length(nlevs) differs from length(xlevs) .


On use of 'order' in function 'factor' in R devel, factor.Rd still says 
'sort.list' in "Details" section.

My comments on the part of "Details" section:
- Sortable 'x' is needed only when 'levels' is not specified.
- Complete requirements for properly working factor(x) in R devel: 
'as.character', 'order', 'unique' corresponding to '['. Take data frame and 
"Surv" object (package survival) as examples.


On Wed, 18/10/17, Martin Maechler  wrote:

 Subject: Re: [Rd] Function 'factor' issues

 Cc: r-devel@r-project.org
 Date: Wednesday, 18 October, 2017, 11:54 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>     on Sun, 15 Oct 2017 16:03:48 + writes:

  
    > In R devel, function 'factor' has been changed, allowing and merging 
duplicated 'labels'.

Indeed.  That had been asked for and discussed a bit on this
list from June 14 to June 23, starting at
   https://stat.ethz.ch/pipermail/r-devel/2017-June/074451.html

    > Issue 1: Handling of specified 'labels' without duplicates is slower than 
before.
    > Example:
    > x <- rep(1:26, 4)
    > system.time(factor(x, levels=1:26, labels=letters))

    > Function 'factor' is already rather slow because of conversion to 
character. Please don't add slowdown.

Indeed, I do see a ~20% performance loss for the example
above, and I may get to look into this.
However, in R-devel there have been important internal
changes (ALTREP additions) some of which are currently giving
some performance losses in some cases (but they have the
potential to give big performance _gains_ e.g. for simple
indexing into large vectors which may apply here !).
For factor(), these C level "ALTREP" changes may not be the reason at
all for the slowdown;
I may find time to investigate further.

{{ For the ALTREP-change slowdowns I've noticed in some
   indexing/subset operations, we'll definitely have time to look into
   before R-devel is going to be released next spring... and as mentioned,
   these operations may even become considerably faster *thanks*
   to ALTREP ... }}

    > Issue 2: While default 'labels' is 'levels', not specifying 'labels' may 
be different from specifying 'labels' to be the same as 'levels'.

    > Example 1:
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
    > is different from
    > as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA), 
exclude = NULL))

You are right.  But this is not so exceptional and part of the new feature of
'labels' allowing to "fix up" things in such cases.  While it
would be nice if this was not the case the same phenomenon
happens in other functions as well because of lazy evaluation.
I think I had noticed that already and at the time found
"not easy" to work around.
(There are many aspects about changing such important base functions:
 1. not breaking back compatibility ((unless in rare
    border cases, where we are sure it's worth))
 2. Keeping code relatively transparent
 3. Keep the semantics "simple" to document and as intuitive as possible
)

    > File reg-tests-1d.R indicates that 'factor' behavior with NA is slightly 
changed, for the better. NA entry (because it is unmatched to 'levels' argument 
or is in 'exclude') is absorbed into NA in "levels" attribute (comes from 
'labels' argument), if any. The issue is that it happens only when 'labels' is 
specified.

I'm not sure anymore, but I think I had noticed that also in
June, considered to change it and found that such a changed
factor() would be too different from what it has "always been".
So, yes, IIRC, this current behavior is on purpose, if only for back 
compatibility.


    > Function 'factor' could use match(xlevs, nlevs)[f]. It doesn't match NA 
to NA level. When 'f' is long enough, longer than 'xlevs', it is faster than 
match(xlevs[f], nlevs).

    > Example 2:
    > With
    > levs <- c("A","A")  ,
    > factor(levs, levels=levs)
    > gives error, but
    > factor(levs, levels=levs, labels=levs)
    > doesn't.

yes, again that is a consequence of what you said above (before
'Example 1')

    > Note: In theory, if function 'factor' merged duplicated 'labels' i

[Rd] Function 'factor' issues

2017-10-15 Thread Suharto Anggono Suharto Anggono via R-devel
In R devel, function 'factor' has been changed, allowing and merging duplicated 
'labels'.

Issue 1: Handling of specified 'labels' without duplicates is slower than 
before.
Example:
x <- rep(1:26, 4)
system.time(factor(x, levels=1:26, labels=letters))

Function 'factor' is already rather slow because of conversion to character. 
Please don't add slowdown.

Issue 2: While default 'labels' is 'levels', not specifying 'labels' may be 
different from specifying 'labels' to be the same as 'levels'.

Example 1:
as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
is different from
as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA), exclude = 
NULL))

File reg-tests-1d.R indicates that 'factor' behavior with NA is slightly 
changed, for the better. NA entry (because it is unmatched to 'levels' argument 
or is in 'exclude') is absorbed into NA in "levels" attribute (comes from 
'labels' argument), if any. The issue is that it happens only when 'labels' is 
specified.

Function 'factor' could use match(xlevs, nlevs)[f]. It doesn't match NA to NA 
level. When 'f' is long enough, longer than 'xlevs', it is faster than 
match(xlevs[f], nlevs).

Example 2:
With
levs <- c("A","A")  ,
factor(levs, levels=levs)
gives error, but
factor(levs, levels=levs, labels=levs)
doesn't.

Note: In theory, if function 'factor' merged duplicated 'labels' in all cases, 
at least in
factor(c(sqrt(2)^2, 2))  ,
function 'factor' could do matching on original 'x' (without conversion to 
character), as in R before version 2.10.0. If function 'factor' did it,
factor(c(sqrt(2)^2, 2), levels = c(sqrt(2)^2, 2), labels = c("sqrt(2)^2", "2"))
could take sqrt(2)^2 and 2 as distinct.
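The merging through character conversion is observable directly:

```r
x <- c(sqrt(2)^2, 2)
x[1] == x[2]        # FALSE: the two doubles differ by ~4.4e-16
as.character(x)     # both elements become "2" (15 significant digits)
nlevels(factor(x))  # 1: the distinction is lost after conversion
```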


Another thing: Function 'factor' in R devel uses 'order' instead of 'sort.list'.

The case of as.factor(x) for
x <- as.data.frame(character(0))
in tests/isas-tests.Rout.save reveals that 'order' on data frame is strange.

x <- as.data.frame(character(0))
y <- unique(x)
length(y)  # 1
length(order(y))  # 0
length(as.character(y))  # 1

order(y) is not as long as as.character(y).

Another example:
length(mtcars)  # 11
length(order(mtcars))  # 352


[Rd] Revert to R 3.2.x code of logicalSubscript in subscript.c?

2017-10-01 Thread Suharto Anggono Suharto Anggono via R-devel
Currently, in function 'logicalSubscript' in subscript.c, the case of no 
recycling is handled like the implementation of R function 'which'. It passes 
through the data only once, but uses more memory. It has been so since R 3.3.0. 
For the case of recycling, two passes are done, the first to get the number of 
elements in the result.

Also since R 3.3.0, function 'makeSubscript' in subscript.c doesn't call 
'duplicate' on logical index vector.

A side note: I guess that it is safe not to call 'duplicate' on logical index 
vector, even if it is the one being modified in subassignment, because it is 
converted to positive indices before use in extraction or replacement. If so, 
isn't it true for character index vector as well?

Here are examples of subsetting a numeric vector of length 10^8 with logical 
index vector, inspired by Hong Ooi's answer in 
https://stackoverflow.com/questions/17510778/why-is-subsetting-on-a-logical-type-slower-than-subsetting-on-numeric-type
 . I present two extreme cases, each with no-recycling and recycling versions 
that convert to the same positive indices. The difference between the two versions 
can be attributed to function 'logicalSubscript'.

Example 1: select none
x <- numeric(1e8)
i <- rep(FALSE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- FALSE  # recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed 
  0.083   0.000   0.083 
   user  system elapsed 
  0.085   0.000   0.085 
   user  system elapsed 
  0.144   0.000   0.144 
   user  system elapsed 
  0.143   0.000   0.144 

Example 2: select all
x <- numeric(1e8)
i <- rep(TRUE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- TRUE  # recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed 
  0.538   0.741   1.292 
   user  system elapsed 
  0.506   0.668   1.175 
   user  system elapsed 
  0.448   0.534   0.986 
   user  system elapsed 
  0.431   0.528   0.960 

The results were from R 3.3.2 on http://rextester.com/l/r_online_compiler . The 
no-recycling version took longer than the recycling version for example 2, 
where more time was taken in both versions.

Function 'logicalSubscript' in subscript.c in R 3.2.x also uses faster code 
for the case of no recycling, but does two passes in all cases. Treatment of 
the case of recycling is identical to the current code.

Function 'logicalSubscript' in subscript.c affects subsetting with negative 
indices, because negative indices are converted first to a logical index vector 
with the same length as the vector (no recycling).

Example, comparing times of x[-1] and its equivalent, x[2:length(x)] :
x <- numeric(1e8)
system.time(x[-1])
system.time(x[-1])
system.time(x[2:length(x)])
system.time(x[2:length(x)])

Output from R 3.3.2 on http://rextester.com/l/r_online_compiler :
   user  system elapsed 
  0.591   0.903   1.515 
   user  system elapsed 
  0.558   0.822   1.384 
   user  system elapsed 
  0.620   0.659   1.285 
   user  system elapsed 
  0.607   0.663   1.274 

Output from R 3.2.2 in Zeppelin Notebook, 
https://my.datascientistworkbench.com/tools/zeppelin-notebook/ :
user  system elapsed 
  1.156   1.636   2.794 
   user  system elapsed 
  0.884   1.528   2.413 
   user  system elapsed 
  0.932   1.544   2.476 
   user  system elapsed 
  0.932   1.584   2.519

From above, apparently, x[-1] consistently took longer than x[2:length(x)] 
with R 3.3.2, but not with R 3.2.2.
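For reference, the two forms are interchangeable in result, so any timing gap is attributable to the logical-subscript conversion alone:

```r
x <- c(10, 20, 30)
stopifnot(identical(x[-1], x[2:length(x)]))  # both drop the first element
```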

So, how about reverting to R 3.2.x code of function 'logicalSubscript' in 
subscript.c?


[Rd] NEWS item about PR#17284

2017-09-18 Thread Suharto Anggono Suharto Anggono via R-devel
Previous mentions:
- https://stat.ethz.ch/pipermail/r-devel/2017-July/074723.html
- https://stat.ethz.ch/pipermail/r-devel/2017-August/074737.html

The NEWS item corresponding to PR#17284 is in "CHANGES in R-devel". However, 
fix for PR#17284 is already included in R 3.4.2 beta.



[Rd] R_xlen_t is an integer type

2017-08-27 Thread Suharto Anggono Suharto Anggono via R-devel
A recent R devel NEWS item: approx(), spline(), splinefun() and approxfun() 
also work for long vectors.

In current R devel, in function 'approx1' in src/library/stats/src/approx.c and 
in function 'spline_eval' in src/library/stats/src/splines.c, in
#ifdef LONG_VECTOR_SUPPORT
there is a comment "R_xlen_t is double". It is incorrect. In Rinternals.h, in
#ifdef LONG_VECTOR_SUPPORT
R_xlen_t is defined as ptrdiff_t , an integer type.

In function 'approx1' in src/library/stats/src/approx.c,
R_xlen_t ij = (i+j) / 2;
can be used unconditionally.

In function 'spline_eval' in src/library/stats/src/splines.c,
R_xlen_t k = (i+j) / 2;
can be used unconditionally.



Re: [Rd] Issues of R_pretty in src/appl/pretty.c

2017-08-19 Thread Suharto Anggono Suharto Anggono via R-devel
Yes, they work now.

I mentioned them partly because the commit description said overflow for large 
n and partly to be considered for regression tests.


On Sat, 19/8/17, Martin Maechler  wrote:

 Subject: Re: [Rd] Issues of R_pretty in src/appl/pretty.c

 Cc: r-devel@r-project.org
 Date: Saturday, 19 August, 2017, 7:47 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Fri, 18 Aug 2017 15:44:06 + writes:

> Examples similar to
> pretty(c(-1,1)*1e300, n = 1e9, min.n = 1)
> with smaller 'n':
> pretty(c(-1,1)*1e304, n = 1e5, min.n = 1)
> pretty(c(-1,1)*1e306, n = 1e3, min.n = 1)


Thank you.

"But" all these work now (in R-devel, rev >= 73094) as they should,
at least for me, right?

Are you mentioning the  "small n" examples so we could use them
as regression tests  (instead of the regression test I had added
to tests/reg-large.R  which needs enough GB and is slowish ) --
or are you seeing a platform where the above cases still don't
work in (new enough) R-devel ?


> A report on 'pretty' when working with integers, similar to what led to 
change of 'seq' fuzz, is 
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15137

> 
> On Tue, 15/8/17, Martin Maechler  wrote:

> Subject: Re: [Rd] Issues of R_pretty in src/appl/pretty.c
> To: "Martin Maechler" 

> @r-project.org
> Date: Tuesday, 15 August, 2017, 3:55 PM

[snip]

> I've committed now what I think has been suggested
> above ... to R-devel only :
> 
> r73094 | maechler | 2017-08-15 09:10:27 +0200 (Tue, 15 Aug 2017) | 1 line
> Changed paths:
>M doc/NEWS.Rd
>M src/appl/pretty.c
>M src/main/engine.c
>M tests/reg-large.R
>M tests/reg-tests-2.Rout.save

> pretty(x, n) fix overflow for large n suggested by Suhartu Aggano, 
R-devel, 2017-08-11


Re: [Rd] Issues of R_pretty in src/appl/pretty.c

2017-08-18 Thread Suharto Anggono Suharto Anggono via R-devel
Examples similar to
pretty(c(-1,1)*1e300, n = 1e9, min.n = 1)
with smaller 'n':
pretty(c(-1,1)*1e304, n = 1e5, min.n = 1)
pretty(c(-1,1)*1e306, n = 1e3, min.n = 1)

A report on 'pretty' when working with integers, similar to what led to change 
of 'seq' fuzz, is https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15137


On Tue, 15/8/17, Martin Maechler  wrote:

 Subject: Re: [Rd] Issues of R_pretty in src/appl/pretty.c
 To: "Martin Maechler" 

@r-project.org
 Date: Tuesday, 15 August, 2017, 3:55 PM

>>>>> Martin Maechler 
>>>>>     on Mon, 14 Aug 2017 11:46:07 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>     on Fri, 11 Aug 2017 17:11:06 + writes:

    >> See https://stat.ethz.ch/pipermail/r-devel/2017-August/074746.html for 
the origin of the example here.

    >> That
    >> pretty(c(-1,1)*1e300, n = 1e9, min.n = 1) gave 20 intervals, far from 
1e9, but
    >> pretty(c(-1,1)*1e300, n = 1e6, min.n = 1) gave 100 intervals
    >> (on a machine), made me trace through the code to function 'R_pretty' in 
https://svn.r-project.org/R/trunk/src/appl/pretty.c .

    > thank you.

    >> *lo is -1e300, *up is 1e300.
    >> cell = fmax2(fabs(*lo),fabs(*up));
    >> 'cell' is 1e300.
    >> i_small = dx < cell * U * imax2(1,*ndiv) * DBL_EPSILON *3;
    >> When *ndiv is (int) 1e9, apparently cell * U * imax2(1,*ndiv) overflows 
to infinity and 'i_small' is 1 (true). It doesn't happen when *ndiv is (int) 
1e6.

[[elided Yahoo spam]]

    >> Putting parentheses may avoid the floating point overflow. For example,
    >> i_small = dx < cell * (U * imax2(1,*ndiv) * DBL_EPSILON) *3;

    > yes... but only if the compiler optimization steps  "keep the 
parentheses".
    > AFAIK, there is no guarantee for that.
    > To make sure, I'd replace the above by

    > U *= imax2(1,*ndiv) * DBL_EPSILON;
    > i_small = dx < cell * U * 3;


    >> The part
    >> U = (1 + (h5 >= 1.5*h+.5)) ? 1/(1+h) : 1.5/(1+h5);
    >> is strange. Because (h5 >= 1.5*h+.5) is 1 or 0, (1 + (h5 >= 1.5*h+.5)) 
is never zero and 1/(1+h) will always be chosen.

[[elided Yahoo spam]]
    > here was as a change (not by me!) adding wrong parentheses
    > there (or maybe adding what the previously "missing" parens
    > implied, but not what they intended!).
    > The original code had been
     
    > U = 1 + (h5 >= 1.5*h+.5) ? 1/(1+h) : 1.5/(1+h5);

    > and "of course" was intended to mean

    > U = 1 + ((h5 >= 1.5*h+.5) ? 1/(1+h) : 1.5/(1+h5));

    > and this what I'll change it to, now.


    >> The comment for 'rounding_eps' says "1e-7 is consistent with 
seq.default()". Currently, seq.default() uses 1e-10 as fuzz.

    > Hmm, yes, thank you; this was correct when written,
    > but seq.default had been changed in the mean time,
    > namely in  svn r51095 | 2010-02-03

    > Usually we are cautious / reluctant to change such things w/o
    > any bug that we see to fix.
    > OTOH, we did have  bug cases we'd wanted to amend for seq() /
    > seq.int();
    > and I'll look into updating the "pretty - epsilon" also to
    > 1e-10.

[[elided Yahoo spam]]

I've committed now what I think has been suggested
above ... to R-devel only :

r73094 | maechler | 2017-08-15 09:10:27 +0200 (Tue, 15 Aug 2017) | 1 line
Changed paths:
   M doc/NEWS.Rd
   M src/appl/pretty.c
   M src/main/engine.c
   M tests/reg-large.R
   M tests/reg-tests-2.Rout.save

pretty(x, n) fix overflow for large n suggested by Suhartu Aggano, R-devel, 
2017-08-11


[Rd] Issues of R_pretty in src/appl/pretty.c

2017-08-11 Thread Suharto Anggono Suharto Anggono via R-devel
See https://stat.ethz.ch/pipermail/r-devel/2017-August/074746.html for the 
origin of the example here.

That
pretty(c(-1,1)*1e300, n = 1e9, min.n = 1) gave 20 intervals, far from 1e9, but
pretty(c(-1,1)*1e300, n = 1e6, min.n = 1) gave 100 intervals
(on a machine), made me trace through the code to function 'R_pretty' in 
https://svn.r-project.org/R/trunk/src/appl/pretty.c .

*lo is -1e300, *up is 1e300.
cell = fmax2(fabs(*lo),fabs(*up));
'cell' is 1e300.
i_small = dx < cell * U * imax2(1,*ndiv) * DBL_EPSILON *3;
When *ndiv is (int) 1e9, apparently cell * U * imax2(1,*ndiv) overflows to 
infinity and 'i_small' is 1 (true). It doesn't happen when *ndiv is (int) 1e6.

Putting parentheses may avoid the floating point overflow. For example,
i_small = dx < cell * (U * imax2(1,*ndiv) * DBL_EPSILON) *3;

The part
U = (1 + (h5 >= 1.5*h+.5)) ? 1/(1+h) : 1.5/(1+h5);
is strange. Because (h5 >= 1.5*h+.5) is 1 or 0, (1 + (h5 >= 1.5*h+.5)) is never 
zero and 1/(1+h) will always be chosen.

The comment for 'rounding_eps' says "1e-7 is consistent with seq.default()". 
Currently, seq.default() uses 1e-10 as fuzz.



Re: [Rd] translateChar in NewName in bind.c

2017-08-01 Thread Suharto Anggono Suharto Anggono via R-devel
For the 2nd example, I would say the R 3.4.1 result is acceptable, as 
names(c(x)) and names(x) are equal.

The change exposed by the 2nd example is in line with the statement of the NEWS 
item corresponding to PR#17284: "c() and unlist() are now more efficient in 
constructing the names(.) of their return value." However, currently, the 
NEWS item is for R-devel, not R 3.4.1 patched.


On Mon, 31/7/17, Martin Maechler  wrote:

 Subject: Re: [Rd] translateChar in NewName in bind.c

 Cc: r-devel@r-project.org
 Date: Monday, 31 July, 2017, 8:38 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sun, 30 Jul 2017 14:57:53 + writes:

> R devel's bind.c has been ported to R patched. Is it OK while names of 
'unlist' or 'c' result may be not strictly the same as in R 3.4.1 because of 
changed function 'NewName' in bind.c?

> Using 'translateCharUTF8' instead of 'translateChar' is as it should be. 
It has an effect in non-UTF-8 locale for this example.

> x <- list(1:2)
> names(x) <- "\ue7"
> res <- unlist(x)
> charToRaw(names(res)[1])

> Directly assigning 'tag' to 'ans' is more efficient, but
> may be different from in R 3.4.1 that involves
> 'translateCharUTF8', that is also correct. It has an
> effect for this example. 

> x <- 0
> names(x) <- "\xe7"
> Encoding(names(x)) <- "latin1"
> res <- c(x)
> Encoding(names(res))
> charToRaw(names(res))

Yes, you are right, thank you:

That part of the changes in bind.c was *not* directly related to
the two R-bugs (PR#17284 & PR#17292)... and therefore, maybe I
should not have ported it to R-patched (= R 3.4.1 patched).

Your examples above are instructive..  notably the 2nd one seems
to demonstrate to me, that the change also *did* fix a bug:

   Encoding(names(res))

is "latin1" in R-devel  but interestingly is "UTF-8" in R 3.4.1,
indeed independently of the locale.

I would argue R-devel (and current R-patched) is more faithful
by keeping the Encoding "latin1" that was set for names(x) also
in the  names(c(x)) .

I could revert R-patched's  bind.c (so it only contains the two
official bug fixes PR#172(84|92)   but I wonder if it is
desirable in this case.

I'm glad for further reasoning.
Given current "knowledge"/"evidence",  I would not  revert
R-patched to R 3.4.1's behavior.

Martin



Re: [Rd] translateChar in NewName in bind.c

2017-07-30 Thread Suharto Anggono Suharto Anggono via R-devel
R devel's bind.c has been ported to R patched. Is it OK that names of the 
'unlist' or 'c' result may not be strictly the same as in R 3.4.1 because of 
the changed function 'NewName' in bind.c?

Using 'translateCharUTF8' instead of 'translateChar' is as it should be. It has 
an effect in non-UTF-8 locale for this example.
x <- list(1:2)
names(x) <- "\ue7"
res <- unlist(x)
charToRaw(names(res)[1])

Directly assigning 'tag' to 'ans' is more efficient, but the result may differ 
from R 3.4.1, which involves 'translateCharUTF8' and is also correct. It has an 
effect for this example.
x <- 0
names(x) <- "\xe7"
Encoding(names(x)) <- "latin1"
res <- c(x)
Encoding(names(res))
charToRaw(names(res))


On Tue, 13/6/17, Tomas Kalibera  wrote:

 Subject: Re: [Rd] translateChar in NewName in bind.c

@r-project.org
 Date: Tuesday, 13 June, 2017, 2:35 PM

 Thanks, fixed in R-devel.
 Best
 Tomas

 On 06/11/2017 02:30 PM, Suharto Anggono Suharto Anggono via R-devel wrote:
 > I see another thing in function 'NewName' in bind.c. In
 > else if (*CHAR(tag)) ,
 > 'ans' is basically copied from 'tag'. Could the whole thing there be just the following?
 > ans = tag;
 > It seems to me that it can also replace
 > ans = R_BlankString;
 > in 'else'; so,
 > else if (*CHAR(tag))
 > and
 > else
 > can be merged to be just
 > else .
 >
 >   Subject: translateChar in NewName in bind.c
 >   To: r-devel@r-project.org
 >   Date: Saturday, 10 June, 2017, 9:14 PM
 >
 >   In function 'NewName' in bind.c (https://svn.r-project.org/R/trunk/src/main/bind.c), in
 >   else if (*CHAR(base)) ,
 >   'translateChar' is used. Should it be 'translateCharUTF8' instead? The end result is marked as UTF-8:
 >   mkCharCE(cbuf, CE_UTF8)
 >   Other cases already use 'translateCharUTF8'.
 >
 > __
 > R-devel@r-project.org mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [bug] droplevels() also drop object attributes (comment…)

2017-06-14 Thread Suharto Anggono Suharto Anggono via R-devel
In R devel r72789, the added part in 'factor' documentation (factor.Rd) is the 
following.
Undocumentedly for a long time, \code{factor(x)} loses all 
\code{\link{attributes}(x)} but \code{"names"}, and resets \code{"levels"} and 
\code{"class"}.

In the code of function 'factor', names(x) is copied to the result. As I 
mentioned before, names(x) is _not_ the "names" attribute of 'x' when 'x' is a 
"POSIXlt" object. In R devel r72789,
factor(x)
succeeds when 'x' is a "POSIXlt" object.

I think, it is better to accurately state what the code does, maybe like this.
Undocumentedly for a long time, \code{factor(x)} loses all 
\code{\link{attributes}(x)}, but has a copy of \code{\link{names}(x)}.

Attributes "levels" and "class" are already documented right before the 
statement.


To be more balanced, I am pointing out that, currently, levels replacement of a 
factor ('levels<-.factor') keeps attributes. My previous statement about the 
"contrasts" attribute also holds there. By replacing levels, the number of 
levels can change and, consequently, the original contrasts matrix is no longer 
valid. It can be argued that 'levels<-.factor' doesn't know about the 
"contrasts" attribute, as function 'contrasts' is in package stats, different 
from 'levels<-.factor', which is in package base. However, factor subsetting 
('[.factor') does know about the "contrasts" attribute.

------------------------
On Sat, 10/6/17, Martin Maechler  wrote:

 Subject: Re: [Rd] [bug] droplevels() also drop object attributes (comment…)
 To: "R Development" 
 Cc: "Serge Bibauw" , "Suharto Anggono" 

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>>     on Thu, 8 Jun 2017 16:43:48 + writes:

    > * Be careful with "contrasts" attribute. If the number of levels is 
reduced, the original contrasts matrix is no longer valid.
    > Example case:
    > x <- factor(c("a", "a", "b", "b", "b"), levels = c("a", "b", "c"))
    > contrasts(x) <- contr.treatment(levels(x), contrasts=FALSE)[, -2, 
drop=FALSE]
    > droplevels(x)

    > * If function 'factor' is changed, make sure that as.factor(x) and 
factor(x) is the same for 'x' where is.integer(x) is TRUE. Currently, 
as.factor() is treated specially.

    > * It is possible that names(x) is not attr(x, "names"). For example, 'x' 
is a "POSIXlt" object.
    > Look at this example, which works in R 3.3.2.
    > x <- as.POSIXlt("2017-01-01", tz="UTC")
    > factor(x, levels=x)


    > By the way, in NEWS, in "CHANGES IN R 3.4.0", in "SIGNIFICANT 
USER-VISIBLE CHANGES", there is "factor() now uses order() to sort its levels". 
It is false. Code of function 'factor' in R 3.4.0 
(https://svn.r-project.org/R/tags/R-3-4-0/src/library/base/R/factor.R) still 
uses 'sort.list', not 'order'.

    > 
>>>>> Martin Maechler 
>>>>>     on Tue, 16 May 2017 11:01:23 +0200 writes:

>>>>> Serge Bibauw 
>>>>>     on Mon, 15 May 2017 11:59:32 -0400 writes:

    >>> Hi,

    >>> Just reporting a small bug… not really a big deal, but I
    >>> don’t think that is intended: droplevels() also drops all
    >>> object’s attributes.

    >> Yes.  The help page for droplevels (or the simple
    >> definition of 'droplevels.factor') clearly indicate that
    >> the method for factors is really just a call to factor(x,
    >> exclude = *)

    >> and that _is_ quite an important base function whose
    >> semantic should not be changed lightly. Still, let's
    >> continue :

    >> Looking a bit, I see that the current behavior of factor()
    >> {and hence droplevels} has been unchanged in this respect
    >> for the whole history of R, well, at least for more than
    >> 17 years (R 1.0.1, April 2000).

    >> I'd agree there _is_ a bug, at least in the documentation
    >> which does *not* mention that currently, all attributes
    >> are dropped but "names", "levels" (and "class").

    >> OTOH, factor() would only need a small change to make it
    >> preserve all attributes (but "class" and "levels" which
    >> are set explicitly).

    >> I'm sure this will break some checks in some packages.  Is
    >> it worth it?

    >> e.g., our own R  QC checks currently c

Re: [Rd] translateChar in NewName in bind.c

2017-06-11 Thread Suharto Anggono Suharto Anggono via R-devel
I see another thing in function 'NewName' in bind.c. In
else if (*CHAR(tag)) ,
'ans' is basically copied from 'tag'. Could the whole thing there be just the 
following?
ans = tag;
It seems to me that it can also replace
ans = R_BlankString;
in 'else'; so,
else if (*CHAR(tag))
and
else
can be merged to be just
else .




 Subject: translateChar in NewName in bind.c
 To: r-devel@r-project.org
 Date: Saturday, 10 June, 2017, 9:14 PM
 
 In function 'NewName' in bind.c 
(https://svn.r-project.org/R/trunk/src/main/bind.c), in
 else if (*CHAR(base)) ,
 'translateChar' is used. Should it be
 'translateCharUTF8' instead? The end result is marked as
 UTF-8:
 mkCharCE(cbuf, CE_UTF8)
 Other cases already use
 'translateCharUTF8'.



[Rd] translateChar in NewName in bind.c

2017-06-10 Thread Suharto Anggono Suharto Anggono via R-devel
In function 'NewName' in bind.c 
(https://svn.r-project.org/R/trunk/src/main/bind.c), in
else if (*CHAR(base)) ,
'translateChar' is used. Should it be 'translateCharUTF8' instead? The end 
result is marked as UTF-8:
mkCharCE(cbuf, CE_UTF8)
Other cases already use 'translateCharUTF8'.



Re: [Rd] [bug] droplevels() also drop object attributes (comment…)

2017-06-08 Thread Suharto Anggono Suharto Anggono via R-devel
* Be careful with "contrasts" attribute. If the number of levels is reduced, 
the original contrasts matrix is no longer valid.
Example case:
x <- factor(c("a", "a", "b", "b", "b"), levels = c("a", "b", "c"))
contrasts(x) <- contr.treatment(levels(x), contrasts=FALSE)[, -2, drop=FALSE]
droplevels(x)

* If function 'factor' is changed, make sure that as.factor(x) and factor(x) is 
the same for 'x' where is.integer(x) is TRUE. Currently, as.factor() 
is treated specially.

* It is possible that names(x) is not attr(x, "names"). For example, 'x' is a 
"POSIXlt" object.
Look at this example, which works in R 3.3.2.
x <- as.POSIXlt("2017-01-01", tz="UTC")
factor(x, levels=x)


By the way, in NEWS, in "CHANGES IN R 3.4.0", in "SIGNIFICANT USER-VISIBLE 
CHANGES", there is "factor() now uses order() to sort its levels". It is false. 
Code of function 'factor' in R 3.4.0 
(https://svn.r-project.org/R/tags/R-3-4-0/src/library/base/R/factor.R) still 
uses 'sort.list', not 'order'.


> Martin Maechler 
> on Tue, 16 May 2017 11:01:23 +0200 writes:

> Serge Bibauw 
> on Mon, 15 May 2017 11:59:32 -0400 writes:

>> Hi,

>> Just reporting a small bug… not really a big deal, but I
>> don’t think that is intended: droplevels() also drops all
>> object’s attributes.

> Yes.  The help page for droplevels (or the simple
> definition of 'droplevels.factor') clearly indicate that
> the method for factors is really just a call to factor(x,
> exclude = *)

> and that _is_ quite an important base function whose
> semantic should not be changed lightly. Still, let's
> continue :

> Looking a bit, I see that the current behavior of factor()
> {and hence droplevels} has been unchanged in this respect
> for the whole history of R, well, at least for more than
> 17 years (R 1.0.1, April 2000).

> I'd agree there _is_ a bug, at least in the documentation
> which does *not* mention that currently, all attributes
> are dropped but "names", "levels" (and "class").

> OTOH, factor() would only need a small change to make it
> preserve all attributes (but "class" and "levels" which
> are set explicitly).

> I'm sure this will break some checks in some packages.  Is
> it worth it?

> e.g., our own R  QC checks currently check (the printing of) the
> following (in tests/reg-tests-2.R ):

>   > ## some tests of factor matrices
>   > A <- factor(7:12)
>   > dim(A) <- c(2, 3)
>   > A
>        [,1] [,2] [,3]
>   [1,] 7    9    11
>   [2,] 8    10   12
>   Levels: 7 8 9 10 11 12
>   > str(A)
>    factor [1:2, 1:3] 7 8 9 10 ...
>    - attr(*, "levels")= chr [1:6] "7" "8" "9" "10" ...
>   > A[, 1:2]
>        [,1] [,2]
>   [1,] 7    9
>   [2,] 8    10
>   Levels: 7 8 9 10 11 12
>   > A[, 1:2, drop=TRUE]
>   [1] 7  8  9  10
>   Levels: 7 8 9 10
> 
> with the proposed change to factor(),
> the last call would change its result:
> 
>   > A[, 1:2, drop=TRUE]
>        [,1] [,2]
>   [1,] 7    9
>   [2,] 8    10
>   Levels: 7 8 9 10

> because 'drop=TRUE' calls factor(..) and that would also
> preserve the "dim" attribute.  I would think that the
> changed behavior _is_ better, and is also according to
> documentation, because the help page for [.factor explains
> that 'drop = TRUE' drops levels, but _not_ that it
> transforms a factor matrix into a factor (vector).

> Martin

I'm finally coming back to this.
It still seems to make sense to change factor() and hence
droplevels() behavior here, and plan to commit this change
within a day.

Martin Maechler
ETH Zurich


Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-18 Thread Suharto Anggono Suharto Anggono via R-devel
From an example in 
http://www.uni-muenster.de/ZIV.BennoSueselbeck/s-html/helpfiles/nargs.html , 
the number of arguments in '...' can be obtained by
(function(...)nargs())(...) .

I now realize that sys.call() doesn't expand '...' when the function is called 
with '...'. It just returns the call as is. If 'stopifnot' uses sys.call() 
instead of match.call() , the following example behaves improperly:
g <- function(...) stopifnot(...)
g(TRUE, FALSE)


On Thu, 18/5/17, Martin Maechler  wrote:

 Subject: Re: [Rd] stopifnot() does not stop at first non-TRUE argument

 Cc: r-devel@r-project.org
 Date: Thursday, 18 May, 2017, 3:03 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Tue, 16 May 2017 16:37:45 + writes:

> switch(i, ...)
> extracts 'i'-th argument in '...'. It is like
> eval(as.name(paste0("..", i))) .

Yes, that's neat.

It is only almost the same:  in the case of illegal 'i'
the switch() version returns
invisible(NULL)

whereas the version we'd want should signal an error, typically
the same error message as

  > t2 <- function(...) ..2
  > t2(1)
  Error in t2(1) (from #1) : the ... list does not contain 2 elements
  > 


> Just mentioning other things:
> - For 'n',
> n <- nargs()
> can be used.

I know .. [in this case, where '...' is the only formal argument of the 
function]

> - sys.call() can be used in place of match.call() .

Hmm... in many cases, yes notably, as we do *not* want the
argument names here, I think you are right.



Re: [Rd] stopifnot() does not stop at first non-TRUE argument

2017-05-16 Thread Suharto Anggono Suharto Anggono via R-devel
switch(i, ...)
extracts 'i'-th argument in '...'. It is like
eval(as.name(paste0("..", i))) .

Just mentioning other things:
- For 'n',
n <- nargs()
can be used.
- sys.call() can be used in place of match.call() .
---
> peter dalgaard 
> on Mon, 15 May 2017 16:28:42 +0200 writes:

> I think Hervé's idea was just that if switch can evaluate arguments 
selectively, so can stopifnot(). But switch() is .Primitive, so does it from C. 

If he just meant that, then "yes, of course" (but not so interesting).

> I think it is almost a no-brainer to implement a sequential stopifnot if 
dropping to C code is allowed. In R it gets trickier, but how about this:

Something like this, yes, that's close to what Serguei Sokol had proposed
(and of course I *do*  want to keep the current sophistication
 of stopifnot(), so this is really too simple)

> Stopifnot <- function(...)
> {
> n <- length(match.call()) - 1
> for (i in 1:n)
> {
> nm <- as.name(paste0("..",i))
> if (!eval(nm)) stop("not all true")
> }
> }
> Stopifnot(2+2==4)
> Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
> Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
> Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)


>> On 15 May 2017, at 15:37 , Martin Maechler  wrote:
>> 
>> I'm still curious about Hervé's idea on using  switch()  for the
>> issue.

> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


[Rd] c documentation after change

2017-04-19 Thread Suharto Anggono Suharto Anggono via R-devel
In R 3.4.0 RC, argument list of 'c' as S4 generic function has become
(x, ...) .
However, the "S4 methods" section in the documentation of 'c' (c.Rd) is not updated yet.

Also, in R 3.4.0 RC, 'c' method of class "Date" ('c.Date') is still not 
explicitly documented.



Re: [Rd] complex NA's match(), etc: not back-compatible change proposal

2017-04-01 Thread Suharto Anggono Suharto Anggono via R-devel
I am raising this again.

With
z <- complex(real = c(0,NaN,NaN), imaginary = c(NA,NA,0)) ,
results of
sapply(z, match, table = z)
and
match(z, z)
are different in R 3.4.0 alpha. I think they should be the same.

I suggest changing 'cequal' in unique.c so that a complex number that has 
both NA and NaN matches NA and doesn't match NaN, as such a complex number is 
printed as NA.



Re: [Rd] rep/rep.int: in NEWS, but not yet ported from trunk

2017-02-27 Thread Suharto Anggono Suharto Anggono via R-devel
For R 3.3.3, if 3.3.3 is really the last in the 3.3.x series, I suggest 
reverting to the R 3.3.2 code (and removing the corresponding NEWS entry), if 
possible. Failure of something like
rep(5, list(6))
makes some previously working R code break in some situations. That is not 
good to have in an R release that will last long, I think.

On Mon, 27/2/17, Martin Maechler  wrote:

 Subject: Re: [Rd] rep/rep.int: in NEWS, but not yet ported from trunk

 Cc: R-devel@r-project.org
 Date: Monday, 27 February, 2017, 4:20 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sun, 26 Feb 2017 12:02:44 + writes:

> According to "CHANGES IN R 3.3.2 patched" in NEWS, rep(x,
> times) and rep.int(x, times) also work when 'times' has
> length greater than one and has element larger than the
> maximal integer. In fact, it is still not the case in R
> 3.3.3 beta r72259. In seq.c
> (https://svn.r-project.org/R/branches/R-3-3-branch/src/main/seq.c),
> 'times' that is a vector with storage mode "double" and
> length greater than one is still changed first to storage
> mode "integer". Number in 'times' that represents an
> integer that is larger than the maximal integer becomes NA
> and error is issued for such 'times'.  
> I have put a comment,
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16932#c30
> .

[[elided Yahoo spam]]

- I've changed the NEWS entry for R-patched (and moved the more
general statement to a new entry for R-devel). 
- The changes were quite substantial so I did not port them to
R-patched at the time..  We could have ported them later, but
not now, immediately before code freeze (of R 3.3.3).

- I would say   rep(5, list(6))  was never "meant to" work and had worked
  incidentally only.
  OTOH, you are correct with your comments 11 & 29 in the about
  bug report, and your proposal to make the simple case   rep(s, list(7))
  work as previously seems ok to me.

However, for all this, we will concentrate on R-devel (to become
R 3.4.0).

Best regards,
Martin Maechler



[Rd] rep/rep.int: in NEWS, but not yet ported from trunk

2017-02-26 Thread Suharto Anggono Suharto Anggono via R-devel
According to "CHANGES IN R 3.3.2 patched" in NEWS, rep(x, times) and rep.int(x, 
times) also work when 'times' has length greater than one and has element 
larger than the maximal integer. In fact, it is still not the case in R 3.3.3 
beta r72259. In seq.c 
(https://svn.r-project.org/R/branches/R-3-3-branch/src/main/seq.c), 'times' 
that is a vector with storage mode "double" and length greater than one is 
still changed first to storage mode "integer". Number in 'times' that 
represents an integer that is larger than the maximal integer becomes NA and 
error is issued for such 'times'.

I have put a comment, 
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16932#c30 .



Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-02-07 Thread Suharto Anggono Suharto Anggono via R-devel
Function 'tapply' in R devel r72137 uses
if(!is.null(ans) && is.na(default) && is.atomic(ans)) .

Problems:
- It is possible that user-specified 'default' is not of length 1. If the 
length is zero, the 'if' gives an error.
- It is possible that is.na(default) is TRUE and user-specified 'default' is 
NaN.


On Sat, 4/2/17, Martin Maechler  wrote:

 Subject: Re: [Rd] RFC: tapply(*, ..., init.value = NA)

 Cc: R-devel@r-project.org
 Date: Saturday, 4 February, 2017, 10:48 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Wed, 1 Feb 2017 16:17:06 + writes:

[snip]

> vector(typeof(ans)) (or vector(storage.mode(ans))) has
> length zero and can be used to initialize array.  

Yes,.. unless in the case where ans is NULL.
You have convinced me, that is  nicer.

> Instead of if(missing(default)) , if(identical(default,
> NA)) could be used. The documentation could then say, for
> example: "If default = NA (the default), NA of appropriate
> storage mode (0 for raw) is automatically used."

After some thought (and experiments), I have reverted and no
longer use if(missing). You are right that it is not needed
(and even potentially confusing) here.

Changes are in svn c72106.

Martin Maechler



[Rd] Lack of 'seq_len' in 'head' in 'stopifnot'

2017-02-04 Thread Suharto Anggono Suharto Anggono via R-devel
Function 'stopifnot' in R devel r72104 has this.
head <- function(x, n = 6L) ## basically utils:::head.default()
x[if(n < 0L) max(length(x) + n, 0L) else min(n, length(x))]

If a definition like utils:::head.default is intended, the index of 'x' should 
be wrapped in seq_len(...):
x[seq_len(...)]



Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-02-01 Thread Suharto Anggono Suharto Anggono via R-devel
On 'aggregate data.frame', the URL should be 
https://stat.ethz.ch/pipermail/r-help/2016-May/438631.html .

vector(typeof(ans))
(or  vector(storage.mode(ans)))
has length zero and can be used to initialize array.

Instead of
if(missing(default)) ,
if(identical(default, NA))
could be used. The documentation could then say, for example: "If default = NA 
(the default), NA of appropriate storage mode (0 for raw) is automatically 
used."

On Wed, 1/2/17, Martin Maechler  wrote:

 Subject: Re: [Rd] RFC: tapply(*, ..., init.value = NA)

 Cc: R-devel@r-project.org
 Date: Wednesday, 1 February, 2017, 12:14 AM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Tue, 31 Jan 2017 15:43:53 + writes:

> Function 'aggregate.data.frame' in R has taken a different route. With 
drop=FALSE, the function is also applied to subset corresponding to combination 
of grouping variables that doesn't appear in the data (example 2 in 
https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

Interesting point (I couldn't easily find 'the example 2' though).
However, aggregate.data.frame() is a considerably more
sophisticated function and one goal was to change tapply() as
little as possible for compatibility (and maintenance!) reasons .

[snip]

> With the code using
>if(missing(default)) ,
> I consider the stated default value of 'default',
>default = NA ,
> misleading because the code doesn't use it. 

I know and I also had thought about it and decided to keep it 
in the spirit of "self documentation" because  "in spirit", the
default still *is* NA.

> Also,
>  tapply(1:3, 1:3, as.raw)
> is not the same as
>  tapply(1:3, 1:3, as.raw, default = NA) .
> The accurate statement is the code in
> if(missing(default)) ,
> but it involves the local variable 'ans'.

exactly.  But putting that whole expression in there would look
confusing to those using  str(tapply), args(tapply) or similar
inspection to quickly get a glimpse of the function user "interface".
That's why we typically don't do that and rather slightly cheat
with the formal default, for the above "didactical" purposes.

If you are puristic about this, then missing() should almost never
be used when the function argument has a formal default.

I don't have a too strong opinion here, and we do have quite a
few other cases, where the formal default argument is not always
used because of   if(missing(.))  clauses.

I think I could be convinced to drop the '= NA' from the formal
argument list..


> As far as I know, the result of function 'array' in R is not a classed 
object and the default method of  `[<-` will be used in the 'tapply' code 
portion.

> As far as I know, the result of 'lapply' is a list without class. So, 
'unlist' applied to it uses the default method and the 'unlist' result is a 
vector or a factor.

You may be right here
  ((or not:  If a package author makes array() into an S3 generic and defines
S3method(array, *) and she or another make tapply() into a
generic with methods,  are we really sure that this code
would not be used ??))

still, the as.raw example did not easily work without a warning
when using as.vector() .. or similar.

> With the change, the result of

> tapply(1:3, 1:3, factor, levels=3:1)

> is of mode "character". The value is from the internal code, not from the 
factor levels. It is worse than before the change, where it is really the 
internal code, integer.

I agree that this change is not desirable.
One could argue that it was quite a "lucky coincidence" that the previous
code returned the internal integer codes though..


[snip]


> To initialize array, a zero-length vector can also be used.

yes, of course; but my  ans[0L][1L]  had the purpose to get the
correct mode specific version of NA .. which works for raw (by
getting '00' because "raw" has *no* NA!).
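The `ans[0L][1L]` idiom can be checked directly at the prompt: out-of-bounds indexing of an atomic vector yields the mode-specific missing value, except for raw, which has none. A small illustration:

```r
## Out-of-bounds indexing returns the mode-specific "missing" value:
integer(0)[1L]    # NA_integer_
double(0)[1L]     # NA_real_
## "raw" has no NA, so indexing past the end yields 00 instead:
raw(0)[1L]        # 00
```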

So it seems I need an additional   !is.factor(ans)  there ...
a bit ugly.


-

[snip]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-31 Thread Suharto Anggono Suharto Anggono via R-devel
Function 'aggregate.data.frame' in R has taken a different route. With 
drop=FALSE, the function is also applied to subset corresponding to combination 
of grouping variables that doesn't appear in the data (example 2 in 
https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

Because 'default' is used only when simplification happens, putting 'default' 
after 'simplify' in the argument list may be more logical. Anyway, it doesn't 
affect call to 'tapply' because the argument 'default' must be specified by 
name.

With the code using
if(missing(default)) ,
I consider the stated default value of 'default',
default = NA ,
misleading because the code doesn't use it. Also,
tapply(1:3, 1:3, as.raw)
is not the same as
tapply(1:3, 1:3, as.raw, default = NA) .
The accurate statement is the code in
if(missing(default)) ,
but it involves the local variable 'ans'.

As far as I know, the result of function 'array' in R is not a classed object and 
the default method of  `[<-` will be used in the 'tapply' code portion.

As far as I know, the result of 'lapply' is a list without class. So, 'unlist' 
applied to it uses the default method and the 'unlist' result is a vector or a 
factor.

With the change, the result of
tapply(1:3, 1:3, factor, levels=3:1)
is of mode "character". The value is from the internal code, not from the 
factor levels. It is worse than before the change, where it is really the 
internal code, integer.
In the documentation, the description of argument 'simplify' says: "If 'TRUE' 
(the default), then if 'FUN' always returns a scalar, 'tapply' returns an array 
with the mode of the scalar."

To initialize array, a zero-length vector can also be used.

For 'xtabs', I think that it is better if the result has storage mode "integer" 
if 'sum' results are of storage mode "integer", as in R 3.3.2. As 'default' 
argument for 'tapply', 'xtabs' can use 0L, or use 0L or 0 depending on storage 
mode of the summed quantity.
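As a sketch of that point, in versions of R where tapply() has a 'default' argument (R >= 3.4.0), an integer default combined with integer sums keeps the result in integer storage mode:

```r
## Integer sums plus an integer default keep storage mode "integer";
## the empty cell for level 3 is filled with the default 0L.
x <- tapply(1:3, factor(c(1, 1, 2), levels = 1:3), sum, default = 0L)
storage.mode(x)   # "integer"
## A double default of 0 would instead give storage mode "double".
```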


> Henrik Bengtsson 
> on Fri, 27 Jan 2017 09:46:15 -0800 writes:

> On Fri, Jan 27, 2017 at 12:34 AM, Martin Maechler
>  wrote:
>> 
>> > On Jan 26, 2017 07:50, "William Dunlap via R-devel"
>>  > wrote:
>> 
>> > It would be cool if the default for tapply's init.value
>> could be > FUN(X[0]), so it would be 0 for FUN=sum or
>> FUN=length, TRUE for > FUN=all, -Inf for FUN=max, etc.
>> But that would take time and would > break code for which
>> FUN did not work on length-0 objects.
>> 
>> > Bill Dunlap > TIBCO Software > wdunlap tibco.com
>> 
>> I had the same idea (after my first post), so I agree
>> that would be nice. One could argue it would take time
>> only if the user is too lazy to specify the value, and we
>> could use tryCatch(FUN(X[0]), error = NA) to safeguard
>> against those functions that fail for 0 length arg.
>> 
>> But I think the main reason for _not_ setting such a
>> default is back-compatibility.  In my proposal, the new
>> argument would not be any change by default and so all
>> current uses of tapply() would remain unchanged.
>> 
>>> Henrik Bengtsson  on
>>> Thu, 26 Jan 2017 07:57:08 -0800 writes:
>> 
>> > On a related note, the storage mode should try to match
>> ans[[1]] (or > unlist:ed and) when allocating 'ansmat' to
>> avoid coercion and hence a full > copy.
>> 
>> Yes, related indeed; and would fall "in line" with Bill's
>> idea.  OTOH, it could be implemented independently, by
>> something like
>> 
>> if(missing(init.value)) init.value <- if(length(ans))
>> as.vector(NA, mode=storage.mode(ans[[1]])) else NA

> I would probably do something like:

>   ans <- unlist(ans, recursive = FALSE, use.names = FALSE)
>   if (length(ans)) storage.mode(init.value) <- storage.mode(ans[[1]])
>   ansmat <- array(init.value, dim = extent, dimnames = namelist)

> instead.  That completely avoids having to use missing() and the value
> of 'init.value' will be coerced later if not done upfront.  use.names
> = FALSE speeds up unlist().

Thank you, Henrik.
That's a good idea to do the unlist() first, and with 'use.names=FALSE'.
I'll copy that.

On the other hand, "brutally" modifying  'init.value' (now called 'default')
even when the user has specified it is not acceptable I think.
You are right that it would be coerced anyway subsequently, but
the coercion will happen in whatever method of  `[<-` will be
appropriate.
Good S3 and S4 programmers will write such methods for their classes.

For that reason, I'm even more conservative now, only fiddle in
case of an atomic 'ans' and make use of the corresponding '['
method rather than as.vector(.) ... because that will fulfill
the following new regression test {not fulfilled in current R}:

identical(tapply(1:3, 1:3, as.raw),
  array(as.raw(1:3), 3L, dimnames=list(1:3)))

Also, I've done a few more things -- treating if(.) .

Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Suharto Anggono Suharto Anggono via R-devel
The "no factor combination" case is distinguishable by 'tapply' with 
simplify=FALSE.

> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)
> D2 <- D2[-c(1,5), ]
> DN <- D2; DN[1,"N"] <- NA
> with(DN, tapply(N, list(n,L), FUN=sum, simplify=FALSE))
  A    B    C    D    E    F   
1 NA   6    NULL NULL NULL NULL
2 NULL NULL 3    6    NULL NULL
3 NULL NULL NULL NULL 6    6   


There is an old related discussion starting on 
https://stat.ethz.ch/pipermail/r-devel/2007-November/047338.html .

--
Last week, we've talked here about "xtabs(), factors and NAs",
 ->  https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html

In the mean time, I've spent several hours on the issue
and also committed changes to R-devel "in two iterations".

In the case there is a *Left* hand side part to xtabs() formula,
see the help page example using 'esoph',
it uses  tapply(...,  FUN = sum)   and
I now think there is a missing feature in tapply() there, which
I am proposing to change. 

Look at a small example:

> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)[-c(1,5), 
> ]; xtabs(~., D2)
, , N = 3

   L
n   A B C D E F
  1 1 2 0 0 0 0
  2 0 0 1 2 0 0
  3 0 0 0 0 2 2

> DN <- D2; DN[1,"N"] <- NA; DN
   n L  N
2  1 A NA
3  1 B  3
4  1 B  3
6  2 C  3
7  2 D  3
8  2 D  3
9  3 E  3
10 3 E  3
11 3 F  3
12 3 F  3
> with(DN, tapply(N, list(n,L), FUN=sum))
   A  B  C  D  E  F
1 NA  6 NA NA NA NA
2 NA NA  3  6 NA NA
3 NA NA NA NA  6  6
>  

and as you can see, the resulting matrix has NAs, all the same
NA_real_, but semantically of two different kinds:

1) at ["1", "A"], the  NA  comes from the NA in 'N'
2) all other NAs come from the fact that there is no such factor combination
   *and* from the fact that tapply() uses

   array(dim = .., dimnames = ...)

i.e., initializes the array with NAs  (see definition of 'array').

My proposition is the following patch to  tapply(), adding a new
option 'init.value':

-
 
-tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
+tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = 
TRUE)
 {
 FUN <- if (!is.null(FUN)) match.fun(FUN)
 if (!is.list(INDEX)) INDEX <- list(INDEX)
@@ -44,7 +44,7 @@
 index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
 ans <- lapply(X = ans[index], FUN = FUN, ...)
 if (simplify && all(lengths(ans) == 1L)) {
-   ansmat <- array(dim = extent, dimnames = namelist)
+   ansmat <- array(init.value, dim = extent, dimnames = namelist)
ans <- unlist(ans, recursive = FALSE)
 } else {
ansmat <- array(vector("list", prod(extent)),

-

With that, I can set the initial value to '0' instead of array's
default of NA :

> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
   A B C D E F
1 NA 6 0 0 0 0
2  0 0 3 6 0 0
3  0 0 0 0 6 6
> 

which now has 0 counts and NA  as is desirable to be used inside
xtabs().

All fine... and would not be worth a posting to R-devel,
except for this:

The change will not be 100% back compatible -- by necessity: any new argument 
for
tapply() will make that argument name not available to be
specified (via '...') for 'FUN'.  The new function would be

> str(tapply)
function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)  

where the '...' are passed to FUN(),  and with the new signature,
'init.value' then won't be passed to FUN  "anymore" (compared to
R <= 3.3.x).

For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
the probability the arg name is used in other functions).


Opinions?

Thank you in advance,
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-29 Thread Suharto Anggono Suharto Anggono via R-devel
Interspersed below.



 Subject: Re: ifelse() woes ... can we agree on a ifelse2() ?
 To: r-de...@lists.r-project.org
 Date: Sunday, 27 November, 2016, 12:14 AM
 
On current 'ifelse' code in R:
...
* If 'test' is a factor, doing
storage.mode(test) <- "logical"
is not appropriate, but is.atomic(test) returns TRUE. Maybe use
if(!is.object(test))
instead of
if(is.atomic(test)) .
===
I now see that, for 'test' that is atomic and has "class" attribute, with 
current 'ifelse' code, changing
if(is.atomic(test))
to
if(!is.object(test))
removes class of 'test' and makes the result doesn't have class of 'test', 
which is not according to the documentation. The documentation of 'ifelse' says 
that the value is "A vector of the same length and attributes (including 
dimensions and "class") as 'test' ...".
===


function(test, yes, no, NA. = NA) {
    if(!is.logical(test))
        test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    n <- length(test)
    n.yes <- length(yes); n.no <- length(no)
    if (n.yes != n) {
        if (n.no == n) {  # swap yes <-> no
            test <- !test
            ans <- yes; yes <- no; no <- ans
            n.no <- n.yes
        } else yes <- yes[rep_len(seq_len(n.yes), n)]
    }
    ans <- yes
    if (n.no == 1L)
        ans[!test] <- no
    else
        ans[!test & !is.na(test)] <- no[
            if (n.no == n) !test & !is.na(test)
            else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
    stopifnot(length(NA.) == 1L)
    ans[is.na(test)] <- NA.
    ans
}

===
For data frame, indexing by logical matrix is different from indexing by 
logical vector.
Because there is an example like that, I think that it is better to remove
if(!is.logical(test))
in the function definition above, making 'as.logical' also applied to 'test' of 
mode "logical", stripping attributes. Doing so makes sure that 'test' is a 
plain logical vector, so that indexing is compatible with 'length'.
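The data-frame difference mentioned above can be illustrated with a small hypothetical example:

```r
df <- data.frame(a = 1:2, b = 3:4)
## A logical vector subsets whole columns (a data frame comes back):
df[c(TRUE, FALSE)]        # just column 'a', still a data frame
## A logical matrix selects individual cells, returning a plain vector:
m <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
df[m]                     # c(1L, 4L)
```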

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-26 Thread Suharto Anggono Suharto Anggono via R-devel
Related to the length of 'ifelse' result, I want to say that "example of 
different return modes" in ?ifelse led me to perceive a wrong thing in the past.

 ## example of different return modes:
 yes <- 1:3
 no <- pi^(0:3)
 typeof(ifelse(NA,yes, no)) # logical
 typeof(ifelse(TRUE,  yes, no)) # integer
 typeof(ifelse(FALSE, yes, no)) # double

As the result of each 'ifelse' call is not printed, I thought that the length 
of the result is 3. In fact, the length of the result is 1.
I realize just now that the length of 'no' is different from 'yes'. The length 
of 'yes' is 3, the length of 'no' is 4.
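The point is easy to miss because the results are not printed in the help-page example; spelling it out:

```r
yes <- 1:3
no  <- pi^(0:3)                 # note: length 4, not 3
## The result length follows 'test', which here has length 1:
length(ifelse(NA,    yes, no))  # 1
length(ifelse(TRUE,  yes, no))  # 1
length(ifelse(FALSE, yes, no))  # 1
```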






 Subject: Re: ifelse() woes ... can we agree on a ifelse2() ?
 To: r-de...@lists.r-project.org
 Date: Sunday, 27 November, 2016, 8:50 AM
 
In all of the proposed 'ifelse'-like functions so far, including from me (that 
I labeled as 'ifelse2', following Martin Maechler) and from Martin Maechler, 
the length of the result equals the length of 'test', like in 'ifelse'. There 
is no recycling of 'test'.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-26 Thread Suharto Anggono Suharto Anggono via R-devel
For S Ellison, just clarifying, I am Suharto Anggono, not Martin Maechler. 
"Martin et al.," from my previous E-mail was the beginning of message from 
Gabriel Becker, that I quoted.
The quoted "still a bit disappointed that nobody has taken a look" is from 
Martin Maechler.
In all of the proposed 'ifelse'-like functions so far, including from me (that 
I labeled as 'ifelse2', following Martin Maechler) and from Martin Maechler, 
the length of the result equals the length of 'test', like in 'ifelse'. There 
is no recycling of 'test'.



-
> Just stating, in 'ifelse', 'test' is not recycled. As I said in "R-intro: 
> length of 'ifelse' result" 
> (https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html), 
> ifelse(condition, a, b) 
> returns a vector of the length of 'condition', even if 'a' or 'b' is longer.

That is indeed (almost) the documented behaviour. The documented behaviour is 
slightly more complex; '... returns a value _of the same shape_ as 'test''. IN 
principle, test can be a matrix, for example.

> A concrete version of 'ifelse2' that starts the result from 'yes':
> .. still a bit disappointed that nobody has taken a look ...

I took a look. The idea leaves (at least) me very uneasy. If you are recycling 
'test' as well as arbitrary-length yes and no, results will become 
frighteningly hard to predict except in very simple cases where you have 
well-defined and consistent regularities in the data. And where you do, surely 
passing ifelse a vector of the right length, generated by rep() applied to a 
short 'test' vector, will do what you want without messing around with new 
functions that hide what you're doing.

Do you really have a case where 'test' is neither a single logical (that could 
be used with 'if') nor a vector that can be readily replicated to the desired 
length with 'rep'?

If not, I'd drop the attempt to generate new ifelse-like functions. 

S Ellison

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-11-26 Thread Suharto Anggono Suharto Anggono via R-devel
Just stating, in 'ifelse', 'test' is not recycled. As I said in "R-intro: 
length of 'ifelse' result" 
(https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html), 
ifelse(condition, a, b) returns a vector of the length of 'condition', even if 
'a' or 'b' is longer.

On current 'ifelse' code in R:
* The part
ans[nas] <- NA
could be omitted because NA's are already in place.
If the part is removed, variable 'nas' is no longer used.
* The any(*) part actually checks the thing that is used as the index vector. 
The index vector could be stored and then repeatedly used, like the following.
    if (any(sel <- test & ok))
    ans[sel] <- rep(yes, length.out = length(ans))[sel]
* If 'test' is a factor, doing
storage.mode(test) <- "logical"
is not appropriate, but is.atomic(test) returns TRUE. Maybe use
if(!is.object(test))
instead of
if(is.atomic(test)) .

On ifelse-checks.R:
* In function 'chkIfelse', if the fourth function argument names is not "NA.", 
the argument name is changed, but the function body still uses the old name. 
That makes error in chkIfelse(ifelseHW) .
A fix:
        if(names(formals(FUN))[[4]] != "NA.") {
            body(FUN) <- do.call(substitute, list(body(FUN),
                setNames(list(quote(NA.)), names(formals(FUN))[[4]])))
            names(formals(FUN))[[4]] <- "NA."
        }
After fixing, chkIfelse(ifelseHW) just fails at identical(iflt, 
as.POSIXlt(ifct)) .
'iflt' has NA as 'tzone' and 'isdst' components.
* Because function 'chkIfelse' continues checking after failure,
as.POSIXlt(ifct)
may give error. The error happens, for example, in chkIfelse(ifelseR) . Maybe 
place it inside try(...).
* If 'lt' is a "POSIXlt" object, (lt-100) is a "POSIXct" object.
So,
FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
is an example of mixed class.
* The part of function 'chkIfelse' in
for(i in seq_len(nFact))
uses 'NA.' function argument. That makes error when 'chkIfelse' is applied to 
function without fourth argument.
The part should be wrapped in
if(has.4th) .
* Function 'ifelseJH' has fourth argument, but the argument is not for value if 
NA. So, instead of
chkIfelse(ifelseJH) ,
maybe call
chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .

A concrete version of 'ifelse2' that starts the result from 'yes':
function(test, yes, no, NA. = NA) {
    if(!is.logical(test))
        test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    n <- length(test)
    ans <- rep(yes, length.out = n)
    ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !is.na(test)]
    ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
    ans
}

It requires 'rep' method that is compatible with subsetting. It also works with 
"POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an appropriate result if 
time zones are the same.
For coercion of 'test', there is no need of keeping attributes. So, it doesn't 
do
storage.mode(test) <- "logical"
and goes directly to 'as.logical'.
It relies on subassignment for silent coercions of
logical < integer < double < complex .
Unlike 'ifelse', it never skips any subassignment. So, phenomenon as in 
"example of different return modes" in ?ifelse doesn't happen.
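The silent promotion that subassignment performs can be seen directly in a small illustration:

```r
ans <- c(TRUE, FALSE, TRUE)   # logical
ans[2L] <- 7L                 # subassignment promotes to integer
typeof(ans)                   # "integer"
ans[3L] <- 2.5                # and then to double
typeof(ans)                   # "double"
```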

Another version, for keeping attributes as pointed out by Duncan Murdoch:
function(test, yes, no, NA. = NA) {
    if(!is.logical(test))
        test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    n <- length(test)
    n.yes <- length(yes); n.no <- length(no)
    if (n.yes != n) {
        if (n.no == n) {  # swap yes <-> no
            test <- !test
            ans <- yes; yes <- no; no <- ans
            n.no <- n.yes
        } else yes <- yes[rep_len(seq_len(n.yes), n)]
    }
    ans <- yes
    if (n.no == 1L)
        ans[!test] <- no
    else
        ans[!test & !is.na(test)] <- no[
            if (n.no == n) !test & !is.na(test)
            else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
    stopifnot(length(NA.) == 1L)
    ans[is.na(test)] <- NA.
    ans
}

Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
First, it chooses the first of 'yes' and 'no' that has the same length as the 
result. If none of 'yes' and 'no' matches the length of the result, it chooses 
recycled (or truncated) 'yes'.
It uses 'rep' on the index and subsetting as a substitute for 'rep' on the 
value.
It requires 'length' method that is compatible with subsetting.
Additionally, it uses the same idea as dplyr::if_else, or more precisely the 
helper function 'replace_with'. It doesn't use 'rep' if the length of 'no' is 1 
or is the same as the length of the result. For subassignment with value of 
length 1, recycling happens by itself and NA in index is OK.
It limits 'NA.' to be of length 1, considering 'NA.' just as a label for NA.
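Both properties of subassignment with a length-1 value can be checked directly:

```r
x <- 1:4
## Length-1 value: recycled over the index, NA positions left untouched
x[c(TRUE, NA, FALSE, TRUE)] <- 0L
x                                           # c(0L, 2L, 3L, 0L)
## With a longer value the same NA index is an error:
## x[c(TRUE, NA, FALSE, TRUE)] <- c(0L, 9L) # "NAs are not allowed ..."
```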

Cases where the last version above or 'ifelse2 or 'ifelseHW' in ifelse-def.R 
gives inappropriate answers:
- 'yes' and 'no' are "difftime" objects with different "units" attribute
- 'yes' and 'no' are "POSIXlt" objects with di

[Rd] Potential integer overflow in 'do_mapply'

2016-11-14 Thread Suharto Anggono Suharto Anggono via R-devel
Function 'do_mapply' in mapply.c has the following fragment.
for (int i = 0; i < longest; i++) {

Variable 'longest' is declared as R_xlen_t. Its value can be larger than the 
maximum int.

In the fragment, when 'longest' is larger than the maximum int, when 'i' 
reaches the maximum int, i++ will lead to overflow.
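At the R level the analogous overflow is visible when an int-typed value passes its maximum (the fix in the C code would be to declare the loop variable as R_xlen_t, matching 'longest'):

```r
.Machine$integer.max      # 2147483647, the largest int
x <- .Machine$integer.max
x + 1L                    # NA, with an integer-overflow warning
```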

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] lapply on long vector fails

2016-10-29 Thread Suharto Anggono Suharto Anggono via R-devel
I report here that, in RStudio in Data Scientist Workbench,
lapply(raw(2^31), function(x) NULL)
failed after not so long time.

> res <- lapply(raw(2^31), function(x) NULL)
Error in FUN(X[[i]], ...) : long vectors not supported yet: memory.c:1652
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_US.UTF-8  
 [2] LC_NUMERIC=C  
 [3] LC_TIME=en_US.UTF-8   
 [4] LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8  
 [8] LC_NAME=C 
 [9] LC_ADDRESS=C  
[10] LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets 
[6] methods   base 

other attached packages:
[1] SparkR_1.6.1

loaded via a namespace (and not attached):
[1] tools_3.3.1


However, the code that implements 'lapply', function 'do_lapply' in apply.c, 
seems to support long vectors.

The error message points to memory.c:1652. I don't understand the code there.


A different case:
gc()
after
system.time(vector("list", 2^31))
gave an error with message pointing to memory.c at different line. Subsequent
gc()
didn't give error.

> system.time(vector("list", 2^31))
   user  system elapsed 
  3.104  15.608  18.711 
> gc()
Error in gc() : long vectors not supported yet: memory.c:1121
> gc()
            used (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    445496 23.8    7504004    40.1    5920003    31.7
Vcells    667725  5.1 2062440028 15735.2 2148153146 16389.2

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] 'max' on mixed numeric and character arguments

2016-10-08 Thread Suharto Anggono Suharto Anggono via R-devel
Bug 17160 (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17160) leads me 
to ask this. In R, should max(...) equals max(c(...)) ? If 'max' have both 
numeric and character arguments, should lexicographic sorting be used for all?

> max("", 3, 10)
[1] "3"
> max("", c(3, 10))
[1] "10"
> range("", c(3, 10))[2]
[1] "3"

Should all above have the same result?

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-26 Thread Suharto Anggono Suharto Anggono via R-devel
By "an argument named 'use.names' is included for concatenation", I meant 
something like this, that someone might try.

> c(as.Date("2016-01-01"), use.names=FALSE)
               use.names 
"2016-01-01" "1970-01-01" 

See, 'use.names' is in the output. That's precisely because 'c.Date' doesn't 
have 'use.names', so that 'use.names' is absorbed into '...'.

On Sun, 25/9/16, Martin Maechler  wrote:

 Subject: Re: [Rd] Undocumented 'use.names' argument to c()

 Cc: "R-devel" 
 Date: Sunday, 25 September, 2016, 10:14 PM
 
>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sun, 25 Sep 2016 14:12:10 + writes:

>> From comments in
>> 
http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653
>> : The code of c() and unlist() was formerly shared but
>> has been (long time passing) separated. From July 30,
>> 1998, is where do_c got split into do_c and do_unlist.
> With the implementation of 'c.Date' in R devel r71350, an
> argument named 'use.names' is included for
> concatenation. So, it doesn't follow the documented
> 'c'. But, 'c.Date' is not explicitly documented in
> Dates.Rd, that has 'c.Date' as an alias.

I do not see any  c.Date  in R-devel with a 'use.names'; it's a
base function, hence not hidden ..

As mentioned before, 'use.names' is used in unlist() in quite a
few places, and such an argument also exists for

lengths()and
all.equal.list()

and now c()

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-25 Thread Suharto Anggono Suharto Anggono via R-devel
ething we do not want.

(read on)

>>>> c
>>> function(..., recursive = F)
>>> .Internal(c(..., recursive = recursive), "S_unlist", TRUE, 1)
>>>> unlist
>>> function(data, recursive = T, use.names = T)
>>> .Internal(unlist(data, recursive = recursive, use.names = use.names),
>>> "S_unlist", TRUE, 2)
>>>> c(A=1,B=2,use.names=FALSE)
>>> A B use.names
>>> 1 2 0
>>> 
>>> The C code used sys_index==2 to mean 'the last  argument is the 
'use.names'
>>> argument, if sys_index==1 only the recursive argument was considered
>>> special.
>>> 
>>> Sys.funs.c:
>>> 405 S_unlist(vector *ent, vector *arglist, s_evaluator *S_evaluator)
>>> 406 {
>>> 407 int which = sys_index; boolean named, recursive, names;
>>> ...
>>> 419 args = arglist->value.tree; n = arglist->length;
>>> ...
>>> 424 names = which==2 ? logical_value(args[--n], ent, 
S_evaluator)
>>> : (which == 1);
>>> 
>>> Thus there is no historical reason for giving c() the use.names 
argument.
>>> 
>>> 
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>> 
>>> On Fri, Sep 23, 2016 at 9:37 AM, Suharto Anggono Suharto Anggono via
>>> R-devel  wrote:
>>> 
>>>> In S-PLUS 3.4 help on 'c' (http://www.uni-muenster.de/
>>>> ZIV.BennoSueselbeck/s-html/helpfiles/c.html), there is no 'use.names'
>>>> argument.
>>>> 
>>>> Because 'c' is a generic function, I don't think that changing formal
>>>> arguments is good.
>>>> 
>>>> In R devel r71344, 'use.names' is not an argument of functions 
'c.Date',
>>>> 'c.POSIXct' and 'c.difftime'.
You are right, Suharto, that methods for c() currently have no
such argument.

But again because c() is primitive and has a '...' at the
beginning, this does not explicitly hurt, currently, does it?

>>>> Could 'use.names' be documented to be accepted by the default method of
>>>> 'c', but not listed as a formal argument of 'c'?
>>>> Or, could the code that handles the argument name
>>>> 'use.names' be removed? 

In principle, of course both could happen, and if one of these
two was preferable to the current state, I'd tend to the first one:
Consider 'use.names [= FALSE]' just an argument of the default
method for c(),  so existing c() methods would not have a strong need
for updating.

Notably, as the S4 generic for c,
via lines 48-49 of src/library/methods/R/BasicFunsList.R

, "c" = structure(function(x, ..., recursive = FALSE) standardGeneric("c"),
  signature="x")

has never had 'recursive' as part of the signature..
(and yes, that line 48 does need an update too !!!).

Martin


>>>> 
>>>> >>>>> David Winsemius 
>>>> >>>>> on Tue, 20 Sep 2016 23:46:48 -0700 writes:
>>>> 
>>>> >> On Sep 20, 2016, at 7:18 PM, Karl Millar via R-devel  wrote:
>>>> >>
>>>> >> 'c' has an undocumented 'use.names' argument.  I'm not sure if this
>>>> is
>>>> >> a documentation or implementation bug.
>>>> 
>>>> > It came up on stackoverflow a couple of years ago:
>>>> 
>>>> > http://stackoverflow.com/questions/24815572/why-does-
>>>> function-c-accept-an-undocumented-argument/24815653#24815653
>>>> 
>>>> > At the time it appeared to me to be a documentation lag.
>>>> 
>>>> Thank you, Karl and David,
>>>> yes it is a documentation glitch ... and a bit more:  Experts know that
>>>> print()ing of primitive functions is, eehm, "special".
>>>> 
>>>> I've committed a change to R-devel ... (with the intent to port
>>>> to R-patched).
>>>> 
>>>> Martin
>>>> 
>>>> >>
>>>> >>> c(a = 1)
>>>> >> a
>>>> >> 1
>>>> >>> c(a = 1, use.names = F)
>>>> >> [1] 1
>>>> >>
>>>> >> Karl
>>>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-23 Thread Suharto Anggono Suharto Anggono via R-devel
In S-PLUS 3.4 help on 'c' 
(http://www.uni-muenster.de/ZIV.BennoSueselbeck/s-html/helpfiles/c.html), there 
is no 'use.names' argument.

Because 'c' is a generic function, I don't think that changing formal arguments 
is good.

In R devel r71344, 'use.names' is not an argument of functions 'c.Date', 
'c.POSIXct' and 'c.difftime'.

Could 'use.names' be documented to be accepted by the default method of 'c', 
but not listed as a formal argument of 'c'? Or, could the code that handles the 
argument name 'use.names' be removed?

> David Winsemius 
> on Tue, 20 Sep 2016 23:46:48 -0700 writes:

>> On Sep 20, 2016, at 7:18 PM, Karl Millar via R-devel  wrote:
>> 
>> 'c' has an undocumented 'use.names' argument.  I'm not sure if this is
>> a documentation or implementation bug.

> It came up on stackoverflow a couple of years ago:

> 
http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653#24815653

> At the time it appeared to me to be a documentation lag.

Thank you, Karl and David,
yes it is a documentation glitch ... and a bit more:  Experts know that
print()ing of primitive functions is, eehm, "special".

I've committed a change to R-devel ... (with the intent to port
to R-patched).

Martin

>> 
>>> c(a = 1)
>> a
>> 1
>>> c(a = 1, use.names = F)
>> [1] 1
>> 
>> Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R-intro: length of 'ifelse' result

2016-09-14 Thread Suharto Anggono Suharto Anggono via R-devel
In "An Introduction to R" Version 3.3.1, in "9.2.1 Conditional execution: if 
statements", the last paragraph is the following:
There is a vectorized version of the if/else construct, the ifelse function. 
This has the form ifelse(condition, a, b) and returns a vector of the length of 
its longest argument, with elements a[i] if condition[i] is true, otherwise 
b[i].


In fact, ifelse(condition, a, b) returns a vector of the length of 'condition', 
even if 'a' or 'b' is longer.
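A small example of the actual behaviour:

```r
condition <- c(TRUE, FALSE)
a <- 1:5
b <- 10:16
ifelse(condition, a, b)           # c(1L, 11L)
length(ifelse(condition, a, b))   # 2: the length of 'condition'
```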

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] table(exclude = NULL) always includes NA

2016-09-09 Thread Suharto Anggono Suharto Anggono via R-devel
Looking at the code of function 'table' in R devel r71227, I see that the part 
"remove NA level if it was added only for excluded in factor(a, exclude=.)" is 
not quite right.

In
is.na(a) <- match(a0, c(exclude,NA), nomatch=0L)   ,
I think that what is intended is
a[a0 %in% c(exclude,NA)] <- NA  .
So, it should be
is.na(a) <- match(a0, c(exclude,NA), nomatch=0L) > 0L
or
is.na(a) <- as.logical(match(a0, c(exclude,NA), nomatch=0L))  .
The parallel code
is.na(a) <- match(a0,   exclude, nomatch=0L)
is to be treated similarly.
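A small standalone sketch (the values are arbitrary, not taken from table()'s source) of why the conversion to a logical mask matters: `is.na(a) <- v` treats a numeric `v` as *positions* to set to NA, not as a per-element mask.

```r
a0 <- c("3", "2", "1")   # character form of the values, as inside table()
exclude <- "1"

# Buggy form: match() returns c(0, 0, 1); used as an index vector it
# marks element 1 ("3") as NA instead of element 3 ("1").
a_bad <- factor(a0)
is.na(a_bad) <- match(a0, c(exclude, NA), nomatch = 0L)

# Fixed form: convert the match result to a logical mask first.
a_good <- factor(a0)
is.na(a_good) <- match(a0, c(exclude, NA), nomatch = 0L) > 0L

as.character(a_bad)   # NA  "2" "1"  -- wrong element excluded
as.character(a_good)  # "3" "2" NA   -- "1" excluded, as intended
```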

Example that gives wrong result in R devel r71225:
table(3:1, exclude = 1)
table(3:1, exclude = 1, useNA = "always")

On Tue, 16/8/16, Martin Maechler  wrote:

 Subject: Re: [Rd] table(exclude = NULL) always includes NA

 Cc: "Martin Maechler" 
 Date: Tuesday, 16 August, 2016, 5:42 PM

> Martin Maechler 
> on Mon, 15 Aug 2016 12:35:41 +0200 writes:

> Martin Maechler 
> on Mon, 15 Aug 2016 11:07:43 +0200 writes:


> on Sun, 14 Aug 2016 03:42:08 + writes:

>>> useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) 
"ifany"
>>> An example where it change 'table' result for non-factor input, from 
https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :

>>> x <- c(1,2,3,3,NA)
>>> table(as.integer(x), exclude=NaN)

>>> I bring the example up, in case that the change in result is not 
intended.

>> Thanks a lot, Suharto.

>> To me, the example is convincing that the change (I commited
>> Friday), svn rev 71087 & 71088,   are a clear improvement:

>> (As you surely know, but not all the other readers:)
>> Before the change, the above example gave *different* results
>> for  'x'  and  'as.integer(x)', the integer case *not* counting the NAs,
>> whereas with the change in effect, they are the same:

>>> x <- as.integer(dx <- c(1,2,3,3,NA))
>>> table(x, exclude=NaN); table(dx, exclude=NaN)
>> x
>>    1    2    3 <NA> 
>>    1    1    2    1 
>> dx
>>    1    2    3 <NA> 
>>    1    1    2    1 
>>> 

>> --
>> But the change has affected 6-8 (of the 8000+) CRAN packages
>> which I am investigating now and probably will be in contact with the
>> package maintainers after that.

> There has been another bug in table(), since the time  'useNA'
> was introduced, which gives (in released R, R-patched, or R-devel):

>> table(1:3, exclude = 1, useNA = "ifany")

>    2    3 <NA> 
>    1    1    1 
>> 

> and that bug now (in R-devel, after my changes) triggers in
> cases it did not previously, notably in

> table(1:3, exclude = 1)

> which now does set 'useNA = "ifany"' and so gives the same silly
> result as the one above.

> The reason for this bug is that   addNA(..)  is called (in all R
> versions mentioned) in this case, but it should not.

> I'm currently testing yet another amendment..

which was not sufficient... so I had to do *much* more work.

The result is code which functions -- I hope -- uniformly better
than the current code, but unfortunately, code that is much longer.

After all I came to the conclusion that using addNA() was not
good enough [I did not yet consider *changing* addNA() itself,
even though the only place we use it in R's own packages is
inside table()] and so for now have code in table() that does
the equivalent of addNA() *but* does remember if addNA() did add
an NA level or not.
I also have extended the regression tests considerably,
*and*  example(table)  now reverts to give identical output to
R 3.3.1 (which it did no longer in R-devel (r 71088)).

I'm still investigating the CRAN package fallout (from the above
change 4 days ago) but plan to commit my (unfortunately
somewhat extensive) changes.

Also, I think this will become the first in this year's R-devel

SIGNIFICANT USER-VISIBLE CHANGES:

  • ‘table()’ has been amended to be more internally consistent
and become back compatible to R <= 2.7.2 again.
Consequently, ‘table(1:2, exclude=NULL)’ no longer contains
a zero count for ‘<NA>’, but ‘useNA = "always"’ continues to
do so.


--
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] R-intro: function 'stderr' and 'sd'

2016-09-09 Thread Suharto Anggono Suharto Anggono via R-devel
In "An Introduction to R" Version 3.3.1, in "4.2 The function tapply() and 
ragged arrays", after
stderr <- function(x) sqrt(var(x)/length(x))  ,
there is a note in brackets:
Writing functions will be considered later in [Writing your own functions], and 
in this case was unnecessary as R also has a builtin function sd().

The part "in this case was unnecessary as R also has a builtin function sd()" 
is misleading. The builtin function sd() doesn't calculate standard error of 
the mean. It calculates standard deviation. The function 'stderr' can use 'sd':
stderr <- function(x) sd(x)/sqrt(length(x))
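A quick check (data chosen arbitrarily) that the two formulations agree, and that sd() alone is a different quantity:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

stderr_var <- function(x) sqrt(var(x) / length(x))  # form used in R-intro
stderr_sd  <- function(x) sd(x) / sqrt(length(x))   # same thing via sd()

all.equal(stderr_var(x), stderr_sd(x))  # TRUE
# sd(x) itself is the standard deviation, sqrt(length(x)) times larger:
all.equal(sd(x), stderr_var(x) * sqrt(length(x)))   # TRUE
```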

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Coercion of 'exclude' in function 'factor' (was 'droplevels' inappropriate change)

2016-09-02 Thread Suharto Anggono Suharto Anggono via R-devel
I am basically fine with the change.

How about using just the following?
if(!is.character(exclude))
exclude <- as.vector(exclude, typeof(x)) # may result in NA
x <- as.character(x)

It looks simpler and is, more or less, equivalent.

In factor.Rd, in description of argument 'exclude', "(when \code{x} is a 
\code{factor} already)" can be removed.


A larger change that, I think, is reasonable is entirely removing the code
exclude <- as.vector(exclude, typeof(x)) # may result in NA

The explicit coercion of 'exclude' is not necessary. Function 'factor' works 
without it.

The coercion of 'exclude' may lead to a surprise because it "may result in 
NA". Example from https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
factor(as.integer(c(1,2,3,3,NA)), exclude=NaN)
excludes NA.

As a bonus, without the coercion of 'exclude', 'exclude' can be a factor if 'x' 
is a factor. This part of an example in 
https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html works.
cc <- c("x","y","NA")
ff <- factor(cc)
factor(ff,exclude=ff[3])

However, the coercion of 'exclude' has been in function 'factor' in R "forever".

On Wed, 31/8/16, Martin Maechler  wrote:

 Subject: Re: [Rd] 'droplevels' inappropriate change

 Cc: "Martin Maechler" 
 Date: Wednesday, 31 August, 2016, 2:51 PM
 
>>>>> Martin Maechler 
>>>>> on Sat, 27 Aug 2016 18:55:37 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sat, 27 Aug 2016 03:17:32 + writes:

>> In R devel r71157, 'droplevels' documentation, in "Arguments" section, 
says this about argument 'exclude'.
>> passed to factor(); factor levels which should be excluded from the 
result even if present.  Note that this was implicitly NA in R <= 3.3.1 which 
did drop NA levels even when present in x, contrary to the documentation.  The 
current default is compatible with x[ , drop=FALSE].

>> The part
>> x[ , drop=FALSE]
>> should be
>> x[ , drop=TRUE]

[[elided Yahoo spam]]
> a "typo" by me. .. fixed now.

>> Saying that 'exclude' is factor levels is not quite true for NA element. 
NA may be not an original level, but NA in 'exclude' affects the result.

>> For a factor 'x', factor(x, exclude = exclude) doesn't really work for 
excluding in general. See, for example, 
https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
>> factor(factor(c("a","b","c")), exclude="c")

>> However, this excludes "2":
>> factor(factor(2:3), exclude=2)

>> Rather unexpectedly, this excludes NA:
>> factor(factor(c("a",NA), exclude=NULL), exclude="c")

>> For a factor 'x', factor(x, exclude = exclude) can only exclude 
integer-like or NA levels. An explanation is in 
https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .

> Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5
> years ago) is confirming the problem there,  and suggesting (as
> you, right?) that actually   `factor()` is not behaving
> correctly here.

> And your persistence is finally getting close to convince me
> that it is not just droplevels(), but  factor() itself which
> needs care here.

> Interestingly, the following patch *does* pass 'make check-all'
> (after small change in tests/reg-tests-1b.R which is ok),
> and leads to behavior which is much closer to the documentation,
> notably for your two examples above would give what one would
> expect.

> (( If the R-Hub would support experiments with branches of R-devel 
> from R-core members,  I could just create such a branch and R Hub
> would run 'R CMD check '  for thousands of CRAN packages
> and provide a web page with the *differences* in the package
> check results ... so we could see ... ))

> I do agree that we should strongly consider such a change.

as nobody has commented, I've been liberal and have taken these
no comments as consent.

Hence I have committed


r71178 | maechler | 2016-08-31 09:45:40 +0200 (Wed, 31 Aug 2016) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/library/base/R/factor.R
   M /trunk/src/library/base/man/factor.Rd
   M /trunk/tests/reg-tests-1b.R
   M /trunk/tests/reg-tests-1c.R

factor(x, exclude) more "rational" when x or exclude are 

Re: [Rd] 'droplevels' inappropriate change

2016-08-26 Thread Suharto Anggono Suharto Anggono via R-devel
In R devel r71157, 'droplevels' documentation, in "Arguments" section, says 
this about argument 'exclude'.
passed to factor(); factor levels which should be excluded from the result even 
if present.  Note that this was implicitly NA in R <= 3.3.1 which did drop NA 
levels even when present in x, contrary to the documentation.  The current 
default is compatible with x[ , drop=FALSE].

The part
x[ , drop=FALSE]
should be
x[ , drop=TRUE]

Saying that 'exclude' is factor levels is not quite true for the NA element. NA 
may not be an original level, but NA in 'exclude' affects the result.

For a factor 'x', factor(x, exclude = exclude) doesn't really work for 
excluding in general. See, for example, 
https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html .
factor(factor(c("a","b","c")), exclude="c")

However, this excludes "2":
factor(factor(2:3), exclude=2)

Rather unexpectedly, this excludes NA:
factor(factor(c("a",NA), exclude=NULL), exclude="c")

For a factor 'x', factor(x, exclude = exclude) can only exclude integer-like or 
NA levels. An explanation is in 
https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] 'droplevels' inappropriate change

2016-08-21 Thread Suharto Anggono Suharto Anggono via R-devel
In R devel r71124, if 'x' is a factor, droplevels(x) gives
factor(x, exclude = NULL) .
In R 3.3.1, it gives
factor(x) .

If a factor 'x' has NA values and the levels of 'x' don't contain NA, factor(x) 
gives the expected result for droplevels(x), but factor(x, exclude = NULL) doesn't. 
As I said in https://stat.ethz.ch/pipermail/r-devel/2016-May/072796.html , 
factor(x, exclude = NULL) adds NA as a level.

Using
factor(x, exclude = if(anyNA(levels(x))) NULL else NA ) ,
like in the code of function `[.factor` (in the same file, factor.R, as 
'droplevels'), is better.
It is possible just to use
x[, drop = TRUE] .

For a factor 'x' that has an NA level and also NA values, factor(x, exclude = NULL) 
is not perfect, though. It changes the NA values to be associated with the NA factor level.
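A sketch of the difference described above (a small made-up factor; behaviour as discussed in this thread):

```r
x <- factor(c("a", "b", NA))   # NA value; NA is not among the levels
levels(x)                       # "a" "b"

# factor(x): NA stays a value, not a level -- what droplevels() wants
levels(factor(x))               # "a" "b"

# factor(x, exclude = NULL): excludes nothing, so NA also becomes a level
levels(factor(x, exclude = NULL))   # "a" "b" NA
```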

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] table(exclude = NULL) always includes NA

2016-08-16 Thread Suharto Anggono Suharto Anggono via R-devel
The quirk as in table(1:3, exclude = 1, useNA = "ifany") is actually somewhat 
documented, and still in R devel r71104. In R help on 'table', in "Details" 
section:
It is best to supply factors rather than rely on coercion.  In particular, 
‘exclude’ will be used in coercion to a factor, and so values (not levels) 
which appear in ‘exclude’ before coercion will be mapped to ‘NA’ rather than be 
discarded.

Another part, above it:
‘useNA’ controls if the table includes counts of ‘NA’ values:  Note that 
levels specified in ‘exclude’ are mapped to ‘NA’ and so included in ‘NA’ counts.

The last statement is actually not true for an argument that is already a 
factor.

On Tue, 16/8/16, Martin Maechler  wrote:

 Subject: Re: [Rd] table(exclude = NULL) always includes NA

 Cc: "Martin Maechler" 
 Date: Tuesday, 16 August, 2016, 5:42 PM

> Martin Maechler 
> on Mon, 15 Aug 2016 12:35:41 +0200 writes:

> Martin Maechler 
> on Mon, 15 Aug 2016 11:07:43 +0200 writes:


> on Sun, 14 Aug 2016 03:42:08 + writes:

>>> useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) 
"ifany"
>>> An example where it change 'table' result for non-factor input, from 
https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :

>>> x <- c(1,2,3,3,NA)
>>> table(as.integer(x), exclude=NaN)

>>> I bring the example up, in case that the change in result is not 
intended.

>> Thanks a lot, Suharto.

>> To me, the example is convincing that the change (I commited
>> Friday), svn rev 71087 & 71088,   are a clear improvement:

>> (As you surely know, but not all the other readers:)
>> Before the change, the above example gave *different* results
>> for  'x'  and  'as.integer(x)', the integer case *not* counting the NAs,
>> whereas with the change in effect, they are the same:

>>> x <- as.integer(dx <- c(1,2,3,3,NA))
>>> table(x, exclude=NaN); table(dx, exclude=NaN)
>> x
>>    1    2    3 <NA> 
>>    1    1    2    1 
>> dx
>>    1    2    3 <NA> 
>>    1    1    2    1 
>>> 

>> --
>> But the change has affected 6-8 (of the 8000+) CRAN packages
>> which I am investigating now and probably will be in contact with the
>> package maintainers after that.

> There has been another bug in table(), since the time  'useNA'
> was introduced, which gives (in released R, R-patched, or R-devel):

>> table(1:3, exclude = 1, useNA = "ifany")

>    2    3 <NA> 
>    1    1    1 
>> 

> and that bug now (in R-devel, after my changes) triggers in
> cases it did not previously, notably in

> table(1:3, exclude = 1)

> which now does set 'useNA = "ifany"' and so gives the same silly
> result as the one above.

> The reason for this bug is that   addNA(..)  is called (in all R
> versions mentioned) in this case, but it should not.

> I'm currently testing yet another amendment..

which was not sufficient... so I had to do *much* more work.

The result is code which functions -- I hope -- uniformly better
than the current code, but unfortunately, code that is much longer.

After all I came to the conclusion that using addNA() was not
good enough [I did not yet consider *changing* addNA() itself,
even though the only place we use it in R's own packages is
inside table()] and so for now have code in table() that does
the equivalent of addNA() *but* does remember if addNA() did add
an NA level or not.
I also have extended the regression tests considerably,
*and*  example(table)  now reverts to give identical output to
R 3.3.1 (which it did no longer in R-devel (r 71088)).

I'm still investigating the CRAN package fallout (from the above
change 4 days ago) but plan to commit my (unfortunately
somewhat extensive) changes.

Also, I think this will become the first in this year's R-devel

SIGNIFICANT USER-VISIBLE CHANGES:

  • ‘table()’ has been amended to be more internally consistent
and become back compatible to R <= 2.7.2 again.
Consequently, ‘table(1:2, exclude=NULL)’ no longer contains
a zero count for ‘<NA>’, but ‘useNA = "always"’ continues to
do so.


--
Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] table(exclude = NULL) always includes NA

2016-08-13 Thread Suharto Anggono Suharto Anggono via R-devel
useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) "ifany"

An example where it changes the 'table' result for non-factor input, from 
https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
x <- c(1,2,3,3,NA)
table(as.integer(x), exclude=NaN)

I bring the example up, in case that the change in result is not intended.

On Sat, 13/8/16, Martin Maechler  wrote:

 Subject: Re: [Rd] table(exclude = NULL) always includes NA
 To: "Martin Maechler" 

@r-project.org
 Date: Saturday, 13 August, 2016, 4:29 AM

>>>>> Martin Maechler 
>>>>> on Fri, 12 Aug 2016 10:12:01 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Thu, 11 Aug 2016 16:19:49 + writes:

>> I stand corrected. The part "If set to 'NULL', it implies
>> 'useNA="always"'." is even in the documentation in R
>> 2.8.0. It was my fault not to check carefully.  I wonder,
>> why "always" was chosen for 'useNA' for exclude = NULL.

> me too.  "ifany" would seem more logical, and I am
> considering changing to that as a 2nd step (if the 1st
> step, below) shows to be feasible.

>> Why exclude = NULL is so special? What about another
>> 'exclude' of length zero, like character(0) (not c(),
>> because c() is NULL)? I thought that, too. But then, I
>> have no opinion about making it general.

> As mentioned, I entirely agree with that {and you are
> right about c() !!}.

>> It fits my expectation to override 'useNA' only if the
>> user doesn't explicitly specify 'useNA'.

>> Thank you for looking into this.

> you are welcome.  As first step, I plan to commit the
> change to (*)

>  useNA <- if (missing(useNA) && !missing(exclude) && !(NA
> %in% exclude)) "always"

> as proposed yesterday, and I'll eventually see / be
> notified about the effect in CRAN space.

and as I'm finding now, 20 minutes too late, doing step 1
without doing step 2 is not feasible.
It gives many 0 counts, e.g., for exclude = "foo".



> --
> (*) slightly more efficiently, I'll be using match()
> directly instead of %in%

>> My points: Could R 2.7.2 behavior of table(,
>> exclude = NULL) be brought back? But R 3.3.1 behavior is
>> in R since version 2.8.0, rather long.

> you are right... but then, the places / cases where the
> behavior would change back should be quite rare.

>> If not, I suggest changing summary().
>> ----

> Thank you for your feedback, Suharto!  Martin

>> On Thu, 11/8/16, Martin Maechler
>>  wrote:
>> 
>> Subject: Re: [Rd] table(exclude = NULL) always includes
>> NA
>> 
>> @r-project.org Cc: "Martin Maechler"
>>  Date: Thursday, 11 August,
>> 2016, 12:39 AM
>> 
>> >>>>> Martin Maechler  >>>>>
>> on Tue, 9 Aug 2016 15:35:41 +0200 writes:
>> 
>> >>>>> Suharto Anggono Suharto Anggono via R-devel
>>  >>>>> on Sun, 7 Aug 2016 15:32:19
>> + writes:
>> 
>> > > This is an example from
>> https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html
>> .
>> > 
>> > > With R 2.7.2:
>> > 
 >> > > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
 >> > > > table(a, b, exclude = NULL)
 >> > >       b
 >> > > a      1 2
 >> > >   1    1 1
 >> > >   2    2 0
 >> > >   3    1 0
 >> > >   <NA> 1 0
 >> > 
 >> > > With R 3.3.1:
 >> > 
 >> > > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
 >> > > > table(a, b, exclude = NULL)
 >> > >       b
 >> > > a      1 2 <NA>
 >> > >   1    1 1    0
 >> > >   2    2 0    0
 >> > >   3    1 0    0
 >> > >   <NA> 1 0    0
 >> > > > table(a, b, useNA = "ifany")
 >> > >       b
 >> > > a      1 2
 >> > >   1    1 1
 >> > >   2    2 0
 >> > >   3    1 0
 >> > >   <NA> 1 0
 >> > > > table(a, b, exclude = NULL, useNA = "ifany")
 >> > >       b
 >> > > a      1 2 <NA>
 >> > >   1    1 1    0
 >> > >   2    2 0    0
 >> > >   3    1 0    0
 >> > >   <NA> 1 0    0
 >> > 
 >> > > For the example, in R 3.3.1, the result of 'table' with
 >> > > exclude = NULL includes NA even if

Re: [Rd] table(exclude = NULL) always includes NA

2016-08-11 Thread Suharto Anggono Suharto Anggono via R-devel
I stand corrected. The part "If set to 'NULL', it implies 'useNA="always"'." is 
even in the documentation in R 2.8.0. It was my fault not to check carefully.

I wonder, why "always" was chosen for 'useNA' for exclude = NULL.

Why exclude = NULL is so special? What about another 'exclude' of length zero, 
like character(0) (not c(), because c() is NULL)? I thought that, too. But 
then, I have no opinion about making it general.

It fits my expectation to override 'useNA' only if the user doesn't explicitly 
specify 'useNA'.

Thank you for looking into this.

My points:
Could R 2.7.2 behavior of table(, exclude = NULL) be brought back? 
But R 3.3.1 behavior is in R since version 2.8.0, rather long.
If not, I suggest changing summary().

On Thu, 11/8/16, Martin Maechler  wrote:

 Subject: Re: [Rd] table(exclude = NULL) always includes NA

@r-project.org
 Cc: "Martin Maechler" 
 Date: Thursday, 11 August, 2016, 12:39 AM

>>>>> Martin Maechler 
>>>>> on Tue, 9 Aug 2016 15:35:41 +0200 writes:

>>>>> Suharto Anggono Suharto Anggono via R-devel 
>>>>> on Sun, 7 Aug 2016 15:32:19 + writes:

> > This is an example from 
> > https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html .
> 
> > With R 2.7.2:
> 
> > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> > > table(a, b, exclude = NULL)
> >       b
> > a      1 2
> >   1    1 1
> >   2    2 0
> >   3    1 0
> >   <NA> 1 0
> 
> > With R 3.3.1:
> 
> > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> > > table(a, b, exclude = NULL)
> >       b
> > a      1 2 <NA>
> >   1    1 1    0
> >   2    2 0    0
> >   3    1 0    0
> >   <NA> 1 0    0
> > > table(a, b, useNA = "ifany")
> >       b
> > a      1 2
> >   1    1 1
> >   2    2 0
> >   3    1 0
> >   <NA> 1 0
> > > table(a, b, exclude = NULL, useNA = "ifany")
> >       b
> > a      1 2 <NA>
> >   1    1 1    0
> >   2    2 0    0
> >   3    1 0    0
> >   <NA> 1 0    0
> 
> > For the example, in R 3.3.1, the result of 'table' with
> > exclude = NULL includes NA even if NA is not present. It is
> > different from R 2.7.2, that comes from factor(exclude = NULL), 
> > that includes NA only if NA is present.
> 
> I agree that this (R 3.3.1 behavior) seems undesirable and looks
> wrong, and the old (<= 2.7.2) behavior for  table(a,b,
> exclude=NULL) seems desirable to me.
> 
> 
> > From R 3.3.1 help on 'table', in "Details" section:
> > 'useNA' controls if the table includes counts of 'NA' values: the allowed 
> > values correspond to never, only if the count is positive and even for zero 
> > counts.  This is overridden by specifying 'exclude = NULL'.
> 
> > Specifying 'exclude = NULL' overrides 'useNA' to what value? The 
> > documentation doesn't say. Looking at the code of function 'table', the 
> > value is "always".
> 
> Yes, it should be documented what happens for this case,
> (but read on ...)

and it is *not* true that the documentation does not say, since
2013, it has contained

exclude: levels to remove for all factors in ‘...’.  If set to ‘NULL’,
  it implies ‘useNA = "always"’.  See ‘Details’ for its
  interpretation for non-factor arguments.


> > For the example, in R 3.3.1, the result like in R 2.7.2 can be obtained 
> > with useNA = "ifany" and 'exclude' unspecified.
> 
> Yes.  What should we do?
> I currently think that we'd want to change the line
> 
>  useNA <- if (!missing(exclude) && is.null(exclude)) "always"
> 
> to
> 
>  useNA <- if (!missing(exclude) && is.null(exclude)) "ifany" # was 
> "always"
> 
> 
> which would not even contradict documentation, as indeed you
> mentioned above, the exact action here had not been documented.

The last part ("which ..") above is wrong, as mentioned earlier.

The above change entails behaviour which looks better to me;
however, the change *is* "against the current documentation".
and after experimentation (a "complete factorial design" of
argument settings), I'm not entirely happy with the result and one reason
is that   'exclude = NULL'  and  (e.g.)   'exclude = c()'
are (still) handled differently: From a usual interpreation,
both should mean 

[Rd] table(exclude = NULL) always includes NA

2016-08-07 Thread Suharto Anggono Suharto Anggono via R-devel
This is an example from 
https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html .

With R 2.7.2:

> a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> table(a, b, exclude = NULL)
      b
a      1 2
  1    1 1
  2    2 0
  3    1 0
  <NA> 1 0

With R 3.3.1:

> a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> table(a, b, exclude = NULL)
      b
a      1 2 <NA>
  1    1 1    0
  2    2 0    0
  3    1 0    0
  <NA> 1 0    0
> table(a, b, useNA = "ifany")
      b
a      1 2
  1    1 1
  2    2 0
  3    1 0
  <NA> 1 0
> table(a, b, exclude = NULL, useNA = "ifany")
      b
a      1 2 <NA>
  1    1 1    0
  2    2 0    0
  3    1 0    0
  <NA> 1 0    0

For the example, in R 3.3.1, the result of 'table' with exclude = NULL includes 
NA even if NA is not present. It is different from R 2.7.2, where the result comes 
from factor(exclude = NULL), which includes NA only if NA is present.

From R 3.3.1 help on 'table', in "Details" section:
'useNA' controls if the table includes counts of 'NA' values: the allowed 
values correspond to never, only if the count is positive and even for zero 
counts.  This is overridden by specifying 'exclude = NULL'.

Specifying 'exclude = NULL' overrides 'useNA' to what value? The documentation 
doesn't say. Looking at the code of function 'table', the value is "always".

For the example, in R 3.3.1, the result like in R 2.7.2 can be obtained with 
useNA = "ifany" and 'exclude' unspecified.
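In code, the workaround just mentioned (same 'a' and 'b' as in the transcripts above):

```r
a <- c(1, 1, 2, 2, NA, 3)
b <- c(2, 1, 1, 1, 1, 1)

# useNA = "ifany" with 'exclude' left at its default counts an NA
# row/column only when NAs actually occur, matching the R 2.7.2
# exclude = NULL output: here 'a' gets an <NA> row, 'b' does not.
table(a, b, useNA = "ifany")
```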


The result of 'summary' of a logical vector is affected. As mentioned in 
http://stackoverflow.com/questions/26775501/r-dropping-nas-in-logical-column-levels
 , in the code of function 'summary.default', for logical, table(object, 
exclude = NULL) is used.

With R 2.7.2:

> log <- c(NA, logical(4), NA, !logical(2), NA)
> summary(log)
   Mode   FALSE    TRUE    NA's 
logical       4       2       3 
> summary(log[!is.na(log)])
   Mode   FALSE    TRUE 
logical       4       2 
> summary(TRUE)
   Mode    TRUE 
logical       1 

With R 3.3.1:

> log <- c(NA, logical(4), NA, !logical(2), NA)
> summary(log)
   Mode   FALSE    TRUE    NA's 
logical       4       2       3 
> summary(log[!is.na(log)])
   Mode   FALSE    TRUE    NA's 
logical       4       2       0 
> summary(TRUE)
   Mode    TRUE    NA's 
logical       1       0 

In R 3.3.1, "NA's" is always in the result of 'summary' of a logical vector. It 
is unlike 'summary' of a numeric vector.
On the other hand, in R 3.3.1, FALSE is not in the result of 'summary' of a 
logical vector that doesn't contain FALSE.

I prefer the result of 'summary' of a logical vector like in R 2.7.2, or, 
alternatively, the result that always includes all possible values: FALSE, 
TRUE, NA.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] complex NA's match(), etc: not back-compatible change proposal

2016-06-03 Thread Suharto Anggono Suharto Anggono via R-devel
'1 A' above ?

 Can "the" new behavior easily be described in words (if '1 A'
 above is already assumed)?

 At the moment, I would not tackle Problem 2.
 It would become less problematic once Problem 1 is solved
 according to '1 A', because at least length(unique(.)) would
 not change: It would contain *one* z[] with an NA, and all the
 other z[]s.

 Opinions ?  Thank you in advance for chiming in..

 Martin Maechler,
 ETH Zurich

     > On Mon, 23/5/16, Martin Maechler 
 wrote:

     > Subject: Re: [Rd] complex NA's match(),
 etc: not back-compatible change proposal

     > Cc: R-devel@r-project.org
     > Date: Monday, 23 May, 2016, 11:06 PM

     >>>>> Suharto Anggono Suharto Anggono via R-devel
     >>>>>     on Fri, 13 May 2016 16:33:05 + writes:

     > That, for example, complex(real=NaN) and complex(imaginary=NaN)
     > are regarded as equal makes it possible that
     > length(unique(as.character(x))) > length(unique(x))
     > (current code of function 'factor' doesn't expect it).

     Thank you, that is an interesting remark - but is already true, in
     [[elided Yahoo spam]]

     .. and of course this is because we do *print* 0+NaNi etc,
     i.e., we differentiate the non-NA-but-NaN complex values in
     formatting / printing but not in match(), unique() ...

     and indeed, with the 'z' example below,
         fz <- factor(z,z)
     gives a warnings about duplicated levels and gives such warnings
     also in current (and previous) versions of R, at least for the
     slightly larger z I've used in the tests/reg-tests-1c.R example.

     For the moment I can live with that warning, as I don't think
     factor()s are constructed from complex numbers "often"...
     and the performance of factor() in the more regular cases is
     important.

     > Yes, an argument for the behavior is that NA and NaN are of one kind.
     > On my system, using 32-bit R for Windows from binary from CRAN,
     > the result of sapply(z, match, table = z) (not in current
     > R-devel) may be different from below:
     > 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.10.1, different from below
     > 1 2 3 4 1 3 7 8 2 4 8 12  # R 3.2.5, different from below

     interesting, thank you... and another reason why the change
     (currently only in R-devel) may have been a good one: More
     uniformity.

     > I noticed that, by function 'cequal' in unique.c, a complex
     > number that has both NA and NaN matches NA and also matches NaN.

     > > x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
     > > (z <- z[is.na(z)])
     >  [1]       NA NaN+  0i       NA NaN+  1i       NA       NA       NA       NA
     >  [9]   0+NaNi   1+NaNi       NA NaN+NaNi

     > > sapply(z, match, table = z[8])
     > [1] 1 1 1 1 1 1 1 1 1 1 1 1
     > > match(z, z[8])
     > [1] 1 1 1 1 1 1 1 1 1 1 1 1

     Yes, I see the same. But isn't it what we expect:

     All of our z[] entries has at least one NA or a NaN in its real
     or imaginary, and since z[8] has both, it does match with all
     z[]'s either because of the NA or because of the NaN in common.

     Hence, currently, I don't think this needs to be changed...
     but if there are other reasons / arguments ...

     Thank you again,
     Martin Maechler

     > > sessionInfo()
     > R Under development (unstable) (2016-05-12 r70604)
     > Platform: i386-w64-mingw32/i386 (32-bit)
     > Running under: Windows XP (build 2600) Service Pack 2

     > locale:
     > [1] LC_COLLATE=English_United States.1252
     > [2] LC_CTYPE=English_United States.1252
     > [3] LC_MONETARY=English_United States.1252
     > [4] LC_NUMERIC=C
     > [5] LC_TIME=English_United States.1252

     > attached base packages:
     > [1] stats graphics  grDevices utils
[Rd] factor(x, exclude=NULL) for factor x; names in as.factor()

2016-05-30 Thread Suharto Anggono Suharto Anggono via R-devel
In R 3.3.0 (also in R 2.7.2), the documentation on 'factor', in "Details" 
section, has this statement.
'factor(x, exclude = NULL)' applied to a factor is a no-operation unless there 
are unused levels: in that case, a factor with the reduced level set is 
returned.

It is not true for a factor 'x' that has NA values. In that case, if the levels of 
'x' don't contain NA, factor(x, exclude = NULL) adds NA as a level.
If the levels of a factor 'x' don't contain NA, factor(x) is a no-operation if 
all levels are used.


In R 3.3.0 (also in R 3.1.3), for a named integer 'x', factor(x) has names and 
as.factor(x) doesn't. It would be better if the behavior on names were matched.

> x <- integer(1)
> names(x) <- "a"
> names(factor(x))
[1] "a"
> names(as.factor(x))
NULL
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] complex NA's match(), etc: not back-compatible change proposal

2016-05-28 Thread Suharto Anggono Suharto Anggono via R-devel
On 'factor', I meant the case where 'levels' is not specified, where 'unique' 
is called.

> factor(c(complex(real=NaN), complex(imaginary=NaN)))
[1] NaN+0i <NA>  
Levels: NaN+0i

Look at <NA> in the result above. Yes, it happens in earlier versions of R, too.

On matching both NA and NaN, another consequence is that length(unique(.)) may 
depend on order. Example using R devel r70604:

> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> (z <- z[is.na(z)])
 [1]   NA NaN+  0i   NA NaN+  1i   NA   NA   NA   NA
 [9]   0+NaNi   1+NaNi   NA NaN+NaNi
> length(print(unique(z)))
[1] NA NaN+0i
[1] 2
> length(print(unique(c(z[8], z[-8]))))
[1] NA
[1] 1

On Mon, 23/5/16, Martin Maechler  wrote:

 Subject: Re: [Rd] complex NA's match(), etc: not back-compatible change 
proposal

 Cc: R-devel@r-project.org
 Date: Monday, 23 May, 2016, 11:06 PM

 >>>>> Suharto Anggono Suharto Anggono via R-devel
 >>>>>     on Fri, 13 May 2016 16:33:05 + writes:

     > That, for example, complex(real=NaN) and complex(imaginary=NaN)
     > are regarded as equal makes it possible that
     > length(unique(as.character(x))) > length(unique(x))
     > (current code of function 'factor' doesn't expect it).

 Thank you, that is an interesting remark - but is already true, in
[[elided Yahoo spam]]
 ..
 and of course this is because we do *print*  0+NaNi  etc,
 i.e., we differentiate the non-NA-but-NaN complex values in
 formatting / printing but not in match(), unique() ...

 and indeed, with the 'z' example below,

 fz <- factor(z, z)

 gives warnings about duplicated levels and gives such warnings
 also in current (and previous) versions of R, at least for the
 slightly larger z I've used in the tests/reg-tests-1c.R example.

 For the moment I can live with that warning, as I don't think
 factor()s are constructed from complex numbers "often"...
 and the performance of factor() in the more regular cases is
 important.

 > Yes, an argument for the behavior is that NA and NaN are of one kind.
 > On my system, using 32-bit R for Windows from binary from CRAN,
 > the result of sapply(z, match, table = z) (not in current R-devel)
 > may be different from below:
 > 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.10.1, different from below
 > 1 2 3 4 1 3 7 8 2 4 8 12  # R 3.2.5, different from below

 interesting, thank you... and another reason why the change
 (currently only in R-devel) may have been a good one: More uniformity.

     > I noticed that, by function 'cequal' in unique.c, a complex
     > number that has both NA and NaN matches NA and also matches NaN.

     >> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
     >> (z <- z[is.na(z)])
     > [1]   NA NaN+  0i   NA NaN+  1i   NA   NA   NA   NA
     > [9]   0+NaNi   1+NaNi   NA NaN+NaNi

     >> sapply(z, match, table = z[8])
     > [1] 1 1 1 1 1 1 1 1 1 1 1 1
     >> match(z, z[8])
     > [1] 1 1 1 1 1 1 1 1 1 1 1 1

 Yes, I see the same. But isn't it what we expect:

 All of our z[] entries have at least one NA or NaN in their real
 or imaginary part, and since z[8] has both, it matches all of
 z[], either because of the NA or because of the NaN in common.

 Hence, currently, I don't think this needs to be changed...
 but if there are other reasons / arguments ...

 Thank you again,
 Martin Maechler


     >> sessionInfo()
     > R Under development (unstable) (2016-05-12 r70604)
     > Platform: i386-w64-mingw32/i386 (32-bit)
     > Running under: Windows XP (build 2600) Service Pack 2

     > locale:
     > [1] LC_COLLATE=English_United States.1252
     > [2] LC_CTYPE=English_United States.1252
     > [3] LC_MONETARY=English_United States.1252
     > [4] LC_NUMERIC=C
     > [5] LC_TIME=English_United States.1252

     > attached base packages:
     > [1] stats graphics  grDevices utils datasets  methods   base

     > -
 >>>>> Martin Maechler
 >>>>>     on Tue, 10 May 2016 16:08:39 +0200 writes:

     >> This is an RFC / announcement related to the 2nd part of PR#16885
     >> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
     >> about complex NA's.

     >> The (somewhat rare) incompatibility in R's 3.3.0 match() behavior
     >> for the case of complex numbers with NA & NaN's {which has been
     >> fixed for R 3.3.0 patched in the mean time} triggered some more
     >> comprehensive "research".

     >> I found that we have had a long-standing inconsistency at least bet

Re: [Rd] complex NA's match(), etc: not back-compatible change proposal

2016-05-13 Thread Suharto Anggono Suharto Anggono via R-devel
That, for example, complex(real=NaN) and complex(imaginary=NaN) are regarded as 
equal makes it possible that length(unique(as.character(x))) > 
length(unique(x)) (current code of function 'factor' doesn't expect it). Yes, 
an argument for the behavior is that NA and NaN are of one kind.

On my system, using 32-bit R for Windows from binary from CRAN, the result of 
sapply(z, match, table = z) (not in current R-devel) may be different from 
below:
1 2 3 4 1 3 7 8 2 4 8 12  # R 2.10.1, different from below
1 2 3 4 1 3 7 8 2 4 8 12  # R 3.2.5, different from below

I noticed that, by function 'cequal' in unique.c, a complex number that has 
both NA and NaN matches NA and also matches NaN.

> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> (z <- z[is.na(z)])
 [1]   NA NaN+  0i   NA NaN+  1i   NA   NA   NA   NA
 [9]   0+NaNi   1+NaNi   NA NaN+NaNi
> sapply(z, match, table = z[8])
 [1] 1 1 1 1 1 1 1 1 1 1 1 1
> match(z, z[8])
 [1] 1 1 1 1 1 1 1 1 1 1 1 1
> sessionInfo()
R Under development (unstable) (2016-05-12 r70604)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows XP (build 2600) Service Pack 2

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

-
> Martin Maechler 
> on Tue, 10 May 2016 16:08:39 +0200 writes:

> This is an RFC / announcement related to the 2nd part of PR#16885
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
> about  complex NA's.

> The (somewhat rare) incompatibility in R's 3.3.0 match() behavior for the
> case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
> patched in the mean time} triggered some more comprehensive "research".

> I found that we have had a long-standing inconsistency at least between the
> documented and the real behavior.  I am claiming that the documented
> behavior is desirable and hence R's current "real" behavior is bogus, and
> I am proposing to change it, in R-devel (to be 3.4.0) for now.

After the  "roaring unanimous" assent  (one private msg
  encouraging me to go forward, no dissenting voice, hence an
  "odds ratio" of  +Inf  in favor ;-)

I have now committed my proposal to R-devel (svn rev. 70597) and
some of us will be seeing the effect in package space within a
day or so, in the CRAN checks against R-devel (not for
Bioconductor AFAIK; their checks use R-devel only when it is less
than ca. 6 months from release).

It's still worthwhile to discuss the issue, if you come late
to it, notably as ---paraphrasing Dirk on the R-package-devel list---
the release of 3.4.0 is almost a year away, and so now is the
best time to tinker with the API, in other words, to consider breaking
rarely used legacy APIs.

Martin


> In help(match) we have been saying

> |  Exactly what matches what is to some extent a matter of definition.
> |  For all types, \code{NA} matches \code{NA} and no other value.
> |  For real and complex values, \code{NaN} values are regarded
> |  as matching any other \code{NaN} value, but not matching \code{NA}.

> for at least 10 years.  But we don't do that at all in the
> complex case (and AFAIK never got a bug report about it).

> Also, e.g., print(.) or format(.) do simply use  "NA" for all
> the different complex NA-containing numbers, where OTOH,
> non-NA NaN's { <=>  !is.nan(z) & is.na(z) }
> in format() or print() do show the NaN in real and/or imaginary
> parts; for an example, look at the "format" column of the matrix
> below, after 'print(cbind' ...

> The current match()---and duplicated(), unique(), which are based on the
> same C code---*do* distinguish almost all complex NA / NaN's, which is
> NOT according to documentation. I have found that this is just because
> our hashing function for the complex case, chash() in R/src/main/unique.c,
> is bogus in the sense that it is not compatible with the above documentation
> and also not with the cequal() function (in the same file unique.c) for
> checking equality of complex numbers.

> As I have found, a *simplified* version of the chash() function
> to make it compatible with cequal() does solve all the problems I've
> indicated, and the current plan is to commit that change --- after some
> discussion time, here on R-devel --- to the code base.

> My change passes  'make check-all' fine, but I'm 100% sure that there will
> be effects in package-space. ... one reason for this posting.

> As mentioned above, note that the chash() function has been in
> use for all three functions
> match()
> duplicated()
> unique()
> and the change will affe

[Rd] Suggestion on default 'levels' in 'factor'

2016-05-06 Thread Suharto Anggono Suharto Anggono via R-devel
At first read, the logic of the following fragment in code of function 'factor' 
was not clear to me.
if (missing(levels)) {
    y <- unique(x, nmax = nmax)
    ind <- sort.list(y) # or possibly order(x) which is more (too ?) tolerant
    y <- as.character(y)
    levels <- unique(y[ind])
}

Code similar to the originally proposed in 
https://stat.ethz.ch/pipermail/r-devel/2009-May/053316.html is more readable to 
me.

I suggest using this.
if (missing(levels))
    levels <- unique(as.character(
        sort.int(unique(x, nmax = nmax), na.last = TRUE) # or possibly sort(x) which is more (too ?) tolerant
    ))

I assume that as.character(y)[sort.list(y)] is equivalent to
as.character(sort.int(y, na.last = TRUE)), so what I suggest above has the
same effect as the code in current 'factor'. 'sort.int' is used instead of
'sort' so that, like 'sort.list', it fails for non-atomic input.

What I suggest is similar in form to default 'levels' in 'factor' in R before 
version 2.10.0, which is
sort(unique.default(x), na.last = TRUE)

If this suggestion is used, the help page for 'factor' can be changed to say 
"(by 'sort.int')" instead of "(by 'sort.list')".
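A quick consistency check (hypothetical data) that the suggested formulation computes the same default levels as the current code:

```r
x <- c(3, 1, 2, 1, NA)
y <- unique(x)
## current 'factor': order the unique values, then convert to character
levels_current <- unique(as.character(y)[sort.list(y)])
## suggested: sort the unique values (NA last), then convert to character
levels_suggested <- unique(as.character(sort.int(y, na.last = TRUE)))
identical(levels_current, levels_suggested)  # TRUE for this example
```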



[Rd] PR# for match.arg(arg)

2016-04-08 Thread Suharto Anggono Suharto Anggono via R-devel
In "R News", in "Changes in R 3.3.0", under "New Features", a news item is:
match.arg(arg) (the one-argument case) is faster; so is sort.int(). (PR#16640)

While it was motivated by speeding up tapply, it is tracked under a separate
bug number: 16652 (https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16652).
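For context, a sketch of the one-argument form that the news item refers to (hypothetical function 'f'):

```r
f <- function(type = c("linear", "quadratic", "cubic")) {
  ## one-argument match.arg(): when 'type' is missing, the first choice is used
  type <- match.arg(type)
  type
}
f()         # "linear"
f("cubic")  # "cubic"
```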



[Rd] %OS on output

2016-02-24 Thread Suharto Anggono Suharto Anggono via R-devel
R help on 'strptime' has the following in "Details" section.
Specific to R is ‘%OSn’, which for output gives the seconds truncated to ‘0 <= 
n <= 6’ decimal places (and if ‘%OS’ is not followed by a digit, it uses the 
setting of ‘getOption("digits.secs")’, or if that is unset, ‘n = 3’).

In reality, for output, if '%OS' is not followed by a digit and 
getOption("digits.secs") is unset, the output has no fractional part, as if n = 
0 is used.

> getOption("digits.secs")
NULL
> z <- strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")
> format(z, "%OS")
[1] "16"
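The behaviour is easy to compare with explicit settings (continuing the example above; the comments restate this message and the documentation):

```r
z <- strptime("20/2/06 11:16:16.683", "%d/%m/%y %H:%M:%OS")
format(z, "%OS")               # "16"     -- digits.secs unset: as if n = 0
format(z, "%OS3")              # "16.683" -- an explicit digit works as documented
op <- options(digits.secs = 3)
format(z, "%OS")               # "16.683" -- now the option is used
options(op)                    # restore previous setting
```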


[Rd] On recent change to 'prettyDate'

2016-02-24 Thread Suharto Anggono Suharto Anggono via R-devel
A news entry for R 3.2.3 patched:
pretty(D) for date-time objects D now also works well if range(D) is (much) 
smaller than a second. In the case of only one unique value in D, the pretty 
range now is more symmetric around that value than previously. 
Similarly, pretty(dt) no longer returns a length 5 vector with duplicated 
entries for Date objects dt which span only a few days.

Looking at function 'prettyDate' in 
https://svn.r-project.org/R/trunk/src/library/grDevices/R/prettyDate.R, the 
early return part (if(isDate && D <= n * 24*3600)) is not quite right:
- The result doesn't have attribute "labels".
Help on 'pretty.Date', "Value":
A vector (of the suitable class) of locations, with attribute ‘"labels"’ giving 
corresponding formatted character labels.
- Argument 'min.n' is not respected.

Regarding 'min.n', I think min.n > n should be an error, as in 'pretty.default'.
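A sketch that exercises the early-return path described above (hypothetical dates; the comments restate the report, not verified output):

```r
dd <- as.Date("2016-02-20") + 0:2   # range(dd) spans only a few days
p <- pretty(dd)                     # takes the Date early-return path described
attr(p, "labels")                   # per the report: NULL, although the docs
                                    # promise formatted character labels
```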


[Rd] Show rapply(how="replace") as alternative to 'dendrapply' to change leaves?

2016-02-13 Thread Suharto Anggono Suharto Anggono via R-devel
A non-leaf dendrogram object in R is a list, possibly nested. In R-devel, 
rapply() preserves attributes on the list when how = "replace". So, the result 
of applying rapply(how = "replace") to such a dendrogram object is a dendrogram 
object. Usually, non-list components of such a dendrogram object are the 
leaves. So, rapply(how = "replace") can be used to change leaves only of such a 
dendrogram object. For such case, it is an alternative to 'dendrapply'.

How about showing it in the documentation? In particular, in examples for 
'dendrapply', creation of 'dL' can alternatively use something like the 
following.
local({
  colLabLeaf <<- function(n) {
    a <- attributes(n)
    i <<- i + 1
    attr(n, "nodePar") <-
      c(a$nodePar, list(lab.col = mycols[i], lab.font = i %% 3))
    n
  }
  mycols <- grDevices::rainbow(attr(dhc21, "members"))
  i <- 0
})
dL <- rapply(dhc21, colLabLeaf, how = "replace")



[Rd] (no subject)

2015-10-21 Thread Suharto Anggono Suharto Anggono via R-devel
--
>>>>> Henric Winell <[hidden email]>
>>>>> on Wed, 21 Oct 2015 13:43:02 +0200 writes:

    > On 2015-10-21 at 07:24, Suharto Anggono Suharto Anggono via R-devel wrote:
>> Marius Hofert-4--
>>> On 2015-10-09 at 12:14, Martin Maechler wrote:
>>> I think so: the code above doesn't seem to do the right thing.  Consider
>>> the following example:
>>>
>>> > x <- c(1, 1, 2, 3)
>>> > rank2(x, ties.method = "last")
>>> [1] 1 2 4 3
>>>
>>> That doesn't look right to me -- I had expected
>>>
>>> > rev(sort.list(x, decreasing = TRUE))
>>> [1] 2 1 3 4
>>>
>>
>> Indeed, well spotted, that seems to be correct.
>>
>>>
>>> Henric Winell
>>>
>> --
>>
>> In the particular example (of length 4), what is really wanted is the 
following.
>> ind <- integer(4)
>> ind[sort.list(x, decreasing=TRUE)] <- 4:1
>> ind

> You don't provide the output here, but 'ind' is, of course,

>> ind
> [1] 2 1 3 4

>> The following gives the desired result:
>> sort.list(rev(sort.list(x, decreasing=TRUE)))

> And, again, no output, but

>> sort.list(rev(sort.list(x, decreasing=TRUE)))
> [1] 2 1 3 4

> Why is it necessary to use 'sort.list' on the result from
> 'rev(sort.list(...'?

You can try all kinds of code on this *too* simple example and do
experiments.  But let's approach this a bit more scientifically
and hence systematically:

Look at  rank  {the R function definition} to see that
for the case of no NA's,

 rank(x, ties.method = "first")   ===   sort.list(sort.list(x))

If you assume that to be correct and want to define "last" to be
correct as well (in the sense of being  "first"-consistent),
it is clear that

  rank(x, ties.method = "last")   ===   rev(sort.list(sort.list(rev(x))))

must also be correct.  I don't think that *any* of the proposals
so far had a correct version [but the too simplistic examples
did not show the problems].

In R-devel (the R development version) of today, i.e., svn
revision >= 69549, the implementation of  ties.method = "last"
uses

## == rev(sort.list(sort.list(rev(x)))) :
if(length(x) == 0) integer(0)
else { i <- length(x):1L
       sort.list(sort.list(x[i]))[i] },

which is equivalent to using rev() but a bit more efficient.

Martin Maechler, ETH Zurich 
--

I'll defend that my code is correct in general.

All comes from the fact that, if p is a permutation of 1:n,
{ ind <- integer(n); ind[p] <- 1:n; ind }
gives the same result as
sort.list(p) .
You can make sense of it like this. In ind[p] <- 1:n, ind[1] is the position 
where p == 1. So, ind[1] is the position of the smallest element of p. So, it 
is the first element of sort.list(p). Next elements follow.

That's why 'sort.list' is used for ties.method="first" and ties.method="random" 
in function 'rank' in R. When p gives the desired order,
{ ind <- integer(n); ind[p] <- 1:n; ind }
gives ranks of the original elements based on the order. The original element 
in position p[1] has rank 1, the original element in position p[2] has rank 2, 
and so on.

Now, I say that rev(sort.list(x, decreasing=TRUE)) gives the desired order for 
ties.method="last". With the order, the elements are from smallest to largest; 
for equal elements, elements are ordered by their positions backwards.
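The two facts above can be checked together on the example from this thread (a sketch):

```r
x <- c(1, 1, 2, 3)
p <- rev(sort.list(x, decreasing = TRUE))  # desired order for ties.method = "last"
n <- length(x)
ind <- integer(n)
ind[p] <- 1:n                              # ranks derived from the order p
identical(ind, sort.list(p))               # TRUE: both give c(2L, 1L, 3L, 4L)
```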



Re: [Rd] rank(, ties.method="last")

2015-10-20 Thread Suharto Anggono Suharto Anggono via R-devel
Marius Hofert-4--
> On 2015-10-09 at 12:14, Martin Maechler wrote:
> I think so: the code above doesn't seem to do the right thing.  Consider
> the following example:
>
>  > x <- c(1, 1, 2, 3)
>  > rank2(x, ties.method = "last")
> [1] 1 2 4 3
>
> That doesn't look right to me -- I had expected
>
>  > rev(sort.list(x, decreasing = TRUE))
> [1] 2 1 3 4
>

Indeed, well spotted, that seems to be correct.

>
> Henric Winell
> 
--

In the particular example (of length 4), what is really wanted is the following.
ind <- integer(4)
ind[sort.list(x, decreasing=TRUE)] <- 4:1
ind

The following gives the desired result:
sort.list(rev(sort.list(x, decreasing=TRUE)))



Re: [Rd] 'vapply' not returning list element names when returned element is a length-1 list

2015-08-05 Thread Suharto Anggono Suharto Anggono via R-devel
Quote

> If i have a function that returns a named list with 2 (or more) elements,
> then using 'vapply' retains the names of the elements:
> 
> But if the function only returns one element, then the name "foo" is lost

vapply _always simplifies_ according to the documentation.

In the first case (the function return value contains more than one element), 
vapply simplifies to a matrix of two lists (!).  The names "foo" and 
"hello" have been added to the dimnames so you can tell which is which.

In the second case the function return value is a single list and not a matrix 
of lists (a simple list is simpler than a matrix of lists). The name of the 
list ('foo') has nowhere to go; instead, you would be assigning the list to a 
named variable and you don't need the name 'foo'.

Whether that is inconsistent is something of a matter of perspective. 
Simplification applied as far as possible will always depend on what 
simplification is possible for the particular return values, so different 
return values provide different behaviour.

S Ellison 

--

In the first case, the result is a matrix of mode list, which is a list with a 
"dim" attribute of length 2.

For comparison, 'sapply' retains the name "foo".

> sapply(1:3, function(x) list("foo" = "bar"))
$foo
[1] "bar"

$foo
[1] "bar"

$foo
[1] "bar"
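For comparison, the vapply() counterpart (same hypothetical function; per the discussion above, the name "foo" is dropped from the simplified result):

```r
## FUN.VALUE = vector("list", 1): each call must return a length-1 list
vapply(1:3, function(x) list("foo" = "bar"), vector("list", 1))
## an unnamed list of three "bar" elements -- the name "foo" has nowhere to go
```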



Re: [Rd] WISH: eval() to preserve the "visibility" (now value is always visible)

2015-02-08 Thread Suharto Anggono Suharto Anggono via R-devel
Sorry to intervene.

Argument passed to 'eval' is evaluated first.
So,
eval(x <- 2)
is effectively like
{ x <- 2; eval(2) } ,
which is effectively
{ x <- 2; 2 } .
The result is visible.

eval(expression(x <- 2))
or
eval(quote(x <- 2))
or
evalq(x <- 2)
gives the same effect as
x <- 2 .
The result is invisible.

In function 'eval2',
res <- eval(withVisible(expr), envir=envir, ...)
is effectively
res <- withVisible(expr) .

---

Would it be possible to have the value of eval() preserve the
"visibility" of the value of the expression?


"PROBLEM":

# Invisible
> x <- 1

# Visible
> eval(x <- 2)
[1] 2

"TROUBLESHOOTING":
> withVisible(x <- 1)
$value
[1] 1
$visible
[1] FALSE

> withVisible(eval(x <- 2))
$value
[1] 2
$visible
[1] TRUE


WORKAROUND:
eval2 <- function(expr, envir=parent.frame(), ...) {
  res <- eval(withVisible(expr), envir=envir, ...)
  value <- res$value
  if (res$visible) value else invisible(value)
}

> x <- 1
> eval(x <- 2)
[1] 2
> eval2(x <- 3)
> x
[1] 3

/Henrik
