Re: [Rd] True length - length(unclass(x)) - without having to call unclass()?

Tomas Kalibera Wed, 05 Sep 2018 01:10:15 -0700

On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:

Is there a low-level function that returns the length of an object 'x'
- the length that for instance .subset(x) and .subset2(x) see? An
obvious candidate would be to use:


.length <- function(x) length(unclass(x))

However, I'm concerned that calling unclass(x) may trigger an
expensive copy internally in some cases.  Is that concern unfounded?

unclass() will always copy when "x" is really a variable, because the value in "x" will be referenced; whether it is prohibitively expensive or not depends only on the workload - if "x" is a very long list and this function is called often then it could be, but at least to me this sounds unlikely. Unless you have a strong reason to believe it is the case I would just use length(unclass(x)).
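A minimal sketch of the difference between the dispatched length and the underlying length, and of one way to observe the copy. The class name "hidden_len" and its length() method are hypothetical, made up only for illustration. tracemem() reports duplications only when R was built with memory profiling, and for a list the copy made by unclass() should, as far as I know, cover only the list itself and not its elements.

```r
# Hypothetical S3 class whose length() method hides the underlying size.
x <- structure(list(a = 1, b = 2, c = 3), class = "hidden_len")
length.hidden_len <- function(x) 1L

dispatched_len <- length(x)       # dispatches to length.hidden_len()
true_len <- length(unclass(x))    # length of the underlying list

# tracemem() needs an R build with memory profiling enabled.
if (capabilities("profmem")) {
  invisible(tracemem(x))
  invisible(length(unclass(x)))   # a tracemem message here indicates a copy
  untracemem(x)
}
```

If the workload is a concern, timing length(unclass(x)) on a realistically sized object would show whether the copy matters in practice.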

If the copying is really a problem, I would think about why the underlying vector length is needed at R level - whether you really need to know the length without actually having the unclassed vector anyway for something else, so whether you are not paying for the copy anyway. Or, from the other end, if you need to do more without copying, and it is possible without breaking the value semantics, then you might need to switch to C anyway and for a bigger piece of code.
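As an illustration of "paying for the copy anyway": when the unclassed vector is needed for more than its length, it can be unclassed once and reused. This is only a sketch; the class name "hidden_len" and the helper sum_elements() are hypothetical.

```r
# Hypothetical classed list; in real code this would be the caller's object.
x <- structure(list(1, 2, 3), class = "hidden_len")

# Hypothetical helper: unclass once, then reuse the result for both the
# length and the element access, so only one copy is made per call.
sum_elements <- function(obj) {
  u <- unclass(obj)
  n <- length(u)
  total <- 0
  for (i in seq_len(n)) total <- total + u[[i]]
  total
}

total <- sum_elements(x)
```

The caller's "x" keeps its class, since only the local value inside the function is unclassed.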

If it were still just .length() you needed and it were performance critical, you could just switch to C and call Rf_length. That does not violate the semantics, just indeed it is not elegant as you are switching to C.

If you stick to R and can live with the overhead of length(unclass(x)) then there is a chance the overhead will decrease as R is optimized internally. This is possible in principle when the runtime knows that the unclassed vector is only needed to compute something that does not modify the vector. The current R cannot optimize this out, but it should be possible with ALTREP at some point (and as Radford mentioned pqR does it differently). Even with such internal optimizations indeed it is often necessary to make guesses about realistic workloads, so if you have a realistic workload where say length(unclass(x)) is critical, you are more than welcome to donate it as benchmark.

Obviously, if you use a C version calling Rf_length, after such R optimization your code would be unnecessarily non-elegant, but would still work and probably without overhead, because R can't do much less than Rf_length. In more complicated cases though hand-optimized C code to implement say 2 operations in sequence could be slower than what a better optimizing runtime could do by joining the effect of possibly more operations, which is in principle another danger of switching from R to C. But as long as the semantics are followed, there is no other danger.

The temptation should be small anyway in this case when Rf_length() would be the simplest, but as I made it more than clear in the previous email, one should never violate the value semantics by temporarily modifying the object (temporarily removing the class attribute or temporarily removing the object bit). Violating semantics causes bugs, if not with the present then with future versions of R (where version may be an svn revision). A concrete recent example: modifying objects in place in violation of the semantics caused a lot of bugs with the introduction of unification of constants in the byte-code compiler.


Best
Tomas


Thxs,

Henrik

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

