On Jul 5, 2011, at 7:18 PM, <luke-tier...@uiowa.edu> wrote:
On Tue, 5 Jul 2011, Simon Urbanek wrote:
On Jul 5, 2011, at 2:08 PM, Matthew Dowle wrote:
Simon (and all),
I've tried to make assignment as fast as calling `[<-.data.table`
directly, for user convenience. Profiling shows (IIUC) that it isn't
dispatch, but x being copied. Is there a way to prevent '[<-' from
copying x?
Good point, and conceptually, no. It's a subassignment after all -
see R-lang 3.4.4 - it is equivalent to
    `*tmp*` <- x
    x <- `[<-`(`*tmp*`, i, j, value)
    rm(`*tmp*`)
so there is always a copy involved.
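For illustration, the expansion above can be written out by hand and compared with the ordinary assignment. This is only a minimal sketch; the matrices `m1` and `m2` and the values used are made up for the example.

```r
# Ordinary subassignment on a small matrix.
m1 <- matrix(0, nrow = 2, ncol = 2)
m1[1, 2] <- 5

# The same update spelled out step by step, following R-lang 3.4.4.
m2 <- matrix(0, nrow = 2, ncol = 2)
`*tmp*` <- m2
m2 <- `[<-`(`*tmp*`, 1, 2, value = 5)
rm(`*tmp*`)

# Both routes should give the same matrix.
identical(m1, m2)
```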
Now, a conceptual copy doesn't mean real copy in R since R tries to
keep the pass-by-value illusion while passing references in cases
where it knows that modifications cannot occur and/or they are
safe. The default subassign method uses that feature which means it
can afford to not duplicate if there is only one reference -- then
it's safe to not duplicate as we are replacing that only existing
reference. And in the case of a matrix, that will be true at the
latest from the second subassignment on.
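One way to watch this behaviour is with tracemem(), which prints a message whenever the traced object is duplicated. The sketch below assumes an R build with memory profiling enabled (hence the capabilities() check), and the variable names are illustrative only. Whether a message appears at a given step can depend on the R version and on the front end in use.

```r
# tracemem() is only available when R was built with memory profiling.
can_trace <- capabilities("profmem")

x <- c(1, 2, 3)
if (can_trace) invisible(tracemem(x))

x[1] <- 10   # x is the only reference: typically no tracemem message
y <- x       # y now refers to the same vector; nothing is copied yet
x[2] <- 20   # a second reference exists, so x is duplicated before the change

if (can_trace) untracemem(x)

x   # carries both modifications
y   # still holds the values from before the second assignment
```

Whatever the copying behaviour, the pass-by-value illusion holds: `y` is unaffected by the later change to `x`.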
Unfortunately the method dispatch (AFAICS) introduces one more
reference in the dispatch chain so there will always be two
references so duplication is necessary. Since we have only 0 / 1 /
2+ information on the references, we can't distinguish whether the
second reference is due to the dispatch or due to the passed object
having more than one reference, so we have to duplicate in any
case. That is unfortunate, and I don't see a way around it (unless we
handle subassignment methods in some special way).
I don't believe dispatch is bumping NAMED (and a quick experiment
seems to confirm this though I don't guarantee I did that right). The
issue is that a replacement function implemented as a closure, which
is the only option for a package, will always see NAMED on the object
to be modified as 2 (because the value is obtained by forcing the
argument promise) and so any R level assignments will duplicate. This
also isn't really an issue of imprecise reference counting -- there
really are (at least) two legitimate references -- one through the
argument and one through the caller's environment.
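A small experiment along these lines, as a sketch only: `second<-` is a hypothetical replacement function written as a closure, the way a package would have to write it, and tracemem() (assuming a build with memory profiling) reports any duplication triggered by the R-level assignment inside it. Under the NAMED scheme described above a duplication is expected. R 4.0.0 and later use reference counting instead of NAMED, so the output may differ there.

```r
# Hypothetical package-style replacement function implemented as a closure.
`second<-` <- function(x, value) {
  x[2] <- value   # R-level assignment on the forced argument
  x
}

v <- c(1, 2, 3)
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(v))

second(v) <- 99   # a tracemem message printed here indicates a duplication

if (can_trace) untracemem(v)
v
```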
It would be good if we could come up with a way for packages to be
able to define replacement functions that do not duplicate in cases
where we really don't want them to, but this would require coming up
with some sort of protocol, minimally involving an efficient way to
detect whether a replacement function is being called in a replacement
context or directly.
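To make the "replacement context or directly" distinction concrete, here is a rough heuristic, not the protocol asked for above. It relies on an implementation detail: when `f(x) <- value` is evaluated, the call that the replacement function sees has the symbol `*tmp*` as its first argument. The function `mark<-` and the vector `seen` are hypothetical names for this sketch. The check can be fooled by a user variable literally named `*tmp*`, and it does nothing by itself to avoid duplication.

```r
seen <- logical(0)   # records, per call, whether replacement syntax was detected

# Hypothetical replacement function that inspects its own call.
`mark<-` <- function(x, value) {
  cl <- sys.call()
  via_syntax <- length(cl) >= 2 && identical(cl[[2]], as.name("*tmp*"))
  seen <<- c(seen, via_syntax)
  x[1] <- value
  x
}

a <- c(1, 2, 3)
mark(a) <- 10                  # called through replacement syntax
a <- `mark<-`(a, value = 20)   # called directly

seen
a
```

In this sketch the first call should be flagged as a replacement context and the second should not.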
Would "$<-" always satisfy that condition? It would be a big help to me
if it could be designed to avoid duplicating the rest of the data.frame.
--
There are some replacement functions that use C code to cheat, but
these may create problems if called directly, so I won't advertise
them.
Best,
luke
Cheers,
Simon
--
Luke Tierney
Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: l...@stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
David Winsemius, MD
West Hartford, CT