On Mon, Nov 17, 2008 at 1:11 PM, laurent <[EMAIL PROTECTED]> wrote:
> Yes, I have briefly seen that you are doing it at the rpy2.robjects
> level. That's certainly a good way to go when wanting to offer such
> capability.
>
> Regarding SexpVector accepting numpy's one-dimensional arrays, that's
> because anything that provides the sequence protocol is accepted by the
> constructor.

Right.

> A potential drawback of the approach is that there is copying of the
> data from Python/numpy to the embedded R process (and that can be a
> significant performance penalty when working with large(r) data
> sets), and I hoped I could avoid that.
>
> I am also aware that there is already copying when creating R vectors
> from array.array instances anyway, and the benefit of having something
> working at the rpy2.robjects level right now is looking greater than me
> fussing about a hypothetical perfect solution. It is also always
> possible to copy any voluminous numpy object to R once, then delete that
> numpy object and recreate one from the R object (making both point to
> the same underlying data).

Ah, I see -- I agree that it would be fantastic to be able to pass
data to R without copying, but my impression was that R's internals
had much more simple-minded data structures than numpy, and that made
this impossible.

In particular, looking at Rinternals.h, VECTOR_ELT etc. are defined in
terms of calculating direct memory offsets from an SEXPREC.  Not only
are there no strides (necessary for column-major R to grok row-major
numpy arrays), but you can't even have an SEXPREC that points to an
array stored elsewhere in memory -- if you want a vector to be visible
in R space, you have to stick a header struct right on the front of
it.
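
To make the layout mismatch concrete, here is a rough sketch in plain
Python (nothing from R or numpy is used; the helper names
row_major_offset and column_major_offset are made up for the
illustration). It stores the same 2x3 matrix the way a default numpy
array would and the way an R matrix would:

```python
def row_major_offset(i, j, nrow, ncol):
    """Flat index of element (i, j) in a row-major (numpy default) buffer."""
    return i * ncol + j


def column_major_offset(i, j, nrow, ncol):
    """Flat index of element (i, j) in a column-major (R matrix) buffer."""
    return j * nrow + i


nrow, ncol = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]

c_buffer = [0] * (nrow * ncol)   # layout of a default numpy array
r_buffer = [0] * (nrow * ncol)   # layout R expects for a matrix
for i in range(nrow):
    for j in range(ncol):
        c_buffer[row_major_offset(i, j, nrow, ncol)] = matrix[i][j]
        r_buffer[column_major_offset(i, j, nrow, ncol)] = matrix[i][j]

print(c_buffer)  # [1, 2, 3, 4, 5, 6]
print(r_buffer)  # [1, 4, 2, 5, 3, 6]
```

With strides, one buffer could be reinterpreted as the other by
swapping the per-axis step sizes. Without them, the elements have to
be physically reordered, which means a copy.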

What might be most useful in practice would be noticing when we are
getting an ndarray that *is* backed by an R vector, and then just
passing on that R vector instead of re-copying it.  Right now, users
have to keep track of two different objects -- the SexpVector (or
wrapper) and the ndarray returned by numpy.asarray(), and make sure to
use the correct one in each case -- if you pass the ndarray to an R
function, the data will be copied into R space, even though it's
*already* in R space!  It would be nice if one could just pass the
R-backed ndarray directly to R functions, and they automatically
received the corresponding R object.

The downside is that there would be mysteriously inconsistent behavior
when ndarrays were passed to R functions that mutated their arguments
-- R-backed ndarrays would be passed by reference and mutated, normal
ndarrays would be passed by value and not mutated.  There's also the
implementation difficulty that one needs some way for the conversion
machinery to recognize which ndarrays already have an associated SEXP,
and to find the right SEXP.  It's probably possible to do this in some
complicated way using weakrefs and a lookaside table etc., but much
easier would be to just define a custom ndarray subclass, that stashes
that info in an instance variable.  And it would be more obvious which
objects would be passed by reference vs. by value, because they would
be different types.
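
For comparison, the weakref-and-lookaside-table route might look
roughly like the sketch below. Everything in it is hypothetical:
FakeArray and FakeSexp are stand-ins for an ndarray and an R vector
handle, and SexpLookaside is not part of rpy2. The table is keyed on
id() because ndarrays are not hashable, so they cannot be used
directly as WeakKeyDictionary keys.

```python
import weakref


class FakeArray:
    """Stand-in for an ndarray (real ndarrays also support weak references)."""


class FakeSexp:
    """Stand-in for an R vector handle."""


class SexpLookaside:
    """Hypothetical table mapping live array objects to their R vectors."""

    def __init__(self):
        self._entries = {}  # id(array) -> (weakref to array, sexp)

    def register(self, array, sexp):
        key = id(array)

        def forget(_ref, key=key):
            # Drop the entry when the array dies, so a recycled id()
            # cannot be mistaken for the old array.
            self._entries.pop(key, None)

        self._entries[key] = (weakref.ref(array, forget), sexp)

    def lookup(self, array):
        entry = self._entries.get(id(array))
        # Check identity as well as id(), in case the id was reused.
        if entry is not None and entry[0]() is array:
            return entry[1]
        return None


table = SexpLookaside()
r_backed, plain = FakeArray(), FakeArray()
table.register(r_backed, FakeSexp())

print(table.lookup(r_backed) is not None)  # True
print(table.lookup(plain))                 # None
```

It works, but the bookkeeping is invisible to the user, which is the
argument for a distinct type instead.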

In my perfect world, actually, RArray (or SexpVector!) would *be* that
subclass, since that would simplify programs -- no need for calling
explicit conversion functions (or confusion about which conversion
function to use -- numpy.asarray wouldn't know about the special
subclass, so we'd need our own version, etc.), and the full numpy API
would be directly available, which means that e.g. multidimensional
indexing would Just Work (right now it gives obscure error messages).
Also, the numpy API just feels more natural to me -- __add__ mapping
to c() is neat, but I can't imagine when I would actually use that
behavior in practice, while I can imagine lots of times I would want
to do addition on my numeric arrays :-).
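
A minimal sketch of what I have in mind, assuming numpy is installed.
The names RBackedArray, sexp and r_argument are invented for the
example and are not rpy2 API, and a plain object() stands in for the
R vector, so this only shows the shape of the idea:

```python
import numpy as np


class RBackedArray(np.ndarray):
    """Hypothetical ndarray subclass that remembers the R vector it views."""

    def __new__(cls, data, sexp=None):
        obj = np.asarray(data).view(cls)
        obj.sexp = sexp  # stand-in for the underlying R object
        return obj

    def __array_finalize__(self, obj):
        # Runs for every new instance, including slices and arithmetic
        # results. Those no longer correspond to the whole R vector, so
        # they start without a handle and would be copied like plain arrays.
        self.sexp = None


def r_argument(value):
    """Return what a conversion layer might hand to R (illustrative only)."""
    sexp = getattr(value, "sexp", None)
    if sexp is not None:
        return sexp                    # pass the existing R object along
    return np.asarray(value).tolist()  # placeholder for a copy into R


r_vector = object()  # placeholder; a real version would hold a SexpVector
a = RBackedArray(np.arange(6.0).reshape(2, 3), sexp=r_vector)

print(a[1, 2])                    # multidimensional indexing: 5.0
print((a + a)[0, 1])              # elementwise addition: 2.0
print(r_argument(a) is r_vector)  # True: no copy needed
print(r_argument(a + a))          # derived array: falls back to a copy
```

Because the type itself says whether an R object is attached, it is
at least visible which arguments go by reference and which by value.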

-- Nathaniel
