shikokuchuo commented on issue #40231:
URL: https://github.com/apache/arrow/issues/40231#issuecomment-1967047575
Hi @etiennebacher @eitsupi
It is true that objects backed by external pointers cannot be
straightforwardly exported to parallel processes.
However, [{mirai}](https://github.com/shikokuchuo/mirai) has devised a novel
method, utilising the low-level 'refhook' capability of R serialization itself,
to allow such objects to be used transparently in the same way as other R
objects.
This was originally devised for torch tensors:
https://shikokuchuo.net/mirai/articles/torch.html
This has just been extended in the development version to support other
object types such as Arrow.
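For anyone unfamiliar with the 'refhook' mechanism, here is a minimal base R sketch of the idea. It is only an illustration of the hook in `serialize()` / `unserialize()`, not mirai's actual implementation. An environment stands in for an external-pointer object (the hook is called for both), and `to_chr` / `from_chr` are hypothetical helper names.

``` r
# Base R only: an environment stands in for an external-pointer-backed object.
obj <- new.env()
assign("value", 42, envir = obj)

# Serialization hook (hypothetical name): return a character vector for
# references we want to handle, or NULL to fall back to default behaviour.
to_chr <- function(x) {
  if (is.environment(x) && exists("value", envir = x, inherits = FALSE)) {
    as.character(get("value", envir = x))
  } else {
    NULL
  }
}

# Unserialization hook (hypothetical name): receives that character vector
# and rebuilds an equivalent object on the receiving side.
from_chr <- function(chr) {
  e <- new.env()
  assign("value", as.numeric(chr), envir = e)
  e
}

# The reference object is nested inside an ordinary list.
payload <- serialize(list(a = obj, b = "some text"), NULL, refhook = to_chr)
restored <- unserialize(payload, refhook = from_chr)

restored$a$value
restored$b
```

The hooks are applied wherever a reference object occurs inside the structure being serialized, which is why nesting needs no special handling. The `serialization()` call in the examples in this comment registers a pair of such functions for a given class.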
``` r
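library(arrow, warn.conflicts = FALSE)
library(mirai)
# Register the custom serialization and unserialization functions for Arrow
serialization(
  refhook = list(arrow::write_to_raw, arrow::read_ipc_stream),
  class = "ArrowTabular"
)
# Create a one-node cluster and build an Arrow table on the worker
cl <- make_cluster(1)
parallel::parLapply(cl, 1, \(x) arrow::as_arrow_table(head(iris)))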
#> [[1]]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
```
The example aims to demonstrate something close to what you were attempting
earlier in this thread.
It relies on registering the custom serialization and unserialization
functions. Here the return value is a 'tibble', as this is what is produced by
round-tripping `read_ipc_stream(write_to_raw(x))`. As in the torch case, it
would be possible to get back the same type if the round trip preserved it.
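To make the type change concrete, here is a small sketch of the round trip on its own, outside of mirai. It assumes only the documented `as_data_frame` argument of `arrow::read_ipc_stream()`. Whether a wrapper such as `\(x) arrow::read_ipc_stream(x, as_data_frame = FALSE)` can be passed as the second hook to `serialization()` is an untested assumption on my part.

``` r
library(arrow, warn.conflicts = FALSE)

tbl <- arrow::as_arrow_table(head(iris))

# Serialize the Arrow Table to a raw vector in IPC stream format
raw_bytes <- arrow::write_to_raw(tbl)

# Default round trip: the result is converted to a data frame (tibble)
df <- arrow::read_ipc_stream(raw_bytes)

# Keeping as_data_frame = FALSE returns an Arrow Table instead
tab <- arrow::read_ipc_stream(raw_bytes, as_data_frame = FALSE)

inherits(df, "data.frame")
inherits(tab, "Table")
```

The default read converts to a data frame, which is why the examples in this comment print ordinary data frames rather than Arrow Tables.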
Using the native 'mirai' interface better demonstrates the possibilities,
such as seamlessly moving Arrow objects in deeply nested structures:
``` r
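library(arrow, warn.conflicts = FALSE)
library(mirai)
# Register the custom serialization and unserialization functions for Arrow
serialization(
  refhook = list(arrow::write_to_raw, arrow::read_ipc_stream),
  class = "ArrowTabular"
)
# Start two background daemons
daemons(2)
#> [1] 2
# Send an Arrow table to a daemon and return it nested inside a list
m <- mirai(
  list(a = arrow::as_arrow_table(x), b = "some text"),
  x = arrow::as_arrow_table(head(iris))
)
call_mirai(m)$data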
#> $a
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#>
#> $b
#> [1] "some text"
```
Hope you find this useful. If you believe there is a better way to integrate
with Arrow, please do let me know.
P.S. You'll need the development versions, installable from:
```r
install.packages(c("nanonext", "mirai"), repos = "https://hibiki-ai.r-universe.dev")
```