lwhite1 commented on PR #13248:
URL: https://github.com/apache/arrow/pull/13248#issuecomment-1260033269
Hoping to restart this thread.
I'm beginning to wonder if we (including and maybe especially, me) haven't
made this all too complicated.
The proposed simple change makes the code work the same as other
ReferenceManager implementations (or nearly so), and allows the original
Vector's ArrowBuf to hand off its memory to another ArrowBuf.
The code here around the management of imported data is pretty complicated
but I think this change is pretty safe. From my review, I believe this is what
is happening:
In the ArrayImporter code, the CDataReferenceManager releases the memory as
soon as the import is complete:
```
// This keeps the array alive as long as there are any buffers that need
it
referenceManager = new CDataReferenceManager(ownedArray);
try {
referenceManager.increment();
doImport(snapshot);
} finally {
referenceManager.release();
}
```
So the ref count is already 0 when the transfer attempt occurs later.
The Data class that provides the high-level interface for getting C-Data has
methods like `importIntoVector()` and `importVectorSchemaRoot()` that take
allocators as arguments (and objects to receive the data) that become
associated with the imported data at that step in the process, although the
ArrowBufs they hold still point to the CDataReferenceManager. That last detail
is what currently causes subsequent attempts to transfer the data to fail.
In spite of the naming/documentation of the methods involved, there is
nothing that requires a TransferPair to actually change what Allocator the
newly created ArrowBuf is associated with. There are numerous use-cases for
"transferring" the memory to the same Allocator it started with.
Maybe the most important existing, but currently unsupported, use cases are:
- constructing a new VectorSchemaRoot from one that is managed by native
code.
- slicing a vector originating from native code
Right now, you can use data from C in the format it was received in, or not
at all. It's not clear to me that the original change makes anything worse
than it currently is, and it would provide support for these two important
use-cases.
Of course, I could be wrong again. What am I missing?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]