zeroshade commented on PR #43632: URL: https://github.com/apache/arrow/pull/43632#issuecomment-2284467244
@westonpace your last example confuses me a bit, particularly because the entire purpose of this is to create a *push-based* approach for async handling, rather than a pull-based approach. Below are my questions: ```c struct Waker { // Signal to the producer that more data is available, the consumer shall release any resources // associated with the waker after this method is called. The producer shall not call any other // methods after calling this method. // // The producer must always call this method even if an error is encountered (in which case the // error will be reported on the following call to `get_next`). void wake(Waker* waker); void* private_data; }; ``` I don't understand signalling to the producer that data is available, the producer knows when data is available right? it's producing the data in the first place. Isn't the point that the producer needs to signal the *consumer* that more data is available? `Waker` would need to be consumer created, with the producer calling `wake` when data is available. Was your comment just a typo, or am I missing something? ```c struct ArrowAsyncDeviceArrayStream { // Callback to get the next array task // (if no error and the task is released, the stream has ended) // // Return value: 0 if successful, an `errno`-compatible error code otherwise. // // The consumer is allowed to call this method again even if the tasks returned by a previous // call have not completed. However, the consumer shall not call this re-entrantly. // // If successful, the ArrowDeviceArray must be released independently from the stream. int (*get_next_task)(struct ArrowAsyncDeviceArrayStream* self, struct ArrowArrayTask* task); // The rest is identical to `ArrowDeviceArrayStream` ArrowDeviceType device_type; const char* (*get_last_error)(struct ArrowAsyncDeviceArrayStream* self); void (*release)(struct ArrowAsyncDeviceArrayStream* self); void* private_data; }; ``` Why the abstraction of a task in this case? Couldn't we remove the task abstraction and just put the `Waker` here? It feels like the `get_next_task` abstraction here is just a way to re-invert the direction of the calls. Producer calls `get_next_task` which turns around and calls `task->get_next`? At that point, it feels like we could bypass the idea of needing `ArrowAsyncDeviceArrayStream` to be constructed by the consumer and instead build the entire API around the waker? Also in the case of this being able to handle concurrency, the `get_last_error` function becomes mostly meaningless, right? Depending on when it is called, you end up with a race condition as far as what the error might be. Finally, If a consumer can call `get_next_task` before the previous one has completed, what should happen in the following scenario: * There is only one record in the stream, it is delayed * Consumer calls `get_next_task` 5 times in a loop to get the next 5 tasks Do all 5 calls from the consumer wait until the record comes in? Does each one provide its own task object which then turns around and waits until the record comes in to call wake? The producer then needs to manage the order the calls came in, along with the order that the records come in for the calls to `get_next`. This seems like a lot of added complexity without much benefit. All in all, i'm not completely sold on the idea of this re-inversion of the paradigm. It honestly feels sorta hacky, but that might just be me. @paleolimbot @lidavidm thoughts on @westonpace's suggestion here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org