paleolimbot commented on PR #43632: URL: https://github.com/apache/arrow/pull/43632#issuecomment-2286856527
Echoing all the thanks to Weston for the detailed response!

I wonder if it is worth clarifying the goals and non-goals of this proposal. In my mind, this is about reconciling two very different ways engines/APIs operate (push vs. pull). I don't have much experience on the performance side, but on the development time/lines-of-code side, making a producer that expects to push its output interact with a consumer that wants to pull is expensive (the reverse is also true). This gets more and more complicated the more times this mismatch is encountered in a pipeline.

I worry that in the quest for the best possible performance we lose any development time/lines-of-code advantage that a simpler approach might have enabled! I also worry that an ABI that becomes too opinionated about how a scanner should be implemented will still not be able to express other ("non-optimal"?) scanners that, for historical reasons (or because we were wrong about what an optimal scanner looks like), don't work that way. I still think that something like the original proposal (with clear, if imperfect, expectations about what can or should happen in the callbacks) is *a* missing piece (if not *the* missing piece).