tokoko commented on PR #4317: URL: https://github.com/apache/arrow-adbc/pull/4317#issuecomment-4726518894
@lidavidm I took a look at the SDK. I think the answer depends on whether we're looking at a bounded batch scenario or pure continuous streaming.

**Batch**: The main problem here is that the API currently targets a scenario where writers stage the data and a coordinator then "commits" it during a Complete call. For Iceberg/Delta that's a natural abstraction; for databases you can work around it by staging to a temp table and doing a (hopefully inexpensive) swap at the end. Having said that, there's no reason a driver can't skip the staging-table dance and simply start writing to the target table, in which case the Complete call is effectively a no-op because the data has already been materialized. Maybe we should also have an option that lets a client configure which mode the driver works in? What do you think?

So a ZeroBus driver would implement this the same way as any other driver: either stage writes somewhere (temp Delta path, staging topic, etc.) and atomically promote them on Complete (not sure how easy that is in Databricks Unity Catalog), or start writing to the target table directly, so that Complete becomes something of a no-op. It's the same staging-and-swap pattern as an RDBMS driver, just with different internals.

**Streaming/continuous**: The API doesn't prevent it: a driver could treat the handle as a long-lived session token. The main value the API provides would be the handle that's produced after a centralized setup (validate target, schema, and permissions once). Complete semantics will probably be different in that case. You can either ignore the call or treat it as a way to store acked receipts from writers, i.e. some form of state management. There's also the question of whether that state needs to be exposed to the client, and if so, whether it should be opaque or something more concrete (for example, offsets).

In short, I deliberately avoided going down the streaming rabbit hole in this PR, but that can be changed of course.
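To make the two batch modes concrete, here is a rough in-memory sketch. Everything in it (`WriteMode`, `IngestHandle`, `ToyDriver`, and the `begin`/`write`/`complete` method names) is hypothetical and only illustrates the idea; it is not the API proposed in this PR, and a real driver would stage to a temp table, Delta path, or topic rather than a Python list.

```python
from dataclasses import dataclass, field
from enum import Enum


class WriteMode(Enum):
    STAGED = "staged"  # writers stage rows; complete() promotes them
    DIRECT = "direct"  # writers append to the target; complete() is a no-op


@dataclass
class IngestHandle:
    """Hypothetical handle produced by a one-time, centralized setup step."""
    target: str
    mode: WriteMode


@dataclass
class ToyDriver:
    """In-memory stand-in for a driver; each table is just a list of rows."""
    tables: dict = field(default_factory=dict)
    staging: dict = field(default_factory=dict)

    def begin(self, target, mode):
        # Centralized setup: make sure the target exists, then hand out a handle.
        self.tables.setdefault(target, [])
        if mode is WriteMode.STAGED:
            self.staging[target] = []
        return IngestHandle(target, mode)

    def write(self, handle, rows):
        # Called by each writer; where the rows land depends on the mode.
        if handle.mode is WriteMode.STAGED:
            self.staging[handle.target].extend(rows)
        else:
            self.tables[handle.target].extend(rows)

    def complete(self, handle):
        # STAGED: promote all staged rows in one step (stands in for the swap).
        # DIRECT: nothing to do, the rows are already in the target.
        if handle.mode is WriteMode.STAGED:
            staged = self.staging.pop(handle.target)
            self.tables[handle.target] = self.tables[handle.target] + staged


driver = ToyDriver()

staged = driver.begin("events_staged", WriteMode.STAGED)
driver.write(staged, [1, 2])
driver.write(staged, [3])
staged_visible_before = list(driver.tables["events_staged"])
driver.complete(staged)
staged_visible_after = list(driver.tables["events_staged"])

direct = driver.begin("events_direct", WriteMode.DIRECT)
driver.write(direct, [1, 2])
direct_visible_before = list(driver.tables["events_direct"])
driver.complete(direct)
direct_visible_after = list(driver.tables["events_direct"])
```

In staged mode nothing is visible in the target until `complete` runs; in direct mode the rows are visible as soon as they are written and `complete` changes nothing. A client-side option like the one suggested above would amount to choosing the `mode` argument at setup time.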
