GitHub user hubcio added a comment to the discussion: Adding a possibility to notify consumers
i checked other message streaming platforms. in Kafka they have `consumer.poll(Duration)`, in redis there is `XREAD BLOCK`, in NATS there is `NextMsg(timeout)`, this is recurring pattern. i think we need to have similar API, not some WASM-like libraries built by users - these are neat, but I cannot imagine how would that look in code, since all traffic has to go through one TCP connection. Also, from users perspective it would be nightmare to use. I think we can do better. after analyzing our architecture (both the current single node server and the new VSR we're building), i think the right approach is deferred-response polling - essentially what kafka, redis, and NATS all do. the idea is simple: add wait_timeout_ms to PollMessages. value 0 = current behavior (return immediately). value > 0 = "hold my request until data arrives or timeout expires." one parameter, every SDK already has poll_messages() implemented - this is a single field addition per language. old clients that don't send it get default 0, fully backward compatible. why not push notifications? push (server sends unsolicited frames to client) would require splitting every connection handler into separate reader/writer tasks, new subscribe/unsubscribe commands, frame type discriminator in the wire protocol (breaking change), and every foreign SDK would need a frame demultiplexer with background reader task. that's weeks of work per SDK for the same wake latency. deferred-response gives sub-millisecond wakeup with zero new concepts for users. how it works server-side: the connection handler is not the message pump. each TCP connection has its own handler task, the pump processes ShardFrames sequentially via channels. we don't block the pump - we "park" the poll. when poll_messages with wait_timeout > 0 arrives and partition has no new data, the shard stores a lightweight wakeup handle in a per-partition wait registry. handler returns from process_frame immediately, pump is not blocked and continues processing other frames. when SendMessages commits to that partition, the pump sends a wakeup signal to waiting consumers (one atomic CAS per consumer, natural coalescing). the parked poll completes, client receives data. if timeout expires first, client gets an empty response (not an error, this is normal). this fits the new VSR architecture well - the notification fire point is inside commit_messages() after partition.offset.store(committed_offset). consumers only get woken after data has achieved quorum, never after prepare. view change just lets pending polls expire naturally. the response IS the data - no race between "notification arrives" and "poll for data" that you'd get with push notifications in a consumer group scenario. as for the server internals, when we build the new connection handler we should design it with split reader/writer tasks from day one - not for push notifications, but because it eliminates per-request channel allocation and the select! overhead in the read path. this makes adding push notification support later (for cluster events, rebalance signals, etc.) trivial - just one more message type on the writer task's channel. but the user-facing API stays poll_messages(wait_timeout_ms) regardless. thoughts? @numinnex @spetz GitHub link: https://github.com/apache/iggy/discussions/2854#discussioncomment-16306647 ---- This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
