[GitHub] [flink] JingsongLi commented on pull request #18412: [FLINK-25696][datastream] Introduce MetadataPublisher interface to SinkWriter

GitBox Mon, 24 Jan 2022 18:13:07 -0800


JingsongLi commented on pull request #18412:
URL: https://github.com/apache/flink/pull/18412#issuecomment-1020739383



   > From a Kafka connector side, the subscriber is not updated on every 
record, and just when the KafkaProducer is flushed it is only updated for a 
bulk of records (either during a checkpoint or if the internal buffer size is 
reached).
   > I would definitely prefer to handle the consumer in the mailbox and not by 
the Kafka thread. The Kafka threads might have surprising effects on the 
overall pipeline stability i.e. the shutdown is blocked because the producer 
cannot be stopped because it is executing the metadata consumer.
   
   Thanks for the information. Here are my concerns:
   - Performance: if we put every piece of metadata into the mailbox, 
performance will be very poor. Look at `TaskMailboxImpl.put`: it takes a lock 
for every element, which will severely limit throughput.
   - Consistency: records flushed at transaction commit time would have their 
metadata re-enqueued into the mailbox, so that metadata would belong to the 
next checkpoint rather than the current one, and TableStore needs the metadata 
of the current checkpoint.
   - Blocking: I think that, by contract, callback-style interfaces should 
execute quickly, and we can document this in comments. See 
`org.apache.kafka.clients.producer.Callback`, which likewise expects 
implementations to avoid heavy logic.
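
   The consistency concern can be illustrated with a toy, single-threaded 
model. This is only a sketch: `CallbackTimingSketch`, `ToyWriter`, and the 
queue standing in for the mailbox are hypothetical names, not Flink classes 
and not the interface proposed in this PR. It compares invoking the metadata 
callback directly during the flush with deferring it through a queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.LongConsumer;

/** Toy single-threaded model; none of these names come from Flink or the PR. */
public class CallbackTimingSketch {

    /** Hypothetical writer: buffers records and reports their offsets when flushed. */
    static final class ToyWriter {
        private final LongConsumer onOffset;
        private int buffered = 0;
        private long nextOffset = 0;

        ToyWriter(LongConsumer onOffset) {
            this.onOffset = onOffset;
        }

        void write() {
            buffered++;
        }

        void flush() {
            // Report one offset per buffered record, then the buffer is empty.
            for (; buffered > 0; buffered--) {
                onOffset.accept(nextOffset++);
            }
        }
    }

    /** Flushes at checkpoint time and returns the offsets visible at that moment. */
    static List<Long> offsetsSeenAtCheckpoint(boolean deferThroughQueue) {
        List<Long> seen = new ArrayList<>();
        Queue<Runnable> mailbox = new ArrayDeque<>(); // stand-in for a task mailbox

        LongConsumer direct = offset -> seen.add(offset);
        LongConsumer deferred = offset -> mailbox.add(() -> seen.add(offset));

        ToyWriter writer = new ToyWriter(deferThroughQueue ? deferred : direct);
        writer.write();
        writer.write();
        writer.flush(); // the flush that happens as part of the checkpoint

        // What the sink could put into the current checkpoint right now.
        List<Long> atCheckpoint = new ArrayList<>(seen);

        // In this model, queued actions only run after the checkpoint step returns.
        while (!mailbox.isEmpty()) {
            mailbox.poll().run();
        }
        return atCheckpoint;
    }

    public static void main(String[] args) {
        System.out.println("direct:   " + offsetsSeenAtCheckpoint(false));
        System.out.println("deferred: " + offsetsSeenAtCheckpoint(true));
    }
}
```

   In this model the direct callback makes both offsets visible at checkpoint 
time, while the deferred variant sees none of them until the queue is drained 
afterwards. The real mailbox ordering is more involved, so treat this as an 
illustration of the concern rather than a description of Flink's behavior.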
   
   > Regarding adding a method to the InitContext I think that is okay. Do you 
think there will be ever multiple Subscribers? Maybe it is safer to already add 
a list instead of an optional.
   
   I think we can leave it to the implementer to compose multiple subscribers 
if a list is ever needed.
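
   As a rough sketch of what that composition could look like: the class 
below fans one metadata event out to several delegates. `FanOutConsumer` is a 
hypothetical helper, and `java.util.function.Consumer` is used here only as a 
stand-in for whatever subscriber type the PR ends up with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper; not part of Flink or of the PR. */
public final class FanOutConsumer<T> implements Consumer<T> {
    private final List<Consumer<T>> delegates;

    public FanOutConsumer(List<Consumer<T>> delegates) {
        // Defensive, immutable copy so later changes to the caller's list have no effect.
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public void accept(T metadata) {
        // Forward the same metadata to every delegate, in registration order.
        for (Consumer<T> delegate : delegates) {
            delegate.accept(metadata);
        }
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        Consumer<String> a = first::add;
        Consumer<String> b = second::add;

        // The sink would register this single subscriber.
        Consumer<String> subscriber = new FanOutConsumer<>(List.of(a, b));
        subscriber.accept("partition-0@42");
        System.out.println(first + " " + second);
    }
}
```

   With something like this on the implementer's side, the context would only 
need to expose a single optional subscriber.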
   
   > I am still a bit surprised that the TableStoreSink reads all metadata 
offsets nevertheless if they are committed in Kafka or not.
   
   `TableStoreSink` reads these offsets only at preSnapshot time, where they 
are used to synchronize the full (file) data with the incremental (log) data. 
The records must already have been flushed at that point.
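
   A minimal sketch of that read pattern, under assumptions: the class and 
method names below are illustrative and are not `TableStoreSink`'s actual 
code. The callback only records the highest offset per partition, and the 
snapshot step flushes first and then copies the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; names are illustrative, not TableStoreSink's. */
public class SnapshotOffsetsSketch {
    // Highest acknowledged offset per log partition, updated by the callback.
    private final Map<Integer, Long> latestOffsets = new ConcurrentHashMap<>();

    /** Cheap callback body: a single map update, no I/O and no blocking. */
    public void onRecordAcknowledged(int partition, long offset) {
        latestOffsets.merge(partition, offset, Long::max);
    }

    /**
     * Runs the given flush action first, so every pending record has reported
     * its offset, and then returns a copy of the offsets for the snapshot.
     */
    public Map<Integer, Long> prepareSnapshot(Runnable flushLog) {
        flushLog.run();
        return new HashMap<>(latestOffsets);
    }

    public static void main(String[] args) {
        SnapshotOffsetsSketch sink = new SnapshotOffsetsSketch();
        sink.onRecordAcknowledged(0, 41L);

        // Simulated flush: the last pending record is acknowledged during it.
        Map<Integer, Long> snapshot =
                sink.prepareSnapshot(() -> sink.onRecordAcknowledged(0, 42L));
        System.out.println(snapshot);
    }
}
```

   Because the offsets are only read after the flush, intermediate values 
reported between checkpoints are never observed by the snapshot.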


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
