[ https://issues.apache.org/jira/browse/FLINK-20972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
huajiewang updated FLINK-20972:
-------------------------------
    Description:
When TwoPhaseCommitSinkFunction triggers notifyCheckpointComplete, a large amount of EventData may be written to the log (LOG.info), which can cause an I/O bottleneck and waste disk space.

My code is in the attachment. Flink writes a large amount of event data to its log, for example:

Jdbc2PCSinkFunction 1/1 - checkpoint 4 complete, committing transaction TransactionHolder{handle=Transaction(b420c880a951403984f231dd7e33597b, ListBuffer(insert into table(field1,field2) value ('11','22') ... ... ), transactionStartTime=1610426158532} from checkpoint 4

The LOG.info call in the notifyCheckpointComplete method of TwoPhaseCommitSinkFunction is:

    LOG.info("{} - checkpoint {} complete, committing transaction {} from checkpoint {}",
            name(), checkpointId, pendingTransaction, pendingTransactionCheckpointId);

This calls the toString method of pendingTransaction (a TransactionHolder), which is:

    @Override
    public String toString() {
        return "TransactionHolder{"
                + "handle="
                + handle
                + ", transactionStartTime="
                + transactionStartTime
                + '}';
    }

handle is my concrete Transaction implementation. My Transaction has a field of List type that buffers the incoming data, so all of that data is printed by LOG.info.

  was:
When TwoPhaseCommitSinkFunction triggers notifyCheckpointComplete, a large amount of EventData may be written to the log (LOG.info), which can cause an I/O bottleneck and waste disk space.

Jdbc2PCSinkFunction 1/1 - checkpoint 4 complete, committing transaction TransactionHolder{handle=Transaction(b420c880a951403984f231dd7e33597b, ListBuffer(insert into table(field1,field2) value ('11','22') ... ...
), transactionStartTime=1610426158532} from checkpoint 4


> TwoPhaseCommitSinkFunction Output a large amount of EventData
> -------------------------------------------------------------
>
>                 Key: FLINK-20972
>                 URL: https://issues.apache.org/jira/browse/FLINK-20972
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>    Affects Versions: 1.12.0
>         Environment: flink 1.4.0 +
>            Reporter: huajiewang
>            Priority: Minor
>              Labels: easyfix
>         Attachments: Jdbc2PCSinkFunction.scala
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When TwoPhaseCommitSinkFunction triggers notifyCheckpointComplete, a large
> amount of EventData may be written to the log (LOG.info), which can cause
> an I/O bottleneck and waste disk space.
>
> My code is in the attachment. Flink writes a large amount of event data to
> its log, for example:
>
> Jdbc2PCSinkFunction 1/1 - checkpoint 4 complete, committing transaction
> TransactionHolder{handle=Transaction(b420c880a951403984f231dd7e33597b,
> ListBuffer(insert into table(field1,field2) value ('11','22') ... ... ),
> transactionStartTime=1610426158532} from checkpoint 4
>
> The LOG.info call in the notifyCheckpointComplete method of
> TwoPhaseCommitSinkFunction is:
>
>     LOG.info("{} - checkpoint {} complete, committing transaction {} from checkpoint {}",
>             name(), checkpointId, pendingTransaction, pendingTransactionCheckpointId);
>
> This calls the toString method of pendingTransaction (a TransactionHolder),
> which is:
>
>     @Override
>     public String toString() {
>         return "TransactionHolder{"
>                 + "handle="
>                 + handle
>                 + ", transactionStartTime="
>                 + transactionStartTime
>                 + '}';
>     }
>
> handle is my concrete Transaction implementation. My Transaction has a field
> of List type that buffers the incoming data, so all of that data is printed
> by LOG.info.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
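
Because TransactionHolder.toString() concatenates the handle, whatever the handle's own toString() returns ends up in the log line. One possible user-side workaround, sketched here under the assumption that the handle is a user-defined class, is to override toString() so that it reports only the transaction id and the number of buffered statements. The class BufferedTransaction and its members are hypothetical names chosen for illustration; they are not taken from the attached Jdbc2PCSinkFunction.scala or from Flink.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical transaction handle (illustrative only, not the attachment's class).
 * It buffers SQL statements but keeps them out of its toString() output.
 */
final class BufferedTransaction {
    private final String id;
    private final List<String> statements = new ArrayList<>();

    BufferedTransaction(String id) {
        this.id = id;
    }

    /** Buffers one statement until the transaction is committed. */
    void add(String statement) {
        statements.add(statement);
    }

    /** Exposes the buffered statements for the commit step. */
    List<String> statements() {
        return statements;
    }

    /** Prints a summary (id and statement count) instead of the buffered data. */
    @Override
    public String toString() {
        return "BufferedTransaction{id=" + id
                + ", bufferedStatements=" + statements.size()
                + '}';
    }
}
```

With a handle like this, the "committing transaction" log line would carry a short summary instead of every buffered statement. Whether Flink itself should log less at this point is the separate question this issue raises.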