[ 
https://issues.apache.org/jira/browse/FLINK-20972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huajiewang updated FLINK-20972:
-------------------------------
    Description: 
When TwoPhaseCommitSinkFunction triggers notifyCheckpointComplete, a large 
amount of event data may be written to the log (LOG.info), which can cause an 
I/O bottleneck and waste disk space.

 
My code is in the attachment. A large amount of event data appears in the log 
output by Flink, for example:

Jdbc2PCSinkFunction 1/1 - checkpoint 4 complete, committing transaction 
TransactionHolder{handle=Transaction(b420c880a951403984f231dd7e33597b, 
ListBuffer(insert into table(field1,field2) value ('11','22') ... ... ), 
transactionStartTime=1610426158532} from checkpoint 4

 

 

The relevant LOG.info call in the notifyCheckpointComplete method of 
TwoPhaseCommitSinkFunction is:

LOG.info(
 "{} - checkpoint {} complete, committing transaction {} from checkpoint {}",
 name(), checkpointId, pendingTransaction, pendingTransactionCheckpointId);

 
This calls the toString method of pendingTransaction (a TransactionHolder). 
The toString method of TransactionHolder is:
@Override
public String toString() {
 return "TransactionHolder{"
 + "handle="
 + handle
 + ", transactionStartTime="
 + transactionStartTime
 + '}';
}
 
handle is the concrete implementation of my Transaction. My Transaction has a 
field of List type that buffers the received data, so toString prints all of 
that data into the log (LOG.info).
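
A possible user-side workaround, sketched under assumptions: because the holder's toString simply appends the handle, a transaction class whose own toString summarizes the buffer (id and element count) keeps the log line small. The classes below are hypothetical stand-ins and not Flink code. TransactionHolder only mirrors the toString quoted above, and SqlTransaction, bufferedStatements, and describe are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactTransactionLogging {

    // Stand-in that mirrors the toString quoted above; NOT the real Flink class.
    static final class TransactionHolder<T> {
        private final T handle;
        private final long transactionStartTime;

        TransactionHolder(T handle, long transactionStartTime) {
            this.handle = handle;
            this.transactionStartTime = transactionStartTime;
        }

        @Override
        public String toString() {
            return "TransactionHolder{"
                    + "handle="
                    + handle
                    + ", transactionStartTime="
                    + transactionStartTime
                    + '}';
        }
    }

    // Hypothetical user transaction that buffers SQL statements until commit.
    static final class SqlTransaction {
        private final String transactionId;
        private final List<String> bufferedStatements = new ArrayList<>();

        SqlTransaction(String transactionId) {
            this.transactionId = transactionId;
        }

        void add(String sql) {
            bufferedStatements.add(sql);
        }

        // Report only the id and the buffer size instead of dumping every statement.
        @Override
        public String toString() {
            return "SqlTransaction{id=" + transactionId
                    + ", bufferedStatements=" + bufferedStatements.size() + '}';
        }
    }

    // Builds a holder around a transaction with the given number of buffered
    // statements and returns the text a "{}" log placeholder would render.
    static String describe(int statementCount) {
        SqlTransaction tx = new SqlTransaction("b420c880a951403984f231dd7e33597b");
        for (int i = 0; i < statementCount; i++) {
            tx.add("insert into table(field1,field2) value ('11','22')");
        }
        return new TransactionHolder<>(tx, 1610426158532L).toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(10_000));
    }
}
```

With this kind of toString the rendered text has the same length no matter how many statements are buffered. It does not change the LOG.info call inside TwoPhaseCommitSinkFunction, which is what this issue is about.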
 
 

 

  was:
When TwoPhaseCommitSinkFunction triggers notifyCheckpointComplete, a large 
amount of event data may be written to the log (LOG.info), which can cause an 
I/O bottleneck and waste disk space.

 

 

Jdbc2PCSinkFunction 1/1 - checkpoint 4 complete, committing transaction 
TransactionHolder{handle=Transaction(b420c880a951403984f231dd7e33597b,
ListBuffer(insert into table(field1,field2) value ('11','22') ... ... ), 
transactionStartTime=1610426158532} from checkpoint 4

 


> TwoPhaseCommitSinkFunction Output a large amount of EventData
> -------------------------------------------------------------
>
>                 Key: FLINK-20972
>                 URL: https://issues.apache.org/jira/browse/FLINK-20972
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>    Affects Versions: 1.12.0
>         Environment: flink 1.4.0 +
>            Reporter: huajiewang
>            Priority: Minor
>              Labels: easyfix
>         Attachments: Jdbc2PCSinkFunction.scala
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
