[
https://issues.apache.org/jira/browse/FLINK-20972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265817#comment-17265817
]
huajiewang edited comment on FLINK-20972 at 1/15/21, 8:39 AM:
--------------------------------------------------------------
[~gaoyunhaii]
My code is just an example,What you said makes sense, but in my opinion, this
is a notification message output by the flink framework to tell the user which
batch of checkpoint completed, but currently there are no requirements and
restrictions for the Transaction type, so that users can Free definition, a
little carelessness will cause this problem, unless the user is very familiar
with the processing logic of this code, in order to effectively avoid this
problem, about this Transaction class, if the flink output information when the
checkpoint is completed requires user participation, then Flink can completely
define an interface type (Transaction), allowing users to implement this
interface. So I think this is the issue of Flink
was (Author: benjobs):
[~gaoyunhaii] What you said is reasonable, but in my opinion, this is a
notification message output from the internal Flink framework, which is used to
tell the user which batch of checkpoint completed. However, at present, there
are no requirements and restrictions for the transaction type, so that the user
can freely define it. A little carelessness will cause this problem, unless the
user is very familiar with the processing logic of this code, In order to
effectively avoid this problem, regarding the transaction class, if the user is
required to participate in the output information when the checkpoint is
completed, the Flink can completely define an interface type (Transaction) for
the user to implement the interface. Therefore, I think this is the issue of
Flink
> TwoPhaseCommitSinkFunction Output a large amount of EventData
> -------------------------------------------------------------
>
> Key: FLINK-20972
> URL: https://issues.apache.org/jira/browse/FLINK-20972
> Project: Flink
> Issue Type: Improvement
> Components: API / DataStream
> Affects Versions: 1.12.0
> Environment: flink 1.4.0 +
> Reporter: huajiewang
> Priority: Minor
> Labels: easyfix, pull-request-available
> Attachments: 1610682498960.jpg, 1610682603148.jpg,
> Jdbc2PCSinkFunction.scala
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> in TwoPhaseCommitSinkFunctionOutput Maybe A large number of EventData will be
> output(log.info),which will cause IO bottleneck and disk waste.
>
> my code in the attachment, A large number event data output in the log
> output by flink , e.g:
> {code:java}
> Jdbc2PCSinkFunction 1/1 - checkpoint 4 complete, committing transaction
> TransactionHolde {handle=Transaction(b420c880a951403984f231dd7e33597b,
> ListBuffer(insert into table(field1,field2) value ('11','22') ... ... ),
> transactionStartTime=1610426158532} from checkpoint 4{code}
> in TwoPhaseCommitSinkFunction about LOG.info code is as follows:
> !1610682498960.jpg|width=838,height=630!
> {code:java}
> LOG.info(
> "{} - checkpoint {} complete, committing transaction {} from
> checkpoint {}",
> name(),
> checkpointId,
> pendingTransaction,
> pendingTransactionCheckpointId); {code}
> will be invoke pendingTransaction'toString method (pendingTransaction is
> TransactionHolder'instance)
> TransactionHolder'toString method code is:
> !1610682603148.jpg|width=859,height=327!
> {code:java}
> @Override
> public String toString() {
> return "TransactionHolder{"
> + "handle="
> + handle
> + ", transactionStartTime="
> + transactionStartTime
> + '}';
> }{code}
> handle is the concrete realization of my Transaction! There is a parameter
> of List type in my Transaction, which is used to receive data. as a result,
> these data are printed out(log.info)
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)