Im-Manshushu commented on issue #11258:
URL: https://github.com/apache/doris/issues/11258#issuecomment-1198922608
Once the feature has been developed, a reference threshold can be given to
users. Users can then set the sink concurrency and checkpoint interval
according to their scenario, for example data volume and timeliness
requirements.
------------------ Original message ------------------
From:
"apache/doris"
***@***.***>;
Sent: Friday, July 29, 2022, 11:47 AM
***@***.***>;
Cc: "I'm ~ ***@***.******@***.***>;
Subject: Re: [apache/doris] [Feature] JSON data is dynamically written to
the Doris table (Issue #11258)
Many users put the canal logs of every table in a business database into a
single topic, which has to be split by table before they can use
doris-flink-connector. The idea here is to write a single job that
synchronizes the entire database. Currently, doris-flink-connector uses an
HTTP input stream: each checkpoint opens one stream, and that stream is
strongly bound to a single Stream Load URL. The current doris-flink-connector
architecture is therefore not suitable for whole-database synchronization,
because it would involve too many long-lived HTTP connections. In that case we
can only use the old Stream Load batch mode: the Flink side caches the data,
each table gets its own buffer bound to that table's Stream Load URL, and a
threshold such as row count or batch size triggers the submission, just like
doris-datax-writer.
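
To make the batch mode above concrete, here is a minimal sketch of per-table buffering with row-count and byte-size thresholds. It is not the connector's code: the class name `PerTableBatcher`, the `flush` callback (standing in for one Stream Load request against a table's URL), and the default thresholds are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PerTableBatcher:
    """Buffers rows per table and flushes one table's buffer when a threshold is hit."""

    def __init__(self, flush: Callable[[str, List[bytes]], None],
                 max_rows: int = 1000, max_bytes: int = 1 << 20) -> None:
        # `flush` is a hypothetical sender, e.g. one Stream Load request per call.
        self.flush = flush
        self.max_rows = max_rows
        self.max_bytes = max_bytes
        self.rows: Dict[str, List[bytes]] = defaultdict(list)
        self.sizes: Dict[str, int] = defaultdict(int)

    def add(self, table: str, row: bytes) -> None:
        # Route the row to its table's buffer, then check both thresholds.
        self.rows[table].append(row)
        self.sizes[table] += len(row)
        if (len(self.rows[table]) >= self.max_rows
                or self.sizes[table] >= self.max_bytes):
            self.flush_table(table)

    def flush_table(self, table: str) -> None:
        # Remove the buffer first so a new batch can start accumulating.
        batch = self.rows.pop(table, [])
        self.sizes.pop(table, None)
        if batch:
            self.flush(table, batch)

    def flush_all(self) -> None:
        # Typically called on a timer or at checkpoint time to drain small buffers.
        for table in list(self.rows):
            self.flush_table(table)
```

Because every table keeps its own buffer, no long-lived HTTP connection per table is needed; a request is only issued when a buffer is flushed.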
However, the old Stream Load batch-writing mode may have several problems:
- A badly chosen cached batch size causes a series of problems. If it is too
  small, frequent imports trigger the -235 error; if it is too large, Flink
  memory comes under pressure.
- It does not guarantee exactly-once semantics.
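
The batch-size trade-off above can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes every table receives data at the same steady rate and that a buffer is flushed only when it reaches the size threshold. The function name and parameters are hypothetical.

```python
def estimate_batch_tradeoff(tables: int, bytes_per_sec_per_table: float,
                            batch_bytes: int) -> tuple:
    """Return (worst-case buffered bytes, total imports per second)."""
    # Memory pressure: every table may hold an almost-full buffer at once.
    worst_case_buffer_bytes = tables * batch_bytes
    # Import frequency: each table flushes once per batch_bytes of input.
    loads_per_sec = tables * bytes_per_sec_per_table / batch_bytes
    return worst_case_buffer_bytes, loads_per_sec


# 100 tables at 1 MB/s each with 10 MB batches:
# up to 1 GB buffered in Flink and about 10 imports per second overall.
print(estimate_batch_tradeoff(100, 1_000_000, 10_000_000))
```

Shrinking `batch_bytes` lowers the memory bound but raises the import rate in proportion, which is the tension described in the first problem.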
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]