On 22.09.20 11:10, Guowei Ma wrote:
1. I think maybe we could add a EOI interface to the `GlobalCommit`. So
that we could make `write success file` be available in both batch and
stream execution mode.
We could, yes. I'm now hesitant because we're adding more things but I
think it should be fine.
2. If we choose to let the two types of committer appear at the same time
in the API we have to figure out how to express the relation between the
two committers. I think the Sink API may look like the following: What do
you think?
Sink<T, CommT, CommR, ShareStateT...> {
Writer<T, CommT, ShareStateT.....> createWriter();
Optional<Committer<CommT>> createCommitter();
Optional<GlobalComiitter<CommT, GlobalCommT>>
createGlobalCommitter();
}
Yes, I think this is what we should do. Though I think that we should
initially not support shared state. The FileSink only uses this to
create unique file names and I think we can do without it. If we see
that we need it later we can add it but I would like to keep things
minimal initially. It's always easy to add things later but it's hard to
take things away once you added them.
3. Maybe a silly question: Why do we need `commit` return `CommitResult`? I
think the sink developer could rety himself. Why need the framework to do
the retry?
It's not a silly question at all! I think we need the retry to support
such problems as Steven mentioned. If a commit fails a RETRY tells the
framework that it should keep the commits in state and retry them on the
next checkpoint. When the committer returns FAILURE we should just fail
the job. It's to support temporary outages of the external metastore.
I'm open to leaving it out of the initial version for the same reasons I
mentioned above but I think it could be valuable.
Best,
Aljoscha