On 22.09.20 11:10, Guowei Ma wrote:
1. I think maybe we could add a EOI interface to the `GlobalCommit`. So
that we could make `write success file` be available in both batch and
stream execution mode.

We could, yes. I'm now hesitant because we're adding more things but I think it should be fine.

2.  If we choose to let the two types of committer appear at the same time
in the API we have to figure out how to express the relation between the
two committers. I think the Sink API may look like the following: What do
you think?
Sink<T, CommT, CommR, ShareStateT...> {
         Writer<T, CommT, ShareStateT.....> createWriter();
         Optional<Committer<CommT>> createCommitter();
         Optional<GlobalComiitter<CommT, GlobalCommT>>
createGlobalCommitter();
}

Yes, I think this is what we should do. Though I think that we should initially not support shared state. The FileSink only uses this to create unique file names and I think we can do without it. If we see that we need it later we can add it but I would like to keep things minimal initially. It's always easy to add things later but it's hard to take things away once you added them.

3. Maybe a silly question: Why do we need `commit` return `CommitResult`? I
think the sink developer could rety himself. Why need the framework to do
the retry?

It's not a silly question at all! I think we need the retry to support such problems as Steven mentioned. If a commit fails a RETRY tells the framework that it should keep the commits in state and retry them on the next checkpoint. When the committer returns FAILURE we should just fail the job. It's to support temporary outages of the external metastore.

I'm open to leaving it out of the initial version for the same reasons I mentioned above but I think it could be valuable.

Best,
Aljoscha

Reply via email to