Alright, let's do the locking then :) Let's keep only one interface for the release.
On Sun, May 31, 2015 at 12:58 PM, Márton Balassi <balassi.mar...@gmail.com> wrote: > I am also for having only one source interface. It seems that > interruptability is to much of a burden on the sources, locking version > should be still acceptable from the user point of view. We are dealing with > inherently concurrent tasks, I suppose our users are familiar with locking > - especially the ones in need for exactly once processing. > > On Sat, May 30, 2015 at 2:44 AM, Aljoscha Krettek <aljos...@apache.org> > wrote: > > > I would also prefer having only one source. The PR still has both > > variants so that people can check them out. > > > > In my opinion the assumptions about interruptibility are easier to > > break than the requirement of locking. Even if we get the kafka source > > to work with the interruptions (which I doubt, because this fails > > somewhere in their code) this would not guarantee that this will > > always work in future versions. With the locking you either have the > > locking, then it is correct (even for feature versions) or you don't, > > then it is immediately incorrect. > > > > On Fri, May 29, 2015 at 10:56 PM, Gyula Fóra <gyula.f...@gmail.com> > wrote: > > > Hey, > > > > > > It seems like both interfaces are pretty much capable of doing the same > > > thing but work on slightly different assumptions. > > > > > > Isn't there a way that the kafka source can work with the > interruptions? > > I > > > think the reachedEnd/next interface is slightly easier to grasp than > the > > > run() with the locks. But in any case I would slightly prefer having > only > > > one of them if they can technically do the same thing. > > > > > > Also adding a new interface means we add a new streamtask > implementation > > > which is also getting slightly too much. > > > > > > What is you opinion on this? > > > > > > Gyula > > > > > > > > > > > > On Fri, May 29, 2015 at 6:51 PM, Aljoscha Krettek <aljos...@apache.org > > > > > wrote: > > > > > >> Hi All, > > >> after finishing my pull request that should fix the problems with the > > >> synchronisation of checkpoints and element emission (the reason for > > >> the faulty results of the exactly-once tests) I discovered that the > > >> Kafka source does not deal well with being interrupted. We recently > > >> changed the SourceFunction to the reachedEnd()/next() interface, with > > >> the contract that the source must be interruptible to be able to > > >> perform checkpoints. Now this doesn't seem to work with Kafka. I added > > >> another Source interface in my PR > > >> (https://github.com/apache/flink/pull/742). This is similar to the > old > > >> interface of run()/cancel(), with the addition that the source must > > >> acquire a lock before updating state and emitting elements. The update > > >> of state and the emission of elements must happen in the same > > >> synchronized block to ensure consistency. This seems to solve the > > >> problem but now we have two source interfaces. > > >> > > >> The question is now. What do you think about the two interfaces? > > >> Should we keep both? Remove one? > > >> > > >> Cheers, > > >> Aljoscha > > >> > > >