Yakov,

We can introduce several modes:

1) Initial loading, which either replaces existing data with the initial
version (allowOverwrite=true) or leaves it as is (allowOverwrite=false), and
requires an exclusive table lock (fastest);
2) Continuous loading, which has its own version and applies the data the way
a regular transaction does (allowOverwrite=true) or leaves existing data as is
(allowOverwrite=false). It doesn't affect concurrent readers but still
requires a write lock on the table (slower than mode 1);
3) Batch loading, which acts as a sequence of regular transactions with all
possible optimizations and applies the data the way a regular transaction
does (allowOverwrite=true) or leaves existing data as is
(allowOverwrite=false). It doesn't affect concurrent readers or writers and
doesn't cause write conflicts (like READ_COMMITTED transactions), but it may
run into lock conflicts that require retries (slowest).

All the modes require table locks.
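To make the comparison concrete, here is a rough sketch of the three modes as a Java enum plus the allowOverwrite semantics they share. Everything here is hypothetical: `StreamerModes`, `LoadMode`, `apply()` and the flags are illustrative names, not Ignite API. The only behavior modelled is what is stated above (who can work concurrently, relative speed) and the overwrite rule: `true` replaces an existing value, `false` keeps it and only inserts missing keys.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the three proposed streamer modes.
 * None of these names exist in Apache Ignite; they only restate the proposal.
 */
public class StreamerModes {

    enum LoadMode {
        // 1) Initial loading: exclusive table lock, so no concurrent readers or writers; fastest.
        INITIAL(false, false, 1),
        // 2) Continuous loading: write lock, concurrent readers are not affected.
        CONTINUOUS(true, false, 2),
        // 3) Batch loading: sequence of regular txs; readers and writers are not
        //    affected, but lock conflicts may force retries; slowest.
        BATCH(true, true, 3);

        final boolean allowsConcurrentReaders;
        final boolean allowsConcurrentWriters;
        final int speedRank; // 1 = fastest

        LoadMode(boolean readers, boolean writers, int speedRank) {
            this.allowsConcurrentReaders = readers;
            this.allowsConcurrentWriters = writers;
            this.speedRank = speedRank;
        }
    }

    /**
     * allowOverwrite semantics common to all modes (hypothetical helper):
     * true replaces existing values, false leaves existing keys as is.
     */
    static <K, V> void apply(Map<K, V> table, Map<K, V> batch, boolean allowOverwrite) {
        for (Map.Entry<K, V> e : batch.entrySet()) {
            if (allowOverwrite) {
                table.put(e.getKey(), e.getValue());
            } else {
                table.putIfAbsent(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>(Map.of("a", 1));
        apply(table, Map.of("a", 10, "b", 2), false);
        System.out.println(table.get("a") + " " + table.get("b")); // prints "1 2"
        apply(table, Map.of("a", 10), true);
        System.out.println(table.get("a")); // prints "10"
    }
}
```

The point of the sketch is that allowOverwrite is orthogonal to the mode: the modes differ only in locking and concurrency, so a single flag can be applied uniformly on top of any of them.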

Your thoughts?

> On 9 July 2018, at 12:55, Yakov Zhdanov <yzhda...@apache.org> wrote:
> 
> Igor,
> 
> I can't say whether I agree with any of the suggestions. I would like us to
> start by answering the question: what is the data streamer used for?
> 
> First of all, for initial data loading. This should be a super-fast mode,
> probably ignoring all transactional semantics, but providing certain
> guarantees that the data passed into the streamer gets loaded.
> 
> Second, for continuously streaming updates into some tables (from more than
> one streamer) and running analytics over that data, possibly with some
> modifications from the non-streamer side (user transactions). In this case
> streamers should not roll back user transactions or play any unexpected
> visibility tricks. I think we can consider a proper streamer transaction at
> the batch or key level.
> 
> The third case I see is a combination of the above: we stream portions of
> data into an existing table, say, once a day (for example, market data after
> the close or an offloaded operational data set), with or without other
> concurrent non-streamer operations. This mode may involve table locks or
> behave like the second mode; that should be up to the user to decide.
> 
> So, the planned changes to the streamer should support at least these three
> scenarios. What do you think?
> 
> Igniters, feel free to share your thoughts on this. The question is pretty
> important for us.
> 
> --Yakov
