Yakov,

We can introduce several modes:

1) Initial loading, which replaces existing data with the initial version (allowOverwrite=true) or leaves it as is (allowOverwrite=false). It requires an exclusive table lock. This is the fastest mode.

2) Continuous loading, which has its own version and links the data as a regular transaction does (allowOverwrite=true) or leaves it as is (allowOverwrite=false). It doesn't affect concurrent readers but still requires a write lock on the table. It is slower than mode 1.

3) Batch loading, which acts as a sequence of regular transactions with all possible optimizations. It doesn't affect concurrent readers or writers, but may hit lock conflicts followed by retries. It links the data as a regular transaction does (allowOverwrite=true) or leaves it as is (allowOverwrite=false), and doesn't cause write conflicts (like READ_COMMITTED txs). This is the slowest mode.

All the modes require table locks.

Your thoughts?

> On July 9, 2018, at 12:55, Yakov Zhdanov <yzhda...@apache.org> wrote:
>
> Igor,
>
> I can't say if I agree with any of the suggestions. I would like us to
> start by answering the question - what is the data streamer used for?
>
> First of all, for initial data loading. This should be a super fast mode,
> probably ignoring all transactional semantics, but providing certain
> guarantees that data passed into the streamer will be loaded.
>
> Second, for continuously streaming updates to some tables (from more than 1
> streamer) and running some analytics over the data, probably with some
> modifications from the non-streamer side (user transactions). This way
> streamers should not roll back user txs or do any kind of unexpected
> visibility tricks. I think we can think of a proper streamer tx on batch or
> key level.
>
> The third case I see is a combination of the above - we stream portions of
> data to an existing table, let's say once a day (which may be some market
> data after closing or an offloaded operations data set), with or without any
> other concurrent non-streamer operations. This mode may involve table locks
> or do the same as the 2nd mode, which should be up to the user to decide.
>
> So, planned changes to the streamer should support at least these 3
> scenarios. What do you think?
>
> Igniters, feel free to share your thoughts on this. The question is pretty
> important for us.
>
> --Yakov
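
To make the three proposed modes easier to compare, here is a rough sketch of them as a Java enum. This is not an existing Ignite API: StreamerMode, TableLock, ExistingKeyAction and all field names are hypothetical and only illustrate the proposal. Two things are my assumptions rather than statements from the proposal: that the exclusive lock in initial loading blocks concurrent readers, and the kind of table lock batch loading takes, which the proposal does not specify (hence UNSPECIFIED).

```java
/** Kind of table lock a mode takes (hypothetical names). */
enum TableLock { EXCLUSIVE, WRITE, UNSPECIFIED }

/** What happens to a key that already exists in the table (hypothetical names). */
enum ExistingKeyAction { REPLACE_WITH_INITIAL_VERSION, LINK_AS_REGULAR_TX, LEAVE_AS_IS }

/** Summary of the three proposed streamer modes; a sketch, not an Ignite API. */
enum StreamerMode {
    // Exclusive lock; blocking readers is inferred from the lock being exclusive.
    INITIAL_LOADING(TableLock.EXCLUSIVE, true, true, 1),
    // Readers are unaffected, but a write lock on the table is still taken.
    CONTINUOUS_LOADING(TableLock.WRITE, false, true, 2),
    // Readers and writers are unaffected; the lock kind is not specified in the proposal.
    BATCH_LOADING(TableLock.UNSPECIFIED, false, false, 3);

    final TableLock lock;
    final boolean blocksReaders;
    final boolean blocksWriters;
    /** Relative speed: 1 is the fastest mode, 3 is the slowest. */
    final int speedRank;

    StreamerMode(TableLock lock, boolean blocksReaders, boolean blocksWriters, int speedRank) {
        this.lock = lock;
        this.blocksReaders = blocksReaders;
        this.blocksWriters = blocksWriters;
        this.speedRank = speedRank;
    }

    /** With allowOverwrite=false every mode leaves existing data as is. */
    ExistingKeyAction onExistingKey(boolean allowOverwrite) {
        if (!allowOverwrite)
            return ExistingKeyAction.LEAVE_AS_IS;

        return this == INITIAL_LOADING
            ? ExistingKeyAction.REPLACE_WITH_INITIAL_VERSION
            : ExistingKeyAction.LINK_AS_REGULAR_TX;
    }
}

final class StreamerModesDemo {
    public static void main(String[] args) {
        // Print one line per mode so the trade-offs can be compared side by side.
        for (StreamerMode mode : StreamerMode.values()) {
            System.out.println(mode
                + " lock=" + mode.lock
                + " blocksReaders=" + mode.blocksReaders
                + " blocksWriters=" + mode.blocksWriters
                + " speedRank=" + mode.speedRank
                + " onExistingKey(true)=" + mode.onExistingKey(true));
        }
    }
}
```

If we go this way, the mode could become a single streamer setting next to allowOverwrite, so the user picks the speed and isolation trade-off explicitly.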