Bypassing the WAL would leave all of the cache's data vulnerable to complete loss if a node fails. I would not do this automatically.

On Mon, Jul 16, 2018 at 12:28 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:

> Hello!
>
> Can we also bypass WAL for such a mode automatically?
>
> However, we will definitely need a 'normal' mode of DataStreamer
> operation, for people who use DataStreamer with custom stream
> transformers on existing data in use.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
> 2018-07-14 12:33 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>
> > Igniters,
> >
> > Denis is right - please pay attention to IEP-22, as this is how we
> > are going to load data into the grid in future. Note that current
> > data streamer internals are not efficient enough, primarily because
> > it has to interact with page memory, free lists and various BTrees
> > in the regular manner. I think that when IEP-22 is implemented, it
> > will be tightly integrated with the data streamer, and the default
> > way to load data would be:
> > 1) Obtain exclusive table lock
> > 2) Load data bypassing almost all Ignite internals
> > 3) Re-build indexes
> > 4) Release the lock
> >
> > Normally all types of data load should obey transactional semantics
> > if MVCC is enabled, and we should think separately about how to do
> > that for the continuous-streaming case.
> >
> > For now let's focus on the immediate goal for the MVCC release: the
> > data streamer should work, and no new abstractions or APIs should be
> > introduced. The easiest way to do this is to agree that the streamer
> > is not transactional and use a special version, as Igor proposed. In
> > future releases, when IEP-22 is implemented, it will become
> > transactional with the help of the exclusive table lock. In more
> > distant releases we will think about separate optimizations for
> > continuous streaming and possibly other cases.
> >
> > Makes sense?
> >
> > Vladimir.
> >
> > On Fri, Jul 13, 2018 at 11:30 PM Denis Magda <dma...@apache.org> wrote:
> >
> > > Agree that initial loading and real-time streaming should be seen
> > > as different use cases.
> > >
> > > For the loading part, I would borrow ideas from the direct data
> > > load IEP [1]. Ignite should assume that no app works with the
> > > cluster until it's preloaded. So, no global locks or things like
> > > that. Just fasten your seat belt and feed data to your nodes.
> > >
> > > For the streaming part, I would consider 2 or 3 proposed by Igor.
> > >
> > > --
> > > Denis
> > >
> > > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
> > >
> > > On Fri, Jul 13, 2018 at 10:03 AM Seliverstov Igor <gvvinbl...@gmail.com> wrote:
> > >
> > > > Ivan,
> > > >
> > > > Anyway, DataStreamer is the fastest way to deliver data to a
> > > > data node; the question is how to apply it correctly.
> > > >
> > > > I don't think we need one more tool which is 90% the same as
> > > > DataStreamer.
> > > >
> > > > All we need is just to implement a couple of new stream
> > > > receivers.
> > > >
> > > > Regards,
> > > > Igor
> > > >
> > > > On July 13, 2018, at 9:56, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > > >
> > > > > Hi Igniters,
> > > > >
> > > > > I had a look into IgniteDataStreamer. As far as I understand,
> > > > > currently it just works incorrectly for MVCC tables. It
> > > > > appears to be a blocker for releasing MVCC. The simplest thing
> > > > > is to refuse to create a streamer for MVCC tables.
> > > > >
> > > > > The next step could be a finer split of the related use cases.
> > > > > For me, initial load and continuous streaming look like quite
> > > > > different cases and it is better to keep them separate at
> > > > > least at the API level. Perhaps it is better to separate the
> > > > > API based on user experience. For example, DataStreamer could
> > > > > be considered a tool without surprises (which means always
> > > > > leaving data consistent, transactional). And let's say
> > > > > BulkLoader is a beast for the fastest data loading but full of
> > > > > surprises. Such surprises could be locking tables, rolling
> > > > > back user transactions and so on. So, it is of very limited
> > > > > use (like initial load). Keeping API entities separate looks
> > > > > better to me than introducing multiple modes, because
> > > > > separate entities are easier to understand and so less prone
> > > > > to user mistakes.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
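
For readers following the thread, the four-step load that Vladimir outlines (exclusive lock, load, rebuild indexes, release) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `BulkLoadSketch`, `Table`, `put`, `bulkLoad` and `keysByValue` are hypothetical names invented for illustration, not Ignite APIs, and a `HashMap` plus a `TreeMap` stand in for page memory and a BTree index. It does not model WAL, MVCC or distribution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model only: none of these names are Ignite APIs.
class BulkLoadSketch {
    /** Hypothetical table: primary storage plus one secondary index on the value. */
    static final class Table {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Map<Integer, String> rows = new HashMap<>();
        private final TreeMap<String, List<Integer>> valueIndex = new TreeMap<>();

        /** Regular path: every single write also maintains the index. */
        void put(int key, String val) {
            lock.writeLock().lock();
            try {
                String old = rows.put(key, val);
                if (old != null) {
                    // Drop the stale index entry for the overwritten value.
                    List<Integer> keys = valueIndex.get(old);
                    keys.remove(Integer.valueOf(key));
                    if (keys.isEmpty()) valueIndex.remove(old);
                }
                valueIndex.computeIfAbsent(val, v -> new ArrayList<>()).add(key);
            } finally {
                lock.writeLock().unlock();
            }
        }

        /** Bulk path: the four steps from the thread, in order. */
        void bulkLoad(Map<Integer, String> data) {
            lock.writeLock().lock();                 // 1) obtain exclusive table lock
            try {
                rows.putAll(data);                   // 2) load rows, skipping per-row index upkeep
                valueIndex.clear();                  // 3) rebuild the index from scratch
                for (Map.Entry<Integer, String> e : rows.entrySet()) {
                    valueIndex.computeIfAbsent(e.getValue(), v -> new ArrayList<>()).add(e.getKey());
                }
            } finally {
                lock.writeLock().unlock();           // 4) release the lock
            }
        }

        /** Index lookup under the shared lock; keys are returned in ascending order. */
        List<Integer> keysByValue(String val) {
            lock.readLock().lock();
            try {
                List<Integer> keys = new ArrayList<>(valueIndex.getOrDefault(val, Collections.emptyList()));
                Collections.sort(keys);
                return keys;
            } finally {
                lock.readLock().unlock();
            }
        }

        int size() {
            lock.readLock().lock();
            try {
                return rows.size();
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.put(1, "a");                           // regular, index-maintaining write
        Map<Integer, String> batch = new HashMap<>();
        batch.put(1, "b");
        batch.put(2, "a");
        batch.put(3, "b");
        table.bulkLoad(batch);                       // bulk path overwrites key 1
        System.out.println(table.keysByValue("b"));
    }
}
```

In this model readers and regular writers are blocked for the whole duration of `bulkLoad`, which is the kind of "surprise" Ivan describes for a bulk loader, while the per-row `put` path stays consistent at every step but pays the index cost on each write.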