Re: MVCC and IgniteDataStreamer

Vladimir Ozerov Sat, 14 Jul 2018 02:33:51 -0700

Igniters,

Denis is right: please pay attention to IEP-22, as this is how we are
going to load data into the grid in the future. Note that the current data
streamer internals are not efficient enough, primarily because they have to
interact with page memory, free lists and various BTrees in the regular
manner. I think that once IEP-22 is implemented, it will be tightly
integrated with the data streamer, and the default way to load data would be:
1) Obtain an exclusive table lock
2) Load data bypassing almost all Ignite internals
3) Rebuild indexes
4) Release the lock
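The four steps above can be illustrated with a toy, single-JVM model. Everything in it is hypothetical. `HypotheticalTable` is not an Ignite class. A `ReentrantReadWriteLock` stands in for the exclusive table lock. A plain `HashMap` and `TreeMap` stand in for data pages and a secondary index. The sketch only shows the ordering of the steps: lock, load without per-row index maintenance, rebuild indexes once, and unlock in `finally`. It is not how IEP-22 will actually implement them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BulkLoadSketch {
    /** Hypothetical table: primary rows plus one secondary index (value -> keys). */
    static final class HypotheticalTable {
        final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
        final Map<Long, String> rows = new HashMap<>();
        final TreeMap<String, List<Long>> valueIndex = new TreeMap<>();

        /** Regular readers take the shared lock, so they wait while a bulk load runs. */
        List<Long> keysWithValue(String value) {
            tableLock.readLock().lock();
            try {
                return new ArrayList<>(valueIndex.getOrDefault(value, Collections.emptyList()));
            } finally {
                tableLock.readLock().unlock();
            }
        }

        /** Rebuilds the secondary index from scratch using the current rows. */
        void rebuildIndex() {
            valueIndex.clear();
            for (Map.Entry<Long, String> e : rows.entrySet())
                valueIndex.computeIfAbsent(e.getValue(), v -> new ArrayList<>()).add(e.getKey());
        }
    }

    static void bulkLoad(HypotheticalTable table, Map<Long, String> data) {
        // 1) Obtain an exclusive table lock: no other reader or writer can enter.
        table.tableLock.writeLock().lock();
        try {
            // 2) Load data directly into the rows, skipping per-row index updates.
            table.rows.putAll(data);
            // 3) Rebuild indexes once, after all rows are in place.
            table.rebuildIndex();
        } finally {
            // 4) Release the lock, even if loading failed part-way.
            table.tableLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        HypotheticalTable table = new HypotheticalTable();
        Map<Long, String> batch = new HashMap<>();
        batch.put(1L, "a");
        batch.put(2L, "b");
        batch.put(3L, "a");
        bulkLoad(table, batch);
        System.out.println(table.rows.size() + " rows, keys with 'a': " + table.keysWithValue("a"));
    }
}
```

Readers calling `keysWithValue` block until step 4. That is the price paid for skipping per-row index maintenance during the load.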


In general, all types of data loading should obey transactional semantics
when MVCC is enabled, and we should think separately about how to do that for
the continuous-streaming case.

For now, let's focus on the immediate goal for the MVCC release: the data
streamer should work, and no new abstractions or APIs should be introduced.
The easiest way to do this is to agree that the streamer is not transactional
and to use a special version, as Igor proposed. In future releases, when
IEP-22 is implemented, it will become transactional with the help of an
exclusive table lock. In more distant releases, we will think about separate
optimizations for continuous streaming and possibly other cases.
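Below is a rough sketch of what "not transactional, with a special version" could mean for read visibility. The details of Igor's proposal are not in this message, so everything here is an assumption. `STREAMER_VERSION`, `RowVersion`, and the rule that a streamer-written version is visible to every snapshot are all hypothetical. They do not reflect Ignite's actual MVCC version format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StreamerVersionSketch {
    /** Hypothetical reserved version for streamer writes; transactions use versions >= 1. */
    static final long STREAMER_VERSION = 0L;

    /** One version of a row: its value and the version of the writer that created it. */
    static final class RowVersion {
        final String value;
        final long createdBy;

        RowVersion(String value, long createdBy) {
            this.value = value;
            this.createdBy = createdBy;
        }
    }

    /**
     * Streamer-written versions are treated as committed and visible to every
     * snapshot. Transactional versions are visible only if committed and not
     * newer than the reader's snapshot.
     */
    static boolean isVisible(RowVersion row, long snapshot, Set<Long> committed) {
        if (row.createdBy == STREAMER_VERSION)
            return true;
        return row.createdBy <= snapshot && committed.contains(row.createdBy);
    }

    /** Returns the newest visible value from a chain ordered newest first, or null. */
    static String read(List<RowVersion> chain, long snapshot, Set<Long> committed) {
        for (RowVersion v : chain)
            if (isVisible(v, snapshot, committed))
                return v.value;
        return null;
    }

    public static void main(String[] args) {
        Set<Long> committed = new HashSet<>(Arrays.asList(5L));
        List<RowVersion> chain = Arrays.asList(
            new RowVersion("tx5", 5L),                    // written by committed transaction 5
            new RowVersion("streamed", STREAMER_VERSION)  // written by the data streamer
        );
        System.out.println(read(chain, 3L, committed)); // prints "streamed": snapshot 3 predates tx 5
        System.out.println(read(chain, 7L, committed)); // prints "tx5": snapshot 7 sees tx 5
    }
}
```

The rule itself shows the cost of this shortcut. A transaction can see streamed rows appear in the middle of its snapshot, which is why the streamer would be treated as non-transactional.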

Makes sense?

Vladimir.


On Fri, Jul 13, 2018 at 11:30 PM Denis Magda <dma...@apache.org> wrote:

> I agree that initial loading and real-time streaming should be seen as
> different use cases.
>
> For the loading part, I would borrow ideas from the direct data load IEP [1].
> Ignite should assume that no app works with the cluster until it is
> preloaded. So, no global locks or anything like that. Just fasten your seat
> belt and feed data to your nodes.
>
> For the streaming part, I would consider options 2 or 3 proposed by Igor.
>
> --
> Denis
>
> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
>
> On Fri, Jul 13, 2018 at 10:03 AM Seliverstov Igor <gvvinbl...@gmail.com> wrote:
>
> > Ivan,
> >
> > In any case, DataStreamer is the fastest way to deliver data to a data
> > node; the question is how to apply it correctly.
> >
> > I don’t think we need one more tool that is 90% the same as DataStreamer.
> >
> > All we need is to implement a couple of new stream receivers.
> >
> > Regards,
> > Igor
> >
> > > On 13 July 2018, at 9:56, Павлухин Иван <vololo...@gmail.com> wrote:
> > >
> > > Hi Igniters,
> > >
> > > I had a look at IgniteDataStreamer. As far as I understand, it currently
> > > just works incorrectly for MVCC tables. This appears to be a blocker for
> > > releasing MVCC. The simplest thing is to refuse to create a streamer for
> > > MVCC tables.
> > >
> > > The next step could be a finer-grained split of the related use cases. To
> > > me, initial load and continuous streaming look like quite different cases,
> > > and it is better to keep them separate, at least at the API level. Perhaps
> > > it is better to separate the APIs based on user experience. For example,
> > > DataStreamer could be considered a tool without surprises (meaning data is
> > > always left consistent, transactional). And, let's say, BulkLoader is a
> > > beast for the fastest data loading but full of surprises, such as locking
> > > tables, rolling back user transactions and so on. So it is of very limited
> > > use (e.g. initial load). Keeping the API entities separate looks better to
> > > me than introducing multiple modes, because separate entities are easier
> > > to understand and therefore less prone to user mistakes.
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> >
> >
>
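Igor's suggestion to implement a couple of new stream receivers, rather than a separate tool, could look roughly like the sketch below. Ignite's real extension point is the `StreamReceiver` interface passed to `IgniteDataStreamer.receiver(...)`. To stay self-contained, this sketch instead uses a hypothetical `HypotheticalReceiver` that receives a plain `Map` rather than an `IgniteCache`. The all-or-nothing receiver only validates and stages a batch before publishing it. An MVCC-aware receiver would presumably run the batch inside a transaction instead, and this toy does not model that.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class ReceiverSketch {
    /** Hypothetical stand-in for a stream receiver: applies one batch on the data node. */
    interface HypotheticalReceiver<K, V> {
        void receive(Map<K, V> store, Collection<Map.Entry<K, V>> batch);
    }

    /** Non-transactional receiver: writes entries one by one; if a later put throws, earlier entries stay written. */
    static <K, V> HypotheticalReceiver<K, V> nonTransactional() {
        return (store, batch) -> {
            for (Map.Entry<K, V> e : batch)
                store.put(e.getKey(), e.getValue());
        };
    }

    /**
     * All-or-nothing receiver: validates and stages the whole batch first, then
     * publishes it in one putAll. In this single-threaded toy, a rejected batch
     * leaves the store untouched.
     */
    static <K, V> HypotheticalReceiver<K, V> allOrNothing() {
        return (store, batch) -> {
            Map<K, V> staged = new HashMap<>();
            for (Map.Entry<K, V> e : batch) {
                if (e.getValue() == null)
                    throw new IllegalArgumentException("null value for key " + e.getKey());
                staged.put(e.getKey(), e.getValue());
            }
            store.putAll(staged);
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new HashMap<>();
        Map<String, Integer> batch = new HashMap<>();
        batch.put("a", 1);
        batch.put("b", null);
        try {
            ReceiverSketch.<String, Integer>allOrNothing().receive(store, batch.entrySet());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected, store size = " + store.size());
        }
        ReceiverSketch.<String, Integer>nonTransactional().receive(store, batch.entrySet());
        System.out.println("non-transactional, store size = " + store.size());
    }
}
```

Both receivers plug into the same delivery path. This reflects Igor's point that the streamer's fast transport can stay as it is while only the way each batch is applied changes.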
