I totally agree with Vladimir.

From the JdbcPojo store side, we could introduce support for some kind of
load descriptor that will contain the SQL to execute and a node filter.

On each node, the store will check the node filter and execute the SQL if
the node matches the filter.

This will solve the first problem - "do not load the full database on each node".
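
A rough sketch of what I mean by a load descriptor, in plain Java with no
Ignite classes involved. LoadDescriptor, sqlForNode, the node ids and the
person table are all made-up names for illustration, not an existing API;
a real version would presumably filter on the cluster node rather than on
a string id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class LoadDescriptorSketch {
    /** Hypothetical descriptor: the SQL to run plus a filter choosing the nodes that run it. */
    static final class LoadDescriptor {
        final String sql;
        final Predicate<String> nodeFilter; // tested against a node identifier (assumption)

        LoadDescriptor(String sql, Predicate<String> nodeFilter) {
            this.sql = sql;
            this.nodeFilter = nodeFilter;
        }
    }

    /** Returns the SQL statements whose node filter accepts the given node. */
    static List<String> sqlForNode(List<LoadDescriptor> descriptors, String nodeId) {
        List<String> res = new ArrayList<>();

        for (LoadDescriptor d : descriptors) {
            if (d.nodeFilter.test(nodeId))
                res.add(d.sql);
        }

        return res;
    }

    public static void main(String[] args) {
        // Two disjoint slices of a hypothetical "person" table, one per node.
        List<LoadDescriptor> descriptors = Arrays.asList(
            new LoadDescriptor("SELECT * FROM person WHERE id < 1000000", "node-1"::equals),
            new LoadDescriptor("SELECT * FROM person WHERE id >= 1000000", "node-2"::equals));

        // Each node would execute only the statements addressed to it.
        System.out.println(sqlForNode(descriptors, "node-1"));
        System.out.println(sqlForNode(descriptors, "node-2"));
    }
}
```

With descriptors like these, every node evaluates the same list but runs
only its own slice, so no node has to scan the whole table.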

As for the second problem - "do not ignore non-primary non-backup entries" - I
think this should be solved at the cache level,
because the store does not know anything about primary / backup.

Thoughts?


On Wed, Apr 27, 2016 at 9:27 PM, Vladimir Ozerov <voze...@gridgain.com>
wrote:

> Igniters,
>
> We receive more and more questions about the same problem: "I have a big
> database. How should I load it into Ignite?"
>
> Obviously, users try to use the POJO store as the most convenient approach,
> but it cannot handle this case properly.
> 1) If the user invokes *IgniteCache.loadCache()*, then the same request - and
> usually this is a full table scan - will be invoked on every node, leading to
> very poor performance. For instance, we have a report of a load of 47M
> entries to cache on 16 nodes which took ... 8 hours!!!
> 2) If the user invokes IgniteCache.localLoadCache(), then our internal cache
> logic will filter out non-primary and non-backup entries. So this approach
> doesn't work either.
> 3) The user could try using *IgniteDataStreamer*, but in this case he has to
> deal with all JDBC-related stuff on his own - not convenient.
> 4) Another approach I heard several times - "user should have an attribute
> for affinity in the table ...". The idea is that this way the user will be
> able to divide the whole data set into several disjoint sets with specific
> affinity. Doesn't work. Consider a user with some legacy database - the
> most common use case. How is he going to work with affinity?
>
> Bottom line: Ignite has *no convenient way* to load millions of entries
> from a database.
>
> We need to start thinking of possible solutions. Several ideas from my
> side:
>
> 1) POJO store must be much more flexible. We should be able to pass
> different queries to different nodes when calling "loadCache".
>
> 2) Cache store could have an additional mode in which it will not ignore
> non-primary non-backup entries, but rather *distribute* them to other nodes.
> E.g. with the help of a data streamer.
>
> Thoughts?
>
> Vladimir.
>



-- 
Alexey Kuznetsov
GridGain Systems
www.gridgain.com
