Re: Increase the indexing speed while loading the cache from an RDBMS

2020-09-04 Thread Michael Cherkasov
> I've checked the RDBMS perf metrics. I do not see any spike in the CPU or
> memory usage.
> Is there anything in particular that could cause a bottleneck from the DB
> perspective? The DB in use is a Postgres DB.

If you use a cache store, each node reads the full table you want to
load, but saves only the records that belong to it (based on data
affinity). So Ignite does N parallel reads of the same table, where N is
the number of nodes.
So if you have, say, 50 nodes, you should use a data streamer: one node
reads the data and streams it to the cluster.
However, if you have a relatively small number of nodes, it's fine to use
CacheStore to load the data.
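
To put rough numbers on the N parallel reads, here is a minimal sketch. It assumes the behaviour described above (every node scans the whole table during a CacheStore-based load). The class and method names are made up for illustration, and the 3-node cluster is an arbitrary "small cluster" example; the 27M rows and the 50 nodes come from this thread.

```java
/** Illustrative arithmetic only: rows pulled from the database per loading approach. */
final class ReadAmplification {
    private ReadAmplification() {}

    /** CacheStore-based load as described above: every node scans the whole table. */
    static long rowsReadWithCacheStore(long tableRows, int nodes) {
        return Math.multiplyExact(tableRows, (long) nodes);
    }

    /** Data streamer: a single loader reads the table once and streams it out. */
    static long rowsReadWithStreamer(long tableRows) {
        return tableRows;
    }

    public static void main(String[] args) {
        long tableRows = 27_000_000L; // rows per table, as reported in this thread
        for (int nodes : new int[] {3, 50}) { // 3 is an assumed small cluster
            System.out.printf("%d nodes: cache store reads %,d rows, streamer reads %,d rows%n",
                nodes, rowsReadWithCacheStore(tableRows, nodes), rowsReadWithStreamer(tableRows));
        }
    }
}
```

The database serves the same rows once per node in the first case, which is why the node count decides which approach is reasonable.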

> Is there a sample/best practice reference code for Using Datastreamers to
> directly query the DB to preload the cache?
> My intention is to use Spring Data via IgniteRepositories. In order to
> use an Ignite repository, I would have to preload the cache.
> And now, to preload the cache, I would have to query the DB. A bit of a
> chicken and egg problem.

I would say use JDBC and the Ignite Data Streamer. It's a one-time action that
you don't need to repeat often; you can even write a separate Java app to do this.
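
A minimal sketch of the read loop such a separate loader app might use. It is not reference code from the Ignite project. The class name, the `pump` method, and the `id` and `name` columns are hypothetical. The sink is deliberately a plain `BiConsumer`, so the loop depends only on `java.sql` and can be tried without a cluster.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.BiConsumer;

/** Sketch: forward every row of a JDBC result set to a key/value sink. */
final class JdbcPreloader {
    private JdbcPreloader() {}

    /**
     * Reads rs to the end and passes (id, name) of each row to the sink.
     * The column names "id" and "name" are assumptions for illustration.
     *
     * @return the number of rows forwarded
     */
    static long pump(ResultSet rs, BiConsumer<Integer, String> sink) throws SQLException {
        long rows = 0;
        while (rs.next()) {
            sink.accept(rs.getInt("id"), rs.getString("name"));
            rows++;
        }
        return rows;
    }
}
```

In a real loader, `rs` would come from a plain JDBC query against Postgres, and the sink would be `streamer::addData` on an `IgniteDataStreamer<Integer, String>` obtained from `ignite.dataStreamer(cacheName)`. Closing the streamer when the load finishes flushes any buffered entries. With the Postgres driver it is usually worth setting a fetch size (with auto-commit off) so rows are streamed instead of buffered in memory.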

Thanks,
Mike.



Re: Increase the indexing speed while loading the cache from an RDBMS

2020-08-27 Thread Srikanta Patanjali
Thanks, Anton and Mikhail, for your responses.

I'll try disabling the WAL and test the cache preload performance. I'll also
record a JFR to capture some info.

@Mikhail I've checked the RDBMS perf metrics. I do not see any spike in the
CPU or memory usage. Is there anything in particular that could cause a
bottleneck from the DB perspective? The DB in use is Postgres.

@Anton Is there sample or best-practice reference code for using data
streamers to directly query the DB to preload the cache? My intention
is to use Spring Data via IgniteRepositories. In order to use an Ignite
repository, I would have to preload the cache. And now, to preload the
cache, I would have to query the DB. A bit of a chicken-and-egg problem.

Regards,
Srikanta



Re: Increase the indexing speed while loading the cache from an RDBMS

2020-08-27 Thread Mikhail Cherkasov
BTW, the bottleneck might be your RDBMS: it may simply stream data more
slowly than Ignite can save it. It makes sense to check this hypothesis too.



-- 
Thanks,
Mikhail.


Re: Increase the indexing speed while loading the cache from an RDBMS

2020-08-27 Thread akurbanov
Hi,

Which API are you using to load the entries, and what is the node
configuration? I would recommend sharing the configs and trying the
data streamer.

https://apacheignite.readme.io/docs/data-streamers

I would also recommend recording a JFR to find where the VM spends most of
its time.

Best regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Increase the indexing speed while loading the cache from an RDBMS

2020-08-27 Thread Michael Cherkasov
Hi Srikanta,
Have you tried loading the data without the index to compare the time?

I think disabling the WAL for the data load could save you some hours:
https://www.gridgain.com/docs/latest/developers-guide/persistence/native-persistence#disabling-wal

Thanks,
Mike.



Increase the indexing speed while loading the cache from an RDBMS

2020-08-26 Thread Srikanta Patanjali
Currently I'm using Apache Ignite v2.8.1 to preload a cache from the RDBMS.
There are two tables with 27M rows each. The index is defined on a single
column, of type String in the first table and Integer in the second. Together,
the total size of the two tables is around 120GB.

The preloading process (triggered using loadCacheAsync() from within a Java
app) takes about 45 hours. The cache has persistence enabled, and a common EBS
volume (SSD) is being used for both the WAL and other locations.
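
For scale, the figures above can be turned into an average load rate. This is a back-of-the-envelope sketch only. The class and method names are made up, and it assumes 1 GB = 1024 MB and that both tables load within the same 45 hours.

```java
/** Back-of-the-envelope load rate from the figures reported in this message. */
final class LoadRate {
    private LoadRate() {}

    /** Average rows per second for a load of the given size and duration. */
    static double rowsPerSecond(long rows, double hours) {
        return rows / (hours * 3600.0);
    }

    /** Average MB per second, assuming 1 GB = 1024 MB. */
    static double megabytesPerSecond(double gigabytes, double hours) {
        return gigabytes * 1024.0 / (hours * 3600.0);
    }

    public static void main(String[] args) {
        long totalRows = 2 * 27_000_000L; // two tables with 27M rows each
        double hours = 45.0;              // reported preload time
        double sizeGb = 120.0;            // reported combined table size
        System.out.printf("%.1f rows/s, %.2f MB/s%n",
            rowsPerSecond(totalRows, hours), megabytesPerSecond(sizeGb, hours));
    }
}
```

That is roughly 333 rows per second and well under 1 MB per second on average, far below what an SSD volume or a Postgres sequential scan would normally be limited to.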

I'm unable to identify the bottleneck that is limiting the load speed.

Apart from defining separate paths for the WAL and the persistence storage,
is there any other way to load the cache faster (with indexing enabled)?


Thanks,
Srikanta