Re: Increase the indexing speed while loading the cache from an RDBMS
>I've checked the RDBMS perf metrics. I do not see any spike in the CPU or memory usage. >Is there anything in particular that could cause a bottleneck from the DB perspective ? The DB in use is a Postgres DB. if you use cache store, each node will read the full table you want to load, but only records which belong to a node will be saved(based on data affinity), so Ignite will do N parallel read of the same table, where N is number of nodes. So if you have 50 nodes, then you should use data streamer, so one node will read data and stream them to the cluster. However if you have relatively small amount of nodes, it's fine to use CacheStore to load data. >Is there a sample/best practice reference code for Using Datastreamers to directly query the DB to preload the cache ? > My intention is to use Spring Data via IgniteRepositories. In order to use an Ignite repository, I would have to preload the cache. >And now, to preload the cache, I would have to query the DB. A bit of a chicken and egg problem. I would say, use JDBC and Ignite Data Streamer, it's one time action that you don't need to repeat often, you can even java separate app to do this. Thanks, Mike. чт, 27 авг. 2020 г. в 14:01, Srikanta Patanjali : > Thanks Anton & Mikhail for your responses. > > I'll try to disable the WAL and test the cache preload performance. Also > will execute a JFR to capture some info. > > @Mikhail I've checked the RDBMS perf metrics. I do not see any spike in > the CPU or memory usage. Is there anything in particular that could cause a > bottleneck from the DB perspective ? The DB in use is a Postgres DB. > > @Anton Is there a sample/best practice reference code for Using > Datastreamers to directly query the DB to preload the cache ? My intention > is to use Spring Data via IgniteRepositories. In order to use an Ignite > repository, I would have to preload the cache. And now, to preload the > cache, I would have to query the DB. A bit of a chicken and egg problem. > > Regards, > Srikanta > > On Thu, Aug 27, 2020 at 7:54 PM Mikhail Cherkasov > wrote: > >> btw, might be bottleneck is your RDBMS, it can just stream data slowly, >> slower then ignite can save, if make sense to check this version too. >> >> On Thu, Aug 27, 2020 at 10:46 AM akurbanov wrote: >> >>> Hi, >>> >>> Which API are you using to load the entries and what is the node >>> configuration? I would recommend to share the configs and try utilizing >>> the >>> data streamer. >>> >>> https://apacheignite.readme.io/docs/data-streamers >>> >>> I would recommend recording a JFR to find where the VM spends most time. >>> >>> Best regards, >>> Anton >>> >>> >>> >>> -- >>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >>> >> >> >> -- >> Thanks, >> Mikhail. >> >
Re: Increase the indexing speed while loading the cache from an RDBMS
Thanks Anton & Mikhail for your responses. I'll try to disable the WAL and test the cache preload performance. Also will execute a JFR to capture some info. @Mikhail I've checked the RDBMS perf metrics. I do not see any spike in the CPU or memory usage. Is there anything in particular that could cause a bottleneck from the DB perspective ? The DB in use is a Postgres DB. @Anton Is there a sample/best practice reference code for Using Datastreamers to directly query the DB to preload the cache ? My intention is to use Spring Data via IgniteRepositories. In order to use an Ignite repository, I would have to preload the cache. And now, to preload the cache, I would have to query the DB. A bit of a chicken and egg problem. Regards, Srikanta On Thu, Aug 27, 2020 at 7:54 PM Mikhail Cherkasov wrote: > btw, might be bottleneck is your RDBMS, it can just stream data slowly, > slower then ignite can save, if make sense to check this version too. > > On Thu, Aug 27, 2020 at 10:46 AM akurbanov wrote: > >> Hi, >> >> Which API are you using to load the entries and what is the node >> configuration? I would recommend to share the configs and try utilizing >> the >> data streamer. >> >> https://apacheignite.readme.io/docs/data-streamers >> >> I would recommend recording a JFR to find where the VM spends most time. >> >> Best regards, >> Anton >> >> >> >> -- >> Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >> > > > -- > Thanks, > Mikhail. >
Re: Increase the indexing speed while loading the cache from an RDBMS
btw, might be bottleneck is your RDBMS, it can just stream data slowly, slower then ignite can save, if make sense to check this version too. On Thu, Aug 27, 2020 at 10:46 AM akurbanov wrote: > Hi, > > Which API are you using to load the entries and what is the node > configuration? I would recommend to share the configs and try utilizing the > data streamer. > > https://apacheignite.readme.io/docs/data-streamers > > I would recommend recording a JFR to find where the VM spends most time. > > Best regards, > Anton > > > > -- > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ > -- Thanks, Mikhail.
Re: Increase the indexing speed while loading the cache from an RDBMS
Hi, Which API are you using to load the entries and what is the node configuration? I would recommend to share the configs and try utilizing the data streamer. https://apacheignite.readme.io/docs/data-streamers I would recommend recording a JFR to find where the VM spends most time. Best regards, Anton -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: Increase the indexing speed while loading the cache from an RDBMS
Hi Srikanta, Have you tried to load data without indexto compare time? I think Wal disabling for data load can save some hours for you: https://www.gridgain.com/docs/latest/developers-guide/persistence/native-persistence#disabling-wal Thanks, Mike. On Wed, Aug 26, 2020, 8:47 AM Srikanta Patanjali wrote: > Currently I'm using Apache Ignite v2.8.1 to preload a cache from the > RDBMS. There are two tables with each 27M rows. The index is defined on a > single column of type String in 1st table and Integer in the 2nd table. > Together the total size of the two tables is around 120GB. > > The preloading process (triggered using loadCacheAsync() from within a > Java app) takes about 45hrs. The cache is persistence enabled and a > common EBS volume (SSD) is being used for both the WAL and other locations. > > I'm unable to figure out the bottleneck for increasing the speed. > > Apart from defining a separate path for WAL and the persistence, is there > any other way to load the cache faster (with indexing enabled) ? > > > Thanks, > Srikanta >
Increase the indexing speed while loading the cache from an RDBMS
Currently I'm using Apache Ignite v2.8.1 to preload a cache from the RDBMS. There are two tables with each 27M rows. The index is defined on a single column of type String in 1st table and Integer in the 2nd table. Together the total size of the two tables is around 120GB. The preloading process (triggered using loadCacheAsync() from within a Java app) takes about 45hrs. The cache is persistence enabled and a common EBS volume (SSD) is being used for both the WAL and other locations. I'm unable to figure out the bottleneck for increasing the speed. Apart from defining a separate path for WAL and the persistence, is there any other way to load the cache faster (with indexing enabled) ? Thanks, Srikanta