Re: Ignite 2.7 Persistence
Some Java code that helps me on node startup:

    import javax.cache.Cache;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.query.QueryCursor;
    import org.apache.ignite.cache.query.ScanQuery;

    // Call for each partition in parallel.
    private void preloadPartition(int partition) {
        IgniteCache<Object, Object> cache = ignite
            .cache("test_cache")
            .withKeepBinary();

        // The filter always returns false: every entry of the partition still has to be
        // read (pulling its pages into memory), but nothing is sent back to the caller.
        ScanQuery<Object, Object> query = new ScanQuery<>(partition, (k, v) -> false);
        query.setLocal(true);

        try (QueryCursor<Cache.Entry<Object, Object>> cursor = cache.query(query)) {
            for (@SuppressWarnings("unused") Cache.Entry<Object, Object> row : cursor) {
                // empty
            }
        }
    }

    // Call for each index.
    private void preloadIndex(String index) {
        // Use an SQL query that uses the index and contains a falsy condition.
    }

PS: My memory region is bigger than the total data size.

On 1/11/19 18:20, gweiske wrote:
> Is there a command that one can/needs to run to load the data into memory
> after restart of Ignite? The documentation suggests that at least for 2.7
> that is not necessary, and I have not found a command that would start the
> loading into memory from persistence. It looks like one can write some Java
> code, but it seems such basic functionality that I thought that there should
> be a shell command.
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
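The preloadPartition method is meant to be called for each partition in parallel. A minimal, stdlib-only sketch of such a driver follows. It is not Ignite API: warmUpAll is a hypothetical helper, the per-partition work is passed in as a plain IntConsumer (in a real node it would wrap preloadPartition), and the partition count of 1024 is an assumption based on the RendezvousAffinityFunction default in Ignite 2.x. On a real node you would rather iterate only over the partitions the node actually owns, for example by asking the cache's Affinity for them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

public class PartitionWarmup {

    // Hypothetical helper: calls `preload` once for every partition id in [0, partitions)
    // on a fixed-size thread pool and blocks until all calls have finished.
    // In a real node, `preload` would wrap something like preloadPartition(int).
    static void warmUpAll(int partitions, int threads, IntConsumer preload)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int part = p;
                pending.add(pool.submit(() -> preload.accept(part)));
            }
            // get() rethrows any failure from a task as an ExecutionException.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: 1024 partitions, the RendezvousAffinityFunction default in Ignite 2.x.
        int partitions = 1024;
        Set<Integer> warmed = ConcurrentHashMap.newKeySet();
        warmUpAll(partitions, Runtime.getRuntime().availableProcessors(), warmed::add);
        System.out.println("warmed partitions: " + warmed.size());
    }
}
```

Collecting the futures and calling get() on each means a failing partition scan is reported instead of being silently swallowed by the executor.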
RE: Ignite 2.7 Persistence
Is there a command that one can (or needs to) run to load the data into memory after restarting Ignite? The documentation suggests that, at least for 2.7, this is not necessary, and I have not found a command that would start loading the data from persistence into memory. It looks like one can write some Java code, but this seems like such basic functionality that I expected there to be a shell command.
RE: Ignite 2.7 Persistence
Running the query for the first time isn't really like loading all the data into memory and then running the query. I would assume it is much less efficient: all kinds of locking and contention may be involved. Also, the reads are done via random disk access, whereas reading from a CSV file is sequential.

I assume there are ways to make queries against cold storage more efficient, but finding them would probably take a lot of time collecting and analyzing JFRs and other profiling data. On the other hand, the ability to do a hot restart will probably solve the issue for most users.

Stan

From: gweiske
Sent: 11 January 2019 2:03
To: user@ignite.apache.org
Subject: RE: Ignite 2.7 Persistence

Thanks for the replies. Yes, subsequent queries are faster, but the time to run the query the first time (i.e. load the data into memory) after a restart can be measured in hours and is significantly longer than loading the data from a csv file. That does not seem right.
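To see why random reads can turn minutes into hours, here is a back-of-envelope estimate. It is only a sketch under loudly stated assumptions: a 4 KiB page size (the Ignite 2.x default), about 20 GB on disk (the on-disk size may differ from the loaded data size), a single HDD doing about 150 random reads per second and about 150 MB/s sequentially, and a worst case in which every page costs one random seek. None of these numbers were measured in this thread.

```java
public class ColdReadEstimate {

    // Assumption: 4 KiB pages, the Ignite 2.x default page size.
    static final long PAGE_SIZE = 4096;

    // Number of pages needed to cover `bytes` of stored data.
    static long pages(long bytes) {
        return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Worst case: every page is fetched with its own random read at `iops` reads per second.
    static double randomReadSeconds(long bytes, double iops) {
        return pages(bytes) / iops;
    }

    // The same bytes streamed sequentially, e.g. when loading from a CSV file.
    static double sequentialReadSeconds(long bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long data = 20L * 1024 * 1024 * 1024; // assumed ~20 GB on disk
        double hddIops = 150;                 // assumed random-read IOPS of one HDD
        double hddSeq = 150e6;                // assumed sequential throughput, 150 MB/s
        System.out.printf("pages: %d%n", pages(data));
        System.out.printf("random reads: %.1f h%n", randomReadSeconds(data, hddIops) / 3600);
        System.out.printf("sequential read: %.0f s%n", sequentialReadSeconds(data, hddSeq));
    }
}
```

With these assumed inputs, the random-read path comes out at roughly 9.7 hours versus about 143 seconds for a sequential pass over the same bytes. Real page layouts and OS read-ahead will reduce the number of seeks, and SSDs offer far more IOPS, which is at least consistent with the much shorter SSD timing reported in the original post.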
RE: Ignite 2.7 Persistence
Thanks for the replies. Yes, subsequent queries are faster, but the time to run the query the first time (i.e. load the data into memory) after a restart can be measured in hours and is significantly longer than loading the data from a csv file. That does not seem right.
RE: Ignite 2.7 Persistence
Hi,

That's right: Ignite nodes restart "cold", meaning that they become operational without the data in RAM. This lets them restart as quickly as possible, but the price is that the first operations have to load data from disk, so they are much slower.

Here is a ticket to allow turning on a "hot restart" mode: https://issues.apache.org/jira/browse/IGNITE-10152.

There is also an improvement that allows efficiently loading the data of a specific partition manually: https://issues.apache.org/jira/browse/IGNITE-8873. If you iterate over all partitions after the node starts, it may shorten the warmup period.

Stan

From: Glenn Wiebe
Sent: 8 January 2019 18:02
To: user@ignite.apache.org
Subject: Re: Ignite 2.7 Persistence

I am new to Ignite, but as I understand it, after a cluster restart, data is re-hydrated into memory as the nodes receive requests for their partitions' entries. So a first query would be as slow as a distributed disk-based query. Subsequent queries should find some of the data in memory (depending on how much memory is available) and thus be faster.

So, my question: is this the first query execution since startup? Given that you have sufficient memory to hold this particular cache, I would expect subsequent executions of the query to benefit from memory-resident query processing.

Additionally, I took a quick look at whether Ignite's in-memory caches store aggregates (like counts) that could be returned without reading the actual data, as in this case, but I could not find anything.

Good luck!

On Tue, Jan 8, 2019 at 7:55 AM gweiske wrote:
> I am using Ignite 2.7 with persistence enabled on a single VM with 128 GB RAM
> in Azure and separate external HDD drives each for wal, walarchive and
> storage. I loaded 20 GB of data/50,000,000 rows, then shut down Ignite and
> restarted the hosting VM, started and activated Ignite and ran a simple query
> that requires sorting through all the data (SELECT DISTINCT FROM ;). The query
> has been running for hours now. Looking at the memory, instead of the expected
> ~42 GB it is currently at 5.7GB (*slowly* increasing). Any ideas why it might
> be that slow?
> The same scenario with SSD drives (this time 1 drive for wal and walarchive,
> a second one for storage) finishes in about 5500 seconds (still slow).
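The difference between a "cold" restart and a warmed-up node can be pictured with a toy model. This is a deliberately simplified sketch, not Ignite's page memory (which also has checkpointing, eviction and page replacement): LazyPageCache and its disk loader are hypothetical stand-ins. On the cold path a page is read from "disk" only on its first access, as described in the replies above; warmUp touches every page up front, which is the idea behind iterating over all partitions after startup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

public class LazyPageCache {
    private final Map<Long, byte[]> inMemory = new HashMap<>();
    private final LongFunction<byte[]> disk; // hypothetical stand-in for a page-store read
    private long diskReads = 0;

    LazyPageCache(LongFunction<byte[]> disk) {
        this.disk = disk;
    }

    // Cold path: a page is loaded from "disk" on its first access, then served from memory.
    byte[] get(long pageId) {
        return inMemory.computeIfAbsent(pageId, id -> {
            diskReads++;
            return disk.apply(id);
        });
    }

    // Warm-up: touch every page once so later requests never go to "disk".
    void warmUp(long pageCount) {
        for (long id = 0; id < pageCount; id++) {
            get(id);
        }
    }

    long diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        LazyPageCache cold = new LazyPageCache(id -> new byte[4096]);
        cold.get(7);
        cold.get(7);
        cold.get(8);
        // Only the first touch of pages 7 and 8 reads from "disk".
        System.out.println("cold: " + cold.diskReads() + " disk reads");

        LazyPageCache warm = new LazyPageCache(id -> new byte[4096]);
        warm.warmUp(16);
        long before = warm.diskReads();
        warm.get(7);
        warm.get(8);
        System.out.println("warm: " + (warm.diskReads() - before) + " extra disk reads");
    }
}
```

In this model the total number of disk reads is the same either way; warming up only moves them to startup, where they can be issued in bulk instead of stalling the first queries.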
Re: Ignite 2.7 Persistence
I am new to Ignite, but as I understand it, after a cluster restart, data is re-hydrated into memory as the nodes receive requests for their partitions' entries. So a first query would be as slow as a distributed disk-based query. Subsequent queries should find some of the data in memory (depending on how much memory is available) and thus be faster.

So, my question: is this the first query execution since startup? Given that you have sufficient memory to hold this particular cache, I would expect subsequent executions of the query to benefit from memory-resident query processing.

Additionally, I took a quick look at whether Ignite's in-memory caches store aggregates (like counts) that could be returned without reading the actual data, as in this case, but I could not find anything.

Good luck!

On Tue, Jan 8, 2019 at 7:55 AM gweiske wrote:
> I am using Ignite 2.7 with persistence enabled on a single VM with 128 GB RAM
> in Azure and separate external HDD drives each for wal, walarchive and
> storage. I loaded 20 GB of data/50,000,000 rows, then shut down Ignite and
> restarted the hosting VM, started and activated Ignite and ran a simple query
> that requires sorting through all the data (SELECT DISTINCT FROM ;). The query
> has been running for hours now. Looking at the memory, instead of the expected
> ~42 GB it is currently at 5.7GB (*slowly* increasing). Any ideas why it might
> be that slow?
> The same scenario with SSD drives (this time 1 drive for wal and walarchive,
> a second one for storage) finishes in about 5500 seconds (still slow).