Re: Ignite 2.7 Persistence

2019-01-11 Thread Dmitry Lazurkin
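Some Java code that helps me on node startup:

import javax.cache.Cache;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

// Call for each partition in parallel
private void preloadPartition(int partition) {
    IgniteCache<Object, Object> cache = ignite
        .cache("test_cache")
        .withKeepBinary();

    // The filter rejects every entry, so nothing is returned to the caller,
    // but the scan still has to visit every entry of the partition
    ScanQuery<Object, Object> query = new ScanQuery<>(partition, (k, v) -> false);
    query.setLocal(true);

    try (QueryCursor<Cache.Entry<Object, Object>> cursor = cache.query(query)) {
        for (@SuppressWarnings("unused") Cache.Entry<Object, Object> row : cursor) {
            // empty
        }
    }
}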

// Call for each index
private void preloadIndex(String index) {
    // Use sql query which uses index and contains falsy-condition
}
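
The "call for each partition in parallel" step could be driven by a small helper like the one below. This is only a sketch: the class name PartitionPreloader, the method preloadAll and the thread-count parameter are hypothetical and not part of Ignite. It uses nothing but the JDK, so the per-partition work is passed in as a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

/** Hypothetical helper: runs a per-partition warm-up task on a fixed-size thread pool. */
final class PartitionPreloader {
    private PartitionPreloader() {}

    /**
     * Calls {@code preload} once for every partition id and blocks until all calls finish.
     * The first failure is rethrown as an IllegalStateException.
     */
    static void preloadAll(int[] partitions, int threads, IntConsumer preload) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int partition : partitions) {
                final int p = partition;
                pending.add(pool.submit(() -> preload.accept(p)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for this partition to finish
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Partition preload failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    throw new IllegalStateException("Interrupted while preloading", e);
                }
            }
        } finally {
            pool.shutdown(); // already submitted tasks are allowed to finish
        }
    }
}
```

With Ignite, the partition list could presumably come from ignite.affinity("test_cache").primaryPartitions(ignite.cluster().localNode()) and the callback could be the preloadPartition method above, e.g. PartitionPreloader.preloadAll(parts, 8, this::preloadPartition). That wiring is an assumption and is not tested here; the thread count of 8 is arbitrary.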

P.S. My memory region is bigger than the total data size.

On 1/11/19 18:20, gweiske wrote:
> Is there a command that one can/needs to run to load the data into memory
> after restart of Ignite? The documentation suggests that at least for 2.7
> that is not necessary, and I have not found a command that would start the
> loading into memory from persistence. It looks like one can write some Java
> code, but it seems such basic functionality that I thought that there should
> be a shell command.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/




RE: Ignite 2.7 Persistence

2019-01-11 Thread gweiske
Is there a command that one can/needs to run to load the data into memory
after restart of Ignite? The documentation suggests that at least for 2.7
that is not necessary, and I have not found a command that would start the
loading into memory from persistence. It looks like one can write some Java
code, but it seems such basic functionality that I thought that there should
be a shell command.





RE: Ignite 2.7 Persistence

2019-01-11 Thread Stanislav Lukyanov
Running the query the first time isn’t really like loading all the data into
memory and then running the query. I would assume that it is much less
efficient: all kinds of locking and contention may be involved. Also, the
reads are done via random disk access, whereas reading from a CSV file is
sequential.
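
For scale, the figures reported at the start of this thread (20 GB of data, about 5500 seconds on SSD) can be turned into an average read rate. The sketch below is back-of-envelope arithmetic only: it assumes the query had to read roughly 20 GB, reads "GB" as GiB, and assumes a 4 KiB page size; the class and method names are made up for illustration.

```java
/** Back-of-envelope numbers for the cold-start query reported in this thread. */
final class ColdReadEstimate {
    private ColdReadEstimate() {}

    /** Average throughput in MiB/s for reading {@code bytes} in {@code seconds}. */
    static double mibPerSecond(long bytes, double seconds) {
        return bytes / (1024.0 * 1024.0) / seconds;
    }

    /** Average number of pages read per second, given a page size in bytes. */
    static double pagesPerSecond(long bytes, int pageSize, double seconds) {
        return (double) bytes / pageSize / seconds;
    }

    public static void main(String[] args) {
        long dataBytes = 20L * 1024 * 1024 * 1024; // 20 GB from the thread, read as GiB (assumption)
        double seconds = 5500;                      // SSD run reported in the thread
        int pageSize = 4096;                        // assumed page size of 4 KiB
        System.out.printf("%.2f MiB/s%n", mibPerSecond(dataBytes, seconds));
        System.out.printf("%.0f pages/s%n", pagesPerSecond(dataBytes, pageSize, seconds));
    }
}
```

Under those assumptions the average works out to roughly 3.7 MiB/s, or about 950 pages per second, which is far below what sequential reads from an SSD usually deliver and is at least consistent with the random-access explanation. The real on-disk size and access pattern may differ.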

I assume that there are ways to make queries on cold storage more efficient.
One would probably need to spend a lot of time on that, collecting and
analyzing JFRs and other profiling data.
On the other hand, the ability to do a hot restart will probably solve the
issue for most users.

Stan

From: gweiske
Sent: January 11, 2019, 2:03
To: user@ignite.apache.org
Subject: RE: Ignite 2.7 Persistence

Thanks for the replies. Yes, subsequent queries are faster, but the time to
run the query the first time (i.e. load the data into memory) after a
restart can be measured in hours and is significantly longer than loading
the data from a CSV file. That does not seem right.







RE: Ignite 2.7 Persistence

2019-01-10 Thread gweiske
Thanks for the replies. Yes, subsequent queries are faster, but the time to
run the query the first time (i.e. load the data into memory) after a
restart can be measured in hours and is significantly longer than loading
the data from a CSV file. That does not seem right.






RE: Ignite 2.7 Persistence

2019-01-10 Thread Stanislav Lukyanov
Hi,

That’s right, Ignite nodes restart “cold”, meaning that they become operational
without the data in RAM.
This allows a node to restart as quickly as possible, but the price is that the
first operations have to load data from disk, so their performance will be
much lower.

Here is a ticket for adding a “hot restart” mode:
https://issues.apache.org/jira/browse/IGNITE-10152.
There is also an improvement that allows manually loading the data of a specific
partition in an efficient way:
https://issues.apache.org/jira/browse/IGNITE-8873. If you iterate over all
partitions after the node starts, it may shorten the warmup period.

Stan 

From: Glenn Wiebe
Sent: January 8, 2019, 18:02
To: user@ignite.apache.org
Subject: Re: Ignite 2.7 Persistence

I am new to Ignite, but as I understand it, after a cluster restart, data is
re-hydrated into memory as the nodes receive requests for their partitions'
entries. So, a first query would be as slow as a distributed disk-based
query. Subsequent queries should find some of the data (depending on the
memory available) already in memory and thus be faster.

So, my question: is this the first query execution since startup?
Given that you have sufficient memory to hold this particular cache, I
would expect subsequent query executions to take advantage of memory-resident
query processing.

Additionally, I took a quick look at whether Ignite caches keep in-memory
aggregates (like counts) that could be returned without reading the actual
data, as in this case, but could not find anything.

Good luck!

On Tue, Jan 8, 2019 at 7:55 AM gweiske wrote:
I am using Ignite 2.7 with persistence enabled on a single VM with 128 GB RAM
in Azure and separate external HDD drives each for wal, walarchive and
storage. I loaded 20 GB of data/50,000,000 rows, then shut down Ignite and
restarted the hosting VM, started and activated Ignite and ran a simple query
that requires sorting through all the data (SELECT DISTINCT  FROM ;). The
query has been running for hours now. Looking at the memory, instead of the
expected ~42 GB it is currently at 5.7 GB (*slowly* increasing). Any ideas
why it might be that slow?
The same scenario with SSD drives (this time 1 drive for wal and walarchive,
a second one for storage) finishes in about 5500 seconds (still slow).






Re: Ignite 2.7 Persistence

2019-01-08 Thread Glenn Wiebe
I am new to Ignite, but as I understand it, after a cluster restart, data is
re-hydrated into memory as the nodes receive requests for their partitions'
entries. So, a first query would be as slow as a distributed disk-based
query. Subsequent queries should find some of the data (depending on the
memory available) already in memory and thus be faster.

So, my question: is this the first query execution since startup?
Given that you have sufficient memory to hold this particular cache, I
would expect subsequent query executions to take advantage of memory-resident
query processing.

Additionally, I took a quick look at whether Ignite caches keep in-memory
aggregates (like counts) that could be returned without reading the actual
data, as in this case, but could not find anything.

Good luck!

On Tue, Jan 8, 2019 at 7:55 AM gweiske wrote:

> I am using Ignite 2.7 with persistence enabled on a single VM with 128 GB RAM
> in Azure and separate external HDD drives each for wal, walarchive and
> storage. I loaded 20 GB of data/50,000,000 rows, then shut down Ignite and
> restarted the hosting VM, started and activated Ignite and ran a simple query
> that requires sorting through all the data (SELECT DISTINCT  FROM ;). The
> query has been running for hours now. Looking at the memory, instead of the
> expected ~42 GB it is currently at 5.7 GB (*slowly* increasing). Any ideas
> why it might be that slow?
> The same scenario with SSD drives (this time 1 drive for wal and walarchive,
> a second one for storage) finishes in about 5500 seconds (still slow).
>
>
>
>