Hi Dima,

Updates with indexes will definitely be slower than updates without them. The question is how much slower. For now the slowdown comes mostly from excessive data page reads ([1] and [2] in my previous email), which lead to page evictions and additional IO. In contrast, updating an index usually requires only a single page write. A correct index implementation (fixing [1] and [2] from my previous email) would eliminate data page reads altogether and should give a dramatic speedup.
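To make the data page read problem concrete, here is a minimal sketch of a fixed-size inline index entry. It is a toy model, not Ignite code: the class InlineIndexSketch, its Entry type, and the dataPageReads counter are all hypothetical names invented for illustration, and it assumes Java 9+ for Arrays.compareUnsigned. When the inlined prefixes differ, the comparison is decided from the index entry alone; when they tie and a value was truncated, both full values have to be fetched from the data pages.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a fixed-size inline index (hypothetical, not Ignite code). */
final class InlineIndexSketch {
    /** Index entry: a fixed-size prefix of the value plus a link to the full row. */
    static final class Entry {
        final byte[] inline;     // at most inlineSize bytes of the value
        final boolean truncated; // true if the full value did not fit inline
        final int link;          // position of the full value in dataPages

        Entry(byte[] inline, boolean truncated, int link) {
            this.inline = inline;
            this.truncated = truncated;
            this.link = link;
        }
    }

    private final int inlineSize;
    private final List<byte[]> dataPages = new ArrayList<>(); // stands in for data pages
    int dataPageReads; // counts simulated data page lookups

    InlineIndexSketch(int inlineSize) {
        this.inlineSize = inlineSize;
    }

    /** Stores the full value in a "data page" and returns its index entry. */
    Entry put(String value) {
        byte[] full = value.getBytes(StandardCharsets.UTF_8);
        dataPages.add(full);
        byte[] inline = Arrays.copyOf(full, Math.min(full.length, inlineSize));
        return new Entry(inline, full.length > inlineSize, dataPages.size() - 1);
    }

    /** Compares two entries, touching data pages only when the prefixes cannot decide. */
    int compare(Entry a, Entry b) {
        int c = Arrays.compareUnsigned(a.inline, b.inline);
        // Differing prefixes, or two fully inlined values, need no lookup.
        if (c != 0 || (!a.truncated && !b.truncated))
            return c;
        // Equal prefixes with a truncated value: fall back to the full values.
        return Arrays.compareUnsigned(readDataPage(a.link), readDataPage(b.link));
    }

    private byte[] readDataPage(int link) {
        dataPageReads++;
        return dataPages.get(link);
    }
}
```

With an inline size of 4, comparing "abc" to "abd" is resolved from the index entries alone, while comparing "abcdef" to "abcdxy" costs two simulated data page reads.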
On Mon, May 7, 2018 at 10:58 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> Vladimir, my comments are inline...
>
> On Sat, May 5, 2018 at 6:12 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:
>
>> In general I do not support this initiative. There are two serious reasons
>> for that:
>> 1) Our indexes are slow on updates due to architectural flaws. First, every
>> index entry must be of fixed size. For this reason we cannot inline full
>> values in the general case and suffer from data page lookups [1]. Second,
>> final comparisons always compare primary keys, so another lookup is
>> needed [2]. Third, our indexes are fat because we lack prefix
>> compression [3].
>
> These all seem like great optimizations and we should definitely do them.
> However, I am of the strong opinion that even after these optimizations,
> the data ingestion speed will be much slower with the persistence turned
> on. Am I wrong?
>
>> 2) Some vendors do have memory-only indexes - SQL Server, Couchbase,
>> MemSQL, to name a few. But they are memory optimized - no pages, no
>> BTrees. A lock-free skiplist is used instead. This is the correct design,
>> which is really fast. But we are very far from it at the moment.
>
> I have not heard complaints about our BTree indexes being slow in memory.
> I only hear complaints about the slow-downs whenever the persistence is
> turned on and users are ingesting large amounts of data.
>
>> Taking this into account, I would not consider memory-only BTree indexes
>> in the nearest future. Instead, we should focus on performance. When the
>> mentioned things are fixed/implemented, our indexes will be both
>> memory-efficient and very fast to update.
>
> I would agree with you only if there is no performance boost in the short
> term. So far, disabling persistence for indexes seems like a very simple
> change, but could render a significant performance boost.
>
>> [1] https://issues.apache.org/jira/browse/IGNITE-8385
>> [2] https://issues.apache.org/jira/browse/IGNITE-8384
>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite#IEP-20:DataCompressioninIgnite-IndexPrefixCompression
>>
>> Sat, May 5, 2018 at 3:46, Dmitriy Setrakyan <dsetrak...@apache.org>:
>>
>> > Igniters,
>> >
>> > One of the main complaints I hear from users is that whenever the
>> > persistence is turned on, index creation can really slow down the
>> > performance, because of massive amounts of writes to disk. The reason
>> > Ignite is writing indexes to disk is to support fast restarts - nothing
>> > needs to be rebuilt on startup, and Ignite can become operational right
>> > away.
>> >
>> > However, as far as I can tell, most users care about faster operations
>> > after the system is started and much less about the startup speed. What
>> > if we added a mode where we do not persist indexes at all? This way data
>> > ingestion and overall throughput will significantly increase (of course,
>> > at the cost of startup time getting longer because we have to rebuild the
>> > indexes).
>> >
>> > There are 2 ways to achieve this in Ignite. The simplest way is not to
>> > mark index pages dirty in memory, so they will never participate in the
>> > checkpointing process. We also have to make sure that index pages never
>> > get evicted from memory. This can be done fairly quickly. The
>> > disadvantage of this approach is that if indexes fill up most of the
>> > memory, then it will be very difficult to find a page to evict, which may
>> > hurt the performance.
>> >
>> > The other way is to have a separate in-memory off-heap region for
>> > indexes. This region should never be persisted. It may be a somewhat
>> > bigger refactoring, as we currently do not separate index and data pages.
>> > However, the advantage of this approach is that this region can be
>> > flushed to disk practically as is during a graceful shutdown of the node,
>> > and hence shorten the restart time.
>> >
>> > I think we should start from the 1st approach and then think about the
>> > 2nd one. What do you think?
>> >
>> > D.
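
For reference, the 1st approach from the original proposal (index pages are never marked dirty, so checkpoints skip them, and they are never evicted) can be sketched roughly as below. This is a toy model under stated assumptions, not Ignite's page memory: CheckpointSketch, PageKind, and the persistIndexes flag are hypothetical names, and a checkpoint is modeled as simply collecting the ids of dirty pages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy page memory where index pages can be excluded from checkpoints (hypothetical). */
final class CheckpointSketch {
    enum PageKind { DATA, INDEX }

    private static final class Page {
        final PageKind kind;
        boolean dirty;

        Page(PageKind kind) {
            this.kind = kind;
        }
    }

    private final Map<Long, Page> pages = new LinkedHashMap<>();
    private final boolean persistIndexes; // hypothetical switch for the proposed mode

    CheckpointSketch(boolean persistIndexes) {
        this.persistIndexes = persistIndexes;
    }

    void allocate(long pageId, PageKind kind) {
        pages.put(pageId, new Page(kind));
    }

    /** Records an update; index pages are marked dirty only if indexes are persisted. */
    void update(long pageId) {
        Page p = pages.get(pageId);
        if (p.kind == PageKind.DATA || persistIndexes)
            p.dirty = true;
    }

    /** Returns the ids of the pages a checkpoint would write and clears their dirty flags. */
    List<Long> checkpoint() {
        List<Long> written = new ArrayList<>();
        for (Map.Entry<Long, Page> e : pages.entrySet()) {
            if (e.getValue().dirty) {
                written.add(e.getKey());
                e.getValue().dirty = false;
            }
        }
        return written;
    }

    /** Non-persisted index pages have no copy on disk, so they must stay in memory. */
    boolean evictable(long pageId) {
        return pages.get(pageId).kind == PageKind.DATA || persistIndexes;
    }
}
```

The evictable check is where the stated disadvantage shows up: with persistIndexes set to false, every index page is pinned, so the more memory the indexes take, the fewer candidates remain for eviction.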