Hi Dima,

Updates with indexes will definitely be slower than updates without them.
The question is how much slower. For now the slowdown comes mostly from
excessive data page reads ([1] and [2] in my previous email), which lead to
page evictions and additional IO. In contrast, usually only a single page
write is needed to update an index. A correct index implementation ([1]
and [2] from the previous email) would eliminate data page reads altogether
and should give a dramatic speedup.
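
To make the data page read point concrete, here is a minimal Java sketch.
It is a toy model, not Ignite code: the class name, INLINE_SIZE and
readFromDataPage are all hypothetical names made up for illustration. It
assumes each index entry stores only a fixed-size prefix of the value, so a
comparison can be decided from the index page alone only when the prefixes
differ; otherwise both full values have to be fetched from data pages.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model of comparing index entries that store a fixed-size inline prefix. */
class InlineCompareSketch {
    /** Hypothetical fixed inline size in bytes (an assumption, not an Ignite default). */
    static final int INLINE_SIZE = 4;

    /** Counts simulated data page reads. */
    static int dataPageReads = 0;

    /** Returns the first INLINE_SIZE bytes of the value, zero-padded if it is shorter. */
    static byte[] inline(String value) {
        return Arrays.copyOf(value.getBytes(StandardCharsets.UTF_8), INLINE_SIZE);
    }

    /** Stands in for fetching the full value from a data page. */
    static String readFromDataPage(String value) {
        dataPageReads++;
        return value;
    }

    /** Compares two indexed values, touching data pages only when the prefixes tie. */
    static int compare(String a, String b) {
        byte[] x = inline(a);
        byte[] y = inline(b);
        for (int i = 0; i < INLINE_SIZE; i++) {
            int c = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (c != 0)
                return c; // Decided from the index page alone.
        }
        // Inline prefixes are equal: both full values are needed.
        return readFromDataPage(a).compareTo(readFromDataPage(b));
    }

    public static void main(String[] args) {
        System.out.println(compare("apple", "zebra") + ", reads=" + dataPageReads);
        System.out.println(compare("index_a", "index_b") + ", reads=" + dataPageReads);
    }
}
```

In this model the first comparison needs no data page reads, while the
second one needs two because both values share the same 4-byte prefix.
Every such read can touch a page that is not in memory, which is where the
evictions and extra IO come from.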

On Mon, May 7, 2018 at 10:58 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> Vladimir, my comments are inline...
>
> On Sat, May 5, 2018 at 6:12 AM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
>> In general I do not support this initiative. There are two serious reasons
>> for that:
>> 1) Our indexes are slow on updates due to architectural flaws. First,
>> every index entry must be of fixed size. For this reason we cannot inline
>> full values in the general case and suffer from data page lookups [1].
>> Second, final comparisons always compare primary keys, so another lookup
>> is needed [2]. Third, our indexes are fat because we lack prefix
>> compression [3].
>>
>
> These all seem like great optimizations, and we should definitely do them.
> However, I am of the strong opinion that even after these optimizations,
> the data ingestion speed will be much slower with the persistence turned
> on. Am I wrong?
>
>
>> 2) Some vendors do have memory-only indexes - SQL Server, Couchbase,
>> MemSQL, to name a few. But they are memory-optimized: no pages, no BTrees.
>> A lock-free skiplist is used instead. This is the correct design, and it
>> is really fast. But we are very far from it at the moment.
>>
>
> I have not heard complaints about our BTree indexes being slow in memory.
> I only hear complaints about the slow-downs whenever the persistence is
> turned on and users are ingesting large amounts of data.
>
>
>> Taking this into account, I would not consider memory-only BTree indexes
>> in the near future. Instead, we should focus on performance. Once the
>> things mentioned above are fixed or implemented, our indexes will be both
>> memory-efficient and very fast to update.
>>
>
> I would agree with you only if there is no performance boost in the short
> term. So far, disabling persistence for indexes seems like a very simple
> change that could yield a significant performance boost.
>
>
>>
>> [1]
>> https://issues.apache.org/jira/browse/IGNITE-8385
>> [2]
>> https://issues.apache.org/jira/browse/IGNITE-8384
>> [3]
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite#IEP-20:DataCompressioninIgnite-IndexPrefixCompression
>>
>> Sat, May 5, 2018 at 3:46, Dmitriy Setrakyan <dsetrak...@apache.org>:
>>
>> > Igniters,
>> >
>> > One of the main complaints I hear from users is that whenever the
>> > persistence is turned on, index creation can really slow down the
>> > performance, because of massive amounts of writes to disk. The reason
>> > Ignite is writing indexes to disk is to support fast restarts - nothing
>> > needs to be rebuilt on startup, and Ignite can become operational right
>> > away.
>> >
>> > However, as far as I can tell, most users care about faster operations
>> > after the system is started and much less about the startup speed. What
>> > if we added a mode where we do not persist indexes at all? This way data
>> > ingestion and overall throughput would significantly increase (of
>> > course, at the cost of startup time getting longer because we have to
>> > rebuild the indexes).
>> >
>> > There are 2 ways to achieve this in Ignite. The simplest way is to not
>> > mark index pages dirty in memory, so they will never participate in the
>> > checkpointing process. We also have to make sure that index pages never
>> > get evicted from memory. This can be done fairly quickly. The
>> > disadvantage of this approach is that if indexes fill up most of the
>> > memory, then it will be very difficult to find a page to evict, which
>> > may hurt performance.
>> >
>> > The other way is to have a separate in-memory off-heap region for
>> > indexes. This region should never be persisted. It may be a somewhat
>> > bigger refactoring, as we currently do not separate index and data
>> > pages. However, the advantage of this approach is that this region can
>> > be flushed to disk practically as is during a graceful shutdown of the
>> > node, and hence shorten the restart time.
>> >
>> > I think we should start from the 1st approach and then think about the
>> > 2nd one. What do you think?
>> >
>> > D.
>> >
>>
>
>
