[
https://issues.apache.org/jira/browse/HBASE-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403836#comment-16403836
]
Anoop Sam John commented on HBASE-20188:
----------------------------------------
In the tests, do we flush to SSD or HDD, boss?
In write workloads we normally see these compares show up as the hottest path; it used to be this way. Now I can see that the issue you face is the exception thrown because the memstore size grew to 4x the flush size. As you said, yes, the flush itself is speedy enough. I can think of the following:
1. With the compacting memstore, the flush op itself takes more time. With the default memstore it is just a matter of iterating over one map and writing out the cells. But now we have to read from multiple segments through a heap-based merge, and so there are more compares there. The flush op is still triggered at flush size, but it does not complete before the memstore reaches 4x that size. This can be one reason (see the merge sketch after this list).
2. The writes to the CSLM themselves became faster. With the default memstore, when we are at flush size and the flush op has started, writes still go into the same CSLM; we allow that anyway. But that CSLM already holds so many cells, so writes get a bit more delayed and the ingest pace may stay low enough for the flush to complete. With the compacting memstore, the active segment becomes a fresh CSLM again after every in-memory flush, and so writes stay very fast (see the second sketch below).
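To make point 1 concrete, here is a minimal, hypothetical sketch in plain JDK Java (not the real DefaultMemStore/CompactingMemStore code; "segment" here is just a sorted map) contrasting a flush that walks one map with a flush that has to merge several segments through a heap, which is where the extra compares come from:
{code:java}
import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

/** Illustrative only: a "segment" is modeled as a sorted map of row keys. */
public class FlushPathSketch {

  /** Default-memstore style flush: walk a single sorted map once. */
  static int flushSingleMap(NavigableMap<String, byte[]> active) {
    int cells = 0;
    for (Map.Entry<String, byte[]> e : active.entrySet()) {
      cells++; // the real code would hand the cell to the HFile writer here
    }
    return cells;
  }

  /** One cursor into a segment, used by the merging flush below. */
  static final class Cursor {
    final Iterator<Map.Entry<String, byte[]>> it;
    Map.Entry<String, byte[]> current;
    Cursor(Iterator<Map.Entry<String, byte[]>> it) { this.it = it; advance(); }
    void advance() { current = it.hasNext() ? it.next() : null; }
  }

  /**
   * Compacting-memstore style flush: k segments merged through a heap, so every
   * emitted cell pays O(log k) extra key compares to keep the heap ordered.
   */
  static int flushSegments(List<NavigableMap<String, byte[]>> segments) {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.current.getKey()));
    for (NavigableMap<String, byte[]> seg : segments) {
      Cursor c = new Cursor(seg.entrySet().iterator());
      if (c.current != null) heap.add(c);
    }
    int cells = 0;
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      cells++; // the real code would hand c.current to the HFile writer here
      c.advance();
      if (c.current != null) heap.add(c);
    }
    return cells;
  }

  public static void main(String[] args) {
    NavigableMap<String, byte[]> single = new ConcurrentSkipListMap<>();
    List<NavigableMap<String, byte[]>> segs = new ArrayList<>();
    for (int s = 0; s < 4; s++) segs.add(new ConcurrentSkipListMap<>());
    for (int i = 0; i < 100_000; i++) {
      String key = String.format("row-%07d", i);
      single.put(key, new byte[0]);
      segs.get(i % 4).put(key, new byte[0]);
    }
    System.out.println("single map flush cells: " + flushSingleMap(single));
    System.out.println("merged segments flush cells: " + flushSegments(segs));
  }
}
{code}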
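And for point 2, a rough, hypothetical micro-sketch (plain JDK, not a real benchmark harness, numbers will be noisy) of why a freshly swapped ConcurrentSkipListMap keeps accepting writes faster than one that already holds everything accumulated since the last flush:
{code:java}
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Rough illustration only: put latency into one ever-growing CSLM (default
 * memstore while a slow flush runs) versus a CSLM that is swapped for a fresh,
 * empty one every "in-memory flush" (compacting memstore active segment).
 */
public class CslmWriteSketch {

  static long timePutsMillis(boolean swapToFresh, int batches, int batchSize) {
    ConcurrentSkipListMap<String, byte[]> map = new ConcurrentSkipListMap<>();
    long start = System.nanoTime();
    int seq = 0;
    for (int b = 0; b < batches; b++) {
      for (int i = 0; i < batchSize; i++) {
        map.put(String.format("row-%09d", seq++), new byte[16]);
      }
      if (swapToFresh) {
        // compacting-memstore style: the filled segment moves to the pipeline
        // (simply dropped here) and writes continue into an empty CSLM
        map = new ConcurrentSkipListMap<>();
      }
      // default-memstore style: keep writing into the same, growing CSLM
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) {
    // crude numbers, JIT/GC noise included; only the trend matters
    System.out.println("growing CSLM (ms):          " + timePutsMillis(false, 20, 50_000));
    System.out.println("fresh CSLM per batch (ms):  " + timePutsMillis(true, 20, 50_000));
  }
}
{code}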
What about your global memstore size limit? I guess this might be a very large number. Normally in tests what we see is this barrier being breached, and so forced flushes with blocked writes, because there are enough regions in the RS and the writes go to all of them. So any single region crossing the 4x mark is less likely than this global barrier breach. When I do tests I normally size this barrier as 2 * regions# * flush size.
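As a back-of-the-envelope example of that sizing (my numbers, not from the report, assuming the usual defaults of hbase.hregion.memstore.flush.size = 128 MB, hbase.hregion.memstore.block.multiplier = 4 and hbase.regionserver.global.memstore.size = 0.4 of heap), the arithmetic looks like this:
{code:java}
/** Back-of-the-envelope sizing sketch; region count and heap size are assumptions. */
public class MemstoreSizingSketch {
  public static void main(String[] args) {
    long flushSizeBytes  = 128L << 20;  // hbase.hregion.memstore.flush.size (default 128 MB)
    int  blockMultiplier = 4;           // hbase.hregion.memstore.block.multiplier (default 4)
    int  regionsPerRS    = 50;          // assumed region count on one RS
    double globalFraction = 0.4;        // hbase.regionserver.global.memstore.size (default 0.4)
    long heapBytes       = 32L << 30;   // assumed 32 GB RS heap

    long perRegionBlockAt = flushSizeBytes * blockMultiplier;       // 512 MB: the 4x mark
    long globalBarrier    = (long) (heapBytes * globalFraction);    // ~12.8 GB on this heap
    long suggestedBarrier = 2L * regionsPerRS * flushSizeBytes;     // 2 * regions# * flush size

    System.out.printf("per-region blocking threshold: %d MB%n", perRegionBlockAt >> 20);
    System.out.printf("global memstore barrier:       %d MB%n", globalBarrier >> 20);
    System.out.printf("2 * regions# * flush size:     %d MB%n", suggestedBarrier >> 20);
  }
}
{code}
With many loaded regions the global barrier is normally hit well before any single region reaches its own 4x threshold, which is why sizing it around 2 * regions# * flush size matters in these tests.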
We very much need the flush to be faster. If the current runs are not on SSD, any chance of an SSD-based test? This is one reason why I am a fan of that JMS issue of a flush-to-SSD policy.
> [TESTING] Performance
> ---------------------
>
> Key: HBASE-20188
> URL: https://issues.apache.org/jira/browse/HBASE-20188
> Project: HBase
> Issue Type: Umbrella
> Components: Performance
> Reporter: stack
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: flamegraph-1072.1.svg, flamegraph-1072.2.svg, tree.txt
>
>
> How does 2.0.0 compare to old versions? Is it faster, slower? There is a rumor
> that it is much slower, and that the problem is the asyncwal writing. Does
> in-memory compaction slow us down or speed us up? What happens when you
> enable offheaping?
> Keep notes here in this umbrella issue. Need to be able to say something
> about perf when 2.0.0 ships.