[ https://issues.apache.org/jira/browse/HBASE-9811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822264#comment-13822264 ]
Chao Shi commented on HBASE-9811:
---------------------------------

HBASE-9969 has been opened to improve the performance of KeyValueHeap.

> ColumnPaginationFilter is slow when offset is large
> ---------------------------------------------------
>
>                 Key: HBASE-9811
>                 URL: https://issues.apache.org/jira/browse/HBASE-9811
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Chao Shi
>
> Hi there, we are trying to migrate an app from MySQL to HBase. One kind of
> query is pagination with a large offset and a small limit. We don't have too
> many such queries, so both MySQL and HBase should survive. (MySQL has no
> index for offset either.)
> When comparing the performance of both systems, we found something
> interesting: we wrote ~1M values to a single row and queried with
> offset = 1M, so all values have to be scanned on the RS side.
> When running the query on MySQL, the first query is pretty slow (more than 1
> second), but when the same query is repeated the latency becomes very low.
> On HBase, on the other hand, repeating the query does not help much (~1s
> forever). I can confirm that all data is in the block cache and all the time
> is spent on in-memory data processing. (We have flushed data to disk.)
> I found that "reseek" is the hot spot. It is caused by ColumnPaginationFilter
> returning NEXT_COL. If I replace this line so that it returns SKIP (which
> causes next to be called rather than reseek), the latency is reduced to
> ~100ms.
> So I think there must be some room for optimization.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
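
To make the numbers in the report concrete, here is a minimal, self-contained sketch of the pagination decision described above. It does not use the HBase API: the `ReturnCode` enum, `decide`, `countCodes`, and the `skipInsteadOfNextCol` flag are hypothetical names invented for illustration, and the logic is a simplified model that assumes one version per column. It only counts how often each hint would be returned for one wide row. Per the report, every NEXT_COL hint leads to a reseek, while SKIP leads to a plain next.

```java
public class PaginationSketch {
    /** Hypothetical stand-in for the filter return codes named in the issue. */
    enum ReturnCode { SKIP, NEXT_COL, INCLUDE, NEXT_ROW }

    /**
     * Simplified pagination decision for the column at 0-based position
     * `index` within a row. Assumption: one version per column.
     */
    static ReturnCode decide(int index, int offset, int limit,
                             boolean skipInsteadOfNextCol) {
        if (index >= offset + limit) {
            return ReturnCode.NEXT_ROW;          // page is full, leave the row
        }
        if (index < offset) {
            // Columns before the offset are discarded; the issue compares
            // NEXT_COL (reseek) with SKIP (next) for exactly this branch.
            return skipInsteadOfNextCol ? ReturnCode.SKIP : ReturnCode.NEXT_COL;
        }
        return ReturnCode.INCLUDE;               // column belongs to the page
    }

    /** Counts how many times each code is returned while paging one row. */
    static int[] countCodes(int columns, int offset, int limit,
                            boolean skipInsteadOfNextCol) {
        int[] counts = new int[ReturnCode.values().length];
        for (int i = 0; i < columns; i++) {
            ReturnCode code = decide(i, offset, limit, skipInsteadOfNextCol);
            counts[code.ordinal()]++;
            if (code == ReturnCode.NEXT_ROW) {
                break;                           // nothing more to read in this row
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Roughly the scenario from the report: ~1M columns, offset = 1M.
        int[] c = countCodes(1_000_010, 1_000_000, 10, false);
        System.out.println("NEXT_COL hints: " + c[ReturnCode.NEXT_COL.ordinal()]);
        System.out.println("INCLUDE hints:  " + c[ReturnCode.INCLUDE.ordinal()]);
    }
}
```

With offset = 1M, the discard branch runs a million times per query, so any per-call cost difference between reseek and next is multiplied by that count. The sketch says nothing about the actual cost of either call; that is what the report measured (~1s versus ~100ms).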