[jira] [Commented] (HBASE-9811) ColumnPaginationFilter is slow when offset is large

Chao Shi (JIRA) Thu, 14 Nov 2013 01:17:06 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-9811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822264#comment-13822264
 ]


Chao Shi commented on HBASE-9811:
---------------------------------

HBASE-9969 has been opened to improve the performance of KeyValueHeap.

> ColumnPaginationFilter is slow when offset is large
> ---------------------------------------------------
>
>                 Key: HBASE-9811
>                 URL: https://issues.apache.org/jira/browse/HBASE-9811
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Chao Shi
>
> Hi there, we are trying to migrate an app from MySQL to HBase. One kind of 
> query is pagination with a large offset and a small limit. We don't have 
> many such queries, so both MySQL and HBase should survive. (MySQL has no 
> index for offset either.)
> When comparing the performance of both systems, we found something 
> interesting: write ~1M values in a single row, then query with offset = 1M, 
> so that all values have to be scanned on the RS side.
> On MySQL, the first query is pretty slow (more than 1 second), but 
> repeating the same query gives very low latency.
> On HBase, on the other hand, repeating the query does not help much (~1s 
> every time). I can confirm that all data are in the block cache and that all 
> the time is spent on in-memory data processing. (We have flushed the data to 
> disk.)
> I found that "reseek" is the hot spot. It is caused by ColumnPaginationFilter 
> returning NEXT_COL. If I replace this line with one returning SKIP (which 
> causes next to be called rather than reseek), the latency is reduced to 
> ~100ms.
> So I think there must be some room for optimization.
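
The trade-off between NEXT_COL and SKIP described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HBase code: the names PaginationSketch, ToyPaginationFilter, Code and scan are hypothetical, each column is assumed to hold a single version, and a reseek is modeled as a binary search over a sorted array, which is only a stand-in for the real reseek cost inside the region server. A plain next is modeled as one comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are part of HBase.
class PaginationSketch {
    /** Hypothetical stand-ins for the filter return codes named in the report. */
    enum Code { SKIP, NEXT_COL, INCLUDE, DONE }

    /** Skips the first `offset` columns, includes the next `limit`, then stops. */
    static final class ToyPaginationFilter {
        private final int limit;
        private final int offset;
        private final Code skipCode; // code returned while still below the offset
        private int count = 0;

        ToyPaginationFilter(int limit, int offset, Code skipCode) {
            this.limit = limit;
            this.offset = offset;
            this.skipCode = skipCode;
        }

        Code filterCell() {
            if (count >= offset + limit) {
                return Code.DONE;
            }
            Code code = count < offset ? skipCode : Code.INCLUDE;
            count++;
            return code;
        }
    }

    /** Columns selected by the scan plus the number of modeled key comparisons. */
    static final class Result {
        final List<Integer> columns = new ArrayList<>();
        long comparisons = 0;
    }

    static Result scan(int[] sortedColumns, int limit, int offset, Code skipCode) {
        ToyPaginationFilter filter = new ToyPaginationFilter(limit, offset, skipCode);
        Result result = new Result();
        int pos = 0;
        while (pos < sortedColumns.length) {
            Code code = filter.filterCell();
            if (code == Code.DONE) {
                break;
            }
            if (code == Code.INCLUDE) {
                result.columns.add(sortedColumns[pos]);
            }
            if (code == Code.NEXT_COL) {
                // Modeled reseek (assumption): binary search for the first
                // column strictly greater than the current one.
                int target = sortedColumns[pos];
                int lo = pos;
                int hi = sortedColumns.length;
                while (lo < hi) {
                    int mid = (lo + hi) >>> 1;
                    result.comparisons++;
                    if (sortedColumns[mid] <= target) {
                        lo = mid + 1;
                    } else {
                        hi = mid;
                    }
                }
                pos = lo;
            } else {
                // Modeled next: step to the adjacent cell at unit cost.
                result.comparisons++;
                pos++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] cols = new int[1000];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = i;
        }
        Result viaSkip = scan(cols, 5, 990, Code.SKIP);
        Result viaNextCol = scan(cols, 5, 990, Code.NEXT_COL);
        System.out.println("SKIP:     " + viaSkip.columns + " comparisons=" + viaSkip.comparisons);
        System.out.println("NEXT_COL: " + viaNextCol.columns + " comparisons=" + viaNextCol.comparisons);
    }
}
```

In this model both settings return the same page, but every skipped column costs a full search when NEXT_COL is used and a single step when SKIP is used. With one version per column the two are interchangeable; with multiple versions per column SKIP would step through the extra versions one by one, so the model says nothing about that case.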



--
This message was sent by Atlassian JIRA
(v6.1#6144)
