[ https://issues.apache.org/jira/browse/HBASE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665288#action_12665288 ]
Jonathan Gray commented on HBASE-1141:
--------------------------------------
We're looking into this because our most common query is:

    get_all_columns(table, row, family)

When the family holds only a small number of columns, our random access times
are on the order of 2ms. But if there are a few thousand columns in the family,
the same query can take more than 100ms.
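To make the shape of this query concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory model in which one row's cells are kept sorted by a "family:qualifier" key; `SortedRow` and its layout are illustrative only, not HBase code. Finding the start of a family is a binary search, but every matching cell still has to be walked and copied, so the work grows with the number of columns returned.

```python
import bisect


class SortedRow:
    """Hypothetical model of one row: cells sorted by "family:qualifier".

    This only illustrates the access pattern; it is not how HBase stores data.
    """

    def __init__(self, cells):
        # cells: dict mapping "family:qualifier" -> value
        self._keys = sorted(cells)
        self._cells = dict(cells)

    def get_all_columns(self, family):
        """Return every (column, value) pair in `family`, in sorted order."""
        prefix = family + ":"
        # O(log n): binary search for the first key in the family's range.
        i = bisect.bisect_left(self._keys, prefix)
        result = []
        # O(k): keys sharing a prefix are contiguous, so walk until it ends.
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            result.append((key, self._cells[key]))
            i += 1
        return result


if __name__ == "__main__":
    row = SortedRow({"info:name": "x", "tags:t2": 2, "tags:t0": 0, "tags:t1": 1})
    print(row.get_all_columns("tags"))
```

For example, `SortedRow({f"tags:c{i:05d}": i for i in range(3000)}).get_all_columns("tags")` returns 3000 pairs; timing calls like that with increasing column counts is one simple way to see the per-column cost in isolation.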
Certainly there is some inherent inefficiency in a query like this, because
every store must be checked. But even when serving out of memory (the new cache
Erik is designing), there is a significant performance hit to having many
columns. Erik has done some timing tests and can post what he has found.
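The "check all stores" cost can be sketched the same way. The model below rests on a simplifying assumption: each store (for example an in-memory store plus one or more on-disk store files) is a dict from "family:qualifier" to a (timestamp, value) pair, and the newest timestamp wins. Because any store may hold the latest version of a column, a family fetch has to read every store and merge the results, so the work scales with the total number of cells the family holds across all stores. `get_family_from_stores` is a hypothetical name, not an HBase API.

```python
def get_family_from_stores(stores, family):
    """Merge one row's cells for `family` across several stores.

    Hypothetical sketch: each store is a dict mapping "family:qualifier" to
    (timestamp, value); for each column the highest timestamp wins.
    """
    prefix = family + ":"
    newest = {}  # column -> (timestamp, value) of the newest version seen
    # Every store must be scanned: any of them may hold the latest version.
    for store in stores:
        for key, (ts, value) in store.items():
            if key.startswith(prefix) and (key not in newest or ts > newest[key][0]):
                newest[key] = (ts, value)
    # Return columns in sorted order, dropping the timestamps.
    return [(key, newest[key][1]) for key in sorted(newest)]


if __name__ == "__main__":
    in_memory_store = {"tags:a": (3, "new")}
    store_file_1 = {"tags:a": (1, "old"), "tags:b": (1, "b1")}
    store_file_2 = {"info:x": (2, "x"), "tags:c": (2, "c")}
    print(get_family_from_stores([in_memory_store, store_file_1, store_file_2], "tags"))
```

A real read path would more likely merge already-sorted sources than build a dict and sort it, but the point is the same: the cost is driven by how many cells the family holds in every store, not just by how many distinct columns come back.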
> Fetching large numbers of columns is slow outside of HDFS
> ---------------------------------------------------------
>
> Key: HBASE-1141
> URL: https://issues.apache.org/jira/browse/HBASE-1141
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.0
> Reporter: Jonathan Gray
> Fix For: 0.20.0
>
>
> While working on a Cell cache, we have found during random-read tests that
> the number of columns has an enormous impact on performance. Even after
> accounting for increased HDFS access time, a great deal of time is still
> spent coming out of the Region and then across the wire to HTable.
> Erik Holstad has done this testing and will post some of his results here
> once they are complete.