[ 
https://issues.apache.org/jira/browse/HBASE-18586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-18586.
--------------------------------
    Resolution: Invalid

Please ask questions such as these on the u...@hbase.apache.org mailing list.
This JIRA instance is reserved for concrete code changes, not user support.
Thanks.

> Multiple column families - scan performance
> -------------------------------------------
>
>                 Key: HBASE-18586
>                 URL: https://issues.apache.org/jira/browse/HBASE-18586
>             Project: HBase
>          Issue Type: Bug
>          Components: scan
>            Reporter: PS0618
>
> I have 2 HBase tables: one with a single column family, and the other with 4
> column families. Both tables are keyed by the same rowkey, and each column
> family has a single column qualifier, with a JSON string as its value
> (each JSON payload is about 10-20K in size). All column families use
> fast-diff encoding and gzip compression.
> After loading about 60 million rows into each table, a scan test on any
> single column family in the 2nd table takes 4x as long as scanning the single
> column family in the 1st table. In both cases, the scanner is bounded by a
> start and stop key to scan 1 million rows. Performance did not change much
> even after running a major compaction on both tables.
> Though the HBase docs and other tech forums recommend against using more
> than 1 column family per table, nothing I have read so far suggests that scan
> performance will degrade linearly with the number of column families. Has
> anyone else experienced this, and is there a simple explanation for it?
> Note that the second table has 4 column families because, even though I only
> scan one column family at a time now, there are requirements to scan multiple
> column families from that table given a set of rowkeys.
> Thanks for any insight into the performance question.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
