[ 
https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3551.
--------------------------

    Resolution: Won't Fix

Ok.  Closing.  Will reference your comment, Marc, over in HBASE-25, etc.  I also 
added a section to schema design on the size of rows and column family names, 
and on keeping them small.  Thanks for digging in, boss.

  <section xml:id="keysize">
      <title>Try to minimize row and column sizes</title>
      <para>In HBase, values are always freighted with their coordinates; as a
          cell value passes through the system, it'll be accompanied by its
          row, column name, and timestamp.  Always.  If your rows and column names
          are large, especially compared to the size of the cell value, then
          you may run up against some interesting scenarios.  One such is
          the case described by Marc Limotte at the tail of
          <link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
          (recommended!).
          Therein, the indices that are kept on HBase storefiles (<link linkend="hfile">HFile</link>s)
          to facilitate random access may end up occupying large chunks of the HBase
          allotted RAM because the cell value coordinates are large.
          Marc, in the above cited comment, suggests upping the block size so
          entries in the store file index happen at a larger interval, or
          modifying the table schema so it makes for smaller rows and column
          names.
      </para>
  </section>
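
To make the trade-off in the section above concrete, here is a rough
back-of-envelope sketch in Java.  It is not HBase code and not how HBase
accounts for heap.  It assumes one index entry per data block, that each
entry holds one full key (row, family, qualifier, plus about 12 bytes of
lengths, timestamp, and type), and a made-up fixed overhead of 12 bytes per
entry.  The class name, method names, and all sizes below are hypothetical.

```java
public class IndexHeapEstimate {

    // Approximate key size: row + family + qualifier plus ~12 bytes of
    // length fields, timestamp, and type (an assumption for illustration).
    static int keyBytes(int rowBytes, int familyBytes, int qualifierBytes) {
        return rowBytes + familyBytes + qualifierBytes + 12;
    }

    // Assumed model: one index entry per data block (ceiling division).
    static long indexEntries(long storeFileBytes, long blockSizeBytes) {
        return (storeFileBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Rough index size: entries * (key bytes + assumed per-entry overhead).
    static long indexHeapBytes(long storeFileBytes, long blockSizeBytes,
                               int keyBytes, int perEntryOverheadBytes) {
        return indexEntries(storeFileBytes, blockSizeBytes)
                * (long) (keyBytes + perEntryOverheadBytes);
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;     // one 1G store file
        int overhead = 12;          // hypothetical per-entry overhead
        long smallBlock = 64 * 1024;   // 64 KB blocks
        long largeBlock = 256 * 1024;  // 256 KB blocks

        int bigKey = keyBytes(200, 20, 50);   // long row and column names
        int smallKey = keyBytes(16, 1, 4);    // short row and column names

        // Compare the two remedies: larger blocks versus smaller keys.
        System.out.println("big key, 64K blocks:    "
                + indexHeapBytes(oneGiB, smallBlock, bigKey, overhead));
        System.out.println("big key, 256K blocks:   "
                + indexHeapBytes(oneGiB, largeBlock, bigKey, overhead));
        System.out.println("small key, 64K blocks:  "
                + indexHeapBytes(oneGiB, smallBlock, smallKey, overhead));
    }
}
```

Under this model the index shrinks linearly with either change: quadrupling
the block size cuts the entry count to a quarter, and shortening the keys
cuts the bytes held per entry.  Real numbers will differ, and small cells
pack more keys per block without changing the entry count, so treat this
only as a way to reason about the direction of the effect.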

> Loaded hfile indexes occupy a good chunk of heap; look into shrinking the 
> amount used and/or evicting unused indices
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3551
>                 URL: https://issues.apache.org/jira/browse/HBASE-3551
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
>
> I hung with a user Marc and we were looking over configs and his cluster 
> profile up on ec2.  One thing we noticed was that his 100+ 1G regions of two 
> families had ~2.5G of heap resident.  We did a bit of math and couldn't get 
> to 2.5G so that needs looking into.  Even still, 2.5G is a bunch of heap to 
> give over to indices (He actually OOME'd when he had his RS heap set to just 
> 3G; we shouldn't OOME, we should just run slower).  It sounds like he needs 
> the indices loaded but still, for some cases we should drop indices for 
> unaccessed files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
