Re: questions regarding hbase major compaction

2018-09-10 Thread Josh Elser

1. Yes.
2. HDFS NameNode pressure, slower reads, and generally poor performance.
3. The default configuration is weekly. Unless you know of a specific
reason why weekly doesn't work for you, that is what you should follow ;)
4. No.

I would be surprised if you need to do anything special with S3, but I 
don't know for sure.
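
To make the "weekly" default in point 3 a bit more concrete, here is a
rough sketch of the arithmetic, not HBase code. It assumes the documented
defaults of hbase.hregion.majorcompaction (604800000 ms, i.e. 7 days) and
hbase.hregion.majorcompaction.jitter (0.50); please check both against the
hbase-default.xml of your own version. The function name is made up for
illustration. With those values, a time-based major compaction lands
somewhere between 3.5 and 10.5 days after the previous one.

```python
from datetime import timedelta

# Assumed defaults, taken from the HBase documentation; verify for your version.
MAJOR_COMPACTION_PERIOD_MS = 604_800_000  # hbase.hregion.majorcompaction (7 days)
MAJOR_COMPACTION_JITTER = 0.50            # hbase.hregion.majorcompaction.jitter


def major_compaction_window(period_ms=MAJOR_COMPACTION_PERIOD_MS,
                            jitter=MAJOR_COMPACTION_JITTER):
    """Return (earliest, latest) gap between time-based major compactions.

    Hypothetical helper for illustration only. The jitter is a multiplier
    applied to the period, giving a window on either side of it.
    """
    if period_ms <= 0:
        # A period of 0 disables time-based major compactions.
        return None
    spread_ms = period_ms * jitter
    return (timedelta(milliseconds=period_ms - spread_ms),
            timedelta(milliseconds=period_ms + spread_ms))


if __name__ == "__main__":
    window = major_compaction_window()
    earliest, latest = window
    print("earliest:", earliest.total_seconds() / 86400, "days")
    print("latest:  ", latest.total_seconds() / 86400, "days")
```

The jitter is there so that all regions don't major-compact at the same
moment. If you do end up scheduling major compactions yourself, setting the
period to 0 turns the time-based trigger off.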


On 9/10/18 2:19 PM, Antonio Si wrote:

Hello,

As I understand it, deleted records in HBase files are not removed
until a major compaction is performed.

I have a few questions regarding major compaction:

1.   If I set a TTL and/or a maximum number of versions, will records
   older than the TTL, or versions beyond that maximum, remain in the
   HBase files until a major compaction is performed?
   Is my understanding correct?

2.   If a major compaction is never performed on a table, then besides
   the table size continuing to grow, we will eventually have too many
   HBase files and the cluster will slow down.
   Are there any other implications?

3.   Are there any guidelines on how often we should run major compaction?

4.   During major compaction, do we need to pause all read/write
   operations until the major compaction has finished?

   I have noticed that when using S3 as the storage, after I run a major
   compaction there are inconsistencies between the S3 metadata and the
   S3 file system, and I need to run "emrfs sync" to synchronize them
   once the major compaction has completed. Does that mean I need to
   pause all read/write operations during this period?

Thanks.

Antonio.


