[ 
https://issues.apache.org/jira/browse/HBASE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-3048:
-----------------------------------------

    Description: 
Today minor compactions do not process deletes, purge old versions, etc. Only 
major compactions do.  The rationale was probably to save CPU (?). We should 
evaluate if major compaction logic indeed runs significantly slower.

Unifying minor compactions to do the same thing as major compactions has other 
advantages:

* If the same keys are deleted/updated repeatedly, the fact that 
deletes/overwrites are not processed during minor compaction makes each 
subsequent minor compaction more expensive as the total amount of data keeps 
growing.

* We'll have fewer bugs if the logic is as symmetric as possible. Any bugs in 
TTL enforcement, version enforcement, etc. could cause behavior to be different 
after a major compaction. Keeping the same logic means these bugs will get 
caught earlier.

-

Note: There will still need to be one difference in the two schemes, and that 
has to do with delete markers. Any compaction which doesn't compact all files 
will still need to leave delete markers.
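
To make the note above concrete, here is a minimal sketch of a single compaction routine shared by both schemes. It is not HBase code: `Cell`, `compact`, `max_versions` and `includes_all_files` are hypothetical names, and the delete semantics are simplified so that a delete marker masks every cell of the same key with an equal or older timestamp. The only place the two schemes differ is whether delete markers are written to the output.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical, simplified stand-in for a stored key/value."""
    key: str
    timestamp: int
    value: str = ""
    is_delete: bool = False


def compact(files: Iterable[Iterable[Cell]],
            max_versions: int,
            includes_all_files: bool) -> List[Cell]:
    """Merge cells from several files using one code path.

    Deletes and version limits are always applied. Delete markers are
    dropped only when every file of the store takes part in the compaction
    (includes_all_files=True); otherwise they are kept, because files outside
    this compaction may still hold cells that the markers must mask.
    """
    # Order by key, newest timestamp first, delete markers before puts
    # that share the same timestamp.
    cells = sorted(
        (c for f in files for c in f),
        key=lambda c: (c.key, -c.timestamp, not c.is_delete),
    )

    out: List[Cell] = []
    current_key: Optional[str] = None
    delete_ts: Optional[int] = None  # newest delete marker seen for the key
    kept = 0                         # versions already emitted for the key

    for c in cells:
        if c.key != current_key:
            current_key, delete_ts, kept = c.key, None, 0

        if c.is_delete:
            if delete_ts is None:
                delete_ts = c.timestamp
                if not includes_all_files:
                    out.append(c)  # the one remaining difference
            # Older markers are redundant under the simplified semantics.
            continue

        if delete_ts is not None and c.timestamp <= delete_ts:
            continue  # masked by a delete marker
        if kept < max_versions:
            out.append(c)
            kept += 1

    return out
```

Calling `compact(files, max_versions, includes_all_files=False)` models a compaction over a subset of files, and `includes_all_files=True` models one over all files; both purge masked cells and excess versions, so the merged output no longer grows with every repeated delete or overwrite.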


  was:
Today minor compactions do not process deletes, purge old versions, etc. Only 
major compactions do.  The rationale was probably to save CPU (?). We should 
evaluate if major compaction logic indeed runs significantly slower.

Unifying minor compactions to do the same thing as major compactions has other 
advantages:

* If the same data is overwritten several times and we are not processing 
overwrites, it makes each subsequent minor compaction more expensive as the 
total amount of data.

* We'll have fewer bugs if the logic is as symmetric as possible. Any bugs in 
TTL enforcement, version enforcement, etc. could cause behavior to be different 
after a major compaction. Keeping the same logic means these bugs will get 
caught earlier.

-

Note: There will still need to be one difference in the two schemes, and that 
has to do with delete markers. Any compaction which doesn't compact all files 
will still need to leave delete markers.



> unify code for major/minor compactions
> --------------------------------------
>
>                 Key: HBASE-3048
>                 URL: https://issues.apache.org/jira/browse/HBASE-3048
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.