[ 
https://issues.apache.org/jira/browse/HBASE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akashnil updated HBASE-6361:
----------------------------

    Description: 
Currently, compaction requests from every column family/region on a region
server are submitted to that server's minor/major compaction queue, and the
requests are processed in FIFO (first-in, first-out) order. We propose a lazy
scheduler in place of the current queue-based one. The idea of lazy scheduling
is that it is better to defer a decision (here, the compaction selection)
until its outcome is actually needed. Presently, when the queue gets
bottlenecked, there is a delay between the compaction selection of a request
and its execution. Instead, we can postpone the compaction selection until the
queue is empty, at which point we will have more information and more choices
(new flush files will have arrived by then) and can make a better decision.

In place of the queue, we propose a round-robin scheduler. All the column
families in all regions will be visited in sequence, periodically. On each
visit, if the column family yields a valid compaction request, that request is
executed before moving on to the next one. We do not plan to change the
current compaction selection algorithm for now; we expect it to automatically
make better decisions simply because selection now happens just in time. How
do we know that? Consider an example.
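A minimal sketch of the visiting loop described above (the names Store,
MIN_FILES, and runOnePass are illustrative, not HBase APIs): each store is
visited in rotation, and a compaction is selected only at visit time, so the
selection sees every flush file that has arrived since the previous visit.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class RoundRobinScheduler {
    // Minimum number of flush files needed before a compaction is selected.
    static final int MIN_FILES = 4;

    // Stand-in for a column family's store (hypothetical, not an HBase class).
    static class Store {
        final String name;
        int flushFiles; // flush files awaiting compaction
        Store(String name, int flushFiles) {
            this.name = name;
            this.flushFiles = flushFiles;
        }
    }

    // One full round-robin pass over all stores; returns how many
    // compactions actually ran.
    static int runOnePass(Deque<Store> stores) {
        int compactions = 0;
        int n = stores.size();
        for (int i = 0; i < n; i++) {
            Store s = stores.poll();
            if (s.flushFiles >= MIN_FILES) {
                // Selection happens *now*, so it covers every file that has
                // accumulated since the last visit (just-in-time selection).
                System.out.println("compacting " + s.name + ": "
                        + s.flushFiles + " files -> 1");
                s.flushFiles = 0; // merged into one larger file
                compactions++;
            }
            stores.add(s); // back of the rotation either way
        }
        return compactions;
    }

    public static void main(String[] args) {
        Deque<Store> stores = new ArrayDeque<>(Arrays.asList(
                new Store("cf1", 16), new Store("cf2", 2), new Store("cf3", 5)));
        System.out.println("compactions run: " + runOnePass(stores));
    }
}
```

A real implementation would execute compactions asynchronously and might
weight the rotation by store activity; this sketch only illustrates the
just-in-time selection that replaces queued requests.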

Suppose there is a short-term bottleneck, so that the queue is blocked for a
period of time (let the minimum number of files for a compaction be 4). For an
active column family, as new flushes are written, new compaction requests of
size 4 each will keep being added to the queue until the queue starts
processing them.

Now consider a round-robin scheduler. A bottleneck in compaction IO instead
shows up as a longer latency before the same column family is visited again.
Suppose that when the active column family is next visited, after a long
delay, 16 new flush files have accumulated there. The compaction selection
algorithm will then select one compaction request of size 16, as opposed to
the 4 compaction requests of size 4 that would have been generated in the
previous case.

A compaction request over 16 flush files is more IOPs-efficient than the same
set of files being compacted 4 at a time: both consume the same total amount
of reads and writes, but the former produces one file of size 16 where the
latter produces 4 files of size 4. So, with round-robin scheduling, we
obtained the 4*4 -> 16 compaction for free. Under the queue, those smaller
size-4 files would later have consumed additional IOPs to be merged into
bigger files.
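The IOPs argument above can be checked with back-of-envelope arithmetic.
Assuming unit-sized flush files and an IO cost of (bytes read + bytes written)
per compaction (both simplifying assumptions, for illustration only), the
queue-based path spends twice the IO to reach the same single size-16 file:

```java
public class CompactionCost {
    // Merging files whose sizes sum to `total` reads `total` units and
    // writes `total` units, so cost = 2 * total under the assumptions above.
    static int mergeCost(int total) { return 2 * total; }

    public static void main(String[] args) {
        // Lazy / round-robin: one 16-file compaction -> a single size-16 file.
        int lazy = mergeCost(16);                     // 32 IO units

        // FIFO queue: four 4-file compactions -> four size-4 files, which
        // still need one more 4*4 -> 16 merge to reach the same state.
        int eager = 4 * mergeCost(4) + mergeCost(16); // 32 + 32 = 64 IO units

        System.out.println("lazy=" + lazy + " eager=" + eager);
    }
}
```

Both schedules spend 32 units to absorb the 16 flush files; the extra 32 units
on the eager side are the later 4*4 -> 16 merge that round-robin got for free.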

On my simulator, I ran some experiments on how a bottleneck in the queue
affects compaction selection in the current system. It appears that a
filled-up queue makes all future compaction selections less and less
efficient in terms of IOPs, resulting in a runaway positive feedback loop
which can potentially explode the compaction queue. (This was also observed in
production recently.) The main effect of this change should be to deal with
bursty loads: when a bottleneck occurs, compaction selection will become more
IOPs-efficient rather than less efficient, resulting in negative feedback and
an easier return to stability. As for monitoring, the compaction queue size
will no longer exist as a metric; however, the number of files in each
compaction will indicate whether a bottleneck has occurred.

  was:
Currently the compaction requests are submitted to the minor/major compaction 
queue of a region-server from every column-family/region belonging to it. The 
requests are processed from the queue in FIFO order (First in First out). We 
want to make a lazy scheduler in place of the current queue-based one. The idea 
of lazy scheduling is that, it is always better to make a decision (compaction 
selection) later if the decision is relevant later only. Presently, when the 
queue gets bottle-necked, there is a delay between compaction selection of a 
request and its execution. Rather than that, we can postpone the compaction 
selection until the queue is empty when we will have more information (new 
flush files will have affected the state) to make a better decision.

Removing the queue, we propose to implement a round-robin scheduler. All the 
column families in their regions will be visited in sequence periodically. In 
each visit, if the column family generates a valid compaction request, the 
request is executed before moving to the next one. We do not plan to change the 
current compaction algorithm for now. We expect that it will automatically make 
a better decision when doing just-in-time selection due to the new change. How 
do we know that? Let us consider an example.

Note that the presently existing compaction queue is only relevant as a
buffer: when the flushes out-pace the compactions for a period of time, or a
relatively large compaction takes time to complete, the queue accumulates
requests. Suppose such a short-term bottleneck has occurred, and suppose
min-files for compaction = 4. For an active column-family, when new flushes
are written, new compaction requests, each of size 4, will be added to the
queue continuously until the queue starts processing them.

Now consider a round-robin scheduler. The effect of a bottle-neck due to the IO 
rate of compaction results in a longer latency to visit the same column family 
again. When the same active column family is visited following a long delay, 
suppose 16 new flush files have been written there. The compaction selection 
algorithm will select one compaction request of size 16, as opposed to 4 
compaction requests of size 4 that would have been generated in the previous 
case.

A compaction request with 16 flush files is more IOPs-efficient than the same 
set of files being compacted 4 at a time. This is because both consume the same 
total amount of reads and writes while producing a file of size 16 compared to 
4 files of size 4. So we obtained a free compaction 4*4->16 without paying for 
it. In case of the queue, those smaller 4 sized files would have consumed more 
IOPs to become bigger later.

I did some experiments on how a bottle-neck of the queue affects the compaction 
selections in the simulator. It appears that, a filled up queue actually makes 
all future compaction selections less and less efficient in terms of IOPs, 
resulting in a runaway positive feedback loop which can sometimes explode the
compaction queue. The main effect of this change should be to deal with bursty 
loads. When a bottleneck occurs, the compaction selection will become more 
IOPs-efficient rather than less efficient, resulting in negative feedback and 
restoration to stability more easily. As for monitoring, the compaction queue 
size will not be present as a metric. However, the number of files in each 
compaction will indicate if a bottleneck has occurred.

    
> Change the compaction queue to a round robin scheduler
> ------------------------------------------------------
>
>                 Key: HBASE-6361
>                 URL: https://issues.apache.org/jira/browse/HBASE-6361
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Akashnil
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
