[jira] [Updated] (HBASE-6351) IO impact reduction for compaction

Otis Gospodnetic (JIRA) Wed, 11 Jul 2012 12:05:36 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Otis Gospodnetic updated HBASE-6351:
------------------------------------

    Description: 
The following came from Otis via http://search-hadoop.com/m/MGVqgZJ4Mj2 :

Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene 
developers, wrote a really nice post about new things in this version of 
Lucene.  The part that I think is interesting for HBase, and that HBase devs 
may want to look at (and borrow to use with compactions) is this:

Reducing merge IO impact 

Merging (consolidating many small segments into a single big one) is a very IO 
and CPU intensive operation which can easily interfere with ongoing searches. 
In 4.0.0 we now have two ways to reduce this impact:
        * Rate-limit the IO caused by ongoing merging, by calling 
FSDirectory.setMaxMergeWriteMBPerSec. 
        * Use the new NativeUnixDirectory which bypasses the OS's IO cache for 
all merge IO, by using direct IO. This ensures that a merge won't evict hot 
pages used by searches. (Note that there is also a native WindowsDirectory, but 
it does not yet use direct IO during merging... patches welcome!). 
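
For HBase, the first option above could translate into pacing the bytes that a 
compaction writes. The following is only a minimal sketch of that idea, not 
Lucene's or HBase's implementation: ThrottledOutputStream and its maxMBPerSec 
parameter are hypothetical names made up for illustration. It caps the average 
write rate measured from the moment the stream is created, so a long idle 
period would allow a later burst. It does not attempt the direct-IO option.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;

/**
 * Hypothetical wrapper (an assumption for illustration, not an HBase or
 * Lucene class): sleeps after each write so that the average rate since
 * construction stays at or below maxMBPerSec.
 */
final class ThrottledOutputStream extends FilterOutputStream {
    private final double bytesPerNano;   // allowed bytes per nanosecond
    private final long startNanos;       // time the stream was created
    private long bytesWritten;           // total bytes written so far

    ThrottledOutputStream(OutputStream out, double maxMBPerSec) {
        super(out);
        if (maxMBPerSec <= 0) {
            throw new IllegalArgumentException("maxMBPerSec must be positive");
        }
        this.bytesPerNano = maxMBPerSec * 1024 * 1024 / 1_000_000_000.0;
        this.startNanos = System.nanoTime();
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pause(1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Write the whole chunk at once, then sleep off any excess speed.
        out.write(buf, off, len);
        pause(len);
    }

    /** Sleeps until the total written so far is within the allowed average rate. */
    private void pause(int justWritten) throws IOException {
        bytesWritten += justWritten;
        long earliestAllowed = startNanos + (long) (bytesWritten / bytesPerNano);
        long waitNanos = earliestAllowed - System.nanoTime();
        if (waitNanos > 0) {
            try {
                Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }
}
```

A compaction writer could wrap its output stream in such a class while flushes 
and user-facing writes keep using the unwrapped stream, which is roughly the 
effect the rate-limit setting has on Lucene merges.
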

Remember to also set swappiness to 0 on Linux if you want to maximize search 
responsiveness. 

More generally, the APIs that open an input or output file (Directory.openInput 
and Directory.createOutput) now take an IOContext describing what's being done 
(e.g., flush vs merge), so you can create a custom Directory that changes its 
behavior depending on the context. 
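
Carried over to HBase, the context idea might look like tagging each file open 
with its purpose and letting a policy object decide how that IO is treated. 
The sketch below is an assumption-laden illustration only: IoPurpose and 
IoPolicy are hypothetical names, loosely modeled on the IOContext description 
above, and are not part of Lucene or HBase.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical tag describing why a file is being opened. */
enum IoPurpose { FLUSH, COMPACTION, USER_READ }

/** Hypothetical policy: maps an IO purpose to an optional write-rate cap. */
final class IoPolicy {
    private final Map<IoPurpose, Double> capsMBPerSec = new EnumMap<>(IoPurpose.class);

    /** Registers a cap in MB/sec for one purpose; returns this for chaining. */
    IoPolicy limit(IoPurpose purpose, double maxMBPerSec) {
        if (maxMBPerSec <= 0) {
            throw new IllegalArgumentException("maxMBPerSec must be positive");
        }
        capsMBPerSec.put(purpose, maxMBPerSec);
        return this;
    }

    /** Returns the cap for the purpose, or empty if that IO is unthrottled. */
    OptionalDouble capFor(IoPurpose purpose) {
        Double cap = capsMBPerSec.get(purpose);
        return cap == null ? OptionalDouble.empty() : OptionalDouble.of(cap);
    }
}
```

With a policy built as new IoPolicy().limit(IoPurpose.COMPACTION, 20.0), code 
that opens a store file for compaction would see a 20 MB/sec cap, while a 
flush would see no cap and proceed at full speed.
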

  was:
The following came from Otis:

Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene 
developers, wrote a really nice post about new things in this version of 
Lucene.  The part that I think is interesting for HBase, and that HBase devs 
may want to look at (and borrow to use with compactions) is this:

Reducing merge IO impact 

Merging (consolidating many small segments into a single big one) is a very IO 
and CPU intensive operation which can easily interfere with ongoing searches. 
In 4.0.0 we now have two ways to reduce this impact:
        * Rate-limit the IO caused by ongoing merging, by calling 
FSDirectory.setMaxMergeWriteMBPerSec. 
        * Use the new NativeUnixDirectory which bypasses the OS's IO cache for 
all merge IO, by using direct IO. This ensures that a merge won't evict hot 
pages used by searches. (Note that there is also a native WindowsDirectory, but 
it does not yet use direct IO during merging... patches welcome!). 

Remember to also set swappiness to 0 on Linux if you want to maximize search 
responsiveness. 

More generally, the APIs that open an input or output file (Directory.openInput 
and Directory.createOutput) now take an IOContext describing what's being done 
(e.g., flush vs merge), so you can create a custom Directory that changes its 
behavior depending on the context. 

    
> IO impact reduction for compaction
> ----------------------------------
>
>                 Key: HBASE-6351
>                 URL: https://issues.apache.org/jira/browse/HBASE-6351
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zhihong Ted Yu
>
> The following came from Otis via http://search-hadoop.com/m/MGVqgZJ4Mj2 :
> Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene 
> developers, wrote a really nice post about new things in this version of 
> Lucene.  The part that I think is interesting for HBase, and that HBase devs 
> may want to look at (and borrow to use with compactions) is this:
> Reducing merge IO impact 
> Merging (consolidating many small segments into a single big one) is a very 
> IO and CPU intensive operation which can easily interfere with ongoing 
> searches. In 4.0.0 we now have two ways to reduce this impact:
>         * Rate-limit the IO caused by ongoing merging, by calling 
> FSDirectory.setMaxMergeWriteMBPerSec. 
>         * Use the new NativeUnixDirectory which bypasses the OS's IO cache 
> for all merge IO, by using direct IO. This ensures that a merge won't evict 
> hot pages used by searches. (Note that there is also a native 
> WindowsDirectory, but it does not yet use direct IO during merging... patches 
> welcome!). 
> Remember to also set swappiness to 0 on Linux if you want to maximize search 
> responsiveness. 
> More generally, the APIs that open an input or output file 
> (Directory.openInput and Directory.createOutput) now take an IOContext 
> describing what's being done (e.g., flush vs merge), so you can create a 
> custom Directory that changes its behavior depending on the context. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
