[jira] [Commented] (HBASE-8174) Backport HBASE-8161(setting blocking file count on table level doesn't work)
[ https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618754#comment-13618754 ]

clockfly commented on HBASE-8174:

backport patch for 0.94 updated.
https://issues.apache.org/jira/secure/attachment/12576375/hbase-8174.patch

> Backport HBASE-8161(setting blocking file count on table level doesn't work)
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8174.patch, HBASE-8174.patch.0.94.v1
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is
> configured at region server level.
> We should allow it to be configured at Table level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8174) Backport HBASE-8161(setting blocking file count on table level doesn't work)
[ https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8174:

Summary: Backport HBASE-8161(setting blocking file count on table level doesn't work) (was: Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles")

> Backport HBASE-8161(setting blocking file count on table level doesn't work)
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8174.patch, HBASE-8174.patch.0.94.v1
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is
> configured at region server level.
> We should allow it to be configured at Table level.
[jira] [Updated] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
[ https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8174:

Attachment: hbase-8174.patch

> Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8174.patch, HBASE-8174.patch.0.94.v1
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is
> configured at region server level.
> We should allow it to be configured at Table level.
[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations" to 0.94
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8176:

Release Note:
With HBASE-8176("Dynamic Schema Configurations"), we can define table/column family specific configuration by HColumnDescriptor.setValue() or HTableDescriptor.setValue().
We can also do this easily in hbase shell.
Change the table-scope by set attribute CONFIG like this:
  alter 'test', METHOD => 'table_att', CONFIG => {'hbase.hstore.compaction.min' => '5'}
Change the column family config by set attribute CONFIG like this:
  alter 'test', NAME => 'f', CONFIG => {'hbase.hstore.compaction.min' => '5'}

was:
With HBASE-8176("Dynamic Schema Configurations"), we can define table/column family specific configuration by HColumnDescriptor.setValue() or HTableDescriptor.setValue().
We can also do this easily in hbase shell.
Change the table-scope by set attribute CONFIG like this:
  alter 'test', METHOD => 'table_att', CONFIG => {'hbase.hstore.compaction.min' => '5'}
Change the column family config by set attribute CONFIG like this:
  alter 'test', NAME => 'f', CONFIG => {'hbase.hstore.compaction.min' => a'5'}

> Backport HBASE-5335 "Dynamic Schema Configurations" to 0.94
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch, HBASE-8176.patchv2, hbase-8176-release-notes.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8176:

Attachment: hbase-8176-release-notes.patch

add suggested release notes

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch, HBASE-8176.patchv2, hbase-8176-release-notes.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8176:

Attachment: HBASE-8176.patchv2

add patchv2 for HBASE-8176

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch, HBASE-8176.patchv2, hbase-8176-release-notes.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617152#comment-13617152 ]

clockfly commented on HBASE-8176:

@Ted, thanks for pointing out the test error. I made a mistake: I neglected the largeTest category tests. Now all unit tests and integration tests pass.

@Andrew, the shell syntax is completely compatible with before. It extends the functionality by adding a new attribute "CONFIG", while the old attributes still work.

@Lars and Andrew, I have updated the HBase shell context help in the patch:

{quote}
 or a shorter version:
   hbase> alter 't1', 'delete' => 'f1'
+
+You can also change the column family config by set attribute CONFIG like this:
+  hbase> alter 'test', NAME=>'f', CONFIG => {'hbase.hstore.compaction.min' => '5'}

 You can also change table-scope attributes like MAX_FILESIZE
 MEMSTORE_FLUSHSIZE, READONLY, and DEFERRED_LOG_FLUSH.
@@ -47,6 +50,9 @@
 For example, to change the max size of a family to 128MB, do:
   hbase> alter 't1', METHOD => 'table_att', MAX_FILESIZE => '134217728'
+
+You can also change the table-scope by set attribute CONFIG like this:
+  hbase> alter 'test', METHOD=>'table_att', CONFIG => {'hbase.hstore.compaction.min' => '5'}
{quote}

Here are the suggested release notes for this patch:

{quote}
Release notes:
With HBASE-8176("Dynamic Schema Configurations"), we can define table/column family specific configuration by HColumnDescriptor.setValue() or HTableDescriptor.setValue().
We can also do this easily in hbase shell, like this:
Change the table-scope by set attribute CONFIG like this:
  alter 'test', METHOD => 'table_att', CONFIG => {'hbase.hstore.compaction.min' => '5'}
Change the column family config by set attribute CONFIG like this:
  alter 'test', NAME => 'f', CONFIG => {'hbase.hstore.compaction.min' => '5'}
{quote}

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Commented] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616106#comment-13616106 ]

clockfly commented on HBASE-8152:

All unit tests pass (mvn test).

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.95.0, 0.94.7
> Attachments: hbase-8152.0.94patch.v2, HBASE-8152-0.94v3.patch, HBASE-8152-0.96v1.patch, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8152:

Attachment: HBASE-8152-0.94v3.patch

add patch for 0.94

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.95.0, 0.94.7
> Attachments: hbase-8152.0.94patch.v2, HBASE-8152-0.94v3.patch, HBASE-8152-0.96v1.patch, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8152:

Attachment: HBASE-8152-0.96v1.patch

add patch HBASE-8152-0.96v1.patch for trunk

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.95.0, 0.94.7
> Attachments: hbase-8152.0.94patch.v2, HBASE-8152-0.94v3.patch, HBASE-8152-0.96v1.patch, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8152:

Priority: Minor (was: Trivial)
Fix Version/s: 0.95.0

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.95.0, 0.94.7
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
[jira] [Commented] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616011#comment-13616011 ]

clockfly commented on HBASE-8152:

Ted, I will attach a new patch soon.

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Trivial
> Fix For: 0.94.7
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
[jira] [Commented] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615998#comment-13615998 ]

clockfly commented on HBASE-8152:

{quote}
   // pick an split point (roughly halfway)
-  byte[] SPLITKEY = new byte[] { (LAST_CHAR-FIRST_CHAR)/2, FIRST_CHAR};
+  byte[] SPLITKEY = new byte[] { (LAST_CHAR + FIRST_CHAR)/2, FIRST_CHAR};
{quote}
{quote}
How did this work before?
{quote}
Sergey, this is actually a hidden bug in the old version. It worked by accident: the old expression "(LAST_CHAR-FIRST_CHAR)/2" puts the split key outside the key range of the store file, and there was also a bug in the counter, which missed adding 1 to the count. Together, these two bugs made the old unit test pass. I should file this as a separate JIRA to make it clearer.

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Trivial
> Fix For: 0.94.7
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
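The arithmetic behind the SPLITKEY fix quoted above can be checked with a small stand-alone sketch. It assumes, purely for illustration, that FIRST_CHAR is 'a' and LAST_CHAR is 'z'; the class and method names are hypothetical and are not taken from the HBase test code.

```java
// Hypothetical stand-alone sketch; it does not use the HBase test classes.
final class SplitKeyMidpoint {
    // Assumed key range for illustration: first bytes run from 'a' (97) to 'z' (122).
    static final byte FIRST_CHAR = 'a';
    static final byte LAST_CHAR = 'z';

    // Old expression from the quoted diff: half the width of the range.
    static byte oldFirstByte() {
        return (byte) ((LAST_CHAR - FIRST_CHAR) / 2);
    }

    // New expression from the quoted diff: the midpoint of the range.
    static byte newFirstByte() {
        return (byte) ((LAST_CHAR + FIRST_CHAR) / 2);
    }

    // True when b lies between FIRST_CHAR and LAST_CHAR inclusive.
    static boolean inKeyRange(byte b) {
        return b >= FIRST_CHAR && b <= LAST_CHAR;
    }

    public static void main(String[] args) {
        System.out.println("old=" + oldFirstByte() + " inRange=" + inKeyRange(oldFirstByte()));
        System.out.println("new=" + newFirstByte() + " inRange=" + inKeyRange(newFirstByte()));
    }
}
```

Under these assumed values the old expression yields 12, which sorts before 'a' and so falls outside the key range, while the new expression yields 109 ('m'), which lies inside it.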
[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613998#comment-13613998 ]

clockfly commented on HBASE-8176:

All unit tests and all integration tests pass.

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613568#comment-13613568 ]

clockfly commented on HBASE-8176:

I made another internal patch (not this one) for 0.94.1 and have used it in production for months. This patch has minor differences compared with 0.94.2. I have only run the unit tests so far, and I will try to run the full 0.94 test suite.

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613564#comment-13613564 ]

clockfly commented on HBASE-8176:

Lars, it did modify the shell files admin.rb and hbase.rb. In the shell, it introduced a compound attribute named "CONFIG", and all additional configurations are set under "CONFIG", such as CONFIG=>{conf1=>'', conf2=>''}

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8176:

Attachment: hbase-8176.patch

add HBASE-5335 patch for 0.94

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: hbase-8176.patch
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
[ https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613480#comment-13613480 ]

clockfly commented on HBASE-8176:

Lars, Jean-Marc, and Anoop, thank you for the comments, they make sense.
On the other hand, I do find this valuable for those who want to keep 0.94 in production for a long time, considering the big changes in 0.96 and the risks of upgrading.
I will attach my patch here, so that those who want this feature can apply it directly.
Thanks

> Backport HBASE-5335 "Dynamic Schema Configurations"
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
>
> With HBASE-5335, we can support per-table configuration and per-family
> configurations.
> We can use it to customize the compaction on table/family basis, customize
> the flush, etc.
[jira] [Commented] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
[ https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611640#comment-13611640 ]

clockfly commented on HBASE-8174:

@Sergey, yes, your patch is better: it works at the column family level, while mine only works at the region level.
Is it worth the risk to backport HBASE-8161 and HBASE-5335 into 0.94?

> Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: HBASE-8174.patch.0.94.v1
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is
> configured at region server level.
> We should allow it to be configured at Table level.
[jira] [Created] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"
clockfly created HBASE-8176:

Summary: Backport HBASE-5335 "Dynamic Schema Configurations"
Key: HBASE-8176
URL: https://issues.apache.org/jira/browse/HBASE-8176
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.94.5
Reporter: clockfly
Assignee: clockfly
Priority: Minor
Fix For: 0.94.7

With HBASE-5335, we can support per-table configuration and per-family configurations.
We can use it to customize the compaction on table/family basis, customize the flush, etc.
[jira] [Commented] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
[ https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610010#comment-13610010 ]

clockfly commented on HBASE-8174:

Hi Anoop, I overlooked this: it depends on HBASE-5335.
I will do a backport of HBASE-5335 to 0.94.

> Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: HBASE-8174.patch.0.94.v1
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is
> configured at region server level.
> We should allow it to be configured at Table level.
[jira] [Commented] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
[ https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610011#comment-13610011 ]

clockfly commented on HBASE-8174:

With HBASE-5335, the HRegion gets a compound configuration that combines the region server configuration with the HTableDescriptor settings.

> Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.7
> Attachments: HBASE-8174.patch.0.94.v1
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is
> configured at region server level.
> We should allow it to be configured at Table level.
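The idea of a compound configuration, where table-level settings are layered over the region server settings, can be illustrated with a minimal sketch. It uses plain maps rather than the HBase classes; LayeredConfig and its methods are hypothetical names, and the values 7 and 20 are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a layered lookup; not the HBase implementation.
final class LayeredConfig {
    private final Map<String, String> serverConf;
    private final Map<String, String> tableOverrides;

    LayeredConfig(Map<String, String> serverConf, Map<String, String> tableOverrides) {
        // Defensive copies so later changes to the inputs do not leak in.
        this.serverConf = new HashMap<>(serverConf);
        this.tableOverrides = new HashMap<>(tableOverrides);
    }

    // A table-level value wins; otherwise fall back to the server-level value.
    String get(String key) {
        String value = tableOverrides.get(key);
        return value != null ? value : serverConf.get(key);
    }

    // Integer lookup with a default for keys that are set at neither level.
    int getInt(String key, int defaultValue) {
        String value = get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        String key = "hbase.hstore.blockingStoreFiles";
        Map<String, String> server = new HashMap<>();
        server.put(key, "7");   // illustrative server-wide value
        Map<String, String> table = new HashMap<>();
        table.put(key, "20");   // illustrative per-table override

        System.out.println(new LayeredConfig(server, table).getInt(key, 7));
        System.out.println(new LayeredConfig(server, new HashMap<>()).getInt(key, 7));
    }
}
```

A table that sets the key sees its own value, and any other table falls back to the server-wide one. The release notes for HBASE-8176 also mention column family settings, which would be one more layer checked before the table layer in a sketch like this.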
[jira] [Updated] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
[ https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8174:

Attachment: HBASE-8174.patch.0.94.v1

Add patch for 0.94.

> Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Minor
> Fix For: 0.94.6
> Attachments: HBASE-8174.patch.0.94.v1
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is
> configured at region server level.
> We should allow it to be configured at Table level.
[jira] [Created] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
clockfly created HBASE-8174:

Summary: Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"
Key: HBASE-8174
URL: https://issues.apache.org/jira/browse/HBASE-8174
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.94.5
Reporter: clockfly
Assignee: clockfly
Priority: Minor
Fix For: 0.94.6

Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is configured at region server level.
We should allow it to be configured at Table level.
[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8152:

Priority: Trivial (was: Major)

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Priority: Trivial
> Fix For: 0.94.6
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-8152:

Attachment: hbase-8152.0.94patch.v2

update the patch

> Avoid creating empty reference file when splitkey is outside the range of a store file
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
> Issue Type: Improvement
> Components: Filesystem Integration, HFile
> Affects Versions: 0.94.5
> Reporter: clockfly
> Assignee: clockfly
> Fix For: 0.94.6
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
> When splitting a store file, if the split key is before the first key, or
> greater than the last key, then only one reference file should be created.
> Currently, two reference files will be created.
[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-8152: Attachment: hbase-8152.patch0.94 Add patch for 0.94 > Avoid creating empty reference file when splitkey is outside the range of a > store file > -- > > Key: HBASE-8152 > URL: https://issues.apache.org/jira/browse/HBASE-8152 > Project: HBase > Issue Type: Improvement > Components: Filesystem Integration, HFile >Affects Versions: 0.94.5 >Reporter: clockfly >Assignee: clockfly > Fix For: 0.94.6 > > Attachments: hbase-8152.patch0.94 > > > When splitting a store file, if the split key is before the first key, or > greater than the last key, then only one reference file should be created. > Currently, two reference file will be created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
[ https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-8152: Description: When splitting a store file, if the split key is before the first key, or greater than the last key, then only one reference file should be created. Currently, two reference file will be created. was:When splitting a store file, if the split key is before the first key, or greater than the last key, then only one reference file should be created. > Avoid creating empty reference file when splitkey is outside the range of a > store file > -- > > Key: HBASE-8152 > URL: https://issues.apache.org/jira/browse/HBASE-8152 > Project: HBase > Issue Type: Improvement > Components: Filesystem Integration, HFile >Affects Versions: 0.94.5 >Reporter: clockfly >Assignee: clockfly > Fix For: 0.94.6 > > > When splitting a store file, if the split key is before the first key, or > greater than the last key, then only one reference file should be created. > Currently, two reference file will be created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file
clockfly created HBASE-8152: --- Summary: Avoid creating empty reference file when splitkey is outside the range of a store file Key: HBASE-8152 URL: https://issues.apache.org/jira/browse/HBASE-8152 Project: HBase Issue Type: Improvement Components: Filesystem Integration, HFile Affects Versions: 0.94.5 Reporter: clockfly Assignee: clockfly Fix For: 0.94.6 When splitting a store file, if the split key is before the first key, or greater than the last key, then only one reference file should be created. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
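The rule in the HBASE-8152 description can be sketched as a small decision function. This is only an illustration, not the attached patch: the class, enum, and method names are hypothetical, and it assumes the bottom half covers keys strictly below the split key while the top half covers keys at or above it.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitReferenceSketch {
    /** Which half of the parent store file a reference file would cover. */
    enum Half { BOTTOM, TOP }

    /** Unsigned lexicographic comparison of two keys (hypothetical helper). */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Returns only the halves that would contain at least one key. */
    static List<Half> halvesToReference(byte[] firstKey, byte[] lastKey, byte[] splitKey) {
        List<Half> halves = new ArrayList<>();
        // Assumption: bottom half = keys < splitKey, so it is empty when splitKey <= firstKey.
        if (compareKeys(firstKey, splitKey) < 0) {
            halves.add(Half.BOTTOM);
        }
        // Assumption: top half = keys >= splitKey, so it is empty when splitKey > lastKey.
        if (compareKeys(lastKey, splitKey) >= 0) {
            halves.add(Half.TOP);
        }
        return halves;
    }

    public static void main(String[] args) {
        byte[] first = {'b'};
        byte[] last = {'y'};
        System.out.println(halvesToReference(first, last, new byte[] {'a'}));
        System.out.println(halvesToReference(first, last, new byte[] {'m'}));
        System.out.println(halvesToReference(first, last, new byte[] {'z'}));
    }
}
```

With a split key inside the file's key range both halves are returned; with a split key outside the range only one half is returned, so only one reference file would be needed.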
[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region
[ https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602132#comment-13602132 ] clockfly commented on HBASE-7876: - @stack, this patch works for me. I defined a custom split policy, and an empty region can be split successfully. > Got exception when manually triggers a split on an empty region > --- > > Key: HBASE-7876 > URL: https://issues.apache.org/jira/browse/HBASE-7876 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.5 >Reporter: Maryann Xue >Assignee: Maryann Xue >Priority: Minor > Attachments: HBASE-7876-0.94V2.patch, HBASE-7876-trunk.patch > > > We should allow a region to split successfully even if it does not yet have > storefiles.
[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region
[ https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588168#comment-13588168 ] clockfly commented on HBASE-7876: - ramakrishna, maybe this patch doesn't conflict with your use case. Splitting an empty region will only happen when one of two conditions is met: 1. A manual split is forced by specifying a midkey. 2. A customized SplitPolicy is in use. For condition 2, it is the customized split policy's choice. For condition 1, if the application doesn't want to split an empty region, it should first check whether the region is empty by running a small scan. If the admin doesn't specify a midkey, then for an empty region the default split policy RegionSplitPolicy:shouldSplit() will always return false, and the split will never happen. > Got exception when manually triggers a split on an empty region > --- > > Key: HBASE-7876 > URL: https://issues.apache.org/jira/browse/HBASE-7876 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.5 >Reporter: Maryann Xue >Assignee: Maryann Xue >Priority: Minor > Attachments: HBASE-7876-0.94.patch > > > We should allow a region to split successfully even if it does not yet have > storefiles.
[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region
[ https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588134#comment-13588134 ] clockfly commented on HBASE-7876: - I also tested a customized SplitPolicy. With this patch, the split succeeds at the customized split point even if there are no store files in the region. In short, the split decision should be made by the upper-level SplitPolicy, which knows more about the application logic, instead of the low-level CompactSplitThread. > Got exception when manually triggers a split on an empty region > --- > > Key: HBASE-7876 > URL: https://issues.apache.org/jira/browse/HBASE-7876 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.5 >Reporter: Maryann Xue >Assignee: Maryann Xue >Priority: Minor > Attachments: HBASE-7876-0.94.patch > > > We should allow a region to split successfully even if it does not yet have > storefiles.
[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region
[ https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588129#comment-13588129 ] clockfly commented on HBASE-7876: - I tested the patch; the split succeeds when a midkey is specified. > Got exception when manually triggers a split on an empty region > --- > > Key: HBASE-7876 > URL: https://issues.apache.org/jira/browse/HBASE-7876 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.5 >Reporter: Maryann Xue >Assignee: Maryann Xue >Priority: Minor > Attachments: HBASE-7876-0.94.patch > > > We should allow a region to split successfully even if it does not yet have > storefiles.
[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region
[ https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588127#comment-13588127 ] clockfly commented on HBASE-7876: - I think there are several motives: 1. HBaseAdmin allows splitting by specifying a midkey. We should allow the user to manually split an empty region if the user expects heavy load on this region in the future. 2. We allow the user to customize the SplitPolicy, so the split point can be customized. When the customized SplitPolicy says that we need to split at some point, we should support it, whether or not there are store files in the region. > Got exception when manually triggers a split on an empty region > --- > > Key: HBASE-7876 > URL: https://issues.apache.org/jira/browse/HBASE-7876 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.5 >Reporter: Maryann Xue >Assignee: Maryann Xue >Priority: Minor > Attachments: HBASE-7876-0.94.patch > > > We should allow a region to split successfully even if it does not yet have > storefiles.
[jira] [Commented] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records
[ https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586857#comment-13586857 ] clockfly commented on HBASE-7885: - The equation should be: (1+1/m)^n = e^(n/m) as m -> infinity (m stands for the byte length, n for the expected max record count). > bloom filter compaction is too aggressive for Hfile which only contains small > count of records > -- > > Key: HBASE-7885 > URL: https://issues.apache.org/jira/browse/HBASE-7885 > Project: HBase > Issue Type: Bug > Components: Performance, Scanners >Affects Versions: 0.94.5 >Reporter: clockfly >Assignee: clockfly >Priority: Minor > Fix For: 0.94.6 > > Attachments: hbase-7885.patch, hbase_bloom_shrink_fix.patch > > > For HFile V2, the bloom filter will take a initial size, 128KB. > When there are not that much records inserted into the bloom filter, the > bloom fitler will start to shrink itself to do compaction. > For example, for 128K, it will compact to 64K > ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it think that > it can be bounded by the estimate error rate. > If we puts only a few records in the HFile, the bloom filter will be > compacted to too small, then it will break the assumption that shrinking will > still be bounded by the estimated error rate. The False positive rate will > becomes un-acceptable high. > For example, if we set the expected error rate is 0.1, for 10 records, > after compaction, The size of the bloom filter will be 64 bytes. The real > effective false positive rate will be 50%. > The use case is like this, if we are using HBase to store big record like > images, and binaries, each record will take megabytes. Then for a 128M file, > it will only contains dozens of records. > The suggested fix is to set a lower limit for the bloom filter compaction > process. I suggest to use 1000 bytes.
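The limit stated in the comment above can be checked numerically. The following is a minimal sketch with hypothetical names: it treats m and n as plain numbers, keeps the ratio n/m fixed at 1, and measures how far (1+1/m)^n is from e^(n/m) for small and large m. It says nothing about how ByteBloomFilter itself computes its error rate.

```java
final class LimitApproximationSketch {
    /** Relative gap between (1 + 1/m)^n and e^(n/m). */
    static double relativeGap(long m, long n) {
        double exact = Math.pow(1.0 + 1.0 / m, n);
        double limit = Math.exp((double) n / m);
        return Math.abs(exact - limit) / limit;
    }

    public static void main(String[] args) {
        // Keep n/m = 1 and vary m; the sizes below are illustrative only.
        for (long m : new long[] {32, 64, 128, 1000, 131072}) {
            System.out.printf("m=%d relative gap=%.6f%n", m, relativeGap(m, m));
        }
    }
}
```

The gap shrinks as m grows, which is the sense in which a lower bound such as 1000 serves as a stand-in for "infinity" in this argument.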
[jira] [Commented] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records
[ https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586854#comment-13586854 ] clockfly commented on HBASE-7885: - Hi Lars, this is really a mathematical argument about the limit of a sequence. The assumption behind bloom filter compaction is that the expected false positive rate stays the same as long as (byte length of the bloom filter)/(max keys it may contain) stays the same. This assumption is based on (1+1/m)^n == e^(n/m) as m -> infinity (m stands for the byte length, n for the expected max record count). Here I use 1000 as an estimate of infinity. Please check: http://en.wikipedia.org/wiki/Bloom_filter http://en.wikipedia.org/wiki/E_(mathematical_constant) > bloom filter compaction is too aggressive for Hfile which only contains small > count of records > -- > > Key: HBASE-7885 > URL: https://issues.apache.org/jira/browse/HBASE-7885 > Project: HBase > Issue Type: Bug > Components: Performance, Scanners >Affects Versions: 0.94.5 >Reporter: clockfly >Assignee: clockfly >Priority: Minor > Fix For: 0.94.6 > > Attachments: hbase-7885.patch, hbase_bloom_shrink_fix.patch > > > For HFile V2, the bloom filter will take a initial size, 128KB. > When there are not that much records inserted into the bloom filter, the > bloom fitler will start to shrink itself to do compaction. > For example, for 128K, it will compact to 64K > ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it think that > it can be bounded by the estimate error rate. > If we puts only a few records in the HFile, the bloom filter will be > compacted to too small, then it will break the assumption that shrinking will > still be bounded by the estimated error rate. The False positive rate will > becomes un-acceptable high. > For example, if we set the expected error rate is 0.1, for 10 records, > after compaction, The size of the bloom filter will be 64 bytes. The real > effective false positive rate will be 50%. > The use case is like this, if we are using HBase to store big record like > images, and binaries, each record will take megabytes. Then for a 128M file, > it will only contains dozens of records. > The suggested fix is to set a lower limit for the bloom filter compaction > process. I suggest to use 1000 bytes.
[jira] [Updated] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records
[ https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-7885: Attachment: hbase-7885.patch attach patch and UT fix for hbase0.94 > bloom filter compaction is too aggressive for Hfile which only contains small > count of records > -- > > Key: HBASE-7885 > URL: https://issues.apache.org/jira/browse/HBASE-7885 > Project: HBase > Issue Type: Bug > Components: Performance, Scanners >Affects Versions: 0.94.5 >Reporter: clockfly >Assignee: clockfly >Priority: Minor > Fix For: 0.94.6 > > Attachments: hbase-7885.patch, hbase_bloom_shrink_fix.patch > > > For HFile V2, the bloom filter will take a initial size, 128KB. > When there are not that much records inserted into the bloom filter, the > bloom fitler will start to shrink itself to do compaction. > For example, for 128K, it will compact to 64K > ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it think that > it can be bounded by the estimate error rate. > If we puts only a few records in the HFile, the bloom filter will be > compacted to too small, then it will break the assumption that shrinking will > still be bounded by the estimated error rate. The False positive rate will > becomes un-acceptable high. > For example, if we set the expected error rate is 0.1, for 10 records, > after compaction, The size of the bloom filter will be 64 bytes. The real > effective false positive rate will be 50%. > The use case is like this, if we are using HBase to store big record like > images, and binaries, each record will take megabytes. Then for a 128M file, > it will only contains dozens of records. > The suggested fix is to set a lower limit for the bloom filter compaction > process. I suggest to use 1000 bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records
[ https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-7885: Fix Version/s: (was: 0.94.5) 0.94.6 Assignee: clockfly > bloom filter compaction is too aggressive for Hfile which only contains small > count of records > -- > > Key: HBASE-7885 > URL: https://issues.apache.org/jira/browse/HBASE-7885 > Project: HBase > Issue Type: Bug > Components: Performance, Scanners >Affects Versions: 0.94.5 >Reporter: clockfly >Assignee: clockfly >Priority: Minor > Fix For: 0.94.6 > > Attachments: hbase_bloom_shrink_fix.patch > > > For HFile V2, the bloom filter will take a initial size, 128KB. > When there are not that much records inserted into the bloom filter, the > bloom fitler will start to shrink itself to do compaction. > For example, for 128K, it will compact to 64K > ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it think that > it can be bounded by the estimate error rate. > If we puts only a few records in the HFile, the bloom filter will be > compacted to too small, then it will break the assumption that shrinking will > still be bounded by the estimated error rate. The False positive rate will > becomes un-acceptable high. > For example, if we set the expected error rate is 0.1, for 10 records, > after compaction, The size of the bloom filter will be 64 bytes. The real > effective false positive rate will be 50%. > The use case is like this, if we are using HBase to store big record like > images, and binaries, each record will take megabytes. Then for a 128M file, > it will only contains dozens of records. > The suggested fix is to set a lower limit for the bloom filter compaction > process. I suggest to use 1000 bytes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
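The lower limit suggested in the issue description can be sketched as a halving loop with a floor. This is a standalone illustration modeled on the while loop quoted in this thread, not the patch itself: the class and method names are hypothetical, the `pieces` bookkeeping is left out, and the maxKeys value of 100000 in the demo is an arbitrary assumption.

```java
final class BloomShrinkSketch {
    /** Lower bound suggested in the issue (1000 bytes); name follows the quoted patch. */
    static final int MIN_BLOOMFILTER_SIZE = 1000;

    /**
     * Halves the filter size while it is even, while the halved capacity would
     * still exceed the actual key count, and while the result stays >= minSize.
     */
    static int shrunkByteSize(int byteSize, int maxKeys, int keyCount, int minSize) {
        int newByteSize = byteSize;
        int newMaxKeys = maxKeys;
        while ((newByteSize & 1) == 0
                && newMaxKeys > (keyCount << 1)
                && newByteSize >= minSize * 2) {
            newByteSize >>= 1;
            newMaxKeys >>= 1;
        }
        return newByteSize;
    }

    public static void main(String[] args) {
        int initial = 128 * 1024; // 128KB initial size, as in the description
        // minSize = 0 disables the floor, mimicking the behavior before the fix.
        System.out.println("no floor:   " + shrunkByteSize(initial, 100000, 10, 0));
        System.out.println("with floor: " + shrunkByteSize(initial, 100000, 10, MIN_BLOOMFILTER_SIZE));
    }
}
```

Requiring `newByteSize >= minSize * 2` before each halving is what guarantees the result never drops below the floor, since one more step can at most halve the current size.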
[jira] [Commented] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records
[ https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586785#comment-13586785 ] clockfly commented on HBASE-7885: - Hi Ted,
+ while ( (newByteSize & 1) == 0 && newMaxKeys > (this.keyCount<<1)
+     && newByteSize >= MIN_BLOOMFILTER_SIZE * 2) {
    pieces <<= 1;
    newByteSize >>= 1;
    newMaxKeys >>= 1;
  }
Each iteration of the while loop cuts the size in half, so after one step newByteSize becomes newByteSize / 2. The check newByteSize >= MIN_BLOOMFILTER_SIZE * 2 makes sure that the bloom filter's size is still >= MIN_BLOOMFILTER_SIZE after compaction. Some unit tests are affected; I will attach the UT fix soon.
> bloom filter compaction is too aggressive for Hfile which only contains small > count of records > -- > > Key: HBASE-7885 > URL: https://issues.apache.org/jira/browse/HBASE-7885 > Project: HBase > Issue Type: Bug > Components: Performance, Scanners >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.94.5 > > Attachments: hbase_bloom_shrink_fix.patch > > > For HFile V2, the bloom filter will take a initial size, 128KB. > When there are not that much records inserted into the bloom filter, the > bloom fitler will start to shrink itself to do compaction. > For example, for 128K, it will compact to 64K > ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it think that > it can be bounded by the estimate error rate. > If we puts only a few records in the HFile, the bloom filter will be > compacted to too small, then it will break the assumption that shrinking will > still be bounded by the estimated error rate. The False positive rate will > becomes un-acceptable high. > For example, if we set the expected error rate is 0.1, for 10 records, > after compaction, The size of the bloom filter will be 64 bytes. The real > effective false positive rate will be 50%. > The use case is like this, if we are using HBase to store big record like > images, and binaries, each record will take megabytes. Then for a 128M file, > it will only contains dozens of records. > The suggested fix is to set a lower limit for the bloom filter compaction > process. I suggest to use 1000 bytes.
[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585606#comment-13585606 ] clockfly commented on HBASE-7884: - Hi Ted, the patch for 0.94 is already attached. patch for 0.94.6: https://issues.apache.org/jira/secure/attachment/12570054/bloom_performance_tunning.patch patch for trunk: https://issues.apache.org/jira/secure/attachment/12570247/bloom_optimization_trunk_patch.patch > ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.96.0, 0.94.6 > > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, > TestByteBloom.java > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585603#comment-13585603 ] clockfly commented on HBASE-7884: - I didn't use major_compact to compare the performance. Major compaction triggers bloom filter *building* when merging files. This patch doesn't change the *building* process; it only affects bloom filter *lookup* performance, so no performance difference would be observed for major compaction. Since ByteBloomFilter is a standalone utility, I tested its performance standalone, outside of HBase. > ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.96.0, 0.94.6 > > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, > TestByteBloom.java > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash
[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585591#comment-13585591 ] clockfly commented on HBASE-7884: - Test config: = test platform: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz JVM: java version "1.6.0_38" Java(TM) SE Runtime Environment (build 1.6.0_38-b05) Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode) > ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.96.0, 0.94.6 > > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, > TestByteBloom.java > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-7884: Attachment: TestByteBloom.java Test code for the bloom filter performance. > ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.96.0, 0.94.6 > > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, > TestByteBloom.java > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585589#comment-13585589 ] clockfly commented on HBASE-7884: - Hi Nicolas,
== Test Result ==
Here is the result of testing with the microbenchmark framework http://www.ellipticgroup.com/html/benchmarkingArticle.html (introduction in http://www.ibm.com/developerworks/library/j-benchmark2/index.html). This framework does warmup and excludes the impact of HotSpot compilation.
Before tuning:
ByteBloomFilter contains() test: first = 1.187 s, mean = 1.156 s (CI deltas: -370.103 us, +547.583 us), sd (standard deviation) = 1.754 ms (CI deltas: -490.688 us, +737.411 us)
WARNING: EXECUTION TIMES HAVE EXTREME OUTLIERS, execution times may have serial correlation, SD VALUES MAY BE INACCURATE
After tuning:
ByteBloomFilter contains() test: first = 1.006 s, mean = 973.650 ms (CI deltas: -229.513 us, +248.122 us), sd (standard deviation) = 1.333 ms (CI deltas: -205.051 us, +337.328 us)
WARNING: execution times have mild outliers, execution times may have serial correlation, SD VALUES MAY BE INACCURATE
Performance boost: 19% (mean 1.156 s before vs. 973.650 ms after). Test code is attached.
> ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.96.0, 0.94.6 > > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch, hbase-7884-performance-report.pdf > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash
[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-7884: Attachment: hbase-7884-performance-report.pdf Hi Nicolas, please check the attached performance report. > ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.96.0, 0.94.6 > > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch, hbase-7884-performance-report.pdf > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash
[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-7884: Fix Version/s: 0.94.6 0.96.0 > ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Fix For: 0.96.0, 0.94.6 > > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] clockfly updated HBASE-7884: Attachment: bloom_optimization_trunk_patch.patch add patch for trunk > ByteBloomFilter's performance can be improved by avoiding multiplication when > generating hash > -- > > Key: HBASE-7884 > URL: https://issues.apache.org/jira/browse/HBASE-7884 > Project: HBase > Issue Type: Bug > Components: Performance >Affects Versions: 0.94.5 >Reporter: clockfly >Priority: Minor > Attachments: bloom_optimization_trunk_patch.patch, > bloom_performance_tunning.patch > > > ByteBloomFilter's performance can be optimized by avoiding multiplication > operation when generating hash -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582792#comment-13582792 ]

clockfly commented on HBASE-7884:

Hi Ted,

The hash logic is equivalent.

    for (int i = 0; i < hashCount; i++) {
      long hashLoc = Math.abs((hash1 + i * hash2) % bloomBitSize);
    }

is equivalent to

    int compositeHash = hash1;
    for (int i = 0; i < hashCount; i++) {
      int hashLoc = Math.abs(compositeHash % bloomBitSize);
      compositeHash += hash2;
    }

> ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 0.94.5
> Reporter: clockfly
> Priority: Minor
> Attachments: bloom_optimization_trunk_patch.patch, bloom_performance_tunning.patch
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication operation when generating hash
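The equivalence claimed in the comment above can be checked with a small standalone Java sketch. It assumes hash1 and hash2 are ints (as the declaration of compositeHash suggests) and that bloomBitSize is a positive int. The class and method names are hypothetical and are not taken from the attached patches. Java int arithmetic wraps around on overflow, so adding hash2 once per iteration gives the same 32-bit value as hash1 + i * hash2 even when the product overflows.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class for illustration only; not part of HBase.
final class DoubleHashingCheck {

    // Bit positions computed with one multiplication per probe.
    static int[] withMultiplication(int hash1, int hash2, int hashCount, int bloomBitSize) {
        int[] locs = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            locs[i] = Math.abs((hash1 + i * hash2) % bloomBitSize);
        }
        return locs;
    }

    // Same positions computed with a running sum instead of a multiplication.
    static int[] withAddition(int hash1, int hash2, int hashCount, int bloomBitSize) {
        int[] locs = new int[hashCount];
        int compositeHash = hash1;
        for (int i = 0; i < hashCount; i++) {
            locs[i] = Math.abs(compositeHash % bloomBitSize);
            compositeHash += hash2; // wraps on overflow exactly like i * hash2 does
        }
        return locs;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int bloomBitSize = 128 * 1024 * 8; // assumed filter size in bits
        for (int trial = 0; trial < 100_000; trial++) {
            int h1 = rnd.nextInt();
            int h2 = rnd.nextInt();
            if (!Arrays.equals(withMultiplication(h1, h2, 7, bloomBitSize),
                               withAddition(h1, h2, 7, bloomBitSize))) {
                throw new AssertionError("mismatch for " + h1 + ", " + h2);
            }
        }
        System.out.println("both loops produced identical bit positions");
    }
}
```

This only demonstrates that the two loops select the same bits; it says nothing about how much time the change saves.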
[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-7884:
    Description: ByteBloomFilter's performance can be optimized by avoiding multiplication operation when generating hash (was: ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash)

> ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 0.94.5
> Reporter: clockfly
> Priority: Minor
> Attachments: bloom_performance_tunning.patch
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication operation when generating hash
[jira] [Updated] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records
[ https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-7885:
    Attachment: hbase_bloom_shrink_fix.patch

> bloom filter compaction is too aggressive for Hfile which only contains small count of records
>
> Key: HBASE-7885
> URL: https://issues.apache.org/jira/browse/HBASE-7885
> Project: HBase
> Issue Type: Bug
> Components: Performance, Scanners
> Affects Versions: 0.94.5
> Reporter: clockfly
> Priority: Minor
> Fix For: 0.94.5
> Attachments: hbase_bloom_shrink_fix.patch
>
> For HFile V2, the bloom filter starts with an initial size of 128KB.
> When few records have been inserted into the bloom filter, it shrinks itself (compaction).
> For example, from 128K it will compact to 64K -> 32K -> 16K -> 8K -> 4K -> 2K -> 1K -> 512 -> 256 -> 128 -> 64 -> 32, as long as it thinks the result is still bounded by the estimated error rate.
> If we put only a few records in the HFile, the bloom filter is compacted to too small a size, which breaks the assumption that shrinking is still bounded by the estimated error rate. The false positive rate becomes unacceptably high.
> For example, if we set the expected error rate to 0.1, then for 10 records the bloom filter will be 64 bytes after compaction, and the real effective false positive rate will be 50%.
> The use case is this: if we use HBase to store big records such as images and binaries, each record takes megabytes, so a 128M file contains only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction process. I suggest 1000 bytes.
[jira] [Created] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records
clockfly created HBASE-7885:

Summary: bloom filter compaction is too aggressive for Hfile which only contains small count of records
Key: HBASE-7885
URL: https://issues.apache.org/jira/browse/HBASE-7885
Project: HBase
Issue Type: Bug
Components: Performance, Scanners
Affects Versions: 0.94.5
Reporter: clockfly
Priority: Minor
Fix For: 0.94.5

For HFile V2, the bloom filter starts with an initial size of 128KB.
When few records have been inserted into the bloom filter, it shrinks itself (compaction).
For example, from 128K it will compact to 64K -> 32K -> 16K -> 8K -> 4K -> 2K -> 1K -> 512 -> 256 -> 128 -> 64 -> 32, as long as it thinks the result is still bounded by the estimated error rate.

If we put only a few records in the HFile, the bloom filter is compacted to too small a size, which breaks the assumption that shrinking is still bounded by the estimated error rate. The false positive rate becomes unacceptably high.

For example, if we set the expected error rate to 0.1, then for 10 records the bloom filter will be 64 bytes after compaction, and the real effective false positive rate will be 50%.

The use case is this: if we use HBase to store big records such as images and binaries, each record takes megabytes, so a 128M file contains only dozens of records.

The suggested fix is to set a lower limit for the bloom filter compaction process. I suggest 1000 bytes.
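The suggested lower limit could look roughly like the following Java sketch. It is not the code from hbase_bloom_shrink_fix.patch or from HBase's bloom filter classes. The class name, method name, and parameters are hypothetical. The loop condition (halve while the halved key capacity still covers the inserted keys) is an assumed stand-in for the "bounded by the estimated error rate" check described in the issue, and the maxKeys value used in main is an arbitrary illustrative number.

```java
// Hypothetical sketch of bloom filter shrinking with a size floor; not HBase code.
final class BloomFoldSketch {

    /**
     * Returns the byte size after repeated halving.
     *
     * @param byteSize    initial filter size in bytes
     * @param maxKeys     key capacity assumed for the initial size (hypothetical parameter)
     * @param keyCount    number of keys actually inserted
     * @param minByteSize lower limit below which the filter is never shrunk
     */
    static int foldedByteSize(int byteSize, int maxKeys, int keyCount, int minByteSize) {
        int newByteSize = byteSize;
        int newMaxKeys = maxKeys;
        while ((newByteSize & 1) == 0                 // size can still be halved evenly
                && (newMaxKeys >> 1) >= keyCount      // assumed error-rate bound still holds
                && (newByteSize >> 1) >= minByteSize) // proposed lower limit
        {
            newByteSize >>= 1;
            newMaxKeys >>= 1;
        }
        return newByteSize;
    }

    public static void main(String[] args) {
        int initialBytes = 128 * 1024; // 128KB initial size, as described in the issue
        int assumedMaxKeys = 100_000;  // illustrative capacity, not a value from HBase
        int keyCount = 10;

        // With the suggested 1000-byte floor, halving stops at the last size >= 1000.
        System.out.println(foldedByteSize(initialBytes, assumedMaxKeys, keyCount, 1000));
        // With effectively no floor, the filter keeps shrinking far below that.
        System.out.println(foldedByteSize(initialBytes, assumedMaxKeys, keyCount, 1));
    }
}
```

Because sizes are halved from a power of two, a floor of 1000 bytes stops the shrinking at 1024 bytes in this sketch.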
[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash
[ https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

clockfly updated HBASE-7884:
    Attachment: bloom_performance_tunning.patch

> ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 0.94.5
> Reporter: clockfly
> Priority: Minor
> Attachments: bloom_performance_tunning.patch
>
> ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash
[jira] [Created] (HBASE-7884) ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash
clockfly created HBASE-7884:

Summary: ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash
Key: HBASE-7884
URL: https://issues.apache.org/jira/browse/HBASE-7884
Project: HBase
Issue Type: Bug
Components: Performance
Affects Versions: 0.94.5
Reporter: clockfly
Priority: Minor

ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash