[jira] [Commented] (HBASE-8174) Backport HBASE-8161(setting blocking file count on table level doesn't work)

2013-04-01 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618754#comment-13618754
 ] 

clockfly commented on HBASE-8174:
-

Updated the backport patch for 0.94:
https://issues.apache.org/jira/secure/attachment/12576375/hbase-8174.patch


> Backport HBASE-8161(setting blocking file count on table level doesn't work)
> 
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8174.patch, HBASE-8174.patch.0.94.v1
>
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
> configured at region server level.
> We should allow it to be configured at Table level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8174) Backport HBASE-8161(setting blocking file count on table level doesn't work)

2013-04-01 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8174:


Summary: Backport HBASE-8161(setting blocking file count on table level 
doesn't work)  (was: Allow each table to customize its own flush blocking file 
count "hbase.hstore.blockingStoreFiles")

> Backport HBASE-8161(setting blocking file count on table level doesn't work)
> 
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8174.patch, HBASE-8174.patch.0.94.v1
>
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
> configured at region server level.
> We should allow it to be configured at Table level.



[jira] [Updated] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"

2013-04-01 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8174:


Attachment: hbase-8174.patch

> Allow each table to customize its own flush blocking file count 
> "hbase.hstore.blockingStoreFiles"
> -
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8174.patch, HBASE-8174.patch.0.94.v1
>
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
> configured at region server level.
> We should allow it to be configured at Table level.



[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations" to 0.94

2013-03-29 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8176:


Release Note: 
With HBASE-8176 ("Dynamic Schema Configurations"), we can define table-specific 
or column-family-specific configuration with HTableDescriptor.setValue() or 
HColumnDescriptor.setValue(). We can also do this easily in the HBase shell.

To change table-scope configuration, set the CONFIG attribute like this:
alter 'test', METHOD => 'table_att', CONFIG => {'hbase.hstore.compaction.min' => '5'}

To change column family configuration, set the CONFIG attribute like this:
alter 'test', NAME => 'f', CONFIG => {'hbase.hstore.compaction.min' => '5'}

  was:
With HBASE-8176("Dynamic Schema Configurations"), we can define table/column 
family specific configuration by HColumnDescriptor.setValue() or 
HTableDescriptor.setValue(). We can also do this easily in hbase shell.

Change the table-scope by set attribute CONFIG like this:
alter 'test', METHOD => 'table_att', CONFIG => {'hbase.hstore.compaction.min' 
=> '5'}

Change the column family config by set attribute CONFIG like this: 
alter 'test',  NAME => 'f', CONFIG => {'hbase.hstore.compaction.min' => a'5'}


> Backport HBASE-5335 "Dynamic Schema Configurations" to 0.94
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch, HBASE-8176.patchv2, 
> hbase-8176-release-notes.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-29 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8176:


Attachment: hbase-8176-release-notes.patch

add suggested release notes

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch, HBASE-8176.patchv2, 
> hbase-8176-release-notes.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-29 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8176:


Attachment: HBASE-8176.patchv2

add patchv2 for HBASE-8176

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch, HBASE-8176.patchv2, 
> hbase-8176-release-notes.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-28 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617152#comment-13617152
 ] 

clockfly commented on HBASE-8176:
-

@Ted, thanks for pointing out the test error. I made a mistake: I neglected 
the largeTest category tests. Now all UT and integration tests pass.

@Andrew, the shell syntax is fully compatible with the previous one. It extends 
the functionality by adding a new attribute "CONFIG", while the old attributes still work.

@Lars and Andrew,
I have updated the HBase shell context help in the patch:
{quote}
 or a shorter version:
 
   hbase> alter 't1', 'delete' => 'f1'
+  
+You can also change the column family config by set attribute CONFIG like this:
+  hbase> alter 'test',  NAME=>'f', CONFIG => {'hbase.hstore.compaction.min' => 
'5'}
 
 You can also change table-scope attributes like MAX_FILESIZE
 MEMSTORE_FLUSHSIZE, READONLY, and DEFERRED_LOG_FLUSH.
@@ -47,6 +50,9 @@
 For example, to change the max size of a family to 128MB, do:
 
   hbase> alter 't1', METHOD => 'table_att', MAX_FILESIZE => '134217728'
+  
+You can also change the table-scope by set attribute CONFIG like this:
+  hbase> alter 'test', METHOD=>'table_att', CONFIG => 
{'hbase.hstore.compaction.min' => '5'}  
{quote}

Here are the suggested release notes for this patch:
{quote}
Release notes:
With HBASE-8176 ("Dynamic Schema Configurations"), we can define table-specific 
or column-family-specific configuration with HTableDescriptor.setValue() or 
HColumnDescriptor.setValue().
We can also do this easily in the HBase shell.

To change table-scope configuration, set the CONFIG attribute like this:
alter 'test', METHOD => 'table_att', CONFIG => {'hbase.hstore.compaction.min' => '5'}

To change column family configuration, set the CONFIG attribute like this:
alter 'test', NAME => 'f', CONFIG => {'hbase.hstore.compaction.min' => '5'}
{quote}

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Commented] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-27 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616106#comment-13616106
 ] 

clockfly commented on HBASE-8152:
-

All UT pass (mvn test).

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.95.0, 0.94.7
>
> Attachments: hbase-8152.0.94patch.v2, HBASE-8152-0.94v3.patch, 
> HBASE-8152-0.96v1.patch, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference file will be created.
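The rule in the description above can be sketched in plain Java. This is only an illustration and not the code from the attached patches: the class name, the method names, and the "bottom"/"top" labels are hypothetical, and the sketch assumes that the bottom half holds keys below the split key and the top half holds keys at or above it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: decide which reference files are worth creating
// when a store file covering [firstKey, lastKey] is split at splitKey.
final class SplitReferenceSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Assumption: "bottom" covers keys < splitKey, "top" covers keys >= splitKey.
    // A half is skipped when no key of the store file can fall into it.
    static List<String> referencesToCreate(byte[] firstKey, byte[] lastKey, byte[] splitKey) {
        List<String> refs = new ArrayList<>();
        if (compare(splitKey, firstKey) > 0) {
            refs.add("bottom"); // at least firstKey sorts below the split key
        }
        if (compare(splitKey, lastKey) <= 0) {
            refs.add("top");    // at least lastKey sorts at or above the split key
        }
        return refs;
    }

    public static void main(String[] args) {
        byte[] first = {'b'};
        byte[] last = {'y'};
        System.out.println(referencesToCreate(first, last, new byte[] {'a'}));
        System.out.println(referencesToCreate(first, last, new byte[] {'m'}));
        System.out.println(referencesToCreate(first, last, new byte[] {'z'}));
    }
}
```

With a split key inside the range, both halves are kept; with a split key outside the range, only one reference remains, which is the behavior the issue asks for.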



[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-27 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8152:


Attachment: HBASE-8152-0.94v3.patch

add patch for 0.94

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.95.0, 0.94.7
>
> Attachments: hbase-8152.0.94patch.v2, HBASE-8152-0.94v3.patch, 
> HBASE-8152-0.96v1.patch, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference file will be created.



[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-27 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8152:


Attachment: HBASE-8152-0.96v1.patch

add patch HBASE-8152-0.96v1.patch for trunk

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.95.0, 0.94.7
>
> Attachments: hbase-8152.0.94patch.v2, HBASE-8152-0.94v3.patch, 
> HBASE-8152-0.96v1.patch, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference file will be created.



[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-27 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8152:


 Priority: Minor  (was: Trivial)
Fix Version/s: 0.95.0

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.95.0, 0.94.7
>
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference file will be created.



[jira] [Commented] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-27 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616011#comment-13616011
 ] 

clockfly commented on HBASE-8152:
-

Ted, I will attach a new patch soon.




> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Trivial
> Fix For: 0.94.7
>
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference file will be created.



[jira] [Commented] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-27 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615998#comment-13615998
 ] 

clockfly commented on HBASE-8152:
-

{quote}
// pick an split point (roughly halfway)
-  byte[] SPLITKEY = new byte[] { (LAST_CHAR-FIRST_CHAR)/2, FIRST_CHAR};
+  byte[] SPLITKEY = new byte[] { (LAST_CHAR + FIRST_CHAR)/2, FIRST_CHAR};
{quote}
{quote}
How did this work before?
{quote}
Sergey, this is actually a hidden bug in the old version. It worked by accident: 
the split key built from "(LAST_CHAR-FIRST_CHAR)/2" lies outside the store file, 
and there was also a bug in the counter, which missed one count. Together, these 
two bugs made the old UT pass.

I should file a separate JIRA for this to make it clearer.
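To make the arithmetic concrete, here is a small worked example. It assumes FIRST_CHAR = 'a' and LAST_CHAR = 'z'; these values are an assumption for illustration and are not quoted from the test in this thread, and the class and method names are hypothetical.

```java
// Worked example: half the difference versus the true midpoint of a key range.
final class SplitKeyMidpointDemo {
    static final char FIRST_CHAR = 'a'; // assumed value, code point 97
    static final char LAST_CHAR = 'z';  // assumed value, code point 122

    // Old expression: half the width of the range, not a position inside it.
    static byte halfDifference() {
        return (byte) ((LAST_CHAR - FIRST_CHAR) / 2);
    }

    // Fixed expression: the midpoint between the two characters.
    static byte midpoint() {
        return (byte) ((LAST_CHAR + FIRST_CHAR) / 2);
    }

    static boolean inRange(byte b) {
        return b >= FIRST_CHAR && b <= LAST_CHAR;
    }

    public static void main(String[] args) {
        // prints "old first byte: 12, in range: false"
        System.out.println("old first byte: " + halfDifference()
                + ", in range: " + inRange(halfDifference()));
        // prints "new first byte: 109 ('m'), in range: true"
        System.out.println("new first byte: " + midpoint()
                + " ('" + (char) midpoint() + "'), in range: " + inRange(midpoint()));
    }
}
```

Under these assumed values the old expression yields byte 12, which sorts before every key starting with 'a', so the split key falls outside the store file. The corrected expression yields 'm', which lies inside the range.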

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Trivial
> Fix For: 0.94.7
>
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference file will be created.



[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-26 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613998#comment-13613998
 ] 

clockfly commented on HBASE-8176:
-

All UT and all integration tests pass.

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-26 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613568#comment-13613568
 ] 

clockfly commented on HBASE-8176:
-

I wrote another internal patch (not this one) for 0.94.1 and have used it in 
production for months.

This patch has minor differences compared with 0.94.2. So far I have only tested 
the UT part; I will try to run the full 0.94 test suite.

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-26 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613564#comment-13613564
 ] 

clockfly commented on HBASE-8176:
-

Lars, it did modify the shell files admin.rb and hbase.rb. In the shell, it 
introduces a compound attribute named "CONFIG". All additional configurations 
are set under "CONFIG", for example CONFIG=>{conf1=>'', conf2=>''}

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Updated] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-25 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8176:


Attachment: hbase-8176.patch

add HBASE-5335 patch for 0.94

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: hbase-8176.patch
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Commented] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-25 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613480#comment-13613480
 ] 

clockfly commented on HBASE-8176:
-

Lars, Jean-Marc, and Anoop, thank you for the comments, they make sense. On the 
other hand, I do find the backport valuable for those who want to keep 0.94 in 
production for a long time, considering the big changes in 0.96 and the risks of 
upgrading.

I will attach my patch here, so that those who want this feature can apply it 
directly.

Thanks


 

> Backport HBASE-5335 "Dynamic Schema Configurations"
> ---
>
> Key: HBASE-8176
> URL: https://issues.apache.org/jira/browse/HBASE-8176
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
>
> With HBASE-5335, we can support per-table configuration and per-family 
> configurations.
> We can use it to customize the compaction on table/family basis, customize 
> the flush, and etc..



[jira] [Commented] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"

2013-03-22 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611640#comment-13611640
 ] 

clockfly commented on HBASE-8174:
-

@Sergey, yes, your patch is the better approach: it works at the column family 
level, while mine only works at the region level.
Is it worth the risk to backport HBASE-8161 and HBASE-5335 into 0.94?


> Allow each table to customize its own flush blocking file count 
> "hbase.hstore.blockingStoreFiles"
> -
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: HBASE-8174.patch.0.94.v1
>
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
> configured at region server level.
> We should allow it to be configured at Table level.



[jira] [Created] (HBASE-8176) Backport HBASE-5335 "Dynamic Schema Configurations"

2013-03-22 Thread clockfly (JIRA)
clockfly created HBASE-8176:
---

 Summary: Backport HBASE-5335 "Dynamic Schema Configurations"
 Key: HBASE-8176
 URL: https://issues.apache.org/jira/browse/HBASE-8176
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.5
Reporter: clockfly
Assignee: clockfly
Priority: Minor
 Fix For: 0.94.7


With HBASE-5335, we can support per-table configuration and per-family 
configurations.

We can use it to customize compaction on a table/family basis, customize the 
flush, etc.



[jira] [Commented] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"

2013-03-22 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610010#comment-13610010
 ] 

clockfly commented on HBASE-8174:
-

Hi Anoop, 

I overlooked this: it depends on HBASE-5335. I will backport HBASE-5335 
to 0.94.

> Allow each table to customize its own flush blocking file count 
> "hbase.hstore.blockingStoreFiles"
> -
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: HBASE-8174.patch.0.94.v1
>
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
> configured at region server level.
> We should allow it to be configured at Table level.



[jira] [Commented] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"

2013-03-22 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610011#comment-13610011
 ] 

clockfly commented on HBASE-8174:
-

With HBASE-5335, the HRegion gets a compound configuration that combines the 
region server configuration with the HTableDescriptor.
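A minimal sketch of that layered lookup follows, assuming table-level values take precedence over region server values. It uses plain maps and is not the HBase implementation; the class name and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a two-layer configuration: a value set on the table
// overrides the region-server-wide value for the same key.
final class LayeredConfigSketch {
    private final Map<String, String> serverConf;
    private final Map<String, String> tableConf;

    LayeredConfigSketch(Map<String, String> serverConf, Map<String, String> tableConf) {
        this.serverConf = new HashMap<>(serverConf);
        this.tableConf = new HashMap<>(tableConf);
    }

    // Look in the table layer first, then the server layer, then use the default.
    int getInt(String key, int defaultValue) {
        String value = tableConf.get(key);
        if (value == null) {
            value = serverConf.get(key);
        }
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> server = new HashMap<>();
        server.put("hbase.hstore.blockingStoreFiles", "7");

        Map<String, String> table = new HashMap<>();
        table.put("hbase.hstore.blockingStoreFiles", "20");

        LayeredConfigSketch conf = new LayeredConfigSketch(server, table);
        // The table-level value wins over the server-level value.
        System.out.println(conf.getInt("hbase.hstore.blockingStoreFiles", 7));
    }
}
```

In this sketch a table that does not set the key falls back to the region server value, so existing tables keep their current behavior.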

> Allow each table to customize its own flush blocking file count 
> "hbase.hstore.blockingStoreFiles"
> -
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.7
>
> Attachments: HBASE-8174.patch.0.94.v1
>
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
> configured at region server level.
> We should allow it to be configured at Table level.



[jira] [Updated] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"

2013-03-21 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8174:


Attachment: HBASE-8174.patch.0.94.v1

Add patch for 0.94.

> Allow each table to customize its own flush blocking file count 
> "hbase.hstore.blockingStoreFiles"
> -
>
> Key: HBASE-8174
> URL: https://issues.apache.org/jira/browse/HBASE-8174
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.6
>
> Attachments: HBASE-8174.patch.0.94.v1
>
>
> Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
> configured at region server level.
> We should allow it to be configured at Table level.



[jira] [Created] (HBASE-8174) Allow each table to customize its own flush blocking file count "hbase.hstore.blockingStoreFiles"

2013-03-21 Thread clockfly (JIRA)
clockfly created HBASE-8174:
---

 Summary: Allow each table to customize its own flush blocking file 
count "hbase.hstore.blockingStoreFiles"
 Key: HBASE-8174
 URL: https://issues.apache.org/jira/browse/HBASE-8174
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.5
Reporter: clockfly
Assignee: clockfly
Priority: Minor
 Fix For: 0.94.6


Currently, the blocking file count "hbase.hstore.blockingStoreFiles" is 
configured at region server level.

We should allow it to be configured at Table level.



[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-21 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8152:


Priority: Trivial  (was: Major)

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Trivial
> Fix For: 0.94.6
>
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference files are created.



[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-21 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8152:


Attachment: hbase-8152.0.94patch.v2

update the patch

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
> Fix For: 0.94.6
>
> Attachments: hbase-8152.0.94patch.v2, hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference files are created.



[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-21 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8152:


Attachment: hbase-8152.patch0.94

Add patch for 0.94

> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
> Fix For: 0.94.6
>
> Attachments: hbase-8152.patch0.94
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference files are created.



[jira] [Updated] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-20 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-8152:


Description: 
When splitting a store file, if the split key is before the first key, or 
greater than the last key, then only one reference file should be created.

Currently, two reference files are created.

  was:When splitting a store file, if the split key is before the first key, or 
greater than the last key, then only one reference file should be created.


> Avoid creating empty reference file when splitkey is outside the range of a 
> store file
> --
>
> Key: HBASE-8152
> URL: https://issues.apache.org/jira/browse/HBASE-8152
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration, HFile
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
> Fix For: 0.94.6
>
>
> When splitting a store file, if the split key is before the first key, or 
> greater than the last key, then only one reference file should be created.
> Currently, two reference files are created.



[jira] [Created] (HBASE-8152) Avoid creating empty reference file when splitkey is outside the range of a store file

2013-03-20 Thread clockfly (JIRA)
clockfly created HBASE-8152:
---

 Summary: Avoid creating empty reference file when splitkey is 
outside the range of a store file
 Key: HBASE-8152
 URL: https://issues.apache.org/jira/browse/HBASE-8152
 Project: HBase
  Issue Type: Improvement
  Components: Filesystem Integration, HFile
Affects Versions: 0.94.5
Reporter: clockfly
Assignee: clockfly
 Fix For: 0.94.6


When splitting a store file, if the split key is before the first key, or 
greater than the last key, then only one reference file should be created.
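The rule described above can be sketched as a small planning step that decides which reference files are needed. This is not the patch itself: the class ReferencePlanner, its methods, and the labels "top" (keys at or after the split key) and "bottom" (keys before the split key) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the HBASE-8152 patch.
final class ReferencePlanner {
  // Unsigned lexicographic comparison of two row keys.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  // Lists which halves of the store file need a reference file.
  static List<String> plan(byte[] firstKey, byte[] lastKey, byte[] splitKey) {
    List<String> refs = new ArrayList<String>();
    if (compare(splitKey, firstKey) < 0) {
      refs.add("top");     // every key sorts after the split key
    } else if (compare(splitKey, lastKey) > 0) {
      refs.add("bottom");  // every key sorts before the split key
    } else {
      refs.add("bottom");
      refs.add("top");
    }
    return refs;
  }
}
```

For a store file covering keys "b" through "d", a split key of "a" yields only a top reference, "e" yields only a bottom reference, and "c" yields both.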



[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region

2013-03-14 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602132#comment-13602132
 ] 

clockfly commented on HBASE-7876:
-

@stack, this patch works for me. I defined a custom split policy, and an
empty region can be split successfully.





> Got exception when manually triggers a split on an empty region
> ---
>
> Key: HBASE-7876
> URL: https://issues.apache.org/jira/browse/HBASE-7876
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
> Attachments: HBASE-7876-0.94V2.patch, HBASE-7876-trunk.patch
>
>
> We should allow a region to split successfully even if it does not yet have 
> storefiles.



[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region

2013-02-27 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588168#comment-13588168
 ] 

clockfly commented on HBASE-7876:
-

ramakrishna, maybe this patch doesn't conflict with your use case.

Splitting an empty region will only happen when one of two conditions is met:
1. A manual split is forced by specifying a midkey.
2. A customized SplitPolicy is in use.

For condition 2, it is the customized SplitPolicy's choice.
For condition 1, if the application doesn't want to split an empty region, the 
application should check whether the region is empty by running a small scan.

If the admin doesn't specify a midkey, then for an empty region the default 
split policy RegionSplitPolicy:shouldSplit() will always return false, and the 
split will never happen.

> Got exception when manually triggers a split on an empty region
> ---
>
> Key: HBASE-7876
> URL: https://issues.apache.org/jira/browse/HBASE-7876
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
> Attachments: HBASE-7876-0.94.patch
>
>
> We should allow a region to split successfully even if it does not yet have 
> storefiles.



[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region

2013-02-27 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588134#comment-13588134
 ] 

clockfly commented on HBASE-7876:
-

I also tested a customized SplitPolicy. With this patch, the split succeeds 
at the customized point even if there are no store files in the region.

In short, the split decision should be made by the upper-level SplitPolicy, 
which knows more about the application logic, instead of the low-level 
CompactSplitThread.

> Got exception when manually triggers a split on an empty region
> ---
>
> Key: HBASE-7876
> URL: https://issues.apache.org/jira/browse/HBASE-7876
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
> Attachments: HBASE-7876-0.94.patch
>
>
> We should allow a region to split successfully even if it does not yet have 
> storefiles.



[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region

2013-02-27 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588129#comment-13588129
 ] 

clockfly commented on HBASE-7876:
-

I tested the patch; the split succeeds when a midkey is specified.

> Got exception when manually triggers a split on an empty region
> ---
>
> Key: HBASE-7876
> URL: https://issues.apache.org/jira/browse/HBASE-7876
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
> Attachments: HBASE-7876-0.94.patch
>
>
> We should allow a region to split successfully even if it does not yet have 
> storefiles.



[jira] [Commented] (HBASE-7876) Got exception when manually triggers a split on an empty region

2013-02-27 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588127#comment-13588127
 ] 

clockfly commented on HBASE-7876:
-

I think there are several motives:
1. HBaseAdmin allows splitting by specifying a midkey. We should allow the 
user to manually split an empty region if the user expects heavy load on 
this region in the future.
2. We allow the user to customize the SplitPolicy, so the split point can be 
customized. When the customized SplitPolicy says that we need to split at 
point xx, we should support it, whether or not there are store files in the 
region.



> Got exception when manually triggers a split on an empty region
> ---
>
> Key: HBASE-7876
> URL: https://issues.apache.org/jira/browse/HBASE-7876
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.5
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
> Attachments: HBASE-7876-0.94.patch
>
>
> We should allow a region to split successfully even if it does not yet have 
> storefiles.



[jira] [Commented] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

2013-02-25 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586857#comment-13586857
 ] 

clockfly commented on HBASE-7885:
-

The equation should be:

 (1 + 1/m)^n ~= e^(n/m) (as m -> infinity), where m stands for the byte length 
and n stands for the expected max record count.

> bloom filter compaction is too aggressive for Hfile which only contains small 
> count of records
> --
>
> Key: HBASE-7885
> URL: https://issues.apache.org/jira/browse/HBASE-7885
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Scanners
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.6
>
> Attachments: hbase-7885.patch, hbase_bloom_shrink_fix.patch
>
>
> For HFile V2, the bloom filter takes an initial size of 128KB. 
> When not many records are inserted into the bloom filter, the 
> bloom filter starts to shrink itself (compaction). 
> For example, from 128K it will compact to 64K 
> ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that 
> it can stay bounded by the estimated error rate. 
> If we put only a few records in the HFile, the bloom filter will be 
> compacted too small, which breaks the assumption that shrinking 
> stays bounded by the estimated error rate. The false positive rate 
> becomes unacceptably high. 
> For example, if we set the expected error rate to 0.1, then for 10 records, 
> after compaction, the size of the bloom filter will be 64 bytes. The real 
> effective false positive rate will be 50%.
> The use case is this: if we are using HBase to store big records like 
> images and binaries, each record takes megabytes. Then a 128M file 
> will contain only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction 
> process. I suggest using 1000 bytes.



[jira] [Commented] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

2013-02-25 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586854#comment-13586854
 ] 

clockfly commented on HBASE-7885:
-

Hi Lars,

This is more of a mathematical argument about the limit of a sequence.

The assumption behind the bloom filter compaction is: 
the expected false positive rate stays the same, as long as (the byte length of 
the bloom filter)/(max keys to contain) stays the same.

This assumption is based on (1+1/m)^n ~= e^(n/m) (as m -> infinity), where m 
stands for the byte length and n stands for the expected max record count.

Here I use 1000 as an estimate of infinity.

Please check: 
http://en.wikipedia.org/wiki/Bloom_filter
http://en.wikipedia.org/wiki/E_(mathematical_constant)
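To get a feel for how good the approximation is at small m, the gap between the two sides can be checked numerically. The sketch below is only an illustration of the limit quoted in this comment; the class name LimitCheck and the method relativeGap are assumptions and do not appear in HBase.

```java
// Numeric check of (1 + 1/m)^n against e^(n/m); hypothetical helper.
final class LimitCheck {
  // Relative gap between the exact power and its exponential limit.
  static double relativeGap(long m, long n) {
    double exact = Math.pow(1.0 + 1.0 / m, n);
    double limit = Math.exp((double) n / m);
    return Math.abs(limit - exact) / limit;
  }
}
```

With n = m, relativeGap(32, 32) is roughly 1.5%, while relativeGap(1000, 1000) is roughly 0.05%, which is consistent with treating 1000 as "large enough".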




> bloom filter compaction is too aggressive for Hfile which only contains small 
> count of records
> --
>
> Key: HBASE-7885
> URL: https://issues.apache.org/jira/browse/HBASE-7885
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Scanners
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.6
>
> Attachments: hbase-7885.patch, hbase_bloom_shrink_fix.patch
>
>
> For HFile V2, the bloom filter takes an initial size of 128KB. 
> When not many records are inserted into the bloom filter, the 
> bloom filter starts to shrink itself (compaction). 
> For example, from 128K it will compact to 64K 
> ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that 
> it can stay bounded by the estimated error rate. 
> If we put only a few records in the HFile, the bloom filter will be 
> compacted too small, which breaks the assumption that shrinking 
> stays bounded by the estimated error rate. The false positive rate 
> becomes unacceptably high. 
> For example, if we set the expected error rate to 0.1, then for 10 records, 
> after compaction, the size of the bloom filter will be 64 bytes. The real 
> effective false positive rate will be 50%.
> The use case is this: if we are using HBase to store big records like 
> images and binaries, each record takes megabytes. Then a 128M file 
> will contain only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction 
> process. I suggest using 1000 bytes.



[jira] [Updated] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

2013-02-25 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7885:


Attachment: hbase-7885.patch

Attached the patch and the unit test (UT) fix for HBase 0.94.

> bloom filter compaction is too aggressive for Hfile which only contains small 
> count of records
> --
>
> Key: HBASE-7885
> URL: https://issues.apache.org/jira/browse/HBASE-7885
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Scanners
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.6
>
> Attachments: hbase-7885.patch, hbase_bloom_shrink_fix.patch
>
>
> For HFile V2, the bloom filter takes an initial size of 128KB. 
> When not many records are inserted into the bloom filter, the 
> bloom filter starts to shrink itself (compaction). 
> For example, from 128K it will compact to 64K 
> ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that 
> it can stay bounded by the estimated error rate. 
> If we put only a few records in the HFile, the bloom filter will be 
> compacted too small, which breaks the assumption that shrinking 
> stays bounded by the estimated error rate. The false positive rate 
> becomes unacceptably high. 
> For example, if we set the expected error rate to 0.1, then for 10 records, 
> after compaction, the size of the bloom filter will be 64 bytes. The real 
> effective false positive rate will be 50%.
> The use case is this: if we are using HBase to store big records like 
> images and binaries, each record takes megabytes. Then a 128M file 
> will contain only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction 
> process. I suggest using 1000 bytes.



[jira] [Updated] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

2013-02-25 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7885:


Fix Version/s: (was: 0.94.5)
   0.94.6
 Assignee: clockfly

> bloom filter compaction is too aggressive for Hfile which only contains small 
> count of records
> --
>
> Key: HBASE-7885
> URL: https://issues.apache.org/jira/browse/HBASE-7885
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Scanners
>Affects Versions: 0.94.5
>Reporter: clockfly
>Assignee: clockfly
>Priority: Minor
> Fix For: 0.94.6
>
> Attachments: hbase_bloom_shrink_fix.patch
>
>
> For HFile V2, the bloom filter takes an initial size of 128KB. 
> When not many records are inserted into the bloom filter, the 
> bloom filter starts to shrink itself (compaction). 
> For example, from 128K it will compact to 64K 
> ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that 
> it can stay bounded by the estimated error rate. 
> If we put only a few records in the HFile, the bloom filter will be 
> compacted too small, which breaks the assumption that shrinking 
> stays bounded by the estimated error rate. The false positive rate 
> becomes unacceptably high. 
> For example, if we set the expected error rate to 0.1, then for 10 records, 
> after compaction, the size of the bloom filter will be 64 bytes. The real 
> effective false positive rate will be 50%.
> The use case is this: if we are using HBase to store big records like 
> images and binaries, each record takes megabytes. Then a 128M file 
> will contain only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction 
> process. I suggest using 1000 bytes.



[jira] [Commented] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

2013-02-25 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586785#comment-13586785
 ] 

clockfly commented on HBASE-7885:
-

Hi Ted,

+  while ((newByteSize & 1) == 0 && newMaxKeys > (this.keyCount << 1)
+      && newByteSize >= MIN_BLOOMFILTER_SIZE * 2) {
     pieces <<= 1;
     newByteSize >>= 1;
     newMaxKeys >>= 1;
   }

In the while loop, we cut the size in half. After each iteration, newByteSize 
is reduced to newByteSize / 2. The check newByteSize >= MIN_BLOOMFILTER_SIZE * 2 
makes sure that, after compaction, the bloom filter's size is still >= 
MIN_BLOOMFILTER_SIZE.
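The effect of the extra condition can be reproduced with a simplified model of the loop. This is a standalone sketch, not the ByteBloomFilter class: the class name BloomFold, the method foldedByteSize, and the sample value of 100000 max keys used below are assumptions for illustration.

```java
// Simplified model of the shrinking loop; hypothetical, not HBase code.
final class BloomFold {
  // Returns the byte size left after repeated halving.
  // minByteSize plays the role of MIN_BLOOMFILTER_SIZE; 0 disables the floor.
  static int foldedByteSize(int byteSize, int maxKeys, int keyCount,
                            int minByteSize) {
    int newByteSize = byteSize;
    int newMaxKeys = maxKeys;
    while ((newByteSize & 1) == 0
        && newMaxKeys > (keyCount << 1)
        && newByteSize >= minByteSize * 2) {
      newByteSize >>= 1;  // halve the filter
      newMaxKeys >>= 1;   // capacity halves with it
    }
    return newByteSize;
  }
}
```

Starting from 131072 bytes with 10 keys and an assumed capacity of 100000 keys, foldedByteSize(131072, 100000, 10, 1000) stops at 1024 bytes, while the same call with a floor of 0 keeps halving down to 16 bytes.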

Some unit tests (UT) are affected; I will attach the UT fix soon.


> bloom filter compaction is too aggressive for Hfile which only contains small 
> count of records
> --
>
> Key: HBASE-7885
> URL: https://issues.apache.org/jira/browse/HBASE-7885
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Scanners
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.94.5
>
> Attachments: hbase_bloom_shrink_fix.patch
>
>
> For HFile V2, the bloom filter takes an initial size of 128KB. 
> When not many records are inserted into the bloom filter, the 
> bloom filter starts to shrink itself (compaction). 
> For example, from 128K it will compact to 64K 
> ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that 
> it can stay bounded by the estimated error rate. 
> If we put only a few records in the HFile, the bloom filter will be 
> compacted too small, which breaks the assumption that shrinking 
> stays bounded by the estimated error rate. The false positive rate 
> becomes unacceptably high. 
> For example, if we set the expected error rate to 0.1, then for 10 records, 
> after compaction, the size of the bloom filter will be 64 bytes. The real 
> effective false positive rate will be 50%.
> The use case is this: if we are using HBase to store big records like 
> images and binaries, each record takes megabytes. Then a 128M file 
> will contain only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction 
> process. I suggest using 1000 bytes.



[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-24 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585606#comment-13585606
 ] 

clockfly commented on HBASE-7884:
-

Hi Ted, 

the patch for 0.94 is already attached.

patch for 0.94.6: 
https://issues.apache.org/jira/secure/attachment/12570054/bloom_performance_tunning.patch
patch for trunk: 
https://issues.apache.org/jira/secure/attachment/12570247/bloom_optimization_trunk_patch.patch

> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.96.0, 0.94.6
>
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, 
> TestByteBloom.java
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-24 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585603#comment-13585603
 ] 

clockfly commented on HBASE-7884:
-

I did not use major_compact to compare the performance. 
 
Major compaction invokes bloom filter *building* when merging. This patch 
doesn't change the *building* process; it only affects bloom filter *lookup* 
performance, so no performance difference would be observed for major 
compaction.

Since ByteBloomFilter is a standalone utility, I chose to test 
its performance standalone, outside of HBase.
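For readers unfamiliar with the idea in the issue title, one common way to avoid a multiplication per hash function in a double-hashing lookup is to keep a running sum. The sketch below illustrates that general technique only; it is not copied from the attached patch, and the class DoubleHashing and its method names are assumptions.

```java
// Hypothetical sketch of double hashing for a bloom filter lookup.
final class DoubleHashing {
  // Baseline: one multiplication per hash function.
  static int[] withMultiply(int hash1, int hash2, int hashCount, int bitSize) {
    int[] positions = new int[hashCount];
    for (int i = 0; i < hashCount; i++) {
      positions[i] = Math.abs((hash1 + i * hash2) % bitSize);
    }
    return positions;
  }

  // Same bit positions, computed with a running sum instead.
  static int[] withAddition(int hash1, int hash2, int hashCount, int bitSize) {
    int[] positions = new int[hashCount];
    int composite = hash1;
    for (int i = 0; i < hashCount; i++) {
      positions[i] = Math.abs(composite % bitSize);
      composite += hash2;  // int overflow wraps the same way as i * hash2
    }
    return positions;
  }
}
```

Because Java int arithmetic wraps identically for the product and for the repeated addition, both methods return the same positions, so only the lookup cost changes and not the filter contents.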


> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.96.0, 0.94.6
>
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, 
> TestByteBloom.java
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-24 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585591#comment-13585591
 ] 

clockfly commented on HBASE-7884:
-

Test config:
=
test platform: Intel(R) Core(TM)2 CPU 6300  @ 1.86GHz
JVM: java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)



> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.96.0, 0.94.6
>
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, 
> TestByteBloom.java
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-24 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7884:


Attachment: TestByteBloom.java

Test code for the bloom filter performance.

> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.96.0, 0.94.6
>
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch, hbase-7884-performance-report.pdf, 
> TestByteBloom.java
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-24 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585589#comment-13585589
 ] 

clockfly commented on HBASE-7884:
-

Hi Nicolas,


==Test Result===
Here are the results of testing with the microbenchmark framework 
http://www.ellipticgroup.com/html/benchmarkingArticle.html (introduced in 
http://www.ibm.com/developerworks/library/j-benchmark2/index.html). This 
framework performs warm-up and excludes the impact of HotSpot compilation.

Before tuning:
==
ByteBloomFilter contains() test: first = 1.187 s, mean = 1.156 s (CI deltas: 
-370.103 us, +547.583 us), sd(standard deviation) = 1.754 ms (CI deltas: 
-490.688 us, +737.411 us) WARNING: EXECUTION TIMES HAVE EXTREME OUTLIERS, 
execution times may have serial correlation, SD VALUES MAY BE INACCURATE

After tuning:
==
ByteBloomFilter contains() test: first = 1.006 s, mean = 973.650 ms (CI deltas: 
-229.513 us, +248.122 us), sd(standard deviation) = 1.333 ms (CI deltas: 
-205.051 us, +337.328 us) WARNING: execution times have mild outliers, 
execution times may have serial correlation, SD VALUES MAY BE INACCURATE

Performance boost: about 19% (mean time dropped from 1.156 s to 973.650 ms, 
and 1.156 / 0.97365 ~= 1.19).

Test code is attached.


> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.96.0, 0.94.6
>
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch, hbase-7884-performance-report.pdf
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-22 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7884:


Attachment: hbase-7884-performance-report.pdf

Hi Nicolas,

Please check the attached performance report.

> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.96.0, 0.94.6
>
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch, hbase-7884-performance-report.pdf
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-20 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7884:


Fix Version/s: 0.94.6
   0.96.0

> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.96.0, 0.94.6
>
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-20 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7884:


Attachment: bloom_optimization_trunk_patch.patch

Added a patch for trunk.

> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Commented] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-20 Thread clockfly (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582792#comment-13582792
 ] 

clockfly commented on HBASE-7884:
-

Hi Ted,

The hash logic is equivalent.

    for (int i = 0; i < hashCount; i++) {
      long hashLoc = Math.abs((hash1 + i * hash2) % bloomBitSize);
    }

is equivalent to

    int compositeHash = hash1;
    for (int i = 0; i < hashCount; i++) {
      int hashLoc = Math.abs(compositeHash % bloomBitSize);
      compositeHash += hash2;
    }
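
To make the equivalence easy to check, here is a small self-contained sketch that runs both loops side by side. It is not the patch itself: the class name, method names, and sample inputs are assumptions for illustration, and it assumes hash1, hash2, and bloomBitSize are all int. Under that assumption both forms wrap around identically on overflow, so hash1 + i * hash2 and the running sum produce the same value at every step.

```java
import java.util.Arrays;

public class BloomHashDemo {
    // Original form: one multiplication per hash location.
    static int[] withMultiplication(int hash1, int hash2, int hashCount, int bloomBitSize) {
        int[] locs = new int[hashCount];
        for (int i = 0; i < hashCount; i++) {
            locs[i] = Math.abs((hash1 + i * hash2) % bloomBitSize);
        }
        return locs;
    }

    // Rewritten form: keep a running sum and add hash2 each iteration.
    static int[] withAddition(int hash1, int hash2, int hashCount, int bloomBitSize) {
        int[] locs = new int[hashCount];
        int compositeHash = hash1;
        for (int i = 0; i < hashCount; i++) {
            locs[i] = Math.abs(compositeHash % bloomBitSize);
            compositeHash += hash2;
        }
        return locs;
    }

    public static void main(String[] args) {
        // Sample values chosen so that the int arithmetic overflows.
        int hash1 = 0x7fff0123;
        int hash2 = 0x6b43a9b5;
        int[] a = withMultiplication(hash1, hash2, 8, 1 << 20);
        int[] b = withAddition(hash1, hash2, 8, 1 << 20);
        System.out.println(Arrays.equals(a, b));
    }
}
```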


> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Attachments: bloom_optimization_trunk_patch.patch, 
> bloom_performance_tunning.patch
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be improved by avoiding multiplication when generating hash

2013-02-20 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7884:


Description: ByteBloomFilter's performance can be optimized by avoiding 
multiplication operation when generating hash   (was: ByteBloomFilter's 
performance can be optimized by avoiding multiplexing operation when generating 
hash )

> ByteBloomFilter's performance can be improved by avoiding multiplication when 
> generating hash 
> --
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Attachments: bloom_performance_tunning.patch
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplication 
> operation when generating hash 



[jira] [Updated] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

2013-02-19 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7885:


Attachment: hbase_bloom_shrink_fix.patch

> bloom filter compaction is too aggressive for Hfile which only contains small 
> count of records
> --
>
> Key: HBASE-7885
> URL: https://issues.apache.org/jira/browse/HBASE-7885
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Scanners
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Fix For: 0.94.5
>
> Attachments: hbase_bloom_shrink_fix.patch
>
>
> For HFile V2, the bloom filter takes an initial size of 128KB. 
> When not many records are inserted into the bloom filter, the 
> bloom filter starts to shrink itself (compaction). 
> For example, from 128K it will compact to 64K 
> ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that 
> it stays bounded by the estimated error rate. 
> If we put only a few records in the HFile, the bloom filter will be 
> compacted until it is too small, which breaks the assumption that shrinking 
> stays bounded by the estimated error rate. The false positive rate 
> becomes unacceptably high. 
> For example, if we set the expected error rate to 0.1, then for 10 records, 
> after compaction, the size of the bloom filter will be 64 bytes. The real 
> effective false positive rate will be 50%.
> The use case is this: if we use HBase to store big records such as 
> images and binaries, each record takes megabytes. A 128M file 
> will then contain only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction 
> process. I suggest using 1000 bytes.



[jira] [Created] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

2013-02-19 Thread clockfly (JIRA)
clockfly created HBASE-7885:
---

 Summary: bloom filter compaction is too aggressive for Hfile which 
only contains small count of records
 Key: HBASE-7885
 URL: https://issues.apache.org/jira/browse/HBASE-7885
 Project: HBase
  Issue Type: Bug
  Components: Performance, Scanners
Affects Versions: 0.94.5
Reporter: clockfly
Priority: Minor
 Fix For: 0.94.5


For HFile V2, the bloom filter takes an initial size of 128KB. 
When not many records are inserted into the bloom filter, the bloom 
filter starts to shrink itself (compaction). 
For example, from 128K it will compact to 64K 
->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that it 
stays bounded by the estimated error rate. 

If we put only a few records in the HFile, the bloom filter will be compacted 
until it is too small, which breaks the assumption that shrinking stays 
bounded by the estimated error rate. The false positive rate becomes 
unacceptably high. 
For example, if we set the expected error rate to 0.1, then for 10 records, 
after compaction, the size of the bloom filter will be 64 bytes. The real 
effective false positive rate will be 50%.

The use case is this: if we use HBase to store big records such as 
images and binaries, each record takes megabytes. A 128M file will then 
contain only dozens of records.

The suggested fix is to set a lower limit for the bloom filter compaction 
process. I suggest using 1000 bytes.
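
The suggested lower limit can be sketched as an extra condition in a halving loop. This is a simplified illustration, not the code in the attached patch: the class name, the method shrunkByteSize, its parameters, and the maxKeys value are all hypothetical, and the rule "keep halving while the capacity is more than twice the key count" is an assumption about how the shrink decision is made.

```java
public class BloomShrinkSketch {
    // Hypothetical helper: halve the bloom size while the filter is much
    // larger than needed, but never go below minByteSize.
    static int shrunkByteSize(int initialByteSize, int maxKeys, int keyCount, int minByteSize) {
        int byteSize = initialByteSize;
        int capacity = maxKeys;
        while ((byteSize & 1) == 0                 // size can still be halved evenly
                && capacity > 2L * keyCount        // still far more capacity than keys
                && (byteSize >> 1) >= minByteSize) // proposed floor on the result
        {
            byteSize >>= 1;
            capacity >>= 1;
        }
        return byteSize;
    }

    public static void main(String[] args) {
        int initial = 128 * 1024; // 128KB initial size, as in the description
        int maxKeys = 218_000;    // assumed capacity of the initial filter
        int keyCount = 10;        // only a few records in the HFile

        // Without a meaningful floor the filter shrinks far below 1000 bytes.
        System.out.println(shrunkByteSize(initial, maxKeys, keyCount, 1));
        // With the suggested 1000-byte floor the shrink stops at 1024 bytes.
        System.out.println(shrunkByteSize(initial, maxKeys, keyCount, 1000));
    }
}
```

Because sizes are halved from a power of two, a floor of 1000 bytes means the smallest filter actually produced is 1024 bytes.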



[jira] [Updated] (HBASE-7884) ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash

2013-02-19 Thread clockfly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

clockfly updated HBASE-7884:


Attachment: bloom_performance_tunning.patch

> ByteBloomFilter's performance can be optimized by avoiding multiplexing 
> operation when generating hash 
> ---
>
> Key: HBASE-7884
> URL: https://issues.apache.org/jira/browse/HBASE-7884
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 0.94.5
>Reporter: clockfly
>Priority: Minor
> Attachments: bloom_performance_tunning.patch
>
>
> ByteBloomFilter's performance can be optimized by avoiding multiplexing 
> operation when generating hash 



[jira] [Created] (HBASE-7884) ByteBloomFilter's performance can be optimized by avoiding multiplexing operation when generating hash

2013-02-19 Thread clockfly (JIRA)
clockfly created HBASE-7884:
---

 Summary: ByteBloomFilter's performance can be optimized by 
avoiding multiplexing operation when generating hash 
 Key: HBASE-7884
 URL: https://issues.apache.org/jira/browse/HBASE-7884
 Project: HBase
  Issue Type: Bug
  Components: Performance
Affects Versions: 0.94.5
Reporter: clockfly
Priority: Minor


ByteBloomFilter's performance can be optimized by avoiding multiplexing 
operation when generating hash 
