[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596697#comment-13596697
 ] 

Hadoop QA commented on HBASE-8034:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12572652/HBASE-8034-v0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4720//console

This message is automatically generated.

> record on-disk data size for store file and make it available during writing
> 
>
> Key: HBASE-8034
> URL: https://issues.apache.org/jira/browse/HBASE-8034
> Project: HBase
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HBASE-8034-v0.patch
>
>
> To better estimate the size of data in the file, and to be able to split 
> files intelligently during any multi-file compactor like stripe or level.
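Being able to read a file's size while it is still being written is what lets a multi-file compactor decide where to cut. A minimal sketch of that idea follows. It uses a hypothetical in-memory FakeWriter as a stand-in for a store file writer (it is not an HBase class) and an arbitrary target size, and it rolls to a new output file once the current one reaches the target:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a multi-file compaction that starts a new output file
 * once the current one reaches a target size. FakeWriter only counts bytes;
 * it is an illustrative stand-in, not an HBase class.
 */
public class SizeBasedRollover {

  /** Stand-in for a store file writer that can report its size while writing. */
  static final class FakeWriter {
    private long bytesWritten = 0;

    void append(byte[] cell) {
      bytesWritten += cell.length;
    }

    long getCurrentSize() {
      return bytesWritten;
    }
  }

  /** Writes all cells, rolling to a new writer once targetSize is reached; returns per-file sizes. */
  static List<Long> compact(List<byte[]> cells, long targetSize) {
    List<Long> fileSizes = new ArrayList<>();
    FakeWriter current = new FakeWriter();
    for (byte[] cell : cells) {
      current.append(cell);
      // This check is only possible if the writer exposes its size during writing.
      if (current.getCurrentSize() >= targetSize) {
        fileSizes.add(current.getCurrentSize());
        current = new FakeWriter();
      }
    }
    // Flush the last, partially filled file, if any.
    if (current.getCurrentSize() > 0) {
      fileSizes.add(current.getCurrentSize());
    }
    return fileSizes;
  }

  public static void main(String[] args) {
    List<byte[]> cells = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      cells.add(new byte[4]);
    }
    System.out.println(compact(cells, 10)); // prints [12, 12, 12, 4]
  }
}
```

A real compactor would also have to choose where it is allowed to cut (for example, at row boundaries), which this sketch ignores.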

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596824#comment-13596824
 ] 

stack commented on HBASE-8034:
--

OutputStream will always implement getPos?

You need to change this comment so that it says it's an estimate and says how you
came by the estimate. In other words, this will be the definitive doc on this new
metadata:

+  /** Number of bytes taken by the data blocks of this file. */

+1 if you fix the above on commit.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596835#comment-13596835
 ] 

Ted Yu commented on HBASE-8034:
---

{code}
+// Estimate 98% of the file is data for old files.
+this.diskDataSize = (b != null) ? Bytes.toLong(b) : (long)(this.reader.length() * 0.98);
{code}
@Sergey:
Can you clarify which file versions are considered 'old files'?
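The snippet above falls back to 98% of the file length when the metadata field is missing. A minimal sketch of that fallback follows, together with the kind of javadoc stack asked for. It assumes the stored value has already been decoded into a Long (the patch decodes it with Bytes.toLong). The class and method names are hypothetical. The constant name DATA_BLOCKS_FRACTION_ESTIMATE is the one quoted later in this thread.

```java
/**
 * Hypothetical sketch of the data-size fallback discussed in this thread.
 * Only DATA_BLOCKS_FRACTION_ESTIMATE comes from the thread; other names are illustrative.
 */
public final class DataSizeEstimate {

  /** Assumed fraction of an old file (one with no recorded data size) taken by data blocks. */
  static final double DATA_BLOCKS_FRACTION_ESTIMATE = 0.98;

  /**
   * Returns the number of bytes taken by the data blocks of a store file.
   *
   * @param recordedDataSize the value stored in the file's metadata, or null for
   *     old files written before the field existed
   * @param fileLength total length of the file in bytes
   * @return the recorded value if present; otherwise an estimate equal to
   *     fileLength * DATA_BLOCKS_FRACTION_ESTIMATE, truncated to a long
   */
  static long getDiskDataSize(Long recordedDataSize, long fileLength) {
    if (recordedDataSize != null) {
      return recordedDataSize;
    }
    // No recorded value: assume ~98% of the file is data, as in the patch snippet.
    return (long) (fileLength * DATA_BLOCKS_FRACTION_ESTIMATE);
  }

  public static void main(String[] args) {
    System.out.println(getDiskDataSize(123L, 1000L)); // 123: recorded value wins
    System.out.println(getDiskDataSize(null, 1000L)); // 980: estimated
  }
}
```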


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-08 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597369#comment-13597369
 ] 

Nick Dimiduk commented on HBASE-8034:
-

Would it make more sense to expose the number of KeyValues in the HFile?

{code}
+  @Override
+  public long getCurrentSize() throws IOException {
+    if (this.outputStream == null) return -1;
+    return this.outputStream.getPos();
+  }
{code}

This strikes me as flakey. Will there be another thread writing to the 
OutputStream when this method is invoked? Should it be synchronized?
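A minimal sketch of the getCurrentSize() contract under discussion: it returns the position of the open stream, or -1 when there is no stream. It assumes a single writing thread that calls the method only between writes, which is the assumption Sergey defends in this thread. To stay self-contained, it uses java.io.DataOutputStream as a stand-in for Hadoop's FSDataOutputStream.getPos(), because DataOutputStream.size() reports the bytes written so far. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Hypothetical sketch of the getCurrentSize() contract. DataOutputStream.size()
 * stands in for FSDataOutputStream.getPos().
 */
public class CurrentSizeSketch {
  private DataOutputStream outputStream; // null until the writer is opened

  void open() {
    outputStream = new DataOutputStream(new ByteArrayOutputStream());
  }

  void append(byte[] data) throws IOException {
    outputStream.write(data);
  }

  /**
   * @return currently written raw data size, or -1 if no stream is open.
   *     Not synchronized: callers are expected to invoke it between writes,
   *     from the writing thread.
   */
  long getCurrentSize() throws IOException {
    if (outputStream == null) {
      return -1;
    }
    return outputStream.size();
  }

  public static void main(String[] args) throws IOException {
    CurrentSizeSketch w = new CurrentSizeSketch();
    System.out.println(w.getCurrentSize()); // -1: not opened yet
    w.open();
    w.append(new byte[] {1, 2, 3});
    System.out.println(w.getCurrentSize()); // 3
  }
}
```

DataOutputStream.size() returns an int and stops at Integer.MAX_VALUE, so it is only a toy substitute for getPos().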


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597499#comment-13597499
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

bq. OutputStream will always implement getPos?
We rely on it in a few places in HFileWriterV2, so I would say yes.


bq. You need to change this comment so that it says its an estimate and say how 
you came by the estimate – in other words, this will be definitive doc on this 
new metadata:
bq. Can you clarify what file versions are considered 'old files' ?
Done, on the method.

bq. Would it make more sense to expose the number of KeyValues in the HFile?
That is an interesting question. For the purposes of compaction, we care more about
the physical sizes being similar. For the purposes of reads it's unclear, but probably
key values. That could be an improvement JIRA (including for the default compaction
algorithm).

bq. This strikes me as flakey. Will there be another thread writing to the 
OutputStream when this method is invoked? Should it be synchronized?
Probably not. Do you mean background writing inside the object, or write calls?
We don't control the implementation for the former (it's the Hadoop one)... For the
latter, similarly to HFileWriterV2, we rely on calling this method only when we know
we are not writing. That could be broken by changes, but adding sync to file
writing for this would seem to be an overkill.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-08 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597558#comment-13597558
 ] 

Nick Dimiduk commented on HBASE-8034:
-

bq. Do you mean background writing inside the object or write calls?

I was thinking other write calls.

bq. That could be broken by changes, but adding sync to file writing for this 
would seem to be an overkill.

That's probably true.

+1


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597689#comment-13597689
 ] 

Ted Yu commented on HBASE-8034:
---

{code}
+   * or estimated as {@link #DATA_BLOCKS_FRACTION_ESTIMATE} if there's no such field (old files).
{code}
The first part of the sentence seems incomplete.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597718#comment-13597718
 ] 

Hadoop QA commented on HBASE-8034:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12572855/HBASE-8034-v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4734//console

This message is automatically generated.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599072#comment-13599072
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

[~ted_yu] what do you mean? Seems complete to me.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599082#comment-13599082
 ] 

Ted Yu commented on HBASE-8034:
---

See the javadoc warning reported by QA.

I think DATA_BLOCKS_FRACTION_ESTIMATE should have been
DATA_SIZE_FRACTION_ESTIMATE. The constant is a percentage, while the estimate
should be for size.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599432#comment-13599432
 ] 

Hadoop QA commented on HBASE-8034:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12573182/HBASE-8034-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to fail.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.hadoop.hbase.procedure.TestZKProcedureControllers

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4764//console

This message is automatically generated.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600419#comment-13600419
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

Hi. Any more feedback? Thanks!


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-13 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601361#comment-13601361
 ] 

Nick Dimiduk commented on HBASE-8034:
-

{noformat}
--- hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
@@ -319,6 +319,11 @@ public class HFile {
  * HFile V2.
  */
 void addDeleteFamilyBloomFilter(BloomFilterWriter bfw) throws IOException;
+
+/**
+ * @return Currently written raw data size on disk.
+ */
+long getCurrentSize() throws IOException;
   }
{noformat}

nit: "\@return Currently written raw data size on disk *or -1.*"

Otherwise, ship it!


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601711#comment-13601711
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

Given that Ted and Nick +1'd, and the condition for stack's conditional +1
appears to be satisfied, I will commit this evening-ish if there are no objections.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603179#comment-13603179
 ] 

Hudson commented on HBASE-8034:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #448 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/448/])
HBASE-8034 record on-disk data size for store file and make it available during writing (Revision 1456743)

Result = FAILURE
sershe :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java



[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603305#comment-13603305
 ] 

Hudson commented on HBASE-8034:
---

Integrated in HBase-TRUNK #3961 (See [https://builds.apache.org/job/HBase-TRUNK/3961/])
HBASE-8034 record on-disk data size for store file and make it available during writing (Revision 1456743)

Result = FAILURE
sershe :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java



[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603443#comment-13603443
 ] 

Matteo Bertozzi commented on HBASE-8034:


What's the difference between this one and fs.getFileStatus(path).getLen()?
At first look, it seems that you're not using the data size but the data plus
headers and the like.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603640#comment-13603640
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

Well, I am assuming getFileStatus from fs may be stale. As it turns out,
however, the output stream's getPos() can also be stale (it looks like it's always
close to, but not quite at, some granularity; I am looking into it), which can cause
problems for me with small files. I will take a look at various methods.
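One possible approach to the staleness described above is for the writer to count the bytes it hands to the stream itself, instead of asking the stream for its position. This is a sketch under that assumption, not what the committed patch does. The class below is hypothetical. Its AtomicLong also lets another thread read the count, which bears on Nick's earlier synchronization question.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: count bytes as they are passed to the wrapped stream,
 * rather than relying on the stream's reported position.
 */
public class CountingOutputStream extends FilterOutputStream {
  private final AtomicLong count = new AtomicLong();

  public CountingOutputStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(int b) throws IOException {
    out.write(b);
    count.incrementAndGet();
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Delegate the whole range at once instead of FilterOutputStream's byte-at-a-time loop.
    out.write(b, off, len);
    count.addAndGet(len);
  }

  /** Bytes passed through this wrapper so far; readable from any thread. */
  public long getCount() {
    return count.get();
  }

  public static void main(String[] args) throws IOException {
    CountingOutputStream s = new CountingOutputStream(new ByteArrayOutputStream());
    s.write(new byte[10]); // FilterOutputStream routes this to write(b, 0, 10)
    s.write(7);
    System.out.println(s.getCount()); // 11
  }
}
```

The count reflects bytes passed to the wrapped stream, not bytes known to be persisted. It is updated only after each underlying write returns, so a concurrent reader may see a value that lags an in-flight write.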


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603689#comment-13603689
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

Also - what headers do you mean?


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603695#comment-13603695
 ] 

Matteo Bertozzi commented on HBASE-8034:


HFile block headers. Reading DATA_BLOCKS_FRACTION_ESTIMATE, it seems that you're 
just calculating the "data blocks" without all the index, block filters, block 
headers...
I see information like the number of keys (as Nick suggested) as more useful 
than having another full file size that we can get from the file status... but 
maybe there's something I'm missing.
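
To make the concern concrete, here is a minimal Java sketch that contrasts a fraction-based estimate with summing only the data blocks. The BlockKind and BlockInfo types, the 0.9 fraction, and the 33-byte header size used in the test values are hypothetical illustrations; they are not the patch's DATA_BLOCKS_FRACTION_ESTIMATE or the real HFile writer API.

```java
import java.util.List;

// Hypothetical block kinds; a real HFile has more block types (meta, file info, trailer, ...).
enum BlockKind { DATA, INDEX, BLOOM }

// Hypothetical descriptor: block type plus its on-disk size including the block header.
final class BlockInfo {
    final BlockKind kind;
    final long onDiskSizeWithHeader;

    BlockInfo(BlockKind kind, long onDiskSizeWithHeader) {
        this.kind = kind;
        this.onDiskSizeWithHeader = onDiskSizeWithHeader;
    }
}

class DataSizeEstimates {
    // Scales the whole file size by an assumed constant data fraction.
    static long fractionEstimate(long fileSize, double dataFraction) {
        return (long) (fileSize * dataFraction);
    }

    // Sums DATA blocks only; optionally strips an assumed per-block header size.
    static long sumDataBlocks(List<BlockInfo> blocks, int headerSize, boolean includeHeaders) {
        long total = 0;
        for (BlockInfo b : blocks) {
            if (b.kind != BlockKind.DATA) {
                continue; // index and bloom blocks are not counted as data
            }
            total += includeHeaders ? b.onDiskSizeWithHeader : b.onDiskSizeWithHeader - headerSize;
        }
        return total;
    }
}
```

For two 1033-byte data blocks, a 233-byte index block and a 133-byte bloom block (2432 bytes in total), summing data blocks gives 2066 bytes with headers and 2000 without, while a 0.9 fraction of the file size gives 2188; the fraction alone cannot say which of these quantities it is approximating.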


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603860#comment-13603860
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

The assumption is that the estimate path will rarely execute; most of the time, 
the final writer size before writing any footers will be written into the file.
Can you please point me to what headers are written before the data?
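
The approach described above (record the writer position just before any footers are written, and estimate only as a rare fallback) could look roughly like the following Java sketch. SizeRecordingWriter and the HYPOTHETICAL_DATA_SIZE key are invented for illustration; they are not part of HBase's StoreFile.Writer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical writer that illustrates the idea only; it is not HBase's StoreFile.Writer.
class SizeRecordingWriter {
    static final String DATA_SIZE_KEY = "HYPOTHETICAL_DATA_SIZE"; // assumed file-info key

    private final Map<String, Long> fileInfo = new HashMap<>();
    private long pos = 0; // bytes written so far, counted by the writer itself

    void append(byte[] block) {
        pos += block.length; // a real writer would also write the bytes out
    }

    // Records the position before trailing index/file-info/trailer bytes are added.
    void close(int footerBytes) {
        fileInfo.put(DATA_SIZE_KEY, pos);
        pos += footerBytes;
    }

    long totalSize() {
        return pos;
    }

    Map<String, Long> fileInfo() {
        return fileInfo;
    }

    // Reader side: prefer the recorded value; fall back to an estimate only when it is absent.
    static long dataSize(Map<String, Long> info, long fileSize, double fallbackFraction) {
        Long recorded = info.get(DATA_SIZE_KEY);
        return recorded != null ? recorded : (long) (fileSize * fallbackFraction);
    }
}
```

Note that in this sketch the recorded value still includes any per-block headers written before the footer, which is the point raised about HFile block headers.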

Using KVs is something that I also think makes much more sense; however, that 
means KVs will have to be used for detecting imbalance in the fixed-count scheme 
(size and KV balance may differ), for splitting out a new stripe for sequential 
data, etc. Do you think that's acceptable?
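
A minimal Java sketch of why the choice matters: with hypothetical per-stripe statistics, two stripes can look balanced by KV count and badly imbalanced by size. StripeStats, the max/min ratio, and the threshold are assumptions for illustration, not the stripe compactor's actual policy.

```java
import java.util.List;

// Hypothetical per-stripe statistics.
final class StripeStats {
    final long kvCount;
    final long dataBytes;

    StripeStats(long kvCount, long dataBytes) {
        this.kvCount = kvCount;
        this.dataBytes = dataBytes;
    }
}

class StripeImbalance {
    // Largest-to-smallest ratio of the chosen metric; 1.0 means perfectly balanced.
    static double ratio(List<StripeStats> stripes, boolean byKvCount) {
        if (stripes.isEmpty()) {
            return 1.0;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (StripeStats s : stripes) {
            long v = byKvCount ? s.kvCount : s.dataBytes;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (min == 0) {
            return max == 0 ? 1.0 : Double.POSITIVE_INFINITY;
        }
        return (double) max / min;
    }

    static boolean isImbalanced(List<StripeStats> stripes, boolean byKvCount, double threshold) {
        return ratio(stripes, byKvCount) > threshold;
    }
}
```

Two stripes with 1000 KVs each but 10 MB and 40 MB of data give a ratio of 1.0 by KV count and 4.0 by size, so the same threshold flags one and not the other.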


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603874#comment-13603874
 ] 

Matteo Bertozzi commented on HBASE-8034:


{quote}Can you please point me at what headers are written before data?{quote}
HFileBlock.Writer.finishBlock() and finishBlockAndWriteHeaderAndData()

{quote}...that means KVs will have to be used for detecting imbalance for 
fixed-count scheme (size and KV balance may be different)/splitting out new 
stripe for sequential data/etc. Do you think it's acceptable?{quote}
Yeah, splitting and some other sorts of tuning can be the first use case.
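
As a sketch of splitting by KV count in a multi-file compactor, the Java below decides where to start a new output file once the current file has reached a target KV count, assuming cuts may only fall between rows. RowBoundarySplitter and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RowBoundarySplitter {
    // rowKvCounts: number of KVs in each row, in sorted row order.
    // Returns the indexes of rows that start a new output file; the first file starts at row 0.
    static List<Integer> fileStarts(List<Integer> rowKvCounts, long kvsPerFile) {
        if (kvsPerFile <= 0) {
            throw new IllegalArgumentException("kvsPerFile must be positive");
        }
        List<Integer> starts = new ArrayList<>();
        long inCurrentFile = 0;
        for (int i = 0; i < rowKvCounts.size(); i++) {
            // Roll over only between rows, once the current file has reached the target.
            if (inCurrentFile >= kvsPerFile) {
                starts.add(i);
                inCurrentFile = 0;
            }
            inCurrentFile += rowKvCounts.get(i);
        }
        return starts;
    }
}
```

Five rows of 3 KVs with a 6-KV target start new files at rows 2 and 4. A single 10-KV row with a 5-KV target stays in one file, so files can overshoot the target; a real compactor would likely also need to respect stripe boundaries.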


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604039#comment-13604039
 ] 

Sergey Shelukhin commented on HBASE-8034:
-

Hmm... I am going to back out this patch for further consideration then.


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604172#comment-13604172
 ] 

Hudson commented on HBASE-8034:
---

Integrated in HBase-TRUNK #3963 (See 
[https://builds.apache.org/job/HBase-TRUNK/3963/])
REVERT HBASE-8034 record on-disk data size for store file and make it 
available during writing (Revision 1457193)

 Result = FAILURE
sershe:
Files:
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java


[jira] [Commented] (HBASE-8034) record on-disk data size for store file and make it available during writing

2013-03-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604181#comment-13604181
 ] 

Hudson commented on HBASE-8034:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #449 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/449/])
REVERT HBASE-8034 record on-disk data size for store file and make it 
available during writing (Revision 1457193)

 Result = FAILURE
sershe:
Files:
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
