[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897700#comment-13897700
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

Hi, thank you very much for your time.

I need one small change. Its not critical but it will make considerable 
difference in user experience.

My line
LOG.info(MessageFormat.format(Input split length: {0} bytes., 
tSplit.getLength()));
was changed to 
LOG.info(Input split length:  + tSplit.getLength() +  bytes.);
in last code review.

The reason why i used MessageFormat.format is that the length is large number 
and it needs to be printed with thousands separator.

It takes few seconds to read number 
54798765321
How fast can you say if the number represents 5.4 TB or 5.4 GB ?

but if you print it with separators you can correctly read it in a moment:
54,798,765,321

Can we add some formatting consistent with hbase coding standards ? Maybe 
String.format i dont know.

Lukas

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897820#comment-13897820
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

One more think:
There is some versioning in class TableSplit (methods write  read). 
We dont need to increment it ?
(I am just asking)

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898037#comment-13898037
 ] 

Nick Dimiduk commented on HBASE-10413:
--

bq. Can we add some formatting consistent with hbase coding standards ? Maybe 
String.format i dont know.

I agree, this is difficult to read. Usually we use 
[StringUtils#humanReadableInt(long)|http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/StringUtils.html#humanReadableInt(long)].

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898038#comment-13898038
 ] 

Nick Dimiduk commented on HBASE-10413:
--

The TableSplit writable does not persist beyond the life of a mapreduce job. A 
single job will have the same version of the serialization code, so there's no 
versioning to increment.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898065#comment-13898065
 ] 

Ted Yu commented on HBASE-10413:


Integrated addendum to 0.98 and trunk.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, 
 HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, 
 HBASE-10413-6.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898127#comment-13898127
 ] 

Nick Dimiduk commented on HBASE-10413:
--

Thanks Ted.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, 
 HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, 
 HBASE-10413-6.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898263#comment-13898263
 ] 

Hudson commented on HBASE-10413:


SUCCESS: Integrated in HBase-TRUNK #4909 (See 
[https://builds.apache.org/job/HBase-TRUNK/4909/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567232)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, 
 HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, 
 HBASE-10413-6.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898361#comment-13898361
 ] 

Hudson commented on HBASE-10413:


SUCCESS: Integrated in HBase-0.98 #148 (See 
[https://builds.apache.org/job/HBase-0.98/148/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567230)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, 
 HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, 
 HBASE-10413-6.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898382#comment-13898382
 ] 

Hudson commented on HBASE-10413:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #136 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/136/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567230)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, 
 HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, 
 HBASE-10413-6.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898556#comment-13898556
 ] 

Hudson commented on HBASE-10413:


FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #87 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/87/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567232)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 
1566768)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch, 
 HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, 
 HBASE-10413-6.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896437#comment-13896437
 ] 

Hadoop QA commented on HBASE-10413:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627957/HBASE-10413-6.patch
  against trunk revision .
  ATTACHMENT ID: 12627957

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8649//console

This message is automatically generated.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896438#comment-13896438
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

I have removed  setLength() from TableSplit.
Unit tests are green, I would like to resolve this ticket.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896763#comment-13896763
 ] 

Ted Yu commented on HBASE-10413:


[~apurtell]:
Do you want this in 0.98 ?

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896865#comment-13896865
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

It would be great.



 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896900#comment-13896900
 ] 

Nick Dimiduk commented on HBASE-10413:
--

Overall looks good. IMHO, better to have constructors with more arguments and 
support RAII than have setters folks forget to call. Just some nits now:

TableInputFormatBase:
 - HBase code doesn't use format strings (for whatever reason). Please keep it 
consistent and use string concatenation.
 - extra whitespace

TableSplit:
 - comment as with previous version has no context, just omit it.

Nice test in TestRegionSizeCalculator :)

[~enis] anything else here?

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897049#comment-13897049
 ] 

Enis Soztutar commented on HBASE-10413:
---

bq. Enis Soztutar anything else here?
Looks good to me. We can commit after your comments are addressed. 

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897102#comment-13897102
 ] 

Andrew Purtell commented on HBASE-10413:


Sure +1 for 0.98 after the remaining review comments are addressed.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897470#comment-13897470
 ] 

Hudson commented on HBASE-10413:


SUCCESS: Integrated in HBase-TRUNK #4906 (See 
[https://builds.apache.org/job/HBase-TRUNK/4906/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 
1566768)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897496#comment-13897496
 ] 

Hudson commented on HBASE-10413:


FAILURE: Integrated in HBase-0.98 #146 (See 
[https://builds.apache.org/job/HBase-0.98/146/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 
1566770)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897523#comment-13897523
 ] 

Hudson commented on HBASE-10413:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #135 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/135/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 
1566770)
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Fix For: 0.98.1, 0.99.0

 Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-08 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895729#comment-13895729
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

Lets make RegionSizeCalculator @InterfaceAudience.Private. Users are not 
expected to directly call this, right?
 - I am not sure - I have no experience with using this interface 
InterfaceAudience. Lot of developers are using heavily customized 
TableInputFormat. They may want to use this class.  I have changed it to 
Private (Btw: I was told to change it from Private to Public in previous code 
review ).

Instead of TableSplit.setLength(), you can override the ctor. TableSplit acts 
like a immutable data bean like object.
 - It means there will be ctor with 6 parameters. IMO it is too much but if you 
really want me to do it I will.

 On some cases, the regions might split or merge concurrently between getting 
the startEndKeys and asking the regions from cluster. In this case, for that 
range, we might default to 0, but it should be ok I think. We are not just 
estimating the region sizes here.
 - I think its not worth doing - it will be rare and the difference will be 
insignificant most times.


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895767#comment-13895767
 ] 

Hadoop QA commented on HBASE-10413:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627835/HBASE-10413-5.patch
  against trunk revision .
  ATTACHMENT ID: 12627835

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8644//console

This message is automatically generated.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894758#comment-13894758
 ] 

Hadoop QA commented on HBASE-10413:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627643/HBASE-10413-2.patch
  against trunk revision .
  ATTACHMENT ID: 12627643

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  LOG.debug(MessageFormat.format(Region {0} has size {1}, 
regionLoad.getNameAsString(), regionSizeBytes));
+assertEquals((2 * ((long) Integer.MAX_VALUE)) * megabyte, 
calculator.getRegionSize(largeRegion.getBytes()));

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8630//console

This message is automatically generated.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894790#comment-13894790
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

Ad:

+  long regionSizeBytes = (memSize + fileSize) * megaByte;

Does memstore size have to be included ?

I am not sure. What are cons and pros ?

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894799#comment-13894799
 ] 

Ted Yu commented on HBASE-10413:


The patch enhances TableInputFormat which deals with store files, right ?

We don't know when memstore would be flushed.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894845#comment-13894845
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

ok, memstore size removed.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894904#comment-13894904
 ] 

Hadoop QA commented on HBASE-10413:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627660/HBASE-10413-3.patch
  against trunk revision .
  ATTACHMENT ID: 12627660

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8631//console

This message is automatically generated.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894755#comment-13894755
 ] 

Ted Yu commented on HBASE-10413:


{code}
+@InterfaceAudience.Private
+public class RegionSizeCalculator {
{code}
@InterfaceAudience.Public should be used.
{code}
+  RegionSizeCalculator (HTable table, HBaseAdmin admin) throws IOException {
{code}
admin is only used in ctor. Close it in finally block.
{code}
+  long regionSizeBytes = (memSize + fileSize) * megaByte;
{code}
Does memstore size have to be included ?
{code}
+  LOG.debug(MessageFormat.format(Region {0} has size {1}, 
regionLoad.getNameAsString(), regionSizeBytes));
{code}
Wrap long line - limit is 100 char.

License header appears twice in RegionSizeCalculatorTest

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894991#comment-13894991
 ] 

Hadoop QA commented on HBASE-10413:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12627668/HBASE-10413-4.patch
  against trunk revision .
  ATTACHMENT ID: 12627668

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8632//console

This message is automatically generated.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895278#comment-13895278
 ] 

Enis Soztutar commented on HBASE-10413:
---

Thanks Lukas for pursuing this. Patch looks good overall. Just a couple of 
things to address, then we can commit:
 - Lets make RegionSizeCalculator @InterfaceAudience.Private. Users are not 
expected to directly call this, right? 
 - RegionSizeCalculatorTest - TestRegionSizeCalculator. The convention in 
HBase is to start with Test. 
 - Instead of TableSplit.setLength(), you can override the ctor. TableSplit 
acts like a immutable data bean like object. 
 - On some cases, the regions might split or merge concurrently between getting 
the startEndKeys and asking the regions from cluster. In this case, for that 
range, we might default to 0, but it should be ok I think. We are not just 
estimating the region sizes here. 

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec
 Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, 
 HBASE-10413-4.patch, HBASE-10413.patch


 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-04 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890550#comment-13890550
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

Hi,
I know it is hacky. It is my first hbase commit, i was not sure how to do it so 
I asked 3 people and then published first draft as soon as possible. Everybody 
was fine with the solution :( .

The hacky solution is good enough for us - I have already deployed it 
yesterday.  I cant spent much more time on this. I need to close it by tomorrow.

How about this solution? I am not sure if it is the best way - it does not work 
with Scan ranges.

ToDos:
We need to filter regions by table
It would be nice to if we could filter size by column families.


https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R68


   HBaseAdmin admin = new HBaseAdmin(configuration);

ClusterStatus clusterStatus = admin.getClusterStatus();
CollectionServerName servers = clusterStatus.getServers();

for (ServerName serverName: servers) {
  ServerLoad serverLoad = clusterStatus.getLoad(serverName);

  for (Map.Entrybyte[], RegionLoad regionEntry: 
serverLoad.getRegionsLoad().entrySet()) {
byte[] regionId = regionEntry.getKey();
RegionLoad regionLoad = regionEntry.getValue();

long regionSize = 1024 * 1024 * (regionLoad.getMemStoreSizeMB() + 
regionLoad.getStorefileSizeMB());

sizeMap.put(regionId, regionSize);
  }
}

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec

 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-04 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890559#comment-13890559
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

New version with per table region filtering:
https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R76

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec

 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-03 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889593#comment-13889593
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

I made big changes in code.
You can check it and discus it in https://github.com/apache/hbase/pull/8/files .

 I have to write unit tests before making the patch.

- I need help with unit test. Is there some simple unit test helper/utility i 
can use ? I need to create table with some regions and then work with their 
sizes. It should be local, there should be some level of abstraction. 

- I have added configuration option for disabling this feature:
  Is there some policy about new configuration options ? 
  Should i move the configuration key constant to some place ? 
  Should be the feature disabled or enabled by default ?

- Computation of region sizes might be slow. We might need some parallelization.

from mail:
+  public void setLength(long length) {
This method in TableSplit can be package private.

I think that lot of people uses Table Split in their custom Input format. IMHO 
this method should be part of API.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec

 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-02-03 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889745#comment-13889745
 ] 

Enis Soztutar commented on HBASE-10413:
---

Good work, but I am afraid it is hacky to go around the region server and 
access the files directly even for an estimate of the split. The problem is 
that not all the files of the region reside in the same region directory (split 
daughters are referring to parent's files, etc) and we want to encourage 
encapsulating the FS layout in the region / region server layer. 

For a large number of regions, this will also slow down the job submission. A 
better way might be to ask the master about the estimated sizes for the Scan 
range for the input split. The region servers periodically send the a heartbeat 
containing info about their regions (ServerLoad / RegionLoad). The master then 
might answer that request from the latest known sizes. 

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec
Assignee: Lukas Nalezenec

 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-01-31 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887817#comment-13887817
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

first draft:
https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec

 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-01-31 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887920#comment-13887920
 ] 

Nick Dimiduk commented on HBASE-10413:
--

Nice work [~lukas.nalezenec]! I left some comments on your commit in Github. 
Can you attach a patch vs trunk so we can run it through the QA bot? Thanks.


 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec

 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0

2014-01-27 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882737#comment-13882737
 ] 

Lukas Nalezenec commented on HBASE-10413:
-

I talked with guy who worked on this and he said our production issue was 
probably not directly caused by getLength() returning 0. 
Anyway, we are interested in fixing this. 
See updated ticket description.

 Tablesplit.getLength returns 0
 --

 Key: HBASE-10413
 URL: https://issues.apache.org/jira/browse/HBASE-10413
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec

 InputSplits should be sorted by length but TableSplit does not contain real 
 getLength implementation:
   @Override
   public long getLength() {
 // Not clear how to obtain this... seems to be used only for sorting 
 splits
 return 0;
   }
 This is causing us problem with scheduling - we have got jobs that are 
 supposed to finish in limited time but they get often stuck in last mapper 
 working on large region.
 Can we implement this method ? 
 What is the best way ?
 We were thinking about estimating size by size of files on HDFS.
 We would like to get Scanner from TableSplit, use startRow, stopRow and 
 column families to get corresponding region than computing size of HDFS for 
 given region and column family. 
 Update:
 This ticket was about production issue - I talked with guy who worked on this 
 and he said our production issue was probably not directly caused by 
 getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)