[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897700#comment-13897700 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

Hi, thank you very much for your time. I need one small change. It is not critical, but it will make a considerable difference in user experience. My line

{code}
LOG.info(MessageFormat.format("Input split length: {0} bytes.", tSplit.getLength()));
{code}

was changed to

{code}
LOG.info("Input split length: " + tSplit.getLength() + " bytes.");
{code}

in the last code review. The reason I used MessageFormat.format is that the length is a large number and needs to be printed with thousands separators. It takes a few seconds to read the number 54798765321. How fast can you say whether it represents 5.4 TB or 5.4 GB? But if you print it with separators, you can read it correctly in a moment: 54,798,765,321. Can we add some formatting consistent with HBase coding standards? Maybe String.format, I don't know.

Lukas

Tablesplit.getLength returns 0
------------------------------

                Key: HBASE-10413
                URL: https://issues.apache.org/jira/browse/HBASE-10413
            Project: HBase
         Issue Type: Bug
         Components: Client, mapreduce
   Affects Versions: 0.96.1.1
           Reporter: Lukas Nalezenec
           Assignee: Lukas Nalezenec
            Fix For: 0.98.1, 0.99.0
        Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch

InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:

{code}
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
{code}

This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size from the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow, and the column families to find the corresponding region, then compute the HDFS size for the given region and column family.

Update: This ticket was about a production issue. I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
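As a sketch of the kind of separator formatting Lukas is asking about (plain JDK APIs; the exact form HBase adopted may differ):

```java
import java.util.Locale;

public class SplitLengthFormat {
    // Insert thousands separators into a byte count so its magnitude is
    // readable at a glance; the "%,d" flag uses the locale's grouping separator.
    static String withSeparators(long bytes) {
        return String.format(Locale.US, "%,d", bytes);
    }

    public static void main(String[] args) {
        // 54798765321 is hard to parse by eye; grouped, it reads instantly.
        System.out.println(withSeparators(54798765321L)); // prints 54,798,765,321
    }
}
```

MessageFormat's `{0}` placeholder groups numbers the same way by default, which is why the original line produced readable output.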
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897820#comment-13897820 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

One more thing: there is some versioning in class TableSplit (the write/read methods). Don't we need to increment it? (I am just asking.)
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898037#comment-13898037 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

bq. Can we add some formatting consistent with HBase coding standards? Maybe String.format, I don't know.

I agree, this is difficult to read. Usually we use [StringUtils#humanReadableInt(long)|http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/StringUtils.html#humanReadableInt(long)].
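Hadoop's StringUtils#humanReadableInt(long) condenses a count into a short string with a binary-prefix suffix. A rough self-contained approximation of that behavior (this is a sketch in the same spirit, not the actual Hadoop source):

```java
import java.text.DecimalFormat;

public class HumanReadable {
    private static final DecimalFormat ONE_DECIMAL = new DecimalFormat("0.#");

    // Approximate a long as a short human-readable string, similar in spirit
    // to org.apache.hadoop.util.StringUtils#humanReadableInt(long): divide by
    // the largest applicable binary unit and append a one-letter suffix.
    static String humanReadable(long number) {
        long abs = Math.abs(number);
        double result = number;
        String suffix = "";
        if (abs >= 1L << 30) { result = number / (double) (1L << 30); suffix = "g"; }
        else if (abs >= 1L << 20) { result = number / (double) (1L << 20); suffix = "m"; }
        else if (abs >= 1L << 10) { result = number / (double) (1L << 10); suffix = "k"; }
        return ONE_DECIMAL.format(result) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(54798765321L)); // prints 51g
    }
}
```

The trade-off versus thousands separators is precision for brevity: "51g" answers the "GB or TB?" question instantly, at the cost of the exact byte count.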
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898038#comment-13898038 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

The TableSplit writable does not persist beyond the life of a mapreduce job. A single job will have the same version of the serialization code, so there's no versioning to increment.
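The versioning being discussed is the common Writable-style pattern of writing a version tag before the payload, so a reader can skip fields the writer did not know about. A generic standalone sketch of that pattern (hypothetical class and version numbers, not HBase's actual TableSplit code):

```java
import java.io.*;

public class VersionedSplit {
    static final int VERSION_WITH_LENGTH = 2; // hypothetical version numbers

    long start;
    long length; // field added in version 2

    void write(DataOutput out) throws IOException {
        out.writeInt(VERSION_WITH_LENGTH); // version tag goes first
        out.writeLong(start);
        out.writeLong(length);
    }

    void readFields(DataInput in) throws IOException {
        int version = in.readInt();
        start = in.readLong();
        // Only consume the new field when the writer was new enough.
        length = version >= VERSION_WITH_LENGTH ? in.readLong() : 0L;
    }

    // Serialize and deserialize through a byte array, as a round-trip check.
    static VersionedSplit copyViaBytes(VersionedSplit s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            s.write(new DataOutputStream(bos));
            VersionedSplit r = new VersionedSplit();
            r.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionedSplit s = new VersionedSplit();
        s.start = 7;
        s.length = 42;
        System.out.println(copyViaBytes(s).length); // prints 42
    }
}
```

Nick's point is that this machinery only matters when serialized bytes can outlive the code that wrote them; within a single job, writer and reader are always the same version.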
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898065#comment-13898065 ]

Ted Yu commented on HBASE-10413:
--------------------------------

Integrated addendum to 0.98 and trunk.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898127#comment-13898127 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

Thanks Ted.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898263#comment-13898263 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-TRUNK #4909 (See [https://builds.apache.org/job/HBase-TRUNK/4909/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567232)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898361#comment-13898361 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-0.98 #148 (See [https://builds.apache.org/job/HBase-0.98/148/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567230)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898382#comment-13898382 ]

Hudson commented on HBASE-10413:
--------------------------------

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #136 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/136/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567230)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898556#comment-13898556 ]

Hudson commented on HBASE-10413:
--------------------------------

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #87 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/87/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567232)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566768)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896437#comment-13896437 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627957/HBASE-10413-6.patch
against trunk revision .

ATTACHMENT ID: 12627957

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896438#comment-13896438 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

I have removed setLength() from TableSplit. Unit tests are green; I would like to resolve this ticket.
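With setLength() gone, the length is fixed at construction time and getLength() simply reports it. A minimal standalone sketch of that shape (a hypothetical simplified class, not the committed TableSplit patch):

```java
// Minimal sketch: an input split that carries a precomputed byte length,
// so the framework can sort splits by size instead of seeing 0 for all of them.
public class SizedSplit {
    private final byte[] startRow;
    private final byte[] endRow;
    private final long length; // estimated bytes in the region, fixed at creation

    public SizedSplit(byte[] startRow, byte[] endRow, long length) {
        this.startRow = startRow;
        this.endRow = endRow;
        this.length = length;
    }

    // Instead of returning a hardcoded 0, report the estimate.
    public long getLength() {
        return length;
    }

    public byte[] getStartRow() { return startRow; }

    public byte[] getEndRow() { return endRow; }
}
```

Making the field constructor-only matches Nick's earlier review point: constructors with more arguments beat setters that callers can forget to invoke.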
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896763#comment-13896763 ]

Ted Yu commented on HBASE-10413:
--------------------------------

[~apurtell]: Do you want this in 0.98?
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896865#comment-13896865 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

It would be great.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896900#comment-13896900 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

Overall looks good. IMHO, better to have constructors with more arguments and support RAII than have setters folks forget to call. Just some nits now:

TableInputFormatBase:
- HBase code doesn't use format strings (for whatever reason). Please keep it consistent and use string concatenation.
- extra whitespace

TableSplit:
- the comment, as with the previous version, has no context; just omit it.

Nice test in TestRegionSizeCalculator :)

[~enis] anything else here?
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897049#comment-13897049 ]

Enis Soztutar commented on HBASE-10413:
---------------------------------------

bq. Enis Soztutar anything else here?

Looks good to me. We can commit after your comments are addressed.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897102#comment-13897102 ]

Andrew Purtell commented on HBASE-10413:
----------------------------------------

Sure, +1 for 0.98 after the remaining review comments are addressed.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897470#comment-13897470 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-TRUNK #4906 (See [https://builds.apache.org/job/HBase-TRUNK/4906/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566768)

* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897496#comment-13897496 ]

Hudson commented on HBASE-10413:
--------------------------------

FAILURE: Integrated in HBase-0.98 #146 (See [https://builds.apache.org/job/HBase-0.98/146/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566770)

* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897523#comment-13897523 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #135 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/135/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566770)

* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895729#comment-13895729 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

bq. Lets make RegionSizeCalculator @InterfaceAudience.Private. Users are not expected to directly call this, right?

I am not sure - I have no experience with the InterfaceAudience annotation. A lot of developers use heavily customized TableInputFormat; they may want to use this class. I have changed it to Private. (Btw: I was told to change it from Private to Public in a previous code review.)

bq. Instead of TableSplit.setLength(), you can override the ctor. TableSplit acts like an immutable data-bean-like object.

That means there will be a ctor with 6 parameters. IMO it is too much, but if you really want me to do it, I will.

bq. In some cases, the regions might split or merge concurrently between getting the startEndKeys and asking the regions from the cluster. In this case, for that range, we might default to 0, but it should be ok I think. We are not just estimating the region sizes here.

I think it is not worth doing - it will be rare, and the difference will be insignificant most of the time.
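A six-argument constructor is indeed hard to read at call sites. One conventional middle ground, sketched here with hypothetical field names rather than the real TableSplit members, is a small builder that names each value at the call site while still producing an immutable object:

```java
// Sketch only: hypothetical field names, not the real TableSplit members.
public class BuilderSketch {

    static final class Split {
        final String table, startRow, stopRow, location;
        final long length;

        private Split(Builder b) {
            this.table = b.table;
            this.startRow = b.startRow;
            this.stopRow = b.stopRow;
            this.location = b.location;
            this.length = b.length;
        }
    }

    // The builder names each argument at the call site, which a bare
    // six-argument constructor cannot do.
    static final class Builder {
        private String table, startRow, stopRow, location;
        private long length;

        Builder table(String v)    { this.table = v; return this; }
        Builder startRow(String v) { this.startRow = v; return this; }
        Builder stopRow(String v)  { this.stopRow = v; return this; }
        Builder location(String v) { this.location = v; return this; }
        Builder length(long v)     { this.length = v; return this; }
        Split build()              { return new Split(this); }
    }

    public static void main(String[] args) {
        Split s = new Builder().table("t1").startRow("aaa").stopRow("zzz")
                               .location("host1").length(2048L).build();
        System.out.println(s.length); // prints 2048
    }
}
```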
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895767#comment-13895767 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627835/HBASE-10413-5.patch
against trunk revision .

ATTACHMENT ID: 12627835

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894758#comment-13894758 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627643/HBASE-10413-2.patch
against trunk revision .

ATTACHMENT ID: 12627643

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
  + LOG.debug(MessageFormat.format("Region {0} has size {1}", regionLoad.getNameAsString(), regionSizeBytes));
  + assertEquals((2 * ((long) Integer.MAX_VALUE)) * megabyte, calculator.getRegionSize(largeRegion.getBytes()));
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894790#comment-13894790 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

Ad:
{code}
+ long regionSizeBytes = (memSize + fileSize) * megaByte;
{code}
bq. Does memstore size have to be included ?

I am not sure. What are the pros and cons?
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894799#comment-13894799 ]

Ted Yu commented on HBASE-10413:
--------------------------------

The patch enhances TableInputFormat, which deals with store files, right? We don't know when the memstore would be flushed.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894845#comment-13894845 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

ok, memstore size removed.
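The estimate the thread converges on can be sketched as follows. RegionLoad-style size reports are in megabytes, and after this exchange only the on-disk storefile size is counted; the method and constant names here are illustrative, not the patch's actual code.

```java
// Sketch of the size estimate under discussion: storefile size only,
// converted from MB (as reported by RegionLoad-style beans) to bytes.
public class RegionSizeSketch {
    static final long MEGABYTE = 1024L * 1024L;

    // Memstore size is deliberately excluded: TableInputFormat scans data
    // that is on disk, and we cannot know when the memstore will be flushed.
    public static long regionSizeBytes(int storefileSizeMB) {
        // MEGABYTE is a long, so the multiplication is done in long
        // arithmetic and cannot overflow int.
        return storefileSizeMB * MEGABYTE;
    }

    public static void main(String[] args) {
        System.out.println(regionSizeBytes(300)); // prints 314572800
    }
}
```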
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894904#comment-13894904 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627660/HBASE-10413-3.patch
against trunk revision .

ATTACHMENT ID: 12627660

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894755#comment-13894755 ]

Ted Yu commented on HBASE-10413:
--------------------------------

{code}
+@InterfaceAudience.Private
+public class RegionSizeCalculator {
{code}
@InterfaceAudience.Public should be used.

{code}
+  RegionSizeCalculator (HTable table, HBaseAdmin admin) throws IOException {
{code}
admin is only used in the ctor. Close it in a finally block.

{code}
+    long regionSizeBytes = (memSize + fileSize) * megaByte;
{code}
Does memstore size have to be included ?

{code}
+    LOG.debug(MessageFormat.format("Region {0} has size {1}", regionLoad.getNameAsString(), regionSizeBytes));
{code}
Wrap the long line - the limit is 100 chars.

The license header appears twice in RegionSizeCalculatorTest.
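Ted's points (admin handle closed in a finally block, sizes cached per region, unknown regions defaulting to 0) can be combined into a hedged skeleton. The Admin interface and FixedAdmin class below are stand-ins for the real HBaseAdmin/ClusterStatus plumbing, not the actual HBase API; the real code would also key regions with a byte-array comparator rather than identity.

```java
import java.util.HashMap;
import java.util.Map;

// Skeleton following the review notes; Admin is a stand-in for the real
// HBaseAdmin/ClusterStatus plumbing, not the actual HBase API.
public class CalculatorSketch {

    interface Admin {
        Map<byte[], Integer> storefileSizesMB(); // region name -> size in MB
        void close();
    }

    // Trivial in-memory Admin used only for demonstration.
    static final class FixedAdmin implements Admin {
        private final Map<byte[], Integer> sizesMB;
        boolean closed = false;
        FixedAdmin(Map<byte[], Integer> sizesMB) { this.sizesMB = sizesMB; }
        public Map<byte[], Integer> storefileSizesMB() { return sizesMB; }
        public void close() { closed = true; }
    }

    static final class RegionSizeCalculator {
        private static final long MEGABYTE = 1024L * 1024L;
        // NOTE: identity-keyed for this sketch; real code would use a
        // byte-array comparator (e.g. a TreeMap over Bytes).
        private final Map<byte[], Long> sizes = new HashMap<>();

        RegionSizeCalculator(Admin admin) {
            try {
                for (Map.Entry<byte[], Integer> e : admin.storefileSizesMB().entrySet()) {
                    // on-disk storefile size only; memstore is excluded per review
                    sizes.put(e.getKey(), e.getValue() * MEGABYTE);
                }
            } finally {
                admin.close(); // per review: admin is only needed during construction
            }
        }

        public long getRegionSize(byte[] regionName) {
            Long size = sizes.get(regionName);
            return size == null ? 0L : size; // unknown regions default to 0
        }
    }

    public static void main(String[] args) {
        byte[] region = "r1".getBytes();
        Map<byte[], Integer> mb = new HashMap<>();
        mb.put(region, 128);
        RegionSizeCalculator calc = new RegionSizeCalculator(new FixedAdmin(mb));
        System.out.println(calc.getRegionSize(region)); // prints 134217728
    }
}
```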
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894991#comment-13894991 ] Hadoop QA commented on HBASE-10413: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627668/HBASE-10413-4.patch against trunk revision . ATTACHMENT ID: 12627668 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//console
This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895278#comment-13895278 ] Enis Soztutar commented on HBASE-10413:
Thanks Lukas for pursuing this. The patch looks good overall. Just a couple of things to address, then we can commit:
- Let's make RegionSizeCalculator @InterfaceAudience.Private. Users are not expected to call this directly, right?
- RegionSizeCalculatorTest - TestRegionSizeCalculator. The convention in HBase is to start with Test.
- Instead of TableSplit.setLength(), you can override the ctor. TableSplit acts like an immutable data-bean-like object.
- In some cases, regions might split or merge concurrently between getting the startEndKeys and asking the cluster for the regions. In that case, for that range, we might default to 0, but it should be OK I think - we are just estimating the region sizes here.
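Enis's ctor-over-setter point can be illustrated without the HBase classes. Below is a hypothetical, simplified stand-in for TableSplit (not the real class) that takes the length through its constructor and stays immutable, together with the descending sort that lets the framework schedule the largest region first instead of leaving it as the last straggler:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an input split with a size: the length is set
// once in the constructor and never mutated, so the object remains an
// immutable data bean, which is the review suggestion.
public class SizedSplit {

    private final byte[] startRow;
    private final byte[] stopRow;
    private final long length;   // estimated bytes covered by this split

    SizedSplit(byte[] startRow, byte[] stopRow, long length) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.length = length;
    }

    public long getLength() {
        return length;
    }

    public static void main(String[] args) {
        // Sort splits by getLength(), largest first, so the biggest region
        // is no longer stuck running last.
        List<SizedSplit> splits = new ArrayList<>();
        splits.add(new SizedSplit(new byte[0], new byte[0], 10L));
        splits.add(new SizedSplit(new byte[0], new byte[0], 500L));
        splits.add(new SizedSplit(new byte[0], new byte[0], 42L));
        splits.sort(Comparator.comparingLong(SizedSplit::getLength).reversed());
        System.out.println(splits.get(0).getLength()); // 500
    }
}
```

With getLength() returning a real estimate instead of 0, this ordering is exactly what the ticket is after.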
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890550#comment-13890550 ] Lukas Nalezenec commented on HBASE-10413:
Hi, I know it is hacky. It is my first HBase commit; I was not sure how to do it, so I asked 3 people and then published the first draft as soon as possible. Everybody was fine with the solution :( . The hacky solution is good enough for us - I already deployed it yesterday. I can't spend much more time on this; I need to close it by tomorrow. How about this solution? I am not sure if it is the best way - it does not work with Scan ranges.
ToDos:
- We need to filter the regions by table.
- It would be nice if we could filter the size by column families.
https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R68
{code}
HBaseAdmin admin = new HBaseAdmin(configuration);
ClusterStatus clusterStatus = admin.getClusterStatus();
Collection<ServerName> servers = clusterStatus.getServers();
for (ServerName serverName : servers) {
  ServerLoad serverLoad = clusterStatus.getLoad(serverName);
  for (Map.Entry<byte[], RegionLoad> regionEntry : serverLoad.getRegionsLoad().entrySet()) {
    byte[] regionId = regionEntry.getKey();
    RegionLoad regionLoad = regionEntry.getValue();
    long regionSize = 1024L * 1024L * (regionLoad.getMemStoreSizeMB() + regionLoad.getStorefileSizeMB());
    sizeMap.put(regionId, regionSize);
  }
}
{code}
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890559#comment-13890559 ] Lukas Nalezenec commented on HBASE-10413:
New version with per-table region filtering: https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R76
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889593#comment-13889593 ] Lukas Nalezenec commented on HBASE-10413:
I made big changes in the code. You can check it and discuss it at https://github.com/apache/hbase/pull/8/files . I have to write unit tests before making the patch.
- I need help with the unit test. Is there some simple unit test helper/utility I can use? I need to create a table with some regions and then work with their sizes. It should be local, and there should be some level of abstraction.
- I have added a configuration option for disabling this feature. Is there some policy about new configuration options? Should I move the configuration key constant to some place? Should the feature be disabled or enabled by default?
- Computation of the region sizes might be slow. We might need some parallelization.
From mail: "+ public void setLength(long length) - This method in TableSplit can be package private." I think that a lot of people use TableSplit in their custom InputFormat. IMHO this method should be part of the API.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889745#comment-13889745 ] Enis Soztutar commented on HBASE-10413:
Good work, but I am afraid it is hacky to go around the region server and access the files directly, even for an estimate of the split size. The problem is that not all the files of a region reside in the same region directory (split daughters refer to the parent's files, etc.), and we want to encourage encapsulating the FS layout in the region / region server layer. For a large number of regions, this will also slow down job submission. A better way might be to ask the master for the estimated sizes of the Scan range for the input split. The region servers periodically send a heartbeat containing info about their regions (ServerLoad / RegionLoad). The master can then answer that request from the latest known sizes.
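As a rough illustration of that idea - purely hypothetical, not the real HBase master API - a cache keyed by region start key could answer range estimates from the latest heartbeat figures:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the approach Enis suggests: keep the latest
// per-region sizes reported in heartbeats and answer size estimates for a
// scan range from that cache. Keys and units are illustrative only.
public class RegionSizeCache {

    // Region start key -> last reported size in bytes, ordered so a scan
    // range maps onto a contiguous sub-view of the map.
    private final TreeMap<String, Long> latestSizes = new TreeMap<>();

    /** Called when a heartbeat reports a region's current size. */
    void onHeartbeat(String regionStartKey, long sizeBytes) {
        latestSizes.put(regionStartKey, sizeBytes);
    }

    /** Estimate bytes covered by [startKey, stopKey) from cached loads. */
    long estimateRange(String startKey, String stopKey) {
        SortedMap<String, Long> range = latestSizes.subMap(startKey, stopKey);
        long total = 0;
        for (long size : range.values()) {
            total += size;  // regions that split or merged since the last
        }                   // heartbeat simply contribute stale numbers
        return total;
    }

    public static void main(String[] args) {
        RegionSizeCache cache = new RegionSizeCache();
        cache.onHeartbeat("a", 100L);
        cache.onHeartbeat("m", 250L);
        cache.onHeartbeat("t", 50L);
        System.out.println(cache.estimateRange("a", "t")); // 350
    }
}
```

Staleness is tolerable here for the same reason Enis gives on the other thread branch: the numbers only feed split sorting, so a slightly out-of-date estimate is still far better than 0.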
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887817#comment-13887817 ] Lukas Nalezenec commented on HBASE-10413:
First draft: https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887920#comment-13887920 ] Nick Dimiduk commented on HBASE-10413:
Nice work [~lukas.nalezenec]! I left some comments on your commit in GitHub. Can you attach a patch vs trunk so we can run it through the QA bot? Thanks.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882737#comment-13882737 ] Lukas Nalezenec commented on HBASE-10413:
I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. Anyway, we are interested in fixing this. See the updated ticket description.