[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897700#comment-13897700 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

Hi, thank you very much for your time. I need one small change. It is not critical, but it will make a considerable difference in user experience. My line

{code}
LOG.info(MessageFormat.format("Input split length: {0} bytes.", tSplit.getLength()));
{code}

was changed to

{code}
LOG.info("Input split length: " + tSplit.getLength() + " bytes.");
{code}

in the last code review. The reason I used MessageFormat.format is that the length is a large number and needs to be printed with thousands separators. It takes a few seconds to read the number 54798765321. How fast can you say whether it represents 5.4 TB or 5.4 GB? But if you print it with separators, you can read it correctly in a moment: 54,798,765,321. Can we add some formatting consistent with HBase coding standards? Maybe String.format, I don't know.

Lukas

Tablesplit.getLength returns 0
------------------------------

                Key: HBASE-10413
                URL: https://issues.apache.org/jira/browse/HBASE-10413
            Project: HBase
         Issue Type: Bug
         Components: Client, mapreduce
   Affects Versions: 0.96.1.1
           Reporter: Lukas Nalezenec
           Assignee: Lukas Nalezenec
            Fix For: 0.98.1, 0.99.0
        Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, HBASE-10413.patch

InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:

{code}
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
{code}

This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size from the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow, and the column families to find the corresponding region, then compute the HDFS size for the given region and column family.

Update: This ticket was about a production issue. I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
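As a sketch of the kind of separator formatting Lukas is asking about (plain JDK APIs; the exact form HBase adopted may differ):

```java
import java.util.Locale;

public class SplitLengthFormat {
    // Insert thousands separators into a byte count so its magnitude is
    // readable at a glance; the "%,d" flag uses the locale's grouping separator.
    static String withSeparators(long bytes) {
        return String.format(Locale.US, "%,d", bytes);
    }

    public static void main(String[] args) {
        // 54798765321 is hard to parse by eye; grouped, it reads instantly.
        System.out.println(withSeparators(54798765321L)); // prints 54,798,765,321
    }
}
```

MessageFormat's `{0}` placeholder groups numbers the same way by default, which is why the original line produced readable output.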
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897820#comment-13897820 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

One more thing: there is some versioning in class TableSplit (the write/read methods). Don't we need to increment it? (I am just asking.)
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898037#comment-13898037 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

bq. Can we add some formatting consistent with HBase coding standards? Maybe String.format, I don't know.

I agree, this is difficult to read. Usually we use [StringUtils#humanReadableInt(long)|http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/StringUtils.html#humanReadableInt(long)].
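Hadoop's StringUtils#humanReadableInt(long) condenses a count into a short string with a binary-prefix suffix. A rough self-contained approximation of that behavior (this is a sketch in the same spirit, not the actual Hadoop source):

```java
import java.text.DecimalFormat;

public class HumanReadable {
    private static final DecimalFormat ONE_DECIMAL = new DecimalFormat("0.#");

    // Approximate a long as a short human-readable string, similar in spirit
    // to org.apache.hadoop.util.StringUtils#humanReadableInt(long): divide by
    // the largest applicable binary unit and append a one-letter suffix.
    static String humanReadable(long number) {
        long abs = Math.abs(number);
        double result = number;
        String suffix = "";
        if (abs >= 1L << 30) { result = number / (double) (1L << 30); suffix = "g"; }
        else if (abs >= 1L << 20) { result = number / (double) (1L << 20); suffix = "m"; }
        else if (abs >= 1L << 10) { result = number / (double) (1L << 10); suffix = "k"; }
        return ONE_DECIMAL.format(result) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(54798765321L)); // prints 51g
    }
}
```

The trade-off versus thousands separators is precision for brevity: "51g" answers the "GB or TB?" question instantly, at the cost of the exact byte count.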
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898038#comment-13898038 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

The TableSplit writable does not persist beyond the life of a mapreduce job. A single job will have the same version of the serialization code, so there's no versioning to increment.
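The versioning being discussed is the common Writable-style pattern of writing a version tag before the payload, so a reader can skip fields the writer did not know about. A generic standalone sketch of that pattern (hypothetical class and version numbers, not HBase's actual TableSplit code):

```java
import java.io.*;

public class VersionedSplit {
    static final int VERSION_WITH_LENGTH = 2; // hypothetical version numbers

    long start;
    long length; // field added in version 2

    void write(DataOutput out) throws IOException {
        out.writeInt(VERSION_WITH_LENGTH); // version tag goes first
        out.writeLong(start);
        out.writeLong(length);
    }

    void readFields(DataInput in) throws IOException {
        int version = in.readInt();
        start = in.readLong();
        // Only consume the new field when the writer was new enough.
        length = version >= VERSION_WITH_LENGTH ? in.readLong() : 0L;
    }

    // Serialize and deserialize through a byte array, as a round-trip check.
    static VersionedSplit copyViaBytes(VersionedSplit s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            s.write(new DataOutputStream(bos));
            VersionedSplit r = new VersionedSplit();
            r.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionedSplit s = new VersionedSplit();
        s.start = 7;
        s.length = 42;
        System.out.println(copyViaBytes(s).length); // prints 42
    }
}
```

Nick's point is that this machinery only matters when serialized bytes can outlive the code that wrote them; within a single job, writer and reader are always the same version.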
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898065#comment-13898065 ]

Ted Yu commented on HBASE-10413:
--------------------------------

Integrated addendum to 0.98 and trunk.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898127#comment-13898127 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

Thanks Ted.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898263#comment-13898263 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-TRUNK #4909 (See [https://builds.apache.org/job/HBase-TRUNK/4909/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567232)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898361#comment-13898361 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-0.98 #148 (See [https://builds.apache.org/job/HBase-0.98/148/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567230)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898382#comment-13898382 ]

Hudson commented on HBASE-10413:
--------------------------------

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #136 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/136/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567230)
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898556#comment-13898556 ]

Hudson commented on HBASE-10413:
--------------------------------

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #87 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/87/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567232)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566768)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896437#comment-13896437 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627957/HBASE-10413-6.patch
against trunk revision .

ATTACHMENT ID: 12627957

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8649//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896438#comment-13896438 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

I have removed setLength() from TableSplit. Unit tests are green; I would like to resolve this ticket.
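With setLength() gone, the length is fixed at construction time and getLength() simply reports it. A minimal standalone sketch of that shape (a hypothetical simplified class, not the committed TableSplit patch):

```java
// Minimal sketch: an input split that carries a precomputed byte length,
// so the framework can sort splits by size instead of seeing 0 for all of them.
public class SizedSplit {
    private final byte[] startRow;
    private final byte[] endRow;
    private final long length; // estimated bytes in the region, fixed at creation

    public SizedSplit(byte[] startRow, byte[] endRow, long length) {
        this.startRow = startRow;
        this.endRow = endRow;
        this.length = length;
    }

    // Instead of returning a hardcoded 0, report the estimate.
    public long getLength() {
        return length;
    }

    public byte[] getStartRow() { return startRow; }

    public byte[] getEndRow() { return endRow; }
}
```

Making the field constructor-only matches Nick's earlier review point: constructors with more arguments beat setters that callers can forget to invoke.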
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896763#comment-13896763 ]

Ted Yu commented on HBASE-10413:
--------------------------------

[~apurtell]: Do you want this in 0.98?
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896865#comment-13896865 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

It would be great.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896900#comment-13896900 ]

Nick Dimiduk commented on HBASE-10413:
--------------------------------------

Overall looks good. IMHO, better to have constructors with more arguments and support RAII than have setters folks forget to call. Just some nits now:

TableInputFormatBase:
- HBase code doesn't use format strings (for whatever reason). Please keep it consistent and use string concatenation.
- extra whitespace

TableSplit:
- the comment, as with the previous version, has no context; just omit it.

Nice test in TestRegionSizeCalculator :)

[~enis] anything else here?
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897049#comment-13897049 ]

Enis Soztutar commented on HBASE-10413:
---------------------------------------

bq. Enis Soztutar anything else here?

Looks good to me. We can commit after your comments are addressed.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897102#comment-13897102 ]

Andrew Purtell commented on HBASE-10413:
----------------------------------------

Sure, +1 for 0.98 after the remaining review comments are addressed.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897470#comment-13897470 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-TRUNK #4906 (See [https://builds.apache.org/job/HBase-TRUNK/4906/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566768)

* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897496#comment-13897496 ]

Hudson commented on HBASE-10413:
--------------------------------

FAILURE: Integrated in HBase-0.98 #146 (See [https://builds.apache.org/job/HBase-0.98/146/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566770)

* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897523#comment-13897523 ]

Hudson commented on HBASE-10413:
--------------------------------

SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #135 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/135/])
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566770)

* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895729#comment-13895729 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

bq. Lets make RegionSizeCalculator @InterfaceAudience.Private. Users are not expected to directly call this, right?

I am not sure - I have no experience with the InterfaceAudience annotation. A lot of developers use heavily customized TableInputFormat; they may want to use this class. I have changed it to Private. (Btw: I was told to change it from Private to Public in a previous code review.)

bq. Instead of TableSplit.setLength(), you can override the ctor. TableSplit acts like an immutable data-bean-like object.

That means there will be a ctor with 6 parameters. IMO it is too much, but if you really want me to do it, I will.

bq. In some cases, the regions might split or merge concurrently between getting the startEndKeys and asking the regions from the cluster. In this case, for that range, we might default to 0, but it should be ok I think. We are not just estimating the region sizes here.

I think it is not worth doing - it will be rare, and the difference will be insignificant most of the time.
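A six-argument constructor is indeed hard to read at call sites. One conventional middle ground, sketched here with hypothetical field names rather than the real TableSplit members, is a small builder that names each value at the call site while still producing an immutable object:

```java
// Sketch only: hypothetical field names, not the real TableSplit members.
public class BuilderSketch {

    static final class Split {
        final String table, startRow, stopRow, location;
        final long length;

        private Split(Builder b) {
            this.table = b.table;
            this.startRow = b.startRow;
            this.stopRow = b.stopRow;
            this.location = b.location;
            this.length = b.length;
        }
    }

    // The builder names each argument at the call site, which a bare
    // six-argument constructor cannot do.
    static final class Builder {
        private String table, startRow, stopRow, location;
        private long length;

        Builder table(String v)    { this.table = v; return this; }
        Builder startRow(String v) { this.startRow = v; return this; }
        Builder stopRow(String v)  { this.stopRow = v; return this; }
        Builder location(String v) { this.location = v; return this; }
        Builder length(long v)     { this.length = v; return this; }
        Split build()              { return new Split(this); }
    }

    public static void main(String[] args) {
        Split s = new Builder().table("t1").startRow("aaa").stopRow("zzz")
                               .location("host1").length(2048L).build();
        System.out.println(s.length); // prints 2048
    }
}
```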
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895767#comment-13895767 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627835/HBASE-10413-5.patch
against trunk revision .

ATTACHMENT ID: 12627835

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8644//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894758#comment-13894758 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627643/HBASE-10413-2.patch
against trunk revision .

ATTACHMENT ID: 12627643

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
  + LOG.debug(MessageFormat.format("Region {0} has size {1}", regionLoad.getNameAsString(), regionSizeBytes));
  + assertEquals((2 * ((long) Integer.MAX_VALUE)) * megabyte, calculator.getRegionSize(largeRegion.getBytes()));
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894790#comment-13894790 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

Ad:
{code}
+ long regionSizeBytes = (memSize + fileSize) * megaByte;
{code}
bq. Does memstore size have to be included ?

I am not sure. What are the pros and cons?
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894799#comment-13894799 ]

Ted Yu commented on HBASE-10413:
--------------------------------

The patch enhances TableInputFormat, which deals with store files, right? We don't know when the memstore would be flushed.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894845#comment-13894845 ]

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

ok, memstore size removed.
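The estimate the thread converges on can be sketched as follows. RegionLoad-style size reports are in megabytes, and after this exchange only the on-disk storefile size is counted; the method and constant names here are illustrative, not the patch's actual code.

```java
// Sketch of the size estimate under discussion: storefile size only,
// converted from MB (as reported by RegionLoad-style beans) to bytes.
public class RegionSizeSketch {
    static final long MEGABYTE = 1024L * 1024L;

    // Memstore size is deliberately excluded: TableInputFormat scans data
    // that is on disk, and we cannot know when the memstore will be flushed.
    public static long regionSizeBytes(int storefileSizeMB) {
        // MEGABYTE is a long, so the multiplication is done in long
        // arithmetic and cannot overflow int.
        return storefileSizeMB * MEGABYTE;
    }

    public static void main(String[] args) {
        System.out.println(regionSizeBytes(300)); // prints 314572800
    }
}
```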
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894904#comment-13894904 ]

Hadoop QA commented on HBASE-10413:
-----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627660/HBASE-10413-3.patch
against trunk revision .

ATTACHMENT ID: 12627660

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
  org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894755#comment-13894755 ]

Ted Yu commented on HBASE-10413:
--------------------------------

{code}
+@InterfaceAudience.Private
+public class RegionSizeCalculator {
{code}
@InterfaceAudience.Public should be used.

{code}
+  RegionSizeCalculator (HTable table, HBaseAdmin admin) throws IOException {
{code}
admin is only used in the ctor. Close it in a finally block.

{code}
+    long regionSizeBytes = (memSize + fileSize) * megaByte;
{code}
Does memstore size have to be included ?

{code}
+    LOG.debug(MessageFormat.format("Region {0} has size {1}", regionLoad.getNameAsString(), regionSizeBytes));
{code}
Wrap the long line - the limit is 100 chars.

The license header appears twice in RegionSizeCalculatorTest.
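Ted's points (admin handle closed in a finally block, sizes cached per region, unknown regions defaulting to 0) can be combined into a hedged skeleton. The Admin interface and FixedAdmin class below are stand-ins for the real HBaseAdmin/ClusterStatus plumbing, not the actual HBase API; the real code would also key regions with a byte-array comparator rather than identity.

```java
import java.util.HashMap;
import java.util.Map;

// Skeleton following the review notes; Admin is a stand-in for the real
// HBaseAdmin/ClusterStatus plumbing, not the actual HBase API.
public class CalculatorSketch {

    interface Admin {
        Map<byte[], Integer> storefileSizesMB(); // region name -> size in MB
        void close();
    }

    // Trivial in-memory Admin used only for demonstration.
    static final class FixedAdmin implements Admin {
        private final Map<byte[], Integer> sizesMB;
        boolean closed = false;
        FixedAdmin(Map<byte[], Integer> sizesMB) { this.sizesMB = sizesMB; }
        public Map<byte[], Integer> storefileSizesMB() { return sizesMB; }
        public void close() { closed = true; }
    }

    static final class RegionSizeCalculator {
        private static final long MEGABYTE = 1024L * 1024L;
        // NOTE: identity-keyed for this sketch; real code would use a
        // byte-array comparator (e.g. a TreeMap over Bytes).
        private final Map<byte[], Long> sizes = new HashMap<>();

        RegionSizeCalculator(Admin admin) {
            try {
                for (Map.Entry<byte[], Integer> e : admin.storefileSizesMB().entrySet()) {
                    // on-disk storefile size only; memstore is excluded per review
                    sizes.put(e.getKey(), e.getValue() * MEGABYTE);
                }
            } finally {
                admin.close(); // per review: admin is only needed during construction
            }
        }

        public long getRegionSize(byte[] regionName) {
            Long size = sizes.get(regionName);
            return size == null ? 0L : size; // unknown regions default to 0
        }
    }

    public static void main(String[] args) {
        byte[] region = "r1".getBytes();
        Map<byte[], Integer> mb = new HashMap<>();
        mb.put(region, 128);
        RegionSizeCalculator calc = new RegionSizeCalculator(new FixedAdmin(mb));
        System.out.println(calc.getRegionSize(region)); // prints 134217728
    }
}
```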
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894991#comment-13894991 ] Hadoop QA commented on HBASE-10413: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627668/HBASE-10413-4.patch against trunk revision . ATTACHMENT ID: 12627668 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//console
This message is automatically generated.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895278#comment-13895278 ] Enis Soztutar commented on HBASE-10413:
Thanks Lukas for pursuing this. The patch looks good overall. Just a couple of things to address, then we can commit:
- Let's make RegionSizeCalculator @InterfaceAudience.Private. Users are not expected to call this directly, right?
- RegionSizeCalculatorTest - TestRegionSizeCalculator. The convention in HBase is to start with Test.
- Instead of TableSplit.setLength(), you can override the ctor. TableSplit acts like an immutable data-bean-like object.
- In some cases, regions might split or merge concurrently between getting the startEndKeys and asking the cluster for the regions. In that case, for that range, we might default to 0, but it should be OK I think - we are just estimating the region sizes here.
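Enis's ctor-over-setter point can be illustrated without the HBase classes. Below is a hypothetical, simplified stand-in for TableSplit (not the real class) that takes the length through its constructor and stays immutable, together with the descending sort that lets the framework schedule the largest region first instead of leaving it as the last straggler:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an input split with a size: the length is set
// once in the constructor and never mutated, so the object remains an
// immutable data bean, which is the review suggestion.
public class SizedSplit {

    private final byte[] startRow;
    private final byte[] stopRow;
    private final long length;   // estimated bytes covered by this split

    SizedSplit(byte[] startRow, byte[] stopRow, long length) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.length = length;
    }

    public long getLength() {
        return length;
    }

    public static void main(String[] args) {
        // Sort splits by getLength(), largest first, so the biggest region
        // is no longer stuck running last.
        List<SizedSplit> splits = new ArrayList<>();
        splits.add(new SizedSplit(new byte[0], new byte[0], 10L));
        splits.add(new SizedSplit(new byte[0], new byte[0], 500L));
        splits.add(new SizedSplit(new byte[0], new byte[0], 42L));
        splits.sort(Comparator.comparingLong(SizedSplit::getLength).reversed());
        System.out.println(splits.get(0).getLength()); // 500
    }
}
```

With getLength() returning a real estimate instead of 0, this ordering is exactly what the ticket is after.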
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890550#comment-13890550 ] Lukas Nalezenec commented on HBASE-10413:
Hi, I know it is hacky. It is my first HBase commit; I was not sure how to do it, so I asked 3 people and then published the first draft as soon as possible. Everybody was fine with the solution :( . The hacky solution is good enough for us - I already deployed it yesterday. I can't spend much more time on this; I need to close it by tomorrow. How about this solution? I am not sure if it is the best way - it does not work with Scan ranges.
ToDos:
- We need to filter the regions by table.
- It would be nice if we could filter the size by column families.
https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R68
{code}
HBaseAdmin admin = new HBaseAdmin(configuration);
ClusterStatus clusterStatus = admin.getClusterStatus();
Collection<ServerName> servers = clusterStatus.getServers();
for (ServerName serverName : servers) {
  ServerLoad serverLoad = clusterStatus.getLoad(serverName);
  for (Map.Entry<byte[], RegionLoad> regionEntry : serverLoad.getRegionsLoad().entrySet()) {
    byte[] regionId = regionEntry.getKey();
    RegionLoad regionLoad = regionEntry.getValue();
    long regionSize = 1024L * 1024L * (regionLoad.getMemStoreSizeMB() + regionLoad.getStorefileSizeMB());
    sizeMap.put(regionId, regionSize);
  }
}
{code}
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890559#comment-13890559 ] Lukas Nalezenec commented on HBASE-10413:
New version with per-table region filtering: https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R76
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889593#comment-13889593 ] Lukas Nalezenec commented on HBASE-10413:
I made big changes in the code. You can check it and discuss it at https://github.com/apache/hbase/pull/8/files . I have to write unit tests before making the patch.
- I need help with the unit test. Is there some simple unit test helper/utility I can use? I need to create a table with some regions and then work with their sizes. It should be local, and there should be some level of abstraction.
- I have added a configuration option for disabling this feature. Is there some policy about new configuration options? Should I move the configuration key constant to some place? Should the feature be disabled or enabled by default?
- Computation of the region sizes might be slow. We might need some parallelization.
From mail: "+ public void setLength(long length) - This method in TableSplit can be package private." I think that a lot of people use TableSplit in their custom InputFormat. IMHO this method should be part of the API.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889745#comment-13889745 ] Enis Soztutar commented on HBASE-10413:
Good work, but I am afraid it is hacky to go around the region server and access the files directly, even for an estimate of the split size. The problem is that not all the files of a region reside in the same region directory (split daughters refer to the parent's files, etc.), and we want to encourage encapsulating the FS layout in the region / region server layer. For a large number of regions, this will also slow down job submission. A better way might be to ask the master for the estimated sizes of the Scan range for the input split. The region servers periodically send a heartbeat containing info about their regions (ServerLoad / RegionLoad). The master can then answer that request from the latest known sizes.
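As a rough illustration of that idea - purely hypothetical, not the real HBase master API - a cache keyed by region start key could answer range estimates from the latest heartbeat figures:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the approach Enis suggests: keep the latest
// per-region sizes reported in heartbeats and answer size estimates for a
// scan range from that cache. Keys and units are illustrative only.
public class RegionSizeCache {

    // Region start key -> last reported size in bytes, ordered so a scan
    // range maps onto a contiguous sub-view of the map.
    private final TreeMap<String, Long> latestSizes = new TreeMap<>();

    /** Called when a heartbeat reports a region's current size. */
    void onHeartbeat(String regionStartKey, long sizeBytes) {
        latestSizes.put(regionStartKey, sizeBytes);
    }

    /** Estimate bytes covered by [startKey, stopKey) from cached loads. */
    long estimateRange(String startKey, String stopKey) {
        SortedMap<String, Long> range = latestSizes.subMap(startKey, stopKey);
        long total = 0;
        for (long size : range.values()) {
            total += size;  // regions that split or merged since the last
        }                   // heartbeat simply contribute stale numbers
        return total;
    }

    public static void main(String[] args) {
        RegionSizeCache cache = new RegionSizeCache();
        cache.onHeartbeat("a", 100L);
        cache.onHeartbeat("m", 250L);
        cache.onHeartbeat("t", 50L);
        System.out.println(cache.estimateRange("a", "t")); // 350
    }
}
```

Staleness is tolerable here for the same reason Enis gives on the other thread branch: the numbers only feed split sorting, so a slightly out-of-date estimate is still far better than 0.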
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887817#comment-13887817 ] Lukas Nalezenec commented on HBASE-10413:
First draft: https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887920#comment-13887920 ] Nick Dimiduk commented on HBASE-10413:
Nice work [~lukas.nalezenec]! I left some comments on your commit in GitHub. Can you attach a patch vs trunk so we can run it through the QA bot? Thanks.
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882737#comment-13882737 ] Lukas Nalezenec commented on HBASE-10413:
I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. Anyway, we are interested in fixing this. See the updated ticket description.