[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898556#comment-13898556 ]
Hudson commented on HBASE-10413:
--------------------------------

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #87 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/87/])
HBASE-10413 addendum makes split length readable (tedyu: rev 1567232)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
HBASE-10413 Tablesplit.getLength returns 0 (Lukas Nalezenec) (tedyu: rev 1566768)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSplit.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSizeCalculator.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableSplit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestRegionSizeCalculator.java

> Tablesplit.getLength returns 0
> ------------------------------
>
>                 Key: HBASE-10413
>                 URL: https://issues.apache.org/jira/browse/HBASE-10413
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, mapreduce
>    Affects Versions: 0.96.1.1
>            Reporter: Lukas Nalezenec
>            Assignee: Lukas Nalezenec
>             Fix For: 0.98.1, 0.99.0
>
>         Attachments: 10413-7.patch, 10413.addendum, HBASE-10413-2.patch,
> HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413-5.patch,
> HBASE-10413-6.patch, HBASE-10413.patch
>
>
> InputSplits should be sorted by length, but TableSplit does not contain a real
> getLength implementation:
>
>   @Override
>   public long getLength() {
>     // Not clear how to obtain this... seems to be used only for sorting splits
>     return 0;
>   }
>
> This is causing us problems with scheduling - we have jobs that are
> supposed to finish in limited time, but they often get stuck in the last mapper
> working on a large region.
> Can we implement this method?
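The scheduling argument above can be sketched in plain Java. This is not HBase or Hadoop code: `Split`, `largestFirst` and the region sizes are hypothetical, and the sketch only assumes, as the stub's own comment suggests, that splits are ordered by descending getLength(). When every split reports 0, the comparator sees nothing but ties, so the order carries no size information and a large region can end up being started late.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitOrderDemo {

    // Hypothetical stand-in for an InputSplit: a region name plus its estimated size in bytes.
    public record Split(String region, long length) {}

    // Orders splits largest first, as a length-aware scheduler would.
    public static List<Split> largestFirst(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        // ArrayList.sort is stable, so splits with equal lengths keep their input order.
        sorted.sort(Comparator.comparingLong(Split::length).reversed());
        return sorted;
    }

    static List<String> names(List<Split> splits) {
        return splits.stream().map(Split::region).toList();
    }

    public static void main(String[] args) {
        // Splits that report an estimated size: the 900 MB region is scheduled first.
        List<Split> sized = List.of(
                new Split("region-a", 10L << 20),
                new Split("region-b", 900L << 20),
                new Split("region-c", 50L << 20));
        System.out.println(names(largestFirst(sized))); // [region-b, region-c, region-a]

        // The same regions when getLength() returns 0: every comparison is a tie,
        // so the input order survives and the 900 MB region-b is not started first.
        List<Split> zero = List.of(
                new Split("region-a", 0L),
                new Split("region-b", 0L),
                new Split("region-c", 0L));
        System.out.println(names(largestFirst(zero))); // [region-a, region-b, region-c]
    }
}
```

Starting the biggest split first is the usual way to shorten the tail of a job: the longest-running mapper overlaps with many short ones instead of running alone at the end.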
> What is the best way?
> We were thinking about estimating the size from the size of the files on HDFS.
> We would like to get a Scanner from the TableSplit, use startRow, stopRow and
> column families to find the corresponding region, and then compute the HDFS size
> for the given region and column family.
>
> Update:
> This ticket was about a production issue. I talked with the person who worked on
> this, and he said our production issue was probably not directly caused by
> getLength() returning 0.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)