[ 
https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897700#comment-13897700
 ] 

Lukas Nalezenec commented on HBASE-10413:
-----------------------------------------

Hi, thank you very much for your time.

I need one small change. It's not critical, but it will make a considerable 
difference in user experience.

My line
LOG.info(MessageFormat.format("Input split length: {0} bytes.", 
tSplit.getLength()));
was changed to 
LOG.info("Input split length: " + tSplit.getLength() + " bytes.");
in the last code review.

The reason I used MessageFormat.format is that the length is a large number 
and needs to be printed with thousands separators.

It takes a few seconds to read the number 
54798765321
How quickly can you tell whether it represents 5.4 TB or 5.4 GB?

but if you print it with separators, you can read it correctly at a glance:
54,798,765,321

Can we add some formatting consistent with the HBase coding standards? Maybe 
String.format.... I don't know.
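For illustration, a minimal sketch of both options (the class name and sample value are mine, not from the patch). MessageFormat.format applies the default locale's number grouping automatically, and String.format can do the same with the ',' flag:

```java
import java.text.MessageFormat;

public class SplitLengthFormat {
    public static void main(String[] args) {
        long length = 54798765321L;

        // MessageFormat formats numbers with the default locale's
        // grouping separators, e.g. "54,798,765,321" under an English locale.
        System.out.println(MessageFormat.format(
            "Input split length: {0} bytes.", length));

        // String.format with the ',' flag applies the same locale-specific
        // grouping to integer conversions.
        System.out.println(String.format(
            "Input split length: %,d bytes.", length));
    }
}
```

Either form keeps the log line readable; plain string concatenation drops the separators entirely.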

Lukas

> Tablesplit.getLength returns 0
> ------------------------------
>
>                 Key: HBASE-10413
>                 URL: https://issues.apache.org/jira/browse/HBASE-10413
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, mapreduce
>    Affects Versions: 0.96.1.1
>            Reporter: Lukas Nalezenec
>            Assignee: Lukas Nalezenec
>             Fix For: 0.98.1, 0.99.0
>
>         Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch, 
> HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch, 
> HBASE-10413.patch
>
>
> InputSplits should be sorted by length but TableSplit does not contain real 
> getLength implementation:
>   @Override
>   public long getLength() {
>     // Not clear how to obtain this... seems to be used only for sorting 
> splits
>     return 0;
>   }
> This is causing us problems with scheduling - we have jobs that are 
> supposed to finish in a limited time, but they often get stuck in the last 
> mapper working on a large region.
> Can we implement this method? 
> What is the best way?
> We were thinking about estimating the size from the size of the files on HDFS.
> We would like to get a Scanner from the TableSplit, use startRow, stopRow, and 
> column families to find the corresponding region, then compute the HDFS size for 
> that region and column family. 
> Update:
> This ticket was about a production issue - I talked with the guy who worked on 
> this, and he said our production issue was probably not directly caused by 
> getLength() returning 0. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
