[jira] [Commented] (HBASE-14520) Optimize the number of calls for tags creation in bulk load

Ted Yu (JIRA) Mon, 05 Oct 2015 09:17:49 -0700

    [ https://issues.apache.org/jira/browse/HBASE-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943595#comment-14943595 ]


Ted Yu commented on HBASE-14520:
--------------------------------

{code}
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:428)
        at org.apache.hadoop.mapred.TestMultiFileSplit.testReadWrite(TestMultiFileSplit.java:41)
{code}
Some MapReduce unit tests were picked up by the test script.

bq. +1 core tests. The patch passed unit tests in .

> Optimize the number of calls for tags creation in bulk load
> -----------------------------------------------------------
>
>                 Key: HBASE-14520
>                 URL: https://issues.apache.org/jira/browse/HBASE-14520
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Bhupendra Kumar Jain
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14520.patch
>
>
> At present, the TTL and visibility expression are specified once per TSV
> line, i.e. their values, and hence the tags, are the same for all the
> columns in that line. However, the current code creates a new list of tags
> for each cell. Instead, the tags can be created once per line and reused by
> all the cells of that line.
> Assume 1 million rows and 1000 columns. Currently, tag creation happens
> 1M * 1000 = 1 billion times. If the tags are reused, tag creation drops to
> 1M times (i.e. once per TSV line).
> This applies to both the TsvImporterMapper and TextSortReducer logic.
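A minimal Java sketch of the proposed reuse, assuming simplified stand-in types: `Tag`, `Cell`, `createTags`, `perCell` and `perLine` are hypothetical names chosen for illustration, not the HBase API and not the code in HBASE-14520.patch. It counts tag-list creations under both strategies on a scaled-down input (1000 lines of 100 columns each).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TagReuseSketch {
    // Hypothetical stand-in for an HBase tag (not the real HBase Tag API).
    record Tag(String type, String value) {}

    // Hypothetical stand-in for a cell: one column value plus its tags.
    record Cell(String row, String column, String value, List<Tag> tags) {}

    // Counts how many tag lists have been built, to compare the two strategies.
    static long tagCreations = 0;

    // Hypothetical helper: builds the tags for one TSV line from its TTL and
    // visibility expression. The list is unmodifiable so it can be shared safely.
    static List<Tag> createTags(long ttl, String visibilityExpr) {
        tagCreations++;
        List<Tag> tags = new ArrayList<>();
        if (ttl > 0) {
            tags.add(new Tag("TTL", Long.toString(ttl)));
        }
        if (visibilityExpr != null && !visibilityExpr.isEmpty()) {
            tags.add(new Tag("VISIBILITY", visibilityExpr));
        }
        return Collections.unmodifiableList(tags);
    }

    // Current behaviour: a new tag list is built for every cell of the line.
    static List<Cell> perCell(String row, String[] columns, String[] values,
                              long ttl, String visibilityExpr) {
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new Cell(row, columns[i], values[i], createTags(ttl, visibilityExpr)));
        }
        return cells;
    }

    // Proposed behaviour: the tags are built once per line and shared by all its cells.
    static List<Cell> perLine(String row, String[] columns, String[] values,
                              long ttl, String visibilityExpr) {
        List<Tag> lineTags = createTags(ttl, visibilityExpr);
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new Cell(row, columns[i], values[i], lineTags));
        }
        return cells;
    }

    public static void main(String[] args) {
        // Scaled down from 1M rows x 1000 columns so the sketch runs quickly.
        int rows = 1000;
        int cols = 100;
        long ttl = 86400000L; // arbitrary TTL value for illustration
        String[] columns = new String[cols];
        String[] values = new String[cols];
        for (int i = 0; i < cols; i++) {
            columns[i] = "cf:c" + i;
            values[i] = "v" + i;
        }

        tagCreations = 0;
        for (int r = 0; r < rows; r++) {
            perCell("row" + r, columns, values, ttl, "secret");
        }
        long perCellCount = tagCreations;

        tagCreations = 0;
        for (int r = 0; r < rows; r++) {
            perLine("row" + r, columns, values, ttl, "secret");
        }
        long perLineCount = tagCreations;

        System.out.println("per-cell tag creations: " + perCellCount); // 100000
        System.out.println("per-line tag creations: " + perLineCount); // 1000
    }
}
```

The per-cell count grows as rows * columns while the per-line count grows only with rows. Sharing one list across cells is safe here only because the tags are never modified after creation, which the sketch enforces by returning an unmodifiable list.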



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
