[ https://issues.apache.org/jira/browse/HAWQ-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037427#comment-15037427 ]

ASF GitHub Bot commented on HAWQ-210:
-------------------------------------

Github user yaoj2 commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/155#issuecomment-161540301
  
    Looks good


> Improve data locality by calculating the insert host.
> -----------------------------------------------------
>
>                 Key: HAWQ-210
>                 URL: https://issues.apache.org/jira/browse/HAWQ-210
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Hubert Zhang
>            Assignee: Hubert Zhang
>
> Currently, data locality is based on a heuristic greedy algorithm: it
> first considers contiguous blocks for a vseg, then non-contiguous blocks,
> and finally non-local blocks.
> However, when a file contains several contiguous blocks but each vseg can
> process only one block because of the average size, the contiguous blocks
> are assigned to different vsegs one by one and are effectively treated as
> non-contiguous blocks.
> In this improvement, we add continuity information to help choose the
> right vseg in the non-contiguous block allocation stage. The main idea is
> to go through the blocks in a file and find the host that holds the largest
> number of blocks of that file. We call this host the INSERT HOST. When
> assigning non-contiguous blocks, we prefer the INSERT HOST over other hosts
> when all of them are local reads.
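
The INSERT HOST idea described above can be sketched roughly as follows. This is only an illustration and not the HAWQ implementation: the function names, the list-of-host-lists input, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter

def find_insert_host(block_hosts):
    """Return the host holding the most blocks of one file (hypothetical helper).

    block_hosts[i] lists the hosts that store a replica of block i.
    """
    counts = Counter()
    for hosts in block_hosts:
        # Count each host at most once per block.
        counts.update(set(hosts))
    if not counts:
        return None
    # Assumed tie-break: highest count first, then host name for determinism.
    return min(counts, key=lambda h: (-counts[h], h))

def pick_host(local_hosts, insert_host):
    """Pick a host for a non-contiguous block (hypothetical helper).

    All candidates in local_hosts are local reads; prefer the insert host
    when it is among them, otherwise fall back to the first candidate.
    """
    if insert_host in local_hosts:
        return insert_host
    return local_hosts[0] if local_hosts else None

# Example file with four blocks and their replica locations.
blocks = [["h1", "h2"], ["h1", "h3"], ["h2", "h1"], ["h3", "h4"]]
insert_host = find_insert_host(blocks)
```

Here h1 holds three of the four blocks, so it would be preferred whenever it is one of the local candidates for a non-contiguous block.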



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
