[ 
https://issues.apache.org/jira/browse/HAWQ-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hubert Zhang updated HAWQ-210:
------------------------------
    Summary: Improve data locality by calculating the insert host.  (was: 
Improve data locality when table size is small)

> Improve data locality by calculating the insert host.
> -----------------------------------------------------
>
>                 Key: HAWQ-210
>                 URL: https://issues.apache.org/jira/browse/HAWQ-210
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Hubert Zhang
>            Assignee: Hubert Zhang
>
> Currently, data locality is based on a heuristic greedy algorithm.
> It first assigns contiguous blocks to a vseg, then non-contiguous blocks, and
> finally non-local blocks.
> However, a file may contain several contiguous blocks while each vseg can
> process only one block because of the average size limit. In this case the
> contiguous blocks are assigned to different vsegs one by one and are
> effectively treated as non-contiguous blocks.
> This improvement adds contiguity information to help choose the right vseg
> in the non-contiguous block allocation stage. The main idea is to go
> through the blocks of a file and find the host that holds the largest number
> of blocks of that file. We call this host the INSERT HOST. When assigning
> non-contiguous blocks, we prefer the INSERT HOST over other hosts when all of
> them allow a local read.
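
The insert-host idea described above can be sketched roughly as follows. This is only an illustrative Python sketch, not the HAWQ implementation: the function names `find_insert_host` and `choose_host`, the input layout (one list of replica hosts per block), and the tie-breaking by host name are all assumptions made for the example.

```python
from collections import Counter


def find_insert_host(block_hosts):
    """Return the host that holds the most blocks of one file.

    block_hosts: hypothetical input, one entry per block of the file,
    each entry listing the hosts that store a replica of that block.
    """
    counts = Counter()
    for hosts in block_hosts:
        # Count each host at most once per block.
        counts.update(set(hosts))
    if not counts:
        return None
    # Highest count wins; ties are broken by host name (an assumption,
    # chosen only to keep the result deterministic).
    return min(counts, key=lambda h: (-counts[h], h))


def choose_host(local_hosts, insert_host):
    """Pick a host for a non-contiguous block.

    Among hosts that can read the block locally, prefer the insert host.
    """
    if insert_host in local_hosts:
        return insert_host
    # Fallback is a placeholder policy, not taken from the issue text.
    return sorted(local_hosts)[0] if local_hosts else None


blocks = [["h1", "h2"], ["h2", "h3"], ["h2", "h1"]]
insert_host = find_insert_host(blocks)
print(insert_host, choose_host({"h1", "h2"}, insert_host))
```

In this sketch the preference only applies when the insert host is itself a local candidate for the block, which matches the "when they are all local read" condition in the description.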



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
