[ https://issues.apache.org/jira/browse/HAWQ-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037427#comment-15037427 ]
ASF GitHub Bot commented on HAWQ-210:
-------------------------------------

Github user yaoj2 commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/155#issuecomment-161540301

    Looks good

> Improve data locality by calculating the insert host.
> -----------------------------------------------------
>
>                 Key: HAWQ-210
>                 URL: https://issues.apache.org/jira/browse/HAWQ-210
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Hubert Zhang
>            Assignee: Hubert Zhang
>
> Currently, data locality is based on a heuristic greedy algorithm.
> For each vseg it first considers contiguous blocks, then non-contiguous
> blocks, and finally non-local blocks.
> However, a file may contain several contiguous blocks while each vseg can
> process only one block because of the average size limit. In this case the
> contiguous blocks are assigned to different vsegs one by one, so they are
> effectively treated as non-contiguous blocks.
> In this improvement, we add contiguity information to help choose the
> right vseg in the non-contiguous block allocation stage. The main idea is
> to go through the blocks of a file and find the host that holds the largest
> number of blocks of this file. We call this host the INSERT HOST. When
> assigning non-contiguous blocks, we prefer the INSERT HOST over other hosts
> when all of them allow a local read.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
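
The INSERT HOST idea from the issue description can be illustrated with a minimal Python sketch. This is not the HAWQ implementation: the function names `find_insert_host` and `pick_host`, the input layout (one list of replica hosts per block), and the tie-breaking rule are assumptions made for illustration only. The per-vseg average size limit mentioned in the issue is not modeled here.

```python
from collections import Counter
from typing import Optional, Sequence


def find_insert_host(block_hosts: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the host that holds replicas of the most blocks of one file.

    block_hosts[i] lists the hosts storing a replica of block i
    (hypothetical input layout, not taken from the HAWQ source).
    """
    counts: Counter = Counter()
    for hosts in block_hosts:
        # Count a host at most once per block, keeping first-seen order.
        counts.update(list(dict.fromkeys(hosts)))
    if not counts:
        return None
    # max() returns the first maximal key, so ties go to the host seen
    # first. This is an assumption; the issue does not specify tie-breaking.
    return max(counts, key=counts.__getitem__)


def pick_host(local_hosts: Sequence[str],
              insert_host: Optional[str]) -> Optional[str]:
    """Among hosts that can read a block locally, prefer the insert host."""
    if not local_hosts:
        return None
    if insert_host in local_hosts:
        return insert_host
    # Fallback choice is arbitrary in this sketch: take the first local host.
    return local_hosts[0]


# Example: three blocks of one file and their replica hosts.
blocks = [["h1", "h2"], ["h2", "h3"], ["h2", "h1"]]
insert_host = find_insert_host(blocks)
print(insert_host)
print(pick_host(["h1", "h2"], insert_host))
print(pick_host(["h3"], insert_host))
```

In the example, `h2` stores a replica of every block, so it becomes the insert host and is preferred whenever it is among the local candidates for a block.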