[ https://issues.apache.org/jira/browse/HAWQ-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hubert Zhang updated HAWQ-210:
------------------------------
    Summary: Improve data locality by calculating the insert host.  (was: Improve data locality when table size is small)

> Improve data locality by calculating the insert host.
> -----------------------------------------------------
>
>                 Key: HAWQ-210
>                 URL: https://issues.apache.org/jira/browse/HAWQ-210
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Hubert Zhang
>            Assignee: Hubert Zhang
>
> Currently, data locality is based on a heuristic greedy algorithm.
> It first assigns contiguous blocks to a vseg, then non-contiguous blocks, and
> finally non-local blocks.
> However, a file may contain several contiguous blocks while each vseg can
> process only one block because of the average size per vseg. In that case the
> contiguous blocks are assigned to different vsegs one by one, and they end up
> being treated as non-contiguous blocks.
> This improvement adds continuity information to help choose the right vseg
> during the non-contiguous block allocation stage. The main idea is to go
> through the blocks of a file and find the host that holds the largest number
> of that file's blocks. We call this host the INSERT HOST. When assigning
> non-contiguous blocks, we prefer the INSERT HOST over other hosts when all of
> them would be local reads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
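
The INSERT HOST idea described in the issue can be illustrated with a minimal Python sketch. This is not the HAWQ implementation: the function names (find_insert_host, choose_host), the input shape (one list of replica hosts per block), and the tie-breaking rule are all assumptions made for illustration only.

```python
from collections import Counter


def find_insert_host(block_hosts):
    """Return the host that holds the most blocks of one file.

    block_hosts is a hypothetical input: one list per block, naming
    the hosts that store a replica of that block.
    """
    counts = Counter()
    for hosts in block_hosts:
        # Count each host at most once per block, even if listed twice.
        counts.update(set(hosts))
    if not counts:
        return None
    # Assumed tie-break: highest count first, then host name, so the
    # result is deterministic.
    return min(counts, key=lambda h: (-counts[h], h))


def choose_host(local_candidates, insert_host):
    """Pick a host for a non-contiguous block.

    All hosts in local_candidates can read the block locally; the
    INSERT HOST is preferred when it is among them.
    """
    if insert_host in local_candidates:
        return insert_host
    # Assumed fallback: first local candidate, or None if there is none.
    return local_candidates[0] if local_candidates else None
```

For example, with blocks stored on [["h1", "h2"], ["h2", "h3"], ["h2", "h1"]], host h2 holds all three blocks and would be the INSERT HOST under these assumptions, so a block that is local to both h1 and h2 would go to h2.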