[ https://issues.apache.org/jira/browse/TRAFODION-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943284#comment-14943284 ]
Qifan Chen commented on TRAFODION-1271:
---------------------------------------

This problem has been resolved.

> LP Bug: 1464306 - Compiler: ESP colocation with HBase Regions
> --------------------------------------------------------------
>
>                 Key: TRAFODION-1271
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1271
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>            Reporter: Ravisha Neelakanthappa
>            Assignee: Ravisha Neelakanthappa
>            Priority: Critical
>
> There is scope for a performance improvement if ESPs are colocated with the
> HBase regions they access, leveraging the data locality between HBase
> RegionServers and Hadoop DataNodes.
>
> Currently ESPs are assigned to a random node, as shown in the code below:
>
>   // Get the node map for this ESP fragment.
>   NodeMap *nodeMap =
>     (NodeMap *)fragmentDir_->getPartitioningFunction(i)->getNodeMap();
>   for (CollIndex j = 0; j < nodeMap->getNumEntries(); j++) {
>     nodeMap->setNodeNumber(j, ANY_NODE);
>     nodeMap->setClusterNumber(j, 0);
>   }
>
> Because of this assignment, the communication between ESPs and RegionServers
> can cross node boundaries, causing slow performance.
>
> Here is the algorithm used for ESP colocation (a standalone sketch of steps 1,
> 3, and 5b follows at the end of this message):
>
> 1. During startup, create a HashDictionary of NodeName (key) : NodeNumber (value).
> 2. During NATable creation, make a JNI call to get the node (host) names of
>    the table's regions.
> 3. Get the NodeNumber of each NodeName using the HashDictionary.
> 4. Populate the NodeMap with the NodeNumbers from step 3.
> 5. During HBase scan synthesis, a new NodeMap gets created for each context
>    being optimized. Copy the NodeNumbers from the NodeMap stored in the
>    table's partFunc.
>    5a. If there is a 1:1 mapping, do a direct copy.
>    5b. If there is an M:N mapping (where M < N), use the most popular
>        NodeNumber of each partition grouping.
> 6. In the generator, assign ANY_NODE only if the ESP colocation logic is OFF.
>
> Data locality:
> When data is written to HDFS, one copy is written locally, another is written
> to a node in a different rack (if possible), and a third copy is written to
> another node in the same rack. For all practical purposes the two extra
> copies are written to random nodes in the cluster.
>
> In typical HBase setups a RegionServer is co-located with an HDFS DataNode on
> the same physical machine. Thus every write is written locally and then to
> the two nodes mentioned above. As long as the regions are not moved between
> RegionServers there is good data locality: a RegionServer can serve most
> reads just from the local disk (and cache), provided short-circuit reads are
> enabled.
>
> When regions are re-assigned, data locality is lost and the RegionServers in
> question need to request the data over the network from remote DataNodes,
> until the data is rewritten locally (at major compaction time).
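
Below is a minimal, self-contained sketch of steps 1, 3, and 5b of the algorithm
above. It uses std::unordered_map and std::vector in place of the Trafodion
HashDictionary and NodeMap classes, and all function and variable names
(lookupNodeNumber, mostPopularNode, the sample host names) are illustrative
assumptions, not the actual compiler implementation.

// Illustrative sketch only: standard containers stand in for the Trafodion
// HashDictionary/NodeMap classes; names and data here are hypothetical.
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Step 1: node name -> node number, built once at startup.
using NodeDictionary = std::unordered_map<std::string, int>;

// Step 3: resolve a region's host name to a node number;
// return -1 (analogous to ANY_NODE) if the host is not in the dictionary.
int lookupNodeNumber(const NodeDictionary &dict, const std::string &hostName) {
  auto it = dict.find(hostName);
  return (it == dict.end()) ? -1 : it->second;
}

// Step 5b: when M ESPs cover N regions (M < N), assign each ESP the node
// number that occurs most often among the regions in its grouping.
int mostPopularNode(const std::vector<int> &regionNodes) {
  std::unordered_map<int, int> counts;
  for (int n : regionNodes)
    ++counts[n];
  auto best = std::max_element(
      counts.begin(), counts.end(),
      [](const std::pair<const int, int> &a, const std::pair<const int, int> &b) {
        return a.second < b.second;
      });
  return best->first;  // node number with the highest region count
}

int main() {
  // Step 1: cluster node names mapped to node numbers (hypothetical names).
  NodeDictionary nodes = {{"node0", 0}, {"node1", 1}, {"node2", 2}};

  // Host names as the JNI call in step 2 might return them, one per region.
  std::vector<std::string> regionHosts = {"node0", "node1", "node1",
                                          "node2", "node2", "node2"};

  // Steps 3/4: translate host names into node numbers for the table's regions.
  std::vector<int> regionNodes;
  for (const std::string &h : regionHosts)
    regionNodes.push_back(lookupNodeNumber(nodes, h));

  // Step 5b: two ESPs over six regions; pick the most popular node per group.
  std::vector<std::vector<int>> groups = {
      {regionNodes[0], regionNodes[1], regionNodes[2]},   // ESP 0: regions 0-2
      {regionNodes[3], regionNodes[4], regionNodes[5]}};  // ESP 1: regions 3-5
  for (std::size_t i = 0; i < groups.size(); i++)
    std::cout << "ESP " << i << " -> node " << mostPopularNode(groups[i]) << "\n";
  return 0;
}

With this sample data, ESP 0 is placed on node 1 (two of its three regions live
there) and ESP 1 on node 2, mirroring the "most popular NodeNumber of the
partition grouping" rule in step 5b.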