[ https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457151#comment-16457151 ]
Andrew Purtell commented on PHOENIX-4704:
-----------------------------------------

Even a uniform split into a few regions would be an improvement? (And subsequent organic splitting would cause region boundaries to move toward the ideal.)

> Presplit index tables when building asynchronously
> --------------------------------------------------
>
>                 Key: PHOENIX-4704
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4704
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Vincent Poon
>            Priority: Major
>
> For large data tables with many regions, if we build the index asynchronously
> using the IndexTool, the index table will initially face a hotspot as all data
> region mappers attempt to write to the sole new index region. This can
> potentially lead to the index getting disabled if writes to the index table
> time out during this hotspotting.
> We can add an optional step (or perhaps activate it based on the count of
> regions in the data table) to the IndexTool to first do an MR job to gather
> stats on the indexed column values, and then attempt to presplit the index
> table before we do the actual index build MR job.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
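
For illustration only, the two presplit strategies discussed in this issue (split points derived from stats on the indexed column values, and the simpler uniform split) might be sketched as below. This is not IndexTool code. The function names, the use of raw byte strings as row keys, and the idea of working from an in-memory sample are all assumptions made for the sketch; the actual stats-gathering MR job and the step that applies the split points to the index table are not shown.

```python
from typing import List, Sequence


def quantile_split_points(sampled_keys: Sequence[bytes], num_regions: int) -> List[bytes]:
    """Pick up to num_regions - 1 split keys at evenly spaced quantiles of a sample.

    Hypothetical helper: sampled_keys stands in for the stats that a
    preliminary job would gather on the indexed column values.
    """
    if num_regions < 2 or not sampled_keys:
        return []
    ordered = sorted(sampled_keys)
    splits: List[bytes] = []
    for i in range(1, num_regions):
        key = ordered[(i * len(ordered)) // num_regions]
        # Skip the smallest key (it would leave an empty first region)
        # and any key already chosen as a split point.
        if key != ordered[0] and (not splits or key != splits[-1]):
            splits.append(key)
    return splits


def uniform_split_points(num_regions: int) -> List[bytes]:
    """Fallback without stats: divide the first-byte keyspace evenly.

    Hypothetical helper: assumes keys are spread over the full 0x00-0xFF
    range of the leading byte, which may not hold for real index row keys.
    """
    if num_regions < 2:
        return []
    num_regions = min(num_regions, 256)
    return [bytes([(i * 256) // num_regions]) for i in range(1, num_regions)]
```

With stats, the split points follow the observed value distribution, so skewed data still yields roughly even regions. The uniform variant needs no extra job and relies on later organic splitting to correct any imbalance, as suggested in the comment above.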