[ https://issues.apache.org/jira/browse/PHOENIX-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vincent Poon updated PHOENIX-4703:
----------------------------------
    Description:
Currently, if we run "ALTER INDEX ... REBUILD", all the rows in the index are deleted and the index is rebuilt synchronously.
"ALTER INDEX ... REBUILD ASYNC" seems to be used for the IndexTool's partial rebuild option, rebuilding from ASYNC_REBUILD_TIMESTAMP (PHOENIX-2890).
So it seems that currently the only way to fully rebuild is to drop the index and recreate it. This is burdensome, as it requires having the schema DDL.
We should have an option to fully rebuild asynchronously that has the same semantics as dropping and recreating the index. A further advantage of this is that we can maintain the splits of the index table while dropping its data. We are currently seeing issues where rebuilding a large table via an MR job results in hotspotting, because all data regions write to the same index region at the start.

  was:
Currently, if we run "ALTER INDEX ... REBUILD", all the rows in the index are deleted and the index is rebuilt synchronously.
"ALTER INDEX ... REBUILD ASYNC" seems to be used for the IndexTool's partial rebuild option, rebuilding from ASYNC_REBUILD_TIMESTAMP (PHOENIX-2890).
So it seems that currently the only way to fully rebuild is to drop the index and recreate it. This is burdensome, as it requires having the schema DDL.
We should have an option to fully rebuild asynchronously that has the same semantics as dropping and recreating the index.

> Provide an option to fully rebuild asynchronously through SQL
> -------------------------------------------------------------
>
>                 Key: PHOENIX-4703
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4703
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Vincent Poon
>            Priority: Major
>
> Currently, if we run "ALTER INDEX ... REBUILD", all the rows in the index are
> deleted and the index is rebuilt synchronously.
> "ALTER INDEX ... REBUILD ASYNC" seems to be used for the IndexTool's partial
> rebuild option, rebuilding from ASYNC_REBUILD_TIMESTAMP (PHOENIX-2890).
> So it seems that currently the only way to fully rebuild is to drop the index
> and recreate it. This is burdensome, as it requires having the schema DDL.
> We should have an option to fully rebuild asynchronously that has the same
> semantics as dropping and recreating the index. A further advantage of this
> is that we can maintain the splits of the index table while dropping its data.
> We are currently seeing issues where rebuilding a large table via an MR job
> results in hotspotting, because all data regions write to the same index
> region at the start.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)