[ https://issues.apache.org/jira/browse/PHOENIX-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113226#comment-15113226 ]
Thomas D'Silva commented on PHOENIX-2582:
-----------------------------------------

Attaching a possible solution from an email conversation with [~apurtell]

>In lieu of an (external) transaction manager, maybe you could run a Procedure
>that must complete before the index create is declared successful? Procedure
>is HBase's internal coordination framework. HBase 0.98 and 1.0 have
>ProcedureV1. HBase 1.1+ has ProcedureV2.
>
>Your procedure workers would set the writestate on each region to readonly,
>wait for in-flight writes to finish, and then join the barrier. Once inside
>the barrier your workers could make the index-related state changes, or just
>return if no further work is needed. Your procedure workers would reset
>writestate in the cleanup callback. Your coordinator (in the master) can wait
>on a monitor for global completion or poll on a completion status check. Note
>Procedures will complete in either successful or failed state. Failure may be
>explicit (worker posted failure notice) or a timeout. If failed, you'll need
>to retry. Once one of these has completed successfully, you would be good.


> Creating an index while a batch of rows is being written leads to missing
> rows in the index table
> -------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2582
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2582
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Thomas D'Silva
>
> If we create an index while we are upserting rows to the table, it's possible
> that we miss writing the corresponding rows to the index table.
> If a region server is writing a batch of rows and we create an index just
> before the batch is written, we will miss writing that batch to the index
> table. This is because we run the initial UPSERT SELECT to populate the index
> with an SCN that we get from the server, which will be earlier than the
> timestamp at which the batch of rows is written.
> We need to figure out if there is a way to determine that all pending batches
> have been written before running the UPSERT SELECT to do the initial index
> population.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
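
As a rough illustration of the barrier flow described in the quoted email above, here is a minimal single-process sketch in Python. It does not use HBase's Procedure framework or any Phoenix API. Region, worker, and run_procedure are hypothetical names invented for this example, and the in_flight counter and index_enabled flag merely stand in for real region state. Each worker marks its region read-only, waits for in-flight writes to drain, joins the barrier, makes the state change, and resets the write state in cleanup. The coordinator reports success only if every worker succeeded.

```python
import threading


class Region:
    """Hypothetical stand-in for a region's write state (not an HBase class)."""

    def __init__(self, name, in_flight=0):
        self.name = name
        self.read_only = False
        self.in_flight = in_flight      # writes started but not yet finished
        self.index_enabled = False      # stands in for the index state change
        self.cond = threading.Condition()

    def finish_write(self):
        """Mark one in-flight write as done and wake any waiting worker."""
        with self.cond:
            self.in_flight -= 1
            self.cond.notify_all()


def worker(region, barrier, results, timeout):
    try:
        with region.cond:
            region.read_only = True     # stop accepting new writes
            drained = region.cond.wait_for(
                lambda: region.in_flight == 0, timeout)
        if not drained:
            barrier.abort()             # explicit failure: release the others
            results[region.name] = False
            return
        barrier.wait(timeout)           # join the barrier (may time out)
        region.index_enabled = True     # index-related state change
        results[region.name] = True
    except threading.BrokenBarrierError:
        results[region.name] = False    # another worker failed or timed out
    finally:
        with region.cond:
            region.read_only = False    # cleanup: reset the write state


def run_procedure(regions, timeout=1.0):
    """Coordinator: returns True only if every worker completed."""
    barrier = threading.Barrier(len(regions))
    results = {}
    threads = [
        threading.Thread(target=worker, args=(r, barrier, results, timeout))
        for r in regions
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == len(regions) and all(results.values())


if __name__ == "__main__":
    regions = [Region("r1"), Region("r2", in_flight=1)]
    # Simulate the pending batch on r2 finishing shortly after we start.
    threading.Timer(0.05, regions[1].finish_write).start()
    # Retry until one run completes successfully, as the email suggests.
    while not run_procedure(regions):
        pass
    print("index state changed on:", [r.name for r in regions if r.index_enabled])
```

Because no worker passes the barrier unless all of them reach it, a failed or timed-out run leaves every region's index state untouched, so the coordinator can simply retry.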