[ 
https://issues.apache.org/jira/browse/PHOENIX-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035168#comment-15035168
 ] 

James Taylor commented on PHOENIX-2478:
---------------------------------------

I can think of a few potential solutions:
# Special case the call to check if the schema is up to date at commit time to 
use the write pointer instead of the read pointer. In this case, the index 
would be noticed since it was added after the read pointer, but before the 
write pointer. This solution has several drawbacks:
#* an extra RPC would be required at commit time
#* a potential race condition exists between the index creation and the commit 
(there'd likely still be some potential for not seeing the index as we're 
seeing this occur for non-transactional tables sometimes - see PHOENIX-2446)
#* given that we want to make our DDL commands transactional, this would be an 
issue again then
# Hold off on building and marking an index as active for the transaction 
timeout period to give in progress transactions a chance to finish. This is 
obviously not ideal, especially when creating an index over a small or empty 
table.
# Enhance Tephra's conflict detection to help with this.
#* One way would be to have a kind of "wildcard" row key that we could put in 
the change set for a table which would conflict with any rows that overlap the 
read pointer and write pointer timespan. The CREATE INDEX call would add this 
wildcard and our commit logic could handle the exception that would occur by 
resubmitting the commit (with the updated metadata).
#* Another possibility would be to allow an entry in the change set to be 
declared as a read versus a write. A read/read conflict would be allowed while 
a read/write or write/write wouldn't. Phoenix could use this by adding the 
metadata table row involved in the DML command to the change set as a "read" 
and the DDL command of creating an index on a table as a "write". Then we'd get 
an exception which we could react to if DML is being done on a table at the 
same time as DDL (i.e. index creation), but we wouldn't if two simultaneous DML 
commands are executed (we'd still get a conflict, of course, if the row keys of 
the data being mutated overlapped).
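
To make the read-versus-write idea concrete, here is a rough sketch of the conflict rule only. It does not use Tephra's API: the class, the Access enum, the "meta:"/"data:" key naming, and the dml/createIndex helpers are all hypothetical names made up for illustration, and it assumes a change set can be modelled as a map from key to access type.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of read/write-aware conflict detection; not Tephra code. */
public class ReadWriteConflictSketch {

    /** Assumed extension: each change set entry is tagged as a read or a write. */
    enum Access { READ, WRITE }

    /** Two entries on the same key conflict unless both are reads. */
    static boolean conflicts(Access a, Access b) {
        return a == Access.WRITE || b == Access.WRITE;
    }

    /** True if any key is present in both change sets with at least one WRITE. */
    static boolean hasConflict(Map<String, Access> committed,
                               Map<String, Access> committing) {
        for (Map.Entry<String, Access> e : committing.entrySet()) {
            Access other = committed.get(e.getKey());
            if (other != null && conflicts(other, e.getValue())) {
                return true;
            }
        }
        return false;
    }

    /** DML: reads the table's metadata row, writes one data row. */
    static Map<String, Access> dml(String table, String rowKey) {
        Map<String, Access> changes = new HashMap<>();
        changes.put("meta:" + table, Access.READ);
        changes.put("data:" + table + ":" + rowKey, Access.WRITE);
        return changes;
    }

    /** DDL (CREATE INDEX): writes the table's metadata row. */
    static Map<String, Access> createIndex(String table) {
        Map<String, Access> changes = new HashMap<>();
        changes.put("meta:" + table, Access.WRITE);
        return changes;
    }

    public static void main(String[] args) {
        // DML overlapping index creation: read/write on the metadata row.
        System.out.println(hasConflict(createIndex("T"), dml("T", "row1"))); // true
        // Two DML commands on different rows: read/read on the metadata row.
        System.out.println(hasConflict(dml("T", "row1"), dml("T", "row2"))); // false
        // Two DML commands on the same row: the usual write/write conflict.
        System.out.println(hasConflict(dml("T", "row1"), dml("T", "row1"))); // true
    }
}
```

In this sketch the first case is the one where the commit logic would catch the exception and resubmit with the updated metadata, while the second case shows that concurrent DML on the same table is not penalized.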

Seems like either of the last alternatives is what we really need to make DDL 
play nicely with DML. Any thoughts/ideas [~poornachandra], [~tdsilva]?

> Rows committed in transaction overlapping index creation are not populated
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-2478
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2478
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> For a reproducible case, see IndexIT.testCreateIndexAfterUpsertStarted() and 
> the associated FIXME comments for PHOENIX-2446.
> The case that is failing is when a transaction starts before an index exists, but 
> commits after the index build is completed. For transactional data, this is 
> problematic because the index gets a timestamp after the commit of the data 
> table mutation and thus these mutations won't be seen during the commit. 
> Also, when the index is being built, the data hasn't yet been committed and 
> thus won't be part of the initial index build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
