[ 
https://issues.apache.org/jira/browse/KUDU-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763109#comment-16763109
 ] 

Andrew Wong commented on KUDU-2690:
-----------------------------------

Another point to note is that this appears to be viral: whatever the 
underlying cause is, it can lead to multiple failed replicas of a given tablet.

> Alter schema seems to be missing
> --------------------------------
>
>                 Key: KUDU-2690
>                 URL: https://issues.apache.org/jira/browse/KUDU-2690
>             Project: Kudu
>          Issue Type: Bug
>          Components: log, master, tablet
>    Affects Versions: 1.7.1
>            Reporter: Andrew Wong
>            Priority: Major
>
> I've seen an issue in which an ADD_COLUMN appears not to be fully applied 
> before writes are performed. This results in a failure to bootstrap with an 
> error like:
> {{F0112 19:58:08.591284  8692 transaction_driver.cc:383] T 
> 578f2c6e60d84cb18d704889ea323cda P dc0af5867d52468f8fd47abf13c08040 S R-NP Ts 
> 6317323785408049152: Cannot cancel transactions that have already replicated: 
> Invalid argument: Client provided column <COLUMN NAME>[double NULLABLE] not 
> present in tablet transaction:R-NP WriteTransaction [type=REPLICA, 
> start_time=2019-01-12 19:58:08, state=WriteTransactionState 0x5d52000 
> [op_id=(term: 2548 index: 160364490), ts=6317323785408049152, rows=[]]]}}
>  
> One clue is that in the WALs, the "client schema" (the schema in each write 
> request) contains a column that is not in the "tablet schema" (the schema in 
> the log segment), and so dumping the WALs will fail. This alone shouldn't 
> prevent bootstrapping, but when replaying the WAL, we decode the write 
> request against the schema in the tablet metadata. This failure seems to 
> indicate that the tablet metadata's schema is missing a column that is being 
> used by a committed write. I've been trying to piece together various ALTER 
> SCHEMA bugs that we have (e.g. KUDU-860) to recreate this, but haven't had 
> much luck.
>  
> It's worth noting that this cluster is misconfigured so its tablet servers 
> point to duplicate master addresses, and is therefore susceptible to 
> KUDU-2681 and KUDU-2684, meaning each tablet report will result in multiple 
> concurrent tasks being scheduled in response.
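
To make the suspected failure mode concrete, here is a minimal sketch of the
check described in the issue: a write's client schema is compared against the
tablet schema, and the write is rejected if it uses a column the tablet does
not know about. This is not Kudu code. The names Column, missing_columns and
decode_write, and the column name new_col, are hypothetical and exist only
for illustration; the error text is modeled on the log line quoted in the
issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical, simplified stand-in for a column definition.
    name: str
    type: str
    nullable: bool = False


def missing_columns(client_schema, tablet_schema):
    """Return client-schema columns that the tablet schema does not contain."""
    known = set(tablet_schema)
    return [col for col in client_schema if col not in known]


def decode_write(client_schema, tablet_schema):
    """Reject a write whose client schema uses a column unknown to the tablet."""
    missing = missing_columns(client_schema, tablet_schema)
    if missing:
        col = missing[0]
        null = "NULLABLE" if col.nullable else "NOT NULL"
        raise ValueError(
            f"Client provided column {col.name}[{col.type} {null}] "
            "not present in tablet"
        )
    return "ok"


# Tablet schema that never picked up the ADD_COLUMN.
tablet_schema = [Column("key", "int64")]
# Client schema carried by a write issued after the ADD_COLUMN.
client_schema = [
    Column("key", "int64"),
    Column("new_col", "double", nullable=True),
]

try:
    decode_write(client_schema, tablet_schema)
except ValueError as err:
    print(err)
```

In this sketch the write is rejected because the tablet schema lacks the
added column, which mirrors the state the bootstrap error seems to indicate
for the tablet metadata.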



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
