[ 
https://issues.apache.org/jira/browse/HBASE-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207554#comment-14207554
 ] 

Enis Soztutar commented on HBASE-12449:
---------------------------------------

If no more input, I'll commit this tomorrow. Thanks. 

> Use the max timestamp of current or old cell's timestamp in HRegion.append()
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-12449
>                 URL: https://issues.apache.org/jira/browse/HBASE-12449
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 0.98.9, 0.99.2
>
>         Attachments: hbase-12449-0.98.patch, hbase-12449.patch
>
>
> We have observed an issue in SLES clusters where the system timestamp 
> regularly goes back in time. This happens frequently enough to cause test 
> failures when LTT is used with updater. 
> Everytime an mutation is performed, the updater creates a string in the form 
> "#column:mutation_type" and appends it to the column "mutate_info". 
> It seems that when the test fails, it is always the case that the mutate_info 
> information for that particular column reported is not there in the column 
> mutate_info. However, according to the MultiThreadedUpdater source code, if a 
> row gets updated, all the columns will be mutated. So if a row contains 15 
> columns, all 15 should appear in mutate_info. 
> When the test fails though, we get an exception like: 
> {code}
> 2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: 
> Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column 
> family [test_cf], column [8], mutation [null]; value of length 818
> {code}
> For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it 
> should: 
> {code}
>  test_cf:mutate_info                                 timestamp=1414902651388, 
> value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0
>                                          
> {code}
> Further debugging led to finding the root cause where It seems that on SUSE 
> System.currentTimeMillis() can go back in time freely (especially when run in 
> a virtualized env like EC2), and actually happens very frequently. 
> This is from a debug log that was put in place: 
> {code}
> 2014-11-04 01:16:05,025 INFO  
> [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1
> 2014-11-04 01:16:05,038 INFO  
> [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3
> 2014-11-04 01:16:05,047 INFO  
> [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3
> 2014-11-04 01:16:05,057 INFO  
> [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2
> 2014-11-04 01:16:05,061 INFO  
> [B.DefaultRpcServer.handler=6,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0
> 2014-11-04 01:16:05,070 INFO  
> [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3
> 2014-11-04 01:16:05,076 INFO  
> [B.DefaultRpcServer.handler=3,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0
> 2014-11-04 01:16:05,084 INFO  
> [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0
> 2014-11-04 01:16:05,090 INFO  
> [B.DefaultRpcServer.handler=7,queue=1,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0
> 2014-11-04 01:16:05,097 INFO  
> [B.DefaultRpcServer.handler=0,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0
> 2014-11-04 01:16:05,100 INFO  
> [B.DefaultRpcServer.handler=14,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765100/Put/mvcc=8303/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0
> 2014-11-04 01:16:05,103 INFO  
> [B.DefaultRpcServer.handler=11,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765103/Put/mvcc=8305/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0
> 2014-11-04 01:16:05,110 INFO  
> [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765110/Put/mvcc=8309/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2
> 2014-11-04 01:16:05,114 INFO  
> [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765114/Put/mvcc=8312/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2
> 2014-11-04 01:16:05,119 INFO  
> [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0
> 2014-11-04 01:16:04,941 INFO  
> [B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0
> 2014-11-04 01:16:04,945 INFO  
> [B.DefaultRpcServer.handler=25,queue=1,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764945/Put/mvcc=8324/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#6:0
> 2014-11-04 01:16:04,988 INFO  
> [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764988/Put/mvcc=8329/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#7:3
> 2014-11-04 01:16:04,998 INFO  
> [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764998/Put/mvcc=8337/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#8:2
> 2014-11-04 01:16:05,005 INFO  
> [B.DefaultRpcServer.handler=17,queue=2,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765005/Put/mvcc=8343/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#9:3
> {code}
> Notice that we keep appending entries like {{#13:0}} to the end. Up until 
> {code}
> 2014-11-04 01:16:05,119 INFO  
> [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0
> {code}
> the timestamps are always increasing. Then the next line has a smaller 
> timestamp. It still does the append though:  
> {code}
> 2014-11-04 01:16:04,941 INFO  
> [B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: 
> upserting: 
> 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0
> {code}
> The timestamp went from 1415063765119 to 1415063764941 (a 178ms difference). 
> Since append code always checks for latest value (latest value in hbase sort 
> order), then appends to the latest value, it does not look like an HBase 
> level issue. You can even notice that the log4j timestamp is going back.
> Apart from SUSE timestamp issues (and NTP setup etc), I think our append code 
> should ensure that the timestamp of the new key value that is appended should 
> not have a lesser timestamp than the old keyvalue. We can think about 
> implementing an EnvEdgeManager to do monotonically increment the ts within 
> the process, but that may be expensive. 
> I think we can ensure to use the max timestamp in append anyway since that 
> will ensure that even if system ts goes back, the new kv always sorts before 
> the previous value. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to