[ https://issues.apache.org/jira/browse/HBASE-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207554#comment-14207554 ]
Enis Soztutar commented on HBASE-12449: --------------------------------------- If no more input, I'll commit this tomorrow. Thanks. > Use the max timestamp of current or old cell's timestamp in HRegion.append() > ---------------------------------------------------------------------------- > > Key: HBASE-12449 > URL: https://issues.apache.org/jira/browse/HBASE-12449 > Project: HBase > Issue Type: Bug > Reporter: Enis Soztutar > Assignee: Enis Soztutar > Fix For: 2.0.0, 0.98.9, 0.99.2 > > Attachments: hbase-12449-0.98.patch, hbase-12449.patch > > > We have observed an issue in SLES clusters where the system timestamp > regularly goes back in time. This happens frequently enough to cause test > failures when LTT is used with updater. > Everytime an mutation is performed, the updater creates a string in the form > "#column:mutation_type" and appends it to the column "mutate_info". > It seems that when the test fails, it is always the case that the mutate_info > information for that particular column reported is not there in the column > mutate_info. However, according to the MultiThreadedUpdater source code, if a > row gets updated, all the columns will be mutated. So if a row contains 15 > columns, all 15 should appear in mutate_info. > When the test fails though, we get an exception like: > {code} > 2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: > Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column > family [test_cf], column [8], mutation [null]; value of length 818 > {code} > For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it > should: > {code} > test_cf:mutate_info timestamp=1414902651388, > value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0 > > {code} > Further debugging led to finding the root cause where It seems that on SUSE > System.currentTimeMillis() can go back in time freely (especially when run in > a virtualized env like EC2), and actually happens very frequently. > This is from a debug log that was put in place: > {code} > 2014-11-04 01:16:05,025 INFO > [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1 > 2014-11-04 01:16:05,038 INFO > [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3 > 2014-11-04 01:16:05,047 INFO > [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3 > 2014-11-04 01:16:05,057 INFO > [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2 > 2014-11-04 01:16:05,061 INFO > [B.DefaultRpcServer.handler=6,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0 > 2014-11-04 01:16:05,070 INFO > [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3 > 2014-11-04 01:16:05,076 INFO > [B.DefaultRpcServer.handler=3,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0 > 2014-11-04 01:16:05,084 INFO > [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0 > 2014-11-04 01:16:05,090 INFO > [B.DefaultRpcServer.handler=7,queue=1,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0 > 2014-11-04 01:16:05,097 INFO > [B.DefaultRpcServer.handler=0,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0 > 2014-11-04 01:16:05,100 INFO > [B.DefaultRpcServer.handler=14,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765100/Put/mvcc=8303/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0 > 2014-11-04 01:16:05,103 INFO > [B.DefaultRpcServer.handler=11,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765103/Put/mvcc=8305/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0 > 2014-11-04 01:16:05,110 INFO > [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765110/Put/mvcc=8309/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2 > 2014-11-04 01:16:05,114 INFO > [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765114/Put/mvcc=8312/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2 > 2014-11-04 01:16:05,119 INFO > [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0 > 2014-11-04 01:16:04,941 INFO > [B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0 > 2014-11-04 01:16:04,945 INFO > [B.DefaultRpcServer.handler=25,queue=1,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764945/Put/mvcc=8324/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#6:0 > 2014-11-04 01:16:04,988 INFO > [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764988/Put/mvcc=8329/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#7:3 > 2014-11-04 01:16:04,998 INFO > [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764998/Put/mvcc=8337/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#8:2 > 2014-11-04 01:16:05,005 INFO > [B.DefaultRpcServer.handler=17,queue=2,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765005/Put/mvcc=8343/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#9:3 > {code} > Notice that we keep appending entries like {{#13:0}} to the end. Up until > {code} > 2014-11-04 01:16:05,119 INFO > [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0 > {code} > the timestamps are always increasing. Then the next line has a smaller > timestamp. It still does the append though: > {code} > 2014-11-04 01:16:04,941 INFO > [B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: > upserting: > 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0 > {code} > The timestamp went from 1415063765119 to 1415063764941 (a 178ms difference). > Since append code always checks for latest value (latest value in hbase sort > order), then appends to the latest value, it does not look like an HBase > level issue. You can even notice that the log4j timestamp is going back. > Apart from SUSE timestamp issues (and NTP setup etc), I think our append code > should ensure that the timestamp of the new key value that is appended should > not have a lesser timestamp than the old keyvalue. We can think about > implementing an EnvEdgeManager to do monotonically increment the ts within > the process, but that may be expensive. > I think we can ensure to use the max timestamp in append anyway since that > will ensure that even if system ts goes back, the new kv always sorts before > the previous value. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)