[
https://issues.apache.org/jira/browse/KUDU-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170047#comment-15170047
]
Todd Lipcon commented on KUDU-1341:
-----------------------------------
Pretty sure I got to the bottom of this. The description of the problem and a
WIP fix is in http://gerrit.cloudera.org:8080/#/c/2300/
> Out of order UNDO in delta file, possibly related to REINSERT
> -------------------------------------------------------------
>
> Key: KUDU-1341
> URL: https://issues.apache.org/jira/browse/KUDU-1341
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 0.6.0
> Reporter: Mike Percy
> Assignee: Todd Lipcon
>
> As reported by Bruce Song Zhang, there is a crasher in debug mode in tablet
> compaction under load.
> Here is a snippet of the log file he sent:
> {noformat}
> I0219 10:03:27.503398 31721 delta_tracker.cc:456] Flushing 8 deltas from DMS
> 114...
> I0219 10:03:27.504659 31721 delta_tracker.cc:410] Flushed delta block:
> 2109978395164279242
> I0219 10:03:27.504781 31721 delta_tracker.cc:416] Reopened delta block for
> read: 2109978395164279242
> I0219 10:03:28.150115 31721 maintenance_manager.cc:359] Time spent running
> FlushDeltaMemStoresOp(c6069618c9864c1d8422912bf69efd44): real 0.667s
> user 0.647s sys 0.020s
> I0219 10:03:28.665895 31721 log_reader.cc:415] T
> c6069618c9864c1d8422912bf69efd44: removed 1 log segments from log reader
> I0219 10:03:28.665935 31721 log.cc:736] Deleting log segment in path:
> /export/servers/kudu/tserver_wal_data/wals/c6069618c9864c1d8422912bf69efd44/wal-000000223
> (GCed ops < 61476431)
> I0219 10:03:28.704988 31721 maintenance_manager.cc:359] Time spent running
> LogGCOp(c6069618c9864c1d8422912bf69efd44): real 0.039s user 0.000s
> sys 0.039s
> I0219 10:03:29.456583 31721 tablet.cc:653] Flush: entering stage 1 (old
> memrowset already frozen for inserts)
> I0219 10:03:29.456614 31721 compaction.cc:591] Selected 1 rowsets to compact:
> I0219 10:03:29.456630 31721 compaction.cc:594] memrowset(current size on
> disk: ~0 bytes)
> I0219 10:03:29.456686 31721 tablet.cc:655] Memstore in-memory size: 33095447
> bytes
> I0219 10:03:29.456742 31721 tablet.cc:1132] Flush: entering phase 1 (flushing
> snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T <
> 5963147244092280832 or (T in {5963147244092280832})}]
> I0219 10:03:29.457515 31721 multi_column_writer.cc:82] Opened CFile writer
> for column country_id[int32 NOT NULL]
> I0219 10:03:29.457700 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sku_id[int64 NOT NULL]
> I0219 10:03:29.457865 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cur_day[int32 NOT NULL]
> I0219 10:03:29.458027 31721 multi_column_writer.cc:82] Opened CFile writer
> for column country_name[string NOT NULL]
> I0219 10:03:29.458173 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sku_name[string NOT NULL]
> I0219 10:03:29.458333 31721 multi_column_writer.cc:82] Opened CFile writer
> for column brand_id[int32 NOT NULL]
> I0219 10:03:29.458493 31721 multi_column_writer.cc:82] Opened CFile writer
> for column brand_name[string NOT NULL]
> I0219 10:03:29.458636 31721 multi_column_writer.cc:82] Opened CFile writer
> for column product_id[int64 NOT NULL]
> I0219 10:03:29.458777 31721 multi_column_writer.cc:82] Opened CFile writer
> for column product_name[string NOT NULL]
> I0219 10:03:29.458927 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid1[int32 NOT NULL]
> I0219 10:03:29.459066 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid1_name[string NOT NULL]
> I0219 10:03:29.459198 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid2[int32 NOT NULL]
> I0219 10:03:29.459347 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid2_name[string NOT NULL]
> I0219 10:03:29.459491 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid3[int32 NOT NULL]
> I0219 10:03:29.459630 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid3_name[string NOT NULL]
> I0219 10:03:29.459774 31721 multi_column_writer.cc:82] Opened CFile writer
> for column data_type[int8 NOT NULL]
> I0219 10:03:29.459910 31721 multi_column_writer.cc:82] Opened CFile writer
> for column price[double NOT NULL]
> I0219 10:03:29.460060 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sale_date[timestamp NOT NULL]
> I0219 10:03:29.460197 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sale_number[int64 NOT NULL]
> I0219 10:03:29.460340 31721 multi_column_writer.cc:82] Opened CFile writer
> for column gmv[double NOT NULL]
> W0219 10:03:29.572443 31721 compaction.cc:714] Found REINSERT REDO truncating
> row history for Source Row: (int32 country_id=387, int64 sku_id=50804, int32
> cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴
> 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64
> product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜
> 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string
> cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12,
> double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64
> sale_number=12332, double gmv=12233) Redo Mutations:
> [@5963145475847274496(DELETE), @5963145476833153024(REINSERT (int32
> country_id=387, int64 sku_id=50804, int32 cur_day=387, string
> country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理,
> int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413,
> string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123,
> string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123,
> string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp
> sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double
> gmv=12233))] Undo Mutations: [@5963144433666650112(DELETE)]
> Dest Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387,
> string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜
> 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64
> product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜
> 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string
> cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12,
> double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64
> sale_number=12332, double gmv=12233) Redo Mutations: [] Undo Mutations:
> [@5963145476833153024(DELETE)] Note: this warning will appear only for the
> first truncated row
> W0219 10:03:32.302188 31721 compaction.cc:834] Total 7 rows lost some history
> due to REINSERT after DELETE
> I0219 10:03:32.346256 31721 tablet.cc:1222] Flush: entering phase 2 (starting
> to duplicate updates in new rowsets)
> I0219 10:03:32.846774 31721 tablet.cc:1277] Flush Phase 2: carrying over any
> updates which arrived during Phase 1
> I0219 10:03:32.846839 31721 tablet.cc:1278] Phase 2 snapshot:
> MvccSnapshot[committed={T|T < 5963147244092280832 or (T in
> {5963147244092280832})}]
> I0219 10:03:34.485435 31721 tablet.cc:1311] Flush successful on 52051 rows
> (19462877 bytes)
> I0219 10:03:34.501569 31721 maintenance_manager.cc:359] Time spent running
> FlushMRSOp(c6069618c9864c1d8422912bf69efd44): real 5.541s user 5.454s
> sys 0.081s
> I0219 10:05:45.657285 31721 tablet.cc:1333] Compaction: stage 1 complete,
> picked 6 rowsets to compact
> I0219 10:05:45.657618 31721 compaction.cc:591] Selected 6 rowsets to compact:
> I0219 10:05:45.657727 31721 compaction.cc:594] RowSet(15418)(current size on
> disk: ~30348706 bytes)
> I0219 10:05:45.657793 31721 compaction.cc:594] RowSet(15544)(current size on
> disk: ~1123338 bytes)
> I0219 10:05:45.657855 31721 compaction.cc:594] RowSet(15551)(current size on
> disk: ~26151132 bytes)
> I0219 10:05:45.657886 31721 compaction.cc:594] RowSet(15610)(current size on
> disk: ~30315579 bytes)
> I0219 10:05:45.657913 31721 compaction.cc:594] RowSet(15611)(current size on
> disk: ~32233272 bytes)
> I0219 10:05:45.657934 31721 compaction.cc:594] RowSet(15617)(current size on
> disk: ~18863054 bytes)
> I0219 10:05:45.657984 31721 tablet.cc:1132] Compaction: entering phase 1
> (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T <
> 5963147244092280832 or (T in {5963147244092280832})}]
> I0219 10:05:45.661224 31721 multi_column_writer.cc:82] Opened CFile writer
> for column country_id[int32 NOT NULL]
> I0219 10:05:45.661432 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sku_id[int64 NOT NULL]
> I0219 10:05:45.661598 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cur_day[int32 NOT NULL]
> I0219 10:05:45.661761 31721 multi_column_writer.cc:82] Opened CFile writer
> for column country_name[string NOT NULL]
> I0219 10:05:45.661933 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sku_name[string NOT NULL]
> I0219 10:05:45.662081 31721 multi_column_writer.cc:82] Opened CFile writer
> for column brand_id[int32 NOT NULL]
> I0219 10:05:45.662228 31721 multi_column_writer.cc:82] Opened CFile writer
> for column brand_name[string NOT NULL]
> I0219 10:05:45.662384 31721 multi_column_writer.cc:82] Opened CFile writer
> for column product_id[int64 NOT NULL]
> I0219 10:05:45.662542 31721 multi_column_writer.cc:82] Opened CFile writer
> for column product_name[string NOT NULL]
> I0219 10:05:45.662695 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid1[int32 NOT NULL]
> I0219 10:05:45.662832 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid1_name[string NOT NULL]
> I0219 10:05:45.662962 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid2[int32 NOT NULL]
> I0219 10:05:45.663100 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid2_name[string NOT NULL]
> I0219 10:05:45.663239 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid3[int32 NOT NULL]
> I0219 10:05:45.663393 31721 multi_column_writer.cc:82] Opened CFile writer
> for column cid3_name[string NOT NULL]
> I0219 10:05:45.663538 31721 multi_column_writer.cc:82] Opened CFile writer
> for column data_type[int8 NOT NULL]
> I0219 10:05:45.663679 31721 multi_column_writer.cc:82] Opened CFile writer
> for column price[double NOT NULL]
> I0219 10:05:45.663830 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sale_date[timestamp NOT NULL]
> I0219 10:05:45.663964 31721 multi_column_writer.cc:82] Opened CFile writer
> for column sale_number[int64 NOT NULL]
> I0219 10:05:45.664091 31721 multi_column_writer.cc:82] Opened CFile writer
> for column gmv[double NOT NULL]
> F0219 10:06:01.038265 31721 deltafile.cc:136] Check failed:
> last_key_.CompareTo<UNDO>(key) <= 0 must insert undo deltas in sorted order
> (ascending key, then descending ts): got key (row
> 278573@tx5962430420551352320) after (row 278573@tx5962430179277082624)
> {noformat}
> It appears that the timestamps are misordered (in ascending order instead of
> descending order), triggering a DCHECK. We don't normally see this in load
> tests because we run with RELEASE mode in most of our long running stress
> tests.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)