[ 
https://issues.apache.org/jira/browse/KUDU-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-1488.
-------------------------------
       Resolution: Duplicate
    Fix Version/s: n/a

Assuming this is the duplicate mentioned above.

> Master crash due to tablet not set up
> -------------------------------------
>
>                 Key: KUDU-1488
>                 URL: https://issues.apache.org/jira/browse/KUDU-1488
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.9.0
>            Reporter: Mike Percy
>            Priority: Critical
>             Fix For: n/a
>
>
> I ran into a master crash yesterday while testing on a cluster. This is the
> first time I've seen the issue, but it seems that VisitTable() can somehow be
> invoked before the tablet is set up. I had hoped to reproduce the issue, but
> I haven't seen it since. I was running under ASAN, so there is no core file.
> Here are the contents of the log:
> {code}
> Log file created at: 2016/06/14 14:18:04
> Running on machine: ve0120.zzz.cloudera.com
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0614 14:18:04.813853 124586 mem_tracker.cc:138] MemTracker: hard memory 
> limit is 75.496170 GB
> I0614 14:18:04.814714 124586 mem_tracker.cc:140] MemTracker: soft memory 
> limit is 45.297703 GB
> I0614 14:18:04.820498 124586 master_main.cc:58] Initializing master server...
> I0614 14:18:04.820914 124586 hybrid_clock.cc:178] HybridClock initialized. 
> Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current 
> error: 116399
> I0614 14:18:04.915469 124586 fs_manager.cc:242] Opened local filesystem: 
> /data/1/kudu,/data/2/kudu
> uuid: "462e0e35e0b54fb7ba4517ab503b76e7"
> format_stamp: "Formatted at 2016-05-12 18:14:53 on ve0120.zzz.cloudera.com"
> I0614 14:18:04.930495 124586 master_main.cc:61] Starting Master server...
> I0614 14:18:04.965893 124586 rpc_server.cc:164] RPC server started. Bound to: 
> 0.0.0.0:7051
> I0614 14:18:04.966097 124586 webserver.cc:124] Starting webserver on 
> 0.0.0.0:8051
> I0614 14:18:04.966123 124586 webserver.cc:129] Document root: 
> /opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.545/lib/kudu/www
> I0614 14:18:04.966714 124586 webserver.cc:218] Webserver started. Bound to: 
> http://0.0.0.0:8051/
> I0614 14:18:04.971561 124651 tablet_bootstrap.cc:376] T 
> 00000000000000000000000000000000 P 462e0e35e0b54fb7ba4517ab503b76e7: 
> Bootstrap starting.
> I0614 14:18:04.976272 124651 tablet_bootstrap.cc:536] T 
> 00000000000000000000000000000000 P 462e0e35e0b54fb7ba4517ab503b76e7: Time 
> spent opening tablet: real 0.004s     user 0.002s     sys 0.001s
> I0614 14:18:04.977147 124651 tablet_bootstrap.cc:592] T 
> 00000000000000000000000000000000 P 462e0e35e0b54fb7ba4517ab503b76e7: Will 
> attempt to recover log segment 
> /data/1/kudu/wals/00000000000000000000000000000000/wal-000000002 to 
> /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000002
> I0614 14:18:04.977231 124651 tablet_bootstrap.cc:592] T 
> 00000000000000000000000000000000 P 462e0e35e0b54fb7ba4517ab503b76e7: Will 
> attempt to recover log segment 
> /data/1/kudu/wals/00000000000000000000000000000000/wal-000000001 to 
> /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000001
> I0614 14:18:04.977282 124651 tablet_bootstrap.cc:600] T 
> 00000000000000000000000000000000 P 462e0e35e0b54fb7ba4517ab503b76e7: Moving 
> log directory /data/1/kudu/wals/00000000000000000000000000000000 to recovery 
> directory /data/1/kudu/wals/00000000000000000000000000000000.recovery in 
> preparation for log replay
> W0614 14:18:04.977718 124651 log_util.cc:311] Could not read footer for 
> segment: 
> /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000002: 
> Not found: Footer not found. Footer magic doesn't match
> I0614 14:18:04.978049 124651 log_reader.cc:152] Log segment 
> /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000002 was 
> likely left in-progress after a previous crash. Will try to rebuild footer by 
> scanning data.
> I0614 14:18:07.426069 124651 log_util.cc:569] Scanning 
> /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000002 for 
> valid entry headers following offset 53334570...
> I0614 14:18:07.434546 124651 log_util.cc:606] Found no log entry headers
> I0614 14:18:07.434672 124651 log_util.cc:215] Ignoring log segment corruption 
> in /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000002 
> because there are no log entries following the corrupted one. The server 
> probably crashed in the middle of writing an entry to the write-ahead log or 
> downloaded an active log via remote bootstrap. Error detail: Corruption: CRC 
> mismatch in log entry header: Log file corruption detected. Failed trying to 
> read batch #0 at offset 53334558 for log segment 
> /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000002: 
> Prior entries: [off=53333876 REPLICATE (0.5519096)] [off=53333914 COMMIT 
> (0.5519096)] [off=53334520 REPLICATE (0.5519097)] [off=53334558 COMMIT 
> (0.5519097)]
> I0614 14:18:07.434697 124651 log_util.cc:359] Successfully rebuilt footer for 
> segment: 
> /data/1/kudu/wals/00000000000000000000000000000000.recovery/wal-000000002 
> (valid entries through byte offset 53334558)
> I0614 14:18:07.434893 124651 tablet.cc:854] T 
> 00000000000000000000000000000000 Rewinding schema during bootstrap to Schema [
>         0:entry_type[int8 NOT NULL],
>         1:entry_id[string NOT NULL],
>         2:metadata[string NOT NULL]
> ]
> I0614 14:18:07.435853 124651 log.cc:334] Log is configured to *not* fsync() 
> on all Append() calls
> I0614 14:18:19.927433 124651 log.cc:498] Max segment size reached. Starting 
> new segment allocation.
> I0614 14:18:19.928328 124651 tablet_bootstrap.cc:376] T 
> 00000000000000000000000000000000 P 462e0e35e0b54fb7ba4517ab503b76e7: 
> Bootstrap replayed 1/2 log segments. Stats: ops{read=107962 overwritten=0 
> applied=107961 ignored=107961} inserts{seen=0 ignored=0} mutations{seen=0 
> ignored=0} orphaned_commits=0. Pending: 1 replicates
> I0614 14:18:19.940438 124651 log.cc:384] Rolled over to a new segment: 
> /data/1/kudu/wals/00000000000000000000000000000000/wal-000000002
> F0614 14:18:28.136240 124712 sys_catalog.cc:459] Check failed: type_col_idx 
> != Schema::kColumnNotFound
> {code}
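
For anyone triaging a log like the one quoted above, the fatal entry can be pulled out mechanically. The following is only a rough sketch, assuming the header layout stated in the log preamble ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg) and assuming that any line without such a header is a mail-wrapped continuation of the previous entry. The names HEADER, Entry and parse_entries are made up for this illustration and are not part of Kudu or glog.

```python
import re
from typing import List, NamedTuple

# Header layout taken from the log preamble:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
HEADER = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^\s:]+):(\d+)\] ?(.*)$"
)


class Entry(NamedTuple):
    """One logical log entry (hypothetical helper type)."""
    severity: str   # I, W, E or F
    date: str       # mmdd
    time: str       # hh:mm:ss.uuuuuu
    thread_id: int
    source: str     # file:line
    message: str


def parse_entries(text: str) -> List[Entry]:
    """Parse glog-style text, tolerating '>' quoting and wrapped lines."""
    entries: List[Entry] = []
    for raw in text.splitlines():
        # Drop a single leading email quote marker, then surrounding spaces.
        line = (raw[1:] if raw.startswith(">") else raw).strip()
        # Skip blanks and the JIRA {code} delimiters.
        if not line or line == "{code}":
            continue
        match = HEADER.match(line)
        if match:
            sev, date, time, tid, fname, lineno, msg = match.groups()
            entries.append(
                Entry(sev, date, time, int(tid), f"{fname}:{lineno}", msg.strip())
            )
        elif entries:
            # Assumption: a headerless line continues the previous message.
            last = entries[-1]
            entries[-1] = last._replace(message=f"{last.message} {line}")
    return entries


if __name__ == "__main__":
    sample = (
        "> I0614 14:18:04.930495 124586 master_main.cc:61] Starting Master server...\n"
        "> F0614 14:18:28.136240 124712 sys_catalog.cc:459] Check failed: type_col_idx \n"
        "> != Schema::kColumnNotFound\n"
    )
    for entry in parse_entries(sample):
        if entry.severity == "F":
            print(entry.source, entry.message)
```

Filtering on severity "F" isolates the failed check in sys_catalog.cc along with the thread that hit it. Lines that appear before the first header (such as "Log file created at:") are dropped by this sketch.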



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
