[ https://issues.apache.org/jira/browse/KUDU-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15922695#comment-15922695 ]

Jean-Daniel Cryans commented on KUDU-1933:
------------------------------------------

Oh and the master can't restart for the same reason.

> Master crashes after too many TS re-registrations
> -------------------------------------------------
>
>                 Key: KUDU-1933
>                 URL: https://issues.apache.org/jira/browse/KUDU-1933
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Jean-Daniel Cryans
>
> I had a cluster with mis-matched versions inside the 1.3 release (something
> no one would see using released versions) and ended up with the tablet
> servers constantly retrying to register with the master. After a few days of
> this, the master died this way:
> {noformat}
> I0308 00:25:47.038650 7619 ts_descriptor.cc:125] Processing retry of TS
> registration from permanent_uuid: "d8009e07d82b4e66a7ab50f85e60bc30"
> instance_seqno: 1487888450146835
> I0308 00:25:47.038702 7619 ts_manager.cc:84] Re-registered known tserver
> with Master: d8009e07d82b4e66a7ab50f85e60bc30 (ve0136.halxg.cloudera.com:7050)
> I0308 00:25:47.043874 7616 ts_descriptor.cc:125] Processing retry of TS
> registration from permanent_uuid: "335d132897de4bdb9b87443f2c487a42"
> instance_seqno: 1487888474889244
> I0308 00:25:47.043912 7616 ts_manager.cc:84] Re-registered known tserver
> with Master: 335d132897de4bdb9b87443f2c487a42 (ve0126.halxg.cloudera.com:7050)
> I0308 00:25:47.108677 7617 ts_descriptor.cc:125] Processing retry of TS
> registration from permanent_uuid: "7425c65d80f54f2da0a85494a5eb3e68"
> instance_seqno: 1487888491433564
> I0308 00:25:47.108719 7617 ts_manager.cc:84] Re-registered known tserver
> with Master: 7425c65d80f54f2da0a85494a5eb3e68 (ve0122.halxg.cloudera.com:7050)
> I0308 00:25:47.111563 7611 ts_descriptor.cc:125] Processing retry of TS
> registration from permanent_uuid: "c108a85a68504c2bb9f49e4ee683d981"
> instance_seqno: 1487888392795318
> I0308 00:25:47.111604 7611 ts_manager.cc:84] Re-registered known tserver
> with Master: c108a85a68504c2bb9f49e4ee683d981 (ve0128.halxg.cloudera.com:7050)
> F0308 00:25:53.568773 7655 log_index.cc:171] Check failed: log_index > 0
> (-2147483648 vs. 0)
> {noformat}
> Ideally the master shouldn't crash, but it also sounds like we're not
> handling log_index overflows.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
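
A note on the value in the fatal line: -2147483648 is the smallest signed 32-bit integer, which is what a two's-complement 32-bit counter holds once it is pushed one past 2147483647. The sketch below only illustrates that arithmetic. It assumes, without confirmation from this report, that the index passes through a signed 32-bit type somewhere; `wrap_int32` is a hypothetical helper and not Kudu code.

```python
INT32_MAX = 2**31 - 1

def wrap_int32(value: int) -> int:
    """Return the value a signed 32-bit two's-complement field would hold.

    Hypothetical helper for illustration only; not part of Kudu.
    """
    return (value + 2**31) % 2**32 - 2**31

# The largest index that still fits, then one more increment.
last_ok = wrap_int32(INT32_MAX)
after_next = wrap_int32(INT32_MAX + 1)

print(last_ok)         # 2147483647
print(after_next)      # -2147483648, the value reported by the failed check
print(after_next > 0)  # False, so a "log_index > 0" check would fail
```

If that assumption holds, a check of the form `log_index > 0` fails on the first entry past the 32-bit limit, and it would fail again on every restart that replays the same entry.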