[kudu-CR] WIP Don't suicide on EIO
Andrew Wong has abandoned this change. Change subject: WIP Don't suicide on EIO .. Abandoned The contents of this patch have been split across gerrit/c/[7028, 7031] -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] WIP Don't suicide on EIO
Andrew Wong has posted comments on this change. Change subject: WIP Don't suicide on EIO .. Patch Set 11: (17 comments) This patch has been split into a few smaller ones! Will abandon this one. http://gerrit.cloudera.org:8080/#/c/6773/10/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 24: #include > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/6773/10/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: Line 310: RETURN_NOT_OK_PREPEND(tablet_replica_->Init(tablet, > warning: missing username/bug in TODO [google-readability-todo] This TODO isn't actionable. Removed. http://gerrit.cloudera.org:8080/#/c/6773/10/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: Line 172: void AddOrphanedBlocks(const std::vector& blocks); > warning: function 'kudu::tablet::TabletMetadata::AddOrphanedBlocks' has a d Changed to 'blocks' to align with other functions. http://gerrit.cloudera.org:8080/#/c/6773/10/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: Line 2490 > error: use of undeclared identifier 'set' [clang-diagnostic-error] Done Line 2490 > error: use of undeclared identifier 'failed_data_dirs' [clang-diagnostic-er Done Line 2490 > error: unexpected type name 'uint16_t': expected expression [clang-diagnost Done Line 2491 > error: use of undeclared identifier 'failed_data_dirs' [clang-diagnostic-er Done Line 2492 > error: use of undeclared identifier 'failed_data_dirs' [clang-diagnostic-er Done Line 2505 > error: use of undeclared identifier 'failed_data_dirs' [clang-diagnostic-er Done Line 2506 > error: use of undeclared identifier 'failed_data_dirs' [clang-diagnostic-er Done http://gerrit.cloudera.org:8080/#/c/6773/10/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 1085: > error: unknown type name 'set'; did you mean 'std::set'? [clang-diagnostic- Done Line 1085: > warning: all parameters should be named in a function [readability-named-pa Done Line 1085: > error: expected ')' [clang-diagnostic-error] Done Line 1088: // middle of a bootstrap, tablet copy, etc. > error: reference to type 'const string' (aka 'const basic_string') co Done Line 1088: // middle of a bootstrap, tablet copy, etc. > error: use of undeclared identifier 'tablet_ids'; did you mean 'tablet_id'? Done http://gerrit.cloudera.org:8080/#/c/6773/10/src/kudu/util/status.h File src/kudu/util/status.h: Line 100: if (s.posix_code() == (err_number)) (err_handler); \ > warning: macro argument should be enclosed in parentheses [misc-macro-paren Done Line 108: if (s.posix_code() == (err_number)) (err_handler); \ > warning: macro argument should be enclosed in parentheses [misc-macro-paren Done -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] WIP Don't suicide on EIO
David Ribeiro Alves has posted comments on this change. Change subject: WIP Don't suicide on EIO .. Patch Set 11: (2 comments) had some old comments, feel free to ignore if no longer relevant. did the contents of this patch end up somewhere else? http://gerrit.cloudera.org:8080/#/c/6773/3/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS3, Line 48: FsErrorMan maybe EioHandler or EioHandlerManager would be a better name> Line 135: virtual Status FlushDataAsync() = 0; spurious change -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#11). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. Also included is a new tablet data state called TABLET_DATA_CORRUPTED in which tablet data will not be deleted until explicitly requested or during a tablet copy. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/status.h 22 files changed, 354 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/11 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#10). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. Also included is a new tablet data state called TABLET_DATA_CORRUPTED in which tablet data will not be deleted until explicitly requested or during a tablet copy. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/status.h 22 files changed, 398 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/10 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#9). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. Also included is a new tablet data state called TABLET_DATA_CORRUPTED in which tablet data will not be deleted until explicitly requested or during a tablet copy. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h A src/kudu/tablet/tablet_peer_mm_ops.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h M src/kudu/util/status.h 27 files changed, 752 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/9 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#6). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. Also included is a new tablet data state called TABLET_DATA_CORRUPTED in which tablet data will not be deleted until explicitly requested or during a tablet copy. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_peer_mm_ops.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h M src/kudu/util/status.h 27 files changed, 516 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/6 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#5). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. Also included is a new tablet data state called TABLET_DATA_CORRUPTED in which tablet data will not be deleted until explicitly requested or during a tablet copy. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_peer_mm_ops.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h M src/kudu/util/status.h 28 files changed, 519 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/5 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#4). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/consensus/consensus_peers.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_peer_mm_ops.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h M src/kudu/util/status.h 24 files changed, 463 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/4 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#3). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/env_util.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h M src/kudu/util/status.h 20 files changed, 426 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/3 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Kudu Jenkins
[kudu-CR] WIP Don't suicide on EIO
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6773 to look at the new patch set (#2). Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_EIO. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_EIO. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/env.h M src/kudu/util/env_posix.cc M src/kudu/util/env_util.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h M src/kudu/util/status.h 19 files changed, 350 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/2 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Kudu Jenkins
[kudu-CR] WIP Don't suicide on EIO
Andrew Wong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6773 Change subject: WIP Don't suicide on EIO .. WIP Don't suicide on EIO Rather than suiciding when reaching an EIO, this patch adds a mechanism that triggers error-handling in the form of a callback. This handler is attached to the lowest non-env operations that may result in EIO. E.g. PosixRWFile::Write() is an env operation that may result in an EIO. LogBlockContainer::WriteData()'s call to it must be wrapped in the new error-handling macro KUDU_RETURN_OR_HANDLE_ERROR. Thus, all direct readers/writers of files must now implement EIO-handling code and wrap disk IO in KUDU_RETURN_OR_HANDLE_ERROR. This patch is marked WIP, as the correct behavior after a disk fails is not yet included. For now, upon failure, the block managers will keep track of the dead disks and will mark the appropriate tablet peers as failed. Additionally, the error handling has been moved between layers offline, so more cleanup is being done to align with the above description. Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 --- M src/kudu/cfile/cfile_writer.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs-test-util.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/env_posix.cc M src/kudu/util/fault_injection.cc M src/kudu/util/fault_injection.h M src/kudu/util/status.h 19 files changed, 359 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/6773/1 -- To view, visit http://gerrit.cloudera.org:8080/6773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4c221a5ea97c7b7ec5f8d395b5527dc16e63f828 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong