[ https://issues.apache.org/jira/browse/KUDU-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahesh Reddy updated KUDU-3532:
-------------------------------
Description:
When multiple locations exist, a std::length_error may be thrown when
ReservoirSample() is called within PlacementPolicy::SelectReplica().
See this file for reference:
https://github.com/apache/kudu/blob/master/src/kudu/master/placement_policy.cc
The code assumes an incorrect relation between two sets: the tablet servers
to choose from and the tablet servers not to choose from. This error
manifests itself as an implicit conversion from unsigned long to int. If
"choices_size" is negative, the implicit conversion turns it into a value
larger than the maximum size allowed when reserving a vector, and a
std::length_error is thrown within ReservoirSample().
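The failure mode described above can be reproduced in isolation. The C++ sketch below is a hypothetical illustration, not Kudu code: NaiveChoicesSize() and ReserveThrowsLengthError() are made-up helpers and the server names are invented. It assumes the buggy path derives the number of selectable servers by subtracting the size of the excluded set from the size of the candidate set. When more servers are excluded than are available to choose from, the unsigned subtraction wraps, the narrowed int is negative, and passing that value to std::vector::reserve() (which takes an unsigned size_type) exceeds max_size(), so std::length_error is thrown.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Kudu code): computes the number of servers left to
// choose from by subtracting set sizes, which silently assumes that every
// excluded server is also one of the choices.
int NaiveChoicesSize(const std::set<std::string>& choices,
                     const std::set<std::string>& excluded) {
  // size() returns an unsigned std::size_t. If 'excluded' is larger, the
  // subtraction wraps around to a huge value; narrowing it to int then gives a
  // negative number (guaranteed since C++20, implementation-defined before).
  return static_cast<int>(choices.size() - excluded.size());
}

// Hypothetical helper: returns true if reserving 'k' elements throws
// std::length_error, the exception seen in the stack trace.
bool ReserveThrowsLengthError(int k) {
  std::vector<std::string> result;
  try {
    // reserve() takes an unsigned size_type, so a negative 'k' becomes a
    // value larger than max_size(), which makes reserve() throw.
    result.reserve(static_cast<std::vector<std::string>::size_type>(k));
  } catch (const std::length_error&) {
    return true;
  }
  return false;
}
```

For example, with choices {"ts-a", "ts-b"} and excluded {"ts-a", "ts-c", "ts-d"}, NaiveChoicesSize() returns -1, ReserveThrowsLengthError(-1) returns true, and ReserveThrowsLengthError(2) returns false.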
Below is a stack trace from a master crash due to this bug:
*** SIGABRT (@0x1da00007b60) received by PID 31584 (TID 0x7fdf9644f700) from PID 31584; stack trace: ***
@ 0xe48496 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fdfb9a90630 (unknown)
@ 0x7fdfb7c95387 __GI_raise
@ 0x7fdfb7c96a78 __GI_abort
@ 0x7fdfb85a5a95 __gnu_cxx::__verbose_terminate_handler()
@ 0x7fdfb85a3a06 (unknown)
@ 0x7fdfb85a3a33 std::terminate()
@ 0x7fdfb85a3c53 __cxa_throw
@ 0x7fdfb85f8a67 std::__throw_length_error()
@ 0xe01fcf kudu::ReservoirSample<>()
@ 0xdfce0f kudu::master::PlacementPolicy::SelectReplica()
@ 0xdff386 kudu::master::PlacementPolicy::PlaceExtraTabletReplica()
@ 0xd873bf kudu::master::AsyncAddReplicaTask::SendRequest()
@ 0xd7912c kudu::master::RetryingTSRpcTask::Run()
@ 0xda5412 kudu::master::CatalogManager::ProcessTabletReport()
@ 0xdf7018 kudu::master::MasterServiceImpl::TSHeartbeat()
@ 0x2fea455 kudu::rpc::GeneratedServiceIf::Handle()
@ 0x2feb44a kudu::rpc::ServicePool::RunThread()
@ 0x31d2e1e kudu::Thread::SuperviseThread()
@ 0x7fdfb9a88ea5 start_thread
@ 0x7fdfb7d5db0d __clone
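One way to sidestep the incorrect set relation, sketched here under the assumption that "choices_size" is meant to be the number of tablet servers that remain selectable, is to count the candidates that are not excluded instead of subtracting the two set sizes. EligibleChoicesSize() is a hypothetical helper, not the actual KUDU-3532 fix. Because it counts rather than subtracts, it never returns a negative value, so it cannot be converted into an oversized reserve() argument.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper (not the actual fix): counts the servers in 'choices'
// that are not in 'excluded'. The result is never negative, even when
// 'excluded' contains servers that are not among the choices.
int EligibleChoicesSize(const std::set<std::string>& choices,
                        const std::set<std::string>& excluded) {
  const auto eligible = std::count_if(
      choices.begin(), choices.end(),
      [&excluded](const std::string& ts) { return excluded.count(ts) == 0; });
  assert(eligible >= 0);  // Counting can only produce a non-negative result.
  return static_cast<int>(eligible);
}
```

With choices {"ts-a", "ts-b"} and excluded {"ts-a", "ts-c", "ts-d"}, EligibleChoicesSize() returns 1 rather than a negative number, so the value is safe to use as a sample or reserve size.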
was:
When multiple locations exist, a std::length_error may be thrown
[here|https://github.com/apache/kudu/blob/master/src/kudu/master/placement_policy.cc#L385].
An implicit conversion from unsigned long to int is the culprit. If
"choices_size" is negative, the implicit conversion makes it larger than the
maximum size allowed when reserving a vector, and an error is thrown.
Below is a stack trace from a master crash due to this bug:
*** SIGABRT (@0x1da00007b60) received by PID 31584 (TID 0x7fdf9644f700) from PID 31584; stack trace: ***
@ 0xe48496 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fdfb9a90630 (unknown)
@ 0x7fdfb7c95387 __GI_raise
@ 0x7fdfb7c96a78 __GI_abort
@ 0x7fdfb85a5a95 __gnu_cxx::__verbose_terminate_handler()
@ 0x7fdfb85a3a06 (unknown)
@ 0x7fdfb85a3a33 std::terminate()
@ 0x7fdfb85a3c53 __cxa_throw
@ 0x7fdfb85f8a67 std::__throw_length_error()
@ 0xe01fcf kudu::ReservoirSample<>()
@ 0xdfce0f kudu::master::PlacementPolicy::SelectReplica()
@ 0xdff386 kudu::master::PlacementPolicy::PlaceExtraTabletReplica()
@ 0xd873bf kudu::master::AsyncAddReplicaTask::SendRequest()
@ 0xd7912c kudu::master::RetryingTSRpcTask::Run()
@ 0xda5412 kudu::master::CatalogManager::ProcessTabletReport()
@ 0xdf7018 kudu::master::MasterServiceImpl::TSHeartbeat()
@ 0x2fea455 kudu::rpc::GeneratedServiceIf::Handle()
@ 0x2feb44a kudu::rpc::ServicePool::RunThread()
@ 0x31d2e1e kudu::Thread::SuperviseThread()
@ 0x7fdfb9a88ea5 start_thread
@ 0x7fdfb7d5db0d __clone
> Unable to place replicas using range aware logic with multiple locations
> ------------------------------------------------------------------------
>
> Key: KUDU-3532
> URL: https://issues.apache.org/jira/browse/KUDU-3532
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: 1.17.0
> Reporter: Mahesh Reddy
> Assignee: Mahesh Reddy
> Priority: Major
> Fix For: 1.18.0
--
This message was sent by Atlassian Jira
(v8.20.10#820010)