[ 
https://issues.apache.org/jira/browse/KUDU-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Reddy updated KUDU-3532:
-------------------------------
    Fix Version/s: 1.17.0
                       (was: 1.18.0)

> Unable to place replicas using range aware logic with multiple locations
> ------------------------------------------------------------------------
>
>                 Key: KUDU-3532
>                 URL: https://issues.apache.org/jira/browse/KUDU-3532
>             Project: Kudu
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.17.0
>            Reporter: Mahesh Reddy
>            Assignee: Mahesh Reddy
>            Priority: Major
>             Fix For: 1.17.0
>
>
> When multiple locations exist, it's possible an std::length_error will be 
> thrown when ReservoirSample is called within 
> PlacementPolicy::SelectReplica(). 
> Look at this file for reference: 
> https://github.com/apache/kudu/blob/master/src/kudu/master/placement_policy.cc
> There's an error in the logic of the code that assumes an improper relation 
> between two sets, one set being the tablet servers to choose from and the 
> other set being the tablet servers not to choose from. This error manifests 
> itself as an implicit conversion from unsigned long to int. If "choices_size" 
> would be negative, the implicit conversion makes the value larger than the 
> maximum size a vector is allowed to reserve, and an error is thrown within 
> ReservoirSample().
> Below is a stack trace from a master crash due to this bug:
> SIGABRT (@0x1da00007b60) received by PID 31584 (TID 0x7fdf9644f700) from PID 
> 31584; stack trace: ***
>     @ 0xe48496 google::(anonymous namespace)::FailureSignalHandler()
>     @ 0x7fdfb9a90630 (unknown)
>     @ 0x7fdfb7c95387 __GI_raise
>     @ 0x7fdfb7c96a78 __GI_abort
>     @ 0x7fdfb85a5a95 __gnu_cxx::__verbose_terminate_handler()
>     @ 0x7fdfb85a3a06 (unknown)
>     @ 0x7fdfb85a3a33 std::terminate()
>     @ 0x7fdfb85a3c53 __cxa_throw
>     @ 0x7fdfb85f8a67 std::__throw_length_error()
>     @ 0xe01fcf kudu::ReservoirSample<>()
>     @ 0xdfce0f kudu::master::PlacementPolicy::SelectReplica()
>     @ 0xdff386 kudu::master::PlacementPolicy::PlaceExtraTabletReplica()
>     @ 0xd873bf kudu::master::AsyncAddReplicaTask::SendRequest()
>     @ 0xd7912c kudu::master::RetryingTSRpcTask::Run()
>     @ 0xda5412 kudu::master::CatalogManager::ProcessTabletReport()
>     @ 0xdf7018 kudu::master::MasterServiceImpl::TSHeartbeat()
>     @ 0x2fea455 kudu::rpc::GeneratedServiceIf::Handle()
>     @ 0x2feb44a kudu::rpc::ServicePool::RunThread()
>     @ 0x31d2e1e kudu::Thread::SuperviseThread()
>     @ 0x7fdfb9a88ea5 start_thread
>     @ 0x7fdfb7d5db0d __clone



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
