[jira] [Created] (KUDU-3113) auto rebalancer doesn't successfully execute moves
Andrew Wong created KUDU-3113:

Summary: auto rebalancer doesn't successfully execute moves
Key: KUDU-3113
URL: https://issues.apache.org/jira/browse/KUDU-3113
Project: Kudu
Issue Type: Bug
Components: master
Reporter: Andrew Wong

I enabled the auto-rebalancer, let it run on a cluster with a single empty tserver, and saw the following warnings in the logs:
{code:java}
W0430 18:08:48.829325 170113 auto_rebalancer.cc:246] failed to send replica move request: Network error: unable to resolve address for deb718184b474c3db8dfe2d61133dfa1: Name or service not known
{code}
It seems like we're passing in a UUID where we should be passing a host: [https://github.com/apache/kudu/blob/master/src/kudu/master/auto_rebalancer.cc#L431]

Given that the feature is marked experimental, I don't think this is a release blocker, but we should probably remove it from the release notes for now.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
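A minimal sketch of the distinction the report points at, under stated assumptions: translate the destination UUID into a network address through some registry first, and only hand that address to the RPC layer, rather than passing the UUID string itself to name resolution. `HostPort`, `TServerRegistry`, and `ResolveDestination` are hypothetical names for illustration only; this is not Kudu's API and not the actual fix.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical host/port pair (illustrative; not kudu::HostPort).
struct HostPort {
  std::string host;
  uint16_t port;
};

// Hypothetical registry mapping tablet server UUIDs to RPC addresses.
// In a real cluster this information would come from tserver registrations
// known to the master.
class TServerRegistry {
 public:
  void Register(const std::string& uuid, HostPort addr) {
    addrs_.insert_or_assign(uuid, std::move(addr));
  }

  // Translates a UUID into an address. An unknown UUID yields std::nullopt,
  // so the caller never feeds the raw UUID to DNS (which is what produces
  // "unable to resolve address for <uuid>: Name or service not known").
  std::optional<HostPort> ResolveDestination(const std::string& uuid) const {
    auto it = addrs_.find(uuid);
    if (it == addrs_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::unordered_map<std::string, HostPort> addrs_;
};
```

For example, with the UUID from the warning above registered as `tserver-1.example.com:7050` (a made-up host; 7050 is Kudu's default tablet server RPC port), `ResolveDestination` returns that host and port, while an unregistered UUID yields `std::nullopt` instead of triggering a DNS lookup on the UUID string.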
[jira] [Commented] (KUDU-3107) TestRpc.TestCancellationMultiThreads fail on ARM sometimes due to service queue is full
[ https://issues.apache.org/jira/browse/KUDU-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096864#comment-17096864 ]

Alexey Serbin commented on KUDU-3107:

I think the problem is that the code doesn't properly convert the RPC-level status code into the application status code. I think the following is missing:
{noformat}
if (controller.status().IsRemoteError()) {
  // A remote error wraps the server-side error response; accept it only
  // if it carries one of the "server busy/unavailable" error codes.
  const ErrorStatusPB* err = controller.error_response();
  CHECK(err && err->has_code() &&
        (err->code() == ErrorStatusPB::ERROR_SERVER_TOO_BUSY ||
         err->code() == ErrorStatusPB::ERROR_UNAVAILABLE));
}
{noformat}

> TestRpc.TestCancellationMultiThreads fail on ARM sometimes due to service queue is full
> ---
>
> Key: KUDU-3107
> URL: https://issues.apache.org/jira/browse/KUDU-3107
> Project: Kudu
> Issue Type: Sub-task
> Reporter: liusheng
> Priority: Major
> Attachments: rpc-test.txt
>
>
> The test TestRpc.TestCancellationMultiThreads sometimes fails on ARM machines due to the "service queue full" error.
> Related error message:
> {code:java}
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 318)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 319)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 320)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 321)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 324)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 332)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 334)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 335)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 336)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 337)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 338)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 339)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 340)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 (request call id 341)
> F0416 13:01:38.616358 31937 rpc-test.cc:1471] Check failed: controller.status().IsAborted() || controller.status().IsServiceUnavailable() || controller.status().ok() Remote error: Service unavailable: PushStrings request on kudu.rpc.GenericCalculatorService from 127.0.0.1:41516 dropped due to backpressure. The service queue is full; it has 100 items.
>
> *** Check failure stack trace: ***
> PC: @0x0 (unknown)
> *** SIGABRT (@0x3e86bbf) received by PID 27583 (TID 0x84b1f050) from PID 27583; stack trace: ***
> @ 0x93cf0464 raise at ??:0
> @ 0x93cf18b4 abort at ??:0
> @ 0x942c5fdc google::logging_fail() at ??:0
> @ 0x942c7d40 google::LogMessage::Fail() at ??:0
> @ 0x942c9c78 google::LogMessage::SendToLog() at ??:0
> @ 0x942c7874 google::LogMessage::Flush() at ??:0
> @ 0x942ca4fc google::LogMessageFatal::~LogMessageFatal() at ??:0
> @ 0xdcee4940 kudu::rpc::SendAndCancelRpcs() at ??:0
> @ 0xdcee4b98 _ZZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvENKUlvE_clEv at ??:0
> @ 0xdcee76bc _ZSt13__invoke_implIvZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_ at ??:0
> @ 0xdcee7484 _ZSt8__invokeIZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS5_DpOS6_ at ??:0
> @ 0xdcee8208 _ZNSt6thread8_InvokerISt5tupleIJZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_EEE9_M_invokeIJLm0DTcl8__invokespcl10_S_declvalIXT_ESt12_Index_tupleIJXspT_EEE at ??:0
> @ 0xdcee8168 _ZNSt6thread8_InvokerISt5tupleIJZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_EEEclEv at ??:0
> @ 0xdcee8110 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_E6_M_runEv at ??:0
> @ 0x93f22e94 (unknown) at ??:0
> @ 0x93e1e088 start_thread at ??:0
> @ 0x93d8e4ec (unknown) at ??:0
> {code}
> The attachment is the full test log.
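The comment's point can be modeled without Kudu itself: the failing check accepts only `Aborted`, `ServiceUnavailable`, and OK, but the queue overflow in the log arrives as a remote error ("Remote error: Service unavailable: ..."), so the check has to look inside the error response. A self-contained sketch, assuming simplified enums in place of Kudu's `Status` and `ErrorStatusPB`; `StatusKind`, `RpcErrorCode`, and `IsAcceptableOutcome` are hypothetical names, and only the two error-code names come from the comment.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the status kinds the test inspects
// (illustrative; not Kudu's kudu::Status class).
enum class StatusKind { kOk, kAborted, kServiceUnavailable, kRemoteError, kOther };

// Hypothetical subset of RPC-level error codes, mirroring the names used in
// the comment (ErrorStatusPB::ERROR_SERVER_TOO_BUSY, ERROR_UNAVAILABLE).
enum class RpcErrorCode { kServerTooBusy, kUnavailable, kOther };

// Returns true if an RPC outcome is acceptable for a call that may be
// cancelled or rejected by backpressure.
bool IsAcceptableOutcome(StatusKind status, std::optional<RpcErrorCode> err_code) {
  switch (status) {
    case StatusKind::kOk:
    case StatusKind::kAborted:
    case StatusKind::kServiceUnavailable:
      return true;
    case StatusKind::kRemoteError:
      // In the log above, the queue overflow surfaces as a remote error.
      // Assumption: the wrapped error code is one of the "busy/unavailable"
      // codes; anything else (or no code at all) is still a failure.
      return err_code.has_value() &&
             (*err_code == RpcErrorCode::kServerTooBusy ||
              *err_code == RpcErrorCode::kUnavailable);
    default:
      return false;
  }
}
```

With this rule, `IsAcceptableOutcome(StatusKind::kRemoteError, RpcErrorCode::kServerTooBusy)` is true, while a remote error with no error code, or with any other code, still fails, matching the `CHECK` in the suggested snippet.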