[jira] [Created] (KUDU-3113) auto rebalancer doesn't successfully execute moves

2020-04-30 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3113:
-

 Summary: auto rebalancer doesn't successfully execute moves
 Key: KUDU-3113
 URL: https://issues.apache.org/jira/browse/KUDU-3113
 Project: Kudu
  Issue Type: Bug
  Components: master
Reporter: Andrew Wong


I enabled the auto-rebalancer, let it run on a cluster with a single empty 
tserver, and saw the following warnings in the logs:
{code:java}
W0430 18:08:48.829325 170113 auto_rebalancer.cc:246] failed to send replica 
move request: Network error: unable to resolve address for 
deb718184b474c3db8dfe2d61133dfa1: Name or service not known {code}
It seems like we're passing in a tserver UUID where we should be passing a resolvable host address: 
[https://github.com/apache/kudu/blob/master/src/kudu/master/auto_rebalancer.cc#L431]
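A minimal sketch of the distinction, assuming a hypothetical in-memory registry. `TsRegistry`, `HostPort`, and `ResolveTserver` below are illustrative names, not Kudu's actual types or the code at the linked line. The point is that the tserver UUID is only a lookup key, and only the address it maps to can be handed to name resolution.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a tserver's RPC address (not Kudu's HostPort).
struct HostPort {
  std::string host;
  uint16_t port;
};

// Hypothetical registry from tserver UUID to RPC address, standing in for
// whatever state the master keeps about registered tablet servers.
using TsRegistry = std::map<std::string, HostPort>;

// Translates a tserver UUID into an address that can actually be dialed.
// Passing the UUID itself to name resolution is what yields errors like
// "unable to resolve address for <uuid>: Name or service not known".
std::optional<HostPort> ResolveTserver(const TsRegistry& registry,
                                       const std::string& ts_uuid) {
  assert(!ts_uuid.empty());  // An empty UUID indicates a caller bug.
  auto it = registry.find(ts_uuid);
  if (it == registry.end()) {
    return std::nullopt;  // Unknown tserver: the caller can skip this move.
  }
  return it->second;
}
```

Returning `std::nullopt` for an unknown UUID lets the caller skip that move instead of sending a request that can never resolve.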

Given the feature is marked experimental, I don't think this is a release 
blocker, but we should probably remove it from the release notes for now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3107) TestRpc.TestCancellationMultiThreads fail on ARM sometimes due to service queue is full

2020-04-30 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096864#comment-17096864
 ] 

Alexey Serbin commented on KUDU-3107:
-

I think the problem is that the code doesn't properly convert the RPC-level 
status code into the application-level status code. The following check seems 
to be missing:

{noformat}
if (controller.status().IsRemoteError()) {
  // Backpressure arrives as a RemoteError; check the server's error code.
  const ErrorStatusPB* err = controller.error_response();
  CHECK(err && err->has_code() &&
        (err->code() == ErrorStatusPB::ERROR_SERVER_TOO_BUSY ||
         err->code() == ErrorStatusPB::ERROR_UNAVAILABLE));
}
{noformat}
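To make the two layers explicit, here is a simplified, self-contained model of the proposed check. The enums, `RpcOutcome`, and `IsExpectedOutcome` are hypothetical stand-ins for Kudu's `Status` and `ErrorStatusPB`. The point is only that a "Remote error" at the RPC layer has to be unpacked to the server's error code before deciding whether it is an expected outcome under load.

```cpp
#include <cassert>
#include <optional>

// Simplified, hypothetical model of the RPC-level status a caller sees.
enum class RpcStatus { kOk, kAborted, kServiceUnavailable, kRemoteError, kOther };

// Hypothetical mirror of the two ErrorStatusPB codes named in the check.
enum class RemoteErrorCode { kServerTooBusy, kUnavailable, kOther };

struct RpcOutcome {
  RpcStatus status;
  // Server-provided error code; only set for kRemoteError (cf. error_response()).
  std::optional<RemoteErrorCode> remote_code;
};

// Returns true if the outcome is acceptable for a test that cancels RPCs
// while the server is under load.
bool IsExpectedOutcome(const RpcOutcome& o) {
  // Invariant of this model: only remote errors carry a server error code.
  assert(o.status == RpcStatus::kRemoteError || !o.remote_code.has_value());
  switch (o.status) {
    case RpcStatus::kOk:
    case RpcStatus::kAborted:
    case RpcStatus::kServiceUnavailable:
      return true;
    case RpcStatus::kRemoteError:
      // Unpack the RPC-level "Remote error" into the application-level code.
      return o.remote_code.has_value() &&
             (*o.remote_code == RemoteErrorCode::kServerTooBusy ||
              *o.remote_code == RemoteErrorCode::kUnavailable);
    default:
      return false;
  }
}
```

A remote error that carries no code, or some other code, is still treated as a failure, so the check stays strict about everything except the backpressure-style rejections.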

> TestRpc.TestCancellationMultiThreads fail on ARM sometimes due to service 
> queue is full
> ---
>
> Key: KUDU-3107
> URL: https://issues.apache.org/jira/browse/KUDU-3107
> Project: Kudu
>  Issue Type: Sub-task
>Reporter: liusheng
>Priority: Major
> Attachments: rpc-test.txt
>
>
> The test TestRpc.TestCancellationMultiThreads sometimes fails on ARM machines 
> due to the "service queue is full" error. Related error message:
> {code:java}
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 318)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 319)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 320)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 321)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 324)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 332)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 334)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 335)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 336)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 337)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 338)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 339)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 340)
> Call kudu.rpc.GenericCalculatorService.PushStrings from 127.0.0.1:41516 
> (request call id 341)
> F0416 13:01:38.616358 31937 rpc-test.cc:1471] Check failed: 
> controller.status().IsAborted() || controller.status().IsServiceUnavailable() 
> || controller.status().ok() Remote error: Service unavailable: PushStrings 
> request on kudu.rpc.GenericCalculatorService from 127.0.0.1:41516 dropped due 
> to backpressure. The service queue is full; it has 100 items.
> *** Check failure stack trace: ***
> PC: @0x0 (unknown)
> *** SIGABRT (@0x3e86bbf) received by PID 27583 (TID 0x84b1f050) from 
> PID 27583; stack trace: ***
> @ 0x93cf0464 raise at ??:0
> @ 0x93cf18b4 abort at ??:0
> @ 0x942c5fdc google::logging_fail() at ??:0
> @ 0x942c7d40 google::LogMessage::Fail() at ??:0
> @ 0x942c9c78 google::LogMessage::SendToLog() at ??:0
> @ 0x942c7874 google::LogMessage::Flush() at ??:0
> @ 0x942ca4fc google::LogMessageFatal::~LogMessageFatal() at ??:0
> @ 0xdcee4940 kudu::rpc::SendAndCancelRpcs() at ??:0
> @ 0xdcee4b98 
> _ZZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvENKUlvE_clEv
>  at ??:0
> @ 0xdcee76bc 
> _ZSt13__invoke_implIvZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
>  at ??:0
> @ 0xdcee7484 
> _ZSt8__invokeIZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS5_DpOS6_
>  at ??:0
> @ 0xdcee8208 
> _ZNSt6thread8_InvokerISt5tupleIJZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_EEE9_M_invokeIJLm0DTcl8__invokespcl10_S_declvalIXT_ESt12_Index_tupleIJXspT_EEE
>  at ??:0
> @ 0xdcee8168 
> _ZNSt6thread8_InvokerISt5tupleIJZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_EEEclEv
>  at ??:0
> @ 0xdcee8110 
> _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN4kudu3rpc41TestRpc_TestCancellationMultiThreads_Test8TestBodyEvEUlvE_E6_M_runEv
>  at ??:0
> @ 0x93f22e94 (unknown) at ??:0
> @ 0x93e1e088 start_thread at ??:0
> @ 0x93d8e4ec (unknown) at ??:0
> {code}
> The attachment is the full test log.
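The failure comes from server-side backpressure: once the service queue holds its maximum number of pending calls (100 in the log above), further calls are rejected instead of queued. A minimal sketch of that behavior, using a hypothetical FIFO `BoundedServiceQueue` that rejects the newest call when full; Kudu's real service queue and its eviction policy may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical bounded queue illustrating service-queue backpressure: once
// `capacity` calls are waiting, new calls are rejected instead of queued.
// This is not Kudu's actual service queue implementation.
class BoundedServiceQueue {
 public:
  explicit BoundedServiceQueue(size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns false (the call is dropped) when the queue is already full; a
  // server would then answer with a "too busy" style error instead of
  // processing the call.
  bool TryEnqueue(std::string call) {
    if (calls_.size() >= capacity_) {
      ++rejected_;
      return false;
    }
    calls_.push_back(std::move(call));
    return true;
  }

  // Removes the oldest queued call, simulating a worker thread picking it up.
  bool TryDequeue(std::string* out) {
    if (calls_.empty()) return false;
    *out = std::move(calls_.front());
    calls_.pop_front();
    return true;
  }

  size_t size() const { return calls_.size(); }
  size_t rejected() const { return rejected_; }

 private:
  const size_t capacity_;
  std::deque<std::string> calls_;
  size_t rejected_ = 0;
};
```

Because rejection is immediate, a test that fires many calls from several threads at once can hit the limit depending on scheduling and timing, so the client-side check has to accept this rejection as a possible outcome.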


