Github user sohami commented on a diff in the pull request:
https://github.com/apache/drill/pull/772#discussion_r105286121
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -2143,6 +2146,9 @@ connectionStatus_t
PooledDrillClientImpl::connect(const char* connStr, DrillUser
Utils::shuffle(drillbits);
// The original shuffled order is maintained if we shuffle
first and then add any missing elements
Utils::add(m_drillbits, drillbits);
+ if (m_drillbits.empty()){
+ return handleConnError(CONN_FAILURE,
getMessage(ERR_CONN_ZKNODBIT));
--- End diff --
Since we are not removing the offline nodes from m_drillbits then I think
we should return connection error before shuffle. Let's say on first client
connection we get all the active node from zookeeper and store it in
m_drillbits. Then all the nodes went dead or offline. In the next connection
request, zookeeper will return zero drillbits but since m_drillbits is not
empty we will still try to connect and fail later.
Instead I think zero drillbits returned from zookeeper is a good indication
that we won't be able to connect to any other node already present inside
m_drillbits and should fail there itself ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---