lnbest0707-uber opened a new issue, #12732:
URL: https://github.com/apache/pinot/issues/12732
After servers with an invalid configuration join the cluster, for example:
`{
"id": "<some_id>",
"simpleFields": {
"HELIX_HOST": "<some_host>",
"HELIX_PORT": "",
},
"mapFields": {},
"listFields": {}
}`
The broker (even in other tenants) cannot build brokerResource correctly.
The entire cluster can no longer serve any queries and raises "**410
BrokerResourceMissingError**".
The following error may appear in the broker log:
`java.lang.NullPointerException: Cannot invoke
"java.util.Set.contains(Object)" because "this._enabledInstances" is null
at
o.a.p.b.r.i.BaseInstanceSelector.getEnabledCandidatesAndAddToServingInstances(BaseInstanceSelector.java:338)
at
o.a.p.b.r.i.BaseInstanceSelector.refreshSegmentStates(BaseInstanceSelector.java:294)
at o.a.p.b.r.i.BaseInstanceSelector.init(BaseInstanceSelector.java:117)
at
o.a.p.b.r.i.BalancedInstanceSelector.init(BalancedInstanceSelector.java:50)
at
o.a.p.b.r.BrokerRoutingManager.buildRouting(BrokerRoutingManager.java:450)
at
o.a.p.b.b.h.BrokerResourceOnlineOfflineStateModelFactory$BrokerResourceOnlineOfflineStateModel.onBecomeOnlineFromOffline(BrokerResourceOnlineOfflineStateModelFactory.java:80)
at
j.i.r.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.lang.reflect.Method.invoke(Method.java:580)
at
o.a.h.m.h.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
at o.a.h.m.h.HelixStateTransitionHandler.h...`
which indicates that this._enabledInstances was never initialized in
BaseInstanceSelector. This Set is initialized in the following method:
`
@Override
public void init(Set<String> enabledInstances, IdealState idealState,
ExternalView externalView,
Set<String> onlineSegments) {
_enabledInstances = enabledInstances;
Map<String, Long> newSegmentCreationTimeMap =
getNewSegmentCreationTimeMapFromZK(idealState, externalView,
onlineSegments);
updateSegmentMaps(idealState, externalView, onlineSegments,
newSegmentCreationTimeMap);
refreshSegmentStates();
}
`
It is called indirectly by
BrokerRoutingManager.processInstanceConfigChange().
If any exception is raised in that method before `_routableServers =
enabledServers;` executes, the exception may be caught, leaving the Set null.
Taking the above server config as an example, the broker hits the
following exception
`java.lang.NumberFormatException: For input string: "" at
java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:662) at
java.base/java.lang.Integer.parseInt(Integer.java:770) at
org.apache.pinot.core.transport.ServerInstance.<init>(ServerInstance.java:63)
at
org.apache.pinot.broker.routing.BrokerRoutingManager.processInstanceConfigChange(BrokerRoutingManager.java:245)
at
org.apache.pinot.broker.routing.BrokerRoutingManager.processClusterChange(BrokerRoutingManager.java:133)
at
org.apache.pinot.broker.broker.helix.ClusterChangeMediator.processClusterChange(ClusterChangeMediator.java:134)
at
org.apache.pinot.broker.broker.helix.ClusterChangeMediator.lambda$new$0(ClusterChangeMediator.java:96)
at java.base/java.lang.Thread.run(Thread.java:829)`
and returns early, skipping the rest of the update.
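The failure mode described above can be sketched in isolation. This is a minimal illustration, not Pinot's actual code: the class `NullSetRepro`, its field, and the instance names are hypothetical stand-ins, and the only behavior taken from the report is that `Integer.parseInt("")` throws before the set is assigned.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class NullSetRepro {
  // Stays null until one full pass over the instance configs succeeds.
  private Set<String> routableServers;

  // Hypothetical stand-in for the config-change handler:
  // a single malformed port aborts the whole pass.
  void processInstanceConfigChange(Map<String, String> portByInstance) {
    try {
      Set<String> enabledServers = new HashSet<>();
      for (Map.Entry<String, String> entry : portByInstance.entrySet()) {
        // Throws NumberFormatException for an empty string.
        Integer.parseInt(entry.getValue());
        enabledServers.add(entry.getKey());
      }
      // Never reached if any entry is malformed.
      routableServers = enabledServers;
    } catch (RuntimeException e) {
      // The exception is swallowed, so the caller sees no failure.
      System.out.println("Caught: " + e);
    }
  }

  Set<String> getRoutableServers() {
    return routableServers;
  }

  public static void main(String[] args) {
    Map<String, String> configs = new LinkedHashMap<>();
    configs.put("Server_good", "8098");
    configs.put("Server_bad", "");
    NullSetRepro repro = new NullSetRepro();
    repro.processInstanceConfigChange(configs);
    System.out.println("routableServers is null: " + (repro.getRoutableServers() == null));
  }
}
```

In this sketch the valid instance is lost together with the invalid one, and any later code that dereferences the set fails with a NullPointerException.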
Such behavior is very dangerous for a distributed system: a single
participant instance failure can bring down the entire cluster.
The following changes might be required:
- When server configs are updated through the controller API, sanity checks
and enforcement need to be in place.
- During broker initialization, safely create every object that must not be
null in the constructor. When constructing the mapping across servers, isolate
the bad configs and keep the good candidates functional.
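One possible shape for the two items above, as a sketch only: validate each port on its own, and collect rejected instances instead of aborting the whole pass. The class `SafeInstanceConfigParsing`, the helpers `parsePort` and `selectValid`, and the accepted port range of 1 to 65535 are assumptions for illustration, not part of Pinot.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SafeInstanceConfigParsing {
  // Returns the parsed port, or empty when the value is missing,
  // non-numeric, or outside the assumed range 1..65535.
  static Optional<Integer> parsePort(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return Optional.empty();
    }
    try {
      int port = Integer.parseInt(raw.trim());
      return (port >= 1 && port <= 65535) ? Optional.of(port) : Optional.empty();
    } catch (NumberFormatException e) {
      return Optional.empty();
    }
  }

  // Keeps the valid instances and records the rejected ones,
  // so one bad config cannot block the rest.
  static Map<String, Integer> selectValid(Map<String, String> portByInstance, List<String> rejected) {
    Map<String, Integer> valid = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : portByInstance.entrySet()) {
      Optional<Integer> port = parsePort(entry.getValue());
      if (port.isPresent()) {
        valid.put(entry.getKey(), port.get());
      } else {
        rejected.add(entry.getKey());
      }
    }
    return valid;
  }

  public static void main(String[] args) {
    Map<String, String> configs = new LinkedHashMap<>();
    configs.put("Server_good", "8098");
    configs.put("Server_bad", "");
    List<String> rejected = new ArrayList<>();
    Map<String, Integer> valid = selectValid(configs, rejected);
    System.out.println("valid=" + valid + ", rejected=" + rejected);
  }
}
```

The same `parsePort` check could also run on the controller side before a config update is accepted, so that an empty HELIX_PORT is rejected at write time rather than discovered by brokers.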