Hi Shawn,

Are the logs on the participant or controller? what is the helix version?



On Wed, Apr 5, 2017 at 6:36 PM, Neutron Sharc <[email protected]>
wrote:

> Hi all,
>
> We are testing a failure recovery scenario where I have many resources
> spanning many participants.  I shutdown all participants and helix
> admins, wait a while, then add each participant back into cluster.
> (zookeeper is on a separate cluster, not affected by shtudown)  During
> the recovery, it seems controller generates too many messages, and
> there so many exceptions.  Below are some examples.
>
> Are these exceptions expected?   Any comments are highly appreciated.
> Thanks.
>
>
> [ERROR 2017-04-05 14:26:17,734
> org.apache.helix.manager.zk.ZkBaseDataAccessor:303] Exception while
> updating path: /yy_cluster_name/INSTANCES/P60505029461/ERRORS/
> 1002d87a25a0589/USER_DEFINE_MSG/15ae0bd8-10
> 1f-4af3-acc3-36a486af4f4c
> org.I0Itec.zkclient.exception.ZkInterruptedException:
> java.lang.InterruptedException
>        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> java:687)
>        at org.apache.helix.manager.zk.ZkClient.readData(ZkClient.java:240)
>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>        at org.apache.helix.manager.zk.ZkBaseDataAccessor.doUpdate(
> ZkBaseDataAccessor.java:273)
>        at org.apache.helix.manager.zk.ZkBaseDataAccessor.update(
> ZkBaseDataAccessor.java:245)
>        at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(
> ZKHelixDataAccessor.java:150)
>        at org.apache.helix.util.StatusUpdateUtil.publishErrorRecord(
> StatusUpdateUtil.java:501)
>        at org.apache.helix.util.StatusUpdateUtil.
> publishStatusUpdateRecord(StatusUpdateUtil.java:435)
>        at org.apache.helix.util.StatusUpdateUtil.
> logMessageStatusUpdateRecord(StatusUpdateUtil.java:334)
>        at org.apache.helix.util.StatusUpdateUtil.logError(
> StatusUpdateUtil.java:342)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:163)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:42)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>        at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException
>        at java.lang.Object.wait(Native Method)
>        at java.lang.Object.wait(Object.java:502)
>        at org.apache.zookeeper.ClientCnxn.submitRequest(
> ClientCnxn.java:1344)
>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:925)
>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
>        at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
>        at org.apache.helix.manager.zk.ZkClient$4.call(ZkClient.java:244)
>        at org.apache.helix.manager.zk.ZkClient$4.call(ZkClient.java:240)
>        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> java:675)
>        ... 15 more
>
>
> [ERROR 2017-04-05 14:26:17,676
> org.apache.helix.messaging.handling.HelixTask:162] Exception after
> executing a message, msgId:
> 35e73c64-8fd3-4fb8-b0b8-419eacfa91a0org.I0Itec.zkclient.exception.
> ZkInterrupte
> dException: java.lang.InterruptedException
> org.I0Itec.zkclient.exception.ZkInterruptedException:
> java.lang.InterruptedException
>        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> java:687)
>        at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.
> java:212)
>        at org.I0Itec.zkclient.ZkClient.deleteRecursive(ZkClient.java:505)
>        at org.apache.helix.manager.zk.ZkBaseDataAccessor.remove(
> ZkBaseDataAccessor.java:537)
>        at org.apache.helix.manager.zk.ZKHelixDataAccessor.removeProperty(
> ZKHelixDataAccessor.java:271)
>        at org.apache.helix.messaging.handling.HelixTask.
> removeMessageFromZk(HelixTask.java:187)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:150)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:42)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>        at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException
>        at java.lang.Object.wait(Native Method)
>        at java.lang.Object.wait(Object.java:502)
>        at org.apache.zookeeper.ClientCnxn.submitRequest(
> ClientCnxn.java:1344)
>        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1247)
>        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1277)
>        at org.I0Itec.zkclient.ZkConnection.getChildren(
> ZkConnection.java:99)
>        at org.apache.helix.manager.zk.ZkClient$3.call(ZkClient.java:215)
>        at org.apache.helix.manager.zk.ZkClient$3.call(ZkClient.java:212)
>        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> java:675)
>        ... 11 more
>
> [ERROR 2017-04-05 14:26:18,321
> org.apache.helix.messaging.handling.HelixTask:102] Exception while
> executing a message. java.lang.NullPointerException msgId:
> eb79d5b0-4210-479e-b010-076764199059 type: USER
> _DEFINE_MSG
> java.lang.NullPointerException
>        at com.hcd.hcdadmin.CustomMessageHandlerFactory$
> CustomMessageHandler.handleMessage(CustomMessageHandlerFactory.java:149)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:85)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:42)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>        at java.lang.Thread.run(Thread.java:745)
>
> [3:06]
> [ERROR 2017-04-05 14:26:15,958
> org.apache.helix.messaging.handling.HelixTask:102] Exception while
> executing a message. java.lang.NullPointerException msgId:
> 5b05cd82-c596-4329-8484-28ac3ef40e80 type: USER
> _DEFINE_MSG
> java.lang.NullPointerException
>        at com.hcd.hcdadmin.CustomMessageHandlerFactory$
> CustomMessageHandler.handleMessage(CustomMessageHandlerFactory.java:149)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:85)
>        at org.apache.helix.messaging.handling.HelixTask.call(
> HelixTask.java:42)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>        at java.lang.Thread.run(Thread.java:745)
>

Reply via email to