[jira] [Commented] (ASTERIXDB-2517) Ingestion process failed on a cluster with two machines.
[ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830735#comment-16830735 ]

Ian Maxon commented on ASTERIXDB-2517:
--------------------------------------

Are these the ochca machines? It could be the same case Wail saw: if something is fuzzing ports, the shutdown API only requires an HTTP POST to be activated. There's no authentication.

> Ingestion process failed on a cluster with two machines.
>
>
> Key: ASTERIXDB-2517
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2517
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Priority: Major
> Attachments: cc.log, nc-1.log
>
>
> We have a cluster with two machines. Out of 1.5 billion records, about 1.2
> billion records were ingested using a socket adapter. However, NC-1,
> which is located on the same machine, was shut down. The time was around 19:21
> (please see the log records around that time). I have attached the log
> records.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
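To make Ian's point about the missing authentication concrete, here is a minimal Java sketch. `ShutdownGuard` and its `decide` method are hypothetical and are not AsterixDB's actual shutdown handler. The status codes and the bearer-token scheme are illustrative assumptions, not an existing AsterixDB setting. Constructed without a token, the guard accepts any POST, which matches the behavior described above: a port fuzzer that happens to send a POST would trigger a shutdown. Constructed with a shared token, a blind POST is rejected.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical model of an admin "shutdown" endpoint; NOT AsterixDB's real code.
// decide(...) returns the HTTP status such an endpoint would answer with.
class ShutdownGuard {
    private static final String BEARER = "Bearer ";
    private final byte[] expectedToken; // null means "no authentication at all"

    ShutdownGuard(String token) {
        this.expectedToken = token == null ? null : token.getBytes(StandardCharsets.UTF_8);
    }

    int decide(String method, String authorizationHeader) {
        if (!"POST".equals(method)) {
            return 405; // only POST may trigger a shutdown
        }
        if (expectedToken == null) {
            return 202; // unauthenticated: ANY POST (e.g. from a port fuzzer) is accepted
        }
        if (authorizationHeader == null || !authorizationHeader.startsWith(BEARER)) {
            return 401; // a blind POST without credentials stops here
        }
        byte[] presented = authorizationHeader.substring(BEARER.length())
                .getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual does not stop at the first differing byte,
        // which avoids leaking the token through response timing
        return MessageDigest.isEqual(expectedToken, presented) ? 202 : 403;
    }
}
```

In the token-less configuration, the request method is the only thing between a random scanner and a node shutdown.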
[ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830701#comment-16830701 ]

Wail Alkowaileet commented on ASTERIXDB-2517:
---------------------------------------------

OK, my machine was under attack. Someone was trying to gain access via SSH:

Apr 30 12:44:00 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2
Apr 30 12:44:02 wail-desktop sshd[18275]: Failed password for root from 218.92.0.141 port 17552 ssh2
Apr 30 12:44:02 wail-desktop sshd[18277]: Failed password for root from 218.92.0.141 port 18157 ssh2
Apr 30 12:44:03 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2
Apr 30 12:44:04 wail-desktop sshd[18275]: Failed password for root from 218.92.0.141 port 17552 ssh2
Apr 30 12:44:04 wail-desktop sshd[18277]: Failed password for root from 218.92.0.141 port 18157 ssh2
Apr 30 12:44:06 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2
Apr 30 12:44:07 wail-desktop sshd[18275]: Failed password for root from 218.92.0.141 port 17552 ssh2
Apr 30 12:44:07 wail-desktop sshd[18277]: Failed password for root from 218.92.0.141 port 18157 ssh2
Apr 30 12:44:08 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2

I changed my port, and the problem seems to be gone. Before that, I was getting the error very frequently, almost every 20 minutes.
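A quick way to confirm an attack like this is to tally the failed attempts per source address. The following Java sketch does that for `Failed password` lines. The `FailedSshTally` class is hypothetical, and the regular expression assumes the OpenSSH `sshd` message format shown in the log excerpt above. Applied to those ten lines, it counts 4 attempts from 218.92.0.179 and 6 from 218.92.0.141.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "Failed password" sshd log lines per source IP.
// Assumes the OpenSSH message format quoted in the comment above.
class FailedSshTally {
    // group(1) captures the source address after "from"
    private static final Pattern FAILED = Pattern.compile(
            "Failed password for (?:invalid user )?\\S+ from (\\S+) port \\d+");

    static Map<String, Integer> countBySource(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by IP string for stable output
        for (String line : lines) {
            Matcher m = FAILED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```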
[ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830654#comment-16830654 ]

Wail Alkowaileet commented on ASTERIXDB-2517:
---------------------------------------------

I got the same issue as well, except that there is no ingestion in my case. I have a similar configuration (one CC and one NC) on the same machine.

{code:java}
12:45:14.422 [TCPEndpoint IO Thread [null]] ERROR org.apache.hyracks.net.protocols.tcp.TCPEndpoint - Unexpected tcp io error in connection TCPConnection[Remote Address: /127.0.0.1:36955 Local Address: null]
org.apache.hyracks.api.exceptions.NetException: Socket Closed
	at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:342) ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:113) ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:186) [hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.244 [Executor-3:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 1 to shutdown to ensure failure
12:46:04.245 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 1 considered dead. Last heartbeat received 50558ms ago. Max miss period: 5ms
12:46:04.245 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of affected jobs: 1
12:46:04.249 [Executor-3:ClusterController] WARN org.apache.hyracks.ipc.impl.ReconnectingIPCHandle - ipcHandle IPCHandle [addr=/127.0.0.1:44551 state=CLOSED] disconnected; will attempt to reconnect 1 times
12:46:04.251 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Exception finishing connect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_201]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_201]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.finishConnect(IPCConnectionManager.java:239) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.processSelectedKeys(IPCConnectionManager.java:229) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.doRun(IPCConnectionManager.java:200) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.run(IPCConnectionManager.java:181) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.256 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Failed to finish connect to /127.0.0.1:44551
12:46:04.257 [Executor-3:ClusterController] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Connection to /127.0.0.1:44551 failed; retrying (retry attempt 1 of 1) after 100ms
12:46:04.265 [Worker:ClusterController] ERROR org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected failure. Aborting job JID:0.13
org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1 does not exist
	at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57) ~[hyracks-api-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60) [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [hyracks-control-common-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.hyracks.bootstrap.ClusterLifecycleListener - NC: 1 left
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager - Removing configuration parameters for node id 1
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateM