[jira] [Commented] (ASTERIXDB-2517) Ingestion process failed on a cluster with two machines.

2019-04-30 Ian Maxon (JIRA)


[ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830735#comment-16830735 ]

Ian Maxon commented on ASTERIXDB-2517:
--

Are these the ochca machines?
It could be the same thing Wail saw: if something is fuzzing ports, it can hit
the shutdown API, which only requires an HTTP POST to be activated. There's no
authentication.
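The risk here is that anything able to reach the port can stop a node with a bare POST. As a minimal sketch of one possible mitigation, a shutdown handler could refuse requests that do not carry a shared secret. The class name, the `Bearer` header scheme, and the secret are all assumptions for illustration; this is not AsterixDB's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/**
 * Hypothetical authentication check for an HTTP shutdown endpoint.
 * Not AsterixDB code: the "Bearer" header scheme and the shared secret are assumptions.
 */
final class ShutdownGuard {
    private static final String BEARER_PREFIX = "Bearer ";
    private final byte[] expectedToken;

    ShutdownGuard(String sharedSecret) {
        this.expectedToken = sharedSecret.getBytes(StandardCharsets.UTF_8);
    }

    /** Allows shutdown only for a POST whose Authorization header carries the shared secret. */
    boolean isShutdownAllowed(String httpMethod, String authorizationHeader) {
        if (!"POST".equals(httpMethod) || authorizationHeader == null
                || !authorizationHeader.startsWith(BEARER_PREFIX)) {
            return false; // wrong method, or no credentials presented at all
        }
        byte[] presented = authorizationHeader.substring(BEARER_PREFIX.length())
                .getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual avoids leaking, via timing, how many leading bytes matched.
        return MessageDigest.isEqual(expectedToken, presented);
    }

    public static void main(String[] args) {
        ShutdownGuard guard = new ShutdownGuard("s3cret");
        System.out.println(guard.isShutdownAllowed("POST", "Bearer s3cret")); // true
        System.out.println(guard.isShutdownAllowed("POST", null));            // false: bare POST, e.g. from a port scanner
        System.out.println(guard.isShutdownAllowed("GET", "Bearer s3cret"));  // false
    }
}
```

Binding the admin port to a private interface or firewalling it would be a complementary, configuration-level option; the sketch only shows the request-level check.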

> Ingestion process failed on a cluster with two machines.
> 
>
> Key: ASTERIXDB-2517
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2517
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Priority: Major
> Attachments: cc.log, nc-1.log
>
>
> We have a cluster with two machines. Out of 1.5 billion records, about 1.2
> billion were ingested using a socket adapter. However, NC-1, which is located
> on the same machine, was shut down at around 19:21 (please see the log
> records around that time). I have attached the log files.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ASTERIXDB-2517) Ingestion process failed on a cluster with two machines.

2019-04-30 Wail Alkowaileet (JIRA)


[ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830701#comment-16830701 ]

Wail Alkowaileet commented on ASTERIXDB-2517:
-

 

OK, my machine was under attack. Someone was trying to gain access via SSH:

Apr 30 12:44:00 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2
Apr 30 12:44:02 wail-desktop sshd[18275]: Failed password for root from 218.92.0.141 port 17552 ssh2
Apr 30 12:44:02 wail-desktop sshd[18277]: Failed password for root from 218.92.0.141 port 18157 ssh2
Apr 30 12:44:03 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2
Apr 30 12:44:04 wail-desktop sshd[18275]: Failed password for root from 218.92.0.141 port 17552 ssh2
Apr 30 12:44:04 wail-desktop sshd[18277]: Failed password for root from 218.92.0.141 port 18157 ssh2
Apr 30 12:44:06 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2
Apr 30 12:44:07 wail-desktop sshd[18275]: Failed password for root from 218.92.0.141 port 17552 ssh2
Apr 30 12:44:07 wail-desktop sshd[18277]: Failed password for root from 218.92.0.141 port 18157 ssh2
Apr 30 12:44:08 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2

 

I changed my port, and the problem seems to be gone. Before that, I was getting
the error frequently, almost every 20 minutes.
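To gauge how concentrated an attack like this is, the `Failed password` lines quoted above can be tallied per source address. A minimal sketch, assuming the OpenSSH auth-log line format shown in that excerpt; the class and method names are illustrative and not part of AsterixDB.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Tallies sshd "Failed password" lines per source IP (illustrative helper, not part of AsterixDB). */
final class FailedSshLogins {
    // Assumes the OpenSSH auth-log format seen in the excerpt, e.g.
    // "... sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2"
    private static final Pattern FAILED_PASSWORD = Pattern.compile(
            "sshd\\[\\d+\\]: Failed password for (?:invalid user )?\\S+ from (\\S+) port \\d+");

    /** Returns source IP -> number of failed password attempts, ordered by IP string. */
    static Map<String, Integer> countBySource(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = FAILED_PASSWORD.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum); // group 1 is the source address
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Apr 30 12:44:00 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2",
                "Apr 30 12:44:02 wail-desktop sshd[18275]: Failed password for root from 218.92.0.141 port 17552 ssh2",
                "Apr 30 12:44:02 wail-desktop sshd[18277]: Failed password for root from 218.92.0.141 port 18157 ssh2",
                "Apr 30 12:44:03 wail-desktop sshd[18273]: Failed password for root from 218.92.0.179 port 24645 ssh2");
        System.out.println(countBySource(sample)); // {218.92.0.141=2, 218.92.0.179=2}
    }
}
```

Applied to all ten quoted lines, this would report 6 failed attempts from 218.92.0.141 and 4 from 218.92.0.179.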



[jira] [Commented] (ASTERIXDB-2517) Ingestion process failed on a cluster with two machines.

2019-04-30 Wail Alkowaileet (JIRA)


[ https://issues.apache.org/jira/browse/ASTERIXDB-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830654#comment-16830654 ]

Wail Alkowaileet commented on ASTERIXDB-2517:
-

I hit the same issue, except that there is no ingestion in my case. I have a
similar configuration (one CC and one NC) on the same machine.
{code:java}
12:45:14.422 [TCPEndpoint IO Thread [null]] ERROR org.apache.hyracks.net.protocols.tcp.TCPEndpoint - Unexpected tcp io error in connection TCPConnection[Remote Address: /127.0.0.1:36955 Local Address: null]
org.apache.hyracks.api.exceptions.NetException: Socket Closed
	at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:342) ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:113) ~[hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:186) [hyracks-net-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.244 [Executor-3:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - Requesting node 1 to shutdown to ensure failure
12:46:04.245 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.cluster.NodeManager - 1 considered dead. Last heartbeat received 50558ms ago. Max miss period: 5ms
12:46:04.245 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.work.RemoveDeadNodesWork - Number of affected jobs: 1
12:46:04.249 [Executor-3:ClusterController] WARN org.apache.hyracks.ipc.impl.ReconnectingIPCHandle - ipcHandle IPCHandle [addr=/127.0.0.1:44551 state=CLOSED] disconnected; will attempt to reconnect 1 times
12:46:04.251 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Exception finishing connect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_201]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_201]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.finishConnect(IPCConnectionManager.java:239) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.processSelectedKeys(IPCConnectionManager.java:229) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.doRun(IPCConnectionManager.java:200) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.run(IPCConnectionManager.java:181) [hyracks-ipc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.256 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:1099]] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Failed to finish connect to /127.0.0.1:44551
12:46:04.257 [Executor-3:ClusterController] WARN org.apache.hyracks.ipc.impl.IPCConnectionManager - Connection to /127.0.0.1:44551 failed; retrying (retry attempt 1 of 1) after 100ms
12:46:04.265 [Worker:ClusterController] ERROR org.apache.hyracks.control.cc.executor.JobExecutor - Unexpected failure. Aborting job JID:0.13
org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1 does not exist
	at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57) ~[hyracks-api-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.cc.work.RemoveDeadNodesWork.run(RemoveDeadNodesWork.java:60) [hyracks-control-cc-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
	at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [hyracks-control-common-0.3.5-SNAPSHOT.jar:0.3.5-SNAPSHOT]
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.hyracks.bootstrap.ClusterLifecycleListener - NC: 1 left
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager - Removing configuration parameters for node id 1
12:46:04.265 [Worker:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager -