Re: spark streaming failing to replicate blocks

2015-10-23 Thread Akhil Das
If you can reproduce, then i think you can open up a jira for this.

Thanks
Best Regards

On Fri, Oct 23, 2015 at 1:37 PM, Eugen Cepoi  wrote:

> When fixing the port to the same values as in the stack trace it works
> too. The network config of the slaves seems correct.
>
> Thanks,
> Eugen
>
> 2015-10-23 8:30 GMT+02:00 Akhil Das :
>
>> Mostly a network issue, you need to check your network configuration from
>> the aws console and make sure the ports are accessible within the cluster.
>>
>> Thanks
>> Best Regards
>>
>> On Thu, Oct 22, 2015 at 8:53 PM, Eugen Cepoi 
>> wrote:
>>
>>> Huh indeed this worked, thanks. Do you know why this happens, is that
>>> some known issue?
>>>
>>> Thanks,
>>> Eugen
>>>
>>> 2015-10-22 19:08 GMT+07:00 Akhil Das :
>>>
 Can you try fixing spark.blockManager.port to specific port and see if
 the issue exists?

 Thanks
 Best Regards

 On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi 
 wrote:

> Hi,
>
> I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
> The job is reading data from Kinesis and the batch size is of 30s (I
> used the same value for the kinesis checkpointing).
> In the executor logs I can see every 5 seconds a sequence of
> stacktraces indicating that the block replication failed. I am using the
> default storage level MEMORY_AND_DISK_SER_2.
> WAL is not enabled nor checkpointing (the checkpoint dir is configured
> for the spark context but not for the streaming context).
>
> Here is an example of those logs for ip-10-63-160-18. They occur in
> every executor while trying to replicate to any other executor.
>
>
> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
> [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection 
> to ip-10-63-160-18.ec2.internal/10.63.160.18:50929
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending 
> message.
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying 
> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error 
> on connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate 
> input-1-144524231 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 
> 50929), failure #0
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection 
> to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
> [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection 
> to 

Re: spark streaming failing to replicate blocks

2015-10-23 Thread Akhil Das
Mostly a network issue, you need to check your network configuration from
the aws console and make sure the ports are accessible within the cluster.

Thanks
Best Regards

On Thu, Oct 22, 2015 at 8:53 PM, Eugen Cepoi  wrote:

> Huh indeed this worked, thanks. Do you know why this happens, is that some
> known issue?
>
> Thanks,
> Eugen
>
> 2015-10-22 19:08 GMT+07:00 Akhil Das :
>
>> Can you try fixing spark.blockManager.port to specific port and see if
>> the issue exists?
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi 
>> wrote:
>>
>>> Hi,
>>>
>>> I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
>>> The job is reading data from Kinesis and the batch size is of 30s (I
>>> used the same value for the kinesis checkpointing).
>>> In the executor logs I can see every 5 seconds a sequence of stacktraces
>>> indicating that the block replication failed. I am using the default
>>> storage level MEMORY_AND_DISK_SER_2.
>>> WAL is not enabled nor checkpointing (the checkpoint dir is configured
>>> for the spark context but not for the streaming context).
>>>
>>> Here is an example of those logs for ip-10-63-160-18. They occur in
>>> every executor while trying to replicate to any other executor.
>>>
>>>
>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
>>> [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to 
>>> ip-10-63-160-18.ec2.internal/10.63.160.18:50929
>>> java.net.ConnectException: Connection refused
>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>> at 
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>> at 
>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>> at 
>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending 
>>> message.
>>> java.net.ConnectException: Connection refused
>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>> at 
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>> at 
>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>> at 
>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying 
>>> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on 
>>> connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate 
>>> input-1-144524231 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 
>>> 50929), failure #0
>>> java.net.ConnectException: Connection refused
>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>> at 
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>> at 
>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>> at 
>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection to 
>>> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
>>> [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to 
>>> ip-10-63-160-18.ec2.internal/10.63.160.18:39506
>>> java.net.ConnectException: Connection refused
>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>> at 
>>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>> at 
>>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>> at 
>>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> 

Re: spark streaming failing to replicate blocks

2015-10-23 Thread Eugen Cepoi
When fixing the port to the same values as in the stack trace it works too.
The network config of the slaves seems correct.

Thanks,
Eugen

2015-10-23 8:30 GMT+02:00 Akhil Das :

> Mostly a network issue, you need to check your network configuration from
> the aws console and make sure the ports are accessible within the cluster.
>
> Thanks
> Best Regards
>
> On Thu, Oct 22, 2015 at 8:53 PM, Eugen Cepoi 
> wrote:
>
>> Huh indeed this worked, thanks. Do you know why this happens, is that
>> some known issue?
>>
>> Thanks,
>> Eugen
>>
>> 2015-10-22 19:08 GMT+07:00 Akhil Das :
>>
>>> Can you try fixing spark.blockManager.port to specific port and see if
>>> the issue exists?
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi 
>>> wrote:
>>>
 Hi,

 I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
 The job is reading data from Kinesis and the batch size is of 30s (I
 used the same value for the kinesis checkpointing).
 In the executor logs I can see every 5 seconds a sequence of
 stacktraces indicating that the block replication failed. I am using the
 default storage level MEMORY_AND_DISK_SER_2.
 WAL is not enabled nor checkpointing (the checkpoint dir is configured
 for the spark context but not for the streaming context).

 Here is an example of those logs for ip-10-63-160-18. They occur in
 every executor while trying to replicate to any other executor.


 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
 [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection 
 to ip-10-63-160-18.ec2.internal/10.63.160.18:50929
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
 org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
 org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending 
 message.
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
 org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
 org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying 
 ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on 
 connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate 
 input-1-144524231 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 
 50929), failure #0
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
 org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
 org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection 
 to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
 [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection 
 to ip-10-63-160-18.ec2.internal/10.63.160.18:39506
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 

Re: spark streaming failing to replicate blocks

2015-10-22 Thread Akhil Das
Can you try fixing spark.blockManager.port to specific port and see if the
issue exists?

Thanks
Best Regards

On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi  wrote:

> Hi,
>
> I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
> The job is reading data from Kinesis and the batch size is of 30s (I used
> the same value for the kinesis checkpointing).
> In the executor logs I can see every 5 seconds a sequence of stacktraces
> indicating that the block replication failed. I am using the default
> storage level MEMORY_AND_DISK_SER_2.
> WAL is not enabled nor checkpointing (the checkpoint dir is configured for
> the spark context but not for the streaming context).
>
> Here is an example of those logs for ip-10-63-160-18. They occur in every
> executor while trying to replicate to any other executor.
>
>
> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
> [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to 
> ip-10-63-160-18.ec2.internal/10.63.160.18:50929
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending 
> message.
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying 
> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on 
> connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate 
> input-1-144524231 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 
> 50929), failure #0
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection to 
> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
> [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to 
> ip-10-63-160-18.ec2.internal/10.63.160.18:39506
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending 
> message.
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>   at 
> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>   at 
> 

Re: spark streaming failing to replicate blocks

2015-10-22 Thread Eugen Cepoi
Huh indeed this worked, thanks. Do you know why this happens, is that some
known issue?

Thanks,
Eugen

2015-10-22 19:08 GMT+07:00 Akhil Das :

> Can you try fixing spark.blockManager.port to specific port and see if the
> issue exists?
>
> Thanks
> Best Regards
>
> On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi 
> wrote:
>
>> Hi,
>>
>> I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
>> The job is reading data from Kinesis and the batch size is of 30s (I used
>> the same value for the kinesis checkpointing).
>> In the executor logs I can see every 5 seconds a sequence of stacktraces
>> indicating that the block replication failed. I am using the default
>> storage level MEMORY_AND_DISK_SER_2.
>> WAL is not enabled nor checkpointing (the checkpoint dir is configured
>> for the spark context but not for the streaming context).
>>
>> Here is an example of those logs for ip-10-63-160-18. They occur in every
>> executor while trying to replicate to any other executor.
>>
>>
>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
>> [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to 
>> ip-10-63-160-18.ec2.internal/10.63.160.18:50929
>> java.net.ConnectException: Connection refused
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at 
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>  at 
>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>  at 
>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending 
>> message.
>> java.net.ConnectException: Connection refused
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at 
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>  at 
>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>  at 
>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying 
>> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on 
>> connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate 
>> input-1-144524231 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 
>> 50929), failure #0
>> java.net.ConnectException: Connection refused
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at 
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>  at 
>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>  at 
>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection to 
>> ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to 
>> [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to 
>> ip-10-63-160-18.ec2.internal/10.63.160.18:39506
>> java.net.ConnectException: Connection refused
>>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>  at 
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>  at 
>> org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>  at 
>> org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending 
>> message.
>> java.net.ConnectException: Connection refused
>>  at 

spark streaming failing to replicate blocks

2015-10-19 Thread Eugen Cepoi
Hi,

I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
The job is reading data from Kinesis and the batch size is of 30s (I used
the same value for the kinesis checkpointing).
In the executor logs I can see every 5 seconds a sequence of stacktraces
indicating that the block replication failed. I am using the default
storage level MEMORY_AND_DISK_SER_2.
WAL is not enabled nor checkpointing (the checkpoint dir is configured for
the spark context but not for the streaming context).

Here is an example of those logs for ip-10-63-160-18. They occur in every
executor while trying to replicate to any other executor.


15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to
[ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing
connection to ip-10-63-160-18.ec2.internal/10.63.160.18:50929
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying
ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection
error on connection to
ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate
input-1-144524231 to BlockManagerId(3,
ip-10-159-151-22.ec2.internal, 50929), failure #0
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 INFO nio.ConnectionManager: Removing
SendingConnection to
ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to
[ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing
connection to ip-10-63-160-18.ec2.internal/10.63.160.18:39506
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
at 
org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/19 03:11:55 INFO