RE: RPC timeout error for AES based encryption between driver and executor

2019-03-27 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi Vanzin,

"spark.authenticate" is working properly for our environment (Spark 2.4 on 
Kubernetes).
We have made few code changes through which secure communication between driver 
and executor is working fine using shared spark.authenticate.secret.

Even SASL encryption works but when we set, 
spark.network.crypto.enabled true
to enable AES based encryption, we see RPC timeout error message sporadically.

Kind Regards,
Breeta


-Original Message-
From: Marcelo Vanzin  
Sent: Tuesday, March 26, 2019 9:10 PM
To: Sinha, Breeta (Nokia - IN/Bangalore) 
Cc: user@spark.apache.org
Subject: Re: RPC timeout error for AES based encryption between driver and 
executor

I don't think "spark.authenticate" works properly with k8s in 2.4 (which would 
make it impossible to enable encryption since it requires authentication). I'm 
pretty sure I fixed it in master, though.




--
Marcelo


Re: RPC timeout error for AES based encryption between driver and executor

2019-03-26 Thread Marcelo Vanzin
I don't think "spark.authenticate" works properly with k8s in 2.4
(which would make it impossible to enable encryption since it requires
authentication). I'm pretty sure I fixed it in master, though.




-- 
Marcelo




RPC timeout error for AES based encryption between driver and executor

2019-03-26 Thread Sinha, Breeta (Nokia - IN/Bangalore)
Hi All,

We are trying to enable RPC encryption between driver and executor. Currently 
we're working on Spark 2.4 on Kubernetes.

According to the Apache Spark security document 
(https://spark.apache.org/docs/latest/security.html) and our understanding of 
it, Spark supports AES-based encryption for RPC connections. There is also 
support for SASL-based encryption, although it should be considered deprecated.

Setting spark.network.crypto.enabled to true enables AES-based RPC encryption.
However, when we enable AES-based encryption between driver and executor, we 
observe very sporadic behaviour in the communication between driver and 
executor in the logs.

Following are the options and the values we used for enabling encryption 
(also shown as a SparkConf sketch after the list):

spark.authenticate true
spark.authenticate.secret 
spark.network.crypto.enabled true
spark.network.crypto.keyLength 256
spark.network.crypto.saslFallback false
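
As referenced above, a minimal SparkConf sketch of the same settings (the 
secret string is a placeholder, not our real value):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "placeholder-secret") // placeholder only
  .set("spark.network.crypto.enabled", "true")
  .set("spark.network.crypto.keyLength", "256")
  .set("spark.network.crypto.saslFallback", "false")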

A snippet of the executor log is provided below:
Exception in thread "main" 19/02/26 07:27:08 ERROR RpcOutboxMessage: Ask 
timeout before connecting successfully
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from 
sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 in 120 seconds

However, no error or any other message from the executor appears in the driver 
log at the same timestamp.

We also tried increasing spark.network.timeout, but no luck.

This issue is seen sporadically; the following observations were noted:
1) Sometimes, enabling AES encryption works completely fine.
2) Sometimes, enabling AES encryption works fine for around 10 consecutive 
spark-submits, but the next spark-submit hangs with the above-mentioned error 
in the executor log.
3) There are also times when enabling AES encryption does not work at all: it 
keeps spawning executors (more than 50), all of which fail with the 
above-mentioned error.
Even setting spark.network.crypto.saslFallback to true didn't help.

Things work fine when we enable SASL encryption, that is, when we set only 
the following parameters:
spark.authenticate true
spark.authenticate.secret 

I have attached the log file containing the detailed error message. Please let 
us know if any configuration is missing or if anyone has faced the same issue.

Any leads would be highly appreciated!

Kind Regards,
Breeta Sinha



Attachment: rpc_timeout_error.log


Help: Get Timeout error and FileNotFoundException when shuffling large files

2015-12-10 Thread kendal
Hi there, 
My application is simple: it just reads huge files from HDFS with textFile(), 
maps them to tuples, then does a reduceByKey(), and finally saveAsTextFile().
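
For concreteness, a minimal Scala sketch of that shape of job (the HDFS paths 
and the tab-separated key parsing are placeholders, not my actual code):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("shuffle-repro"))
sc.textFile("hdfs:///data/huge-input")          // placeholder input path
  .map { line =>
    val key = line.split('\t')(0)               // hypothetical key field
    (key, 1L)
  }
  .reduceByKey(_ + _)
  .saveAsTextFile("hdfs:///data/output")        // placeholder output path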

The problem is that with large inputs (2.5 TB), when the application enters 
the second stage -- reduce by key -- it fails with a FileNotFoundException 
while trying to fetch the temporary shuffle files. I also see a timeout 
(120 s) error before that exception. There is no other exception or error 
(OOM, too many files, etc.).

I have done a lot of Google searches and tried increasing executor memory, 
repartitioning the RDD into more splits, etc., but in vain. 
I also found another post here:
http://permalink.gmane.org/gmane.comp.lang.scala.spark.user/5449
which describes exactly the same problem as mine. 

Any idea? Thanks so much for the help.






Re: Help: Get Timeout error and FileNotFoundException when shuffling large files

2015-12-10 Thread Sudhanshu Janghel
Can you please paste the stack trace?

Sudhanshu


Re: Help: Get Timeout error and FileNotFoundException when shuffling large files

2015-12-10 Thread manasdebashiskar
Is that the only kind of error you are getting?
Is it possible something else dies and gets buried in other messages?
Try repairing HDFS (fsck etc.) to find out if the blocks are intact.

A few things to check:
1) Do you have too many small files?
2) Is your system complaining about too many inodes, etc.?
3) Try a smaller set while increasing the data set size, to confirm it is a
data-volume-related problem (see the sketch after this list).
4) If you have monitoring turned on, watch the CPU and disk I/O of your
driver and worker machines.
5) Have you tried increasing driver memory? (More partitions mean the driver
needs more memory to keep the metadata.)
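
For point 3, a minimal sketch of such a scaling test (this assumes an
existing SparkContext sc; the input path and the key parsing are
placeholders, not the poster's actual job):

for (fraction <- Seq(0.01, 0.1, 0.5, 1.0)) {
  sc.textFile("hdfs:///data/huge-input")                 // placeholder path
    .sample(withReplacement = false, fraction, seed = 42L)
    .map(line => (line.split('\t')(0), 1L))              // hypothetical key
    .reduceByKey(_ + _)
    .count()                                             // forces the shuffle
}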

..Manas








Re: Timeout Error

2015-04-30 Thread ๏̯͡๏
I am facing the same issue. Do you have any solution?


Re: Timeout Error

2015-04-27 Thread Deepak Gopalakrishnan
Hello All,

I dug a little deeper and found this error:

15/04/27 16:05:39 WARN TransportChannelHandler: Exception in
connection from /10.1.0.90:40590
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
15/04/27 16:05:39 ERROR TransportRequestHandler: Error sending result
ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=45314884029,
chunkIndex=0}, buffer=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0
lim=26227673 cap=26227673]}} to /10.1.0.90:40590; closing connection
java.nio.channels.ClosedChannelException
15/04/27 16:05:39 ERROR TransportRequestHandler: Error sending result
RpcResponse{requestId=8439869725098873668, response=[B@1bdcdf63} to
/10.1.0.90:40590; closing connection
java.nio.channels.ClosedChannelException
15/04/27 16:05:39 ERROR CoarseGrainedExecutorBackend: Driver
Disassociated [akka.tcp://sparkexecu...@master.spark.com:60802] ->
[akka.tcp://sparkdri...@master.spark.com:37195] disassociated!
Shutting down.
15/04/27 16:05:39 WARN ReliableDeliverySupervisor: Association with
remote system [akka.tcp://sparkdri...@master.spark.com:37195] has
failed, address is now gated for [5000] ms. Reason is:
[Disassociated].



Timeout Error

2015-04-26 Thread Deepak Gopalakrishnan
Hello All,

I'm trying to process a 3.5 GB file in standalone mode using Spark. I could
run my Spark job successfully on a 100 MB file and it works as expected. But
when I try to run it on the 3.5 GB file, I run into the below error:


15/04/26 12:45:50 INFO BlockManagerMaster: Updated info of block taskresult_83
15/04/26 12:46:46 WARN AkkaUtils: Error sending message [message =
Heartbeat(2,[Lscala.Tuple2;@790223d3,BlockManagerId(2,
master.spark.com, 39143))] in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427)
15/04/26 12:47:15 INFO MemoryStore: ensureFreeSpace(26227673) called
with curMem=265897, maxMem=5556991426
15/04/26 12:47:15 INFO MemoryStore: Block taskresult_92 stored as
bytes in memory (estimated size 25.0 MB, free 5.2 GB)
15/04/26 12:47:16 INFO MemoryStore: ensureFreeSpace(26272879) called
with curMem=26493570, maxMem=5556991426
15/04/26 12:47:16 INFO MemoryStore: Block taskresult_94 stored as
bytes in memory (estimated size 25.1 MB, free 5.1 GB)
15/04/26 12:47:18 INFO MemoryStore: ensureFreeSpace(26285327) called
with curMem=52766449, maxMem=5556991426


and the job fails.


I'm on AWS and have opened all ports. Also, since the 100 MB file works, it
should not be a connection issue. I have one r3.xlarge and two m3.large
instances.

Can anyone suggest a way to fix this?

-- 
Regards,
*Deepak Gopalakrishnan*
*Mobile*:+918891509774
*Skype* : deepakgk87
http://myexps.blogspot.com


Re: Timeout Error

2015-04-26 Thread Bryan Cutler
I'm not sure what the expected performance should be for this amount of
data, but you could try to increase the timeout with the property
spark.akka.timeout to see if that helps.

Bryan





Re: Timeout Error

2015-04-26 Thread Deepak Gopalakrishnan
Hello,


Just to add a bit more context:

I have set the timeout in the code, but I cannot see it change from 30 seconds
in the log.

.set("spark.executor.memory", "10g")
.set("spark.driver.memory", "20g")
.set("spark.akka.timeout", "6000")

PS: I understand that 6000 is quite large, but I'm just trying to see if it
actually changes.


Here is the command that I'm running:

sudo MASTER=spark://master.spark.com:7077 \
  /opt/spark/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class <class-name> \
  --executor-memory 20G --driver-memory 10G --deploy-mode client \
  --conf spark.akka.timeout=6000 --conf spark.akka.askTimeout=6000 \
  <jar file path>


and here is how I load the file:

JavaPairRDD<String, String> learningRdd = sc.wholeTextFiles(filePath, 10);

Thanks


-- 
Regards,
*Deepak Gopalakrishnan*
*Mobile*:+918891509774
*Skype* : deepakgk87
http://myexps.blogspot.com


Re: Timeout Error

2015-04-26 Thread Shixiong Zhu
The configuration key should be spark.akka.askTimeout for this timeout.
The time unit is seconds.
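
For example, a minimal sketch of setting it on a SparkConf (the value 600 is
illustrative only, not a recommendation):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.akka.askTimeout", "600") // illustrative value, in seconds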

Best Regards,
Shixiong(Ryan) Zhu
