RE: RPC timeout error for AES based encryption between driver and executor
Hi Vanzin,

"spark.authenticate" is working properly in our environment (Spark 2.4 on Kubernetes). We have made a few code changes through which secure communication between driver and executor works fine using a shared spark.authenticate.secret. Even SASL encryption works, but when we set spark.network.crypto.enabled=true to enable AES-based encryption, we see the RPC timeout error sporadically.

Kind Regards,
Breeta

-----Original Message-----
From: Marcelo Vanzin
Sent: Tuesday, March 26, 2019 9:10 PM
To: Sinha, Breeta (Nokia - IN/Bangalore)
Cc: user@spark.apache.org
Subject: Re: RPC timeout error for AES based encryption between driver and executor

I don't think "spark.authenticate" works properly with k8s in 2.4 (which would make it impossible to enable encryption, since it requires authentication). I'm pretty sure I fixed it in master, though.

On Tue, Mar 26, 2019 at 2:29 AM Sinha, Breeta (Nokia - IN/Bangalore) wrote:
> Hi All,
> We are trying to enable RPC encryption between driver and executor.
> [...]

--
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: RPC timeout error for AES based encryption between driver and executor
I don't think "spark.authenticate" works properly with k8s in 2.4 (which would make it impossible to enable encryption, since it requires authentication). I'm pretty sure I fixed it in master, though.

On Tue, Mar 26, 2019 at 2:29 AM Sinha, Breeta (Nokia - IN/Bangalore) wrote:
> Hi All,
> We are trying to enable RPC encryption between driver and executor.
> [...]

--
Marcelo
RPC timeout error for AES based encryption between driver and executor
Hi All,

We are trying to enable RPC encryption between driver and executor. Currently we're working with Spark 2.4 on Kubernetes.

According to the Apache Spark Security document (https://spark.apache.org/docs/latest/security.html) and our understanding of it, Spark supports AES-based encryption for RPC connections. There is also support for SASL-based encryption, although it should be considered deprecated.

Setting spark.network.crypto.enabled=true enables AES-based RPC encryption. However, when we enable AES-based encryption between driver and executor, we observe very sporadic behaviour in the communication between them in the logs.

Following are the options, and the values we used, for enabling encryption:

spark.authenticate true
spark.authenticate.secret
spark.network.crypto.enabled true
spark.network.crypto.keyLength 256
spark.network.crypto.saslFallback false

A snippet of the executor log is provided below:

Exception in thread "main" 19/02/26 07:27:08 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply from sts-spark-thrift-server-1551165767426-driver-svc.default.svc:7078 in 120 seconds

But there is no error message, nor any message from the executor, in the driver log at the same timestamp.

We also tried increasing spark.network.timeout, but no luck.

The issue is seen sporadically; we have noted the following:
1) Sometimes, enabling AES encryption works completely fine.
2) Sometimes, enabling AES encryption works fine for around 10 consecutive spark-submits, but the next spark-submit hangs, with the above-mentioned error in the executor log.
3) At other times, enabling AES encryption does not work at all: it keeps spawning executors (more than 50), each of which fails with the above-mentioned error.

Even setting spark.network.crypto.saslFallback to true didn't help.

Things work fine when we enable only SASL encryption, that is, setting only the following parameters:

spark.authenticate true
spark.authenticate.secret

I have attached the log file containing the detailed error message. Please let us know if any configuration is missing, or if anyone has faced the same issue.

Any leads would be highly appreciated!!

Kind Regards,
Breeta Sinha

Attachment: rpc_timeout_error.log

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
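For reference, the options listed in the post could be passed on a Kubernetes spark-submit roughly as follows. This is only a sketch: the master URL, container image, and shared secret are placeholders, since none of them appear in the thread.

```
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.authenticate=true \
  --conf spark.authenticate.secret=<shared-secret> \
  --conf spark.network.crypto.enabled=true \
  --conf spark.network.crypto.keyLength=256 \
  --conf spark.network.crypto.saslFallback=false \
  ...
```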
Help: Get Timeout error and FileNotFoundException when shuffling large files
Hi there,

My application is quite simple: read huge files from HDFS with textFile(), map them to tuples, then reduceByKey(), and finally saveAsTextFile().

The problem is with large inputs (2.5 TB): when the application enters the 2nd stage, the reduceByKey, it fails with a FileNotFoundException when trying to fetch the temp shuffle files. I also see a timeout (120 s) error before that exception. There is no other exception or error (OOM, too many open files, etc.).

I have done a lot of searching, and tried increasing executor memory, repartitioning the RDD into more splits, etc., but in vain. I also found another post here: http://permalink.gmane.org/gmane.comp.lang.scala.spark.user/5449 which has exactly the same problem as mine.

Any idea? Thanks so much for the help.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Help-Get-Timeout-error-and-FileNotFoundException-when-shuffling-large-files-tp25662.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
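The two stages described above can be illustrated in plain Python (no Spark involved). The word-count mapping is a hypothetical stand-in for the real map function; the point is that reduceByKey merges all values sharing a key, and that per-key merge is what happens across the shuffle whose temp files fail to fetch here.

```python
# A plain-Python illustration (no Spark) of the two stages described above.
# The word-count mapping is a hypothetical stand-in for the real map function.

def map_stage(lines):
    # Stage 1: map each input line to (key, value) tuples.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_by_key(pairs, func):
    # Stage 2: merge all values sharing a key -- the per-key aggregation
    # that reduceByKey performs across the shuffle.
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc

result = reduce_by_key(map_stage(["a b a", "b c"]), lambda x, y: x + y)
# result == {"a": 2, "b": 2, "c": 1}
```

In Spark the same merge runs per reducer partition, after fetching each mapper's shuffle output over the network, which is why missing temp files surface only in this stage.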
Re: Help: Get Timeout error and FileNotFoundException when shuffling large files
Can you please paste the stack trace?

Sudhanshu
Re: Help: Get Timeout error and FileNotFoundException when shuffling large files
Is that the only kind of error you are getting? It is possible something else died and got buried in other messages. Try repairing HDFS (fsck etc.) to find out whether the blocks are intact.

A few things to check:
1) Whether you have too many small files.
2) Whether your system is complaining about too many inodes, etc.
3) Try a smaller set, increasing the data-set size gradually, to make sure it is a data-volume-related problem.
4) If you have monitoring turned on, look at your driver and worker machines' CPU and disk I/O.
5) Have you tried increasing driver memory? (More partitions means the driver needs more memory to keep the metadata.)

..Manas

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
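The tuning suggestions in this thread (more driver memory, more partitions, a longer network timeout than the 120-second default the OP hit) could be tried via spark-submit along these lines. The values are purely illustrative, and spark.network.timeout is an assumption here, since the thread does not name the exact property:

```
spark-submit \
  --conf spark.driver.memory=8g \
  --conf spark.default.parallelism=2000 \
  --conf spark.network.timeout=300s \
  <app jar>
```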
Re: Timeout Error
I am facing the same issue; do you have any solution?

On Mon, Apr 27, 2015 at 9:43 PM, Deepak Gopalakrishnan dgk...@gmail.com wrote:
> Hello All,
> I dug a little deeper and found this error:
> [...]
Re: Timeout Error
Hello All,

I dug a little deeper and found this error:

15/04/27 16:05:39 WARN TransportChannelHandler: Exception in connection from /10.1.0.90:40590
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)
15/04/27 16:05:39 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=45314884029, chunkIndex=0}, buffer=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=26227673 cap=26227673]}} to /10.1.0.90:40590; closing connection
java.nio.channels.ClosedChannelException
15/04/27 16:05:39 ERROR TransportRequestHandler: Error sending result RpcResponse{requestId=8439869725098873668, response=[B@1bdcdf63} to /10.1.0.90:40590; closing connection
java.nio.channels.ClosedChannelException
15/04/27 16:05:39 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkexecu...@master.spark.com:60802] -> [akka.tcp://sparkdri...@master.spark.com:37195] disassociated! Shutting down.
15/04/27 16:05:39 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkdri...@master.spark.com:37195] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

On Mon, Apr 27, 2015 at 8:35 AM, Shixiong Zhu zsxw...@gmail.com wrote:
> The configuration key should be spark.akka.askTimeout for this timeout. The time unit is seconds.
> [...]
Timeout Error
Hello All,

I'm trying to process a 3.5 GB file in standalone mode using Spark. I could run my Spark job successfully on a 100 MB file and it works as expected. But when I try to run it on the 3.5 GB file, I run into the below error:

15/04/26 12:45:50 INFO BlockManagerMaster: Updated info of block taskresult_83
15/04/26 12:46:46 WARN AkkaUtils: Error sending message [message = Heartbeat(2,[Lscala.Tuple2;@790223d3,BlockManagerId(2, master.spark.com, 39143))] in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427)
15/04/26 12:47:15 INFO MemoryStore: ensureFreeSpace(26227673) called with curMem=265897, maxMem=5556991426
15/04/26 12:47:15 INFO MemoryStore: Block taskresult_92 stored as bytes in memory (estimated size 25.0 MB, free 5.2 GB)
15/04/26 12:47:16 INFO MemoryStore: ensureFreeSpace(26272879) called with curMem=26493570, maxMem=5556991426
15/04/26 12:47:16 INFO MemoryStore: Block taskresult_94 stored as bytes in memory (estimated size 25.1 MB, free 5.1 GB)
15/04/26 12:47:18 INFO MemoryStore: ensureFreeSpace(26285327) called with curMem=52766449, maxMem=5556991426

and the job fails. I'm on AWS and have opened all ports. Also, since the 100 MB file works, it should not be a connection issue. I have one r3.xlarge and two m3.large instances.

Can anyone suggest a way to fix this?

--
Regards,
Deepak Gopalakrishnan
Mobile: +918891509774
Skype: deepakgk87
http://myexps.blogspot.com
Re: Timeout Error
I'm not sure what the expected performance should be for this amount of data, but you could try to increase the timeout with the property spark.akka.timeout to see if that helps.

Bryan

On Sun, Apr 26, 2015 at 6:57 AM, Deepak Gopalakrishnan dgk...@gmail.com wrote:
> Hello All,
> I'm trying to process a 3.5 GB file in standalone mode using Spark.
> [...]
Re: Timeout Error
Hello,

Just to add a bit more context: I have done that in the code, but I cannot see it change from 30 seconds in the log.

.set("spark.executor.memory", "10g")
.set("spark.driver.memory", "20g")
.set("spark.akka.timeout", "6000")

PS: I understand that 6000 is quite large, but I'm just trying to see if it actually changes.

Here is the command that I'm running:

sudo MASTER=spark://master.spark.com:7077 /opt/spark/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class <class-name> --executor-memory 20G --driver-memory 10G --deploy-mode client --conf spark.akka.timeout=6000 --conf spark.akka.askTimeout=6000 <jar file path>

and here is how I load the file:

JavaPairRDD<String, String> learningRdd = sc.wholeTextFiles(filePath, 10);

Thanks

On Mon, Apr 27, 2015 at 3:36 AM, Bryan Cutler cutl...@gmail.com wrote:
> I'm not sure what the expected performance should be for this amount of data, but you could try to increase the timeout with the property spark.akka.timeout to see if that helps.
> [...]

--
Regards,
Deepak Gopalakrishnan
Mobile: +918891509774
Skype: deepakgk87
http://myexps.blogspot.com
Re: Timeout Error
The configuration key should be spark.akka.askTimeout for this timeout. The time unit is seconds.

Best Regards,
Shixiong (Ryan) Zhu

2015-04-26 15:15 GMT-07:00 Deepak Gopalakrishnan dgk...@gmail.com:
> Hello,
> Just to add a bit more context: I have done that in the code, but I cannot see it change from 30 seconds in the log.
> [...]
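Putting Shixiong's correction together with the command line earlier in the thread, the fix amounts to setting the ask timeout explicitly. A sketch for Akka-based Spark 1.x follows; the value is illustrative, given in seconds as Shixiong notes:

```
# spark-defaults.conf fragment (Spark 1.x, Akka RPC); values illustrative
spark.akka.timeout       600
spark.akka.askTimeout    600
```

The same pair can be passed as --conf flags to spark-submit, as in Deepak's command above.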