Hi Arijit,

Since you cannot share the code, you may find the following issues helpful for working around this problem. We are hitting the same issue ourselves, and it is not yet resolved:
https://issues.apache.org/jira/browse/SYSTEMML-831

A possibly related Spark issue:
https://issues.apache.org/jira/browse/SPARK-6235

Have a closer look at this comment:
<https://issues.apache.org/jira/browse/SYSTEMML-831?focusedCommentId=15525147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15525147>

Maybe you can try tuning the driver configuration and/or the batch sizes; a sketch of the kind of settings to experiment with is below.
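Purely as a rough sketch (not a tested fix; the application name and the concrete values are placeholders I made up), this is the kind of driver-side tuning I mean. Your log says the 120-second limit is controlled by spark.rpc.askTimeout, so one experiment is to raise that timeout, the general network timeout, and the driver memory when the Spark session is created in the notebook:

    from pyspark.sql import SparkSession

    # Sketch only: placeholder values, adjust for your cluster.
    spark = (SparkSession.builder
             .appName("systemml-job")                     # hypothetical app name
             .config("spark.rpc.askTimeout", "600s")      # timeout named in your stack trace (120s there)
             .config("spark.network.timeout", "600s")     # broader network/RPC timeout
             .config("spark.driver.memory", "32g")        # more heap for the driver
             .config("spark.driver.maxResultSize", "4g")  # cap on results collected back to the driver
             .getOrCreate())

Note that spark.driver.memory only takes effect if it is set before the driver JVM starts, so if a SparkContext is already running in your notebook you would have to set it in spark-defaults.conf or on the pyspark/spark-submit command line instead.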
Cheers, Janardhan

On Sun, Jul 16, 2017 at 8:51 PM, arijit chakraborty <ak...@hotmail.com> wrote:

> Hi Janardhan,
>
> Thanks for your reply. For the time being, I can't share the actual code;
> it is still work in progress. But our data size is 28 MB, and it has 100
> continuous variables and one column with a numeric label variable.
>
> But thanks for guiding us that it's a Spark issue rather than a SystemML
> issue.
>
> Thank you! Regards,
>
> Arijit
>
> ________________________________
> From: Janardhan Pulivarthi <janardhan.pulivar...@gmail.com>
> Sent: Sunday, July 16, 2017 10:15:12 AM
> To: dev@systemml.apache.org
> Subject: Re: Error while Executing code in SystemML
>
> Hi Arijit,
>
> Can you please send the exact code (the .dml file) you have used, along
> with the dataset details and sizes? This problem has something to do with
> Apache Spark.
>
> Thanks, Janardhan
>
> On Sat, Jul 15, 2017 at 3:46 PM, arijit chakraborty <ak...@hotmail.com>
> wrote:
>
> > Hi,
> >
> > I'm suddenly getting this error while running the code in SystemML. For
> > a smaller number of data points it runs fine, but when I increase the
> > number of data points it throws this error. I'm using a system with
> > 244 GB RAM, 32 cores, and 100 GB of hard disk space, and I am setting
> > the PySpark configuration in the notebook only.
> >
> > 17/07/14 21:49:25 WARN TaskSetManager: Stage 394647 contains a task of very large size (558 KB). The maximum recommended task size is 100 KB.
> > 17/07/14 21:54:18 ERROR ContextCleaner: Error cleaning broadcast 431882
> > org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
> >         at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:151)
> >         at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:299)
> >         at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> >         at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60)
> >         at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:232)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:188)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:179)
> >         at scala.Option.foreach(Option.scala:257)
> >         at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:179)
> >         at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
> >         at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:172)
> >         at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:67)
> > Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
> >         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> >         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> >         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> >         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> >         at scala.concurrent.Await$.result(package.scala:190)
> >         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
> >         ... 12 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 431882 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN BlockManagerMaster: Failed to remove broadcast 432310 with removeFromMaster = true - Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> > org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
> >         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
> >         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> >         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> >         at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
> >         at scala.util.Try$.apply(Try.scala:192)
> >         at scala.util.Failure.recover(Try.scala:216)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> >         at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.complete(Promise.scala:55)
> >         at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
> >         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
> >         at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >         at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
> >         at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
> >         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
> >         at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> >         at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
> >         at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
> >         at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
> >         ... 8 more
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> > 17/07/14 21:54:18 WARN NettyRpcEnv: Ignored message: 0
> >
> >
> > Thank you!
> >
> > Arijit
> >
>