Dear all, I am trying to run Spark code on multiple machines by submitting a job on Google Cloud Platform. The inputs to my code are a training dataset and a testing dataset.
When I use a small training dataset (about 10 KB), the code runs successfully on Google Cloud. With a large dataset (about 50 GB), however, I receive the following error:

17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)

Can anyone give me a hint on how to solve this problem?

PS: I cannot use a smaller training dataset because my optimization code needs to use all of the data. I have to use Google Cloud Platform because I need to run the code on multiple machines.

Thanks a lot,
Anahita