Hi,

Is it possible to use a generator that produces more data than the total RAM across all machines as the input to sc.parallelize()? That is, after sc = SparkContext(), I would like to create an RDD via sc.parallelize(generator). When I create an RDD with sc.textFile(file), where the file is even larger than the data produced by the generator, everything works fine. Unfortunately, I need to use sc.parallelize(generator), and that causes the OS to kill my Spark job. I get only the following log output before the job is killed:

14/10/06 13:34:16 INFO spark.SecurityManager: Changing view acls to: root
14/10/06 13:34:16 INFO spark.SecurityManager: Changing modify acls to: root
14/10/06 13:34:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
14/10/06 13:34:16 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/10/06 13:34:16 INFO Remoting: Starting remoting
14/10/06 13:34:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ip-172-31-25-197.ec2.internal:41016]
14/10/06 13:34:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@ip-172-31-25-197.ec2.internal:41016]
14/10/06 13:34:17 INFO util.Utils: Successfully started service 'sparkDriver' on port 41016.
14/10/06 13:34:17 INFO spark.SparkEnv: Registering MapOutputTracker
14/10/06 13:34:17 INFO spark.SparkEnv: Registering BlockManagerMaster
14/10/06 13:34:17 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/spark-local-20141006133417-821e
14/10/06 13:34:17 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 42438.
14/10/06 13:34:17 INFO network.ConnectionManager: Bound socket to port 42438 with id = ConnectionManagerId(ip-172-31-25-197.ec2.internal,42438)
14/10/06 13:34:17 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
14/10/06 13:34:17 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/10/06 13:34:17 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-31-25-197.ec2.internal:42438 with 267.3 MB RAM
14/10/06 13:34:17 INFO storage.BlockManagerMaster: Registered BlockManager
14/10/06 13:34:17 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-c4edda1c-0949-490d-8ff3-10993727c523
14/10/06 13:34:17 INFO spark.HttpServer: Starting HTTP Server
14/10/06 13:34:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/06 13:34:17 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44768
14/10/06 13:34:17 INFO util.Utils: Successfully started service 'HTTP file server' on port 44768.
14/10/06 13:34:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/10/06 13:34:17 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/10/06 13:34:17 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
14/10/06 13:34:17 INFO ui.SparkUI: Started SparkUI at http://ec2-54-164-72-236.compute-1.amazonaws.com:4040
14/10/06 13:34:18 INFO util.Utils: Copying /root/generator_test.py to /tmp/spark-0bafac0c-6779-4910-b095-0ede226fa3ce/generator_test.py
14/10/06 13:34:18 INFO spark.SparkContext: Added file file:/root/generator_test.py at http://172.31.25.197:44768/files/generator_test.py with timestamp 1412602458065
14/10/06 13:34:18 INFO client.AppClient$ClientActor: Connecting to master spark://ec2-54-164-72-236.compute-1.amazonaws.com:7077...
14/10/06 13:34:18 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
14/10/06 13:34:18 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141006133418-0046
14/10/06 13:34:18 INFO client.AppClient$ClientActor: Executor added: app-20141006133418-0046/0 on worker-20141005074620-ip-172-31-30-40.ec2.internal-49979 (ip-172-31-30-40.ec2.internal:49979) with 1 cores
14/10/06 13:34:18 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141006133418-0046/0 on hostPort ip-172-31-30-40.ec2.internal:49979 with 1 cores, 512.0 MB RAM
14/10/06 13:34:18 INFO client.AppClient$ClientActor: Executor updated: app-20141006133418-0046/0 is now RUNNING
14/10/06 13:34:21 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-31-30-40.ec2.internal:50877/user/Executor#-1621441852] with ID 0
14/10/06 13:34:21 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-31-30-40.ec2.internal:34460 with 267.3 MB RAM
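For context, here is a minimal, self-contained sketch of what I think is going on (the generator, record format, and sizes below are placeholders, not my real job):

```python
import sys

# Hypothetical stand-in for my real generator: it yields records
# lazily, one at a time, so the generator object itself stays tiny
# even when the full sequence would not fit in RAM.
def generator(n):
    for i in range(n):
        yield "record-%d" % i

g = generator(10**5)
print(sys.getsizeof(g))  # the generator object is well under a kilobyte

# Materializing the sequence is what actually costs memory -- and I
# suspect this is what sc.parallelize(generator(...)) does on the
# driver before distributing anything:
data = list(generator(10**5))
print(len(data))

# The failing call from my job (needs a running Spark cluster, so it
# is commented out here):
# from pyspark import SparkContext
# sc = SparkContext(appName="generator_test")
# rdd = sc.parallelize(generator(10**9))
```

If parallelize() really does build the whole collection on the driver first, is there a recommended way to feed a large lazy sequence into an RDD without going through a file?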
Thank you in advance for any advice or suggestions.