How to modify Hadoop APIs used by Spark?
Hi all, I've found that Spark uses some Hadoop APIs such as InputFormat, InputSplit, etc., and I want to modify these Hadoop APIs. Do you know how I can integrate my modified Hadoop code into Spark? Many thanks!
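One possible route, sketched below: Spark links against stock Hadoop through its Maven dependencies, so you can install your modified Hadoop artifacts into the local Maven repository under a custom version string and rebuild Spark against them with the `-Dhadoop.version` build flag that Spark's build supports. The version string `2.6.0-custom` here is an example, not a requirement, and this assumes you keep the public signatures of InputFormat/InputSplit source-compatible so Spark itself needs no changes:

```shell
# 1. In the modified Hadoop source tree: build and install the artifacts
#    into the local Maven repo. Set a custom version (e.g. 2.6.0-custom)
#    in the Hadoop POMs first so your build doesn't shadow a release.
mvn install -DskipTests

# 2. In the Spark source tree: rebuild Spark against that Hadoop version
#    using Spark's standard flag for overriding the Hadoop dependency.
./build/mvn -DskipTests -Dhadoop.version=2.6.0-custom clean package
```

If the signatures stay compatible, Spark's hadoopRDD / newAPIHadoopRDD code paths should pick up your modified classes without any change on the Spark side.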
Dynamic resource allocation in Standalone mode
Hi all, I am planning to dynamically increase or decrease the number of executors allocated to an application at runtime, similar to dynamic resource allocation, which is currently only available in Spark on YARN mode. Any suggestions on how to implement this feature in Standalone mode? My current problem is: I want to send an ADD_EXECUTOR command from the scheduler module (in CoarseGrainedSchedulerBackend.scala) to the deploy module (in Master.scala), but I don't know how to communicate between the two modules. Many thanks for any suggestions!
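The scheduler and deploy modules live in different processes, so they talk over Spark's RPC layer rather than by direct method calls; the usual shape of such a change is to define a new message type and have the Master handle it in its receive loop. The plain-Java sketch below is only an illustration of that request/handle pattern, with a blocking queue standing in for the RPC channel; the `AddExecutor` message and `Master` class here are hypothetical, not Spark's actual types:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AddExecutorSketch {
    // Hypothetical message, standing in for a new case class the
    // scheduler backend would send to the Master over the RPC layer.
    static class AddExecutor {
        final String appId;
        final int count;
        AddExecutor(String appId, int count) {
            this.appId = appId;
            this.count = count;
        }
    }

    // Stand-in for the Master's receive loop: on AddExecutor, update
    // its bookkeeping of executors granted to the application.
    static class Master {
        int executors = 0;
        void receive(AddExecutor msg) {
            executors += msg.count;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<AddExecutor> channel = new LinkedBlockingQueue<>();
        Master master = new Master();

        // Scheduler side (CoarseGrainedSchedulerBackend analogue):
        // request two more executors for application "app-001".
        channel.put(new AddExecutor("app-001", 2));

        // Master side: take the message off the channel and handle it.
        master.receive(channel.take());

        System.out.println("executors=" + master.executors);
    }
}
```

In real Spark the "channel" would be the existing actor/RPC endpoint that the Master already listens on, so the work is mostly adding the message type and a new case in the Master's message handler.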
Question about Spark process and thread
Hi, I was looking at the Spark source code, and I found that when launching an Executor, Spark actually creates a thread pool; each time the scheduler launches a task, the executor runs it on a thread from that pool. However, I also found that the Spark process always has approximately 40 threads running regardless of my configuration (SPARK_WORKER_CORES, SPARK_WORKER_INSTANCES, --executor-cores, --total-executor-cores, etc.). Does this mean Spark pre-launches 40 threads even before any tasks are launched? Many thanks! Best, Ray
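For context on the pattern being described: the executor's task slots are just threads drawn from a cached thread pool, which grows one thread per concurrently running task and reuses idle threads, so the pool itself does not pre-launch a fixed count. (The other ~40 threads in a JVM process typically come from the runtime and framework background machinery, e.g. GC, JIT, and network I/O threads, not from task slots.) A minimal plain-Java sketch of the cached-pool pattern, not Spark's actual Executor class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskPoolSketch {
    public static void main(String[] args) throws Exception {
        // Like the executor's pool: cached, so threads are created on
        // demand per concurrent task and reused once a task finishes.
        ExecutorService pool = Executors.newCachedThreadPool();

        // "Launch" four tasks; each runs on a pool thread.
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final int taskId = i;
            futures.add(pool.submit(() -> taskId * taskId));
        }

        // Collect task results, as a driver collects task outputs.
        int sum = 0;
        for (Future<Integer> f : futures) {
            sum += f.get();
        }
        pool.shutdown();

        System.out.println("sum=" + sum);  // 1 + 4 + 9 + 16
    }
}
```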