How to modify Hadoop APIs used by Spark?

2015-09-21 Thread Dogtail Ray
Hi all,

I see that Spark uses some Hadoop APIs, such as InputFormat and InputSplit,
and I want to modify them. Do you know how I can integrate my modified
Hadoop code into Spark? Many thanks!


Dynamic resource allocation in Standalone mode

2015-07-18 Thread Dogtail Ray
Hi all,

I am planning to dynamically increase or decrease the number of executors
allocated to an application at runtime. This is similar to dynamic resource
allocation, which is currently only available when running Spark on YARN.
Any suggestions on how to implement this feature in Standalone mode?

My current problem is this: I want to send an ADD_EXECUTOR command from the
scheduler module (in CoarseGrainedSchedulerBackend.scala) to the deploy
module (in Master.scala), but I don't know how to communicate between the
two modules. Many thanks for any suggestions!


Question about Spark process and thread

2015-06-28 Thread Dogtail Ray
Hi,

I was looking at the Spark source code, and I found that when Spark launches
an Executor, it actually creates a thread pool; each time the scheduler
launches a task, the executor runs that task on a thread from the pool.
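
To illustrate the behaviour described above, here is a minimal Java sketch.
It is not Spark code: the class and method names are made up for this
example, and it assumes the executor's pool behaves like a standard cached
thread pool from java.util.concurrent, which creates worker threads on
demand rather than up front.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class TaskPoolSketch {
    // Hypothetical helper: submits `tasks` blocking tasks to a cached thread
    // pool and returns how many worker threads the pool holds while all of
    // them are running at the same time.
    static int threadsUsedFor(int tasks) throws InterruptedException {
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newCachedThreadPool();
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // hold the thread so it cannot be reused
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await();             // every submitted task is now running
            return pool.getPoolSize();
        } finally {
            release.countDown();         // let the tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads before any task: " + threadsUsedFor(0));
        System.out.println("threads for 4 tasks: " + threadsUsedFor(4));
    }
}
```

Under that assumption, the pool starts empty and grows by one thread for
each task that runs concurrently, so the task threads alone would not
explain a fixed number of threads in an idle process.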

However, I also found that the Spark process always has approximately 40
threads running, regardless of my configuration (SPARK_WORKER_CORES,
SPARK_WORKER_INSTANCES, --executor-cores, --total-executor-cores, etc.).
Does this mean Spark pre-launches 40 threads even before any tasks are
launched? Many thanks!
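
One way to see what those threads are is to group the live threads of a JVM
by name. The sketch below uses only the standard Thread API; the class name
ThreadCensus and the prefix-stripping rule are my own choices for
illustration, and it counts threads in the JVM it runs in, so to inspect a
Spark process the same logic would have to run inside that process (or a
thread dump could be grouped the same way).

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {
    // Counts live threads in this JVM, grouped by thread name with any
    // trailing number (and the separator before it) removed, so that
    // "pool-1-thread-3" and "pool-1-thread-7" fall into the same group.
    static Map<String, Integer> countByPrefix() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String prefix = t.getName().replaceAll("[-\\s#]*\\d+$", "");
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per group: "<count>  <name prefix>"
        countByPrefix().forEach((name, n) -> System.out.println(n + "  " + name));
    }
}
```

Even a JVM that has not submitted a single task reports several threads this
way (garbage collection and other runtime threads are started by the JVM
itself), so a baseline thread count that is independent of the core settings
is not by itself evidence of pre-launched task threads.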

Best,
Ray