Hi Rishitesh,

We are not using any RDDs to parallelize the processing; the entire algorithm runs on a single core (and in a single thread). The parallelism is done at the user level.
The disk I/O could be started in a separate thread, but then the executor would not be able to take up more jobs, since that is how I believe Spark is designed by default.

On Sat, Aug 22, 2015 at 12:51 AM, Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:

> Hi Sateesh,
> It is interesting to know how you determined that the DStream runs on
> a single core. Did you mean receivers?
>
> Coming back to your question, could you not start the disk I/O in a separate
> thread, so that the scheduler can go ahead and assign other tasks?
> On 21 Aug 2015 16:06, "Sateesh Kavuri" <sateesh.kav...@gmail.com> wrote:
>
>> Hi,
>>
>> My scenario goes like this:
>> I have an algorithm running in Spark Streaming mode on a 4-core virtual
>> machine. The majority of the time, the algorithm does disk I/O and database
>> I/O. The question is: during the I/O, when the CPU is not heavily loaded,
>> is it possible to run any other task/thread so as to efficiently utilize
>> the CPU?
>>
>> Note that one DStream of the algorithm runs completely on a single CPU.
>>
>> Thank you,
>> Sateesh
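[Editor's note: Rishitesh's suggestion of starting the blocking I/O in its own thread, so the main thread can keep doing useful work, can be sketched independently of Spark. Below is a minimal Python sketch using a thread pool; the names `load_record`, `process`, and `run_partition` are hypothetical stand-ins for the poster's disk/DB reads and CPU-bound algorithm, not anything from this thread.]

```python
# Sketch: overlap blocking disk/database I/O with CPU work inside a
# single task by submitting the I/O to a thread pool. While one read
# is waiting on the disk, the pool runs other reads concurrently and
# the main thread consumes results as they complete.
from concurrent.futures import ThreadPoolExecutor

def load_record(record_id):
    # Hypothetical stand-in for a blocking disk or database read.
    return record_id * 2

def process(value):
    # Hypothetical stand-in for the CPU-bound part of the algorithm.
    return value + 1

def run_partition(record_ids):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Kick off all the blocking reads up front...
        futures = [pool.submit(load_record, i) for i in record_ids]
        # ...then do the CPU work on the main thread as results arrive.
        return [process(f.result()) for f in futures]

print(run_partition([1, 2, 3]))  # → [3, 5, 7]
```

Note the trade-off raised above still applies: as long as the Spark task itself does not return, the executor slot stays occupied regardless of how the I/O is threaded internally; the pool only keeps the core busy within that one task.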