Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
you not start disk I/O in a separate thread, so that the scheduler can go ahead and assign other tasks? On 21 Aug 2015 16:06, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi, My scenario goes like this: I have an algorithm running in Spark streaming mode on a 4 core virtual machine
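
The suggestion above (moving blocking I/O off the processing thread) can be sketched in plain Python. This is only an illustration under assumptions: `slow_io` is a hypothetical stand-in for the disk or database call, and the pool size of 4 simply mirrors the 4 core VM mentioned in the thread. It is not Spark's own scheduling mechanism.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_io(record):
    """Hypothetical stand-in for a blocking disk or database call."""
    time.sleep(0.1)  # while this thread waits, the CPU is free for other work
    return f"stored:{record}"


def process_batch(records, max_workers=4):
    """Run the blocking calls on a small thread pool so they overlap.

    pool.map returns results in the same order as the input records.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(slow_io, records))


if __name__ == "__main__":
    print(process_batch(["a", "b", "c", "d"]))
```

Threads suit this case because the work is I/O-bound: each thread spends most of its time waiting, so several waits can overlap even on a small machine.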

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
...@sigmoidanalytics.com wrote: Hmm, for a single-core VM you will have to run it in local mode (specifying master=local[4]). The flag is available in all the versions of Spark, I guess. On Aug 22, 2015 5:04 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Thanks Akhil. Does this mean that the executor running

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
...@sigmoidanalytics.com wrote: You can look at spark.streaming.concurrentJobs; by default it runs a single job. If you set it to 2, it can run 2 jobs in parallel. It's an experimental flag, but go ahead and give it a try. On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi, My scenario
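
One way to pass the flag discussed above is on the `spark-submit` command line with `--conf`. The helper below is a small sketch that only assembles that command; the application file name `my_streaming_app.py` and the helper name `spark_submit_command` are hypothetical, and the `local[4]` master is taken from the related reply in this thread.

```python
import shlex


def spark_submit_command(app, master, conf):
    """Build a spark-submit argument list with one --conf per setting."""
    cmd = ["spark-submit", "--master", master]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)  # hypothetical application script
    return cmd


cmd = spark_submit_command(
    "my_streaming_app.py",                    # assumed file name
    "local[4]",                               # local mode with 4 threads
    {"spark.streaming.concurrentJobs": 2},    # experimental flag from the reply
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Since the reply describes the flag as experimental, it seems sensible to try it on a test workload first.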

Spark streaming multi-tasking during I/O

2015-08-21 Thread Sateesh Kavuri
Hi, My scenario goes like this: I have an algorithm running in Spark streaming mode on a 4 core virtual machine. The majority of the time, the algorithm does disk I/O and database I/O. The question is, during the I/O, where the CPU is not considerably loaded, is it possible to run any other task/thread

Re: Spark or Storm

2015-06-16 Thread Sateesh Kavuri
Probably overloading the question a bit. In Storm, Bolts have the functionality of getting triggered on events. Is that kind of functionality possible with Spark streaming? During each phase of the data processing, the transformed data is stored to the database and this transformed data should

Spark ML decision list

2015-06-04 Thread Sateesh Kavuri
Hi, I have used the Weka machine learning library for generating a model for my training set. I have used the PART algorithm (decision lists) from Weka. Now, I would like to use Spark ML for the PART algo for my training set and could not seem to find a parallel. Could anyone point out the

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
to be Spark specific, btw On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi, We have a case that we will have to run concurrent jobs (for the same algorithm) on different data sets. And these jobs can run in parallel and each one of them would be fetching

Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Hi, We have a case that we will have to run concurrent jobs (for the same algorithm) on different data sets. And these jobs can run in parallel and each one of them would be fetching the data from the database. We would like to optimize the database connections by making use of connection

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
#transformations-on-dstreams On Thu, Apr 2, 2015 at 7:52 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Right, I am aware of how to use connection pooling with Oracle, but the specific question is how to use it in the context of Spark job execution On 2 Apr 2015 17:41, Ted Yu yuzhih...@gmail.com

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
performance if your required number of connections outstrips the database's resources. On Fri, Apr 3, 2015 at 12:22 AM Sateesh Kavuri sateesh.kav...@gmail.com wrote: But this basically means that the pool is confined to the job (of a single app) in question, but is not sharable across
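
The two points in this exchange (a pool that lives inside one process and is not shared across applications, and a cap on connections so the database is not overwhelmed) can be sketched as below. This is an illustration under assumptions, not the approach any poster confirmed: the `ConnectionPool` class, `get_pool`, `save_partition`, and the `results` table are hypothetical names, and an in-memory SQLite database stands in for the real one.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: at most `size` connections ever exist."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


_pool = None
_pool_lock = threading.Lock()


def get_pool(size=2):
    """Create the pool lazily, once per process, and reuse it afterwards."""
    global _pool
    with _pool_lock:
        if _pool is None:
            # Assumption: SQLite in memory stands in for the real database.
            # Each in-memory connection is its own separate database.
            _pool = ConnectionPool(
                lambda: sqlite3.connect(":memory:", check_same_thread=False),
                size,
            )
        return _pool


def save_partition(rows):
    """Write one partition's rows using a pooled connection."""
    with get_pool().connection() as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS results (value TEXT)")
        conn.executemany("INSERT INTO results VALUES (?)", [(r,) for r in rows])
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

In a Spark job, a function shaped like `save_partition` would typically be passed to `rdd.foreachPartition` inside `foreachRDD`, so that each executor process builds its own pool on first use. Because the pool is a module-level object, it is shared by tasks in the same process only, which matches the limitation raised in the thread.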