Can you not start the disk I/O in a separate thread, so that the scheduler can go ahead and assign other tasks?
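That suggestion can be sketched in plain Python (outside Spark; all names below are illustrative, not Spark APIs): hand the blocking disk/database call to a worker thread, so the calling thread, standing in for the task scheduler, stays free for other work.

```python
# Sketch: offload blocking I/O to worker threads so the main thread
# can keep doing CPU-side work while the I/O waits.
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(record):
    """Stand-in for a disk or database read."""
    time.sleep(0.05)          # simulate I/O wait
    return record * 2

records = [1, 2, 3, 4]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(blocking_io, r) for r in records]
    # While the I/O waits in the worker threads, the main thread is free.
    cpu_side_work = sum(r * r for r in records)
    results = [f.result() for f in futures]

print(results)         # [2, 4, 6, 8]
print(cpu_side_work)   # 30
```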
On 21 Aug 2015 16:06, Sateesh Kavuri sateesh.kav...@gmail.com wrote:
Hi,
My scenario goes like this:
I have an algorithm running in Spark streaming mode on a 4 core virtual
machine
...@sigmoidanalytics.com
wrote:
Hmm, for a single-core VM you will have to run it in local mode (specifying master=local[4]). The flag is available in all the versions of Spark, I guess.
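For reference, a minimal sketch of setting that master on the command line (a config fragment, not runnable here; your-app.jar is a placeholder for the actual application artifact):

```shell
# Run the streaming app in local mode with 4 worker threads.
spark-submit --master "local[4]" your-app.jar
```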
On Aug 22, 2015 5:04 AM, Sateesh Kavuri sateesh.kav...@gmail.com
wrote:
Thanks Akhil. Does this mean that the executor running
...@sigmoidanalytics.com
wrote:
You can look at spark.streaming.concurrentJobs; by default it runs a
single job. If you set it to 2, then it can run 2 jobs in parallel. It's an
experimental flag, but go ahead and give it a try.
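A sketch of how that flag might be passed (the property name is from the message above; the rest of the command, including the jar name, is illustrative):

```shell
# Allow two streaming jobs to run concurrently (experimental flag).
spark-submit \
  --conf spark.streaming.concurrentJobs=2 \
  your-streaming-app.jar
```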
On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com
wrote:
Hi,
My scenario goes like this:
I have an algorithm running in Spark streaming mode on a 4 core virtual
machine. The majority of the time, the algorithm does disk I/O and database
I/O. The question is: during the I/O, when the CPU is not considerably loaded,
is it possible to run any other task/thread?
Probably overloading the question a bit: in Storm, Bolts have the
functionality of getting triggered on events. Is that kind of functionality
possible with Spark Streaming? During each phase of the data processing, the
transformed data is stored to the database, and this transformed data should
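One plain-Python sketch of the closest Spark Streaming analogue: instead of an event-triggered Bolt, a callback is invoked once per micro-batch (in Spark this is typically done via DStream.foreachRDD). Below, a toy driver loop plays the role of the streaming engine, and store is a stand-in for the database write; all names are illustrative.

```python
# Toy micro-batch loop: the "engine" calls a registered per-batch
# callback, analogous to a Storm Bolt's execute() being triggered.
stored = []

def store(batch):
    """Stand-in for writing transformed records to the database."""
    stored.extend(batch)

def on_batch(batch):
    """Per-batch callback: transform, then persist."""
    transformed = [x * 10 for x in batch]
    store(transformed)

micro_batches = [[1, 2], [3], [4, 5]]
for batch in micro_batches:   # the engine triggers the callback per batch
    on_batch(batch)

print(stored)   # [10, 20, 30, 40, 50]
```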
Hi,
I have used the Weka machine learning library for generating a model for my
training set, specifically the PART algorithm (decision lists) from Weka.
Now I would like to use Spark ML for the PART algorithm on my training set and
could not seem to find a parallel. Could anyone point out the
to be Spark specific, btw
On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri sateesh.kav...@gmail.com
wrote:
Hi,
We have a case where we will have to run concurrent jobs (for the same
algorithm) on different data sets. These jobs can run in parallel, and
each one of them would be fetching its data from the database.
We would like to optimize the database connections by making use of
connection
#transformations-on-dstreams
On Thu, Apr 2, 2015 at 7:52 AM, Sateesh Kavuri sateesh.kav...@gmail.com
wrote:
Right, I am aware of how to use connection pooling with Oracle, but the
specific question is how to use it in the context of Spark job execution.
On 2 Apr 2015 17:41, Ted Yu yuzhih...@gmail.com
performance
if your required number of connections outstrips the database's resources.
On Fri, Apr 3, 2015 at 12:22 AM Sateesh Kavuri sateesh.kav...@gmail.com
wrote:
But this basically means that the pool is confined to the job (of a
single app) in question and is not sharable across