Hi Disha,
This is a good question. We plan to elaborate on it in our talk at the upcoming
Spark Summit. Fewer workers means less compute power; more workers means more
communication overhead. So there exists an optimal number of workers for
solving an optimization problem with batch gradient given
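
To make the trade-off above concrete, here is a minimal sketch of a toy cost model. It is not taken from Spark or from the talk mentioned above: the function names, the assumption that communication cost grows linearly with the number of workers, and the numbers are all hypothetical and chosen only for illustration.

```python
def iteration_time(workers, compute_cost, comm_cost_per_worker):
    """Toy model of one batch-gradient iteration.

    Assumption: compute is split evenly across workers, while
    communication overhead grows linearly with the worker count.
    """
    return compute_cost / workers + comm_cost_per_worker * workers


def best_worker_count(max_workers, compute_cost, comm_cost_per_worker):
    """Return the worker count in 1..max_workers with the lowest modelled time."""
    return min(
        range(1, max_workers + 1),
        key=lambda n: iteration_time(n, compute_cost, comm_cost_per_worker),
    )


# Hypothetical costs in arbitrary time units.
compute_cost = 100.0
comm_cost_per_worker = 1.0

best = best_worker_count(32, compute_cost, comm_cost_per_worker)
print(best, iteration_time(best, compute_cost, comm_cost_per_worker))
```

Under this model the time first drops as workers are added and then rises again once communication dominates; the real shape of the curve depends on the cluster and the aggregation scheme.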
Hi Alexander,
Thanks for your reply. Actually, I am working with a modified version of the
actual MNIST dataset (maximum samples = 8.2M),
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html. I
have been running different-sized versions (1, 10, 50, 1M, 8M samples) on
Hi Disha,
The problem might be as follows. The data that you have might physically reside
only on two nodes and Spark launches data-local tasks. As a result, only two
workers are used. You might want to force Spark to distribute the data across
all nodes; however, it does not seem to be
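
The data-locality effect described above can be illustrated with a small simulation. This is a plain-Python sketch, not Spark code: the node names, the partition count, and the strict "run each task where its data lives" rule are simplifying assumptions (a real scheduler may fall back to non-local tasks).

```python
from collections import Counter


def data_local_schedule(partition_locations):
    """Count tasks per node when each task runs on the node holding its partition."""
    return Counter(partition_locations)


def redistribute(num_partitions, nodes):
    """Place partitions round-robin across all nodes."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


# Hypothetical 5-node cluster.
nodes = ["node1", "node2", "node3", "node4", "node5"]

# Assumption: all 10 partitions physically reside on only two nodes.
skewed = ["node1", "node2"] * 5

busy_before = data_local_schedule(skewed)
busy_after = data_local_schedule(redistribute(len(skewed), nodes))

print(len(busy_before), "nodes busy before redistribution")
print(len(busy_after), "nodes busy after redistribution")
```

With the data on two nodes, only those two nodes receive tasks; once the partitions are spread out, every node gets a share of the work.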
Dear Spark developers,
I am trying to study the effect of increasing the number of cores (CPUs) on
the speedup and accuracy (scalability with Spark ANN) for the
MNIST dataset, using the ANN implementation provided in the latest Spark release.
I have formed a cluster of 5 machines with 88
Having only 2 workers for 5 machines would be your problem: you
probably want 1 worker per physical machine, which entails running the
spark-daemon.sh script to start a worker on those machines.
The partitioning is agnostic to how many executors are available for
running the tasks, so you can't