Re: Issue in parallelization of CNN model using spark

2020-07-13 Thread Juan Martín Guillén
Hi Mukhtaj, Parallelization on Spark is abstracted on the DataFrame. You can run anything locally on the driver but to make it run in parallel on the cluster you'll need to use the DataFrame abstraction. You may want to check maxpumperla/elephas.

Re: Spark yarn cluster

2020-07-11 Thread Juan Martín Guillén
Hi Diwakar, A YARN cluster without Hadoop is kind of a fuzzy concept. You may well want to have Hadoop without using MapReduce, and use Spark instead. That is the main reason to use Spark in a Hadoop cluster anyway. On the other hand it is highly probable you may want to use

Re: RDD-like API for entirely local workflows?

2020-07-04 Thread Juan Martín Guillén
threads, so that is what I am trying to eliminate.
Regards, Antonin

On 04/07/2020 17:49, Juan Martín Guillén wrote:
> Hi Antonin.
>
> It seems you are confusing Standalone with Local mode. They are 2 different modes.
>
> From Spark in Action book: "In local mode, the

Re: RDD-like API for entirely local workflows?

2020-07-04 Thread Juan Martín Guillén
Hi Antonin. It seems you are confusing Standalone with Local mode. They are 2 different modes. From Spark in Action book: "In local mode, there is only one executor in the same client JVM as the driver, but this executor can spawn several threads to run tasks. In local mode, Spark uses your
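
To picture the "one executor, several threads" arrangement described in the quote, here is a minimal plain-Python analogy. It does not use Spark or its API. The helper names `split_into_partitions` and `run_local` are hypothetical, and Python threads share one interpreter lock, so this only illustrates the structure (a single process fanning tasks out to a thread pool), not Spark's actual scheduling or performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(items, num_partitions):
    """Split items into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return partitions


def run_local(items, task, num_threads=4):
    """Apply task to every item using a thread pool inside the current process.

    Hypothetical helper: one pool task per partition, loosely mirroring
    an executor that runs several tasks on threads in the driver's process.
    """
    partitions = split_into_partitions(items, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the same order as the input partitions.
        results = list(pool.map(lambda part: [task(x) for x in part], partitions))
    # Flatten the per-partition results back into a single list.
    return [y for part in results for y in part]


squares = run_local(list(range(10)), lambda x: x * x, num_threads=3)
print(squares)
```

Everything here stays in one process, which is the point of the analogy: no cluster manager and no separate worker processes are involved.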