Spark In Memory Shuffle

2018-10-17 Thread thomas lavocat
Hi everyone, The possibility of having in-memory shuffling is discussed in this pull request: https://github.com/apache/spark/pull/5403. That was in 2015. In 2016, the paper "Scaling Spark on HPC Systems" says that Spark still shuffles using disks. I would like to know: what is the current state of
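For readers who land on this thread: shuffle files are written to the directories listed in spark.local.dir, so a common stopgap on HPC clusters (a sketch of a workaround, not an official in-memory shuffle) is to point that setting at a RAM-backed filesystem. The /dev/shm path below is an assumption about the node, and note that on YARN the NodeManager's local dirs take precedence over this setting:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: redirect shuffle spill files to a tmpfs mount so they never
    // touch spinning disk. This does not change Spark's shuffle code path;
    // it only changes where the intermediate files are written.
    val conf = new SparkConf()
      .setAppName("tmpfs-shuffle-sketch")
      .set("spark.local.dir", "/dev/shm/spark-tmp") // assumed tmpfs mount

    val sc = new SparkContext(conf)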

Re: [Spark Streaming MEMORY_ONLY] Understanding Dataflow

2018-07-05 Thread Thomas Lavocat
tasks/stages are defined to perform, which may result in a shuffle. If I understand correctly: * Only shuffle data goes through the driver * The receivers' data stays node-local until a shuffle occurs. Is that right? > On Wed, Jul 4, 2018 at 1:56 PM, thomas lavocat <thomas.lavo...@univ-
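For what it's worth, the shuffle boundary in question is easy to see in code. A minimal sketch (paths are placeholders, and it assumes an existing SparkContext sc): narrow transformations stay node-local until a wide one forces an exchange, and that exchange happens directly between executors while the driver only schedules the stages:

    // Narrow transformations run where the blocks already live; the wide
    // reduceByKey triggers a shuffle in which executors fetch blocks from
    // each other -- the records themselves do not pass through the driver.
    val counts = sc.textFile("hdfs:///input")   // placeholder path
      .flatMap(_.split(" "))                    // narrow: node-local
      .map(word => (word, 1))                   // narrow: node-local
      .reduceByKey(_ + _)                       // wide: shuffle between executors

    counts.saveAsTextFile("hdfs:///output")     // placeholder path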

[Spark Streaming MEMORY_ONLY] Understanding Dataflow

2018-07-04 Thread thomas lavocat
Hello, I have a question on Spark dataflow. If I understand correctly, all received data is sent from the executor to the driver of the application prior to task creation. Then the task embedding the data transits from the driver to the executor in order to be processed. As executors cannot
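For context, the receiver-based path the subject's MEMORY_ONLY refers to looks like the sketch below (host, port, and batch interval are placeholders). The receiver runs inside an executor and stores incoming blocks there; the driver only receives block metadata and schedules tasks onto the executors that already hold the blocks:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Receiver-based input stream kept in memory only: blocks live on the
    // executor running the receiver (no replication, no disk), and tasks
    // are scheduled to read them where they sit.
    val conf = new SparkConf().setAppName("dataflow-sketch")
    val ssc  = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_ONLY)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()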

Re: [Spark Streaming] is spark.streaming.concurrentJobs a per node or a cluster global value ?

2018-06-11 Thread thomas lavocat
previous batch, if you set "spark.streaming.concurrentJobs" larger than 1, then the current batch could start without waiting for the previous batch (if it is delayed), which will lead to unexpected results. thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on June 5, 2018
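To make that caveat concrete, here is a sketch of the kind of pipeline where overlapping batches hurt (checkpoint path, host, and port are placeholders): updateStateByKey assumes each batch's updates are applied in order, so two batches running concurrently can interleave their state updates:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Sketch of why concurrentJobs > 1 can surprise you: stateful operators
    // expect batches to commit their updates in batch order.
    val conf = new SparkConf()
      .setAppName("concurrent-jobs-caveat")
      .set("spark.streaming.concurrentJobs", "2") // > 1: batches may overlap

    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/checkpoint") // required by updateStateByKey; placeholder path

    val counts = ssc.socketTextStream("localhost", 9999)
      .map(word => (word, 1))
      .updateStateByKey[Int]((values: Seq[Int], state: Option[Int]) =>
        Some(state.getOrElse(0) + values.sum))

    counts.print()
    ssc.start()
    ssc.awaitTermination()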

Re: [Spark Streaming] is spark.streaming.concurrentJobs a per node or a cluster global value ?

2018-06-05 Thread thomas lavocat
are not independent. What do you mean exactly by not independent? Are several sources joined together dependent? Thanks, Thomas thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tuesday, June 5, 2018 at 7:17 PM: Hello, Thanks for your answer. On 05/06/2018 11:24, Saisai S
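As one possible illustration of the "sources joined together" case being asked about (hosts and ports are placeholders, and an existing StreamingContext ssc is assumed): when two input streams feed a single output, each batch's output job reads from both sources, so the resulting jobs are not independent of one another:

    // Two receiver streams combined into a single result: the per-batch
    // output job depends on data from both sources.
    val left  = ssc.socketTextStream("host-a", 9999).map(line => (line, 1))
    val right = ssc.socketTextStream("host-b", 9999).map(line => (line, 2))

    left.join(right).print() // one dependent job per batch interval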

Re: [Spark Streaming] is spark.streaming.concurrentJobs a per node or a cluster global value ?

2018-06-05 Thread thomas lavocat
Hello, Thanks for your answer. On 05/06/2018 11:24, Saisai Shao wrote: spark.streaming.concurrentJobs is a driver-side internal configuration; it determines how many streaming jobs can be submitted concurrently in one batch. Usually this should not be configured by the user, unless you're
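For reference, the property is an application-wide setting read by the driver, where the streaming JobScheduler uses it to size its job-execution thread pool. A minimal sketch of setting it (the value 2 is only an example):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Driver-side setting: it sizes the thread pool that runs each batch's
    // jobs, so it applies per application, not per node.
    val conf = new SparkConf()
      .setAppName("concurrent-jobs-sketch")
      .set("spark.streaming.concurrentJobs", "2") // default is 1

    val ssc = new StreamingContext(conf, Seconds(1))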

[Spark Streaming] is spark.streaming.concurrentJobs a per node or a cluster global value ?

2018-06-05 Thread thomas lavocat
Hi everyone, I'm wondering if the property spark.streaming.concurrentJobs should reflect the total number of possible concurrent tasks on the cluster, or the local number of concurrent tasks on one compute node. Thanks for your help. Thomas