Hi Tobias,

It seems that repartition can create more executors for the stages that follow data receiving. However, the number of executors is still far fewer than I need (I specify one core per executor). Looking at the indices of the executors in the stage, I find that many numbers are missing in between. For example, if I call repartition(100), the executor indices may be 1, 3, 5, 10, etc. In the end, there may be only 45 executors even though I request 100 partitions.
Bill

On Thu, Jul 17, 2014 at 6:15 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Bill,
>
> are you saying that, after repartition(400), you have 400 partitions on one
> host and the other hosts receive none of the data?
>
> Tobias
>
>
> On Fri, Jul 18, 2014 at 8:11 AM, Bill Jay <bill.jaypeter...@gmail.com>
> wrote:
>
>> I also have an issue consuming from Kafka. When I consume from Kafka,
>> there is always a single executor working on this job. Even if I use
>> repartition, it seems that there is still a single executor. Does anyone
>> have an idea how to add parallelism to this job?
>>
>>
>>
>> On Thu, Jul 17, 2014 at 2:06 PM, Chen Song <chen.song...@gmail.com>
>> wrote:
>>
>>> Thanks Luis and Tobias.
>>>
>>>
>>> On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer <t...@preferred.jp>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Wed, Jul 2, 2014 at 1:57 AM, Chen Song <chen.song...@gmail.com>
>>>> wrote:
>>>>>
>>>>> * Is there a way to control how far the Kafka DStream can read on a
>>>>> topic-partition (via offset, for example)? Setting this to a small
>>>>> number would force the DStream to read less data initially.
>>>>>
>>>>
>>>> Please see the post at
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccaph-c_m2ppurjx-n_tehh0bvqe_6la-rvgtrf1k-lwrmme+...@mail.gmail.com%3E
>>>> Kafka's auto.offset.reset parameter may be what you are looking for.
>>>>
>>>> Tobias
>>>>
>>>>
>>>
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>
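
For reference, the setup discussed in this thread (a Kafka input stream followed by repartition) might look roughly like the sketch below. It is only an illustration, not anyone's actual job: it assumes the receiver-based KafkaUtils.createStream API from the Spark 1.x spark-streaming-kafka module, and the ZooKeeper address, group id, topic name, batch interval, and receiver count are all hypothetical placeholders. Each createStream call starts one receiver, so a single call receives data on a single executor; creating several streams and taking their union is one commonly suggested way to spread the receiving side, with repartition applied afterwards for the processing stages.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaParallelismSketch {
  // Hypothetical placeholder values; none of these come from the thread.
  val zkQuorum = "zk-host:2181"
  val groupId = "example-group"
  val topic = "example-topic"
  val numReceivers = 4     // number of parallel Kafka receivers (assumption)
  val numPartitions = 100  // matches the repartition(100) example in the message

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaParallelismSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Each createStream call starts one receiver, which occupies one core
    // on one executor. The Map value is the number of consumer threads
    // inside that receiver, not the number of receivers.
    val streams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, Map(topic -> 1))
    }

    // Merge the per-receiver streams, keep only the message payloads, and
    // redistribute each batch across numPartitions partitions.
    val messages = ssc.union(streams).map(_._2)
    val repartitioned = messages.repartition(numPartitions)

    // Print how many partitions each batch RDD has after repartitioning.
    repartitioned.foreachRDD { rdd =>
      println(s"partitions in this batch: ${rdd.partitions.length}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that repartition only fixes the number of partitions (and therefore tasks) per batch; how many executors actually run those tasks is up to the scheduler and the cores available, which this sketch does not control.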