I also have an issue consuming from Kafka. When I consume from Kafka, there is always only a single executor working on the job. Even when I use repartition, it seems that there is still only a single executor. Does anyone have an idea how to add parallelism to this job?
On Thu, Jul 17, 2014 at 2:06 PM, Chen Song <chen.song...@gmail.com> wrote:
> Thanks Luis and Tobias.
>
> On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
>
>> Hi,
>>
>> On Wed, Jul 2, 2014 at 1:57 AM, Chen Song <chen.song...@gmail.com> wrote:
>>>
>>> * Is there a way to control how far Kafka Dstream can read on
>>> topic-partition (via offset for example). By setting this to a small
>>> number, it will force DStream to read less data initially.
>>>
>>
>> Please see the post at
>>
>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccaph-c_m2ppurjx-n_tehh0bvqe_6la-rvgtrf1k-lwrmme+...@mail.gmail.com%3E
>> Kafka's auto.offset.reset parameter may be what you are looking for.
>>
>> Tobias
>>
>
> --
> Chen Song