The code is pretty long, but the main idea is to consume from Kafka, preprocess the data, and group by a field. I use multiple DStreams to add parallelism to the consumer. The hang seems to happen more often when the number of DStreams is large.
Thanks,
Bill

On Tue, Jul 22, 2014 at 11:13 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> Can you paste the piece of code?
>
> Thanks
> Best Regards
>
> On Wed, Jul 23, 2014 at 1:22 AM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>> Hi all,
>>
>> I am running a Spark Streaming job. The job hangs on one stage, which shows as follows:
>>
>> Details for Stage 4
>> Summary Metrics: No tasks have started yet
>> Tasks: No tasks have started yet
>>
>> Does anyone have an idea on this?
>>
>> Thanks!
>>
>> Bill
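
[Editor's note: the code itself was not posted to the list. Below is a minimal sketch of the pattern Bill describes, assuming the receiver-based Kafka API in Spark Streaming 1.x (KafkaUtils.createStream). The ZooKeeper quorum, topic, consumer group, number of streams, and the preprocessing step are placeholders, not his actual code.]

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._   // pair-DStream operations (groupByKey) in Spark 1.x
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaGroupByJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaGroupByJob")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val zkQuorum   = "zk-host:2181"        // placeholder ZooKeeper quorum
    val groupId    = "consumer-group"      // placeholder consumer group
    val topics     = Map("events" -> 1)    // placeholder topic -> receiver thread count
    val numStreams = 8                     // multiple receivers to parallelize consumption

    // One receiver-based DStream per consumer; each occupies an executor core
    val kafkaStreams = (1 to numStreams).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, topics,
        StorageLevel.MEMORY_AND_DISK_SER)
    }

    // Union the input streams, preprocess each record, and group by a field
    val grouped = ssc.union(kafkaStreams)
      .map { case (_, value) =>
        val fields = value.split(",")      // placeholder preprocessing
        (fields(0), value)                 // key on the first field
      }
      .groupByKey()

    grouped.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}

One point relevant to the hang: with receiver-based streams, each of the numStreams receivers permanently holds one executor core, so if the job is not given more cores than receivers, no cores remain for processing and stages can sit with "No tasks have started yet".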