Hi Ashic Mahtab,

Are Cassandra and ZooKeeper installed as part of the YARN architecture, or are they installed in a separate layer alongside Apache Spark?
Thanks and Regards,
Sudipta

On Thu, Jan 22, 2015 at 8:13 PM, Ashic Mahtab <as...@live.com> wrote:

> Hi Guys,
> So I changed the interval to 15 seconds. There are obviously a lot more
> messages per batch, but (I think) it looks a lot healthier. Can you see any
> major warning signs? I think that with 2 second intervals, the setup /
> teardown per partition was what was causing the delays.
>
> Streaming
>
> - Started at: Thu Jan 22 13:23:12 GMT 2015
> - Time since start: 1 hour 17 minutes 16 seconds
> - Network receivers: 2
> - Batch interval: 15 seconds
> - Processed batches: 309
> - Waiting batches: 0
>
> Statistics over last 100 processed batches
>
> Receiver Statistics (rates in records/sec; records in last batch as of
> 2015/01/22 14:40:29)
>
> Receiver       Status  Location            Last batch  Min  Median  Max  Last Error
> RmqReceiver-0  ACTIVE  VDCAPP53.foo.local  2.6 K       29   106     295  -
> RmqReceiver-1  ACTIVE  VDCAPP50.bar.local  2.6 K       29   107     291  -
>
> Batch Processing Statistics
>
> Metric            Last batch  Minimum     25th pct    Median      75th pct    Maximum
> Processing Time   4 s 812 ms  4 s 698 ms  4 s 738 ms  4 s 761 ms  4 s 788 ms  5 s 802 ms
> Scheduling Delay  2 ms        0 ms        3 ms        3 ms        4 ms        9 ms
> Total Delay       4 s 814 ms  4 s 701 ms  4 s 739 ms  4 s 764 ms  4 s 792 ms  5 s 809 ms
>
> Regards,
> Ashic.
>
> ------------------------------
> From: as...@live.com
> To: gerard.m...@gmail.com
> CC: user@spark.apache.org
> Subject: RE: Are these numbers abnormal for spark streaming?
> Date: Thu, 22 Jan 2015 12:32:05 +0000
>
> Hi Gerard,
> Thanks for the response.
>
> The messages get deserialised from msgpack format, and one of the strings
> is deserialised to JSON. Certain fields are checked to decide if further
> processing is required. If so, it goes through a series of in-memory filters
> to check if more processing is required.
> If so, only then does the "heavy" work start. That consists of a few db
> queries, and potential updates to the db plus a message on a message queue.
> The majority of messages don't need processing. The messages needing
> processing at peak are about three every other second.
>
> One possible thing that might be happening is the session initialisation
> and prepared statement initialisation for each partition. I can resort to
> some tricks, but I think I'll try increasing the batch interval to 15
> seconds. I'll report back with findings.
>
> Thanks,
> Ashic.
>
> ------------------------------
> From: gerard.m...@gmail.com
> Date: Thu, 22 Jan 2015 12:30:08 +0100
> Subject: Re: Are these numbers abnormal for spark streaming?
> To: tathagata.das1...@gmail.com
> CC: as...@live.com; t...@databricks.com; user@spark.apache.org
>
> and post the code (if possible).
> In a nutshell, your processing time > batch interval, resulting in an
> ever-increasing delay that will end up in a crash.
> 3 secs to process 14 messages looks like a lot. Curious what the job logic
> is.
>
> -kr, Gerard.
>
> On Thu, Jan 22, 2015 at 12:15 PM, Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
> This is not normal. It's a huge scheduling delay!! Can you tell me more
> about the application?
> - cluster setup, number of receivers, what's the computation, etc.
>
> On Thu, Jan 22, 2015 at 3:11 AM, Ashic Mahtab <as...@live.com> wrote:
>
> Hate to do this... but... erm... bump? Would really appreciate input from
> others using Streaming. Or at least some docs that would tell me if these
> are expected or not.
>
> ------------------------------
> From: as...@live.com
> To: user@spark.apache.org
> Subject: Are these numbers abnormal for spark streaming?
> Date: Wed, 21 Jan 2015 11:26:31 +0000
>
> Hi Guys,
> I've got Spark Streaming set up for a low data rate system (using Spark's
> features for analysis, rather than high throughput).
> Messages are coming in throughout the day, at around 1-20 per second
> (finger-in-the-air estimate... not analysed yet). In the Spark streaming UI
> for the application, I'm getting the following after 17 hours.
>
> Streaming
>
> - Started at: Tue Jan 20 16:58:43 GMT 2015
> - Time since start: 18 hours 24 minutes 34 seconds
> - Network receivers: 2
> - Batch interval: 2 seconds
> - Processed batches: 16482
> - Waiting batches: 1
>
> Statistics over last 100 processed batches
>
> Receiver Statistics (rates in records/sec; records in last batch as of
> 2015/01/21 11:23:18)
>
> Receiver       Status  Location  Last batch  Min  Median  Max  Last Error
> RmqReceiver-0  ACTIVE  FOOOO     14          4    7       27   -
> RmqReceiver-1  ACTIVE  BAAAAR    12          4    7       26   -
>
> Batch Processing Statistics
>
> Metric            Last batch     Minimum         25th pct        Median          75th pct       Maximum
> Processing Time   3 s 994 ms     157 ms          4 s 16 ms       4 s 961 ms      5 s 3 ms       5 s 171 ms
> Scheduling Delay  9 h 15 m 4 s   9 h 10 m 54 s   9 h 11 m 56 s   9 h 12 m 57 s   9 h 14 m 5 s   9 h 15 m 4 s
> Total Delay       9 h 15 m 8 s   9 h 10 m 58 s   9 h 12 m        9 h 13 m 2 s    9 h 14 m 10 s  9 h 15 m 8 s
>
> Are these "normal"? I was wondering what the scheduling delay and total
> delay terms are, and if it's normal for them to be 9 hours.
>
> I've got a standalone Spark master and 4 Spark nodes. The streaming app
> has been given 4 cores, and it's using 1 core per worker node. The
> streaming app is submitted from a 5th machine, and that machine has nothing
> but the driver running. The worker nodes are running alongside Cassandra
> (and reading and writing to it).
>
> Any insights would be appreciated.
>
> Regards,
> Ashic.
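Gerard's point above, that processing time exceeding the batch interval leads to an ever-increasing scheduling delay, can be sketched with a small queueing model. The numbers below (4.5 s processing time, 1000 batches) are illustrative assumptions, not measurements from the cluster in this thread:

```python
# Sketch of why processing time > batch interval causes unbounded delay.
# Assumes a single processing slot (Spark Streaming processes batches
# serially by default) and a constant per-batch processing time.

def scheduling_delay(batch_interval_s, processing_time_s, num_batches):
    """Return the scheduling delay (seconds) of the last batch."""
    free_at = 0.0  # wall-clock time when the processor becomes free
    delay = 0.0
    for n in range(num_batches):
        arrival = n * batch_interval_s   # batch n is ready at this time
        start = max(arrival, free_at)    # waits if the processor is busy
        delay = start - arrival          # scheduling delay for this batch
        free_at = start + processing_time_s
    return delay

# 2 s interval, ~4.5 s processing: delay grows 2.5 s per batch, never recovers.
print(scheduling_delay(2, 4.5, 1000))   # 2497.5 s after 1000 batches
# 15 s interval, ~4.8 s processing: every batch starts on arrival; delay is 0.
print(scheduling_delay(15, 4.8, 1000))  # 0.0
```

This matches the two UI snapshots in the thread: with a 2 s interval and ~4-5 s processing, the scheduling delay grew for 17 hours; with a 15 s interval and the same processing time, it stayed at a few milliseconds.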
--
Sudipta Banerjee
Consultant, Business Analytics and Cloud Based Architecture
Call me +919019578099
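Ashic's suspicion that per-partition session and prepared-statement initialisation was causing the delays can also be illustrated with a toy cost model. All costs and rates below (500 ms setup, 1 ms per record, 20 records/sec, 4 partitions) are hypothetical assumptions, not measurements:

```python
# Toy model of per-partition setup cost versus per-record work. With a 2 s
# batch interval, setup runs far more often per record processed than with
# a 15 s interval, even though the data rate is identical.

SETUP_COST_MS = 500   # open session + prepare statements, once per partition
PER_RECORD_MS = 1     # actual work per record

def batch_cost_ms(records, partitions):
    """Cost of one micro-batch: one setup per partition, plus per-record work."""
    return partitions * SETUP_COST_MS + records * PER_RECORD_MS

def cost_per_hour_ms(batch_interval_s, records_per_s, partitions):
    """Total processing cost over one hour at a steady data rate."""
    batches = 3600 // batch_interval_s
    return batches * batch_cost_ms(records_per_s * batch_interval_s, partitions)

# Same data rate (20 records/sec) and parallelism, different intervals:
print(cost_per_hour_ms(2, 20, 4))    # 3672000 ms: setup dominates
print(cost_per_hour_ms(15, 20, 4))   # 552000 ms: mostly real work
```

The per-record work is identical in both cases (72,000 ms per hour); only the number of setup rounds changes, which is consistent with the improvement Ashic saw after moving from 2 to 15 second batches. The usual alternative to widening the interval is caching the session and prepared statements per executor rather than recreating them per partition.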