Re: Spark on YARN driver memory allocation bug?

2014-10-17 Thread Boduo Li
It may also cause a problem in yarn-client mode: if --driver-memory is large, YARN has to allocate a lot of memory to the AM container, even though the AM doesn't really need it.
Boduo
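To put rough numbers on this, here is a minimal sketch of how a memory request might become a YARN container size. It assumes two things not stated in the thread: the overhead rule used by later Spark releases (the larger of 384 MB or 10% of the requested memory), and YARN rounding each request up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB here). The helper name `yarn_container_mb` is hypothetical.

```python
import math

def yarn_container_mb(requested_mb, overhead_fraction=0.10,
                      min_overhead_mb=384, min_allocation_mb=1024):
    """Estimate the container size YARN grants for a JVM memory request (hypothetical helper)."""
    # Assumed overhead rule: the larger of 384 MB or 10% of the requested memory.
    overhead_mb = max(int(requested_mb * overhead_fraction), min_overhead_mb)
    total_mb = requested_mb + overhead_mb
    # Assumed normalization: round up to a multiple of yarn.scheduler.minimum-allocation-mb.
    return math.ceil(total_mb / min_allocation_mb) * min_allocation_mb

# A 20 GB request vs. a small 512 MB request
print(yarn_container_mb(20480))  # 22528
print(yarn_container_mb(512))    # 1024
```

Under these assumptions, sizing the AM from a 20 GB --driver-memory would reserve about 22 GB for a process that needs far less. Later Spark releases (1.3 and up) added spark.yarn.am.memory so the AM's memory in yarn-client mode can be set independently of the driver's.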

Re: How to achieve reasonable performance on Spark Streaming?

2014-06-12 Thread Boduo Li
It seems that the slow reduce tasks are caused by slow shuffling. Here are the logs for one slow reduce task:

14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_88_18 after 5029 ms
14/06/11 23:42:45 INFO
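To check whether slow fetches cluster on particular shuffles or map outputs, one could scan the executor logs for messages like the one above. This is a minimal Python sketch that assumes the message text matches the quoted line exactly and that block IDs follow Spark's shuffle_<shuffleId>_<mapId>_<reduceId> naming; the helper name and the 1000 ms threshold are arbitrary choices.

```python
import re

# Matches the BlockFetcherIterator message quoted above.
FETCH_RE = re.compile(r"Got remote block (shuffle_(\d+)_(\d+)_(\d+)) after (\d+) ms")

def slow_fetches(lines, threshold_ms=1000):
    """Return (block_id, shuffle_id, map_id, reduce_id, ms) for remote fetches at or above threshold_ms."""
    slow = []
    for line in lines:
        # finditer also handles several log records flattened onto one line
        for m in FETCH_RE.finditer(line):
            ms = int(m.group(5))
            if ms >= threshold_ms:
                slow.append((m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4)), ms))
    return slow

log = ["14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: "
       "Got remote block shuffle_69_88_18 after 5029 ms"]
print(slow_fetches(log))  # [('shuffle_69_88_18', 69, 88, 18, 5029)]
```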

Re: Problem in Spark Streaming

2014-06-10 Thread Boduo Li
Hi Nilmish, What's the data rate/node when you see the high latency? (It seems the latency keeps increasing.) Do you still see it if you lower the data rate or the frequency of the windowed query?
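Why a steadily increasing latency matters: if each batch takes longer to process than the batch interval, batches queue up and the delay grows without bound. A minimal sketch under a simplified model (batch i is ready at the end of its interval, and batches run one at a time, which is Spark Streaming's default behavior); the helper name is hypothetical, and receiver and window effects are ignored.

```python
def batch_latencies(batch_interval_s, processing_times_s):
    """Delay from each batch's end-of-interval to its completion (hypothetical helper).

    Simplified model: batch i is ready at (i + 1) * interval and batches run one at a time.
    """
    latencies = []
    prev_finish = 0.0
    for i, proc in enumerate(processing_times_s):
        ready = (i + 1) * batch_interval_s
        start = max(ready, prev_finish)  # wait for the previous batch to finish
        finish = start + proc
        latencies.append(finish - ready)
        prev_finish = finish
    return latencies

print(batch_latencies(1.0, [0.5] * 4))  # [0.5, 0.5, 0.5, 0.5]
print(batch_latencies(1.0, [1.5] * 4))  # [1.5, 2.0, 2.5, 3.0]
```

With processing time below the interval the latency stays flat; above it, each batch adds the excess to the queue. That is why lowering the data rate or the frequency of the windowed query is a useful experiment.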

Re: abnormal latency when running Spark Streaming

2014-06-10 Thread Boduo Li
Hi Yingjun, Do you see a stable latency, or does the latency keep increasing? Could you also provide some details about the input data rate/node, batch interval, windowDuration and slideDuration when you see the high latency?
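The parameters asked about above combine into a rough picture of the windowed workload. This is a minimal sketch with a hypothetical helper; the example numbers are made up, and rates are in records per second. It checks the Spark Streaming requirement that windowDuration and slideDuration be multiples of the batch interval, then derives how much data each window covers and how often the query runs.

```python
def window_stats(batch_ms, window_ms, slide_ms, rate_per_node, nodes):
    """Summarize a windowed query's shape (hypothetical helper; rates in records/second)."""
    # Spark Streaming requires both durations to be multiples of the batch interval.
    for name, value in (("windowDuration", window_ms), ("slideDuration", slide_ms)):
        if value % batch_ms != 0:
            raise ValueError(f"{name}={value} ms is not a multiple of the {batch_ms} ms batch interval")
    return {
        "batches_per_window": window_ms // batch_ms,
        "queries_per_minute": 60_000 / slide_ms,
        "records_per_window": rate_per_node * nodes * window_ms / 1000,
    }

print(window_stats(1000, 30_000, 10_000, rate_per_node=5_000, nodes=4))
# {'batches_per_window': 30, 'queries_per_minute': 6.0, 'records_per_window': 600000.0}
```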

Re: Problem in Spark Streaming

2014-06-10 Thread Boduo Li
Oh, I mean the average data rate/node. But when I want to know the input activity on each node (I use a custom receiver instead of Kafka), I usually search the logs for records like this to get a sense of it:

BlockManagerInfo: Added input ... on [hostname:port] (size: xxx KB)

I also see some spikes in
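Searching for these records can be automated to total the input received per host. A minimal sketch, assuming each record has the shape quoted above (`Added input... on host:port (size: N unit`); because the middle of the message is elided in the post, the regex matches it loosely. The helper name and the sample line are hypothetical.

```python
import re
from collections import defaultdict

# Loose pattern for "Added input ... on [host:port] (size: N unit"; brackets are optional.
ADDED_INPUT_RE = re.compile(r"Added input\S*.*? on \[?([^\s\]]+)\]? \(size: ([\d.]+) ([KMGT]?B)")
UNIT_KB = {"B": 1 / 1024, "KB": 1, "MB": 1024, "GB": 1024 ** 2, "TB": 1024 ** 3}

def input_kb_per_host(lines):
    """Sum the reported input block sizes (in KB) per host:port (hypothetical helper)."""
    totals = defaultdict(float)
    for line in lines:
        for m in ADDED_INPUT_RE.finditer(line):
            host, size, unit = m.groups()
            totals[host] += float(size) * UNIT_KB[unit]
    return dict(totals)

sample = ["INFO BlockManagerInfo: Added input-0-1 on node1:45678 (size: 512.0 KB)"]
print(input_kb_per_host(sample))  # {'node1:45678': 512.0}
```

Dividing each host's total by the time span covered by the log gives a rough per-node input rate.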