Re: how spark dstream handles congestion?

2014-03-31 Thread Evgeny Shishkin
On 31 Mar 2014, at 21:05, Dong Mo wrote:
> Dear list,
>
> I was wondering how Spark handles congestion when the upstream is generating
> dstreams faster than downstream workers can handle?
It will eventually OOM.

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
On 28 Mar 2014, at 02:10, Scott Clasen wrote:
> Thanks everyone for the discussion.
>
> Just to note, I restarted the job yet again, and this time there are indeed
> tasks being executed by both worker nodes. So the behavior does seem
> inconsistent/broken atm.
>
> Then I added a third node to

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
On 28 Mar 2014, at 01:38, Evgeny Shishkin wrote:
>
> On 28 Mar 2014, at 01:32, Tathagata Das wrote:
>
>> Yes, no one has reported this issue before. I just opened a JIRA on what I
>> think is the main problem here
>> https://spark-project.atlassian.net/brows

Re: Spark Streaming + Kafka + Mesos/Marathon strangeness

2014-03-27 Thread Evgeny Shishkin
On 28 Mar 2014, at 01:44, Tathagata Das wrote:
> The more I think about it the problem is not about /tmp, it's more about the
> workers not having enough memory. Blocks of received data could be falling
> out of memory before it is getting processed.
> BTW, what is the storage level that you a

Re: spark streaming and the spark shell

2014-03-27 Thread Evgeny Shishkin
tion because you restarted app"
> TD
>
>
> On Thu, Mar 27, 2014 at 3:28 PM, Evgeny Shishkin wrote:
>
> On 28 Mar 2014, at 01:13, Tathagata Das wrote:
>
>> Seems like the configuration of the Spark worker is not right. Either the
>> worker has not been give

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
for that for tracking this issue.
> https://spark-project.atlassian.net/browse/SPARK-1341
>
>
Thank you, I will participate and can provide testing of new code. Sorry for capslock, I just debugged this whole day, literally.
> TD
>
>
> On Thu, Mar 27, 2014 at 3:23 PM, Evgeny

Re: spark streaming and the spark shell

2014-03-27 Thread Evgeny Shishkin
On 28 Mar 2014, at 01:13, Tathagata Das wrote:
> Seems like the configuration of the Spark worker is not right. Either the
> worker has not been given enough memory or the allocation of the memory to
> the RDD storage needs to be fixed. If configured correctly, the Spark workers
> should not

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
On 28 Mar 2014, at 01:11, Scott Clasen wrote:
> Evgeniy Shishkin wrote
>> So, at the bottom - kafka input stream just does not work.
>
>
> That was the conclusion I was coming to as well. Are there open tickets
> around fixing this up?
>
I am not aware of such. Actually nobody complained on

Re: KafkaInputDStream mapping of partitions to tasks

2014-03-27 Thread Evgeny Shishkin
On 28 Mar 2014, at 00:34, Scott Clasen wrote:
Actually looking closer it is stranger than I thought, in the spark UI, one executor has executed 4 tasks, and one has executed 1928
Can anyone explain the workings of a KafkaInputStream wrt kafka partitions and mapping to spark executors and ta

Re: spark streaming and the spark shell

2014-03-27 Thread Evgeny Shishkin
>
>> 2. I notice that once I start ssc.start(), my stream starts processing and
>> continues indefinitely...even if I close the socket on the server end (I'm
>> using unix command "nc" to mimic a server as explained in the streaming
>> programming guide.) Can I tell my stream to detect if it's

Re: example of non-line oriented input data?

2014-03-19 Thread Evgeny Shishkin
On 19 Mar 2014, at 19:54, Diana Carroll wrote:
> Actually, thinking more on this question, Matei: I'd definitely say support
> for Avro. There's a lot of interest in this!!
>
Agree, and parquet as default Cloudera Impala format.
> On Tue, Mar 18, 2014 at 8:14 PM, Matei Zaharia
> wrote: