[streaming] reading Kafka direct stream throws kafka.common.OffsetOutOfRangeException

2015-09-30 Thread Alexey Ponkin
Hi, I have a simple Spark Streaming job (8 executors, 1 core each, on an 8-node cluster) that reads from a Kafka topic (3 brokers, 8 partitions) and saves to Cassandra. The problem is that when I increase the number of incoming messages in the topic, the job starts to fail with kafka.common.OffsetOutOfRangeException.
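This exception typically means the offsets requested for a batch have already been deleted by Kafka's log retention, i.e., processing is falling behind the ingest rate. For reference, a minimal sketch of the pipeline described above, assuming 2015-era Spark 1.x APIs and hypothetical broker addresses, topic, keyspace, and table names:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import com.datastax.spark.connector.SomeColumns
    import com.datastax.spark.connector.streaming._ // enables saveToCassandra on DStreams

    object KafkaToCassandra {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-to-cassandra")
        val ssc  = new StreamingContext(conf, Seconds(5)) // hypothetical batch interval

        // Direct stream: one Spark partition per Kafka partition (8 here)
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc,
          Map("metadata.broker.list" -> "broker1:9092,broker2:9092,broker3:9092"), // hypothetical
          Set("events")) // hypothetical topic name

        stream
          .map { case (_, value) => (value, System.currentTimeMillis()) }
          .saveToCassandra("my_ks", "my_table", SomeColumns("value", "ts")) // hypothetical schema

        ssc.start()
        ssc.awaitTermination()
      }
    }

If batches take longer than the batch interval, unconsumed offsets can age out of Kafka's retention window; raising retention, adding processing capacity, or capping the per-batch read rate (spark.streaming.kafka.maxRatePerPartition) are the usual levers.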

Re: [streaming] DStream with window performance issue

2015-09-08 Thread Alexey Ponkin
Koeninger" <c...@koeninger.org>: >  Can you provide more info (what version of spark, code example)? > >  On Tue, Sep 8, 2015 at 8:18 AM, Alexey Ponkin <alexey.pon...@ya.ru> wrote: >>  Hi, >> >>  I have an application with 2 streams, which are joined together.

[streaming] DStream with window performance issue

2015-09-08 Thread Alexey Ponkin
Hi, I have an application with 2 streams, which are joined together. Stream1 is a simple DStream (relatively small batch chunks). Stream2 is a windowed DStream (with a duration of, for example, 60 seconds). Stream1 and Stream2 are Kafka direct streams. The problem is that, according to the logs, the window …
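A minimal sketch of the topology described above, assuming Spark 1.x Kafka direct streams with hypothetical broker and topic names (join requires both streams to be keyed, which Kafka's (key, value) pairs already are):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object WindowedJoin {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("windowed-join")
        val ssc  = new StreamingContext(conf, Seconds(5)) // hypothetical batch interval

        def kafkaStream(topic: String) =
          KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
            ssc, Map("metadata.broker.list" -> "broker1:9092"), Set(topic)) // hypothetical broker

        val stream1 = kafkaStream("topic1")                     // small per-batch chunks
        val stream2 = kafkaStream("topic2").window(Seconds(60)) // slides every batch interval

        // Each batch of stream1 is joined against the last 60 seconds of stream2.
        stream1.join(stream2).foreachRDD(rdd => println(s"joined: ${rdd.count()}"))

        ssc.start()
        ssc.awaitTermination()
      }
    }

A common performance pitfall with this shape is that the Kafka RDDs backing the window are recomputed (re-fetched from the brokers) for every overlapping window unless the windowed stream's parent is persisted, e.g. kafkaStream("topic2").cache().window(Seconds(60)).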

[streaming] Using org.apache.spark.Logging will silently break task execution

2015-09-06 Thread Alexey Ponkin
Hi, I have the following code:

    object MyJob extends org.apache.spark.Logging {
      ...
      val source: DStream[SomeType] = ...
      source.foreachRDD { rdd =>
        logInfo(s"""+++ForEachRDD+++""")
        rdd.foreachPartition { partitionOfRecords =>
          logInfo(s"""+++ForEachPartition+++""")
        }
      }
    }
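Note that logInfo inside foreachPartition executes on the executors, so its output lands in the executor logs, not the driver console. One commonly suggested workaround (a sketch under assumptions, not this thread's confirmed resolution) is to avoid capturing the enclosing object, with its Logging mixin, in the task closure, and instead obtain an SLF4J logger inside the partition closure:

    import org.slf4j.LoggerFactory

    // source: DStream[SomeType], as above
    source.foreachRDD { rdd =>
      rdd.foreachPartition { partitionOfRecords =>
        // Created on the executor rather than serialized from the driver
        // as part of the captured MyJob object.
        val log = LoggerFactory.getLogger("MyJob")
        log.info("+++ForEachPartition+++")
        partitionOfRecords.foreach { _ => () } // hypothetical per-record work
      }
    }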