So there's no reason to use checkpointing at all, right? Eliminate
that as a possible source of problems.

Probably unrelated, but this also isn't a very good way to benchmark.
Kafka producers are threadsafe; there's no reason to create one for
each partition.
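(A rough sketch of that shared-producer pattern. It assumes the plain
Kafka 0.8.2 producer client rather than HiBench's KafkaReporter, whose
internals aren't shown in the thread; the broker string and topic name
are placeholders.)

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object SharedProducer {
      // Built at most once per executor JVM. KafkaProducer is threadsafe,
      // so all partitions and tasks on the executor can share this instance.
      lazy val producer: KafkaProducer[String, String] = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092") // placeholder broker list
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        new KafkaProducer[String, String](props)
      }
    }

    // Inside foreachPartition, reuse the JVM-wide producer instead of
    // constructing a reporter per partition (lines is the
    // DStream[(Long, String)] from the Identity class quoted below):
    lines.foreachRDD(rdd => rdd.foreachPartition(partLines => {
      val producer = SharedProducer.producer
      partLines.foreach { case (inTime, _) =>
        val outTime = System.currentTimeMillis()
        producer.send(new ProducerRecord("report-topic", // placeholder topic
          inTime.toString, outTime.toString))
      }
    }))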
On Mon, Feb 20, 2017 at 4:43 PM, Muhammad Haseeb Javed
<11besemja...@seecs.edu.pk> wrote:
> This is the code that I have been trying; it gives me this error. No
> complicated operation is being performed on the topics as far as I can see.
>
> class Identity() extends BenchBase {
>
>   override def process(lines: DStream[(Long, String)],
>                        config: SparkBenchConfig): Unit = {
>     val reportTopic = config.reporterTopic
>     val brokerList = config.brokerList
>
>     lines.foreachRDD(rdd => rdd.foreachPartition(partLines => {
>       val reporter = new KafkaReporter(reportTopic, brokerList)
>       partLines.foreach { case (inTime, content) =>
>         val outTime = System.currentTimeMillis()
>         reporter.report(inTime, outTime)
>         if (config.debugMode) {
>           println("Event: " + inTime + ", " + outTime)
>         }
>       }
>     }))
>   }
> }
>
> On Mon, Feb 20, 2017 at 3:10 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> That's an indication that the beginning offset for a given batch is
>> higher than the ending offset, i.e. something is seriously wrong.
>>
>> Are you doing anything at all odd with topics, i.e. deleting and
>> recreating them, using compacted topics, etc.?
>>
>> Start off with a very basic stream over the same kafka topic that just
>> does foreach println or similar, with no checkpointing at all, and get
>> that working first.
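(A bare-bones version of that sanity check, as a rough sketch: it
assumes the spark-streaming-kafka-0-8 direct-stream API that the stack
trace below points at; the app name, broker list, and topic are
placeholders.)

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object BasicStreamCheck {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("BasicStreamCheck")
        // No checkpoint directory is set anywhere: this isolates the
        // pipeline from any checkpointing problems.
        val ssc = new StreamingContext(conf, Seconds(5))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder
        val topics = Set("identity") // placeholder topic name

        val stream = KafkaUtils.createDirectStream[
          String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

        // Just print a handful of records per batch; no state, no output topic.
        stream.foreachRDD(rdd => rdd.take(10).foreach(println))

        ssc.start()
        ssc.awaitTermination()
      }
    }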
>> On Mon, Feb 20, 2017 at 12:10 PM, Muhammad Haseeb Javed
>> <11besemja...@seecs.edu.pk> wrote:
>> > Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala 2.10.
>> >
>> > On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed
>> > <11besemja...@seecs.edu.pk> wrote:
>> >>
>> >> I am a PhD student at Ohio State working on a study to evaluate
>> >> streaming frameworks (Spark Streaming, Storm, Flink) using the Intel
>> >> HiBench benchmarks, but I think I am having a problem with Spark. I
>> >> have a Spark Streaming application which I am trying to run on a
>> >> 5-node cluster (including the master). I have 2 zookeeper nodes and
>> >> 4 kafka brokers. However, whenever I run a Spark Streaming
>> >> application I encounter the following error:
>> >>
>> >> java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative
>> >>     at scala.Predef$.require(Predef.scala:224)
>> >>     at org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
>> >>     at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165)
>> >>     at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>> >>     at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>> >>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>> >>     at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>> >>     at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>> >>     at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
>> >>     at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
>> >>     at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
>> >>     at scala.Option.orElse(Option.scala:289)
>> >>
>> >> The application starts fine, but as soon as the Kafka producers start
>> >> emitting the stream data I start receiving the aforementioned error
>> >> repeatedly.
>> >>
>> >> I have tried removing the Spark Streaming checkpoint files, as has
>> >> been suggested in similar posts on the internet. However, the problem
>> >> persists even if I start a Kafka topic and its corresponding consumer
>> >> Spark Streaming application for the first time. The problem also
>> >> should not be offset-related, as I start the topic for the first
>> >> time.
>> >>
>> >> The application does seem to process the stream properly, as I can
>> >> see from the benchmark numbers generated. However, the numbers are
>> >> way off from what I got for Storm and Flink, leading me to believe
>> >> that there is something wrong with the pipeline and Spark is not
>> >> able to process the stream as cleanly as it should. Any help in this
>> >> regard would be really appreciated.
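(For context on the error itself: going by the stack trace and the
explanation above, the failing check, paraphrased from the Spark
sources named in the trace, amounts to the following.)

    // StreamInputInfo (InputInfoTracker.scala:38 in the trace) rejects a
    // negative record count for a batch:
    require(numRecords >= 0, "numRecords must not be negative")

    // For the 0.8 direct stream, that count comes from the batch's offset
    // ranges, roughly sum(untilOffset - fromOffset) over partitions, so the
    // error fires when a partition's starting offset is past its ending
    // offset -- exactly the "beginning offset higher than the ending offset"
    // situation Cody describes above.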