Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-03-08 Thread Muhammad Haseeb Javed
d for the kafka client dependency, it shouldn't have compiled at all to begin with. > > On Wed, Feb 22, 2017 at 12:11 PM, Muhammad Haseeb Javed <11besemja...@seecs.edu.pk> wrote: > > I just noticed that the Spark version I am using (2.0.2) is built with Sca

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-22 Thread Muhammad Haseeb Javed
riously wrong. > >> > >> Are you doing anything at all odd with topics, i.e. deleting and recreating them, using compacted topics, etc? > >> > >> Start off with a very basic stream over the same kafka topic that just does foreach print
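Why deleting and recreating a topic matters can be sketched in plain Scala. As far as I can tell, the Kafka 0.8 direct stream counts a batch's records as the sum of `untilOffset - fromOffset` over its offset ranges, and the error text matches a `require(numRecords >= 0, ...)` check on that sum. The model below is a minimal sketch under that assumption: `OffsetSpan` and `batchRecordCount` are hypothetical names, not Spark classes. It shows that a remembered starting offset ahead of a recreated topic's latest offset yields a negative count and exactly this message.

```scala
// Hypothetical stand-in for a Kafka offset range; not Spark's own OffsetRange class.
final case class OffsetSpan(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
  // Records between the two offsets; negative if untilOffset < fromOffset.
  def count: Long = untilOffset - fromOffset
}

object NegativeCountSketch {
  // Sums the per-range counts and applies the same kind of sanity check that
  // produces "requirement failed: numRecords must not be negative".
  def batchRecordCount(spans: Seq[OffsetSpan]): Long = {
    val numRecords = spans.map(_.count).sum
    require(numRecords >= 0, "numRecords must not be negative")
    numRecords
  }

  def main(args: Array[String]): Unit = {
    // Normal batch: offsets only move forward.
    val healthy = Seq(OffsetSpan("events", 0, 100L, 150L), OffsetSpan("events", 1, 200L, 260L))
    println(batchRecordCount(healthy)) // prints 110

    // Topic deleted and recreated: the remembered start offset (500) is now
    // ahead of the recreated topic's latest offset (20).
    val recreated = Seq(OffsetSpan("events", 0, 500L, 20L))
    try batchRecordCount(recreated)
    catch {
      case e: IllegalArgumentException => println(e.getMessage) // the requirement-failed message
    }
  }
}
```

In this model, the fix is to start from offsets that actually exist in the current topic (for example, by discarding stale checkpointed or stored offsets), which is consistent with the advice to first get a stream working with no checkpointing at all.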

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
er the same kafka topic that just does foreach println or similar, with no checkpointing at all, and get that working first. > > On Mon, Feb 20, 2017 at 12:10 PM, Muhammad Haseeb Javed <11besemja...@seecs.edu.pk> wrote: > > Update: I am using Spark 2.0.2 and Kafka 0

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala 2.10. On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed <11besemja...@seecs.edu.pk> wrote: > I am a PhD student at Ohio State working on a study to evaluate streaming frameworks (Spark Streaming, Storm, Flink) using t

Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
I am a PhD student at Ohio State working on a study to evaluate streaming frameworks (Spark Streaming, Storm, Flink) using the Intel HiBench benchmarks. However, I think I am having a problem with Spark. I have a Spark Streaming application which I am trying to run on a 5-node cluster (including

Wrap an RDD with a ShuffledRDD

2015-11-08 Thread Muhammad Haseeb Javed
I am working on a modified Spark core and have a Broadcast variable which I deserialize to obtain an RDD along with its set of dependencies, as is done in ShuffleMapTask, as follows:
val taskBinary: Broadcast[Array[Byte]]
var (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](

What is the abstraction for a Worker process in Spark code

2015-10-12 Thread Muhammad Haseeb Javed
I understand that each executor that is processing a Spark job is represented in Spark code by the Executor class in Executor.scala, and that CoarseGrainedExecutorBackend is the abstraction which facilitates communication between an Executor and the Driver. But what is the abstraction for a Worker process

Building spark-examples takes too much time using Maven

2015-08-26 Thread Muhammad Haseeb Javed
I checked out the master branch and started playing around with the examples. I want to build a jar of the examples, as I wish to run them using the modified Spark jar that I have. However, packaging spark-examples takes too much time, as Maven tries to download the jar dependencies rather than use

Re: Difference between Sort based and Hash based shuffle

2015-08-19 Thread Muhammad Haseeb Javed
to disk, then finally merges all the spilled files together to form one final output file. This places much less stress on the file system and requires far fewer I/O operations, especially on the read side. -Andrew 2015-08-16 11:08 GMT-07:00 Muhammad Haseeb Javed 11besemja...@seecs.edu.pk
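The difference Andrew describes can be illustrated with a toy, in-memory model of what one map task writes. This is a minimal sketch, not Spark's actual shuffle writers: it skips spilling and merging, and `partitionOf` is a hypothetical stand-in for a hash partitioner. With hash-based shuffle each map task keeps one output bucket per reducer, so M map tasks and R reducers produce M × R files (without file consolidation); with sort-based shuffle each map task writes one data file ordered by target partition plus an index of per-partition offsets, i.e. 2 × M files in total.

```scala
object ShuffleLayoutSketch {
  type Record = (String, Int)

  // Hypothetical hash partitioner: non-negative key.hashCode modulo the reducer count.
  def partitionOf(key: String, numReducers: Int): Int = {
    val raw = key.hashCode % numReducers
    if (raw < 0) raw + numReducers else raw
  }

  // Hash-style layout for one map task: a separate bucket (file) per reducer.
  def hashLayout(records: Seq[Record], numReducers: Int): Vector[Vector[Record]] = {
    val buckets = Array.fill(numReducers)(Vector.empty[Record])
    records.foreach { rec =>
      val p = partitionOf(rec._1, numReducers)
      buckets(p) = buckets(p) :+ rec
    }
    buckets.toVector
  }

  // Sort-style layout for one map task: a single data "file" ordered by partition id,
  // plus an index where partition p occupies data(index(p) until index(p + 1)).
  def sortLayout(records: Seq[Record], numReducers: Int): (Vector[Record], Vector[Int]) = {
    // sortBy is stable, so records keep their input order within a partition.
    val data = records.sortBy(rec => partitionOf(rec._1, numReducers)).toVector
    val counts = Array.fill(numReducers)(0)
    data.foreach(rec => counts(partitionOf(rec._1, numReducers)) += 1)
    (data, counts.scanLeft(0)(_ + _).toVector)
  }

  // Output files for M map tasks and R reducers.
  def hashFileCount(mapTasks: Int, reducers: Int): Long = mapTasks.toLong * reducers // no consolidation
  def sortFileCount(mapTasks: Int): Long = 2L * mapTasks // one data + one index file per map task

  def main(args: Array[String]): Unit = {
    val records = Seq("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4, "e" -> 5)
    val numReducers = 3
    val (data, index) = sortLayout(records, numReducers)
    val buckets = hashLayout(records, numReducers)
    // A reducer reading its slice of the sorted file sees the same records as its hash bucket.
    for (p <- 0 until numReducers)
      println(s"partition $p: sorted=${data.slice(index(p), index(p + 1))} hash=${buckets(p)}")
    println(s"1000 maps x 1000 reducers: hash=${hashFileCount(1000, 1000)}, sort=${sortFileCount(1000)} files")
  }
}
```

Both layouts deliver the same records to each reducer; the sort-based one simply does so with a number of files that grows with M rather than M × R, which is where the reduced file-system stress comes from.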

Re: Difference between Sort based and Hash based shuffle

2015-08-16 Thread Muhammad Haseeb Javed
ravikiranmag...@gmail.com wrote: Have a look at this presentation: http://www.slideshare.net/colorant/spark-shuffle-introduction . It may be of help to you. On Sat, Aug 15, 2015 at 1:42 PM, Muhammad Haseeb Javed 11besemja...@seecs.edu.pk wrote: What are the major differences between how Sort

Difference between Sort based and Hash based shuffle

2015-08-15 Thread Muhammad Haseeb Javed
What are the major differences between how sort-based and hash-based shuffle operate, and what causes sort-based shuffle to perform better than hash-based shuffle? Are there any talks that discuss both shuffles in detail, how they are implemented, and the performance gains?