Inquiry about Processing Speed

2023-09-27 Thread Haseeb Khalid
Dear Support Team, I hope this message finds you well. My name is Haseeb Khalid, and I am reaching out to discuss a scenario related to processing speed in Apache Spark. I have been utilizing these technologies in our projects, and we have encountered a specific use case where we are seeking

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-03-08 Thread Muhammad Haseeb Javed
d for the kafka client > dependency, it shouldn't have compiled at all to begin with. > > On Wed, Feb 22, 2017 at 12:11 PM, Muhammad Haseeb Javed > <11besemja...@seecs.edu.pk> wrote: > > I just noticed that Spark version that I am using (2.0.2) is built with > > Sca

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-22 Thread Muhammad Haseeb Javed
no reason to use checkpointing at all, right? Eliminate > that as a possible source of problems. > > Probably unrelated, but this also isn't a very good way to benchmark. > Kafka producers are threadsafe, there's no reason to create one for > each partition. > > On Mon, Feb 20, 20
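The advice quoted in this reply is that Kafka producers are thread-safe, so there is no need to create one per partition. A common way to apply it is to hold one client per JVM and reuse it from every partition. The sketch below is pure Scala and uses a hypothetical `SharedClient` class as a stand-in for a real producer. It is not Kafka's API. The part that matters is the lazily initialized, per-JVM singleton in `ClientHolder`: Scala initializes an object's `lazy val` exactly once, even when it is accessed concurrently.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a thread-safe client such as a Kafka producer.
// The class name and send method are illustrative only, not Kafka's API.
final class SharedClient {
  val sent = new AtomicInteger(0)
  def send(msg: String): Unit = { sent.incrementAndGet(); () }
}

// One client per JVM: the lazy val is initialized once, on first access,
// and that initialization is synchronized by the Scala runtime.
object ClientHolder {
  val created = new AtomicInteger(0)
  lazy val client: SharedClient = { created.incrementAndGet(); new SharedClient }
}

// Called once per partition (hypothetical helper). Every partition reuses the
// same client instead of constructing a new one.
object PerPartitionWriter {
  def writePartition(records: Iterator[String]): Unit =
    records.foreach(msg => ClientHolder.client.send(msg))
}
```

In a Spark job, a holder like this would typically be referenced from inside `foreachPartition`, so tasks running on the same executor share one instance. This is a general pattern and is not something the thread itself prescribes.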

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
er the same kafka topic that just > does foreach println or similar, with no checkpointing at all, and get > that working first. > > On Mon, Feb 20, 2017 at 12:10 PM, Muhammad Haseeb Javed > <11besemja...@seecs.edu.pk> wrote: > > Update: I am using Spark 2.0.2 and Kafka 0

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
Update: I am using Spark 2.0.2 and Kafka 0.8.2 with Scala 2.10. On Mon, Feb 20, 2017 at 1:06 PM, Muhammad Haseeb Javed < 11besemja...@seecs.edu.pk> wrote: > I am a PhD student at Ohio State working on a study to evaluate streaming > frameworks (Spark Streaming, Storm, Flink) using t

Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Muhammad Haseeb Javed
I am a PhD student at Ohio State working on a study to evaluate streaming frameworks (Spark Streaming, Storm, Flink) using the Intel HiBench benchmarks. However, I think I am having a problem with Spark. I have a Spark Streaming application which I am trying to run on a 5 node cluster (including

Wrap an RDD with a ShuffledRDD

2015-11-08 Thread Muhammad Haseeb Javed
I am working on a modified Spark core and have a Broadcast variable which I deserialize to obtain an RDD along with its set of dependencies, as is done in ShuffleMapTask, as follows: val taskBinary: Broadcast[Array[Byte]]; var (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](

What is the abstraction for a Worker process in Spark code

2015-10-12 Thread Muhammad Haseeb Javed
I understand that each executor that is processing a Spark job is emulated in Spark code by the Executor class in Executor.scala and CoarseGrainedExecutorBackend is the abstraction which facilitates communication between an Executor and the Driver. But what is the abstraction for a Worker process

Building spark-examples takes too much time using Maven

2015-08-26 Thread Muhammad Haseeb Javed
I checked out the master branch and started playing around with the examples. I want to build a jar of the examples, as I wish to run them using the modified Spark jar that I have. However, packaging spark-examples takes too much time, as Maven tries to download the jar dependencies rather than use

Re: Difference between Sort based and Hash based shuffle

2015-08-19 Thread Muhammad Haseeb Javed
to disk, then finally merges all the spilled files together to form one final output file. This places much less stress on the file system and requires far fewer I/O operations, especially on the read side. -Andrew 2015-08-16 11:08 GMT-07:00 Muhammad Haseeb Javed 11besemja...@seecs.edu.pk
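The reply above describes the sort-based writer spilling sorted runs to disk and then merging every spill into one final output file. Below is a simplified, in-memory Scala sketch of that merge step only. It assumes each spill is a sequence of `(partitionId, record)` pairs already sorted by partition id. `SortShuffleSketch.mergeSpills` and its index layout are illustrative and are not Spark's internal API. The function also records where each reduce partition starts. That index is what allows one data file per map task, plus an index, to replace the hash-based layout. In the hash-based layout, each map task writes one file per reducer, so M map tasks and R reducers produce M * R files.

```scala
import scala.collection.mutable

object SortShuffleSketch {
  // A record tagged with the reduce partition it belongs to.
  type Rec = (Int, String)

  /** Merge spills (each already sorted by partition id) into one output.
    * Returns the merged records and an index of length numPartitions + 1:
    * partition p occupies positions index(p) until index(p + 1). */
  def mergeSpills(spills: Seq[Seq[Rec]], numPartitions: Int): (Vector[Rec], Vector[Int]) = {
    // Min-heap keyed on the partition id at the head of each spill iterator.
    val byHead = Ordering.by[BufferedIterator[Rec], Int](_.head._1).reverse
    val heap = mutable.PriorityQueue.empty[BufferedIterator[Rec]](byHead)
    heap ++= spills.map(_.iterator.buffered).filter(_.hasNext)

    val out = Vector.newBuilder[Rec]
    while (heap.nonEmpty) {
      val it = heap.dequeue()          // spill with the smallest head partition
      out += it.next()
      if (it.hasNext) heap.enqueue(it) // re-insert, now keyed on its new head
    }
    val merged = out.result()

    // Count records per partition, then turn the counts into start offsets.
    val counts = Array.fill(numPartitions)(0)
    merged.foreach { case (p, _) => counts(p) += 1 }
    (merged, counts.scanLeft(0)(_ + _).toVector)
  }
}
```

A reducer that needs partition p only has to read the slice between `index(p)` and `index(p + 1)` of the single merged output. This is the read-side saving the reply refers to.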

Re: Difference between Sort based and Hash based shuffle

2015-08-16 Thread Muhammad Haseeb Javed
ravikiranmag...@gmail.com wrote: Have a look at this presentation. http://www.slideshare.net/colorant/spark-shuffle-introduction . Can be of help to you. On Sat, Aug 15, 2015 at 1:42 PM, Muhammad Haseeb Javed 11besemja...@seecs.edu.pk wrote: What are the major differences between how Sort

Difference between Sort based and Hash based shuffle

2015-08-15 Thread Muhammad Haseeb Javed
What are the major differences between how Sort based and Hash based shuffle operate, and what is it that causes Sort Shuffle to perform better than Hash? Are there any talks that discuss both shuffles in detail, how they are implemented and the performance gains?

Actor not found for: ActorSelection

2015-07-28 Thread Haseeb
I just cloned the master repository of Spark from Github. I am running it on OSX 10.9, Spark 1.4.1 and Scala 2.10.4. I just tried to run the SparkPi example program using IntelliJ Idea but get the error: akka.actor.ActorNotFound: Actor not found for:

Re: Actor not found for: ActorSelection

2015-07-28 Thread Haseeb
The problem was that I was trying to start the example app in standalone cluster mode by passing in *-Dspark.master=spark://myhost:7077* as an argument to the JVM. I launched the example app locally using *-Dspark.master=local* and it worked.