Low Level Kafka Consumer for Spark

2014-08-02 Thread Dibyendu Bhattacharya
Hi, I have implemented a Low Level Kafka Consumer for Spark Streaming using the Kafka Simple Consumer API. This API gives better control over Kafka offset management and recovery from failures. As the present Spark KafkaUtils uses the High Level Kafka Consumer API, I wanted to have a better

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread Dibyendu Bhattacharya
You can try this Kafka Spark Consumer which I recently wrote. It uses the Low Level Kafka Consumer: https://github.com/dibbhatt/kafka-spark-consumer Dibyendu On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s rafeeq.ec...@gmail.com wrote: Hi, I am new to Apache Spark and trying to develop a Spark

Re: Low Level Kafka Consumer for Spark

2014-08-05 Thread Dibyendu Bhattacharya
It's great to see community effort on adding new streams/receivers, adding a Java API for receivers was something we did specifically to allow this :) - Patrick On Sat, Aug 2, 2014 at 10:09 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi, I have implemented a Low Level

Re: Data loss - Spark streaming and network receiver

2014-08-18 Thread Dibyendu Bhattacharya
Dear All, Recently I have written a Spark Kafka Consumer to solve this problem. We have also seen issues with KafkaUtils, which uses the High Level Kafka Consumer, where the consumer code has no handle on offset management. The below code solves this problem, and it is being tested in our Spark

Re: Low Level Kafka Consumer for Spark

2014-08-26 Thread Dibyendu Bhattacharya
Hi, As I understand, your problem is similar to this JIRA: https://issues.apache.org/jira/browse/SPARK-1647 The issue in this case is that Kafka cannot replay the messages as the offsets are already committed. Also, I think the existing KafkaUtils (the default High Level Kafka Consumer) has this issue as well.

Re: Low Level Kafka Consumer for Spark

2014-08-26 Thread Dibyendu Bhattacharya
Hi Bharat, Thanks for your email. If the Kafka Reader worker process dies, it will be replaced on a different machine, and it will start consuming from the offset where it left off (for each partition). The same can happen even if I have an individual Receiver for every partition.

Re: Low Level Kafka Consumer for Spark

2014-08-27 Thread Dibyendu Bhattacharya
I agree. This issue should be fixed in Spark rather than relying on replay of Kafka messages. Dib On Aug 28, 2014 6:45 AM, RodrigoB rodrigo.boav...@aspect.com wrote: Dibyendu, Thanks for getting back. I believe you are absolutely right. We were under the assumption that the raw data was being

Re: Low Level Kafka Consumer for Spark

2014-09-03 Thread Dibyendu Bhattacharya
Hi, Sorry for the little delay. As discussed in this thread, I have modified the Kafka-Spark-Consumer ( https://github.com/dibbhatt/kafka-spark-consumer) code to have a dedicated Receiver for every Topic Partition. You can see the example of how to create a Union of these receivers in
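[Editor's note] A minimal sketch of the union-of-receivers pattern described above, using socketTextStream only as a stand-in for the consumer's per-partition receiver streams (the real per-partition stream factory lives in the kafka-spark-consumer project):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UnionOfReceiversSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("UnionOfReceiversSketch"), Seconds(10))

    // One receiver-backed DStream per topic partition. socketTextStream is only a
    // placeholder; the consumer creates one Kafka receiver per partition instead.
    val numPartitions = 3
    val perPartitionStreams = (0 until numPartitions).map(p => ssc.socketTextStream("localhost", 9999 + p))

    // Union them into a single DStream so downstream processing sees one stream.
    val unified = ssc.union(perPartitionStreams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```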

Re: Low Level Kafka Consumer for Spark

2014-09-07 Thread Dibyendu Bhattacharya
Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi, Sorry for the little delay. As discussed in this thread, I have modified the Kafka-Spark-Consumer ( https://github.com/dibbhatt/kafka-spark-consumer) code to have a dedicated Receiver for every Topic Partition. You can see the example

Re: Low Level Kafka Consumer for Spark

2014-09-10 Thread Dibyendu Bhattacharya
at a very, very slow pace. Regards, Dibyendu On Mon, Sep 8, 2014 at 12:08 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi Tathagata, I have managed to implement the logic in the Kafka-Spark consumer to recover from Driver failure. This is just an interim fix till

Re: How to scale more consumer to Kafka stream

2014-09-10 Thread Dibyendu Bhattacharya
Hi, You can use this Kafka Spark Consumer: https://github.com/dibbhatt/kafka-spark-consumer It does exactly that. It creates parallel Receivers for every Kafka topic partition. You can see Consumer.java under the consumer.kafka.client package for an example of how to use it. There is
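[Editor's note] For reference, a hedged Scala sketch of what launching this consumer looks like in later versions. The consumer.kafka package name, the property keys, and the ReceiverLauncher.launch signature below are assumptions from memory of the project README; Consumer.java in the repo is the authoritative example.

```scala
import java.util.Properties
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import consumer.kafka.ReceiverLauncher // assumed package/class name from the project

object KafkaSparkConsumerSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaSparkConsumerSketch"), Seconds(10))

    // Assumed property keys -- check the project README for the exact names.
    val props = new Properties()
    props.put("zookeeper.hosts", "zk-host")
    props.put("zookeeper.port", "2181")
    props.put("kafka.topic", "mytopic")
    props.put("kafka.consumer.id", "my-consumer-group")

    // Assumed signature: launch(ssc, props, numberOfReceivers, storageLevel).
    val numberOfReceivers = 3
    val stream = ReceiverLauncher.launch(ssc, props, numberOfReceivers, StorageLevel.MEMORY_AND_DISK_SER)

    stream.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```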

Re: How to scale more consumer to Kafka stream

2014-09-11 Thread Dibyendu Bhattacharya
I agree Gerard. Thanks for pointing this out. Dib On Thu, Sep 11, 2014 at 5:28 PM, Gerard Maas gerard.m...@gmail.com wrote: This pattern works. One note, though: use 'union' only if you need to group the data from all RDDs into one RDD for processing (like count distinct or need a groupBy).

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Dibyendu Bhattacharya
ContextCleaner? I met a very similar issue before… but it hasn't been resolved. Best, -- Nan Zhu On Thursday, September 11, 2014 at 10:13 AM, Dibyendu Bhattacharya wrote: Dear All, Not sure if this is a false alarm. But I wanted to raise this to understand what is happening. I am testing

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Dibyendu Bhattacharya
to prevent the MemoryStore/BlockManager from dropping the block inputs? And should they be logged at the WARN/ERROR level? Thanks. On Fri, Sep 12, 2014 at 4:45 PM, Dibyendu Bhattacharya [via Apache Spark User List] wrote: Dear all

Re: Low Level Kafka Consumer for Spark

2014-09-15 Thread Dibyendu Bhattacharya
Hi Alon, No, there is no guarantee that the same set of messages will come in the same RDD. This fix just replays the messages from the last processed offset in the same order. Again, this is just an interim fix we needed to solve our use case. If you do not need this message replay feature, just do not

Re: Low Level Kafka Consumer for Spark

2014-09-15 Thread Dibyendu Bhattacharya
? Thanks, Tim On Mon, Sep 15, 2014 at 4:33 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi Alon, No, there is no guarantee that the same set of messages will come in the same RDD. This fix just replays the messages from the last processed offset in the same order. Again

Re: Error when Spark streaming consumes from Kafka

2014-11-22 Thread Dibyendu Bhattacharya
I believe this has something to do with how the Kafka High Level API manages consumers within a Consumer group and how it re-balances during failure. You can find some mention of this in the Kafka wiki: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design Due to various issues in Kafka

Re: Low Level Kafka Consumer for Spark

2014-12-03 Thread Dibyendu Bhattacharya
Hi, Yes, as Jerry mentioned, SPARK-3129 ( https://issues.apache.org/jira/browse/SPARK-3129) enabled the WAL feature which solves the Driver failure problem. The way SPARK-3129 is designed, it solves the driver failure problem agnostic of the source of the stream (like Kafka or Flume, etc.). But
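[Editor's note] For context, a minimal sketch of the recovery path SPARK-3129 plugs into: the receiver write-ahead log enabled via configuration, plus checkpoint-based StreamingContext recovery. Paths and the batch interval are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalRecoverySketch {
  val checkpointDir = "hdfs:///checkpoints/my-streaming-app" // placeholder path

  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("WalRecoverySketch")
      // Write-ahead log for receiver-based sources (the SPARK-3129 feature).
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... define input streams and the processing graph here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On driver restart this rebuilds the context from the checkpoint and replays WAL blocks.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```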

Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Dibyendu Bhattacharya
Or you can use this Low Level Kafka Consumer for Spark: https://github.com/dibbhatt/kafka-spark-consumer This is now part of http://spark-packages.org/ and has been running successfully for the past few months in Pearson's production environment. Being a Low Level consumer, it does not have this re-balancing

Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Dibyendu Bhattacharya
working nicely. I think there is merit in making it the default Kafka Receiver for Spark Streaming. -neelesh On Mon, Feb 2, 2015 at 5:25 PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Or you can use this Low Level Kafka Consumer for Spark : https://github.com/dibbhatt/kafka

Re: Low Level Kafka Consumer for Spark

2015-01-15 Thread Dibyendu Bhattacharya
Hi Kidong, No, I have not tried with Spark 1.2 yet. I will try this out and let you know how it goes. By the way, has there been any change to the Receiver store method in Spark 1.2? Regards, Dibyendu On Fri, Jan 16, 2015 at 11:25 AM, mykidong mykid...@gmail.com wrote: Hi
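[Editor's note] Since the question is about the Receiver store method, here is a minimal sketch of the Spark Receiver API in question; a reliable receiver stores a whole block at once (the ArrayBuffer variant blocks until the BlockManager has stored it) and only then acknowledges offsets.

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class SketchReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

  override def onStart(): Unit = {
    new Thread("sketch-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = {
    // signal the receiving thread to stop
  }

  private def receive(): Unit = {
    while (!isStopped()) {
      val block = ArrayBuffer("message-1", "message-2") // placeholder for a fetched batch
      store(block) // blocks until the block is written, so offsets can be committed safely afterwards
    }
  }
}
```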

Re: Low Level Kafka Consumer for Spark

2015-01-15 Thread Dibyendu Bhattacharya
On Fri, Jan 16, 2015 at 11:50 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi Kidong, No , I have not tried yet with Spark 1.2 yet. I will try this out and let you know how this goes. By the way, is there any change in Receiver Store method happened in Spark 1.2

Re: Low Level Kafka Consumer for Spark

2015-01-16 Thread Dibyendu Bhattacharya
https://github.com/dibbhatt/kafka-spark-consumer/blob/master/examples/scala/LowLevelKafkaConsumer.scala#L45 which you can run after changing a few lines of configuration. Thanks Best Regards On Fri, Jan 16, 2015 at 12:23 PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi Kidong

Re: Spark Streaming with Kafka

2015-01-21 Thread Dibyendu Bhattacharya
You can probably try the Low Level Consumer option with Spark 1.2: https://github.com/dibbhatt/kafka-spark-consumer This Consumer can recover from any underlying failure of the Spark platform or Kafka and either retry or restart the receiver. This has been working nicely for us. Regards, Dibyendu

Re: Spark streaming app shutting down

2015-02-04 Thread Dibyendu Bhattacharya
Thanks Akhil for mentioning this Low Level Consumer ( https://github.com/dibbhatt/kafka-spark-consumer ). Yes, it has a better fault tolerance mechanism than any existing Kafka consumer available. It has no data loss on receiver failure and has the ability to retry or restart itself in case of

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Dibyendu Bhattacharya
Which version of Spark are you running? You can try this Low Level Consumer: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer This is designed to recover from various failures and has a very good fault recovery mechanism built in. This is being used by many users and at present

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Dibyendu Bhattacharya
PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Which version of Spark you are running ? You can try this Low Level Consumer : http://spark-packages.org/package/dibbhatt/kafka-spark-consumer This is designed to recover from various failures and have very good fault recovery

Re: Some questions on Multiple Streams

2015-04-24 Thread Dibyendu Bhattacharya
You can probably try the Low Level Consumer from spark-packages ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer). How many partitions are there for your topics? Let's say you have 10 topics, each having 3 partitions; ideally you can create a max of 30 parallel Receivers and 30

Re: Reading Real Time Data only from Kafka

2015-05-13 Thread Dibyendu Bhattacharya
to do your transformation on the iterator inside a foreachPartition. Again, this isn't a con of the direct stream api, this is just a need to understand how Spark works. On Tue, May 12, 2015 at 10:30 PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: The low level consumer

Re: force the kafka consumer process to different machines

2015-05-13 Thread Dibyendu Bhattacharya
Or you can use this Receiver as well: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer You can specify how many Receivers you need for your topic and it will divide the partitions among the Receivers and return the joined stream for you. Say you specified 20 receivers, in
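[Editor's note] As an illustration only (not the library's actual code), this is the kind of round-robin split being described, e.g. 60 partitions spread over 20 receivers:

```scala
// Illustration: divide partitions 0..59 among 20 receivers, round-robin.
val numPartitions = 60
val numReceivers = 20

val assignment: Map[Int, Seq[Int]] =
  (0 until numPartitions).groupBy(_ % numReceivers).map { case (r, ps) => r -> ps.toSeq }

// e.g. receiver 0 -> partitions 0, 20, 40
assignment.toSeq.sortBy(_._1).foreach { case (r, ps) =>
  println(s"receiver $r -> partitions ${ps.mkString(", ")}")
}
```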

Spark Streaming graceful shutdown in Spark 1.4

2015-05-18 Thread Dibyendu Bhattacharya
Hi, Just figured out that if I want to perform a graceful shutdown of Spark Streaming 1.4 (from master), Runtime.getRuntime().addShutdownHook no longer works. In Spark 1.4 there is Utils.addShutdownHook defined for Spark Core, which gets called anyway, which leads to graceful shutdown
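[Editor's note] For reference, this is the shutdown-hook pattern under discussion, as a sketch; as noted in the thread, in Spark 1.4 the core Utils.addShutdownHook may stop the SparkContext before a hook like this runs, so it cannot be relied on there.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object GracefulShutdownSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("GracefulShutdownSketch"), Seconds(10))
    // ... define streams and processing here ...

    // JVM shutdown hook asking Streaming to finish in-flight batches before stopping.
    sys.addShutdownHook {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```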

Re: Spark Streaming graceful shutdown in Spark 1.4

2015-05-19 Thread Dibyendu Bhattacharya
By the way, this happens when I stopped the Driver process... On Tue, May 19, 2015 at 12:29 PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: You mean to say within Runtime.getRuntime().addShutdownHook I call ssc.stop(stopSparkContext = true, stopGracefully = true

Re: Spark Streaming graceful shutdown in Spark 1.4

2015-05-19 Thread Dibyendu Bhattacharya
PM, Sean Owen so...@cloudera.com wrote: I don't think you should rely on a shutdown hook. Ideally you try to stop it in the main exit path of your program, even in case of an exception. On Tue, May 19, 2015 at 7:59 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: You mean

Re: Spark Streaming graceful shutdown in Spark 1.4

2015-05-19 Thread Dibyendu Bhattacharya
On Mon, May 18, 2015 at 9:43 PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi, Just figured out that if I want to perform graceful shutdown of Spark Streaming 1.4 ( from master ) , the Runtime.getRuntime().addShutdownHook no longer works . As in Spark 1.4

Re: Reading Real Time Data only from Kafka

2015-05-12 Thread Dibyendu Bhattacharya
The low level consumer which Akhil mentioned has been running at Pearson for the last 4-5 months without any downtime. I think this one is the reliable Receiver based Kafka consumer for Spark as of today, if you want to put it that way. Prior to Spark 1.3, other Receiver based consumers have used Kafka

Latest enhancement in Low Level Receiver based Kafka Consumer

2015-04-01 Thread Dibyendu Bhattacharya
Hi, Just to let you know, I have made some enhancements in the Low Level Reliable Receiver based Kafka Consumer ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer). The earlier version used as many Receiver tasks as the number of partitions of your Kafka topic. Now you can configure the desired

Re: spark streaming doubt

2015-05-19 Thread Dibyendu Bhattacharya
Just to add, there is a Receiver based Kafka consumer which uses the Kafka Low Level Consumer API: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer Regards, Dibyendu On Tue, May 19, 2015 at 9:00 PM, Akhil Das ak...@sigmoidanalytics.com wrote: On Tue, May 19, 2015 at 8:10 PM,

Re: Spark Streaming graceful shutdown in Spark 1.4

2015-05-20 Thread Dibyendu Bhattacharya
https://github.com/apache/spark/pull/6307 On Tue, May 19, 2015 at 12:51 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Thanks Sean, you are right. If the driver program is running then I can handle shutdown in the main exit path. But if the Driver machine crashes (if you just stop

Re: [Kafka-Spark-Consumer] Spark-Streaming Job Fails due to Futures timed out

2015-06-08 Thread Dibyendu Bhattacharya
Hi Snehal, Are you running the latest kafka consumer from github/spark-packages? If not, can you take the latest changes? This low level receiver will keep retrying if the underlying BlockManager gives an error. Do you see those retry cycles in the log? If yes, then there is an issue writing

Re: [Kafka-Spark-Consumer] Spark-Streaming Job Fails due to Futures timed out

2015-06-08 Thread Dibyendu Bhattacharya
Seems to be related to this JIRA: https://issues.apache.org/jira/browse/SPARK-3612 ? On Tue, Jun 9, 2015 at 7:39 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Hi Snehal, Are you running the latest kafka consumer from github/spark-packages? If not, can you take the latest

Re: Kafka Spark Streaming: ERROR EndpointWriter: dropping message

2015-06-10 Thread Dibyendu Bhattacharya
Hi, Can you please share a more detailed stack trace from your receiver logs and also the consumer settings you used? I have never tested the consumer with Kafka 0.7.3... not sure if the Kafka version is the issue. Have you tried building the consumer using Kafka 0.7.3? Regards, Dibyendu On Wed, Jun 10,

Re: spark streaming with kafka reset offset

2015-06-27 Thread Dibyendu Bhattacharya
Hi, There is another option to try: the Receiver Based Low Level Kafka Consumer which is part of Spark-Packages ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer). This can be used with WAL as well for end to end zero data loss. This is also a Reliable Receiver and commits offsets to

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-05-21 Thread Dibyendu Bhattacharya
FileSystem interface, is returning zero. On Mon, May 11, 2015 at 4:38 AM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Just to follow up on this thread further. I was doing some fault tolerance testing of Spark Streaming with Tachyon as the OFF_HEAP block store. As I said in earlier

Re: Spark Streaming and Drools

2015-05-22 Thread Dibyendu Bhattacharya
Hi, Some time back I played with distributed rule processing by integrating Drools with HBase Co-Processors, invoking rules on any incoming data: https://github.com/dibbhatt/hbase-rule-engine You can get some idea of how to use Drools rules if you look at this RegionObserverCoprocessor...

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Dibyendu Bhattacharya
I think you can also give this consumer a try in your environment: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer This has been running fine for topics with a large number of Kafka partitions (200) like yours without any issue... no issue with connections as this consumer re-uses

Re: Reliable Streaming Receiver

2015-08-05 Thread Dibyendu Bhattacharya
Hi, You can try this Kafka Consumer for Spark, which is also part of Spark Packages: https://github.com/dibbhatt/kafka-spark-consumer Regards, Dibyendu On Thu, Aug 6, 2015 at 6:48 AM, Sourabh Chandak sourabh3...@gmail.com wrote: Thanks Tathagata. I tried that but BlockGenerator internally

Contributing Receiver based Low Level Kafka Consumer from Spark-Packages to Apache Spark Project

2015-10-24 Thread Dibyendu Bhattacharya
Hi, I have raised a JIRA ( https://issues.apache.org/jira/browse/SPARK-11045) to track the discussion, but am also mailing the user group. This Kafka consumer has been around for a while in spark-packages ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer ) and I see many have started using it, I

Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread Dibyendu Bhattacharya
If you do not need one-to-one semantics and do not want a strict ordering guarantee, you can very well use the Receiver based approach, and this consumer from Spark-Packages ( https://github.com/dibbhatt/kafka-spark-consumer) can give a much better alternative in terms of performance and

Some BlockManager Doubts

2015-07-09 Thread Dibyendu Bhattacharya
Hi, I would just like to clarify a few doubts I have about how the BlockManager behaves. This is mostly in regard to the Spark Streaming context. There are two possible cases where blocks may get dropped / not stored in memory. Case 1: While writing the block with MEMORY_ONLY settings, if the node's BlockManager does
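[Editor's note] A small sketch of the knob usually discussed alongside this: give receiver blocks a storage level that can spill to disk, so a full MemoryStore writes the block to disk instead of not storing it (the socket stream is only a stand-in for a receiver-based source):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("BlockStorageSketch"), Seconds(5))

// MEMORY_AND_DISK_SER: if the MemoryStore is full, the block is written to disk
// rather than dropped, unlike MEMORY_ONLY where it would simply not be stored.
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
lines.count().print()

ssc.start()
ssc.awaitTermination()
```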

Re: BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Dibyendu Bhattacharya
Some time back I was playing with Spark and Tachyon and I also found this issue. The issue here is that the TachyonBlockManager puts the blocks with the WriteType.TRY_CACHE configuration. Because of this, blocks are evicted from the Tachyon cache when memory is full, and when Spark tries to find the block it throws

Re: BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Dibyendu Bhattacharya
The URL seems to have changed... here is the new one: http://tachyon-project.org/documentation/Tiered-Storage-on-Tachyon.html On Wed, Aug 26, 2015 at 12:32 PM, Dibyendu Bhattacharya dibyendu.bhattach...@gmail.com wrote: Some time back I was playing with Spark and Tachyon and I also found

Just Released V1.0.4 Low Level Receiver Based Kafka-Spark-Consumer in Spark Packages having built-in Back Pressure Controller

2015-08-26 Thread Dibyendu Bhattacharya
Dear All, Just released the 1.0.4 version of the Low Level Receiver based Kafka-Spark-Consumer on spark-packages.org. You can find the latest release here: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer Here is the github location:

Re: Using KafkaDirectStream, stopGracefully and exceptions

2015-09-10 Thread Dibyendu Bhattacharya
Hi, This has been running in production in many organizations that have adopted this consumer as an alternative option. The Consumer will run with Spark 1.3.1. It has been running in production at Pearson for some time. This is part of spark packages and you can see how to include it in your

Re: Using KafkaDirectStream, stopGracefully and exceptions

2015-09-10 Thread Dibyendu Bhattacharya
Hi, Just to clarify one point which may not be clear to many: if someone decides to use the Receiver based approach, the best option at this point is to use https://github.com/dibbhatt/kafka-spark-consumer. This will also work with WAL like any other receiver based consumer. The major issue with

Re: Managing scheduling delay in Spark Streaming

2015-09-16 Thread Dibyendu Bhattacharya
Hi Michal, If you use https://github.com/dibbhatt/kafka-spark-consumer , it comes with its own built-in back pressure mechanism. By default this is disabled; you need to enable it to use this feature with this consumer. It controls the rate based on the Scheduling Delay at runtime. Regards,
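[Editor's note] A tiny sketch of what enabling it looks like; the property keys below are hypothetical placeholders (the real keys are documented in the kafka-spark-consumer README), shown only to indicate that the feature is switched on through the consumer's Properties:

```scala
import java.util.Properties

val props = new Properties()
props.put("zookeeper.hosts", "zk-host")            // assumed key, as in the launch sketch earlier
props.put("kafka.topic", "mytopic")                // assumed key
props.put("consumer.backpressure.enabled", "true") // hypothetical key: turns on rate control
// With this enabled, the consumer adjusts its fetch rate at runtime based on scheduling delay.
```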

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-26 Thread Dibyendu Bhattacharya
Dibyendu, How does one go about configuring spark streaming to use tachyon as its place for storing checkpoints? Also, can one do this with tachyon running on a completely different node than where spark processes are running? Thanks Nikunj On Thu

Re: Spark Streaming over YARN

2015-10-04 Thread Dibyendu Bhattacharya
see one node receiving the kafka flow. (I use spark 1.3.1) Thanks Nicolas ----- Original mail from "Dibyendu Bhattacharya" <dibyendu.bhattach...@gmail.com> to nib...@free.fr, cc "Cody Koeninger" <c...@koeninger.or

Re: Spark Streaming over YARN

2015-10-02 Thread Dibyendu Bhattacharya
Hi, If you need to use the Receiver based approach, you can try this one: https://github.com/dibbhatt/kafka-spark-consumer This is also part of Spark packages: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer You just need to specify the number of Receivers you want for the desired

Re: Spark Streaming over YARN

2015-10-02 Thread Dibyendu Bhattacharya
s (number of nodes), how the RDD will be distributed over the nodes/cores. For example, in my case I have 4 nodes (with 2 cores). Thanks Nicolas ----- Original mail from "Dibyendu Bhattacharya" <dibyendu.bhattach...@gmail.com> to nib...@fre

Re: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Dibyendu Bhattacharya
With the direct stream, the checkpoint location is not recoverable if you modify your driver code. So if you just rely on the checkpoint to commit offsets, you can possibly lose messages if you modify the driver code and you select the offset from the "largest" offset. If you do not want to lose messages, you need to
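[Editor's note] A minimal sketch (Spark 1.3+ Kafka 0.8 direct API) of the "manage offsets yourself" approach being described; saveOffsetsToMyStore is a hypothetical hook standing in for ZooKeeper, HBase, or a database:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

object DirectOffsetsSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("DirectOffsetsSketch"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.foreachRDD { rdd =>
      // Offsets for exactly this batch, read from the RDD itself.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process rdd ...
      saveOffsetsToMyStore(offsetRanges) // commit only after processing succeeds
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Hypothetical sink: persist topic/partition/fromOffset/untilOffset to your own store.
  def saveOffsetsToMyStore(ranges: Array[OffsetRange]): Unit = {
    ranges.foreach(r => println(s"${r.topic}-${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
  }
}
```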

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
, silently throwing away data because your system isn't working right is not the same as "recover from any Kafka leader changes and offset out of ranges issue". On Tue, Dec 1, 2015 at 11:27 PM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com>

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
the problem without actually addressing it". The only really correct way to deal with that situation is to recognize why it's happening, and either increase your Kafka retention or increase the speed at which you are consuming. On Wed, Dec 2, 2015 at 7:13 PM, D

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Dibyendu Bhattacharya
Hi, if you use the Receiver based consumer which is available in spark-packages ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer), it has all the built-in failure recovery and can recover from any Kafka leader changes and offset out of range issues. Here is the package from github

Re: [Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Dibyendu Bhattacharya
0 Cheers On Thu, Jan 7, 2016 at 2:39 AM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com> wrote: You are using low level spark kafka consumer. I am the author of the same. Are you using the spark-packages version? if y

Re: [Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Dibyendu Bhattacharya
Thu, Jan 7, 2016 at 11:39 AM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com> wrote: You are using low level spark kafka consumer. I am the author of the same. If I may ask, what are the differences between this and the direct version shipped

Any Idea about this error : IllegalArgumentException: File segment length cannot be negative ?

2016-07-13 Thread Dibyendu Bhattacharya
In a Spark Streaming job, I see a batch failed with the following error. I haven't seen anything like this earlier. This happened during the Shuffle for one batch (it hasn't reoccurred after that). Just curious to know what can cause this error. I am running Spark 1.5.1. Regards, Dibyendu Job aborted

Re: Severe Spark Streaming performance degradation after upgrading to 1.6.1

2016-07-13 Thread Dibyendu Bhattacharya
You can get some good pointers in this JIRA: https://issues.apache.org/jira/browse/SPARK-15796 Dibyendu On Thu, Jul 14, 2016 at 12:53 AM, Sunita wrote: I am facing the same issue. Upgrading to Spark 1.6 is causing huge performance loss. Could you solve this issue?

Latest Release of Receiver based Kafka Consumer for Spark Streaming.

2017-02-15 Thread Dibyendu Bhattacharya
Hi, Released the latest version of the Receiver based Kafka Consumer for Spark Streaming. Available at Spark Packages: https://spark-packages.org/package/dibbhatt/kafka-spark-consumer Also at github: https://github.com/dibbhatt/kafka-spark-consumer Some key features - Tuned for better

Latest Release of Receiver based Kafka Consumer for Spark Streaming.

2016-08-25 Thread Dibyendu Bhattacharya
Hi, Released the latest version of the Receiver based Kafka Consumer for Spark Streaming. The Receiver is compatible with Kafka versions 0.8.x, 0.9.x and 0.10.x and all Spark versions. Available at Spark Packages: https://spark-packages.org/package/dibbhatt/kafka-spark-consumer Also at github:

Re: Latest Release of Receiver based Kafka Consumer for Spark Streaming.

2016-08-25 Thread Dibyendu Bhattacharya
from my iPhone On Aug 25, 2016, at 6:33 AM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com> wrote: Hi, Released latest version of Receiver based Kafka Consumer for Spark Streaming. Receiver is compatible with Kafka versions 0.8.x, 0.

Re: question on Write Ahead Log (Spark Streaming )

2017-03-10 Thread Dibyendu Bhattacharya
Hi, You could also use this Receiver: https://github.com/dibbhatt/kafka-spark-consumer This is also part of spark-packages: https://spark-packages.org/package/dibbhatt/kafka-spark-consumer You do not need to enable WAL with this and can still recover from Driver failure with no data loss. You can

why groupByKey still shuffle if SQL does "Distribute By" on same columns ?

2018-01-30 Thread Dibyendu Bhattacharya
Hi, I am trying something like this: val sesDS: Dataset[XXX] = hiveContext.sql(select).as[XXX] The select statement is something like this: "select * from sometable DISTRIBUTE BY col1, col2, col3" Then comes groupByKey: val gpbyDS = sesDS.groupByKey(x => (x.col1, x.col2, x.col3))
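[Editor's note] A self-contained restatement of that pattern (table and column names are placeholders, and SparkSession stands in for the hiveContext in the question). A likely reason for the extra shuffle: the groupByKey key is produced by an opaque Scala lambda, so Catalyst cannot prove that the existing DISTRIBUTE BY partitioning satisfies it and plans another exchange.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object DistributeByGroupByKeySketch {
  // Placeholder row type matching the grouping columns in the question.
  case class XXX(col1: String, col2: String, col3: String, value: Long)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DistributeByGroupByKeySketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // DISTRIBUTE BY hash-partitions the query result by (col1, col2, col3).
    val sesDS: Dataset[XXX] =
      spark.sql("select * from sometable DISTRIBUTE BY col1, col2, col3").as[XXX]

    // groupByKey on a lambda-computed key still plans its own exchange.
    val gpbyDS = sesDS.groupByKey(x => (x.col1, x.col2, x.col3))
    gpbyDS.count().show()
  }
}
```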