Re: Re: spark+kafka+dynamic resource allocation

2023-01-30 Thread Mich Talebzadeh
Lingzhe Sun > From: Mich Talebzadeh > Date: 2023-01-30 02:14 > To: Lingzhe Sun > CC: ashok34...@yahoo.com; User > Subject: Re: Re: spark+kafka+dynamic resource allocation > Hi, Spark Structured Streaming currently does not support dynamic allocation
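
For reference, SPARK-24815 (Structured Streaming should support dynamic allocation) is the open tracking ticket mentioned above. The legacy DStream API does have its own scaling switch; a minimal sketch, assuming a DStream-based job (the spark.streaming.dynamicAllocation.* keys drive the DStream-era heuristic and are ignored by Structured Streaming; executor counts are illustrative):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // DStream-only scaling heuristic; requires core dynamic allocation
  // (spark.dynamicAllocation.enabled) to be off.
  val conf = new SparkConf()
    .setAppName("dstream-dynamic-allocation")
    .set("spark.streaming.dynamicAllocation.enabled", "true")
    .set("spark.streaming.dynamicAllocation.minExecutors", "2")   // illustrative sizing
    .set("spark.streaming.dynamicAllocation.maxExecutors", "10")  // illustrative sizing
  val ssc = new StreamingContext(conf, Seconds(10))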

Re: Re: spark+kafka+dynamic resource allocation

2023-01-29 Thread Lingzhe Sun
CC: ashok34...@yahoo.com; User Subject: Re: Re: spark+kafka+dynamic resource allocation Hi, Spark Structured Streaming currently does not support dynamic allocation (see SPARK-24815: Structured Streaming should support dynamic allocation, which is still open). Autoscaling in Cloud offerings like Google

Re: Re: spark+kafka+dynamic resource allocation

2023-01-29 Thread Mich Talebzadeh
Date: 2023-01-29 04:01 > To: User; Lingzhe Sun > Subject: Re: spark+kafka+dynamic resource allocation > Hi, Worth checking this link: https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation > On Saturday, 28 January 2023 at 06:18:2

Re: Re: spark+kafka+dynamic resource allocation

2023-01-28 Thread Lingzhe Sun
Thank you for the response. But the reference does not seem to be answering any of those questions. BS Lingzhe Sun From: ashok34...@yahoo.com Date: 2023-01-29 04:01 To: User; Lingzhe Sun Subject: Re: spark+kafka+dynamic resource allocation Hi, Worth checking this link https

Re: spark+kafka+dynamic resource allocation

2023-01-28 Thread ashok34...@yahoo.com.INVALID
allocation works in spark+kafka streaming applications. Here are some questions: - Will structured streaming be supported? - Is the number of consumers always equal to the number of partitions of the subscribed topic (let's say there's only one topic)? - If consumers are evenly distributed

spark+kafka+dynamic resource allocation

2023-01-27 Thread Lingzhe Sun
Hi all, I'm wondering if dynamic resource allocation works in spark+kafka streaming applications. Here're some questions: Will structured streaming be supported? Is the number of consumers always equal to the number of the partitions of subscribed topic (let's say there's only one topic

RE: Spark kafka structured streaming - how to prevent dataloss

2022-03-22 Thread Gnanasoundari Soundarajan
Hi all, Any suggestion? Regards, Gnana From: Gnanasoundari Soundarajan Sent: Tuesday, March 8, 2022 10:02 PM To: user@spark.apache.org Subject: Spark kafka structured streaming - how to prevent dataloss Hi, In spark, it uses checkpoints to keep track of offsets in kafka. If there is any

Spark kafka structured streaming - how to prevent dataloss

2022-03-08 Thread Gnanasoundari Soundarajan
Hi, In Spark, checkpoints are used to keep track of offsets in Kafka. If there is any data loss, can we edit the file and reduce the data loss? Please suggest the best practices to reduce data loss under exceptional scenarios. Regards, Gnana
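
The usual safeguard in Structured Streaming is a durable checkpoint location per query: Spark records Kafka offsets there and replays from the last committed batch after a failure, so editing checkpoint files by hand is not the intended recovery path. A minimal sketch, assuming an existing SparkSession named spark (broker, topic, and paths are placeholders):

  val df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
    .option("subscribe", "events")                      // placeholder topic
    .option("failOnDataLoss", "false")                  // warn instead of fail if offsets were aged out
    .load()

  df.writeStream
    .format("parquet")
    .option("path", "/data/events")                     // placeholder
    .option("checkpointLocation", "/chk/events")        // offsets are tracked here; do not edit by hand
    .start()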

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
Ahh, ok. So, Kafka 3.1 is supported for Spark 3.2.1. Thank you very much. From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, February 25, 2022 2:50 PM To: Michael Williams (SSI) Cc: user@spark.apache.org Subject: Re: Spark Kafka Integration these are the old and new ones

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
Thank you, that is good to know. From: Sean Owen [mailto:sro...@gmail.com] Sent: Friday, February 25, 2022 2:46 PM To: Michael Williams (SSI) Cc: Mich Talebzadeh ; user@spark.apache.org Subject: Re: Spark Kafka Integration Spark 3.2.1 is compiled vs Kafka 2.8.0; the forthcoming Spark 3.3

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
would be appreciated. Our entire team is totally new to Spark and Kafka (this is a PoC trial). > From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] > Sent: Friday, February 25, 2022 2:30 PM > To: Michael Williams (SSI) > Cc: user@spark.apache.org > Subjec

Re: Spark Kafka Integration

2022-02-25 Thread Sean Owen
Michael Williams (SSI) > Cc: user@spark.apache.org > Subject: Re: Spark Kafka Integration > and what version of kafka do you have, 2.7? > for spark 3.1.1 I needed these jar files to make it work: > kafka-clients-2.7.0.jar > commons-pool2-2.

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
Thank you From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, February 25, 2022 2:35 PM To: Michael Williams (SSI) Cc: Sean Owen ; user@spark.apache.org Subject: Re: Spark Kafka Integration please see my earlier reply for 3.1.1 tested and worked in Google Dataproc

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
To: Michael Williams (SSI) Cc: user@spark.apache.org Subject: Re: Spark Kafka Integration and what version of kafka do you have 2.7? for spark 3.1.1 I needed these jar files to make it work kafka-clients-2.7.0.jar commons-pool2-2.9.0.jar spark-streaming_2.12-3.1.1.jar spark-sql-kafka-0-10_2.12

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
that the dependencies already exist on disk. If that makes any sense. > Thank you > From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] > Sent: Friday, February 25, 2022 2:16 PM > To: Michael Williams (SSI) > Cc: user@spark.apache.org > Subj

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
and what version of kafka do you have 2.7? for spark 3.1.1 I needed these jar files to make it work kafka-clients-2.7.0.jar commons-pool2-2.9.0.jar spark-streaming_2.12-3.1.1.jar spark-sql-kafka-0-10_2.12-3.1.0.jar HTH view my Linkedin profile

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
exist on disk. If that makes any sense. Thank you From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, February 25, 2022 2:16 PM To: Michael Williams (SSI) Cc: user@spark.apache.org Subject: Re: Spark Kafka Integration What is the use case? Is this for spark structured

Re: Spark Kafka Integration

2022-02-25 Thread Sean Owen
That .jar is available on Maven, though typically you depend on it in your app, and compile an uber JAR which will contain it and all its dependencies. You can I suppose manage to compile an uber JAR from that dependency itself with tools if needed. On Fri, Feb 25, 2022 at 1:37 PM Michael

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
What is the use case? Is this for spark structured streaming? HTH view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or

Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
After reviewing Spark's Kafka Integration guide, it indicates that spark-sql-kafka-0-10_2.12-3.2.1.jar and its dependencies are needed for Spark 3.2.1 (+ Scala 2.12) to work with Kafka. Can anybody clarify the cleanest, most repeatable (reliable) way to acquire these jars for including in a
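
A repeatable way to acquire the connector and its transitive dependencies is to declare it in the build rather than copying jars by hand. A sketch for sbt, assuming Spark 3.2.1 on Scala 2.12 (this resolves the jar published as spark-sql-kafka-0-10_2.12-3.2.1.jar plus its dependency tree):

  // build.sbt
  libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.2.1"

Alternatively, spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1 resolves the same set from Maven Central at launch time.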

RE: Running spark Kafka streaming job in Azure HDInsight

2021-10-06 Thread Muhammed Favas
to submit job remotely. I have passed all dependent jars while calling the submit. Regards, Favas From: Stelios Philippou Sent: Wednesday, October 6, 2021 16:51 To: Muhammed Favas Cc: user@spark.apache.org Subject: Re: Running spark Kafka streaming job in Azure HDInsight Hi Favas, The error

Re: Running spark Kafka streaming job in Azure HDInsight

2021-10-06 Thread Stelios Philippou
Hi Favas, The error states that you are using different libraries version. Exception in thread "streaming-start" java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V Have in mind that Spark uses its internal libraries for the majority

Running spark Kafka streaming job in Azure HDInsight

2021-10-06 Thread Muhammed Favas
Hi, I am facing some dependency issues in running a spark streaming job in Azure HDInsight. The job is connecting to a kafka broker which is hosted in a LAN and has public IP access to it. Spark job pom.xml setup - spark version 3.0.0, Scala version 2.12 org.scala-lang scala-library

Re: Spark Kafka Streaming With Transactional Messages

2020-09-16 Thread jianyangusa
I have the same issue. Do you have a solution? Maybe Spark Streaming does not support transactional messages. I use Kafka Streams to retrieve the transactional messages. Maybe we can ask Spark to support this feature. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: [Spark Kafka Structured Streaming] Adding partition and topic to the kafka dynamically

2020-08-28 Thread Amit Joshi
Hi Jungtaek, Thanks for the input. I did try it and it worked. I got confused earlier after reading some blogs. Regards Amit On Friday, August 28, 2020, Jungtaek Lim wrote: > Hi Amit, > > if I remember correctly, you don't need to restart the query to reflect > the newly added topic and

Re: [Spark Kafka Structured Streaming] Adding partition and topic to the kafka dynamically

2020-08-28 Thread Gabor Somogyi
Hi Amit, The answer is no. G On Fri, Aug 28, 2020 at 9:16 AM Jungtaek Lim wrote: > Hi Amit, > > if I remember correctly, you don't need to restart the query to reflect > the newly added topic and partition, if your subscription covers the topic > (like subscribe pattern). Please try it out.

Re: [Spark Kafka Structured Streaming] Adding partition and topic to the kafka dynamically

2020-08-28 Thread Jungtaek Lim
Hi Amit, if I remember correctly, you don't need to restart the query to reflect the newly added topic and partition, if your subscription covers the topic (like subscribe pattern). Please try it out. Hope this helps. Thanks, Jungtaek Lim (HeartSaVioR) On Fri, Aug 28, 2020 at 1:56 PM Amit
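
The pattern subscription Jungtaek refers to looks roughly like the sketch below (broker and regex are placeholders); topics matching the pattern, and new partitions of already-matched topics, are picked up by the running query without a restart:

  val df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
    .option("subscribePattern", "events-.*")            // any topic matching this regex is consumed
    .load()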

Re: [Spark Kafka Structured Streaming] Adding partition and topic to the kafka dynamically

2020-08-27 Thread Amit Joshi
Any pointers will be appreciated. On Thursday, August 27, 2020, Amit Joshi wrote: > Hi All, > > I am trying to understand the effect of adding topics and partitions to a > topic in kafka, which is being consumed by spark structured streaming > applications. > > Do we have to restart the spark

[Spark Kafka Structured Streaming] Adding partition and topic to the kafka dynamically

2020-08-27 Thread Amit Joshi
Hi All, I am trying to understand the effect of adding topics and partitions to a topic in kafka, which is being consumed by spark structured streaming applications. Do we have to restart the spark structured streaming application to read from the newly added topic? Do we have to restart the

Re: [Spark-Kafka-Streaming] Verifying the approach for multiple queries

2020-08-09 Thread tianlangstudio
-- From: Amit Joshi Sent: Monday, August 10, 2020 02:37 To: user Subject: [Spark-Kafka-Streaming] Verifying the approach for multiple queries Hi, I have a scenario where

[Spark-Kafka-Streaming] Verifying the approach for multiple queries

2020-08-09 Thread Amit Joshi
Hi, I have a scenario where a kafka topic is being written with different types of json records. I have to regroup the records based on the type and then fetch the schema and parse and write as parquet. I have tried structured streaming, but dynamic schema is a constraint. So I have used
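
One hedged way to fan a mixed-type topic out to per-type parquet sinks within Structured Streaming is to tag each record with its type and start one query per type; per-type schemas can then be applied with from_json inside each branch. The "type" field, type names, broker, and paths below are assumptions for illustration:

  import spark.implicits._
  import org.apache.spark.sql.functions.get_json_object

  val raw = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")        // placeholder
    .option("subscribe", "mixed-events")                      // placeholder topic
    .load()
    .selectExpr("CAST(value AS STRING) AS json")
    .withColumn("type", get_json_object($"json", "$.type"))  // assumes a top-level "type" field

  // one query per record type, each with its own sink and checkpoint
  Seq("orders", "clicks").foreach { t =>
    raw.filter($"type" === t)
      .writeStream
      .format("parquet")
      .option("path", s"/data/$t")               // placeholder
      .option("checkpointLocation", s"/chk/$t")  // placeholder
      .start()
  }
  spark.streams.awaitAnyTermination()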

Spark Kafka Streaming With Transactional Messages

2020-05-21 Thread nimmi.cv
I am using Spark 2.4 and using createDstream to read from a kafka topic. The topic has messages written from a transactional producer. I am getting the following error "requirement failed: Got wrong record for spark-executor-FtsTopicConsumerGrp7 test11-1 even after seeking to offset 85 got offset
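
With a transactional producer, a consumer reading at the default isolation level sees offset gaps at transaction markers. A hedged sketch of the consumer-side setting for the DStream API used above ("isolation.level" is a standard Kafka consumer config in the kafka-clients versions shipped alongside Spark 2.4's kafka-0-10 integration; broker and group id are placeholders):

  import org.apache.kafka.common.serialization.StringDeserializer

  val kafkaParams = Map[String, Object](
    "bootstrap.servers"  -> "broker1:9092",      // placeholder
    "key.deserializer"   -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id"           -> "my-group",          // placeholder
    "isolation.level"    -> "read_committed"     // read only committed data, past the markers
  )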

Re: Spark Kafka Streaming with Offset Gaps

2020-05-21 Thread nimmi.cv
Did you resolve this issue? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: HDP 3.1 spark Kafka dependency

2020-03-18 Thread Zahid Rahman
I have found many library incompatibility issues, including JVM headless issues where I had to uninstall the headless JVM, install the full JDK, and work through them. Anyway, this page shows the same error as yours; you may get away with making the changes to your pom.xml as suggested.

HDP 3.1 spark Kafka dependency

2020-03-18 Thread William R
Hi, I am finding difficulty in getting the proper Kafka libs for Spark. The version of HDP is 3.1 and I tried the below libs but they produce the below issues. POM entries: org.apache.kafka kafka-clients 2.0.0.3.1.0.0-78 org.apache.kafka kafka_2.11 2.0.0.3.1.0.0-78

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Charles vinodh
Thanks Dhaval, that fixed the issue. The constant resetting of Kafka offsets misled me about the issue. Please feel free to answer the SO question here <https://stackoverflow.com/questions/57874681/spark-kafka-streaming-making-progress-but-there-is-no-data-to-be-consumed> if you woul

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Dhaval Patel
On Wed, Sep 11, 2019 at 2:39 PM Charles vinodh wrote: > Hi, I am trying to run a spark application ingesting data from Kafka

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Burak Yavuz
application ingesting data from Kafka using the Spark structured streaming and the spark library org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1. I am facing a very weird issue where during execution of all my micro-batches the Kafka consumer is

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Charles vinodh
structured streaming and the spark library org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1. I am facing a very weird issue where during execution of all my micro-batches the Kafka consumer is not able to fetch the offsets and its having its offsets reset as s

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Sandish Kumar HN
and the spark library org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1. I am facing a very weird issue where during execution of all my micro-batches the Kafka consumer is not able to fetch the offsets and its having its offsets reset as shown below in th

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Charles vinodh
facing a very weird issue where during execution of all my micro-batches the Kafka consumer is not able to fetch the offsets and its having its offsets reset as shown below in this log: 19/09/11 02:49:42 INFO Fetcher: [Consumer clientId=my_client_id,

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Burak Yavuz
below in this log: 19/09/11 02:49:42 INFO Fetcher: [Consumer clientId=my_client_id, groupId=spark-kafka-source-5496988b-3f5c-4342-9361-917e4f3ece51-1340785812-driver-0] Resetting offset for partition my-topic-5 to offset 1168959116. 19/09/11 02:49:42 INFO Fetcher: [Consum

Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Charles vinodh
the offsets, and it's having its offsets reset as shown below in this log: 19/09/11 02:49:42 INFO Fetcher: [Consumer clientId=my_client_id, groupId=spark-kafka-source-5496988b-3f5c-4342-9361-917e4f3ece51-1340785812-driver-0] Resetting offset for partition my-topic-5 to offset 1168959116. 19/09/11 02:49

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-06 Thread Gourav Sengupta
Hi, There is a method to iterate only once in Spark. I use it for reading files using streaming. Maybe you can try that. Regards, Gourav On Tue, 6 Aug 2019, 21:50 kant kodali, wrote: > If I stop and start while processing the batch what will happen? will that > batch gets canceled and gets

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-06 Thread kant kodali
If I stop and start while processing the batch what will happen? will that batch gets canceled and gets reprocessed again when I click start? Does that mean I need to worry about duplicates in the downstream? Kafka consumers have a pause and resume and they work just fine so I am not sure why

Re: How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-05 Thread Gourav Sengupta
Hi, exactly my question, I was also looking for ways to gracefully exit spark structured streaming. Regards, Gourav On Tue, Aug 6, 2019 at 3:43 AM kant kodali wrote: > Hi All, > > I am trying to see if there is a way to pause a spark stream that process > data from Kafka such that my

How to programmatically pause and resume Spark/Kafka structured streaming?

2019-08-05 Thread kant kodali
Hi All, I am trying to see if there is a way to pause a spark stream that process data from Kafka such that my application can take some actions while the stream is paused and resume when the application completes those actions. Thanks!
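
There is no pause()/resume() on a StreamingQuery, but stop-and-restart against the same checkpoint approximates it: committed offsets live in the checkpoint, so a restarted query resumes after the last completed batch. A minimal sketch, assuming df is an existing streaming DataFrame (paths are placeholders):

  val query = df.writeStream
    .format("parquet")
    .option("path", "/data/out")              // placeholder
    .option("checkpointLocation", "/chk/out") // the resume point lives here
    .start()

  query.stop()  // a batch interrupted mid-flight is not committed and is re-run on restart

  // later: build the identical writeStream with the same checkpointLocation and
  // call start() again; with a file sink the re-run batch is written exactly once,
  // with other sinks plan for at-least-once.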

Spark Kafka Streaming stopped

2019-06-14 Thread Amit Sharma
We are using spark kafka streaming with a 6-node kafka cluster. If any of the nodes goes down, we get the below exception and streaming stops. ERROR DirectKafkaInputDStream:70 - ArrayBuffer(kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException

Re: Spark kafka streaming job stopped

2019-06-11 Thread Amit Sharma
Please provide an update if anyone knows. On Monday, June 10, 2019, Amit Sharma wrote: > We have a spark kafka streaming job running on a standalone spark cluster. We have the below kafka architecture: > 1. Two clusters running in two data centers. > 2. There is an LTM on top of each

Fwd: Spark kafka streaming job stopped

2019-06-10 Thread Amit Sharma
We have a spark kafka streaming job running on a standalone spark cluster. We have the below kafka architecture: 1. Two clusters running in two data centers. 2. There is an LTM on top of each data center (load balancer). 3. There is a GSLB on top of the LTMs. I observed that whenever any of the nodes in the kafka cluster

Re: Spark Kafka Batch Write guarantees

2019-04-01 Thread hemant singh
Thanks Shixiong, read in documentation as well that duplicates might exist because of task retries. On Mon, 1 Apr 2019 at 9:43 PM, Shixiong(Ryan) Zhu wrote: > The Kafka source doesn’t support transaction. You may see partial data or > duplicated data if a Spark task fails. > > On Wed, Mar 27,

Re: Spark Kafka Batch Write guarantees

2019-04-01 Thread Shixiong(Ryan) Zhu
The Kafka source doesn’t support transaction. You may see partial data or duplicated data if a Spark task fails. On Wed, Mar 27, 2019 at 1:15 AM hemant singh wrote: > We are using spark batch to write Dataframe to Kafka topic. The spark > write function with write.format(source = Kafka). > Does
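
For reference, the batch write under discussion looks like the sketch below; since a retried task may have already produced some records, treat the sink as at-least-once and deduplicate downstream if that matters (broker and topic are placeholders, and df is assumed to carry key/value columns):

  df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
    .option("topic", "output-topic")                    // placeholder
    .save()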

Spark Kafka Batch Write guarantees

2019-03-27 Thread hemant singh
We are using spark batch to write a Dataframe to a Kafka topic, with write.format(source = Kafka). Does spark provide a similar guarantee to the one it provides when saving a dataframe to disk, i.e. that partial data is not written to Kafka: either the full dataframe is saved, or if the job fails no

Re: Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Not authorized to access group: spark-kafka-source-060f3ceb-09f4-4e28-8210-3ef8a845fc92--2038748645-driver-2

2019-02-13 Thread Jungtaek Lim
generated by Spark internally. > Either one has to give access to the "spark-kafka-source-*" group or in Spark > 3.0 this prefix can be configured with the "groupIdPrefix" parameter. > BR, > G > On Wed, Feb 13, 2019 at 3:58 AM Allu Thomas wrote:

Re: Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Not authorized to access group: spark-kafka-source-060f3ceb-09f4-4e28-8210-3ef8a845fc92--2038748645-driver-2

2019-02-13 Thread Gabor Somogyi
/cloudera/spark/examples/DirectKafkaWordCount.scala#L59 In Structured Streaming the group ID is generated by Spark internally. Either one has to give access to the "spark-kafka-source-*" group or in Spark 3.0 this prefix can be configured with the "groupIdPrefix" parameter. BR, G On We
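
A sketch of the second workaround Gabor describes, for Spark 3.0+ (broker and topic are placeholders; on earlier versions, granting a Kafka-side ACL on the generated "spark-kafka-source-*" group names is the only option):

  val df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
    .option("subscribe", "secured-topic")               // placeholder
    .option("groupIdPrefix", "myapp")  // generated group ids become myapp-<uuid>... instead
    .load()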

Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Not authorized to access group: spark-kafka-source-060f3ceb-09f4-4e28-8210-3ef8a845fc92--2038748645-driver-2

2019-02-12 Thread Allu Thomas
help would be greatly appreciated. Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Not authorized to access group: spark-kafka-source-060f3ceb-09f4-4e28-8210-3ef8a845fc92--2038748645-driver-2 === Streaming Query === Identifier: [id = 6ab10eab-4f71

Spark Kafka Streaming with Offset Gaps

2018-12-19 Thread Rishabh Pugalia
I have an app that uses Kafka Streaming to pull data from `input` topic and push to `output` topic with `processing.guarantee=exactly_once`. Due to `exactly_once` gaps (transaction markers) are created in Kafka. Let's call this app `kafka-streamer`. Now I've another app that listens to this

spark kafka consumer with kerberos - login error

2018-06-19 Thread Amol Zambare
I am working on a spark job which reads from a kafka topic and writes to HDFS; however, while submitting the job using the spark-submit command I am getting the following error. Error log: Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the

Re: [Structured-Streaming][Beginner] Out of order messages with Spark kafka readstream from a specific partition

2018-05-10 Thread Cody Koeninger
wrote: > On the producer side, I make sure data for a specific user lands on the same partition. On the consumer side, I use a regular Spark kafka readstream and read the data. I also use a console write stream to print out the spark kafka DataFrame. What I observe is, the data for a s

[Structured-Streaming][Beginner] Out of order messages with Spark kafka readstream from a specific partition

2018-05-09 Thread karthikjay
On the producer side, I make sure data for a specific user lands on the same partition. On the consumer side, I use a regular Spark kafka readstream and read the data. I also use a console write stream to print out the spark kafka DataFrame. What I observe is, the data for a specific user (even

Re: Does structured streaming support Spark Kafka Direct?

2018-04-12 Thread Tathagata Das
have code based on Spark Kafka Direct in production and we want to port this code to Structured Streaming. Does structured streaming support spark kafka direct? What are the configs for parallelism and scalability in structured streaming? In Spark Kafka Direct, the number of kafk

Does structured streaming support Spark Kafka Direct?

2018-04-11 Thread SRK
hi, We have code based on Spark Kafka Direct in production and we want to port this code to Structured Streaming. Does structured streaming support spark kafka direct? What are the configs for parallelism and scalability in structured streaming? In Spark Kafka Direct, the number of kafka
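
On the parallelism question: the Structured Streaming Kafka source creates one Spark partition per Kafka topic-partition by default, and since Spark 2.4 the minPartitions option can raise that by splitting offset ranges. A sketch (broker, topic, and the partition count are placeholders):

  val df = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
    .option("subscribe", "my-topic")                    // placeholder
    .option("minPartitions", "48")  // ask for at least 48 input partitions (assumed sizing)
    .load()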

Re: Problem in Spark-Kafka Connector

2017-12-27 Thread Sitakant Mishra
Hi, Kindly help me with this problem, for which I will be grateful. Thanks and Regards, Sitakanta Mishra On Tue, Dec 26, 2017 at 12:34 PM, Sitakant Mishra <sitakanta.mis...@gmail.com> wrote: > Hi, I am trying to connect my Spark cluster to a single Kafka Topic which is running as a

Problem in Spark-Kafka Connector

2017-12-26 Thread Sitakant Mishra
Hi, I am trying to connect my Spark cluster to a single Kafka Topic which is running as a separate process in a machine. While submitting the spark application, I am getting the following error: 17/12/25 16:56:57 ERROR TransportRequestHandler: Error sending result

Spark Kafka API tries to connect to the dead node for every batch, which increases the processing time

2017-10-13 Thread supritht
Hi guys, I have a 3-node cluster and I am running a spark streaming job. Consider the below example: spark-submit --master yarn-cluster --class com.huawei.bigdata.spark.examples.FemaleInfoCollectionPrint --jars

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-29 Thread Cody Koeninger
consumer strategy for creating the stream below. Any suggestions on how to run this with better performance would be of great help. java.lang.AssertionError: assertion fai

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-28 Thread swetha kasireddy
324027964 after polling for 12 val kafkaParams = Map[String, Object]( "bootstrap.servers" -> kafkaBrokers, "key.deserializer" ->

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-28 Thread swetha kasireddy
"heartbeat.interval.ms" -> Integer.valueOf(2), "session.timeout.ms" -> Integer.valueOf(6), "request.timeout.ms" -> Integer.valueOf(9), "enab

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-28 Thread swetha kasireddy
valueOf(6), "request.timeout.ms" -> Integer.valueOf(9), "enable.auto.commit" -> (false: java.lang.Boolean),

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-28 Thread Cody Koeninger
java.lang.Boolean), "spark.streaming.kafka.consumer.cache.enabled" -> "false", "group.id" -> "test1" ) val hubbleStream = KafkaUtils.createDirectStream[String, String](

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-25 Thread swetha kasireddy
), "session.timeout.ms" -> Integer.valueOf(6), "request.timeout.ms" -> Integer.valueOf(9), "enable.auto.commit" -> (false: java.lang.Boolean), "spark.streaming.kafka.consumer.cache.enabled" -> "

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-25 Thread swetha kasireddy
"spark.streaming.kafka.consumer.cache.enabled" -> "false", "group.id" -> "test1" ) val hubbleStream = KafkaUtils.createDirectStream[String, String]( ssc, LocationStrategies.Pref

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-25 Thread Cody Koeninger
-> "test1" ) val hubbleStream = KafkaUtils.createDirectStream[String, String]( ssc, LocationStrategies.PreferConsistent, ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams) )

Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-25 Thread SRK
"spark.streaming.kafka.consumer.cache.enabled" -> "false", "group.id" -> "test1" ) val hubbleStream = KafkaUtils.createDirectStream[String, String]( ssc, LocationStrategies.PreferConsistent, ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-22 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store On Tue, Aug 15, 2017 at 1:18 PM, SRK <swethakasire...@gmail.com> wrote: > Hi, How to force Spark Kafka Direct to start from the latest

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-21 Thread swetha kasireddy
Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-force-Spark-Kafka-Direct-to-start-from-the-latest-offset-when-the-lag-is-huge-in-kafka-10-tp29071.html Sent from

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-21 Thread Cody Koeninger
Yes, you can start from specified offsets. See ConsumerStrategy, specifically Assign http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store On Tue, Aug 15, 2017 at 1:18 PM, SRK <swethakasire...@gmail.com> wrote: > Hi, > > How to force Spa
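
The Assign strategy takes explicit starting offsets, which is how a huge backlog can be skipped without changing the group id. A hedged sketch, assuming an existing StreamingContext (ssc) and kafkaParams as in the threads above; topic, partition, and offset are placeholders:

  import org.apache.kafka.common.TopicPartition
  import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils}
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

  // start partition 0 of "my-topic" at offset 1234 instead of the stored group offset
  val fromOffsets = Map(new TopicPartition("my-topic", 0) -> 1234L)

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc,
    PreferConsistent,
    ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
  )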

How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-15 Thread SRK
Hi, How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10? It seems to be processing from the latest offset stored for a group id. One way to do this is to change the group id. But it would mean that each time that we need to process the job from

How to make sure that Spark Kafka Direct Streaming job maintains the state upon code deployment?

2017-06-27 Thread SRK
-to-make-sure-that-Spark-Kafka-Direct-Streaming-job-maintains-the-state-upon-code-deployment-tp28799.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

How to bootstrap Spark Kafka direct with the previous state in case of a code upgrade

2017-06-20 Thread SRK
.1001560.n3.nabble.com/How-to-bootstrap-Spark-Kafka-direct-with-the-previous-state-in-case-of-a-code-upgrade-tp28775.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark-Kafka integration - build failing with sbt

2017-06-19 Thread Cody Koeninger
org.apache.spark.streaming.kafka.KafkaUtils is in the spark-streaming-kafka-0-8 project On Mon, Jun 19, 2017 at 1:01 PM, karan alang wrote: > Hi Cody - i do have a additional basic question .. > > When i tried to compile the code in Eclipse, i was not able to do that > >

Re: Spark-Kafka integration - build failing with sbt

2017-06-19 Thread karan alang
Hi Cody - I do have an additional basic question. When I tried to compile the code in Eclipse, I was not able to do that, e.g. import org.apache.spark.streaming.kafka.KafkaUtils gave errors saying KafkaUtils was not part of the package. However, when I used sbt to compile - the compilation went

Re: Spark-Kafka integration - build failing with sbt

2017-06-17 Thread karan alang
Thanks, Cody .. yes, was able to fix that. On Sat, Jun 17, 2017 at 1:18 PM, Cody Koeninger wrote: > There are different projects for different versions of kafka, > spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10 > > See > >

Re: Spark-Kafka integration - build failing with sbt

2017-06-17 Thread Cody Koeninger
There are different projects for different versions of kafka, spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10 See http://spark.apache.org/docs/latest/streaming-kafka-integration.html On Fri, Jun 16, 2017 at 6:51 PM, karan alang wrote: > I'm trying to compile
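
The unresolved dependency in the original post comes from the pre-2.0 artifact name; since Spark 2.0 the integration modules carry the Kafka version in the artifact id. A sketch for sbt:

  // build.sbt: pick the module matching your Kafka broker line
  libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.0"
  // (spark-streaming-kafka with no -0-8/-0-10 suffix was last published for Spark 1.6)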

Spark-Kafka integration - build failing with sbt

2017-06-16 Thread karan alang
I'm trying to compile kafka & Spark Streaming integration code i.e. reading from Kafka using Spark Streaming, and the sbt build is failing with error - [error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.spark#spark-streaming-kafka_2.11;2.1.0: not found Scala version

reading binary file in spark-kafka streaming

2017-04-05 Thread Yogesh Vyas
Hi, I have a binary file which I read in a Kafka Producer and send to the message queue. I then read this in the Spark-Kafka consumer as a streaming job. But it is giving me the following error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 112: invalid start byte Can anyone

Re: spark kafka consumer with kerberos

2017-03-31 Thread Saisai Shao
Hi Bill, Normally Kerberos principal and keytab should be enough, because keytab could actually represent the password. Did you configure SASL/GSSAPI or SASL/Plain for KafkaClient? http://kafka.apache.org/documentation.html#security_sasl Actually this is more like a Kafka question and normally

Re: spark kafka consumer with kerberos

2017-03-31 Thread Bill Schwanitz
Saisai, Yea that seems to have helped. Looks like the kerberos ticket when I submit does not get passed to the executor? ... 3 more Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Unable to obtain password from user at

Re: spark kafka consumer with kerberos

2017-03-31 Thread Saisai Shao
Hi Bill, The exception is from executor side. From the gist you provided, looks like the issue is that you only configured java options in driver side, I think you should also configure this in executor side. You could refer to here (
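
A hedged sketch of what Saisai describes: shipping the JAAS configuration to the executors as well as the driver (the path, and the keytab the JAAS file references, are placeholders; both files also need to reach the executors, e.g. via spark-submit --files):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.driver.extraJavaOptions",
         "-Djava.security.auth.login.config=/etc/kafka/jaas.conf")  // placeholder path
    .set("spark.executor.extraJavaOptions",
         "-Djava.security.auth.login.config=/etc/kafka/jaas.conf")  // executor side too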

spark kafka consumer with kerberos

2017-03-30 Thread Bill Schwanitz
I'm working on a poc spark job to pull data from a kafka topic with kerberos enabled ( required ) brokers. The code seems to connect to kafka and enter a polling mode. When I toss something onto the topic I get an exception which I just can't seem to figure out. Any ideas? I have a full gist up

Re: [Spark Kafka] API Doc pages for Kafka 0.10 not current

2017-02-28 Thread Cody Koeninger
On Mon, Feb 27, 2017 at 3:01 PM, Afshartous, Nick <nafshart...@wbgames.com> wrote: > Hello, Looks like the API docs linked from the Spark Kafka 0.10 Integration page are not current. For instance, on the page https://spar

[Spark Kafka] API Doc pages for Kafka 0.10 not current

2017-02-27 Thread Afshartous, Nick
Hello, Looks like the API docs linked from the Spark Kafka 0.10 Integration page are not current. For instance, on the page https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html the code examples show the new API (i.e. class ConsumerStrategies). However

Re: [Spark Kafka] How to update batch size of input dynamically for spark kafka consumer?

2017-01-03 Thread Cody Koeninger
You can't change the batch time, but you can limit the number of items in the batch http://spark.apache.org/docs/latest/configuration.html spark.streaming.backpressure.enabled spark.streaming.kafka.maxRatePerPartition On Tue, Jan 3, 2017 at 4:00 AM, 周家帅 wrote: > Hi, > > I am
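
The two knobs Cody names, as they would appear in a DStream job's configuration; the per-partition cap is in records per second, so the effective batch ceiling is roughly rate x partitions x batch interval (the numbers are illustrative):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.streaming.backpressure.enabled", "true")       // adapt intake to processing speed
    .set("spark.streaming.kafka.maxRatePerPartition", "1000")  // hard cap, records/sec/partition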

[Spark Kafka] How to update batch size of input dynamically for spark kafka consumer?

2017-01-03 Thread 周家帅
Hi, I am an intermediate spark user and have some experience in large data processing. I posted this question on StackOverflow but received no response. My problem is as follows: I use createDirectStream in my spark streaming application. I set the batch interval to 7 seconds and most of the time

Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Cody Koeninger
> Does backpressure actually work on spark kafka streaming? According to the latest spark streaming document: http://spark.apache.org/docs/latest/streaming-programming-guide.html "In Spark 1.5, we have introduced a feature called backpressure that eliminates the need

Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Richard Startin
tartin.com/ From: Liren Ding <sky.gonna.bri...@gmail.com> Sent: 05 December 2016 22:18 To: d...@spark.apache.org; user@spark.apache.org Subject: Back-pressure to Spark Kafka Streaming? Hey all, Does backpressure actually work on spark kafka streaming? According to the latest spark st

Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Liren Ding
Hey all, Does backpressure actually work on spark kafka streaming? According to the latest spark streaming document: http://spark.apache.org/docs/latest/streaming-programming-guide.html "In Spark 1.5, we have

Re: Spark kafka integration issues

2016-09-14 Thread Cody Koeninger
Yeah, an updated version of that blog post is available at https://github.com/koeninger/kafka-exactly-once On Wed, Sep 14, 2016 at 11:35 AM, Mukesh Jha wrote: > Thanks for the reply Cody. > > I found the below article on the same, very helpful. Thanks for the details, >

Re: Spark kafka integration issues

2016-09-14 Thread Mukesh Jha
Thanks for the reply Cody. I found the below article on the same, very helpful. Thanks for the details, much appreciated. http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/ On Tue, Sep 13, 2016 at 8:14 PM, Cody Koeninger wrote: > 1. see
