Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread forece85
We are doing batch processing using Spark Streaming with Kinesis, with a batch size of 5 minutes. We want to send all events with the same eventId to the same executor for a batch, so that we can do grouping operations across the multiple events sharing an eventId. No previous or future batch data is involved
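
A minimal sketch of one way to get this per-batch co-location, assuming a DStream already keyed by eventId (the Event type, field names, and processing body are placeholders): hash-partitioning by key puts every record for a given eventId into the same partition, and therefore the same task, for that batch.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

case class Event(eventId: String, payload: String)

def processByEventId(keyed: DStream[(String, Event)]): Unit = {
  keyed
    .transform { rdd =>
      // same eventId -> same partition -> same task for this batch
      rdd.partitionBy(new HashPartitioner(rdd.sparkContext.defaultParallelism))
    }
    .foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // every event for a given eventId in this batch is in this iterator
        records.toList.groupBy { case (id, _) => id }.foreach {
          case (id, events) => println(s"$id -> ${events.size} events") // grouped ops go here
        }
      }
    }
}
```

Note the co-location holds per batch only; nothing pins a key to the same physical executor across batches, which matches the stated requirement.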

Spark streaming with multiple Kafka topics

2021-03-05 Thread lalitha bandaru
Hi Team, I have a Spark streaming application configured to consume events from 2 Kafka topics. But when I run the application locally, messages are consumed from only one of these topics, not both. If the first event is published to, say, topic2 and the second message to topic1, then only
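
For reference, the spark-streaming-kafka-0-10 direct stream takes the full topic set in a single Subscribe, and one stream then consumes the partitions of both topics; a sketch, with broker address and group id as placeholders (ssc is an existing StreamingContext):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "my-consumer-group",
  "auto.offset.reset"  -> "earliest"
)

// one direct stream subscribed to both topics at once
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Set("topic1", "topic2"), kafkaParams)
)
```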

Re: Converting spark batch to spark streaming

2021-01-08 Thread Jacek Laskowski
Hi, Start with DataStreamWriter.foreachBatch. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books Follow me on https://twitter.com/jaceklaskowski On Thu, Jan 7, 2021 at 6:55 PM mhd wrk

Converting spark batch to spark streaming

2021-01-07 Thread mhd wrk
I'm trying to convert a spark batch application to a streaming application and wondering what function (or design pattern) I should use to execute a series of operations inside the driver upon arrival of each message (a text file inside an HDFS folder) before starting computation inside executors.
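
A sketch of the DataStreamWriter.foreachBatch route suggested in the reply above; the HDFS paths are placeholders. Driver-side per-arrival operations run at the top of the function, and the existing batch logic follows unchanged:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("batch-to-streaming").getOrCreate()

spark.readStream
  .format("text")
  .load("hdfs:///incoming")                        // the watched HDFS folder
  .writeStream
  .option("checkpointLocation", "hdfs:///chk/app")
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // driver-side setup per micro-batch goes here, then the old batch code:
    batch.write.mode("append").parquet("hdfs:///out")
  }
  .start()
  .awaitTermination()
```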

[Spark Streaming] Why is ZooKeeper LeaderElection Agent not being called by Spark Master?

2020-12-29 Thread Saloni Mehta
Hello, please help me out with the queries below: I have 2 Spark masters and 3 ZooKeepers deployed on my system on separate virtual machines. The services come online in the below sequence: 1. zookeeper-1 2. sparkmaster-1 3. sparkmaster-2 4. zookeeper-2 5.

[Spark Streaming] support of non-time-based windows and lag function

2020-12-22 Thread Moser, Michael
Hi, I have a question regarding Spark Structured Streaming: will non-time-based window operations like the lag function be supported at some point, or is this not on the table due to technical difficulties? I.e. will something like this be possible in the future: w =
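
For context, this is the batch-only construct being asked about (df is a static DataFrame and the column names are assumptions); a window ordered by a non-time column is what streaming queries currently reject:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// works on a static DataFrame; fails analysis on a streaming one
val w = Window.partitionBy("deviceId").orderBy("seqNo")
val withPrev = df.withColumn("prevValue", lag("value", 1).over(w))
```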

[Spark Streaming] unsubscribe and subscribe kafka topic without stopping context

2020-12-14 Thread Sanjay Tiwari
Hi All, I am using Spark streaming with Kafka, working on an application which requires me to process data based on user requests. I have created 2 DStreams, i.e. one per topic, but I am unable to stop processing of data on a user request or re-enable it afterwards. I have asked a question

Re: how to manage HBase connections in Executors of Spark Streaming?

2020-11-25 Thread chen kevin
Hi, are there any best practices for managing HBase connections with Kerberos authentication in a Spark Streaming (YARN) environment? I want to know how executors manage HBase connections: how to create them, close them, and refresh them when Kerberos tickets expire. Thanks.

how to manage HBase connections in Executors of Spark Streaming?

2020-11-23 Thread big data
Hi, are there any best practices for managing HBase connections with Kerberos authentication in a Spark Streaming (YARN) environment? I want to know how executors manage HBase connections: how to create them, close them, and refresh them when Kerberos tickets expire. Thanks.
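
One commonly cited pattern, offered as a sketch rather than a definitive answer: hold a lazy singleton Connection per executor JVM and hand out short-lived Table instances per partition (the table name and dStream are placeholders; Kerberos ticket renewal is deliberately left out):

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

// A Scala object is initialized once per JVM, i.e. once per executor, so the
// connection is created lazily on first use and reused across batches.
object HBaseConn {
  lazy val connection: Connection = {
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    sys.addShutdownHook(conn.close())
    conn
  }
}

dStream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val table = HBaseConn.connection.getTable(TableName.valueOf("events"))
    try records.foreach { r => /* table.put(...) */ }
    finally table.close() // Table objects are cheap; the Connection stays open
  }
}
```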

Single spark streaming job to read incoming events with dynamic schema

2020-11-16 Thread act_coder
I am trying to create a Spark structured streaming job which reads from a Kafka topic, and the events coming from that topic will have different schemas (there is no standard schema for the incoming events). Sample incoming events: event1: {timestamp:2018-09-28T15:50:57.2420418+00:00,
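
One common approach, sketched: keep the Kafka value as a raw JSON string, pull a discriminator out with get_json_object, and parse each event type with its own schema (the broker, topic, eventType field, and schemaA are all assumptions):

```scala
import org.apache.spark.sql.functions.{col, from_json, get_json_object}
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS json")

// route on an embedded field, then apply a per-type schema downstream
val withType = raw.withColumn("eventType", get_json_object(col("json"), "$.eventType"))

val schemaA = StructType(Seq(
  StructField("timestamp", TimestampType),
  StructField("message", StringType)))

val typeA = withType
  .filter(col("eventType") === "A")
  .select(from_json(col("json"), schemaA).alias("data"))
```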

Re: Spark streaming with Kafka

2020-11-03 Thread Kevin Pis
Hi, this is my Word Count demo: https://github.com/kevincmchen/wordcount MohitAbbi wrote on Wed, 4 Nov 2020 at 3:32 AM: > Hi, > > Can you please share the correct versions of the JAR files which you used to > resolve the issue? I'm also facing the same issue. > > Thanks

Re: Spark streaming with Kafka

2020-11-03 Thread MohitAbbi
Hi, can you please share the correct versions of the JAR files which you used to resolve the issue? I'm also facing the same issue. Thanks

Re: Spark Streaming Job is stuck

2020-10-18 Thread Artemis User
If it was running fine before and has stopped working now, one thing I can think of is that your disk may be full. Checking your disk space and cleaning up your old log files might help... On 10/18/20 12:06 PM, rajat kumar wrote: Hello Everyone, My spark streaming job is running too slow, it is having

Spark Streaming Job is stuck

2020-10-18 Thread rajat kumar
Hello Everyone, my Spark streaming job is running too slow: it has a batch time of 15 seconds, but batches complete in 20-22 secs. It was fine until the first week of October, but it has suddenly started behaving this way. I know changing the batch time can help, but other than that, any idea what can

Spark Streaming Custom Receiver for REST API

2020-10-06 Thread Muhammed Favas
Hi, I have a REST GET endpoint that returns data packets from a queue. I am thinking of implementing a Spark custom receiver to stream the data into Spark. Could somebody please help me with how to integrate the REST endpoint with a Spark custom receiver class? Regards, Favas
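
A minimal custom-receiver sketch for this: a background thread polls the GET endpoint and hands each response body to Spark via store(). The URL, polling interval, and error handling are assumptions (production code would call restart() on failures):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver
import scala.io.Source

class RestReceiver(url: String, pollIntervalMs: Long)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    new Thread("rest-poller") {
      override def run(): Unit = {
        while (!isStopped()) {
          val body = Source.fromURL(url).mkString // GET the queue endpoint
          if (body.nonEmpty) store(body)          // hand the packet to Spark
          Thread.sleep(pollIntervalMs)
        }
      }
    }.start()
  }

  override def onStop(): Unit = () // the polling loop exits via isStopped()
}

// usage: val packets = ssc.receiverStream(new RestReceiver("http://host/api/queue", 1000))
```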

Re: Spark Streaming ElasticSearch

2020-10-06 Thread jainshasha
Hi Siva, in that case you can use the Structured Streaming foreach / foreachBatch functions, which can help you process each record and write it into some sink
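
A sketch of that suggestion using the elasticsearch-hadoop sink (the ES host, index name, checkpoint path, and per-record transformation are placeholders; stream is the streaming DataFrame):

```scala
import org.apache.spark.sql.DataFrame

stream.writeStream
  .option("checkpointLocation", "/chk/es")
  .foreachBatch { (batch: DataFrame, _: Long) =>
    val transformed = batch // per-record payload changes go here
    transformed.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "es-host:9200")
      .mode("append")
      .save("events-index")
  }
  .start()
```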

Re: Spark Streaming ElasticSearch

2020-10-05 Thread Siva Samraj
Hi Jainshasha, I need to read each row from the DataFrame and make some changes to it before inserting it into ES. Thanks, Siva On Mon, Oct 5, 2020 at 8:06 PM jainshasha wrote: > Hi Siva > > To emit data into ES using a spark structured streaming job you need to use > the Elasticsearch jar which has

Re: Spark Streaming ElasticSearch

2020-10-05 Thread jainshasha
Hi Siva, to emit data into ES using a Spark structured streaming job you need to use the Elasticsearch jar which has sink support for Spark structured streaming jobs. For this you can use my branch, where we have integrated ES with Spark 3.0, compatible with Scala 2.12

Spark Streaming ElasticSearch

2020-10-05 Thread Siva Samraj
Hi Team, I have a Spark streaming job which reads from Kafka and writes into Elastic via HTTP request. I want to validate each request from Kafka and change the payload per business needs before writing into Elasticsearch. I have used the ES HTTP request to push the data into Elasticsearch. Can

Create custom receiver for MQTT in spark streaming

2020-10-01 Thread Muhammed Favas
Hi, I have a requirement to do analysis using Spark on data coming from an IoT device via an MQTT broker. My Spark job connects to the MQTT broker, where I can subscribe to specific topics. I have used the MQTTUtils library in Spark to connect to the broker, but I have doubts about
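
For reference, the Bahir connector's entry point is a one-liner; the broker URL and topic are placeholders, and spark-streaming-mqtt is assumed on the classpath:

```scala
import org.apache.spark.streaming.mqtt.MQTTUtils

// subscribes to one topic and yields a DStream[String] of message payloads
val lines = MQTTUtils.createStream(ssc, "tcp://broker-host:1883", "sensors/temperature")
```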

Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

2020-09-29 Thread christinegong
Hi, in the Spark job, it exports to a Prometheus localhost HTTP server, to be later scraped by the Prometheus service (https://github.com/prometheus/client_java#http). The problem is that when SSHing into the EMR instances themselves, I can only see the metrics (e.g. via curl localhost:9111) on the driver in local

Re: [Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

2020-09-29 Thread Artemis User
nt library to populate your Spark processing result directly in Prometheus' database.  Hope this helps... -- ND On 9/28/20 3:21 AM, Christine Gong wrote: What should i do to expose my own custom prometheus metrics for cluster mode spark streaming job? I want to run a spark streaming job to read from

[Spark Prometheus Metrics] How to add my own metrics in spark streaming job?

2020-09-28 Thread Christine Gong
What should I do to expose my own custom Prometheus metrics for a cluster-mode Spark streaming job? I want to run a Spark streaming job that reads from Kafka, does some calculations and writes to localhost Prometheus on port 9111. https://github.com/jaegertracing/jaeger-analytics-java/blob/master/spark
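
A sketch with the Prometheus Java client the thread links to. The port and metric name are assumptions; the key point is that a per-JVM lazy object starts an exporter on each executor, not just the driver, when first touched inside a task:

```scala
import io.prometheus.client.Counter
import io.prometheus.client.exporter.HTTPServer

object Metrics {
  lazy val server = new HTTPServer(9111) // one scrape endpoint per JVM
  lazy val processed: Counter = Counter.build()
    .name("events_processed_total").help("Events processed").register()
}

rdd.foreachPartition { records =>
  Metrics.server // force the per-executor exporter to start
  records.foreach(_ => Metrics.processed.inc())
}
```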

Spark streaming job not able to launch more number of executors

2020-09-18 Thread Vibhor Banga ( Engineering - VS)
Hi all, we have a Spark streaming job which reads from two Kafka topics with 10 partitions each, and we are running the streaming job with 3 concurrent micro-batches (so 20 partitions in total and a concurrency of 3). We have the following question: in our processing DAG, we do an rdd.persist() at one stage

Re: Spark Streaming Checkpointing

2020-09-04 Thread András Kolbert
n to use that? > > BR, > G > > > On Thu, Sep 3, 2020 at 11:41 AM András Kolbert > wrote: > >> Hi All, >> >> I have a Spark streaming application (2.4.4, Kafka 0.8 >> so Spark Direct >> Streaming) running just fine. >> >

Re: Spark Streaming Checkpointing

2020-09-04 Thread Gabor Somogyi
:41 AM András Kolbert wrote: > Hi All, > > I have a Spark streaming application (2.4.4, Kafka 0.8 >> so Spark Direct > Streaming) running just fine. > > I create a context in the following way: > > ssc = StreamingContext(sc, 60) opts = > {"metadata.broke

Spark Streaming Checkpointing

2020-09-03 Thread András Kolbert
Hi All, I have a Spark streaming application (2.4.4, Kafka 0.8 >> so Spark Direct Streaming) running just fine. I create a context in the following way: ssc = StreamingContext(sc, 60) opts = {"metadata.broker.list":kafka_hosts,"auto.offset.reset": "largest"
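
For context, driver recovery with DStream checkpoints hinges on StreamingContext.getOrCreate; a sketch of the standard pattern (the checkpoint path and batch interval are placeholders, sc is an existing SparkContext):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///chk/app"

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sc, Seconds(60))
  ssc.checkpoint(checkpointDir)
  // build the Kafka DStream and processing graph here
  ssc
}

// recovers from the checkpoint if present, otherwise builds a fresh context
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```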

Re: Appropriate checkpoint interval in a spark streaming application

2020-08-15 Thread Sheel Pancholi
Guys, any input explaining the rationale behind the question below will really help. Requesting some expert opinion. Regards, Sheel On Sat, 15 Aug, 2020, 1:47 PM Sheel Pancholi, wrote: > Hello, > > I am trying to figure out an appropriate checkpoint interval for my spark > streaming appl

Appropriate checkpoint interval in a spark streaming application

2020-08-15 Thread Sheel Pancholi
Hello, I am trying to figure out an appropriate checkpoint interval for my Spark streaming application. It's a Spark-Kafka integration based on Direct Streams. If my *micro-batch interval is 2 mins*, and let's say *each micro-batch takes only 15 secs to process*, then shouldn't my checkpoint interval
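
For what it's worth, the programming guide's rule of thumb is a checkpoint interval of 5-10 sliding intervals; with a 2-minute batch that suggests something like the following (a sketch, not a prescription; stateful streams set it per DStream):

```scala
import org.apache.spark.streaming.Minutes

// with a 2-minute batch, 10 minutes sits in the guide's 5-10x range
statefulStream.checkpoint(Minutes(10))
```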

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread Sean Owen
is the issue? On Wed, Aug 12, 2020 at 7:53 AM German Schiavon wrote: > > Hey, > > Maybe I'm missing some restriction with EMR, but have you tried to use > Structured Streaming instead of Spark Streaming? > > https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integra

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread German Schiavon
Hey, Maybe I'm missing some restriction with EMR, but have you tried to use Structured Streaming instead of Spark Streaming? https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html Regards On Wed, 12 Aug 2020 at 14:12, Hamish Whittal wrote: > Hi folks, > >

Spark Streaming with Kafka and Python

2020-08-12 Thread Hamish Whittal
Hi folks, thought I would ask here because it's somewhat confusing. I'm using Spark 2.4.5 on EMR 5.30.1 with Amazon MSK. The version of Scala used is 2.11.12. I'm using this version of the libraries: spark-streaming-kafka-0-8_2.11-2.4.5.jar. Now I want to read from Kafka topics using Python

Re: Spark streaming receivers

2020-08-10 Thread Russell Spitzer
gt;> block management system. There is a timing that you set which decides when >> each block is "done" and records after that time has passed go into the >> next block (See parameter >> <https://spark.apache.org/docs/latest/configuration.html#spark-streaming>

Re: Spark streaming receivers

2020-08-09 Thread Dark Crusader
ock management system. There is a timing that you set which decides when > each block is "done" and records after that time has passed go into the > next block (See parameter > <https://spark.apache.org/docs/latest/configuration.html#spark-streaming> > spark.streaming.blockIn

Re: Spark streaming receivers

2020-08-08 Thread Russell Spitzer
over to a block management system. There is a timing that you set which decides when each block is "done" and records after that time has passed go into the next block (See parameter <https://spark.apache.org/docs/latest/configuration.html#spark-streaming> spark.streaming.blockInterva

Spark streaming receivers

2020-08-08 Thread Dark Crusader
Hi, I'm having some trouble figuring out how receivers tie into the Spark driver-executor structure. Does every executor have a receiver that is blocked as soon as it receives some stream data? Or can multiple streams of data be taken as input into a single executor? I have stream data coming in at
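
For context: each receiver-based input DStream pins one long-running receiver to one executor core, and ingestion is scaled by creating several input streams and unioning them; received data is chunked into blocks by spark.streaming.blockInterval. A sketch with placeholder hosts (ssc and sparkConf are assumed in scope, the conf set before the context is created):

```scala
// three receivers, each occupying one executor core
val streams = (1 to 3).map(_ => ssc.socketTextStream("stream-host", 9999))
val unioned = ssc.union(streams)

// how often buffered receiver data is cut into blocks (default 200ms);
// the blocks become the tasks of each micro-batch
sparkConf.set("spark.streaming.blockInterval", "200ms")
```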

Re: Kafka with Spark Streaming works locally but doesn't work in Standalone mode

2020-07-24 Thread Gabor Somogyi
Hi Davide, please see the doc: *Note: Kafka 0.8 support is deprecated as of Spark 2.3.0.* Have you tried the same with Structured Streaming rather than DStreams? If you insist on DStreams, you can use the spark-streaming-kafka-0-10 connector instead. BR, G On Fri, Jul 24, 2020 at 12:08 PM

Kafka with Spark Streaming works locally but doesn't work in Standalone mode

2020-07-24 Thread Davide Curcio
Hi, I'm trying to use Spark Streaming with a very simple script like this: from pyspark import SparkContext, SparkConf from pyspark.streaming import StreamingContext from pyspark.streaming.kafka import KafkaUtils sc = SparkContext(appName="PythonSparkStreamingKafka") ssc = Stream

How to introduce reset logic when aggregating/joining streaming dataframe with static dataframe for spark streaming

2020-07-24 Thread Yong Yuan
A good feature of Spark structured streaming is that it can join a static dataframe with a streaming dataframe. To cite an example: users is a static dataframe read from a database; transactionStream comes from a stream. With the join operation, we can get the spending of each country
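
A sketch of the stream-static join being described (paths and column names are assumptions). Note the static side's plan is re-evaluated each micro-batch, which is exactly why refresh/reset logic for the static data becomes a question:

```scala
// static side
val users = spark.read.parquet("/tables/users")           // columns: userId, country

// streaming side joined against the static frame every micro-batch
val spendingByCountry = transactionStream
  .join(users, Seq("userId"))
  .groupBy("country")
  .sum("amount")
```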

Re: Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread Russell Spitzer
Without seeing the rest (and you can confirm this by looking at the DAG visualization in the Spark UI), I would say your first stage with 6 partitions is: Stage 1: Read data from Kinesis (or read blocks from the receiver, not sure which method you are using) and write shuffle files for the repartition. Stage

Re: Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread forece85
Thanks for the reply. Please find pseudocode below. It's DStreams reading every 10 secs from a Kinesis stream and, after transformations, pushing into HBase. Once we have the DStream, we use the below code to repartition and do processing: dStream = dStream.repartition(javaSparkContext.defaultMinPartitions()

Re: Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread forece85
Thanks for the reply. Please find pseudocode below. We fetch DStreams from a Kinesis stream every 10 secs, perform transformations, and finally persist to HBase tables using batch insertions. dStream = dStream.repartition(jssc.defaultMinPartitions() * 3); dStream.foreachRDD(javaRDD ->

Re: Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread Russell Spitzer
. On Mon, Jul 20, 2020, 6:33 AM forece85 wrote: > I am new to spark streaming and trying to understand spark ui and to do > optimizations. > > 1. Processing at executors took less time than at driver. How to optimize > to > make driver tasks fast ? > 2. We are usin

Spark Streaming - Set Parallelism and Optimize driver

2020-07-20 Thread forece85
I am new to Spark streaming and trying to understand the Spark UI and do optimizations. 1. Processing at the executors took less time than at the driver. How do we optimize to make driver tasks fast? 2. We are using dstream.repartition(defaultParallelism*3) to increase parallelism, which is causing high

Re: Spark streaming with Confluent kafka

2020-07-03 Thread Gabor Somogyi
The error is clear: Caused by: java.lang.IllegalArgumentException: No serviceName defined in either JAAS or Kafka config On Fri, 3 Jul 2020, 15:40 dwgw, wrote: > Hi > > I am trying to stream confluent kafka topic in the spark shell. For that i > have invoked spark shell using following command.
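
One way to supply the missing setting to the Kafka source, sketched under the assumption of a SASL/GSSAPI (Kerberos) setup; adjust to the cluster's actual mechanism:

```scala
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9093")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.kerberos.service.name", "kafka") // the serviceName the error refers to
  .option("subscribe", "mytopic")
  .load()
```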

Spark streaming with Confluent kafka

2020-07-03 Thread dwgw
Hi, I am trying to stream a Confluent Kafka topic in the Spark shell. For that I have invoked spark-shell using the following command: # spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 --conf

Re: Spark streaming with Kafka

2020-07-02 Thread dwgw
Hi, I was able to correct the issue. It was due to a wrong version of a JAR file I had used. I removed these JAR files, copied the correct versions, and the error went away. Regards

Re: Spark streaming with Kafka

2020-07-02 Thread Jungtaek Lim
I can't reproduce. Could you please make sure you're running spark-shell with the official Spark 3.0.0 distribution? Please try changing the directory and using a relative path like "./spark-shell". On Thu, Jul 2, 2020 at 9:59 PM dwgw wrote: > Hi > I am trying to stream a kafka topic from spark

Spark streaming with Kafka

2020-07-02 Thread dwgw
Hi, I am trying to stream a Kafka topic from the Spark shell but I am getting the following error. I am using *spark 3.0.0/scala 2.12.10* (Java HotSpot(TM) 64-Bit Server VM, *Java 1.8.0_212*) *[spark@hdp-dev ~]$ spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0* Ivy Default Cache

How does Spark Streaming handle late data?

2020-07-02 Thread lafeier
Hi All, I am using Spark Streaming for real-time data, but the data is delayed. My batch time is set to 15 minutes, so Spark Streaming triggers calculation at 15, 30, 45 and 60 minutes, but my data delay is 5 minutes. What should I do
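
In Structured Streaming the standard answer is a watermark, which tells the engine how late data may arrive and still be folded into its window; a sketch (events and the eventTime column are assumptions):

```scala
import org.apache.spark.sql.functions.{col, window}

// accept events up to 5 minutes late into their 15-minute window
val counts = events
  .withWatermark("eventTime", "5 minutes")
  .groupBy(window(col("eventTime"), "15 minutes"))
  .count()
```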

Re: Running Apache Spark Streaming on the GraalVM Native Image

2020-07-01 Thread Pasha Finkelshteyn
Hi Ivo, I believe there's absolutely no way that Spark will work on GraalVM Native Image, because Spark generates code and loads classes at runtime, while GraalVM Native Image works only in a closed world and has no way to load classes which are not present on the classpath at compile time. On

Running Apache Spark Streaming on the GraalVM Native Image

2020-07-01 Thread ivo.kn...@t-online.de
Hi guys, I want to get Apache Spark to run on the GraalVM Native Image in a simple single-node streaming application, but I get the following error when trying to build the native image: (check attached file) As I researched online, there seems to be no successful combination of

High Availability for spark streaming application running in kubernetes

2020-06-24 Thread Shenson Joseph
Hello, I have a Spark streaming application running in Kubernetes and we use the Spark operator to submit Spark jobs. Any suggestions on: 1. How to handle high availability for Spark streaming applications. 2. What would be the best approach to handle high availability of checkpoint data if we don't

[Spark Streaming] predicate pushdown in custom connector source.

2020-06-23 Thread Rahul Kumar
I'm trying to implement a structured Spark streaming source for a custom connector. I'm wondering if it is possible to do predicate pushdown in the streaming source? I'm aware this may be something native to the datastore in question. However, I would really appreciate it if someone could redirect me
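
In the Spark 3 DataSource V2 API, a source opts into pushdown by having its ScanBuilder implement SupportsPushDownFilters; a skeletal sketch (the Scan construction itself is left open):

```scala
import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter

class MyScanBuilder extends SupportsPushDownFilters {
  private var pushed = Array.empty[Filter]

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    pushed = filters  // keep what the datastore can evaluate
    Array.empty       // return to Spark whatever must still be applied post-scan
  }

  override def pushedFilters(): Array[Filter] = pushed

  override def build(): Scan = ??? // build the (micro-batch) Scan using `pushed`
}
```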

[Apache Spark][Streaming Job][Checkpoint]Spark job failed on Checkpoint recovery with Batch not found error

2020-05-28 Thread taylorwu
Hi, we have a Spark 2.4 job that fails on checkpoint recovery every few hours with the following errors (from the driver log): driver spark-kubernetes-driver ERROR 20:38:51 ERROR MicroBatchExecution: Query impressionUpdate [id = 54614900-4145-4d60-8156-9746ffc13d1f, runId =

Re: Spark Streaming Memory

2020-05-17 Thread Ali Gouta
The Spark UI is misleading in Spark 2.4.4. I moved to Spark 2.4.5 and that fixed it. Now, your problem should be somewhere else, probably related to memory consumption but not the one you see in the UI. Best regards, Ali Gouta. On Sun, May 17, 2020 at 7:36 PM András Kolbert wrote: > Hi, > > I

Spark Streaming Memory

2020-05-17 Thread András Kolbert
Hi, I have a streaming job (Spark 2.4.4) in which the memory usage keeps increasing over time. Periodically (20-25) mins the executors fall over (org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 6987) due to out of memory. In the UI, I can see that

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-03 Thread Jungtaek Lim
>>> >>>> Hi Rishi, >>>> >>>> That is exactly why Trigger.Once was created for Structured Streaming. >>>> The way we look at streaming is that it doesn't have to be always real >>>> time, or 24-7 always on. We see streaming as a workfl

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-03 Thread Magnus Nilsson
this blog post for more details! >>> >>> https://databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html >>> >>> Best, >>> Burak >>> >>> On Fri, May 1, 2020 at 2:55 PM Rishi Shah >>> wrote: >>>

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-02 Thread Jungtaek Lim
bs-day-10x-cost-savings.html >> >> Best, >> Burak >> >> On Fri, May 1, 2020 at 2:55 PM Rishi Shah >> wrote: >> >>> Hi All, >>> >>> I recently started playing with spark streaming, and checkpoint location >>> feat

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-02 Thread Magnus Nilsson
> wrote: > >> Hi All, >> >> I recently started playing with spark streaming, and checkpoint location >> feature looks very promising. I wonder if anyone has an opinion about using >> spark streaming with checkpoint location option as a slow batch pro

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-01 Thread Rishi Shah
have to repeat > indefinitely. See this blog post for more details! > > https://databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html > > Best, > Burak > > On Fri, May 1, 2020 at 2:55 PM Rishi Shah > wrote: > >> Hi All, >> >>

Re: [spark streaming] checkpoint location feature for batch processing

2020-05-01 Thread Burak Yavuz
://databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html Best, Burak On Fri, May 1, 2020 at 2:55 PM Rishi Shah wrote: > Hi All, > > I recently started playing with spark streaming, and checkpoint location > feature looks very promising. I wonder if anyone has an o

[spark streaming] checkpoint location feature for batch processing

2020-05-01 Thread Rishi Shah
Hi All, I recently started playing with Spark streaming, and the checkpoint location feature looks very promising. I wonder if anyone has an opinion about using Spark streaming with the checkpoint location option as a slow batch processing solution. What would be the pros and cons of utilizing streaming
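
The replies point to Trigger.Once: the query processes whatever is new since the last run and then stops, so an external scheduler gives batch economics while the checkpoint keeps track of progress. A sketch with placeholder paths:

```scala
import org.apache.spark.sql.streaming.Trigger

df.writeStream
  .trigger(Trigger.Once())
  .option("checkpointLocation", "/chk/slow-batch")
  .format("parquet")
  .option("path", "/out/slow-batch")
  .start()
  .awaitTermination()
```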

Re: Spark Streaming not working

2020-04-14 Thread Gerard Maas
Debabrata Ghosh wrote: > Hi, > I have a spark streaming application where Kafka is producing > records but unfortunately spark streaming isn't able to consume those. > > I am hitting the following error: > > 20/04/10 17:28:04 ERROR Executor: Exception in task 0.5

Re: Spark Streaming not working

2020-04-14 Thread Gabor Somogyi
n task 0.5 in stage 0.0 >> (TID 24) >> > java.lang.AssertionError: assertion failed: Failed to get records for >> spark-executor-service-spark-ingestion dice-ingestion 11 0 after polling >> for 12 >> >> Cheers, >> -z

Re: Spark Streaming not working

2020-04-14 Thread Gabor Somogyi
> Sent: Saturday, April 11, 2020 2:25 > To: user > Subject: Re: Spark Streaming not working > > Any solution please ? > > On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh <mailto:mailford...@gmail.com>> wrote: > Hi, > I have a spark streaming application

Re: Spark Streaming not working

2020-04-14 Thread ZHANG Wei
e 0.0 (TID 24) > java.lang.AssertionError: assertion failed: Failed to get records for > spark-executor-service-spark-ingestion dice-ingestion 11 0 after polling for > 12 Cheers, -z From: Debabrata Ghosh Sent: Saturday, April 11, 2020 2:25 To: user Subjec

Re: Spark Streaming not working

2020-04-10 Thread Debabrata Ghosh
Any solution please ? On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh wrote: > Hi, > I have a spark streaming application where Kafka is producing > records but unfortunately spark streaming isn't able to consume those. > > I am hitting the following error: > > 20

Re: Spark Streaming not working

2020-04-10 Thread Chenguang He
unsubscribe

Re: Spark Streaming not working

2020-04-10 Thread Debabrata Ghosh
Yes, the Kafka producer is producing records from the same host. I rechecked the Kafka connection and the connection is there. I came across this URL but am unable to understand it: https://stackoverflow.com/questions/42264669/spark-streaming-assertion-failed-failed-to-get-records-for-spark-executor-a-gro

Re: Spark Streaming not working

2020-04-10 Thread Srinivas V
Check if your broker details are correct, and verify that you have network connectivity between your client box and the Kafka broker host. On Fri, Apr 10, 2020 at 11:04 PM Debabrata Ghosh wrote: > Hi, > I have a spark streaming application where Kafka is producing > records but unfo

Spark Streaming not working

2020-04-10 Thread Debabrata Ghosh
Hi, I have a spark streaming application where Kafka is producing records but unfortunately spark streaming isn't able to consume those. I am hitting the following error: 20/04/10 17:28:04 ERROR Executor: Exception in task 0.5 in stage 0.0 (TID 24) java.lang.AssertionError: assertion

Re: Spark Streaming on Compacted Kafka topic - consumes 1 message per partition per batch

2020-04-08 Thread Hrishikesh Mishra
here a > reason you chose to start reading again from the beginning by using a new > consumer group rather than sticking to the same consumer group? > > In your application, are you manually committing offsets to Kafka? > > Regards, > > Waleed > > On Wed, Apr 1, 202

Re: Spark Streaming on Compacted Kafka topic - consumes 1 message per partition per batch

2020-04-01 Thread Waleed Fateem
our application, are you manually committing offsets to Kafka? Regards, Waleed On Wed, Apr 1, 2020 at 1:31 AM Hrishikesh Mishra wrote: > Hi > > Our Spark streaming job was working fine as expected (the number of events > to process in a batch). But due to some reasons, we added compaction o

Spark Streaming on Compacted Kafka topic - consumes 1 message per partition per batch

2020-04-01 Thread Hrishikesh Mishra
Hi, our Spark streaming job was working fine as expected (the number of events to process in a batch). But for some reasons, we added compaction on the Kafka topic and restarted the job. After the restart it was failing for the below reason: org.apache.spark.SparkException: Job aborted due to stage
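
A frequently cited fix for DStreams over compacted topics, offered as a sketch: compaction leaves gaps in the offset sequence, which violates the 0-10 connector's default assumption of consecutive offsets, and this flag relaxes it:

```scala
// set before the StreamingContext is created
sparkConf.set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")
```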

Re: Spark Streaming Code

2020-03-28 Thread Jungtaek Lim
To get any meaningful answers you may want to provide as much information/context as possible, e.g. the Spark version, which behavior/output was expected (and why you think so), and how it actually behaves. On Sun, Mar 29, 2020 at 3:37 AM Siva Samraj wrote: > Hi Team, > > Need help on windowing &

Spark Streaming Code

2020-03-28 Thread Siva Samraj
Hi Team, I need help on the windowing & watermark concept. This code is not working as expected. package com.jiomoney.streaming import org.apache.spark.sql.SparkSession import org.apache.spark.sql.functions._ import org.apache.spark.sql.streaming.ProcessingTime object SlingStreaming { def

Re: Stateful Structured Spark Streaming: Timeout is not getting triggered

2020-03-05 Thread Something Something
Yes, that was it! It seems it only works if input data is continuously flowing. I had stopped the input job because I had enough data, but it seems timeouts work only if the data is continuously fed. Not sure why it's designed that way; it makes it a bit harder to write unit/integration tests, BUT I

Re: Stateful Structured Spark Streaming: Timeout is not getting triggered

2020-03-04 Thread Tathagata Das
Make sure that you are continuously feeding data into the query to trigger the batches. only then timeouts are processed. See the timeout behavior details here - https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.streaming.GroupState On Wed, Mar 4, 2020 at 2:51 PM

Stateful Structured Spark Streaming: Timeout is not getting triggered

2020-03-04 Thread Something Something
I've set the timeout duration to "2 minutes" as follows: def updateAcrossEvents (tuple3: Tuple3[String, String, String], inputs: Iterator[R00tJsonObject], oldState: GroupState[MyState]): OutputRow = { println(" Inside updateAcrossEvents with : " + tuple3._1 + ",
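
A condensed, self-contained sketch of the timeout wiring (the types and counting logic are illustrative); as the replies above note, the hasTimedOut branch is only evaluated when a later micro-batch actually runs:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Evt(key: String, value: Long)
case class MyState(count: Long)
case class OutputRow(key: String, count: Long, expired: Boolean)

def updateAcrossEvents(key: String, rows: Iterator[Evt],
                       state: GroupState[MyState]): OutputRow = {
  if (state.hasTimedOut) {           // reached only when a subsequent batch fires
    val last = state.get
    state.remove()
    OutputRow(key, last.count, expired = true)
  } else {
    val updated = MyState(state.getOption.map(_.count).getOrElse(0L) + rows.size)
    state.update(updated)
    state.setTimeoutDuration("2 minutes")
    OutputRow(key, updated.count, expired = false)
  }
}

// usage: events.groupByKey(_.key)
//   .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout)(updateAcrossEvents)
```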

Re: Stateful Spark Streaming: Required attribute 'value' not found

2020-03-04 Thread Something Something
By simply adding 'toJSON' before 'writeStream' the problem was fixed. Maybe it will help someone. On Tue, Mar 3, 2020 at 6:02 PM Something Something wrote: > In a Stateful Spark Streaming application I am writing the 'OutputRow' in > the 'updateAcrossEvents' but I keep getting this

Stateful Spark Streaming: Required attribute 'value' not found

2020-03-03 Thread Something Something
In a Stateful Spark Streaming application I am writing the 'OutputRow' in the 'updateAcrossEvents', but I keep getting this error (*Required attribute 'value' not found*) while it's trying to write to Kafka. I know from the documentation that the 'value' attribute needs to be set, but how do I do
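
The fix reported in the reply above, sketched: the Kafka sink requires a 'value' column, and toJSON serializes the output rows into exactly that single string column (broker, topic, and checkpoint path are placeholders):

```scala
resultDs.toJSON // yields a Dataset[String] whose column is named 'value'
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("topic", "output-topic")
  .option("checkpointLocation", "/chk/kafka-out")
  .start()
```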

Re: Spark Streaming with mapGroupsWithState

2020-03-02 Thread Something Something
I changed it to Tuple2 and that problem is solved. Any thoughts on this message *Unapplied methods are only converted to functions when a function type is expected.* *You can make this conversion explicit by writing `updateAcrossEvents _` or `updateAcrossEvents(_,_,_,_,_)` instead of

Re: Spark Streaming with mapGroupsWithState

2020-03-02 Thread lec ssmi
Maybe you can combine the fields you want to use into one field. Something Something wrote on Tue, 3 Mar 2020 at 6:37 AM: > I am writing a Stateful Streaming application in which I am using > mapGroupsWithState to create aggregates for Groups but I need to create > *Groups based on more than one column in

Spark Streaming with mapGroupsWithState

2020-03-02 Thread Something Something
I am writing a Stateful Streaming application in which I am using mapGroupsWithState to create aggregates for Groups, but I need to create *Groups based on more than one column in the Input Row*. All the examples in 'Spark: The Definitive Guide' use only one column, such as 'User' or 'Device'. I
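
The tuple approach suggested in the reply above, in brief: groupByKey accepts an arbitrary key expression, so a multi-column key can simply be a tuple (assuming a Dataset[Evt] with user and device fields, spark.implicits._ in scope, and an update function adapted to a (String, String) key as in the sketch further up):

```scala
// the grouping key is a tuple of two columns rather than a single one
val grouped = events
  .groupByKey(e => (e.user, e.device))
  .mapGroupsWithState(GroupStateTimeout.NoTimeout)(updateAcrossEvents)
```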

Aggregating values by a key field in Spark Streaming

2020-02-28 Thread Something Something
in for a period of X minutes. How do I do this using Spark Streaming? I am reading up on 'Stateful Transformations' with 'mapWithState'. Is that the right approach? Any sample code would be even more appreciated. Thanks.
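
A mapWithState sketch of the running aggregation being asked about (the key and value types are assumptions; keyedStream is a DStream[(String, Long)] and checkpointing must be enabled):

```scala
import org.apache.spark.streaming.{State, StateSpec}

// running sum per key, carried across batches by the state store
val spec = StateSpec.function(
  (key: String, value: Option[Long], state: State[Long]) => {
    val sum = state.getOption.getOrElse(0L) + value.getOrElse(0L)
    state.update(sum)
    (key, sum)
  })

val runningTotals = keyedStream.mapWithState(spec)
```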

Re: Spark Streaming: Aggregating values across batches

2020-02-27 Thread Tathagata Das
Use Structured Streaming. Its aggregation, by definition, is across batches. On Thu, Feb 27, 2020 at 3:17 PM Something Something < mailinglist...@gmail.com> wrote: > We've a Spark Streaming job that calculates some values in each batch. > What we need to do now is aggregate value

Spark Streaming: Aggregating values across batches

2020-02-27 Thread Something Something
We've a Spark Streaming job that calculates some values in each batch. What we need to do now is aggregate values across ALL batches. What is the best strategy to do this in Spark Streaming? Should we use 'Spark Accumulators' for this?

Spark Streaming job having issue with Java Flight Recorder (JFR)

2020-02-20 Thread Pramod Biligiri
Hi, Has anyone successfully used Java Flight Recorder (JFR) with Spark Streaming on Oracle Java 8? JFR works for me on batch jobs but not with Streaming. I'm running my streaming job on Amazon EMR. I have enabled Java Flight Recorder (JFR) to profile CPU usage. But at the end of the job, the JFR

[ spark-streaming ] - Data Locality issue

2020-02-04 Thread Karthik Srinivas
Hi, I am using Spark 2.3.2 and facing issues due to data locality: even after setting spark.locality.wait.rack=200, locality_level is always RACK_LOCAL. Can someone help me with this? Thank you

Fwd: [Spark Streaming]: Why my Spark Direct stream is sending multiple offset commits to Kafka?

2020-01-06 Thread Raghu B
Hi Spark Community, I need help with the following issue; I have been researching it for the last 2 weeks, and as a last and best resort I want to ask the Spark community. I am running the following code in Spark: val sparkConf = new SparkConf().setMaster("local[*]")

Spark streaming when a node or nodes go down

2019-12-11 Thread Mich Talebzadeh
Hi, I know this is a basic question, but someone enquired about it and I just wanted to fill my knowledge gap, so to speak. Within the context of Spark streaming, the RDD is created from the incoming topic; the RDD is partitioned and each Spark node operates on a partition at a time. OK

Re: spark streaming exception

2019-11-10 Thread Akshay Bhardwaj
2019 at 3:49 PM Amit Sharma wrote: > >> Hi, we have a spark streaming job to which we send a request through our >> UI using kafka. It processes and returns the response. We are getting the below >> error and the streaming job is not processing any request. >> >> Li

[Spark Streaming] Apply multiple ML pipelines(Models) to the same stream

2019-10-31 Thread Spico Florin
Hello! I have a use case where I have to apply multiple already-trained models (e.g. M1, M2, ... Mn) to the same Spark stream (fetched from Kafka). The models were trained using the isolation forest algorithm from here: https://github.com/titicaca/spark-iforest I have found something similar
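
One way to fan the same micro-batch through several fitted pipelines, sketched (model and output paths are placeholders; this assumes the models load as standard PipelineModels, which may not hold for every third-party estimator):

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.DataFrame

// load each fitted pipeline once on the driver
val models = Seq("M1", "M2", "M3").map(n => n -> PipelineModel.load(s"/models/$n"))

stream.writeStream
  .option("checkpointLocation", "/chk/scoring")
  .foreachBatch { (batch: DataFrame, _: Long) =>
    val cached = batch.cache() // score the same batch once per model
    models.foreach { case (name, model) =>
      model.transform(cached).write.mode("append").parquet(s"/scores/$name")
    }
    cached.unpersist()
  }
  .start()
```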

Re: spark streaming exception

2019-10-17 Thread Amit Sharma
Please update me if anyone knows about it. Thanks, Amit On Thu, Oct 10, 2019 at 3:49 PM Amit Sharma wrote: > Hi, we have a spark streaming job to which we send a request through our UI > using kafka. It processes and returns the response. We are getting the below > error and this

Semantics of Manual Offset Commit for Kafka Spark Streaming

2019-10-14 Thread Andre Piwoni
When using manual Kafka offset commit in a Spark streaming job, and the application fails to process the current batch without committing the offset in the executor, is it expected behavior that the next batch will be processed and the offset moved to the next batch regardless of the application's failure to commit
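
For reference, the manual-commit pattern in the 0-10 connector looks like the sketch below. The commit only records restart positions for the group; a running stream keeps scheduling batches from its in-memory offsets, so a failed, uncommitted batch is re-read only after the application restarts, which matches the behavior described:

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch; reach this line only if processing succeeded ...
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsets)
}
```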

spark streaming exception

2019-10-10 Thread Amit Sharma
Hi, we have a spark streaming job to which we send a request through our UI using Kafka. It processes and returns the response. We are getting the below error and the streaming job is not processing any request. Listener StreamingJobProgressListener threw an exception java.util.NoSuchElementException: key
