I am working on a pet project to implement a real-time sentiment analysis
system for analyzing customer reviews. It leverages Kafka for data
ingestion, Spark Structured Streaming (SSS) for real-time processing, and
Vertex AI for sentiment analysis and potential action triggers.
Hi Amit,
Before answering your question, I am just trying to understand it. I am not
exactly clear on how the Akka application, Kafka, and the Spark Streaming
application fit together, or what exactly you are trying to achieve.
Can you please elaborate?
Regards,
Gourav
Thanks Mich. The link you shared has only two options, Kafka and Socket.
Thanks
Amit
On Sat, Jan 29, 2022 at 3:49 AM Mich Talebzadeh
wrote:
So you have a classic architecture with Spark receiving events through a
Kafka topic via the kafka-spark connector, doing something with them, and
sending data out to the consumer. Are you using Spark Structured Streaming
here with micro-batch processing? Check
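The classic pattern described above can be sketched with Structured Streaming; the broker address, topic name, and console sink below are illustrative assumptions, not details from the thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-sss-sketch").getOrCreate()

// Read events from a Kafka topic (names are placeholders)
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "reviews")
  .load()
  .selectExpr("CAST(value AS STRING) AS review")

// Do something with each micro-batch, then send results onward;
// the console sink stands in for the real downstream consumer
val query = events.writeStream
  .format("console")
  .start()
query.awaitTermination()
```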
Hello everyone, we have spark streaming application. We send request to
stream through Akka actor using Kafka topic. We wait for response as it is
real time. Just want a suggestion is there any better option like Livy
where we can send and receive request to spark streaming.
Thanks
Amit
Hi Davide,
Please see the doc:
*Note: Kafka 0.8 support is deprecated as of Spark 2.3.0.*
Have you tried the same with Structured Streaming instead of DStreams?
If you insist on DStreams, you can use the spark-streaming-kafka-0-10
connector instead.
BR,
G
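A minimal sketch of the suggested spark-streaming-kafka-0-10 route; the broker address, topic, group id, and the existing `ssc` streaming context are assumptions for illustration:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group"
)

// ssc is an existing StreamingContext
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Array("my-topic"), kafkaParams)
)
stream.map(record => record.value).print()
```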
On Fri, Jul 24, 2020 at 12:08 PM
nd Zookeeper on the same machine in which I have the driver, it worked both
locally and in the cluster. But obviously, for the sake of scalability and
modularity, I'd like to use the current configuration.
I'm using Spark 2.4.6; the Kafka streaming API is
"spark-streaming-kafka-0-8-ass
Hi Gerard,
Excellent, your inputs indeed helped. Thank you for the quick reply.
I modified the code based on your inputs.
Now the application starts and reads from the topic. We then stream about
50,000 messages onto the Kafka topic.
After a while we terminate the application using YARN kill and
Hi Arpan,
The error suggests that the streaming context has been started with
streamingContext.start(), and after that statement some other DStream
operations have been attempted.
A suggested pattern for managing the offsets is the following:
var offsetRanges = Array.empty[OffsetRange]
//create
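The full shape of that pattern, roughly as in the kafka-0-10 integration guide; the stream name and the processing step are placeholders:

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, OffsetRange}

var offsetRanges = Array.empty[OffsetRange]

stream.foreachRDD { rdd =>
  // Capture the offset ranges for this batch before any shuffle
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process the batch here ...
  // Commit back to Kafka only after outputs have completed
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```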
Hi all,
In our cluster we have Kafka 0.10.1 and Spark 2.1.0. We are trying to store
the offsets in Kafka in order to make the streaming application restartable.
(I have already implemented this using checkpoints, but that would require a
code change in production, hence checkpointing won't work.)
Checking
> Regards,
> Vaquar khan
>
> On Fri, Mar 10, 2017 at 6:17 AM, Sean Owen <so...@cloudera.com> wrote:
Kafka and Spark Streaming don't do the same thing. Kafka stores and
transports data, Spark Streaming runs computations on a stream of data.
Neither is itself a streaming platform in its entirety.
It's kind of like asking whether you should build a website using just
MySQL, or nginx.
> On 9 … 17, at 20:37, Gaurav1809 <gauravhpan...@gmail.com> wrote:
Hi All, would you please let me know which streaming platform is best, be it
for server log processing, social media feeds, or any such streaming data?
I want to know the comparison between Kafka & Spark Streaming.
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Friday, December 2, 2016 at 12:26 PM
To: Gabriel Perez <gabr...@adtheorent.com>
Cc: Jacek Laskowski <ja...@japila.pl>, user <user@spark.apache.org>
Subject: Re: Kafka 0.10 & Spark Streaming 2.0.2
In this POC of yours, are you running this app with Spark in local mode by any
chance?
Dr Mich Talebzadeh
From: Jacek Laskowski <ja...@japila.pl>
Date: Friday, December 2, 2016 at 12:21 PM
To: Gabriel Perez <gabr...@adtheorent.com>
Cc: user <user@spark.apache.org>
Subject: Re: Kafka 0.10 & Spark Streaming 2.0.2
Hi,
Can you post the screenshot of the Executors and Streaming tabs?
Jacek
From: Jacek Laskowski <ja...@japila.pl>
Date: Friday, December 2, 2016 at 11:47 AM
To: Gabriel Perez <gabr...@adtheorent.com>
Cc: user <user@spark.apache.org>
Subject: Re: Kafka 0.10 & Spark Streaming 2.0.2
Hi,
How many partitions does the topic have? How do you check how many executors
read from the topic?
Jacek
@Override
public void call( JavaRDD<ConsumerRecord<String, String>> rdd ) {
    OffsetRange[] offsetRanges = ( (HasOffsetRanges) rdd.rdd() ).offsetRanges();
    // some time later, after outputs have completed
    ( (CanCommitOffsets) stream.inputDStream() ).commitAsync( offsetRanges );
}
} );
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-0-10-Spark-Streaming-2-0-2-tp28153.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
The direct stream doesn't use consumer groups in the same way the Kafka
high-level consumer does, but you should be able to pass a group id in the
Kafka parameters.
On Tue, Jun 21, 2016 at 9:56 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
I use Spark Streaming with Kafka and I'd like to know how many consumers
are generated. I guess there are as many as there are partitions in Kafka,
but I'm not sure.
Is there a way to know the name of the group id generated by Spark for Kafka?
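Passing an explicit group id, as suggested in the answer above, can be sketched like this; the broker address and group name are assumptions:

```scala
import org.apache.kafka.common.serialization.StringDeserializer

// One Kafka consumer is created per partition assigned to the stream;
// the group.id below overrides any auto-generated one.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-explicit-group"
)
```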
-Original Message-
From: Yogesh Vyas [mailto:informy...@gmail.com]
Sent: Wednesday, June 15, 2016 8:30 AM
To: David Newberger
Subject: Re: Handle empty kafka in Spark Streaming
I am looking for something which checks the JavaPairReceiverInputDStream
before further going for any
Sent: Wednesday, June 15, 2016 6:31 AM
To: user
Subject: Handle empty kafka in Spark Streaming
Hi,
Does anyone know how to handle an empty Kafka topic while a Spark Streaming
job is running?
Regards,
Yogesh
You can use `DStream.map` to transform objects to anything you want.
On Thu, Feb 25, 2016 at 11:06 AM, Mohammad Tariq <donta...@gmail.com> wrote:
Hi group,
I have just started working with the Confluent platform and Spark Streaming,
and was wondering if it is possible to access individual fields from an
Avro object read from a Kafka topic through Spark Streaming. As per its
default behaviour, *KafkaUtils.createDirectStream[Object, Object
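A minimal sketch of the `DStream.map` suggestion; the `user_id` field name and the GenericRecord cast are assumptions about the decoded payload, not details from the thread:

```scala
import org.apache.avro.generic.GenericRecord

// stream: DStream[(Object, Object)] from KafkaUtils.createDirectStream
// with KafkaAvroDecoder for keys and values
val userIds = stream.map { case (_, value) =>
  // KafkaAvroDecoder typically yields a GenericRecord for record-typed schemas
  val record = value.asInstanceOf[GenericRecord]
  record.get("user_id").toString
}
```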
>> and send messages to Brokers (1,000 messages per round), but only about
>> 100 messages can be inserted into Cassandra in each round of the test.
>> Can anybody give me advice on why the other messages (about 900)
>> can't be consumed?
>> How do I c
Thank you very much for your reading and suggestions in advance.
Jerry Wong
I noticed that many people are using Kafka and Spark Streaming. Can someone
provide a couple of use cases?
I imagine some possible use cases. Is the purpose of using Kafka to:
1. provide some buffering?
2. implement some sort of load balancing for the overall system?
3. provide filtering?
language for processing data than what you'd get by writing Kafka consumers
yourself.
On Thu, Dec 10, 2015 at 8:00 PM, Andy Davidson <
a...@santacruzintegration.com> wrote:
You can configure a PLAINTEXT listener as well on the broker and use that
port for Spark.
--
Harsha
On August 28, 2015 at 12:24:45 PM, Sourabh Chandak (sourabh3...@gmail.com)
wrote:
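Harsha's suggestion corresponds to a broker configuration along these lines; the listener addresses and ports are illustrative, not from the thread:

```properties
# server.properties sketch: expose SSL for clients that need it,
# plus a PLAINTEXT port that Spark can use
listeners=PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9093
```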
Hi,
I was going through the SSL setup of Kafka:
https://cwiki.apache.org/confluence/display/KAFKA/Deploying+SSL+for+Kafka
However, I am also using Spark-Kafka streaming to read data from Kafka. Is
there a way to activate SSL for the Spark streaming API, or is it not
possible at all?
Thanks,
LCassa
Can we use the existing Kafka Spark streaming jar to connect to a Kafka
server running in SSL mode?
We are fine with a non-SSL consumer, as our Kafka cluster and Spark cluster
are in the same network.
Thanks,
Sourabh
On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira g...@confluent.io wrote:
I can't
Thanks Akash for the answer. I added the endpoint to the listener and now it
is working.
My application takes Twitter4j tweets and publishes them to a topic in
Kafka. Spark Streaming subscribes to that topic for processing. But in
practice, Spark Streaming is not able to receive tweet data from Kafka, so
Spark Streaming is running empty batch jobs without input, and I am not able
to see
Have you tried using the console consumer to see if anything is actually
getting published to that topic?
On Tue, Aug 4, 2015 at 11:45 AM, narendra narencs...@gmail.com wrote:
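A quick way to run that console-consumer check; the broker address and topic name are assumptions, and older Kafka releases take --zookeeper instead of --bootstrap-server:

```shell
# Print whatever has been published to the topic so far
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic tweets --from-beginning
```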
I have a requirement to write to a Kafka queue from a Spark Streaming
application.
I am using Spark 1.2 streaming. Since different executors in Spark are
allocated at each run, instantiating a new Kafka producer at each run
seems a costly operation. Is there a way to reuse objects in processing
Use foreachPartition, and allocate whatever the costly resource is once per
partition.
On Mon, Jul 6, 2015 at 6:11 AM, Shushant Arora shushantaror...@gmail.com
wrote:
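The foreachPartition pattern suggested above might look like this; the producer configuration, topic name, and the assumption that the RDD holds strings are illustrative, not from the thread:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

rdd.foreachPartition { records =>
  // One producer per partition, instead of one per record
  val props = new Properties()
  props.put("bootstrap.servers", "broker:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)
  try {
    records.foreach(r => producer.send(new ProducerRecord("output-topic", r)))
  } finally {
    producer.close()
  }
}
```

The producer is created on the executor inside foreachPartition, which also avoids the serialization problem of shipping a producer from the driver.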
Yeah, creating a new producer at the granularity of partitions may not be
that costly.
On Mon, Jul 6, 2015 at 6:40 AM, Cody Koeninger c...@koeninger.org wrote:
What's the difference between foreachPartition and mapPartitions for a
DStream? Both work at partition granularity.
One is a transformation and the other is an action, but if I also call an
action after mapPartitions, which one is more efficient and recommended?
On Tue, Jul 7, 2015 at 12:21 AM,
Both have the same efficiency. The primary difference is that one is a
transformation (hence it is lazy, and requires an action to actually
execute), and the other is an action.
But it may be a slightly better design in general to have transformations
be purely functional (that is, no external side effects).
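A sketch of the distinction, working at the RDD level inside foreachRDD (the stream name and operations are illustrative):

```scala
stream.foreachRDD { rdd =>
  // mapPartitions is a lazy transformation: nothing runs until an action fires
  val lengths = rdd.mapPartitions(iter => iter.map(_.length))
  lengths.count() // this action triggers the mapPartitions work

  // foreachPartition is itself an action: it runs immediately, for side effects
  rdd.foreachPartition(iter => iter.foreach(println))
}
```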
On using foreachPartition, the jobs that get created are not displayed on the
driver console but are visible on the web UI.
On the driver it creates some stage statistics of the form [Stage 2:
(0 + 2) / 5], which then disappear.
I am using foreachPartition as:
Here is the pull request, you may refer to this:
https://github.com/apache/spark/pull/2994
Thanks
Jerry
2015-05-01 14:38 GMT+08:00 Pavan Sudheendra pavan0...@gmail.com:
Link to the question:
http://stackoverflow.com/questions/29974017/spark-kafka-producer-not-serializable-exception
Thanks for any pointers.
Hi,
I have written a Spark Streaming Kafka receiver using the Kafka simple
consumer API:
https://github.com/mykidong/spark-kafka-simple-consumer-receiver
This Kafka receiver can be used as an alternative to the current Spark
Streaming Kafka receiver, which is written with the high-level Kafka
consumer API.
, January 15, 2015 11:59 AM
To: user@spark.apache.org
Subject: How to replay consuming messages from kafka using spark streaming?
Hi,
My Spark Streaming job is doing Kafka ETL to HDFS. For instance, every 10
minutes my streaming job retrieves messages from Kafka and saves them as
Avro files
succeeded Kafka offset again.
I think the Spark Streaming Kafka receiver is written using the Kafka
high-level consumer API, not the simple consumer API.
Any idea how to replay Kafka consumption in Spark Streaming?
- Kidong.
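For reference, the direct stream API added in later Spark versions (1.3+) is built on the simple consumer API and lets you start from explicit offsets. A sketch, where the topic, partition, offset value, and the existing `ssc` and `kafkaParams` are assumptions:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Replay from a recorded offset per topic-partition
val fromOffsets = Map(TopicAndPartition("my-topic", 0) -> 12345L)

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
)
```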