Hi,
I am getting the below error when creating a DataFrame from a Twitter streaming RDD:
val sparkSession: SparkSession = SparkSession
  .builder
  .appName("twittertest2")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()
At this point I recommend that new applications be built using Structured
Streaming. The engine went GA as of Spark 2.2, and I know of several very
large (trillions of records) production jobs that are running in Structured
Streaming. All of our production pipelines at Databricks are written in it.
Here are my two cents; experts, please correct me if I'm wrong.
It's important to understand why you would pick one over the other, and for
what kind of use case. There may come a time when the low-level APIs are
abstracted away and become legacy, but for now the RDD API is Spark's core,
low-level API; all higher-level APIs build on top of it.
Assuming you are talking about Spark Streaming:
1) How to analyze what part of the code executes on the Spark driver and what
part executes on the executors?
RDDs can be understood as a set of data transformations or a set of jobs. Your
understanding deepens as you do more programming with Spark.
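As a rough illustration (a minimal sketch; the path and app name are placeholders): code inside RDD closures is serialized and runs on the executors, while everything around it runs on the driver.

import org.apache.spark.{SparkConf, SparkContext}

object WhereCodeRuns {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("where-code-runs"))
    // Everything at this level executes on the driver.
    val totalLength = sc.textFile("hdfs:///tmp/input")  // driver only builds the lineage
      .map(_.length)   // this closure is shipped to and runs on the executors
      .reduce(_ + _)   // partial sums on executors, final merge on the driver
    println(totalLength)   // runs on the driver
    sc.stop()
  }
}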
Hi,
I read an article which recommended using DataFrames instead of RDD
primitives. Now I have read about the differences between DStreams and
Structured Streaming, and Structured Streaming adds a lot of improvements
like checkpointing, windowing, sessionization, fault tolerance, etc.
What I am
Summarizing:
1) A static data set read from Parquet files in HDFS as a DataFrame has an
initial parallelism of 90 (based on the number of input files).
2) The static DataFrame is converted to an RDD, and the RDD has a parallelism
of 18; this was not expected.
dataframe.rdd is lazily evaluated, so there must be some operation
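A minimal sketch of how one might inspect where the parallelism changes (the path is hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("partition-check").getOrCreate()
val df = spark.read.parquet("hdfs:///data/events")   // hypothetical path
println(df.rdd.getNumPartitions)   // parallelism inherited from the input splits
// If a conversion drops the parallelism, repartition explicitly:
val rdd90 = df.rdd.repartition(90)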
Hello everyone,
We are running Spark Streaming jobs on Spark 2.1 in cluster mode on YARN. We
have an RDD (3 GB) that we periodically (every 30 min) refresh by reading from
HDFS. Namely, we create a DataFrame df using sqlContext.read.parquet,
and then we create an RDD via val rdd = df.as[T].rdd
Hi,
I have written a Spark Streaming job that uses checkpointing. I stopped
the streaming job for 5 days and then restarted it today.
I have encountered a weird issue where it shows zero records for all
cycles to date. Is it causing data loss?
[image: Inline image 1]
Thanks,
Asmath
Hi All,
A cluster of one Spark driver and multiple executors (5) is set up, with
Redis for storing Spark-processed data and S3 used for checkpointing. I have a
couple of queries about this setup.
1) How to analyze what part of the code executes on the Spark driver and what
part executes on the executors?
We appear to be kindred spirits; I’ve recently run into the same issue. Are
you running compacted topics? I’ve run into this issue on non-compacted topics
as well; it happens rarely but is still a pain. You might check out this patch
and the related Spark Streaming Kafka ticket:
https://github.com
Hi all
kafka version: kafka_2.11-0.11.0.2
spark version: 2.0.1
A topic-partition "adn-tracking,15" in Kafka whose earliest offset is
1255644602 and latest offset is 1271253441.
While starting a Spark Streaming job to process the data from the topic, we
got an exception with "G
Hi,
I have been using Spark Streaming with Kafka. I have to restart the
application daily due to a KMS issue, and after the restart the offsets do not
match the point where I left off. I am creating the checkpoint directory with
val streamingContext = StreamingContext.getOrCreate(checkPointDir
Hi,
Could anyone please share your thoughts on how to kill a Spark Streaming
application gracefully?
I followed link of
http://why-not-learn-something.blogspot.in/2016/05/apache-spark-streaming-how-to-do.html
https://github.com/lanjiang/streamingstopgraceful
I played around with having either
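One pattern those links describe, sketched here assuming `ssc` is the running StreamingContext and a marker file on HDFS serves as the stop signal (the path is hypothetical):

import org.apache.hadoop.fs.{FileSystem, Path}

val markerPath = new Path("hdfs:///tmp/stop_streaming")   // hypothetical stop flag
var stopped = false
while (!stopped) {
  stopped = ssc.awaitTerminationOrTimeout(60 * 1000)   // poll once a minute
  val fs = FileSystem.get(ssc.sparkContext.hadoopConfiguration)
  if (!stopped && fs.exists(markerPath)) {
    // finish in-flight batches before shutting down
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    stopped = true
  }
}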
You are creating a new streaming context each time:
val streamingContext = new StreamingContext(sparkSession.sparkContext,
  Seconds(config.getInt(Constants.Properties.BatchInterval)))
If you want fault tolerance, i.e. to read from where the job stopped between
restarts, the correct way is to restore the context from the checkpoint
directory.
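For reference, the canonical recovery pattern looks roughly like this (a sketch; the checkpoint path and batch interval are placeholders). All DStream setup must happen inside the factory function, so it is re-run only when no checkpoint exists:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/checkpoints"   // placeholder

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("recoverable-stream")
  val ssc = new StreamingContext(conf, Seconds(30))
  ssc.checkpoint(checkpointDir)
  // ... define all sources and transformations here ...
  ssc
}

// Restores from the checkpoint if present; otherwise calls createContext().
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()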
Hi,
I have a simple Java program to read data from Kafka using Spark Streaming.
When I run it from Eclipse on my Mac, it connects to ZooKeeper and the
bootstrap nodes, but it does not display any data and gives no error.
It just shows:
18/01/16 20:49:15 INFO Executor: Finished task
It could be a missing persist before the checkpoint
> On 16. Jan 2018, at 22:04, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
> wrote:
>
> Hi,
>
> Spark streaming job from kafka is not picking the messages and is always
> taking the latest offsets when streaming
Hi,
A Spark Streaming job reading from Kafka is not picking up the messages and
always takes the latest offsets when the streaming job has been stopped for 2
hours. It is not picking up the offsets that need to be processed from the
checkpoint directory. Any suggestions on how to process the old messages too?
Kafka clients are blocking Spark Streaming jobs, and after a while the
streaming job queue grows.
-Original Message-
From: Cody Koeninger [mailto:c...@koeninger.org]
Sent: Tuesday, December 26, 2017 6:47 PM
To: Diogo Munaro Vieira <diogo.mun...@corp.globo.com>
Cc: Serkan TAS &
Do not add a dependency on kafka-clients; the spark-streaming-kafka
library has the appropriate transitive dependencies.
Either version of the spark-streaming-kafka library should work with
1.0 brokers; what problems were you having?
On Mon, Dec 25, 2017 at 7:58 PM, Diogo Munaro Vieira
<diogo.
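In sbt terms that advice amounts to declaring only the integration artifact (a sketch; the version is illustrative):

// build.sbt: let spark-streaming-kafka pull in kafka-clients transitively.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0"
// Do NOT also add "org.apache.kafka" % "kafka-clients" explicitly.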
Hey Serkan, it depends on your Kafka version... Is it 0.8.2?
On Dec 25, 2017 at 06:17, "Serkan TAS" <serkan@enerjisa.com> wrote:
> Hi,
>
>
>
> Working on spark 2.2.0 cluster and 1.0 kafka brokers.
>
>
>
> I was using the library
>
> "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0"
Hi,
Working on a Spark 2.2.0 cluster with 1.0 Kafka brokers.
I was using the library
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0"
and had lots of problems during the streaming process, then downgraded to
"org.apache.spark" % "
Hi All,
How can I use the value (schema) of one column of a dataset to parse another
column and create a flattened dataset using Spark Streaming 2.2.0?
I have the following *source data frame* that I create from reading
messages from Kafka:
col1: string
col2: json string
col1
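If the schema in col1 is the same for every row, one possible approach is to grab it once and parse col2 with from_json. A sketch, assuming `sourceDf` holds the frame above and col1 carries a Spark-style JSON schema (in a streaming query you would resolve the schema up front rather than call head() on a streaming frame):

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DataType, StructType}

val schemaJson = sourceDf.select("col1").head.getString(0)
val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]

val flattened = sourceDf
  .withColumn("parsed", from_json(col("col2"), schema))
  .select("parsed.*")   // lift the struct fields into top-level columns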
Sorry for not making it explicit. We are using Spark Streaming as the
streaming solution, and I was wondering if it is a common pattern to do
per-tuple Redis reads/writes and writes to a REST API through Spark Streaming.
Regards,
Ashish
On Fri, Dec 22, 2017 at 4:00 AM, Gourav Sengupta <gourav.se
hi Ashish,
I was just wondering if there is any particular reason why you are posting
this to a SPARK group?
Regards,
Gourav
On Thu, Dec 21, 2017 at 8:32 PM, ashish rawat wrote:
> Hi,
>
> We are working on a streaming solution where multiple out of order streams
> are
Hi,
We are working on a streaming solution where multiple out-of-order streams
are flowing into the system, and we need to join the streams based on a unique
id. We are planning to use Redis for this: for every tuple, we will look up
whether the id exists; we join if it does, or else put the tuple
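Per-tuple lookups are usually done per partition so the connection is reused across records. A sketch assuming the Jedis client and a DStream of (id, value) string pairs (host and names are hypothetical):

import redis.clients.jedis.Jedis

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val jedis = new Jedis("redis-host", 6379)   // one connection per partition
    records.foreach { case (id, value) =>
      Option(jedis.get(id)) match {
        case Some(existing) => // id already seen: join `existing` with `value`
        case None           => jedis.set(id, value)   // first sighting: store it
      }
    }
    jedis.close()
  }
}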
I am trying to connect Spark Streaming with Flume in pull mode.
I have three machines, and each one runs Spark and a Flume agent at the same
time; they are master, slave1, and slave2.
I have set the Flume sink to slave1 on port 6689. Telnet to slave1 6689 from
the other two machines works well.
In my code, I
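For reference, the pull-based receiver is created with FlumeUtils.createPollingStream. A minimal sketch, assuming the spark-streaming-flume artifact is on the classpath and the Flume SparkSink is on slave1:6689 as described:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils

val flumeStream = FlumeUtils.createPollingStream(
  ssc, "slave1", 6689, StorageLevel.MEMORY_AND_DISK_SER_2)
flumeStream.map(event => new String(event.event.getBody.array()))   // decode body
  .print()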
.io.confluent.kafka.serializers.KafkaAvroDecoder
kr, Gerard.
On Wed, Dec 13, 2017 at 6:05 PM, Arkadiusz Bicz <arkadiusz.b...@gmail.com>
wrote:
> Hi,
>
> I try to test spark streaming 2.2.0 version with confluent 3.3.0
>
> I have got lot of error during compilation this is my sbt:
>
> lazy val sparks
Hi,
I am trying to test Spark Streaming 2.2.0 with Confluent 3.3.0.
I got a lot of errors during compilation; this is my sbt:
lazy val sparkstreaming = (project in file("."))
.settings(
name := "sparkstreaming",
organization := "org.arek",
version :=
Hi Team,
Can someone please advise me on the above post? Because of this, I have
written the data file to an HDFS location.
So as of now I am just passing the filename into the Kafka topic and not
utilizing Kafka's potential at its best; looking forward to suggestions.
Thanks,
Umar
Hi Richard,
Thanks for the confirmation.
However, I believe you must be facing the issue described in SPARK-22008.
Regards,
Sourav
Sent from my iPhone
> On Dec 3, 2017, at 9:39 AM, Qiao, Richard <richard.q...@capitalone.com> wrote:
>
> Sourav:
> I’m using spark strea
Sourav:
I’m using spark streaming 2.1.0 and can confirm
spark.dynamicAllocation.enabled is enough.
Best Regards
Richard
From: Sourav Mazumder <sourav.mazumde...@gmail.com>
Date: Sunday, December 3, 2017 at 12:31 PM
To: user <user@spark.apache.org>
Subject: Dyna
Hi,
I see the following JIRA, resolved in Spark 2.0, which is supposed to add
dynamic resource allocation in Spark Streaming:
https://issues.apache.org/jira/browse/SPARK-12133
I also see the JIRA https://issues.apache.org/jira/browse/SPARK-22008, which
is about fixing the number of executors
)
Best Regards
Richard
From: venkat <meven...@gmail.com>
Date: Thursday, November 30, 2017 at 8:16 PM
To: Cody Koeninger <c...@koeninger.org>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: [Spark streaming] No assigned partition error during seek
I noti
I notice that *'Do not* manually add dependencies on org.apache.kafka
artifacts (e.g. kafka-clients). The spark-streaming-kafka-0-10 artifact has
the appropriate transitive dependencies already, and different versions may
be incompatible in hard-to-diagnose ways' after your query.
Does this imply
Thanks Holden and Chetan.
Holden - Have you tried it out, do you know the right way to do it?
Chetan - yes, if we use a Java NLP library, there should not be any issue in
integrating it with Spark Streaming, but as I pointed out earlier, we want to
give data scientists the flexibility to use the language and library of their
choice, instead of restricting them to a library of our choice.
So it’s certainly doable (it’s not super easy, mind you), but until the
Arrow UDF release goes out it will be rather slow.
On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote:
> Hi,
>
> Has someone tried running NLTK (python) with Spark Streaming (scala)? I
Hi,
Has someone tried running NLTK (Python) with Spark Streaming (Scala)? I was
wondering if this is a good idea and what the right Spark operators are to
do this. The reason we want to try this combination is that we don't want
to run our transformations in Python (PySpark), but after
Version: 2.2 with Kafka010
Hi,
We are running Spark Streaming on AWS and trying to process incoming
messages on Kafka topics. All was well.
Recently we wanted to migrate from the 0.8 to the 0.11 version of the Spark
Kafka library, and to the Kafka 0.11 version of the server.
With this new version of the software we are facing
I have a Spark Streaming job that reads from several Kinesis streams and
unions them together in a single streaming context.
val streams = ingestionStreams.map(streamName => {
KinesisInputDStream.builder.checkpointAppName(s"${jobName}_$streamName")
.streamNa
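A sketch of combining those per-stream DStreams, assuming `streams` is the Seq built above:

// Union into one DStream so a single set of transformations handles all streams.
val unioned = ssc.union(streams)
unioned.foreachRDD { rdd =>
  // process the combined micro-batch here
}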
Did you check that the security extensions (JCE) are installed?
KhajaAsmath Mohammed wrote on Wed., Nov. 22, 2017 at 19:36:
> [image: Inline image 1]
>
> This is what we are on.
>
> On Wed, Nov 22, 2017 at 12:33 PM, KhajaAsmath Mohammed <
> mdkhajaasm...@gmail.com>
[image: Inline image 1]
This is what we are on.
On Wed, Nov 22, 2017 at 12:33 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> We use oracle JDK. we are on unix.
>
> On Wed, Nov 22, 2017 at 12:31 PM, Georg Heiler
> wrote:
>
>> Do you use oracle or open
We use the Oracle JDK; we are on Unix.
On Wed, Nov 22, 2017 at 12:31 PM, Georg Heiler
wrote:
> Do you use oracle or open jdk? We recently had an issue with open jdk:
> formerly, java Security extensions were installed by default - no longer so
> on centos 7.3
>
> Are
Do you use the Oracle JDK or OpenJDK? We recently had an issue with OpenJDK:
formerly, the Java security extensions were installed by default; no longer so
on CentOS 7.3.
Are these installed?
KhajaAsmath Mohammed wrote on Wed., Nov. 22, 2017 at 19:29:
> I passed keytab, renewal
I passed a keytab; renewal is enabled by running the script every eight
hours. The user gets renewed by the script every eight hours.
On Wed, Nov 22, 2017 at 12:27 PM, Georg Heiler
wrote:
> Did you pass a keytab? Is renewal enabled in your kdc?
> KhajaAsmath Mohammed
Did you pass a keytab? Is renewal enabled in your KDC?
KhajaAsmath Mohammed wrote on Wed., Nov. 22, 2017 at 19:25:
> Hi,
>
> I have written spark stream job and job is running successfully for more
> than 36 hours. After around 36 hours job gets failed with kerberos
Hi,
I have written a Spark Streaming job and it has been running successfully for
more than 36 hours. After around 36 hours the job fails with a Kerberos issue.
Any solution on how to resolve it?
org.apache.spark.SparkException: Task failed while writing rows.
at
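For long-running jobs on a kerberized cluster, the usual fix is to hand Spark the principal and keytab at submit time so it renews tickets itself, instead of relying on an external kinit script. A sketch with placeholder values:

// Usually passed on the command line:
//   spark-submit --principal user@EXAMPLE.COM --keytab /etc/security/keytabs/user.keytab ...
// which on YARN is equivalent to setting:
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.principal", "user@EXAMPLE.COM")
  .set("spark.yarn.keytab", "/etc/security/keytabs/user.keytab")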
Hi,
In the following example using mapWithState, I set the checkpoint interval to
1 minute. From the log, Spark still writes to the checkpoint directory every
second. I would appreciate it if someone could point out what I have done wrong.
object MapWithStateDemo {
def main(args: Array[String]) {
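For what it's worth, a minimal sketch of where the interval has to be set: ssc.checkpoint(dir) only sets the directory, while the interval goes on the stateful stream itself (here `pairs` is assumed to be a DStream[(String, Int)]):

import org.apache.spark.streaming.{Minutes, State, StateSpec}

val spec = StateSpec.function { (key: String, value: Option[Int], state: State[Int]) =>
  val sum = state.getOption.getOrElse(0) + value.getOrElse(0)
  state.update(sum)   // keep a running sum per key
  (key, sum)
}
val stateStream = pairs.mapWithState(spec)
stateStream.checkpoint(Minutes(1))   // snapshot the state once per minute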
Hi,
I am running a Spark Streaming job and it is not picking up the next batches,
but the job still shows as running on YARN.
Is this expected behavior if there is no data, or is it waiting for data to
pick up?
I am almost 4 hours behind on batches (30 min interval).
[image: Inline image 1]
Here is the screenshot. The status shows Finished, but it should be running
for the next batch to pick up the data.
[image: Inline image 1]
On Thu, Nov 16, 2017 at 10:01 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:
> Hi,
>
> I have scheduled spark streaming job to run e
Hi,
I have scheduled a Spark Streaming job to run every 30 minutes, and it was
running fine for 32 hours; suddenly I see a status of Finished instead of
Running (since it always runs in the background and shows up in the resource
manager).
Am I doing anything wrong here? How come the job finished without
Hi,
You're right... killing the Spark Streaming job is the way to go. If a batch
was completed successfully, Spark Streaming will recover from the controlled
failure and start where it left off. I don't think there's any other way to
do it.
Regards,
Jacek Laskowski
https://about.me
spark.streaming.kafka.consumer.poll.ms is a spark configuration, not
a kafka parameter.
see http://spark.apache.org/docs/latest/configuration.html
On Tue, Nov 14, 2017 at 8:56 PM, jkagitala <jka...@gmail.com> wrote:
> Hi,
>
> I'm trying to add spark-streaming to our kafka top
Hi,
I am new to Spark Streaming. I have developed a Spark Streaming job which
runs every 30 minutes with a checkpoint directory.
I have to implement a minor change; shall I kill the Spark Streaming job once
the batch is completed, using the yarn application -kill command, and update
Hi,
I'm trying to add spark-streaming to our Kafka topic, but I keep getting this
error:
java.lang.AssertionError: assertion failed: Failed to get record after
polling for 512 ms.
I tried to add different params like max.poll.interval.ms and setting
spark.streaming.kafka.consumer.poll.ms to 1ms
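Note that spark.streaming.kafka.consumer.poll.ms is a Spark setting, so it goes on the SparkConf rather than into the Kafka params map. A sketch (the value is illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-stream")
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")   // give polls 10s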
Hi All, I’m new to streaming Avro records and am parsing Avro from a Kafka
direct stream with Spark Streaming 2.1.1. I was wondering if anyone could
please suggest an API for decoding Avro records with Scala? I’ve found
KafkaAvroDecoder, twitter/bijection and the Avro library; each seems
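For what it's worth, a minimal sketch with the plain Avro library (the schema is a placeholder; note that records produced with Confluent's serializer carry a 5-byte header that KafkaAvroDecoder handles for you):

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

val schema = new Schema.Parser().parse(
  """{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}""")
val reader = new GenericDatumReader[GenericRecord](schema)

def decode(bytes: Array[Byte]): GenericRecord = {
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  reader.read(null, decoder)   // parse raw Avro bytes into a GenericRecord
}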
Hi,
I am not successful when using Spark 2.1 with Kafka 0.9; can anyone
please share a code snippet for how to use it?
val sparkSession: SparkSession = runMode match {
case "local" => SparkSession.builder.config(sparkConfig).getOrCreate
case "yarn" =>
Hello.
I am running Spark 2.1, Scala 2.11. We're running several Spark Streaming
jobs, and we occasionally restart them. We have code that looks like the following:
code that looks like the following:
logger.info("Starting the streaming context!")
ssc.start()
logger.info("Waiting
I am trying out the network word count example, and my unit test is
producing the below console output with an exception:
Exception in thread "dispatcher-event-loop-5"
java.lang.NoClassDefFoundError:
scala/runtime/AbstractPartialFunction$mcVL$sp
at java.lang.ClassLoader.defineClass1(Native Method)
ing.DataStreamWriter.start(DataStreamWriter.scala:282)
>> at
>> org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:222)
>>
>> While running on the EMR cluster all paths point to S3. On my laptop, they
>> all point to the local filesystem.
Do they
> work well with multiple Apps doing lookups simultaneously? Are there better
> options? Thank you.
>
>
>
> *From: *roshan joe <impdocs2...@gmail.com>
> *Date: *Monday, October 30, 2017 at 7:53 PM
> *To: *"user@spark.apache.org" <user@spark.apache.o
Hi,
What is the recommended way to share datasets across multiple
spark-streaming applications, so that the incoming data can be looked up
against this shared dataset?
The shared dataset is also incrementally refreshed and stored on S3. Below
is the scenario.
Streaming App-1 consumes data from
Hope this helps.
Regards
Shiv
> On Oct 29, 2017, at 11:03 AM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
> wrote:
>
> Hi,
>
> I am using spark streaming to write data back into hive with the below code
> snippet
>
>
> eventHubsWindowedStrea
Hi,
I am using Spark Streaming to write data back into Hive with the below code
snippet:
eventHubsWindowedStream.map(x => EventContent(new String(x)))
.foreachRDD(rdd => {
val sparkSession = SparkSession
.builder.enableHiveSupport.getOrCreate
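One way the snippet might continue (a sketch; the table name is hypothetical and EventContent is assumed to be a case class):

eventHubsWindowedStream.map(x => EventContent(new String(x)))
  .foreachRDD { rdd =>
    val sparkSession = SparkSession.builder.enableHiveSupport.getOrCreate()
    import sparkSession.implicits._
    sparkSession.createDataset(rdd).toDF()
      .write.mode("append").insertInto("default.event_content")   // existing Hive table
  }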
…for integration tests with whichever system you end up rolling out!
Good luck! And I'd love to hear about any findings or discoveries you may
come across!
Gary Lucas
On 26 October 2017 at 09:22, umargeek <umarfarooq.tech...@gmail.com> wrote:
> We are building a spark streaming application which is process
We are building a Spark Streaming application which is processing- and
time-intensive. We are currently using the Python API but would welcome
suggestions on whether to use Scala over Python (pros and cons), as we are
planning the production setup as the next step.
Thanks,
Umar
Hi
we (@ Hurence) have released an open source middleware based on Spark
Streaming over Kafka to do CEP and log mining, called *logisland*
(https://github.com/Hurence/logisland/). It has been deployed in production
for 2 years now and does a great job. You should have a look.
bye
Thomas Bailet
CTO: hurence
On 18/10/17 at 22:05, Mich Talebzadeh wrote:
As you may be aware, the granularity that Spark Streaming has is
micro-batching, and that is limited to 0.5 seconds. So if you have continuous
ingestion of data, then Spark Streaming may not be granular enough for CEP.
You may consider other products.
Worth looking at this old thread of mine: "
Hello all,
Has anyone used Spark Streaming for CEP (Complex Event Processing)? Any
CEP libraries that work well with Spark? I have a use case for CEP and am
trying to see if Spark Streaming is a good fit.
Currently we have a data pipeline using Kafka, Spark Streaming and
Cassandra for data
…(this was not the case with the checkpointing mechanism, where I could see
all 50,000 messages after restart).
What do you think is missing in this?
Following is the improved code based on previous inputs:
// create Spark Streaming context
val stream: InputDStream[ConsumerRecord[String, String
…ility of the
> streaming application. (Using checkpoints: I already implemented that, but
> we would need to change code in production, hence checkpointing won't work.)
>
> Checking the Spark Streaming documentation, the storing-offsets-in-Kafka
> approach:
>
> http://spark.apache.org/docs/la
The Spark Streaming documentation for the storing-offsets-in-Kafka approach,
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself,
describes:
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // some time later, after outputs have completed
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
…s only.
In case you need aggregation on a different key, you may need to re-partition
the data to a new topic and run a new streams app against that.
So yes, if you have a good idea about your data, and if it comes from Kafka
and you want to build something quick without much hardware, Kafka Streams is
the way to go.
…Kafka Streams is the way to go.
We had first tried Spark Streaming, but given hardware limitations and the
complexity of fetching data from MongoDB, we decided Kafka Streams was the
way to go forward.
Thanks
Sachin
On Wed, Oct 11, 2017 at 1:01 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:
Hi,
Has anyone had experience of using Kafka Streams versus Spark?
I am not familiar with the Kafka Streams concept, except that it is a set of
libraries.
Any feedback will be appreciated.
Regards,
Mich
LinkedIn *
Hi all,
I have a data pipeline using Spark streaming, Kafka and Cassandra.
Are there any libraries to help me with complex event processing using
Spark Streaming?
I appreciate your help.
Thanks
Hello,
*Background:*
I have a Spark Streaming context:
SparkConf conf = new
SparkConf().setMaster("local[2]").setAppName("TransformerStreamPOC");
conf.set("spark.driver.allowMultipleContexts", "true"); *<== this*
JavaStreamingContext jssc = new Java
Can anyone provide a code snippet / steps to write a data frame to a Kafka
topic in a Spark Streaming application using PySpark with Spark 2.1.1 and
Kafka 0.8 (direct stream approach)?
Thanks,
Umar
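Spark 2.1 has no built-in Kafka sink for DataFrames, so the usual shape is a plain producer per partition. A sketch in Scala of the same idea (broker and topic are placeholders; `df` is the DataFrame to write):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

df.toJSON.foreachPartition { rows =>
  val props = new Properties()
  props.put("bootstrap.servers", "broker:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)
  rows.foreach(row => producer.send(new ProducerRecord[String, String]("output-topic", row)))
  producer.close()   // flush before the partition task finishes
}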