[Error:] RDD to DataFrame in Spark Streaming

2018-01-31 Thread Divya Gehlot
Hi, I am getting the below error when creating a DataFrame from a Twitter streaming RDD: val sparkSession: SparkSession = SparkSession .builder .appName("twittertest2") .master("local[*]") .enableHiveSupport()
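
A minimal sketch of the pattern this thread is about, assuming a DStream of a hypothetical Tweet case class; the session is obtained once per micro-batch with getOrCreate (enableHiveSupport is omitted here, since it needs Hive classes on the classpath):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.dstream.DStream

    case class Tweet(user: String, text: String)

    def saveTweets(tweets: DStream[Tweet]): Unit = {
      tweets.foreachRDD { rdd =>
        // runs on the driver once per micro-batch; getOrCreate reuses the session
        val spark = SparkSession.builder
          .appName("twittertest2")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._
        rdd.toDF().createOrReplaceTempView("tweets")
      }
    }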

Re: Prefer Structured Streaming over Spark Streaming (DStreams)?

2018-01-31 Thread Michael Armbrust
At this point I recommend that new applications be built using Structured Streaming. The engine went GA as of Spark 2.2, and I know of several very large (trillions of records) production jobs that are running in Structured Streaming. All of our production pipelines at Databricks are written
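
For readers new to the recommended API, the canonical Structured Streaming word-count shape looks like this (source and sink are illustrative; any supported source works):

    import org.apache.spark.sql.SparkSession

    object StructuredWordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("structured-demo").getOrCreate()
        import spark.implicits._

        val lines = spark.readStream
          .format("socket")              // illustrative source
          .option("host", "localhost")
          .option("port", 9999)
          .load()

        val counts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

        counts.writeStream
          .outputMode("complete")
          .format("console")             // illustrative sink
          .start()
          .awaitTermination()
      }
    }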

Re: Prefer Structured Streaming over Spark Streaming (DStreams)?

2018-01-31 Thread vijay.bvp
Here are my two cents; experts, please correct me if I'm wrong. It's important to understand why to choose one over the other, and for what kind of use case. There might be some time in the future when the low-level APIs are abstracted away and become legacy, but for now the RDD API is Spark's core, low-level API; all higher

Re: Spark Streaming Cluster queries

2018-01-31 Thread vijay.bvp
Assuming you are talking about Spark Streaming: 1) How to analyze what part of the code executes on the Spark driver and what part executes on the executors? RDDs can be understood as a set of data transformations or a set of jobs. Your understanding deepens as you do more programming with Spark
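
A minimal self-contained sketch (source and names hypothetical) of the driver/executor split the question asks about: code outside RDD operations runs on the driver, while closures passed to RDD operations run on the executors.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object DriverVsExecutors {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("where-demo")
        val ssc = new StreamingContext(conf, Seconds(10))       // driver

        val lines = ssc.socketTextStream("localhost", 9999)     // driver: declares the stream

        lines.foreachRDD { rdd =>
          println(s"batch size: ${rdd.count()}")                // driver: count() launches a job
          rdd.foreachPartition { records =>
            records.foreach(println)                            // executors: one task per partition
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }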

Prefer Structured Streaming over Spark Streaming (DStreams)?

2018-01-31 Thread Biplob Biswas
Hi, I read an article which recommended using DataFrames instead of RDD primitives. Now I have read about the differences between using DStreams and Structured Streaming, and Structured Streaming adds a lot of improvements like checkpointing, windowing, sessionization, fault tolerance, etc. What I am

Re: [Spark Streaming]: Non-deterministic uneven task-to-machine assignment

2018-01-31 Thread vijay.bvp
Summarizing: 1) The static data set read from Parquet files in HDFS as a DataFrame has an initial parallelism of 90 (based on the number of input files). 2) The static data set DataFrame is converted to an RDD, and the RDD has a parallelism of 18; this was not expected. dataframe.rdd is lazily evaluated, so there must be some operation

[Spark Streaming]: Non-deterministic uneven task-to-machine assignment

2018-01-30 Thread LongVehicle
Hello everyone, We are running Spark Streaming jobs on Spark 2.1 in cluster mode on YARN. We have an RDD (3 GB) that we periodically (every 30 min) refresh by reading from HDFS. Namely, we create a DataFrame df using sqlContext.read.parquet, and then we create RDD rdd = df.as[T].rdd
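
A sketch of the refresh pattern described above (path and element type hypothetical); as the reply notes, df.as[T].rdd is lazily evaluated and inherits the DataFrame's partitioning, so an explicit repartition is one way to force the expected parallelism:

    import org.apache.spark.sql.SparkSession

    case class T(id: Long, value: String)

    val spark = SparkSession.builder.appName("refresh-demo").getOrCreate()
    import spark.implicits._

    val df = spark.read.parquet("hdfs:///data/lookup")  // parallelism follows input splits
    val rdd = df.as[T].rdd                              // lazy; partitioning carried over
      .repartition(90)                                  // force explicit parallelism if needed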

Spark Streaming checkpoint

2018-01-29 Thread KhajaAsmath Mohammed
Hi, I have written a Spark Streaming job that uses checkpointing. I stopped the streaming job for 5 days and then restarted it today. I have encountered a weird issue where it shows zero records for all cycles to date. Is this causing data loss? [image: Inline image 1] Thanks, Asmath

Spark Streaming Cluster queries

2018-01-27 Thread puneetloya
Hi All, A cluster of one Spark driver and multiple executors (5) is set up with Redis for storing Spark-processed data, and S3 is used for checkpointing. I have a couple of queries about this setup. 1) How to analyze what part of the code executes on the Spark driver and what part executes on the

Re: uncontinuous offset in kafka will cause the spark streaming failure

2018-01-23 Thread Justin Miller
We appear to be kindred spirits; I’ve recently run into the same issue. Are you running compacted topics? I’ve run into this issue on non-compacted topics as well; it happens rarely but is still a pain. You might check out this patch and the related Spark Streaming Kafka ticket: https://github.com

uncontinuous offset in kafka will cause the spark streaming failure

2018-01-23 Thread namesuperwood
Hi all, kafka version: kafka_2.11-0.11.0.2, spark version: 2.0.1. A topic-partition "adn-tracking,15" in Kafka whose earliest offset is 1255644602 and latest offset is 1271253441. While starting a Spark Streaming job to process the data from the topic, we got an exception with "G

Production Critical : Data loss in spark streaming

2018-01-22 Thread KhajaAsmath Mohammed
Hi, I have been using Spark Streaming with Kafka. I have to restart the application daily due to a KMS issue, and after restart the offsets do not match the point where I left off. I am creating the checkpoint directory with val streamingContext = StreamingContext.getOrCreate(checkPointDir
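
The documented recovery pattern the snippet starts from, as a sketch (directory and interval hypothetical): all stream setup must happen inside the factory function, otherwise a restart silently starts from the latest offsets instead of the checkpoint.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkPointDir = "hdfs:///checkpoints/my-app"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("kafka-stream")
      val ssc = new StreamingContext(conf, Seconds(30))
      ssc.checkpoint(checkPointDir)
      // define the Kafka input stream and ALL transformations here
      ssc
    }

    val streamingContext = StreamingContext.getOrCreate(checkPointDir, createContext _)
    streamingContext.start()
    streamingContext.awaitTermination()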

Gracefully shutdown spark streaming application

2018-01-21 Thread KhajaAsmath Mohammed
Hi, Could anyone please share your thoughts on how to kill a Spark Streaming application gracefully? I followed http://why-not-learn-something.blogspot.in/2016/05/apache-spark-streaming-how-to-do.html and https://github.com/lanjiang/streamingstopgraceful. I played around with having either
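
Two common approaches, sketched here under the assumption of a running StreamingContext ssc:

    // 1) let Spark's own shutdown hook drain in-flight batches on SIGTERM
    val conf = new org.apache.spark.SparkConf()
      .setAppName("graceful-demo")
      .set("spark.streaming.stopGracefullyOnShutdown", "true")

    // 2) stop explicitly, e.g. from a marker-file watcher thread or HTTP endpoint
    ssc.stop(stopSparkContext = true, stopGracefully = true)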

Re: Spark Streaming not reading missed data

2018-01-16 Thread vijay.bvp
You are creating a new streaming context each time: val streamingContext = new StreamingContext(sparkSession.sparkContext, Seconds(config.getInt(Constants.Properties.BatchInterval))). If you want fault tolerance, i.e. to read from where it stopped between Spark job restarts, the correct way is to restore

spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from Kafka using Spark Streaming. When I run it from Eclipse on my Mac, it connects to ZooKeeper and the bootstrap nodes, but it does not display any data. It does not give any error; it just shows 18/01/16 20:49:15 INFO Executor: Finished task

Re: Spark Streaming not reading missed data

2018-01-16 Thread KhajaAsmath Mohammed
...@gmail.com> wrote: > It could be a missing persist before the checkpoint > > > On 16. Jan 2018, at 22:04, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> > wrote: > > > > Hi, > > > > Spark streaming job from kafka is not picking the messages and is alw

Re: Spark Streaming not reading missed data

2018-01-16 Thread Jörn Franke
It could be a missing persist before the checkpoint > On 16. Jan 2018, at 22:04, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> > wrote: > > Hi, > > Spark streaming job from kafka is not picking the messages and is always > taking the latest offsets when streaming

Spark Streaming not reading missed data

2018-01-16 Thread KhajaAsmath Mohammed
Hi, The Spark Streaming job from Kafka is not picking up the messages and always takes the latest offsets when the streaming job has been stopped for 2 hours. It is not picking up the offsets that need to be processed from the checkpoint directory. Any suggestions on how to process the old messages too

RE: Which kafka client to use with spark streaming

2017-12-26 Thread Serkan TAS
Kafka clients are blocking the Spark Streaming jobs, and after a while the streaming job queue grows. -Original Message- From: Cody Koeninger [mailto:c...@koeninger.org] Sent: Tuesday, December 26, 2017 6:47 PM To: Diogo Munaro Vieira <diogo.mun...@corp.globo.com> Cc: Serkan TAS &

Re: Which kafka client to use with spark streaming

2017-12-26 Thread Cody Koeninger
Do not add a dependency on kafka-clients; the spark-streaming-kafka library has the appropriate transitive dependencies. Either version of the spark-streaming-kafka library should work with 1.0 brokers; what problems were you having? On Mon, Dec 25, 2017 at 7:58 PM, Diogo Munaro Vieira <diogo.
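
In sbt terms, this advice amounts to a single dependency line (version from the thread; adjust to your cluster):

    // build.sbt - let the Spark artifact pull in kafka-clients transitively
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0"
    // do NOT also declare "org.apache.kafka" % "kafka-clients" % "..."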

Re: Which kafka client to use with spark streaming

2017-12-25 Thread Diogo Munaro Vieira
Hey Serkan, it depends on your Kafka version... Is it 0.8.2? On 25 Dec 2017 06:17, "Serkan TAS" <serkan@enerjisa.com> wrote: > Hi, > > > > Working on spark 2.2.0 cluster and 1.0 kafka brokers. > > > > I was using the library > > &quo

Which kafka client to use with spark streaming

2017-12-25 Thread Serkan TAS
Hi, Working on a Spark 2.2.0 cluster with 1.0 Kafka brokers. I was using the library "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0" and had lots of problems during the streaming process, then downgraded to "org.apache.spark" % &

How to use schema from one of the columns of a dataset to parse another column and create a flattened dataset using Spark Streaming 2.2.0?

2017-12-23 Thread kant kodali
Hi All, How to use the value (schema) of one of the columns of a dataset to parse another column and create a flattened dataset using Spark Streaming 2.2.0? I have the following source data frame that I create from reading messages from Kafka: col1: string col2: json string col1
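
A sketch of the usual flattening step with from_json (field names hypothetical). Note that from_json needs a schema fixed up front, so a schema that lives in a column has to be extracted on the driver first, e.g. by sampling one row:

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    val schema = new StructType()          // hypothetical schema for the JSON in col2
      .add("name", StringType)
      .add("count", LongType)

    val flattened = df
      .select(col("col1"), from_json(col("col2"), schema).as("parsed"))
      .select(col("col1"), col("parsed.*"))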

Re: Spark Streaming to REST API

2017-12-21 Thread ashish rawat
Sorry for not making it explicit. We are using Spark Streaming as the streaming solution, and I was wondering if it is a common pattern to do per-tuple Redis reads/writes and write to a REST API through Spark Streaming. Regards, Ashish On Fri, Dec 22, 2017 at 4:00 AM, Gourav Sengupta <gourav.se

Re: Spark Streaming to REST API

2017-12-21 Thread Gourav Sengupta
hi Ashish, I was just wondering if there is any particular reason why you are posting this to a SPARK group? Regards, Gourav On Thu, Dec 21, 2017 at 8:32 PM, ashish rawat wrote: > Hi, > > We are working on a streaming solution where multiple out of order streams > are

Spark Streaming to REST API

2017-12-21 Thread ashish rawat
Hi, We are working on a streaming solution where multiple out-of-order streams flow into the system and we need to join the streams based on a unique id. We are planning to use Redis for this: for every tuple, we will look up whether the id exists; we join if it does, or else put the tuple
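
A sketch of the per-tuple lookup-or-store pattern (assuming the Jedis client; host, port and key layout hypothetical). Opening one connection per partition, rather than per record, keeps the Redis round-trips manageable:

    import redis.clients.jedis.Jedis

    case class Event(id: String, payload: String)

    val joined = stream.mapPartitions { events =>        // stream: DStream[Event], assumed
      val jedis = new Jedis("redis-host", 6379)          // one connection per partition
      val out = events.map { e =>
        Option(jedis.get(e.id)) match {
          case Some(other) => Some((e, other))           // counterpart present: join
          case None => jedis.set(e.id, e.payload); None  // store and wait for the other stream
        }
      }.toList                                           // materialize before closing
      jedis.close()
      out.flatten.iterator
    }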

spark streaming with flume: cannot assign requested address error

2017-12-13 Thread Junfeng Chen
I am trying to connect Spark Streaming with Flume in pull mode. I have three machines, and each one runs Spark and a Flume agent at the same time: master, slave1, slave2. I have set the Flume sink to slave1 on port 6689. Telnet slave1 6689 from the other two machines works well. In my code, I

Re: Spark Streaming with Confluent

2017-12-13 Thread Gerard Maas
.io.confluent.kafka.serializers.KafkaAvroDecoder kr, Gerard. On Wed, Dec 13, 2017 at 6:05 PM, Arkadiusz Bicz <arkadiusz.b...@gmail.com> wrote: > Hi, > > I try to test spark streaming 2.2.0 version with confluent 3.3.0 > > I have got lot of error during compilation this is my sbt: > > lazy val sparks

Spark Streaming with Confluent

2017-12-13 Thread Arkadiusz Bicz
Hi, I am trying to test Spark Streaming 2.2.0 with Confluent 3.3.0. I get a lot of errors during compilation; this is my sbt: lazy val sparkstreaming = (project in file(".")) .settings( name := "sparkstreaming", organization := "org.arek", version :=
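
A hedged sbt sketch for this combination (versions from the thread); the Confluent artifacts live in Confluent's own Maven repository, which is a frequent cause of resolution errors:

    resolvers += "confluent" at "http://packages.confluent.io/maven/"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0",
      "io.confluent" % "kafka-avro-serializer" % "3.3.0"
    )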

Re: How to write dataframe to kafka topic in spark streaming application using pyspark other than collect?

2017-12-07 Thread umargeek
Hi Team, Can someone please advise me on the above post? Because of this I have written the data file to an HDFS location, so as of now I am just passing the filename into the Kafka topic and not using Kafka to its full potential. Looking forward to suggestions. Thanks, Umar -- Sent from:

Re: Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Sourav Mazumder
Hi Richard, Thanks for the confirmation. However, I believe you must be facing the issue described in SPARK-22008. Regards, Sourav Sent from my iPhone > On Dec 3, 2017, at 9:39 AM, Qiao, Richard <richard.q...@capitalone.com> wrote: > > Sourav: > I’m using spark strea

Re: Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Qiao, Richard
Sourav: I’m using spark streaming 2.1.0 and can confirm spark.dynamicAllocation.enabled is enough. Best Regards Richard From: Sourav Mazumder <sourav.mazumde...@gmail.com> Date: Sunday, December 3, 2017 at 12:31 PM To: user <user@spark.apache.org> Subject: Dyna

Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Sourav Mazumder
Hi, I see that the following JIRA was resolved in Spark 2.0: https://issues.apache.org/jira/browse/SPARK-12133, which is supposed to support dynamic resource allocation in Spark Streaming. I also see the JIRA https://issues.apache.org/jira/browse/SPARK-22008, which is about fixing the number of executor
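
The streaming-specific settings introduced by SPARK-12133, as a sketch (values illustrative; note these are separate from the core spark.dynamicAllocation.* keys):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.streaming.dynamicAllocation.enabled", "true")
      .set("spark.streaming.dynamicAllocation.minExecutors", "2")
      .set("spark.streaming.dynamicAllocation.maxExecutors", "10")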

Re: NLTK with Spark Streaming

2017-12-01 Thread ashish rawat
use a Java NLP library, it should not be any issue in > integrating with spark streaming, but as I pointed out earlier, we want to > give flexibility to data scientists to use the language and library of > their choice, instead of restricting them to a library of our choice. > &g

Re: [Spark streaming] No assigned partition error during seek

2017-12-01 Thread Cody Koeninger
il.com> > Date: Thursday, November 30, 2017 at 8:16 PM > To: Cody Koeninger <c...@koeninger.org> > Cc: "user@spark.apache.org" <user@spark.apache.org> > Subject: Re: [Spark streaming] No assigned partition error during seek > > > > I notice that 'Do

Re: [Spark streaming] No assigned partition error during seek

2017-12-01 Thread Qiao, Richard
) Best Regards Richard From: venkat <meven...@gmail.com> Date: Thursday, November 30, 2017 at 8:16 PM To: Cody Koeninger <c...@koeninger.org> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: [Spark streaming] No assigned partition error during seek I noti

Re: [Spark streaming] No assigned partition error during seek

2017-11-30 Thread venkat
I notice that 'Do not manually add dependencies on org.apache.kafka artifacts (e.g. kafka-clients). The spark-streaming-kafka-0-10 artifact has the appropriate transitive dependencies already, and different versions may be incompatible in hard to diagnose ways' after your query. Does this imply

Re: [Spark streaming] No assigned partition error during seek

2017-11-30 Thread venkat
at 7:39 PM, venks61176 <meven...@gmail.com> wrote: > > Version: 2.2 with Kafka010 > > > > Hi, > > > > We are running spark streaming on AWS and trying to process incoming > > messages on Kafka topics. All was well. > > Recently we wanted to migrate fr

Re: [Spark streaming] No assigned partition error during seek

2017-11-30 Thread Cody Koeninger
<meven...@gmail.com> wrote: > Version: 2.2 with Kafka010 > > Hi, > > We are running spark streaming on AWS and trying to process incoming > messages on Kafka topics. All was well. > Recently we wanted to migrate from 0.8 to 0.11 version of Spark library and >

Re: NLTK with Spark Streaming

2017-11-28 Thread Nicholas Hakobian
:19 AM, ashish rawat <dceash...@gmail.com> wrote: > Thanks Holden and Chetan. > > Holden - Have you tried it out, do you know the right way to do it? > Chetan - yes, if we use a Java NLP library, it should not be any issue in > integrating with spark streaming, but as I pointed

Re: NLTK with Spark Streaming

2017-11-26 Thread ashish rawat
Thanks Holden and Chetan. Holden - Have you tried it out, do you know the right way to do it? Chetan - yes, if we use a Java NLP library, it should not be any issue in integrating with spark streaming, but as I pointed out earlier, we want to give flexibility to data scientists to use

Re: NLTK with Spark Streaming

2017-11-26 Thread Chetan Khatri
. > > On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote: > >> Hi, >> >> Has someone tried running NLTK (python) with Spark Streaming (scala)? I >> was wondering if this is a good idea and what are the right Spark operators >> to do th

Re: NLTK with Spark Streaming

2017-11-26 Thread Holden Karau
So it’s certainly doable (it’s not super easy mind you), but until the arrow udf release goes out it will be rather slow. On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote: > Hi, > > Has someone tried running NLTK (python) with Spark Streaming (scala)? I

NLTK with Spark Streaming

2017-11-25 Thread ashish rawat
Hi, Has someone tried running NLTK (Python) with Spark Streaming (Scala)? I was wondering if this is a good idea and what are the right Spark operators to do this? The reason we want to try this combination is that we don't want to run our transformations in Python (PySpark), but after
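
One operator that fits this combination is RDD.pipe, sketched here (script path hypothetical): each record is written to the external process's stdin and its stdout lines become the output records, so an NLTK script can run inside a Scala job:

    // lines: DStream[String], assumed; the script must be executable on every executor
    val tagged = lines.transform(rdd => rdd.pipe("/opt/nlp/tokenize.py"))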

[Spark streaming] No assigned partition error during seek

2017-11-24 Thread venks61176
Version: 2.2 with Kafka010 Hi, We are running Spark Streaming on AWS and trying to process incoming messages on Kafka topics. All was well. Recently we wanted to migrate from the 0.8 to the 0.11 version of the Spark library and the Kafka 0.11 version of the server. With this new version of the software we are facing

Spark Streaming Kinesis Missing Records

2017-11-24 Thread Richard Moorhead
I have a Spark Streaming job that reads from several Kinesis streams and unions them in a single streaming context. val streams = ingestionStreams.map(streamName => { KinesisInputDStream.builder.checkpointAppName(s"${jobName}_$streamName") .streamNa
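
The usual way to combine such per-stream DStreams is through the shared context, sketched here with ssc and streams as in the post:

    val unioned = ssc.union(streams)
    unioned.foreachRDD(rdd => println(s"combined batch size: ${rdd.count()}"))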

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Did you check that the security extensions are installed (JCE)? KhajaAsmath Mohammed wrote on Wed., 22 Nov 2017 at 19:36: > [image: Inline image 1] > > This is what we are on. > > On Wed, Nov 22, 2017 at 12:33 PM, KhajaAsmath Mohammed < > mdkhajaasm...@gmail.com>

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
[image: Inline image 1] This is what we are on. On Wed, Nov 22, 2017 at 12:33 PM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > We use oracle JDK. we are on unix. > > On Wed, Nov 22, 2017 at 12:31 PM, Georg Heiler > wrote: > >> Do you use oracle or open

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
We use the Oracle JDK. We are on Unix. On Wed, Nov 22, 2017 at 12:31 PM, Georg Heiler wrote: > Do you use oracle or open jdk? We recently had an issue with open jdk: > formerly, java Security extensions were installed by default - no longer so > on centos 7.3 > > Are

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Do you use the Oracle JDK or OpenJDK? We recently had an issue with OpenJDK: formerly, the Java security extensions were installed by default - no longer so on CentOS 7.3. Are these installed? KhajaAsmath Mohammed wrote on Wed. 22 Nov 2017 at 19:29: > I passed keytab, renewal

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
I passed the keytab; renewal is enabled by running the script every eight hours. The user gets renewed by the script every eight hours. On Wed, Nov 22, 2017 at 12:27 PM, Georg Heiler wrote: > Did you pass a keytab? Is renewal enabled in your kdc? > KhajaAsmath Mohammed

Re: Spark Streaming Kerberos Issue

2017-11-22 Thread Georg Heiler
Did you pass a keytab? Is renewal enabled in your KDC? KhajaAsmath Mohammed wrote on Wed. 22 Nov 2017 at 19:25: > Hi, > > I have written spark stream job and job is running successfully for more > than 36 hours. After around 36 hours job gets failed with kerberos

Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
Hi, I have written a Spark Streaming job and the job runs successfully for more than 36 hours. After around 36 hours the job fails with a Kerberos issue. Any solution on how to resolve it? org.apache.spark.SparkException: Task failed while writing rows. at

Spark 2.1.2 Spark Streaming checkpoint interval not respected

2017-11-18 Thread Shing Hing Man
Hi, In the following example using mapWithState, I set the checkpoint interval to 1 minute. From the log, Spark still writes to the checkpoint directory every second. It would be appreciated if someone could point out what I have done wrong. object MapWithStateDemo { def main(args: Array[String]) {
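
For reference, a sketch of where the interval is set (state function hypothetical): the checkpoint interval applies to the mapWithState stream itself, while the input DStream keeps its own default:

    import org.apache.spark.streaming.{Minutes, State, StateSpec}

    def trackState(key: String, value: Option[Int], state: State[Int]): (String, Int) = {
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (key, sum)
    }

    // pairs: DStream[(String, Int)], assumed
    val stateStream = pairs.mapWithState(StateSpec.function(trackState _))
    stateStream.checkpoint(Minutes(1))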

Spark Streaming in Wait mode

2017-11-17 Thread KhajaAsmath Mohammed
Hi, I am running a Spark Streaming job and it is not picking up the next batches, but the job still shows as running on YARN. Is this expected behavior if there is no data, or is it waiting for data to pick up? I am almost 4 hours behind in batches (30 min interval) [image: Inline image 1] [image

Re: Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-17 Thread Cody Koeninger
atest/configuration.html >> >> On Tue, Nov 14, 2017 at 8:56 PM, jkagitala <jka...@gmail.com> wrote: >> > Hi, >> > >> > I'm trying to add spark-streaming to our kafka topic. But, I keep >> > getting >> > this error >&g

Re: Spark Streaming Job completed without executing next batches

2017-11-16 Thread KhajaAsmath Mohammed
Here is a screenshot. The status shows Finished, but it should be running for the next batch to pick up the data. [image: Inline image 1] On Thu, Nov 16, 2017 at 10:01 PM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > Hi, > > I have scheduled spark streaming job to run e

Spark Streaming Job completed without executing next batches

2017-11-16 Thread KhajaAsmath Mohammed
Hi, I have scheduled a Spark Streaming job to run every 30 minutes. It was running fine for 32 hours, and then suddenly I see a status of Finished instead of Running (since it always runs in the background and shows up in the resource manager). Am I doing anything wrong here? How come the job finished without

Re: Restart Spark Streaming after deployment

2017-11-16 Thread Jacek Laskowski
Hi, You're right... killing the Spark Streaming job is the way to go. If a batch was completed successfully, Spark Streaming will recover from the controlled failure and start where it left off. I don't think there's another way to do it. Regards, Jacek Laskowski https://about.me

Re: Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-15 Thread jagadish kagitala
..@gmail.com> wrote: > > Hi, > > > > I'm trying to add spark-streaming to our kafka topic. But, I keep getting > > this error > > java.lang.AssertionError: assertion failed: Failed to get record after > > polling for 512 ms. > > > > I tried to add differen

Re: Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-15 Thread Cody Koeninger
spark.streaming.kafka.consumer.poll.ms is a spark configuration, not a kafka parameter. see http://spark.apache.org/docs/latest/configuration.html On Tue, Nov 14, 2017 at 8:56 PM, jkagitala <jka...@gmail.com> wrote: > Hi, > > I'm trying to add spark-streaming to our kafka top
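
As a sketch, the setting belongs on SparkConf (not in the Kafka params map):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kafka-stream")
      .set("spark.streaming.kafka.consumer.poll.ms", "10000")  // value illustrative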

Restart Spark Streaming after deployment

2017-11-15 Thread KhajaAsmath Mohammed
Hi, I am new to using Spark Streaming. I have developed one Spark Streaming job which runs every 30 minutes with a checkpointing directory. I have to implement a minor change; shall I kill the Spark Streaming job once the batch is completed using the yarn application -kill command, and update

Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-14 Thread jkagitala
Hi, I'm trying to add spark-streaming to our Kafka topic. But I keep getting this error: java.lang.AssertionError: assertion failed: Failed to get record after polling for 512 ms. I tried to add different params like max.poll.interval.ms, spark.streaming.kafka.consumer.poll.ms to 1ms

Spark Streaming Kafka

2017-11-10 Thread Frank Staszak
Hi All, I’m new to streaming Avro records and am parsing Avro from a Kafka direct stream with Spark Streaming 2.1.1. I was wondering if anyone could please suggest an API for decoding Avro records with Scala? I’ve found KafkaAvroDecoder, twitter/bijection and the Avro library; each seems

Spark Streaming in Spark 2.1 with Kafka 0.9

2017-11-09 Thread KhajaAsmath Mohammed
Hi, I am not successful using Spark 2.1 with Kafka 0.9; can anyone please share a code snippet for it? val sparkSession: SparkSession = runMode match { case "local" => SparkSession.builder.config(sparkConfig).getOrCreate case "yarn" =>

Stopping a Spark Streaming Context gracefully

2017-11-07 Thread Bryan Jeffrey
Hello. I am running Spark 2.1, Scala 2.11. We're running several Spark streaming jobs. In some cases we restart these jobs on an occasional basis. We have code that looks like the following: logger.info("Starting the streaming context!") ssc.start() logger.info("Waiting

unable to run spark streaming example

2017-11-03 Thread Imran Rajjad
I am trying out the network word count example, and my unit test is producing the below console output with an exception: Exception in thread "dispatcher-event-loop-5" java.lang.NoClassDefFoundError: scala/runtime/AbstractPartialFunction$mcVL$sp at java.lang.ClassLoader.defineClass1(Native Method)

Re: Chaining Spark Streaming Jobs

2017-11-02 Thread Sunita Arvind
ing.DataStreamWriter.start(DataStreamWriter.scala:282) >> at >> org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:222) >> >> While running on the EMR cluster all paths point to S3. In my laptop, they >> all point to local filesystem. >> >&

Re: share datasets across multiple spark-streaming applications for lookup

2017-11-02 Thread JG Perrin
nk you. From: roshan joe <impdocs2...@gmail.com> Date: Monday, October 30, 2017 at 7:53 PM To: "user@spark.apache.org" <user@spark.apache.org> Subject: share datasets across mul

Re: share datasets across multiple spark-streaming applications for lookup

2017-10-31 Thread Joseph Pride
>> >> From: roshan joe <impdocs2...@gmail.com> >> Date: Monday, October 30, 2017 at 7:53 PM >> To: "user@spark.apache.org" <user@spark.apache.org> >> Subject: share datasets across multiple spark-streaming applications for >> lookup >> &g

Re: share datasets across multiple spark-streaming applications for lookup

2017-10-31 Thread Gene Pang
Do they > work well with multiple Apps doing lookups simultaneously? Are there better > options? Thank you. > > > > *From: *roshan joe <impdocs2...@gmail.com> > *Date: *Monday, October 30, 2017 at 7:53 PM > *To: *"user@spark.apache.org" <user@spark.apache.o

Re: share datasets across multiple spark-streaming applications for lookup

2017-10-31 Thread Revin Chalil
7 at 7:53 PM To: "user@spark.apache.org" <user@spark.apache.org> Subject: share datasets across multiple spark-streaming applications for lookup Hi, What is the recommended way to share datasets across multiple spark-streaming applications, so that the incoming data can be looked

share datasets across multiple spark-streaming applications for lookup

2017-10-30 Thread roshan joe
Hi, What is the recommended way to share datasets across multiple spark-streaming applications, so that the incoming data can be looked up against this shared dataset? The shared dataset is also incrementally refreshed and stored on S3. Below is the scenario. Streaming App-1 consumes data from

Re: Spark Streaming Small files in Hive

2017-10-29 Thread Siva Gudavalli
. Hope this helps. Regards, Shiv > On Oct 29, 2017, at 11:03 AM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> > wrote: > > Hi, > > I am using spark streaming to write data back into hive with the below code > snippet > > > eventHubsWindowedStrea

Spark Streaming Small files in Hive

2017-10-29 Thread KhajaAsmath Mohammed
Hi, I am using spark streaming to write data back into hive with the below code snippet eventHubsWindowedStream.map(x => EventContent(new String(x))) .foreachRDD(rdd => { val sparkSession = SparkSession .builder.enableHiveSupport.getOrCreate
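
A common mitigation for the small-files problem, sketched against the snippet above (EventContent and the table name are assumptions): shrink the number of partitions per micro-batch before writing, so each batch produces one file instead of one per partition:

    import org.apache.spark.sql.SparkSession

    case class EventContent(payload: String)          // assumed shape

    eventHubsWindowedStream.map(x => EventContent(new String(x)))
      .foreachRDD { rdd =>
        val spark = SparkSession.builder.enableHiveSupport.getOrCreate()
        import spark.implicits._
        rdd.toDF()
          .coalesce(1)                                // one output file per batch
          .write
          .insertInto("events")                       // hypothetical Hive table
      }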

Re: Suggestions on using scala/python for Spark Streaming

2017-10-26 Thread Sebastian Piu
end up rolling out! > > Good luck! and i'd love to hear any findings discovery you may come > across! > > Gary Lucas > > On 26 October 2017 at 09:22, umargeek <umarfarooq.tech...@gmail.com> > wrote: > >> We are building a spark streaming application wh

Re: Suggestions on using scala/python for Spark Streaming

2017-10-26 Thread lucas.g...@gmail.com
for integration tests with whichever system you end up rolling out! Good luck! And I'd love to hear any findings/discoveries you may come across! Gary Lucas On 26 October 2017 at 09:22, umargeek <umarfarooq.tech...@gmail.com> wrote: > We are building a spark streaming application which is process

Suggestions on using scala/python for Spark Streaming

2017-10-26 Thread umargeek
We are building a Spark Streaming application which is processing- and time-intensive. We currently use the Python API, but we are looking for suggestions on whether to use Scala over Python (pros and cons), as we are planning a production setup as the next step. Thanks, Umar -- Sent from: http

Re: Spark streaming for CEP

2017-10-25 Thread anna stax
case be liable for any monetary damages >>>> arising from such loss, damage or destruction. >>>> >>>> >>>> >>>> On 24 October 2017 at 13:53, Thomas Bailet <thomas.bai...@hurence.com> >>>> wrote: >>>> >>>

Re: Spark streaming for CEP

2017-10-24 Thread lucas.g...@gmail.com
> >>> On 24 October 2017 at 13:53, Thomas Bailet <thomas.bai...@hurence.com> >>> wrote: >>> >>>> Hi >>>> >>>> we (@ hurence) have released on open source middleware based on >>>> SparkStreaming over Kafka to do CEP a

Re: Spark streaming for CEP

2017-10-24 Thread Mich Talebzadeh
> Hi >>> >>> we (@ hurence) have released on open source middleware based on >>> SparkStreaming over Kafka to do CEP and log mining, called *logisland* >>> (https://github.com/Hurence/logisland/) it has been deployed into >>> production for 2 y

Re: Spark streaming for CEP

2017-10-24 Thread Stephen Boesch
> production for 2 years now and does a great job. You should have a look. >> >> >> bye >> >> Thomas Bailet >> >> CTO : hurence >> >> On 18/10/17 at 22:05, Mich Talebzadeh wrote: >> >> As you may be aware the granularity that Spark s

Re: Spark streaming for CEP

2017-10-24 Thread Mich Talebzadeh
ining, called *logisland* ( > https://github.com/Hurence/logisland/) it has been deployed into > production for 2 years now and does a great job. You should have a look. > > > bye > > Thomas Bailet > > CTO : hurence > > On 18/10/17 at 22:05, Mich Talebzadeh wrote: > &g

Re: Spark streaming for CEP

2017-10-24 Thread Thomas Bailet
Bailet CTO : hurence On 18/10/17 at 22:05, Mich Talebzadeh wrote: As you may be aware, the granularity that Spark Streaming has is micro-batching, and that is limited to 0.5 second. So if you have continuous ingestion of data then Spark Streaming may not be granular enough for CEP. You may

Re: Spark streaming for CEP

2017-10-18 Thread Mich Talebzadeh
As you may be aware, the granularity that Spark Streaming has is micro-batching, and that is limited to 0.5 second. So if you have continuous ingestion of data then Spark Streaming may not be granular enough for CEP. You may consider other products. Worth looking at this old thread of mine: "

Spark streaming for CEP

2017-10-18 Thread anna stax
Hello all, Has anyone used Spark Streaming for CEP (Complex Event Processing)? Are there any CEP libraries that work well with Spark? I have a use case for CEP and am trying to see if Spark Streaming is a good fit. Currently we have a data pipeline using Kafka, Spark Streaming and Cassandra for data

Re: Issue Storing offset in Kafka for Spark Streaming Application

2017-10-13 Thread Arpan Rajani
(This was not the case with the checkpointing mechanism, where I could see all 50,000 messages after restart.) What do you think is missing in this? Following is the improved code based on previous inputs: //create Spark Streaming Context val stream:InputDStream[ConsumerRecord[String,String

Re: Issue Storing offset in Kafka for Spark Streaming Application

2017-10-13 Thread Gerard Maas
ility of the > streaming application. ( Using checkpoints, I already implemented, we will > require to change code in production hence checkpoint won't work) > > Checking Spark Streaming documentation- Storing offsets on Kafka approach > : > > http://spark.apache.org/docs/la

Issue Storing offset in Kafka for Spark Streaming Application

2017-10-13 Thread Arpan Rajani
The Spark Streaming documentation on the storing-offsets-in-Kafka approach, http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself, describes: stream.foreachRDD { rdd => val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges // some time later, af
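
For completeness, the commit half of that documented pattern (stream is the Kafka InputDStream created above):

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... write the batch out first ...
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }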

Re: Kafka streams vs Spark streaming

2017-10-11 Thread Sachin Mittal
s only. >> In case you need aggregation on a different key you may need to >> re-partition the data to a new topic and run new streams app against that. >> >> So yes if you have good idea about your data and if it comes from kafka >> and you want to build something

Re: Kafka streams vs Spark streaming

2017-10-11 Thread Sachin Mittal
t. >> >> So yes if you have good idea about your data and if it comes from kafka >> and you want to build something quick without much hardware kafka streams >> is a way to go. >> >> We had first tried spark streaming but given hardware limitation and >> c

Re: Kafka streams vs Spark streaming

2017-10-11 Thread Sabarish Sasidharan
need to > re-partition the data to a new topic and run new streams app against that. > > So yes if you have good idea about your data and if it comes from kafka > and you want to build something quick without much hardware kafka streams > is a way to go. > > We had first tried spar

Re: Kafka streams vs Spark streaming

2017-10-11 Thread Sabarish Sasidharan
a different key you may need to > re-partition the data to a new topic and run new streams app against that. > > So yes if you have good idea about your data and if it comes from kafka > and you want to build something quick without much hardware kafka streams > is a way to go.

Re: Kafka streams vs Spark streaming

2017-10-11 Thread Sachin Mittal
Kafka Streams is a way to go. We had first tried Spark Streaming, but given the hardware limitations and the complexity of fetching data from MongoDB, we decided Kafka Streams was the way to go forward. Thanks Sachin On Wed, Oct 11, 2017 at 1:01 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Kafka streams vs Spark streaming

2017-10-11 Thread Mich Talebzadeh
Hi, Has anyone had experience of using Kafka Streams versus Spark? I am not familiar with the Kafka Streams concept except that it is a set of libraries. Any feedback will be appreciated. Regards, Mich LinkedIn *

Re: Any libraries to do Complex Event Processing with spark streaming?

2017-10-03 Thread shyla deshpande
On Tue, Oct 3, 2017 at 10:50 AM, shyla deshpande <deshpandesh...@gmail.com> wrote: > Hi all, > I have a data pipeline using Spark streaming, Kafka and Cassandra. > Are there any libraries to help me with complex event processing using > Spark Streaming? > > I appreciate your help. > > Thanks >

Any libraries to do Complex Event Processing with spark streaming?

2017-10-03 Thread shyla deshpande
Hi all, I have a data pipeline using Spark streaming, Kafka and Cassandra. Are there any libraries to help me with complex event processing using Spark Streaming? I appreciate your help. Thanks

Re: Spark Streaming - Multiple Spark Contexts (SparkSQL) Performance

2017-10-01 Thread Gerard Maas
Oct 1, 2017 at 7:55 PM, Hammad <ham...@flexilogix.com> wrote: > Hello, > > *Background:* > > I have Spark Streaming context; > > SparkConf conf = new > SparkConf().setMaster("local[2]").setAppName("TransformerStreamPOC"); > conf.set("spark.d

Fwd: Spark Streaming - Multiple Spark Contexts (SparkSQL) Performance

2017-10-01 Thread Hammad
Hello, Background: I have a Spark Streaming context; SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("TransformerStreamPOC"); conf.set("spark.driver.allowMultipleContexts", "true"); <== this JavaStreamingContext jssc = new Java
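
The usual alternative to allowMultipleContexts is to reuse one session per JVM, sketched here in Scala (the pattern the Spark docs show for SQL inside foreachRDD):

    stream.foreachRDD { rdd =>   // stream: a DStream, assumed
      val spark = org.apache.spark.sql.SparkSession.builder
        .config(rdd.sparkContext.getConf)
        .getOrCreate()            // same session every batch, no second context
      // run DataFrame/SQL work against this single shared session
    }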

How to write dataframe to kafka topic in spark streaming application using pyspark?

2017-09-25 Thread umargeek
Can anyone provide a code snippet / steps to write a data frame to a Kafka topic in a Spark Streaming application using PySpark with Spark 2.1.1 and Kafka 0.8 (direct stream approach)? Thanks, Umar -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com
