Multiple Streaming Apps running on the Spark Cluster

2017-07-10 Thread winter fresh
We have 4 streaming apps running on a 3-node cluster. Executors per app = 2, cores per executor = 2, memory per executor = 12 GB. Cluster capacity: cores per node = 8, memory per node = 61 GB. The problem is that the one heavy-load app just crashes after running for
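As a quick back-of-the-envelope check, the cluster numbers quoted above can be totalled in a few lines of plain Python (the figures come from the post; the variable names are just for the demo):

```python
# Resource totals for 4 apps x 2 executors, 2 cores / 12 GB per executor,
# on 3 nodes with 8 cores / 61 GB each (numbers as quoted in the post).
apps, executors_per_app = 4, 2
cores_per_executor, mem_per_executor_gb = 2, 12
nodes, cores_per_node, mem_per_node_gb = 3, 8, 61

cores_needed = apps * executors_per_app * cores_per_executor    # 16
mem_needed_gb = apps * executors_per_app * mem_per_executor_gb  # 96
cores_available = nodes * cores_per_node                        # 24
mem_available_gb = nodes * mem_per_node_gb                      # 183
```

The raw totals fit (16 of 24 cores, 96 of 183 GB), but this ignores per-executor memory overhead and driver resources, which is often where a heavy app runs out of headroom.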

how to get the summary count of words by filestream please?

2017-07-10 Thread Fei Shao
Hi all, I use textFileStream to count the words in a directory. I use flatMap to split the words, reduceByKey to count, and foreachRDD to see the result. I only see the count for the last RDD; could you tell me how to count all the words across all RDDs, please? Thanks, Fei Shao
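In Spark Streaming each micro-batch is its own RDD, so reduceByKey inside foreachRDD only counts words for that batch; a running total needs state carried across batches (in Spark itself, `updateStateByKey` or `mapWithState` with checkpointing enabled). A plain-Python sketch of that stateful pattern, with illustrative names only (this is not the Spark API):

```python
# Each micro-batch contributes only its own words; the Counter plays the
# role of the state that updateStateByKey carries between batches.
from collections import Counter

def update_state(state, batch_lines):
    """Fold one micro-batch of lines into the running word counts."""
    for line in batch_lines:
        state.update(line.split())
    return state

batches = [["a b a"], ["b c"]]   # two micro-batches, as in a DStream
totals = Counter()
for batch in batches:
    totals = update_state(totals, batch)
# totals now reflects all batches, not just the last one
```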

Re: Spark 2.1.1 Graphx graph loader GC overhead error

2017-07-10 Thread Aritra Mandal
yncxcw wrote > hi, > > It highly depends on the algorithms you are going to apply to your data > sets. Graph applications are usually memory hungry and probably cause > long > GC or even OOM. > > Suggestions include: 1. make some highly reused RDD as > StorageLevel.MEMORY_ONLY > and leave the

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread Cody Koeninger
The warnings regarding configuration on the executor are for the executor kafka consumer, not the driver kafka consumer. In general, the executor kafka consumers should consume only exactly the offsets the driver told them to, and not be involved in committing offsets / part of the same group as

spark-graphframes

2017-07-10 Thread Dennis Grinwald
Hello GraphFrame-community, our company is very interested in using GraphFrames for large enterprise tools. Therefore I would like to ask a few questions regarding the architecture of GraphFrames: 1. In GraphFrames Quick-Start Guide it says that it's built on top of SparkSQL. Is the

Re: Iterate over grouped df to create new rows/df

2017-07-10 Thread ayan guha
Hi, Happy that my solution worked for you. The solution is a SQL trick to identify the boundaries of a session; it has nothing to do with Spark itself. In the first step it calculates the difference between two consecutive rows. Then it assigns a number fg, which is a running number that remains the same
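The trick described above is the classic "gaps-and-islands" pattern. A minimal, self-contained sketch using SQLite window functions (the table name, session threshold, and data are made up for the demo; SQLite >= 3.25 is assumed for LAG and windowed SUM):

```python
# 1) LAG computes the gap to the previous row per user;
# 2) a running SUM of the "new session" flag yields the session number fg.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (uid TEXT, ts INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("u1", 1), ("u1", 2), ("u1", 60), ("u1", 61)])

rows = conn.execute("""
    SELECT uid, ts,
           SUM(new_session) OVER (PARTITION BY uid ORDER BY ts) AS fg
    FROM (
        SELECT uid, ts,
               CASE WHEN ts - LAG(ts) OVER (PARTITION BY uid ORDER BY ts) > 30
                    THEN 1 ELSE 0 END AS new_session
        FROM events
    )
    ORDER BY uid, ts
""").fetchall()
# rows 1-2 share fg=0, rows at ts=60,61 share fg=1: two sessions found
```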

Re: Spark, S3A, and 503 SlowDown / rate limit issues

2017-07-10 Thread Everett Anderson
Hey, Thanks for the responses, guys! On Thu, Jul 6, 2017 at 7:08 AM, Steve Loughran wrote: > > On 5 Jul 2017, at 14:40, Vadim Semenov > wrote: > > Are you sure that you use S3A? > Because EMR says that they do not support S3A > >

Runtime exception with AccumulatorV2 on Spark 2.2/2.1.1

2017-07-10 Thread B Li
Hi community, I'm using a custom AccumulatorV2 with java api: MyAccumulatorV2 accum = new MyAccumulatorV2(); jsc.sc().register(accum, "MyAccumulator"); I got a runtime exception in an executor (in standalone cluster mode): java.lang.UnsupportedOperationException: Accumulator must be
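For context, here is a plain-Python model of the AccumulatorV2 lifecycle that exception relates to: the driver registers the accumulator, each executor works on a zero-value copy produced by copyAndReset, and the driver merges the partial results back in. This mirrors the contract only, not the Spark API; a common cause of the actual exception is a copyAndReset that does not return a zero-value copy, or an accumulator serialized to executors without being registered.

```python
# Illustrative model of the AccumulatorV2 contract (not Spark code).
class SumAccumulator:
    def __init__(self):
        self.value = 0

    def copy_and_reset(self):
        # What Spark calls before shipping the accumulator to executors:
        # the copy must start from the zero value.
        return SumAccumulator()

    def add(self, v):
        self.value += v

    def merge(self, other):
        self.value += other.value

driver_acc = SumAccumulator()            # driver side: sc.register(accum, ...)

# Each executor gets a fresh zero-value copy and accumulates locally.
executor_copies = [driver_acc.copy_and_reset() for _ in range(2)]
for copy, data in zip(executor_copies, [[1, 2], [3]]):
    for v in data:
        copy.add(v)

# Driver merges the partial results from the executors.
for copy in executor_copies:
    driver_acc.merge(copy)
```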

Re: error in running StructuredStreaming-Kafka integration code (Spark 2.x & Kafka 10)

2017-07-10 Thread karan alang
Actually, I have 2 versions of Kafka (0.9 & 0.10). BTW, I was able to resolve the issue. sbt by default considers src/main/scala the default source location, and I'd changed the location to a different one. I changed the build.sbt to point to the required location, and that fixed the issue. Regards,
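For reference, a hypothetical build.sbt fragment showing how a non-default source directory can be configured (sbt 0.13-era syntax; the directory name is made up):

```scala
// sbt compiles src/main/scala by default; if sources live elsewhere,
// point scalaSource at that directory (the path below is illustrative).
scalaSource in Compile := baseDirectory.value / "mysources" / "scala"
```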

Spark streaming application is failing after running for a few hours

2017-07-10 Thread shyla deshpande
My Spark streaming application is failing after running for a few hours. After it fails, when I check the storage tab, I see that MapWithStateRDD is less than 100% cached. Is this the reason why it is failing? What does MapWithStateRDD 90% cached mean? Does this mean I lost 10%, or is the 10% spilled to

Re: Event time aggregation is possible in Spark Streaming ?

2017-07-10 Thread Swapnil Chougule
Thanks, Michael, for the update. Regards, Swapnil On 10 Jul 2017 11:50 p.m., "Michael Armbrust" wrote: > Event-time aggregation is only supported in Structured Streaming. > > On Sat, Jul 8, 2017 at 4:18 AM, Swapnil Chougule > wrote: > >> Hello, >> >>

Re: error in running StructuredStreaming-Kafka integration code (Spark 2.x & Kafka 10)

2017-07-10 Thread David Newberger
Karan, It looks like the Kafka version is incorrect. You mention Kafka 0.10; however, the classpath references Kafka 0.9. Thanks, David On July 10, 2017 at 1:44:06 PM, karan alang (karan.al...@gmail.com) wrote: Hi All, I'm running Spark Streaming - Kafka integration using Spark 2.x & Kafka 10.

error in running StructuredStreaming-Kafka integration code (Spark 2.x & Kafka 10)

2017-07-10 Thread karan alang
Hi All, I'm running the Spark Streaming - Kafka integration using Spark 2.x & Kafka 0.10, and I seem to be running into issues. I compiled the program using sbt, and the compilation went through fine. I was able to import this into Eclipse & run the program from Eclipse. However, when I run the

Databricks Spark XML parsing exception while iterating

2017-07-10 Thread Amol Talap
Hi All, Does anyone know a fix for the below exception? The XML parsing function works fine in a unit test, as you can see in the code below, but fails when used in an RDD. new_xml: org.apache.spark.rdd.RDD[List[(String, String)]] = MapPartitionsRDD[119] at map at :57 17/07/10 08:29:54 ERROR Executor: Exception

Re: Event time aggregation is possible in Spark Streaming ?

2017-07-10 Thread Michael Armbrust
Event-time aggregation is only supported in Structured Streaming. On Sat, Jul 8, 2017 at 4:18 AM, Swapnil Chougule wrote: > Hello, > > I want to know whether event-time aggregation is possible in Spark Streaming. I could > see it's possible in Structured Streaming. As I am working
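A plain-Python illustration of what event-time aggregation means: records are bucketed by the timestamp carried in the data (event time), not by arrival time, so out-of-order events still land in the right window. The window size and data here are illustrative, not Spark code:

```python
# Tumbling 10-second windows keyed on the event timestamp in the record.
WINDOW = 10  # seconds

def window_start(event_ts):
    return event_ts - (event_ts % WINDOW)

# (event_ts, key) pairs arriving out of order, as in a real stream
events = [(12, "a"), (3, "b"), (17, "a"), (8, "b")]

counts = {}
for ts, key in events:
    bucket = (window_start(ts), key)
    counts[bucket] = counts.get(bucket, 0) + 1
# both "a" events fall in window [10, 20), both "b" events in [0, 10)
```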

Re: Union of 2 streaming data frames

2017-07-10 Thread Michael Armbrust
As I said in the voting thread: This vote passes! I'll followup with the release on Monday. On Mon, Jul 10, 2017 at 10:55 AM, Lalwani, Jayesh < jayesh.lalw...@capitalone.com> wrote: > Michael, > > > > I see that 2.2 RC6 has passed a vote on Friday. Does this mean 2.2 is > going to be out

Re: Union of 2 streaming data frames

2017-07-10 Thread Lalwani, Jayesh
Michael, I see that 2.2 RC6 has passed a vote on Friday. Does this mean 2.2 is going to be out soon? Do you have some sort of ETA? From: "Lalwani, Jayesh" Date: Friday, July 7, 2017 at 5:46 PM To: Michael Armbrust Cc:

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread shyla deshpande
WARN Use an existing SparkContext, some configuration may not take effect. I wanted to restart the Spark streaming app, so I stopped the running one and issued a new spark-submit. Why and how will it use an existing SparkContext? => You are using a checkpoint, which restores the SparkContext.
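A plain-Python sketch of the StreamingContext.getOrCreate decision the answer above refers to: if the checkpoint directory exists, the context (including its SparkContext and configuration) is restored from it, which is why a fresh spark-submit can still log "Use an existing SparkContext". The function and path handling here are illustrative, not Spark code:

```python
import os
import tempfile

def get_or_create(checkpoint_dir, create_fn):
    # Restore from the checkpoint if it exists, otherwise build fresh --
    # the same decision StreamingContext.getOrCreate makes.
    if os.path.isdir(checkpoint_dir):
        return f"restored context from {checkpoint_dir}"
    return create_fn()

with tempfile.TemporaryDirectory() as ckpt:
    restored = get_or_create(ckpt, lambda: "fresh context")  # dir exists

fresh = get_or_create(os.path.join(ckpt, "missing"),         # dir gone
                      lambda: "fresh context")
```

New configuration passed on the second submit is ignored on the restored path, which matches the warning's "some configuration may not take effect".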

SparkException: Invalid master URL

2017-07-10 Thread Mina Aslani
Hi, I get the below error when I try to run a job on a swarm node. Can you please let me know what the problem is and how it can be fixed? Best regards, Mina util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Exception
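For reference, these are the common --master URL forms spark-submit accepts; an "Invalid master URL" exception usually means the configured value matches none of these patterns (host names, ports, and the application file below are placeholders):

```shell
spark-submit --master local[2]          app.jar   # run locally with 2 threads
spark-submit --master spark://host:7077 app.jar   # standalone cluster master
spark-submit --master yarn              app.jar   # YARN (reads HADOOP_CONF_DIR)
spark-submit --master mesos://host:5050 app.jar   # Mesos master
```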

RE: Timeline for stable release for Spark Structured Streaming

2017-07-10 Thread Mendelson, Assaf
Any day now. One of the major milestones of Spark 2.2 is making Structured Streaming a stable feature. Spark 2.2 passed RC6 a couple of days ago, so it should be out any day now. Note that people have been using Spark 2.1 in production with Structured Streaming, so you should be able to start

Timeline for stable release for Spark Structured Streaming

2017-07-10 Thread Dhrubajyoti Hati
Hi, I was checking the documentation of the Structured Streaming Programming Guide, and it seems it's still in alpha. Is there any timeline for when this module will be ready for production environments? Regards,

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread ??????????
It seems you are using Kafka 0.10. See my comments below. ---Original--- From: "shyla deshpande" Date: 2017/7/10 08:17:10 To: "user"; Subject: Spark streaming giving me a bunch of WARNINGS, please help me understand them WARN Use an existing

Re: Glue-like Functionality

2017-07-10 Thread Simon Kitching
Sounds similar to Confluent Kafka Schema Registry and Kafka Connect. The Schema Registry and Kafka Connect themselves are open-source, but some of the datasource-specific adapters, and GUIs to manage it all, are not open-source (see Confluent Enterprise Edition). Note that the Schema Registry

Re: UI for spark machine learning.

2017-07-10 Thread Jayant Shekhar
Hello Mahesh, We have built one. You can download from here : https://www.sparkflows.io/download Feel free to ping me for any questions, etc. Best Regards, Jayant On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker < mahesh_sawai...@persistent.com> wrote: > Hi, > > > 1) Is anyone aware of any