Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream

2018-07-05 Thread Tathagata Das
Hey all, In Spark 2.4.0, there will be a new feature called *foreachBatch* which will expose the output rows of every micro-batch as a dataframe, on which you can apply a user-defined function. With that, you can reuse existing batch sources for writing results, as well as write results to multiple
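A minimal sketch of the fan-out pattern this describes, using the Spark 2.4 foreachBatch API mentioned above. The paths, formats, and the rate source are hypothetical placeholders, not from the original post:

```python
# Sketch: fan one streaming micro-batch out to two sinks with foreachBatch.
# Paths and formats below are hypothetical placeholders.

def write_batch(batch_df, epoch_id):
    """Invoked once per micro-batch; batch_df is an ordinary DataFrame."""
    batch_df.persist()  # avoid recomputing the batch for the second write
    batch_df.write.mode("append").parquet("/data/out/parquet")
    batch_df.write.mode("append").json("/data/out/json")
    batch_df.unpersist()

def start_fanout(spark):
    """Wire the callback into a streaming query (needs a live SparkSession)."""
    stream = spark.readStream.format("rate").load()
    return stream.writeStream.foreachBatch(write_batch).start()
```

Because the callback receives a plain DataFrame, any existing batch writer can be reused inside it.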

Fwd: BeakerX 1.0 released

2018-07-05 Thread s...@draves.org
We are pleased to announce the release of BeakerX 1.0 . BeakerX is a collection of kernels and extensions to the Jupyter interactive computing environment. It provides JVM support, Spark cluster support, polyglot programming, interactive plots, tables, forms, publishing, and

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Jayant Shekhar
Hello Chetan, We have currently done it with .pipe(.py) as Prem suggested. That passes the RDD as CSV strings to the Python script. The Python script can either process it line by line and return the result, or create things like a Pandas DataFrame for processing and finally write
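The script side of that .pipe(...) call can be sketched like this. Spark feeds each partition's records to the script's stdin, one per line, and collects its stdout lines as the resulting RDD; the CSV layout and the "keep the first two fields" logic below are hypothetical:

```python
# Sketch of a script usable as rdd.pipe("transform.py"): reads CSV lines
# from stdin, transforms each one, and prints results to stdout.
import sys

def transform(line):
    """Hypothetical per-record logic: keep the first two CSV fields."""
    fields = line.rstrip("\n").split(",")
    return ",".join(fields[:2])

def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        stdout.write(transform(line) + "\n")

if __name__ == "__main__":
    main()
```

On the Scala side, `rdd.pipe("transform.py")` returns an `RDD[String]` of the script's stdout lines.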

Strange behavior of Spark Masters during rolling update

2018-07-05 Thread bsikander
We have a Spark standalone cluster running 2.2.1 in HA mode using Zookeeper. Occasionally, we have a rolling update where first the primary master goes down, then the secondary node, and then the Zookeeper nodes running on their own VMs. In the image below,

Spark 2.3 Kubernetes error

2018-07-05 Thread purna pradeep
Hello, When I try to set the options below on the spark-submit command with a Kubernetes master, I get the error below in the spark-driver pod logs: --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \ --conf

Spark 2.3 Kubernetes error

2018-07-05 Thread Mamillapalli, Purna Pradeep
Hello, When I try to set the options below on the spark-submit command with a Kubernetes master, I get the error below in the spark-driver pod logs: --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \ --conf
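The actual error text is truncated above, so this is only a hedged guess at a common cause of spark-submit parse failures with extraJavaOptions: the space-separated JVM options must be quoted so the shell passes them as a single --conf value. Host names, ports, the image name, and the example jar path below are hypothetical placeholders:

```shell
# Hedged sketch only; quoting the whole key=value pair keeps the
# space-separated -D options together as one --conf argument.
spark-submit \
  --master k8s://https://my-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=myrepo/spark:2.3.0 \
  --conf "spark.executor.extraJavaOptions=-Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttps.protocols=TLSv1.2" \
  --conf "spark.driver.extraJavaOptions=-Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttps.protocols=TLSv1.2" \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```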

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Chetan Khatri
Sure, Prem. Thanks for the suggestion. On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure wrote: > try .pipe(.py) on RDD > > Thanks, > Prem > > On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri > wrote: > >> Can someone please suggest me, thanks >> >> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri, >> wrote: >>

Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream

2018-07-05 Thread Amiya Mishra
Hi Chandan/Jürgen, I had tried this with native code having a single input dataframe with multiple sinks: Spark provides a method called awaitAnyTermination() in StreamingQueryManager.scala, which provides all the required details to handle the queries processed by Spark. By observing
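A sketch of the pattern being described: start one streaming query per sink from the same input DataFrame, then block on the session's shared StreamingQueryManager via awaitAnyTermination(). Sink paths and formats are hypothetical placeholders:

```python
# Sketch: two independent sinks fed by the same input DataFrame. Each query
# keeps its own checkpoint; paths and formats are hypothetical placeholders.

def start_two_sinks(df, parquet_path, json_path):
    q1 = (df.writeStream.format("parquet")
            .option("path", parquet_path)
            .option("checkpointLocation", parquet_path + "/_chk")
            .start())
    q2 = (df.writeStream.format("json")
            .option("path", json_path)
            .option("checkpointLocation", json_path + "/_chk")
            .start())
    return q1, q2

def run(spark, df):
    start_two_sinks(df, "/data/out/a", "/data/out/b")
    # Blocks until any one of the registered queries stops or fails.
    spark.streams.awaitAnyTermination()
```

Note that each query re-reads the source independently; with Spark 2.4's foreachBatch the batch can instead be computed once and written to both sinks.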

Automatic Json Schema inference using Structured Streaming

2018-07-05 Thread SRK
Hi, Is there a way automatic JSON schema inference can be done using Structured Streaming? I do not want to supply a predefined schema and bind it. With Spark Kafka Direct I could do spark.read.json(). I see that this is not supported in Structured Streaming. Thanks!
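Structured Streaming does require an explicit schema for JSON sources, but one common workaround (sketched below with hypothetical paths) is to infer the schema once from a static sample with a batch read and reuse it for the stream:

```python
# Sketch: infer a JSON schema from a static sample, then bind it to a stream.
# Paths are hypothetical placeholders; records with fields absent from the
# sample will not be captured by the inferred schema.

def infer_schema_from_sample(spark, sample_path):
    """One-off batch read; Spark infers the JSON schema from the sample."""
    return spark.read.json(sample_path).schema

def start_json_stream(spark, schema, input_path):
    """Streaming read bound to the schema inferred above."""
    return spark.readStream.schema(schema).json(input_path)
```

The trade-off is schema drift: if the incoming JSON gains new fields, the sample-derived schema must be regenerated.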

Re: [Spark Streaming MEMORY_ONLY] Understanding Dataflow

2018-07-05 Thread Thomas Lavocat
Excerpts from Prem Sure's message of 2018-07-04 19:39:29 +0530: > Hoping the below would help in clearing some of this up. Executors don't have control to share data among themselves, except for sharing accumulators via the driver's support. It's all based on data locality or remote nature; tasks/stages are

structured streaming: how to keep a counter of error records in a long-running streaming application

2018-07-05 Thread chandan prakash
Hi, I am writing a Structured Streaming application where I process records after some validation (let's say, not null). I want to keep a counter of invalid records in the long-running streaming application while other valid records get processed. How can I achieve it? My first thought was using
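One way this could be sketched (foreachBatch is a Spark 2.4 API; the column name, validity rule, and output path below are hypothetical) is to count invalid rows per micro-batch and add them to a driver-side accumulator:

```python
# Sketch: count invalid rows per micro-batch via an accumulator, write the
# valid rows. Column name, rule, and path are hypothetical placeholders.

def make_batch_writer(invalid_counter, out_path):
    """Returns a foreachBatch callback. invalid_counter is a SparkContext
    accumulator, e.g. spark.sparkContext.accumulator(0)."""
    def process(batch_df, epoch_id):
        invalid = batch_df.filter(batch_df["value"].isNull())
        invalid_counter.add(invalid.count())
        valid = batch_df.filter(batch_df["value"].isNotNull())
        valid.write.mode("append").parquet(out_path)
    return process
```

One caveat worth hedging on: accumulator updates from retried tasks can be applied more than once, so the counter should be treated as approximate unless exactly-once accounting is built separately.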

Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream

2018-07-05 Thread chandan prakash
Hi Amiya/Jürgen, Did you get any lead on this? I want to process records after some validation. Correct records should go to sink1 and incorrect records to sink2. How do I achieve this in a single stream? Regards, Chandan On Wed, Jun 13, 2018 at 2:30 PM Amiya Mishra wrote: > Hi Jürgen,
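A hedged sketch of the two-sink split being asked about, assuming a hypothetical not-null validity predicate on a column: filter the same input stream twice and start one query per sink. Paths and the column name are placeholders:

```python
# Sketch: route valid and invalid records to separate sinks by filtering the
# same stream twice. Column name and paths are hypothetical placeholders.

def split_valid_invalid(df, good_path, bad_path):
    valid = df.filter(df["value"].isNotNull())
    invalid = df.filter(df["value"].isNull())
    q_good = (valid.writeStream.format("parquet")
                .option("path", good_path)
                .option("checkpointLocation", good_path + "/_chk")
                .start())
    q_bad = (invalid.writeStream.format("parquet")
                .option("path", bad_path)
                .option("checkpointLocation", bad_path + "/_chk")
                .start())
    return q_good, q_bad
```

Since the two queries read the source independently, the foreachBatch approach from the Spark 2.4 reply in this thread is the usual way to do the split in a single pass.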