Re: Structured Streaming: How to add a listener for when a batch is complete

2019-09-03 Thread Tathagata Das
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#reporting-metrics-programmatically-using-asynchronous-apis On Tue, Sep 3, 2019, 3:26 PM Natalie Ruiz wrote: > Hello all, > > > > I’m a beginner, new to Spark and wanted to know if there was an equivalent > to Spark

Structured Streaming: How to add a listener for when a batch is complete

2019-09-03 Thread Natalie Ruiz
Hello all, I'm a beginner, new to Spark and wanted to know if there was an equivalent to Spark Streaming's StreamingListenerBatchCompleted in Structured Streaming? I want to add a listener for when a batch is complete but the documentation and examples I find are for Spark Streaming and not

Re: EMR Spark 2.4.3 executor hang

2019-09-03 Thread Vadim Semenov
Try "spark.shuffle.io.numConnectionsPerPeer=10" On Fri, Aug 30, 2019 at 10:22 AM Daniel Zhang wrote: > Hi, All: > We are testing EMR and comparing it with our on-premise HDP solution. We > use one application as the test: > EMR (5.21.1) with Hadoop 2.8.5 + Spark 2.4.3 vs HDP (2.6.3) with Hadoop

Unit testing PySpark Code and doing assertion

2019-09-03 Thread Rahul Nandi
Hi, I'm trying to do unit testing of my pyspark DataFrame code. My goal is to do an assertion on the schema and data of the DataFrames. I'm looking for options if there are any known libraries that I can use for doing the same. Any library which can work on 10-15 records in the DataFrame is good
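One way to assert on both schema and data for a handful of records is a small helper along the lines below. This is only a sketch, not a library recommended in the thread: the name `assert_frames_equal` is hypothetical, and the helper assumes nothing beyond the `schema` attribute and `collect()` method that PySpark DataFrames provide. Row order is ignored, which is an assumption about what the test should check.

```python
def assert_frames_equal(actual, expected):
    """Assert that two DataFrames share a schema and hold the same rows.

    Hypothetical helper: it relies only on the `schema` attribute and the
    `collect()` method of the objects passed in. Row order is ignored.
    """
    # Compare schemas first so a type or column mismatch fails clearly.
    assert actual.schema == expected.schema, (
        f"schema mismatch: {actual.schema} != {expected.schema}"
    )
    # collect() brings every row to the driver, which is acceptable for
    # the 10-15 records mentioned in the question but not for large data.
    # Sorting by repr gives a stable order even when rows contain None.
    actual_rows = sorted(actual.collect(), key=repr)
    expected_rows = sorted(expected.collect(), key=repr)
    assert actual_rows == expected_rows, (
        f"data mismatch: {actual_rows} != {expected_rows}"
    )
```

In a test, `expected` would typically be a small DataFrame built by hand with the same explicit schema as the one under test, so that the schema comparison is meaningful.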

Re: Control Sqoop job from Spark job

2019-09-03 Thread Shyam P
J Franke, Leaving Sqoop aside, I am just asking about Spark for ETL from Oracle ...? Thanks, Shyam >

Re: Control Sqoop job from Spark job

2019-09-03 Thread Jörn Franke
I would not say that. The only “issue” with Spark is that you need to build some functionality on top of it that is available in Sqoop out of the box, especially for import processes and if you need to define a lot of them. > On 03.09.2019 at 09:30, Shyam P wrote: > > Hi Mich, > Lot of people

Re: Control Sqoop job from Spark job

2019-09-03 Thread Shyam P
Hi Mich, A lot of people say that Spark does not have a proven record in migrating data from Oracle as Sqoop has, at least in production. Please correct me if I am wrong, and suggest how to deal with shuffling when using groupBy. Thanks, Shyam On Sat, Aug 31, 2019 at 12:17 PM Mich