Re: Error - Dropping SparkListenerEvent because no remaining room in event queue

2018-10-24 Thread Arun Mahadevan
Maybe you have Spark listeners that are not processing the events fast enough? Do you have Spark event logging enabled? You might have to profile the built-in and your custom listeners to see what's going on. - Arun On Wed, 24 Oct 2018 at 16:08, karan alang wrote: > > Pls note - Spark version
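
If a custom listener is the suspect, one low-tech check is to time its callbacks and log anything slow enough to make the queue fall behind. A minimal sketch only; the TimedListener class and the 100 ms threshold are hypothetical:

  import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}

  // Hypothetical listener that times one of its own callbacks to see
  // whether it is slow enough to back up the LiveListenerBus queue.
  class TimedListener extends SparkListener {
    override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
      val start = System.nanoTime()
      // ... the listener's real work goes here ...
      val elapsedMs = (System.nanoTime() - start) / 1e6
      if (elapsedMs > 100) {
        println(s"Slow onJobEnd: $elapsedMs ms for job ${jobEnd.jobId}")
      }
    }
  }

Register it with sparkContext.addSparkListener(new TimedListener) and watch the driver logs while the job runs.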

Does Spark have a plan to move away from sun.misc.Unsafe?

2018-10-24 Thread kant kodali
Hi All, Does Spark have a plan to move away from sun.misc.Unsafe to VarHandles? I am trying to find a JIRA issue for this. Thanks!

Re: Error - Dropping SparkListenerEvent because no remaining room in event queue

2018-10-24 Thread karan alang
Pls note - Spark version is 2.2.0 On Wed, Oct 24, 2018 at 3:57 PM karan alang wrote: > Hello - > we are running a Spark job, and getting the following error - > > "LiveListenerBus: Dropping SparkListenerEvent because no remaining room in > event queue" > > As per the recommendation in the Spark

Re: [Spark UI] Spark 2.3.1 UI no longer respects spark.ui.retainedJobs

2018-10-24 Thread Marcelo Vanzin
When you say many jobs at once, what ballpark are you talking about? The code in 2.3+ does try to keep data about all running jobs and stages regardless of the limit. If you're running into issues because of that, we may have to look again at whether that's the right thing to do. On Tue, Oct 23,
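
For reference, the retention settings this thread is about can be set when the session is built. A minimal sketch; the values are illustrative only, and as noted above, running jobs and stages are tracked regardless of these limits:

  import org.apache.spark.sql.SparkSession

  // Cap how much *finished* job/stage/task state the UI keeps around.
  val spark = SparkSession.builder()
    .appName("ui-retention-sketch")
    .config("spark.ui.retainedJobs", "200")
    .config("spark.ui.retainedStages", "200")
    .config("spark.ui.retainedTasks", "10000")
    .getOrCreate()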

Error - Dropping SparkListenerEvent because no remaining room in event queue

2018-10-24 Thread karan alang
Hello - we are running a Spark job, and getting the following error - "LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue" As per the recommendation in the Spark Docs - I've increased the value of the property spark.scheduler.listenerbus.eventqueue.capacity to
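
For context, raising that capacity is a one-line configuration change. A minimal sketch; the value 100000 is illustrative only, and a larger queue costs more driver memory:

  import org.apache.spark.sql.SparkSession

  // Enlarge the listener-bus event queue so bursts of events are less
  // likely to be dropped (the default capacity is 10000).
  val spark = SparkSession.builder()
    .appName("listener-queue-sketch")
    .config("spark.scheduler.listenerbus.eventqueue.capacity", "100000")
    .getOrCreate()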

Re: Spark 2.3.2 : No of active tasks vastly exceeds total no of executor cores

2018-10-24 Thread Shing Hing Man
I increased spark.scheduler.listenerbus.eventqueue.capacity and ran my application (in Yarn client mode) as before. I no longer get "Dropped events", but the driver ran out of memory. The Spark UI gradually became unresponsive. I noticed from the Spark UI that tens of thousands of jobs

Re: Watermarking without aggregation with Structured Streaming

2018-10-24 Thread Sanjay Awatramani
Try if this works... println(query.lastProgress.eventTime.get("watermark")) Regards, Sanjay On 2018/09/30 09:05:40, peay wrote: > Thanks for the pointers. I guess right now the only workaround would be to > apply a "dummy" aggregation (e.g., group by the timestamp itself) only to > have the
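
Building on that, a minimal sketch of watching the watermark continuously via a StreamingQueryListener; it assumes an existing SparkSession named spark and an already started query:

  import org.apache.spark.sql.streaming.StreamingQueryListener
  import org.apache.spark.sql.streaming.StreamingQueryListener._

  // Print the current watermark every time a streaming query on this
  // session reports progress.
  spark.streams.addListener(new StreamingQueryListener {
    override def onQueryStarted(event: QueryStartedEvent): Unit = ()
    override def onQueryProgress(event: QueryProgressEvent): Unit = {
      // eventTime is a java.util.Map[String, String]; "watermark" can be
      // absent until the first watermark has been computed.
      Option(event.progress.eventTime.get("watermark"))
        .foreach(w => println(s"Current watermark: $w"))
    }
    override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  })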

Re: Watermarking without aggregation with Structured Streaming

2018-10-24 Thread sanjay_awat
Try this. peay-2 wrote: > For my purposes, an alternative solution to pushing it out to the source > would be to make the watermark timestamp available through a function so > that it can be used in a regular filter clause. Based on my experiments, > the timestamp is computed and updated even when

CVE-2018-11804: Apache Spark build/mvn runs zinc, and can expose information from build machines

2018-10-24 Thread Sean Owen
Severity: Low Vendor: The Apache Software Foundation Versions Affected: 1.3.x release branch and later, including master Description: Spark's Apache Maven-based build includes a convenience script, 'build/mvn', that downloads and runs a zinc server to speed up compilation. This server will

Re: [Spark for kubernetes] Azure Blob Storage credentials issue

2018-10-24 Thread Matt Cheah
Hi there, Can you check if HADOOP_CONF_DIR is being set on the executors to /opt/spark/conf? One should set an executor environment variable for that. A kubectl describe pod output for the executors would be helpful here. -Matt Cheah From: Oscar Bonilla Date: Friday, October 19,
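
For anyone following along, the environment variable can be pushed to both driver and executor pods through configuration. A minimal sketch; these keys are normally passed as --conf to spark-submit, and /opt/spark/conf is assumed to hold the Hadoop/Azure settings baked into the image:

  import org.apache.spark.sql.SparkSession

  // Point both the driver and the executors at the mounted Hadoop config
  // so the Azure Blob Storage credentials in core-site.xml are picked up.
  val spark = SparkSession.builder()
    .config("spark.kubernetes.driverEnv.HADOOP_CONF_DIR", "/opt/spark/conf")
    .config("spark.executorEnv.HADOOP_CONF_DIR", "/opt/spark/conf")
    .getOrCreate()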

Re: Triggering SQL on AWS S3 via Apache Spark

2018-10-24 Thread Gourav Sengupta
I do not think security and governance have just become important; they always were. Hortonworks and Cloudera have fantastic security implementations, and hence I mentioned updates via Hive. Regards, Gourav On Wed, 24 Oct 2018, 17:32 , wrote: > Thank you Gourav, > > Today I saw the article: >

Re: Triggering SQL on AWS S3 via Apache Spark

2018-10-24 Thread Omer.Ozsakarya
Thank you Gourav, Today I saw the article: https://databricks.com/session/apache-spark-in-cloud-and-hybrid-why-security-and-governance-become-more-important It also seems interesting. I was in a meeting; I will watch it as well. From: Gourav Sengupta Date: 24 October 2018 Wednesday 13:39 To:

Re: Triggering SQL on AWS S3 via Apache Spark

2018-10-24 Thread Gourav Sengupta
Also try to read about SCD (slowly changing dimensions); Hive may be a very good alternative for running updates on data. Regards, Gourav On Wed, 24 Oct 2018, 14:53 , wrote: > Thank you very much > > > > From: Gourav Sengupta > Date: 24 October 2018 Wednesday 11:20 > To: "Ozsakarya, Omer"
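
When the data lives on immutable storage such as S3, an SCD type-1 style update can also be expressed in Spark itself as a rewrite rather than an in-place update. A rough sketch only; the table paths and the customer_id key are placeholders, and this is not the Hive-based approach discussed above:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("scd-type1-sketch").getOrCreate()

  val current = spark.read.parquet("s3a://my-bucket/dim_customer/")
  val updates = spark.read.parquet("s3a://my-bucket/dim_customer_updates/")

  // Keep rows that were not updated, add the new versions, write a new copy.
  val unchanged = current.join(updates, Seq("customer_id"), "left_anti")
  val next = unchanged.unionByName(updates)

  next.write.mode("overwrite").parquet("s3a://my-bucket/dim_customer_v2/")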

Re: Triggering SQL on AWS S3 via Apache Spark

2018-10-24 Thread Omer.Ozsakarya
Thank you very much  From: Gourav Sengupta Date: 24 October 2018 Wednesday 11:20 To: "Ozsakarya, Omer" Cc: Spark Forum Subject: Re: Triggering SQL on AWS S3 via Apache Spark This is interesting: you asked and then answered the questions (almost) as well. Regards, Gourav On Tue, 23 Oct 2018,

How to write DataFrame to single parquet file instead of multiple files under a folder in spark?

2018-10-24 Thread mithril
For better viewing, please see https://stackoverflow.com/questions/52964167/how-to-write-dataframe-to-single-parquet-file-instead-of-multiple-files-under-a - I have a folder with files (see the screenshot in the linked question). I want to apply some transform to each file and save to another
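
The usual answer is to collapse the DataFrame to a single partition before writing. A minimal sketch; the paths are placeholders, and coalesce(1) funnels all data through one task, so it only makes sense for small outputs:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("single-file-sketch").getOrCreate()
  val df = spark.read.option("header", "true").csv("/data/input/one_file.csv")

  // One partition -> one part file inside the output directory.
  df.coalesce(1)
    .write
    .mode("overwrite")
    .parquet("/data/output/transformed")

Note that Spark still writes a directory containing a single part-*.parquet file (plus _SUCCESS); renaming or moving that file afterwards is a separate step.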

Re: Triggering SQL on AWS S3 via Apache Spark

2018-10-24 Thread Gourav Sengupta
This is interesting: you asked and then answered the questions (almost) as well. Regards, Gourav On Tue, 23 Oct 2018, 13:23 , wrote: > Hi guys, > > > > We are using Apache Spark on a local machine. > > > > I need to implement the scenario below. > > > > In the initial load: > >1. CRM
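
As a starting point for the scenario above, a minimal sketch of "triggering SQL" on files sitting in S3; the bucket, path and column names are placeholders, and the s3a filesystem is assumed to be configured:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("sql-on-s3-sketch").getOrCreate()

  // Read the initial CRM extract from S3 and expose it to SQL.
  val crm = spark.read
    .option("header", "true")
    .csv("s3a://my-bucket/crm/initial_load/")
  crm.createOrReplaceTempView("crm")

  spark.sql("SELECT customer_id, COUNT(*) AS n FROM crm GROUP BY customer_id").show()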