Re: Custom SparkListener

2018-09-20 Thread Mark Hamstra
What do you mean? Spark Jobs don't have names. On Thu, Sep 20, 2018 at 9:40 PM Priya Ch wrote: > Hello All, > > I am trying to extend SparkListener and post job ends trying to retrieve > job name to check the status of either success/failure and write to log > file. > > I couldn't find a way

Custom SparkListener

2018-09-20 Thread Priya Ch
Hello All, I am trying to extend SparkListener and, after a job ends, retrieve the job name so that I can check whether it succeeded or failed and write the result to a log file. I couldn't find a way to fetch the job name in the onJobEnd method. Thanks, Padma CH

Re: Python Dependencies Issue on EMR

2018-09-20 Thread Jonas Shomorony
Thanks Patrick. Using a conda virtual environment did help with libraries that required the extra C stuff. Jonas On Fri, Sep 14, 2018 at 8:02 AM Patrick McCarthy wrote: > You didn't say how you're zipping the dependencies, but I'm guessing you > either include .egg files or zipped up a

Re: Question about Spark cluster memory usage monitoring

2018-09-20 Thread Muhib Khan
Hello, As far as I know, there is no API provided for tracking the execution memory of a Spark Worker node. For tracking the execution memory you will probably need to access the MemoryManager's onHeapExecutionMemoryPool and offHeapExecutionMemoryPool objects that track the memory allocated to

Question about Spark cluster memory usage monitoring

2018-09-20 Thread Liu, Jialin
Hi there, I am currently using a Spark cluster to run jobs, and I need to collect the history of the actual memory usage (that is, execution memory + storage memory) of a job across the whole cluster. I know we can get the storage memory usage through either Spark UI Executor page or

Re: Run spark tests on Windows/docker

2018-09-20 Thread Shmuel Blitz
Since I got no feedback, I'll try asking differently: Can anyone point me to any resources regarding how to run the project's tests? Where can I find a good Docker image that would serve as a YARN cluster for submitting jobs? Thanks, Shmuel On Sun, Sep 16, 2018 at 10:09 PM Shmuel Blitz wrote:

How to read multiple libsvm files in Spark?

2018-09-20 Thread Md. Rezaul Karim
I'm getting the error 'Exception in thread "main" java.io.IOException: Multiple input paths are not supported for libsvm data' while trying to read multiple libsvm files using Spark 2.3.0: val URLs = spark.read.format("libsvm").load("url_svmlight.tar/url_svmlight/*.svm") Any other
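
One possible workaround for the libsvm error above is to load each file on its own and union the results. The sketch below is written PySpark-style rather than in the Scala used in the question, and it rests on assumptions: every file shares the same feature dimension (passed through the libsvm source's numFeatures option), and the caller has already expanded the glob into concrete paths. The function load_libsvm_files and its parameters are hypothetical names, not part of Spark.

```python
from functools import reduce


def load_libsvm_files(spark, paths, num_features):
    """Load each libsvm file separately and union them into one DataFrame.

    `spark` is expected to be a SparkSession. Nothing is imported from
    pyspark because only its read/format/option/load/union calls are used.
    """
    if not paths:
        raise ValueError("at least one input path is required")
    frames = [
        spark.read.format("libsvm")
        # Assumption: all files share this feature dimension, so the
        # resulting feature vectors are compatible across files.
        .option("numFeatures", str(num_features))
        .load(path)  # one path per call, so the reader sees a single input
        for path in paths
    ]
    # DataFrame.union appends rows by column position.
    return reduce(lambda left, right: left.union(right), frames)
```

With a SparkSession named spark, a call might look like load_libsvm_files(spark, ["part1.svm", "part2.svm"], num_features), where the file names are placeholders and the feature count depends on the data set.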

Re: Time-Series Forecasting

2018-09-20 Thread Gourav Sengupta
Hi, If you are approaching time series forecasting with mathematical rigour and tractability, then I think that using R is the best option. I do think that people tend to claim quite a lot these days that Spark ML and other Python libraries are better, but just pick up a classical textbook

Re: Time-Series Forecasting

2018-09-20 Thread Akash Mishra
We are using Yahoo EGADS for our anomaly detection system on time series data. It has good forecasting and anomaly detection modules. https://github.com/yahoo/egads On Thu, Sep 20, 2018 at 5:22 AM Aakash Basu wrote: > Hey, > > Even though I'm more of a Data Engineer than Data Scientist, but