Re: Spark Tuning Tool

2018-01-22 Thread Roger Marin
I'd be very interested. On 23 Jan. 2018 4:01 pm, "Rohit Karlupia" wrote: > Hi, > I have been working on making the performance tuning of spark applications > a bit easier. We have just released the beta version of the tool on Qubole.

Re: splitting a huge file

2017-04-21 Thread Roger Marin
If the file is already in HDFS, you can use Spark to read it with a specific input format (depending on the file type), which will split it. http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html On Sat, Apr 22, 2017 at 4:36 AM, Paul Tremblay wrote:
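
[Editor's note] A minimal Scala sketch of reading an HDFS file through an explicit Hadoop InputFormat, as suggested above. The paths, partition count, and object name are hypothetical; it assumes a plain text file read with the new-API TextInputFormat, where Spark produces one partition per input split.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object SplitHugeFile {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("split-huge-file"))

        // Read the HDFS file with an explicit InputFormat; Spark creates one
        // partition per input split, so the large file is split automatically.
        val lines = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
            "hdfs:///data/huge-file.txt")                // hypothetical path
          .map { case (_, text) => text.toString }

        // Optionally repartition before writing the pieces back out.
        lines.repartition(200).saveAsTextFile("hdfs:///data/huge-file-split")

        sc.stop()
      }
    }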

Re: How can we connect RDD from previous job to next job

2016-08-28 Thread Roger Marin
Hi Sachin, Have a look at the Spark Job Server project. It allows you to share RDDs and DataFrames between Spark jobs running in the same context; the catch is that you have to implement your Spark job as a Spark Job Server job.
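
[Editor's note] A minimal Scala sketch of the pattern described, assuming the spark-jobserver NamedRddSupport API (package spark.jobserver); the trait names and signatures may differ between versions, and the job names and RDD key are hypothetical.

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

    // First job: builds an RDD and publishes it under a name in the shared context.
    object ProducerJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any = {
        val rdd = sc.parallelize(1 to 1000)
        namedRdds.update("shared-numbers", rdd)   // hypothetical key
        rdd.count()
      }
    }

    // Second job: runs later in the same context and looks the RDD up by name.
    object ConsumerJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

      override def runJob(sc: SparkContext, config: Config): Any = {
        val shared = namedRdds.get[Int]("shared-numbers").get
        shared.count()
      }
    }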

What is the best approach to perform concurrent updates from different jobs to a in memory dataframe registered as a temp table?

2016-02-29 Thread Roger Marin
Hi all, I have multiple (>100) jobs running concurrently (sharing the same HiveContext), each appending new rows to the same DataFrame registered as a temp table. Currently I am using unionAll and registering that DataFrame again as a temp table in each job: Given an existing dataframe
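
[Editor's note] A minimal Scala sketch of the append pattern the question describes, assuming the Spark 1.x HiveContext API (unionAll and registerTempTable); the helper name and table name are hypothetical, and nothing here adds the synchronization that concurrent jobs would need.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.hive.HiveContext

    // Each job unions its new rows onto the shared temp table and
    // re-registers the result under the same name.
    def appendRows(hiveContext: HiveContext, newRows: DataFrame, tableName: String): DataFrame = {
      val existing = hiveContext.table(tableName)   // current contents of the temp table
      val combined = existing.unionAll(newRows)     // Spark 1.x API; union in 2.x
      combined.registerTempTable(tableName)         // replace the registration in place
      combined
    }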