I'd be very interested.
On 23 Jan. 2018 4:01 pm, "Rohit Karlupia" wrote:
> Hi,
>
> I have been working on making the performance tuning of Spark applications
> a bit easier. We have just released the beta version of the tool on Qubole.
>
>
If the file is already in HDFS, you can use Spark to read it with a specific
input format (depending on the file type) so that it is split across partitions:
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html
On Sat, Apr 22, 2017 at 4:36 AM, Paul Tremblay wrote:
Hi Sachin,
Have a look at the Spark Job Server project; it allows you to share RDDs and
DataFrames between Spark jobs running in the same context. The catch is that
you have to implement your job as a Spark Job Server job.
Hi all,
I have multiple (>100) jobs running concurrently (sharing the same Hive
context) that are each appending new rows to the same DataFrame, which is
registered as a temp table.
Currently I am using unionAll and registering that dataframe again as a
temp table in each job:
Given an existing dataframe