Hello, why would you like to delete the shuffle data yourself in the first
place?
On Thu, Dec 28, 2023, 10:08 yang chen wrote:
>
> Hi, I'm learning Spark and wondering when shuffle data gets deleted. I
> found the ContextCleaner class, which cleans up shuffle data once the
> corresponding shuffle dependency is garbage-collected.
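For context, that is exactly the pattern ContextCleaner implements: it keeps weak references to shuffle dependencies and enqueues a cleanup task when the JVM garbage-collects one. A minimal stdlib sketch of the same pattern (illustrative names, not Spark's actual code):

```python
import gc
import weakref

cleaned = []

class ShuffleDependency:
    """Stand-in for Spark's ShuffleDependency (hypothetical, for illustration)."""
    def __init__(self, shuffle_id):
        self.shuffle_id = shuffle_id

def register_for_cleanup(dep):
    # Like ContextCleaner's registration: attach a callback that fires
    # once the dependency object itself has been garbage-collected.
    weakref.finalize(dep, cleaned.append, dep.shuffle_id)

dep = ShuffleDependency(shuffle_id=0)
register_for_cleanup(dep)
del dep       # drop the last strong reference...
gc.collect()  # ...and a GC pass triggers the cleanup callback
assert cleaned == [0]
```

The point of the design is that cleanup is driven by driver-side GC, not by job completion, which is why you normally never delete shuffle data yourself.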
Hey Jack,
EMR Serverless is a great fit for this. You can get these metrics for each
job once it completes. Beyond that, if you create separate EMR applications
per group and tag them appropriately, you can use Cost Explorer to see how
many resources each group is consuming.
>
> On Mon, 27 Feb 2023 at 09:06, murat migdisoglu wrote:
>
On an auto-scaling cluster using YARN as the resource manager, we observed
that when we decrease the number of worker nodes after moving to larger
instance types, the number of tasks for the same Spark job spikes (the
total CPU/memory capacity of the cluster remains identical).
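One thing worth checking (a sketch under stated assumptions, not a diagnosis of this cluster): a stage's task count equals its partition count, and Spark's defaults derive from configuration and total cores rather than node count, so identical aggregate capacity should give identical defaults unless a config value changed along with the instance types. Illustrative helpers, not the Spark API:

```python
def default_parallelism(num_executors, cores_per_executor):
    # On a cluster, spark.default.parallelism defaults to the total
    # number of cores (with a floor of 2).
    return max(num_executors * cores_per_executor, 2)

def shuffle_stage_tasks(conf=None):
    # DataFrame shuffles use spark.sql.shuffle.partitions (default 200),
    # independent of cluster size.
    conf = conf or {}
    return int(conf.get("spark.sql.shuffle.partitions", 200))

# Same aggregate capacity, different node shapes -> same defaults:
assert default_parallelism(10, 4) == default_parallelism(5, 8) == 40
assert shuffle_stage_tasks() == 200
```

If the task count spikes anyway, comparing the effective configuration of both runs in the Spark UI's Environment tab is a good first step.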
Hi,
I recently created a Spark cluster on AWS EMR using a fleet configuration
with hybrid instance types, so the instance types on the cluster vary
depending on availability.
While running the same Spark applications that were previously running on a
homogeneous cluster (some PySpark apps doing
A potential reason might be that you are getting a ClassNotFoundException
when you run on the cluster (due to a missing jar in your uber jar) and are
silently swallowing exceptions somewhere in your code.
1- Check whether there are any failed tasks.
2- Check whether there are any failed stages or jobs in the Spark UI.
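The "silently eating exceptions" failure mode is easy to reproduce. A minimal sketch (hypothetical code and module name, not the poster's app) of how a broad except hides a missing-dependency error:

```python
def load_record(raw):
    try:
        # Stands in for a class that is absent from the uber jar;
        # "missing_uber_jar_dep" is a hypothetical module name.
        import missing_uber_jar_dep
        return missing_uber_jar_dep.parse(raw)
    except Exception:
        # A broad handler like this turns "dependency not packaged"
        # into silently dropped records: no log, no failed task.
        return None

# Every record "fails" with no error surfaced anywhere:
assert [load_record(r) for r in ("a", "b")] == [None, None]
```

Narrowing the except clause (or at least logging the exception) makes the missing jar show up immediately in the executor logs.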
Hi,
I'm testing our codebase against the Spark 3.0.0 stack and realized that
the elasticsearch-hadoop libraries are built against Scala 2.11 and thus do
not work with Spark 3.0.0 (and probably 2.4.2).
Is anybody else facing this issue? How did you solve it?
The PR on the ES library is open:
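As background on why this breaks: Scala offers no binary compatibility across minor lines, so artifact ids carry the Scala version as a suffix, and Spark 3.0.0 is built for Scala 2.12. A quick illustrative check (the artifact names are examples from Maven Central, not a recommendation):

```python
def scala_line(artifact_id):
    # Scala artifacts carry the binary version as a suffix, e.g. "_2.11".
    base, sep, suffix = artifact_id.rpartition("_")
    return suffix if sep else None

# A 2.11-only build cannot link against a Scala 2.12 Spark:
assert scala_line("elasticsearch-spark-20_2.11") == "2.11"
assert scala_line("spark-sql_2.12") == "2.12"
```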
org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu wrote:
> Hello all,
> we have a Hadoop cluster (using YARN) that uses S3 as the filesystem,
> with S3Guard enabled.
> We are using Hadoop 3.2.1 with Spark 2.4.5.
>
> When I try to save a dataframe in parquet format, I get the following
> exception.
Hello all,
we have a Hadoop cluster (using YARN) that uses S3 as the filesystem, with
S3Guard enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.
When I try to save a dataframe in parquet format, I get the following
exception:
java.lang.ClassNotFoundException:
org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
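For reference, PathOutputCommitProtocol ships in Spark's optional hadoop-cloud module, so the exception usually means the settings below are enabled while that module's jar is missing from the classpath. A sketch of the spark-defaults.conf entries that reference the class, as described in Spark's cloud-integration documentation (assuming the spark-hadoop-cloud jar is deployed):

```
# Assumes the spark-hadoop-cloud module (which provides
# PathOutputCommitProtocol) is on the driver and executor classpath.
spark.sql.sources.commitProtocolClass     org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class  org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```

If these settings are present but the jar is not, either add the jar or remove the settings to fall back to the default commit protocol.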