We are looking for a mechanism to limit the number of retries for executor
recreation. If the system fails to successfully create an executor more than
a specified number of times (e.g., 5 attempts), the entire Spark application
should fail and stop trying to recreate executors.

I have explored Spark's task and stage retry configurations
(`spark.task.maxFailures`, `spark.stage.maxConsecutiveAttempts`), but these
do not directly address the issue of limiting executor creation retries.
Implementing a custom monitoring solution to track executor failures and
manually stop the application is a potential workaround, but it would be
preferable to have a more integrated solution.

*Questions for the Community*

1. Is there an existing configuration or method within Spark or the Spark
Operator to limit executor recreation attempts and fail the job after
reaching a threshold?

I appreciate any guidance, insights, or feedback you can provide on this
matter.

Thank you for your time and support.

Best regards,
Sri P
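
For reference, the "custom monitoring" workaround mentioned above is
sometimes sketched as a SparkListener that counts executor-removal events
and stops the application once a threshold is crossed. The sketch below is
illustrative only: the threshold, and treating executor-removal events as a
proxy for failed creation attempts, are assumptions, not a built-in Spark or
Spark Operator feature.

import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}

// Illustrative sketch of the custom-monitoring workaround described above.
// Assumptions: executor-removal events stand in for failed creation attempts,
// and 5 is an arbitrary threshold.
class ExecutorFailureLimiter(sc: SparkContext, maxFailures: Int = 5)
    extends SparkListener {

  private val failures = new AtomicInteger(0)

  override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
    // event.reason could be inspected here to ignore benign removals
    // (e.g. dynamic allocation) and count only genuine failures.
    if (failures.incrementAndGet() > maxFailures) {
      // Stop the application instead of letting the scheduler keep asking for
      // replacement executors; use a separate thread so the listener bus is
      // not blocked while the context shuts down.
      new Thread(() => sc.stop()).start()
    }
  }
}

// Hypothetical registration on the driver:
// sc.addSparkListener(new ExecutorFailureLimiter(sc, maxFailures = 5))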
I am seeking guidance on configuring multiple Kubernetes secrets, notably
multiple S3 keys, at the SparkConf level for a Spark application. My
objective is to understand the best approach or methods to ensure that
these secrets can be smoothly accessed by the Spark application.

If any of you have previously encountered this scenario or possess relevant
insights on the matter, your guidance would be highly beneficial. Thank you
for your time and consideration. I'm eager to learn from your experiences;
any recommendations or best practices would be pivotal for our project.
Alternatively, if you could point me to resources or community experts who
have tackled similar challenges, it would be of immense assistance.

Thank you for bearing with the intricacies of our query, and I appreciate
your continued guidance in this endeavor.

Warm regards,
Jon Rodríguez Aranguren.

On Sat, 30 Sept 2023 at 23:19, Jayabindu Singh (<jayabi...@gmail.com>) wrote:

> Hi Jon,
>
> Using IAM as suggested by Jorn is the best approach. We recently ...
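
For reference, the IAM-role approach suggested in the reply avoids handling
keys altogether. When static keys must be used, two mechanisms that are often
combined are Spark-on-Kubernetes secretKeyRef injection and S3A per-bucket
credentials. The snippet below is an illustrative spark-shell/driver-side
sketch only; the secret names, environment-variable names, and bucket name
are hypothetical placeholders.

import org.apache.spark.SparkConf

// Illustrative sketch only; "s3-creds-a", "bucket-b" and the env var names
// are hypothetical. secretKeyRef injects Kubernetes secret values into the
// driver/executor pods as environment variables (value format "name:key"),
// which S3A's default credential chain can pick up; per-bucket fs.s3a
// settings let a second bucket use a different key pair.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID", "s3-creds-a:access-key")
  .set("spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY", "s3-creds-a:secret-key")
  .set("spark.kubernetes.executor.secretKeyRef.AWS_ACCESS_KEY_ID", "s3-creds-a:access-key")
  .set("spark.kubernetes.executor.secretKeyRef.AWS_SECRET_ACCESS_KEY", "s3-creds-a:secret-key")
  // Credentials for a second bucket, scoped via S3A per-bucket configuration.
  // In practice these values would also come from a mounted secret or env var
  // rather than being hard-coded in the application.
  .set("spark.hadoop.fs.s3a.bucket.bucket-b.access.key", sys.env.getOrElse("S3_B_ACCESS_KEY", ""))
  .set("spark.hadoop.fs.s3a.bucket.bucket-b.secret.key", sys.env.getOrElse("S3_B_SECRET_KEY", ""))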
If you have any suggestions or insights, please feel free to share them.

Looking forward to hearing from others who might have encountered similar
issues.

Thanks and regards,
Gowtham S

On Tue, 19 Sept 2023 at 17:23, Karthick wrote:

> Subject: Seeking Guidance on Kafka Slow Consumer and Data Skew Problem
>
> Dear Spark Community,
>
> I recently reached out to the Apache Flink community for assistance with a
> critical issue we are facing in our IoT platform, which relies on Apache
> Kafka and real-time data processing. We received some ...
---------- Forwarded message ---------
From: Apache Spark+AI London
Date: Thu, 24 Aug 2023 at 20:01
Subject: Wednesday: Join 6 Members at "Ofir Press | Complementing Scale:
Novel Guidance Methods for Improving LMs"
To:

Apache Spark+AI London invites you to keep connecting
Hi everyone, I want to ask for guidance on my log analyzer platform idea.
I have an Elasticsearch system which collects logs from different
platforms and creates alerts. The system writes the alerts to an index on
ES. Also, my alerts are stored in a folder as JSON (multi-line format) ...
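
On the multi-line JSON part specifically, Spark's JSON reader can load such
alert files when the multiLine option is set. A minimal spark-shell-style
sketch, with a placeholder path and app name:

import org.apache.spark.sql.SparkSession

// Minimal sketch: load multi-line JSON alert files into a DataFrame.
// The path is a placeholder; multiLine is needed because each alert object
// spans several lines instead of being newline-delimited JSON.
val spark = SparkSession.builder().appName("log-analyzer").getOrCreate()
val alerts = spark.read
  .option("multiLine", "true")
  .json("/path/to/alerts/")
alerts.printSchema()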
Thanks, yes. I was using Int for my V and didn't get the second param in
the second closure right :)
On Mon, Apr 13, 2015 at 1:55 PM, Dean Wampler deanwamp...@gmail.com wrote:
That appears to work, with a few changes to get the types correct:
input.distinct().combineByKey((s: String) => 1, (agg: Int, s: String) =>
agg + 1, (agg1: Int, agg2: Int) => agg1 + agg2)
dean
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
How about this?
input.distinct().combineByKey((v: V) => 1, (agg: Int, x: Int) => agg + 1,
(agg1: Int, agg2: Int) => agg1 + agg2).collect()
On Mon, Apr 13, 2015 at 10:31 AM, Dean Wampler deanwamp...@gmail.com
wrote:
The problem with using collect is that it will fail for large data sets, as
you'll attempt to copy the entire RDD to the memory of your driver program.
The following works (Scala syntax, but similar to Python):
scala> val i1 = input.distinct.groupByKey
scala> i1.foreach(println)
**Learning the ropes**
I'm trying to grasp the concept of using the pipeline in pySpark...
Simplified example:
list = [(1,'alpha'),(1,'beta'),(1,'foo'),(1,'alpha'),(2,'alpha'),(2,'alpha'),(2,'bar'),(3,'foo')]
Desired outcome:
[(1,3),(2,2),(3,1)]
Basically for each key, I want the number of unique values.
I've
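
Putting the sample data above together with the combineByKey answer given
earlier in this thread, a self-contained Scala sketch (the local[*] master
and app name are placeholders for illustration):

import org.apache.spark.{SparkConf, SparkContext}

// Self-contained sketch combining the sample data from the question with the
// combineByKey approach from this thread.
object CountDistinctValuesPerKey {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("count-distinct-per-key"))

    val input = sc.parallelize(Seq(
      (1, "alpha"), (1, "beta"), (1, "foo"), (1, "alpha"),
      (2, "alpha"), (2, "alpha"), (2, "bar"), (3, "foo")))

    // distinct() removes duplicate (key, value) pairs; combineByKey then
    // counts how many distinct values remain for each key.
    val counts = input.distinct().combineByKey(
      (s: String) => 1,                       // createCombiner: first value seen for a key
      (agg: Int, s: String) => agg + 1,       // mergeValue: another distinct value for the key
      (agg1: Int, agg2: Int) => agg1 + agg2)  // mergeCombiners: merge partial counts

    counts.collect().sorted.foreach(println)  // expected: (1,3) (2,2) (3,1)
    sc.stop()
  }
}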
I have tests using the specs2 framework and have got them to work in
Scalding. I tried to use the specs2 framework with Spark, but could not find
any simple examples I could follow. I am open to specs2 or FunSuite,
whichever works best with Spark. I would like some additional guidance, or
some simple sample code using specs2 or FunSuite. My code is provided below.

I have the following code in src ...
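
A minimal FunSuite-style test against a local SparkContext might look like
the sketch below. This is not the poster's code (that code is truncated
above); it assumes ScalaTest 3.x, where FunSuite is
org.scalatest.funsuite.AnyFunSuite, and the word-count transformation is just
a placeholder.

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

// Minimal sketch: start a local SparkContext once per suite, run a simple
// RDD transformation, and assert on the collected result.
class WordCountSuite extends AnyFunSuite with BeforeAndAfterAll {

  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("word-count-test"))
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }

  test("word count produces expected totals") {
    val counts = sc.parallelize(Seq("a b", "a"))
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap()

    assert(counts("a") === 2)
    assert(counts("b") === 1)
  }
}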