Re: Tuning Best Practices

2023-11-28 Thread Jack Goodson
Hi Bryant,

The docs below are a good starting point for performance tuning:

https://spark.apache.org/docs/latest/sql-performance-tuning.html

Hope it helps!

On Wed, Nov 29, 2023 at 9:32 AM Bryant Wright wrote:

> Hi, I'm looking for a comprehensive list of Tuning Best Practices for
> spark.
>
> I did a search on the archives for "tuning" and the search returned no
> results.
>
> Thanks for your help.
>


Re: Classpath isolation per SparkSession without Spark Connect

2023-11-28 Thread Pasha Finkelshtein
I actually think it should be entirely possible to use it on the executor
side. It may require a small extension or UDF, but generally I see no issues
here. PF4J is very lightweight, so you'll only have a small overhead for
class loaders.

There's still the small question of how to distribute the plugins/extensions,
but you probably already have storage where you can keep them.
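
To illustrate the class loader point, here is a minimal sketch using only the
JDK. It is not PF4J or Spark API, and it says nothing about how Spark executors
load code. It only shows the underlying idea: one class loader per user jar, so
that the same name resolves differently in each. The class name, the helper
names and the resource name transform.conf are hypothetical, and temporary
directories stand in for real user jars.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class IsolatedLoaders {

    // Hypothetical stand-in for a user jar: a temp directory holding one
    // resource whose name clashes with the same resource in other "jars".
    static Path fakeUserJar(String marker) {
        try {
            Path dir = Files.createTempDirectory("user-jar-");
            Files.write(dir.resolve("transform.conf"),
                        marker.getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One loader per user jar. The parent is the platform class loader, so
    // JDK classes stay visible, but nothing from the application class path
    // or from another user jar is reachable through this loader.
    static URLClassLoader isolatedLoader(Path userJar) {
        try {
            URL[] urls = { userJar.toUri().toURL() };
            return new URLClassLoader(urls, ClassLoader.getPlatformClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the clashing resource through the given loader, or returns null
    // if that loader cannot see it at all.
    static String readMarker(ClassLoader loader) {
        try (InputStream in = loader.getResourceAsStream("transform.conf")) {
            if (in == null) {
                return null;
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (URLClassLoader a = isolatedLoader(fakeUserJar("tenant-a"));
             URLClassLoader b = isolatedLoader(fakeUserJar("tenant-b"))) {
            System.out.println(readMarker(a)); // prints "tenant-a"
            System.out.println(readMarker(b)); // prints "tenant-b"
        }
    }
}
```

Classes are resolved along the same lookup path as the resource here, which is
why a per-plugin loader keeps conflicting versions apart. The cost per plugin
is essentially one extra loader object plus whatever it loads.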




Pasha Finkelshteyn

Developer Advocate for Data Engineering

JetBrains



asm0...@jetbrains.com
https://linktr.ee/asm0dey




On Tue, 28 Nov 2023 at 17:04, Faiz Halde  wrote:

> Hey Pasha,
>
> Is your suggestion directed at the Spark team? I can make use of the plugin
> system on the driver side of Spark, but since Spark is distributed, I
> believe the executor side of Spark needs to adapt to the PF4J framework too.
>
> Thanks
> Faiz
>
> On Tue, Nov 28, 2023, 16:57 Pasha Finkelshtein <
> pavel.finkelsht...@gmail.com> wrote:
>
>> To me it seems like it's the best possible use case for PF4J.
>>
>>
>>
>> Pasha Finkelshteyn
>>
>> Developer Advocate for Data Engineering
>>
>> JetBrains
>>
>>
>>
>> asm0...@jetbrains.com
>> https://linktr.ee/asm0dey
>>
>>
>>
>>
>> On Tue, 28 Nov 2023 at 12:47, Holden Karau wrote:
>>
>>> So I don’t think we make any particular guarantees around class path
>>> isolation there, so even if it does work it’s something you’d need to pay
>>> attention to on upgrades. Class path isolation is tricky to get right.
>>>
>>> On Mon, Nov 27, 2023 at 2:58 PM Faiz Halde  wrote:
>>>
>>>> Hello,
>>>>
>>>> We are using spark 3.5.0 and were wondering if the following is
>>>> achievable using spark-core
>>>>
>>>> Our use case involves spinning up a spark cluster where the driver
>>>> application loads user jars containing spark transformations at runtime. A
>>>> single spark application can load multiple user jars ( same cluster ) that
>>>> can have class path conflicts if care is not taken
>>>>
>>>> AFAIK, to get this right requires the Executor to be designed in a way
>>>> that allows for class path isolation ( UDF, lambda expressions ). Ideally
>>>> per Spark Session is what we want
>>>>
>>>> I know Spark connect has been designed this way but Spark connect is
>>>> not an option for us at the moment. I had some luck using a private method
>>>> inside spark called JobArtifactSet.withActiveJobArtifactState
>>>>
>>>> Is it sufficient for me to run the user code enclosed
>>>> within JobArtifactSet.withActiveJobArtifactState to achieve my requirement?
>>>>
>>>> Thank you
>>>>
>>>>
>>>> Faiz
>>>>
>>>


Tuning Best Practices

2023-11-28 Thread Bryant Wright
Hi, I'm looking for a comprehensive list of Tuning Best Practices for
spark.

I did a search on the archives for "tuning" and the search returned no
results.

Thanks for your help.


Re: Classpath isolation per SparkSession without Spark Connect

2023-11-28 Thread Faiz Halde
Hey Pasha,

Is your suggestion directed at the Spark team? I can make use of the plugin
system on the driver side of Spark, but since Spark is distributed, I
believe the executor side of Spark needs to adapt to the PF4J framework too.

Thanks
Faiz

On Tue, Nov 28, 2023, 16:57 Pasha Finkelshtein wrote:

> To me it seems like it's the best possible use case for PF4J.
>
>
>
> Pasha Finkelshteyn
>
> Developer Advocate for Data Engineering
>
> JetBrains
>
>
>
> asm0...@jetbrains.com
> https://linktr.ee/asm0dey
>
>
>
>
> On Tue, 28 Nov 2023 at 12:47, Holden Karau  wrote:
>
>> So I don’t think we make any particular guarantees around class path
>> isolation there, so even if it does work it’s something you’d need to pay
>> attention to on upgrades. Class path isolation is tricky to get right.
>>
>> On Mon, Nov 27, 2023 at 2:58 PM Faiz Halde  wrote:
>>
>>> Hello,
>>>
>>> We are using spark 3.5.0 and were wondering if the following is
>>> achievable using spark-core
>>>
>>> Our use case involves spinning up a spark cluster where the driver
>>> application loads user jars containing spark transformations at runtime. A
>>> single spark application can load multiple user jars ( same cluster ) that
>>> can have class path conflicts if care is not taken
>>>
>>> AFAIK, to get this right requires the Executor to be designed in a way
>>> that allows for class path isolation ( UDF, lambda expressions ). Ideally
>>> per Spark Session is what we want
>>>
>>> I know Spark connect has been designed this way but Spark connect is not
>>> an option for us at the moment. I had some luck using a private method
>>> inside spark called JobArtifactSet.withActiveJobArtifactState
>>>
>>> Is it sufficient for me to run the user code enclosed
>>> within JobArtifactSet.withActiveJobArtifactState to achieve my requirement?
>>>
>>> Thank you
>>>
>>>
>>> Faiz
>>>
>>


Re: Classpath isolation per SparkSession without Spark Connect

2023-11-28 Thread Pasha Finkelshtein
To me it seems like it's the best possible use case for PF4J.



Pasha Finkelshteyn

Developer Advocate for Data Engineering

JetBrains



asm0...@jetbrains.com
https://linktr.ee/asm0dey




On Tue, 28 Nov 2023 at 12:47, Holden Karau  wrote:

> So I don’t think we make any particular guarantees around class path
> isolation there, so even if it does work it’s something you’d need to pay
> attention to on upgrades. Class path isolation is tricky to get right.
>
> On Mon, Nov 27, 2023 at 2:58 PM Faiz Halde  wrote:
>
>> Hello,
>>
>> We are using spark 3.5.0 and were wondering if the following is
>> achievable using spark-core
>>
>> Our use case involves spinning up a spark cluster where the driver
>> application loads user jars containing spark transformations at runtime. A
>> single spark application can load multiple user jars ( same cluster ) that
>> can have class path conflicts if care is not taken
>>
>> AFAIK, to get this right requires the Executor to be designed in a way
>> that allows for class path isolation ( UDF, lambda expressions ). Ideally
>> per Spark Session is what we want
>>
>> I know Spark connect has been designed this way but Spark connect is not
>> an option for us at the moment. I had some luck using a private method
>> inside spark called JobArtifactSet.withActiveJobArtifactState
>>
>> Is it sufficient for me to run the user code enclosed
>> within JobArtifactSet.withActiveJobArtifactState to achieve my requirement?
>>
>> Thank you
>>
>>
>> Faiz
>>
>