Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread L. C. Hsieh
+1

Thanks, Hyukjin.

On Sun, Mar 31, 2024 at 10:52 PM Dongjoon Hyun  wrote:
>
> +1
>
> Thank you, Hyukjin.
>
> Dongjoon
>
> On Sun, Mar 31, 2024 at 19:07 Haejoon Lee 
>  wrote:
>>
>> +1
>>
>> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon  wrote:
>>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark 
>>> Connect)
>>>
>>> JIRA
>>> Prototype
>>> SPIP doc
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Takuya UESHIN
+1

On Sun, Mar 31, 2024 at 6:16 PM Hyukjin Kwon  wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
> Connect)
>
> JIRA 
> Prototype 
> SPIP doc
> 
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks.
>


-- 
Takuya UESHIN


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Hyukjin Kwon
Oh, I didn't send the discussion thread out as it's pretty simple and
non-invasive, and the discussion was sort of done as part of the Spark
Connect initial discussion ...

On Mon, Apr 1, 2024 at 1:59 PM Mridul Muralidharan  wrote:

>
> Can you point me to the SPIP’s discussion thread, please?
> I was not able to find it, but I was on vacation, and so might have
> missed this …
>
>
> Regards,
> Mridul
>
> On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee
>  wrote:
>
>> +1
>>
>> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>>> Connect)
>>>
>>> JIRA 
>>> Prototype 
>>> SPIP doc
>>> 
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks.
>>>
>>


Scheduling jobs using FAIR pool

2024-03-31 Thread Varun Shah
Hi Community,

I am currently exploring the best use of "Scheduler Pools" for executing
jobs in parallel, and I would appreciate clarification and suggestions on a
few points.

The implementation consists of executing "Structured Streaming" jobs on
Databricks using AutoLoader. Each stream is executed with trigger =
'AvailableNow', ensuring that the streams don't keep running against the
source (we have ~4000 such streams with no continuous feed from the source,
hence we avoid keeping the streams running indefinitely with other
triggers).

One way to achieve parallelism in the jobs is to use "MultiThreading", with
all threads using the same SparkContext, as quoted from the official docs:
"Inside a given Spark application (SparkContext instance), multiple
parallel jobs can run simultaneously if they were submitted from separate
threads."
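
Roughly, the pattern looks like the sketch below (a simplified
illustration, not our actual code; run_stream and stream_configs are
placeholder names, and spark is the session Databricks provides):

    from concurrent.futures import ThreadPoolExecutor

    def run_stream(source_path, checkpoint_path, target_table):
        # Each thread starts its own AvailableNow stream on the shared
        # SparkContext; the query drains the pending files, then stops.
        (spark.readStream
            .format("cloudFiles")  # Databricks AutoLoader
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", checkpoint_path)
            .load(source_path)
            .writeStream
            .option("checkpointLocation", checkpoint_path)
            .trigger(availableNow=True)
            .toTable(target_table)
            .awaitTermination())

    # stream_configs: list of (source_path, checkpoint_path, target_table)
    with ThreadPoolExecutor(max_workers=16) as executor:
        futures = [executor.submit(run_stream, *cfg) for cfg in stream_configs]
        for f in futures:
            f.result()  # re-raise any stream failure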

There's also the "FAIR Scheduler" which, instead of the default FIFO
scheduling, assigns tasks across jobs in a round-robin fashion, ensuring
that smaller jobs submitted later do not starve because bigger jobs
submitted earlier consume all resources.
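
For reference, enabling it is a cluster-level setting plus an optional
allocation file (paths and pool names below are placeholders):

    spark.scheduler.mode              FAIR
    spark.scheduler.allocation.file   /path/to/fairscheduler.xml

with a pool definition in fairscheduler.xml along these lines:

    <?xml version="1.0"?>
    <allocations>
      <pool name="streams">
        <schedulingMode>FAIR</schedulingMode>
        <weight>1</weight>
        <minShare>0</minShare>
      </pool>
    </allocations>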

Here are my questions:
1. The round-robin distribution of resources only works when there are free
executors (achievable by enabling dynamic allocation). If the jobs (part of
the same pool) require all executors, subsequent jobs will still need to
wait.
2. If we create a dynamic pool for each stream (by setting the Spark
property "spark.scheduler.pool" to a dynamic value via
spark.sparkContext.setLocalProperty("spark.scheduler.pool",
"<dynamic_pool_name>")), how does executor allocation happen? Since all
pools are created dynamically, they share equal weight. Does this work the
same way as submitting all streams to a single FAIR pool? (See the sketch
after this list.)
3. The official docs say that "inside each pool, jobs run in FIFO order" by
default. Is this true for the FAIR scheduler as well? By definition it does
not seem right, but it's confusing: since the docs say "by default", does
that apply only to the FIFO scheduler, or by default to both scheduling
modes?
4. Is there any overhead on the Spark driver when creating/using
dynamically created pools versus pre-defined pools?
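
To make question 2 concrete, the per-stream pool assignment looks roughly
like this (continuing the sketch above; the pool-name scheme is
illustrative):

    def run_stream_in_pool(idx, cfg):
        # setLocalProperty is thread-local, so the stream started by this
        # thread runs its jobs in its own dynamically created pool.
        spark.sparkContext.setLocalProperty("spark.scheduler.pool",
                                            f"stream_pool_{idx}")
        run_stream(*cfg)  # run_stream as sketched earlier

    with ThreadPoolExecutor(max_workers=16) as executor:
        futures = [executor.submit(run_stream_in_pool, i, cfg)
                   for i, cfg in enumerate(stream_configs)]
        for f in futures:
            f.result()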

Apart from these, do you have any suggestions, or ways in which you have
implemented auto-scaling for such loads? We are currently trying to
auto-scale the resources based on demand, but scaling down is an issue (a
known one, for which an SPIP is already in discussion, though it does not
cater to submitting multiple streams in a single cluster).

Thanks for reading! Looking forward to your suggestions.

Regards,
Varun Shah


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Mridul Muralidharan
Can you point me to the SPIP’s discussion thread, please?
I was not able to find it, but I was on vacation, and so might have missed
this …


Regards,
Mridul

On Sun, Mar 31, 2024 at 9:08 PM Haejoon Lee
 wrote:

> +1
>
> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>> Connect)
>>
>> JIRA 
>> Prototype 
>> SPIP doc
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks.
>>
>


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Gengliang Wang
+1

On Sun, Mar 31, 2024 at 8:24 PM Dongjoon Hyun 
wrote:

> +1
>
> Thank you, Hyukjin.
>
> Dongjoon
>
> On Sun, Mar 31, 2024 at 19:07 Haejoon Lee
>  wrote:
>
>> +1
>>
>> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>>> Connect)
>>>
>>> JIRA 
>>> Prototype 
>>> SPIP doc
>>> 
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>> Thanks.
>>>
>>


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Dongjoon Hyun
+1

Thank you, Hyukjin.

Dongjoon

On Sun, Mar 31, 2024 at 19:07 Haejoon Lee
 wrote:

> +1
>
> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>> Connect)
>>
>> JIRA 
>> Prototype 
>> SPIP doc
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks.
>>
>


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Ruifeng Zheng
+1

On Mon, Apr 1, 2024 at 10:06 AM Haejoon Lee
 wrote:

> +1
>
> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>> Connect)
>>
>> JIRA 
>> Prototype 
>> SPIP doc
>> 
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks.
>>
>

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Haejoon Lee
+1

On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon  wrote:

> Hi all,
>
> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
> Connect)
>
> JIRA 
> Prototype 
> SPIP doc
> 
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
> Thanks.
>


[VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Hyukjin Kwon
Hi all,

I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
Connect)

JIRA 
Prototype 
SPIP doc


Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thanks.