Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Andrew, Sandy, Jerry, Thomas, Marcelo, Whenchen, YangJie, Shixiong,

My apologies for tagging so many of you (across multiple emails). I am in
the process of finding the core contributors to the dynamic resource
allocation (DRA) feature in apache/spark, and you came up as some of the
core contributing members to this feature.

We (cc'd) would like to extend the current DRA to work for the structured
streaming use case [SPARK-24815], based on heuristics of the trigger
interval. Here is the design doc. We also have a draft PR with an initial
implementation.

This feature has been running well for the past year at my company
(Twilio), and many people in the community are interested in it.

To get the PR to a mergeable state, we would love to leverage your
expertise on DRA. Could you please review the design doc and the draft PR,
and let us know your thoughts and any concerns? This will hugely benefit
community members who use structured streaming applications in their data
pipelines.

Looking forward to hearing back from you.

Thank you,

Pavan

On Thu, Mar 28, 2024 at 3:38 PM Pavan Kotikalapudi 
wrote:

> Hi Jungtaek,
>
> Sorry for the late reply.
>
> I understand the concerns about finding PMC members; I had similar
> concerns in the past. Do you think there is something we could improve in
> certain areas of the SPIP so that it gets traction from PMC members? Or is
> this SPIP perhaps not a priority for the PMC right now?
>
> I agree this change is small enough that it might not need to be tagged as
> an SPIP. I started with the template SPIP questions so that it would be
> easier to understand the limitations of the current system, the new
> solution, how it works, how to use it, its limitations, etc. As you might
> have already noticed in the PR, this change is turned off by default and
> will only work if `spark.dynamicAllocation.streaming.enabled` is true.
>
> Regarding the concerns about expertise in DRA, I will find some core
> contributors to this module/DRA and tag them on this email with details;
> Mich has also highlighted the same in the past. Once we get approval from
> them, we can further discuss and enhance this to make the user experience
> better.
>
> Thank you,
>
> Pavan
>
>
> On Tue, Mar 26, 2024 at 8:12 PM Jungtaek Lim 
> wrote:
>
>> Sounds good.
>>
>> One thing I'd like to clarify before shepherding this SPIP is the process
>> itself. Getting enough traction from PMC members to pass the SPIP vote is
>> another issue. Even a committer's vote is not counted. (I don't have a
>> binding vote.) I only see one PMC member (Thomas Graves, not on my team) in
>> the design doc, and we still haven't gotten positive feedback. So there is
>> still a long way to go. We need three supporters from PMC members.
>>
>> Another thing is, I get the proposal at a high level, but I don't have
>> actual expertise in DRA. I could review the code in general, but I feel
>> like I'm not qualified to approve the code. We still need an expert on the
>> CORE area, especially someone who has expertise with DRA. (Could you please
>> annotate the code and enumerate several people who worked on the codebase?)
>> If they need streaming expertise to understand how things will work,
>> then either you or I can explain, but I can't just approve and merge the
>> code.
>>
>> That said, if we succeed in finding one and they review the code and give
>> an LGTM, I'd rather not go through the SPIP process unless the expert
>> reviewing your code requires us to do so. The change you proposed is
>> rather small and does not seem to be invasive (experts can also weigh in),
>> and this feature must never be turned on by default (given the limitation
>> we pointed out). It doesn't look like it requires an SPIP, if we
>> carefully document the new change and also clearly describe the limitation.
>> (Also add a warning in the codebase that this must not be enabled by default.)
>>
>>
>> On Tue, Mar 26, 2024 at 7:02 PM Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
>>> Hi Bhuwan,
>>>
>>> Glad to hear back from you! Very much appreciate your help on reviewing
>>> the design doc/PR and endorsing this proposal.
>>>
>>> Thank you so much @Jungtaek Lim, @Mich Talebzadeh for graciously
>>> agreeing to mentor/shepherd this effort.
>>>
>>> Regarding the Twilio copyright in the NOTICE-binary file:
>>> Twilio's open source counsel was involved throughout the process. I had
>>> placed it in the project file before Twilio signed a CCLA for the Spark
>>> project contribution (Aug '23).
>>>
>>> Since the CCLA is signed now, I have removed the Twilio copyright from
>>> that file. I didn't get a chance to update the PR after github-actions
>>> closed it.
>>>
>>> Please let me know of next steps needed to bring

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Jungtaek,

Sorry for the late reply.

I understand the concerns about finding PMC members; I had similar
concerns in the past. Do you think there is something we could improve in
certain areas of the SPIP so that it gets traction from PMC members? Or is
this SPIP perhaps not a priority for the PMC right now?

I agree this change is small enough that it might not need to be tagged as
an SPIP. I started with the template SPIP questions so that it would be
easier to understand the limitations of the current system, the new
solution, how it works, how to use it, its limitations, etc. As you might
have already noticed in the PR, this change is turned off by default and
will only work if `spark.dynamicAllocation.streaming.enabled` is true.
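
For illustration, here is a minimal sketch of how the opt-in might look when
submitting a job. It is not taken from the PR. The helper name
`streaming_dra_args` and the script name are hypothetical, and the sketch
assumes the streaming flag is layered on top of the existing
`spark.dynamicAllocation.enabled`, `minExecutors` and `maxExecutors` settings.

```python
from typing import List


def streaming_dra_args(min_executors: int,
                       max_executors: int,
                       streaming_enabled: bool = False) -> List[str]:
    """Build spark-submit --conf arguments (hypothetical helper).

    The streaming flag defaults to off, mirroring the off-by-default
    behaviour described in this thread.
    """
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")

    conf = {
        # Existing Spark dynamic allocation settings.
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Flag proposed in the draft PR; assumed to require the settings above.
        "spark.dynamicAllocation.streaming.enabled":
            "true" if streaming_enabled else "false",
    }

    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # my_streaming_app.py is a placeholder application name.
    command = ["spark-submit",
               *streaming_dra_args(2, 10, streaming_enabled=True),
               "my_streaming_app.py"]
    print(" ".join(command))
```

Leaving `streaming_enabled` at its default keeps the proposed flag off, so
existing jobs would be unaffected unless they opt in explicitly.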

Regarding the concerns about expertise in DRA, I will find some core
contributors to this module/DRA and tag them on this email with details;
Mich has also highlighted the same in the past. Once we get approval from
them, we can further discuss and enhance this to make the user experience
better.

Thank you,

Pavan


On Tue, Mar 26, 2024 at 8:12 PM Jungtaek Lim 
wrote:

> Sounds good.
>
> One thing I'd like to clarify before shepherding this SPIP is the process
> itself. Getting enough traction from PMC members to pass the SPIP vote is
> another issue. Even a committer's vote is not counted. (I don't have a
> binding vote.) I only see one PMC member (Thomas Graves, not on my team) in
> the design doc, and we still haven't gotten positive feedback. So there is
> still a long way to go. We need three supporters from PMC members.
>
> Another thing is, I get the proposal at a high level, but I don't have
> actual expertise in DRA. I could review the code in general, but I feel
> like I'm not qualified to approve the code. We still need an expert on the
> CORE area, especially someone who has expertise with DRA. (Could you please
> annotate the code and enumerate several people who worked on the codebase?)
> If they need streaming expertise to understand how things will work,
> then either you or I can explain, but I can't just approve and merge the
> code.
>
> That said, if we succeed in finding one and they review the code and give
> an LGTM, I'd rather not go through the SPIP process unless the expert
> reviewing your code requires us to do so. The change you proposed is rather
> small and does not seem to be invasive (experts can also weigh in), and this
> feature must never be turned on by default (given the limitation we
> pointed out). It doesn't look like it requires an SPIP, if we
> carefully document the new change and also clearly describe the limitation.
> (Also add a warning in the codebase that this must not be enabled by default.)
>
>
> On Tue, Mar 26, 2024 at 7:02 PM Pavan Kotikalapudi <
> pkotikalap...@twilio.com> wrote:
>
>> Hi Bhuwan,
>>
>> Glad to hear back from you! Very much appreciate your help on reviewing
>> the design doc/PR and endorsing this proposal.
>>
>> Thank you so much @Jungtaek Lim, @Mich Talebzadeh for graciously
>> agreeing to mentor/shepherd this effort.
>>
>> Regarding the Twilio copyright in the NOTICE-binary file:
>> Twilio's open source counsel was involved throughout the process. I had
>> placed it in the project file before Twilio signed a CCLA for the Spark
>> project contribution (Aug '23).
>>
>> Since the CCLA is signed now, I have removed the Twilio copyright from
>> that file. I didn't get a chance to update the PR after github-actions
>> closed it.
>>
>> Please let me know of next steps needed to bring this draft PR/effort to
>> completion.
>>
>> Thank you,
>>
>> Pavan
>>
>>
>> On Tue, Mar 26, 2024 at 12:01 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> I'm happy to, but it looks like I need to check one more thing about the
>>> license, according to the WIP PR.
>>>
>>> @Pavan Kotikalapudi 
>>> I see you've added the Twilio copyright to the NOTICE-binary file,
>>> which makes me wonder whether Twilio has filed a CCLA with the Apache
>>> Software Foundation.
>>>
>>> PMC members can correct me if I'm mistaken, but from my understanding
>>> (and my experience as a PMC member in another ASF project), a code
>>> contribution is considered a code donation, and the copyright belongs to
>>> the ASF. That's why you can't find copyrights of contributors' employers
>>> in the codebase. The copyrights you see in NOTICE-binary are there because
>>> we have binary dependencies whose licenses may require an explicit
>>> copyright mention. They are not about direct code contributions.
>>>
>>> Is Twilio aware of this? Also, if Twilio did not file a CCLA beforehand,
>>> could you please engage with a relevant group in the company (could be a
>>> legal team, or a similar OSS advocate team if there is one) and ensure
>>> that the CCLA is filed? The copyright issue is a legal issue, so

Re: The dedicated repository for Kubernetes Operator for Apache Spark

2024-03-28 Thread Dongjoon Hyun
Thank you, Liang-Chi!

Dongjoon.

On Wed, Mar 27, 2024 at 10:56 PM L. C. Hsieh  wrote:

> Hi all,
>
> For the passed SPIP: An Official Kubernetes Operator for Apache Spark,
> the developers have been working on code cleanup and refactoring for
> open source over the last few months. They are now ready to contribute
> the code to Spark.
>
> As we discussed, I will create a dedicated repository for the
> Kubernetes Operator for Apache Spark. I think the repository name will
> be "spark-kubernetes-operator". I will try to create the repository
> tomorrow.
>
> After that, they will contribute the code as an initial PR for review
> from the Spark community.
>
> Thank you.
>
> Liang-Chi