Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-04-11 Thread Jungtaek Lim
I'm still having a hard time reviewing this. I have been handling a bunch of context right now, and the change is non-trivial to review in parallel. I see people were OK with the algorithm at a high level, but from a code perspective it's not easy to understand without knowledge of DRA. It would take

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-04-06 Thread Pavan Kotikalapudi
Hi Jungtaek, Status on SPARK-24815: Thomas Graves is reviewing the draft PR. I need to add documentation about the configs and usage details; I am planning to do that this week. He did mention

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Andrew, Sandy, Jerry, Thomas, Marcelo, Whenchen, YangJie, Shixiong, My apologies. I have tagged so many of you (on multiple emails); I am in the process of finding the core contributors of the Dynamic resource allocation (DRA) feature in apache/spark, I could

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-28 Thread Pavan Kotikalapudi
Hi Jungtaek, Sorry for the late reply. I understand the concerns towards finding PMC members, I had similar concerns in the past. Do you think we have something to improve in the SPIP (certain areas) so that it would get traction from PMC members? Or this SPIP might not be a priority to the PMC

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
Sounds good. One thing I'd like to clarify before shepherding this SPIP is the process itself. Getting enough traction from PMC members is another issue in passing the SPIP vote. Even a vote from a committer is not counted. (I don't have a binding vote.) I only see one PMC member (Thomas Graves, not

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Sounds good. Thanks again for your help on guiding the effort from discussion/review through voting phases in the spark dev community. Thank you, Pavan On Tue, Mar 26, 2024 at 4:20 AM Mich Talebzadeh wrote: > Hi Pavan, > > Thanks for instigating this proposal. Looks like the proposal is

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Mich Talebzadeh
Hi Pavan, Thanks for instigating this proposal. Looks like the proposal is ready and has enough votes to be implemented. Having a shepherd will make it more fruitful. I will leave it to @Jungtaek Lim's capable hands to drive it forward. Will be there to help if needed. Cheers Mich

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Hi Bhuwan, Glad to hear back from you! Very much appreciate your help on reviewing the design doc/PR and endorsing this proposal. Thank you so much @Jungtaek Lim , @Mich Talebzadeh for graciously agreeing to mentor/shepherd this effort. Regarding Twilio copyright in Notice binary file:

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
I'm happy to, but it looks like I need to check one more thing about the license, according to the WIP PR. @Pavan Kotikalapudi I see you've added the copyright of Twilio in the NOTICE-binary file, which makes me wonder if Twilio had filed a CCLA to the

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-25 Thread Bhuwan Sahni
Hi Pavan, I looked at the PR, and the changes look simple and contained. It would be useful to add dynamic resource allocation to Spark Structured Streaming. Jungtaek, would you be able to shepherd this change? On Tue, Mar 19, 2024 at 10:38 AM Bhuwan Sahni wrote: > Thanks a lot for creating

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-19 Thread Bhuwan Sahni
Thanks a lot for creating the risk table Pavan. My apologies. I was tied up with high priority items for the last couple weeks and could not respond. I will review the PR by tomorrow's end, and get back to you. Appreciate your patience. Thanks Bhuwan Sahni On Sun, Mar 17, 2024 at 4:42 PM Pavan

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-17 Thread Pavan Kotikalapudi
Hi Bhuwan, I hope the team got a chance to review the draft PR; I am looking for some comments to see if the plan looks alright. I have updated the document about the risks (also mentioned

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Mich Talebzadeh
Hi Bhuwan et al, Thank you for passing on the Databricks Structured Streaming team's review of the SPIP document. FYI, I work closely with Pavan and other members to help deliver this piece of work. We appreciate your insights, especially regarding the cost savings potential from the PoC. Pavan

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Nivedita VY
+1 Nivi

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Pavan Kotikalapudi
Thanks Bhuwan and the rest of the Databricks team for the reviews. I appreciate them; they were very helpful in evaluating a few options that were overlooked earlier (especially about mixed Spark apps running on notebooks). Regarding the use-cases, it could handle multiple streaming queries

RE: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Nivedita VY
+1 Nivi

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Bhuwan Sahni
Hi Pavan, I am from the Databricks Structured Streaming team, and we did a review of the SPIP internally. Wanted to pass on the points discussed in the meeting. Thanks for putting together the SPIP document. It's useful to have dynamic resource allocation for Streaming queries, and it's

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
Hi Pavan and those who kindly voted for this SPIP, Great to have 6+ votes and no -1 and 0. The so-called mass volume is there. The rest is an admin matter and how to drive the project forward, and yes, there is more than one way of skinning the cat. I think we need some flexibility in the rules given

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Pavan Kotikalapudi
Thanks for the pointers Mich, will wait for Jungtaek Lim or any other PMC members to respond. Aggregating upvotes on this email thread: +6 (Mich Talebzadeh, Adam Hobbs, Pavan Kotikalapudi, Krystal Mitchell, Sona Torosyan, Aaron Kern). Thank you, Pavan On Thu, Feb 22, 2024 at 3:07 PM Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
+1 for me Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to the

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Aaron Kern
+1

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
Hi, please check this doc, "Spark Project Improvement Proposals (SPIP) | Apache Spark", and specifically the extract below. Discussing an SPIP: All discussion of an SPIP should take place in a public forum, preferably the discussion attached to

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Pavan Kotikalapudi
Hi Mich, We have five +1s till now: Mich Talebzadeh, Adam Hobbs, Pavan Kotikalapudi, Krystal Mitchell, Sona Torosyan (a few more in the GitHub PR). +0: None. -1: None. Does it pass the required condition as approved? Not sure of that though, nothing about minimum required is mentioned in the past

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Sona Torosyan
+1

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
Hi Pavan, Do you have a list of votes for this feature by any chance? Does it pass the required condition as approved? HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Pavan Kotikalapudi
Yes. The PR was closed due to inactivity by GitHub Actions. The message also says > If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! On Thu, Feb 22, 2024 at 1:09 AM Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
I can see it was closed. Was it because of inactivity? Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-21 Thread Pavan Kotikalapudi
Hi Spark PMC members, I think we have a few upvotes for this effort here and more people are showing interest (see PR comments). Is anyone interested in mentoring and reviewing this effort? Also can the repository admin/owner

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-20 Thread Krystal Mitchell
+1 On 2024/01/17 17:49:32 Pavan Kotikalapudi wrote: > Thanks for proposing and voting for the feature Mich. > > adding some references to the thread. > >- Jira ticket - SPARK-24815 > >- Design Doc > >

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-20 Thread Pavan Kotikalapudi
Here is the link to the voting thread https://lists.apache.org/thread/rlwqrw6ddxdkbvkp78kpd0zgvglgbbp8. Thank you, Pavan On Wed, Jan 17, 2024 at 7:15 PM Pavan Kotikalapudi wrote: > Thanks for the +1, I will propose voting in a new thread now. > > - Pavan > > On Wed, Jan 17, 2024 at 5:28 PM

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-19 Thread Mich Talebzadeh
3:19 AM Adam Hobbs wrote: > +1 > -- > *From:* Pavan Kotikalapudi > *Sent:* Thursday, January 18, 2024 4:19:32 AM > *To:* Spark dev list > *Subject:* Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-19 Thread Pavan Kotikalapudi
e: Vote on Dynamic resource allocation for structured streaming [SPARK-24815] > CAUTION: This email originated from outside of the organisation. Do not click links or open attachments unless you recognise the sender's full email address and know the content is safe.

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Adam Hobbs
+1 From: Pavan Kotikalapudi Sent: Thursday, January 18, 2024 4:19:32 AM To: Spark dev list Subject: Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815] CAUTION: This email originated from outside of the organisation. Do not click

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for proposing and voting for the feature Mich. adding some references to the thread. - Jira ticket - SPARK-24815 - Design Doc

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
+1 for me (non binding) *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for the +1, I will propose voting in a new thread now. - Pavan On Wed, Jan 17, 2024 at 5:28 PM Mich Talebzadeh wrote: > I think we have discussed this enough and I consider it a useful feature. I propose a vote on it. > > +1 for me > > Mich Talebzadeh, > Dad | Technologist |

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
I think we have discussed this enough and I consider it a useful feature. I propose a vote on it. +1 for me Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-16 Thread Adam Hobbs
Hi, This is my first time using the dev mailing list so I hope this is the correct way to do it. I would like to lend my support to this proposal and offer my experiences as a consumer of Spark, and specifically Spark Structured Streaming (SSS). I am more of a cloud infrastructure devops

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Mich Talebzadeh
> or performance deviations, alerting us to potential issues before they impact the system. 5.

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Pavan Kotikalapudi
> *Model Training:* ML models require training and validation using relevant data. Our DS colleagues need to define appropriate features,

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-02 Thread Mich Talebzadeh
> e need to have the necessary expertise in both Spark Structured Streaming and machine learning to design, implement, and maintain the system effectively.

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-01 Thread Pavan Kotikalapudi
> ditional computational requirements, especially during the model training and inference phases. 4. In summary, this idea of utilizing ML for capacity planni

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
> that, I totally agree that we need to evaluate the feasibility, potential benefits, and challenges, and we will need to involve experts in both Spark and machine learning to ensure

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
> Solutions Architect/Engineering Lead > London > United Kingdom > view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-23 Thread Pavan Kotikalapudi
> https://en.everybodywiki.com/Mich_Talebzadeh

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-23 Thread Mich Talebzadeh
> any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-20 Thread Pavan Kotikalapudi
ug 2023 at 14:58, Martin Andersson wrote: > IMO, using any kind of machine learning or AI for DRA is overkill. The effort involved would be considerable and likely counterproductive, compared to a more conventional approach of comparing the rate of incoming

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-14 Thread Mich Talebzadeh
> *From:* Mich Talebzadeh > *Sent:* Tuesday, August 8, 2023 19:59 > *To:* Pavan Kotikalapudi > *Cc:* dev@spark.apache.org > *Subject:* Re: Dynamic resource allocation for structured streaming [SPARK-24815] > EXTERNAL SENDER. Do not click links or open attach

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-14 Thread Martin Andersson
. From: Mich Talebzadeh Sent: Tuesday, August 8, 2023 19:59 To: Pavan Kotikalapudi Cc: dev@spark.apache.org Subject: Re: Dynamic resource allocation for structured streaming [SPARK-24815] EXTERNAL SENDER. Do not click links or open attachments unless you recognize the sender

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-08 Thread Mich Talebzadeh
I am currently contemplating and sharing my thoughts openly. Considering our reliance on previously collected statistics (as mentioned earlier), it raises the question of why we couldn't integrate certain machine learning elements into Spark Structured Streaming? While this might slightly deviate

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-08 Thread Pavan Kotikalapudi
Listeners are the best resource for the allocation manager, AFAIK. It already utilizes a SparkListener. We can use it to extract more information (like processing
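
As a rough illustration of the kind of information a listener could hand to the allocation manager: Structured Streaming reports per-batch progress (for example through `StreamingQuery.lastProgress` or a `StreamingQueryListener`), and that progress includes a `durationMs` map and input/processed row rates. The sketch below works on a plain dict shaped like that progress JSON so it runs without a cluster. The helper names, the sample values, and the idea of comparing `triggerExecution` against the trigger interval are assumptions for illustration, not the logic of the draft PR.

```python
def trigger_utilization(progress: dict, trigger_interval_ms: int) -> float:
    """Fraction of the trigger interval spent executing the last micro-batch.

    `progress` is assumed to be a dict shaped like the streaming query
    progress JSON, with a `durationMs.triggerExecution` entry.
    """
    if trigger_interval_ms <= 0:
        raise ValueError("trigger_interval_ms must be positive")
    duration_ms = progress.get("durationMs", {}).get("triggerExecution")
    if duration_ms is None:
        raise KeyError("progress has no durationMs.triggerExecution entry")
    return duration_ms / trigger_interval_ms


def is_falling_behind(progress: dict) -> bool:
    """True when rows arrive faster than they are processed (hypothetical rule)."""
    incoming = progress.get("inputRowsPerSecond", 0.0)
    processed = progress.get("processedRowsPerSecond", 0.0)
    return incoming > processed


# Hand-written sample values, not measurements from a real query.
sample_progress = {
    "batchId": 12,
    "inputRowsPerSecond": 1500.0,
    "processedRowsPerSecond": 1200.0,
    "durationMs": {"addBatch": 42000, "triggerExecution": 45000},
}

utilization = trigger_utilization(sample_progress, trigger_interval_ms=60000)
print(f"utilization={utilization:.2f}, falling_behind={is_falling_behind(sample_progress)}")
```

With a live query, the same functions could be fed the progress reported after each micro-batch; how such signals should drive executor requests is a design question left to the SPIP.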

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-08 Thread Mich Talebzadeh
Hi Pavan or anyone else, Is there any way one can access the metrics displayed on the Spark GUI? For example, the readings for processing time? Can these be accessed? Thanks For example, Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Pavan Kotikalapudi
Thanks for the review Mich, Yes, the configuration parameters we end up setting would be based on the trigger interval. > If you are going to have additional indicators why not look at scheduling delay as well Yes. The implementation is based on scheduling delays, not on pending tasks of the
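
To make "based on the trigger interval" concrete, here is a minimal sketch that derives dynamic-allocation timeouts from a trigger interval. The configuration keys are existing `spark.dynamicAllocation.*` properties; the helper name and both ratios are hypothetical. The default `backlog_ratio` of 0.9 is chosen only so that a 60-second trigger reproduces the 54s value raised in the review; the thread does not say how that value was derived.

```python
def streaming_dra_conf(trigger_interval_s: float,
                       backlog_ratio: float = 0.9,
                       idle_ratio: float = 2.0) -> dict:
    """Build dynamic-allocation settings scaled to the trigger interval.

    Both ratios are illustrative assumptions, not values from the design doc.
    """
    if trigger_interval_s <= 0:
        raise ValueError("trigger_interval_s must be positive")
    # Spark accepts time strings such as "54s"; keep at least one second.
    backlog_s = max(1, round(trigger_interval_s * backlog_ratio))
    idle_s = max(1, round(trigger_interval_s * idle_ratio))
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.schedulerBacklogTimeout": f"{backlog_s}s",
        "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout": f"{backlog_s}s",
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_s}s",
    }


conf = streaming_dra_conf(trigger_interval_s=60)
for key, value in conf.items():
    # Each pair could be passed as `--conf key=value` to spark-submit.
    print(f"{key}={value}")
```

Deriving the timeouts from one input keeps them consistent when the trigger interval changes, which is the concern with static values.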

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Mich Talebzadeh
Hi, I glanced over the design doc. You are providing certain configuration parameters plus some settings based on static values. For example: spark.dynamicAllocation.schedulerBacklogTimeout: 54s I cannot see any use of which ought to be at least half of the batch interval to have the correct

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Holden Karau
Oooh fascinating. I’m going on call this week so it will take me a while but I do want to review this :) On Mon, Aug 7, 2023 at 5:30 PM Pavan Kotikalapudi wrote: > Hi Spark Dev, > > I have extended traditional DRA to work for the structured streaming use case. > > Here is an initial implementation

Fwd: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Pavan Kotikalapudi
Hi Spark Dev, I have extended traditional DRA to work for the structured streaming use case. Here is an initial implementation draft PR https://github.com/apache/spark/pull/42352 and design doc: https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing Please