Hi Pavan,

Thanks for your answers.

Given these responses, it seems you have already taken a comprehensive
approach to addressing the challenges associated with dynamic scaling in
Spark Structured Streaming. IMO, it would also be beneficial to engage with
other members to gather additional feedback and perspectives, especially
from those with experience in dynamic resource allocation in Spark. Having
said that, the discussion above demonstrates a good understanding of the
challenges involved in enhancing Spark Structured Streaming's resource
management capabilities.

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 5 Jan 2024 at 13:43, Pavan Kotikalapudi <pkotikalap...@twilio.com>
wrote:

> Hi Mich,
>
> As always, thanks for looking keenly at the design; I really appreciate
> your inputs on this ticket. I would love to improve it further and cover
> more edge cases, if any.
>
> I have answered your concerns below. I believe most of them are already
> covered in the proposal, but please point out anything I have missed.
>
>
>    - Implementation Complexity: Integrating dynamic scaling into Spark's
>    resource management framework requires careful design and implementation
>    to ensure effectiveness and stability.
>    I have drafted a PR with an initial implementation:
>    https://github.com/apache/spark/pull/42352. It reuses Spark's stable
>    resource management framework for batch jobs and extends it to our
>    streaming use cases. Since Structured Streaming executes micro-batches at
>    the lowest level, I tuned the scaling actions to operate per micro-batch.
>    I would appreciate it if anybody in the dev community who has worked on
>    the dynamic resource allocation (DRA) implementation could take a look at
>    this as well.
>
>    - Heuristic Accuracy: This proposal's effectiveness depends heavily on
>    the accuracy of the trigger-interval heuristics used to guide scaling
>    decisions.
>    Yes. Although the scaling guidelines of the app are derived from the
>    trigger interval, they only feed values into the request/remove policy of
>    the already existing DRA solution
>    <https://spark.apache.org/docs/latest/job-scheduling.html#resource-allocation-policy>.
>    The current DRA is targeted towards batch use cases; it constantly scales
>    out/back per stage of the job, which makes it unstable for streaming
>    jobs. I have tweaked it to scale by micro-batches. That said, I am still
>    looking for suggestions on other stats that would help scale streaming
>    apps effectively.
>
>    - Overhead: Monitoring and scaling processes themselves introduce some
>    overhead, which needs to be balanced against the potential performance
>    gains. For example, how we might utilise Input Rate, Process Rate and
>    Operation Duration from the Streaming Query Statistics page, etc.
>    All of those events are already published on Spark's listener bus. We
>    make sure we don't add anything more to the framework and only consume
>    that information to scale, so the solution shouldn't compromise
>    performance, and it will definitely yield better resource utilization for
>    uneven traffic patterns over the day.
>    Regarding the utilization of `Streaming Query Statistics`: those stats
>    live in the spark-sql sub-module, so using them would steer towards a new
>    algorithm in that module, separate from the current DRA implementation.
>    Since the current design doesn't require any of those stats I kept it to
>    the core-module stats, but if stats like input rate would help build
>    better scaling accuracy I will definitely look into it (see the listener
>    sketch after this list).
>
>    - We ought to consider the potential impact on latency. Scaling
>    operations, especially scaling up, may introduce some latency. Ensuring
>    minimal impact on the processing time is crucial.
>    Since structured streaming apps tend to be latency-sensitive at times,
>    the scaling algorithm aggressively scales out to add more resources. The
>    scale-out usually happens when the processing rate of a micro-batch
>    increases, so it should help reduce (or keep in check) the processing
>    time and the latency of the events passing through.
>
>    - Implementing mechanisms for graceful scaling operations, avoiding
>    abrupt changes, can contribute to a smoother user experience.
>    Totally! While the scale-out is aggressive, the scale-back typically
>    involves a de-allocation ratio (0-100%) and a time-out period so that
>    only a few resources leave the application's executor pool at a time.
>    This also keeps the processing times fairly stable throughout the day (a
>    rough config sketch follows this list).
>
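>    To make the scale-out / scale-back behaviour above a bit more concrete,
>    here is a rough sketch (Scala) of the kind of knobs involved. The
>    spark.dynamicAllocation.* properties below are the existing batch-DRA
>    ones and the values are only illustrative; the streaming-specific
>    behaviour (deriving the backlog timeouts from the trigger interval, the
>    de-allocation ratio) is what the PR adds, and its exact configuration
>    names are defined there, not here.
>
>    import org.apache.spark.sql.SparkSession
>
>    val spark = SparkSession.builder()
>      .appName("structured-streaming-dra-sketch")
>      // enable the existing dynamic resource allocation machinery
>      .config("spark.dynamicAllocation.enabled", "true")
>      // allow DRA without an external shuffle service (Spark 3.0+)
>      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
>      .config("spark.dynamicAllocation.minExecutors", "2")
>      .config("spark.dynamicAllocation.maxExecutors", "20")
>      // scale-out: how long a backlog of pending tasks must persist before
>      // more executors are requested; for streaming this would be tied to
>      // the trigger interval rather than set statically
>      .config("spark.dynamicAllocation.schedulerBacklogTimeout", "54s")
>      .config("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "54s")
>      // scale-back: idle executors are released only after this timeout,
>      // so only a few executors leave the pool at a time
>      .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
>      .getOrCreate()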
>
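>    On the overhead point above: everything a scaling heuristic needs is
>    already published per micro-batch on the listener bus, so it can be
>    consumed without new instrumentation. Below is a minimal, purely
>    illustrative sketch of reading that progress from the SQL-side
>    StreamingQueryListener (the PR itself stays on the core-module
>    SparkListener stats); the class name and threshold logic are made up for
>    illustration.
>
>    import org.apache.spark.sql.streaming.StreamingQueryListener
>    import org.apache.spark.sql.streaming.StreamingQueryListener._
>
>    // Hypothetical listener: logs a signal that a scaling policy could act on.
>    class ScalingSignalListener(triggerIntervalMs: Long) extends StreamingQueryListener {
>      override def onQueryStarted(event: QueryStartedEvent): Unit = ()
>      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
>
>      override def onQueryProgress(event: QueryProgressEvent): Unit = {
>        val p = event.progress
>        // wall-clock time the last micro-batch took, as reported by Spark
>        val batchMs =
>          Option(p.durationMs.get("triggerExecution")).map(_.longValue).getOrElse(0L)
>        // if micro-batches keep taking longer than the trigger interval,
>        // the query is falling behind and is a candidate for scale-out
>        if (batchMs > triggerIntervalMs) {
>          println(s"batch ${p.batchId} took ${batchMs} ms " +
>            s"(processed ${p.processedRowsPerSecond} rows/s): consider scaling out")
>        }
>      }
>    }
>
>    // spark.streams.addListener(new ScalingSignalListener(triggerIntervalMs = 60000L))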
> Please let me know if there are any more concerns; I will also add these
> at the end of the doc for future reference.
>
> Thank you,
>
> Pavan
>
>
> On Wed, Jan 3, 2024 at 4:58 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi Pavan,
>>
>> Thanks for putting this request forward.
>>
>> I am generally supportive of it. In a nutshell, I believe this proposal
>> potentially holds significant promise for optimizing resource utilization
>> and enhancing performance in Spark Structured Streaming.
>>
>> Having said that, there are potential challenges and considerations from
>> my experience with Spark Structured Streaming (SSS), which I summarise
>> below:
>>
>>    - Implementation Complexity: Integrating dynamic scaling into Spark's
>>    resource management framework requires careful design and implementation
>>    to ensure effectiveness and stability.
>>    - Heuristic Accuracy: This proposal's effectiveness depends heavily on
>>    the accuracy of the trigger-interval heuristics used to guide scaling
>>    decisions.
>>    - Overhead: Monitoring and scaling processes themselves introduce some
>>    overhead, which needs to be balanced against the potential performance
>>    gains. For example, how we might utilise Input Rate, Process Rate and
>>    Operation Duration from the Streaming Query Statistics page, etc.
>>    - We ought to consider the potential impact on latency. Scaling
>>    operations, especially scaling up, may introduce some latency. Ensuring
>>    minimal impact on the processing time is crucial.
>>    - Implementing mechanisms for graceful scaling operations, avoiding
>>    abrupt changes, can contribute to a smoother user experience.
>>
>> I do not know whether some of these points are already considered in your
>> proposal.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>    view my Linkedin profile
>> <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!f57o0p_8gfCLFNDpC01KL-ol2cIFY9ToRmVSpnKl8EzBHNF7tqnvFzcGx94xjl2DzrNQSBnFrtE44gyMDwT9slYR39CQjA$>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Mon, 1 Jan 2024 at 10:34, Pavan Kotikalapudi
>> <pkotikalap...@twilio.com.invalid> wrote:
>>
>>> Hi PMC members,
>>>
>>> Bumping this idea for one last time to see if there are any approvals to
>>> take it forward.
>>>
>>> Here is an initial Implementation draft PR
>>> https://github.com/apache/spark/pull/42352
>>>  and
>>> design doc:
>>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>>
>>>
>>> Thank you,
>>>
>>> Pavan
>>>
>>> On Mon, Nov 13, 2023 at 6:57 AM Pavan Kotikalapudi <
>>> pkotikalap...@twilio.com> wrote:
>>>
>>>>
>>>>
>>>> Here is an initial Implementation draft PR
>>>> https://github.com/apache/spark/pull/42352
>>>>  and
>>>> design doc:
>>>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>>>
>>>>
>>>> On Sun, Nov 12, 2023 at 5:24 PM Pavan Kotikalapudi <
>>>> pkotikalap...@twilio.com> wrote:
>>>>
>>>>> Hi Dev community,
>>>>>
>>>>> Just bumping to see if there are more reviews to evaluate this idea of
>>>>> adding auto-scaling to structured streaming.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Pavan
>>>>>
>>>>> On Wed, Aug 23, 2023 at 2:49 PM Pavan Kotikalapudi <
>>>>> pkotikalap...@twilio.com> wrote:
>>>>>
>>>>>> Thanks for the review Mich.
>>>>>>
>>>>>> I have updated the Q4 with as concise information as possible and
>>>>>> left the detailed explanation to Appendix.
>>>>>>
>>>>>> here is the updated answer to the Q4
>>>>>> <https://urldefense.com/v3/__https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit*heading=h.xe0x4i9gc1dg__;Iw!!NCc8flgU!f57o0p_8gfCLFNDpC01KL-ol2cIFY9ToRmVSpnKl8EzBHNF7tqnvFzcGx94xjl2DzrNQSBnFrtE44gyMDwT9slZp0etSTw$>
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Pavan
>>>>>>
>>>>>> On Wed, Aug 23, 2023 at 2:46 AM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Pavan,
>>>>>>>
>>>>>>> I started reading your SPIP but have difficulty understanding it in
>>>>>>> detail.
>>>>>>>
>>>>>>> Specifically under Q4, " What is new in your approach and why do
>>>>>>> you think it will be successful?", I believe it would be better to 
>>>>>>> remove
>>>>>>> the plots and focus on "what this proposed solution is going to add to 
>>>>>>> the
>>>>>>> current play". At this stage a concise briefing would be appreciated and
>>>>>>> the specific plots should be left to the Appendix.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Distinguished Technologist, Solutions Architect & Engineer
>>>>>>> London
>>>>>>> United Kingdom
>>>>>>>
>>>>>>>
>>>>>>>    view my Linkedin profile
>>>>>>> <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!Z1-Qlb9LL5r97D1tGWz_pKDVDYm-S_n99e_jhraM5XA4B058OHmw47z_FmbEVHeXdgLqEkkvS4W88hGkTBlB4wSpQtgviw$>
>>>>>>>
>>>>>>>
>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>> for any loss, damage or destruction of data or any other property which 
>>>>>>> may
>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>> damages
>>>>>>> arising from such loss, damage or destruction.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, 20 Aug 2023 at 07:40, Pavan Kotikalapudi <
>>>>>>> pkotikalap...@twilio.com> wrote:
>>>>>>>
>>>>>>>> IMO ML might be good for a cluster scheduler, but for the core DRA
>>>>>>>> algorithm of SSS I believe we should start with some primitives of
>>>>>>>> Structured Streaming. I would love to get some reviews on the doc and
>>>>>>>> opinions on the feasibility of the solution.
>>>>>>>>
>>>>>>>> We have seen quite some savings using this solution in our team, and
>>>>>>>> would like to hear from the dev community whether they are looking
>>>>>>>> for/interested in DRA for structured streaming.
>>>>>>>>
>>>>>>>> On Mon, Aug 14, 2023 at 9:12 AM Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thank you for your comments.
>>>>>>>>>
>>>>>>>>> My vision of integrating machine learning (ML) into Spark
>>>>>>>>> Structured Streaming (SSS) for capacity planning and performance
>>>>>>>>> optimization seems promising. By leveraging ML techniques, I believe
>>>>>>>>> we can potentially create predictive models that enhance the
>>>>>>>>> efficiency and resource allocation of data processing pipelines.
>>>>>>>>> Here are some potential benefits and considerations for adding ML to
>>>>>>>>> SSS for capacity planning. However, I stand to be corrected.
>>>>>>>>>
>>>>>>>>>    1.
>>>>>>>>>
>>>>>>>>>    *Predictive Capacity Planning:* ML models can analyze
>>>>>>>>>    historical data (that we discussed already), workloads, and trends 
>>>>>>>>> to
>>>>>>>>>    predict future resource needs accurately. This enables proactive 
>>>>>>>>> scaling
>>>>>>>>>    and allocation of resources, ensuring optimal performance during
>>>>>>>>>    high-demand periods, such as times of high trades.
>>>>>>>>>    2.
>>>>>>>>>
>>>>>>>>>    *Real-time Decision Making: *ML can be used to make real-time
>>>>>>>>>    decisions on resource allocation (software and cluster) based on 
>>>>>>>>> current
>>>>>>>>>    data and conditions, allowing for dynamic adjustments to meet the
>>>>>>>>>    processing demands.
>>>>>>>>>    3.
>>>>>>>>>
>>>>>>>>>    *Complex Data Analysis: *In a heterogeneous setup involving
>>>>>>>>>    multiple databases, ML can analyze various factors like data read 
>>>>>>>>> and write
>>>>>>>>>    times from different databases, data volumes, and data distribution
>>>>>>>>>    patterns to optimize the overall data processing flow.
>>>>>>>>>    4.
>>>>>>>>>
>>>>>>>>>    *Anomaly Detection: *ML models can identify unusual patterns
>>>>>>>>>    or performance deviations, alerting us to potential issues before 
>>>>>>>>> they
>>>>>>>>>    impact the system.
>>>>>>>>>    5.
>>>>>>>>>
>>>>>>>>>    Integration with Monitoring: ML models can work alongside
>>>>>>>>>    monitoring tools, gathering real-time data on various performance 
>>>>>>>>> metrics,
>>>>>>>>>    and using this data for making intelligent decisions on capacity 
>>>>>>>>> and
>>>>>>>>>    resource allocation.
>>>>>>>>>
>>>>>>>>> However, there are some important considerations to keep in mind:
>>>>>>>>>
>>>>>>>>>    1.
>>>>>>>>>
>>>>>>>>>    *Model Training: *ML models require training and validation
>>>>>>>>>    using relevant data. Our DS colleagues need to define appropriate 
>>>>>>>>> features,
>>>>>>>>>    select the right ML algorithms, and fine-tune the model parameters 
>>>>>>>>> to
>>>>>>>>>    achieve optimal performance.
>>>>>>>>>    2.
>>>>>>>>>
>>>>>>>>>    *Complexity:* Integrating ML adds complexity to our
>>>>>>>>>    architecture. Moreover, we need to have the necessary expertise in 
>>>>>>>>> both
>>>>>>>>>    Spark Structured Streaming and machine learning to design, 
>>>>>>>>> implement, and
>>>>>>>>>    maintain the system effectively.
>>>>>>>>>    3.
>>>>>>>>>
>>>>>>>>>    *Resource Overhead: *ML algorithms can be resource-intensive.
>>>>>>>>>    We ought to consider the additional computational requirements, 
>>>>>>>>> especially
>>>>>>>>>    during the model training and inference phases.
>>>>>>>>>    4.
>>>>>>>>>
>>>>>>>>>    In summary, this idea of utilizing ML for capacity planning in
>>>>>>>>>    Spark Structured Streaming can possibly hold significant potential 
>>>>>>>>> for
>>>>>>>>>    improving system performance and resource utilization. Having said 
>>>>>>>>> that, I
>>>>>>>>>    totally agree that we need to evaluate the feasibility, potential 
>>>>>>>>> benefits,
>>>>>>>>>    and challenges and we will need involving experts in both Spark 
>>>>>>>>> and machine
>>>>>>>>>    learning to ensure a successful outcome.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Mich Talebzadeh,
>>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>>> London
>>>>>>>>> United Kingdom
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    view my Linkedin profile
>>>>>>>>> <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!ag4RKtjaus5ggrkrgIaT1uG75X7gM3CjxLhkaIZMA5VGjc7h7N3BHXkBHRaR3T8ludHCpxKNgQ9ugixgI3MGy-bP2VmxTg$>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>>>> for any loss, damage or destruction of data or any other property 
>>>>>>>>> which may
>>>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>>>> damages
>>>>>>>>> arising from such loss, damage or destruction.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 14 Aug 2023 at 14:58, Martin Andersson <
>>>>>>>>> martin.anders...@kambi.com> wrote:
>>>>>>>>>
>>>>>>>>>> IMO, using any kind of machine learning or AI for DRA is
>>>>>>>>>> overkill. The effort involved would be considerable and likely
>>>>>>>>>> counterproductive, compared to a more conventional approach of 
>>>>>>>>>> comparing
>>>>>>>>>> the rate of incoming stream data with the effort of handling 
>>>>>>>>>> previous data
>>>>>>>>>> rates.
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>>>>>> *Sent:* Tuesday, August 8, 2023 19:59
>>>>>>>>>> *To:* Pavan Kotikalapudi <pkotikalap...@twilio.com>
>>>>>>>>>> *Cc:* dev@spark.apache.org <dev@spark.apache.org>
>>>>>>>>>> *Subject:* Re: Dynamic resource allocation for structured
>>>>>>>>>> streaming [SPARK-24815]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> EXTERNAL SENDER. Do not click links or open attachments unless
>>>>>>>>>> you recognize the sender and know the content is safe. DO NOT 
>>>>>>>>>> provide your
>>>>>>>>>> username or password.
>>>>>>>>>>
>>>>>>>>>> I am currently contemplating and sharing my thoughts openly.
>>>>>>>>>> Considering our reliance on previously collected statistics (as 
>>>>>>>>>> mentioned
>>>>>>>>>> earlier), it raises the question of why we couldn't integrate certain
>>>>>>>>>> machine learning elements into Spark Structured Streaming? While 
>>>>>>>>>> this might
>>>>>>>>>> slightly deviate from our current topic, I am not an expert in 
>>>>>>>>>> machine
>>>>>>>>>> learning. However, there are individuals who possess the expertise to
>>>>>>>>>> assist us in exploring this avenue.
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>>>> London
>>>>>>>>>> United Kingdom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    view my Linkedin profile
>>>>>>>>>> <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!ag4RKtjaus5ggrkrgIaT1uG75X7gM3CjxLhkaIZMA5VGjc7h7N3BHXkBHRaR3T8ludHCpxKNgQ9ugixgI3MGy-bP2VmxTg$>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all
>>>>>>>>>> responsibility for any loss, damage or destruction of data or any 
>>>>>>>>>> other
>>>>>>>>>> property which may arise from relying on this email's technical 
>>>>>>>>>> content is
>>>>>>>>>> explicitly disclaimed. The author will in no case be liable for any
>>>>>>>>>> monetary damages arising from such loss, damage or destruction.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 8 Aug 2023 at 18:01, Pavan Kotikalapudi <
>>>>>>>>>> pkotikalap...@twilio.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Listeners are the best resources to the allocation manager
>>>>>>>>>> afaik... It already has SparkListener
>>>>>>>>>> <https://urldefense.com/v3/__https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala*L640__;Iw!!NCc8flgU!ag4RKtjaus5ggrkrgIaT1uG75X7gM3CjxLhkaIZMA5VGjc7h7N3BHXkBHRaR3T8ludHCpxKNgQ9ugixgI3MGy-YRkCAu0w$>
>>>>>>>>>>  that
>>>>>>>>>> it utilizes. We can use it to extract more information (like 
>>>>>>>>>> processing
>>>>>>>>>> times).
>>>>>>>>>> The one with more information regarding streaming query resides
>>>>>>>>>> in sql module
>>>>>>>>>> <https://urldefense.com/v3/__https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala__;!!NCc8flgU!ag4RKtjaus5ggrkrgIaT1uG75X7gM3CjxLhkaIZMA5VGjc7h7N3BHXkBHRaR3T8ludHCpxKNgQ9ugixgI3MGy-Y_DIYqaw$>
>>>>>>>>>> though.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Pavan
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 8, 2023 at 5:43 AM Mich Talebzadeh <
>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Pavan or anyone else
>>>>>>>>>>
>>>>>>>>>> Is there any way one can access the metrics displayed on the Spark
>>>>>>>>>> GUI? For example, the readings for processing time? Can these be
>>>>>>>>>> accessed?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>>>> London
>>>>>>>>>> United Kingdom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    view my Linkedin profile
>>>>>>>>>> <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!d-qX4RylsnHucGkE4OdsO8agaKMFV59tVQnWZL1FbbZLVLWVUWgWmiiKC1Mvyy-796X-uP5XZfjLEbrVfe771d6VrCySTg$>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all
>>>>>>>>>> responsibility for any loss, damage or destruction of data or any 
>>>>>>>>>> other
>>>>>>>>>> property which may arise from relying on this email's technical 
>>>>>>>>>> content is
>>>>>>>>>> explicitly disclaimed. The author will in no case be liable for any
>>>>>>>>>> monetary damages arising from such loss, damage or destruction.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 8 Aug 2023 at 06:44, Pavan Kotikalapudi <
>>>>>>>>>> pkotikalap...@twilio.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for the review Mich,
>>>>>>>>>>
>>>>>>>>>> Yes, the configuration parameters we end up setting would be
>>>>>>>>>> based on the trigger interval.
>>>>>>>>>>
>>>>>>>>>> > If you are going to have additional indicators why not look at
>>>>>>>>>> scheduling delay as well
>>>>>>>>>> Yes. The implementation is based on scheduling delays, not for
>>>>>>>>>> pending tasks of the current stage but rather pending tasks of
>>>>>>>>>> all the stages in a micro-batch
>>>>>>>>>> <https://urldefense.com/v3/__https://github.com/apache/spark/pull/42352/files*diff-fdddb0421641035be18233c212f0e3ccd2d6a49d345bd0cd4eac08fc4d911e21R1025__;Iw!!NCc8flgU!d-qX4RylsnHucGkE4OdsO8agaKMFV59tVQnWZL1FbbZLVLWVUWgWmiiKC1Mvyy-796X-uP5XZfjLEbrVfe771d6feoFH2Q$>
>>>>>>>>>>  (hence
>>>>>>>>>> trigger interval).
>>>>>>>>>>
>>>>>>>>>> > we ought to utilise the historical statistics collected under
>>>>>>>>>> the checkpointing directory to get more accurate statistics
>>>>>>>>>> You are right! This is just a simple implementation based on one
>>>>>>>>>> factor; we should also look into other indicators if they would
>>>>>>>>>> help build a better scaling algorithm.
>>>>>>>>>> help build a better scaling algorithm.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>> Pavan
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 7, 2023 at 9:55 PM Mich Talebzadeh <
>>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I glanced over the design doc.
>>>>>>>>>>
>>>>>>>>>> You are providing certain configuration parameters plus some
>>>>>>>>>> settings based on static values. For example:
>>>>>>>>>>
>>>>>>>>>> spark.dynamicAllocation.schedulerBacklogTimeout: 54s
>>>>>>>>>>
>>>>>>>>>> I cannot see any use of <processing time>, which ought to be at
>>>>>>>>>> least half of the batch interval to have the correct margins
>>>>>>>>>> (confidence level). If you are going to have additional indicators,
>>>>>>>>>> why not look at scheduling delay as well? Moreover, most of the
>>>>>>>>>> needed statistics are also available to set accurate values. My
>>>>>>>>>> inclination is that this is a great effort, but we ought to utilise
>>>>>>>>>> the historical statistics collected under the checkpointing
>>>>>>>>>> directory to get more accurate statistics. I will review the design
>>>>>>>>>> document in due course.
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>> Mich Talebzadeh,
>>>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>>>> London
>>>>>>>>>> United Kingdom
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    view my Linkedin profile
>>>>>>>>>> <https://urldefense.com/v3/__https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/__;!!NCc8flgU!blQ5zGotPbReMPXKaZw50BES4V_1AKqHv6bIxHVlc0QfY9iisFjT-u0be3CR6C6-41dtKLX5Ija0-EmAYfkcxLFr9YSZnw$>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all
>>>>>>>>>> responsibility for any loss, damage or destruction of data or any 
>>>>>>>>>> other
>>>>>>>>>> property which may arise from relying on this email's technical 
>>>>>>>>>> content is
>>>>>>>>>> explicitly disclaimed. The author will in no case be liable for any
>>>>>>>>>> monetary damages arising from such loss, damage or destruction.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 8 Aug 2023 at 01:30, Pavan Kotikalapudi
>>>>>>>>>> <pkotikalap...@twilio.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Spark Dev,
>>>>>>>>>>
>>>>>>>>>> I have extended the traditional DRA to work for the structured
>>>>>>>>>> streaming use-case.
>>>>>>>>>>
>>>>>>>>>> Here is an initial Implementation draft PR
>>>>>>>>>> https://github.com/apache/spark/pull/42352
>>>>>>>>>>  and
>>>>>>>>>> design doc:
>>>>>>>>>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>>>>>>>>>
>>>>>>>>>> Please review and let me know what you think.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>>
>>>>>>>>>> Pavan
>>>>>>>>>>
>>>>>>>>>>
