Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-08 Thread Mich Talebzadeh
I am currently contemplating this and sharing my thoughts openly. Given our reliance on previously collected statistics (as mentioned earlier), it raises the question: why couldn't we integrate certain machine learning elements into Spark Structured Streaming? While this might slightly deviate

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Mich Talebzadeh
Splendid idea.  Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Holden Karau
The driver itself is probably another topic; perhaps I’ll make a “faster Spark start time” JIRA and a DA JIRA, and we can explore both.

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Mich Talebzadeh
From my own perspective, faster execution time, especially with Spark on tin boxes (Dataproc & EC2) and Spark on k8s, is something that customers often bring up. Poor time to onboard with autoscaling seems to be particularly singled out for heavy ETL jobs that use Spark. I am disappointed to see

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-08 Thread Pavan Kotikalapudi
As far as I know, listeners are the best source of information for the allocation manager. It already uses a SparkListener, and we can use it to extract more information (like processing
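
A minimal sketch of the kind of signal a listener could feed an allocation decision, assuming the processing time of each micro-batch and the query's trigger interval are both observable. The `ScalingPolicy` class, the `suggest_executor_delta` function, the thresholds, and the one-executor-at-a-time rule are all hypothetical; this is not Spark's `ExecutorAllocationManager` logic or the SPARK-24815 proposal.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical thresholds, expressed as a fraction of the trigger interval.
    scale_up_ratio: float = 0.9
    scale_down_ratio: float = 0.5
    min_executors: int = 1
    max_executors: int = 20


def suggest_executor_delta(processing_ms: float, trigger_interval_ms: float,
                           current_executors: int, policy: ScalingPolicy) -> int:
    """Return how many executors to add (positive) or remove (negative).

    Hypothetical rule: if a micro-batch uses more than scale_up_ratio of the
    trigger interval, ask for one more executor; if it finishes within
    scale_down_ratio, release one. The target is clamped to the policy bounds.
    """
    if trigger_interval_ms <= 0:
        raise ValueError("trigger interval must be positive")
    ratio = processing_ms / trigger_interval_ms
    if ratio > policy.scale_up_ratio:
        target = current_executors + 1
    elif ratio < policy.scale_down_ratio:
        target = current_executors - 1
    else:
        target = current_executors
    # Keep the target within [min_executors, max_executors].
    target = max(policy.min_executors, min(policy.max_executors, target))
    return target - current_executors
```

For example, with the default policy, a 9.5 s batch on a 10 s trigger with 4 executors yields +1, while a 2 s batch yields -1. A real design would likely smooth over several batches rather than react to a single one.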

Re: ASF board report draft for August 2023

2023-08-08 Thread Holden Karau
Maybe add a link to the 4.0 JIRA where we are tracking the current plans for 4.0?

Re: ASF board report draft for August 2023

2023-08-08 Thread Dongjoon Hyun
Thank you, Matei. It looks good to me. Dongjoon On Mon, Aug 7, 2023 at 22:54 Matei Zaharia wrote: > It’s time to send our quarterly report to the ASF board on August 9th. Here’s what I wrote as a draft; feel free to suggest changes. > Issues for the

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread kalyan
+1 to enhancements in DEA. Long overdue! There are a few things I have been thinking about along the same lines for some time now (a few overlap with @holden's points): 1. How can we reduce wastage on the RM side? Sometimes the driver asks for some units of resources, but when the RM provisions them, the driver

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Thomas Graves
> > - Advisory user input (e.g. a way to say after X is done I know I need Y, where Y might be a bunch of GPU machines) Are you thinking of something more advanced than Stage Level Scheduling? Or perhaps something configured differently, or prestarting things you know you will need? Tom

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-08 Thread Mich Talebzadeh
Hi Pavan or anyone else, is there any way to access the metrics displayed in the Spark UI, for example the readings for processing time? Can these be accessed? Thanks.
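
One programmatic route to per-batch timings is the progress report of a running query: in PySpark 3.x, `StreamingQuery.lastProgress` returns the most recent progress update as a dict (or None before the first batch), and `recentProgress` returns a list of them. The progress includes a `durationMs` map with entries such as `triggerExecution` and `addBatch`. The sketch below only parses such a dict, so it runs without a Spark session; the helper name `batch_timings` is hypothetical.

```python
from typing import Any, Dict, Optional


def batch_timings(progress: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Extract timing fields from a StreamingQueryProgress-style dict.

    `progress` is assumed to have the shape of PySpark's `query.lastProgress`.
    Returns None when there is no progress yet.
    """
    if not progress:
        return None
    durations = progress.get("durationMs", {})
    return {
        "batchId": progress.get("batchId"),
        # Wall-clock time of the whole trigger, i.e. the micro-batch processing time.
        "processing_ms": durations.get("triggerExecution"),
        # Time spent in the addBatch phase (executing the batch and writing to the sink).
        "add_batch_ms": durations.get("addBatch"),
        "input_rows_per_sec": progress.get("inputRowsPerSecond"),
        "processed_rows_per_sec": progress.get("processedRowsPerSecond"),
    }
```

With a live query, this would be called as `batch_timings(query.lastProgress)`, where `query` is the object returned by `writeStream...start()`. For a push-based alternative, the `StreamingQueryListener` API receives the same progress on every batch.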

Re: [VOTE] Release Apache Spark 3.5.0 (RC1)

2023-08-08 Thread Yuming Wang
-1. I found a NoClassDefFoundError bug: https://issues.apache.org/jira/browse/SPARK-44719. On Mon, Aug 7, 2023 at 11:24 AM yangjie01 wrote: > I submitted a PR last week to try and solve this issue: https://github.com/apache/spark/pull/42236. > From: Sean Owen > Date: August 7, 2023

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Mich Talebzadeh
Thanks for pointing out this feature to me. I will have a look when I get there.

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread 齐赫
Spark 3.5 has added a method, `supportsReliableStorage`, to `ShuffleDriverComponents`, which indicates whether shuffle data is written to a distributed filesystem or persisted in a remote shuffle service. Uniffle is a general-purpose remote shuffle service
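
Why this matters for dynamic allocation: when executors are released, their shuffle output must stay reachable. Below is a simplified paraphrase of the preconditions that, as far as I recall, Spark 3.5's `ExecutorAllocationManager` checks before enabling dynamic allocation; treat it as an assumption and verify against the Spark source. The config keys are real Spark settings (a shuffle plugin is selected with `spark.shuffle.sort.io.plugin.class`), but the function `dynamic_allocation_prereq` is hypothetical, and its `plugin_supports_reliable_storage` argument stands in for what the plugin's `ShuffleDriverComponents.supportsReliableStorage()` would return.

```python
from typing import Dict


def _flag(conf: Dict[str, str], key: str) -> bool:
    # Spark boolean configs are strings such as "true" / "false".
    # Missing keys are treated as false here; real Spark defaults may differ by version.
    return conf.get(key, "false").strip().lower() == "true"


def dynamic_allocation_prereq(conf: Dict[str, str],
                              plugin_supports_reliable_storage: bool = False) -> str:
    """Hypothetical check: return which mechanism preserves shuffle data, else raise."""
    if _flag(conf, "spark.shuffle.service.enabled"):
        return "external shuffle service"
    if plugin_supports_reliable_storage:
        return "reliable shuffle storage plugin"
    if _flag(conf, "spark.dynamicAllocation.shuffleTracking.enabled"):
        return "shuffle tracking"
    if (_flag(conf, "spark.decommission.enabled")
            and _flag(conf, "spark.storage.decommission.shuffleBlocks.enabled")):
        return "shuffle block migration on decommission"
    raise ValueError("dynamic allocation needs a way to preserve shuffle data")
```

Under this reading, a remote shuffle service such as Uniffle that reports reliable storage could let executors scale down without an external shuffle service or shuffle tracking.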

Re: What else could be removed in Spark 4?

2023-08-08 Thread Cheng Pan
What do you think about removing HiveContext and even SQLContext? As an extension of this question, should we re-implement Hive support using the DSv2 API in Spark 4? Developers who want to implement a custom DataSource plugin may want to learn from the Spark built-in one[1],

Re: What else could be removed in Spark 4?

2023-08-08 Thread Cheng Pan
> Are there old Hive/Hadoop version combos we should just stop supporting? Dropping support for Java 8 means dropping support for Hive versions below 2.0 (exclusive)[1]. IsolatedClientLoader aims to allow using different Hive jars to communicate with different versions of HMS. AFAIK, the
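
The version rule above can be illustrated with a small check. Spark picks the metastore client version through `spark.sql.hive.metastore.version`; the helpers `parse_version` and `metastore_version_dropped` below are hypothetical and only encode the "below 2.0 would be dropped" rule stated in this message, flagging configured values that would fall out of support.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted version such as "2.3.9" into (2, 3, 9), ignoring suffixes."""
    parts = []
    for piece in version.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    if not parts:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(parts)


def metastore_version_dropped(version: str, minimum: Tuple[int, ...] = (2, 0)) -> bool:
    """True if `version` is below `minimum` (2.0, per the rule quoted above)."""
    # Tuple comparison: (1, 2, 1) < (2, 0) is True; (2, 0, 1) < (2, 0) is False.
    return parse_version(version) < minimum
```

For instance, a metastore configured as "1.2.1" would be flagged, while "2.3.9" would not.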