ASF board report draft for August 2023

2023-08-07 Thread Matei Zaharia
It’s time to send our quarterly report to the ASF board on August 9th. Here’s what I wrote as a draft; feel free to suggest changes. Issues for the board: - None Project status: - We cut the branch Spark 3.5.0 on July 17th 2023. The community is working on bug

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Pavan Kotikalapudi
Thanks for the review, Mich. Yes, the configuration parameters we end up setting would be based on the trigger interval. > If you are going to have additional indicators why not look at scheduling delay as well Yes. The implementation is based on scheduling delays, not for pending tasks of the

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Mich Talebzadeh
Hi, I glanced over the design doc. You are providing certain configuration parameters plus some settings based on static values. For example: spark.dynamicAllocation.schedulerBacklogTimeout: 54s I cannot see any use of which ought to be at least half of the batch interval to have the correct

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Holden Karau
Oooh fascinating. I’m going on call this week so it will take me a while but I do want to review this :) On Mon, Aug 7, 2023 at 5:30 PM Pavan Kotikalapudi wrote: > Hi Spark Dev, > > I have extended traditional DRA to work for structured streaming > use-case. > > Here is an initial implementation

Re: What else could be removed in Spark 4?

2023-08-07 Thread Wenchen Fan
I think the principle is we should remove things that block us from supporting new things like Java 21, or come with a significant maintenance cost. If there is no benefit to removing deprecated APIs (just to keep the codebase clean?), I'd prefer to leave them there and not bother. On Tue, Aug 8,

Re: What else could be removed in Spark 4?

2023-08-07 Thread Jia Fan
Thanks, Sean, for opening this discussion. 1. I think dropping Scala 2.12 is a good option. 2. Personally, I think we should remove most methods that have been deprecated since 2.x/1.x unless we can't find a good replacement. There is already a 3.x version as a buffer and I don't think it is good practice

Fwd: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Pavan Kotikalapudi
Hi Spark Dev, I have extended traditional DRA to work for the structured streaming use case. Here is an initial implementation draft PR https://github.com/apache/spark/pull/42352 and design doc: https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing Please

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Mich Talebzadeh
On the subject of dynamic allocation, is the following message a cause for concern when running Spark on k8s? INFO ExecutorAllocationManager: Dynamic allocation is enabled without a shuffle service.
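
For context on the log line quoted above: Spark can run dynamic allocation without an external shuffle service when shuffle tracking is turned on, which is a common setup on Kubernetes. The sketch below is not taken from the thread. It only assembles the three relevant settings and renders them as spark-submit flags. The helper name to_submit_flags and the dictionary name are hypothetical, and the values are an assumed example rather than the poster's actual configuration.

```python
# Hypothetical helper (not part of Spark): render settings as spark-submit flags.
def to_submit_flags(conf):
    """Return a flat list of ["--conf", "key=value", ...] in sorted key order."""
    flags = []
    for key in sorted(conf):
        flags += ["--conf", f"{key}={conf[key]}"]
    return flags


# Assumed example values for a cluster with no external shuffle service.
# The keys are standard Spark configuration properties.
K8S_DYNAMIC_ALLOCATION = {
    "spark.dynamicAllocation.enabled": "true",
    # Tracks shuffle files on executors so they are not released too early.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.shuffle.service.enabled": "false",
}

if __name__ == "__main__":
    print(" ".join(to_submit_flags(K8S_DYNAMIC_ALLOCATION)))
```

With settings like these, the message is logged at INFO level as a notice rather than an error.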

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Mich Talebzadeh
Hi, From what I have seen, Spark on a serverless cluster has a hard time getting the driver going in a timely manner. Annotations: autopilot.gke.io/resource-adjustment:

What else could be removed in Spark 4?

2023-08-07 Thread Sean Owen
While we're noodling on the topic, what else might be worth removing in Spark 4? For example, looks like we're finally hitting problems supporting Java 8 through 21 all at once, related to Scala 2.13.x updates. It would be reasonable to require Java 11, or even 17, as a baseline for the

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
Oh great point On Mon, Aug 7, 2023 at 2:23 PM bo yang wrote: > Thanks Holden for bringing this up! > > Maybe another thing to think about is how to make dynamic allocation more > friendly with Kubernetes and disaggregated shuffle storage? > > > > On Mon, Aug 7, 2023 at 1:27 PM Holden Karau

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread bo yang
Thanks Holden for bringing this up! Maybe another thing to think about is how to make dynamic allocation more friendly with Kubernetes and disaggregated shuffle storage? On Mon, Aug 7, 2023 at 1:27 PM Holden Karau wrote: > So I'm wondering if there is interest in revisiting some of how

Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
So I'm wondering if there is interest in revisiting some of how Spark is doing its dynamic allocation for Spark 4+? Some things that I've been thinking about: - Advisory user input (e.g. a way to say after X is done I know I need Y where Y might be a bunch of GPU machines) - Configurable

Re: Welcome two new Apache Spark committers

2023-08-07 Thread Peter Toth
Thank you all! On Mon, Aug 7, 2023, 19:24 L. C. Hsieh wrote: > Congratulations! > > On Mon, Aug 7, 2023 at 9:44 AM huaxin gao wrote: > > > > Congratulations! Peter and Xiduo! > > > > On Mon, Aug 7, 2023 at 9:40 AM Dongjoon Hyun > wrote: > >> > >> Congratulations, Peter and Xiduo. :) > >> > >>

Re: Welcome two new Apache Spark committers

2023-08-07 Thread L. C. Hsieh
Congratulations! On Mon, Aug 7, 2023 at 9:44 AM huaxin gao wrote: > > Congratulations! Peter and Xiduo! > > On Mon, Aug 7, 2023 at 9:40 AM Dongjoon Hyun wrote: >> >> Congratulations, Peter and Xiduo. :) >> >> Dongjoon. >> >> On Sun, Aug 6, 2023 at 10:08 PM XiDuo You wrote: >>> >>> Thank you

Re: Welcome two new Apache Spark committers

2023-08-07 Thread huaxin gao
Congratulations! Peter and Xiduo! On Mon, Aug 7, 2023 at 9:40 AM Dongjoon Hyun wrote: > Congratulations, Peter and Xiduo. :) > > Dongjoon. > > On Sun, Aug 6, 2023 at 10:08 PM XiDuo You wrote: > >> Thank you all! >> >> On Mon, Aug 7, 2023 at 11:31, Jia Fan wrote: >> > >> > Congratulations! >> >

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-07 Thread Dongjoon Hyun
Thank you, Yuming. Dongjoon. On Mon, Aug 7, 2023 at 9:30 AM yangjie01 wrote: > Hi, Dongjoon and Yuming > > > > I submitted a PR a few days ago to try to fix this issue: > https://github.com/apache/spark/pull/42167. The reason for the failure is > that the branch daily test and the master use

Re: Welcome two new Apache Spark committers

2023-08-07 Thread Dongjoon Hyun
Congratulations, Peter and Xiduo. :) Dongjoon. On Sun, Aug 6, 2023 at 10:08 PM XiDuo You wrote: > Thank you all! > > On Mon, Aug 7, 2023 at 11:31, Jia Fan wrote: > > > > Congratulations! > > > > > > Jia Fan > > > > > > On Aug 7, 2023, at 11:28, Ye Xianjin wrote: > > > > Congratulations! > > >

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-07 Thread yangjie01
Hi, Dongjoon and Yuming. I submitted a PR a few days ago to try to fix this issue: https://github.com/apache/spark/pull/42167. The reason for the failure is that the branch daily test and the master use the same yml file. Jie Yang From: Dongjoon Hyun Date: Tuesday, August 8, 2023 00:18 To: Yuming Wang

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-07 Thread Dongjoon Hyun
Hi, Yuming. One of the community GitHub Actions test pipelines is consistently unhealthy due to the Python mypy linter. https://github.com/apache/spark/actions/workflows/build_branch33.yml It seems to be due to a pipeline difference, because the same Python mypy linter already passes in the commit build,

Re: Spark writing API

2023-08-07 Thread Steve Loughran
On Thu, 1 Jun 2023 at 00:58, Andrew Melo wrote: > Hi all > > I've been developing for some time a Spark DSv2 plugin "Laurelin" ( > https://github.com/spark-root/laurelin > ) to read the ROOT (https://root.cern) file format (which is used in high > energy physics). I've recently presented my work

Spark 3.4.1 with Java 11 performance on k8s serverless/autopilot

2023-08-07 Thread Mich Talebzadeh
Hi, I would like to share my experience with Spark 3.4.1 running on k8s Autopilot, which some refer to as serverless. My current experience is on Google GKE Autopilot. So essentially you specify the name and region and CSP