Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-12 Thread bo yang
+1 On Sat, May 11, 2024 at 4:43 PM huaxin gao wrote: > +1 > > On Sat, May 11, 2024 at 4:35 PM L. C. Hsieh wrote: > >> +1 >> >> On Sat, May 11, 2024 at 3:11 PM Chao Sun wrote: >> > >> > +1 >> > >> > On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh wrote: >> >> >> >> Hi all, >> >> >> >> I’d like to

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread bo yang
+1 On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon wrote: > +1 > > On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: > >> +1 >> >> On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: >> > >> > +1 >> > >> > On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun >> wrote: >> >> >> >> I'll start with my

Re: [VOTE] Add new `Versions` in Apache Spark JIRA for Versioning of Spark Operator

2024-04-12 Thread bo yang
+1 On Fri, Apr 12, 2024 at 12:34 PM huaxin gao wrote: > +1 > > On Fri, Apr 12, 2024 at 9:07 AM Dongjoon Hyun wrote: > >> +1 >> >> Thank you! >> >> I hope we can customize `dev/merge_spark_pr.py` script per repository >> after this PR. >> >> Dongjoon. >> >> On 2024/04/12 03:28:36 "L. C. Hsieh"

Re: Versioning of Spark Operator

2024-04-10 Thread bo yang
Cool, looks like we have two options here. Option 1: Spark Operator and Connect Go Client versioning independent of Spark, e.g. starting with 0.1.0. Pros: they can evolve versions independently. Cons: people will need an extra step to decide the version when using Spark Operator and Connect Go

Re: Versioning of Spark Operator

2024-04-09 Thread bo yang
Thanks Liang-Chi for the Spark Operator work, and also the discussion here! For Spark Operator and Connect Go Client, I am guessing they need to support multiple versions of Spark? E.g. the same Spark Operator may support running multiple versions of Spark, and Connect Go Client might support

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread bo yang
+1 (non-binding) On Mon, Apr 1, 2024 at 10:19 AM Felix Cheung wrote: > +1 > -- > *From:* Denny Lee > *Sent:* Monday, April 1, 2024 10:06:14 AM > *To:* Hussein Awala > *Cc:* Chao Sun ; Hyukjin Kwon ; > Mridul Muralidharan ; dev > *Subject:* Re: [VOTE] SPIP: Pure

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread bo yang
+1 On Wed, Mar 13, 2024 at 7:19 AM Tom Graves wrote: > Similar as others, will be interested in working out api's and details > but overall in favor of it. > > +1 > > Tom Graves > On Monday, March 11, 2024 at 11:25:38 AM CDT, Mridul Muralidharan < > mri...@gmail.com> wrote: > > > > I am

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread bo yang
+1 On Tue, Nov 14, 2023 at 7:18 PM huaxin gao wrote: > +1 > > On Tue, Nov 14, 2023 at 10:45 AM Holden Karau > wrote: > >> +1 >> >> On Tue, Nov 14, 2023 at 10:21 AM DB Tsai wrote: >> >>> +1 >>> >>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 >>> >>> On Nov 14, 2023, at 10:14 

Re: Write Spark Connection client application in Go

2023-09-14 Thread bo yang
at’s so cool! Great work y’all :) >> >> On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: >> >>> Hi Spark Friends, >>> >>> Anyone interested in using Golang to write Spark application? We created >>> a Spark Connect Go Client library >>>

Write Spark Connection client application in Go

2023-09-12 Thread bo yang
Hi Spark Friends, Anyone interested in using Golang to write Spark applications? We created a Spark Connect Go Client library. Would love to hear feedback/thoughts from the community. Please see the quick start guide

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread bo yang
Thanks Holden for bringing this up! Maybe another thing to think about is how to make dynamic allocation more friendly with Kubernetes and disaggregated shuffle storage? On Mon, Aug 7, 2023 at 1:27 PM Holden Karau wrote: > So I'm wondering if there is interest in revisiting some of how

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread bo yang
Hi Martin, Thanks a lot for preparing the new repo and making it super easy for me to just copy my code to the new repo! I will create a new PR there. > I think the PR is fine from a code perspective as a starting point. I've prepared the go repository with all the things necessary so that it

Re: [CONNECT] New Clients for Go and Rust

2023-05-31 Thread bo yang
Just saw the discussions here! Really appreciate Martin and other folks helping on my previous Golang Spark Connect PR (https://github.com/apache/spark/pull/41036)! Great to see we have a new repo for the Spark Golang Connect client. Thanks Hyukjin! I am thinking of migrating my PR to this new repo.

Re: How can I get the same spark context in two different python processes

2022-12-12 Thread bo yang
In theory, maybe a Jupyter notebook or something similar could achieve this? E.g. running some Jupyter kernel inside the Spark driver, then another Python process could connect to that kernel. But in the end, this is like Spark Connect :) On Mon, Dec 12, 2022 at 2:55 PM Kevin Su wrote: > Also, is

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Yes, it should be possible, any interest to work on this together? Need more hands to add more features here :) On Tue, May 17, 2022 at 2:06 PM Holden Karau wrote: > Could we make it do the same sort of history server fallback approach? > > On Tue, May 17, 2022 at 10:41 PM bo ya

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
is to behave like that Web Application Proxy. It will simplify settings to access Spark UI on Kubernetes. On Mon, May 16, 2022 at 11:46 PM wilson wrote: > what's the advantage of using reverse proxy for spark UI? > > Thanks > > On Tue, May 17, 2022 at 1:47 PM bo yang wrote: >

Re: Reverse proxy for Spark UI on Kubernetes

2022-05-17 Thread bo yang
Thanks Holden :) On Mon, May 16, 2022 at 11:12 PM Holden Karau wrote: > Oh that’s rad > > On Tue, May 17, 2022 at 7:47 AM bo yang wrote: > >> Hi Spark Folks, >> >> I built a web reverse proxy to access Spark UI on Kubernetes (working >> together with >> >

Reverse proxy for Spark UI on Kubernetes

2022-05-16 Thread bo yang
Hi Spark Folks, I built a web reverse proxy to access Spark UI on Kubernetes (working together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). Want to share here in case other people have a similar need. The reverse proxy code is here:

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
> chart to > deploy Spark and some other stuff on K8S? > > On Wed, Feb 23, 2022 at 17:49, bo yang wrote: > >> Hi Sarath, let's follow up offline on this. >> >> On Wed, Feb 23, 2022 at 8:32 AM Sarath Annareddy < >> sarath.annare...@gmail.com> wrote: >>

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
Hi Sarath, let's follow up offline on this. On Wed, Feb 23, 2022 at 8:32 AM Sarath Annareddy wrote: > Hi bo > > How do we start? > > Is there a plan? Onboarding, Arch/design diagram, tasks lined up etc > > > Thanks > Sarath > > > Sent from my iPhone >

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
Guidance is appreciated. > > Sarath > > Sent from my iPhone > > On Feb 23, 2022, at 2:01 AM, bo yang wrote: > >  > > Right, normally people start with simple script, then add more stuff, like > permission and more components. After some time, people want to run th

Re: One click to run Spark on Kubernetes

2022-02-23 Thread bo yang
liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Wed, 23 Feb 2022 at 04:06, bo yang wrote: > >> Hi Spark Community, >> >> We built an open source tool to deploy and run Spark on Kubernetes with a >> one click

Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
you share link to the source? > > On Wed, Feb 23, 2022 at 6:52, bo yang wrote: > >> We do not have SaaS yet. Now it is an open source project we build in our >> part time, and we welcome more people working together on that. >> >> You could specify cluste

Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
r > about 1 hour. Do you have the SaaS solution for this? I can pay as I did. > > Thanks > > On Wed, Feb 23, 2022 at 12:21 PM bo yang wrote: > >> It is not a standalone spark cluster. In some details, it deploys a Spark >> Operator (https://github.com/GoogleCloudPlatfo

Re: One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
ion of spark? or just the standalone node? > > Thanks > > On Wed, Feb 23, 2022 at 12:06 PM bo yang wrote: > >> Hi Spark Community, >> >> We built an open source tool to deploy and run Spark on Kubernetes with a >> one click command. For example, on AWS, it co

One click to run Spark on Kubernetes

2022-02-22 Thread bo yang
Hi Spark Community, We built an open source tool to deploy and run Spark on Kubernetes with a one click command. For example, on AWS, it could automatically create an EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will be able to use curl or a CLI tool to submit Spark

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-05 Thread bo yang
+1 (non-binding) On Wed, Jan 5, 2022 at 11:01 PM Holden Karau wrote: > +1 (binding) > > On Wed, Jan 5, 2022 at 5:31 PM William Wang > wrote: > >> +1 (non-binding) >> >> On Thu, Jan 6, 2022 at 09:07, Yikun Jiang wrote: >> >>> Hi all, >>> >>> I’d like to start a vote for SPIP: "Support Customized Kubernetes

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2022-01-05 Thread bo yang
Hi Mich, Curious what you mean by "The constraint seems to be that you can fit one Spark executor pod per Kubernetes node and from my tests you don't seem to be able to allocate more than 50% of RAM on the node to the container". Would you help explain a bit? Asking this because there could be

Re: Apache Spark 3.2 Expectation

2021-02-28 Thread bo yang
+1 for better support for disaggregated shuffle (push-based shuffle is a great example; there are also the Facebook shuffle service and the Uber remote shuffle service). There

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread bo yang
Thanks guys for the discussion in the email and also this afternoon! From our experience, we do not need to change the Spark DAG scheduler to implement a remote shuffle service. The current Spark shuffle manager interfaces are pretty good and easy to implement. But we do feel the need to modify

Re: Enabling fully disaggregated shuffle on Spark

2019-11-20 Thread bo yang
19, 2019 at 4:05 PM Ryan Blue >> wrote: >> >>> I'm interested in remote shuffle services as well. I'd love to hear >>> about what you're using in production! >>> >>> rb >>> >>> On Tue, Nov 19, 2019 at 2:43 PM bo yang wrote: >>

Re: Enabling fully disaggregated shuffle on Spark

2019-11-19 Thread bo yang
Hi Ben, Thanks for the write-up! This is Bo from Uber. I am on Felix's team in Seattle, working on disaggregated shuffle (we called it remote shuffle service, RSS, internally). We have had RSS in production for a while, and learned a lot during the work (tried quite a few techniques to

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread bo yang
+1 This is great work, allowing different sort shuffle write/read implementations to be plugged in! Also great to see it retain the current Spark configuration (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl). On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah wrote: > Hi everyone, > >

Support structured plan logging

2018-10-11 Thread bo yang
Hi All, Is anyone interested in adding structured plan logging to Spark? Currently the logical/physical plan can be logged as plain text via the explain() method, which has some issues, for example string truncation and being difficult for tools/programs to consume. This PR