This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git
The following commit(s) were added to refs/heads/master by this push:
     new f7ad518  [MINOR] Make readme easier to follow

f7ad518 is described below

commit f7ad5188552c4f0c78c2dc1ad6f24c1977583d5c
Author: Matthew Powers <matthewkevinpow...@gmail.com>
AuthorDate: Thu Apr 11 09:05:36 2024 +0900

    [MINOR] Make readme easier to follow

    ### What changes were proposed in this pull request?

    Update the README to make it easier to follow.

    ### Why are the changes needed?

    I tried to get spark-connect-go running locally and it was a little confusing. This new layout should make the setup steps a lot clearer.

    ### Does this PR introduce _any_ user-facing change?

    Just updates the README.

    ### How was this patch tested?

    N/A.

    Closes #18 from MrPowers/update-readme.

    Authored-by: Matthew Powers <matthewkevinpow...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 README.md | 55 ++++++++++++++++++++++---------------------------------
 1 file changed, 22 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index 8b15743..7832edb 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,6 @@
 This project houses the **experimental** client for [Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) for [Apache Spark](https://spark.apache.org/) written in [Golang](https://go.dev/).

-
 ## Current State of the Project

 Currently, the Spark Connect client for Golang is highly experimental and should
@@ -13,33 +12,42 @@ project reserves the right to withdraw and abandon the development of this proje
 if it is not sustainable.

 ## Getting started
+
+This section explains how to run Spark Connect Go locally.
+
+Step 1: Install Golang: https://go.dev/doc/install.
+
+Step 2: Ensure you have installed `buf CLI` installed, [more info here](https://buf.build/docs/installation/)
+
+Step 3: Run the following commands to setup the Spark Connect client.
+
 ```
 git clone https://github.com/apache/spark-connect-go.git
 git submodule update --init --recursive

 make gen && make test
 ```
-> Ensure you have installed `buf CLI`; [more info](https://buf.build/docs/installation/)

-## How to write Spark Connect Go Application in your own project
+Step 4: Setup the Spark Driver on localhost.

-See [Quick Start Guide](quick-start.md)
+1. [Download Spark distribution](https://spark.apache.org/downloads.html) (3.4.0+), unzip the package.

-## Spark Connect Go Application Example
+2. Start the Spark Connect server with the following command (make sure to use a package version that matches your Spark distribution):

-A very simple example in Go looks like following:
+```
+sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
+```
+
+Step 5: Run the example Go application.

 ```
-func main() {
-	remote := "localhost:15002"
-	spark, _ := sql.SparkSession.Builder.Remote(remote).Build()
-	defer spark.Stop()
-
-	df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
-	df.Show(100, false)
-}
+go run cmd/spark-connect-example-spark-session/main.go
 ```

+## How to write Spark Connect Go Application in your own project
+
+See [Quick Start Guide](quick-start.md)
+
 ## High Level Design

 Following [diagram](https://textik.com/#ac299c8f32c4c342) shows main code in current prototype:
@@ -66,7 +74,6 @@ Following [diagram](https://textik.com/#ac299c8f32c4c342) shows main code in cur
 | SparkConnectServiceClient |--------------+| Spark Driver   |
 |                           |               |                |
 +---------------------------+               +----------------+
-
 ```

 `SparkConnectServiceClient` is GRPC client which talks to Spark Driver. `sparkSessionImpl` generates `dataFrameImpl`
@@ -75,24 +82,6 @@ instances. `dataFrameImpl` uses the GRPC client in `sparkSessionImpl` to communi
 We will mimic the logic in Spark Connect Scala implementation, and adopt Go common practices, e.g. returning `error` object for error handling.
-## How to Run Spark Connect Go Application
-
-1. Install Golang: https://go.dev/doc/install.
-
-2. Download Spark distribution (3.4.0+), unzip the folder.
-
-3. Start Spark Connect server by running command:
-
-```
-sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
-```
-
-4. In this repo, run Go application:
-
-```
-go run cmd/spark-connect-example-spark-session/main.go
-```
-
 ## Contributing

 Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)
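
The High Level Design text kept as context in this diff describes three layers. `SparkConnectServiceClient` is the gRPC client that talks to the Spark Driver. `sparkSessionImpl` generates `dataFrameImpl` instances, and each `dataFrameImpl` uses the gRPC client held by `sparkSessionImpl`. Errors are returned as `error` values. The Go sketch below illustrates that layering under stated assumptions. `connectClient`, `fakeClient`, `ExecuteSQL`, and `Collect` are hypothetical names, the `Sql` signature is simplified, and the fake client returns fixed rows locally instead of sending plans to a Spark Connect server. It is not the spark-connect-go API.

```go
package main

import (
	"errors"
	"fmt"
)

// connectClient is a hypothetical stand-in for the generated gRPC
// SparkConnectServiceClient. A real client would send Spark Connect plans
// to the Spark Driver; this interface only models "run a query".
type connectClient interface {
	ExecuteSQL(query string) ([][]string, error)
}

// fakeClient is a hypothetical in-memory client. It answers every
// non-empty query with the same fixed rows, so the sketch runs without a server.
type fakeClient struct{}

func (fakeClient) ExecuteSQL(query string) ([][]string, error) {
	if query == "" {
		return nil, errors.New("empty query")
	}
	return [][]string{{"apple", "123"}, {"orange", "456"}}, nil
}

// sparkSessionImpl owns the client and generates dataFrameImpl values.
type sparkSessionImpl struct {
	client connectClient
}

// dataFrameImpl keeps a pointer to its session so that it reuses the
// session's client when it is evaluated.
type dataFrameImpl struct {
	session *sparkSessionImpl
	query   string
}

// Sql creates a DataFrame without contacting the client; invalid input is
// reported through the returned error rather than a panic.
func (s *sparkSessionImpl) Sql(query string) (*dataFrameImpl, error) {
	if query == "" {
		return nil, errors.New("sql: query must not be empty")
	}
	return &dataFrameImpl{session: s, query: query}, nil
}

// Collect runs the query through the owning session's client and wraps any
// client error with extra context.
func (df *dataFrameImpl) Collect() ([][]string, error) {
	rows, err := df.session.client.ExecuteSQL(df.query)
	if err != nil {
		return nil, fmt.Errorf("collect: %w", err)
	}
	return rows, nil
}

func main() {
	spark := &sparkSessionImpl{client: fakeClient{}}

	df, err := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	rows, err := df.Collect()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, row := range rows {
		fmt.Println(row[0], row[1])
	}
}
```

Running it with `go run` prints `apple 123` and `orange 456`. Unlike the example removed from the README, which discarded errors with `_`, every call in this sketch checks the returned `error`, following the Go practice the design text mentions.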