steveburnett commented on code in PR #10793: URL: https://github.com/apache/incubator-gluten/pull/10793#discussion_r2379475207
########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. 
-For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +So add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above +### Maven 3.6.3 or above -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### GCC 11 or above -## GCC 11 or above +## Development -# Compile Gluten using debug mode +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). 
- -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), follow this guide: Review Comment: ```suggestion - If you use Moba-XTerm to connect, you don't need to install x11 server. If you are using another tool, such as putty, follow this guide: ``` Nits for conciseness, cutting the long sentence into two for easier reading, and adding a line break to left-justify the link. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. 
-# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server - Start Idea, `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. - Load the Gluten by File->Open, select <gluten_home/pom.xml>. Review Comment: ```suggestion - Load the Gluten by **File**->**Open**, select **<gluten_home/pom.xml>**. ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. 
-- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) Review Comment: ```suggestion [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) ``` Indenting the link so that it lines up with the text of the list item it belongs to. ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future.
In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). 
+So add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above +### Maven 3.6.3 or above -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### GCC 11 or above -## GCC 11 or above +## Development -# Compile Gluten using debug mode +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). - -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. 
putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server - Start Idea, `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. - Load the Gluten by File->Open, select <gluten_home/pom.xml>. -- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. -## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. 
+To format Java/Scala code using the Spotless plugin, run the following command: -For Velox backend: -``` -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.3 -Pspark-ut -DskipTests ``` -For Clickhouse backend: -``` -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.3 -Pspark-ut -DskipTests +./dev/format-scala-code.sh ``` -# CPP code development with Visual Studio Code +### C++ code development + +This guide is for remote debugging by connecting to the remote Linux server using `SSH`. -This guide is for remote debug. We will connect the remote linux server by `SSH`. Download and install [Visual Studio Code](https://code.visualstudio.com/Download). Key components found on the left side bar are: - Explorer (Project structure) - Search - Run and Debug - Extensions (Install the C/C++ Extension Pack, Remote Development, and GitLens. C++ Test Mate is also suggested.) -- Remote Explorer (Connect linux server by ssh command, click `+`, then input `ssh [email protected]`) +- Remote Explorer (Connect linux server by ssh command, click **+**, then input `ssh [email protected]`) - Manage (Settings) -Input your password in the above pop-up window, it will take a few minutes to install linux vscode server in remote machine folder `~/.vscode-server` -If download failed, delete this folder and try again. +Input your password in the above pop-up window. It will take a few minutes to install the Linux VSCode server in remote machine folder `~/.vscode-server`. -## Usage +If the download fails, delete this folder and try again. -### Set up project +Note: If VSCode is upgraded, you must download the linux server again. We recommend switching the update mode to `off`. Search `update` in Manage->Settings to turn off update mode. 
Review Comment: ```suggestion Note: If VSCode is upgraded, you must download the linux server again. We recommend switching the update mode to `off`. Search `update` in **Manage**->**Settings** to turn off update mode. ``` ########## docs/developers/NewToGluten.md: ########## @@ -165,11 +129,11 @@ make debug EXTRA_CMAKE_FLAGS="-DVELOX_ENABLE_PARQUET=ON -DENABLE_HDFS=ON -DVELOX ``` Then Gluten will link the Velox debug library. -Just click `build` in bottom bar, you will get intellisense search and link. +Click **build** in the bottom bar to enable IntelliSense features like search and navigation. Review Comment: ```suggestion Click **build** in the bottom bar to enable IntelliSense features like search and navigation. ``` Suggest left-justifying this instruction to make it more visible to the reader. ########## docs/developers/NewToGluten.md: ########## @@ -186,112 +150,22 @@ configurations below: After compiling with these updated configs, you should have executable files (such as Review Comment: ```suggestion After compiling with these updated configs, you should have executable files, such as ``` ########## docs/developers/NewToGluten.md: ########## @@ -356,15 +230,13 @@ We provide surefire reports of Velox ut in GHA, and developers can leverage sure You can check surefire reports: -1. Click `Checks` Tab in PR; - +1. Click **Checks** Tab in PR; 2. Find `Report test results` in `Dev PR`; - 3. Then, developers can check the result with summary and annotations. Review Comment: ```suggestion 3. There, you can check the results with summary and annotations. ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS.
- -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +So add the following configs in `spark-defaults.conf`: Review Comment: ```suggestion Add the following configs in `spark-defaults.conf`: ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. 
-For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). 
+So add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above +### Maven 3.6.3 or above -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### GCC 11 or above -## GCC 11 or above +## Development -# Compile Gluten using debug mode +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). - -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. 
putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server - Start Idea, `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. - Load the Gluten by File->Open, select <gluten_home/pom.xml>. -- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. -## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. 
+To format Java/Scala code using the Spotless plugin, run the following command: -For Velox backend: -``` -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.3 -Pspark-ut -DskipTests ``` -For Clickhouse backend: -``` -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.3 -Pspark-ut -DskipTests +./dev/format-scala-code.sh ``` -# CPP code development with Visual Studio Code +### C++ code development + +This guide is for remote debugging by connecting to the remote Linux server using `SSH`. -This guide is for remote debug. We will connect the remote linux server by `SSH`. Download and install [Visual Studio Code](https://code.visualstudio.com/Download). Key components found on the left side bar are: - Explorer (Project structure) - Search - Run and Debug - Extensions (Install the C/C++ Extension Pack, Remote Development, and GitLens. C++ Test Mate is also suggested.) -- Remote Explorer (Connect linux server by ssh command, click `+`, then input `ssh [email protected]`) +- Remote Explorer (Connect linux server by ssh command, click **+**, then input `ssh [email protected]`) - Manage (Settings) -Input your password in the above pop-up window, it will take a few minutes to install linux vscode server in remote machine folder `~/.vscode-server` -If download failed, delete this folder and try again. +Input your password in the above pop-up window. It will take a few minutes to install the Linux VSCode server in remote machine folder `~/.vscode-server`. -## Usage +If the download fails, delete this folder and try again. -### Set up project +Note: If VSCode is upgraded, you must download the linux server again. We recommend switching the update mode to `off`. Search `update` in Manage->Settings to turn off update mode. 
-- File->Open Folder // select the Gluten folder -- After the project loads, you will be prompted to "Select CMakeLists.txt". Select the +#### Set up project + +- Select **File**->**Open Folder**, then select the Gluten folder. +- After the project loads, you will be prompted to **Select CMakeLists.txt**. Select the `${workspaceFolder}/cpp/CMakeLists.txt` file. -- Next, you will be prompted to "Select a Kit" for the Gluten project. Select GCC 11 or above. +- Next, you will be prompted to **Select a Kit** for the Gluten project. Select **GCC 11** or above. -### Settings +#### Settings VSCode supports 2 ways to set user setting. - Manage->Command Palette (Open `settings.json`, search by `Preferences: Open Settings (JSON)`) Review Comment: ```suggestion - **Manage**->**Command Palette** (Open `settings.json`, search by `Preferences: Open Settings (JSON)`) ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. 
-For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). 
+So add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above +### Maven 3.6.3 or above -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### GCC 11 or above -## GCC 11 or above +## Development -# Compile Gluten using debug mode +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). - -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. 
putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server Review Comment: ```suggestion - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server. ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. 
Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +So add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above +### Maven 3.6.3 or above -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. 
+### GCC 11 or above -## GCC 11 or above +## Development -# Compile Gluten using debug mode +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). - -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server - Start Idea, `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. - Load the Gluten by File->Open, select <gluten_home/pom.xml>. 
-- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. -## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. +To format Java/Scala code using the Spotless plugin, run the following command: -For Velox backend: -``` -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.3 -Pspark-ut -DskipTests ``` -For Clickhouse backend: -``` -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.3 -Pspark-ut -DskipTests +./dev/format-scala-code.sh ``` -# CPP code development with Visual Studio Code +### C++ code development + +This guide is for remote debugging by connecting to the remote Linux server using `SSH`. -This guide is for remote debug. We will connect the remote linux server by `SSH`. Download and install [Visual Studio Code](https://code.visualstudio.com/Download). Key components found on the left side bar are: - Explorer (Project structure) - Search - Run and Debug - Extensions (Install the C/C++ Extension Pack, Remote Development, and GitLens. C++ Test Mate is also suggested.) 
-- Remote Explorer (Connect linux server by ssh command, click `+`, then input `ssh [email protected]`) +- Remote Explorer (Connect linux server by ssh command, click **+**, then input `ssh [email protected]`) - Manage (Settings) -Input your password in the above pop-up window, it will take a few minutes to install linux vscode server in remote machine folder `~/.vscode-server` -If download failed, delete this folder and try again. +Input your password in the above pop-up window. It will take a few minutes to install the Linux VSCode server in remote machine folder `~/.vscode-server`. -## Usage +If the download fails, delete this folder and try again. -### Set up project +Note: If VSCode is upgraded, you must download the linux server again. We recommend switching the update mode to `off`. Search `update` in Manage->Settings to turn off update mode. -- File->Open Folder // select the Gluten folder -- After the project loads, you will be prompted to "Select CMakeLists.txt". Select the +#### Set up project + +- Select **File**->**Open Folder**, then select the Gluten folder. +- After the project loads, you will be prompted to **Select CMakeLists.txt**. Select the `${workspaceFolder}/cpp/CMakeLists.txt` file. -- Next, you will be prompted to "Select a Kit" for the Gluten project. Select GCC 11 or above. +- Next, you will be prompted to **Select a Kit** for the Gluten project. Select **GCC 11** or above. -### Settings +#### Settings VSCode supports 2 ways to set user setting. Review Comment: ```suggestion VSCode supports two ways to set the user settings. ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. 
-Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. 
- -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. - -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +So add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above +### Maven 3.6.3 or above -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### GCC 11 or above -## GCC 11 or above +## Development -# Compile Gluten using debug mode +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). - -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. 
```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. -# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server - Start Idea, `bash <idea_dir>/idea.sh` Review Comment: ```suggestion - Start Idea using the following command: `bash <idea_dir>/idea.sh` ``` ########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten nav_order: 2 parent: Developer Overview --- -Help users to debug and test with Gluten. -# Environment +# Guide for New Developers -Gluten supports Ubuntu20.04, Ubuntu22.04, CentOS8, CentOS7 and MacOS. +## Environment -## JDK +Gluten supports Ubuntu 20.04/22.04, CentOS 7/8, and MacOS. -Currently, Gluten supports JDK 8 for Spark 3.2/3.3/3.4/3.5. For Spark 3.3 and higher versions, Gluten -supports JDK 11 and 17. Please note since Spark 4.0, JDK 8 will not be supported. 
So we recommend Velox -backend users to use higher JDK version now to ease the migration for deploying Gluten with Spark-4.0 -in the future. And we may probably upgrade Arrow from 15.0.0 to some higher version, which also requires -JDK 11 is the minimum version. +### JDK -### JDK 8 +Currently, Gluten supports JDK 8 for Spark 3.2, 3.3, 3.4, and 3.5. For Spark 3.3 and later versions, Gluten +also supports JDK 11 and 17. -#### Environment Setting +Note: Starting with Spark 4.0, the minimum required JDK version is 17. -For root user, the environment variables file is `/etc/profile`, it will take effect for all the users. +We recommend using a higher JDK version now to ease migration when deploying Gluten for Spark 4.0 +in the future. In addition, we may upgrade Arrow from 15.0.0 to a newer release, which will require +JDK 11 as the minimum version. -For other user, you can set in `~/.bashrc`. +By default, Gluten compiles packages using JDK 8. Enable maven profile by `-Pjava-17` or `-Pjava-11` to use the corresponding JDK version, and ensure that the JDK version is available in your environment. -#### Guide for Ubuntu - -The default JDK version in ubuntu is java11, we need to set to java8. - -```bash -apt install openjdk-8-jdk -update-alternatives --config java -java -version -``` - -`--config java` to config java executable path, `javac` and other commands can also use this command to config. -For some other uses, we suggest to set `JAVA_HOME`. - -```bash -export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ -JRE_HOME=$JAVA_HOME/jre -export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar -# pay attention to $PATH double quote -export PATH="$PATH:$JAVA_HOME/bin" -``` - -> Must set PATH with double quote in ubuntu. - -### JDK 11/17 - -By default, Gluten compiles package using JDK8. Enable maven profile by `-Pjava-17` to use JDK17 or `-Pjava-11` to use JDK 11, and please make sure your JAVA_HOME is set correctly. 
- -Apache Spark and Arrow requires setting java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). -So please add following configs in `spark-defaults.conf`: +If JDK 11 or a higher version is used, Spark and Arrow require setting the java args `-Dio.netty.tryReflectionSetAccessible=true`, see [SPARK-29924](https://issues.apache.org/jira/browse/SPARK-29924) and [ARROW-6206](https://issues.apache.org/jira/browse/ARROW-6206). +So add the following configs in `spark-defaults.conf`: ``` spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true ``` -## Maven 3.6.3 or above +### Maven 3.6.3 or above -[Maven Download Page](https://maven.apache.org/docs/history.html) -And then set the environment setting. +### GCC 11 or above -## GCC 11 or above +## Development -# Compile Gluten using debug mode +To debug Java/Scala code, follow the steps in [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). -If you want to just debug java/scala code, there is no need to compile cpp code with debug mode. -You can just refer to [build-gluten-with-velox-backend](../get-started/Velox.md#build-gluten-with-velox-backend). - -If you need to debug cpp code, please compile the backend code and gluten cpp code with debug mode. +To debug C++ code, compile the backend code and gluten C++ code in debug mode. ```bash ## compile Velox backend with benchmark and tests to debug gluten_home/dev/builddeps-veloxbe.sh --build_tests=ON --build_benchmarks=ON --build_type=Debug ``` -If you need to debug the tests in <gluten>/gluten-ut, You need to compile java code with `-P spark-ut`. +Note: To debug the tests in <gluten>/gluten-ut, you must compile java code with `-Pspark-ut`. 
-# Java/scala code development with Intellij +### Java/scala code development -## Linux IntelliJ local debug +#### Linux IntelliJ local debug Install the Linux IntelliJ version, and debug code locally. - Ask your linux maintainer to install the desktop, and then restart the server. -- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), please follow this guide: +- If you use Moba-XTerm to connect linux server, you don't need to install x11 server, If not (e.g. putty), follow this guide: [X11 Forwarding: Setup Instructions for Linux and Mac](https://www.businessnewsdaily.com/11035-how-to-use-x11-forwarding.html) - Download [IntelliJ Linux community version](https://www.jetbrains.com/idea/download/?fromIDE=#section=linux) to Linux server - Start Idea, `bash <idea_dir>/idea.sh` -## Set up Gluten project +#### Set up Gluten project - Make sure you have compiled Gluten. - Load the Gluten by File->Open, select <gluten_home/pom.xml>. -- Activate your profiles such as <backends-velox>, and Reload Maven Project, you will find all your need modules have been activated. -- Create breakpoint and debug as you wish, maybe you can try `CTRL+N` to find `TestOperator` to start your test. +- Activate your profiles such as `<backends-velox>`, then **Reload Maven Project** to activate all the needed modules. +- Create breakpoints and debug as you wish. You can use `CTRL+N` to locate a test class to start your test. -## Java/Scala code style +#### Java/Scala code style IntelliJ supports importing settings for Java/Scala code style. You can import [intellij-codestyle.xml](../../dev/intellij-codestyle.xml) to your IDE. See [IntelliJ guide](https://www.jetbrains.com/help/idea/configuring-code-style.html#import-code-style). -To generate a fix for Java/Scala code style, you can run one or more of the below commands according to the code modules involved in your PR. 
+To format Java/Scala code using the Spotless plugin, run the following command: -For Velox backend: -``` -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.3 -Pspark-ut -DskipTests ``` -For Clickhouse backend: -``` -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.2 -Pspark-ut -DskipTests -mvn spotless:apply -Pbackends-clickhouse -Pspark-3.3 -Pspark-ut -DskipTests +./dev/format-scala-code.sh ``` -# CPP code development with Visual Studio Code +### C++ code development + +This guide is for remote debugging by connecting to the remote Linux server using `SSH`. -This guide is for remote debug. We will connect the remote linux server by `SSH`. Download and install [Visual Studio Code](https://code.visualstudio.com/Download). Key components found on the left side bar are: - Explorer (Project structure) - Search - Run and Debug - Extensions (Install the C/C++ Extension Pack, Remote Development, and GitLens. C++ Test Mate is also suggested.) -- Remote Explorer (Connect linux server by ssh command, click `+`, then input `ssh [email protected]`) +- Remote Explorer (Connect linux server by ssh command, click **+**, then input `ssh [email protected]`) Review Comment: Is `10.1.7.003` always going to be the correct DNS address to enter for every user? If not, the text should explain how to find what the correct DNS address is.
########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten - Manage (Settings) -Input your password in the above pop-up window, it will take a few minutes to install linux vscode server in remote machine folder `~/.vscode-server` -If download failed, delete this folder and try again. +Input your password in the above pop-up window. It will take a few minutes to install the Linux VSCode server in remote machine folder `~/.vscode-server`. -## Usage +If the download fails, delete this folder and try again. -### Set up project +Note: If VSCode is upgraded, you must download the linux server again. We recommend switching the update mode to `off`. Search `update` in Manage->Settings to turn off update mode. -- File->Open Folder // select the Gluten folder -- After the project loads, you will be prompted to "Select CMakeLists.txt". Select the +#### Set up project + +- Select **File**->**Open Folder**, then select the Gluten folder. +- After the project loads, you will be prompted to **Select CMakeLists.txt**. Select the `${workspaceFolder}/cpp/CMakeLists.txt` file. -- Next, you will be prompted to "Select a Kit" for the Gluten project. Select GCC 11 or above. +- Next, you will be prompted to **Select a Kit** for the Gluten project. Select **GCC 11** or above. -### Settings +#### Settings VSCode supports 2 ways to set user setting. - Manage->Command Palette (Open `settings.json`, search by `Preferences: Open Settings (JSON)`) - Manage->Settings (Common setting) Review Comment: ```suggestion - **Manage**->**Settings** (Common setting) ```
########## docs/developers/NewToGluten.md: ########## @@ -186,112 +150,22 @@ configurations below: After compiling with these updated configs, you should have executable files (such as `<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`). Review Comment: ```suggestion `<gluten_home>/cpp/build/velox/tests/velox_shuffle_writer_test`. ```
########## docs/developers/NewToGluten.md: ########## @@ -4,158 +4,122 @@ title: New To Gluten -### Build using VSCode +#### Build using VSCode + +VSCode will try to compile using debug mode in <gluten_home>/build. You must compile Velox debug mode before Review Comment: ```suggestion VSCode will try to compile using debug mode in `<gluten_home>/build`. You must compile Velox debug mode before ```
########## docs/developers/NewToGluten.md: ########## @@ -345,7 +219,7 @@ After the above installation, you can optionally do some configuration in Visual 4. Placement of Non-Native Code UTs: Ensure that unit tests for non-native code are placed within org.apache.gluten and org.apache.spark packages. This is important because the CI system runs unit tests from these two paths in parallel. Placing tests in other paths might cause your tests to be ignored. -### View surefire reports of Velox ut in GHA +#### View surefire reports of Velox ut in GHA Review Comment: ```suggestion #### View Surefire reports of Velox Unit Tests in GHA ```
########## docs/developers/NewToGluten.md: ########## @@ -329,11 +203,11 @@ After the above installation, you can optionally do some configuration in Visual * Set Args: `--first-comment-is-literal=True`. * Set Exe Path to the path of the `cmake-format` command.
If you installed `cmake-format` in a standard location, you might not need to change this setting. -3. Now, you can format your CMake files by right-clicking in a file and selecting `Format Document`. +3. Format your CMake files by right-clicking in a file and selecting `Format Document`. -### Add UT +#### Add UT Review Comment: ```suggestion #### Add Unit Tests ``` ########## docs/developers/NewToGluten.md: ########## @@ -356,15 +230,13 @@ We provide surefire reports of Velox ut in GHA, and developers can leverage sure You can check surefire reports: -1. Click `Checks` Tab in PR; - +1. Click **Checks** Tab in PR; 2. Find `Report test results` in `Dev PR`; Review Comment: ```suggestion 2. Find **Report test results** in **Dev PR**; ``` ########## docs/developers/NewToGluten.md: ########## @@ -470,24 +341,23 @@ child allocators: 0 at org.apache.spark.memory.SparkMemoryUtil$UnsafeItr.hasNext(SparkMemoryUtil.scala:246) ``` -## CPP code memory leak +### CPP code memory leak -Sometimes you cannot get the coredump symbols, if you debug memory leak, you can write googletest to use valgrind to detect +Sometimes you cannot get the coredump symbols, when debugging a memory leak. You can write a GoogleTest to use valgrind for detection. ```bash apt install valgrind valgrind --leak-check=yes ./exec_backend_test ``` - -# Run TPC-H and TPC-DS +## Run TPC-H and TPC-DS We supply `<gluten_home>/tools/gluten-it` to execute these queries Refer to [velox_backend.yml](https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_backend.yml) Review Comment: Broken link: I don't find the file `velox_backend.yml` in https://github.com/apache/incubator-gluten/tree/main/.github/workflows. There are four files named in the format `velox_backend_*.yml` but I don't know which of them to link to. 
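To help pick a replacement link, the candidate workflow files can be listed from a local checkout. The following is a minimal sketch, not part of the PR: the helper name `list_velox_workflows` is hypothetical, and it assumes the files follow the `velox_backend_*.yml` naming noted in the comment above and that the default path is resolved from the repository root.

```shell
# Hypothetical helper (not part of Gluten): print the names of workflow files
# that match velox_backend_*.yml in the given directory.
# Assumption: the default directory is relative to the repository root.
list_velox_workflows() {
  dir="${1:-.github/workflows}"
  for f in "$dir"/velox_backend_*.yml; do
    # When nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$f" ] && basename "$f"
  done
  return 0
}
```

Running `list_velox_workflows` from the repository root prints one file name per line, so the doc author can choose which workflow the sentence should point to.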
########## docs/developers/NewToGluten.md: ########## @@ -356,15 +230,13 @@ We provide surefire reports of Velox ut in GHA, and developers can leverage sure You can check surefire reports: Review Comment: ```suggestion To check Surefire reports: ```
