lindong28 commented on a change in pull request #71:
URL: https://github.com/apache/flink-ml/pull/71#discussion_r841071624



##########
File path: flink-ml-benchmark/README.md
##########
@@ -0,0 +1,172 @@
+# Flink ML Benchmark Getting Started
+
+This document provides instructions on how to run benchmarks on Flink ML's
+stages in a Linux/macOS environment.
+
+## Prerequisites
+
+### Install Flink
+
+Please make sure Flink 1.14 or a higher version has been installed in your local
+environment. You can refer to the [local
+installation](https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/)
+instructions on Flink's documentation website for how to achieve this.
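+
+If Flink is not yet installed, the commands below are a minimal sketch of one
+way to set it up locally. The download URL and Scala suffix are illustrative
+assumptions; see the installation guide linked above for the authoritative
+steps.
+
+```bash
+# Download and extract a Flink 1.14.0 binary release (the URL and Scala
+# version shown here are examples; pick the release matching your setup).
+curl -LO https://archive.apache.org/dist/flink/flink-1.14.0/flink-1.14.0-bin-scala_2.11.tgz
+tar -xzf flink-1.14.0-bin-scala_2.11.tgz -C /usr/local/
+```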
+
+### Set Up Flink Environment Variables
+
+After installing Flink, please register `$FLINK_HOME` as an environment
+variable in your local environment. For example, suppose you have downloaded
+Flink 1.14.0 and placed its binary folder under `/usr/local/`; then you need
+to run the following command:
+
+```bash
+export FLINK_HOME=/usr/local/flink-1.14.0
+```
+
+Then please run the following command. If it prints 1.14.0 or a higher
+version, the required Flink environment has been successfully installed and
+registered in your local environment.
+
+```bash
+$FLINK_HOME/bin/flink --version
+```
+
+### Acquire Flink ML Binary Distribution
+
+In order to use Flink ML's CLI you need to have the latest binary distribution
+of Flink ML. You can acquire the distribution by building Flink ML's source code
+locally, which means executing the following commands in the Flink ML
+repository's root directory.
+
+```bash
+mvn clean package -DskipTests
+cd ./flink-ml-dist/target/flink-ml-*-bin/flink-ml*/
+```
+
+### Start Flink Cluster
+
+Please start a Flink standalone session in your local environment with the
+following command.
+
+```bash
+$FLINK_HOME/bin/start-cluster.sh
+```
+
+You should be able to navigate to the web UI at
+[localhost:8081](http://localhost:8081/) to view the Flink dashboard and see
+that the cluster is up and running.
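+
+Alternatively, as a quick sanity check from the terminal, you can query the
+cluster's REST endpoint (this assumes the default REST port 8081):
+
+```bash
+# Should return a small JSON document describing the running cluster.
+curl http://localhost:8081/overview
+```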
+
+## Run Benchmark Example
+
+In the Flink ML binary distribution's folder, execute the following command to
+run an example benchmark.
+
+```bash
+./bin/flink-ml-benchmark.sh ./examples/benchmark-example-conf.json --output-file ./output/results.json
+
+You will notice that a Flink job is submitted to your Flink cluster, and the
+following information is printed out in your terminal. This means that you have
+successfully executed a benchmark on `KMeansModel`.
+
+```
+Creating fat jar containing all flink ml dependencies to be submitted.
+Job has been submitted with JobID a5d8868d808eecfb357eb904c961c3bf
+Program execution finished
+Job with JobID a5d8868d808eecfb357eb904c961c3bf has finished.
+Job Runtime: 897 ms
+Accumulator Results: 
+- numElements (java.lang.Long): 10000
+
+
+{
+  "name" : "KMeansModel-1",
+  "totalTimeMs" : 897.0,
+  "inputRecordNum" : 10000,
+  "inputThroughput" : 11148.272017837235,
+  "outputRecordNum" : 10000,
+  "outputThroughput" : 11148.272017837235
+}
+```
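+
+The throughput values above are simply derived from the record count and the
+total runtime, e.g. 10000 records / 0.897 s ≈ 11148.27 records per second.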
+
+The command above also saves the results into `./output/results.json`, as shown below.
+
+```json
+[ {
+  "name" : "KMeansModel-1",
+  "totalTimeMs" : 897.0,
+  "inputRecordNum" : 10000,
+  "inputThroughput" : 11148.272017837235,
+  "outputRecordNum" : 10000,
+  "outputThroughput" : 11148.272017837235
+} ]
+```
+
+## Customize Benchmark Configuration
+
+`flink-ml-benchmark.sh` parses benchmarks to be executed according to the input
+configuration file, like `./examples/benchmark-example-conf.json`. It can also

Review comment:
       Could we move the file to the folder `conf` instead of `examples`? By
doing this, the directory structure would be simpler and more consistent
with Flink.



