(beam) branch master updated: Add TF MNIST classification cost benchmark (#33391)

jrmccluskey Tue, 17 Dec 2024 11:46:41 -0800

This is an automated email from the ASF dual-hosted git repository.

jrmccluskey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git



The following commit(s) were added to refs/heads/master by this push:
     new 0e375012dbe Add TF MNIST classification cost benchmark (#33391)
0e375012dbe is described below

commit 0e375012dbe954e6892a53a2623125e5b74daecb
Author: Jack McCluskey <[email protected]>
AuthorDate: Tue Dec 17 14:44:13 2024 -0500

    Add TF MNIST classification cost benchmark (#33391)
    
    * Add TF MNIST classification cost benchmark
    
    * linting
    
    * Generalize to single workflow file for cost benchmarks
    
    * fix incorrect UTC time in comment
    
    * move wordcount to same workflow
    
    * update workflow job name
---
 ...yml => beam_Python_CostBenchmarks_Dataflow.yml} | 28 +++++++++++----
 .../python_tf_mnist_classification.txt             | 29 +++++++++++++++
 ...nsorflow_mnist_classification_cost_benchmark.py | 41 ++++++++++++++++++++++
 3 files changed, 91 insertions(+), 7 deletions(-)
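
Note on the schedule: the workflow change in this commit adds the cron
expression '30 18 * * 6', commented as 6:30 pm UTC on Saturdays. GitHub
Actions evaluates schedules in UTC, and cron numbers Sunday as 0, so 6 is
Saturday. The sketch below checks that reading with the standard library only.
The helper name matches_schedule is hypothetical and is not part of the
commit, and the function assumes its argument is already in UTC.

```python
from datetime import datetime, timezone


def matches_schedule(moment):
  """Return True if `moment` (assumed UTC) matches '30 18 * * 6'."""
  # cron counts Sunday as 0 and Saturday as 6; isoweekday() counts
  # Monday as 1 and Sunday as 7, so a modulo maps it onto cron's numbering.
  cron_day_of_week = moment.isoweekday() % 7
  return (moment.minute, moment.hour, cron_day_of_week) == (30, 18, 6)


# 21 Dec 2024 is the first Saturday after the commit date (Tue 17 Dec 2024).
saturday = datetime(2024, 12, 21, 18, 30, tzinfo=timezone.utc)
commit_day = datetime(2024, 12, 17, 18, 30, tzinfo=timezone.utc)
print(matches_schedule(saturday), matches_schedule(commit_day))
```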

diff --git a/.github/workflows/beam_Wordcount_Python_Cost_Benchmark_Dataflow.yml b/.github/workflows/beam_Python_CostBenchmarks_Dataflow.yml
similarity index 69%
rename from .github/workflows/beam_Wordcount_Python_Cost_Benchmark_Dataflow.yml
rename to .github/workflows/beam_Python_CostBenchmarks_Dataflow.yml
index 51d1005affb..18fe37e142a 100644
--- a/.github/workflows/beam_Wordcount_Python_Cost_Benchmark_Dataflow.yml
+++ b/.github/workflows/beam_Python_CostBenchmarks_Dataflow.yml
@@ -13,9 +13,11 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-name: Wordcount Python Cost Benchmarks Dataflow
+name: Python Cost Benchmarks Dataflow
 
 on:
+  schedule:
+    - cron: '30 18 * * 6' # Run at 6:30 pm UTC on Saturdays
   workflow_dispatch:
 
 #Setting explicit permissions for the action to avoid the default permissions which are `write-all` in case of pull_request_target event
@@ -47,16 +49,17 @@ env:
   INFLUXDB_USER_PASSWORD: ${{ secrets.INFLUXDB_USER_PASSWORD }}
 
 jobs:
-  beam_Inference_Python_Benchmarks_Dataflow:
+  beam_Python_Cost_Benchmarks_Dataflow:
     if: |
-      github.event_name == 'workflow_dispatch'
+      github.event_name == 'workflow_dispatch' ||
+      (github.event_name == 'schedule' && github.repository == 'apache/beam')
     runs-on: [self-hosted, ubuntu-20.04, main]
     timeout-minutes: 900
     name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
     strategy:
       matrix:
-        job_name: ["beam_Wordcount_Python_Cost_Benchmarks_Dataflow"]
-        job_phrase: ["Run Wordcount Cost Benchmark"]
+        job_name: ["beam_Python_CostBenchmark_Dataflow"]
+        job_phrase: ["Run Python Dataflow Cost Benchmarks"]
     steps:
       - uses: actions/checkout@v4
       - name: Setup repository
@@ -76,10 +79,11 @@ jobs:
           test-language: python
           argument-file-paths: |
             ${{ github.workspace }}/.github/workflows/cost-benchmarks-pipeline-options/python_wordcount.txt
+            ${{ github.workspace }}/.github/workflows/cost-benchmarks-pipeline-options/python_tf_mnist_classification.txt
       # The env variables are created and populated in the test-arguments-action as "<github.job>_test_arguments_<argument_file_paths_index>"
       - name: get current time
         run: echo "NOW_UTC=$(date '+%m%d%H%M%S' --utc)" >> $GITHUB_ENV
-      - name: run wordcount on Dataflow Python
+      - name: Run wordcount on Dataflow
         uses: ./.github/actions/gradle-command-self-hosted-action
         timeout-minutes: 30
         with:
@@ -88,4 +92,14 @@ jobs:
             -PloadTest.mainClass=apache_beam.testing.benchmarks.wordcount.wordcount \
             -Prunner=DataflowRunner \
             -PpythonVersion=3.10 \
-            '-PloadTest.args=${{ env.beam_Inference_Python_Benchmarks_Dataflow_test_arguments_1 }} --job_name=benchmark-tests-wordcount-python-${{env.NOW_UTC}} --output=gs://temp-storage-for-end-to-end-tests/wordcount/result_wordcount-${{env.NOW_UTC}}.txt' \
\ No newline at end of file
+            '-PloadTest.args=${{ env.beam_Inference_Python_Benchmarks_Dataflow_test_arguments_1 }} --job_name=benchmark-tests-wordcount-python-${{env.NOW_UTC}} --output_file=gs://temp-storage-for-end-to-end-tests/wordcount/result_wordcount-${{env.NOW_UTC}}.txt' \
+      - name: Run Tensorflow MNIST Image Classification on Dataflow
+        uses: ./.github/actions/gradle-command-self-hosted-action
+        timeout-minutes: 30
+        with:
+          gradle-command: :sdks:python:apache_beam:testing:load_tests:run
+          arguments: |
+            -PloadTest.mainClass=apache_beam.testing.benchmarks.inference.tensorflow_mnist_classification_cost_benchmark \
+            -Prunner=DataflowRunner \
+            -PpythonVersion=3.10 \
+            '-PloadTest.args=${{ env.beam_Inference_Python_Benchmarks_Dataflow_test_arguments_2 }} --job_name=benchmark-tests-tf-mnist-classification-python-${{env.NOW_UTC}} --input_file=gs://apache-beam-ml/testing/inputs/it_mnist_data.csv --output_file=gs://temp-storage-for-end-to-end-tests/wordcount/result_tf_mnist-${{env.NOW_UTC}}.txt --model=gs://apache-beam-ml/models/tensorflow/mnist/' \
\ No newline at end of file
diff --git a/.github/workflows/cost-benchmarks-pipeline-options/python_tf_mnist_classification.txt b/.github/workflows/cost-benchmarks-pipeline-options/python_tf_mnist_classification.txt
new file mode 100644
index 00000000000..01f4460b8c7
--- /dev/null
+++ b/.github/workflows/cost-benchmarks-pipeline-options/python_tf_mnist_classification.txt
@@ -0,0 +1,29 @@
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+--region=us-central1
+--machine_type=n1-standard-2
+--num_workers=1
+--disk_size_gb=50
+--autoscaling_algorithm=NONE
+--input_options={}
+--staging_location=gs://temp-storage-for-perf-tests/loadtests
+--temp_location=gs://temp-storage-for-perf-tests/loadtests
+--requirements_file=apache_beam/ml/inference/tensorflow_tests_requirements.txt
+--publish_to_big_query=true
+--metrics_dataset=beam_run_inference
+--metrics_table=tf_mnist_classification
+--runner=DataflowRunner
\ No newline at end of file
diff --git a/sdks/python/apache_beam/testing/benchmarks/inference/tensorflow_mnist_classification_cost_benchmark.py b/sdks/python/apache_beam/testing/benchmarks/inference/tensorflow_mnist_classification_cost_benchmark.py
new file mode 100644
index 00000000000..f7e12dcead0
--- /dev/null
+++ b/sdks/python/apache_beam/testing/benchmarks/inference/tensorflow_mnist_classification_cost_benchmark.py
@@ -0,0 +1,41 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# pytype: skip-file
+
+import logging
+
+from apache_beam.examples.inference import tensorflow_mnist_classification
+from apache_beam.testing.load_tests.dataflow_cost_benchmark import DataflowCostBenchmark
+
+
+class TensorflowMNISTClassificationCostBenchmark(DataflowCostBenchmark):
+  def __init__(self):
+    super().__init__()
+
+  def test(self):
+    extra_opts = {}
+    extra_opts['input'] = self.pipeline.get_option('input_file')
+    extra_opts['output'] = self.pipeline.get_option('output_file')
+    extra_opts['model_path'] = self.pipeline.get_option('model')
+    tensorflow_mnist_classification.run(
+        self.pipeline.get_full_options_as_args(**extra_opts),
+        save_main_session=False)
+
+
+if __name__ == '__main__':
+  logging.basicConfig(level=logging.INFO)
+  TensorflowMNISTClassificationCostBenchmark().run()
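
Note on test(): the benchmark class reads the workflow flags input_file,
output_file and model, and passes them on under the names input, output and
model_path before calling the example pipeline. The sketch below mirrors only
that renaming in plain Python. The function remap_options is a hypothetical
stand-in: it does not call Beam, and it is not how get_option or
get_full_options_as_args are implemented. The output path is shortened for
illustration.

```python
def remap_options(pipeline_options):
  """Copy benchmark flags to the names used as extra_opts keys in test()."""
  # Hypothetical illustration only; the mapping is taken from test() above.
  renames = {
      'input_file': 'input',
      'output_file': 'output',
      'model': 'model_path',
  }
  extra_opts = {
      new: pipeline_options.get(old) for old, new in renames.items()
  }
  # Render as command-line style arguments, skipping flags that were not set.
  return [
      f'--{name}={value}' for name, value in extra_opts.items()
      if value is not None
  ]


args = remap_options({
    'input_file': 'gs://apache-beam-ml/testing/inputs/it_mnist_data.csv',
    'output_file': 'gs://temp-storage-for-end-to-end-tests/wordcount/out.txt',
    'model': 'gs://apache-beam-ml/models/tensorflow/mnist/',
})
for arg in args:
  print(arg)
```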
