[jira] [Comment Edited] (BEAM-8198) Investigate possible performance regression of Wordcount 1GB batch benchmark on Py3.

Valentyn Tymofieiev (Jira) Wed, 11 Sep 2019 14:38:18 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928020#comment-16928020
 ]


Valentyn Tymofieiev edited comment on BEAM-8198 at 9/11/19 9:37 PM:
--------------------------------------------------------------------

Looking at the Jenkins jobs for the Wordcount 1 GB benchmark 
(https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py37), we can 
do the following to reproduce these runs.

1) Clone PKB and install PKB dependencies in a virtual environment with Python 
2.7. It looks like we run PerfKit Benchmarker in a Python 2.7 environment, but 
the benchmark pipeline is triggered via Gradle and can use a different runtime.


{noformat}
git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.git
pip install -r ./PerfKitBenchmarker/requirements.txt

{noformat}

2) Clone the Beam SDK and build the SDK tarball at the desired commit.
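
A rough sketch of step 2, not taken from the Jenkins job itself. The helper name build_beam_tarball is hypothetical, and the sketch assumes that running setup.py sdist in sdks/python writes the tarball to sdks/python/dist/, which is where BEAM_TARBALL in step 3 points.

```shell
# Hypothetical helper (not part of Beam or PKB): clone Beam, check out the
# commit to benchmark, and build the Python SDK source tarball.
# Assumes git and a Python interpreter are available on PATH.
build_beam_tarball() (
  beam_location="$1"   # target directory, later used as BEAM_LOCATION
  commit="$2"          # commit, tag, or branch to benchmark

  git clone https://github.com/apache/beam.git "$beam_location" &&
  git -C "$beam_location" checkout "$commit" &&
  # Assumption: sdist places apache-beam-<version>.tar.gz in sdks/python/dist/.
  cd "$beam_location/sdks/python" &&
  python setup.py sdist
)
```

It would be called as build_beam_tarball /path/to/clone/of/beam <commit>. The function body runs in a subshell, so the cd does not affect the calling shell.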

3) Configure the parameters to the benchmark:


{noformat}
PROJECT=my_gcp_project
PKB_DIR=/path/to/PerfKitBenchmarker
PKB_BQ_TABLE=bq_dataset_to_save_results.wordcount_py36_beam216_pkb_results
BEAM_LOCATION=/path/to/clone/of/beam
BEAM_TARBALL=$BEAM_LOCATION/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz
TEMP_LOCATION=gs://some/temp/location/

{noformat}
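
Note that the run command in step 4 builds --output by appending a relative path directly to ${TEMP_LOCATION}, so the value needs its trailing slash. A small sanity check before a long run might look like the sketch below. check_params is a hypothetical helper, not something PKB or Beam provides.

```shell
# Hypothetical helper: sanity-check the step 3 parameters before a run.
# Usage: check_params "$TEMP_LOCATION" "$BEAM_TARBALL"
check_params() {
  # --output is formed as ${TEMP_LOCATION}temp-storage-..., so the value
  # must use the gs:// scheme and end with a slash.
  case "$1" in
    gs://*/) ;;
    *)
      echo "TEMP_LOCATION must look like gs://bucket/path/: $1" >&2
      return 1
      ;;
  esac
  # The SDK tarball from step 2 must already exist.
  if [ ! -f "$2" ]; then
    echo "SDK tarball not found: $2" >&2
    return 1
  fi
}
```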

4) Run the benchmark:


{noformat}
bash -c "python $PKB_DIR/pkb.py \
--project=${PROJECT} --dpb_log_level=INFO --bigquery_table=${PKB_BQ_TABLE} \
--k8s_get_retry_count=36 --k8s_get_wait_interval=10 --temp_dir=/tmp \
--beam_location=${BEAM_LOCATION} --official=true --dpb_service_zone=fake_zone \
--beam_sdk=python \
--benchmarks=beam_integration_benchmark \
--beam_it_class=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
--beam_it_module=:sdks:python:test-suites:dataflow:py36 \
--beam_prebuilt=true --beam_python_sdk_location=${BEAM_TARBALL} \
--beam_runner=TestDataflowRunner --beam_it_timeout=12000 \
'--beam_it_args=--project=${PROJECT},\
--staging_location=${TEMP_LOCATION},\
--temp_location=${TEMP_LOCATION},\
--input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000*,\
--output=${TEMP_LOCATION}temp-storage-for-end-to-end-tests/py-it-cloud/output,\
--expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710,\
--num_workers=10,--autoscaling_algorithm=NONE'"
{noformat}
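
The --beam_it_args value above is one comma-separated string of pipeline options. As an alternative to continuation lines inside nested quotes, the string could be assembled first and then passed as a single argument. This is a sketch only: join_by_comma is a hypothetical helper, the values are the placeholders from step 3, and only a subset of the options is shown.

```shell
# Hypothetical helper: join all arguments with commas.
# The body runs in a subshell so the IFS change does not leak to the caller.
join_by_comma() (
  IFS=,
  printf '%s\n' "$*"
)

# Placeholder values from step 3.
PROJECT=my_gcp_project
TEMP_LOCATION=gs://some/temp/location/

BEAM_IT_ARGS="$(join_by_comma \
  "--project=${PROJECT}" \
  "--staging_location=${TEMP_LOCATION}" \
  "--temp_location=${TEMP_LOCATION}" \
  "--num_workers=10" \
  "--autoscaling_algorithm=NONE")"

# Show the flag that would be passed to pkb.py as a single argument.
printf '%s\n' "--beam_it_args=${BEAM_IT_ARGS}"
```

With the value in a variable, the flag can be passed as "--beam_it_args=${BEAM_IT_ARGS}" without the inner single quotes.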







> Investigate possible performance regression of Wordcount 1GB batch benchmark 
> on Py3.
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-8198
>                 URL: https://issues.apache.org/jira/browse/BEAM-8198
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, testing
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>             Fix For: 2.16.0
>
>
> context: 
> https://lists.apache.org/thread.html/51e000f16481451c207c00ac5e881aa4a46fa020922eddffd00ad527@%3Cdev.beam.apache.org%3E
> Setting fix version to 2.16.0 to understand the cause, hopefully before the 
> vote.
> cc: [~altay] [~thw] [~markflyhigh]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
