[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=107478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107478 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 30/May/18 23:25
Start Date: 30/May/18 23:25
Worklog Time Spent: 10m
Work Description: chamikaramj closed pull request #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy b/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
index 7aa5c3251c0..62a2346fa17 100644
--- a/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
+++ b/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
@@ -140,7 +140,7 @@ private void create_filebasedio_performance_test_job(testConfiguration) {
 beam_extra_mvn_properties: '["filesystem=hdfs"]',
 bigquery_table : testConfiguration.bqTable,
 beam_options_config_file : makePathAbsolute('pkb-config.yml'),
-beam_kubernetes_scripts : makePathAbsolute('hdfs-single-datanode-cluster.yml') + ',' + makePathAbsolute('hdfs-single-datanode-cluster-for-local-dev.yml')
+beam_kubernetes_scripts : makePathAbsolute('hdfs-multi-datanode-cluster.yml')
 ]
 common_job_properties.setupKubernetes(delegate, namespace, kubeconfig)
 common_job_properties.buildPerformanceTest(delegate, argMap)
@@ -149,5 +149,5 @@ private void create_filebasedio_performance_test_job(testConfiguration) {
 }
 static def makePathAbsolute(String path) {
-return '"$WORKSPACE/src/.test-infra/kubernetes/hadoop/SmallITCluster/' + path + '"'
+return '"$WORKSPACE/src/.test-infra/kubernetes/hadoop/LargeITCluster/' + path + '"'
 }
\ No newline at end of file
diff --git a/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster-for-local-dev.yml b/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster-for-local-dev.yml
new file mode 100644
index 000..7cb891bcd99
--- /dev/null
+++ b/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster-for-local-dev.yml
@@ -0,0 +1,73 @@
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements. See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+# This cluster is intended to be run additionally to hdfs-multi-datanode-cluster.yml.
+# It provides an additional setup to access large hdfs cluster by DirectRunner or any
+# external application. Services created by this setup need to be properly included in
+# /etc/hosts file, so it is strongly suggested to run start-all.sh script instead of
+# running this file manually.
+#
+
+apiVersion: v1
+kind: Service
+metadata:
+  name: datanode-0
+  labels:
+    name: datanode-0
+spec:
+  ports:
+    - name: hdfs
+      port: 9000
+    - name: web
+      port: 50010
+  selector:
+    statefulset.kubernetes.io/pod-name: datanode-0
+  type: LoadBalancer
+
+---
+
+apiVersion: v1
+kind: Service
+metadata:
+  name: datanode-1
+  labels:
+    name: datanode-1
+spec:
+  ports:
+    - name: hdfs
+      port: 9000
+    - name: web
+      port: 50010
+  selector:
+    statefulset.kubernetes.io/pod-name: datanode-1
+  type: LoadBalancer
+
+---
+
+apiVersion: v1
+kind: Service
+metadata:
+  name: datanode-2
+  labels:
+    name: datanode-2
+spec:
+  ports:
+    - name: hdfs
+      port: 9000
+    - name: web
+      port: 50010
+  selector:
+    statefulset.kubernetes.io/pod-name: datanode-2
+  type: LoadBalancer
diff --git a/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml b/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml
new file mode 100644
index 000..e796243d389
--- /dev/null
+++
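The header comment in hdfs-multi-datanode-cluster-for-local-dev.yml says the Services it creates have to be listed in /etc/hosts. The diff does not show start-all.sh, so the following is only a rough Python sketch of what such entries could look like, not a description of that script. The function name `format_hosts_entries` and the IP addresses (taken from the 203.0.113.0/24 documentation range) are assumptions made for illustration.

```python
from typing import Dict, List


def format_hosts_entries(service_ips: Dict[str, str]) -> List[str]:
    """Return one /etc/hosts line ("<ip> <hostname>") per service, sorted by name."""
    return [f"{ip} {name}" for name, ip in sorted(service_ips.items())]


# Hypothetical external IPs for the three LoadBalancer Services in the diff.
example_ips = {
    "datanode-1": "203.0.113.11",
    "datanode-0": "203.0.113.10",
    "datanode-2": "203.0.113.12",
}

for line in format_hosts_entries(example_ips):
    print(line)
```

In a real setup the addresses would come from the cluster once each LoadBalancer Service has been assigned an external IP.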
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=107477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107477 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 30/May/18 23:25
Start Date: 30/May/18 23:25
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-393351110

LGTM. I'll go ahead and merge. Please consider removing the pom.xml updates in a separate PR if Gradle-based execution does not need them.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 107477)
Time Spent: 10h 10m (was: 10h)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
> Issue Type: Test
> Components: sdk-java-core
> Reporter: Chamikara Jayalath
> Assignee: Szymon Nieradka
> Priority: Major
> Time Spent: 10h 10m
> Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do the following.
> (1) Execute Beam tests using PerfkitBenchmarker
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results.
> I think it will be useful to add performance tests for commonly used file-based I/O PTransforms using this framework. I suggest looking into the following formats initially.
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests for various Beam runners (Direct, Dataflow, Flink, Spark, etc.) and file-systems (GCS, local, HDFS, etc.) easily.
> In the initial version, tests can be made manually triggerable for PRs through Jenkins. Later, we could make some of these tests run periodically and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=106689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-106689 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 29/May/18 15:55
Start Date: 29/May/18 15:55
Worklog Time Spent: 10m
Work Description: chamikaramj commented on a change in pull request #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#discussion_r191479040

## File path: sdks/java/io/file-based-io-tests/pom.xml ##
@@ -248,6 +248,89 @@
+
+io-it-hdfs-large
+
+io-it-suite-hdfs-large
+
+
+
+ ${project.parent.parent.parent.parent.basedir}

Review comment:
Why do we need this Maven config? I assume the test can be run using Gradle now, right?

Issue Time Tracking
---
Worklog Id: (was: 106689)
Time Spent: 10h (was: 9h 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105700 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 24/May/18 19:07
Start Date: 24/May/18 19:07
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391825986

Works like a charm! Alan Myrvold updated the Kubernetes cluster, so I finally tested it on our Jenkins. @chamikaramj let me know if I can help with the review.

Issue Time Tracking
---
Worklog Id: (was: 105700)
Time Spent: 9h 50m (was: 9h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105698 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 24/May/18 18:52
Start Date: 24/May/18 18:52
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391821501

Run Java XmlIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 105698)
Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105692 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 24/May/18 18:41
Start Date: 24/May/18 18:41
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391818397

Run Java CompressedTextIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 105692)
Time Spent: 9.5h (was: 9h 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105690&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105690 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 24/May/18 18:28
Start Date: 24/May/18 18:28
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391814321

Run Java TextIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 105690)
Time Spent: 9h 20m (was: 9h 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105689 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 24/May/18 18:24
Start Date: 24/May/18 18:24
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391813196

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 105689)
Time Spent: 9h 10m (was: 9h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105049 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 23/May/18 10:47
Start Date: 23/May/18 10:47
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391304075

@chamikaramj It's ready for review.

I expected the error with deletion, so I created a JIRA issue describing why we need to upgrade kubectl to at least version 1.9.3 ([link to jira](https://issues.apache.org/jira/browse/BEAM-4362)).

I didn't expect that we are using a Kubernetes server version lower than 1.9, which is a problem because we can't use newer Kubernetes features like StatefulSets (they bring lots of improvements over ReplicationControllers and simplify the Hadoop configuration a lot). I created a JIRA issue for this (https://issues.apache.org/jira/browse/BEAM-4390). I also mentioned the security issues found on 20 May that need to be patched.

@chamikaramj who has access to the GCP UI and can help with upgrading the Kubernetes cluster in the apache-beam-testing project?

Issue Time Tracking
---
Worklog Id: (was: 105049)
Time Spent: 9h (was: 8h 50m)
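The comment above hinges on two minimum versions: kubectl 1.9.3 on the client side and Kubernetes 1.9 on the server side. A small Python sketch of such a check follows, assuming version strings in the usual `vMAJOR.MINOR.PATCH` form with an optional suffix. The helper names and the sample version strings are illustrative assumptions, not values reported in the thread.

```python
import re
from typing import Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Extract the leading numeric components of a version such as 'v1.9.3'.

    Anything after the dotted number (for example '-gke.0') is ignored, and
    missing components are padded with zeros so '1.9' equals '1.9.0'.
    """
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = [int(part) for part in match.group(1).split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(actual: str, minimum: str) -> bool:
    """Return True when 'actual' is at least 'minimum', compared component-wise."""
    return parse_version(actual) >= parse_version(minimum)


# Hypothetical client and server versions, checked against the thread's minimums.
print(meets_minimum("v1.9.3", "1.9.3"))
print(meets_minimum("v1.8.8-gke.0", "1.9"))
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating "1.10" as older than "1.9".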
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105045 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 23/May/18 10:25
Start Date: 23/May/18 10:25
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391298837

Run Java TextIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 105045)
Time Spent: 8h 50m (was: 8h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105043 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 23/May/18 10:21
Start Date: 23/May/18 10:21
Worklog Time Spent: 10m
Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391297887

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 105043)
Time Spent: 8h 40m (was: 8.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=104231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-104231 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 21/May/18 20:19
Start Date: 21/May/18 20:19
Worklog Time Spent: 10m
Work Description: szewi opened a new pull request #5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441

…uster. All files have meaningful descriptions.

This PR switches all Jenkins jobs that use HDFS to the large HDFS cluster. The large cluster is configured with 1 namenode and 3 datanodes. StatefulSet, which became stable in Kubernetes 1.9, is used instead of ReplicationController as a better solution to the cluster auto-configuration issues. The Docker image with Hadoop 2.7.1 is hosted on the public Docker Hub and is pulled from there.

Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.

Issue Time Tracking
---
Worklog Id: (was: 104231)
Time Spent: 8.5h (was: 8h 20m)
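The PR description mentions 3 datanodes managed by a StatefulSet. Kubernetes names StatefulSet pods `<name>-<ordinal>` and labels each one with `statefulset.kubernetes.io/pod-name`, which is what allows a Service to target one specific datanode. The sketch below builds three per-pod Service definitions as plain Python dictionaries, mirroring the fields visible in the for-local-dev manifest of this pull request. It assumes the StatefulSet is called `datanode` (inferred from the Service names, not confirmed by the PR text), and the function name `datanode_service` is hypothetical.

```python
import json
from typing import Any, Dict, List


def datanode_service(ordinal: int, statefulset_name: str = "datanode") -> Dict[str, Any]:
    """Build a LoadBalancer Service that selects a single StatefulSet pod."""
    # StatefulSet pods get stable names of the form <statefulset-name>-<ordinal>.
    pod_name = f"{statefulset_name}-{ordinal}"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": pod_name, "labels": {"name": pod_name}},
        "spec": {
            "ports": [
                {"name": "hdfs", "port": 9000},
                {"name": "web", "port": 50010},
            ],
            # This label is set on every StatefulSet pod by the controller.
            "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
            "type": "LoadBalancer",
        },
    }


# One Service per datanode, matching the 3-datanode cluster described above.
services: List[Dict[str, Any]] = [datanode_service(i) for i in range(3)]
print(json.dumps(services[0], indent=2))
```

A ReplicationController gives its pods random name suffixes, so this kind of one-Service-per-pod selection would not be possible with it; the stable names are one reason the StatefulSet simplifies the configuration.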
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87136 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 15:49 Start Date: 03/Apr/18 15:49 Worklog Time Spent: 10m Work Description: chamikaramj closed pull request #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):
diff --git a/.test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy b/.test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy
new file mode 100644
index 000..19c0f074e80
--- /dev/null
+++ b/.test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+def testsConfigurations = [
+[
+jobName : 'beam_PerformanceTests_TextIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for TextIOIT on HDFS',
+itClass : 'org.apache.beam.sdk.io.text.TextIOIT',
+bqTable : 'beam_performance.textioit_hdfs_pkb_results',
+prCommitStatusName: 'Java TextIO Performance Test on HDFS',
+prTriggerPhase: 'Run Java TextIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100'
+]
+
+],
+[
+jobName: 'beam_PerformanceTests_Compressed_TextIOIT_HDFS',
+jobDescription : 'Runs PerfKit tests for TextIOIT with GZIP compression on HDFS',
+itClass: 'org.apache.beam.sdk.io.text.TextIOIT',
+bqTable: 'beam_performance.compressed_textioit_hdfs_pkb_results',
+prCommitStatusName : 'Java CompressedTextIO Performance Test on HDFS',
+prTriggerPhase : 'Run Java CompressedTextIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100',
+compressionType: 'GZIP'
+]
+],
+[
+jobName : 'beam_PerformanceTests_AvroIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for AvroIOIT on HDFS',
+itClass : 'org.apache.beam.sdk.io.avro.AvroIOIT',
+bqTable : 'beam_performance.avroioit_hdfs_pkb_results',
+prCommitStatusName: 'Java AvroIO Performance Test on HDFS',
+prTriggerPhase: 'Run Java AvroIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100'
+]
+],
+// TODO(BEAM-3945) TFRecord performance test is failing only when running on hdfs.
+// We need to fix this before enabling this job on jenkins.
+//[
+//jobName : 'beam_PerformanceTests_TFRecordIOIT_HDFS',
+//jobDescription: 'Runs PerfKit tests for beam_PerformanceTests_TFRecordIOIT on HDFS',
+//itClass : 'org.apache.beam.sdk.io.tfrecord.TFRecordIOIT',
+//bqTable : 'beam_performance.tfrecordioit_hdfs_pkb_results',
+//prCommitStatusName: 'Java TFRecordIO Performance Test on HDFS',
+//prTriggerPhase: 'Run Java TFRecordIO Performance Test HDFS',
+//extraPipelineArgs: [
+//numberOfRecords: '100'
+//]
+//],
+[
+jobName : 'beam_PerformanceTests_XmlIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for
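The configuration in the diff above is a plain Groovy list of maps, one map per Jenkins job. As a minimal sketch of how such a list might be consumed, the snippet below copies two entries from the diff and renders their `extraPipelineArgs`. The helper `renderPipelineArgs` and the `--key=value` output format are hypothetical and only for illustration; they are not taken from the PR.

```groovy
// Two entries copied from the diff above; the other entries and keys are omitted.
def testsConfigurations = [
    [
        jobName          : 'beam_PerformanceTests_TextIOIT_HDFS',
        itClass          : 'org.apache.beam.sdk.io.text.TextIOIT',
        extraPipelineArgs: [numberOfRecords: '100']
    ],
    [
        jobName          : 'beam_PerformanceTests_Compressed_TextIOIT_HDFS',
        itClass          : 'org.apache.beam.sdk.io.text.TextIOIT',
        extraPipelineArgs: [numberOfRecords: '100', compressionType: 'GZIP']
    ]
]

// Hypothetical helper (an assumption, not part of the PR): renders a map of
// pipeline arguments as a bracketed, comma-separated list of quoted flags.
def renderPipelineArgs(Map pipelineArgs) {
    '[' + pipelineArgs.collect { key, value -> "\"--${key}=${value}\"" }.join(',') + ']'
}

// One line per configured job: job name followed by its rendered arguments.
testsConfigurations.each { config ->
    println "${config.jobName}: ${renderPipelineArgs(config.extraPipelineArgs)}"
}
```

Keeping each job as a map entry means a job can be disabled by commenting out its entry, which is what the diff does for the TFRecordIO job.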
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87135 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 15:48 Start Date: 03/Apr/18 15:48 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-378298429 LGTM. Thanks. Issue Time Tracking --- Worklog Id: (was: 87135) Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87080 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 13:16 Start Date: 03/Apr/18 13:16 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-378246261 This error on jenkins is not PR related: ```Workflow failed. Causes: Project apache-beam-testing has insufficient quota(s) to execute this workflow with 1 instances in region us-central1. Quota summary (required/available): 1/1435 instances, 1/0 CPUs, 250/13800 disk GB, 0/1998 SSD disk GB, 1/65 instance groups, 1/15 managed instance groups, 1/41 instance templates, 1/287 in-use IP addresses.``` Issue Time Tracking --- Worklog Id: (was: 87080) Time Spent: 8h (was: 7h 50m)
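The quota summary in the error quoted above lists each resource as required/available, so the cause of the failure can be read off by finding the entries where the required amount exceeds the available amount. A minimal sketch of that check follows; the helper name `exhaustedQuotas` is hypothetical, and the parsing assumes the summary keeps exactly the format shown in the quoted message.

```groovy
// Quota summary copied from the error message quoted above.
def summary = '1/1435 instances, 1/0 CPUs, 250/13800 disk GB, 0/1998 SSD disk GB, ' +
        '1/65 instance groups, 1/15 managed instance groups, 1/41 instance templates, ' +
        '1/287 in-use IP addresses.'

// Hypothetical helper: returns the names of resources whose required amount
// is greater than the available amount.
def exhaustedQuotas(String text) {
    // Each entry looks like "<required>/<available> <resource name>".
    def entries = (text =~ /(\d+)\/(\d+) ([^,.]+)/).collect { all, required, available, name ->
        [required: required.toLong(), available: available.toLong(), name: name]
    }
    entries.findAll { it.required > it.available }.collect { it.name }
}

println exhaustedQuotas(summary)   // prints [CPUs]
```

Only the CPU quota is exhausted (1 required, 0 available), which is consistent with the comment that the failure is unrelated to the PR.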
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87069=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87069 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 13:01 Start Date: 03/Apr/18 13:01 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-378241629 Run Java TextIO Performance Test HDFS Issue Time Tracking --- Worklog Id: (was: 87069) Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87066 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 12:45 Start Date: 03/Apr/18 12:45 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-378236853 Run Seed Job Issue Time Tracking --- Worklog Id: (was: 87066) Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87005 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 09:46 Start Date: 03/Apr/18 09:46 Worklog Time Spent: 10m Work Description: szewi commented on a change in pull request #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#discussion_r178768125
## File path: .test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy ##
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+def testsConfigurations = [
+[
+jobName : 'beam_PerformanceTests_TextIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for TextIOIT on HDFS',
+itClass : 'org.apache.beam.sdk.io.text.TextIOIT',
+bqTable : 'beam_performance.textioit_hdfs_pkb_results',
+prCommitStatusName: 'Java TextIO Performance Test on HDFS',
+prTriggerPhase: 'Run Java TextIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100'
+]
+
+],
+[
+jobName: 'beam_PerformanceTests_Compressed_TextIOIT_HDFS',
+jobDescription : 'Runs PerfKit tests for TextIOIT with GZIP compression on HDFS',
+itClass: 'org.apache.beam.sdk.io.text.TextIOIT',
+bqTable: 'beam_performance.compressed_textioit_hdfs_pkb_results',
+prCommitStatusName : 'Java CompressedTextIO Performance Test on HDFS',
+prTriggerPhase : 'Run Java CompressedTextIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100',
+compressionType: 'GZIP'
+]
+],
+[
+jobName : 'beam_PerformanceTests_AvroIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for AvroIOIT on HDFS',
+itClass : 'org.apache.beam.sdk.io.avro.AvroIOIT',
+bqTable : 'beam_performance.avroioit_hdfs_pkb_results',
+prCommitStatusName: 'Java AvroIO Performance Test on HDFS',
+prTriggerPhase: 'Run Java AvroIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100'
+]
+],
+//[
+//jobName : 'beam_PerformanceTests_TFRecordIOIT_HDFS',
+//jobDescription: 'Runs PerfKit tests for beam_PerformanceTests_TFRecordIOIT on HDFS',
Review comment: OK.
Issue Time Tracking --- Worklog Id: (was: 87005) Time Spent: 7.5h (was: 7h 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87004 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 09:45 Start Date: 03/Apr/18 09:45 Worklog Time Spent: 10m Work Description: szewi commented on a change in pull request #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#discussion_r178768038 ## File path: sdks/java/io/file-based-io-tests/pom.xml ## @@ -294,6 +294,11 @@ ${apache.hadoop.version} runtime + +javax.xml.bind +jaxb-api Review comment: The Gradle file was updated in #4870. Diff: https://github.com/apache/beam/pull/4870/files#diff-76558bfd90b93d37ab9369862732ffd6 It is already merged to the master branch. This Jenkins job works because, when PerfKit runs the tests on Jenkins, Jenkins clones the master branch and runs the tests against that code. Let me know if you would still prefer that I add this line to the Gradle file here. Issue Time Tracking --- Worklog Id: (was: 87004) Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=86934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86934 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 04:25 Start Date: 03/Apr/18 04:25 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#discussion_r178711210 ## File path: sdks/java/io/file-based-io-tests/pom.xml ## @@ -294,6 +294,11 @@ ${apache.hadoop.version} runtime + +javax.xml.bind +jaxb-api Review comment: Please update Gradle file as well. Issue Time Tracking --- Worklog Id: (was: 86934)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=86933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86933 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 04:25 Start Date: 03/Apr/18 04:25 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#discussion_r178711242
## File path: .test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy ##
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+def testsConfigurations = [
+[
+jobName : 'beam_PerformanceTests_TextIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for TextIOIT on HDFS',
+itClass : 'org.apache.beam.sdk.io.text.TextIOIT',
+bqTable : 'beam_performance.textioit_hdfs_pkb_results',
+prCommitStatusName: 'Java TextIO Performance Test on HDFS',
+prTriggerPhase: 'Run Java TextIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100'
+]
+
+],
+[
+jobName: 'beam_PerformanceTests_Compressed_TextIOIT_HDFS',
+jobDescription : 'Runs PerfKit tests for TextIOIT with GZIP compression on HDFS',
+itClass: 'org.apache.beam.sdk.io.text.TextIOIT',
+bqTable: 'beam_performance.compressed_textioit_hdfs_pkb_results',
+prCommitStatusName : 'Java CompressedTextIO Performance Test on HDFS',
+prTriggerPhase : 'Run Java CompressedTextIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100',
+compressionType: 'GZIP'
+]
+],
+[
+jobName : 'beam_PerformanceTests_AvroIOIT_HDFS',
+jobDescription: 'Runs PerfKit tests for AvroIOIT on HDFS',
+itClass : 'org.apache.beam.sdk.io.avro.AvroIOIT',
+bqTable : 'beam_performance.avroioit_hdfs_pkb_results',
+prCommitStatusName: 'Java AvroIO Performance Test on HDFS',
+prTriggerPhase: 'Run Java AvroIO Performance Test HDFS',
+extraPipelineArgs: [
+numberOfRecords: '100'
+]
+],
+//[
+//jobName : 'beam_PerformanceTests_TFRecordIOIT_HDFS',
+//jobDescription: 'Runs PerfKit tests for beam_PerformanceTests_TFRecordIOIT on HDFS',
Review comment: Please add a comment with a link to a JIRA explaining why this is commented out.
Issue Time Tracking --- Worklog Id: (was: 86933) Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=86932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86932 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 03/Apr/18 04:22 Start Date: 03/Apr/18 04:22 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-378123934 Run seed job Issue Time Tracking --- Worklog Id: (was: 86932) Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84432 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 26/Mar/18 16:21 Start Date: 26/Mar/18 16:21 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-376225174 @chamikaramj if the seed job passes, all HDFS jobs will be active except the TFRecord HDFS test job, which will be disabled. Issue Time Tracking --- Worklog Id: (was: 84432) Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84430 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 26/Mar/18 16:19 Start Date: 26/Mar/18 16:19 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-376224379 Run Seed Job Issue Time Tracking --- Worklog Id: (was: 84430) Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84415 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 26/Mar/18 15:49 Start Date: 26/Mar/18 15:49 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-376214711 Thanks. Yeah, I think it's fine to proceed with a JIRA created for TfRecord. Issue Time Tracking --- Worklog Id: (was: 84415) Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84387 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 26/Mar/18 15:16 Start Date: 26/Mar/18 15:16 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-376202797 Hi Cham, it's ready for review. The Kubernetes clusters are created and then deleted afterwards, even when a build fails. I triggered two builds to run at the same time and there was no interference between them. Tests run smoothly on HDFS for TextIO, CompressedText, Avro and XmlIO. There is an issue only for TFRecord: the job at https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/3/console fails with "java.lang.IllegalStateException: Not a valid TFRecord. Fewer than 12 bytes." and "java.lang.IllegalStateException: Invalid data". This happens only when running the TFRecord tests on HDFS, and the same happens when running the tests from my local machine. It seems to be related to how data is accessed in the TFRecord tests: https://github.com/apache/beam/blob/597e3f92bc8be692d5d8e8040b33ce0c77350fa2/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java#L110 and https://github.com/apache/beam/blob/597e3f92bc8be692d5d8e8040b33ce0c77350fa2/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java#L113 I investigated it, and even if I specify the exact filename on HDFS (`.apply(TFRecordIO.read().from("hdfs://35.225.39.200:9000/TFRecord_1522073710252-0-of-6.tfrecord").withCompression(AUTO))`) instead of filenamePattern, I still get the same invalid-data error. I can create a bug in JIRA describing the issue with steps to reproduce, and look at it separately. Should we proceed with this PR and simply temporarily disable/remove the TFRecord job? WDYT? Issue Time Tracking --- Worklog Id: (was: 84387) Time Spent: 6h 20m (was: 6h 10m)
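Editorial sketch for the TFRecord failure reported in the message above: the "Fewer than 12 bytes" error presumably refers to the fixed record header of the TFRecord format, which (as the format is commonly documented) is an 8-byte little-endian payload length followed by a 4-byte masked CRC32C of that length. The class below is a hypothetical stand-alone diagnostic, not Beam's TFRecordIO code and not something posted in this thread; the class and method names are made up for illustration, and it needs Java 9 or later for java.util.zip.CRC32C. It only shows how the first 12 bytes of a suspect file could be checked by hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32C;

/** Hypothetical diagnostic helper (not part of Beam): validates a TFRecord header. */
public class TfRecordHeaderCheck {
  /** Assumed header layout: 8-byte length + 4-byte masked CRC32C of the length. */
  public static final int HEADER_BYTES = 12;

  /** Masked CRC32C as described for the TFRecord format (rotate right by 15, add a constant). */
  public static int maskedCrc32c(byte[] data, int offset, int length) {
    CRC32C crc = new CRC32C();
    crc.update(data, offset, length);
    int value = (int) crc.getValue();
    return ((value >>> 15) | (value << 17)) + 0xa282ead8;
  }

  /** Builds a header for the given payload length; used here only to produce test input. */
  public static byte[] encodeHeader(long payloadLength) {
    ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES).order(ByteOrder.LITTLE_ENDIAN);
    buf.putLong(payloadLength);
    buf.putInt(maskedCrc32c(buf.array(), 0, 8));
    return buf.array();
  }

  /** Returns the payload length if the header is well formed, otherwise throws. */
  public static long readPayloadLength(byte[] header) {
    if (header.length < HEADER_BYTES) {
      throw new IllegalStateException("Header has fewer than 12 bytes: " + header.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(header, 0, HEADER_BYTES).order(ByteOrder.LITTLE_ENDIAN);
    long length = buf.getLong();
    int storedCrc = buf.getInt();
    if (storedCrc != maskedCrc32c(header, 0, 8)) {
      throw new IllegalStateException("Length CRC mismatch");
    }
    return length;
  }

  public static void main(String[] args) {
    // Round-trip a synthetic header; a real check would read the first 12 bytes of the file.
    System.out.println(readPayloadLength(encodeHeader(42L)));
  }
}
```

If the first 12 bytes of a file written to HDFS pass this check while the pipeline still fails, the problem is more likely in how the reader obtains bytes from the file system than in the written data; that inference is a suggestion only and was not verified in this thread.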
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83990 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 24/Mar/18 11:44 Start Date: 24/Mar/18 11:44 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-375877853 Run Java TFRecordIO Performance Test HDFS Issue Time Tracking --- Worklog Id: (was: 83990) Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83989 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 24/Mar/18 11:38 Start Date: 24/Mar/18 11:38 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-375877105 Run Java XmlIO Performance Test HDFS Issue Time Tracking --- Worklog Id: (was: 83989) Time Spent: 6h (was: 5h 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83988 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 24/Mar/18 11:31 Start Date: 24/Mar/18 11:31 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-375876057 Run Java AvroIO Performance Test HDFS Issue Time Tracking --- Worklog Id: (was: 83988) Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83987 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 24/Mar/18 11:26 Start Date: 24/Mar/18 11:26 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-375875453 Run Java CompressedTextIO Performance Test HDFS Issue Time Tracking --- Worklog Id: (was: 83987) Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83985 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 24/Mar/18 11:18 Start Date: 24/Mar/18 11:18 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-375874415 Run Java TextIO Performance Test HDFS Issue Time Tracking --- Worklog Id: (was: 83985) Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83983 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 24/Mar/18 11:13 Start Date: 24/Mar/18 11:13 Worklog Time Spent: 10m Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-375873823 Run Seed Job Issue Time Tracking --- Worklog Id: (was: 83983) Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83726 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 23/Mar/18 18:10 Start Date: 23/Mar/18 18:10 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS. URL: https://github.com/apache/beam/pull/4861#issuecomment-375754624 Kamil, please let me know when this is ready for review. Issue Time Tracking --- Worklog Id: (was: 83726) Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83710 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 23/Mar/18 17:37 Start Date: 23/Mar/18 17:37 Worklog Time Spent: 10m Work Description: lukecwik closed pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/build.gradle b/build.gradle index 69502ab77c0..9263f3a073d 100644 --- a/build.gradle +++ b/build.gradle @@ -44,6 +44,7 @@ def pubsub_grpc_version = "0.1.18" def apex_core_version = "3.6.0" def apex_malhar_version = "3.4.0" def postgres_version = "9.4.1212.jre7" +def jaxb_api_version = "2.2.12" // A map of maps containing common libraries used per language. 
To use: // dependencies { @@ -125,6 +126,7 @@ ext.library = [ jackson_dataformat_cbor: "com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:$jackson_version", jackson_dataformat_yaml: "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:$jackson_version", jackson_module_scala: "com.fasterxml.jackson.module:jackson-module-scala_2.11:$jackson_version", +jaxb_api: "javax.xml.bind:jaxb-api:$jaxb_api_version", joda_time: "joda-time:joda-time:2.4", junit: "junit:junit:4.12", kafka_clients: "org.apache.kafka:kafka-clients:1.0.0", diff --git a/pom.xml b/pom.xml index 9573a07767f..0e10ac92a86 100644 --- a/pom.xml +++ b/pom.xml @@ -185,6 +185,7 @@ -Xpkginfo:always nothing 0.20.0 +2.2.12 kubectl @@ -1498,6 +1499,13 @@ tests test + + +javax.xml.bind +jaxb-api +${jaxb-api.version} + + diff --git a/sdks/java/io/file-based-io-tests/build.gradle b/sdks/java/io/file-based-io-tests/build.gradle index e797172850a..9e6eb0b2d26 100644 --- a/sdks/java/io/file-based-io-tests/build.gradle +++ b/sdks/java/io/file-based-io-tests/build.gradle @@ -39,4 +39,5 @@ dependencies { shadowTest library.java.guava shadowTest library.java.junit shadowTest library.java.hamcrest_core + shadowTest library.java.jaxb_api } diff --git a/sdks/java/io/file-based-io-tests/pom.xml b/sdks/java/io/file-based-io-tests/pom.xml index c66537c7b15..3de4ba55ae1 100644 --- a/sdks/java/io/file-based-io-tests/pom.xml +++ b/sdks/java/io/file-based-io-tests/pom.xml @@ -294,6 +294,11 @@ ${apache.hadoop.version} runtime + +javax.xml.bind +jaxb-api +test + diff --git a/sdks/java/io/xml/build.gradle b/sdks/java/io/xml/build.gradle index 5e66ad96f35..af07424990f 100644 --- a/sdks/java/io/xml/build.gradle +++ b/sdks/java/io/xml/build.gradle @@ -27,6 +27,7 @@ dependencies { shadow library.java.stax2_api shadow library.java.findbugs_jsr305 shadow library.java.woodstox_core_asl + shadowTest library.java.jaxb_api testCompile project(path: ":sdks:java:core", configuration: "shadowTest") testCompile project(path: 
":runners:direct-java", configuration: "shadow") testCompile library.java.junit diff --git a/sdks/java/io/xml/pom.xml b/sdks/java/io/xml/pom.xml index d0a3d5f7396..85ecd4ec2a2 100644 --- a/sdks/java/io/xml/pom.xml +++ b/sdks/java/io/xml/pom.xml @@ -109,6 +109,12 @@ hamcrest-library test + + + javax.xml.bind + jaxb-api + test + Issue Time Tracking --- Worklog Id: (was: 83710) Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83665 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 23/Mar/18 16:43 Start Date: 23/Mar/18 16:43 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#issuecomment-375727745 @lukecwik , is this good to go ? Issue Time Tracking --- Worklog Id: (was: 83665) Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83588=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83588 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 23/Mar/18 12:55 Start Date: 23/Mar/18 12:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#issuecomment-375656292 LGTM Issue Time Tracking --- Worklog Id: (was: 83588) Time Spent: 4h 40m (was: 4.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83568 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 23/Mar/18 11:52 Start Date: 23/Mar/18 11:52 Worklog Time Spent: 10m Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#issuecomment-375635415 @lukecwik @chamikaramj it should be fine now. Can we merge this? Just a reminder that it is a blocker for continuing work on #4861. Issue Time Tracking --- Worklog Id: (was: 83568) Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82930 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 21/Mar/18 21:36 Start Date: 21/Mar/18 21:36 Worklog Time Spent: 10m Work Description: szewi commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r176246299 ## File path: sdks/java/io/xml/build.gradle ## @@ -27,6 +27,7 @@ dependencies { shadow library.java.stax2_api shadow library.java.findbugs_jsr305 shadow library.java.woodstox_core_asl + shadow library.java.jaxb_api Review comment: I was also thinking about `testRuntime`, but `shadowTest` seems like the right dependency type, consistent with the intended scope that is also declared in sdks/java/io/xml/pom.xml. Issue Time Tracking --- Worklog Id: (was: 82930) Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82904 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 21/Mar/18 20:44 Start Date: 21/Mar/18 20:44 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r176231438 ## File path: pom.xml ## @@ -1498,6 +1499,14 @@ tests test + + + <groupId>javax.xml.bind</groupId> + <artifactId>jaxb-api</artifactId> + <version>${jaxb-api.version}</version> + <scope>test</scope> Review comment: You shouldn't need to specify the scope. Issue Time Tracking --- Worklog Id: (was: 82904)
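One possible reading of the review comment above, assuming the new entry sits in the root POM's dependencyManagement section: a managed entry only needs coordinates and a version. Maven applies a managed scope to every module that omits its own, so leaving the scope out keeps that choice with each consuming module. The module fragment below is hypothetical and is not taken from the Beam POMs.

```xml
<!-- Root pom.xml, inside dependencyManagement/dependencies (sketch):
     coordinates and version only, no scope element. -->
<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>${jaxb-api.version}</version>
</dependency>

<!-- Hypothetical module that needs JAXB only for its tests:
     it states the scope itself and takes the version from the root. -->
<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <scope>test</scope>
</dependency>
```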
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82903 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 21/Mar/18 20:44 Start Date: 21/Mar/18 20:44 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r176231714 ## File path: sdks/java/io/xml/build.gradle ## @@ -27,6 +27,7 @@ dependencies { shadow library.java.stax2_api shadow library.java.findbugs_jsr305 shadow library.java.woodstox_core_asl + shadow library.java.jaxb_api Review comment: `shadowTest` to match the same intended scope inserted in the pom.xml? Issue Time Tracking --- Worklog Id: (was: 82903) Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82280 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 20/Mar/18 14:25 Start Date: 20/Mar/18 14:25 Worklog Time Spent: 10m Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#issuecomment-374616925 Builds on Jenkins failed, but it's not my fault. It seems the fix for this, https://github.com/apache/beam/pull/4902, was merged a few hours ago. @chamikaramj should I rebase and squash all commits? I built it using Gradle and tested it locally by running `./gradlew --info :sdks:java:io:xml:test`, and the unit tests for XmlIO passed. `./gradlew --info :sdks:java:io:file-based-io-tests:test` is failing with another, unrelated error: > Could not get unknown property 'sourceSets' for project ':runners:direct-java' of type org.gradle.api.Project. but fixing it is out of scope for this PR. WDYT? Issue Time Tracking --- Worklog Id: (was: 82280) Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81267 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 16/Mar/18 17:36 Start Date: 16/Mar/18 17:36 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#issuecomment-373789471 Hi Kamil, it looks like sdks/java/io/xml is the only component that currently uses JAXB. If tests pass with 2.2.12, I'm fine with adding a root-level dependency on that version and making both sdks/java/io/xml and sdks/java/io/file-based-io-tests use it. Also, please update the Gradle files accordingly. Issue Time Tracking --- Worklog Id: (was: 81267) Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81113 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 16/Mar/18 08:41 Start Date: 16/Mar/18 08:41 Worklog Time Spent: 10m Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#issuecomment-373642375 Just two comments from me: 1. I was able to run those file-based tests on HDFS successfully before the XmlIO tests were added to the file-based package. Once the XmlIO tests were added to the file-based tests, the mvn dependency issue appeared. 2. I ran those tests with jaxb-api versions 2.2.2 and 2.2.3 too. I also tested with 2.2.11, the version suggested by my IDE, and with 2.2.12, the latest 2.2.x version, and the tests run smoothly with both. In my opinion 2.2.3 is old enough to justify changing it to 2.2.11 or 2.2.12, but I am not sure about the consequences for the other Beam components. Issue Time Tracking --- Worklog Id: (was: 81113) Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81015 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 15/Mar/18 23:21 Start Date: 15/Mar/18 23:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r174959895 ## File path: pom.xml ## @@ -186,6 +186,8 @@ nothing 0.20.0 Review comment: Please remove the empty line. Issue Time Tracking --- Worklog Id: (was: 81015) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81014 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 15/Mar/18 23:21 Start Date: 15/Mar/18 23:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r174959913 ## File path: sdks/java/io/file-based-io-tests/pom.xml ## @@ -294,6 +294,12 @@ <version>${apache.hadoop.version}</version> <scope>runtime</scope> + + <groupId>javax.xml.bind</groupId> + <artifactId>jaxb-api</artifactId> Review comment: Thanks. Can you also define the JAXB dependencies from https://github.com/apache/beam/blob/master/sdks/java/io/xml/pom.xml at the root level and update that component to use the version defined at the root level? Also, can't we use 2.2.3 instead of 2.2.2, which seems to be pretty old? I think we could run into issues if we have to use both 2.2.0 and 2.2.3 for different components. Issue Time Tracking --- Worklog Id: (was: 81014) Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81016 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 15/Mar/18 23:21 Start Date: 15/Mar/18 23:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#issuecomment-373553567 cc: @kennknowles @lukecwik Issue Time Tracking --- Worklog Id: (was: 81016) Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81005 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 15/Mar/18 22:55 Start Date: 15/Mar/18 22:55 Worklog Time Spent: 10m Work Description: szewi commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r174956941 ## File path: sdks/java/io/file-based-io-tests/pom.xml ## @@ -294,6 +294,12 @@ <version>${apache.hadoop.version}</version> <scope>runtime</scope> + + <groupId>javax.xml.bind</groupId> + <artifactId>jaxb-api</artifactId> Review comment: No problem, I will move it to beam/pom.xml, but I need to run local tests before another pull. Issue Time Tracking --- Worklog Id: (was: 81005) Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81004 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 15/Mar/18 22:51 Start Date: 15/Mar/18 22:51 Worklog Time Spent: 10m Work Description: szewi commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r174956360 ## File path: sdks/java/io/file-based-io-tests/pom.xml ## @@ -294,6 +294,12 @@ <version>${apache.hadoop.version}</version> <scope>runtime</scope> + + <groupId>javax.xml.bind</groupId> + <artifactId>jaxb-api</artifactId> Review comment: Added to the dependencyManagement of java/io. The version was removed from the file-based pom.xml because it is defined in dependencyManagement, and that one will be used. To be consistent with the scopes in dependencyManagement, the scope was also changed to test, which was sufficient to run the tests. Issue Time Tracking --- Worklog Id: (was: 81004) Time Spent: 2h 50m (was: 2h 40m)
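The arrangement described in the comment above can be sketched roughly as follows. This is a minimal illustration and not the actual Beam POMs: the property value is an assumption (2.2.12 is only one of the versions mentioned in this thread), the parent is shown generically rather than as java/io or the root POM, and all surrounding elements are omitted. The parent pins the jaxb-api version once, and the module declares the dependency without a version so that Maven resolves it from the managed entry.

```xml
<!-- Parent pom.xml (sketch): one place that fixes the jaxb-api version. -->
<properties>
  <!-- Assumed value; the thread mentions 2.2.2, 2.2.3, 2.2.11 and 2.2.12. -->
  <jaxb-api.version>2.2.12</jaxb-api.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>javax.xml.bind</groupId>
      <artifactId>jaxb-api</artifactId>
      <version>${jaxb-api.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- sdks/java/io/file-based-io-tests/pom.xml (sketch): no version element,
     so the managed version from the parent is used. -->
<dependencies>
  <dependency>
    <groupId>javax.xml.bind</groupId>
    <artifactId>jaxb-api</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>
```

With this layout, trying a different jaxb-api version means changing a single property, which is the consistency across components that the reviewers ask for in this thread.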
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81003 ] ASF GitHub Bot logged work on BEAM-3060: Author: ASF GitHub Bot Created on: 15/Mar/18 22:51 Start Date: 15/Mar/18 22:51 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t… URL: https://github.com/apache/beam/pull/4870#discussion_r174956339 ## File path: sdks/java/io/file-based-io-tests/pom.xml ## @@ -294,6 +294,12 @@ <version>${apache.hadoop.version}</version> <scope>runtime</scope> + + <groupId>javax.xml.bind</groupId> + <artifactId>jaxb-api</artifactId> Review comment: Sorry, by root level I meant beam/pom.xml Issue Time Tracking --- Worklog Id: (was: 81003) Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80980 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 15/Mar/18 21:01
Start Date: 15/Mar/18 21:01
Worklog Time Spent: 10m
Work Description: chamikaramj commented on a change in pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r174931346

## File path: sdks/java/io/file-based-io-tests/pom.xml ##
@@ -294,6 +294,12 @@
 ${apache.hadoop.version}
 runtime
+
+javax.xml.bind
+jaxb-api

Review comment:
Please add this to the root-level dependencyManagement so that we use the same version of jaxb-api across components.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 80980)
Time Spent: 2.5h (was: 2h 20m)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
> Issue Type: Test
> Components: sdk-java-core
> Reporter: Chamikara Jayalath
> Assignee: Szymon Nieradka
> Priority: Major
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do the following.
> (1) Execute Beam tests using PerfkitBenchmarker
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results.
> I think it will be useful to add performance tests for commonly used file-based I/O PTransforms using this framework. I suggest looking into the following formats initially.
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests for various Beam runners (Direct, Dataflow, Flink, Spark, etc.) and file-systems (GCS, local, HDFS, etc.) easily.
> In the initial version, tests can be made manually triggerable for PRs through Jenkins. Later, we could make some of these tests run periodically and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
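The review comment above asks for the jaxb-api version to be managed in one place. A minimal sketch of what that could look like follows. It assumes version 2.2.2 (the version named in the Maven error reported for pull request #4870) and the usual Maven layout of a root pom.xml with a dependencyManagement section. The runtime scope and the exact placement in Beam's poms are assumptions, since the diff in this thread is truncated.

```xml
<!-- Root pom.xml (sketch): pin the version once so every module resolves the same jaxb-api. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>javax.xml.bind</groupId>
      <artifactId>jaxb-api</artifactId>
      <!-- Assumption: 2.2.2 is taken from the dependency error quoted for this PR. -->
      <version>2.2.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- sdks/java/io/file-based-io-tests/pom.xml (sketch): no version element, it is inherited from the root. -->
<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <!-- Assumption: runtime scope, matching the scope shown in the Maven error. -->
  <scope>runtime</scope>
</dependency>
```

With this arrangement a version bump is a one-line change in the root pom, and modules cannot drift to different jaxb-api versions.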
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80846 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 15/Mar/18 14:03
Start Date: 15/Mar/18 14:03
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-373386209

cc @chamikaramj

Issue Time Tracking
---
Worklog Id: (was: 80846)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80842 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 15/Mar/18 14:00
Start Date: 15/Mar/18 14:00
Worklog Time Spent: 10m
Work Description: szewi opened a new pull request #4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870

…ests on hdfs.

Performance testing of HDFS on Jenkins is failing with a mvn dependency issue: [jenkins job output](https://builds.apache.org/job/beam_PerformanceTests_TextIOIT_HDFS/6/console)

When the job is running on Jenkins, the master branch is cloned and used, so we should fix this before continuing to work on tests running on HDFS.

Mvn is failing with:
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.2:analyze-only (default) on project beam-sdks-java-io-file-based-io-tests: Dependency problems found -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.2:analyze-only (default) on project beam-sdks-java-io-file-based-io-tests: Dependency problems found

The problem is related to javax.xml.bind:jaxb-api:jar:2.2.2:runtime, which is used but not declared. This commit fixes the issue.

Follow this checklist to help us incorporate your contribution quickly and easily:
- [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
- [ ] Write a pull request description that is detailed enough to understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

Issue Time Tracking
---
Worklog Id: (was: 80842)
Time Spent: 2h 10m (was: 2h)
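The failure quoted in this pull request comes from the maven-dependency-plugin `analyze-only` goal, which reports dependencies that a module uses without declaring them in its own pom. A hedged sketch of the kind of declaration that resolves such a report is shown below. The coordinates and version come from the error message, while the runtime scope and the position inside the module's dependencies section are assumptions, because the actual diff is not reproduced in full in this thread.

```xml
<!-- sdks/java/io/file-based-io-tests/pom.xml (sketch, not the actual commit) -->
<dependencies>
  <!-- ... existing dependencies of the module ... -->

  <!-- Declare jaxb-api explicitly so the dependency analysis no longer flags it as used but undeclared. -->
  <dependency>
    <groupId>javax.xml.bind</groupId>
    <artifactId>jaxb-api</artifactId>
    <!-- Assumption: version and scope mirror javax.xml.bind:jaxb-api:jar:2.2.2:runtime from the error. -->
    <version>2.2.2</version>
    <scope>runtime</scope>
  </dependency>
</dependencies>
```

Running `mvn dependency:analyze` in the module is one way to check locally whether the report is gone before pushing.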
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80816 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 15/Mar/18 11:43
Start Date: 15/Mar/18 11:43
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373348603

Run Java TextIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 80816)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80813 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 15/Mar/18 11:39
Start Date: 15/Mar/18 11:39
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373347778

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 80813)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80384 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 16:23
Start Date: 14/Mar/18 16:23
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373084094

Run Java TextIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 80384)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80379 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 16:16
Start Date: 14/Mar/18 16:16
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373081604

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 80379)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80359 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 15:03
Start Date: 14/Mar/18 15:03
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373053604

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 80359)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80330 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 14:07
Start Date: 14/Mar/18 14:07
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373033200

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 80330)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80324 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 13:56
Start Date: 14/Mar/18 13:56
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373029263

Run Java TextIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 80324)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80318 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 13:26
Start Date: 14/Mar/18 13:26
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373019804

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 80318)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80306 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 12:15
Start Date: 14/Mar/18 12:15
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373000502

Run Java TextIO Performance Test HDFS

Issue Time Tracking
---
Worklog Id: (was: 80306)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80304 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 14/Mar/18 12:12
Start Date: 14/Mar/18 12:12
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-372999649

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 80304)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=79943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79943 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 13/Mar/18 16:48
Start Date: 13/Mar/18 16:48
Worklog Time Spent: 10m
Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-372735722

Run Seed Job

Issue Time Tracking
---
Worklog Id: (was: 79943)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms
[ https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=79941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79941 ]

ASF GitHub Bot logged work on BEAM-3060:

Author: ASF GitHub Bot
Created on: 13/Mar/18 16:45
Start Date: 13/Mar/18 16:45
Worklog Time Spent: 10m
Work Description: szewi opened a new pull request #4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861

DESCRIPTION HERE

Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
- [ ] Write a pull request description that is detailed enough to understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 79941)
Time Spent: 10m
Remaining Estimate: 0h

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
> Issue Type: Test
> Components: sdk-java-core
> Reporter: Chamikara Jayalath
> Assignee: Szymon Nieradka
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do
> the following.
> (1) Execute Beam tests using PerfkitBenchmarker
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results.
>
> I think it will be useful to add performance tests for commonly used
> file-based I/O PTransforms using this framework. I suggest looking into
> the following formats initially.
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
>
> It should be possible to run these tests for various Beam runners (Direct,
> Dataflow, Flink, Spark, etc.) and file-systems (GCS, local, HDFS, etc.)
> easily.
>
> In the initial version, tests can be made manually triggerable for PRs
> through Jenkins. Later, we could make some of these tests run periodically
> and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
>
> [1] https://beam.apache.org/documentation/io/testing/

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)