[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=107478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107478
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 30/May/18 23:25
Start Date: 30/May/18 23:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj closed pull request #5441: [BEAM-3060] HDFS 
large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy b/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
index 7aa5c3251c0..62a2346fa17 100644
--- a/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
+++ b/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
@@ -140,7 +140,7 @@ private void create_filebasedio_performance_test_job(testConfiguration) {
 beam_extra_mvn_properties: '["filesystem=hdfs"]',
 bigquery_table   : testConfiguration.bqTable,
 beam_options_config_file : makePathAbsolute('pkb-config.yml'),
-beam_kubernetes_scripts  : makePathAbsolute('hdfs-single-datanode-cluster.yml') + ',' + makePathAbsolute('hdfs-single-datanode-cluster-for-local-dev.yml')
+beam_kubernetes_scripts  : makePathAbsolute('hdfs-multi-datanode-cluster.yml')
 ]
 common_job_properties.setupKubernetes(delegate, namespace, kubeconfig)
 common_job_properties.buildPerformanceTest(delegate, argMap)
@@ -149,5 +149,5 @@ private void create_filebasedio_performance_test_job(testConfiguration) {
 }
 
 static def makePathAbsolute(String path) {
-return '"$WORKSPACE/src/.test-infra/kubernetes/hadoop/SmallITCluster/' + path + '"'
+return '"$WORKSPACE/src/.test-infra/kubernetes/hadoop/LargeITCluster/' + path + '"'
 }
\ No newline at end of file
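For readers following the Groovy change: a minimal Python sketch (not part of the PR) of the string that makePathAbsolute produces and of the comma-joined beam_kubernetes_scripts value, before and after the switch from SmallITCluster to LargeITCluster. Because the Groovy literal is single-quoted, $WORKSPACE is not interpolated by Groovy; it stays literal and is presumably expanded later by the shell that runs the benchmark. The Python names make_path_absolute and kubernetes_scripts are illustrative only.

```python
# Sketch only: mirrors the Groovy helper makePathAbsolute from the Jenkins job.
# The Groovy literal is single-quoted, so "$WORKSPACE" is kept verbatim here.
OLD_BASE = "$WORKSPACE/src/.test-infra/kubernetes/hadoop/SmallITCluster/"
NEW_BASE = "$WORKSPACE/src/.test-infra/kubernetes/hadoop/LargeITCluster/"


def make_path_absolute(path: str, base: str) -> str:
    """Return base + path wrapped in double quotes, like the Groovy helper."""
    return '"' + base + path + '"'


def kubernetes_scripts(paths: list, base: str) -> str:
    """Comma-join the quoted paths, as the beam_kubernetes_scripts entry does."""
    return ",".join(make_path_absolute(p, base) for p in paths)


old_value = kubernetes_scripts(
    ["hdfs-single-datanode-cluster.yml",
     "hdfs-single-datanode-cluster-for-local-dev.yml"],
    OLD_BASE,
)
new_value = kubernetes_scripts(["hdfs-multi-datanode-cluster.yml"], NEW_BASE)

if __name__ == "__main__":
    print(old_value)
    print(new_value)
```

After the change the job passes a single manifest instead of two, so the comma separator disappears from the value.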
diff --git a/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster-for-local-dev.yml b/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster-for-local-dev.yml
new file mode 100644
index 000..7cb891bcd99
--- /dev/null
+++ b/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster-for-local-dev.yml
@@ -0,0 +1,73 @@
+#Licensed to the Apache Software Foundation (ASF) under one or more
+#contributor license agreements.  See the NOTICE file distributed with
+#this work for additional information regarding copyright ownership.
+#The ASF licenses this file to You under the Apache License, Version 2.0
+#(the "License"); you may not use this file except in compliance with
+#the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+#
+# This cluster is intended to be run additionally to hdfs-multi-datanode-cluster.yml.
+# It provides an additional setup to access large hdfs cluster by DirectRunner or any
+# external application. Services created by this setup need to be properly included in
+# /etc/hosts file, so it is strongly suggested to run start-all.sh script instead of
+# running this file manually.
+#
+
+apiVersion: v1
+kind: Service
+metadata:
+  name: datanode-0
+  labels:
+    name: datanode-0
+spec:
+  ports:
+    - name: hdfs
+      port: 9000
+    - name: web
+      port: 50010
+  selector:
+    statefulset.kubernetes.io/pod-name: datanode-0
+  type: LoadBalancer
+
+---
+
+apiVersion: v1
+kind: Service
+metadata:
+  name: datanode-1
+  labels:
+    name: datanode-1
+spec:
+  ports:
+    - name: hdfs
+      port: 9000
+    - name: web
+      port: 50010
+  selector:
+    statefulset.kubernetes.io/pod-name: datanode-1
+  type: LoadBalancer
+
+---
+
+apiVersion: v1
+kind: Service
+metadata:
+  name: datanode-2
+  labels:
+    name: datanode-2
+spec:
+  ports:
+    - name: hdfs
+      port: 9000
+    - name: web
+      port: 50010
+  selector:
+    statefulset.kubernetes.io/pod-name: datanode-2
+  type: LoadBalancer
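The three Services in this manifest differ only in the datanode ordinal. A minimal Python sketch (hypothetical helper names datanode_service and datanode_services; the PR itself ships these as static YAML) that builds the same manifests as dictionaries, keeping the port names and numbers exactly as the file declares them:

```python
import json


def datanode_service(index: int) -> dict:
    """Build the LoadBalancer Service that exposes StatefulSet pod datanode-<index>."""
    name = f"datanode-{index}"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"name": name}},
        "spec": {
            "ports": [
                {"name": "hdfs", "port": 9000},
                {"name": "web", "port": 50010},
            ],
            # The StatefulSet controller labels every pod with its own name,
            # so this selector matches exactly one datanode pod.
            "selector": {"statefulset.kubernetes.io/pod-name": name},
            "type": "LoadBalancer",
        },
    }


def datanode_services(replicas: int) -> list:
    """One Service per datanode ordinal, 0 .. replicas - 1."""
    return [datanode_service(i) for i in range(replicas)]


if __name__ == "__main__":
    # Separate the documents with "---", as in the YAML file.
    print("\n---\n".join(json.dumps(s, indent=2) for s in datanode_services(3)))
```

Generating the list from a replica count keeps the number of Services in step with the number of datanodes if the cluster size ever changes; hand-written YAML has to be edited in several places instead.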
diff --git a/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml b/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml
new file mode 100644
index 000..e796243d389
--- /dev/null
+++ 
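The header comment of hdfs-multi-datanode-cluster-for-local-dev.yml says the datanode Services must be listed in /etc/hosts and recommends start-all.sh for that; the script's contents are not shown in this thread. A plausible reason (an assumption, not stated in the PR) is that the HDFS client reaches datanodes by hostname, so names like datanode-0 must resolve from outside the cluster. The Python sketch below only renders hosts-file lines from a hostname-to-external-IP mapping; hosts_entries is a hypothetical helper, and the IPs are placeholders from the 203.0.113.0/24 documentation range.

```python
import ipaddress


def hosts_entries(external_ips: dict) -> str:
    """Render one 'IP<TAB>hostname' line per entry, sorted by hostname."""
    lines = []
    for host, ip in sorted(external_ips.items()):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        lines.append(f"{ip}\t{host}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; real ones would come from the LoadBalancer Services.
    example = {
        "datanode-0": "203.0.113.10",
        "datanode-1": "203.0.113.11",
        "datanode-2": "203.0.113.12",
    }
    print(hosts_entries(example), end="")
```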

[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=107477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107477
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 30/May/18 23:25
Start Date: 30/May/18 23:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #5441: [BEAM-3060] HDFS 
large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-393351110
 
 
   LGTM.
   
   I'll go ahead and merge. Please consider removing the pom.xml updates in a separate PR if Gradle-based execution does not need them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 107477)
Time Spent: 10h 10m  (was: 10h)
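Each worklog entry in this thread adds a fixed 10 minutes, and JIRA renders the running total in mixed forms (9.5h, 9h 40m, 10h 10m). A small Python sketch for checking such totals, assuming only the h and m units that appear here; JIRA's d and w units depend on instance settings and are deliberately not handled. to_minutes is a hypothetical helper.

```python
import re

# Only the units that appear in this thread; "d" and "w" are not handled.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*([hm])")


def to_minutes(duration: str) -> int:
    """Convert JIRA-style strings such as '10h 10m', '9.5h' or '10m' to minutes."""
    tokens = _TOKEN.findall(duration)
    if not tokens:
        raise ValueError(f"unrecognised duration: {duration!r}")
    total = sum(float(amount) * (60 if unit == "h" else 1) for amount, unit in tokens)
    return round(total)


if __name__ == "__main__":
    # Each "was" value plus one 10m worklog should equal the new total.
    for before, after in [("9h 20m", "9.5h"), ("9.5h", "9h 40m"), ("10h", "10h 10m")]:
        print(before, "+ 10m =", after, to_minutes(before) + 10 == to_minutes(after))
```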

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do 
> the following:
> (1) Execute Beam tests using PerfkitBenchmarker.
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results.
> I think it will be useful to add performance tests for commonly used 
> file-based I/O PTransforms using this framework. I suggest looking into 
> the following formats initially:
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests easily for various Beam runners 
> (Direct, Dataflow, Flink, Spark, etc.) and file systems (GCS, local, HDFS, 
> etc.).
> In the initial version, tests can be made manually triggerable for PRs 
> through Jenkins. Later, we could make some of these tests run periodically 
> and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/
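Even counting only the examples named in the issue description, the requested coverage grows multiplicatively. A Python sketch enumerating the format × runner × file-system combinations it lists; each list ends in "etc." in the description, so the real matrix would be larger, and not every combination need be supported. build_matrix is a hypothetical helper.

```python
from itertools import product

# Examples named in the issue description; each list ends in "etc." there.
FORMATS = ["AvroIO", "TextIO", "Compressed text using TextIO", "TFRecordIO"]
RUNNERS = ["Direct", "Dataflow", "Flink", "Spark"]
FILESYSTEMS = ["GCS", "local", "HDFS"]


def build_matrix() -> list:
    """Every (format, runner, file system) combination from the lists above."""
    return list(product(FORMATS, RUNNERS, FILESYSTEMS))


if __name__ == "__main__":
    matrix = build_matrix()
    print(len(matrix), "combinations")  # 4 * 4 * 3
    for combo in matrix[:3]:
        print(combo)
```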



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-29 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=106689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-106689
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 29/May/18 15:55
Start Date: 29/May/18 15:55
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#5441: [BEAM-3060] HDFS large cluster configuration. Jenkins job updated to use 
large cl…
URL: https://github.com/apache/beam/pull/5441#discussion_r191479040
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -248,6 +248,89 @@
 
 
 
+
+io-it-hdfs-large
+
+io-it-suite-hdfs-large
+
+
+
+
${project.parent.parent.parent.parent.basedir}
 
 Review comment:
   Why do we need this Maven config? I assume the tests can be run using Gradle now, right?



Issue Time Tracking
---

Worklog Id: (was: 106689)
Time Spent: 10h  (was: 9h 50m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105700
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/May/18 19:07
Start Date: 24/May/18 19:07
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391825986
 
 
   Works like a charm! Alan Myrvold updated the Kubernetes cluster, so I was finally able to test it on our Jenkins.
   @chamikaramj let me know if I can help with the review.



Issue Time Tracking
---

Worklog Id: (was: 105700)
Time Spent: 9h 50m  (was: 9h 40m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105698
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/May/18 18:52
Start Date: 24/May/18 18:52
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391821501
 
 
   Run Java XmlIO Performance Test HDFS



Issue Time Tracking
---

Worklog Id: (was: 105698)
Time Spent: 9h 40m  (was: 9.5h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105692
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/May/18 18:41
Start Date: 24/May/18 18:41
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391818397
 
 
   Run Java CompressedTextIO Performance Test HDFS



Issue Time Tracking
---

Worklog Id: (was: 105692)
Time Spent: 9.5h  (was: 9h 20m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105690=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105690
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/May/18 18:28
Start Date: 24/May/18 18:28
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391814321
 
 
   Run Java TextIO Performance Test HDFS



Issue Time Tracking
---

Worklog Id: (was: 105690)
Time Spent: 9h 20m  (was: 9h 10m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105689
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/May/18 18:24
Start Date: 24/May/18 18:24
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391813196
 
 
   Run Seed Job



Issue Time Tracking
---

Worklog Id: (was: 105689)
Time Spent: 9h 10m  (was: 9h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105049
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/May/18 10:47
Start Date: 23/May/18 10:47
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391304075
 
 
   @chamikaramj It's ready for review.
   I expected the error with deletion, so I created an issue in JIRA describing why we need to upgrade kubectl to at least version 1.9.3 [link to jira](https://issues.apache.org/jira/browse/BEAM-4362). I didn't expect that we were using a Kubernetes server version lower than 1.9. That is a problem because it prevents us from using newer Kubernetes features like StatefulSets (they bring many improvements over ReplicationControllers and simplify the Hadoop configuration a lot).
   I created a JIRA issue for this: https://issues.apache.org/jira/browse/BEAM-4390. I also mentioned the security issues found on 20 May that need to be patched.
   @chamikaramj, who has access to the GCP UI and can help with upgrading the Kubernetes cluster in the apache-beam-testing project?
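The version constraints discussed in this comment (kubectl at least 1.9.3, server at least 1.9) reduce to a numeric tuple comparison; comparing the raw strings would wrongly rank 1.10 below 1.9. A minimal Python sketch, assuming the version arrives as a string such as v1.9.3 or a provider-suffixed form like v1.8.10-gke.0; how the string is obtained is left out, and parse_version and at_least are hypothetical helpers.

```python
import re

# Leading "v" optional; anything after MAJOR.MINOR.PATCH (e.g. "-gke.0") is ignored.
_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")

KUBECTL_MINIMUM = (1, 9, 3)  # client version asked for in BEAM-4362
SERVER_MINIMUM = (1, 9, 0)   # server version this thread says the setup needs


def parse_version(text: str) -> tuple:
    """Return (major, minor, patch) as integers."""
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())


def at_least(text: str, minimum: tuple) -> bool:
    """Compare numerically, so 1.10.0 correctly counts as newer than 1.9.3."""
    return parse_version(text) >= minimum


if __name__ == "__main__":
    print(at_least("v1.8.10-gke.0", SERVER_MINIMUM))  # False
    print(at_least("1.10.0", KUBECTL_MINIMUM))        # True
```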



Issue Time Tracking
---

Worklog Id: (was: 105049)
Time Spent: 9h  (was: 8h 50m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105045=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105045
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/May/18 10:25
Start Date: 23/May/18 10:25
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391298837
 
 
   Run Java TextIO Performance Test HDFS



Issue Time Tracking
---

Worklog Id: (was: 105045)
Time Spent: 8h 50m  (was: 8h 40m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=105043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105043
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/May/18 10:21
Start Date: 23/May/18 10:21
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #5441: [BEAM-3060] HDFS large 
cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441#issuecomment-391297887
 
 
   Run Seed Job



Issue Time Tracking
---

Worklog Id: (was: 105043)
Time Spent: 8h 40m  (was: 8.5h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-05-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=104231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-104231
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 21/May/18 20:19
Start Date: 21/May/18 20:19
Worklog Time Spent: 10m 
  Work Description: szewi opened a new pull request #5441: [BEAM-3060] HDFS 
large cluster configuration. Jenkins job updated to use large cl…
URL: https://github.com/apache/beam/pull/5441
 
 
   …uster.
   
   All files have meaningful descriptions. This PR switches every Jenkins job that uses HDFS to the large HDFS cluster. The large cluster is configured with 1 namenode and 3 datanodes. A StatefulSet (generally available since Kubernetes 1.9) is used instead of a ReplicationController, as it is a better solution to the cluster auto-configuration issues. The Docker image with Hadoop 2.7.1 is hosted on the public Docker Hub and pulled from there.
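The main benefit of a StatefulSet over a ReplicationController here is stable pod identity: StatefulSet pods are named <statefulset-name>-<ordinal> and, through a governing headless Service, get stable per-pod DNS names, whereas ReplicationController pods get random name suffixes. This is also why the per-datanode Services in this PR can select on statefulset.kubernetes.io/pod-name: datanode-0 and so on. A Python sketch of the naming rules; the headless Service name datanode and the namespace default are hypothetical, and the default cluster.local domain is assumed.

```python
def statefulset_pod_names(statefulset: str, replicas: int) -> list:
    """StatefulSet pods are named <statefulset>-<ordinal>, with ordinals 0..replicas-1."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def pod_dns_name(pod: str, headless_service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Stable per-pod DNS name provided through the governing headless Service."""
    return f"{pod}.{headless_service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # "datanode" matches the pod names used in this PR; the headless Service
    # name and the namespace below are hypothetical.
    for pod in statefulset_pod_names("datanode", 3):
        print(pod, "->", pod_dns_name(pod, "datanode", "default"))
```

Because these names survive pod restarts, the namenode and the datanodes can find each other without any re-discovery step, which is the auto-configuration problem the PR description refers to.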
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   



Issue Time Tracking
---

Worklog Id: (was: 104231)
Time Spent: 8.5h  (was: 8h 20m)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do 
> the following:
> (1) Execute Beam tests using PerfkitBenchmarker
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results. 
> I think it will be useful to add performance tests for commonly used 
> file-based I/O PTransforms using this framework. I suggest looking into the 
> following formats initially:
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests easily for various Beam runners 
> (Direct, Dataflow, Flink, Spark, etc.) and file systems (GCS, local, HDFS, 
> etc.).
> In the initial version, tests can be made manually triggerable for PRs 
> through Jenkins. Later, we could make some of these tests run periodically 
> and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87136
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 15:49
Start Date: 03/Apr/18 15:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj closed pull request #4861: [BEAM-3060] 
Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861
 
 
   


diff --git a/.test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy b/.test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy
new file mode 100644
index 000..19c0f074e80
--- /dev/null
+++ b/.test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+def testsConfigurations = [
+  [
+    jobName   : 'beam_PerformanceTests_TextIOIT_HDFS',
+    jobDescription: 'Runs PerfKit tests for TextIOIT on HDFS',
+    itClass   : 'org.apache.beam.sdk.io.text.TextIOIT',
+    bqTable   : 'beam_performance.textioit_hdfs_pkb_results',
+    prCommitStatusName: 'Java TextIO Performance Test on HDFS',
+    prTriggerPhase: 'Run Java TextIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100'
+    ]
+
+  ],
+  [
+    jobName: 'beam_PerformanceTests_Compressed_TextIOIT_HDFS',
+    jobDescription : 'Runs PerfKit tests for TextIOIT with GZIP compression on HDFS',
+    itClass: 'org.apache.beam.sdk.io.text.TextIOIT',
+    bqTable: 'beam_performance.compressed_textioit_hdfs_pkb_results',
+    prCommitStatusName : 'Java CompressedTextIO Performance Test on HDFS',
+    prTriggerPhase : 'Run Java CompressedTextIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100',
+      compressionType: 'GZIP'
+    ]
+  ],
+  [
+    jobName   : 'beam_PerformanceTests_AvroIOIT_HDFS',
+    jobDescription: 'Runs PerfKit tests for AvroIOIT on HDFS',
+    itClass   : 'org.apache.beam.sdk.io.avro.AvroIOIT',
+    bqTable   : 'beam_performance.avroioit_hdfs_pkb_results',
+    prCommitStatusName: 'Java AvroIO Performance Test on HDFS',
+    prTriggerPhase: 'Run Java AvroIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100'
+    ]
+  ],
+  // TODO(BEAM-3945) TFRecord performance test is failing only when running on hdfs.
+  // We need to fix this before enabling this job on jenkins.
+  //[
+  //  jobName   : 'beam_PerformanceTests_TFRecordIOIT_HDFS',
+  //  jobDescription: 'Runs PerfKit tests for beam_PerformanceTests_TFRecordIOIT on HDFS',
+  //  itClass   : 'org.apache.beam.sdk.io.tfrecord.TFRecordIOIT',
+  //  bqTable   : 'beam_performance.tfrecordioit_hdfs_pkb_results',
+  //  prCommitStatusName: 'Java TFRecordIO Performance Test on HDFS',
+  //  prTriggerPhase: 'Run Java TFRecordIO Performance Test HDFS',
+  //  extraPipelineArgs: [
+  //    numberOfRecords: '100'
+  //  ]
+  //],
+  [
+    jobName   : 'beam_PerformanceTests_XmlIOIT_HDFS',
+    jobDescription: 'Runs PerfKit tests for 
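
Each entry's `extraPipelineArgs` map is presumably passed to the integration test as pipeline options. This diff does not show how `common_job_properties` actually serializes the map. The Java sketch below is a hypothetical helper (`PipelineArgs.toFlags`) that renders such a map as `--key=value` flags, which is the command-line form that Beam's `PipelineOptionsFactory.fromArgs(...)` parses. Keys are sorted so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PipelineArgs {
  /**
   * Hypothetical helper: renders an extraPipelineArgs-style map as
   * "--key=value" flags, sorted by key so the output is deterministic.
   */
  static List<String> toFlags(Map<String, String> extraPipelineArgs) {
    List<String> flags = new ArrayList<>();
    for (Map.Entry<String, String> e : new TreeMap<>(extraPipelineArgs).entrySet()) {
      flags.add("--" + e.getKey() + "=" + e.getValue());
    }
    return flags;
  }

  public static void main(String[] args) {
    // Values taken from the Compressed TextIOIT HDFS entry of the configuration.
    Map<String, String> compressed = Map.of("numberOfRecords", "100", "compressionType", "GZIP");
    // Prints [--compressionType=GZIP, --numberOfRecords=100]
    System.out.println(toFlags(compressed));
  }
}
```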

[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87135
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 15:48
Start Date: 03/Apr/18 15:48
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4861: [BEAM-3060] 
Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-378298429
 
 
   LGTM. Thanks.




Issue Time Tracking
---

Worklog Id: (was: 87135)
Time Spent: 8h 10m  (was: 8h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87080
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 13:16
Start Date: 03/Apr/18 13:16
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-378246261
 
 
   This error on Jenkins is not related to this PR: 
   ```
   Workflow failed. Causes: Project apache-beam-testing has insufficient quota(s) to execute this workflow with 1 instances in region us-central1. Quota summary (required/available): 1/1435 instances, 1/0 CPUs, 250/13800 disk GB, 0/1998 SSD disk GB, 1/65 instance groups, 1/15 managed instance groups, 1/41 instance templates, 1/287 in-use IP addresses.
   ```
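Reading the quota summary as `required/available` pairs shows why the failure is unrelated to the PR. The only resource whose requirement exceeds what is available is CPUs (1 required, 0 available). The Java sketch below extracts such entries from the summary text. Its regular expression assumes the comma-separated `<required>/<available> <resource>` layout seen in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotaSummary {
  // One "<required>/<available> <resource>" entry; the resource name runs up to the next comma.
  private static final Pattern ENTRY = Pattern.compile("(\\d+)/(\\d+) ([^,]+)");

  /** Returns the resources whose required amount exceeds the available amount. */
  static List<String> exhausted(String summary) {
    List<String> blocked = new ArrayList<>();
    Matcher m = ENTRY.matcher(summary);
    while (m.find()) {
      long required = Long.parseLong(m.group(1));
      long available = Long.parseLong(m.group(2));
      if (required > available) {
        blocked.add(m.group(3).trim());
      }
    }
    return blocked;
  }

  public static void main(String[] args) {
    // Quota summary copied from the Jenkins error above (trailing period dropped).
    String summary = "1/1435 instances, 1/0 CPUs, 250/13800 disk GB, 0/1998 SSD disk GB, "
        + "1/65 instance groups, 1/15 managed instance groups, 1/41 instance templates, "
        + "1/287 in-use IP addresses";
    System.out.println(exhausted(summary)); // prints [CPUs]
  }
}
```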




Issue Time Tracking
---

Worklog Id: (was: 87080)
Time Spent: 8h  (was: 7h 50m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87069=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87069
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 13:01
Start Date: 03/Apr/18 13:01
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-378241629
 
 
   Run Java TextIO Performance Test HDFS
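
Comments like the one above are trigger phrases. This text equals the `prTriggerPhase` of the TextIO HDFS entry in the PR #4861 job configuration, so it asks Jenkins to run `beam_PerformanceTests_TextIOIT_HDFS`. The Java sketch below maps a comment to a job under a simplifying assumption: an exact match that ignores case and surrounding whitespace. The real GitHub pull request builder's matching rules may differ; for example, it may treat the phrase as a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class TriggerPhrases {
  /**
   * Returns the job whose trigger phrase equals the comment, ignoring case and
   * surrounding whitespace (an assumed, simplified matching rule).
   */
  static Optional<String> jobFor(String comment, Map<String, String> phraseToJob) {
    String normalized = comment.trim().toLowerCase(Locale.ROOT);
    for (Map.Entry<String, String> e : phraseToJob.entrySet()) {
      if (e.getKey().toLowerCase(Locale.ROOT).equals(normalized)) {
        return Optional.of(e.getValue());
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Phrases and job names copied from the PR #4861 configuration.
    Map<String, String> phrases = new LinkedHashMap<>();
    phrases.put("Run Java TextIO Performance Test HDFS", "beam_PerformanceTests_TextIOIT_HDFS");
    phrases.put("Run Java CompressedTextIO Performance Test HDFS",
        "beam_PerformanceTests_Compressed_TextIOIT_HDFS");
    phrases.put("Run Java AvroIO Performance Test HDFS", "beam_PerformanceTests_AvroIOIT_HDFS");
    System.out.println(jobFor("Run Java TextIO Performance Test HDFS", phrases).orElse("no job"));
  }
}
```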




Issue Time Tracking
---

Worklog Id: (was: 87069)
Time Spent: 7h 50m  (was: 7h 40m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87066
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 12:45
Start Date: 03/Apr/18 12:45
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-378236853
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 87066)
Time Spent: 7h 40m  (was: 7.5h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87005
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 09:46
Start Date: 03/Apr/18 09:46
Worklog Time Spent: 10m 
  Work Description: szewi commented on a change in pull request #4861: 
[BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#discussion_r178768125
 
 

 ##
 File path: .test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+def testsConfigurations = [
+  [
+    jobName   : 'beam_PerformanceTests_TextIOIT_HDFS',
+    jobDescription: 'Runs PerfKit tests for TextIOIT on HDFS',
+    itClass   : 'org.apache.beam.sdk.io.text.TextIOIT',
+    bqTable   : 'beam_performance.textioit_hdfs_pkb_results',
+    prCommitStatusName: 'Java TextIO Performance Test on HDFS',
+    prTriggerPhase: 'Run Java TextIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100'
+    ]
+
+  ],
+  [
+    jobName: 'beam_PerformanceTests_Compressed_TextIOIT_HDFS',
+    jobDescription : 'Runs PerfKit tests for TextIOIT with GZIP compression on HDFS',
+    itClass: 'org.apache.beam.sdk.io.text.TextIOIT',
+    bqTable: 'beam_performance.compressed_textioit_hdfs_pkb_results',
+    prCommitStatusName : 'Java CompressedTextIO Performance Test on HDFS',
+    prTriggerPhase : 'Run Java CompressedTextIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100',
+      compressionType: 'GZIP'
+    ]
+  ],
+  [
+    jobName   : 'beam_PerformanceTests_AvroIOIT_HDFS',
+    jobDescription: 'Runs PerfKit tests for AvroIOIT on HDFS',
+    itClass   : 'org.apache.beam.sdk.io.avro.AvroIOIT',
+    bqTable   : 'beam_performance.avroioit_hdfs_pkb_results',
+    prCommitStatusName: 'Java AvroIO Performance Test on HDFS',
+    prTriggerPhase: 'Run Java AvroIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100'
+    ]
+  ],
+  //[
+  //  jobName   : 'beam_PerformanceTests_TFRecordIOIT_HDFS',
+  //  jobDescription: 'Runs PerfKit tests for beam_PerformanceTests_TFRecordIOIT on HDFS',
 
 Review comment:
   OK.




Issue Time Tracking
---

Worklog Id: (was: 87005)
Time Spent: 7.5h  (was: 7h 20m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=87004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87004
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 09:45
Start Date: 03/Apr/18 09:45
Worklog Time Spent: 10m 
  Work Description: szewi commented on a change in pull request #4861: 
[BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#discussion_r178768038
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -294,6 +294,11 @@
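       <version>${apache.hadoop.version}</version>
       <scope>runtime</scope>
     </dependency>
+    <dependency>
+      <groupId>javax.xml.bind</groupId>
+      <artifactId>jaxb-api</artifactId>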
 
 Review comment:
   Gradle was updated in #4870 (diff: 
https://github.com/apache/beam/pull/4870/files#diff-76558bfd90b93d37ab9369862732ffd6), 
which is already merged to the master branch.
   This Jenkins job works because, when PerfKit runs the tests on Jenkins, 
Jenkins clones the master branch and runs the tests against that code. Let me 
know if you would still prefer that I add this line to the Gradle file in this PR.




Issue Time Tracking
---

Worklog Id: (was: 87004)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=86934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86934
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 04:25
Start Date: 03/Apr/18 04:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on 
HDFS.
URL: https://github.com/apache/beam/pull/4861#discussion_r178711210
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -294,6 +294,11 @@
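       <version>${apache.hadoop.version}</version>
       <scope>runtime</scope>
     </dependency>
+    <dependency>
+      <groupId>javax.xml.bind</groupId>
+      <artifactId>jaxb-api</artifactId>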
 
 Review comment:
   Please update Gradle file as well.




Issue Time Tracking
---

Worklog Id: (was: 86934)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=86933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86933
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 04:25
Start Date: 03/Apr/18 04:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4861: [BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on 
HDFS.
URL: https://github.com/apache/beam/pull/4861#discussion_r178711242
 
 

 ##
 File path: .test-infra/jenkins/job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+def testsConfigurations = [
+  [
+    jobName   : 'beam_PerformanceTests_TextIOIT_HDFS',
+    jobDescription: 'Runs PerfKit tests for TextIOIT on HDFS',
+    itClass   : 'org.apache.beam.sdk.io.text.TextIOIT',
+    bqTable   : 'beam_performance.textioit_hdfs_pkb_results',
+    prCommitStatusName: 'Java TextIO Performance Test on HDFS',
+    prTriggerPhase: 'Run Java TextIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100'
+    ]
+
+  ],
+  [
+    jobName: 'beam_PerformanceTests_Compressed_TextIOIT_HDFS',
+    jobDescription : 'Runs PerfKit tests for TextIOIT with GZIP compression on HDFS',
+    itClass: 'org.apache.beam.sdk.io.text.TextIOIT',
+    bqTable: 'beam_performance.compressed_textioit_hdfs_pkb_results',
+    prCommitStatusName : 'Java CompressedTextIO Performance Test on HDFS',
+    prTriggerPhase : 'Run Java CompressedTextIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100',
+      compressionType: 'GZIP'
+    ]
+  ],
+  [
+    jobName   : 'beam_PerformanceTests_AvroIOIT_HDFS',
+    jobDescription: 'Runs PerfKit tests for AvroIOIT on HDFS',
+    itClass   : 'org.apache.beam.sdk.io.avro.AvroIOIT',
+    bqTable   : 'beam_performance.avroioit_hdfs_pkb_results',
+    prCommitStatusName: 'Java AvroIO Performance Test on HDFS',
+    prTriggerPhase: 'Run Java AvroIO Performance Test HDFS',
+    extraPipelineArgs: [
+      numberOfRecords: '100'
+    ]
+  ],
+  //[
+  //  jobName   : 'beam_PerformanceTests_TFRecordIOIT_HDFS',
+  //  jobDescription: 'Runs PerfKit tests for beam_PerformanceTests_TFRecordIOIT on HDFS',
 
 Review comment:
   Please add a comment with a link to a JIRA explaining why this is commented 
out.




Issue Time Tracking
---

Worklog Id: (was: 86933)
Time Spent: 7h 10m  (was: 7h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=86932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86932
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 03/Apr/18 04:22
Start Date: 03/Apr/18 04:22
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4861: [BEAM-3060] 
Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-378123934
 
 
   Run seed job




Issue Time Tracking
---

Worklog Id: (was: 86932)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84432
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 26/Mar/18 16:21
Start Date: 26/Mar/18 16:21
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-376225174
 
 
   @chamikaramj If the seed job passes, all HDFS jobs will be active except the 
TFRecord HDFS test job, which will be disabled. 




Issue Time Tracking
---

Worklog Id: (was: 84432)
Time Spent: 6h 50m  (was: 6h 40m)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84430
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 26/Mar/18 16:19
Start Date: 26/Mar/18 16:19
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-376224379
 
 
   Run Seed Job


Issue Time Tracking
---

Worklog Id: (was: 84430)
Time Spent: 6h 40m  (was: 6.5h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84415
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 26/Mar/18 15:49
Start Date: 26/Mar/18 15:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4861: [BEAM-3060] 
Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-376214711
 
 
   Thanks. Yeah, I think it's fine to proceed with a JIRA created for TFRecord.


Issue Time Tracking
---

Worklog Id: (was: 84415)
Time Spent: 6.5h  (was: 6h 20m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=84387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-84387
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 26/Mar/18 15:16
Start Date: 26/Mar/18 15:16
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-376202797
 
 
   Hi Cham, it's ready for review. The Kubernetes clusters are created and 
deleted afterwards, even on failure. I triggered two builds to run at the 
same time and there was no interference between them. Tests run smoothly 
on HDFS for TextIO, CompressedText, Avro and XmlIO. There is only an issue 
with TFRecord:
   
https://builds.apache.org/job/beam_PerformanceTests_TFRecordIOIT_HDFS/3/console
   >java.lang.IllegalStateException: Not a valid TFRecord. Fewer than 12 bytes.
   >java.lang.IllegalStateException: Invalid data
   
   This only happens when running TFRecord tests on HDFS. The same happens when 
running the tests from my local machine. It seems to be related to how data is 
accessed in the TFRecord tests 
https://github.com/apache/beam/blob/597e3f92bc8be692d5d8e8040b33ce0c77350fa2/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java#L110
 and 
https://github.com/apache/beam/blob/597e3f92bc8be692d5d8e8040b33ce0c77350fa2/sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/tfrecord/TFRecordIOIT.java#L113
 
   
   I investigated it, and even if I specify the exact filename on HDFS 
(`.apply(TFRecordIO.read().from("hdfs://35.225.39.200:9000/TFRecord_1522073710252-0-of-6.tfrecord").withCompression(AUTO))`)
 instead of a filename pattern, I still get the same invalid-data error. I 
can create a bug in JIRA describing the issue, with steps to reproduce, and 
look at it separately. Should we proceed with this PR and simply disable or 
remove the TFRecord job temporarily? WDYT? 
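For context on the two exceptions quoted above: a TFRecord file (as documented by TensorFlow) is a sequence of framed records, each laid out as an 8-byte little-endian payload length, a 4-byte masked CRC32C of that length, the payload, and a 4-byte masked CRC32C of the payload. The "12 bytes" in the first message corresponds to that length-plus-CRC header. The standalone sketch below is a hypothetical illustration of that framing, not Beam's `TFRecordIO` code: the class and method names are made up, it needs Java 9+ for `java.util.zip.CRC32C`, and although the two exception texts are copied from the log, the checks that raise them here are our own reconstruction.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical sketch of TFRecord framing; NOT Beam's TFRecordIO implementation.
public class TfRecordFraming {
  static final int HEADER_LEN = 12; // 8-byte length + 4-byte masked CRC32C of the length
  static final int FOOTER_LEN = 4;  // 4-byte masked CRC32C of the payload

  // CRC32C of bytes[off, off+len), rotated right by 15 bits, plus the TFRecord mask constant.
  static int maskedCrc32c(byte[] bytes, int off, int len) {
    CRC32C crc = new CRC32C();
    crc.update(bytes, off, len);
    int c = (int) crc.getValue();
    return ((c >>> 15) | (c << 17)) + 0xa282ead8;
  }

  // Writes each payload as: length | crc(length) | payload | crc(payload), little-endian.
  static byte[] encode(List<byte[]> records) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] data : records) {
      ByteBuffer header = ByteBuffer.allocate(HEADER_LEN).order(ByteOrder.LITTLE_ENDIAN);
      header.putLong(data.length);
      header.putInt(maskedCrc32c(header.array(), 0, 8));
      out.write(header.array(), 0, HEADER_LEN);
      out.write(data, 0, data.length);
      ByteBuffer footer = ByteBuffer.allocate(FOOTER_LEN).order(ByteOrder.LITTLE_ENDIAN);
      footer.putInt(maskedCrc32c(data, 0, data.length));
      out.write(footer.array(), 0, FOOTER_LEN);
    }
    return out.toByteArray();
  }

  // Reads records back; fails on a short header, a bad length CRC, truncation or a bad payload CRC.
  static List<byte[]> decode(byte[] file) {
    ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
    List<byte[]> records = new ArrayList<>();
    while (buf.hasRemaining()) {
      if (buf.remaining() < HEADER_LEN) {
        throw new IllegalStateException("Not a valid TFRecord. Fewer than 12 bytes.");
      }
      int start = buf.position();
      long length = buf.getLong();
      if (buf.getInt() != maskedCrc32c(file, start, 8)) {
        throw new IllegalStateException("Length CRC mismatch");
      }
      if (length < 0 || buf.remaining() < length + FOOTER_LEN) {
        throw new IllegalStateException("Truncated record");
      }
      byte[] data = new byte[(int) length];
      buf.get(data);
      if (buf.getInt() != maskedCrc32c(data, 0, data.length)) {
        throw new IllegalStateException("Invalid data");
      }
      records.add(data);
    }
    return records;
  }

  public static void main(String[] args) {
    byte[] file = encode(List.of("hello".getBytes(StandardCharsets.UTF_8), new byte[0]));
    System.out.println(file.length);         // 37 = (12 + 5 + 4) + (12 + 0 + 4)
    System.out.println(decode(file).size()); // 2
    try {
      decode(Arrays.copyOf(file, 5));        // keep only 5 bytes of the first header
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());    // Not a valid TFRecord. Fewer than 12 bytes.
    }
    byte[] corrupted = file.clone();
    corrupted[HEADER_LEN] ^= 1;              // flip one bit in the first payload byte
    try {
      decode(corrupted);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());    // Invalid data
    }
  }
}
```

Either message only says that the bytes handed to the decoder do not fit this framing; the sketch cannot tell whether the file on HDFS or the way it was read is at fault.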


Issue Time Tracking
---

Worklog Id: (was: 84387)
Time Spent: 6h 20m  (was: 6h 10m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83990
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/Mar/18 11:44
Start Date: 24/Mar/18 11:44
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-375877853
 
 
   Run Java TFRecordIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 83990)
Time Spent: 6h 10m  (was: 6h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83989
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/Mar/18 11:38
Start Date: 24/Mar/18 11:38
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-375877105
 
 
   Run Java XmlIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 83989)
Time Spent: 6h  (was: 5h 50m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83988
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/Mar/18 11:31
Start Date: 24/Mar/18 11:31
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-375876057
 
 
   Run Java AvroIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 83988)
Time Spent: 5h 50m  (was: 5h 40m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83987
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/Mar/18 11:26
Start Date: 24/Mar/18 11:26
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-375875453
 
 
   Run Java CompressedTextIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 83987)
Time Spent: 5h 40m  (was: 5.5h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83985
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/Mar/18 11:18
Start Date: 24/Mar/18 11:18
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-375874415
 
 
   Run Java TextIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 83985)
Time Spent: 5.5h  (was: 5h 20m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83983
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 24/Mar/18 11:13
Start Date: 24/Mar/18 11:13
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-375873823
 
 
   Run Seed Job


Issue Time Tracking
---

Worklog Id: (was: 83983)
Time Spent: 5h 20m  (was: 5h 10m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83726
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/Mar/18 18:10
Start Date: 23/Mar/18 18:10
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4861: [BEAM-3060] 
Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-375754624
 
 
   Kamil, please let me know when this is ready for review.


Issue Time Tracking
---

Worklog Id: (was: 83726)
Time Spent: 5h 10m  (was: 5h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83710
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/Mar/18 17:37
Start Date: 23/Mar/18 17:37
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #4870: [BEAM-3060] Fixing 
mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/build.gradle b/build.gradle
index 69502ab77c0..9263f3a073d 100644
--- a/build.gradle
+++ b/build.gradle
@@ -44,6 +44,7 @@ def pubsub_grpc_version = "0.1.18"
 def apex_core_version = "3.6.0"
 def apex_malhar_version = "3.4.0"
 def postgres_version = "9.4.1212.jre7"
+def jaxb_api_version = "2.2.12"
 
 // A map of maps containing common libraries used per language. To use:
 // dependencies {
@@ -125,6 +126,7 @@ ext.library = [
 jackson_dataformat_cbor: 
"com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:$jackson_version",
 jackson_dataformat_yaml: 
"com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:$jackson_version",
 jackson_module_scala: 
"com.fasterxml.jackson.module:jackson-module-scala_2.11:$jackson_version",
+jaxb_api: "javax.xml.bind:jaxb-api:$jaxb_api_version",
 joda_time: "joda-time:joda-time:2.4",
 junit: "junit:junit:4.12",
 kafka_clients: "org.apache.kafka:kafka-clients:1.0.0",
diff --git a/pom.xml b/pom.xml
index 9573a07767f..0e10ac92a86 100644
--- a/pom.xml
+++ b/pom.xml
@@ -185,6 +185,7 @@
 
-Xpkginfo:always
 nothing
 0.20.0
+2.2.12
 
 
 kubectl
@@ -1498,6 +1499,13 @@
 tests
 test
   
+
+  
+javax.xml.bind
+jaxb-api
+${jaxb-api.version}
+  
+
 
   
 
diff --git a/sdks/java/io/file-based-io-tests/build.gradle 
b/sdks/java/io/file-based-io-tests/build.gradle
index e797172850a..9e6eb0b2d26 100644
--- a/sdks/java/io/file-based-io-tests/build.gradle
+++ b/sdks/java/io/file-based-io-tests/build.gradle
@@ -39,4 +39,5 @@ dependencies {
   shadowTest library.java.guava
   shadowTest library.java.junit
   shadowTest library.java.hamcrest_core
+  shadowTest library.java.jaxb_api
 }
diff --git a/sdks/java/io/file-based-io-tests/pom.xml 
b/sdks/java/io/file-based-io-tests/pom.xml
index c66537c7b15..3de4ba55ae1 100644
--- a/sdks/java/io/file-based-io-tests/pom.xml
+++ b/sdks/java/io/file-based-io-tests/pom.xml
@@ -294,6 +294,11 @@
 ${apache.hadoop.version}
 runtime
 
+
+javax.xml.bind
+jaxb-api
+test
+
 
 
 
diff --git a/sdks/java/io/xml/build.gradle b/sdks/java/io/xml/build.gradle
index 5e66ad96f35..af07424990f 100644
--- a/sdks/java/io/xml/build.gradle
+++ b/sdks/java/io/xml/build.gradle
@@ -27,6 +27,7 @@ dependencies {
   shadow library.java.stax2_api
   shadow library.java.findbugs_jsr305
   shadow library.java.woodstox_core_asl
+  shadowTest library.java.jaxb_api
   testCompile project(path: ":sdks:java:core", configuration: "shadowTest")
   testCompile project(path: ":runners:direct-java", configuration: "shadow")
   testCompile library.java.junit
diff --git a/sdks/java/io/xml/pom.xml b/sdks/java/io/xml/pom.xml
index d0a3d5f7396..85ecd4ec2a2 100644
--- a/sdks/java/io/xml/pom.xml
+++ b/sdks/java/io/xml/pom.xml
@@ -109,6 +109,12 @@
   hamcrest-library
   test
 
+
+
+  javax.xml.bind
+  jaxb-api
+  test
+
   
 
   


 


Issue Time Tracking
---

Worklog Id: (was: 83710)
Time Spent: 5h  (was: 4h 50m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83665
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/Mar/18 16:43
Start Date: 23/Mar/18 16:43
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4870: [BEAM-3060] 
Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-375727745
 
 
   @lukecwik, is this good to go?


Issue Time Tracking
---

Worklog Id: (was: 83665)
Time Spent: 4h 50m  (was: 4h 40m)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do 
> the following.
> (1) Execute Beam tests using PerfkitBenchmarker
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results. 
> I think it will be useful to add performance tests for commonly used 
> file-based I/O PTransforms using this framework. I suggest looking into 
> the following formats initially.
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests for various Beam runners (Direct, 
> Dataflow, Flink, Spark, etc.) and file-systems (GCS, local, HDFS, etc.) 
> easily.
> In the initial version, tests can be made manually triggerable for PRs 
> through Jenkins. Later, we could make some of these tests run periodically 
> and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83588=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83588
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/Mar/18 12:55
Start Date: 23/Mar/18 12:55
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4870: [BEAM-3060] 
Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-375656292
 
 
   LGTM




Issue Time Tracking
---

Worklog Id: (was: 83588)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=83568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-83568
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 23/Mar/18 11:52
Start Date: 23/Mar/18 11:52
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn 
dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-375635415
 
 
   @lukecwik @chamikaramj, it should be fine now. Can we merge this? Just a 
reminder: it's a blocker for continuing work on #4861.




Issue Time Tracking
---

Worklog Id: (was: 83568)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82930
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 21/Mar/18 21:36
Start Date: 21/Mar/18 21:36
Worklog Time Spent: 10m 
  Work Description: szewi commented on a change in pull request #4870: 
[BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r176246299
 
 

 ##
 File path: sdks/java/io/xml/build.gradle
 ##
 @@ -27,6 +27,7 @@ dependencies {
   shadow library.java.stax2_api
   shadow library.java.findbugs_jsr305
   shadow library.java.woodstox_core_asl
+  shadow library.java.jaxb_api
 
 Review comment:
   I was also thinking about `testRuntime`, but `shadowTest` seems like the right 
dependency type, consistent with the intended scope also declared in 
sdks/java/io/xml/pom.xml.




Issue Time Tracking
---

Worklog Id: (was: 82930)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82904
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 21/Mar/18 20:44
Start Date: 21/Mar/18 20:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #4870: 
[BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r176231438
 
 

 ##
 File path: pom.xml
 ##
 @@ -1498,6 +1499,14 @@
 tests
 test
   
+
+  
+javax.xml.bind
+jaxb-api
+${jaxb-api.version}
+test
 
 Review comment:
   You shouldn't need to specify the scope
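   A minimal sketch of what this suggestion amounts to, assuming the root pom.xml keeps the `jaxb-api.version` property referenced in the hunk above: the `<dependencyManagement>` entry pins only the version, and each module that uses JAXB declares its own scope.

```xml
<!-- Root pom.xml (sketch): manage the version centrally; no <scope> here. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>javax.xml.bind</groupId>
      <artifactId>jaxb-api</artifactId>
      <version>${jaxb-api.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

   In Maven, a scope set in `<dependencyManagement>` becomes the default for every module that declares the dependency without one. Leaving it out keeps the managed entry neutral, so each consuming module states the scope it actually needs.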




Issue Time Tracking
---

Worklog Id: (was: 82904)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82903
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 21/Mar/18 20:44
Start Date: 21/Mar/18 20:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #4870: 
[BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r176231714
 
 

 ##
 File path: sdks/java/io/xml/build.gradle
 ##
 @@ -27,6 +27,7 @@ dependencies {
   shadow library.java.stax2_api
   shadow library.java.findbugs_jsr305
   shadow library.java.woodstox_core_asl
+  shadow library.java.jaxb_api
 
 Review comment:
   `shadowTest` to match the same intended scope inserted in the pom.xml?




Issue Time Tracking
---

Worklog Id: (was: 82903)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=82280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82280
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 20/Mar/18 14:25
Start Date: 20/Mar/18 14:25
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn 
dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-374616925
 
 
   The Jenkins builds failed, but not because of this PR. It looks like the fix for 
this, https://github.com/apache/beam/pull/4902, was merged a few hours ago. 
@chamikaramj, should I rebase and squash all commits?
   I built it with Gradle and tested it locally by running 
`./gradlew --info :sdks:java:io:xml:test`, and the XmlIO unit tests passed. 
`./gradlew --info :sdks:java:io:file-based-io-tests:test` fails with a different, 
unrelated error:
   
   > Could not get unknown property 'sourceSets' for project 
':runners:direct-java' of type org.gradle.api.Project. 
   
   but fixing that is rather out of scope for this PR. WDYT?




Issue Time Tracking
---

Worklog Id: (was: 82280)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81267
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 16/Mar/18 17:36
Start Date: 16/Mar/18 17:36
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4870: [BEAM-3060] 
Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-373789471
 
 
   Hi Kamil, it looks like sdks/java/io/xml is currently the only component that 
uses JAXB. If tests pass with 2.2.12, I'm fine with adding a root-level 
dependency on that version and making both sdks/java/io/xml and 
sdks/java/io/file-based-io-tests use it. Also, please update the Gradle files 
accordingly.




Issue Time Tracking
---

Worklog Id: (was: 81267)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81113
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 16/Mar/18 08:41
Start Date: 16/Mar/18 08:41
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn 
dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-373642375
 
 
   Just two comments from me: 

   1. I was able to run those file-based tests on HDFS successfully before 
adding the XmlIO tests to the file-based package. Once the XmlIO tests were 
added to the file-based tests, the Maven dependency issue appeared. 

   2. I ran those tests with jaxb-api versions 2.2.2 and 2.2.3 too. I also tested 
with 2.2.11, the version suggested by my IDE, and with the latest 2.2.x version, 
2.2.12; the tests pass smoothly with both. In my opinion, 2.2.3 is old enough 
that it could be changed to 2.2.11 or 2.2.12, but I am not fully aware of the 
consequences for the other Beam components.




Issue Time Tracking
---

Worklog Id: (was: 81113)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81015
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 23:21
Start Date: 15/Mar/18 23:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r174959895
 
 

 ##
 File path: pom.xml
 ##
 @@ -186,6 +186,8 @@
 nothing
 0.20.0
 
 
 Review comment:
   Please remove the empty line.




Issue Time Tracking
---

Worklog Id: (was: 81015)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81014
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 23:21
Start Date: 15/Mar/18 23:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r174959913
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -294,6 +294,12 @@
 ${apache.hadoop.version}
 runtime
 
+
+javax.xml.bind
+jaxb-api
 
 Review comment:
   Thanks. Can you also define the JAXB dependencies from 
https://github.com/apache/beam/blob/master/sdks/java/io/xml/pom.xml at the root 
level and update that component to use the version defined at the root level? 
Also, can't we use 2.2.3 instead of 2.2.2, which seems pretty old? I think we 
could run into issues if we have to use both 2.2.0 and 2.2.3 for different 
components.
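   On the module side, a sketch of how a component such as sdks/java/io/xml could consume a root-managed JAXB version. This assumes the root pom.xml carries a matching `<dependencyManagement>` entry; the `test` scope mirrors the jaxb-api entry this PR adds to sdks/java/io/xml/pom.xml.

```xml
<!-- Module pom.xml (sketch): no <version> element. Maven fills it in from
     the root <dependencyManagement> section, and the module sets the scope. -->
<dependencies>
  <dependency>
    <groupId>javax.xml.bind</groupId>
    <artifactId>jaxb-api</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>
```

   With this layout, bumping JAXB means changing a single root-level definition, which keeps different components from drifting onto different JAXB releases.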




Issue Time Tracking
---

Worklog Id: (was: 81014)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81016
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 23:21
Start Date: 15/Mar/18 23:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4870: [BEAM-3060] 
Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-373553567
 
 
   cc: @kennknowles @lukecwik 




Issue Time Tracking
---

Worklog Id: (was: 81016)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81005
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 22:55
Start Date: 15/Mar/18 22:55
Worklog Time Spent: 10m 
  Work Description: szewi commented on a change in pull request #4870: 
[BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r174956941
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -294,6 +294,12 @@
 ${apache.hadoop.version}
 runtime
 
+
+javax.xml.bind
+jaxb-api
 
 Review comment:
   No problem, I will move it to beam/pom.xml, but I need to run the tests 
locally before pushing again.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 81005)
Time Spent: 3h  (was: 2h 50m)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do 
> the following.
> (1) Execute Beam tests using PerfkitBenchmarker
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results. 
> I think it will be useful to add performance tests for commonly used 
> file-based I/O PTransforms using this framework. I suggest looking into 
> the following formats initially.
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests for various Beam runners (Direct, 
> Dataflow, Flink, Spark, etc.) and file-systems (GCS, local, HDFS, etc.) 
> easily.
> In the initial version, tests can be made manually triggerable for PRs 
> through Jenkins. Later, we could make some of these tests run periodically 
> and publish benchmark results (to BigQuery) through PerfkitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81004
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 22:51
Start Date: 15/Mar/18 22:51
Worklog Time Spent: 10m 
  Work Description: szewi commented on a change in pull request #4870: 
[BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r174956360
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -294,6 +294,12 @@
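 <version>${apache.hadoop.version}</version>
 <scope>runtime</scope>
 </dependency>
+<dependency>
+<groupId>javax.xml.bind</groupId>
+<artifactId>jaxb-api</artifactId>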
 
 Review comment:
   Added to dependencyManagement in java/io. The version was removed from the 
file-based pom.xml because it is defined in dependencyManagement, and that one 
will be used. To be consistent with the scopes in dependencyManagement, the 
scope was also changed to test, which was sufficient to run the tests.
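This comment describes a module-side declaration. The fragment below is a minimal sketch of what it could look like. It assumes the entry sits in the `<dependencies>` section of sdks/java/io/file-based-io-tests/pom.xml, because the merged pom is not reproduced in this thread. The declaration has no `<version>` element, so Maven takes the version from the inherited dependencyManagement entry.

```xml
<!-- sdks/java/io/file-based-io-tests/pom.xml (sketch, not the merged file) -->
<!-- No <version>: Maven resolves it from the inherited dependencyManagement. -->
<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <!-- test scope, matching the scopes used in dependencyManagement -->
  <scope>test</scope>
</dependency>
```

Because the version is omitted here, it is controlled from a single place in the parent pom.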


Issue Time Tracking
---

Worklog Id: (was: 81004)
Time Spent: 2h 50m  (was: 2h 40m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=81003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81003
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 22:51
Start Date: 15/Mar/18 22:51
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r174956339
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -294,6 +294,12 @@
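 <version>${apache.hadoop.version}</version>
 <scope>runtime</scope>
 </dependency>
+<dependency>
+<groupId>javax.xml.bind</groupId>
+<artifactId>jaxb-api</artifactId>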
 
 Review comment:
   Sorry, by root level I meant beam/pom.xml


Issue Time Tracking
---

Worklog Id: (was: 81003)
Time Spent: 2h 40m  (was: 2.5h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80980
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 21:01
Start Date: 15/Mar/18 21:01
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4870: [BEAM-3060] Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#discussion_r174931346
 
 

 ##
 File path: sdks/java/io/file-based-io-tests/pom.xml
 ##
 @@ -294,6 +294,12 @@
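 <version>${apache.hadoop.version}</version>
 <scope>runtime</scope>
 </dependency>
+<dependency>
+<groupId>javax.xml.bind</groupId>
+<artifactId>jaxb-api</artifactId>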
 
 Review comment:
   Please add this to root level dependencyManagement so that we use the same 
version of jaxb-api across components.
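The fragment below is a minimal sketch of a root-level entry in beam/pom.xml. It assumes version 2.2.2 because that is the version named in the maven-dependency-plugin error on PR #4870. The version actually pinned is not shown in this thread. With this entry in place, child modules would declare jaxb-api without a `<version>` and inherit this one.

```xml
<!-- beam/pom.xml, inside <project> (sketch). Version 2.2.2 is an assumption. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>javax.xml.bind</groupId>
      <artifactId>jaxb-api</artifactId>
      <version>2.2.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Declaring the version once at the root means every module that depends on jaxb-api resolves the same artifact, which is the goal of the request.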


Issue Time Tracking
---

Worklog Id: (was: 80980)
Time Spent: 2.5h  (was: 2h 20m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80846
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 14:03
Start Date: 15/Mar/18 14:03
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4870: [BEAM-3060] Fixing mvn 
dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870#issuecomment-373386209
 
 
   cc @chamikaramj 


Issue Time Tracking
---

Worklog Id: (was: 80846)
Time Spent: 2h 20m  (was: 2h 10m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80842
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 14:00
Start Date: 15/Mar/18 14:00
Worklog Time Spent: 10m 
  Work Description: szewi opened a new pull request #4870: [BEAM-3060] 
Fixing mvn dependency issue when runnning filebasedIOIT t…
URL: https://github.com/apache/beam/pull/4870
 
 
   …ests on hdfs.
   
   Performance testing of HDFS on Jenkins is failing with an mvn dependency 
issue 
([Jenkins job output](https://builds.apache.org/job/beam_PerformanceTests_TextIOIT_HDFS/6/console)).
   When the job runs on Jenkins, the master branch is cloned and used, so we 
should fix this before continuing work on the tests that run on HDFS.
   
   Mvn is failing with:
   
   > [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-dependency-plugin:3.0.2:analyze-only (default) 
on project beam-sdks-java-io-file-based-io-tests: Dependency problems found -> 
[Help 1]
   org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
goal org.apache.maven.plugins:maven-dependency-plugin:3.0.2:analyze-only 
(default) on project beam-sdks-java-io-file-based-io-tests: Dependency problems 
found
   
   The problem is that javax.xml.bind:jaxb-api:jar:2.2.2:runtime is used but 
not declared.
   This commit fixes the issue by declaring it in the module's pom.xml.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   


Issue Time Tracking
---

Worklog Id: (was: 80842)
Time Spent: 2h 10m  (was: 2h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80816
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 11:43
Start Date: 15/Mar/18 11:43
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373348603
 
 
   Run Java TextIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 80816)
Time Spent: 2h  (was: 1h 50m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-15 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80813
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 15/Mar/18 11:39
Start Date: 15/Mar/18 11:39
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373347778
 
 
   Run Seed Job


Issue Time Tracking
---

Worklog Id: (was: 80813)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80384
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 16:23
Start Date: 14/Mar/18 16:23
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373084094
 
 
   Run Java TextIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 80384)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80379
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 16:16
Start Date: 14/Mar/18 16:16
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373081604
 
 
   Run Seed Job


Issue Time Tracking
---

Worklog Id: (was: 80379)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80359
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 15:03
Start Date: 14/Mar/18 15:03
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373053604
 
 
   Run Seed Job


Issue Time Tracking
---

Worklog Id: (was: 80359)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80330
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 14:07
Start Date: 14/Mar/18 14:07
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373033200
 
 
   Run Seed Job


Issue Time Tracking
---

Worklog Id: (was: 80330)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80324
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 13:56
Start Date: 14/Mar/18 13:56
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373029263
 
 
   Run Java TextIO Performance Test HDFS


Issue Time Tracking
---

Worklog Id: (was: 80324)
Time Spent: 1h  (was: 50m)

> Add performance tests for commonly used file-based I/O PTransforms
> --
>
> Key: BEAM-3060
> URL: https://issues.apache.org/jira/browse/BEAM-3060
> Project: Beam
>  Issue Type: Test
>  Components: sdk-java-core
>Reporter: Chamikara Jayalath
>Assignee: Szymon Nieradka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We recently added a performance testing framework [1] that can be used to do 
> the following:
> (1) Execute Beam tests using PerfKitBenchmarker.
> (2) Manage Kubernetes-based deployments of data stores.
> (3) Easily publish benchmark results.
> I think it will be useful to add performance tests for commonly used 
> file-based I/O PTransforms using this framework. I suggest looking into the 
> following formats initially:
> (1) AvroIO
> (2) TextIO
> (3) Compressed text using TextIO
> (4) TFRecordIO
> It should be possible to run these tests easily for various Beam runners 
> (Direct, Dataflow, Flink, Spark, etc.) and file systems (GCS, local, HDFS, 
> etc.).
> In the initial version, tests can be made manually triggerable for PRs 
> through Jenkins. Later, we could make some of these tests run periodically 
> and publish benchmark results (to BigQuery) through PerfKitBenchmarker.
> [1] https://beam.apache.org/documentation/io/testing/
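The write-then-read pattern behind such file-based I/O tests can be sketched without Beam. The plain-JDK Java code below is a hypothetical analogue, not Beam's actual test code: it writes synthetic lines as plain text or gzip-compressed text (formats (2) and (3) in the list above), reads them back, and compares the line count and an order-independent checksum against values recomputed from the generator. The class name `TextRoundTrip`, the line format, and the hash-sum checksum are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical plain-JDK analogue of a write-then-read file I/O test (not Beam code). */
final class TextRoundTrip {

  /** Outcome of one run: lines read back, an order-independent checksum, and wall-clock time. */
  static final class Result {
    final long linesRead;
    final long checksum;
    final long elapsedNanos;

    Result(long linesRead, long checksum, long elapsedNanos) {
      this.linesRead = linesRead;
      this.checksum = checksum;
      this.elapsedNanos = elapsedNanos;
    }
  }

  /** Deterministic synthetic line, so the expected checksum can be recomputed independently. */
  static String syntheticLine(int i) {
    return "record-" + i;
  }

  /** Expected checksum: the sum of line hash codes, which does not depend on line order. */
  static long expectedChecksum(int numLines) {
    long sum = 0;
    for (int i = 0; i < numLines; i++) {
      sum += syntheticLine(i).hashCode();
    }
    return sum;
  }

  /** Writes numLines lines to file (gzip-compressed if requested), reads them back, and checksums them. */
  static Result run(Path file, int numLines, boolean gzip) throws IOException {
    long start = System.nanoTime();

    // Write phase: wrap the raw file stream in GZIP when testing compressed text.
    try (OutputStream raw = Files.newOutputStream(file);
        OutputStream out = gzip ? new GZIPOutputStream(raw) : raw;
        BufferedWriter writer =
            new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
      for (int i = 0; i < numLines; i++) {
        writer.write(syntheticLine(i));
        writer.newLine();
      }
    }

    // Read phase: count lines and accumulate the order-independent checksum.
    long lines = 0;
    long checksum = 0;
    try (InputStream raw = Files.newInputStream(file);
        InputStream in = gzip ? new GZIPInputStream(raw) : raw;
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines++;
        checksum += line.hashCode();
      }
    }

    return new Result(lines, checksum, System.nanoTime() - start);
  }
}
```

A test in the framework described above would presumably express the write and read as Beam PTransforms instead, so the same check could run on different runners and file systems. The checksum is order-independent because a parallel read need not return lines in the order they were written.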



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80318
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 13:26
Start Date: 14/Mar/18 13:26
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373019804
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 80318)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80306
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 12:15
Start Date: 14/Mar/18 12:15
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-373000502
 
 
   Run Java TextIO Performance Test HDFS




Issue Time Tracking
---

Worklog Id: (was: 80306)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=80304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-80304
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 14/Mar/18 12:12
Start Date: 14/Mar/18 12:12
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-372999649
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 80304)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=79943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79943
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 13/Mar/18 16:48
Start Date: 13/Mar/18 16:48
Worklog Time Spent: 10m 
  Work Description: szewi commented on issue #4861: [BEAM-3060] Jenkins 
configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861#issuecomment-372735722
 
 
   Run Seed Job




Issue Time Tracking
---

Worklog Id: (was: 79943)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-3060) Add performance tests for commonly used file-based I/O PTransforms

2018-03-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3060?focusedWorklogId=79941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-79941
 ]

ASF GitHub Bot logged work on BEAM-3060:


Author: ASF GitHub Bot
Created on: 13/Mar/18 16:45
Start Date: 13/Mar/18 16:45
Worklog Time Spent: 10m 
  Work Description: szewi opened a new pull request #4861: [BEAM-3060] 
Jenkins configuration allowing to run FilebasedIO tests on HDFS.
URL: https://github.com/apache/beam/pull/4861
 
 
   DESCRIPTION HERE
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
- [ ] Write a pull request description that is detailed enough to 
understand:
  - [ ] What the pull request does
  - [ ] Why it does it
  - [ ] How it does it
  - [ ] Why this approach
- [ ] Each commit in the pull request should have a meaningful subject line 
and body.
- [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
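The title convention in the checklist can be checked mechanically. The Java sketch below is a hypothetical helper, not part of Beam's tooling: it accepts titles of the form `[BEAM-<digits>] <description>` and extracts the issue key. The exact pattern (a single space after the closing bracket, then a non-empty description) is an assumption drawn only from the example title in the checklist.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: checks a PR title against the "[BEAM-XXX] Description" convention. */
final class PrTitleCheck {
  // "[" + a BEAM issue key with a numeric id + "]" + one space + a description
  // starting with a non-whitespace character. Group 1 captures the issue key.
  private static final Pattern TITLE = Pattern.compile("\\[(BEAM-\\d+)\\] \\S.*");

  /** True if the whole title matches the assumed convention. */
  static boolean isWellFormed(String title) {
    return TITLE.matcher(title).matches();
  }

  /** Returns the JIRA key (e.g. "BEAM-3060") if the title is well formed, else empty. */
  static Optional<String> issueKey(String title) {
    Matcher m = TITLE.matcher(title);
    return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
  }
}
```

For example, the title of this pull request, `[BEAM-3060] Jenkins configuration allowing to run FilebasedIO tests on HDFS.`, passes, while the unedited template title `[BEAM-XXX] Fixes bug in ApproximateQuantiles` fails because `XXX` has not been replaced with an issue number.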
   
   




Issue Time Tracking
---

Worklog Id: (was: 79941)
Time Spent: 10m
Remaining Estimate: 0h
