[jira] [Created] (BEAM-6015) Uber task for Portable Flink scalability

2018-11-07 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-6015:
--

 Summary: Uber task for Portable Flink scalability
 Key: BEAM-6015
 URL: https://issues.apache.org/jira/browse/BEAM-6015
 Project: Beam
  Issue Type: Task
  Components: java-fn-execution, runner-flink
Reporter: Ankur Goenka


Task to track scalability issues with portable Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)

2018-11-07 Thread Chamikara Jayalath (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678925#comment-16678925
 ] 

Chamikara Jayalath commented on BEAM-6002:
--

Looks like the Dataflow job (the first one) succeeded, but the JVM crashed for some reason.

 

For example,

[https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Dataflow/952/consoleFull]

Job passed: 
[https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-11-07_14_04_35-10954309474319148697?project=apache-beam-testing]

 

But beam-sdks-java-nexmark:run failed due to the following:
{code}
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':beam-sdks-java-nexmark:run'.
Caused by: org.gradle.process.internal.ExecException: Process 'command '/usr/local/asfpackages/java/jdk1.8.0_191/bin/java'' finished with non-zero exit value 1
	at org.gradle.process.internal.DefaultExecHandle$ExecResultImpl.assertNormalExitValue(DefaultExecHandle.java:395)
	at org.gradle.process.internal.DefaultJavaExecAction.execute(DefaultJavaExecAction.java:37)
{code}
 
 
[~apilloud] any idea what's going on here? Doesn't look like a timeout to me 
anymore.

> Nexmark tests timing out on all runners (crash loop due to metrics?) 
> -
>
> Key: BEAM-6002
> URL: https://issues.apache.org/jira/browse/BEAM-6002
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kenneth Knowles
>Assignee: Lukasz Gajowy
>Priority: Critical
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/
> {code}
> 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:26 2018-11-06T16:58:26.035Z no activity
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:41 2018-11-06T16:58:41.036Z no activity
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for 

[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163724
 ]

ASF GitHub Bot logged work on BEAM-4444:


Author: ASF GitHub Bot
Created on: 07/Nov/18 23:04
Start Date: 07/Nov/18 23:04
Worklog Time Spent: 10m 
  Work Description: ihji removed a comment on issue #6763: [BEAM-4444] 
Parquet IO for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-436803756
 
 
   run python postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163724)
Time Spent: 6h  (was: 5h 50m)

> Parquet IO for Python SDK
> -
>
> Key: BEAM-4444
> URL: https://issues.apache.org/jira/browse/BEAM-4444
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Bruce Arctor
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.





[jira] [Commented] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner

2018-11-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678927#comment-16678927
 ] 

Valentyn Tymofieiev commented on BEAM-6014:
---

I wonder if the reason is that with DirectRunner we are using some 
unsupported version of the BQ client, compared to the native IO in the Dataflow runner.

> bigquery_tornadoes example does not work on Python Direct Runner
> 
>
> Key: BEAM-6014
> URL: https://issues.apache.org/jira/browse/BEAM-6014
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ahmet Altay
>Priority: Major
>
> Steps to reproduce:
>  
> {noformat}
> 
>  
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
> $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> {noformat}
>  Last modified      Schema                     Total Rows  Total Bytes  Expiration  Time Partitioning  Labels
>  ----------------- -------------------------- ----------- ------------ ----------- ------------------ ------
>  07 Nov 14:45:23   |- month: integer          0           0
>                    |- tornado_count: integer
> {noformat}
>  
> The resulting Database is empty, however it has several rows if we run the 
> example with Dataflow runner.
> Older versions of SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].





[jira] [Assigned] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)

2018-11-07 Thread Chamikara Jayalath (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath reassigned BEAM-6002:


Assignee: Andrew Pilloud  (was: Lukasz Gajowy)

> Nexmark tests timing out on all runners (crash loop due to metrics?) 
> -
>
> Key: BEAM-6002
> URL: https://issues.apache.org/jira/browse/BEAM-6002
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kenneth Knowles
>Assignee: Andrew Pilloud
>Priority: Critical
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/
> {code}
> 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:26 2018-11-06T16:58:26.035Z no activity
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:41 2018-11-06T16:58:41.036Z no activity
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:56 2018-11-06T16:58:56.036Z RUNNING Query0
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:56 2018-11-06T16:58:56.036Z no activity
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for 

[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163725
 ]

ASF GitHub Bot logged work on BEAM-4444:


Author: ASF GitHub Bot
Created on: 07/Nov/18 23:04
Start Date: 07/Nov/18 23:04
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #6763: [BEAM-4444] Parquet IO 
for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-436810501
 
 
   run python postcommit




Issue Time Tracking
---

Worklog Id: (was: 163725)
Time Spent: 6h 10m  (was: 6h)

> Parquet IO for Python SDK
> -
>
> Key: BEAM-4444
> URL: https://issues.apache.org/jira/browse/BEAM-4444
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Bruce Arctor
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.





[jira] [Commented] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner

2018-11-07 Thread Charles Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678911#comment-16678911
 ] 

Charles Chen commented on BEAM-6014:


I can confirm that this problem occurs for me too.  For more info, I added 
debugging statements and found that the InsertAll requests all return from 
BigQuery without error, but the table ends up without output:

{code}
BigQueryWriteFn {'tornado_count': 16, 'month': 1}
BigQueryWriteFn {'tornado_count': 6, 'month': 3}
BigQueryWriteFn {'tornado_count': 7, 'month': 2}
BigQueryWriteFn {'tornado_count': 6, 'month': 5}
BigQueryWriteFn {'tornado_count': 5, 'month': 4}
BigQueryWriteFn {'tornado_count': 8, 'month': 7}
BigQueryWriteFn {'tornado_count': 5, 'month': 6}
BigQueryWriteFn {'tornado_count': 7, 'month': 9}
BigQueryWriteFn {'tornado_count': 4, 'month': 8}
BigQueryWriteFn {'tornado_count': 9, 'month': 11}
BigQueryWriteFn {'tornado_count': 10, 'month': 10}
BigQueryWriteFn {'tornado_count': 10, 'month': 12}
FINISH BUNDLE
FLUSH BATCH
_insert_all_rows REQUEST >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>]>
 tableId: 'out'>
_insert_all_rows RESPONSE 
BigQuery table: [MY_PROJECT:myproject.out]; errors: []
Successfully wrote 12 rows.
{code}
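The log above follows a batch-then-flush pattern: rows accumulate during the bundle and are sent in a single InsertAll-style request when the bundle finishes. A minimal, hypothetical sketch of that pattern (invented names; this is not Beam's actual BigQueryWriteFn implementation):

```python
# Hypothetical, simplified sketch of the batch/flush behavior the debug log
# above suggests. Rows are buffered per bundle and flushed as one request,
# mirroring the "FINISH BUNDLE" / "FLUSH BATCH" lines in the log.
class FakeBigQueryWriteFn:
    def __init__(self, batch_size=500):
        self.batch_size = batch_size
        self.rows = []        # rows buffered in the current batch
        self.requests = []    # stands in for _insert_all_rows calls

    def process(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self._flush()

    def finish_bundle(self):
        # "FINISH BUNDLE" triggers a final "FLUSH BATCH"
        self._flush()

    def _flush(self):
        if self.rows:
            self.requests.append(list(self.rows))
            self.rows = []

fn = FakeBigQueryWriteFn()
for month, count in [(1, 16), (3, 6), (2, 7)]:
    fn.process({'month': month, 'tornado_count': count})
fn.finish_bundle()
# One request now carries all three buffered rows.
```

The point of the sketch: the request is only issued at flush time, so a response with empty `errors` (as in the log) confirms delivery, not table visibility.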

> bigquery_tornadoes example does not work on Python Direct Runner
> 
>
> Key: BEAM-6014
> URL: https://issues.apache.org/jira/browse/BEAM-6014
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ahmet Altay
>Priority: Major
>
> Steps to reproduce:
>  
> {noformat}
> 
>  
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
> $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> The resulting Database is empty, however it has several rows if we run the 
> example with Dataflow runner.
> Older versions of SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].





[jira] [Updated] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner

2018-11-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-6014:
--
Description: 
Steps to reproduce:

 
{noformat}

 
PROJECT=$(gcloud config get-value project)
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq rm -rf --project=$PROJECT $BQ_DATASET
bq mk --project=$PROJECT $BQ_DATASET
python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
$BQ_DATASET.$TABLE_NAME --project=$PROJECT
bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
{noformat}
{noformat}
 Last modified      Schema                     Total Rows  Total Bytes  Expiration  Time Partitioning  Labels
 ----------------- -------------------------- ----------- ------------ ----------- ------------------ ------
 07 Nov 14:45:23   |- month: integer          0           0
                   |- tornado_count: integer
{noformat}
 

The resulting table is empty; however, it has several rows if we run the 
example with the Dataflow runner.

Older versions of SDK also seem to have this problem.

cc: [~chamikara], [~charleschen].

  was:
Steps to reproduce:

 
{noformat}

 
PROJECT=$(gcloud config get-value project)
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq rm -rf --project=$PROJECT $BQ_DATASET
bq mk --project=$PROJECT $BQ_DATASET
python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
$BQ_DATASET.$TABLE_NAME --project=$PROJECT
bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
{noformat}
The resulting Database is empty, however it has several rows if we run the 
example with Dataflow runner.

Older versions of SDK also seem to have this problem.

cc: [~chamikara], [~charleschen].


> bigquery_tornadoes example does not work on Python Direct Runner
> 
>
> Key: BEAM-6014
> URL: https://issues.apache.org/jira/browse/BEAM-6014
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ahmet Altay
>Priority: Major
>
> Steps to reproduce:
>  
> {noformat}
> 
>  
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
> $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> {noformat}
>  Last modified      Schema                     Total Rows  Total Bytes  Expiration  Time Partitioning  Labels
>  ----------------- -------------------------- ----------- ------------ ----------- ------------------ ------
>  07 Nov 14:45:23   |- month: integer          0           0
>                    |- tornado_count: integer
> {noformat}
>  
> The resulting Database is empty, however it has several rows if we run the 
> example with Dataflow runner.
> Older versions of SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].





[jira] [Work logged] (BEAM-5798) Add support for dynamic destinations when writing to Kafka

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5798?focusedWorklogId=163717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163717
 ]

ASF GitHub Bot logged work on BEAM-5798:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:56
Start Date: 07/Nov/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #6776: WIP: [BEAM-5798] 
Added "withTopicFn()" to set sink topics dynamically
URL: https://github.com/apache/beam/pull/6776#issuecomment-436808370
 
 
   Closed PR as per authors comment.




Issue Time Tracking
---

Worklog Id: (was: 163717)
Time Spent: 2h  (was: 1h 50m)

> Add support for dynamic destinations when writing to Kafka
> --
>
> Key: BEAM-5798
> URL: https://issues.apache.org/jira/browse/BEAM-5798
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Luke Cwik
>Assignee: Alexey Romanenko
>Priority: Major
>  Labels: newbie, starter
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add support for writing to Kafka based upon contents of the data. This is 
> similar to the dynamic destination approach for file IO and other sinks.
>  
> Source of request: 
> https://lists.apache.org/thread.html/a89d1d32ecdb50c42271e805cc01a651ee3623b4df97c39baf4f2053@%3Cuser.beam.apache.org%3E





[jira] [Work logged] (BEAM-5798) Add support for dynamic destinations when writing to Kafka

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5798?focusedWorklogId=163718=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163718
 ]

ASF GitHub Bot logged work on BEAM-5798:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:56
Start Date: 07/Nov/18 22:56
Worklog Time Spent: 10m 
  Work Description: lukecwik closed pull request #6776: WIP: [BEAM-5798] 
Added "withTopicFn()" to set sink topics dynamically
URL: https://github.com/apache/beam/pull/6776
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
index 31ba72c54ba..e311688b179 100644
--- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
+++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
@@ -167,7 +167,7 @@
  * PCollection<KV<Long, String>> kvColl = ...;
  * kvColl.apply(KafkaIO.<Long, String>write()
  *      .withBootstrapServers("broker_1:9092,broker_2:9092")
- *      .withTopic("results")
+ *      .withTopic("results") // use withTopicFn(SerializableFunction<KV<K, V>, String> fn) to set topics dynamically
  *
  *      .withKeySerializer(LongSerializer.class)
  *      .withValueSerializer(StringSerializer.class)
@@ -862,6 +862,9 @@ private KafkaIO() {}
     @Nullable
     abstract String getTopic();
 
+    @Nullable
+    abstract SerializableFunction<KV<K, V>, String> getTopicFn();
+
     abstract Map<String, Object> getProducerConfig();
 
     @Nullable
@@ -894,6 +897,8 @@ private KafkaIO() {}
     abstract static class Builder<K, V> {
       abstract Builder<K, V> setTopic(String topic);
 
+      abstract Builder<K, V> setTopicFn(SerializableFunction<KV<K, V>, String> fn);
+
       abstract Builder<K, V> setProducerConfig(Map<String, Object> producerConfig);
 
       abstract Builder<K, V> setProducerFactoryFn(
@@ -927,9 +932,20 @@ private KafkaIO() {}
           ImmutableMap.of(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers));
     }
 
-    /** Sets the Kafka topic to write to. */
+    /**
+     * Sets the Kafka topic to write to. Note that this overrides any function previously set
+     * by {@link #withTopicFn}.
+     */
     public Write<K, V> withTopic(String topic) {
-      return toBuilder().setTopic(topic).build();
+      return toBuilder().setTopic(topic).setTopicFn(null).build();
+    }
+
+    /**
+     * Sets a custom function to define the sink topic dynamically. Note that this overrides
+     * any previously set topic by {@link #withTopic}.
+     */
+    public Write<K, V> withTopicFn(SerializableFunction<KV<K, V>, String> topicFn) {
+      return toBuilder().setTopic(null).setTopicFn(topicFn).build();
     }
 
 /**
@@ -1057,11 +1073,16 @@ public PDone expand(PCollection<KV<K, V>> input) {
       checkArgument(
           getProducerConfig().get(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG) != null,
           "withBootstrapServers() is required");
-      checkArgument(getTopic() != null, "withTopic() is required");
+      checkArgument(
+          getTopic() != null || getTopicFn() != null, "withTopic() or withTopicFn() is required");
+
       checkArgument(getKeySerializer() != null, "withKeySerializer() is required");
       checkArgument(getValueSerializer() != null, "withValueSerializer() is required");
 
       if (isEOS()) {
+        checkArgument(getTopic() != null, "withTopic() is required with EOS sink");
+        checkArgument(getTopicFn() == null, "withTopicFn() can't be used together with EOS sink");
+
         KafkaExactlyOnceSink.ensureEOSSupport();
 
         // TODO: Verify that the group_id does not have existing state stored on Kafka unless
diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java
index beaa9a22053..3b55652a720 100644
--- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java
+++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java
@@ -61,8 +61,9 @@ public void processElement(ProcessContext ctx) throws Exception {
         ? spec.getPublishTimestampFunction().getTimestamp(kv, ctx.timestamp()).getMillis()
         : null;
 
+    String topic = spec.getTopicFn() != null ? spec.getTopicFn().apply(kv) : spec.getTopic();
     producer.send(
-        new ProducerRecord<>(spec.getTopic(), null, timestampMillis, kv.getKey(), kv.getValue()),
+        new ProducerRecord<>(topic, null, timestampMillis, kv.getKey(), kv.getValue()),
         new SendCallback());
 
     elementsWritten.inc();
diff --git 
a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java
 

[jira] [Comment Edited] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner

2018-11-07 Thread Charles Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678911#comment-16678911
 ] 

Charles Chen edited comment on BEAM-6014 at 11/7/18 10:55 PM:
--

I can confirm that this problem occurs for me too.  For more info, I added 
debugging statements and found that the InsertAll requests all return from 
BigQuery without error, but the table ends up without output:


{code}
BigQueryWriteFn {'tornado_count': 16, 'month': 1}
BigQueryWriteFn {'tornado_count': 6, 'month': 3}
BigQueryWriteFn {'tornado_count': 7, 'month': 2}
BigQueryWriteFn {'tornado_count': 6, 'month': 5}
BigQueryWriteFn {'tornado_count': 5, 'month': 4}
BigQueryWriteFn {'tornado_count': 8, 'month': 7}
BigQueryWriteFn {'tornado_count': 5, 'month': 6}
BigQueryWriteFn {'tornado_count': 7, 'month': 9}
BigQueryWriteFn {'tornado_count': 4, 'month': 8}
BigQueryWriteFn {'tornado_count': 9, 'month': 11}
BigQueryWriteFn {'tornado_count': 10, 'month': 10}
BigQueryWriteFn {'tornado_count': 10, 'month': 12}
FINISH BUNDLE
FLUSH BATCH
_insert_all_rows REQUEST >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>]>
 tableId: 'out'>
_insert_all_rows RESPONSE 
BigQuery table: [MY_PROJECT:myproject.out]; errors: []
Successfully wrote 12 rows.
{code}



was (Author: charleschen):
I can confirm that this problem occurs for me too.  For more info, I added 
debugging statements and found that the InsertAll requests all return from 
BigQuery without error, but the table ends up without output:

{{BigQueryWriteFn \{'tornado_count': 16, 'month': 1}}}
{{BigQueryWriteFn \{'tornado_count': 6, 'month': 3}}}
{{BigQueryWriteFn \{'tornado_count': 7, 'month': 2}}}
{{BigQueryWriteFn \{'tornado_count': 6, 'month': 5}}}
{{BigQueryWriteFn \{'tornado_count': 5, 'month': 4}}}
{{BigQueryWriteFn \{'tornado_count': 8, 'month': 7}}}
{{BigQueryWriteFn \{'tornado_count': 5, 'month': 6}}}
{{BigQueryWriteFn \{'tornado_count': 7, 'month': 9}}}
{{BigQueryWriteFn \{'tornado_count': 4, 'month': 8}}}
{{BigQueryWriteFn \{'tornado_count': 9, 'month': 11}}}
{{BigQueryWriteFn \{'tornado_count': 10, 'month': 10}}}
{{BigQueryWriteFn \{'tornado_count': 10, 'month': 12}}}
{{FINISH BUNDLE}}
{{FLUSH BATCH}}
{{_insert_all_rows REQUEST >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>]>}}
{{ tableId: 'out'>}}
{{_insert_all_rows RESPONSE }}
{{BigQuery table: [MY_PROJECT:myproject.out]; errors: []}}
{{Successfully wrote 12 rows.}}

> bigquery_tornadoes example does not work on Python Direct Runner
> 
>
> Key: BEAM-6014
> URL: https://issues.apache.org/jira/browse/BEAM-6014
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ahmet Altay
>Priority: Major
>
> Steps to reproduce:
>  
> {noformat}
> 
>  
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
> $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> The resulting Database is empty, however it has several rows if we run the 
> example with Dataflow runner.
> Older versions of SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163715
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:53
Start Date: 07/Nov/18 22:53
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6966: 
[BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231709315
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1141,12 +1141,10 @@ artifactId=${project.name}
 
 include "**/*IT.class"
 
-def pipelineOptionString = configuration.integrationTestPipelineOptions
-if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+def pipelineOptionsString = configuration.integrationTestPipelineOptions
+if(pipelineOptionsString && configuration.runner?.equalsIgnoreCase('dataflow')) {
 
 Review comment:
   Won't you want to create a new pipelineOptionsString if there wasn't one 
defined containing the args that you added?
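The reviewer's concern can be sketched as follows (a hypothetical Python model of the Groovy logic; `with_dataflow_args` is invented): guarding on the options string being non-empty would silently drop the Dataflow-specific args when no options were defined, so a missing string should instead be treated as an empty list and extended.

```python
# Hypothetical sketch of the review point: start from an empty option list
# when integrationTestPipelineOptions is undefined, so the extra Dataflow
# args (e.g. --dataflowWorkerJar) are still appended.
import json

def with_dataflow_args(pipeline_options_json, extra_args):
    # Treat a missing/empty options string as an empty list rather than
    # skipping the args entirely.
    opts = json.loads(pipeline_options_json) if pipeline_options_json else []
    return json.dumps(opts + extra_args)

# Existing options are extended...
assert with_dataflow_args('["--runner=TestDataflowRunner"]',
                          ['--dataflowWorkerJar=worker.jar']) == \
    '["--runner=TestDataflowRunner", "--dataflowWorkerJar=worker.jar"]'

# ...and a missing options string no longer drops the extra args.
assert with_dataflow_args(None, ['--dataflowWorkerJar=worker.jar']) == \
    '["--dataflowWorkerJar=worker.jar"]'
```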


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163715)
Time Spent: 5.5h  (was: 5h 20m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner

2018-11-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678895#comment-16678895
 ] 

Valentyn Tymofieiev commented on BEAM-6014:
---

Also tried to manually disable the FnAPI runner at 
[https://github.com/apache/beam/blob/b558c4d4f21ae1a48fb51afedd849b1f75ff3325/sdks/python/apache_beam/runners/direct/direct_runner.py#L76], 
with no change in behavior.

> bigquery_tornadoes example does not work on Python Direct Runner
> 
>
> Key: BEAM-6014
> URL: https://issues.apache.org/jira/browse/BEAM-6014
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ahmet Altay
>Priority: Major
>
> Steps to reproduce:
>  
> {noformat}
> 
>  
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
> $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> The resulting table is empty; however, it has several rows if we run the 
> example with the Dataflow runner.
> Older versions of the SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5939) Deduplicate constants

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163712
 ]

ASF GitHub Bot logged work on BEAM-5939:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:46
Start Date: 07/Nov/18 22:46
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6976: [BEAM-5939] Dedupe 
runner constants
URL: https://github.com/apache/beam/pull/6976#issuecomment-436805959
 
 
   Jobs are failing with the following error:
   
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 766, in run
       self._load_main_session(self.local_staging_directory)
     File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 480, in _load_main_session
       session_file = os.path.join(session_path, names.PICKLED_MAIN_SESSION_FILE)
   AttributeError: 'module' object has no attribute 'PICKLED_MAIN_SESSION_FILE'
   ```
   This is happening because Dataflow's python workers depend on the currently 
existing names.py. A change there also needs to happen, and that code is not 
open sourced yet.
   
   I suggest keeping the contents of the existing names.py as is and making 
your other changes anyway. I can ask someone from the Dataflow team to first 
clean up the worker (to depend on the new code) and then do the cleanup here.
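A minimal sketch of the backward-compatible deduplication suggested above, using stand-in modules. The module and constant names mirror the traceback; the choice of canonical home and the constant's value are assumptions for illustration.

```python
import types

# Stand-in for the module that would become the canonical home of the
# constant (e.g. the stager-side module); the value is assumed here.
stager = types.ModuleType("stager")
stager.PICKLED_MAIN_SESSION_FILE = "pickled_main_session"

# Stand-in for the legacy names.py: re-export instead of deleting, so
# workers that still read names.PICKLED_MAIN_SESSION_FILE keep working.
names = types.ModuleType("names")
names.PICKLED_MAIN_SESSION_FILE = stager.PICKLED_MAIN_SESSION_FILE

print(names.PICKLED_MAIN_SESSION_FILE)
```

The legacy attribute stays resolvable, so the AttributeError in the traceback would not occur while the worker is migrated to the new location.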


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163712)
Time Spent: 1h  (was: 50m)

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner

2018-11-07 Thread Valentyn Tymofieiev (JIRA)
Valentyn Tymofieiev created BEAM-6014:
-

 Summary: bigquery_tornadoes example does not work on Python Direct 
Runner
 Key: BEAM-6014
 URL: https://issues.apache.org/jira/browse/BEAM-6014
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev
Assignee: Ahmet Altay


Steps to reproduce:

 
{noformat}

 
PROJECT=$(gcloud config get-value project)
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq rm -rf --project=$PROJECT $BQ_DATASET
bq mk --project=$PROJECT $BQ_DATASET
python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
$BQ_DATASET.$TABLE_NAME --project=$PROJECT
bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
{noformat}
The resulting table is empty; however, it has several rows if we run the 
example with the Dataflow runner.

Older versions of the SDK also seem to have this problem.

cc: [~chamikara], [~charleschen].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6013) Reduce verbose logging within SerializableCoder

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6013?focusedWorklogId=163686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163686
 ]

ASF GitHub Bot logged work on BEAM-6013:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:08
Start Date: 07/Nov/18 22:08
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #6982: [BEAM-6013] 
Reduce logging within SerializableCoder.
URL: https://github.com/apache/beam/pull/6982
 
 
   This significantly reduces the logging: it now happens at most once per class load.
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163686)
Time Spent: 10m
Remaining Estimate: 0h

> Reduce verbose logging within SerializableCoder
> ---
>
> 

[jira] [Commented] (BEAM-5953) Support DataflowRunner on Python 3

2018-11-07 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678898#comment-16678898
 ] 

Scott Wegner commented on BEAM-5953:


I don't have much to contribute, but some additional details might help others:

* How did you launch the job? i.e., can you provide the command line used?
* Are you able to successfully run a simpler job (i.e., using the existing 
Python support instead of Python 3)?


> Support DataflowRunner on Python 3
> --
>
> Key: BEAM-5953
> URL: https://issues.apache.org/jira/browse/BEAM-5953
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163709
 ]

ASF GitHub Bot logged work on BEAM-4444:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:38
Start Date: 07/Nov/18 22:38
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #6763: [BEAM-4444] Parquet IO 
for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-436803756
 
 
   run python postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163709)
Time Spent: 5h 50m  (was: 5h 40m)

> Parquet IO for Python SDK
> -
>
> Key: BEAM-4444
> URL: https://issues.apache.org/jira/browse/BEAM-4444
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Bruce Arctor
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163708
 ]

ASF GitHub Bot logged work on BEAM-4444:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:38
Start Date: 07/Nov/18 22:38
Worklog Time Spent: 10m 
  Work Description: ihji removed a comment on issue #6763: [BEAM-4444] 
Parquet IO for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-436114110
 
 
   run python postcommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163708)
Time Spent: 5h 40m  (was: 5.5h)

> Parquet IO for Python SDK
> -
>
> Key: BEAM-4444
> URL: https://issues.apache.org/jira/browse/BEAM-4444
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Bruce Arctor
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner

2018-11-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678895#comment-16678895
 ] 

Valentyn Tymofieiev edited comment on BEAM-6014 at 11/7/18 10:35 PM:
-

Also tried to manually disable FnAPI runner at 
[https://github.com/apache/beam/blob/b558c4d4f21ae1a48fb51afedd849b1f75ff3325/sdks/python/apache_beam/runners/direct/direct_runner.py#L76]
 , with no change in behavior.


was (Author: tvalentyn):
Also tried to manually tried to disable FnAPI runner at 
[https://github.com/apache/beam/blob/b558c4d4f21ae1a48fb51afedd849b1f75ff3325/sdks/python/apache_beam/runners/direct/direct_runner.py#L76]
 , with no change in behavior.

> bigquery_tornadoes example does not work on Python Direct Runner
> 
>
> Key: BEAM-6014
> URL: https://issues.apache.org/jira/browse/BEAM-6014
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Ahmet Altay
>Priority: Major
>
> Steps to reproduce:
>  
> {noformat}
> 
>  
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output 
> $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> The resulting table is empty; however, it has several rows if we run the 
> example with the Dataflow runner.
> Older versions of the SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6013) Reduce verbose logging within SerializableCoder

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6013?focusedWorklogId=163687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163687
 ]

ASF GitHub Bot logged work on BEAM-6013:


Author: ASF GitHub Bot
Created on: 07/Nov/18 22:08
Start Date: 07/Nov/18 22:08
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #6982: [BEAM-6013] Reduce 
logging within SerializableCoder.
URL: https://github.com/apache/beam/pull/6982#issuecomment-436795762
 
 
   R: @udim @mairbek 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163687)
Time Spent: 20m  (was: 10m)

> Reduce verbose logging within SerializableCoder
> ---
>
> Key: BEAM-6013
> URL: https://issues.apache.org/jira/browse/BEAM-6013
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following message is spamming logs constantly:
> "Can't verify serialized elements of type TypeX have well defined equals 
> method. This may produce incorrect results on some PipelineRunner"
> Code location:
> https://github.com/apache/beam/blob/429547981b4534a29a0654e3b86f1a718793d816/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java#L104
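The throttling described in the linked PR can be sketched as a generic log-once-per-class guard. This is a Python stand-in for the Java change, with hypothetical names; it illustrates the technique, not the actual SerializableCoder code.

```python
import logging

_warned_classes = set()

def warn_once_per_class(cls, message):
    """Emit the warning only the first time a class is seen -- the same
    once-per-class throttling the PR applies. Returns True iff logged."""
    if cls in _warned_classes:
        return False
    _warned_classes.add(cls)
    logging.warning("Can't verify serialized elements of type %s: %s",
                    cls.__name__, message)
    return True

warn_once_per_class(str, "no well defined equals method")  # logged
warn_once_per_class(str, "no well defined equals method")  # suppressed
```

Repeated encodes of the same element type then produce a single log line instead of spamming the logs.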



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6013) Reduce verbose logging within SerializableCoder

2018-11-07 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-6013:
---

 Summary: Reduce verbose logging within SerializableCoder
 Key: BEAM-6013
 URL: https://issues.apache.org/jira/browse/BEAM-6013
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Luke Cwik
Assignee: Luke Cwik


The following message is spamming logs constantly:
"Can't verify serialized elements of type TypeX have well defined equals 
method. This may produce incorrect results on some PipelineRunner"

Code location:
https://github.com/apache/beam/blob/429547981b4534a29a0654e3b86f1a718793d816/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java#L104





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5979) Support DATE and TIME in DML

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5979?focusedWorklogId=163642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163642
 ]

ASF GitHub Bot logged work on BEAM-5979:


Author: ASF GitHub Bot
Created on: 07/Nov/18 20:29
Start Date: 07/Nov/18 20:29
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #6967: [BEAM-5979]  Fix 
DATE and TIME in INSERTION
URL: https://github.com/apache/beam/pull/6967#issuecomment-436766368
 
 
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163642)
Time Spent: 2h 10m  (was: 2h)

> Support DATE and TIME in DML
> 
>
> Key: BEAM-5979
> URL: https://issues.apache.org/jira/browse/BEAM-5979
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Right now, BeamSQL uses Schema's DATETIME field to store all time-related 
> data. However, BeamSQL doesn't correctly implement how TIME and DATE should 
> be converted to Joda's DateTime.
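As an illustration of the conversion being fixed (not the actual BeamSQL code, which is Java/Joda), a DATE and a TIME value can each be widened to millis-since-epoch, the representation a DATETIME field stores:

```python
import datetime

def date_to_millis(d):
    # DATE -> midnight UTC of that day, in millis since the epoch.
    return (d - datetime.date(1970, 1, 1)).days * 24 * 60 * 60 * 1000

def time_to_millis(t):
    # TIME -> offset within the epoch day, in millis.
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000

print(date_to_millis(datetime.date(1970, 1, 2)))  # 86400000
print(time_to_millis(datetime.time(0, 0, 1)))     # 1000
```

The key point is that DATE and TIME need different widenings; reusing one conversion for both produces the wrong instants.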



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3221) Model pipeline representation improvements

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-3221:
---

Assignee: (was: Henning Rohde)

> Model pipeline representation improvements
> --
>
> Key: BEAM-3221
> URL: https://issues.apache.org/jira/browse/BEAM-3221
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
>
> Collections of various (breaking) tweaks to the Runner API, notably the 
> pipeline representation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5679) Beam should report the account used to attempt to submit a job to Dataflow

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5679:
---

Assignee: (was: Henning Rohde)

> Beam should report the account used to attempt to submit a job to Dataflow
> --
>
> Key: BEAM-5679
> URL: https://issues.apache.org/jira/browse/BEAM-5679
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Pablo Estrada
>Priority: Major
>
> If a user lacks some privilege, the job will not be created. Users should see 
> which account is the one lacking privileges.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5791) Bound the amount of data on the data plane by time.

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5791:
---

Assignee: (was: Henning Rohde)

> Bound the amount of data on the data plane by time.
> ---
>
> Key: BEAM-5791
> URL: https://issues.apache.org/jira/browse/BEAM-5791
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is especially important for Fn API reads, where each element represents 
> a shard to read and may be very expensive, but many elements may be waiting 
> in the Fn API buffer.
> The need for this will be mitigated with full SDF support for liquid sharding 
> over the Fn API, but not eliminated unless the runner can "unread" elements 
> it has already sent. 
> This is especially important for Dataflow jobs that start out small but 
> then detect that they need more workers (e.g. due to the initial inputs being 
> an SDF).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5661) [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such file or directory

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5661:
---

Assignee: (was: Henning Rohde)

> [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such 
> file or directory
> ---
>
> Key: BEAM-5661
> URL: https://issues.apache.org/jira/browse/BEAM-5661
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Scott Wegner
>Priority: Major
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_Py_ValCont/902/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/pmjbkhxaeryx4/console-log?task=:beam-sdks-python-container:docker#L2]
>  * [Test source 
> code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/sdks/python/container/build.gradle#L61]
> Initial investigation:
> {{failed to get digest 
> sha256:4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed: open 
> /var/lib/docker/image/aufs/imagedb/content/sha256/4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed:
>  no such file or directory}}
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5657) Enable CheckStyle in dataflow java worker code

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5657:
---

Assignee: (was: Henning Rohde)

> Enable CheckStyle in dataflow java worker code
> --
>
> Key: BEAM-5657
> URL: https://issues.apache.org/jira/browse/BEAM-5657
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Priority: Minor
>
> Currently, the CheckStyle task is disabled in dataflow worker code. We should 
> fix the warnings and errors and enable this task.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5689) Remove artifact naming constraint for portable Dataflow job

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5689:
---

Assignee: (was: Henning Rohde)

> Remove artifact naming constraint for portable Dataflow job
> ---
>
> Key: BEAM-5689
> URL: https://issues.apache.org/jira/browse/BEAM-5689
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Priority: Minor
>
> Artifact names/keys are not preserved in Dataflow. Remove the below 
> workarounds when they are.
>  * Go Dataflow runner
>  * Java and Python container boot code (probably)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5686) Remove DataflowRunnerHarness shim again

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5686:
---

Assignee: (was: Henning Rohde)

> Remove DataflowRunnerHarness shim again
> ---
>
> Key: BEAM-5686
> URL: https://issues.apache.org/jira/browse/BEAM-5686
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5659) Enable Javadocs check in dataflow java worker code

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5659:
---

Assignee: (was: Henning Rohde)

> Enable Javadocs check in dataflow java worker code
> --
>
> Key: BEAM-5659
> URL: https://issues.apache.org/jira/browse/BEAM-5659
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Priority: Minor
>
> The Javadoc check task is disabled for now. We should enable this task after 
> fixing all warnings.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5466) Cannot deploy job to Dataflow with RuntimeValueProvider

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5466:
---

Assignee: (was: Henning Rohde)

> Cannot deploy job to Dataflow with RuntimeValueProvider
> ---
>
> Key: BEAM-5466
> URL: https://issues.apache.org/jira/browse/BEAM-5466
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-py-core
>Affects Versions: 2.6.0
> Environment: Python 2.7
>Reporter: Mackenzie
>Priority: Major
>
> I cannot deploy an apache beam job to Cloud Dataflow that contains runtime 
> value parameters.
> The standard use case is with Cloud Dataflow Templates which use 
> RuntimeValueProvider to get template parameters.
> When trying to call `get` on the parameter, I always get an error like:
> {noformat}
> apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: 
> myparam, type: str, default_value: 'defalut-value').get() not called from a 
> runtime context
> {noformat}
>  
> A minimal example:
> {code:python}
> class UserOptions(PipelineOptions):
>     @classmethod
>     def _add_argparse_args(cls, parser):
>         parser.add_value_provider_argument('--myparam', type=str,
>                                            default='default-value')
>
> def run(argv=None):
>     parser = argparse.ArgumentParser()
>     known_args, pipeline_args = parser.parse_known_args(argv)
>     pipeline_options = PipelineOptions(pipeline_args)
>     pipeline_options.view_as(SetupOptions).save_main_session = True
>     google_cloud_options = pipeline_options.view_as(GoogleCloudOptions)
>     # insert google cloud options here, or pass them in arguments
>     standard_options = pipeline_options.view_as(StandardOptions)
>     standard_options.runner = 'DataflowRunner'
>     user_options = pipeline_options.view_as(UserOptions)
>     p = beam.Pipeline(options=pipeline_options)
>     param = user_options.myparam.get()  # This line is the issue
>     result = p.run()
>     result.wait_until_finish()
>
> if __name__ == '__main__':
>     run()
> {code}
> I would expect that the runtime context would be ignored when running the 
> script locally.
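A stdlib-only sketch of why the error occurs and the usual workaround of deferring `.get()` until execution time. The `RuntimeValueProvider` here is a simplified stand-in for illustration, not Beam's real class.

```python
class RuntimeValueProvider:
    """Simplified stand-in for Beam's RuntimeValueProvider (illustration only)."""

    def __init__(self, default):
        self._default = default
        self.runtime_context_active = False  # flipped by the runner on workers

    def get(self):
        # Mirrors the error from the report: .get() is only valid once the
        # runner has established a runtime context.
        if not self.runtime_context_active:
            raise RuntimeError(".get() not called from a runtime context")
        return self._default


class AppendParamDoFn:
    """Pass the provider itself into the DoFn and defer .get() to process()."""

    def __init__(self, param):
        self._param = param  # store the provider, NOT param.get()

    def process(self, element):
        return element + self._param.get()


myparam = RuntimeValueProvider("default-value")
fn = AppendParamDoFn(myparam)

myparam.runtime_context_active = True  # simulate worker-side execution
print(fn.process("x-"))  # x-default-value
```

Moving the `.get()` call out of pipeline-construction code and into `process()` is what lets the same script both run locally and stage as a template.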



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-3204) Coders only should have a FunctionSpec, not an SdkFunctionSpec

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-3204:
---

Assignee: (was: Henning Rohde)

> Coders only should have a FunctionSpec, not an SdkFunctionSpec
> --
>
> Key: BEAM-3204
> URL: https://issues.apache.org/jira/browse/BEAM-3204
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: portability
>
> We added environments to coders to account for "custom" coders where it is 
> only really possible for one SDK to understand them, like this:
> {code}
> Coder {
>   spec: SdkFunctionSpec {
> environment: "java_sdk_docker_container",
> spec: FunctionSpec {
>   urn: "beam:coder:java_custom_coder",
>   payload: 
> }
>   }
> }
> {code}
> But a coder must be understood by both the producer of a PCollection and its 
> consumers. A coder is not the same as other UDF, though these are 
> user-defined.
> A pipeline where either the producer or consumer cannot handle the coder is 
> invalid, and we will have to build our cross-language APIs to prevent 
> construction of such a pipeline. So we can drop the environment.
> I think there are some folks who want to reserve the ability to add an 
> environment later, perhaps, so as not to paint ourselves into a corner. In this 
> case, we can just add a field to Coder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-2885) Local Dataflow proxy for portable job submission

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-2885:
---

Assignee: (was: Henning Rohde)

> Local Dataflow proxy for portable job submission
> 
>
> Key: BEAM-2885
> URL: https://issues.apache.org/jira/browse/BEAM-2885
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> As per https://s.apache.org/beam-job-api, use local support for 
> submission-side. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-5658) Enable FindBugs in dataflow java worker code

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5658:
---

Assignee: (was: Henning Rohde)

> Enable FindBugs in dataflow java worker code
> 
>
> Key: BEAM-5658
> URL: https://issues.apache.org/jira/browse/BEAM-5658
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Boyuan Zhang
>Priority: Minor
>
> Currently the FindBugs task is disabled in the worker code. We should enable it 
> after fixing all related warnings and errors.





[jira] [Assigned] (BEAM-5512) Dataflow migrates to using shared portability library for user timers

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-5512:
---

Assignee: (was: Henning Rohde)

> Dataflow migrates to using shared portability library for user timers
> -
>
> Key: BEAM-5512
> URL: https://issues.apache.org/jira/browse/BEAM-5512
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Priority: Major
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Assigned] (BEAM-3514) Use portable WindowIntoPayload in DataflowRunner

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-3514:
---

Assignee: (was: Henning Rohde)

> Use portable WindowIntoPayload in DataflowRunner
> 
>
> Key: BEAM-3514
> URL: https://issues.apache.org/jira/browse/BEAM-3514
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: portability
>
> The Java-specific blobs transmitted to Dataflow need more context, in the 
> form of portability framework protos.





[jira] [Assigned] (BEAM-3223) PTransform spec should not reuse FunctionSpec

2018-11-07 Thread Henning Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henning Rohde reassigned BEAM-3223:
---

Assignee: (was: Henning Rohde)

> PTransform spec should not reuse FunctionSpec
> -
>
> Key: BEAM-3223
> URL: https://issues.apache.org/jira/browse/BEAM-3223
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
>
> We should add a new type instead, TransformSpec, say, or just inline a URN 
> and payload. It's confusing otherwise.





[jira] [Commented] (BEAM-4821) Update Built-in I/O transforms page for Python

2018-11-07 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678699#comment-16678699
 ] 

Udi Meiri commented on BEAM-4821:
-

I believe both reads and writes are possible on HDFS.

> Update Built-in I/O transforms page for Python
> --
>
> Key: BEAM-4821
> URL: https://issues.apache.org/jira/browse/BEAM-4821
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Ahmet Altay
>Assignee: Melissa Pashniak
>Priority: Major
>
> We need to update:
> [https://beam.apache.org/documentation/io/built-in/]
> For python. Python now supports HDFS for reading.
> cc: [~udim]





[jira] [Comment Edited] (BEAM-5953) Support DataflowRunner on Python 3

2018-11-07 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677546#comment-16677546
 ] 

Mark Liu edited comment on BEAM-5953 at 11/7/18 7:47 PM:
-

With the Python 3 SDK container provided by BEAM-5089 and a fix 
([https://github.com/markflyhigh/incubator-beam/pull/3]) for the Python 3 type 
error, I'm able to invoke wordcount_fnapi_it against TestDataflowRunner on 
Python 3. The job can be submitted to the service, but the runner harness seems 
broken.

Failure job link:
 
[https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-11-05_15_43_59-9596490965399700763?project=google.com:clouddfe]

Exception in worker log:
{code:java}
I  Exception in thread "main"
I  
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException:
 Protocol message tag had invalid wire type. 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:115)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:551)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.GeneratedMessageV3.parseUnknownFieldProto3(GeneratedMessageV3.java:305)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$PTransform.<init>(RunnerApi.java:7084)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$PTransform.<init>(RunnerApi.java:6978)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$1.parsePartialFrom(RunnerApi.java:9169)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$1.parsePartialFrom(RunnerApi.java:9163)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$Builder.mergeFrom(RunnerApi.java:8052)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$Builder.mergeFrom(RunnerApi.java:7835)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2408)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntryLite.parseField(MapEntryLite.java:128)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntryLite.parseEntry(MapEntryLite.java:184)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry.<init>(MapEntry.java:106)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry.<init>(MapEntry.java:50)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:70)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:64)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:343)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:300)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2166)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2160)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:5523)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:5481)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:6612)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:6606)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:221)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:239)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:244)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
 
I   at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.GeneratedMessageV3.parseWithIOException(GeneratedMessageV3.java:311)
 
I   at 
org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.parseFrom(RunnerApi.java:5853)
 
I   at 
org.apache.beam.runners.dataflow.worker.DataflowWorkerHarnessHelper.getPipelineFromEnv(DataflowWorkerHarnessHelper.java:117)
 
I   at 
org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:58)
 
I  java failed with exit status 1 
F  Harness failed: exit status 1 
{code}


was (Author: 

[jira] [Work logged] (BEAM-4681) Integrate support for timers using the portability APIs into Flink

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4681?focusedWorklogId=163628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163628
 ]

ASF GitHub Bot logged work on BEAM-4681:


Author: ASF GitHub Bot
Created on: 07/Nov/18 19:33
Start Date: 07/Nov/18 19:33
Worklog Time Spent: 10m 
  Work Description: mxm commented on a change in pull request #6981: 
[BEAM-4681] Add support for portable timers in Flink streaming mode 
URL: https://github.com/apache/beam/pull/6981#discussion_r231647452
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -359,16 +543,82 @@ public void finishBundle() {
 emitResults();
   } catch (Exception e) {
 throw new RuntimeException("Failed to finish remote bundle", e);
+  } finally {
+remoteBundle = null;
+  }
+  if (bundleFinishedCallback != null) {
+bundleFinishedCallback.run();
+bundleFinishedCallback = null;
   }
 }
 
+boolean isBundleInProgress() {
+  return remoteBundle != null;
+}
+
+void setBundleFinishedCallback(Runnable callback) {
+  this.bundleFinishedCallback = callback;
+}
+
 private void emitResults() {
   KV result;
   while ((result = outputQueue.poll()) != null) {
-outputManager.output(outputMap.get(result.getKey()), (WindowedValue) 
result.getValue());
+final String inputCollectionId = result.getKey();
+TupleTag tag = outputMap.get(inputCollectionId);
+WindowedValue windowedValue =
+Preconditions.checkNotNull(
+(WindowedValue) result.getValue(),
+"Received a null value from the SDK harness for %s",
+inputCollectionId);
+if (tag != null) {
+  // process regular elements
+  outputManager.output(tag, windowedValue);
+} else {
+  // process timer elements
+  // TODO This is ugly. There should be an easier way to retrieve the
+  String timerPCollectionId =
+  inputCollectionId.substring(0, inputCollectionId.length() - 
".out:0".length());
 
 Review comment:
   @lukecwik I couldn't figure out how to retrieve the correct name for the 
collection of the timer input. I believe this needs to be fixed. 
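Stated standalone, the substring workaround in the diff above boils down to the following sketch; the function name and the endswith validation are additions for illustration, not the actual operator code:

```python
def timer_pcollection_id(input_collection_id, suffix=".out:0"):
    """Derive the timer PCollection id by stripping the output suffix.

    Mirrors the substring workaround in the diff; fragile whenever the
    naming scheme changes, which is what the TODO above is about.
    """
    if not input_collection_id.endswith(suffix):
        raise ValueError("unexpected collection id: %r" % input_collection_id)
    return input_collection_id[:-len(suffix)]
```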


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163628)
Time Spent: 20m  (was: 10m)

> Integrate support for timers using the portability APIs into Flink
> --
>
> Key: BEAM-4681
> URL: https://issues.apache.org/jira/browse/BEAM-4681
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Luke Cwik
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Consider using the code produced in BEAM-4658 to support timers.





[jira] [Work logged] (BEAM-6008) Propagate errors through portable portable runner

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6008?focusedWorklogId=163629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163629
 ]

ASF GitHub Bot logged work on BEAM-6008:


Author: ASF GitHub Bot
Created on: 07/Nov/18 19:33
Start Date: 07/Nov/18 19:33
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #6973: [BEAM-6008] 
Propagate errors through portable portable runner.
URL: https://github.com/apache/beam/pull/6973#issuecomment-436749667
 
 
   Run Python PreCommit




Issue Time Tracking
---

Worklog Id: (was: 163629)
Time Spent: 40m  (was: 0.5h)

> Propagate errors through portable portable runner
> -
>
> Key: BEAM-6008
> URL: https://issues.apache.org/jira/browse/BEAM-6008
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Ahmet Altay
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163625
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 19:24
Start Date: 07/Nov/18 19:24
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436746737
 
 
   All comments addressed.  PTAL. 
   
   @lgajowy  regarding `./gradlew clean assemble`, you are correct.  We were able 
to reproduce it. Root cause: when the clean task evaluates the target, things 
are slightly different.  Made an update.  Thanks. 




Issue Time Tracking
---

Worklog Id: (was: 163625)
Time Spent: 5h 20m  (was: 5h 10m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.
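For option #1, the extra options amount to something like the following sketch; the function name and jar path are hypothetical, and only the two flags come from the ticket:

```python
def worker_jar_args(worker_jar_path):
    """Build the extra pipeline options for option #1: point the test at a
    locally-built Dataflow worker jar and blank out the harness image."""
    return [
        "--dataflowWorkerJar=%s" % worker_jar_path,
        "--workerHarnessContainerImage=",
    ]
```

The test framework would append these to its usual Nexmark/perf pipeline arguments when launching the job.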





[jira] [Work logged] (BEAM-4681) Integrate support for timers using the portability APIs into Flink

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4681?focusedWorklogId=163627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163627
 ]

ASF GitHub Bot logged work on BEAM-4681:


Author: ASF GitHub Bot
Created on: 07/Nov/18 19:30
Start Date: 07/Nov/18 19:30
Worklog Time Spent: 10m 
  Work Description: mxm opened a new pull request #6981: [BEAM-4681] Add 
support for portable timers in Flink streaming mode 
URL: https://github.com/apache/beam/pull/6981
 
 
   This adds support for portable timers to the Flink Runner in streaming mode. 
Batch support will be added in a follow-up.
   
   The `UsesTimersInParDo` tests of Java ValidatesPortableRunner have been run 
on it and they are passing. We can't enable them yet because they currently run 
only for batch. See https://issues.apache.org/jira/browse/BEAM-6009.
   
   To enable the Python ValidatesPortableRunner tests 
https://issues.apache.org/jira/browse/BEAM-5999 needs to be addressed.
   
   CC @tweise @angoenka @robertwb 
   




Issue Time Tracking
---

Worklog Id: (was: 163627)
Time Spent: 10m
Remaining Estimate: 0h

> Integrate support for timers using the portability APIs into Flink
> --
>
> Key: BEAM-4681
> URL: https://issues.apache.org/jira/browse/BEAM-4681
> Project: Beam
> 

[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163602=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163602
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 19:08
Start Date: 07/Nov/18 19:08
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6916: [BEAM-5931] Update 
nexmark performance test with dataflow worker jar
URL: https://github.com/apache/beam/pull/6916#issuecomment-436741374
 
 
   @lgajowy 
   
   Looking at the error more closely, it happens within the operation
   
   
```com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation```
   
   rather than from inside the worker jar.
   
   Maybe we are hitting the test account's cloud usage limits, or started too many 
performance tests in parallel? 




Issue Time Tracking
---

Worklog Id: (was: 163602)
Time Spent: 5h 10m  (was: 5h)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163596
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 19:00
Start Date: 07/Nov/18 19:00
Worklog Time Spent: 10m 
  Work Description: jasonkuster closed pull request #6923: [BEAM-6005] 
PCollectionCustomCoderTest updates to fix test to actually function.
URL: https://github.com/apache/beam/pull/6923
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
index 9cc400b4639..28f6fe8dcdc 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
@@ -17,7 +17,8 @@
  */
 package org.apache.beam.sdk.coders;
 
-import static org.hamcrest.CoreMatchers.instanceOf;
+import static 
org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder;
+import static org.junit.Assert.assertThat;
 
 import java.io.IOException;
 import java.io.InputStream;
@@ -25,26 +26,37 @@
 import java.io.ObjectOutputStream;
 import java.io.OutputStream;
 import java.lang.reflect.InvocationTargetException;
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
 import org.apache.beam.sdk.Pipeline;
-import org.apache.beam.sdk.options.PipelineOptions;
-import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.apache.beam.sdk.testing.NeedsRunner;
 import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.testing.TestPipeline;
 import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.DoFn.ProcessContext;
+import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.Reshuffle;
+import org.apache.beam.sdk.transforms.SerializableFunction;
 import org.apache.beam.sdk.values.PCollection;
+import org.hamcrest.Description;
+import org.hamcrest.TypeSafeMatcher;
+import org.junit.Ignore;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 import org.junit.rules.ExpectedException;
 import org.junit.runner.RunWith;
 import org.junit.runners.JUnit4;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /** Tests for coder exception handling in runners. */
 @RunWith(JUnit4.class)
 public class PCollectionCustomCoderTest {
+  private static final Logger LOG = 
LoggerFactory.getLogger(PCollectionCustomCoderTest.class);
   /**
* A custom test coder that can throw various exceptions during:
*
@@ -58,7 +70,8 @@
   static final String NULL_POINTER_EXCEPTION = 
"java.lang.NullPointerException";
   static final String EXCEPTION_MESSAGE = "Super Unique Message!!!";
 
-  @Rule public ExpectedException thrown = ExpectedException.none();
+  @Rule public final transient ExpectedException thrown = 
ExpectedException.none();
+  @Rule public final transient TestPipeline pipeline = TestPipeline.create();
 
   /** Wrapper of StringUtf8Coder with customizable exception-throwing. */
   public static class CustomTestCoder extends CustomCoder {
@@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) 
throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
 
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+
+p.run().waitUntilFinish();
   }
 
  

[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163595
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 19:00
Start Date: 07/Nov/18 19:00
Worklog Time Spent: 10m 
  Work Description: jasonkuster commented on issue #6923: [BEAM-6005] 
PCollectionCustomCoderTest updates to fix test to actually function.
URL: https://github.com/apache/beam/pull/6923#issuecomment-436738928
 
 
   Presubmits came back green. Merging.




Issue Time Tracking
---

Worklog Id: (was: 163595)
Time Spent: 1h 10m  (was: 1h)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The test assertions trigger before the coder is used on the actual runner.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163584
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:45
Start Date: 07/Nov/18 18:45
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436734035
 
 
   Run Java JdbcIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 163584)
Time Spent: 4h 40m  (was: 4.5h)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163586=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163586
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:49
Start Date: 07/Nov/18 18:49
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6916: [BEAM-5931] Update 
nexmark performance test with dataflow worker jar
URL: https://github.com/apache/beam/pull/6916#issuecomment-436735247
 
 
   > Are recent Dataflow + Nexmark failures related to this change? See the 
logs: 
https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Dataflow/943/console
   > 
   > And the error in particular:
   > 
   > ```
   > java.io.IOException: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
   > 10:01:00 {
   > 10:01:00   "code" : 429,
   > 10:01:00   "errors" : [ {
   > 10:01:00 "domain" : "usageLimits",
   > 10:01:00 "message" : "The total number of changes to the object 
temp-storage-for-perf-tests/nexmark/staging/beam-runners-google-cloud-dataflow-java-legacy-worker-2.9.0-SNAPSHOT-VzyH4c3KuvZFQMINha03bw.jar
 exceeds the rate limit. Please reduce the rate of create, update, and delete 
requests.",
   > 10:01:00 "reason" : "rateLimitExceeded"
   > 10:01:00   } ],
   > 10:01:00   "message" : "The total number of changes to the object 
temp-storage-for-perf-tests/nexmark/staging/beam-runners-google-cloud-dataflow-java-legacy-worker-2.9.0-SNAPSHOT-VzyH4c3KuvZFQMINha03bw.jar
 exceeds the rate limit. Please reduce the rate of create, update, and delete 
requests."
   > 10:01:00 }
   > 10:01:00   at 
com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
   > 10:01:00   at 
com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
   > 10:01:00   at 
org.apache.beam.runners.dataflow.util.PackageUtil.$closeResource(PackageUtil.java:260)
   > 10:01:00   at 
org.apache.beam.runners.dataflow.util.PackageUtil.tryStagePackage(PackageUtil.java:260)
   > 10:01:00   at 
org.apache.beam.runners.dataflow.util.PackageUtil.tryStagePackageWithRetry(PackageUtil.java:203)
   > 10:01:00   at 
org.apache.beam.runners.dataflow.util.PackageUtil.stagePackageSynchronously(PackageUtil.java:187)
   > 10:01:00   at 
org.apache.beam.runners.dataflow.util.PackageUtil.lambda$stagePackage$1(PackageUtil.java:171)
   > 10:01:00   at 
org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:104)
   > 10:01:00   at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
   > 10:01:00   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   > 10:01:00   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   > 10:01:00   at java.lang.Thread.run(Thread.java:748)
   > 10:01:00 Caused by: 
com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
   > ```
   
   Hmmm...  But since the error is about the worker jar, it could be relevant. 
Looking into it. 
   
   + @boyuanzz as well 
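
The 429 `rateLimitExceeded` in the quoted log comes from GCS limiting the rate of updates to a single staged object (the worker jar). A minimal sketch of the usual mitigation, retrying with exponential backoff plus jitter; the names here (`StagingAttempt`, `stageWithBackoff`) are illustrative, not Beam's actual `PackageUtil` API:

```java
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;

public class BackoffSketch {
  /** Hypothetical single staging attempt; throws on a 429/410 from GCS. */
  interface StagingAttempt {
    void run() throws IOException;
  }

  /** Retries the attempt, doubling the base delay and adding jitter each time. */
  static void stageWithBackoff(StagingAttempt attempt, int maxRetries, long initialDelayMillis)
      throws IOException, InterruptedException {
    long delayMillis = initialDelayMillis;
    for (int tryNum = 0; ; tryNum++) {
      try {
        attempt.run();
        return; // staged successfully
      } catch (IOException e) {
        if (tryNum >= maxRetries) {
          throw e; // out of retries, surface the rate-limit error
        }
        // Sleep for the base delay plus up to 50% random jitter, then double the base.
        long jitter = ThreadLocalRandom.current().nextLong(delayMillis / 2 + 1);
        Thread.sleep(delayMillis + jitter);
        delayMillis *= 2;
      }
    }
  }
}
```

Jitter matters here because many Jenkins executors stage the same jar concurrently; without it they retry in lockstep and hit the same per-object limit again.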


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163586)
Time Spent: 5h  (was: 4h 50m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163585
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:45
Start Date: 07/Nov/18 18:45
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436734054
 
 
   Run Java MongoDBIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 163585)
Time Spent: 4h 50m  (was: 4h 40m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163583
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:44
Start Date: 07/Nov/18 18:44
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436733681
 
 
   Run Java TextIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 163583)
Time Spent: 4.5h  (was: 4h 20m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5939) Deduplicate constants

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163575
 ]

ASF GitHub Bot logged work on BEAM-5939:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:34
Start Date: 07/Nov/18 18:34
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6976: [BEAM-5939] Dedupe 
runner constants
URL: https://github.com/apache/beam/pull/6976#issuecomment-436730199
 
 
   run python postcommit




Issue Time Tracking
---

Worklog Id: (was: 163575)
Time Spent: 50m  (was: 40m)

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.





[jira] [Assigned] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)

2018-11-07 Thread Chamikara Jayalath (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Jayalath reassigned BEAM-6002:


Assignee: Lukasz Gajowy  (was: Chamikara Jayalath)

> Nexmark tests timing out on all runners (crash loop due to metrics?) 
> -
>
> Key: BEAM-6002
> URL: https://issues.apache.org/jira/browse/BEAM-6002
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kenneth Knowles
>Assignee: Lukasz Gajowy
>Priority: Critical
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/
> {code}
> 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:26 2018-11-06T16:58:26.035Z no activity
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:41 2018-11-06T16:58:41.036Z no activity
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:56 2018-11-06T16:58:56.036Z RUNNING Query0
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:56 2018-11-06T16:58:56.036Z no activity
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime 

[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163582&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163582
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:38
Start Date: 07/Nov/18 18:38
Worklog Time Spent: 10m 
  Work Description: HuangLED removed a comment on issue #6966: [DRAFT] 
[BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436725979
 
 
   Comments above addressed. 
   
   @lgajowy  You are correct. I tried the 'clean assemble' command and now get the 
same exception. I don't yet understand why this target runs perfectly but 
'clean assemble' does not. Looking into it now (I am still fairly new to Gradle), 
but if you know what might be causing it, please let me know.  Thanks! 
   




Issue Time Tracking
---

Worklog Id: (was: 163582)
Time Spent: 4h 20m  (was: 4h 10m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work stopped] (BEAM-5965) RPAD

2018-11-07 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-5965 stopped by Rui Wang.
--
> RPAD
> 
>
> Key: BEAM-5965
> URL: https://issues.apache.org/jira/browse/BEAM-5965
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> RPAD(original_value, return_length[, pattern])
> Returns a value that consists of original_value appended with pattern. The 
> return_length is an INT64 that specifies the length of the returned value. If 
> original_value is BYTES, return_length is the number of bytes. If 
> original_value is STRING, return_length is the number of characters.
> The default value of pattern is a blank space.
> Both original_value and pattern must be the same data type.
> If return_length is less than or equal to the original_value length, this 
> function returns original_value truncated to return_length. For example, 
> RPAD("hello world", 7) returns "hello w".
> If original_value, return_length, or pattern is NULL, this function returns 
> NULL.
> This function returns an error if:
> return_length is negative
> pattern is empty
> Return type
> STRING or BYTES
> Examples
> SELECT t, len, FORMAT("%T", RPAD(t, len)) AS RPAD FROM UNNEST([
>   STRUCT('abc' AS t, 5 AS len),
>   ('abc', 2),
>   ('例子', 4)
> ]);
> t len RPAD
> abc   5   "abc  "
> abc   2   "ab"
> 例子4   "例子  "
> SELECT t, len, pattern, FORMAT("%T", RPAD(t, len, pattern)) AS RPAD FROM 
> UNNEST([
>   STRUCT('abc' AS t, 8 AS len, 'def' AS pattern),
>   ('abc', 5, '-'),
>   ('例子', 5, '中文')
> ]);
> t len pattern RPAD
> abc   8   def "abcdefde"
> abc   5   -   "abc--"
> 例子5   中文  "例子中文中"
> SELECT FORMAT("%T", t) AS t, len, FORMAT("%T", RPAD(t, len)) AS RPAD FROM 
> UNNEST([
>   STRUCT(b'abc' AS t, 5 AS len),
>   (b'abc', 2),
>   (b'\xab\xcd\xef', 4)
> ]);
> t len RPAD
> b"abc"5   b"abc  "
> b"abc"2   b"ab"
> b"\xab\xcd\xef"   4   b"\xab\xcd\xef "
> SELECT
>   FORMAT("%T", t) AS t,
>   len,
>   FORMAT("%T", pattern) AS pattern,
>   FORMAT("%T", RPAD(t, len, pattern)) AS RPAD
> FROM UNNEST([
>   STRUCT(b'abc' AS t, 8 AS len, b'def' AS pattern),
>   (b'abc', 5, b'-'),
>   (b'\xab\xcd\xef', 5, b'\x00')
> ]);
> t len pattern RPAD
> b"abc"8   b"def"  b"abcdefde"
> b"abc"5   b"-"b"abc--"
> b"\xab\xcd\xef"   5   b"\x00" b"\xab\xcd\xef\x00\x00"
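
The STRING case of the semantics above can be sketched as a plain Java helper. This is an illustration of the rules in the description, not Beam SQL's implementation; the BYTES overload and SQL NULL propagation are omitted:

```java
public class RpadSketch {
  /** Pads (or truncates) value to returnLength using pattern, per the rules above. */
  static String rpad(String value, int returnLength, String pattern) {
    if (returnLength < 0) {
      throw new IllegalArgumentException("return_length is negative");
    }
    if (pattern.isEmpty()) {
      throw new IllegalArgumentException("pattern is empty");
    }
    if (returnLength <= value.length()) {
      return value.substring(0, returnLength); // truncate to return_length
    }
    StringBuilder sb = new StringBuilder(value);
    while (sb.length() < returnLength) {
      sb.append(pattern); // append the pattern until long enough
    }
    return sb.substring(0, returnLength); // trim a partially appended pattern
  }
}
```

Note that Java's `length()` counts UTF-16 code units, so this sketch only matches the spec's "number of characters" for text in the Basic Multilingual Plane.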





[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163574
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:33
Start Date: 07/Nov/18 18:33
Worklog Time Spent: 10m 
  Work Description: jasonkuster commented on issue #6923: [BEAM-6005] 
PCollectionCustomCoderTest updates to fix test to actually function.
URL: https://github.com/apache/beam/pull/6923#issuecomment-436729915
 
 
   Overall answer to the questions here:
   
   There is no contract that specified what kind of exception will be bubbled 
back up to the client when there are exceptions on the worker. In the case of 
the DirectRunner, the exceptions are bubbled directly up, but in the case of 
runners which run remote they may return just Exception, they may return 
RuntimeException, or they may attempt to pipe the actual exception cause up. 
This test is not intended to enforce a contract for how the runners should 
handle this behavior, and the test prior to this would only have passed on the 
DirectRunner. This way it can be run against any runner.
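
One runner-agnostic way to check for an expected failure, sketched under the assumption that each runner may wrap the worker exception arbitrarily: walk the cause chain and match on the message rather than on the exception type. This illustrates the idea only and is not the test's actual code:

```java
public class CauseChainSketch {
  /** Returns true if any exception in the cause chain carries the expected message. */
  static boolean causeChainContains(Throwable t, String expectedMessage) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c.getMessage() != null && c.getMessage().contains(expectedMessage)) {
        return true;
      }
    }
    return false;
  }
}
```

Matching on the message survives a runner rethrowing the failure as a bare `RuntimeException`, whereas asserting the exception class would only pass on the DirectRunner.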




Issue Time Tracking
---

Worklog Id: (was: 163574)
Time Spent: 1h  (was: 50m)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The test assertions trigger before the coder is used on the actual runner.





[jira] [Work logged] (BEAM-5979) Support DATE and TIME in DML

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5979?focusedWorklogId=163572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163572
 ]

ASF GitHub Bot logged work on BEAM-5979:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:31
Start Date: 07/Nov/18 18:31
Worklog Time Spent: 10m 
  Work Description: akedin closed pull request #6967: [BEAM-5979]  Fix DATE 
and TIME in INSERTION
URL: https://github.com/apache/beam/pull/6967
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java
 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java
index 0b6a1aee6ee..325208278ab 100644
--- 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java
+++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java
@@ -106,7 +106,8 @@ public static Object autoCastField(Schema.Field field, 
Object rawObj) {
 return rawObj;
   }
 } else if (type.isDateType()) {
-  return DateTime.parse(rawObj.toString());
+  // Internal representation of DateType in Calcite is convertible to 
Joda's Datetime.
+  return new DateTime(rawObj);
 } else if (type.isNumericType()
 && ((rawObj instanceof String)
 || (rawObj instanceof BigDecimal && type != TypeName.DECIMAL))) {
diff --git 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
index fffe52ac3f1..ea06fc5b010 100644
--- 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
+++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
@@ -103,7 +103,8 @@ public void addRows(String tableName, Row... rows) {
 return tables().get(tableName).rows;
   }
 
-  private static class TableWithRows implements Serializable {
+  /** TableWithRows. */
+  public static class TableWithRows implements Serializable {
 private Table table;
 private List rows;
 private long tableProviderInstanceId;
@@ -113,6 +114,10 @@ public TableWithRows(long tableProviderInstanceId, Table 
table) {
   this.table = table;
   this.rows = new CopyOnWriteArrayList<>();
 }
+
+public List getRows() {
+  return rows;
+}
   }
 
   private static class InMemoryTable implements BeamSqlTable {
diff --git 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java
 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java
index c7ece0de280..703e9a08bff 100644
--- 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java
+++ 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java
@@ -30,10 +30,14 @@
 import java.util.stream.Stream;
 import org.apache.beam.sdk.extensions.sql.impl.ParseException;
 import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider;
 import org.apache.beam.sdk.extensions.sql.meta.provider.text.TextTableProvider;
 import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.values.Row;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
 import org.junit.Test;
 
 /** UnitTest for {@link BeamSqlCli}. */
@@ -235,4 +239,41 @@ public void testExplainQuery() throws Exception {
 "BeamCalcRel(expr#0..2=[{inputs}], proj#0..2=[{exprs}])\n"
 + "  BeamIOSourceRel(table=[[beam, person]])\n"));
   }
+
+  @Test
+  public void test_time_types() throws Exception {
+InMemoryMetaStore metaStore = new InMemoryMetaStore();
+TestTableProvider testTableProvider = new TestTableProvider();
+metaStore.registerProvider(testTableProvider);
+
+BeamSqlCli cli = new BeamSqlCli().metaStore(metaStore);
+cli.execute(
+"CREATE EXTERNAL TABLE test_table (\n"
++ "f_date DATE, \n"
++ "f_time TIME, \n"
++ "f_ts TIMESTAMP"
++ ") \n"
++ "TYPE 'test'");
+
+cli.execute(
+"INSERT INTO test_table VALUES ("
++ "DATE '2018-11-01', "
++ "TIME 

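The one-line fix in the first hunk above rests on Calcite representing DATE/TIME internally as a numeric epoch-based value, which Joda's `DateTime(Object)` constructor accepts, whereas `DateTime.parse()` expects an ISO-8601 string and fails on a bare number. An analogous sketch using `java.time` (the helper names are made up for illustration):

```java
import java.time.Instant;

public class DateCastSketch {
  /** Converts an epoch-millis internal value directly, as the fixed code does. */
  static Instant fromInternal(Object rawObj) {
    return Instant.ofEpochMilli(((Number) rawObj).longValue());
  }

  /** The pre-fix approach: stringify then parse, which rejects a bare number. */
  static Instant fromInternalBroken(Object rawObj) {
    return Instant.parse(rawObj.toString()); // throws DateTimeParseException on "1541030400000"
  }
}
```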
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163571
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:30
Start Date: 07/Nov/18 18:30
Worklog Time Spent: 10m 
  Work Description: jasonkuster commented on issue #6923: [BEAM-6005] 
PCollectionCustomCoderTest updates to fix test to actually function.
URL: https://github.com/apache/beam/pull/6923#issuecomment-436729213
 
 
   Run Java PreCommit




Issue Time Tracking
---

Worklog Id: (was: 163571)
Time Spent: 50m  (was: 40m)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test assertions trigger before the coder is used on the actual runner.





[jira] [Closed] (BEAM-6006) Test TIME +/- Interval

2018-11-07 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang closed BEAM-6006.
--
   Resolution: Fixed
Fix Version/s: Not applicable

> Test TIME +/- Interval
> --
>
> Key: BEAM-6006
> URL: https://issues.apache.org/jira/browse/BEAM-6006
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-6010) Deprecate KafkaIO withTimestampFn()

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6010?focusedWorklogId=163561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163561
 ]

ASF GitHub Bot logged work on BEAM-6010:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:18
Start Date: 07/Nov/18 18:18
Worklog Time Spent: 10m 
  Work Description: rangadi commented on issue #6964: [BEAM-6010] Deprecate 
KafkaIO withTimestampFn().
URL: https://github.com/apache/beam/pull/6964#issuecomment-436724587
 
 
   Thanks for merging this.
   We can remove the deprecated API in KafkaIO. I was wondering if we should do 
that now or for 3.0. If you agree, I can send a PR right away to remove them. 
They have been deprecated for quite a while.




Issue Time Tracking
---

Worklog Id: (was: 163561)
Time Spent: 20m  (was: 10m)

> Deprecate KafkaIO withTimestampFn()
> ---
>
> Key: BEAM-6010
> URL: https://issues.apache.org/jira/browse/BEAM-6010
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.8.0
>Reporter: Alexey Romanenko
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163563
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:21
Start Date: 07/Nov/18 18:21
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436725979
 
 
   Comments above addressed. 
   
   @lgajowy  You are correct. I tried the 'clean assemble' command and now get the 
same exception. I don't yet understand why this target runs perfectly but 
'clean assemble' does not. Looking into it now (I am still fairly new to Gradle), 
but if you know what might be causing it, please let me know.  Thanks! 
   




Issue Time Tracking
---

Worklog Id: (was: 163563)
Time Spent: 4h 10m  (was: 4h)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163562
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:19
Start Date: 07/Nov/18 18:19
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6886: 
[BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#discussion_r231621419
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -398,6 +399,7 @@ class BeamModulePlugin implements Plugin {
 google_api_services_storage : 
"com.google.apis:google-api-services-storage:v1-rev136-$google_clients_version",
 google_auth_library_credentials : 
"com.google.auth:google-auth-library-credentials:$google_auth_version",
 google_auth_library_oauth2_http : 
"com.google.auth:google-auth-library-oauth2-http:$google_auth_version",
+google_cloud_bigquery   : 
"com.google.cloud:google-cloud-bigquery:$google_cloud_bigquery_version",
 
 Review comment:
   ok




Issue Time Tracking
---

Worklog Id: (was: 163562)
Time Spent: 2h 50m  (was: 2h 40m)

> Remove pipeline for publishing nexmark results to bigQuery and publish using 
> BigQuery API only
> --
>
> Key: BEAM-5906
> URL: https://issues.apache.org/jira/browse/BEAM-5906
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> There's no need to start a separate pipeline for uploading metrics results 
> from Nexmark suites to BigQuery. We can use an API designed for that and 
> place it in test-utils. Thanks to that: 
>  - it won't start a separate pipeline every time it publishes results
>  - other suites will be able to use that code
>  - We will not face problems like special long to int conversion due to 
> problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163559
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:14
Start Date: 07/Nov/18 18:14
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on a change in pull request #6966: 
[DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231619754
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1139,7 +1140,25 @@ artifactId=${project.name}
 outputs.upToDateWhen { false }
 
 include "**/*IT.class"
-systemProperties.beamTestPipelineOptions = 
configuration.integrationTestPipelineOptions
+
+def pipelineOptionString = configuration.integrationTestPipelineOptions
+if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+  
project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+  def jsonSlurper = new JsonSlurper()
+  def allOptionsList = jsonSlurper.parseText(pipelineOptionString)
+  def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?:
+  
project.project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath
 
 Review comment:
  The only reason is to follow the convention in this file, for better 
readability. 
   
   If there is a reason the other approach using `dependsOn` is preferred, 
please let me know and I'll update accordingly. 
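The Groovy logic quoted above (parse the `integrationTestPipelineOptions` JSON, append the worker-jar flags, re-serialize) can be sketched in plain Python; the function and parameter names here are illustrative, not part of the actual build code:

```python
import json

def add_dataflow_worker_options(pipeline_options_json, worker_jar_path):
    """Mirror of the JsonSlurper-based Groovy snippet above: parse the
    JSON list of pipeline options, append the worker-jar flags, and
    re-serialize. Names are illustrative stand-ins."""
    options = json.loads(pipeline_options_json)
    options.append("--dataflowWorkerJar=" + worker_jar_path)
    options.append("--workerHarnessContainerImage=")
    return json.dumps(options)

result = add_dataflow_worker_options(
    '["--runner=TestDataflowRunner"]', "/tmp/worker.jar")
print(result)
```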


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163559)
Time Spent: 4h  (was: 3h 50m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5979) Support DATE and TIME in DML

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5979?focusedWorklogId=163560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163560
 ]

ASF GitHub Bot logged work on BEAM-5979:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:14
Start Date: 07/Nov/18 18:14
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #6967: [BEAM-5979]  Fix 
DATE and TIME in INSERTION
URL: https://github.com/apache/beam/pull/6967#issuecomment-436723330
 
 
   @akedin Can you help merge this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163560)
Time Spent: 1h 50m  (was: 1h 40m)

> Support DATE and TIME in DML
> 
>
> Key: BEAM-5979
> URL: https://issues.apache.org/jira/browse/BEAM-5979
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Right now, BeamSQL uses Schema's DATETIME field to save all time related 
> data. However, BeamSQL doesn't implement correctly how TIME and DATE should 
> be converted to Joda's datetime.
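
As a point of reference for the conversion problem described, here is a plain-Python sketch of one consistent DATE/TIME-to-millis mapping (illustrative only, not Beam's actual implementation):

```python
from datetime import date, time, datetime, timezone

def date_to_epoch_millis(d):
    # One consistent convention: a DATE is midnight UTC of that day,
    # expressed as milliseconds since the epoch.
    dt = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def time_to_millis_of_day(t):
    # A TIME is milliseconds since midnight, independent of any date.
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 \
        + t.microsecond // 1000

print(date_to_epoch_millis(date(1970, 1, 2)))   # 86400000
print(time_to_millis_of_day(time(14, 28, 30)))  # 52110000
```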



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5939) Deduplicate constants

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163540
 ]

ASF GitHub Bot logged work on BEAM-5939:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:01
Start Date: 07/Nov/18 18:01
Worklog Time Spent: 10m 
  Work Description: brianmartin commented on a change in pull request 
#6976: [BEAM-5939] Dedupe runner constants
URL: https://github.com/apache/beam/pull/6976#discussion_r231615270
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/names.py
 ##
 @@ -22,11 +22,9 @@
 
 from __future__ import absolute_import
 
-# TODO (altay): Move shared names to a common location.
 # Standard file names used for staging files.
 from builtins import object
 
-PICKLED_MAIN_SESSION_FILE = 'pickled_main_session'
 DATAFLOW_SDK_TARBALL_FILE = 'dataflow_python_sdk.tar'
 STAGED_PIPELINE_FILENAME = "pipeline.pb"
 
 Review comment:
  I've moved `STAGED_PIPELINE_FILENAME` to the common location, though it is 
only used in the Dataflow runner. Also, the 
`STAGED_PIPELINE_URL_METADATA_FIELD` constant was not used anywhere; I've 
removed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163540)
Time Spent: 40m  (was: 0.5h)

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starer
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6010) Deprecate KafkaIO withTimestampFn()

2018-11-07 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678588#comment-16678588
 ] 

Raghu Angadi commented on BEAM-6010:


Thank you for filing this.

> Deprecate KafkaIO withTimestampFn()
> ---
>
> Key: BEAM-6010
> URL: https://issues.apache.org/jira/browse/BEAM-6010
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.8.0
>Reporter: Alexey Romanenko
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 2.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5798) Add support for dynamic destinations when writing to Kafka

2018-11-07 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678582#comment-16678582
 ] 

Raghu Angadi commented on BEAM-5798:


[~aromanenko], yeah, that's what I had in mind. But we should go with whichever 
we are more comfortable with.

> Add support for dynamic destinations when writing to Kafka
> --
>
> Key: BEAM-5798
> URL: https://issues.apache.org/jira/browse/BEAM-5798
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Luke Cwik
>Assignee: Alexey Romanenko
>Priority: Major
>  Labels: newbie, starter
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add support for writing to Kafka based upon contents of the data. This is 
> similar to the dynamic destination approach for file IO and other sinks.
>  
> Source of request: 
> https://lists.apache.org/thread.html/a89d1d32ecdb50c42271e805cc01a651ee3623b4df97c39baf4f2053@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4863) Implement consistentWithEquals/structuralValue on FullWindowedValueCoder

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4863?focusedWorklogId=163541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163541
 ]

ASF GitHub Bot logged work on BEAM-4863:


Author: ASF GitHub Bot
Created on: 07/Nov/18 18:01
Start Date: 07/Nov/18 18:01
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #6057: [BEAM-4863] 
Implement consistentWithEquals/structuralValue on FullWindowedValueCoder
URL: https://github.com/apache/beam/pull/6057#issuecomment-436719050
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163541)
Time Spent: 1.5h  (was: 1h 20m)

> Implement consistentWithEquals/structuralValue on FullWindowedValueCoder
> 
>
> Key: BEAM-4863
> URL: https://issues.apache.org/jira/browse/BEAM-4863
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Implementing *consistentWithEquals*/*structuralValue* boosts significantly 
> the performance of using these values in comparison operations since it 
> doesn't require encoding the values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5939) Deduplicate constants

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163538
 ]

ASF GitHub Bot logged work on BEAM-5939:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:59
Start Date: 07/Nov/18 17:59
Worklog Time Spent: 10m 
  Work Description: brianmartin commented on a change in pull request 
#6976: [BEAM-5939] Dedupe runner constants
URL: https://github.com/apache/beam/pull/6976#discussion_r231614348
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -51,6 +51,7 @@
 from apache_beam.runners.dataflow.internal import names
 from apache_beam.runners.dataflow.internal.clients import dataflow
 from apache_beam.runners.dataflow.internal.names import PropertyNames
+from apache_beam.runners.internal.names import BEAM_PACKAGE_NAME, BEAM_SDK_NAME
 
 Review comment:
   Sounds good. I had to import as `shared_names` to avoid conflict with the 
import on line 51 though.
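
The aliasing discussed here can be sketched with stand-in modules; the module contents below are illustrative placeholders, not the real Beam constants:

```python
import sys
import types

# Stand-in for the shared constants module proposed in this PR
# (in the real tree it would live under apache_beam.runners.internal).
shared = types.ModuleType("shared_names")
shared.BEAM_PACKAGE_NAME = "beam"
shared.BEAM_SDK_NAME = "Apache Beam SDK for Python"
sys.modules["shared_names"] = shared

# With a local module already bound to the name `names`, the shared
# module is imported under an alias and used fully qualified, which
# avoids the clash mentioned in the comment above.
import shared_names
print(shared_names.BEAM_PACKAGE_NAME)
```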


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163538)
Time Spent: 0.5h  (was: 20m)

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starer
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6006) Test TIME +/- Interval

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6006?focusedWorklogId=163537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163537
 ]

ASF GitHub Bot logged work on BEAM-6006:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:53
Start Date: 07/Nov/18 17:53
Worklog Time Spent: 10m 
  Work Description: akedin closed pull request #6972: [BEAM-6006] Test TIME 
+/- INTERVAL
URL: https://github.com/apache/beam/pull/6972
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java
 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java
index cc3c75a15e0..025265d43d8 100644
--- 
a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java
+++ 
b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java
@@ -43,7 +43,9 @@ public boolean accept() {
 
  static boolean accept(List<BeamSqlExpression> operands, SqlTypeName 
outputType) {
 return operands.size() == 2
-&& (SqlTypeName.TIMESTAMP.equals(outputType) || 
SqlTypeName.DATE.equals(outputType))
+&& (SqlTypeName.TIMESTAMP.equals(outputType)
+|| SqlTypeName.DATE.equals(outputType)
+|| SqlTypeName.TIME.equals(outputType))
 && SqlTypeName.DATETIME_TYPES.contains(operands.get(0).getOutputType())
 && 
TimeUnitUtils.INTERVALS_DURATIONS_TYPES.containsKey(operands.get(1).getOutputType());
   }
diff --git 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java
 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java
index 35f265ab535..192f86da727 100644
--- 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java
+++ 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java
@@ -1197,7 +1197,10 @@ public void testDatetimeInfixPlus() {
 parseTimestamp("1986-01-19 01:02:03"))
 .addExpr("DATE '1984-04-19' + INTERVAL '2' DAY", 
parseDate("1984-04-21"))
 .addExpr("DATE '1984-04-19' + INTERVAL '1' MONTH", 
parseDate("1984-05-19"))
-.addExpr("DATE '1984-04-19' + INTERVAL '3' YEAR", 
parseDate("1987-04-19"));
+.addExpr("DATE '1984-04-19' + INTERVAL '3' YEAR", 
parseDate("1987-04-19"))
+.addExpr("TIME '14:28:30' + INTERVAL '15' SECOND", 
parseTime("14:28:45"))
+.addExpr("TIME '14:28:30.239' + INTERVAL '4' MINUTE", 
parseTime("14:32:30.239"))
+.addExpr("TIME '14:28:30.2' + INTERVAL '4' HOUR", 
parseTime("18:28:30.2"));
 
 checker.buildRunAndCheck();
   }
@@ -1292,7 +1295,7 @@ public void testTimestampDiff() {
 
   @Test
   // More needed @SqlOperatorTest(name = "-", kind = "MINUS")
-  public void testTimestampMinusInterval() throws Exception {
+  public void testTimestampMinusInterval() {
 ExpressionChecker checker =
 new ExpressionChecker()
 .addExpr(
@@ -1315,8 +1318,11 @@ public void testTimestampMinusInterval() throws 
Exception {
 parseTimestamp("1983-01-19 01:01:58"))
 .addExpr("DATE '1984-04-19' - INTERVAL '2' DAY", 
parseDate("1984-04-17"))
 .addExpr("DATE '1984-04-19' - INTERVAL '1' MONTH", 
parseDate("1984-03-19"))
-.addExpr("DATE '1984-04-19' - INTERVAL '3' YEAR", 
parseDate("1981-04-19"));
-;
+.addExpr("DATE '1984-04-19' - INTERVAL '3' YEAR", 
parseDate("1981-04-19"))
+.addExpr("TIME '14:28:30' - INTERVAL '15' SECOND", 
parseTime("14:28:15"))
+.addExpr("TIME '14:28:30.239' - INTERVAL '4' MINUTE", 
parseTime("14:24:30.239"))
+.addExpr("TIME '14:28:30.2' - INTERVAL '4' HOUR", 
parseTime("10:28:30.2"));
+
 checker.buildRunAndCheck();
   }
 
diff --git 
a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpressionTest.java
 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpressionTest.java
index 870057af148..939a1cb7627 100644
--- 

[jira] [Work logged] (BEAM-5939) Deduplicate constants

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163531
 ]

ASF GitHub Bot logged work on BEAM-5939:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:46
Start Date: 07/Nov/18 17:46
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6976: 
[BEAM-5939] Dedupe runner constants
URL: https://github.com/apache/beam/pull/6976#discussion_r231608454
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/names.py
 ##
 @@ -22,11 +22,9 @@
 
 from __future__ import absolute_import
 
-# TODO (altay): Move shared names to a common location.
 # Standard file names used for staging files.
 from builtins import object
 
-PICKLED_MAIN_SESSION_FILE = 'pickled_main_session'
 DATAFLOW_SDK_TARBALL_FILE = 'dataflow_python_sdk.tar'
 STAGED_PIPELINE_FILENAME = "pipeline.pb"
 
 Review comment:
   STAGED_... constants could also move to a common place.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163531)
Time Spent: 20m  (was: 10m)

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starer
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5939) Deduplicate constants

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163532
 ]

ASF GitHub Bot logged work on BEAM-5939:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:46
Start Date: 07/Nov/18 17:46
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6976: 
[BEAM-5939] Dedupe runner constants
URL: https://github.com/apache/beam/pull/6976#discussion_r231608251
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -51,6 +51,7 @@
 from apache_beam.runners.dataflow.internal import names
 from apache_beam.runners.dataflow.internal.clients import dataflow
 from apache_beam.runners.dataflow.internal.names import PropertyNames
+from apache_beam.runners.internal.names import BEAM_PACKAGE_NAME, BEAM_SDK_NAME
 
 Review comment:
  I believe it would be clearer if you import the module and then use 
names.BEAM_PACKAGE_NAME etc. in the code. What do you think?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163532)
Time Spent: 20m  (was: 10m)

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starer
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163511
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:25
Start Date: 07/Nov/18 17:25
Worklog Time Spent: 10m 
  Work Description: supercclank commented on a change in pull request 
#6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually 
function.
URL: https://github.com/apache/beam/pull/6923#discussion_r231599742
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
 ##
 @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) 
throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
 
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testEncodingIOException() throws Exception {
+Pipeline p =
+pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
 thrown.expect(Exception.class);
 
 Review comment:
  Can this be changed to "IOException.class"?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163511)
Time Spent: 40m  (was: 0.5h)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test assertions trigger before the coder is used on the actual runner.
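
The ordering problem can be illustrated outside of Beam: if the failure is asserted only around construction, a coder that fails at run time is never exercised. A minimal sketch with stand-in classes, not Beam APIs:

```python
class FailingCoder:
    """Stand-in for a coder that only fails when actually used."""
    def decode(self, data):
        raise IOError("Super Unique Message!!!")

def build_pipeline(coder):
    # Construction never touches the coder, so an exception
    # expectation wrapped only around this call passes spuriously.
    return lambda: [coder.decode(b"x")]

run = build_pipeline(FailingCoder())  # no exception at construction

try:
    run()  # the coder is only exercised here, at "run" time
    raised = False
except IOError as exc:
    raised = "Super Unique Message" in str(exc)

print(raised)
```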



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163528
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:40
Start Date: 07/Nov/18 17:40
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #6886: [BEAM-5906] Use 
dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#issuecomment-436711924
 
 
   LGTM. Not sure what the status is on updating all of our dependencies, but 
it looks like it may be safe to use this client now (nothing blocks BEAM-4248 
anymore).
   
   cc: @chamikaramj


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163528)
Time Spent: 2h 40m  (was: 2.5h)

> Remove pipeline for publishing nexmark results to bigQuery and publish using 
> BigQuery API only
> --
>
> Key: BEAM-5906
> URL: https://issues.apache.org/jira/browse/BEAM-5906
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There's no need to start a separate pipeline for uploading metrics results 
> from Nexmark suites to BigQuery. We can use an API designed for that and 
> place it in test-utils. Thanks to that: 
>  - it won't start a separate pipeline every time it publishes results
>  - other suites will be able to use that code
>  - We will not face problems like special long to int conversion due to 
> problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5939) Deduplicate constants

2018-11-07 Thread Brian Martin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678550#comment-16678550
 ] 

Brian Martin commented on BEAM-5939:


Hey [~altay], I've made a PR for this minor change.

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starer
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5939) Deduplicate constants

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163525
 ]

ASF GitHub Bot logged work on BEAM-5939:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:36
Start Date: 07/Nov/18 17:36
Worklog Time Spent: 10m 
  Work Description: brianmartin opened a new pull request #6976: 
[BEAM-5939] Dedupe runner constants
URL: https://github.com/apache/beam/pull/6976
 
 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 163525)
Time Spent: 10m
Remaining Estimate: 0h

> Deduplicate constants
> -
>
> Key: BEAM-5939
> URL: https://issues.apache.org/jira/browse/BEAM-5939
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Minor
>  Labels: starer
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> apache_beam/runners/dataflow/internal/names.py
> apache_beam/runners/portability/stager.py
> has same constants defined in both files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163524
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:35
Start Date: 07/Nov/18 17:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6966: 
[DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231603735
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1139,7 +1140,25 @@ artifactId=${project.name}
 outputs.upToDateWhen { false }
 
 include "**/*IT.class"
-systemProperties.beamTestPipelineOptions = 
configuration.integrationTestPipelineOptions
+
+def pipelineOptionString = configuration.integrationTestPipelineOptions
+if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+  
project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+  def jsonSlurper = new JsonSlurper()
 
 Review comment:
   +1




Issue Time Tracking
---

Worklog Id: (was: 163524)
Time Spent: 3h 50m  (was: 3h 40m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163522
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:35
Start Date: 07/Nov/18 17:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6966: 
[DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231603625
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1139,7 +1140,25 @@ artifactId=${project.name}
 outputs.upToDateWhen { false }
 
 include "**/*IT.class"
-systemProperties.beamTestPipelineOptions = 
configuration.integrationTestPipelineOptions
+
+def pipelineOptionString = configuration.integrationTestPipelineOptions
+if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+  
project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+  def jsonSlurper = new JsonSlurper()
+  def allOptionsList = jsonSlurper.parseText(pipelineOptionString)
+  def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?:
+  
project.project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath
 
 Review comment:
   The dependency on the archive below via `shadow it.project(path: 
":beam-runners-google-cloud-dataflow-java-legacy-worker", configuration: 
'shadow')` will ensure it gets built.
   
   Is there a reason why you went with a configuration based dependency over 
using `dependsOn 
":beam-runners-google-cloud-dataflow-java-legacy-worker:shadowJar"` task like 
the others that were done?




Issue Time Tracking
---

Worklog Id: (was: 163522)
Time Spent: 3.5h  (was: 3h 20m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163523
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:35
Start Date: 07/Nov/18 17:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #6966: 
[DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231603738
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1139,7 +1140,25 @@ artifactId=${project.name}
 outputs.upToDateWhen { false }
 
 include "**/*IT.class"
-systemProperties.beamTestPipelineOptions = 
configuration.integrationTestPipelineOptions
+
+def pipelineOptionString = configuration.integrationTestPipelineOptions
 
 Review comment:
   +1




Issue Time Tracking
---

Worklog Id: (was: 163523)
Time Spent: 3h 40m  (was: 3.5h)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163521
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:34
Start Date: 07/Nov/18 17:34
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #6886: 
[BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#discussion_r231604879
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -398,6 +399,7 @@ class BeamModulePlugin implements Plugin {
 google_api_services_storage : 
"com.google.apis:google-api-services-storage:v1-rev136-$google_clients_version",
 google_auth_library_credentials : 
"com.google.auth:google-auth-library-credentials:$google_auth_version",
 google_auth_library_oauth2_http : 
"com.google.auth:google-auth-library-oauth2-http:$google_auth_version",
+google_cloud_bigquery   : 
"com.google.cloud:google-cloud-bigquery:$google_cloud_bigquery_version",
 
 Review comment:
   The version of this should probably be `google_clients_version`. Also, this 
particular client has caused dependency problems in the past. See #5286. The 
nexmark tests don't use other clients, so it might not be a problem.




Issue Time Tracking
---

Worklog Id: (was: 163521)
Time Spent: 2.5h  (was: 2h 20m)

> Remove pipeline for publishing nexmark results to bigQuery and publish using 
> BigQuery API only
> --
>
> Key: BEAM-5906
> URL: https://issues.apache.org/jira/browse/BEAM-5906
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There's no need to start a separate pipeline for uploading metrics results 
> from Nexmark suites to BigQuery. We can use an API designed for that and 
> place it in test-utils. Thanks to that: 
>  - it won't start a separate pipeline every time it publishes results
>  - other suites will be able to use that code
>  - We will not face problems like special long to int conversion due to 
> problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead.





[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163513
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:26
Start Date: 07/Nov/18 17:26
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6886: 
[BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#discussion_r231601088
 
 

 ##
 File path: 
sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java
 ##
 @@ -177,77 +160,36 @@ void runAll(String[] args) throws IOException {
 
   @VisibleForTesting
   static void savePerfsToBigQuery(
+  BigQueryClient bigQueryClient,
   NexmarkOptions options,
  Map<NexmarkConfiguration, NexmarkPerf> perfs,
-  @Nullable BigQueryServices testBigQueryServices,
   Instant start) {
-Pipeline pipeline = Pipeline.create(options);
-PCollection<KV<NexmarkConfiguration, NexmarkPerf>> perfsPCollection =
-pipeline.apply(
-Create.of(perfs)
-.withCoder(
-KvCoder.of(
-SerializableCoder.of(NexmarkConfiguration.class),
-new CustomCoder<NexmarkPerf>() {
-
-  @Override
-  public void encode(NexmarkPerf value, OutputStream 
outStream)
-  throws CoderException, IOException {
-StringUtf8Coder.of().encode(value.toString(), 
outStream);
-  }
-
-  @Override
-  public NexmarkPerf decode(InputStream inStream)
-  throws CoderException, IOException {
-String perf = 
StringUtf8Coder.of().decode(inStream);
-return NexmarkPerf.fromString(perf);
-  }
-})));
-
-TableSchema tableSchema =
-new TableSchema()
-.setFields(
-ImmutableList.of(
-new 
TableFieldSchema().setName("timestamp").setType("TIMESTAMP"),
-new 
TableFieldSchema().setName("runtimeSec").setType("FLOAT"),
-new 
TableFieldSchema().setName("eventsPerSec").setType("FLOAT"),
-new 
TableFieldSchema().setName("numResults").setType("INTEGER")));
-
-String queryName = "{query}";
-if (options.getQueryLanguage() != null) {
-  queryName = queryName + "_" + options.getQueryLanguage();
-}
-final String tableSpec = NexmarkUtils.tableSpec(options, queryName, 0L, 
null);
-SerializableFunction<
-ValueInSingleWindow<KV<NexmarkConfiguration, NexmarkPerf>>, 
TableDestination>
-tableFunction =
-input ->
-new TableDestination(
-tableSpec.replace("{query}", 
input.getValue().getKey().query.getNumberOrName()),
-"perfkit queries");
-SerializableFunction<KV<NexmarkConfiguration, NexmarkPerf>, TableRow> 
rowFunction =
-input -> {
-  NexmarkPerf nexmarkPerf = input.getValue();
-  TableRow row =
-  new TableRow()
-  .set("timestamp", start.getMillis() / 1000)
-  .set("runtimeSec", nexmarkPerf.runtimeSec)
-  .set("eventsPerSec", nexmarkPerf.eventsPerSec)
-  .set("numResults", nexmarkPerf.numResults);
-  return row;
-};
-BigQueryIO.Write io =
-BigQueryIO.<KV<NexmarkConfiguration, NexmarkPerf>>write()
-.to(tableFunction)
-.withSchema(tableSchema)
-
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
-
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
-.withFormatFunction(rowFunction);
-if (testBigQueryServices != null) {
-  io = io.withTestServices(testBigQueryServices);
+
+for (Map.Entry<NexmarkConfiguration, NexmarkPerf> entry : 
perfs.entrySet()) {
+  String queryName =
+  NexmarkUtils.fullQueryName(
+  options.getQueryLanguage(), 
entry.getKey().query.getNumberOrName());
+  String tableName = NexmarkUtils.tableName(options, queryName, 0L, null);
+
+  ImmutableMap<String, String> schema =
+  ImmutableMap.<String, String>builder()
+  .put("timestamp", "timestamp")
+  .put("runtimeSec", "float")
+  .put("eventsPerSec", "float")
+  .put("numResults", "integer")
+  .build();
+  bigQueryClient.createTableIfNotExists(tableName, schema);
+
+  Map<String, Object> record =
+  ImmutableMap.<String, Object>builder()
+  .put("timestamp", start.getMillis() / 1000)
+  .put("runtimeSec", entry.getValue().runtimeSec)
+  .put("eventsPerSec", entry.getValue().eventsPerSec)
+  .put("numResults", entry.getValue().numResults)
+  .build();
+
+  bigQueryClient.insertRow(record, 

[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163510
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:25
Start Date: 07/Nov/18 17:25
Worklog Time Spent: 10m 
  Work Description: supercclank commented on a change in pull request 
#6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually 
function.
URL: https://github.com/apache/beam/pull/6923#discussion_r231599920
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
 ##
 @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) 
throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
 
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
 
 Review comment:
   Was the widening of RuntimeException.class to Exception.class intended?




Issue Time Tracking
---

Worklog Id: (was: 163510)
Time Spent: 40m  (was: 0.5h)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test assertions trigger before the coder is used on the actual runner.





[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163516
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:26
Start Date: 07/Nov/18 17:26
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6886: 
[BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#discussion_r231601163
 
 

 ##
 File path: 
sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java
 ##
 @@ -87,41 +71,47 @@ public void testSavePerfsToBigQuery() throws IOException, 
InterruptedException {
 nexmarkPerf2.eventsPerSec = 1.5F;
 nexmarkPerf2.runtimeSec = 1.325F;
 
-// simulate 2 runs of the same query just to check that rows are apened 
correctly.
+// simulate 2 runs of the same query just to check that rows are appended 
correctly.
 HashMap<NexmarkConfiguration, NexmarkPerf> perfs = new HashMap<>(2);
 perfs.put(nexmarkConfiguration1, nexmarkPerf1);
 perfs.put(nexmarkConfiguration2, nexmarkPerf2);
 
-// cast to int due to BEAM-4734. To avoid overflow on int capacity,
-// set the instant to a fixed date (and not Instant.now())
-int startTimestampSeconds = 1454284800;
-Main.savePerfsToBigQuery(
-options, perfs, fakeBqServices, new Instant(startTimestampSeconds * 
1000L));
-
-String tableSpec = NexmarkUtils.tableSpec(options, 
QUERY.getNumberOrName(), 0L, null);
-List<TableRow> actualRows =
-fakeDatasetService.getAllRows(
-options.getProject(),
-options.getBigQueryDataset(),
-BigQueryHelpers.parseTableSpec(tableSpec).getTableId());
-assertEquals("Wrong number of rows inserted", 2, actualRows.size());
-List<TableRow> expectedRows = new ArrayList<>();
-TableRow row1 =
-new TableRow()
-.set("timestamp", startTimestampSeconds)
-.set("runtimeSec", nexmarkPerf1.runtimeSec)
-.set("eventsPerSec", nexmarkPerf1.eventsPerSec)
-// cast to int due to BEAM-4734.
-.set("numResults", (int) nexmarkPerf1.numResults);
-expectedRows.add(row1);
-TableRow row2 =
-new TableRow()
-.set("timestamp", startTimestampSeconds)
-.set("runtimeSec", nexmarkPerf2.runtimeSec)
-.set("eventsPerSec", nexmarkPerf2.eventsPerSec)
-// cast to int  due to BEAM-4734.
-.set("numResults", (int) nexmarkPerf2.numResults);
-expectedRows.add(row2);
-assertThat(actualRows, containsInAnyOrder(Iterables.toArray(expectedRows, 
TableRow.class)));
+long startTimestampSeconds = 145428480L;
 
 Review comment:
   ok




Issue Time Tracking
---

Worklog Id: (was: 163516)
Time Spent: 2h 20m  (was: 2h 10m)

> Remove pipeline for publishing nexmark results to bigQuery and publish using 
> BigQuery API only
> --
>
> Key: BEAM-5906
> URL: https://issues.apache.org/jira/browse/BEAM-5906
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There's no need to start a separate pipeline for uploading metrics results 
> from Nexmark suites to BigQuery. We can use an API designed for that and 
> place it in test-utils. Thanks to that: 
>  - it won't start a separate pipeline every time it publishes results
>  - other suites will be able to use that code
>  - We will not face problems like special long to int conversion due to 
> problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead.





[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163515
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:26
Start Date: 07/Nov/18 17:26
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6886: 
[BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#discussion_r231601112
 
 

 ##
 File path: 
sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java
 ##
 @@ -87,41 +71,47 @@ public void testSavePerfsToBigQuery() throws IOException, 
InterruptedException {
 nexmarkPerf2.eventsPerSec = 1.5F;
 nexmarkPerf2.runtimeSec = 1.325F;
 
-// simulate 2 runs of the same query just to check that rows are apened 
correctly.
+// simulate 2 runs of the same query just to check that rows are appended 
correctly.
 HashMap<NexmarkConfiguration, NexmarkPerf> perfs = new HashMap<>(2);
 perfs.put(nexmarkConfiguration1, nexmarkPerf1);
 perfs.put(nexmarkConfiguration2, nexmarkPerf2);
 
-// cast to int due to BEAM-4734. To avoid overflow on int capacity,
-// set the instant to a fixed date (and not Instant.now())
-int startTimestampSeconds = 1454284800;
-Main.savePerfsToBigQuery(
-options, perfs, fakeBqServices, new Instant(startTimestampSeconds * 
1000L));
-
-String tableSpec = NexmarkUtils.tableSpec(options, 
QUERY.getNumberOrName(), 0L, null);
-List<TableRow> actualRows =
-fakeDatasetService.getAllRows(
-options.getProject(),
-options.getBigQueryDataset(),
-BigQueryHelpers.parseTableSpec(tableSpec).getTableId());
-assertEquals("Wrong number of rows inserted", 2, actualRows.size());
-List<TableRow> expectedRows = new ArrayList<>();
-TableRow row1 =
-new TableRow()
-.set("timestamp", startTimestampSeconds)
-.set("runtimeSec", nexmarkPerf1.runtimeSec)
-.set("eventsPerSec", nexmarkPerf1.eventsPerSec)
-// cast to int due to BEAM-4734.
-.set("numResults", (int) nexmarkPerf1.numResults);
-expectedRows.add(row1);
-TableRow row2 =
-new TableRow()
-.set("timestamp", startTimestampSeconds)
-.set("runtimeSec", nexmarkPerf2.runtimeSec)
-.set("eventsPerSec", nexmarkPerf2.eventsPerSec)
-// cast to int  due to BEAM-4734.
-.set("numResults", (int) nexmarkPerf2.numResults);
-expectedRows.add(row2);
-assertThat(actualRows, containsInAnyOrder(Iterables.toArray(expectedRows, 
TableRow.class)));
+long startTimestampSeconds = 145428480L;
+Main.savePerfsToBigQuery(bigQueryClient, options, perfs, new 
Instant(startTimestampSeconds));
+
+String tableName = NexmarkUtils.tableName(options, 
QUERY.getNumberOrName(), 0L, null);
+List<Map<String, Object>> rows = bigQueryClient.getRows(tableName);
+
+// savePerfsToBigQuery converts millis to seconds (it's a BigQuery's 
requirement).
 
 Review comment:
   ok




Issue Time Tracking
---

Worklog Id: (was: 163515)
Time Spent: 2h 10m  (was: 2h)

> Remove pipeline for publishing nexmark results to bigQuery and publish using 
> BigQuery API only
> --
>
> Key: BEAM-5906
> URL: https://issues.apache.org/jira/browse/BEAM-5906
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There's no need to start a separate pipeline for uploading metrics results 
> from Nexmark suites to BigQuery. We can use an API designed for that and 
> place it in test-utils. Thanks to that: 
>  - it won't start a separate pipeline every time it publishes results
>  - other suites will be able to use that code
>  - We will not face problems like special long to int conversion due to 
> problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead.





[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163514
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:26
Start Date: 07/Nov/18 17:26
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6886: 
[BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#discussion_r231601096
 
 

 ##
 File path: 
sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java
 ##
 @@ -87,41 +71,47 @@ public void testSavePerfsToBigQuery() throws IOException, 
InterruptedException {
 nexmarkPerf2.eventsPerSec = 1.5F;
 nexmarkPerf2.runtimeSec = 1.325F;
 
-// simulate 2 runs of the same query just to check that rows are apened 
correctly.
+// simulate 2 runs of the same query just to check that rows are appended 
correctly.
 HashMap<NexmarkConfiguration, NexmarkPerf> perfs = new HashMap<>(2);
 perfs.put(nexmarkConfiguration1, nexmarkPerf1);
 perfs.put(nexmarkConfiguration2, nexmarkPerf2);
 
-// cast to int due to BEAM-4734. To avoid overflow on int capacity,
 
 Review comment:
   I'm glad too. :)




Issue Time Tracking
---

Worklog Id: (was: 163514)
Time Spent: 2h  (was: 1h 50m)

> Remove pipeline for publishing nexmark results to bigQuery and publish using 
> BigQuery API only
> --
>
> Key: BEAM-5906
> URL: https://issues.apache.org/jira/browse/BEAM-5906
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There's no need to start a separate pipeline for uploading metrics results 
> from Nexmark suites to BigQuery. We can use an API designed for that and 
> place it in test-utils. Thanks to that: 
>  - it won't start a separate pipeline every time it publishes results
>  - other suites will be able to use that code
>  - We will not face problems like special long to int conversion due to 
> problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead.





[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163512
 ]

ASF GitHub Bot logged work on BEAM-5906:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:26
Start Date: 07/Nov/18 17:26
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6886: 
[BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results
URL: https://github.com/apache/beam/pull/6886#discussion_r231601063
 
 

 ##
 File path: 
sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkUtils.java
 ##
 @@ -145,53 +145,51 @@
 QUERY_RUNNER_AND_MODE
   }
 
+  /** Return a query name with query language (if applicable). */
+  static String fullQueryName(String queryLanguage, String query) {
+return queryLanguage != null ? query + "_" + queryLanguage : query;
+  }
+
   /** Return a BigQuery table spec. */
   static String tableSpec(NexmarkOptions options, String queryName, long now, 
String version) {
+return String.format(
+"%s:%s.%s",
+options.getProject(),
+options.getBigQueryDataset(),
+tableName(options, queryName, now, version));
+  }
+
+  /** Return a BigQuery table name. */
+  static String tableName(NexmarkOptions options, String queryName, long now, 
String version) {
 String baseTableName = options.getBigQueryTable();
 if (Strings.isNullOrEmpty(baseTableName)) {
   throw new RuntimeException("Missing --bigQueryTable");
 }
+
 switch (options.getResourceNameMode()) {
   case VERBATIM:
-return String.format(
-"%s:%s.%s_%s",
-options.getProject(), options.getBigQueryDataset(), baseTableName, 
version);
+return String.format("%s_%s", baseTableName, version);
   case QUERY:
-return String.format(
-"%s:%s.%s_%s_%s",
-options.getProject(), options.getBigQueryDataset(), baseTableName, 
queryName, version);
+return String.format("%s_%s_%s", baseTableName, queryName, version);
   case QUERY_AND_SALT:
-return String.format(
-"%s:%s.%s_%s_%s_%d",
-options.getProject(),
-options.getBigQueryDataset(),
-baseTableName,
-queryName,
-version,
-now);
+return String.format("%s_%s_%s_%d", baseTableName, queryName, version, 
now);
   case QUERY_RUNNER_AND_MODE:
-return (version != null)
-? String.format(
-"%s:%s.%s_%s_%s_%s_%s",
-options.getProject(),
-options.getBigQueryDataset(),
-baseTableName,
-queryName,
-options.getRunner().getSimpleName(),
-options.isStreaming() ? "streaming" : "batch",
-version)
-: String.format(
-"%s:%s.%s_%s_%s_%s",
-options.getProject(),
-options.getBigQueryDataset(),
-baseTableName,
-queryName,
-options.getRunner().getSimpleName(),
-options.isStreaming() ? "streaming" : "batch");
+String runnerName = options.getRunner().getSimpleName();
+boolean isStreaming = options.isStreaming();
+
+String tableName =
+String.format(
+"%s_%s_%s_%s", baseTableName, queryName, runnerName, 
processingMode(isStreaming));
+
+return version != null ? String.format("%s_%s", tableName, version) : 
tableName;
 }
 throw new RuntimeException("Unrecognized enum " + 
options.getResourceNameMode());
   }
 
+  private static String processingMode(boolean isStreaming) {
 
 Review comment:
   I deliberately left it because it makes the code cleaner to me (hides the 
details). Given that, can we keep it as-is?
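The `processingMode` helper being discussed is called in the hunk above but its body is elided there. A plausible one-liner, consistent with the `options.isStreaming() ? "streaming" : "batch"` ternary the diff removed (the wrapper class name here is invented for illustration):

```java
// Sketch of the elided helper; the body is not shown in the quoted hunk, but
// this is consistent with the ternary the diff removed from tableName().
public class ProcessingMode {
  /** Maps the streaming flag to the table-name suffix used by Nexmark. */
  static String processingMode(boolean isStreaming) {
    return isStreaming ? "streaming" : "batch";
  }
}
```

Extracting the ternary this way keeps `tableName()` focused on assembling the name parts, which is the "hides the details" argument made above.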




Issue Time Tracking
---

Worklog Id: (was: 163512)
Time Spent: 1h 40m  (was: 1.5h)

> Remove pipeline for publishing nexmark results to bigQuery and publish using 
> BigQuery API only
> --
>
> Key: BEAM-5906
> URL: https://issues.apache.org/jira/browse/BEAM-5906
> Project: Beam
>  Issue Type: Improvement
>  Components: examples-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There's no need 

[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163506
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:25
Start Date: 07/Nov/18 17:25
Worklog Time Spent: 10m 
  Work Description: supercclank commented on a change in pull request 
#6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually 
function.
URL: https://github.com/apache/beam/pull/6923#discussion_r231598848
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
 ##
 @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) 
throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
 
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testEncodingIOException() throws Exception {
+Pipeline p =
+pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
 thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 
-Pipeline p =
-runPipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testEncodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(null, NULL_POINTER_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testSerializationIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testSerializationNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(null, null, NULL_POINTER_EXCEPTION, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
 
 Review comment:
   Same question here about changing this to Exception.class.




Issue Time Tracking
---

Worklog Id: (was: 163506)
Time Spent: 20m  (was: 10m)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>   

[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163509
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:25
Start Date: 07/Nov/18 17:25
Worklog Time Spent: 10m 
  Work Description: supercclank commented on a change in pull request 
#6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually 
function.
URL: https://github.com/apache/beam/pull/6923#discussion_r231600012
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
 ##
 @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) 
throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
 
+thrown.expect(Exception.class);
 
 Review comment:
   Can this be changed to IOException.class?




Issue Time Tracking
---

Worklog Id: (was: 163509)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The test assertions trigger before the coder is used on the actual runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
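The review questions above all come down to how narrowly the test pins the thrown exception: `thrown.expect(Exception.class)` matches any failure, so the real check in the diff is the `ExceptionMatcher` on the full message. A hedged plain-Java sketch of what such a matcher might verify, walking the cause chain for the expected `toString()` text (the actual matcher in the PR is presumably a Hamcrest matcher; this only illustrates the idea):

```java
import java.io.IOException;

class CauseChainCheck {
  // Returns true if any throwable in the cause chain renders to a string
  // containing the expected text, e.g. "java.io.IOException: Super Unique
  // Message!!!". Matching anywhere in the chain keeps the assertion
  // runner-agnostic: a runner may wrap the coder's exception in several
  // layers of RuntimeException.
  static boolean chainContains(Throwable t, String expected) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur.toString().contains(expected)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable wrapped = new RuntimeException(new IOException("Super Unique Message!!!"));
    System.out.println(chainContains(wrapped, "java.io.IOException: Super Unique Message!!!"));
    // -> true
  }
}
```

This also explains the ordering change in the diff: the expectation is registered only after the pipeline is constructed, so a failure during pipeline construction (rather than during `p.run()`) is no longer silently accepted.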


[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163508
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:25
Start Date: 07/Nov/18 17:25
Worklog Time Spent: 10m 
  Work Description: supercclank commented on a change in pull request 
#6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually 
function.
URL: https://github.com/apache/beam/pull/6923#discussion_r231598938
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
 ##
 @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) 
throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
 
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testEncodingIOException() throws Exception {
+Pipeline p =
+pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
 thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 
-Pipeline p =
-runPipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testEncodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(null, NULL_POINTER_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
 
 Review comment:
   Was the widening of RuntimeException.class to Exception.class intended?




Issue Time Tracking
---

Worklog Id: (was: 163508)
Time Spent: 0.5h  (was: 20m)

> PCollectionCustomCoderTest passes spuriously.
> -
>
> Key: BEAM-6005
> URL: https://issues.apache.org/jira/browse/BEAM-6005
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The test assertions trigger before the coder is used on the actual runner.





[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163507
 ]

ASF GitHub Bot logged work on BEAM-6005:


Author: ASF GitHub Bot
Created on: 07/Nov/18 17:25
Start Date: 07/Nov/18 17:25
Worklog Time Spent: 10m 
  Work Description: supercclank commented on a change in pull request 
#6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually 
function.
URL: https://github.com/apache/beam/pull/6923#discussion_r231599120
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
 ##
 @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) 
throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
 
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testEncodingIOException() throws Exception {
+Pipeline p =
+pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
 thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
 
-Pipeline p =
-runPipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testEncodingNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(null, NULL_POINTER_EXCEPTION, null, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testSerializationIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
+p.run().waitUntilFinish();
   }
 
   @Test
   @Category(NeedsRunner.class)
   public void testSerializationNPException() throws Exception {
-thrown.expect(RuntimeException.class);
-thrown.expectMessage("java.lang.NullPointerException: Super Unique 
Message!!!");
-
 Pipeline p =
-runPipelineWith(
+pipelineWith(
 new CustomTestCoder(null, null, NULL_POINTER_EXCEPTION, null, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super 
Unique Message!!!"));
+
+p.run().waitUntilFinish();
   }
 
+  // TODO(BEAM-6004) Have DirectRunner trigger deserialization.
+  @Ignore("DirectRunner doesn't decode coders so this test does not pass.")
   @Test
   @Category(NeedsRunner.class)
   public void testDeserializationIOException() throws Exception {
-thrown.expect(Exception.class);
-thrown.expectCause(instanceOf(IOException.class));
 Pipeline p =
-runPipelineWith(new CustomTestCoder(null, null, null, IO_EXCEPTION, 
EXCEPTION_MESSAGE));
+pipelineWith(new CustomTestCoder(null, null, null, IO_EXCEPTION, 
EXCEPTION_MESSAGE));
+thrown.expect(Exception.class);
+thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique 
Message!!!"));
+

[jira] [Commented] (BEAM-5798) Add support for dynamic destinations when writing to Kafka

2018-11-07 Thread Alexey Romanenko (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678500#comment-16678500
 ] 

Alexey Romanenko commented on BEAM-5798:


[~rangadi] You mean changing the internal API from KV to ProducerRecord while 
keeping the external API as it is? Yes, perhaps that would make sense too, 
thanks!

> Add support for dynamic destinations when writing to Kafka
> --
>
> Key: BEAM-5798
> URL: https://issues.apache.org/jira/browse/BEAM-5798
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Luke Cwik
>Assignee: Alexey Romanenko
>Priority: Major
>  Labels: newbie, starter
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add support for writing to Kafka based upon contents of the data. This is 
> similar to the dynamic destination approach for file IO and other sinks.
>  
> Source of request: 
> https://lists.apache.org/thread.html/a89d1d32ecdb50c42271e805cc01a651ee3623b4df97c39baf4f2053@%3Cuser.beam.apache.org%3E



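A way to picture the proposal above: with `KV<K, V>` the topic is fixed per write transform, while a `ProducerRecord`-style element carries its topic, so the sink can route each record individually. A hedged plain-Java sketch with a minimal stand-in record type (the real `ProducerRecord` lives in `org.apache.kafka.clients.producer` and carries more fields; the routing rule below is invented purely for illustration):

```java
class DynamicTopicSketch {
  // Minimal stand-in for Kafka's ProducerRecord: the topic travels with
  // the element instead of being fixed on the write transform.
  static final class Record {
    final String topic;
    final String key;
    final String value;

    Record(String topic, String key, String value) {
      this.topic = topic;
      this.key = key;
      this.value = value;
    }
  }

  // Hypothetical per-element routing rule, chosen only for illustration.
  static Record route(String key, String value) {
    String topic = value.startsWith("err") ? "errors" : "events";
    return new Record(topic, key, value);
  }

  public static void main(String[] args) {
    System.out.println(route("k1", "err: boom").topic); // -> errors
    System.out.println(route("k2", "ok").topic);        // -> events
  }
}
```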


[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163495
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 16:55
Start Date: 07/Nov/18 16:55
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6966: 
[DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231586067
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1139,7 +1140,25 @@ artifactId=${project.name}
 outputs.upToDateWhen { false }
 
 include "**/*IT.class"
-systemProperties.beamTestPipelineOptions = 
configuration.integrationTestPipelineOptions
+
+def pipelineOptionString = configuration.integrationTestPipelineOptions
+if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+  
project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+  def jsonSlurper = new JsonSlurper()
+  def allOptionsList = jsonSlurper.parseText(pipelineOptionString)
+  def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?:
+  
project.project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath
 
 Review comment:
   Will this line only get the path to a previously built worker (without 
building it), or will it build it here if it's not found?




Issue Time Tracking
---

Worklog Id: (was: 163495)
Time Spent: 3h 20m  (was: 3h 10m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.



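The Groovy snippet above parses the JSON `integrationTestPipelineOptions` string and appends the worker-jar flags before handing the options to the test JVM. A hedged plain-Java sketch of that merge step, using a plain string list in place of Groovy's `JsonSlurper` output (class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class WorkerJarOptions {
  // Appends the two flags from the rollback instructions in BEAM-5931 so
  // Dataflow tests run against a worker jar built from Github HEAD.
  static List<String> withWorkerJar(List<String> options, String jarPath) {
    List<String> out = new ArrayList<>(options);
    out.add("--dataflowWorkerJar=" + jarPath);
    out.add("--workerHarnessContainerImage=");
    return out;
  }

  public static void main(String[] args) {
    List<String> opts = new ArrayList<>();
    opts.add("--runner=DataflowRunner");
    System.out.println(withWorkerJar(opts, "/tmp/worker.jar"));
    // -> [--runner=DataflowRunner, --dataflowWorkerJar=/tmp/worker.jar, --workerHarnessContainerImage=]
  }
}
```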


[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163494
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 16:55
Start Date: 07/Nov/18 16:55
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on a change in pull request #6966: 
[DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231584194
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1139,7 +1140,25 @@ artifactId=${project.name}
 outputs.upToDateWhen { false }
 
 include "**/*IT.class"
-systemProperties.beamTestPipelineOptions = 
configuration.integrationTestPipelineOptions
+
+def pipelineOptionString = configuration.integrationTestPipelineOptions
+if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+  
project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+  def jsonSlurper = new JsonSlurper()
 
 Review comment:
   can we inline this variable? It's used only once.




Issue Time Tracking
---

Worklog Id: (was: 163494)
Time Spent: 3h 10m  (was: 3h)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163480
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436688003
 
 
   Run Java MongoDBIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 163480)
Time Spent: 2h 50m  (was: 2h 40m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163481
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m 
  Work Description: lgajowy edited a comment on issue #6966: [DRAFT] 
[BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436687860
 
 
   Let's also check some other tests. (btw, I'm reviewing now. :) )




Issue Time Tracking
---

Worklog Id: (was: 163481)
Time Spent: 3h  (was: 2h 50m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163478
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436687860
 
 
   Let's also check some other tests: 




Issue Time Tracking
---

Worklog Id: (was: 163478)
Time Spent: 2.5h  (was: 2h 20m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163479
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436687883
 
 
   Run Java JdbcIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 163479)
Time Spent: 2h 40m  (was: 2.5h)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Work logged] (BEAM-5931) Rollback PR/6899

2018-11-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163476
 ]

ASF GitHub Bot logged work on BEAM-5931:


Author: ASF GitHub Bot
Created on: 07/Nov/18 16:27
Start Date: 07/Nov/18 16:27
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] 
Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436684952
 
 
   Run Java TextIO Performance Test




Issue Time Tracking
---

Worklog Id: (was: 163476)
Time Spent: 2h 20m  (was: 2h 10m)

> Rollback PR/6899
> 
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
>  Issue Type: Task
>  Components: beam-model, runner-dataflow
>Reporter: Luke Cwik
>Assignee: Ruoyun Huang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the 
> {code}
>  "--dataflowWorkerJar=${dataflowWorkerJar}",
>  "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of 
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker 
> image.
> #1 is preferable since we will no longer have tests running that don't use a 
> Dataflow worker jar built from Github HEAD.





[jira] [Comment Edited] (BEAM-6011) Enable Phrase triggering in Nexmark jobs

2018-11-07 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678445#comment-16678445
 ] 

Lukasz Gajowy edited comment on BEAM-6011 at 11/7/18 4:18 PM:
--

I made some attempts to prepare a quick fix for this. It was not as easy as I 
initially expected. I tried to use the $GIT_BRANCH environment variable to 
conditionally create the nexmark args needed for the BigQuery connection. It 
seems that this is not the right approach, because $GIT_BRANCH is evaluated 
while running the job, whereas the rest of the code is evaluated while running 
the Seed job. Moreover, I don't like this solution because it hides the real 
intent of the job.

Link to the PR (currently closed): [https://github.com/apache/beam/pull/6974]

IMO, after some experimentation, the way it should be done is: 
 - every nexmark job should have a "twin" job intended exclusively for running 
on PRs. The "twin" job should run the same queries (to verify performance the 
same way).
 - PR-triggered jobs should save to the BigQuery database the same way as the 
periodic jobs (because the saving logic can also change, and we will need to 
test that in PRs too).

 - PR-triggered jobs should save to a new BQ dataset and use the same table 
names.

 - Jenkins DSL code should be refactored so that it provides convenient methods 
for creating new jobs.

CC: [~kenn] , [~echauchot] WDYT? Do you have other comments/suggestions?

We should do exactly the same for IO Performance tests (they currently suffer 
from the same problems)...

 

 


was (Author: łukaszg):
I made some attempts to prepare a quick fix for this. It was not so easy as I 
initially expected. I tried to use the $GIT_BRANCH environment variable to 
conditionally create the nexmark args needed for big query connection. It seems 
that this is not the right approach, because $GIT_BRANCH is evaluated while 
running the job whereas the rest of the code is evaluated while running the 
Seed job. Moreover, I don't like this solution because it's hiding the real 
intent of the job.

Link to the PR (currently closed): [https://github.com/apache/beam/pull/6974]

IMO, after some trying, the way it has to be done is that: 
 - every nexmark job should have a "twin" job that is for running on PRs 
(exclusively)
 - PR triggered jobs should save to the big query database the same way as the 
periodic jobs (because saving also can change and we will need to test that in 
PRs too).

 - PR triggered jobs should save to a new BQ dataset and have the same table 
names.  

 - Jenkins DSL code should be refactored so that it provides convenient 
methods for creating new jobs.

CC: [~kenn] , [~echauchot] WDYT? Do you have other comments/suggestions?

We should do exactly the same for IO Performance tests (currently they suffer 
the same problems)...

 

 

> Enable Phrase triggering in Nexmark jobs
> 
>
> Key: BEAM-6011
> URL: https://issues.apache.org/jira/browse/BEAM-6011
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Critical
>
> We need to enable Phrase Triggering (running Jenkins jobs from a PR) for 
> Nexmark jobs so that we can check that pull requests do not break 
> anything before merging them. 
> Note: Currently Nexmark jobs run post-commit on master and publish their 
> results to a BigQuery database. In order not to pollute the results collected 
> for master, we should save the results for PR-triggered jobs in some other 
> tables/datasets, or even not save them at all (turn off publishing to BQ).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)

2018-11-07 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678450#comment-16678450
 ] 

Lukasz Gajowy commented on BEAM-6002:
-

I created an issue for nexmark phrase triggering: 
https://issues.apache.org/jira/browse/BEAM-6011 

> Nexmark tests timing out on all runners (crash loop due to metrics?) 
> -
>
> Key: BEAM-6002
> URL: https://issues.apache.org/jira/browse/BEAM-6002
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kenneth Knowles
>Assignee: Chamikara Jayalath
>Priority: Critical
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/
> {code}
> 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:26 2018-11-06T16:58:26.035Z no activity
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:41 2018-11-06T16:58:41.036Z no activity
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.startTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric event.endTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.elements, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> result.bytes, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get 
> distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:56 2018-11-06T16:58:56.036Z RUNNING Query0
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.elements, from namespace Query0.Events
> 08:58:56 2018-11-06T16:58:56.036Z no activity
> 08:58:56 18/11/06 16:58:56 ERROR 
> org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric 
> event.bytes, from namespace Query0.Events
> 08:58:56 18/11/06 16:58:56 ERROR 
> 
