[jira] [Created] (BEAM-6015) Uber task for Portable Flink scalability
Ankur Goenka created BEAM-6015:
----------------------------------

             Summary: Uber task for Portable Flink scalability
                 Key: BEAM-6015
                 URL: https://issues.apache.org/jira/browse/BEAM-6015
             Project: Beam
          Issue Type: Task
          Components: java-fn-execution, runner-flink
            Reporter: Ankur Goenka

Task to track scalability issues with portable Flink.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)
[ https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678925#comment-16678925 ]

Chamikara Jayalath commented on BEAM-6002:
------------------------------------------

Looks like the Dataflow job (the first one) succeeded, but the JVM crashed for some reason. For example:
https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Dataflow/952/consoleFull

Job passed:
https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-11-07_14_04_35-10954309474319148697?project=apache-beam-testing

But beam-sdks-java-nexmark:run failed due to the following:

Process 'command '/usr/local/asfpackages/java/jdk1.8.0_191/bin/java'' finished with non-zero exit value 1
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':beam-sdks-java-nexmark:run'.
Caused by: org.gradle.process.internal.ExecException: Process 'command '/usr/local/asfpackages/java/jdk1.8.0_191/bin/java'' finished with non-zero exit value 1
    at org.gradle.process.internal.DefaultExecHandle$ExecResultImpl.assertNormalExitValue(DefaultExecHandle.java:395)
    at org.gradle.process.internal.DefaultJavaExecAction.execute(DefaultJavaExecAction.java:37)

[~apilloud] any idea what's going on here? Doesn't look like a timeout to me anymore.

> Nexmark tests timing out on all runners (crash loop due to metrics?)
> ---------------------------------------------------------------------
>
>                 Key: BEAM-6002
>                 URL: https://issues.apache.org/jira/browse/BEAM-6002
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Kenneth Knowles
>            Assignee: Lukasz Gajowy
>            Priority: Critical
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/
> {code}
> 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.elements, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.bytes, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.startTime for namespace Query0.Events
> 08:58:26 2018-11-06T16:58:26.035Z no activity
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.endTime for namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.elements, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.bytes, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.startTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.endTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.elements, from namespace Query0.Events
> 08:58:41 2018-11-06T16:58:41.036Z no activity
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.bytes, from namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.startTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.endTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.elements, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.bytes, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.startTime for
[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163724 ]

ASF GitHub Bot logged work on BEAM-4444:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Nov/18 23:04
            Start Date: 07/Nov/18 23:04
    Worklog Time Spent: 10m
      Work Description: ihji removed a comment on issue #6763: [BEAM-4444] Parquet IO for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-436803756

   run python postcommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 163724)
    Time Spent: 6h  (was: 5h 50m)

> Parquet IO for Python SDK
> -------------------------
>
>                 Key: BEAM-4444
>                 URL: https://issues.apache.org/jira/browse/BEAM-4444
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Bruce Arctor
>            Assignee: Heejong Lee
>            Priority: Major
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678927#comment-16678927 ]

Valentyn Tymofieiev commented on BEAM-6014:
-------------------------------------------

I wonder if the reason is that with DirectRunner we are using some unsupported version of the BQ client, compared to the native IO in the Dataflow runner.

> bigquery_tornadoes example does not work on Python Direct Runner
> ----------------------------------------------------------------
>
>                 Key: BEAM-6014
>                 URL: https://issues.apache.org/jira/browse/BEAM-6014
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Ahmet Altay
>            Priority: Major
>
> Steps to reproduce:
>
> {noformat}
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> {noformat}
> Last modified       Schema                      Total Rows   Total Bytes   Expiration   Time Partitioning   Labels
> ------------------ --------------------------- ------------ ------------- ------------ ------------------- --------
> 07 Nov 14:45:23    |- month: integer           0            0
>                    |- tornado_count: integer
> {noformat}
>
> The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner.
> Older versions of the SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)
[ https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chamikara Jayalath reassigned BEAM-6002:
----------------------------------------

    Assignee: Andrew Pilloud  (was: Lukasz Gajowy)

> Nexmark tests timing out on all runners (crash loop due to metrics?)
> ---------------------------------------------------------------------
>
>                 Key: BEAM-6002
>                 URL: https://issues.apache.org/jira/browse/BEAM-6002
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Kenneth Knowles
>            Assignee: Andrew Pilloud
>            Priority: Critical
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/
> {code}
> 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.elements, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.bytes, from namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.startTime for namespace Query0.Events
> 08:58:26 2018-11-06T16:58:26.035Z no activity
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.endTime for namespace Query0.Events
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.elements, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.bytes, from namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.startTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.endTime for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:26 18/11/06 16:58:26 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.elements, from namespace Query0.Events
> 08:58:41 2018-11-06T16:58:41.036Z no activity
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.bytes, from namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.startTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.endTime for namespace Query0.Events
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.elements, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric result.bytes, from namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.startTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.endTime for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.startTimestamp for namespace Query0.Results
> 08:58:41 18/11/06 16:58:41 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric result.endTimestamp for namespace Query0.Results
> 08:58:56 2018-11-06T16:58:56.036Z RUNNING Query0
> 08:58:56 18/11/06 16:58:56 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.elements, from namespace Query0.Events
> 08:58:56 2018-11-06T16:58:56.036Z no activity
> 08:58:56 18/11/06 16:58:56 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric event.bytes, from namespace Query0.Events
> 08:58:56 18/11/06 16:58:56 ERROR org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get distribution metric event.startTime for
[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163725 ]

ASF GitHub Bot logged work on BEAM-4444:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Nov/18 23:04
            Start Date: 07/Nov/18 23:04
    Worklog Time Spent: 10m
      Work Description: ihji commented on issue #6763: [BEAM-4444] Parquet IO for Python SDK
URL: https://github.com/apache/beam/pull/6763#issuecomment-436810501

   run python postcommit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 163725)
    Time Spent: 6h 10m  (was: 6h)

> Parquet IO for Python SDK
> -------------------------
>
>                 Key: BEAM-4444
>                 URL: https://issues.apache.org/jira/browse/BEAM-4444
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Bruce Arctor
>            Assignee: Heejong Lee
>            Priority: Major
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Add Parquet Support for the Python SDK.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678911#comment-16678911 ]

Charles Chen commented on BEAM-6014:
------------------------------------

I can confirm that this problem occurs for me too. For more info, I added debugging statements and found that the InsertAll requests all return from BigQuery without error, but the table ends up without output:

{{BigQueryWriteFn \{'tornado_count': 16, 'month': 1}}}
{{BigQueryWriteFn \{'tornado_count': 6, 'month': 3}}}
{{BigQueryWriteFn \{'tornado_count': 7, 'month': 2}}}
{{BigQueryWriteFn \{'tornado_count': 6, 'month': 5}}}
{{BigQueryWriteFn \{'tornado_count': 5, 'month': 4}}}
{{BigQueryWriteFn \{'tornado_count': 8, 'month': 7}}}
{{BigQueryWriteFn \{'tornado_count': 5, 'month': 6}}}
{{BigQueryWriteFn \{'tornado_count': 7, 'month': 9}}}
{{BigQueryWriteFn \{'tornado_count': 4, 'month': 8}}}
{{BigQueryWriteFn \{'tornado_count': 9, 'month': 11}}}
{{BigQueryWriteFn \{'tornado_count': 10, 'month': 10}}}
{{BigQueryWriteFn \{'tornado_count': 10, 'month': 12}}}
{{FINISH BUNDLE}}
{{FLUSH BATCH}}
{{_insert_all_rows REQUEST >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>]>}}
{{ tableId: 'out'>}}
{{_insert_all_rows RESPONSE }}
{{BigQuery table: [MY_PROJECT:myproject.out]; errors: []}}
{{Successfully wrote 12 rows.}}

> bigquery_tornadoes example does not work on Python Direct Runner
> ----------------------------------------------------------------
>
>                 Key: BEAM-6014
>                 URL: https://issues.apache.org/jira/browse/BEAM-6014
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Ahmet Altay
>            Priority: Major
>
> Steps to reproduce:
>
> {noformat}
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
>
> The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner.
> Older versions of the SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev updated BEAM-6014:
--------------------------------------
    Description: 
Steps to reproduce:

{noformat}
PROJECT=$(gcloud config get-value project)
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq rm -rf --project=$PROJECT $BQ_DATASET
bq mk --project=$PROJECT $BQ_DATASET
python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT
bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
{noformat}
{noformat}
Last modified       Schema                      Total Rows   Total Bytes   Expiration   Time Partitioning   Labels
------------------ --------------------------- ------------ ------------- ------------ ------------------- --------
07 Nov 14:45:23    |- month: integer           0            0
                   |- tornado_count: integer
{noformat}

The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner.
Older versions of the SDK also seem to have this problem.
cc: [~chamikara], [~charleschen].

  was:
Steps to reproduce:

{noformat}
PROJECT=$(gcloud config get-value project)
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq rm -rf --project=$PROJECT $BQ_DATASET
bq mk --project=$PROJECT $BQ_DATASET
python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT
bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
{noformat}

The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner.
Older versions of the SDK also seem to have this problem.
cc: [~chamikara], [~charleschen].

> bigquery_tornadoes example does not work on Python Direct Runner
> ----------------------------------------------------------------
>
>                 Key: BEAM-6014
>                 URL: https://issues.apache.org/jira/browse/BEAM-6014
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Ahmet Altay
>            Priority: Major
>
> Steps to reproduce:
>
> {noformat}
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
> {noformat}
> Last modified       Schema                      Total Rows   Total Bytes   Expiration   Time Partitioning   Labels
> ------------------ --------------------------- ------------ ------------- ------------ ------------------- --------
> 07 Nov 14:45:23    |- month: integer           0            0
>                    |- tornado_count: integer
> {noformat}
>
> The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner.
> Older versions of the SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
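The repro above checks for emptiness with `bq show`, which reads table metadata; for streaming inserts, metadata row counts can lag behind rows sitting in the streaming buffer. A hedged verification sketch (it reuses the PROJECT/BQ_DATASET/TABLE_NAME variables from the repro; the fallback project name is a placeholder, not from the report) that queries the table directly instead:

```shell
# Build a verification query; COUNT(*) over the table sees streamed rows
# even while `bq show` metadata may still report 0.
PROJECT=$(gcloud config get-value project 2>/dev/null || echo "my-project")
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
QUERY="SELECT COUNT(*) AS row_count FROM \`${PROJECT}.${BQ_DATASET}.${TABLE_NAME}\`"
echo "$QUERY"
# Run the query only when the bq CLI is available on this machine.
command -v bq >/dev/null && bq query --use_legacy_sql=false "$QUERY" || true
```

A nonzero row_count here with zero rows in `bq show` would point at metadata lag rather than lost writes.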
[jira] [Work logged] (BEAM-5798) Add support for dynamic destinations when writing to Kafka
[ https://issues.apache.org/jira/browse/BEAM-5798?focusedWorklogId=163717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163717 ]

ASF GitHub Bot logged work on BEAM-5798:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Nov/18 22:56
            Start Date: 07/Nov/18 22:56
    Worklog Time Spent: 10m
      Work Description: lukecwik commented on issue #6776: WIP: [BEAM-5798] Added "withTopicFn()" to set sink topics dynamically
URL: https://github.com/apache/beam/pull/6776#issuecomment-436808370

   Closed PR as per the author's comment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 163717)
    Time Spent: 2h  (was: 1h 50m)

> Add support for dynamic destinations when writing to Kafka
> ----------------------------------------------------------
>
>                 Key: BEAM-5798
>                 URL: https://issues.apache.org/jira/browse/BEAM-5798
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka
>            Reporter: Luke Cwik
>            Assignee: Alexey Romanenko
>            Priority: Major
>              Labels: newbie, starter
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add support for writing to Kafka based upon the contents of the data. This is similar to the dynamic destination approach for file IO and other sinks.
>
> Source of request:
> https://lists.apache.org/thread.html/a89d1d32ecdb50c42271e805cc01a651ee3623b4df97c39baf4f2053@%3Cuser.beam.apache.org%3E

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5798) Add support for dynamic destinations when writing to Kafka
[ https://issues.apache.org/jira/browse/BEAM-5798?focusedWorklogId=163718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163718 ]

ASF GitHub Bot logged work on BEAM-5798:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Nov/18 22:56
            Start Date: 07/Nov/18 22:56
    Worklog Time Spent: 10m
      Work Description: lukecwik closed pull request #6776: WIP: [BEAM-5798] Added "withTopicFn()" to set sink topics dynamically
URL: https://github.com/apache/beam/pull/6776

This is a PR from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
index 31ba72c54ba..e311688b179 100644
--- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
+++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
@@ -167,7 +167,7 @@
  * PCollection<KV<Long, String>> kvColl = ...;
  * kvColl.apply(KafkaIO.<Long, String>write()
  *     .withBootstrapServers("broker_1:9092,broker_2:9092")
- *     .withTopic("results")
+ *     .withTopic("results") // use withTopicFn(SerializableFunction<KV<Long, String>, String> fn) to set topics dynamically
  *
  *     .withKeySerializer(LongSerializer.class)
  *     .withValueSerializer(StringSerializer.class)
@@ -862,6 +862,9 @@ private KafkaIO() {}
     @Nullable
     abstract String getTopic();

+    @Nullable
+    abstract SerializableFunction<KV<K, V>, String> getTopicFn();
+
     abstract Map<String, Object> getProducerConfig();

     @Nullable
@@ -894,6 +897,8 @@ private KafkaIO() {}
     abstract static class Builder<K, V> {
       abstract Builder<K, V> setTopic(String topic);

+      abstract Builder<K, V> setTopicFn(SerializableFunction<KV<K, V>, String> fn);
+
       abstract Builder<K, V> setProducerConfig(Map<String, Object> producerConfig);

       abstract Builder<K, V> setProducerFactoryFn(
@@ -927,9 +932,20 @@ private KafkaIO() {}
           ImmutableMap.of(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers));
     }

-    /** Sets the Kafka topic to write to. */
+    /**
+     * Sets the Kafka topic to write to. Note that this overrides any function previously set
+     * by {@link #withTopicFn}.
+     */
     public Write<K, V> withTopic(String topic) {
-      return toBuilder().setTopic(topic).build();
+      return toBuilder().setTopic(topic).setTopicFn(null).build();
+    }
+
+    /**
+     * Sets a custom function to define the sink topic dynamically. Note that this overrides
+     * any topic previously set by {@link #withTopic}.
+     */
+    public Write<K, V> withTopicFn(SerializableFunction<KV<K, V>, String> topicFn) {
+      return toBuilder().setTopic(null).setTopicFn(topicFn).build();
     }

     /**
@@ -1057,11 +1073,16 @@ public PDone expand(PCollection<KV<K, V>> input) {
       checkArgument(
           getProducerConfig().get(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG) != null,
           "withBootstrapServers() is required");
-      checkArgument(getTopic() != null, "withTopic() is required");
+      checkArgument(
+          getTopic() != null || getTopicFn() != null, "withTopic() or withTopicFn() is required");
+
       checkArgument(getKeySerializer() != null, "withKeySerializer() is required");
       checkArgument(getValueSerializer() != null, "withValueSerializer() is required");

       if (isEOS()) {
+        checkArgument(getTopic() != null, "withTopic() is required with EOS sink");
+        checkArgument(getTopicFn() == null, "withTopicFn() can't be used together with EOS sink");
+
         KafkaExactlyOnceSink.ensureEOSSupport();

         // TODO: Verify that the group_id does not have existing state stored on Kafka unless
diff --git a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java
index beaa9a22053..3b55652a720 100644
--- a/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java
+++ b/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaWriter.java
@@ -61,8 +61,9 @@ public void processElement(ProcessContext ctx) throws Exception {
             ? spec.getPublishTimestampFunction().getTimestamp(kv, ctx.timestamp()).getMillis()
             : null;

+    String topic = spec.getTopicFn() != null ? spec.getTopicFn().apply(kv) : spec.getTopic();
     producer.send(
-        new ProducerRecord<>(spec.getTopic(), null, timestampMillis, kv.getKey(), kv.getValue()),
+        new ProducerRecord<>(topic, null, timestampMillis, kv.getKey(), kv.getValue()),
         new SendCallback());

     elementsWritten.inc();
diff --git a/sdks/java/io/kafka/src/test/java/org/apache/beam/sdk/io/kafka/KafkaIOTest.java
[jira] [Comment Edited] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678911#comment-16678911 ]

Charles Chen edited comment on BEAM-6014 at 11/7/18 10:55 PM:
--------------------------------------------------------------

I can confirm that this problem occurs for me too. For more info, I added debugging statements and found that the InsertAll requests all return from BigQuery without error, but the table ends up without output:

{code}
BigQueryWriteFn {'tornado_count': 16, 'month': 1}
BigQueryWriteFn {'tornado_count': 6, 'month': 3}
BigQueryWriteFn {'tornado_count': 7, 'month': 2}
BigQueryWriteFn {'tornado_count': 6, 'month': 5}
BigQueryWriteFn {'tornado_count': 5, 'month': 4}
BigQueryWriteFn {'tornado_count': 8, 'month': 7}
BigQueryWriteFn {'tornado_count': 5, 'month': 6}
BigQueryWriteFn {'tornado_count': 7, 'month': 9}
BigQueryWriteFn {'tornado_count': 4, 'month': 8}
BigQueryWriteFn {'tornado_count': 9, 'month': 11}
BigQueryWriteFn {'tornado_count': 10, 'month': 10}
BigQueryWriteFn {'tornado_count': 10, 'month': 12}
FINISH BUNDLE
FLUSH BATCH
_insert_all_rows REQUEST >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>]>
 tableId: 'out'>
_insert_all_rows RESPONSE
BigQuery table: [MY_PROJECT:myproject.out]; errors: []
Successfully wrote 12 rows.
{code}

was (Author: charleschen):
I can confirm that this problem occurs for me too. For more info, I added debugging statements and found that the InsertAll requests all return from BigQuery without error, but the table ends up without output:

{{BigQueryWriteFn \{'tornado_count': 16, 'month': 1}}}
{{BigQueryWriteFn \{'tornado_count': 6, 'month': 3}}}
{{BigQueryWriteFn \{'tornado_count': 7, 'month': 2}}}
{{BigQueryWriteFn \{'tornado_count': 6, 'month': 5}}}
{{BigQueryWriteFn \{'tornado_count': 5, 'month': 4}}}
{{BigQueryWriteFn \{'tornado_count': 8, 'month': 7}}}
{{BigQueryWriteFn \{'tornado_count': 5, 'month': 6}}}
{{BigQueryWriteFn \{'tornado_count': 7, 'month': 9}}}
{{BigQueryWriteFn \{'tornado_count': 4, 'month': 8}}}
{{BigQueryWriteFn \{'tornado_count': 9, 'month': 11}}}
{{BigQueryWriteFn \{'tornado_count': 10, 'month': 10}}}
{{BigQueryWriteFn \{'tornado_count': 10, 'month': 12}}}
{{FINISH BUNDLE}}
{{FLUSH BATCH}}
{{_insert_all_rows REQUEST >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>, >, >]>>]>}}
{{ tableId: 'out'>}}
{{_insert_all_rows RESPONSE }}
{{BigQuery table: [MY_PROJECT:myproject.out]; errors: []}}
{{Successfully wrote 12 rows.}}

> bigquery_tornadoes example does not work on Python Direct Runner
> ----------------------------------------------------------------
>
>                 Key: BEAM-6014
>                 URL: https://issues.apache.org/jira/browse/BEAM-6014
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Ahmet Altay
>            Priority: Major
>
> Steps to reproduce:
>
> {noformat}
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
>
> The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner.
> Older versions of the SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163715 ]

ASF GitHub Bot logged work on BEAM-5931:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Nov/18 22:53
            Start Date: 07/Nov/18 22:53
    Worklog Time Spent: 10m
      Work Description: lukecwik commented on a change in pull request #6966: [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231709315

 ##
 File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1141,12 +1141,10 @@ artifactId=${project.name}
         include "**/*IT.class"

-        def pipelineOptionString = configuration.integrationTestPipelineOptions
-        if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+        def pipelineOptionsString = configuration.integrationTestPipelineOptions
+        if(pipelineOptionsString && configuration.runner?.equalsIgnoreCase('dataflow')) {

 Review comment:
   Won't you want to create a new pipelineOptionsString, if there wasn't one defined, containing the args that you added?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 163715)
    Time Spent: 5.5h  (was: 5h 20m)

> Rollback PR/6899
> ----------------
>
>                 Key: BEAM-5931
>                 URL: https://issues.apache.org/jira/browse/BEAM-5931
>             Project: Beam
>          Issue Type: Task
>          Components: beam-model, runner-dataflow
>            Reporter: Luke Cwik
>            Assignee: Ruoyun Huang
>            Priority: Major
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> To roll back this change, one must either:
> 1) Update the nexmark / perf test framework to use the Dataflow worker jar.
> This requires adding
> {code}
> "--dataflowWorkerJar=${dataflowWorkerJar}",
> "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the Dataflow worker image with code that contains the rollback of PR/6899, and then roll back PR/6899 in GitHub with the updated Dataflow worker image.
> #1 is preferable since we will no longer have tests running that don't use a Dataflow worker jar built from GitHub HEAD.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678895#comment-16678895 ]

Valentyn Tymofieiev commented on BEAM-6014:
-------------------------------------------

Also manually tried to disable the FnAPI runner at
https://github.com/apache/beam/blob/b558c4d4f21ae1a48fb51afedd849b1f75ff3325/sdks/python/apache_beam/runners/direct/direct_runner.py#L76
with no change in behavior.

> bigquery_tornadoes example does not work on Python Direct Runner
> ----------------------------------------------------------------
>
>                 Key: BEAM-6014
>                 URL: https://issues.apache.org/jira/browse/BEAM-6014
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Ahmet Altay
>            Priority: Major
>
> Steps to reproduce:
>
> {noformat}
> PROJECT=$(gcloud config get-value project)
> BQ_DATASET=${USER}_bq_dataset
> TABLE_NAME=out
> bq rm -rf --project=$PROJECT $BQ_DATASET
> bq mk --project=$PROJECT $BQ_DATASET
> python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT
> bq show $PROJECT:$BQ_DATASET.$TABLE_NAME
> {noformat}
>
> The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner.
> Older versions of the SDK also seem to have this problem.
> cc: [~chamikara], [~charleschen].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163712 ] ASF GitHub Bot logged work on BEAM-5939: Author: ASF GitHub Bot Created on: 07/Nov/18 22:46 Start Date: 07/Nov/18 22:46 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6976: [BEAM-5939] Dedupe runner constants URL: https://github.com/apache/beam/pull/6976#issuecomment-436805959 Jobs are failing with the following error: ``` Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 766, in run self._load_main_session(self.local_staging_directory) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 480, in _load_main_session session_file = os.path.join(session_path, names.PICKLED_MAIN_SESSION_FILE) AttributeError: 'module' object has no attribute 'PICKLED_MAIN_SESSION_FILE' ``` This is happening because Dataflow's Python workers depend on the currently existing names.py. A change there also needs to happen, and that code is not open sourced yet. I suggest keeping the contents of the existing names.py as they are and making your other changes anyway. I can ask someone from the Dataflow team to first clean up the worker (to depend on the new code) and then do the cleanup here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163712) Time Spent: 1h (was: 50m) > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starter > Time Spent: 1h > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > have the same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
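One low-risk shape for the deduplication suggested above is to keep names.py as a thin alias of the new single source of truth, so the (not yet open-sourced) Dataflow worker can keep importing it. A hedged Python sketch using throwaway in-memory modules (the constant's value here is illustrative, not taken from Beam's source):

```python
import sys
import types

# Stand-in for apache_beam/runners/portability/stager.py: the single
# source of truth for the shared constants.
stager = types.ModuleType("stager")
stager.PICKLED_MAIN_SESSION_FILE = "pickled_main_session"
sys.modules["stager"] = stager

# Stand-in for apache_beam/runners/dataflow/internal/names.py: kept as a
# backwards-compatible alias, so code doing names.PICKLED_MAIN_SESSION_FILE
# (like the Dataflow worker's batchworker.py above) keeps working unchanged.
names = types.ModuleType("names")
names.PICKLED_MAIN_SESSION_FILE = stager.PICKLED_MAIN_SESSION_FILE
sys.modules["names"] = names
```

In the real tree this would be a `from ... import` line in names.py rather than synthetic modules; the point is that the worker's AttributeError disappears because the old attribute survives the move.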
[jira] [Created] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner
Valentyn Tymofieiev created BEAM-6014: - Summary: bigquery_tornadoes example does not work on Python Direct Runner Key: BEAM-6014 URL: https://issues.apache.org/jira/browse/BEAM-6014 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Valentyn Tymofieiev Assignee: Ahmet Altay Steps to reproduce: {noformat} PROJECT=$(gcloud config get-value project) BQ_DATASET=${USER}_bq_dataset TABLE_NAME=out bq rm -rf --project=$PROJECT $BQ_DATASET bq mk --project=$PROJECT $BQ_DATASET python -m apache_beam.examples.cookbook.bigquery_tornadoes --output $BQ_DATASET.$TABLE_NAME --project=$PROJECT bq show $PROJECT:$BQ_DATASET.$TABLE_NAME {noformat} The resulting table is empty; however, it has several rows if we run the example with the Dataflow runner. Older versions of the SDK also seem to have this problem. cc: [~chamikara], [~charleschen]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6013) Reduce verbose logging within SerializableCoder
[ https://issues.apache.org/jira/browse/BEAM-6013?focusedWorklogId=163686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163686 ] ASF GitHub Bot logged work on BEAM-6013: Author: ASF GitHub Bot Created on: 07/Nov/18 22:08 Start Date: 07/Nov/18 22:08 Worklog Time Spent: 10m Work Description: lukecwik opened a new pull request #6982: [BEAM-6013] Reduce logging within SerializableCoder. URL: https://github.com/apache/beam/pull/6982 This significantly reduces logging to happen at most once per class load. Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. 
Post-Commit Tests Status (on master branch)

Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | ---

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163686) Time Spent: 10m Remaining Estimate: 0h > Reduce verbose logging within SerializableCoder > --- > >
[jira] [Commented] (BEAM-5953) Support DataflowRunner on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678898#comment-16678898 ] Scott Wegner commented on BEAM-5953: I don't have much to contribute, but some additional details might help others: * How did you launch the job? i.e., can you provide the command line used? * Are you able to successfully run a simpler job (i.e., using existing Python support instead of Python 3)? > Support DataflowRunner on Python 3 > -- > > Key: BEAM-5953 > URL: https://issues.apache.org/jira/browse/BEAM-5953 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163709 ] ASF GitHub Bot logged work on BEAM-4444: Author: ASF GitHub Bot Created on: 07/Nov/18 22:38 Start Date: 07/Nov/18 22:38 Worklog Time Spent: 10m Work Description: ihji commented on issue #6763: [BEAM-4444] Parquet IO for Python SDK URL: https://github.com/apache/beam/pull/6763#issuecomment-436803756 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163709) Time Spent: 5h 50m (was: 5h 40m) > Parquet IO for Python SDK > - > > Key: BEAM-4444 > URL: https://issues.apache.org/jira/browse/BEAM-4444 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Bruce Arctor >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Add Parquet Support for the Python SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4444) Parquet IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-4444?focusedWorklogId=163708=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163708 ] ASF GitHub Bot logged work on BEAM-4444: Author: ASF GitHub Bot Created on: 07/Nov/18 22:38 Start Date: 07/Nov/18 22:38 Worklog Time Spent: 10m Work Description: ihji removed a comment on issue #6763: [BEAM-4444] Parquet IO for Python SDK URL: https://github.com/apache/beam/pull/6763#issuecomment-436114110 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163708) Time Spent: 5h 40m (was: 5.5h) > Parquet IO for Python SDK > - > > Key: BEAM-4444 > URL: https://issues.apache.org/jira/browse/BEAM-4444 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Bruce Arctor >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > Add Parquet Support for the Python SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-6014) bigquery_tornadoes example does not work on Python Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678895#comment-16678895 ] Valentyn Tymofieiev edited comment on BEAM-6014 at 11/7/18 10:35 PM: - Also tried to manually disable FnAPI runner at [https://github.com/apache/beam/blob/b558c4d4f21ae1a48fb51afedd849b1f75ff3325/sdks/python/apache_beam/runners/direct/direct_runner.py#L76] , with no change in behavior. was (Author: tvalentyn): Also tried to manually tried to disable FnAPI runner at [https://github.com/apache/beam/blob/b558c4d4f21ae1a48fb51afedd849b1f75ff3325/sdks/python/apache_beam/runners/direct/direct_runner.py#L76] , with no change in behavior. > bigquery_tornadoes example does not work on Python Direct Runner > > > Key: BEAM-6014 > URL: https://issues.apache.org/jira/browse/BEAM-6014 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Ahmet Altay >Priority: Major > > Steps to reproduce: > > {noformat} > > > PROJECT=$(gcloud config get-value project) > BQ_DATASET=${USER}_bq_dataset > TABLE_NAME=out > bq rm -rf --project=$PROJECT $BQ_DATASET > bq mk --project=$PROJECT $BQ_DATASET > python -m apache_beam.examples.cookbook.bigquery_tornadoes --output > $BQ_DATASET.$TABLE_NAME --project=$PROJECT > bq show $PROJECT:$BQ_DATASET.$TABLE_NAME > {noformat} > The resulting Database is empty, however it has several rows if we run the > example with Dataflow runner. > Older versions of SDK also seem to have this problem. > cc: [~chamikara], [~charleschen]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6013) Reduce verbose logging within SerializableCoder
[ https://issues.apache.org/jira/browse/BEAM-6013?focusedWorklogId=163687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163687 ] ASF GitHub Bot logged work on BEAM-6013: Author: ASF GitHub Bot Created on: 07/Nov/18 22:08 Start Date: 07/Nov/18 22:08 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #6982: [BEAM-6013] Reduce logging within SerializableCoder. URL: https://github.com/apache/beam/pull/6982#issuecomment-436795762 R: @udim @mairbek This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163687) Time Spent: 20m (was: 10m) > Reduce verbose logging within SerializableCoder > --- > > Key: BEAM-6013 > URL: https://issues.apache.org/jira/browse/BEAM-6013 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > The following message is spamming logs constantly: > "Can't verify serialized elements of type TypeX have well defined equals > method. This may produce incorrect results on some PipelineRunner" > Code location: > https://github.com/apache/beam/blob/429547981b4534a29a0654e3b86f1a718793d816/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6013) Reduce verbose logging within SerializableCoder
Luke Cwik created BEAM-6013: --- Summary: Reduce verbose logging within SerializableCoder Key: BEAM-6013 URL: https://issues.apache.org/jira/browse/BEAM-6013 Project: Beam Issue Type: Improvement Components: sdk-java-core Reporter: Luke Cwik Assignee: Luke Cwik The following message is spamming logs constantly: "Can't verify serialized elements of type TypeX have well defined equals method. This may produce incorrect results on some PipelineRunner" Code location: https://github.com/apache/beam/blob/429547981b4534a29a0654e3b86f1a718793d816/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SerializableCoder.java#L104 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
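The fix described in BEAM-6013 caps the warning at roughly once per class load on the Java side. The same rate-limiting idea can be sketched in Python (illustrative only; the actual change is in the Java SDK's SerializableCoder.java):

```python
import logging

_warned_classes = set()

def warn_once_per_class(cls, logger=None):
    # Emit the noisy warning at most once per distinct class, instead of
    # once per encoded element. Returns True when a warning was logged.
    logger = logger or logging.getLogger(__name__)
    if cls in _warned_classes:
        return False
    _warned_classes.add(cls)
    logger.warning(
        "Can't verify serialized elements of type %s have well defined "
        "equals method. This may produce incorrect results on some "
        "PipelineRunner", cls.__name__)
    return True
```

The Java version achieves the same effect with per-class state populated during class initialization, so repeated encodes of the same element type stay silent.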
[jira] [Work logged] (BEAM-5979) Support DATE and TIME in DML
[ https://issues.apache.org/jira/browse/BEAM-5979?focusedWorklogId=163642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163642 ] ASF GitHub Bot logged work on BEAM-5979: Author: ASF GitHub Bot Created on: 07/Nov/18 20:29 Start Date: 07/Nov/18 20:29 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #6967: [BEAM-5979] Fix DATE and TIME in INSERTION URL: https://github.com/apache/beam/pull/6967#issuecomment-436766368 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163642) Time Spent: 2h 10m (was: 2h) > Support DATE and TIME in DML > > > Key: BEAM-5979 > URL: https://issues.apache.org/jira/browse/BEAM-5979 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Right now, BeamSQL uses Schema's DATETIME field to save all time related > data. However, BeamSQL doesn't implement correctly how TIME and DATE should > be converted to Joda's datetime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3221) Model pipeline representation improvements
[ https://issues.apache.org/jira/browse/BEAM-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-3221: --- Assignee: (was: Henning Rohde) > Model pipeline representation improvements > -- > > Key: BEAM-3221 > URL: https://issues.apache.org/jira/browse/BEAM-3221 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Henning Rohde >Priority: Major > Labels: portability > > Collections of various (breaking) tweaks to the Runner API, notably the > pipeline representation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5679) Beam should report the account used to attempt to submit a job to Dataflow
[ https://issues.apache.org/jira/browse/BEAM-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5679: --- Assignee: (was: Henning Rohde) > Beam should report the account used to attempt to submit a job to Dataflow > -- > > Key: BEAM-5679 > URL: https://issues.apache.org/jira/browse/BEAM-5679 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Pablo Estrada >Priority: Major > > If a user lacks some privilege, the job will not be created. Users should see > which account is the one lacking privileges. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5791) Bound the amount of data on the data plane by time.
[ https://issues.apache.org/jira/browse/BEAM-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5791: --- Assignee: (was: Henning Rohde) > Bound the amount of data on the data plane by time. > --- > > Key: BEAM-5791 > URL: https://issues.apache.org/jira/browse/BEAM-5791 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-java-harness, sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > This is especially important for Fn API reads, where each element represents > a shard to read and may be very expensive, but many elements may be waiting > in the Fn API buffer. > The need for this will be mitigated with full SDF support for liquid sharding > over the Fn API, but not eliminated unless the runner can "unread" elements > it has already sent. > This is especially important in for dataflow jobs that start out small but > then detect that they need more workers (e.g. due to the initial inputs being > an SDF). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5661) [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such file or directory
[ https://issues.apache.org/jira/browse/BEAM-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5661: --- Assignee: (was: Henning Rohde) > [beam_PostCommit_Py_ValCont] [:beam-sdks-python-container:docker] no such > file or directory > --- > > Key: BEAM-5661 > URL: https://issues.apache.org/jira/browse/BEAM-5661 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Scott Wegner >Priority: Major > > _Use this form to file an issue for test failure:_ > * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_Py_ValCont/902/] > * [Gradle Build > Scan|https://scans.gradle.com/s/pmjbkhxaeryx4/console-log?task=:beam-sdks-python-container:docker#L2] > * [Test source > code|https://github.com/apache/beam/blob/845f8d0abcc5a8d7f93457c27aff0feeb1a867d5/sdks/python/container/build.gradle#L61] > Initial investigation: > {{failed to get digest > sha256:4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed: open > /var/lib/docker/image/aufs/imagedb/content/sha256/4ee4ea2f0113e98b49d8e376ce847feb374ddf2b8ea775502459d8a1b8a3eaed: > no such file or directory}} > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5657) Enable CheckStyle in dataflow java worker code
[ https://issues.apache.org/jira/browse/BEAM-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5657: --- Assignee: (was: Henning Rohde) > Enable CheckStyle in dataflow java worker code > -- > > Key: BEAM-5657 > URL: https://issues.apache.org/jira/browse/BEAM-5657 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Priority: Minor > > Currently, the CheckStyle task is disabled in dataflow worker code. We should > fix the warnings and errors and enable this task. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5689) Remove artifact naming constraint for portable Dataflow job
[ https://issues.apache.org/jira/browse/BEAM-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5689: --- Assignee: (was: Henning Rohde) > Remove artifact naming constraint for portable Dataflow job > --- > > Key: BEAM-5689 > URL: https://issues.apache.org/jira/browse/BEAM-5689 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Henning Rohde >Priority: Minor > > Artifact names/keys are not preserved in Dataflow. Remove the below > workarounds when they are. > * Go Dataflow runner > * Java and Python container boot code (probably) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5686) Remove DataflowRunnerHarness shim again
[ https://issues.apache.org/jira/browse/BEAM-5686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5686: --- Assignee: (was: Henning Rohde) > Remove DataflowRunnerHarness shim again > --- > > Key: BEAM-5686 > URL: https://issues.apache.org/jira/browse/BEAM-5686 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Henning Rohde >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5659) Enable Javadocs check in dataflow java worker code
[ https://issues.apache.org/jira/browse/BEAM-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5659: --- Assignee: (was: Henning Rohde) > Enable Javadocs check in dataflow java worker code > -- > > Key: BEAM-5659 > URL: https://issues.apache.org/jira/browse/BEAM-5659 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Priority: Minor > > The Javadoc check task is disabled for now. We should enable this task after > fixing all warnings. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5466) Cannot deploy job to Dataflow with RuntimeValueProvider
[ https://issues.apache.org/jira/browse/BEAM-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5466: --- Assignee: (was: Henning Rohde) > Cannot deploy job to Dataflow with RuntimeValueProvider > --- > > Key: BEAM-5466 > URL: https://issues.apache.org/jira/browse/BEAM-5466 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Affects Versions: 2.6.0 > Environment: Python 2.7 >Reporter: Mackenzie >Priority: Major > > I cannot deploy an apache beam job to Cloud Dataflow that contains runtime > value parameters. > The standard use case is with Cloud Dataflow Templates which use > RuntimeValueProvider to get template parameters. > When trying to call `get` on the parameter, I always get an error like: > {noformat} > apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: > myparam, type: str, default_value: 'defalut-value').get() not called from a > runtime context > {noformat} > > A minimal example: > {code:java} > class UserOptions(PipelineOptions): > @classmethod > def _add_argparse_args(cls, parser): > parser.add_value_provider_argument('--myparam', type=str, > default='default-value') > def run(argv=None): > parser = argparse.ArgumentParser() > known_args, pipeline_args = parser.parse_known_args(argv) > pipeline_options = PipelineOptions(pipeline_args) > pipeline_options.view_as(SetupOptions).save_main_session = True > google_cloud_options = pipeline_options.view_as(GoogleCloudOptions) > # insert google cloud options here, or pass them in arguments > standard_options = pipeline_options.view_as(StandardOptions) > standard_options.runner = 'DataflowRunner' > user_options = pipeline_options.view_as(UserOptions) > p = beam.Pipeline(options=pipeline_options) > param = user_options.myparam.get() # This line is the issue > result = p.run() > result.wait_until_finish() > if __name__ == '__main__': > run() > {code} > I would expect that the runtime context would be ignored when 
running the > script locally. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
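The error in the report above is expected from the SDK's point of view: a RuntimeValueProvider is only resolvable while the pipeline is executing, so calling .get() at construction time (as the minimal example does) must fail. The sketch below is a simplified model of that behavior, not Beam's actual classes:

```python
class RuntimeValueProviderError(Exception):
    pass

class RuntimeValueProvider:
    # Simplified model: the value is unknown until the runner injects it
    # at execution time.
    def __init__(self, option_name, default):
        self.option_name = option_name
        self.default = default
        self.runtime_value = None  # set by the "runner" when the job starts

    def get(self):
        if self.runtime_value is None:
            raise RuntimeValueProviderError(
                "%s.get() not called from a runtime context" % self.option_name)
        return self.runtime_value

myparam = RuntimeValueProvider("myparam", "default-value")

# Pipeline-construction time (what the reporter's script does): fails.
try:
    myparam.get()
    raised = False
except RuntimeValueProviderError:
    raised = True

# Execution time (inside DoFn.process, after the runner supplies options):
myparam.runtime_value = "actual-value"
value_at_runtime = myparam.get()
```

The usual fix is to pass `user_options.myparam` itself into the transform and defer the `.get()` call into `process()`, which runs in a runtime context.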
[jira] [Assigned] (BEAM-3204) Coders only should have a FunctionSpec, not an SdkFunctionSpec
[ https://issues.apache.org/jira/browse/BEAM-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-3204: --- Assignee: (was: Henning Rohde) > Coders only should have a FunctionSpec, not an SdkFunctionSpec > -- > > Key: BEAM-3204 > URL: https://issues.apache.org/jira/browse/BEAM-3204 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Kenneth Knowles >Priority: Major > Labels: portability > > We added environments to coders to account for "custom" coders where it is > only really possible for one SDK to understand them, like this: > {code} > Coder { > spec: SdkFunctionSpec { > environment: "java_sdk_docker_container", > spec: FunctionSpec { > urn: "beam:coder:java_custom_coder", > payload: > } > } > } > {code} > But a coder must be understood by both the producer of a PCollection and its > consumers. A coder is not the same as other UDF, though these are > user-defined. > A pipeline where either the producer or consumer cannot handle the coder is > invalid, and we will have to build our cross-language APIs to prevent > construction of such a pipeline. So we can drop the environment. > I think there are some folks who want to reserve the ability to add an > environment later, perhaps, to not pain ourselves into a corner. In this > case, we can just add a field to Coder. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-2885) Local Dataflow proxy for portable job submission
[ https://issues.apache.org/jira/browse/BEAM-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-2885: --- Assignee: (was: Henning Rohde) > Local Dataflow proxy for portable job submission > > > Key: BEAM-2885 > URL: https://issues.apache.org/jira/browse/BEAM-2885 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Henning Rohde >Priority: Major > Labels: portability > Time Spent: 4h 50m > Remaining Estimate: 0h > > As per https://s.apache.org/beam-job-api, use local support for > submission-side. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5658) Enable FindBugs in dataflow java worker code
[ https://issues.apache.org/jira/browse/BEAM-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5658: --- Assignee: (was: Henning Rohde) > Enable FindBugs in dataflow java worker code > > > Key: BEAM-5658 > URL: https://issues.apache.org/jira/browse/BEAM-5658 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Boyuan Zhang >Priority: Minor > > Currently we disabled FindBugs task in worker code. We should enable if after > fixing all related warnings and errors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-5512) Dataflow migrates to using shared portability library for user timers
[ https://issues.apache.org/jira/browse/BEAM-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-5512: --- Assignee: (was: Henning Rohde) > Dataflow migrates to using shared portability library for user timers > - > > Key: BEAM-5512 > URL: https://issues.apache.org/jira/browse/BEAM-5512 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Priority: Major > Original Estimate: 336h > Remaining Estimate: 336h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3514) Use portable WindowIntoPayload in DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-3514: --- Assignee: (was: Henning Rohde) > Use portable WindowIntoPayload in DataflowRunner > > > Key: BEAM-3514 > URL: https://issues.apache.org/jira/browse/BEAM-3514 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Kenneth Knowles >Priority: Major > Labels: portability > > The Java-specific blobs transmitted to Dataflow need more context, in the > form of portability framework protos. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3223) PTransform spec should not reuse FunctionSpec
[ https://issues.apache.org/jira/browse/BEAM-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henning Rohde reassigned BEAM-3223: --- Assignee: (was: Henning Rohde) > PTransform spec should not reuse FunctionSpec > - > > Key: BEAM-3223 > URL: https://issues.apache.org/jira/browse/BEAM-3223 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Henning Rohde >Priority: Major > Labels: portability > > We should add a new type instead, TransformSpec, say, or just inline a URN > and payload. It's confusing otherwise. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4821) Update Built-in I/O transforms page for Python
[ https://issues.apache.org/jira/browse/BEAM-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678699#comment-16678699 ] Udi Meiri commented on BEAM-4821: - I believe both reads and writes are possible on HDFS. > Update Built-in I/O transforms page for Python > -- > > Key: BEAM-4821 > URL: https://issues.apache.org/jira/browse/BEAM-4821 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Ahmet Altay >Assignee: Melissa Pashniak >Priority: Major > > We need to update: > [https://beam.apache.org/documentation/io/built-in/] > For python. Python now supports HDFS for reading. > cc: [~udim] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-5953) Support DataflowRunner on Python 3
[ https://issues.apache.org/jira/browse/BEAM-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677546#comment-16677546 ] Mark Liu edited comment on BEAM-5953 at 11/7/18 7:47 PM: - With the Python 3 SDK container provided by BEAM-5089 and a fix ([https://github.com/markflyhigh/incubator-beam/pull/3]) for the Python 3 type error, I'm able to invoke wordcount_fnapi_it against TestDataflowRunner on Python 3. The job can be submitted to the service but the runner harness seems broken. Failure job link: [https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-11-05_15_43_59-9596490965399700763?project=google.com:clouddfe] Exception in worker log: {code:java} I Exception in thread "main" I org.apache.beam.vendor.protobuf.v3.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type. I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:115) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:551) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.GeneratedMessageV3.parseUnknownFieldProto3(GeneratedMessageV3.java:305) I at org.apache.beam.model.pipeline.v1.RunnerApi$PTransform.<init>(RunnerApi.java:7084) I at org.apache.beam.model.pipeline.v1.RunnerApi$PTransform.<init>(RunnerApi.java:6978) I at org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$1.parsePartialFrom(RunnerApi.java:9169) I at org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$1.parsePartialFrom(RunnerApi.java:9163) I at org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$Builder.mergeFrom(RunnerApi.java:8052) I at org.apache.beam.model.pipeline.v1.RunnerApi$PTransform$Builder.mergeFrom(RunnerApi.java:7835) I at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2408) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntryLite.parseField(MapEntryLite.java:128) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntryLite.parseEntry(MapEntryLite.java:184) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry.<init>(MapEntry.java:106) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry.<init>(MapEntry.java:50) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:70) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:64) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424) I at org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:343) I at org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:300) I at org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2166) I at org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2160) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424) I at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:5523) I at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:5481) I at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:6612) I at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:6606) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:221) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:239) I at 
org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:244) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) I at org.apache.beam.vendor.protobuf.v3.com.google.protobuf.GeneratedMessageV3.parseWithIOException(GeneratedMessageV3.java:311) I at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.parseFrom(RunnerApi.java:5853) I at org.apache.beam.runners.dataflow.worker.DataflowWorkerHarnessHelper.getPipelineFromEnv(DataflowWorkerHarnessHelper.java:117) I at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:58) I java failed with exit status 1 F Harness failed: exit status 1 {code} was (Author:
[jira] [Work logged] (BEAM-4681) Integrate support for timers using the portability APIs into Flink
[ https://issues.apache.org/jira/browse/BEAM-4681?focusedWorklogId=163628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163628 ] ASF GitHub Bot logged work on BEAM-4681: Author: ASF GitHub Bot Created on: 07/Nov/18 19:33 Start Date: 07/Nov/18 19:33 Worklog Time Spent: 10m Work Description: mxm commented on a change in pull request #6981: [BEAM-4681] Add support for portable timers in Flink streaming mode URL: https://github.com/apache/beam/pull/6981#discussion_r231647452 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java ## @@ -359,16 +543,82 @@ public void finishBundle() { emitResults(); } catch (Exception e) { throw new RuntimeException("Failed to finish remote bundle", e); + } finally { +remoteBundle = null; + } + if (bundleFinishedCallback != null) { +bundleFinishedCallback.run(); +bundleFinishedCallback = null; } } +boolean isBundleInProgress() { + return remoteBundle != null; +} + +void setBundleFinishedCallback(Runnable callback) { + this.bundleFinishedCallback = callback; +} + private void emitResults() { KV result; while ((result = outputQueue.poll()) != null) { -outputManager.output(outputMap.get(result.getKey()), (WindowedValue) result.getValue()); +final String inputCollectionId = result.getKey(); +TupleTag tag = outputMap.get(inputCollectionId); +WindowedValue windowedValue = +Preconditions.checkNotNull( +(WindowedValue) result.getValue(), +"Received a null value from the SDK harness for %s", +inputCollectionId); +if (tag != null) { + // process regular elements + outputManager.output(tag, windowedValue); +} else { + // process timer elements + // TODO This is ugly. There should be an easier way to retrieve the + String timerPCollectionId = + inputCollectionId.substring(0, inputCollectionId.length() - ".out:0".length()); Review comment: @lukecwik I couldn't figure out how to retrieve the correct name for the collection of the timer input. 
I believe this needs to be fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163628) Time Spent: 20m (was: 10m) > Integrate support for timers using the portability APIs into Flink > -- > > Key: BEAM-4681 > URL: https://issues.apache.org/jira/browse/BEAM-4681 > Project: Beam > Issue Type: Sub-task > Components: runner-flink >Reporter: Luke Cwik >Assignee: Maximilian Michels >Priority: Major > Labels: portability, portability-flink > Time Spent: 20m > Remaining Estimate: 0h > > Consider using the code produced in BEAM-4658 to support timers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
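The review comment above discusses recovering the timer PCollection id by stripping the runner-appended `.out:0` suffix from the output collection id. That workaround can be isolated as a small helper; this is a sketch only, and the class, method, and constant names are illustrative, not part of the Beam API:

```java
/**
 * Sketch of the suffix-stripping workaround from the review thread above:
 * recover the timer PCollection id from an output collection id of the
 * form "<timerPCollectionId>.out:0". Names are illustrative, not Beam API.
 */
public final class TimerCollectionIdSketch {
  private static final String TIMER_OUTPUT_SUFFIX = ".out:0";

  static String timerPCollectionId(String outputCollectionId) {
    if (!outputCollectionId.endsWith(TIMER_OUTPUT_SUFFIX)) {
      // Guard against being handed a regular (non-timer) collection id.
      throw new IllegalArgumentException("Not a timer output id: " + outputCollectionId);
    }
    // Drop the trailing ".out:0" to get back the timer PCollection id.
    return outputCollectionId.substring(
        0, outputCollectionId.length() - TIMER_OUTPUT_SUFFIX.length());
  }
}
```

As the comment notes, this string surgery is fragile; a cleaner approach would carry the timer PCollection id explicitly instead of reconstructing it from the output id.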
[jira] [Work logged] (BEAM-6008) Propagate errors through portable portable runner
[ https://issues.apache.org/jira/browse/BEAM-6008?focusedWorklogId=163629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163629 ] ASF GitHub Bot logged work on BEAM-6008: Author: ASF GitHub Bot Created on: 07/Nov/18 19:33 Start Date: 07/Nov/18 19:33 Worklog Time Spent: 10m Work Description: angoenka commented on issue #6973: [BEAM-6008] Propagate errors through portable portable runner. URL: https://github.com/apache/beam/pull/6973#issuecomment-436749667 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163629) Time Spent: 40m (was: 0.5h) > Propagate errors through portable portable runner > - > > Key: BEAM-6008 > URL: https://issues.apache.org/jira/browse/BEAM-6008 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163625 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 19:24 Start Date: 07/Nov/18 19:24 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#issuecomment-436746737 All comments addressed. PTAL. @lgajowy regarding `./gradlew clean assemble`, you are correct. We are able to reproduce it. The root cause: when the clean task evaluates the target, things are slightly different. Made an update. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163625) Time Spent: 5h 20m (was: 5h 10m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4681) Integrate support for timers using the portability APIs into Flink
[ https://issues.apache.org/jira/browse/BEAM-4681?focusedWorklogId=163627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163627 ] ASF GitHub Bot logged work on BEAM-4681: Author: ASF GitHub Bot Created on: 07/Nov/18 19:30 Start Date: 07/Nov/18 19:30 Worklog Time Spent: 10m Work Description: mxm opened a new pull request #6981: [BEAM-4681] Add support for portable timers in Flink streaming mode URL: https://github.com/apache/beam/pull/6981 This adds support for portable timers to the Flink Runner in streaming mode. Batch support will be added in a follow-up. The `UsesTimersInParDo` tests of Java ValidatesPortableRunner have been run on it and they are passing. We can't enable them yet because they currently run only for batch. See https://issues.apache.org/jira/browse/BEAM-6009. To enable the Python ValidatesPortableRunner tests https://issues.apache.org/jira/browse/BEAM-5999 needs to be addressed. CC @tweise @angoenka @robertwb Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163627) Time Spent: 10m Remaining Estimate: 0h > Integrate support for timers using the portability APIs into Flink > -- > > Key: BEAM-4681 > URL: https://issues.apache.org/jira/browse/BEAM-4681 > Project: Beam >
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163602 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 19:08 Start Date: 07/Nov/18 19:08 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6916: [BEAM-5931] Update nexmark performance test with dataflow worker jar URL: https://github.com/apache/beam/pull/6916#issuecomment-436741374 @lgajowy Looking at the error more closely. The error happens within the operation ```com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation``` rather than from inside the worker jar. Maybe we are hitting the testing account's cloud usage limits, or starting too many performance tests in parallel? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163602) Time Spent: 5h 10m (was: 5h) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163596 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 19:00 Start Date: 07/Nov/18 19:00 Worklog Time Spent: 10m Work Description: jasonkuster closed pull request #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java index 9cc400b4639..28f6fe8dcdc 100644 --- a/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java +++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java @@ -17,7 +17,8 @@ */ package org.apache.beam.sdk.coders; -import static org.hamcrest.CoreMatchers.instanceOf; +import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder; +import static org.junit.Assert.assertThat; import java.io.IOException; import java.io.InputStream; @@ -25,26 +26,37 @@ import java.io.ObjectOutputStream; import java.io.OutputStream; import java.lang.reflect.InvocationTargetException; +import java.util.ArrayList; import java.util.Arrays; import java.util.List; import org.apache.beam.sdk.Pipeline; -import org.apache.beam.sdk.options.PipelineOptions; -import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.apache.beam.sdk.testing.NeedsRunner; import org.apache.beam.sdk.testing.PAssert; import org.apache.beam.sdk.testing.TestPipeline; import 
org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.DoFn.ProcessContext; +import org.apache.beam.sdk.transforms.DoFn.ProcessElement; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.Reshuffle; +import org.apache.beam.sdk.transforms.SerializableFunction; import org.apache.beam.sdk.values.PCollection; +import org.hamcrest.Description; +import org.hamcrest.TypeSafeMatcher; +import org.junit.Ignore; import org.junit.Rule; import org.junit.Test; import org.junit.experimental.categories.Category; import org.junit.rules.ExpectedException; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** Tests for coder exception handling in runners. */ @RunWith(JUnit4.class) public class PCollectionCustomCoderTest { + private static final Logger LOG = LoggerFactory.getLogger(PCollectionCustomCoderTest.class); /** * A custom test coder that can throw various exceptions during: * @@ -58,7 +70,8 @@ static final String NULL_POINTER_EXCEPTION = "java.lang.NullPointerException"; static final String EXCEPTION_MESSAGE = "Super Unique Message!!!"; - @Rule public ExpectedException thrown = ExpectedException.none(); + @Rule public final transient ExpectedException thrown = ExpectedException.none(); + @Rule public final transient TestPipeline pipeline = TestPipeline.create(); /** Wrapper of StringUtf8Coder with customizable exception-throwing. 
*/ public static class CustomTestCoder extends CustomCoder { @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) throws IOException { @Test @Category(NeedsRunner.class) public void testDecodingIOException() throws Exception { -thrown.expect(Exception.class); -thrown.expectCause(instanceOf(IOException.class)); Pipeline p = -runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testDecodingNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); - Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!")); + +p.run().waitUntilFinish(); }
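The diff above swaps `expectCause(instanceOf(...))` for an `ExceptionMatcher` that checks whether the expected message actually surfaces. The core of such a check can be sketched without Hamcrest as a walk of the cause chain; this is an illustration of the idea, not the matcher used in the actual test:

```java
/**
 * Sketch of the message check behind an ExceptionMatcher like the one in
 * the diff above: walk the cause chain and look for the expected fragment
 * in each throwable's "ClassName: message" form. Illustrative only.
 */
public final class CauseChainSketch {
  static boolean causeChainContains(Throwable t, String fragment) {
    int depth = 0;
    for (Throwable cur = t; cur != null && depth < 100; cur = cur.getCause(), depth++) {
      // Throwable.toString() yields "java.io.IOException: Super Unique Message!!!"
      if (String.valueOf(cur).contains(fragment)) {
        return true;
      }
    }
    return false;
  }
}
```

Matching on the rendered message rather than the cause class is what lets the test pass regardless of how many wrapper exceptions a particular runner adds around the coder failure.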
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163595 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 19:00 Start Date: 07/Nov/18 19:00 Worklog Time Spent: 10m Work Description: jasonkuster commented on issue #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#issuecomment-436738928 Presubmits came back green. Merging. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163595) Time Spent: 1h 10m (was: 1h) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >Assignee: Jason Kuster >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The test assertions trigger before the coder is used on the actual runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163584 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 18:45 Start Date: 07/Nov/18 18:45 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#issuecomment-436734035 Run Java JdbcIO Performance Test This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163584) Time Spent: 4h 40m (was: 4.5h) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163586=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163586 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 18:49 Start Date: 07/Nov/18 18:49 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6916: [BEAM-5931] Update nexmark performance test with dataflow worker jar URL: https://github.com/apache/beam/pull/6916#issuecomment-436735247 > Are recent Dataflow + Nexmark failures related to this change? See the logs: https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Dataflow/943/console > > And the error in particular: > > ``` > java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone > 10:01:00 { > 10:01:00 "code" : 429, > 10:01:00 "errors" : [ { > 10:01:00 "domain" : "usageLimits", > 10:01:00 "message" : "The total number of changes to the object temp-storage-for-perf-tests/nexmark/staging/beam-runners-google-cloud-dataflow-java-legacy-worker-2.9.0-SNAPSHOT-VzyH4c3KuvZFQMINha03bw.jar exceeds the rate limit. Please reduce the rate of create, update, and delete requests.", > 10:01:00 "reason" : "rateLimitExceeded" > 10:01:00 } ], > 10:01:00 "message" : "The total number of changes to the object temp-storage-for-perf-tests/nexmark/staging/beam-runners-google-cloud-dataflow-java-legacy-worker-2.9.0-SNAPSHOT-VzyH4c3KuvZFQMINha03bw.jar exceeds the rate limit. Please reduce the rate of create, update, and delete requests." 
> 10:01:00 } > 10:01:00 at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432) > 10:01:00 at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287) > 10:01:00 at org.apache.beam.runners.dataflow.util.PackageUtil.$closeResource(PackageUtil.java:260) > 10:01:00 at org.apache.beam.runners.dataflow.util.PackageUtil.tryStagePackage(PackageUtil.java:260) > 10:01:00 at org.apache.beam.runners.dataflow.util.PackageUtil.tryStagePackageWithRetry(PackageUtil.java:203) > 10:01:00 at org.apache.beam.runners.dataflow.util.PackageUtil.stagePackageSynchronously(PackageUtil.java:187) > 10:01:00 at org.apache.beam.runners.dataflow.util.PackageUtil.lambda$stagePackage$1(PackageUtil.java:171) > 10:01:00 at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:104) > 10:01:00 at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > 10:01:00 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 10:01:00 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 10:01:00 at java.lang.Thread.run(Thread.java:748) > 10:01:00 Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone > ``` Hmmm... But since the error is about worker jar, could be relevant. Looking. + @boyuanzz as well This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163586) Time Spent: 5h (was: 4h 50m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
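The `rateLimitExceeded` response quoted above is GCS asking the client to slow down object mutations; the standard remedy is retry with exponential backoff, which the `tryStagePackageWithRetry` frame in the stack trace suggests the staging path already attempts. A minimal sketch of that retry pattern (a generic helper, not Beam's actual `PackageUtil` code):

```java
import java.util.Random;
import java.util.concurrent.Callable;

/**
 * Generic exponential-backoff retry sketch for rate-limited calls such as
 * the GCS 429 above. Illustrative only; not Beam's PackageUtil retry logic.
 */
public final class BackoffSketch {
  static <T> T retryWithBackoff(Callable<T> call, int maxAttempts, long baseMillis)
      throws Exception {
    Random jitter = new Random();
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts) {
          throw e; // out of attempts; surface the last failure
        }
        // Double the wait each attempt and add jitter so parallel test
        // jobs staging the same jar do not retry in lockstep.
        Thread.sleep(
            baseMillis * (1L << (attempt - 1))
                + jitter.nextInt((int) Math.max(1, baseMillis)));
      }
    }
  }
}
```

Backoff alone does not help if many Jenkins jobs keep re-uploading the same worker jar, which is why reducing the number of parallel stagings (as discussed above) is the other half of the fix.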
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163585 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 18:45 Start Date: 07/Nov/18 18:45 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#issuecomment-436734054 Run Java MongoDBIO Performance Test This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163585) Time Spent: 4h 50m (was: 4h 40m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163583=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163583 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 18:44 Start Date: 07/Nov/18 18:44 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#issuecomment-436733681 Run Java TextIO Performance Test This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163583) Time Spent: 4.5h (was: 4h 20m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163575 ] ASF GitHub Bot logged work on BEAM-5939: Author: ASF GitHub Bot Created on: 07/Nov/18 18:34 Start Date: 07/Nov/18 18:34 Worklog Time Spent: 10m Work Description: aaltay commented on issue #6976: [BEAM-5939] Dedupe runner constants URL: https://github.com/apache/beam/pull/6976#issuecomment-436730199 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163575) Time Spent: 50m (was: 40m) > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starer > Time Spent: 50m > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > has same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)
[ https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Jayalath reassigned BEAM-6002: Assignee: Lukasz Gajowy (was: Chamikara Jayalath) > Nexmark tests timing out on all runners (crash loop due to metrics?) > - > > Key: BEAM-6002 > URL: https://issues.apache.org/jira/browse/BEAM-6002 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kenneth Knowles >Assignee: Lukasz Gajowy >Priority: Critical > Time Spent: 2h 40m > Remaining Estimate: 0h > > https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/ > {code} > 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0 > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.elements, from namespace Query0.Events > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.bytes, from namespace Query0.Events > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.startTime for namespace Query0.Events > 08:58:26 2018-11-06T16:58:26.035Z no activity > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.endTime for namespace Query0.Events > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.elements, from namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.bytes, from namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.startTime for namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution 
metric result.endTime for namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.startTimestamp for namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.endTimestamp for namespace Query0.Results > 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0 > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.elements, from namespace Query0.Events > 08:58:41 2018-11-06T16:58:41.036Z no activity > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.bytes, from namespace Query0.Events > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.startTime for namespace Query0.Events > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.endTime for namespace Query0.Events > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.elements, from namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.bytes, from namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.startTime for namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.endTime for namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.startTimestamp for namespace Query0.Results > 08:58:41 
18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.endTimestamp for namespace Query0.Results > 08:58:56 2018-11-06T16:58:56.036Z RUNNING Query0 > 08:58:56 18/11/06 16:58:56 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.elements, from namespace Query0.Events > 08:58:56 2018-11-06T16:58:56.036Z no activity > 08:58:56 18/11/06 16:58:56 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.bytes, from namespace Query0.Events > 08:58:56 18/11/06 16:58:56 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.startTime
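The repeated MetricsReader errors in the log above come from querying metrics the runner never reported. A rough sketch of that failure mode, with names and a sentinel value of -1 assumed for illustration rather than taken from Beam's implementation:

```python
import logging

def read_counter(metrics, namespace, name, default=-1):
    # metrics maps (namespace, name) -> value; an unreported metric is
    # logged in the same style as the Nexmark output and a sentinel
    # returned so the caller can keep polling.
    key = (namespace, name)
    if key not in metrics:
        logging.error("Failed to get metric %s, from namespace %s", name, namespace)
        return default
    return metrics[key]
```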
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163582&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163582 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 18:38 Start Date: 07/Nov/18 18:38 Worklog Time Spent: 10m Work Description: HuangLED removed a comment on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#issuecomment-436725979 Comments above addressed. @lgajowy You are correct. Tried out the 'clean assemble' command; now I get the same exception. I don't understand yet how this target runs perfectly but 'clean assemble' does not. Looking into it now (I am still kinda new to Gradle), but if you know what might be causing it, please let me know. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163582) Time Spent: 4h 20m (was: 4h 10m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD.
[jira] [Work stopped] (BEAM-5965) RPAD
[ https://issues.apache.org/jira/browse/BEAM-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-5965 stopped by Rui Wang. -- > RPAD > > > Key: BEAM-5965 > URL: https://issues.apache.org/jira/browse/BEAM-5965 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > > RPAD(original_value, return_length[, pattern]) > Returns a value that consists of original_value appended with pattern. The > return_length is an INT64 that specifies the length of the returned value. If > original_value is BYTES, return_length is the number of bytes. If > original_value is STRING, return_length is the number of characters. > The default value of pattern is a blank space. > Both original_value and pattern must be the same data type. > If return_length is less than or equal to the original_value length, this > function returns the original_value value, truncated to the value of > return_length. For example, RPAD("hello world", 7); returns "hello w". > If original_value, return_length, or pattern is NULL, this function returns > NULL. 
> This function returns an error if:
> - return_length is negative
> - pattern is empty
>
> Return type: STRING or BYTES
>
> Examples:
> SELECT t, len, FORMAT("%T", RPAD(t, len)) AS RPAD FROM UNNEST([
>   STRUCT('abc' AS t, 5 AS len),
>   ('abc', 2),
>   ('例子', 4)
> ]);
> t     len  RPAD
> abc   5    "abc  "
> abc   2    "ab"
> 例子  4    "例子  "
>
> SELECT t, len, pattern, FORMAT("%T", RPAD(t, len, pattern)) AS RPAD FROM UNNEST([
>   STRUCT('abc' AS t, 8 AS len, 'def' AS pattern),
>   ('abc', 5, '-'),
>   ('例子', 5, '中文')
> ]);
> t     len  pattern  RPAD
> abc   8    def      "abcdefde"
> abc   5    -        "abc--"
> 例子  5    中文     "例子中文中"
>
> SELECT FORMAT("%T", t) AS t, len, FORMAT("%T", RPAD(t, len)) AS RPAD FROM UNNEST([
>   STRUCT(b'abc' AS t, 5 AS len),
>   (b'abc', 2),
>   (b'\xab\xcd\xef', 4)
> ]);
> t                len  RPAD
> b"abc"           5    b"abc  "
> b"abc"           2    b"ab"
> b"\xab\xcd\xef"  4    b"\xab\xcd\xef "
>
> SELECT
>   FORMAT("%T", t) AS t,
>   len,
>   FORMAT("%T", pattern) AS pattern,
>   FORMAT("%T", RPAD(t, len, pattern)) AS RPAD
> FROM UNNEST([
>   STRUCT(b'abc' AS t, 8 AS len, b'def' AS pattern),
>   (b'abc', 5, b'-'),
>   (b'\xab\xcd\xef', 5, b'\x00')
> ]);
> t                len  pattern  RPAD
> b"abc"           8    b"def"   b"abcdefde"
> b"abc"           5    b"-"     b"abc--"
> b"\xab\xcd\xef"  5    b"\x00"  b"\xab\xcd\xef\x00\x00"
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
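The RPAD semantics quoted above (blank-space default pattern, truncation when return_length is short, NULL propagation, and errors on a negative length or empty pattern) can be sketched for STRING values as:

```python
def rpad(original_value, return_length, pattern=" "):
    # NULL in, NULL out, per the quoted spec.
    if original_value is None or return_length is None or pattern is None:
        return None
    if return_length < 0:
        raise ValueError("return_length is negative")
    if len(pattern) == 0:
        raise ValueError("pattern is empty")
    # A short target length truncates rather than pads.
    if return_length <= len(original_value):
        return original_value[:return_length]
    # Repeat the pattern and cut it to exactly the missing length.
    needed = return_length - len(original_value)
    filler = (pattern * (needed // len(pattern) + 1))[:needed]
    return original_value + filler
```

The same logic applies to BYTES inputs with a byte count in place of a character count.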
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163574 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 18:33 Start Date: 07/Nov/18 18:33 Worklog Time Spent: 10m Work Description: jasonkuster commented on issue #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#issuecomment-436729915 Overall answer to the questions here: There is no contract that specified what kind of exception will be bubbled back up to the client when there are exceptions on the worker. In the case of the DirectRunner, the exceptions are bubbled directly up, but in the case of runners which run remote they may return just Exception, they may return RuntimeException, or they may attempt to pipe the actual exception cause up. This test is not intended to enforce a contract for how the runners should handle this behavior, and the test prior to this would only have passed on the DirectRunner. This way it can be run against any runner. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163574) Time Spent: 1h (was: 50m) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >Assignee: Jason Kuster >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > The test assertions trigger before the coder is used on the actual runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
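The point made above — that each runner may wrap the worker's exception in its own type — suggests asserting on the cause chain rather than on an exact exception class. A sketch of that idea (the wrapping below is simulated, not any runner's actual behavior):

```python
def chain_contains(exc, needle):
    # Walk __cause__/__context__ links, mirroring how a remote runner may
    # bury the worker's original failure under layers of RuntimeException.
    while exc is not None:
        if needle in str(exc):
            return True
        exc = exc.__cause__ or exc.__context__
    return False

# Simulate a runner that re-raises the worker failure under its own type.
try:
    try:
        raise ValueError("coder threw on decode")
    except ValueError as original:
        raise RuntimeError("Pipeline execution failed") from original
except RuntimeError as wrapped:
    caught = wrapped
```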
[jira] [Work logged] (BEAM-5979) Support DATE and TIME in DML
[ https://issues.apache.org/jira/browse/BEAM-5979?focusedWorklogId=163572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163572 ] ASF GitHub Bot logged work on BEAM-5979: Author: ASF GitHub Bot Created on: 07/Nov/18 18:31 Start Date: 07/Nov/18 18:31 Worklog Time Spent: 10m Work Description: akedin closed pull request #6967: [BEAM-5979] Fix DATE and TIME in INSERTION URL: https://github.com/apache/beam/pull/6967 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java index 0b6a1aee6ee..325208278ab 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java @@ -106,7 +106,8 @@ public static Object autoCastField(Schema.Field field, Object rawObj) { return rawObj; } } else if (type.isDateType()) { - return DateTime.parse(rawObj.toString()); + // Internal representation of DateType in Calcite is convertible to Joda's Datetime. 
+ return new DateTime(rawObj); } else if (type.isNumericType() && ((rawObj instanceof String) || (rawObj instanceof BigDecimal && type != TypeName.DECIMAL))) { diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java index fffe52ac3f1..ea06fc5b010 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java @@ -103,7 +103,8 @@ public void addRows(String tableName, Row... rows) { return tables().get(tableName).rows; } - private static class TableWithRows implements Serializable { + /** TableWitRows. */ + public static class TableWithRows implements Serializable { private Table table; private List rows; private long tableProviderInstanceId; @@ -113,6 +114,10 @@ public TableWithRows(long tableProviderInstanceId, Table table) { this.table = table; this.rows = new CopyOnWriteArrayList<>(); } + +public List getRows() { + return rows; +} } private static class InMemoryTable implements BeamSqlTable { diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java index c7ece0de280..703e9a08bff 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlCliTest.java @@ -30,10 +30,14 @@ import java.util.stream.Stream; import org.apache.beam.sdk.extensions.sql.impl.ParseException; import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider; import 
org.apache.beam.sdk.extensions.sql.meta.provider.text.TextTableProvider; import org.apache.beam.sdk.extensions.sql.meta.store.InMemoryMetaStore; import org.apache.beam.sdk.schemas.Schema; import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.values.Row; +import org.joda.time.DateTime; +import org.joda.time.DateTimeZone; import org.junit.Test; /** UnitTest for {@link BeamSqlCli}. */ @@ -235,4 +239,41 @@ public void testExplainQuery() throws Exception { "BeamCalcRel(expr#0..2=[{inputs}], proj#0..2=[{exprs}])\n" + " BeamIOSourceRel(table=[[beam, person]])\n")); } + + @Test + public void test_time_types() throws Exception { +InMemoryMetaStore metaStore = new InMemoryMetaStore(); +TestTableProvider testTableProvider = new TestTableProvider(); +metaStore.registerProvider(testTableProvider); + +BeamSqlCli cli = new BeamSqlCli().metaStore(metaStore); +cli.execute( +"CREATE EXTERNAL TABLE test_table (\n" ++ "f_date DATE, \n" ++ "f_time TIME, \n" ++ "f_ts TIMESTAMP" ++ ") \n" ++ "TYPE 'test'"); + +cli.execute( +"INSERT INTO test_table VALUES (" ++ "DATE '2018-11-01', " ++ "TIME
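The one-line fix quoted in the diff above replaces `DateTime.parse(rawObj.toString())` with `new DateTime(rawObj)` because Calcite hands back a numeric epoch offset rather than an ISO-8601 string. A sketch of the distinction (the epoch value below is an invented example):

```python
from datetime import datetime, timezone

raw_obj = 1541030400000  # assumed example: epoch millis for 2018-11-01T00:00:00Z

# Parsing the stringified number as a date fails...
parse_failed = False
try:
    datetime.strptime(str(raw_obj), "%Y-%m-%dT%H:%M:%S")
except ValueError:
    parse_failed = True

# ...while constructing directly from the numeric value works, which is
# what Joda's DateTime(Object) constructor does in the actual fix.
converted = datetime.fromtimestamp(raw_obj / 1000, tz=timezone.utc)
```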
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163571 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 18:30 Start Date: 07/Nov/18 18:30 Worklog Time Spent: 10m Work Description: jasonkuster commented on issue #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#issuecomment-436729213 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163571) Time Spent: 50m (was: 40m) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >Assignee: Jason Kuster >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The test assertions trigger before the coder is used on the actual runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-6006) Test TIME +/- Interval
[ https://issues.apache.org/jira/browse/BEAM-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang closed BEAM-6006. -- Resolution: Fixed Fix Version/s: Not applicable > Test TIME +/- Interval > -- > > Key: BEAM-6006 > URL: https://issues.apache.org/jira/browse/BEAM-6006 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6010) Deprecate KafkaIO withTimestampFn()
[ https://issues.apache.org/jira/browse/BEAM-6010?focusedWorklogId=163561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163561 ] ASF GitHub Bot logged work on BEAM-6010: Author: ASF GitHub Bot Created on: 07/Nov/18 18:18 Start Date: 07/Nov/18 18:18 Worklog Time Spent: 10m Work Description: rangadi commented on issue #6964: [BEAM-6010] Deprecate KafkaIO withTimestampFn(). URL: https://github.com/apache/beam/pull/6964#issuecomment-436724587 Thanks for merging this. We can remove the deprecated API in KafkaIO. I was wondering if we should do that now or for 3.0. If you agree, I can send a PR right away to remove them. They have been deprecated for quite a while. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163561) Time Spent: 20m (was: 10m) > Deprecate KafkaIO withTimestampFn() > --- > > Key: BEAM-6010 > URL: https://issues.apache.org/jira/browse/BEAM-6010 > Project: Beam > Issue Type: Bug > Components: io-java-kafka >Affects Versions: 2.8.0 >Reporter: Alexey Romanenko >Assignee: Raghu Angadi >Priority: Major > Fix For: 2.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163563 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 18:21 Start Date: 07/Nov/18 18:21 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#issuecomment-436725979 Comments above addressed. @lgajowy You are correct. Tried out the 'clean assemble' command; now I get the same exception. I don't understand yet how this target runs perfectly but 'clean assemble' does not. Looking into it now (I am still kinda new to Gradle), but if you know what might be causing it, please let me know. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163563) Time Spent: 4h 10m (was: 4h) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD.
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163562 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 18:19 Start Date: 07/Nov/18 18:19 Worklog Time Spent: 10m Work Description: lgajowy commented on a change in pull request #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#discussion_r231621419 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -398,6 +399,7 @@ class BeamModulePlugin implements Plugin { google_api_services_storage : "com.google.apis:google-api-services-storage:v1-rev136-$google_clients_version", google_auth_library_credentials : "com.google.auth:google-auth-library-credentials:$google_auth_version", google_auth_library_oauth2_http : "com.google.auth:google-auth-library-oauth2-http:$google_auth_version", +google_cloud_bigquery : "com.google.cloud:google-cloud-bigquery:$google_cloud_bigquery_version", Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163562) Time Spent: 2h 50m (was: 2h 40m) > Remove pipeline for publishing nexmark results to bigQuery and publish using > BigQuery API only > -- > > Key: BEAM-5906 > URL: https://issues.apache.org/jira/browse/BEAM-5906 > Project: Beam > Issue Type: Improvement > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > There's no need to start a separate pipeline for uploading metrics results > from Nexmark suites to BigQuery. We can use an API designed for that and > place it in test-utils. 
Thanks to that: > - it won't start a separate pipeline every time it publishes results > - other suites will be able to use that code > - We will not face problems like special long to int conversion due to > problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163559&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163559 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 18:14 Start Date: 07/Nov/18 18:14 Worklog Time Spent: 10m Work Description: HuangLED commented on a change in pull request #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#discussion_r231619754
## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
## @@ -1139,7 +1140,25 @@ artifactId=${project.name}
 outputs.upToDateWhen { false }
 include "**/*IT.class"
-systemProperties.beamTestPipelineOptions = configuration.integrationTestPipelineOptions
+
+def pipelineOptionString = configuration.integrationTestPipelineOptions
+if(configuration.runner?.equalsIgnoreCase('dataflow')) {
+  project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+  def jsonSlurper = new JsonSlurper()
+  def allOptionsList = jsonSlurper.parseText(pipelineOptionString)
+  def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?:
+    project.project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath
Review comment: The only reason is to follow the convention in this file, for better readability. If there is a reason the other approach, using `depends on`, is preferred, please let me know and I shall update accordingly. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163559) Time Spent: 4h (was: 3h 50m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
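The Groovy snippet under review above parses the JSON list of integration-test pipeline options and appends the Dataflow worker-jar flags. A sketch of that transformation (function name and paths invented for illustration):

```python
import json

def add_worker_jar(options_json, runner, worker_jar):
    # Mirror of the Gradle logic: only Dataflow runs get the extra flags.
    options = json.loads(options_json)
    if runner.lower() == 'dataflow':
        options.append('--dataflowWorkerJar=%s' % worker_jar)
        options.append('--workerHarnessContainerImage=')
    return options

opts = add_worker_jar('["--project=test"]', 'Dataflow', '/tmp/worker.jar')
```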
[jira] [Work logged] (BEAM-5979) Support DATE and TIME in DML
[ https://issues.apache.org/jira/browse/BEAM-5979?focusedWorklogId=163560=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163560 ] ASF GitHub Bot logged work on BEAM-5979: Author: ASF GitHub Bot Created on: 07/Nov/18 18:14 Start Date: 07/Nov/18 18:14 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #6967: [BEAM-5979] Fix DATE and TIME in INSERTION URL: https://github.com/apache/beam/pull/6967#issuecomment-436723330 @akedin Can you help merge this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163560) Time Spent: 1h 50m (was: 1h 40m) > Support DATE and TIME in DML > > > Key: BEAM-5979 > URL: https://issues.apache.org/jira/browse/BEAM-5979 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Right now, BeamSQL uses Schema's DATETIME field to save all time related > data. However, BeamSQL doesn't implement correctly how TIME and DATE should > be converted to Joda's datetime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163540 ] ASF GitHub Bot logged work on BEAM-5939: Author: ASF GitHub Bot Created on: 07/Nov/18 18:01 Start Date: 07/Nov/18 18:01 Worklog Time Spent: 10m Work Description: brianmartin commented on a change in pull request #6976: [BEAM-5939] Dedupe runner constants URL: https://github.com/apache/beam/pull/6976#discussion_r231615270 ## File path: sdks/python/apache_beam/runners/dataflow/internal/names.py ## @@ -22,11 +22,9 @@ from __future__ import absolute_import -# TODO (altay): Move shared names to a common location. # Standard file names used for staging files. from builtins import object -PICKLED_MAIN_SESSION_FILE = 'pickled_main_session' DATAFLOW_SDK_TARBALL_FILE = 'dataflow_python_sdk.tar' STAGED_PIPELINE_FILENAME = "pipeline.pb" Review comment: I've moved `STAGED_PIPELINE_FILENAME ` to the common location. Though, it is only used in the Dataflow runner. Also, the `STAGED_PIPELINE_URL_METADATA_FIELD` constant was not used anywhere; I've removed it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163540) Time Spent: 40m (was: 0.5h) > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starer > Time Spent: 40m > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > has same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6010) Deprecate KafkaIO withTimestampFn()
[ https://issues.apache.org/jira/browse/BEAM-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678588#comment-16678588 ] Raghu Angadi commented on BEAM-6010: Thank you for filing this. > Deprecate KafkaIO withTimestampFn() > --- > > Key: BEAM-6010 > URL: https://issues.apache.org/jira/browse/BEAM-6010 > Project: Beam > Issue Type: Bug > Components: io-java-kafka >Affects Versions: 2.8.0 >Reporter: Alexey Romanenko >Assignee: Raghu Angadi >Priority: Major > Fix For: 2.9.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5798) Add support for dynamic destinations when writing to Kafka
[ https://issues.apache.org/jira/browse/BEAM-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678582#comment-16678582 ] Raghu Angadi commented on BEAM-5798: [~aromanenko], yeah, that's what I had in mind. But we should go with whichever we are more comfortable with. > Add support for dynamic destinations when writing to Kafka > -- > > Key: BEAM-5798 > URL: https://issues.apache.org/jira/browse/BEAM-5798 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka >Reporter: Luke Cwik >Assignee: Alexey Romanenko >Priority: Major > Labels: newbie, starter > Time Spent: 1h 50m > Remaining Estimate: 0h > > Add support for writing to Kafka based upon contents of the data. This is > similar to the dynamic destination approach for file IO and other sinks. > > Source of request: > https://lists.apache.org/thread.html/a89d1d32ecdb50c42271e805cc01a651ee3623b4df97c39baf4f2053@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4863) Implement consistentWithEquals/structuralValue on FullWindowedValueCoder
[ https://issues.apache.org/jira/browse/BEAM-4863?focusedWorklogId=163541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163541 ] ASF GitHub Bot logged work on BEAM-4863: Author: ASF GitHub Bot Created on: 07/Nov/18 18:01 Start Date: 07/Nov/18 18:01 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #6057: [BEAM-4863] Implement consistentWithEquals/structuralValue on FullWindowedValueCoder URL: https://github.com/apache/beam/pull/6057#issuecomment-436719050 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163541) Time Spent: 1.5h (was: 1h 20m) > Implement consistentWithEquals/structuralValue on FullWindowedValueCoder > > > Key: BEAM-4863 > URL: https://issues.apache.org/jira/browse/BEAM-4863 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Implementing *consistentWithEquals*/*structuralValue* boosts significantly > the performance of using these values in comparison operations since it > doesn't require encoding the values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
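The speed-up described above comes from letting structuralValue return the value itself when the coder is consistent with equals, instead of its encoded bytes. A rough sketch of the two paths (this toy coder is illustrative, not Beam's API):

```python
import pickle

class ToyCoder:
    """Illustrative coder whose structural value picks the cheap path when
    the coder is declared consistent with equals."""
    def __init__(self, consistent_with_equals):
        self._cwe = consistent_with_equals
    def encode(self, value):
        return pickle.dumps(value)
    def structural_value(self, value):
        # No encoding needed when value equality already matches encoded
        # equality; otherwise fall back to comparing encoded bytes.
        return value if self._cwe else self.encode(value)

fast = ToyCoder(True)
slow = ToyCoder(False)
```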
[jira] [Work logged] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163538 ] ASF GitHub Bot logged work on BEAM-5939: Author: ASF GitHub Bot Created on: 07/Nov/18 17:59 Start Date: 07/Nov/18 17:59 Worklog Time Spent: 10m Work Description: brianmartin commented on a change in pull request #6976: [BEAM-5939] Dedupe runner constants URL: https://github.com/apache/beam/pull/6976#discussion_r231614348 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -51,6 +51,7 @@ from apache_beam.runners.dataflow.internal import names from apache_beam.runners.dataflow.internal.clients import dataflow from apache_beam.runners.dataflow.internal.names import PropertyNames +from apache_beam.runners.internal.names import BEAM_PACKAGE_NAME, BEAM_SDK_NAME Review comment: Sounds good. I had to import as `shared_names` to avoid conflict with the import on line 51 though. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163538) Time Spent: 0.5h (was: 20m) > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starer > Time Spent: 0.5h > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > has same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6006) Test TIME +/- Interval
[ https://issues.apache.org/jira/browse/BEAM-6006?focusedWorklogId=163537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163537 ] ASF GitHub Bot logged work on BEAM-6006: Author: ASF GitHub Bot Created on: 07/Nov/18 17:53 Start Date: 07/Nov/18 17:53 Worklog Time Spent: 10m Work Description: akedin closed pull request #6972: [BEAM-6006] Test TIME +/- INTERVAL URL: https://github.com/apache/beam/pull/6972 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java index cc3c75a15e0..025265d43d8 100644 --- a/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java +++ b/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpression.java @@ -43,7 +43,9 @@ public boolean accept() { static boolean accept(List operands, SqlTypeName outputType) { return operands.size() == 2 -&& (SqlTypeName.TIMESTAMP.equals(outputType) || SqlTypeName.DATE.equals(outputType)) +&& (SqlTypeName.TIMESTAMP.equals(outputType) +|| SqlTypeName.DATE.equals(outputType) +|| SqlTypeName.TIME.equals(outputType)) && SqlTypeName.DATETIME_TYPES.contains(operands.get(0).getOutputType()) && TimeUnitUtils.INTERVALS_DURATIONS_TYPES.containsKey(operands.get(1).getOutputType()); } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java 
b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java index 35f265ab535..192f86da727 100644 --- a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java +++ b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslSqlStdOperatorsTest.java @@ -1197,7 +1197,10 @@ public void testDatetimeInfixPlus() { parseTimestamp("1986-01-19 01:02:03")) .addExpr("DATE '1984-04-19' + INTERVAL '2' DAY", parseDate("1984-04-21")) .addExpr("DATE '1984-04-19' + INTERVAL '1' MONTH", parseDate("1984-05-19")) -.addExpr("DATE '1984-04-19' + INTERVAL '3' YEAR", parseDate("1987-04-19")); +.addExpr("DATE '1984-04-19' + INTERVAL '3' YEAR", parseDate("1987-04-19")) +.addExpr("TIME '14:28:30' + INTERVAL '15' SECOND", parseTime("14:28:45")) +.addExpr("TIME '14:28:30.239' + INTERVAL '4' MINUTE", parseTime("14:32:30.239")) +.addExpr("TIME '14:28:30.2' + INTERVAL '4' HOUR", parseTime("18:28:30.2")); checker.buildRunAndCheck(); } @@ -1292,7 +1295,7 @@ public void testTimestampDiff() { @Test // More needed @SqlOperatorTest(name = "-", kind = "MINUS") - public void testTimestampMinusInterval() throws Exception { + public void testTimestampMinusInterval() { ExpressionChecker checker = new ExpressionChecker() .addExpr( @@ -1315,8 +1318,11 @@ public void testTimestampMinusInterval() throws Exception { parseTimestamp("1983-01-19 01:01:58")) .addExpr("DATE '1984-04-19' - INTERVAL '2' DAY", parseDate("1984-04-17")) .addExpr("DATE '1984-04-19' - INTERVAL '1' MONTH", parseDate("1984-03-19")) -.addExpr("DATE '1984-04-19' - INTERVAL '3' YEAR", parseDate("1981-04-19")); -; +.addExpr("DATE '1984-04-19' - INTERVAL '3' YEAR", parseDate("1981-04-19")) +.addExpr("TIME '14:28:30' - INTERVAL '15' SECOND", parseTime("14:28:15")) +.addExpr("TIME '14:28:30.239' - INTERVAL '4' MINUTE", parseTime("14:24:30.239")) +.addExpr("TIME '14:28:30.2' - INTERVAL '4' HOUR", parseTime("10:28:30.2")); 
+ checker.buildRunAndCheck(); } diff --git a/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpressionTest.java b/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/interpreter/operator/date/BeamSqlDatetimeMinusIntervalExpressionTest.java index 870057af148..939a1cb7627 100644 ---
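The TIME-plus-INTERVAL semantics asserted by the new test cases above can be checked independently with Python's `datetime` module. This is only an illustration of the expected arithmetic, not Beam SQL code; `time_plus` is an invented helper.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

def time_plus(time_str, **delta):
    """Apply an interval to a wall-clock TIME value, mirroring
    e.g. TIME '14:28:30' + INTERVAL '15' SECOND."""
    t = datetime.strptime(time_str, FMT)
    return (t + timedelta(**delta)).strftime(FMT)

# Mirrors cases added in testDatetimeInfixPlus / testTimestampMinusInterval:
print(time_plus("14:28:30", seconds=15))   # 14:28:45
print(time_plus("14:28:30", minutes=4))    # 14:32:30
print(time_plus("14:28:30", seconds=-15))  # 14:28:15
```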
[jira] [Work logged] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163531 ] ASF GitHub Bot logged work on BEAM-5939: Author: ASF GitHub Bot Created on: 07/Nov/18 17:46 Start Date: 07/Nov/18 17:46 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #6976: [BEAM-5939] Dedupe runner constants URL: https://github.com/apache/beam/pull/6976#discussion_r231608454 ## File path: sdks/python/apache_beam/runners/dataflow/internal/names.py ## @@ -22,11 +22,9 @@ from __future__ import absolute_import -# TODO (altay): Move shared names to a common location. # Standard file names used for staging files. from builtins import object -PICKLED_MAIN_SESSION_FILE = 'pickled_main_session' DATAFLOW_SDK_TARBALL_FILE = 'dataflow_python_sdk.tar' STAGED_PIPELINE_FILENAME = "pipeline.pb" Review comment: STAGED_... constants could also move to a common place. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163531) Time Spent: 20m (was: 10m) > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starer > Time Spent: 20m > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > has same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163532 ] ASF GitHub Bot logged work on BEAM-5939: Author: ASF GitHub Bot Created on: 07/Nov/18 17:46 Start Date: 07/Nov/18 17:46 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #6976: [BEAM-5939] Dedupe runner constants URL: https://github.com/apache/beam/pull/6976#discussion_r231608251 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -51,6 +51,7 @@ from apache_beam.runners.dataflow.internal import names from apache_beam.runners.dataflow.internal.clients import dataflow from apache_beam.runners.dataflow.internal.names import PropertyNames +from apache_beam.runners.internal.names import BEAM_PACKAGE_NAME, BEAM_SDK_NAME Review comment: I believe it would be more clear, if you import names, and then use names.BEAM_PACKAGE_NAME etc. in the code. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163532) Time Spent: 20m (was: 10m) > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starer > Time Spent: 20m > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > has same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
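The qualified-import pattern suggested in this review can be sketched as follows. This is a stand-alone illustration, not Beam code: `shared_names` stands in for the proposed shared `apache_beam/runners/internal/names.py` module, and the constant values are invented.

```python
import types

# Hypothetical stand-in for the proposed shared names module.
shared_names = types.ModuleType("shared_names")
shared_names.BEAM_PACKAGE_NAME = "beam-sdk"
shared_names.BEAM_SDK_NAME = "Apache Beam SDK for Python"

# A local module also called `names` (like the import on line 51 of
# apiclient.py) no longer clashes, because access is module-qualified.
names = types.ModuleType("names")
names.BEAM_SDK_NAME = "an unrelated local constant"

def staged_sdk_label():
    # Qualified access makes the origin of each constant explicit.
    return "%s (%s)" % (shared_names.BEAM_SDK_NAME,
                        shared_names.BEAM_PACKAGE_NAME)
```

Importing the module rather than individual constants both avoids the name conflict mentioned earlier in the thread and keeps each use site self-documenting.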
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163511 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 17:25 Start Date: 07/Nov/18 17:25 Worklog Time Spent: 10m Work Description: supercclank commented on a change in pull request #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#discussion_r231599742 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java ## @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) throws IOException { @Test @Category(NeedsRunner.class) public void testDecodingIOException() throws Exception { -thrown.expect(Exception.class); -thrown.expectCause(instanceOf(IOException.class)); Pipeline p = -runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testDecodingNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); - Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!")); + +p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testEncodingIOException() throws Exception { +Pipeline p = +pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, EXCEPTION_MESSAGE)); thrown.expect(Exception.class); Review comment: Is this 
able to be changed to "IOException.class"? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163511) Time Spent: 40m (was: 0.5h) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >Assignee: Jason Kuster >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > The test assertions trigger before the coder is used on the actual runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
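The ordering change in this PR — construct the pipeline first, arm the failure expectation second, run last — guards against exactly the spurious pass the issue describes: if the expectation is armed before construction, a construction-time error satisfies it too. A language-neutral sketch of that discipline, with invented helper names:

```python
def run_expecting(build, run, expected_fragment):
    """Build first so construction-time errors fail the test outright;
    only the run step is allowed to raise the expected error."""
    p = build()  # deliberately unguarded: a bug here must not look like a pass
    try:
        run(p)
    except Exception as e:
        if expected_fragment in str(e):
            return True
        raise
    raise AssertionError("expected failure did not occur: " + expected_fragment)

# Invented stand-ins for a pipeline whose coder fails only at run time.
def build_ok():
    return "pipeline"

def run_with_decode_error(p):
    raise RuntimeError("java.io.IOException: Super Unique Message!!!")

print(run_expecting(build_ok, run_with_decode_error, "Super Unique Message"))
# True
```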
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163528 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 17:40 Start Date: 07/Nov/18 17:40 Worklog Time Spent: 10m Work Description: apilloud commented on issue #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#issuecomment-436711924 LGTM. Not sure what the status is on updating all of our dependencies, but it looks like it may be safe to use this client now (nothing blocks BEAM-4248 anymore). cc: @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163528) Time Spent: 2h 40m (was: 2.5h) > Remove pipeline for publishing nexmark results to bigQuery and publish using > BigQuery API only > -- > > Key: BEAM-5906 > URL: https://issues.apache.org/jira/browse/BEAM-5906 > Project: Beam > Issue Type: Improvement > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > There's no need to start a separate pipeline for uploading metrics results > from Nexmark suites to BigQuery. We can use an API designed for that and > place it in test-utils. Thanks to that: > - it won't start a separate pipeline every time it publishes results > - other suites will be able to use that code > - We will not face problems like special long to int conversion due to > problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678550#comment-16678550 ] Brian Martin commented on BEAM-5939: Hey [~altay], I've made a PR for this minor change. > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starter > Time Spent: 10m > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > have the same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Work logged] (BEAM-5939) Deduplicate constants
[ https://issues.apache.org/jira/browse/BEAM-5939?focusedWorklogId=163525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163525 ] ASF GitHub Bot logged work on BEAM-5939: Author: ASF GitHub Bot Created on: 07/Nov/18 17:36 Start Date: 07/Nov/18 17:36 Worklog Time Spent: 10m Work Description: brianmartin opened a new pull request #6976: [BEAM-5939] Dedupe runner constants URL: https://github.com/apache/beam/pull/6976 Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/) | --- | --- | --- | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/) | --- | --- | --- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163525) Time Spent: 10m Remaining Estimate: 0h > Deduplicate constants > - > > Key: BEAM-5939 > URL: https://issues.apache.org/jira/browse/BEAM-5939 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: starer > Time Spent: 10m > Remaining Estimate: 0h > > apache_beam/runners/dataflow/internal/names.py > apache_beam/runners/portability/stager.py > has same constants defined in both files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163524 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 17:35 Start Date: 07/Nov/18 17:35 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#discussion_r231603735 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1139,7 +1140,25 @@ artifactId=${project.name} outputs.upToDateWhen { false } include "**/*IT.class" -systemProperties.beamTestPipelineOptions = configuration.integrationTestPipelineOptions + +def pipelineOptionString = configuration.integrationTestPipelineOptions +if(configuration.runner?.equalsIgnoreCase('dataflow')) { + project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker") + + def jsonSlurper = new JsonSlurper() Review comment: +1 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163524) Time Spent: 3h 50m (was: 3h 40m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. 
> OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
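Option 1 above amounts to injecting the two worker-jar options into the JSON list of integration-test pipeline options, which is what the Gradle change under review does with `JsonSlurper`. The same transformation, sketched in Python for clarity (`add_worker_options` is an invented name; the flag names are taken from the issue):

```python
import json

def add_worker_options(pipeline_options_json, worker_jar):
    """Parse the integration-test pipeline options, append the Dataflow
    worker-jar options described in this issue, and re-serialize."""
    opts = json.loads(pipeline_options_json)
    opts.append("--dataflowWorkerJar=%s" % worker_jar)
    opts.append("--workerHarnessContainerImage=")
    return json.dumps(opts)

out = add_worker_options('["--runner=TestDataflowRunner"]', "worker.jar")
```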
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163522 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 17:35 Start Date: 07/Nov/18 17:35 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#discussion_r231603625 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1139,7 +1140,25 @@ artifactId=${project.name} outputs.upToDateWhen { false } include "**/*IT.class" -systemProperties.beamTestPipelineOptions = configuration.integrationTestPipelineOptions + +def pipelineOptionString = configuration.integrationTestPipelineOptions +if(configuration.runner?.equalsIgnoreCase('dataflow')) { + project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker") + + def jsonSlurper = new JsonSlurper() + def allOptionsList = jsonSlurper.parseText(pipelineOptionString) + def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?: + project.project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath Review comment: The dependency on the archive below via `shadow it.project(path: ":beam-runners-google-cloud-dataflow-java-legacy-worker", configuration: 'shadow')` will ensure it gets built. Is there a reason why you went with a configuration based dependency over using `dependsOn ":beam-runners-google-cloud-dataflow-java-legacy-worker:shadowJar"` task like the others that were done? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163522) Time Spent: 3.5h (was: 3h 20m) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. > #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163523 ] ASF GitHub Bot logged work on BEAM-5931: Author: ASF GitHub Bot Created on: 07/Nov/18 17:35 Start Date: 07/Nov/18 17:35 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets. URL: https://github.com/apache/beam/pull/6966#discussion_r231603738 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1139,7 +1140,25 @@ artifactId=${project.name} outputs.upToDateWhen { false } include "**/*IT.class" -systemProperties.beamTestPipelineOptions = configuration.integrationTestPipelineOptions + +def pipelineOptionString = configuration.integrationTestPipelineOptions Review comment: +1 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163523) Time Spent: 3h 40m (was: 3.5h) > Rollback PR/6899 > > > Key: BEAM-5931 > URL: https://issues.apache.org/jira/browse/BEAM-5931 > Project: Beam > Issue Type: Task > Components: beam-model, runner-dataflow >Reporter: Luke Cwik >Assignee: Ruoyun Huang >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > To rollback this change, one must either: > 1) Update nexmark / perf test framework to use Dataflow worker jar. > This requires adding the > {code} > "--dataflowWorkerJar=${dataflowWorkerJar}", > "--workerHarnessContainerImage=", > {code} > when running the tests. > OR > 2) Update the dataflow worker image with code that contains the rollback of > PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker > image. 
> #1 is preferable since we will no longer have tests running that don't use a > Dataflow worker jar built from Github HEAD. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163521 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 17:34 Start Date: 07/Nov/18 17:34 Worklog Time Spent: 10m Work Description: apilloud commented on a change in pull request #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#discussion_r231604879 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -398,6 +399,7 @@ class BeamModulePlugin implements Plugin { google_api_services_storage : "com.google.apis:google-api-services-storage:v1-rev136-$google_clients_version", google_auth_library_credentials : "com.google.auth:google-auth-library-credentials:$google_auth_version", google_auth_library_oauth2_http : "com.google.auth:google-auth-library-oauth2-http:$google_auth_version", +google_cloud_bigquery : "com.google.cloud:google-cloud-bigquery:$google_cloud_bigquery_version", Review comment: The version of this should probably be `google_clients_version`. Also, this particular client has caused dependency problems in the past. See #5286. The nexmark tests don't use other clients, so it might not be a problem. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163521) Time Spent: 2.5h (was: 2h 20m) > Remove pipeline for publishing nexmark results to bigQuery and publish using > BigQuery API only > -- > > Key: BEAM-5906 > URL: https://issues.apache.org/jira/browse/BEAM-5906 > Project: Beam > Issue Type: Improvement > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > There's no need to start a separate pipeline for uploading metrics results > from Nexmark suites to BigQuery. We can use an API designed for that and > place it in test-utils. Thanks to that: > - it won't start a separate pipeline every time it publishes results > - other suites will be able to use that code > - We will not face problems like special long to int conversion due to > problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
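The thin-API approach the issue describes — create the table if needed, then insert rows directly, with no pipeline — can be sketched as below. `FakeBigQueryClient` is an invented in-memory stand-in, not the real client added by the PR; the schema and row shapes mirror the Nexmark publishing code under review.

```python
class FakeBigQueryClient:
    """Invented in-memory stand-in for a thin BigQuery publishing client."""

    def __init__(self):
        self.tables = {}  # table name -> schema dict
        self.rows = {}    # table name -> list of inserted rows

    def create_table_if_not_exists(self, table, schema):
        self.tables.setdefault(table, schema)
        self.rows.setdefault(table, [])

    def insert_row(self, table, row):
        missing = set(self.tables[table]) - set(row)
        if missing:
            raise ValueError("row missing fields: %s" % sorted(missing))
        self.rows[table].append(row)

# Same field shape as the Nexmark perf-publishing schema in this PR.
SCHEMA = {"timestamp": "timestamp", "runtimeSec": "float",
          "eventsPerSec": "float", "numResults": "integer"}

client = FakeBigQueryClient()
client.create_table_if_not_exists("nexmark_query0", SCHEMA)
client.insert_row("nexmark_query0",
                  {"timestamp": 1541609075, "runtimeSec": 12.3,
                   "eventsPerSec": 8130.0, "numResults": 100000})
```

Because nothing here launches a pipeline, this avoids the per-publish job startup and the long-to-int conversion issues (e.g. BEAM-4734) mentioned in the issue.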
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163513 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 17:26 Start Date: 07/Nov/18 17:26 Worklog Time Spent: 10m Work Description: lgajowy commented on a change in pull request #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#discussion_r231601088 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -177,77 +160,36 @@ void runAll(String[] args) throws IOException { @VisibleForTesting static void savePerfsToBigQuery( + BigQueryClient bigQueryClient, NexmarkOptions options, Map perfs, - @Nullable BigQueryServices testBigQueryServices, Instant start) { -Pipeline pipeline = Pipeline.create(options); -PCollection> perfsPCollection = -pipeline.apply( -Create.of(perfs) -.withCoder( -KvCoder.of( -SerializableCoder.of(NexmarkConfiguration.class), -new CustomCoder() { - - @Override - public void encode(NexmarkPerf value, OutputStream outStream) - throws CoderException, IOException { -StringUtf8Coder.of().encode(value.toString(), outStream); - } - - @Override - public NexmarkPerf decode(InputStream inStream) - throws CoderException, IOException { -String perf = StringUtf8Coder.of().decode(inStream); -return NexmarkPerf.fromString(perf); - } -}))); - -TableSchema tableSchema = -new TableSchema() -.setFields( -ImmutableList.of( -new TableFieldSchema().setName("timestamp").setType("TIMESTAMP"), -new TableFieldSchema().setName("runtimeSec").setType("FLOAT"), -new TableFieldSchema().setName("eventsPerSec").setType("FLOAT"), -new TableFieldSchema().setName("numResults").setType("INTEGER"))); - -String queryName = "{query}"; -if (options.getQueryLanguage() != null) { - queryName = queryName + "_" + options.getQueryLanguage(); -} -final String tableSpec = 
NexmarkUtils.tableSpec(options, queryName, 0L, null); -SerializableFunction< -ValueInSingleWindow>, TableDestination> -tableFunction = -input -> -new TableDestination( -tableSpec.replace("{query}", input.getValue().getKey().query.getNumberOrName()), -"perfkit queries"); -SerializableFunction, TableRow> rowFunction = -input -> { - NexmarkPerf nexmarkPerf = input.getValue(); - TableRow row = - new TableRow() - .set("timestamp", start.getMillis() / 1000) - .set("runtimeSec", nexmarkPerf.runtimeSec) - .set("eventsPerSec", nexmarkPerf.eventsPerSec) - .set("numResults", nexmarkPerf.numResults); - return row; -}; -BigQueryIO.Write io = -BigQueryIO.>write() -.to(tableFunction) -.withSchema(tableSchema) - .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) - .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) -.withFormatFunction(rowFunction); -if (testBigQueryServices != null) { - io = io.withTestServices(testBigQueryServices); + +for (Map.Entry entry : perfs.entrySet()) { + String queryName = + NexmarkUtils.fullQueryName( + options.getQueryLanguage(), entry.getKey().query.getNumberOrName()); + String tableName = NexmarkUtils.tableName(options, queryName, 0L, null); + + ImmutableMap schema = + ImmutableMap.builder() + .put("timestamp", "timestamp") + .put("runtimeSec", "float") + .put("eventsPerSec", "float") + .put("numResults", "integer") + .build(); + bigQueryClient.createTableIfNotExists(tableName, schema); + + Map record = + ImmutableMap.builder() + .put("timestamp", start.getMillis() / 1000) + .put("runtimeSec", entry.getValue().runtimeSec) + .put("eventsPerSec", entry.getValue().eventsPerSec) + .put("numResults", entry.getValue().numResults) + .build(); + + bigQueryClient.insertRow(record,
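The replacement code above swaps the Beam pipeline for direct calls on a `bigQueryClient`. A minimal in-memory sketch of that client shape follows; the method names (`createTableIfNotExists`, `insertRow`, `getRows`) mirror the calls in the diff, but the class itself and its Map-backed storage are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the thin BigQuery client used above.
// createTableIfNotExists/insertRow/getRows mirror the calls in the diff;
// the backing Map is purely illustrative.
final class FakeBigQueryClient {
  private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();

  void createTableIfNotExists(String tableName, Map<String, String> schema) {
    // A real client would issue a tables.insert API call here.
    tables.computeIfAbsent(tableName, t -> new ArrayList<>());
  }

  void insertRow(Map<String, Object> row, String tableName) {
    // Direct API call: no Beam pipeline is constructed or launched.
    tables.get(tableName).add(row);
  }

  List<Map<String, Object>> getRows(String tableName) {
    return tables.getOrDefault(tableName, Collections.emptyList());
  }
}
```

With a fake like this, a test can call savePerfsToBigQuery and then assert on getRows directly, which is the approach PerfsToBigQueryTest takes after the change.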
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163510 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 17:25 Start Date: 07/Nov/18 17:25 Worklog Time Spent: 10m Work Description: supercclank commented on a change in pull request #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#discussion_r231599920 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java ## @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) throws IOException { @Test @Category(NeedsRunner.class) public void testDecodingIOException() throws Exception { -thrown.expect(Exception.class); -thrown.expectCause(instanceOf(IOException.class)); Pipeline p = -runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testDecodingNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); - Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); Review comment: Was the widening of RuntimeException.class to Exception.class intended? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163510) Time Spent: 40m (was: 0.5h) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >Assignee: Jason Kuster >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > The test assertions trigger before the coder is used on the actual runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163516 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 17:26 Start Date: 07/Nov/18 17:26 Worklog Time Spent: 10m Work Description: lgajowy commented on a change in pull request #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#discussion_r231601163 ## File path: sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java ## @@ -87,41 +71,47 @@ public void testSavePerfsToBigQuery() throws IOException, InterruptedException { nexmarkPerf2.eventsPerSec = 1.5F; nexmarkPerf2.runtimeSec = 1.325F; -// simulate 2 runs of the same query just to check that rows are apened correctly. +// simulate 2 runs of the same query just to check that rows are appended correctly. HashMap perfs = new HashMap<>(2); perfs.put(nexmarkConfiguration1, nexmarkPerf1); perfs.put(nexmarkConfiguration2, nexmarkPerf2); -// cast to int due to BEAM-4734. To avoid overflow on int capacity, -// set the instant to a fixed date (and not Instant.now()) -int startTimestampSeconds = 1454284800; -Main.savePerfsToBigQuery( -options, perfs, fakeBqServices, new Instant(startTimestampSeconds * 1000L)); - -String tableSpec = NexmarkUtils.tableSpec(options, QUERY.getNumberOrName(), 0L, null); -List actualRows = -fakeDatasetService.getAllRows( -options.getProject(), -options.getBigQueryDataset(), -BigQueryHelpers.parseTableSpec(tableSpec).getTableId()); -assertEquals("Wrong number of rows inserted", 2, actualRows.size()); -List expectedRows = new ArrayList<>(); -TableRow row1 = -new TableRow() -.set("timestamp", startTimestampSeconds) -.set("runtimeSec", nexmarkPerf1.runtimeSec) -.set("eventsPerSec", nexmarkPerf1.eventsPerSec) -// cast to int due to BEAM-4734. 
-.set("numResults", (int) nexmarkPerf1.numResults); -expectedRows.add(row1); -TableRow row2 = -new TableRow() -.set("timestamp", startTimestampSeconds) -.set("runtimeSec", nexmarkPerf2.runtimeSec) -.set("eventsPerSec", nexmarkPerf2.eventsPerSec) -// cast to int due to BEAM-4734. -.set("numResults", (int) nexmarkPerf2.numResults); -expectedRows.add(row2); -assertThat(actualRows, containsInAnyOrder(Iterables.toArray(expectedRows, TableRow.class))); +long startTimestampSeconds = 145428480L; Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163516) Time Spent: 2h 20m (was: 2h 10m) > Remove pipeline for publishing nexmark results to bigQuery and publish using > BigQuery API only > -- > > Key: BEAM-5906 > URL: https://issues.apache.org/jira/browse/BEAM-5906 > Project: Beam > Issue Type: Improvement > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > There's no need to start a separate pipeline for uploading metrics results > from Nexmark suites to BigQuery. We can use an API designed for that and > place it in test-utils. Thanks to that: > - it won't start a separate pipeline every time it publishes results > - other suites will be able to use that code > - We will not face problems like special long to int conversion due to > problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163515 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 17:26 Start Date: 07/Nov/18 17:26 Worklog Time Spent: 10m Work Description: lgajowy commented on a change in pull request #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#discussion_r231601112 ## File path: sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java ## @@ -87,41 +71,47 @@ public void testSavePerfsToBigQuery() throws IOException, InterruptedException { nexmarkPerf2.eventsPerSec = 1.5F; nexmarkPerf2.runtimeSec = 1.325F; -// simulate 2 runs of the same query just to check that rows are apened correctly. +// simulate 2 runs of the same query just to check that rows are appended correctly. HashMap perfs = new HashMap<>(2); perfs.put(nexmarkConfiguration1, nexmarkPerf1); perfs.put(nexmarkConfiguration2, nexmarkPerf2); -// cast to int due to BEAM-4734. To avoid overflow on int capacity, -// set the instant to a fixed date (and not Instant.now()) -int startTimestampSeconds = 1454284800; -Main.savePerfsToBigQuery( -options, perfs, fakeBqServices, new Instant(startTimestampSeconds * 1000L)); - -String tableSpec = NexmarkUtils.tableSpec(options, QUERY.getNumberOrName(), 0L, null); -List actualRows = -fakeDatasetService.getAllRows( -options.getProject(), -options.getBigQueryDataset(), -BigQueryHelpers.parseTableSpec(tableSpec).getTableId()); -assertEquals("Wrong number of rows inserted", 2, actualRows.size()); -List expectedRows = new ArrayList<>(); -TableRow row1 = -new TableRow() -.set("timestamp", startTimestampSeconds) -.set("runtimeSec", nexmarkPerf1.runtimeSec) -.set("eventsPerSec", nexmarkPerf1.eventsPerSec) -// cast to int due to BEAM-4734. 
-.set("numResults", (int) nexmarkPerf1.numResults); -expectedRows.add(row1); -TableRow row2 = -new TableRow() -.set("timestamp", startTimestampSeconds) -.set("runtimeSec", nexmarkPerf2.runtimeSec) -.set("eventsPerSec", nexmarkPerf2.eventsPerSec) -// cast to int due to BEAM-4734. -.set("numResults", (int) nexmarkPerf2.numResults); -expectedRows.add(row2); -assertThat(actualRows, containsInAnyOrder(Iterables.toArray(expectedRows, TableRow.class))); +long startTimestampSeconds = 145428480L; +Main.savePerfsToBigQuery(bigQueryClient, options, perfs, new Instant(startTimestampSeconds)); + +String tableName = NexmarkUtils.tableName(options, QUERY.getNumberOrName(), 0L, null); +List> rows = bigQueryClient.getRows(tableName); + +// savePerfsToBigQuery converts millis to seconds (it's a BigQuery's requirement). Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163515) Time Spent: 2h 10m (was: 2h) > Remove pipeline for publishing nexmark results to bigQuery and publish using > BigQuery API only > -- > > Key: BEAM-5906 > URL: https://issues.apache.org/jira/browse/BEAM-5906 > Project: Beam > Issue Type: Improvement > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > There's no need to start a separate pipeline for uploading metrics results > from Nexmark suites to BigQuery. We can use an API designed for that and > place it in test-utils. Thanks to that: > - it won't start a separate pipeline every time it publishes results > - other suites will be able to use that code > - We will not face problems like special long to int conversion due to > problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163514 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 17:26 Start Date: 07/Nov/18 17:26 Worklog Time Spent: 10m Work Description: lgajowy commented on a change in pull request #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#discussion_r231601096 ## File path: sdks/java/testing/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java ## @@ -87,41 +71,47 @@ public void testSavePerfsToBigQuery() throws IOException, InterruptedException { nexmarkPerf2.eventsPerSec = 1.5F; nexmarkPerf2.runtimeSec = 1.325F; -// simulate 2 runs of the same query just to check that rows are apened correctly. +// simulate 2 runs of the same query just to check that rows are appended correctly. HashMap perfs = new HashMap<>(2); perfs.put(nexmarkConfiguration1, nexmarkPerf1); perfs.put(nexmarkConfiguration2, nexmarkPerf2); -// cast to int due to BEAM-4734. To avoid overflow on int capacity, Review comment: I'm glad too. :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163514) Time Spent: 2h (was: 1h 50m) > Remove pipeline for publishing nexmark results to bigQuery and publish using > BigQuery API only > -- > > Key: BEAM-5906 > URL: https://issues.apache.org/jira/browse/BEAM-5906 > Project: Beam > Issue Type: Improvement > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > There's no need to start a separate pipeline for uploading metrics results > from Nexmark suites to BigQuery. 
We can use an API designed for that and > place it in test-utils. Thanks to that: > - it won't start a separate pipeline every time it publishes results > - other suites will be able to use that code > - We will not face problems like special long to int conversion due to > problems in BigQueryIO (eg. BEAM-4734) because we will use a thin API instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5906) Remove pipeline for publishing nexmark results to bigQuery and publish using BigQuery API only
[ https://issues.apache.org/jira/browse/BEAM-5906?focusedWorklogId=163512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163512 ] ASF GitHub Bot logged work on BEAM-5906: Author: ASF GitHub Bot Created on: 07/Nov/18 17:26 Start Date: 07/Nov/18 17:26 Worklog Time Spent: 10m Work Description: lgajowy commented on a change in pull request #6886: [BEAM-5906] Use dedicated BigQuery client for publishing Nexmark results URL: https://github.com/apache/beam/pull/6886#discussion_r231601063 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkUtils.java ## @@ -145,53 +145,51 @@ QUERY_RUNNER_AND_MODE } + /** Return a query name with query language (if applicable). */ + static String fullQueryName(String queryLanguage, String query) { +return queryLanguage != null ? query + "_" + queryLanguage : query; + } + /** Return a BigQuery table spec. */ static String tableSpec(NexmarkOptions options, String queryName, long now, String version) { +return String.format( +"%s:%s.%s", +options.getProject(), +options.getBigQueryDataset(), +tableName(options, queryName, now, version)); + } + + /** Return a BigQuery table name. 
*/ + static String tableName(NexmarkOptions options, String queryName, long now, String version) { String baseTableName = options.getBigQueryTable(); if (Strings.isNullOrEmpty(baseTableName)) { throw new RuntimeException("Missing --bigQueryTable"); } + switch (options.getResourceNameMode()) { case VERBATIM: -return String.format( -"%s:%s.%s_%s", -options.getProject(), options.getBigQueryDataset(), baseTableName, version); +return String.format("%s_%s", baseTableName, version); case QUERY: -return String.format( -"%s:%s.%s_%s_%s", -options.getProject(), options.getBigQueryDataset(), baseTableName, queryName, version); +return String.format("%s_%s_%s", baseTableName, queryName, version); case QUERY_AND_SALT: -return String.format( -"%s:%s.%s_%s_%s_%d", -options.getProject(), -options.getBigQueryDataset(), -baseTableName, -queryName, -version, -now); +return String.format("%s_%s_%s_%d", baseTableName, queryName, version, now); case QUERY_RUNNER_AND_MODE: -return (version != null) -? String.format( -"%s:%s.%s_%s_%s_%s_%s", -options.getProject(), -options.getBigQueryDataset(), -baseTableName, -queryName, -options.getRunner().getSimpleName(), -options.isStreaming() ? "streaming" : "batch", -version) -: String.format( -"%s:%s.%s_%s_%s_%s", -options.getProject(), -options.getBigQueryDataset(), -baseTableName, -queryName, -options.getRunner().getSimpleName(), -options.isStreaming() ? "streaming" : "batch"); +String runnerName = options.getRunner().getSimpleName(); +boolean isStreaming = options.isStreaming(); + +String tableName = +String.format( +"%s_%s_%s_%s", baseTableName, queryName, runnerName, processingMode(isStreaming)); + +return version != null ? String.format("%s_%s", tableName, version) : tableName; } throw new RuntimeException("Unrecognized enum " + options.getResourceNameMode()); } + private static String processingMode(boolean isStreaming) { Review comment: I deliberately left it because it makes the code cleaner to me (hides the details). 
Given that, can we keep it as-is? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163512) Time Spent: 1h 40m (was: 1.5h) > Remove pipeline for publishing nexmark results to bigQuery and publish using > BigQuery API only > -- > > Key: BEAM-5906 > URL: https://issues.apache.org/jira/browse/BEAM-5906 > Project: Beam > Issue Type: Improvement > Components: examples-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > There's no need
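The refactored tableName logic in the diff above can be sketched as a self-contained method. The enum constants, format strings, and the extracted processing-mode helper copy the diff; the class name and the flattened parameter list are invented for the sketch:

```java
// Sketch of the table-naming scheme from the NexmarkUtils diff above.
// ResourceNameMode constants and format strings follow the diff;
// the flattened signature is illustrative.
enum ResourceNameMode { VERBATIM, QUERY, QUERY_AND_SALT, QUERY_RUNNER_AND_MODE }

final class NexmarkTableNames {
  static String tableName(ResourceNameMode mode, String baseTableName, String queryName,
                          long now, String version, String runnerName, boolean isStreaming) {
    switch (mode) {
      case VERBATIM:
        return String.format("%s_%s", baseTableName, version);
      case QUERY:
        return String.format("%s_%s_%s", baseTableName, queryName, version);
      case QUERY_AND_SALT:
        // "salt" the name with the current timestamp to make it unique
        return String.format("%s_%s_%s_%d", baseTableName, queryName, version, now);
      case QUERY_RUNNER_AND_MODE: {
        String processingMode = isStreaming ? "streaming" : "batch";
        String name =
            String.format("%s_%s_%s_%s", baseTableName, queryName, runnerName, processingMode);
        return version != null ? String.format("%s_%s", name, version) : name;
      }
    }
    throw new IllegalArgumentException("Unrecognized enum " + mode);
  }
}
```

Note how extracting the streaming/batch string keeps each case to a single format call, which is the cleanliness argument made in the review comment.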
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163506 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 17:25 Start Date: 07/Nov/18 17:25 Worklog Time Spent: 10m Work Description: supercclank commented on a change in pull request #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#discussion_r231598848 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java ## @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) throws IOException { @Test @Category(NeedsRunner.class) public void testDecodingIOException() throws Exception { -thrown.expect(Exception.class); -thrown.expectCause(instanceOf(IOException.class)); Pipeline p = -runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testDecodingNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); - Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!")); + +p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testEncodingIOException() throws Exception { +Pipeline p = +pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, EXCEPTION_MESSAGE)); thrown.expect(Exception.class); 
-thrown.expectCause(instanceOf(IOException.class)); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); -Pipeline p = -runPipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, EXCEPTION_MESSAGE)); +p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testEncodingNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(null, NULL_POINTER_EXCEPTION, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!")); +p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testSerializationIOException() throws Exception { -thrown.expect(Exception.class); -thrown.expectCause(instanceOf(IOException.class)); Pipeline p = -runPipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, EXCEPTION_MESSAGE)); +pipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); +p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testSerializationNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); - Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(null, null, NULL_POINTER_EXCEPTION, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); Review comment: Same question here about this changing to Exception.class This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163506) Time Spent: 20m (was: 10m) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163509 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 17:25 Start Date: 07/Nov/18 17:25 Worklog Time Spent: 10m Work Description: supercclank commented on a change in pull request #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#discussion_r231600012 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java ## @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) throws IOException { @Test @Category(NeedsRunner.class) public void testDecodingIOException() throws Exception { -thrown.expect(Exception.class); -thrown.expectCause(instanceOf(IOException.class)); Pipeline p = -runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); Review comment: Is this able to be changed to "IOException.class" This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163509) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >Assignee: Jason Kuster >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > The test assertions trigger before the coder is used on the actual runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163508 ] ASF GitHub Bot logged work on BEAM-6005: Author: ASF GitHub Bot Created on: 07/Nov/18 17:25 Start Date: 07/Nov/18 17:25 Worklog Time Spent: 10m Work Description: supercclank commented on a change in pull request #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function. URL: https://github.com/apache/beam/pull/6923#discussion_r231598938 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java ## @@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) throws IOException { @Test @Category(NeedsRunner.class) public void testDecodingIOException() throws Exception { -thrown.expect(Exception.class); -thrown.expectCause(instanceOf(IOException.class)); Pipeline p = -runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testDecodingNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); - Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); +thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!")); + +p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testEncodingIOException() throws Exception { +Pipeline p = +pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, EXCEPTION_MESSAGE)); thrown.expect(Exception.class); 
-thrown.expectCause(instanceOf(IOException.class)); +thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!")); -Pipeline p = -runPipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, EXCEPTION_MESSAGE)); +p.run().waitUntilFinish(); } @Test @Category(NeedsRunner.class) public void testEncodingNPException() throws Exception { -thrown.expect(RuntimeException.class); -thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!"); Pipeline p = -runPipelineWith( +pipelineWith( new CustomTestCoder(null, NULL_POINTER_EXCEPTION, null, null, EXCEPTION_MESSAGE)); +thrown.expect(Exception.class); Review comment: Was the widening of RuntimeException.class to Exception.class intended? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 163508) Time Spent: 0.5h (was: 20m) > PCollectionCustomCoderTest passes spuriously. > - > > Key: BEAM-6005 > URL: https://issues.apache.org/jira/browse/BEAM-6005 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Jason Kuster >Assignee: Jason Kuster >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > The test assertions trigger before the coder is used on the actual runner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6005) PCollectionCustomCoderTest passes spuriously.
[ https://issues.apache.org/jira/browse/BEAM-6005?focusedWorklogId=163507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163507 ] ASF GitHub Bot logged work on BEAM-6005:

Author: ASF GitHub Bot
Created on: 07/Nov/18 17:25
Start Date: 07/Nov/18 17:25
Worklog Time Spent: 10m

Work Description: supercclank commented on a change in pull request #6923: [BEAM-6005] PCollectionCustomCoderTest updates to fix test to actually function.
URL: https://github.com/apache/beam/pull/6923#discussion_r231599120

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/coders/PCollectionCustomCoderTest.java
@@ -148,104 +161,179 @@ private void throwIfPresent(String exceptionClassName) throws IOException {
   @Test
   @Category(NeedsRunner.class)
   public void testDecodingIOException() throws Exception {
-    thrown.expect(Exception.class);
-    thrown.expectCause(instanceOf(IOException.class));
     Pipeline p =
-        runPipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE));
+        pipelineWith(new CustomTestCoder(IO_EXCEPTION, null, null, null, EXCEPTION_MESSAGE));
+    thrown.expect(Exception.class);
+    thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!"));
     p.run().waitUntilFinish();
   }

   @Test
   @Category(NeedsRunner.class)
   public void testDecodingNPException() throws Exception {
-    thrown.expect(RuntimeException.class);
-    thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!");
-
     Pipeline p =
-        runPipelineWith(
+        pipelineWith(
             new CustomTestCoder(NULL_POINTER_EXCEPTION, null, null, null, EXCEPTION_MESSAGE));
+    thrown.expect(Exception.class);
+    thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!"));
+
+    p.run().waitUntilFinish();
   }

   @Test
   @Category(NeedsRunner.class)
   public void testEncodingIOException() throws Exception {
+    Pipeline p =
+        pipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, EXCEPTION_MESSAGE));
     thrown.expect(Exception.class);
-    thrown.expectCause(instanceOf(IOException.class));
+    thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!"));
-    Pipeline p =
-        runPipelineWith(new CustomTestCoder(null, IO_EXCEPTION, null, null, EXCEPTION_MESSAGE));
+    p.run().waitUntilFinish();
   }

   @Test
   @Category(NeedsRunner.class)
   public void testEncodingNPException() throws Exception {
-    thrown.expect(RuntimeException.class);
-    thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!");
     Pipeline p =
-        runPipelineWith(
+        pipelineWith(
             new CustomTestCoder(null, NULL_POINTER_EXCEPTION, null, null, EXCEPTION_MESSAGE));
+    thrown.expect(Exception.class);
+    thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!"));
+    p.run().waitUntilFinish();
   }

   @Test
   @Category(NeedsRunner.class)
   public void testSerializationIOException() throws Exception {
-    thrown.expect(Exception.class);
-    thrown.expectCause(instanceOf(IOException.class));
     Pipeline p =
-        runPipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, EXCEPTION_MESSAGE));
+        pipelineWith(new CustomTestCoder(null, null, IO_EXCEPTION, null, EXCEPTION_MESSAGE));
+    thrown.expect(Exception.class);
+    thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!"));
+    p.run().waitUntilFinish();
   }

   @Test
   @Category(NeedsRunner.class)
   public void testSerializationNPException() throws Exception {
-    thrown.expect(RuntimeException.class);
-    thrown.expectMessage("java.lang.NullPointerException: Super Unique Message!!!");
-
     Pipeline p =
-        runPipelineWith(
+        pipelineWith(
             new CustomTestCoder(null, null, NULL_POINTER_EXCEPTION, null, EXCEPTION_MESSAGE));
+    thrown.expect(Exception.class);
+    thrown.expect(new ExceptionMatcher("java.lang.NullPointerException: Super Unique Message!!!"));
+
+    p.run().waitUntilFinish();
   }

+  // TODO(BEAM-6004) Have DirectRunner trigger deserialization.
+  @Ignore("DirectRunner doesn't decode coders so this test does not pass.")
   @Test
   @Category(NeedsRunner.class)
   public void testDeserializationIOException() throws Exception {
-    thrown.expect(Exception.class);
-    thrown.expectCause(instanceOf(IOException.class));
     Pipeline p =
-        runPipelineWith(new CustomTestCoder(null, null, null, IO_EXCEPTION, EXCEPTION_MESSAGE));
+        pipelineWith(new CustomTestCoder(null, null, null, IO_EXCEPTION, EXCEPTION_MESSAGE));
+    thrown.expect(Exception.class);
+    thrown.expect(new ExceptionMatcher("java.io.IOException: Super Unique Message!!!"));
+
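The diff above replaces JUnit's `expectCause(instanceOf(...))` checks with a custom `ExceptionMatcher` keyed on the rendered "ClassName: message" string. As a rough stand-alone illustration of what such a matcher can verify (a hypothetical sketch, not the actual `ExceptionMatcher` from the PR), a check that walks the cause chain might look like:

```java
// Hypothetical sketch of a message-based exception check, similar in spirit
// to the ExceptionMatcher in the diff above. Walks the throwable and its
// causes looking for a rendered "ClassName: message" containing the expected
// string. Class and method names are illustrative, not Beam test code.
public class ExceptionMessageCheck {
    static boolean matchesMessage(Throwable t, String expected) {
        // Inspect each link of the cause chain.
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String rendered = cur.getClass().getName() + ": " + cur.getMessage();
            if (rendered.contains(expected)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A runner typically wraps the user exception, so match through the chain.
        Throwable wrapped =
            new RuntimeException(new java.io.IOException("Super Unique Message!!!"));
        if (!matchesMessage(wrapped, "java.io.IOException: Super Unique Message!!!")) {
            throw new AssertionError("expected message not found");
        }
        System.out.println("ok");
    }
}
```

Matching on the rendered message rather than the cause type lets the test pass regardless of how many wrapper exceptions the runner adds, which is presumably why the PR moves away from `expectCause`.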
[jira] [Commented] (BEAM-5798) Add support for dynamic destinations when writing to Kafka
[ https://issues.apache.org/jira/browse/BEAM-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678500#comment-16678500 ] Alexey Romanenko commented on BEAM-5798:

[~rangadi] You mean to change the internal API from using KV to using ProducerRecord, but keep the external API as it is? Yes, perhaps that would make sense too, thanks!

> Add support for dynamic destinations when writing to Kafka
> --
>
> Key: BEAM-5798
> URL: https://issues.apache.org/jira/browse/BEAM-5798
> Project: Beam
> Issue Type: New Feature
> Components: io-java-kafka
> Reporter: Luke Cwik
> Assignee: Alexey Romanenko
> Priority: Major
> Labels: newbie, starter
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Add support for writing to Kafka based upon contents of the data. This is
> similar to the dynamic destination approach for file IO and other sinks.
>
> Source of request:
> https://lists.apache.org/thread.html/a89d1d32ecdb50c42271e805cc01a651ee3623b4df97c39baf4f2053@%3Cuser.beam.apache.org%3E

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
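The suggestion above is to keep the public KV-based write API while routing values internally through a ProducerRecord-style holder whose topic can be chosen per element. A toy sketch of that internal translation step (the `Rec` class is an illustrative stand-in for Kafka's ProducerRecord, and `topicFn` is a hypothetical per-element destination function, not KafkaIO code):

```java
// Illustrative sketch only, not KafkaIO's actual implementation: translate a
// KV pair into a topic-qualified record using a per-element topic function,
// leaving the external KV-based API unchanged.
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

public class DynamicDestinationSketch {
    // Stand-in for a ProducerRecord<K, V>: a value plus its destination topic.
    static final class Rec<K, V> {
        final String topic;
        final K key;
        final V value;
        Rec(String topic, K key, V value) {
            this.topic = topic;
            this.key = key;
            this.value = value;
        }
    }

    // Internal translation: KV + topic function -> topic-qualified record.
    static <K, V> Rec<K, V> toRecord(Map.Entry<K, V> kv, Function<V, String> topicFn) {
        return new Rec<>(topicFn.apply(kv.getValue()), kv.getKey(), kv.getValue());
    }

    public static void main(String[] args) {
        // The destination is derived from the element itself.
        Rec<String, String> r =
            toRecord(new SimpleEntry<>("user-1", "click"), v -> "events-" + v);
        System.out.println(r.topic + " " + r.key + " " + r.value);
    }
}
```

The design point in the comment is that only this internal step needs ProducerRecord semantics; callers keep writing KVs, matching the dynamic-destination approach used for file IO and other sinks.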
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163495 ] ASF GitHub Bot logged work on BEAM-5931:

Author: ASF GitHub Bot
Created on: 07/Nov/18 16:55
Start Date: 07/Nov/18 16:55
Worklog Time Spent: 10m

Work Description: lgajowy commented on a change in pull request #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231586067

## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -1139,7 +1140,25 @@ artifactId=${project.name}
   outputs.upToDateWhen { false }
   include "**/*IT.class"
-  systemProperties.beamTestPipelineOptions = configuration.integrationTestPipelineOptions
+
+  def pipelineOptionString = configuration.integrationTestPipelineOptions
+  if (configuration.runner?.equalsIgnoreCase('dataflow')) {
+    project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+    def jsonSlurper = new JsonSlurper()
+    def allOptionsList = jsonSlurper.parseText(pipelineOptionString)
+    def dataflowWorkerJar = project.findProperty('dataflowWorkerJar') ?:
+        project.project(":beam-runners-google-cloud-dataflow-java-legacy-worker").shadowJar.archivePath

Review comment: Will this line only get the path to a previously built worker (without building it), or will it build the worker here if it's not found?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 163495)
Time Spent: 3h 20m (was: 3h 10m)

> Rollback PR/6899
>
> Key: BEAM-5931
> URL: https://issues.apache.org/jira/browse/BEAM-5931
> Project: Beam
> Issue Type: Task
> Components: beam-model, runner-dataflow
> Reporter: Luke Cwik
> Assignee: Ruoyun Huang
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> To rollback this change, one must either:
> 1) Update nexmark / perf test framework to use Dataflow worker jar.
> This requires adding the
> {code}
> "--dataflowWorkerJar=${dataflowWorkerJar}",
> "--workerHarnessContainerImage=",
> {code}
> when running the tests.
> OR
> 2) Update the dataflow worker image with code that contains the rollback of
> PR/6899 and then rollback PR/6899 in Github with the updated Dataflow worker
> image.
> #1 is preferable since we will no longer have tests running that don't use a
> Dataflow worker jar built from Github HEAD.
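The Gradle change under review parses the serialized pipeline options and appends a `--dataflowWorkerJar` flag when the runner is Dataflow. A minimal Java sketch of that option-rewriting step (the names and the list-based representation are assumptions for illustration; the real build script works on the JSON options string with Groovy's JsonSlurper):

```java
// Sketch of the option-rewriting step discussed in the review above: append a
// --dataflowWorkerJar flag to the pipeline-option list when the configured
// runner is Dataflow, and leave the options untouched otherwise. Names are
// illustrative; not the actual BeamModulePlugin code.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WorkerJarOptions {
    static List<String> withWorkerJar(List<String> options, String runner, String jarPath) {
        List<String> result = new ArrayList<>(options);
        // Only Dataflow runs need the locally built legacy-worker jar.
        if (runner != null && runner.equalsIgnoreCase("dataflow")) {
            result.add("--dataflowWorkerJar=" + jarPath);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> base =
            Arrays.asList("--project=apache-beam-testing", "--runner=TestDataflowRunner");
        System.out.println(withWorkerJar(base, "dataflow", "/tmp/worker.jar"));
    }
}
```

This mirrors option #1 of the rollback plan quoted above: the flag is injected at build time so every integration test runs against a worker jar built from the current source tree.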
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163494 ] ASF GitHub Bot logged work on BEAM-5931:

Author: ASF GitHub Bot
Created on: 07/Nov/18 16:55
Start Date: 07/Nov/18 16:55
Worklog Time Spent: 10m

Work Description: lgajowy commented on a change in pull request #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#discussion_r231584194

## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
@@ -1139,7 +1140,25 @@ artifactId=${project.name}
   outputs.upToDateWhen { false }
   include "**/*IT.class"
-  systemProperties.beamTestPipelineOptions = configuration.integrationTestPipelineOptions
+
+  def pipelineOptionString = configuration.integrationTestPipelineOptions
+  if (configuration.runner?.equalsIgnoreCase('dataflow')) {
+    project.evaluationDependsOn(":beam-runners-google-cloud-dataflow-java-legacy-worker")
+
+    def jsonSlurper = new JsonSlurper()

Review comment: can we inline this variable? It's used only once.

Worklog Id: (was: 163494)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163480 ] ASF GitHub Bot logged work on BEAM-5931:

Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m

Work Description: lgajowy commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436688003

Run Java MongoDBIO Performance Test

Worklog Id: (was: 163480)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163481 ] ASF GitHub Bot logged work on BEAM-5931:

Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m

Work Description: lgajowy edited a comment on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436687860

Let's also check some other tests. (btw, I'm reviewing now. :) )

Worklog Id: (was: 163481)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163478 ] ASF GitHub Bot logged work on BEAM-5931:

Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m

Work Description: lgajowy commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436687860

Let's also check some other tests:

Worklog Id: (was: 163478)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163479 ] ASF GitHub Bot logged work on BEAM-5931:

Author: ASF GitHub Bot
Created on: 07/Nov/18 16:34
Start Date: 07/Nov/18 16:34
Worklog Time Spent: 10m

Work Description: lgajowy commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436687883

Run Java JdbcIO Performance Test

Worklog Id: (was: 163479)
Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-5931) Rollback PR/6899
[ https://issues.apache.org/jira/browse/BEAM-5931?focusedWorklogId=163476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163476 ] ASF GitHub Bot logged work on BEAM-5931:

Author: ASF GitHub Bot
Created on: 07/Nov/18 16:27
Start Date: 07/Nov/18 16:27
Worklog Time Spent: 10m

Work Description: HuangLED commented on issue #6966: [DRAFT] [BEAM-5931] Add worker jar to integrationTest targets.
URL: https://github.com/apache/beam/pull/6966#issuecomment-436684952

Run Java TextIO Performance Test

Worklog Id: (was: 163476)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Comment Edited] (BEAM-6011) Enable Phrase triggering in Nexmark jobs
[ https://issues.apache.org/jira/browse/BEAM-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678445#comment-16678445 ] Lukasz Gajowy edited comment on BEAM-6011 at 11/7/18 4:18 PM:

I made some attempts to prepare a quick fix for this. It was not as easy as I initially expected. I tried to use the $GIT_BRANCH environment variable to conditionally create the nexmark args needed for the BigQuery connection. It seems that this is not the right approach, because $GIT_BRANCH is evaluated while running the job, whereas the rest of the code is evaluated while running the Seed job. Moreover, I don't like this solution because it hides the real intent of the job. Link to the PR (currently closed): [https://github.com/apache/beam/pull/6974]

IMO, after some trying, the way it has to be done is:
- every nexmark job should have a "twin" job that is for running on PRs (exclusively). The "twin" job should run the same queries (to verify performance the same way).
- PR-triggered jobs should save to the BigQuery database the same way as the periodic jobs (because saving can also change, and we will need to test that in PRs too).
- PR-triggered jobs should save to a new BQ dataset and have the same table names.
- Jenkins DSL code should be refactored so that it provides convenient methods for creating new jobs

CC: [~kenn], [~echauchot] WDYT? Do you have other comments/suggestions? We should do exactly the same for IO Performance tests (currently they suffer from the same problems)...

> Enable Phrase triggering in Nexmark jobs
>
> Key: BEAM-6011
> URL: https://issues.apache.org/jira/browse/BEAM-6011
> Project: Beam
> Issue Type: Bug
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Critical
>
> We need to enable Phrase Triggering (running Jenkins jobs from PR) for
> Nexmark jobs so that we could check if pull requests are not breaking
> anything before merging them.
> Note: Currently Nexmark jobs run post commit on master and publish their
> results to BigQuery database. In order not to pollute the results collected
> for master we should save the results for Pr-triggered jobs in some other
> tables/datasets or even not save them at all (turn off publishing to BQ).
[jira] [Commented] (BEAM-6002) Nexmark tests timing out on all runners (crash loop due to metrics?)
[ https://issues.apache.org/jira/browse/BEAM-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678450#comment-16678450 ] Lukasz Gajowy commented on BEAM-6002: - I created an issue for nexmark phrase triggering: https://issues.apache.org/jira/browse/BEAM-6011 > Nexmark tests timing out on all runners (crash loop due to metrics?) > - > > Key: BEAM-6002 > URL: https://issues.apache.org/jira/browse/BEAM-6002 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kenneth Knowles >Assignee: Chamikara Jayalath >Priority: Critical > Time Spent: 2h 40m > Remaining Estimate: 0h > > https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Spark/992/ > {code} > 08:58:26 2018-11-06T16:58:26.035Z RUNNING Query0 > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.elements, from namespace Query0.Events > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.bytes, from namespace Query0.Events > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.startTime for namespace Query0.Events > 08:58:26 2018-11-06T16:58:26.035Z no activity > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.endTime for namespace Query0.Events > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.elements, from namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.bytes, from namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.startTime for namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > 
org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.endTime for namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.startTimestamp for namespace Query0.Results > 08:58:26 18/11/06 16:58:26 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.endTimestamp for namespace Query0.Results > 08:58:41 2018-11-06T16:58:41.035Z RUNNING Query0 > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.elements, from namespace Query0.Events > 08:58:41 2018-11-06T16:58:41.036Z no activity > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.bytes, from namespace Query0.Events > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.startTime for namespace Query0.Events > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric event.endTime for namespace Query0.Events > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.elements, from namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > result.bytes, from namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.startTime for namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.endTime for namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > 
distribution metric result.startTimestamp for namespace Query0.Results > 08:58:41 18/11/06 16:58:41 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get > distribution metric result.endTimestamp for namespace Query0.Results > 08:58:56 2018-11-06T16:58:56.036Z RUNNING Query0 > 08:58:56 18/11/06 16:58:56 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.elements, from namespace Query0.Events > 08:58:56 2018-11-06T16:58:56.036Z no activity > 08:58:56 18/11/06 16:58:56 ERROR > org.apache.beam.sdk.testutils.metrics.MetricsReader: Failed to get metric > event.bytes, from namespace Query0.Events > 08:58:56 18/11/06 16:58:56 ERROR >