Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4234

2017-06-27 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3439: Minor fixes for DSL SQL

2017-06-27 Thread iemejia
Github user iemejia closed the pull request at:

https://github.com/apache/beam/pull/3439


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2495

2017-06-27 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3351: [BEAM-2371] Fix getAdditionInputs for SplittablePar...

2017-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3351




[jira] [Commented] (BEAM-2371) Make Java DirectRunner demonstrate language-agnostic Runner API translation wrappers

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065916#comment-16065916
 ] 

ASF GitHub Bot commented on BEAM-2371:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3351


> Make Java DirectRunner demonstrate language-agnostic Runner API translation 
> wrappers
> 
>
> Key: BEAM-2371
> URL: https://issues.apache.org/jira/browse/BEAM-2371
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> This will complete the PoC for runners-core-construction-java and the Runner 
> API and show other runners the easy path to executing non-Java pipelines, 
> modulo Fn API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[1/6] beam git commit: Add utility to expand list of PCollectionViews

2017-06-27 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master e5929bd13 -> 16f8000e2


Add utility to expand list of PCollectionViews


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/58fba590
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/58fba590
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/58fba590

Branch: refs/heads/master
Commit: 58fba590ddc554a343036a7beeffe9caa319aa81
Parents: e5929bd
Author: Kenneth Knowles 
Authored: Tue Jun 27 14:35:00 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 27 20:45:40 2017 -0700

--
 .../org/apache/beam/sdk/values/PCollectionViews.java  | 14 ++
 1 file changed, 14 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/58fba590/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java
index 0c04370..e17e146 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java
@@ -21,6 +21,7 @@ import com.google.common.base.Function;
 import com.google.common.base.MoreObjects;
 import com.google.common.collect.HashMultimap;
 import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Multimap;
 import java.io.IOException;
@@ -38,6 +39,7 @@ import org.apache.beam.sdk.coders.Coder;
 import org.apache.beam.sdk.coders.IterableCoder;
 import org.apache.beam.sdk.transforms.Materialization;
 import org.apache.beam.sdk.transforms.Materializations;
+import org.apache.beam.sdk.transforms.PTransform;
 import org.apache.beam.sdk.transforms.ViewFn;
 import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
 import org.apache.beam.sdk.transforms.windowing.InvalidWindows;
@@ -139,6 +141,18 @@ public class PCollectionViews {
   }
 
   /**
+   * Expands a list of {@link PCollectionView} into the form needed for
+   * {@link PTransform#getAdditionalInputs()}.
+   */
+  public static Map<TupleTag<?>, PValue> toAdditionalInputs(Iterable<PCollectionView<?>> views) {
+    ImmutableMap.Builder<TupleTag<?>, PValue> additionalInputs = ImmutableMap.builder();
+    for (PCollectionView<?> view : views) {
+      additionalInputs.put(view.getTagInternal(), view.getPCollection());
+    }
+    return additionalInputs.build();
+  }
+
+  /**
   * Implementation of conversion of singleton {@code Iterable<T>} to {@code T}.
   *
   * For internal use only.



[6/6] beam git commit: This closes #3351: [BEAM-2371] Fix getAdditionInputs for SplittableParDo transforms

2017-06-27 Thread kenn
This closes #3351: [BEAM-2371] Fix getAdditionInputs for SplittableParDo 
transforms

  Use PCollectionViews.toAdditionalInputs in Combine
  Use PCollectionViews.toAdditionalInputs in ParDo
  Use PCollectionViews.toAdditionalInputs in ParDoMultiOverrideFactory
  Fix getAdditionalInputs for SplittableParDo transforms
  Add utility to expand list of PCollectionViews


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/16f8000e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/16f8000e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/16f8000e

Branch: refs/heads/master
Commit: 16f8000e24c5b56132987752b8e903392e5dd3e8
Parents: e5929bd 27674f0
Author: Kenneth Knowles 
Authored: Tue Jun 27 21:09:14 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 27 21:09:14 2017 -0700

--
 .../apache/beam/runners/apex/ApexRunner.java|  2 +-
 .../core/construction/SplittableParDo.java  | 66 +++-
 .../core/construction/SplittableParDoTest.java  |  8 +--
 .../direct/ParDoMultiOverrideFactory.java   | 16 ++---
 .../flink/FlinkStreamingPipelineTranslator.java |  2 +-
 .../dataflow/SplittableParDoOverrides.java  |  2 +-
 .../org/apache/beam/sdk/transforms/Combine.java | 13 +---
 .../org/apache/beam/sdk/transforms/ParDo.java   | 14 +
 .../beam/sdk/values/PCollectionViews.java   | 14 +
 9 files changed, 79 insertions(+), 58 deletions(-)
--
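The pattern these five commits converge on is small: a transform that takes side inputs now reports them by delegating to the new PCollectionViews.toAdditionalInputs utility instead of building the TupleTag-to-PValue map by hand. The sketch below is illustrative only and is not part of the commits; the transform name FilterWithThreshold and its side-input view are hypothetical, and the generic signatures follow the reconstructed PCollectionViews.java diff above.

import com.google.common.collect.ImmutableList;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.PCollectionViews;
import org.apache.beam.sdk.values.PValue;
import org.apache.beam.sdk.values.TupleTag;

// Hypothetical transform; only getAdditionalInputs() is the point here.
class FilterWithThreshold extends PTransform<PCollection<Integer>, PCollection<Integer>> {
  private final List<PCollectionView<?>> sideInputs;

  FilterWithThreshold(PCollectionView<Integer> thresholdView) {
    this.sideInputs = ImmutableList.<PCollectionView<?>>of(thresholdView);
  }

  @Override
  public PCollection<Integer> expand(PCollection<Integer> input) {
    // A real transform would apply a ParDo that consumes the side input;
    // elided here because only the additional-inputs wiring matters.
    return input;
  }

  @Override
  public Map<TupleTag<?>, PValue> getAdditionalInputs() {
    // Replaces the hand-rolled ImmutableMap.Builder loop removed in the
    // ParDo/Combine/ParDoMultiOverrideFactory commits of this series.
    return PCollectionViews.toAdditionalInputs(sideInputs);
  }
}

The commits below apply exactly this delegation inside ParDo, Combine, and the direct runner's ParDoMultiOverrideFactory.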




[4/6] beam git commit: Use PCollectionViews.toAdditionalInputs in ParDo

2017-06-27 Thread kenn
Use PCollectionViews.toAdditionalInputs in ParDo


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ed476dd2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ed476dd2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ed476dd2

Branch: refs/heads/master
Commit: ed476dd2807577c8069087aa0764b21d1bb06512
Parents: 4238276
Author: Kenneth Knowles 
Authored: Tue Jun 27 14:41:30 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 27 21:08:11 2017 -0700

--
 .../java/org/apache/beam/sdk/transforms/ParDo.java| 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/ed476dd2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java
index edf1419..db1f791 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ParDo.java
@@ -20,7 +20,6 @@ package org.apache.beam.sdk.transforms;
 import static com.google.common.base.Preconditions.checkArgument;
 
 import com.google.common.collect.ImmutableList;
-import com.google.common.collect.ImmutableMap;
 import java.io.Serializable;
 import java.lang.reflect.ParameterizedType;
 import java.lang.reflect.Type;
@@ -50,6 +49,7 @@ import org.apache.beam.sdk.util.SerializableUtils;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionTuple;
 import org.apache.beam.sdk.values.PCollectionView;
+import org.apache.beam.sdk.values.PCollectionViews;
 import org.apache.beam.sdk.values.PValue;
 import org.apache.beam.sdk.values.TupleTag;
 import org.apache.beam.sdk.values.TupleTagList;
@@ -662,11 +662,7 @@ public class ParDo {
      */
     @Override
     public Map<TupleTag<?>, PValue> getAdditionalInputs() {
-      ImmutableMap.Builder<TupleTag<?>, PValue> additionalInputs = ImmutableMap.builder();
-      for (PCollectionView<?> sideInput : sideInputs) {
-        additionalInputs.put(sideInput.getTagInternal(), sideInput.getPCollection());
-      }
-      return additionalInputs.build();
+      return PCollectionViews.toAdditionalInputs(sideInputs);
     }
   }
 
@@ -807,11 +803,7 @@ public class ParDo {
      */
     @Override
     public Map<TupleTag<?>, PValue> getAdditionalInputs() {
-      ImmutableMap.Builder<TupleTag<?>, PValue> additionalInputs = ImmutableMap.builder();
-      for (PCollectionView<?> sideInput : sideInputs) {
-        additionalInputs.put(sideInput.getTagInternal(), sideInput.getPCollection());
-      }
-      return additionalInputs.build();
+      return PCollectionViews.toAdditionalInputs(sideInputs);
     }
   }
 



[5/6] beam git commit: Use PCollectionViews.toAdditionalInputs in Combine

2017-06-27 Thread kenn
Use PCollectionViews.toAdditionalInputs in Combine


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/27674f07
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/27674f07
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/27674f07

Branch: refs/heads/master
Commit: 27674f07cf8363bb6b3c051a990caa5d61b8cd5c
Parents: ed476dd
Author: Kenneth Knowles 
Authored: Tue Jun 27 14:44:50 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 27 21:08:11 2017 -0700

--
 .../java/org/apache/beam/sdk/transforms/Combine.java   | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/27674f07/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java
index 6a90bcf..d7effb5 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Combine.java
@@ -20,7 +20,6 @@ package org.apache.beam.sdk.transforms;
 import static com.google.common.base.Preconditions.checkState;
 
 import com.google.common.collect.ImmutableList;
-import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Iterables;
 import java.io.IOException;
 import java.io.InputStream;
@@ -1122,11 +1121,7 @@ public class Combine {
      */
     @Override
     public Map<TupleTag<?>, PValue> getAdditionalInputs() {
-      ImmutableMap.Builder<TupleTag<?>, PValue> additionalInputs = ImmutableMap.builder();
-      for (PCollectionView<?> sideInput : sideInputs) {
-        additionalInputs.put(sideInput.getTagInternal(), sideInput.getPCollection());
-      }
-      return additionalInputs.build();
+      return PCollectionViews.toAdditionalInputs(sideInputs);
     }
 
 /**
@@ -1578,11 +1573,7 @@ public class Combine {
      */
     @Override
     public Map<TupleTag<?>, PValue> getAdditionalInputs() {
-      ImmutableMap.Builder<TupleTag<?>, PValue> additionalInputs = ImmutableMap.builder();
-      for (PCollectionView<?> sideInput : sideInputs) {
-        additionalInputs.put(sideInput.getTagInternal(), sideInput.getPCollection());
-      }
-      return additionalInputs.build();
+      return PCollectionViews.toAdditionalInputs(sideInputs);
     }
 
 @Override



[3/6] beam git commit: Fix getAdditionalInputs for SplittableParDo transforms

2017-06-27 Thread kenn
Fix getAdditionalInputs for SplittableParDo transforms


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a66bcd68
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a66bcd68
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a66bcd68

Branch: refs/heads/master
Commit: a66bcd68a1e56d5d38fccfce2ffeec28ba1c82de
Parents: 58fba59
Author: Kenneth Knowles 
Authored: Tue Jun 13 10:00:09 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 27 21:08:10 2017 -0700

--
 .../apache/beam/runners/apex/ApexRunner.java|  2 +-
 .../core/construction/SplittableParDo.java  | 66 +++-
 .../core/construction/SplittableParDoTest.java  |  8 +--
 .../direct/ParDoMultiOverrideFactory.java   |  2 +-
 .../flink/FlinkStreamingPipelineTranslator.java |  2 +-
 .../dataflow/SplittableParDoOverrides.java  |  2 +-
 6 files changed, 57 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/a66bcd68/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
--
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
index 95b354a..fd0a1c9 100644
--- a/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
+++ b/runners/apex/src/main/java/org/apache/beam/runners/apex/ApexRunner.java
@@ -381,7 +381,7 @@ public class ApexRunner extends 
PipelineRunner {
 AppliedPTransform>
   transform) {
   return 
PTransformReplacement.of(PTransformReplacements.getSingletonMainInput(transform),
-  new SplittableParDo<>(transform.getTransform()));
+  SplittableParDo.forJavaParDo(transform.getTransform()));
 }
 
 @Override

http://git-wip-us.apache.org/repos/asf/beam/blob/a66bcd68/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
--
diff --git 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
index 5ccafcb..f31b495 100644
--- 
a/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
+++ 
b/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SplittableParDo.java
@@ -18,9 +18,9 @@
 package org.apache.beam.runners.core.construction;
 
 import static com.google.common.base.Preconditions.checkArgument;
-import static com.google.common.base.Preconditions.checkNotNull;
 
 import java.util.List;
+import java.util.Map;
 import java.util.UUID;
 import 
org.apache.beam.runners.core.construction.PTransformTranslation.RawPTransform;
 import org.apache.beam.sdk.annotations.Experimental;
@@ -40,6 +40,8 @@ import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionTuple;
 import org.apache.beam.sdk.values.PCollectionView;
+import org.apache.beam.sdk.values.PCollectionViews;
+import org.apache.beam.sdk.values.PValue;
 import org.apache.beam.sdk.values.TupleTag;
 import org.apache.beam.sdk.values.TupleTagList;
 import org.apache.beam.sdk.values.WindowingStrategy;
@@ -64,7 +66,11 @@ import org.apache.beam.sdk.values.WindowingStrategy;
 @Experimental(Experimental.Kind.SPLITTABLE_DO_FN)
 public class SplittableParDo
 extends PTransform {
-  private final ParDo.MultiOutput parDo;
+
+  private final DoFn doFn;
+  private final List sideInputs;
+  private final TupleTag mainOutputTag;
+  private final TupleTagList additionalOutputTags;
 
   public static final String SPLITTABLE_PROCESS_URN =
   "urn:beam:runners_core:transforms:splittable_process:v1";
@@ -75,24 +81,39 @@ public class SplittableParDo
   public static final String SPLITTABLE_GBKIKWI_URN =
   "urn:beam:runners_core:transforms:splittable_gbkikwi:v1";
 
+  private SplittableParDo(
+  DoFn doFn,
+  TupleTag mainOutputTag,
+  List sideInputs,
+  TupleTagList additionalOutputTags) {
+checkArgument(
+
DoFnSignatures.getSignature(doFn.getClass()).processElement().isSplittable(),
+"fn must be a splittable DoFn");
+this.doFn = doFn;
+this.mainOutputTag = mainOutputTag;
+this.sideInputs = sideInputs;
+this.additionalOutputTags = 

[2/6] beam git commit: Use PCollectionViews.toAdditionalInputs in ParDoMultiOverrideFactory

2017-06-27 Thread kenn
Use PCollectionViews.toAdditionalInputs in ParDoMultiOverrideFactory


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/42382766
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/42382766
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/42382766

Branch: refs/heads/master
Commit: 423827665ae5923cd7fccc654bd9a5e1efed7876
Parents: a66bcd6
Author: Kenneth Knowles 
Authored: Tue Jun 27 14:39:06 2017 -0700
Committer: Kenneth Knowles 
Committed: Tue Jun 27 21:08:10 2017 -0700

--
 .../runners/direct/ParDoMultiOverrideFactory.java | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/42382766/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
index 9a26283..2904bc1 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoMultiOverrideFactory.java
@@ -19,7 +19,6 @@ package org.apache.beam.runners.direct;
 
 import static com.google.common.base.Preconditions.checkState;
 
-import com.google.common.collect.ImmutableMap;
 import java.util.List;
 import java.util.Map;
 import org.apache.beam.runners.core.KeyedWorkItem;
@@ -50,6 +49,7 @@ import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionTuple;
 import org.apache.beam.sdk.values.PCollectionView;
+import org.apache.beam.sdk.values.PCollectionViews;
 import org.apache.beam.sdk.values.PValue;
 import org.apache.beam.sdk.values.TupleTag;
 import org.apache.beam.sdk.values.TupleTagList;
@@ -123,11 +123,7 @@ class ParDoMultiOverrideFactory
 
     @Override
     public Map<TupleTag<?>, PValue> getAdditionalInputs() {
-      ImmutableMap.Builder<TupleTag<?>, PValue> additionalInputs = ImmutableMap.builder();
-      for (PCollectionView<?> sideInput : sideInputs) {
-        additionalInputs.put(sideInput.getTagInternal(), sideInput.getPCollection());
-      }
-      return additionalInputs.build();
+      return PCollectionViews.toAdditionalInputs(sideInputs);
     }
 
 @Override
@@ -231,11 +227,7 @@ class ParDoMultiOverrideFactory
 
     @Override
     public Map<TupleTag<?>, PValue> getAdditionalInputs() {
-      ImmutableMap.Builder<TupleTag<?>, PValue> additionalInputs = ImmutableMap.builder();
-      for (PCollectionView<?> sideInput : sideInputs) {
-        additionalInputs.put(sideInput.getTagInternal(), sideInput.getPCollection());
-      }
-      return additionalInputs.build();
+      return PCollectionViews.toAdditionalInputs(sideInputs);
     }
 
 @Override



Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4233

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2528) BeamSql: support create table

2017-06-27 Thread James Xu (JIRA)
James Xu created BEAM-2528:
--

 Summary: BeamSql: support create table
 Key: BEAM-2528
 URL: https://issues.apache.org/jira/browse/BEAM-2528
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql
Reporter: James Xu
Assignee: James Xu


support create table.





[jira] [Assigned] (BEAM-2202) Support DDL

2017-06-27 Thread James Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Xu reassigned BEAM-2202:
--

Assignee: (was: James Xu)

> Support DDL
> ---
>
> Key: BEAM-2202
> URL: https://issues.apache.org/jira/browse/BEAM-2202
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: James Xu
>
> Mainly create table, drop table, alter table





[jira] [Updated] (BEAM-2202) Support DDL

2017-06-27 Thread James Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Xu updated BEAM-2202:
---
Issue Type: Task  (was: New Feature)

> Support DDL
> ---
>
> Key: BEAM-2202
> URL: https://issues.apache.org/jira/browse/BEAM-2202
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: James Xu
>
> Mainly create table, drop table, alter table





Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2494

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-1542) Need Source/Sink for Spanner

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065871#comment-16065871
 ] 

ASF GitHub Bot commented on BEAM-1542:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3395


> Need Source/Sink for Spanner
> 
>
> Key: BEAM-1542
> URL: https://issues.apache.org/jira/browse/BEAM-1542
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Guy Molinari
>Assignee: Mairbek Khadikov
>
> Is there a source/sink for Spanner in the works? If not, I would gladly give
> this a shot.





[GitHub] beam pull request #3395: [BEAM-1542] Cloud Spanner Source

2017-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3395




[4/4] beam git commit: This closes #3395: [BEAM-1542] Cloud Spanner Source

2017-06-27 Thread jkff
This closes #3395: [BEAM-1542] Cloud Spanner Source


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/e5929bd1
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/e5929bd1
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/e5929bd1

Branch: refs/heads/master
Commit: e5929bd1337747b9c3971caf12f425f7e84750ad
Parents: 0b19fb4 a21a6d7
Author: Eugene Kirpichov 
Authored: Tue Jun 27 18:43:06 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jun 27 18:43:06 2017 -0700

--
 pom.xml |  14 +-
 sdks/java/io/google-cloud-platform/pom.xml  |  16 +-
 .../sdk/io/gcp/spanner/AbstractSpannerFn.java   |  58 ++
 .../sdk/io/gcp/spanner/CreateTransactionFn.java |  51 ++
 .../sdk/io/gcp/spanner/NaiveSpannerReadFn.java  |  65 ++
 .../beam/sdk/io/gcp/spanner/SpannerConfig.java  | 137 +
 .../beam/sdk/io/gcp/spanner/SpannerIO.java  | 604 +--
 .../sdk/io/gcp/spanner/SpannerWriteGroupFn.java | 125 
 .../beam/sdk/io/gcp/spanner/Transaction.java|  33 +
 .../beam/sdk/io/gcp/GcpApiSurfaceTest.java  |  10 +
 .../sdk/io/gcp/spanner/FakeServiceFactory.java  |  82 +++
 .../sdk/io/gcp/spanner/SpannerIOReadTest.java   | 275 +
 .../beam/sdk/io/gcp/spanner/SpannerIOTest.java  | 317 --
 .../sdk/io/gcp/spanner/SpannerIOWriteTest.java  | 258 
 .../beam/sdk/io/gcp/spanner/SpannerReadIT.java  | 169 ++
 .../beam/sdk/io/gcp/spanner/SpannerWriteIT.java |   2 +-
 16 files changed, 1695 insertions(+), 521 deletions(-)
--




[1/4] beam git commit: Pre read api refactoring. Extract `SpannerConfig` and `AbstractSpannerFn`

2017-06-27 Thread jkff
Repository: beam
Updated Branches:
  refs/heads/master 0b19fb414 -> e5929bd13


Pre read api refactoring. Extract `SpannerConfig` and `AbstractSpannerFn`


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/454f1c42
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/454f1c42
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/454f1c42

Branch: refs/heads/master
Commit: 454f1c427353feeb858cdc62185ea3fced8d8a1f
Parents: 80c9263
Author: Mairbek Khadikov 
Authored: Mon Jun 19 13:01:20 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jun 27 18:36:01 2017 -0700

--
 .../sdk/io/gcp/spanner/AbstractSpannerFn.java   |  41 
 .../beam/sdk/io/gcp/spanner/SpannerConfig.java  | 118 ++
 .../beam/sdk/io/gcp/spanner/SpannerIO.java  | 227 ---
 .../sdk/io/gcp/spanner/SpannerWriteGroupFn.java | 108 +
 .../beam/sdk/io/gcp/spanner/SpannerIOTest.java  |   8 +-
 5 files changed, 321 insertions(+), 181 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/454f1c42/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
new file mode 100644
index 000..08f7fa9
--- /dev/null
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
@@ -0,0 +1,41 @@
+package org.apache.beam.sdk.io.gcp.spanner;
+
+import com.google.cloud.spanner.DatabaseClient;
+import com.google.cloud.spanner.DatabaseId;
+import com.google.cloud.spanner.Spanner;
+import com.google.cloud.spanner.SpannerOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+
+/**
+ * Abstract {@link DoFn} that manages {@link Spanner} lifecycle. Use {@link
+ * AbstractSpannerFn#databaseClient} to access the Cloud Spanner database client.
+ */
+abstract class AbstractSpannerFn<InputT, OutputT> extends DoFn<InputT, OutputT> {
+  private transient Spanner spanner;
+  private transient DatabaseClient databaseClient;
+
+  abstract SpannerConfig getSpannerConfig();
+
+  @Setup
+  public void setup() throws Exception {
+    SpannerConfig spannerConfig = getSpannerConfig();
+    SpannerOptions options = spannerConfig.buildSpannerOptions();
+    spanner = options.getService();
+    databaseClient = spanner.getDatabaseClient(DatabaseId
+        .of(options.getProjectId(), spannerConfig.getInstanceId().get(),
+            spannerConfig.getDatabaseId().get()));
+  }
+
+  @Teardown
+  public void teardown() throws Exception {
+    if (spanner == null) {
+      return;
+    }
+    spanner.close();
+    spanner = null;
+  }
+
+  protected DatabaseClient databaseClient() {
+    return databaseClient;
+  }
+}
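A hypothetical subclass, not part of this commit, showing how the base class above is meant to be used: the Spanner client lifecycle stays in AbstractSpannerFn's @Setup/@Teardown, and a subclass only supplies a SpannerConfig and calls databaseClient(). The class name, element types, and query are invented; it assumes the reconstructed <InputT, OutputT> type parameters above, that the subclass lives in the same package as the package-private base class, and the standard Cloud Spanner client API (singleUse().executeQuery(...)).

import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Statement;

// Hypothetical: counts the rows of each table name it receives.
class CountRowsFn extends AbstractSpannerFn<String, Long> {
  private final SpannerConfig config;

  CountRowsFn(SpannerConfig config) {
    this.config = config;
  }

  @Override
  SpannerConfig getSpannerConfig() {
    return config;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    Statement statement = Statement.of("SELECT COUNT(*) FROM " + c.element());
    // databaseClient() is the accessor provided by AbstractSpannerFn above.
    try (ResultSet rs = databaseClient().singleUse().executeQuery(statement)) {
      if (rs.next()) {
        c.output(rs.getLong(0));
      }
    }
  }
}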

http://git-wip-us.apache.org/repos/asf/beam/blob/454f1c42/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java
new file mode 100644
index 000..4cb8aa2
--- /dev/null
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java
@@ -0,0 +1,118 @@
+package org.apache.beam.sdk.io.gcp.spanner;
+
+import static com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.ServiceFactory;
+import com.google.cloud.spanner.Spanner;
+import com.google.cloud.spanner.SpannerOptions;
+import com.google.common.annotations.VisibleForTesting;
+import java.io.Serializable;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+
+/** Configuration for a Cloud Spanner client. */
+@AutoValue
+public abstract class SpannerConfig implements Serializable {
+
+  private static final long serialVersionUID = -5680874609304170301L;
+
+  @Nullable
+  abstract ValueProvider<String> getProjectId();
+
+  @Nullable
+  abstract ValueProvider<String> getInstanceId();
+
+  @Nullable
+  abstract ValueProvider<String> getDatabaseId();
+
+  @Nullable
+  @VisibleForTesting
+  abstract ServiceFactory<Spanner, SpannerOptions> getServiceFactory();
+
+  abstract Builder toBuilder();
+
+  SpannerOptions buildSpannerOptions() {
+SpannerOptions.Builder builder = 

[2/4] beam git commit: Bump spanner version

2017-06-27 Thread jkff
Bump spanner version


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/80c92636
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/80c92636
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/80c92636

Branch: refs/heads/master
Commit: 80c9263617eb453a3595735147f328a8ee6d783e
Parents: 0b19fb4
Author: Mairbek Khadikov 
Authored: Mon Jun 19 12:23:52 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jun 27 18:36:01 2017 -0700

--
 pom.xml   | 2 +-
 .../main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java   | 2 +-
 .../java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOTest.java| 3 ---
 .../java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java   | 2 +-
 4 files changed, 3 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/80c92636/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 29bb4eb..f06568b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -138,7 +138,7 @@
 3.2.0
 v1-rev10-1.22.0
 1.7.14
-0.16.0-beta
+0.20.0-beta
 1.6.2
 4.3.5.RELEASE
 3.1.4

http://git-wip-us.apache.org/repos/asf/beam/blob/80c92636/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
index 8bfc247..32bf1d0 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
@@ -337,7 +337,7 @@ public class SpannerIO {
   if (spanner == null) {
 return;
   }
-  spanner.closeAsync().get();
+  spanner.close();
   spanner = null;
 }
 

http://git-wip-us.apache.org/repos/asf/beam/blob/80c92636/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOTest.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOTest.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOTest.java
index 1e19a59..0cc08bf 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOTest.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOTest.java
@@ -27,7 +27,6 @@ import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 import static org.mockito.Mockito.withSettings;
 
-import com.google.api.core.ApiFuture;
 import com.google.cloud.ServiceFactory;
 import com.google.cloud.spanner.DatabaseClient;
 import com.google.cloud.spanner.DatabaseId;
@@ -274,10 +273,8 @@ public class SpannerIOTest implements Serializable {
 mockSpanners.add(mock(Spanner.class, withSettings().serializable()));
 mockDatabaseClients.add(mock(DatabaseClient.class, 
withSettings().serializable()));
   }
-  ApiFuture voidFuture = mock(ApiFuture.class, 
withSettings().serializable());
   when(mockSpanner().getDatabaseClient(Matchers.any(DatabaseId.class)))
   .thenReturn(mockDatabaseClient());
-  when(mockSpanner().closeAsync()).thenReturn(voidFuture);
 }
 
 DatabaseClient mockDatabaseClient() {

http://git-wip-us.apache.org/repos/asf/beam/blob/80c92636/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
index e1f6582..33532c9 100644
--- 
a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
+++ 
b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteIT.java
@@ -150,7 +150,7 @@ public class SpannerWriteIT {
   @After
   public void tearDown() throws Exception {
 databaseAdminClient.dropDatabase(options.getInstanceId(), databaseName);
-spanner.closeAsync().get();
+spanner.close();
   }
 
   private static class GenerateMutations extends DoFn {



[3/4] beam git commit: Read api with naive implementation

2017-06-27 Thread jkff
Read api with naive implementation


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/a21a6d79
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/a21a6d79
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/a21a6d79

Branch: refs/heads/master
Commit: a21a6d79ad38b927bd5d44c63306d85a752a
Parents: 454f1c4
Author: Mairbek Khadikov 
Authored: Mon Jun 19 13:28:52 2017 -0700
Committer: Eugene Kirpichov 
Committed: Tue Jun 27 18:38:21 2017 -0700

--
 pom.xml |  12 +
 sdks/java/io/google-cloud-platform/pom.xml  |  16 +-
 .../sdk/io/gcp/spanner/AbstractSpannerFn.java   |  17 +
 .../sdk/io/gcp/spanner/CreateTransactionFn.java |  51 ++
 .../sdk/io/gcp/spanner/NaiveSpannerReadFn.java  |  65 +++
 .../beam/sdk/io/gcp/spanner/SpannerConfig.java  |  29 +-
 .../beam/sdk/io/gcp/spanner/SpannerIO.java  | 479 ---
 .../sdk/io/gcp/spanner/SpannerWriteGroupFn.java |  17 +
 .../beam/sdk/io/gcp/spanner/Transaction.java|  33 ++
 .../beam/sdk/io/gcp/GcpApiSurfaceTest.java  |  10 +
 .../sdk/io/gcp/spanner/FakeServiceFactory.java  |  82 
 .../sdk/io/gcp/spanner/SpannerIOReadTest.java   | 275 +++
 .../beam/sdk/io/gcp/spanner/SpannerIOTest.java  | 314 
 .../sdk/io/gcp/spanner/SpannerIOWriteTest.java  | 258 ++
 .../beam/sdk/io/gcp/spanner/SpannerReadIT.java  | 169 +++
 15 files changed, 1432 insertions(+), 395 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/a21a6d79/pom.xml
--
diff --git a/pom.xml b/pom.xml
index f06568b..069191c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -161,6 +161,7 @@
 -Werror
 
-Xpkginfo:always
 nothing
+0.20.0
   
 
   pom
@@ -638,6 +639,12 @@
   
 
   
+com.google.api
+gax-grpc
+${gax-grpc.version}
+  
+
+  
 com.google.api-client
 google-api-client
 ${google-clients.version}
@@ -852,6 +859,11 @@
   
 
   
+com.google.cloud
+google-cloud-core-grpc
+${grpc.version}
+  
+  
 com.google.cloud.bigtable
 bigtable-protos
 ${bigtable.version}

http://git-wip-us.apache.org/repos/asf/beam/blob/a21a6d79/sdks/java/io/google-cloud-platform/pom.xml
--
diff --git a/sdks/java/io/google-cloud-platform/pom.xml 
b/sdks/java/io/google-cloud-platform/pom.xml
index 6737eea..94066c7 100644
--- a/sdks/java/io/google-cloud-platform/pom.xml
+++ b/sdks/java/io/google-cloud-platform/pom.xml
@@ -93,7 +93,12 @@
 
 
   com.google.api
-  api-common
+  gax-grpc
+
+
+
+  com.google.cloud
+  google-cloud-core-grpc
 
 
 
@@ -255,12 +260,17 @@
 
 
   org.apache.commons
-  commons-text
-  test
+  commons-lang3
+  provided
 
 
 
 
+  org.apache.commons
+  commons-text
+  test
+
+
   org.apache.beam
   beam-sdks-java-core
   tests

http://git-wip-us.apache.org/repos/asf/beam/blob/a21a6d79/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
--
diff --git 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
index 08f7fa9..8f1 100644
--- 
a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
+++ 
b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/AbstractSpannerFn.java
@@ -1,3 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
 package org.apache.beam.sdk.io.gcp.spanner;
 
 import com.google.cloud.spanner.DatabaseClient;


[jira] [Created] (BEAM-2527) Test queries on unbounded PCollections with BeamSql dsl API

2017-06-27 Thread Xu Mingmin (JIRA)
Xu Mingmin created BEAM-2527:


 Summary: Test queries on unbounded PCollections with BeamSql dsl 
API
 Key: BEAM-2527
 URL: https://issues.apache.org/jira/browse/BEAM-2527
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Xu Mingmin
Assignee: Xu Mingmin


Beam SQL supports both BOUNDED and UNBOUNDED PCollections. 

This task will add unit tests for UNBOUNDED PCollection. For BOUNDED 
PCollection, see BEAM-2452.






[GitHub] beam pull request #3457: Pipeline

2017-06-27 Thread jasonkuster
Github user jasonkuster closed the pull request at:

https://github.com/apache/beam/pull/3457




[GitHub] beam pull request #3457: Pipeline

2017-06-27 Thread jasonkuster
GitHub user jasonkuster opened a pull request:

https://github.com/apache/beam/pull/3457

Pipeline

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/beam-testing/beam pipeline

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3457


commit 783f463ea8701ffa7b0a95e1f0a91459ca0beecd
Author: beam-testing 
Date:   2017-04-30T23:32:37Z

Create Jenkinsfile

commit 9e3b78781f9d1abf821efe9f3f2f8c65eff06f78
Author: beam-testing 
Date:   2017-04-30T23:50:19Z

Update Jenkinsfile

commit b448e9df6405ac6610f864239a341a810dd130d8
Author: beam-testing 
Date:   2017-04-30T23:58:02Z

Update Jenkinsfile

commit c5a571c53623d8f52ab9d9937a5ef72244e34354
Author: beam-testing 
Date:   2017-05-01T01:08:40Z

Update Jenkinsfile

commit 3db1f08c1f0140f56713adc155b01304dc28f4a4
Author: beam-testing 
Date:   2017-05-01T01:26:32Z

Update Jenkinsfile

commit 5d4f6cb82bfad1d577d09d4521a97cbc9e123813
Author: beam-testing 
Date:   2017-05-01T01:40:32Z

Update Jenkinsfile

commit dd6cbda07a5c741f377f9e7a7fc6ef04c46b5ac7
Author: beam-testing 
Date:   2017-05-01T17:39:37Z

Update Jenkinsfile

commit 4b66735ce0d24df1e87e8124c84228b29fa551f4
Author: Jason Kuster 
Date:   2017-06-28T01:39:02Z

update properties

Signed-off-by: Jason Kuster 

commit b7bdd80a3cc445d9a4bfe1b3211259e29413988a
Author: Jason Kuster 
Date:   2017-06-28T01:39:57Z

set proj url

Signed-off-by: Jason Kuster 

commit e80bbdbbb1cecff6494d77a4af34130912c043df
Author: Jason Kuster 
Date:   2017-06-28T01:41:50Z

right owner

Signed-off-by: Jason Kuster 

commit d3eb2e7df5ef25ca0f6879f7c67c388ed5606617
Author: Jason Kuster 
Date:   2017-06-28T01:45:29Z

trigger off of ghprb

Signed-off-by: Jason Kuster 






[jira] [Resolved] (BEAM-2526) Add direct-runner as runtime dependency and default profile

2017-06-27 Thread Manu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang resolved BEAM-2526.
--
   Resolution: Duplicate
Fix Version/s: Not applicable

> Add direct-runner as runtime dependency and default profile
> ---
>
> Key: BEAM-2526
> URL: https://issues.apache.org/jira/browse/BEAM-2526
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: Not applicable
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>Priority: Minor
> Fix For: Not applicable
>
>






[jira] [Commented] (BEAM-2526) Add direct-runner as runtime dependency and default profile

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065779#comment-16065779
 ] 

ASF GitHub Bot commented on BEAM-2526:
--

Github user manuzhang closed the pull request at:

https://github.com/apache/beam/pull/3456


> Add direct-runner as runtime dependency and default profile
> ---
>
> Key: BEAM-2526
> URL: https://issues.apache.org/jira/browse/BEAM-2526
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: Not applicable
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>Priority: Minor
>






[GitHub] beam pull request #3456: [BEAM-2526] Add direct-runner as runtime dependency...

2017-06-27 Thread manuzhang
Github user manuzhang closed the pull request at:

https://github.com/apache/beam/pull/3456




[1/2] beam git commit: Small fixes to make the example run in a runner agnostic way: - Add direct runner default profile - Add findbugs validation and fix existing findbugs issues - Validate division

2017-06-27 Thread takidau
Repository: beam
Updated Branches:
  refs/heads/DSL_SQL bd99528af -> ab4b11886


Small fixes to make the example run in a runner agnostic way:
- Add direct runner default profile
- Add findbugs validation and fix existing findbugs issues
- Validate division by zero on arithmetic expression + other minor fixes
- Update Calcite version to 1.13


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/32fbc9ce
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/32fbc9ce
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/32fbc9ce

Branch: refs/heads/DSL_SQL
Commit: 32fbc9cec1d0d86e04e3f453b0d75f2ff0e61b56
Parents: bd99528
Author: Ismaël Mejía 
Authored: Mon Jun 26 16:37:51 2017 +0200
Committer: Tyler Akidau 
Committed: Tue Jun 27 18:24:41 2017 -0700

--
 dsls/pom.xml| 14 ++
 dsls/sql/pom.xml| 48 +---
 .../java/org/apache/beam/dsls/sql/BeamSql.java  |  4 +-
 .../beam/dsls/sql/example/BeamSqlExample.java   |  2 +-
 .../interpreter/operator/BeamSqlExpression.java |  2 +-
 .../interpreter/operator/BeamSqlPrimitive.java  |  4 +-
 .../arithmetic/BeamSqlArithmeticExpression.java |  7 ++-
 .../arithmetic/BeamSqlDivideExpression.java |  3 ++
 .../operator/logical/BeamSqlNotExpression.java  |  1 -
 .../operator/math/BeamSqlAbsExpression.java |  2 +
 .../operator/math/BeamSqlRoundExpression.java   |  3 +-
 .../operator/math/BeamSqlSignExpression.java|  2 +
 .../beam/dsls/sql/planner/BeamQueryPlanner.java |  2 +-
 .../apache/beam/dsls/sql/rel/BeamIOSinkRel.java |  3 +-
 .../beam/dsls/sql/rel/BeamIOSourceRel.java  |  4 +-
 .../dsls/sql/schema/BeamPCollectionTable.java   |  2 +-
 .../apache/beam/dsls/sql/schema/BeamSqlRow.java |  5 +-
 .../beam/dsls/sql/schema/BeamTableUtils.java|  4 +-
 .../transform/BeamSetOperatorsTransforms.java   |  5 +-
 .../interpreter/BeamSqlFnExecutorTestBase.java  |  2 +-
 20 files changed, 76 insertions(+), 43 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/32fbc9ce/dsls/pom.xml
--
diff --git a/dsls/pom.xml b/dsls/pom.xml
index a741563..d932698 100644
--- a/dsls/pom.xml
+++ b/dsls/pom.xml
@@ -34,6 +34,20 @@
 sql
   
 
+  
+
+  release
+  
+
+  
+org.codehaus.mojo
+findbugs-maven-plugin
+  
+
+  
+
+  
+
   
 
   

http://git-wip-us.apache.org/repos/asf/beam/blob/32fbc9ce/dsls/sql/pom.xml
--
diff --git a/dsls/sql/pom.xml b/dsls/sql/pom.xml
index d866313..a2279d5 100644
--- a/dsls/sql/pom.xml
+++ b/dsls/sql/pom.xml
@@ -18,11 +18,14 @@
 http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;
   xmlns="http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;>
+
   4.0.0
+
   
 org.apache.beam
 beam-dsls-parent
 2.1.0-SNAPSHOT
+../pom.xml
   
 
   beam-dsls-sql
@@ -34,10 +37,30 @@
   
 ${maven.build.timestamp}
 -MM-dd 
HH:mm
-1.12.0
-1.9.0
+1.13.0
+1.10.0
   
 
+  
+
+
+  direct-runner
+  
+true
+  
+  
+
+  org.apache.beam
+  beam-runners-direct-java
+  runtime
+
+  
+
+  
+
   
 
   
@@ -62,11 +85,6 @@
 
   
 org.apache.maven.plugins
-maven-compiler-plugin
-  
-
-  
-org.apache.maven.plugins
 maven-surefire-plugin
 
 -da 
@@ -75,11 +93,6 @@
 
   
 org.apache.maven.plugins
-maven-jar-plugin
-  
-
-  
-org.apache.maven.plugins
 maven-shade-plugin
 
   
@@ -140,11 +153,6 @@
 
 
   org.apache.beam
-  beam-runners-direct-java
-  provided
-
-
-  org.apache.beam
   beam-sdks-java-io-kafka
   provided
 
@@ -195,5 +203,11 @@
   auto-value
   provided
 
+
+
+  org.apache.beam
+  beam-runners-direct-java
+  test
+
   
 

http://git-wip-us.apache.org/repos/asf/beam/blob/32fbc9ce/dsls/sql/src/main/java/org/apache/beam/dsls/sql/BeamSql.java
--
diff --git a/dsls/sql/src/main/java/org/apache/beam/dsls/sql/BeamSql.java 
b/dsls/sql/src/main/java/org/apache/beam/dsls/sql/BeamSql.java
index 5f90380..a0e7cbc 100644
--- a/dsls/sql/src/main/java/org/apache/beam/dsls/sql/BeamSql.java
+++ b/dsls/sql/src/main/java/org/apache/beam/dsls/sql/BeamSql.java
@@ -103,7 +103,7 @@ public class BeamSql {
*/
   private static class QueryTransform extends
   PTransform

[2/2] beam git commit: This closes #3439

2017-06-27 Thread takidau
This closes #3439


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/ab4b1188
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/ab4b1188
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/ab4b1188

Branch: refs/heads/DSL_SQL
Commit: ab4b118869070e94a4205744d6d60525d3fa2882
Parents: bd99528 32fbc9c
Author: Tyler Akidau 
Authored: Tue Jun 27 18:25:56 2017 -0700
Committer: Tyler Akidau 
Committed: Tue Jun 27 18:25:56 2017 -0700

--
 dsls/pom.xml| 14 ++
 dsls/sql/pom.xml| 48 +---
 .../java/org/apache/beam/dsls/sql/BeamSql.java  |  4 +-
 .../beam/dsls/sql/example/BeamSqlExample.java   |  2 +-
 .../interpreter/operator/BeamSqlExpression.java |  2 +-
 .../interpreter/operator/BeamSqlPrimitive.java  |  4 +-
 .../arithmetic/BeamSqlArithmeticExpression.java |  7 ++-
 .../arithmetic/BeamSqlDivideExpression.java |  3 ++
 .../operator/logical/BeamSqlNotExpression.java  |  1 -
 .../operator/math/BeamSqlAbsExpression.java |  2 +
 .../operator/math/BeamSqlRoundExpression.java   |  3 +-
 .../operator/math/BeamSqlSignExpression.java|  2 +
 .../beam/dsls/sql/planner/BeamQueryPlanner.java |  2 +-
 .../apache/beam/dsls/sql/rel/BeamIOSinkRel.java |  3 +-
 .../beam/dsls/sql/rel/BeamIOSourceRel.java  |  4 +-
 .../dsls/sql/schema/BeamPCollectionTable.java   |  2 +-
 .../apache/beam/dsls/sql/schema/BeamSqlRow.java |  5 +-
 .../beam/dsls/sql/schema/BeamTableUtils.java|  4 +-
 .../transform/BeamSetOperatorsTransforms.java   |  5 +-
 .../interpreter/BeamSqlFnExecutorTestBase.java  |  2 +-
 20 files changed, 76 insertions(+), 43 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4232

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2526) Add direct-runner as runtime dependency and default profile

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065752#comment-16065752
 ] 

ASF GitHub Bot commented on BEAM-2526:
--

GitHub user manuzhang opened a pull request:

https://github.com/apache/beam/pull/3456

[BEAM-2526] Add direct-runner as runtime dependency and default profile

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/beam sql

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3456.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3456


commit 3130f47921b4012d0800bb0ef5ea3efe4b767ff9
Author: manuzhang 
Date:   2017-06-28T00:56:37Z

[BEAM-2526] Add direct-runner as runtime dependency and default profile




> Add direct-runner as runtime dependency and default profile
> ---
>
> Key: BEAM-2526
> URL: https://issues.apache.org/jira/browse/BEAM-2526
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: Not applicable
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>Priority: Minor
>






[jira] [Assigned] (BEAM-2393) BoundedSource is not fault-tolerant in FlinkRunner Streaming mode

2017-06-27 Thread Jingsong Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingsong Lee reassigned BEAM-2393:
--

Assignee: Jingsong Lee

> BoundedSource is not fault-tolerant in FlinkRunner Streaming mode
> -
>
> Key: BEAM-2393
> URL: https://issues.apache.org/jira/browse/BEAM-2393
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Jingsong Lee
>Assignee: Jingsong Lee
>
> {{BoundedSourceWrapper}} does not implement snapshot() and restore(); when it
> restarts after a failure, it will send duplicate data.





[GitHub] beam pull request #3456: [BEAM-2526] Add direct-runner as runtime dependency...

2017-06-27 Thread manuzhang
GitHub user manuzhang opened a pull request:

https://github.com/apache/beam/pull/3456

[BEAM-2526] Add direct-runner as runtime dependency and default profile

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/beam sql

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3456.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3456


commit 3130f47921b4012d0800bb0ef5ea3efe4b767ff9
Author: manuzhang 
Date:   2017-06-28T00:56:37Z

[BEAM-2526] Add direct-runner as runtime dependency and default profile






[jira] [Commented] (BEAM-2393) BoundedSource is not fault-tolerant in FlinkRunner Streaming mode

2017-06-27 Thread Jingsong Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065750#comment-16065750
 ] 

Jingsong Lee commented on BEAM-2393:


The {{UnboundedSourceWrapper}} now supports exiting when the 
watermark exceeds TIMESTAMP_MAX_VALUE. 
So can we use {{BoundedToUnboundedSourceAdapter}}?

bq. Checkpoints are created by calling {{BoundedReader#splitAtFraction}} on 
inner {{BoundedSource}}.
bq. Sources that cannot be split are read entirely into memory, so this 
transform does not work well with large, unsplittable sources.

But at least we can provide accurate semantics.

> BoundedSource is not fault-tolerant in FlinkRunner Streaming mode
> -
>
> Key: BEAM-2393
> URL: https://issues.apache.org/jira/browse/BEAM-2393
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Jingsong Lee
>
> {{BoundedSourceWrapper}} does not implement snapshot() and restore(); when it
> restarts after a failure, it will send duplicate data.





[jira] [Created] (BEAM-2526) Add direct-runner as runtime dependency and default profile

2017-06-27 Thread Manu Zhang (JIRA)
Manu Zhang created BEAM-2526:


 Summary: Add direct-runner as runtime dependency and default 
profile
 Key: BEAM-2526
 URL: https://issues.apache.org/jira/browse/BEAM-2526
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Affects Versions: Not applicable
Reporter: Manu Zhang
Assignee: Manu Zhang
Priority: Minor








Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2493

2017-06-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4231

2017-06-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2492

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065663#comment-16065663
 ] 

Kenneth Knowles commented on BEAM-2140:
---

Yes, your last comment is the reason. If the watermark of the Create doesn't 
advance past _n_, then as far as ParDo(read forever) knows, it could receive a 
topic timestamped _n_ that will result in pulling elements from pubsub 
timestamped _n_. So you have a stuck pipeline. This highlights the issue we've 
discussed that a per-PCollection watermark is not really ideal for SDF, but I 
think that scope is too big for this ticket.

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.





[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065661#comment-16065661
 ] 

Eugene Kirpichov commented on BEAM-2140:


Working backwards from that, in the "read Pubsub topic names from Kafka" case: 
let "topicsPC" be the PCollection of topic names, and "recordsPC" be the 
PCollection of records read from these topics.

We want the watermark of "recordsPC" to be a lower bound on timestamps of new 
records that will ever be read from any of the current or future topics.
Currently known elements of topicsPC already provide this bound for their 
records via the watermark hold, but for elements of recordsPC that will be 
produced from future elements of topicsPC, the only information we have is the 
watermark of topicsPC.

So, the ideal observable watermark behavior is as follows:
- elements in topicsPC have timestamps, and timestamp of an element in topicsPC 
is a reasonable starting watermark for elements produced from this topic into 
recordsPC.
- watermark of recordsPC should be min(current watermark holds set by currently 
pending element/restriction pairs from topicsPC, watermark of topicsPC itself) 
- the first term describes what records can arrive from currently read topics, 
the second term, from future topics, due to bullet 1.
- in the special case where topicsPC is bounded (e.g. Create.of()) and its 
watermark has advanced to infinity, this reduces to just the current watermark 
holds, which is correct.

Now, our problem is that if watermark of topicsPC advances to infinity (e.g. 
because it was bounded and we've processed the initial ProcessElement calls for 
its element/restriction pairs), the runner thinks that it's a promise that 
"likely nothing new will appear in this PCollection" which is not true of the 
processing-time timer set by SDF.

On the other hand, if we hold the watermark of topicsPC at the original ancient 
timestamp of the element/restriction pair, the runner will interpret it as "I 
can only promise you that new elements/timers in topicsPC will have a timestamp 
later than this ancient timestamp" which is unnecessarily restrictive - in 
reality, new elements/timers in topicsPC will either come from the transform 
that produces topicsPC, or from new processing-time timers scheduled by the 
currently read topics, and watermark should be min(these).

How do we make a promise about future event-time timestamps of processing-time 
timers? You say "Currently processing time timers are treated as inputs with a 
timestamp equal to the input watermark at the moment of their arrival" - which 
would be the current watermark of "topicsPC" I suppose? I think that would be 
consistent with the desired behavior above.

(clearly more thought is needed, but thought I'd dump this anyway)
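
As a single expression, the rule in the second bullet above is (LaTeX notation, no 
new semantics intended):

  \mathrm{wm}(\mathrm{recordsPC}) \;=\; \min\Big( \min_{(e,r)\,\in\,\mathrm{pending}(\mathrm{topicsPC})} \mathrm{hold}(e,r),\;\; \mathrm{wm}(\mathrm{topicsPC}) \Big)

where pending(topicsPC) is the set of element/restriction pairs from topicsPC whose 
ProcessElement work is not yet finished, and hold(e,r) is the watermark hold currently 
set for that pair.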

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4230

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2524) Update Google Cloud Console URL returned by DataflowRunner to support regions.

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065647#comment-16065647
 ] 

ASF GitHub Bot commented on BEAM-2524:
--

GitHub user lostluck opened a pull request:

https://github.com/apache/beam/pull/3455

[BEAM-2524] Update Dataflow FE URLs with regionalized paths.

Updates both the Java and the Python Dataflow Runners with the regionalized 
form of the Google Cloud Dataflow FE URL in anticipation of Regional Dataflow.


https://console.cloud.corp.google.com/dataflow/jobsDetail/locations//jobs/?project=

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lostluck/beam regionalize

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3455


commit 9ee103b25b232cbd53da05510d45c2cec2a8676a
Author: Robert Burke 
Date:   2017-06-27T22:41:56Z

BEAM-2524 Update Google Cloud Dataflow FE URLs from the Dataflow Runners to 
regionalized paths.




> Update Google Cloud Console URL returned by DataflowRunner to support regions.
> --
>
> Key: BEAM-2524
> URL: https://issues.apache.org/jira/browse/BEAM-2524
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Robert Burke
>Assignee: Robert Burke
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Both the Java and Python Dataflow Runners need to be updated with a 
> regionalized form of the Google Cloud Console URL to support multiple 
> Dataflow Regions.
> The new URL format will be:
> https://console.cloud.corp.google.com/dataflow/jobsDetail/locations//jobs/?project=



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3455: [BEAM-2524] Update Dataflow FE URLs with regionaliz...

2017-06-27 Thread lostluck
GitHub user lostluck opened a pull request:

https://github.com/apache/beam/pull/3455

[BEAM-2524] Update Dataflow FE URLs with regionalized paths.

Updates both the Java and the Python Dataflow Runners with the regionalized 
form of the Google Cloud Dataflow FE URL in anticipation of Regional Dataflow.


https://console.cloud.corp.google.com/dataflow/jobsDetail/locations//jobs/?project=

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lostluck/beam regionalize

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3455


commit 9ee103b25b232cbd53da05510d45c2cec2a8676a
Author: Robert Burke 
Date:   2017-06-27T22:41:56Z

BEAM-2524 Update Google Cloud Dataflow FE URLs from the Dataflow Runners to 
regionalized paths.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-1187) GCP Transport not performing timed backoff after connection failure

2017-06-27 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-1187:

Description: 
The http request retries are failing and seemingly being immediately retried if 
there is a connection exception. Note that below all the times are the same, 
and also that we are logging too much. This seems to be related to the 
interaction by the chaining http request initializer combining the Credential 
initializer followed by the RetryHttpRequestInitializer. Also, note that we 
never log "Request failed with IOException, will NOT retry" which implies that 
the retry logic never made it to the RetryHttpRequestInitializer.

Action items are:
1) Ensure that the RetryHttpRequestInitializer is used
2) Ensure that calls do backoff
3) PR/3430 -Reduce the logging to one terminal statement saying that we retried 
X times and final failure was YYY-

Dump of console output:
Dec 20, 2016 9:12:20 AM 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from 
the classpath: will stage 1 files. Enable logging at DEBUG level to see which 
files will be staged.
Dec 20, 2016 9:12:21 AM 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing 
implications related to Google Compute Engine usage and other Google Cloud 
Services.
Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
stageClasspathElements
INFO: Uploading 1 files from PipelineOptions.filesToStage to staging location 
to prepare for execution.
Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
stageClasspathElements
INFO: Uploading PipelineOptions.filesToStage complete: 1 files newly uploaded, 
0 files cached
Dec 20, 2016 9:12:22 AM com.google.api.client.http.HttpRequest execute
WARNING: exception thrown while executing request
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at 
sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
at 
sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
at 
com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:632)
at 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:201)
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:181)
at 
com.google.cloud.dataflow.integration.NumbersStreaming.numbersStreamingFromPubsub(NumbersStreaming.java:378)
at 
com.google.cloud.dataflow.integration.NumbersStreaming.main(NumbersStreaming.java:831)

Dec 20, 2016 9:12:22 AM com.google.api.client.http.HttpRequest execute
WARNING: exception thrown while executing request
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at 

[jira] [Updated] (BEAM-1187) GCP Transport not performing timed backoff after connection failure

2017-06-27 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-1187:

Description: 
The http request retries are failing and seemingly being immediately retried if 
there is a connection exception. Note that below all the times are the same, 
and also that we are logging too much. This seems to be related to the 
interaction by the chaining http request initializer combining the Credential 
initializer followed by the RetryHttpRequestInitializer. Also, note that we 
never log "Request failed with IOException, will NOT retry" which implies that 
the retry logic never made it to the RetryHttpRequestInitializer.

Action items are:
1) Ensure that the RetryHttpRequestInitializer is used
2) Ensure that calls do backoff
3) PR/3430 -Reduce the logging to one terminal statement saying that we retried 
X times and final failure was YYY. -

Dump of console output:
Dec 20, 2016 9:12:20 AM 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from 
the classpath: will stage 1 files. Enable logging at DEBUG level to see which 
files will be staged.
Dec 20, 2016 9:12:21 AM 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing 
implications related to Google Compute Engine usage and other Google Cloud 
Services.
Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
stageClasspathElements
INFO: Uploading 1 files from PipelineOptions.filesToStage to staging location 
to prepare for execution.
Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
stageClasspathElements
INFO: Uploading PipelineOptions.filesToStage complete: 1 files newly uploaded, 
0 files cached
Dec 20, 2016 9:12:22 AM com.google.api.client.http.HttpRequest execute
WARNING: exception thrown while executing request
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at 
sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
at 
sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
at 
com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:632)
at 
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:201)
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:181)
at 
com.google.cloud.dataflow.integration.NumbersStreaming.numbersStreamingFromPubsub(NumbersStreaming.java:378)
at 
com.google.cloud.dataflow.integration.NumbersStreaming.main(NumbersStreaming.java:831)

Dec 20, 2016 9:12:22 AM com.google.api.client.http.HttpRequest execute
WARNING: exception thrown while executing request
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at 

[GitHub] beam pull request #3430: [BEAM-1187] Improve logging to contain the number o...

2017-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3430


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-1187) GCP Transport not performing timed backoff after connection failure

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065642#comment-16065642
 ] 

ASF GitHub Bot commented on BEAM-1187:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3430


> GCP Transport not performing timed backoff after connection failure
> ---
>
> Key: BEAM-1187
> URL: https://issues.apache.org/jira/browse/BEAM-1187
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, sdk-java-core, sdk-java-gcp
>Reporter: Luke Cwik
>Priority: Minor
>
> The http request retries are failing and seemingly being immediately retried 
> if there is a connection exception. Note that below all the times are the 
> same, and also that we are logging too much. This seems to be related to the 
> interaction by the chaining http request initializer combining the Credential 
> initializer followed by the RetryHttpRequestInitializer. Also, note that we 
> never log "Request failed with IOException, will NOT retry" which implies 
> that the retry logic never made it to the RetryHttpRequestInitializer.
> Action items are:
> 1) Ensure that the RetryHttpRequestInitializer is used
> 2) Ensure that calls do backoff
> 3) Reduce the logging to one terminal statement saying that we retried X 
> times and final failure was YYY.
> Dump of console output:
> Dec 20, 2016 9:12:20 AM 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner fromOptions
> INFO: PipelineOptions.filesToStage was not specified. Defaulting to files 
> from the classpath: will stage 1 files. Enable logging at DEBUG level to see 
> which files will be staged.
> Dec 20, 2016 9:12:21 AM 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
> INFO: Executing pipeline on the Dataflow Service, which will have billing 
> implications related to Google Compute Engine usage and other Google Cloud 
> Services.
> Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
> stageClasspathElements
> INFO: Uploading 1 files from PipelineOptions.filesToStage to staging location 
> to prepare for execution.
> Dec 20, 2016 9:12:21 AM com.google.cloud.dataflow.sdk.util.PackageUtil 
> stageClasspathElements
> INFO: Uploading PipelineOptions.filesToStage complete: 1 files newly 
> uploaded, 0 files cached
> Dec 20, 2016 9:12:22 AM com.google.api.client.http.HttpRequest execute
> WARNING: exception thrown while executing request
> java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>   at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>   at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:589)
>   at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>   at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
>   at sun.net.www.http.HttpClient.New(HttpClient.java:308)
>   at sun.net.www.http.HttpClient.New(HttpClient.java:326)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1283)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1258)
>   at 
> com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
>   at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
>   at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
>   at 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:632)
>   at 
> com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.run(DataflowPipelineRunner.java:201)
>   at 

[2/2] beam git commit: [BEAM-1187] Improve logging to contain the number of retries done due to IOException and unsuccessful response codes

2017-06-27 Thread lcwik
[BEAM-1187] Improve logging to contain the number of retries done due to 
IOException and unsuccessful response codes

This closes #3430


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/0b19fb41
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/0b19fb41
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/0b19fb41

Branch: refs/heads/master
Commit: 0b19fb414322a27b7bd595465f34111f86f72a3c
Parents: beb21f4 03741ba
Author: Luke Cwik 
Authored: Tue Jun 27 16:09:23 2017 -0700
Committer: Luke Cwik 
Committed: Tue Jun 27 16:09:23 2017 -0700

--
 .../sdk/util/RetryHttpRequestInitializer.java   | 148 ---
 .../util/RetryHttpRequestInitializerTest.java   |  31 ++--
 2 files changed, 116 insertions(+), 63 deletions(-)
--




[1/2] beam git commit: [BEAM-1187] Improve logging to contain the number of retries done due to IOException and unsuccessful response codes.

2017-06-27 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master beb21f472 -> 0b19fb414


[BEAM-1187] Improve logging to contain the number of retries done due to 
IOException and unsuccessful response codes.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/03741ba0
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/03741ba0
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/03741ba0

Branch: refs/heads/master
Commit: 03741ba0b55e446cd2583e7ccc88f79ec21705b5
Parents: beb21f4
Author: Luke Cwik 
Authored: Fri Jun 23 09:32:49 2017 -0700
Committer: Luke Cwik 
Committed: Tue Jun 27 16:08:53 2017 -0700

--
 .../sdk/util/RetryHttpRequestInitializer.java   | 148 ---
 .../util/RetryHttpRequestInitializerTest.java   |  31 ++--
 2 files changed, 116 insertions(+), 63 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/03741ba0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java
--
diff --git 
a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java
 
b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java
index a23bee3..fd908cf 100644
--- 
a/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java
+++ 
b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/RetryHttpRequestInitializer.java
@@ -17,8 +17,9 @@
  */
 package org.apache.beam.sdk.util;
 
-import com.google.api.client.http.HttpBackOffIOExceptionHandler;
-import com.google.api.client.http.HttpBackOffUnsuccessfulResponseHandler;
+import static com.google.api.client.util.BackOffUtils.next;
+
+import com.google.api.client.http.HttpIOExceptionHandler;
 import com.google.api.client.http.HttpRequest;
 import com.google.api.client.http.HttpRequestInitializer;
 import com.google.api.client.http.HttpResponse;
@@ -60,65 +61,106 @@ public class RetryHttpRequestInitializer implements 
HttpRequestInitializer {
*/
   private static final int HANGING_GET_TIMEOUT_SEC = 80;
 
-  private static class LoggingHttpBackOffIOExceptionHandler
-  extends HttpBackOffIOExceptionHandler {
-public LoggingHttpBackOffIOExceptionHandler(BackOff backOff) {
-  super(backOff);
+  /** Handlers used to provide additional logging information on unsuccessful 
HTTP requests. */
+  private static class LoggingHttpBackOffHandler
+  implements HttpIOExceptionHandler, HttpUnsuccessfulResponseHandler {
+
+private final Sleeper sleeper;
+private final BackOff ioExceptionBackOff;
+private final BackOff unsuccessfulResponseBackOff;
+private final Set<Integer> ignoredResponseCodes;
+private int ioExceptionRetries;
+private int unsuccessfulResponseRetries;
+
+private LoggingHttpBackOffHandler(
+Sleeper sleeper,
+BackOff ioExceptionBackOff,
+BackOff unsucessfulResponseBackOff,
+Set<Integer> ignoredResponseCodes) {
+  this.sleeper = sleeper;
+  this.ioExceptionBackOff = ioExceptionBackOff;
+  this.unsuccessfulResponseBackOff = unsucessfulResponseBackOff;
+  this.ignoredResponseCodes = ignoredResponseCodes;
 }
 
 @Override
 public boolean handleIOException(HttpRequest request, boolean 
supportsRetry)
 throws IOException {
-  boolean willRetry = super.handleIOException(request, supportsRetry);
+  // We will retry if the request supports retry or the backoff was 
successful.
+  // Note that the order of these checks is important since
+  // backOffWasSuccessful will perform a sleep.
+  boolean willRetry = supportsRetry && 
backOffWasSuccessful(ioExceptionBackOff);
   if (willRetry) {
+ioExceptionRetries += 1;
 LOG.debug("Request failed with IOException, will retry: {}", 
request.getUrl());
   } else {
-LOG.warn(
-"Request failed with IOException (caller responsible for 
retrying): {}",
+String message = "Request failed with IOException, "
++ "performed {} retries due to IOExceptions, "
++ "performed {} retries due to unsuccessful status codes, "
++ "HTTP framework says request {} be retried, "
++ "(caller responsible for retrying): {}";
+LOG.warn(message,
+ioExceptionRetries,
+unsuccessfulResponseRetries,
+supportsRetry ? "can" : "cannot",
 request.getUrl());
   }
   return willRetry;
 }
-  }
-
-  private static class LoggingHttpBackoffUnsuccessfulResponseHandler
-  implements 
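
For context, a helper like the backOffWasSuccessful(...) call used in handleIOException 
above typically boils down to something like the following (an illustrative assumption, 
not necessarily the commit's exact body; only the statically imported BackOffUtils.next 
at the top of the diff is confirmed):

  private boolean backOffWasSuccessful(BackOff backOff) {
    try {
      // Sleeps for the next backoff interval; returns false once BackOff.STOP is
      // reached, i.e. no further retries should be attempted.
      return next(sleeper, backOff);
    } catch (InterruptedException | IOException e) {
      return false;
    }
  }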

[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065627#comment-16065627
 ] 

Eugene Kirpichov commented on BEAM-2140:


...Or is the problem that the watermark of the output PCollection has to be 
smaller than watermark of the input, so if input doesn't advance then output 
can't advance either?

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065626#comment-16065626
 ] 

Eugene Kirpichov commented on BEAM-2140:


_we can't advance the watermark as though it was non-splittable in the 
unbounded case_ - why is that / why is it a bad thing that the watermark of the 
PCollection being fed into the SDF would not advance? E.g. imagine it's a 
Create.of(pubsub topic name) + ParDo(read pubsub forever) - is it important to 
advance the watermark of the Create.of()?

Alternatively, imagine it's: read filepatterns from pubsub + 
TextIO.readAll().watchForNewFiles().watchFilesForNewEntries(), which has 
several SDFs in it. Would there be a problem with advancing the watermark of 
the PCollection of filepatterns only after the watch termination conditions of 
TextIO.readAll() are hit and this filepattern is no longer watched?

Alternatively - worst case I guess: read Pubsub topic names from Kafka, and 
read each topic forever. I'd assume that the user would be interested in 
advancement of the watermark of the PCollection of pubsub records rather than 
the PCollection of Pubsub topic names? I'm not sure the Pubsub topic names in 
Kafka would even need to have meaningful timestamps (rather than infinite past).

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3454: Add PubSub I/O support to Python DirectRunner

2017-06-27 Thread charlesccychen
GitHub user charlesccychen opened a pull request:

https://github.com/apache/beam/pull/3454

Add PubSub I/O support to Python DirectRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/charlesccychen/beam directrunner-pubsub

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3454


commit 5c0686d1148605cc4bbe5932b1949d8e58bdc55b
Author: Charles Chen 
Date:   2017-06-27T01:03:53Z

Add PubSub I/O support to Python DirectRunner




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2491

2017-06-27 Thread Apache Jenkins Server
See 




[GitHub] beam pull request #3453: Reflect #assignsToOneWindow in WindowingStrategy

2017-06-27 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3453

Reflect #assignsToOneWindow in WindowingStrategy

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam 
windowing_strategy_single_assignment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3453


commit d5392dcd5f4566b6a3b88b0b87a36f8fc26bf1ce
Author: Thomas Groh 
Date:   2017-06-27T22:03:11Z

Reflect #assignsToOneWindow in WindowingStrategy




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3452: Add WindowFn#assignsToOneWindow

2017-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3452


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: This closes #3452

2017-06-27 Thread tgroh
Repository: beam
Updated Branches:
  refs/heads/master 2365e7127 -> beb21f472


This closes #3452


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/beb21f47
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/beb21f47
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/beb21f47

Branch: refs/heads/master
Commit: beb21f472a32114c826a663eb635708ef0da1d69
Parents: 2365e71 7fee4b9
Author: Thomas Groh 
Authored: Tue Jun 27 15:03:58 2017 -0700
Committer: Thomas Groh 
Committed: Tue Jun 27 15:03:58 2017 -0700

--
 .../apache/beam/sdk/testing/StaticWindows.java  |  5 
 .../sdk/transforms/windowing/GlobalWindows.java |  5 
 .../windowing/PartitioningWindowFn.java |  5 
 .../transforms/windowing/SlidingWindows.java|  5 
 .../beam/sdk/transforms/windowing/WindowFn.java | 11 +++
 .../apache/beam/sdk/util/IdentityWindowFn.java  |  5 
 .../windowing/SlidingWindowsTest.java   | 30 
 7 files changed, 61 insertions(+), 5 deletions(-)
--




[2/2] beam git commit: Add WindowFn#assignsToOneWindow

2017-06-27 Thread tgroh
Add WindowFn#assignsToOneWindow


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/7fee4b93
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/7fee4b93
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/7fee4b93

Branch: refs/heads/master
Commit: 7fee4b93d5b548d390ab2511a91880b4c5e57a26
Parents: 2365e71
Author: Thomas Groh 
Authored: Tue Jun 27 14:23:22 2017 -0700
Committer: Thomas Groh 
Committed: Tue Jun 27 15:03:58 2017 -0700

--
 .../apache/beam/sdk/testing/StaticWindows.java  |  5 
 .../sdk/transforms/windowing/GlobalWindows.java |  5 
 .../windowing/PartitioningWindowFn.java |  5 
 .../transforms/windowing/SlidingWindows.java|  5 
 .../beam/sdk/transforms/windowing/WindowFn.java | 11 +++
 .../apache/beam/sdk/util/IdentityWindowFn.java  |  5 
 .../windowing/SlidingWindowsTest.java   | 30 
 7 files changed, 61 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/7fee4b93/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
index c11057a..eba6978 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/StaticWindows.java
@@ -126,4 +126,9 @@ final class StaticWindows extends 
NonMergingWindowFn {
   }
 };
   }
+
+  @Override
+  public boolean assignsToOneWindow() {
+return true;
+  }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/7fee4b93/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
index d48d26b..c68c497 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/GlobalWindows.java
@@ -79,6 +79,11 @@ public class GlobalWindows extends 
NonMergingWindowFn<Object, GlobalWindow> {
   }
 
   @Override
+  public boolean assignsToOneWindow() {
+return true;
+  }
+
+  @Override
   public boolean equals(Object other) {
 return other instanceof GlobalWindows;
   }

http://git-wip-us.apache.org/repos/asf/beam/blob/7fee4b93/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
index 40ee68a..341ba27 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PartitioningWindowFn.java
@@ -58,4 +58,9 @@ public abstract class PartitioningWindowFn
   public Instant getOutputTime(Instant inputTimestamp, W window) {
 return inputTimestamp;
   }
+
+  @Override
+  public final boolean assignsToOneWindow() {
+return true;
+  }
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/7fee4b93/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java
index f657884..150b956 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java
@@ -148,6 +148,11 @@ public class SlidingWindows extends 
NonMergingWindowFn<Object, IntervalWindow> {
   }
 
   @Override
+  public boolean assignsToOneWindow() {
+return !this.period.isShorterThan(this.size);
+  }
+
+  @Override
  public void verifyCompatibility(WindowFn<?, ?> other) throws 
IncompatibleWindowException {
 if (!this.isCompatible(other)) {
   throw new IncompatibleWindowException(
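
As a quick illustration of the new SlidingWindows#assignsToOneWindow above (a sketch, 
not part of the commit): sliding windows assign each element to exactly one window 
only when the period is at least the window size, i.e. the windows do not overlap.

import org.apache.beam.sdk.transforms.windowing.SlidingWindows;
import org.joda.time.Duration;

public class AssignsToOneWindowCheck {
  public static void main(String[] args) {
    SlidingWindows nonOverlapping =
        SlidingWindows.of(Duration.standardMinutes(10)).every(Duration.standardMinutes(10));
    SlidingWindows overlapping =
        SlidingWindows.of(Duration.standardMinutes(10)).every(Duration.standardMinutes(5));

    System.out.println(nonOverlapping.assignsToOneWindow()); // true: period >= size
    System.out.println(overlapping.assignsToOneWindow());    // false: windows overlap
  }
}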


[jira] [Created] (BEAM-2525) WindowFnTestUtils#runWindowFn should be removed

2017-06-27 Thread Thomas Groh (JIRA)
Thomas Groh created BEAM-2525:
-

 Summary: WindowFnTestUtils#runWindowFn should be removed
 Key: BEAM-2525
 URL: https://issues.apache.org/jira/browse/BEAM-2525
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Thomas Groh


Its use makes test methods (like most SlidingWindow tests) very difficult to 
read, as there's no obvious behavior that is being tested for.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3452: Add WindowFn#assignsToOneWindow

2017-06-27 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/3452

Add WindowFn#assignsToOneWindow

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam assign_to_one_window_fn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3452.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3452


commit 8a8ac5b68fee1158d6e960fbbb9ce6576e1b5e61
Author: Thomas Groh 
Date:   2017-06-27T21:23:22Z

Add WindowFn#assignsToOneWindow




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065478#comment-16065478
 ] 

Kenneth Knowles commented on BEAM-2140:
---

Ah, it is true that the input element is still pending. That is a very helpful 
perspective. So view it as a sort of NACK of the element as a whole, while 
saving the restriction to state. However, we can't advance the watermark as 
though it was non-splittable in the unbounded case; that would freeze the input 
watermark forever. (in the bounded case, I presume it is still the -inf to +inf 
story for all the same reasons as bounded sources)

Instead, perhaps the "amount consumed" of this element is measured by the 
watermark reporting done during processing of the element by the SDF. So the 
input watermark can move forward to that point. This would subsume output 
watermark holds, which seems nice.

I doubt this kind of hold makes sense outside of SDF. Maybe, with revised 
semantics as per my final section above, but I don't think we should take on 
the additional challenge of designing this in a safe user-facing way in order 
to push SDF forwards.

On the internal side you do need a way to manage the watermarks in the 
underlying engine. We should note that {{ProcessFn}} already is treated 
specially via a {{ProcessFnRunner}} within a Flink {{SplittableDoFnOperator}}. 
So we are assuming explicit runner support and we are talking about how the 
{{SplittableDoFnOperator}} communicates with Flink. I would actually suggest a 
"partial NACK with new watermark" style of API (just like ProcessContinuation) 
so that it is tightly coupled with the fact that the element should be 
re-delivered. I would focus a lot on making stuck pipelines impossible, since 
they are hard to debug.
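
For reference, the ProcessContinuation style looks roughly like this from the SDF 
author's side (a hedged sketch; signatures and package locations are assumptions from 
the Java SDF API of this period, not a definitive contract, and readRecord is a 
placeholder):

import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker;
import org.joda.time.Duration;

class ReadTopicFn extends DoFn<String, String> {
  @ProcessElement
  public ProcessContinuation processElement(ProcessContext c, OffsetRangeTracker tracker) {
    // tryClaim() returns false once the runner has checkpointed/truncated the restriction.
    for (long offset = tracker.currentRestriction().getFrom(); tracker.tryClaim(offset); ++offset) {
      c.output(readRecord(c.element(), offset));
    }
    // "Partial NACK": the element is not done; keep the residual restriction and
    // ask to be re-delivered after a delay.
    return ProcessContinuation.resume().withResumeDelay(Duration.standardSeconds(10));
  }

  @GetInitialRestriction
  public OffsetRange getInitialRestriction(String topic) {
    return new OffsetRange(0, Long.MAX_VALUE);
  }

  private String readRecord(String topic, long offset) {
    return topic + "@" + offset;  // placeholder
  }
}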

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4229

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Comment Edited] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065429#comment-16065429
 ] 

Eugene Kirpichov edited comment on BEAM-2140 at 6/27/17 8:38 PM:
-

Okay, I see that I misunderstood what the watermark hold does, and now I'm not 
sure how anything works at all (i.e. why timers set by SDF are not constantly 
dropped) - in direct and dataflow runner :-|

For SDF specifically, I think it would make sense to *advance the watermark of 
the input same as if the DoFn was not splittable* - i.e. consider the input 
element "consumed" only when the ProcessElement call terminates with no 
residual restriction. In other words, I guess, set an "input watermark hold" 
(in addition to output watermark hold)? Is such a thing possible? Does it make 
equal sense for non-splittable DoFn's that use timers?


was (Author: jkff):
Okay, I see that I misunderstood what the watermark hold does, and now I'm not 
sure how anything works at all (i.e. why timers set by SDF are not constantly 
dropped) - in direct and dataflow runner :-|

For SDF specifically, I think it would make sense to **advance the watermark of 
the input same as if the DoFn was not splittable** - i.e. consider the input 
element "consumed" only when the ProcessElement call terminates with no 
residual restriction. In other words, I guess, set an "input watermark hold" 
(in addition to output watermark hold)? Is such a thing possible? Does it make 
equal sense for non-splittable DoFn's that use timers?

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065429#comment-16065429
 ] 

Eugene Kirpichov commented on BEAM-2140:


Okay, I see that I misunderstood what the watermark hold does, and now I'm not 
sure how anything works at all (i.e. why timers set by SDF are not constantly 
dropped) - in direct and dataflow runner :-|

For SDF specifically, I think it would make sense to **advance the watermark of 
the input same as if the DoFn was not splittable** - i.e. consider the input 
element "consumed" only when the ProcessElement call terminates with no 
residual restriction. In other words, I guess, set an "input watermark hold" 
(in addition to output watermark hold)? Is such a thing possible? Does it make 
equal sense for non-splittable DoFn's that use timers?

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065398#comment-16065398
 ] 

Aljoscha Krettek commented on BEAM-2140:


I need to look into things a bit but I think your comment about needing a 
"loop" is right: at the end of the {{@ProcessElement}} method a processing-time 
timer is set for the current processing time. This is expected to "re-awake" 
the {{@ProcessElement}} method, thereby creating a loop that runs until we have 
no more residual restriction to process. (Or, as is the case now, until the 
watermark on the input goes to +Inf, thereby missing some processing.)

As a side note, the timers that {{SplittableDoFnOperator}} currently hands to 
{{ProcessFn}} (via {{StatefulDoFnRunner}}, which does late data dropping, i.e. 
also timer dropping) are wrapped in a {{WindowedValue}} with a global window 
and this global window is used for determining if the timer is late, which is 
only the case if the watermark went to +Inf. This is more by chance than by any 
conscious decision. 

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2490

2017-06-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4227

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2521) Simplify packaging for python distributions

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065340#comment-16065340
 ] 

ASF GitHub Bot commented on BEAM-2521:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3448


> Simplify packaging for python distributions
> ---
>
> Key: BEAM-2521
> URL: https://issues.apache.org/jira/browse/BEAM-2521
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
> Fix For: 2.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3448: [BEAM-2521] Use installed distribution name for sdk...

2017-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3448


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3448

2017-06-27 Thread altay
This closes #3448


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/2365e712
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/2365e712
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/2365e712

Branch: refs/heads/master
Commit: 2365e7127cd8c938d9a668af790e697cfe944c1b
Parents: 9dc40d6 d6855ac
Author: Ahmet Altay 
Authored: Tue Jun 27 12:29:03 2017 -0700
Committer: Ahmet Altay 
Committed: Tue Jun 27 12:29:03 2017 -0700

--
 .../apache_beam/runners/dataflow/internal/dependency.py  | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--




[1/2] beam git commit: Use installed distribution name for sdk name

2017-06-27 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master 9dc40d637 -> 2365e7127


Use installed distribution name for sdk name


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/d6855ac6
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/d6855ac6
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/d6855ac6

Branch: refs/heads/master
Commit: d6855ac6797b8f83bf57b6ccdaf20bbf3db316c6
Parents: 9dc40d6
Author: Ahmet Altay 
Authored: Mon Jun 26 23:22:36 2017 -0700
Committer: Ahmet Altay 
Committed: Tue Jun 27 12:29:00 2017 -0700

--
 .../apache_beam/runners/dataflow/internal/dependency.py  | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/d6855ac6/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
--
diff --git a/sdks/python/apache_beam/runners/dataflow/internal/dependency.py 
b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
index 6d4a703..03e1794 100644
--- a/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
+++ b/sdks/python/apache_beam/runners/dataflow/internal/dependency.py
@@ -500,11 +500,13 @@ def get_sdk_name_and_version():
   """For internal use only; no backwards-compatibility guarantees.
 
   Returns name and version of SDK reported to Google Cloud Dataflow."""
-  # TODO(ccy): Make this check cleaner.
+  import pkg_resources as pkg
   container_version = get_required_container_version()
-  if container_version == BEAM_CONTAINER_VERSION:
+  try:
+pkg.get_distribution(GOOGLE_PACKAGE_NAME)
+return ('Google Cloud Dataflow SDK for Python', container_version)
+  except pkg.DistributionNotFound:
 return ('Apache Beam SDK for Python', beam_version.__version__)
-  return ('Google Cloud Dataflow SDK for Python', container_version)
 
 
 def get_sdk_package_name():



Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4228

2017-06-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2489

2017-06-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4226

2017-06-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2488

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Comment Edited] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065235#comment-16065235
 ] 

Kenneth Knowles edited comment on BEAM-2140 at 6/27/17 6:14 PM:


I considered for a long time what should happen with processing time triggers 
as far as window expiry. We spent quite some time coming up with the semantics 
at https://s.apache.org/beam-lateness#heading=h.hot1g47sz45s, long before Beam. 
I don't claim it is perfect (it is way too complex, for one) but it represents 
a lot of thought by lots of people. I think actually it does give some choices.

* an input being droppable does not necessarily mean you are required to drop 
it (some transforms may falter on droppable inputs, but that is specific to the 
transform)
* input timestamp and output timestamp are decoupled, so you can reason about 
whether to ignore input based on whether the resulting output would be droppable

Some possibilities that I've considered (some break the model):

*Treat processing time timers as inputs with some timestamp at EOW or some such*

The theme that timers are inputs is basically valid. We gain clarity by not 
conflating them in APIs and discussions. But how they interact with watermarks, 
etc, should be basically compatible. Currently processing time timers are 
treated as inputs with a timestamp equal to the input watermark at the moment 
of their arrival. So this change would cause an input hold because there is a 
known upcoming element that just hasn't arrived.

In streaming: this holds things up too much. It also makes repeatedly firing 
after processing time cause an infinite loop, versus what happens today where 
it naturally goes through window expiry and GC.

In batch: this breaks the unified model for processing historical data in a 
batch mode. With the semantics as they exist today, the way that batch "runs" 
triggers and processing time timers (by ignoring them) is completely compatible 
with the semantics. So any user who writes a correct transform has good 
assurances that it will work in both modes. If processing time timers held 
watermarks like this they would need to be processed in batch mode, yet that 
contradicts the whole point of it.

We can omit unbounded SDFs from this unification issue, probably, but a 
bounded-per-element SDF should certainly work on streamed unbounded input as 
well as bounded input.

*Decide whether to drop a processing time timer not based on the input 
watermark but based on whether its output would be droppable*

This lets the input watermark advance, but still does not allow infinitely 
repeating processing time timers to terminate with window expiry automatically, 
and it still breaks the unified model. We could alleviate both issues by 
refusing to set new timers that would already be expired. I think this is just 
a rabbit hole of unnatural corner cases so we should avoid it.

*In addition to the processing time timers that ProcessFn sets, also set a GC 
timer*

This seems like a straightforward and good idea. These timers are also 
still run in batch mode for historical reprocessing.

Can you clarify how it does not work? Is it because you need to create a "loop" 
that continues to fire until the residual is gone? Currently, there is simply 
no way to make a perpetual loop with timers because of the commentary below.

*Treat event time timers as inputs with their given timestamp*

This would combine the GC timer idea and let you make a looping structure. This 
currently cannot work because timers fire only when the input watermark is 
strictly greater than their timestamp. The semantics of "on time" and "final 
GC" panes depends on this, so we'd have a lot of work to do. But I think there 
might be a consistent world where event time timers are treated as elements, 
and fire when the watermark arrives at their timestamp. {{@OnWindowExpiration}} 
is then absolutely required and cannot be simulated by a timer.
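
To make the GC-timer idea above concrete, here is a minimal sketch against the 
Beam Java SDK's user-facing state/timer API (not the actual ProcessFn 
internals); the class name, timer ids, the one-second resume delay, and the 
one minute of allowed lateness used for the GC timestamp are illustrative 
assumptions only.

import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class ResumeAndGcDoFn extends DoFn<KV<String, String>, String> {

  // Processing-time timer used to come back and continue work on the residual.
  @TimerId("resume")
  private final TimerSpec resumeSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  // Event-time timer set at (end of window + allowed lateness), so cleanup
  // still happens in batch/historical mode, where processing-time timers are
  // ignored.
  @TimerId("gc")
  private final TimerSpec gcSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      ProcessContext c,
      BoundedWindow window,
      @TimerId("resume") Timer resume,
      @TimerId("gc") Timer gc) {
    // Come back in a little while (processing time) to keep working.
    resume.offset(Duration.standardSeconds(1)).setRelative();
    // Also register cleanup at window expiry; one minute of allowed lateness
    // is an assumed value here.
    gc.set(window.maxTimestamp().plus(Duration.standardMinutes(1)));
  }

  @OnTimer("resume")
  public void onResume(OnTimerContext c) {
    // Continue processing the residual (omitted in this sketch).
  }

  @OnTimer("gc")
  public void onGc(OnTimerContext c) {
    // Final cleanup when the window expires; runs in streaming and in batch.
  }
}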


was (Author: kenn):
I considered for a long time what should happen with processing time triggers 
as far as window expiry. We spent quite some time coming up with the semantics 
at https://s.apache.org/beam-lateness#heading=h.hot1g47sz45s, long before Beam. 
I don't claim it is perfect (it is way too complex, for one) but it represents 
a lot of thought by lots of people. I think actually it does give some choices.

* an input being droppable does not necessarily mean you are required to drop 
it (some transforms may falter on droppable inputs, but that is specific to the 
transform)
* input timestamp and output timestamp are decoupled, so you can reason about 
whether to ignore input based on whether the resulting output would be droppable

Some possibilities that I think don't break the model:

*Treat processing time timers as inputs with some timestamp at EOW or some such*

The theme that timers are inputs is 

[jira] [Comment Edited] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065235#comment-16065235
 ] 

Kenneth Knowles edited comment on BEAM-2140 at 6/27/17 6:14 PM:


I considered for a long time what should happen with processing time triggers 
with respect to window expiry. We spent quite some time coming up with the 
semantics at https://s.apache.org/beam-lateness#heading=h.hot1g47sz45s, long 
before Beam. I don't claim it is perfect (it is way too complex, for one), but 
it represents a lot of thought by lots of people. I actually think it does give 
some choices.

* an input being droppable does not necessarily mean you are required to drop 
it (some transforms may falter on droppable inputs, but that is specific to the 
transform)
* input timestamp and output timestamp are decoupled, so you can reason about 
whether to ignore input based on whether the resulting output would be droppable

Some possibilities for SDF:

*Treat processing time timers as inputs with some timestamp at EOW or some such*

The theme that timers are inputs is basically valid. We gain clarity by not 
conflating them in APIs and discussions. But how they interact with watermarks, 
etc, should be basically compatible. Currently processing time timers are 
treated as inputs with a timestamp equal to the input watermark at the moment 
of their arrival. So this change would cause an input hold because there is a 
known upcoming element that just hasn't arrived.

In streaming: this holds things up too much. It also makes repeatedly firing 
after processing time cause an infinite loop, versus what happens today, where 
it naturally goes through window expiry and GC.

In batch: this breaks the unified model for processing historical data in a 
batch mode. With the semantics as they exist today, the way that batch "runs" 
triggers and processing time timers (by ignoring them) is completely compatible 
with the semantics. So any user who writes a correct transform has good 
assurance that it will work in both modes. If processing time timers held 
watermarks like this, they would need to be processed in batch mode, yet that 
contradicts the whole point of it.

We can omit unbounded SDFs from this unification issue, probably, but a 
bounded-per-element SDF should certainly work on streamed unbounded input as 
well as bounded input.

*Decide whether to drop a processing time timer not based on the input 
watermark but based on whether its output would be droppable*

This lets the input watermark advance, but still does not allow infinitely 
repeating processing time timers to terminate with window expiry automatically, 
and it still breaks the unified model. We could alleviate both issues by 
refusing to set new timers that would already be expired. I think this is just 
a rabbit hole of unnatural corner cases so we should avoid it.

*In addition to the processing time timers that ProcessFn sets, also set a GC 
timer*

This seems like a straightforward and good idea. These timers are also 
still run in batch mode for historical reprocessing.

Can you clarify how it does not work? Is it because you need to create a "loop" 
that continues to fire until the residual is gone? Currently, there is simply 
no way to make a perpetual loop with timers because of the commentary below.

*Treat event time timers as inputs with their given timestamp*

This would combine the GC timer idea and let you make a looping structure. This 
currently cannot work because timers fire only when the input watermark is 
strictly greater than their timestamp. The semantics of "on time" and "final 
GC" panes depends on this, so we'd have a lot of work to do. But I think there 
might be a consistent world where event time timers are treated as elements, 
and fire when the watermark arrives at their timestamp. {{@OnWindowExpiration}} 
is then absolutely required and cannot be simulated by a timer.
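
Since the question above is about making a timer-driven loop, here is a small 
illustrative sketch of the closest thing the current Java state/timer API 
offers: a processing-time timer that re-arms itself from its own callback. The 
class name, timer id, and one-second delay are assumptions, and this presumes 
a timer may be re-set from within its own @OnTimer method. As discussed above, 
this is not a perpetual loop: once the window expires the timer is no longer 
delivered, so the loop effectively ends at window GC.

import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class SelfRearmingDoFn extends DoFn<KV<String, Integer>, Integer> {

  @TimerId("resume")
  private final TimerSpec resumeSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void process(ProcessContext c, @TimerId("resume") Timer resume) {
    // Arm the loop the first time an element is seen for this key and window.
    resume.offset(Duration.standardSeconds(1)).setRelative();
  }

  @OnTimer("resume")
  public void onResume(OnTimerContext c, @TimerId("resume") Timer resume) {
    // Do a bounded chunk of work here (omitted), then re-arm. The runner stops
    // delivering this timer after window expiry, which is what terminates the
    // loop in practice.
    resume.offset(Duration.standardSeconds(1)).setRelative();
  }
}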


was (Author: kenn):
I considered for a long time what should happen with processing time triggers 
as far as window expiry. We spent quite some time coming up with the semantics 
at https://s.apache.org/beam-lateness#heading=h.hot1g47sz45s, long before Beam. 
I don't claim it is perfect (it is way too complex, for one) but it represents 
a lot of thought by lots of people. I think actually it does give some choices.

* an input being droppable does not necessarily mean you are required to drop 
it (some transforms may falter on droppable inputs, but that is specific to the 
transform)
* input timestamp and output timestamp are decoupled, so you can reason about 
whether to ignore input based on whether the resulting output would be droppable

Some possibilities that I've considered (some break the model):

*Treat processing time timers as inputs with some timestamp at EOW or some such*

The theme that timers are inputs is basically valid. We gain 

[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065235#comment-16065235
 ] 

Kenneth Knowles commented on BEAM-2140:
---

I considered for a long time what should happen with processing time triggers 
with respect to window expiry. We spent quite some time coming up with the 
semantics at https://s.apache.org/beam-lateness#heading=h.hot1g47sz45s, long 
before Beam. I don't claim it is perfect (it is way too complex, for one), but 
it represents a lot of thought by lots of people. I actually think it does give 
some choices.

* an input being droppable does not necessarily mean you are required to drop 
it (some transforms may falter on droppable inputs, but that is specific to the 
transform)
* input timestamp and output timestamp are decoupled, so you can reason about 
whether to ignore input based on whether the resulting output would be droppable

Some possibilities that I think don't break the model:

*Treat processing time timers as inputs with some timestamp at EOW or some such*

The theme that timers are inputs is basically valid. We gain clarity by not 
conflating them in APIs and discussions. But how they interact with watermarks, 
etc, should be basically compatible. Currently processing time timers are 
treated as inputs with a timestamp equal to the input watermark at the moment 
of their arrival. So this change would cause an input hold because there is a 
known upcoming element that just hasn't arrived.

In streaming: this holds things up too much. It also makes repeatedly firing 
after processing time cause an infinite loop, versus what happens today, where 
it naturally goes through window expiry and GC.

In batch: this breaks the unified model for processing historical data in a 
batch mode. With the semantics as they exist today, the way that batch "runs" 
triggers and processing time timers (by ignoring them) is completely compatible 
with the semantics. So any user who writes a correct transform has good 
assurance that it will work in both modes. If processing time timers held 
watermarks like this, they would need to be processed in batch mode, yet that 
contradicts the whole point of it.

We can omit unbounded SDFs from this unification issue, probably, but a 
bounded-per-element SDF should certainly work on streamed unbounded input as 
well as bounded input.

*Decide whether to drop a processing time timer not based on the input 
watermark but based on whether its output would be droppable*

This lets the input watermark advance, but still does not allow infinitely 
repeating processing time timers to terminate with window expiry automatically, 
and it still breaks the unified model. We could alleviate both issues by 
refusing to set new timers that would already be expired. I think this is just 
a rabbit hole of unnatural corner cases so we should avoid it.

*In addition to the processing time timers that ProcessFn sets, also set a GC 
timer*

This seems like a straightforward and good idea. These timers are also 
still run in batch mode for historical reprocessing.

Can you clarify how it does not work? Is it because you need to create a "loop" 
that continues to fire until the residual is gone? Currently, there is simply 
no way to make a perpetual loop with timers because of the commentary below.

*Treat event time timers as inputs with their given timestamp*

This would combine the GC timer idea and let you make a looping structure. This 
currently cannot work because timers fire only when the input watermark is 
strictly greater than their timestamp. The semantics of "on time" and "final 
GC" panes depends on this, so we'd have a lot of work to do. But I think there 
might be a consistent world where event time timers are treated as elements, 
and fire when the watermark arrives at their timestamp. {{@OnWindowExpiration}} 
is then absolutely required and cannot be simulated by a timer.

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-2522) upgrading jackson

2017-06-27 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-2522.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> upgrading jackson
> -
>
> Key: BEAM-2522
> URL: https://issues.apache.org/jira/browse/BEAM-2522
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Antony Mayi
>Assignee: Antony Mayi
>Priority: Minor
>  Labels: security
> Fix For: 2.1.0
>
>
> please consider upgrading jackson to mitigate its [deserialization 
> vulnerability in 
> 2.8.8|https://github.com/FasterXML/jackson-databind/issues/1599]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2522) upgrading jackson

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065213#comment-16065213
 ] 

ASF GitHub Bot commented on BEAM-2522:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3451


> upgrading jackson
> -
>
> Key: BEAM-2522
> URL: https://issues.apache.org/jira/browse/BEAM-2522
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Antony Mayi
>Assignee: Antony Mayi
>Priority: Minor
>  Labels: security
> Fix For: 2.1.0
>
>
> please consider upgrading jackson to mitigate its [deserialization 
> vulnerability in 
> 2.8.8|https://github.com/FasterXML/jackson-databind/issues/1599]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (BEAM-2522) upgrading jackson

2017-06-27 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-2522:

Affects Version/s: (was: 2.1.0)

> upgrading jackson
> -
>
> Key: BEAM-2522
> URL: https://issues.apache.org/jira/browse/BEAM-2522
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Affects Versions: 2.0.0
>Reporter: Antony Mayi
>Assignee: Antony Mayi
>Priority: Minor
>  Labels: security
> Fix For: 2.1.0
>
>
> please consider upgrading jackson to mitigate its [deserialization 
> vulnerability in 
> 2.8.8|https://github.com/FasterXML/jackson-databind/issues/1599]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3451: [BEAM-2522] upgrading jackson to 2.8.9 (mitigating ...

2017-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3451


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] beam git commit: [BEAM-2522] upgrading jackson to 2.8.9 (mitigating #1599)

2017-06-27 Thread lcwik
Repository: beam
Updated Branches:
  refs/heads/master 39074899a -> 9dc40d637


[BEAM-2522] upgrading jackson to 2.8.9 (mitigating #1599)


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/0c34db9b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/0c34db9b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/0c34db9b

Branch: refs/heads/master
Commit: 0c34db9b680c5cddf9874a7ccc86008c109068e0
Parents: 3907489
Author: Stepan Kadlec 
Authored: Tue Jun 27 09:12:47 2017 -0700
Committer: Luke Cwik 
Committed: Tue Jun 27 10:42:11 2017 -0700

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/0c34db9b/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 98cace9..29bb4eb 100644
--- a/pom.xml
+++ b/pom.xml
@@ -128,7 +128,7 @@
 1.2.0
 
0.1.9
 1.3
-2.8.8
+2.8.9
 3.0.1
 2.4
 4.12



[2/2] beam git commit: [BEAM-2522] upgrading jackson to 2.8.9 (mitigating #1599)

2017-06-27 Thread lcwik
[BEAM-2522] upgrading jackson to 2.8.9 (mitigating #1599)

This closes #3451


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/9dc40d63
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/9dc40d63
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/9dc40d63

Branch: refs/heads/master
Commit: 9dc40d6371dc3f0b0584b6445af68b574604fdf1
Parents: 3907489 0c34db9
Author: Luke Cwik 
Authored: Tue Jun 27 10:58:49 2017 -0700
Committer: Luke Cwik 
Committed: Tue Jun 27 10:58:49 2017 -0700

--
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--




[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065211#comment-16065211
 ] 

Kenneth Knowles commented on BEAM-2140:
---

Taking a step back, I don't feel strongly that it needs to be possible to 
implement SDF via StateInternals + TimerInternals, without any further runner 
awareness. If the natural semantics for state and timers don't fit, we 
shouldn't make them unnatural for this case. They are used for triggers and it 
has taken some time to mature the design, so changing their semantics has real 
risks. But these _are_ just internals, so we should also feel free to adjust 
them if we come up with a strong new design.

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2487

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Assigned] (BEAM-2524) Update Google Cloud Console URL returned by DataflowRunner to support regions.

2017-06-27 Thread Thomas Groh (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Groh reassigned BEAM-2524:
-

Assignee: Robert Burke  (was: Thomas Groh)

> Update Google Cloud Console URL returned by DataflowRunner to support regions.
> --
>
> Key: BEAM-2524
> URL: https://issues.apache.org/jira/browse/BEAM-2524
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Robert Burke
>Assignee: Robert Burke
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Both the Java and Python Dataflow Runners need to be updated with a 
> regionalized form of the Google Cloud Console URL to support multiple 
> Dataflow Regions.
> The new URL format will be:
> https://console.cloud.corp.google.com/dataflow/jobsDetail/locations//jobs/?project=



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4225

2017-06-27 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #4224

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-2524) Update Google Cloud Console URL returned by DataflowRunner to support regions.

2017-06-27 Thread Robert Burke (JIRA)
Robert Burke created BEAM-2524:
--

 Summary: Update Google Cloud Console URL returned by 
DataflowRunner to support regions.
 Key: BEAM-2524
 URL: https://issues.apache.org/jira/browse/BEAM-2524
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Robert Burke
Assignee: Thomas Groh


Both the Java and Python Dataflow Runners need to be updated with a 
regionalized form of the Google Cloud Console URL to support multiple Dataflow 
Regions.

The new URL format will be:
https://console.cloud.corp.google.com/dataflow/jobsDetail/locations//jobs/?project=
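
The angle-bracket placeholders in that URL appear to have been stripped by the 
archive. Purely as an illustration of the intended shape, a hypothetical helper 
that assembles such a regionalized URL might look like the following; the 
method name, parameter names, and placeholder positions are invented here and 
are not the actual DataflowRunner code.

// Hypothetical sketch only; the real option/field names in the runners differ.
static String regionalizedConsoleUrl(String project, String region, String jobId) {
  return String.format(
      "https://console.cloud.corp.google.com/dataflow/jobsDetail/locations/%s/jobs/%s?project=%s",
      region, jobId, project);
}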



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2140) Fix SplittableDoFn ValidatesRunner tests in FlinkRunner

2017-06-27 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065171#comment-16065171
 ] 

Kenneth Knowles commented on BEAM-2140:
---

I agree that the proposal of registering timers with an output watermark hold 
is essentially the same and won't help.

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> ---
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (BEAM-2509) Fn API Runner hangs in grpc controller mode

2017-06-27 Thread Vikas Kedigehalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Kedigehalli resolved BEAM-2509.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> Fn API Runner hangs in grpc controller mode
> ---
>
> Key: BEAM-2509
> URL: https://issues.apache.org/jira/browse/BEAM-2509
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model-fn-api, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.1.0
>
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L312
>  tests only run in direct mode, but we should run in grpc mode as well. 
> Currently the grpc mode is broken and needs fixing. Once we enable it, these 
> tests can catch issues like https://github.com/apache/beam/pull/3431



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2486

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2509) Fn API Runner hangs in grpc controller mode

2017-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065156#comment-16065156
 ] 

ASF GitHub Bot commented on BEAM-2509:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3445


> Fn API Runner hangs in grpc controller mode
> ---
>
> Key: BEAM-2509
> URL: https://issues.apache.org/jira/browse/BEAM-2509
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model-fn-api, sdk-py
>Reporter: Vikas Kedigehalli
>Assignee: Luke Cwik
>Priority: Minor
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L312
>  tests only run in direct mode, but we should run in grpc mode as well. 
> Currently the grpc mode is broken and needs fixing. Once we enable it, these 
> tests can catch issues like https://github.com/apache/beam/pull/3431



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] beam pull request #3445: [BEAM-2509] Enable grpc controller in fn_api_runner

2017-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3445


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] beam git commit: This closes #3445

2017-06-27 Thread altay
This closes #3445


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/39074899
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/39074899
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/39074899

Branch: refs/heads/master
Commit: 39074899a969a4feff0081da04cbd600f6480e93
Parents: e93c064 8dd0077
Author: Ahmet Altay 
Authored: Tue Jun 27 10:17:53 2017 -0700
Committer: Ahmet Altay 
Committed: Tue Jun 27 10:17:53 2017 -0700

--
 .../runners/portability/fn_api_runner.py| 12 +++---
 .../runners/portability/fn_api_runner_test.py   | 23 +++-
 .../apache_beam/runners/worker/sdk_worker.py|  2 +-
 3 files changed, 32 insertions(+), 5 deletions(-)
--




[1/2] beam git commit: Enable grpc controller in fn_api_runner

2017-06-27 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master e93c06485 -> 39074899a


Enable grpc controller in fn_api_runner


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/8dd0077d
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/8dd0077d
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/8dd0077d

Branch: refs/heads/master
Commit: 8dd0077d2a58e278b11c7e7eb4b5f182e1400992
Parents: e93c064
Author: Vikas Kedigehalli 
Authored: Mon Jun 26 18:47:39 2017 -0700
Committer: Ahmet Altay 
Committed: Tue Jun 27 10:17:49 2017 -0700

--
 .../runners/portability/fn_api_runner.py| 12 +++---
 .../runners/portability/fn_api_runner_test.py   | 23 +++-
 .../apache_beam/runners/worker/sdk_worker.py|  2 +-
 3 files changed, 32 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/8dd0077d/sdks/python/apache_beam/runners/portability/fn_api_runner.py
--
diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner.py 
b/sdks/python/apache_beam/runners/portability/fn_api_runner.py
index a8e2eb4..c5438ad 100644
--- a/sdks/python/apache_beam/runners/portability/fn_api_runner.py
+++ b/sdks/python/apache_beam/runners/portability/fn_api_runner.py
@@ -174,12 +174,17 @@ class 
FnApiRunner(maptask_executor_runner.MapTaskExecutorRunner):
   return {tag: pcollection_id(op_ix, out_ix)
   for out_ix, tag in enumerate(getattr(op, 'output_tags', 
['out']))}
 
+def only_element(iterable):
+  element, = iterable
+  return element
+
 for op_ix, (stage_name, operation) in enumerate(map_task):
   transform_id = uniquify(stage_name)
 
   if isinstance(operation, operation_specs.WorkerInMemoryWrite):
 # Write this data back to the runner.
-runner_sinks[(transform_id, 'out')] = operation
+target_name = only_element(get_inputs(operation).keys())
+runner_sinks[(transform_id, target_name)] = operation
 transform_spec = beam_runner_api_pb2.FunctionSpec(
 urn=sdk_worker.DATA_OUTPUT_URN,
 parameter=proto_utils.pack_Any(data_operation_spec))
@@ -190,7 +195,8 @@ class 
FnApiRunner(maptask_executor_runner.MapTaskExecutorRunner):
maptask_executor_runner.InMemorySource)
 and isinstance(operation.source.source.default_output_coder(),
WindowedValueCoder)):
-  input_data[(transform_id, 'input')] = self._reencode_elements(
+  target_name = only_element(get_outputs(op_ix).keys())
+  input_data[(transform_id, target_name)] = self._reencode_elements(
   operation.source.source.read(None),
   operation.source.source.default_output_coder())
   transform_spec = beam_runner_api_pb2.FunctionSpec(
@@ -309,7 +315,7 @@ class 
FnApiRunner(maptask_executor_runner.MapTaskExecutorRunner):
 sink_op.output_buffer.append(e)
 return
 
-  def execute_map_tasks(self, ordered_map_tasks, direct=True):
+  def execute_map_tasks(self, ordered_map_tasks, direct=False):
 if direct:
   controller = FnApiRunner.DirectController()
 else:

http://git-wip-us.apache.org/repos/asf/beam/blob/8dd0077d/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
--
diff --git a/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py 
b/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
index 9159035..163e980 100644
--- a/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
+++ b/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
@@ -21,6 +21,8 @@ import unittest
 import apache_beam as beam
 from apache_beam.runners.portability import fn_api_runner
 from apache_beam.runners.portability import maptask_executor_runner_test
+from apache_beam.testing.util import assert_that
+from apache_beam.testing.util import equal_to
 
 
 class FnApiRunnerTest(
@@ -31,9 +33,28 @@ class FnApiRunnerTest(
 runner=fn_api_runner.FnApiRunner())
 
   def test_combine_per_key(self):
-# TODO(robertwb): Implement PGBKCV operation.
+# TODO(BEAM-1348): Enable once Partial GBK is supported in fn API.
 pass
 
+  def test_combine_per_key(self):
+# TODO(BEAM-1348): Enable once Partial GBK is supported in fn API.
+pass
+
+  def test_pardo_side_inputs(self):
+# TODO(BEAM-1348): Enable once side inputs are supported in fn API.
+pass
+
+  def test_pardo_unfusable_side_inputs(self):
+# TODO(BEAM-1348): Enable once side inputs are supported in fn API.
+pass
+
+  def test_assert_that(self):
+# TODO: figure out a way for fn_api_runner to 

[GitHub] beam pull request #3449: Use input name as output name for Fn API sink.

2017-06-27 Thread robertwb
Github user robertwb closed the pull request at:

https://github.com/apache/beam/pull/3449


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_Java_ValidatesRunner_Spark #2485

2017-06-27 Thread Apache Jenkins Server
See 




[jira] [Resolved] (BEAM-2392) Avoid use of proto builder clone

2017-06-27 Thread Luke Cwik (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-2392.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

> Avoid use of proto builder clone
> 
>
> Key: BEAM-2392
> URL: https://issues.apache.org/jira/browse/BEAM-2392
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Affects Versions: 2.0.0
>Reporter: Nigel Kilmer
>Assignee: Nigel Kilmer
>Priority: Minor
> Fix For: 2.1.0
>
>
> BigtableServiceImpl uses the clone method of the MutateRowResponse proto 
> builder here:
> https://github.com/apache/beam/blob/04e3261818aed0c129e7c715e371463bf5b5c1b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableServiceImpl.java#L212
> This method is not generated by the Google-internal Java proto generator, so 
> I had to change this to get it to work with an internal project. Are you 
> interested in adding this change to the main repository for compatibility, or 
> would you prefer to keep the cleaner version that uses clone?
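
For readers less familiar with the pattern under discussion, a generic sketch 
of the two styles follows, using a hypothetical FooResponse message with an 
invented setCode field rather than the actual MutateRowResponse code in 
BigtableServiceImpl; only the contrast between clone() and build()/toBuilder() 
is the point here.

// Hypothetical proto message; FooResponse and setCode are invented names.
FooResponse.Builder template = FooResponse.newBuilder().setCode(0);

// Relies on the generated Builder.clone(), which not every proto toolchain emits:
FooResponse a = template.clone().setCode(1).build();

// Avoids clone(): build the template once, then reopen it as a fresh builder.
FooResponse b = template.build().toBuilder().setCode(1).build();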



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

