Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #128

2016-04-15 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-50) BigQueryIO.Write: reimplement in Java

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243900#comment-15243900
 ] 

ASF GitHub Bot commented on BEAM-50:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/193


> BigQueryIO.Write: reimplement in Java
> -
>
> Key: BEAM-50
> URL: https://issues.apache.org/jira/browse/BEAM-50
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>
> BigQueryIO.Write is currently implemented in a somewhat hacky way.
> Unbounded sink:
> * The DirectPipelineRunner and the DataflowPipelineRunner use 
> StreamingWriteFn and BigQueryTableInserter to insert rows using BigQuery's 
> streaming writes API.
> Bounded sink:
> * The DirectPipelineRunner still uses streaming writes.
> * The DataflowPipelineRunner uses a different code path in the Google Cloud 
> Dataflow service that writes to GCS and the initiates a BigQuery load job.
> * Per-window table destinations do not work scalably. (See Beam-XXX).
> We need to reimplement BigQueryIO.Write fully in Java code in order to 
> support other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO sink in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * Possibly support not-knowing the schema until pipeline execution time.
> * Our builders for BigQueryIO.Write are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying table creation, write disposition, etc. This would 
> let users directly control things like table expiration time, table location, 
> etc., Would also optimistically let users take advantage of some new BigQuery 
> features without code changes.
> * We could choose between streaming write API and load jobs based on user 
> preference or dynamic job properties . We could use streaming write in a 
> batch pipeline if the data is small. We could use load jobs in streaming 
> pipelines if the windows are large enough to make this practical.
> * When issuing BigQuery load jobs, we could leave files in GCS if the import 
> fails, so that data errors can be debugged.
> * We should make per-window table writes scalable in batch.
> Caveat, possibly blocker:
> * (Beam-XXX): cleanup and temp file management. One advantage of the Google 
> Cloud Dataflow implementation of BigQueryIO.Write is cleanup: we ensure that 
> intermediate files are deleted when bundles or jobs fail, etc. Beam does not 
> currently support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-50] Remove BigQueryIO.Write.Bou...

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/193


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: [BEAM-50] Remove BigQueryIO.Write.Bound translator

2016-04-15 Thread dhalperi
[BEAM-50] Remove BigQueryIO.Write.Bound translator


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/7f51f6af
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/7f51f6af
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/7f51f6af

Branch: refs/heads/master
Commit: 7f51f6af61dbae4e4d36d2fd1e94be945b60b11c
Parents: 9039949
Author: Pei He 
Authored: Fri Apr 15 16:19:54 2016 -0700
Committer: Dan Halperin 
Committed: Fri Apr 15 17:32:46 2016 -0700

--
 .../sdk/runners/DataflowPipelineTranslator.java |  2 -
 .../runners/dataflow/BigQueryIOTranslator.java  | 55 
 2 files changed, 57 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/7f51f6af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipelineTranslator.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipelineTranslator.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipelineTranslator.java
index 0a71f65..4e60545 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipelineTranslator.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipelineTranslator.java
@@ -1041,8 +1041,6 @@ public class DataflowPipelineTranslator {
 
 registerTransformTranslator(
 BigQueryIO.Read.Bound.class, new 
BigQueryIOTranslator.ReadTranslator());
-registerTransformTranslator(
-BigQueryIO.Write.Bound.class, new 
BigQueryIOTranslator.WriteTranslator());
 
 registerTransformTranslator(
 PubsubIO.Read.Bound.class, new PubsubIOTranslator.ReadTranslator());

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/7f51f6af/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/dataflow/BigQueryIOTranslator.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/dataflow/BigQueryIOTranslator.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/dataflow/BigQueryIOTranslator.java
index 51d7000..b0952a6 100755
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/dataflow/BigQueryIOTranslator.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/dataflow/BigQueryIOTranslator.java
@@ -17,26 +17,19 @@
  */
 package org.apache.beam.sdk.runners.dataflow;
 
-import org.apache.beam.sdk.coders.TableRowJsonCoder;
 import org.apache.beam.sdk.io.BigQueryIO;
 import org.apache.beam.sdk.runners.DataflowPipelineTranslator;
 import org.apache.beam.sdk.util.PropertyNames;
-import org.apache.beam.sdk.util.Transport;
-import org.apache.beam.sdk.util.WindowedValue;
 
-import com.google.api.client.json.JsonFactory;
 import com.google.api.services.bigquery.model.TableReference;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import java.io.IOException;
-
 /**
  * BigQuery transform support code for the Dataflow backend.
  */
 public class BigQueryIOTranslator {
-  private static final JsonFactory JSON_FACTORY = Transport.getJsonFactory();
   private static final Logger LOG = 
LoggerFactory.getLogger(BigQueryIOTranslator.class);
 
   /**
@@ -76,52 +69,4 @@ public class BigQueryIOTranslator {
   context.addValueOnlyOutput(PropertyNames.OUTPUT, 
context.getOutput(transform));
 }
   }
-
-  /**
-   * Implements BigQueryIO Write translation for the Dataflow backend.
-   */
-  public static class WriteTranslator
-  implements 
DataflowPipelineTranslator.TransformTranslator {
-
-@Override
-public void translate(BigQueryIO.Write.Bound transform,
-  DataflowPipelineTranslator.TranslationContext 
context) {
-  if (context.getPipelineOptions().isStreaming()) {
-// Streaming is handled by the streaming runner.
-throw new AssertionError(
-"BigQueryIO is specified to use streaming write in batch mode.");
-  }
-
-  TableReference table = transform.getTable();
-
-  // Actual translation.
-  context.addStep(transform, "ParallelWrite");
-  context.addInput(PropertyNames.FORMAT, "bigquery");
-  context.addInput(PropertyNames.BIGQUERY_TABLE,
-   table.getTableId());
-  context.addInput(PropertyNames.BIGQUERY_DATASET,
-   table.getDatasetId());
-  if (table.getProjectId() != null) {
-context.addInput(PropertyNames.BIGQUERY_PROJECT, table.getProjectId());
-  }
-  

[1/2] incubator-beam git commit: Closes #193

2016-04-15 Thread dhalperi
Repository: incubator-beam
Updated Branches:
  refs/heads/master 9039949d5 -> fd2548ff2


Closes #193


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/fd2548ff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/fd2548ff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/fd2548ff

Branch: refs/heads/master
Commit: fd2548ff2dbea7f8c573285ff8a2dc985fafb4cd
Parents: 9039949 7f51f6a
Author: Dan Halperin 
Authored: Fri Apr 15 17:32:46 2016 -0700
Committer: Dan Halperin 
Committed: Fri Apr 15 17:32:46 2016 -0700

--
 .../sdk/runners/DataflowPipelineTranslator.java |  2 -
 .../runners/dataflow/BigQueryIOTranslator.java  | 55 
 2 files changed, 57 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #127

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #127

2016-04-15 Thread Apache Jenkins Server
See 




[1/2] incubator-beam git commit: Switch the Default PipelineRunner

2016-04-15 Thread davor
Repository: incubator-beam
Updated Branches:
  refs/heads/master e9f1b579a -> 9039949d5


Switch the Default PipelineRunner

Use the InProcessPiplineRunner (pending rename) as the default runner.
The InProcessPipelineRunner implements the beam model, including support
for Unbounded PCollections.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/757cb326
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/757cb326
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/757cb326

Branch: refs/heads/master
Commit: 757cb326b909ad62aea8c51183a83521adfd5a3a
Parents: e9f1b57
Author: Thomas Groh 
Authored: Fri Apr 8 10:20:56 2016 -0700
Committer: Thomas Groh 
Committed: Fri Apr 15 16:12:52 2016 -0700

--
 .../beam/sdk/options/PipelineOptions.java   |  4 +-
 .../ImmutabilityCheckingBundleFactory.java  | 20 +++--
 .../sdk/options/PipelineOptionsFactoryTest.java |  3 +-
 .../beam/sdk/options/PipelineOptionsTest.java   |  4 +-
 .../beam/sdk/runners/TransformTreeTest.java | 79 ++--
 .../EncodabilityEnforcementFactoryTest.java |  2 +-
 .../ImmutabilityCheckingBundleFactoryTest.java  | 17 ++---
 .../apache/beam/sdk/transforms/ParDoTest.java   | 14 ++--
 .../beam/sdk/transforms/WithKeysJava8Test.java  |  3 +-
 9 files changed, 68 insertions(+), 78 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/757cb326/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
index 17cf5b3..d87e396 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java
@@ -21,8 +21,8 @@ import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.options.GoogleApiDebugOptions.GoogleApiTracer;
 import org.apache.beam.sdk.options.ProxyInvocationHandler.Deserializer;
 import org.apache.beam.sdk.options.ProxyInvocationHandler.Serializer;
-import org.apache.beam.sdk.runners.DirectPipelineRunner;
 import org.apache.beam.sdk.runners.PipelineRunner;
+import org.apache.beam.sdk.runners.inprocess.InProcessPipelineRunner;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.DoFn.Context;
 
@@ -225,7 +225,7 @@ public interface PipelineOptions {
   @Description("The pipeline runner that will be used to execute the pipeline. 
"
   + "For registered runners, the class name can be specified, otherwise 
the fully "
   + "qualified name needs to be specified.")
-  @Default.Class(DirectPipelineRunner.class)
+  @Default.Class(InProcessPipelineRunner.class)
   Class> getRunner();
   void setRunner(Class> kls);
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/757cb326/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ImmutabilityCheckingBundleFactory.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ImmutabilityCheckingBundleFactory.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ImmutabilityCheckingBundleFactory.java
index 0852269..bb3d501 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ImmutabilityCheckingBundleFactory.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ImmutabilityCheckingBundleFactory.java
@@ -28,7 +28,6 @@ import org.apache.beam.sdk.util.IllegalMutationException;
 import org.apache.beam.sdk.util.MutationDetector;
 import org.apache.beam.sdk.util.MutationDetectors;
 import org.apache.beam.sdk.util.SerializableUtils;
-import org.apache.beam.sdk.util.UserCodeException;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.values.PCollection;
 
@@ -113,17 +112,16 @@ class ImmutabilityCheckingBundleFactory implements 
BundleFactory {
 try {
   detector.verifyUnmodified();
 } catch (IllegalMutationException exn) {
-  throw UserCodeException.wrap(
-  new IllegalMutationException(
-  String.format(
-  "PTransform %s mutated value %s after it was output (new 
value was %s)."
-  + " Values must not be mutated in any way after 
being output.",
-  
underlying.getPCollection().getProducingTransformInternal().getFullName(),
-  exn.getSavedValue(),
-  exn.getNewValue()),
+  throw new 

[2/2] incubator-beam git commit: This closes #178

2016-04-15 Thread davor
This closes #178


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/9039949d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/9039949d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/9039949d

Branch: refs/heads/master
Commit: 9039949d5f518fed84bc7cf7e08870e023b53951
Parents: e9f1b57 757cb32
Author: Davor Bonaci 
Authored: Fri Apr 15 17:07:42 2016 -0700
Committer: Davor Bonaci 
Committed: Fri Apr 15 17:07:42 2016 -0700

--
 .../beam/sdk/options/PipelineOptions.java   |  4 +-
 .../ImmutabilityCheckingBundleFactory.java  | 20 +++--
 .../sdk/options/PipelineOptionsFactoryTest.java |  3 +-
 .../beam/sdk/options/PipelineOptionsTest.java   |  4 +-
 .../beam/sdk/runners/TransformTreeTest.java | 79 ++--
 .../EncodabilityEnforcementFactoryTest.java |  2 +-
 .../ImmutabilityCheckingBundleFactoryTest.java  | 17 ++---
 .../apache/beam/sdk/transforms/ParDoTest.java   | 14 ++--
 .../beam/sdk/transforms/WithKeysJava8Test.java  |  3 +-
 9 files changed, 68 insertions(+), 78 deletions(-)
--




[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243888#comment-15243888
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/178


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-22] Switch the Default Pipeline...

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/178


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request: Clean up DataflowPipeline[Debug]Optio...

2016-04-15 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/196

Clean up DataflowPipeline[Debug]Options

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

getUpdate should be isUpdate, to meet standard JavaBeans style.

Remove deprecated update property from DataflowPipelineDebugOptions

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
remove_deprecated_debug_options

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #196


commit d596ea51ace2aa02cb7f120f78119b310516e8de
Author: Thomas Groh 
Date:   2016-04-15T17:06:21Z

Clean up DataflowPipeline[Debug]Options

getUpdate should be isUpdate, to meet standard JavaBeans style.

Remove deprecated update property from DataflowPipelineDebugOptions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request: Remove deprecated sdk.transforms.Writ...

2016-04-15 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/195

Remove deprecated sdk.transforms.Write, switch users to sdk.io.Write

Post-rename cleanup.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam remove-deprecated-wite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 4332572610824e4fe305e07aeca80d9b2a3ea4e6
Author: Dan Halperin 
Date:   2016-04-16T00:06:53Z

Remove deprecated sdk.transforms.Write, switch users to sdk.io.Write

Post-rename cleanup.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #126

2016-04-15 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: [BEAM-202] Clean-up *CoderBase classe...

2016-04-15 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/incubator-beam/pull/194

[BEAM-202] Clean-up *CoderBase classes

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

We are on Jackson 2.7.0 now.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam remove_coder_base

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #194


commit 11e8bfed4c73ec21dcf5a671ff0381ec1f39d8d4
Author: Luke Cwik 
Date:   2016-04-15T23:53:23Z

[BEAM-202] Clean-up *CoderBase classes since we are on a newer version of 
Jackson




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-202) Remove YYYCoderBase

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243875#comment-15243875
 ] 

ASF GitHub Bot commented on BEAM-202:
-

GitHub user lukecwik opened a pull request:

https://github.com/apache/incubator-beam/pull/194

[BEAM-202] Clean-up *CoderBase classes

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

We are on Jackson 2.7.0 now.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam remove_coder_base

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #194


commit 11e8bfed4c73ec21dcf5a671ff0381ec1f39d8d4
Author: Luke Cwik 
Date:   2016-04-15T23:53:23Z

[BEAM-202] Clean-up *CoderBase classes since we are on a newer version of 
Jackson




> Remove YYYCoderBase
> ---
>
> Key: BEAM-202
> URL: https://issues.apache.org/jira/browse/BEAM-202
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>
> Jackson 2.4.5 had a bug where it couldn't deserialize types with multiple 
> generic parameters. Since we are on Jackson 2.7.0, we can remove KvCoderBase 
> and MapCoderBase



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-202) Remove YYYCoderBase

2016-04-15 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-202:
--

 Summary: Remove YYYCoderBase
 Key: BEAM-202
 URL: https://issues.apache.org/jira/browse/BEAM-202
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Luke Cwik
Assignee: Luke Cwik


Jackson 2.4.5 had a bug where it couldn't deserialize types with multiple 
generic parameters. Since we are on Jackson 2.7.0, we can remove KvCoderBase 
and MapCoderBase



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-50] Remove BigQueryIO.Write.Bou...

2016-04-15 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/193

[BEAM-50] Remove BigQueryIO.Write.Bound translator




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam 
remove-bq-write-translator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193


commit 5dd12fe94a5ea2249cf5156c16df1b22aa663baf
Author: Pei He 
Date:   2016-04-15T23:19:54Z

[BEAM-50] Remove BigQueryIO.Write.Bound translator




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-50) BigQueryIO.Write: reimplement in Java

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243836#comment-15243836
 ] 

ASF GitHub Bot commented on BEAM-50:


GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/193

[BEAM-50] Remove BigQueryIO.Write.Bound translator




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam 
remove-bq-write-translator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #193


commit 5dd12fe94a5ea2249cf5156c16df1b22aa663baf
Author: Pei He 
Date:   2016-04-15T23:19:54Z

[BEAM-50] Remove BigQueryIO.Write.Bound translator




> BigQueryIO.Write: reimplement in Java
> -
>
> Key: BEAM-50
> URL: https://issues.apache.org/jira/browse/BEAM-50
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>
> BigQueryIO.Write is currently implemented in a somewhat hacky way.
> Unbounded sink:
> * The DirectPipelineRunner and the DataflowPipelineRunner use 
> StreamingWriteFn and BigQueryTableInserter to insert rows using BigQuery's 
> streaming writes API.
> Bounded sink:
> * The DirectPipelineRunner still uses streaming writes.
> * The DataflowPipelineRunner uses a different code path in the Google Cloud 
> Dataflow service that writes to GCS and the initiates a BigQuery load job.
> * Per-window table destinations do not work scalably. (See Beam-XXX).
> We need to reimplement BigQueryIO.Write fully in Java code in order to 
> support other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO sink in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * Possibly support not-knowing the schema until pipeline execution time.
> * Our builders for BigQueryIO.Write are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying table creation, write disposition, etc. This would 
> let users directly control things like table expiration time, table location, 
> etc., Would also optimistically let users take advantage of some new BigQuery 
> features without code changes.
> * We could choose between streaming write API and load jobs based on user 
> preference or dynamic job properties . We could use streaming write in a 
> batch pipeline if the data is small. We could use load jobs in streaming 
> pipelines if the windows are large enough to make this practical.
> * When issuing BigQuery load jobs, we could leave files in GCS if the import 
> fails, so that data errors can be debugged.
> * We should make per-window table writes scalable in batch.
> Caveat, possibly blocker:
> * (Beam-XXX): cleanup and temp file management. One advantage of the Google 
> Cloud Dataflow implementation of BigQueryIO.Write is cleanup: we ensure that 
> intermediate files are deleted when bundles or jobs fail, etc. Beam does not 
> currently support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #125

2016-04-15 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-50) BigQueryIO.Write: reimplement in Java

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243825#comment-15243825
 ] 

ASF GitHub Bot commented on BEAM-50:


GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/192

[BEAM-50] Fix BigQuery.Write tempFilePrefix concatenation




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam fix-path-resolve

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #192


commit 8024a85f05f568b83745d8f61c443bf8ca179421
Author: Pei He 
Date:   2016-04-15T23:15:11Z

Fix BigQuery.Write tempFilePrefix concatenation




> BigQueryIO.Write: reimplement in Java
> -
>
> Key: BEAM-50
> URL: https://issues.apache.org/jira/browse/BEAM-50
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Priority: Minor
>
> BigQueryIO.Write is currently implemented in a somewhat hacky way.
> Unbounded sink:
> * The DirectPipelineRunner and the DataflowPipelineRunner use 
> StreamingWriteFn and BigQueryTableInserter to insert rows using BigQuery's 
> streaming writes API.
> Bounded sink:
> * The DirectPipelineRunner still uses streaming writes.
> * The DataflowPipelineRunner uses a different code path in the Google Cloud 
> Dataflow service that writes to GCS and the initiates a BigQuery load job.
> * Per-window table destinations do not work scalably. (See Beam-XXX).
> We need to reimplement BigQueryIO.Write fully in Java code in order to 
> support other runners in a scalable way.
> I additionally suggest that we revisit the design of the BigQueryIO sink in 
> the process. A short list:
> * Do not use TableRow as the default value for rows. It could be Map Object> with well-defined types, for example, or an Avro GenericRecord. 
> Dropping TableRow will get around a variety of issues with types, fields 
> named 'f', etc., and it will also reduce confusion as we use TableRow objects 
> differently than usual (for good reason).
> * Possibly support not-knowing the schema until pipeline execution time.
> * Our builders for BigQueryIO.Write are useful and we should keep them. Where 
> possible we should also allow users to provide the JSON objects that 
> configure the underlying table creation, write disposition, etc. This would 
> let users directly control things like table expiration time, table location, 
> etc., Would also optimistically let users take advantage of some new BigQuery 
> features without code changes.
> * We could choose between streaming write API and load jobs based on user 
> preference or dynamic job properties . We could use streaming write in a 
> batch pipeline if the data is small. We could use load jobs in streaming 
> pipelines if the windows are large enough to make this practical.
> * When issuing BigQuery load jobs, we could leave files in GCS if the import 
> fails, so that data errors can be debugged.
> * We should make per-window table writes scalable in batch.
> Caveat, possibly blocker:
> * (Beam-XXX): cleanup and temp file management. One advantage of the Google 
> Cloud Dataflow implementation of BigQueryIO.Write is cleanup: we ensure that 
> intermediate files are deleted when bundles or jobs fail, etc. Beam does not 
> currently support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #124

2016-04-15 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: [BEAM-50] Fix BigQuery.Write tempFile...

2016-04-15 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/192

[BEAM-50] Fix BigQuery.Write tempFilePrefix concatenation




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam fix-path-resolve

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #192


commit 8024a85f05f568b83745d8f61c443bf8ca179421
Author: Pei He 
Date:   2016-04-15T23:15:11Z

Fix BigQuery.Write tempFilePrefix concatenation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #124

2016-04-15 Thread Apache Jenkins Server
See 




[1/3] incubator-beam git commit: Clone DoFns before constructing a DoFnRunner in the InProcessRunner

2016-04-15 Thread dhalperi
Repository: incubator-beam
Updated Branches:
  refs/heads/master 7a5b7ad80 -> e9f1b579a


Clone DoFns before constructing a DoFnRunner in the InProcessRunner

This ensures that each thread gets an individual copy of a DoFn, so
multiple threads do not interact.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/b6c74ff5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/b6c74ff5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/b6c74ff5

Branch: refs/heads/master
Commit: b6c74ff5c18dcb7c82c3b7717c9c76926a1bbfc4
Parents: 6414425
Author: Thomas Groh 
Authored: Fri Apr 15 10:23:15 2016 -0700
Committer: Dan Halperin 
Committed: Fri Apr 15 16:11:32 2016 -0700

--
 .../beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b6c74ff5/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
index 7365527..a2f080c 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
@@ -25,6 +25,7 @@ import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.util.DoFnRunner;
 import org.apache.beam.sdk.util.DoFnRunners;
 import org.apache.beam.sdk.util.DoFnRunners.OutputManager;
+import org.apache.beam.sdk.util.SerializableUtils;
 import org.apache.beam.sdk.util.UserCodeException;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.util.common.CounterSet;
@@ -67,7 +68,7 @@ class ParDoInProcessEvaluator implements 
TransformEvaluator {
 DoFnRunner runner =
 DoFnRunners.createDefault(
 evaluationContext.getPipelineOptions(),
-fn,
+SerializableUtils.clone(fn),
 evaluationContext.createSideInputReader(sideInputs),
 BundleOutputManager.create(outputBundles),
 mainOutputTag,



[2/3] incubator-beam git commit: Move Shared construction code to ParDoInProcessEvaluator

2016-04-15 Thread dhalperi
Move Shared construction code to ParDoInProcessEvaluator

Remove duplicate code in ParDo(Single/Multi)EvaluatorFactory; instead
only extract the appropriate elements and pass them to the
ParDoInProcessEvaluator.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/64144259
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/64144259
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/64144259

Branch: refs/heads/master
Commit: 64144259551f1cf627545e0329c5a0daf087e7d2
Parents: 7a5b7ad
Author: Thomas Groh 
Authored: Tue Mar 29 17:38:22 2016 -0700
Committer: Dan Halperin 
Committed: Fri Apr 15 16:11:32 2016 -0700

--
 .../inprocess/ParDoInProcessEvaluator.java  | 50 +-
 .../inprocess/ParDoMultiEvaluatorFactory.java   | 55 +---
 .../inprocess/ParDoSingleEvaluatorFactory.java  | 52 +-
 3 files changed, 75 insertions(+), 82 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/64144259/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
index a68fa53..7365527 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/inprocess/ParDoInProcessEvaluator.java
@@ -18,15 +18,19 @@
 package org.apache.beam.sdk.runners.inprocess;
 
 import 
org.apache.beam.sdk.runners.inprocess.InProcessExecutionContext.InProcessStepContext;
+import 
org.apache.beam.sdk.runners.inprocess.InProcessPipelineRunner.CommittedBundle;
 import 
org.apache.beam.sdk.runners.inprocess.InProcessPipelineRunner.UncommittedBundle;
 import org.apache.beam.sdk.transforms.AppliedPTransform;
+import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.util.DoFnRunner;
+import org.apache.beam.sdk.util.DoFnRunners;
 import org.apache.beam.sdk.util.DoFnRunners.OutputManager;
 import org.apache.beam.sdk.util.UserCodeException;
 import org.apache.beam.sdk.util.WindowedValue;
 import org.apache.beam.sdk.util.common.CounterSet;
 import org.apache.beam.sdk.util.state.CopyOnAccessInMemoryStateInternals;
 import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
 import org.apache.beam.sdk.values.TupleTag;
 
 import java.util.ArrayList;
@@ -36,13 +40,57 @@ import java.util.List;
 import java.util.Map;
 
 class ParDoInProcessEvaluator implements TransformEvaluator {
+  public static  ParDoInProcessEvaluator create(
+  InProcessEvaluationContext evaluationContext,
+  CommittedBundle inputBundle,
+  AppliedPTransform application,
+  DoFn fn,
+  List sideInputs,
+  TupleTag mainOutputTag,
+  List sideOutputTags,
+  Map outputs) {
+InProcessExecutionContext executionContext =
+evaluationContext.getExecutionContext(application, 
inputBundle.getKey());
+String stepName = evaluationContext.getStepName(application);
+InProcessStepContext stepContext =
+executionContext.getOrCreateStepContext(stepName, stepName, null);
+
+CounterSet counters = evaluationContext.createCounterSet();
+
+Map outputBundles = new HashMap<>();
+for (Map.Entry outputEntry : 
outputs.entrySet()) {
+  outputBundles.put(
+  outputEntry.getKey(),
+  evaluationContext.createBundle(inputBundle, outputEntry.getValue()));
+}
+
+DoFnRunner runner =
+DoFnRunners.createDefault(
+evaluationContext.getPipelineOptions(),
+fn,
+evaluationContext.createSideInputReader(sideInputs),
+BundleOutputManager.create(outputBundles),
+mainOutputTag,
+sideOutputTags,
+stepContext,
+counters.getAddCounterMutator(),
+application.getInput().getWindowingStrategy());
+
+runner.startBundle();
+
+return new ParDoInProcessEvaluator<>(
+runner, application, counters, outputBundles.values(), stepContext);
+  }
+
+  

+
   private final DoFnRunner fnRunner;
   private final AppliedPTransform transform;
   private final CounterSet counters;
   private final 

[3/3] incubator-beam git commit: Closes #188

2016-04-15 Thread dhalperi
Closes #188


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/e9f1b579
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/e9f1b579
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/e9f1b579

Branch: refs/heads/master
Commit: e9f1b579a4f5a134b3f00ef011af8d83185e8598
Parents: 7a5b7ad b6c74ff
Author: Dan Halperin 
Authored: Fri Apr 15 16:11:33 2016 -0700
Committer: Dan Halperin 
Committed: Fri Apr 15 16:11:33 2016 -0700

--
 .../inprocess/ParDoInProcessEvaluator.java  | 51 +-
 .../inprocess/ParDoMultiEvaluatorFactory.java   | 55 +---
 .../inprocess/ParDoSingleEvaluatorFactory.java  | 52 +-
 3 files changed, 76 insertions(+), 82 deletions(-)
--




[jira] [Commented] (BEAM-78) Rename Dataflow to Beam

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243807#comment-15243807
 ] 

ASF GitHub Bot commented on BEAM-78:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/191


> Rename Dataflow to Beam 
> 
>
> Key: BEAM-78
> URL: https://issues.apache.org/jira/browse/BEAM-78
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>Priority: Blocker
>
> The initial code drop contains code that uses "Dataflow" to refer to the 
> SDK/model and Cloud Dataflow service. The first usage needs to be swapped to 
> Beam.
> This includes:
> - mentions throughout the javadoc
> - packages of classes that belong to the java sdk core
> And does not include:
> - the DataflowPipelineRunner
> We plan to postpone this rename until other code drops have been integrated 
> into the repository, and we have completed the refactoring that will separate 
> these two uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/2] incubator-beam git commit: [BEAM-78] This closes #191

2016-04-15 Thread lcwik
[BEAM-78] This closes #191


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/7a5b7ad8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/7a5b7ad8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/7a5b7ad8

Branch: refs/heads/master
Commit: 7a5b7ad805ef6829be89c4b755f854e23404cb52
Parents: eb682a8 e0b1131
Author: Luke Cwik 
Authored: Fri Apr 15 15:56:14 2016 -0700
Committer: Luke Cwik 
Committed: Fri Apr 15 15:56:14 2016 -0700

--
 .../beam/sdk/runners/worker/IsmFormat.java  | 32 ++--
 .../apache/beam/sdk/util/ReshuffleTrigger.java  |  2 +-
 2 files changed, 17 insertions(+), 17 deletions(-)
--




[GitHub] incubator-beam pull request: [BEAM-78] Expose package private meth...

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/191


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-78) Rename Dataflow to Beam

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243743#comment-15243743
 ] 

ASF GitHub Bot commented on BEAM-78:


GitHub user lukecwik opened a pull request:

https://github.com/apache/incubator-beam/pull/191

[BEAM-78] Expose package private methods that Dataflow worker relies on

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

The Beam rename caused package structure to change. This broke
some of the visiblity requirements inside Dataflow worker.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam beam_rename

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #191


commit 99020bc5a31669af191778c3a58d80a57263
Author: Luke Cwik 
Date:   2016-04-15T22:04:46Z

[BEAM-78] Expose package private methods that Dataflow worker relies on

The Beam rename caused package structure to change. This broke
some of the visiblity requirements inside Dataflow worker.

commit 70d7a7cc56fc4c353be2a64c35a38080cf125b69
Author: Luke Cwik 
Date:   2016-04-15T22:09:46Z

[BEAM-78] !fixup Fix package import order




> Rename Dataflow to Beam 
> 
>
> Key: BEAM-78
> URL: https://issues.apache.org/jira/browse/BEAM-78
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>Priority: Blocker
>
> The initial code drop contains code that uses "Dataflow" to refer to the 
> SDK/model and Cloud Dataflow service. The first usage needs to be swapped to 
> Beam.
> This includes:
> - mentions throughout the javadoc
> - packages of classes that belong to the java sdk core
> And does not include:
> - the DataflowPipelineRunner
> We plan to postpone this rename until other code drops have been integrated 
> into the repository, and we have completed the refactoring that will separate 
> these two uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #123

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #123

2016-04-15 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: [BEAM-121] Add DisplayData for combin...

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/126


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/3] incubator-beam git commit: Add DisplayData for combine transforms

2016-04-15 Thread bchambers
Add DisplayData for combine transforms

If more than one combineFn have the same namespace, add a sequential suffix.
This is necessary because each namespace/key pair must be unique within
the transform.

Add a `JavaClass` wrapper around a name/simple-name for a class. This is
necessary in cases where the class may be serialized to support
accessing `DisplayData` since `Class` is not serializable in some cases.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/b0baa4c9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/b0baa4c9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/b0baa4c9

Branch: refs/heads/master
Commit: b0baa4c9d66750b1cbdbb0dc7f02e62385436bc2
Parents: d440d94
Author: Scott Wegner 
Authored: Mon Apr 11 09:08:23 2016 -0700
Committer: bchambers 
Committed: Fri Apr 15 14:26:51 2016 -0700

--
 .../sdk/transforms/ApproximateQuantiles.java|   8 +
 .../beam/sdk/transforms/ApproximateUnique.java  |  44 ++
 .../org/apache/beam/sdk/transforms/Combine.java | 155 +++
 .../beam/sdk/transforms/CombineFnBase.java  |  27 +++-
 .../apache/beam/sdk/transforms/CombineFns.java  |  65 
 .../beam/sdk/transforms/CombineWithContext.java |   6 +
 .../org/apache/beam/sdk/transforms/Max.java |   6 +
 .../org/apache/beam/sdk/transforms/Min.java |   6 +
 .../org/apache/beam/sdk/transforms/Sample.java  |  14 ++
 .../org/apache/beam/sdk/transforms/Top.java |   8 +
 .../sdk/transforms/display/ClassForDisplay.java |  93 +++
 .../sdk/transforms/display/DisplayData.java | 111 +++--
 .../org/apache/beam/sdk/util/CombineFnUtil.java |  13 ++
 .../transforms/ApproximateQuantilesTest.java|  13 ++
 .../sdk/transforms/ApproximateUniqueTest.java   |  17 ++
 .../beam/sdk/transforms/CombineFnsTest.java |  69 -
 .../apache/beam/sdk/transforms/CombineTest.java |  22 ++-
 .../org/apache/beam/sdk/transforms/MaxTest.java |  13 +-
 .../org/apache/beam/sdk/transforms/MinTest.java |  13 +-
 .../apache/beam/sdk/transforms/SampleTest.java  |  14 ++
 .../org/apache/beam/sdk/transforms/TopTest.java |  13 ++
 .../transforms/display/ClassForDisplayTest.java |  66 
 .../transforms/display/DisplayDataMatchers.java |  51 +++---
 .../sdk/transforms/display/DisplayDataTest.java |  18 ++-
 .../display/ClassForDisplayJava8Test.java   |  46 ++
 .../beam/sdk/transforms/CombineJava8Test.java   |  42 +
 26 files changed, 878 insertions(+), 75 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b0baa4c9/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
index 2ed7a85..c58c736 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
@@ -25,6 +25,7 @@ import org.apache.beam.sdk.coders.CustomCoder;
 import org.apache.beam.sdk.coders.ListCoder;
 import org.apache.beam.sdk.transforms.Combine.AccumulatingCombineFn;
 import 
org.apache.beam.sdk.transforms.Combine.AccumulatingCombineFn.Accumulator;
+import org.apache.beam.sdk.transforms.display.DisplayData;
 import org.apache.beam.sdk.util.WeightedValue;
 import org.apache.beam.sdk.util.common.ElementByteSizeObserver;
 import org.apache.beam.sdk.values.KV;
@@ -359,6 +360,13 @@ public class ApproximateQuantiles {
 CoderRegistry registry, Coder elementCoder) {
   return new QuantileStateCoder<>(compareFn, elementCoder);
 }
+
+@Override
+public void populateDisplayData(DisplayData.Builder builder) {
+  builder
+  .add("numQuantiles", numQuantiles)
+  .add("comparer", compareFn.getClass());
+}
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b0baa4c9/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
index 4f9dfc4..175897b 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateUnique.java
@@ -24,6 +24,7 @@ import org.apache.beam.sdk.coders.CoderRegistry;
 import org.apache.beam.sdk.coders.KvCoder;
 import 

[jira] [Commented] (BEAM-121) Publish DisplayData from common PTransforms

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243723#comment-15243723
 ] 

ASF GitHub Bot commented on BEAM-121:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/126


> Publish DisplayData from common PTransforms
> ---
>
> Key: BEAM-121
> URL: https://issues.apache.org/jira/browse/BEAM-121
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[1/3] incubator-beam git commit: Add DisplayData for combine transforms

2016-04-15 Thread bchambers
Repository: incubator-beam
Updated Branches:
  refs/heads/master d440d9443 -> eb682a80c


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b0baa4c9/sdks/java/java8tests/src/test/java/com/google/cloud/dataflow/sdk/transforms/display/ClassForDisplayJava8Test.java
--
diff --git 
a/sdks/java/java8tests/src/test/java/com/google/cloud/dataflow/sdk/transforms/display/ClassForDisplayJava8Test.java
 
b/sdks/java/java8tests/src/test/java/com/google/cloud/dataflow/sdk/transforms/display/ClassForDisplayJava8Test.java
new file mode 100644
index 000..4db235f
--- /dev/null
+++ 
b/sdks/java/java8tests/src/test/java/com/google/cloud/dataflow/sdk/transforms/display/ClassForDisplayJava8Test.java
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package com.google.cloud.dataflow.sdk.transforms.display;
+
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.display.ClassForDisplay;
+import org.apache.beam.sdk.util.SerializableUtils;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+import java.io.Serializable;
+
+/**
+ * Java 8 tests for {@link ClassForDisplay}.
+ */
+@RunWith(JUnit4.class)
+public class ClassForDisplayJava8Test implements Serializable {
+  @Test
+  public void testLambdaClassSerialization() {
+final SerializableFunction f = x -> x;
+Serializable myClass = new Serializable() {
+  // Class references for lambdas do not serialize, which is why we 
support ClassForDisplay
+  // Specifically, the following would not work:
+  // Class clazz = f.getClass();
+  ClassForDisplay javaClass = ClassForDisplay.fromInstance(f);
+};
+
+SerializableUtils.ensureSerializable(myClass);
+  }
+}

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b0baa4c9/sdks/java/java8tests/src/test/java/org/apache/beam/sdk/transforms/CombineJava8Test.java
--
diff --git 
a/sdks/java/java8tests/src/test/java/org/apache/beam/sdk/transforms/CombineJava8Test.java
 
b/sdks/java/java8tests/src/test/java/org/apache/beam/sdk/transforms/CombineJava8Test.java
index 00fc087..132247b 100644
--- 
a/sdks/java/java8tests/src/test/java/org/apache/beam/sdk/transforms/CombineJava8Test.java
+++ 
b/sdks/java/java8tests/src/test/java/org/apache/beam/sdk/transforms/CombineJava8Test.java
@@ -17,12 +17,21 @@
  */
 package org.apache.beam.sdk.transforms;
 
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.Matchers.empty;
+import static org.hamcrest.Matchers.not;
+
 import org.apache.beam.sdk.Pipeline;
 import org.apache.beam.sdk.testing.PAssert;
 import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.util.SerializableUtils;
 import org.apache.beam.sdk.values.KV;
 import org.apache.beam.sdk.values.PCollection;
 
+import com.google.common.collect.Iterables;
+
+import org.junit.Assume;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.ExpectedException;
@@ -131,4 +140,37 @@ public class CombineJava8Test implements Serializable {
 KV.of("c", 4));
 pipeline.run();
   }
+
+  /**
+   * Tests that we can serialize {@link Combine.CombineFn CombineFns} 
constructed from a lambda.
+   * Lambdas can be problematic because the {@link Class} object is synthetic 
and cannot be
+   * deserialized.
+   */
+  @Test
+  public void testLambdaSerialization() {
+SerializableFunction combiner = xs -> 
Iterables.getFirst(xs, 0);
+
+boolean lambdaClassSerializationThrows;
+try {
+  SerializableUtils.clone(combiner.getClass());
+  lambdaClassSerializationThrows = false;
+} catch (IllegalArgumentException e) {
+  // Expected
+  lambdaClassSerializationThrows = true;
+}
+Assume.assumeTrue("Expected lambda class serialization to fail. "
++ "If it's fixed, we can remove special behavior in Combine.",
+lambdaClassSerializationThrows);
+
+
+Combine.Globally combine = Combine.globally(combiner);
+

[3/3] incubator-beam git commit: This closes #126

2016-04-15 Thread bchambers
This closes #126


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/eb682a80
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/eb682a80
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/eb682a80

Branch: refs/heads/master
Commit: eb682a80c4091dce5242ebc8272f43f3461a6fc5
Parents: d440d94 b0baa4c
Author: bchambers 
Authored: Fri Apr 15 14:29:30 2016 -0700
Committer: bchambers 
Committed: Fri Apr 15 14:29:30 2016 -0700

--
 .../sdk/transforms/ApproximateQuantiles.java|   8 +
 .../beam/sdk/transforms/ApproximateUnique.java  |  44 ++
 .../org/apache/beam/sdk/transforms/Combine.java | 155 +++
 .../beam/sdk/transforms/CombineFnBase.java  |  27 +++-
 .../apache/beam/sdk/transforms/CombineFns.java  |  65 
 .../beam/sdk/transforms/CombineWithContext.java |   6 +
 .../org/apache/beam/sdk/transforms/Max.java |   6 +
 .../org/apache/beam/sdk/transforms/Min.java |   6 +
 .../org/apache/beam/sdk/transforms/Sample.java  |  14 ++
 .../org/apache/beam/sdk/transforms/Top.java |   8 +
 .../sdk/transforms/display/ClassForDisplay.java |  93 +++
 .../sdk/transforms/display/DisplayData.java | 111 +++--
 .../org/apache/beam/sdk/util/CombineFnUtil.java |  13 ++
 .../transforms/ApproximateQuantilesTest.java|  13 ++
 .../sdk/transforms/ApproximateUniqueTest.java   |  17 ++
 .../beam/sdk/transforms/CombineFnsTest.java |  69 -
 .../apache/beam/sdk/transforms/CombineTest.java |  22 ++-
 .../org/apache/beam/sdk/transforms/MaxTest.java |  13 +-
 .../org/apache/beam/sdk/transforms/MinTest.java |  13 +-
 .../apache/beam/sdk/transforms/SampleTest.java  |  14 ++
 .../org/apache/beam/sdk/transforms/TopTest.java |  13 ++
 .../transforms/display/ClassForDisplayTest.java |  66 
 .../transforms/display/DisplayDataMatchers.java |  51 +++---
 .../sdk/transforms/display/DisplayDataTest.java |  18 ++-
 .../display/ClassForDisplayJava8Test.java   |  46 ++
 .../beam/sdk/transforms/CombineJava8Test.java   |  42 +
 26 files changed, 878 insertions(+), 75 deletions(-)
--




[jira] [Commented] (BEAM-201) Material page

2016-04-15 Thread James Malone (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243676#comment-15243676
 ] 

James Malone commented on BEAM-201:
---

Submitted PR (13) for this issue.

> Material page
> -
>
> Key: BEAM-201
> URL: https://issues.apache.org/jira/browse/BEAM-201
> Project: Beam
>  Issue Type: Improvement
>  Components: website
> Environment: Create a website page with logo and project material 
> content
>Reporter: James Malone
>Assignee: James Malone
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-201) Material page

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243674#comment-15243674
 ] 

ASF GitHub Bot commented on BEAM-201:
-

GitHub user evilsoapbox opened a pull request:

https://github.com/apache/incubator-beam-site/pull/13

Material page; logo fixes

- Added material page with project logos/materials
- Navigation fixes
- Logo fix for the main logo on all pages

In JIRA as [BEAM-201](https://issues.apache.org/jira/browse/BEAM-201)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/evilsoapbox/incubator-beam-site logo-files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam-site/pull/13.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13


commit a7cdd50326c5f14e7b578328001a710ba4672620
Author: James Malone 
Date:   2016-04-15T21:13:19Z

Addition of material page; nav fixes




> Material page
> -
>
> Key: BEAM-201
> URL: https://issues.apache.org/jira/browse/BEAM-201
> Project: Beam
>  Issue Type: Improvement
>  Components: website
> Environment: Create a website page with logo and project material 
> content
>Reporter: James Malone
>Assignee: James Malone
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #122

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #122

2016-04-15 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-155) Support asserting the contents of windows and panes in PAssert

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243665#comment-15243665
 ] 

ASF GitHub Bot commented on BEAM-155:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/180


> Support asserting the contents of windows and panes in PAssert
> --
>
> Key: BEAM-155
> URL: https://issues.apache.org/jira/browse/BEAM-155
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> This consists of reifying the output windows and panes, and running asserts 
> per-window about the contents of panes.
> This includes aggregated matching and final pane matching, e.g.
> PAssert.that(output).byOnTimePane().hasOutputElements(foo, bar);
> // For discarding mode - could have emitted (say) [spam, eggs], [spam], [], 
> [sausage], []
> PAssert.that(output).byFinalPane().hasOutputElements(spam, eggs, sausage, 
> spam);
> // For accumulating mode without late data
> PAssert.that(output).finalPane().containsInAnyOrder(spam, eggs, sausage, 
> spam);
> // For accumulating mode with late data
> PAssert.that(output).finalPane().containsInAnyOrder(foo, 
> bar).mayAlsoContain(baz, rab);
> See also: 
> https://docs.google.com/document/d/1fZUUbG2LxBtqCVabQshldXIhkMcXepsbv2vuuny8Ix4/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #121

2016-04-15 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-121) Publish DisplayData from common PTransforms

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243554#comment-15243554
 ] 

ASF GitHub Bot commented on BEAM-121:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/125


> Publish DisplayData from common PTransforms
> ---
>
> Key: BEAM-121
> URL: https://issues.apache.org/jira/browse/BEAM-121
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-121] Add display data to ParDo ...

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/125


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Add display data to ParDo transforms

2016-04-15 Thread bchambers
Repository: incubator-beam
Updated Branches:
  refs/heads/master 9ff426964 -> 0bb4f9c1e


Add display data to ParDo transforms


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/77a77c9d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/77a77c9d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/77a77c9d

Branch: refs/heads/master
Commit: 77a77c9d159d7f9b1e6f645e54f0a4de86180bfe
Parents: 9ff4269
Author: Scott Wegner 
Authored: Tue Apr 12 10:08:37 2016 -0700
Committer: bchambers 
Committed: Fri Apr 15 12:12:44 2016 -0700

--
 .../runners/DataflowPipelineTranslatorTest.java | 94 +++-
 .../runners/inprocess/ForwardingPTransform.java |  6 ++
 .../beam/sdk/transforms/DoFnReflector.java  |  6 ++
 .../beam/sdk/transforms/DoFnWithContext.java| 14 ++-
 .../org/apache/beam/sdk/transforms/Filter.java  | 27 ++
 .../apache/beam/sdk/transforms/GroupByKey.java  |  9 ++
 .../transforms/IntraBundleParallelization.java  |  9 ++
 .../org/apache/beam/sdk/transforms/ParDo.java   | 62 ++---
 .../apache/beam/sdk/transforms/Partition.java   | 13 +++
 .../inprocess/ForwardingPTransformTest.java | 10 +++
 .../sdk/transforms/DoFnWithContextTest.java | 11 +++
 .../apache/beam/sdk/transforms/FilterTest.java  | 20 +
 .../beam/sdk/transforms/GroupByKeyTest.java | 15 
 .../IntraBundleParallelizationTest.java | 26 ++
 .../apache/beam/sdk/transforms/ParDoTest.java   | 74 ---
 .../beam/sdk/transforms/PartitionTest.java  | 13 +++
 16 files changed, 341 insertions(+), 68 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/77a77c9d/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/sdk/runners/DataflowPipelineTranslatorTest.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/sdk/runners/DataflowPipelineTranslatorTest.java
 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/sdk/runners/DataflowPipelineTranslatorTest.java
index 7a3caa6..0d58601 100644
--- 
a/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/sdk/runners/DataflowPipelineTranslatorTest.java
+++ 
b/runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/sdk/runners/DataflowPipelineTranslatorTest.java
@@ -72,9 +72,9 @@ import com.google.api.services.dataflow.model.Step;
 import com.google.api.services.dataflow.model.WorkerPool;
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
 import com.google.common.collect.Iterables;
 
-import org.hamcrest.Matcher;
 import org.hamcrest.Matchers;
 import org.junit.Assert;
 import org.junit.Rule;
@@ -890,10 +890,12 @@ public class DataflowPipelineTranslatorTest implements 
Serializable {
   }
 };
 
+ParDo.Bound parDo1 = ParDo.of(fn1);
+ParDo.Bound parDo2 = ParDo.of(fn2);
 pipeline
   .apply(Create.of(1, 2, 3))
-  .apply(ParDo.of(fn1))
-  .apply(ParDo.of(fn2));
+  .apply(parDo1)
+  .apply(parDo2);
 
 Job job =
 translator
@@ -910,43 +912,53 @@ public class DataflowPipelineTranslatorTest implements 
Serializable {
 Map parDo2Properties = steps.get(2).getProperties();
 assertThat(parDo1Properties, hasKey("display_data"));
 
-
-@SuppressWarnings("unchecked")
-Collection> fn1displayData =
-(Collection>) 
parDo1Properties.get("display_data");
-@SuppressWarnings("unchecked")
-Collection> fn2displayData =
-(Collection>) 
parDo2Properties.get("display_data");
-
-@SuppressWarnings("unchecked")
-Matcher> fn1expectedData =
-Matchers.>containsInAnyOrder(
-ImmutableMap.builder()
-.put("namespace", fn1.getClass().getName())
-.put("key", "foo")
-.put("type", "STRING")
-.put("value", "bar")
-.build(),
-ImmutableMap.builder()
-.put("namespace", fn1.getClass().getName())
-.put("key", "foo2")
-.put("type", "JAVA_CLASS")
-.put("value", DataflowPipelineTranslatorTest.class.getName())
-.put("shortValue", 
DataflowPipelineTranslatorTest.class.getSimpleName())
-.put("label", "Test Class")
-.put("linkUrl", "http://www.google.com;)
-.build());
-
-@SuppressWarnings("unchecked")
-

Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #120

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #120

2016-04-15 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-121) Publish DisplayData from common PTransforms

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243430#comment-15243430
 ] 

ASF GitHub Bot commented on BEAM-121:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/124


> Publish DisplayData from common PTransforms
> ---
>
> Key: BEAM-121
> URL: https://issues.apache.org/jira/browse/BEAM-121
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[1/2] incubator-beam git commit: Add display data to windowing transforms

2016-04-15 Thread bchambers
Repository: incubator-beam
Updated Branches:
  refs/heads/master c8ed39806 -> 9ff426964


Add display data to windowing transforms

Expose NeverTrigger as package-private since it is necessary for display
data


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/450dd856
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/450dd856
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/450dd856

Branch: refs/heads/master
Commit: 450dd856f00fc898bb9d2bdc1a5331f13a6c52c1
Parents: c8ed398
Author: Scott Wegner 
Authored: Mon Apr 11 15:38:39 2016 -0700
Committer: bchambers 
Committed: Fri Apr 15 11:27:39 2016 -0700

--
 .../beam/sdk/transforms/windowing/AfterAll.java | 10 +++
 .../windowing/AfterDelayFromFirstElement.java   | 11 ++-
 .../sdk/transforms/windowing/AfterEach.java | 10 +++
 .../sdk/transforms/windowing/AfterFirst.java| 10 +++
 .../windowing/AfterProcessingTime.java  | 11 ++-
 .../transforms/windowing/AfterWatermark.java| 24 ++-
 .../transforms/windowing/CalendarWindows.java   | 35 --
 .../sdk/transforms/windowing/FixedWindows.java  |  8 +++
 .../beam/sdk/transforms/windowing/Never.java|  3 +-
 .../transforms/windowing/OrFinallyTrigger.java  |  5 ++
 .../sdk/transforms/windowing/Repeatedly.java|  7 +-
 .../beam/sdk/transforms/windowing/Sessions.java |  6 ++
 .../transforms/windowing/SlidingWindows.java|  9 +++
 .../beam/sdk/transforms/windowing/Window.java   | 25 +++
 .../beam/sdk/transforms/windowing/WindowFn.java | 14 +++-
 .../apache/beam/sdk/util/ReshuffleTrigger.java  |  5 ++
 .../sdk/transforms/windowing/AfterAllTest.java  |  5 ++
 .../sdk/transforms/windowing/AfterEachTest.java | 10 +++
 .../transforms/windowing/AfterFirstTest.java|  6 ++
 .../sdk/transforms/windowing/AfterPaneTest.java |  6 ++
 .../windowing/AfterProcessingTimeTest.java  | 31 +
 .../windowing/AfterWatermarkTest.java   | 27 
 .../windowing/CalendarWindowsTest.java  | 31 +
 .../transforms/windowing/FixedWindowsTest.java  | 14 
 .../windowing/OrFinallyTriggerTest.java |  6 ++
 .../transforms/windowing/RepeatedlyTest.java| 13 
 .../sdk/transforms/windowing/SessionsTest.java  | 10 +++
 .../windowing/SlidingWindowsTest.java   | 20 ++
 .../sdk/transforms/windowing/StubTrigger.java   | 71 
 .../sdk/transforms/windowing/TriggerTest.java   |  2 +-
 .../sdk/transforms/windowing/WindowTest.java| 59 
 .../beam/sdk/util/ReshuffleTriggerTest.java | 10 ++-
 32 files changed, 501 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/450dd856/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterAll.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterAll.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterAll.java
index ac1fa43..0f609df 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterAll.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterAll.java
@@ -21,6 +21,7 @@ import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.transforms.windowing.Trigger.OnceTrigger;
 import org.apache.beam.sdk.util.ExecutableTrigger;
 
+import com.google.common.base.Joiner;
 import com.google.common.base.Preconditions;
 
 import org.joda.time.Instant;
@@ -112,4 +113,13 @@ public class AfterAll extends OnceTrigger {
   subtrigger.invokeOnFire(context);
 }
   }
+
+  @Override
+  public String toString() {
+StringBuilder builder = new StringBuilder("AfterAll.of(");
+Joiner.on(", ").appendTo(builder, subTriggers);
+builder.append(")");
+
+return builder.toString();
+  }
 }

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/450dd856/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterDelayFromFirstElement.java
--
diff --git 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterDelayFromFirstElement.java
 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterDelayFromFirstElement.java
index 83e0bea..7ec3ce9 100644
--- 
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterDelayFromFirstElement.java
+++ 
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/AfterDelayFromFirstElement.java
@@ -36,10 +36,12 @@ import com.google.common.collect.ImmutableList;
 
 import org.joda.time.Duration;
 import 

[2/2] incubator-beam git commit: This closes #124

2016-04-15 Thread bchambers
This closes #124


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/9ff42696
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/9ff42696
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/9ff42696

Branch: refs/heads/master
Commit: 9ff426964835def796dc0efd47f309ea521beb4f
Parents: c8ed398 450dd85
Author: bchambers 
Authored: Fri Apr 15 11:28:15 2016 -0700
Committer: bchambers 
Committed: Fri Apr 15 11:28:15 2016 -0700

--
 .../beam/sdk/transforms/windowing/AfterAll.java | 10 +++
 .../windowing/AfterDelayFromFirstElement.java   | 11 ++-
 .../sdk/transforms/windowing/AfterEach.java | 10 +++
 .../sdk/transforms/windowing/AfterFirst.java| 10 +++
 .../windowing/AfterProcessingTime.java  | 11 ++-
 .../transforms/windowing/AfterWatermark.java| 24 ++-
 .../transforms/windowing/CalendarWindows.java   | 35 --
 .../sdk/transforms/windowing/FixedWindows.java  |  8 +++
 .../beam/sdk/transforms/windowing/Never.java|  3 +-
 .../transforms/windowing/OrFinallyTrigger.java  |  5 ++
 .../sdk/transforms/windowing/Repeatedly.java|  7 +-
 .../beam/sdk/transforms/windowing/Sessions.java |  6 ++
 .../transforms/windowing/SlidingWindows.java|  9 +++
 .../beam/sdk/transforms/windowing/Window.java   | 25 +++
 .../beam/sdk/transforms/windowing/WindowFn.java | 14 +++-
 .../apache/beam/sdk/util/ReshuffleTrigger.java  |  5 ++
 .../sdk/transforms/windowing/AfterAllTest.java  |  5 ++
 .../sdk/transforms/windowing/AfterEachTest.java | 10 +++
 .../transforms/windowing/AfterFirstTest.java|  6 ++
 .../sdk/transforms/windowing/AfterPaneTest.java |  6 ++
 .../windowing/AfterProcessingTimeTest.java  | 31 +
 .../windowing/AfterWatermarkTest.java   | 27 
 .../windowing/CalendarWindowsTest.java  | 31 +
 .../transforms/windowing/FixedWindowsTest.java  | 14 
 .../windowing/OrFinallyTriggerTest.java |  6 ++
 .../transforms/windowing/RepeatedlyTest.java| 13 
 .../sdk/transforms/windowing/SessionsTest.java  | 10 +++
 .../windowing/SlidingWindowsTest.java   | 20 ++
 .../sdk/transforms/windowing/StubTrigger.java   | 71 
 .../sdk/transforms/windowing/TriggerTest.java   |  2 +-
 .../sdk/transforms/windowing/WindowTest.java| 59 
 .../beam/sdk/util/ReshuffleTriggerTest.java | 10 ++-
 32 files changed, 501 insertions(+), 13 deletions(-)
--




[GitHub] incubator-beam pull request: Integration fixes

2016-04-15 Thread dhalperi
GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/189

Integration fixes

Resolves a testing difference between DataflowJavaSDK and Beam copies of 
the code that was lost in a recent code change.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam 
forward-port-test-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #189


commit c1ce371260bae50b2e043b5c35b664f87a231923
Author: swegner 
Date:   2016-04-06T23:57:56Z

Integration fixes

Release Notes
[]
-
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=119217803




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #119

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #119

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #118

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_MavenVerify » Apache Beam :: Runners :: Spark #172

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is back to stable : beam_PostCommit_MavenVerify #172

2016-04-15 Thread Apache Jenkins Server
See 



Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #117

2016-04-15 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243267#comment-15243267
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/188

[BEAM-22] Improve ParDoEvaluator Factoring

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This moves shared code into a common location.

Clone DoFn instances before constructing the DoFnRunner to
 avoid races.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
ippr_better_ParDoEvaluator_factoring

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #188


commit ecc26d51cee4ea1568948d48cd3441594f638e39
Author: Thomas Groh 
Date:   2016-03-30T00:38:22Z

Move Shared construction code to ParDoInProcessEvaluator

Remove duplicate code in ParDo(Single/Multi)EvaluatorFactory; instead
only extract the appropriate elements and pass them to the
ParDoInProcessEvaluator.t log

commit e47aba0a7097cee8341369594e47e73b83029a50
Author: Thomas Groh 
Date:   2016-04-15T17:23:15Z

Clone DoFns before constructing a DoFnRunner

This ensures that each thread gets an individual copy of a DoFn, so
multiple threads do not interact.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-77) Reorganize Directory structure

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243263#comment-15243263
 ] 

ASF GitHub Bot commented on BEAM-77:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/96


> Reorganize Directory structure
> --
>
> Key: BEAM-77
> URL: https://issues.apache.org/jira/browse/BEAM-77
> Project: Beam
>  Issue Type: Task
>  Components: project-management
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>
> Now that we've done the initial Dataflow code drop, we will restructure 
> directories to provide space for additional SDKs and Runners.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-77] Move hadoop contrib as hdfs...

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/96


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: This closes #96

2016-04-15 Thread davor
This closes #96


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/c8ed3980
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/c8ed3980
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/c8ed3980

Branch: refs/heads/master
Commit: c8ed3980687f22c0d2342ed7f7117b9e6550e1c6
Parents: bcefff6 404b633
Author: Davor Bonaci 
Authored: Fri Apr 15 10:28:53 2016 -0700
Committer: Davor Bonaci 
Committed: Fri Apr 15 10:28:53 2016 -0700

--
 contrib/hadoop/README.md|  24 -
 contrib/hadoop/pom.xml  | 170 ---
 .../apache/contrib/hadoop/HadoopFileSource.java | 486 ---
 .../apache/contrib/hadoop/WritableCoder.java| 111 -
 .../contrib/hadoop/HadoopFileSourceTest.java| 190 
 .../contrib/hadoop/WritableCoderTest.java   |  37 --
 sdks/java/io/hdfs/README.md |  24 +
 sdks/java/io/hdfs/pom.xml   |  65 +++
 .../apache/beam/sdk/io/hdfs/HDFSFileSource.java | 486 +++
 .../apache/beam/sdk/io/hdfs/WritableCoder.java  | 111 +
 .../beam/sdk/io/hdfs/HDFSFileSourceTest.java| 190 
 .../beam/sdk/io/hdfs/WritableCoderTest.java |  37 ++
 sdks/java/io/pom.xml|  41 ++
 sdks/java/pom.xml   |   1 +
 14 files changed, 955 insertions(+), 1018 deletions(-)
--




Jenkins build became unstable: beam_PostCommit_MavenVerify » Apache Beam :: Runners :: Spark #171

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build became unstable: beam_PostCommit_MavenVerify #171

2016-04-15 Thread Apache Jenkins Server
See 



[1/2] incubator-beam git commit: Remove the DataflowPipeline Class

2016-04-15 Thread lcwik
Repository: incubator-beam
Updated Branches:
  refs/heads/master 1eec6863f -> bcefff6a3


Remove the DataflowPipeline Class

Pipelines that run with the DataflowPipelineRunner should be created
using an appropriately constructed PipelineOptions (i.e. one with the
runner set to DataflowPipelineRunner.class)


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/b1eee4bc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/b1eee4bc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/b1eee4bc

Branch: refs/heads/master
Commit: b1eee4bcca632eacf4a6dd724ed3eeb27ace0d77
Parents: 1eec686
Author: Thomas Groh 
Authored: Fri Apr 15 09:23:08 2016 -0700
Committer: Luke Cwik 
Committed: Fri Apr 15 10:07:57 2016 -0700

--
 .../sdk/options/DataflowPipelineOptions.java|  14 +-
 .../beam/sdk/runners/DataflowPipeline.java  |  60 ---
 .../sdk/runners/DataflowPipelineRegistrar.java  |   4 +-
 .../sdk/runners/DataflowPipelineRunnerTest.java |  37 +++--
 .../beam/sdk/runners/DataflowPipelineTest.java  |  45 --
 .../runners/DataflowPipelineTranslatorTest.java | 159 +--
 6 files changed, 136 insertions(+), 183 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b1eee4bc/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/options/DataflowPipelineOptions.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/options/DataflowPipelineOptions.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/options/DataflowPipelineOptions.java
index 4eae85a..50fc956 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/options/DataflowPipelineOptions.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/options/DataflowPipelineOptions.java
@@ -17,7 +17,7 @@
  */
 package org.apache.beam.sdk.options;
 
-import org.apache.beam.sdk.runners.DataflowPipeline;
+import org.apache.beam.sdk.runners.DataflowPipelineRunner;
 
 import com.google.common.base.MoreObjects;
 
@@ -27,14 +27,14 @@ import org.joda.time.format.DateTimeFormat;
 import org.joda.time.format.DateTimeFormatter;
 
 /**
- * Options that can be used to configure the {@link DataflowPipeline}.
+ * Options that can be used to configure the {@link DataflowPipelineRunner}.
  */
 @Description("Options that configure the Dataflow pipeline.")
-public interface DataflowPipelineOptions extends
-PipelineOptions, GcpOptions, ApplicationNameOptions, 
DataflowPipelineDebugOptions,
-DataflowPipelineWorkerPoolOptions, BigQueryOptions,
-GcsOptions, StreamingOptions, CloudDebuggerOptions, 
DataflowWorkerLoggingOptions,
-DataflowProfilingOptions, PubsubOptions {
+public interface DataflowPipelineOptions
+extends PipelineOptions, GcpOptions, ApplicationNameOptions, 
DataflowPipelineDebugOptions,
+DataflowPipelineWorkerPoolOptions, BigQueryOptions, GcsOptions, 
StreamingOptions,
+CloudDebuggerOptions, DataflowWorkerLoggingOptions, 
DataflowProfilingOptions,
+PubsubOptions {
 
   @Description("Project id. Required when running a Dataflow in the cloud. "
   + "See https://cloud.google.com/storage/docs/projects for further 
details.")

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/b1eee4bc/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipeline.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipeline.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipeline.java
deleted file mode 100644
index 4d91a38..000
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/sdk/runners/DataflowPipeline.java
+++ /dev/null
@@ -1,60 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * 

[GitHub] incubator-beam pull request: Remove the DataflowPipeline Class

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/186


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: This closes #186

2016-04-15 Thread lcwik
This closes #186


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/bcefff6a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/bcefff6a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/bcefff6a

Branch: refs/heads/master
Commit: bcefff6a3bedb0f114d614c27da12a4cc2b34f48
Parents: 1eec686 b1eee4b
Author: Luke Cwik 
Authored: Fri Apr 15 10:08:18 2016 -0700
Committer: Luke Cwik 
Committed: Fri Apr 15 10:08:18 2016 -0700

--
 .../sdk/options/DataflowPipelineOptions.java|  14 +-
 .../beam/sdk/runners/DataflowPipeline.java  |  60 ---
 .../sdk/runners/DataflowPipelineRegistrar.java  |   4 +-
 .../sdk/runners/DataflowPipelineRunnerTest.java |  37 +++--
 .../beam/sdk/runners/DataflowPipelineTest.java  |  45 --
 .../runners/DataflowPipelineTranslatorTest.java | 159 +--
 6 files changed, 136 insertions(+), 183 deletions(-)
--




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #116

2016-04-15 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: Update Spark Test to use more standar...

2016-04-15 Thread tgroh
Github user tgroh closed the pull request at:

https://github.com/apache/incubator-beam/pull/177


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: EncodabilityEnforcementFactoryTest: add unit annotations

2016-04-15 Thread dhalperi
EncodabilityEnforcementFactoryTest: add unit annotations


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/980351ec
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/980351ec
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/980351ec

Branch: refs/heads/master
Commit: 980351ecbe49407ddc796f652ab912fb9ed004af
Parents: 96d324e
Author: Dan Halperin 
Authored: Thu Apr 14 22:18:32 2016 -0700
Committer: Dan Halperin 
Committed: Fri Apr 15 10:04:45 2016 -0700

--
 .../sdk/runners/inprocess/EncodabilityEnforcementFactoryTest.java | 3 +++
 1 file changed, 3 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/980351ec/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/inprocess/EncodabilityEnforcementFactoryTest.java
--
diff --git 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/inprocess/EncodabilityEnforcementFactoryTest.java
 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/inprocess/EncodabilityEnforcementFactoryTest.java
index fa0cb19..7720589 100644
--- 
a/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/inprocess/EncodabilityEnforcementFactoryTest.java
+++ 
b/sdks/java/core/src/test/java/org/apache/beam/sdk/runners/inprocess/EncodabilityEnforcementFactoryTest.java
@@ -35,6 +35,8 @@ import org.joda.time.Instant;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.ExpectedException;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
 
 import java.io.IOException;
 import java.io.InputStream;
@@ -44,6 +46,7 @@ import java.util.Collections;
 /**
  * Tests for {@link EncodabilityEnforcementFactory}.
  */
+@RunWith(JUnit4.class)
 public class EncodabilityEnforcementFactoryTest {
   @Rule public ExpectedException thrown = ExpectedException.none();
   private EncodabilityEnforcementFactory factory = 
EncodabilityEnforcementFactory.create();



[jira] [Created] (BEAM-200) Standardize terminology to "display data" in documentation

2016-04-15 Thread Scott Wegner (JIRA)
Scott Wegner created BEAM-200:
-

 Summary: Standardize terminology to "display data" in documentation
 Key: BEAM-200
 URL: https://issues.apache.org/jira/browse/BEAM-200
 Project: Beam
  Issue Type: Sub-task
Reporter: Scott Wegner


In some places we're using "display metadata". Let's standardize all javadocs 
and other documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-199) Improve fluent interface for manipulating DisplayData.Items

2016-04-15 Thread Scott Wegner (JIRA)
Scott Wegner created BEAM-199:
-

 Summary: Improve fluent interface for manipulating 
DisplayData.Items
 Key: BEAM-199
 URL: https://issues.apache.org/jira/browse/BEAM-199
 Project: Beam
  Issue Type: Sub-task
Reporter: Scott Wegner
Priority: Minor


See discussion 
[here|https://github.com/apache/incubator-beam/pull/126#discussion_r59785549]. 
The current fluent API may be difficult to use and could cause some ambiguity. 
We have some ideas in the linked thread on how to improve it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #116

2016-04-15 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: Uniformly apply toLowerCase for Dataf...

2016-04-15 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/187

Uniformly apply toLowerCase for Dataflow Job Names

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Also do minor cleanup of DataflowPIpelineOptions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam stop_lowercase

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #187


commit 511ed61dc7d4e15c5e493a931d7e15b94d3b005c
Author: Thomas Groh 
Date:   2016-04-15T17:05:36Z

Enforce JobName Preconditions in the Dataflow Runner

This ensures that whenever a DataflowPipelineRunner is created, the
jobName of the created options conforms to the requirements of the
Dataflow Service.

commit 2edc88931472d7e38e26a117a7707417ff879e51
Author: Thomas Groh 
Date:   2016-04-15T17:06:21Z

Clean up DataflowPipeline[Debug]Options

getUpdate should be isUpdate, to meet standard JavaBeans style.

Remove deprecated update property from DataflowPipelineDebugOptions




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #115

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #115

2016-04-15 Thread Apache Jenkins Server
See 




[GitHub] incubator-beam pull request: EncodabilityEnforcementFactoryTest: a...

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/184


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Closes #184

2016-04-15 Thread dhalperi
Repository: incubator-beam
Updated Branches:
  refs/heads/master 96d324e39 -> 1eec6863f


Closes #184


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/1eec6863
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/1eec6863
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/1eec6863

Branch: refs/heads/master
Commit: 1eec6863f8a009a81c32c4923b1079691e978780
Parents: 96d324e 980351e
Author: Dan Halperin 
Authored: Fri Apr 15 10:04:45 2016 -0700
Committer: Dan Halperin 
Committed: Fri Apr 15 10:04:45 2016 -0700

--
 .../sdk/runners/inprocess/EncodabilityEnforcementFactoryTest.java | 3 +++
 1 file changed, 3 insertions(+)
--




[GitHub] incubator-beam pull request: Remove the DirectPipeline class

2016-04-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/182


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: This closes #182

2016-04-15 Thread lcwik
This closes #182


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/96d324e3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/96d324e3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/96d324e3

Branch: refs/heads/master
Commit: 96d324e39a7ecc6ec60df309ec9f9b58d289186d
Parents: 8dc9032 5a44d12
Author: Luke Cwik 
Authored: Fri Apr 15 09:40:07 2016 -0700
Committer: Luke Cwik 
Committed: Fri Apr 15 09:40:07 2016 -0700

--
 .../BlockingDataflowPipelineRunnerTest.java | 20 +++
 .../translation/TransformTranslatorTest.java| 51 ++
 .../beam/sdk/options/DirectPipelineOptions.java |  9 ++--
 .../apache/beam/sdk/runners/DirectPipeline.java | 56 
 .../beam/sdk/runners/DirectPipelineRunner.java  | 15 --
 .../beam/sdk/io/AvroIOGeneratedClassTest.java   | 13 +++--
 .../java/org/apache/beam/sdk/io/AvroIOTest.java |  9 ++--
 .../sdk/runners/DirectPipelineRunnerTest.java   | 29 ++
 .../beam/sdk/runners/DirectPipelineTest.java| 35 
 .../beam/sdk/runners/TransformTreeTest.java |  7 +--
 .../transforms/ApproximateQuantilesTest.java|  5 +-
 11 files changed, 66 insertions(+), 183 deletions(-)
--




[GitHub] incubator-beam pull request: Write to TextIO in DebuggingWordCount

2016-04-15 Thread bjchambers
Github user bjchambers closed the pull request at:

https://github.com/apache/incubator-beam/pull/157


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #114

2016-04-15 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow » Apache Beam :: Runners :: Google Cloud Dataflow #114

2016-04-15 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-198) Spark runner batch translator to work with Datasets instead of RDDs

2016-04-15 Thread Amit Sela (JIRA)
Amit Sela created BEAM-198:
--

 Summary: Spark runner batch translator to work with Datasets 
instead of RDDs
 Key: BEAM-198
 URL: https://issues.apache.org/jira/browse/BEAM-198
 Project: Beam
  Issue Type: New Feature
  Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela


Currently, the Spark runner translates batch pipelines into RDD code, meaning 
it doesn't benefit from the optimizations DataFrames (which isn't type-safe) 
enjoys.

With Datasets, batch pipelines will benefit the optimizations, adding to that 
that Datasets are type-safe and encoder-based they seem like a much better fit 
for the Beam model.

Looking ahead, Datasets is a good choice since it's the basis for the future of 
Spark streaming as well  (Structured Streaming) so this will hopefully lay a 
solid foundation for a native integration between Spark 2.0 and Beam.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-15) Applying windowing to cached RDDs fails

2016-04-15 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242606#comment-15242606
 ] 

Amit Sela commented on BEAM-15:
---

This will happen in native Spark (DStream) code as well, so not sure yet on how 
to deal with that.
Again, let's see how Structured Streaming behaves.

> Applying windowing to cached RDDs fails
> ---
>
> Key: BEAM-15
> URL: https://issues.apache.org/jira/browse/BEAM-15
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> The Spark runner caches RDDs that are accessed more than once. If applying 
> window operations to a cached RDD, it will fail because windowed RDDs will 
> try to cache with a different cache level - windowing cache level is 
> StorageLevel.MEMORY_ONLY_SER and RDD cache level is StorageLevel.MEMORY_ONLY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-16) Make Spark RDDs readable as PCollections

2016-04-15 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242595#comment-15242595
 ] 

Amit Sela commented on BEAM-16:
---

Make Spark Datasets readable as.. ?

> Make Spark RDDs readable as PCollections
> 
>
> Key: BEAM-16
> URL: https://issues.apache.org/jira/browse/BEAM-16
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Amit Sela
>Priority: Minor
>
> This could be done by implementing a SparkSource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: Cassandra source connector - getEstim...

2016-04-15 Thread TechM-Google
GitHub user TechM-Google opened a pull request:

https://github.com/apache/incubator-beam/pull/185

Cassandra source connector - getEstimatedSizeBytes

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

Implemented getEstimatedSizeBytes in CassandraReadIO.java to get the 
estimation for the bytes read by source connector.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TechM-Google/incubator-beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #185


commit 16e104cd0cc7859f346315f63061f3f5125a654c
Author: unknown 
Date:   2016-03-31T09:24:28Z

Cassandra source and sink connector classes

commit f61863bff0f91a2a345720ef83f0c397ac793589
Author: unknown 
Date:   2016-04-14T12:11:11Z

implemented getEstimatedSizeBytes

commit 632fc1b44cbba1b37a0937b5197218582bbf8bb6
Author: unknown 
Date:   2016-04-15T06:41:29Z

implemented getEstimatedSizeBytes




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is still unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #113

2016-04-15 Thread Apache Jenkins Server
See